关于linux上j2ee平台选择的问题 (200分)

caomo · 2003-03-03

如果我们要建立一个在linux平台上主要基于开放源代码的平台的j2ee应用
但存储的数据量很大，并发访问上百。
如何选择平台组合？
尽量不用商业软件的前提下，尽量省钱但又尽量提高性能

列举些常见平台
sevlet&jsp:tomcat,resin,...
j2ee :jboss,jrun,weblogic,websphere,...
db: mysql,postsql,oracle,...

谁能给出些组合并给出评价?
尽量避免某处成为瓶颈而使商业软件的高效实际失效。

多谢了！

小猪 · 2003-03-04

先说说license。
你的应用是研究性质的，还是商业应用？
如果是研究性质，似乎除了oracle，都可以用，
如果是商业应用，那么：
tomcat是免费的，没错，但resin不是免费的，你可以去resin网站上down一个
license看看，商业应用是要收费的。
jboss和jrun没有用过，但你可以去看看它们的license文件，weblogic和websphere
商用是肯定不行的。如果真的打算买个放心的话，weblogic绝对比websphere好用，
至少能帮你省点买硬件的银子。
mysql商用也不是免费的，前几天还见人讨论过，这是一个很大的误解[

]
至于postsql，不是很清楚，恐怕还是要你自己去查它的license。
至于技术上的组合，
很多人推荐jboss和jrun，据说是不错，我没用过，不好评价，我不用ejb，所以
平常开发都用resin，速度比tomcat快很多，这是我的感受。
至于数据库，你对事务的要求如何，如果要求很严格的事务控制，我觉得oracle
是最好的了，不过这玩意儿太贵，恐怕不能满足你的要求了[

]，也许试试postsql？
如果不要求事务的话，那恐怕只有mysql是最好的了。

oskiller · 2003-03-05

完全免费，建议JBOSS＋postgresql，无论商用还是研究都免费，但前者技术支持要钱。JBOSS的
盈利模式就是卖文档。至于JBOSS的servelt/JSP容器，建议选择jetty，这是JBOSS钦定的，和JBOSS
结合的不错，性能也比TOMCAT快得多，也都是cure JAVA。

caomo · 2003-03-05

谢谢两位，我好不容易把tomcat的一系列设置弄熟。
不知道jetty到底要比tomcat好多少，我只知道tomcat是sun钦定的。jetty对jsp和servlet支持如何？
而且，tomcat在apache棋下，还有struts，tiles和cocoon作伴，不知道jetty是否支持？
发现java领域东西太多太乱。学死人啊：~~~~
我主要在于研究。但有个公益项目可能会投钱，但一般政府喜欢投硬件，但不投软件。
就靠压榨学生血汗了~~~~~~~
国家又大力推广自由软件应用，我不得不吐血求救阿。
小猪兄能多解释下数据库中事务对开发和性能的影响吗？我印象中事务好像表示多个操作的几阶段批量提交。

小猪 · 2003-03-06

给你看看这个，是李颖同学介绍的。[

]
该文的价码是1000USD，发表于Delphi。Bill Tood是Database Group的总裁，常常在BorCon和各类巡回培训上发表演讲。他的培训课程价格相当高，一个下午是500USD。这篇文章虽然对于高手而言不是什么不得了的东西，但对于入门者，具有相当的指导意义。
Choosing a Database
Checklists for Selecting the Best Database for a Task

A common question among Delphi developers is: "Which database should I use?" It would be an easier question to answer if there weren't so many choices. This article examines how to determine which database is right for you and your application. At the end of this article I hope you'll have a checklist that you can use to make a good choice based on your needs.

I was going to write this article without mentioning any database by name, but found it too difficult. When Ido
mention a particular product, however, it's as an example only. If you look at the significant players in all segments of the database market, there are no bad products. Instead, there is a variety of products with different features. A particular feature may be an advantage for one type of application and a disadvantage for another.

The Basics
Different databases have different strengths and weaknesses, so it stands to reason that your decision will involve matching the database's capabilities to your application's requirements. Necessarily, that means you cannot choose a database until you have defined your requirements, so let's start with a few basic questions:
Is the application for a single user or multiple users?
How many simultaneous users will you have?
How many simultaneous users will you have in 10 years?
How big will your database be?
How big will your database be in 10 years?
How many hours per day will the database be in use?
In the event of a hardware or software failure, will it be necessary to recover up to the instant of the failure?
Is database replication required?
Will data access consist solely of inserts and updates, or will there be queries?
How complex will the queries be, and how much data will they scan?

The answers to these questions should prepare you for this next set of questions.

Does the database support multiple simultaneous users? If the application is multi-user, you should eliminate all desktop databases from your list of candidates and focus on database servers. Client-server applications reduce network traffic and provide better performance with many simultaneous users. Most importantly, the chances of a system crash causing a corrupt database are much lower with a database server, particularly if the PC that hosts the database software is dedicated to that task, has an uninterruptible power supply, and is physically secure. SQL database servers are available in every price range, including free, so you have no reason not to use one.

What is the database architecture? The vast majority of database servers today use the relational model, but it's not the only choice. Perhaps an object database, a hierarchical database, a hash-based database, an associative database, or some other architecture would be better for your application. Relational databases are the most common, though. So, for this article, I will assume you're choosing a relational database.

On what operating systems and hardware platformsdo
es the database run? You need to consider the number of users and database size now, and how they will change in the future. If the database size and number of users both are small, a database that runs on a single Windows PC with a single processor may be all you need. If you need to support hundreds or thousands of users, you'll need a product that runs on more powerful hardware. If you expect the load on your database system to increase over time, you need to look at database servers that support multiple operating systems and hardware platforms. That way, you can start with a single PC for your database server and scale up to a multi-processor PC, a cluster of servers, a large Sun system, or even an IBM mainframe, without having to change your software.

If your database will be hundreds of gigabytes - or even terabytes - in size and will support many users, you'll want the ability to spread the database across multiple drives, and the ability to control which database objects reside on which drives, in order to maximize performance. This level of ease in configuring the database provides the best performance and hardware utilization. However, it also means you'll be working with a very complex database that requires a highly skilled database administrator to install, configure, and maintain. If your database must be online 24 hours a day, seven days a week, you may need backup servers and automatic fail-over, so usersdo
n't have any interruption of service if a server fails.

Locking vs. Versioning
Which concurrency control architecturedo
es the database use? There are two models for controlling concurrent access to data in a database: the locking model and the versioning model. The principal advantage of the versioning model is that readers never block writers. Put another way, if one user is executing a long-running query that selects and analyzes data, and this query requires a consistent view of the data as it was at one moment in time, other users will be able to update rows used by this query while the query is running. The same isn't true for locking databases.

What is the smallest amount of data that can be locked? Lock granularity is a measurement of that amount. Most databases today support row-level locking. However, you may encounter databases that offer only page-level locks as their most granular. This means that instead of locking one row, an entire page of rows must be locked, which reduces concurrent access to data. Page locks are a particular problem in small tables that are accessed by many users. The chances are high that two users may need to update rows on the same page at the same time.

Does the database use automatic lock escalation? If so, under what conditions? It's great if the database you're considering supports row-level locking, but what happens when a user reads or updates a large number of rows? Because the overhead of maintaining a large number of locks can hurt performance, some databases will escalate to less granular locks to reduce the number of locks the server must maintain. This means you suddenly could find that the database was locking pages - or even entire tables - not individual rows. Obviously, this can severely limit concurrent access to data.

What lock typesdo
es the database use, and howdo
they coexist? When considering a locking-model database, it is critical to understand how locks conflict with one another. What happens when one user places one type of lock on a row, page, or table, and a second user tries to place the same or a different type of lock on the same object? The best way to understand lock conflicts is with a lock-conflict table that shows all the lock typesdo
wn the left side and across the top. Each cell in this grid would represent an attempt to place two locks simultaneously, and the cells would contain explanations of what happens when the locks are attempted. This may sound like a trivial task, but it isn't. For example, the last time I looked at IBM's DB2, it had 12 different lock types. That means you have 144 different combinations of locks to consider! Even worse, if you want to understand what conflicts you may encounter, you need to understand which lock types your application will use at what times.

Transactions
Does the database support transactions? Be careful here. Some databases claim to support transactions when they reallydo
n't. A true transaction must exhibit four characteristics, sometimes called the ACID test (see Figure 1).

Character
Description
Atomicity
All changes that are part of a transaction succeed or fail as a single unit, no matter how many rows and tables are involved.
Consistency
The database will always be left in a consistent state. If the database server crashes, all active transactions will roll back automatically when the server restarts, so the state of the database will be as it was before any of the active transactions began.
Isolation
Changes made by uncommitted transactions cannot be seen by other users.
Durability
Once a transaction is committed, all changes that are part of that transaction become a permanent part of the database and cannot be lost.

Figure 1: Characteristics of a transaction;
the ACID test.

What transaction-isolation levelsdo
es the database support? Again, you need to be careful in evaluating transaction-isolation levels. Some database vendors use transaction-isolation levels thatdo
n't exactly match those defined by the American National Standards Institute (ANSI). For example, InterBase's snapshot transaction-isolation level provides the same consistent view of the data as ANSI serializable isolation, butdo
esn't guarantee that a sequential order of execution of concurrent transactions will produce the same result. To add to the confusion, ANSI-standard repeatable-read isolation may not give you the behavior you expect. Figure 2 shows the ANSI-standard transaction-isolation levels.

Isolation Level
Description
Read uncommitted
Also known as dirty-read isolation. Other transactions can see changes made by a transaction that has not committed. Very few databases support read-uncommitted isolation because it violates the isolation rule for transactions and lets other transactions see and use values that can vanish from the database if the transaction rolls back.
Read committed
Other transactions can see changes made by a transaction as soon as it commits.
Repeatable read
If a transaction using repeatable-read isolation selects a set of rows and then
later selects the same set of rows, it will see exactly the same values in each field of each row that it read the first time. Note, however, that if new rows have been inserted into the database since the first SELECT, and those new rows satisfy the WHERE clause, they will be returned by the second SELECT. The read is repeatable only for those rows that were read the first time.
Serializable
Serializable isolation guarantees that if you execute a SELECT more than once during the course of the transaction, you'll get exactly the same result set every time. It also guarantees that if multiple serializable transactions execute concurrently, some sequential order of execution will produce the same results.

Figure 2: ANSI-standard transaction-isolation levels.

What happens when the transaction log gets full?
If the database uses a transaction log, what happens if it gets full?
Will the transaction log expand dynamically? If not, whatdo
you have todo
to increase the size of the transaction log?
Can the log size be increased while the database is in use?
Can you place the transaction log on a separate drive from the database files so that both the database and the log will not be lost if a single hard drive fails?

Can transactions span more than one database? Some databases allow a single transaction to make changes in more than one database;
othersdo
n't.

Does the database have a method of generating sequential numbers? It's common to store data in relational databases thatdo
n't contain a natural primary key. The most common solution is to use a sequential number as a surrogate primary key. This is such a common requirement that I would not consider a database thatdo
esn't have a way to generate sequential numbers safely in a multi-user environment. Some databases have a special field type todo
this. Other databases provide another mechanism, such as the generators used by InterBase. The database also should provide a way to change the next value that will be generated, so the value can be reset after testing or for other reasons. You also may want the option to disable automatic generation of numbers when importing data that already has a primary key assigned to each row.

SQL Implementation
What level of compliance to the ANSI SQL standarddo
es the database provide? If you require particular features of the ANSI SQL standard, make sure the database supports them. If you need any non-standard features, such as full-text search or cross tabulation, make sure the products you consider support them.

What functions are included in the SQL implementation? A complete library of date, time, math, and string functions enhances the power of SQL and the database's stored procedure and trigger language. This dramatically can reduce the amount of data manipulation that must be programmed into client applications. If you need todo
statistical analysis, look for those functions, as well.

Can you write your own functions? It's also very valuable to be able to write your own functions and call them in SQL statements, stored procedures, and triggers. This makes it easy for you to add specialized functions that aren't available in the database's standard function library.

Can SELECT statements span databases? Not all databases support the ability to join tables from two or more databases in a SELECT statement. If you need this capability, make sure the databases you're considering have it.

Howdo
es the query optimizer work? The power and sophistication of the query optimizer can have a huge impact on the performance of your application and can save you a lot of time you might spend otherwise on trying to improve the performance of your queries. Ask yourself these questions:
Does the optimizer look for the lowest-cost solution?
Does the optimizer estimate table statistics, ordo
es the system track them?
How and when are the table statistics updated?
Does the optimizer consider hardware cost?
Is the optimizer aware of the data, such as the minimum, maximum, average, and distribution of values?
Are there types of queries that are not optimized, such as multiple outer joins?

What features are provided to improve performance? See if the database you're looking at supports the following feature: clustered tables, clustered index, shared procedure cache, shared data cache, and write caching.

Clustered tables place selected tables near each other on disk for faster access. A clustered index stores the rows in sorted order, so the data rows are the leaf nodes of the index. This makes the index smaller and the searches faster. Shared-procedure caches and shared-data caches let multiple users share the same cache for improved memory use and performance. Write caching improves performance by reducing the number of write operations at the expense of safety. If you use write caching and the server crashes, everything in the cache will be lost, and it is likely that the database will be corrupt.

Does the database support stored procedures? Stored procedures are written in the database's procedure and trigger programming language. These procedures can be called by client applications, triggers, or other stored procedures. Check for the following features:
Do stored procedures support all data types?
Can you create statements on the fly and execute them?
Can you treat a stored procedure that returns a result set as a table in SQL statements?

Does the database support triggers? Triggers, like stored procedures, are functions written in the database's procedure and trigger language. But triggers are executed automatically when data is inserted, deleted, or updated. Check for features such as:
Can you attach multiple triggers to a single event?
Can you specify the order in which multiple triggers fire?
Can you choose whether the trigger fires before or after the event?
Can triggers fire events to notify the client of some condition?

Triggers can execute when a row in a table is inserted, deleted, or updated. The ability to write more than one trigger fired by the same event means you can write and test many small, simple, single-function triggers instead of one large one. It makes development, testing, and maintenance easier. If you can attach multiple triggers to a single event, it's critical that you be able to specify the order in which they execute, in case the triggers are dependent on each other.

Some databasesdo
n't allow you to have triggers that fire both before and after the event. This is very limiting. For example, one of the most common uses of triggers is to validate data before updates or inserts are allowed to proceed and block the operation if the data is not valid.

Events allow the database to notify the client that something has happened, without the overhead of having the client application query the database frequently. For example, you might want a sales-administration system notified if a customer cancels an order. Another example would be to notify the purchasing application if the inventory level of a product falls below a set amount.

Can you change metadata while the database is online?do
you have to shutdo
wn the database before you can add new tables, add new columns to existing tables, or make other changes to the database? If you can make metadata changes while the database is online, when will connected users see the changes?

Does the database support declarative referential integrity? You can waste a lot of time writing triggers to enforce referential integrity or, worse, enforcing it in your client applications. The database should allow you to declare referential-integrity relationships between tables and enforce them automatically. It should also let you specify whether you want deletes cascaded to the child table and whether you want primary key updates to be cascaded to the child tables automatically.

Hardware Failure
Does the database support replication? If the database supports replication,do
es it support synchronous replication, asynchronous replication, or both? Synchronous replication updates the target database instantly when the source database changes. If synchronous replication is used, what happens if the connection to the target database is lost? Will you be unable to make any changes to the source database? Will changes be logged and replicated when the target database comes back online?

Asynchronous replication logs changes in the source database and replicates them to the target database later. The disadvantage of asynchronous replication is that the target database is always behind the source database by some period of time. The advantage is that youdo
n't need a permanent connection between the source and target databases.

Note that for databases thatdo
n't support a separate transaction log, replication may be the only way to provide up-to-the-minute recovery if a catastrophic hardware failure occurs.

What options are available for backing up the database?
Can the database be backed up while it's in use?
Are differential or incremental backups supported, or must you back up the entire database? A differential backup backs up all changes since the last full backup. An incremental backup backs up all changes since the last backup of any kind.
Does the database maintain a separate transaction log?
Can the transaction log be backed up while the database is online?
Does the transaction log allow differential or incremental backups?

Howdo
you recover from a server crash or hardware failure? If the database server crashes, all transactions that were active at the time of the crash must be rolled back automatically when the database server restarts, to leave the database in a consistent state. How is this accomplished? If it requires processing the transaction log, how long will this take with the maximum number of transactions you expect to have open at one time?

This is one area in which versioning databases have an advantage when compared with locking-model databases. Nothing needs to be changed in a versioning database except the status code for the transactions that were active, and this takes only a second or two. Any record versions left behind by the transactions that were rolled back will be ignored and cleaned up automatically by the normal garbage-collection process.

What happens if you have a catastrophic hard-disk failure? Can you restore the last full backup and replay the transaction log to roll the database forward to the moment of the crash? What are your options if the entire server machine and all of its hard drives are destroyed?

Are there repair tools for recovering a corrupt database? How good are they?

Other Features
There are other important questions you should include on your checklist. They may include the following:
Can you import and export data in XML format?
What other data formatsdo
you need to import and export routinely?do
es the database support them?
What tools are provided for importing and exporting data?
Are gateways available for other databases with which you need to exchange data?
Does the database interface to the development tools you use?
Is there an ODBC driver?
Is there an OLE DB driver?
Is there a dbExpress driver?
Is there a native component suite for Delphi?
Are there case tools that support this database?

Maintenance, Security, and Cost
When considering what's required to maintain the database, ask yourself the following questions:
What steps must you take to keep the database operating at peak efficiency?do
you have to rebuild indices, update table statistics, or reorganize the data on disk to reduce fragmentation?
Can these routine maintenance processes be automated?
Can these operations be performed while the database is in use?
Will the database and log files grow dynamically, or must the DBA monitor and expand them?
How much skill is required of on-site personnel to maintain the database?
Are performance-monitoring tools included to help you tune the database?
How many tuning parameters are there?

A careful look at maintenance and tuning requirements will help you determine whether you need a specially trained database administrator to support your database. For example, Oracle is reported to have more than 400 tuning parameters. Clearly, tuning is not a task for the casual user.

How muchdo
es it cost?
What is the cost of the software?
What are the per-seat licensing costs?
What is the cost of a support contract?
What is the cost of updates?
What is the cost of training for your development and support staff?
Will you need a full-time, specially trained database administrator? At what cost?

What security featuresdo
es the database have?
Is user security maintained at the server level or the database level?
How granular is security for database objects? Can you grant access at the table level? At the column level?
Are roles supported to make administration easier?
Does the database support encryption? How good is the algorithm?
Can you restrict the user's access to metadata?
Can you protect your data, metadata, stored procedures, or triggers from access by anyone (including the database administrator) at a client site?
How dependent is the database on operating-system security?

Security needs vary widely. If you're setting up a database server within your business, security at the operating-system level may be sufficient to prevent unauthorized people from gaining access to the database files. If the database will be installed on notebook computers that can be lost or stolen, you face a different set of problems. If you're selling a commercial product and need to protect your intellectual property from everyone, including your clients, you need yet another set of security features.

Conclusion
The key to choosing the right database for your project or organization is to first create a thorough definition of your requirements. While not exhaustive, this checklist gives you a place to start. As you define your requirements, alter this list to make it match your needs, and you'll have a useful tool for comparing products.

oskiller · 2003-03-07

偶建议你打死别用TOMCAT，反正偶安装它是用来测试兼容性的。用来学习也可以，但是用它
做部署平台是错误的选择。它的性能很低，或者说它本来的定位就是servlet/JSP容器的一
种参考实现，所以它总是更多的考虑对标准的支持而不是性能扩展性等等。举个例子吧，
tomcat使用开放源码的Tyrex数据源作为其默认的DataSource绑定，而Tyrex并没有使用连接
缓冲池，更谈不上数据缓冲存储了。这样在数据库访问方面性能及其低下。另外原来曹晓刚
也提到过tomcat大量地采用了异常来控制程序流程，我想这多少也会影响到性能吧。虽然
tomcat在apache旗下，但是也别太迷信。jakarta的项目大多数都很不错，但是还是有一些
项目不尽人意。
jetty并不是最好的，但是你一定要用JBOSS的话，最好是用它，struts，tiles和cocoon等
等在jetty中可以正常运行，不用担心。不过我原来碰到过cocoon在jetty中频繁扔出异常
的情况，不过是很久以前了，现在应该没有问题。
你问的J2EE的事务相关问题，首先要理解什么是事务性资源，然后看看对事务支持的实现
方式(虚包代理)。你提到的数据库事务，仅仅是J2EE事务的一种。这个资料很多，IBM的
技术网站上就有几篇非常好的文章，去看看吧。

caomo · 2003-03-09

还有一个，现在app server的市场价和性价比如何
如能用上一个app server，倒是增添不少方便。
只要有充分的优势让我能说服老板[

]

小猪 · 2003-03-09

app server?
上面说的tomcat，resin，jetty，weblogic，websphere之类的不都是吗？

caomo · 2003-03-10

我说的就是：jrun,weblogic,websphere
这些商业软件。它们如果价格能接受，应该要比tomcat，jetty,jboss要好用得多吧：）

小猪 · 2003-03-10

至少websphere一点都不好，我用了将近两年了。
我的一个同事和好几个朋友倒是对weblogic赞不绝口。
另外，据讨论来看，似乎大家认为orion是目前性能最好的
app server。

caomo · 2003-03-11

thanks,小猪&oskiller

关于linux上j2ee平台选择的问题 (200分)

caomo

Unregistered / Unconfirmed

小猪

Unregistered / Unconfirmed

oskiller

Unregistered / Unconfirmed

caomo

Unregistered / Unconfirmed

小猪

Unregistered / Unconfirmed

oskiller

Unregistered / Unconfirmed

caomo

Unregistered / Unconfirmed

小猪

Unregistered / Unconfirmed

caomo

Unregistered / Unconfirmed

小猪

Unregistered / Unconfirmed

caomo

Unregistered / Unconfirmed

Similar threads