给你看看这个,是李颖同学介绍的。[
]
该文的价码是1000USD,发表于Delphi。Bill Tood是Database Group的总裁,常常在BorCon和各类巡回培训上发表演讲。他的培训课程价格相当高,一个下午是500USD。这篇文章虽然对于高手而言不是什么不得了的东西,但对于入门者,具有相当的指导意义。
Choosing a Database
Checklists for Selecting the Best Database for a Task
A common question among Delphi developers is: "Which database should I use?" It would be an easier question to answer if there weren't so many choices. This article examines how to determine which database is right for you and your application. At the end of this article I hope you'll have a checklist that you can use to make a good choice based on your needs.
I was going to write this article without mentioning any database by name, but found it too difficult. When Ido
mention a particular product, however, it's as an example only. If you look at the significant players in all segments of the database market, there are no bad products. Instead, there is a variety of products with different features. A particular feature may be an advantage for one type of application and a disadvantage for another.
The Basics
Different databases have different strengths and weaknesses, so it stands to reason that your decision will involve matching the database's capabilities to your application's requirements. Necessarily, that means you cannot choose a database until you have defined your requirements, so let's start with a few basic questions:
Is the application for a single user or multiple users?
How many simultaneous users will you have?
How many simultaneous users will you have in 10 years?
How big will your database be?
How big will your database be in 10 years?
How many hours per day will the database be in use?
In the event of a hardware or software failure, will it be necessary to recover up to the instant of the failure?
Is database replication required?
Will data access consist solely of inserts and updates, or will there be queries?
How complex will the queries be, and how much data will they scan?
The answers to these questions should prepare you for this next set of questions.
Does the database support multiple simultaneous users? If the application is multi-user, you should eliminate all desktop databases from your list of candidates and focus on database servers. Client-server applications reduce network traffic and provide better performance with many simultaneous users. Most importantly, the chances of a system crash causing a corrupt database are much lower with a database server, particularly if the PC that hosts the database software is dedicated to that task, has an uninterruptible power supply, and is physically secure. SQL database servers are available in every price range, including free, so you have no reason not to use one.
What is the database architecture? The vast majority of database servers today use the relational model, but it's not the only choice. Perhaps an object database, a hierarchical database, a hash-based database, an associative database, or some other architecture would be better for your application. Relational databases are the most common, though. So, for this article, I will assume you're choosing a relational database.
On what operating systems and hardware platformsdo
es the database run? You need to consider the number of users and database size now, and how they will change in the future. If the database size and number of users both are small, a database that runs on a single Windows PC with a single processor may be all you need. If you need to support hundreds or thousands of users, you'll need a product that runs on more powerful hardware. If you expect the load on your database system to increase over time, you need to look at database servers that support multiple operating systems and hardware platforms. That way, you can start with a single PC for your database server and scale up to a multi-processor PC, a cluster of servers, a large Sun system, or even an IBM mainframe, without having to change your software.
If your database will be hundreds of gigabytes - or even terabytes - in size and will support many users, you'll want the ability to spread the database across multiple drives, and the ability to control which database objects reside on which drives, in order to maximize performance. This level of ease in configuring the database provides the best performance and hardware utilization. However, it also means you'll be working with a very complex database that requires a highly skilled database administrator to install, configure, and maintain. If your database must be online 24 hours a day, seven days a week, you may need backup servers and automatic fail-over, so usersdo
n't have any interruption of service if a server fails.
Locking vs. Versioning
Which concurrency control architecturedo
es the database use? There are two models for controlling concurrent access to data in a database: the locking model and the versioning model. The principal advantage of the versioning model is that readers never block writers. Put another way, if one user is executing a long-running query that selects and analyzes data, and this query requires a consistent view of the data as it was at one moment in time, other users will be able to update rows used by this query while the query is running. The same isn't true for locking databases.
What is the smallest amount of data that can be locked? Lock granularity is a measurement of that amount. Most databases today support row-level locking. However, you may encounter databases that offer only page-level locks as their most granular. This means that instead of locking one row, an entire page of rows must be locked, which reduces concurrent access to data. Page locks are a particular problem in small tables that are accessed by many users. The chances are high that two users may need to update rows on the same page at the same time.
Does the database use automatic lock escalation? If so, under what conditions? It's great if the database you're considering supports row-level locking, but what happens when a user reads or updates a large number of rows? Because the overhead of maintaining a large number of locks can hurt performance, some databases will escalate to less granular locks to reduce the number of locks the server must maintain. This means you suddenly could find that the database was locking pages - or even entire tables - not individual rows. Obviously, this can severely limit concurrent access to data.
What lock typesdo
es the database use, and howdo
they coexist? When considering a locking-model database, it is critical to understand how locks conflict with one another. What happens when one user places one type of lock on a row, page, or table, and a second user tries to place the same or a different type of lock on the same object? The best way to understand lock conflicts is with a lock-conflict table that shows all the lock typesdo
wn the left side and across the top. Each cell in this grid would represent an attempt to place two locks simultaneously, and the cells would contain explanations of what happens when the locks are attempted. This may sound like a trivial task, but it isn't. For example, the last time I looked at IBM's DB2, it had 12 different lock types. That means you have 144 different combinations of locks to consider! Even worse, if you want to understand what conflicts you may encounter, you need to understand which lock types your application will use at what times.
Transactions
Does the database support transactions? Be careful here. Some databases claim to support transactions when they reallydo
n't. A true transaction must exhibit four characteristics, sometimes called the ACID test (see Figure 1).
Character
Description
Atomicity
All changes that are part of a transaction succeed or fail as a single unit, no matter how many rows and tables are involved.
Consistency
The database will always be left in a consistent state. If the database server crashes, all active transactions will roll back automatically when the server restarts, so the state of the database will be as it was before any of the active transactions began.
Isolation
Changes made by uncommitted transactions cannot be seen by other users.
Durability
Once a transaction is committed, all changes that are part of that transaction become a permanent part of the database and cannot be lost.
Figure 1: Characteristics of a transaction;
the ACID test.
What transaction-isolation levelsdo
es the database support? Again, you need to be careful in evaluating transaction-isolation levels. Some database vendors use transaction-isolation levels thatdo
n't exactly match those defined by the American National Standards Institute (ANSI). For example, InterBase's snapshot transaction-isolation level provides the same consistent view of the data as ANSI serializable isolation, butdo
esn't guarantee that a sequential order of execution of concurrent transactions will produce the same result. To add to the confusion, ANSI-standard repeatable-read isolation may not give you the behavior you expect. Figure 2 shows the ANSI-standard transaction-isolation levels.
Isolation Level
Description
Read uncommitted
Also known as dirty-read isolation. Other transactions can see changes made by a transaction that has not committed. Very few databases support read-uncommitted isolation because it violates the isolation rule for transactions and lets other transactions see and use values that can vanish from the database if the transaction rolls back.
Read committed
Other transactions can see changes made by a transaction as soon as it commits.
Repeatable read
If a transaction using repeatable-read isolation selects a set of rows and then
later selects the same set of rows, it will see exactly the same values in each field of each row that it read the first time. Note, however, that if new rows have been inserted into the database since the first SELECT, and those new rows satisfy the WHERE clause, they will be returned by the second SELECT. The read is repeatable only for those rows that were read the first time.
Serializable
Serializable isolation guarantees that if you execute a SELECT more than once during the course of the transaction, you'll get exactly the same result set every time. It also guarantees that if multiple serializable transactions execute concurrently, some sequential order of execution will produce the same results.
Figure 2: ANSI-standard transaction-isolation levels.
What happens when the transaction log gets full?
If the database uses a transaction log, what happens if it gets full?
Will the transaction log expand dynamically? If not, whatdo
you have todo
to increase the size of the transaction log?
Can the log size be increased while the database is in use?
Can you place the transaction log on a separate drive from the database files so that both the database and the log will not be lost if a single hard drive fails?
Can transactions span more than one database? Some databases allow a single transaction to make changes in more than one database;
othersdo
n't.
Does the database have a method of generating sequential numbers? It's common to store data in relational databases thatdo
n't contain a natural primary key. The most common solution is to use a sequential number as a surrogate primary key. This is such a common requirement that I would not consider a database thatdo
esn't have a way to generate sequential numbers safely in a multi-user environment. Some databases have a special field type todo
this. Other databases provide another mechanism, such as the generators used by InterBase. The database also should provide a way to change the next value that will be generated, so the value can be reset after testing or for other reasons. You also may want the option to disable automatic generation of numbers when importing data that already has a primary key assigned to each row.
SQL Implementation
What level of compliance to the ANSI SQL standarddo
es the database provide? If you require particular features of the ANSI SQL standard, make sure the database supports them. If you need any non-standard features, such as full-text search or cross tabulation, make sure the products you consider support them.
What functions are included in the SQL implementation? A complete library of date, time, math, and string functions enhances the power of SQL and the database's stored procedure and trigger language. This dramatically can reduce the amount of data manipulation that must be programmed into client applications. If you need todo
statistical analysis, look for those functions, as well.
Can you write your own functions? It's also very valuable to be able to write your own functions and call them in SQL statements, stored procedures, and triggers. This makes it easy for you to add specialized functions that aren't available in the database's standard function library.
Can SELECT statements span databases? Not all databases support the ability to join tables from two or more databases in a SELECT statement. If you need this capability, make sure the databases you're considering have it.
Howdo
es the query optimizer work? The power and sophistication of the query optimizer can have a huge impact on the performance of your application and can save you a lot of time you might spend otherwise on trying to improve the performance of your queries. Ask yourself these questions:
Does the optimizer look for the lowest-cost solution?
Does the optimizer estimate table statistics, ordo
es the system track them?
How and when are the table statistics updated?
Does the optimizer consider hardware cost?
Is the optimizer aware of the data, such as the minimum, maximum, average, and distribution of values?
Are there types of queries that are not optimized, such as multiple outer joins?
What features are provided to improve performance? See if the database you're looking at supports the following feature: clustered tables, clustered index, shared procedure cache, shared data cache, and write caching.
Clustered tables place selected tables near each other on disk for faster access. A clustered index stores the rows in sorted order, so the data rows are the leaf nodes of the index. This makes the index smaller and the searches faster. Shared-procedure caches and shared-data caches let multiple users share the same cache for improved memory use and performance. Write caching improves performance by reducing the number of write operations at the expense of safety. If you use write caching and the server crashes, everything in the cache will be lost, and it is likely that the database will be corrupt.
Does the database support stored procedures? Stored procedures are written in the database's procedure and trigger programming language. These procedures can be called by client applications, triggers, or other stored procedures. Check for the following features:
Do stored procedures support all data types?
Can you create statements on the fly and execute them?
Can you treat a stored procedure that returns a result set as a table in SQL statements?
Does the database support triggers? Triggers, like stored procedures, are functions written in the database's procedure and trigger language. But triggers are executed automatically when data is inserted, deleted, or updated. Check for features such as:
Can you attach multiple triggers to a single event?
Can you specify the order in which multiple triggers fire?
Can you choose whether the trigger fires before or after the event?
Can triggers fire events to notify the client of some condition?
Triggers can execute when a row in a table is inserted, deleted, or updated. The ability to write more than one trigger fired by the same event means you can write and test many small, simple, single-function triggers instead of one large one. It makes development, testing, and maintenance easier. If you can attach multiple triggers to a single event, it's critical that you be able to specify the order in which they execute, in case the triggers are dependent on each other.
Some databasesdo
n't allow you to have triggers that fire both before and after the event. This is very limiting. For example, one of the most common uses of triggers is to validate data before updates or inserts are allowed to proceed and block the operation if the data is not valid.
Events allow the database to notify the client that something has happened, without the overhead of having the client application query the database frequently. For example, you might want a sales-administration system notified if a customer cancels an order. Another example would be to notify the purchasing application if the inventory level of a product falls below a set amount.
Can you change metadata while the database is online?do
you have to shutdo
wn the database before you can add new tables, add new columns to existing tables, or make other changes to the database? If you can make metadata changes while the database is online, when will connected users see the changes?
Does the database support declarative referential integrity? You can waste a lot of time writing triggers to enforce referential integrity or, worse, enforcing it in your client applications. The database should allow you to declare referential-integrity relationships between tables and enforce them automatically. It should also let you specify whether you want deletes cascaded to the child table and whether you want primary key updates to be cascaded to the child tables automatically.
Hardware Failure
Does the database support replication? If the database supports replication,do
es it support synchronous replication, asynchronous replication, or both? Synchronous replication updates the target database instantly when the source database changes. If synchronous replication is used, what happens if the connection to the target database is lost? Will you be unable to make any changes to the source database? Will changes be logged and replicated when the target database comes back online?
Asynchronous replication logs changes in the source database and replicates them to the target database later. The disadvantage of asynchronous replication is that the target database is always behind the source database by some period of time. The advantage is that youdo
n't need a permanent connection between the source and target databases.
Note that for databases thatdo
n't support a separate transaction log, replication may be the only way to provide up-to-the-minute recovery if a catastrophic hardware failure occurs.
What options are available for backing up the database?
Can the database be backed up while it's in use?
Are differential or incremental backups supported, or must you back up the entire database? A differential backup backs up all changes since the last full backup. An incremental backup backs up all changes since the last backup of any kind.
Does the database maintain a separate transaction log?
Can the transaction log be backed up while the database is online?
Does the transaction log allow differential or incremental backups?
Howdo
you recover from a server crash or hardware failure? If the database server crashes, all transactions that were active at the time of the crash must be rolled back automatically when the database server restarts, to leave the database in a consistent state. How is this accomplished? If it requires processing the transaction log, how long will this take with the maximum number of transactions you expect to have open at one time?
This is one area in which versioning databases have an advantage when compared with locking-model databases. Nothing needs to be changed in a versioning database except the status code for the transactions that were active, and this takes only a second or two. Any record versions left behind by the transactions that were rolled back will be ignored and cleaned up automatically by the normal garbage-collection process.
What happens if you have a catastrophic hard-disk failure? Can you restore the last full backup and replay the transaction log to roll the database forward to the moment of the crash? What are your options if the entire server machine and all of its hard drives are destroyed?
Are there repair tools for recovering a corrupt database? How good are they?
Other Features
There are other important questions you should include on your checklist. They may include the following:
Can you import and export data in XML format?
What other data formatsdo
you need to import and export routinely?do
es the database support them?
What tools are provided for importing and exporting data?
Are gateways available for other databases with which you need to exchange data?
Does the database interface to the development tools you use?
Is there an ODBC driver?
Is there an OLE DB driver?
Is there a dbExpress driver?
Is there a native component suite for Delphi?
Are there case tools that support this database?
Maintenance, Security, and Cost
When considering what's required to maintain the database, ask yourself the following questions:
What steps must you take to keep the database operating at peak efficiency?do
you have to rebuild indices, update table statistics, or reorganize the data on disk to reduce fragmentation?
Can these routine maintenance processes be automated?
Can these operations be performed while the database is in use?
Will the database and log files grow dynamically, or must the DBA monitor and expand them?
How much skill is required of on-site personnel to maintain the database?
Are performance-monitoring tools included to help you tune the database?
How many tuning parameters are there?
A careful look at maintenance and tuning requirements will help you determine whether you need a specially trained database administrator to support your database. For example, Oracle is reported to have more than 400 tuning parameters. Clearly, tuning is not a task for the casual user.
How muchdo
es it cost?
What is the cost of the software?
What are the per-seat licensing costs?
What is the cost of a support contract?
What is the cost of updates?
What is the cost of training for your development and support staff?
Will you need a full-time, specially trained database administrator? At what cost?
What security featuresdo
es the database have?
Is user security maintained at the server level or the database level?
How granular is security for database objects? Can you grant access at the table level? At the column level?
Are roles supported to make administration easier?
Does the database support encryption? How good is the algorithm?
Can you restrict the user's access to metadata?
Can you protect your data, metadata, stored procedures, or triggers from access by anyone (including the database administrator) at a client site?
How dependent is the database on operating-system security?
Security needs vary widely. If you're setting up a database server within your business, security at the operating-system level may be sufficient to prevent unauthorized people from gaining access to the database files. If the database will be installed on notebook computers that can be lost or stolen, you face a different set of problems. If you're selling a commercial product and need to protect your intellectual property from everyone, including your clients, you need yet another set of security features.
Conclusion
The key to choosing the right database for your project or organization is to first create a thorough definition of your requirements. While not exhaustive, this checklist gives you a place to start. As you define your requirements, alter this list to make it match your needs, and you'll have a useful tool for comparing products.