Optimize Update-Concurrency Tests
Compare alternative approaches to SQL Server 2000 concurrency management.
by Roger Jennings
October 2003 Issue
Technology Toolbox: VB.NET, SQL Server 2000, XML, ADO.NET
Minimizing active database connections is a primary objective of scalable data-access architectures. Maintaining a live connection to the source database while users browse, update, or insert the data behind complex business documents isn't practical for more than a few simultaneous users. ADO's disconnected Recordset objects let you maintain a local copy (snapshot) of the data of interest, disconnect, update the snapshot values, reconnect, and attempt to update the affected tables. ADO.NET's DataSet objects extend the original disconnected-recordset approach to create an in-memory representation of data tables with client-side relationship definitions, primary- and foreign-key constraints, and other data-consistency-checking features. Both object types permit offline editing by persisting the original and edited data versions to a local XML file and attempting an update after reconnecting to the network. The local-persistence scenario also applies to long-running transactions in automated business-to-business (B2B) processes, such as supply-chain management (see Additional Resources).
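Here's a minimal sketch of the local-persistence step, assuming a dsOrders DataSet and illustrative file paths; ADO.NET's DiffGram format preserves both the original and current row versions, so concurrency tests still work after you reload:

Imports System.Data

Module OfflineDemo
    ' Minimal sketch of offline persistence; the DataSet argument
    ' and file paths are illustrative.
    Sub PersistAndReload(ByVal dsOrders As DataSet)
        ' Save the schema plus a DiffGram that holds original AND
        ' current row versions.
        dsOrders.WriteXmlSchema("C:\Offline\Orders.xsd")
        dsOrders.WriteXml("C:\Offline\Orders.xml", XmlWriteMode.DiffGram)

        ' After reconnecting: a DataSet with the same schema can
        ' reload the DiffGram; RowState and original values survive.
        Dim dsReloaded As New DataSet()
        dsReloaded.ReadXmlSchema("C:\Offline\Orders.xsd")
        dsReloaded.ReadXml("C:\Offline\Orders.xml", XmlReadMode.DiffGram)
    End Sub
End Module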
The probability that another client or component has altered the underlying data increases as the time between the initial snapshot and the update grows. Updates based on noncurrent data snapshots often generate concurrency conflicts. Adopting the "last in wins" approach and overwriting prior changes that another person or process made is dangerous at best and might have fatal consequences. For example, two physicians altering a medical record simultaneously could prescribe different drugs or dosages based on conflicting patient-status data. Your data front ends or components must perform concurrency tests before each update to verify that no other user or process has changed the underlying data in the interim. Bob Beauchemin's article in the VSM August 2003 issue covers the basics of single-table DataSet concurrency tests (see Additional Resources). I'll compare the performance of alternative approaches to SQL Server 2000 optimistic concurrency testing of transacted multitable updates by stored procedures. I'll also describe an often-overlooked concurrency-test element for multirow updates.
The two most common concurrency-checking techniques are row-level timestamp values and column current-value comparisons. SQL Server timestamp values are an eight-byte binary row-version number that's guaranteed to be unique within a database; the timestamp value increments whenever any column value in the row changes. Your initial SELECT query returns the timestamp value with the snapshot data. You add an equality test for the original timestamp value as a WHERE clause criterion to the stored procedure for the UPDATE operation. If the timestamp values don't match, your stored procedure sends a return-value parameter code to indicate that the update didn't occur.
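The timestamp test reduces to a simple T-SQL pattern. This sketch is illustrative (the procedure, table, and parameter names aren't the test client's); it passes the original timestamp as a binary(8) parameter:

-- Simplified sketch of a timestamp concurrency test; procedure,
-- table, and parameter names are illustrative.
CREATE PROCEDURE upOrderUpdate
    @OrderID int,
    @ShippedDate datetime,
    @TSOriginal binary(8)   -- timestamp value from the initial SELECT
AS
UPDATE OrdersTS
    SET ShippedDate = @ShippedDate
    WHERE OrderID = @OrderID
        AND TSBinary = @TSOriginal   -- fails if another client changed the row
IF @@ROWCOUNT = 0
    RETURN -10   -- arbitrary code: timestamp mismatch, no update occurred
RETURN 1

Client code reads the command's ReturnValue parameter after invoking ExecuteNonQuery and treats a negative value as a concurrency conflict.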
Strongly typed DataSets default to the current-value approach for concurrency-conflict management. Current-value comparisons test all original column values before performing an update, so your stored-procedure parameters must include original and updated values for each table column. If any column value in the row doesn't match its original value, the UPDATE statement affects zero rows; the SqlDataAdapter.Update method treats the zero RecordsAffected count as a concurrency collision and throws a DBConcurrencyException.
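In client code, a collision on a typed DataSet update surfaces like this (a minimal sketch; the adapter and DataSet names are illustrative):

Imports System.Data
Imports System.Data.SqlClient

Module ConcurrencyDemo
    ' Minimal sketch; the adapter and DataSet names are illustrative.
    Sub UpdateOrders(ByVal daOrders As SqlDataAdapter, _
                     ByVal dsOrders As DataSet)
        Try
            ' The UpdateCommand's WHERE clause compares every
            ' original column value.
            daOrders.Update(dsOrders, "Orders")
        Catch exc As DBConcurrencyException
            ' Zero RecordsAffected: another client changed the row
            ' after the snapshot. Refresh and let the user resolve
            ' the conflict.
            dsOrders.Tables("Orders").Clear()
            daOrders.Fill(dsOrders, "Orders")
        End Try
    End Sub
End Module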
Multitable operations usually involve enlisting updates in a SqlClient.SqlTransaction object; if any update fails, it's common practice to roll back the entire transaction. Interactive clients usually refresh the current data and provide a Web or Windows Form that lets the user intervene to resolve the concurrency conflict.
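The enlistment-and-rollback pattern looks like this in VB.NET (a sketch; I assume the connection is open and each SqlCommand calls a stored procedure with a return-value parameter):

Imports System.Data.SqlClient

Module TransactionDemo
    ' Sketch of transacted multitable updates; the commands'
    ' CommandText and return-code conventions are illustrative.
    Sub UpdateOrderAndDetails(ByVal cnn As SqlConnection, _
                              ByVal cmdOrder As SqlCommand, _
                              ByVal cmdDetail As SqlCommand)
        Dim trn As SqlTransaction = cnn.BeginTransaction()
        Try
            ' Enlist every command in the same transaction.
            cmdOrder.Transaction = trn
            cmdDetail.Transaction = trn
            cmdOrder.ExecuteNonQuery()
            cmdDetail.ExecuteNonQuery()
            ' Check each procedure's return-value parameter here;
            ' a negative value signals a concurrency conflict.
            trn.Commit()
        Catch exc As Exception
            ' Any failure rolls back the entire multitable update.
            trn.Rollback()
            Throw
        End Try
    End Sub
End Module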
Choose Your Concurrency Weapon
Timestamp comparisons are the traditional approach to SQL Server concurrency tests. The SQL Server timestamp datatype has no relationship to a datetime value, although SQL Server 2000 Books Online states that this might change in a future version. VS.NET's online help for the SqlDbType Enumeration states that SqlDbType.Timestamp corresponds to .NET's DateTime datatype, but Microsoft Knowledge Base article Q309489, "Use a TimeStamp Column to Detect Update Collisions in ADO.NET with Visual Basic .NET," assigns SqlDbType.Binary to the SqlParameter datatype argument value. If you intend to serialize the data to XML, you need a readable timestamp value that conforms to an XML Schema (XSD) datatype. Add a computed bigint (xsd:long) column with a CREATE TABLE or ALTER TABLE statement that includes these column definitions:
...
TSBinary timestamp NOT NULL,          -- row version; changes on every update
TSLong AS CONVERT(bigint, TSBinary),  -- readable copy that maps to xsd:long
...
Use the computed column values in the UPDATE procedure's WHERE clause criterion. The timestamp and computed columns add 16 bytes to the row, but the concurrency test benefits usually outweigh the table's size increase.
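If you serialize snapshots to XML, test the computed column instead; this fragment assumes the column names from the preceding definition (the table and parameter names are illustrative):

-- Fragment only: the computed bigint column round-trips through
-- XML as xsd:long (@TSLong is illustrative).
UPDATE OrdersTS
    SET ShippedDate = @ShippedDate
    WHERE OrderID = @OrderID
        AND TSLong = @TSLong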
Most business documents in the real world—sales orders, invoices, and the like—have multiple related (child) records, such as line items. Updating line items complicates concurrency tests, because multiple users can add, delete, or alter individual line items and also change parent-record column values, such as the shipped date or shipping address. (Users shouldn't be able to alter an order that's been shipped, but that's a business-logic decision, not a concurrency issue.)
The disconnected client has no knowledge of other clients' changes, so one approach is to attempt to delete the snapshot's existing line-item records and insert a new set of line items (see Figure 1). This method accommodates updates where the sequence of child records is important. You can create a procedure in Transact-SQL (T-SQL) to delete the known line items if they're unmodified, and test for line items other clients have added (see Listing 1).
The code in Listing 1 sets the @LastDetail parameter to 1 when the client attempts to delete the last line item in its snapshot. If another client has added a line item, the SELECT COUNT statement returns 1 or more, and the procedure returns an arbitrary error code (-25). Line items that others have modified or deleted return -20. A return value less than 0 terminates the update process and invokes the SqlTransaction.Rollback method. A positive return value enables the code to invoke a simple INSERT stored procedure for each line item. The disadvantage of this method is the need to refresh unmodified line items; you must execute a total of 12 procedures for an order with six line items. An alternative approach is to force an increment of the parent record's timestamp with a dummy update whenever any child-record changes occur, but doing so incurs a substantial performance penalty when the changes are to child records only.
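Here's a simplified sketch of that logic; Listing 1 contains the actual procedure, so the name and exact control flow here are illustrative (the return codes follow the text):

-- Illustrative sketch only; see Listing 1 for the real procedure.
CREATE PROCEDURE upOrderDetailDelete
    @OrderID int,
    @ProductID int,
    @TSOriginal binary(8),
    @LastDetail bit   -- client passes 1 when deleting its last line item
AS
DELETE FROM OrderDetailsTS
    WHERE OrderID = @OrderID
        AND ProductID = @ProductID
        AND TSBinary = @TSOriginal
IF @@ROWCOUNT = 0
    RETURN -20   -- another client modified or deleted this line item
IF @LastDetail = 1
    BEGIN
        -- Any remaining rows mean another client added line items.
        IF (SELECT COUNT(*) FROM OrderDetailsTS
                WHERE OrderID = @OrderID) > 0
            RETURN -25
    END
RETURN 1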
Compare Timestamps and Typed DataSets
Adding SqlParameter members to multiple SqlClient.SqlCommand objects and setting parameter values for the parent and each child-record deletion and update operation entails writing a substantial amount of VB.NET or C# code. I designed a WinForms test client to orchestrate update performance tests with modified versions of NorthwindCS's Orders and Order Details tables (see Figure 2). The OrdersTS and OrderDetailsTS tables have timestamp and computed bigint columns. The client's clsOrdersTS.UpdateOrInsertOrderSP procedure has 258 instructions (active lines of code); adding the SqlParameter objects and setting their values accounts for 96 lines (download the test client and T-SQL scripts).
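The parameter housekeeping that dominates those 96 lines follows a repetitive pattern like this representative fragment (the procedure wrapper and column choices are illustrative, not the test client's actual code):

Imports System.Data
Imports System.Data.SqlClient

Module ParameterDemo
    ' Representative fragment of manual parameter setup; not the
    ' test client's actual code.
    Sub AddOrderParameters(ByVal cmd As SqlCommand, _
                           ByVal drOrder As DataRow)
        cmd.CommandType = CommandType.StoredProcedure
        cmd.Parameters.Add("@OrderID", SqlDbType.Int).Value = _
            drOrder("OrderID")
        cmd.Parameters.Add("@ShippedDate", SqlDbType.DateTime).Value = _
            drOrder("ShippedDate")
        ' Pass the original timestamp as binary, per KB Q309489.
        cmd.Parameters.Add("@TSOriginal", SqlDbType.Binary, 8).Value = _
            drOrder("TSBinary")
        ' ... and so on for every remaining parent and child column.
    End Sub
End Module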
Timestamp tests are well suited to Data Access Logic Components (DALCs) and Business Entity (BE) objects. The test client's UpdateOrInsertOrderSP procedure implements Microsoft's recommendations (see Additional Resources). Your Business Process Component (BPC) can persist the BE object with the failed changes, retrieve another BE object with the new timestamp values, substitute these values in the persisted object's timestamp members, then retry the update.
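In outline, the retry looks something like this hypothetical sketch; the OrderBE class and helper functions are illustrative stand-ins, not the test client's API:

' Hypothetical sketch of the BPC retry pattern; OrderBE and the
' helper functions are illustrative stand-ins.
Module RetryDemo
    Class OrderBE
        Public OrderID As Integer
        Public TSBinary As Byte()
        ' Business data and the line-item collection are elided.
    End Class

    Function UpdateOrder(ByVal ent As OrderBE) As Integer
        ' Stub: execute the timestamp-checked stored procedures and
        ' relay the return-value code.
        Return 1
    End Function

    Function GetOrder(ByVal intID As Integer) As OrderBE
        ' Stub: retrieve a fresh BE object with current timestamps.
        Return New OrderBE
    End Function

    Sub RetryUpdate(ByVal entOrder As OrderBE)
        If UpdateOrder(entOrder) < 0 Then   ' negative = conflict
            ' Substitute only the fresh timestamp values; the
            ' persisted object keeps its pending changes.
            Dim entCurrent As OrderBE = GetOrder(entOrder.OrderID)
            entOrder.TSBinary = entCurrent.TSBinary
            If UpdateOrder(entOrder) < 0 Then
                ' Still failing: surface the conflict for manual
                ' resolution.
            End If
        End If
    End Sub
End Module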
You can take advantage of VS.NET's Connection and DataAdapter designers to add a persistent typed DataSet to a WinForms project. The designer DataSet saves a substantial amount of manual coding and generates the required SqlSelectCommand, SqlInsertCommand, SqlUpdateCommand, and SqlDeleteCommand stored procedures automatically. The test client's GetOrUpdateDataSet procedure duplicates UpdateOrInsertOrderSP's update operations with 97 active VB.NET code lines. Only 12 lines handle automated updates to the Orders and Order Details records, because the DataAdapters for the OrdersTS and OrderDetailsTS tables take care of SqlParameter housekeeping chores for you.
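Those dozen lines reduce to a pattern like this sketch (the adapter variables and update ordering are illustrative, not the designer-generated code itself):

Imports System.Data
Imports System.Data.SqlClient

Module DataSetUpdateDemo
    ' Sketch of adapter-driven updates; table names follow the test
    ' client's description, but the code is illustrative.
    Sub UpdateOrderDataSet(ByVal daOrders As SqlDataAdapter, _
                           ByVal daDetails As SqlDataAdapter, _
                           ByVal dsOrders As DataSet)
        ' Order matters: delete child rows first, then update the
        ' parent, then apply added and modified child rows, to
        ' satisfy the foreign-key constraint.
        daDetails.Update(dsOrders.Tables("OrderDetailsTS").Select( _
            Nothing, Nothing, DataViewRowState.Deleted))
        daOrders.Update(dsOrders, "OrdersTS")
        daDetails.Update(dsOrders.Tables("OrderDetailsTS").Select( _
            Nothing, Nothing, DataViewRowState.Added Or _
            DataViewRowState.ModifiedCurrent))
    End Sub
End Module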
However, you don't get a free ride with typed DataSets, because column-value concurrency tests double the number of parameter data bytes sent to the update stored procedures, and the procedures return a resultset containing the current column values. Also, the T-SQL update procedure's WHERE clause code is much more complex than it is for timestamp tests (see Listing 2). DataSets are better suited to "rich clients" in two-tier, client/server architectures where simplified binding to textboxes and DataGrid controls is paramount for speeding front-end application development. I recommend the timestamp approach in multitier environments that adhere to Microsoft's Patterns and Practices recommendations, because you can design your custom DALCs, BE classes, and BPCs to optimize code reuse and interoperability. DataSets have a predefined XML schema that might not be compatible with your BE class or type design.
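To see why the WHERE clause balloons, consider a representative current-value test in the style of Listing 2; every nullable column needs an IS NULL branch (column and parameter names are illustrative):

-- Representative current-value test (Listing 2 style); the clause
-- grows with one compound test per column.
UPDATE OrdersTS
    SET ShippedDate = @ShippedDate, ShipVia = @ShipVia
    WHERE OrderID = @Original_OrderID
        AND (ShippedDate = @Original_ShippedDate
            OR (ShippedDate IS NULL AND @Original_ShippedDate IS NULL))
        AND (ShipVia = @Original_ShipVia
            OR (ShipVia IS NULL AND @Original_ShipVia IS NULL))
        -- ... and so on for every remaining column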
When you decide on a concurrency strategy, take into account that DataSet updates don't detect child records that other users or components add, unless you add custom code to compare the number of child records in the DataSet and database. It might be dangerous to modify existing child records if a new record has been added. For example, if a physician adds a new prescription for a patient, substituting or updating another drug's dosage might have life-threatening consequences.
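A guard like this minimal sketch closes the gap before you apply child-record changes (names are illustrative, and I assume the connection is open):

Imports System.Data
Imports System.Data.SqlClient

Module ChildCountGuard
    ' Minimal sketch of an added-child-record check; table and
    ' variable names are illustrative.
    Function ChildRecordsAdded(ByVal cnn As SqlConnection, _
                               ByVal dsOrders As DataSet, _
                               ByVal intOrderID As Integer) As Boolean
        Dim cmd As New SqlCommand( _
            "SELECT COUNT(*) FROM OrderDetailsTS " & _
            "WHERE OrderID = @OrderID", cnn)
        cmd.Parameters.Add("@OrderID", SqlDbType.Int).Value = intOrderID
        Dim intDbCount As Integer = CInt(cmd.ExecuteScalar())
        ' A database count higher than the snapshot's line-item
        ' count means another client added records.
        Return intDbCount > dsOrders.Tables("OrderDetailsTS").Rows.Count
    End Function
End Module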
Consider Performance Consequences
The final factor in choosing a concurrency-management strategy is update-execution speed. I used a modified version of the WinForms front end from the "Compare SQL Server Data Access Methods" article to orchestrate performance comparisons between the DataSet and timestamp methods. The front end runs under Windows XP Professional SP1 and .NET Framework 1.0 on a 667-MHz Pentium III white box with 384 MB of RAM and a 20-GB Ultra IDE drive. (Tests with Windows Server 2003 and .NET Framework 1.1 on the same machine exhibit no measurable performance difference.) The client connects through a 100-Mbps switch to SQL Server 2000 Standard Edition SP3a running under Windows 2000 SP3 on a Compaq ProLiant DL360 development server with two 833-MHz Pentium III processors, 1 GB of RAM, and two 18-GB SCSI drives in a hardware RAID 1 configuration.
I performed a total of about 1.5 million random retrieval and update operations on a database with 100,000 orders and 268,000 line-item records (see Figure 3 and the sidebar, "Compare Timestamp and DataSet Performance"). Retrieving data for an order having an average of 2.7 line-item records with a single SqlDataReader object is about twice as fast as filling a DataSet. I attribute the performance difference to the need to clear the client's persistent DataSet and then execute separate procedures to fill the order and line items with two SqlDataAdapters. (Client code adjusts update test times to eliminate the effect of retrieval time.)
Timestamp updates to an order and an average of 2.7 line-item records execute about 40 percent faster than DataSet updates, despite the additional timestamp stored-procedure executions that test for added line items. Updating only the line-item records gives similar results and minimizes the effect of disk-write queuing on the results. Passing additional parameter values and executing more-complex stored procedures for DataSets appears to be the major factor in this comparison.
Updating a single line item and eliminating the added line-item test for the timestamp method levels the playing field between the two approaches. In this case, timestamps were about 40 percent faster for an order and one line-item update, but only 10 percent faster for a single line-item update.
Your results will undoubtedly differ from mine. The number of parent and child columns, sizes of varchar or nvarchar parameters, and server-hardware horsepower have a profound effect on the number of update operations per second with either concurrency-management approach. However, you can be confident that timestamp-based concurrency management will outperform DataSet-value-based comparisons in most .NET multitier data-access architectures.
About the Author
Roger Jennings is a principal of OakLeaf Systems, columnist for FTP's family of Web sites, and contributing editor for Visual Studio Magazine. His primary computer-related interests are XML Web services, and client/server and XML/XSLT database applications. Roger's latest books include Visual Basic .NET XML Web Services Developer's Guide (Osborne/McGraw-Hill), Special Edition Using Access 2002 (Que), Special Edition Using Windows 2000 Server (Que), Admin911: Windows 2000 Group Policy (Osborne/McGraw-Hill), and Database Developer's Guide with Visual Basic 6 (Sams). OakLeaf's U.S. Code of Federal Regulations XML Web services recently won a charter Microsoft .NET Best Award.