Factor in Performance and Portability
Discover the impact of performance and scalability as well as portability in choosing an object persistence technology in this third of four installments
by Dr. Wilson Cheng and Dr. Pinaki Poddar

In previous installments of this discussion, "Choose the Right Object Persistence" (Java Pro online, July 16, 2003) and "More Object Persistence Choice Factors" (Java Pro online, July 23, 2003), we covered the mapping support and domain models of JDBC, CMP, and JDO as well as object query mechanisms and transaction state management as factors to consider in choosing an object persistence technology. Here we discuss performance, scalability, and portability as additional factors in your decision.

To choose the right object persistence technology, you must understand what performance optimizations each candidate offers. Performance issues are not an explicit part of the JDBC, CMP, or JDO specifications. However, applications still must be able to scale against large data sets and concurrent accesses. Technology providers boost performance with techniques such as client caching and fetch-on-demand. Because providers offer such optimizations as extensions, these optimizations are among the most distinguishing features of competing solutions.

A client cache is a cache in the application's own memory that retains instances fetched from the data store, even across transaction boundaries. Application requests for objects are first looked up in the cache. An object is brought from the data store only when it is not in the cache. In-memory caching avoids redundant fetching of the same instance by different parts of the application, which is a significant performance gain.
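The lookup order described above can be sketched in a few lines. This is only an illustration, not how any particular provider implements its cache: the class name ClientCache, the loader function that stands in for a data store fetch, and the fetch counter are all hypothetical, and the sketch uses current Java syntax.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical identity-map cache: at most one in-memory instance per persistent key.
class ClientCache<K, V> {
    private final Map<K, V> instances = new HashMap<>();
    private final Function<K, V> loader; // assumption: stands in for a data store fetch
    private int storeFetches = 0;

    ClientCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Look in the cache first; go to the data store only on a miss.
    V get(K key) {
        V cached = instances.get(key);
        if (cached == null) {
            cached = loader.apply(key);
            storeFetches++;
            instances.put(key, cached);
        }
        return cached;
    }

    // Drop a cached instance so that the next request reloads it.
    void evict(K key) {
        instances.remove(key);
    }

    // Number of times the loader was actually invoked.
    int storeFetches() {
        return storeFetches;
    }
}
```

Two requests for the same key return the same instance and trigger a single fetch. A real cache would also have to deal with transaction boundaries, stale data, and memory limits, which this sketch ignores.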

Fetch-on-demand, or lazy fetching, is a technique whereby fields of an instance are brought from storage only when they are accessed. Consider a typical example where an Employee record has a social security number (SSN), name, and photograph. When an Employee instance is materialized from the data store, only its SSN and name fields are fetched. The photograph is fetched only if the user explicitly requests it later, saving the transfer of large volumes of image data.
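The Employee example might look roughly like the following. The class name LazyEmployee and the photoLoader supplier, which stands in for a second trip to the data store, are assumptions made for illustration; a persistence service would normally generate equivalent logic rather than have you write it by hand.

```java
import java.util.function.Supplier;

// Hypothetical Employee whose photograph is loaded only on first access.
class LazyEmployee {
    private final String ssn;
    private final String name;
    private final Supplier<byte[]> photoLoader; // assumption: stands in for a data store read
    private byte[] photo;
    private boolean photoLoaded = false;

    LazyEmployee(String ssn, String name, Supplier<byte[]> photoLoader) {
        this.ssn = ssn;
        this.name = name;
        this.photoLoader = photoLoader;
    }

    String getSsn() {
        return ssn;
    }

    String getName() {
        return name;
    }

    // The large field is fetched on the first call and reused afterward.
    byte[] getPhoto() {
        if (!photoLoaded) {
            photo = photoLoader.get();
            photoLoaded = true;
        }
        return photo;
    }

    boolean isPhotoLoaded() {
        return photoLoaded;
    }
}
```

Reading the SSN or name never invokes the loader, so an application that only lists employees never pays for the image data.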

Optimization requires the persistence service to manage domain objects. For an object persistence service to optimize performance, it must have knowledge of the domain objects. This is the notion of managed objects. In both CMP and JDO, the service provider manages domain objects, though in different ways. CMP requires the domain objects to be declared as entity beans, so that the container can optimize access to them. JDO requires the domain objects to be enhanced to implement the PersistenceCapable interface.

With managed objects, both CMP and JDO can implement performance optimizations by controlling an object's interaction with the data store. In JDBC, however, the driver has no notion of managed objects. Thus, performance optimization is fairly limited in JDBC.

JDBC has no built-in performance optimization. A developer must explicitly design JDBC performance optimizations. Fetch-on-demand, for example, can be built into a JDBC application with SQL query projections. However, the development cost of such nontrivial performance optimizations can be great.
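As a rough sketch of fetch-on-demand through projection, the queries below leave the large column out of the first SELECT and read it with a second query only when needed. The EMPLOYEE table, its SSN, NAME, and PHOTO columns, and the class and method names are all hypothetical; the caller is assumed to supply an open java.sql.Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Assumed schema: EMPLOYEE(SSN, NAME, PHOTO). All names here are illustrative.
class EmployeeQueries {
    // Projection: the large PHOTO column is deliberately left out.
    static final String SELECT_SUMMARY = "SELECT SSN, NAME FROM EMPLOYEE WHERE SSN = ?";
    // Issued only when the photograph is actually requested.
    static final String SELECT_PHOTO = "SELECT PHOTO FROM EMPLOYEE WHERE SSN = ?";

    // Returns the employee name, or null if no row matches.
    static String findName(Connection con, String ssn) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SELECT_SUMMARY)) {
            ps.setString(1, ssn);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("NAME") : null;
            }
        }
    }

    // Returns the photograph bytes, or null if no row matches.
    static byte[] findPhoto(Connection con, String ssn) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(SELECT_PHOTO)) {
            ps.setString(1, ssn);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBytes("PHOTO") : null;
            }
        }
    }
}
```

Even in this small case the application must know the column names and decide which query to issue, which is the development cost the text refers to.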

Likewise, building a client cache in JDBC is nontrivial. JDBC defines no conceptual or natural association between an object and a data store record. A client cache, however, requires that objects have a persistent identity (such as the primary key of an RDBMS record) and that this persistent identity be associated with the transient identity implicitly assigned by the Java Virtual Machine (JVM).

The JDBC prepared query (in the form of a java.sql.PreparedStatement) is a useful performance optimization. If the RDBMS supports prepared queries, a frequently used query can be prepared once and executed repeatedly. The cost of query parsing and query plan generation is often comparable to that of query evaluation. Prepared queries can accept different parameters at each execution and may result in a significant performance gain.
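A minimal sketch of that reuse pattern follows. The EMPLOYEE table, its NAME and DEPT columns, and the class and method names are assumptions for illustration. Whether the statement is actually precompiled on the server depends on the driver and the RDBMS.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Assumed schema: EMPLOYEE(NAME, DEPT). All names here are illustrative.
class DeptLookup {
    static final String SQL = "SELECT NAME FROM EMPLOYEE WHERE DEPT = ?";

    // Collects employee names for each department in the given list.
    static List<String> namesByDept(Connection con, List<String> depts) throws SQLException {
        List<String> names = new ArrayList<>();
        // The statement is prepared once, outside the loop.
        try (PreparedStatement ps = con.prepareStatement(SQL)) {
            for (String dept : depts) {
                // Each iteration only binds a new parameter and executes.
                ps.setString(1, dept);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        names.add(rs.getString(1));
                    }
                }
            }
        }
        return names;
    }
}
```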

CMP performance optimization is advanced. Most CMP providers include many performance optimizations. Client caching is inherent in EJB design and built in to all application servers as a reusable pool of beans. In fact, most application servers have distributed client caching, where the cache logically spans multiple process memory spaces for distributing application load across multiple processors.

Fetch-on-demand is also natural in CMP because entities are fetched only on access. With the deployment descriptor, most CMP providers support identifying fields that belong to default fetch groups when only a subset of fields is needed from the data store (although default fetch groups are not mandated). Other optimizations such as marking beans as read-only to minimize data store writes are also supported by some providers.

JDO performance optimization is moderate. JDO explicitly uses a client cache programming model. This cache is managed by the JDO implementation without application intervention. But advanced applications can still control client-side caching programmatically because JDO provides methods to do so, such as to evict cached objects or refresh states from the data store.

Fetching can be controlled by marking each field as included in or excluded from the default fetch group of its class.

Portability
Platform portability is the cornerstone of Java applications. Other relevant portability issues are changes in schema, data store technology, or infrastructure service vendor. An application should be minimally aware of a database schema. One main purpose of CMP's or JDO's declarative mapping (in contrast with JDBC's definitive mapping) is to decouple an application schema from database schema.

Data store portability is the ability to change vendors but maintain the same persistence mechanism—for example, moving from Oracle to SQL Server. An even more drastic issue in data store portability is when data store technology itself is changed—for example, from file system to relational database.

JDBC is only moderately portable. Of the three technologies, JDBC is the weakest in portability. Historically, JDBC matched Java's portability with the portability of SQL across RDBMSs. However, a JDBC application is tightly coupled to a specific persistent schema because the application must refer directly to database tables and columns. Any schema change demands costly changes to application logic.

SQL conformance among RDBMSs makes JDBC applications fairly portable among different vendors. However, most RDBMSs provide special extensions to SQL. Using these extensions with JDBC can impact portability. JDBC portability is further hampered by the incompatibility of Java and SQL types. A result set contains the values of each column as a SQL type, which must be coerced into a Java type. This coercion can differ once the database is ported.

CMP is highly portable. CMP applications are highly portable in the face of the evolution of the data store schema, changes in data store vendor, and even changes in data store technology itself. The CMP deployment descriptor completely decouples the application from the schema. Likewise, declarative EJB-QL decouples the query mechanism. Schema changes require only recompilation of the concrete bean classes to reflect the changes. This promises a level of portability difficult to achieve with JDBC.

CMP allows a pluggable persistence manager that generates the concrete implementation of beans to interact with the data store. This model makes it possible to introduce alternative data store technology without impacting the application—as long as the CMP provider supports the alternative data store.

JDO is highly portable. JDO achieves complete schema decoupling, similar to CMP. The application need not be aware of the schema because it focuses exclusively on domain objects.

Data store portability has greater scope with JDO. JDO attempts to generalize over multiple types of data store technology, including RDBMSs. How practical is portability across data store technologies? IT managers do not decide to switch an application from a relational database to a file system overnight. Consider a financial instrument management application that uses two data stores: a relational database and an audit file. Special changes to a financial instrument (such as a new stock ticker symbol) must be audited and hence recorded in the audit file as well as the data store. This application can use the exact same JDO interface to store the object using a pair of JDO drivers: one for the RDBMS and another for the file system.
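The idea of one interface in front of two kinds of store can be sketched as follows. None of these types belong to JDO: InstrumentStore, TableStore, AuditStore, and InstrumentService are hypothetical names, the in-memory map stands in for the relational database, and the PrintWriter stands in for the audit file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a store-neutral persistence interface.
interface InstrumentStore {
    void store(String instrumentId, String tickerSymbol);
}

// Stand-in for the relational store: keeps the latest symbol per instrument.
class TableStore implements InstrumentStore {
    final Map<String, String> rows = new HashMap<>();

    public void store(String instrumentId, String tickerSymbol) {
        rows.put(instrumentId, tickerSymbol);
    }
}

// Stand-in for the audit file: appends one line per change.
class AuditStore implements InstrumentStore {
    private final PrintWriter out;

    AuditStore(PrintWriter out) {
        this.out = out;
    }

    public void store(String instrumentId, String tickerSymbol) {
        out.println(instrumentId + "," + tickerSymbol);
        out.flush();
    }
}

class InstrumentService {
    private final InstrumentStore[] stores;

    InstrumentService(InstrumentStore... stores) {
        this.stores = stores;
    }

    // The application code is identical whichever stores are plugged in.
    void changeTicker(String instrumentId, String newSymbol) {
        for (InstrumentStore store : stores) {
            store.store(instrumentId, newSymbol);
        }
    }
}
```

The service never learns which kind of store it is writing to, so adding or replacing a store does not change application logic.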

In an upcoming final installment, we'll discuss mechanics and summarize how to choose the technology that best meets your needs.

About the Authors
Dr. Wilson Cheng, vice president engineering, directs product development strategies and initiatives at Versant Corporation; has led interdisciplinary, intercompany teams to improve cross-platform, e-business solutions at Oracle; and was the chief architect for the object-replication project for Jasmine at Fujitsu. Cheng published more than 20 technical articles in various international conferences and was the program chair for the sixth annual Australian Conference on Parallel and Real-Time Systems.

Dr. Pinaki Poddar, principal software engineer for Versant Corporation, works in the area of object persistence and J2EE integration. He designed enterprise applications for the health care and finance industries and developed a Hindi speech recognition system in his past life. He is also a contributor to www.openadaptor.org, an open source project for enterprise application integration.