ust a few years after Java and distributed computing entered the lexicon of developers, we're facing another change in enterprise software. Today, the hot topics include business intelligence, Online Analytical Processing (OLAP), and portals. Implementing such systems will be the task of enterprise developers. Given Java's emergence as the leading enterprise development language, odds are good that you'll be working on such systems in the near future.
The increased interest in OLAP is being fueled by Microsoft's entry into the OLAP market and a widespread interest in enterprise information portals (EIP). Merrill Lynch launched the EIP phenomenon with a November 1998 report that defined EIP as an "amalgamation of software applications that consolidate, manage, analyze and distribute information across and outside of an enterprise (including Business Intelligence, Content Management, Data Warehouse & Mart and Data Management applications)." The report said EIP applications will eventually grow to exceed the market for Enterprise Resource Planning (ERP) products, which includes applications from Baan Co. NV, PeopleSoft Inc., SAP AG, and J. D. Edwards & Co.
How will this interest in OLAP and EIP overlap with Java and your existing systems? Probably in many ways. EIPs will integrate disparate applications, access internal and external data sources, and provide a bi-directional exchange of information. To feed the portal, EIP software will draw information from OLAP applications, ERP applications, databases, data mining, e-mail, the Web, and other data sources. An EIP acquires data from diverse sources, integrates it, and presents it for further analysis. To develop this single, integrated view of the enterprise, organizations will be using OLAP along with many technologies you've seen before, such as XML, Web farming, middleware, decision support queries, Java, and SQL. Java will play an important role in the creation of e-portals as a solution for accessing data from operational databases, data warehouses, and external data sources.
As you might expect, database issues will be central to these systems. Java developers will be called on to participate in business intelligence projects involving OLAP, and decision-support services (DSS), data warehouses, and data marts.
The distinction between a data warehouse and data mart is scope. The warehouse typically stores enterprise-wide data, whereas a data mart stores information for a smaller
entity—for example, an individual hospital instead of the
entire healthcare company. While industry watchers have recently been focusing on Java and the Web, many organizations have been accumulating massive amounts of data in data warehouses. Now those organizations are promoting a marriage of Web and data-warehousing technologies.
If your Java experience has been primarily applets and client applications, you'll face another learning curve before you can develop business-intelligence software. You understand how to use a layout manager, but what is OLAP, and where do you get decision-support data? What are decision-support queries? Can you buy OLAP software or must you develop everything yourself? These are questions I'll try to answer in this article.
The Growth of OLAP
At a recent conference on enterprise systems, I asked how many people in my session were from organizations using SQL databases. Then I asked how many were from organizations using or building a data warehouse. Data warehousing had a stronger response than SQL databases. That's just one sample of the market, but the result is consistent with current trends. The momentum for data warehousing and OLAP has been increasing as hardware prices drop. Microsoft added to the momentum by shipping an OLAP server with SQL Server 7.0, which lowered the entry-level price for OLAP. Data warehouses and OLAP are no longer the exclusive domain of the largest companies. Cheaper OLAP and competitive pressure is causing many smaller companies to build data warehouses.
E-portals and Web-style publishing may be a primal force as organizations move to get a maximum return on investment from data warehousing. As OLAP interest has grown, so has recognition of Java's capability to turn data sources into Web-enabled data sources and limited-
distribution information into widely available information. The question, "Can we use Java for decision-support applications?" was inevitable.
A typical internet/intranet data warehouse scenario is a client/server architecture with Web OLAP clients that connect to an OLAP server tied to an SQL database. Your users submit queries that return a multidimensional data cube as the query result. Users will often want to perform various types of analysis using the data cube—a process that is called "slicing and dicing." Depending on the OLAP engine, the data cube might be cached at the server or downloaded to the client and cached for offline analysis. The market is starting to see the emergence of server-side OLAP engines, but there is more Java client software for OLAP than server software.
Massive Databases
Databases and queries used for OLAP are quite different from those used by applications such as brokering stocks or processing orders. When designing databases and queries for a DSS application, your guidelines are fundamentally different from a developer using an operational data store for online transaction processing (OLTP). A Web shopping cart, for example, will have high numbers of users running short-lived queries against an OLTP database. A decision-support application will often serve a small number of users executing long-lived queries. An operational store, such as an orders database, typically contains information for a 60- to 90-day period but a data warehouse often contains two years or more of historical data. The operational store contains measurements and facts that paint an accurate picture of the current period or slice in time, but a data warehouse represents multiple time slices and measurements that describe years of operation. This enables you to aggregate data, analyze variances, detect trends, and run decision-support queries with data pivots and drill-downs.
The accumulation of years of data can produce massive databases. A large book distributor's data warehouse includes a core table with 200 million rows. A data warehouse with two years of frequent-flier information is likely to have a flight history table with 60 million rows using four GBs of disk storage. Large enterprises often require massively parallel hardware and very large decision-support databases. Sears Roebuck and Company uses the NCR Teradata DBMS to manage a database that is 4,630 gigabytes (4.63 terabytes).
Decision Support with Schemas
One of the primary factors in the success of a data warehouse project is database design. Designers creating databases for OLTP applications often use data-modeling techniques intended to remove data redundancy. These techniques produce designs that are well suited for OLTP performance, but are not optimal for OLAP. They are also not easily understood by the end users who comprise the decision-support audience.
To produce a more intuitive design that delivers good performance, data warehouse
designers often use a dimensional model and star schema for SQL databases. The star schema uses a fact table to store numerical measurements about the subject of interest, such as shipments, sales, purchases, subscriptions, car rentals, and so on. It uses dimension tables to store attributes that are textual and discrete. For example, a sales-fact table will often be linked with a product-dimension table to produce information about product sales. If the schema includes a time dimension table, you can run queries about product sales over time. If you add a region dimension, you can run queries about sales by product and region. Figure 1 illustrates a star schema having a single fact table and multiple dimension tables.
Because star schemas have become so widespread for dimensional data warehouses, some DBMS vendors have modified their SQL optimizers to recognize star schemas. The optimizers for NCR Teradata, IBM DB2 Universal Database, Oracle 8, and Microsoft SQL Server 7.0 include logic to generate execution plans for queries against star schemas. There is an ongoing debate about whether to use a star schema, or simply tables that are normalized to third normal form (3NF), when designing databases for relational OLAP (ROLAP) applications. In either case, your Java code uses an SQL API and submits SQL queries.
OLAP queries operate on multidimensional data sets or data cubes, whether or not they use a star schema. Figure 2 is an example of a cube having a product, sales, and time dimension.
OLAP: Buy or Build?
If you're going to use Java with OLAP and decision-support software, you must decide whether to buy off-the-shelf software or write your own. If you decide to write your own, I've provided some design reminders in the sidebar "Six Tips for Better OLAP Designs." You'll also have to decide which alternative APIs to use. The OLAP product you purchase may dictate your choice of API. Some products support a proprietary API, whereas others are programmable using Java Database Connectivity (JDBC) or OLE DB for OLAP. Products such as Oracle Express and Hyperion Essbase preceded Java and expose proprietary APIs. The Essbase API enables you to program for Hyperion's product, or the version of the Essbase engine bundled with IBM DB2 OLAP Server.
Programmers have developed Java classes and components for connecting to mature OLAP products. For example, you can link to Essbase by using Mocha Blend from Painted Word Inc. or components from AlphaBlox. You can also use Appsource's Wired for OLAP with Essbase or Microsoft's OLAP Server. Wired for OLAP includes Wired Application Server and two types of pure Java clients. The Web Viewer Edition uses a simple interface and provides reporting capabilities. The Web Interactive Edition supports browsing dimensions and the ability to write back edited data to the OLAP server.
Viador's E-Portal Suite is a portal system that provides an integrated EIP offering and a browser-based interface (Viador Sage) for analyzing data and publishing it to the Web. The E-Portal Suite client also includes Viador Sentinel, which provides business-to-business connections and Viador Information Center. Information Center is a Java server-based information backbone that delivers content, manages sessions, and provides security and data access. It can scale to several CPUs to provide load balancing. Viador provides a Java interface for both Express and Essbase. It also has a gateway that connects to Microsoft SQL Server, Sybase, Informix, Oracle, IBM DB2, White Light, Hyperion Essbase, and Oracle Express.
SQRIBE Technologies has become a major player in the EIP space, in part because its PowerSQRIBE product was built with Java. PowerSQRIBE uses a server-based process that allows you to run queries to generate a data cube that is downloaded to the client for slice-and-dice analysis. PowerSQRIBE includes a pure Java client that provides line graphs, bar charts, pie charts, pivot charts, and drill-down analysis. The PowerSQRIBE server uses JDBC to query ROLAP databases and produces
compressed cubes to reduce download time. Users can also download cubes for offline analysis. PowerSQRIBE is part of an enterprise suite that includes Report Mart, VisualSQRIBE and SQR Server. It has the ability to move PowerSQRIBE cubes into the ReportMart, which is Sqribe's EIP software.
Information Builders FOCUS product line includes WebFOCUS Application Server with WebFOCUS Suite, a collection of pure Java applets for analysis and reporting. The suite provides a managed reporting environment that supports OLAP navigation. The application server can connect to over 70 different types of databases and ERP applications such as SAP. WebFOCUS allows developers to build a virtual hierarchy in metadata to specify relationships between columns in databases. Then the software can project the hierarchy onto different database types without having to transform the database into a multidimensional or star schema format.
Comshare specializes in performance analysis, and its Commander DecisionWeb product has a three-tier architecture that uses application servers and Java clients. Commander DecisionWeb extends Comshare's OLAP capabilities to Java-enabled Web browsers. The application server does server-based calculations and notifies clients of changes in the database. Commander DecisionWeb users can generate charts and grids, and use developer modules for defining formats.
Seagate Software's Info 7 product is a business intelligence software suite that works with Hyperion Essbase or Microsoft OLAP Server. Info 7 uses a Unix-based server and a query client written in Java. Info 7 can retrieve and analyze data in reports that can be distributed as a package.
Finally, you may be heartened to discover that Java software for OLAP isn't completely limited to the client. IdeaSoft has entered the market with O3, an OLAP engine written in Java. IdeaSoft is a software company in Montevideo, Uruguay whose products are distributed by ZTI (). IdeaSoft offers an evaluation version of O3 on its Web site (see "Resources").
Programming Your Own Relational OLAP
Organizations that adopt a relational OLAP (ROLAP) approach build a dimensional data warehouse using SQL databases. Designing star schemas and running SQL queries does not limit you to a single choice of API. One approach is to use JDBC for developing SQL clients. JDBC 2.0 includes new functionality that makes it better suited than JDBC 1.x for writing OLAP programs. For example, bidirectional cursors make it easier to browse dimensions. JDBC 2.0 also adds support for array data types.
Most of the new development in OLAP uses APIs developed by Microsoft and a number of leading OLAP vendors. They developed the specification for OLE DB for OLAP, a component-level API for accessing multidimensional data sets. Microsoft also extended the ActiveX Data Object (ADO) API to support multidimensional queries. ADO operates as an object layer over OLE DB. To use OLE DB interfaces, you must program in C/C++ or another language that can use pointers. Java programmers cannot write to OLE DB interfaces, but you can use ADO properties and methods. ADO and OLE DB decision support programs also use SQL extensions known as multidimensional expressions (MDX).
Java's power to develop client or server software means it will be an integral part of the e-portal picture. Companies will use Java to develop servers and powerful middleware for integrating data for delivery to and from e-portals. Java is also a major contributing
influence to the emergence of Web OLAP applications.
Ken North consults, writes, and teaches Expert Series seminars. He is the author of Windows Multi-DBMS Programming, John Wiley & Sons, 1995 and Database Magic with Ken North, Prentice Hall PTR, 1998. Reach him at or visit his Web site at .