Designing the Smart-Data Enterprise
Get prepared for the 10 ways that semantic computing will impact enterprise IT
by Michael C. Daconta

Posted November 28, 2003

As developers and system architects, we have seen our treatment of data evolve significantly over the course of our careers and, further back, over the course of computing history. In 1971, David Parnas introduced the principle of information hiding to advocate isolating clients from the details of a design. In The Mythical Man-Month (Addison-Wesley: 1995), Frederick Brooks, Jr. stated that data representation is the essence of programming. Around the same time, Brian Kernighan and P.J. Plauger advised us in The Elements of Programming Style (McGraw-Hill: 1988) to let the data structure your program. More recently, in the essay "The Cathedral and the Bazaar," Eric Raymond proposed the following rule: "Smart data structures and dumb code work a lot better than the other way around."

Besides our general attitude toward data, the tools and techniques for representing and designing data have also changed. Some of those techniques are dataflow diagrams (DFD), entity-relationship models, the Integrated Definition (IDEF) family of models, Unified Modeling Language (UML) class models, Document Type Definitions (DTDs), and XML schemas. Understanding how data evolves, why it does so, and where it is heading will allow you to architect better enterprise systems.

The History of Data
Historically, data began as a second-class citizen locked away in proprietary applications. In the data evolution timeline (see Figure 1), this period is referred to as the age of programs: data was seen as secondary to the programs processing it. This flawed attitude gave rise to the expression "garbage in, garbage out" (GIGO), which exposes the flaw by establishing the dependency between processing and data. In other words, useful software is wholly dependent on good data. Computing professionals began to realize that data was important and had to be verified and protected. Programming languages began to acquire object-oriented facilities that made data a first-class citizen within the program. However, data was still kept internal to applications, in part so that vendors could keep it proprietary for competitive reasons.

Data proliferated in volume and variety of formats in the era of personal computers. The most popular were office document formats such as Microsoft Word's. These formats and databases had proprietary schemas understood only by the applications that processed the data.

The growth of the Internet in the 1980s and the World Wide Web in the 1990s began the shift away from proprietary formats. With the arrival of the Netscape browser in 1994, that shift gained momentum and continued to grow throughout the decade. Toward the end of the decade, XML began its meteoric rise, and by 2000 many vertical industries were defining open markup languages to better share data and metadata. This trend has accelerated as Web services move from early adopters to mainstream acceptance.

Note that the evolution of data has sped up in recent years, with major shifts occurring more rapidly. This acceleration reflects a more mature understanding of how to model data. We are now moving into a new phase of the data evolution, the age of semantic models, in which the standardization of the World Wide Web Consortium (W3C) Web Ontology Language (OWL) will be the catalyst. It is important to understand that data evolution will not end with this phase; there are more fine-grained shifts and more follow-on phases to come.

A key aspect of the data evolution is the way it has affected software development. Examine the data evolution timeline from the developer's perspective (see Figure 2), which relates changes in programming languages to the way they treat and manipulate data.

Procedural programming focused on the functional decomposition of a task, with data held in simple structures that were passed into and out of procedures in a processing chain. Data was thus something to be manipulated and modified by procedures. The implicit attitude was that "data is less important than code."
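
To make the contrast concrete, here is a minimal sketch of that procedural attitude in Java (the class and procedure names are hypothetical): the data is a passive record, and free-standing procedures read and rewrite it at will.

// OrderRecord is a passive data holder with no behavior of its own.
class OrderRecord {
    String customer;
    double amount;
}

// Procedures operate on the data from the outside; nothing prevents
// any other code from modifying the fields directly.
class OrderProcedures {
    static void applyDiscount(OrderRecord order, double rate) {
        order.amount = order.amount * (1.0 - rate);
    }

    static void printOrder(OrderRecord order) {
        System.out.println(order.customer + ": " + order.amount);
    }
}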

Object-oriented programming introduced encapsulation, inheritance, and polymorphism. Encapsulation combined the data and the functions that operated on the data into classes. A class is a user-defined data structure template where data members are protected from external manipulation. This concept of protecting data with special keywords and accessor methods put data on equal footing with the methods that manipulated it. Data was as important as the code that processed it, though still specific to that code and therefore not portable.
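
For comparison, here is the same hypothetical order data encapsulated as a class: the fields are private, and the only way to change the amount is through a method that can enforce its own rules.

public class Order {
    // Data members are hidden behind the private keyword.
    private final String customer;
    private double amount;

    public Order(String customer, double amount) {
        this.customer = customer;
        this.amount = amount;
    }

    // The class, not its callers, decides how the data may change.
    public void applyDiscount(double rate) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be between 0 and 1");
        }
        amount *= (1.0 - rate);
    }

    // Accessor methods expose the data without surrendering control of it.
    public String getCustomer() { return customer; }
    public double getAmount() { return amount; }
}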

Data Gets Smart
With the advent of XML and its automated binding to internal data, Model-Driven Architecture (MDA), and runtime metadata support in .NET and Java, we are entering a new phase of portable data. The crux of this shift is that "data is now more important than code." Though this has not yet been fully realized, the shift is occurring because the goals of data mobility, interoperability, and self-description require it. When you no longer assume that your data is tied to a single application, when it must interoperate with current and future applications both within and outside your organization, and when the data may outlast the life cycle of any single application, you have a new Copernican revolution in which applications are seen as revolving around data instead of the other way around.
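
As one illustration of this shift toward portable, self-describing data, the following sketch uses JAXB XML-binding annotations in Java (the class and element names are hypothetical, and a JAXB implementation is assumed to be on the classpath). The mapping metadata travels with the class, so the data can be serialized to XML that other applications can read without hand-written parsing code.

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Annotations describe how the object's data maps to XML, so the data
// is no longer locked inside one application's proprietary format.
@XmlRootElement(name = "order")
public class PortableOrder {
    private String customer;
    private double amount;

    public PortableOrder() { }  // no-argument constructor required by JAXB

    @XmlElement
    public String getCustomer() { return customer; }
    public void setCustomer(String customer) { this.customer = customer; }

    @XmlElement
    public double getAmount() { return amount; }
    public void setAmount(double amount) { this.amount = amount; }

    public static void main(String[] args) throws Exception {
        PortableOrder order = new PortableOrder();
        order.setCustomer("Acme");
        order.setAmount(99.50);

        // Marshal the object to self-describing XML on standard output.
        Marshaller marshaller = JAXBContext.newInstance(PortableOrder.class).createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        marshaller.marshal(order, System.out);
    }
}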
