Holding the Bag With XML DOM
The XML bag repository stores state information and offers cross-platform compatibility
by Kurt Cagle
Posted May 6, 2003
On occasion, I have found myself sorely in need of a bag. Now, this isn't the paper-or-plastic bag issue that you have to resolve every time you go shopping for groceries. In this particular case, a bag is a repository, a place to store state information temporarily. A bag is similar to a collection in Visual Basic or a hash in most other languages, although in this case, the mechanism I want to use to implement this bag is XML-based. This in turn gives the bag a few capabilities that other storage mechanisms might not have, such as the ability to access the contents of that bag via XPath.
The idea behind a bag is simple—it holds things, regardless of what those things are. Bags tend to be a pain in traditionally strongly typed languages such as Java, since this inclusiveness requires some fairly fast and fancy playing with object pointers—in essence telling the compiler to look the other way while such information is stored. However, because XML is intrinsically untyped, even in the face of schemas (schemas exist as suggestions about type, not necessarily explicit impositions of type), you can store any kind of information in an XML bag.
For instance, suppose that you wanted to store these pieces of information in a bag: the name of a person (as a string), the date you're planning to meet (as a date), the name of a location where you're planning to meet (another string) and a GPS location (as a complex type). This information could be encoded readily like this:
<bag xmlns:xs="http://www.w3.org/2001/XMLSchema-datatypes">
<item key="personToMeet">
Aleria Delamare
</item>
<item key="meetingDateTime" type="xs:dateTime">
2003-02-20T10:00:00.000
</item>
<item key="locationName">
Starbucks
</item>
<item key="location" type="GPSPosition"
schemaLocation="http://www.metaphoricalweb.com/schemas/gpsPosition.xsd">
<gpsPosition>
<latitude>47.7113</latitude>
<longitude>-122.2342</longitude>
</gpsPosition>
</item>
</bag>
Note that this isn't a definitive object—instead, the bag contains information that might be stored between session states, perhaps even user information that varies depending on the requirement of the user (such as CSS properties that need to be changed).
The simplest <item>s within the bag consist of strings and follow the concept of a simple associative array:
<item key="personToMeet">
Aleria Delamare
</item>
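An XML parser preserves the line breaks around the text, so the two forms are not identical at the DOM level. As long as the code that reads the bag trims or normalizes whitespace, however, that expression is essentially the same as: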
<item key="personToMeet">Aleria Delamare</item>
Handling richer white space may be accomplished by wrapping the contents of the text in a CDATA block:
<item key="personToMeet"><![CDATA[
Aleria Delamare
]]></item>
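Whether these forms really behave the same depends on how the reading code treats whitespace. As a quick check, the following minimal sketch parses the multi-line and single-line variants and compares their raw and trimmed text. It is not part of the XMLBag class: the WhitespaceCheck class and its rawText() helper are illustrative names, and getTextContent() assumes a DOM Level 3 parser (Java 5 or later) rather than the 1.4 SDK used elsewhere in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceCheck {
    // Parses an XML string and returns the untrimmed text of its root element.
    static String rawText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String multiLine = "<item key=\"personToMeet\">\nAleria Delamare\n</item>";
        String singleLine = "<item key=\"personToMeet\">Aleria Delamare</item>";
        // Compare the text exactly as the parser delivers it.
        System.out.println(rawText(multiLine).equals(rawText(singleLine)));
        // Compare again after the reader trims both values.
        System.out.println(rawText(multiLine).trim().equals(rawText(singleLine).trim()));
    }
}
```

If the bag's readers always trim item text in this way, the two layouts can be treated as interchangeable.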
The default datatype for an item's content is a string, but beyond strings it is necessary to specify the datatype explicitly. For instance, you can specify a meeting time using the XSD dateTime datatype:
<item key="meetingDateTime" type="xs:dateTime">
2003-02-20T10:00:00.000
</item>
This type uses the built-in datatypes specified by the XML Schema Definition (XSD) recommendation, which is why the bag declares this namespace prefix (xs:) ahead of time.
Most pure hash tables work with more complex datatypes by using the binary of the type as the basis for building the key. However, with XML, you have the ability to associate a complex datatype with an object via a schema, and you can even include the definition of the schema as part of the bag.
This is what happens with the location and alternateLocation items. Both use the datatype GPSPosition, which consists of a set of two constrained floating point numbers, <latitude>, which varies from -90 (the South Pole) to +90 (the North Pole), and <longitude>, which treats Greenwich, England, as the zero point and then wraps from -179.9999 to 180.00, where negative numbers indicate westerly movement.
The particular implementation of the XMLBag object that's included here does not validate items against their schemas, but obviously, in creating a bag, it's typically useful to ensure that there is a schema that can be used to convert the objects back into internal binary representations. The schemaLocation attribute is a URL pointer to an external XML Schema Definition document.
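For readers who do want validation, here is a rough sketch of how a GPSPosition item could be checked. It rests on several assumptions: the real gpsPosition.xsd is not shown in this article, so the inline schema below is a hypothetical stand-in built from the ranges described above, and the GpsPositionCheck class and isValid() method are illustrative names. It also uses the javax.xml.validation package, which arrived in Java 5 and is not in the 1.4 SDK.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

class GpsPositionCheck {
    // Hypothetical schema: a stand-in for the gpsPosition.xsd referenced by the bag.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"gpsPosition\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"latitude\"><xs:simpleType>"
      + "<xs:restriction base=\"xs:double\">"
      + "<xs:minInclusive value=\"-90\"/><xs:maxInclusive value=\"90\"/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "<xs:element name=\"longitude\"><xs:simpleType>"
      + "<xs:restriction base=\"xs:double\">"
      + "<xs:minExclusive value=\"-180\"/><xs:maxInclusive value=\"180\"/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true when the XML fragment validates against the schema above.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (org.xml.sax.SAXException e) {
            return false; // schema violation or malformed XML
        } catch (java.io.IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A position inside the allowed ranges.
        System.out.println(isValid("<gpsPosition><latitude>47.7113</latitude>"
            + "<longitude>-122.2342</longitude></gpsPosition>"));
        // A latitude beyond +90 should be rejected.
        System.out.println(isValid("<gpsPosition><latitude>147.7113</latitude>"
            + "<longitude>-122.2342</longitude></gpsPosition>"));
    }
}
```

In a fuller implementation, the schema would presumably be fetched from the item's schemaLocation attribute rather than embedded as a string.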
Creating and Accessing the Bag
Before we get started, the Java code here uses the Java 1.4.1 SDK. I standardized on this because most of the XML classes necessary to make this class work are already defined internally within the 1.4 SDK, rather than needing to be included in an external class path. This also marks the first chance I've had to use the Eclipse Java IDE (www.eclipse.org), and I've come away impressed. I'm running this on Linux.
The XMLBag class has two constructors—one that creates an empty bag (with a single <bag/> element), the other that creates a bag loaded from a previously existing XML file in bag format. The class also has both a save() and load() function to persist and retrieve the bag (this can also be recast as an XML serialization as needed).
import javax.xml.parsers.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
// XPathAPI comes from Apache Xalan.
import org.apache.xpath.XPathAPI;
import java.io.*;
/**
* A simple XML-backed bag that stores keyed
* <item> elements under a single <bag> root.
*
* @author seatails
*/
public class XMLBag {
private Document xmlBag =null;
private DocumentBuilder db;
public XMLBag(){
try {
DocumentBuilderFactory
dbf=DocumentBuilderFactory.newInstance();
db=dbf.newDocumentBuilder();
xmlBag=db.newDocument();
xmlBag.appendChild(xmlBag.createElement
("bag"));
}
catch(Exception e){
System.err.println(e.getMessage());
}
}
public XMLBag(String filename){
try {
this.load(filename);
}
catch(Exception e){
System.err.println(e.getMessage());
}
}
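public String toString(){
// Node.toString() is implementation-dependent,
// so serialize with an identity transform.
try {
StringWriter sw=new StringWriter();
javax.xml.transform.TransformerFactory
.newInstance().newTransformer().transform(
new DOMSource(xmlBag.getDocumentElement()),
new StreamResult(sw));
return sw.toString();
}
catch(Exception e){
System.err.println(e.getMessage());
return "";
}
}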
public void save(String filePath){
try {
FileWriter fw=new FileWriter(filePath);
fw.write(this.toString());
fw.close();
}
catch(Exception e){
System.err.println(e.getMessage());
}
}
public void load(String filePath){
try {
DocumentBuilderFactory dbf=
DocumentBuilderFactory.newInstance();
db=dbf.newDocumentBuilder();
xmlBag=db.parse(new File(filePath));
}
catch(Exception e){
System.err.println(e.getMessage());
}
}
The setItem method, used to assign a given key to a string, number, or complex XML document, is pretty heavily overloaded. You have the option of specifying either a string (along with datatype information) or an XML document as the object to be stored.
public Element setItem(String itemKey,String
itemValue,String dataType){
try {
Element item=this.getItemFromKey(itemKey);
if (item==null){
item=xmlBag.createElement("item");
xmlBag.getDocumentElement()
.appendChild(item);
item.setAttribute("key",itemKey);
}
item.setAttribute("type",dataType);
// Clear all previous content, not just the
// first child node.
while (item.hasChildNodes()){
item.removeChild(item.getFirstChild());
}
item.appendChild(xmlBag.createTextNode
(itemValue));
return item;
}
catch(Exception e){
System.out.println("Error:"+e.getMessage());
return null;
}
}
public Element setItem(String itemKey,String itemValue){
return setItem(itemKey,itemValue,"xs:string");
}
public Element setItem(String itemKey,Document
objectNode,String dataType,String schemaLocation){
try {
Element item=this.getItemFromKey(itemKey);
if (item==null){
item=xmlBag.createElement("item");
item.setAttribute("key",itemKey);
}
item.setAttribute("type",dataType);
// Clear all previous content, not just the
// first child node.
while (item.hasChildNodes()){
item.removeChild(item.getFirstChild());
}
if (schemaLocation!=null){
item.setAttribute("schemaLocation",schemaLocation);
}
else {
item.removeAttribute("schemaLocation");
}
appendDuplicateNode(item,
objectNode.getDocumentElement());
xmlBag.getDocumentElement().appendChild(item);
return item;
}
catch(Exception e){
System.out.println("Error:"+e.getMessage());
return null;
}
}
To copy the contents of an XML document into an <item> element, you have to get inventive. The cloneNode() method may seem tailor-made for duplicating an XML document, but at least within the Java implementations, the cloned node is still, technically speaking, owned by the initial document—you can't just append it as a child to an <item> node.
To get around this, I wrote the private appendDuplicateNode() method. It takes the node in the bag to which you want to attach a copy of the document, then recursively walks the tree of the object to be copied and builds a duplicate of that tree in the bag. The sample here is somewhat restrictive (it doesn't copy comments, for instance), but this is easy to rectify by adding further cases to the switch statement.
private Node appendDuplicateNode(Node
parentNode,Node nodeToCopy){
Document doc=parentNode.getOwnerDocument();
Node newNode=null;
switch(nodeToCopy.getNodeType()){
case Node.ELEMENT_NODE:
newNode=doc.createElement(nodeToCopy.
getNodeName());
parentNode.appendChild(newNode);
NamedNodeMap attrNodes=
nodeToCopy.getAttributes();
for (int attrIndex=0;
attrIndex<attrNodes.getLength();
attrIndex++){
Node attrNode=attrNodes.item(attrIndex);
((Element)newNode).setAttribute
(attrNode.getNodeName(),
attrNode.getNodeValue());
}
NodeList childNodes=
nodeToCopy.getChildNodes();
for (int childIndex=0;
childIndex<childNodes.getLength();
childIndex++){
Node childNode=childNodes.item(childIndex);
appendDuplicateNode(newNode,childNode);
}
break;
case Node.TEXT_NODE:
newNode=doc.createTextNode(((Text)nodeToCopy)
.getData());
parentNode.appendChild(newNode);
break;
case Node.CDATA_SECTION_NODE:
newNode=doc.createCDATASection
(((CDATASection)nodeToCopy).getData());
parentNode.appendChild(newNode);
break;
default:
// Other node types (comments, processing
// instructions, and so on) are not copied.
break;
}
return newNode;
}
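As an alternative to the hand-rolled copy above, DOM Level 2 Core defines Document.importNode(), which returns a copy of a node that is owned by the importing document. The following is a minimal sketch of that approach, not the code used in XMLBag; the ImportNodeSketch class and the bagWith() helper are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ImportNodeSketch {
    // Builds a bag with one <item> holding a deep copy of the given XML fragment.
    static Document bagWith(String key, String fragmentXml) {
        try {
            DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document source =
                db.parse(new InputSource(new StringReader(fragmentXml)));

            Document bag = db.newDocument();
            Element root = bag.createElement("bag");
            bag.appendChild(root);
            Element item = bag.createElement("item");
            item.setAttribute("key", key);
            root.appendChild(item);

            // importNode(node, true) returns a deep copy owned by the bag
            // document; the source document is left untouched.
            Node copy = bag.importNode(source.getDocumentElement(), true);
            item.appendChild(copy);
            return bag;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document bag = bagWith("location",
            "<gpsPosition><latitude>47.7113</latitude>"
            + "<longitude>-122.2342</longitude></gpsPosition>");
        Node lat = bag.getElementsByTagName("latitude").item(0);
        System.out.println(lat.getFirstChild().getNodeValue());
        System.out.println(lat.getOwnerDocument() == bag);
    }
}
```

Unlike the recursive method, importNode() copies comments and other node types as well, which may or may not be what you want in a bag.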
The getItemFromKey() function retrieves an item as an Element, using the item key and the XPathAPI static class. This particular class (part of the Apache Xalan library) adds XPath support for XML documents, giving you the ability to retrieve either all nodes that satisfy a given query (XPathAPI.selectNodeList()) or the first node that satisfies it (XPathAPI.selectSingleNode()). This routine is used fairly extensively by the various getItemXXX methods:
public Element getItemFromKey(String itemKey){
String xpathKey="//item[@key='"+itemKey+"']";
try {
Node itemNode=
XPathAPI.selectSingleNode(xmlBag,xpathKey);
return (Element)itemNode;
}
catch(Exception e){
return null;
}
}
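public String getItemContent(String itemKey){
Element itemNode=getItemFromKey(itemKey);
// No such key: report the absence as null.
if (itemNode==null){
return null;
}
StringBuffer sb=new StringBuffer();
NodeList nl=itemNode.getChildNodes();
for (int i=0;i<nl.getLength();i++){
Node node=nl.item(i);
// Relies on the parser's Node.toString()
// returning markup, which is
// implementation-dependent.
sb.append(node.toString());
}
return sb.toString();
}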
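public void removeItem(String itemKey){
Element itemNode=getItemFromKey(itemKey);
// Look the key up again after each removal so
// the loop ends once no matching item is left.
while (itemNode!=null){
itemNode.getParentNode().removeChild(itemNode);
itemNode=getItemFromKey(itemKey);
}
}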
public String getItemType(String itemKey){
Element itemNode=getItemFromKey(itemKey);
// Guard against a missing key.
if (itemNode==null){
return null;
}
return itemNode.getAttribute("type");
}
public NodeList getItemsFromXPath(String
xpathPred){
try {
String xpathExpr = "bag/item["+xpathPred+"]";
NodeList itemNodeList=null;
itemNodeList=XPathAPI.selectNodeList(xmlBag,
xpathExpr);
return itemNodeList;
}
catch(Exception e){
System.err.println(e.getMessage());
return null;
}
}
The class includes a test main() function illustrating how various functions are invoked. Of course, in practice, you would probably not be assigning hard data directly into the object. Instead, this would let you do such things as define user specifications within a given environment:
public static void main(String [] args){
XMLBag bag=new XMLBag();
System.out.println("After Constructor toString:"
+bag.toString());
bag.load("/home/seatails/bagTest.xml");
System.out.println("After Load toString:"
+bag.toString());
bag.setItem("personToMeet","Aleria Delamare");
bag.setItem("meetingDateTime",
"2003-02-20T10:00:00.000","xs:dateTime");
bag.setItem("locationName","Starbucks");
bag.setItem("newLocationName","Starbucks");
Document positionXML=null;
try {
DocumentBuilder dbTemp=
DocumentBuilderFactory.newInstance()
.newDocumentBuilder();
positionXML=
dbTemp.parse(new File
("/home/seatails/position.xml"));
bag.setItem("location",positionXML,
"GPSPosition",
"http://www.metaphoricalweb.com/"
+"schemas/gpsPosition");
}
catch(Exception e){
System.err.println(e.getMessage());
}
System.out.println("getItemContent(): "
+bag.getItemContent("meetingDateTime"));
System.out.println("getItemType(): "
+bag.getItemType("meetingDateTime"));
NodeList nl=
bag.getItemsFromXPath("text()='Starbucks'");
for (int i=0;i<nl.getLength();i++){
Element el=(Element)nl.item(i);
System.out.println("getItemsFromXPath: "
+el.toString());
}
System.out.println("Before Save toString():"
+bag.toString());
bag.save("/home/seatails/bagTest.xml");
}
}
I wanted to talk about the getItemsFromXPath() function last. It is what makes the XMLBag superior in some respects to more traditional hash tables or associative arrays in other languages: it retrieves every item that satisfies a particular XPath predicate. This opens up several possibilities.
For instance, you could retrieve all dates within the bag that have the year 2003, using an expression such as:
bag.getItemsFromXPath(
"starts-with(normalize-space(.),'2003') and "
+"@type='xs:dateTime'");
By design, this method can retrieve only <item> elements. The reason for this pattern is to give the processing tools a consistent interface: regardless of the XPath predicate you supply, you will always receive bag items, which makes processing more predictable.
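The same kind of predicate query can also be written against the standard javax.xml.xpath API, which arrived in Java 5 and is not in the 1.4 SDK. The sketch below is an assumption-laden illustration rather than part of XMLBag: the BagQuerySketch class, the itemsMatching() helper, and the followUp item are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BagQuerySketch {
    // Returns the <item> children of <bag> that satisfy the given predicate.
    static NodeList itemsMatching(String bagXml, String predicate) {
        try {
            Document bag = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(bagXml)));
            // The predicate only ever filters items, so callers always
            // get <item> elements back.
            String expr = "/bag/item[" + predicate + "]";
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, bag, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bagXml =
            "<bag>"
          + "<item key=\"personToMeet\">Aleria Delamare</item>"
          + "<item key=\"meetingDateTime\" type=\"xs:dateTime\">\n"
          + "2003-02-20T10:00:00.000\n</item>"
          + "<item key=\"followUp\" type=\"xs:dateTime\">"
          + "2004-01-05T09:00:00.000</item>"
          + "</bag>";
        NodeList hits = itemsMatching(bagXml,
            "starts-with(normalize-space(.),'2003') and @type='xs:dateTime'");
        // Print the key of every matching item.
        for (int i = 0; i < hits.getLength(); i++) {
            System.out.println(
                hits.item(i).getAttributes().getNamedItem("key").getNodeValue());
        }
    }
}
```

The normalize-space() call matters here because the date in the sample bag is surrounded by line breaks; without it, starts-with() would compare against text that begins with a newline.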
Several improvements still can be made to this particular bag object. As it's coded now, the routine is not namespace aware, though this will be a priority in the final code. This becomes especially significant in ensuring that <item> elements in subordinate XML objects don't get mistaken for bag items.
An XML bag is not useful everywhere. XML tends to be fairly memory intensive, and while well-optimized XML DOM code is reasonably fast, you may find that sometimes it is preferable to use a more specialized object.
However, the XML bag does offer more than a few advantages of its own. Because the contents of the bag are persisted in XML, you can create bag readers and writers on any number of development platforms. A .NET writer, for instance, could generate a bag that a Java reader could read. The DOM code to do either is sufficiently similar, especially now that both platforms have in great part adopted the W3C XML DOM, that porting even the consumers of these bags becomes easy, if not exactly trivial.
Moreover, because the primary query mechanism into the bag is XPath, the code to perform queries is also cross-platform compatible. Indeed, you could very easily create a bag that contains XPath queries for other bags.
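To make the idea of a bag of queries concrete, here is one way it might look. This is a sketch under stated assumptions, not a feature of the XMLBag class: the QueryBagSketch class, the run() helper, and the atStarbucks and dates keys are hypothetical, and the code uses javax.xml.xpath and getTextContent() from Java 5 or later.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QueryBagSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Treats the text of each item in the query bag as an XPath predicate,
    // applies it to the data bag, and records how many items matched.
    static Map<String, Integer> run(String queryBagXml, String dataBagXml) {
        try {
            Document queries = parse(queryBagXml);
            Document data = parse(dataBagXml);
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
            NodeList queryItems = (NodeList) xpath.evaluate(
                "/bag/item", queries, XPathConstants.NODESET);
            for (int i = 0; i < queryItems.getLength(); i++) {
                Element query = (Element) queryItems.item(i);
                String predicate = query.getTextContent().trim();
                NodeList hits = (NodeList) xpath.evaluate(
                    "/bag/item[" + predicate + "]", data,
                    XPathConstants.NODESET);
                counts.put(query.getAttribute("key"),
                    Integer.valueOf(hits.getLength()));
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dataBag = "<bag>"
            + "<item key=\"locationName\">Starbucks</item>"
            + "<item key=\"newLocationName\">Starbucks</item>"
            + "<item key=\"meetingDateTime\" type=\"xs:dateTime\">"
            + "2003-02-20T10:00:00.000</item>"
            + "</bag>";
        String queryBag = "<bag>"
            + "<item key=\"atStarbucks\">text()='Starbucks'</item>"
            + "<item key=\"dates\">@type='xs:dateTime'</item>"
            + "</bag>";
        System.out.println(run(queryBag, dataBag));
    }
}
```

Because the queries are themselves just bag items, they can be saved, loaded, and exchanged between platforms in the same way as any other bag.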
The code discussed here is hardly rocket science. Indeed, more than anything what I've illustrated is the power of XML (both with DOM and XPath) to be a generalized data structure that can be used in place of the plethora of trees, bags, collections, vectors, and other elements that populate contemporary coding.
About the Author
Kurt Cagle is an author, trainer, and consultant specializing in XML-related technologies. He has written or cowritten 12 books on XML and multimedia; his most recent book is SVG Programming: The Graphical Web (APress, 2002). Reach him at .