XML and Excel Spreadsheet Conversions
Apply APIs from the Jakarta Project to make XML-to-spreadsheet and spreadsheet-to-XML conversions
by Deepak Vohra

January 4, 2006

XML is the standard medium of data exchange, and often an XML document is presented in a Microsoft Excel spreadsheet format. The Jakarta POI project provides an API to create an Excel spreadsheet. The POI HSSF API can also be used to parse an Excel spreadsheet and convert it to another format such as XML. The HSSF API has the provision to set the layout, border settings, and fonts of an Excel spreadsheet document. Here, we'll generate an Excel spreadsheet example by parsing an XML document and adding data from the XML document to a spreadsheet. Subsequently we'll convert the Excel spreadsheet to an XML document.

The Jakarta POI HSSF API has classes to create an Excel workbook file and add spreadsheets to the workbook. With the POI API a workbook is represented by the HSSFWorkbook class. Spreadsheet fonts, sheet order, and cell styles are set in the HSSFWorkbook class.

A spreadsheet is represented with the HSSFSheet class. Sheet layout, including column widths, margins, header, footer, and print setup, is set with the HSSFSheet class. A spreadsheet row is represented with the HSSFRow class. The row height is set with the HSSFRow class. The HSSFCell class represents a cell in a spreadsheet row. The cell style is set with the HSSFCell class. The indexing of spreadsheets in a workbook, rows in a spreadsheet, and cells in a row is 0 based. In the procedure presented here, we'll convert an XML document example to an Excel spreadsheet, and then convert the spreadsheet to an XML document.

XML to Spreadsheet Data
You can create and parse an Excel spreadsheet by using the Apache POI HSSF API. Download the Apache POI JAR file, and extract it to an installation directory. Add poi-2.5.1-final-20040804.jar to the Classpath. The XML document example is parsed with the JDK 5.0 XPath API. You can download and install JDK 5.0 from the Java site, and you can use the Excel Viewer to open the Excel spreadsheet that is generated from the XML document example. Download and install the Excel Viewer; the Excel document can also be displayed in Excel itself.

Let's convert the XML document to an Excel spreadsheet by using the Apache HSSF API. Download the document example, catalog.xml (see Listing 1). The XML document is parsed with the JDK 5.0 XPath API, and the retrieved values are added to an Excel spreadsheet. Begin by importing the Apache POI HSSF package:

import  org.apache.poi.hssf.
        usermodel.*;

Use the HSSFWorkbook constructor to create an Excel workbook file:

HSSFWorkbook wb=
        new HSSFWorkbook();

Create a spreadsheet from the workbook:

HSSFSheet spreadSheet=
        wb.createSheet("spreadSheet");

Create a cell style object for the spreadsheet:

HSSFCellStyle cellStyle=
        wb.createCellStyle();

Next, specify the border settings for the HSSFCellStyle object:

cellStyle.setBorderRight(
        HSSFCellStyle.BORDER_MEDIUM);
cellStyle.setBorderTop(
        HSSFCellStyle.BORDER_MEDIUM);

A row in the spreadsheet has cells corresponding to each of the elements in the journal tag of the XML document. You can set the column width of each of the columns in the spreadsheet. For example, specify the column width of the first cell of a row like this:

spreadSheet.setColumnWidth((
        short)0,  (short)(256*25));

Create an InputSource object for the XML document that you want to convert to an Excel spreadsheet:

InputSource inputSource = 
        new InputSource(
        new FileInputStream(
        new File(
        "C:/Excel/catalog.xml")));

Create an XPath object to parse the XML document:

XPathFactory  factory=
        XPathFactory.newInstance();
XPath xPath=factory.newXPath();

Specify the XPath expression for a node list you want to obtain:

String expression=
        "/catalog/journal";

Then obtain a node list for the specified XPath expression. The node list consists of nodes corresponding to the journal elements in the XML document example:

com.sun.org.apache.xml.internal.
        dtm.ref.DTMNodeList nodeList = 
        (com.sun.org.apache.xml.
        internal.dtm.ref.DTMNodeList) 
        (xPath.evaluate(
        expression, inputSource, 
        XPathConstants.NODESET));

Iterate over the node list, and add a row to the spreadsheet that corresponds to each of the journal nodes in the node list. Use the HSSFRow object to add a spreadsheet row:

for(int i=0; 
        i<nodeList.getLength(); 
        i++){
HSSFRow row=
        spreadSheet.createRow((
        short)i);

}

Now you can create a cell in a spreadsheet row for each of the elements in the journal element. Use the createCell() method of the HSSFRow object to create a cell:

HSSFCell cell=
        row.createCell((short)0);

Set the value of a row cell with the setCellValue() method. For example, the value of the cell for the section element is set like this:

cell.setCellValue(((Element)(
        nodeList.item(i))).
        getElementsByTagName(
        "section").item(0).
        getFirstChild().
        getNodeValue());

The first cell in a row has index 0; set the cell style of a cell with the setCellStyle() method of the HSSFCell object:

cell.setCellStyle(cellStyle);

Similarly, you can set the values of the columns for the other cells in a row. Create a FileOutputStream to output the Excel workbook to an XLS file:

FileOutputStream output=
        new FileOutputStream(
        new File(
        "C:/Excel/ExampleExcel.xls"));

Output the Excel workbook to an XLS file, and close the FileOutputStream:

wb.write(output);
 output.flush();
 output.close();

An Excel spreadsheet is generated.

Going the Other Way
The spreadsheet example that is generated from the XML document example is available for download. The Java application XMLToExcel.java, which is used to convert an XML document to an Excel spreadsheet, is shown in Listing 2.

Now that we've seen how easy it is to generate an Excel document from an XML document, let's covert the other way, that is, convert the Excel document to an XML document. You also use the Apache POI HSSF API to parse an Excel spreadsheet and retrieve the cell values from the spreadsheet. Begin by importing the Apache POI HSSF API:

import  org.apache.poi.hssf.
        usermodel.*;

The root element of the XML document generated is catalog, and a journal element is added to correspond to each of the rows of the Excel spreadsheet. Generate an XML document, and specify the root element of the document:

DocumentBuilderFactory factory =
        DocumentBuilderFactory.
        newInstance();
DocumentBuilder builder = 
        factory.newDocumentBuilder();
Document document = 
        builder.newDocument();
Element catalogElement=
        document.createElement(
        "catalog");
document.appendChild(
        catalogElement);

Next, create an InputStream object for the Excel spreadsheet that will be converted to an XML document:

InputStream input=
        new FileInputStream(excelFile);

Obtain the workbook for the InputStream object:

HSSFWorkbook workbook=
        new HSSFWorkbook(input);

Obtain the spreadsheet for the Excel workbook:

HSSFSheet spreadsheet=
        workbook.getSheetAt(0);

Iterate over the rows in the spreadsheet, and add a journal element to the XML document for each of the rows:

for(int i=0; i<=
        spreadsheet.getLastRowNum(); 
        i++)

{

HSSFRow row=
        spreadsheet.getRow(i);

Element journalElement=
        document.createElement(
        "journal");
catalogElement.appendChild(
        journalElement);

}

A cell in a spreadsheet row is retrieved with the getCell() method, and a cell value is retrieved with the getStringCellValue() method. For example, the section cell value is retrieved from the spreadsheet and set in the XML document this way:

Element sectionElement=
        document.createElement(
        "section");
journalElement.appendChild(
        sectionElement);
sectionElement.appendChild(
        document.createTextNode(
        row.getCell((short)0).
        getStringCellValue()));

Similarly, the other cell values are retrieved from the spreadsheet and specified in the XML document. The Excel spreadsheet from which an XML document—catalog.xml—is generated is available for download. ExcelToXML.java, the Java application used to convert an Excel spreadsheet to an XML document, is listed in Listing 3.

About the Author
Deepak Vohra is a Sun-certified Java programmer and Sun-certified Web component developer, who has published several articles in publications that cover Java and XML content. Contact Deepak at .