Tempblog: 2011

Friday 15 April 2011

Reading/Parsing RSS feed using ROME

ROME is an open source Java library for parsing, generating and publishing RSS and Atom feeds. With ROME you can parse any available RSS or Atom feed without worrying about its format or version. The core library depends on the JDOM XML parser.
Atom is another feed format along the same lines as RSS, but it differs in some aspects, such as its protocol and payloads.
RSS is a way to share and publish content, which can be anything from news to small pieces of information. The underlying format is XML: you publish your content on the web as XML, and you are equally free to subscribe to whatever you like from others.

Why use Rome instead of other available readers

The Rome project started with the motivation of ‘ESCAPE’ where each letter stands for:
E – Easy to use. Just give a URL and forget about its type and version; you will be given an output in the format you like.
S – Simple. Simple structure. The complications are all hidden from developers.
C – Complete. It handles all the versions of RSS and Atom feeds.
A – Abstract. It provides abstraction over various syndication specifications.
P – Powerful. Don’t worry about the format; let Rome handle it.
E – Extensible. It needs a simple, pluggable architecture so that future formats can be supported.

Dependency

ROME has the following dependencies:
J2SE 1.4+, JDOM 1.0, Jar files (rome-0.8.jar, purl-org-content-0.3.jar, jdom.jar)

Using Rome to read a Syndication Feed

Assuming you have all the required jar files, we will start by reading an RSS feed. ROME represents syndication feeds (RSS and Atom) as instances of the com.sun.syndication.feed.synd.SyndFeed interface.
ROME includes parsers to process syndication feeds into SyndFeed instances. The SyndFeedInput class manages the parsers and uses the correct one for the syndication feed being processed. The developer does not need to worry about selecting the right parser: SyndFeedInput takes care of it by peeking at the structure of the feed. All it takes to read a syndication feed using ROME are the following two lines of code:

SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build (new XmlReader (feedUrl));

With the SyndFeed object in hand, getting the details of the feed is simple.

The sample code is as follows.

import java.net.URL;
import java.util.Iterator;
 
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;
 
/**
 * @author Hanumant Shikhare
 */
public class Reader {
 
  public static void main(String[] args) throws Exception {
 
    URL url  = new URL("http://viralpatel.net/blogs/feed");
    XmlReader reader = null;
 
    try {
 
      reader = new XmlReader(url);
      SyndFeed feed = new SyndFeedInput().build(reader);
      System.out.println("Feed Title: " + feed.getTitle());
 
      for (Iterator i = feed.getEntries().iterator(); i.hasNext();) {
        SyndEntry entry = (SyndEntry) i.next();
        System.out.println(entry.getTitle());
      }
    } finally {
      if (reader != null)
        reader.close();
    }
  }
}

Understanding the Program

Initialize the URL object with the RSS or Atom feed URL. Next, create an XmlReader, which takes the URL object as its constructor argument. Then obtain the SyndFeed object by calling build(reader) on a SyndFeedInput; this method takes the XmlReader as its argument. Finally, iterate over feed.getEntries() to print the title of each entry.
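
To make the "peeking" idea mentioned earlier a bit more concrete, here is a rough sketch of how a feed's type could be guessed from its root element, using only the StAX API that ships with the JDK. This is not how ROME is actually implemented; the class name FeedTypeSniffer and the labels it returns are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of ROME: guesses the feed type from the root element.
class FeedTypeSniffer {

    static String sniff(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Feeds do not need DTD processing, so switch it off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Skip comments and the like until the first (root) start tag.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return classify(reader);
                    }
                }
                return "unknown";
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Maps the root element name to a rough feed label.
    private static String classify(XMLStreamReader reader) {
        String root = reader.getLocalName();
        if ("rss".equals(root)) {
            String version = reader.getAttributeValue(null, "version");
            return version == null ? "RSS" : "RSS " + version;
        }
        if ("feed".equals(root)) {
            return "Atom";
        }
        if ("RDF".equals(root)) {
            return "RSS (RDF-based)";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(sniff("<rss version=\"2.0\"><channel/></rss>"));
        System.out.println(sniff("<feed xmlns=\"http://www.w3.org/2005/Atom\"/>"));
    }
}
```

With ROME you never write this yourself; the point is only that the root element is usually enough to decide which parser applies.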

References

https://rome.dev.java.net/
http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared
http://www.rss-specifications.com

Tuesday 15 March 2011

Streaming API for XML (StaX)

1. Overview

Streaming API for XML, called StaX, is an API for reading and writing XML documents.
StaX uses a pull-parsing model: the application controls parsing by pulling (requesting) events from the parser.
The core StaX API falls into two categories:

Cursor API
Event Iterator API

Applications can use either of these two APIs for parsing XML documents. The following will focus on the event iterator API as I consider it more convenient to use.
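
For contrast, here is a minimal sketch of the Cursor API, the one this post does not otherwise cover. It assumes the XML is passed in as a String, and the class and method names (CursorApiExample, textOf) are invented for the example. The application pulls one event at a time with next() and asks the reader itself for details.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorApiExample {

    // Collects the text of every element with the given name.
    // getElementText() only works for elements that contain text and no child elements.
    static List<String> textOf(String xml, String elementName) {
        List<String> values = new ArrayList<String>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int eventType = reader.next(); // pull the next event
                    if (eventType == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        values.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<config><item><unit>900</unit></item>"
                + "<item><unit>400</unit></item></config>";
        System.out.println(textOf(xml, "unit")); // prints [900, 400]
    }
}
```

The cursor never hands out event objects, which keeps it light; the event iterator API trades a little overhead for events you can store and pass around.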

2. Event Iterator API

The event iterator API has two main interfaces: XMLEventReader for parsing XML and XMLEventWriter for generating XML.
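
As a small sketch of the writing side, the following uses XMLEventWriter together with XMLEventFactory to produce a one-item document shaped like the config file read later in this post. The class name EventWriterExample and the writeItem method are assumptions made for the example, and it writes into a String rather than a file to stay self-contained.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class EventWriterExample {

    // Builds <config><item date="..."><mode>...</mode></item></config> as a String.
    static String writeItem(String date, String mode) {
        StringWriter out = new StringWriter();
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            writer.add(events.createStartDocument());
            writer.add(events.createStartElement("", "", "config"));
            writer.add(events.createStartElement("", "", "item"));
            // An attribute event belongs to the most recently opened start tag.
            writer.add(events.createAttribute("date", date));
            writer.add(events.createStartElement("", "", "mode"));
            writer.add(events.createCharacters(mode));
            writer.add(events.createEndElement("", "", "mode"));
            writer.add(events.createEndElement("", "", "item"));
            writer.add(events.createEndElement("", "", "config"));
            writer.add(events.createEndDocument());
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeItem("January 2009", "1"));
    }
}
```

Every start element needs a matching end element event; the writer does not close tags for you.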

3. XMLEventReader - Read XML Example

4. Write XML File- Example

XMLEventReader - Read XML file using STAX

Applications loop over the entire document, requesting the next event each time. The Event Iterator API is implemented on top of the Cursor API.
In this example we will read the following XML document and create objects from it.

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <item date="January 2009">
        <mode>1</mode>
        <unit>900</unit>
        <current>1</current>
        <interactive>1</interactive>
    </item>
    <item date="February 2009">
        <mode>2</mode>
        <unit>400</unit>
        <current>2</current>
        <interactive>5</interactive>
    </item>
    <item date="December 2009">
        <mode>9</mode>
        <unit>5</unit>
        <current>100</current>
        <interactive>3</interactive>
    </item>
</config>

This is the POJO class that holds the information stored in the above XML file:

public class Item {
    private String date; 
    private String mode;
    private String unit;
    private String current;
    private String interactive;
    
    public String getDate() {
        return date;
    }
    
    public void setDate(String date) {
        this.date = date;
    }
    public String getMode() {
        return mode;
    }
    public void setMode(String mode) {
        this.mode = mode;
    }
    public String getUnit() {
        return unit;
    }
    public void setUnit(String unit) {
        this.unit = unit;
    }
    public String getCurrent() {
        return current;
    }
    public void setCurrent(String current) {
        this.current = current;
    }
    public String getInteractive() {
        return interactive;
    }
    public void setInteractive(String interactive) {
        this.interactive = interactive;
    }

    @Override
    public String toString() {
        return "Item [current=" + current + ", date=" + date + ", interactive="
                + interactive + ", mode=" + mode + ", unit=" + unit + "]";
    }
}

The following reads the XML file and creates a List of Item objects from the entries in the XML file.

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

import com.vani.xml.stax.model.Item;

public class StaXParser {
    static final String DATE = "date";
    static final String ITEM = "item";
    static final String MODE = "mode";
    static final String UNIT = "unit";
    static final String CURRENT = "current";
    static final String INTERACTIVE = "interactive";

    @SuppressWarnings({ "unchecked", "null" })
    public List<Item> readConfig(String configFile) {
        List<Item> items = new ArrayList<Item>();
        try {
            // First create a new XMLInputFactory
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            // Setup a new eventReader
            InputStream in = new FileInputStream(configFile);
            XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
            // Read the XML document
            Item item = null;

            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();

                if (event.isStartElement()) {
                    StartElement startElement = event.asStartElement();
                    // If we have a item element we create a new item
                    if (startElement.getName().getLocalPart().equals(ITEM)) {
                        item = new Item();
                        // We read the attributes from this tag and add the date
                        // attribute to our object
                        Iterator<Attribute> attributes = startElement
                                .getAttributes();
                        while (attributes.hasNext()) {
                            Attribute attribute = attributes.next();
                            if (attribute.getName().toString().equals(DATE)) {
                                item.setDate(attribute.getValue());
                            }

                        }
                    }

                    if (event.asStartElement().getName().getLocalPart()
                            .equals(MODE)) {
                        event = eventReader.nextEvent();
                        item.setMode(event.asCharacters().getData());
                        continue;
                    }
                    if (event.asStartElement().getName().getLocalPart()
                            .equals(UNIT)) {
                        event = eventReader.nextEvent();
                        item.setUnit(event.asCharacters().getData());
                        continue;
                    }

                    if (event.asStartElement().getName().getLocalPart()
                            .equals(CURRENT)) {
                        event = eventReader.nextEvent();
                        item.setCurrent(event.asCharacters().getData());
                        continue;
                    }

                    if (event.asStartElement().getName().getLocalPart()
                            .equals(INTERACTIVE)) {
                        event = eventReader.nextEvent();
                        item.setInteractive(event.asCharacters().getData());
                        continue;
                    }
                }
                // If we reach the end of an item element we add it to the list
                if (event.isEndElement()) {
                    EndElement endElement = event.asEndElement();
                    if (endElement.getName().getLocalPart().equals(ITEM)) {
                        items.add(item);
                    }
                }

            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        }
        return items;
    }

}

Testing the program:

import java.util.List;

import com.vani.xml.stax.model.Item;

public class TestRead {
    public static void main(String args[]) {
        StaXParser read = new StaXParser();
        List<Item> readConfig = read.readConfig("config.xml");
        for (Item item : readConfig) {
            System.out.println(item);
        }
    }
}

JDOM (index)

Using JDOM to read a web.xml file

Now let's see JDOM in action by looking at how you could use it to parse a web.xml file, the Web application deployment descriptor from Servlet API 2.2. Let's assume that you want to look at the Web application to see which servlets have been registered, how many init parameters each servlet has, what security roles are defined, and whether or not the Web application is marked as distributed.
Here's a sample web.xml file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
    "http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">
<web-app>
    <servlet>
        <servlet-name>snoop</servlet-name>
        <servlet-class>SnoopServlet</servlet-class>
    </servlet>
    <servlet>
        <servlet-name>file</servlet-name>
        <servlet-class>ViewFile</servlet-class>
        <init-param>
            <param-name>initial</param-name>
            <param-value>1000</param-value>
            <description>
                The initial value for the counter  <!-- optional -->
            </description>
        </init-param>
    </servlet>
    <servlet-mapping>
        <servlet-name>mv</servlet-name>
        <url-pattern>*.wm</url-pattern>
    </servlet-mapping>
    <distributed/>
    <security-role>
      <role-name>manager</role-name>
      <role-name>director</role-name>
      <role-name>president</role-name>
    </security-role>
</web-app>

On processing that file, you'd want to get output that looks like this:

This WAR has 2 registered servlets:
        snoop for SnoopServlet (it has 0 init params)
        file for ViewFile (it has 1 init params)
This WAR contains 3 roles:
        manager
        director
        president
This WAR is distributed

With JDOM, achieving that output is easy. The following example reads the web.xml file, builds a JDOM document representation in memory, then extracts the pertinent information from it:

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
public class WarReader {
  public static void main(String[] args) {
    PrintStream out = System.out;
    if (args.length != 1) {
      out.println("Usage: WarReader [web.xml]");
      return;
    }
    try {
      // Request document building without validation
      SAXBuilder builder = new SAXBuilder(false);
      Document doc = builder.build(new File(args[0]));
      // Get the root element
      Element root = doc.getRootElement();
      // Print servlet information
      List servlets = root.getChildren("servlet");
      out.println("This WAR has "+ servlets.size() +" registered servlets:");
      Iterator i = servlets.iterator();
      while (i.hasNext()) {
        Element servlet = (Element) i.next();
        out.print("\t" + servlet.getChild("servlet-name")
                                .getText() +
                  " for " + servlet.getChild("servlet-class")
                                .getText());
        List initParams = servlet.getChildren("init-param");
        out.println(" (it has " + initParams.size() + " init params)");
      }
      // Print security role information
      List securityRoles = root.getChildren("security-role");
      if (securityRoles.size() == 0) {
        out.println("This WAR contains no roles");
      }
      else {
        Element securityRole = (Element) securityRoles.get(0);
        List roleNames = securityRole.getChildren("role-name");
        out.println("This WAR contains " + roleNames.size() + " roles:");
        i = roleNames.iterator();
        while (i.hasNext()) {
          Element e = (Element) i.next();
          out.println("\t" + e.getText());
        }
      }
      // Print distributed information (notice this is out of order)
      List distrib = root.getChildren("distributed");
      if (distrib.size() == 0) {
        out.println("This WAR is not distributed");
      } else {
        out.println("This WAR is distributed");
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Reading the element data in jdom

Getting the root element
Every XML document must have a root element. That element is the starting point for accessing all the information within the document. For example, the following snippet of a document has <root-demo> as the root:

<root-demo id="demo">
  <description>Gotta fit servlets in somewhere!</description>
  <distributable/>
 </root-demo>

root-demo is the root of this xml document.

The root Element instance is available on a Document directly:

Element webapp = doc.getRootElement();

Getting the children
You can obtain an Element's children with various methods. getChild() returns null if no child by that name exists.

List getChildren(); // return all children
List getChildren(String name); // return all children by name
Element getChild(String name); // return first child by name

To demonstrate:

// Get a List of all direct children as Element objects
List allChildren = element.getChildren();
out.println("First kid: " + ((Element)allChildren.get(0)).getName());
// Get a list of all direct children with a given name
List namedChildren = element.getChildren("name");
// Get the first kid with a given name (null if there is none)
Element kid = element.getChild("name");

Using getChild() makes it easy to quickly access nested elements when the structure of the XML document is known in advance. Given this XML:

<?xml version="1.0"?>
<linux-config>
  <gui>
    <window-manager>
      <name>Enlightenment</name>
      <version>0.16.2</version>
    </window-manager>
    <!-- etc -->
  </gui>
</linux-config>

This code directly retrieves the current window manager name:

String windowManager = rootElement.getChild("gui")
                                  .getChild("window-manager")
                                  .getChild("name")
                                  .getText();

Just be careful about NullPointerExceptions if the document has not been validated: getChild() returns null when a child is missing, so a chained call fails at that point. For simpler document navigation, future JDOM versions are likely to support XPath references. Children can get their parent using getParent().

Getting the element attributes

Elements can also carry attributes. The following <table> element has width and border attributes:

<table width="100%" border="0">  </table>

Those attributes are directly available on an Element.

String width = table.getAttributeValue("width");

You can also retrieve the attribute as an Attribute instance. That ability helps JDOM support advanced concepts such as Attributes residing in a namespace. (See the section Namespaces later in the article for more information.)

Attribute widthAttrib = table.getAttribute("width");
  String width = widthAttrib.getValue();

For convenience you can retrieve attributes as various primitive types.

int border = table.getAttribute("border").getIntValue();

You can retrieve the value as any Java primitive type. If the attribute cannot be converted to the primitive type, a DataConversionException is thrown. If the attribute does not exist, then the getAttribute() call returns null.

Extracting element content

We touched on getting element content earlier, and showed how easy it is to extract an element's text content using element.getText(). That is the standard case, useful for elements that look like this:

<name>Enlightenment</name>

But sometimes an element can contain comments, text content, and child elements. It may even contain, in advanced documents, a processing instruction:

<table>
    <!-- Some comment -->
    Some text
    <tr>Some child</tr>
    <?pi Some processing instruction?>
  </table>

This isn't a big deal. You can retrieve text and children as always:

String text = table.getText(); // "Some text"
  Element tr = table.getChild("tr"); // <tr> child

That keeps the standard uses simple. Sometimes, as when writing output, it's important to get all the content of an Element in the right order. For that you can use a special method on Element called getMixedContent(). It returns a List of content that may contain instances of Comment, String, Element, and ProcessingInstruction. Java programmers can use instanceof to determine what's what and act accordingly. The following code prints out a summary of an element's content:

List mixedContent = table.getMixedContent();
  Iterator i = mixedContent.iterator();
  while (i.hasNext()) {
    Object o = i.next();
    if (o instanceof Comment) {
      // Comment has a toString()
      out.println("Comment: " + o);
    }
    else if (o instanceof String) {
      out.println("String: " + o);
    }
    else if (o instanceof ProcessingInstruction) {
      out.println("PI: " + ((ProcessingInstruction)o).getTarget());
    }
    else if (o instanceof Element) {
      out.println("Element: " + ((Element)o).getName());
    }
  }

Dealing with processing instructions

Processing instructions (often called PIs for short) are something that certain XML documents have in order to control the tool that's processing them. For example, with the Cocoon Web content creation library, the XML files may have cocoon processing instructions that look like this:

<?cocoon-process type="xslt"?>

Each ProcessingInstruction instance has a target and data. The target is the first word, the data is everything afterward, and they're retrieved by using getTarget() and getData().

String target = pi.getTarget(); // cocoon-process
  String data = pi.getData(); // type="xslt"

Since the data often appears like a list of attributes, the ProcessingInstruction class internally parses the data and supports getting data attribute values directly with getValue(String name):

String type = pi.getValue("type");  // xslt
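
The target/data split is part of XML itself rather than something specific to JDOM. As a sketch that runs without the JDOM jar, the StAX cursor API from the JDK exposes the same two pieces through getPITarget() and getPIData(). The class name ProcessingInstructionExample and the firstPi method are invented for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ProcessingInstructionExample {

    // Returns {target, data} of the first processing instruction, or null if there is none.
    static String[] firstPi(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.PROCESSING_INSTRUCTION) {
                        return new String[] { reader.getPITarget(), reader.getPIData() };
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String[] pi = firstPi("<?cocoon-process type=\"xslt\"?><page/>");
        System.out.println("target: " + pi[0]);
        System.out.println("data:   " + pi[1]);
    }
}
```

Unlike JDOM's getValue(String name), StAX hands back the data as one string, so picking out type would be left to the caller.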

You can find PIs anywhere in the document, just like Comment objects, and can retrieve them the same way as Comments -- using getMixedContent():

List mixed = element.getMixedContent();  // List may contain PIs

PIs may reside outside the root Element, in which case they're available using the getMixedContent() method on Document:

List mixed = doc.getMixedContent();

It's actually very common for PIs to be placed outside the root element, so for convenience, the Document class has several methods that help retrieve all the Document-level PIs, either by name or as one large bunch:

List allOfThem = doc.getProcessingInstructions();
  List someOfThem = doc.getProcessingInstructions("cocoon-process");
  ProcessingInstruction oneOfThem =
    doc.getProcessingInstruction("cocoon-process");

That allows the Cocoon parser to read the first cocoon-process type with code like this:

String type =
    doc.getProcessingInstruction("cocoon-process").getValue("type");

As you probably expect, getProcessingInstruction(String) will return null if no such PI exists.

Namespaces

Namespaces are an advanced XML concept that has been gaining in importance. Namespaces allow elements with the same local name to be treated differently because they're in different namespaces. It works similarly to Java packages and helps avoid name collisions.
Namespaces are supported in JDOM using the helper class org.jdom.Namespace. You retrieve namespaces using the Namespace.getNamespace(String prefix, String uri) method. In XML the following code declares the xhtml prefix to correspond to the URL "http://www.w3.org/1999/xhtml". Then <xhtml:title> is treated as a title in the "http://www.w3.org/1999/xhtml" namespace.

<html xmlns:xhtml="http://www.w3.org/1999/xhtml">

When a child is in a namespace, you can retrieve it using overloaded versions of getChild() and getChildren() that take a second Namespace argument.

Namespace ns =
    Namespace.getNamespace("xhtml", "http://www.w3.org/1999/xhtml");
  List kids = element.getChildren("p", ns);
  Element kid = element.getChild("title", ns);

If a Namespace is not given, the element is assumed to be in the default namespace, which lets Java programmers ignore namespaces if they so desire.
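
To see the "same local name, different namespace" point in something runnable without the JDOM jar, here is a small sketch using the JDK's built-in W3C DOM parser instead of JDOM. The class name NamespaceLookup and the sample document are assumptions for illustration. Only the title carrying the xhtml prefix is counted; the plain title is not.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NamespaceLookup {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Counts elements with the given local name that live in the given namespace.
    static int count(String xml, String namespaceUri, String localName) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // DOM parsers ignore namespaces unless asked not to.
        factory.setNamespaceAware(true);
        try {
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, localName).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html xmlns:xhtml=\"" + XHTML + "\">"
                + "<xhtml:title>In the XHTML namespace</xhtml:title>"
                + "<title>No namespace</title>"
                + "</html>";
        System.out.println(count(xml, XHTML, "title")); // prints 1
    }
}
```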

Making a list, checking it twice

JDOM has been designed using the List and Map interfaces from the Java 2 Collections API. The Collections API provides JDOM with great power and flexibility through standard APIs. It does mean that to use JDOM, you either have to use Java 2 (JDK 1.2) or use JDK 1.1 with the Collections library installed.
All the List and Map objects are mutable, meaning their contents can be changed, reordered, added to, or deleted, and the change will affect the Document itself -- unless you explicitly copy the List or Map first. We'll get deeper into that in Part 2 of the article.

Exceptions

As you probably noticed, several exception classes in the JDOM library can be thrown to indicate various error situations. As a convenience, all of those exceptions extend the same base class, JDOMException. That allows you the flexibility to catch specific exceptions or all JDOM exceptions with a single try/catch block. JDOMException itself is usually thrown to indicate the occurrence of an underlying exception such as a parse error; in that case, you can retrieve the root cause exception using the getRootCause() method. That is similar to how RemoteException behaves in RMI code and how ServletException behaves in servlet code. However, the underlying exception isn't often needed because the JDOMException message contains information such as the parse problem and line number.

Reading the DocType via jdom

The DocType looks like this in xml:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The first word after DOCTYPE indicates the name of the element being constrained, the word after PUBLIC is the document type's public identifier, and the last word is the document type's system identifier. The DocType is available by calling getDocType() on a Document, and the DocType class has methods to get the individual pieces of the DOCTYPE declaration.

DocType docType = doc.getDocType();
 System.out.println("Element: " + docType.getElementName());
 System.out.println("Public ID: " + docType.getPublicID());
 System.out.println("System ID: " + docType.getSystemID());

Outputting a document via JDOM

You can output a Document using an output tool, of which there are several standard ones available. The org.jdom.output.XMLOutputter tool is probably the most commonly used. It writes the document as XML to a specified OutputStream.
The SAXOutputter tool is another alternative. It generates SAX events based on the JDOM document, which you can then send to an application component that expects SAX events. In a similar manner, DOMOutputter creates a DOM document, which you can then supply to a DOM-receiving application component. The code to output a Document as XML looks like this:

XMLOutputter outputter = new XMLOutputter();
outputter.output(doc, System.out);

XMLOutputter takes parameters to customize the output. The first parameter is the indentation string; the second parameter indicates whether you should write new lines. For machine-to-machine communication, you can ignore the niceties of indentation and new lines for the sake of speed:

XMLOutputter outputter = new XMLOutputter("", false);
outputter.output(doc, System.out);

Full Code:

import java.io.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
public class PrettyPrinter {
   public static void main(String[] args) {
        // Assume filename argument
        String filename = args[0];
        try {
            // Build the document with SAX and Xerces, no validation
            SAXBuilder builder = new SAXBuilder();
            // Create the document
            Document doc = builder.build(new File(filename));
            // Output the document, use standard formatter
            XMLOutputter fmt = new XMLOutputter();
            fmt.output(doc, System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Reading the xml using JDOM

JDOM represents an XML document as an instance of the org.jdom.Document class. The Document class is a lightweight class that can hold a DocType, multiple ProcessingInstruction objects, a root Element, and Comment objects. You can construct a Document from scratch without needing a factory:

Document doc = new Document(new Element("rootElement"));

JDOM also offers several ways of building a Document from existing XML, for example:

Builders

SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(url);

You can build documents from any data source using builder classes found in the org.jdom.input package. Currently there are two builders, SAXBuilder and DOMBuilder. SAXBuilder uses a SAX parser behind the scenes to build the Document from the file; the SAXBuilder listens for the SAX events and builds a corresponding Document in memory. That approach is very fast (basically as fast as SAX), and it is the approach we recommend. DOMBuilder is another alternative that builds a JDOM Document from an existing org.w3c.dom.Document object. It allows JDOM to interface easily with tools that construct DOM trees.
Builders are also being developed that construct JDOM Document objects from SQL queries, LDAP queries, and other data formats. So, once in memory, documents are not tied to their build tool.
Examples with SAX builder:
1. String source

String data =
        "<root>" +
        "<Companyname>" +
        "<Employee name=\"Girish\" Age=\"25\">Developer</Employee>" +
        "</Companyname>" +
        "<Companyname>" +
        "<Employee name=\"Komal\" Age=\"25\">Administrator</Employee>" +
        "</Companyname>" +
        "</root>";
SAXBuilder builder = new SAXBuilder();
Document document = builder.build(new ByteArrayInputStream(data.getBytes()));

2. From URL

SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(url);

Validation
The SAXBuilder and DOMBuilder constructors let the user specify if validation should be turned on, as well as which parser class should perform the actual parsing duties.

public SAXBuilder(String parserClass, boolean validation);
public DOMBuilder(String adapterClass, boolean validation);

The defaults are to use Apache's open source Xerces parser and to turn off validation. Notice that the DOMBuilder doesn't take a parserClass but rather an adapterClass. That is because not all DOM parsers have the same API. To still allow user-pluggable parsers, JDOM uses an adapter class that has a common API for all DOM parsers. Adapters have been written for all the popular DOM parsers, including Apache's Xerces, Crimson, IBM's XML4J, Sun's Project X, and Oracle's parsers V1 and V2. Each one implements that standard interface by making the right method calls on the backend parser. That works somewhat similarly to JAXP (Resources) except it supports newer parsers that JAXP does not yet support.

Jdom resources

Read Part 2 of "Easy Java/XML Integration with JDOM," Jason Hunter and Brett McLaughlin (July, 2000) to learn how to use JDOM to create and mutate XML
http://www.javaworld.com/javaworld/jw-07-2000/jw-0728-jdom2.html
The home of JDOM
http://jdom.org/
Mailing list sign-up for jdom-interest and jdom-announce, as well as list archives
http://jdom.org/involved/lists.html
The JDOM announcement press release
http://www.oreillynet.com/pub/a/mediakit/pressrelease/20000427.html
More information on DOM
http://www.w3.org/DOM/
More information on SAX
http://www.megginson.com/SAX/
More information on JAXP
http://java.sun.com/xml/

The idea behind JDOM

These are the main ideas behind JDOM:

The JDOM API has been developed to be straightforward for Java programmers. While other XML APIs were created to be cross-language (supporting the same API for Java, C++, and even JavaScript), JDOM takes advantage of Java's abilities by using features such as method overloading, the Collections APIs, and (behind the scenes) reflection.
To be straightforward, the API has to represent the document in a way programmers would expect. For example, how would a Java programmer expect to get the text content of an element?
```
<element>This is my text content</element>
```
In some APIs, an element's text content is available only as a child Node of the Element. While technically correct, that design requires the following code to access an element's content:
```
String content = element.getFirstChild()
                    .getValue();
```
However, JDOM makes the text content available in a more straightforward way:
```
String text = element.getText();
```
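
For comparison, the child-node style described above looks roughly like this with the W3C DOM parser that ships with the JDK, where the accessor on Node is getNodeValue(). This is a sketch for illustration only; the class name DomTextExample is made up, and the XML is passed in as a String to keep it self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTextExample {

    // Returns the text of the root element by going through its first child node.
    static String textOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // In DOM the text is not on the element itself but on a child Text node.
            Node firstChild = root.getFirstChild();
            return firstChild == null ? "" : firstChild.getNodeValue();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints This is my text content
        System.out.println(textOfRoot("<element>This is my text content</element>"));
    }
}
```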
Wherever possible, JDOM makes the programmer's job easier. The rule of thumb is that JDOM should help solve 80 percent or more of Java/XML problems with 20 percent or less of the traditional effort. That does not mean that JDOM conforms to only 80 percent of the XML specification. (In fact, we expect that JDOM will be fully compliant before the 1.0 final release.) What that rule of thumb does mean is that just because something could be added to the API doesn't mean it will. The API should remain sleek.
The second idea is that JDOM should be fast and lightweight. Loading and manipulating documents should be quick, and memory requirements should be low. JDOM's design definitely allows for that. For example, even the early, untuned implementation has operated more quickly than DOM and roughly on par with SAX, even though it has many more features than SAX.

Why do we need JDOM when we have SAX and DOM?

JDOM vs SAX and DOM

DOM represents a document tree fully held in memory. It is a large API designed to perform almost every conceivable XML task. It also must have the same API across multiple languages. Because of those constraints, DOM does not always come naturally to Java developers who expect typical Java capabilities such as method overloading, the use of standard Java object types, and simple set and get methods. DOM also requires lots of processing power and memory, making it impractical for many lightweight Web applications and programs.

SAX does not hold a document tree in memory. Instead, it presents a view of the document as a sequence of events. For example, it reports every time it encounters a begin tag and an end tag. That approach makes it a lightweight API that is good for fast reading. However, the event-view of a document is not intuitive to many of today's server-side, object oriented Java developers. SAX also does not support modifying the document, nor does it allow random access to the document.
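The event view is easier to picture with a small example. The following sketch uses the SAX parser bundled with the JDK and simply records each begin tag and end tag as it is reported. The class name SaxEventDemo, the method events and the "start:"/"end:" labels are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing SAX as a stream of callbacks.
class SaxEventDemo {

    // Returns the begin-tag and end-tag events in the order SAX reports them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            // No tree is built; the handler only sees events as they pass by.
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<feed><item>a</item></feed>"));
    }
}
```

Once an event has been delivered it is gone, so there is nothing to go back to or modify afterwards. That is the trade-off the paragraph above describes.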

JDOM attempts to incorporate the best of DOM and SAX. It's a lightweight API designed to perform quickly in a small-memory footprint. JDOM also provides a full document view with random access but, surprisingly, it does not require the entire document to be in memory. The API allows for future flyweight implementations that load information only when needed. Additionally, JDOM supports easy document modification through standard constructors and normal set methods.

JDOM tutorial : Introduction

JDOM is an open source API designed to represent an XML document and its contents to the typical Java developer in an intuitive and straightforward way. As the name indicates, JDOM is Java optimized. It behaves like Java, it uses Java collections, and it provides a low-cost entry point for using XML. JDOM users don't need to have tremendous expertise in XML to be productive and get their jobs done.
JDOM interoperates well with existing standards such as the Simple API for XML (SAX) and the Document Object Model (DOM). However, it's more than a simple abstraction above those APIs. JDOM takes the best concepts from existing APIs and creates a new set of classes and interfaces that provide, in the words of one JDOM user, "the interface I expected when I first looked at org.w3c.dom." JDOM can read from existing DOM and SAX sources, and can output to DOM- and SAX-receiving components. That ability enables JDOM to interoperate seamlessly with existing program components built against SAX or DOM.
JDOM has been made available under an Apache-style, open source license. That license is among the least restrictive software licenses available, enabling developers to use JDOM in creating products without requiring them to release their own products as open source. It is the license model used by the Apache Project, which created the Apache server. In addition to making the software free, being open source enables the API to take contributions from some of the best Java and XML minds in the industry and to adapt quickly to new standards as they evolve.
History of JDOM
The JDOM API was developed by Jason Hunter and Brett McLaughlin in March 2000. It is now maintained by the JDOM project at http://www.jdom.org/, where you can download the latest JDOM libraries and source files.
The JDOM API was developed to provide a fast and robust API for processing XML documents. It is designed specifically for the Java platform: it uses the built-in String support of the Java language and makes use of the Java 2 collection classes wherever possible. These choices help JDOM perform well.
Downloading JDOM API
The JDOM API is distributed from its official website at http://www.jdom.org/, which offers the latest source and binary versions.
At the time of writing, the current version of JDOM is 1.1.1, which can be downloaded from http://www.jdom.org/downloads/source.html

Saturday 12 March 2011

XML in Java (toc)

This tutorial gives a brief introduction to XML:

Parsing an XML Document with XPath

Generating XML

Writing xml to file

XMLEncoder and XMLDecoder

Friday 11 March 2011

XMLEncoder and XMLDecoder

We will look at two approaches to representing data from Java programs in XML format. One approach is to design a custom XML language for the specific data structures that you want to represent. We will consider this approach in the next subsection. First, we'll look at an easy way to store data in XML files and to read those files back into a program. The technique uses the classes XMLEncoder and XMLDecoder. These classes are defined in the package java.beans. An XMLEncoder can be used to write objects to an OutputStream in XML form. An XMLDecoder can be used to read the output of an XMLEncoder and reconstruct the objects that were written by it. XMLEncoder and XMLDecoder have much the same functionality as ObjectOutputStream and ObjectInputStream and are used in much the same way. In fact, you don't even have to know anything about XML to use them. However, you do need to know a little about Java beans.

XMLEncoder and XMLDecoder can't be used with arbitrary objects; they can only be used with beans. When an XMLEncoder writes an object, it uses the "get" methods of that object to find out what information needs to be saved. When an XMLDecoder reconstructs an object, it creates the object using the constructor with no parameters and it uses "set" methods to restore the object's state to the values that were saved by the XMLEncoder. (Some standard java classes are processed using additional techniques. For example, a different constructor might be used, and other methods might be used to inspect and restore the state.)
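The get/set mechanism can be seen in a small round trip. This is a minimal sketch, not code from SimplePaint: the Note bean and the BeanRoundTrip helper are hypothetical names, and the data is written to a byte array instead of a file so that the example is self-contained.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical helper class for illustration only.
class BeanRoundTrip {

    // Hypothetical bean: public class, no-argument constructor, get/set pairs.
    public static class Note {
        private String title;
        private int priority;

        public Note() { }

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public int getPriority() { return priority; }
        public void setPriority(int priority) { this.priority = priority; }
    }

    // The encoder calls the "get" methods to discover what has to be saved.
    static byte[] encode(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return out.toByteArray();
    }

    // The decoder calls the no-argument constructor, then the "set" methods.
    // Passing the class loader explicitly is a precaution taken in this sketch
    // so that the bean class is found however the program was started.
    static Object decode(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml),
                null, null, BeanRoundTrip.class.getClassLoader())) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Note note = new Note();
        note.setTitle("groceries");
        note.setPriority(3);
        Note copy = (Note) decode(encode(note));
        System.out.println(copy.getTitle() + " / " + copy.getPriority());
    }
}
```

If one of the set methods is removed from Note, that property is no longer treated as a readable and writable bean property, which is why the get/set pairs matter.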

Suppose that we want to use XMLEncoder and XMLDecoder to create and read files in the SimplePaint program. Part of the data for a SimplePaint sketch is stored in objects of type CurveData, defined as:

private static class CurveData {
   Color color;  // The color of the curve.
   boolean symmetric;  // Are reflections also drawn?
   ArrayList<Point> points;  // The points on the curve.
}

To use such objects with XMLEncoder and XMLDecoder, we have to modify this class so that it follows the Java bean pattern. The class has to be public, and we need get and set methods for each instance variable. This gives:

public static class CurveData {
   private Color color;  // The color of the curve.
   private boolean symmetric;  // Are reflections also drawn?
   private ArrayList<Point> points;  // The points on the curve.
   public Color getColor() {
      return color;
   }
   public void setColor(Color color) {
      this.color = color;
   }
   public ArrayList<Point> getPoints() {
      return points;
   }
   public void setPoints(ArrayList<Point> points) {
      this.points = points;
   }
   public boolean isSymmetric() {
      return symmetric;
   }
   public void setSymmetric(boolean symmetric) {
      this.symmetric = symmetric;
   }
}

I didn't really need to make the instance variables private, but bean properties are usually private and are accessed only through their get and set methods.

At this point, we might define another bean class, SketchData, to hold all the necessary data for representing the user's picture. If we did that, we could write the data to a file with a single output statement. In my program, however, I decided to write the data in several pieces.

An XMLEncoder can be constructed to write to any output stream. The output stream is specified in the encoder's constructor. For example, to create an encoder for writing to a file:

XMLEncoder encoder; 
try {
   FileOutputStream stream = new FileOutputStream(selectedFile); 
   encoder = new XMLEncoder( stream );
     .
     .

Once an encoder has been created, its writeObject() method is used to write objects, coded into XML form, to the stream. In the SimplePaint program, I save the background color, the number of curves in the picture, and the data for each curve. The curve data are stored in a list of type ArrayList<CurveData> named curves. So, a complete representation of the user's picture can be created with:

encoder.writeObject(getBackground());
encoder.writeObject(Integer.valueOf(curves.size()));
for (CurveData c : curves)
   encoder.writeObject(c);
encoder.close();

When reading the data back into the program, an XMLDecoder is created to read from an input file stream. The objects are then read, using the decoder's readObject() method, in the same order in which they were written. Since the return type of readObject() is Object, the returned values must be type-cast to their correct type:

Color bgColor = (Color)decoder.readObject();
Integer curveCt = (Integer)decoder.readObject();
ArrayList<CurveData> newCurves = new ArrayList<CurveData>();
for (int i = 0; i < curveCt; i++) {
   CurveData c = (CurveData)decoder.readObject();
   newCurves.add(c);
}
decoder.close();
curves = newCurves; // Replace the program's data with data from the file.
setBackground(bgColor);
repaint();
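The "count first, then the items" pattern can be tried on its own, without the rest of SimplePaint. The sketch below is an illustration under assumptions: OrderedObjectStream, writeAll and readAll are hypothetical names, plain strings stand in for the CurveData objects, and a byte array stands in for the file.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class OrderedObjectStream {

    // Writes the number of items first, then each item in turn.
    static byte[] writeAll(List<String> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(Integer.valueOf(items.size()));
            for (String item : items) {
                encoder.writeObject(item);
            }
        }
        return out.toByteArray();
    }

    // Reads the objects back in exactly the order in which they were written.
    static List<String> readAll(byte[] xml) {
        List<String> items = new ArrayList<>();
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            int count = (Integer) decoder.readObject();
            for (int i = 0; i < count; i++) {
                items.add((String) decoder.readObject());
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> curves = new ArrayList<>();
        curves.add("first curve");
        curves.add("second curve");
        System.out.println(readAll(writeAll(curves)));
    }
}
```

Writing the count up front means the reader knows how many readObject() calls to make. Reading in a different order than writing would make the type-casts fail with a ClassCastException.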

The XML format used by XMLEncoder and XMLDecoder is more robust than the binary format used for object streams and is more appropriate for long-term storage of objects in files.

Full program