Random musings from my awakening dementia...
08.29.2003  
XmlReader Java Bean
 

I'm quite interested in the concept of software components and how those ideas can be applied to Java code. Thoughts or ideas I have on this subject get dropped here for the benefit of humanity and my own hubris.

© 2003-2005, Howard Abrams



Except where otherwise noted, all original content is licensed under a Creative Commons License.

Don’t know if you tried out my previous Java Bean component for reading and parsing XML files, but that one parses the whole file and then lets you pull particular pieces of information out of it with queries. Not bad if you are looking for a single piece of information.

But what if you want to get lots of information out of an XML file? A better approach might be one that is event-driven, like SAX.

However, there is nothing simple about SAX. The other limitation is that you need to store all the state yourself. For instance, let’s suppose you got a characters call with some text. If you didn’t keep track, you wouldn’t know what tag this text is a child of.
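
To make that bookkeeping concrete, here is a minimal sketch against plain SAX as it ships with the JDK (javax.xml.parsers and org.xml.sax). The class name SaxStateSketch and the textByPath helper are made up for illustration. The only point is that the handler has to keep its own stack of open tags to know where a piece of text belongs.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: the names here are hypothetical, not part of any bean.
class SaxStateSketch {

    /** Parses the XML and returns non-blank text keyed by "parent/tag". */
    static Map<String, String> textByPath(String xml) {
        final Map<String, String> result = new LinkedHashMap<>();
        // Tags that are currently open, innermost first. SAX does not keep this for us.
        final Deque<String> open = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX hands over raw text only (possibly in several chunks);
                // each chunk is trimmed here to keep the sketch short.
                String text = new String(ch, start, length).trim();
                if (text.isEmpty() || open.size() < 2) {
                    return;
                }
                Iterator<String> it = open.iterator();
                String tag = it.next();    // enclosing tag
                String parent = it.next(); // its parent
                result.merge(parent + "/" + tag, text, String::concat);
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<testconfig><foobar><debug>yes</debug></foobar>"
                   + "<bobdog><debug>no</debug></bobdog></testconfig>";
        System.out.println(textByPath(xml));
    }
}
```

Without the stack, the two debug values would be indistinguishable inside characters(), which is exactly the state-keeping chore described above.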

So, I created my XmlReader Java Bean. It takes the name of an XML file as a property, and when you call its parse() method, it generates events for each tag it encounters:

  • startDocument … Called on the listener when the bean begins processing the document.
  • endDocument … Called when processing of the document is complete.
  • startElement … Passed an XmlReaderTagEvent when a tag is encountered. The event includes the name of the tag, a listing of all the attributes, and the first child’s text (if the first child is a text node).
  • endElement … Called when the element’s closing tag is encountered.
  • characters … Passed an XmlReaderTagEvent which includes the character data encountered (see its getValue() method), along with the surrounding tag’s name and attribute information.

The XmlReaderListener (which you’ll need to implement if you want to receive the events) is similar to SAX, with one notable exception: the XmlReaderTagEvent objects it hands you carry a lot more information.

Let’s work off of an example (keep in mind that this is much easier to work with if you use a component integration tool like AppComposer). We begin with an XML file: one that would be a configuration file for our program and some others.

<?xml version="1.0" standalone="yes" ?>
<testconfig>
	<global>
		<trim options="atEnd">whitespace</trim>
	</global>
	<foobar>
		<debug>yes</debug>
		<logfile>C:\TEMP\FOOBAR.LOG</logfile>
		<welcome type="text">
			Welcome to the system. Please take note of the changes,
			and lock the door on your way out.
		</welcome>
		<welcome type="html">
			Welcome to <b>the system</b>. Please take note of the changes,
			and lock the door on your way out.
		</welcome>
	</foobar>
	<bobdog>
		<trim>ignore</trim>
		<debug>no</debug>
		<logfile />
	</bobdog>
</testconfig>

Let’s suppose that we had a class that implemented the XmlReaderListener interface:

public class myApp  implements XmlReaderListener { …

Its constructor (or some other likely location) could initialize and set up the component bean:

		XmlReader bean = new XmlReader();
		File file = new File("org/howardism/xml/XmlQueryJTest.xml");
		bean.setFile(file);
		bean.setTrimWhitespace(true);
		bean.addXmlReaderListener(this);
		bean.parse();

Our startDocument() method is called first, and then our startElement() method is called once for each tag:

	public void startElement(XmlReaderTagEvent event)
	{
		if (event.getTagName().equals("foobar"))
				inFoobarSection = true;
		...

Wait a minute. Didn’t you just say that you didn’t need to keep track of your position in the XML file? Yes, you caught me … let’s replace that code with something else:

		if (event.getTagName().equals("debug") &&
		    event.getParentTag().equals("foobar") )
				debug = event.getValue();

Hey, now that is pretty slick, huh? Well, it isn’t as slick as simply using the XmlQuery bean with a //foobar/debug query … but the idea is that you can parse lots of data with this bean.
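
For comparison, that kind of single-value lookup can be sketched with the JDK’s own javax.xml.xpath package (added in Java 5). This is not the XmlQuery bean, whose API isn’t shown in this post; XPathLookupSketch and lookup are illustrative names, and the XML is a cut-down version of the config file above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative only: uses the standard JDK XPath API, not the XmlQuery bean.
class XPathLookupSketch {

    /** Returns the string value of the first node matching the XPath query. */
    static String lookup(String xml, String query) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The whole document is read into memory before the query is evaluated.
            return xpath.evaluate(query, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<testconfig><foobar><debug>yes</debug></foobar>"
                   + "<bobdog><debug>no</debug></bobdog></testconfig>";
        System.out.println(lookup(xml, "//foobar/debug"));
    }
}
```

One query, one answer, but the whole document is loaded first, which is the trade-off against an event-driven reader when you want lots of values from a big file.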

What if you need the parent’s parent? Good question. Each event includes a getParent() method, which returns the parent event of the current event. This means that you can do particularly evil things like:

"testconfig".equals(event.getParent().getParentTag())

But isn’t the purpose of using an event-based parser that you don’t have to store a lot of extra information? Why would you be storing all this parent stuff? Another good question. If you have a large file and want to treat things more like SAX and not have the bean store so much context, you can set the StoreParentInfo property to false.
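
To show what that trade-off looks like, here is a minimal sketch of how parent-linked events could be built on top of plain SAX. It is not the bean’s actual source: TagEvent, ParentChainSketch and startEvents are hypothetical stand-ins, and returning null from getParent() when parent info is switched off is my assumption for the sketch, not documented behavior.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: hypothetical stand-ins, not the XmlReader bean's classes.
class ParentChainSketch {

    /** A start-tag event that may remember the event of its parent tag. */
    public static final class TagEvent {
        private final String tagName;
        private final TagEvent parent; // null for the root, or when links are not stored

        TagEvent(String tagName, TagEvent parent) {
            this.tagName = tagName;
            this.parent = parent;
        }

        public String getTagName() { return tagName; }
        public TagEvent getParent() { return parent; }
        public String getParentTag() { return parent == null ? null : parent.tagName; }
    }

    /** Collects one event per start tag; the list exists only so the demo can inspect them. */
    static List<TagEvent> startEvents(String xml, final boolean storeParentInfo) {
        final List<TagEvent> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private TagEvent current; // event of the innermost open tag

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                TagEvent event = new TagEvent(qName, storeParentInfo ? current : null);
                if (storeParentInfo) {
                    current = event; // each open tag keeps its ancestors reachable
                }
                events.add(event);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (storeParentInfo) {
                    current = current.getParent();
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        String xml = "<testconfig><foobar><debug>yes</debug></foobar></testconfig>";
        TagEvent debug = startEvents(xml, true).get(2);
        System.out.println(debug.getParentTag());
        System.out.println(debug.getParent().getParentTag());
    }
}
```

With the links on, every event keeps its chain of ancestors alive for as long as you hold on to it; with them off, each event stands alone and you are back to tracking position yourself, SAX-style.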

So, where can I get it? I’m glad you asked. Just download my XmlBeans archive and give it a whirl. Note: if you are using it within AppComposer, then simply uncompress the ZIP in the AppComposer destination directory.

The source code is located under src (and this includes the JUnit test suite). There are examples that demonstrate this bean’s usage as AppComposer capsules (a good place to start, I suppose). Oh, and there is plenty of documentation in the docs directory.

A comment to this from Howard the Geek

Hmmm… why does this not generate namespace events? Or better yet, why isn’t the namespace stored with the XmlReaderTagEvents? Guess I’m going to be working on a version 1.1 …

Wait, maybe I will wait for my constituency to supply me with market data? Ok, first person to ask for it makes me start to work on it!

Comment posted on Friday, 29 August 2003