An ODT file is a ZIP archive of XML files that describe and contain the content of a text document in the OpenDocument format. As I wrote earlier, you can use ColdFusion 8 and the CFZip tag to browse the contents of these files. Extracting the actual content of an ODT file, however, is more useful than simply browsing its entries.
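As a quick reminder of what browsing looks like, here is a minimal sketch using CFZip's list action. It assumes a file named test.odt sits in the same directory as the template; the entries variable name is only illustrative.

```cfml
<!--- Assumption: test.odt lives next to this template. --->
<cfset variables.archive = getDirectoryFromPath(expandPath("*.*")) & "test.odt">

<!--- List every entry in the archive; the result is a query object. --->
<cfzip action="list" file="#variables.archive#" name="entries">

<!--- Dump the query to see entry names such as content.xml. --->
<cfdump var="#entries#">
```

The dump should include content.xml among the entries, which is the file the rest of this post works with.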

I created a simple document to work with, containing a few paragraphs and headings:

Image of text document.

The file in the ODT archive that holds the content of the text document is named, very intuitively, content.xml. Since content.xml is an ordinary entry within the ODT archive, we can use CFZip's read action to peek at the XML:

<!--- CFZip requires an absolute path to the archive. --->
<cfset variables.archive = getDirectoryFromPath(expandPath("*.*")) & "test.odt">
<!--- Read in the contents of content.xml --->
<cfzip action="read" file="#variables.archive#" entrypath="content.xml" variable="xmlContent">
<cfset variables.content = xmlParse(variables.xmlContent)>

Dumping the parsed XML gives us a view of the document's structure:

Image of content XML dump.

As you can see, the root node of the content.xml document is named office:document-content. I have collapsed a few of the nodes to highlight the actual content nodes.

Obtaining the actual text content is straightforward with ColdFusion's xmlSearch function:

<cfset variables.text = xmlSearch(variables.content, "/office:document-content/office:body/office:text")>

The XPath expression selects the office:text node under office:body, and xmlSearch returns the match as an array. The children of the office:text node are the elements that make up the document content.
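To make that concrete, here is a rough sketch of walking those children and printing each element's name and text. It repeats the earlier steps so it can run on its own, and it assumes the same test.odt file next to the template. The textNode, child, and i variable names are only illustrative.

```cfml
<!--- Assumption: test.odt lives next to this template. --->
<cfset variables.archive = getDirectoryFromPath(expandPath("*.*")) & "test.odt">
<cfzip action="read" file="#variables.archive#" entrypath="content.xml" variable="xmlContent">
<cfset variables.content = xmlParse(variables.xmlContent)>
<cfset variables.text = xmlSearch(variables.content, "/office:document-content/office:body/office:text")>

<cfif arrayLen(variables.text)>
    <!--- The first match is the office:text node itself. --->
    <cfset variables.textNode = variables.text[1]>
    <cfoutput>
    <cfloop from="1" to="#arrayLen(variables.textNode.xmlChildren)#" index="i">
        <cfset variables.child = variables.textNode.xmlChildren[i]>
        <!--- xmlName includes the namespace prefix, for example text:p or text:h. --->
        #htmlEditFormat(variables.child.xmlName)#: #htmlEditFormat(variables.child.xmlText)#<br>
    </cfloop>
    </cfoutput>
</cfif>
```

Two caveats: xmlText only returns the text held directly by an element, so text wrapped in nested elements such as text:span will not appear here, and the child list may also include non-content elements such as text:sequence-decls.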