
Thursday 11 July 2013

How to create a DOCX file using the docx4j API in Java.

How to read a Word template?


  1. A .docx file is merely a zip archive of XML files (plus any binary files for embedded objects such as images). One way to fill a template is therefore to unpack the zip file, feed word/document.xml to a template engine that does the merging, and then zip the output again to get the new .docx file.
  2. To generate a binary .doc file, Apache POI can be used, although it is not obvious how to read the XML and write it into a .doc file; alternatively, Jasper (JasperReports) can read the XML file and write the .doc file.
  3. DOCX-to-DOCX generation (filling a .docx template) can be done by feeding the values into the XML and saving the result with the docx4j jar.
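The unzip, merge and re-zip approach from point 1 can be sketched with nothing but java.util.zip. This is a minimal sketch, not docx4j's API: the class name DocxTemplateMerge, the ${key} placeholder syntax and the assumption that word/document.xml is UTF-8 are illustrative choices. One caveat: Word often splits a typed placeholder across several <w:r> runs, so a plain text search like this one only finds placeholders that stayed inside a single run.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxTemplateMerge {

    // Hypothetical placeholder syntax: ${key}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    /** Replaces ${key} placeholders with XML-escaped values; unknown keys are left as-is. */
    public static String fill(String xml, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Escapes the characters that are not allowed literally in XML text content.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Unzips the template, merges word/document.xml, and zips every entry back up. */
    public static byte[] merge(byte[] docx, Map<String, String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream out = new ZipOutputStream(bytes)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = readEntry(in);
                if (entry.getName().equals("word/document.xml")) {
                    // Assumes the part is UTF-8, which is what Word writes.
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = fill(xml, values).getBytes(StandardCharsets.UTF_8);
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                out.write(data);
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads the bytes of the current zip entry only.
    private static byte[] readEntry(ZipInputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```

Reading a template from disk and writing the result is then a matter of Files.write(Paths.get("out.docx"), DocxTemplateMerge.merge(Files.readAllBytes(Paths.get("template.docx")), values)). docx4j also offers ${...} style substitution on its own object model through MainDocumentPart.variableReplace.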




What can docx4j do?


  • Open existing docx (from filesystem, SMB/CIFS, WebDAV using VFS), pptx, xlsx.
  • Create new docx, pptx, xlsx.
  • Programmatically manipulate the above (of course).
Specific to docx4j (as opposed to pptx4j, xlsx4j):
  • Template substitution; CustomXML binding.
  • Produce/consume Word 2007's xmlPackage (pkg) format.
  • Save docx to the filesystem as a docx (i.e. zipped), or to JCR (unzipped).
  • Apply transforms, including common filters.
  • Export as HTML or PDF.
  • Diff/compare documents, paragraphs or SDTs (content controls).
  • Font support (font substitution, and use of any fonts embedded in the document).
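To show what "create new docx" involves at the package level, here is a hedged sketch that writes, without docx4j, the three parts a minimal WordprocessingML package needs: [Content_Types].xml, the root relationships file _rels/.rels, and word/document.xml. The class MinimalDocx and its create method are hypothetical names for illustration; docx4j's package model takes care of these parts for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class MinimalDocx {

    // Declares the content type of every part in the package.
    private static final String CONTENT_TYPES =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType="
        + "\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points the package at its main document part.
    private static final String ROOT_RELS =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type="
        + "\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\""
        + " Target=\"word/document.xml\"/>"
        + "</Relationships>";

    /** Builds a one-paragraph .docx in memory; the text is XML-escaped. */
    public static byte[] create(String text) throws IOException {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String document =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>"
            + "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
            + "<w:body><w:p><w:r><w:t xml:space=\"preserve\">" + escaped + "</w:t></w:r></w:p></w:body>"
            + "</w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "[Content_Types].xml", CONTENT_TYPES);
            put(zip, "_rels/.rels", ROOT_RELS);
            put(zip, "word/document.xml", document);
        }
        return bytes.toByteArray();
    }

    // Writes one UTF-8 XML part as a zip entry.
    private static void put(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }
}
```

For example, Files.write(Paths.get("hello.docx"), MinimalDocx.create("Hello")) should produce a file Word can open. With docx4j the rough equivalent is WordprocessingMLPackage.createPackage(), then getMainDocumentPart().addParagraphOfText(...) and save(...).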




Problems with Apache POI:

Apache POI's HWPF can read .doc files, and docx4j could use this for basic conversion of .doc to .docx.
The problem with this approach is that POI's HWPF code fails on many .doc files.

What approach can we use?

An effective approach is to use OpenOffice (via jodconverter) to convert the doc to docx, which docx4j
can then process. If you need to return a binary .doc, OpenOffice/jodconverter can convert the docx back to .doc.
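The doc to docx (and back) round trip described above can also be driven without JODConverter. The sketch below is a swapped-in technique, not the post's: it assumes LibreOffice's soffice launcher is on the PATH and uses its --headless and --convert-to options (Apache OpenOffice's launcher may not support --convert-to). The class OfficeConvert and its methods are hypothetical. It also checks magic bytes, since a binary .doc is an OLE2 compound file (signature D0 CF 11 E0 A1 B1 1A E1) while a .docx is a zip archive (signature PK 03 04).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class OfficeConvert {

    // OLE2 compound file signature (used by .doc, and also by .xls and .ppt).
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // Zip local file header "PK\3\4" (used by .docx, and also by .xlsx, .pptx, .jar).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** True if the header starts with the OLE2 signature, e.g. a binary .doc. */
    public static boolean isOle2(byte[] header) {
        return startsWith(header, OLE2);
    }

    /** True if the header starts with a zip signature, e.g. a .docx. */
    public static boolean isZipPackage(byte[] header) {
        return startsWith(header, ZIP);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Builds a headless LibreOffice conversion command; format is e.g. "docx" or "doc". */
    public static List<String> command(String soffice, Path input, String format, Path outDir) {
        return Arrays.asList(soffice, "--headless", "--convert-to", format,
                             "--outdir", outDir.toString(), input.toString());
    }

    /** Runs the conversion and waits; the output keeps the input's base name. */
    public static void convert(Path input, String format, Path outDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command("soffice", input, format, outDir))
            .inheritIO()
            .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("soffice exited with code " + exit);
        }
    }
}
```

For example, OfficeConvert.convert(Paths.get("in.doc"), "docx", Paths.get("out")) should leave out/in.docx for docx4j to process, and passing "doc" converts back. JODConverter instead talks to a running office instance from Java, which avoids starting a new process for every file.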

There is also the b2xtranslator project (http://b2xtranslator.sourceforge.net/), a .NET translator from the binary Office formats to Open XML. If a pure Java approach were required, its code could be ported to Java.



Some references:

How to open and manipulate Word document/template in Java?

http://stackoverflow.com/questions/9379580/how-to-open-and-manipulate-word-document-templatein-java

Getting started with docx4j.
http://www.docx4java.org/svn/docx4j/trunk/docx4j/docs/Docx4j_GettingStarted.html

Working with Apache POI.
http://poi.apache.org/

Example to follow.
