This week I was working on a proof-of-concept for a customer in which we wanted to extract attachments from a fillable PDF form after its submission. After trying the usual CFPDF tags, reading the documentation, and getting nowhere, we decided to try the iText PDF library for Java from within ColdFusion to extract the attachments.
After reading through the JavaDocs for iText, I found that it includes a class called ExtractAttachments that “…lets you extract the attachments of a PDF.” Since this is precisely what we wanted to do, I thought for sure this would be a simple affair and I’d be able to finish what I needed to do and turn in early for the evening. Not quite!
I downloaded Mark Mandel’s excellent JavaLoader project and dropped it into my web root. Then I grabbed the two JAR files from iText that I needed and put them in a folder named “lib” in the web root (remember, this is a POC, so I didn’t mind having that stuff in the web root).
Making the classes contained in the JARs available to ColdFusion was a snap thanks to JavaLoader. You create an array of paths to your JAR files (directories of class files work too) and pass that array into the JavaLoader constructor like so:
<!--- Build an array of JAR files that JavaLoader should reference --->
<cfset loadPaths = arrayNew(1) />
<cfset arrayAppend( loadPaths, expandPath("/lib/iText-2.1.7.jar") ) />
<cfset arrayAppend( loadPaths, expandPath("/lib/iText-toolbox-2.1.7.jar") ) />

<!--- Create the JavaLoader object to dynamically load Java classes --->
<cfset loader = createObject("component", "javaloader.JavaLoader").init(loadPaths) />
Once the classes were available, it was very simple to get an instance of the ExtractAttachments class created.
<!--- Create the extractor object --->
<cfset extractor = loader.create("com.lowagie.toolbox.plugins.ExtractAttachments") />
Ok, so now we have the ExtractAttachments class loaded into a ColdFusion variable so we can use it. But now what? Outside of the JavaDoc, there is precious little documentation on how to use this particular feature of iText so it took quite a bit of experimentation and tracing through the Java source code to figure out how to get my particular test PDF file to be used by the extractor object.
The ExtractAttachments class constructor does not allow you to pass in any arguments, so I couldn’t give it my file that way. In looking through the Java code, I found that the constructor was creating an empty FileArgument object and putting it into its internal arguments variable (which is an array that can hold various types of arguments). Turns out there is a setValue() method in the FileArgument that takes a Java file object. Upon finding that, I created a Java file object that pointed to my test PDF and passed it into the setValue() method of the FileArgument like so:
<!--- Create a java file object for our source PDF file --->
<cfset inputFile = loader.create("java.io.File").init( expandPath('/test.pdf') ) />

<!--- Set our java file object into the FileArgument argument created by ExtractAttachments constructor --->
<cfset extractor.getArguments()[1].setValue( inputFile ) />
The ExtractAttachments constructor only ever puts one argument into its internal array, so I was able to reliably hard code the array index (the [1] in the code sample). Once that file object was passed into the FileArgument, all that remained was telling the extractor object to go to work:
<!--- Extract the attachment(s) --->
<cfset extractor.execute() />
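Since the `[1]` index is hard-coded, a slightly more defensive variant of the last two steps might look like the sketch below. It reuses the `loader` and `extractor` variables created earlier; `pdfPath` and `toolArgs` are names I made up for illustration. It also assumes `arrayLen()` accepts the list returned by `getArguments()`; if it does not on your engine, calling `toolArgs.size()` on the Java object should serve the same purpose.

```cfml
<!--- Sketch only: assumes `loader` and `extractor` exist as created above --->
<cfset pdfPath = expandPath("/test.pdf") />

<!--- Fail early if the source PDF is missing --->
<cfif NOT fileExists(pdfPath)>
    <cfthrow message="Source PDF not found: #pdfPath#" />
</cfif>

<!--- Confirm there is exactly one argument before indexing into the list --->
<cfset toolArgs = extractor.getArguments() />
<cfif arrayLen(toolArgs) NEQ 1>
    <cfthrow message="Expected one argument on ExtractAttachments, found #arrayLen(toolArgs)#" />
</cfif>

<!--- Hand the file to the FileArgument and run the extraction --->
<cfset inputFile = loader.create("java.io.File").init(pdfPath) />
<cfset toolArgs[1].setValue(inputFile) />
<cfset extractor.execute() />
```

If a different iText version ever changes what the constructor puts into that list, this fails with a readable message instead of silently setting the wrong argument.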
The result was a newly created PNG file (which is what I had attached to the PDF) in the same directory where the CFML page was executing. I’m sure there are ways to specify where that file gets created and do some more intricate things with iText, but for a proof-of-concept test, this one was highly successful.
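To confirm what was extracted, something like the following could list the PNG files sitting next to the executing template, newest first. This is a rough sketch: `outputDir` and `extracted` are names chosen for illustration, and the `*.png` filter only fits my test case, where the attachment was a PNG.

```cfml
<!--- Directory of the currently executing CFML page --->
<cfset outputDir = getDirectoryFromPath(getCurrentTemplatePath()) />

<!--- List PNG files in that directory, most recently modified first --->
<cfdirectory action="list"
             directory="#outputDir#"
             name="extracted"
             filter="*.png"
             sort="datelastmodified DESC" />

<!--- Print each file's name, size, and modification time --->
<cfoutput query="extracted">
    #extracted.name# (#extracted.size# bytes, modified #extracted.dateLastModified#)<br />
</cfoutput>
```

If nothing shows up, the file may have been written elsewhere, so the directory to inspect is worth double-checking for your setup.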