Monday, March 15, 2010

Asserting MS Office formats

As a follow-up to the PDF post, you can do something similar for Microsoft Office formats using Apache POI:

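import java.io.ByteArrayInputStream;
import org.apache.poi.POITextExtractor;   // POI 3.x; newer releases use org.apache.poi.extractor.POITextExtractor
import org.apache.poi.extractor.ExtractorFactory;

// prev is the SampleResult that JMeter passes to a BeanShell PostProcessor
byte[] in = prev.getResponseData();
ByteArrayInputStream bais = new ByteArrayInputStream(in);
POITextExtractor extractor = ExtractorFactory.createExtractor(bais);
String text = extractor.getText(); // plain text of the Word/Excel/PowerPoint response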
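Once you have the text from the extractor (POITextExtractor.getText()), the assertion itself is ordinary string checking. Below is a minimal sketch of that step in plain Java, assuming you want to check that a few expected phrases appear. TextAssert, normalize and missing are hypothetical names, not part of POI or JMeter. Whitespace is collapsed first because extracted text may contain tabs or line breaks where the document had table cells or paragraphs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks that text pulled out of an Office document
// contains every expected phrase, ignoring differences in whitespace.
final class TextAssert {

    // Collapse runs of spaces, tabs and line breaks into single spaces so that
    // "Order\tTotal" or "Order\nTotal" still matches "Order Total".
    static String normalize(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    // Returns the expected phrases that were NOT found; an empty list means pass.
    static List<String> missing(String extracted, String... expected) {
        String haystack = normalize(extracted);
        List<String> notFound = new ArrayList<>();
        for (String phrase : expected) {
            if (!haystack.contains(normalize(phrase))) {
                notFound.add(phrase);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        String text = "Invoice\n\nOrder\tTotal\n42.00";
        // "Order Total" and "42.00" are present; "Paid" is not
        System.out.println(missing(text, "Order Total", "42.00", "Paid")); // prints [Paid]
    }
}
```

In a BeanShell PostProcessor you might then mark the sample as failed with prev.setSuccessful(false) when the returned list is not empty. The block is written as standalone Java; older BeanShell versions may not accept generics or varargs, so treat it as logic to adapt rather than a drop-in script.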

You should be able to parse Word documents, Excel spreadsheets and PowerPoint presentations with this BeanShell PostProcessor. Remember to copy the POI jars (including poi-scratchpad and poi-ooxml, plus their dependencies) to JMeter/lib.
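ExtractorFactory can handle all of these because it looks at the start of the stream: the legacy binary files (.doc, .xls, .ppt) are OLE2 compound documents, while .docx, .xlsx and .pptx are ZIP packages, and the two begin with different signature bytes. The sketch below reproduces only that first check using the standard library; OfficeSniffer, Container and sniff are hypothetical names, and this is not POI's own code. A ZIP signature alone does not prove the body is OOXML, which is why that case is labelled loosely.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: classify a response body by its leading "magic" bytes.
final class OfficeSniffer {

    enum Container { OLE2, OOXML_ZIP, UNKNOWN }

    // OLE2 compound document header, used by .doc, .xls and .ppt
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; .docx, .xlsx and .pptx are ZIP packages.
    // Any ZIP file matches, so this does not by itself prove the body is OOXML.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null && data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static Container sniff(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(data, ZIP_MAGIC)) return Container.OOXML_ZIP;
        return Container.UNKNOWN;
    }

    public static void main(String[] args) {
        byte[] zipLike = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        byte[] html = "<html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(zipLike)); // prints OOXML_ZIP
        System.out.println(sniff(html));    // prints UNKNOWN
    }
}
```

A check like this could run before createExtractor so that, when the server returns an HTML error page instead of a document, the sample fails with a clear message rather than an exception from the extractor.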

Future work
Look into Apache Tika for a unified interface (this may mean giving up some functionality, such as the startPage/endPage options of PDFBox).
