Apache POI Recipes

Apache POI Recipes, presented at ApacheCon US 2009 in Oakland, gives a general description of the Apache POI project and describes three use cases where POI functionality is used in real applications.

Apache POI Recipes

Paolo Mottadelli - ApacheCon Oakland 2009

http://chromasia.com

Thursday, November 5, 2009

- ApacheCon US 2009, Oakland - Apache POI Recipes -

paolo@apache.org

my to-do list

POI @ Content Tech

✴ Document to application (and back)

  ✴ Publish data

  ✴ Build a doc from your content

✴ Know your documents

  ✴ Extract text

  ✴ Extract content

1 A-B-C

POI modules (1): OLE2

✴ POIFS: reading/writing Office Documents

✴ HSSF: r/w Excel spreadsheets

✴ HWPF: r/w Word docs

✴ HSLF: r/w PowerPoint docs

✴ HPSF: r/w property sets

POI modules (2): OOXML

✴ XSSF: r/w OOXML Excel

✴ XWPF: r/w OOXML Word

✴ XSLF: r/w OOXML PowerPoint

POI 3.5

http://chromasia.com

OOXML dev status

✴ XSSF: Final in POI-3.5

✴ XWPF: Draft (basic features)

✴ XSLF: Not covered (text extraction only)

HSSF & XSSF

✴ Common user model interface

✴ User model based on existing HSSF

✴ Using OpenXML4J and SAX

2 Same recipe, different flavours

Common H/XSSF access

✴ org.apache.poi.ss.usermodel
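
A minimal sketch of what the common interface buys you, assuming a recent POI release with poi and poi-ooxml on the classpath. The class and method names (CommonAccessSketch, writeAndReadBack, roundTrip) are made up for illustration. The same method runs against an .xls and an .xlsx workbook because it only touches org.apache.poi.ss.usermodel types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class CommonAccessSketch {

    // Format-agnostic: only the shared ss.usermodel interfaces are used here.
    static String writeAndReadBack(Workbook wb, String text) {
        Sheet sheet = wb.createSheet("data");
        Row row = sheet.createRow(0);
        Cell cell = row.createCell(0);
        cell.setCellValue(text);
        return sheet.getRow(0).getCell(0).getStringCellValue();
    }

    // Picks the concrete implementation, then hands off to the shared code.
    static String roundTrip(boolean ooxml, String text) {
        try (Workbook wb = ooxml ? new XSSFWorkbook() : new HSSFWorkbook()) {
            return writeAndReadBack(wb, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(false, "hello from .xls"));
        System.out.println(roundTrip(true, "hello from .xlsx"));
    }
}
```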

Upgrading to POI-3.5

✴ HSSFFormulaEvaluator.CellValue

  ✴ convert from .hssf. to .ss.

✴ HSSFRow.MissingCellPolicy

  ✴ convert from .hssf. to .ss.

✴ RecordFormatException in DDF (Dreadful Drawing Format)

  ✴ convert from .hssf. to .util.

3 Meet Office Open XML

Open XML

✴ XML based

  ✴ WordprocessingML

  ✴ SpreadsheetML

  ✴ PresentationML

✴ Stored as a package

  ✴ Open Packaging Conventions

made (very) simple

Package concepts

✴ Package (the container)

✴ Part (xml file)

✴ Relationship

  ✴ package-relationship

  ✴ part-relationship

Expanded package, Excel
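
An OOXML package is a ZIP archive, so a rough view of an expanded package can be had with java.util.zip alone. The sketch below is illustrative only: samplePackage() builds a stand-in archive with placeholder content, and its part names are an assumption based on what a minimal .xlsx typically contains. Pass a real .xlsx stream to listParts() to see an actual layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class PackageListing {

    // Returns the entry names of a ZIP-based package in archive order.
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    // Stand-in archive (assumption): typical part names, placeholder bodies.
    static byte[] samplePackage() {
        String[] parts = {
            "[Content_Types].xml",
            "_rels/.rels",
            "xl/workbook.xml",
            "xl/_rels/workbook.xml.rels",
            "xl/worksheets/sheet1.xml"
        };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String part : parts) {
                zip.putNextEntry(new ZipEntry(part));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (String name : listParts(new ByteArrayInputStream(samplePackage()))) {
            System.out.println(name);
        }
    }
}
```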

WordprocessingML

✴ body

  ✴ paragraphs

  ✴ runs

  ✴ properties (for runs and paragraphs)

  ✴ styles

  ✴ headers/footers ...

SpreadsheetML

✴ workbook

  ✴ worksheets

  ✴ rows

  ✴ cells

  ✴ styles

  ✴ formulas

  ✴ images ...

PresentationML

✴ presentation

  ✴ slides

  ✴ slide masters

  ✴ notes masters

  ✴ layout, animation, audio, video, transitions ...

4 openxml4j

openXML4J

✴ Package, parts, rels
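
A hedged sketch of looking up a part by name with the openxml4j classes bundled in poi-ooxml, assuming a recent POI release where the package class is called OPCPackage. The helper class OpcPartLookup and its method are made up for illustration. It writes a one-sheet workbook to memory, reopens it as a package, and asks for the sheet part by the name shown on this slide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.openxml4j.opc.PackagePartName;
import org.apache.poi.openxml4j.opc.PackagingURIHelper;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class OpcPartLookup {

    // Returns the content type of the first sheet part, or null if it is absent.
    static String sheetPartContentType() {
        try {
            // Build a one-sheet .xlsx in memory so no input file is needed.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (XSSFWorkbook wb = new XSSFWorkbook()) {
                wb.createSheet("first");
                wb.write(out);
            }

            OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(out.toByteArray()));
            try {
                PackagePartName name =
                        PackagingURIHelper.createPartName("/xl/worksheets/sheet1.xml");
                PackagePart part = pkg.getPart(name);
                return part == null ? null : part.getContentType();
            } finally {
                pkg.revert(); // release the package without saving anything
            }
        } catch (Exception ex) {
            throw new IllegalStateException("could not read package", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(sheetPartContentType());
    }
}
```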

"/xl/worksheets/sheet1.xml"

5 Text Extraction

Extractors

✴ POITextExtractor

  ✴ POIOLE2TextExtractor

  ✴ POIXMLTextExtractor

    ✴ XSSFExcelExtractor

    ✴ XWPFWordExtractor

    ✴ XSLFPowerPointExtractor

✴ If text is all you need: getText()

Text extraction

✴ made simple
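
The code from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch using XSSFExcelExtractor from poi-ooxml; the class name TextExtractionSketch and the sample cell content are made up for illustration. It builds a tiny workbook in memory and pulls its text out with getText().

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xssf.extractor.XSSFExcelExtractor;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

final class TextExtractionSketch {

    // Creates a one-cell workbook and returns everything the extractor sees.
    static String sampleText() {
        try (XSSFWorkbook wb = new XSSFWorkbook()) {
            wb.createSheet("notes").createRow(0).createCell(0).setCellValue("hello POI");
            XSSFExcelExtractor extractor = new XSSFExcelExtractor(wb);
            return extractor.getText();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(sampleText());
    }
}
```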

6 EXCEL: Simple Tasks

New Workbook

New Sheet

Creating Cells

Cell types
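
The code on the New Workbook, New Sheet, Creating Cells and Cell types slides is missing from the transcript. The sketch below is not the speaker's code; it is a minimal example of the same four steps, assuming a recent POI release. The class name CellBasics is hypothetical. It writes a numeric, a string and a boolean cell, saves the workbook to memory, and reads the values back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;

final class CellBasics {

    static String roundTrip() {
        try {
            byte[] bytes;
            // New workbook, new sheet, one row with three differently typed cells.
            try (Workbook wb = new HSSFWorkbook();
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                Sheet sheet = wb.createSheet("new sheet");
                Row row = sheet.createRow(0);
                row.createCell(0).setCellValue(42.5);     // numeric
                row.createCell(1).setCellValue("hello");  // string
                row.createCell(2).setCellValue(true);     // boolean
                wb.write(out);
                bytes = out.toByteArray();
            }
            // Reopen the saved bytes and read each cell with its typed getter.
            try (Workbook wb = new HSSFWorkbook(new ByteArrayInputStream(bytes))) {
                Row row = wb.getSheet("new sheet").getRow(0);
                return row.getCell(0).getNumericCellValue()
                        + "|" + row.getCell(1).getStringCellValue()
                        + "|" + row.getCell(2).getBooleanCellValue();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints 42.5|hello|true
    }
}
```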

Fills and colors
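
The slide's code is not in the transcript. A minimal sketch of a solid yellow cell fill follows, assuming a recent POI release that has the FillPatternType enum (older releases used short constants for fill patterns instead). The class name FillSketch is made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.FillPatternType;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.usermodel.Workbook;

final class FillSketch {

    // Applies a solid yellow fill and returns the colour index stored on the cell style.
    static short yellowFillIndex() {
        try (Workbook wb = new HSSFWorkbook()) {
            Cell cell = wb.createSheet("colors").createRow(0).createCell(0);
            cell.setCellValue("highlight");

            CellStyle style = wb.createCellStyle();
            // A solid fill is painted with the *foreground* colour.
            style.setFillForegroundColor(IndexedColors.YELLOW.getIndex());
            style.setFillPattern(FillPatternType.SOLID_FOREGROUND);
            cell.setCellStyle(style);

            return cell.getCellStyle().getFillForegroundColor();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(yellowFillIndex());
    }
}
```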

7 EXCEL: Imp/Exp to XML

Export to XML

xmlMaps.xml

XML Import/Export

8 WORD: Simple Doc

A simple doc
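
The slide's code is missing from the transcript. As a stand-in, a minimal XWPF sketch (poi-ooxml, recent release assumed): one paragraph with one bold run, saved to memory and read back. The class name SimpleDocSketch and the sample text are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

final class SimpleDocSketch {

    // Writes a one-paragraph .docx to memory.
    static byte[] build(String text) {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFParagraph paragraph = doc.createParagraph();
            XWPFRun run = paragraph.createRun(); // a run carries text plus formatting
            run.setBold(true);
            run.setText(text);
            doc.write(out);
            return out.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reopens the bytes and returns the text of the first paragraph.
    static String firstParagraph(byte[] docx) {
        try (XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx))) {
            return doc.getParagraphs().get(0).getText();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstParagraph(build("Hello from POI")));
    }
}
```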

9 Use Case 1: Alfresco Search

Use Case

✴ Upload a document

✴ Detect document mimetype

✴ Extract text and metadata

✴ Create search index

✴ Search (and find) the document
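
The mimetype detection step can be approximated by sniffing the first bytes of the upload. This is only a sketch under stated assumptions, not how Alfresco or Tika do it: the class OfficeSniffer and its Kind enum are hypothetical. It relies on two well-known signatures: OLE2 files start with D0 CF 11 E0 A1 B1 1A E1, and ZIP-based files (which includes OOXML) start with "PK" 03 04.

```java
final class OfficeSniffer {

    enum Kind { OLE2, ZIP_BASED, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a document from its leading bytes only.
    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return Kind.OLE2;       // .doc, .xls, .ppt and other OLE2 containers
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return Kind.ZIP_BASED;  // .docx, .xlsx, .pptx, but also any other ZIP
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] { 0x50, 0x4B, 0x03, 0x04 })); // prints ZIP_BASED
    }
}
```

A ZIP signature alone does not prove the file is OOXML; a fuller check would also look for a [Content_Types].xml entry inside the archive.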

Without Tika

✴ Detect the document mimetype

  ✴ (source/target mimetype)

✴ Get the proper ContentTransformer

  ✴ (ContentTransformerRegistry)

✴ Transform Doc Content to Text (POI here)

  ✴ (PoiHssfContentTransformer)

✴ Create Lucene index

With Tika

Extension use case

✴ Adding support for Office Open XML documents (Office 2007+)

  ✴ Word 2007+

  ✴ Excel 2007+

  ✴ PowerPoint 2007+

POI text extractors

✴ Remember?

Apache Tika (Excel)

Apache Tika

Apache Tika (Word)

Apache Tika (Word)

10 Use Case 2: JM Lafferty Financial Forecasting

Make your wb look pro-

✴ Rich text

✴ Graphics

✴ Formulas & Named Ranges

✴ Data validations

✴ Conditional formatting

✴ Cell comments

Formula evaluation

✴ The evaluation engine enables you to calculate formula results from within a POI application

✴ Formulas may be added to your workbook by POI

✴ Evaluation is available for .xls and .xlsx

Formula evaluation (continued)

✴ All arithmetic operators are implemented

✴ Over 280 Excel built-in functions are supported

Formula evaluation (code)
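
The code on this slide is missing from the transcript. The sketch below is a stand-in, not the speaker's code: it adds a formula to a workbook and evaluates it through the FormulaEvaluator from org.apache.poi.ss.usermodel. The class name FormulaSketch and the sample values are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Workbook;

final class FormulaSketch {

    // Puts 2 and 3 in A1 and B1, a formula in C1, and returns the computed result.
    static double evaluateSample() {
        try (Workbook wb = new HSSFWorkbook()) {
            Row row = wb.createSheet("calc").createRow(0);
            row.createCell(0).setCellValue(2.0);
            row.createCell(1).setCellValue(3.0);

            Cell total = row.createCell(2);
            total.setCellFormula("SUM(A1:B1)*10"); // no leading '=' in POI formulas

            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            return evaluator.evaluate(total).getNumberValue();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluateSample()); // prints 50.0
    }
}
```

Swapping HSSFWorkbook for XSSFWorkbook should give the same result for .xlsx, since the evaluator is obtained through the common interface.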

11 Use Case 3: CQ5 Import

importDocument()

getParagraphs(...)

✴ Makes use of

  ✴ org.apache.poi.hwpf.usermodel.Range

importDocument()

getTitle(...)

✴ Gets the first paragraph’s text
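
The importer's code is not in the transcript, so the following is a guess at the shape of getParagraphs(...) and getTitle(...), not the CQ5 implementation. It assumes poi-scratchpad (which contains HWPF) is on the classpath. The method names come from the slides; their signatures, bodies and the class name WordImportSketch are assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

final class WordImportSketch {

    // Collects the non-empty paragraph texts of a .doc file via the HWPF Range.
    static List<String> getParagraphs(InputStream in) throws IOException {
        HWPFDocument doc = new HWPFDocument(in);
        Range range = doc.getRange();
        List<String> paragraphs = new ArrayList<>();
        for (int i = 0; i < range.numParagraphs(); i++) {
            // HWPF paragraph text ends with a control character; trim() drops it.
            String text = range.getParagraph(i).text().trim();
            if (!text.isEmpty()) {
                paragraphs.add(text);
            }
        }
        return paragraphs;
    }

    // The title is taken to be the first paragraph's text, or "" if there is none.
    static String getTitle(List<String> paragraphs) {
        return paragraphs.isEmpty() ? "" : paragraphs.get(0);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WordImportSketch some-file.doc
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(getTitle(getParagraphs(in)));
        }
    }
}
```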

importDocument()

12 Want more?

More Examples

✴ http://poi.apache.org/spreadsheet/examples.html

Even more

✴ Get in touch

  ✴ http://poi.apache.org/

✴ Get informed

  ✴ dev@poi.apache.org

✴ Get involved

  ✴ http://svn.apache.org/repos/asf/poi/trunk/

✴ Get slides

  ✴ http://www.slideshare.net/paolomoz/apache-poi-recipes

Thanks
