Date post: | 11-Nov-2014 |
Upload: | julian-viereck |
pdf.js
Julian Viereck
@jviereck
Overview
• What is pdf.js
• How PDF is structured
• Processing in pdf.js
• Images & Fonts
• Problems
• Todo
• Demo
What is pdf.js
• building faithful & efficient PDF renderer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source
How PDF is structured
PDF file:
• Header: PDF version
• Body [Objects]: sequence of objects
  (fonts, drawing cmds, images, words, bookmarks, form fields)
• xRef Table: mapping objID ⇔ byte offset
• Trailer: root objID, xRef byte offset
  (root obj = ref to pages catalog)
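To make the four parts concrete, here is a minimal sketch that builds a tiny PDF-like file in memory and then reads it back the way a reader starts: from the trailer to the xRef table to the root object. The function names `buildTinyPdf` and `readStructure` are invented for this illustration and are not pdf.js code. The sketch assumes an ASCII-only file with a single xRef subsection.

```javascript
// Build a tiny PDF-like file as a string, recording each object's byte offset.
// All characters are ASCII, so string index equals byte offset.
function buildTinyPdf() {
  const objects = [
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    "2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
  ];
  let pdf = "%PDF-1.4\n";                  // Header: PDF version
  const offsets = [];
  for (const obj of objects) {             // Body: sequence of objects
    offsets.push(pdf.length);
    pdf += obj;
  }
  const xrefOffset = pdf.length;           // xRef table: objID -> byte offset
  pdf += "xref\n0 " + (objects.length + 1) + "\n";
  pdf += "0000000000 65535 f \n";          // entry 0 is always the free-list head
  for (const off of offsets) {
    pdf += String(off).padStart(10, "0") + " 00000 n \n";
  }
  // Trailer: root objID and xRef byte offset
  pdf += "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n";
  pdf += "startxref\n" + xrefOffset + "\n%%EOF\n";
  return pdf;
}

// Read the structure starting from the end of the file.
function readStructure(pdf) {
  const version = /^%PDF-(\d\.\d)/.exec(pdf)[1];
  const xrefOffset = Number(/startxref\s+(\d+)\s+%%EOF\s*$/.exec(pdf)[1]);
  const rootId = Number(/\/Root\s+(\d+)\s+\d+\s+R/.exec(pdf)[1]);

  // One xRef subsection: "xref", then "<first> <count>", then one entry per line.
  const lines = pdf.slice(xrefOffset).split("\n");
  const [first, count] = lines[1].split(" ").map(Number);
  const table = {};
  for (let i = 0; i < count; i++) {
    const [offset, , kind] = lines[2 + i].split(" ");
    if (kind === "n") table[first + i] = Number(offset); // "n" = in use
  }
  return { version, xrefOffset, rootId, table };
}

const pdf = buildTinyPdf();
const info = readStructure(pdf);
const rootOffset = info.table[info.rootId];
console.log(info.version, info.rootId, pdf.slice(rootOffset, rootOffset + 7));
```

Real files can have several xRef sections, compressed xRef streams and binary data, so a production reader works on bytes rather than strings.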
Processing in pdf.js
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
• read & convert all PDF cmds ➟ IR
• load required objects (fonts, images)
• graphics.executeIR(IR)
PDF cmds ➟ PartialEvaluator ➟ InternalRepresentation (IR) ➟ CanvasGraphics
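The first step, wrapping the downloaded bytes in a stream, might look roughly like the sketch below. `ByteStream` and its methods are hypothetical names chosen for this example, not the actual pdf.js Stream class. In a browser the Uint8Array would come from an XHR with `responseType = "arraybuffer"`; here the bytes are built inline so the sketch runs anywhere.

```javascript
// Hypothetical minimal byte stream over a Uint8Array (illustrative only).
class ByteStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }
  // Return the next byte and advance, or -1 at the end of the data.
  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }
  // Look at the next byte without consuming it.
  peekByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos] : -1;
  }
  // Return a view of the next n bytes (fewer near the end) and advance.
  getBytes(n) {
    const end = Math.min(this.pos + n, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return chunk;
  }
  // Read ASCII characters up to, but not including, the next line feed.
  readLine() {
    let line = "";
    let b;
    while ((b = this.getByte()) !== -1 && b !== 0x0a) {
      line += String.fromCharCode(b);
    }
    return line;
  }
}

// Stand-in for the bytes an XHR would deliver.
const data = new Uint8Array(
  Array.from("%PDF-1.4\n1 0 obj\n", (c) => c.charCodeAt(0))
);
const stream = new ByteStream(data);
const header = stream.readLine();
console.log(header);
```

Keeping a read position on top of a typed array lets the xRef reader jump to a byte offset by setting `pos`, without copying data.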
5 0 obj
<< /Length 8 0 R >>
stream
/GS1 gs
/F0 12 Tf
BT
100 700 Td
(Hello World!) Tj
ET
50 600 m
400 600 l
S
endstream
endobj

3 0 obj
<< /Type /Page
   /MediaBox [0 0 612 792]
   /Resources 4 0 R
   /Parent 2 0 R
   /Contents 5 0 R
>>
endobj
1. page=PDFDoc.getPage(2) ➟ obj#3
2. page.startRendering(...) ➟ obj#4, obj#5
The stream may be encoded!
setGState: [ LW: 10 ]
dependency: [ font0 ]
setFont: font0, 12
beginText
moveText: 100, 700
showText: “Hello World!”
endText
moveTo: 50, 600
lineTo: 400, 600
stroke
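The conversion from PDF commands to this IR can be sketched with a toy evaluator. This is a simplified illustration, not the real PartialEvaluator: the `OPS` table, `tokenize`, `evaluate`, and the shape of the `resources` object are assumptions made for the example, and the tokenizer ignores escapes, nested parentheses, arrays and dictionaries.

```javascript
// PDF operator -> IR name, limited to the operators in the example stream.
const OPS = {
  gs: "setGState", Tf: "setFont", BT: "beginText", Td: "moveText",
  Tj: "showText", ET: "endText", m: "moveTo", l: "lineTo", S: "stroke",
};

// Split a content stream into tokens: (strings), /names, numbers, operators.
function tokenize(src) {
  const tokens = [];
  const re = /\(([^)]*)\)|\/([^\s()\/]+)|([+-]?\d*\.?\d+)|([A-Za-z*'"]+)/g;
  let m;
  while ((m = re.exec(src)) !== null) {
    if (m[1] !== undefined) tokens.push({ type: "string", value: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: "name", value: m[2] });
    else if (m[3] !== undefined) tokens.push({ type: "number", value: Number(m[3]) });
    else tokens.push({ type: "op", value: m[4] });
  }
  return tokens;
}

// PDF is postfix: operands pile up until an operator consumes them.
function evaluate(src, resources) {
  const ir = [];
  let args = [];
  for (const tok of tokenize(src)) {
    if (tok.type !== "op") {
      args.push(tok.value);
      continue;
    }
    const fn = OPS[tok.value];
    if (!fn) throw new Error("Unsupported operator: " + tok.value);
    if (fn === "setGState") {
      // Replace the resource name with the graphics state it refers to.
      args = [resources.ExtGState[args[0]]];
    } else if (fn === "setFont") {
      // Resolve the font and record it as a dependency to load before drawing.
      const fontId = resources.Font[args[0]];
      ir.push({ fn: "dependency", args: [fontId] });
      args = [fontId, args[1]];
    }
    ir.push({ fn, args });
    args = [];
  }
  return ir;
}

const content =
  "/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S";
// Assumed contents of the page's resources (obj#4 on the slide).
const resources = { ExtGState: { GS1: { LW: 10 } }, Font: { F0: "font0" } };
const ir = evaluate(content, resources);
for (const op of ir) console.log(op.fn, JSON.stringify(op.args));
```

The point of the IR is that resource names such as /GS1 and /F0 are already resolved, so the drawing side no longer needs the xRef table or the resource dictionaries.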
Content stream (obj#5) + xRef, catalog, resources ➟ PartialEvaluator ➟ IR form ➟ CanvasGraphics
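The drawing side can be sketched as a dispatcher that replays the IR on a 2D context. `ToyCanvasGraphics` and `RecordingContext` are invented for this example and are not the real CanvasGraphics. The recording context stands in for a canvas 2D context so the sketch runs without a browser; the methods it mimics (`beginPath`, `moveTo`, `lineTo`, `stroke`, `fillText`) and the `font` and `lineWidth` properties do exist on CanvasRenderingContext2D. Text positioning is heavily simplified.

```javascript
// Stand-in for a canvas 2D context: records calls instead of drawing.
class RecordingContext {
  constructor() { this.calls = []; this.font = ""; this.lineWidth = 1; }
  beginPath() { this.calls.push(["beginPath"]); }
  moveTo(x, y) { this.calls.push(["moveTo", x, y]); }
  lineTo(x, y) { this.calls.push(["lineTo", x, y]); }
  stroke() { this.calls.push(["stroke"]); }
  fillText(text, x, y) { this.calls.push(["fillText", text, x, y]); }
}

class ToyCanvasGraphics {
  constructor(ctx, pageHeight) {
    this.ctx = ctx;
    this.pageHeight = pageHeight;
    this.textX = 0;
    this.textY = 0;
    this.inPath = false;
  }
  // PDF puts the origin at the bottom-left, canvas at the top-left.
  flipY(y) { return this.pageHeight - y; }

  // Replay each IR entry by calling the method of the same name.
  executeIR(ir) {
    for (const { fn, args } of ir) {
      if (typeof this[fn] !== "function") throw new Error("Unknown IR op: " + fn);
      this[fn](...args);
    }
  }
  setGState(state) { if (state.LW !== undefined) this.ctx.lineWidth = state.LW; }
  dependency() { /* assumed: fonts were loaded before executeIR was called */ }
  setFont(fontId, size) { this.ctx.font = size + "px " + fontId; }
  beginText() { this.textX = 0; this.textY = 0; }
  moveText(x, y) { this.textX += x; this.textY += y; }
  showText(text) { this.ctx.fillText(text, this.textX, this.flipY(this.textY)); }
  endText() {}
  moveTo(x, y) {
    if (!this.inPath) { this.ctx.beginPath(); this.inPath = true; }
    this.ctx.moveTo(x, this.flipY(y));
  }
  lineTo(x, y) { this.ctx.lineTo(x, this.flipY(y)); }
  stroke() { this.ctx.stroke(); this.inPath = false; }
}

// The IR from the "Hello World!" example, written out by hand.
const helloIR = [
  { fn: "setGState", args: [{ LW: 10 }] },
  { fn: "dependency", args: ["font0"] },
  { fn: "setFont", args: ["font0", 12] },
  { fn: "beginText", args: [] },
  { fn: "moveText", args: [100, 700] },
  { fn: "showText", args: ["Hello World!"] },
  { fn: "endText", args: [] },
  { fn: "moveTo", args: [50, 600] },
  { fn: "lineTo", args: [400, 600] },
  { fn: "stroke", args: [] },
];
const ctx = new RecordingContext();
new ToyCanvasGraphics(ctx, 792).executeIR(helloIR); // 792 = MediaBox height
console.log(JSON.stringify(ctx.calls));
```

In a browser, passing `canvas.getContext("2d")` instead of the recording context would draw the text and the line, assuming a font named font0 has been loaded.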
Images
• JPEG streams:
  • DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));
• If not JPEG stream:
  • read bytes, convert to colorspace
  • imgData = canvas.getImageData()
  • fillWithPixelData(bytes, imgData)
  • canvas.putImageData(imgData)
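Both image paths can be sketched without a DOM. The helper bodies below are assumptions written for this example (the slide only names `bytesToString` and `fillWithPixelData`). The non-JPEG path assumes the bytes are already packed RGB, 3 bytes per pixel, and uses a plain object with the same `width`, `height` and `data` fields as an ImageData.

```javascript
// Turn bytes into a "binary string" with one character per byte.
function bytesToString(bytes) {
  let s = "";
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// btoa exists in browsers and recent Node; fall back to Buffer otherwise.
function toBase64(binaryString) {
  return typeof btoa === "function"
    ? btoa(binaryString)
    : Buffer.from(binaryString, "binary").toString("base64");
}

// JPEG path: hand the undecoded stream to the browser as a data URI.
function jpegDataUri(bytes) {
  return "data:image/jpeg;base64," + toBase64(bytesToString(bytes));
}

// Non-JPEG path: expand packed RGB into the RGBA layout ImageData uses.
function fillWithPixelData(rgbBytes, imgData) {
  const out = imgData.data;
  const pixels = imgData.width * imgData.height;
  if (rgbBytes.length < pixels * 3) throw new Error("not enough pixel data");
  for (let p = 0, src = 0, dst = 0; p < pixels; p++) {
    out[dst++] = rgbBytes[src++]; // R
    out[dst++] = rgbBytes[src++]; // G
    out[dst++] = rgbBytes[src++]; // B
    out[dst++] = 255;             // A: fully opaque
  }
}

// First three bytes of a JPEG file, enough to show the encoding.
const uri = jpegDataUri(new Uint8Array([0xff, 0xd8, 0xff]));

// A 2x1 image: one red pixel, one blue pixel.
const imgData = { width: 2, height: 1, data: new Uint8ClampedArray(8) };
fillWithPixelData(new Uint8Array([255, 0, 0, 0, 0, 255]), imgData);
console.log(uri, Array.from(imgData.data));
```

The JPEG shortcut avoids decoding in JavaScript altogether, while every other format pays for a per-pixel loop like the one above.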
Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS: @font-face { font-family:'font0'; src:url(data:font/opentype;base64, ...) }
• some fonts can’t be converted :(
• use drawing commands?
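Generating the @font-face rule from converted font bytes could look like the following sketch. `fontFaceRule` is a hypothetical helper, and the four input bytes are just the "OTTO" signature that CFF-flavoured OpenType files start with, not a usable font.

```javascript
// Base64-encode bytes; btoa exists in browsers and recent Node.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
}

// Build a CSS rule that embeds the OpenType data as a data URI.
function fontFaceRule(fontName, otfBytes) {
  return "@font-face { font-family:'" + fontName + "'; " +
    "src:url(data:font/opentype;base64," + bytesToBase64(otfBytes) + "); }";
}

// "OTTO" signature only; a real call would pass the whole converted font.
const signature = new Uint8Array([0x4f, 0x54, 0x54, 0x4f]);
const rule = fontFaceRule("font0", signature);
console.log(rule);
```

In a page, the rule text would be added to a stylesheet, after which `font0` can be used as a font family on the canvas once the browser has finished loading it.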
Problems
• No way to detect when a font is loaded (hacks)
• Font width (wrong on some platforms)
• Subpixel font size depends on platform
• Text selection
• Printing
• Speed
  • use workers (objects sent via postMessage lose their shape)
  • partial rendering
platform = browser + OS
Todo
• more font work, printing, speed
• support more of the rendering spec
• explore using SVG
• PDF forms, “advanced PDF features”
• infrastructure: automated testing, requireJS
• test more PDFs (need your help!)
Demo
Contact
Github: https://github.com/andreasgal/pdf.js
Mailing list: https://groups.google.com/group/mozilla.dev.pdf-js/topics
IRC: irc.mozilla.org #pdfjs