The TeX hyphenation applied to HTML - About Frank M ... · For Further Reading: David Antoš (2001):...

Date post: 25-Jun-2020
Hyphenation for HTML
Mathias Nater, [email protected], http://mnn.ch/

Motivation
  layout w/o hyphenation
  layout with hyphenation
The TeX hyphenation algorithm
  The original TeX hyphenation algorithm (1977)
  The current TeX hyphenation algorithm (1983)
  Creating the patterns (patgen)
  Using the patterns (hyphenation)
HTML and the soft hyphen
The Port to Javascript
  Server side or Client side?
  How it works
  Differences and Improvements
  Back to the Future

The TeX hyphenation applied to HTML
About Frank M. Liang's hyphenation algorithm and its port to Javascript
Mathias Nater, [email protected], http://mnn.ch/
BachoTeX 2010
Transcript
Page 1: The TeX hyphenation applied to HTML - About Frank M ... · For Further Reading: David Antoš (2001): PatLib, Pattern Manipulating Library – Master Thesis

The TeX hyphenation applied to HTML
About Frank M. Liang's hyphenation algorithm and its port to Javascript

Mathias Nater
[email protected]
http://mnn.ch/

BachoTeX 2010

Page 2

Organisation

Motivation
  Text layout without hyphenation
  Text layout with hyphenation

The TeX hyphenation algorithm
  The original TeX hyphenation algorithm (1977)
  The current TeX hyphenation algorithm (1983)
    Creating the patterns (patgen)
    Using the patterns (hyphenation)

HTML and the soft hyphen

The Port to Javascript
  Server side or Client side?
  How it works
  Differences and Improvements
  Back to the Future

Page 6

Organisation

[Diagram with the labels: need, word list, patgen, hyphenation patterns, hyphenation algorithm, soft hyphen, hyphenator]

Page 9

Text layout without hyphenation

Current browsers
- MS IE 6/7/8 (~44%)
- Firefox 3.5 (~42%)
- Safari 4 (~4%)
- Opera 10 (~3%)

do not hyphenate text automatically! This leads to poor typography:
- align left: overfull boxes and unbalanced line endings
- justified: big word spaces and rivers

Page 12

Text layout without hyphenation
text-align: left;

Page 13

Text layout without hyphenation
text-align: justify;

Page 14

Text layout with hyphenation
text-align: left;

Page 16

Text layout with hyphenation
text-align: justify;

Page 18

We need (automatic) hyphenation in HTML!

Page 25

The original TeX hyphenation algorithm
The original hyphenation algorithm

1977 by Donald E. Knuth and Franklin M. Liang
- for English only
- suffix and prefix removal
- vowel-consonant-consonant-vowel breaking
- special case rules (e.g. “break after ck!”)
- small exception dictionary

Found ~40% of the allowable hyphen points with 1% error
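
One of the rules listed above, vowel-consonant-consonant-vowel breaking, can be sketched in a few lines of JavaScript. This is only an illustration of that single heuristic, not the 1977 implementation: the function name `vccvBreak` and the vowel set `aeiou` are assumptions made for the example, and prefix/suffix removal, the special case rules and the exception dictionary are left out.

```javascript
// Sketch of the vowel-consonant-consonant-vowel (VCCV) rule only.
// Assumption: vowels are a, e, i, o, u; every other character counts as a consonant.
function vccvBreak(word) {
  const isVowel = (ch) => "aeiou".includes(ch);
  const lower = word.toLowerCase();
  const parts = [];
  let last = 0;
  // i points at the first consonant of a candidate V-C-C-V window.
  for (let i = 1; i + 2 < lower.length; i++) {
    if (
      isVowel(lower[i - 1]) &&
      !isVowel(lower[i]) &&
      !isVowel(lower[i + 1]) &&
      isVowel(lower[i + 2])
    ) {
      // Break between the two consonants.
      parts.push(word.slice(last, i + 1));
      last = i + 1;
    }
  }
  parts.push(word.slice(last));
  return parts.join("-");
}

console.log(vccvBreak("basket")); // bas-ket
console.log(vccvBreak("member")); // mem-ber
```

A rule this simple finds only the breaks that happen to fit the window, which is consistent with the modest share of hyphen points reported on the slide.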

Page 26

The current TeX hyphenation algorithm
The current hyphenation algorithm

1983 PhD thesis by Franklin M. Liang
- use of hyphenation patterns
- two algorithms:
  - pattern creation (patgen)
  - applying the patterns (TeX)
- support for a wide range of languages
- small, easy, fast

Page 30

Creating patterns with patgen

- INPUT: a list of hyphenated words [, precomputed pattern, translate file]
- takes up to 9 runs (asking for many settings, adding a new level in each run)
- OUTPUT: pattern file, statistics (a lot!)

- old code
- no UTF-8
- refactored by David Antoš (OPatGen), but doesn't compile
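
To make the input/output relation more concrete, here is a heavily simplified JavaScript sketch of the counting step that pattern generation from a hyphenated word list relies on. It is not patgen's code: the function name `countCandidates`, the `good`/`bad` field names and the three-word list are assumptions for illustration, and the real program additionally applies thresholds, alternates between allowing and inhibiting levels, and stores patterns far more compactly.

```javascript
// For every substring of `length` characters and every gap inside it, count how
// often that gap is a hyphen ("good") or not ("bad") in the hyphenated word list.
// The digit 1 in the key marks the gap, as in a level-1 pattern candidate.
function countCandidates(hyphenatedWords, length) {
  const counts = new Map();
  for (const entry of hyphenatedWords) {
    const letters = "." + entry.replace(/-/g, "") + "."; // dots mark the word edges
    // hyphenBefore[k] is true when the list has a hyphen right before letters[k].
    const hyphenBefore = new Array(letters.length + 1).fill(false);
    let k = 1;
    for (const ch of entry) {
      if (ch === "-") hyphenBefore[k] = true;
      else k++;
    }
    for (let start = 0; start + length <= letters.length; start++) {
      const sub = letters.slice(start, start + length);
      for (let pos = 0; pos <= length; pos++) {
        const gap = start + pos;
        if (gap < 1 || gap > letters.length - 1) continue; // outside the dots
        const key = sub.slice(0, pos) + "1" + sub.slice(pos);
        const c = counts.get(key) || { good: 0, bad: 0 };
        if (hyphenBefore[gap]) c.good++;
        else c.bad++;
        counts.set(key, c);
      }
    }
  }
  return counts;
}

const candidateCounts = countCandidates(["ta-ble", "ca-ble", "blend"], 2);
console.log(candidateCounts.get("1bl")); // { good: 2, bad: 1 }
```

The candidate `1bl` is right twice and wrong once in this toy list; a later, even-valued level would be the place to forbid the wrong case (a break before `bl` at the start of a word), which is why several runs with increasing levels are needed.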

Page 36

applying the patterns

- .in1 b2l2 4edi b4le.
- patterns: short strings with integer values
- odd values: valid breakpoints
- even values: forbidden breakpoints
- lower values are overwritten by higher values
- dots (.) mark the beginning/end of the word
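
The rules above can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not the code of TeX or of any existing library: the names `parsePattern`, `hyphenate` and `examplePatterns` are hypothetical, the pattern list is a small subset commonly quoted in the TeX literature for the word "hyphenation", the minimum fragment lengths of 2 and 3 letters are modeled on TeX's defaults for English, and a brute-force substring lookup stands in for the compact data structure a real implementation would use.

```javascript
// Split a pattern such as "hen5at" into its letters ("henat") and one value
// per gap: values[i] belongs to the gap before letter i (0 where no digit is given).
function parsePattern(pattern) {
  const letters = [];
  const values = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      values[values.length - 1] = Number(ch);
    } else {
      letters.push(ch);
      values.push(0);
    }
  }
  return { key: letters.join(""), values };
}

// Apply all matching patterns to a word and split it at the odd-valued gaps.
// Assumption: leftMin/rightMin are the minimum fragment lengths at the word edges.
function hyphenate(word, patterns, leftMin = 2, rightMin = 3) {
  const table = new Map();
  for (const p of patterns) {
    const { key, values } = parsePattern(p);
    table.set(key, values);
  }
  const padded = "." + word.toLowerCase() + "."; // dots mark begin/end of the word
  const gaps = new Array(padded.length + 1).fill(0);
  for (let start = 0; start < padded.length; start++) {
    for (let end = start + 1; end <= padded.length; end++) {
      const values = table.get(padded.slice(start, end));
      if (!values) continue;
      values.forEach((v, i) => {
        // Higher values overwrite lower ones.
        gaps[start + i] = Math.max(gaps[start + i], v);
      });
    }
  }
  const parts = [];
  let last = 0;
  // gaps[j + 1] is the gap before word[j], because of the leading dot.
  for (let j = leftMin; j <= word.length - rightMin; j++) {
    if (gaps[j + 1] % 2 === 1) {
      parts.push(word.slice(last, j));
      last = j;
    }
  }
  parts.push(word.slice(last));
  return parts;
}

// Illustrative pattern subset (assumption, see lead-in).
const examplePatterns = [
  "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n",
];
console.log(hyphenate("hyphenation", examplePatterns).join("-")); // hy-phen-ation
```

In this run the gap between `n` and `a` first receives the even value 2 from `n2at` and is then raised to 5 by `hen5at`, so the break is allowed; the gap after `a` gets 1 from `1tio` but 4 from `hena4`, so it stays forbidden.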

applying the patterns (example)

incredible

. i n c r e d i b l e .
. i n1
                b2l2
         4e d i
              i1b l
    n1c r
                b4l e .
     5c r e d
          e d3i b
       2r2e d
-----------------------
. i n5c2r4e d3i1b4l2e .

in-cred-i-ble
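The worked example can be reproduced with a short sketch. It uses only the nine patterns shown above, and the names `buildPatternMap` and `hyphenate` are assumptions made for this illustration. It is not the code of TeX or of hyphenator.js, and it leaves out refinements such as minimum fragment lengths at the start and end of a word.

```javascript
// The nine patterns from the example; a real pattern file holds thousands.
const PATTERNS = [".in1", "b2l2", "4edi", "i1bl", "n1cr", "b4le.", "5cred", "ed3ib", "2r2ed"];

// Map each pattern's letters to the values in its inter-letter slots.
function buildPatternMap(patterns) {
  const map = new Map();
  for (const pattern of patterns) {
    let letters = "";
    const values = [0];
    for (const ch of pattern) {
      if (ch >= "0" && ch <= "9") {
        values[values.length - 1] = Number(ch);
      } else {
        letters += ch;
        values.push(0);
      }
    }
    map.set(letters, values);
  }
  return map;
}

// Hypothetical function; hyphenChar defaults to a visible hyphen.
function hyphenate(word, patternMap, hyphenChar = "-") {
  const dotted = "." + word.toLowerCase() + ".";
  // scores[i] is the value of the slot in front of dotted[i]
  const scores = new Array(dotted.length + 1).fill(0);
  for (let start = 0; start < dotted.length; start++) {
    for (let end = start + 1; end <= dotted.length; end++) {
      const values = patternMap.get(dotted.slice(start, end));
      if (!values) continue;
      values.forEach((v, k) => {
        scores[start + k] = Math.max(scores[start + k], v); // higher value wins
      });
    }
  }
  let result = "";
  for (let i = 0; i < word.length; i++) {
    // word[i] is dotted[i + 1], so its leading slot is scores[i + 1]
    if (i > 0 && scores[i + 1] % 2 === 1) result += hyphenChar;
    result += word[i];
  }
  return result;
}

console.log(hyphenate("incredible", buildPatternMap(PATTERNS))); // in-cred-i-ble
```

Trying every substring against a map keeps the sketch short. Liang's thesis describes a packed trie for the same lookup, which is the better choice when the pattern set is large.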

HTML and the Soft Hyphen

- limited control over text layout (text-align: left | right | justify)
- manual line breaks (<br>)
- manually inserted soft hyphens (&shy;, the discretionary hyphen)
- some more controls are upcoming with CSS3
- laying out text is up to the browser
- developer has no control over how text is displayed
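The `&shy;` entity stands for the character U+00AD, so soft hyphens can be put into a string directly from a script. A small sketch, with helper names that are made up for this illustration:

```javascript
const SOFT_HYPHEN = "\u00AD"; // the character behind the &shy; entity

// Join the parts of a word with soft hyphens (hypothetical helper name).
function withSoftHyphens(parts) {
  return parts.join(SOFT_HYPHEN);
}

// Remove soft hyphens again, e.g. before comparing or searching text.
function stripSoftHyphens(text) {
  return text.split(SOFT_HYPHEN).join("");
}

const word = withSoftHyphens(["in", "cred", "i", "ble"]);
console.log(word.length);            // 13: ten letters plus three soft hyphens
console.log(stripSoftHyphens(word)); // incredible
```

The break positions are supplied by hand here. Finding them automatically is what the pattern algorithm is for.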

Text layout with hyphenation

text-align: justify;

Putting it all together

[Diagram with the labels: word list, patgen, hyphenation patterns, hyphenation algorithm, soft hyphen, need, hyphenator]

Server side or Client side hyphenation?

Pro server side:
- lower bandwidth usage
- faster
- only hyphenate once, store the result

Pro client side:
- cleaner HTML (search engines!)
- takes client oddities into account
- can be switched on/off
- hyphenation is part of CSS3, so even the W3C believes that hyphenation belongs to the client
- user-generated text can be hyphenated on the fly

My Decision

- server side solutions already existed: PHP, Perl, Java, Python
- I believe that hyphenation has to be done in the client
- Javascript is a very interesting language
- the acceptance of Javascript is growing
- Firefox 2 didn’t support &shy;
- I like bookmarklets
- hyphenator.js: client-side hyphenation
- it’s proving to be a good decision:
  - other (WebKit-based) programs are using hyphenator
  - it’s easy to use
  - there’s a big effort on making Javascript faster

How it works

1. register all elements that need hyphenation
2. if the language is not set, ask for it
3. download the patterns, if not already done
4. split the paragraphs into words (and URLs)
5. process each word, put &shy; at every valid breakpoint
6. the browser will re-render the text automatically, taking the soft hyphens into account

- execution is fast
- downloading the script and the patterns takes time
  script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB
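Steps 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the code of hyphenator.js: `hyphenateText`, the `hyphenateWord` callback and the `minWordLength` threshold are hypothetical, a small lookup table stands in for the pattern-based routine, and URLs are not treated specially.

```javascript
const SOFT_HYPHEN = "\u00AD";

// Split the text into words and everything else, then let a
// caller-supplied function hyphenate each word.
// `hyphenateWord` is a hypothetical callback: (word, hyphenChar) => string.
// `minWordLength` is an assumed threshold below which words are left alone.
function hyphenateText(text, hyphenateWord, minWordLength = 6) {
  // Runs of letters count as words; spaces, digits and punctuation pass through.
  return text.replace(/\p{L}+/gu, (word) =>
    word.length >= minWordLength ? hyphenateWord(word, SOFT_HYPHEN) : word
  );
}

// Stand-in for the pattern-based routine: a tiny, case-sensitive lookup table.
const KNOWN = new Map([["incredible", ["in", "cred", "i", "ble"]]]);
function lookupHyphenate(word, hyphenChar) {
  const parts = KNOWN.get(word);
  return parts ? parts.join(hyphenChar) : word;
}

const out = hyphenateText("An incredible talk.", lookupHyphenate);
// Soft hyphens are invisible, so show them as "|" for inspection.
console.log(out.split(SOFT_HYPHEN).join("|")); // An in|cred|i|ble talk.
```

In a browser the resulting string would be written back into the text node it came from, after which the browser re-renders the paragraph as described in step 6.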

Page 74: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

How it works

1. register all elements that need hyphenation2. if the language is not set, ask for it3. download the patterns, if not already done4. split the paragraphs in words (and URLs)5. process each word, put &shy; at every valid breakpoint6. The browser will re-render the text automatically, taking

in account the soft hyphens.

I execution is fastI downloading the script and the patterns takes time

script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB

Page 75: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

How it works

1. register all elements that need hyphenation2. if the language is not set, ask for it3. download the patterns, if not already done4. split the paragraphs in words (and URLs)5. process each word, put &shy; at every valid breakpoint6. The browser will re-render the text automatically, taking

in account the soft hyphens.

I execution is fastI downloading the script and the patterns takes time

script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB

Page 76: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

How it works

1. register all elements that need hyphenation2. if the language is not set, ask for it3. download the patterns, if not already done4. split the paragraphs in words (and URLs)5. process each word, put &shy; at every valid breakpoint6. The browser will re-render the text automatically, taking

in account the soft hyphens.

I execution is fastI downloading the script and the patterns takes time

script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB

Page 77: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

How it works

1. register all elements that need hyphenation2. if the language is not set, ask for it3. download the patterns, if not already done4. split the paragraphs in words (and URLs)5. process each word, put &shy; at every valid breakpoint6. The browser will re-render the text automatically, taking

in account the soft hyphens.

I execution is fastI downloading the script and the patterns takes time

script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB

Page 78: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

How it works

1. register all elements that need hyphenation2. if the language is not set, ask for it3. download the patterns, if not already done4. split the paragraphs in words (and URLs)5. process each word, put &shy; at every valid breakpoint6. The browser will re-render the text automatically, taking

in account the soft hyphens.

I execution is fastI downloading the script and the patterns takes time

script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB

Page 79: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

How it works

1. register all elements that need hyphenation2. if the language is not set, ask for it3. download the patterns, if not already done4. split the paragraphs in words (and URLs)5. process each word, put &shy; at every valid breakpoint6. The browser will re-render the text automatically, taking

in account the soft hyphens.

I execution is fastI downloading the script and the patterns takes time

script: 25 KB, en: 25 KB, pl: 37 KB, de: 74 KB

Page 80: The TeX hyphenation applied to HTML - About Frank M ... · ForFurtherReadingI David Antoš (2001): PatLib,PatternManipulatingLibrary–MasterThesis

Hyphenation forHTML

Mathias [email protected]://mnn.ch/

Motivationlayout w/o hyphenation

layout with hyphenation

The TEXhyphenationalgorithmThe original TEXhyphenation algorithm(1977)

The current TEXhyphenation algorithm(1983)

Creating the patterns(patgen)

Using the patterns(hyphenation)

HTML and thesoft hyphen

The Port toJavascriptServer side or Client side?

How it works

Differences andImprovements

Back to the Future

Main Differences

- don't care about space in RAM, care about program size
- no trie (retrieval tree)
  - no special data structures in Javascript
  - using a trie is faster in execution (10 ms)
  - but: building the tree from the patterns takes time
  - but: a tree needs extra code (uses bandwidth)
  - but: transferring the hardcoded trie is no solution either (overhead: 50%)
- using a hash table (Javascript: object) instead
- UTF-8 (thanks to Arthur and Mojca)
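The hash-table variant can be pictured like this. It is a sketch under assumptions: `buildPatternTable`, `matchingPatterns` and `maxLength` are made-up names, `_` is used as the word-boundary marker because the packed pattern samples in this talk use it, and the table is created with `Object.create(null)` so that substrings such as "constructor" cannot hit inherited object properties.

```javascript
// Hypothetical helper: store each pattern under its letters (digits removed).
function buildPatternTable(patterns) {
  const table = Object.create(null); // plain hash table without inherited keys
  for (const pattern of patterns) {
    table[pattern.replace(/[0-9]/g, "")] = pattern;
  }
  return table;
}

// Try every substring of the word; each try is one hash lookup.
function matchingPatterns(word, table, maxLength = 8) {
  const w = "_" + word.toLowerCase() + "_";
  const hits = [];
  for (let start = 0; start < w.length; start++) {
    for (let len = 1; len <= maxLength && start + len <= w.length; len++) {
      const hit = table[w.slice(start, start + len)];
      if (hit !== undefined) hits.push(hit);
    }
  }
  return hits;
}

const table = buildPatternTable(["hy3ph", "he2n", "1na"]);
console.log(matchingPatterns("hyphen", table).join(", ")); // hy3ph, he2n
```

No extra data-structure code has to be shipped, which is the point made above: the lookup loop is a few lines and the table is just an object.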

Improvements I

Packing the patterns (helper: compressor):
- size of the pattern file does matter
- no whitespace (> 12% saved!)

unpacked: a1 ą1 e1 ę1 i1 o1 ó1 u1 y1 _a1 _b8 _c8 _ć8 _d8
packed:   2:'a1ą1e1ę1i1o1ó1u1y1',3:'_a1_b8_c8_ć8_d8_e1_f8

Merging script and patterns in one file (helper: merge+pack):
- HTTP requests take time
- merge the script and the necessary patterns (usually just one) into one file
- saves 2 requests per pattern file
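The packed format can be reproduced with a few lines. This sketch assumes that the number in front of each string is the character length (letters plus digits) of every pattern in it, which is what the sample above suggests; `packPatterns` and `unpackPatterns` are hypothetical names, not the actual compressor helper.

```javascript
// Group patterns by their character length and concatenate them without spaces.
function packPatterns(spaceSeparated) {
  const packed = {};
  for (const pattern of spaceSeparated.split(/\s+/).filter(Boolean)) {
    const length = Array.from(pattern).length; // count code points, not UTF-16 units
    packed[length] = (packed[length] || "") + pattern;
  }
  return packed;
}

// Reverse step: cut each string into pieces of the length given by its key.
function unpackPatterns(packed) {
  const patterns = [];
  for (const [key, blob] of Object.entries(packed)) {
    const length = Number(key);
    const chars = Array.from(blob);
    for (let i = 0; i < chars.length; i += length) {
      patterns.push(chars.slice(i, i + length).join(""));
    }
  }
  return patterns;
}

const sample = "a1 ą1 e1 ę1 i1 o1 ó1 u1 y1 _a1 _b8 _c8 _ć8 _d8";
const packed = packPatterns(sample);
console.log(packed[2]); // a1ą1e1ę1i1o1ó1u1y1
console.log(packed[3]); // _a1_b8_c8_ć8_d8
```

Because all patterns in one string have the same length, no separator is needed, and that is where the whitespace saving comes from.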

Improvements II

Using reduced pattern sets for static sites (helper: reducePatternSet):
- most patterns are not used
- if the text will not change, use a precomputed subset
- savings vary

Recompute the patterns:
- only take into account breakpoints of composite words:
  Zeilen-ende instead of Zei-len-en-de
  de patterns are now 37 KB instead of 74 KB
  265683 good, 22837 bad, 995752 missed
  21.06 %, 1.81 %, 78.94 %
- or use different settings for patgen!
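The first idea above, a precomputed subset for a text that will not change, can be sketched as follows. `reduceToUsedPatterns` is a hypothetical function written for this illustration and may differ from the actual reducePatternSet helper; it assumes `_` as the word-boundary marker.

```javascript
// Keep only the patterns whose letters occur in at least one word of the text.
// Patterns that never match cannot influence the result, so the reduced set
// hyphenates this particular text exactly like the full set.
function reduceToUsedPatterns(text, patterns) {
  const words = (text.toLowerCase().match(/\p{L}+/gu) || []).map(
    (word) => "_" + word + "_"
  );
  return patterns.filter((pattern) => {
    const letters = pattern.replace(/[0-9]/g, "");
    return words.some((word) => word.includes(letters));
  });
}

const used = reduceToUsedPatterns("Hyphenation", [
  "hy3ph", "he2n", "1tio", "z2z", "_zy1",
]);
console.log(used.join(" ")); // hy3ph he2n 1tio
```

How much this saves depends on the text, as noted above: a short page uses only a small share of the patterns, a large site may use most of them.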

Hyphenator.js Problems and Oddities

- problems upon copy/paste of hyphenated text
- problems with loaded fonts (@font-face)
- patterns very different in size
- some rare misplaced hyphenation breaks may happen

What the future shall/may bring

- CSS3: browsers do hyphenation (w/o hyphenator.js)
- TUG: maintained hyphenation patterns (beware of size!)
- Wish: better typography in web sites
- Me: try to rewrite PatGen for UTF-8 support

Appendix

Soft hyphen

Section 9.3.3 of the HTML 4.01 Specification says the following about hyphenation in HTML:

  [...] The soft hyphen tells the user agent where a line break can occur. [...] If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. [...] The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)
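In a script, the three notations in the quote all denote the same character, U+00AD. The following sketch shows that, together with two small conversions; `toEntities` and `stripSoftHyphens` are hypothetical helper names. Stripping is one conceivable way to clean up text that was copied from a hyphenated page.

```javascript
// &shy;, &#173; and &#xAD; all stand for this one character.
const SHY = "\u00AD";
console.log(SHY.codePointAt(0) === 173, 0xad === 173); // true true

// Make soft hyphens visible as entity references, e.g. for HTML source output.
function toEntities(text) {
  return text.split(SHY).join("&shy;");
}

// Remove soft hyphens again, e.g. from text taken out of a hyphenated page.
function stripSoftHyphens(text) {
  return text.split(SHY).join("");
}

console.log(toEntities("hy" + SHY + "phen")); // hy&shy;phen
console.log(stripSoftHyphens("hy" + SHY + "phen")); // hyphen
```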

applying the patterns – example 2

hyphenation

. h y p h e n a t i o n .
                 2i o
           1n a
                    o2n
        h e2n
            n2a t
               1t i o
        h e n a4
  h y3p h
        h e n5a t
-------------------------
. h y3p h e2n5a4t2i o2n .

hy-phen-ation
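The example can be replayed in code: every matching pattern contributes its digits, the highest digit wins at each position, and odd values mark break points. This is a sketch restricted to the nine patterns shown above; `applyPatterns` and `showBreaks` are hypothetical names, and the minimum number of letters before and after a break that real hyphenators enforce is left out.

```javascript
// Only the nine patterns from the example, not a full pattern set.
const EXAMPLE_PATTERNS = [
  "2io", "1na", "o2n", "he2n", "n2at", "1tio", "hena4", "hy3ph", "hen5at",
];

// Returns levels[i] = highest digit found for the gap before character i
// of the dotted word ".word.".
function applyPatterns(word, patterns) {
  const w = "." + word.toLowerCase() + ".";
  const levels = new Array(w.length + 1).fill(0);
  for (const pattern of patterns) {
    const letters = pattern.replace(/[0-9]/g, "");
    // Visit every occurrence of the pattern's letters in the word.
    for (let at = w.indexOf(letters); at !== -1; at = w.indexOf(letters, at + 1)) {
      let offset = 0;
      for (const ch of pattern) {
        if (ch >= "0" && ch <= "9") {
          levels[at + offset] = Math.max(levels[at + offset], Number(ch));
        } else {
          offset += 1;
        }
      }
    }
  }
  return levels;
}

// Odd level: break allowed. Even level (or 0): no break.
function showBreaks(word, levels, separator = "-") {
  let out = "";
  for (let i = 0; i < word.length; i++) {
    // word[i] is character i + 1 of the dotted word.
    if (i > 0 && levels[i + 1] % 2 === 1) out += separator;
    out += word[i];
  }
  return out;
}

const levels = applyPatterns("hyphenation", EXAMPLE_PATTERNS);
console.log(showBreaks("hyphenation", levels)); // hy-phen-ation
```

Passing `"\u00AD"` as the separator instead of `"-"` gives the soft-hyphenated string that would be written into the page.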

For Further Reading I

David Antoš (2001): PatLib, Pattern Manipulating Library. Master Thesis, Masaryk University Brno, Faculty of Informatics.

Donald E. Knuth (1999): Digital Typography. Stanford, California: Center for the Study of Language and Information. ISBN 1-57586-010-4.

Franklin Mark Liang (1983): Word Hy-phen-a-tion by Com-put-er. PhD thesis, Department of Computer Science, Stanford University, Stanford, CA 94305. http://www.tug.org/docs/liang/liang-thesis.pdf

For Further Reading II

Christine Römer, Herbert Voß (2008): Deutsche Silbentrennmuster – aus linguistischer und TeXnischer Sicht. PDF, Jena, 06.03.2008. http://www.personal.uni-jena.de/~xcr/v2/Dateien/File/Jena2008.pdf

Dave Raggett, Arnaud Le Hors, Ian Jacobs (1999): HTML 4.01 Specification. W3C Recommendation, 24 December 1999. http://www.w3.org/TR/html401/

