IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
Computer Lexica in OCR and Retrieval
Katrien Depuydt, Jesse de Does (Instituut voor Nederlandse Lexicologie, Leiden)
4 March 2009 presentation The Hague 2
Can we handle ‘de wereld’ (‘the world’)?
werreid
IMPACT <Demo Day BL, 12 July 2011> 3
OCR: ABBYY FineReader SDK with built-in standard Dutch dictionary
OCR: ABBYY FineReader SDK combining the built-in modern Dutch dictionary with the IMPACT external historical lexicon of Dutch:
werreld
werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
RETRIEVAL: type in the modern form WERELD and find all of these variants
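The retrieval idea can be sketched as query expansion over an IR lexicon. This is a minimal Python sketch, assuming a hypothetical in-memory dictionary from modern lemma to attested forms (a handful of the variants listed above); a real system would read the lexicon from a database and use a proper search engine rather than scanning strings.

```python
# Hypothetical toy IR lexicon: modern lemma -> attested historical word forms
# (a few of the variants of "wereld" listed above).
LEXICON: dict[str, set[str]] = {
    "wereld": {"wereld", "werelt", "weerelt", "waereld", "werrelt", "werelden", "weerelds"},
}


def expand_query(term: str, lexicon: dict[str, set[str]]) -> set[str]:
    """Return all word forms recorded for the modern lemma `term`.

    Terms missing from the lexicon fall back to themselves.
    """
    key = term.lower()
    return lexicon.get(key, {key})


def search(term: str, documents: dict[int, str], lexicon: dict[str, set[str]]) -> list[int]:
    """Return sorted ids of documents containing any form of the lemma."""
    forms = expand_query(term, lexicon)
    hits = []
    for doc_id, text in documents.items():
        # Lower-case and strip surrounding punctuation before comparing.
        tokens = {tok.strip(".,;:!?").lower() for tok in text.split()}
        if tokens & forms:
            hits.append(doc_id)
    return sorted(hits)


docs = {1: "De geheele werelt staat verbaasd.", 2: "Een reis om de waereld.", 3: "Het huis aan de gracht."}
print(search("WERELD", docs, LEXICON))  # [1, 2]
```

The user only ever types the modern form; all knowledge about historical spelling lives in the lexicon.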
IMPACT workshop, Bratislava, May 7, 2010 5
The long s problem: An example ….
OCR at start of project
A. De eerde was de gevaarlykflti om de verlei¬ding aan 't Hof; de tweede de ftillie en veiligde; de derde de zwaarde, daar hy byna drie millioenen harde en onbefchaafde Menfchen beftieren moest.
Results April 2010:
A. De eerste was de gevaarlykste om de verlei-ding aan 't Hof; de tweede de stilste en veiligste; de derde de zwaarste, daar hy byna drie millioenen harde en onbeschaafde Menschen bestieren moest.
Workaround: “integrated post-correction”: tell the engine that “eerfte” is acceptable, and post-correct it afterwards with the lexicon.
This keeps the engine from turning it into “eerde” (earth) instead of “eerste” (first).
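A minimal sketch of the integrated post-correction idea, assuming a plain Python set stands in for the lexicon: every word with a non-final s also gets its long-s reading with f (eerste → eerfte), those spellings are the ones the OCR engine would be told to accept, and a lookup table maps them back afterwards. The helper names are hypothetical, and handing the word list to the FineReader engine is not shown, since that depends on the SDK.

```python
def long_s_variants(word: str) -> set[str]:
    """Spellings of `word` in which any non-final 's' is read as 'f'.

    Long s was not used word-finally, so a final 's' is never replaced.
    """
    variants = {""}
    for i, ch in enumerate(word):
        options = [ch, "f"] if ch == "s" and i < len(word) - 1 else [ch]
        variants = {v + o for v in variants for o in options}
    return variants


def build_correction_table(lexicon: set[str]) -> dict[str, str]:
    """Map every long-s spelling ('eerfte') to its proper form ('eerste').

    Genuine lexicon words are entered first so they always map to themselves.
    The keys are the word list the OCR engine would be told to accept.
    """
    table = {word: word for word in lexicon}
    for word in lexicon:
        for variant in long_s_variants(word):
            table.setdefault(variant, word)
    return table


def postcorrect(tokens: list[str], table: dict[str, str]) -> list[str]:
    """Replace accepted long-s spellings by their lexicon form."""
    return [table.get(tok, tok) for tok in tokens]


table = build_correction_table({"eerste", "eerde", "gevaarlykste", "menschen"})
print(postcorrect(["de", "eerfte", "was", "de", "gevaarlykfte"], table))
# ['de', 'eerste', 'was', 'de', 'gevaarlykste']
```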
Overview
- What is a computer lexicon?
- Lexica in IMPACT
- Tools for lexicon building and applying lexica
- Some results
- Searching
- Demonstration
What is a computer lexicon?
Computer lexicon vs electronic dictionary (1)
An electronic dictionary is:
- Digitised full text (no pictures)
- For human use
- Ideally: searchable, with explicitly coded material (XML) such as lemma, part of speech (PoS), meaning, quotations etc.
- Examples: OED online, WNT online
Dictionary XML (example)
Computer lexicon vs electronic dictionary (2)
A computer lexicon is:
- Always in a structured digital format (XML, relational database)
- Main purpose: computer applications
- Explicitly coded information (e.g. lemma wereld, part of speech noun, morphology werelden, werelds …, syntax)
Examples of use:
- Linguistic enrichment of text material
- ‘Advanced’ searching (words with all spelling variants and inflections)
- Automatic summarization, keyword extraction…
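Linguistic enrichment, the first use listed above, amounts to looking up each token in the lexicon and attaching its possible lemmata and parts of speech. The sketch below assumes a simplified, hypothetical entry structure; the PoS labels are illustrative, and real entries carry much more (syntax, attestations, sources).

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Simplified, hypothetical computer-lexicon entry."""
    lemma: str
    pos: str
    wordforms: set[str] = field(default_factory=set)


def build_form_index(entries: list[LexicalEntry]) -> dict[str, list[tuple[str, str]]]:
    """Invert the lexicon: word form -> list of (lemma, PoS) analyses."""
    index: dict[str, list[tuple[str, str]]] = {}
    for entry in entries:
        for form in entry.wordforms | {entry.lemma}:
            index.setdefault(form, []).append((entry.lemma, entry.pos))
    return index


def enrich(text: str, index: dict[str, list[tuple[str, str]]]) -> list[tuple[str, list[tuple[str, str]]]]:
    """Attach all possible (lemma, PoS) analyses to each token; unknown tokens get []."""
    return [(tok, index.get(tok.lower(), [])) for tok in text.split()]


lexicon = [
    LexicalEntry("wereld", "NOU", {"werelden", "werelds", "werelt"}),
    LexicalEntry("zijn", "VRB", {"is", "was", "waren"}),
    LexicalEntry("was", "NOU"),
]
for token, analyses in enrich("De werelt was", build_form_index(lexicon)):
    print(token, analyses)
```

The token "was" receives two analyses (a form of the verb zijn, or the noun was); choosing between them requires context or manual disambiguation.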
Lexica in IMPACT
The OCR lexicon
An OCR lexicon is:
- A checked list of words in a language
- Based on a corpus (collection) of dated texts (a selection!)
- Preferably with frequency information
- Preferably from the same time period, or of the same text type, as the texts you wish to digitize
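Deriving the frequency part of an OCR lexicon from dated texts can be sketched as follows. This assumes the corpus is available as (year, text) pairs; the frequency cut-off is a hypothetical stand-in for the manual checking that makes the list "checked".

```python
import re
from collections import Counter


def build_ocr_lexicon(corpus: list[tuple[int, str]], start: int, end: int,
                      min_freq: int = 2) -> dict[str, int]:
    """Frequency list of words in texts dated from `start` to `end` (inclusive).

    Words below `min_freq` are dropped: a crude, hypothetical stand-in for
    the manual checking that a real OCR lexicon needs.
    """
    counts = Counter()
    for year, text in corpus:
        if start <= year <= end:
            counts.update(re.findall(r"\w+", text.lower()))
    return {word: n for word, n in counts.most_common() if n >= min_freq}


corpus = [
    (1650, "Whiche thingis shalbe done, whiche thingis"),
    (1650, "thingis whiche"),
    (1950, "television and radar"),
]
print(build_ocr_lexicon(corpus, 1550, 1750))
```

Restricting by date is what keeps modern words such as television out of a lexicon meant for early modern prints, as the example below illustrates.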
OCR lexicon: example (1550-1750 vs. after 1900)
1550-1750: song 820, rihte 818, theire 818, manye 818, sume 815, Do 814, Whiche 811, fyrst 811, while 811, Water 810, wt 809, shalbe 808, thingis 807, again 806, sona 806, wa 805, mode 804, work 802, between 801, law 799, moder 798, mis 798, softe 798
After 1900: television 418, electronic 375, video 194, hormone 176, jazz 162, eco 142, software 136, vitamin 128, movie 121, taxi 113, isotopic 108, electronics 95, radar 86, basically 71, sabotage 71, homozygote 70, psychedelic 67, phonemic 66, insulin 64, zap 64, antibody 61, fungicidal 61
The IR lexicon
Most important information categories of an IR lexicon:
- Word forms (lists of words), plus:
  - frequency information
  - quotations (dated sources) from corpora or electronic dictionaries
  - MODERN LEMMA (comparable to a dictionary headword), linked to spelling variants and inflected forms of the same word
The modern lemma is used for searching in texts.
This is standard practice in corpus linguistics and modern historical lexicography.
```xml
<?xml version='1.0'?>
<!DOCTYPE lexicon SYSTEM 'NL_Structure.dtd'>
<lexicon>
  <lexical_entry>
    <lemma_id>219490</lemma_id>
    <modern_lemma>aantuilen</modern_lemma>
    <gloss></gloss>
    <POS>VRB</POS>
    <ne_label></ne_label>
    <language_id></language_id>
    <portmanteau_lemma_id></portmanteau_lemma_id>
    <wordform>
      <form_representation>
        <wordform_id>850026</wordform_id>
        <written_form>tuyld</written_form>
        <attestation>
          <id>92141</id>
          <token_id></token_id>
          <quote>Verhael ick (<I>t.w. een als vrouw verkleede man</I>) haer mijn min in Vrouwelijcker schynen: Sy acht het boertery, en tuyld daer weer op an, Vermits een Vrou niet op een Vrou verlieven kan,</quote>
          <derivation_id>0</derivation_id>
          <document_id>204</document_id>
          <start_pos>119</start_pos>
          <end_pos>124</end_pos>
        </attestation>
      </form_representation>
    </wordform>
  </lexical_entry>
</lexicon>
```
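Reading such entries programmatically is straightforward with Python's standard `xml.etree.ElementTree`. This sketch uses a trimmed copy of the entry above (DOCTYPE, empty elements and the quotation omitted) and assumes every wordform sits inside a lexical_entry as shown; the full structure defined in NL_Structure.dtd is not given here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the entry above (DOCTYPE, empty elements and quote omitted).
SAMPLE = """<lexicon><lexical_entry>
  <lemma_id>219490</lemma_id><modern_lemma>aantuilen</modern_lemma><POS>VRB</POS>
  <wordform><form_representation>
    <wordform_id>850026</wordform_id><written_form>tuyld</written_form>
    <attestation><id>92141</id><document_id>204</document_id>
      <start_pos>119</start_pos><end_pos>124</end_pos></attestation>
  </form_representation></wordform>
</lexical_entry></lexicon>"""


def read_lexicon(xml_text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Map each attested written form to (modern lemma, PoS, document id) tuples."""
    root = ET.fromstring(xml_text)
    index: dict[str, list[tuple[str, str, str]]] = {}
    for entry in root.iter("lexical_entry"):
        lemma = entry.findtext("modern_lemma")
        pos = entry.findtext("POS")
        for rep in entry.iter("form_representation"):
            form = rep.findtext("written_form")
            # Forms without an <attestation> are skipped in this sketch.
            for att in rep.iter("attestation"):
                index.setdefault(form, []).append((lemma, pos, att.findtext("document_id")))
    return index


print(read_lexicon(SAMPLE))  # {'tuyld': [('aantuilen', 'VRB', '204')]}
```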
Tools for lexicon building and application of lexica
Types of variation (spelling, inflection…)
uytterlijcste uyterlijkste d'uyterlijke uiterlyke uyterlijcke uiterlijke uyterlijck uiterlyken uiterlijkste uiterlicke wterlicke wterlijcke ulterlijk uiterlyk uiterlijk uyterlick wterlicken d'uyterlijcke uiterlijken uiterlijks wterlijck uytterlicke uitterlijke ujterlijke uytterlijk uyterlycke uyterlicken uijterlicke d'uiterlijcke wtterlijcke wterlyke wtterlijk uuterlick uuterlic uyterlijke uyterlijcken uyterlicke d'uiterlyke wterlijke vuyterlijcke uuterlycke uuterlicke wterlijken uyterlijcksten uuyterlicke uuyterlick uuyterlycke uytterlijcke uytterlycke uytterlick vuytterlicke uiterlijker uyterlyck uterliek wterlijcken uiterlijkst uitterlijk uytterlijcken uyterlyk wterlick uutterlijck uuyterlicken uyttelijck uijterlijk uytterlijck uuterlijck uiterlick uitterlyk uuyterlic uuyterlyck uuyterlijck uiterlijck uytterlyck uterlyc wterlijk
I
werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
II
(patterns to predict variation)
(Some variants are predictable with patterns; others need to be taken from a lexicon.)
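Predicting variants with patterns can be sketched as applying (modern, historical) substring rewrites to a modern form. The three patterns below (ui/uy, ij/y, k/ck) are hypothetical examples chosen to reproduce some forms from list I; a form such as uuterlic would need further patterns or a lexicon, which is the point made above.

```python
def pattern_variants(modern: str, patterns: list[tuple[str, str]], max_changes: int = 2) -> set[str]:
    """Candidate historical spellings of `modern`.

    `patterns` are (modern, historical) substring pairs. Rewrites are applied
    left to right at non-overlapping positions of the original word, at most
    `max_changes` per candidate; the unchanged word is included.
    """
    results: set[str] = set()

    def walk(i: int, prefix: str, changes: int) -> None:
        if i == len(modern):
            results.add(prefix)
            return
        walk(i + 1, prefix + modern[i], changes)  # keep this character
        if changes < max_changes:
            for mod, hist in patterns:
                if modern.startswith(mod, i):
                    walk(i + len(mod), prefix + hist, changes + 1)

    walk(0, "", 0)
    return results


PATTERNS = [("ui", "uy"), ("ij", "y"), ("k", "ck")]
print(sorted(pattern_variants("uiterlijk", PATTERNS, max_changes=3)))
```

With all three rewrites allowed, the output includes attested forms such as uiterlyk, uyterlijck and uyterlyck; limiting `max_changes` keeps the candidate set small when patterns are many.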
Neil Fitzgerald, 7th July 2011 22
Computer lexica
- For OCR and OCR post-correction
- Improving searchability of historical text material by building a lexicon with variants, using a modern lemma as the search entry
- Tools for lexicon building
- Tools for applying the lexicon in search engines
- Lexicon cookbook
Tools (more specific)
- Lexicon building from corpus material and dictionaries
- Use of lexica in search engines
- Tool to extract spelling variation patterns from historical material
- Tool to relate previously unrecognised spelling variations to their standard form
- Tool to reduce previously unrecognised inflected forms to their base form
Spelling variation tools (pattern-based)
Language-independent approach:
- Supervised rule (pattern) induction from (“modern” word, historical word) pairs, yielding patterns like aa/ae, s/z, …
- Pattern weights are computed from example material
Additional approaches are possible, e.g. use of aligned data (parallel historical and modern versions of a text).
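Pattern induction from (modern, historical) pairs can be approximated with an edit alignment. This sketch uses `difflib.SequenceMatcher` as a stand-in aligner (not necessarily what the IMPACT tool uses) and takes each pattern's weight as its relative frequency. Because the alignment is minimal, maan/maen yields a/e rather than aa/ae; a real tool would typically keep some context around the substitution.

```python
from collections import Counter
from difflib import SequenceMatcher


def induce_patterns(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Collect (modern, historical) substitution patterns with relative weights.

    Each pair is aligned with difflib; every non-equal opcode becomes one
    pattern, e.g. ("ij", "y"). Weight = count / total patterns observed.
    """
    counts = Counter()
    for modern, historical in pairs:
        matcher = SequenceMatcher(None, modern, historical, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                counts[(modern[i1:i2], historical[j1:j2])] += 1
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()} if total else {}


pairs = [("maan", "maen"), ("straat", "straet"), ("zijn", "zyn"), ("wijn", "wyn")]
print(induce_patterns(pairs))
```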
Lemmatization
Reduction of historical word forms to a modern lemma:
historical word → (1) standard (“modern”) spelling → (2) lemma form
(1) pattern matching; (2) lemmatizer
Example: Dystels → (1) distels → (2) distel
When we have a perfect or near-perfect modern full form lexicon, the second step is simply lexicon lookup.
But:
1) We will not have full-form information for many lemmata (especially historical ones).
2) Even lemmata present in the modern language may have historical inflected forms that differ from the present-day paradigm.
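The two-step process can be sketched as follows, assuming a small hypothetical full-form lexicon and single-rewrite normalization with the pattern y → i. When the full-form lexicon lacks a form, step 2 fails, which is exactly the gap the two caveats describe.

```python
from typing import Optional


def normalize(historical: str, patterns: list[tuple[str, str]],
              modern_forms: set[str]) -> Optional[str]:
    """Step 1: map a historical spelling to a known modern spelling.

    Tries the word itself, then every single (historical, modern) rewrite.
    """
    word = historical.lower()
    if word in modern_forms:
        return word
    for hist, mod in patterns:
        start = word.find(hist)
        while start != -1:
            candidate = word[:start] + mod + word[start + len(hist):]
            if candidate in modern_forms:
                return candidate
            start = word.find(hist, start + 1)
    return None


def lemmatize(historical: str, patterns: list[tuple[str, str]],
              full_forms: dict[str, str]) -> Optional[str]:
    """Step 2: look the normalized spelling up in a modern full-form lexicon."""
    modern = normalize(historical, patterns, set(full_forms))
    return full_forms.get(modern) if modern is not None else None


FULL_FORMS = {"distel": "distel", "distels": "distel"}  # hypothetical full-form lexicon
print(lemmatize("Dystels", [("y", "i")], FULL_FORMS))  # distel
```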
Lemmatization and reverse lemmatization
We also need a lemmatization process for these situations.
A typical lemmatizer assigns some standard form (infinitive, nominative, stem) to inflected forms. Usually based on patterns relating the inflected form to the standard form.
But:
- Matching these patterns is hard to combine with matching both spelling variation patterns and OCR errors (bok/bokken/bokkeu).
- We adopt the solution of actually expanding lemmata into a “hypothetical modern full-form lexicon”, containing their most plausible paradigmatic expansions.
- This construction is carried out by means of a statistical reverse lemmatizer.
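A statistical reverse lemmatizer can be sketched by learning suffix rewrite rules from known (lemma, inflected form) pairs, conditioning them on the lemma's final letters, and applying them to new lemmata with relative frequencies as probabilities. This is a simplified illustration with hypothetical training pairs and parameters (`context`, `min_prob`), not the IMPACT implementation.

```python
from collections import Counter, defaultdict


def suffix_rule(lemma: str, form: str) -> tuple[str, str]:
    """Describe lemma -> form as (suffix removed from lemma, suffix added)."""
    k = 0
    while k < min(len(lemma), len(form)) and lemma[k] == form[k]:
        k += 1
    return lemma[k:], form[k:]


def train(pairs: list[tuple[str, str]], context: int = 2) -> dict[str, Counter]:
    """Count suffix rules per lemma ending (its last `context` letters)."""
    rules: dict[str, Counter] = defaultdict(Counter)
    for lemma, form in pairs:
        rules[lemma[-context:]][suffix_rule(lemma, form)] += 1
    return rules


def expand(lemma: str, rules: dict[str, Counter], context: int = 2,
           min_prob: float = 0.2) -> dict[str, float]:
    """Hypothetical full forms of `lemma` with relative-frequency probabilities."""
    counts = rules.get(lemma[-context:])
    if not counts:
        return {}
    total = sum(counts.values())
    forms: dict[str, float] = {}
    for (removed, added), n in counts.items():
        if not lemma.endswith(removed) or n / total < min_prob:
            continue
        form = lemma[: len(lemma) - len(removed)] + added
        forms[form] = forms.get(form, 0.0) + n / total
    return forms


TRAINING = [("tafel", "tafels"), ("lepel", "lepels"), ("lepel", "lepeltje"),
            ("boek", "boeken"), ("hoek", "hoeken"), ("hoek", "hoekje")]
rules = train(TRAINING)
print(expand("distel", rules))
```

A lemma with an unseen ending (here bok) gets no expansion at all; richer training data would be needed to produce forms like bokken.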
Attestation
- From hypothetical (non-witnessed) lexicon content to attested word forms in “real” text
- Automatic selection of candidate attestations
- Manual work: verification and correction
Two approaches:
- Dictionary-based (INL): Woordenboek der Nederlandsche Taal
- Corpus-based (LMU, INL): Dutch DBNL corpus
IMPACT Dictionary Attestation Tool
work
• We are working on what works.
• Depart from me, ye that worke iniquity.
• She worcketh knittinge of stockings.
[Screenshot labels: headword; quotations; variants]
Task: find the variants of a headword as they occur in the quotations.
Lexicon building at work: Verifying attestations in historical dictionaries
IMPACT Dictionary Attestation Tool
Automatically (preprocessing)
• match literally, e.g. work → work, Work
• match using existing lexica and lists, e.g. work → works, worked, wrought
• approximate matching, e.g. work → worke
By hand (using the tool)
• correct automatic mismatches, e.g. works → words, worms
• find missed matches, e.g. work → worketh, wrowght
[Diagram: electronic historical dictionary → database with lemmata and quotations]
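The automatic preprocessing steps can be sketched as a cascade: literal match, lexicon lookup, then approximate matching. Here difflib's similarity ratio with a 0.8 threshold stands in for the approximate matcher (an assumption, as is the function naming); with these settings worke is found but worcketh is not, which is the kind of missed match the manual step has to catch.

```python
from difflib import SequenceMatcher
from typing import Optional


def classify(headword: str, token: str, known_forms: set[str],
             threshold: float = 0.8) -> Optional[str]:
    """Return how `token` matches `headword`: literal, lexicon, approximate or None."""
    t, h = token.lower(), headword.lower()
    if t == h:
        return "literal"
    if t in known_forms:
        return "lexicon"
    if SequenceMatcher(None, h, t).ratio() >= threshold:
        return "approximate"
    return None


def candidate_attestations(headword: str, quotation: str,
                           known_forms: set[str]) -> list[tuple[str, str]]:
    """List (token, match type) for every token that matches the headword."""
    results = []
    for tok in quotation.split():
        tok = tok.strip(".,;:!?")
        kind = classify(headword, tok, known_forms)
        if kind:
            results.append((tok, kind))
    return results


forms = {"works", "worked", "wrought"}  # hypothetical existing-lexicon forms
for quote in ["We are working on what works.",
              "Depart from me, ye that worke iniquity.",
              "She worcketh knittinge of stockings."]:
    print(candidate_attestations("work", quote, forms))
```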
IMPACT Attestation Tool
[Screenshot labels: lemma headword; quotations, sorted by uncertainty; up-to-date overview of what has been done and what still needs to be done; work done by this user so far]
IMPACT Lexicon Tool
Automatically (preprocessing = apply lemmatizer)
• match literally, e.g. work → work, Work
• match using existing lexica and lists, e.g. work → works, worked, wrought
• match using the spelling variation module, e.g. uiterlijk → uyterlick
By hand (using the tool)
• assign the correct lemma, e.g. was → was (N) or zijn (V)
• group tokens that belong together, e.g. konings zoon → koningszoon
• select attestations
Task: find and verify attestations in a historical corpus.
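Grouping split tokens is done by hand in the tool, but candidate groupings could be proposed automatically for the user to confirm. A minimal sketch, assuming a plain set of known compounds stands in for the lexicon; the function name is hypothetical.

```python
def suggest_groupings(tokens: list[str], lexicon: set[str]) -> list[tuple[int, str]]:
    """Propose merging token i with token i+1 when the joined form is a known word.

    Suggestions are (index, merged form) pairs for a user to accept or reject.
    """
    suggestions = []
    for i in range(len(tokens) - 1):
        merged = tokens[i] + tokens[i + 1]
        if merged.lower() in lexicon:
            suggestions.append((i, merged))
    return suggestions


print(suggest_groupings(["des", "konings", "zoon", "kwam"], {"koningszoon"}))
# [(1, 'koningszoon')]
```

Leaving the final decision to the user matters because a joined form can be a real word even where the two tokens were meant separately.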
Corpus-based lexicon building: Impact Lexicon Tool
General vocabulary vs. named entities
- Tools for lexicon building described so far: applicable to the general lexicon
- Tools for NE recognition, classification and variant matching:
  - a library requirement
  - distinguish general vocabulary from NEs
  - avoid unpleasant mix-ups like Abimelech → apemelk! (b/p; i/e; e/0; k/ch)
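The mix-up arises when spelling patterns are chained freely. The sketch below applies the four rules from the slide, read as historical → modern rewrites (an assumption about their direction), in a breadth-first search until a modern lexicon word is reached; checking a named-entity list first keeps Abimelech intact. The search depth and the function name are illustrative.

```python
from collections import deque

# Rules from the slide, assumed to rewrite historical -> modern: b/p, i/e, e/0, k/ch.
RULES = [("b", "p"), ("i", "e"), ("e", ""), ("ch", "k")]


def map_to_modern(word: str, lexicon: set[str], named_entities: set[str],
                  rules: list[tuple[str, str]] = RULES, max_steps: int = 4) -> str:
    """Map `word` (lower-cased) to a lexicon word via chained single rewrites.

    Known named entities are returned untouched, so that a name is never
    "corrected" into a general-vocabulary word. If nothing is found within
    `max_steps` rewrites, the lower-cased word is returned.
    """
    word = word.lower()
    if word in named_entities or word in lexicon:
        return word
    seen, queue = {word}, deque([(word, 0)])
    while queue:
        current, steps = queue.popleft()
        if current in lexicon:
            return current
        if steps == max_steps:
            continue
        for old, new in rules:
            start = current.find(old)
            while start != -1:
                nxt = current[:start] + new + current[start + len(old):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
                start = current.find(old, start + 1)
    return word


print(map_to_modern("Abimelech", {"apemelk"}, set()))          # apemelk
print(map_to_modern("Abimelech", {"apemelk"}, {"abimelech"}))  # abimelech
```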
Improvement of state of the art / innovation
- We use existing computational linguistic approaches, but figure out how to apply them to historical language.
- We develop a workflow to deal with the problems posed by historical language, figuring out how all the pieces fit together.
- Data selection and acquisition
- Manual work
- Computational linguistics tools
Languages in IMPACT
Dutch, German, English, Spanish, French, Polish, Czech, Slovene and Bulgarian
- Cross-language perspective paper
- Parallel OCR and IR experiments
- GT (ground truth) datasets
- Language tools: language-independent
- Apart from the 3 core languages: proof-of-concept lexica
OCR evaluation results (preliminary!)
1. Czech
- Co jest konstituce?, čili, Krátký, prostonárodní wýklad hlawnějších zásad konstitucí ewropejských, 1848
- Ferina Lišák z Kuliferdy a na Klukově, čili, Kratičká historye zlopověstných kousků starého Reinecke, 1848
- Homerowa Iliada, 1802
- Na den narození neimocněišího, a neijasněišího cysare rímského, téz dědičného rakauského a krále ceského, Frantiska II., w Praze 12. den mesyce Unora, léta 1805, 1805
- Plody sborů učenců řeči českoslowanské prešporského, 1836
- Rozprawy o gmenách, počátkách i starožitnostech národu Slawského a geho kmeni /, 1830
- Sokol, 1872
- Základowé pitwy (Anatomie), čili, Soustawnj rozbor a popis těla lidského a gednotliwých geho částek, 1840
2. Dutch
18th- and 19th-century books, newspapers, parliamentary papers
- Provinciale Overijsselsche en Zwolsche courant : staats-, handels-, nieuws- en advertentieblad, 1852-1852
- Rechtsgeleerd advis in de zaak van den gewezen stadhouder, en over deszelfs schryven aan de gouverneurs van de Oost- en West-Indische bezittingen van den staat [...]. Ingelevert [...] op den 7 january 1796. / By B. Voorda et al, 1796-1796
- Verhaal van het levensgevaar, waar in zig drie Rotterdamsche burgers [...] bevonden hebben, te Utrecht, 1784-1784
- Vrijmoedige aanmerkingen, over de uitsluiting van allen die door publieke armkassen bedeeld worden, als stemgerechtigden [...] bij eene oproeping van het Nederlandsche volk tot eene Nationaale Conventie, 1795-1795
Precision: 0.8432889410216431, Recall: 0.843331934927516
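The slide does not say how these figures were computed. One common word-level definition compares the bag of OCR words with the bag of ground-truth words; the sketch below implements that definition, which may differ from the tokenization and alignment used in the IMPACT evaluation.

```python
from collections import Counter


def word_precision_recall(ocr_text: str, ground_truth: str) -> tuple[float, float]:
    """Bag-of-words precision and recall of OCR output against ground truth.

    A word counts as correct as often as it occurs in both texts (case-sensitive).
    """
    ocr, gt = Counter(ocr_text.split()), Counter(ground_truth.split())
    correct = sum((ocr & gt).values())
    precision = correct / max(sum(ocr.values()), 1)
    recall = correct / max(sum(gt.values()), 1)
    return precision, recall


print(word_precision_recall("De eerde was de gevaarlykflti",
                            "De eerste was de gevaarlykste"))  # (0.6, 0.6)
```

For the first five words of the long-s example, three of five OCR words are correct, so both precision and recall are 0.6.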
English
16th-19th-century material
Sources for lexicon building: OED, ECCO
French
17th-century books
- Conduite du jugement naturel où tous les bons esprits de l'un et l'autre sexe pourront facilement puiser la pureté de la science, par M. Jacques Forton, sieur de S. Ange,..., 1653
- Dissertation de la philosophie en général, 1668
- La Dialectique du sieur de Launay, contenant l'art de raisonner juste sur toute sorte de matières..., 1673
- Lettre de M. Gadroys à M. de La Grange Trianon,... pour servir de réponse à celle que M. de Castelet a écrite contre les raisons de M. Descartes touchant le flux et le reflux de la mer. - Seconde lettre de M. Gadroys... [au même, sur le même sujet.], 1677
- Traitez de métaphysique démontrée selon la méthode des géomètres. [Par le sieur de La Coudraye.], 1693
German
- Das Buch des heyligen Römischen Reichs unnderhalltunge, 1501
- Die Poesie ihr Wesen und ihre Formen mit Grundzügen der vergleichenden Literaturgeschichte, 1884
- Echo Deß Hochzeitlichen Te Deum Laudamus, 1722
- Ergebnisse der Erhebungen über die Beschäftigung gewerblicher Arbeiter an Sonn- und Festtagen, Bd.:1, Gruppe I bis VII der Gewerbestatistik, Berlin, 1887, 1887
- Quedlinburgisches Kreis-Tags-Memorial, 1673
- Von der Regierung der Kirche und den unterschiedlichen Würden der Geistlichkeit *(full title in comments), 1779
- Warhaffter und grundlicher Bericht uß was Ursachen Martinus du Voysin (zu Basel verburgerter Krämer) inn der Statt Surseew im Aargöw, ..., den 13. Tag Octobris deß 1608. Jars erstlich enthauptet, und volgends verbrennt worden, 1609
Polish
- Adwersaria, albo terminata sprawy wojennej, która się toczyła w wołoskiej ziemi z tureckim cesarzem, 1621
- Chorągiew Sarmacka w Wołoszech, to jest pospolite ruszenie i szczęśliwy powrót Polaków z Wołoch w roku 1621, 1621
- Diariusz wiadomości od wyjazdu króla z Wilna do Smoleńska, 1610
- Discurs o cenie pieniedzy teraznieyszey y o niektorych skutkach iey…, 1632
- Nowe Ateny, albo Akademia wszelkiey scyencyi pełna, na różne tytuły iak na classes podzielona, mądrym dla memoryału, idiotom dla nauki, politykom dla praktyki, melancholikom dla rozrywki erygowana ... . Część 3 albo Supplement., 1746
- Pasja żołnierzy obojga narodów w stolicy moskiewskiej krótko opisana, 1613
- Powodzenia niebezpiecznego ale szczęśliwego wojska j. k. m. w Multanach opisanie, 1601
- Relacja chwalebnej ekspedycji Jana Kazimierza, króla polskiego i szwedzkiego, 1650
- Wyprawa i wyjazd sułtana Amurata, cesarza tureckiego, na wojnę do Korony Polskiej, 1634
- Wyprawa i wyjazd sułtana Amurata, cesarza tureckiego, na wojnę do Korony Polskiej_BW, 1634
- Żałosne opisanie upadku króla hiszpańskiego na morzu i na lądzie, 1589
Slovene
- Genovefa, 1841
- Gosp. Krištofa Šmida korarja avgustanskiga, zgodBe S. Pisma za mlade ljud..., 1850
- Kmetijske in rokodelske novice, 1844
- Kratkozhasne uganke, 1788
- Kuharske Bukve, 1799
- Marianske Kempensar, ali Dvoje bukuvze, 1769
- Novice kmetijskih, rokodelnih in narodskih reči, 1851
- Sgodbe svetiga pisma za mlade ljudi, 1830
- Ta male katechismus, 1768
- Vezhna pratika od gospodarstva, 1789
- Zerkviza na skali, 1855
Retrieval demonstrator
- Indexing and retrieval library (Java) implemented on the Lucene search engine
- Lexicon in a MySQL database
- OCR of about 2000 images from the Dutch Ground Truth selection, using the FineReader SDK and its external dictionary interface
- PAGE XML output [in framework]
- NE tagging
- Indexing and retrieval using the lexicon and NE tagging
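The demonstrator itself is a Java library on Lucene; the Python sketch below only illustrates one design option, indexing every token under its modern lemma so that a query for the lemma retrieves all variants. The form-to-lemma mapping is a hypothetical stand-in for the MySQL lexicon, and none of this is the demonstrator's code.

```python
from collections import defaultdict


def build_lemma_index(documents: dict[int, str], form_to_lemma: dict[str, str]) -> dict[str, set[int]]:
    """Inverted index keyed by modern lemma instead of surface form.

    Forms missing from the lexicon are indexed under themselves.
    """
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            token = token.strip(".,;:!?")
            index[form_to_lemma.get(token, token)].add(doc_id)
    return index


def query(index: dict[str, set[int]], lemma: str) -> list[int]:
    """Sorted ids of documents containing any form mapped to `lemma`."""
    return sorted(index.get(lemma.lower(), set()))


FORM_TO_LEMMA = {"weerelt": "wereld", "waereld": "wereld", "wereld": "wereld"}
docs = {1: "Die weerelt is groot.", 2: "Om de waereld.", 3: "Het huis."}
idx = build_lemma_index(docs, FORM_TO_LEMMA)
print(query(idx, "WERELD"))  # [1, 2]
```

The alternative is to keep surface forms in the index and expand the query at search time; that keeps the index independent of lexicon updates, at the cost of larger queries.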