CLARIN-NL: Reaching out to the users
Arjan van Hessen
Language Resources and Technology Infrastructure for the Humanities and the Social Sciences in the Netherlands
State of the Technology
Language and Speech Technology is (nearly) mature
Many applications are available
Most of it is usable (although not perfect)
but…
Unused Technology & Resources
Many scholars are not aware of the HLT & Resources
A-priori technical knowledge is still necessary
Use of it depends too much on “friends” in the field
Lack of standardization is killing
It is less used than expected
Research Life cycle
Cultural Heritage Institution(s)
New Idea
Research
Building
Tuning
Publications
?
Unused Technology & Resources
CAR
HLT & CHI paths
Language processing
Machine learning
Humanities
CATCH
Cultural Heritage Institutions
After the project
Lack of standardization
Bad interfaces
CLARIN-EU (2007-2012)
CLARIN-NL (2009-2015)
CLARIN-ERIC (2012-xxxx)
CLARIAH (2015-…)
Infrastructure program for the Humanities
Issues to address
1. Finding the users
2. Identification of their needs/problems
3. Do our solutions correspond to their problems?
4. Usability of tools: can they use them?
5. Visualisation
6. Tutorials and web material (movies, courses)
7. Sustainability of tools and resources
1. FINDING THE USERS
How to identify and convince potential users
Humanities enter a New Era
Huge amounts of digital data are becoming available
The traditional model, Spitzweg’s “lonely scholar”, no longer suffices
Big data, supported by automated methods
Hardware allows this, and many tools are available or under development
User Surveys
Go out to ask potential users
User survey in the Netherlands (2010)
2. IDENTIFICATION OF THEIR NEEDS/PROBLEMS
What do they need?
User attraction cycle
Finding new users
Convincing these users to participate
Training these users in the use of all those wonderful tools
Supporting the users
Listening to the users
3. DO OUR SOLUTIONS CORRESPOND TO THEIR PROBLEMS?
What to avoid in order NOT to scare off (potential) users
The CLARIN dream
Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
Give me all negative articles about Catholics in the Fryske Courant (1868-1924)
Find European TV news interviews that involve discussions about Geert Wilders
The CLARIN nightmare in 6 sleepless nights – night 1
Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
“All” means from all countries and all archives, not just some archives in some (9) countries that happen to be in CLARIN
If contemporary docs exist in digital form at all they are probably pictures – how do we get access to the content?
Can we rely on standardized metadata to find them?
Many of the docs may be in Latin – can we handle that, and what about the other languages?
How would a scholar know how to formulate this query?
How to present results?
4. USABILITY OF TOOLS
The gearbox syndrome
The gearbox syndrome explained
Humanities scholar with a problem, waiting for a solution
First HLT researcher offering help
First generation named entity recognizer (rule based)
Second HLT researcher offering help
Second generation named entity recognizer (statistics based)
Third HLT researcher offering help
LREC 2012 paper about next generation named entity recognizer
Making understandable interfaces
5. VISUALIZATION
A picture says more than 1000 words
Easy visualization fosters data analysis
Nice visualization eases use of analysis tools
Nice-to-look-at tools help to reach out to the community
Who answered which words: visualizing word frequency information in letters
C. Culy. 2012. "Some challenges of language and linguistic data for information visualization." Invited keynote presentation at Advanced Visual Methods for Linguistics. University of York, September 7, 2012.
Parliamentary Debate
Which party interrupted which other party and how often?
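The question above comes down to counting ordered pairs. The following is a minimal sketch of that counting step only, not the tool shown in the presentation; the party names and events are hypothetical placeholders, and `interruption_counts` and `as_matrix` are illustrative helper names.

```python
from collections import Counter

# Hypothetical interruption events as (interrupting_party, interrupted_party).
# These are placeholder values, not data from the parliamentary debates.
events = [
    ("Party A", "Party B"),
    ("Party A", "Party B"),
    ("Party B", "Party C"),
    ("Party C", "Party A"),
    ("Party A", "Party C"),
]


def interruption_counts(events):
    """Count how often each party interrupted each other party."""
    return Counter(events)


def as_matrix(counts):
    """Arrange the counts as rows (interrupter) by columns (interrupted)."""
    parties = sorted({party for pair in counts for party in pair})
    matrix = [[counts[(a, b)] for b in parties] for a in parties]
    return parties, matrix


counts = interruption_counts(events)
parties, matrix = as_matrix(counts)
for party, row in zip(parties, matrix):
    print(party, row)
```

A matrix of this shape is what a heat map or chord diagram would take as input, which is where the visualization step begins.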
6. TUTORIALS AND WEB MATERIAL
Create and publish web tutorials
Publish recorded lectures about CLARIN-specific topics
Make and publish showcases
7. SUSTAINABILITY OF TOOLS AND RESOURCES
Resources and tools must be accessible after a project finishes
Data and tools must use internationally accepted standards
Easy access via federated login
CLARIN Centres
Conclusion
CLARIN offers a good and sustainable infrastructure for long-term use of both Resources and Tools
Participating in CLARIN gives you access to enclosure tools, standardized metadata, tools for metadata, the CLARIN community
Give other groups/institutions access to your data… if you want