Date posted: 13-Apr-2017 | Category: Data & Analytics | Uploaded by: ariadnenetwork
ARIADNE is funded by the European Commission's Seventh Framework Programme
ARIADNE: Semantic Integration of Archaeological Information
Achille Felicetti
VAST-LAB - PIN, Università degli Studi di Firenze, Italy
Integrating archaeological data
Different archives, different data structures: is integration possible?

Museum DB: Items Table
ID     Item      Room   Showcase
35     Amphora   3      2
24     Coin      8      15
18     ...       ...    ...

Excavation DB: Artifacts Table
ID     Artifact   SU
1020   Coin       12
1021   ...        ...
1022   Amphora    13
Implicit knowledge: semantic relations
Object → Found In → Place (excavation DB: SU)
Object → Stored In → Place (museum DB: Room, Showcase)
Temporal entities
ID     Artifact   SU    Date   Period
1020   Coin       12    1981   V B.C.
1021   ...        ...
1022   Amphora    13    1974   III B.C.
Object → Created → Time (Period)
Object → Found → Time (Date)
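The relations on the last two slides (Found In, Stored In, Created, Found) exist only implicitly in the column layout. A minimal Python sketch, assuming the table rows are available as dictionaries, that makes them explicit as subject-predicate-object triples; the identifiers and the relation names (`stored_in`, `found_in`, `found_at_time`, `created_in`) are hypothetical placeholders, not CIDOC CRM terms.

```python
# Hypothetical in-memory copies of the rows shown in the two tables.
museum_items = [
    {"ID": 35, "Item": "Amphora", "Room": 3, "Showcase": 2},
    {"ID": 24, "Item": "Coin", "Room": 8, "Showcase": 15},
]
excavation_artifacts = [
    {"ID": 1020, "Artifact": "Coin", "SU": 12, "Date": 1981, "Period": "V B.C."},
    {"ID": 1022, "Artifact": "Amphora", "SU": 13, "Date": 1974, "Period": "III B.C."},
]


def museum_triples(rows):
    """Turn the implicit 'Stored In' relation of the museum table into triples."""
    for r in rows:
        obj = f"museum/item/{r['ID']}"
        place = f"museum/room/{r['Room']}/showcase/{r['Showcase']}"
        yield (obj, "stored_in", place)


def excavation_triples(rows):
    """Turn the implicit spatial and temporal relations of the excavation table into triples."""
    for r in rows:
        obj = f"excavation/artifact/{r['ID']}"
        yield (obj, "found_in", f"excavation/su/{r['SU']}")  # SU: stratigraphic unit
        yield (obj, "found_at_time", str(r["Date"]))         # date of finding
        yield (obj, "created_in", r["Period"])               # period of creation


triples = list(museum_triples(museum_items)) + list(excavation_triples(excavation_artifacts))
for triple in triples:
    print(triple)
```

Once both sources are expressed as objects, places and times, the remaining step is to replace these ad hoc relation names with shared ontology properties, which is the role CIDOC CRM plays in the rest of the presentation.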
The CIDOC CRM model
• The CIDOC Conceptual Reference Model
  – A collaboration with the International Council of Museums
  – An ontology of 92 classes and 142 properties for culture and more
  – With the capacity to explain hundreds of (meta)data formats
  – Accepted by ISO TC46 in September 2000
  – International standard since 2006: ISO 21127:2006
  – To be revised in 2014 (minor extensions)
• Serving as:
  – Intellectual guide to create schemata, formats, profiles
  – A language for analysis of existing sources for integration/mediation
  – Identify elements with common meaning
  – Transportation format for data integration/migration/Internet
What is a formal ontology made of?
• A system of classes and relations that describes some domain of discourse
  – It entails no specific encoding
• Any ontological specification should contain at least:
  • Scope: a definition of the intended field of discourse/reality that the formal ontology should cover, e.g. car manufacturing, cultural heritage, fashion
  • Classes: universals meant to represent some set of entities in the world of discourse that have a distinct, identifiable behaviour and identity
  • Properties: the relations that exist between classes in the ontology. The relations formally define the possible propositions that can be made about instances of classes
Reading a formal ontology
• Formal ontologies are arranged hierarchically.
• The highest classes are the most abstract and define, with their properties, the highest levels of discourse within a domain.
• Scan for classes and relations that seem relevant to what you want to describe. Are they adequate?
• If you adopt a formal ontology, the world you want to describe should be largely expressible under its high-level terms, and the ontology should offer terms specific enough to support minimally ambiguous discourse.
Anatomy of a Class
• The Label: arbitrary but identifying
• Subclass/Superclass: place in the IsA hierarchy
• The Scope Note: gives the meaning, the intension. First thing to check!
• The Examples: help to verify whether others think/do it like you do
• The Properties/Relations: further verification of appropriateness
  – How does it relate to other concepts? Is this how my concept behaves?
Anatomy of a Property
• The Label: arbitrary but identifying
• The Domain: the set of classes from which the property can originate
• The Range: the set of classes to which the property can link the domain class
• Superproperty/Subproperty: place in the IsA hierarchy
• The Scope Note: gives the meaning, the intension. First thing to check!
• The Examples: help to verify whether others think/do it like you do
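Domain, range and the two IsA hierarchies are what make a statement checkable. A minimal Python sketch, assuming a small hand-copied subset of CIDOC CRM classes and properties (parent links, domains and ranges reflect our reading of the CRM specification; superclasses and superproperties outside the subset are left out): a statement is accepted when its subject class falls under the property's domain and its object class under the range, and every statement also entails its superproperties.

```python
# Partial, hand-copied subset of CIDOC CRM: class -> direct superclasses.
# Classes whose superclasses lie outside the subset are treated as roots here.
SUPERCLASSES = {
    "E2 Temporal Entity": [],
    "E4 Period": ["E2 Temporal Entity"],
    "E5 Event": ["E4 Period"],
    "E7 Activity": ["E5 Event"],
    "E9 Move": ["E7 Activity"],
    "E18 Physical Thing": [],
    "E19 Physical Object": ["E18 Physical Thing"],
    "E24 Physical Man-Made Thing": ["E18 Physical Thing"],
    "E22 Man-Made Object": ["E19 Physical Object", "E24 Physical Man-Made Thing"],
    "E53 Place": [],
}

# property -> (domain, range, direct superproperty, or None if outside the subset)
PROPERTIES = {
    "P7 took place at": ("E4 Period", "E53 Place", None),
    "P25 moved": ("E9 Move", "E19 Physical Object", None),
    "P26 moved to": ("E9 Move", "E53 Place", "P7 took place at"),
    "P53 has former or current location": ("E18 Physical Thing", "E53 Place", None),
    "P55 has current location": ("E19 Physical Object", "E53 Place",
                                 "P53 has former or current location"),
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or inherits from it (multiple inheritance allowed)."""
    if cls == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in SUPERCLASSES.get(cls, []))


def is_valid(subject_class, prop, object_class):
    """Check a statement against the property's domain and range."""
    domain, rng, _ = PROPERTIES[prop]
    return is_a(subject_class, domain) and is_a(object_class, rng)


def entailed_properties(prop):
    """Yield the property itself followed by its chain of superproperties."""
    while prop is not None:
        yield prop
        prop = PROPERTIES[prop][2]


print(is_valid("E22 Man-Made Object", "P55 has current location", "E53 Place"))
print(list(entailed_properties("P26 moved to")))
```

The first check succeeds only because E22 Man-Made Object inherits from E19 Physical Object, the domain of P55; this is the "place in the IsA hierarchy" at work for both classes and properties.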
• Identification of real-world items by real-world names
• Observation and classification of real-world items
• Part-decomposition and structural properties of Conceptual and Physical Objects, Periods, Actors, Places and Times
• Participation of persistent items in temporal entities
  – Creates a notion of history: "world-lines" meeting in space-time
• Location of periods in space-time and of physical objects in space
• Influence of objects on activities and products, and vice versa
• Reference of information objects to any real-world item
Official documentation: http://cidoc-crm.org/official_release_cidoc.html
The CIDOC CRM model
CIDOC CRM: top level classes
[Figure: top-level classes E2 Temporal Entities, E18 Physical Thing, E28 Conceptual Objects, E39 Actors, E41 Appellations, E55 Types, E52 Time-Spans and E53 Places, linked by relations labelled "participate in", "affect or / refer to", "refer to / refine", "refer to / identify", "location", "at" and "within".]
Knowledge extraction
E2 Temporal Entities
[Figure: generalization hierarchy of E2 Temporal Entity. Classes shown: E2 Temporal Entity, E3 Condition State, E4 Period, E5 Event, E6 Destruction, E7 Activity, E8 Acquisition, E9 Move, E10 Transfer of Custody, E11 Modification, E12 Production, E13 Attribute Assignment, E14 Condition Assessment, E15 Identifier Assignment, E16 Measurement, E17 Type Assignment, E63 Beginning of Existence, E64 End of Existence, E65 Creation, E66 Formation, E67 Birth, E68 Dissolution, E69 Death, E79 Part Addition, E80 Part Removal, E81 Transformation, E83 Type Creation, E85 Joining, E86 Leaving, E87 Curation Activity.]
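A key feature of this hierarchy is multiple inheritance: some classes sit under more than one parent, so a query for a general class must also retrieve its specialisations. A minimal Python sketch over a partial subset of the E2 hierarchy; the parent links are our reading of the CIDOC CRM specification (version 5.x class names), not extracted from the figure, which only lists the class names.

```python
# Partial E2 Temporal Entity hierarchy: class -> direct superclasses.
CRM_E2 = {
    "E2 Temporal Entity": [],
    "E3 Condition State": ["E2 Temporal Entity"],
    "E4 Period": ["E2 Temporal Entity"],
    "E5 Event": ["E4 Period"],
    "E7 Activity": ["E5 Event"],
    "E63 Beginning of Existence": ["E5 Event"],
    "E64 End of Existence": ["E5 Event"],
    "E9 Move": ["E7 Activity"],
    "E11 Modification": ["E7 Activity"],
    "E13 Attribute Assignment": ["E7 Activity"],
    "E16 Measurement": ["E13 Attribute Assignment"],
    "E12 Production": ["E11 Modification", "E63 Beginning of Existence"],
    "E65 Creation": ["E7 Activity", "E63 Beginning of Existence"],
    "E83 Type Creation": ["E65 Creation"],
    "E81 Transformation": ["E63 Beginning of Existence", "E64 End of Existence"],
    "E6 Destruction": ["E64 End of Existence"],
}


def ancestors(cls):
    """All transitive superclasses of cls, following every parent link."""
    seen = set()
    stack = list(CRM_E2[cls])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(CRM_E2[parent])
    return seen


def subclasses_of(cls):
    """All classes in the subset that specialise cls, sorted by name."""
    return sorted(c for c in CRM_E2 if cls in ancestors(c))


print(sorted(ancestors("E12 Production")))
print(subclasses_of("E63 Beginning of Existence"))
```

A search for E63 Beginning of Existence therefore also matches productions, creations and transformations, which is how a generic query can reach records that providers described with more specific event types.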
CIDOC CRM approach
[Figure: modelling of location and movement. Classes: E18 Physical Thing, E19 Physical Object, E53 Place, E7 Activity, E9 Move, E55 Type, E4 Period. Properties (with inverses): P53 has former or current location (is former or current location of); P54 has current permanent location (is current permanent location of); P55 has current location (currently holds); P25 moved (moved by); P26 moved to (was destination of); P27 moved from (was origin of); P20 had specific purpose (was purpose of); P21 had general purpose (was purpose of); P7 took place at (witnessed). Each property is annotated with cardinalities (0,1 / 0,n / 1,n).]
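The diagram is event-centric: an object's location history is not a column but a chain of E9 Move events linked to places through P25, P26 and P27. A minimal Python sketch, assuming a simplified in-memory `Move` record and a hypothetical coin history, showing how location statements can be derived from such events. Treating the latest destination as the current location is an assumption of this sketch, not a rule stated by the CRM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """Simplified E9 Move event (hypothetical structure for illustration)."""
    moved: str       # P25 moved -> the E19 Physical Object
    moved_from: str  # P27 moved from -> E53 Place
    moved_to: str    # P26 moved to -> E53 Place
    year: int        # stand-in for a full E52 Time-Span


def locations(obj, moves):
    """Derive former/current locations of obj from the moves that involve it.

    Returns (places, latest): every place appearing as origin or destination
    (a P53-like set) and the destination of the most recent move, which this
    sketch assumes to be the current location (a P55-like value).
    """
    own = sorted((m for m in moves if m.moved == obj), key=lambda m: m.year)
    places = set()
    for m in own:
        places.update((m.moved_from, m.moved_to))
    latest = own[-1].moved_to if own else None
    return places, latest


# Hypothetical history of one coin.
moves = [
    Move("coin/1", "find spot", "excavation storage", 1981),
    Move("coin/1", "excavation storage", "museum room 8", 1990),
]
print(locations("coin/1", moves))
```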
CIDOC CRM and archaeology
Mapping to CIDOC CRM
– Data description
– Common language
[Figure: mapping of catalogue fields to CIDOC CRM]
• Archaeological Object → E22 Man-Made Object
• Excavation/Survey → E7 Activity
• OBJECT FINDING → E8 Acquisition (properties shown: P24B changed ownership through, P117 occurs during)
• TCL: Type = "Finding"
• DSCU, DSCS: Finding Place → E53 Place (P7 took place at)
• DSCF, DSCA, RCGA: Persons responsible for the excavation → E39 Actor (P14 carried out by)
• DSCT, RCGE: Motivation → E17 Activity (P17 was motivated by)
• SCAN: Excavation Name → E41 Appellation (P57 is identified by)
• DSCD, RCGD: Excavation Date → E52 Time Span (P4 has time-span)
• DSCM, RCGM: Method → E55 Type (P32 used general technique)
• NCUN, DSCI: Identifiers → E42 Identifier (P1 is identified by) [DSC Authority File]
• Open vocabularies: "Stratigraphic", "Open Area", ...; "Rescue Archaeology", "Photo Interpretation", ...
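Read as data, the mapping above is a lookup from catalogue field codes to CRM properties and classes. A minimal Python sketch, assuming a flat record keyed by field code and covering only the excavation-level fields whose mapping is unambiguous on the slide (place, responsible actors, date, method); the record values and node URIs are hypothetical, and the object-level part (E22, E8 Acquisition, identifiers) is left out.

```python
# Partial field map read from the slide: field code -> (CRM property, CRM class),
# with every property attached to the excavation as an E7 Activity.
FIELD_MAP = {
    "DSCU": ("P7 took place at", "E53 Place"),
    "DSCS": ("P7 took place at", "E53 Place"),
    "DSCF": ("P14 carried out by", "E39 Actor"),
    "DSCA": ("P14 carried out by", "E39 Actor"),
    "RCGA": ("P14 carried out by", "E39 Actor"),
    "DSCD": ("P4 has time-span", "E52 Time-Span"),
    "RCGD": ("P4 has time-span", "E52 Time-Span"),
    "DSCM": ("P32 used general technique", "E55 Type"),
    "RCGM": ("P32 used general technique", "E55 Type"),
}


def map_excavation(record, exc_id):
    """Convert one flat record (field code -> value) into triples.

    Node URIs are hypothetical; fields not in FIELD_MAP are skipped.
    """
    exc = f"excavation/{exc_id}"
    triples = [(exc, "rdf:type", "E7 Activity")]
    for code, value in record.items():
        if code not in FIELD_MAP:
            continue
        prop, cls = FIELD_MAP[code]
        node = f"{exc}/{code.lower()}"
        triples += [
            (exc, prop, node),
            (node, "rdf:type", cls),
            (node, "rdfs:label", value),
        ]
    return triples


# Hypothetical record values.
record = {"DSCU": "Site Y", "DSCF": "Team Z", "DSCD": "1981",
          "DSCM": "Stratigraphic", "NOTE": "free text"}
for t in map_excavation(record, "1"):
    print(t)
```

Keeping the mapping as data rather than code is the same design idea behind declarative mapping tools: the table can be reviewed by domain experts and reused across records.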
Mapping Memory Manager Tool (3M)
3M Tool (FORTH): http://www.ics.forth.gr/isl/3M/
CRM mapping and encoding of legacy archives
ACDM to CIDOC CRM: Data conversion
• All ACDM information in the ARIADNE Catalog
• Exported from the ARIADNE Registry
• ACDM/XML format
• Uploaded in 3M as source data
• Schema-to-schema mapping applied
• 3M Transformation Engine
• CIDOC CRM encoding
• CIDOC CRM/RDF format
• Ready for PARTHENOS
Integration & Interoperability
[Figure: Content Providers; XML, OAI-PMH, RDF; Metadata Repository; CIDOC CRM.]
• Integration Layer
  – Common semantic representation (mapping)
  – Data transparency
  – Data peculiarity preserved by the system
ARIADNE Reference Model
Few concepts, high recall
Special concepts, high precision
• CRMinf v0.7: who said that?
  – from data to knowledge
  – integrating data with their scholarly justification
  – being validated with scholarly annotations
• CRMsci v1.2.2: a Scientific Observation model
  – generalizes over INSPIRE, OBOE, SEEK, Darwin Core
  – generalizes concepts of units of matter and their "(physical) genesis"
  – introduces the concepts of observation and data evaluation
  – validated in archaeology, biodiversity and geology
• CRMba v1.3: buildings archaeology
  – introduces concepts of buildings
  – will be integrated with CRMarchaeo
• CRMarchaeo v1.2.1: an Excavation model
  – introduces concepts of stratigraphy and excavation
  – being validated by archaeological records
• CRMgeo v1.2: a Spatiotemporal model
  – integrates CRM with OGC standards
  – a complete model of phenomena occupying spacetime (consistent with modern physics)
  – integrates geometry- and semantics-derived topological relations
  – core concepts being integrated into CRM
• CRMdig v3.2: a model of Digitization processes
  – validated in European & US projects, to be adapted to CRMsci
ARIADNE Reference Model v1.0
Item Level Integration
• Goal: aggregation and integration of a set of diverse datasets, to prove that it is possible to create a rich common repository at data-item level
• Use Case: integration of heterogeneous datasets containing information about coins
• Involved partners: CNR, FORTH, PIN, DAI
Case Studies
➢ Numismatics
  • a traditional science with experience and initiatives in standardization, so it was chosen as a very good starting point for item-level integration
  • Nomisma.org serves as an authoritative resource
➢ Wood/Dendrochronology
  • integration of information from diverse datasets and (via NLP) archaeological reports in different languages
  • Getty AAT serves as an authoritative resource
➢ Sculptures
  • data integration of sources from various disciplines, including sculpture information and its archaeological context
  • focuses on the provenance of information according to bibliographic references, which leads to advanced literature research
Numismatics Case Study
Extracts of 5 diverse databases & datasets:
➢ OEAW: dFMRO coin archive, 72 records
➢ COINS Project: SAR Archive, 627 records
➢ COINS Project: FWM Archive
➢ iDAI Coins Pergamon, 517 records
➢ CulturaItalia: MuseiD-Italia, 25562 records
➢ NLP data from Heslington East Excavation Archive, 37 records
➢ ACDM records
Research questions
• Origin: Where does this coin come from?
• Tracking: How did it arrive here?
• Chronology: First/last appearance
• Practical/symbolic value, incidents: Why was it deposited here?
• Political message: Why was it produced (i.e. "minted")?
• Economic stability, power: Why was it widely used / not used?
• Statistics: Material versus nominal value
Research questions
Several queries are trivial to answer within each dataset separately, but they become important when they can be answered over the aggregated repository:
• Find coins minted in the same place/area or by the same authority
• Find coins produced in the same period or time span (typically the same century or half/quarter century)
• Find coins having common shape/iconography/inscriptions
• Find coins made of a specific material
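Queries like "same place" or "same century" reduce to simple groupings once every provider's records share normalized mint, date and material fields. A minimal Python sketch, assuming hypothetical coin records whose years are signed integers (negative for B.C.); the normalization itself, the hard part in practice, is assumed to be already done.

```python
from collections import defaultdict


def century_label(year):
    """Signed year (negative = B.C.) -> label such as '3rd c. B.C.'."""
    if year == 0:
        raise ValueError("there is no year 0 in the B.C./A.D. system")
    n = (abs(year) - 1) // 100 + 1
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    era = "B.C." if year < 0 else "A.D."
    return f"{n}{suffix} c. {era}"


def group_by(coins, key):
    """Group (provider, id) pairs by an arbitrary key function."""
    groups = defaultdict(list)
    for coin in coins:
        groups[key(coin)].append((coin["provider"], coin["id"]))
    return dict(groups)


# Hypothetical, already-normalized records from three providers.
coins = [
    {"provider": "A", "id": "coin-1", "mint": "Rome", "year": 270, "material": "silver"},
    {"provider": "B", "id": "coin-2", "mint": "Pergamon", "year": -150, "material": "bronze"},
    {"provider": "C", "id": "coin-3", "mint": "Rome", "year": 260, "material": "bronze"},
]

by_mint_and_century = group_by(coins, lambda c: (c["mint"], century_label(c["year"])))
by_material = group_by(coins, lambda c: c["material"])
print(by_mint_and_century)
print(by_material)
```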
Item Level Integration for Numismatics
Wood/Dendrochronology Case Study
• Extracts of 5 archaeological datasets, and output from NLP on 25 grey literature reports
• Multilingual: English, Dutch and Swedish data
• Data integration via CIDOC CRM and Getty AAT
• 1.09 million RDF triples
• 23,594 records
• 37,935 objects
• Demonstration query builder for easier cross-search and browsing of integrated datasets
Wood/Dendrochronology Case Study
[Figure: integration workflow. Sources: archaeological datasets (DCCD; ADS, DANS, SND; VAG cruck, NMS, VAG dendro, UNID) and grey literature (NLP, XML). Processing: direct import; cleansing + normalisation (OpenRefine) of tabular records; transformation (STELETO, XSLT). Target: RDF triple store with Getty AAT (RDF), queried through SPARQL by the demonstration application (Query Builder).]
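In this workflow, tabular records are turned into RDF through templates (STELETO) or XSLT. A minimal Python sketch of the template idea using the standard library's `string.Template`; it does not reproduce STELETO's own template syntax, and the CSV columns, the example.org vocabulary URIs (placeholders standing in for Getty AAT concepts) and the choice of E19 Physical Object for timber samples are all assumptions.

```python
import csv
import io
from string import Template

PREFIXES = (
    "@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n\n"
)

# One Turtle snippet per tabular row; $-placeholders are filled from the row.
ROW_TEMPLATE = Template(
    "<$base/sample/$id> a crm:E19_Physical_Object ;\n"
    "    crm:P2_has_type <$type_uri> ;\n"
    '    rdfs:label "$label" .\n'
)

# Placeholder concept URIs standing in for Getty AAT terms.
SPECIES_URIS = {
    "oak": "http://example.org/vocab/oak",
    "pine": "http://example.org/vocab/pine",
}

# Hypothetical tabular export.
CSV_TEXT = "id,label,species\nS1,Timber sample 1,Oak\nS2,Timber sample 2,pine\n"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_turtle(csv_text, base, species_uris):
    """Fill the row template once per CSV row and prepend the prefixes."""
    out = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        out.append(ROW_TEMPLATE.substitute(
            base=base,
            id=row["id"],
            # normalise the provider term before looking up the vocabulary concept
            type_uri=species_uris[row["species"].strip().lower()],
            label=escape_literal(row["label"]),
        ))
    return "".join(out)


print(rows_to_turtle(CSV_TEXT, "http://example.org", SPECIES_URIS))
```

The lookup step is where a cleansing and normalisation pass pays off: a term that is not in the vocabulary table raises a `KeyError` instead of silently producing an unlinked record.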
Sculptures Case Study
• Extracts of 5 diverse databases & datasets:
  – Archaeological object database: Arachne
  – Field research databases: Athenian Agora, iDAI.field
  – Museum data: British Museum
  – Research data: Oxford Roman Economy Project
• Data integration via CIDOC CRM and controlled vocabularies: Getty AAT, Wikidata, Zenon, iDAI.gazetteer
• 5.44 million triples
• 58,343 records
Integration & Interoperability
[Figure: ARIADNE integration architecture. Content Providers supply provider records and provider dataset descriptions to the ARIADNE aggregation infrastructure (Catalog, ACDM records). The X3ML Mapping Framework maps provider dataset records and ACDM records to CIDOC CRM; NLP produces NLP records. The Integrated Knowledge Repository is accessed through the ARIADNE portal (Integrated Browse/Query Interface, Browse the Catalog).]
Mapping provider records to CRM
Initial OEAW record (partial)
<COIN>
  <ID>626</ID>
  <COUNTRY_ID>1</COUNTRY_ID>
  <FIND_SPOT_ID>242</FIND_SPOT_ID>
  <FIND_MANNER_ID>2</FIND_MANNER_ID>
  <FIND_DATE>-</FIND_DATE>
  <WEIGHT>0.43</WEIGHT>
  <ISSUER_ID>243</ISSUER_ID>
  <DENOMINATION>239</DENOMINATION>
  <!-- … -->
</COIN>
<DENOMINATION>
  <DEN_ID>239</DEN_ID>
  <DEN_NAME>Kls</DEN_NAME>
  <DEN_METAL>2</DEN_METAL>
</DENOMINATION>
<METAL>
  <MET_ID>2</MET_ID>
  <MET_NAME>AR</MET_NAME>
</METAL>
<crm:E22_Man-Made_Object rdf:about="http://www.oeaw.ac.at/COIN/626">
  <crm:P52_has_current_owner rdf:resource="http://www.oeaw.ac.at/"/>
  <crm:P67i_is_referred_to_by rdf:resource="http://registry.ariadne-infrastructure.eu/ACDMdescription/doi%3A10.5878%2F002271"/>
  <crm:P43_has_dimension>
    <crm:E54_Dimension rdf:about="urn:uuid:cc4ad6d4-656b-4…">
      <crm:P90_has_value>0.43</crm:P90_has_value>
      <crm:P91_has_unit rdf:resource="…/measurement units/gr"/>
      <crm:P2_has_type rdf:resource="…/dimensions/weight"/>
      <rdfs:label>weight of coin 626</rdfs:label>
    </crm:E54_Dimension>
  </crm:P43_has_dimension>
  <crm:P45_consists_of>
    <skos:Concept rdf:about="http://www.oeaw.ac.at/material/2">
      <skos:prefLabel>AR</skos:prefLabel>
      <rdf:type rdf:resource="…/cidoc-crm/E57_Material"/>
    </skos:Concept>
  </crm:P45_consists_of>
</crm:E22_Man-Made_Object>
Transformed OEAW record (partial)
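Reading the initial record, the coin points to DENOMINATION 239, whose DEN_METAL is 2, and METAL 2 is "AR" (silver); this matches the transformed record, where `P45_consists_of` points to material/2 labelled AR. A minimal Python sketch of that lookup with the standard `xml.etree.ElementTree`, assuming the three elements are wrapped in a hypothetical `<export>` root; the weight node URI is also hypothetical (the transformed record uses a `urn:uuid`).

```python
import xml.etree.ElementTree as ET

# Excerpt of the initial OEAW record; the <export> root element is hypothetical.
SOURCE = """<export>
  <COIN><ID>626</ID><WEIGHT>0.43</WEIGHT><DENOMINATION>239</DENOMINATION></COIN>
  <DENOMINATION><DEN_ID>239</DEN_ID><DEN_NAME>Kls</DEN_NAME><DEN_METAL>2</DEN_METAL></DENOMINATION>
  <METAL><MET_ID>2</MET_ID><MET_NAME>AR</MET_NAME></METAL>
</export>"""

BASE = "http://www.oeaw.ac.at"


def coin_triples(xml_text):
    """Map COIN rows to CRM-style triples, resolving the DENOMINATION -> METAL lookup."""
    root = ET.fromstring(xml_text)
    # Lookup tables keyed by the foreign-key values used in the export.
    denominations = {d.findtext("DEN_ID"): d for d in root.findall("DENOMINATION")}
    metals = {m.findtext("MET_ID"): m.findtext("MET_NAME") for m in root.findall("METAL")}

    triples = []
    for coin in root.findall("COIN"):
        uri = f"{BASE}/COIN/{coin.findtext('ID')}"
        triples.append((uri, "rdf:type", "crm:E22_Man-Made_Object"))

        weight = coin.findtext("WEIGHT")
        if weight:
            dim = f"{uri}/weight"  # hypothetical; the transformed record uses a urn:uuid
            triples += [
                (uri, "crm:P43_has_dimension", dim),
                (dim, "crm:P90_has_value", float(weight)),
            ]

        den = denominations.get(coin.findtext("DENOMINATION"))
        if den is not None:
            metal_id = den.findtext("DEN_METAL")
            material = f"{BASE}/material/{metal_id}"
            triples += [
                (uri, "crm:P45_consists_of", material),
                (material, "skos:prefLabel", metals.get(metal_id)),
            ]
    return triples


for t in coin_triples(SOURCE):
    print(t)
```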
Initial DAI record (partial)
<ROW MODID="9" RECORDID="310">
  <!-- … -->
  <PS_MuenzeID>103361</PS_MuenzeID>
  <Erhaltung_Durchmesser>18,50</Erhaltung_Durchmesser>
  <Erhaltung_Gewicht>5,7</Erhaltung_Gewicht>
  <Erhaltung_Prozent>100</Erhaltung_Prozent>
  <Erhaltung_Staerke>4,85</Erhaltung_Staerke>
  <Funddatum>18.9.2010</Funddatum>
  <Grobdatierung>hellenistisch</Grobdatierung>
  <Metall>Bronze</Metall>
  <!-- … -->
</ROW>
<crm:E22_Man-Made_Object rdf:about="https://www.dainst.org/COIN/103361">
  <crm:P50_has_current_keeper rdf:resource="https://www.dainst.org/"/>
  <crm:P108i_was_produced_by>
    <crm:E12_Production rdf:about="urn:uuid:eb011a7f-b5f1-4aab-9bd8-4a07f4f008ea">
      <!-- … -->
    </crm:E12_Production>
  </crm:P108i_was_produced_by>
  <crm:P43_has_dimension>
    <crm:E54_Dimension rdf:about="urn:uuid:…-c6f3d6989c9d">
      <crm:P90_has_value>5,7</crm:P90_has_value>
      <crm:P91_has_unit rdf:resource="…/measurement units/gr"/>
      <crm:P2_has_type rdf:resource="…/dimensions/weight"/>
      <rdfs:label>weight of coin 103361</rdfs:label>
    </crm:E54_Dimension>
  </crm:P43_has_dimension>
  <crm:P45_consists_of rdf:resource="https://www.dainst.org/material/Bronze"/>
</crm:E22_Man-Made_Object>
Transformed DAI record (partial)
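The DAI source uses German number and date conventions (a decimal comma in `5,7`, day.month.year in `18.9.2010`), and the transformed record carries `5,7` unchanged into `P90_has_value`. A minimal Python sketch of normalising such values before emitting typed literals, assuming the values contain no thousands separators; the `xsd:decimal`/`xsd:date` typing is our choice and is not shown on the slide.

```python
from datetime import date
from decimal import Decimal


def parse_decimal_comma(text):
    """'5,7' -> Decimal('5.7'). Assumes no thousands separators."""
    return Decimal(text.strip().replace(",", "."))


def parse_day_month_year(text):
    """'18.9.2010' -> date(2010, 9, 18)."""
    day, month, year = (int(part) for part in text.strip().split("."))
    return date(year, month, day)


# Values copied from the initial DAI record.
row = {"Erhaltung_Gewicht": "5,7", "Erhaltung_Staerke": "4,85", "Funddatum": "18.9.2010"}

weight = parse_decimal_comma(row["Erhaltung_Gewicht"])
found = parse_day_month_year(row["Funddatum"])

# Turtle-style typed literals for the normalised values.
weight_literal = f'"{weight}"^^xsd:decimal'
date_literal = f'"{found.isoformat()}"^^xsd:date'
print(weight_literal, date_literal)
```

`Decimal` is used instead of `float` so that the value keeps exactly the digits recorded by the provider.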
Integrated Knowledge Repository
Experimental integrated knowledge repository in Blazegraph
➢ Numismatics Case Study: 1.2M triples
➢ Wood/dendro Case Study: 1.5M triples
➢ Sculptures Case Study: 5.5M triples
➢ AAT: 4.4M triples
Total: ~13M triples
Contains different levels of information:
➢ Item-specific information
➢ Document research data
➢ NLP data
➢ Catalog information
Research questions
➢ Query mechanisms support innovative reasoning on archaeological datasets
➢ Query power lies in relating and combining:
  ➢ data from different providers, preserving their original meaning and perspective
  ➢ item-level with catalog info on archaeological datasets
Research questions
➢ Find all bronze coins (item-level info; retrieves datasets from multiple providers)
➢ Find the publishers of all collections that contain coins (catalog info)
➢ Find all datasets and grey literature reports that contain bronze antoninianus (item-level, NLP data and catalog info)
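What these questions share is a join across levels: item-level statements lead to a dataset, and catalog statements lead from the dataset to its publisher. A minimal Python sketch over a hypothetical miniature repository of triples; the short predicate names (`type`, `material`, `mentions`, `in_dataset`, `publisher`) stand in for the longer CIDOC CRM paths a real SPARQL query would use.

```python
# Hypothetical miniature of the integrated repository as (subject, predicate, object).
TRIPLES = [
    ("coin/1", "type", "coin"), ("coin/1", "material", "bronze"), ("coin/1", "in_dataset", "dataset/A"),
    ("coin/2", "type", "coin"), ("coin/2", "material", "silver"), ("coin/2", "in_dataset", "dataset/B"),
    ("report/7", "mentions", "bronze antoninianus"), ("report/7", "in_dataset", "dataset/C"),
    ("dataset/A", "publisher", "Publisher A"),
    ("dataset/B", "publisher", "Publisher B"),
    ("dataset/C", "publisher", "Publisher C"),
]


def subjects(pred, obj):
    """All subjects s with (s, pred, obj) in the repository."""
    return {s for s, p, o in TRIPLES if p == pred and o == obj}


def objects(subj, pred):
    """All objects o with (subj, pred, o) in the repository."""
    return {o for s, p, o in TRIPLES if s == subj and p == pred}


# Item level: all bronze coins, whatever provider they come from.
bronze_coins = subjects("type", "coin") & subjects("material", "bronze")

# Item level joined with catalog level: publishers of collections that contain coins.
coin_publishers = {
    pub
    for coin in subjects("type", "coin")
    for ds in objects(coin, "in_dataset")
    for pub in objects(ds, "publisher")
}

# NLP + catalog level: datasets whose reports mention bronze antoninianus.
antoninianus_datasets = {
    ds for rep in subjects("mentions", "bronze antoninianus") for ds in objects(rep, "in_dataset")
}
print(bronze_coins, coin_publishers, antoninianus_datasets)
```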
[Figure: integrated data sources: SAR records, CulturaItalia records, OEAW records, DAI record, NLP record, catalog info.]
Integrated Repository
Experimental integrated repository in Blazegraph
➢ dFMRO: 72 records (all Roman coins)
➢ SAR: 627 records (all Roman coins)
➢ Pergamon: 517 records (12 Roman coins; 1 empty record)
➢ MuseiD-Italia: 2 records
➢ NLP data from Heslington East Excavation Archive: 37 records
➢ ACDM: 2 records (OEAW, Heslington)
Terminology
➢ Provider-specific terminology
➢ ARIADNE-specific terminology
➢ Getty AAT
➢ Nomisma.org
  • Nomisma.org is the standard that nearly everyone in numismatics refers to
  • Normalized vocabulary with references, e.g. to Getty AAT
  • Ontology used for data integration of coin databases
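Provider terminology rarely matches: in the records shown earlier, OEAW encodes the metal as "AR" (the numismatic abbreviation of argentum, silver) while DAI writes "Bronze". A minimal Python sketch of aligning provider labels to shared concepts before querying; the alignment table and the `concept:` identifiers are hypothetical placeholders for the Nomisma.org or Getty AAT concepts a real alignment would point to.

```python
# Hypothetical alignment table: (provider, provider label) -> shared concept.
ALIGNMENT = {
    ("OEAW", "AR"): "concept:silver",  # AR = argentum
    ("DAI", "Bronze"): "concept:bronze",
}


def align(provider, label):
    """Return the shared concept for a provider label, or None if unaligned."""
    return ALIGNMENT.get((provider, label.strip()))


def find_by_concept(concept, records):
    """Records are (provider, record id, provider material label) tuples."""
    return [(prov, rid) for prov, rid, label in records if align(prov, label) == concept]


# Material labels as they appear in the two partial provider records.
records = [("OEAW", "626", "AR"), ("DAI", "103361", "Bronze")]
print(find_by_concept("concept:bronze", records))
```

Keying the table on the provider as well as the label keeps provider-specific meanings apart: the same abbreviation could mean different things in different sources.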
Research questions
Different levels of information:
• Item-specific info
• Catalog info
Query power lies in combining item-level with catalog info:
• Find all bronze antoninianus coins (item-level info; retrieves datasets from multiple providers)
• Find the publishers of all collections that contain coins (catalog info)
• Find the publishers of all collections that contain bronze antoninianus (item-level and catalog info)
Queries
Query to find the contributor of a coin (produced with NLP) through the catalog:

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?thing ?contributor
WHERE {
  ?thing crm:P67i_is_referred_to_by ?s1 .
  ?s1 crm:P148i_is_component_of ?d1 .
  ?d1 crm:P148i_is_component_of ?d2 .
  ?catalog crm:P129_is_about ?d2 .
  ?catalog crm:P94i_was_created_by ?creation .
  ?creation crm:P11_had_participant ?contributor .
}
Query to find the owner of a coin through the catalog:

PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?thing ?owner
WHERE {
  ?thing crm:P67i_is_referred_to_by ?param .
  ?param crm:P52_has_current_owner ?owner .
}

Example result (?thing): http://www.oeaw.ac.at/COIN/626
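Both queries walk a fixed chain of properties, and the contributor query traverses `P129_is_about` backwards, from the described dataset to the catalog record. A minimal Python sketch of that path walk over a hypothetical set of triples; in SPARQL 1.1 the same chain could also be written as a single property path, using `/` for sequence and `^` for an inverse step.

```python
def follow(triples, start, path):
    """Walk a property path from start; a step (prop, True) is traversed inversely."""
    frontier = {start}
    for prop, inverse in path:
        reached = set()
        for s, p, o in triples:
            if p != prop:
                continue
            if not inverse and s in frontier:
                reached.add(o)
            elif inverse and o in frontier:
                reached.add(s)
        frontier = reached
    return frontier


# The chain used by the contributor query; P129 is followed from object to subject.
CONTRIBUTOR_PATH = [
    ("P67i_is_referred_to_by", False),
    ("P148i_is_component_of", False),
    ("P148i_is_component_of", False),
    ("P129_is_about", True),
    ("P94i_was_created_by", False),
    ("P11_had_participant", False),
]

# Hypothetical triples; node names are placeholders.
TRIPLES = [
    ("coin/x", "P67i_is_referred_to_by", "nlp/annotation/1"),
    ("nlp/annotation/1", "P148i_is_component_of", "nlp/report/1"),
    ("nlp/report/1", "P148i_is_component_of", "nlp/archive/1"),
    ("catalog/record/1", "P129_is_about", "nlp/archive/1"),
    ("catalog/record/1", "P94i_was_created_by", "creation/1"),
    ("creation/1", "P11_had_participant", "actor/contributor-1"),
]

print(follow(TRIPLES, "coin/x", CONTRIBUTOR_PATH))
```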
Repository and Services Architecture
[Figure: Content Providers supply Data + Metadata via XML, OAI-PMH and CRM/RDF. Components: Semantic Repository, Registry (ACDM, CIDOC CRM); Integration & Interoperability; Integration Services; Metadata Enhancement; Vocabularies and Vocabularies Management; Configuration & Management; ARIADNE Portal with Browse/Query Interfaces. Services: Resource Discovery, Preview, Preservation, Archive Discovery, Dataset Discovery, Digital Asset Management, Data Access (SPARQL, REST), Web, LOD.]
Final considerations
• Very advanced stage of development
  – End of the project
• ARIADNE main goal
  – "Integration of existing archaeological research data infrastructure through new and powerful technologies" (ARIADNE DoW)
• "From differences results the most beautiful harmony" (Heraclitus of Ephesus)
ARIADNE is a project funded by the European Commission under the Community's Seventh Framework Programme, contract no. FP7-INFRASTRUCTURES-2012-1-313193. The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.
Thank you …