
January/February 2009 1541-1672/09/$25.00 © 2009 IEEE 47 Published by the IEEE Computer Society

Semantic Scientific Knowledge Integration

By providing an integrated Web interface to the critical materials science databases and analytical tools, MatSeek represents a significant advance toward a full-fledged materials-informatics workbench.

Materials scientists and nanotechnologists are struggling with the challenge of managing the large volumes of multivariate, multidimensional, and mixed-media data sets being generated from the experimental, characterization, testing, and postprocessing steps associated with their search for new materials. In addition, they increasingly require access to a number of large-scale and complex databases (both public and commercial) containing crystallographic-structure data, thermodynamic data, phase stability data, and ionic-conduction data. As a result, materials scientists are demanding data management and integration tools that enable them to search across these disparate databases and to correlate their experimental data with external, publicly available data, in order to identify new fertile areas for searching. Systematic data integration and analysis tools are required to generate targeted experimental programs that reduce the duplication of costly compound preparation, testing, and characterization.

Consequently, the discipline of materials informatics is emerging to address the issues of data management, curation, integration, and analysis that are challenging materials scientists. Materials informatics is defined as the high-speed robust acquisition, management, analysis, and dissemination of diverse materials data. Moreover, materials data access, acquisition, interoperability, and curation were recently identified as critical cyberinfrastructure imperatives for the materials science community [1,2]. Critical requirements specifically identified include persistent unique identifiers for materials science resources; metadata standards for describing samples, processes, and properties; common semantic models and ontologies to enable mapping between database schemas, information integration, and semantic interoperability; and laboratory information management and provenance capture systems that capture both the laboratory procedures and the postprocessing of the data. Semantic Web technologies are essential to addressing many of these issues.

In this article we describe MatSeek, a Semantic Web application that aims to address the challenges of integrating heterogeneous materials science databases. (For a look at other attempts to integrate databases for materials science, see the sidebar on related work.) A machine-processable ontology is used to correlate processing parameters with nanostructure and physical and chemical properties, in order to help scientists discover potential new materials for specific, high-priority applications.

MatSeek provides a federated search interface over the critical materials science databases. Based on an OWL ontology (MatOnto), it provides a single Web-based search interface to the Inorganic Crystal Structure Database (ICSD) [3], the Ionic


MatSeek: An Ontology-Based Federated Search Interface for Materials Scientists

Kwok Cheung, Jane Hunter, and John Drennan, University of Queensland

Authorized licensed use limited to: IEEE Xplore. Downloaded on February 2, 2009 at 20:56 from IEEE Xplore. Restrictions apply.

www.computer.org/intelligent IEEE Intelligent Systems

The process of integrating data from multiple independently developed databases presents significant challenges to materials scientists that include resolving differences in metadata terms as well as data structures, formats, and metrics. Here we review three previous approaches to addressing these challenges within materials science: data warehouses, XML Schemas, and the adoption of Semantic Web technologies.

The data warehouse approach, proposed by Y. Li, aims to address the issues of low precision and recall of retrieved data during materials selection processes [1]. It extracts data from databases, identifies and resolves data differences, and loads cleansed data into a centralized repository. As a result, it provides a single platform for end users to search data quickly and easily, improves the recall and precision rates, and ensures the availability of deposited data. However, this approach is inflexible and cannot easily adapt to changes in back-end database schemas or the addition of new databases. Additionally, it incurs costly overheads for data update, cleansing, and maintenance. These limitations led to the development of a more flexible approach based on XML schemas: MatML (Materials Markup Language).

MatML is an extensible markup language developed especially to facilitate the exchange of materials information. It can uniformly represent materials property data to resolve syntactic and structural heterogeneity. Because MatML is simple, flexible, and human understandable, it offers many benefits to materials scientists and engineers [2]. Basing MatML on an XML Schema provides a shared vocabulary and formalized constraints over data structures and types. However, XML Schemas are based on a tree structure (acyclic graph) that has monotonic, implicit, and ambiguous relationships between connected nodes. As a result, XML Schemas provide little support for the semantic knowledge necessary to enable flexible dynamic mappings between vocabularies.

In contrast to XML schemas, OWL ontologies enable semantic mapping between domain-specific knowledge structures by declaring binary relationships between nodes, and enable the inferencing of new relationships between nodes via reasoning engines. Within the materials science community, there have been two previous efforts at applying Semantic Web technologies to materials science data integration issues.
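To make this contrast concrete, here is a minimal Python sketch, assuming relationships are stored as subject-predicate-object triples. It applies a single inference rule, the transitivity of subClassOf, until no new triples appear. It is not MatSeek code and not an OWL reasoner; the class names and the `infer_subclass_closure` helper are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_subclass_closure(triples: Set[Triple]) -> Set[Triple]:
    """Return the input triples plus every subClassOf link implied by transitivity."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot of the current subClassOf links (safe to extend `inferred` below).
        sub_links = [(s, o) for (s, p, o) in inferred if p == "subClassOf"]
        for a, b in sub_links:
            for c, d in sub_links:
                if b == c and (a, "subClassOf", d) not in inferred:
                    inferred.add((a, "subClassOf", d))
                    changed = True
    return inferred


# Hypothetical class names, used only for illustration.
facts = {
    ("Perovskite", "subClassOf", "Ceramic"),
    ("Ceramic", "subClassOf", "Material"),
    ("Material", "hasProperty", "Property"),
}
closed = infer_subclass_closure(facts)
```

The one derived triple, that Perovskite is a subclass of Material, was never declared explicitly. A tree-shaped XML Schema has no comparable mechanism for deriving relationships from declared ones.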

Xiaoming Zhang and his colleagues developed two types of ontologies: domain-specific and mapping ontologies [3]. The domain-specific ontologies include global and local ontologies. The global ontology, named the Semantic Model for Materials Scientific Data (SMM), encapsulates the high-level structure of the materials science knowledge. The local ontologies are based on local database schemas. Mapping ontologies define the mappings between terms in the SMM ontology and local ontologies and between the local ontologies and database schemas. However, the authors do not discuss how they ensure SMM quality, particularly in terms of coherence [4].

Zhang and his colleagues developed the mapping ontology, the data source description ontology (OWL-DSDO), for structuring names of databases, entities, and attributes from domain-specific database schemas at the instance level. They then mapped the local domain-specific ontologies to the OWL-DSDO ontology at the class level through the declared object properties RelatedOntClass and RelatedProperty. They also developed inference rules for defining new concepts such as CorrosionResistantMaterial.

Toshihiro Ashino developed the Material Ontology (http://www.codata.jp:8080/doc/MaterialOntology_v1.1.pdf), which consists of seven modules: Materials Information, Substance, Property, Environment, Process, Unit Dimension, and Physical Constant. The vocabularies are from two sources: Matdata.net (www.matdata.net/index.jsp) and the Japanese National Institute for Materials Science’s MatNavi (http://mits.nims.go.jp/db_top_eng.htm). Because agents complying with those vocabulary sources can share the Material Ontology, they can selectively commit to the individual modules rather than the entire ontology. This approach can reduce ontological commitment. (Ontological commitments are agreements to use the shared vocabulary in a coherent and consistent manner.) However, the structure of the Property module is questionable. For example, the class property:PhysicalQuantity is subsumed by property:Property. The subclass’s name suggests that it is a quantitative indicator and does not describe a material characteristic, unlike its sibling classes property:Mechanical and property:Electrical. Such inconsistencies can become problematic when applying reasoning across the knowledge base.

Our approach is similar to Zhang’s SMM ontology and Ashino’s Material Ontology. We have developed MatOnto (for more on MatOnto, see the main article) as an extensible ontology for encapsulating materials science knowledge. However, our approach differs from previous efforts. The design of our ontology has been directed by the entities and attributes that exist in the experimental data sets and external databases that we are attempting to integrate. We have adopted a more practical, user-driven approach than previous, more theoretical efforts, and we have chosen to base our ontology on an upper ontology to support extensibility (we discuss this in more detail in the section “Implementation and User Interface” in the main article).

References

1. Y. Li, “Building The Data Warehouse for Materials Selection in Mechanical Design,” Advanced Eng. Materials, vol. 6, nos. 1–2, 2004, pp. 92–95.
2. C.P. Sturrock, E.F. Begley, and J.G. Kaufman, MatML - Materials Markup Language Workshop Report, tech. report NISTIR 6785, US Nat’l Inst. Standards and Technology, 2001.
3. X. Zhang et al., “Material Scientific Data Integration for Semantic Grid,” Proc. 3rd Int’l Conf. Semantics, Knowledge, and Grid, IEEE Press, 2007, pp. 414–417; http://ieeexplore.ieee.org/xpl/RecentCon.jsp?punumber=4438492.
4. T.R. Gruber, “Toward Principles for the Design of Ontologies Used for Knowledge Sharing,” Int’l J. Human-Computer Studies, vol. 43, nos. 5–6, 1995, pp. 907–928.

Related Work on Integrating Databases for Materials Science


Radii database [4], and the US National Institute of Standards and Technology (NIST) Phase Equilibria Diagrams (PED) Database [5]. It combines Semantic Web and Web 2.0 [6] technologies to

• provide a single Web-based platform to key materials science databases and analysis tools;
• enable the mapping between database schemas and the consequent integration of heterogeneous and disparate data sets;
• construct ontology-based query statements dynamically and precisely;
• provide an intuitive, Google-like, user-friendly search interface; and
• enhance the interactivity and usability of computer-based tools for materials scientists.

An Example Scenario

At the University of Queensland, we are working with fuel cell scientists based at the Australian Institute of Bioengineering and Nanotechnology (AIBN) who are searching for novel oxygen-ion-conducting materials that can operate more efficiently at lower temperatures for longer durations. For example, a critical component of a fuel cell is the electrolyte. The compound used for the electrolyte must have oxygen conductivities > 10⁻¹ S cm⁻² (siemens per centimeter squared) and mechanical and chemical stability at elevated temperatures (500°C). In order to discover new potential compounds for fuel cell electrolytes, fuel cell scientists want rapid answers to queries such as “Give me compounds that contain tungsten-oxygen-X (where X is a different cation), with bond lengths between Y and Z nanometers, with large anomalies and anisotropy in the positional parameters of oxygen, and with bond angles between J° and K°, and that are stable below 500°C.”
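Once the relevant data sits in one place, the quoted query reduces to a conjunction of filters. The Python sketch below illustrates that idea only. The `Compound` record, its field names, and the sample records are all invented placeholders rather than database content. The stability criterion is interpreted, as an assumption, as "stable up to at least the 500°C operating temperature", and the anisotropy criterion is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Compound:
    formula: str
    elements: FrozenSet[str]
    w_o_bond_nm: float       # tungsten-oxygen bond length in nanometers
    bond_angle_deg: float    # a representative bond angle in degrees
    stable_up_to_c: float    # highest temperature (deg C) at which it is stable


def find_candidates(
    compounds: List[Compound],
    bond_range_nm: Tuple[float, float],
    angle_range_deg: Tuple[float, float],
    operating_temp_c: float = 500.0,
) -> List[Compound]:
    """Keep compounds that satisfy every criterion of the example query."""
    lo_b, hi_b = bond_range_nm
    lo_a, hi_a = angle_range_deg
    return [
        c for c in compounds
        if {"W", "O"} <= c.elements          # contains tungsten and oxygen
        and len(c.elements) >= 3             # plus at least one other element X
        and lo_b <= c.w_o_bond_nm <= hi_b
        and lo_a <= c.bond_angle_deg <= hi_a
        and c.stable_up_to_c >= operating_temp_c
    ]


# Invented placeholder records, not data from any database.
records = [
    Compound("X1-W-O", frozenset({"X1", "W", "O"}), 0.19, 105.0, 650.0),
    Compound("X2-W-O", frozenset({"X2", "W", "O"}), 0.25, 105.0, 650.0),  # bond too long
    Compound("X3-W-O", frozenset({"X3", "W", "O"}), 0.19, 105.0, 300.0),  # not stable enough
    Compound("W-O", frozenset({"W", "O"}), 0.19, 105.0, 650.0),           # no third element
]
hits = find_candidates(records, bond_range_nm=(0.17, 0.21), angle_range_deg=(90.0, 120.0))
```

The filter itself is trivial. The difficulty the article addresses is that the fields of such a record are spread across several databases with inconsistent schemas.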

To answer such queries, scientists currently have to manually search, retrieve, process, and correlate data from a number of related but disparate databases including the ICSD, the FactSage thermodynamics database, the Ionic Radii database, and the NIST PED Database. One of the greatest hurdles to this process is that the search interfaces, metadata terms, data structures, formats, and metrics are inconsistent across these databases. For example, temperature factors can be represented in three different formats: the isotropic temperature factor, the temperature factor, and the mean square amplitude of vibration. Sophisticated database integration and mining tools are required so that fuel cell scientists can more easily retrieve answers to such queries and iteratively home in on potential new areas of interest deserving of investigation.
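As an illustration of reconciling such formats, the sketch below converts between two of them using the standard crystallographic relation B = 8π²⟨u²⟩, where B is the isotropic temperature factor and ⟨u²⟩ is the mean square amplitude of vibration. The function names and the "B"/"U" tags are hypothetical. Which database stores which representation is not stated in the article, so the caller is assumed to know the format of each value.

```python
import math

EIGHT_PI_SQUARED = 8.0 * math.pi ** 2


def b_to_u(b_iso: float) -> float:
    """Isotropic temperature factor B -> mean square amplitude <u^2> (same length units squared)."""
    return b_iso / EIGHT_PI_SQUARED


def u_to_b(u_iso: float) -> float:
    """Mean square amplitude <u^2> -> isotropic temperature factor B."""
    return EIGHT_PI_SQUARED * u_iso


def normalize_to_u(value: float, kind: str) -> float:
    """Bring a value tagged 'B' or 'U' (hypothetical tags) to a common <u^2> representation."""
    if kind == "B":
        return b_to_u(value)
    if kind == "U":
        return value
    raise ValueError(f"unknown temperature-factor format: {kind!r}")
```

Normalizing every retrieved value to one representation before comparison is one simple way to avoid mixing incompatible numbers in an integrated result set.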

MatSeek is designed and developed to seamlessly interrogate, retrieve, integrate, and present data across the databases. Figure 1 illustrates the scientific workflow and envisaged methodology in which MatSeek will be employed to drive the experimental program for preparing and testing novel compounds.

System Architecture

Figure 2 illustrates the overall system architecture, which comprises a set of key components on the server and client sides and the MatSeek Web search interface. The design of the system was based on a decision to adopt Web 2.0 technologies such as Ajax and Web services. This approach enables fast, flexible development of a user-centric application that provides real-time access to changing, shared data.

On the server side are these key components:

• The MatOnto ontology [7] is represented in OWL and provides the mediation between queries across the disparate databases. A detailed description of MatOnto is provided in the next section.
• Apache Axis2 [8] is the core engine for Web services. An independent, adaptable, and reusable Web service provides the search entry point to external databases including the ICSD and the Ionic Radii database.
• A Web application has been developed using JavaServer Pages (http://java.sun.com/products/jsp) on top of Apache Tomcat. This provides the gateway to searching the NIST PED Database, rendering 3D crystal-structure images, calculating bond lengths and angles, locating and retrieving scholarly references, and exporting Crystallographic Interchange Format (CIF) files.

On the client side are the components on which MatSeek’s search interface is based:

[Figure 1. A schematic illustration of overall project structure and components. This shows the workflow involved in the discovery of new materials. Diagram labels: local knowledge base; MatDL digital library; Inorganic Crystal Structure Database (ICSD); FACT/SAGE thermodynamic software; Ionic Radii database; phase diagrams database; scientific-data-integration system (ontology-based federated search interface, preprocessing of retrieved data, information integration, presentation/visualization); first cut of potential compounds (statistical analysis of data, parameters, and ranges; experimental program design); scientific-workflow system (compound preparation, crystal development, performance testing, characterization); solid-state chemistry (analysis of results, molecular-dynamics modelling and simulation, identification of new trials); scientific-publication system.]


• A Sparql JavaScript library (www.thefigtrees.net/lee/sw/sparql.js) supports querying of the ontologies on the server side.
• An OpenLink Ajax Toolkit framework (OAT, http://oat.openlinksw.com) enables MatSeek to invoke the Web service for data retrieval at the server side and present retrieved data at the client side. OAT is an open source JavaScript-based library for browser-independent rich-Internet-application development. Ajax programming is a Web 2.0 development technique that enhances Web pages’ responsiveness, interactivity, and usability [9].
• MatSeek’s user interface is rendered by the Dojo widget library [10]. Dojo is an open source JavaScript library, designed for the rapid development of Ajax-based applications and Web sites.

Figure 2 illustrates how these components are combined to provide the interface between the users and the different materials science databases.

The Ontologies

This section describes the MatOnto ontology, which defines the key materials science concepts, and the Referential Relationship ontology, which is used to relate concepts between different databases.

The MatOnto Ontology

The aim of MatOnto is to represent and relate knowledge about materials, their structure, and properties and the processing steps involved in their composition and engineering [7]. In addition, MatOnto should

• provide an extensible framework that encapsulates the top-level structured knowledge of materials science;
• enable integration of and mapping between disparate databases within the materials science domain;
• enable the modeling and capture of precise provenance data in both the digital and physical domains (this is essential to enable verification, validation, comparison, and reuse of experimental results); and
• enable the inferencing and extraction of new knowledge in the materials science domain, through the application of SWRL (Semantic Web Rule Language) rules and a reasoning engine.

Because MatOnto needs to be able to describe events and objects in both the real physical world and the digital domain, and needs to be extensible to incorporate new databases as they become available, the decision was made to base it on an upper ontology. Dolce [11] was chosen as the upper ontology because it has been previously extensively validated through numerous applications and has ongoing support and maintenance. Dolce stems from the Entity root class, which has three subclasses: Endurant, Perdurant, and Abstract, from which we define MatOnto subclasses, including the root class matonto:Material.

We identified five core properties associated with matonto:Material:

• matonto:Property includes the material’s properties and subproperties (Mechanical, Electrical, Thermal, Chemical, Magnetic, Biological, Acoustical, Optical, and Radiological).
• matonto:Family includes the material’s classification (Metal, Glass, Ceramic, Polymer, Hybrid, and Elastomer).
• matonto:Process includes Manufacturing or Measurement processes.
• matonto:Structure includes the material’s structure (Crystalline or Amorphous).
• matonto:Measurement includes the data resulting from the Measurement, Property, Performance, Modeling and Simulation, or Characterization process.
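The classification above can be held in a small lookup structure. The Python sketch below uses only the class names listed in the text and in Figure 3; it is not the OWL representation used by MatOnto. The dictionary layout, the `core_class_of` helper, and the normalized spelling of the measurement-data names are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Core classes and the members named in the article (spelling normalized).
MATONTO_SUBCLASSES: Dict[str, List[str]] = {
    "matonto:Property": [
        "Mechanical", "Electrical", "Thermal", "Chemical", "Magnetic",
        "Biological", "Acoustical", "Optical", "Radiological",
    ],
    "matonto:Family": ["Metal", "Glass", "Ceramic", "Polymer", "Hybrid", "Elastomer"],
    "matonto:Process": ["ManufacturingProcess", "MeasurementProcess"],
    "matonto:Structure": ["Crystalline", "Amorphous"],
    "matonto:Measurement": [
        "MaterialPropertyData", "PerformanceData",
        "ModellingAndSimulationData", "CharacterizationData",
    ],
}


def core_class_of(term: str) -> Optional[str]:
    """Return the core class that lists `term` as a member, or None if it is unknown."""
    for core, members in MATONTO_SUBCLASSES.items():
        if term in members:
            return core
    return None
```

For example, `core_class_of("Ceramic")` yields `"matonto:Family"`, which is the kind of lookup a hierarchical keyword menu needs when a user picks a term.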

The relationships between these classes and properties are illustrated in Figure 3.

In the design of MatOnto, we also leveraged a number of existing peer-reviewed ontologies: Ontolingua’s Standard Units and Dimensions [12], the Joint Academic Classification of Subjects (JACS) [13], and the W3C’s Time Ontology in OWL [14]. We also extended EXPO [15], an ontology for describing scientific experiments with the concepts of events and processes, to enable representation of materials processing steps.

We also developed the class matonto:Crystalline (which is a subclass of matonto:Structure). Figure 4 provides an overview of the Crystalline subontology, which also facilitates the link between data in the Ionic Radii database and the ICSD.

[Figure 2. A high-level view of MatSeek’s system architecture. This shows the components on the client and server that together provide the interface between the users and the databases and other tools. Diagram labels: client side (Sparql library, OpenLink Ajax Toolkit framework, Dojo); server side (ontologies, Axis2, JavaServer Pages); schema mapping; data retrieval; gateway to tools; ICSD; Ionic Radii database; phase diagram database; 3D crystal structure image; bond lengths and angles; scholarly references; Crystallographic Interchange Format file.]

Figure 5 demonstrates a subset of MatOnto that defines aspects of structural properties and measurement data. This figure illustrates the classes that map onto entity attributes of the ICSD and Ionic Radii database schemas. The arrow labeled owl:equivalentClass indicates that the ICSD chemicalElement and Ionic Radii ion classes have the same instance populations. The methodology for mapping between database schemas is discussed in the section “Implementing the System.”

Referential Relationship Ontology

This ontology models referential relationships between entities within a relational database. Figure 6 demonstrates the ontological structure, which consists of four interlinked classes: database, entity, keyAttribute, and non-keyAttribute. In particular, the keyAttribute class links to itself in a bidirectional way because it contains populations of primary and foreign keys. This ontology enables MatSeek to

• infer referential relationships between entities through foreign keys from the entity attributes mapped onto search keywords,
• construct an SQL query statement dynamically and accurately that includes the attributes and entities and that also joins the entities, and
• search with controlled keywords, thereby providing a user-friendly search interface that requires minimum previous knowledge.
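A minimal Python sketch of this structure follows. A key attribute either is a primary key or references another key attribute, and the bidirectional references/referencedBy link is recovered by recording each foreign key in both directions. The `KeyAttribute` class and the column names `id` and `icsd_id` are hypothetical; only the entity names icsd and p_record come from the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class KeyAttribute:
    entity: str                                   # entity (table) that owns the attribute
    name: str                                     # attribute (column) name
    references: Optional[Tuple[str, str]] = None  # (entity, attribute) if this is a foreign key


def referential_links(keys: List[KeyAttribute]) -> Dict[str, Set[str]]:
    """Map each entity to the entities it references or is referenced by."""
    links: Dict[str, Set[str]] = {}
    for key in keys:
        if key.references is None:
            continue  # a primary key that points nowhere adds no link by itself
        target_entity, _target_attribute = key.references
        links.setdefault(key.entity, set()).add(target_entity)   # references
        links.setdefault(target_entity, set()).add(key.entity)   # referencedBy
    return links


# Hypothetical key columns for two entities named in the article.
keys = [
    KeyAttribute("icsd", "id"),
    KeyAttribute("p_record", "icsd_id", references=("icsd", "id")),
]
links = referential_links(keys)
```

With links of this kind available, deciding which entities must be joined becomes a graph problem rather than something the user has to know in advance.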

The User Interface

Figure 7 provides an annotated screen capture of MatSeek’s search interface. On the right is an accordion widget that consists of two components: the Search Keywords panel and the Databases and Tools panel.

The Search Keywords panel provides a hierarchical menu of search keywords for browsing. The Databases and Tools panel enables users to access the ICSD, the Ionic Radii database, and the NIST PED database and analysis tools (3D crystal-structure images, bond lengths and angles, scholarly references, and export of CIF files).

On the top left, the Search panel consists of a text box for targeted search keywords, an accordion widget for customizing search conditions, and a button for invoking search requests. Finally, on the bottom left, the Result panel is in a tabbed-page format that displays each individual search result.

Implementing the System

We held a series of meetings with the fuel cell scientists in order to understand and satisfy their precise search and analysis requirements. This user-needs analysis was the driving force behind the functionality we describe here.

Consider again the fuel cell scientist from the section “An Example Scenario.” He or she wants to search for ternary and quaternary compounds that include tungsten and belong to the cubic-crystal system. Ionic conductivity data is stored in the Ionic Radii database, whereas crystal-structure information is stored in the ICSD. Consequently, MatSeek needs to first support search requests across both the ICSD and Ionic Radii database simultaneously. The aggregated search results are then used as input to the NIST PED Database and analysis tools, which enable further filtering and identification of potential compounds, based on stability and bond lengths.

[Figure 3. A high-level view of the MatOnto ontology. This illustrates how we define and relate concepts associated with the processing, structure, and properties of materials. Diagram labels: matonto:Material; jacs:MaterialsScience; matonto:Property (Mechanical, Electrical, Thermal, Chemical, Magnetic, Biological, Acoustical, Optical, Radiological); matonto:Family (Metal, Glass, Ceramic, Polymer, Hybrid, Elastomer); matonto:Process (ManufacturingProcess, MeasurementProcess); matonto:Structure (Crystalline, Amorphous); matonto:MeasurementData (MaterialPropertyData, PerformanceData, Modelling and SimulationData, CharacterizationData); relationships: has property, investigates, has measurement data, has structure, has process, categorized into.]

[Figure 4. The Crystalline subontology. This facilitates the link between the Ionic Radii database and the Inorganic Crystal Structure Database (ICSD). Diagram labels: matonto:Material; matonto:Structure; matonto:Crystalline; cif:CrystalSystem; cif:AtomSiteProperty; cif:FractionCoordinate; cif:OccupancyFraction; cif:TemperatureFactor; cif:Ion; relationships: has structure, has crystal system, has atom site property, has ion.]

Figure 8 demonstrates the query construction process. First, a user selects search keywords from the MatOnto interface on the right. Second, as the user confirms the selected keywords, MatSeek displays the keywords in the text boxes in the panel on the left. The keywords correspond to the entity attributes of the ICSD and Ionic Radii database schemas. Finally, the user can customize search conditions via the accordion container below the text box.

After the user hits the Search button, MatSeek responds to the search request in the following way. First, it maps the search keywords onto the entity attributes in the database schemas through MatOnto. Figure 5 demonstrates the mappings through the MatOnto ontology. For example, the structuredFormula class matches the ICSD attribute icsd.STRUCT_FORM, while the ionicRadius class matches the Ionic Radii database attribute ionic_radii.IONIC_RADIUS. Additionally, the predicate owl:equivalentClass bridges the semantic gap between both databases through the equivalence relationship between the chemicalElement and ion classes in terms of symbolic representations. The simple Sparql query in Figure 9 supports the mappings between search keywords or ontological terms and the entity attributes, for example, MatOnto:structuredFormula and icsd.STRUCT_FORM.
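The keyword-to-attribute lookup can be sketched in a few lines of Python. This is a stand-in for the idea only: the Sparql query of Figure 9 is not reproduced here, and MatSeek performs the lookup against the OWL ontology rather than a dictionary. The class-to-attribute pairs are those named in the text and in Figure 5; the `attributes_for` helper and the data layout are assumptions.

```python
from typing import Dict, List, Tuple

# Ontology class -> database attribute, as named in the text and Figure 5.
CLASS_TO_ATTRIBUTE: Dict[str, str] = {
    "structuredFormula": "icsd.STRUCT_FORM",
    "chemicalElement": "p_record.EL_SYMBOL",
    "crystalSystem": "crystal_system.CRYST_SYS",
    "ion": "ionic_radii.ION",
    "ionicRadius": "ionic_radii.IONIC_RADIUS",
}

# Pairs of classes linked by owl:equivalentClass.
EQUIVALENT_CLASSES: List[Tuple[str, str]] = [("chemicalElement", "ion")]


def attributes_for(keyword: str) -> List[str]:
    """Return the attributes mapped to a keyword and to any class equivalent to it."""
    classes = {keyword}
    for left, right in EQUIVALENT_CLASSES:
        if keyword == left:
            classes.add(right)
        elif keyword == right:
            classes.add(left)
    return sorted(CLASS_TO_ATTRIBUTE[c] for c in classes if c in CLASS_TO_ATTRIBUTE)
```

A keyword such as `ion` resolves to attributes in both databases because of the equivalence link, which is what allows a single search term to span the ICSD and the Ionic Radii database.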

Figure 5. A subset of the MatOnto ontology. This illustrates the link between the ICSD and the Ionic Radii database.

[Diagram labels: the classes material, measurementData, characterizationData, materialProperty, structureProperty, crystallineStructure, crystalSystem, chemicalComposition, structuredFormula, chemicalElement, ion, ionicParameter, ionicRadius, charge, and coordinationNumber; the properties hasMeasurementData, hasProperty, hasCrystalSystem, hasFormula, hasElement, hasRadius, determinedBy, and owl:equivalentClass; and the database attributes crystal_system.CRYST_SYS, icsd.STRUCT_FORM, p_record.EL_SYMBOL, ionic_radii.ION, ionic_radii.CHARGE, ionic_radii.COORDINATION, and ionic_radii.IONIC_RADIUS.]

Figure 6. The Referential Relationship ontology. This enables existing database capabilities to be leveraged for optimum performance.

[Diagram labels: the classes database, entity, keyAttribute, and non-keyAttribute; the properties hasEntity, hasKey, hasNon-key, and references/referencedBy.]

Figure 7. MatSeek’s search interface. This provides the user interface for searching the databases and accessing the analytical tools.

After mapping search keywords to the entity attributes, MatSeek first constructs an SQL query statement for the ICSD (because there exists an implicit whole/part relationship between icsd.STRUCT_FORM and ionic_radii.ION). Given the mapped entity attributes, including icsd.STRUCT_FORM, p_record.EL_SYMBOL, and crystal_system.CRYST_SYS, MatSeek infers referential relationships between these three entities through the Referential Relationship ontology, dynamically constructs an SQL query statement that includes the attributes and entities, joins the entities, and queries the ICSD using the SQL statement. Figure 10 demonstrates how the Referential Relationship ontology models the referential relationships between the icsd, p_record, and crystal_system entities. Through additional inferencing, MatSeek identifies two more entities that were missing components of the referential relationship chain: space_group and space_group_number. Figure 11 shows the resulting SQL statement generated by MatSeek.
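One way to picture the inference of missing entities is as a path search over the referential links. The sketch below is an assumption about how such a step could work, not the authors' code: it uses a breadth-first search instead of ontology reasoning, the function names are hypothetical, and the links are the four joins shown in Figure 10.

```python
from collections import deque

# Referential links (entity, key, entity, key), as drawn in Figure 10.
REFERENCES = [
    ("p_record", "IDNUM", "icsd", "IDNUM"),
    ("icsd", "SGR", "space_group", "SGR"),
    ("space_group", "SGR_NUM", "space_group_number", "SGR_NUM"),
    ("space_group_number", "CRYST_SYS_CODE", "crystal_system", "CRYST_SYS_CODE"),
]


def build_graph(references):
    """Index the links as an undirected graph: entity -> [(neighbour, join)]."""
    graph = {}
    for left, left_key, right, right_key in references:
        condition = f"{left}.{left_key} = {right}.{right_key}"
        graph.setdefault(left, []).append((right, condition))
        graph.setdefault(right, []).append((left, condition))
    return graph


def join_path(graph, start, goal):
    """Breadth-first search for the join conditions linking two entities."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        entity, conditions = queue.popleft()
        if entity == goal:
            return conditions
        for neighbour, condition in graph.get(entity, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, conditions + [condition]))
    raise ValueError(f"no referential path from {start} to {goal}")


def build_select(attributes, filters):
    """Assemble a SELECT over the requested attributes plus inferred joins.

    Filters are passed as ready-made SQL fragments only to mirror Figure 11;
    user input should be bound as parameters in practice.
    """
    graph = build_graph(REFERENCES)
    requested = []
    for attribute in attributes:
        entity = attribute.split(".")[0]
        if entity not in requested:
            requested.append(entity)
    tables = list(requested)
    joins = []
    for other in requested[1:]:
        for condition in join_path(graph, requested[0], other):
            if condition not in joins:
                joins.append(condition)
            # Entities that appear only in a join are the "missing" ones.
            for side in condition.split(" = "):
                entity = side.split(".")[0]
                if entity not in tables:
                    tables.append(entity)
    sql = f"SELECT {', '.join(attributes)}\nFROM {', '.join(tables)}"
    where = list(filters) + joins
    if where:
        sql += "\nWHERE " + "\n  AND ".join(where)
    return sql


if __name__ == "__main__":
    print(build_select(
        ["icsd.STRUCT_FORM", "crystal_system.CRYST_SYS", "p_record.EL_SYMBOL"],
        ["icsd.STRUCT_FORM LIKE '%W%'", "crystal_system.CRYST_SYS = 'cubic'"],
    ))
```

With the three attributes named in the text, the search pulls in space_group and space_group_number on the way from icsd to crystal_system, which matches the two extra entities the paragraph describes.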

MatSeek retrieves search results from the ICSD in a table format that consists of rows of data items and associated metadata. MatSeek maps the metadata onto the MatOnto class names and specifies the data items as instances of the mapped classes. Figure 12 demonstrates the population of the data items returned by the ICSD and their binding through the ICSDResultRow instance. MatSeek also replicates instances of class chemicalElement to ion because of the equivalence relationship [16]. After the replication, MatSeek generates an SQL query statement that includes instances of class ion and the entity attributes in the Ionic Radii database that correspond to the search keywords. Figure 13 shows the resulting SQL statement.
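The replication and query-generation step might look roughly like the following sketch. It is illustrative only: both function names are hypothetical, the uppercase normalisation is an assumption drawn from the 'ZR' literal in Figure 13, and the query uses bound parameters rather than the inline literals shown in the figure.

```python
# Columns requested from the Ionic Radii database, as listed in Figure 13.
IONIC_RADII_COLUMNS = ["ION", "CHARGE", "COORDINATION", "IONIC_RADIUS"]


def ions_from_elements(element_symbols):
    """Replicate chemicalElement values as ion values.

    Duplicates are dropped while keeping first-seen order. Symbols are
    upper-cased because Figure 13 compares against 'W' and 'ZR'
    (assumption about how the Ionic Radii database stores them).
    """
    ions = []
    for symbol in element_symbols:
        key = symbol.strip().upper()
        if key and key not in ions:
            ions.append(key)
    return ions


def build_ionic_radii_query(ions):
    """Return (sql, params) selecting the radii rows for the given ions."""
    if not ions:
        raise ValueError("at least one ion is required")
    select_list = ", ".join(f"ionic_radii.{column}" for column in IONIC_RADII_COLUMNS)
    where = " OR ".join("ionic_radii.ION = ?" for _ in ions)
    return f"SELECT {select_list} FROM ionic_radii WHERE {where}", list(ions)


if __name__ == "__main__":
    sql, params = build_ionic_radii_query(ions_from_elements(["Zr", "W", "W"]))
    print(sql)
    print(params)
```

Binding the ion symbols as parameters keeps the generated statement the same shape as Figure 13 while avoiding string concatenation of values retrieved from another database.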

The system queries the Ionic Radii database, retrieves the results, and populates the data so that it is compliant with MatOnto. Figure 12 demonstrates the population of resulting data items from the Ionic Radii database and their binding through the IRDBResultRow instance.
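As a rough illustration of what "populating" a result row could involve, the sketch below turns one tabular row into typed instances bound to a result-row instance. The class and property names follow the labels in Figure 12; the identifier scheme, the use of rdf:value to hold the literal, and the `populate` helper itself are assumptions made for the example.

```python
# Result-set column -> MatOnto class (labels as they appear in Figure 12).
COLUMN_TO_CLASS = {
    "ION": "MatOnto:ion",
    "CHARGE": "MatOnto:charge",
    "COORDINATION": "MatOnto:coordinationNumber",
    "IONIC_RADIUS": "MatOnto:ionicRadius",
}


def populate(rows, columns, row_class, prefix):
    """Convert tabular rows into (subject, predicate, object) triples.

    Each row becomes an instance of row_class; each cell becomes a data
    item typed with the class its column maps to and attached to the row
    through hasDataItem.
    """
    triples = []
    for index, row in enumerate(rows, start=1):
        row_id = f"{prefix}_{index}"
        triples.append((row_id, "rdf:type", row_class))
        for column, value in zip(columns, row):
            item_id = f"{row_id}_{column}"  # hypothetical naming scheme
            triples.append((row_id, "MatOnto:hasDataItem", item_id))
            triples.append((item_id, "rdf:type", COLUMN_TO_CLASS[column]))
            triples.append((item_id, "rdf:value", value))  # assumption
    return triples


if __name__ == "__main__":
    columns = ["ION", "CHARGE", "COORDINATION", "IONIC_RADIUS"]
    for triple in populate([("W", 6, "VI", 0.60)], columns,
                           "MatOnto:IRDBResultRow", "IRDBResultRow"):
        print(triple)
```

Keeping the row instance separate from its data items is what later allows the row instances to be treated as auxiliary and left out of the table returned to the client.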

Finally, the system retrieves and presents the complete aggregated result set using the Sparql query statement in Figure 14. The system then excludes auxiliary instances of the classes ICSDResultRow and IRDBResultRow, converts the collection into a table model, and sends it back to the client application. Figure 15 demonstrates the returned aggregated result set.
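The aggregation amounts to joining the two result sets on the shared element/ion symbol and flattening the outcome into rows. The following sketch assumes plain dictionaries in place of ontology instances and a hypothetical `aggregate` function; the sample values are the first two rows of Figure 15.

```python
def aggregate(icsd_rows, irdb_rows):
    """Join ICSD rows with Ionic Radii rows on the element symbol.

    Only data values are emitted; the row objects themselves (the
    counterpart of the ICSDResultRow / IRDBResultRow instances) are
    dropped from the output table.
    """
    radii_by_ion = {}
    for radii in irdb_rows:
        radii_by_ion.setdefault(radii["ion"].upper(), []).append(radii)

    table = []
    for icsd in icsd_rows:
        for element in icsd["elements"]:
            for radii in radii_by_ion.get(element.upper(), []):
                table.append((
                    icsd["structuredFormula"],
                    icsd["crystalSystem"],
                    element,
                    radii["charge"],
                    radii["coordinationNumber"],
                    radii["ionicRadius"],
                ))
    return table


if __name__ == "__main__":
    icsd_rows = [{"structuredFormula": "W2Zr", "crystalSystem": "cubic",
                  "elements": ["Zr", "W"]}]
    irdb_rows = [
        {"ion": "ZR", "charge": 4, "coordinationNumber": "V", "ionicRadius": 0.66},
        {"ion": "W", "charge": 6, "coordinationNumber": "VI", "ionicRadius": 0.60},
    ]
    for row in aggregate(icsd_rows, irdb_rows):
        print(row)
```

Because one element can have several radii (one per charge and coordination), a single ICSD row fans out into multiple output rows, which is the shape visible in Figure 15.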

Figure 8. An example of a search request. The user interface enables complex queries to be constructed using terms drawn from the ontologies.

PREFIX MatOnto: <http://localhost:8080/Onto/MatOnto/MaterialsOntology.owl#>
PREFIX RDB: <http://localhost:8080/Onto/RDB/RDB.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?entityAttribute
WHERE { ?entityAttribute rdf:type MatOnto:structuredFormula }

Figure 9. Example of a Sparql query. This illustrates the mapping between terms in the MatOnto ontology and the ICSD.

[Diagram labels: the Referential Relationship ontology classes and properties of Figure 6, instantiated with the entities p_record, icsd, space_group, space_group_number, and crystal_system; the non-key attributes p_record.EL_SYMBOL, icsd.STRUCT_FORM, and crystal_system.CRYST_SYS; and four joins: p_record.IDNUM with icsd.IDNUM, icsd.SGR with space_group.SGR, space_group.SGR_NUM with space_group_number.SGR_NUM, and space_group_number.CRYST_SYS_CODE with crystal_system.CRYST_SYS_CODE.]

Figure 10. Employing the Referential Relationship ontology. This figure shows how this ontology is used to relate the icsd, p_record, and crystal_system entities.

Once the user has identified potentially interesting compounds using the search functionality we just described, MatSeek then enables users to access the NIST PED Database and additional analysis tools through the Databases and Tools panel shown in Figure 7.

For example, users might want to submit a particular compound (retrieved by searching the ICSD and the Ionic Radii database) to the NIST PED Database to view its stability at different temperatures and its 3D crystal structure. Figure 16a demonstrates the page that results from submitting one of the compounds (listed in Figure 1) into the NIST PED Database. In Figure 16a, the details of the phases at different temperatures are displayed in a PED. In Figure 16b, a 3D crystal-structure image for the compound is rendered.

In addition, users requested the ability to retrieve additional related information on a targeted compound from the ICSD, including calculated bond lengths and angles and scholarly references. These options are also available via the Databases and Tools panel. Figure 17 illustrates the results from submitting a particular compound of interest to these particular ICSD tools.

User Feedback

We evaluated MatSeek by deploying it within a team of fuel cell scientists working at the AIBN. A survey of users conducted after usability testing indicated that MatSeek greatly increased the speed and efficiency with which users could search, aggregate, and analyze materials science data.

However, they requested a number of changes to the user interface to improve usability. For example, they requested that the pull-down menus of search keywords be moved to the left side (see Figure 8). They also requested the incorporation of FactSage within the federated search interface. In addition, they requested the incorporation of statistical-analysis tools such as R (www.r-project.org) and MATLAB within MatSeek’s menu bar.

One of the major limitations identified by users was a lack of complete, comprehensive data in the publicly available databases. Many areas of interest within the ICSD and the NIST PED Database are completely lacking in data. However, the coverage of these databases is expected to improve over time. Commercial databases are more complete and comprehensive but outside the scope and budget of this project. Hopefully, over time, as the software tools become more reliable and sophisticated and as the culture of sharing materials science data through open-access archives becomes more widely adopted by the materials science community, this situation will improve.

As far as we are aware, there are no other open source workbenches for materials science that are built on a combination of Semantic Web and Web 2.0 technologies. The MatOnto ontology enables streamlined integration of highly heterogeneous databases. The decision to base it on an upper ontology enables easy extensibility and integration of new databases.

SELECT icsd.STRUCT_FORM, crystal_system.CRYST_SYS, p_record.EL_SYMBOL
FROM icsd, p_record, crystal_system, space_group, space_group_number
WHERE icsd.STRUCT_FORM LIKE '%W%'
  AND crystal_system.CRYST_SYS = 'cubic'
  AND crystal_system.CRYST_SYS_CODE = space_group_number.CRYST_SYS_CODE
  AND space_group_number.SGR_NUM = space_group.SGR_NUM
  AND space_group.SGR = icsd.SGR
  AND icsd.IDNUM = p_record.IDNUM

Figure 11. Example of a complex SQL query generated by MatSeek. This retrieves all ternary structures that contain tungsten and are cubic.
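To see how the four join conditions in this statement chain the five entities together, the query can be run against a toy in-memory database. Everything about the data below is invented for illustration: the column types, the space-group labels, the system codes, and the rows are placeholders and are not taken from the ICSD. The search values are bound as parameters rather than written inline.

```python
import sqlite3

# Minimal schema holding only the columns the Figure 11 query touches.
# Column types are assumptions.
SCHEMA = """
CREATE TABLE icsd (IDNUM INTEGER, STRUCT_FORM TEXT, SGR TEXT);
CREATE TABLE p_record (IDNUM INTEGER, EL_SYMBOL TEXT);
CREATE TABLE space_group (SGR TEXT, SGR_NUM INTEGER);
CREATE TABLE space_group_number (SGR_NUM INTEGER, CRYST_SYS_CODE TEXT);
CREATE TABLE crystal_system (CRYST_SYS_CODE TEXT, CRYST_SYS TEXT);
"""

QUERY = """
SELECT icsd.STRUCT_FORM, crystal_system.CRYST_SYS, p_record.EL_SYMBOL
FROM icsd, p_record, crystal_system, space_group, space_group_number
WHERE icsd.STRUCT_FORM LIKE ?
  AND crystal_system.CRYST_SYS = ?
  AND crystal_system.CRYST_SYS_CODE = space_group_number.CRYST_SYS_CODE
  AND space_group_number.SGR_NUM = space_group.SGR_NUM
  AND space_group.SGR = icsd.SGR
  AND icsd.IDNUM = p_record.IDNUM
ORDER BY p_record.EL_SYMBOL
"""


def run_demo():
    """Load placeholder rows and run the Figure 11 query against them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows: entry 1 matches both filters, entry 2 fails the
    # crystal-system filter, entry 3 fails the formula filter.
    conn.executemany("INSERT INTO icsd VALUES (?, ?, ?)",
                     [(1, "W2 Zr", "SG-A"), (2, "W O3", "SG-B"), (3, "Na Cl", "SG-A")])
    conn.executemany("INSERT INTO p_record VALUES (?, ?)",
                     [(1, "W"), (1, "Zr"), (2, "W"), (2, "O"), (3, "Na"), (3, "Cl")])
    conn.executemany("INSERT INTO space_group VALUES (?, ?)",
                     [("SG-A", 1), ("SG-B", 2)])
    conn.executemany("INSERT INTO space_group_number VALUES (?, ?)",
                     [(1, "CU"), (2, "MO")])
    conn.executemany("INSERT INTO crystal_system VALUES (?, ?)",
                     [("CU", "cubic"), ("MO", "monoclinic")])
    rows = conn.execute(QUERY, ("%W%", "cubic")).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

The join against p_record yields one output row per element of each matching entry, so a single compound appears once for each of its constituent elements.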

Figure 12. Generating result sets. This figure illustrates the population of data items retrieved from the ICSD and the Ionic Radii database.

[Diagram labels: the classes structureProperty, crystallineStructure, crystalSystem, chemicalComposition, structuredFormula, chemicalElement, ion, ionicParameter, ionicRadius, coordinationNumber, charge, resultRow, ICSDResultRow, and IRDBResultRow; the properties hasCrystalSystem, hasFormula, hasElement, hasRadius, determinedBy, derivedFrom, hasDataItem, and owl:equivalentClass; the instances ICSDResultRow_1 and IRDBResultRow_1; and the sample data items W2Zr, cubic, Zr, W, V, 4, and 0.6600.]

SELECT ionic_radii.ION, ionic_radii.CHARGE, ionic_radii.COORDINATION, ionic_radii.IONIC_RADIUS
FROM ionic_radii
WHERE ionic_radii.ION = 'W' OR ionic_radii.ION = 'ZR'

Figure 13. The SQL statement generated by MatSeek. This retrieves ionic radii data for compounds containing tungsten and zirconium.


The dynamic mapping between ontological terms and the entity attributes from database schemas relieves users from needing to understand multiple metadata vocabularies. Moreover, the adoption of Web 2.0 technologies (for example, Ajax) enables fast, flexible software development; a user-friendly Web-based search interface; real-time access to changing, shared data; and search results that can be saved in a format that can easily be reused as input to subsequent queries.

The system developed to date is a working prototype that demonstrates the benefits of a single entry point to key materials databases and analysis tools. However, further effort is required to improve the system’s usability and robustness and to overcome existing limitations. Future effort will focus on the following five issues.

First, manual effort is currently required to map the database entities and attributes into the MatOnto ontology. Ideally, the uploading and mapping of new database schemas would be streamlined and semiautomated via a Web interface.

Second, the currently incorporated tools focus mainly on the analysis of crystal structures. There are many other analytical and modeling services (such as R) and data mining tools (such as WEKA, the Waikato Environment for Knowledge Analysis [17]) that could usefully be added to improve the precision of predictions.

Third, we plan to add support for search and analysis workflows that can be saved, edited, and reused via machine-processable languages (for example, Kepler [18] and Taverna [19]).

Fourth, the ontology will be extended to support the addition of more materials science databases, such as FactSage.

Finally, the robustness and integrity of the MatOnto ontology will be thoroughly tested through more complex queries that depend on sophisticated reasoning and semantic inferencing rules (for example, SWRL and Pellet [20]).

PREFIX matOnto: <http://localhost:8080/Onto/MatOnto/MaterialsOntology.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?icsdResultRow ?icsdDataItem ?icsdMetadataTerm
       ?struct_form ?structFormMetadata ?element
       ?irdbResultRow ?irdbDataItem ?irdbMetadataTerm
WHERE {
  ?icsdResultRow matOnto:hasDataItem ?icsdDataItem .
  ?icsdDataItem rdf:type ?icsdMetadataTerm .
  ?icsdResultRow matOnto:hasStructuredFormula ?struct_form .
  ?struct_form rdf:type ?structFormMetadata .
  ?struct_form matOnto:hasElement ?element .
  ?irdbResultRow matOnto:derivedFrom ?element .
  ?irdbResultRow matOnto:hasDataItem ?irdbDataItem .
  ?irdbDataItem rdf:type ?irdbMetadataTerm
}

Figure 14. Example of a Sparql query. This aggregates the retrieved results and maps them into the common MatOnto format.

Specify Query Conditions

Search: structuredFormula, crystalSystem, ion, charge, coordinationNumber, ionicRadius
Condition: crystal_system.CRYST_SYS = 'cubic' AND icsd.STRUCT_FORM LIKE '%W%'

Results

structured formula   crystal system   ion   charge   coordination   ionic_radius
W2Zr                 cubic            Zr    4        V              0.6600
W2Zr                 cubic            W     6        VI             0.6000
W2Zr                 cubic            W     4        VI             0.6600
W2Zr                 cubic            Zr    4        VIII           0.8400
W2Zr                 cubic            W     5        VI             0.6200
W2Zr                 cubic            W     6        V              0.5100
W2Zr                 cubic            W     6        IV             0.4200
W2Zr                 cubic            Zr    4        IX             0.8900

Figure 15. A retrieved result set using MatSeek. This illustrates the retrieved ionic radii and ICSD data for compounds containing tungsten and zirconium.


Figure 16. Extracting phase stability and crystal structure. This figure illustrates (a) a results page from the US National Institute of Standards and Technology Phase Equilibria Diagrams Database and (b) a rendered 3D crystal structure image.


References

1. W. Hunt, “Materials Informatics: Growing from the Bio World,” J. Minerals, Metals and Materials Soc., vol. 58, no. 7, 2006, p. 88.

2. S.J.L. Billinge, K. Rajan, and S.B. Sinnott, From Cyberinfrastructure to Cyberdiscovery in Materials Science: Enhancing Outcomes in Materials Research, Education, and Outreach, US Nat’l Science Foundation, 2006.

3. A. Belsky et al., “New Developments in the Inorganic Crystal Structure Database (ICSD): Accessibility in Support of Materials Research and Design,” Acta Crystallographica Section B, vol. 58, no. 3, part 1, 2002, pp. 364–369.

4. R. Shannon, “Revised Effective Ionic Radii and Systematic Studies of Interatomic Distances in Halides and Chalcogenides,” Acta Crystallographica Section A, vol. 32, no. 5, 1976, pp. 751–767.

5. M. Tanaka, “Toward a Proposed Ontology for Nanoscience,” CAIS/ACSI 2005: Data, Information, and Knowledge in a Networked World, Canadian Assoc. for Information Science, 2005; www.cais-acsi.ca/proceedings/2005/tanaka_2005.pdf.

6. T. O’Reilly, “What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software,” O’Reilly, 2005; www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.

7. K. Cheung, J. Drennan, and J. Hunter, “Towards an Ontology for Data-Driven Discovery of New Materials,” Proc. 2008 AAAI Spring Symp. Semantic Scientific Knowledge Integration, AAAI Press, 2008, pp. 9–14; www.aaai.org/Library/Symposia/Spring/ss08-05.php.

8. P. Srinath et al., “Axis2, Middleware for Next Generation Web Services,” Proc. 2006 Int’l Conf. Web Services (ICWS 06), IEEE CS Press, 2006, pp. 833–840.

9. K. Smith, “Simplifying Ajax-Style Web Development,” Computer, vol. 39, no. 5, 2006, pp. 98–101.

10. M.L. Reuven, “At the Forge: Dojo,” Linux J., no. 155, 2007, p. 10; www.linuxjournal.com/article/9554.

11. G. Aldo et al., “Sweetening Ontologies with Dolce,” Knowledge Eng. and Knowledge Management: Ontologies and the Semantic Web, Springer, 2002, pp. 223–233.

12. Y. Sure et al., “The SWRC Ontology: Semantic Web for Research Communities,” Progress in Artificial Intelligence, LNCS 3803, Springer, 2005, pp. 218–231.

13. Joint Academic Classification of Subjects Version 2 (JACS2), Higher Education Statistics Agency, 2006; www.hesa.ac.uk/index.php/content/view/894/263.

14. Time Ontology in OWL, World Wide Web Consortium (W3C) working draft, Sept. 2006; www.w3.org/TR/owl-time.

15. L.N. Soldatova and R.D. King, “An Ontology of Scientific Experiments,” J. Royal Soc. Interface, vol. 3, no. 11, 2006, pp. 795–803.

16. M. Smith, C. Welty, and D. McGuinness, OWL Web Ontology Language Guide, World Wide Web Consortium (W3C), 2004; www.w3.org/TR/owl-guide/#owl_equivalentClass.

17. I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann, 2005.

18. S. Bowers et al., “Kepler/pPOD: Scientific Workflow and Provenance Support for Assembling the Tree of Life,” Provenance and Annotation of Data and Processes, LNCS 5272, Springer, 2008, pp. 70–77; www.springerlink.com/content/978-3-540-89964-8.

19. D. Hull et al., “Taverna: A Tool for Building and Running Workflows of Services,” Nucleic Acids Research, vol. 34, 2006, pp. W729–W732.

20. E. Sirin et al., “Pellet: A Practical OWL-DL Reasoner,” Web Semantics: Science, Services, and Agents on the World Wide Web, vol. 5, no. 2, 2007, pp. 51–53.

[Figure 17 panels: (a) “ICSD for WWW: Details of the selected entries,” a table of bonds and angles with the columns Atom1, Atom2, Atom3, Value, and Error, in which the bond rows for W1 show the value 2.6916 and the angle rows show 179.9802, 120.0000, 60.0000, and 60.0000; (b) “ICSD for WWW: References for the selected entries.”]

Figure 17. Retrieving further related information. This figure illustrates the results from submitting a particular compound to the related tools: (a) calculated bond lengths and angles and (b) retrieved scholarly references.

The Authors

Kwok Cheung is a PhD student at the University of Queensland’s Australian Institute for Bioengineering & Nanotechnology. His research interests are Semantic Web technologies and ontology development. He received his Bachelor of Information Technology degree from the University of Queensland. Contact him at [email protected].

Jane Hunter is Professor of eResearch at the University of Queensland’s School of Information Technology & Electrical Engineering. Her research interest is the application of Semantic Web technologies to scientific data management. She received her PhD in computer science from the University of Cambridge. Contact her at [email protected].

John Drennan is the director of the Centre for Microscopy and Microanalysis at the University of Queensland’s Australian Institute for Bioengineering & Nanotechnology. His research interests are solid-state ionics within materials science. He received his PhD in chemistry from Flinders University. Contact him at [email protected].
