© copyright 2009 Semantic Insights™
Semantic Insights: Semantic Technology in Action
Chuck Rehberg, Chief Scientist
Semantic Insights™, a Division of Trigent Software, Inc.
Topics Covered
• PriArt, a powerful new way to do research
• What PriArt does
• Investigation Examples
– EPA
– Fish & Wildlife
– USPTO – Potential Patent Infringement
– Internet Investigation
• Web 2.0, Web 3.0, and Cloud Computing
PriArt, a powerful new way to do research
Can your keyword search do that?
• Version 1: Joe gave Bob the ball.
• Version 2: The ball was given by Joe to Bob.
• Synonyms: Robert said, “Joseph gave me the ball.”
• Instance: Cubs pitcher Joe DiMaggio handed the ball to Bob Kelly.
• Generalize: The first guy gave the ball to the second guy.
• Specialize: DiMaggio gave little Bobby Fischer an autographed baseball.
• Pronouns: Joe’s son is Bob. He gave his son a baseball.
• Identity: On February 17, the murder weapon, a baseball, was shown to Joe DiMaggio who subsequently transferred it to Bobby Fischer.
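To make the point of this slide concrete, here is a minimal Python sketch of a deliberately naive keyword search: a sentence counts as a hit only if every content word of the query “Joe gave Bob the ball.” appears in it literally. The stopword list and the all-words-must-match rule are assumptions chosen for illustration; this is not how PriArt works, and real keyword engines usually add stemming and ranking on top.

```python
import re

# Query and variants are taken from the slide. The matcher below is an
# illustrative naive keyword search, NOT PriArt's method.
QUERY = "Joe gave Bob the ball."

VARIANTS = {
    "Version 1": "Joe gave Bob the ball.",
    "Version 2": "The ball was given by Joe to Bob.",
    "Synonyms": "Robert said, \u201cJoseph gave me the ball.\u201d",
    "Instance": "Cubs pitcher Joe DiMaggio handed the ball to Bob Kelly.",
    "Generalize": "The first guy gave the ball to the second guy.",
    "Specialize": "DiMaggio gave little Bobby Fischer an autographed baseball.",
    "Pronouns": "Joe\u2019s son is Bob. He gave his son a baseball.",
    "Identity": ("On February 17, the murder weapon, a baseball, was shown to "
                 "Joe DiMaggio who subsequently transferred it to Bobby Fischer."),
}

# Assumed stopword list, kept small on purpose.
STOPWORDS = {"the", "a", "an", "to", "by", "was", "is", "his", "me", "it"}


def keywords(text):
    """Lower-cased alphabetic tokens with the assumed stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def keyword_match(query, document):
    """True only if every query keyword appears literally in the document."""
    return keywords(query) <= keywords(document)


if __name__ == "__main__":
    for name, text in VARIANTS.items():
        print(f"{name:10} {'match' if keyword_match(QUERY, text) else 'miss'}")
```

Run as a script, only Version 1 prints `match`; every other variant prints `miss`, including the simple passive rewording in Version 2, because “given” is not the literal word “gave”.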
What PriArt does
1. PriArt starts with a plain-English statement of what you are interested in (we call this your investigation).
2. PriArt gathers information from a potentially large corpus of documents (by “reading” them) and generates a structured report containing only the information relevant to your investigation.
An EPA Example
• The Goal:
Suppose you want to find out what a recent EPA report publishes on the environmental effects of mercury. You have a quote from an existing source, and you want to know what the new EPA publication says about it.
EPA Example: Information Source
• Information Source: U.S. Environmental Protection Agency (EPA). (2008). EPA’s 2008 Report on the Environment.
– National Center for Environmental Assessment, Washington, DC; EPA/600/R-07/045F. Available from the National Technical Information Service, Springfield, VA, and online at http://www.epa.gov/roe
• The specific URL of interest is located at:
– http://oaspub.epa.gov/eims/eimscomm.getfile?p_download_id=485027
EPA Example: Investigation
• Your Investigation statement might look something like:
“The effects of Mercury on human health are diverse and depend on the forms of mercury encountered. Fetuses and children may be more susceptible to mercury and to neurological health effects. Prenatal exposures interfere with the growth.”
Fish & Wildlife Example: The Goal
• Suppose you have specific topics you are interested in and want to know what the “Fish & Wildlife” reports say about them:
1. Migratory bird numbers are shrinking.
2. Wetlands destruction probably will contribute to shrinking.
3. Biomass energy crops will reduce available habitats.
4. Birds reduce insect populations in temperate forests.
5. Declines in migratory birds pose a threat to the health of our forests and farmlands.
Fish & Wildlife Example: Sources
• Information Sources: U.S. Fish & Wildlife Service, Division of Migratory Bird Management Reports
• Located at: http://www.fws.gov/migratorybirds/reports/reports.html
About the Quality of Results
• The quality of the results is primarily influenced by how semantically close the source documents are to your investigation.
• The more PriArt understands about your investigation, the better the results.
• There are two ways to improve the results:
1. Do the “Knowledge Engineering” to add Ontology, Logic, and Processing that automatically expand the meaning of the investigation, and/or
2. Better yet, let the machine do the work: add more information to your investigation.
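Option 1 can be illustrated with a minimal Python sketch of query expansion, assuming a tiny hand-written synonym table as a stand-in for an ontology. The table entries and the names `SYNONYMS` and `expanded_match` are hypothetical and chosen for illustration only; PriArt’s actual dictionaries, ontologies, and logic are not described in these slides.

```python
import re

# Hypothetical synonym table standing in for an ontology. The entries are
# illustrative assumptions, not PriArt's dictionaries.
SYNONYMS = {
    "joe": {"joseph"},
    "bob": {"robert", "bobby"},
    "gave": {"given", "handed", "transferred"},
    "ball": {"baseball"},
}

# Assumed stopword list, kept small on purpose.
STOPWORDS = {"the", "a", "an", "to", "by", "was", "is", "his", "me", "it"}


def keywords(text):
    """Lower-cased alphabetic tokens with the assumed stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def expanded_match(query, document, synonyms):
    """True if each query keyword, or one of its listed synonyms, is in the document."""
    doc = keywords(document)
    return all(({term} | synonyms.get(term, set())) & doc for term in keywords(query))


if __name__ == "__main__":
    query = "Joe gave Bob the ball."
    for sentence in [
        "Robert said, \u201cJoseph gave me the ball.\u201d",
        "The ball was given by Joe to Bob.",
        "The first guy gave the ball to the second guy.",
    ]:
        print(expanded_match(query, sentence, SYNONYMS), sentence)
```

With the expansion, the Synonyms and Version 2 sentences from the earlier slide now match, but “The first guy gave the ball to the second guy.” still does not: recognizing a generalization takes more than a synonym list, which is where richer ontology and logic come in.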
Other Things That Affect the Results
• The bottleneck is bandwidth
Cloud computing will help (and is necessary for scaling in general). However, once the “readers” are scaled out, the bottleneck becomes the servers hosting the sources being read.
• PDFs require special handling
PDFs present challenges in identifying the content of lists and tables; we have heuristics to handle that. Charts are another matter and require much more work. But is it worth the effort?
PriArt “Reads” Documents in Real-Time
USPTO Example: Potential “Patent Infringement”
• Suppose you were interested in finding potential infringements of a specific patent in the US Patent Office database.
• Investigation of a specific Patent:
United States Patent #7,433,858 Rehberg, et al. October 7, 2008 Rule selection engine
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=rehberg.INNM.&OS=IN/rehberg&RS=IN/rehberg
Example Internet Investigation
• Suppose you were interested in finding out what is known about certain aspects of autism. You notice a paragraph on a webpage:
– http://www.autism-society.org/site/PageServer?pagename=about_home
• The paragraph reads:
“Autism is a complex developmental disability that typically appears during the first three years of life and affects a person’s ability to communicate and interact with others. Autism is defined by a certain set of behaviors and is a "spectrum disorder" that affects individuals differently and to varying degrees. There is no known single cause for autism, but increased awareness and funding can help families today.”
PriArt: Web 2.0, Web 3.0, and Cloud Computing
• Web 2.0
– Collaborative investigations can improve the quality and accuracy of the investigations
• Web 3.0
– Natural Language Processing
    • Improves itself through experience
    • Ongoing training by our top PhD linguists
– Common and Domain-Specific Dictionaries
    • Improves itself through experience
    • Tools to semi-automate curation of dictionaries
– Common and Domain-Specific Ontologies
    • Improves itself through experience
    • Tools to semi-automate curation of ontologies
• Cloud Computing
– Essential for mass scalability. Work in progress…
Contact Us
• We are seeking pilot projects and early beta sites now. Send email to arrange a demo.
• For more information, please contact me:
Chuck Rehberg, Chief Scientist/CTO
Semantic Insights™, A Division of Trigent Software, Inc.
2 Willow St, Suite 201,
Southborough, MA 01745
Direct: +1.508.490.6053
Cell: +1.508.333.5726
www.semanticinsights.com
Blog: http://semanticinsights.com/wordpress/