Date posted: 19-May-2015
Category: Business
Uploaded by: thomasunivalor
1
Thomas Martinuzzo, Jr. Eng.
2
3
What is DIET?
DIET is an information extraction and manipulation tool.
DIET can extract information from the DEEP web by understanding page structures.
Surface web: 20 billion pages indexed by search engines
DEEP web: over 600 billion pages
« The 60 largest Deep Web sources contain 84 billion pages of content. That's about 750 terabytes of information, sufficient by themselves to exceed the size of the surface Web by 40 times. » Brightplanet.com
Pic from Maxumowners.org
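As a rough check of the figures quoted on this slide, the arithmetic can be worked out in a few lines. This is only a sketch: it assumes decimal terabytes (1 TB = 10**12 bytes) and treats "+600 billion" as a lower bound, neither of which is stated in the source.

```python
# Figures quoted on the slide.
surface_pages = 20 * 10**9      # pages indexed by search engines
deep_pages = 600 * 10**9        # lower bound ("+600 billion")
top60_pages = 84 * 10**9        # pages in the 60 largest Deep Web sources
top60_terabytes = 750           # size of those 60 sources
size_ratio = 40                 # "exceed the size of the surface Web by 40 times"

# How many DEEP web pages exist for each surface page (at least).
page_ratio = deep_pages / surface_pages

# Surface Web size implied by the Brightplanet quote.
implied_surface_terabytes = top60_terabytes / size_ratio

# Average page size in the 60 largest sources.
# Assumption: decimal terabytes, 1 TB = 10**12 bytes.
avg_bytes_per_page = top60_terabytes * 10**12 / top60_pages

print(f"DEEP web pages per surface page: at least {page_ratio:.0f}")
print(f"Implied surface Web size: {implied_surface_terabytes} TB")
print(f"Average page size in the top 60 sources: about {avg_bytes_per_page:.0f} bytes")
```

Under these assumptions the quote implies a surface Web of under 20 terabytes and an average page of roughly 9 kilobytes in the largest sources.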
4
DIET Features & Benefits
Uses artificial intelligence to build wrappers automatically
No to minimal user intervention
Users can easily extract and manipulate information
5
Car website:
Characteristics: List of cars by name with description, date, price, picture … Over 100 pages of data!
Problem: No local search engine.
But … I am looking for an Acura MDX 2005 or something like that!
…
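Once the listings have been extracted, the "Acura MDX 2005 or something like that" request becomes an approximate search over local records. The sketch below is not DIET's implementation: the records, the `find_similar` helper, and the 0.9 similarity cutoff are all hypothetical, and the matching uses `difflib` from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical records, standing in for listings extracted from the car website.
cars = [
    {"name": "Acura MDX 2004", "price": 21500},
    {"name": "Acura MDX 2005", "price": 24900},
    {"name": "Acura TL 2005", "price": 19900},
    {"name": "Honda Civic 2005", "price": 12900},
]


def find_similar(query, records, limit=3, cutoff=0.9):
    """Return records whose name closely matches the query, best match first."""
    by_name = {record["name"]: record for record in records}
    close = get_close_matches(query, list(by_name), n=limit, cutoff=cutoff)
    return [by_name[name] for name in close]


matches = find_similar("Acura MDX 2005", cars)
for car in matches:
    print(car["name"], car["price"])
```

Lowering `cutoff` widens the search to looser matches; raising it toward 1.0 keeps only near-exact names.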
Job website:
Characteristics: List of jobs by title with a short description, salary, city. Over 800 jobs. Local search engine. Sort capabilities.
Problem: Only 10 jobs are shown per page. Unable to search by salary range. Unable to sort by city.
But … I want to see all jobs over $75,000 on one single page and save it for future consultation.
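With all job postings extracted into local records, the salary filter and city sort that the website lacks take only a few lines. This is a minimal sketch, not DIET's code: the sample jobs and the `jobs_over` helper are made up for illustration.

```python
# Hypothetical records, standing in for postings extracted from the job website.
jobs = [
    {"title": "Software Architect", "salary": 95000, "city": "Montreal"},
    {"title": "Web Developer", "salary": 60000, "city": "Quebec"},
    {"title": "Project Manager", "salary": 82000, "city": "Laval"},
    {"title": "Database Administrator", "salary": 78000, "city": "Montreal"},
    {"title": "Technical Writer", "salary": 75000, "city": "Laval"},
]


def jobs_over(records, min_salary):
    """Keep jobs paying strictly more than min_salary.

    Results are sorted by city, then by salary (highest first).
    """
    selected = [job for job in records if job["salary"] > min_salary]
    return sorted(selected, key=lambda job: (job["city"], -job["salary"]))


well_paid = jobs_over(jobs, 75000)
for job in well_paid:
    print(f'{job["city"]}: {job["title"]} ({job["salary"]})')
```

The whole result fits on one page and can be saved as is, which is what the 10-jobs-per-page website cannot offer.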
6
DIET Technologies
DIET Core Web Services: Access only by certified clients.
DIET Web Application: Users and services managers. Web-based application (JSP/Servlet/JavaServer Faces/JavaBean). Based on Java EE 5/GlassFish/MySQL technology.
7
Univalor Website
List of new technologies grouped by domain
Simple search engine available
8
Using DIET
We want to extract and then manipulate all available technologies.
Give the Univalor technologies URL to DIET:
http://www.univalor.ca/companies_available_technologies.asp
9
Wrappers are generated
DIET creates a Wrapper by learning the structure of Univalor web pages.
DIET extracts data through the Wrapper.
DIET displays the results.
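The learn, extract, display sequence can be illustrated with a toy wrapper. The slides do not describe how DIET actually learns page structures, so this sketch swaps in a very simple heuristic: it takes the most frequent tag path that carries text as the repeated record structure. The class and function names and the sample page are hypothetical, and the sample is not real Univalor content.

```python
from collections import Counter
from html.parser import HTMLParser


class TextPathCollector(HTMLParser):
    """Record every text node together with the tag path that leads to it."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements as (tag, element_id)
        self.next_id = 0
        self.nodes = []     # (tag_path, parent_element_id, text)

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and len(self.stack) >= 2:
            path = tuple(tag for tag, _ in self.stack)
            self.nodes.append((path, self.stack[-2][1], text))


def collect(html):
    parser = TextPathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_wrapper(html):
    """'Learn' the structure: the most frequent text-bearing tag path."""
    paths = Counter(path for path, _, _ in collect(html))
    return paths.most_common(1)[0][0]


def extract(html, wrapper):
    """Apply the wrapper: group matching text nodes by their parent element."""
    records = {}
    for path, parent_id, text in collect(html):
        if path == wrapper:
            records.setdefault(parent_id, []).append(text)
    return list(records.values())


# Hypothetical page with a repeated row structure.
page = """
<html><body>
<h1>Available technologies</h1>
<table>
<tr><td>Technology A</td><td>Domain 1</td></tr>
<tr><td>Technology B</td><td>Domain 2</td></tr>
<tr><td>Technology C</td><td>Domain 1</td></tr>
</table>
</body></html>
"""

wrapper = learn_wrapper(page)
rows = extract(page, wrapper)
for row in rows:
    print(row)
```

The page heading is ignored because its tag path occurs only once, while the table cells share one path. A wrapper learned on one page can be applied to other pages that share the same layout; a real system would also have to cope with malformed markup and pages that repeat several different structures.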
10
Manipulate information with DIET
Once the information has been extracted, it can be manipulated.
11
Plug-in opportunity
DIET Core Web Services can be used by third-party clients
Internet Explorer and Mozilla Firefox integration
Export capabilities
Extracted information can be exported to multiple storage formats
And more …
Users can create their own Wrappers
DIET can be the perfect tool for DEEP search
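As an illustration of the export capability, extracted records can be written to more than one storage format with the Python standard library alone. The slides do not name the formats DIET supports, so CSV and JSON are assumptions here, as are the sample records and the helper names.

```python
import csv
import io
import json

# Hypothetical extracted records.
records = [
    {"title": "Technology A", "domain": "Domain 1"},
    {"title": "Technology B", "domain": "Domain 2"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text; the first row's keys form the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize a list of dicts to indented JSON text."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)
print(csv_text)
print(json_text)
```

Both helpers return text, so the caller decides whether the result goes to a file, a database field, or a web service response.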
12
Research and Development:
Samuel Pierre, [email protected]
Commercialization and licensing:
Didier Leconte, [email protected]
Thomas Martinuzzo, Jr. [email protected]
Thanks !