
CIS 895 – MSE Project


CIS 895 – MSE PROJECT

KDD- Service based Numerical Entity Searcher (KSNES)

Presentation 2 on March 31st, 2009

Naga Sowjanya Karumuri [email protected]


OUTLINE

Project Data Flow Diagram
Action Items
Architectural Design
Test Plan
Formal Inspection Checklist
Project Plan
Prototype Demonstration
Questions / Comments


PROJECT DATA FLOW DIAGRAM:

NUMERICAL ENTITY SEARCHER


MODULES IN THE PROJECT

Webpage (JSP): For requesting and receiving information from the service.

POS Tagger (Java): Stanford POS Tagger

Numerical Phrase Extractor (Java): Implemented using Shallow Parsing Technique

Number-Unit/Date Pattern Recognizer (C++): Implemented based on the Numerical Quantifier developed by Benjamin Sapp, UIUC.


ACTION ITEMS

Implemented Numerical Phrase Extractor

Detailed Description of Test Plan

Wrote Formal Specification using USE

UML Representation of the System


ARCHITECTURAL DESIGN

Service Oriented Architecture

PACKAGE VIEW

Overall Package View

Class descriptions, attributes and operations are documented in the Architecture Design Document.

SEQUENCE DIAGRAM


CLASS DIAGRAM (NPE PACKAGE)


CLASS DIAGRAM (NDPR PACKAGE)


IMPLEMENTING NUMERICAL PHRASE EXTRACTOR

Input: Tagged Text
  I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD

Regular expressions are used to detect the numerical patterns in the input.
  thirty-three/JJ dollars/NNS in/IN 1998/CD

Output: Numerical Phrases
  thirty-three dollars in 1998
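The input-to-output step on this slide can be sketched in Java as follows. This is only an illustration: the class name, the method names and the single regular expression are assumptions made for this example, not the patterns used in the actual Numerical Phrase Extractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseExtractorSketch {
    // Hypothetical pattern (not from the project): a number-like token tagged
    // JJ or CD, a noun, and an optional "in <four-digit year>" suffix.
    private static final Pattern QUANTITY = Pattern.compile(
        "[a-zA-Z0-9-]+/(JJ|CD) [a-zA-Z]+/NNS?( in/IN \\d{4}/CD)?");

    // Remove the "/TAG" suffix from every token in a matched span.
    static String stripTags(String tagged) {
        return tagged.replaceAll("/[A-Z$]+", "").trim();
    }

    // Scan POS-tagged text and return each matched phrase without its tags.
    static List<String> extract(String taggedText) {
        List<String> phrases = new ArrayList<>();
        Matcher m = QUANTITY.matcher(taggedText);
        while (m.find()) {
            phrases.add(stripTags(m.group()));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String tagged = "I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD";
        System.out.println(extract(tagged)); // prints [thirty-three dollars in 1998]
    }
}
```

Keeping the tags in the text until after matching lets the pattern rely on part-of-speech information; the tags are stripped only when the phrase is reported.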

TAGSET


SOME PATTERNS

"\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN"
parses
  3-2/JJ lead/NN
  20-20/JJ match/NN

"(between|Between|from|From|In|in|since|Since|during|During)/IN ..../CD (([a-zA-Z]+/CC|[a-z]+/TO) ..../CD)?"
parses
  'between 1987 and 1997', 'in 2007 and 2008'
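A small Java sketch shows how the two patterns on this slide behave on tagged input. The pattern strings are taken from the slide (with the line-wrap space inside "since| Since" removed); the class, field and method names are hypothetical, and the tagged inputs for the year examples are assumed from the untagged phrases shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlidePatternsSketch {
    // Score-like compounds such as "3-2/JJ lead/NN".
    static final Pattern SCORE = Pattern.compile(
        "\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN");

    // A preposition, a four-character token tagged CD, and an optional second one.
    static final Pattern YEAR_RANGE = Pattern.compile(
        "(between|Between|from|From|In|in|since|Since|during|During)/IN ..../CD "
        + "(([a-zA-Z]+/CC|[a-z]+/TO) ..../CD)?");

    // Return the first match with POS tags removed, or null if nothing matches.
    static String firstPhrase(Pattern pattern, String taggedText) {
        Matcher m = pattern.matcher(taggedText);
        if (!m.find()) {
            return null;
        }
        return m.group().replaceAll("/[A-Z$]+", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(firstPhrase(SCORE, "3-2/JJ lead/NN"));
        System.out.println(firstPhrase(SCORE, "20-20/JJ match/NN"));
        System.out.println(firstPhrase(YEAR_RANGE, "between/IN 1987/CD and/CC 1997/CD"));
        System.out.println(firstPhrase(YEAR_RANGE, "in/IN 2007/CD and/CC 2008/CD"));
    }
}
```

Note that the second pattern, as written on the slide, requires a space after the first /CD token even when the optional second year is absent.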

ASSIGNING BOUNDS

Certain words are detected in order to set the bound of a value: >, <, ~ or =.
"=" is used if no bound word is present.

Bound   Corresponding words
>       more than, no less than, no fewer than, at least, over
<       up to, not over, no more than, at most, less than
~       about, around, approximately, some, nearly, almost
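The bound lookup can be sketched as a simple prefix table in Java. This is a minimal illustration, not the project's code: the class and method names are hypothetical, and only a subset of the bound words from the table is included.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class BoundAssignerSketch {
    // Subset of the bound words from the table, mapped to their symbols.
    private static final Map<String, String> BOUND_WORDS = new LinkedHashMap<>();
    static {
        BOUND_WORDS.put("more than", ">");
        BOUND_WORDS.put("over", ">");
        BOUND_WORDS.put("less than", "<");
        BOUND_WORDS.put("up to", "<");
        BOUND_WORDS.put("about", "~");
        BOUND_WORDS.put("around", "~");
        BOUND_WORDS.put("approximately", "~");
        BOUND_WORDS.put("nearly", "~");
        BOUND_WORDS.put("almost", "~");
    }

    // Return the bound symbol for a phrase; "=" when no bound word leads it.
    static String boundFor(String phrase) {
        String lower = phrase.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : BOUND_WORDS.entrySet()) {
            if (lower.startsWith(entry.getKey() + " ")) {
                return entry.getValue();
            }
        }
        return "=";
    }

    public static void main(String[] args) {
        System.out.println(boundFor("About 100 miles per hour")); // prints ~
        System.out.println(boundFor("Less than 10% support"));    // prints <
        System.out.println(boundFor("more than 50 people"));      // prints >
        System.out.println(boundFor("5.987 ml"));                 // prints =
    }
}
```

This sketch only checks the start of the phrase; a fuller version would presumably need to handle bound words that appear elsewhere in the phrase.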

SOME PATTERNS

[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?( [a-zA-Z]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP)

parses
  one of the five people
  two of the groups
  one of the rare cases
  89 percent of people
  five of the seven former employees
  3 out of 5 people
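The "X of Y" pattern can be exercised with a short Java sketch. It assumes that the space between "( the/DT)?" and "( [a-zA-Z]+/CD)?" in the transcript is a line-wrapping artifact and omits it; the class and method names are hypothetical, and the tagged inputs are assumed from the untagged examples on the slide.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OfPatternSketch {
    // Pattern from the slide, written without the stray space after "( the/DT)?".
    static final Pattern X_OF_Y = Pattern.compile(
        "[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?"
        + "( [a-zA-Z]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP)");

    // Return the first "X of Y" phrase with POS tags removed, or null.
    static String firstPhrase(String taggedText) {
        Matcher m = X_OF_Y.matcher(taggedText);
        if (!m.find()) {
            return null;
        }
        // The alternation tries /NN first, so a match on an NNS token can end
        // before the final "S"; stripping tags from the match hides this.
        return m.group().replaceAll("/[A-Z$]+", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(firstPhrase("one/CD of/IN the/DT five/CD people/NNS"));
        System.out.println(firstPhrase("89/CD percent/NN of/IN people/NNS"));
        System.out.println(firstPhrase("five/CD of/IN the/DT seven/CD former/JJ employees/NNS"));
    }
}
```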


PHRASES THAT CAN BE PARSED

Numerical Phrases

27 year-old boy

A 3-2 lead

9 in 10 people

About 100 miles per hour

200 adults and children

$3 million

About two-thirds of the vote

The 17-mile drive

Less than 10% support

Six-bedroom apartment

5.987 ml

10:00 a.m. CST

From 400 to 500 miles

Temporal Phrases

Last year

Next week

Monday – Sunday

January–December

1956-60

Mid-1990s

Between 1999 and 2008

17th century

18 April 2008

Dec 21, 2009

October 10th 1984

John, 67

Since 1998

PHRASES THAT ARE NOT CURRENTLY PARSED

Numerical Phrases                                  Temporal Phrases
six-pack of drinks                                 31st of March 1998
$100 more                                          Since mid-November
252° (the POS tagger can't parse this)             the January-April period

Future Work:

These phrases could also be parsed by adding more patterns to the current system; for now, only the most important and most commonly occurring patterns are handled.

The current goal is to demonstrate the basic idea of numerical phrase extraction.

FORMAL SPECIFICATION

Created and validated using USE 2.3.1.
All classes are specified.
All important attributes and methods are specified.
Constructor methods are not specified.
Contained at the end of the Architectural Design Document.


TEST PLAN

Outputs are checked at each module by the developer by matching them against manually calculated results:
  Check that the POS tagger has produced the tagged text.
  Check that the numerical phrases are extracted.
  Check that each numerical phrase is broken down into Value, Unit and Unit-Type.

UML diagrams and the required specifications will be checked for consistency by two fellow MSE students.

User interaction will be tested by the developer and the technical inspectors.


FORMAL INSPECTION CHECKLIST

The following items are to be checked:
  The symbols used in the class diagram conform to UML standards
  The symbols used in the sequence diagrams conform to UML standards
  The classes in the class diagrams have corresponding descriptions provided in the Architecture Document
  The descriptions of the classes in the Architecture Document are clear and concise
  The classes in the USE model are consistent with those in the Architecture Document
  All the requirements in the Software Requirements Specification have been covered in the Architecture Document
  The multiplicities in the USE model have been depicted in the class diagram

PROJECT SCHEDULE

Key Dates

Presentation 1: February 24th, 2009
  Complete Numerical Sub-Chunker

Presentation 2: March 31st, 2009
  Complete Numerical Phrase Extractor

Presentation 3: April 10th, 2009
  Patch up the modules
  Develop a GUI
  Set them up on the server

Submit the complete documents to the committee by April 13th, 2009

Final Portfolio submitted by April 15th, 2009


PROJECT SCHEDULE


PROTOTYPE DEMONSTRATION

POS Tagger working
  For now it works on the local machine

Numerical Pattern Extractor
  For now it works on the local machine


PHASE 3 DELIVERABLES

Action items
Component Design
Assessment Evaluation
Project Evaluation
User’s Manual
Formal Technical Inspection Checklists
Presentation 3
Executable Project
Source Code


TO-DO LIST

Revise the Documents
Revise Project Schedule
Work on the Phase 3 deliverables
Final Demo


Questions??

Suggestions!!

THANK YOU

