Date posted: 06-Aug-2015
Public Administration, Laws, Requirements, Natural Language
Alessio Ferrari
ISTI-CNR, Pisa, Italy
Ferrari (ISTI-CNR) PA, Laws, Requirements, NL 1 / 45
Preliminaries
Who am I?
Alessio Ferrari, Ph.D. in Computer Engineering
Three years at GE Transportation Systems s.p.a. (Modelling and Code Generation)
Three years at ISTI-CNR (Requirements Engineering and NLP)
Main interests: artificial intelligence, natural language

Content of this Talk
LearnPAd EU Project: model-based learning for Public Administrations (www.learnpad.eu)
Requirements in LearnPAd
Natural language pragmatic ambiguities
Context
[Figure: context diagram. Labels: Regulator, Norm, Law, Regulation, Natural Language, Graphical Language, Requirements Engineer, Requirement, Specification, Artifact, Needs, Public Administration, Procedure, Software, Civil Servant, User, Citizen.]
WHY NOT?
LearnPAd Project
FP7-ICT-2013.8.2 European Project
Model-based learning in the Public Administration (PA) domain
IDEA 1: PA procedures can be modelled with Business Process Model and Notation (BPMN)
IDEA 2: PA procedures can be enriched by civil servants with Natural Language (NL) descriptions
LearnPAd: Overview
Quality of service of PA improved
Quick changes in PA procedures addressed
Process-driven learning provided
Informative learning provided
Procedural learning provided
Knowledge assessment performed
Knowledge sharing fostered
Learning support provided
Learners engaged
Meritocracy promoted
Quality of learning content ensured
Learning content accessed by learners
Learning content defined
Basic definition of learning content provided
Iterative definition of learning content provided
Cooperation fostered
Learning content increased
LearnPAd: Requirements Process
Objective
Achieve a clear and agreed set of requirements for the LearnPAd platform
EU Projects Peculiarities
number/distribution of partners: 9 partners, plenary discussion difficult
culture: Italy, France, Switzerland, Austria, Lithuania, need to meet/talk
industrial vs academic mindsets: 4 academic, 2 closed-source companies, 2 open-source, 1 PA; industries more practical in RE
background: different domains and terminology
abstraction: focus on specific background leads to lack of abstraction
age/roles: uneasiness of young vs old
objectives: requirements introduced to pursue specific interests
focus: the project is not the main activity of participants

What often happens...
Everyone develops their piece of the project → integration issues
LearnPAd: Requirements Process
[Figure: requirements process diagram. Labels: KJ Sessions, Collaborative Requirements Sessions (Wiki), Requirements Analysis, Preliminary Requirements, Structured Requirements, Justifications, Goal Model, Learning, Modelling, Quality, Glossary, Tags, Requirements Consolidation, Consolidated Requirements, Goal Modelling, Goals evaluation.]
KJ Sessions
Activity
24 people in 3 groups: Modelling, Learning, Quality
Description of the task by the moderator
Write requirements in cards
Discuss the requirements
Second session to add new requirements

People really excited and high degree of participation
Initial individual activity mitigated age/role effects and objective discrepancies
Second session to align terminology
Moderators: with recognized authority, or external (not representative of any group)
Still, most of the 249 requirements were poorly specified
Collaborative Refinement
Requirements uploaded in a Wiki platform (XWiki)
Justifications given and Refinements provided

People rather motivated (even if motivation was not perceived)
249 → 337 requirements
People do not contribute to the requirements of others

Still, requirements were poorly specified
A selected task force of project participants provided a set of 191 consolidated requirements
People directly asked to clarify their requirements
Excel sheets used for refinement and consolidation
Goal Modelling
Bottom-up goal model definition
From requirements to justifications (goals)
Provide higher degree of abstraction and spot out missing needs

Goal Models | Stage 0 (R: -) G S E | Stage 1 (R: 82) G S E | Stage 2 (R: 78) G S E | Stage 3 (R: 90) G S E | Score
Main | 24 4 3 | 32 4 5 | 24 4 4 | 24 4 4 | H
Learning content accessed | - - - | 9 0 1 | 32 4 1 | 47 5 4 | H
Quality of WIKI Documents | - - - | 17 0 2 | 17 0 2 | 17 0 2 | M
Quality of BP Models | - - - | 12 0 3 | 17 0 4 | 17 0 4 | M
Learning support provided | - - - | 13 0 1 | 17 1 1 | 17 1 1 | H
BP Models edited | - - - | - - - | 15 0 2 | 15 0 2 | M
BP Models reused | - - - | - - - | 8 0 0 | 8 0 0 | M
Quality by logging | - - - | - - - | 15 0 0 | 15 0 0 | M
Iterative definition of content | - - - | - - - | 19 0 1 | 19 0 1 | M
Platform flexibility enforced | - - - | - - - | 11 0 0 | 11 0 0 | H
Knowledge assessment | - - - | - - - | 8 0 0 | 39 1 5 | M
Procedural learning provided | - - - | - - - | - - - | 24 2 1 | L
TOTAL | 24 4 3 | 83 4 12 | 183 9 15 | 283 13 24 |

Table: Growth of the goal models at each stage. R = number of original requirements. G = number of hard-goals and requirements. S = number of soft-goals. E = number of expectations.
What have we learnt
People have to be trained about writing requirements
People from academia less confident in collaborative requirements elicitation
Too few user requirements → involve users in separate meetings
Need for a web-moderator/leader to motivate collaborative refinement
XWiki is good to get statistics on requirements
Goal modelling useful to have abstract view and spot out missing needs but requires effort
Tooling not appropriate for goal modelling and sharing (we preferred sharing with Google Docs but traceability was poor)
Integrated tools for the whole requirements process are missing
Improved Requirements Process
[Figure: improved requirements process diagram. Labels: KJ Sessions, Collaborative Requirements Sessions (XWiki), Requirements Analysis, Preliminary Requirements, Structured Requirements, Justifications, Goal Model, Learning, Modelling, Quality, Glossary, Tags, VOLERE Requirements Analysis, Consolidated Requirements and Justifications, Goal Modelling (Objectiver), Goals evaluation, Requirements Lesson, Preliminary Glossary, Web Moderator.]
LearnPAd: Quality of NL descriptions ensured
[Figure: mock-up of the quality check for NL descriptions. Labels: BP Model, BP Manager, WIKI Doc, Load, Select Criteria, VALIDATE, Press Validate, Quality Evaluation Page, INSPECT, Press Inspect, Inspection Page, WIKI Doc (Non Editable), MODIFY.]
Selectable criteria: Complexity, Structuring, Ambiguity
Example evaluation: Complexity: 0.9 (Reduce); Structuring: 0.1 (Increase); Ambiguity: 0.7 (Reduce)
Example sentence shown in the Inspection Page: "The document shall be sent to the proper authorities as soon as possible after the document has been signed by the officer"
LearnPAd: Quality of NL descriptions ensured
Objective
Identify typical NL defects of PA documents

Rationale
We do not have contributions of civil servants
We ask civil servants about their difficulties with their current documents
We identify quality defects of currently existing PA documents, normally edited (and read) by civil servants
Defects in NL Descriptions: Process
Perform Interviews
Define Questionnaire
Deliver Questionnaire
Evaluate Questionnaire
List of most relevant categories of defects to be detected in PA procedures
Evaluate Web-links defining guidelines for editing PA procedures
Define guidelines for editing PA procedures
Guidelines for editing PA procedures
List of categories of defects to be detected in PA procedures
Evaluate guidelines
Rule-based identifiable defects
Non-rule based identifiable defects
Define defect categories to be identified with machine-learning
Implement rule-based approach for the identification of most relevant defects
Tag data-set according to categories
Select PA procedures from the Web
Select a sub-set of PA procedures as data-set
Implement machine-learning approach
Defects in NL Descriptions: From the interviews
7 people interviewed
1 EU officer, 4 people from administrative staff of CNR (Research Institute), 2 municipality employees from the Marche Region
Which are the defects in the NL documents you deal with?

Defects
Most of the time, procedures are not described anywhere!
Cross-references with too many laws
Ambiguity and Vagueness
Lack of context
Redundancy
Using Collective Intelligence to Detect Pragmatic Ambiguities
Ambiguity in Natural Language Requirements
It would be nice to have formal requirements, but NL is the most widely understood communication code
NL is inherently ambiguous
Ambiguous requirements might cause misinterpretations among stakeholders
The developer/modeller might decide a possible interpretation of the requirement - unconscious disambiguation
Ambiguities are lexical, syntactic, semantic, and...
PRAGMATIC
Pragmatic Ambiguities depend on the CONTEXT
Common Sense Knowledge
Domain Knowledge
Other Requirements
Other Situational Aspects
Domain knowledge acquisition for different readers
DOCUMENT SET 1 DOCUMENT SET 2
Different readers analyse the same requirement
REQUIREMENT
Different readers compare their interpretations
Overview
REQUIREMENT
DOMAIN DOCUMENTS
Domain Knowledge Graph Construction
Requirement Interpretation
Interpretation Comparison
Domain Knowledge Modelling
We model the domain knowledge as a weighted graph
Each node is a concept
Each edge represents a connection among concepts
The weight of the edge represents how close the connection between two concepts is
The lower the weight, the closer the connection
The weight is derived from the number of co-occurrences

We build this weighted graph starting from Web pages concerning the domain of the requirements document
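
The graph construction described above could be sketched as follows. This is only an illustration: the slides say the weight is derived from the number of co-occurrences, but not how, so the inverse of the sentence-level co-occurrence count is an assumption, as are the function name `build_knowledge_graph`, the tiny stop-word list, and the sample texts. Stemming, which the real graphs appear to use, is omitted.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, only for this sketch.
STOP_WORDS = {"the", "a", "an", "are", "is", "of", "and"}

def build_knowledge_graph(documents):
    """Return {(concept_a, concept_b): weight} for an undirected graph.

    Two concepts co-occur when they appear in the same sentence.
    Assumed weighting: weight = 1 / co-occurrence count, so that
    a lower weight means a closer connection.
    """
    counts = Counter()
    for text in documents:
        for sentence in re.split(r"[.!?]+", text.lower()):
            terms = set(re.findall(r"[a-z]+", sentence)) - STOP_WORDS
            # Sort so that each undirected edge has one canonical key.
            for a, b in combinations(sorted(terms), 2):
                counts[(a, b)] += 1
    return {edge: 1.0 / n for edge, n in counts.items()}

docs = [
    "The hospital stores patient data. The patient visits the hospital.",
    "Patient data are stored.",
]
graph = build_knowledge_graph(docs)
print(graph[("hospital", "patient")])  # co-occur twice, weight 0.5
```

Edges are keyed by alphabetically sorted pairs, so each undirected connection is stored once.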
Domain Knowledge Graphs
[Figure: domain knowledge graphs built from different document sets. Node labels (stemmed): patient, observ, death, locat, visit, time, care, inform, risk, sourc, sign, contact, hospit. Edge weights shown range from 0.037 to 0.5.]
Lower weights indicate stronger connections
Requirements interpretation as a least-cost path search
Interpreting a requirement is activating the concepts of the requirement in the knowledge graph
Activating two concepts in a requirement implies the activation of other neighboring concepts
The concepts that are activated are those that are more closely connected with the concepts in the requirement (i.e., their edges have lower weight)
The interpretation of the requirement is a least-cost path search within the domain knowledge graph
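
A minimal sketch of this idea, assuming that an interpretation is the set of concepts lying on the least-cost paths between every pair of concepts in the requirement. Dijkstra's algorithm is used here as one standard least-cost search; the slides do not name the algorithm. The helper names and the toy edge weights are hypothetical.

```python
import heapq
from itertools import combinations

def to_adjacency(edges):
    """Turn {(a, b): weight} into an undirected adjacency map."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def least_cost_path(adj, source, target):
    """Dijkstra search; returns the node list, or [] if unreachable."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in adj.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]

def interpret(adj, requirement_concepts):
    """Concepts activated by a requirement (assumed definition)."""
    activated = set()
    for a, b in combinations(sorted(requirement_concepts), 2):
        activated.update(least_cost_path(adj, a, b))
    return activated

# Toy weights, invented for illustration only.
edges = {
    ("store", "data"): 0.5,
    ("store", "database"): 0.1,
    ("database", "data"): 0.1,
    ("patient", "data"): 0.25,
}
adj = to_adjacency(edges)
print(interpret(adj, {"store", "patient", "data"}))
```

In this toy graph "database" is activated although it does not appear in the requirement, because the path through it is cheaper than the direct edge between "store" and "data".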
Requirements Interpretation
REQ. 1 - The system shall store patient data
[Figure: knowledge graph for REQ. 1. Node labels: system, store, patient, data, button, feedback, screen, database, retrieve, memory, content, location, vaccine, name, sickness, doctor, surname, ram, disk.]
Interpretation Comparison
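[Figure: the compared interpretations overlaid on the knowledge graph. Node labels: system, store, patient, data, button, feedback, screen, database, retrieve, memory, content, location, vaccine, name, sickness, doctor, noise, return, health, duration, care, surname, ram, disk. Counts shown: 9, 10, 5.]
$\sigma = \frac{9}{9 + 10 + 5} \approx 0.38 < \tau = 0.5$
AMBIGUITY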
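
The comparison can be illustrated with a small sketch. It assumes that the figure's 9 counts the concepts shared by the two interpretations and that 10 and 5 count the concepts found in only one of them, which makes the similarity a ratio of shared concepts to all concepts (a Jaccard index). The function names and the placeholder concept names are hypothetical.

```python
def similarity(interpretation_a, interpretation_b):
    """Shared concepts divided by all concepts in either interpretation."""
    union = interpretation_a | interpretation_b
    if not union:
        return 1.0  # two empty interpretations are treated as identical
    return len(interpretation_a & interpretation_b) / len(union)

def is_ambiguous(interpretation_a, interpretation_b, threshold=0.5):
    """Flag the requirement when the similarity falls below the threshold."""
    return similarity(interpretation_a, interpretation_b) < threshold

# Placeholder concepts reproducing the counts in the figure:
# 9 shared, 10 only in the first reading, 5 only in the second.
shared = {f"shared{i}" for i in range(9)}
reader_a = shared | {f"a{i}" for i in range(10)}
reader_b = shared | {f"b{i}" for i in range(5)}

sigma = similarity(reader_a, reader_b)
print(sigma, is_ambiguous(reader_a, reader_b))
```

Here `sigma` is 9/24, the value the slide reports rounded as 0.38.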
Issues on Coverage and Threshold
Coverage
The content of the domain documents shall cover the content of the requirements specification
Minimum coverage: $\rho = \frac{|\text{terms in requirements} \cap \text{terms in documents}|}{|\text{terms in requirements}|}$

Threshold
Multiple analyses with different combinations of documents to compute similarities: $\bar{\sigma}(R_i)$ and $\sigma_{min}(R_i)$
Thresholds computed as average of the similarities for $R_1 \ldots R_n$
$\tau_{\bar{\sigma}}$ and $\tau_{\sigma_{min}}$ are the considered thresholds
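
A sketch of both computations, under one reading of the slide: for each requirement, the similarities obtained with the different document combinations are reduced to their mean and their minimum, and each threshold is the average of those values over all requirements. The function names and the sample numbers are invented for illustration.

```python
from statistics import mean

def coverage(requirement_terms, document_terms):
    """Fraction of requirement terms that also occur in the documents."""
    requirement_terms = set(requirement_terms)
    return len(requirement_terms & set(document_terms)) / len(requirement_terms)

def thresholds(similarities_per_requirement):
    """Return (tau_mean, tau_min).

    similarities_per_requirement holds one list per requirement, with one
    similarity value per combination of documents (assumed layout).
    """
    per_requirement_mean = [mean(s) for s in similarities_per_requirement]
    per_requirement_min = [min(s) for s in similarities_per_requirement]
    return mean(per_requirement_mean), mean(per_requirement_min)

rho = coverage({"patient", "data", "store", "system"},
               {"patient", "data", "store", "hospital"})

# Two requirements, two document combinations each (made-up values).
tau_mean, tau_min = thresholds([[0.25, 0.75], [0.5, 1.0]])
print(rho, tau_mean, tau_min)
```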
Experimental Evaluation
Source
Requirement specification of a system for Outbreak Management (OM) issued by the Public Health Information Network (PHIN)
Data collection (names, vaccines, clinical samples) from people that might be affected by an epidemic health event

Set-up
114 requirements
43 include pragmatic ambiguities (manual)
25 domain documents
5 different combinations of documents
Experimental Evaluation: Domain Documents
ID | Title | Link
d1 | PHEMCE strategy | http://goo.gl/hYaipm
d2 | Application to clinical and Public Health Practice | http://goo.gl/hVVy1Y
d3 | Biodefense countermeasure Department of Defense | http://goo.gl/I6U0Ns
d4 | Wikipedia page for “Case Definition” | http://goo.gl/yPndtx
d5 | Wikipedia page for “Chain of Custody” | http://goo.gl/4uvTuc
d6 | Definition of “Chain of custody” | http://goo.gl/OUgcQd
d7 | Communicable disease outbreak plan | http://goo.gl/rV72wX
d8 | Foodborne outbreak management | http://goo.gl/pTlgp9
d9 | Guidelines for the investigation and control of outbreaks | http://goo.gl/Sv4Ebu
d10 | Practice guidelines of the infectious diseases | http://goo.gl/GjLvg2
d11 | Implementation guide ambulatory healthcare | http://goo.gl/qEiLGR
d12 | Management of scabies outbreaks | http://goo.gl/GUAbKS
d13 | Modeling information systems architectures by P. Grefen | http://goo.gl/j2E4Lx
d14 | Outbreak control | http://goo.gl/f0HC1h
d15 | Outbreak management guidelines for healthcare | http://goo.gl/EcYVEi
d16 | Surveillance and response in humanitarian emergencies | http://goo.gl/ybje6i
d17 | PHIN guide for syndromic surveillance | http://goo.gl/lEz8zw
d18 | PHIN messaging guide for syndromic surveillance | http://goo.gl/3AAXNE
d19 | Developing a management system: an overview | http://goo.gl/0l5sth
d20 | Industrial system 800xA system architecture | http://goo.gl/RSaBnD
d21 | System architecture and complexity | http://goo.gl/v44tC0
d22 | WHO guidelines for epidemic preparedness and response | http://goo.gl/PK9yn7
d23 | Wikipedia page for “Management System” | http://goo.gl/mgWfhh
d24 | Wikipedia page for “Outbreak” | http://goo.gl/LUQEWm
d25 | Wikipedia page for “Scabies” | http://goo.gl/fjYYrQ
Experimental evaluation: Combinations and Results
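Combinations of Documents
For each combination k, the documents are split into two sets that yield graphs G1 and G2; |V| = number of nodes, |E| = number of edges, ρ = coverage.

k = 1
G1: d1 d3 d5 d7 d9 d11 d13 d16 d17 d19 d20 d23 d25; |V| = 7131, |E| = 62265, ρ = 0.99
G2: d2 d4 d6 d8 d10 d12 d14 d15 d18 d21 d22 d24; |V| = 5970, |E| = 33325, ρ = 0.98
k = 2
G1: d2 d3 d6 d7 d10 d11 d15 d16 d17 d19 d22 d23; |V| = 7383, |E| = 49989, ρ = 0.98
G2: d1 d4 d5 d8 d9 d12 d13 d14 d18 d20 d25 d20 d24; |V| = 5826, |E| = 46179, ρ = 0.99
k = 3
G1: d6 d7 d15 d22 d16 d23 d1 d9 d18 d25 d8 d14 d24; |V| = 6375, |E| = 58736, ρ = 1
G2: d2 d10 d17 d3 d11 d19 d5 d13 d20 d4 d12 d20; |V| = 6642, |E| = 34882, ρ = 0.98
k = 4
G1: d6 d22 d16 d1 d18 d8 d24 d10 d3 d19 d13 d4 d20; |V| = 6914, |E| = 46384, ρ = 0.99
G2: d15 d7 d23 d9 d25 d14 d2 d17 d11 d5 d20 d12; |V| = 6400, |E| = 49848, ρ = 0.98
k = 5
G1: d22 d1 d8 d10 d19 d4 d15 d23 d25 d2 d11 d5; |V| = 6693, |E| = 41735, ρ = 0.99
G2: d6 d16 d18 d24 d3 d13 d20 d7 d9 d14 d17 d20 d12; |V| = 6550, |E| = 53973, ρ = 1

Precision and Recall
Threshold | p | r
$\tau_{\bar{\sigma}}$ = 0.3247 | 45% | 58%
$\tau_{\sigma_{min}}$ = 0.2781 | 51% | 63%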
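
For reference, precision and recall can be computed as below from the set of requirements flagged by the method and the set manually marked as ambiguous (43 of the 114 requirements in this experiment). The function name and the sample identifiers are hypothetical; the numbers in the example are not the experimental data.

```python
def precision_recall(flagged, truly_ambiguous):
    """Precision and recall of the flagged set against the manual labels."""
    flagged, truly_ambiguous = set(flagged), set(truly_ambiguous)
    true_positives = len(flagged & truly_ambiguous)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_ambiguous) if truly_ambiguous else 0.0
    return precision, recall

# Made-up requirement identifiers, only to exercise the function.
p, r = precision_recall(flagged={1, 2, 3, 4}, truly_ambiguous={2, 3, 5})
print(f"precision={p:.2f} recall={r:.2f}")
```

A missed ambiguous requirement (a false negative) lowers recall only, while a wrongly flagged requirement lowers precision only.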
Observations
Requirements analysis tools shall be tuned to favour recall over precision (Dan Berry)
False negative cases are the main issue

“Demographic information should be collected about the investigator [...]”
→ influence of the other terms in the computation of the similarity
“Mapping interfaces and data dictionaries must be defined [...]”
→ multi-word terms
Summary and Future Work

Unsupervised and statistical (not rule-based) method
Consider novel similarity metrics to emphasize the role of single ambiguous terms
Consider multi-word terms
Include the common-sense knowledge
  Concepts that are highly connected in the domain knowledge are less connected in the common-sense knowledge
Integrate structural and dynamic beliefs about the world and the domain within the knowledge graphs