Main Activities
Research Areas
o Machine Learning Algorithms
o Probabilistic and Relational Models
o Optimization Under Uncertainty
Applicative Domains
o World Wide Web
o Life Sciences
o Ambient Intelligence
o Finance
Faculty: Francesco Archetti
Enza Messina
Guglielmo Lulli
Post Doc: Elisabetta Fersini
Luca Cattelani
Antonio Candelieri
PhD: Federico Alberto Pozzi
Machine Learning and Relational Data
- Traditional learning methods are consistent with the classical statistical inference problem formulation: instances are assumed to be independent and identically distributed (i.i.d.)
- but this assumption does not reflect the real world!
We need a solution able to deal with relationships and with uncertainty in more general terms
SL: Probabilistic Models + Learning Techniques
SRL: Probabilistic Models + Relational Representation + Learning Techniques
The World is inherently Uncertain
Graphical Models (here e.g. a Bayesian network) - Model uncertainty explicitly by representing the joint distribution
[Figure: Bayesian network over the random variables Fever, Ache and Influenza; arrows denote direct influences]
Propositional Model!
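A minimal sketch of how a Bayesian network represents the joint distribution as a product of local conditional distributions. The arrow directions (Influenza influencing Fever and Ache) and every probability value below are assumptions made for illustration; they are not taken from the slide.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration.
P_INFLUENZA = {True: 0.1, False: 0.9}            # P(Influenza)
P_FEVER_GIVEN_FLU = {True: 0.8, False: 0.05}     # P(Fever=True | Influenza)
P_ACHE_GIVEN_FLU = {True: 0.7, False: 0.1}       # P(Ache=True | Influenza)


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(influenza, fever, ache):
    """P(I, F, A) = P(I) * P(F | I) * P(A | I) under the assumed structure."""
    return (P_INFLUENZA[influenza]
            * bernoulli(P_FEVER_GIVEN_FLU[influenza], fever)
            * bernoulli(P_ACHE_GIVEN_FLU[influenza], ache))


def posterior_influenza(fever, ache):
    """P(Influenza=True | Fever, Ache), computed by enumeration."""
    numerator = joint(True, fever, ache)
    return numerator / (numerator + joint(False, fever, ache))


# The joint distribution sums to one over all eight assignments.
total = sum(joint(i, f, a) for i, f, a in product([True, False], repeat=3))
```

The model is propositional in the sense of the slide: it has one fixed set of variables, so it describes a single patient and cannot express relations among patients, physicians or tests.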
Real-World Data (Dramatically Simplified)
PatientID  Gender  Birthdate
P1         M       3/22/63
PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza
PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45
PatientID  SNP1  SNP2  …  SNP500K
P1         AA    AB    …  BB
P2         AB    BB    …  AA
PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months
Non-i.i.d.
Multi-Relational
Solution: First-Order Logic / Relational Databases
Shared Parameters
Probabilistic Relational Models
Integrate uncertainty with relational model
Convenient language for specifying complex models
“Web of influence”: subtle & intuitive reasoning
Framework for incorporating heterogeneous data by connecting related entities (consider also relation uncertainty)
New problems:
Relational clustering
Collective classification
Open Problems: Inference and Learning
[Figure: example probabilistic relational model with nodes Gene Cluster, Lipid, Endoplasmatic, HSF, GCN4, Exp. cluster, Exp. type and Level; pipeline labels: Heterogeneous Information, LEARNER, Inference]
Some Applications
Document Analysis
Learning Models for Relational Data:
Relational Clustering
1. Constraint Learning
2. Objective Function Adaptation
Relational Classification:
Probabilistic Relational Models with Relational Uncertainty
Conditional Random Fields
[Figure: relational schema with Document (document_id, class) and Link (#origin_ref, #destination_ref)]
Publications
E. Fersini, E. Messina, F. Archetti, “A probabilistic relational approach for web document clustering”, Journal of Information Processing and Management, Vol. 46, no. 2, pp. 117-130, 2010.
E. Fersini, E. Messina, F. Archetti. “Web page classification: A probabilistic model with relational uncertainty”. In Proc. of the 2010 Conference on Information Processing and Management of Uncertainty, 2010.
E. Fersini, E. Messina, F. Archetti, Probabilistic relational models with relational uncertainty: an early study in web page classification, IEEE WI-IAT Workshop, 2009.
Document Analysis E-Forensics
JUdicial MAnagement by Digital Libraries Semantics
Information Extraction
Emotion Recognition
Proceedings n°: ……..
Accused Name: XXXXXX
Witness Name: KKKKKK
Prosecutor Name: -
Lawyer Name: YYYYYY ZZZZZZ
Meeting Date: 1989
Meeting Location: Civitanova Marche
Hearing Summarization
Publications
E. Fersini, E. Messina, F. Archetti. “Multimedia Summarization in Law Courts: A Clustering-based Environment for Browsing and Consulting Judicial Folders”. In Proc. of the 10th Industrial Conference on Data Mining, 2010.
E. Fersini, G. Arosio, E. Messina, F. Archetti, “Emotion recognition in judicial domain: a multilayer SVM approach”, LNAI, in Proc. of the 6th International Conference on Machine Learning and Data Mining, Leipzig, 2009.
E. Fersini, G. Arosio, E. Messina, F. Archetti, D. Toscani. Multimedia Summarization in Law Courts: An Environment for Browsing and Consulting Judicial Folders. In Proc. of the 2nd International Conference on ICT Solutions for Justice, Skopje, 2009.
E. Fersini, F. Callegaro, M. Cislaghi, R. Mazzilli, S. Somaschini, R. Muscillo, D. Pellegrini. Managing Knowledge Extraction and Retrieval from Multimedia Contents: a Case Study in Judicial Domain. In Proc. of the 2nd International Conference on ICT Solutions for Justice, Skopje, 2009.
Submitted Projects
PON
eJRM - electronic Justice Relationship Management
Project Coordinator: BV- TECH Spa
Call FP7 - Coordination and support action (coordinating)
FERIIC - Forensic Evidence Recovery, Interpretation, Integration and Coordination
Project Coordinator: Northumbria University (UK)
Submitted
E. Fersini, E. Messina, F. Archetti. “Emotional States in Judicial Courtrooms: An Experimental Investigation”. Submitted to Journal of Speech Communication.
E. Fersini, E. Messina, D. Toscani, F. Archetti, M. Cislaghi. Semantics and machine learning for building the next generation of judicial case and court management systems. Submitted to the Int. Conference on Knowledge Management and Information Sharing.
Life Sciences
Relational clustering
Find a partition of a given set of instances using additional information coming from the relationships among instances.
SEMI-SUPERVISED LEARNING METHOD
where relations can be represented by pairwise constraints on some of the instances (specifying whether two instances should be in the same cluster or in different clusters)
• Learning of relations
• Modify distance measure in clustering objective function
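One way to adapt the clustering objective to pairwise constraints is sketched below: a k-means variant whose assignment step adds a fixed penalty for every violated must-link or cannot-link constraint. This is an illustrative penalty-based scheme, not necessarily the method used by the group; the function names, the `penalty` value and the toy data are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violations(i, cluster, labels, must_link, cannot_link):
    """Count the pairwise constraints instance i would break in `cluster`."""
    count = 0
    for a, b in must_link:
        if i in (a, b):
            other = b if a == i else a
            count += labels[other] != cluster
    for a, b in cannot_link:
        if i in (a, b):
            other = b if a == i else a
            count += labels[other] == cluster
    return count


def constrained_kmeans(points, init_centroids, must_link=(), cannot_link=(),
                       penalty=10.0, max_iters=20):
    """K-means whose assignment cost is distance plus a constraint penalty."""
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)
    # Start from the plain nearest-centroid assignment.
    labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
              for p in points]
    for _ in range(max_iters):
        changed = False
        for i, p in enumerate(points):
            best = min(range(k), key=lambda c: sq_dist(p, centroids[c])
                       + penalty * violations(i, c, labels,
                                              must_link, cannot_link))
            if best != labels[i]:
                labels[i], changed = best, True
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
        if not changed:
            break
    return labels, centroids


# Toy data: the last point is closer to the right-hand group.
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 0.0), (5.2, 0.0), (2.8, 0.0)]
init = [(0.0, 0.0), (5.0, 0.0)]
plain_labels, _ = constrained_kmeans(points, init)
linked_labels, _ = constrained_kmeans(points, init, must_link=[(0, 4)])
```

Without constraints the last point joins the right-hand cluster; a single must-link constraint with instance 0 moves it to the left-hand one, which shows how relational information can override pure distance.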
Systems Biology Applications
Learning gene regulatory networks
Regulatory modules
[Figure: Gene (coding and control DNA), Transcription, RNA single strand]
Modelling the pharmacology of cancer
Human cancer: gene expression and drug activity
Gene-drug interaction: identification of a drug treatment for a given cell line based both on drug activity pattern and gene expression profile
Collaborations
Pharmacogenomics Application:
Predict drug response to oral anticoagulation therapy (OAT)
Grouping (profiling) patients based on their clinical and genotypic features in order to suggest the correct drug dosage to doctors
Haemorrhagic risk / Thrombotic risk
Data of about 4000 patients:
Clinical and therapeutic data: personal patient data, medical diagnosis, therapy, INR and dosage measurements
Genetic data: polymorphisms of three genes (CYP2C9, VKORC1 and CYP4F2) that contribute to differences in patients’ response.
In collaboration with
Publications
E. Fersini, C. Manfredotti, E. Messina, F. Archetti, Relational K-Means for Gene Expression Profiles and Drug Activity Pattern Analysis, to appear in Int. Journal of Mathematical Modelling and Algorithms.
F. Archetti, I. Giordani, L. Vanneschi, “Genetic Programming for Anticancer Therapeutic Response Prediction using the NCI-60 Dataset”, Computers & Operations Research, Vol. 37, No. 8, pp. 1395-1405, August 2010.
E. Fersini, I. Giordani, E. Messina, F. Archetti, "Relational Clustering and Bayesian Networks for Linking Gene Expression Profiles and Drug Activity Patterns", International Workshop of Applications of Machine Learning in Bioinformatics (satellite workshop of IEEE International Conference on Bioinformatics and Biomedicine - BIBM), November 2009.
L. Vanneschi, F. Archetti, M. Castelli, I. Giordani, "Classification of Oncologic Data with Genetic Programming", Journal of Artificial Evolution and Applications, vol. 2009, Article ID 848532, 13 pages, 2009. doi:10.1155/2009/848532.
F. Archetti, I. Giordani, L. Vanneschi, “Genetic Programming for QSAR Investigation of Docking Energy”, Applied Soft Computing, Vol. 10, No. 1, pp. 170-182, ISSN: 1568-4946, Jan 2010.
G. Ogliari, I. Giordani, A. Mihalich, D. Castaldi, A. Di Blasio, A. Dubini, E. Messina, F. Archetti, D. Mari, Nuova classificazione clinica e farmacogenetica per predire la dinamica dell'INR nell'anziano in TAO. Giornale di Gerontologia, vol. LVII, pp. 495-496, ISSN: 0017-0305, December 2009.
F. Archetti, I. Giordani, E. Messina, G. Ogliari, D. Mari, "A comparison of data mining approaches in the categorization of oral anticoagulant patients", International Workshop of Applications of Machine Learning in Bioinformatics (satellite workshop of IEEE International Conference on Bioinformatics and Biomedicine - BIBM), November 2009.
F. Archetti, I. Giordani, G. Mauri, E. Messina. “A new clustering approach for learning transcriptional regulatory modules”, Proceedings of BITS09, Sixth Annual Meeting Bioinformatic Italian Society, March 18-20 2009, Genova, pp. 76-77.
Submitted
F. Archetti, I. Giordani, G. Mauri, E. Messina. “A new clustering approach for learning transcriptional regulatory modules”, submitted to Int. Journal of Data Mining and Bioinformatics.
Projects
Submitted proposals:
Funding of research projects in the field of Thrombosis - Call for applications 2010
Oral Anticoagulation Therapy in the elderly and women
Partners:
Brunel University, Centre for Intelligent Data Analysis
Harvard Medical School, Biomedical Cybernetics Laboratory
Univ. of Milano, Dept. of Medical Sciences, Geriatrics Unit
Ist. Clinico Humanitas - Thrombosis Unit (Corrado Lodigiani, MD, PhD)
Ist. Auxologico Italiano, IRCCS Centro di Ricerche e Tecnologie Biomediche,
PON
HEARTDRIVE
Project Coordinator: Calpark – Parco Tecnologico e Scientifico della Calabria
PRIN
Revealing common patterns among insulin-resistance, osteoporosis and chronic inflammatory diseases by using Bayesian Networks.
Project Coordinator: Università degli Studi "Magna Graecia" di Catanzaro
Ambient Intelligence
Multi-target tracking
Multi-target tracking: finding the tracks of an unknown number of moving targets from noisy observations.
Exploiting relations can improve the efficiency of the tracker
Monitoring relations can be a goal in itself
We model the transition probability of the system with a Relational Dynamic Bayesian Network (RDBN).
In collaboration with
A new representation modelling not only objects but also their relations
A new computational strategy based on a family of Sequential Monte Carlo methods called Particle Filter
Statistical techniques for the detection of anomalous behaviours
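The Particle Filter strategy mentioned above can be illustrated with a bootstrap filter for a single target moving along a line. This is a deliberately simplified sketch: it tracks one target and ignores relations, whereas the slides describe a relational, multi-target setting. The motion model, the noise levels and all function names are assumptions for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    return (math.exp(-0.5 * ((x - mean) / std) ** 2)
            / (std * math.sqrt(2.0 * math.pi)))


def particle_filter_step(particles, z, rng, velocity=1.0,
                         proc_std=0.5, obs_std=1.0):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: sample each particle from the transition model.
    predicted = [p + velocity + rng.gauss(0.0, proc_std) for p in particles]
    # Weight: likelihood of the observation z under each particle.
    weights = [gaussian_pdf(z, p, obs_std) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Estimate: posterior mean of the target position.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def run_demo(steps=30, n_particles=500, seed=0):
    """Simulate a target, filter its noisy observations, return the errors."""
    rng = random.Random(seed)
    true_x = 0.0
    particles = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    errors = []
    for _ in range(steps):
        true_x += 1.0 + rng.gauss(0.0, 0.5)      # simulated true motion
        z = true_x + rng.gauss(0.0, 1.0)         # noisy observation
        particles, estimate = particle_filter_step(particles, z, rng)
        errors.append(abs(estimate - true_x))
    return errors


errors = run_demo()
mean_error = sum(errors) / len(errors)
```

In a relational tracker the predict step would sample from a transition model that also depends on the state of related targets, which is where an RDBN would replace the independent motion model used here.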
Publications
Cristina E. Manfredotti, Enza Messina: Relational Dynamic Bayesian Networks to Improve Multi-target Tracking. ACIVS 2009: 528-539.
C. Manfredotti, E. Messina, D.J. Fleet, Relations to improve multi-target tracking in an activity recognition system. Proceedings of the International Conference on Imaging for Crime Detection and Prevention, London, 2009.
Wireless Sensor Networks
Bayesian abstractions for virtual sensing through low cost data aggregation and net-wide anomaly detection
Modelling Cluster Heads as nodes of a BN
Inference to know sensor values also in presence of temporary faults:
Lack of communication (sensor failure or sleep)
Outlier due to sensor malfunctioning
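The two uses of inference listed above can be sketched with the simplest possible network: one edge between two cluster heads, modelled as a linear-Gaussian dependency fitted on past readings. The choice of a linear-Gaussian model, the three-sigma outlier rule, the function names and the readings are assumptions for illustration, not details given in the slides.

```python
import math


def fit_linear_gaussian(parent, child):
    """Least-squares fit of child = a * parent + b + noise."""
    n = len(parent)
    mean_p = sum(parent) / n
    mean_c = sum(child) / n
    var_p = sum((x - mean_p) ** 2 for x in parent)
    a = sum((x - mean_p) * (y - mean_c)
            for x, y in zip(parent, child)) / var_p
    b = mean_c - a * mean_p
    residuals = [y - (a * x + b) for x, y in zip(parent, child)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return a, b, sigma


def virtual_reading(model, parent_value):
    """Expected child reading when the child sensor is silent."""
    a, b, _ = model
    return a * parent_value + b


def is_outlier(model, parent_value, observed, n_sigmas=3.0):
    """Flag a reading that lies too far from what the parent predicts."""
    a, b, sigma = model
    return abs(observed - (a * parent_value + b)) > n_sigmas * sigma


# Hypothetical temperature history of two neighbouring cluster heads.
ch1_history = [20.0, 21.0, 22.0, 23.0, 24.0]
ch2_history = [20.6, 21.4, 22.6, 23.4, 24.6]
model = fit_linear_gaussian(ch1_history, ch2_history)

estimate = virtual_reading(model, 25.0)       # CH2 asleep, CH1 reads 25.0
faulty = is_outlier(model, 25.0, 40.0)        # implausible CH2 reading
normal = is_outlier(model, 25.0, 25.6)        # plausible CH2 reading
```

A full network over all cluster heads would condition each missing value on every available neighbour rather than on a single parent.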
[Figure: WSN with cluster heads CH1 to CH5 and a sink, mapped to a BN]
Publications
F. Archetti, E. Messina, D. Toscani and M. Frigerio - IKNOS – Inference and Knowledge in Networks Of Sensors. International Journal of Sensor Networks (IJSNet), Vol. 8, No. 3, 2010
F. Chiti, R. Fantacci, F. Archetti, E. Messina, D. Toscani, Integrated Communications Framework for Context aware Continuous Monitoring with Body Sensor Networks, IEEE Journal on Selected Areas in Communications - Wireless and Pervasive Communications for Healthcare. Volume 27, Issue 4, 2009.
Submitted
D. Toscani, I. Giordani, M. Cislaghi, L. Quarenghi. Querying Sensor Data for Environmental Monitoring. Submitted to International Journal of Sensor Networks (IJSNet), 2010
D. Toscani, I. Giordani, L. Quarenghi, F. Archetti. A Software Environment for Supporting Sensor Querying. Submitted to IEEE Sensors 2010 Conference, Hawaii, 2010
Transportation & Logistics
In collaboration with:
Data Models Decisions
[Figure: mathematical optimization model; the formulas are not recoverable from the extracted text]
Projects
PRIN MIUR
Enhancing the European Air Transportation System
Partners: Università di Padova, Università di Trieste.
Publications
To be completed
Ambient Intelligence: Currently active Projects
LENVIS - Localised environmental and health information services for all (EU-FP7)
Development of a collaborative decision-support network for exchanging information and services concerning the environment and health
Publications
D. Toscani, L. Quarenghi, F. Bargna, F. Archetti, E. Messina, "A DSS for Assessing the Impact of Environmental Quality on Emergency Hospital Admissions", in Proceedings of WHCM 2010 - IEEE Workshop on Health Care Management, February 18-20, 2010, Venice, Italy.
Submitted
D. Toscani, I. Giordani, F. Bargna, L. Quarenghi, F. Archetti. A Software System for Data Integration and Decision Support for Evaluation of Air Pollution Health Impact. Submitted to ICEIS 2010 - 12th International Conference on Enterprise Information Systems, Funchal, Madeira, Portugal, 2010
INSYEME – Integrated Systems for Emergencies (MIUR - FIRB)
GREIS - Gestione del Risparmio Energetico attraverso Informazioni di Sicurezza (MIUR)
In collaboration with SAL Lab.
H-CIM - Health Care through Intelligent Monitoring (MIUR)
In collaboration with NOMADIS Lab.
Projects
Submitted
FP7 ICT call 6
OPENCITY - Open framework for Transport Demand Management for smart and sustainable urban mobility in an open and accessible city
Project Coordinator: Consorzio Milano Ricerche
In collaboration with SAL Lab. and Imaging & Vision Lab.
FLECS – FLy’s eyes for Collaborative Surveillance
Financial Time Series & Scenario Generation
Regime Switching Models
(Autoregressive) Hidden Markov Model
Hidden var.: Regime
Observations: prices
Transition Model: $p(x_t \mid x_{t-1})$ (Markov Chain)
Observation Model: $p(z_t \mid x_t)$ (Mixture of Gaussians, Autoregressive Process)
[Figure: model unrolled over time steps t=1, t=2, t=3, t=4; node labels $S_t$ and $x_t$]
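A minimal sketch of filtering in a regime-switching Hidden Markov Model: a Markov chain over hidden regimes plays the role of the transition model, and one Gaussian per regime plays the role of the observation model. The two regimes, every parameter value and the return series are hypothetical, and the autoregressive term is omitted to keep the example short.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution evaluated at x."""
    return (math.exp(-0.5 * ((x - mean) / std) ** 2)
            / (std * math.sqrt(2.0 * math.pi)))


def forward_filter(observations, prior, transition, means, stds):
    """Return P(regime_t | observations up to t) for every time step t."""
    n = len(prior)
    belief = list(prior)
    filtered = []
    for obs in observations:
        # Predict: propagate the belief through the Markov chain.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update: reweight by the Gaussian observation model and normalise.
        unnorm = [predicted[j] * gaussian_pdf(obs, means[j], stds[j])
                  for j in range(n)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        filtered.append(belief)
    return filtered


# Hypothetical two-regime model: index 0 = calm, index 1 = volatile.
prior = [0.5, 0.5]
transition = [[0.95, 0.05],
              [0.10, 0.90]]
means = [0.001, -0.002]
stds = [0.01, 0.04]

# Hypothetical daily returns: three quiet days, then three turbulent ones.
returns = [0.002, -0.001, 0.003, -0.09, 0.08, -0.07]
beliefs = forward_filter(returns, prior, transition, means, stds)
```

Here `beliefs[t][1]` is the filtered probability of the volatile regime at step t; it stays low during the quiet days and rises sharply once the large returns arrive.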
Financial Time Series
Extend state space models to more general Relational Dynamic Bayesian Networks to account not only for prices but also, through conditional probability tables (CPTs), for “exogenous” economic factors and unstructured information
Algorithms for managing risk and tracking portfolios using all available evidence and taking into account all uncertainties
“Markets are good at gathering information from many heterogeneous sources and combining it appropriately, the same we would expect from models”
Projects & Collaborations
PRIN 2007 “Modelli probabilistici per la rappresentazione dell’incertezza per la definizione di metodologie di selezione del portafoglio” (Università di Bergamo, Università della Calabria)
Collaboration with Brunel University and CARISMA Research Centre: Workshop “Application of Hidden Markov Models and Filters to Time Series Methods in Finance”, London, September 2010
Publications
G. Consigli, C. Manfredotti, E. Messina, A sequential learning method for tracking stochastic volatility, EURO XXIV, July 2010, Lisbon