Theses

Turin, January 2015

Elena Baralis, Tania Cerquitelli, Silvia Chiusano, Paolo Garza

Luca Cagliero, Luigi Grimaudo

Daniele Apiletti, Giulia Bruno, Alessandro Fiori

General information

Duration: 6 months full time

equivalent overall duration if part time

Internal thesis

cooperation on active research topic or research project

good programming and analytical skills required

supervised by a group member

can work at home or in our lab (LAB5)

External thesis (stage)

supervised by external tutor

More info on topics: http://dbdmg.polito.it/wordpress/theses

Main Topics

Big data and cloud-based data mining services and algorithms

Database and data mining applications

Text and social network mining

Network traffic data analysis

Clinical and biological data management

Green/urban data mining

Big data and cloud-based data mining

Study of innovative, parallel, and distributed data mining approaches for

Pattern mining algorithms

Clustering techniques

Classification algorithms

Summarization algorithms

to efficiently gain interesting insights from huge data volumes

Design and development of novel cloud-based data mining services based on

HADOOP and Spark frameworks

MapReduce paradigm

Exploitation of the cloud-based services for novel big data analytics applications (e.g., network traffic data, fraud detection, social networks)

Analysis modules based on HADOOP and Spark Ecosystems

European research project ONTIC
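As a rough illustration of how the MapReduce paradigm mentioned above can drive a pattern mining step, the following single-machine Python sketch counts the support of item pairs with explicit map, shuffle and reduce phases. It is only a sketch under stated assumptions: the function names, the sample transactions and the `min_support` threshold are hypothetical, and a real service would delegate the shuffle to Hadoop or Spark rather than to an in-memory dictionary.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Emit ((item_a, item_b), 1) for every distinct item pair in one transaction."""
    items = sorted(set(transaction))
    for pair in combinations(items, 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts collected for one key."""
    return key, sum(values)


def frequent_pairs(transactions, min_support):
    """Return the item pairs that occur in at least min_support transactions."""
    mapped = (kv for t in transactions for kv in map_phase(t))
    reduced = (reduce_phase(k, v) for k, v in shuffle(mapped).items())
    return {pair: count for pair, count in reduced if count >= min_support}


# Hypothetical toy transactions (e.g., features observed in network flows).
transactions = [
    ["http", "tcp", "port80"],
    ["http", "tcp"],
    ["dns", "udp"],
    ["http", "tcp", "port443"],
]
result = frequent_pairs(transactions, min_support=2)
print(result)
```

Because the map and reduce functions only see one transaction or one key at a time, the same logic can in principle be partitioned across many workers.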

Data Mining Algorithms

Itemset mining algorithms for time series data analysis

design and development of novel itemset mining algorithms targeted to time series data analysis

analysis of historical financial data (e.g., stock prices, stock indexes, credit card transactions)

to plan trading and investment strategies

to support fraud detection

Integration of data mining algorithms into Rapid Miner

Rapid Miner is an established Java-based machine learning tool

integration in Rapid Miner of state-of-the-art algorithms for

weighted itemset mining

document analysis and summarization

Mining Advisor

Design and implementation of an automatic system (Mining Advisor) to select, for a given dataset, an optimal mining algorithm for a given analysis task, based on

innovative data characterization statistics

definition/design of mining algorithms (i.e., access methods and mining primitives), possibly disk-based

algorithm selection strategies exploiting a trade-off between accuracy and exploration time

Different instances of a Mining Advisor can be tailored to different data mining techniques (e.g., clustering algorithms, pattern discovery)

[Figure: Mining Advisor architecture, with components: dataset under analysis, data characterization through metrics computation, access methods, mining primitives, algorithm selection]
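The selection idea described for the Mining Advisor (characterize the dataset, then trade accuracy against exploration time) could be sketched along the following lines. Everything specific here is an assumption made for illustration: the `Candidate` profiles, the density metric, the linear time estimate and the `time_weight` parameter are hypothetical and are not taken from the actual project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                    # hypothetical algorithm label
    accuracy: float              # assumed quality estimate in [0, 1]
    time_per_transaction: float  # assumed seconds per transaction at density 1.0


def characterize(transactions):
    """Compute simple characterization metrics for a transactional dataset."""
    items = {item for t in transactions for item in t}
    n = len(transactions)
    avg_len = sum(len(set(t)) for t in transactions) / n
    return {"transactions": n, "items": len(items), "density": avg_len / len(items)}


def estimated_time(candidate, metrics):
    """Toy cost model: time grows with dataset size and density."""
    return candidate.time_per_transaction * metrics["transactions"] * metrics["density"]


def select(candidates, metrics, time_weight):
    """Pick the candidate with the best accuracy/time trade-off."""
    return max(
        candidates,
        key=lambda c: c.accuracy - time_weight * estimated_time(c, metrics),
    )


dataset = [["a", "b"], ["a", "c"], ["b", "c", "d"], ["a"]]
metrics = characterize(dataset)
candidates = [
    Candidate("exact", accuracy=1.0, time_per_transaction=0.1),
    Candidate("approximate", accuracy=0.9, time_per_transaction=0.01),
]
print(select(candidates, metrics, time_weight=0.0).name)
print(select(candidates, metrics, time_weight=1.0).name)
```

Raising `time_weight` makes exploration time matter more, so the advisor shifts from the slower, more accurate candidate to the faster one.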

Network data analysis

Analysis of huge wireless network traffic captures

wireless traffic monitoring by exploiting itemset mining algorithms

wireless traffic classification by exploiting association rules and probabilistic models

Analysis of very large wired network traffic captures

wired traffic monitoring and characterization by exploiting data mining techniques (e.g., association rules, clustering, classification)

distributed VLDB/cloud technologies to support efficient storage, retrieval and indexing of huge amounts of network data

European research projects: mPlane and ONTIC

Text mining

Text summarization

identification of salient knowledge from news, research articles, blogs

generation of sound and easy-to-read summaries of large document collections

development of multi-lingual and automatic summarization systems

development of cloud-based summarization systems targeted to the extraction of succinct summaries from big data collections

Social and educational text mining

Content curation systems allow users to build personalized and dynamically updated news reports

Integration of a summarization system into a content curation platform

Evaluation of system appreciation and feedback

E-learning refers to the use of ICTs in education

Development and integration of a summarization system in an e-learning context

Social network mining

Social network analysis

user behavior analysis by means of data mining techniques

topic extraction and correlation analysis

discovery of user communities, trends and deviations

classification of web objects using user-generated content

Social watching

analysis of social messages during specific TV programs

analysis of evolution of hashtags during program broadcast

characterization of user groups and social interactions

Clinical and biological data management

Physiological data analysis

analyze physiological data collected during incremental tests (e.g., cardiopulmonary exercise testing) commonly used in the clinical domain and in sport science

improve the effectiveness of the reliability/training sessions

predict the final values of crucial parameters

reduce test duration and the physical effort for patients/athletes

Clinical data analysis

analyze data collected by the healthcare network of an Italian Health Care Center

extract medical treatments (in terms of performed examinations, prescribed drugs) frequently done by patients

identify deviation from expected medical treatments according to medical guidelines

Genomic Computing

Next Generation Sequencing (NGS) is a new and high-throughput technology for DNA sequencing

There is a need for new and effective data mining approaches to discover knowledge from NGS data

Goals

analysis of complex NGS data and development of innovative semantics-aware algorithms

exploitation of bio-ontologies, gene/protein and genetic disorder libraries

smart indexing and mining of large-scale NGS datasets

compact data representation and efficient data access

mining algorithms based on disk-based structures

National research project [PRIN 2011]: GenData 2020

Green data mining

Joint analysis of

Energy consumption logs of residential and public building heating systems and indoor climate conditions

Data on users' thermal comfort perception of indoor climate conditions and user feedback

Goals

Suggest ready-to-implement, energy-efficient actions based on innovative and user-friendly indicators

Discovery of interesting correlations among the large and heterogeneous amount of available data

Regional research project: EDEN project

Green data mining

Joint analysis of energy and water consumption data to efficiently support an intelligent building management system

localization of network losses and leaks

detection of abnormal consumption

characterization of user consumption

forecast of energy and water consumption

Analysis of available bikes in the stations of a public bike-sharing system to

forecast critical situations (e.g., empty or full stations) to reschedule the bike redistribution process on the fly

characterize the cyclic mobility patterns to support human mobility in urban areas
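To make the bike-sharing goal above more concrete, the sketch below flags stations that are expected to become critical (nearly empty or nearly full) using a plain moving-average forecast of available bikes. This is only an assumed baseline, not the method used in the project: the station names, capacities, readings, window size and the 10%/90% thresholds are all hypothetical.

```python
def forecast_next(history, window=3):
    """Forecast the next reading as the mean of the last `window` readings."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def critical_stations(stations, low=0.1, high=0.9, window=3):
    """Return {station: "empty" | "full"} for stations forecast to be critical.

    `stations` maps a station name to (capacity, list of available-bike readings).
    """
    alerts = {}
    for name, (capacity, history) in stations.items():
        ratio = forecast_next(history, window) / capacity
        if ratio <= low:
            alerts[name] = "empty"
        elif ratio >= high:
            alerts[name] = "full"
    return alerts


# Hypothetical stations: (capacity, recent counts of available bikes).
stations = {
    "Station A": (20, [6, 3, 1, 1, 0]),
    "Station B": (10, [5, 6, 5, 4, 5]),
    "Station C": (15, [12, 14, 15, 15, 14]),
}
alerts = critical_stations(stations)
print(alerts)
```

The resulting alerts could then feed a redistribution schedule; a thesis would likely replace the moving average with a model that accounts for the cyclic mobility patterns mentioned above.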

Data analysis for Smart Cities

Mining urban data to increase the well-being of citizens by improving the efficiency, accessibility and functionality of provided services

Analysis of data collected through sensor networks embedded in smart street furniture

National research project

Analysis of air pollution data in urban areas to detect possible critical conditions

National research project: MIE

Analysis of data on citizens' urban safety and security

S[M2]ART

Data analysis for Smart Cities

IOC (Intelligent Operations Center) - IBM Platform for data analytics

Study IOC architecture, data flow and programming model

Deploy IOC in a real application to efficiently support

an intelligent transportation system

an intelligent water management system

Biomedical Informatics @ IRCCS

Laboratory Assistant Suite

modular architecture to manage different kinds of raw experimental data, track several laboratory activities, integrate different resources and aid in performing a variety of analyses to extract knowledge related to tumors

design of model-driven automated GUI generation

development of infrastructural components (e.g., task scheduler, email notification system, dashboard)

Genome analysis

innovative approaches to analyzing NGS data from human genomes

analytical algorithms to identify genetic variants in tumors and in the blood of patients

implementation and optimization of analytical algorithms for identification and classification of genetic variants in paired comparative tests

development of data analysis pipelines in parallel and distributed environments

Microarray data analysis

study of class discovery algorithms (e.g., clustering, bi-clustering)

identification of robust gene markers by means of the integration of several classification methods

analysis of gene expression values over time on data derived from xenopatients

Virtual Laboratory @ IRCCS

Realization of virtual representation of life-science working environments based on 3D interactive models

development of a prototypic application for user-friendly management of complex and hierarchical storage systems, by means of 3D realistic representation of the physical containers and their interactions

Computer vision for sensor-based real-time tracking of laboratory activities

development of a prototypic platform for automated monitoring of interactions between users, instruments and experimental materials

virtual representation of objects and activities is also foreseen to provide intuitive and user-friendly GUIs

Biological laboratories need Next Generation LIMS

Virtual reality to improve the usability of graphical user interfaces in laboratory information management systems (LIMS)

Computer vision to improve the efficiency of laboratory data-tracking procedures

External stages

Ooros

exploiting geo-located social interactions among people, places and businesses (e.g., checkins)

business intelligence for social recommendations

Analysis of a real-world dataset from hundreds of businesses in Turin, see www.desidoo.com

Narus Lab

Developing novel solutions to analyze network traffic data for security purposes (i.e., malware traffic detection, signature generation, anomaly detection), using machine learning and data mining techniques to find relevant patterns

The thesis will be conducted in collaboration with NARUS, Inc. – Sunnyvale, California in the context of the new laboratories that Narus is opening within the Politecnico di Torino Campus.

