Date posted: 22-Nov-2014 | Category: Engineering | Uploaded by: aboul-ella-hassanien
Chair of the Scientific Research Group in Egypt (SRGE): http://www.egyptscience.net
Dean of the Faculty of Computers and Information, Beni-Suef University
Website: http://www.fci.cu.edu.eg/~abo/
Facebook: http://www.facebook.com/profile.php?id=100000780092307
ResearchGate: http://www.researchgate.net/home.Home.html
Data is the new oil: Big data, data mining and bio-inspired techniques
Professor Aboul Ella Hassanien
CU scholar: http://scholar.cu.edu.eg/abo
Agenda
• Scientific Research Group in Egypt (SRGE)
• SRGE Trends and Directions
▫ Information and Network Security (geospatial data 2D/3D)
▫ Biomedical Informatics (biomedical/bioinformatics)
▫ Intelligent Technology for blind and deaf people
▫ Intelligent Environment and Control Systems
▫ Data mining, graph mining and Social Networks (image and data registration/fusion in remote sensing)
• Big Data Sets and Complex Systems
• Data Mining and Intelligent Systems
• Open Discussion
Scientific Research Group in Egypt (SRGE)
Scientific Research Group in Egypt (SRGE): Members
• 1 Professor
• 15 Assistant Professors
• 20 Ph.D. students
• 25 M.Sc. students
• 50 international collaborative researchers from 15 countries
• 10 undergraduate students
[Chart: SRGE member numbers (no.) by category: Professor, Assistant Professors, Ph.D. students, M.Sc. students, undergraduate students, collaborative researchers]
20 Faculties and institutes
Scientific Research Group in Egypt (SRGE)
Objective
• To encourage young Egyptian researchers to cooperate, to make it easy for them to do so, and to increase their contribution to academic research.
• To integrate the various research efforts of the scientific team so that it becomes a source of innovation along possible scientific, technological and socio-economic trajectories, shaping the future of machine intelligence technologies and applications.
• To produce Master's/PhD graduates who:
▫ can conduct high-quality academic research,
▫ can publish their research in high-quality academic journals,
▫ can obtain tenure-track faculty positions at high-ranking research universities,
▫ are good teachers and, more generally, good academics.
Scientific Research Group in Egypt (SRGE): Research map
Scientific Research Group in Egypt (SRGE)
Publications (2013-2014)
• 2013: more than 100 publications
▫ 32 ISI journal papers (Elsevier, Springer and other prestigious journals)
▫ 60 international conference papers (IEEE/Springer)
▫ 10 book chapters (Springer)
▫ Five edited books (Springer)
▫ One edited proceedings volume
▫ Three special issues
Scientific Research Group in Egypt (SRGE)
SRGE research tracks (2013-2014)
• Track-(1) Network and information security
• Track-(2) Biomedical engineering & bioinformatics
• Track-(3) Intelligent environment and applications
• Track-(4) Intelligent technology for disabled people
• Track-(5) Chem(o)informatics
• Track-(6) Social networks/big data and graph mining
Scientific Research Group in Egypt (SRGE): Research tracks
• Track-I Network and Information Security
▫ Intrusion detection systems (machine intelligence, danger theory, AIS)
▫ Cryptanalysis (evolutionary optimization)
▫ Image authentication and applications
▫ Watermarking (vector and raster data)
▫ Digital signatures
▫ Biometrics: heart sound recognition, face and fingerprint, gait processing
Problems:
- Heart sound as a biometric
- Watermarking (vector data)
- Image authentication
- Asymmetric hash functions
- Multi-biometric systems
Network and Information Security: Heart Sound Recognition (Biometric)
Network and Information Security: Blind Source Separation (ICA)
[Figure: Blind separation of information from galaxy spectra]
[Figure: Early diagnosis of pathology in the fetus]
Blind Source Separation (BSS) deals with the problem of separating independent sources using only their observed mixtures, when both the mixing process and the original sources are unknown.
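A minimal FastICA sketch can make the BSS idea concrete. It assumes NumPy, a tanh non-linearity and a toy sine/square-wave mixture chosen for illustration; the slides do not say which ICA variant the group uses, and the function name `fast_ica` is hypothetical.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so the rows of W stay orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """Estimate independent sources from mixtures X (n_mixtures x n_samples)."""
    rng = np.random.default_rng(seed)
    # 1. Centre every observed mixture.
    X = X - X.mean(axis=1, keepdims=True)
    # 2. Whiten: rotate and scale so the mixtures are uncorrelated with unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(X))
    order = np.argsort(eigvals)[::-1][:n_components]
    K = np.diag(1.0 / np.sqrt(eigvals[order])) @ eigvecs[:, order].T
    Z = K @ X
    # 3. Fixed-point iteration with g(u) = tanh(u) and symmetric decorrelation.
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        G_prime = 1.0 - G ** 2
        W_new = _sym_decorrelate(G @ Z.T / Z.shape[1] - np.diag(G_prime.mean(axis=1)) @ W)
        # Converged once each new row is (up to sign) parallel to the old one.
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W @ Z  # estimated sources, up to order, sign and scale

# Toy demo: mix a sine wave and a square wave with an "unknown" matrix A.
t = np.linspace(0, 8, 2000)
S = np.vstack([np.sin(2 * t), np.sign(np.sin(3 * t))])
A = np.array([[1.0, 0.5], [0.7, 1.0]])
X = A @ S
S_est = fast_ica(X, n_components=2)
```

ICA recovers the sources only up to permutation, sign and scale, so each estimated row is best matched to a true source by absolute correlation.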
Network and Information Security: Watermarking of Vector Geospatial Data and 3D Animated Objects
[Figures: geospatial data; 3D animated object]
• Geospatial data or geographic information is the data or information that identifies the geographic location of features and boundaries on Earth, such as natural or constructed features, oceans, and more.
Scientific Research Group in Egypt (SRGE): Research tracks
• Track-II Biomedical Informatics
▫ Medical image processing
- Breast cancer analysis (sonar, MRI, fMRI, CT)
- Liver fibrosis and tumour analysis (biopsy, MRI, CT)
- Medical image annotation
• Bioinformatics
Problems:
- Breast cancer case
- Liver fibrosis (HCV)
- Content-based image retrieval
- Formal concept analysis (visualization; rule-based)
Track-II Biomedical Informatics: Hepatitis C Virus (HCV) in Egypt
The World Health Organization has declared hepatitis C a global health problem, with approximately 3% of the world's population (roughly 170-200 million people) infected with HCV.
Egypt has one of the highest prevalence rates of the hepatitis C virus in the world; the situation there is considerably worse than the global average.
Egypt: 14.7% infected with hepatitis C
Track-II Biomedical Informatics: Liver Fibrosis
Stage 0 No fibrosis (fatty liver)
Stage 1 Portal expansion with fibrosis (<1/3 area)
Stage 2 Bridging fibrosis (>1/3)
Stage 3 Marked bridging fibrosis or early cirrhosis (no reason for tissue conversion)
Stage 4 Definite cirrhosis (<50% of biopsy fibrosis)
Stage 5 Definite cirrhosis (>50% of biopsy fibrosis)
Challenges: distinguishing between late-stage fibrosis and tumour; finding good segmentation techniques, features and classifiers.
Scientific Research Group in Egypt (SRGE): Research tracks
• Track-III: Intelligent Environment
▫ Intelligent water/air quality monitoring
▫ Smart reading environments
▫ Intelligent lighting systems
▫ Video processing (video annotation/summarization)
Problems:
- Monitoring water/air pollution
- Climate change
Intelligent Environment Track: Water Quality Monitoring
Intelligent environment and applications track: Cattle identification
• Identify the origin of each animal;
• Trace the path of each animal from location to location;
• Trace each animal exposed to disease;
• Eradicate or control an animal health threat;
• Retrieve information within hours of an outbreak and implement intervention strategies;
• Enhance the safety and security of the food chain;
• Improve consumer confidence; and
• Facilitate efficient market transactions, as it provides assurance to buyers regarding the animal's life history.
Track-III Intelligent environment and applications: Arabian horse identification using iris patterns
The Arabian horse is a breed of horse that originated on the Arabian Peninsula. It is one of the oldest breeds, dating back 4,500 years.
Recent developments in iris scanning have led to a new form of equine identification, and research has indicated that the horse's eye could be the most telling identifier.
Arabian horse
Intelligent environment and applications: Cattle Identification Using Muzzle Print Images
Scientific Research Group in Egypt (SRGE): Research tracks
• Track-IV: Intelligent technology for blind and visually impaired people
▫ Text-to-speech processing
▫ Document management for blind and visually impaired people
▫ Developing games for blind and visually impaired people
▫ Mobile applications for blind and visually impaired people
▫ Automatic Sign Language (ASL) recognition for deaf-blind people
Track-IV: Intelligent technology for blind and visually impaired people
HOW ASSISTIVE TECHNOLOGY COULD HELP DISABLED PEOPLE
• Tongue Drive System
• For disabled people, technology may do more than just improve their lives: high-tech tools may give them their lives back.
• Researchers from the Georgia Institute of Technology created the latest device, a mouth retainer that allows people with spinal cord injuries to operate a computer and move an electric wheelchair with only their tongues.
Thought-Controlled Wheelchair
• Users wear a cap that reads brain signals. The signals are relayed to an electroencephalograph (EEG) on the wheelchair, analyzed by a computer program, and sent to the wheelchair as commands. Toyota said its next goal is to allow users to think about letters in order to spell words.
Analyzing brain signals?
Big Data in Complex Systems
Data is the new oil
Simple to start
• What is the maximum file size you have dealt with so far?
▫ Movies/files/streaming video that you have used?
▫ What have you observed?
• What is the maximum download speed you get?
• Simple computation
▫ How much time does it take just to transfer the data?
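The "simple computation" can be written out. This sketch assumes an ideal link with no protocol overhead and decimal units (1 TB = 10^12 bytes); the function names are hypothetical.

```python
def transfer_time_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealised time to move size_bytes over a link of the given bandwidth."""
    # File sizes are in bytes, link speeds in bits per second: multiply by 8.
    return size_bytes * 8 / bandwidth_bits_per_s

def as_hms(seconds: float) -> str:
    """Format a duration as hours, minutes and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)} h {int(minutes)} min {secs:.0f} s"

TB = 10**12    # decimal terabyte, in bytes
MBIT = 10**6   # one megabit per second, in bits per second

t = transfer_time_seconds(1 * TB, 100 * MBIT)
print(as_hms(t))  # 22 h 13 min 20 s
```

At the same idealised rate, a petabyte over a 1 Gbit/s link takes 8,000,000 s, roughly 93 days, before any processing starts.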
What is big data?
• 90% of the data in the world today has been created in the last two years alone.
• This data comes from everywhere:
▫ sensors used to gather climate information,
▫ posts to social media sites,
▫ digital pictures and videos,
▫ cell phone GPS signals, to name a few.
This data is “big data.”
Born with Big Data
• Google, eBay, LinkedIn, and Facebook were built around Big Data from the beginning.
• They had no need to integrate Big Data with more traditional sources of data and the analytics performed upon them.
• They did not have to merge Big Data technologies with traditional IT infrastructures.
• Big Data could stand alone; Big Data analytics could be the only focus of analytics.
What is Big Data?
• Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.
• Big Data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set. – Wikipedia, October 2014 (http://en.wikipedia.org/wiki/Big_data)
Huge amount of data
• There are huge volumes of data in the world:
▫ From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
▫ In 2011, the same amount was created every two days.
▫ In 2013, the same amount of data is created every 10 minutes.
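The figures above imply average data-creation rates. A back-of-the-envelope sketch, taking the slide's 5 billion gigabytes at face value and using decimal units:

```python
TOTAL_GB = 5e9           # 5 billion gigabytes = 5 exabytes (figure from the slide)
SECONDS_PER_DAY = 86_400

rate_2011 = TOTAL_GB / (2 * SECONDS_PER_DAY)  # GB per second, "every two days"
rate_2013 = TOTAL_GB / (10 * 60)              # GB per second, "every 10 minutes"

print(f"2011: about {rate_2011 / 1e3:.1f} TB/s")  # 2011: about 28.9 TB/s
print(f"2013: about {rate_2013 / 1e6:.2f} PB/s")  # 2013: about 8.33 PB/s
print(f"speed-up: {rate_2013 / rate_2011:.0f}x")  # speed-up: 288x
```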
How much data?
• Google processes 20 PB a day (2008)
• Wayback Machine has 3 PB + 100 TB/month (3/2009)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
Big data spans three dimensions: Volume, Velocity and Variety
• Volume:
▫ Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information.
▫ Turn 12 terabytes of Tweets created each day into improved product sentiment analysis.
• Velocity:
▫ Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
▫ Analyze 500 million daily call detail records in real time to predict customer churn faster.
▫ The latest I have heard is that even a 10-nanosecond delay is too much.
• Variety:
▫ Big data is any type of data: structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
▫ Monitor hundreds of live video feeds from surveillance cameras to target points of interest.
▫ Exploit the 80% data growth in images, video and documents to improve customer satisfaction.
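The velocity point, using data as it streams in rather than after it is stored, can be illustrated with a one-pass detector. This is a hypothetical sketch, not the fraud systems the slides allude to: it keeps Welford's running mean and variance in constant memory and flags values more than `z_threshold` standard deviations from what has been seen so far. The transaction amounts are made up.

```python
import math
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Welford's online mean/variance: O(1) memory per stream."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def flag_anomalies(stream, z_threshold=3.0, warmup=10):
    """Yield (index, value) for values far from the statistics seen so far."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        # Only judge once enough history exists to estimate the spread.
        if stats.n >= warmup and stats.std > 0:
            if abs(x - stats.mean) / stats.std > z_threshold:
                yield i, x
        stats.update(x)  # each value is seen once and then discarded

amounts = [20, 22, 19, 21, 20, 23, 18, 22, 21, 20, 19, 500, 21, 20]
print(list(flag_anomalies(amounts)))  # [(11, 500)]
```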
Time for thinking
• What do you do with the data?
▫ Let's take an example:
“From application developers to video streamers, organizations of all sizes face the challenge of capturing, searching, analyzing, and leveraging as much as terabytes of data per second—too much for the constraints of traditional system capabilities and database management tools.”
Finally... ‘Big Data’ is similar to ‘small data’, but bigger.
... But having bigger data requires different approaches: techniques, tools, architecture,
... with the aim of solving new problems, or old problems in a better way.
What to do with these data?
• Aggregation and Statistics
▫ Data warehouse and OLAP
• Indexing, Searching, and Querying
▫ Keyword-based search
▫ Pattern matching (XML/RDF)
• Knowledge discovery
▫ Data Mining
▫ Statistical Modeling
Data Mining
What is Data mining?
• Data mining (knowledge discovery from data)
▫ Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
• Alternative names
▫ Knowledge discovery (mining) in databases (KDD), knowledge extraction, business intelligence, etc.
Why Not Traditional Data Analysis?
• Huge amount of data
▫ Algorithms must be highly scalable to handle terabytes of data
• High dimensionality of data
▫ A microarray may have tens of thousands of dimensions
• High complexity of data
▫ Data streams and sensor data
▫ Time-series data, temporal data, sequence data
▫ Structure data, graphs, social networks and multi-linked data
▫ Heterogeneous databases and legacy databases
▫ Spatial, spatiotemporal, multimedia, text and Web data
▫ Software programs, scientific simulations
Multi-Dimensional View of Data Mining
• Data to be mined
▫ Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
• Knowledge to be mined
▫ Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
▫ Multiple/integrated functions and mining at multiple levels
• Techniques utilized
▫ Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
• Applications adapted
▫ Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
Data Mining: On What Kinds of Data?
• Database-oriented data sets and applications
▫ Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
▫ Data streams and sensor data
▫ Time-series data, temporal data, sequence data (incl. bio-sequences)
▫ Structure data, graphs, social networks and multi-linked data
▫ Object-relational databases
▫ Heterogeneous databases and legacy databases
▫ Spatial data and spatiotemporal data
▫ Multimedia database
▫ Text databases
▫ The World-Wide Web
Data Mining Functionalities
• Multidimensional concept description: Characterization and discrimination
▫ Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Frequent patterns, association, correlation vs. causality
▫ Diaper → Beer [support 0.5%, confidence 75%] (correlation or causality?)
• Classification and prediction
▫ Construct models (functions) that describe and distinguish classes or concepts for future prediction, e.g., classify countries based on climate, or classify cars based on gas mileage
▫ Predict some unknown or missing numerical values
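Support and confidence, the two numbers in brackets after the Diaper → Beer rule, can be computed directly from transaction counts. The baskets below are invented for illustration, so their support (60%) differs from the slide's 0.5%; the function names are hypothetical.

```python
def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"diaper", "beer", "bread"},
]
print(support({"diaper", "beer"}, baskets))       # 0.6
print(confidence({"diaper"}, {"beer"}, baskets))  # 0.75
```

A high confidence only says the items co-occur; as the slide asks, it does not establish causality.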
Data Mining Functionalities (2)
• Cluster analysis
▫ Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
▫ Maximizing intra-class similarity & minimizing interclass similarity
• Outlier analysis
▫ Outlier: a data object that does not comply with the general behavior of the data
▫ Noise or exception? Useful in fraud detection and rare-event analysis
• Trend and evolution analysis
▫ Trend and deviation: e.g., regression analysis
▫ Periodicity analysis
▫ Similarity-based analysis
• Other pattern-directed or statistical analyses
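Cluster analysis can be illustrated with Lloyd's k-means, one common clustering algorithm (the slides do not prescribe one). It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members, which tends to make clusters internally similar and mutually distinct. The house coordinates are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, labels

houses = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, labels = kmeans(houses, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```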
Data Mining Functionalities (1): Basic Data Mining Tasks
• Classification maps data into predefined groups or classes.
▫ Supervised learning
▫ Pattern recognition
▫ Prediction
• Clustering groups similar data together into clusters.
▫ Unsupervised learning
▫ Segmentation
▫ Partitioning
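In contrast to clustering, classification is supervised: labelled examples define the classes in advance. A minimal k-nearest-neighbour classifier, one of many possible methods, shows the idea on hypothetical gas-mileage data echoing the earlier car example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest labelled examples."""
    # train is a list of (feature_tuple, label) pairs.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical cars described by a single feature: miles per gallon.
cars = [((45,), "efficient"), ((50,), "efficient"), ((40,), "efficient"),
        ((15,), "inefficient"), ((18,), "inefficient"), ((12,), "inefficient")]

print(knn_predict(cars, (42,)))  # efficient
print(knn_predict(cars, (20,)))  # inefficient
```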
Data Mining Functionalities (2): Basic Data Mining Tasks
• Summarization maps data into subsets with associated simple descriptions.
▫ Characterization
▫ Generalization
• Link Analysis uncovers relationships among data.
▫ Affinity Analysis
▫ Association Rules
▫ Sequential Analysis determines sequential patterns.
Architecture: Typical Data Mining System
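Layers, from top to bottom:
• Graphical User Interface
• Pattern Evaluation
• Data Mining Engine (Pattern Evaluation and the Data Mining Engine both draw on a Knowledge-Base)
• Database or Data Warehouse Server
• Data cleaning, integration, and selection
• Sources: Database, Data Warehouse, World-Wide Web, Other Info Repositories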
Similarity Measures
Determine similarity between two objects.
Distance Measures
Measure dissimilarity between objects
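The formulas on these slides did not survive extraction, so the exact measures the author used are unknown. As an assumption, here are four common ones: cosine and Jaccard similarity, and Euclidean and Manhattan distance.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine(x, y):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

def jaccard(s, t):
    """Size of the intersection over size of the union of two sets."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```

Distances grow as objects differ while similarities shrink, so a distance can be turned into a similarity (for example 1 / (1 + d)) when an algorithm expects one.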
Example: Information Retrieval
• Information Retrieval (IR): retrieving desired information from textual data.
• Library Science
• Digital Libraries
• Web Search Engines
• Traditionally keyword based
• Sample query: Find all documents about “data mining”.
DM: Similarity measures; Mine text/Web data.
Information Retrieval (cont’d)
• Similarity: a measure of how close a query is to a document.
• Documents which are “close enough” are retrieved.
• Metrics:
▫ $\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$
▫ $\text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|}$
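Both metrics can be computed from sets of document identifiers; the identifiers below are made up, and the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's retrieved and relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant AND retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Retrieving more documents tends to raise recall but lower precision, which is why the two are reported together.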
Intelligent Systems: Bio-inspired systems
Biologically inspired computing relies heavily on the fields of biology, computer science and mathematics.
Recommender system
Artificial Immune System (AIS)
• AISs are adaptive systems, inspired by theoretical immunology and observed immune functions, principles and models, which are applied to problem solving.
• Applications
▫ Bioinformatics
▫ Intrusion detection
▫ Virus detection
Swarm Intelligence
Definition:
- Swarm intelligence (SI) is an artificial intelligence technique based on the study of collective behavior in decentralized, self-organized systems.
- SI systems are typically made up of a population of simple agents interacting locally with one another and with their environment.
Goals:
- Performance optimization and robustness
- Self-organized control and cooperation (decentralized)
- Division of labour and distributed task allocation
Swarm Intelligence Techniques
• Ant Colony Optimization (ACO)
• Marriage in Honey Bees Optimization (MBO)
• Particle Swarm Optimization (PSO)
Fish swarm (school)
Ant Colony Optimization
• Ant Colony Optimization is an efficient method for finding optimal or near-optimal paths through a graph.
• Using three components (choosing the next city, updating pheromone trails, and pheromone trail decay), ACO can determine a good, often optimal, solution to a graph problem.
• Ant Colony Optimization has been used to solve real-world problems, such as truck routing.
Ant Colony Optimization (ACO)
Ant Colony Optimization (cont.)
• Many difficult optimization problems have been solved by so-called ant algorithms, such as:
- The Traveling Salesman Problem
- The Quadratic Assignment Problem
- Other hard optimization problems
• These different approaches all try to take advantage of how social insects seem to function.
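A compact ACO for a tiny Traveling Salesman instance shows the three components described above. It is a sketch of the basic Ant System rule, assuming a pheromone exponent `alpha`, a heuristic exponent `beta` on inverse distance, an evaporation rate `rho` and a deposit constant `q`. The parameter values and the five-city regular-pentagon instance are illustrative choices, not taken from the slides, and cities must be distinct so that no distance is zero.

```python
import math
import random

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def aco_tsp(coords, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    """Basic Ant System for the TSP; returns (best_tour, best_length)."""
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(a, b) for b in coords] for a in coords]
    tau = [[1.0] * n for _ in range(n)]  # pheromone level on each edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            # Choosing a city: probability ~ tau^alpha * (1 / distance)^beta.
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cands = sorted(unvisited)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cands]
                nxt = rng.choices(cands, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)
        # Pheromone trail decay: every edge evaporates by a factor (1 - rho).
        for row in tau:
            for j in range(n):
                row[j] *= 1.0 - rho
        # Updating pheromone trails: each ant deposits q / length on its edges.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len

# Five cities on a regular pentagon: the optimal tour follows the perimeter.
cities = [(math.cos(2 * math.pi * k / 5), math.sin(2 * math.pi * k / 5)) for k in range(5)]
best_tour, best_len = aco_tsp(cities)
```

Shorter tours deposit more pheromone per edge, so good edges become more attractive, while evaporation stops early choices from locking in forever.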
Marriage in Honey Bees Optimization (MBO)
Bees’ Comb
Marriage in Honey Bees Optimization Cont.
The main processes in MBO are:
(1) the mating flight of the queen bee with drones
(2) the creation of new broods by the queen bee
(3) the improvement of the broods' fitness by workers.
(4) the adaptation of the workers' fitness
(5) the replacement of the least fit queen(s) with the fittest brood(s).
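The five processes can be strung together into a simplified, hypothetical MBO for continuous minimisation. It is a sketch, not the original algorithm. It assumes a single queen, random drones, uniform crossover for brood creation, and Gaussian local-search steps of three sizes as the workers. It also assumes a worker fitness that grows with the improvement each worker achieves. All names and parameter values are illustrative.

```python
import math
import random

def mbo_minimize(f, dim, lo, hi, n_generations=50, n_broods=10,
                 spermatheca_size=5, speed0=10.0, seed=0):
    """Simplified MBO; returns (queen, f(queen), f(initial_queen))."""
    rng = random.Random(seed)

    def random_solution():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def clip(v):
        return min(hi, max(lo, v))

    queen = random_solution()
    initial_value = f(queen)
    # Workers are local-search heuristics: Gaussian steps of three sizes.
    worker_steps = [0.001 * (hi - lo), 0.01 * (hi - lo), 0.1 * (hi - lo)]
    worker_fitness = [1.0] * len(worker_steps)

    for _ in range(n_generations):
        # (1) Mating flight: the queen's speed decays after each transition, and a
        #     drone is accepted with probability exp(-|f(queen) - f(drone)| / speed).
        speed, spermatheca = speed0, []
        while speed > 0.01 * speed0 and len(spermatheca) < spermatheca_size:
            drone = random_solution()
            if rng.random() < math.exp(-abs(f(queen) - f(drone)) / speed):
                spermatheca.append(drone)
            speed *= 0.9
        if not spermatheca:  # fallback: always keep at least one drone
            spermatheca.append(random_solution())

        best_brood, best_brood_value = None, math.inf
        for _ in range(n_broods):
            # (2) Brood creation: uniform crossover of the queen and a stored drone.
            sperm = rng.choice(spermatheca)
            brood = [qg if rng.random() < 0.5 else dg for qg, dg in zip(queen, sperm)]
            value = f(brood)
            # (3) A worker, picked in proportion to its fitness, tries to improve the brood.
            w = rng.choices(range(len(worker_steps)), weights=worker_fitness)[0]
            candidate = [clip(x + rng.gauss(0.0, worker_steps[w])) for x in brood]
            candidate_value = f(candidate)
            if candidate_value < value:
                # (4) Worker adaptation: reward the worker by the improvement it made.
                worker_fitness[w] += value - candidate_value
                brood, value = candidate, candidate_value
            if value < best_brood_value:
                best_brood, best_brood_value = brood, value
        # (5) Replacement: the fittest brood replaces the queen only if it is fitter.
        if best_brood_value < f(queen):
            queen = best_brood
    return queen, f(queen), initial_value

def sphere(x):
    return sum(v * v for v in x)

queen, value, start = mbo_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
print(value <= start)  # True: the queen is only ever replaced by a fitter brood
```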
Particle Swarm Optimization (PSO)
• The PSO method is motivated by the simulation of the social behavior of bird flocking and fish schooling.
Particle Swarm Optimization (cont.)
• In PSO, each single solution is a "bird" in the search space; we call it a "particle".
• All particles have
▫ fitness values, which are evaluated by the fitness function to be optimized, and
▫ velocities, which direct the flying of the particles.
• The particles fly through the problem space by following the current optimum particles.
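A minimal global-best PSO sketch follows. It assumes the commonly used update v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), then x ← x + v, with inertia weight `w` and acceleration coefficients `c1`, `c2`. These values are typical choices, not from the slides, and the sphere function serves only as a toy objective.

```python
import random

def pso_minimize(f, dim, lo, hi, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (best_position, best_value, initial_best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # the swarm's best position so far
    initial_best = gbest_val
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards own best + pull towards the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            value = f(pos[i])  # fitness of the particle's new position
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val, initial_best

def sphere(x):
    return sum(v * v for v in x)

best, best_val, start_val = pso_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
```

With w = 0.7 and c1 + c2 = 3, the swarm stays in the usual stable parameter region, so velocities shrink over time instead of exploding.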
Swarm Intelligence Applications
• Swarm robotics
• Crowd simulation
• Ant-based routing
• Telecommunication (routing and congestion problems, intrusion detection)
• Computer animation
• Electronics
• Data mining
• Production control
• Industrial design
Swarm robotics (e.g., Swarm-bots)
• Collective task completion
• No need for overly complex algorithms
• Adaptable to changing environments
Communication Networks
• Routing packets to their destination in the shortest time
• Similar to the shortest-route problem
• Statistics kept from prior routing (learning from experience)
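"Learning from experience" in ant-based routing can be sketched with a per-node probabilistic routing table in the style of AntNet. When an ant reports that it reached a destination through a given next hop, that hop's probability is raised and the others are lowered, with a larger reward for a shorter trip. The reward formula `r = r_scale / (r_scale + trip_time)`, the class name and the trip times are illustrative assumptions.

```python
import random

class PheromoneRoutingTable:
    """Routing table of one node: for each destination, probabilities over next hops."""

    def __init__(self, neighbours, destinations):
        p = 1.0 / len(neighbours)
        self.table = {d: {n: p for n in neighbours} for d in destinations}

    def reinforce(self, destination, next_hop, trip_time, r_scale=1.0):
        """Reward next_hop after an ant reached destination through it in trip_time."""
        r = r_scale / (r_scale + trip_time)  # shorter trip -> larger reward in (0, 1)
        probs = self.table[destination]
        for hop in probs:
            if hop == next_hop:
                probs[hop] += r * (1.0 - probs[hop])
            else:
                probs[hop] -= r * probs[hop]  # total probability stays 1

    def choose(self, destination, rng):
        """Pick a next hop for a packet, in proportion to the learned probabilities."""
        probs = self.table[destination]
        hops = list(probs)
        return rng.choices(hops, weights=[probs[h] for h in hops])[0]

# Node A has neighbours B, C, D; past ants took these times to reach Z via each.
trip_times = {"B": 2.0, "C": 10.0, "D": 6.0}
node_a = PheromoneRoutingTable(["B", "C", "D"], ["Z"])
for _ in range(20):
    for hop, t in trip_times.items():
        node_a.reinforce("Z", hop, t)

probs = node_a.table["Z"]
print(max(probs, key=probs.get))  # B
```

Packets are still routed probabilistically rather than always via the best hop, which keeps some traffic on alternative paths if conditions change.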
Big Data and Data Mining + Bio-inspired Techniques
Wedding
Thanks