Date post: | 31-Jan-2016 |
Upload: | adb-health-sector-group |
From Internet Usage to Genomics Analysis
with Google BigQuery
Mr. Kanakorn Horsiritham
Prince of Songkla University, THAILAND
Agenda
Big Data
Big Data Analysis
Big Data Tools
Google BigQuery
Prince of Songkla University
Analysis
Google Genomics
Conclusion
Big Data
Source: http://www.viawest.com/sites/default/files/asset/document/ViaWest_Big_Data_Infographic.pdf (2011)
Source: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html (2013)
Big Data Analysis
The process of examining big data to uncover:
- hidden patterns
- unknown correlations
- useful information
Methods:
- Machine Learning
- Data Visualization
- etc.
http://www.sas.com/en_us/insights/analytics/big-data-analytics.html
Big Data Tools
Tools:
- Hadoop
- Google BigQuery
Hadoop
http://blog.agro-know.com/wp-content/uploads/2015/07/Hadoop_Ecosystem.jpg
Google BigQuery
Google Cloud Storage
SQL API
Corporate Data
BI Tools
Google Sheets
Coworkers
Store data in the cloud
Analyze interactively
Share securely
Distribute the results
Google BigQuery
https://cloud.google.com/bigquery/pricing
Free
- Loading and exporting data
Storage Price
- $0.02/GB/month ($20/TB/month). Storage pricing is prorated per MB, per second.
- For example, if you store 1 TB for half of a month, you pay $10.
Query Price
- First 1 TB per month is free
- $5/TB after that
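The pricing rules above can be turned into a small calculator. This is a minimal sketch using only the figures quoted on this slide; the function and parameter names are illustrative, and it assumes 1 TB = 1000 GB, as the $0.02/GB = $20/TB equivalence implies.

```python
GB_PER_TB = 1000  # assumption: the slide equates $0.02/GB with $20/TB


def storage_cost(gigabytes, months, price_per_gb_month=0.02):
    """Prorated storage cost in USD for `gigabytes` kept for `months` months."""
    return gigabytes * price_per_gb_month * months


def query_cost(terabytes_scanned, free_tb=1.0, price_per_tb=5.0):
    """Monthly query cost in USD; the first `free_tb` TB scanned is free."""
    billable_tb = max(0.0, terabytes_scanned - free_tb)
    return billable_tb * price_per_tb


# The slide's own example: 1 TB stored for half a month.
print(storage_cost(1 * GB_PER_TB, 0.5))
# Scanning 3 TB in a month: 2 TB are billable.
print(query_cost(3))
```

The first call reproduces the slide's $10 example; loading and exporting data add nothing because they are listed as free.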
Google BigQuery
Only $7 / Month
Internet Usage Prince of Songkla University
Now, we are here!
Prince of Songkla University
Hadyai
Pattani
Trang
Phuket
Surat
Prince of Songkla University
5 Campuses
Credit: Facebook "Care Chayada"
Schema
Analysis
How much internet traffic does each application use?
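A question like this comes down to summing bytes per application and ranking the totals. The sketch below illustrates that aggregation in plain Python. The field names `application` and `bytes` and the sample rows are hypothetical, since the actual log schema is not reproduced in the text.

```python
from collections import defaultdict


def traffic_by_application(records):
    """Sum bytes per application and return (application, bytes) pairs, largest first."""
    totals = defaultdict(int)
    for record in records:
        totals[record["application"]] += record["bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for illustration only; not taken from the PSU logs.
app_sample = [
    {"application": "youtube", "bytes": 700},
    {"application": "web-browsing", "bytes": 300},
    {"application": "youtube", "bytes": 200},
    {"application": "facebook", "bytes": 100},
]

for application, total_bytes in traffic_by_application(app_sample):
    print(application, total_bytes)
```

In a SQL engine the same idea would be a grouped sum ordered by the total; the Python version only makes the logic explicit.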
What are the URL categories of those SSL applications?
How are these top 5 applications used in each hour of the day?
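One way to answer an hour-of-day question is to bucket each log record by the hour of its timestamp and sum traffic per (application, hour) pair. The sketch below assumes ISO-formatted timestamps and hypothetical field names (`timestamp`, `application`, `bytes`); the sample rows are invented.

```python
from collections import Counter
from datetime import datetime


def hourly_usage(records, applications):
    """Sum bytes per (application, hour of day) for the selected applications."""
    usage = Counter()
    for record in records:
        if record["application"] in applications:
            hour = datetime.fromisoformat(record["timestamp"]).hour
            usage[(record["application"], hour)] += record["bytes"]
    return usage


# Made-up rows for illustration only.
hourly_sample = [
    {"timestamp": "2015-11-02T09:15:00", "application": "youtube", "bytes": 500},
    {"timestamp": "2015-11-02T09:45:00", "application": "youtube", "bytes": 250},
    {"timestamp": "2015-11-02T20:05:00", "application": "facebook", "bytes": 100},
    {"timestamp": "2015-11-02T20:30:00", "application": "ssl", "bytes": 900},
]

top_applications = {"youtube", "facebook"}  # stand-in for the real top 5
for (application, hour), total in sorted(hourly_usage(hourly_sample, top_applications).items()):
    print(application, hour, total)
```

Restricting the aggregation to a chosen set of applications keeps the result small enough to plot as one line per application.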
How do users use the internet during a week?
How do users use the internet on weekdays and weekends?
How do users use the internet on weekdays?
How do users use the internet on weekends?
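Splitting usage into weekdays and weekends only requires classifying each timestamp by its day of the week. The following sketch shows that step with Python's standard library. The field names and sample rows are hypothetical, and treating Saturday and Sunday as the weekend is an assumption.

```python
from collections import Counter
from datetime import datetime


def day_type(timestamp):
    """Return 'weekend' for Saturday/Sunday and 'weekday' otherwise."""
    # datetime.weekday(): Monday is 0, Sunday is 6.
    return "weekend" if datetime.fromisoformat(timestamp).weekday() >= 5 else "weekday"


def usage_by_day_type(records):
    """Sum bytes separately for weekdays and weekends."""
    usage = Counter()
    for record in records:
        usage[day_type(record["timestamp"])] += record["bytes"]
    return usage


# Made-up rows: 2 Nov 2015 was a Monday, 7 and 8 Nov 2015 a Saturday and Sunday.
week_sample = [
    {"timestamp": "2015-11-02T10:00:00", "bytes": 400},
    {"timestamp": "2015-11-07T14:00:00", "bytes": 150},
    {"timestamp": "2015-11-08T21:00:00", "bytes": 50},
]

print(dict(usage_by_day_type(week_sample)))
```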
What are the most popular social networks?
How differently do students and staff use social networking?
Which faculties' students use the 'Reference and Research' category?
Google Genomics
Genomic Browser http://gabrowse.appspot.com/
VCF Format
Upload into Google Cloud Storage
Google Genomics API
Export to Google BigQuery
[Figure: plot of titv; y-axis from 0 to 4.5, x-axis from 0 to 300,000,000]
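Assuming "titv" here denotes the transition/transversion (Ti/Tv) ratio commonly reported for variant data, the quantity can be illustrated with a short computation. The sketch below counts single-nucleotide changes between purines (A, G) or between pyrimidines (C, T) as transitions and all other single-nucleotide changes as transversions. The sample variants are invented, and this is not the query used in the talk.

```python
# Transitions: purine <-> purine (A, G) or pyrimidine <-> pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(variants):
    """Return transitions / transversions over (ref, alt) pairs, ignoring non-SNPs."""
    transitions = 0
    transversions = 0
    for ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # skip indels and non-variant records
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions / transversions if transversions else float("nan")


# Made-up (reference base, alternate base) pairs for illustration only.
variant_sample = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A"), ("AT", "A")]
print(titv_ratio(variant_sample))
```

Computing this ratio per window of genomic positions, rather than over the whole sample, would give one value per window, which is one plausible way to obtain a plot like the one described.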
Conclusion
Google BigQuery
• Easy to use and inexpensive: just prepare your data, upload it, and then query it.
• Processes large data sets within a few seconds.
PSU Internet Usage Analysis
• Heavy traffic: video streaming (e.g. YouTube), and other traffic in the form of web browsing.
• Counted by number of users, web browsing and Facebook are used by the most users.
• Internet usage peaks during office hours on workdays and drops off after office hours and on weekends.
• Staff use the internet during office hours on weekdays; usage at other times comes from students alone.
• The top 3 social networks are Facebook, LINE and Twitter. Both staff and students use Facebook the most, but staff prefer Twitter to LINE while students prefer LINE to Twitter.
• The Faculties of Science, Management Science and Engineering use the internet for educational purposes the most.
Google Genomics
• Upload the VCF files created by the DNA sequencer to Google Cloud Storage
• Process with Google Genomics
• Export to Google BigQuery
• Then query the data set