Post on 11-Jul-2020
Data Science, Big Data, and Artificial Intelligence: Concept, Context, and
Applications
Prof. Zainal A. Hasibuan, PhD.
Chair, Asosiasi Pendidikan Tinggi Informatika dan Komputer (APTIKOM)
APTIKOM Webinar, 19 May 2020
Yes, We Are Connected!
Covid19 Proves The Concept of Connectivity
Technologies That Make Things Connected
Artificial Intelligence
• Technology: software algorithms that automate complex decision-making tasks to mimic human thought processes and senses
• Benefit: can learn, understand, reason, plan, and act when fed with data
Internet of Things (IoT)
• Technology: an ecosystem of sensors, embedded computers, and "smart" devices
• Benefit: able to communicate among themselves and with private/public cloud services to collect, analyze, and present data about the physical world
3D Printing
• Technology: creates three-dimensional objects from a digital model by "printing" successive layers of material
• Benefit: a variety of materials can be used, e.g. wood, glass, living cells for bio-printing; minimizes waste
Robotics
• Technology: machines with enhanced sensors, control, and intelligence that are used to automate, augment, or assist human activities
• Benefit: improves efficiency and productivity
Blockchain
• Technology: a digital ledger that uses software algorithms to record and confirm transactions with reliability and anonymity
• Benefit: improves traceability, transparency, efficiency, and security
Drone
• Technology: unmanned aircraft
• Benefit: highly versatile because of the wide variation in capacity, size, capability, and function
Virtual Reality (VR)
• Technology: implies a complete "immersion" experience that is 100% computer-generated
• Benefit: innovations can be presented without actually producing them
Augmented Reality (AR)
• Technology: offers a real-world experience with a computer-generated overlay
• Benefit: a blend of the real world and the computer
Family
Music
Sport
Friends
Pets
Basically, We Are a Networked Society
Potential for Implementing Data Science in Indonesia
We are big
• 250 million people
• 1,340 ethnic groups
• 17,508 islands
• 746 regional languages
We are adaptive
• 132.7 million internet users
• 106 million active social media users
• 371.4 million mobile phone subscribers
We have opportunity
• Demographic bonus of a productive-age population
• Growing economy
• Stable politics and security
Data Science Extracts Knowledge & Insights From Big Data
Forming Society 5.0: A Human-Centered Society
The Context of Data Science, Big Data, and Artificial Intelligence
Big Data (BD)
Definitions, Techniques, and Examples of DS, BD, and AI
Data Science
• Definition: Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
• Techniques & Analysis: K-Means, Linear Regression, Naïve Bayesian, etc.
• Example & Application: Personalized healthcare recommendations
Big Data
• Definition: Big Data is a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques.
• Techniques & Analysis: Education Performance Analysis, Sentiment Analysis, Customer Behavior Analysis
• Example & Application: Big Data of National Education System
Artificial Intelligence
• Definition: Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn.
• Techniques & Analysis: Rule-based systems, Neural Networks, Fuzzy Models, etc.
• Example & Application: Plagiarism Checkers
Why Are Data Science, Big Data, and Artificial Intelligence Important?
BIG DATA: Value, Volume, Variety, Velocity, Veracity
Big Data: More, Messy, Good Enough
• In this new world we can analyze far MORE data.
• Big data gives us an especially clear view of the granular: subcategories and submarkets that samples cannot assess.
• As scale increases, the number of inaccuracies increases as well (Messy).
• A move away from the search for causality toward discovering patterns and correlations.
• Big data is about WHAT, not WHY.
• Big data changes the nature of business, markets, and society.
• Value shifts from physical infrastructure to intangibles such as brands and intellectual property.
• Big data is the oil of the information economy.
• For individuals, the concern shifts from privacy to probability: the likelihood that one will get a heart attack, default on a mortgage, or commit a crime. Big data also bears on climate change, eradicating diseases, and fostering good governance and economic development.
Data Science:
• deals with both structured and unstructured data
• a field that includes everything associated with the cleansing, preparation, and final analysis of data
• combines programming, logical reasoning, mathematics, and statistics
• cleanses, prepares, and aligns the data
• an umbrella of several techniques used for extracting information and insights from data
Source: Leonard Heiler, 2017. https://www.datasciencecentral.com/profiles/blogs/difference-of-data-science-machine-learning-and-data-mining
Paradigm Shift of Big Data Computation in Data Science: From Factual to Potential
Foundational
• What happened?
• When and where?
• How much?
Relational
• How one piece of data relates to another
• Rules and methods
Advanced, Predictive
• What will happen?
• What will be the impact?
• Big Data analysis
• Strategic direction
Prescriptive
• What are the potential scenarios?
• What is the best course?
• How can we pre-empt and mitigate the crisis?
• Structured and unstructured data
• Future direction
Data reporting
• Descriptive
• Basic reporting
Data integration
• Interpretative
• Enterprise data
Data analytics
• Enterprise analytics
• Evidence-based medicine
• Outcomes analytics
Data predictive
• Population behavior
• Innovation
Source: (Hasibuan 2016)
Role of Big Data
Research Paradigm Shift: From Data to Big Data
Data
• Limited
• Homogeneous
Sampled Data
• Representation
• Inference
• Hypothesis
Big Data
• Population
• Heterogeneous
• Pattern
How to Mechanize DS, BD, and AI?
• An organization that has large amounts of data gains competitive advantages in its playing field.
• The more data an organization has, the more accurate its descriptions, predictions, and prescriptions can be.
• Data Science, Big Data, and Artificial Intelligence play significant roles in presenting solutions.
• This means making use of mathematical models to create algorithms to identify, classify, cluster, predict, learn, and process data.
DS, BD, and AI: Methodologies and Algorithms
Data Science
• Methodology: Classification (to classify), Regression (to predict), Similarity (to correlate)
• Algorithms: Support vector machine (SVM), Linear Regression, Association Rule Mining, etc.
Big Data
• Methodology: Data Mining, Machine Learning, NLP
• Algorithms: Support vector machine (SVM), K-Means, Naïve Bayes, etc.
Artificial Intelligence
• Methodology: Supervised Learning, Unsupervised Learning, and Reinforcement Learning
• Algorithms: Support vector machine (SVM), K-Means, Naïve Bayesian, Convolutional Neural Network (CNN), etc.
Example of Linear Regression
• One of the most widely-used methods of statistical analysis
• Applicable to many problems, particularly when the expected output is a score rather than a category
• Good for predicting trends and forecasting the effects of a new policy or other change.
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
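As a rough illustration of the linear regression points above, the following sketch fits a straight line with ordinary least squares using only the Python standard library. The function names and the study-hours data are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Minimal sketch: ordinary least squares for a single input variable.
# All names and data here are illustrative assumptions.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize squared error for y ~ slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict a score (not a category) for a new input."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical data: hours of study versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

model = fit_line(hours, scores)
print("slope, intercept:", model)
print("forecast for 6 hours:", predict(model, 6))
```

The output is a numeric score, which is why the method suits trend forecasting rather than categorization.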
Example of Support vector machine (SVM)
• Learns to define a hyperplane to separate data into two classes
• Can help figure out an underlying separation mechanism between people
• Some of the biggest problems that have been solved using SVMs (with suitably modified implementations) are display advertising, human splice site recognition, image-based gender detection, and large-scale image classification
Source: James Le, 2016
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
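To make the idea of a separating hyperplane concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is one simple training scheme, not necessarily the one behind the applications cited on the slide; the function names, hyperparameters, and toy points are assumptions for illustration.

```python
# Minimal sketch: linear SVM via sub-gradient descent on the hinge loss.
# Labels must be +1 or -1. Names, hyperparameters, and data are illustrative.

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=500):
    w = [0.0] * len(points[0])  # normal vector of the hyperplane
    b = 0.0                     # offset of the hyperplane
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization gently shrinks w at every step.
            w = [wi * (1 - lr * lam) for wi in w]
            # Points inside the margin (or misclassified) pull the hyperplane.
            if margin < 1:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(model, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable 2-D points.
points = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
labels = [-1, -1, -1, 1, 1, 1]

model = train_linear_svm(points, labels)
print("w, b:", model)
print("class of (3, 3):", classify(model, (3, 3)))
```

The learned pair `(w, b)` defines the hyperplane `w·x + b = 0`; the sign of `w·x + b` decides the class.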
Example of Naïve Bayesian
• Not one algorithm, but a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.
• The algorithm learns to predict an attribute based on other, known features.
• Assumes all attributes of an item are independent of each other
http://uc-r.github.io/naive_bayes
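The independence assumption described above can be sketched as a small categorical Naive Bayes classifier with Laplace (add-one) smoothing. The function names and the spam/ham toy data are hypothetical and serve only to show how per-feature probabilities are multiplied (here, summed in log space).

```python
# Minimal sketch: categorical Naive Bayes with add-one smoothing.
# All names and data are illustrative assumptions.
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict_naive_bayes(model, row):
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Start from the log prior, then add one log likelihood per feature,
        # treating the features as independent given the class.
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical features: (contains "offer"?, contains a link?)
rows = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
        ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("yes", "yes")))
print(predict_naive_bayes(model, ("no", "no")))
```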
10 Algorithms for Big Data Experts
Source: James Le, 2016
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
Columns: Algorithm, Explanation, Figure, Source
K-Means Clustering
• A simple unsupervised learning algorithm that is often used on big data sets.
• Best suited to high-level, large-scale clustering
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
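A minimal sketch of the K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its cluster) is shown below. The naive initialization from the first k points, the function name, and the toy points are assumptions made for a deterministic illustration; practical implementations usually initialize more carefully.

```python
# Minimal sketch: K-Means on small 2-D data, standard library only.
# Initialization and data are illustrative assumptions.

def kmeans(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical points forming two visible groups.
points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print("centroids:", centroids)
print("clusters:", clusters)
```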
Association Rule Mining
• A learning algorithm that looks for associations that occur at high frequency
• Can identify associations that you might not expect from random sampling
https://gerardnico.com/data_mining/association
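Association rules are commonly scored with support, confidence, and lift. The sketch below computes these three measures for a single rule; the function names and the shopping-basket data are hypothetical.

```python
# Minimal sketch: support, confidence, and lift for one association rule.
# All names and data are illustrative assumptions.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to how common the consequent is overall."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

print("support(bread, milk):", support(baskets, {"bread", "milk"}))
print("confidence(bread -> milk):", confidence(baskets, {"bread"}, {"milk"}))
print("lift(bread -> milk):", lift(baskets, {"bread"}, {"milk"}))
```

A lift below 1, as in this toy data, suggests the two items co-occur slightly less often than independence would predict.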
Linear Regression
• One of the most widely used methods of statistical analysis
• Applicable to many problems, particularly when the expected output is a score rather than a category
• Good for predicting trends and forecasting the effects of a new policy or other change.
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
Logistic Regression
• Used to determine the success or failure of a particular event
• A classification algorithm.
• A powerful statistical way of modeling a binomial outcome with one or more explanatory variables
• Measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function
Source: James Le, 2016
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
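The logistic function and the success/failure framing above can be sketched with a one-variable logistic regression trained by batch gradient descent. The function names, learning rate, epoch count, and pass/fail data are assumptions for illustration only.

```python
# Minimal sketch: one-variable logistic regression via batch gradient descent.
# All names, hyperparameters, and data are illustrative assumptions.
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_proba(model, x):
    """Estimated probability that the event succeeds (label 1)."""
    w, b = model
    return sigmoid(w * x + b)

# Hypothetical data: hours of practice versus pass (1) / fail (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

model = train_logistic(hours, passed)
print("P(pass | 1 hour):", predict_proba(model, 1))
print("P(pass | 6 hours):", predict_proba(model, 6))
```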
C4.5
• A supervised learning algorithm
• Developed by John Ross Quinlan; it generates a decision tree
• Builds the decision tree from input that has already been classified
• The decision tree can be used as a diagnostic tool
https://github.com/barisesmer/C4.5
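C4.5 chooses the attribute for each tree node by gain ratio (information gain divided by split information). The sketch below computes only that split criterion for categorical attributes rather than building a full tree; the function names and the small diagnostic data set are hypothetical.

```python
# Minimal sketch: the gain-ratio split criterion used by C4.5.
# This scores attributes only; it does not build or prune a tree.
# All names and data are illustrative assumptions.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(rows, labels, attr):
    total = len(rows)
    # Group the class labels by the value each row has for this attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical, already-classified diagnostic records.
rows = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
labels = ["sick", "sick", "healthy", "healthy"]

ratios = {attr: gain_ratio(rows, labels, attr) for attr in ("fever", "cough")}
best_attribute = max(ratios, key=ratios.get)
print(ratios, "-> split on", best_attribute)
```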
Support vector machine (SVM)
• Learns to define a hyperplane to separate data into two classes
• Can help figure out an underlying separation mechanism between people
• Some of the biggest problems that have been solved using SVMs (with suitably modified implementations) are display advertising, human splice site recognition, image-based gender detection, and large-scale image classification.
https://www.kdnuggets.com/2016/08/10-algorithms-machine-learning-engineers.html
Apriori
• A similarity-matching algorithm
• Commonly used on transactional databases with a large number of transactions, represented as a sparse matrix with items (attributes) along the horizontal axis and transactions along the vertical axis.
• Runs with a high level of computational overhead.
https://www.analyticsvidhya.com/blog/2014/08/effective-cross-selling-market-basket-analysis/
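The level-wise search behind Apriori can be sketched as follows: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset. The function name, the support threshold, and the basket data are assumptions for illustration. The repeated scans over all transactions also hint at where the computational overhead comes from.

```python
# Minimal sketch: Apriori frequent-itemset mining, standard library only.
# All names, the threshold, and the data are illustrative assumptions.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        """Keep candidates whose support reaches min_support (one full scan)."""
        result = {}
        for candidate in candidates:
            s = sum(1 for t in transactions if candidate <= t) / n
            if s >= min_support:
                result[candidate] = s
        return result

    level = frequent({frozenset([item]) for t in transactions for item in t})
    all_frequent = dict(level)
    k = 2
    while level:
        previous = list(level)
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

frequent_itemsets = apriori(baskets, min_support=0.4)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```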
EM (expectation-maximization)
• A clustering algorithm used for knowledge discovery
• Finds (local) maximum-likelihood parameters of a statistical model in cases where the equations cannot be solved directly.
• Estimates data that can then be used in other statistical analysis methods.
https://medium.com/@thiagoricieri/understanding-expectation-maximization-and-soft-clustering-4645e997cdb6
EM (expectation-maximization)
• EM clustering of the Old Faithful eruption data.
• The random initial model (which, because of the different axis scales, appears as very flat and wide spheres) is fit to the observed data.
• In the first iterations, the model …
https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
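The alternation between the E-step (soft assignment) and the M-step (parameter re-estimation) can be sketched for a mixture of two one-dimensional Gaussians. The data below is synthetic and is not the eruption data mentioned above; the function names, the initialization, and the variance floor are assumptions for illustration.

```python
# Minimal sketch: EM for a two-component 1-D Gaussian mixture.
# Synthetic data; names, initialization, and variance floor are assumptions.
import math

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iterations=50):
    means = [min(data), max(data)]  # simple deterministic starting guess
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            likes = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(likes)
            resp.append([like / total for like in likes])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances

# Synthetic data with two visible groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_two_gaussians(data)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

Unlike K-Means, each point keeps a fractional membership in both clusters, which is why EM is often described as soft clustering.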
Adaptive Boosting (AdaBoost)
• A general method that can be applied to a number of classifiers
• An algorithm that builds a classifier and then boosts it
• Optimizes the learning ability of the participating learners.
Source: Brendan Marsh, 2016
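The "build a classifier, then boost it" idea can be sketched with AdaBoost over one-dimensional decision stumps: each round picks the stump with the lowest weighted error, then increases the weight of the points it got wrong. The function names, the number of rounds, and the toy data are assumptions for illustration.

```python
# Minimal sketch: AdaBoost with 1-D decision stumps; labels are +1 / -1.
# All names, the round count, and the data are illustrative assumptions.
import math

def stump_predict(stump, x):
    threshold, polarity = stump
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return the stump with the lowest weighted error, and that error."""
    sorted_xs = sorted(xs)
    thresholds = [sorted_xs[0] - 0.5]
    thresholds += [(a + b) / 2 for a, b in zip(sorted_xs, sorted_xs[1:])]
    thresholds.append(sorted_xs[-1] + 0.5)
    best, best_err = None, float("inf")
    for threshold in thresholds:
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(stump, x) != y)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified points, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can classify correctly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

ensemble = adaboost(xs, ys, rounds=3)
print([ensemble_predict(ensemble, x) for x in xs])
```

No single threshold separates this data, but the weighted vote of three stumps does, which is the point of boosting weak learners.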
Naïve Bayesian
• Not one algorithm, but a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features.
• The algorithm learns to predict an attribute based on other, known features.
• Assumes all attributes of an item are independent of each other
http://uc-r.github.io/naive_bayes
Conclusion
• These methodologies, techniques, and algorithms are the tools that Data Science, Big Data, and Artificial Intelligence use to classify data, identify similarities, and predict trends.
• Using Data Science to analyze Big Data is an effective way of tapping the inherent value of large data sets and turning it into meaningful information and knowledge. Furthermore, Artificial Intelligence uses the results to learn and re-learn, so that the system gains business intelligence and insight.
• An organization's Big Data should be collected continuously so that it grows in volume and diversity, both spatially and temporally.