Date posted: 16-Aug-2015 | Category: Education | Uploaded by: kyunghoon-kim
Korean Word Network v.0.1.0
August 7, 2015
Kyunghoon Kim
Department of Mathematical Sciences
Ulsan National Institute of Science and Technology
Republic of Korea
Contents
What is text mining?
    Text is the most common vehicle
    Features of Text Mining
    Document clustering
Warming up
    Installation 1. Python
    Installation 2. pip upgrade and library install
Python
    IPython
    Python Basic Coding
Text Mining Example
    Morpheme Analysis
    Network Analysis
Introduction
What is text mining?
▶ Data mining is about looking for patterns in data.
▶ Text mining is about looking for patterns in text.
▶ It is the process of analyzing text to extract information that is useful for particular purposes.
Introduction
▶ Comprehensible
▶ → Text summarization
▶ Document classification [supervised learning]
▶ Document clustering [unsupervised learning]
▶ Metadata extraction
▶ etc.
Introduction
Document clustering can assist information retrieval by creating links between similar documents.
http://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm
http://www.nature.com/nmeth/journal/v8/n6/fig_tab/nmeth.1619_F1.html
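The idea of linking similar documents can be sketched with the standard library alone. The example below is not K-Means and not the method used in the linked articles; it simply represents each document as a bag of words, computes pairwise cosine similarity, and links pairs above a threshold. The function names, the sample sentences, and the threshold of 0.3 are illustrative assumptions, and the code is written for Python 3.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_similar(docs, threshold=0.3):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    vectors = [bag_of_words(d) for d in docs]
    links = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                links.append((i, j))
    return links


# Illustrative sample documents (not from the slides).
docs = [
    "text mining finds patterns in text",
    "data mining finds patterns in data",
    "the cat sat on the mat",
]
print(link_similar(docs))
```

The first two sample documents share four words and get linked; the third shares none with either. A real clustering pipeline would usually weight terms (for example with TF-IDF) and group documents rather than only linking pairs.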
Warming up
Installation 1. Python
▶ Download Anaconda at http://continuum.io/downloads
▶ Install it. (Be careful! Install with the All Users option.)
Anaconda is a completely free Python distribution. It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis, e.g., IPython, NumPy, SciPy, Pandas, Scikit-learn, and especially NetworkX.
Warming up
Installation 2. pip upgrade and library install
If you are a Windows user, edit the following file:
▶ Open the file 'C:\Anaconda\Lib\site.py'
▶ Replace
    def setencoding():
        encoding = "ascii"
▶ with
    def setencoding():
        encoding = "mbcs"
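One way to check what the interpreter reports as its default encoding is shown below. This is a minimal sketch using only the standard library; the helper name `default_encoding` is an illustrative assumption. Note that the `setencoding()` edit applies to Python 2 installations; on Python 3 the default encoding is 'utf-8' and the edit should not be needed.

```python
import sys


def default_encoding():
    """Return the interpreter's default string encoding."""
    return sys.getdefaultencoding()


# On a Python 2 install edited as described, this is expected to
# reflect the value set in site.py.
print(default_encoding())
```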
Warming up
Installation 2. pip upgrade and library install
Press the Windows key, type 'cmd', and press Enter.
▶ python -m pip install -U pip
▶ pip install umorpheme
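After running the commands, a quick check from Python can confirm that a package is importable. The sketch below assumes Python 3.4 or later (for `importlib.util.find_spec`) and only handles top-level package names; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


for name in ("pip", "umorpheme"):
    print(name, "installed" if is_installed(name) else "missing")
```

If `umorpheme` is reported as missing, rerun `pip install umorpheme` in the same environment that runs this script.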
IPython
Interactive Computing
IPython
▶ IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.
▶ IPython Notebook is a web-based interactive computational environment for creating IPython notebooks. An IPython notebook is a JSON document containing an ordered list of input/output cells which can contain code, text, mathematics, plots and rich media.
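To make the "JSON document with an ordered list of cells" description concrete, the sketch below builds a very small notebook-like structure with the standard `json` module. The field names follow the notebook format version 4 layout as I understand it; real notebooks carry additional metadata, and the helper `make_notebook` is hypothetical, not part of IPython.

```python
import json


def make_notebook(cells):
    """Build a minimal notebook dict from (cell_type, source) pairs."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also record an execution counter and their outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


notebook = make_notebook([
    ("markdown", "# Korean Word Network"),
    ("code", "print('hello')"),
])
text = json.dumps(notebook, indent=1)
print(text)
```

Saving `text` to a file with the `.ipynb` extension is how such a document would normally be stored; validating it against the official schema is left to the notebook tooling.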
IPython
Interactive Computing
Execution
1. cmd
2. mkdir test
3. cd test
4. ipython notebook
If your logo is [logo image not preserved], run 'pip install -U ipython' to upgrade to version 3 (Jupyter).
Python
Basic Coding
References
1. English: https://wakari.io/nb/url///wakari.io/static/notebooks/Lecture_1_Introduction_to_Python_Programming.ipynb
2. Korean: https://wikidocs.net/book/1
3. Interactive Tutorial: http://interactivepython.org/runestone/static/pythonds/index.html
4. List of Python Learning: https://www.codementor.io/learn-python-online
5. Online Python: https://www.pythonanywhere.com/try-ipython/
6. Python Visualization: http://www.pythontutor.com/visualize.html
Morpheme Analysis
To use the online morpheme analyzer, register your information at http://information.center/korean. After 'Service Registration', you will get an API key (a 14-character string). Don't share it with others.
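Morpheme Analysis
    import umorpheme.morpheme as um  # assumed import path; the slide uses `um` without defining it

    s = '여러분 반갑습니다, 유니스트에 오신 것을 환영합니다.'  # "Hello everyone, welcome to UNIST."
    api_url = 'http://information.center/api/korean'
    api_key = '3WOK8DKWKS2I59'  # replace with your own API key
    data = um.analyzer(s, api_url, api_key, '유니스트', 1)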
Morpheme Analysis
https://docs.google.com/spreadsheets/d/1-9blXKjtjeKZqsf4NzHeYJCrr49-nXeRF6D80udfcwY/
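Once morphemes have been extracted, a word network can be sketched by counting which words appear together in the same sentence. The example below is only an illustration under stated assumptions: the token lists are hand-written stand-ins (the return format of the analyzer is not shown in the slides), and `cooccurrence_edges` is a hypothetical helper, not part of umorpheme.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count how often each unordered word pair shares a sentence."""
    edges = Counter()
    for words in sentences:
        # Sort the unique words so each pair has one canonical order.
        for a, b in combinations(sorted(set(words)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical morpheme lists, one per sentence:
# 유니스트 = UNIST, 환영 = welcome, 수학 = mathematics
sentences = [
    ["유니스트", "환영"],
    ["유니스트", "수학"],
    ["유니스트", "환영", "수학"],
]
edges = cooccurrence_edges(sentences)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Each counted pair can serve as a weighted edge; with NetworkX installed, such pairs could be loaded into a graph using `Graph.add_edge(a, b, weight=weight)` for further network analysis.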