Korean Word Network

Date post: 16-Aug-2015
Upload: kyunghoon-kim

Korean Word Network
v.0.1.0

August 7, 2015

Kyunghoon Kim

Department of Mathematical Sciences
Ulsan National Institute of Science and Technology
Republic of Korea

Contents

What is text mining?
  Text is the most common vehicle
  Features of Text Mining
  Document clustering

Warming up
  Installation 1. Python
  Installation 2. pip upgrade and library install

Python
  IPython
  Python Basic Coding

Text Mining Example
  Morpheme Analysis

Network Analysis

Introduction

What is text mining?
▶ Data mining is about looking for patterns in data.
▶ Text mining is about looking for patterns in text.
▶ It is the process of analyzing text to extract information that is useful for particular purposes.
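
As a minimal illustration of looking for patterns in text, the sketch below counts word frequencies using only Python's standard library. The sample sentence and the `word_counts` helper are made up for illustration. Real text mining, especially for Korean, would normally split text with a morpheme analyzer rather than a regular expression.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each lowercase word occurs in the text."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)

# Illustrative sample text (not from the slides).
sample = "Data mining looks for patterns in data. Text mining looks for patterns in text."
counts = word_counts(sample)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

Frequent words are the simplest kind of pattern. The same counts can later feed document comparison or a word network.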

▶ Text is the most common vehicle for the formal exchange of information.

▶ Comprehensible
▶ → Text summarization
▶ Document classification [supervised learning]
▶ Document clustering [unsupervised learning]
▶ Metadata extraction
▶ etc.

Document clustering can assist information retrieval by creating links between similar documents.

http://www.codeproject.com/Articles/439890/Text-Documents-Clustering-using-K-Means-Algorithm

http://www.nature.com/nmeth/journal/v8/n6/fig_tab/nmeth.1619_F1.html
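
The idea of creating links between similar documents can be sketched with pairwise cosine similarity over word counts. This is a simplified stand-in, not a clustering algorithm such as the k-means method in the first link above. The documents, the `similar_links` helper, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

def bag_of_words(text):
    """Represent a document as word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_links(docs, threshold=0.5):
    """Return (name, name, score) for every document pair at or above the threshold."""
    vectors = {name: bag_of_words(text) for name, text in docs.items()}
    links = []
    for x, y in combinations(sorted(vectors), 2):
        score = cosine_similarity(vectors[x], vectors[y])
        if score >= threshold:
            links.append((x, y, round(score, 2)))
    return links

# Illustrative documents (not from the slides).
docs = {
    "a": "python text mining tutorial",
    "b": "text mining with python",
    "c": "football match results",
}
links = similar_links(docs)
print(links)
```

Documents "a" and "b" share three of their four words, so they are linked, while "c" shares none and stays unlinked.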

Warming up

Installation 1. Python
▶ Download Anaconda at http://continuum.io/downloads
▶ Install it. (Be careful: install with the "All Users" option.)

Anaconda is a completely free Python distribution. It includes over 195 of the most popular Python packages for science, math, engineering, and data analysis, e.g., IPython, NumPy, SciPy, Pandas, Scikit-learn, and especially NetworkX.
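
After installation, one way to confirm that the packages named above are available is to ask Python whether it can locate them. This is a minimal sketch: the `missing_packages` helper is hypothetical, and the import names (for example `sklearn` for Scikit-learn) are the usual ones, not something stated on the slide.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Usual import names for the packages listed on the slide (assumed).
packages = ["IPython", "numpy", "scipy", "pandas", "sklearn", "networkx"]
missing = missing_packages(packages)
print("Missing packages:", missing or "none")
```

Any name that is reported as missing can then be installed with pip.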

Installation 2. pip upgrade and library install
If you are a Windows user, edit the following file first.

▶ Open the file 'C:\Anaconda\Lib\site.py'
▶ Replace

def setencoding():
    encoding = "ascii"

▶ with

def setencoding():
    encoding = "mbcs"
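
A quick, hedged way to see which encoding the interpreter is using is to print it. The `setencoding` function belongs to the Python 2 `site.py`. On Python 3 the default string encoding is always 'utf-8' and `site.py` no longer contains that function, so the edit applies only to a Python 2 installation.

```python
import locale
import sys

# Encoding Python uses by default when converting between text and bytes.
default_encoding = sys.getdefaultencoding()

# Encoding suggested by the operating system locale (often a code page on Windows).
preferred_encoding = locale.getpreferredencoding(False)

print("Default string encoding:", default_encoding)
print("Locale preferred encoding:", preferred_encoding)
```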

Press the Windows key, type 'cmd', and press Enter. Then run:

▶ python -m pip install -U pip
▶ pip install umorpheme

IPython: Interactive Computing

▶ IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.
▶ IPython Notebook is a web-based interactive computational environment for creating IPython notebooks. An IPython notebook is a JSON document containing an ordered list of input/output cells which can contain code, text, mathematics, plots and rich media.
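
To make the "JSON document with an ordered list of cells" description concrete, the sketch below builds a tiny notebook structure by hand and round-trips it through JSON. The field names follow notebook format version 4 as commonly documented. Treat them as an assumption, since the slides do not show the file format, and note that the cell contents are made up.

```python
import json

# A hand-written notebook with one markdown cell and one code cell.
# Field names assume notebook format version 4.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Korean Word Network"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

# Serialize to JSON text (what an .ipynb file contains) and parse it back.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

# The cell order is preserved because "cells" is a JSON list.
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

In practice the notebook server writes these files itself, so they are rarely edited by hand.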

Execution
1. cmd
2. mkdir test
3. cd test
4. ipython notebook

If your logo is [logo image], run 'pip install -U ipython' to upgrade to version 3 (Jupyter).

New ipython notebook

Python: Basic Coding

References
1. English: https://wakari.io/nb/url///wakari.io/static/notebooks/Lecture_1_Introduction_to_Python_Programming.ipynb
2. Korean: https://wikidocs.net/book/1
3. Interactive Tutorial: http://interactivepython.org/runestone/static/pythonds/index.html
4. List of Python Learning: https://www.codementor.io/learn-python-online

5. Online Python: https://www.pythonanywhere.com/try-ipython/
6. Python Visualization: http://www.pythontutor.com/visualize.html

Morpheme Analysis

To use the online morpheme analyzer, register your information at http://information.center/korean. After 'Service Registration', you will get an API key (a 14-character string). Don't share it with others.
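
One way to avoid sharing the key by accident is to keep it out of the source code and read it from an environment variable. This is only a sketch: the variable name `KOREAN_API_KEY` and the `load_api_key` helper are hypothetical and are not defined by the service. The only check applied is the 14-character length mentioned above.

```python
import os

# Hypothetical environment variable name; the service does not define one.
API_KEY_ENV = "KOREAN_API_KEY"

def load_api_key(env=os.environ):
    """Read the API key from a mapping and check that it has 14 characters."""
    key = env.get(API_KEY_ENV, "")
    if len(key) != 14:
        raise ValueError("expected a 14-character API key in " + API_KEY_ENV)
    return key

# Demonstration with a plain dict and a placeholder value (not a real key).
demo_env = {API_KEY_ENV: "ABCDEFGHIJKLMN"}
api_key = load_api_key(demo_env)
print(len(api_key))
```

Calling `load_api_key()` with no argument reads the real environment, so the key never has to appear in a notebook that might be shared.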

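# Sentence to analyze; it means "Hello everyone, welcome to UNIST."
s = '여러분반갑습니다, 유니스트에오신것을환영합니다.'
api_url = 'http://information.center/api/korean'
api_key = '3WOK8DKWKS2I59'
# um presumably refers to the umorpheme library installed earlier; the import is not shown on this slide.
# '유니스트' is "UNIST" written in Korean.
data = um.analyzer(s, api_url, api_key, '유니스트', 1)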

https://docs.google.com/spreadsheets/d/1-9blXKjtjeKZqsf4NzHeYJCrr49-nXeRF6D80udfcwY/

Network Analysis

Next Time ...

