Keyword search in text

Posted on 16-Jun-2015


Information:

1. It is very important.
2. Its amount increases very fast.

The problem is:

«How to find the necessary information?»

Simple example

Consider an e-library, gen.lib.rus.ec*, a library of scientific literature. It contains more than 250k books.

*(this is not an advertisement, just an example)

Search “физика” (“physics”): 993 results

Search “закон Ньютона” (“Newton’s law”): 0 results

The question arises:

«How to get the list of keywords from each book?»

I will try to answer it in my coursework.

Zipf’s law

Zipf’s law (1940s) is an empirical law.
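
Zipf’s law says that in a large text the frequency of a word is roughly inversely proportional to its rank in the frequency table, so rank × frequency stays roughly constant. A minimal sketch of how to build such a rank-frequency table is given here; the function name `rank_frequency` and the toy sample are illustrative assumptions, not part of the original slides, and a sample this small cannot confirm the law.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, count, rank * count) rows, most frequent first."""
    counts = Counter(words)
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, count, rank * count))
    return rows


# Toy sample; a meaningful check of Zipf's law needs a large corpus.
sample = "the cat and the dog and the bird".split()
for row in rank_frequency(sample):
    print(row)
```

On a real corpus, the last column is the value that Zipf’s law predicts to be approximately constant across ranks.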

TF-IDF weight

Resulting weight = TF × IDF
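
The transcript does not preserve the exact TF and IDF formulas from the slides. The sketch below assumes the common definitions: TF is the relative frequency of a word in a text, and IDF is log(N / df), where N is the number of texts and df is the number of texts containing the word. The function name `tf_idf` and the sample texts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        doc_weights = {}
        for word, count in counts.items():
            tf = count / total                        # assumed TF: relative frequency
            idf = math.log(n_docs / doc_freq[word])   # assumed IDF: log(N / df)
            doc_weights[word] = tf * idf
        weights.append(doc_weights)
    return weights


# Hypothetical, already lemmatised texts.
docs = [
    "newton law force mass".split(),
    "newton law gravity".split(),
    "quantum law field".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

Under these assumptions a word that occurs in every text (here "law") gets weight 0, while a word found in only one text (here "force") gets the highest weight in that text.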

Lemmatisation

Mystem is a program that performs lemmatisation.

It is available for non-commercial use.

By

Algorithm

1. Get a text.
2. For each word:
   - Perform lemmatisation.
   - Find the number of occurrences in the text.
3. Get a list of keywords using Zipf’s law.
4. Get a more accurate list of keywords using TF-IDF.
5. Get the next text.
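
Steps 1 to 3 can be sketched as follows. Two things here are assumptions: the lemmatiser is a hypothetical lookup table standing in for a real tool such as Mystem, and the way Zipf’s law is applied (drop the few most frequent lemmas and the rare ones, keep the middle of the rank-frequency table) is one common reading, since the slides do not say how the cut is made. The names `lemmatize`, `lemma_counts`, `candidate_keywords` and the thresholds are illustrative.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real lemmatiser such as Mystem:
# a tiny lookup table mapping word forms to lemmas.
TOY_LEMMAS = {"laws": "law", "forces": "force", "acts": "act", "acting": "act"}


def lemmatize(word):
    """Return the lemma of a word (toy version; unknown words pass through)."""
    return TOY_LEMMAS.get(word, word)


def lemma_counts(text):
    """Step 2: lemmatise every word and count occurrences of each lemma."""
    words = re.findall(r"\w+", text.lower())
    return Counter(lemmatize(w) for w in words)


def candidate_keywords(counts, skip_top=1, min_count=2):
    """Step 3 (assumed reading): drop the `skip_top` most frequent lemmas
    and every lemma occurring fewer than `min_count` times."""
    ranked = counts.most_common()
    return [word for word, count in ranked[skip_top:] if count >= min_count]


text = (
    "The force acts on the body. Forces obey the laws of motion: "
    "the law of inertia and the law of force."
)
counts = lemma_counts(text)
print(candidate_keywords(counts))
```

On this toy text the function word "of" survives the frequency cut together with "force" and "law", which is the kind of noise that step 4 (TF-IDF across many texts) is meant to remove.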

Keyword search algorithm

Algorithm for searching texts by query

Result

The list of keywords (with their weights) for each text


OWL ontology

Classes:
• Library
• Text
• Keyword

Relations:
• Contains
• Has keyword
• Appears in text
• Has TF-IDF equal to
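
The classes and relations listed above can be mirrored in plain Python to show how they fit together. This is not OWL and not the author's ontology file; it is a rough sketch in which the mapping of relations to fields, the method names, and the weights in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    lemma: str


@dataclass
class Text:
    title: str
    # "Has keyword" together with its TF-IDF value: lemma -> weight.
    keyword_weights: dict = field(default_factory=dict)

    def add_keyword(self, keyword, tfidf):
        self.keyword_weights[keyword.lemma] = tfidf


@dataclass
class Library:
    texts: list = field(default_factory=list)  # "Contains"

    def texts_with(self, keyword):
        """'Appears in text': texts that have the keyword, heaviest first."""
        hits = [t for t in self.texts if keyword.lemma in t.keyword_weights]
        return sorted(
            hits, key=lambda t: t.keyword_weights[keyword.lemma], reverse=True
        )


# Made-up titles and weights, for illustration only.
library = Library()
mechanics = Text("Mechanics")
mechanics.add_keyword(Keyword("force"), 0.27)
optics = Text("Optics")
optics.add_keyword(Keyword("force"), 0.05)
optics.add_keyword(Keyword("lens"), 0.31)
library.texts.extend([optics, mechanics])

print([t.title for t in library.texts_with(Keyword("force"))])
```

Ordering the matching texts by weight is one simple way a query could be answered from the stored keywords.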
