Data Mining and Text Analytics in Music
Audi Sugianto and Nicholas Tawonezvi

Date post: 24-Feb-2016

Overview
- Introduction
- Building a ground truth set
- Experiments
- Results

Introduction
- Purpose: music mood classification through lyric text mining approaches
- MIR (Music Information Retrieval)
- Use of audio datasets:
  - AMC (Audio Mood Classification)
  - USPOP, USCRAP, etc.
- Use of social tags from last.fm
- Challenges:
  - Natural subjectivity of music
  - Human perspectives on mood

Generating Ground Truth: Data Collection
- Combination of in-house and public audio tracks
- Collect songs with at least one social tag from last.fm
- Lyrics gathered mainly from Lyricwiki.org
- Use of Lingua to ensure data quality
- Finalise songs that have both correct lyrics and tags
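The final selection step on this slide can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Song` record, its field names, and the sample data are hypothetical, and the quality check done with Lingua is reduced to a placeholder flag.

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    # Hypothetical record; the field names are assumptions for illustration.
    title: str
    tags: list = field(default_factory=list)  # social tags, e.g. from last.fm
    lyrics: str = ""                          # lyric text from a lyrics site
    lyrics_ok: bool = False                   # placeholder for the quality check


def select_songs(songs):
    """Keep songs with at least one tag and non-empty lyrics that passed the check."""
    return [s for s in songs if s.tags and s.lyrics.strip() and s.lyrics_ok]


songs = [
    Song("A", tags=["calm"], lyrics="la la la", lyrics_ok=True),
    Song("B", tags=[], lyrics="some words", lyrics_ok=True),   # dropped: no tags
    Song("C", tags=["sad"], lyrics="", lyrics_ok=False),        # dropped: no lyrics
]
selected = select_songs(songs)
print([s.title for s in selected])
```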

Generating Ground Truth: Algorithms, Resources and Techniques
- WordNet-Affect
  - Used to filter out junk tags
  - Assignment of labels to concepts (emotions, moods, responses)
- Use of human expertise to identify mood-related words in the music domain
  - Affective aspect
  - Judgemental tags
  - Ambiguous meanings
- Use of WordNet to categorise tags into groups based on synonyms
- Use of music experts to merge groups by musical similarity
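Grouping tags by synonymy can be pictured as finding connected components over synonym pairs. The sketch below assumes the pairs have already been looked up (for example in WordNet); the pair list and tag names are made up for illustration, and treating synonymy as transitive is a simplification rather than something the slides state.

```python
def group_by_synonyms(tags, synonym_pairs):
    """Group tags into connected components of the synonym relation."""
    parent = {t: t for t in tags}

    def find(t):
        # Follow parent links to the group representative (with path halving).
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in synonym_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for t in tags:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical tags and synonym pairs, standing in for real WordNet lookups.
tags = ["calm", "serene", "tranquil", "sad", "unhappy", "angry"]
pairs = [("calm", "serene"), ("serene", "tranquil"), ("sad", "unhappy")]
groups = group_by_synonyms(tags, pairs)
print(groups)
```

Merging groups by musical similarity, which the slide assigns to music experts, is a manual step and is not modelled here.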

Generating Ground Truth: Selecting Songs
Approaches:
- Tag identification
- Lyric counts
- Multi-label classification

Mood Categories and Song Distributions

Experiments: Evaluation Measures and Classifiers
- Use of 10-fold cross-validation
  - Break data into 10 sets of size n/10
  - Train on 9 sets and test on 1
  - Repeat 10 times and take the mean accuracy
- Classification with Support Vector Machines (SVM)
  - Algorithms to analyse data and recognise patterns
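The 10-fold procedure described above can be sketched in plain Python. The function names are illustrative, and a majority-class classifier stands in for the SVM only so that the example runs without extra libraries; it is not the classifier used in the experiments.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, return the mean accuracy."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Stand-in classifier (assumption): always predicts the most common training label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


samples = list(range(20))      # placeholder feature vectors
labels = ["happy"] * 20        # toy labels, one class only
mean_acc = cross_validate(samples, labels, train_majority, predict_majority)
print(mean_acc)
```

The sketch assumes at least k samples, so that no fold is empty.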

Experiments: Lyric Preprocessing
Facts:
- Repetitions of words and sections are abbreviated, so lyrics are not verbatim transcripts
- Lyrics consist of sections (intro, interlude, verse, etc.) marked in the annotations
- Annotations also include notes about the song and instrumentation
Possible solution:
- Identify repetition and annotation patterns and convert them to the actual repeated segments
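One way to picture the proposed solution is a small expander that replaces repetition annotations with the text they stand for. The annotation formats handled here (`[Chorus]`, `[Chorus x2]`, and a trailing `(x2)` on a line) are assumptions chosen for illustration; the slides do not specify the patterns that were actually used.

```python
import re

# Assumed annotation formats: "[Name]" or "[Name xN]" headers, and "text (xN)" lines.
HEADER = re.compile(r"^\[(?P<name>[A-Za-z ]+?)(?:\s+x(?P<times>\d+))?\]$")
LINE_REPEAT = re.compile(r"^(?P<text>.*?)\s*\(x(?P<times>\d+)\)$")


def expand_line_repeats(lines):
    """Turn 'text (xN)' into N copies of 'text'; other lines pass through."""
    out = []
    for line in lines:
        m = LINE_REPEAT.match(line.strip())
        if m:
            out.extend([m.group("text")] * int(m.group("times")))
        else:
            out.append(line.strip())
    return out


def expand_lyrics(raw):
    """Return lyric lines with section and repetition annotations expanded."""
    sections = {}  # section name -> lines seen under that header
    out = []
    for chunk in raw.strip().split("\n\n"):
        block = chunk.splitlines()
        if not block:
            continue
        header = HEADER.match(block[0].strip())
        body = expand_line_repeats(block[1:] if header else block)
        if header:
            name = header.group("name").lower()
            times = int(header.group("times") or 1)
            if body:
                sections[name] = body          # remember the section text
            else:
                body = sections.get(name, [])  # bare header: reuse stored text
            body = body * times
        out.extend(body)
    return out


raw = """[Verse]
I walk alone

[Chorus]
Hold on (x2)

[Chorus x2]"""
expanded = expand_lyrics(raw)
print(expanded)
```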

Experiments: Lyrics Features
Features common in text classification tasks:
- Bag-of-words (BOW): collection of unordered words
- Part-of-speech (POS): use of the Stanford Tagger
- Function words (the, a, etc.)
Assigning of values:
- Frequency
- Tf-idf weight
- Normalised frequency
- Boolean value
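The four value schemes listed above can be computed for a bag-of-words representation as follows. This is a sketch under stated assumptions: tf-idf is taken as raw count times log(N / document frequency), which is one common variant, and normalised frequency is taken as count divided by document length; the slides do not say which formulas were used. The toy documents are invented.

```python
import math
from collections import Counter


def bow_features(docs):
    """Compute four value schemes for each tokenised (non-empty) document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    features = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        features.append({
            "frequency": dict(counts),
            "normalised": {t: c / total for t, c in counts.items()},
            "boolean": {t: 1 for t in counts},
            "tfidf": {t: c * math.log(n_docs / df[t]) for t, c in counts.items()},
        })
    return features


docs = [["love", "love", "you"], ["cry", "you"]]  # toy tokenised lyrics
feats = bow_features(docs)
print(feats[0]["frequency"], feats[0]["tfidf"])
```

With this tf-idf variant, a term that occurs in every document (here "you") gets a weight of zero.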

Experiments: Stemming
- Stemming: merging words with the same morphological root
- Snowball Stemmer
  - Irregular nouns and verbs as inputs
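A rough sketch of how irregular word lists might be combined with a stemmer: irregular forms are mapped to their base form by lookup, and the result is passed to the stemmer. The lookup table is a tiny hypothetical sample, and `naive_stem` is a stand-in suffix stripper, not the Snowball algorithm; any real stemmer could be passed in through the `stemmer` parameter.

```python
# Hypothetical sample of irregular forms; real lists would be much longer.
IRREGULAR = {"went": "go", "gone": "go", "children": "child", "feet": "foot"}


def naive_stem(word):
    """Stand-in suffix stripper for illustration; NOT the Snowball algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_with_irregulars(tokens, stemmer=naive_stem, irregular=IRREGULAR):
    """Map irregular forms to their base form, then apply the stemmer."""
    out = []
    for tok in tokens:
        tok = tok.lower()
        base = irregular.get(tok, tok)  # lookup first, fall back to the token
        out.append(stemmer(base))
    return out


stems = stem_with_irregulars(["Children", "went", "walking", "walks", "feet"])
print(stems)
```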

Results
- Text categorisation provides dimensionality and good generalisability
- POS Boolean representation is poorer because of high content of POS types in lyrics
- Content words are more useful in mood classification

10th International Society for Music Information Retrieval Conference (ISMIR 2009)

Acknowledgement

Hu, X. et al. 2009. Lyric Text Mining in Music Mood Classification. International Music Information Retrieval Systems Evaluation Laboratory, University of Illinois at Urbana-Champaign. [Online]. pp.411-416. [Accessed 6 December 2013]. Available from: http://ismir2009.ismir.net/proceedings/PS3-4.pdf

Training and Testing Data Sets. 2013. Training and Testing Data Sets. [Online]. [Accessed 5 December 2013]. Available from: http://technet.microsoft.com/en-us/library/bb895173.aspx

Kohavi, Ron (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12): 1137-1143. (Morgan Kaufmann, San Mateo, CA)

D. Ellis, A. Berenzweig, and B. Whitman: The USPOP2002 Pop Music Data Set. Available from: http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html

Software & Additional Resources
- http://www.music-ir.org/mirex/2007/index.php/AMC
- http://en.wikipedia.org/wiki/MoodLogic
- http://search.cpan.org/search%3fmodule=Lingua::Ident - Statistical language identifier
- http://snowball.tartarus.org/
- http://www.englishpage.com/irregularverbs/irregularverbs.htm - irregular verb list
- http://www.esldesk.com/eslquizzes/irregular-nouns/irregular-nouns.htm - irregular noun list
- http://nlp.stanford.edu/software/tagger.shtml - POS Tagger
- http://www.music-ir.org/mirex/2007/abs/AI_CC_GC_MC_AS_tzanetakis.pdf
- http://www.music-ir.org/archive/figs/18moodcat.htm - Mood Categories & Song Distributions
- http://www.originlab.com/index.aspx?go=Products/Origin/Statistics/NonparametricTests&pid=1087 - Performance identifier

