Collaborative Filtering and Rules for Music Object Rating and Selection
Sifter Project Meeting
Michelle Anderson, Marcel Ball, Harold Boley, Nancy Howse, Daniel Lemire
NRC IIT, Fredericton, NB, Canada
June 19th, 2003 (Revised June 18th)
How to implement industry standards for existing Sifter subprojects: RALOCA, COFI Music?
Currently, several industry standards are in place to facilitate the description, search, storage, etc., of Learning Objects.*
An LO can be expressed as an entity with content surrounded by an outer shell of descriptive tags (metadata).
* Learning Objects can be composed of multimedia content (images, video, sound), instructional content, learning objectives, or a combination of these different formats.
Learning Objects: Metadata components*

- General
- Life Cycle
- Meta-Metadata
- Technical
- Educational
- Rights
- Relation
- Annotation
- Classification

*Based on the SCORM Meta-data Information Model
Where do RALOCA and COFI Music come in?
If these systems (separately, or combined into one entity) can interpret relevant meta-information about an LO, based on current standards that provide interoperability, then these LOs can be sifted, weighted, or compared.
(Diagram: RALOCA / COFI Music interpret LOs from an LO repository via the standards SCORM, CanCore, RSS-LOM, and IMS.)
Collaborative Filtering Systems (COFI)
• Collects ratings from a number of users
• Recommends items to user based on correlations between ratings of current user and other users in database
Some Algorithms
• Average – O(1)
• Per Item Average – O(1)
• Pearson – O(m)
• Where ‘m’ is the number of users.
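As a rough illustration (not COFI's actual code), the per-item average and a Pearson-weighted prediction could be sketched as follows; the rating data and function names are hypothetical:

```python
import math

# Hypothetical in-memory rating store: ratings[user][item] = value (0-10).
ratings = {
    "alice": {"a1": 8, "a2": 3, "a3": 7},
    "bob":   {"a1": 7, "a2": 2, "a3": 6, "a4": 9},
    "carol": {"a1": 2, "a2": 9, "a4": 4},
}

def per_item_average(item):
    """O(1) per lookup if averages are precomputed; shown naively here."""
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals) if vals else None

def pearson(u, v):
    """Pearson correlation over the items both users rated (O(m) over users overall)."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - mx) ** 2 for x in xs))
           * math.sqrt(sum((y - my) ** 2 for y in ys)))
    return num / den if den else 0.0

def predict(user, item):
    """Weight other users' ratings of `item` by their correlation with `user`."""
    num = den = 0.0
    for other in ratings:
        if other != user and item in ratings[other]:
            w = pearson(user, other)
            num += w * ratings[other][item]
            den += abs(w)
    return num / den if den else per_item_average(item)
```

The fallback to the per-item average when no correlated neighbour exists is one plausible design choice, not necessarily COFI's.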
Some Admin Features…
• Add/remove items
• Remove users
• View a list of users and the number of items they have rated
• View the ratings of a user
RALOCA
Rule-Applying Learning-Object
Comparison Agent
Marcel A. Ball
National Research Council
Institute for Information Technology, e-Business
RALOCA
• RALOCA is a rule-based system for multi-dimensional comparison of learning objects (currently, music albums) based on jDREW Bottom-up (BU) with data represented in Object-Oriented RuleML.
• Part of Sifter Mosaic/NRC e-Learning project
The functionality of RALOCA
• COFI provides RALOCA with a table of predictions (summarized ratings)
• RALOCA uses a rule-based approach to combine the multi-dimensional predictions from COFI into a one-dimensional ranking of the items (objects)
Interfacing with COFI Music
• RALOCA builds on top of collaborative filtering technology from the COFI Music project (Nancy Howse) for ratings of the LOs
• Currently data is exchanged between RALOCA and COFI Music using Java serialization
• Currently has code in place to use the per item average algorithm
• We will use more advanced collaborative filtering algorithms, which will lead to better predictions
LO RuleML Representation

<fact>
  <_head><atom>
    <_opr><rel>product</rel></_opr>
    <_r n="asin"><ind>B00004YTYO</ind></_r>
    <_r n="title"><ind>Between the Bridges</ind></_r>
    <_r n="artist"><ind>Sloan</ind></_r>
    <_r n="cost"><ind>15.99</ind></_r>
    <_r n="lyrics"><ind>6</ind></_r>
    <_r n="originality"><ind>8</ind></_r>
    <_r n="performance"><ind>6</ind></_r>
    . . .
  </atom></_head>
</fact>
Modification Rules

Modification rules allow the system to dynamically change values for the dimensions of an LO, based on information about the LOs and the user profile.

Example: There is a 5% discount for students buying products costing over $20.00.
The modify relation has four roles:
- amount: in our example this is '%-5'
- variable: we want to change the 'cost'
- product: a variable that will hold the asin of the LO
- comment
<imp>
  <_head>
    <atom>
      <_opr><rel>modify</rel></_opr>
      <_r n="amount"><ind>%-5</ind></_r>
      <_r n="comment"><ind>5% discount for students</ind></_r>
      <_r n="variable"><ind>cost</ind></_r>
      <_r n="product"><var>ASIN</var></_r>
    </atom>
  </_head>
  <_body>
    <and>
      <atom>
        <_opr><rel>isstudent</rel></_opr>
        <ind>yes</ind>
      </atom>
      <atom>
        <_opr><rel>product</rel></_opr>
        <_r n="asin"><var>ASIN</var></_r>
        <_r n="cost"><var>COST</var></_r>
      </atom>
      <atom>
        <_opr><rel>$gt</rel></_opr>
        <var>COST</var><ind>20</ind><ind>true</ind>
      </atom>
    </and>
  </_body>
</imp>
The three body atoms:
- Is the user a student?
- Retrieve the asin and cost of the LO
- Is the cost greater than 20?
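The same discount rule can be mirrored procedurally as a sketch; the function name and the over-$20 album below are hypothetical (RALOCA actually evaluates the OO RuleML rule in jDREW BU):

```python
# Hypothetical procedural mirror of the modification rule above:
# a 5% discount on cost for students, applied only when cost > 20.
def apply_student_discount(product, is_student):
    """product: dict with 'asin' and 'cost'; returns a modified copy."""
    modified = dict(product)
    if is_student and product["cost"] > 20:
        modified["cost"] = round(product["cost"] * 0.95, 2)
    return modified

# Hypothetical album priced over $20 (the slide's example album is 15.99,
# so it would not be discounted).
album = {"asin": "B000EXAMPLE", "cost": 22.00}
apply_student_discount(album, True)  # cost reduced by 5%
```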
XML Representation of n-Dimensional Object Ratings
• Ratings of (music, film, …) objects will be on a scale from 0 to 10
• COFI’s n-dimensional ratings of a given music object with some asin code can be represented in OO RuleML as ‘complex term’ (cterm) elements:
– A rate value v becomes marked up as <ind>v</ind>.
– Each v-rated dimension d becomes <_r n="d"><ind>v</ind></_r>.
• There is one rating cterm for every music object and for every rater: One row from the COFI prediction table
Two Sample Ratings

For example, for object asinXYZ let us illustrate 3 dimensions, "lyrics", "originality", and "performance", as rated by 2 raters:
<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>3</ind></_r>
  <_r n="originality"><ind>8</ind></_r>
  <_r n="performance"><ind>6</ind></_r>
</cterm>

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>7</ind></_r>
  <_r n="originality"><ind>8</ind></_r>
  <_r n="performance"><ind>4</ind></_r>
</cterm>

(The slide depicts each cterm as a tree: a 'rating' ctor node with role branches lyrics, originality, and performance leading to the <ind> values.)
COFI: Generating a Summarized Rating

These ratings can act as a 'training set' of typical instances, and a weighted representation can be inferred, e.g. using data-mining / collaborative filtering techniques: in this example just the arithmetic means, using the standard deviations to determine the significance (w) of the ratings.
<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics" w="0.2"><ind>5</ind></_r>
  <_r n="originality" w="0.7"><ind>8</ind></_r>
  <_r n="performance" w="0.1"><ind>5</ind></_r>
</cterm>
The weights, w, on a scale from 0.0 to 1.0, reflect the raters’ agreements in each of the dimensions (weights add up to 1.0).
<_r n="lyrics">
  <cterm>
    <_opc><ctor>rating</ctor></_opc>
    <_r n="value"><ind>6</ind></_r>
    <_r n="stddev"><ind>2.4</ind></_r>
  </cterm>
</_r>
<_r n="music">
  <cterm>
    <_opc><ctor>rating</ctor></_opc>
    <_r n="value"><ind>8</ind></_r>
    <_r n="stddev"><ind>3.8</ind></_r>
  </cterm>
</_r>
<_r n="performance">
  <cterm>
    <_opc><ctor>rating</ctor></_opc>
    <_r n="value"><ind>3</ind></_r>
    <_r n="stddev"><ind>1.6</ind></_r>
  </cterm>
</_r>
. . .
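One way to produce such a summarized rating (illustrative only; the slides do not fix the exact significance formula) is to take per-dimension means and derive normalized weights that shrink as the raters' standard deviation grows:

```python
import statistics

# Hypothetical summarization: per-dimension mean, with significance weights
# derived from standard deviations (lower spread -> higher weight),
# normalized so the weights sum to 1.0.
def summarize(dimension_ratings):
    """dimension_ratings: {dimension: [ratings by different users]}"""
    means = {d: statistics.mean(v) for d, v in dimension_ratings.items()}
    stddevs = {d: statistics.pstdev(v) for d, v in dimension_ratings.items()}
    # One possible significance measure: inverse of (1 + stddev).
    raw = {d: 1.0 / (1.0 + s) for d, s in stddevs.items()}
    total = sum(raw.values())
    weights = {d: raw[d] / total for d in raw}
    return means, weights

# The two sample ratings from the earlier slide, grouped by dimension.
means, weights = summarize({"lyrics": [3, 7],
                            "originality": [8, 8],
                            "performance": [6, 4]})
# "originality" (no disagreement) receives the largest weight;
# all weights sum to 1.0.
```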
Ranking by Standard Deviation

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind>B00004YTYO</ind></_r>
      <_r n="title"><ind>Between the Bridges</ind></_r>
      . . .
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind>13</ind></_r>
      <_r n="genre"><ind>Traditional</ind></_r>
      <_r n="label"><ind>BMG</ind></_r>
    </atom>
  </_head>
</fact>
RALOCA: Retrieval Patterns
A retrieval pattern can now be used to find a subset of ranked instances from (a ‘test set’ of ) many instances, based on the summarized rating.
For example, a user might specify desired (minimum) "lyrics", "originality", and "performance" ratings along with their weights.
<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics" w="0.4"><ind>8</ind></_r>
  <_r n="originality" w="0.3"><ind>7</ind></_r>
  <_r n="performance" w="0.3"><ind>6</ind></_r>
</cterm>
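A retrieval pattern like this could be applied as in the following sketch; this is not RALOCA's rule-based implementation, and the second item below is invented for contrast:

```python
# Hypothetical matcher for a retrieval pattern: an item qualifies if each
# dimension meets the user's minimum, and qualifying items are ranked by
# the weighted sum of their summarized ratings.
def matches(pattern_minimums, item_ratings):
    return all(item_ratings.get(d, 0) >= m for d, m in pattern_minimums.items())

def weighted_score(weights, item_ratings):
    return sum(w * item_ratings.get(d, 0) for d, w in weights.items())

pattern = {"lyrics": 8, "originality": 7, "performance": 6}      # from the slide
weights = {"lyrics": 0.4, "originality": 0.3, "performance": 0.3}

items = {
    "asinXYZ": {"lyrics": 8, "originality": 7, "performance": 6},
    "asinABC": {"lyrics": 5, "originality": 9, "performance": 9},  # hypothetical
}
ranked = sorted((a for a, r in items.items() if matches(pattern, r)),
                key=lambda a: weighted_score(weights, items[a]), reverse=True)
# only asinXYZ survives: asinABC fails the "lyrics" minimum
```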
XSLT: OO RuleML to “Song Rating XML”
We can use XSLT to transform generic OO RuleML into a domain-specific positional format ("Song Rating XML") for rating of music objects.
<cterm uriref="asinXYZ/userid">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>6</ind></_r>
  <_r n="originality"><ind>9</ind></_r>
  <_r n="performance"><ind>6</ind></_r>
</cterm>

        <=== XSLT ===>

<rating song="asinXYZ" user="userid">
  <lyrics>6</lyrics>
  <originality>9</originality>
  <performance>6</performance>
</rating>
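The same transformation can be sketched in Python as a stand-in for the actual XSLT stylesheet (element names follow the slide):

```python
import xml.etree.ElementTree as ET

# An OO RuleML rating cterm, as on the slide.
oo_ruleml = """<cterm uriref="asinXYZ/userid">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>6</ind></_r>
  <_r n="originality"><ind>9</ind></_r>
  <_r n="performance"><ind>6</ind></_r>
</cterm>"""

def to_song_rating_xml(cterm_text):
    """Turn a rating cterm into the positional 'Song Rating XML'."""
    cterm = ET.fromstring(cterm_text)
    song, user = cterm.get("uriref").split("/")
    rating = ET.Element("rating", {"song": song, "user": user})
    for role in cterm.findall("_r"):
        dim = ET.SubElement(rating, role.get("n"))  # role name -> element name
        dim.text = role.find("ind").text
    return ET.tostring(rating, encoding="unicode")

print(to_song_rating_xml(oo_ruleml))
```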
OO RuleML to Positional RuleML Translators

• XSLT Transformations
• 3-Step Process
• Similar to Unix pipes

(Diagram: a signature (database schema, template) in OO RuleML representation is applied in turn to each implementation <fact>.)
applysig.xsl
<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="title"><ind>Between the Bridges</ind></_r>
      <_r n="artist"><ind>Sloan</ind></_r>
      <_r n="asin"><ind>B00004YTYO</ind></_r>
      <_r n="publishYear"><ind>1997</ind></_r>
      <_r n="numtracks"><ind>13</ind></_r>
      <_r n="genre"><ind>Traditional</ind></_r>
      <_r n="cost"><ind>14.99</ind></_r>
      <_r n="label"><ind>BMG</ind></_r>
    </atom>
  </_head>
</fact>
The signature is applied to atoms with rel = product and fills in missing roles. All order is lost.
<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="label"><ind>BMG</ind></_r>
      ...
      <_r n="publishYear"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      ...
      <_r n="asin"><ind>B00004YTYO</ind></_r>
    </atom>
  </_head>
</fact>

<signature ... order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind/></_r>
      <_r n="title"><ind/></_r>
      <_r n="artist"><ind/></_r>
      <_r n="cost"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind/></_r>
      <_r n="genre"><ind/></_r>
      <_r n="label"><ind/></_r>
    </atom>
  </_head>
</signature>
The order of the signature is applied to the atom when order="sorted"; otherwise the _r elements are sorted by their n attribute.
nprmlsort.xsl
<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="label"><ind>BMG</ind></_r>
      ...
      <_r n="publishYear"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      ...
      <_r n="asin"><ind>B00004YTYO</ind></_r>
    </atom>
  </_head>
</fact>

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind> ... </ind></_r>
      <_r n="title"><ind> ... </ind></_r>
      <_r n="artist"><ind> ... </ind></_r>
      <_r n="cost"><ind> ... </ind></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind> ... </ind></_r>
      <_r n="numtracks"><ind> ... </ind></_r>
      <_r n="genre"><ind> ... </ind></_r>
      <_r n="label"><ind> ... </ind></_r>
    </atom>
  </_head>
</fact>

<signature ... order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind/></_r>
      <_r n="title"><ind/></_r>
      <_r n="artist"><ind/></_r>
      <_r n="cost"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind/></_r>
      <_r n="genre"><ind/></_r>
      <_r n="label"><ind/></_r>
    </atom>
  </_head>
</signature>
oorml2prml.xsl
<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind> ... </ind></_r>
      <_r n="title"><ind> ... </ind></_r>
      <_r n="artist"><ind> ... </ind></_r>
      <_r n="cost"><ind> ... </ind></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind> ... </ind></_r>
      <_r n="numtracks"><ind> ... </ind></_r>
      <_r n="genre"><ind> ... </ind></_r>
      <_r n="label"><ind> ... </ind></_r>
    </atom>
  </_head>
</fact>

<signature ... order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind/></_r>
      <_r n="title"><ind/></_r>
      <_r n="artist"><ind/></_r>
      <_r n="cost"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind/></_r>
      <_r n="genre"><ind/></_r>
      <_r n="label"><ind/></_r>
    </atom>
  </_head>
</signature>

<signature ... order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <ind/> <ind/> <ind/> <ind/> <ind/> <ind/> <ind/>
      <ind/> <ind/> <ind/> <ind/> <ind/> <ind/>
    </atom>
  </_head>
</signature>

Metaroles (_r) are removed, leaving a positionalized version of each atom:

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <ind>B00004YTYO</ind>
      <ind>Between the Bridges</ind>
      <ind>Sloan</ind>
      <ind>14.99</ind>
      <ind/> <ind/> <ind/> <ind/> <ind/> <ind/>
      <ind>13</ind>
      <ind>Traditional</ind>
      <ind>BMG</ind>
    </atom>
  </_head>
</fact>
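The net effect of the three stylesheets on a single atom can be sketched as follows; the helper is hypothetical and uses a hard-coded signature order:

```python
import xml.etree.ElementTree as ET

# Sketch of the three-step translation for one atom: fill in roles missing
# from the signature, order them as in the signature, then drop the role
# wrappers, leaving a positional atom.
SIGNATURE = ["asin", "title", "artist", "cost", "lyrics", "music",
             "performance", "originality", "quality", "publishYear",
             "numtracks", "genre", "label"]

def positionalize(atom):
    values = {r.get("n"): (r.find("ind").text or "") for r in atom.findall("_r")}
    out = ET.Element("atom")
    opr = ET.SubElement(out, "_opr")
    ET.SubElement(opr, "rel").text = "product"
    for name in SIGNATURE:           # signature order; missing roles -> empty <ind/>
        ET.SubElement(out, "ind").text = values.get(name, "")
    return out

atom = ET.fromstring(
    '<atom><_opr><rel>product</rel></_opr>'
    '<_r n="label"><ind>BMG</ind></_r>'
    '<_r n="asin"><ind>B00004YTYO</ind></_r></atom>')
inds = [i.text for i in positionalize(atom).findall("ind")]
# first position holds the asin, last the label; all others are empty
```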
Relational Database
• Table 1: Ratings (ItemID, UserID, Dimension1, …)
• Table 2: Users (UserID, UserName, Password, …)
• Table 3: Comments (ItemID, UserID, Date, Comment, …)
• Table 4: Item (ItemID, Title, Author, …)
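A minimal sqlite sketch of this schema; the rating-dimension columns and all keys/constraints below are assumptions beyond what the slide lists:

```python
import sqlite3

# Hypothetical in-memory version of the four tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users    (UserID INTEGER PRIMARY KEY, UserName TEXT, Password TEXT);
CREATE TABLE Items    (ItemID TEXT PRIMARY KEY, Title TEXT, Author TEXT);
CREATE TABLE Ratings  (ItemID TEXT, UserID INTEGER, Lyrics INTEGER,
                       Originality INTEGER, Performance INTEGER,
                       PRIMARY KEY (ItemID, UserID));
CREATE TABLE Comments (ItemID TEXT, UserID INTEGER, Date TEXT, Comment TEXT);
""")
conn.execute("INSERT INTO Items VALUES ('B00004YTYO', 'Between the Bridges', 'Sloan')")
conn.execute("INSERT INTO Ratings VALUES ('B00004YTYO', 1, 6, 8, 6)")
row = conn.execute("""SELECT Title, Lyrics FROM Items
                      JOIN Ratings USING (ItemID)""").fetchone()
# row pairs an item's title with one user's 'lyrics' rating
```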
Free Text
• We will start collecting and displaying comments for two reasons:
  – add more content to our sites
  – allow further research by Anna Maclachlan and others
Conclusion
• COFI and RALOCA are specific to the music domain they describe, but they can easily be converted to describe various other e-Learning domains: movies, etc.
• This could be implemented to add an advanced rating / search feature to existing data-collecting systems.
Learning Objects: Industry Standardization
LO: IEEE LOM provides structured descriptions of re-usable digital learning resources.
RSS-LOM Module (translation)
RSS 1.0 allows learning object repositories to syndicate listings and descriptions of learning objects.
(Table comparing LO and Feed fields: Date, Author, Technical Format, and a Unique Identifier (registry agency identifier number and time in milliseconds).)
RSS-LOM-Eval
Completing Missing Dimensions

Taking the "performance" rating from the collaborative pattern, this will be expanded into the final retrieval pattern:
<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>8</ind></_r>
  <_r n="originality"><ind>7</ind></_r>
  <_r n="performance"><ind>5</ind></_r>
</cterm>
Possibly the weights can also be taken from the collaborative pattern (so omitting a dimension would not mean it has weight 0.0):
<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics" w="0.6"><ind>8</ind></_r>
  <_r n="originality" w="0.1"><ind>7</ind></_r>
  <_r n="performance" w="0.3"><ind>5</ind></_r>
</cterm>
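Sketched procedurally (hypothetical helper; patterns modeled as {dimension: (value, weight)} dicts), completing a partial user pattern from the collaborative pattern is a simple merge in which the user's entries win:

```python
# Hypothetical completion of a partial user pattern: dimensions (and their
# weights) missing from the user's pattern are taken from the collaborative
# (summarized) pattern.
def complete_pattern(user_pattern, collaborative_pattern):
    """Both patterns: {dimension: (value, weight)}; user entries take precedence."""
    return {**collaborative_pattern, **user_pattern}

collab = {"lyrics": (8, 0.6), "originality": (7, 0.1), "performance": (5, 0.3)}
user   = {"lyrics": (8, 0.6), "originality": (7, 0.1)}   # "performance" omitted
complete_pattern(user, collab)
# "performance" is filled in from the collaborative pattern, keeping weight 0.3
```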
Scoring Rules

The system uses a RuleML file to calculate the score of an LO. The only fixed relation within the scoring rule file is the 'score' relation, which has two arguments: the 'asin' of the album (a unique identifier) and the actual score.
• Currently implemented as a normalized weighted sum
• Can be changed to implement another scoring scheme, provided that the scheme can be calculated using the built-in relations in jDREW; currently these are: addition ($add), subtraction ($sub), multiplication ($mul), division ($div), summation ($sum), less-than ($lt), greater-than ($gt), square-root ($sqrt).
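As an illustrative sketch of the current scheme (the slide says only "normalized weighted sum"; the normalization by the weight total is an assumption that matters when weights do not sum to 1):

```python
# Sketch of the scoring scheme: a normalized weighted sum over the
# (possibly modified) dimension values, yielding one score per asin,
# mirroring the two arguments of the 'score' relation.
def score(asin, values, weights):
    """values, weights: {dimension: number}; weights need not sum to 1."""
    total_weight = sum(weights.values())
    s = sum(weights[d] * values[d] for d in weights) / total_weight
    return (asin, s)

score("B00004YTYO",
      {"lyrics": 6, "originality": 8, "performance": 6},
      {"lyrics": 0.2, "originality": 0.7, "performance": 0.1})
# score is 7.4 (up to floating-point rounding)
```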
RALOCA: Technologies used
• Object-Oriented RuleML – using the XSLT Translators written by Stephen Greene to convert Object-Oriented RuleML into positional RuleML, which can be interpreted by jDREW
• jDREW BU, developed by Bruce Spencer, with modifications by Marcel Ball
Collaborative filtering I
A collaborative filtering system correlates the current user's provided ratings with the ratings of all other users in the database, in order to predict the current user's ratings for unrated items.