Usage patterns discovery from a web log in an Indian e-learning site: A case study


Renuka Mahajan & J. S. Sodhi & Vishal Mahajan

© Springer Science+Business Media New York 2014

Abstract An important research area in education and technology is how learners use e-learning. By exploring the various factors and the relationships between them, we can gain insight into learners’ behaviors and deliver the tailored e-content they require. Although many tools exist to record detailed navigational activities, they do not explore learners’ usage patterns for an adaptive e-learning site. Previous web log data analyses have been very limited in scope, as they lack detailed empirical results on learning technology usage. This paper discusses the detailed results of a case study of web data mining in a specific e-learning application. The main objective of this study is to examine the usability and effectiveness of the e-content by analyzing the web log. For this, a suitable data set was retrieved from raw web log records, to which various web mining and statistical techniques could be applied. We have evaluated different features of e-content that can lead to better learning outcomes for learners, by understanding their navigational behaviors, their interaction with the system and their areas of interest. We found, for example, which sequences of topics were the most liked and the least liked by the learners; these patterns also led us to hypothesize about, and test with correlation and regression analysis, the relationships between average time, test score and total attempts.

Keywords E-learning · Web mining · Personalization

Educ Inf Technol, DOI 10.1007/s10639-014-9312-1

R. Mahajan (*)
Amity Institute of Information Technologies, Noida, UP, India
e-mail: [email protected]

J. S. Sodhi
AKC DataSystems, Delhi, India
e-mail: [email protected]

V. Mahajan
HCL Technologies, Noida, UP, India
e-mail: [email protected]

1 Introduction

E-learning is the use of technology to deliver education and training to a diverse learner population. With the increasing use of e-learning in schools and universities, there is a huge demand for quality e-learning platforms that adapt to learners’ characteristics to enhance their learning experience and performance. Adaptive e-learning breaks with the concept of ‘one size fits all’. An adaptive system must be capable of detecting user needs and then adapting to them. To adapt, one needs to answer two important questions. First, what web data should a system utilize for adaptation? Second, how should the collected web data be used to adapt to the needs of the learners?

Web log analysis can provide valuable insights into how learners utilize a site and how it could be made adaptive. This feedback is extremely valuable for learners, while instructors can see how often their course material is used. For this case study, actual usage data in the web log of a diverse learner population in a learning portal were statistically analyzed. The patterns thus discovered represented frequent usage among various courses and recommended links for building an adaptive e-learning site, so that a learner can easily find the most relevant information depending on the previous learners’ trails.

1.1 Related work

There has been some previous research in analysis of interactions in a log file in e-learning.

• As in (Sheard et al. 2005), information gained from knowing when students access various resources can help educators understand students’ preferred learning patterns.

• An analysis of student use of a courseware website by (Peled and Rashty 1999) found that the most popular online activities were rather passive and focused on getting information rather than contributing. They also found differences in the type of resources accessed by male and female students.

• A more recent study (Gao and Lehman 2003) investigated learning outcomes of students using Web-based learning environments by providing different levels of interactivity.

• The authors of (McIsaac and Blocher 1999) studied the interactions of doctoral students with an online environment and concluded that these students’ interactions were goal focused.

• The authors of (Hellwege et al. 1996) explored students’ use of a geology website and concluded, after the log file analysis, that students accessed the most recent lecture notes first, picking up a couple of key slides, before returning to a previous lecture.

• The authors of (Valsamidis and Democritus 2011) investigated whether there is a relationship between a student’s activity and their grades in an e-learning system. Their results show that “e-content’s quality is closely related to the students’ grades and the instructors should improve their courses in order to increase their quality and as a consequence the course usage by the students”.


• (Vanijja and Supattathum 2006) presented some interesting results from the e-learning system’s access log analysis in terms of usage among various courses.

Previous research shows that the analysis of learners’ interactions can help in understanding learner behavior in the e-learning system. The authors of (Romero et al. 2008a) state that e-learning systems accumulate a vast amount of information which is very valuable for analyzing students’ behavior and could create a gold mine of educational data (Mostow and Beck 2006). However, many publications in the field of Educational Data Mining and Learning Analytics lack detailed empirical results on learning technology usage. Moreover, they do not describe the impact of web log analysis on constructing an adaptive online e-learning system. Further navigation and interaction optimization can be done by applying Web Usage Mining techniques (specifically Frequent Pattern Mining) to the navigation paths, to make systems adaptable to learners’ requirements and capacities.

Hence, in order to understand what web data should be used for adaptation and how to use it to adapt to the needs of the learners, we list the research objectives as under.

The research objectives in this web data analysis are:

1. What topics are the most popular and the least popular, found by analyzing frequent pages visited at the same time?
2. What sequences of topics are the most requested and the least requested by learners?
3. What learners’ access patterns indicate learning progress and learners’ access duration?
4. To group students into different clusters with similar learning characteristics.
5. Testing the hypothesis for correlation analysis between Average Time and Average Score, Total Attempts and Average Score, and Total Attempts and Average Time.
6. Testing the hypothesis for regression analysis between the independent variables (Total Attempts and Average Time Taken) and the dependent variable (Marks Scored).
7. How to optimize the navigation and interactions of the learners by applying frequent pattern mining for building an adaptive e-learning environment.

Using this knowledge, the trends in the learners’ activity can be determined and recommendations for the next pages to visit can be calculated.

This case study is based on an e-learning system designed for students of classes 6th to 10th. It provides a study planner, notes, reports to track progress, lesson summaries for quick reference, previous years’ board papers and practice tests on Maths, Science, Social Science and English lessons for CBSE, ICSE and 18 other Indian State Boards, to improve grades and clarify students’ concepts. It provides a rich multimedia interface for the students. The prerequisite for accessing the e-content is that the students have to be registered on the e-learning application. The other important aspect is that the students have to take an assessment test at least once after studying the material on a specific topic within a subject, to mark that topic as complete. However, a topic can be referred to multiple times before attempting the assessment test.

We organize this paper as follows: We describe our methods in Section 2. We summarize the experimental results in Section 3. We present the discussion in Section 4. We provide our concluding remarks in Section 5 and the pointers for future work in Section 6.


2 Methodology

We conducted an analysis of a real-world data set. Once the course was undertaken, the cumulative web log data for each learner were downloaded and compiled for all learner clicks.

Population and Sampling: The sample of 3,561 records was selected from a population of around 5,000 records from the actual web usage log records of the learn-next portal from Dec 1, 2013 to Dec 31, 2013.

Measurement instruments design and development: The secondary data, i.e. web log data, were used. The dimensions consisting of usage and performance data were carefully selected from this web log data. Data preparation and preprocessing were applied to retrieve this suitable data set from raw web log records, to which various data mining and statistical techniques could be applied. Web data preprocessing includes tasks such as data cleaning, user and session identification, page view identification etc. After data cleaning, filtering, pre-processing and integrating from multiple sources, we transformed the integrated data into a relational database or a warehouse, suitable to be used as input to various data mining algorithms. This is similar to the data set that (Comunale and Sexton 2001–2001) explored. The log entries consisted of time, date of request, IP address of the client, the resource requested, mode of request (POST/GET), HTTP status, number of hits, cookie etc. This data was preprocessed and transformed. Pedagogical data as in (Hardy et al. 2004), i.e. the page visited, the time of access, the referred page, the time spent viewing the page etc., were collected. For our study, the following fields were finally filtered: Total Attempts, Unique Attempts, total assessments (fail/pass), Assessments Attempted, Marks Scored, Average Time Taken (secs.) and the complete navigation path (Class, Subject, name of the Chapter, and the Lesson’s Name) from 3,561 web log sessions. This forms the final data to be used as input for all the methods (Table 1).
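The preprocessing steps named above (data cleaning, user and session identification) can be illustrated with a minimal sketch. The paper does not give the portal's actual log layout or session rule, so everything specific here is an assumption made for illustration: the Common Log Format style of the sample lines, the list of static-file suffixes, the rule that one IP address is one user, and the 30-minute inactivity timeout.

```python
import re
from datetime import datetime, timedelta

# Hypothetical log layout (Common Log Format style); the portal's real format is not given.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>GET|POST) (?P<resource>\S+) [^"]*" (?P<status>\d{3})'
)
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")  # assumed noise
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def parse_line(line):
    """Turn one raw log line into a record dict, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def clean(records):
    """Data cleaning: keep successful requests for content pages only."""
    return [
        r for r in records
        if r is not None
        and r["status"] == 200
        and not r["resource"].lower().endswith(STATIC_SUFFIXES)
    ]


def sessionize(records):
    """Session identification: same IP, gaps no longer than SESSION_GAP."""
    sessions = []
    last_seen = {}  # ip -> (timestamp of last request, index of its session)
    for r in sorted(records, key=lambda r: r["ts"]):
        prev = last_seen.get(r["ip"])
        if prev is None or r["ts"] - prev[0] > SESSION_GAP:
            sessions.append([])
            idx = len(sessions) - 1
        else:
            idx = prev[1]
        sessions[idx].append(r["resource"])
        last_seen[r["ip"]] = (r["ts"], idx)
    return sessions


# Invented sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/Dec/2013:10:00:00 +0530] "GET /class10/maths/real-numbers HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Dec/2013:10:00:01 +0530] "GET /static/site.css HTTP/1.1" 200 90',
    '10.0.0.1 - - [01/Dec/2013:10:05:00 +0530] "GET /class10/maths/polynomials HTTP/1.1" 200 640',
    '10.0.0.2 - - [01/Dec/2013:10:06:00 +0530] "GET /class9/science/motion HTTP/1.1" 404 0',
    '10.0.0.1 - - [01/Dec/2013:11:30:00 +0530] "POST /class10/maths/test HTTP/1.1" 200 128',
]
sessions = sessionize(clean(parse_line(line) for line in sample))
```

On the invented sample, the stylesheet request and the failed request are dropped, and the request made 85 minutes later starts a second session. A registered-user identifier, where available, would be a more reliable key than the IP address.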

2.1 Attributes definition

Table 1 Attribute definitions

| Name | Description |
| --- | --- |
| Average time | The average of the time spent by students on the specific topic, within a subject, in the selected data sample registered for the course |
| Average score | The average of the marks scored in the assessment of each topic, within a subject, by the students in the selected data sample registered for the course |
| Total attempts | The number of times a specific topic within a subject is referred to by students in the selected data sample registered for the course |
| Marks scored | The marks the student obtained in the assessment of a specific topic within a subject |
| n_assessment | Number of assessments done |
| n_assess_p | Number of assessments passed |
| n_assess_f | Number of assessments failed |

Further, pattern discovery can be based on summarization techniques such as statistical analysis, frequent pattern mining algorithms and clustering. Statistical analyses are the common methods for giving a statistical description (trend and periodicity) of a pattern based on the frequency table, mean, median, standard deviation and histogram. More advanced inferential statistics are also used, such as correlation analysis (i.e. the measurement of the relation between two or more variables), regression analysis (which takes a numerical dataset and develops a mathematical formula that transforms input variables into a real-valued prediction for the dependent variable/s) as in (Sia et al. 2007), and hypothesis testing (the answer to a yes/no question). Association rules are used to discover interesting relations between objects, events, etc., such as in (Baron and Spiliopoulou 2004). Sequential pattern mining is very useful when the data follow a sequential nature or a specific order, e.g., the sequence of actions performed by a learner in an e-learning system. Clustering, as in (Vaarandi 2003), is a technique to group together a set of data having similar characteristics. The K-means algorithm clusters objects into k partitions based on their attributes. Our objective is to group students from a specific course into different clusters depending on their final marks. As the patterns we want to discover have a simple format, a simple statistical analysis can be used to extract periodic patterns from Web pages and find clusters, followed by correlation analysis and regression analysis in SPSS 16. Further, the extraction of sequential patterns has been done by frequent rule mining, for recommending the next possible learning path.

3 Results and findings

3.1 Learner data analysis

As in (Yadav and Choubey 2012), an important personalization parameter of an e-learning scenario is the learners’ present knowledge level. We propose the construction of an adaptive learning environment for three types of learners with different knowledge levels (found by their test scores) to find relevant subject contents. In e-learning, clustering can be used for finding clusters of students with similar test scores. For this objective, considering their assessment marks as in (Esichaikul et al. 2011), the learners are categorized as novice, intermediate and advanced using K-means cluster analysis (MacQueen 1967) in SPSS 16 (Fig. 1).
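The study ran K-means in SPSS 16; the sketch below only illustrates the idea of splitting learners into three groups by assessment marks, in plain Python. The marks are invented, and the initialisation (minimum, median and maximum of the sorted marks) and the function name `kmeans_1d` are illustrative choices, not the SPSS procedure.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's K-means on one-dimensional data (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted marks.
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each mark goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, clusters


# Hypothetical assessment marks, for illustration only.
marks = [12, 18, 25, 30, 48, 52, 55, 60, 82, 88, 91, 95]
centroids, clusters = kmeans_1d(marks, k=3)

# Name the clusters by ascending centroid.
order = sorted(range(3), key=lambda i: centroids[i])
groups = {
    label: clusters[i]
    for label, i in zip(("novice", "intermediate", "advanced"), order)
}
```

With these invented marks the algorithm converges after one update, giving centroids of 21.25, 53.75 and 89.0. On real data the result depends on the initialisation, so several starts are usually compared.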

3.2 Web log data analysis

In order to assess the frequency distribution of content usage, including the most frequently accessed and the least accessed pages/topics, a frequency distribution was determined by considering one variable at a time. The objective is to count the number of responses associated with the different values of the variable.
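A one-variable frequency distribution of this kind can be sketched in a few lines. The subject values below are invented for illustration, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, n, round(100.0 * n / total, 2))
        for value, n in counts.most_common()
    ]


# Hypothetical 'Subject' field taken from ten log records.
subjects = ["Maths"] * 5 + ["Science"] * 3 + ["Physics"] * 2
table = frequency_table(subjects)
for value, n, percent in table:
    print(f"{value:<10}{n:>5}{percent:>8.2f} %")
```

The same helper applies unchanged to the Class, Chapter or Lesson fields of the navigation path.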

3.2.1 Learners across various levels of education (Class wise)

Inference The above trend clearly shows that the e-learning site is mostly accessed by students of higher classes, while its acceptance keeps decreasing towards the lower classes (Fig. 2).


3.2.2 Types of subjects accessed

Inference The above trend shows that Maths is the most preferred subject on the e-learning site. Science is the next most sought-after subject. Botany and Zoology are the least frequently accessed. The learners have also shown a reasonable interest in accessing Physics, Biology and Chemistry, and a fair amount of interest in General Maths. Further analysis (refer to ‘Average time spent on each maths topic’) reveals that students have spent the maximum time on Standard Tests to find out their weak areas. Since Standard Tests are available only for Maths and Science topics, it can be concluded that one of the important reasons learners use the web site is to access the Standard Tests to test their knowledge. Previous research also shows students’ preference for using e-material (Intratat 2011) (Fig. 3).

Fig. 1 Learner clusters (chart title: Categorization of Learners; categories: Novice, Intermediate, Advance; percentage labels: 30.44, 23.74, 45.82)

Fig. 2 Class wise learner classification


3.3 For mostly sought subjects

3.3.1 Maths

3.3.1.1 Topics frequently accessed

Inference The above trend clearly shows that ten topics cover 78 % of the total log records. This might be because the information provided on these topics is more relevant to the learners. Topics frequently accessed are: ‘Coordinate Geometry, Real Numbers, Theorem of Pythagoras, Introduction to Trigonometry, Quadratic Equations, Polynomials, Arithmetic Progressions and Pair of Linear Equations in Two Variables’, suggesting the learners’ personal interest and motivation (Fig. 4).

However, only 4 % of the learners had accessed the topics ‘Trigonometry, Polynomials over Integers, Analytical Geometry and Factor Theorem’, reflecting their comfort level/expertise in these topics, or they may have found the information for these topics irrelevant to them.

3.3.1.2 Average time spent on each maths topic

Inference This analysis indicates that the learners have spent the maximum time on Tests and Model papers to find the weak areas to improve on. The average time spent on the topics ‘Triangles, Algebraic Expressions, Polynomials, Data Handling, Comparing Quantities and Lines and Angles’ appears to be more or less similar, suggesting that the level of understanding or interest is the same. Studies have been done by (Cotton and Grestya 2007) to explore in detail the ways in which students engaged with a resource and to evaluate the ways and extent to which it might enhance their learning (Fig. 5).

Fig. 3 Frequently accessed subjects (x-axis: Subject, with categories Biology, Botany, Chemistry, General Maths, Maths, Physics, Science, Zoology; y-axis: Percent)

3.3.1.3 Average assessment score on each maths topic

Inference The data in the above graph indicate that, after accessing the site, the learners have scored the highest percentile in mainly 12 topics, such as ‘Symmetry, Tangent Properties of Circles, Factor Theorem and Chord Properties of a Circle’. For the topics ‘Compound Interest, Tangent to a Circle, Probability and Distance and Section Formulae’, the scores are reasonably encouraging. For the topics ‘Some Applications of Trigonometry, Circumference and Area of a Circle, Algebra and Trigonometry Constructions’, the graph shows that learners do not have a good understanding of the topics concerned. Hence more focus should be given to these topics to improve learners’ understanding (Fig. 6).

Fig. 4 Frequently accessed topics within maths

Fig. 5 Topic wise average time spent in maths

Fig. 6 Topic wise average score in maths

3.3.2 Science


3.3.2.1 Topics frequently accessed

Inference The above trend clearly shows that only nine topics, such as Acids, Bases and Salts, Carbon and its Compounds, Life Processes, Crop Production and Management, Matter in Our Surroundings, Sound, and Light - Reflection and Refraction, are the most liked by learners of various age groups, possibly due to their inherent complexity, interest or relevance (Fig. 7).

However, 52 topics are least frequently accessed, e.g. Food: Where Does It Come From, Nutrition in Animals, Coal and Petroleum, Stars and Our Solar System, Diversity in Living Organisms, Periodic Classification of Elements, Cell - Structure and Functions, Synthetic Fibers and Plastics, Motion and Time, Components of Food, Fun with Magnets, Gravitation, Human Eye and Colorful World, Electric Current and its Effects etc.

3.3.2.2 Average time spent on each topic

Inference This analysis indicates that the learners have spent the maximum average time on seven topics: Work and Energy, Motion, Light Reflections and Refraction, Chemical Effects of Electric Current, Electricity, Motion and Time, and Chemical Reactions and Equations.

Comparatively less time was spent on the remaining topics: Metals and Non-Metals, Human Eye and Colorful World, Atoms and Molecules and Standard Tests, Force and Laws of Motion, Matter in Our Surroundings, Gravitation, Periodic Classification of Elements, Light, Shadows and Reflections, Structure of an Atom, Our Environment, Fun with Magnets, Electric Current and its Effects (Fig. 8).

The least time seems to be spent on Sources of Energy, Carbon and its Compounds, Changes Around Us, Light, Separation of Substances, Acid Bases and Salts, Management of Natural Resources, The Living Organisms, Why do we Fall ill, Magnetic Effects of Electric Current, Heredity and Evolution, Is Matter Around Us Pure, Heat and Winds, Storms and Cyclones, suggesting that the content needs further improvement.

3.3.2.3 Average assessment score on each topic

Inference The analysis of assessments in the above graph indicates that, after accessing the site, the learners have scored the highest percentile in ten topics, showing a good understanding of the concepts. These topics are: Water: A Precious Resource, Electric Current and its Effects, Sorting Materials into Groups, Our Environment, Weather, Climate and Adaptations of Animals to Climate, How do Organisms Reproduce, Natural Resources, Management of Natural Resources, Carbon and its Compounds, Acids, Bases and Salts. However, the majority of the learners have gained average marks in the remaining topics, such as Motion and Time, Control and Coordination, Force and Laws of Motion, Is Matter Around Us Pure, Matter in Our Surroundings, Changes Around Us, Combustion and Flame, Structure of The Atom, Human Eye and Colorful World, Fun With Magnets, Life Processes, Electricity And Circuits, Heredity and Evolution, Metals and Non-metals, Why do we Fall ill, Improvement in Food Resources, Separation of Substances, Tissues, Chemical Reactions and Equations, Materials: Metals and Non-Metals, Winds, Storms and Cyclones, Work and Energy, The Fundamental Unit of Life, Stars and Our Solar System, Coal and Petroleum and Diversity in Living Organisms. This implies that the learners need to focus more on the above-mentioned topics (Fig. 9).

Fig. 7 Frequently accessed topics in science

Fig. 8 Topic wise average time spent in science

3.3.3 Physics


3.3.3.1 Topics frequently accessed

Inference The above findings clearly show that learners showed immense interest in the topics Light, Force, Universe, and Sound.

The least navigated topics identified are Energy And Work, Motion And Its Kinds, Some Facts About Magnet, The Electric spark, Dynamics, Magnetism, Refraction At Plane Surfaces, Static Electricity, Our Universe, Measurement Of Mass, Weight and Density, More About Solids, Liquids And Gases, Kinematics, Magnetism and Electricity, Fundamental Measurements, Moving Objects, Measurement of Length, Volume, Time and Mass, Modern Physics, Wonders of light Part-I, Electromagnetic Induction (Fig. 10).

3.3.3.2 Average time spent on each topic

Inference This analysis indicates that the learners have spent the maximum time on topics such as Light, Shadows and Images, Measurements in Our Daily Life and Calorimetry. This suggests that either these are their preferred areas of interest or the information presented to the learners is relevant to them (Fig. 11).

Considerably less time has been spent on the remaining topics, such as Refraction Through a Lens, Work, Energy and Power, Thermometry, Reflection Of Light At A Plane Surface, Pressure In Fluids And Atmospheric Pressure, Air, Winds and Cyclones, Motion and Time, Motion In One Dimension, Heating Effect of Electric Current and its Applications, Magnetic Effects of Electric Current, Refraction At Plane Surfaces, Dispersion of Light and Natural Optical Phenomena, Forms of Energy, Light: Nature, Reflection and Refraction, The Electric spark, More About Solids, Liquids And Gases, Our Universe - Gravitation, Current Electricity, Newton’s Laws Of Motion, Sound, Measurement, Kinematics, Magnet, The Laws of Motion, Rectilinear Propagation of Light, Energy And Work, Wave Motion, Transfer Of Heat and Static Electricity.

3.3.3.3 Average assessment score on each topic

Inference The data in the above graph indicate that, after accessing the site, the learners had scored high in the topics Heating Effect of Electric Current and its Applications, Learning How to Measure Temperature and its Measurement, Upthrust In Fluids And Archimedes’ Principle, Measurement Of Length, Volume, Time And Mass, Some Facts About Magnet, Spectrum, Transfer Of Heat, Reflection Of Light At A Plane Surface, Static Electricity. The learners have performed averagely in the topics Electric Charge, Motion and Time, The Laws of Motion, Refraction At Plane Surfaces, Floatation And Relative Density, Playing with Magnets, Fundamental Measurements, Newton’s Laws Of Motion, All About Electromagnetism, Waves, Our Universe, Motion In One Dimension, Universe, Magnetism, Stars and Our Solar System, More About Energy, More About Solids, Liquids And Gases and Thermometry (Fig. 12).

However, the learners had scored less on the topics Energy And Work Force, Some Facts About Force, Work, Energy and Power, The Electric spark, Rectilinear Propagation of Light, Pressure In Fluids And Atmospheric Pressure. A study by Lu (Zhu et al. 2000) that analyzed log file interactions on a website also found a relationship between the frequency of access to learning resources and final exam scores.

Fig. 9 Topic wise average score in science

Fig. 10 Frequently accessed topics in physics

Fig. 11 Topic wise average time spent in physics

3.3.4 Testing the significant difference—inferential analysis

3.3.4.1 Correlation analysis (using Pearson coefficient) Previous research also uses more advanced statistical methods such as correlation analysis to infer students’ attitudes that affect learning (Arroyo et al. 2004), or to predict their final exam score (Pritchard and Warnakulasooriya 2005). We use SPSS 16 to conduct the following inferential analysis.

Fig. 12 Topic wise average score in physics

3.3.4.2 Correlations Correlation coefficients between Average Time and Average Score of the learner sample are given in Table 2.

The null hypothesis states that there will be no significant relationship between Average Time and Average Score of the learners.

As is evident from Table 2, the result (r = .018) shows that Average Time has a negligible relationship to Average Score at the .01 level of significance (2-tailed).

3.3.4.3 Correlations Correlation coefficients between Total Attempts and Average Score of the learner sample are given in Table 3.

The null hypothesis states that there will be no significant relationship between Total Attempts and Average Score of the learners.

As is evident from Table 3, the result (r = .73**) shows that Total Attempts has a strong positive relationship to Average Score at the .01 level of significance (2-tailed).

3.3.4.4 Correlations Correlation coefficients between Total Attempts and Average Time of the learner sample are given in Table 4.

Table 2 Correlation between average score & average time

| | | Average score | Average time taken (secs) |
| --- | --- | --- | --- |
| Average score | Pearson correlation | 1 | .018 |
| | N | 3561 | 3561 |
| Average time taken (secs) | Pearson correlation | .018 | 1 |
| | N | 3561 | 3561 |

Table 3 Correlation between total attempts & average score

| | | Total attempts | Average score |
| --- | --- | --- | --- |
| Total attempts | Pearson correlation | 1 | .73a |
| | Sig. (2-tailed) | | .000 |
| | N | 3561 | 3561 |
| Average score | Pearson correlation | .73a | 1 |
| | N | 3561 | 3561 |

a Correlation is significant at the 0.01 level (2-tailed)


The null hypothesis states that there will be no significant relationship between Total Attempts and Average Time of the learners.

As is evident from Table 4, the result (r = −0.003, p = .847) shows that Total Attempts has no significant relationship to Average Time at the 0.01 level of significance (2-tailed test).
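The Pearson coefficient and its significance test can be sketched as follows. SPSS 16 produced the reported values; this plain-Python version is only an illustration. The helper names are hypothetical, the four data points are invented, and 2.576 is the standard two-tailed critical value at the .01 level for large samples.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for H0 'no linear relationship', with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


CRITICAL_T_01 = 2.576  # two-tailed, alpha = .01, large-sample approximation

# Tiny invented data set with a perfect linear relationship.
r_demo = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Coefficients as reported in the text, with N = 3561.
for label, r in [("time vs score", 0.018),
                 ("attempts vs score", 0.73),
                 ("attempts vs time", -0.003)]:
    t = t_statistic(r, 3561)
    verdict = "significant" if abs(t) > CRITICAL_T_01 else "not significant"
    print(f"{label}: r = {r}, t = {t:.2f}, {verdict} at the .01 level")
```

With N = 3561 even a small coefficient can pass the test, so the size of r, and not only its significance, is what indicates the strength of a relationship.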

3.3.5 Test of significance of regression parameters

Regression analysis can be used to predict a student’s knowledge and to find which metrics help to explain the poor prediction of exam scores (Feng and Heffernan 2005), to predict whether the student will answer a question correctly enough (Beck et al. 2000), and to predict end-of-year accountability assessment scores (Anozie and Junker 2006). Our hypothesis test is:

Ho: The independent variable (Total Attempts/Average Time Taken) is not a significant predictor of the dependent variable (Marks Scored).
Ha: The independent variable (Total Attempts/Average Time Taken) is a significant predictor of the dependent variable (Marks Scored).

Model summary Since the value of R square is .007 in Table 5, only about 0.7 % of the variation in Marks Scored is explained by Average Time Taken (ATT) and Total Attempts (TA).

Table 5 Model summary

| R | R square | Adjusted R square | Std. error of the estimate | R square change | F change | df1 | df2 | Sig. F change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .085a | .007 | .007 | 20.479 | .007 | 12.989 | 2 | 3558 | .000 |

a Predictors: (constant), average time taken (secs), total attempts

Table 4 Correlation between total attempts & average time

| | | Total attempts | Average time taken (secs) |
| --- | --- | --- | --- |
| Total attempts | Pearson correlation | 1 | −.003 |
| | Sig. (2-tailed) | | .847 |
| | N | 3561 | 3561 |
| Average time taken (secs) | Pearson correlation | −.003 | 1 |
| | N | 3561 | 3561 |


ANOVA table The value of R square is significant, as indicated by the p value of 0.000 for the F statistic given in the ANOVA table (Table 6).
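As a consistency check, the F statistic, R square and R can be recomputed from the sums of squares and degrees of freedom reported in Table 6. The variable names are illustrative; the numbers are those of the table.

```python
import math

# Values as reported in Table 6 (ANOVA).
ss_regression = 10894.674
ss_residual = 1492173.975
ss_total = 1503068.649
df_regression, df_residual = 2, 3558

ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean square, residual
f_stat = ms_regression / ms_residual           # F = MS_regression / MS_residual
r_squared = ss_regression / ss_total           # share of variance explained
r = math.sqrt(r_squared)                       # multiple correlation R

print(f"F = {f_stat:.3f}, R square = {r_squared:.4f}, R = {r:.3f}")
```

The recomputed values agree with Tables 5 and 6: F rounds to 12.989, R square to .007 and R to .085. An R square of about .007 corresponds to roughly 0.7 % of the variation in Marks Scored.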

The estimated regression equation is Y = 44.498 − 0.006 TA − 0.005 ATT, where Y is Marks Scored, TA is Total Attempts and ATT is Average Time Taken.

The above equation shows that Total Attempts is negatively related to Marks Scored, as is evident from the negative value of its coefficient (−0.006). Similarly, Average Time Taken is negatively related to Marks Scored, as the coefficient for ATT is −0.005 (Table 7).

Since p < 0.05 for both coefficients, we reject the null hypothesis and conclude that both independent variables are significant predictors of the dependent variable. Also, comparing the absolute values of the standardized regression coefficients (−.078 for Average Time Taken against −.034 for Total Attempts) shows that Average Time Taken is relatively more important than Total Attempts in explaining Marks Scored.
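A two-predictor least-squares fit of this form can be sketched in plain Python. The study used SPSS 16, so this is only an illustration: the helper names are hypothetical, and the five data rows are synthetic and exactly linear, chosen so that the recovered coefficients are easy to verify. The last function simply evaluates the equation reported in the text.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, R square)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return beta, 1 - ss_res / ss_tot


def reported_equation(total_attempts, avg_time_secs):
    """Marks predicted by the equation reported in the text."""
    return 44.498 - 0.006 * total_attempts - 0.005 * avg_time_secs


# Synthetic rows (total attempts, average time in secs); NOT data from the study.
rows = [(1, 100), (2, 150), (3, 90), (4, 200), (5, 120)]
marks = [50 - 2 * ta - 0.1 * att for ta, att in rows]
beta, r_squared = fit_ols(rows, marks)
predicted = reported_equation(10, 300)
```

On the synthetic rows the fit recovers the generating coefficients (50, −2, −0.1) with R square equal to 1. For a learner with 10 attempts and an average time of 300 s, the reported equation predicts 42.938 marks.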

Table 6 ANOVA

| Model 1 | Sum of squares | df | Mean square | F | Sig. |
| --- | --- | --- | --- | --- | --- |
| Regression | 10894.674 | 2 | 5447.337 | 12.989 | .000a |
| Residual | 1492173.975 | 3558 | 419.386 | | |
| Total | 1503068.649 | 3560 | | | |

a Predictors: (constant), average time taken (secs)
b Dependent variable: marks scored

Table 7 Coefficients

| Model | Unstandardized coefficient B | Std. error | Standardized coefficient Beta | t | Sig. |
| --- | --- | --- | --- | --- | --- |
| (Constant) | 44.498 | .497 | | 89.599 | .000 |
| Total attempts | −.006 | .003 | −.034 | −2.059 | .040 |
| Average time taken (secs) | −.005 | .001 | −.078 | −4.669 | .000 |

a Dependent variable: marks scored

3.3.6 Sequential rule mining

A sequential mining tool (Romero et al. 2008b) can discover and recommend the most interesting paths used by students from log data. A typical sequential pattern is that 70 % of learners who first visit A.html, followed by B.html, also access page C.html in the same learning session. The frequent patterns thus discovered from the previous interactions (i.e. the analyzed web log data) of n learners with the system help in making recommendations to the (n + 1)th (new) learner. The tool can discover rules as suggested in (Mahajan et al. 2012), i.e. by discovering relevant associations between learning activities and generating association rules that are applied in real time. In a frequent pattern mining rule, the antecedent shows the current session activities and the consequent suggests the recommended next step in the learning session; e.g. the frequent pattern 5->1 suggests 1 as the next possible step for a learner who is currently at 5.

There are various pattern-discovering algorithms, e.g. GSP (Generalized Sequential Patterns) by Agrawal and Srikant, Web Access Pattern (WAP) mine (Pei et al. 2000) and Pre-order Linked WAP (PLWAP) (Ezeife et al. 2005). We used a Pentium processor at 1.66 GHz with 2.5 GB of RAM, running Windows XP 2003. All code was open source, implemented in C++ with a MySQL database, as in (Ezeife et al. 2005). The dataset for the experiments consisted of 3,561 learners’ navigation data from the real web log dataset. GSP was the most efficient in our case at different levels of minimum support. It was applied to the hierarchy of navigation paths (e.g. A->B->C or 7->5->3) to discover the most frequent paths used by previous learners and recommend them to a new learner who shares similar characteristics. A sample of the learners’ frequent patterns mined using GSP is shown in Fig. 13.

The recommendation rules obtained match the learner’s current navigation sequence with the frequent web access patterns (i.e. the antecedent of the rule) and predict the most frequent learner path (i.e. the consequent of the rule).
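The matching of a current navigation sequence against antecedents can be sketched as follows. This is not GSP, WAP-mine or PLWAP: it is a deliberately simplified miner that counts only contiguous paths of up to three pages, and the sessions, topic IDs, thresholds and function names are all invented for illustration.

```python
from collections import Counter


def contiguous_paths(session, max_len):
    """All distinct contiguous sub-paths of a session, up to max_len pages."""
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(session) - n + 1):
            found.add(tuple(session[i:i + n]))
    return found


def mine_rules(sessions, min_support=0.4, max_len=3):
    """Map each frequent antecedent to its (best consequent, confidence)."""
    counts = Counter()
    for s in sessions:
        counts.update(contiguous_paths(s, max_len))  # one count per session
    total = len(sessions)
    frequent = {p: c / total for p, c in counts.items() if c / total >= min_support}
    rules = {}
    for path, support in frequent.items():
        if len(path) < 2:
            continue
        antecedent, consequent = path[:-1], path[-1]
        # A prefix of a frequent path is itself frequent, so the lookup is safe.
        confidence = support / frequent[antecedent]
        best = rules.get(antecedent)
        if best is None or confidence > best[1]:
            rules[antecedent] = (consequent, confidence)
    return rules


def recommend(rules, current_path, max_len=3):
    """Match the longest suffix of the current path against rule antecedents."""
    for n in range(min(len(current_path), max_len - 1), 0, -1):
        rule = rules.get(tuple(current_path[-n:]))
        if rule is not None:
            return rule[0]
    return None


# Hypothetical sessions of topic IDs, for illustration only.
sessions = [[7, 5, 3], [7, 5, 3, 1], [7, 5, 1], [5, 1], [2, 7, 5, 3]]
rules = mine_rules(sessions)
next_topic = recommend(rules, [2, 7, 5])
```

On these invented sessions the rule 7->5 => 3 has confidence 0.75, so a learner whose current path ends in 7->5 is pointed to topic 3. Unlike this sketch, GSP also finds patterns with gaps between pages and prunes candidates level by level.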

4 Discussion

1. Acceptance of e-learning at various levels of education: The research shows that e-learning is accepted in the Indian education system. However, its acceptance varies from VIth standard to Xth standard. The reason for the lower acceptance at lower levels could be a lack of maturity, focus and interest, whereas students at the senior secondary level are more serious, self-driven and focused. Most importantly, they have a strong realization of the value of self-learning. Similarly, (Mobasher et al. 1997) found that frequent usage of the website leads to higher course grades.

Fig. 13 Hierarchy of frequent paths for building recommender system

2. Subject-wise access patterns clearly show maximum usage of e-content pertaining to Maths. This trend shows that learners took the greatest interest in Maths questions and solutions, which prepared them for more challenging problem solving, compared to Botany and Zoology, which are theoretical subjects where the e-content does not offer anything extra to the learners.

The analysis of self-assessment scores showed which e-content was useful and catered to the learners' needs: where assessment scores were high, the learners had gained the required competence by the end of the course. Other researchers, like (Valsamidis and Democritus 2011), also assess the relationship between a learner's course usage and the corresponding grade. The analysis of topics where learners scored less shows which topics were not well explained and hence need more focus for improvement. The findings of (Davies and Graff 2005) revealed that greater online interaction did not contribute significantly to higher performance for students; however, students who failed their courses tended to interact less frequently.

3. Topic-wise frequent access patterns indicate which topics in a subject stimulated interest among learners of a particular age group. This is possibly due to the complexity of these topics, the learners' interest in them, or the greater relevance of the information provided on them. The analysis also showed the least accessed topics, reflecting either the learners' prior comfort level or expertise with these resources, or that they found the information on these topics irrelevant.

Future research on this could identify the reasons behind low usage, the steps needed to make the e-content more useful, and what learners really value. It can help in creating or improving those weak elements that learners don't find useful.

4. The analysis also revealed the topics in various subjects on which different learners spent the maximum time. This indicates topics of interest, which learners may have visited to find and improve on their weak areas. The topics on which the least time was spent revealed learners' lack of interest or the topics' lack of relevance.

5. The correlation output is presented for different combinations of the three dimensions, namely average time spent, average assessment marks and frequency of access.

a). The Average Time has a negligible relationship to the Average Score at the .01 level of significance (2-tailed). This indicates that no relationship exists between these two dimensions. This could be because a learner obtained a high grade by also using other educational material and came to the web page only for assessment. Hence the Time Spent is not only time spent on the website for learning.

b). Total Attempts has a very strong positive relationship to the Average Score at the .01 level of significance (2-tailed). Hence an increase in the number of attempts is associated with an increase in the assessment scores.

c). Total Attempts has no relationship to the Average Time at the 0.01 level of significance (2-tailed test). This suggests that there is no correlation between these two dimensions.
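The pairwise relationships in (a) to (c) rest on a correlation coefficient and a two-tailed significance test. The statistic and software are not named here, so the sketch below assumes Pearson's r and the usual t statistic with n - 2 degrees of freedom; the attempt counts and scores are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom.

    Undefined when |r| == 1 (perfectly collinear samples).
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical values for five learners, for illustration only.
total_attempts = [1, 2, 3, 4, 5]
average_score = [40, 55, 50, 70, 80]

r = pearson_r(total_attempts, average_score)
t = t_statistic(r, len(total_attempts))
print(round(r, 3), round(t, 2))
```

For a two-tailed test at the .01 level, the absolute value of `t` is compared with the critical value of the t distribution for n - 2 degrees of freedom; with samples as small as this toy one, even a large r may fail that test.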

6. Both independent variables, Total Attempts and Average Time Taken, are significant predictors of the dependent variable, i.e. Marks Scored. Moreover, Average Time Taken is more important than Total Attempts in explaining the Marks Scored.
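How the relative importance of the two predictors was judged is not stated here. One common convention, assumed in this sketch, is to compare the magnitudes of standardized regression coefficients, which for exactly two predictors can be written in terms of the pairwise correlations. The data are hypothetical and constructed so that the two predictors are uncorrelated; the function names are introduced only for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_betas(x1, x2, y):
    """Standardized coefficients of y regressed on x1 and x2.

    Uses the two-predictor closed form based on pairwise correlations.
    Assumes x1 and x2 are not perfectly correlated.
    """
    r_y1, r_y2, r_12 = pearson_r(x1, y), pearson_r(x2, y), pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical data: marks built as an exact linear mix of the two predictors.
total_attempts = [1, 2, 3, 4]
average_time = [10, 4, 4, 10]
marks = [2 * a + 3 * t for a, t in zip(total_attempts, average_time)]

beta_attempts, beta_time = standardized_betas(total_attempts, average_time, marks)
print(round(beta_attempts, 3), round(beta_time, 3))
```

In this toy data the coefficient for `average_time` is the larger of the two, which is the pattern the statement above describes; with real log data the predictors are usually correlated and the coefficients should be read together with their significance tests.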

7. As a result of Frequent Pattern Mining, we found a meaningful hierarchy of paths (A->B->C->D) taken by the learners, as Class->Subject->Chapter->Topic. By applying algorithms like GSP, WAP mine, PLWAP etc., it is possible to identify which sequences of topics are used most frequently and least frequently at various logical levels.

Similarly, it is also possible to find the most relevant hierarchy in terms of assessment marks and average time spent.
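Counting usage at each logical level of the Class->Subject->Chapter->Topic hierarchy can be sketched as follows. The access records are hypothetical, and counting every prefix of a path with `collections.Counter` is an illustrative choice rather than the procedure used in the study.

```python
from collections import Counter


def count_by_level(paths):
    """Count accesses per path prefix; index 0 is Class, 1 is Subject, and so on."""
    depth = max(len(path) for path in paths)
    levels = [Counter() for _ in range(depth)]
    for path in paths:
        for end in range(1, len(path) + 1):
            levels[end - 1][path[:end]] += 1
    return levels


# Hypothetical access records as (class, subject, chapter, topic) tuples.
paths = [
    ("X", "Maths", "Algebra", "Polynomials"),
    ("X", "Maths", "Algebra", "Quadratics"),
    ("X", "Maths", "Geometry", "Circles"),
    ("X", "Botany", "Cells", "Mitosis"),
]

levels = count_by_level(paths)
for name, counter in zip(["Class", "Subject", "Chapter", "Topic"], levels):
    most = counter.most_common(1)[0]
    least = min(counter.items(), key=lambda item: item[1])
    print(name, "most:", "->".join(most[0]), most[1], "least:", "->".join(least[0]), least[1])
```

The same structure could hold summed marks or time instead of counts if the most relevant hierarchy is to be ranked by assessment marks or average time spent.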

5 Conclusions

Based on the stated research objectives, the results of the analysis show how the e-content was accessed and the relationships between different variables of the web log data.

This reveals patterns (or a lack of them) showing how the learners have been interacting with the website. The practical impact of this study is that all of this information helps in arriving at the most preferred learner access patterns that result in better learning. These learner actions are associated with more learning and help in arriving at the sequences of topics that are most preferred and least preferred by a specific learner. It could suggest:

i. Shortcuts to frequently visited pages based on previous user activities.
ii. Reorganizing the layout of the course to better suit learners' needs.
iii. Topics to improve performance based on online assessment results.

This, in turn, can enable an e-learning site to reorganize its content more efficiently or to recommend additional topics. It helps in identifying those interaction sequences which indicate problems, and in exploring how to reorganize the placement of material and assessments on the basis of usage and performance data. The Adaptation Model will ‘learn from these suggestions’ before restructuring itself automatically. It will accordingly pre-fetch frequently accessed pages, eliminate weak links and determine the best way to adapt the e-learning system. Further work needs to investigate how frequent pattern mining can be applied to the web log data, using discovered patterns for personalizing activities for different clusters of learners. An adaptive e-learning system that can automatically track learners' activities and intelligently recommend online resources can thus enhance the overall e-learning experience.

6 Future scope

Although interesting patterns could be found, future work is required:

a) Designing optimized data structures for storing the frequent web patterns for recommendations.

b) Further study can be done to find better visualizations that help teachers comprehend the results more quickly and provide timely feedback.

c) Other Web Usage Mining techniques, such as outlier analysis for recognizing unusually good or bad performers from performance data in the web log, could be applied so that such data are processed and interpreted quickly.
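As one possible reading of the outlier analysis mentioned in (c), the sketch below flags learners whose score lies far from the group mean using a z-score. The threshold of 2.0, the learner IDs and the scores are all assumptions for illustration; this is proposed future work, not a method applied in this study.

```python
from statistics import mean, pstdev


def flag_outliers(scores, threshold=2.0):
    """Return {learner: z-score} for learners at least `threshold` SDs from the mean."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # all scores identical, nothing stands out
    z_scores = {learner: (value - mu) / sigma for learner, value in scores.items()}
    return {learner: z for learner, z in z_scores.items() if abs(z) >= threshold}


# Hypothetical average assessment scores per learner ID.
scores = {"a": 50, "b": 52, "c": 48, "d": 51, "e": 49, "f": 95}

flagged = flag_outliers(scores)
for learner, z in flagged.items():
    print(learner, round(z, 2))
```

A positive z-score marks an unusually good performer and a negative one an unusually poor performer; with small groups a single extreme score inflates the standard deviation, so the threshold would need tuning on real data.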

This information on actual web usage by a learner can help in adapting a website to suit another similar potential learner.

References

Anozie, N. O., & Junker, B. W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17.

Arroyo, I., Beal, C. R., Murray, T., Walles, R., & Woolf, B. P. (2004). Web-based intelligent multimedia tutoring for high stakes achievement. Proceedings of the Intelligent Tutoring Systems, 7th International Conference, ITS 2004 (pp. 468–477). Springer.

Baron, S., & Spiliopoulou, M. (2004). Monitoring the evolution of web usage. [Online]. Available: http://link.springer.com/content/pdf/10.1007/978-3-540-30123-3_11.pdf.

Beck, J. E., Woolf, B. P., & Beal, C. R. (2000). ADVISOR: A machine learning architecture for intelligent tutor construction. Seventeenth National Conference on Artificial Intelligence, 552–557.

Comunale, C. L., & Sexton, T. R. (2001–2002). The effectiveness of course Web sites in higher education: an exploratory study. Journal of Educational Technology Systems, 30(2), 171–190.

Cotton, D. R. E., & Grestya, K. A. (2007). The rhetoric and reality of e learning: using the think-aloud method to evaluate an online resource. Assessment & Evaluation in Higher Education, 32, 583–600.

Davies, J., & Graff, M. (2005). Performance in e-learning: online participation and student grades. British Journal of Educational Technology, 36(4), 657–663.

Esichaikul, V., Lamnoi, S., & Bechter, C. (2011). Student modelling in adaptive e-learning systems. Knowledge Management & E-Learning: An International Journal (KM&EL), 3.

Ezeife, C. I., Lu, Y., & Liu, Y. (2005). PLWAP sequential mining: open source code.

Feng, M., & Heffernan, N. T. (2005). Informing teachers live about student learning: Reporting in the ASSISTment System. The 12th Annual Conference on Artificial Intelligence in Education Workshop on Usage Analysis in Learning Systems.

Gao, T., & Lehman, J. D. (2003). The effects of different levels of interaction on the achievement and motivational perceptions of college students in a web-based learning environment. Journal of Interactive Learning Research, 14(4), 367–386.

Hardy, J., Antonioletti, M., & Bates, S. P. (2004). E-learner tracking: tools for discovering learner behaviour. The IASTED International Conference on Web-based Education.

Hellwege, J., Gleadow, A., & McNaught, C. (1996). Paperless lectures on the web: An evaluation of the educational outcomes of teaching Geology using the Web. Proceedings of 13th Annual Conference of the Australian Society for Computers in Learning in Tertiary Education.

Intratat, C. (2011). Alternatives for making language learning games more appealing for self-access learning. Studies in Self-Access Learning Journal, 2(3), 136–152.

MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1, 281–297.

Mahajan, R., Sodhi, J. S., & Mahajan, V. (2012). Mining user access patterns efficiently for adaptive e-learning environment. International Journal of e-Education, e-Business, e-Management and e-Learning, 2, 277–279.

McIsaac, M. S., & Blocher, J. M. (1999). Student and teacher perceptions of interaction in online computer-mediated communication. Educational Media International, 36(2), 121–131.

Mobasher, B., Cooley, R., & Srivastava, J. (1997). Web mining: information and pattern discovery on the world wide web. Tools with Artificial Intelligence, pp. 558–567.

Mostow, J., & Beck, J. (2006). Some useful tactics to modify, map, and mine data from intelligent tutors. Natural Language Engineering (Special Issue on Educational Applications), 12(2), 195–208.

Pei, J., Han, J., Mortazavi-asl, B., & Zhu, H. (2000). Mining access patterns efficiently from web logs. In Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 396–407). Kyoto: Springer.

Peled, A., & Rashty, D. (1999). Logging for success: advancing the use of WWW logs to improve computer mediated distance learning. Journal of Educational Computing Research, 21(3).

Pritchard, D., & Warnakulasooriya, R. (2005). Data from a web-based homework tutor can predict student’s final exam score. In World Conference on Educational Multimedia, Hypermedia and Telecommunications, Vol, no. 1 (pp. 2523–2529).

Romero, C., Ventura, S., & García, E. (2008a). Data mining in course management systems: Moodle case study and tutorial. Computers & Education, 51, 368–384.

Romero, C., Ventura, S., & García, E. (2008b). Data mining in course management systems: Moodle case study and tutorial. Computers & Education, 51, 368–384.

Sheard, J., Albrecht, D., & Butbul, E. (2005). ViSION: Visualization student interactions online. Proceedings of the Eleventh Australasian World Wide Web Conference, pp. 48–58.

Sia, K., Cho, J., Hino, K., Chi, Y., & Tseng, S. (2007). Monitoring RSS feeds based on user browsing pattern. International Conference on Weblogs and Social Media, pp. 161–168.

Vaarandi, R. (2003). A data clustering algorithm for mining patterns from. IEEE IPOM’03 Proceedings, pp. 119–126.

Valsamidis, S., & Democritus, S. K. (2011). E-learning platform usage analysis. Interdisciplinary Journal of E-Learning and Learning Objects, 7, 185–204.

Vanijja, V., & Supattathum, M. (2006). Statistical analysis of eLearning usage in a university. Third International Conference on eLearning for Knowledge-Based Society.

Yadav, D., & Choubey, A. (2012). E-learning: current state of art and future prospects. IJCSI International Journal of Computer Science Issues, 9, 490–499.

Zhu, J. J. H., Stokes, M., & Lu, A. X. Y. (2000). The use and effects of web-based instruction: evidence from a single-source study. Journal of Interactive Learning Research, 11(2), 197–218.
