Date post: | 07-Apr-2018 |
Upload: | alirezadavoodi |
8/3/2019 Hint Analysis
http://slidepdf.com/reader/full/hint-analysis 1/47
mountain. The main interaction of a player with Prime Climb consists of moving from one
location on a mountain of numbers to another until she reaches the top of the mountain. Each
location on a mountain either represents a number or is blocked. The other possible forms of
interaction with the game are attending to the given hints and using a tool called the
Magnifying Glass, which shows the factor tree of a number once the student selects the tool
and clicks on a number on the mountain.
Prime Climb utilizes a probabilistic student’s model, a Dynamic Bayesian Network (DBN), to
track and assess the student’s number factorization knowledge during the interaction. The
pedagogical agent embedded in the game provides the student with adaptive hints when,
according to the student’s model assessment, the student needs such interventions. As an
adaptive educational game, the success of Prime Climb in helping the student learn number
factorization depends on how accurately the student’s model evaluates the level of relevant
knowledge and skills and on how well it supports the student.
The objective of this report is threefold. First, we summarize the simulations carried out to
improve the accuracy of the student’s model. To this end, a data-driven approach was applied to
refine the parameters of the student’s model, which are essential in defining the conditional
probability tables of the nodes in the DBN designed for Prime Climb. Second, we report on the
analysis of the performance of the pedagogical agent in providing adaptive interventions to the
students during the game-play. To this end, two measures of intervention performance, called
hint precision and hint recall, were defined and calculated. Finally, the accuracy of the
student’s model in assessing the current level of number factorization knowledge is examined
and analyzed.
The rest of this manuscript is organized as follows. In Section 2, we briefly summarize the
results of the data-driven refinement of the student’s model parameters. Section 3 discusses the
analysis of the intervention mechanism used in Prime Climb. Section 4 describes the effects of
using different prior probabilities settings on the hinting mechanism in Prime Climb. Section 5
focuses on analyzing the performance of the student’s model in evaluating the level of
factorization knowledge during the interaction. Section 6 summarizes the statistical analysis of
the effect of different prior probabilities settings on the student’s model. Section 7 presents
some preliminary results on the analysis of the pre-test and post-test. Finally, Section 8
outlines future work.
2 Data-Driven Model Refinement
Prime Climb utilizes a parametric probabilistic student’s model, a Dynamic Bayesian
Network (DBN), to track the evolution of the student’s number factorization and common factor
skills while the student interacts with the game. Essentially, there are three steps in creating
a DBN, as follows:
1. Determining the random variables and their domains.
2. Specifying the connections among different random variables.
3. Parameterizing the model by specifying the conditional probability tables, CPT, of the
random variables and specifying the prior probabilities if available.
In Prime Climb, an expert-driven approach has been used for defining the random variables, their
domains and the connections among them. In such an expert-driven approach, a domain expert
determines the variables and their connections based on her own intuition and experience. In
contrast, a data-driven mechanism has been used to find the optimal parameter setting for the
conditional probability tables (CPTs) of the nodes in the network. In such a data-driven
method, the values of the relevant parameters are calculated using sample training datasets.
There are four parameters used in the student’s model in Prime Climb. These parameters specify
how the evidence (making a correct or wrong movement) propagates in the student’s model, and
represent the probability of a student making a correct or wrong movement in a specific
situation, given that she knows or does not know the relevant number factorization. The
parameters are as follows:
• Guess: The probability that the student makes a correct movement while she does not have the
required skill for making such a movement.
• Edu-Guess: Short for Educational-Guess, the probability that the student makes a correct
movement while she has partially gained the required knowledge for making such a movement.
• Slip: The probability that the student makes a wrong move while, based on the student’s
model assessment, the corresponding skill is known to the student.
• Max: Determines how the evidence on a skill propagates to other relevant skills.
In addition, one other step of constructing a DBN, as mentioned earlier, is assigning initial
probabilities, known as prior probabilities, to the random variables. In Prime Climb, three
types of prior probabilities settings are considered, namely 1) Population, 2) User-specific and
3) Generic. We elaborate on these prior probabilities settings and the model’s parameters in the
subsequent subsection.
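To make the roles of Guess and Slip concrete, the following is a minimal sketch of a Bayesian update for a single skill. It is a deliberate simplification and an assumption of this illustration: it treats one skill in isolation with binary known/unknown states, ignores Edu-Guess and Max, and does not reproduce the actual DBN structure of Prime Climb. The function name `update_skill_belief` and the numeric values in the usage lines are hypothetical.

```python
def update_skill_belief(prior_known: float, correct_move: bool,
                        guess: float, slip: float) -> float:
    """Posterior P(skill known) after one observed move (single-skill simplification)."""
    # Likelihood of the observed move under the "known" and "unknown" hypotheses.
    if correct_move:
        like_known, like_unknown = 1.0 - slip, guess
    else:
        like_known, like_unknown = slip, 1.0 - guess
    numerator = prior_known * like_known
    evidence = numerator + (1.0 - prior_known) * like_unknown
    return numerator / evidence


if __name__ == "__main__":
    # Illustrative values only: prior 0.5, guess 0.6, slip 0.1.
    print(round(update_skill_belief(0.5, True, 0.6, 0.1), 3))   # belief after a correct move
    print(round(update_skill_belief(0.5, False, 0.6, 0.1), 3))  # belief after a wrong move
```

With a high Guess value, a correct move raises the belief only slightly, whereas a wrong move lowers it sharply when Slip is small; this is the kind of asymmetry the parameters control.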
2.1 Sensitivity of the Model to Parameters
Given the structure (nodes and connections) of the DBN in Prime Climb, a more appropriate set
of model parameters allows the model to track and assess the evolution of the desired skills
(factorization and common factor) more precisely during the game-play. At the end of the
interaction, it yields posterior probabilities for the skills’ corresponding nodes in the DBN
that more accurately predict the students’ relevant knowledge after the game-play.
In order to find the best set of parameters, a comprehensive range of values between 0 and 1 was
selected for each parameter. We then used a Receiver Operating Characteristic (ROC) curve to
find the pair of sensitivity and specificity that yields the highest accuracy and the best
balance between the two. A ROC curve plots the true positive rate (sensitivity) against the
false positive rate (1 - specificity) as a discrimination threshold varies. As our measure of
accuracy, we chose accuracy = (sensitivity + specificity)/2. Sensitivity is the true positive
rate, the percentage of known skills that the model classifies as such. Specificity is the true
negative rate, the percentage of unknown skills classified as such. A simulator was developed to
simulate the interactions of 45 students with Prime Climb. Table 1 shows the optimal values
found for the model’s parameters. In the next subsection we report on the values of specificity,
sensitivity and accuracy for the different prior probabilities settings.
Table 1: Optimal values of the model’s parameters
Parameter Guess Edu-Guess Slip Max
Value 0.6 0.7 0.1 0.2
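The accuracy measure defined above can be sketched as follows. This is only an illustration of the formula accuracy = (sensitivity + specificity)/2; the function name `classification_rates` and the counts in the usage lines are hypothetical and are not taken from the study data.

```python
def classification_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity and their mean for a known/unknown skill classifier.

    tp: known skills classified as known      fn: known skills classified as unknown
    tn: unknown skills classified as unknown  fp: unknown skills classified as known
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_rates(tp=80, fn=20, tn=60, fp=40))
```

Averaging the two rates, rather than counting all correct classifications, keeps the measure from being dominated by whichever class (known or unknown) is more frequent.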
Table 2: Summary of the simulation results for different prior probabilities settings
Prior Setting Accuracy Specificity Sensitivity
Population 0.755 0.737 0.779
User-specific 0.713 0.648 0.77
Generic 0.684 0.773 0.612
A probable drawback of relying on this result and using the population prior probabilities in
future studies with different students is that future subjects might have a lower level of
number factorization knowledge than the sample group used to refine the student’s model
parameters. This would result in a model that initially overestimates the student’s knowledge
and might not perform as well as expected. On the other hand, the user-specific prior
probability setting is expected to represent the student’s prior number factorization knowledge
more specifically than the other two settings. Yet, according to the results in Table 2, using
user-specific prior probabilities resulted in a model with the lowest specificity of the three
settings.
3 Hint Precision and Recall
A sound intervention strategy in an adaptive educational game ensures pedagogical effectiveness
by providing suitably tailored support when required, without intervening so often that it
negatively affects the user’s engagement in the game. The intervention mechanism in Prime
Climb takes the form of different types of hints provided during the student’s interaction with
the game. The hinting strategy in Prime Climb uses the student’s model’s assessment of the
student’s number factorization and common factor knowledge during the game-play to provide
adaptive support in the form of hints on unknown skills. To decide when to intervene, the
hinting strategy uses four thresholds: 1) Fact-CorrectMove, 2) Fact-WrongMove,
3) CF-CorrectMove and 4) CF-WrongMove. The first two thresholds determine the values used to
evaluate a number factorization (Fact) skill as known or unknown after a correct and a wrong
movement, respectively. Similarly, the last two thresholds are used to assess the common factor
(CF) skill as known or unknown immediately after a correct or wrong movement. A human-adjusted
approach was applied to find an original setting for the four thresholds. After some initial
values were chosen for each threshold, several graduate students played the game, and their
reports on the timing of the hints were used to adjust the initial values. Table 3 shows the
final values selected for each of the thresholds.
Table 3: The thresholds used in the hinting algorithm in Prime Climb
Threshold Final value
Fact-CorrectMove 0.5
Fact-WrongMove 0.8
CF-CorrectMove 0.1
CF-WrongMove 0.5
Algorithm 1 shows how these thresholds are used in the intervention mechanism in Prime
Climb to decide when and on which skill to provide hints.
Algorithm 1: Hinting strategy in Prime Climb
// Initializing variables
if (Player made a correct move)
{
    fact_unknown = (playerBelief < fact_correctMoveHintThreshold ||
                    partnerBelief < fact_correctMoveHintThreshold);
    cf_unknown = (cfBelief < cf_correctMoveHintThreshold);
}
else // Player made a wrong move
{
    fact_unknown = (playerBelief < fact_wrongMoveHintThreshold ||
                    partnerBelief < fact_wrongMoveHintThreshold);
    cf_unknown = (cfBelief < cf_wrongMoveHintThreshold);
}
// When and what skill to hint on
if (cf_unknown && (!fact_unknown))
{
    Hint on Common Factor Skill
}
else if (fact_unknown && (!cf_unknown))
{
    Hint on Factorization Skill
}
else if (cf_unknown && fact_unknown)
{
    Hint on Common Factor and Factorization alternatively
}
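The decision logic of Algorithm 1 can be sketched in runnable form as follows. This is an illustrative translation, not the game’s actual code: the function name `choose_hint`, the `THRESHOLDS` dictionary and the returned labels are assumptions of this sketch, while the threshold values are those reported in Table 3.

```python
from typing import Optional

# Threshold values as reported in Table 3, keyed by (skill, move_was_correct).
THRESHOLDS = {
    ("fact", True): 0.5,   # Fact-CorrectMove
    ("fact", False): 0.8,  # Fact-WrongMove
    ("cf", True): 0.1,     # CF-CorrectMove
    ("cf", False): 0.5,    # CF-WrongMove
}


def choose_hint(player_belief: float, partner_belief: float,
                cf_belief: float, correct_move: bool) -> Optional[str]:
    """Return which skill to hint on, or None when no hint is warranted."""
    fact_threshold = THRESHOLDS[("fact", correct_move)]
    cf_threshold = THRESHOLDS[("cf", correct_move)]
    # A factorization skill counts as unknown if either number's belief is below threshold.
    fact_unknown = player_belief < fact_threshold or partner_belief < fact_threshold
    cf_unknown = cf_belief < cf_threshold
    if cf_unknown and not fact_unknown:
        return "common_factor"
    if fact_unknown and not cf_unknown:
        return "factorization"
    if cf_unknown and fact_unknown:
        return "alternate"  # hint on both skills alternately
    return None


if __name__ == "__main__":
    print(choose_hint(0.6, 0.9, 0.6, correct_move=False))  # factorization
```

Because the wrong-move thresholds are higher than the correct-move thresholds, the same belief values are more likely to trigger a hint after a wrong move than after a correct one.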
From a pedagogical perspective, it is essential to provide the student with “correct” support
when she needs it. A “correct” support is given on the correct skill when required and is
presented with helpful context in a way that encourages the student to attend to it. As the
intervention mechanism in Prime Climb uses real-time assessment of the student’s knowledge to
determine when and on what to provide help, the effectiveness of the mechanism is influenced by
how accurately the student’s model tracks and assesses the evolution of the desired skills. To
investigate how well the hinting strategy and the student’s model provide tailored support to
the student during the interaction, two measures of performance are defined: 1) Hint Precision
and 2) Hint Recall.
Generally, precision is defined as the fraction of retrieved instances which are relevant, while
recall is the fraction of relevant instances that are retrieved. Similarly, hint precision is
defined as the fraction of given hints which are justified, and hint recall is defined as the
fraction of justified hints which are actually given to the student.
An intervention provided to the user is called justified if it is given at the correct time and
on the right skill. In contrast, an unjustified intervention is presented to the student when it
is neither required nor expected by the student. Similarly, if the intervention strategy fails to
provide a justified intervention, a justified intervention is said to have been missed. Finally,
when no intervention is given because none is required, the intervention mechanism has
“correctly not given” the hint. Given this terminology, the hint precision and hint recall are
defined by the following equations.
Equation 1: Hint precision
\[
\text{Hint Precision} = \frac{\text{Number of justified hints}}{\text{Number of justified hints} + \text{Number of unjustified hints}}
\]
Equation 2: Hint recall
\[
\text{Hint Recall} = \frac{\text{Number of justified hints}}{\text{Number of justified hints} + \text{Number of missed hints}}
\]
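As a small worked sketch of Equations 1 and 2, the two measures can be computed directly from hint counts. The function names are assumptions of this illustration; the counts in the usage lines are the justified (122), unjustified (108) and missed (343) hints that the report later lists in Table 5 for the population prior.

```python
def hint_precision(justified: int, unjustified: int) -> float:
    """Equation 1: fraction of given hints that are justified."""
    return justified / (justified + unjustified)


def hint_recall(justified: int, missed: int) -> float:
    """Equation 2: fraction of required (justified) hints that were actually given."""
    return justified / (justified + missed)


if __name__ == "__main__":
    # Counts from the population-prior confusion matrix (Table 5).
    print(round(hint_precision(122, 108), 2))  # 0.53
    print(round(hint_recall(122, 343), 2))     # 0.26
```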
3.1 Simulation of the intervention mechanism using the original threshold setting
In order to calculate the hint precision and hint recall in Prime Climb, data from the
interactions of 45 students in grades 5 and 6 with Prime Climb were used to simulate the hinting
strategy under the original threshold settings (see Table 3). To this end, we initialized the
student’s model with each of the prior probabilities settings and used the optimal model
parameter setting (see Table 1). Since there is no ground truth on how the student’s number
factorization and common factor knowledge evolve during the interaction with Prime Climb, in
calculating the hint precision and hint recall we only considered the movements in which the
player’s number, the partner’s number, or both keep the same score from pre-test to post-test.
Each movement made by the student involves two numbers: 1) the player’s number and 2) the
partner’s number. The player’s number is the number to which the player has just moved, while
the partner’s number is the number on the mountain to which the partner had moved. Every number
a student moved to during the game-play was assigned a label based on the student’s performance
on that specific number in the pre-test and post-test. We used 5 labels to represent the status
of the numbers from the pre-test to the post-test, as follows:
1. KK: Stands for Known-Known and shows that the number has been known to the student both in
the pre-test and post-test (the student has correctly answered the number’s corresponding
question in both tests).
2. UU: Stands for Unknown-Unknown and shows that the number has been unknown to the student both
in the pre-test and post-test.
3. KU: Stands for Known-Unknown and shows that the student has answered the number’s
corresponding question correctly in the pre-test and wrongly in the post-test.
4. UK: Stands for Unknown-Known and shows that the student has given a wrong answer to the
number’s corresponding question in the pre-test and a correct answer in the post-test.
5. NAP: The number does not appear on the tests.
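The five labels above can be derived mechanically from a student’s test answers, as in the following sketch. The function name `number_status` is hypothetical, and representing a number that is absent from a test as `None` is an assumption of this illustration (the report only defines NAP for numbers that do not appear on the tests).

```python
from typing import Optional


def number_status(pre_correct: Optional[bool], post_correct: Optional[bool]) -> str:
    """Label a number as KK, UU, KU, UK or NAP from pre-test and post-test answers.

    True means a correct answer, False a wrong answer, and None means the number
    did not appear on that test (assumption: absence from either test yields NAP).
    """
    if pre_correct is None or post_correct is None:
        return "NAP"
    return ("K" if pre_correct else "U") + ("K" if post_correct else "U")


if __name__ == "__main__":
    print(number_status(True, True))    # KK
    print(number_status(False, True))   # UK
    print(number_status(None, None))    # NAP
```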
Given the above terminology, the types of hints are defined based on the status of the
numbers on which the hints are given, as follows:
• Justified hint: A hint which is given on a number with status UU.
• Unjustified hint: A hint which is given on a number with status KK.
• Missed hint: The hinting mechanism fails to provide a hint on a number with status UU.
• CorrectlyNotGiven hint: The hinting mechanism correctly refrains from providing a hint on a
number with status KK.
In calculating hint precision and hint recall, it has been assumed that a student should receive
a hint following a movement which contains at least one number with status UU and should never
receive a hint on a number with status KK. For each set of prior probabilities, the total
numbers of the different types of hints were calculated and the confusion matrix was
constructed. Table 4 shows the structure of the confusion matrix calculated for the intervention
mechanism in Prime Climb. For instance, in this confusion matrix, an unjustified hint is a hint
given on a number which is known to the student according to the student’s pre-test and
post-test scores but unknown according to the student’s model assessment.
Table 4: Structure of the confusion matrix for the intervention mechanism
Model assessment of student knowledge
Pre-Post Test    Unknown                  Known
Known            Unjustified hint (UJ)    Correctly Not Given (CN)
Unknown          Justified hint (J)       Missed hint (M)
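The mapping from a number’s status and the hinting decision to a cell of the confusion matrix in Table 4 can be sketched as follows. The names `hint_outcome` and `tally_outcomes` are hypothetical, and returning `None` for statuses other than UU and KK reflects this sketch’s reading that only those two statuses are scored.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def hint_outcome(status: str, hint_given: bool) -> Optional[str]:
    """Map (pre/post-test status, hint given?) to J, UJ, M or CN; None if not scored."""
    if status == "UU":
        return "J" if hint_given else "M"    # justified vs. missed
    if status == "KK":
        return "UJ" if hint_given else "CN"  # unjustified vs. correctly not given
    return None  # KU, UK and NAP are left out of the confusion matrix in this sketch


def tally_outcomes(events: Iterable[Tuple[str, bool]]) -> Counter:
    """Count confusion-matrix cells over (status, hint_given) events."""
    outcomes = (hint_outcome(status, given) for status, given in events)
    return Counter(o for o in outcomes if o is not None)


if __name__ == "__main__":
    # Hypothetical events, for illustration only.
    print(tally_outcomes([("UU", True), ("UU", False), ("KK", False), ("UK", True)]))
```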
3.1.1 Simulation of the Intervention Mechanism Using Population Prior Settings
As previously discussed, three types of prior probabilities settings are used in Prime Climb to
initialize the student’s model. Table 5 shows the confusion matrix for the hinting mechanism
in Prime Climb when the population prior setting was used. The result is based on the
original thresholds for the hinting strategy (see Table 3) and the optimal model parameters (see
Table 1).
Table 5: Confusion Matrix (# of raw data points and [percentages]) when the population priors is used
Model assessment of student knowledge (Population-based Prior)
Pre-Post Test    Unknown             Known               Total
Known            108 [12.3%] (UJ)    306 [34.8%] (CN)    414 [50.9%]
Unknown          122 [13.9%] (J)     343 [39.0%] (M)     465 [49.1%]
Total            230 [26.2%]         649 [73.8%]         879 [100%]
Given Equations 1 and 2, the hint precision and hint recall are 0.53 and 0.26, respectively.
Both values are low, which means that initializing the student’s model with the population
prior probabilities and using the model as the basis for providing tailored support to the
student could result in many unjustified interventions (almost
47% of all interventions), which could stop the student from benefiting from the provided
support. It could also result in many missed hints (about 74% of the time the model fails to
provide a justified hint), which could negatively affect the students’ learning gains.
To find out which situations during the game-play contribute most to lowering the hint
precision and hint recall, we investigated further. All the movements made by the students were
extracted from the log files, and each movement was assigned a label comprising the status of
the player’s number in the pre-test and post-test followed by the status of the partner’s number
in the pre-test and post-test. A number’s status has the format XY, where X indicates whether,
based on the pre-test result, the student knows (K) or does not know (U) the factorization of
the number, and Y indicates the same based on the post-test result. If a number does not appear
in the pre-test and post-test, it is assigned the status NAP. For instance, in the status
(UK-NAP), UK is the status of the player’s number and shows that the factorization of the number
is Unknown to the student in the pre-test and Known in the post-test. NAP is the status of the
partner’s number and means that the number does not appear in the pre-test and post-test.
Then all the movements which have the potential of receiving unjustified and justified hints
were extracted. Figure 2 illustrates the frequencies of the movements relevant to the hints. As
depicted, 3.95% of the time the model underestimates the students’ known number factorization
knowledge. In contrast, 64% of the time, when a justified hint was required, no hint was given
to the students, which indicates a high rate of overestimation of unknown number factorization
skills. In addition, “at least” 22.8% of the time (since we could not judge hints given on
numbers with status NAP) the model succeeded in providing a justified hint to the student.
Figure 2: Frequency (raw# and percentages) proportion of each hint types to its relevant possible
movements for the population prior
Then all the movements which have the potential of receiving unjustified hints were extracted.
There are 9 types of movements on which unjustified hints might be given. Figure 3 shows the
labels of the 9 movement types.
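The movement labels can be enumerated programmatically, as in the sketch below. Note the assumption: the report does not list here which movement types qualify, so this sketch takes "movements on which unjustified hints might be given" to mean ordered (player, partner) labels in which at least one number has status KK. Under that reading the enumeration yields 9 labels, which is consistent with the count stated above. The names `movement_label` and `labels_containing` are hypothetical.

```python
from itertools import product

STATUSES = ("KK", "UU", "KU", "UK", "NAP")


def movement_label(player_status: str, partner_status: str) -> str:
    """Compose a movement label such as 'UK-NAP' (player's status first)."""
    return f"{player_status}-{partner_status}"


def labels_containing(status: str) -> list:
    """All ordered movement labels in which the player's or partner's number has `status`."""
    return [movement_label(p, q)
            for p, q in product(STATUSES, repeat=2)
            if status in (p, q)]


if __name__ == "__main__":
    kk_labels = labels_containing("KK")
    print(len(kk_labels), kk_labels)
```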
Figure 3: Frequency (raw# and percentages) of the unjustified hints for each movement type
Figure 4: Frequency (raw# and percentages) of the missed hints for each movement type
Figures 3, 4 and 5 illustrate a more detailed analysis of all the movements relevant to the
hints. Figure 3 shows all the movement statuses that are relevant to unjustified hints.
Next to each movement status, the raw number and percentage of unjustified hints given on that
movement type are shown. For instance, 40 unjustified hints are given on movements
with status KK-KK, which is 11.5% of all movements with status KK-KK. Figures 4 and
5 show similar information for the missed and justified hints. As shown in Figure 4, in at
least 50% of all the relevant movements the model has failed to provide a hint and a justified hint
has been missed. In addition, Figure 5 shows the low rate of justified hints given for each
relevant movement status.
Figure 5: Frequency (raw# and percentages) of the justified hints for each movement type
We can conclude from Figures 2-5 that the hinting strategy is successful in not giving many
unjustified hints on the numbers on which the student’s model has population prior knowledge,
although, as mentioned before, almost 47% of the given hints are unjustified. The hinting
strategy also struggles to give justified hints: there are too many missed hints, meaning that
the student’s model overestimates the student’s factorization knowledge on numbers with status
UU. This deficiency could hinder the learning gains obtained from receiving tailored help during
the interaction with Prime Climb. The next subsection discusses the effect of initializing the
model with the generic prior probabilities on hint precision and hint recall.
3.1.2 Simulation of the Intervention Mechanism Using Generic Prior Setting
To further investigate the effect of the prior probability settings on the hint precision and
hint recall, a similar process was carried out on the model initialized with the generic prior
probabilities. In the generic prior setting, the prior probabilities of all numbers on the
mountains are set to 0.5 regardless of how the student has scored on that specific number in the
pre-test. The confusion matrix of the intervention mechanism based on the generic prior is shown
in Table 6. As calculated using Equations 1 and 2, the hint precision and hint recall are
0.378 and 0.363 when the generic prior setting is used. Figure 6 shows the frequencies of all
the relevant movements as well as the frequencies and the percentages of the corresponding
hints. The results show an increase in the frequency of unjustified hints and a decrease in the
frequency of missed hints. A detailed statistical analysis and comparison is presented in
Section 4. The results with the generic prior probabilities suggest that lowering the prior
probabilities could result in a higher rate of underestimation of known skills and a lower rate
of overestimation of unknown skills.
Table 6: Confusion Matrix when generic priors is used
Model assessment of student knowledge (Generic-based Prior)
Pre-Post Test    Unknown             Known               Total
Known            379 [34.4%] (UJ)    257 [23.3%] (CN)    636 [57.7%]
Unknown          169 [15.3%] (J)     297 [27.0%] (M)     466 [42.3%]
Total            548 [49.7%]         554 [50.3%]         1102 [100%]
Figure 6: Frequency (raw# and percentage) proportion of each hint types to its relevant possible
movements for the generic prior probabilities
Figure 7: Frequency (raw# and percentages) of the unjustified hints for each movement type
Figures 7, 8 and 9 illustrate all the movement statuses relevant to the unjustified, missed and
justified hints, respectively. Figure 7 shows a low rate of unjustified hints on the individual
movement types, although almost 70% of all the given hints are unjustified (see the confusion
matrix, Table 6). Furthermore, as shown in Figure 8, the student’s model has failed to provide a
justified hint in at least 30% of the movements of each relevant status. Figure 9 shows all
relevant statuses, the raw frequency of each movement type, and the raw frequency and percentage
of the justified hints given on each corresponding status. We conducted a similar study using
the user-specific prior setting, as discussed in the next subsection.
Figure 8: Frequency (raw# and percentages) of the missed hints for each movement type
Figure 9: Frequency (raw# and percentages) of the justified hints for each movement type
3.1.3 Simulation of the Intervention Mechanism Using User-specific Prior Settings
In the user-specific prior setting, the prior probabilities of the numbers appearing in the
pre-test and post-test are calculated based on the student’s performance on the number’s
corresponding question in the pre-test. In other words, if the student has correctly answered a
number’s corresponding question in the pre-test, the prior probability of the number is set to
0.9, and to 0.1 otherwise. Clearly, the prior probability of a known number in the user-specific
prior setting is higher than the same number’s prior probability in the generic and population
prior probabilities settings. To investigate the effect of initializing the student’s model with
the user-specific prior probabilities, we conducted a simulation similar to those described in
the two previous subsections. Table 7 shows the confusion matrix of the intervention mechanism
when the user-specific prior setting is used.
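The generic and user-specific prior settings described in Sections 3.1.2 and 3.1.3 can be summarized in a short sketch. The function names are hypothetical; the values 0.5, 0.9 and 0.1 are those stated in the text. The population prior is not sketched because its computation is not described in this part of the report, and the user-specific function covers only numbers that appear in the pre-test.

```python
def generic_prior() -> float:
    """Generic setting: every number on the mountain starts at 0.5."""
    return 0.5


def user_specific_prior(pre_test_correct: bool) -> float:
    """User-specific setting: 0.9 if the pre-test question was answered correctly, else 0.1."""
    return 0.9 if pre_test_correct else 0.1


if __name__ == "__main__":
    print(generic_prior())             # 0.5
    print(user_specific_prior(True))   # 0.9
    print(user_specific_prior(False))  # 0.1
```

With the thresholds in Table 3, a prior of 0.1 already lies below both factorization thresholds (0.5 and 0.8), so under this setting a number answered wrongly in the pre-test starts out classified as unknown.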
Table 7: Confusion Matrix when the user-specific priors is used
Model assessment of student knowledge (User-Specific-based Prior)
Pre-Post Test    Unknown             Known               Total
Known            79 [8.7%] (UJ)      315 [34.8%] (CN)    394 [43.5%]
Unknown          468 [51.7%] (J)     44 [4.8%] (M)       512 [56.5%]
Total            547 [60.4%]         359 [39.6%]         906 [100%]
When the user-specific prior is used, the hint precision and hint recall are 0.856 and 0.91,
respectively. The results show a considerable improvement in the hint precision and hint recall
compared to the results obtained when the population and generic priors were used. Figure 10
shows the raw frequencies of all the movements relevant to the hints as well as the raw
frequencies and the percentages of the hints. As shown in Figure 10, the student’s model
initialized with the user-specific prior probabilities has succeeded in providing a justified
hint on 87.3% of the relevant movements. The rates of unjustified and missed hints are also low.
Figure 10: Frequency (raw# and percentage) proportion of each hint types to its relevant possible
movements when the user-specific prior probabilities are used
Figures 11, 12 and 13 respectively show all the statuses relevant to the unjustified, missed and
justified hints, the frequencies of each status’s corresponding movements, and the frequencies
and percentages of the hints. Figure 11 shows that the highest rate of unjustified hints occurs
on movements with status KK-KK. This could indicate that the students made enough wrong
movements involving numbers with status KK. It could also indicate a poorly adjusted slip
parameter (see Section 2). The highest rate of missed hints pertains to movements with status
UU-KK. Moreover, Figure 13 shows a high rate of justified hints on each relevant movement type.
In the next section, a statistical comparison of the results is presented.
Figure 11: Frequency (raw# and percentages) of the unjustified hints for each movement type
Figure 12: Frequency (raw# and percentages) of the missed hints for each movement type
Figure 13: Frequency (raw# and percentages) of the justified hints for each movement type
4 Comparison and Results of Hint Precision and Hint Recall
Table 8 summarizes the results of simulating the intervention mechanism using the three
different prior settings. In the simulation, the total number of movements made by all the
players was 8666. The intervention mechanism used in Prime Climb provides support on two skills:
1) number factorization skill: the knowledge of factorizing a number into its factors, and
2) common factor skill: the concept of two numbers having at least one factor in common. The
results show that, on average, more than one hint is given for every three movements made by the
player during the game-play.
Table 8: General statistics on the total number and [percentage] of hints using different prior probability settings
Prior Setting           Population      User-Specific    Generic
Number of hints         3344 [38.6%]    3807 [43.9%]     3561 [41.1%]
Factorization hints     3256            3721             3510
Common Factor hints     88              86               51
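The percentages in Table 8 follow from dividing the number of hints by the 8666 movements made in the simulation. The short sketch below reproduces that arithmetic; the names `HINT_COUNTS` and `hint_rate` are assumptions of this illustration, while the counts are those listed in Table 8.

```python
TOTAL_MOVEMENTS = 8666  # total movements made by all players in the simulation

# (factorization hints, common factor hints) per prior setting, from Table 8.
HINT_COUNTS = {
    "Population": (3256, 88),
    "User-Specific": (3721, 86),
    "Generic": (3510, 51),
}


def hint_rate(fact_hints: int, cf_hints: int,
              total_movements: int = TOTAL_MOVEMENTS) -> float:
    """Fraction of movements that were followed by a hint."""
    return (fact_hints + cf_hints) / total_movements


if __name__ == "__main__":
    for setting, (fact, cf) in HINT_COUNTS.items():
        print(setting, fact + cf, f"{hint_rate(fact, cf):.1%}")
```

All three rates exceed one third, which is the basis for the statement that more than one hint is given for every three movements.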
In the following subsections, the effects of initializing the student’s model with the three
prior probability settings are compared on the total number of given hints, justified hints,
unjustified hints, missed hints and correctly not-given hints. In all the comparisons, we first
tested the homogeneity of variances; whenever this assumption was violated, the Welch test
followed by the Games-Howell post-hoc test was applied instead of the traditional
single-factor ANOVA.
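For readers unfamiliar with the Welch test, the following is a minimal sketch of the Welch one-way ANOVA statistic using only the Python standard library. It is not the analysis code used in this report: the function name `welch_anova` and the sample data are hypothetical, the homogeneity-of-variance check and the Games-Howell post-hoc test are not shown, and the p-value (which requires the F distribution) is left to a statistics package.

```python
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_anova(groups: Sequence[List[float]]) -> Tuple[float, int, float]:
    """Return (F statistic, df1, df2) of Welch's one-way ANOVA for unequal variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, so noisier groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    between = sum(w * (m - weighted_mean) ** 2
                  for w, m in zip(weights, means)) / (k - 1)
    correction = sum((1 - w / total_weight) ** 2 / (n - 1)
                     for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * correction)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * correction)
    return f_stat, df1, df2


if __name__ == "__main__":
    # Hypothetical data: three groups standing in for the three prior settings.
    sample = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [10.0, 11.0, 12.0]]
    print(welch_anova(sample))
```

Unlike the classical ANOVA, the second degrees of freedom here depend on the group variances and are generally not an integer.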
4.1 Total Number of Given Hints
We found no statistically significant difference in the total number of hints given between
the different prior settings (F(2,132) = 1.32, p = 0.270203 > 0.05). On average, each student
made 193 movements (std.: 53) during the interaction with Prime Climb. Table 9 shows the
mean and standard deviation of the total number of given hints for the different prior
settings. Figure 14 illustrates the average number of hints given to each player during the
interaction with Prime Climb under the different prior settings. Figure 15
compares the total hints given to each student under the different prior settings.
Figure 14: Average number of total hints given to each player
Figure 15: Total number of hints given to each player (student)
Table 9: Mean and Standard Deviation of the total number (# raw data point) of given hints
Population Prior User Specific Prior Generic Prior
Mean 74.31 84.6 79.13
Standard Deviation 27.5 32.71 29.66
4.2 Number of Given Justified Hints
Using the Welch test, we found a statistically significant difference in the total number of
given justified hints among the different prior settings (p<0.05). Table 10 shows the
mean, standard deviation and total number of justified hints for each prior probabilities
setting. Table 11 shows the results of the Games-Howell post-hoc test (“*” indicates a
significant difference).
Table 10: Descriptive statistics on total number of justified hints
                                         Population Prior    User Specific Prior    Generic Prior
Mean                                     1.83                13.8                   3.67
Standard Deviation                       2.65                10.841                 5.80
Total number of given justified hints    55                  414                    110
Table 11: Games-Howell Post-hoc test result (Dependent variable: total number of justified hints)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .000 *
Population Generic .649
User-specific Population .000 *
User-specific Generic .002 *
Generic Population .649
Generic User-specific .002 *
The results showed no significant difference between the population and the generic prior probabilities settings in the total number of justified hints. In contrast, there is a statistically significant difference between the user-specific and the population settings, as well as between the user-specific and the generic settings. Figures 16 and 17 respectively illustrate the average number of given justified hints and the total number of justified hints given to each student.
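To make the pairwise comparison concrete, the quantities behind one Games-Howell comparison can be sketched from summary statistics alone. This is only an illustration under stated assumptions: the helper `games_howell_pair` is hypothetical, and the group size of 30 is inferred from the totals and means in Table 10 (55 / 1.83 is about 30) rather than stated in the report. The p-value itself comes from the studentized range distribution, which is not computed here.

```python
from math import sqrt


def games_howell_pair(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df, q) for one Games-Howell pairwise comparison."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom for unequal variances.
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    # The studentized-range statistic is |t| scaled by sqrt(2).
    q = abs(t) * sqrt(2)
    return t, df, q


N_PER_GROUP = 30  # assumption: inferred from Table 10, not stated in the report

# User-specific prior vs. population prior, summary values from Table 10.
t, df, q = games_howell_pair(13.8, 10.841, N_PER_GROUP, 1.83, 2.65, N_PER_GROUP)
```

Because the user-specific group has a much larger standard deviation than the other two, the degrees of freedom fall well below the pooled value, which is the situation the Games-Howell procedure is designed for.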
4.3 Number of Given Unjustified Hints
The Welch test showed a statistically significant difference in the total number of given unjustified hints among the different groups of prior settings (p<0.05). Table 12 shows the descriptive statistics on the total number of unjustified hints. Table 13 shows the results of the Games-Howell post-hoc test.
Figure 16: Average number of given justified hints
Figure 17: Total number of given justified hints to each student
Table 12: Descriptive statistics on total number of unjustified hints
Population Prior User Specific Prior Generic Prior
Mean 2.64 1.73 7.93
Standard Deviation 3.34 2.63 7.41
Total number of given unjustified hints 116 76 349
Table 13: Games-Howell Test (Dependent variable: total number of unjustified hints)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .546
Population Generic .002 *
User-specific Population .546
User-specific Generic .000 *
Generic Population .002 *
Generic User-specific .000 *
The results showed a significant difference between the generic prior probabilities setting and each of the population and user-specific settings in the total number of unjustified hints. There is no statistically significant difference between the user-specific and population prior probabilities settings. Figures 18 and 19 respectively illustrate the average number of given unjustified hints and the total number of unjustified hints given to each student.
Figure 18: Average number of given unjustified hints
4.4 Number of Missed Hints
The Welch test showed a statistically significant difference in the total number of missed hints among the different groups of prior settings (p<0.05). Table 14 shows the descriptive statistics on the total number of missed hints. Table 15 shows the results of the Games-Howell post-hoc test. The results showed no significant difference in the total number of missed hints between the generic and population prior probabilities settings, while there was a significant difference between the user-specific prior probabilities setting and each of the population and generic settings.
Figure 19: Total number of unjustified hints
Table 14: Descriptive statistics on total number of missed hints
Population Prior User Specific Prior Generic Prior
Mean 10.47 1.37 9.2
Standard Deviation 8.50 2.73 7.48
Total number of missed hints 314 41 276
Table 15: Games-Howell test results (Dependent variable: total number of missed hints)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .000 *
Population Generic .770
User-specific Population .000 *
User-specific Generic .000 *
Generic Population .770
Generic User-specific .000 *
Figures 20 and 21 respectively illustrate the average number of missed hints and the total number of missed hints of each student for the different prior probabilities settings.
Figure 20: Average number of missed hints
Figure 21: Total number of missed hints for each student
4.5 Number of Correctly Not-Given Hints
A single factor ANOVA found no significant difference in the total number of correctly not-given hints (F(2,129)=0.034, p>0.05). Table 16 shows the descriptive statistics on the total number of correctly not-given hints. Figures 22 and 23 respectively illustrate the average number of correctly not-given hints and the total number of correctly not-given hints for each student in the different prior probabilities settings.
Table 16: Descriptive statistics on total number of correctly not-given hints
Population Prior User Specific Prior Generic Prior
Mean 6.95 7.16 5.84
Standard Deviation 8.24 8.30 7.483
Total number of correctly not-given hints 306 315 257
Figure 22: Average number of correctly not given hints
4.6 Hint Precision
The Welch test showed a statistically significant difference in hint precision among the different groups of prior settings (p<0.05). Table 17 shows the results of the Games-Howell test and Table 18 shows the descriptive statistics on hint precision. The results showed no significant difference between the population and the generic prior probabilities settings in hint precision. In contrast, there was a statistically significant difference between the user-specific prior setting and each of the population and generic prior probabilities settings.
Table 17: Games-Howell post-hoc test result (Dependent variable: hint precision)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .001 *
Population Generic .682
User-specific Population .001 *
User-specific Generic .000 *
Generic Population .682
Generic User-specific .000 *
Table 18: Descriptive statistics on the hint precision
Population Prior User Specific Prior Generic Prior
Mean 50.79% 85.2% 41.7%
Standard Deviation 43.07 23.59 40.98
Figures 23 and 24 respectively illustrate the average hint precision and the hint precision of each student.
Figure 23: Average hint precision for the different prior settings
Figure 24: Total hint precision of each student for the different prior settings
4.7 Hint Recall
The Welch test showed a statistically significant difference in hint recall among the different groups of prior settings (p<0.05). Table 19 shows the descriptive statistics on hint recall and Table 20 gives the results of the Games-Howell post-hoc test. The results showed no statistically significant difference between the population and the generic settings, while there was a statistically significant difference between the user-specific prior probabilities setting and each of the population and generic prior probabilities settings.
Table 19: Descriptive statistics on the hint recall
Population Prior User Specific Prior Generic Prior
Mean 21.27% 93.96% 26.44%
Standard Deviation 25.31 12.01 29.33
Table 20: Games-Howell test results (Dependent variable: hint recall)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .000 *
Population Generic .746
User-specific Population .000 *
User-specific Generic .000 *
Generic Population .746
Generic User-specific .000 *
Figures 25 and 26 respectively illustrate the average hint recall and the hint recall of each
student.
Figure 25: Average hint recall for the different prior settings
Figure 26: Total hint recall of each student for the different prior settings
4.8 Thresholds Refinement in the Hinting Mechanism
As discussed in the previous section, an expert-based approach was used to find the optimal thresholds used in the intervention mechanism. Alternatively, a data-driven approach can also be used to determine the threshold values, possibly resulting in higher hint precision and hint recall. Similar to the student’s model parameter refinement discussed in Section 2, a set of values for the Fact-CorrectMove and Fact-WrongMove thresholds was examined, and the hint precision and hint recall were calculated for each value. We defined another measure of performance, called accuracy = (hint precision + hint recall)/2. Figures 27, 29 and 31 illustrate how the hint precision and hint recall change as the Fact-WrongMove threshold varies while the Fact-CorrectMove threshold holds its original value (i.e. 0.5), for all three types of prior probabilities settings (population, generic and user-specific). Subsequently, Figures 28, 30 and 32 plot the changes in hint precision, hint recall and accuracy with respect to different values of the Fact-CorrectMove threshold while the Fact-WrongMove threshold holds its optimal value, that is, the value resulting in the highest hint precision and hint recall. The thresholds which resulted in the highest hint precision and hint recall are marked in the figures. Tables 21 and 22 summarize the optimal thresholds and the total number, average and standard deviation of given hints for all prior probabilities settings.
Table 21: Summary of hinting strategy’s thresholds refinement
Prior setting CF-CorrectMove CF-WrongMove Hint Precision Hint Recall
Population 0.72 0.8 55.2% 56.2%
Generic 0.88 0.76 40.6% 94.2%
User-specific 0.68 0.44 92.8% 95.1%
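The accuracy measure defined in this section, read here as the mean of hint precision and hint recall, can be evaluated directly on the values in Table 21. The sketch below is illustrative only; the names `hint_accuracy` and `table_21` are ours and do not come from the Prime Climb code.

```python
def hint_accuracy(precision, recall):
    """Mean of hint precision and hint recall, as defined in the text."""
    return (precision + recall) / 2


# (hint precision %, hint recall %) at the refined thresholds, from Table 21.
table_21 = {
    "Population": (55.2, 56.2),
    "Generic": (40.6, 94.2),
    "User-specific": (92.8, 95.1),
}

accuracy = {name: hint_accuracy(p, r) for name, (p, r) in table_21.items()}
best_setting = max(accuracy, key=accuracy.get)
```

Averaging the two measures penalizes a setting such as the generic prior, whose high recall is offset by low precision.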
Table 22: Descriptive statistics of the hinting strategy’s thresholds refinement
Prior setting Total number of given hints Average number of given hints Std. number of given hints
Population 6703 148 34
Generic 8024 178 45
User-specific 6556 145 34
Figure 27: FACT-WrongMove threshold refinement for the population prior probabilities
Figure 28: FACT-CorrectMove threshold refinement for the population prior probabilities
Figure 29: FACT-WrongMove threshold refinement for the generic prior probabilities
Figure 30: FACT-CorrectMove threshold refinement for the generic probabilities
Figure 31: FACT-WrongMove threshold refinement for the user-specific prior probabilities
Figure 32: FACT-CorrectMove threshold refinement for the user-specific prior probabilities
5 Model Precision and Sensitivity
In Sections 3 and 4, two measures of the effectiveness of the intervention (hinting) mechanism in Prime Climb, hint precision and hint recall, were calculated. In contrast, the main objective of the current section is to quantify the ability of the student’s model to detect the player’s level of number factorization skills during the interaction with Prime Climb. As with the calculation of hint precision and hint recall, since there is no ground truth on how number factorization knowledge evolves during the student’s interaction with the game between the pre-test and the post-test, only numbers with the same score in the pre-test and post-test were considered. To this end, four measures were defined, namely 1) model positive precision, 2) model negative precision, 3) model sensitivity and 4) model specificity. Before formulating these measures, some terminology needs to be defined. In the following definitions, a “known/unknown factorization skill” refers to a factorization skill on which the student keeps the same score from the pre-test to the post-test, having correctly/wrongly answered the skill’s corresponding question in both the pre-test and the post-test.
• True-Positive: The student’s model correctly assesses a known factorization skill as known to the student during the game-play.
• False-Positive: The student’s model fails to assess an unknown factorization skill as unknown to the student during the game-play.
• True-Negative: The student’s model correctly assesses an unknown factorization skill as unknown to the student during the game-play.
• False-Negative: The student’s model fails to assess a known factorization skill as known to the student during the game-play.
Given the above definitions, model positive precision, model negative precision, model sensitivity and model specificity are formulated as follows:
Equation 3: Model Positive Precision
$$\text{model positive precision} = \frac{\#\text{ of True-Positive}}{\#\text{ of True-Positive} + \#\text{ of False-Positive}}$$
Equation 4: Model Negative Precision
$$\text{model negative precision} = \frac{\#\text{ of True-Negative}}{\#\text{ of True-Negative} + \#\text{ of False-Negative}}$$
Equation 5: Model Sensitivity
$$\text{model sensitivity} = \frac{\#\text{ of True-Positive}}{\#\text{ of True-Positive} + \#\text{ of False-Negative}}$$
Equation 6: Model Specificity
$$\text{model specificity} = \frac{\#\text{ of True-Negative}}{\#\text{ of True-Negative} + \#\text{ of False-Positive}}$$
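Equations 3-6 can be sketched in a few lines. The function name `model_measures` and the dictionary keys are ours; the counts are those reported for the generic prior setting in Section 5.1.2, and the results agree with the values reported there to within rounding. The sketch does not guard against empty denominators.

```python
def model_measures(tp, fp, tn, fn):
    """Return the four measures of Equations 3-6 as a dict."""
    return {
        "positive_precision": tp / (tp + fp),
        "negative_precision": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Counts reported for the generic prior setting in Section 5.1.2.
generic = model_measures(tp=2491, fp=386, tn=171, fn=592)
```

Note that the two precision measures are computed over what the model asserted (the columns of the confusion matrix), while sensitivity and specificity are computed over what the pre-test and post-test established (the rows).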
5.1 Simulation of Interactions of the Users with Prime Climb
Log files of the interactions of 45 students in grades 5 and 6 with Prime Climb were parsed to simulate the movements the students made during the game-play. In total, 8666 movements were extracted from the log files. Then, a post-processing filter was applied to exclude the movements in which neither the player’s number nor the partner’s number keeps the same score from the pre-test to the post-test, leaving 3203 movements with at least one number with the same status in the pre-test and post-test. The movements were classified into 16 possible groups on the basis of the status of the player’s number and the partner’s number in the pre-test and post-test. Figure 33 shows the percentage frequency distribution of the statuses. Overall, there are 3083 (84.6%) and 559 (15.4%) data points which represent numbers with status KK and UU respectively.
Figure 33: Percentage frequency of each movement types in total movements made by the players
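The filtering step described above can be sketched as follows. This is a hedged illustration: we assume that the two-letter status records whether the number was answered correctly (K, known) or wrongly (U, unknown) in the pre-test and then the post-test, and the record layout and helper names are hypothetical rather than taken from the Prime Climb log format.

```python
def status(pre_correct, post_correct):
    """Two-letter status: K (known) or U (unknown) for pre-test then post-test."""
    return ("K" if pre_correct else "U") + ("K" if post_correct else "U")


def stable_numbers(movement):
    """Return the (number, status) pairs whose status is KK or UU."""
    kept = []
    for number, pre_correct, post_correct in movement:
        s = status(pre_correct, post_correct)
        if s in ("KK", "UU"):
            kept.append((number, s))
    return kept


# Hypothetical movements: each holds (number, pre-test correct?, post-test correct?)
# for the player's number and the partner's number.
movements = [
    [(25, True, True), (9, False, True)],    # KK + UK: kept for 25 only
    [(27, False, False), (81, True, True)],  # UU + KK: both numbers kept
    [(31, True, False), (9, False, True)],   # KU + UK: movement excluded
]
filtered = [m for m in movements if stable_numbers(m)]

# Four statuses per number and two numbers per movement give 16 groups.
one_number = ("KK", "KU", "UK", "UU")
all_groups = {player + partner for player in one_number for partner in one_number}
```

A movement survives the filter as soon as one of its two numbers is stable, which is why one movement can contribute either one or two data points to the later counts.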
The objective of the simulation was to evaluate how accurately the model could assess the level of factorization knowledge for the numbers with the status of KK and UU after each movement. To this end, we used two thresholds, namely FACT-CorrectMove and FACT-WrongMove. The former is the cut-off for evaluating a number factorization skill as known (above the threshold) or unknown (below the threshold) after a correct movement, and the latter is the cut-off for evaluating a number factorization skill as known (above the threshold) or unknown (below the threshold) after a wrong movement. The initial values used for these two thresholds were 0.5 (for FACT-CorrectMove) and 0.8 (for FACT-WrongMove). These values are identical to the original values used for the thresholds in the hinting strategy (see Table 3). We counted the True-Positive, True-Negative, False-Positive and False-Negative cases over all the students, formed the confusion matrix, and used Equations 3-6 to calculate the model positive precision, model negative precision, model sensitivity and model specificity. The structure of the confusion matrix is shown in Table 23.
Table 23: Structure of the confusion matrix
Model assessment of student knowledge
Pre-Post Test Unknown Known
Known False Negative (FN) True Positive (TP)
Unknown True Negative (TN) False Positive (FP)
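The way a single data point is assigned to a cell of Table 23 can be sketched as below. The data points are hypothetical, the function names are ours, and the treatment of a probability exactly equal to the cut-off (counted as unknown here) is an assumption, since the text only specifies "above" and "below".

```python
FACT_CORRECT_MOVE = 0.5  # initial cut-off after a correct movement
FACT_WRONG_MOVE = 0.8    # initial cut-off after a wrong movement


def assessed_known(probability, move_correct):
    """Apply the cut-off that matches the outcome of the movement."""
    threshold = FACT_CORRECT_MOVE if move_correct else FACT_WRONG_MOVE
    # Assumption: a probability exactly at the cut-off counts as unknown.
    return probability > threshold


def confusion_cell(actually_known, probability, move_correct):
    """Map one data point to TP, FP, TN or FN following Table 23."""
    predicted_known = assessed_known(probability, move_correct)
    if actually_known:
        return "TP" if predicted_known else "FN"
    return "FP" if predicted_known else "TN"


# Hypothetical data points: (status from the tests, model probability, move correct?)
points = [("KK", 0.9, True), ("KK", 0.7, False), ("UU", 0.6, True), ("UU", 0.3, False)]
counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
for test_status, probability, move_correct in points:
    counts[confusion_cell(test_status == "KK", probability, move_correct)] += 1
```

The second data point shows the effect of the stricter cut-off after a wrong movement: a probability of 0.7 would count as known after a correct movement but is assessed as unknown here.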
5.1.2 Generic Prior Setting
Table 26 shows the confusion matrix when the generic prior probabilities were used to initialize the student’s model. Similar to the population prior probabilities setting, the low percentage of TrueNegative (4.7% out of 15.3%) indicates that the model with the generic prior probabilities has difficulty detecting the factorization skills that are “unknown” to the student during the game-play; consequently, a low model negative precision and model specificity are expected. The student’s model performed best at evaluating the “known” skills as “known” (68.4% out of 84.7%). Table 27 gives the values of the four measures: model positive precision, model negative precision, model sensitivity and model specificity. Figure 35 illustrates the percentages of the elements of the confusion matrix (TruePositive, FalsePositive, TrueNegative, FalseNegative) for each relevant status.
Table 26: Confusion Matrix (# of raw data points and [percentages]) for the generic priors
Model assessment of student knowledge (Generic-based Prior)
Pre-Post Test Unknown Known Total
Known 592[16.3%] (FN) 2491[68.4%] (TP) 3083[84.7%]
Unknown 171[4.7%] (TN) 386[10.6%] (FP) 557[15.3%]
Total 763[21.0%] 2877[79.0%] 3640[100%]
Table 27: Summary of the results on the model analysis for the generic priors setting
Prior Setting: Generic-based
Measures Model Positive Precision Model Negative Precision Model Sensitivity Model Specificity
Values 0.866 0.225 0.808 0.307
Figure 35: Frequency (%) of the elements of the confusion matrix for each relevant status
5.1.3 User-specific Prior Setting
Table 28 gives the confusion matrix when the user-specific prior probabilities are used to initialize the student’s model in Prime Climb. The low percentages of FalseNegative (4.3% out of 84.7%) and FalsePositive (2.1% out of 82.5%) and the high percentages of TrueNegative (13.2% out of 17.5%) and TruePositive (80.4% out of 84.7%) provide evidence that the student’s model initialized with the user-specific prior probabilities performs well in assessing “known” skills as “known” and “unknown” skills as “unknown”. Table 29 gives the values of the four measures: model positive precision, model negative precision, model sensitivity and model specificity. Figure 36 also illustrates the percentages of the elements of the confusion matrix in their relevant statuses.
Table 28: Confusion Matrix (# of raw data points and [percentages]) for the user-specific prior setting
Model assessment of student knowledge (User-specific-based Prior)
Pre-Post Test Unknown Known Total
Known 156[4.3%] (FN) 2927[80.4%] (TP) 3083[84.7%]
Unknown 480[13.2%] (TN) 77[2.1%] (FP) 557[15.3%]
Total [17.5%] [82.5%] 3640[100%]
Table 29: Summary of the results on the model analysis for the user-specific priors setting
Prior Setting: User-specific-based
Measures Model Positive Precision Model Negative Precision Model Sensitivity Model Specificity
Values 0.975 0.755 0.95 0.862
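The percentages quoted for the user-specific setting can be rederived from the four cell counts in Table 28. The short sketch below does so, including the column totals, whose raw counts are not printed in the table; the variable names are ours.

```python
# Cell counts from Table 28 (user-specific prior setting).
cells = {"FN": 156, "TP": 2927, "TN": 480, "FP": 77}
total = sum(cells.values())


def percent(count):
    """Share of all data points, in percent with one decimal."""
    return round(100 * count / total, 1)


assessed_unknown = cells["FN"] + cells["TN"]  # column "Unknown"
assessed_known = cells["TP"] + cells["FP"]    # column "Known"
cell_percent = {name: percent(count) for name, count in cells.items()}
```

With these counts the column totals come to 636 and 3004 data points, matching the 17.5% and 82.5% shown in the table.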
Figure 36: Frequency (%) of the elements of the confusion matrix for each relevant status
6 Comparison of the Model’s Performance for Different Prior Probabilities
In the previous section we showed how the student’s model performs when the different prior probabilities settings are used to initialize it. In this section, the effects of the different prior probabilities settings on the number of TruePositive, FalsePositive, TrueNegative and FalseNegative cases and on model positive precision, model negative precision, model sensitivity and model specificity are analyzed statistically.
6.1 Total Number of True-Negative
The Welch test showed a statistically significant difference in the number of TrueNegative among the different groups of prior settings (p<0.05). Table 30 shows the descriptive statistics on TrueNegative. Table 31 shows the results of the subsequent Games-Howell test. The results showed no statistically significant difference in the total number of TrueNegative between the population and generic prior probabilities settings, while there was a statistically significant difference between the user-specific prior probabilities setting and the other two settings.
Table 30: Descriptive statistics on True Negative
Population Prior User Specific Prior Generic Prior
Mean 4.1 16 5.7
Standard Deviation 5.58 13.24 8.08
Table 31: Games-Howell test result (Dependent variable: True Negative)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .000 *
Population Generic .648
User-specific Population .000 *
User-specific Generic .002 *
Generic Population .648
Generic User-specific .002 *
6.2 Total Number of False-Negative
The Welch test showed a statistically significant difference in the number of FalseNegative among the different groups of prior settings (p<0.05). Table 32 shows the descriptive statistics on the total number of FalseNegative. Table 33 shows the results of the Games-Howell tests. The results showed a statistically significant difference between all three groups of prior probabilities settings in the total number of FalseNegative.
Table 32: Descriptive statistics on False Negative
Population Prior User Specific Prior Generic Prior
Mean 6.61 3.54 13.45
Standard Deviation 6.15 4.87 10.45
Table 33: Games-Howell test result (Dependent variable: False-Negative)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .030 *
Population Generic .001 *
User-specific Population .030 *
User-specific Generic .000 *
Generic Population .001 *
Generic User-specific .000 *
Figure 37: Average of TrueNegative for the different prior probabilities settings
Figure 38: Total TrueNegative for each student
Figures 39 and 40 respectively illustrate the average number of FalseNegative and the total
FalseNegative for each student for each prior probabilities setting.
Figure 39: Average of FalseNegative for the different prior probabilities settings
Figure 40: Total number of FalseNegative for each student when different prior probabilities were used
6.3 Comparison of Total Number of True Positive
Since the test of homogeneity of variance (Levene statistic) showed no significant difference between the variances of the three prior probabilities settings, a traditional single factor ANOVA was used. It showed no statistically significant difference in the total number of TruePositive among the different groups of prior settings (F(2,129)=0.63, p=0.531758>0.05). Table 34 shows the mean and standard deviation of the total TruePositive for the different settings.
Figures 41 and 42 respectively illustrate the average TruePositive and the total number of TruePositive of each student for the different prior probabilities settings.
Table 34: Descriptive statistics on True Positive
Population Prior User Specific Prior Generic Prior
Mean 63.45 66.52 56.61
Standard Deviation 42.91 43.48 40.24
Figure 41: Average of TruePositive
Figure 42: Total number of TruePositive of each student for each prior probabilities setting
Figure 44: The total number of FalsePositive of each student for each prior probabilities setting
6.5 Comparison of Model Positive Precision
The Welch test showed a statistically significant difference in the model positive precision among the different groups of prior probabilities settings (p<0.05). Table 37 shows the descriptive statistics on the model positive precision. Table 38 shows the result of the Games-Howell post-hoc test. The results showed no statistically significant difference in model positive precision between the population and generic prior probabilities settings. Furthermore, there was a statistically significant difference in the model positive precision between the user-specific prior probabilities setting and the other two settings. Figures 45 and 46 respectively illustrate the average (in percentage) of the model positive precision and the model positive precision for each student for the different prior probabilities settings.
Table 37: Descriptive statistics on Model Positive Precision
Population Prior User Specific Prior Generic Prior
Mean 84.68 96.87 84.57
Standard Deviation 20.73 8.78 20.61
Table 38: Games-Howell test results (Dependent variable: Model Positive Precision)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .002 *
Population Generic 1.000
User-specific Population .002 *
User-specific Generic .002 *
Generic Population 1.000
Generic User-specific .002 *
Figure 45: Average of the model positive precision
Figure 46: model positive precision for each student for different prior probabilities settings
6.6 Comparison of the Model Negative Precision
The Welch test showed a statistically significant difference in the model negative precision among the different groups of prior settings (p<0.05). Table 39 shows the descriptive statistics on the model negative precision. Table 40 shows the results of the Games-Howell test. The results showed no statistically significant difference in the model negative precision between the population and generic settings. There was, however, a statistically significant difference between the user-specific prior probabilities setting and the other two settings. Figures 47 and 48 respectively illustrate the average (in percentage) of the model negative precision and the model negative precision for each student for each prior probabilities setting.
Table 39: Descriptive statistics on Model Negative Precision
Population Prior User Specific Prior Generic Prior
Mean 38.73 75.91 32.94
Standard Deviation 38.52 26.45 36.25
Table 40: Games-Howell test results (Dependent variable: Model Negative Precision)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .000 *
Population Generic .821
User-specific Population .000 *
User-specific Generic .000 *
Generic Population .821
Generic User-specific .000 *
Figure 47: Average of the Model Negative Precision
Figure 48: Model Negative Precision for each student and each prior probabilities setting
6.7 Comparison of the Model Sensitivity
The Welch test showed a statistically significant difference in the model sensitivity among the different groups of prior settings (p<0.05). Table 41 shows the descriptive statistics on the model sensitivity. Table 42 gives the result of the Games-Howell test. The results showed a statistically significant difference among all settings of the prior probabilities. Figures 49 and 50 illustrate the average (in percentage) of the model sensitivity and the model sensitivity for each student and for each prior probabilities setting.
Table 41: Descriptive statistics on Model Sensitivity
Population Prior User Specific Prior Generic Prior
Mean 90.5 95.26 80.93
Standard Deviation 6.82 5.45 9.95
Table 42: Games-Howell test result (Dependent variable: Model Sensitivity)
Games-Howell Test Comparison
Prior Probabilities Prior Probabilities p-value (Sig.) Significant (*: Yes)
Population User-specific .002 *
Population Generic .000 *
User-specific Population .002 *
User-specific Generic .000 *
Generic Population .000 *
Generic User-specific .000 *
Figure 52: Model specificity for each student for the different prior probabilities settings
7 Preliminary Analysis on Pre-Post Tests
As shown in Section 4, the original values of the thresholds in the hinting mechanism in Prime Climb resulted in low hint precision and hint recall in the population and generic prior settings. In contrast, initializing the student’s model with the user-specific prior probabilities resulted in high hint precision and hint recall. Moreover, as already discussed in Section 3, in measuring the hint precision and hint recall we had to consider solely the movements involving at least one number (the player’s number or the partner’s number) which appears on the pre-test and post-test and for which the student gives the same answer to the number’s corresponding question in both tests. Under this constraint, only a small share of the hints (Mean: 29.3%, Std: 9.96%) out of all hints given to the student could be considered in calculating the hint precision and hint recall. This could negatively affect the values of hint precision and hint recall when the user-specific prior is used, because the prior probabilities are only set for the nodes in the BN whose corresponding numbers appear on the pre-test and post-test; the prior probabilities of the other nodes are set to 0.5, which is equal to the prior probabilities used in the generic prior setting. To investigate the possibility of such a negative effect, we calculated some preliminary descriptive statistics on the numbers appearing on the pre-test and post-test. Table 45 lists the numbers that appear most frequently in the movements and whether or not they appear on the pre-test and post-test (Y: Yes, N: No).
Table 45: Numbers (top 15) with highest frequency of appearance in the movements
Number 17 25 76 4 27 40 81 89 99 97 96 19 37 31 9
Frequency 713 644 578 554 515 498 463 439 412 407 391 373 366 345 325
In pretest? N Y N N Y N Y Y N Y N N N Y Y
It follows that more than 50% (8 out of 15) of the most frequently visited numbers do not appear on the pre-test and post-test. Tables 46 and 47 show the most frequently visited numbers in correct and wrong movements respectively. 60% of the most frequently visited numbers in the correct movements do not appear on the pre-test and post-test. The situation is worse for the wrong movements (73%).
Table 46: Numbers with highest frequency of visit in the correct movements
Number 17 25 76 89 27 4 97 81 19 37 31 13 99 40 71
Frequency 713 580 492 439 427 420 407 399 367 362 345 310 308 304 283
In pretest? N Y N Y Y N Y Y N N Y N N N N
Table 47: Numbers with highest frequency of visit in the wrong movements
Number 40 57 18 96 4 15 99 36 50 21 9 33 27 76 69
Frequency 194 145 143 142 134 108 104 100 95 94 91 89 88 86 74
In pretest? N N N N N Y N N N N Y Y Y N N
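The shares of frequently visited numbers that are missing from the pre-test and post-test can be recomputed from the "In pretest?" rows of Tables 45, 46 and 47. The sketch below is a simple tally; the helper name `share_not_in_pretest` is ours.

```python
def share_not_in_pretest(flags):
    """Fraction of numbers flagged 'N' (not on the pre-test and post-test)."""
    return flags.count("N") / len(flags)


# "In pretest?" rows copied from Tables 45, 46 and 47.
all_moves = "N Y N N Y N Y Y N Y N N N Y Y".split()
correct_moves = "N Y N Y Y N Y Y N N Y N N N N".split()
wrong_moves = "N N N N N Y N N N N Y Y Y N N".split()

shares = {
    "all": share_not_in_pretest(all_moves),
    "correct": share_not_in_pretest(correct_moves),
    "wrong": share_not_in_pretest(wrong_moves),
}
```

The tally gives 8, 9 and 11 numbers out of 15, that is, about 53%, 60% and 73% respectively.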
8 Conclusion and Future work
This manuscript reports on the refinement of the student’s model parameters and on the analysis of the intervention mechanism and the student’s model used in Prime Climb. The highest accuracy in predicting the students’ performance in the post-test, conducted after the students interacted with Prime Climb, was 75.5%, obtained when the population prior setting was used. When the population and generic prior settings were used, the hint precision and hint recall were very low. In contrast, these values were high when the user-specific prior setting was used. There were significant differences in the total number of justified and missed hints between the user-specific prior probabilities setting and the other two settings, with no significant difference between the population and generic settings; for unjustified hints, the generic setting differed significantly from the other two, and no significant difference was found for correctly not-given hints. Furthermore, the student’s model initialized with the user-specific prior probabilities setting resulted in higher model positive precision, model negative precision, model sensitivity and model specificity.
As for future work, we would like to concentrate on the situations which negatively affect the model’s specificity and model negative precision and investigate whether they follow specific patterns. The other focus will be on finding the most appropriate time to intervene, as it was shown that, although the student is interrupted too often during the interaction with the game and provided with hints, the hint precision and hint recall are very low when the population and