http://www.iaeme.com/IJCIET/index.asp 1674 [email protected]
International Journal of Civil Engineering and Technology (IJCIET) Volume 10, Issue 01, January 2019, pp. 1674-1685, Article ID: IJCIET_10_01_153
Available online at http://www.iaeme.com/ijciet/issues.asp?JType=IJCIET&VType=10&IType=01
ISSN Print: 0976-6308 and ISSN Online: 0976-6316
© IAEME Publication Scopus Indexed
USE OF RECURRENT NEURAL NETWORK
ARCHITECTURES FOR DATA VERIFICATION
IN THE SYSTEM OF DISTANCE EDUCATION
Arseniy Aleksandrovich Lebedev
Laboratory of Innovations, Ltd, Kazan, Tatarstan, Russian Federation
ABSTRACT
There are very few examples of the use of various recurrent neural network
architectures to predict student learning outcomes. In fact, the only architecture used to
solve this problem is the LSTM architecture. The works devoted to the use of LSTM to
predict educational outcomes do not present a detailed theoretical justification for
preferring this particular RNN architecture. In this regard, it seems advisable to provide
such a justification within this study.
The main property of the input data for predicting educational outcomes is its
temporal nature. A sequence of user actions unfolds in time and is evaluated
(classified) by an external observer as evidence of the presence or absence of an
educational result (objective or metaobjective). Accordingly, the RNN used to classify
user actions must adjust the neuron weights over a certain set of states in the past. At
the same time, the length of this sequence of states is not predetermined: it can be
short (for example, for objective results) or quite long.
Keywords: Distance education, Recurrent neural network, Architecture, Structure,
Information technology, Monitoring, Educational outcomes prediction, Online courses.
Cite this Article: Arseniy Aleksandrovich Lebedev, Use of Recurrent Neural Network
Architectures for Data Verification in the System of Distance Education, International
Journal of Civil Engineering and Technology, 10(01), 2019, pp. 1674–1685
http://www.iaeme.com/IJCIET/issues.asp?JType=IJCIET&VType=10&IType=01
1. INTRODUCTION
Early RNN architectures used backpropagation through time [1] and real-time recurrent
learning [2] to solve this problem.
1.1. Two main problems with the propagation of the error signal
1. the error signal grew sharply (exploded) with increasing distance to past states;
2. the error signal vanished.
In both cases, the change in the error signal depended exponentially on the temporal
distance between the states involved. Problem 1 led to an undesirable effect of constant
oscillation of the weights; problem 2 led to a situation where, with a sufficiently large time
distance between the states affected by the error signal, network training either took an
unacceptably long time or did not occur at all [3].
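These two failure modes are easy to reproduce numerically. The sketch below (illustrative, not from the paper) backpropagates an error vector through 50 steps of a linear 8-unit recurrence; the spectral radius of the recurrent matrix decides whether the signal explodes or vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)

def signal_norm_after(T, radius):
    # Random recurrent matrix rescaled to the chosen spectral radius.
    W = rng.standard_normal((8, 8))
    W *= radius / max(abs(np.linalg.eigvals(W)))
    err = np.ones(8)
    for _ in range(T):
        err = W.T @ err          # one step of error backpropagation
    return float(np.linalg.norm(err))

exploding = signal_norm_after(50, radius=1.5)  # radius > 1: norm blows up
vanishing = signal_norm_after(50, radius=0.5)  # radius < 1: norm dies out
```

With radius 1.5 the error norm grows roughly as 1.5^T, and with radius 0.5 it decays roughly as 0.5^T, matching the exponential dependence noted above.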
2. LITERATURE REVIEW
Researchers have proposed a variety of different RNN (recurrent neural network) architectures
aimed at solving these problems.
2.1. Altinay
The work by Altinay in 2017 [4] provides an overview of a number of approaches that use
various modified gradient descent solutions for solving problems with error signal propagation,
but none of the proposed options solves both problems at once.
2.2. Zagami
In work by Jason Zagami in 2018 [5], a time-delay network architecture was proposed for
solving both problems, but only for cases of relatively short sequences of states. In such a
network, the neuron weights are updated with a weighted sum of the old weights.
2.3. Fox-Turnbull
The idea of time-delayed recurrent networks formed the basis of the NARX neural network
described in work by Fox-Turnbull in 2016 [6].
2.4. Campbell
To solve problems with the propagation of an error signal in cases of relatively long sequences
of states in work by Campbell in 2015 [7], it was proposed to use a set of time constants
governing the updating of the weights.
2.5. Baines and Chen
An attempt to combine the time-delay neural network approach with a time constant
regulating the updating of the weights was undertaken in work by Mapotse in 2018 [20].
However, as in work by Baines in 2018 [8] and in work by Chen in 2018 [9], in the case of
long sequences of states a painstaking and time-consuming process of selecting the time
constants was required.
2.6. Hsu
An alternative solution to both problems for short and long sequences of events was described
in work by Hsu in 2016 [10]. The authors proposed to update the weights of the recurrent cell
by summing the old weight and the current normalized input value. At the same time, the
normalized current input value gradually distorted (supplanted) the stored information about
past states, which made it impossible to work with long sequences of states.
2.7. Fletcher-Watson
In work by Fletcher-Watson in 2015 [11], to solve problems with the error signal on long
sequences of events, it was proposed to use special, separate network cells that affect the
weights. Such cells are added only if conflicting error signals occur on the network. In a limited
number of cases, this approach can significantly reduce the number of calculations on the
network, however, in unfavorable cases, the number of additional cells can be equal to the
number of states in the sequence, which can lead to problems similar to the infinite oscillation
of weights.
2.8. Hochreiter and Schmidhuber
In the paper by Hochreiter and Schmidhuber [12], the LSTM recurrent neural network
architecture was proposed. It solves both problems with the propagation of the error signal
for almost arbitrary sequences of states: the backpropagated error is kept constant by
construction (the "constant error carousel"), and the weights are fitted by an efficient
gradient-descent-based algorithm. Signal propagation occurs through the states of network
cells that have a specific four-layer architecture. As a result, LSTM is capable of maintaining a
temporal relationship of more than 1000 states even in the case of fairly “noisy” input data and
at the same time does not lose this property on short sequences of states.
3. MATERIALS AND METHODS
In contrast to classical machine learning methods, which find a point estimate of the
neural network parameters w, Bayesian neural networks treat objects, target variables
and parameters as random variables. Accordingly, the neural network models the
dependence p(y|x,w). The prior distribution p(w) encodes initial knowledge and expectations
about the parameters; for example, in thinning models the prior distribution encourages zero
parameter values. The learning process consists in finding the posterior distribution of the
parameters p(w|D). The predictions of the model are then given as
p(y|x) = E_{p(w|D)} p(y|x,w) (1)
To find the posterior distribution over the parameters, the Bayes formula is used:
p(w|D) = p(D|w)p(w) / ∫ p(D|w)p(w) dw (2)
Computing it directly fails because of the intractable integral in the denominator. Therefore
an approximate posterior distribution qλ(w) is sought in a certain parametric family of
distributions, where λ denotes the parameters of the approximate posterior. The parameters
λ are chosen so as to minimize the KL divergence:
KL(qλ(w)||p(w|D)) → min over λ (3)
which is equivalent to maximizing the variational lower bound on the log-likelihood:
L(λ) = Σ_{i=1..N} E_{qλ(w)} log p(yi|xi,w) − KL(qλ(w)||p(w)) → max over λ (4)
This expression is the sum of a term responsible for the quality of the solution of the
problem and a regularizer requiring the posterior distribution of the parameters to stay
close to the prior.
The first term is usually estimated by the Monte Carlo method with one sample of the
weights per object:
E_{qλ(w)} log p(yi|xi,w) ≈ log p(yi|xi,ŵ), ŵ ∼ qλ(w). (5)
To avoid gradient bias, the reparameterization trick [13] is applied: the weights are
expressed as a deterministic function of λ and parameter-free random noise,
w = g(λ, ε), ε ∼ p(ε). (6)
The reparameterization trick is not applicable to all distributions, but it is applicable, for
example, to the normal distribution:
w = μ + σ ⊙ ε, ε ∼ N(0, I), (7)
where ⊙ denotes elementwise multiplication.
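A minimal numpy sketch of the trick for the normal case (variable names and values are illustrative, not from the paper): the learnable parameters μ and σ stay outside the sampling operation, so gradients can flow through them while only the parameter-free noise ε is random:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0, 2.0])        # variational means (learned)
sigma = np.array([0.1, 0.2, 0.05])     # variational standard deviations (learned)

# eps carries no learnable parameters; w = mu + sigma * eps ~ N(mu, sigma^2).
eps = rng.standard_normal((10000, 3))
w = mu + sigma * eps                   # elementwise multiplication, as in (7)
```

The empirical mean and standard deviation of the samples recover μ and σ, confirming that the transformed noise has the intended distribution.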
In addition, multiple regression, logistic regression and the coefficient of determination
were used as controls in the experiment.
4. RESULTS AND DISCUSSIONS
Dropout [14] is a regularization technique for neural networks, which imposes multiplicative
noise on the inputs of each layer. Typically, noise vector elements are generated from the
Bernoulli distribution (binary dropout) or from the normal distribution with a center at 1
(Gaussian dropout), and the parameters of this noise are adjusted using cross-validation. In
work by McLain in 2018 [15], an interpretation of the Gaussian dropout is proposed as a way
to specify the Bayesian neural network. This made it possible to adjust the noise parameters
automatically. In work by Virtanen in 2015 [16], this approach was extended to thin fully
connected neural networks and was called thinning variational dropout (TVD).
Consider a fully connected layer h = g(Wx + b) with a weight matrix W. In the TVD, the
prior distribution of the weights is given as a factorized log-uniform distribution:
p(W) = ∏_{i,j} p(wij), p(|wij|) ∝ 1/|wij|. (8)
This distribution has a large mass at zero and therefore encourages zero weights.
The approximate posterior distribution is sought in the family of factorized normal
distributions:
q(W) = ∏_{i=1..k, j=1..n} q(wij), q(wij|θij,αij) = N(θij, αij θij²). (9)
The use of such a posterior distribution is equivalent to the imposition of multiplicative [17]
wij = θij ξij, ξij ∼ N(1, αij), (10)
or additive [18]
wij = θij + εij, εij ∼ N(0, αij θij²), (11)
normal noise on the weights. The parameterization (11) is called additive
reparameterization and makes it possible to reduce the variance of the gradients of L with
respect to the mean weights θij. In addition, since a sum of normal random variables is again
normal with easily computed parameters, the noise can be imposed on the preactivation Wx
rather than separately on the components of the matrix W. This technique is called local
reparameterization [19, 20]. Local reparameterization reduces the gradient variance even
further and also saves computation, since sampling noise on the weights separately for each
object is an expensive operation.
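Local reparameterization can be sketched as follows (shapes and names are illustrative, assuming independent normal weights wij ∼ N(θij, σij²)): the preactivation b = xW is itself normal, so one noise sample per preactivation replaces a full weight-matrix sample per object:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((32, 100))       # mini-batch of inputs
theta = rng.standard_normal((100, 50))   # posterior means of the weights
sigma2 = np.full((100, 50), 0.01)        # posterior variances of the weights

# Moments of the induced normal distribution of the preactivation b = x @ W.
mean = x @ theta                         # E[b]   = x @ theta
var = (x ** 2) @ sigma2                  # Var[b] = x^2 @ sigma^2

# Sample noise directly on the (32, 50) preactivation instead of on a
# (32, 100, 50) stack of per-object weight matrices.
b = mean + np.sqrt(var) * rng.standard_normal(mean.shape)
```

The sampled tensor has the batch-by-output shape of the preactivation, which is far cheaper than per-object weight sampling when the weight matrix is large.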
In the TVD, the variational lower bound (4) is optimized with respect to {θ, log σ} using the
reparameterization trick, additive reparameterization and local reparameterization to obtain
unbiased gradients with low variance. Since the prior and the approximate posterior
distributions factorize over the weights, the KL divergence also splits into a sum over individual
weights, and each term depends only on the noise variance αij due to the special choice of
the prior distribution:
KL(q(wij|θij,αij)||p(wij)) = k(αij),
k(α) ≈ 0.64 σ(1.87 + 1.49 log α) − 0.5 log(1 + α⁻¹) + C. (12)
The last expression is a fairly accurate numerical approximation of the KL divergence.
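The approximation (12) is cheap to evaluate; a sketch with the constants as printed above (the additive constant C is left as a parameter, and the sign convention follows the formula as given in the text):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def k(alpha, C=0.0):
    # Approximation (12): depends on the noise variance alpha alone.
    return (0.64 * sigmoid(1.87 + 1.49 * np.log(alpha))
            - 0.5 * np.log1p(1.0 / alpha) + C)
```

Evaluating it at a small and a large α shows that the formula rewards large noise variances, consistent with the observation below that the regularizer encourages large αij.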
The KL divergence (12) encourages large values of αij and small absolute values of θij. If
αij → ∞ for the weight wij, then because of the large noise variance it is advantageous for
the model to set θij = 0 and σij² = αij θij² = 0 to avoid large prediction errors. As a result,
the distribution q(wij|θij,αij) approaches a delta function at 0, and this weight is always zero.
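Test-time thinning can then be sketched as a simple threshold on the fitted log α values (the threshold of 3 is a common heuristic, not taken from this paper; all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

theta = rng.standard_normal((4, 4))          # posterior mean weights
log_alpha = rng.uniform(-5.0, 5.0, (4, 4))   # fitted per-weight log noise variances

THRESHOLD = 3.0                              # illustrative cut-off on log(alpha)
# Weights with large fitted noise collapse to a delta at 0, so prune them.
W_test = np.where(log_alpha < THRESHOLD, theta, 0.0)
sparsity = float((W_test == 0.0).mean())
```

Only the mean weights of the surviving entries are used at test time; the pruned entries contribute exact zeros, which is what makes sparse storage and faster inference possible.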
The thinning variational dropout model was extended to achieve group thinning of a fully
connected layer. Group thinning refers to the removal of an entire group of weights from the
model, for example, rows or columns of a weight matrix. Group thinning allows elements of
the hidden layers of the neural network to be removed, which speeds up the forward pass. As
an example, we combine the columns of the weight matrix of a fully connected layer into
groups and number them 1 ... k.
The authors propose to introduce a group multiplicative weight zi for each weight group and
adjust the weights in the following parameterization:
wij = ŵij zi. (13)
In a fully connected layer, this parameterization is equivalent to imposing multiplicative
noise on the layer input:
h = g(Ŵ(x ⊙ z) + b). (14)
Since the main task is to drive zi to zero, the authors use for the multiplicative variables the
same pair of prior and approximate posterior distributions as in the TVD:
p(|zi|) ∝ 1/|zi|, q(zi|θi,αi) = N(θi, αi θi²). (15)
For the individual weights, the standard normal prior distribution is used, and the posterior
distribution, as in the TVD, is approximated in the class of normal distributions:
p(wij) = N(wij|0,1), q(wij) = N(wij|θij, σij²). (16)
The prior distribution on the individual weights encourages zero means θij, and this in turn
helps drive the group variables zi to zero.
The model is trained in the same way as the TVD model, by optimizing the variational lower
bound (4). The KL divergence splits into a sum of KL divergences for the group variables and
for the weights, with the last term computed analytically.
For most tasks, recurrent neural networks are defined by dense weight matrices, with most
of the weights being uninformative and not affecting the quality of the solution to the problem.
Despite the existence of heuristic approaches to thinning RNN based on a large number of
hyperparameters, the use of Bayesian thinning techniques has not been previously investigated
for recurrent neural networks. On the other hand, the literature describes various models of
Bayesian regularization of RNN, some features of which are also reflected in the proposed
model.
When applying the TVD to an RNN, the features of the recurrent layer should be taken into
account:
• the weights in the recurrent layer are tied in time, that is, different elements of the
input sequence are multiplied by the same weight matrices;
• in Bayesian regularization of an RNN, the current hidden state ht and the matrix of
recurrent weights Wh are not independent random variables, since the latter
appears in the expressions for computing the former.
First, we consider the thinning variational dropout model for a recurrent layer, and then
we note the features of applying the TVD to a fully connected layer and to the representation
layer in an RNN.
Following the TVD, we use a log-uniform prior distribution on the weights of the recurrent
layer {Wx, Wh} and approximate the posterior distribution in the class of normal distributions:
p(|wij|) ∝ 1/|wij|, q(wij|θij,σij) = N(θij, σij²) for each weight of Wx and Wh. (14)
Training the model consists in optimizing the variational lower bound
L(θ,σ) = Σ_{i=1..N} E_{q(w|θ,σ)} log p(yi|xi,w) − KL(q(w|θ,σ)||p(w)) → max (15)
with respect to the parameters {θ, log σ} using stochastic gradient optimization methods. In
expression (15), the first term is the likelihood of the model, averaged over the distribution
over the weights q(w|θ,σ). During optimization, this likelihood is estimated by the Monte
Carlo method with one sample of the weights. As in the TVD model, the
reparameterization trick and additive reparameterization are used here in order to obtain
unbiased gradients with low variance.
log p(yi|xi, Wx, Wh) = log p(yi|hiT), where hit = g(Wx xit + Wh hi,t−1). (16)
In the likelihood, the dependence on the target variable yi is expanded in time to emphasize
that the same weights Wx, Wh are used at all points in time. Consequently, the normal noise
in Bayesian regularization of an RNN must be tied in time: the same noise sample is used for
one object at all time steps.
However, in Bayesian regularization of an RNN, local reparameterization cannot be applied
to either the Wx or the Wh weights. Applying local reparameterization to the weight matrix
Wx in an RNN would imply using the same noise sample for the preactivations Wx xt for all t,
which is not equivalent to using the same sample of the weights Wx at all points in time. For
Wh, local reparameterization cannot be applied for another reason: since ht−1 and Wh are
not independent random variables, the statement about the sum of normal distributions does
not apply to the product Wh ht−1.
Instead of local reparameterization, in order to avoid resource-intensive sampling of
three-dimensional noise tensors, it is proposed to use one sample of the weights Wx and Wh
for all objects of one mini-batch.
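The proposed sampling scheme can be sketched as follows (shapes and names are illustrative): a single sample of Wx and Wh is drawn per mini-batch and reused by every object at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)

B, T, D, H = 16, 20, 8, 12           # batch, time steps, input dim, hidden dim
x = rng.standard_normal((B, T, D))

# Posterior means and a shared noise scale (illustrative values).
theta_x, sig_x = rng.standard_normal((D, H)), 0.1
theta_h, sig_h = rng.standard_normal((H, H)) * 0.1, 0.1

# One weight sample per mini-batch, shared by all objects and all time steps.
Wx = theta_x + sig_x * rng.standard_normal((D, H))
Wh = theta_h + sig_h * rng.standard_normal((H, H))

h = np.zeros((B, H))
for t in range(T):
    h = np.tanh(x[:, t] @ Wx + h @ Wh)   # same Wx, Wh at every step
```

Sampling once per mini-batch keeps the noise tied in time while avoiding a three-dimensional per-object noise tensor.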
A similar scheme can be applied to gated architectures, for example, LSTM. In this case, a
prior and an approximate posterior distribution are used for each of the parameter matrices.
At the testing stage, we can use the mean values of the weights Θx, Θh, by analogy with the
TVD model. In addition, weights with large fitted noise variance are set to zero.
The final scheme of one time step forward through the thinning Bayesian RNN, including
the zeroing of weights in test mode, is given in algorithm (4).
When applying the thinning dropout to other RNN layers preceding or following the recurrent
layer, at the training stage one should use the same sample of weights at all time steps for one
object.
Thus, when formulating the TVD model for RNN, the following features of recurrent
neural networks are taken into account:
1. the same noise sample is used on the weights at all points in time;
2. unlike feed-forward networks, local reparameterization is not applicable to RNN,
so it is proposed to sample one weight matrix for all objects of a mini-batch.
For thinning to speed up the forward pass through the recurrent neural network when
computing on GPUs, weights need to be removed in groups corresponding to one neuron.
The approach described above can be applied for this purpose. However, it can be improved
to obtain different levels of sparsity in gated recurrent architectures. We consider this
approach for the most popular gated architecture, LSTM.
In LSTM, in addition to the hidden state vector, an internal memory vector ct is maintained
at each time step. At each step, the memory is first updated using the gate mechanism, and
then the hidden state is updated:
it = σ(Wix xt + Wih ht−1 + bi), ft = σ(Wfx xt + Wfh ht−1 + bf),
ot = σ(Wox xt + Woh ht−1 + bo), gt = tanh(Wgx xt + Wgh ht−1 + bg),
ct = ft ⊙ ct−1 + it ⊙ gt, ht = ot ⊙ tanh(ct). (17)
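One LSTM step with gates i, f, o and information flow g can be sketched in plain numpy (weight names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 6, 10
x_t = rng.standard_normal(D)
h_prev, c_prev = np.zeros(H), np.zeros(H)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# One (Wx, Wh, b) triple per gate / information flow: i, f, o, g.
params = {name: (rng.standard_normal((D, H)) * 0.1,
                 rng.standard_normal((H, H)) * 0.1,
                 np.zeros(H)) for name in "ifog"}

def branch(name, act):
    Wx, Wh, b = params[name]
    return act(x_t @ Wx + h_prev @ Wh + b)

i, f, o = (branch(n, sigmoid) for n in "ifo")
g = branch("g", np.tanh)

c_t = f * c_prev + i * g          # memory updated via the gate mechanism
h_t = o * np.tanh(c_t)            # hidden state updated from the memory
```

The memory update is additive (gated by f and i), which is what lets the error signal survive over long sequences of states.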
By analogy with (14), in order to achieve group thinning, we introduce multiplicative group
variables on the weights. In addition to the group variables z^x and z^h on the rows of the
weight matrices, responsible for excluding elements of the input and hidden vectors, we also
introduce the group variables z^i, z^f, z^g and z^o on the columns of the weight matrices,
responsible for applying the gates i, f, o and the information flow g to the input data. For
example, for the matrix Wfx we obtain the following parameterization of the weights:
w^x_{f,ij} = ŵ^x_{f,ij} z^x_i z^f_j. (18)
Such a parameterization corresponds to imposing multiplicative noise on the input vector xt
and the hidden state ht, as well as separate multiplicative noise on the preactivations of the
gates and the information flow:
it = σ(z^i ⊙ (Wix(xt ⊙ z^x) + Wih(ht−1 ⊙ z^h) + bi)), and similarly for f, o and g. (19)
When components of z^x and z^h are zeroed out, the corresponding element of the input
vector or hidden state is excluded from the model. When components of z^i, z^f, z^g or z^o
are zeroed out, the element of the corresponding gate or information flow becomes constant,
no longer determined by the input data xt and ht. Note that the appearance of constant gates
simplifies but does not violate the structure of LSTM, and in addition saves computation in
the matrix products.
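The parameterization (18) can be sketched directly (shapes and the choice of which group components to zero are illustrative): zeroing one component of z^x or z^f wipes out an entire row or column of the effective weight matrix at once:

```python
import numpy as np

rng = np.random.default_rng(0)

D, H = 5, 7
W_hat = rng.standard_normal((D, H))          # individual weights ŵ^x_{f,ij}
z_x = np.array([1.0, 0.0, 1.0, 1.0, 1.0])    # input-element groups (index 1 zeroed)
z_f = np.ones(H); z_f[2] = 0.0               # forget-gate groups (index 2 zeroed)

# w^x_{f,ij} = ŵ^x_{f,ij} * z^x_i * z^f_j, applied by broadcasting.
W_fx = W_hat * z_x[:, None] * z_f[None, :]
```

A zeroed z^x component removes an input element from all gates that consume it, while a zeroed z^f component turns the corresponding forget-gate element into a constant.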
A standard normal prior distribution for the individual weights ŵij was used previously, but
in practice this limits the thinning of the model. In this paper, it is proposed to use the same
pair of prior and approximate posterior distributions as in the TVD for RNN for all groups of
weights (the distributions for the matrix Wfx are given as an example):
p(|ŵ^x_{f,ij}|) ∝ 1/|ŵ^x_{f,ij}|, q(ŵ^x_{f,ij}) = N(θij, σij²). (20)
Due to the thinning of all three groups of weights, a hierarchical effect is achieved: thinning
of individual weights contributes to the appearance of constant gates and simplifies the
structure of the LSTM, which, in turn, helps to eliminate elements of xt and ht.
The group variables z^x, z^h, z^i, z^f, z^g and z^o are sampled using the reparameterization
trick and additive reparameterization, as in the TVD model for RNN. Training, the forward
pass and the testing phase are the same as in the TVD model for RNN, with the addition of
sampling the group variables in the forward pass and of the KL-divergence components
responsible for the group variables in the variational lower bound (15).
Group thinning can be applied similarly to the representation layer. To do this, one
introduces group multiplicative variables on the elements of the dictionary and thins both the
elements of the representation matrix and the multiplicative group variables. The prior and
approximate posterior distributions for the weights remain the same as in the group TVD
model for RNN described above. As a result of applying such a model, the effect of thinning
the input dictionary is achieved, that is, feature selection.
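A sketch of this effect on a representation matrix (dictionary size and the pruned entries are illustrative): zeroing a group variable removes the corresponding dictionary entry entirely, which is exactly feature selection:

```python
import numpy as np

rng = np.random.default_rng(0)

V, E = 8, 4                              # dictionary size, embedding dimension
emb_hat = rng.standard_normal((V, E))    # individual representation weights
z = np.ones(V); z[[2, 5]] = 0.0          # group variables (two entries pruned)

# One multiplicative group variable per dictionary entry (row).
emb = emb_hat * z[:, None]
kept = int((z != 0).sum())               # surviving input features
```

Rows whose group variable is zero contribute nothing downstream, so the corresponding input tokens or features can be dropped from the dictionary altogether.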
Thus, when formulating the group TVD model for RNN, the following features of gated
recurrent neural networks are taken into account:
• multiplicative variables are introduced on the preactivations of the gates and the
information flow;
• for the weights ŵij, a thinning log-uniform prior distribution is used, which enhances
the thinning of the group variables.
A traditional RNN maps an input sequence of vectors x1, ..., xT to an output sequence of
vectors y1, ..., yT. This is achieved by computing a sequence of "hidden" states h1, ..., hT,
which can be viewed as sequentially accumulating the information about past states that is
relevant for predicting future ones. These variables are connected by the simple system of
equations (21) and (22):
ht = tanh(Whx xt + Whh ht−1 + bh), (21)
yt = σ(Wyh ht + by), (22)
where tanh and the sigmoid function σ(·) are applied elementwise, Whx is the matrix of
input weights, Whh is the matrix of recurrent weights, Wyh is the matrix of output weights,
and bh and by are the bias vectors of the hidden and output values, respectively.
A traditional RNN has problems with the propagation of the error signal when the number
of analyzed states is large. The RNN architecture that solves this problem most efficiently
is LSTM. In LSTM, hidden elements retain their values until they are explicitly cleared by
triggering the "forget gate". Thanks to the LSTM gate mechanism, information about
intermediate states is stored much longer than in a conventional RNN, which greatly
facilitates learning. In addition, in LSTM the hidden layers are updated through
multiplicative (rather than purely additive) interactions, which allows this architecture to
express much more complex transformations with the same number of neurons in the
hidden layers.
The LSTM variant of the RNN was used in the two most recent experimental studies
devoted to studying the capabilities of the RNN for predicting educational outcomes.
In the work of scientists from Kyushu University, the accuracy of predicting the future
score for the course "Information Science" was investigated on the basis of the actions of 108
students logged by the M2B learning support system. The input to the RNN was obtained
from the logs by giving each student a score from 0 to 6 on the following scales:
1. attendance (0 - absent, 3 - late attendance, 5 - on-time attendance);
2. test score (broken into bands with a step of 20%);
3. submission of the report (0 - not submitted, 3 - submitted late, 5 - submitted
on time);
4. the number of views of course materials, of materials in the electronic textbook
service, of actions in the electronic textbook service, and of words in texts sent to
the electronic portfolio service (broken into 10% bands, below 50% - 0 points).
In total, 9 variables were proposed. The output was the final course score on a 5-point
scale. The paper does not indicate the specific type of LSTM used for forecasting; however,
based on the brief description, it can be assumed to be the traditional LSTM described
above.
One step (state) consisted of the scores on the 9 scales given to a student for one week of
classes. The maximum number of weeks used in the experiment was 15. Multiple regression
and the coefficient of determination were used as controls in the experiment. Figure 1 shows
the accuracy of predicting the final score after each week of the course.
Figure 1. Prediction accuracy of the final score after each week of the course
Figure 1 shows that with LSTM a prediction accuracy above 90% is achieved by week 6,
whereas multiple regression reaches it only by week 10, and the coefficient of determination
only by week 14.
Of particular interest in this work is that the input data are not individual student attributes
but the program code written by the students themselves. To vectorize the program code, the
researchers used a method of their own based on building the AST representation of the
student's code, by analogy with the representation of natural-language texts.
As a control method, logistic regression was used, whose input was a two-dimensional
vector. The first element of this vector is a function inversely proportional to the number of
submissions by the student of program codes close to the correct version. The second
element is a binary indicator of the success or failure of the assignment.
Figure 2 shows the graphs of the prediction accuracy of the educational outcome of the next
assignment for the control method and LSTM.
Figure 2. Accuracy of predicting the educational result of the next task, depending on the number
of attempts students made to submit the task
Figure 2 shows that the LSTM method is on average 10 percentage points more accurate
than the control method, with a minimum accuracy above 80%. The authors explain the
difference by the fact that LSTM builds its forecast directly from the meaning of the
student's response (the properties of the program code), while the control method takes into
account only the number of attempts that came close to success.
5. CONCLUSION
The extremely high prediction accuracy achieved in the experiment (100%) is explained, in
our opinion, by the simple 5-point grading scale. It should be noted that such accuracy has
not been achieved in any other work analyzed in the review. Nevertheless, given how
widespread the 5-point scale is in the Russian Federation, it is extremely important that the
simplest LSTM architecture combined with a small set of equally simple input data can
provide such prediction accuracy, at least for subject-specific results.
The paper describes an experiment using LSTM to predict an educational result from the
source code of the programs written by students while completing a single task of the Hour
of Code mass online course on the Code.org platform. The input data set contained
about 1.2 million program codes written by 263.5 thousand students. At the disposal of
the researchers there were data only for two tasks of the course. A single step (state) was
one attempt by a student to submit program code (students could submit several program
codes in response to one task). Only those students who made from 2 to 10 attempts to
complete the task were selected from the total data set. The LSTM had to predict the
educational outcome as the probability of successfully completing the task following the one
whose performance data were used for training.
The traditional LSTM described above was used for training (the authors state this
explicitly). Since the LSTM output must be a probabilistic value, the output values of the
last cell of the network pass through a fully connected layer and then a layer with the
Softmax activation function.
FUNDING STATEMENT
Applied research described in this paper is carried out with financial support of the state
represented by the Russian Federation Ministry for Education and Science under the
Agreement #14.576.21.0091 of 26 September 2017 (unique identifier of applied research -
RFMEFI57617X0091).
REFERENCES
[1] Hughes, E. Sh., Bradford, J. and Likens, C. Facilitating Collaboration, Communication,
and Critical Thinking Skills in Physical Therapy Education through Technology-Enhanced
Instruction: A Case Study. TechTrends, 62(3), 2018, pp. 296–302. DOI: 10.1007/s11528-
018-0259-8.
[2] Abramov, R. A. Management Functions of Integrative Formations of Differentiated Nature.
Biosci Biotech Res Asia, 12(1), 2015, pp. 991–997.
[3] Poth, Ch. The Contributions of Mixed Insights to Advancing Technology-Enhanced
Formative Assessments within Higher Education Learning Environments: An Illustrative
Example. International Journal of Educational Technology in Higher Education, 15(1),
2018, pp 9. DOI: 10.1186/s41239-018-0090-5.
[4] Altinay, F., Dagli, G. and Altinay, Z. Role of Technology and Management in Tolerance
and Reconciliation Education. Quality & Quantity, 51(6), 2017, pp. 2725–36. DOI:
10.1007/s11135-016-0419-x.
[5] Zagami, J. et al. Creating Future Ready Information Technology Policy for National
Education Systems. Technology, Knowledge and Learning, 23(3), 2018, pp. 495–506.
DOI: 10.1007/s10758-018-9387-7.
[6] Fox-Turnbull, W. H. The Nature of Primary Students’ Conversation in Technology
Education. International Journal of Technology and Design Education, 26(1), 2016, pp. 21–
41. DOI: 10.1007/s10798-015-9303-6.
[7] Campbell, T. and Oh, Ph. S. Engaging Students in Modeling as an Epistemic Practice of
Science. Journal of Science Education and Technology, 24(2-3), 2015, pp. 125–31. DOI:
10.1007/s10956-014-9544-2.
Baines, D. et al. Conceptualising Production, Productivity and Technology in Pharmacy
Practice: A Novel Framework for Policy, Education and Research. Human Resources for
Health, 16(1), 2018, pp. 51. DOI: 10.1186/s12960-018-0317-5.
[9] Chen, G., Xu, B., Lu, M. and Chen N.-Sh. Exploring Blockchain Technology and Its
Potential Applications for Education. Smart Learning Environments, 5(1), 2018, pp. 1.
DOI: 10.1186/s40561-017-0050-x.
[10] Hsu, L. Diffusion of Innovation and Use of Technology in Hospitality Education: An
Empirical Assessment with Multilevel Analyses of Learning Effectiveness. The Asia-
Pacific Education Researcher, 25(1), 2016, pp. 135–45. DOI: 10.1007/s40299-015-0244-
3.
[11] Fletcher-Watson, S. Evidence-Based Technology Design and Commercialisation:
Recommendations Derived from Research in Education and Autism. TechTrends, 59(1),
2015, pp. 84–88. DOI: 10.1007/s11528-014-0825-7.
[12] Abramov, R. A., Tronin, S. A., Brovkin, A. V. and Pak, K. C. Regional features of energy
resources extraction in eastern Siberia and the far east. International Journal of Energy
Economics and Policy, 8(4), 2018, pp. 280–287.
[13] Damewood, A. M. Current Trends in Higher Education Technology: Simulation.
TechTrends, 60(3), 2016, pp. 268–71. DOI: 10.1007/s11528-016-0048-1.
[14] Dorner, H. and Kumar, S. Online Collaborative Mentoring for Technology Integration in
Pre-Service Teacher Education. TechTrends, 60(1), 2016, pp. 48–55. DOI:
10.1007/s11528-015-0016-1.
[15] McLain, M. Emerging Perspectives on the Demonstration as a Signature Pedagogy in
Design and Technology Education. International Journal of Technology and Design
Education, 28(4), 2018, pp. 985–1000. DOI: 10.1007/s10798-017-9425-0.
[16] Virtanen, S., Räikkönen, E. and Ikonen, P. Gender-Based Motivational Differences in
Technology Education. International Journal of Technology and Design Education, 25(2),
2015, pp. 197–211. DOI: 10.1007/s10798-014-9278-8.
[17] Muller, J. The Future of Knowledge and Skills in Science and Technology Higher
Education. Higher Education, 70(3), 2015, pp. 409–16. DOI: 10.1007/s10734-014-9842-x.
[18] Tondeur, J., van Braak, J., Ertmer, P. A. and Ottenbreit-Leftwich, A. Erratum to:
Understanding the Relationship between Teachers’ Pedagogical Beliefs and Technology
Use in Education: A Systematic Review of Qualitative Evidence. Educational Technology
Research and Development, 65(3), 2017, pp. 577. DOI: 10.1007/s11423-016-9492-z.
[19] Mapotse, T. A. An Emancipation Framework for Technology Education Teachers: An
Action Research Study. International Journal of Technology and Design Education, 25(2),
2015, pp. 213–25. DOI: 10.1007/s10798-014-9275-y.
[20] Mapotse, T. A. Development of a Technology Education Cascading Theory through
Community Engagement Site-Based Support. International Journal of Technology and
Design Education, 28(3), 2018, pp. 685–99. DOI: 10.1007/s10798-017-9411-6.