Evaluating Predictive Knowledge

Alex Kearney, Anna Koop, Craig Sherstan, Johannes Günther, Richard S. Sutton, Patrick M. Pilarski, Matthew E. Taylor

Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; [email protected]

Abstract

Predictive Knowledge (PK) is a group of approaches to machine perception and knowledgability using large collections of predictions made online in real-time through interaction with the environment. Determining how well a collection of predictions captures the relevant dynamics of the environment remains an open challenge. In this paper, we introduce specifications for sensorimotor baselines and robustness-to-transfer metrics for evaluation of PK. We illustrate the use of these metrics by comparing variant architectures of General Value Function (GVF) networks.

Predictive Knowledge

A key challenge for machine intelligence is that of representation: a system's performance is tied to its ability to perceive and represent its environment. Predictive knowledge representations use large collections of predictions to model the environment. An agent continually anticipates its sensation from its environment by making many predictions about the dynamics of its environment with respect to its behaviour (Modayil, White, and Sutton 2014). These predictions about expected sensation can then be used to inform an agent's internal representation of its environment (Littman and Sutton 2002). Other proposals describe inter-relations of predictions, similar to TD Networks (Tanner and Sutton 2005; Makino and Takagi 2008), to enable abstract, conceptual representations by making predictions of predictions (Schapire and Rivest 1988).

In this paper we discuss the subtleties of evaluating predictive representations and propose two complementary techniques. We specifically consider PK methods that 1) are able to expand their representations by proposing new predictions, 2) are able to self-verify their predictions through interaction with their environment, and 3) are able to continually learn their predictions online.

To examine these evaluation metrics we use the General Value Function framework for predictive representations (White 2015). GVFs estimate the expected discounted return of a signal C, defined as

G_t = Σ_{k=0}^{∞} ( Π_{j=1}^{k} γ_{t+j} ) C_{t+k+1}.

Value is estimated with respect to a specific policy π, discount function γ, and cumulant c: v(s; π, γ, c) = E_π[G_t | S_t = s].

The parameters c, π, and γ are the question parameters, which specify what a GVF is about; the answer parameters—such as the step size α and eligibility decay λ—describe how a learning method learns to answer the GVF question.

Figure 1: Many of the decisions which specify a PK architecture: how many layers should be in the network; how many GVFs should be in each layer; how the question parameters (π, γ, c) and answer parameters (α, λ) are chosen; how the inputs should be constructed for each layer; and how to determine which GVFs to replace and when to replace them.

GVFs can be learnt online and incrementally through methods such as Temporal-difference (TD) learning (Sutton 1988). The representational power of a given GVF network depends not just on the quality of the answers, but also on the architecture of the network, as illustrated in Figure 1.
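To make the learning step concrete, the following is a minimal sketch of a single GVF learned by linear TD(λ), the method cited above. It is an illustrative on-policy implementation with a fixed step size (rather than the TIDBD adaptation used later in this paper), not the authors' code.

```python
import numpy as np

class GVF:
    """One General Value Function learned by linear TD(lambda).

    Question parameters (what the GVF is about): cumulant C, discount
    gamma, and policy pi (implicit here, since this sketch is on-policy).
    Answer parameters (how it is learned): step size alpha, trace decay lam.
    """

    def __init__(self, n_features, alpha=0.1, lam=0.0):
        self.w = np.zeros(n_features)  # learned weight vector
        self.z = np.zeros(n_features)  # eligibility trace
        self.alpha = alpha
        self.lam = lam

    def predict(self, x):
        """Current value estimate v(s) for feature vector x."""
        return self.w @ x

    def update(self, x, gamma, cumulant, x_next, gamma_next):
        """One TD(lambda) step: delta = C_{t+1} + gamma_{t+1} v(s') - v(s)."""
        delta = cumulant + gamma_next * (self.w @ x_next) - (self.w @ x)
        self.z = gamma * self.lam * self.z + x  # accumulating trace
        self.w += self.alpha * delta * self.z
        return delta
```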

PK systems have been shown to be a scalable way to update and verify an agent's representation of the world, with examples of real-world robotic prediction tasks making thousands or tens of thousands of predictions in real-time on consumer-grade devices (Sutton et al. 2011; White 2015; Pilarski and Sherstan 2016).

Evaluating PK Architectures

Existing evaluation metrics for PK fall into two categories: 1) reporting the average error over all predictions within the PK system, and 2) reporting errors on a known, challenging subset of the predictions within the system. Reporting the average error penalizes the accuracy of every prediction equally, when some predictions may have high error (such as for inherently random signals) but still provide representational power. Conversely, a representation that makes irrelevant but constant predictions will perform well according to average error, while providing no useful signals. Reporting errors on a subset of predictions requires identification of said subset across all architectures and lends itself to overfitting for those particular questions.


It is difficult to identify predictions of interest without biasing towards particular architectures or network structures. Identifying predictions that require more complex representations in real-world settings requires extensive domain knowledge. In addition, it forces the inclusion of those pre-defined predictions when a goal of PK is to independently construct a useful representation. Neither of these is an entirely satisfactory proxy for the real question: what is the representational power of a given PK system?

As a result of this evaluation bottleneck, examples of PK on real-world problems are largely proof-of-concept applications which serve to highlight the type, quantity, and diversity of predictions which can be made (Pilarski and Sherstan 2016; Modayil, White, and Sutton 2014; Sutton et al. 2011). Where evaluation exists, it focuses on prediction error as a means of evaluating the quality of a collection of predictions. This is insufficient, as the reliability of predictions does not necessarily equate to the quality of a learned representation. While low prediction error describes the quality of a single predictor, low average prediction error is not necessarily indicative of the best collection of predictions for constructing representations of the environment.

For example, one could maintain a diverse collection of GVFs for different time-scales γ and policies π that exclusively anticipate the voltage of servos on a robotic limb—a signal that is often constant. These trite predictions would likely have a lower error than a collection of predictions which represent the environment more completely. Moreover, comparing the average error between two collections of predictions with different question parameters is inappropriate, as the errors are with respect to different signals. When we compare the average error of different sets of predictions in PK architectures, we are unable to meaningfully quantify how changes in the architectural proposal impact the knowledgability of a system.

We propose sensorimotor predictions as a baseline that allows us to meaningfully assess the representational capacity of a collection of predictions, while being general enough to extend to real-world prediction problems.

Evaluation by Sensorimotor Baselines

A scalable alternative to comparison by hand-crafted predictions is to maintain a collection of baseline sensorimotor predictions common between each architecture being evaluated. Instead of hand-crafting predictions based on the idiosyncrasies of a particular domain, a sensorimotor baseline uses the observations from the environment as prediction targets. The identification of good features is integral to being able to make reliable predictions; in evaluating the ability of a system to predict its raw stimuli, we are in fact evaluating the ability of the system to perform representation learning for the simplest predictions we could want to make.
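Given a learner such as the GVF sketch above, a sensorimotor baseline needs no domain knowledge to specify. A hedged sketch: one baseline prediction per observation element, with that element as its cumulant, shared identically across every architecture under comparison.

```python
def make_sensorimotor_baseline(n_obs, n_features):
    """One baseline GVF per observation element: baseline i predicts the
    discounted future of o_t[i], so architectures are scored on how well
    they anticipate their own raw stimuli."""
    gvfs = [GVF(n_features) for _ in range(n_obs)]
    # Cumulant of baseline i is simply the i-th raw observation.
    cumulants = [lambda obs, i=i: obs[i] for i in range(n_obs)]
    return gvfs, cumulants
```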

By comparing architectures based on how well they can represent their stimuli, we are prioritizing architectures that are able to find better representations for learning low-level sensory input, rather than better representations of the environment in general. While a limitation, it is a natural approach to evaluation: approaches to PK have been motivated by being able to anticipate their environment (Modayil, White, and Sutton 2014), and low-level anticipatory predictions are useful as inputs in applications of PK (Sherstan, Modayil, and Pilarski 2015).

Figure 2: The data source for the experiments in this work: the Bento Arm, controlled by a human participant, generating a stream of multimodal sensory data from participants' interactions with a modified Box and Blocks task.

Sensorimotor baselines strike a balance between the two aforementioned methods of evaluation: baseline predictions enable us to assess the representation generated by our PK system with no designer intervention, making them a general, scalable alternative for evaluation of real-world systems. By assessing representation quality, we can begin to precisely quantify the impact of different construction methods in real-world domains. Using sensorimotor baselines is a fair first step in bridging the evaluation gap between toy domains and real-world problems.

Evaluation by Transfer

Perhaps one of the most natural qualities of an effective PK system is generality. PK systems are intended for use in life-long, continual learning methods—methods that are expected to learn for the duration of their deployment. In such a setting it is imperative that the predictions being made are resilient to changes in their environment. A method of evaluating the ability of a continual learning system to produce general representations is through transfer learning (Taylor and Stone 2009). We can evaluate the generality of PK by constructing GVFs in one setting and testing their generality on experience in a transfer environment that shares some traits with the source setting. An architecture that is able to propose and interrelate GVFs such that they are more robust to such transfers is an architecture that produces more general representations.
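A sketch of how such a transfer evaluation might be orchestrated. The network interface here (step, allow_restructuring) is entirely hypothetical, standing in for whatever proposal-and-culling machinery a given PK architecture provides; the point is that structure is frozen at the switch while the weights continue to adapt.

```python
def evaluate_by_transfer(network, source_stream, transfer_stream):
    """Construct GVFs on the source stream, then keep updating the same
    GVFs (no proposing or culling) on the transfer stream. Robustness is
    read off the error accumulated after the switch."""
    source_error = 0.0
    for obs in source_stream:
        # step() is assumed to update all GVFs and return baseline error.
        source_error += network.step(obs, allow_restructuring=True)
    transfer_error = 0.0
    for obs in transfer_stream:
        transfer_error += network.step(obs, allow_restructuring=False)
    return source_error, transfer_error
```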

Experiment: Prosthetic Prediction Task

We explore sensorimotor baselines and transfer using data from a human control task on the Bento Arm (Dawson et al. 2014), an open-source robot arm intended for use as a research prosthesis. Human control of a robotic prosthesis is an area with active interest in PK (Pilarski and Sherstan 2016), and GVFs have been previously used to improve control in this domain (Pilarski et al. 2013).

Data for this experiment was sourced from the previous experiments of Edwards et al. (2016). Four users performed a common manipulation challenge where they used the robot arm to move objects over a barrier (Figure 2).


Figure 3: The architecture used for comparison. Each layer has d baseline predictions which predict each of the elements in the observations o. Each layer has n additional predictions. The cumulants c_{1...n} are functions of some output of the previous layer p_{m−1}, or, in the case of the first layer, the observations o_t. For all predictions γ = 0.95, λ = 0, and step sizes are initialized to α_0 = 1/50, where 50 is the number of active features. Predictions are on-policy—π is always the robot arm's behaviour. Experiments vary the number of layers m and the number of additional GVFs n.

Each user performed the task three times using two different control schemes. We use one control scheme as the source environment and the other as a transfer environment, yielding 12 trials in total. The signals used to construct the observations are the position, load, velocity, and a binary movement signal for both the shoulder and hand joints.

The PK Architecture

To explore how choices in architecture impact the quality of learned predictions, we start with a straightforward representation using layers of GVFs (Figure 3). Inputs are produced in a feed-forward fashion: the base layer receives the observations from the environment o_t as state s_t, while each additional layer receives the output predictions from the previous layer. At each time-step, the position, velocity, and load of the shoulder and the gripper were used to construct the environment observations o_t. Step sizes are adapted using TIDBD (Kearney et al. 2017). We construct a binary representation of state by using a selective Kanerva coder (Travnik and Pilarski 2017) with 2000 prototypes and 50 active features.
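A minimal sketch of this state construction, following the description of selective Kanerva coding in Travnik and Pilarski (2017): the k prototypes nearest the current input, out of a fixed random set, are activated. The uniform prototype placement and [0, 1]-normalised inputs are assumptions of this sketch.

```python
import numpy as np

class SelectiveKanervaCoder:
    """Binary state: activate the k prototypes closest to the input."""

    def __init__(self, n_inputs, n_prototypes=2000, k=50, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random prototypes; inputs are assumed scaled to [0, 1].
        self.prototypes = rng.uniform(0.0, 1.0, (n_prototypes, n_inputs))
        self.k = k

    def encode(self, x):
        dists = np.linalg.norm(self.prototypes - np.asarray(x), axis=1)
        phi = np.zeros(len(self.prototypes))
        phi[np.argsort(dists)[:self.k]] = 1.0  # k nearest prototypes fire
        return phi
```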

In addition to the baseline sensorimotor predictions, there are n GVFs that are proposed and tested by the system. When proposing a new GVF, the architecture must specify both what the GVF is about, by choosing c, γ, and π, and how the prediction is learnt, by choosing appropriate learning parameters—in this instance, α and λ. Our architecture generates GVFs by randomly choosing cumulants, where c can either be an accumulation of a signal from the previous layer, or an operation on two signals—sums, differences, products, and ratios.
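The generate step can be sketched directly from this description; the helper below is a hypothetical illustration of random cumulant proposal, not the authors' generator.

```python
import operator
import random

def propose_cumulant(n_signals, rng=random):
    """Randomly build a cumulant over the previous layer's outputs:
    either one signal accumulated directly, or a sum, difference,
    product, or (guarded) ratio of two signals."""
    if rng.random() < 0.5:
        i = rng.randrange(n_signals)
        return lambda signals, i=i: signals[i]
    ops = [operator.add, operator.sub, operator.mul,
           lambda a, b: a / b if abs(b) > 1e-8 else 0.0]
    i, j = rng.randrange(n_signals), rng.randrange(n_signals)
    op = rng.choice(ops)
    return lambda signals, i=i, j=j, op=op: op(signals[i], signals[j])
```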

Each trial includes 20000 time-steps in a non-adaptive source setting where predictions are constructed, and 20000 time-steps in an adaptive switching transfer setting where the GVFs remain the same but continue to be updated at each time-step. During the source setting, every 1000 time-steps the worst 10% of GVFs by average prediction error are culled and replaced with new GVFs, excluding the baseline predictions we use for evaluation.

Figure 4: Accumulation of prediction error averaged over all sensorimotor baseline predictions in each layer. Our architecture has four layers and 100 constructed predictions in addition to the sensorimotor baseline. Error is averaged over 12 trials; variance is plotted but negligible.

Prediction error is calculated online by estimating the discounted return from a sample of observed signals over approximately seven times 1/(1−γ), the expected time to termination.
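A sketch of this generate-and-test schedule, under the same hypothetical network interface as before; the error used for ranking would be the online estimate just described, compared against sampled returns over roughly 7/(1−γ) steps.

```python
def run_source_phase(network, stream, steps=20000,
                     cull_every=1000, cull_fraction=0.10):
    """Every cull_every steps, replace the worst tenth of the constructed
    GVFs by average prediction error. Baselines are exempt: they are the
    yardstick, not part of the generate-and-test population."""
    for t, obs in zip(range(steps), stream):
        network.step(obs, allow_restructuring=True)
        if (t + 1) % cull_every == 0:
            errors = network.constructed_errors()  # hypothetical accessor
            worst = sorted(range(len(errors)), key=errors.__getitem__,
                           reverse=True)[:int(len(errors) * cull_fraction)]
            for idx in worst:
                network.replace_gvf(idx)  # propose a fresh random GVF
```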

Evaluation

To demonstrate evaluation using baseline predictions and setting transfer, we analyse the impact of two specification choices: 1) the number of layers in a network (Figure 4), and 2) the number of predictions in each layer (Figure 5). As indicated in Figure 1, these are decisions a designer must make when designing an architecture, and to date there is no clear intuition as to how these decisions impact the quality of the representation constructed. Our baseline predictions are of the load, position, velocity, and binary movement signal of the shoulder and gripper: 8 predictions in total.

Since the first layer constructs its state exclusively from the observations from the setting o_t, it is not using any learned representations; by comparing each additional layer to the first layer, we assess how increasing representational abstraction impacts the baseline prediction error. Both the second- and third-layer representations outperform predictions with no representation construction, while the fourth layer performs the worst. The sensorimotor baseline clearly illustrates the impact of abstraction on the ability to represent the environment.

There is a tension between what the GVFs in a layer can describe and the dimensionality of the representation for the following layer. Interestingly, performance is not directly proportional to the number of predictions: while 10 additional predictions has the greatest performance, 100 additional predictions outperforms both 30 and 60 additional predictions. Of note is resilience to transfer to new prediction settings. Under none of the circumstances did the methods accumulate substantially more baseline error after the switch to the transfer prediction setting. This demonstrates that the constructed representations generalized well between different control settings; however, in the future more complex transfer settings could be chosen.


Figure 5: Accumulation of prediction error averaged over all sensorimotor baseline predictions in the second layer. We vary the number of constructed predictions (10, 30, 60, 100, 200, and 300) made in addition to the sensorimotor baseline. Error is averaged over 12 trials; variance is plotted but negligible.

By using a baseline of predictions, and by performing transfer, we were able to elucidate how changes in the architecture impact predictive representations in a manner which requires little computational overhead. In doing so, we provide a first step towards being able to study the impact of architectural choices on the learned representations of PK systems in real-world domains.

Limitations & Further WorkThis paper’s core contributions are a discussion of chal-lenges in evaluation in PK and we do not perform an ex-haustive evaluation of all the possible choices which couldbe made. For instance, we only consider the on-policy case,limiting the ability of our architecture to capture the impactof behaviour on the dynamics of the setting. In addition, fu-ture work could expand evaluation to include internal sig-nals from predictions. For instance, internal signals of pre-dictions corresponding to feature relevance could be usedto identify the degree to which predictions used in the con-struction of state are impacting the model.

Conclusion

Predictive approaches to knowledge are a rich and varied area of reinforcement learning research that focuses on building internal representations of the environment through continual, life-long interaction. There has been recent success in refining fundamental aspects of PK architectures on toy domains; however, these evaluation methods do not transfer effectively to large, real-world problems, such as applications in robotics, a core domain for predictive approaches to knowledge. In this paper, we highlight challenges in developing PK architectures and, as a primary contribution, propose the use of sensorimotor baselines and setting transfer to assess the quality of representations learned using PK. We demonstrate the usefulness of sensorimotor baselines and setting transfer by elucidating the impact of increasing the number of layers and the number of predictions in each layer on the ability of an architecture to predict its stimuli. In providing preliminary evaluation methods for knowledge construction, we are taking a necessary step in the development of predictive knowledge.

References

[Dawson et al. 2014] Dawson, M. R.; Sherstan, C.; Carey, J. P.; Hebert, J. S.; and Pilarski, P. M. 2014. Development of the Bento Arm: An improved robotic arm for myoelectric training and research. Proceedings of MEC 14:60–64.

[Edwards et al. 2016] Edwards, A. L.; Dawson, M. R.; Hebert, J. S.; Sherstan, C.; Sutton, R. S.; Chan, K. M.; and Pilarski, P. M. 2016. Application of real-time machine learning to myoelectric prosthesis control: A case series in adaptive switching. Prosthetics and Orthotics International 40(5):573–581.

[Kearney et al. 2017] Kearney, A.; Veeriah, V.; Travnik, J.; Sutton, R. S.; and Pilarski, P. M. 2017. Every step you take: Vectorized Adaptive Step sizes for Temporal Difference Learning.

[Littman and Sutton 2002] Littman, M. L., and Sutton, R. S. 2002. Predictive representations of state. In Advances in Neural Information Processing Systems, 1555–1561.

[Makino and Takagi 2008] Makino, T., and Takagi, T. 2008. On-line discovery of temporal-difference networks. In Proceedings of the 25th International Conference on Machine Learning, 632–639. ACM.

[Modayil, White, and Sutton 2014] Modayil, J.; White, A.; and Sutton, R. S. 2014. Multi-timescale nexting in a reinforcement learning robot. Adaptive Behavior 22(2):146–160.

[Pilarski and Sherstan 2016] Pilarski, P. M., and Sherstan, C. 2016. Steps toward knowledgeable neuroprostheses. In Biomedical Robotics and Biomechatronics (BioRob), 2016 6th IEEE International Conference on, 220–220. IEEE.

[Pilarski et al. 2013] Pilarski, P. M.; Dawson, M. R.; Degris, T.; Carey, J. P.; Chan, K. M.; Hebert, J. S.; and Sutton, R. S. 2013. Adaptive artificial limbs: A real-time approach to prediction and anticipation. IEEE Robotics & Automation Magazine 20(1):53–64.

[Schapire and Rivest 1988] Schapire, R. E., and Rivest, R. L. 1988. Diversity-based inference of finite automata. Master's thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science.

[Sherstan, Modayil, and Pilarski 2015] Sherstan, C.; Modayil, J.; and Pilarski, P. M. 2015. A collaborative approach to the simultaneous multi-joint control of a prosthetic arm. In Rehabilitation Robotics (ICORR), 2015 IEEE International Conference on, 13–18. IEEE.

[Sutton et al. 2011] Sutton, R. S.; Modayil, J.; Delp, M.; Degris, T.; Pilarski, P. M.; White, A.; and Precup, D. 2011. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In AAMAS 2011, 761–768. International Foundation for Autonomous Agents and Multiagent Systems.

[Sutton 1988] Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3(1):9–44.

[Tanner and Sutton 2005] Tanner, B., and Sutton, R. S. 2005. Temporal-Difference Networks. In International Conference on Machine Learning.

[Taylor and Stone 2009] Taylor, M. E., and Stone, P. 2009. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research 10(Jul):1633–1685.

[Travnik and Pilarski 2017] Travnik, J. B., and Pilarski, P. M. 2017. Representing high-dimensional data to intelligent prostheses and other wearable assistive robots: A first comparison of tile coding and selective Kanerva coding. IEEE International Conference on Rehabilitation Robotics 2017:1443–1450.

[White 2015] White, A. 2015. Developing a Predictive Approach to Knowledge. PhD thesis, University of Alberta.
