Calrg14 tm351

Abstract: TM351 is a new course on databases, data exploration and simple data visualisation, due for first presentation in 2015J. The course will require students to make use of several different databases and work with them in an interactive fashion. The course team are proposing to use a virtual machine (VM) to deliver all the course software to students as an integrated package, allowing the same "student software lab" configuration to run on different platforms (Windows, OS X, Linux) or even on a cloud server. The main user interface to the services being run on the VM will be via a web browser on the student's host operating system (that is, their "normal" one) in the form of an IPython Notebook. This interface provides a blend of text cells (written using markdown) that can display HTML text, images, etc, and executable code cells. Code cells can be edited by students and then the code fragments contained within them executed on a cell by cell basis, the output from the code execution being inserted within the document. In this presentation, I will review the rationale for adopting the use of a virtual machine to deliver software environments to students, and demonstrate how we intend to make use of IPython notebooks for the delivery of interactive course materials. Possible applications to other courses will also be reviewed, along with a consideration of possible workflows associated with the production, maintenance and student use.

1

TM351 – a new course, currently in production…
- Level 3
- 30 points
- First presentation slated for October 2015 (15J)

2

It’s replacing a “traditional” databases course, but we’re planning quite a few twists… What those twists are in content terms, though, is the subject of another presentation…

3

What I am going to talk about are two new things we’re exploring in the context of the course, and which we hope might also prove attractive to other course teams.

4

The first thing is virtual machines. These have already been used on a couple of other OU courses – TM128 and M812 both use virtual machines – but we are taking a more fundamental view about how to use notebooks to deliver interactive teaching material as well as software application services. So what is a virtual machine?

5

We’re all familiar with the idea that a student can run OU supplied software, either third party software or OU created software, or a combination of both in the case of open source applications where we take some open code and then modify it ourselves, on the student’s own desktop.

6

We may even require students to install more than one piece of software, perhaps further requiring that these applications can interoperate. With a move to be “open” and agnostic towards a particular operating system, there are considerable challenges to be faced:
- software libraries should ideally be cross-platform rather than multiple native implementations of ostensibly the same application;
- software versions across applications should update in sync with each other;
- the UI, or look and feel, should be the same across platforms – or we have more writing to do;
- support issues are likely to scale badly for us as we have to cope with more variations on the configuration of individual student machines (for example, different operating systems, different versions of the same operating system).

7

One way of mitigating against change is to settle on a single UI space – such as a browser. Applications can be built solely within the browser, and made available to the user requiring little more desktop (or server) application support than a web server. Application front ends written in HTML5 and JavaScript can provide an experience rich enough to rival that of a native application. Application front ends can also be created for applications running as services either on the students’ desktop or via a remote server. Applications can draw on files in a folder on the student’s desktop machine, and the browser can be used to save files (e.g. from the internet) into that folder.

8

To get round the problem of having to install software onto multiple different possible system configurations, how much easier would it be if we knew exactly what operating system each student was running and they were all running exactly the same operating system? Virtualisation platforms such as VirtualBox and VMware are cross-platform applications that can be downloaded to a student’s own machine and that then allow an additional guest operating system to be installed in its own container running on the student’s own computer (the host) via the virtualisation platform. The guest operating system and the software that runs on the guest operating system are said to define a virtual machine or “VM”. The virtual machine can be defined by a central service and then delivered to the students in such a way that each receives a copy of exactly the same virtual machine in terms of its operating system and the applications preinstalled onto it.

9

What this means is that we can define a VM, preinstall software onto it, and ship it to students so they can run it via a virtualisation platform installed onto their machine. The VM can run applications as services, exposing their UIs via a browser. Files can easily be shared between the host and guest machines. As far as students are concerned, all they need to do is install a virtualisation system onto their computer, and then load the same OU virtual machine into that system irrespective of the operating system they happen to be running.

10

It is also possible to run the VM on a remote server, with the students accessing the services running in that VM via their browser. This means that students can access services using computers that themselves may not be capable of installing or running particular applications – such as some tablet computers.

11

Notebook computing is my great hope for the future. Notebook computing is like spreadsheet computing, a democratisation of access to and the process of practically based, task oriented computing. Spreadsheets help you get stuff done, even if you don’t consider yourself to be a programmer. My hope is that the notebook metaphor – and it’s actually quite an old one – can similarly encourage people who don’t consider themselves programmers to do and to use programmy things.

12

Notebook computing buys us in to two ways of thinking that I think are useful from a pedagogical perspective – that is, pedagogy not just as a way of teaching but also as a way of learning in the sense of learning about something through investigating it. Here, I’m thinking of an investigation as a form of problem based learning – I’m not up enough on educational or learning theory to know whether there is a body of theory, or even just a school of thought, about “investigative learning”. These two ways of thinking are literate programming and reproducible research.

13

In case you haven’t already realised it, code is an expressive medium. Code has its poets, and artists, as well as its architects, engineers and technicians. One of the grand masters of code is Don – Donald – Knuth. Don Knuth said “A literate programmer is an essayist who writes programs for humans to understand” as part of a longer quote. Here’s that longer quote: “Literate programming is a programming methodology that combines a programming language with a documentation language, making programs more robust, more portable, and more easily maintained than programs written only in a high-level language. “Computer programmers already know both kinds of languages; they need only learn a few conventions about alternating between languages to create programs that are works of literature. A literate programmer is an essayist who writes programs for humans to understand, instead of primarily writing instructions for machines to follow. When programs are written in the recommended style they can be transformed into documents by a document compiler and into efficient code by an algebraic compiler.” Notebooks are environments that encourage the writing of literate code. Notebooks encourage you to write prose and illustrate it with code – and the

14

The other idea that the notebooks buy us into is reproducible research. I love this idea and think you should too. It lets archiving make sense. Do I really have to say any more than just show that quote? Now you may say that that’s all very well for, I don’t know, physics or biology, or science, or economics. Or social science in general, where they do all sorts of inexplicable things with statistics and probably should try to keep track of what they’re doing. But not the humanities. But that’s not quite right, because in the digital humanities there are computational tools that you can use. Particularly in the areas of text analysis and visualisation. Such as some of the visualisations we saw in the first part of this presentation. But you need a tool that democratises access to this technology. You need an environment that the social scientists found in the form of a spreadsheet. But better.

15

(I also like to think of notebooks as a place where I can have a conversation with data.)

16

So how do notebooks help? The tool I want to describe is – are – called IPython Notebooks. IPython Notebooks let you execute code written in the Python programming language in an interactive way. But they also work with other languages – JavaScript, Ruby, R, and so on, as well as other applications. I use a notebook for drawing diagrams using Graphviz, for example. They also include words – of introduction, of analysis, of conclusion, of reflection. And they also include the things the code wants to tell us, or that the data wants to tell us via the code. The code outputs. (Or more correctly, the code+data outputs.)

17


18


19


20

The first thing notebooks let you do is write text for the non-coding reader. Words. In English. (Or Spanish. Or French. I would say Chinese, but I haven’t checked what character sets are supported, so I can’t say that for definite until I check!) “Literate programming is a programming methodology that combines a programming language with a documentation language”. That’s what Knuth said. But we can take it further. Past code. Past documentation. To write up. To story. The medium in which we can write our human words is a simple text markup language called markdown. If you’ve ever written HTML, it’s not that hard. If you’ve ever written an email and wrapped asterisks around a word or phrase to emphasise it, or written a list of items down by putting each new item onto a new line and preceding it with a dash, it’s that easy.
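To give a feel for how little there is to those conventions, here is a toy sketch in Python that turns asterisk emphasis, dash lists and a "# " header into HTML. It is only an illustration: the function name tiny_markdown_to_html and the sample text are made up for this example, it handles a tiny subset of markdown, and it is not how the notebook itself renders text cells.

```python
import re


def tiny_markdown_to_html(text):
    """Convert a very small subset of markdown to HTML (toy illustration only)."""
    html_lines = []
    in_list = False
    for line in text.splitlines():
        # *word* becomes emphasised text
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        if line.startswith("- "):
            # a dash at the start of a line marks a list item
            if not in_list:
                html_lines.append("<ul>")
                in_list = True
            html_lines.append("<li>" + line[2:] + "</li>")
        else:
            if in_list:
                html_lines.append("</ul>")
                in_list = False
            if line.startswith("# "):
                html_lines.append("<h1>" + line[2:] + "</h1>")
            elif line.strip():
                html_lines.append("<p>" + line + "</p>")
    if in_list:
        html_lines.append("</ul>")
    return "\n".join(html_lines)


sample = "# Shopping\nThis is *really* easy:\n- apples\n- pears"
print(tiny_markdown_to_html(sample))
```

The point is the input rather than the converter: the sample string is readable as it stands, before any rendering happens.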

21

Here’s a notebook, and here’s some text. There’s also some code. But note the text – we have a header, and then some “human text”. You might also notice some up and down arrows in the notebook toolbar. These allow us to rearrange the order of the cells in the notebook in a straightforward way. In a sense, we are encouraged to rearrange the sequence of cells into an order that makes more sense as a narrative for the reader of the document, or in the execution of an investigation. The upside of this is that we can author a document in a ‘non-linear’ way and then linearise it for final distribution simply by reordering the order in which the cells are presented. There are constraints though – if a cell computationally depends on the result of, or state change resulting from, the execution of a prior cell, their relative ordering cannot be changed.
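That ordering constraint can be sketched in plain Python. This is a simplified stand-in for a notebook, assuming each cell is just a string of code run against one shared namespace; run_cells and the two example cells are hypothetical names invented for the illustration.

```python
def run_cells(cells):
    """Run code strings in order, sharing one namespace between them."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace


load_cell = "readings = [3, 5, 8]"
total_cell = "total = sum(readings)"

# Original order: the second cell can see the state left by the first.
state = run_cells([load_cell, total_cell])
print(state["total"])

# Swapped order: the dependent cell runs before the state it needs exists.
try:
    run_cells([total_cell, load_cell])
except NameError as err:
    print("Reordering broke the notebook:", err)
```

Text cells have no such dependency, which is why they can be moved around freely while code cells that share state have to keep their relative order.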

22

As well as human readable text cells – markdown cells or header cells at a variety of levels – there are also code cells. Code cells allow you to write (or copy and paste in) code and then run it. Applications give you menu options that in the background copy, paste and execute the code you want to run, or apply to some particular set of data, or text. Code cells work the same way, but they’re naked. They show you the code. At this point it’s important to remember that code can call code. Thousands of lines of code that do really clever and difficult things can be called from a single line of code. Often code with a sensible function name just like a sensible menu item label. A self-describing name that calls the masses of really clever code that someone else has written behind the scenes. But you know which code because you just called it. Explicitly. Let’s see an example – not a brilliant example, but an example nonetheless.

23

Here’s some code. It’s actually two code cells – in one, I define a function. In the second, I call it. (Already this is revisionist. I developed the function by not wrapping it in a function. It was just a series of lines of code that I wrote to perform a particular task. But it was a useful task. So I wrapped the lines of code in a function, and now I can call those lines of code just by calling the function name.) I can also hide the function in another file, outside of the notebook, then just include it in any notebook I want to… …or within a notebook, I could just copy a set of lines of code and repeatedly paste them into the notebook, applying them to a different set of data each time… but that just gets messy, and that’s what being able to call a bunch of lines of code wrapped up in a function call avoids.
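The code from the slide isn't reproduced in these notes, so here is a stand-in sketch of the same progression: some loose lines that do a job, then the same lines wrapped in a function that can be called on different data. The word-counting task and the name count_words are assumptions chosen for illustration, not the function shown in the talk.

```python
# Step 1: a few loose lines, written to get one particular job done.
words = "the cat sat on the mat".split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(counts)


# Step 2: the same lines wrapped in a function (one cell defines it)...
def count_words(text):
    """Return a dict mapping each word in text to how often it appears."""
    result = {}
    for word in text.split():
        result[word] = result.get(word, 0) + 1
    return result


# ...so that another cell can reuse it on different data with a single call.
print(count_words("to be or not to be"))
```

If the function were saved in a separate file (say a hypothetical wordtools.py), a notebook could bring it in with an ordinary import rather than pasting the lines in again.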

24

As far as reproducible research goes, the ability of a notebook to execute a code element and display the output from executing that code means that there is a one-to-one binding between a code fragment and the data on which it operates and the output obtained from executing just that code on just that data.

25

The output of the code is not a human copied and pasted artefact. The output of the code – in this case, the result of executing a particular function – is only and exactly the output from executing that function on a specified dataset.
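One way to picture that binding is to keep each output together with a record of what produced it. The sketch below is my own illustration, not something the notebook does for you in this form: run_and_record, fingerprint and the example mean function are hypothetical names, and the data is assumed to be JSON-serialisable.

```python
import hashlib
import json


def fingerprint(obj):
    """Short SHA-256 digest of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_and_record(func, data):
    """Run func on data and keep the output together with what produced it."""
    output = func(data)
    return {
        "function": func.__name__,
        "data_fingerprint": fingerprint(data),
        "output": output,
    }


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


dataset = [2, 4, 6, 8]
record = run_and_record(mean, dataset)
print(record)

# Running again on the same data gives the same record...
assert run_and_record(mean, dataset) == record
# ...and changing the data changes the fingerprint, so a mismatch is visible.
changed = run_and_record(mean, [2, 4, 6, 9])
assert changed["data_fingerprint"] != record["data_fingerprint"]
```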

26

The output of a code cell is not limited to the arcane outputs of a computational function. We can display data table results as data tables.
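As a rough sketch of how an object can present itself as a table, IPython checks whether the value at the end of a code cell has a _repr_html_ method and, if so, displays the HTML it returns. The SimpleTable class below is a made-up minimal example; in practice a library's table type would normally do this for you.

```python
from html import escape


class SimpleTable:
    """Rows of data that a notebook can display as an HTML table."""

    def __init__(self, headers, rows):
        self.headers = headers
        self.rows = rows

    def _repr_html_(self):
        # IPython looks for this method when choosing how to show a result.
        head = "".join(
            "<th>{}</th>".format(escape(str(h))) for h in self.headers
        )
        body = "".join(
            "<tr>"
            + "".join("<td>{}</td>".format(escape(str(v))) for v in row)
            + "</tr>"
            for row in self.rows
        )
        return "<table><tr>{}</tr>{}</table>".format(head, body)


table = SimpleTable(["course", "points"], [["TM351", 30]])
print(table._repr_html_())
```

Outside a notebook the method just returns a string, as printed here; inside one, leaving table as the last expression in a cell would show the rendered table instead.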

27

We can also generate rich HTML outputs – in this case an interactive map overlaid with markers corresponding to locations specified in a dataset, and with lines connecting markers as defined by connections described in the original dataset. We can also delete the outputs of all the code cells, and then rerun the code, one step – one cell – after the other. Reproducing results becomes simply a matter of rerunning the code in the notebook against the data loaded in by the notebook – and then comparing the code cell outputs to the code cell outputs of the original document. Tools are also under development that help spot differences between those outputs, at least in cases where the outputs are text based.
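For text based outputs, the comparison step can be sketched with Python's standard difflib module. This is not one of the tools under development mentioned above, just a minimal illustration; diff_outputs and the two sample outputs are invented for the example.

```python
import difflib


def diff_outputs(original, rerun):
    """Return unified-diff lines describing how two text outputs differ."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            rerun.splitlines(),
            fromfile="original",
            tofile="rerun",
            lineterm="",
        )
    )


original_output = "rows: 120\nmean: 4.5"
rerun_output = "rows: 120\nmean: 4.7"

# An empty list would mean the rerun reproduced the original output exactly.
for line in diff_outputs(original_output, rerun_output):
    print(line)
```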

28

So can we run virtual machines and IPython notebooks together?

29

The IPython notebooks are actually browser based front end applications being powered by an IPython server…

30

It’s easy enough to run the IPython server on a virtual machine, either running as a guest VM on a student’s host computer, or running as an online service accessed by the student via the web using their own web browser.

31

There is a lot more that could be said – for example:
- workflows around the building/provisioning of virtual machines,
- how we might be able to host such machines either centrally or as a self-service option,
- the parallels between notebook style computing and spreadsheets,
- the notion of conversations with data,
- etc. etc.

32

Date post:	27-Jan-2015
Category:	Technology
Upload:	tony-hirst
