Health and safety lecture

The Internet of Things: The challenge for health and safety professionals

Title Slide

Good evening. It is a privilege to be invited to deliver the first lecture in what we hope will be an annual series of forward-looking talks about key issues in health and safety. (This is an independent lecture and does not represent HSE policy.)

I shall talk about the degree to which we already depend on computer systems to keep us safe and healthy, and then about the future: about artificial intelligence, driverless cars, and about the millions of machines and devices that are being connected to the internet, which we call the Internet of Things.

And I shall talk about the problem of assuring the safety of systems that depend on computers, and in particular about cybersecurity and what we can learn from recent events such as the WannaCry and Petya ransomware attacks.

By the end of this talk, I hope to have convinced you that the scale of the risks that we face needs urgent attention by safety professionals and that we need help from policy makers, regulators and Government.

I shall say what I think should be done, and I look forward to hearing your views.

How dependent on computer systems are we now for our health and safety?

Computers are already embedded in every aspect of health and safety. On the roads, cars have become mobile computers, with a modern car containing about 100 million lines of software.

Navigation now depends largely on GPS, even for emergency vehicles. Computers also control traffic lights, level crossings, motorway signs and much more.

Cars have become increasingly automated and the Government is promoting and preparing for semi-autonomous platoons of lorries and driverless cars within a few years.

Autonomous vehicles are increasingly used in agriculture and warehousing, on the water and beneath it, and in the air (and above it). Amazon is considering airship warehouses with delivery drones and development has started on pilotless passenger aircraft. Last year’s science fiction increasingly looks like tomorrow’s reality.

In healthcare—despite the surprising statistic that the NHS is the world’s largest purchaser of fax machines—computers are found everywhere. Computers are in pacemakers and infusion pumps, in MRI scanners and operating theatres, in the systems that hold patient records and in the many other computer systems that support medical staff and that monitor, diagnose and treat patients.

A typical NHS trust has more than 150 different computer systems, not counting the computers embedded in medical equipment.

Our food depends on computer systems, from farms through to retailers. As the Storm Desmond floods demonstrated in Lancaster in December 2015, if we lose the electricity that powers computers and communications, life in cities becomes extremely difficult and increasingly unsafe, with limited access to food, fuel, heating or reliable information.

Energy supplies depend on computing, which controls and protects oil and gas platforms, refineries, pipelines, the National Grid and power stations – including solar farms and wind turbines.

It is no exaggeration to say that modern society depends on the reliable operation of computer systems, and many of these computer systems are essential to keep us healthy and safe.

At the heart of all these systems is software – a lot of software!

Slide 2

Most people probably expect that such important software is designed and built very professionally, that it is maintained carefully and that those who build it and those who buy and install it have strong evidence that it is fit for purpose.

But this expectation is mistaken.

Most software is not well engineered. Indeed, most software development should not be called engineering: it is a craft activity where the best craftspeople do an excellent job, the worst are terrible, and the average is worryingly poor.

Most software developers use little computer science and few rigorous engineering methods.

Worst of all, most software developers rely almost entirely on testing to show that their software is reliable, secure and safe, when we have known for at least forty years that testing can only ever reveal the presence of errors, never their absence.

Slide 3

No practical amount of testing could ever show that a typical program is error-free.

Here are the results from an experiment carried out by Watts Humphrey of the Software Engineering Institute at Carnegie Mellon University. A KLOC is 1,000 lines of code.

Slide 4

You may be thinking that these error rates are not credible because, if things were this bad, then computer systems would be failing all the time. But that is to misunderstand the nature of software-based systems.

When we test a bridge by loading it, or an aircraft wing by bending it until it breaks, the tests rely on physical laws and materials science.

The science and the physics allow us to test a structure or a component with individual values or at extremes and to calculate or infer the behaviour for other values.

Physical laws give us very high confidence that if a bridge has been shown to support the weight of a bus, it is extremely unlikely that it will collapse under the weight of a horse or a bicycle.

But software is not a physical structure and if we test software and find that it works correctly for a hundred or a thousand different input values, we cannot validly conclude that it will work for other values, even if these new values lie between the values that have already been tested.

Slide 5

A trivial example might be a simple arithmetic calculation such as 1/(a-b) where a and b are whole numbers that are calculated elsewhere in the program. When a equals b, (a-b) becomes zero and division by zero will often cause a computer program to fail. It would be impractical to write (or to run) tests that tried every circumstance that could cause a and b to have the same value, so testing is unlikely to find this fault.
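
As a rough illustration, here is a minimal Python sketch (the function name and the test values are purely hypothetical) showing how a test suite can pass cleanly while the fault remains:

```python
# Minimal illustrative sketch (hypothetical function and values).
def reciprocal_gap(a: int, b: int) -> float:
    # Fails with ZeroDivisionError whenever a == b.
    return 1 / (a - b)

# These tests all pass, because none of them happens to make a equal to b...
assert reciprocal_gap(10, 7) > 0
assert reciprocal_gap(3, 8) < 0
assert reciprocal_gap(1000, -1000) > 0

# ...but the defect is still there:
# reciprocal_gap(5, 5)  ->  ZeroDivisionError: division by zero
```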

Most programs of more than a few hundred lines of software are immensely complex, with many millions of different possible sequences of instructions (or paths through the program) and many trillions of different system states. When you test a system, you find the faults that cause failures in the paths and states that you tested; in general, you find the big errors, those that cause failures in very many paths and in very many states.

If you correct these errors, you are left with defects that cause failures only on relatively uncommon paths through the system.

Because it is wholly impractical to test every possible path in every possible state of the system, very many defects will remain even after heroic amounts of testing. That is one reason why there are so many software updates and product recalls for software defects, including recalls of cars and medical devices.
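
The arithmetic behind that impracticality is easy to check with a short sketch (illustrative only, assuming independent two-way branches): each such decision doubles the number of possible paths, so even a modest number of branches produces far more paths than any test campaign could exercise.

```python
# Illustrative only: paths through n independent two-way branches, and the
# tiny fraction of them covered by an unusually thorough test campaign.
def path_count(branches: int) -> int:
    return 2 ** branches

tests_run = 1_000_000
for branches in (20, 40, 60):
    total = path_count(branches)
    print(f"{branches} branches: {total:,} paths, "
          f"coverage {tests_run / total:.2e}")
# 20 branches: 1,048,576 paths, coverage 9.54e-01
# 40 branches: 1,099,511,627,776 paths, coverage 9.09e-07
# 60 branches: 1,152,921,504,606,846,976 paths, coverage 8.67e-13
```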

Slide 6 (blank and black)

There have been injuries and fatalities caused by software defects, but there are undoubtedly many more than are reported, because a software failure usually leaves no trace.

If an infusion pump overdoses a patient, the nursing staff are likely to be blamed.

If a car crashes when cornering fast, the dead driver will be blamed and no-one will question whether the computer-controlled brakes, throttle or suspension may be at fault.

It may need many similar accidents to occur in a car or a medical device before there is a detailed investigation into the software.

And when accidents occur that may have been caused by software, the manufacturers are likely to be the only people with access to the software designs, the source code and the detailed expertise needed to interpret any forensic data. Manufacturers will not be impartial investigators, because they will also carry the liability, the recall costs and the reputational damage if their product is found to be at fault and to have caused the accident.

Manufacturers can also rely on the principle in English law that a machine is presumed to be working correctly.

This principle is absurd when applied to software-controlled machines and it causes a serious problem, because it means that the party that has least access to the electronic evidence has to prove a negative, almost always without the relevant evidence from the computer system.

We have seen that even experienced software developers make a lot of errors and that testing software is a hopeless way to find them all. Indeed, academic studies suggest that even rigorous testing may typically find only half of the defects.

Fortunately, we do not have to rely exclusively on testing. Software is not a physical product: it is a form of mathematics, and in principle it can be analysed mathematically. It is possible to prove that, for all possible inputs, in every possible order, a program cannot divide by zero, or run out of memory, or use uninitialised data, or attempt to read or write outside the limits of an array or buffer. It is possible to prove that a program does what it has been specified to do.

This can be done if the program has been designed and written using languages that are mathematically defined and that can be analysed fully. Such languages exist and they have been used for critical parts of important systems such as the control systems for metro trains and the systems that support air traffic control, as we shall see later.

We call these languages and their associated tools and methods formal methods. Formal methods could quickly find the risk of division by zero that I mentioned earlier, and tell you what input data would cause it to happen.
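
To give a flavour of what that means in practice, here is an analogy only (the sketch below is in Python, which is not a formally analysable language, and the names are hypothetical): the developer states a precondition as part of the interface, and a proof tool such as the SPARK toolset then proves statically, for every possible call, that the precondition cannot be violated, rather than merely checking it while the program runs.

```python
# Analogy only: in a language such as SPARK the precondition below would be a
# contract that the proof tools discharge statically for every call site,
# guaranteeing the division can never fail. In Python it is only checked at
# run time.
def reciprocal_gap(a: int, b: int) -> float:
    # Precondition: a != b (a prover would show no caller can violate this).
    assert a != b, "precondition violated: a must not equal b"
    return 1 / (a - b)
```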

If a computer system has a safety function and if the developers have not used formal methods where they would have reduced the risks that the system would fail dangerously, then it is difficult to see how they could argue successfully that the risks had been reduced so far as reasonably practicable, which is a requirement of the Health and Safety at Work Act.

As we approach the fiftieth anniversary of the NATO Software Engineering Conferences where the limitations of testing were clearly identified, we need to say clearly that relying so heavily on testing is unprofessional and that when safety is jeopardised it is also unlawful. It may be necessary to bring a few successful prosecutions before this message is finally understood.

Over recent years, the risk that safety-critical systems will fail has become much greater because of the increased prevalence and sophistication of cyberattacks.

Safety cannot be assured if a system is not secure, so cybersecurity is an increasingly important issue for those who have duties under the Health and Safety at Work Act and therefore for HSE.

Cyber criminals exploit the vulnerabilities that are created by software defects or by weaknesses in the defences that are supposed to stop unauthorised people from accessing the system.

Thousands of these vulnerabilities are listed on the internet and should be well known to professional software engineers, yet software developers continue to build systems that can easily be attacked using well-known exploits.

Slide 7

The SQL injection attack that was used against the telecommunications company TalkTalk in October 2015 was first described in 1998; the method of attack was actually older than the teenager who used it against TalkTalk. TalkTalk was fined £400,000 over this data breach, with great publicity, and yet the same injection vulnerability is still one of the most common defects found in websites.
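
For readers unfamiliar with the technique, here is a minimal, self-contained sketch (the table, field and user names are purely illustrative and have nothing to do with TalkTalk's systems) showing both the vulnerable pattern and the standard defence:

```python
# Illustrative sketch of SQL injection and the parameterised-query defence.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])

user_input = "nobody' OR '1'='1"  # attacker-supplied text

# Vulnerable: the input is pasted into the SQL text and changes the query,
# so the WHERE clause becomes always true and every row is returned.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'").fetchall()
print(len(rows))  # 2 - the attacker reads every user's data

# Safe: a parameterised query treats the input purely as data.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(len(rows))  # 0 - no user is literally named "nobody' OR '1'='1"
```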

Engineers are supposed to learn from major failures and mostly they do – but most software developers do not behave like engineers.

I would like to be able to say that industrial control systems and other safety-critical systems are always developed to much higher engineering standards than websites, but that isn’t always the case. A few seem exemplary, but others are poor.

Slide 8

Here are some figures that were published many years ago in CrossTalk, the Journal of Defense Software Engineering. They show the defects that an MoD team found in the software for a military transport aircraft, built in America by a leading aircraft manufacturer that was following the highest international civil aviation standards, including very extensive MC/DC testing.

As you can see, the defect rates were very high.

It may be that avionics software is better now (though I am not aware of any evidence that this is the case), but the standards that are followed for developing safety-critical software are still not good enough, and even a hundred-fold improvement in these figures would still imply that there were many thousands of defects in the 100 million lines of software in a modern car.

It’s hard to get access to detailed analyses of recent safety-critical software because of commercial confidentiality and security, but the international industry standards give little indication of major improvements. The leading standard for safety-related computer systems, IEC 61508, has serious flaws: for example, it relies far too heavily on testing and it pays little attention to the scale and nature of the cybersecurity threat.

Cyberattacks invalidate the assumption that failures in independent safety functions will occur independently, which is a critical assumption in many safety cases.

The threat of cyberattacks also creates a dilemma about software updates. Once a safety-critical system has gone through its rigorous acceptance procedures, it should only be changed when absolutely necessary, because of the work that is required to ensure that the changes do not invalidate the safety case.

This conflicts with the well-founded advice that security updates should be applied to systems as soon as possible.

This is especially important when a security patch is widely available, because the patched software can be analysed by attackers to discover the vulnerability and then to attack unpatched systems.

Once again, the heart of the problem is the reliance on lengthy and expensive testing as the main way to show that software risks have been eliminated or controlled.

Modern cars illustrate many of the issues that affect the Internet of Things. There is no reason to suppose that software in cars is any better than software in aircraft, though again it is very difficult to do independent research because the software is proprietary.

Occasionally, a court case leads to some expert review and publication.

Slide 9

An expert witness called Michael Barr examined the software for the electronic throttle in a 2005 Toyota Camry that was at the heart of a claim that the car had accelerated by itself and could not be stopped. The details of the case, called Bookout v Toyota, can be found on the internet.

Michael Barr found thousands of minor defects and many serious errors in the software design and in the way it had been written.

I recommend reading his analysis, which has uncomfortable parallels with the review of the Full Authority Digital Engine Control (FADEC) software installed in Chinook helicopters.

Slide 10

A House of Commons report into the Mull of Kintyre Chinook accident on 2 June 1994 said: “In the summer of 1993 an independent defence IT contractor, EDS-SCICON, was instructed to review the FADEC software; after examining only 18 per cent of the code they found 486 anomalies and stopped the review.”

I must emphasise that some excellent practice does exist in industry.

Slide 11 (courtesy of Altran UK)

The latest air traffic control system for National Air Traffic Services in the UK was developed and safety-assured using formal methods.

The developers started with a mathematically rigorous statement of requirements and a design in a language called Z; they then programmed the system in the SPARK programming language. The SPARK tool-set checked that the software fulfilled the design and proved that there were no defects that could cause the system to behave in an undefined way or to fail at run-time.

From 250,000 lines of software, this generated the need for 152,927 separate mathematical proofs, of which 151,026 (98.76%) were proved entirely automatically by the SPARK tools, without human intervention.

Any time the software is changed, the complete proof is reconstructed to ensure that no errors have been introduced. This takes 15 minutes on a standard desktop computer – just enough time to have a cup of coffee.

This could be the solution to the software update dilemma that I mentioned earlier, making it possible to correct security defects quickly whilst preserving a strong safety case.

The German company, Siemens, also uses formal methods (different ones) to generate software for the control systems for the Paris Metro and elsewhere, and Microsoft and Facebook have introduced proof tools into their software development processes, to improve security, speed up development and reduce costs.

The use of formal methods is demonstrably “reasonably practicable” and reduces risks. Yet most developers of safety-critical systems do not use formal methods, and most regulators accept unscientific safety arguments, even for newly developed systems, because the culture and practice of informal software development is so pervasive throughout long supply chains that demanding more is judged disproportionate.

But two factors are making this situation increasingly dangerous: the introduction of many more safety-critical systems and the rising threat of cyberattacks.

As I said earlier, cyberattackers often exploit the software vulnerabilities that result from poor software engineering, for example by sending data that causes the software to fail or to pass control to the attacker.

The safety of many important systems depends on cybersecurity and the risk of serious cyberattacks has been assessed as a Tier One threat on the National Risk Register.

The scale of the threat from cyberattacks depends on the strength of the attacker’s motive, the ease with which the attack can be carried out, and the risk of being caught and punished.

The groups that carry out cyberattacks are (in rough order of capability) teenage vandals (script kiddies), activists (hacktivists), minor criminals, terrorists, organised crime groups, and nation states. In the decade since I was on the Board of the Serious Organised Crime Agency, SOCA (now the National Crime Agency, NCA), we have seen the cyberattack tools and methods that were formerly used only by nation states migrate through organised crime down to ordinary criminals, hacktivists and script kiddies.

Slide 12

One example is the EternalBlue exploit that was developed by the US National Security Agency’s elite Tailored Access Operations team, stolen by the Shadow Brokers, and published on the internet along with dozens of other exploits and tools (over a gigabyte in total).

That exploit was used in the WannaCry ransomware attack that affected 200,000 computers running Microsoft Windows, including some in the NHS. Fortunately, the objective seems to have been criminal extortion rather than terrorism, as the software encrypted data and then immediately announced itself with a ransomware screen.

The software could easily have been designed instead to make changes to critical data and to remain invisible until several backup cycles had copied the corrupted data into all the backups.

If files can be read and encrypted, then they can be read and changed. How quickly could the NHS recover if blood-groups and other important fields in electronic medical records could no longer be trusted and if there were no backups that could be trusted either?

The same tools were subsequently used by other criminals to attack unpatched systems with the Petya malware that crippled systems around the world, causing lasting disruption to many organisations including the global parcel company TNT, the leading advertising firm WPP, the Danish shipping group Maersk, French construction firm Saint-Gobain and major Russian industrial groups.

Slide 13

The cybersecurity threat is not taken seriously enough. Far too many systems have been shown to be vulnerable. Remote takeovers of cars and of medical equipment are examples that have received some publicity, but it seems that almost all systems can be penetrated and compromised if someone has enough motivation; the commercial companies that offer penetration testing services report that they usually find vulnerabilities within a few minutes.

Slide 14

The Red Teams that are used to assess the defences of military systems almost always manage to penetrate them.

At the DefCon cybersecurity conference earlier this year, there was a challenge to hack a set of US voting machines. All the machines were successfully penetrated.

Even systems that are isolated from any networks can be penetrated, as the Iranian Government found out when the Stuxnet malware destroyed many of the gas centrifuges that were separating uranium isotopes for the Iranian nuclear programme.

HSE is aware of the risks that cyberattacks could present to major hazard sites and has recently published Operational Guidance to assist inspectors in assessing industrial control systems for cybersecurity, but the risks must be recognised and controlled by the duty holders, and although HSE’s interventions are an important stimulus to action, much more will be needed.

We must invest in protecting critical data from corruption and eliminate single points of failure that affect multiple systems.

For example, we must provide a robust and diverse alternative to GPS for precision navigation and timing, because GPS can easily be jammed over a wide area with equipment that is cheap and readily available. For a little more investment it can be spoofed, generating signals that a receiver will interpret as whatever the attacker wants. Think about the effect of that on shipping, on a foggy night in the English Channel, for example.

We must find a way to encourage (and, if necessary, compel) software developers to use state-of-the-art software engineering methods, so that new software has many fewer vulnerabilities.

We must replace the most important and vulnerable software components with proven, high-integrity alternatives.

We must update the international standards for safety-critical software so that they are fit for purpose in the new world of frequent cyberattacks.

And we must ensure that we do not introduce major new risks with driverless cars, or as we add millions of additional devices to the Internet of Things – the IoT.

It is becoming the norm to include computers in almost every new device, new building and everywhere else, and to connect these computers to the internet so that they can send and receive data. The result is a dramatic growth in computers talking to other computers with little or no human intervention.

IoT systems are already widespread and widely hacked. It is trivially easy to find internet-connected devices by using search engines, if you know what to search for; there is even a specialised search engine, Shodan. A recent denial-of-service attack exploited thousands of internet-connected digital video recorders and video cameras and used them to generate vast quantities of traffic directed at the website of the security journalist Brian Krebs, as an act of revenge. The software used to construct the Mirai botnet that was used in that attack was later released on the internet by the attackers, for anyone to exploit.

Slide 15

Many IoT systems are safety-critical: medical examples include pacemakers, patient monitors, infusion pumps, radiotherapy machines and even electronic patient records. More widely, there are factory control systems, signals, sensors and alarms, thermostats, security access systems, agricultural machinery, cranes, elevators, emergency signs and lighting, and hundreds more.

These systems are being designed and installed with little thought for the cyberattack threats today and in the future.

Often they are low-cost devices that use software components that were designed for other purposes.

They may contain hidden services with default passwords that the user cannot change or does not know about. (This was the case with the video cameras and digital video recorders recruited to the Mirai botnet.) If there are support arrangements at the beginning, they are likely to end before the device is replaced.

When support ends, these devices remain with all their vulnerabilities, an open door that may expose connected systems and networks to uncontrollable risks.

Slide 16

Driverless and highly automated cars provide an interesting example of the Internet of Things: many of the issues we can describe in cars will have parallels in industrial control systems, medical devices, smart buildings and smart infrastructure.

When a car’s behaviour is critically dependent on its software, a software update may radically change its safety. How long will it take to verify that it is safe to distribute an update to many thousands of cars – and how does that compare with the frequency of updates that may be necessary to keep ahead of cyber criminals?

What manufacturer will commit to maintaining 100 million lines of software for the lifetime of a car (20 or more years), when software manufacturers generally end maintenance much more quickly?

Will the MOT test be updated to check that all known software vulnerabilities have been patched, and to fail any car whose software is out of maintenance? Will the next WannaCry ransomware threaten automated cars, demanding protection money in Bitcoin?

And if you don’t pay the ransom, will it crash the car rather than just disable it?

The growing threat of cyberattacks makes software defects much more serious. When we only had to worry about reliability, we could take comfort from the fact that each model of car had been driven for millions of miles and that problems occurred randomly and were unlikely to affect any individual car before its next scheduled maintenance.

The cyber threat is different. A vulnerability may be latent in thousands of cars and could then be exploited everywhere on the day it is discovered.

New technologies bring new problems. Artificial intelligence (and especially machine learning) is being employed ever more widely, and will certainly be used in systems that affect health and safety (including in highly automated and driverless cars), yet the report on machine learning that was published this year by the Royal Society made it clear that verifying the behaviour and cybersecurity of machine learning systems is still a research challenge, and that, often, even the developers of such a system may not be able to explain why it made a particular decision.

In view of all these issues, how should we proceed?

Firstly, we must ensure that the costs of poor software fall primarily on the manufacturers, because they are the ones who profit from the software and who can make the necessary improvements.

Companies must be put on notice that they will be held accountable for the consequences of any vulnerabilities, so that they have the greatest possible incentive to use strong engineering and assurance methods.

This accountability is explicit in European Directives and implicit in the Health and Safety at Work Act. There must be much stronger enforcement of these duties by all the regulators of safety-related systems, so that the business implications become clear to duty holders and to Finance Directors.

Most importantly we need a strategy that will ensure that new safety-critical systems (such as driverless cars) are designed and built to far higher cybersecurity and software assurance standards than currently exist.

No such strategy currently exists. Indeed, the UK’s current industrial strategy seeks to train more developers to develop software informally, and relies far too heavily on testing as the basis for assurance. This can only make our current problems worse.

In the short term, we need to act to eliminate and control the vulnerabilities that already exist. In the longer term we need to navigate a way towards a future where computer systems can be shown to be safe against cyberattack, with scientifically sound evidence to justify our confidence.

I offer these proposals for what needs to be done.

Slide 17

Safety professionals should:

Ensure that you and your suppliers have a security development lifecycle including a threat model, a security policy, processes for performing security assurance during development, and a vulnerability management process.

Develop improved standards for safety-critical software.

Challenge safety cases that critically depend on testing or “proven in use” claims.

Slide 18

Regulators should:

Require that duty holders and their suppliers have a rigorous security development lifecycle as described above.

Challenge safety evidence that depends excessively on testing, if it lacks rigorous software analysis to support it.

Slide 19

Policymakers should:

When new technology policies are considered, evaluate the risks and consequences of cyberattack and ensure that the associated costs are included before the policy is agreed. (With the current state of technology, I suggest that this would have changed the current enthusiasm for driverless cars and for several other digital policies.)

Develop a cybersecurity strategy that moves beyond the current reactive, defensive and blame-the-user approach and sets a path towards a future that is provably secure by design.

Make software importers and manufacturers liable for the consequences of cyber vulnerabilities that would have been prevented by state-of-the-art software engineering.

Eliminate the absurd and damaging legal presumption that a software system is working correctly, and legislate to ensure that when an accident occurs, investigators, prosecutors and defendants are on an equal footing, with access to the forensic data and to the software design and source code.

Slide 20

It will take a decade or more to reduce the current levels of risk so that they are tolerable and As Low As Reasonably Practicable, but I believe that it is indefensible to wait until a series of disasters forces us to act.
