Software Disasters

Date post: 21-Feb-2017
Upload: arno-huetter
Page 1: Software Disasters

Software Disasters

ARNO HUETTER

Page 2: Software Disasters

About the Author

Arno Huetter

Arno wrote his first lines of code on a Sinclair ZX80 in 1984.

Over the years, he has programmed in C/C++, Java and C#, and has also done a fair amount of database development.

Today he is Development Lead at Dynatrace, an application performance management (APM) vendor.

Page 3: Software Disasters

OS/2 (1985-2001)

Page 4: Software Disasters

The PC World in 1985

- The PC (with DOS) is the clear market leader, but the Apple Macintosh is the new cool thing
- Windows 1.0 is merely a DOS GUI extension
- IBM's TopView has flopped (a rudimentary shell that allowed copy/paste between, and multitasking of, DOS programs)

Page 5: Software Disasters

Windows 1.0

30 years of innovation

Page 6: Software Disasters

Enter OS/2

- OS/2 is intended as the protected-mode successor of DOS
- IBM decides to form another partnership with Microsoft
- The plan:
  - IBM programmers would develop significant parts
  - Microsoft would be paid contractor rates per kLOC (thousand lines of code)
  - Must run on the 286, be compatible with TopView, and run DOS programs in a "compatibility box"
  - Presentation Manager should allow recompiled Windows applications to run (never worked that way; required a rewrite, or a VDM starting with OS/2 2.0)

Page 7: Software Disasters

1987: OS/2 1.0

1988: OS/2 1.1

Page 8: Software Disasters

1987 to 1991

- Marketing alongside IBM's PS/2 platform (although a PS/2 is not required) leads to customer confusion
- RAM prices shoot up in 1987 (USD 133 for 1 MB); OS/2 requires 4 MB compared to the usual 1 MB for DOS
- USD 340 for a retail copy (DOS shipped for free with new PCs)
- USD 3,000 for the OS/2 SDK
- No printer support except for IBM printers, no drivers for common devices
- Missing guidance, support and ecosystem for third-party software vendors
- 1989: OS/2 1.2 introduces HPFS, Ethernet, TCP/IP
- 1990: Windows 3.0 takes off, the IBM/Microsoft collaboration unravels
- 1991: OS/2 1.3 turns out to be a modest success, but fades compared to Windows 3.x

Page 9: Software Disasters

1992: OS/2 2.0

1994: OS/2 Warp

Page 10: Software Disasters

1992 to 1994

- 1992: Windows 3.1 released
- 1992: OS/2 2.0, a true 32-bit operating system, taking full advantage of the 386 and technically ahead of Windows 3.x (preemptive multitasking, memory protection)
- Workplace Shell with "object-oriented" UI behavior and a 32-bit API
- Multiple DOS programs running side by side
- Windows 3.0/3.1 compatibility via VDM; Windows code is included in OS/2
- Because of this Windows compatibility, developers simply decided to develop for Windows only (and could state "it runs on OS/2 as well")
- OS/2 versions of Lotus 1-2-3 or Corel Draw are sluggish compared to their Windows versions
- 1993: Mainframe market collapses. IBM CEO John Akers is ousted and replaced by Louis Gerstner, who turns the struggling company around
- 1994: OS/2 3.0 (Warp) introduced
- 1994: Windows NT 3.5 introduced (modern, rock-solid, multiprocessor support)

Page 11: Software Disasters

1995 to 2001

- 1995: Windows 95 hits the market and becomes an instant success
- IBM is weak on marketing and hardly gets any PC clone makers on board
- OS/2 is sold mainly to corporate customers for networking environments, but finally loses there as well, to Windows NT
- Even IBM's "Mr. OS/2", David Barnes, is quoted as saying: "OS/2 is great, but then Sony's Betamax was way better than VHS…"
- 1996: OS/2 Warp 4 released, adds Java and speech recognition
- IBM finally stops development, but continues to sell OS/2 until 2001
- Gerstner quote #1: “The pro-OS/2 argument was based on technical superiority... What my colleagues seemed unwilling or unable to accept was that the war was already over and was a resounding defeat”
- Gerstner quote #2: “The battle between OS/2 and Microsoft Windows was draining tens of millions of dollars, absorbing huge chunks of senior management’s time, and making a mockery of our image.”

Page 12: Software Disasters
Page 13: Software Disasters
Page 14: Software Disasters

1998 to 2002: Netscape

- 1998: Consensus: the Netscape 4 code base is pretty bad. So let’s do a complete rewrite! Mozilla organization formed.
- The code base might have been bad, but it worked quite well for most users (browser market share at 50%)
- 1999: Netscape acquired by AOL
- 2000: Netscape 6 released. It wasn’t really ready and fails miserably
- 2002: Mozilla 1.0 released, the first real release in four years. Browser market share at 6%
- 2003: AOL closes the Netscape division, the Mozilla Foundation continues independently
- 2004: Resurrection: Firefox 1.0, based on Mozilla

Page 15: Software Disasters

Ariane 5 (1996)

Page 16: Software Disasters
Page 17: Software Disasters

    declare
      vertical_veloc_sensor: float;
      horizontal_veloc_sensor: float;
      vertical_veloc_bias: integer;
      horizontal_veloc_bias: integer;
      ...
    begin
      declare
        -- overflow check is suppressed for the horizontal value only
        pragma suppress(numeric_error, horizontal_veloc_bias);
      begin
        sensor_get(vertical_veloc_sensor);
        sensor_get(horizontal_veloc_sensor);
        -- float to integer conversions; the second one is unprotected
        vertical_veloc_bias := integer(vertical_veloc_sensor);
        horizontal_veloc_bias := integer(horizontal_veloc_sensor);
        ...
      exception
        when numeric_error => calculate_vertical_veloc();
        when others => use_irs1();
      end;
    end irs2;

Page 18: Software Disasters

Ariane 5 - Summary of Events

- Conversion from 64-bit floating point to 16-bit signed integer
- Numeric overflow when the horizontal velocity sensor value exceeds 32,767 (internal unit), the largest value a 16-bit signed integer can hold
- Exception handling deactivated
- The redundant system used identical hardware and the same software, hence ran into the same problem
- Unhandled exception triggered self-destruction in order to avoid the rocket breaking apart
- Code originated from Ariane 4, which was slower and flew at a different angle
- Calculation not even needed during flight (just during preparation), but still running
- USD 5 billion overall development costs
- USD 500 million for rocket + satellites
- Program delayed by years
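
The conversion problem summarized above can be sketched in a few lines. This is only an illustration in Python, not the flight software (which was written in Ada); the function names and sample values are hypothetical. It shows two possible protective strategies for a float to 16-bit integer conversion: raising an explicit error, or clamping to the representable range.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into a 16-bit signed integer")
    return int(value)  # truncates toward zero


def to_int16_saturated(value: float) -> int:
    """Convert to a 16-bit signed integer, clamping out-of-range values."""
    return int(max(INT16_MIN, min(INT16_MAX, value)))


if __name__ == "__main__":
    horizontal_velocity = 40000.0  # hypothetical out-of-range sensor reading
    print(to_int16_saturated(horizontal_velocity))
    try:
        to_int16_checked(horizontal_velocity)
    except OverflowError as exc:
        print("conversion rejected:", exc)
```

Which strategy is appropriate depends on the system; the point is only that the out-of-range case is handled deliberately rather than left unprotected.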

Page 19: Software Disasters

2000 to 2005: FBI Virtual Case File

- Software system to manage all documents relating to cases being investigated by the FBI
- Modern web interface for 22,000 users to replace the previous ACS system (which was already obsolete at its introduction due to outdated technology)
- Estimated completion time: 22 months
- By 2005, 700,000 lines of code had been written, with five different project leads in charge

Page 20: Software Disasters

2000 to 2005: FBI Virtual Case File

- VCF turns out to be incomplete, inadequate and poorly designed, essentially unusable under real-world conditions
- Even in rudimentary tests, the system did not comply with basic requirements
- After having invested USD 170 million, the FBI decided to buy off-the-shelf software instead
- Causes: no architecture blueprints, repeated changes in specification, engineers with little or no computer science training, code bloat, scope creep

Page 21: Software Disasters

2003: US Northeast Blackout

- Race condition in General Electric's Unix-based XA/21 energy management system
- Bug stalls FirstEnergy's control room alarm system; operators no longer receive alerts
- Unprocessed events queue up and the primary server fails within 30 minutes
- Applications are automatically transferred to the backup server, which itself fails
- Operator screen refresh interval grows from 1 s to 1 min
- Operators hence dismiss a call about the tripping and reclosure of a 345 kV shared line
- More lines go offline in a chain reaction; undervoltage and overcurrent are interpreted as a short circuit
- 30 minutes later, 256 power plants are offline, most due to automatic protective controls
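
The slides do not describe the actual XA/21 defect, so the following is only a generic sketch of what a race condition is and how a lock prevents it. The `AlarmLog` class and `run_workers` helper are hypothetical names chosen for illustration.

```python
import threading


class AlarmLog:
    """Hypothetical shared alarm state, updated by several worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0

    def record(self) -> None:
        # The read-modify-write below is the critical section. Without the
        # lock, two threads could read the same old value and lose an update.
        with self._lock:
            current = self.count
            self.count = current + 1


def run_workers(log: AlarmLog, workers: int = 4, events_per_worker: int = 10_000) -> int:
    """Let several threads record events concurrently and return the total."""

    def work() -> None:
        for _ in range(events_per_worker):
            log.record()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.count


if __name__ == "__main__":
    print(run_workers(AlarmLog()))
```

With the lock in place, the total always equals workers times events per worker. Races are hard to find in testing precisely because the unprotected variant usually produces the right result too, and only fails under unlucky timing.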

Page 22: Software Disasters

2005: WoW Glitch

- Game update on September 13th introduced the new character "Hakkar"
- Hakkar was able to inflict a disease, "Corrupted Blood", on player characters, draining their health points and finally killing them
- The disease could be passed on to other players
- The effect was meant to be localized to one game area
- Developers didn't consider WoW's teleporting functionality
- Infected players teleported into other areas, soon leading to corpses littering the streets
- Fortunately, player death is not permanent in WoW, and admins reset the game
- (Virtual) death toll: unknown
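
One way to keep an effect localized, sketched under stated assumptions: zone-bound effects are stripped whenever a player changes zone, regardless of how the change happens. This is not how the game is implemented; the `Player` class, the `ZONE_BOUND_EFFECTS` table and the zone names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Player:
    name: str
    zone: str
    effects: Set[str] = field(default_factory=set)


# Hypothetical table: effect name -> the only zone in which it may exist.
ZONE_BOUND_EFFECTS: Dict[str, str] = {"Corrupted Blood": "raid"}


def move_player(player: Player, new_zone: str) -> None:
    """Change zone by any means (walking, teleport) and drop zone-bound effects."""
    player.zone = new_zone
    # Effects without an entry in the table are kept everywhere.
    player.effects = {
        effect
        for effect in player.effects
        if ZONE_BOUND_EFFECTS.get(effect, new_zone) == new_zone
    }


if __name__ == "__main__":
    hero = Player("hero", "raid", {"Corrupted Blood", "Well Fed"})
    move_player(hero, "city")
    print(hero.effects)
```

Putting the check into the single function that changes zones means a new travel mechanism cannot bypass it by accident.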

Page 23: Software Disasters

2012: Knight Capital loses USD 440 million

- August 1st: new trading software installed
- An administrator forgets to deploy it on one out of eight server nodes
- The new code repurposed a flag previously used for testing scenarios
- On that one server node, the old trading algorithm interprets the flag differently and starts buying and selling 100 different stocks randomly, without human verification
- NYSE has to suspend trading of several stocks
- Knight Capital loses USD 440 million in only 30 minutes, until the system is suspended
- Investors have to raise USD 400 million in order to rescue the company
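
A deployment of this kind can be guarded by a simple consistency check before a repurposed flag is switched on. The sketch below is an assumption about what such a check might look like, not a description of Knight Capital's systems; the function names, node names and version strings are hypothetical.

```python
from typing import Dict, List


def nodes_out_of_sync(deployed: Dict[str, str], expected_version: str) -> List[str]:
    """Return the names of nodes not running the expected build, sorted."""
    return sorted(
        name for name, version in deployed.items() if version != expected_version
    )


def can_enable_flag(deployed: Dict[str, str], expected_version: str) -> bool:
    """Allow the repurposed flag only if every node runs the new code."""
    return not nodes_out_of_sync(deployed, expected_version)


if __name__ == "__main__":
    # Eight nodes, one of which was missed during the rollout.
    nodes = {f"node{i}": "2.0" for i in range(1, 9)}
    nodes["node8"] = "1.9"
    print(nodes_out_of_sync(nodes, "2.0"))
    print(can_enable_flag(nodes, "2.0"))
```

Retiring the old meaning of a flag in one release and introducing the new meaning in a later one would avoid the ambiguity altogether.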

Page 24: Software Disasters

Source: http://www.typemock.com/software-bugs-infographic

Page 25: Software Disasters

Why do SW projects fail (IEEE)

- Unrealistic or unarticulated project goals
- Inaccurate estimates of needed resources
- Badly defined system requirements
- Poor reporting of the project's status
- Unmanaged risks
- Poor communication among customers, developers, and users
- Use of immature technology
- Inability to handle the project's complexity
- Sloppy development practices
- Poor project management
- Stakeholder politics
- Commercial pressures

Page 26: Software Disasters

Thank you!

Twitter: https://twitter.com/ArnoHu
Blog: http://arnosoftwaredev.blogspot.com

