Secrets of Unidata Software Engineers
Russ Rew
UCAR Software Engineering Assembly
April 26, 2006
Unidata in a Nutshell
Mission:
To provide data, tools and community leadership for enhanced Earth-system education and research
The Unidata Program Center:
Facilitates (real-time) data access
Provides and supports data access, analysis, and visualization tools and services
Builds and advocates for a community of geoscience educators and researchers
UPC size: 12 developers, 12 other staff
Unidata Developers
• Tom Baltzer
• John Caron (.75)
• Steve Chiswell
• Ethan Davis
• Steve Emmerson
• Ed Hartnett (.25)
• Yuan Ho
• Robb Kambic
• Jeff McWhirter
• Don Murray (.75)
• Jen Oxelson
• Russ Rew (.25)
• Anne Wilson
• Tom Yoksas (.75)
Overview: The Mystery
Premise: Unidata has been very successful in its software development
Premise: Unidata’s software engineering process appears haphazard and chaotic
Mystery: Why is Unidata’s software successful and popular when it makes little use of recognized development methodologies?
Speculations, theories, and revelations
Some Software Successes
Integrated Data Viewer (IDV)
Local Data Manager (LDM)
netCDF, netCDF Java (nj22)
THREDDS and THREDDS Data Server (TDS)
Units library (udunits)
IDV
Unidata’s newest scientific analysis and visualization tool
Freely available 100% Java framework and reference application
Provides 2- and 3-D displays of geoscience data
Stand-alone or networked application
Integrates data from disparate sources
End-to-end test for Unidata technologies
IDV’s Success
In use at over 80 Unidata sites and use growing rapidly
Selected as the visualization tool for the Operations Center in T-REX
Bill Hibbard, developer of Vis5D and VisAD, calls the IDV “far better than any other environmental visualization system”
LDM
Peer-to-peer system for reliable, event-driven data distribution using LDM-6 software
Supports subscriptions to near real-time data feeds
LDM protocols use persistent TCP connections, suitable for pushing a large number of small products, as well as large products
Highly configurable: can inject, distribute, capture, filter, and process arbitrary data products
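The subscription idea above can be illustrated with a minimal in-process sketch. This is not the LDM API or its wire protocol; the `Product` and `Relay` names, the feed names, and the identifier strings are all hypothetical. It only shows event-driven delivery in which products are selected by feed and identifier pattern while the payload stays opaque.

```python
import re
from typing import NamedTuple


class Product(NamedTuple):
    feed: str    # feed name, e.g. "radar" (hypothetical)
    ident: str   # product identifier used for pattern matching
    data: bytes  # opaque payload; the relay never looks inside


class Relay:
    """Hypothetical in-process relay, for illustration only."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, feed, pattern, action):
        # A subscription names a feed, an identifier pattern, and an action.
        self._subscriptions.append((feed, re.compile(pattern), action))

    def inject(self, product):
        # Event-driven: matching actions run as soon as a product arrives.
        delivered = 0
        for feed, pattern, action in self._subscriptions:
            if feed == product.feed and pattern.search(product.ident):
                action(product)
                delivered += 1
        return delivered


received = []
relay = Relay()
relay.subscribe("radar", r"^KFTG", lambda p: received.append(p.ident))
relay.inject(Product("radar", "KFTG_20060426_1200", b"\x00\x01"))
relay.inject(Product("text", "SAUS70", b"bulletin"))
```

Because selection depends only on the feed and identifier, the same relay can carry any type of data.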
[Diagram: data sources feeding a network of LDM nodes connected over the Internet]
LDM’s Success
Unidata’s Internet Data Distribution system:
Near real-time data for 175 universities and research organizations
30 data feeds (radar, satellite, text bulletins, lightning, model forecasts, surface obs, upper air obs, ...)
Also used by USGS, NASA, ESRL, weather services in Spain and Korea, active projects on 6 continents
Data volume: 2.5 GB/hr, 120000 products/hr; ranks fifth in weekly Internet2 traffic (Iperf, HTTP, NNTP, SSH, LDM, ... FTP)
More LDM Successes
NOAA/NWS adopted for Level II radar distribution
From 134 radars to 125 weather forecast offices, 22 universities, 10 federal organizations, 12 commercial organizations
Will be used in THORPEX Interactive Grand Global Ensemble (TIGGE)
Model output collection from 10 global modeling centers
Collected at 3 archive centers (NCAR, ECMWF, Beijing)
Test from ECMWF to NCAR sustained 17 GB/hr
Candidate to replace WMO’s Global Telecommunications System (GTS)
NetCDF’s Niche
Simple data model for scientific datasets
Portable, self-describing data
Supports direct access (unlike XML)
Many language interfaces: C, Fortran, C++, Java, Python, Perl, Ruby, ...
Lots of applications
Efficient subsetting of multidimensional arrays
Supports appending, sharing, archiving data
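Direct access and efficient subsetting can be sketched as follows. This is not the netCDF file format or API; it assumes a hypothetical headerless file holding one 2-D array of 4-byte floats in row-major order. The point is that a fixed layout lets a reader compute byte offsets and seek to a subset instead of scanning the whole file.

```python
import struct
import tempfile

ROWS, COLS = 4, 6             # hypothetical shape of a 2-D variable
ITEM = struct.calcsize(">f")  # size in bytes of one big-endian float


def write_array(f, values):
    """Write a ROWS x COLS array in row-major order with no header."""
    for row in values:
        f.write(struct.pack(f">{COLS}f", *row))


def read_slab(f, row0, nrows, col0, ncols):
    """Read a rectangular subset by seeking to each row segment."""
    slab = []
    for r in range(row0, row0 + nrows):
        f.seek((r * COLS + col0) * ITEM)  # offset is computed, not searched
        raw = f.read(ncols * ITEM)
        slab.append(list(struct.unpack(f">{ncols}f", raw)))
    return slab


values = [[float(r * 10 + c) for c in range(COLS)] for r in range(ROWS)]
with tempfile.TemporaryFile() as f:
    write_array(f, values)
    subset = read_slab(f, row0=1, nrows=2, col0=2, ncols=3)
```

Only the requested bytes are read, which is what a text format that must be parsed from the start cannot offer.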
NetCDF-Java (nj22)
100% Java library, more advanced than C-based interfaces
Prototype implementation of Common Data Model for access to netCDF-4, OPeNDAP, HDF5
Provides netCDF interfaces to other formats: Grids (GRIB1, GRIB2), Radar (NEXRAD, NIDS, DORADE), Satellite (DMSP, GINI), Point Observations (BUFR (soon))
Provides uniform coordinate systems layer
Access to THREDDS catalogs
Implements access through NcML
Common Data Model
Coordinate Systems
Common Data Access Model
Scientific Datatypes
Grid
Point
Radial
Trajectory
Swath
Station
Applications
THREDDS, HDF5, netCDF, GRIB, OPeNDAP, ...
Success of netCDF
Basis for CF Conventions for climate and forecast data
Used at LLNL/PCMDI for archiving model output for the upcoming IPCC Fourth Assessment Report: 23 models, 30 TBytes, 70000 files
Used in various archives maintained by NOAA, NASA, USGS, DoE, NCAR, BADC, CSIRO, ...
C and Fortran netCDF Users Guides have been translated into Japanese at Kyoto University
Other uses in chromatography, mass spectrometry, neuro-imaging, biomolecule trajectory simulations, ...
Used in 15 commercial packages and over 50 open source packages for analysis, visualization, and data management
THREDDS
Originally funded under NSF Digital Libraries initiative
“Discovery and use of scientific data”
Middleware between data providers and users
Dataset Inventory Catalogs (XML)
Now part of Unidata Data Collections effort
Data Serving (pull)
THREDDS Data Server (TDS) most recent development
A THREDDS catalog provides a hierarchical structure for factoring inherited metadata
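Factoring inherited metadata through a hierarchy can be sketched as below. The nested dictionaries and key names are hypothetical and do not follow the THREDDS catalog XML schema; the sketch only shows that metadata stated once on a parent applies to every descendant unless a child overrides it.

```python
# Hypothetical catalog tree; keys are illustrative, not the THREDDS schema.
catalog = {
    "name": "Model output",
    "metadata": {"dataType": "Grid", "serviceName": "opendap"},
    "datasets": [
        {"name": "Run 00Z", "metadata": {}, "datasets": []},
        {"name": "Run 12Z", "metadata": {"serviceName": "http"}, "datasets": []},
    ],
}


def effective_metadata(node, inherited=None):
    """Yield (name, metadata) pairs with parent metadata merged in."""
    # Child entries take precedence over inherited ones.
    merged = {**(inherited or {}), **node.get("metadata", {})}
    yield node["name"], merged
    for child in node.get("datasets", []):
        yield from effective_metadata(child, merged)


resolved = dict(effective_metadata(catalog))
```

Here "Run 00Z" picks up both values from its parent, while "Run 12Z" inherits the data type but overrides the service.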
TDS (THREDDS Data Server)
Integrates data access with THREDDS catalogs and services
Tomcat/Servlet, 100% Java, single war file
Data input is netCDF Java 2.2 library
Data output:
OPeNDAP (for accessing subsets)
HTTP Server (for bulk file transfer)
OGC Web Coverage Server (currently gridded only, subsetting supported)
Supports dynamic generation of catalogs
Success of THREDDS
THREDDS used in NCAR Community Data Portal, many other data archives
TDS in use for serving IDD data from motherlode.ucar.edu, other data providers
From “Lessons Learned: Evaluation Studies Related to Geoscience Data in THREDDS and DLESE”, Susan Lynds et al:
• “Data providers agreed that THREDDS has made data access much easier than it used to be and enables them to reach new user communities.”
udunits
Library for manipulating units of physical quantities.
Conversion of unit specifications between formatted and binary forms
Arithmetic manipulation of unit specifications
Conversion of values between compatible scales of measurement
C, Fortran, and Java interfaces
Required by CF conventions
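Conversion of values between compatible units can be sketched with a small table of linear transformations to a base unit. This is not the udunits API; the `Unit` class, the `UNITS` table, and the unit names are assumptions made for illustration, and only a handful of units are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    dimension: str       # units are compatible only within one dimension
    scale: float         # base_value = value * scale + offset
    offset: float = 0.0


# Hypothetical table; base units are kelvin and meter.
UNITS = {
    "K": Unit("temperature", 1.0),
    "degC": Unit("temperature", 1.0, 273.15),
    "degF": Unit("temperature", 5.0 / 9.0, 273.15 - 32.0 * 5.0 / 9.0),
    "m": Unit("length", 1.0),
    "km": Unit("length", 1000.0),
}


def convert(value, src, dst):
    """Convert value from unit src to unit dst via the base unit."""
    a, b = UNITS[src], UNITS[dst]
    if a.dimension != b.dimension:
        raise ValueError(f"incompatible units: {src} and {dst}")
    base = value * a.scale + a.offset
    return (base - b.offset) / b.scale


boiling_f = convert(100.0, "degC", "degF")  # approximately 212
distance_m = convert(2.5, "km", "m")
```

Refusing to convert between dimensions is the part that catches mistakes such as mixing lengths with temperatures.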
udunits Success
Almost as widely used as netCDF
The Unidata Development Process
Unidata’s software engineering process appears haphazard and chaotic.
No uniform software engineering process
No regular code reviews
Specifications for software often missing or vague
No enforcement of coding standards
No measurement of programmer productivity
No effort underway to improve software engineering methodology
What Accounts for Unidata’s Successes?
... and can other organizations benefit from the answers?
Magic fairy dust?
Advanced processes?
Signing bonuses?
Working conditions?
Luck?
I’ll Offer Some Theories
The identified factors are subjective
Based on almost twenty years of involvement in Unidata
Discussion question: are any of these easily transferable?
Discussion question: would we have had even better software success with application of disciplined development methodologies?
Involve Developers in Software Support
Superior support for users of legacy applications:
GEMPAK
McIDAS
Support for software developed elsewhere:
OPeNDAP
VisAD
Every developer expected to answer user questions
GEMPAK
Application for analysis and visualization
In use at over 200 sites, use still growing
Developer is a specialized expert in the package, not in a process: maintaining, upgrading, testing, distributing, supporting, teaching user workshops, supporting the user community, supporting data types
McIDAS
In use at approximately 100 sites, a growing number outside the U.S.
Developer is a specialized expert in the package, not in a process: maintaining, upgrading, testing, distributing, supporting, teaching user workshops, supporting the user community, supporting new data types
Unidata User Support
Over 30 responses to user questions/day
Searchable support archives help
Support for legacy apps still significant
Balance between visualization apps, data middleware
Keeps developers close to users
Leverage User Efforts
• NetCDF users have contributed language interfaces, applications, good ideas, and bug reports: www.unidata.ucar.edu/software/netcdf/credits.html
Bob Albrecht, Ethan Alpert, Chris Anderson, Ayal Anis, Harald Anlauf, Phil Austin, Eric Bachalo, Jason Bacon, Sandy Ballard, Matthew Banta, Mike Berkley, Sherman Beus, Lorenzo Bigagli, Mark Borges, Nicola Botta, Dr. Kenneth P. Bowman, Bill Boyd, Mark Bradford, Bernward Bretthauer, Dr. Paul A. Bristow, Roy Britten, Glenn Carver, Tom Cavin, Morrell Chance, Susan C. Cherniss, Jason E. Christy, Gerardo Cisneros, Alain Coat, Carlie J. Coats, Jr., Jon Corbet, Alexandru Corlan, Jim Cowie, Arlindo da Silva, Rick Danielson, Alan Dawes, Donald W. Denbo, Charles R. Denham, Arnaud Desitter, Steve Diggs, Michael Dixon, Alastair Doherty, Bob Drach, Patrice Dumas, Frank Dzaak, Brian Eaton, Harry Edmon, Lee Elson, Ata Etemadi, Constantinos Evangelinos, John Evans, Joe Fahle, Gabor Fichtinger, Glenn Flierl, Connor J. Flynn, Anne Fouilloux, Jean-Francois Foccroulle, Mike Folk, David Forrest, David W. Forslund, Ben Foster, Masaki Fukuda, Dave Fulker, James Gallagher, Bear Giles, Tom Glaess, Peter Gleckler, André Gosselin, Gary Granger, Jonathan Gregory, Patrick Guio, Mark Hadfield, Magnus Hagdorn, Paul Hamer, Steve Hankin, Bill Hart, Kate Hedstrom, Charles Hemphill, Olaf Heudecker, Donn Hines, Konrad Hinsen, Leigh Holcombe, Tim Holt, Toshinobu Hondo, Takeshi Horinouchi, Chris Houck, Matt Huddleston, Matt Hughes, Doug Hunt, Alan Imerito, Jouk Jansen, Harry Jenter, Susan Jesuroga, Patrick Jöckel, Tomas Johannesson, Peter Gylling Jørgensen, Narita Kazumi, John Kemp, Jeff Kuehn, V. Lakshmanan, Bruce Langdon, Stephen Leak, Tom LeFebvre, Angel Li, Jianwei Li, Rick Light, Brian Lincoln, Keith Lindsay, Fei Liu, Jeffery W. Long, Dave Lucas, Valerio Luccio, Lifeng Luo, Steve Luzmoor, Lawrence Lyjak, Rich Lysakowski, Sergey Malyshev, Len Makin, Jim Mansbridge, Andreas Manschke, Chris Marquardt, Marinna Martini, William C. Mattison, Craig Mattocks, Mike McCarrick, Bill McKie, Ron Melton, Roy Mendelssohn, Pavel Michna, Barb Mihalas, Henry LeRoy Miller Jr., Philip Miller, Rakesh Mithal, Masahiro Miiyaki, Christine C. Molling, Skip Montanaro, Thomas L. Moore, Stefano Nativi, Gottfried Necker, Peter Neelin, Michael Nolta, Bill Noon, Enda O'Brien, Dave Osburn, Dan Packman, Simon Paech, Gabor Papp, Morten Pedersen, Dr. Louise Perkins, Michael D Perryman, Hartmut Peters, Ron Pfaff, David Pierce, Alexander Pletzer, Philippe Poilbarbe, Dierk Polzin, Jacob Weismann Poulsen, Ken Prada, Dave Raymond, Michael Redetzky, Rene Redler, Mark Reyes, Doug Reynolds, Mike Rilee, Mark Rivers, Randolph Roesler, Mike Romberg, Mathis Rosenhauer, Suzanne T. Rupert, Toshihiro Sakakima, Eric Salathe, Matthew H. Savoie, Marie Schall, Larry A. Schoof, Dan Schmitt, Robert B. Schmunk, Rich Schramm, William J. Schroeder, Uwe Schulzweida, Keith Searight, Guntram Seiss, Remko Scharroo, John Sheldon, Masato Shiotani, Michael Shopsin, Richard P. Signell, Steve Simpson, Joe Sirott, Greg Sjaardema, Dirk Slawinski, Cathy Smith, Neil R. Smith, Peter Paul Smolka, Nancy Soreide, Hudson Souza, Gunter Spranz, Richard Stallman, Bob Swanson, John Tanski, Karl Taylor, Jason Thaxter, Kevin W. Thomas, Philippe Tulkens, Tom Umeda, Joe VanAndel, Paul van Delst, Gerald van der Grijn, Richard van Hees, János Végh, Bernhard Wagner, Thomas Wainwright, Stephen Walker, Chris Webster, Paul Wessel, Carsten Wieczorrek, Gerry Wiener, Ralf Wildenhues, David Wilensky, Hartmut Wilhelms, Gareth Williams, David Wojtowicz, Jeff Wong, Randy Zagar, Charlie Zender, Remik Ziemlinski.
Strive for Discipline-Independence
Demand is greater than supply for useful data-oriented infrastructure for science
Examples:
netCDF
LDM
THREDDS
udunits
Common Data Model
...
Emphasize Loose Coupling
Data providers and data consumers should be uncoupled
Data storage should be uncoupled from visualization and analysis applications
Data distribution should be independent of type of data
...
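One way to picture provider/consumer uncoupling is a consumer written against a small interface rather than a storage format. The sketch below is illustrative only; `ValueSource`, `InMemorySource`, `TextSource`, and `mean` are hypothetical names and not part of any Unidata package.

```python
from typing import Iterable, Protocol


class ValueSource(Protocol):
    """Hypothetical minimal contract between providers and consumers."""

    def values(self) -> Iterable[float]: ...


class InMemorySource:
    """Provider backed by a Python list."""

    def __init__(self, data):
        self._data = list(data)

    def values(self):
        return iter(self._data)


class TextSource:
    """Provider that parses whitespace-separated numbers from text."""

    def __init__(self, text):
        self._text = text

    def values(self):
        return (float(token) for token in self._text.split())


def mean(source: ValueSource) -> float:
    """Consumer: depends only on values(), not on how data is stored."""
    data = list(source.values())
    return sum(data) / len(data)


from_memory = mean(InMemorySource([1.0, 2.0, 3.0]))
from_text = mean(TextSource("2 4 6"))
```

A new provider can be added without touching `mean`, and a new analysis can be added without touching any provider.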
Find Right Level for Abstractions
Data
Scientific Data
Georeferenced Data
Meteorological Data
Radar Data
Improve Software Quality by Porting
Platform-independence is important
Achieving it seems to improve quality of software in unexpected ways
Aiming for reasonable tradeoffs between portability and performance requires expertise
Solving portability problems for others (e.g. providing portable data, service-oriented architectures) is a growth industry
Java developers may ignore this
Work on Small Projects
Unidata projects and software packages typically require only one or two developers
Much of software engineering is about scaling to large projects with dozens of developers
May be the #1 secret for success
Find and Exploit Tight Feedback Loops
Develop for an active and interested user community
Find specific users with problems important to them that your software can solve
Exploit short iterations for incremental development
Governance: establish and pay attention to an external Users Committee that meets regularly
Use the Software You Develop
“Eat your own dogfood”
The Unidata Integrated Data Viewer uses netCDF Java, THREDDS, NcML, netCDF decoders, VisAD, OPeNDAP, ADDE servers
Provides end-to-end testing
Prioritizes useful enhancements
Leads to early bug identification by developers instead of users
If taken too far, leads to NIH syndrome
Drive Development with Tests
Test-driven development (TDD) and unit testing give developers confidence to
refactor code
try big changes
port to new platforms
Example: netCDF “make check” runs over 150,000 tests
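A small sketch of the idea, using Python's standard `unittest` module. The function under test, `index_to_offset`, is a hypothetical helper and not taken from the netCDF sources; the tests pin down its behavior so that it can later be refactored or ported with some confidence.

```python
import unittest


def index_to_offset(index, shape):
    """Row-major offset of a multidimensional index (code under test)."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension {n}")
        offset = offset * n + i
    return offset


class IndexToOffsetTest(unittest.TestCase):
    def test_first_element(self):
        self.assertEqual(index_to_offset((0, 0), (2, 3)), 0)

    def test_last_element(self):
        self.assertEqual(index_to_offset((1, 2), (2, 3)), 5)

    def test_out_of_range(self):
        with self.assertRaises(IndexError):
            index_to_offset((2, 0), (2, 3))


def run_tests():
    """Run the suite programmatically and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(IndexToOffsetTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` after every change plays the same role as running "make check" before a release.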
Value People over Process
Important tenet of the “Manifesto for Agile Software Development”, http://agilemanifesto.org/, to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
Arrange Long Funding Cycles
T. T. T.
Put up in a place
where it's easy to see
the cryptic admonishment
T. T. T.
When you feel how depressingly
slowly you climb,
it's well to remember that
Things Take Time.
--Piet Hein
Summary: The “Secrets”
1. Involve developers in support
2. Leverage user efforts
3. Strive for discipline-independent infrastructure
4. Emphasize loose coupling
5. Choose the right level for abstractions
6. Improve quality by porting
More “Secrets”
7. Work on small projects
8. Find good feedback loops
9. Use your own software
10. Drive development with tests
11. Value people over process
12. Arrange for long funding cycles