Posted on 10 May 2015
The life-sciences as a pathfinder in data-intensive research practice
Dr Andrew Treloar, Director of Technology
July 10, 2014 CC-BY-SA, @atreloar
Structure of presentation
§ Research Lifecycles
§ Functions of Scholarly Communication
§ Pointers to the future
§ Characterising the future
§ Pathfinder problems
§ Conclusions
So many lifecycles…
July 10, 2014 CC-BY-SA, @hvdsomp and @atreloar
Minimal Research Lifecycle: Think → Do → Share
Sharing: Scholarly Communication System and its Functions
§ Registration
§ Certification
§ Awareness
§ Archiving
(Rosendaal and Geurts, 1997)
System of Journals
§ Registration
  § submission of manuscript
§ Certification
  § peer-review (pre-publication)
  § commentary (post-publication)
§ Awareness
  § discovery services
§ Archiving
  § libraries (print)
  § publishers (electronic)
  § special purpose organisations (e.g. Portico)
Pointers to the future
“the future is already here – it’s just not very evenly distributed”
William Gibson, NPR interview
Registration: BioRxiv
Registration: Github
Registration: WikiPathways
Registration: NeuroLex
Registration: Nanopublications
Registration: some observations
§ Decoupling registration from certification
§ Timestamping, versioning
§ Registration of various types of objects
§ Machines as creators and contributors
Certification: PubMed Commons
Certification: PubPeer
Certification: Publons
Certification: some observations
§ Peer-review decoupled from publication process
§ Certification of various types of objects
§ Machines validating form
§ Social endorsement
Awareness: myExperiment
Awareness: eLabNotebook RSS
Awareness: Twitter
Awareness: some observations
§ Awareness for various types of objects
§ Real-time awareness
§ Awareness support targeted at machines
§ Awareness through social media
Archiving: PDB
Archiving: GenBank
Characterising the future
§ Research process: hidden → visible
§ Nature of object: fixed → varying
§ Process of making public: discrete → continuous
§ Speed of communication: delayed → instant
§ Atomicity of object: atomic → compound
§ Communicated object: publication → publication + data proxies → publication + linked data + linked models
§ Nature of process: formal → informal
Fundamental changes
§ The research process (objects, social dimension) is becoming more exposed
§ Articles and books are no longer the only relevant objects for research communication
§ Objects are no longer static
§ Machines are joining humans as (co-)creators and consumers of research objects
Pathfinder problems
§ Integrity of the scholarly record
§ The three obsolescences
  § hardware
  § file format
  § software
System of Journals: Archiving
Web of Objects: Archiving?
Not just citation relationships
The problem of obsolescence
§ The life-science research environment can be viewed as undergoing a process of accelerated evolution
§ Other disciplines will hit these problems in time
Cambrian explosion
Hardware obsolescence: Roche 454
Software obsolescence: too much choice, not enough support
Abandonware
§ “Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
§ Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16, 2004
File format obsolescence: Illumina
§ The probability of an error in base calling is encoded as an ASCII character to reduce file size
§ The meaning of the ASCII code changed over the life cycle, so data generated at different points in time may have its quality scores encoded differently
§ “If you get an error like "Invalid quality score value", your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option "-Q33" to your FASTX Toolkit arguments”. Obviously…
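To make the offset problem concrete, here is a minimal sketch in Python. It assumes the Phred scale, Q = -10 log10(P), where P is the probability that a base call is wrong, and it assumes that each quality character stores Q plus an ASCII offset (33 for Sanger, 64 for the older Illumina encoding, as in the quoted error message). The function names `decode_quality`, `error_probability`, `guess_offset` and `to_phred33` are hypothetical helpers written for illustration; they are not part of the FASTX Toolkit or any other library, and the older Solexa scale is deliberately ignored.

```python
# Hypothetical helpers (not part of FASTX Toolkit or any library), assuming
# the Phred scale Q = -10 * log10(P), where P is the probability that a base
# call is wrong. The older Solexa scale is deliberately ignored.


def decode_quality(qual: str, offset: int) -> list[int]:
    """Turn a FASTQ quality string into Phred scores using the given ASCII offset."""
    scores = [ord(ch) - offset for ch in qual]
    if any(q < 0 for q in scores):
        # A negative score means the offset is wrong for this data, the same
        # mismatch behind errors like "Invalid quality score value".
        raise ValueError(f"quality string is not Phred+{offset}")
    return scores


def error_probability(q: int) -> float:
    """Invert the Phred formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def guess_offset(qual: str) -> int:
    """Heuristic guess: characters below '@' (ASCII 64) only occur in Phred+33.

    A high-quality Phred+33 read may contain no such characters, so a result
    of 64 is a guess, not proof.
    """
    return 33 if min(ord(ch) for ch in qual) < 64 else 64


def to_phred33(qual: str, offset: int) -> str:
    """Re-encode a quality string from the given offset to Phred+33 (Sanger)."""
    return "".join(chr(q + 33) for q in decode_quality(qual, offset))


if __name__ == "__main__":
    qual = "IIIIHHG5"  # a made-up Phred+33 quality string
    offset = guess_offset(qual)          # 33, because '5' is below '@'
    print(decode_quality(qual, offset))  # [40, 40, 40, 40, 39, 39, 38, 20]
    print(to_phred33("hhhhgg", 64))      # IIIIHH
```

The heuristic illustrates why the encoding is fragile: the same bytes can be valid under both offsets, so the file itself does not reliably say which convention produced it, and the reader is left to pass an option such as "-Q33" by hand.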
Everett Rogers, Diffusion of Innovations, 1962
Conclusions
§ Need to move to a smaller number of standard file formats
§ Need to move to a more sustainable model of software development and maintenance
§ Need to encourage platform manufacturers to innovate around the hardware, not the software
§ NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems
On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
§ Source code available to reviewers
§ Software indexed, citable, available
§ Source code documented
§ Source code managed
§ Test libraries, sample data and dataset repositories available
Questions?
§ andrew.treloar@ands.org.au
§ @atreloar
§ https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice