Facilitating exploration of mass biological information by researchers unfamiliar with computer programming

Jeff Elhai,1 JP Massar,2 John Myers,3 and Mark Slupesky2

1 Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond VA, USA [email protected]

2 Berkeley CA, USA ([email protected], [email protected])

3 Sequoia Consulting, North Hills CA, USA [email protected]

Abstract. BioBIKE (biobike.csbc.vcu.edu) is a web-based integrated knowledge environment combining biological databases with analytical tools and a programming language that is imbued with the concepts of molecular biology and accessible to those without programming experience. A graphical, menu-driven interface facilitates the process through which researchers may exploit genomic, metabolic, and experimental data and apply analytical approaches tailored to biological questions of interest. BioBIKE addresses the need of researchers new to computation to be able to explore the dimensions of bioinformatic problems independently, while at the same time serving as the medium through which they can communicate effectively with software developers on collaborative solutions to more complex problems.

Keywords: End-user programming, Visual programming, Collaboration

1 Introduction

Imagine that accountants came to their profession with no sense of numbers and little idea how their accounting tools worked. It still might be possible for them to function productively, by mechanically applying the principles of best practice. However, if they encountered a situation that deviated from the experience on which those principles rested, how would they perceive it? And how would they devise a new accounting strategy to meet the challenge?

It would seem absurd for accounting or any other profession to find itself in such a state, but biological research is getting close. Many biological subdisciplines now rely increasingly on mass information, e.g. genomes and transcriptomes. However, most researchers are not able to use a computer themselves to work directly with the raw material of such analyses [1]. They choose instead either to abstain from the benefits of large data sets (proceeding as they did before their availability), to muddle through as best they can with the few tools with which they are comfortable, or (if they have the resources) to rely on the services of others who are capable of building computational tools for them [2].

It is therefore not surprising that many have called for a new world order in which biological researchers (or, indeed, everyone) routinely learn to interact creatively with the computer [1,3-5]. But this is hardly the first such call [e.g., 6-7], and we have no reason to believe that the next generation of researchers will respond any differently from the last. The lack of enthusiasm of many in the biological research community for computation may be attributable in part to a discomfort with mathematical abstractions [8] that might be addressed through changes in the way in which mathematics and computation are presented in K-12 education [9-10], but part may also be the result of a rational calculation [11]. Researchers may not see an immediate payoff in terms of improved research outcomes that could come through the effort they would need to expend to embrace computation.

We might make more substantial progress by first acknowledging that the cultures of computer science and laboratory science are intrinsically different [12]. In general, the laboratory researcher uses and sometimes develops software for a specific purpose. The means to achieve that purpose is generally not specified at the outset but emerges through experimentation (or perhaps remains subconscious even at the end). This approach – anathema to computer science – is essential to the scientific process, because the goal of understanding nature demands placing as few prior restraints on discovery as possible. The surprises that may arise through this loosely directed process are irritations from the perspective of an engineer but are the lifeblood of science [1].

In contrast, the culture that the computer scientist brings to software development is marked by rigor, necessary to establish whether the approach under consideration is valid beyond the relatively few tests typically performed by the researcher and whether it is resistant to unusual conditions that may arise. The efficiencies made possible by a deep knowledge of algorithms may determine whether an approach can serve practically in the analysis of large data sets.

The environment and training that might lead a computer scientist to a mastery of software development are therefore different from those suitable to lead a laboratory researcher to the very different point of being able to use the computer to explore nature. These differences are illuminated by the rich literature concerning the distinctive challenges of end-user software development [13-14]. In this light, we describe here a web-based resource, BioBIKE [15-16], whose aim is to provide an environment to researchers that may enable them to use the computer creatively and largely on their own, despite minimal prior experience in computer programming. It offers the incentive of immediate gratification – questions of research interest that can be addressed directly through the resource. These rewards may embolden the researcher to investigate the capabilities of the resource and gain the greater rewards from more sophisticated personal programming.

The researcher should be able to handle most small questions within BioBIKE, perhaps requiring assistance but nonetheless retaining ownership. However, questions that grow past a certain level of complexity call for a collaboration with someone knowledgeable in such matters. The meeting of biological and computer science collaborators has been described as a trading zone [2], emphasizing the intellectual interest to both parties that goes well beyond the traditional reliance on specification of the problem [17]. This kind of interaction requires a common trade language [18], one powerful enough to permit at least a provisional solution to the computational problem yet accessible enough for a neophyte to explore the problem's possibilities. BioBIKE is an attempt to provide such a language.

2 Overview of BioBIKE

BioBIKE (Biological Integrated Knowledge Environment) combines the data used by a research community with a language specialized for molecular biology and genome analysis, made available to the user through either of two interfaces – one graphical, the other text-based. BioBIKE decreases the minimal effort required for competence by relieving researchers of the onerous tasks of finding data and converting the results of one operation into the format required for the next and by helping them find tools or devise new tools to bring the data to bear on biological problems.

There are currently three instances of BioBIKE, each based on a different database serving a different research community: CyanoBIKE (for those who study cyanobacteria), PhAnToMe/BioBIKE (for those who study bacteria and their phages), and GunneraBIKE (for those who study the flowering plant Gunnera).

All instances are freely accessible through a common web portal (http://biobike.csbc.vcu.edu). Users who wish to add knowledge to the database (e.g. correct the annotation of genes) must register, but otherwise there are no restrictions on use. At present, the instances are accessible only through Firefox.

Further details may be gleaned from earlier reports on BioBIKE [15-16].

2.1 Databases underlying BioBIKE instances

Each instance draws on a large set of genomes of interest to a particular research community. At the time of writing, PhAnToMe/BioBIKE contains 752 bacterial genomes and 750 phage genomes from the SEED database [19]. Information about each organism, gene, and protein is kept in individual frames, flexible data structures that can be dynamically redefined [15]. The type of information contained in the frame of a gene varies from instance to instance but will include the genome coordinates and annotation and may also include information such as its GO ID, COG ID, splice sites, and transmembrane regions. Users are able to upload and bring into BioBIKE any genome that is in GenBank format.
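As a rough illustration of the frame idea, the following Python sketch assumes that a frame behaves like a named collection of slots that can gain new slots at run time. The class name `Frame`, the slot names, and the gene values are all hypothetical; this is not BioBIKE's implementation.

```python
class Frame:
    """Toy stand-in for a frame: a named object whose slots can change at any time."""

    def __init__(self, name, **slots):
        self.name = name
        self.slots = dict(slots)

    def get(self, slot, default=None):
        # A missing slot yields a default rather than an error, since the
        # set of slots may differ from one frame (or instance) to another.
        return self.slots.get(slot, default)

    def set(self, slot, value):
        # Setting a slot that did not exist before extends the frame.
        self.slots[slot] = value


# Hypothetical gene frame holding coordinates and an annotation.
gene = Frame("gene-0001", start=100, end=1200, annotation="hypothetical protein")

# Later, a new kind of information is attached to the same frame.
gene.set("cog_id", "COG0001")

gene_length = gene.get("end") - gene.get("start") + 1
```

The point of the sketch is only that a gene's record need not have a fixed schema: the coordinates are always present, while other slots appear when the information is available.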

Instances also possess metabolic knowledge, taken from GO [20], KEGG [21], and BioCyc [22]. In addition, BioBIKE calls out to KEGG to obtain organism-specific metabolic maps as needed. All of this makes it possible to define a set of genes in an organism related to a specific metabolic pathway and use the set in subsequent operations.

BioBIKE maintains an internal table of protein similarities from precomputed Blast [23] results. These tables are used to determine orthologous relationships amongst proteins, thereby making certain operations surprisingly fast. For example, Fig. 1 shows the use of the precomputed orthologous relationships to determine an interesting set of proteins, all within a second.

Fig. 1. Use of precomputed orthologous relationships to find all proteins with orthologs present in all N2-fixing cyanobacteria but not in cyanobacteria that don't fix N2. (A) The set of cyanobacteria that do not fix N2 is defined based on a built-in set of N2-fixing cyanobacteria. (B) The two sets are used to find the desired set of proteins specific to N2-fixing cyanobacteria (see also Fig. 6).
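The logic of the query in Fig. 1 can be sketched with ordinary set operations once the orthology table already exists. The Python below is an illustration only: the function name `common_orthologs`, the table layout (ortholog group mapped to the organisms that contain a member), and the toy data are assumptions, not BioBIKE internals.

```python
def common_orthologs(ortholog_table, required, excluded):
    """Return groups found in every required organism and in no excluded one."""
    required = set(required)
    excluded = set(excluded)
    return sorted(
        group
        for group, organisms in ortholog_table.items()
        if required <= organisms and not (excluded & organisms)
    )


# Hypothetical precomputed table: ortholog group -> organisms containing it.
table = {
    "groupA": {"fixer1", "fixer2", "nonfixer1"},
    "groupB": {"fixer1", "fixer2"},
    "groupC": {"fixer1"},
}

specific = common_orthologs(table, ["fixer1", "fixer2"], ["nonfixer1"])
```

Because the expensive similarity search was done ahead of time, the query reduces to a single pass over the table, which is consistent with the speed reported above.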

If a research community has microarray data available, then that can be uploaded into BioBIKE and made available. Fig. 2 shows a function in the midst of being constructed that exploits microarray data from the cyanobacterium Anabaena PCC 7120 (nicknamed A7120).

2.2 Programming language

Figs. 1 and 2 show representations of BBL (BioBIKE Language), an extension of Lisp. The boxes can be translated directly to Lisp simply by replacing the boundaries of the yellow and orange boxes with parentheses. For example, the bottom line of Fig. 1 translates to the BBL macro:

(COMMON-ORTHOLOGS-OF *n-fixing-cyanobacteria* :NOT-IN non-n-fixers)
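The box-to-parenthesis correspondence can be mimicked in a few lines. In this Python sketch a function box is modeled as a list whose items are either names or further boxes; that representation and the helper name `boxes_to_lisp` are assumptions made for illustration, not a description of how BioBIKE stores boxes.

```python
def boxes_to_lisp(box):
    """Render a nested box (a list) as a parenthesized form; other items as-is."""
    if isinstance(box, list):
        return "(" + " ".join(boxes_to_lisp(item) for item in box) + ")"
    return str(box)


# The bottom line of Fig. 1, with the function box modeled as a list.
fig1_box = ["COMMON-ORTHOLOGS-OF", "*n-fixing-cyanobacteria*",
            ":NOT-IN", "non-n-fixers"]

lisp_text = boxes_to_lisp(fig1_box)

# A box nested inside another box becomes a nested form (placeholder names).
nested_text = boxes_to_lisp(["F", ["G", "x"]])
```

Each level of box nesting becomes one level of parentheses, which is why the translation from the graphical form to text is mechanical.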

Most users interact exclusively within the graphical interface; however, computationally sophisticated users may feel more comfortable using the text interface, called the weblistener [15], particularly when working with complicated programs. Code from the graphical interface can be translated into text form with a single click. However, at present there is no automated way to translate code from text to graphical form.

Fig. 2. Computation based on microarray data. Hierarchical menus facilitate access to microarray data (left). Clicking the desired data set brings it into the open argument box (below). The function returns the expression ratio for those genes of Anabaena PCC 7120 (A7120) related to carbohydrate metabolism.

Fig. 3. Anatomy of a function box. (A) A completed function that looks for telephone numbers within text (three regions of numbers, the first of indeterminate length, separated by hyphens). The user is selecting options from a (truncated) option menu (one option is to use conventional Regex). (B) Function as it appears before any user editing. The Action Menu is displayed.

BBL has access to all of Lisp’s functionality (though not all of it is easily accessible through the graphical interface). In this sense, it is a general-purpose programming language. However, its extensions are heavily weighted towards those that facilitate operations of interest to those doing genomic analysis, and in that sense it is a domain-specific language.

The feel of the BBL graphical interface is markedly different from those inspired by wiring diagrams, such as LabView [24], as well as from the interfaces of other available genome-related services, such as Taverna [25] and Galaxy [26]. The latter applications use a pipeline as the dominant metaphor, connecting one tool with another. It is certainly possible to create reusable connections between tools in BioBIKE, but its dominant metaphor is different: boxes that fit into holes to form increasingly complex structures. In this, BioBIKE bears a family resemblance to Scratch [27] and its progeny, Snap! [28], both aimed at student audiences. This is no accident, as BioBIKE can trace its ancestry back to the MIT Media Lab (home of Scratch) through Mike Travers, an early participant in the BioBIKE project.

The box represents a program element, either a function (yellow or orange) or data (gray), that fits into holes (white). A function box consists of its name, followed by argument boxes (labeled to indicate what is needed), possibly with added words to improve readability (Fig. 3). Most functions also offer options that modify the default behavior of the function and control the format of the output (Fig. 3A). Every function and data box is associated with an Action Menu that allows the user to manipulate the box (Fig. 3B). Boxes may be moved around the workspace, either through copy/paste operations or directly by drag-and-drop. These tools make it easy to form complex functions from simple parts. Results from evaluating a function or data box appear in a box in the Result Pane, and these boxes also can be copied or moved to form components of new functions.

2.3 Tools

Like several other bioinformatics platforms (e.g. Galaxy [26]), BioBIKE offers access through a common interface to a collection of widely used tools, including those for sequence searches [23], alignment [29], tree construction [30], and motif discovery [31]. However, BioBIKE is unusual in that the results of all these tools are returned in a form that allows users to use them for subsequent computations of any sort. For example, one could readily search the output of Blast (with no need for parsing), looking for sequence matches from organisms of a certain phylogenetic class.
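To make the benefit of structured results concrete, the Python sketch below assumes that each match arrives as a record with named fields rather than as lines of a text report. The `Hit` record, the `PHYLO_CLASS` lookup, the E-value cutoff, and all data values are hypothetical toy examples, not output from BioBIKE or Blast.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One sequence match, held as structured data rather than report text."""
    query: str
    target: str
    organism: str
    e_value: float


# Hypothetical organism classification and search results (toy data).
PHYLO_CLASS = {
    "organism-1": "cyanobacteria",
    "organism-2": "proteobacteria",
    "organism-3": "cyanobacteria",
}

hits = [
    Hit("query-1", "protein-a", "organism-1", 1e-40),
    Hit("query-1", "protein-b", "organism-2", 1e-35),
    Hit("query-1", "protein-c", "organism-3", 0.5),
]


def hits_in_class(hits, phylo_class, max_e_value=1e-10):
    """Keep hits from organisms of one class whose E-value is small enough."""
    return [
        hit for hit in hits
        if PHYLO_CLASS.get(hit.organism) == phylo_class
        and hit.e_value <= max_e_value
    ]


cyano_targets = [hit.target for hit in hits_in_class(hits, "cyanobacteria")]
```

With results in this form, the follow-up question is a one-line filter; no report parsing stands between the search and the next computation.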

Users can also build their own tools, either by bundling together a sequence of boxes that perform a useful purpose or by defining a function for future use or to share with others. It is not difficult to define one's own functions (Fig. 4), including recursive functions, with the option of exploiting BioBIKE's native type-checking facility and automatic mapping of lists (see Section 3.2). Functions so defined appear on the menu palette (Fig. 5) in the same way as built-in BioBIKE functions. Variables defined by the user are also available on a menu, reducing the likelihood of errors that arise from misspelling.

Fig. 4. Definition of a recursive function. The type specification is optional; when present, it causes an informative error message to appear if the user supplies an inappropriate value as the argument.

Fig. 5. The menu palette. The user selects functions from menus dropping down from the green buttons or from the instance- and user-dependent blue buttons. This screenshot shows a menu containing the user-defined function constructed in Fig. 4.

3 Features of the BioBIKE language

3.1 Intelligibility and tangibility

BBL embraces the natural language approach [32], in which utterances as much as possible follow the speech patterns of the target group of users. The goal of the language is that someone comfortable with the terms and concepts of molecular biology should be able to look at BBL code without prior training and make sense out of it (of course creating that code would be a good deal more difficult).

Similarly, the language attempts to present data-objects and results in as natural a fashion as possible. Many functions return results in a form that is well suited for subsequent computability but at the same time display pop-up windows in which the same information is shown in a format designed for human consumption. Fig. 6 illustrates this behavior.

BBL has gradually moved away from the sometimes confusing syntax inherited from Lisp, but some remains – note the operator-first syntax of the logical comparison in Fig. 4. This problem is being addressed.

Abstractions pose significant obstacles for those new to programming [33], and the natures of variables and compound data-objects are often difficult to grasp. BioBIKE allows users to bring into the workspace data-boxes that suggest the metaphor of a window into computer memory. A data-box associated with a variable displays the continuously updated value of that variable, enabling the user to assess its dependency on certain operations. Users may also view compound data types – lists, frames, and hashes (called tables in BioBIKE) – in a spreadsheet-like environment that helps make the object more tangible and also facilitates editing of their elements.

3.2 Iteration

The concept of iteration is another major source of anxiety for those new to programming. BBL’s primary response to this problem is to hide iteration as much as possible, incorporating it invisibly into functions. Accordingly, functions automatically map themselves over lists given as arguments, when it makes sense to do so (see Fig. 6 for an example). In more complicated cases, the user can demonstrate the desired action for a single item and then indicate the list to which the action is to be applied. This capability is illustrated in Fig. 7.
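One way to picture automatic mapping is a wrapper that lets a function written for a single item also accept a list. The Python sketch below is an analogy under that assumption; the decorator `maps_over_lists` and the function `count_of` are hypothetical names, and BBL's own mechanism may differ.

```python
import functools


def maps_over_lists(func):
    """Let a one-item function also accept a list, applying itself to each element."""
    @functools.wraps(func)
    def wrapper(arg, *rest, **options):
        if isinstance(arg, list):
            # The loop is hidden here, so the caller never writes one.
            return [wrapper(item, *rest, **options) for item in arg]
        return func(arg, *rest, **options)
    return wrapper


@maps_over_lists
def count_of(sequence, residue="P"):
    """Count occurrences of one residue in a single sequence string."""
    return sequence.count(residue)


single = count_of("MPPKP")
many = count_of(["MPPKP", "MKV", "PP"])
```

The caller uses the same function name whether there is one protein or many, which is the sense in which iteration is hidden rather than removed.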

However, sometimes loops can’t be avoided, and BBL provides a graphical representation of Lisp’s powerful LOOP macro, which combines various kinds of loops into a single form. Fig. 8 shows a simple loop that produces a quick plot of hydrophobicity along the length of a protein.
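The sliding-window calculation behind such a plot can be sketched in Python as follows. The source does not say which hydrophobicity scale or window size BioBIKE uses; the Kyte-Doolittle values and the window of five residues here are assumptions chosen for illustration, as is the function name.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not confirmed by the source).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=5):
    """Average hydrophobicity of each window of residues along the protein."""
    averages = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        total = sum(KYTE_DOOLITTLE[aa] for aa in segment)
        averages.append(total / window)
    return averages


# Toy sequence: a hydrophobic stretch followed by a charged stretch.
profile = hydrophobicity_profile("AAAAAKKKKK", window=5)
```

The resulting list of averages is what would be handed to a plotting routine; the sketch stops short of drawing the plot.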

Fig. 6. Comparison of result and display from a function. (A) A function that returns the description of a list of genes was executed immediately after the function shown in Fig. 1. The asterisk is shorthand for "previous result". (B) The resulting popup display, including links to the genes. (C) The result, in a form suitable for further computation.

3.3 Terseness, visibility, and viscosity

Green and Petre [33] identified several lenses – cognitive dimensions – through which one might evaluate visual programming environments. Here we consider three that relate to the appearance of code on the screen. Visual environments in general are at risk of overloading the screen. This problem is mitigated somewhat by BBL’s sophisticated functions that accomplish a great deal in the space of a single box (e.g. Fig. 1). However, there is no question that functions longer than several lines consume a great deal of screen space and can be difficult to manage. To address this problem, BBL allows any box and its contents to be labeled and collapsed to a fraction of its size. For example, the loop shown in Fig. 8 could be collapsed to a box containing just “Calculates hydrophobicity” and inserted in that form into another function. Any group of boxes can be bundled together in the order they are to be executed, and once bundled, they too can be collapsed.

Fig. 7. Implicit mapping of function over a list. (A) A function that creates a list consisting of the name, length, and number of prolines in a specific protein is dragged into APPLY-FUNCTION (shown by the arrows) to map it over the list of all proteins in Anabaena. The arrows are not part of the language. (B) Results of the mapping.

Fig. 8. Simple loop to create a hydrophobicity plot. (A) After two parameters are defined, a loop considers a window sliding along the protein. For each window, the hydrophobicities of the component amino acids are averaged, and the averages are collected. The resulting list is plotted. (B) The plot of the hydrophobicities of the protein.

Error message displayed in Fig. 9:

*** PROBLEM: When you call T-TEST with the PAIRED flag, you must pass two lists of equal length. Of the two lists you passed, the first has 47 elements, while the second has 29 elements.
*** ADVICE: Perhaps the PAIRED flag is not appropriate.

Several other features of the graphical interface facilitate editing. Boxes may be dragged to the proximity of any other box or inside any empty hole. Functions and variables, once defined, can be deleted from the workspace, freeing up space. They remain in memory and usable by other program units. Finally, it is possible to edit at any level within nested boxes, either removing or adding a level, with no need to disrupt other nesting levels.

These features, in principle, make it possible to create code of arbitrary complexity. In reality, however, it is sufficiently tedious working with large functions that most would prefer porting the code to the weblistener (the text interface).

3.4 Error-proneness and progressive evaluation

Just as space efficiency is a common Achilles heel of visual programming languages, a common strength is the reduction in syntactic burden. Users of BBL will seldom suffer from mismatched parentheses and will never forget terminating semicolons. Spelling typos are rare, because it is possible to pull down function and variable names from menus. Functions display their syntax. Users don't have to struggle to recall how many arguments a function takes and in what order (see Figs. 2 and 7).

But errors do occur, and when they do, error messages are usually generated by BBL rather than by Lisp. BBL messages try to offer specific advice when possible (Fig. 9).
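The message in Fig. 9 pairs a statement of the problem with a suggestion for fixing it. A check of that kind might look like the Python sketch below; the function name `check_paired_lists` is hypothetical, and the wording simply mirrors the sample message rather than reproducing BBL's actual code.

```python
def check_paired_lists(first, second):
    """Return a problem/advice message if the lists cannot be paired, else None."""
    if len(first) == len(second):
        return None
    return (
        "*** PROBLEM: When you call T-TEST with the PAIRED flag, "
        "you must pass two lists of equal length. "
        f"Of the two lists you passed, the first has {len(first)} elements, "
        f"while the second has {len(second)} elements. "
        "*** ADVICE: Perhaps the PAIRED flag is not appropriate."
    )


# Placeholder values; only the list lengths matter for this check.
message = check_paired_lists([0.0] * 47, [0.0] * 29)
```

Reporting the actual lengths, and naming the option most likely to be at fault, is what distinguishes this style from a generic length-mismatch error.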

Some aid is available for debugging. Most users, those who confine themselves to relatively simple functions, find the greatest help in the ability to evaluate any box within a compound function. For example, if the RATIOS-OF function in Fig. 2 gives an unanticipated result, evaluating the internal GENES-IN-PATHWAY/S function will tell you if it is returning the expected list of genes. If not, then it is possible to evaluate the still more internal arguments to the function to see if their values are as expected.

Fig. 9. A sample error message elicited by calling T-TEST with lists of unequal length.

Every function and variable may be specifically monitored, as controlled through the box's Action Menu (see the menu dropped from the green wedge of MATCHES-OF-PATTERN in Fig. 3B). When monitoring is activated, evaluation of the box causes the name of the box and its value to be displayed in a popup window. This is particularly helpful in debugging complicated loops and functions.
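Monitoring of this kind resembles wrapping a function so that each evaluation reports its name and value. In the Python sketch below, a list stands in for the popup window; the decorator `monitored`, the two functions, and the pathway data are hypothetical and serve only to show inner boxes being reported before outer ones.

```python
import functools

monitor_log = []  # stands in for the popup window


def monitored(func):
    """Record the name and value of a function each time it is evaluated."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        value = func(*args, **kwargs)
        monitor_log.append((func.__name__, value))
        return value
    return wrapper


@monitored
def genes_in_pathway(pathway):
    # Hypothetical lookup table standing in for pathway data.
    pathways = {"carbohydrate": ["geneA", "geneB"]}
    return pathways.get(pathway, [])


@monitored
def count_genes(genes):
    return len(genes)


result = count_genes(genes_in_pathway("carbohydrate"))
```

Reading the log from top to bottom shows the value of the innermost expression first, which is the same information a user gets by evaluating inner boxes one at a time.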

3.5 Help

Online help is available for most BBL functions through the Help item on a function's Action Menu (Fig. 3B) and the Help icon that is next to the name of the function whenever it appears in a menu on the Menu Palette (Fig. 5). These screens generally provide graphical examples that illustrate the function's scope and limitations. There are also help pages organized by biological problem. For those who want more extensive advice, there are tours that take the user click-by-click through sessions focusing on different capabilities. These are available directly from the BioBIKE portal.

All of these sources of help are indexed and available through the Search box that appears in the Workspace below the Menu Palette (Fig. 5). The search facility uses a natural language search to return a clickable list of help screens and functions. Clicking on a function conveniently brings it into the workspace.

All user sessions are automatically logged, and the logs are available to users as a reminder of what was done and how.

3.6 Collaboration

Functions and variables defined by a user are not visible outside of the user's workspace unless they are explicitly shared. The user may choose to share one object or multiple related objects. Indeed, an entire workspace and accompanying environment may be saved in such a way that others can restore it to their own workspace. Real-time collaboration is also possible. Two or more users may collaborate within the same session, viewing the same workspace, though only one has control of the mouse at any one moment.

4 Concluding Remarks

There is an overwhelming incentive for biological researchers to gain some degree of mastery over the one tool – the computer – that can help them sift through the data that is becoming their primary source of insights. However, there has always been a strong incentive to form an alliance with the computer, and for the most part, biological researchers have not been moved. There's no obvious reason that incentive alone will move the next generation any more than the last.

Instead, we need a different strategy, one that begins by acknowledging (a) that most of those intent on a career in biology will not tolerate the slow, sustained course required to become practicing computer scientists and (b) that this is not a necessary end point in any case. In the long run, we will be better served if the majority learn how to write serviceable prose than if a small minority learn how to write poetry. Researchers need enough literacy to do their work and to communicate with the computer poets who can take their efforts to a higher plane.

BioBIKE – still a work in progress – will not spark a revolution, but we believe that the lessons learned in its development may inform a strategy that can do the job. An effective environment should yield immediate benefits, and that means a graphical, domain-specific language that offers a clear route to practical success. It must have outstanding online help that users can search for appropriate examples of successful approaches. It must provide strong support for collaborative efforts.

In passing, we note that what may draw in reluctant researchers are the same characteristics that may be effective in engaging undergraduates. One of us has used BioBIKE to enable sophomores without programming experience to complete bona fide bioinformatics research projects within the context of a one-semester course. To draw students into research, there is no more powerful tool than research itself.

Acknowledgments. This work was funded by National Science Foundation grant DBI-0850146 to J.E. and supported by software grants from Franz, Inc., and LispWorks, Inc. We thank Jeff Shrager, Arnaud Taton, and Mike Travers for continued stimulating discussions; Joe Anderson, Victor Clarke, and Tara Nulton for the construction of help screens; and a long list of past developers of BioBIKE.

References

1. Elhai, J.: Humans, Computers, and the Route to Biological Insights: Regaining Our Capacity for Surprise. J. Comput. Biol. 18, 867-878 (2011)

2. Shrager, J.: From Wizards to Trading Zones: Crossing the Chasm of Computers in Scientific Collaboration. In: Gorman, M.E. (ed.) Trading Zones and interactional Expertise. pp. 107-124. MIT Press, Cambridge MA (2010)

3. Emmott, S., Rison, S.: Towards 2020 Science. Microsoft Research (2006)

4. Wing, J.M.: Computational Thinking. Commun. ACM 49(3), 33-35 (2006)

5. Pevzner, P., Shamir, R.: Computing Has Changed Biology – Biology Education Must Catch Up. Science 325, 541-542 (2009)

6. Ledley, R.S.: Digital Electronic Computers in Biomedical Science. Science 130, 1225-1234 (1959)

7. Kemeny, J.G.: The Case for Computer Literacy. Daedalus 112, 211-230 (1983)

8. Fawcett, T., Higginson, A.: Heavy Use of Equations Impedes Communication Among Biologists. Proc. Natl. Acad. Sci. USA 109, 11735-11739 (2012)

9. Lockhart, P.: A Mathematician's Lament. www.maa.org/devlin/lockhartslament.pdf (2002)

10. Resnick, M.: Technologies for Lifelong Kindergarten. Educ. Technol. Res. Devel. 46, 43-55 (1998)

11. Blackwell, A.F.: First Steps in Programming: A Rationale for Attention Investment Models. In: Proceedings IEEE 2002 Symposia on Human Centric Computing Languages and Environments. pp. 2-10. IEEE Press (2002)

12. Segal, J., Morris, C.: Developing Scientific Software. IEEE Software 25, 18-20 (2008)

13. Nardi, B.A.: A Small Matter of Programming: Perspectives on End User Computing. MIT Press, Cambridge MA (1993)

14. Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., Rosson, M.B., Rothermel, G., Shaw, M., Wiedenbeck, S.: The State of the Art in End-User Software Engineering. ACM Comput. Surv. 43, 3, Article 21 (2011)

15. Massar, J.P., Travers, M., Elhai, J., Shrager, J.: BioLingua: A Programmable Knowledge Environment for Biologists. Bioinformatics, 21, 199–207 (2005)

16. Elhai, J., Taton, A., Massar, J.P., Myers, J.K., Travers, M., Casey, J., Slupesky, M., Shrager, J.: BioBIKE: a web-based, programmable, integrated biological knowledge base. Nucleic Acids Res. 37, W28-W32 (2009)

17. Segal J.: When Software Engineers Met Research Scientists: A Case Study. Empirical Software Eng. 10, 517–536 (2005)

18. Galison, P.: Trading with the Enemy. In: Gorman, M.E. (ed.) Trading Zones and interactional Expertise. pp. 25-52. MIT Press, Cambridge MA (2010)

19. Overbeek, R., Disz, T., Stevens, R.: The SEED: A Peer-To-Peer Environment for Genome Annotation. Commun. ACM 47, 46-51 (2004)

20. Gene Ontology Consortium: Gene Ontology Annotations and Resources. Nucleic Acids Res. 41, D530-D535 (2013)

21. Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M. Hirakawa, M.: From Genomics to Chemical Genomics: New Developments in KEGG. Nucleic Acids Res. 34, D354–D357 (2006)

22. Latendresse, M., Paley, S., Karp, P.D.: Browsing Metabolic and Regulatory Networks with BioCyc. In: van Helden, J. et al. (eds.) Bacterial Molecular Networks: Methods and Protocols. Methods Molec. Biol. 804, 197-216 (2012)

23. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J.: Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 25, 3389–3402 (1997)

24. Labview, National Instruments. (http://www.ni.com/labview/)

25. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M.R., Li, P., Oinn, T.: Taverna: A Tool for Building and Running Workflows of Services. Nucleic Acids Res. 34, W729–W732 (2006)

26. Goecks, J., Nekrutenko, A., Taylor, J., Galaxy Team: Galaxy: A Comprehensive Approach for Supporting Accessible, Reproducible, and Transparent Computational Research in the Life Sciences. Genome Biol. 11, R86 (2010)

27. Maloney, J., Resnick, M., Rusk, N., Silverman, B., Eastmond, E.: The Scratch Programming Language and Environment. ACM T. Comput. Educ. 10, article 16 (2010)

28. Mönig, J., Harvey, B.: BYOB 3.1 — Build Your Own Blocks (a/k/a SNAP!) (http://snap.berkeley.edu/)

29. Thompson, J.D., Higgins, D.G. Gibson, T.J.: CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment Through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Res., 22, 4673–4680 (1994)

30. Felsenstein, J.: PHYLIP (Phylogeny Inference Package) Version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle (2005)

31. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., Noble, W.S.: MEME Suite: Tools for Motif Discovery and Searching. Nucleic Acids Res. 37, W202-W208 (2009)

32. Myers, B.A., Pane, J.F., Ko, A.: Natural Programming Languages and Environments. Commun. ACM 47, 47-52 (2004)

33. Green, T.R.G. and Petre, M.: Usability Analysis of Visual Programming Environments: A ‘Cognitive Dimensions’ Framework. J. Visual. Lang. Comput., 7, 131–174 (1996)
