
Biothings presentation

Date post: 12-Feb-2017
Upload: cyrus-afrasiabi
Page 1: Biothings presentation

Biothings.api

https://github.com/SuLab/biothings.api

Generalizing MyGene and MyVariant

Page 2: Biothings presentation

Motivation

• Isolate the common aspects of MyGene and MyVariant codebases and make them available in a separate framework: biothings.api

• Allows easier development of additional biothings APIs (Disease, Drug/Chemical, GO, Species… -> JSON, aggregate on a single field)

• Allows easier maintenance and development of current biothings (gene, variant).

Page 3: Biothings presentation

System Overview

• The tornado HTTP server consists of handlers that contain the code to run when a particular URL pattern is matched, e.g. /variant/, or /metadata

• The biothings codebase essentially connects the appropriate Tornado HTTP request handler for a request to the elasticsearch query that executes that request
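The URL-pattern-to-handler idea can be sketched without tornado installed. Tornado itself takes a list of (pattern, handler class) pairs when an Application is built; the stand-in below mimics that table with plain functions and the `re` module. The handler names, the return values, and the `dispatch` helper are all assumptions made for illustration, not biothings code.

```python
import re

# Hypothetical handlers: each returns the body that would be sent back.
def variant_handler(variant_id):
    return {"endpoint": "annotation", "id": variant_id}

def metadata_handler():
    return {"endpoint": "metadata"}

# (URL pattern, handler) pairs, analogous to a tornado routing table.
ROUTES = [
    (r"/variant/(.+)", variant_handler),
    (r"/metadata/?", metadata_handler),
]

def dispatch(path):
    """Call the first handler whose pattern matches the whole path."""
    for pattern, handler in ROUTES:
        match = re.fullmatch(pattern, path)
        if match:
            # Captured groups become the handler's positional arguments.
            return handler(*match.groups())
    return {"error": "not found", "path": path}
```

In a real project each handler would go on to build and run an elasticsearch query rather than return a literal dictionary.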

Page 4: Biothings presentation

Biothings – HTTP Handling

• tornado.web.RequestHandler: base tornado class for HTTP request handling. Important class methods: get/post, get_arguments, write

• biothings.www.helper.BaseHandler: contains methods common to all biothings RequestHandlers. Important class methods: get_query_params, return_json

• biothings.www.api.handlers.QueryHandler: contains methods to implement the biothings query endpoint. Important class methods: get, post, _examine_kwargs

• biothings.www.api.handlers.BiothingHandler: contains methods to implement the biothings annotation endpoint. Important class methods: get, post, _examine_kwargs

• biothings.www.api.handlers.MetaDataHandler: contains methods to implement the metadata endpoint

• biothings.www.api.handlers.StatusHandler: contains methods to implement a status endpoint for AWS ELB

Page 5: Biothings presentation

Biothings – HTTP Handling

• biothings.www.api.handlers.BiothingHandler:

– GET request (e.g. /variant/chr6:g.152708291G>A)

– POST request (e.g. /variant/)
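A variant ID such as chr6:g.152708291G>A contains characters (":" and ">") that a client would usually percent-encode before placing them in a URL path. A minimal sketch of building the GET path, assuming a hypothetical helper named `annotation_path`; only the /variant/ prefix and the example ID come from the slide:

```python
from urllib.parse import quote

def annotation_path(biothing_id, endpoint="variant"):
    """Build the GET path for one biothing ID (hypothetical helper)."""
    # safe="" percent-encodes every reserved character, including ":" and ">".
    return "/{}/{}".format(endpoint, quote(biothing_id, safe=""))
```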

Page 6: Biothings presentation

Biothings – HTTP Handling

• biothings.www.api.handlers.QueryHandler:

– GET request (e.g. /query?q=_exists_:dbsnp)

– POST request (e.g. /query/)
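The GET example passes the search string in the q parameter; `_exists_:dbsnp` is Elasticsearch query-string syntax for "documents where the dbsnp field is present". A minimal sketch of assembling such a request path, assuming a hypothetical helper named `query_path` (any keyword arguments beyond q are placeholders, not documented biothings parameters):

```python
from urllib.parse import urlencode

def query_path(q, **params):
    """Build the GET path for the query endpoint (hypothetical helper)."""
    # urlencode percent-encodes reserved characters such as ":" in each value.
    return "/query?" + urlencode(dict(q=q, **params))
```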

Page 7: Biothings presentation

Biothings – Elasticsearch query

• biothings.www.api.es.ESQuery – contains the python code for constructing the elasticsearch query and formatting the resulting data

– query(q, **kwargs) – Contains the elasticsearch query to run with data obtained from a GET or POST to the /query/ endpoint.

– get_biothing(bid, **kwargs) – Contains the elasticsearch query to run with data obtained from a GET to the /annotation/ endpoint.

– mget_biothings(bid_list, **kwargs) – Contains the elasticsearch query to run with data obtained from a POST to the /annotation/ endpoint.

– _cleaned_res(res) – Contains the code to format the return object for get_biothing and mget_biothings.

– _cleaned_res2(res) – Contains the code to format the return object for query.

– _get_biothingdoc(hit) – Contains the code to format a single biothing object from any elasticsearch query. Called by _cleaned_res and _cleaned_res2.

– _modify_biothingdoc(doc) – Contains the code to modify a biothing_doc. Called in _get_biothingdoc. Currently empty -> for subclassing.
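The formatting chain described above (_cleaned_res2 calls _get_biothingdoc, which calls the _modify_biothingdoc hook) can be sketched as follows. Only the method names come from the slide; the method bodies, the class names `ESQuerySketch` and `VariantQuery`, and the added `biothing_type` field are assumptions. The input is a plain dictionary shaped like an Elasticsearch search response, so no Elasticsearch client is needed to run it.

```python
class ESQuerySketch:
    """Stand-in for the formatting half of ESQuery (bodies are assumed)."""

    def _modify_biothingdoc(self, doc):
        # Empty hook: subclasses override this to adjust each document.
        return doc

    def _get_biothingdoc(self, hit):
        # Flatten one hit: copy the stored source and keep its ID and score.
        doc = dict(hit.get("_source", {}))
        doc["_id"] = hit["_id"]
        if hit.get("_score") is not None:
            doc["_score"] = hit["_score"]
        return self._modify_biothingdoc(doc)

    def _cleaned_res2(self, res):
        # Format the response of a search (the /query/ endpoint).
        hits = res["hits"]
        return {
            "total": hits["total"],
            "hits": [self._get_biothingdoc(h) for h in hits["hits"]],
        }


class VariantQuery(ESQuerySketch):
    def _modify_biothingdoc(self, doc):
        # Hypothetical project-specific tweak applied to every document.
        doc["biothing_type"] = "variant"
        return doc
```

Because every formatter funnels through _get_biothingdoc, a project only has to override the one hook to change documents returned by any endpoint.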

Page 8: Biothings presentation

Biothings - Settings

• Problem: Until now, we have left out the problem of how to refer to things that MUST be project specific (e.g., the name of the elasticsearch index to search, the type of the document, etc). How do we do this?

• Solution: We make a settings module in biothings that all code within biothings refers to. That module looks for an environment variable called BIOTHING_SETTINGS with the name of a module that can be imported to set project specific variables.

– export BIOTHING_SETTINGS='biothings.config'

• Similar to Django's DJANGO_SETTINGS_MODULE mechanism.
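A minimal sketch of such a settings lookup, using only the standard library. The environment variable name is the one given on the slide; the function name `load_settings`, its `environ` parameter, and the error handling are assumptions rather than the actual biothings implementation.

```python
import importlib
import os

ENV_VAR = "BIOTHING_SETTINGS"  # variable name as given on the slide

def load_settings(environ=os.environ):
    """Import and return the module named by the settings variable."""
    module_name = environ.get(ENV_VAR)
    if not module_name:
        raise RuntimeError("{} is not set".format(ENV_VAR))
    # The named module must be importable, i.e. reachable on sys.path.
    return importlib.import_module(module_name)
```

Framework code would then read project-specific values as attributes of the returned module, so switching projects only means pointing the variable at a different module.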

Page 9: Biothings presentation

Biothings - Settings

Page 10: Biothings presentation

Biothings – Project template

• At this point, we have the tools necessary to easily create and subclass 4 types of biothings handlers (BiothingHandler, QueryHandler, MetaDataHandler, StatusHandler), and the elasticsearch query class (ESQuery)

• Could definitely stop here and have a useful tool, but we wanted to make it even easier to create a new project (also enforces a uniform project structure across all biothings APIs).

• To do this we have a project template folder containing the project directory structure and some skeleton code:

– config.py

– URL patterns to Handlers connection

– Handlers to ESQuery connection

Page 11: Biothings presentation

Biothings - Project template

• To create the actual project directory from the template, we wrote a small script: start-project.py

– Usage: python start-project.py <path-to-project-directory> <biothing-object-name>

– python start-project.py ~ variant

• Any folder or file in the template directory will be created in the project directory. The contents of each file are passed through Python's string.Template substitution before the file is written to the project directory.
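The copy-and-substitute step can be sketched with `os.walk` and `string.Template`. This is an illustration under stated assumptions, not the contents of start-project.py: the function name `start_project`, the placeholder name `$biothing_object`, and the use of `safe_substitute` (which leaves unknown `$` placeholders untouched instead of raising) are all choices made here.

```python
import os
from string import Template

def start_project(template_dir, project_dir, **context):
    """Recreate template_dir under project_dir, filling $placeholders."""
    for root, _dirs, files in os.walk(template_dir):
        # Mirror the current template folder inside the project directory.
        rel = os.path.relpath(root, template_dir)
        target_root = os.path.normpath(os.path.join(project_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            with open(os.path.join(root, name)) as src:
                rendered = Template(src.read()).safe_substitute(context)
            with open(os.path.join(target_root, name), "w") as dst:
                dst.write(rendered)
```

A call such as `start_project("template", "myproject", biothing_object="variant")` would replace every `$biothing_object` in the skeleton files with `variant`.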

Page 12: Biothings presentation

Biothings – Project template

www.api.handlers

Page 13: Biothings presentation

Biothings – MyVariant Project

www.api.handlers

Page 14: Biothings presentation

Biothings – MyVariant Project

www.api.handlers Part 1

Page 15: Biothings presentation

Biothings – MyVariant Project

www.api.handlers Part 2

Page 16: Biothings presentation

Biothings – MyVariant Project

www.api.es Part 1

Page 17: Biothings presentation

Biothings – MyVariant Project

www.api.es Part 2

Page 18: Biothings presentation

Recreating MyVariant.info using biothings.api

• Recreated the current MyVariant.info service using the biothings.api framework

– Very little extra code required (~100 lines)

– Took less than a day to create the web front end from scratch.

– https://github.com/cyrus0824/myvariant.info_new

• It seems disingenuous to gauge the utility of a tool by recreating a codebase if that tool was itself created from the codebase => We should try implementing other APIs, especially MyGene.info (which has more varied gene-specific query options), and modify biothings as needed.

Page 19: Biothings presentation

Future work

• Integrate data load and data index functions into biothings

• Documentation! – Projects like this need very good documentation to be of any use to an API developer (on the level of tornado’s excellent documentation: http://www.tornadoweb.org/en/stable/web.html)

• Auto-generate clients (python client, R client)

• Auto-generate ansible-playbook to create cluster hardware on AWS

• One-click API…

