Page 1: SMRT Link Services · 2019-04-02 · SMRT Link Services, Release 5.1.0 By design, any pipeline that is runnable from the SMRT Link Services can be runnable directly from the commandline

SMRT Link Services
Release 5.1.0

Pacific Biosciences

Apr 26, 2018


Contents

1 Introduction

2 SMRT Link System Architecture High Level Overview
  2.1 ServiceJob
    2.1.1 Import DataSet
    2.1.2 Fasta Reference Convert and Import
    2.1.3 “Analysis” or “pbsmrtpipe” Job
    2.1.4 Other Job Types Examples
    2.1.5 ServiceJob Data Model and Polymorphism
  2.2 DataStore
  2.3 SMRT Link Importing of DataStoreFile(s) from a DataStore
    2.3.1 PacBio DataSet Overview
    2.3.2 PacBio Report Overview
    2.3.3 Accessing Report(s) from SMRT Link Analysis
  2.4 SMRT Link Testing

3 Tools
  3.1 Commandline Interaction with SMRT Link Services
    3.1.1 Checking the Status
    3.1.2 Importing a DataSet into SMRT Link Server
    3.1.3 Submit a Resequencing Analysis Job
    3.1.4 Resubmitting an Analysis Job
    3.1.5 Importing/Converting a RSII movie into SMRT Link
    3.1.6 Querying Job History
    3.1.7 Authentication
  3.2 Conversion and Other Tools
    3.2.1 Fasta to ReferenceSet
    3.2.2 Convert RSII movie metadata XML to HdfSubreadSet XML

4 SMRT Link Analysis Services Config
  4.1 Common Analysis Configuration
  4.2 SMRT Link Bundle External Resources
  4.3 Testing

5 PacBio Data Bundle Model and Services
  5.1 Requirements
  5.2 Example PacBio Data Bundle
  5.3 PacBio Data Bundle Model
  5.4 SMRT Bundle Server
  5.5 Servers
  5.6 Legacy API
  5.7 Building a Stand Alone Chemistry Update Bundle Server
  5.8 Configuration
  5.9 Details of the Root Bundle Dir
  5.10 Building and Starting up the Chemistry Bundle Upgrade Server
  5.11 Getting a List of PacBio Data Bundles from SMRT Link Server
  5.12 Bundles Stored within the SL System install
    5.12.1 Chemistry Data Bundle Details
  5.13 SMRT Link PartNumbers and Automation Constraints WebService
  5.14 SMRT Link Periodic Checking for Chemistry Data Bundle Upgrades

6 SMRT Link Analysis Services API

7 Eve SMRT Server for Events and Tech Support File Uploads
  7.1 General Overview
  7.2 System Overview
  7.3 Building, Configuring and Starting up Eve
  7.4 Build
  7.5 Start the Server
  7.6 Eve and SMRT Link Server Tools
  7.7 Tech Support TGZ Bundle
  7.8 Eve WebServices
  7.9 Events/Messages Generated Within SMRT Link Analysis Service
    7.9.1 How Messages are sent out of SMRT Link to External Server

8 Logging configuration
  8.1 Usage
  8.2 Command-Line Example

9 PacBio Update Server
  9.1 Requirements
  9.2 Packaging Chemistry Update Bundle Server Build
  9.3 Install
  9.4 Configuration
  9.5 Systemd service definition
  9.6 Upgrade
  9.7 Automated build and deploy

10 SMRT Link Services Common Tasks And Workflows
  10.1 How to get the reports for SMRT Link Job By Id
    10.1.1 How to get the SMRT Link reports for dataset by UUID
    10.1.2 How to get QC reports for a particular SMRT Link Run
    10.1.3 How to get QC reports for a particular Collection
    10.1.4 How to get recent Runs
    10.1.5 How to setup a Run in Run Design
    10.1.6 How to monitor progress of a SMRT Link Run
    10.1.7 How to capture Run level summary metrics
    10.1.8 How to setup a job on a particular Collection
    10.1.9 How to delete a SMRT Link Job
    10.1.10 How to setup a SMRT Link Analysis Job for a specific Pipeline
    10.1.11 Querying Job History

11 Disclaimer


CHAPTER 1

Introduction

This document describes the SMRT Link Web Services API provided by Pacific Biosciences. The API allows developers to design and QC instrument runs, query new data from the instrument, and start analyses on the data.

The Web Services support RESTful access from web clients, and can be used from the command line with wget or curl, and from programming languages such as Python or Scala.
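As a sketch of programmatic access (the host name, port, and /status path below are illustrative assumptions about a particular deployment, not documented endpoints), a request can be built with Python's standard library:

```python
from urllib.request import Request

def build_status_request(host: str, port: int) -> Request:
    """Build (but do not send) a GET request against a services endpoint."""
    url = f"http://{host}:{port}/status"
    return Request(url, headers={"Accept": "application/json"}, method="GET")

# Hypothetical deployment values; substitute your own server and port.
req = build_status_request("smrtlink-example", 8081)
```

Sending the request (e.g., with urllib.request.urlopen) returns the JSON status payload that tools like pbservice render for you.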

1. The API includes functions for:

• Managing run designs

• Managing resources, such as instruments

• Managing Data Sets

• Managing jobs

2. The Web Services APIs:

• Run in or under a standard Linux/Apache environment, and can be accessed from Windows, Mac OS, or Linux operating systems.

• Are installed as part of the Secondary Analysis system, and require a one-time configuration.

3. Commandline tools for dataset processing (e.g., Fasta to ReferenceSet)

4. Commandline tools for interfacing with the SMRT Link Analysis Services (pbservice)


CHAPTER 2

SMRT Link System Architecture High Level Overview

The SMRT Link System comprises six core components:

• SMRT Link System Installer For general administration, configuring the system, and upgrading

• SMRT Link Tools Commandline tools written in Python, C++, and Scala from the SAT and SL teams

• SMRT Link Analysis Services (SLA) Scala-driven web services using the spray framework

• SMRT Link Tomcat WebServer Serves the SMRT Link UI, written in JavaScript/TypeScript using Angular 2

• SMRT View Visualization of SMRT Link Analysis Jobs

• Enterprise WSO2 API Manager For authentication and authorization

Note, “SMRT Link” is a heavily overloaded term. It is recommended to refer to the specific subcomponent of the system to avoid confusion.

This overview describes the core abstractions used in the SMRT Link Analysis Services to process and produce data, leveraging SMRT Link Tools. The core unit of computational work at the SMRT Link Analysis Services level is the ServiceJob.

ServiceJob

A ServiceJob (i.e., the “engine” job referred to in the Scala code) is a general polymorphic async computational unit that takes an input of type T and returns a DataStore. A DataStore is a list of DataStoreFile instances. Each DataStoreFile contains metadata about the file, such as file type (GFF, Fasta, PacBio DataSet, PacBio Report, Log, Txt), globally unique id (UUID), file size, and “source id” (details provided in a later section).

After a ServiceJob is run, the DataStore (and its DataStoreFile(s)) is imported back into SMRT Link Analysis. These DataSets are then accessible for further analysis by other ServiceJob(s).

(In pseudo-Scala code)

def run[T](opts: T): DataStore

There are several ServiceJob types of note within the SMRT Link Analysis Services:


Import DataSet

Takes a path to a PacBio DataSet and generates a DataStore containing the path to the PacBio DataSet, the generated Report file types, and a log of the ServiceJob output.

Fasta Reference Convert and Import

Takes a path to a Fasta file and converts it to a PacBio ReferenceSet. The ReferenceSet (along with DataSet Reports and a log of the ServiceJob) is added to the DataStore.

“Analysis” or “pbsmrtpipe” Job

Internally, this job type is referred to as a “pbsmrtpipe” job, whereas marketing refers to it as “analysis”. The latter is what is displayed in the SMRT Link UI.

This job type takes a Map[String, EntryPoint] (EntryPoint is defined below), task options, and a pipeline template id as inputs (i.e., “T”) and emits a DataStore. Depending on the pipeline template id, the DataStore will be populated with different output file types. (Pipeline Templates are described in more detail in the next section.)

In pseudo-Scala code:

case class Opts(
  entryPoints: Map[String, EntryPoint],
  taskOptions: Map[String, TaskOption],
  pipelineId: String,
  jobName: String)

def run(opts: Opts): DataStore

Analysis jobs are the heart of processing PacBio DataSets (e.g., SubreadSet(s)) within SMRT Link.

An EntryPoint is a container for a DataSet id and a DataSetMetaType (e.g., “SubreadSet”, “ReferenceSet”). The SLA Services will resolve the DataSet to a path that can be used within a pbsmrtpipe execution.

Each analysis pipeline id has a well-defined set of required EntryPoint(s). For example, a pipeline template id “alpha” might have entry points of e_subread:SubreadSet and e_rset:ReferenceSet (using the entry-id:DataSetMetaType notation).
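The entry-id:DataSetMetaType binding above can be sketched in Python; the “alpha” pipeline and its two entry points are the hypothetical example from the text, not a real pipeline template:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntryPoint:
    entry_id: str            # e.g., "e_subread"
    dataset_meta_type: str   # e.g., "SubreadSet"

# Required entry points for the hypothetical pipeline template "alpha".
ALPHA_REQUIRED = {"e_subread": "SubreadSet", "e_rset": "ReferenceSet"}

def validate_entry_points(required, provided):
    """True iff every required entry id is bound to the expected DataSetMetaType."""
    bound = {ep.entry_id: ep.dataset_meta_type for ep in provided}
    return bound == required

ok = validate_entry_points(ALPHA_REQUIRED, [
    EntryPoint("e_subread", "SubreadSet"),
    EntryPoint("e_rset", "ReferenceSet"),
])
```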

A Pipeline Template is a static encoding of the EntryPoint(s) of a pipeline (by id), default task options, and display metadata, such as the name and description of the pipeline. Pipeline Template objects are currently defined in Python (as code, to enable reusability of subworkflows) and can be emitted as JSON files. These JSON files are loaded by SMRT Link Analysis on startup and exposed as a webservice (for the UI or pbservice).

The schema for the Pipeline Template data model is here

Pipelines are executed by pbsmrtpipe, which will call one (or more) tasks defined using the PacBio ToolContract interface.

The ToolContract interface encodes task metadata, such as the input and output file types (e.g., GFF, SubreadSet), the available and default task options, whether the task is distributed, the number of processors/threads to use, etc.

Note, for historical reasons there is some looseness in nomenclature; “task” and “tool contract” are often used interchangeably. They refer to the same noun in the SMRT Link software stack.

More details of the ToolContract data model and interface are defined in pbcommand.

More details of pbsmrtpipe and Creating Analysis Pipelines are described here.


By design, any pipeline that is runnable from the SMRT Link Services is runnable directly from the commandline by invoking pbsmrtpipe. Conversely, only a subset of the pipelines that are runnable from the commandline are runnable from the SMRT Link Services. Specifically, only pipelines whose EntryPoints are all PacBio DataSet types are supported. This is because the UI only allows selecting and binding EntryPoint(s) as PacBio DataSets.

There is a “raw” pbsmrtpipe interface to the SMRT Link Web Services that supports creating ServiceJobs that already have the EntryPoint(s) resolved to paths.

Other Job Types Examples

While the previous examples of ServiceJob(s) focus on importing or analysis to create output files, there are other uses for a ServiceJob. For example, the DeleteDataSetJob is a job type that asynchronously deletes datasets (and parent datasets) from the file system and generates a DataStore file with a Report and a log of the output.

Note that only “pbsmrtpipe” (i.e., analysis) and import-dataset jobs (in DataManagement) are displayed in the SMRT Link UI.

ServiceJob Data Model and Polymorphism

The metadata of a ServiceJob is stored within the SMRT Link Database and is the core unit displayed in the UI.

For brevity, only a subset of the properties are shown below. See the SMRT Link docs for more details.

case class ServiceJob(
  uuid: UUID,
  id: String,
  name: String,
  jobTypeId: String,
  state: JobStates.JobState,
  createdAt: DateTime,
  settings: JsonObject)

Property Summary

• uuid Globally unique identifier for the job

• id Unique within the SMRT Link instance

• jobTypeId Unique identifier for the job type (e.g., “pbsmrtpipe”)

• name Name of the ServiceJob

• state Current state of the job

• settings JSON structure of the JobType-specific settings

The settings field is where the polymorphism is handled.

For example, an import-dataset job will have settings of:

{"path": "/path/to/subreadset.xml", "datasetMetaType": "PacBio.MetaTypes.SubreadSet"}

Whereas “analysis” jobs will have the pipeline id and entry points (excluded for brevity), among other options, encoding the type T of the ServiceJob options.

{"pipelineId": "pbsmrtpipe.pipelines.my_pipeline"}

In summary, given a ServiceJob, the settings field has a well-defined schema for the specific jobTypeId.
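A minimal sketch of that jobTypeId-keyed dispatch, using the two settings payloads shown above (the parser functions and their output field names are illustrative, not the services' internals):

```python
def parse_import_dataset(settings):
    return {"path": settings["path"], "meta_type": settings["datasetMetaType"]}

def parse_analysis(settings):
    return {"pipeline_id": settings["pipelineId"]}

# Dispatch table: one settings parser per jobTypeId.
SETTINGS_PARSERS = {
    "import-dataset": parse_import_dataset,
    "pbsmrtpipe": parse_analysis,
}

def parse_settings(job_type_id, settings):
    """Decode a job's settings payload according to its jobTypeId."""
    parser = SETTINGS_PARSERS.get(job_type_id)
    if parser is None:
        raise ValueError(f"Unsupported jobTypeId: {job_type_id}")
    return parser(settings)
```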


Model for Running Service Jobs within SMRT Link

Internal to the SMRT Link Services is an execution manager leveraging the akka framework. This enables the number of running ServiceJob(s) to be throttled so as not to overload the box where the services are running.

For example, if you submit 100 analysis jobs, the system won't fork and create 100 pbsmrtpipe instances each submitting N tasks to the cluster manager. The maximum number of running ServiceJob(s) is throttled by the max number of service workers defined in the SMRT Link System (JSON) config.

See the docs for more details on the configuration.
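The throttling behavior can be sketched with a fixed-size worker pool. Akka itself is actor-based, so this Python ThreadPoolExecutor analogy is only illustrative, with max_workers standing in for the configured max number of service workers:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def run_jobs_throttled(jobs, max_workers):
    """Run callables so that at most max_workers execute at any one time."""
    peak = 0      # highest number of jobs observed running concurrently
    running = 0
    lock = threading.Lock()

    def run(job):
        nonlocal peak, running
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return job()
        finally:
            with lock:
                running -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, jobs))
    return results, peak

# 100 submitted "analysis jobs", but never more than 4 in flight.
results, peak = run_jobs_throttled([lambda i=i: i for i in range(100)], max_workers=4)
```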

DataStore

As described in the previous section, a ServiceJob outputs a DataStore. A DataStore is a list of DataStoreFile instances that contain metadata about the file, such as file type (GFF, Fasta, PacBio DataSet, PacBio Report, Log, Txt), globally unique id (UUID), file size, and “source id”.

Each DataStoreFile has a “source id” that is unique to the job type and can be understood as a mechanism to reference a specific output from a ServiceJob.

This provides an identifier to refer to a specific output of a pipeline by pipeline id.

DataStoreFile example

{
  "modifiedAt": "2017-03-03T11:52:21.031Z",
  "name": "Filtered SubreadSet XML",
  "fileTypeId": "PacBio.DataSet.SubreadSet",
  "path": "/path/to/pbcoretools.tasks.filterdataset-0/filtered.subreadset.xml",
  "description": "Filtered SubreadSet XML",
  "uuid": "f5166313-f3e4-a963-a230-2b551666b30b",
  "fileSize": 8912,
  "importedAt": "2017-03-03T11:52:21.031Z",
  "jobId": 279,
  "createdAt": "2017-03-03T11:52:21.031Z",
  "isActive": true,
  "jobUUID": "a45451da-3f2f-4e8e-9f76-61a12a306936",
  "sourceId": "pbcoretools.tasks.filterdataset-out-0"
}
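Given such a list of DataStoreFile records, looking up a specific output by its source id, or filtering by file type, is straightforward. A sketch (the second record, its source id, and the "PacBio.FileTypes.log" id are invented for illustration):

```python
def index_by_source_id(datastore_files):
    """Index DataStoreFile records by their job-unique sourceId."""
    return {f["sourceId"]: f for f in datastore_files}

def files_of_type(datastore_files, file_type_id):
    """Filter DataStoreFile records by fileTypeId."""
    return [f for f in datastore_files if f["fileTypeId"] == file_type_id]

# Two illustrative records (fields abbreviated from the example above).
files = [
    {"sourceId": "pbcoretools.tasks.filterdataset-out-0",
     "fileTypeId": "PacBio.DataSet.SubreadSet",
     "path": "/path/to/pbcoretools.tasks.filterdataset-0/filtered.subreadset.xml"},
    {"sourceId": "master-log",
     "fileTypeId": "PacBio.FileTypes.log",
     "path": "/path/to/master.log"},
]
index = index_by_source_id(files)
```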

SMRT Link Importing of DataStoreFile(s) from a DataStore

As a ServiceJob runs, DataStoreFile(s) are generated and imported into the SMRT Link System. For example, after mapping is completed in a Resequencing job, the AlignmentSet will be imported back into the system and can be used in other pipelines.

Depending on the fileTypeId of the DataStoreFile, the import might trigger other actions and store a richer set of metadata in the SMRT Link Database.

The two file types with this special import handling are PacBio Report and PacBio DataSet(s), such as BarcodeSet, SubreadSet, and ReferenceSet.


PacBio DataSet Overview

These XML files are a metadata wrapper around an underlying file or files, such as Fasta files, GMAP indexes, or aligned or unaligned BAM files.

Please see the official docs here

SMRT Link Analysis supports all PacBio DataSet types.

PacBio Report Overview

The PacBio Report data model is used to encode computed metrics (e.g., max readlength), plots, plot groups, and tables. Each report has a UUID that is globally unique and an “id” that communicates the report type (e.g., “mapping_stats”).

Currently, there are officially supported APIs to read and write (via JSON) these data models. The supported models are in Python (pbcommand) and in Scala (smrtflow).

The Report data model Avro schema is here.

Many (almost all) Report(s) generated from ServiceJob(s) come from the Python pbreports package. By default, the (minimal) display data in the report will be used to display the Report in the SMRT Link UI.

Each Report type (by id) has a schema of the expected output types and attempts to separate the view data from the model. This abstraction is a Report Spec.

Further customization of the view of a Report by type can be configured using ReportViewRules, loaded by SMRT Link Analysis on startup.

Accessing Report(s) from SMRT Link Analysis

The raw Reports (as JSON) are accessible from the SMRT Link Services as follows.

Get a list of all datastore files:

/secondary-analysis/jobs/pbsmrtpipe/1234/datastore

To display only the Report file types, as ServiceReportFile (similar to the DataStoreFile):

/secondary-analysis/jobs/pbsmrtpipe/1234/reports

From the report UUID referenced in the ServiceReportFile, the raw JSON of the report can be obtained:

/secondary-analysis/jobs/pbsmrtpipe/1234/reports/{UUID}

See the SMRT Link Analysis Service swagger docs for more details.
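The three endpoints differ only in their suffix, so a small URL builder covers all of them (the base URL is an assumption about your deployment):

```python
def job_url(base, job_id, suffix=""):
    """Build a pbsmrtpipe job endpoint URL from the patterns above."""
    url = f"{base}/secondary-analysis/jobs/pbsmrtpipe/{job_id}"
    return f"{url}/{suffix}" if suffix else url

# Hypothetical base URL for a deployment.
base = "http://smrtlink-example:8081"
datastore_url = job_url(base, 1234, "datastore")
reports_url = job_url(base, 1234, "reports")
report_url = job_url(base, 1234, "reports/f5166313-f3e4-a963-a230-2b551666b30b")
```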

Configuring SMRT Link

SMRT Link Analysis, the Tomcat webserver, SMRT View, and WSO2 are configured using the smrtlink-system-config.json file within the SMRT Link Analysis GUI Bundle. This is located at smrtsuite/current/bundles/smrtlink-analysisservices-gui in the SL System build.

The config file uses the Scala/Java HOCON (as JSON) format. The schema for the config is here.


Interacting With SMRT Link Analysis Services APIs

The recommended model for interfacing with the SMRT Link Services is the pbservice commandline exe, or the Scala client API in smrtflow.

The rich commandline tool pbservice provides access to get the job status of SMRT Link Analysis jobs, submit analysis jobs, import datasets, and much more.

Please see the docs for more details.

F.A.Q. What is the difference between smrtflow and SMRT Link? The SMRT Link Services and several commandline tools, such as pbservice, are written in Scala. These tools and services reside in a Scala package called smrtflow. One of the applications in smrtflow is the SMRT Link Analysis web services.

There is a Python API in pbcommand to interface with the SMRT Link Services, and an example IPython notebook written as a cookbook that demonstrates how to use the API.

SMRT Link Testing

[TBD]

• Describe the testkit Sim layer in smrtflow for testing service driven pipelines

• Describe pbtestkit for pbsmrtpipe

• Describe SL UI tests driven by protractor


CHAPTER 3

Tools

Commandline Tools

Commandline Interaction with SMRT Link Services

The pbservice exe can be used to interact with the SMRT Link Services from the commandline. Note that machine-readable output can be obtained in most cases by adding the argument --json to the command line; attempting to parse the plain-text output is strongly discouraged.

Checking the Status

$> pbservice status --host smrtlink-beta --port 8081

Note: The PB_SERVICE_HOST and PB_SERVICE_PORT env variables can be used to set the default --host and --port values.

$> export PB_SERVICE_PORT=8081
$> export PB_SERVICE_HOST=smrtlink-bihourly
$> pbservice status

SMRTLink Services Version: 0.1.10-c63303e
Status: Services have been up for 27 minutes and 13.320 seconds.
DataSet Summary:
SubreadSets 22
HdfSubreadSets 56
ReferenceSets 9
BarcodeSets 7
AlignmentSets 10
ConsensusReadSets 6
ConsensusAlignmentSets 1
ContigSets 9


GmapReferenceSets 0
SMRT Link Job Summary:
import-dataset Jobs 35
merge-dataset Jobs 0
convert-fasta-reference Jobs 2
pbsmrtpipe Jobs 16

Importing a DataSet into SMRT Link Server

$> pbservice import-dataset --host smrtlink-beta --port 8081 /path/to/subreadset.xml

Note: This can also operate recursively on a directory. All files ending in *.subreadset.xml will be imported into the system. Any files that have already been imported into the system will be skipped.
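The recursive scan can be sketched as follows; pbservice itself also checks the server for already-imported files, which this sketch omits:

```python
import pathlib
import tempfile

def find_subreadsets(root):
    """Recursively collect all *.subreadset.xml files under root, sorted by path."""
    return sorted(pathlib.Path(root).rglob("*.subreadset.xml"))

# Demo against a throwaway directory tree.
with tempfile.TemporaryDirectory() as d:
    run_dir = pathlib.Path(d, "run_01")
    run_dir.mkdir()
    (run_dir / "m1.subreadset.xml").write_text("<SubreadSet/>")
    (run_dir / "m1.bam").write_text("")  # ignored: not a subreadset XML
    found = [p.name for p in find_subreadsets(d)]
```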

Submit a Resequencing Analysis Job

$> pbservice run-pipeline sa3_ds_resequencing_fat --host smrtlink-beta --port 8081 \
   -e /path/to/subreadset.xml \
   -e /path/to/referenceset.xml \
   --job-title my_job_title

This will automatically import the entry point datasets if they are not already present in the system.

Resubmitting an Analysis Job

$> pbservice get-job <ID> --show-settings --host smrtlink-beta --port 8081 > my_job.json
$> pbservice run-analysis my_job.json --host smrtlink-beta --port 8081 --timeout 36000

You may edit the JSON file between these commands if you want to modify the settings. It is recommended that you change the job name or add a suffix such as “_resubmit”.
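The edit step between the two commands can be scripted. A sketch that appends the suggested “_resubmit” suffix (the top-level "name" key is an assumption about the exported JSON layout):

```python
import json

def mark_resubmitted(job_json):
    """Append "_resubmit" to the job name in an exported settings JSON string."""
    settings = json.loads(job_json)
    settings["name"] = settings.get("name", "job") + "_resubmit"
    return json.dumps(settings)

exported = '{"name": "my_job_title", "pipelineId": "pbsmrtpipe.pipelines.sa3_ds_resequencing_fat"}'
edited = mark_resubmitted(exported)
```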

Importing/Converting a RSII movie into SMRT Link

$> pbservice import-rs-movie --host smrtlink-alpha --port 8081 /path/to/movies

This will create a new HdfSubreadSet XML and database entry. Note that the underlying HDF5 data files will not be converted at this stage; conversion to BAM format requires running a separate pbsmrtpipe job.

Querying Job History

$> pbservice get-jobs --job-state FAILED --job-type pbsmrtpipe --search-name like:hg19 \
   --search-pipeline pbsmrtpipe.pipelines.sa3_ds_resequencing_fat

The get-jobs subcommand allows searching for jobs by type, name (full or partial), job state, and/or pbsmrtpipe pipeline (if relevant).


Authentication

Use of pbservice to access a remote SMRT Link server (not running on localhost) requires user authentication over HTTPS; this is also required for some API calls that only work with authentication (projects are the most important such feature). There are several ways to specify authentication credentials:

1. Add --ask-pass to the command-line arguments, and pbservice will prompt for a password. This is recommended for interactive use since the password stays private. If your Unix login ID is different from the user ID you wish to log in to SMRT Link with, you also need to add --user <username>.

2. Add --user <username> --password <password> to the command-line arguments. Because this will display the password in shell history and/or log files, you should never do this with a full Unix account. Users that need this form (e.g., for scripting) should obtain SMRT Link login credentials that do not provide access to any other systems.

3. Set the environment variables PB_SERVICE_AUTH_USER and PB_SERVICE_AUTH_PASSWORD. Again, this should never be done with a Unix account, only a limited SMRT Link-specific account.

For further options and functionality, please see pbservice --help.
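The precedence implied above, explicit flags first and then the PB_SERVICE_AUTH_* environment variables, can be sketched as follows (this resolution order is an illustrative reading, not pbservice's documented internals):

```python
import os

def resolve_credentials(cli_user=None, cli_password=None, env=None):
    """Prefer explicit CLI credentials; fall back to the PB_SERVICE_AUTH_* env vars."""
    env = os.environ if env is None else env
    user = cli_user or env.get("PB_SERVICE_AUTH_USER")
    password = cli_password or env.get("PB_SERVICE_AUTH_PASSWORD")
    if user is None or password is None:
        raise ValueError("no credentials: pass --user/--password or set the env vars")
    return user, password

creds = resolve_credentials(env={"PB_SERVICE_AUTH_USER": "svc-user",
                                 "PB_SERVICE_AUTH_PASSWORD": "secret"})
```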

Conversion and Other Tools

Fasta to ReferenceSet

Convert a Fasta file to a ReferenceSet and generate the required index files (fai, and sa, which requires the sawriter exe in $PATH).

$> fasta-to-reference /path/to/file.fasta /path/to/output-dir my-reference-name \
   --organism=my-org --ploidy=haploid

Convert RSII movie metadata XML to HdfSubreadSet XML

$> movie-metadata-to-dataset /path/to/rs-movie.metadata.xml /path/to/output.subreadset.xml


CHAPTER 4

SMRT Link Analysis Services Config

Configuration can be done using Scala conf files, setting -D flags when launching the JVM, or by setting env vars.

Please see the reference.conf or the application.conf of each sbt subproject (e.g., smrt-server-analysis) for details.

When running the SMRT Link Analysis Services in production from the smrtlink-system-config.json file, -Dconfig.file=/path/to/smrtlink-system-config.json can be used.

For configuring standalone exes built using “sbt pack”, set the env variable JAVA_OPTS="-Dconfig.file=/path/to/smrtlink-system-config.json" before launching the exe.

The specification for this file is documented in the SmrtLinkSystemConfig.avsc Avro schema.

Common Analysis Configuration

• PB_SERVICES_PORT or SMRTFLOW_SERVER_PORT (smrtflow.server.port) Set the port to use

• PB_SERVICES_MANIFEST_FILE (smrtflow.server.manifestFile) Path to a PacBioManifest file that contains versions of subcomponents, such as “smrtlink”

• PB_ENGINE_JOB_ROOT (smrtflow.engine.jobRootDir) Job root directory (example: jobs-root, /path/to/jobs-root)

• PB_SMRTPIPE_XML_PRESET (smrtflow.engine.pbsmrtpipePresetXml) Path to the default pbsmrtpipe preset XML (example: /path/to/preset.xml)

• PB_ENGINE_MAX_WORKERS (smrtflow.engine.maxWorkers) Maximum number of service job workers to run concurrently. This limits the total number of pbsmrtpipe, import-dataset, etc. jobs that run concurrently.

For the database configuration, see the HOCON .conf files for details.
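As an illustration of the lookup order implied above (either of two environment variables can supply the same setting, with a fallback default), here is a minimal sketch. The helper name and fallback behavior are assumptions for demonstration, not the smrtflow implementation:

```python
# Hypothetical sketch: resolve a setting from a list of candidate environment
# variables, falling back to a default, mirroring how PB_SERVICES_PORT or
# SMRTFLOW_SERVER_PORT both map to smrtflow.server.port.
import os


def resolve_setting(names, default, env=None):
    """Return the first non-empty variable among `names`, else `default`."""
    env = os.environ if env is None else env
    for name in names:
        if name in env and env[name] != "":
            return env[name]
    return default


# Example: only the second candidate is set, so it wins over the default.
port = resolve_setting(["PB_SERVICES_PORT", "SMRTFLOW_SERVER_PORT"], "8070",
                       env={"SMRTFLOW_SERVER_PORT": "8081"})
```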


SMRT Link Bundle External Resources

• PB_PIPELINE_TEMPLATE_DIR Path to Resolved pipeline templates JSON files to load on startup

• PB_RULES_REPORT_VIEW_DIR Path to Report view rules JSON files

• PB_RULES_PIPELINE_VIEW_DIR Path to Pipeline View Rule JSON files

Note: Multiple paths can be provided with a ":" separator. The order is important. Example: export PB_PIPELINE_TEMPLATE_DIR="/path/a:/path/b"
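A minimal sketch of how such a ":"-separated variable can be split while preserving the (significant) order; the function name is illustrative only:

```python
# Hypothetical sketch: split a ":"-separated resource-dir variable such as
# PB_PIPELINE_TEMPLATE_DIR into an ordered list of paths, dropping empty entries.
def parse_resource_dirs(value):
    """Split '/path/a:/path/b' into ['/path/a', '/path/b']."""
    return [p for p in value.split(":") if p]


dirs = parse_resource_dirs("/path/a:/path/b")
```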

See https://github.com/PacificBiosciences/pbpipeline-helloworld-resources for example SMRT Link External Resources and documentation.

Testing

• PB_TEST_DATA_FILES Path to PacBioTestFiles repo (https://github.com/PacificBiosciences/PacBioTestData)

Note, this should point to the file.json file within the repo. For example, "export PB_TEST_DATA_FILES=/path/to/file.json"


CHAPTER 5

PacBio Data Bundle Model and Services

A PacBio Data Model is a manifest.xml with a directory for resources, such as config files, or resources used by applications within SMRT Link (and SAT applications), ICS and Primary Analysis.

Requirements

• Contain parameter and configuration files from ICS, PA, SAT, and SL Services

• OS independent, standard vanilla .tgz format

• Each PacBio component (ICS, SL, SAT, DEP) can own sub-components within the bundle and define the schemas as they see fit

• Each single bundle represents a coherent grouping of config/parameter files that are intended to work across all components of the system.

Example PacBio Data Bundle

The PacBio Data Bundle is a general file format that can be used in several different use cases. For example, an extension of the PipelineTemplate, View Rule, Report Rules Data Bundle in SMRT Link, and PacBioTestData Bundle (TODO).

The most important bundle is the "Chemistry" Bundle. This PacBio Data Bundle type contains ICS- and SAT-related files to be used from SL and SL services; it is provided here: http://bitbucket.nanofluidics.com:7990/projects/SL/repos/chemistry-data-bundle/browse

Example PacBio Data Bundle manifest.xml

<?xml version='1.0' encoding='utf-8'?>
<Manifest>
  <Package>chemistry</Package>
  <Version>4.0.0</Version>
  <Created>11/28/16 11:16:46 PM</Created>


  <Author>build</Author>
</Manifest>

Note, the version must be provided using the Semantic Version scheme. This ensures a clear, well-defined model for ordering and comparing bundle versions.

PacBio Data Bundle Model

This model contains metadata about the bundle.

• type {String} Bundle type id (e.g., “chemistry”)

• version: {String} SemVer of the bundle. Unique identifier of a bundle resource within a bundle type. The bundle version comparison will be performed following the SemVer spec.

• importedAt: {DateTime} When the bundle was imported

• isActive: {Boolean} If the bundle is active (For a given bundle type, only one bundle can be active)

• url: {URL} Download Link URL for file(s) (as .tgz?) served from SL Services
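The SemVer-based ordering of bundle versions described above can be sketched as follows. This is an illustrative simplification (it ignores pre-release fields and drops build metadata such as "+00c49de", which SemVer excludes from precedence), not the smrtflow implementation:

```python
# Sketch: order bundle versions by SemVer precedence (major, minor, patch),
# ignoring build metadata and pre-release identifiers for simplicity.
def semver_key(version):
    """'5.0.0+00c49de' -> (5, 0, 0)."""
    core = version.split("+")[0].split("-")[0]
    major, minor, patch = (int(x) for x in core.split("."))
    return (major, minor, patch)


def newest(versions):
    """Pick the highest-precedence version from a list."""
    return max(versions, key=semver_key)


latest = newest(["4.1.0", "5.0.0+00c49de", "9.9.9"])
```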

SMRT Bundle Server

Warning: As of 5.1.0, the chemistry bundle type id has been changed to chemistry-pb

Warning: As of 5.1.0, the SMRT Link Bundle services and the Bundle Upgrade have diverged. Please see the SMRT Link Server docs for details on the bundle related services in SMRT Link Server.

These services are for the stand-alone Chemistry Parameter Update Bundle server.

The endpoints are documented in the swagger file (bundleserver_swagger.json) within the project, or the /api/v2/swagger endpoint of the services.

The swagger-UI can be used to visualize the endpoint APIs. http://swagger.io/swagger-ui/

Servers

• http://smrtlink-update-staging.pacbcloud.com:8084

• http://smrtlink-update.pacbcloud.com:8084

Status Staging Server

$> http get http://smrtlink-update-staging.pacbcloud.com:8084/status -b
{
    "id": "bundle-server",
    "message": "Services have been up for 112 hours, 21 minutes and 30.557 seconds.",
    "uptime": 404490557,
    "user": "root",
    "uuid": "66fb205f-2599-3a37-919e-a0dc5552fee0",


"version": "0.6.7+5475.e2b6df3"}

Status

$> http get http://smrtlink-update.pacbcloud.com:8084/status -b
{
    "id": "bundle-server",
    "message": "Services have been up for 124 hours, 48 minutes and 10.517 seconds.",
    "uptime": 449290517,
    "user": "root",
    "uuid": "56b814db-f0ef-3b91-880b-d1855545b3f8",
    "version": "0.6.7+2.82f4bc1"
}

Legacy API

Note: This should only be used for PacBio System Release version “5.0.0”.

List bundles

$> http get http://smrtlink-update-staging.pacbcloud.com:8084/smrt-link/bundles -b
[
    {
        "createdBy": "integration team",
        "importedAt": "2017-06-08T20:48:14.322Z",
        "isActive": false,
        "typeId": "chemistry",
        "version": "9.9.9"
    },
    {
        "createdBy": "build",
        "importedAt": "2017-06-08T21:40:04.475Z",
        "isActive": true,
        "typeId": "chemistry",
        "version": "5.0.0+00c49de"
    }
]

Get a Specific bundle resource

GET /smrt-link/bundles/{bundle-type-id}/{bundle-version} # Bundle Resource or 404

Example:

GET /smrt-link/bundles/chemistry/1.2.3+3.ebbde5

Download a PacBio Data Bundle

GET /smrt-link/bundles/{bundle-type-id}/download
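A minimal stdlib-only client sketch for the endpoints above. The URL builders are hypothetical helper names for demonstration; only they run locally, while list_bundles performs real network I/O against a live server and is shown for shape only:

```python
# Hypothetical client sketch for the legacy bundle endpoints (not part of smrtflow).
import json
import urllib.request


def bundle_resource_url(base, type_id, version):
    """GET /smrt-link/bundles/{bundle-type-id}/{bundle-version}"""
    return "{}/smrt-link/bundles/{}/{}".format(base, type_id, version)


def bundle_download_url(base, type_id):
    """GET /smrt-link/bundles/{bundle-type-id}/download"""
    return "{}/smrt-link/bundles/{}/download".format(base, type_id)


def list_bundles(base):
    """Fetch and decode the JSON bundle list (requires a reachable server)."""
    with urllib.request.urlopen(base + "/smrt-link/bundles") as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://smrtlink-update-staging.pacbcloud.com:8084"
    print(bundle_resource_url(base, "chemistry", "1.2.3+3.ebbde5"))
```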

New


Building a Stand Alone Chemistry Update Bundle Server

Get repo: http://bitbucket.nanofluidics.com:7990/projects/SL/repos/smrtflow/browse

$> sbt smrt-server-bundle/{compile,pack}

Generates the server exe at smrt-server-bundle/target/pack/bin/smrt-server-data-bundle

Configuration

The configuration for SMRT Link or the stand-alone Chemistry Data Bundle server is performed in the same way.

For running a stand-alone chemistry bundle server, it is strongly recommended for consistency to standardize on port 8084.

$> export PB_SERVICES_PORT=8084

Configure the root bundle path

$> export SMRTFLOW_BUNDLE_DIR=/path/to/pacbio-bundles

Or by setting the smrtflow.server.bundleDir key in the smrtlink-system-config.json (if running from SMRT Link Server).

Details of the Root Bundle Dir

When the bundle server is started up, the system loads bundles from subdirectories of the root directory named with the PacBio System Release Version. The subdirectory names must be in valid SemVer format, and each must contain a list of valid bundles.

A valid bundle is an unzipped BUNDLE-ID-BUNDLE-VERSION directory with a companion tarball of the same name, BUNDLE-ID-BUNDLE-VERSION.tar.gz.

This yields the format <ROOT-BUNDLE-DIR>/<PACBIO-SYSTEM-VERSION>/<BUNDLE-ID>-<BUNDLE-VERSION> and <ROOT-BUNDLE-DIR>/<PACBIO-SYSTEM-VERSION>/<BUNDLE-ID>-<BUNDLE-VERSION>.tar.gz.

Note: ALL BUNDLES within a specific PACBIO-SYSTEM-VERSION must be compatible with the companion version of SMRT Link.

Example directory structure

For a PacBio System Release Version 7.0.0 in root bundle dir /path/to/bundles-root, the directory structure could be:

$> ls -la /path/to/bundles-root/7.0.0
total 112
drwxr-xr-x 4 secondarytest Domain Users  4096 May 31 18:04 .
drwxr-xr-x 6 secondarytest Domain Users  4096 May 31 15:40 ..
drwxr-xr-x 6 secondarytest Domain Users  4096 May 31 18:04 chemistry-4.1.0
-rw-r--r-- 1 secondarytest Domain Users 42269 May 31 18:04 chemistry-4.1.0.tar.gz
drwxr-xr-x 6 secondarytest Domain Users  4096 May 31 15:40 chemistry-5.0.0
-rwxr-xr-x 1 secondarytest Domain Users 38566 May 31 15:40 chemistry-5.0.0.tar.gz
-rwxr-xr-x 1 secondarytest Domain Users  1168 May 31 15:40 README.md
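The validity rule (a bundle directory counts only when a companion .tar.gz of the same name sits next to it) can be sketched as below. The function is illustrative and operates on a plain name listing so it can be demonstrated without touching the filesystem:

```python
# Sketch: given a directory listing, keep only entries that are bundle
# directories with a matching <name>.tar.gz companion.
def valid_bundles(entries):
    names = set(entries)
    return sorted(e for e in entries
                  if not e.endswith(".tar.gz") and e + ".tar.gz" in names)


listing = ["chemistry-4.1.0", "chemistry-4.1.0.tar.gz",
           "chemistry-5.0.0", "chemistry-5.0.0.tar.gz", "README.md"]
valid = valid_bundles(listing)
```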


Warning: For loading 5.0.0 bundles to be used in the Legacy V1 API, use the "5.0.0" subdirectory. These bundles will now be available at the legacy routes as well as the V2 API.

Building and Starting up the Chemistry Bundle Upgrade Server

$> smrt-server-link/target/pack/bin/smrt-server-data-bundle

Command line args

--log-file=/path/to/log.file
--log-level=DEBUG

Note, there is no support for --help

The log file will log the loaded and “active” data bundles on startup.

Getting a List of PacBio Data Bundles from SMRT Link Server

Warning: This approach will NOT work for the Bundle Server

Use pbservice to get a list of bundles on the SMRT Link server.

$> smrt-server-link/target/pack/bin/pbservice get-bundles --host=smrtlink-bihourly --port=8081
Bundle Id  Version  Imported At               Is Active
chemistry  5.0.0    2017-06-01T01:04:09.885Z  true
chemistry  4.1.0    2017-06-01T01:04:15.121Z  false
chemistry  4.1.0    2017-06-01T01:04:15.130Z  false

The pbservice exe is built by the sbt smrt-server-link/{compile,pack} command.

Bundles Stored within the SL System install

• All PacBio Data Bundles are stored with SMRT Link pbbundler.

• The default chemistry bundle is packaged within the pbbundler SL package and is pulled from http://bitbucket.nanofluidics.com:7990/projects/SL/repos/chemistry-data-bundle/browse

Chemistry Data Bundle Details

The “Chemistry” bundle is the core PacBio data model that contains information related to chemistry parameters and configuration for SMRT Link, ICS, PA and tools from secondary analysis (i.e., SAT).


SMRT Link PartNumbers and Automation Constraints WebService

The definitions/PacBioAutomationConstraints.xml is loaded from the most recent chemistry bundle. It is translated from XML (via JAXB) and exposed as JSON by a web service. This service is used by the Run Design and Sample Setup UI applications in SL.

GET /smrt-link/automation-constraints  # Returns a single PacBioAutomationConstraints JSON response

Note, if no chemistry bundle is loaded, the response will be a 404.

SMRT Link Periodic Checking for Chemistry Data Bundle Upgrades

SMRT Link Services are configured to check the Chemistry Bundle Upgrade services (if the URL is configured in the smrtlink-system-config.json) every 12 hrs. The check queries the external server for "newer" Chemistry Parameter bundles based on the semantic version scheme. See http://semver.org/ for details.

Using the nested naming format in the JSON file, the smrtflow.server.chemistryBundleURL has type Option[URL]. The URL is the base URL of the external bundle service, for example http://my-server/smrt-link/bundles. SMRT Link will poll this external endpoint for newer chemistry bundles.

If a newer “Chemistry” Data Bundle is detected, it will be downloaded, added to the chemistry bundle registry, and exposed at smrt-link/bundles/chemistry. Note, it will only be added to the registry; it will not be activated when the bundle is downloaded.

Activation must be done via an explicit call to the services to activate the PacBio Chemistry Data Bundle. See the swagger file or endpoint for details on the WebService calls.
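The "newer bundle" check described above can be sketched as follows: given locally registered versions and the versions reported by the external server, select remote versions that are strictly newer under SemVer precedence. This is an illustrative simplification (build metadata and pre-release fields are ignored), not the smrtflow code:

```python
# Sketch: decide which remote bundle versions are strictly newer than the
# newest locally registered version.
def _semver_key(v):
    """'5.0.0+00c49de' -> (5, 0, 0); build metadata is excluded from precedence."""
    return tuple(int(x) for x in v.split("+")[0].split("-")[0].split("."))


def newer_remote_versions(local, remote):
    newest_local = max((_semver_key(v) for v in local), default=(0, 0, 0))
    return [v for v in remote if _semver_key(v) > newest_local]


# Only 5.1.0 is newer than the newest local version (5.0.0), so only it would
# be downloaded (and left inactive until explicitly activated).
to_download = newer_remote_versions(["4.1.0", "5.0.0"], ["5.0.0", "5.1.0"])
```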


CHAPTER 6

SMRT Link Analysis Services API

SL Analysis services use swagger.json format starting in SL System version 4.1.0.

Please see the ./smrt-server-analysis/src/main/resources/smrtlink_swagger.json file for details.


CHAPTER 7

Eve SMRT Server for Events and Tech Support File Uploads

General Overview

• System Overview

• Building, Configuring and Starting up Eve

• Eve Services

• TechSupport TGZ bundle format

• (for context) How messages are generated within SL, and the general message data structure

• (for context) How messages are sent out of SL to an external service

System Overview

Eve is a SMRT Server instance that enables SMRT Link to send Events and TechSupport TGZ file uploads.

Events that are sent to Eve are written to disk, imported into ElasticSearch, and accessible from Kibana.

There are three main components:

• Eve SMRT Server

– Staging https://smrtlink-eve-staging.pacbcloud.com:8083

– Production https://smrtlink-eve.pacbcloud.com:8083

• ElasticSearch 2.4.x

• UI Kibana 4.6

– Staging http://smrtfleet-kibana-staging.nanofluidics.com

– Production http://smrtfleet-kibana.nanofluidics.com


Building, Configuring and Starting up Eve

Requires Java 1.8 and sbt.

Build

$> sbt smrt-server-link/{compile,pack}

Generates exes in smrt-server-link/target/pack/bin, specifically smrt-server-eve.

For demo and testing purposes the system is configured to write the events to a directory.

Optionally, configure the port the server will start on:

Set the port. By convention and for standardization, it is STRONGLY recommended to set it to 8083.

export PB_SERVICES_PORT=8083

Set the location where Eve will write files to

export EVE_ROOT_DIR=/path/to/where/files/are/written

The output files are written in the form:

Uploaded Tech Support TGZ bundles are unzipped next to the companion .tar.gz file.

Note, the UUID of the uploaded bundle is NOT the UUID of the tech support bundle. In other words, a TGZ TechSupport bundle can be uploaded multiple times.

<EVE_ROOT_DIR>/files/<YEAR>/<MONTH><DAY>/<UPLOADED_BUNDLE_UUID>
<EVE_ROOT_DIR>/files/<YEAR>/<MONTH><DAY>/<UPLOADED_BUNDLE_UUID>.tar.gz

Events are written as

<EVE_ROOT_DIR>/<SMRT-LINK-SYSTEM-UUID>/<EVENT-UUID>.json
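The event file layout above (one JSON file per event UUID, under the SMRT Link system UUID) can be sketched as a simple path builder. The helper name is illustrative, and per the warning later in this chapter this layout is a configuration detail, not a public interface:

```python
# Sketch: build the on-disk event path <EVE_ROOT_DIR>/<SMRT-LINK-SYSTEM-UUID>/<EVENT-UUID>.json
import posixpath


def event_path(eve_root, smrtlink_system_uuid, event_uuid):
    return posixpath.join(eve_root, smrtlink_system_uuid, event_uuid + ".json")


p = event_path("/data/eve", "debcf761-44dc-3856-9a2a-8abfbbdab6b7",
               "83927d00-f46c-11e6-9f9b-3c15c2cc8f88")
```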

Other Config

export SMRTFLOW_EVENT_API_SECRET="rick"  # this must be consistent with the client used in SL

See the application.conf and reference.conf for all configuration options accessible from Eve.

Start the Server

$> smrt-server-link/target/pack/bin/smrt-server-eve

Command line args

--log-file=/path/to/log.file
--log-level=DEBUG

Note, there is no support for --help

Server Status


$> http get https://smrtlink-eve-staging.pacbcloud.com:8083/status -b
{
    "id": "smrt-eve",
    "message": "Services have been up for 309 hours, 34 minutes and 30.636 seconds.",
    "uptime": 1114470636,
    "user": "pbweb",
    "uuid": "debcf761-44dc-3856-9a2a-8abfbbdab6b7",
    "version": "0.6.7+817dff6"
}

Eve and SMRT Link Server Tools

Building Tools

$> sbt smrt-server-link/{compile,pack}

Generates tools, such as pbservice, tech-support-bundler and tech-support-uploader. See the --help in each tool for details.

Summary of Usage:

Uploading TGZ bundles to Eve is performed by tech-support-uploader

$> tech-support-uploader /path/to/ts-bundle.tar.gz --url http://localhost:8083

When used from a SMRT Link build, the default Eve URL should be set from smrtlink-system-config.json.

For triggering Tech Support System Status from SMRT Link, use pbservice

$> pbservice ts-status --user=rick --comment="Test Bundle creation" --host=smrtlink-bihourly --port=8081

This will create a tech-support system status TGZ bundle and upload it to Eve. If the SMRT Link system is not configured with an Eve URL, the job creation step will fail.

Similarly, for requesting a TechSupport bundle for failed job 1234:

$> pbservice ts-failed-job 1234 --user=mkocher --comment="Test Failed Job"

Note, if the job is not in a failed state, or the job does not exist, there should be an error message and pbservice will return with a non-zero exit code.

Tech Support TGZ Bundle

The Tech Support TGZ bundle is a tar.gz file that contains a tech-support-manifest.json metadata file as well as any files, such as log or config files, included in the bundle.

The bundle manifest defines the "type" of bundle and the schema of REQUIRED files and directory structure to be included in the bundle.

The two main bundles are:

1. SMRT Link System Status (or failed installs)

2. SMRT Link Failed Job


Example manifest JSON for the SMRT Link System Status

{"bundleTypeVersion": 1,"bundleTypeId": "sl_ts_bundle_system_status","id": "cef996da-bf7c-4cec-b983-af4e95486ca6","comment": "Created by smrtflow version 0.6.7+755.92d16d8","smrtLinkSystemVersion": "5.0.0.SNAPSHOT4888","dnsName": "smrtlink-bihourly.nanofluidics.com","createdAt": "2017-05-25T11:10:56.749-07:00","user": "mkocher","smrtLinkSystemId": "a0a2702a-cb7a-3a63-ac5f-fad696425a04"

}

Note that when a Tech Support TGZ bundle is uploaded into Eve, an "uploaded" Event with the TS Manifest metadata will be created. This Event will also have the path to the unzipped TechSupport bundle.

All tools MUST use the Event interface to look for recently uploaded TechSupport TGZ bundles.

DO NOT USE DIRECT ACCESS TO THE FILE SYSTEM. The file layout is not the public interface to Eve; it is a configuration parameter, and the output destination can change.

Eve WebServices

See the /smrt-server-link/src/main/resources/eventserver_swagger.json or "<HOST>:<PORT>/api/v1/swagger" for details of the WebServices.

Use the swagger UI to get a prettified view of the swagger JSON file

Events/Messages Generated Within SMRT Link Analysis Service

Internal components of SL (e.g., the DataSet Service, the JobManager Service) send messages to an EventManagerActor. Each message has a standard packet and schema.

Example (made as terse as possible for demonstration purposes), defined as a SMRT Link Message:

{"uuid": "83927d00-f46c-11e6-9f9b-3c15c2cc8f88","createdAt": "2017-02-16T08:36:21.082-08:00","eventTypeId": "smrtlink_job_change_state","eventTypeVersion": 1,"message": {

"jobId": 1234,"jobTypeId":"pbsmrtpipe","state": "SUCCCESSFUL"

}}

• eventTypeId must map to a well-defined schema for message, which should be documented. When the model changes, the id must change. One possible way of doing this is sl_job_change_state2, encoding the version in the id. The eventTypeId should be prefixed with sl_

• eventTypeVersion Version of eventTypeId message schema

• createdAt ISO8601 encoded version of the datetime the original message was created


• message message payload

• uuid Globally unique identifier for message. Assigned by the creator of the message
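The message fields above can be illustrated with a small builder. The helper itself is hypothetical (not part of smrtflow); only the field names and their meanings come from the text:

```python
# Sketch: construct a SMRT Link Message with the documented envelope fields.
import datetime
import uuid


def make_message(event_type_id, event_type_version, payload):
    return {
        "uuid": str(uuid.uuid4()),  # assigned by the creator of the message
        "createdAt": datetime.datetime.now().astimezone().isoformat(),  # ISO8601
        "eventTypeId": event_type_id,
        "eventTypeVersion": event_type_version,
        "message": payload,  # payload must match the eventTypeId's schema
    }


msg = make_message("smrtlink_job_change_state", 1,
                   {"jobId": 1234, "jobTypeId": "pbsmrtpipe", "state": "SUCCESSFUL"})
```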

Internally, at the EventManagerActor, the messages will be augmented with the SL context information, such as the SL globally unique identifier (TODO: How is this determined and assigned? For demonstration purposes a UUID will be used. A URL of the SL instance is actually more useful, but would leak customer information).

Defining this data model as a SMRT Link System Message:

{"smrtlinkId": "2319db24-f46e-11e6-a35c-3c15c2cc8f88","uuid": "83927d00-f46c-11e6-9f9b-3c15c2cc8f88","dnsName: "my-host","createdAt": "2017-02-16T08:36:21.082-08:00","eventTypeId": "smrtlink_job_change_state","eventTypeVersion": 1,"message": {

"jobId": 1234,"jobTypeId": "pbsmrtpipe","state": "SUCCCESSFUL"

}}

How Messages are sent out of SMRT Link to External Server

The EventManagerActor will forward messages to the listeners (i.e., Actors) that can take actions, such as sending an email on job failure, making POST requests to an External Server, or creating jobs for "auto-analysis".

Filtering of messages that are sent to External Servers should be handled by configuration. In other words, it should be configurable to only send smrtlink_job_change_state messages, or only eula_signed event/message types.
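A minimal sketch of such configuration-driven filtering, using the event types named above; the function name and allow-list shape are assumptions for illustration, not the smrtflow design:

```python
# Sketch: forward an event to an external server only when its eventTypeId is
# in the configured allow-list.
def should_forward(event, allowed_event_types):
    return event.get("eventTypeId") in allowed_event_types


allowed = {"smrtlink_job_change_state"}
forward = should_forward({"eventTypeId": "smrtlink_job_change_state"}, allowed)
```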


CHAPTER 8

Logging configuration

Simple logging with params consistent with the PacBio tools. See [smrtflow#42](https://github.com/PacificBiosciences/smrtflow/pull/42) for history.

Usage

Example command-line usage:

$ ./smrt-server-link/target/pack/bin/pbservice -h
(snip)
Usage: ./app_with_logging [options]

This is an app that supports PacBio logging flags.
  -h | --help
        Show Options and exit
  --log2stdout
        If true, log output will be displayed to the console. Default is false.
  --log-level <value>
        Level for logging: "ERROR", "WARN", "DEBUG", or "INFO". Default is "ERROR"
  --debug
        Same as --log-level DEBUG
  --quiet
        Same as --log-level ERROR
  --verbose
        Same as --log-level INFO
  --log-file <value>
        File for log output. Default is "."
  --logback <value>
        Override all logger config with the given logback.xml file.

All of the above flags are intended for reuse in all of the SMRT services but may also be helpful for any code that wants to use our Scala logging conventions.

Extend the LoggerConfig trait and have the OptionParser instance invoke the LoggerOptions.add method:

29

Page 36: SMRT Link Services · 2019-04-02 · SMRT Link Services, Release 5.1.0 By design, any pipeline that is runnable from the SMRT Link Services can be runnable directly from the commandline

SMRT Link Services, Release 5.1.0

// 1 of 2: extend LoggerConfig in the Config-style class

case class GetStatusConfig(host: String = "http://localhost", port: Int = 8070) extends LoggerConfig

// 2 of 2: make an `OptionParser` that invokes `LoggerOptions.add`

lazy val parser = new OptionParser[GetStatusConfig]("get-status") {
  head("Get SMRTLink status ", VERSION)
  note("Tool to check the status of a currently running smrtlink server")

  opt[String]("host") action { (x, c) =>
    c.copy(host = x)
  } text "Hostname of smrtlink server"

  opt[Int]("port") action { (x, c) =>
    c.copy(port = x)
  } text "Services port on smrtlink server"
  ...
  // reuse the common `--debug` param and logging params
  LoggerOptions.add(this.asInstanceOf[OptionParser[LoggerConfig]])
}

Alternatively, use the LoggerOptions.parse and related methods to directly parse an arg array.

object MyCode extends App {
  // whatever pre-logging config logic ...

  // parse the args for logging related options
  LoggerOptions.parseAddDebug(args)

  // whatever post-logging config logic
}

Command-Line Example

There are a few practical use cases that are supported. By default, INFO level events are logged to STDOUT.

Running a Production Server

You’ll likely want to capture everything above warnings in a production environment. Use the --log-level and --log-file flags:

$ ./smrt-server-link/target/pack/bin/pbservice status --host smrtlink-bihourly --port 8081 --log-file /var/log/my_log.log --log-level WARN

In this case, the normal params exist and only these two alter the logging.

• --log-file = Where the logger will save data.

• --log-level = Sets the logger handler to display all WARN level messages and worse.

Users will almost always have a custom log location. Allowing this to be specified via command-line is a simple way to support this versus requiring a custom log config file or property file.

Dev Logging


When working on the code you probably always want to see errors. If that is true, run with --log2stdout and --log-level ERROR:

$ ./smrt-server-link/target/pack/bin/some_service --log2stdout --log-level ERROR

Here is the more verbose, show me all log messages example:

$ ./smrt-server-link/target/pack/bin/some_service --log2stdout --log-level DEBUG

Using a logback.xml config

It is possible to ignore all of the default conventions used by this API and rely on a standard logback.xml config file via the --logback flag. This provides the most flexibility possible and relies on a well-known, commonly used logging library:

./smrt-server-link/target/pack/bin/some_service --logback logback.xml


CHAPTER 9

PacBio Update Server

A PacBio update server is a system configured to run the Stand Alone Chemistry Update Bundle Server process.

Requirements

• CentOS server, currently CentOS 7.3.1611, with Java installed.

• Update Bundle Server package (see Packaging Update Bundle Server below)

Packaging Chemistry Update Bundle Server Build

See build instructions here: http://bitbucket.nanofluidics.com:7990/projects/SL/repos/smrtflow/browse/docs/pacbio_bundles.rst

Build the Update Bundle server, using the instructions in the section "Building a Stand Alone Chemistry Update Bundle Server".

Create Update Bundle Server tarball

$> tar -czf smrt-server-data-bundle-<VERSION>.tgz bin/smrt-server-data-bundle lib VERSION

Install

Copy the install tarball for the smrt-server-data-bundle server onto the system. The following examples assume it has been stored in /var/tmp.

Create the install directory.


$> mkdir -p /opt/pacbio/smrt-server-data-bundle/<VERSION>

Untar the installation tarball into the install directory.

$> tar -C /opt/pacbio/smrt-server-data-bundle/<VERSION> -xzf /var/tmp/smrt-server-data-bundle-<VERSION>.tgz

Create the “current” symlink referencing the newly installed version.

$> ln -s <VERSION> /opt/pacbio/smrt-server-data-bundle/current

Create /opt/pacbio/smrt-server-data-bundle/smrt-server-data-bundle.service (see systemd service definition section)

Enable the systemd service by creating a symlink to the service definition file in /etc/systemd/system/multi-user.target.wants.

$> ln -s /opt/pacbio/smrt-server-data-bundle/smrt-server-data-bundle.service /etc/systemd/system/multi-user.target.wants/smrt-server-data-bundle.service

Instruct systemd to reread all service definitions so that it becomes aware of the smrt-server-data-bundle service.

$> systemctl daemon-reload

Start the smrt-server-data-bundle service.

$> systemctl start smrt-server-data-bundle

Configuration

PB_SERVICES_PORT=8084
SMRTFLOW_BUNDLE_DIR=/opt/pacbio/smrt-server-data-bundle/chemistry-updates

The above environment variables are set as part of the systemd service that starts the smrt-server-data-bundle service.

Systemd service definition

smrt-server-data-bundle.service contents:

[Unit]
Description=PacBio SMRTLink Update-Only server
Documentation=
After=network-online.target

[Service]
Type=simple
WorkingDirectory=/opt/pacbio/smrt-server-data-bundle/current
User=root
Environment=SEGFAULT_SIGNALS=all PB_SERVICES_PORT=8084 SMRTFLOW_BUNDLE_DIR=/opt/pacbio/smrt-server-data-bundle/chemistry-updates
StandardOutput=journal
ExecStart=/opt/pacbio/smrtlink-server-update-only/current/bin/smrt-server-data-bundle
NotifyAccess=all
## Restart the process if it fails (which means !=0 exit, abnormal termination, or abort or watchdog)
RestartSec=5s


Restart=on-failure
## try starting up to this many times:
StartLimitBurst=6
## ... within this time limit:
StartLimitInterval=5min
## ... otherwise, reboot the machine.
#StartLimitAction=reboot-force
StartLimitAction=none

TimeoutStopSec=10s

[Install]
WantedBy=multi-user.target

Upgrade

Copy the install tarball for the smrt-server-data-bundle server onto the system. The following examples assume it has been stored in /var/tmp.

Create the install directory.

$> mkdir -p /opt/pacbio/smrt-server-data-bundle/<VERSION>

Untar the installation tarball into the install directory.

$> tar -C /opt/pacbio/smrt-server-data-bundle/<VERSION> -xzf /var/tmp/smrt-server-data-bundle-<VERSION>.tgz

Stop the smrt-server-data-bundle service.

$> systemctl stop smrt-server-data-bundle

The “prev” symlink, if it exists, points to the n-1 version in case an upgrade needs to be rolled back. It needs to be removed.

$> rm -f /opt/pacbio/smrt-server-data-bundle/prev

The “current” version now becomes the “prev” version by renaming the current symlink.

$> mv /opt/pacbio/smrt-server-data-bundle/current /opt/pacbio/smrt-server-data-bundle/prev

Create the “current” symlink referencing the newly installed version.

$> ln -s <VERSION> /opt/pacbio/smrt-server-data-bundle/current

Restart the smrt-server-data-bundle service

$> systemctl start smrt-server-data-bundle

Automated build and deploy

The steps for building, creating the release bundle, and installing the release bundle that are documented above have been automated as a Bamboo job: http://bamboo.nanofluidics.com:8085/browse/DEP-SD. This job is triggered by


changes to the master branch in the smrtflow repository. The build results are automatically deployed to smrtlink-update-staging but must be manually deployed, as part of a release process, to smrtlink-update.


CHAPTER 10

SMRT Link Services Common Tasks And Workflows

This chapter describes common tasks performed using the SMRT Link Web Services API and provides “how to” recipes for accomplishing these tasks.

To accomplish a task, you usually need to perform several API calls; the workflow describes the order of these calls.

How to get the reports for SMRT Link Job By Id

To get the reports for a job, given the job ID, perform the following steps:

1. Determine the job type from the list of available job types. Use the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/job-types

2. Get the corresponding job type string. The job type can be found in the “jobTypeId” field.

3. Get reports produced by the job. Given the job ID and the job type, use them in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/{jobType}/{jobID}/reports

Example

Suppose you view a SMRT Analysis job results page in the SMRT Link UI.

To find the job ID, look for the “Analysis Id” field under Analysis Overview, Status.

Note: The job ID will also appear in the {jobID} path parameter of the SMRT Link UI URL. Suppose you view the following SMRT Analysis job results page:

http://SMRTLinkServername.domain:9090/#/analysis/job/3957

Then the job ID is 3957.

To get the job type, use the GET request with the following endpoint:


GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/job-types

Look for the appropriate jobTypeId in the response.

A SMRT Analysis job corresponds to the ‘pbsmrtpipe’ type, so the jobTypeId will be “pbsmrtpipe”. The desired endpoint is:

http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe/3957/reports

Use the GET request with this endpoint to get a list of reports produced by the job with ID = 3957.

GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe/3957/reports

Individual reports associated with a job can be retrieved by adding the report ID specified in the uuid field, for example:

GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe/3957/reports/06dd155b-eb0f-4c26-9f07-2b9a76452dd9
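The two job-report endpoints above can be assembled programmatically. A minimal Python sketch; the base URL simply echoes the placeholder server name used in the examples, so substitute your own host:

```python
# Base URL is an assumption for illustration; substitute your SMRT Link server.
BASE = "http://SMRTLinkServername.domain:9091/smrt-link"

def job_reports_url(job_type: str, job_id: int) -> str:
    """URL listing all reports produced by a job of the given type."""
    return f"{BASE}/job-manager/jobs/{job_type}/{job_id}/reports"

def job_report_url(job_type: str, job_id: int, report_uuid: str) -> str:
    """URL for one individual report, addressed by its 'uuid' field."""
    return f"{BASE}/job-manager/jobs/{job_type}/{job_id}/reports/{report_uuid}"
```

The returned strings can be fetched with any HTTP client (e.g. urllib.request.urlopen).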

How to get the SMRT Link reports for dataset by UUID

To get reports for a dataset, given the dataset UUID, perform the following steps:

1. Determine the dataset type from the list of available dataset types. Use the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/dataset-types

2. Get the corresponding dataset type string. The dataset type can be found in the “shortName” field. Dataset types are explained in Overview of Dataset Service.

3. Get reports that correspond to the dataset. Given the dataset UUID and the dataset type, use them in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/{datasetType}/{datasetUUID}/reports

Example

To get reports associated with a subreadset with UUID = 146338e0-7ec2-4d2d-b938-11bce71b7ed1, perform the following steps:

Use the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/dataset-types

You see that the shortName of SubreadSets is “subreads”. The desired endpoint is:

http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/146338e0-7ec2-4d2d-b938-11bce71b7ed1/reports

Use the GET request with this endpoint to get reports that correspond to the SubreadSet with UUID = 146338e0-7ec2-4d2d-b938-11bce71b7ed1:


GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/146338e0-7ec2-4d2d-b938-11bce71b7ed1/reports

Once you have the UUID for an individual report (the uuid field), it can be downloaded using the datastore files service:

GET http://SMRTLinkServername.domain:9091/smrt-link/datastore-files/519817b6-4bfe-4402-a54e-c16b29eb06eb/download
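Listing a dataset's reports and then downloading each one is a two-step pattern. A minimal Python sketch of the second step, assuming (as the text states) that the /reports endpoint returns a JSON array whose items carry a “uuid” field; the base URL is a placeholder:

```python
# Base URL is an assumption for illustration; substitute your SMRT Link server.
BASE = "http://SMRTLinkServername.domain:9091/smrt-link"

def report_download_urls(reports: list) -> list:
    """Given the parsed JSON list returned by a dataset /reports endpoint,
    build a datastore-files download URL for each report's 'uuid' field."""
    return [f"{BASE}/datastore-files/{r['uuid']}/download" for r in reports]
```

Each resulting URL can then be fetched with a plain GET to save the report file.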

How to get QC reports for a particular SMRT Link Run

To get QC reports for a particular Run, given the Run Name, perform the following steps:

1. Get the list of all Runs: Use the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

In the response, perform a text search for the Run Name: Find the object whose “name” field is equal to the Run Name, and get the Run UUID, which can be found in the “uniqueId” field.

2. Get all Collections that belong to this Run: Use the Run UUID found in the previous step in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/{runUUID}/collections

3. Take a Collection UUID of one of the Collection objects received in the previous response. The Collection UUIDs can be found in the “uniqueId” fields.

For complete Collections, the Collection UUID will be the same as the UUID of the SubreadSet for that Collection.

Make sure that the Collection whose “uniqueId” field you take has the field “status” set to “Complete”. This is because obtaining dataset reports based on the Collection UUID as described below will only work if the Collection is complete. If the Collection is not complete, the SubreadSet does not exist yet.

4. Retrieve the QC reports that correspond to this Collection: Use the Collection UUID obtained in the previous step in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/{collectionUUID}/reports

Note: See “How to get the SMRT Link reports for dataset by UUID” for more details.

5. Take a report UUID of one of the reports of the Collection from the previous response. The report UUIDs can be found in the “uuid” fields.

6. Download one of the reports associated with the Collection: Use the report UUID in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/datastore-files/{reportUUID}/download

7. Repeat Steps 5 - 6 to download all desired reports associated with that specific Collection.

8. Repeat Steps 3 - 7 to download QC reports for all complete Collections of that Run.

Example

You view the Run QC page in the SMRT Link UI, and open the page of a Run with status “Complete”. Take the Run Name and look for the Run UUID in the list of all Runs, as described above.

Note: The Run UUID will also appear in the {runUUID} path parameter of the SMRT Link UI URL:

http://SMRTLinkServername.domain:9090/#/run-qc/{runUUID}

So the shorter way would be to take the Run UUID directly from the URL, such as

http://SMRTLinkServername.domain:9090/#/run-qc/d7b83cfc-91a6-4cea-8025-8bcc1f39e045


With this Run UUID = d7b83cfc-91a6-4cea-8025-8bcc1f39e045, get all Collections that belong to this Run:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/d7b83cfc-91a6-4cea-8025-8bcc1f39e045/collections

Take a UUID of a completed Collection, such as “uniqueId”: “59230aeb-a8e3-4b46-b1b1-24c782c158c1”. With this Collection UUID, retrieve QC reports of the corresponding SubreadSet:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/59230aeb-a8e3-4b46-b1b1-24c782c158c1/reports

Take a UUID of some report, such as “uuid”: “00c310ab-e989-4978-961e-c673b9a2b027”. With this report UUID, download the corresponding report file:

GET http://SMRTLinkServername.domain:9091/smrt-link/datastore-files/00c310ab-e989-4978-961e-c673b9a2b027/download

Repeat the last two API calls until you download all desired reports for all complete Collections.
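The run-to-collections walk above reduces to two small lookups over the parsed JSON responses. A Python sketch, assuming only the field names (“name”, “uniqueId”, “status”) shown in the examples:

```python
def run_uuid_by_name(runs, name):
    """Find a Run's 'uniqueId' by exact match on its 'name' field."""
    for r in runs:
        if r.get("name") == name:
            return r["uniqueId"]
    return None

def complete_collection_ids(collections):
    """UUIDs of Collections whose 'status' is 'Complete'; only these have
    a SubreadSet (and therefore QC reports) available."""
    return [c["uniqueId"] for c in collections if c.get("status") == "Complete"]
```

The returned Collection UUIDs feed directly into the /datasets/subreads/{collectionUUID}/reports endpoint shown above.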

How to get QC reports for a particular Collection

For completed Collections, the Collection UUID will be the same as the UUID of the SubreadSet for that Collection. To retrieve the QC reports of a completed Collection, given the Collection UUID, perform the following steps:

1. Get the QC reports that correspond to this Collection: Use the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/{collectionUUID}/reports

Note: See “How to get the SMRT Link reports for dataset by UUID” for more details.

Note: Obtaining dataset reports based on the Collection UUID as described above will only work if the Collection is complete. If the Collection is not complete, then the SubreadSet does not exist yet.

2. Take a report UUID of one of the reports of the Collection from the previous response.

The report UUIDs can be found in the “uuid” fields.

3. Download one of the reports of the Collection: Use the report UUID in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/datastore-files/{reportUUID}/download

4. Repeat Steps 2 - 3 to download all desired reports of the Collection.

Example

Suppose you have a complete Collection with UUID = 59230aeb-a8e3-4b46-b1b1-24c782c158c1. Get all reports of the SubreadSet which corresponds to this Collection:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/59230aeb-a8e3-4b46-b1b1-24c782c158c1/reports

Take the UUID of a desired report, such as “uuid”: “00c310ab-e989-4978-961e-c673b9a2b027”. With this report UUID, download the corresponding report file:

GET http://SMRTLinkServername.domain:9091/smrt-link/datastore-files/00c310ab-e989-4978-961e-c673b9a2b027/download


Repeat the last API call until you download all desired reports associated with this Collection.

How to get recent Runs

To get recent Runs, perform the following steps:

1. Get the list of all Runs: Use the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

2. Filter the response based on the value of the “createdAt” field. For example:

“createdAt”: “2016-12-13T19:11:54.086Z”

Note: You may also search Runs based on specific criteria, such as reserved state, creator, or summary substring.

Example: Suppose you want to find all Runs created on or after 01.01.2017. First, get the list of all Runs:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

The response will be an array of Run objects, as in the following example (some fields are removed for display purposes):

[{
  "name" : "2016-11-08_3150473_2kLambda_A12",
  "uniqueId" : "97286726-b243-45b3-82f7-8b5f58c56d53",
  "createdAt" : "2016-11-08T17:50:57.955Z",
  "summary" : "lambdaNEB"
}, {
  "name" : "2017_01_24_A7_4kbSymAsym_DS_3150540",
  "uniqueId" : "abd8f5ec-a177-4d41-8556-81c5ffb6b0aa",
  "createdAt" : "2017-01-24T20:09:27.629Z",
  "summary" : "pBR322_InsertOnly"
}, {
  "name" : "SMS_GoatVer_VVC034_3150433_2kLambda_400pm_SNR10.5",
  "uniqueId" : "b81de65a-8018-4843-9da7-ff2647a9d01e",
  "createdAt" : "2016-10-17T23:36:35.000Z",
  "summary" : "lambdaNEB"
}]

Now, search the above response for all Run objects whose “createdAt” field starts with the “2017-01” substring. From the above example, you will get two Runs that fit your criteria (that is, created on or after 01.01.2017):

Run with “name” equal to “2017_01_24_A7_4kbSymAsym_DS_3150540”,

Run with “name” equal to “2017_01_21_A7_RC0_2.5-6kb_DS”.
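The createdAt filtering can also be done with proper date parsing rather than a substring match. A Python sketch, assuming the ISO-8601 timestamp form shown in the example responses:

```python
from datetime import datetime, timezone

def runs_created_on_or_after(runs, cutoff_iso):
    """Return Runs whose 'createdAt' timestamp is >= the given cutoff.

    'createdAt' values look like '2016-12-13T19:11:54.086Z'; the trailing
    'Z' is rewritten as '+00:00' so datetime.fromisoformat can parse it.
    """
    cutoff = datetime.fromisoformat(cutoff_iso).replace(tzinfo=timezone.utc)
    out = []
    for r in runs:
        created = datetime.fromisoformat(r["createdAt"].replace("Z", "+00:00"))
        if created >= cutoff:
            out.append(r)
    return out
```

Unlike the “2017-01” substring search, this correctly matches any later month as well.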

How to setup a Run in Run Design

To setup a Run design, perform the following steps:

1. Prepare the Run Design information in an XML file. (The XML file should correspond to the PacBioDataModel.xsd schema.)

2. Create the Run design: Use the POST request with the following endpoint:


POST http://SMRTLinkServername.domain:9091/smrt-link/runs

The payload (request body) for this POST request is a JSON with the following fields:

• dataModel: The serialized XML containing the Run Design information

• name: The name of the run

• summary: A short description of the run

Example: Create a Run design using the following API call:

POST http://SMRTLinkServername.domain:9091/smrt-link/runs

Use the payload as in the following example:

{
  "dataModel" : "<serialized Run Design XML file according to the PacBioDataModel.xsd schema>",
  "name" : "Run_201601220309_D15",
  "summary" : "tkb_C5_circular_23x_I92782"
}
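The payload above is plain JSON with three fields. A minimal Python sketch of assembling it (serializing the Run Design XML itself is out of scope here; the XML string is passed in):

```python
import json

def run_design_payload(xml_text, name, summary):
    """Assemble the request body for POST /smrt-link/runs: the serialized
    Run Design XML plus a run name and a short summary."""
    return json.dumps({"dataModel": xml_text, "name": name, "summary": summary})
```

The resulting string is sent as the body of the POST request with a JSON content type.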

How to monitor progress of a SMRT Link Run

Run progress can be monitored by looking at the completion status of each Collection associated with that run. Perform the following steps:

1. If you do not have the Run UUID, retrieve it as follows. Get the list of all Runs, using the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

In the response, perform a text search for the Run Name. Find the object whose “name” field is equal to the Run Name, and get the Run UUID, which can be found in the “uniqueId” field.

2. Once you have the Run UUID, get all Collections that belong to the run.

Use the Run UUID in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/{runUUID}/collections

The response will contain the list of all Collections of that run.

3. Monitor Collection status to see when all Collections are complete.

Until all Collections of the Run have the field “status” set to “Complete”, repeat the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/{runUUID}/collections

You may also monitor each Collection individually.

Use the Collection UUID in the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/{runUUID}/collections/{collectionUUID}

4. To monitor Run progress using QC metrics as well, do this at the Collection level, for each Collection that belongs to this run. For instructions, see “How to get QC reports for a particular Collection”.


The full set of QC metrics for a Collection will only be available when the Collection is complete. Monitor the completion status of each Collection and, for each complete Collection, check its QC metrics. The QC metrics of all Collections that belong to the Run will let you evaluate the overall success of the run.

Example

If you want to monitor the Run with Name = “54149_DryRun_2Cells_20161219”, use the following steps:

1. Get the list of all Runs:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

The response will be an array of Run objects, as in the following example (some fields are removed for display purposes):

[{
  "name" : "2016-11-08_3150473_2kLambda_A12",
  "uniqueId" : "97286726-b243-45b3-82f7-8b5f58c56d53",
  "createdAt" : "2016-11-08T17:50:57.955Z",
  "summary" : "lambdaNEB"
}, {
  "name" : "54149_DryRun_2Cells_20161219",
  "uniqueId" : "798ff161-23ee-433a-bfd9-be8361b40f15",
  "createdAt" : "2016-12-19T16:08:41.610Z",
  "summary" : "DryRun_2Cells"
}, {
  "name" : "2017_01_21_A7_RC0_2.5-6kb_DS",
  "uniqueId" : "5026afad-fbfa-407a-924b-f89dd019ca9f",
  "createdAt" : "2017-01-21T00:21:52.534Z",
  "summary" : "gencode_23_transcripts"
}]

2. Search the above response for the object with the “name” field equal to “54149_DryRun_2Cells_20161219”.

From the above example, you will get the Run object with the “uniqueId” field equal to “798ff161-23ee-433a-bfd9-be8361b40f15”.

3. With this Run UUID = 798ff161-23ee-433a-bfd9-be8361b40f15, get all Collections that belong to this run:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/798ff161-23ee-433a-bfd9-be8361b40f15/collections

The response will be an array of Collection objects of this run, as in the following example:

[{
  "name" : "DryRun_1stCell",
  "instrumentName" : "Sequel",
  "context" : "m54149_161219_161247",
  "well" : "A01",
  "status" : "Complete",
  "instrumentId" : "54149",
  "startedAt" : "2016-12-19T16:12:47.014Z",
  "uniqueId" : "7cf74b62-c6b8-431d-b8ae-7e28cfd8343b",
  "collectionPathUri" : "/pbi/collections/314/3140149/r54149_20161219_160902/1_A01",
  "runId" : "798ff161-23ee-433a-bfd9-be8361b40f15",
  "movieMinutes" : 120
}, {
  "name" : "DryRun_2ndCell",
  "instrumentName" : "Sequel",
  "context" : "m54149_161219_184813",
  "well" : "B01",
  "status" : "Ready",
  "instrumentId" : "54149",
  "startedAt" : "2016-12-19T16:12:47.014Z",
  "uniqueId" : "08af5ab4-7cf4-4d13-9bcb-ae977d493f04",
  "collectionPathUri" : "/pbi/collections/314/3140149/r54149_20161219_160902/2_B01",
  "runId" : "798ff161-23ee-433a-bfd9-be8361b40f15",
  "movieMinutes" : 120
}]

In the above example, the first Collection has “status” : “Complete”.

You can take its UUID, i.e. “uniqueId”: “7cf74b62-c6b8-431d-b8ae-7e28cfd8343b”, and get its QC metrics. For instructions, see “How to get QC reports for a particular Collection”.

The second Collection has “status” : “Ready”.

You can take its UUID, i.e. “uniqueId”: “08af5ab4-7cf4-4d13-9bcb-ae977d493f04”, and monitor its status until it becomes “Complete”; use the following API call:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/798ff161-23ee-433a-bfd9-be8361b40f15/collections/08af5ab4-7cf4-4d13-9bcb-ae977d493f04

Once this Collection becomes complete, you can get its QC metrics as well.
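The polling loop described above can be sketched as follows. fetch_collections and sleep_fn are hypothetical callables standing in for the HTTP GET on the collections endpoint and the delay between polls:

```python
def all_collections_complete(collections):
    """True when every Collection of the run reports status 'Complete'."""
    return all(c.get("status") == "Complete" for c in collections)

def poll_until_complete(fetch_collections, sleep_fn, max_polls=10):
    """Repeatedly fetch a run's Collections until all are complete.

    fetch_collections: callable returning the parsed Collections list
    (e.g. a GET on /smrt-link/runs/{runUUID}/collections).
    sleep_fn: called between polls (e.g. a fixed time.sleep delay).
    Returns True if completion was observed within max_polls attempts.
    """
    for _ in range(max_polls):
        if all_collections_complete(fetch_collections()):
            return True
        sleep_fn()
    return False
```

Bounding the number of polls (max_polls) avoids looping forever on a run that never finishes.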

How to capture Run level summary metrics

Run-level summary metrics are captured in the QC reports. See the following sections:

• “How to get QC reports for a particular SMRT Link Run”.

• “How to get QC reports for a particular Collection”.

How to setup a job on a particular Collection

To create a job using the SMRT Link Web Services API, use the POST request with the following endpoint:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/{jobType}

The payload (request body) for this POST request is a JSON whose schema depends on the job type. To specifically create a SMRT Analysis job, you need to create a job of type “pbsmrtpipe”, with the payload as the one shown in “How to setup an SMRT Link Analysis Job for a specific Pipeline”. You need to provide dataset IDs in the “entryPoints” array of the above payload.

Perform the following steps:

1. If you do not have the Collection UUID, retrieve it as follows.

To get the Collection UUID starting from a Run page in the SMRT Link Run QC UI, do the following:

1. Get the Run Name from the Run page in the SMRT Link Run QC UI.

2. Get the list of all Runs, using the GET request with the following endpoint:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

In the response, perform a text search for the Run Name.

Find the object whose “name” field is equal to the Run Name, and get the Run UUID, which can be found in the “uniqueId” field.

Once you have the Run UUID, get all Collections that belong to this Run. Use the Run UUID in the GET request with the following endpoint:


GET http://SMRTLinkServername.domain:9091/smrt-link/runs/{runUUID}/collections

3. From here you can get the UUID of the Collection. It can be found in the “uniqueId” field of the corresponding Collection object from the previous response.

Note: Make sure that the Collection whose “uniqueId” field you take has the field “status” set to “Complete”. This is because obtaining the dataset ID based on the Collection UUID as described below will only work if the Collection is complete. If the Collection is not complete, then the SubreadSet does not exist yet.

2. Find the dataset ID that corresponds to the Collection UUID.

For complete Collections, the Collection UUID will be the same as the UUID of the SubreadSet for that Collection. Use the Collection UUID in the GET request on the following endpoint to get the corresponding SubreadSet object:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/{collectionUUID}

Get the dataset ID from the “id” field of the response.

3. Build the request body with the dataset ID.

Use the dataset ID in the payload as the one shown in “How to setup an SMRT Link Analysis Job for a specific Pipeline”.

4. Create a job of type “pbsmrtpipe”.

Use the request body built in the previous step in the POST request with the following endpoint:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe

Example

Suppose you want to setup a job for complete Collections that belong to the Run with Name = “54149_DryRun_2Cells_20161219”.

First, get the list of all Runs:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs

The response will be an array of Run objects, as in the following example:

[{
  "name" : "2016-11-08_3150473_2kLambda_A12",
  "uniqueId" : "97286726-b243-45b3-82f7-8b5f58c56d53",
  "createdAt" : "2016-11-08T17:50:57.955Z",
  ...
  "summary" : "lambdaNEB"
}, {...}, {
  "name" : "54149_DryRun_2Cells_20161219",
  "uniqueId" : "798ff161-23ee-433a-bfd9-be8361b40f15",
  "createdAt" : "2016-12-19T16:08:41.610Z",
  ...
  "summary" : "DryRun_2Cells"
}, {...}, {
  "name" : "2017_01_21_A7_RC0_2.5-6kb_DS",
  "uniqueId" : "5026afad-fbfa-407a-924b-f89dd019ca9f",
  "createdAt" : "2017-01-21T00:21:52.534Z",
  ...
  "summary" : "gencode_23_transcripts"
}]

Now, search the above response for the object with the “name” field equal to “54149_DryRun_2Cells_20161219”.

From the above example, you will get the Run object with the “uniqueId” field equal to “798ff161-23ee-433a-bfd9-be8361b40f15”.

With this Run UUID = 798ff161-23ee-433a-bfd9-be8361b40f15, get all Collections that belong to this run:

GET http://SMRTLinkServername.domain:9091/smrt-link/runs/798ff161-23ee-433a-bfd9-be8361b40f15/collections

The response will be an array of Collection objects of this run, as in the following example:

[{
  "name" : "DryRun_1stCell",
  "instrumentName" : "Sequel",
  "context" : "m54149_161219_161247",
  "well" : "A01",
  "status" : "Complete",
  "instrumentId" : "54149",
  "startedAt" : "2016-12-19T16:12:47.014Z",
  "uniqueId" : "7cf74b62-c6b8-431d-b8ae-7e28cfd8343b",
  "collectionPathUri" : "/pbi/collections/314/3140149/r54149_20161219_160902/1_A01",
  "runId" : "798ff161-23ee-433a-bfd9-be8361b40f15",
  "movieMinutes" : 120
}, {
  "name" : "DryRun_2ndCell",
  "instrumentName" : "Sequel",
  "context" : "m54149_161219_184813",
  "well" : "B01",
  "status" : "Complete",
  "instrumentId" : "54149",
  "startedAt" : "2016-12-19T16:12:47.014Z",
  "uniqueId" : "08af5ab4-7cf4-4d13-9bcb-ae977d493f04",
  "collectionPathUri" : "/pbi/collections/314/3140149/r54149_20161219_160902/2_B01",
  "runId" : "798ff161-23ee-433a-bfd9-be8361b40f15",
  "movieMinutes" : 120
}]

In the above example, both Collections of the Run have “status” : “Complete”. Hence, the corresponding SubreadSets should already exist, and can be retrieved as described below.

Take the UUID of the first Collection, i.e. “uniqueId”: “7cf74b62-c6b8-431d-b8ae-7e28cfd8343b”, and get the corresponding SubreadSet object:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/7cf74b62-c6b8-431d-b8ae-7e28cfd8343b

The response will be a SubreadSet object, as in the following example:

{
  "name" : "54149_DryRun_2Cells_20161219",
  "uuid" : "7cf74b62-c6b8-431d-b8ae-7e28cfd8343b",
  "id" : 5164,
  "createdAt" : "2016-12-19T19:20:46.968Z",
  "path" : "/pbi/collections/314/3140149/r54149_20161219_160902/1_A01/m54149_161247.subreadset.xml",
  "tags" : "subreadset",
  "instrumentName" : "Sequel",
  "wellExampleName" : "DryRun_1stCell",
  "runName" : "54149_DryRun_2Cells_20161219",
  "datasetType" : "PacBio.DataSet.SubreadSet",
  "comments" : " "
}

From the above response, take the value of the “id” field, which is 5164 in the above example. So dataset ID = 5164 will be the value for the first entry point for the ‘pbsmrtpipe’ job.

Now take the UUID of the second Collection, i.e. “uniqueId”: “08af5ab4-7cf4-4d13-9bcb-ae977d493f04”, and get the corresponding SubreadSet object:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads/08af5ab4-7cf4-4d13-9bcb-ae977d493f04

The response will be a SubreadSet object, as in the following example:

{
  "name" : "54149_DryRun_2Cells_20161219",
  "uuid" : "08af5ab4-7cf4-4d13-9bcb-ae977d493f04",
  "id" : 5165,
  "createdAt" : "2016-12-19T21:57:11.173Z",
  "path" : "/pbi/collections/314/3140149/r54149_20161219_160902/2_B01/m54149_184813.subreadset.xml",
  "tags" : "subreadset",
  "instrumentName" : "Sequel",
  "wellExampleName" : "DryRun_2ndCell",
  "runName" : "54149_DryRun_2Cells_20161219",
  "datasetType" : "PacBio.DataSet.SubreadSet",
  "comments" : " "
}

From the response, again take the value of the “id” field, which is 5165 in the above example. So dataset ID = 5165 will be the value for the second entry point for the ‘pbsmrtpipe’ job.

Build the request body for creating the ‘pbsmrtpipe’ job. Use the two dataset IDs obtained above as the values of the “datasetId” fields in the “entryPoints” array. For example:

{
  "name" : "A4_All4mer_1hr_launchChem",
  "entryPoints" : [
    {
      "entryId" : "eid_subread",
      "fileTypeId" : "PacBio.DataSet.SubreadSet",
      "datasetId" : 5164
    },
    {
      "entryId" : "eid_subread2",
      "fileTypeId" : "PacBio.DataSet.SubreadSet",
      "datasetId" : 5165
    }
  ],
  "workflowOptions" : [],
  "taskOptions" : [
    {
      "optionId" : "genomic_consensus.task_options.algorithm",
      "value" : "quiver",
      "optionTypeId" : "pbsmrtpipe.option_types.string"
    }
  ],
  "pipelineId" : "pbsmrtpipe.pipelines.sa3_resequencing"
}

Now create a job of type “pbsmrtpipe”. Use the request body built above in the following API call:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe

Verify that the job was created successfully. The return HTTP status should be 201 Created.
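Assembling the request body from a list of dataset IDs can be automated. A Python sketch that generalizes the example payload; the entry-ID naming (“eid_subread”, “eid_subread2”, ...) follows the example above and is otherwise an assumption:

```python
def pbsmrtpipe_payload(name, pipeline_id, subread_dataset_ids):
    """Build a request body for POST .../job-manager/jobs/pbsmrtpipe with
    one SubreadSet entry point per dataset ID, mirroring the example above."""
    entry_points = []
    for i, ds_id in enumerate(subread_dataset_ids):
        # First entry point is 'eid_subread', subsequent ones get a suffix.
        suffix = "" if i == 0 else str(i + 1)
        entry_points.append({
            "entryId": f"eid_subread{suffix}",
            "fileTypeId": "PacBio.DataSet.SubreadSet",
            "datasetId": ds_id,
        })
    return {
        "name": name,
        "entryPoints": entry_points,
        "workflowOptions": [],
        "taskOptions": [],
        "pipelineId": pipeline_id,
    }
```

The returned dict is serialized to JSON and sent as the POST body; taskOptions can be filled in as in the example above.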

How to delete a SMRT Link Job

To delete a job, you need to create another job of type “delete-job”, and pass the UUID of the job to delete in the payload (a.k.a. request body).

Perform the following steps:

1. Build the payload for the POST request as a JSON with the following fields:

• jobId: The UUID of the job to be deleted.

• removeFiles: A boolean flag specifying whether to remove files associated with the job being deleted.

• dryRun: A boolean flag that allows checking whether it is safe to delete the job prior to actually deleting it.

Note: If you want to make sure that it is safe to delete the job (there is no other piece of data dependent on the job being deleted), then first set the “dryRun” field to ‘true’ and perform the API call described in Step 2 below. If the call succeeds, meaning that the job can be safely deleted, set the “dryRun” field to ‘false’ and repeat the same API call again, as described in Step 3 below.

2. Check whether the job can be deleted, without actually changing anything in the database or on disk.

Create a job of type “delete-job” with the payload which has dryRun = true; use the POST request with the following endpoint:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/delete-job

3. If the previous API call succeeded (that is, the job may be safely deleted), then proceed with actually deleting the job.

Create a job of type “delete-job” with the payload which has dryRun = false; use the POST request with the following endpoint:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/delete-job

Example

Suppose you want to delete the job with UUID = 13957a79-1bbb-44ea-83f3-6c0595bf0d42. Define the payload as in the following example, and set the “dryRun” field in it to ‘true’:

{
  "jobId" : "13957a79-1bbb-44ea-83f3-6c0595bf0d42",
  "removeFiles" : true,
  "dryRun" : true
}


Create a job of type “delete-job”, using the above payload in the following POST request:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/delete-job

Verify that the response status is 201: Created.

Also notice that the response body contains JSON corresponding to the job to be deleted, as in the following example:

{
  "name" : "Job merge-datasets",
  "uuid" : "13957a79-1bbb-44ea-83f3-6c0595bf0d42",
  "jobTypeId" : "merge-datasets",
  "id" : 53,
  "createdAt" : "2016-01-29T00:09:58.462Z",
  ...
  "comment" : "Merging Datasets MergeDataSetOptions(PacBio.DataSet.SubreadSet, Auto-merged subreads @1454026198403)"
}

Define the payload as in the following example, and this time set the “dryRun” field to ‘false’, to actually delete the job:

{
  "jobId" : "13957a79-1bbb-44ea-83f3-6c0595bf0d42",
  "removeFiles" : true,
  "dryRun" : false
}

Create a job of type “delete-job”, using the above payload in the following POST request:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/delete-job

Verify that the response status is 201: Created. Notice that this time the response body contains JSON corresponding to the job of type “delete-job”, as in the following example:

{
  "name" : "Job delete-job",
  "uuid" : "1f60c976-e426-43b5-8ced-f8139de6ceff",
  "jobTypeId" : "delete-job",
  "id" : 7666,
  "createdAt" : "2017-03-09T11:51:38.828-08:00",
  ...
  "comment" : "Deleting job 13957a79-1bbb-44ea-83f3-6c0595bf0d42"
}
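The two-phase dry-run/delete flow uses the same payload with one flag flipped. A minimal Python sketch of building that payload:

```python
def delete_job_payload(job_uuid, remove_files=True, dry_run=True):
    """Payload for POST .../job-manager/jobs/delete-job. Call once with
    dry_run=True to verify the job can be deleted safely, then again
    with dry_run=False to actually delete it."""
    return {"jobId": job_uuid, "removeFiles": remove_files, "dryRun": dry_run}
```

Sending the dry-run payload first means a failed safety check never touches the database or the files on disk.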

How to setup a SMRT Link Analysis Job for a specific Pipeline

To create an analysis job for a specific pipeline, you need to create a job of type “pbsmrtpipe” with the payload based on the template of the desired pipeline. Perform the following steps:

1. Get the list of all pipeline templates used for creating analysis jobs:

GET http://SMRTLinkServername.domain:9091/smrt-link/resolved-pipeline-templates

2. In the response, search for the name of the specific pipeline that you want to set up. Once the desired template is found, note the values of the pipeline “id” and “entryPoints” elements of that template.


3. Get the list of datasets that corresponds to the type specified in the first element of the “entryPoints” array. For example, for the type “fileTypeId” : “PacBio.DataSet.SubreadSet”, get the list of “subreads” datasets:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads

4. Repeat step 3 for the dataset types specified in the remaining elements of the “entryPoints” array.

5. From the lists of datasets obtained in steps 3 and 4, select the IDs of the datasets that you want to use as entry points for the pipeline you are about to set up.

6. Build the request body for creating a job of type “pbsmrtpipe”. The basic structure looks like this:

{
    "entryPoints": [
        {
            "datasetId": 2,
            "entryId": "eid_subread",
            "fileTypeId": "PacBio.DataSet.SubreadSet"
        },
        {
            "datasetId": 1,
            "entryId": "eid_ref_dataset",
            "fileTypeId": "PacBio.DataSet.ReferenceSet"
        }
    ],
    "name": "Lambda SAT job",
    "pipelineId": "pbsmrtpipe.pipelines.sa3_sat",
    "taskOptions": [],
    "workflowOptions": []
}

Use the pipeline “id” found in step 2 as the value for the “pipelineId” element.

Use the dataset types of the “entryPoints” array found in step 2, and the corresponding dataset IDs found in step 5, as the values for the elements of the “entryPoints” array.

Note that the “taskOptions” array is optional and may be completely empty in the request body.
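The request body described above can be assembled programmatically. The helper below is an illustrative sketch (the function name and the tuple-based argument format are conventions chosen here, not part of the API); it produces the same structure as the example body:

```python
def pbsmrtpipe_job_payload(name, pipeline_id, entry_points,
                           task_options=None, workflow_options=None):
    """Build the request body for a job of type "pbsmrtpipe".

    entry_points is a list of (entry_id, file_type_id, dataset_id) tuples,
    pairing the entry points from the pipeline template (step 2) with the
    dataset IDs chosen in step 5.
    """
    return {
        "name": name,
        "pipelineId": pipeline_id,
        "entryPoints": [
            {"entryId": eid, "fileTypeId": ftid, "datasetId": dsid}
            for eid, ftid, dsid in entry_points
        ],
        # "taskOptions" is optional and may be empty.
        "taskOptions": task_options or [],
        "workflowOptions": workflow_options or [],
    }

payload = pbsmrtpipe_job_payload(
    "Lambda SAT job",
    "pbsmrtpipe.pipelines.sa3_sat",
    [("eid_subread", "PacBio.DataSet.SubreadSet", 2),
     ("eid_ref_dataset", "PacBio.DataSet.ReferenceSet", 1)],
)
```

The resulting dictionary, serialized as JSON, is what gets POSTed to the pbsmrtpipe job endpoint in step 7.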

7. Create a job of type “pbsmrtpipe”.

Use the request body built in the previous step in the POST request with the following endpoint:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe

8. You may monitor the state of the job created in step 7 with the following request:

GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe/{jobID}/events

Where jobID is equal to the value of the “id” element of the response received in step 7.
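Monitoring usually means polling until the job reaches a terminal state. The sketch below shows the polling logic only; the set of terminal state names is an assumption for illustration ("FAILED" appears in this document's query examples, while "SUCCESSFUL" and "TERMINATED" are assumed), and `fetch_state` stands in for an HTTP GET on the job's status.

```python
import time

# Assumed terminal states; adjust to the states your server actually reports.
TERMINAL_STATES = {"SUCCESSFUL", "FAILED", "TERMINATED"}

def is_terminal(job_state):
    """Return True when a job has finished and polling can stop."""
    return job_state in TERMINAL_STATES

def poll_until_done(fetch_state, interval_sec=10, max_polls=360):
    """Call fetch_state() (e.g. a GET on the job endpoint, returning the
    job's state string) until a terminal state is seen or the poll budget
    runs out; returns the final state, or None on timeout."""
    for _ in range(max_polls):
        state = fetch_state()
        if is_terminal(state):
            return state
        time.sleep(interval_sec)
    return None

# Simulated job that succeeds on the third poll:
states = iter(["CREATED", "RUNNING", "SUCCESSFUL"])
result = poll_until_done(lambda: next(states), interval_sec=0)
```

A fixed poll interval with an overall budget keeps a long-running job from turning the monitor into an unbounded loop.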

Example

Suppose you want to set up an analysis job for the Resequencing pipeline.

First, get the list of all pipeline templates used for creating analysis jobs:

GET http://SMRTLinkServername.domain:9091/smrt-link/resolved-pipeline-templates

The response will be an array of pipeline template objects. In this response, search for the entry with “name” : “Resequencing”. The entry may look as in the following example:


{
    "name" : "Resequencing",
    "id" : "pbsmrtpipe.pipelines.sa3_ds_resequencing_fat",
    "description" : "Full Resequencing Pipeline - Blasr mapping and Genomic Consensus.",
    "version" : "0.1.0",
    "entryPoints" : [
        {
            "entryId" : "eid_subread",
            "fileTypeId" : "PacBio.DataSet.SubreadSet",
            "name" : "Entry Name: PacBio.DataSet.SubreadSet"
        },
        {
            "entryId" : "eid_ref_dataset",
            "fileTypeId" : "PacBio.DataSet.ReferenceSet",
            "name" : "Entry Name: PacBio.DataSet.ReferenceSet"
        }
    ],
    "tags" : [ "consensus", "reports" ],
    "taskOptions" : [
        {
            "optionTypeId" : "choice_string",
            "name" : "Algorithm",
            "choices" : ["quiver", "arrow", "plurality", "poa", "best"],
            "description" : "Variant calling algorithm",
            "id" : "genomic_consensus.task_options.algorithm",
            "default" : "best"
        }
    ]
}

In the above entry, take the value of the pipeline “id” : “pbsmrtpipe.pipelines.sa3_ds_resequencing_fat”.

Also, take the dataset types from the “entryPoints” elements: “fileTypeId” : “PacBio.DataSet.SubreadSet” and “fileTypeId” : “PacBio.DataSet.ReferenceSet”.
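Pulling the pipeline “id” and the entry-point types out of a template entry is a simple traversal of the JSON shown above. The helper below is an illustrative sketch (the function name is a convention chosen here), applied to a trimmed-down copy of the Resequencing template:

```python
def pipeline_entry_types(template):
    """Given one resolved-pipeline-template entry (decoded JSON), return
    its pipeline id and the ordered (entryId, fileTypeId) pairs."""
    pairs = [(ep["entryId"], ep["fileTypeId"]) for ep in template["entryPoints"]]
    return template["id"], pairs

# Trimmed-down version of the Resequencing template entry above:
template = {
    "name": "Resequencing",
    "id": "pbsmrtpipe.pipelines.sa3_ds_resequencing_fat",
    "entryPoints": [
        {"entryId": "eid_subread", "fileTypeId": "PacBio.DataSet.SubreadSet"},
        {"entryId": "eid_ref_dataset", "fileTypeId": "PacBio.DataSet.ReferenceSet"},
    ],
}
pipeline_id, entry_types = pipeline_entry_types(template)
```

The returned pairs tell you which dataset list endpoint to query for each entry point in the next step.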

Now, get the lists of the datasets that correspond to the types specified in the elements of the “entryPoints” array.

In particular, for the type “fileTypeId” : “PacBio.DataSet.SubreadSet”, get the list of “subreads” datasets:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/subreads

And for the type “fileTypeId” : “PacBio.DataSet.ReferenceSet”, get the list of “references” datasets:

GET http://SMRTLinkServername.domain:9091/smrt-link/datasets/references

From the above lists of datasets, select the IDs of the datasets that you want to use as entry points for the Resequencing pipeline you are about to set up.
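Picking dataset IDs out of the returned lists can be done with a simple filter. The sketch below assumes the dataset list decodes to an array of objects with "id" and "name" fields (as in the examples in this document); the sample names are invented for illustration:

```python
def dataset_ids_by_name(datasets, name_fragment):
    """Return the ids of datasets whose name contains name_fragment.

    datasets is the decoded JSON array returned by a dataset list
    endpoint such as GET .../smrt-link/datasets/subreads.
    """
    return [d["id"] for d in datasets if name_fragment in d.get("name", "")]

# Hypothetical subreads list for illustration:
subreads = [
    {"id": 17, "name": "Old run"},
    {"id": 18, "name": "lambda/007 subreads"},
]
chosen = dataset_ids_by_name(subreads, "lambda")
```

Filtering by name fragment is just one convenient way to narrow the list before choosing the IDs to plug into the “entryPoints” array.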

For example, take the dataset with “id”: 18 from the “subreads” list and the dataset with “id”: 2 from the “references” list.

Build the request body for creating a “pbsmrtpipe” job for the Resequencing pipeline.

Use the pipeline “id” obtained above as the value for the “pipelineId” element.

Use the two dataset IDs obtained above as the values of the “datasetId” fields in the “entryPoints” array. For example:

{
    "pipelineId" : "pbsmrtpipe.pipelines.sa3_ds_resequencing_fat",
    "entryPoints" : [
        {
            "entryId" : "eid_subread",
            "fileTypeId" : "PacBio.DataSet.SubreadSet",
            "datasetId" : 18
        },
        {
            "entryId" : "eid_ref_dataset",
            "fileTypeId" : "PacBio.DataSet.ReferenceSet",
            "datasetId" : 2
        }
    ],
    "taskOptions" : [],
    "workflowOptions" : [],
    "name" : "My Resequencing Job"
}

Now create a job of type “pbsmrtpipe”.

Use the request body built above in the following API call:

POST http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe

Verify that the job was created successfully. The returned HTTP status should be 201: Created.

Querying Job History

The job service endpoints provide a number of search criteria (plus paging support) that can be used to limit the returned results. A full list of available search criteria is provided in the JSON Swagger API definition for the jobs endpoint. The following search retrieves all failed Site Acceptance Test (SAT) pipeline jobs:

GET http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe?state=FAILED&subJobTypeId=pbsmrtpipe.pipelines.sa3_sat

For most datatypes, additional operators besides equality are allowed. For example, a query can combine a date operator with a creator filter to retrieve all pbsmrtpipe jobs run before 2018-03-01 by a user with the login ID “myusername”. (Note that certain searches, especially partial text searches using like:, may be significantly slower to execute and can overload the server if called too frequently.)
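A filtered query URL of this kind can be assembled with standard URL encoding. The sketch below is a hedged illustration: the field names "createdAt" and "createdBy" and the "lt:" operator prefix are assumptions chosen for this example (only "like:" and the "createdAt" job field appear elsewhere in this document), so check the Swagger definition for the names and operators your server actually accepts.

```python
from urllib.parse import urlencode

BASE = "http://SMRTLinkServername.domain:9091/smrt-link/job-manager/jobs/pbsmrtpipe"

# Assumed search criteria: jobs created before 2018-03-01 by "myusername".
params = {
    "createdAt": "lt:2018-03-01T00:00:00.000Z",  # "lt:" prefix is an assumption
    "createdBy": "myusername",                   # field name is an assumption
}
url = BASE + "?" + urlencode(params)  # percent-encodes the ":" in the operator
```

Building the query string with `urlencode` avoids hand-escaping the operator prefixes and timestamps.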

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2015 - 2017, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacb.com/legal-and-trademarks/product-license-and-use-restrictions/.

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science, Inc. NGS-go and NGSengine are trademarks of GenDx. FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies. All other trademarks are the sole property of their respective owners.

P/N 100-855-900-04


CHAPTER 11

Disclaimer

For Research Use Only. Not for use in diagnostic procedures. Copyright 2017, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacb.com/legal-and-trademarks/product-license-and-use-restrictions/.

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, Iso-Seq and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science, Inc. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.
