Center for Research Informatics
2012-2013
University of Chicago
ANNUAL REPORT
The Center for Research Informatics (CRI) was set up two years ago in August 2011
to provide services and resources to support biomedical informatics. The CRI views
biomedical informatics very broadly to include bioinformatics, clinical informatics,
translational informatics, and health care informatics. As is detailed in this report,
during the past two years, the CRI has set up a Clinical Research Data Warehouse and
a Bioinformatics Core, updated the BSD’s high-performance computing cluster, and
made secure and compliant storage and computing resources available to every BSD
researcher.
The Office of the CRIO has also set up a governance structure, bringing together BSD
and UCM leadership and experts in information systems, patient privacy, and a variety
of research fields to guide us in making long-term decisions that serve our researchers,
comply with all relevant laws and policies, and protect our patients’ data.
In addition, the Office of the CRIO has supported initiatives such as setting up a secure
and compliant computing infrastructure so that researchers can more quickly and easily
analyze large-scale genomic datasets.
As we move forward and continue to grow, it is important that we hear from faculty
so that we can provide the services, resources, and education that are important to
you. I am very interested in hearing from every BSD researcher to make sure that your
needs are met. You can contact me directly or talk to any member of the BSD Research
Informatics Oversight Committee (you can find their names on the CRI website and on
page 74 of this report).
We look forward to hearing from you.
Robert Grossman, Ph.D.
Message from the BSD Chief Research Informatics Officer
TABLE OF CONTENTS
Who We Are
Clinical and Translational Informatics
Bioinformatics Core
Systems and Security
Training and Education
Faculty Oversight and Governance
Looking Ahead
Our Partners
Appendix
WHO WE ARE
A Letter from the Director
As a physician scientist and informaticist, I know firsthand the problems facing basic
researchers, translational medicine specialists, and clinicians. Deficiencies in any part of
the pipeline can have serious downstream effects. A productive translational research
operation requires a solid and secure infrastructure, easy access to powerful analytical
tools and high-performance computing, expertise in complex study design and data
analysis, and the ability to create and implement platforms for collecting, storing,
studying, and presenting research data. When the Center for Research Informatics was
established two years ago, the path from the bench to the bedside at the University of
Chicago was fragmented and inefficient, serving neither basic scientists nor clinicians
well. In the short time since, the CRI has grown into an active and versatile group of
professionals, capable of executing on a wide range of technologies to enable
world-class biomedical research.
To support these endeavors, we have built an industrial-grade, HIPAA-compliant secure
infrastructure, able to store and compute over even the largest and most complex
datasets. Every research group, from the basic sciences to clinical faculty, has access
to the storage and computing resources provided by the CRI. The Bioinformatics Core
takes on the most intricate and complicated data analysis tasks, working closely with
investigators to advise on data collection, perform complex computations, and render
advanced interpretation of the results. Nearly one hundred groups have taken advantage
of the Core so far, and the number of grants and papers directly resulting from our
assistance is growing every day. Our Clinical and Translational Group provides
state-of-the-art custom application development for investigators in need of specialized data
collection or other assistance with their clinical studies.
The crowning achievement of the Center’s first two years has been the design, building,
and implementation of a robust Clinical Research Data Warehouse. Starting with just
a data feed from the Centricity billing system, the CRDW now boasts data from over
600,000 patients in an easily searchable system that allows researchers to quickly
determine cohort size for their studies. The CRI has the BSD’s only system capable of
performing complex queries to study quality and other important clinical metrics.
Looking ahead, we have many exciting plans for the CRI. In 2014, we will release a
feature-rich system for accessing de-identified patient data in the CRDW. We will
implement a full-text searchable system for querying pathology and radiology reports.
We will further expand our computational infrastructure, doubling our HPC power
and increasing our storage capacity. We continue to develop, test, and roll out new
bioinformatics methods, making these available to researchers through our core service
offerings and as self-service workflows on our Galaxy platform.
We hope that everyone in the BSD will take advantage of the array of resources offered
by the CRI. We look forward to continuing our mission of providing advanced informatics
support to enable world-class biomedical research throughout the BSD.
Samuel Volchenboum, MD, PhD
Director and Associate CRIO Samuel Volchenboum, MD, PhD
our leadership
Sam has been a part of the CRI since May 2012 in his role as Associate Chief Research Informatics Officer, leading our faculty outreach and education efforts. In April 2013 he was appointed Director of the CRI and now leads our operations and strategic planning. In addition to his work in the CRI, Sam serves the Department of Pediatrics as Assistant Professor, is an Associate Director of the Institute for Translational Medicine, and is a Faculty Fellow in the Computation Institute. His research includes using proteomics to study neuroblastoma, a pediatric solid tumor; developing software to facilitate real-time mass spectrometry peptide identification; and creating tools to improve provider communication and patient care.
cri.uchicago.edu
About the CRI
The Center for Research Informatics (CRI) was created in 2011 to support the University of Chicago Biological Sciences Division (BSD) research community. We offer state-of-the-art, standards-compliant technologies for the acquisition, management, and storage of clinical, translational, and basic research data. Our resources and services are open to all members of the BSD; users include students, postdoctoral fellows, technicians, staff, faculty, researchers, and collaborators. In addition, we support research and education in informatics and work with the Institute for Translational Medicine (ITM), Clinical and Translational Science Awards (CTSA) program, and other partners on joint initiatives. Finally, we are strong advocates for informatics within our research communities.
How We’ve Grown: A Timeline
Since 2011, when the Office of the CRIO was created to guide research informatics efforts across the BSD, the Center for Research Informatics has grown into a robust service organization. In our first year, we developed a mission, hired several of the Directors who would lead our major initiatives, and began work on key projects. Since early 2012, we’ve seen several important projects come to fruition: a functioning and growing Clinical Research Data Warehouse, with faculty data requests being evaluated by our Data Use Committee and fulfilled by our staff; a Bioinformatics Core providing pipelines, consulting, training, and other services; and an improved, HIPAA-compliant computing and storage infrastructure for researchers.
As we work to improve and expand our existing services, we will continue to reach out to faculty to increase user adoption and develop new initiatives. We look forward to seeing where the next year of our timeline will bring us.
August 2011
The CRI is created as a central organization for informatics service, research, and education.
Robert Grossman is appointed CRIO.
February 2011
The Office of the CRIO is created with a charge of overseeing and directing research informatics across the BSD.
The CRI’s services and resources are provided by three core groups. The Clinical and Translational Informatics team manages our Clinical Research Data Warehouse (CRDW), provides custom programming for initiatives in clinical research, and supports data management for clinical trials. The Bioinformatics Core provides analysis pipelines, consulting services, and other expertise for researchers working with genomic data. The Systems and Security team maintains and improves our scientific computing infrastructure, provides technical support to users, and ensures our technical compliance with appropriate security regulations. All three of these groups, along with our administration, governance committees, and strategic partners, work together toward our goal of enabling world-class research in a secure environment.
Don Saner joins the CRI as Director of Clinical and Translational Informatics.
October 2011
The BSD Research Informatics Governance Structure is approved by the BSD Dean’s Office.
Hannah Lawrence joins the CRI as Executive Administrator.
OUR MISSION
The CRI’s mission is to provide informatics resources and services to BSD faculty, to
support high-quality biomedical and clinical research, and to promote research and
education in informatics.
Jorge Andrade joins the CRI as Director of Bioinformatics.
May 2012
Sam Volchenboum is appointed Associate Chief Research Informatics Officer.
February 2012
The IRB protocol for the CRDW is approved, and the system goes live for BSD researchers.
The CRI is an initiative developed and managed by the Chief Research Informatics Officer (CRIO) and Associate CRIO. The Office of the CRIO is responsible for advising the Dean and the Dean for Research and Graduate Education of the Biological Sciences Division on the BSD’s investment in research informatics, for managing research informatics services and resources, for creating new research informatics initiatives, and for operating a Research Informatics Governance Structure.
Chief Research Informatics Officer Robert Grossman, PhD
As Chief Research Informatics Officer, Bob guides informatics activities and initiatives across the BSD, including providing strategic direction and oversight for the CRI. In addition to his role as CRIO, Bob is a Senior Fellow in the Institute for Genomics and Systems Biology, a Senior Fellow in the Computation Institute, and a Professor of Medicine in the Section of Genetic Medicine. His research group focuses on big data, biomedical informatics, data science, cloud computing, and related areas. Bob served as the first Director of the CRI from August 2011 to April 2013.
our leadership
September 2012
The Bioinformatics Core makes its core informatics pipelines available to researchers.
The Bioinformatics Core begins offering services.
October 2012
The cohort discovery tool i2b2 is released to facilitate queries of the CRDW for all BSD researchers.
The CRI’s administrative team: Caitlin Pike, Michael Daus, and Hannah Lawrence
November 2012
The CRI makes available a new, HIPAA-compliant HPC cluster, storage and backup resources, and other computing infrastructure.
April 2013
Sam Volchenboum is appointed Director of the CRI.
Plamen Martinov joins the CRI as Director of Systems and Security.
Executive Administrator Hannah Lawrence
As Executive Administrator of the CRI, Hannah is responsible for planning and oversight of our financial and administrative functions, including coordination across service areas, management of governance committees, communications, and project management for CRI initiatives. She works closely with our leadership team to develop short- and long-term organizational plans, manages our budget and hiring process, and provides other administrative and strategic support. Prior to joining the CRI, she served as Strategist and Planner in the Office of the Dean of the BSD. There, she was responsible for organizing the Informatics Advisory Group that ultimately led to the Dean’s decision to appoint a CRIO and create the CRI. She also served as the first administrative manager of the BSD’s Faculty Advisory Committee.
our leadership
Our Organization
For a detailed list of CRI employees, please see Appendix A.
• Timothy Holper, Manager of CRDW Development
• Brian Furner, Manager of Programming
• Seong Choi, Keith Danahey, and Kevin Le, Programmers
• Julissa Acevedo, Business Systems Analyst
• Luis Maciel, Database Administrator
• Tiffany Cyrus, Project Manager
• Riyue Bao, Elizabeth Bartom, Kyle Hernandez, Lei Huang, Jianpeng Xu, and Chunling Zhang, Bioinformaticians
• Wenjun Kang, Scientific Programmer
• Hannah Lawrence, Executive Administrator
• Don Saner, Director of Clinical and Translational Informatics
• Jorge Andrade, Director of Bioinformatics
• Plamen Martinov, Director of Systems and Security
• Sam Volchenboum, Associate CRIO & Director of the CRI
• Robert Grossman, Chief Research Informatics Officer
• Caitlin Pike, Communication Specialist
• Michael Daus, Administrative Specialist
• Andy Brook, Beth Lynn Eicher, Michael Jarsulic, Sneha Jha, and Olumide Kehinde, Systems Administrators
• Bruce Thompson, Security Analyst
• Brad Orr, Senior Project Manager
CLINICAL & TRANSLATIONAL INFORMATICS
The CRDW
Since the CRI’s beginnings, one of the central projects of the Clinical and Translational team has been to build, populate, and maintain the Clinical Research Data Warehouse. Over the past two years, the team has seen this initiative develop from a concept to a functioning and growing data warehouse. The IRB protocol outlining its standards, governance, and oversight was approved in February 2012, and the team began fulfilling data requests for researchers in May 2012. The CRDW incorporates six years’ worth of data from electronic medical records and patient billing, including lab values, procedure and diagnosis codes, demographics, medications, and visit information. The Clinical and Translational team continues to work to expand the amount and types of data available; they are currently engaged in integrating radiology and pathology notes and discharge summaries.
Director of Clinical and Translational Informatics Don Saner
our leadership
Over the past two years, Don has led his team through the process of building and developing the Clinical Research Data Warehouse. In addition, he provides informatics leadership and support for other clinical and translational research projects in conjunction with the Institute for Translational Medicine and our other partners. With over 20 years of experience at the University of Chicago, Don joined the CRI as one of our first staff members in 2011.
As of August 2013, the CRDW contains¹:
• 607,000 patients
• 6.1 million encounters
• 16.1 million medications
• 36.8 million procedures
• 93 million labs
• 13.9 million diagnoses
¹ Please note that some numbers regarding the data housed in the CRDW differ from those listed in the CRI’s 2012 Annual Report due to changes in how these data are measured. (1) The number of patients reported here represents only those patients who have encounter data associated with them, while last year’s report included all patients regardless of encounter data. (2) Billing records where debits are canceled out by credits (for example, when a procedure is ordered but never performed) are no longer listed here. (3) This year’s report no longer includes “orphaned” records that were previously included in CRDW summary counts but cannot be returned via i2b2 or in custom data requests.
i2b2 and Data Requests
To interact with the CRDW, researchers use a datamart interface. The first datamart implemented by the CRI, i2b2, went live for users in June 2012. i2b2, or “Informatics for Integrating Biology and the Bedside,” is an NIH-funded open-source project created by a National Center for Biomedical Computing based at Partners HealthCare System. i2b2 is designed for cohort identification, allowing researchers to query the CRDW for sets of patients meeting search criteria.
Applications and benefits of the i2b2 interface include:
• Helping investigators create new research hypotheses
• Identifying potential cohorts for clinical trials
• Reducing the time researchers must spend on discovery of research cohorts, study feasibility, and subject recruitment
• Familiarizing researchers with the standard terminologies and data that reside in the CRDW
Researchers log into i2b2 using BSD or hospital credentials and can then explore the CRDW using an intuitive drag-and-drop interface. Queries may be created with multiple and/or logic, using search terms based on standard terminologies. The information returned by the system allows researchers to determine the number of patients meeting their criteria. These results can inform subsequent requests for full datasets with either de-identified data or IRB-approved protected health information. To date, almost one hundred users have created and executed over one thousand searches.
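The and/or logic of a cohort query can be pictured as set operations on patient identifiers. The following is a minimal Python sketch under stated assumptions, not a description of how i2b2 is implemented: the concept names, the `CONCEPT_INDEX` mapping, and the `cohort_size` helper are hypothetical, and the patient IDs are made-up toy values. Terms inside a group are combined with OR, and the groups are then combined with AND.

```python
# Hypothetical toy index: each concept maps to the set of patient IDs that carry it.
# Neither the concept names nor the IDs come from the CRDW.
CONCEPT_INDEX = {
    "dx:diabetes": {1, 2, 3, 5},
    "dx:hypertension": {2, 3, 4},
    "rx:metformin": {1, 3, 5, 6},
}


def cohort_size(groups, index=CONCEPT_INDEX):
    """Count patients who match every group (AND) through any term in it (OR)."""
    cohort = None
    for terms in groups:
        matched = set()
        for term in terms:
            # OR within a group: union of the patients for each term.
            matched |= index.get(term, set())
        # AND across groups: intersect with what has matched so far.
        cohort = matched if cohort is None else cohort & matched
    return 0 if cohort is None else len(cohort)


# (diabetes OR hypertension) AND metformin
size = cohort_size([["dx:diabetes", "dx:hypertension"], ["rx:metformin"]])
```

Like the results described above, the sketch returns only a patient count, which is the information needed to judge feasibility before requesting a full dataset.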
Requests for CRDW data are fulfilled by the Clinical and Translational team, under the oversight of the Data Use and Technical Policy Committees. Researchers submit requests using a simple online form, providing information about the scientific purpose of the requested data. To protect the University’s data and comply with patient privacy laws and our IRB protocol, the Data Use Committee monitors data requests to ensure appropriate use of CRDW data. (For more information about this committee, see page 63.) In addition, the CRI acts as an Honest Broker service for researchers, integrating data from different sources such as the Cancer Registry and Epic and removing identifiers when necessary to protect patient privacy. Since May 2012, the Clinical and Translational team has fulfilled 92 data requests of varying size and complexity.
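The honest-broker idea of linking records from different sources and then removing identifiers can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the CRI’s procedure: the field names, the `DIRECT_IDENTIFIERS` set, the demo key, and the helper functions are all hypothetical, and the sketch makes no claim to satisfy any regulatory de-identification standard. It replaces the medical record number with a keyed hash so that records for the same patient can still be joined after the direct identifiers are dropped.

```python
import hashlib
import hmac

# Hypothetical field names; a real project would follow its IRB protocol.
DIRECT_IDENTIFIERS = {"name", "mrn", "phone"}


def pseudonym(mrn, secret_key):
    """Derive a stable study ID from an MRN with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, str(mrn).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Return a copy of the record without direct identifiers, plus a study ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["study_id"] = pseudonym(record["mrn"], secret_key)
    return cleaned


def merge_sources(records_a, records_b, secret_key):
    """Combine de-identified records from two sources, keyed by study ID."""
    merged = {}
    for record in list(records_a) + list(records_b):
        clean = deidentify(record, secret_key)
        merged.setdefault(clean["study_id"], {}).update(clean)
    return merged


# Made-up example rows standing in for a registry extract and an EMR extract.
KEY = b"demo-key-not-for-real-use"
registry = [{"mrn": "1001", "name": "Pat Example", "stage": "II"}]
emr = [{"mrn": "1001", "phone": "555-0100", "encounters": 4}]
combined = merge_sources(registry, emr, KEY)
```

In this sketch only the broker holds the key, so a researcher receiving `combined` sees one merged row per patient but cannot recover the original record numbers from the study IDs.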
[Figure: Data Request Status Summary (as of August 2013). Status categories: completed, in progress, on hold, awaiting IRB, awaiting user, not approved.]
As the demand for clinical data for research purposes continues to grow, future development of the CRDW will include the creation of a fully de-identified datamart for research, which will allow investigators to access and query de-identified data within a secure data zone. In addition, the continued development of the CRDW includes the incorporation of additional data elements, which have been prioritized by the Research Informatics Governance Committee. These elements include the Cancer Registry; radiology and pathology notes and discharge summaries; other data elements from Epic’s Clarity data warehouse; and research-specific databases including Velos and REDCap.
[Figure: Total data requests by department: Family Medicine, Human Genetics, Medicine, Neurology, Obstetrics & Gynecology, Orthopaedics, Pathology, Pediatrics, Psychiatry, Radiology, and Surgery.]
Who uses the CRDW? Since May 2012, our Clinical and Translational Informatics team has processed data requests from 11 different University departments.
Biobanking Management
The CRI maintains an instance of caTissue, a robust open-source biobanking management system used by labs to organize freezers and track samples. In addition to the standard deployment of caTissue, the CRI has worked with Dr. Michael Maitland, who manages the Cancer Center’s biofluids core, to add customizations created by Indiana University. These customizations, called caTrack, permit tracking the chain of custody for all samples using barcode labels and handheld barcode readers, which then synchronize with caTissue when placed in docking stations. This system permits tracking of a sample’s origin, when it was drawn, and where it was initially stored.
Highlighted Accomplishments from 2012-13
• Expanded the CRDW by incorporating data elements from Epic and Clarity as well as the Cancer Registry
• Released i2b2 and began fulfilling data requests
• Established data request guidelines and policies to protect patient information
• Developed a Data Use Committee review process for data requests
• Expanded the team by hiring a business systems analyst, a database administrator, and a project manager
• Launched a REDCap users group
• Rewrote Dr. David Meltzer’s Hospitalist Protocol application and wrote a dashboard and alerting system for his Continuity of Care program
our contributions
Brian Furner, Keith Danahey, and Tim Holper
Custom Programming: 1200 Patients and TRIDOM
Beyond maintaining the CRDW and fulfilling data requests, the Clinical and Translational team works directly with research groups to provide custom research application development for their clinical research projects.
One such project is 1200 Patients, a personalized medicine initiative jointly sponsored by the CRI and the Center for Personalized Therapeutics. This pharmacogenomics project seeks to develop a new medical system model for personalized care in which patients’ genetic information can be incorporated into the decision-making process of prescribing medications.
Patients who have consented to participate in the project are genotyped in a CLIA-certified lab. Their genetic information is then stored in a relational database along with curated pharmacogenomic data from published studies. During clinic visits, a physician dashboard displays a “30-second summary” synthesized from the information in the database relevant to the patient’s genomic profile. Physicians can use these summaries to inform their choices in prescribing medication, for example by pre-identifying patients who are likely to experience severe side effects or by predicting when a patient may need alternative dosing.
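The pairing of patient genotypes with curated pharmacogenomic annotations in a relational database can be sketched as a simple join. The Python example below is only an illustration under stated assumptions: the table layout, the variant and drug names, and the `summary_for` helper are hypothetical placeholders and do not describe the actual 1200 Patients schema or dashboard. It uses the standard-library `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up genotype and annotation rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genotype (patient_id INTEGER, variant TEXT);
        CREATE TABLE annotation (variant TEXT, drug TEXT, note TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?)",
        [(1, "VAR_A"), (1, "VAR_B"), (2, "VAR_B")],
    )
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        [
            ("VAR_A", "drug_x", "consider alternative dosing"),
            ("VAR_C", "drug_y", "higher risk of side effects"),
        ],
    )
    return conn


def summary_for(conn, patient_id):
    """Return annotation lines relevant to one patient's recorded variants."""
    rows = conn.execute(
        "SELECT a.drug, a.note FROM genotype g "
        "JOIN annotation a ON a.variant = g.variant "
        "WHERE g.patient_id = ? ORDER BY a.drug",
        (patient_id,),
    ).fetchall()
    return [f"{drug}: {note}" for drug, note in rows]


conn = build_demo_db()
patient_one = summary_for(conn, 1)
patient_two = summary_for(conn, 2)
```

Only annotations whose variant appears in the patient’s genotype rows are returned, so a patient with no matching variants yields an empty summary.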
Clinical and Translational Informatics team members Tiffany Cyrus, Brian Furner, Don Saner, Luis Maciel, Julissa Acevedo, Tim Holper, Kevin Le, and Keith Danahey
Our Goals for 2013-14
our future
• Improve internal efficiency, including time, data repository, bug tracking, reporting, and status tracking, by implementing team project management software
• Implement a standard procedure for processing project requests that includes defining business requirements and scope of work, providing a time estimate, and receiving client approval
• Create a dashboard with metrics for the CRDW, REDCap, and Velos to increase the visibility of the data in each system
• Enhance the skill set of each team member through professional training
• Develop and maintain tools for data collection and reporting, including continuing to develop in-house systems to support custom applications and datamarts
• Improve internal knowledge and collaboration across the team through biweekly presentations
As new technologies allow the practice of medicine to become increasingly personalized, the 1200 Patients project contributes to this progress by improving doctors’ ability to make patient-specific medication decisions. The CRI helps enable this important initiative by providing custom programming and database design as well as data import and overall technical management.
TRIDOM (Translational Research Initiative in the Department of Medicine) is a biobanking protocol started in 2005 that stores DNA, plasma, and serum for consented patients who are scheduled for a standard-of-care blood draw. The CRI contributed to this project by leading a rewrite of the database that maintains consent and sample information and generates operational reports. In addition, the CRI partnered with eSphere, which makes the Human Tissue Resource Center’s biobanking software, to create an automated feed to the TRIDOM database. To date, TRIDOM has enrolled over 8,800 patients and has banked samples for more than 5,900 of these patients, resulting in a total of over 58,000 samples.
[Figure: REDCap total projects and total users by month, July 2011 through August 2013.]
REDCap adoption has progressed at a steady rate since 2011.
Clinical Trials Management
The Clinical and Translational team supports clinical trials management by operating two data management solutions, REDCap and Velos eResearch.
The University of Chicago has been a member of the REDCap consortium since 2010. This self-managed, secure, web-based application, developed at Vanderbilt University with Clinical and Translational Science Awards (CTSA) support, supports data collection strategies for research studies with tools for building and managing online surveys and databases. The University of Chicago’s instance of REDCap currently supports more than 700 users from across the BSD and houses over 600 projects.
As the number of REDCap users continues to grow, the CRI has worked to provide opportunities for education and collaboration. A REDCap users group provides the community with an ongoing meeting space for discussion, new feature announcements, tips and tricks, and real-time help. The CRI also offers individual and small group REDCap tutorial sessions for those in need of more personalized guidance (for more detail, see page 56).
Velos eResearch is a clinical trials management system that integrates study administration and clinical data management. The system supports many aspects of running a clinical trial, including:
• Patient recruitment and scheduling
• IRB and study monitoring
• Project planning and study design
• Protocol compliance
• Web-based data capture on a per-protocol basis
• Data safety monitoring and adverse event reporting
Velos is now supporting over 1,500 protocols for more than 550 investigators at the University. The CRI is currently working with the vendor to complete a hardware migration and upgrade that will improve the user experience.
faculty spotlight
David Meltzer
David Meltzer, MD, PhD, is Chief of the Section of Hospital Medicine, Director of the Center for Health and the Social Sciences, and an Associate Professor of Medicine, Economics, and Public Policy Studies at the University of Chicago. He also serves as the co-leader of the Institute for Translational Medicine’s Training Cluster, as well as the co-director of the ITM’s academic arm, the Committee for Clinical and Translational Science.
Dr. Meltzer’s research explores problems in health economics and public policy, focusing on the theoretical foundations of medical cost-effectiveness analysis and the cost and quality of hospital care. In the past year, the CRI’s Clinical and Translational team has worked with Dr. Meltzer in support of two of his projects: the Hospitalist Project and the Comprehensive Care Program (CCP) initiative.
The Hospitalist Project, which is supported in part by the Clinical and Translational Science Awards, has been in operation for over 16 years and has enrolled over 100,000 patients. The aims of this multi-site project are to study the quality and cost of care among hospitalized patients at the University of Chicago and Mercy Hospital, to examine whether there are significant differences in outcomes and costs for patients cared for by hospitalists compared to those cared for by other inpatient attending physicians, and to develop a research infrastructure that allows collaboration among multiple investigators and institutions.
Patients enrolled in the Hospitalist Project are administered two separate interviews, one during their hospitalization and one 30 days after discharge. The results are recorded during the patient encounter using iPads connected to a custom-written web application. This information is stored in an SQL server database, from which it can be exported with a custom-built reporting system into SAS and Stata for analysis. This year, the CRI’s Clinical and Translational team updated the database and web-based interface for this project.
The CCP initiative is a randomized study started in 2012 with the aim of testing novel care delivery systems for improving the quality and reducing the cost of health care. The study’s hypothesis is that improving continuity in the doctor-patient relationship by having a single physician see patients at high risk of hospitalization in both inpatient and outpatient settings will improve outcomes and lower costs by reducing unnecessary emergency department visits, hospital admissions, and readmissions.
The CRI has supported this effort by creating a custom dashboard that serves as a central location for CCP staff and physicians to record study information on patients and integrate this information with data from electronic medical records. In addition to the dashboard, the CRI has implemented a notification system that sends pages and emails to the appropriate CCP physicians and staff when a patient enrolled in the CCP protocol visits the emergency department or is admitted to the hospital.
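The rule behind such a notification system can be sketched in a few lines of Python. This is a hypothetical illustration only: the enrollment list, the event types, the recipient addresses, and the `notifications_for` helper are assumptions made for the example and do not describe the CRI’s implementation. The function decides who should be notified and with what message; actually sending a page or email is left out.

```python
# Hypothetical enrollment list and contacts; none of this comes from the CCP system.
ENROLLED = {
    "patient-17": ["physician@example.org", "coordinator@example.org"],
}

# Event types that should trigger a notification, with message wording.
ALERT_EVENTS = {
    "ED_VISIT": "visited the emergency department",
    "ADMISSION": "was admitted to the hospital",
}


def notifications_for(event, enrolled=ENROLLED):
    """Return (recipient, message) pairs for an encounter event, or [] if none apply."""
    patient = event.get("patient_id")
    kind = event.get("type")
    # Only enrolled patients and the listed event types produce alerts.
    if patient not in enrolled or kind not in ALERT_EVENTS:
        return []
    message = f"CCP patient {patient} {ALERT_EVENTS[kind]}."
    return [(recipient, message) for recipient in enrolled[patient]]


alerts = notifications_for({"patient_id": "patient-17", "type": "ADMISSION"})
```

Keeping the matching rule separate from message delivery makes the rule easy to check on its own, for example confirming that events for patients outside the study produce no alerts.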
SYSTEMS & SECURITY
About Us
The Systems and Security team began with just two members and a limited infrastructure set up by the Initiative in Biomedical Informatics. It has now grown to include five full-time employees who support and manage the development of infrastructure in multiple data centers. The team has grown our computing environment considerably since its inception with generous support from the Institute for Translational Medicine and the BSD.
The purpose of the Systems and Security team is to provide core services and scientific computing resources with the highest quality of customer service, in order to enable BSD faculty to conduct advanced biological research in a compliant environment while simultaneously protecting intellectual property and sensitive information.
Director of Systems and Security Plamen Martinov
our leadership
Plamen joined the CRI leadership team as Director of Systems and Security in April 2013 and manages the team of engineers responsible for the development and operations of our secure computing infrastructure. He leads our efforts to ensure compliance with security regulations and provides regular reports to the Research Informatics Compliance Review and Technical Policy Committees. Prior to joining the CRI, Plamen was Lead Data Security Engineer for Chicago Biomedicine Information Systems.
Infrastructure
The CRI’s resources, which have been upgraded and expanded over the past year, include:
• 1,024-core high-performance computing (HPC) cluster (2.2 GHz AMD Opteron 6274)
• Large Memory Linux supercomputer with 1 TB of RAM, 8 Intel® Xeon® E7-8870 2.4 GHz processors (160 cores)
• 700-TB ultra-high-density NAS for data storage that can scale up to 20 PB, available for both labshares and individuals
• Virtual Server Infrastructure with the capacity to support up to 1,500 virtual servers on Windows or Linux platforms
• Centralized and automated data backup and encryption with the capability to back up 2.1 PB of data
• Galaxy web-enabled biomedical data analytics tool that is fully integrated with the CRI’s HPC cluster
Storage & Backup By the Numbers
• Files backed up: 854,597,057
• Total labshare capacity: 1.7 petabytes
• Total backup capacity: 2.5 petabytes
• Virtual infrastructure total storage: 110 terabytes
Who uses our virtual machines? Our VMs in Kenwood Data Center serve 11 different BSD groups, in addition to supporting CRI activities.
Total VMs: 125
[Bar chart: number of VMs per group. Groups shown: Genetics; Health Studies; Cellular Screening Center; Laboratory for Advanced Computing; Cardiology; Pediatrics; Clinical & Translational Informatics; Bioinformatics Core; Clinical Cancer Genetics; Childhood Cancer & Blood Diseases; Medicine Administration; Radiology, HIRO; Academic & Administrative Applications; CRI Infrastructure]
The Systems and Security team maintains the computing infrastructure that supports not only the CRI’s activities but also research for faculty across the BSD. These new HIPAA-compliant resources went live in November 2012 and are available to all members of the BSD.
To date, our resources are supporting a total of 479 active users: 279 users of our storage resources, 162 users of our HPC cluster, and 38 Galaxy users. Kenwood Data Center houses 125 virtual machines, with 350 TB of storage in use and 595 TB of data backed up.
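As a quick cross-check of the user figures above, the three counts can be summed and expressed as shares of the total. This is only an illustrative sketch: the dictionary keys are labels chosen here, and the counts are taken as reported, without any assumption about whether one person may appear in more than one category.

```python
# Active-user counts as reported in this section (the keys are illustrative labels).
active_users = {"storage": 279, "hpc_cluster": 162, "galaxy": 38}

# Sum of the three reported categories.
total_users = sum(active_users.values())

# Share of the total for each resource, rounded to one decimal place.
shares = {name: round(100 * count / total_users, 1)
          for name, count in active_users.items()}

print(f"Total active users: {total_users}")
for name, pct in shares.items():
    print(f"  {name}: {active_users[name]} users ({pct}%)")
```

The three categories add up to the reported total of 479, with storage accounting for a little under three fifths of active users.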
our contributions
Highlighted Accomplishments from 2012-13
• Made a new HPC cluster and large memory servers available to all BSD researchers
• Launched a VMware farm and began provisioning virtual machines for researchers
• Migrated labshares from outdated equipment to a new state-of-the-art data center
• Introduced the CRI help desk and created an online technical help portal to direct users to the correct sources of information
• Hired Plamen Martinov as Director and expanded the team by hiring four systems administrators and a security analyst
• Created detailed security procedures to demonstrate HIPAA compliance
• Worked with the Compliance Review Committee to establish policies and procedures for patient privacy and data security
• In collaboration with the Bioinformatics Core, released the CRI’s implementation of Galaxy
Kenwood Data Center
The CRI’s advanced computing resources are housed in the state-of-the-art Kenwood Data Center on the University of Chicago campus. The CRI has made substantial investments in superior, resilient technology at Kenwood, improving the security, reliability, and recoverability of system resources through the modernization of data center services and standard architecture.
A primary focus of the Systems and Security team over the past year has been the migration of users’ data from outdated equipment in the Prudential Data Center to the newer, better-equipped Kenwood Data Center. The closing of Prudential and the move to Kenwood will allow us to shift our investments to more efficient and standardized computing platforms and technologies. The targets for completing this move include:
• Virtualizing and migrating more than 80 servers
• Migrating 80 labshares and 300 home directories
• Adding an additional 1,024-core HPC cluster
• Adding two 1-TB Large Memory Servers
Migration has proceeded carefully over the course of a year to achieve the goals of protecting the integrity of all data and ensuring clear communication with users. As of August 2013, 64 labshares (55 TB) out of a total of 75 (79 TB) have been migrated. All home directories have been migrated to new servers, and the team is on track to migrate 10 servers per month. The completion of this project is expected by the end of 2013.
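The labshare figures above translate directly into completion percentages. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative, and the inputs are the August 2013 counts reported above.

```python
# Labshare migration figures reported as of August 2013.
labshares_migrated, labshares_total = 64, 75
tb_migrated, tb_total = 55, 79


def percent_done(done, total):
    """Return completion as a percentage rounded to one decimal place."""
    return round(100 * done / total, 1)


labshare_pct = percent_done(labshares_migrated, labshares_total)
data_pct = percent_done(tb_migrated, tb_total)

# What is still left to move.
labshares_left = labshares_total - labshares_migrated
tb_left = tb_total - tb_migrated

print(f"Labshares migrated: {labshare_pct}% ({labshares_left} remaining)")
print(f"Data migrated: {data_pct}% ({tb_left} TB remaining)")
```

On these numbers, the 11 labshares still to be moved hold 24 TB, so they are larger on average than the ones already migrated.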
Kenwood Data Center is equipped to house systems that are compliant with federal guidelines, including HIPAA and the Federal Information Security Management Act (FISMA). Moving all CRI resources to this facility helps us to protect patient privacy and keep our data secure.
[Bar chart: number of users per resource at the Kenwood and Prudential data centers. Resources shown: iBi Cluster, iBi Big Memory, Large Memory, Labshares, Home Directories, Storage, Server, Backup, HPC]
Total Data Center Users: 2,368
Who uses our labshare storage resources? More than 30 different departments and groups across the University store data on CRI-provided labshares.
Total Storage Used: 151,791 GB
[Bar chart: storage used per department or group, in GB. Groups shown: Ben May; Biochemistry & Molecular Biophysics; Bioinformatics; Biomedical Sciences Cluster; Cancer Research; Cell & Molecular Biology; Cellular Screening Center; Center for Clinical Cancer Genetics; Chicago Booth School; Childhood Cancer & Blood Diseases; Ecology & Evolution; Endocrinology, Diabetes, & Metabolism; Evolutionary Biology; Genetic Medicine; Genetics & Clinical Cytogenetics; Health Studies; Hematology/Oncology; Human Genetics; Institutional Biosafety; Internal Medicine; Laboratory - Human Genetics; Medicine; Molecular Genetics & Cell Biology; Neurobiology; Neuroscience; Obstetrics & Gynecology; Pathology; Pulmonary/Critical Care; Rheumatology; Science & Education; Surgery]
Technical Support
To complement our improved computing infrastructure, the Systems and Security team has taken several important steps over the past year to enhance customer service and technical support for users of our resources.
The CRI’s help desk opened in summer 2012 with phone and email support staffed by employees who either resolve or triage issues. Concurrently, we introduced a new, easier-to-use system for submitting trouble tickets. Including working through the initial backlog of unresolved tickets, the support staff has now resolved almost ten thousand issues. The CRI also created a web portal to make it easier for users to find sources of technical help.
Systems and Security team members Beth Lynn Eicher, Sneha Jha, Plamen Martinov, Olumide Kehinde, Dan Sullivan, Brad Orr, Mike Jarsulic, and Bruce Thompson
In addition, the Systems and Security team hosted several events throughout the year to educate users about CRI resources and specific issues. See IT Live! seminars in November and December highlighted our HPC cluster, Galaxy, and i2b2, with each seminar focused on a live demonstration of one resource. In March and April, the team hosted HPC Lunch & Learn events tailored to existing and potential HPC users, both to introduce our newest resources and to spread the word about our migration to Kenwood. For more detail, see pages 57-58.
Bruce Thompson and Beth Lynn Eicher in Kenwood Data Center
faculty spotlight
Fabrice Smieliauskas
Fabrice Smieliauskas, PhD, is a health economist whose research interests center on the operation of markets for medical technologies. His work includes studies of financial conflicts of interest in medicine and of disparities in the adoption and abandonment of new medical technologies. Dr. Smieliauskas primarily uses SAS and Stata to perform statistical analyses for his research.
The focus of Dr. Smieliauskas’s ongoing work is the evidence base for new medical technologies. He is analyzing the response of payers and providers to evidence that a common medical treatment is of limited value to patients, as well as the response to state and federal policies that mandate coverage of drugs for “off-label” indications not approved by the FDA. He is also developing a unique comprehensive database on clinical cancer trials in order to address several open research questions, including the effects of a variety of governmental and institutional policies on the rate and direction of cancer innovation.
Dr. Smieliauskas’s research has been enabled in part by the CRI’s Large Memory Server. Before moving his work to the CRI’s infrastructure, he encountered problems using servers that were not capable of handling the large memory requirements of his research. He was deterred from other potential options by high user fees and inability to handle sensitive data in compliance with HIPAA. “By contrast,” he noted, “the new CRI server is able to handle confidential data securely, essential to much of the research at the BSD.”
Dr. Smieliauskas also spoke positively about the support provided by the Systems and Security team, saying, “The server administrative team is friendly and responds promptly to user needs and requests. I also have great confidence in CRI management and their desire and ability to continue adding capacity and computing capabilities as they grow.”
Plamen Martinov and Olumide Kehinde in Kenwood Data Center
Data Security and Compliance
Secure handling of sensitive human subject data is of the utmost importance to the CRI’s Systems and Security team. The CRI is the primary computing resource for BSD researchers working with electronic protected health information (ePHI). For this reason, it is essential that our infrastructure and security policies comply with relevant federal guidelines, including HIPAA.
To this end, the Research Informatics Compliance Review Committee, made up of IT security professionals from other IT organizations across the University, spent several months drafting, editing, and approving a set of policies and procedures for data protection. This committee is also responsible for conducting regular audits to ensure compliance with these important standards. (For more information on this committee, see page 63.) In addition, the CRI retains experts in FISMA and HIPAA as consultants.
our future
Our Goals for 2013-14
• Redesign and implement secure, stable, and sustainable computing infrastructure and resources at Kenwood Data Center
• Complete the Prudential-to-Kenwood data center move by migrating or discontinuing all resources, while maintaining the integrity of all user data
• Update and deliver improved customer service policies, procedures, services, and automation
• Improve communication internally and within the user community
• Enhance the skill set of each team member and of the team as a whole
BIOINFORMATICS CORE
Bioinformatics Core
Bioinformatics Analysis Pipelines
The analysis that transforms high-throughput raw data into biologically meaningful information can present a challenge to clinical, translational, and basic researchers alike. To make it easier for BSD investigators to take full advantage of high-throughput technologies in their research, the Bioinformatics Core has developed a set of pipelines for the analysis of Next-Generation Sequencing and Microarray data.
These automated pipelines quickly absorb and process large amounts of raw data and produce meaningful analysis results. They are executed on the CRI’s high-performance computing (HPC) cluster, which provides the significant computational power necessary for working with such large quantities of data. Researchers interested in using one of the CRI’s pipelines for their data analysis can request this on the CRI website. The Core currently offers a total of 12 production-ready pipelines for a variety of platforms and analyses.
In addition, the Bioinformatics Core maintains a catalog of publicly available and commercial software tools, reference datasets, and databases for use by the BSD research community. A complete list of these resources is available on the CRI’s website.
our leadership
Jorge Andrade, PhD, Director of Bioinformatics
As the technical director responsible for planning and oversight of the Bioinformatics Core, Jorge works closely with CRI leadership to develop and deliver bioinformatics services and expertise. Jorge joined the CRI in May 2012 and brings extensive experience in the pharmaceutical industry and scientific research community, most recently at the Beijing Genome Institute, where he was an Associate Director.
The Core currently offers production-ready pipelines for the following platforms and analyses:
• Illumina pipelines for RNA-Seq, ChIP-Seq, Exome Sequencing, Whole Genome Re-Sequencing (WGRS), Consensus Genotyping, and De-Novo Assembly
• SOLiD pipelines for RNA-Seq, WGRS, ChIP-Seq, and De-Novo Assembly
• One pipeline for Illumina and Affymetrix Expression Arrays
• One pipeline for Affymetrix and Exiqon miRNA Arrays
For more information on what each of these pipelines offers, see Appendix B.
faculty spotlight
Kenan Onel
Kenan Onel, MD, PhD, is an Associate Professor of Pediatrics in the Section of Hematology/Oncology and Director of the Pediatric Familial Cancer Clinic. He is an expert on pediatric and other familial genetic cancer syndromes. The Onel Lab uses genomic platforms and systems biology strategies to investigate how genetics contribute to cancer risk and response to therapy.
The lab studies families with high-penetrance cancer-predisposing conditions, but no known cancer-predisposing gene mutations, in order to discover new genes that may, when mutated, predispose individuals to cancer. Recently, they have begun to advance this research by taking advantage of the CRI’s Next-Generation Sequencing offerings. Dr. Onel reports, “The Bioinformatics Core has been instrumental in pushing forward our work because of their expertise in handling genomic data.”
Dr. Onel’s lab used the CRI’s Galaxy instance to develop a pipeline for exome analysis. In addition, they worked with the CRI’s team of bioinformaticians to develop a powerful command-line pipeline utilizing multiple aligners and callers, allowing a robust analysis over a large number of family studies. The Onel Lab and the CRI bioinformaticians continue to meet weekly to discuss progress.
According to Dr. Onel, “The analysts have been intellectually engaged in the projects and extremely professional. They have helped us develop methods for analysis of family data that we could not have done on our own.”
Self-Service Data Analysis: Galaxy
For researchers interested in self-service data analysis, the CRI maintains a customized version of Galaxy, an open-source bioinformatics workflow management and system integration tool. The CRI’s Galaxy instance is integrated with our advanced computing infrastructure, allowing researchers to take advantage of our HPC and large-scale storage resources within a self-service data analysis environment.
Galaxy can substantially facilitate the use of common bioinformatics tools by non-bioinformaticians and those without extensive computing expertise.
our contributions
Highlighted Accomplishments from 2012-13
• Developed, tested, and implemented 12 production-ready pipelines, some of which are currently in use for the NIH-funded Bionimbus Protected Data Cloud contract
• Expanded the Core to a team of seven scientists
• Hosted monthly training seminars which have drawn over 350 total participants
• Developed a project management and invoicing system now used as a template by other BSD Cores
• Launched the CRI’s implementation of Galaxy
A selection of completed analysis projects illustrates the diversity of the research facilitated by the Bioinformatics Core:
• Genome assembly and annotation of the Siberian hamster genome, using SOLiD sequences generated at the University of Chicago Genomics Core facility
• ChIP-Seq data analysis for a project studying how hyperglycemia induces epigenetic changes that lead to renal injury
• A gene expression profile analysis of the rat brain in response to perimenopausal hormonal signals
• RNA-Seq analysis of differential gene expression between two types of melanoma tumors
• Quality trimming of large sets of Illumina sequences for a study of nasal microbiota of people with chronic allergies
• Exome-wide analysis of 60 samples from four cohorts for research on the genetic components of several cancer types
The CRI’s implementation of the Galaxy framework includes workflows for several Next-Generation Sequencing pipelines to enable users to perform, reproduce, and share complete analyses.
Available workflows in Galaxy include:
• RNA-Seq: Sample Level for quality control, mapping, and statistics for paired-end Illumina reads (individual samples)
• RNA-Seq: Project Level Merge for merging multiple samples and generating a list of differentially expressed genes, for both single- and paired-end Illumina reads
• Exome Sequencing Analysis for quality control, mapping, and recalibration for both single- and paired-end Illumina reads
The CRI’s Galaxy platform became available to researchers in November 2012, concurrent with the release of our updated HPC and storage resources. It now supports around 40 active users and is maintained and updated with new tools, workflows, and pipelines in collaboration with the CRI’s Systems and Security team.
Bioinformatics Core team members Jorge Andrade, Wenjun Kang, Riyue Bao, Jianpeng Xu, Chunling Zhang, and Lei Huang
Collaboration and Custom Analysis
For researchers who are looking for personalized analysis, including custom-built pipelines, the Bioinformatics Core provides consulting and customized services. Our bioinformaticians’ expertise extends to areas beyond those of our standard pipelines, including proteomics and genome-wide association studies.
Who uses the Bioinformatics Core? Since May 2012, the Core has received project requests from 18 different University departments and several external organizations.
Total Project Requests: 99
[Bar chart: number of project requests per group. Groups shown: Ben May; Biochemistry; Molecular Biology; Ecology & Evolution; External Organizations; Health Studies; Human Genetics; Medicine; Microbiology; Molecular Genetics; Neurobiology; Neurology; Obstetrics & Gynecology; Organismal Biology; Pathology; Pediatrics; Radiation Oncology; Social Sciences; Surgery]
When a researcher submits an online project request form, the Core returns a proposal, including the scope of deliverables, a timeline for completion, and the estimated cost. The execution of each project is guided by frequent discussion between researchers and bioinformaticians, and updates are provided regularly. When a project is complete, results are delivered in the form of a written report and presentation.
Since May 2012, the Core’s bioinformaticians have completed a total of 53 projects, with many more in progress. An average of seven new project requests are submitted each month. These projects, conducted for researchers from over 25 different University departments and sections, vary widely in both scope and subject matter (for examples, see sidebar on page 46).
The Bioinformatics Core has completed 53 projects since May 2012, with 24 currently in progress and new requests submitted each month.
Total Projects Completed: 53
[Chart: projects submitted, completed, and in progress by month, Sept. 2012 through July 2013]
Director of Bioinformatics Jorge Andrade
Other Services: Grant Analysis and Training
The Bioinformatics Core supports the creation of research grants in several ways, with the goal of fully developing and integrating the bioinformatics components of each grant and increasing its competitiveness for funding. CRI bioinformaticians can collaborate with researchers directly on the bioinformatics components of their grants, or they can provide cost analysis services and letters of support. In addition, standard language is available to be added to grants, documenting the accessibility of the necessary tools and expertise to complete the bioinformatics research indicated. The Core has so far contributed to the writing and submission of 13 research grants, and has established this as an area of focus for future growth.
One more important part of the Core’s mission is to provide training opportunities that will help investigators develop bioinformatics expertise within their own laboratories. To this end, the CRI hosts a free monthly training seminar open to all members of the BSD, covering a different topic in bioinformatics analysis each month. For more detail and a list of past topics, see pages 54-55.
our future
Our Goals for 2013-14
• Recruit and hire two additional Bioinformatics Scientists
• Improve efficiency by reducing the standard turnaround time of projects in the following production pipelines: Exome-Seq, RNA-Seq, and Microarray expression arrays
• Increase customer satisfaction by producing high-quality and customer-oriented services (to this end, the Core has already introduced a feedback survey to gather information on potential areas of improvement)
• Improve existing pipelines by performing comparative analysis of tools, and develop new pipelines to accommodate new technologies, protocols, and data types
• Continue to author and coauthor scientific publications
• Increase interest in the Core by developing advertising materials and meeting with department chairs
• Increase the Core’s funding through chargebacks, grant inclusion, and expanding internal and external collaboration
faculty spotlight
Ernst Lengyel
Ernst Lengyel, MD, PhD, is a Professor of Obstetrics/Gynecology, specializing in advanced surgical treatments for patients with ovarian cancer. The Lengyel Lab is dedicated to studying the biology of ovarian cancer metastasis and finding new drugs for its treatment.
One of the scientific goals of the Lengyel Lab is to understand the mechanisms of a common problem in the treatment of ovarian cancer: that most patients will develop a resistance to carboplatin and taxol chemotherapy. To study this, the lab sought to identify the miRNA expression profiles of chemoresistant versus chemosensitive patients. They worked with the CRI’s Bioinformatics Core to obtain and analyze these genomic data.
With the assistance of the CRI’s bioinformaticians, the lab mined the Cancer Genome Atlas (TCGA) and distinguished unique patient groups of chemosensitive and chemoresistant patients. They then performed six analysis sets, and were able to identify seven miRNA genes upregulated in chemoresistant disease and three in very chemosensitive disease. These findings have since been validated in an independent cohort of patients.
The Lengyel Lab’s findings, enabled in part by the CRI, have prognostic and functional implications that may aid in developing therapies to target these miRNA in chemoresistant patients. Dr. Lengyel said, “Without the CRI we would not have had the expertise to take advantage of the TCGA ovarian cancer data.”
TRAINING & EDUCATION
Training and Education
An integral part of the CRI’s mission is providing training and education for our users so that they become comfortable and confident both with our resources and with other technologies for biological computing.
Bioinformatics Training
The Bioinformatics Core presents a free training seminar with a different topic each month, taught by PhD bioinformaticians. These seminars cover the use and application of a variety of publicly and commercially available software and tools for bioinformatics analysis, such as R, Bioconductor, and Galaxy. In some cases, the training is directly tied in to CRI resources such as our HPC cluster. As investigators bring this education back to their laboratories, bioinformatics expertise can be further developed throughout the BSD. Since these seminars began in May 2012, they have attracted over 350 participants.
A post-training survey, introduced in February 2013, requests opinions from users after each seminar in an effort to improve future sessions and cover the topics most important to researchers. Of those responding to the survey in February through July 2013, 96 percent found their course worthwhile, with 80 percent choosing “very” or “extremely” worthwhile. The CRI’s bioinformaticians have received high marks, with 96 percent of respondents calling their instructors “very knowledgeable.” Overall, 92 percent of respondents reported satisfaction with the course they attended.
Survey participants also provided suggestions and requests for future seminar topics, helping the Bioinformatics Core to continue to design valuable training opportunities that meet the needs of our research community.
Past training seminars have covered a range of systems and software programs useful for bioinformatics analysis.
date | topic | attendance
7/2013 | Introduction to Linux Command Line for Bioinformatics | 24
6/2013 | Introduction to Linux Command Line for Bioinformatics | 37
5/2013 | Analyzing Illumina ChIP-Seq Data with the CRI | 32
4/2013 | Introduction to CRI’s HPC Cluster for Bioinformatics Computing | 34
3/2013 | Analysis of Illumina and Microarray Data with R and Bioconductor | 12
2/2013 | Analysis of Microarrays with R and Bioconductor | 25
1/2013 | Galaxy: Web-Based Bioinformatics Analysis and RNA-Seq Workflow Management | 22
11/2012 | Analysis of Microarray Data with R and Bioconductor | 31
9/2012 | Introduction to R | 40
8/2012 | Analyzing Illumina RNA-Seq Data with the CRI | 41
7/2012 | Analyzing Illumina Whole Exome Data with the CRI | 16
6/2012 | Analyzing Illumina RNA-Seq Data with the CRI | 12
5/2012 | Introduction to R and Execution on HPC | 25
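The attendance figures listed for the thirteen seminars can be totaled as a cross-check against the participant count cited elsewhere in this report. This is a small sketch of that arithmetic; the variable names are illustrative, and the values are the attendance numbers as listed, from 7/2013 back to 5/2012.

```python
# Attendance per seminar, in the order listed (7/2013 back to 5/2012).
attendance = [24, 37, 32, 34, 12, 25, 22, 31, 40, 41, 16, 12, 25]

seminar_count = len(attendance)
total_attendance = sum(attendance)

# Average turnout per seminar.
mean_attendance = total_attendance / seminar_count

print(f"Seminars held: {seminar_count}")
print(f"Total attendance: {total_attendance}")
print(f"Mean attendance per seminar: {mean_attendance:.1f}")
```

The total comes to 351 across 13 seminars, an average of 27 per session, which is consistent with the figure of over 350 participants given for the training program.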
Other CRI Training
The CRI provides individual and small-group training sessions upon request for researchers who need assistance with using REDCap for their studies. Julissa Acevedo, the CRI’s Business Systems Analyst, holds an average of eight training and demo sessions per month for groups of one to three researchers at a time.
In addition, the CRI hosted several events over the past year with the goal of sharing information about our computing resources with existing and potential users.
Three See IT Live! seminars were held in November and December, aligning with the release of new CRI resources. Each seminar was centered on a live demonstration of the featured resource and included an overview of the CRI, instructions on obtaining an account, and an opportunity to meet our technical staff and ask questions, as well as free refreshments. These events were open to all members of the BSD and were attended by a total of 32 participants.
What bioinformatics training participants had to say...
“Very approachable and helpful instructors.” (June 2013)
“I thought it was really cool and I learned a lot about how to navigate around a Linux system. Great job!” (June 2013)
“Very helpful.” (April 2013)
“Well organized, very knowledgeable, and well presented.” (March 2013)
“Very good training class, need more like this.” (February 2013)
See IT Live! sessions highlighted the following resources:
• Galaxy: Learn how to access Galaxy, a web-based portal providing data storage, data management, and analytical tools integrated with our computing resources
• i2b2: Learn how to use our de-identified datamart to identify cohorts and request data for your research
• HPC Cluster: Learn how to optimize and run jobs on our new high-performance computing cluster
Jorge Andrade presents a bioinformatics training seminar
Don Saner presents at the CRI’s first HPC Lunch & Learn event
In addition to the HPC Cluster session of See IT Live!, the CRI’s Systems and Security team hosted two Lunch & Learn events in the spring to further educate HPC users about our available resources and to provide an overview of the data center migration from Prudential to Kenwood. Each event included a discussion of available resources in Kenwood, the rationale and timeline for moving out of Prudential, and a question and answer session. These events were advertised to both potential and current users and were attended by a total of 32 participants.
FACULTY OVERSIGHT & GOVERNANCE
Faculty Oversight and Governance
Governance Structure
The CRI’s strategic decision-making and long-term planning are led by a governance structure set up by the Office of the CRIO. These committees guide research informatics activities across the entire BSD, ensuring that informed long-term decisions for the Division are reached in a transparent and accountable way.
• Research Informatics Executive Governance Committee
• Research Informatics Governance Committee
• Research Informatics Technical Policy Committee
• Research Informatics Data Use Committee
• Research Informatics Compliance Review Committee
The five committees outlined below bring together senior BSD and University of Chicago Medicine (UCM) leadership, information systems experts, patient privacy experts, and faculty representing basic science, clinical research, and translational research. Decisions from these committees guide us in establishing policies and procedures, prioritizing new initiatives, safeguarding patient information, and complying with BSD policies and applicable federal and state laws.
For a full list of governance committee membership, see Appendix C.
Research Informatics Executive Governance Committee
Chair: Dr. Kenneth Polonsky, Dean and Executive Vice President for Medical Affairs
Mission: To provide high-level strategic decisions for all research informatics activities across the BSD, integrating the needs of faculty, clinicians, and BSD and hospital leadership
Members: BSD and UCM executive leadership
Research Informatics Governance Committee
Chair: Dr. Robert Grossman, Chief Research Informatics Officer
Mission: To establish priorities and policies for research informatics across the BSD, including those for the development and use of the CRDW and the comprehensive computing resources provided to BSD faculty
Members: Senior faculty and staff leadership from across the BSD and UCM
Research Informatics Technical Policy Committee
Chair: Dr. Robert Grossman, Chief Research Informatics Officer
Mission: To provide oversight and governance for the technical aspects of research informatics across the BSD and to ensure appropriate safeguards for ePHI used in research
Members: Staff and faculty with expertise in informatics and information technology security
Research Informatics Data Use Committee
Chair: Dr. Dana Edelson, Assistant Professor of Medicine
Mission: To review, approve, monitor, and prioritize requests for CRDW data release to individual investigators and to approved datamarts and systems, including i2b2 and any subsequently-developed datamarts
Members: Staff and faculty with expertise in regulatory issues, compliance, and data management
Research Informatics Compliance Review Committee
Chair: Tyler DeNormandie, Information Systems Manager and Senior Systems Engineer, Health Studies and Family Medicine
Mission: To advise the CRI Systems and Security group on best practices for ensuring compliance with BSD policies and to help ensure the CRI’s implementation of appropriate safeguards for ePHI used in research
Members: Information technology security experts representing the major IT organizations at the University and UCM
Informatics Oversight Committee
Guidance and oversight for research
informatics throughout the BSD are
provided by the Informatics Oversight
Committee, made up of faculty leaders
representing both basic science and clinical
departments. The recommendations
of this committee guide us in ensuring
that the direction and activities of the CRI
are in line with the needs of the research
faculty we serve. This committee reports
to the Research Advisory Committee,
a BSD/UCM Committee that reports to
the Dean for Research and Graduate
Education.
The Informatics Oversight Committee is
chaired by Dr. John Cunningham, Chief
of the Section of Pediatric Hematology/
Oncology. For a full list of members, see
Appendix D.
[Diagram: Faculty Oversight; Research Advisory Committee; Office of the CRIO]
OUR PARTNERS
The Center for Research Informatics is grateful for the support of our strategic
partners and collaborators across the University of Chicago. These partners
enable the collaborative work that has made the CRI successful in our first years
of operations.
Biological Sciences Division bsd.uchicago.edu
Chicago Biomedicine Information Systems help.bsd.uchicago.edu
Comprehensive Cancer Center cancer.uchicago.edu
Computation Institute ci.uchicago.edu
Human Imaging Research Office hiro.bsd.uchicago.edu
Institute for Genomics & Systems Biology igsb.anl.gov
Institute for Translational Medicine itm.uchicago.edu
Institutional Review Board humansubjects.uchicago.edu
IT Services itservices.uchicago.edu
Office of Clinical Research bsdocr.bsd.uchicago.edu
LOOKING AHEAD
The Center for Research Informatics has achieved many important goals over the past
two years. The establishment of the Clinical Research Data Warehouse, the design and
implementation of a state-of-the-art high-performance computing and storage
infrastructure that can house protected health information, and the development of a solid
and robust Bioinformatics Core are all providing the foundation of support for a large
number of research programs throughout the BSD. Many grants, papers, and research
projects owe part of their success to the services delivered by the CRI. Providing
services and training to enable high-quality scientific research was our goal when we were
established, and we are happy to have achieved this level of success in the short period
of time since the CRI was founded.
Building on these accomplishments, we plan to improve and expand our offerings in
several ways over the coming years:
Our Clinical Research Data Warehouse now serves BSD faculty by offering cohort
discovery tools able to search over millions of patient encounters. The next phase for the
CRDW will include the development of a de-identified datamart and tools for analyzing
de-identified data in a secure manner. We will further enhance the CRDW by including
full-text search functionality for pathology and radiology notes.
We are starting now to work with researchers to use data from the CRDW to build
predictive models that produce alerts to improve hospital operations and quality of care.
From identifying patients at risk for readmission to the hospital to predicting which
patients may suffer a cardiac arrest while inpatient, we are leading the way for clinical
researchers to design, develop, and use complex alerts in their practice. We are working
closely with CBIS to ensure that we will be able to implement alert notifications within
the Epic electronic medical record.
In the next year we will further expand our high-performance computational resources
to provide increased capacity, functionality, and performance with the goal of ensuring
that all faculty have access to agile and advanced computing. Coupled with these
efforts, we will expand our training and educational offerings to enable more faculty to
leverage these computing tools to enhance their research.
The CRI is fully invested in advancing world-class research within the Biological Sciences
Division. Our successes thus far and our ambitious plan going forward demonstrate our
commitment to this goal.
A CRI weekly planning meeting
APPENDIX
Appendix A: CRI Staff List
Samuel Volchenboum, MD, PhD
Director & Associate CRIO
Administration
Hannah Lawrence
Executive Administrator
Michael Daus
Administrative Specialist
Caitlin Pike
Communication Specialist
Bioinformatics Core
Jorge Andrade, PhD
Director of Bioinformatics
Riyue Bao, PhD
Bioinformatician
Elizabeth Bartom, PhD
Bioinformatician
Kyle Hernandez, PhD
Bioinformatician
Lei Huang, PhD
Bioinformatician
Wenjun Kang, MS
Scientific Programmer
Jianpeng Xu, PhD
Bioinformatician
Chunling Zhang, PhD
Bioinformatician
Clinical and Translational Informatics
Don Saner
Director of Clinical and Translational Informatics
Julissa Acevedo
Business Systems Analyst
Seong Choi
Programmer
Tiffany Cyrus
Project Manager
Keith Danahey
Database/Systems Administrator and Programmer
Brian Furner
Manager of Programming
Timothy Holper
Manager of CRDW Development
Kevin Le
Programmer/Analyst
Luis Maciel
Database Administrator
Systems and Security
Plamen Martinov
Director of Systems and Security
Andy Brook
Senior Systems Administrator
Beth Lynn Eicher
Senior Systems Administrator
Michael Jarsulic
Senior Systems Administrator
Sneha Jha
Systems Administrator
Olumide Kehinde
Senior Systems Administrator
Brad Orr
Senior Project Manager
Bruce Thompson
Security Analyst
Appendix B: Bioinformatics Core Pipelines
Illumina
RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation
Exome Sequencing: Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, and Annotation
Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
Consensus Genotyping Pipeline: Genotyping, SNP Detection & InDel Detection using three different methods (Samtools, GATK, and Atlas-2), comparison of variant calls, a list of consensus variant calls, and a list of method-specific calls
De-Novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contigs Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
SOLiD
RNA-Seq: Raw Data QC, Filtering, Mapping, Data Summarization, Expression Quantification, Differentially Expressed Genes, Pathways, and Gene Ontology Analysis
Whole Genome Re-Sequencing (WGRS): Raw Data QC, Filtering, Mapping, Genotyping, SNP Detection, InDel Detection, SV (Somatic SV) Detection, CNV Analysis, and Annotation
ChIP-Seq: Raw Data QC, Filtering, Mapping, Peak Calling, Peak Differential Analysis, Peak Related Genes Analysis, Gene Ontology Analysis, and Annotation
De-Novo Assembly: Raw Data QC, Merging, Clipping, Filtering, Contigs Assembly, Scaffold Assembly, Assembly Statistics, and Downstream Analysis
Illumina and Affymetrix Expression Arrays
Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed Genes, Functional Annotation, and Pathway Enrichment Analysis
Affymetrix and Exiqon miRNA Arrays
Filtering, Data Summarization and Normalization, Sample/Gene/Probe-based QC, Differentially Expressed miRNAs, Predict miRNA Targeted Genes, Functional Annotation, and Pathway Enrichment Analysis
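The comparison step in the Consensus Genotyping Pipeline listed above can be illustrated with a minimal Python sketch. This is not the Bioinformatics Core's implementation: the function name, the variant key layout (chromosome, position, reference allele, alternate allele), and the example calls are all hypothetical, and real pipelines would normally read calls from VCF files and normalize them before comparing.

```python
from collections import Counter


def consensus_calls(calls_by_method, min_methods=None):
    """Split variant calls into consensus and method-specific sets.

    calls_by_method maps a caller name to a set of hypothetical variant
    keys of the form (chromosome, position, ref_allele, alt_allele).
    A variant is a consensus call when at least min_methods callers
    report it (default: all callers). A variant is method-specific when
    exactly one caller reports it.
    """
    if min_methods is None:
        min_methods = len(calls_by_method)

    # Count how many callers reported each distinct variant.
    counts = Counter(v for calls in calls_by_method.values() for v in set(calls))

    consensus = {v for v, n in counts.items() if n >= min_methods}
    specific = {
        method: {v for v in set(calls) if counts[v] == 1}
        for method, calls in calls_by_method.items()
    }
    return consensus, specific


# Made-up example calls, for illustration only.
calls = {
    "samtools": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
    "gatk": {("chr1", 100, "A", "G"), ("chr2", 75, "G", "GA")},
    "atlas2": {
        ("chr1", 100, "A", "G"),
        ("chr1", 250, "C", "T"),
        ("chr3", 9, "T", "C"),
    },
}

consensus, specific = consensus_calls(calls)
print(sorted(consensus))
for method in sorted(specific):
    print(method, sorted(specific[method]))
```

Requiring agreement from every caller is the strictest choice; passing `min_methods=2` would instead accept variants reported by any two of the three callers.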
Appendix C: Research Informatics Governance Committees
Research Informatics Executive Governance Committee
Name                       Title and Affiliation(s)
Kenneth Polonsky (Chair)   Dean and Executive Vice President for Medical Affairs
Conrad Gilliam             Dean for Research and Graduate Education
Robert Grossman            Chief Research Informatics Officer, BSD
Sharon O’Keefe             President, UCM
Eric Yablonka              Vice President and Chief Information Officer, CBIS
Research Informatics Governance Committee
Name                       Title and Affiliation(s)
Robert Grossman (Chair)    Chief Research Informatics Officer, BSD
Sameer Badlani             Chief Medical Information Officer, UCM
John Cunningham            Chief, Section of Pediatric Hematology/Oncology
Chris Daugherty            Chair, Institutional Review Board
Dana Edelson               Assistant Professor of Medicine
Conrad Gilliam             Dean for Research and Graduate Education
Marilyn Hanzal             Associate General Counsel, Legal Affairs
Catherine Ostapina         Senior Compliance Advisor & Director, Office of Corporate Compliance
Lainie Ross                Professor of Pediatrics, Medicine, and Surgery
Julian Solway              Associate Dean for Translational Medicine
Walter Stadler             Associate Dean for Clinical Research
Samuel Volchenboum         Director, CRI
Eric Yablonka              Vice President and Chief Information Officer, CBIS
Research Informatics Technical Policy Committee
Name                       Title and Affiliation(s)
Robert Grossman (Chair)    Chief Research Informatics Officer, BSD
Paul Chang                 Professor of Radiology; Vice Chair of Radiology Informatics
Tyler DeNormandie          Information Systems Manager and Senior Systems Engineer, Health Studies
Roger Engelmann            Image Analysis Software Developer, Human Imaging Research Office
Rajan Gopalakrishnan       Director for Informatics and Information Technology, Comprehensive Cancer Center
John Moses                 Director of Enterprise Architecture and New Technologies, UCM
Prasanna Nippani           Assistant Director of Information Technology, UCM
Don Saner (Co-Chair)       Director of Clinical and Translational Informatics, CRI
Samuel Volchenboum         Director, CRI
Research Informatics Data Use Committee
Name                       Title and Affiliation(s)
Dana Edelson (Chair)       Assistant Professor of Medicine
Samuel Armato              Associate Professor of Radiology
Rajan Gopalakrishnan       Director for Informatics and Information Technology, Comprehensive Cancer Center
Nick Gruszauskas           Technical Director, Human Imaging Research Office
Contessa Hsu               Application Manager, UCM
Millie Maleckar            Director of Regulatory Compliance for Human Subjects, Institutional Review Board
Prasanna Nippani           Assistant Director of Information Technology, UCM
Don Saner                  Director of Clinical and Translational Informatics, CRI
Phil Schumm                Senior Biostatistician, Health Studies; Director, Research Computing Group
Cassie Simon               Assistant Director, UCM Cancer Registry
Research Informatics Compliance Review Committee
Name                       Title and Affiliation(s)
Tyler DeNormandie (Chair)  Information Systems Manager and Senior Systems Engineer, Health Studies
James Clark                Network Security Officer, IT Services
Andrew Kramski             Infrastructure Security Engineer, UCM
Plamen Martinov            Director of Systems and Security, CRI
Catherine Ostapina         Senior Compliance Advisor & Director, Office of Corporate Compliance
Daniel Sullivan            Web Developer and Infrastructure Architect Specialist, CBIS
Bruce Thompson             Security Analyst, CRI
Appendix D: Faculty Oversight
Informatics Oversight Committee
Name                       Title and Affiliation(s)
John Cunningham (Chair)    Chief, Section of Pediatric Hematology/Oncology
Michael Glotzer            Professor of Molecular Genetics and Cell Biology
Robert Grossman            Chief Research Informatics Officer, BSD
Michelle Le Beau           Director, Comprehensive Cancer Center
Marsha Rosner              Chair, Ben May Department for Cancer Research
Robert Rosner              Professor of Astronomy/Astrophysics and Physics
Matthew Stephens           Professor of Human Genetics and Statistics
Ronald Thisted             Professor of Statistics, Health Studies, and Anesthesia/Critical Care
Samuel Volchenboum         Director, CRI
© The University of Chicago, 2013. All rights reserved.
Written and designed by Caitlin Pike.
Photography by Robert Kozloff.