Xiuxia Du, Ph.D.
Department of Bioinformatics and Genomics
University of North Carolina at Charlotte

Introduction to Preprocessing of Untargeted Metabolomics Data

Outline

• Raw untargeted LC/MS and GC/MS metabolomics data

- Profile and centroid data

- Mass vs. retention time map

- TIC

- EIC

• Principles of LC/MS and GC/MS data preprocessing

• Feature identification

- Identification of known compounds

- Identification of unknown compounds

Raw Untargeted LC/MS and GC/MS Metabolomics data

List of mass spectra

List of scans in raw files

• MS scans in blue

• MS/MS scans in red

• # sequential number

• @ retention time

• MS level

• type of spectrum

- p = profile

- c = centroid

- t = thresholded

• polarity of ionization

- + = positive

- - = negative

- ? = unknown

One mass spectrum

Zoom in on one mass spectrum (profile mode)

Mass spectra in centroid mode

Spectrum in centroid mode

• Data files are much smaller than files in profile mode.

• We will use the centroid data for practicing data pre-processing using XCMS and MZmine 2.
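The size difference between profile and centroid data can be illustrated with a minimal sketch. It assumes the simplest possible centroiding rule: every run of consecutive above-threshold profile points is collapsed into one intensity-weighted m/z value. The function name `centroid_profile`, the threshold, and the data values are hypothetical; instrument software and converters use more refined peak-picking algorithms.

```python
def centroid_profile(mz, intensity, threshold=0.0):
    """Collapse each run of above-threshold profile points into one centroid.

    Returns a list of (centroid_mz, summed_intensity) tuples.
    """
    centroids = []
    run_mz, run_int = [], []

    def flush():
        # Intensity-weighted mean m/z of the current run of points.
        total = sum(run_int)
        weighted = sum(m * i for m, i in zip(run_mz, run_int))
        centroids.append((weighted / total, total))

    for m, i in zip(mz, intensity):
        if i > threshold:
            run_mz.append(m)
            run_int.append(i)
        elif run_int:
            flush()
            run_mz, run_int = [], []
    if run_int:
        flush()
    return centroids


# Made-up profile spectrum: 8 sampled points describing 2 peaks.
profile_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 200.00, 200.01, 200.02]
profile_int = [0.0, 10.0, 30.0, 10.0, 0.0, 0.0, 50.0, 0.0]

centroids = centroid_profile(profile_mz, profile_int)
```

Here eight profile points reduce to two centroid pairs, which is the reason centroid files are so much smaller.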

LC-MS raw data in 3D

Raw data in 3D

3D to 2D

• Direct processing of the 3D data is NOT trivial

• Instead, we examine 2D views of the data

- Mass vs. retention time

- Total ion current vs. retention time: TIC

- Ion current vs. retention time for a particular mass: EIC (Extracted Ion Chromatogram)
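The TIC and EIC views listed above can be sketched in a few lines. This is a minimal illustration, assuming each scan is stored as a retention time plus a list of (m/z, intensity) pairs; the scan values, the function names, and the m/z tolerance are made up for the example.

```python
# Each scan: (retention_time_seconds, [(mz, intensity), ...]); values are made up.
scans = [
    (1.0, [(100.0, 5.0), (150.0, 20.0)]),
    (2.0, [(100.0, 50.0), (150.0, 25.0)]),
    (3.0, [(100.0, 10.0), (150.0, 30.0)]),
]


def total_ion_chromatogram(scans):
    """TIC: sum of all ion intensities in each scan vs. retention time."""
    return [(rt, sum(i for _, i in peaks)) for rt, peaks in scans]


def extracted_ion_chromatogram(scans, target_mz, tol=0.01):
    """EIC: summed intensity within target_mz +/- tol in each scan vs. retention time."""
    return [
        (rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= tol))
        for rt, peaks in scans
    ]


tic = total_ion_chromatogram(scans)
eic = extracted_ion_chromatogram(scans, target_mz=100.0)
```

Both views drop one axis of the 3D data: the TIC sums over all masses, while the EIC keeps a single narrow mass window.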

Mass vs. retention time map

TIC

EIC

Principles of LC/MS and GC/MS Data Preprocessing

Data preprocessing workflow

GC-MS: (1) detect masses from mass spectra → (2) construct EICs → (3) detect chromatographic peaks → (4) deconvolution → (5) alignment / correspondence → (6) database search

LC-MS: (1) detect masses from mass spectra → (2) construct EICs → (3) detect chromatographic peaks → (4) annotation → (5) alignment / correspondence → (6) database search

Construct EICs

Select one EIC

One EIC

Detect EIC peaks

• Use wavelet transform

• Implemented in XCMS as the centWave method

[Figure: Mexican hat wavelet ψ_{s,τ}(t) at scales s = 1, 2, and 8; wavelet coefficients by scale; chromatogram (intensity × 10^3 vs. seconds) with Gaussian fit]

Tautenhahn, R.; Bottcher, C.; Neumann, S., Highly sensitive feature detection for high resolution LC/MS. BMC bioinformatics 2008, 9, 504.
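The idea of wavelet-based peak detection can be illustrated with a short sketch. This is not the centWave implementation: it only convolves a synthetic EIC with a Mexican hat wavelet at a single scale and takes the position of the largest coefficient as the peak apex, whereas centWave works on regions of interest and combines several scales. All names and values below are assumptions made for the illustration.

```python
import math


def mexican_hat(t, scale):
    """Mexican hat wavelet, dilated by `scale` and scaled by 1/sqrt(scale)."""
    u = t / scale
    return (1.0 - u * u) * math.exp(-0.5 * u * u) / math.sqrt(scale)


def cwt_row(signal, scale):
    """Wavelet coefficients of `signal` at one scale, one value per position."""
    n = len(signal)
    half = int(5 * scale)  # the wavelet is negligible beyond about 5 scales
    row = []
    for tau in range(n):
        acc = 0.0
        for k in range(max(0, tau - half), min(n, tau + half + 1)):
            acc += signal[k] * mexican_hat(k - tau, scale)
        row.append(acc)
    return row


# Synthetic EIC: Gaussian peak centred at index 30 (sigma = 3) on a flat baseline.
eic = [5.0 + 100.0 * math.exp(-0.5 * ((i - 30) / 3.0) ** 2) for i in range(61)]

coefs = cwt_row(eic, scale=3)
peak_index = max(range(len(coefs)), key=coefs.__getitem__)
```

Because the wavelet integrates to roughly zero, the flat baseline contributes very little to the coefficients, while a peak whose width matches the scale produces a strong response at its apex.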

Detected EIC peaks

LC/MS-specific Data Preprocessing

Find isotopes

Alignment

Peaks table after alignment

GC/MS-specific Data Preprocessing

GC-EI-MS

EI fragmentation

• Example: EI fragmentation of methanol

[CH3OH]•+ → CH3O+ + H•

[CH3OH]•+ → CH2O+ + H2

[CH3OH]•+ → CH3+ + •OH
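The m/z values at which the methanol molecular ion and its three fragments appear can be worked out from their elemental compositions. The sketch below assumes singly charged ions and uses standard monoisotopic atomic masses rounded to a few decimals; the helper name `ion_mz` is hypothetical.

```python
# Monoisotopic atomic masses in Da (rounded) and the electron mass.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.007825, "O": 15.994915}
ELECTRON_MASS = 0.000549


def ion_mz(formula, charge=1):
    """m/z of a positive ion given its composition as {element: count}."""
    neutral = sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())
    # A positive ion has lost `charge` electrons relative to the neutral species.
    return (neutral - charge * ELECTRON_MASS) / charge


ions = {
    "[CH3OH]+": {"C": 1, "H": 4, "O": 1},  # molecular ion
    "CH3O+": {"C": 1, "H": 3, "O": 1},     # loss of H
    "CH2O+": {"C": 1, "H": 2, "O": 1},     # loss of H2
    "CH3+": {"C": 1, "H": 3},              # loss of OH
}

mz_values = {name: ion_mz(formula) for name, formula in ions.items()}
nominal = {name: round(mz) for name, mz in mz_values.items()}
```

The nominal masses come out as 32, 31, 30, and 15, so a single compound already contributes several peaks to a GC-EI-MS spectrum.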

Deconvolution

ADAP-GC 2.0

ADAP-GC 2.0: Deconvolution of Coeluting Metabolites from GC/TOF-MS Data for Metabolomics Studies. Analytical chemistry 2012, 84 (15), 6619-29.

Feature identification

• Apply statistics and machine learning to detect discriminating peaks

• Identify discriminating peaks

Identification of known compounds

• Screening search for compound ID based on LC-MS data

- Searching monoisotopic mass and isotopic distribution against compound databases

• Library match for compound identification from both LC-MS/MS and GC-MS spectra
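The monoisotopic mass part of the screening search can be sketched as a tolerance lookup. This is a minimal illustration under stated assumptions: the observed ion is a singly charged [M+H]+ adduct, the tolerance is expressed in ppm, and `DATABASE` is a hypothetical in-memory stand-in for a real compound database, not an actual database API. Isotopic distribution matching is not covered here.

```python
PROTON_MASS = 1.007276  # Da

# Hypothetical in-memory stand-in for a compound database:
# compound name -> neutral monoisotopic mass in Da.
DATABASE = {
    "glucose": 180.06339,
    "caffeine": 194.08038,
    "tryptophan": 204.08988,
}


def search_by_mass(observed_mz, tolerance_ppm=5.0):
    """Return (name, ppm_error) candidates, assuming a singly charged [M+H]+ ion."""
    neutral_mass = observed_mz - PROTON_MASS
    hits = []
    for name, mass in DATABASE.items():
        ppm_error = (neutral_mass - mass) / mass * 1e6
        if abs(ppm_error) <= tolerance_ppm:
            hits.append((name, ppm_error))
    # Best (smallest absolute error) candidate first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


hits = search_by_mass(195.08766)
no_hits = search_by_mass(500.0)
```

A mass match alone usually returns several candidates in a real database, which is why the isotopic distribution and MS/MS evidence are used to narrow the list.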

HMDB

MS/MS or GC-MS spectra matching

• Library match for compound identification from both LC-MS/MS and GC-MS spectra
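A library match of this kind is commonly scored with a similarity measure between the query spectrum and each library spectrum. The sketch below assumes cosine similarity on unit-mass bins, which is one common choice and not necessarily the scoring used by any particular tool. The library entries and the query spectrum are made-up values, not reference spectra.

```python
import math


def bin_spectrum(peaks):
    """Sum intensities into unit-mass bins keyed by the rounded m/z."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two binned spectra (1.0 = same pattern)."""
    a, b = bin_spectrum(spec_a), bin_spectrum(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical library spectra as (m/z, relative intensity) pairs.
library = {
    "compound A": [(15, 12.0), (29, 45.0), (31, 100.0), (32, 70.0)],
    "compound B": [(29, 30.0), (31, 100.0), (45, 50.0), (46, 20.0)],
}
query = [(15.0, 10.0), (29.0, 40.0), (31.0, 100.0), (32.0, 75.0)]

scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
best_match = max(scores, key=scores.get)
```

The score depends only on the relative peak pattern, so a query acquired at a different absolute intensity still matches its library entry.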

Identification of unknown compounds

• MS-FINDER

• CSI:FingerID

• CFM-ID

• MetFrag

• MIDAS

• MAGMA

MetFrag

More on identification

• Information we have for identification of compounds based on MS/MS

- M+H

- Experimental isotopic identification

- MS/MS

• Compare isotopic distributions: theoretical vs. experimental
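Comparing a theoretical isotopic distribution with an experimental one can be sketched with a deliberately simplified model. The example assumes that only carbon contributes to the isotope pattern (a binomial distribution of 13C atoms with an approximate natural abundance of 1.07%), and it scores agreement by the sum of absolute differences. Real tools account for all elements in the formula; the experimental values and function names here are made up.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of 13C


def carbon_isotope_pattern(n_carbons, n_peaks=3):
    """Relative intensities of M, M+1, M+2, ... (carbon only, M set to 1)."""
    p = C13_ABUNDANCE
    probs = [
        comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    return [x / probs[0] for x in probs]


def pattern_error(theoretical, experimental):
    """Sum of absolute differences between two M-normalised patterns."""
    return sum(abs(t - e) for t, e in zip(theoretical, experimental))


experimental = [1.0, 0.066, 0.002]  # made-up measured pattern

errors = {
    n: pattern_error(carbon_isotope_pattern(n), experimental) for n in (3, 6, 12)
}
best_carbon_count = min(errors, key=errors.get)
```

In this toy case the measured M+1 peak of about 6.6% is best explained by six carbon atoms, which shows how the isotope pattern helps rank candidate formulas that share nearly the same mass.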

• Compare MS/MS spectra: library vs. experimental

Thank you!

