Identification of “Known Unknowns” Using Accurate Mass Data and
Large “Spectraless” Databases
James Little, Eastman Chemical Company
Accurate Mass and Novel Applications of Mass Spectrometry for Unknown Environmental Analyses Symposium
Session No. 1520, 3/14/2012
Origin of “Known Unknowns” Term
“. . . there are known knowns; there are things we know we know. We also
know there are known unknowns; that is to say we know there are some
things we do not know. But there are also unknown unknowns -- the ones we
don't know we don't know. And if one looks throughout the history of our
country and other free countries, it is the latter category that tend to be the
difficult ones.”
Donald Rumsfeld, news briefing on weapons of mass destruction in Iraq,
Feb. 12, 2002
“Known Unknowns”: unknown to the investigator, but known in the
chemical literature, internet, or reference databases
“Unknown Unknowns”: not previously cited in the literature, internet, or
reference databases
Non-Targeted Species:
First Approach: Search Computer Mass Spectral Databases
Obtain accurate or nominal mass data by GC-MS or LC-MS
Technique      Commercial Spectra Purchased   Eastman Created Spectra
EI GC-MS       ~1.1 M                         ~55K
MS/MS LC-MS    ~65K                           ~4K
Search electron ionization or collision-induced dissociation
(“MS/MS”) databases
Usually much more successful for EI than for collision-induced
dissociation searches
NIST Search Used for EI and MS/MS Searches
Data Processing:
• Thermo Xcalibur
• Agilent ChemStation
• Agilent MassHunter
• Waters MassLynx
• NIST AMDIS
• Cambridge ChemDraw
NIST Library Search:
All new spectra with structures added to Eastman libraries
Updated and distributed automatically nightly to 40 systems
Search by spectrum, structure (model compounds), electronic
notebook, etc.
Search of “Spectraless” Databases: If Not Found in Spectral Search
Obtain accurate mass EI or MS/MS data
Database No. of Entries
CAS Registry ~65 M
ChemSpider ~26 M
Molecular formula (MF) determined using isotopic pattern and other
approaches
Search large “spectraless” databases to find candidate structures (refs. 1, 2)
Rank candidates by number of associated references (both) or
association with key words (CAS only)
Confirm with fragmentation patterns, exchangeable protons, sample
history, UV-VIS spectra, relative retention times, purchased standard, etc.
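The ranking step in this workflow can be sketched in a few lines of Python. This is only an illustration: the `Candidate` class and the candidate labels are hypothetical, and the reference counts are the three values shown for the UV additive example in this talk (1,252, 20, and 19).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # hypothetical label, not a real database record
    n_references: int  # number of references associated with the entry


def rank_by_references(candidates):
    """Return candidates sorted from most to fewest associated references."""
    return sorted(candidates, key=lambda c: c.n_references, reverse=True)


# Reference counts from the UV additive example; labels are placeholders.
hits = [
    Candidate("candidate B", 20),
    Candidate("candidate A", 1252),
    Candidate("candidate C", 19),
]
ranked = rank_by_references(hits)
for position, cand in enumerate(ranked, start=1):
    print(f"#{position}: {cand.name} ({cand.n_references} references)")
```

The top-ranked candidate is only a starting hypothesis; it still needs the confirmation steps listed above.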
Search CAS Registry with SciFinder by MF: UV Additive in Polymer
CAS Registry (~65,000,000) → Get Substances, C22H29N3O (2,256) → Sort by Number of References (2,059)
#1 1,252 Ref.
#2 20
#3 19
#1 Correct
Request to Identify UV additive in polymer
Found in LC chromatogram by characteristic UV spectrum
MF by accurate mass and isotope pattern
MW 351
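As a quick check on the molecular formula, the monoisotopic mass of C22H29N3O and the expected m/z of its protonated molecule can be computed from standard isotope masses. The sketch below is not the software used in this work; the helper names are illustrative, and the parser handles only simple formulas without parentheses.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.00727646688  # mass of H+ (hydrogen atom minus one electron)


def parse_formula(formula):
    """Parse a simple formula such as 'C22H29N3O' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


neutral = monoisotopic_mass("C22H29N3O")
mh = neutral + PROTON
print(f"M = {neutral:.4f}, [M+H]+ = {mh:.4f}")
```

The neutral mass rounds to the nominal MW of 351 quoted on the slide.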
Interpretation of MS/MS Spectrum
Annotated ions and losses: [M+H]+; -C5H10; -2 C5H10; -C2H4; -2 C2H4; C5H11+
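The annotated losses can be turned into expected accurate m/z values for comparison with the observed MS/MS spectrum. The sketch below assumes the C5H10 losses come directly from the protonated molecule of C22H29N3O; it reports the C2H4 loss only as a mass difference, because the slide does not say which ion loses it. Variable names are illustrative.

```python
from collections import Counter

# Monoisotopic masses (Da)
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688
ELECTRON = 0.00054857991


def mass(composition):
    """Monoisotopic mass of an elemental composition given as a Counter."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


parent = Counter(C=22, H=29, N=3, O=1)
c5h10 = mass(Counter(C=5, H=10))
c2h4 = mass(Counter(C=2, H=4))

mh = mass(parent) + PROTON
expected = {
    "[M+H]+": mh,
    "[M+H - C5H10]+": mh - c5h10,
    "[M+H - 2 C5H10]+": mh - 2 * c5h10,
    # cation = neutral radical composition minus one electron
    "C5H11+": mass(Counter(C=5, H=11)) - ELECTRON,
}
for label, mz in expected.items():
    print(f"{label:18s} {mz:.4f}")
print(f"C2H4 neutral loss: {c2h4:.4f} Da")
```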
Confirmation of Candidate Structure
CH3CN + C5H11+
UV spectrum characteristic of UV Absorber
Confirmation of Candidate Structure (continued)
Diode array spectrum (AU vs. nm): bands labeled at 204, 302, and 340 nm
One exchangeable proton by deuterium exchange via infusion in
solvent mix with minimal D2O
Ultimately confirmed by purchased standard
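The arithmetic behind counting exchangeable protons is sketched below. It assumes complete H/D exchange and ionization by D+, which is a simplification: with only a small amount of D2O in the infusion solvent, a distribution of partially exchanged species would be expected instead. Function names are illustrative.

```python
PROTON = 1.00727646688       # mass of H+
H_ATOM = 1.00782503207
D_ATOM = 2.01410177785
D_MINUS_H = D_ATOM - H_ATOM  # mass gained per H replaced by D


def exchanged_mz(neutral_mass, n_exchangeable):
    """m/z of the fully exchanged molecule ionized by D+ (assumes complete exchange)."""
    deuteron = PROTON + D_MINUS_H
    return neutral_mass + n_exchangeable * D_MINUS_H + deuteron


def count_exchangeable(mz_in_h2o, mz_in_d2o):
    """Estimate exchangeable protons from the shift between [M+H]+ and [M+D]+."""
    shift = mz_in_d2o - mz_in_h2o
    return round(shift / D_MINUS_H) - 1  # one unit of the shift is the charging deuteron


mz_h = 352.2383                    # [M+H]+ of C22H29N3O
mz_d = exchanged_mz(351.2311, 1)   # one exchangeable proton, as found for the UV additive
print(f"[M+D]+ after exchange: {mz_d:.4f}")
print("exchangeable protons:", count_exchangeable(mz_h, mz_d))
```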
MF Search Refined with Key Word for More Obscure Compounds
Identification of greenish yellow species in baby diaper adhesive
CAS Registry (~65,000,000) → Get Substances, C30H42O2 (264) → Get all references for all substances (482) → Refined all references with “greenish yellow” (1 Ref.)
Scheme: “BHT” → (O2, [Ox]) → “BHT dimer”
Ironically, BHT was included as a stabilizer, yet its oxidation produced the color
Problem Solved!
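The key word refinement amounts to a text filter over the references retrieved for all candidate substances. The sketch below uses placeholder records with made-up identifiers and abstracts; it does not call SciFinder or any CAS interface and only shows the filtering logic.

```python
def refine_by_keyword(references, keyword):
    """Keep only references whose abstract mentions the key word (case-insensitive)."""
    needle = keyword.lower()
    return [ref for ref in references if needle in ref["abstract"].lower()]


# Placeholder records; identifiers and abstracts are hypothetical.
references = [
    {"id": "ref-001", "abstract": "Placeholder abstract about antioxidant synthesis."},
    {"id": "ref-002", "abstract": "Placeholder abstract describing a greenish yellow oxidation product."},
    {"id": "ref-003", "abstract": "Placeholder abstract about polymer stabilization."},
]

matches = refine_by_keyword(references, "greenish yellow")
print([ref["id"] for ref in matches])
```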
ChemSpider Search by Monoisotopic Mass Instead of Molecular Formula (MF)
Problem:
The number of MFs increases dramatically with molecular weight
e.g. at 1000 Da for the eleven most common elements, >350 million MFs (ref. 3)
Often difficult to get a unique MF at molecular weights >600 Da
Approach:
In practice, the number of known compounds decreases with molecular weight
Thus, initially search by monoisotopic mass, not molecular formula, at >600 Da
Then compare isotope abundance of all candidates
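One simple way to compare isotope abundances is to predict the relative intensity of the M+1 peak for each candidate formula and see which is closest to the observed value. The sketch below uses a first-order approximation (only 13C, 2H, 15N, and 17O contributions) and a hypothetical observed value. The two formulas are taken from the literature examples in this talk purely as inputs; in a real search all candidates would fall within the same mass window.

```python
import re

# Ratio of heavy to light isotope natural abundance for the M+1 contributors
M_PLUS_1_RATIO = {
    "C": 0.0107 / 0.9893,      # 13C / 12C
    "H": 0.000115 / 0.999885,  # 2H / 1H
    "N": 0.00364 / 0.99636,    # 15N / 14N
    "O": 0.00038 / 0.99757,    # 17O / 16O
}


def parse_formula(formula):
    """Parse a simple formula such as 'C37H53NO8' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def m_plus_1_percent(formula):
    """Approximate M+1 intensity as a percentage of the monoisotopic peak."""
    counts = parse_formula(formula)
    return 100.0 * sum(M_PLUS_1_RATIO[el] * n for el, n in counts.items())


candidates = ["C37H53NO8", "C41H64O14"]
observed_m_plus_1 = 45.0  # hypothetical measured value, in percent

best = min(candidates, key=lambda f: abs(m_plus_1_percent(f) - observed_m_plus_1))
for formula in candidates:
    print(f"{formula}: M+1 = {m_plus_1_percent(formula):.1f}%")
print("closest match:", best)
```

A fuller comparison would also use the M+2 peak, which matters most for elements such as Cl, Br, and S.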
Search by Monoisotopic Mass: Antioxidant in Polymer
ChemSpider (~26,000,000) → Search by 783.520 +/-15 ppm (16) → Sort by Number of References (51)
#1 29 Ref.
#2 5
#3 2
#1 Correct
Identification of additive in polymer
Search ChemSpider by monoisotopic mass
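The +/-15 ppm tolerance used in this search corresponds to an absolute mass window that is easy to compute. The sketch below shows the conversion for the measured value of 783.520; the candidate masses in the usage lines are hypothetical.

```python
def ppm_window(mass, ppm):
    """Return (low, high) bounds of a +/- ppm window around a mass."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def within_window(candidate_mass, measured_mass, ppm):
    """True if a candidate's monoisotopic mass falls inside the ppm window."""
    low, high = ppm_window(measured_mass, ppm)
    return low <= candidate_mass <= high


low, high = ppm_window(783.520, 15)
print(f"search window: {low:.4f} to {high:.4f}")

# Hypothetical candidate masses, for illustration only
print(within_window(783.5182, 783.520, 15))
print(within_window(783.5400, 783.520, 15))
```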
Confirmation of Antioxidant Candidate Structure
Initially confirmed by MS/MS fragmentation pattern
3 exchangeable protons by infusion in solvent mix with D2O
Ultimately confirmed by purchased standard
Twelve Examples from Literature by Monoisotopic Mass vs. MF
Ranked by Number of References
Species            MF                 Monoisotopic Mass   Rank by MF   Rank by Monoisotopic Mass (+/- 5 ppm window)
Moxidectin         C37H53NO8          639.3771            1 of 5       1 of 39
Erythromycin       C37H67NO13         733.4612            1 of 42      1 of 53
Digoxin            C41H64O14          780.4296            1 of 47      1 of 65
Rifampicin         C43H58N4O12        822.4051            1 of 29      1 of 96
Rapamycin          C51H79NO13         913.5551            1 of 43      1 of 51
Amphotericin B     C47H73NO17         923.4878            1 of 33      1 of 42
Gramicidin S       C60H92N12O10       1140.7059           1 of 5       1 of 13
Cereulide          C57H96N6O18        1152.6781           1 of 3       2 of 8
Cyclosporin A      C62H111N11O12      1201.8414           1 of 36      1 of 38
Vancomycin         C66H75Cl2N9O24     1447.4302           1 of 24      1 of 26
Perfluorotriazine  C30H18N3O6P3F48    1520.9642           1 of 1       1 of 1
Thiostrepton       C72H85N19O18S5     1663.4924           1 of 5       1 of 5
Average                                                   1 of 23      1 of 36
Searching CAS Registry with STN Express by Molecular Weight
[M+H]+ isotope cluster (m/z): 442.2844, 443.2863, 444.2911, 445.3012
Monoisotopic mass = 442.2844 - 1.0073 = 441.2771
Molecular weight = Σ(m/z × intensity) / Σ(intensity) - 1.0074 = 441.600
Error for molecular weight measurement normally 4-5 times greater than for monoisotopic mass
CAS Registry can be searched only by molecular weight, not by monoisotopic mass
Molecular weight search available only with STN Express, a command-based interface
=> file registry
=> s 441.57-441.63/mw
Molecular weight specified to two decimal places
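The molecular weight calculation and the STN search range can be sketched as follows. The m/z values are the four isotope peaks shown on this slide, but the intensities are hypothetical, chosen only so that the example lands near the reported 441.600. The +/-0.03 tolerance is inferred from the range used in the command above. Function names are illustrative.

```python
# Average-mass correction from the slide: average H mass minus one electron
MH_AVERAGE_CORRECTION = 1.0074


def average_mw(peaks):
    """Intensity-weighted mean m/z of the [M+H]+ cluster, converted to neutral MW."""
    total_intensity = sum(intensity for _, intensity in peaks)
    weighted_mz = sum(mz * intensity for mz, intensity in peaks) / total_intensity
    return weighted_mz - MH_AVERAGE_CORRECTION


def stn_mw_query(mw, tolerance=0.03):
    """Build an STN molecular weight range search string with two decimal places."""
    return f"s {mw - tolerance:.2f}-{mw + tolerance:.2f}/mw"


# (m/z, relative intensity); intensities are hypothetical
peaks = [(442.2844, 100.0), (443.2863, 30.0), (444.2911, 6.0), (445.3012, 0.8)]

mw = average_mw(peaks)
print(f"average MW: {mw:.2f}")
print(stn_mw_query(round(mw, 2)))
```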
Comparison of Four Approaches for 90 Test Compounds
Search Approach #1 #2 #3 #4 #5 >#5
CAS Registry/molecular formula 84 4 1 1
ChemSpider/molecular formula 81 4 3 1 1
CAS Registry/average molecular weight 66 13 4 1 3 3
ChemSpider/monoisotopic mass 77 4 4 2 3
Searching by molecular formula gives the best ranking with either CAS Registry or ChemSpider
Monoisotopic mass searching with ChemSpider is very useful for compounds with MW > 600 Da
Monoisotopic mass and molecular weight searches are also useful for compounds at
lower molecular weight
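The comparison is easier to read as the percentage of the 90 test compounds for which the correct structure ranked #1. The short calculation below uses only the #1 counts from the table above.

```python
TOTAL_COMPOUNDS = 90

# Number of test compounds ranked #1 by each approach (from the table)
top_hits = {
    "CAS Registry/molecular formula": 84,
    "ChemSpider/molecular formula": 81,
    "CAS Registry/average molecular weight": 66,
    "ChemSpider/monoisotopic mass": 77,
}

percent_top = {name: 100.0 * n / TOTAL_COMPOUNDS for name, n in top_hits.items()}
for name, pct in sorted(percent_top.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.1f}% ranked #1")
```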
Summary of ChemSpider and CAS Registry Capabilities
Search Approach: ChemSpider
Pros:
-free via internet
-good user interface
-automation by instrument manufacturer using Web API (Application Programming Interface)
-ability to search by monoisotopic mass
Cons:
-smaller No. of entries (~26 M) and references
-can’t refine by key word
Search Approach: CAS Registry with SciFinder or STN Express
Pros:
-larger No. of entries (~65 M) and references
-refine by key word
-good SciFinder user interface for MF searches on internet
Cons:
-fee charged
-no API available for instrument manufacturer automation
-no ability to search by monoisotopic mass, only MW with complicated STN Express interface
Conclusions and Future Plans
Computer searches of mass spectral databases easiest
approach for identifying “known unknowns”
Automation of latter approach needed for more complex samples
Searching “spectraless” databases very powerful alternative approach
Attempting to persuade CAS to add monoisotopic mass searching
Ranking and visual comparison of computer generated CID
spectra of candidate structures to observed spectrum
References
1. “Identification of “known unknowns” utilizing accurate mass data and ChemSpider,” J.
Little, A. Williams, A. Pshenichnov, V. Tkachenko, J. Am. Soc. Mass Spectrom. 2012, Vol. 23, No. 1, p 179-185.
2. “Identification of “known unknowns” utilizing accurate mass data and Chemical Abstracts
Service databases,” J. Little, C. Cleven, S. Brown, J. Am. Soc. Mass Spectrom. 2011, Vol. 22, No. 2, p 348-359.
3. “Metabolomic database annotations via query of elemental compositions: mass
accuracy is insufficient even at less than 1 ppm,” T. Kind, O. Fiehn, BMC
Bioinformatics 2006, 7:234.
Additional Information on Internet with Screenshots
Search “Little Mass Spec,” top hit in Google
Acknowledgements
NIST: Steve Stein, David Sparkman, Dmitrii Tchekhovskoi, Anzor Mikaya
ChemSpider: Tony Williams, Alexey Pshenichnov, Valery Tkachenko
Eastman: Bill Tindall, Kent Morrill, Curt Cleven, Adam Howard, Jean
Coffman, Mike Ramsey, Sen Li
ETSU School of Pharmacy: Stacy Brown
CAS: Anthony Machosky
Waters: Jim Lekander
Agilent Technologies: Mike Scott
Art Work for Journal Cover: Minta Fannon