Original Article

Neglected Tropical Diseases and Omics Science: Proteogenomics Analysis of the Promastigote Stage of Leishmania major Parasite

Harsh Pawar,1,2 Santosh Renuse,1,3 Sweta N. Khobragade,4 Sandip Chavan,1,5 Gajanan Sathe,1,5

Praveen Kumar,1 Kiran N. Mahale,4 Kalpita Gore,4 Aditi Kulkarni,4 Tanwi Dixit,4 Rajesh Raju,1

T. S. Keshava Prasad,1 H. C. Harsha,1 Milind S. Patole,4 and Akhilesh Pandey6–9

Abstract

Among the neglected tropical diseases, leishmaniasis is one of the most devastating, resulting in significant mortality and contributing to nearly 2 million disability-adjusted life years. Cutaneous leishmaniasis is a debilitating disorder caused by the kinetoplastid protozoan parasite Leishmania major, which results in disfiguration and scars. L. major genome was the first to be sequenced within the genus Leishmania. Use of proteomic data for annotating genomes is a complementary approach to conventional genome annotation approaches and is referred to as proteogenomics. We have used a proteogenomics-based approach to map the proteome of L. major and also annotate its genome. In this study, we searched L. major promastigote proteomic data against the annotated L. major protein database. Additionally, we searched the proteomic data against six-frame translated L. major genome. In all, we identified 3613 proteins in L. major promastigotes, which covered 43% of its proteome. We also identified 26 genome search-specific peptides, which led to the identification of three novel genes previously not identified in L. major. We also corrected the annotation of N-termini of 15 genes, which resulted in extension of their protein products. We have validated our proteogenomics findings by RT-PCR and sequencing. In addition, our study resulted in identification of 266 N-terminally acetylated peptides in L. major, one of the largest acetylated peptide datasets thus far in Leishmania. This dataset should be a valuable resource to researchers focusing on neglected tropical diseases.

Introduction

The protozoan parasite Leishmania causes a wide range of infectious diseases referred to as leishmaniasis, which is the second most common cause of mortality and fourth in morbidity amongst different tropical infectious diseases (WHO, 2010). Leishmaniasis also contributes to nearly 2 million disability-adjusted life years in terms of disease burden (McDowell et al., 2011). These kinetoplastid parasites lead a digenetic life cycle between the sand fly vector and their mammalian or vertebrate hosts (Killick-Kendrick, 1990). The parasites exist as motile, free-living flagellated forms referred to as promastigotes within the sand fly gut. Once the sand fly takes a blood meal on the human host, these flagellated parasites infect the phagocytic cells (i.e., macrophages and neutrophils in the human host). These engulfed parasites reside within the phagolysosome and differentiate into aflagellated amastigotes. Depending on genetics and immunity-related factors in the host, pathogenic species of Leishmania are responsible for causing a spectrum of clinical manifestations in patients (Murray et al., 2005).

The majority of leishmaniasis cases reported are dermatotropic infections termed cutaneous leishmaniasis (CL). CL is caused by a complex that includes L. major, L. mexicana, and L. tropica. Upon infection with the parasite, symptoms generally start appearing within a few days and may last up to several months. Progressively increasing erythematous, and frequently pruritic, papules appear at the site of the sand fly bite. Lymphatic spread with regional lymphadenopathy is common during the infection. The initial papule turns scaly, and the scaly region further develops into an inflamed ulcer. Very often, the site is self-healing, leaving a permanent scar, although the parasites persist throughout life. In a few cases, this inflammatory site may acquire a secondary infection, leading to further complications. In a small percentage of cases, a primary CL lesion may leave an individual at risk for later development of mucocutaneous leishmaniasis (Schwartz et al., 2006).

1 Institute of Bioinformatics, International Technology Park, Bangalore, India.
2 Rajiv Gandhi University of Health Sciences, Bangalore, India.
3 Department of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam, India.
4 National Centre for Cell Sciences, Pune, India.
5 Manipal University, Madhav Nagar, Manipal, India.
6 McKusick-Nathans Institute of Genetic Medicine, Departments of 7 Biological Chemistry, 8 Oncology, and 9 Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland.

OMICS A Journal of Integrative Biology, Volume 18, Number 8, 2014. © Mary Ann Liebert, Inc. DOI: 10.1089/omi.2013.0159

The L. major genome was the first kinetoplastid protozoan parasite genome to be sequenced, along with Trypanosoma cruzi and Trypanosoma brucei in 2005. The L. major genome is 32.8 Mb in size and comprises 36 chromosomes ranging from 0.28 Mb to ~2.8 Mb. According to the current annotations, L. major has 9222 genes, of which 8272 are protein-coding, 911 are RNA genes, and 39 are pseudogenes (Ivens et al., 2005). In addition to L. major, L. infantum (36 chromosomes) from the Old World group and L. braziliensis (35 chromosomes) from the New World group were sequenced in 2007 (Peacock et al., 2007). Recently, two other Leishmania species have been sequenced: L. donovani, which has 36 chromosomes with a genome size of 32.4 million base pairs encoding 8195 genes (Downing et al., 2011), and L. mexicana, which has 34 chromosomes (Rogers et al., 2011).

Annotating genomes using mass spectrometry data is a complementary approach to conventional genome annotation (Mann and Pandey, 2001; Pandey and Lewitter, 1999; Pandey and Mann, 2000). This approach of searching mass spectrometry data against EST databases (Choudhary et al., 2001) and also against six-frame translated nucleotide sequences from a genome provides the most direct evidence of protein-coding genes (Yates et al., 1995). This approach has resulted in identification of novel genes and corrections to the existing gene models in Homo sapiens (Desiere et al., 2005; Fermin et al., 2006; Menon et al., 2009; Molina et al., 2005), Anopheles gambiae (Chaerkady et al., 2011; Kalume et al., 2005), Mycobacterium tuberculosis (Kelkar et al., 2011), Candida glabrata (Prasad et al., 2012), Plasmodium falciparum (Lasonder et al., 2002), Toxoplasma gondii (Xia et al., 2008), Schmidtea mediterranea (Bocchinfuso et al., 2012), Yersinia pestis (Payne et al., 2010), Pristionchus pacificus (Borchert et al., 2010), Deinococcus deserti (Baudet et al., 2010), Leishmania donovani (Nirujogi et al., 2013), and other organisms. This proteogenomics approach is also useful in identifying splice variants, extensions/truncations to the existing proteins, changes in translational start site based on identification of N-terminally acetylated peptides, and frame changes (Castellana and Bafna, 2010; Renuse et al., 2011). This approach is mostly useful in situations where the complete genome sequence of the organism under study is available (Armengaud et al., 2013). However, in cases where the genome sequence is unavailable, proteins can be identified by using genome sequence data from taxonomically related species; we have used comparative proteogenomics to map the proteome of L. donovani (Pawar et al., 2012).

Leishmania, being a unique unicellular dimorphic pathogen, has been studied in detail by various researchers using proteomics tools, as summarized by Paape and Aebischer (2011). Proteomic profiling of leishmanial promastigote and amastigote stages has shown differentially regulated proteins in the two growth stages (Alcolea et al., 2011; Pawar et al., 2012; Pescher et al., 2011). Comparative proteomic analyses have been reported of antimony-resistant and -susceptible Leishmania (Biyani et al., 2011; Matrangolo et al., 2013; Walker et al., 2012), as well as of Leishmania under oxidative and nitrosative stresses (Sardar et al., 2013). A total of 627 phosphoproteins of L. donovani axenic promastigotes and amastigotes have been identified by proteomic analysis, which revealed that leishmanial proteins have multiple phosphorylation sites and that phosphorylation occurs at distinct stages of the life cycle (Tsigankov et al., 2013). Similarly, large numbers of membrane proteins, which are usually difficult to detect and characterize by a proteomic approach, have been identified from promastigote and amastigote stages (Brotherton et al., 2012). Few investigators have studied macrophage proteomics after infection with the leishmanial parasite, and these have mainly focused on cell metabolism (Menezes et al., 2013) and phagosome and exosome formation (Campbell-Valois et al., 2012). Although the L. major genome was the first Leishmania genome to be sequenced, not a single proteomic study of this pathogen has been reported.

In the present study, we carried out proteogenomics analysis of L. major promastigotes; the high resolution mass spectrometry data were searched against the protein database of L. major. This resulted in identification of 23,307 unique peptides, which mapped to 3613 proteins in L. major, accounting for 43% of the L. major proteome. In addition to the proteomic analysis of L. major, we also carried out proteogenomics analysis by searching L. major promastigote data against the six-frame translated genome of L. major. This resulted in identification of 26 genome search-specific peptides (GSSPs), which in turn resulted in identification of 3 novel genes and 15 N-terminal extensions of existing gene models in L. major based on high resolution mass spectrometry-derived data. Finally, we performed RT-PCR and sequencing to validate the existence of these novel genes and N-terminal extensions identified in L. major.

Materials and Methods

Leishmania major cell culture

We used the MHOM/IL/67/JERICHO II (ATCC 50122) strain of Leishmania major for proteomic analysis. The motile promastigote stage of the L. major ATCC strain was cultured in DMEM containing 10% FBS at 25 °C. The cells were grown until the L. major culture reached mid-log phase. The promastigotes were observed under a phase contrast microscope to check for the characteristic flagella that give the promastigote its whipping motility. The cells were harvested by centrifuging at 3000 rpm for 10 min, and the cell pellet was washed with PBS (pH 7). This procedure was repeated six times to remove any contaminating serum proteins from the cell culture medium. The promastigote cells were counted using a Neubauer chamber, and 1 × 10^9 cells were used for the proteomic analysis.

Sample processing, protein isolation and fractionation

Sample preparation was carried out as described in our previous work (Pawar et al., 2012). Briefly, 1 × 10^9 promastigotes were lysed in 0.5% SDS solution after six sonication cycles. The lysate was centrifuged at 10,000 rpm for 10 min, and the supernatant was used for further proteomic experimentation. Lysate (200 µg) was resolved using 10% SDS-PAGE, and in-gel trypsin digestion (1:20 trypsin) of different protein bands was carried out. The peptide fractions were vacuum dried. In addition, 200 µg of lysate was subjected to in-solution trypsin digestion (1:15 trypsin) as described previously (Pawar et al., 2012). This was followed by strong cation exchange (SCX) chromatography-based fractionation of tryptic peptides. These fractions were completely dried and reconstituted in 40 µL of 0.2% formic acid.

Mass spectrometry analysis

A total of 49 fractions obtained from the two fractionation methods were analyzed by LC-MS/MS. For LC-MS/MS, a desalting trap column (5 µm, 100 Å Magic C18, Michrom Bioresources) and an analytical column (5 µm, 100 Å Magic C18, Michrom Bioresources) were connected to a Proxeon Easy nLC system (Thermo Scientific, Bremen, Germany). The RP-LC was connected on-line with an LTQ-Orbitrap Velos ETD mass spectrometer (Thermo Electron, Bremen, Germany). The nanospray source was fitted with an 8 µm emitter tip (New Objective, Woburn, MA), and a voltage of 2 kV was applied. Peptide samples reconstituted in HPLC solvent A (0.1% formic acid) were loaded onto the trap column and washed for 5 min with 97% HPLC solvent A and 3% solvent B (90% ACN in 0.1% formic acid). The peptide separation was carried out using a linear gradient of 7%–30% solvent B for 53 min at a constant flow rate of 0.4 µL/min. The data were acquired using Xcalibur 1.2 (Thermo Electron). In the scan range of m/z 350 to 1800, the 20 most abundant ions were selected for fragmentation. The acquired ions were excluded for 30 sec. Target ion quantity for the FT full MS scan was 5 × 10^5 and for MSn was 2 × 10^5. The FT analyzer was used to acquire MS at a resolving power of 60,000 (for precursor). HCD mode was used to carry out MS/MS with a resolving power of 15,000 at 400 m/z (for fragment ions), and the lock mass option was enabled for accurate mass measurements. Polydimethylcyclosiloxane ions were used for internal calibration.

Mass spectrometric data searches and protein identification

A nonredundant (nr) protein database of Leishmania major strain Friedlin (n = 8412), available from the TriTrypDB kinetoplastid genomics resource (http://tritrypdb.org/common/downloads/release-4.1/Lmajor/) as of July 20, 2012, was used for our analysis. Database-dependent searches were submitted to the Mascot (version 2.2) and Sequest search engines using Proteome Discoverer (Thermo Scientific, version 1.2), and the data were analyzed. The search parameters were as follows: a) trypsin as the proteolytic enzyme (with up to one missed cleavage); b) a peptide mass error tolerance of 20 ppm; c) a fragment mass error tolerance of 0.1 Da; d) cysteine carbamidomethylation as a fixed modification; e) oxidation of methionine and protein N-terminal acetylation as variable modifications. Peptide data from the Mascot and Sequest search algorithms were extracted with a 1% false discovery rate (FDR) threshold using Thermo Proteome Discoverer. In the first pass search, the proteomic data were searched against the L. major protein database, and unassigned spectra not mapping to the protein database were filtered using a spectrum confidence filter. These unassigned spectra were searched in the second pass search against the six-frame translated L. major genome database, which resulted in the identification of GSSPs. Only Rank 1 peptides with high confidence were extracted and used for further analysis. Unique peptide data from both search algorithms (Mascot and Sequest) after protein database-dependent searches were used for further analysis.
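Two of the search parameters listed above, tryptic cleavage with up to one missed cleavage and the 20 ppm precursor tolerance, can be illustrated with a short sketch. This is not how Mascot or Sequest implement them; the function names are hypothetical, and the digest cleaves after every K or R without the proline exception that real search engines may apply.

```python
import re


def tryptic_digest(protein, missed_cleavages=1):
    """Return the set of tryptic peptides with up to `missed_cleavages` missed sites.

    Simplifying assumption: cleave after every K or R, ignoring the proline rule.
    """
    # Fully cleaved pieces: each ends in K or R, except possibly the last one.
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = set()
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` additional pieces onto piece i.
        for j in range(i, min(i + missed_cleavages, len(pieces) - 1) + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=20.0):
    """True if the precursor mass error is within the search tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm
```

For a toy sequence such as MKAAARSVHDVVLVGGSTR, the digest yields the three fully cleaved peptides plus the two peptides that span one missed site.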

Proteogenomics data analysis

The Leishmania major strain Friedlin genome sequence was downloaded from TriTrypDB (http://tritrypdb.org/common/downloads/release-4.1/Lmajor/), and a six-frame translated database (n = 1,117,604) was created using in-house Python programs, as described previously (Nirujogi et al., 2013). Briefly, the six-frame translated genome database of Leishmania major was created by masking the gaps in the genome. Frequently encountered contaminants such as trypsin, keratins, and BSA were added to both the protein and six-frame translated genome databases that were used for the MS/MS ion search. Searches were submitted through the Proteome Discoverer console (Thermo Scientific, version 1.2) to the Sequest and Mascot (version 2.2) search engines as described above.
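The in-house programs are not reproduced in this article, so the following is only a minimal sketch of what a six-frame translation step could look like. It assumes the standard genetic code, splits each frame at stop codons, and drops any stretch containing an ambiguous codon (such as one overlapping a gap of N bases). The minimum length of 7 residues and all function names are illustrative assumptions, not the authors' settings.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate codon by codon; unknown or ambiguous codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translate(seq, min_len=7):
    """Return (strand, frame, aa_offset, segment) for stop-to-stop stretches.

    Frames are numbered 1-3 on each strand; aa_offset is the 0-based
    position of the segment within that frame's translation.
    """
    seq = seq.upper()
    entries = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            offset = 0
            for segment in translate(s[frame:]).split("*"):
                # Skip short stretches and anything touching a masked gap.
                if len(segment) >= min_len and "X" not in segment:
                    entries.append((strand, frame + 1, offset, segment))
                offset += len(segment) + 1  # +1 accounts for the stop codon
    return entries
```

Keeping the strand, frame, and offset with each entry makes it possible to trace a peptide hit back to genomic coordinates later.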

Mass spectrometric data were searched against the L. major protein database in the first pass search. The unassigned spectra were searched against the L. major six-frame translated genome database in the second pass search. Peptides that mapped uniquely to the six-frame translated genome database but not to the protein database were referred to as GSSPs. We further analyzed the genomic regions to which these GSSPs mapped to identify novel genes or corrections to existing annotations. Alternative gene models were searched using two different gene prediction programs, FgeneSH and GeneMark for eukaryotes. Extensions in gene models, as well as novel genes obtained using peptide evidence and gene prediction tools, were then checked for their conservation across the kinetoplastids (including the genera Leishmania and Trypanosoma). In addition, we checked other well-curated Leishmania databases, such as TriTrypDB and the Ensembl L. major genome browser, to see if the N-terminal extensions and novel genes identified in our analysis have been incorporated in the current gene builds in these databases.
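The GSSP definition and the first triage of where a GSSP falls relative to annotated genes can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: leucine and isoleucine are treated as indistinguishable, genes are given as hypothetical (id, start, end, strand) tuples with 1-based inclusive coordinates, and the 300 bp upstream window is an arbitrary illustrative value. Reading-frame agreement with the downstream gene is not checked here.

```python
def normalize(seq):
    """Leucine and isoleucine are isobaric, so treat them as one residue."""
    return seq.replace("I", "L")


def find_gssps(genome_search_peptides, annotated_proteins):
    """Peptides from the genome search that match no annotated protein."""
    proteome = [normalize(p) for p in annotated_proteins]
    return sorted(
        pep for pep in set(genome_search_peptides)
        if not any(normalize(pep) in prot for prot in proteome)
    )


def classify_gssp(pep_start, pep_end, pep_strand, genes, upstream_window=300):
    """Classify a GSSP by its position relative to same-strand genes.

    genes: iterable of (gene_id, start, end, strand), 1-based inclusive.
    """
    for gene_id, g_start, g_end, g_strand in genes:
        if g_strand != pep_strand:
            continue
        if pep_start <= g_end and pep_end >= g_start:
            return ("overlaps_gene", gene_id)
        # Upstream of the 5' end: lower coordinates on "+", higher on "-".
        if g_strand == "+" and g_start - upstream_window <= pep_end < g_start:
            return ("n_terminal_extension_candidate", gene_id)
        if g_strand == "-" and g_end < pep_start <= g_end + upstream_window:
            return ("n_terminal_extension_candidate", gene_id)
    return ("intergenic", None)
```

Candidates flagged this way would still need the gene prediction, conservation, and RT-PCR checks described in the text before any annotation is changed.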

Primer designing and RT-PCR validation

Specific primers were designed for gene models (novel genes and extensions) using Gene Runner (Version 3.05) software. Primer design was based on annotation of gene models as predicted by gene prediction algorithms or homology to other Leishmania species. The size of the amplicon was chosen to be 300 to 500 bp to ensure good sequencing quality. Total RNA was isolated from the promastigote stage of L. major using the Qiagen RNeasy kit (Qiagen, Netherlands), and the yield of the RNA was estimated using a Nanodrop 2000 (Thermo Scientific, DE). The RNA was treated with DNase I. cDNA was prepared by reverse transcription of 1 µg RNA using the ABI high capacity cDNA reverse transcription kit. The PCR reaction mixture consisted of approximately 1 µL cDNA obtained from the promastigote life stage of L. major, 10 nM of forward and reverse primers, 1.5 mM MgCl2, 0.2 mM dNTP mix, 1.5 U of Taq polymerase, and Taq PCR buffer in 25 µL reaction volumes. Amplification of the targets was achieved by the following PCR cycle: 95 °C for 5 min; 35 cycles of 94 °C for 60 sec, 50 °C–60 °C for 45 sec, and 72 °C for 60 sec; and a final extension at 72 °C for 10 min. A PCR reaction carried out without cDNA from the promastigote stage served as the negative control. Amplicon size was checked using a DNA ladder on a 1.5% agarose gel. Specific amplicons were purified with the Qiagen gel extraction kit and subjected to sequencing by Sanger's method. The cDNA sequences obtained have been submitted to GenBank.

Availability of proteomic data

The raw mass spectrometry data (.raw files) generated from this study have been made publicly available to other researchers through the Tranche server (http://proteomecommons.org/tranche). The Tranche Hash is: cONyF1wG0tlabIAwSiSrzrcdNDiXbqaoZfRQoDHu8Z + 0OQck IyvAGp7Rv3kQ3U5pZHDUIDhDcqGsMcjQlA6SGIDJm5AAAAAAAAAYvg = =

Results

Summary of proteomic data

We carried out extensive proteomic profiling of L. major promastigotes to map its proteome and annotate its genome. This study generated one of the largest protein catalogs of L. major to date. The workflow used for proteomic analysis and genome annotation of L. major is outlined in Figure 1. Peptide fractions obtained from two different fractionation methods (i.e., in-gel and SCX) were analyzed on an LTQ-Orbitrap Velos ETD mass spectrometer, and 49 LC-MS/MS runs were carried out. We used high resolution settings for both MS and MS/MS fragmentation. The mass spectrometry data were searched using the Sequest and Mascot search algorithms. Approximately 285,651 MS/MS spectra were acquired in this study, which resulted in 172,513 peptide spectrum matches (PSMs). In total, 23,333 unique peptide sequences that passed the 1% FDR threshold were identified from protein and genome database searches. Thus the high resolution mass spectrometry data yielded more protein identifications than previously published studies characterizing the proteome of L. major. The complete list of proteins and peptides identified in protein database searches is shown in Supplementary Tables S1 and S2 (supplementary material is available online at www.liebertonline.com/omi).

FIG. 1. Workflow for sample processing, fractionation, and proteomic analysis of L. major promastigotes. Leishmania major promastigotes were used for the proteomic analysis as indicated. Proteins from lysates of promastigotes were extracted using SDS and subjected to SDS-PAGE or digested with trypsin and subjected to SCX (strong cation exchange) chromatography. LC-MS/MS analysis of digested gel bands or SCX fractions was carried out on a high resolution mass spectrometer. The mass spectrometry data were searched against a protein database and a six-frame translated genome database of L. major.
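The headline counts quoted in the text can be cross-checked with simple arithmetic. The sketch below uses only figures reported in this article; the variable names are ours. The PSM-to-spectrum ratio is an approximation of the identification rate, since PSMs from the two search engines may not correspond one-to-one with acquired spectra.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts quoted in the text.
MSMS_SPECTRA = 285_651
PSMS = 172_513
PROTEIN_DB_PEPTIDES = 23_307    # unique peptides from the protein database search
GSSPS = 26                      # genome search-specific peptides
TOTAL_UNIQUE_PEPTIDES = 23_333
PROTEINS_IDENTIFIED = 3_613
PROTEIN_DB_SIZE = 8_412

summary = {
    # PSMs per acquired MS/MS spectrum, as a percentage.
    "psm_to_spectrum_pct": pct(PSMS, MSMS_SPECTRA),
    # Protein-database peptides plus GSSPs should equal the reported total.
    "peptide_total_consistent": PROTEIN_DB_PEPTIDES + GSSPS == TOTAL_UNIQUE_PEPTIDES,
    # Fraction of the annotated proteome with peptide evidence.
    "proteome_coverage_pct": pct(PROTEINS_IDENTIFIED, PROTEIN_DB_SIZE),
}
```

The peptide totals agree (23,307 + 26 = 23,333), and the coverage rounds to the 43% stated in the text.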

Confirmation of annotated protein coding genes in L. major genome

The majority of the annotated protein-coding genes in the L. major genome were identified based on in silico gene prediction algorithms, and most of the proteins are hypothetical proteins with no known assigned function. We wanted to carry out in-depth proteomic profiling of L. major promastigotes that would directly identify protein-coding genes. L. major has 8412 protein-coding genes and, in our study, we identified 23,307 unique peptides derived from 3613 proteins. Of the 3613 proteins identified in L. major promastigotes, 58% (2082 proteins) were identified by three or more peptides per protein, 14% (508 proteins) by two peptides, and 28% (1023 proteins) by a single peptide. Of the 8412 proteins in L. major, the majority are uncharacterized hypothetical proteins (5336), and we have confirmed the existence of 1999 of these hypothetical proteins (24% of the L. major proteome). Most of these hypothetical proteins have no known biological function assigned, and their role in L. major cellular physiology remains unexplored. Figure 2 illustrates an example that shows 51 unique peptides (2036 PSMs) mapping to the LmjF.28.2770 protein-coding gene, which codes for the 658 amino acid long heat-shock protein 70 (HSP70). A representative MS/MS spectrum of the HSP70 peptide SVHDVVLVGGSTR identified in L. major has been provided. HSP70 has been shown to play a crucial role in Leishmania stage differentiation (Louw et al., 2010), virulence (Folgueira et al., 2008), and cell cycle (Raina and Kaur, 2012). In addition to HSP70, we also identified proteins belonging to different categories that were abundantly expressed in L. major promastigotes, such as calpain-like cysteine peptidases (LmjF.27.0500, 278 unique peptides), glyceraldehyde 3-phosphate dehydrogenase (LmjF.30.2970, 38 unique peptides), pyruvate phosphate dikinase (LmjF.11.1000, 72 unique peptides), elongation factor 1-alpha, and paraflagellar rod proteins, among others. These proteins had multiple peptides mapping to them with hundreds of PSMs. They were abundantly expressed and seem to play an important role in parasite survival.
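The fragment assignments in the representative HSP70 spectrum can be checked by computing the theoretical singly charged b and y ions of SVHDVVLVGGSTR. The sketch below uses standard monoisotopic residue masses and assumes unmodified residues (the peptide contains no cysteine or methionine, so the fixed and variable modifications used in the search do not apply). The function name is hypothetical.

```python
# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Return ({i: m/z of b_i}, {i: m/z of y_i}) for singly charged fragments."""
    b_ions, y_ions = {}, {}
    for i in range(1, len(peptide)):
        # b_i: first i residues plus one proton.
        b_ions[i] = sum(MONO[aa] for aa in peptide[:i]) + PROTON
        # y_i: last i residues plus water plus one proton.
        y_ions[i] = sum(MONO[aa] for aa in peptide[-i:]) + WATER + PROTON
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("SVHDVVLVGGSTR")
```

The computed values agree with peak labels in Figure 2C to within a few hundredths of an m/z unit, for example y1 (labeled 175.11), b2 (187.10), y7 (689.40), and b8 (849.48).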

N-terminal acetylation and confirmation of translational start sites

Identification of the protein translational start site (TSS) and its correct assignment in predicted transcripts has always been a challenging task. The traditional approach for TSS determination relied on annotating the longest open reading frame (ORF) in a given nucleotide sequence under study. Hence, other methods such as mass spectrometry-based approaches (Gevaert et al., 2003; Molina et al., 2005) and homology-based bioinformatics approaches (Peri and Pandey, 2001) have been used as an additional line of evidence to determine N-terminal acetylation of annotated proteins. The N-terminal acetylation reaction is catalyzed by N-acetyltransferases, which transfer an acetyl group from acetyl-CoA to the first amino acid after excision of the initiator methionine. The excision of the N-terminal initiator methionine is catalyzed by methionine aminopeptidases (Frottin et al., 2006). N-terminal acetylation occurs in the majority of eukaryotic proteins. There are five major groups of N-terminal acetyltransferases (A to E) that catalyze N-terminal acetylation (Arnesen et al., 2009; Hollebeke et al., 2012). Mass spectrometry-based approaches can be used to ascertain the exact TSSs using N-terminal acetylation of peptides (Helbig et al., 2010). N-terminally acetylated peptides can be identified by searching mass spectrometry data with N-terminal acetylation specified as a modification during database searches. In our present analysis, we identified 266 N-terminally acetylated peptides; among these, 21% of initiator methionine residues were found to be acetylated. In addition, amino acids at the second position were found to be acetylated, namely serine (47%), alanine (25%), and threonine (7%). This approach can be used to correctly annotate the TSS for protein-coding genes. The complete list of N-terminally acetylated peptides and the corresponding proteins is shown in Supplementary Table S2.

FIG. 2. Identification of heat shock protein 70 (hsp70). (A) Heat shock protein 70 (blue bar) of L. major is encoded by the gene LmjF.28.2770 and is a 658 amino acid long protein. The red bars indicate the peptides identified in our study mapping to HSP70. (B) The sequence of heat shock protein 70 (LmjF.28.2770) was identified in L. major on the basis of 51 unique peptides with 2,076 PSMs and 70% protein coverage; the red colored sequences are the peptides identified in this study. (C) A representative MS/MS spectrum of the peptide SVHDVVLVGGSTR identified from heat shock protein 70.
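How an N-terminally acetylated peptide supports an annotated start site can be sketched as a simple position check: either the peptide begins at the initiator methionine, or it begins at the second residue after methionine excision. The function names and category labels below are hypothetical, and the toy sequences in the usage note are not from the study.

```python
from collections import Counter


def classify_acetylated_nterm(protein, peptide):
    """Relate an N-terminally acetylated peptide to the annotated protein start."""
    if not peptide:
        return ("not_at_annotated_start", None)
    # Case 1: the initiator methionine is retained and acetylated.
    if peptide.startswith("M") and protein.startswith(peptide):
        return ("initiator_met_acetylated", "M")
    # Case 2: the methionine was excised and residue 2 is acetylated.
    if protein.startswith("M") and protein[1:].startswith(peptide):
        return ("second_residue_acetylated", peptide[0])
    # Otherwise the peptide does not confirm the annotated start.
    return ("not_at_annotated_start", None)


def tally_acetylated_residues(pairs):
    """Percentage of each acetylated residue over (protein, peptide) pairs."""
    counts = Counter()
    for protein, peptide in pairs:
        _, residue = classify_acetylated_nterm(protein, peptide)
        if residue is not None:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {res: round(100.0 * n / total, 1) for res, n in counts.items()}
```

A tally of this kind over the identified peptides is one way to arrive at a residue breakdown like the one reported above. A peptide that is acetylated but falls in the third category may point to a misannotated start site and is worth inspecting against the genome.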

Proteogenomics analysis of L. major

Identifying novel genes and corrections to existing gene models can be achieved by applying a proteogenomics approach. The majority of genome annotations are carried out using in silico approaches. Proteomics can provide direct evidence for the existence of a protein-coding gene in a given genome under study. We carried out a proteogenomics analysis of L. major, which resulted in identification of genome search-specific peptides (GSSPs). These GSSPs mapped uniquely to the L. major genome and were identified in our study by searching unassigned spectra that were filtered out after the L. major protein database searches. These unassigned spectra were searched against the L. major six-frame translated genome database in the second pass search. This proteogenomics study resulted in identification of 15 N-terminal extensions and three novel genes in L. major. A complete overview of the proteogenomics workflow adopted for the L. major promastigote MS/MS data is shown in Figure 3. The details of the novel genes and N-terminal extensions along with the corresponding GSSPs are provided in Supplementary Table S3.

Identification of novel genes in L. major. We identifiedGSSPs mapping to the six-frame translated L. major genome.These were categorized as intergenic (i.e., mapping to par-ticular genomic regions that have no known protein codinggenes). The identification of these novel protein coding genesin L. major shows that these genes are not a part of the currentL. major genome annotation. In the current study, three un-ique peptides were identified mapping to the intergenic re-gion of L. major genome that resulted in identification ofthree novel protein coding genes in L. major. An illustrativeexample of a novel gene identified in L. major coding for aconserved hypothetical protein is shown in Figure 4. Weidentified a unique GSSP mapping to intergenic region onchromosome 32 on the negative strand in the third frame.Upon gene prediction analysis, we identified a protein that is125 amino acids long in this region. This is a novel gene inL. major and there is no conserved annotated ortholog inother Leishmania species, namely L. donovani, L. brazi-liensis, and L. infantum. However, Trypanosoma cruzi has agene Tc00.1047053511707.20 that codes for this hypotheti-cal protein. Upon further analysis of the genomic sequencesfrom different Leishmania species, we identified similar

[Figure 3 panel labels: L. major mass spectrometry data; Sequest/Mascot; L. major protein database searches; 3,613 proteins in L. major; peptides unique to protein database searches; unassigned spectra; spectrum confidence filter; Sequest/Mascot; L. major six-frame translated genome database searches; genome search specific peptides (GSSPs); 3 novel gene models; corrections to existing gene models; 15 N-terminal extensions.]

FIG. 3. Proteogenomics workflow used to analyze L. major promastigote proteomic data. L. major mass spectrometry data were first searched against an L. major protein database, resulting in the identification of 3613 proteins in L. major. The unmatched spectra were searched in the second-pass search against the six-frame translated L. major genome database, which resulted in identification of GSSPs. These GSSPs, in turn, provided evidence for the existence of novel protein-coding genes and also resulted in corrections to existing annotations in L. major upon further analysis of genomic regions.

6 PAWAR ET AL.

conserved ORFs in other Leishmania species that are not yet identified and annotated as protein-coding genes. Our proteogenomics data show that these novel genes have protein-coding potential. Further investigations need to be carried out to determine whether they are also expressed in other Leishmania species that have similar conserved genomic sequences. Another example of a novel gene in L. major is one that codes for a hypothetical protein. We identified a unique GSSP mapping to the intergenic region on chromosome 9 on the negative strand in the first frame. Gene prediction analysis resulted in identification of a 140 amino acid long protein. We have not identified any conserved orthologs in other protozoans.

Identification of N-terminal extension in L. major. In the present study, we have corrected the annotation of existing gene models in L. major. Towards this end, we identified GSSPs that mapped to the 5′ boundary of existing genes in L. major. In all, we identified 23 unique peptides that resulted in N-terminal extension of 15 proteins in L. major. One example of N-terminal extension of an existing L. major gene is the leucine-rich repeat coding gene (LmjF.30.0150), which codes for a 216 amino acid long protein and is present on chromosome 30 on the negative strand in the first frame. We identified 3 GSSPs mapping upstream of this gene. This resulted in extension of this protein by 140 amino acids. Upon gene prediction analysis, we identified a longer protein that contained the 3 GSSPs. It had a conserved ortholog in L. braziliensis, encoded by the gene LBRM_30_0160, which codes for a 359 amino acid long protein (Fig. 5). Thus we corrected the annotation of the existing gene in L. major and identified an N-terminally extended protein product. Another example is the N-terminal extension of the gene LmjF.26.1550, which codes for the mitochondrial tri-functional enzyme alpha subunit (726 amino acids) and is present on chromosome 26 on the positive strand in the second frame. We identified a single GSSP mapping upstream of this gene. This resulted in extension of this protein by 88 amino acids. Upon gene prediction analysis, we identified a longer protein that contained the GSSP. It had a conserved ortholog in L. braziliensis, encoded by the gene LBRM_26_1570, which codes for an 804 amino acid long protein.
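The distinction between the two kinds of GSSP evidence (intergenic peptides versus peptides lying 5′ of an annotated gene on the same strand) can be sketched as a coordinate check. This is a simplified illustration, not the authors' pipeline: the `Interval` type, the `max_upstream` window of 1000 bp, and the coordinates in the example are all hypothetical. A real analysis would also confirm the reading frame and the absence of intervening stop codons, which is what the gene prediction step described above addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 1-based genomic coordinate, inclusive
    end: int     # inclusive, end >= start
    strand: str  # "+" or "-"


def classify_gssp(peptide, genes, max_upstream=1000):
    """Classify a peptide locus relative to annotated genes on the same strand.

    max_upstream is a hypothetical window (in bp) for calling a peptide
    'upstream' of a gene; it is not a parameter taken from the study.
    """
    for gene in genes:
        if gene.strand != peptide.strand:
            continue
        if peptide.start <= gene.end and peptide.end >= gene.start:
            return "overlaps annotated gene"
        # The 5' end of a gene is its start on '+' and its end on '-'.
        if gene.strand == "+":
            gap = gene.start - peptide.end
        else:
            gap = peptide.start - gene.end
        if 0 < gap <= max_upstream:
            return "candidate N-terminal extension"
    return "intergenic (candidate novel gene)"


# Hypothetical coordinates for a gene on the negative strand.
genes = [Interval(5000, 5650, "-")]
print(classify_gssp(Interval(5700, 5741, "-"), genes))
print(classify_gssp(Interval(9000, 9038, "-"), genes))
```

The strand-aware gap calculation matters here because several of the examples in the text lie on the negative strand, where "upstream" means a higher genomic coordinate.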

FIG. 4. Identification of a novel protein-coding gene in L. major based on peptide evidence. (A) A genome search specific peptide (red bar) mapped to an intergenic region in the L. major genome where no annotated gene exists. Further gene prediction analysis of this genomic region using the FGeneSH algorithm resulted in identification of a novel protein-coding gene in L. major. Upon further bioinformatics analysis, a corresponding ortholog was identified in T. cruzi (Tc00.1047053511707.20). (B) Sequence of the novel predicted protein identified in the FGeneSH gene prediction analysis of L. major. Red sequences denote the GSSP identified in L. major, which supports the existence of this novel protein-coding gene. (C) A representative MS/MS spectrum of an identified genome search specific peptide, LIEEDLFLDHIDK, is shown.

CUTANEOUS LEISHMANIASIS 7

Bioinformatics analysis

Identified proteins were classified into various groups based on their primary subcellular localization (e.g., membrane, cytoplasm). Additionally, proteins were also grouped on the basis of biological process (e.g., metabolism). The analysis was carried out in accordance with Gene Ontology (GO) standards. The 3613 proteins identified in the current proteogenomics analysis of L. major promastigotes were categorized on the basis of biological processes (e.g., cell signaling and communication). This resulted in 1418 proteins (39%) being assigned to one of the biological process categories (Fig. 6a). The majority of the grouped proteins play a role in cellular metabolism, protein synthesis, degradation, and transport. However, only a small number of the identified proteins are known to be associated with pathogenesis. This suggests that many of the virulence factors associated with infection and survival of L. major in infected host cells (i.e., neutrophils and macrophages) are expressed constantly at a certain basal level, even

FIG. 5. Identification of N-terminally extended proteins in L. major. (A) Three genome search specific peptides (red bars) mapped to the upstream region of the L. major gene LmjF.30.0150 (red box), which codes for a short protein product of 216 amino acids (blue box). However, upon gene prediction analysis, the presence of a much longer protein-coding gene extending the N-terminus of this gene was identified. The ortholog LbrM.30.0160 in L. braziliensis codes for a 359 amino acid long protein (blue box) and supports the N-terminal extension. (B) Pairwise alignment of the full-length L. braziliensis ortholog LbrM.30.0160, the truncated L. major protein LmjF.30.0150, and the full-length predicted protein identified in the FGeneSH gene prediction analysis. The GSSPs supporting the N-terminal extension and correction of the existing gene model of LmjF.30.0150 in L. major are shown as red sequences, along with their conservation in the L. braziliensis ortholog. (C) A representative MS/MS spectrum of an identified genome search specific peptide, IGPQGAMFLFDALR, is shown.


in axenic L. major cultures. Expression of these pathogenicity-associated factors probably increases after exposure of these pathogens to the host immune system. Additionally, proteins were classified on the basis of primary subcellular locations (Fig. 6b). We have also analyzed biological domains and motifs of the proteins identified in the current proteogenomics analysis of L. major. The details of the corresponding domains and motifs can be found in Supplementary Table S1. Many of the identified proteins could not be assigned to any known category and remained unclassified. Many of these proteins could be potential drug targets or vaccine candidates.
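The category counts shown in Figure 6A can be cross-checked against the totals quoted in the text. The short Python tally below uses only the counts from the figure labels; the variable names are illustrative.

```python
# Biological process counts as labelled in Figure 6A (unclassified proteins excluded).
BIOLOGICAL_PROCESS = {
    "Cell signaling and communication": 87,
    "Cell cycle and apoptosis": 30,
    "Metabolism": 465,
    "Pathogenesis": 22,
    "Nucleic acid metabolism": 147,
    "Transport": 200,
    "Cell adhesion and motility": 67,
    "Protein folding, proteolysis and translation": 400,
}
TOTAL_PROTEINS = 3613

classified = sum(BIOLOGICAL_PROCESS.values())
unclassified = TOTAL_PROTEINS - classified
percent_classified = round(100 * classified / TOTAL_PROTEINS)

# Share of each category among the classified proteins, largest first.
shares = sorted(
    ((name, 100 * count / classified) for name, count in BIOLOGICAL_PROCESS.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(classified, unclassified, percent_classified)
for name, share in shares:
    print(f"{name}: {share:.1f}% of classified proteins")
```

The tally reproduces the 1418 classified proteins (39%) and the 2,195 unclassified proteins shown in the figure, and it makes the small size of the pathogenesis category (22 proteins) explicit.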

A bioinformatics analysis of L. major promastigote proteins was carried out by searching TriTrypDB for signal peptide and transmembrane domain information. Of the 3613 proteins identified in L. major, 399 proteins contained a transmembrane (TM) domain, and 366 proteins contained only signal peptides (SP). Additionally, 296 proteins contained both a transmembrane domain and a signal peptide (Fig. 6c). Identification of secreted and transmembrane proteins in L. major can provide valuable information regarding the role of these membrane/extracellular proteins in virulence and pathogenicity of L. major during cutaneous leishmaniasis.
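As a rough worked example, the three counts above can be combined into an overall fraction of proteins carrying a membrane or secretion signal. The sketch assumes that the three numbers are disjoint regions of the Venn diagram in Figure 6C (transmembrane only, signal peptide only, both); the text does not state this explicitly for the 399 transmembrane proteins, so the result should be read as an estimate under that assumption.

```python
TOTAL_PROTEINS = 3613

# Assumed to be non-overlapping groups (see the lead-in for this caveat).
tm_only = 399
sp_only = 366
tm_and_sp = 296

with_either = tm_only + sp_only + tm_and_sp
with_neither = TOTAL_PROTEINS - with_either
percent_with_either = round(100 * with_either / TOTAL_PROTEINS, 1)

print(with_either, with_neither, percent_with_either)
```

If the 399 transmembrane proteins already include the 296 proteins with both features, the combined total would instead be 399 + 366 = 765.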

Some of the interesting surface proteins that have been shown to be associated with virulence are as follows. Tuzins are surface proteins that have been studied in Trypanosoma cruzi, where their expression was shown to be regulated post-transcriptionally (Teixeira et al., 1999). Kinetoplastid membrane protein-11 (KMP-11) is a surface protein that is expressed in the promastigote stage of Leishmania. KMP-11 expression is severely downregulated in the amastigote stage, and the protein is localized to the flagellar basal body and flagella. KMP-11 is present either as a membrane-bound or a soluble protein (Berberich et al., 1998). It is highly antigenic and stimulates a variety of immune cells such as B-cells and dendritic cells.

[Figure 6 panel data]

Panel A, biological process: Cell signaling and communication (87); Cell cycle and apoptosis (30); Metabolism (465); Pathogenesis (22); Nucleic acid metabolism (147); Transport (200); Cell adhesion and motility (67); Protein folding, proteolysis and translation (400); Unclassified (2,195).

Panel B, subcellular localization: Cytoplasm (228); Glycosome (27); Nucleus (106); Flagella (11); Extracellular and membrane (205); Mitochondria (65); Proteasome (27); Ribosome (111); Endoplasmic reticulum and Golgi apparatus (37); Cytoskeleton (66); Unclassified (2,730).

Panel C, signal peptide and transmembrane domain: 399; 366; 296; Total = 3,613.

FIG. 6. Gene ontology based bioinformatics analysis of proteins identified in L. major. (A) Distribution of L. major proteins based on biological function. (B) Distribution of L. major proteins based on subcellular localization. (C) Distribution of secreted and transmembrane proteins identified in L. major. A large percentage of proteins remain unclassified.


Additionally, circulating anti-KMP-11 antibodies have been found in patients with leishmaniasis (Trujillo et al., 1999). A partial list of transmembrane and secreted proteins identified in L. major promastigotes is shown in Table 1.

RT-PCR validation of mass spectrometry derived data

To validate our findings, a subset of the novel identifications was tested using RT-PCR. These included 5 N-terminal extensions and 3 novel genes. In total, 8 RT-PCR reactions were performed to validate the existence of transcripts for novel genes and N-terminal extension events. The cDNA sequences have been submitted to NCBI GenBank. Figure 7 shows the RT-PCR amplification products for the novel genes and N-terminal extensions identified in our study. The details of the validated novel genes and N-terminal extensions, with the corresponding GenBank accession numbers, are provided in Supplementary Table S4.

Discussion

The Leishmania major genome was the first among Leishmania species for which a cosmid contig map was constructed. It was also the first to be completely sequenced and annotated, and it is at present the best annotated genome amongst Leishmania parasites. The L. major complete genome sequence has been exploited for

Table 1. Transmembrane and Secreted Proteins Identified in L. major

No. | Gene | Protein | Unique peptides | Biological role
1 | LmjF.36.2570 | Membrane-bound acid phosphatase | 12 | Protects engulfed Leishmania against oxidative stress
2 | LmjF.35.2210 | Kinetoplastid membrane protein-11 | 7 | Surface glycoprotein that induces a strong immune response
3 | LmjF.08.0795 | Tuzin | 1 | Surface protein
4 | LmjF.05.1215 | Surface antigen-like protein | 8 | Surface protein
5 | LmjF.23.1082 | Hydrophilic acylated surface protein a | 4 | Surface protein expressed in the metacyclics and intracellular amastigotes. It has been suggested that the protein may play a role in metacyclic stages in parasite transmission from the vector sandfly to the host. In the intracellular amastigote stage it may have a role in parasite survival.
6 | LmjF.28.1200 | Glucose-regulated protein 78 | 46 | Member of HSP 70 family

FIG. 7. RT-PCR based validation of novel genes and N-terminal extensions identified in L. major. We validated the results from our proteogenomics study using RT-PCR followed by sequencing. Agarose gel electrophoresis was carried out on the PCR products of the N-terminal extensions and novel genes identified in the L. major proteogenomics study.


comparative genomics with other sequenced genomes of Leishmania, which showed conservation of gene content and synteny amongst different species of Leishmania (Peacock et al., 2007). The gene sequences also helped in transcriptomic analysis in different Leishmania spp. (Cohen-Freue et al., 2007; Depledge et al., 2009; Li et al., 2008). Various in silico analyses have been performed using this genome sequence to yield interesting information such as proteins targeted to glycosomes (Opperdoes and Szikora, 2006). Despite this information, very few studies have been carried out with L. major, especially in the area of transcriptomics and proteomics. Studies reported in these areas used L. donovani, L. mexicana, L. infantum, and L. panamensis, and showed that ~10%–12% of proteins are differentially expressed in amastigotes (Papadopoulou, 2008). This may be because it is easier to obtain motile promastigotes, infective metacyclics, and axenic non-motile amastigotes by in vitro culture with these parasites. Axenic amastigotes cannot be obtained in vitro from L. major, which makes it an unsuitable model for most Leishmania research. Proteomics studies in Leishmania have also helped reveal post-translational modifications (Rosenzweig et al., 2008), immunodominant antigens (Forgber et al., 2006), and virulence factors (Fasel, 2008). Proteins previously reported to be uniquely expressed in the amastigote stage, such as 40S ribosomal S2 protein, β-tubulin, and putative eukaryotic initiation factor 5a, have also been reported from the promastigote stage (Nugent et al., 2004).

Our proteogenomics study is significant because it provides the largest catalog of proteins detected by mass spectrometry as compared to other reported studies of different Leishmania spp. In total, 23,333 unique peptide sequences were identified, which belonged to 3613 proteins in L. major promastigotes. This represents 43% of the L. major proteome and comprises a large set of proteins with different cellular locations and functions. We compared our L. major promastigote data with gene expression analysis reported previously to identify stage-specific expression in L. major promastigotes (Akopyants et al., 2010). The gene expression data were downloaded from TriTrypDB. We identified various proteins that were reported to be expressed in log phase promastigotes, but also detected proteins that are known to be expressed in the metacyclic stage. We found many proteins in our proteomic data that are reported to be upregulated in the metacyclic stage of L. major, as seen by the relative increase in their mRNA compared to the log phase promastigote stage of the parasite. These include cell surface proteins such as amastins, hydrophilic acylated surface protein, and various transporters. Proteins upregulated in metacyclics that are responsible for oxygen radical metabolism, such as tryparedoxin and glutathione peroxidase, were also detected in our preparation (TritrypDB). In fact, metacyclic promastigotes are enriched from promastigote culture, and the presence of these proteins in our preparation indicates the presence of metacyclic forms in the culture. We also compared our L. major promastigote data with previously published gene expression data on L. major by Rochette et al. (2008), who compared gene expression profiles between mid-log phase promastigotes and lesion-derived amastigotes of L. major. Of the 431 unique genes found to be differentially expressed in L. major promastigotes, we identified 260 proteins in our study. These again include proteins responsible for flagellar components, surface antigen proteins, fatty acid metabolism, and various transporters.
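The headline numbers in this paragraph support a few quick derived quantities. The Python snippet below is a worked arithmetic sketch only: the implied proteome size is back-calculated from the rounded 43% figure, so it is an approximation and not a count reported by the authors.

```python
unique_peptides = 23333
proteins_identified = 3613
proteome_coverage = 0.43          # as quoted in the text (rounded)
rochette_promastigote_genes = 431  # differentially expressed genes in promastigotes
rochette_genes_detected = 260      # of those, detected as proteins in this study

# Average number of unique peptides supporting each identified protein.
peptides_per_protein = round(unique_peptides / proteins_identified, 1)

# Approximate size of the annotated proteome implied by 43% coverage.
implied_proteome_size = round(proteins_identified / proteome_coverage)

# Fraction of the promastigote-enriched transcripts that were seen at the protein level.
percent_rochette_detected = round(
    100 * rochette_genes_detected / rochette_promastigote_genes, 1
)

print(peptides_per_protein, implied_proteome_size, percent_rochette_detected)
```

Because 43% is a rounded value, the implied proteome size carries an uncertainty of roughly a hundred proteins in either direction.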

We identified 266 N-terminally acetylated peptides in our study. This is the largest catalog of acetylated peptides to date in Leishmania. N-terminal acetylation is crucial for the stability of proteins. In a proteomic study by Rosenzweig et al. (2008), the authors carried out a time course analysis of L. donovani differentiation using a quantitative proteomic approach. They also tried to identify various post-translational modifications in both life stages of L. donovani. Their study resulted in identification of 16 phosphopeptides, 20 methylated peptides, 10 glycopeptides, and 27 acetylated peptides (Rosenzweig et al., 2008). Of these, 26 acetylated peptides were unique and mapped to 26 proteins in L. donovani; 10 of them were present in our list of 266 N-terminally acetylated peptides (shown in bold black in Supplementary Table S5). The remaining 16 acetylated peptides identified in L. donovani were not identified in our L. major proteogenomics study. One peptide (MNSNHADAGAPAMEK), which mapped to aspartyl-tRNA synthetase, was redundant in Rosenzweig's list, as can be seen in Supplementary Table S5. The remaining 256 N-terminally acetylated peptides identified in L. major promastigotes have not been reported earlier, and the complete list of these 266 unique acetylated peptides is provided in Supplementary Table S5.
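The comparison with the Rosenzweig et al. list is essentially a set overlap after removing redundant entries. A minimal Python sketch of that bookkeeping is given below. The helper `compare_peptide_lists` and the placeholder sequences are hypothetical; the count check at the end uses only the numbers quoted in the text and assumes that the single redundant peptide accounts for the difference between 27 reported and 26 unique peptides.

```python
def compare_peptide_lists(reported, observed):
    """Summarize the overlap between two peptide lists after removing duplicates."""
    reported_unique = set(reported)
    observed_unique = set(observed)
    shared = reported_unique & observed_unique
    return {
        "reported_unique": len(reported_unique),
        "shared": len(shared),
        "reported_only": len(reported_unique - observed_unique),
        "observed_only": len(observed_unique - reported_unique),
    }


# Toy input with placeholder sequences (hypothetical, not from either study).
toy_summary = compare_peptide_lists(
    ["PEPA", "PEPA", "PEPB", "PEPC"],
    ["PEPB", "PEPD", "PEPE"],
)

# Count check using the figures quoted in the text.
reported_total = 27      # acetylated peptides listed for L. donovani
redundant_entries = 1    # assumed to explain 27 listed versus 26 unique
shared_with_l_major = 10
l_major_total = 266

reported_unique = reported_total - redundant_entries
not_found_in_l_major = reported_unique - shared_with_l_major
new_in_l_major = l_major_total - shared_with_l_major

print(toy_summary)
print(reported_unique, not_found_in_l_major, new_in_l_major)
```

Exact string matching is the simplest possible criterion; peptides from orthologous proteins in two species can differ by a residue or two, so a stricter or looser matching rule would change the overlap.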

Our proteogenomics analysis resulted in identification of 26 genome search specific peptides (GSSPs), which map uniquely to the six-frame translated genome of L. major and not to the L. major protein database. In all, 23 peptides mapped to the N-termini of 15 genes, thereby extending the boundary of these previously annotated genes. These 15 N-terminal extensions could be identified because the GSSPs mapped 5′ to the existing gene models in L. major. It should be pointed out that alternative coding sequences were shown to exist in the TriTrypDB database due to alternative splicing via spliced leader peptides. Many of the gene models that we have corrected in L. major have an alternative start site upstream of the currently annotated gene, and the GSSPs identified in our study mapped to the alternative CDS. However, these gene models were not updated, and our study showed that these alternative CDS lead to generation of longer proteins. Full-length orthologs for many of these N-terminally extended genes exist in other related kinetoplastids. Our proteogenomics analysis also identified three GSSPs that map to intergenic regions where no known ORF was shown to exist in the L. major genome. These three GSSPs resulted in identification of three novel genes in L. major, and these genes do not have conserved annotated orthologs in other Leishmania. This is notable because the L. major genome was the first to be sequenced, was used as the reference for assembling the genomes of other Leishmania species, and is the best curated genome among the Leishmania species. Using our proteogenomics approach, we could still identify three novel protein-coding genes that were missed earlier. Further analysis of the genomic regions containing these novel protein-coding genes in L. major and in other related Leishmania species resulted in identification of conserved ORFs that represent novel protein-coding genes not yet identified and annotated in L. major or in other related Leishmania species. Thus, at the genomic level, these novel genes are conserved in most Leishmania species, but they still need to be identified and annotated. We have validated the existence of these three novel genes and 5 N-terminal extensions identified in our proteogenomics study of L. major using RT-PCR and sequencing.

Conclusions

In the current proteogenomics analysis, we mapped 43% of the L. major proteome, illustrating the value of high-resolution mass spectrometry derived data in proteomic profiling and genome annotation of L. major. We have identified expression of many hypothetical proteins in promastigotes. In addition, we identified bona fide virulence factors that have been shown to be associated with pathogenesis and survival of Leishmania in infected hosts. Although the L. major genome was the first Leishmania genome to be sequenced and is one of the best annotated Leishmania genomes, we identified 26 GSSPs from six-frame translated genome searches of L. major, and this resulted in correction of 15 gene models (N-terminal extensions). In addition, we identified three novel genes in the L. major genome on the basis of GSSP evidence. This study shows that genome annotation is a challenging task, and that proteomic and proteogenomic approaches can assist in identification of novel genes missed in the current annotation. We could also refine existing gene models on the basis of peptide evidence.

Acknowledgments

We thank the Department of Biotechnology (DBT), Government of India, for research support to the Institute of Bioinformatics. Harsha Gowda is a Wellcome Trust/DBT India Alliance Early Career Fellow. Harsh Pawar and Sweta N. Khobragade are recipients of Senior Research Fellowships, and Gajanan Sathe and Sandip Chavan are recipients of Junior Research Fellowships, from the Council of Scientific and Industrial Research (CSIR), Government of India. Santosh Renuse is a recipient of a Senior Research Fellowship from the University Grants Commission (UGC), Government of India. We would like to thank Deepa Chaphekar for assistance in making the figures.

Author Disclosure Statement

The authors declare no conflicting financial interests.

References

Akopyants NS, Kruvand E, Wong I, and Beverley SM. (2010). L. major strain Friedlin Three Developmental Stages. Available from http://tritrypdb.org. Last access: 2/1/14.

Alcolea PJ, Alonso A, and Larraga V. (2011). Proteome profiling of Leishmania infantum promastigotes. J Eukaryot Microbiol 58, 352–358.

Armengaud J, Hartmann EM, and Bland C. (2013). Proteogenomics for environmental microbiology. Proteomics 13, 2731–2742.

Arnesen T, Van Damme P, Polevoda B, et al. (2009). Proteomics analyses reveal the evolutionary conservation and divergence of N-terminal acetyltransferases from yeast and humans. Proc Natl Acad Sci USA 106, 8157–8162.

Baudet M, Ortet P, Gaillard JC, et al. (2010). Proteomics-based refinement of Deinococcus deserti genome annotation reveals an unwonted use of non-canonical translation initiation codons. Mol Cell Proteomics 9, 415–426.

Berberich C, Machado G, Morales G, Carrillo G, Jimenez-Ruiz A, and Alonso C. (1998). The expression of the Leishmania infantum KMP-11 protein is developmentally regulated and stage specific. Biochim Biophys Acta 1442, 230–237.

Biyani N, Singh AK, Mandal S, Chawla B, and Madhubala R. (2011). Differential expression of proteins in antimony-susceptible and -resistant isolates of Leishmania donovani. Mol Biochem Parasitol 179, 91–99.

Bocchinfuso DG, Taylor P, Ross E, et al. (2012). Proteomic profiling of the planarian Schmidtea mediterranea and its mucous reveals similarities with human secretions and those predicted for parasitic flatworms. Mol Cell Proteomics 11, 681–691.

Borchert N, Dieterich C, Krug K, et al. (2010). Proteogenomics of Pristionchus pacificus reveals distinct proteome structure of nematode models. Genome Res 20, 837–846.

Brotherton MC, Racine G, Ouameur AA, Leprohon P, Papadopoulou B, and Ouellette M. (2012). Analysis of membrane-enriched and high molecular weight proteins in Leishmania infantum promastigotes and axenic amastigotes. J Proteome Res 11, 3974–3985.

Campbell-Valois FX, Trost M, Chemali M, et al. (2012). Quantitative proteomics reveals that only a subset of the endoplasmic reticulum contributes to the phagosome. Mol Cell Proteomics 11, M111 016378.

Castellana N, and Bafna V. (2010). Proteogenomics to discover the full coding content of genomes: A computational perspective. J Proteomics 73, 2124–2135.

Chaerkady R, Kelkar DS, Muthusamy B, et al. (2011). A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry. Genome Res 21, 1872–1881.

Choudhary JS, Blackstock WP, Creasy DM, and Cottrell JS. (2001). Interrogating the human genome using uninterpreted mass spectrometry data. Proteomics 1, 651–667.

Cohen-Freue G, Holzer TR, Forney JD, and McMaster WR. (2007). Global gene expression in Leishmania. Int J Parasitol 37, 1077–1086.

Depledge DP, Evans KJ, Ivens AC, et al. (2009). Comparative expression profiling of Leishmania: Modulation in gene expression between species and in different host genetic backgrounds. PLoS Negl Trop Dis 3, e476.

Desiere F, Deutsch EW, Nesvizhskii AI, et al. (2005). Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol 6, R9.

Downing T, Imamura H, Decuypere S, et al. (2011). Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance. Genome Res 21, 2143–2156.

Fasel NA, A. El Fadili-Kundig A, Gonzalez I, and Masina S. (2008). The Leishmania proteome. Leishmania after the genome, pgs. 55–76.

Fermin D, Allen BB, Blackwell TW, et al. (2006). Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics. Genome Biol 7, R35.

Folgueira C, Carrion J, Moreno J, Saugar JM, Canavate C, and Requena JM. (2008). Effects of the disruption of the HSP70-II gene on the growth, morphology, and virulence of Leishmania infantum promastigotes. Int Microbiol 11, 81–89.

Forgber M, Basu R, Roychoudhury K, et al. (2006). Mapping the antigenicity of the parasites in Leishmania donovani infection by proteome serology. PLoS One 1, e40.


Frottin F, Martinez A, Peynot P, et al. (2006). The proteomics of N-terminal methionine cleavage. Mol Cell Proteomics 5, 2336–2349.

Gevaert K, Goethals M, Martens L, et al. (2003). Exploring proteomes and analyzing protein processing by mass spectrometric identification of sorted N-terminal peptides. Nat Biotechnol 21, 566–569.

Helbig AO, Gauci S, Raijmakers R, et al. (2010). Profiling of N-acetylated protein termini provides in-depth insights into the N-terminal nature of the proteome. Mol Cell Proteomics 9, 928–939.

Hollebeke J, Van Damme P, and Gevaert K. (2012). N-terminal acetylation and other functions of Nalpha-acetyltransferases. Biol Chem 393, 291–298.

Ivens AC, Peacock CS, Worthey EA, et al. (2005). The genome of the kinetoplastid parasite, Leishmania major. Science 309, 436–442.

Kalume DE, Peri S, Reddy R, et al. (2005). Genome annotation of Anopheles gambiae using mass spectrometry-derived data. BMC Genomics 6, 128.

Kelkar DS, Kumar D, Kumar P, et al. (2011). Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry. Mol Cell Proteomics 10, M111 011627.

Killick-Kendrick R. (1990). Phlebotomine vectors of the leishmaniases: A review. Med Vet Entomol 4, 1–24.

Lasonder E, Ishihama Y, Andersen JS, et al. (2002). Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature 419, 537–542.

Li Q, Zhao Y, Ni B, et al. (2008). Comparison of the expression profiles of promastigotes and axenic amastigotes in Leishmania donovani using serial analysis of gene expression. Parasitol Res 103, 821–828.

Louw CA, Ludewig MH, Mayer J, and Blatch GL. (2010). The Hsp70 chaperones of the Tritryps are characterized by unusual features and novel members. Parasitol Int 59, 497–505.

Mann M, and Pandey A. (2001). Use of mass spectrometry-derived data to annotate nucleotide and protein sequence databases. Trends Biochem Sci 26, 54–61.

Matrangolo FS, Liarte DB, Andrade LC, et al. (2013). Comparative proteomic analysis of antimony-resistant and -susceptible Leishmania braziliensis and Leishmania infantum chagasi lines. Mol Biochem Parasitol 190, 63–75.

McDowell MA, Rafati S, Ramalho-Ortigao M, and Ben Salah A. (2011). Leishmaniasis: Middle East and North Africa research and development priorities. PLoS Negl Trop Dis 5, e1219.

Menezes JP, Almeida TF, Petersen AL, et al. (2013). Proteomic analysis reveals differentially expressed proteins in macrophages infected with Leishmania amazonensis or Leishmania major. Microbes Infect 15, 579–591.

Menon R, Zhang Q, Zhang Y, et al. (2009). Identification of novel alternative splice isoforms of circulating proteins in a mouse model of human pancreatic cancer. Cancer Res 69, 300–309.

Molina H, Bunkenborg J, Reddy GH, Muthusamy B, Scheel PJ, and Pandey A. (2005). A proteomic analysis of human hemodialysis fluid. Mol Cell Proteomics 4, 637–650.

Murray HW, Berman JD, Davies CR, and Saravia NG. (2005). Advances in leishmaniasis. Lancet 366, 1561–1577.

Nirujogi RS, Pawar H, Renuse S, et al. (2013). Moving from unsequenced to sequenced genome: Reanalysis of the proteome of Leishmania donovani. J Proteomics 93, 48–61.

Nugent PG, Karsani SA, Wait R, Tempero J, and Smith DF. (2004). Proteomic analysis of Leishmania mexicana differentiation. Mol Biochem Parasitol 136, 51–62.

Opperdoes FR, and Szikora JP. (2006). In silico prediction of the glycosomal enzymes of Leishmania major and trypanosomes. Mol Biochem Parasitol 147, 193–206.

Paape D, and Aebischer T. (2011). Contribution of proteomics of Leishmania spp. to the understanding of differentiation, drug resistance mechanisms, vaccine and drug development. J Proteomics 74, 1614–1624.

Pandey A, and Lewitter F. (1999). Nucleotide sequence databases: A gold mine for biologists. Trends Biochem Sci 24, 276–280.

Pandey A, and Mann M. (2000). Proteomics to study genes and genomes. Nature 405, 837–846.

Papadopoulou B MF, Rochette A, Muller M, Dumas C, and Chow C. (2008). Regulation of gene expression in Leishmania throughout a complex digenetic life cycle. Leishmania after the genome, pgs. 29–54.

Pawar H, Sahasrabuddhe NA, Renuse S, et al. (2012). A proteogenomic approach to map the proteome of an unsequenced pathogen - Leishmania donovani. Proteomics 12, 832–844.

Payne SH, Huang ST, and Pieper R. (2010). A proteogenomic update to Yersinia: Enhancing genome annotation. BMC Genomics 11, 460.

Peacock CS, Seeger K, Harris D, et al. (2007). Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat Genet 39, 839–847.

Peri S, and Pandey A. (2001). A reassessment of the translation initiation codon in vertebrates. Trends Genet 17, 685–687.

Pescher P, Blisnick T, Bastin P, and Spath GF. (2011). Quantitative proteome profiling informs on phenotypic traits that adapt Leishmania donovani for axenic and intracellular proliferation. Cell Microbiol 13, 978–991.

Prasad TS, Harsha HC, Keerthikumar S, et al. (2012). Proteogenomic analysis of Candida glabrata using high resolution mass spectrometry. J Proteome Res 11, 247–260.

Raina P, and Kaur S. (2012). Knockdown of LdMC1 and Hsp70 by antisense oligonucleotides causes cell-cycle defects and programmed cell death in Leishmania donovani. Mol Cell Biochem 359, 135–149.

Renuse S, Chaerkady R, and Pandey A. (2011). Proteogenomics. Proteomics 11, 620–630.

Rochette A, Raymond F, Ubeda JM, et al. (2008). Genome-wide gene expression profiling analysis of Leishmania major and Leishmania infantum developmental stages reveals substantial differences between the two species. BMC Genomics 9, 255.

Rogers MB, Hilley JD, Dickens NJ, et al. (2011). Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania. Genome Res 21, 2129–2142.

Rosenzweig D, Smith D, Myler PJ, Olafson RW, and Zilberstein D. (2008). Post-translational modification of cellular proteins during Leishmania donovani differentiation. Proteomics 8, 1843–1850.

Sardar AH, Kumar S, Kumar A, et al. (2013). Proteome changes associated with Leishmania donovani promastigote adaptation to oxidative and nitrosative stresses. J Proteomics 81, 185–199.

Schwartz E, Hatz C, and Blum J. (2006). New world cutaneous leishmaniasis in travellers. Lancet Infect Dis 6, 342–349.

Teixeira SM, Kirchhoff LV, and Donelson JE. (1999). Trypanosoma cruzi: Suppression of tuzin gene expression by its 5′-UTR and spliced leader addition site. Exp Parasitol 93, 143–151.

TritrypDB. TritrypDB: www.tritrypdb.org. Last access: 2/10/14.


Trujillo C, Ramirez R, Velez ID, and Berberich C. (1999). The humoral immune response to the kinetoplastid membrane protein-11 in patients with American leishmaniasis and Chagas disease: Prevalence of IgG subclasses and mapping of epitopes. Immunol Lett 70, 203–209.

Tsigankov P, Gherardini PF, Helmer-Citterich M, Spath GF, and Zilberstein D. (2013). Phosphoproteomic analysis of differentiating Leishmania parasites reveals a unique stage-specific phosphorylation motif. J Proteome Res 12, 3405–3412.

Walker J, Gongora R, Vasquez JJ, et al. (2012). Discovery of factors linked to antimony resistance in Leishmania panamensis through differential proteome analysis. Mol Biochem Parasitol 183, 166–176.

WHO. (2010). WHO. Leishmaniasis: Burden of disease, surveillance and control, epidemics, access to medicines, information resources. World Health Organization.

Xia D, Sanderson SJ, Jones AR, et al. (2008). The proteome of Toxoplasma gondii: Integration with the genome provides novel insights into gene expression and annotation. Genome Biol 9, R116.

Yates JR, 3rd, Eng JK, and McCormack AL. (1995). Mining genomes: Correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal Chem 67, 3202–3210.

Address correspondence to:
Dr. Akhilesh Pandey
McKusick-Nathans Institute of Genetic Medicine
Johns Hopkins University School of Medicine
733 North Broadway, BRB 527
Baltimore, MD 21205

E-mail: [email protected]

or

Dr. Milind S. Patole
National Centre for Cell Science
Pune 411007
India

E-mail: [email protected]

Abbreviations Used

CL = cutaneous leishmaniasis
GSSPs = genome search specific peptides
KMP = kinetoplastid membrane protein
PSMs = peptide spectrum matches
