Draft
In silico genome-wide identification and characterization of
glutathione S-transferase gene family in Vigna radiata (L.) Wilczek
Journal: Genome
Manuscript ID gen-2017-0192.R1
Manuscript Type: Article
Date Submitted by the Author: 25-Nov-2017
Complete List of Authors:
Vaish, Swati; Institute of Bioscience and Technology, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, Uttar Pradesh 225003, India
Awasthi, Praveen; National Agri-Food Biotechnology Institute
Tiwari, Siddharth; National Agri-Food Biotechnology Institute
Tiwari, Shailesh Kumar; Indian Institute of Vegetable Research
Gupta, Divya; Institute of Bioscience and Technology, Shri Ramswaroop Memorial University, Lucknow-Deva Road, Barabanki, Uttar Pradesh 225003, India
Basantani, Mahesh; Shri Ramswaroop Memorial University, Institute of Bio-Science and Technology
Is the invited manuscript for consideration in a Special Issue?: N/A
Keyword: Bioinformatics, Phi and tau GSTs, legumeinfo, Plant stress metabolism, Whole-genome sequencing
https://mc06.manuscriptcentral.com/genome-pubs
Genome
In silico genome-wide identification and characterization of glutathione
S-transferase gene family in Vigna radiata (L.) Wilczek
Swati Vaish^a, Praveen Awasthi^b, Siddharth Tiwari^b, Shailesh Kumar Tiwari^c, Divya Gupta^a, Mahesh Kumar Basantani^a*
^a Institute of Bioscience and Technology, Shri Ramswaroop Memorial University,
Lucknow-Deva Road, Barabanki, 225003, Uttar Pradesh, India
^b National Agri-Food Biotechnology Institute (NABI), (Department of Biotechnology,
Government of India), Knowledge City, Sector 81, S.A.S. Nagar, Mohali 140306,
Punjab, India
^c Division of Crop Improvement, ICAR-Indian Institute of Vegetable Research, Post bag
01, Post Office Jakhini (Shahanshahpur), Varanasi, 221305, Uttar Pradesh, India
Running title: Glutathione S-transferase genes in Vigna radiata
*Corresponding author:
Dr. Mahesh Kumar Basantani,
Institute of Bioscience and Technology,
Shri Ramswaroop Memorial University,
Lucknow-Deva Road,
Barabanki, 225003,
Uttar Pradesh,
India
Tel: +91 9839534061
Abstract
Plant glutathione S-transferases (GSTs) are integral to normal plant metabolism and to
biotic and abiotic stress tolerance. The GST gene family has been characterized in diverse
plant species using molecular biology and bioinformatics approaches. In the current study,
in silico analysis identified 44 GSTs in Vigna radiata. Of the 44 GSTs identified, the
chromosomal locations of 31 were confirmed. The pI values of the GST proteins ranged
from 5.10 to 9.40, and the predicted molecular weights ranged from 13.12 to 50 kDa.
Subcellular localization analysis revealed that the GSTs were predominantly localized in
the cytoplasm. The active site amino acids were identified as serine in tau, phi, theta,
zeta and TCHQD; cysteine in lambda, DHAR and omega; and tyrosine in EF1G. The
gene architecture conformed to the two-exon/one-intron and three-exon/two-intron
organization in the tau and phi classes, respectively. MEME analysis identified 10
significantly conserved motifs with widths of 8 to 50 amino acids. The motifs were either
specific to a single GST class or shared by multiple GST classes. The results of
the current study will be of potential importance in the characterization of the GST gene
family in V. radiata, an economically important leguminous crop.
Keywords: Bioinformatics, phi and tau GSTs, legumeinfo, plant stress metabolism,
whole-genome sequencing
1. Introduction
Vigna radiata (L.) R. Wilczek, commonly known as mung bean, is a widely cultivated
warm-season legume grown extensively in tropical and subtropical regions of the
world. It belongs to the papilionoid subfamily of the family Fabaceae and has a diploid
chromosome number of 2n = 22 (Keatinge et al. 2011). It is grown on about 6 million
hectares worldwide, primarily in South and Southeast Asia, Africa, South
America and Australia (Schafleitner et al. 2015; Nair et al. 2012). In Asia, it is
grown in India, Pakistan, Bangladesh, Sri Lanka and China, among other countries. It is
grown in the equatorial and semi-tropical climates of India (Baloda et al. 2017). India is
the world's largest producer of V. radiata, contributing over 50% of total world
production (Kang et al. 2014). The states of Rajasthan, Maharashtra, Andhra Pradesh,
Gujarat and Bihar
contribute the most to national V. radiata production, with Rajasthan and Maharashtra
producing 26% and 20%, respectively (Figure 1) (GoI: Department of Agriculture and
Cooperation, 2014-15).
V. radiata is a rich source of protein, resistant starch and dietary fibre (Chitra et al.
1995; Sandhu et al. 2008), and contains higher levels of folate and iron than most other
legumes (Graham and Vance 2003). Its proteins and carbohydrates are easily
digestible and cause less flatulence than those derived from other legumes. V. radiata
is highly sensitive to saline and dry soils and to temperature extremes (very low
or very high); such stresses during the flowering and seed/pod development stages result
in heavy losses in productivity (Baloda et al. 2017). V. radiata fixes atmospheric nitrogen
via root rhizobial symbiosis and thereby improves soil fertility (Yaqub et al. 2010).
Glutathione S-transferases (GSTs; EC 2.5.1.18) are found in higher plants,
bryophytes, algae, fungi, bacteria, etc. They are predominantly located in the cytosol
(Sheehan et al. 2001). Interestingly, two GSTs, Nt ParA in tobacco and GSTU12 in
Arabidopsis thaliana, have been found in the nucleus (Dixon et al. 2009; Takahashi
et al. 1995). In plants, GSTs have also been reported in chloroplasts,
mitochondria, etc. (Lallement et al. 2014). Cytosolic GSTs are the most numerous and
the most extensively characterized, both structurally and functionally. However,
information about organelle-specific plant GSTs is lacking. In plants, GSTs have been
recognized for their roles in normal physiology and metabolism, and in the management
of biotic and abiotic stresses, including oxidative stress tolerance and resistance to
herbicides, pesticides and antibiotics (Dixon et al. 2002; Neuefeind et al. 1997).
Plant GSTs are classified into ten classes: tau (GSTU), phi (GSTF), lambda
(GSTL), theta (GSTT), zeta (GSTZ), dehydroascorbate reductase (DHAR),
tetrachlorohydroquinone dehalogenase (TCHQD), elongation factor 1 gamma (EF1G),
hemerythrin and iota, based on sequence similarity, immunological reactivity, kinetic
properties, and structural conformation (Liu et al. 2013). Fourteen GST classes have been
identified in eukaryotic photosynthetic organisms on the basis of phylogenetic analysis, of
which phi and tau are plant-specific and the most abundant plant GSTs (Lallement et al. 2014).
Plant GSTs typically have relative molecular masses of around 50 kDa and are homodimers or
heterodimers composed of two similarly sized (~25 kDa) subunits, with an isoelectric point
in the pH range of 4 to 5. All GSTs consist of two domains: a conserved N-terminal
domain containing the G-site, which binds glutathione (GSH), and a less conserved
C-terminal domain containing the H-site, with which hydrophobic toxic molecules interact.
In terms of secondary structure, plant GSTs are predominantly α-helical, followed by
random coil and then β-sheet (Labrou et al. 2015).
The GST gene family has been characterized in several plant species using molecular and
bioinformatics approaches: 55 GSTs have been identified in Arabidopsis thaliana (Dixon et
al. 2009), 82 in Oryza sativa (Jain et al. 2010), 81 in Populus (Lan et al. 2009), 42 in
maize (McGonigle et al. 2000), 65 in broccoli (Vijayakumar et al. 2016), 84 in Hordeum
vulgare (Rezaei et al. 2013), 20 in Dracaena cambodiana (Zhu et al. 2016), and 37 in
Physcomitrella patens, a bryophyte (Liu et al. 2013).
The draft genome sequence of cultivated mung bean (V. radiata var. radiata VC1973A)
has been published by Kang et al. (2014). The availability of whole-genome sequence
information will tremendously enhance genomics research in V. radiata, and provide an
impetus to V. radiata breeding programmes, thereby laying down the foundation for V.
radiata resequencing efforts.
The V. radiata whole-genome sequence information offers a wide range of opportunities:
identification and characterisation of agriculturally relevant gene families, understanding
of gene expression regulation in normal plant metabolism and during abiotic and biotic
stresses, identification of molecular markers for targeted marker-assisted
breeding programmes, etc.
Despite the availability of the whole-genome sequence of V. radiata, large-scale in silico
genome-wide identification and characterization of a gene family has not been carried
out in this economically important legume crop. To date, there is no report of GST gene
family identification and characterization in V. radiata. The current study identified 44
GSTs in V. radiata, distributed into 7 classes; they were found to be primarily localized
in the cytoplasm. The canonical N- and C-terminal domains of GSTs were found to be
present in V. radiata GSTs, with active site residues located in the N-terminal G-site.
These GSTs were further characterized on the basis of molecular weight, pI, protein
length, gene architecture, protein motif identification, active site residue localization, etc.
2. Materials and methods
2.1 Searching for GST genes in V. radiata sequence database available at Legume
Information System (LIS)
To conduct the in silico identification of GSTs present in the V. radiata genome, A. thaliana,
Glycine max and O. sativa GST protein sequences were used to query the V. radiata
whole-genome database Vr1.0 available at the Legume Information System (LIS;
http://legumeinfo.org/). A. thaliana GST protein sequences were retrieved from The
Arabidopsis Information Resource (TAIR); soybean and rice GST protein sequences
were retrieved from NCBI using the locus IDs or accession numbers published by Liu et
al. (2015) and Jain et al. (2010), respectively (Supplementary tables 1, 2 and 3). A total of
235 GST protein sequences from these three species were used to query the V. radiata
database. In the preliminary analysis, these sequences were FASTA formatted and used
to search the V. radiata sequence database at LIS with default parameters. Furthermore,
the GST_N and GST_C domains of these 235 GST protein sequences were identified, and
the two sets of domain sequences were used separately to query the V. radiata sequence
database at LIS. Hidden Markov Model (HMM) searches were also carried out separately
with the GST_N and GST_C domain sequences of the three species as queries. BLASTP
searches were performed for the GST_N and GST_C domain sequences of each GST class
separately, using GST protein sequences of A. thaliana, G. max and O. sativa as queries.
The amino acid, genomic, and coding sequences (CDS) of the identified V. radiata GSTs
were downloaded from the LIS. These GSTs were tentatively named VrGSTs. The protein
and gene sequences of the identified V. radiata GSTs were subjected to pI and molecular
weight prediction, subcellular localization prediction, protein domain characterization,
identification of exon-intron organisation, protein motif identification, phylogenetic
analysis, etc., using different online or offline software programs and applications
(Table 1).
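Because many of the 235 query sequences are expected to retrieve the same V. radiata gene, the hit lists from the individual searches overlap and have to be collapsed into a non-redundant candidate set. A minimal Python sketch of that bookkeeping step is given below. The hit format (query, subject, E-value), the E-value cutoff, the query names, the E-values and the subject ID Vradi01g00010 are illustrative assumptions, not the actual search output or thresholds used in this study.

```python
from collections import defaultdict


def merge_hits(hits, max_evalue=1e-5):
    """Collapse (query, subject, evalue) hits into one entry per subject.

    Returns {subject_id: {"best_evalue": float, "queries": sorted list}}.
    The cutoff is a hypothetical value chosen for illustration only.
    """
    merged = defaultdict(lambda: {"best_evalue": float("inf"), "queries": set()})
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # discard weak matches
        entry = merged[subject]
        entry["best_evalue"] = min(entry["best_evalue"], evalue)
        entry["queries"].add(query)
    return {
        subject: {"best_evalue": e["best_evalue"], "queries": sorted(e["queries"])}
        for subject, e in merged.items()
    }


# Hypothetical hits: query names and E-values are made up for illustration.
example_hits = [
    ("AtGSTU1", "Vradi07g05380", 1e-60),
    ("GmGSTU5", "Vradi07g05380", 1e-75),
    ("OsGSTF2", "Vradi06g02400", 1e-40),
    ("AtGSTZ1", "Vradi01g00010", 0.5),  # hypothetical weak hit, filtered out
]

candidates = merge_hits(example_hits)
for gene_id in sorted(candidates):
    print(gene_id, candidates[gene_id])
```

Keeping the list of supporting queries per candidate makes it easy to see which candidates were recovered by sequences from more than one reference species.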
2.2 Conserved domain analysis and confirmation of GST proteins
The identity and protein domain organization of V. radiata GST proteins were primarily
confirmed by NCBI batch-CD (conserved domain) search
(https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) (Marchler-Bauer et al.
2017).
2.3 Subcellular localization
The probable subcellular localization of the identified GSTs was predicted using the CELLO
online tool v.2.5 (http://cello.life.nctu.edu.tw/) (Yu et al. 2006), TargetP (Emanuelsson et
al. 2000) and WoLF PSORT (Horton et al. 2007).
2.4 Prediction of molecular weights and pIs on ExPASy server
The protein molecular weights and pIs were predicted using the ProtParam tool of ExPASy
(http://web.expasy.org/protparam/) (Gasteiger et al. 2005).
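For orientation, the molecular weight of a protein can be approximated from its sequence as the sum of average residue masses plus one water molecule. The Python sketch below illustrates that calculation only; it is not the ProtParam implementation, it does not cover pI prediction, and the residue masses are commonly tabulated average values that should be checked against the ProtParam documentation before any real use. The example peptide is hypothetical.

```python
# Commonly tabulated average residue masses (Da) for the 20 standard amino acids.
# Assumption: these values are adequate for an illustrative estimate.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524  # one water is added for the free N- and C-termini


def molecular_weight_kda(sequence):
    """Estimate the average molecular weight (kDa) of an unmodified protein."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(AVERAGE_RESIDUE_MASS))
    if unknown:
        raise ValueError(f"non-standard residues: {unknown}")
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return total_da / 1000.0


# Hypothetical short peptide, used only to show the call.
print(round(molecular_weight_kda("MAEVKLHGFW"), 3), "kDa")
```

As a rough cross-check of the reported range, 13.12 kDa for a 117-residue protein and 50 kDa for a 442-residue protein both correspond to about 112 to 113 Da per residue.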
2.5 Conserved motif identification in V. radiata GST proteins
MEME analysis (http://meme-suite.org/) (Bailey et al. 2009) was performed with default
parameters to identify conserved motifs in the identified V. radiata GST protein
sequences.
2.6 Schematic diagram of protein functional domains with active site residues
The protein domain organisation and active site residues were depicted diagrammatically
using Illustrator IBS v. 1.0.2 (http://ibs.biocuckoo.org/index.php) (Liu et al. 2015).
2.7 Protein sequence alignments
The GST protein sequences of V. radiata, A. thaliana, O. sativa and G. max were aligned
using Clustal Omega (Sievers et al. 2011). The protein alignments were rendered in
ESPript 3.0 (Robert and Gouet 2014).
2.8 Gene structure visualization
The exon-intron number and gene architecture of the identified GST genes were obtained
using Gene Structure Display Server 2.0 (GSDS; http://gsds.cbi.pku.edu.cn/) by aligning
the FASTA formatted CDS and genomic DNA sequences (Hu et al. 2015).
2.9 Phylogenetic analysis
The evolutionary relationship of V. radiata (an angiosperm) GST proteins with
Physcomitrella patens (a bryophyte), Larix kaempferi (a gymnosperm) and Arabidopsis
thaliana (an angiosperm) GST proteins was studied using MEGA 7 (Kumar et al. 2016).
Phylogenetic analysis was performed by the neighbour-joining method in MEGA 7.
Bootstrap analysis was performed with 1000 replicates.
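Bootstrap support is obtained by resampling the columns of the multiple sequence alignment with replacement, rebuilding the tree from each resampled alignment, and counting how often each clade recurs. The Python sketch below shows only the column-resampling step, under stated assumptions: the toy alignment is hypothetical, tree building is omitted, and the code is not a description of how MEGA 7 implements the procedure internally.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same sampled column indices are applied to every sequence,
    # so each resampled column stays an intact alignment column.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical aligned fragments, for illustration only.
toy_alignment = {
    "seqA": "MAEVK-LHG",
    "seqB": "MGEVKVLHG",
    "seqC": "MAE-KLLYG",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "replicates; first:", replicates[0])
```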
3. Results
3.1 Chromosome locations, nomenclature, and gene lengths of GST genes identified in V.
radiata
A BLASTP search of the V. radiata genome using GST protein sequences of A. thaliana,
G. max and O. sativa led to the identification of 44 GST gene sequences (Supplementary
table 4). They were distributed in tau (19), phi (7), lambda (3), EF1G (2), zeta (2), theta
(1), TCHQD (2), mPGES2 (2), GST_N_2GST_N (2), omega (2), and DHAR (2). As
reported in other plant species, the tau and phi class GSTs were the most numerous in V.
radiata as well. Out of these 44 putative GST genes, the chromosomal locations of 31
were known, the rest being assigned to scaffolds. Only the 31 genes with confirmed
chromosomal locations were subjected to further bioinformatics analyses and
characterization (Table 2). In the case of tau class GSTs, the chromosomal locations of 12
of the 19 identified were known: one tau GST each was present on chromosomes 1, 3, 6 and
11, and four each were present on chromosomes 7 and 8 (Figures 2A and 2B). The
chromosomal locations of 4 of the 7 phi GSTs identified were known: two were
present on chromosome 6, and one each was present on chromosomes 8 and 10. Of
the 3 lambda GSTs, one was present on chromosome 2, and the other two
were assigned to scaffolds. The 2 EF1Gs were localized to chromosomes 7 and 10.
Of the 2 zeta GSTs identified, one was present on chromosome 7; the location of the
other zeta GST was not known. The 2 DHAR GSTs were localized to chromosomes 3 and 8.
The 2 TCHQDs were localized to chromosomes 5 and 6. The 2 omega GSTs were
localized to chromosomes 7 and 8. The 2 mPGES2 GSTs were localized to chromosomes
1 and 5. The 2 GST_N_2GST_N GSTs were localized to chromosomes 5 and 8. The only
theta GST identified was localized to chromosome 7.
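The per-class counts given above can be cross-checked against the reported totals with a short tally. The Python sketch below uses only the numbers stated in this section; the variable names are illustrative.

```python
# (identified, with known chromosomal location) per class, as stated in section 3.1.
CLASS_COUNTS = {
    "tau": (19, 12), "phi": (7, 4), "lambda": (3, 1), "EF1G": (2, 2),
    "zeta": (2, 1), "theta": (1, 1), "TCHQD": (2, 2), "mPGES2": (2, 2),
    "GST_N_2GST_N": (2, 2), "omega": (2, 2), "DHAR": (2, 2),
}

# Number of mapped tau GSTs per chromosome, as stated in section 3.1.
TAU_BY_CHROMOSOME = {1: 1, 3: 1, 6: 1, 11: 1, 7: 4, 8: 4}

total_identified = sum(identified for identified, _ in CLASS_COUNTS.values())
total_mapped = sum(mapped for _, mapped in CLASS_COUNTS.values())
on_scaffolds = total_identified - total_mapped

print("identified:", total_identified)    # 44
print("mapped to chromosomes:", total_mapped)  # 31
print("assigned to scaffolds:", on_scaffolds)  # 13
print("mapped tau GSTs:", sum(TAU_BY_CHROMOSOME.values()))  # 12
```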
The GSTs were designated as VrGSTs according to the proposed nomenclature for GST
genes (Dixon et al. 2002; Dixon and Edwards 2010). The genes from the tau, phi, theta,
zeta, lambda, omega, EF1G, TCHQD, mPGES2, GST_N_2GST_N and DHAR classes
were named GSTU, GSTF, GSTT, GSTZ, GSTL, GSTO, EF1G, TCHQD, mPGES2,
GST_N_2GST_N and DHAR, respectively, followed by a gene number (Table 2). Genes
within each class were numbered according to their position on each chromosome,
from the beginning towards the end of the chromosome (5′→3′), and across
chromosomes from chromosome 1 to chromosome 11 (Dong et al. 2016). The gene
lengths of the identified GSTs ranged from 723 nucleotides (VrGSTU7) to 11101
nucleotides (VrGSTZ1).
3.2 GSTs identified in V. radiata demonstrated domain organization typical of GST
protein family
The protein sequences of the 31 VrGSTs were downloaded from the LIS database. These
sequences ranged in length from 117 (VrGSTF2) to 442 (VrGSTT1) amino acids (Table
2). All the VrGSTUs were ~200 to ~250 amino acids long, the only exception being
VrGSTU10, with a length of 320 amino acids. The theta GST (VrGSTT1) was the
longest, with 442 amino acids, whereas VrGSTF1 and VrGSTF2 were the
shortest, with 178 and 117 amino acids, respectively. VrEF1G1 and VrEF1G2 also
had long chains, with 391 and 419 amino acids, respectively. VrGST_N_2GST_N1 and
VrGST_N_2GST_N2 were 333 and 358 amino acids long, respectively. VrGSTO1 and
VrGSTO2 had chain lengths of 368 and 352 amino acids, respectively.
The protein domain organization of the VrGSTs was further identified and confirmed by
NCBI batch-CD search (Supplementary table 5). The analysis revealed that the GSTs
had the typical class-specific domain organization: a small thioredoxin-like N-terminal
domain that binds glutathione (GST-N), and a variable C-terminal domain (GST-C) that
binds hydrophobic or electrophilic substrates (Supplementary figure 1). The analysis
demonstrated the presence of the G-site, H-site, N-terminal and C-terminal domain
interfaces, etc., integral to GST protein function. Most of the GSTs showed the presence of
typical N- and C-terminal domains. However, Vradi06g02400_Phi (VrGSTF1) and
Vradi06g16320_Phi (VrGSTF2) showed only the C-terminal and only the N-terminal domain,
respectively. In the case of Vradi03g07940_DHAR (VrDHAR1), the GST_C domain was
identified. In Vradi05g06210_TCHQD (VrTCHQD1) and Vradi06g11490_TCHQD
(VrTCHQD2), GstA family and GST_C domains were identified. Distinct GST_N and
GST_C_mPGES2 domains were identified in Vradi01g05130_mPGES2 (VrmPGES2A)
and Vradi05g13980_mPGES2 (VrmPGES2B). Distinct GST_N and GST_C terminal
domains were identified in VrGST_N_2GST_N1 and VrGST_N_2GST_N2.
Vradi07g04760_Omega-like (VrGSTO1) and Vradi08g07540_Omega-like (VrGSTO2)
revealed the presence of an ECM4 domain, indicating that these two GSTs might belong to
the glutathionyl-hydroquinone reductase (GS-HQR) group of GSTs (Supplementary
figure 1 and Figures 3A and 3B).
3.3 pI value and molecular weight predictions of GSTs identified in V. radiata
The predicted molecular weights of the VrGST proteins ranged from 13.12 (VrGSTF2) to
50 kDa (VrGSTT1) (Table 2). The theoretical pI values ranged from 5.10 (VrGSTL1) to
9.40 (VrTCHQD1). The pI values of the VrGSTUs ranged from 5.23 (VrGSTU1) to 8.38
(VrGSTU5). Among the VrGSTFs, VrGSTF2 had a pI value of 9.18, compared with pI values
between ~5 and ~6 in the other members. The pI values of all members of the omega, TCHQD,
mPGES2 and GST_N_2GST_N classes were in the higher range, between 8 and 9.4.
3.4 Subcellular localization analysis of V. radiata GST proteins revealed that most of the
VrGSTs were cytoplasmic
The subcellular localization of the 31 GSTs was predicted using three different prediction
tools. For most of the VrGST proteins, all three tools gave the same result.
However, for some VrGSTs, different tools predicted different cellular locations
(Table 3). According to CELLO, all the VrGSTs were predicted to be cytoplasmic,
except VrGSTT1 and VrGSTZ1, which were predicted to be mitochondrial. According to
WoLF PSORT, VrGSTU4, VrGSTU6, VrGSTU9, VrGSTT1 and VrEF1G2 were
predicted to be localized to the chloroplast; VrGSTU5 was nuclear; VrGSTF1 was
mitochondrial; and VrGSTF2 was vacuolar. According to TargetP, VrGSTU7,
VrGSTU9, VrGSTU11, VrGSTU12 and VrEF1G2 were predicted to be secretory GSTs.
According to CELLO, VrGSTO1 was predicted to be mitochondrial and VrGSTO2
extracellular. However, according to WoLF PSORT and TargetP, VrGSTO2 was
predicted to be chloroplastic. Both the mPGES2 GSTs (Vradi01g05130_mPGES2 and
Vradi05g13980_mPGES2) were predicted to be either mitochondrial or chloroplastic.
Both the GST_N_2GST_N (Vradi05g22310_GST_N_2GST_N and
Vradi08g05820_GST_N_2GST_N) were predicted to be chloroplastic.
3.5 V. radiata GST genes followed two-exon/one-intron and three-exon/two-intron
architecture in tau GSTs and phi GSTs, respectively
The exon-intron organization of the VrGST genes was determined using the coding
sequences and the corresponding genomic DNA sequences. The gene structures showed
group-specific exon/intron patterns. The number of exons ranged from 2 in VrGSTUs to
17 in VrGSTZ1 (Vradi07g26200) (Figure 4). All tau GSTs of V. radiata possessed two
exons and one intron except VrGSTU6 (Vradi07g05380), VrGSTU9 (Vradi08g08500)
and VrGSTU10 (Vradi08g15620), which contained four exons and three introns each.
Amongst the four identified phi genes, VrGSTF1 (Vradi06g02400)
possessed three exons and two introns, and the remaining three, VrGSTF2 (Vradi06g16320),
VrGSTF3 (Vradi08g10080) and VrGSTF4 (Vradi10g04530), contained two, four and two
exons, respectively. VrGSTT1 (Vradi07g30490) and VrGSTL1 (Vradi03g03170) were
found to contain 13 and 11 exons, respectively. Both VrEF1Gs contained 7 exons
each. Both VrDHARs (Vradi03g07940 and Vradi08g22680) contained 6 exons each.
VrGST_N_2GST_N1 (Vradi05g22310) and VrGST_N_2GST_N2 (Vradi08g05820)
contained 10 and 12 exons, respectively. Both VrmPGES2s (Vradi01g05130 and
Vradi05g13980) contained 6 exons each. VrGSTO1 (Vradi07g04760) and VrGSTO2
(Vradi08g07540) contained 6 and 3 exons, respectively.
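The exon counts listed above can be tabulated to flag genes that depart from the canonical two-exon (tau) and three-exon (phi) organization. The Python sketch below uses only the counts stated in this section; TCHQD genes are omitted because their exon counts are not given in the text, the name VrDHAR2 for Vradi08g22680 is an assumption, and intron number is taken as exons minus one.

```python
# Exon counts as stated in section 3.5 (12 mapped tau genes, two exons by default).
EXON_COUNTS = {f"VrGSTU{i}": 2 for i in range(1, 13)}
EXON_COUNTS.update({"VrGSTU6": 4, "VrGSTU9": 4, "VrGSTU10": 4})
EXON_COUNTS.update({
    "VrGSTF1": 3, "VrGSTF2": 2, "VrGSTF3": 4, "VrGSTF4": 2,
    "VrGSTT1": 13, "VrGSTL1": 11, "VrGSTZ1": 17,
    "VrEF1G1": 7, "VrEF1G2": 7,
    "VrDHAR1": 6, "VrDHAR2": 6,  # "VrDHAR2" is an assumed name
    "VrGST_N_2GST_N1": 10, "VrGST_N_2GST_N2": 12,
    "VrmPGES2A": 6, "VrmPGES2B": 6,
    "VrGSTO1": 6, "VrGSTO2": 3,
})

CANONICAL_EXONS = {"VrGSTU": 2, "VrGSTF": 3}


def intron_count(exons):
    """Introns separate consecutive exons, so there is one fewer than exons."""
    return exons - 1


def deviations(prefix):
    """Genes of one class whose exon count differs from the canonical number."""
    expected = CANONICAL_EXONS[prefix]
    genes = [g for g, n in EXON_COUNTS.items() if g.startswith(prefix) and n != expected]
    return sorted(genes, key=lambda g: int(g[len(prefix):]))


print("tau exceptions:", deviations("VrGSTU"))
print("phi exceptions:", deviations("VrGSTF"))
most_exons = max(EXON_COUNTS, key=EXON_COUNTS.get)
print(most_exons, EXON_COUNTS[most_exons], "exons,",
      intron_count(EXON_COUNTS[most_exons]), "introns")
```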
3.6 VrGST protein active sites are comprised of conserved serine or cysteine residues
The multiple sequence alignment of full-length VrGST protein sequences with GST
protein sequences of A. thaliana, rice and soybean revealed a highly conserved N-terminus
with an active site serine (Ser; S) or cysteine (Cys; C) residue involved in GSH
binding and activation and in GST catalytic activity. The tau, phi, theta and zeta VrGSTs
had active site Ser residues (Figure 5A, 5B, 5C and 5D), while DHAR and lambda VrGSTs
revealed the presence of active site Cys residues (Figure 5E and 5F). The positions of
active site Ser
and Cys were different among different GST classes (Table 4). Tau VrGSTs possessed
the active site Ser at positions 10-20; phi VrGSTs had a conserved Ser residue at positions
60-70; theta and zeta VrGSTs contained the active site Ser at positions 14 and 20,
respectively. DHAR and lambda VrGSTs possessed the active site Cys residue at positions
20 and 36, respectively. In the case of EF1G VrGSTs, there were two probable conserved
active site tyrosine (Tyr; Y) residues (Figure 5G and 5H). In the case of TCHQD, the
potential active site Ser residue was at around position 30 (Figure 5I). In omega GSTs, the
active site Cys was found at around positions 30-40 within the A-C-P-W-A amino acid
sequence in Vradi07g04760 (VrGSTO1); however, no active site Cys could be identified
around positions 30-40 in Vradi08g07540 (VrGSTO2). Nonetheless, in VrGSTO2 a Cys was
found at around position 120 within an A-C-P-W-A amino acid sequence (Figure 5J).
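Locating a candidate active site residue inside a short conserved sequence such as A-C-P-W-A amounts to a motif search that reports the 1-based position of the Cys. The Python sketch below shows one way to do this; the function name and the test sequence are hypothetical and are not taken from the VrGST proteins.

```python
import re


def find_motif_cys(sequence, motif="ACPWA"):
    """Return the 1-based positions of the Cys in every occurrence of motif."""
    offset = motif.index("C")  # 0-based offset of Cys within the motif
    pattern = f"(?={re.escape(motif)})"  # lookahead also finds overlapping hits
    return [m.start() + offset + 1 for m in re.finditer(pattern, sequence.upper())]


# Hypothetical sequence: Met, 30 Gly, the motif, then 10 Lys.
toy_sequence = "M" + "G" * 30 + "ACPWA" + "K" * 10
print(find_motif_cys(toy_sequence))
```

In the toy sequence the motif starts at position 32, so the reported Cys position is 33, which falls in the 30-40 window described for VrGSTO1.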
3.7 MEME analysis of VrGST proteins revealed the presence of class-specific motifs
MEME analysis was performed to identify conserved motifs in VrGST protein
sequences. Of all the motifs identified, a few were class-specific and the others were found
in multiple classes (Table 5). Motifs 1, 4 and 6 were present only in VrGSTUs. Motif 4 had
an extremely conserved A-R-F-W sequence. Motifs 8 and 10 were present only in EF1G
VrGSTs. Motif 7 was present in EF1G and mPGES2 VrGSTs. Motif 9 was present in
EF1G and GST_N_2GST_N VrGSTs. Motifs 2 and 5 were present in various GST
classes. Motif 3 was present in all the VrGSTs irrespective of class.
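The distinction between class-specific and shared motifs can be expressed as a partition of a motif-to-class mapping. The Python sketch below encodes only the assignments stated explicitly in this section; motifs 2, 5 and 3 are left out because the text does not list their classes individually.

```python
# Motif number -> GST classes in which it was found, as stated in section 3.7.
MOTIF_CLASSES = {
    1: {"tau"}, 4: {"tau"}, 6: {"tau"},
    8: {"EF1G"}, 10: {"EF1G"},
    7: {"EF1G", "mPGES2"},
    9: {"EF1G", "GST_N_2GST_N"},
}


def partition_motifs(motif_classes):
    """Split motifs into class-specific ones (grouped by class) and shared ones."""
    class_specific = {}
    shared = []
    for motif in sorted(motif_classes):
        classes = motif_classes[motif]
        if len(classes) == 1:
            (only_class,) = classes
            class_specific.setdefault(only_class, []).append(motif)
        else:
            shared.append(motif)
    return class_specific, shared


class_specific, shared = partition_motifs(MOTIF_CLASSES)
print("class-specific:", class_specific)
print("shared:", shared)
```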
3.8 VrGST proteins clustered with the GST proteins of other plant species in a class-
specific manner
An extensive phylogenetic analysis of VrGST proteins was carried out by comparing
them with GST protein sequences from plant species as diverse as Physcomitrella patens
(a bryophyte), Larix kaempferi (a gymnosperm) and A. thaliana (an angiosperm). All the
tau VrGSTs were found to be closely associated with A. thaliana tau GSTs. Similarly, phi
VrGSTs were found to be closely associated with A. thaliana phi GSTs. mPGES2, omega and
GST_N_2GST_N VrGSTs branched out separately from the rest of the VrGSTs. The
analysis clearly revealed that VrGSTs of a particular class clustered with GSTs of
the same class. As an exception, Vradi06g16320_phi clustered with zeta GSTs.
4. Discussion
The current study performed the identification and detailed characterization of V. radiata
GSTs using in silico approaches. Similar studies have been carried out in a wide variety
of plant species, and large numbers of GST genes have been identified in these species.
For example, 101 GSTs were identified in the genome of G. max through TBLASTN (Liu et
al. 2015), and 61 GST transcripts were reported in Citrus sinensis (Licciardello et al.
2014). In 2013, a genome-wide analysis of P. patens revealed the presence of 37 GSTs, the
first report of the GST family from a nonvascular representative of early land plants (Liu
et al. 2013). The 37 P. patens GSTs were divided into 10 classes, including two new
classes (hemerythrin and iota). In V. radiata, a total of 44 GST genes were identified, of
which tau and phi were the most numerous, with 19 and 7 members, respectively.
Interestingly, HMM analysis resulted in the identification of TCHQD, omega, mPGES2 and
GST_N_2GST_N VrGSTs. Only one putative hemerythrin GST was found in V. radiata
when P. patens hemerythrin GSTs were used as query sequences (data not shown).
VrGSTs, like most plant GSTs, were primarily localized in the cytoplasm;
however, VrGSTU5, VrGSTF1, VrGSTO1, VrGSTT1 and VrGSTZ1 were predicted to
be localized to mitochondria, which suggests that these VrGSTs might be involved in
functions different from those of cytoplasmic GSTs. These VrGSTs might be involved in
maintaining GSH:GSSG ratios in mitochondria, since high glutathione concentrations
have been observed in mitochondria (Zechmann et al. 2008). GFP-GST fusion studies,
similar to those performed in A. thaliana, may be performed for these VrGSTs to confirm
their subcellular localization (Dixon et al. 2009). mPGES2 and GST_N_2GST_N
VrGSTs were predicted to be chloroplastic. GSTs present in the chloroplast have been
identified in other plant species. An auxin-inducible chloroplast-localized GST has been
identified from the phreatophyte Prosopis juliflora (George et al. 2010); similarly, a
chloroplast-localized GST from Puccinellia tenuiflora seedling leaves has been reported in
connection with resistance to Na2CO3 stress (Sun et al. 2012). It would be interesting to
identify targeting sequences in these organelle-localized VrGSTs so as to confirm their
subcellular localization.
Plant GSTs typically have relative molecular masses of around 50 kDa and are homodimers,
or heterodimers of subunits encoded by different genes of the same class, composed of two similarly
sized (~25 kDa) subunits with an isoelectric point in the pH range of 4 to 5 (Frova
2006). The average molecular mass of tau and phi VrGSTs was 26.8 kDa, within the
range of earlier reported data. EF1G and theta VrGSTs had the highest molecular masses,
46.1 and 50 kDa, respectively. TCHQD, omega, mPGES2, and GST_N_2GST_N
VrGSTs had consistently high pI values, ranging from 8 to 9.4.
A one-intron/two-exon structure is normally found in plant-specific tau GSTs,
and a two-intron/three-exon structure characterizes phi class GSTs in higher plants (Labrou
et al. 2015). VrGSTs also largely followed the same gene architecture. With the exception of
VrGSTU6 (Vradi07g05380_Tau), VrGSTU9 (Vradi08g08500_Tau) and VrGSTU10
(Vradi08g15620_Tau), all the tau VrGSTs had the one-intron/two-exon structure. Similar
exceptions have been reported in other plant species as well. Liu et al. (2015) reported
three exons in GSTU54, a deviation from the typical one-intron/two-exon structure for tau
GSTs. Zeta-class GST genes possess 10 exons (Basantani and Srivastava 2007). However, in
the current study, 17 exons were identified in VrGSTZ1 (Vradi07g26200).
Plant GST protein structure studies have clearly demonstrated that tau, phi, theta and zeta
GSTs contain an active site Ser residue involved in GSH binding and activation, whereas
DHAR and lambda are Cys-containing GSTs that facilitate the deglutathionylation reaction.
EF1G GSTs are predicted to contain a Ser or Tyr residue. To identify the positions of the
active site Ser/Cys/Tyr residues in VrGSTs, amino acid sequence alignments of VrGST
proteins were performed in Clustal Omega and visualized using ESPript. Tau and phi
VrGST active site serines were positioned between residues 10 and 20, and 60 and 70,
respectively. The zeta-class GSTs from a range of species contain a characteristic motif,
SSCX(W/H)RVIAL, in the N-terminal region (Board et al. 1997). No such motif was
found in the current analysis. MEME analysis suggested that motifs found in multiple
classes of GSTs may be performing similar functions.
The phylogenetic analysis revealed that tau class VrGSTs were more closely related to A.
thaliana GSTs than to L. kaempferi GSTs. L. kaempferi tau GSTs branched separately
from VrGSTs and A. thaliana GSTs. The same was true for VrGSTs of other classes. Phi
VrGSTs were also more closely related to A. thaliana GSTs.
Using a combined computational strategy, the current study identified 44 VrGSTs in the
V. radiata genome and characterized them based on their sub-cellular localization,
protein domains and active sites, gene structure, motif analysis and phylogeny. The
results of the current study can potentially be used for the cloning and characterization of
VrGSTs and for detailed investigation of the V. radiata GST gene family. The role of GSTs
in V. radiata growth and metabolism, biotic and abiotic stress tolerance, etc., can be
elucidated in further studies, and the candidate genes identified can be used for developing
stress-tolerant plants with enhanced productivity.
Acknowledgement
This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.
References
Baloda, A., Madanpotra, S., and Aiwal, P.K.J. 2017. Transformation of mung bean plants
for abiotic stress tolerance by introducing codA gene, for an osmoprotectant
glycine betaine. Journal of Plant Stress Physiology, 3: 5-11. doi:
http://dx.doi.org/10.19071/jpsp.2017.v3.3148
Bailey, T.L., Bodén, M., Buske, F.A., Frith, M., Grant, C.E., and Clementi, L., et al.
2009. MEME SUITE: tools for motif discovery and searching. Nucleic Acids
Research, 37(Web Server Issue): 202-208. doi:
https://doi.org/10.1093/nar/gkp335
Basantani, M., and Srivastava, A. 2007. Plant glutathione transferases: a decade falls
short. Canadian Journal of Botany, 85(5): 443-456. doi:
https://doi.org/10.1139/B07-033
Board, P.G., Baker, R.T., Chelvanayagam, G., and Jermiin, L.S. 1997. Zeta, a novel class
of glutathione transferases in a range of species from plants to humans.
Biochemical Journal, 328(Pt 3): 929-935. doi: https://doi.org/10.1042/bj3280929
Chitra, U., Vimala, V., Singh, U., and Geervani, P. 1995. Variability in phytic acid
content and protein digestibility of grain legumes. Plant Foods for Human
Nutrition, 47(2): 163-172. doi: https://doi.org/10.1007/BF01089266
Dixon, D.P., Lapthorn, A., and Edwards, R. 2002. Plant glutathione transferases. Genome
Biology, 3: 3004.1. doi: https://doi.org/10.1186/gb-2002-3-3-reviews3004
Dixon, D.P., Hawkins, T., Hussey, P.J., and Edwards, R. 2009. Enzyme activities and
subcellular localization of members of the Arabidopsis glutathione transferase
superfamily. Journal of Experimental Botany, 60(4): 1207-1218. doi:
https://doi.org/10.1093/jxb/ern365
Dixon, D.P., and Edwards, R. 2010. Glutathione transferases. The Arabidopsis Book.
American Society of Plant Biologists.
Dong, Y., Li, C., Zhang, Y., He, Q., Daud, M.K., and Chen, J., et al. 2016. Glutathione S-
transferase gene family in Gossypium raimondii and G. arboreum: Comparative
genomic study and their expression under salt stress. Frontiers in Plant Science, 7:
139. doi: https://doi.org/10.3389/fpls.2016.00139
Emanuelsson, O., Nielsen, H., Brunak, S., and von Heijne, G. 2000. Predicting subcellular
localization of proteins based on their N-terminal amino acid sequence.
Journal of Molecular Biology, 300(4): 1005-1016. doi:
https://doi.org/10.1006/jmbi.2000.3903
Frova, C. 2006. Glutathione transferases in the genomics era: New insights and
perspectives. Biomolecular Engineering, 23(4): 149-169. doi:
https://doi.org/10.1016/j.bioeng.2006.05.020
Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M.R., and Appel, R.D., et
al. 2005. Protein Identification and Analysis Tools on the ExPASy Server. The
Proteomics Protocols Handbook 571-607.
George, S., Venkataraman, G., and Parida, A. 2010. A chloroplast-localized and auxin-
induced glutathione S-transferase from phreatophyte Prosopis juliflora confer
drought tolerance on tobacco. Journal of Plant Physiology, 167(4): 311-318. doi:
https://doi.org/10.1016/j.jplph.2009.09.004
GoI: Department of Agriculture and Cooperation (2014-15)
Graham, P.H., and Vance, C.P. 2003. Legumes: importance and constraints to greater
use. Plant Physiology, 131(3): 872-877. doi: http://dx.doi.org/10.1104/pp.017004
Horton, P., Park, K., Obayashi, T., Fujita, N., Harada, H., and Adams-Collier, C.J., et al.
2007. WoLF PSORT: protein localization predictor. Nucleic Acids Research, 35
(Web Server Issue): W585-W587. doi: https://doi.org/10.1093/nar/gkm259
Hu, B., Jin, J., Guo, A-Y., Zhang, H., Luo, J., and Gao, G. 2015. GSDS 2.0: an upgraded
gene feature visualization server. Bioinformatics, 31(8): 1296-1297. doi:
https://doi.org/10.1093/bioinformatics/btu817
Jain, M., Ghanashyam, C., and Bhattacharjee, A. 2010. Comprehensive expression
analysis suggests overlapping and specific roles of glutathione S-transferases
during development and stress responses in rice. BMC Genomics, 11: 73.
doi: https://doi.org/10.1186/1471-2164-11-73
Kang, Y.J., Kim, S., Kim, M.Y., Lestari, P., Kim, K.H., and Ha, B-K., et al. 2014. Genome
sequence of mungbean and insights into evolution within Vigna species. Nature
Communications, 5: 5443. doi: https://doi.org/10.1038/ncomms6443
Keatinge, J.D.H., Easdown, W.J., Yang, R.Y., Chadha, M.L., and Shanmugasundaram, S.
2011. Overcoming chronic malnutrition in a future warming world: the key
importance of mungbean and vegetable soybean. Euphytica, 180(1): 129-141. doi:
https://doi.org/10.1007/s10681-011-0401-6
Kumar, S., Stecher, G., and Tamura, K. 2016. MEGA7: Molecular Evolutionary Genetics
Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution,
33(7): 1870-1874. doi: https://doi.org/10.1093/molbev/msw054
Labrou, N.E., Papageorgiou, A.C., Pavli, O., and Flemetakis, E. 2015. Plant GSTome:
structure and functional role in xenome network and plant stress response. Current
Opinion in Biotechnology, 32: 186-194. doi:
http://dx.doi.org/10.1016/j.copbio.2014.12.024
Lallement, P.A., Brouwer, B., Keech, O., Hecker, A., and Rouhier, N. 2014. The still
mysterious roles of cysteine-containing glutathione transferases in plants.
Frontiers in Pharmacology, 5: 192. doi: https://doi.org/10.3389/fphar.2014.00192
Lan, T., Yang, Z.L., Yang, X., Liu, Y.J., Wang, X.R., and Zeng, Q.Y. 2009. Extensive
functional diversification of the Populus glutathione S-transferase supergene
family. Plant Cell, 21(12): 3749-3766. doi: http://dx.doi.org/10.1105/tpc.109.070219
Licciardello, C., D’Agostino, N., Traini, A., Recupero, G.R., Frusciante, L., and
Chiusano, M.L. 2014. Characterization of the glutathione S-transferase gene
family through ESTs and expression analyses within common and pigmented
cultivars of Citrus sinensis (L.) Osbeck. BMC Plant Biology, 14: 39.
doi: https://doi.org/10.1186/1471-2229-14-39
Liu, H-J., Tang, Z-X., Han, X-M., Yang, Z-L., Zhang, F-M., and Yang, H-L., et al. 2015.
Divergence in enzymatic activities in the soybean GST supergene family provides
new insight into the evolutionary dynamics of whole-genome duplicates.
Molecular Biology and Evolution, 32(11): 2844-2859. doi:
https://doi.org/10.1093/molbev/msv156
Liu, W., Xie, Y., Ma, J., Luo, X., Nie, P., and Zuo, Z., et al. 2015. IBS: an illustrator for
the presentation and visualization of biological sequences. Bioinformatics,
31(20): 3359-3361. doi: https://doi.org/10.1093/bioinformatics/btv362
Liu, Y-J., Han, X-M., Ren, L-L., Yang, H-L., and Zeng, Q-Y. 2013. Functional
divergence of the glutathione S-transferase supergene family in Physcomitrella
patens reveals complex patterns of large gene family evolution in land plants.
Plant Physiology, 161(2): 773-786. doi: http://dx.doi.org/10.1104/pp.112.205815
Marchler-Bauer, A., Bo, Y., Han, L., He, J., Lanczycki, C.J., and Lu, S., et al. 2017.
CDD/SPARCLE: functional classification of proteins via subfamily domain
architectures. Nucleic Acids Research, 45(Database Issue): D200-D203. doi:
https://doi.org/10.1093/nar/gkw1129
McGonigle, B., Keeler, S.J., Lau, S-M.C., Koeppe, M.K., and O’Keefe, D.P. 2000. A
genomics approach to the comprehensive analysis of the glutathione S-transferase
gene family in soybean and maize. Plant Physiology, 124(3): 1105-1120. doi:
http://dx.doi.org/10.1104/pp.124.3.1105
Nair, R.M., Schafleitner, R., Kenyon, L., Srinivasan, R., Easdown, W., and Ebert, A.W.,
et al. 2012. Genetic improvement of mungbean. SABRAO Journal of Breeding
and Genetics, 44(2): 177-190.
Neuefeind, T., Reinemer, P., and Bieseler, B. 1997. Plant glutathione S-transferases and
herbicide detoxification. Biological Chemistry, 378(3-4): 199-205.
Rezaei, M.K., Shobbar, Z.-S., Shahbazi, M., Abedini, R., and Zare, S. 2013. Glutathione
S-transferase (GST) family in barley: Identification of members, enzyme activity,
and gene expression pattern. Journal of Plant Physiology, 170(14): 1277-1284.
doi: http://dx.doi.org/10.1016/j.jplph.2013.04.005
Robert, X., and Gouet, P. 2014. Deciphering key features in protein structures with the
new ENDscript server. Nucleic Acids Research, 42(Web Server Issue): W320-
W324. doi: https://doi.org/10.1093/nar/gku316
Sandhu, K.S., and Lim, S-T. 2008. Digestibility of legume starches as influenced by their
physical and structural properties. Carbohydrate Polymers, 71(2): 245-252. doi:
https://doi.org/10.1016/j.carbpol.2007.05.036
Schafleitner, R., Nair, R.M., Rathore, A., Wang, Y-W., Lin, C-Y., and Chu, S-H., et al.
2015. The AVRDC – The World Vegetable Center mungbean (Vigna radiata)
core and mini core collections. BMC Genomics, 16: 344.
doi: http://dx.doi.org/10.1186/s12864-015-1556-7
Sheehan, D., Meade, G., Foley, V.M., and Dowd, C.A. 2001. Structure, function and
evolution of glutathione transferases: implications for classification of non-
mammalian members of an ancient enzyme superfamily. Biochemical Journal,
360(Pt 1): 1-16. doi: https://doi.org/10.1042/bj3600001
Sievers, F., Wilm, A., Dineen, D.G., Gibson, T.J., Karplus, K., and Li, W., et al. 2011.
Fast, scalable generation of high-quality protein multiple sequence alignments
using Clustal Omega. Molecular Systems Biology, 7: 539.
doi: https://doi.org/10.1038/msb.2011.75
Soranzo, N., Gorla, M.S., Mizzi, L., Toma, G.D., and Frova, C. 2004. Organisation and
structural evolution of the rice glutathione S-transferase gene family. Molecular
Genetics and Genomics, 271(5): 511-521. doi: https://doi.org/10.1007/s00438-004-1006-8
Sun, G.R., Wu, X.L., Chen, G., Wang, J.B., Cao, W.Z., and Du, Q., et al. 2012. The
function of chloroplast GST of Puccinellia tenuiflora seedling leaves in resistance
to Na2CO3 stress. Advanced Materials Research, 343-344: 712-720.
Takahashi, Y., Hasezawa, S., Kusaba, M., and Nagata, T. 1995. Expression of the auxin-
regulated parA gene in transgenic tobacco and nuclear localization of its gene
product. Planta, 196(1): 111-117. doi: https://doi.org/10.1007/BF00193224
Vijayakumar, H., Thamilarasan, S.K., Shanmugam, A., Natarajan, S.K., Jung, H-J., and
Park, J-I., et al. 2016. Glutathione transferases superfamily: Cold-inducible
expression of distinct GST genes in Brassica oleracea. International Journal of
Molecular Sciences, 17(8): 1211. doi: http://dx.doi.org/10.3390/ijms17081211
Yaqub, M., Mahmood, T., Akhtar, M., Iqbal, M.M., and Ali, S. 2010. Induction of
mungbean [Vigna radiata (L.) Wilczek] as a grain legume in the annual rice-
wheat double cropping system. Pakistan Journal of Botany, 42(5): 3125-3135.
Yu, C-S., Chen, Y-C., Lu, C-H., and Hwang, J-K. 2006. Prediction of protein subcellular
localization. Proteins, 64(3): 643-651. doi: http://dx.doi.org/10.1002/prot.21018
Zechmann, B., Mauch, F., Sticher, L., and Müller, M. 2008. Subcellular
immunocytochemical analysis detects the highest concentrations of glutathione in
mitochondria and not in plastids. Journal of Experimental Botany, 59(14): 4017-
4027. doi: https://doi.org/10.1093/jxb/ern243
Zhu, J-H., Li, H-L., Guo, D., Wang, Y., Dai, H-F., and Mei, W-L., et al. 2016.
Transcriptome-wide identification and expression analysis of glutathione S-
transferase genes involved in flavonoids accumulation in Dracaena cambodiana.
Plant Physiology and Biochemistry, 104: 304-311. doi:
https://doi.org/10.1016/j.plaphy.2016.05.012
Legends
Figure 1. State-wise distribution of mung bean in India. Rajasthan (26%), Maharashtra
(20%) and Andhra Pradesh (10%) contribute the largest shares of production, followed by other
states (GoI: Department of Agriculture and Cooperation 2014-15).
Figure 2. Genomic distribution of VrGSTs on chromosomes. (A) Chromosomal locations
are indicated based on V. radiata genome database on LIS. (B) Summary of the number
of GSTs present on each chromosome. The VrGST genes have been highlighted. Each
color represents one GST class.
Figure 3. Protein domain organization of VrGSTs. (A) Domain organization showing
active site amino acid residues on the representative GST protein from each class. (B)
Diagrammatic representation of the protein domains of VrGSTs created using Illustrator for Biological Sequences (IBS).
Figure 4. Gene architecture of VrGSTs. The exon-intron structures of VrGST genes were
determined by comparing the coding sequences and the corresponding genomic DNA
sequences using the Gene Structure Display Server (GSDS). The blue rounded rectangles
indicate exons and the black lines indicate introns.
Figure 5. Protein active sites in VrGSTs. VrGST protein active site residues were
predicted by multiple sequence alignments of V. radiata, A. thaliana, rice and soybean
GST protein sequences. The asterisks indicate the active site serine in tau, phi, theta and
zeta VrGSTs (A to D); the active site cysteine in DHAR and lambda (E and F); the two
probable active site tyrosine residues in EF1G (G and H); and the potential active site
cysteine residue in omega VrGSTs (J).
Figure 6. Phylogenetic analysis of VrGST proteins. Different GST classes and their
branches have been given different colours.
Legends to supplementary material
gen-2017-0192.R1Suppla
Supplementary table 1. Details of GST gene family of O. sativa. [Soranzo et al. (2004);
Jain et al. (2010)]
Supplementary table 2. Details of GST gene family of A. thaliana (Source: TAIR)
Supplementary table 3. Details of GST gene family of G. max (Liu et al. 2015)
Supplementary table 4. Details of GSTs identified in V. radiata. 44 GSTs distributed in
tau, phi, theta, zeta, lambda, DHAR, EF1G, TCHQD, omega, mPGES2, and
GST_N_2GST_N classes were identified in V. radiata.
Supplementary table 5. NCBI batch-CD search results for 31 GST proteins to identify
GST family protein domain organization
gen-2017-0192.R1Supplb
Supplementary Figure 1. NCBI batch-CD search results of VrGST proteins. The image
shows the concise results of all the 31 VrGST proteins demonstrating the N- and C-
terminal domains.
Table 1. List of sequence databases, bioinformatics software and applications
used in the present study
S. No.  Bioinformatics tools                          Application
1.      Legume Information System (LIS)               pBLAST search; retrieval of GST amino acid sequences of Glycine max through the locus ID/accession no.
2.      The Arabidopsis Information Resource (TAIR)   Retrieval of all GST class amino acid sequences of Arabidopsis
3.      Rice Annotation Project (RAP)                 Retrieval of all GST class amino acid sequences of rice through locus IDs
4.      NCBI database                                 Retrieval of some amino acid sequences of soybean
5.      NCBI Batch CD-Search                          Identification of conserved GST protein domains
6.      ExPASy ProtParam                              Computation of various chemical and physical parameters of GST proteins
7.      CELLO, TargetP, WoLF PSORT                    Prediction of subcellular localization
8.      Clustal Omega                                 Protein sequence alignments
9.      MEME Suite                                    Identification of conserved motifs
10.     Illustrator for Biological Sequences (IBS)    Diagrammatic representation of GST protein domains
11.     Gene Structure Display Server (GSDS)          Display of gene structure with well-defined introns and exons
12.     MEGA v. 7                                     Evolutionary analysis and alignment
13.     ESPript 3                                     Formatting of multiple sequence alignments
Table 2. Details of GSTs identified in V. radiata. 31 GSTs distributed in tau,
phi, theta, zeta, lambda, DHAR, EF1G, TCHQD, omega, mPGES2, and
GST_N_2GST_N classes were further characterized. These GST genes were
designated as VrGSTs.
S. No.  Gene name        Accession no.               Chromosome no.  Gene length (nucleotides)  Protein length (aa)  pI     Mol. wt (kDa)
1       VrGSTU1          Vradi01g12660_Tau           Vr01            1332                       219                  5.23   25.58
2       VrGSTU2          Vradi03g09170_Tau           Vr03            1588                       252                  6.72   28.76
3       VrGSTU3          Vradi06g05930_Tau           Vr06            801                        221                  6.25   25.10
4       VrGSTU4          Vradi07g04910_Tau           Vr07            730                        219                  5.7    25.00
5       VrGSTU5          Vradi07g05370_Tau           Vr07            980                        222                  8.38   25.30
6       VrGSTU6          Vradi07g05380_Tau           Vr07            4952                       285                  7.04   32.33
7       VrGSTU7          Vradi07g23980_Tau           Vr07            723                        219                  5.93   25.67
8       VrGSTU8          Vradi08g02660_Tau           Vr08            1655                       226                  5.48   25.94
9       VrGSTU9          Vradi08g08500_Tau           Vr08            1649                       228                  5.34   26.58
10      VrGSTU10         Vradi08g15620_Tau           Vr08            5085                       320                  6.04   37.18
11      VrGSTU11         Vradi08g22420_Tau           Vr08            1116                       218                  7.87   25.34
12      VrGSTU12         Vradi11g10960_Tau           Vr11            1269                       232                  5.25   26.29
13      VrGSTF1          Vradi06g02400_Phi           Vr06            1130                       178                  6.11   20.55
14      VrGSTF2          Vradi06g16320_Phi           Vr06            1055                       117                  9.18   13.12
15      VrGSTF3          Vradi08g10080_Phi           Vr08            6496                       356                  5.53   40.64
16      VrGSTF4          Vradi10g04530_Phi           Vr10            2369                       213                  6.09   24.29
17      VrGSTT1          Vradi07g30490_Theta         Vr07            6007                       442                  9.29   50.03
18      VrGSTZ1          Vradi07g26200_Zeta          Vr07            11101                      395                  5.85   44.67
19      VrGSTL1          Vradi03g03170_Lambda        Vr03            4654                       330                  5.1    37.74
20      VrDHAR1          Vradi03g07940_DHAR          Vr03            5393                       235                  9.27   26.25
21      VrDHAR2          Vradi08g22680_DHAR          Vr08            3247                       213                  5.98   23.42
22      VrEF1G1          Vradi07g27390_EF1G          Vr07            3138                       391                  6.21   44.28
23      VrEF1G2          Vradi10g13450_EF1G          Vr10            2377                       419                  5.57   47.93
24      VrTCHQD1         Vradi05g06210_TCHQD         Vr05            2659                       267                  9.4    31.6
25      VrTCHQD2         Vradi06g11490_TCHQD         Vr06            2670                       267                  9.23   31.71
26      VrGSTO1          Vradi07g04760_Omega-like    Vr07            5427                       368                  6.51   42.1
27      VrGSTO2          Vradi08g07540_Omega-like    Vr08            2189                       352                  9.05   38.58
28      VrmPGES2A        Vradi01g05130_mPGES2        Vr01            3956                       311                  8.93   35.09
29      VrmPGES2B        Vradi05g13980_mPGES2        Vr05            3080                       293                  9.00   33.26
30      VrGST_N_2GST_N1  Vradi05g22310_GST_N_2GST_N  Vr05            3685                       333                  8.4    37.28
31      VrGST_N_2GST_N2  Vradi08g05820_GST_N_2GST_N  Vr08            5930                       358                  9.01   39.82
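The accession numbers in Table 2 appear to encode both the chromosome (the two digits after "Vradi") and the GST class (the suffix after the underscore). This is a reading of the table, not something the authors state. Under that assumption, the per-chromosome and per-class gene counts can be tallied directly from the accessions, as in the following minimal Python sketch. The names `ACCESSION`, `parse_accession`, `per_chromosome` and `per_class` are illustrative and not part of the study.

```python
import re
from collections import Counter
from typing import Tuple

# Assumed accession layout (inferred from Table 2): Vradi<chromosome>g<locus>_<class>
ACCESSION = re.compile(r"^Vradi(\d{2})g\d+_(.+)$")


def parse_accession(accession: str) -> Tuple[str, str]:
    """Split an accession into (chromosome label, GST class)."""
    match = ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")
    chromosome, gst_class = match.groups()
    return f"Vr{chromosome}", gst_class


# The 31 accessions listed in Table 2.
accessions = [
    "Vradi01g12660_Tau", "Vradi03g09170_Tau", "Vradi06g05930_Tau",
    "Vradi07g04910_Tau", "Vradi07g05370_Tau", "Vradi07g05380_Tau",
    "Vradi07g23980_Tau", "Vradi08g02660_Tau", "Vradi08g08500_Tau",
    "Vradi08g15620_Tau", "Vradi08g22420_Tau", "Vradi11g10960_Tau",
    "Vradi06g02400_Phi", "Vradi06g16320_Phi", "Vradi08g10080_Phi",
    "Vradi10g04530_Phi", "Vradi07g30490_Theta", "Vradi07g26200_Zeta",
    "Vradi03g03170_Lambda", "Vradi03g07940_DHAR", "Vradi08g22680_DHAR",
    "Vradi07g27390_EF1G", "Vradi10g13450_EF1G", "Vradi05g06210_TCHQD",
    "Vradi06g11490_TCHQD", "Vradi07g04760_Omega-like",
    "Vradi08g07540_Omega-like", "Vradi01g05130_mPGES2",
    "Vradi05g13980_mPGES2", "Vradi05g22310_GST_N_2GST_N",
    "Vradi08g05820_GST_N_2GST_N",
]

parsed = [parse_accession(a) for a in accessions]
per_chromosome = Counter(chromosome for chromosome, _ in parsed)
per_class = Counter(gst_class for _, gst_class in parsed)

for chromosome in sorted(per_chromosome):
    print(chromosome, per_chromosome[chromosome])
```

For the accessions in Table 2, this tally gives eight genes each on Vr07 and Vr08 and twelve tau-class genes.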
Table 3. Subcellular localization of GSTs identified in V. radiata. 31 GSTs were
analyzed for their cellular location using CELLO, WoLF PSORT, and TargetP
subcellular localization prediction tools.
S. No.  Gene name        Accession no.               CELLO            WoLF PSORT    TargetP
1       VrGSTU1          Vradi01g12660_Tau           Cytoplasm        Cytoplasm
2       VrGSTU2          Vradi03g09170_Tau           Cytoplasm        Cytoplasm
3       VrGSTU3          Vradi06g05930_Tau           Cytoplasm        Cytoplasm
4       VrGSTU4          Vradi07g04910_Tau           Cytoplasm        Chloroplast
5       VrGSTU5          Vradi07g05370_Tau           Cytoplasm        Nucleus       Mitochondria
6       VrGSTU6          Vradi07g05380_Tau           Cytoplasm        Chloroplast
7       VrGSTU7          Vradi07g23980_Tau           Cytoplasm        Cytoplasm     Secretory
8       VrGSTU8          Vradi08g02660_Tau           Cytoplasm        Cytoplasm
9       VrGSTU9          Vradi08g08500_Tau           Cytoplasm        Chloroplast   Secretory
10      VrGSTU10         Vradi08g15620_Tau           Cytoplasm        Cytoplasm
11      VrGSTU11         Vradi08g22420_Tau           Cytoplasm        Cytoplasm     Secretory
12      VrGSTU12         Vradi11g10960_Tau           Cytoplasm        Cytoplasm     Secretory
13      VrGSTF1          Vradi06g02400_Phi           Cytoplasm        Mitochondria
14      VrGSTF2          Vradi06g16320_Phi           Cytoplasm        Vacuole
15      VrGSTF3          Vradi08g10080_Phi           Cytoplasm        Cytoplasm
16      VrGSTF4          Vradi10g04530_Phi           Cytoplasm        Cytoplasm
17      VrGSTT1          Vradi07g30490_Theta         Mitochondria     Chloroplast
18      VrGSTZ1          Vradi07g26200_Zeta          Mitochondria     Cytoplasm
19      VrGSTL1          Vradi03g03170_Lambda        Cytoplasm        Cytoplasm
20      VrDHAR1          Vradi03g07940_DHAR          Chloroplast      Chloroplast   Chloroplast
21      VrDHAR2          Vradi08g22680_DHAR          Cytoplasm        Cytoplasm
22      VrEF1G1          Vradi07g27390_EF1G          Cytoplasm        Cytoplasm
23      VrEF1G2          Vradi10g13450_EF1G          Cytoplasm        Chloroplast   Secretory
24      VrTCHQD1         Vradi05g06210_TCHQD         Mitochondria     Cytoplasm
25      VrTCHQD2         Vradi06g11490_TCHQD         Cytoplasm        Cytoplasm
26      VrGSTO1          Vradi07g04760_Omega-like    Mitochondria     Cytoplasm
27      VrGSTO2          Vradi08g07540_Omega-like    Extracellular    Chloroplast   Chloroplast
28      VrmPGES2A        Vradi01g05130_mPGES2        Chloroplast      Mitochondria  Mitochondria
29      VrmPGES2B        Vradi05g13980_mPGES2        Mitochondria     Chloroplast   Mitochondria
30      VrGST_N_2GST_N1  Vradi05g22310_GST_N_2GST_N  Plasma membrane  Chloroplast   Chloroplast
31      VrGST_N_2GST_N2  Vradi08g05820_GST_N_2GST_N  Chloroplast      Chloroplast   Chloroplast
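The three predictors in Table 3 often disagree, and the table does not say how such conflicts were resolved. One simple way to summarize the rows is a majority vote, sketched below in Python. Both the voting rule and the treatment of rows with only two entries (read here as CELLO and WoLF PSORT calls, with no TargetP call) are assumptions made for illustration, not the authors' procedure. The function name `consensus_location` is likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def consensus_location(calls: Sequence[Optional[str]]) -> Optional[str]:
    """Return the compartment predicted by at least two tools, else None."""
    votes = Counter(call for call in calls if call)  # skip missing calls
    if not votes:
        return None
    location, count = votes.most_common(1)[0]
    return location if count >= 2 else None


# (CELLO, WoLF PSORT, TargetP) calls for a few rows of Table 3.
# None marks a cell with no entry (assumed: no TargetP call).
predictions: Dict[str, Tuple[Optional[str], Optional[str], Optional[str]]] = {
    "VrGSTU1": ("Cytoplasm", "Cytoplasm", None),
    "VrGSTU5": ("Cytoplasm", "Nucleus", "Mitochondria"),
    "VrDHAR1": ("Chloroplast", "Chloroplast", "Chloroplast"),
    "VrmPGES2B": ("Mitochondria", "Chloroplast", "Mitochondria"),
}

consensus = {gene: consensus_location(calls) for gene, calls in predictions.items()}

for gene, location in consensus.items():
    print(gene, location if location is not None else "no consensus")
```

Under this rule a protein such as VrGSTU5, for which all three tools name different compartments, is left without a consensus rather than being forced into one, which seems the more cautious choice for in silico predictions.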
Table 4. Predicted amino acid position of active site serine or cysteine residues in
VrGSTs
GST class   Active site amino acid residue   Predicted position with MSA
Tau         Ser                              10-20
Phi         Ser                              60-70
Theta       Ser                              14
Zeta        Ser                              20
Lambda      Cys                              36
DHAR        Cys                              20
EF1G        Tyr                              -
TCHQD       Ser                              -
Omega       Cys                              30-40