Genome-wide analysis reveals artificial selection on coat colour and reproductive traits in Chinese domestic pigs
CHAO WANG,* HONGYANG WANG,* YU ZHANG,* ZHONGLIN TANG,† KUI LI† and BANG LIU*
*Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural
University, Wuhan 430070, China, †Key Laboratory of Farm Animal Genetic Resources and Germplasm Innovation of Ministry of
Agriculture, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Abstract
Pigs from Asia and Europe were independently domesticated from c. 9000 years ago. During this period, strong artifi-
cial selection has led to dramatic phenotypic changes in domestic pigs. However, the genetic basis underlying these
morphological and behavioural adaptations is relatively unknown, particularly for indigenous Chinese pigs. Here,
we performed a genome-wide analysis and identified 196 regions with selective sweep signals in Tongcheng pigs, which
are a typical indigenous Chinese breed. Genes located in these regions have been found to be involved in lipid
metabolism, melanocyte differentiation, neural development and other biological processes, which coincide with the
evolutionary phenotypic changes in this breed. A synonymous substitution, c.669T>C, in ESR1, which colocalizes
with a major quantitative trait locus for litter size, shows extreme differences in allele frequency between Tongcheng
pigs and wild boars. Notably, the variant C allele in this locus exhibits high allele frequency in most Chinese popu-
lations, suggesting a consequence of positive selection. Five genes (PRM1, PRM2, TNP2, GPR149 and JMJD1C)
related to reproductive traits were found to have high haplotype similarity in Chinese breeds. Two selected genes,
MITF and EDNRB, are implicated in shaping the two-end black colour trait in Tongcheng pigs. Subsequent SNP
microarray studies of five Chinese white-spotted breeds displayed a concordant signature at both loci, suggesting
that these two genes are responsible for colour variations in Chinese breeds. Utilizing massively parallel sequencing,
we characterized candidate sites that responded to artificial and environmental selection during Chinese pig
domestication. This study provides a foundation for further research on the evolutionary adaptation of
Chinese pigs.
Keywords: evolution, genomic resequencing, melanocyte, reproductive traits, selective sweep, Tongcheng pigs
Received 1 May 2014; revision received 12 July 2014; accepted 17 July 2014
Introduction
The domestication of wild animals to meet human
demand through generations of selective breeding is a
remarkable activity in the history of modern human civi-
lization. After dogs and sheep, pigs, as an indispensable
commercial livestock, are the third animal species to be
domesticated. Wild boars originated in south-east Asia
5.3–3.5 Ma and then split into the Asian and European
subspecies c. 1 Ma (Groenen et al. 2012). Extensive
archaeological records and molecular evidence have sug-
gested that multiple centres of porcine domestication
occurred across Eurasia c. 9000 years ago (Bokonyi 1974;
Giuffra et al. 2000; Kijas & Andersson 2001; Larson et al.
2005). Subsequently, domestication occurred in parallel
in Asia and Europe with local wild boars (Giuffra et al.
2000; Fang et al. 2009). Owing to long-term climate fluctuations,
human hunting and, in particular, subsequent stock-raising
activities, the population and geographical distribution of
domestic pigs have come to differ greatly from those of wild
boars (Larson et al. 2010).
The evolution of the domestic pigs is related to dra-
matic phenotypic changes in behaviour, reproduction,
growth, and coat colour. As stockbreeding has developed
since the 18th century (Darwin 1868; Larson et al.
2007), desirable traits, such as lean growth and bone
development, have been enhanced. Some of the genetic
variations behind these favourable phenotypes have
been mapped and well characterized. For example, a single
nucleotide substitution in intron 3 of IGF2 causes a
major QTL effect on muscle content (Van Laere et al.
2003). Rubin et al. (2012) determined that selection at the
NR6A1, PLAG1 and LCORL loci has major effects on
elongation of the back in pigs. Several other morphological
changes, such as coat colour, have also undergone
genetic modification. Molecular evidence was found that
the MC1R gene affects melanin synthesis, and it shows a
distinct phylogenetic relationship between the Asian and
European clades (Fang et al. 2009).
Correspondence: Bang Liu, Fax: +8602787280408;
E-mail: [email protected]
© 2014 John Wiley & Sons Ltd
Molecular Ecology Resources (2014) doi: 10.1111/1755-0998.12311
Asian pigs have evolved distinct characteristics
because of independent domestication. For example,
high fertility and fatness have been two of the most favoured
traits of Chinese pig breeds throughout history. Both
male and female Chinese indigenous pigs reach sexual
maturity at a relatively early age. The average age of the
first expressed oestrus with ovulation in gilts is around
98 days, compared with 200 days for European domestic
pigs. Boars exhibit initial ‘sex behaviour’ at ~50 days and
are able to mate from an average of 128 days (Wang et al.
2011). The fat percentage in indigenous Chinese pigs is
normally above 40% at a body weight of 90 kg, and a large,
drooping abdomen can be observed in a wide range of
Chinese breeds (Wang et al. 2011).
However, the genetic variations underlying the phe-
notypic changes in domestic Chinese pigs remain rela-
tively unknown. To address this issue, we studied
Tongcheng pigs (Fig. 1a), a two-end black coloured
breed in central China, which possesses the common
characteristics of most Chinese breeds. We conducted
whole-genome sequencing of Tongcheng pigs to uncover
genetic variations under artificial selection. We focused
on six selective sweeps related to coat colour and repro-
ductive traits and provide further evidence to demon-
strate that these genetic loci are functionally conserved
across Chinese pigs.
Materials and methods
Whole-genome sequencing and mapping
Ear tissues of 22 Tongcheng pigs were collected from the
Tongcheng pig conservation farm (Hubei, China). Geno-
mic DNA was extracted using a routine phenol–chloro-
form method, and it was diluted to a final concentration
of 50 ng/μL. DNA samples from 16 females and two
males were equally mixed to construct two pooled
libraries to reduce sequencing bias. Four Tongcheng
female DNA samples were used to construct the individ-
ual libraries. All libraries were generated with a mean
insertion size of 500 bp, and they were sequenced in 100-
bp paired-end reads with a Hiseq2000 instrument (Illu-
mina, USA). The raw sequence reads were trimmed by
removing the index and barcoding sequences, and
unpaired reads were discarded. We downloaded addi-
tional sequencing data from 66 individuals, including
seven Chinese wild boars, 22 pigs from eight Chinese
domestic breeds, 25 pigs from four European domestic
breeds, six European wild boars and six individuals from
outgroup species (Table S1, Supporting information)
(Groenen et al. 2012; Li et al. 2013). All cleaned reads
were aligned against the SSCROFA 10.2 reference genome
by BOWTIE2 (Langmead & Salzberg 2012). Alignment was
performed using an ‘end-to-end’ mapping strategy with
a sensitive setting (-D 15 -R 2 -N 0 -L 22 -i S,1,1.15). We
removed reads with multiple mapping locations, which
may induce false-positive errors in the variant calling
step. Alignment archives were merged, converted to
BAM files, and subsequently sorted and indexed using
SAMTOOLS (Li et al. 2009). Post-processing steps, including gap
realignment and base recalibration, were performed with
GATK (McKenna et al. 2010).
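The alignment settings quoted above can be collected into a single command line. The following is a minimal sketch and not the authors' script: the index prefix and file names are hypothetical placeholders, and only the options named in the text plus the standard BOWTIE2 input and output options (-x, -1, -2, -S) are used. The removal of multiply mapped reads and the downstream SAMTOOLS and GATK steps are not shown.

```python
from typing import List


def bowtie2_command(index_prefix: str, reads_1: str, reads_2: str, out_sam: str) -> List[str]:
    """Build the argument list for an end-to-end BOWTIE2 run.

    The numeric settings are those quoted in the text; all paths passed in
    are hypothetical and chosen by the caller.
    """
    return [
        "bowtie2",
        "--end-to-end",              # 'end-to-end' mapping strategy
        "-D", "15",                  # seed extension attempts
        "-R", "2",                   # re-seeding rounds
        "-N", "0",                   # mismatches allowed in a seed
        "-L", "22",                  # seed length
        "-i", "S,1,1.15",            # seed interval function
        "-x", index_prefix,          # reference index (hypothetical name)
        "-1", reads_1,               # first mates
        "-2", reads_2,               # second mates
        "-S", out_sam,               # SAM output
    ]


# Example with placeholder file names; the command is only printed, not run.
print(" ".join(bowtie2_command("sscrofa10_2", "tc_R1.fastq", "tc_R2.fastq", "tc.sam")))
```

Building the command as a list rather than a single string keeps each option and its value together and avoids shell quoting problems if the list is later passed to a process launcher.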
Variant calling
To detect genomic regions under artificial selection in
Tongcheng pigs, we called the variations from the Ton-
gcheng pool and seven Chinese wild boars by the Uni-
fiedGenotyper function in GATK. Only biallelic SNPs with
a minimum quality of 100 were selected. We further
required that the coverage for each SNP should be above
five in Tongcheng pool and in at least five Chinese wild
boars. For each SNP, genotypes were called from the
Chinese wild boars with a coverage >5. After filtering,
7 416 043 high-quality SNPs were generated for subse-
quent analyses (SNP data have been submitted in Dryad
with Accession no. doi:10.5061/dryad.8c930).
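The filtering rules in this subsection can be expressed as a small predicate. This is an illustrative sketch under stated assumptions: the record layout (`SnpCall`) and its field names are hypothetical, "minimum quality of 100" is read as quality >= 100, and "coverage above five" as depth > 5.

```python
from dataclasses import dataclass
from typing import List

MIN_QUAL = 100        # minimum SNP quality given in the text
MIN_DEPTH = 5         # coverage must exceed this value
MIN_WILD_BOARS = 5    # number of wild boars that must pass the depth cut-off


@dataclass
class SnpCall:
    """Hypothetical per-site record; not a GATK data structure."""
    alt_alleles: List[str]        # alternative alleles called at the site
    qual: float                   # site quality
    pool_depth: int               # read depth in the Tongcheng pool
    wild_boar_depths: List[int]   # one read depth per Chinese wild boar


def passes_filters(snp: SnpCall) -> bool:
    """Return True if the site is a biallelic SNP that meets all cut-offs."""
    is_biallelic_snp = len(snp.alt_alleles) == 1 and len(snp.alt_alleles[0]) == 1
    covered_boars = sum(depth > MIN_DEPTH for depth in snp.wild_boar_depths)
    return (
        is_biallelic_snp
        and snp.qual >= MIN_QUAL
        and snp.pool_depth > MIN_DEPTH
        and covered_boars >= MIN_WILD_BOARS
    )
```

For example, a site with one alternative allele, quality 150, pool depth 12 and five wild boars above the depth cut-off would be retained, whereas the same site with only four sufficiently covered wild boars would be dropped.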
Identification of SNPs with different allele frequencies between Tongcheng and Chinese wild boars
We analysed SNPs showing different allele frequencies
(DAF) between the Tongcheng and Chinese wild boar
populations. We scanned the ~7 M SNP data set, requiring
a reference allele frequency of >0.8 in one population
and <0.2 in the other. Because
the SSCROFA 10.2 reference assembly was generated from
a domesticated pig, we further employed five outgroup
species, including Sus barbatus, Sus verrucosus, Sus
cebifrons, Sus celebensis and Phacochoerus africanus to deter-
mine whether the reference allele is ancestral or derived
(Groenen et al. 2012) (see Table S1, Supporting informa-
tion). At genomic coordinates of the DAF SNPs, geno-
types were called in the outgroup individuals with a
coverage >5 using GATK (McKenna et al. 2010). The ances-
tral/derived allele was determined only when all
outgroup individuals had the same genotype. Conse-
quently, 229 DAF SNPs in gene coding regions were
identified using ANNOVAR (Wang et al. 2010).
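The two decision rules described above (the DAF criterion and the outgroup-based polarization) can be sketched as follows. The function names and the two-letter genotype strings are hypothetical; in particular, returning no ancestral allele when the shared outgroup genotype is heterozygous is an assumption of this sketch, since the text only requires that all outgroup individuals have the same genotype.

```python
from typing import Optional, Sequence


def is_daf(ref_freq_a: float, ref_freq_b: float,
           high: float = 0.8, low: float = 0.2) -> bool:
    """True if the reference allele is >high in one population and <low in the other."""
    return (ref_freq_a > high and ref_freq_b < low) or \
           (ref_freq_a < low and ref_freq_b > high)


def ancestral_allele(outgroup_genotypes: Sequence[str]) -> Optional[str]:
    """Return the ancestral allele, or None if it cannot be determined.

    Genotypes are given as two-letter strings such as "AA" (one per
    outgroup individual with sufficient coverage).
    """
    distinct = set(outgroup_genotypes)
    if len(distinct) != 1:          # outgroups disagree, or no data
        return None
    genotype = distinct.pop()
    if len(genotype) == 2 and genotype[0] == genotype[1]:
        return genotype[0]          # shared homozygous genotype
    return None                     # shared heterozygote: left undetermined here


def is_reference_derived(ref_allele: str, outgroup_genotypes: Sequence[str]) -> Optional[bool]:
    """True if the reference allele differs from the ancestral allele."""
    ancestral = ancestral_allele(outgroup_genotypes)
    return None if ancestral is None else ref_allele != ancestral
```

Because the reference assembly comes from a domesticated pig, a result of `True` from `is_reference_derived` marks sites where the reference carries the derived state.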
Analysis of selective sweep
Screening for selective sweeps was performed by using
sliding windows. Before analysis, we estimated the
distribution of SNP counts in 50-, 100-, 150- and 200-kb
windows with half sliding steps (Fig. S1, Supporting
information). We chose 150 kb as an appropriate size
because few windows of this size contained fewer than
10 SNPs (1.4%), and it was also more sensitive for detecting
small regions than the larger window sizes. For each
150-kb window, we used the pooled heterozygosity (Hp) and
fixation index (FST) methods to search for selection signals
in the genome of the Tongcheng pig. In the Hp
analysis, we separately summed the numbers of major
and minor allele reads from all SNPs within the 150-kb
window in the Tongcheng pool and then estimated
Hp following the formula described by Rubin
et al. (2010). Next, we calculated the FST values between
Tongcheng and Chinese wild boars for the individual
SNPs (Weir & Cockerham 1984). The allele frequencies in
the two populations were estimated separately from the number
of allele reads in the Tongcheng pool and from reliable
genotypes in Chinese wild boars. We then averaged the
per-SNP FST values within each 150-kb window. To avoid
spurious selection signals, we discarded 484 of 34 569 windows
containing fewer than 10 SNPs. Because an abnormal
Hp distribution was observed on chromosome X, we plotted
the distributions of Hp and FST for the autosomes
separately. The putative selected windows were
extracted from the intersection of the 3% right tail of
the FST distribution (FST > 0.2) and the 3% left tail of
the Hp distribution (Hp < 0.12). Neighbouring target
windows were merged. All selected regions were considered
as genetic intervals that were subject to strong artificial
selection during domestication. Using the same
thresholds, we detected the selection signatures on
chromosome X. Genes residing within selected regions were
identified using ENSEMBL gene annotation.
[Figure 1 panels: (b) histogram of Hp (μ = 0.26, σ = 0.06; threshold = 0.12) and histogram of FST (μ = 0.11, σ = 0.04; threshold = 0.20); (c) Hp and FST plotted by chromosome (1–18).]
Fig. 1 Genome-wide selection analysis of Tongcheng pigs. (a) Image of Tongcheng pigs. (b) Histograms of the 150-kb windowed heterozygosity (Hp) and fixation index (FST) of the autosomes. (c) Plot of the Hp and FST values for the Tongcheng pigs along the autosomes.
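The windowed screen described in this subsection can be sketched as follows. The pooled heterozygosity is written in the form given by Rubin et al. (2010), Hp = 2ΣnMAJΣnMIN/(ΣnMAJ + ΣnMIN)², where the sums run over all SNPs in a window. Everything else is an assumption made for illustration: the function names, the tuple layout of a window, and the reading of "neighbouring" as overlapping or directly adjacent windows. Per-SNP FST (Weir & Cockerham 1984) is taken as precomputed input and is not implemented here.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, float, float]  # (start, end, n_snps, hp, mean_fst)


def pooled_heterozygosity(major_reads: List[int], minor_reads: List[int]) -> float:
    """Hp from summed major- and minor-allele read counts in one window."""
    n_maj, n_min = sum(major_reads), sum(minor_reads)
    total = n_maj + n_min
    if total == 0:
        raise ValueError("window contains no allele reads")
    return 2.0 * n_maj * n_min / total ** 2


def window_starts(chrom_length: int, size: int = 150_000) -> List[int]:
    """Start coordinates of windows of `size` bp sliding by half a window."""
    step = size // 2
    return list(range(0, max(chrom_length - size, 0) + 1, step))


def select_windows(windows: List[Window], hp_max: float = 0.12,
                   fst_min: float = 0.20, min_snps: int = 10) -> List[Tuple[int, int]]:
    """Keep windows with enough SNPs, low Hp and high mean FST."""
    return [(start, end) for start, end, n_snps, hp, fst in windows
            if n_snps >= min_snps and hp < hp_max and fst > fst_min]


def merge_neighbours(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or directly adjacent windows into regions."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With equal numbers of major and minor reads Hp reaches its maximum of 0.5, and it falls to 0 when a window is fixed for one allele, which is why the left tail of the Hp distribution is the informative one.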
Gene ontology analysis
A total of 407 human genes were found to be ortholo-
gous with genes in the selected regions by searching
‘one2one’ homology type genes in the BIOMART online tool
(http://www.ENSEMBL.org/biomart/martview/). The
orthologous genes were uploaded to the DAVID online tool
to test for enrichment in gene ontology (GO) terms (da
Huang et al. 2009a,b). Terms with P-values (EASE Score)
<0.05 were considered to be significantly enriched in this
study.
SNP data resource and analysis
We employed published SNP data from the Porcine 60K
beadchip (Illumina) of 181 Chinese indigenous pigs from
five white-spotted breeds, including Tongcheng, Ningxi-
ang, Luchuan, Bama and Wuzhishan pigs, and the solid
black breed, Laiwu pigs (Table S2, Fig. S2, Supporting
information) (Yang et al. 2014). SNPs were located by
mapping probe sequences against the SSCROFA 10.2 refer-
ence genome via the local BLAST tool (Camacho et al.
2009). Unmapped and ambiguously mapped SNPs were
discarded. The remaining SNPs were further filtered by
requiring the minor allele frequency to be >0.05 and the
ratio of missing genotypes to be lower than 0.2. Selection
signatures were detected by calculating Hp in windows
of five adjacent SNPs that slid along the genome
in steps of one SNP. We counted
the numbers of major and minor alleles per population
and used the same formula (Rubin et al. 2010) to calculate
the windowed Hp.
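For the array data, the window is defined by SNP count rather than by physical length. A minimal sketch of the filtering and of the five-SNP sliding window is given below; the function names and input layout are hypothetical, and the per-SNP major and minor allele counts are assumed to have been tallied per population beforehand.

```python
from typing import List


def passes_chip_filters(minor_allele_freq: float, missing_rate: float) -> bool:
    """Keep SNPs with minor allele frequency >0.05 and missing rate <0.2."""
    return minor_allele_freq > 0.05 and missing_rate < 0.2


def snp_window_values(major_counts: List[int], minor_counts: List[int],
                      window: int = 5) -> List[float]:
    """Windowed statistic for every run of `window` adjacent SNPs.

    Windows slide by one SNP; each value is
    2 * sum(major) * sum(minor) / (sum(major) + sum(minor)) ** 2.
    """
    if len(major_counts) != len(minor_counts):
        raise ValueError("count lists must have the same length")
    values: List[float] = []
    for i in range(len(major_counts) - window + 1):
        n_maj = sum(major_counts[i:i + window])
        n_min = sum(minor_counts[i:i + window])
        total = n_maj + n_min
        values.append(2.0 * n_maj * n_min / total ** 2 if total else float("nan"))
    return values
```

A run of five SNPs that are all fixed for one allele in a breed gives a value of 0, which is the pattern expected at a swept haplotype.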
Haplotype similarity analysis
To examine the haplotype similarity between Tongcheng
pigs and other breeds, we screened variants in four
regions identified by selective sweep analysis (Table S3,
Supporting information). SNPs were called from 54 indi-
vidually sequenced pigs from 11 populations, including
Chinese wild boars, Tongcheng, Meishan, Jinghua,
Neijiang, Penzhou, Yanan, Wujin, Duroc, Large white
and Landrace pigs (see Table S1, Supporting informa-
tion) by GATK. Biallelic SNPs with qualities above 100
were selected, and reliable genotypes were called from
individuals with a coverage >5. To obtain informative
SNPs, we required the minor allele frequency to be >0.05 and the genotype calling rate to be >0.8 for each SNP.
After filtering, a total of 8297 SNPs were selected in these
four regions (SNP data for each region have been submitted
to Dryad with Accession no. doi:10.5061/dryad.8c930).
The haplotype similarity between Tongcheng pigs
and other breeds was visualized by identity score (IS) in
a pairwise comparison. Considering that a 150-kb win-
dow with a half overlapping step was used for selective
sweep analysis, we chose 75 kb consecutive windows to
calculate IS values. At each SNP, genotypes were called
in the individuals with a coverage >5, and the allele fre-
quencies were estimated per population. Then, the IS for
each 75-kb window was calculated as described (Rubin
et al. 2010).
Phylogeny analysis
Sequences for PRM1, PRM2, TNP2, GPR149 and JMJD1C
were extracted from all 70 individually sequenced pigs
according to genomic position by SAMTOOLS (Li et al.
2009). PRM1, PRM2 and TNP2 were analysed together
because they are physically linked in the pig genome.
Flanking sequences were extended 25 kb in the GPR149
analysis due to its small size. A phylogenetic tree was
constructed with Warthog as the outgroup using RAXML
in the GTRGAMMA model (Stamatakis 2006). The best tree
was selected after 50 iterations.
Results
Whole-genome sequencing of Tongcheng pigs
Sequencing of four Tongcheng individuals and the Tongcheng
pool generated a total of 1.658 billion 2 × 100 bp
paired-end reads (about 165.8 Gb), of which 1.091 billion reads
(65.8%) were successfully mapped to the SSCROFA 10.2 reference
assembly at a unique location (Table 1). Consequently, the
average sequencing depth was 4.6–6.4× for the Tongcheng
individuals and 18.2× for the pool. Approximately
74% of the reference genome was covered by sequencing
reads for each sample. We identified 7 416 043 high-
quality SNPs from the Tongcheng pool and seven
Chinese wild boars.
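The read totals can be checked directly against the per-sample counts in Table 1. The short sketch below simply sums the two read columns; the dictionary layout is chosen here for illustration. The cleaned reads sum to about 1.658 billion, equivalent to about 165.8 Gb at 100 bp per read, and about 65.8% of them are uniquely mapped.

```python
# Per-sample counts copied from Table 1: (cleaned reads, uniquely mapped reads).
TABLE_1 = {
    "TC_1p": (847_969_874, 557_583_049),
    "TC_1d": (225_359_876, 107_266_477),
    "TC_2d": (243_645_292, 116_055_070),
    "TC_3d": (175_233_414, 149_661_767),
    "TC_4d": (165_891_834, 160_293_426),
}

READ_LENGTH_BP = 100  # read length stated in the text


def read_totals(table):
    """Return (total cleaned reads, total uniquely mapped reads)."""
    cleaned = sum(c for c, _ in table.values())
    mapped = sum(m for _, m in table.values())
    return cleaned, mapped


cleaned, mapped = read_totals(TABLE_1)
print(cleaned, mapped)                         # → 1658100290 1090859789
print(round(100 * mapped / cleaned, 1))        # → 65.8
print(round(cleaned * READ_LENGTH_BP / 1e9, 1))  # → 165.8 (Gb)
```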
Within the ~7 M SNP data set, 229 SNPs showed
remarkable differences in allele frequency between Ton-
gcheng and Chinese wild boars in gene coding sequences
(Table S4, Supporting information). Among these varia-
tions, 72 nonsynonymous and 102 synonymous substitu-
tions were observed in the Tongcheng population, in
comparison with 16 nonsynonymous and 39 synony-
mous substitutions in Chinese wild boars. A Fisher’s
exact test of the ratio of nonsynonymous/synonymous
substitutions suggested no substantial differentiation
between the two populations (two-tailed Fisher’s exact test,
P = 0.113). In contrast to the results of the European
domestics and wild boars (Rubin et al. 2012), this finding
is consistent with the historical fact that the Chinese
indigenous pigs have not undergone a selection as
intense as that experienced by European domestics.
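The comparison above is a test on a 2 × 2 table of counts (72 and 102 in Tongcheng pigs versus 16 and 39 in Chinese wild boars). A self-contained sketch of a two-sided Fisher's exact test is given below so that the reported P-value can be checked. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether the authors' software used exactly this convention is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denominator = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed * tolerance)
    return min(1.0, p_value)


# Counts from the text: nonsynonymous and synonymous substitutions per population.
p = fisher_exact_two_sided(72, 102, 16, 39)
odds_ratio = (72 * 39) / (102 * 16)
print(p, odds_ratio)
```

The sample odds ratio is about 1.72, so the nonsynonymous fraction is somewhat higher in Tongcheng pigs, but with only 55 substitutions on the wild boar side the difference does not reach the conventional 0.05 level.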
Selective sweep analysis
To accurately detect genomic regions under artificial
selection in Tongcheng pigs, we measured the pooled heterozygosity
(Hp) and fixation index (FST) in 150-kb windows
with a half-window sliding step along the pig genome. Windows
with both significantly low Hp values
(3% left tail, where Hp is 0.12) and significantly
high FST values (3% right tail, where FST is 0.2) were
considered as the target windows in Tongcheng pigs
(Fig. 1b,c). Neighbouring windows were further merged
into the selected regions. As a result, 196 putative
selected regions with a total length of 56.9 Mb were iden-
tified from 15 autosomes and chromosome X, accounting
for 2.19% of the entire genome (Table S5, Supporting
information).
From the DAF SNP list above, 21 nonsynonymous
substitutions in 11 genes (ER, PLAA, ENSSSCG00000005241,
NPAP1, OR1F1, RP1, TBX19, WIF1, MACF1,
MTR1 and HEATR1) were found to colocalize with the
selected regions, suggesting that these substitutions
increased in frequency as a result of positive selection. We analysed
the GO of the 570 protein-coding genes embedded in the
selected regions and detected 34 significantly enriched
GO terms (P < 0.05) (Table S6, Supporting information),
including some categories that were identified to play
important roles in selective breeding. Ten genes,
PLA2G4F, PLBD2, SULT2A1, HSD17B14, YWHAH, LIPE,
PLCB2, PLA1A, PLA2G2A and ECI1, are predominantly
related to the ‘lipid catabolic process’ (Term ID: 0016042,
P = 0.01), which is in agreement with the favoured selec-
tion for fatness in Chinese pigs. Three genes, TYRP1,
MITF and EDNRB, belong to the ‘melanocyte differentia-
tion’ category (Term ID: 0030318, P = 0.03), which is
probably associated with the two-end black colour trait
in Tongcheng pigs. Eight genes, EDNRB, GRIA2, EGR2,
ADORA2A, PPFIA3, YWHAH, GRIN2D and NPY5R, and
three genes, ADORA2A, GLRB and GRIN2D, were found
to be associated with the ‘regulation of neurological sys-
tem process’ (Term ID: 0031644, P = 0.05) and ‘startle
response’ (Term ID: 0001964, P = 0.04) categories, respec-
tively, and are assumed to be involved in behaviour
adaptation during domestication. Candidate genes
linked to these biological processes may reflect the selec-
tion on functional variation throughout domestication of
the Tongcheng breed. In the following sections, we
explore the general functions of candidate genes related
to coat colour (MITF and EDNRB) and reproduction
(ESR1, PRM1, PRM2, TNP2, GPR149 and JMJD1C) across
the Chinese pig populations.
On chromosome X, a highly homozygous region was
observed at 64.7–101.2 Mb (average Hp = 0.012) in Ton-
gcheng pigs (Fig. 2a), but not in Chinese wild boars
(average Hp = 0.366). The intermediate sequence divergence
(average FST = 0.142) implies that this fixation was
probably not shaped by artificial selection. Of note, this
homozygous fragment has also been reported in both
European domestics and wild boars, where they share
the same haplotype (Rubin et al. 2012). We evaluated the
haplotype similarity of this fragment between Tongch-
eng pigs and other populations, including six Chinese
and three European domestic breeds (Fig. 2b). Conse-
quently, Meishan, Jinghua, Neijiang, Wujin and Yanan
pigs exhibited extremely high haplotype similarity (aver-
age IS > 0.989) with Tongcheng pigs, whereas the Euro-
pean domestic breeds, Large White, Landrace and Duroc
were fixed in another haplotype (average IS < 0.046).
This haplotype divergence confirms that fixation was
independently established in Chinese and European
populations. Unexpectedly, fixation did not occur in one
Chinese domestic breed, Penzhou pigs (average
IS = 0.657), suggesting that the fixation process could be
still occurring in some breeds. Moreover, a remarkably
heterozygous region (average Hp = 0.47) spanning over
12 Mb was observed at 51.45–63.53 Mb on chromosome
X. Normally, heterozygosity in a large-scale region is the
result of potential segmental duplications. In this case,
reads from homologous fragments mapping to the same
reference sequence could dramatically increase the level
of nucleotide polymorphism. However, no significant
increase in coverage was observed in this heterozygous
region compared with the flanking genomic intervals.
Further study is needed to explore the origin of this
heterozygous region.
Table 1 Sequencing information of five Tongcheng samples
Sample ID   Sample type   Cleaned reads   Uniquely mapped reads   Average depth   Genome coverage (%)   Accession no.
TC_1p       Pool          847 969 874     557 583 049             18.3×           74.85                 SRX510749
TC_1d       Individual    225 359 876     107 266 477             4.60×           73.94                 SRX473146
TC_2d       Individual    243 645 292     116 055 070             4.88×           74.26                 SRX473147
TC_3d       Individual    175 233 414     149 661 767             6.05×           74.64                 SRX473148
TC_4d       Individual    165 891 834     160 293 426             6.58×           74.42                 SRX473149
Two candidate genes for white spotting
Coat colour is a prominent morphological feature in
breed standards. Two genes in the selected regions, endothelin
receptor type b (EDNRB) and microphthalmia-associated
transcription factor (MITF), were regarded
as strong candidates for the two-end black colour in
Tongcheng pigs owing to their functional importance in
melanogenesis (Fig. 3a). The EDNRB gene encodes a
seven-transmembrane G protein-coupled receptor and
contributes to coat colour phenotypes in many mammals.
In mice, the mutant EDNRB causes the classic
piebald (s) colour (Yamada et al. 2006) and the more severe
mutant piebald lethal (sl) (Ceccherini et al. 1995). More
importantly, the significant EDNRB signature
(chr11:54.6–54.75 Mb, Hp = 0.10, FST = 0.24) coincides
with results from a genome-wide association study of the
two-end black colour in Chinese pigs (Ai et al. 2013),
which makes this gene an ideal positive control for
this selection study. MITF (Hp = 0.01, FST = 0.25) was
identified from a selected region located at 56.4–56.7 Mb
on chromosome 13. This gene encodes a basic helix-loop-helix
(bHLH)-leucine zipper protein that is a master
regulator of melanocyte development (Hou & Pavan
2008). MITF is associated with the dominant white col-
our in cattle (Philipp et al. 2011) and the splashed white
colour in horse (Hauswirth et al. 2012), and it is respon-
sible for the mi locus in mouse (Hodgkinson et al. 1993;
Steingrimsson et al. 1994).
To further elucidate the relationship between these
two candidate genes and white spotting variation in
Chinese pigs, we searched for selection signals in six
Chinese indigenous breeds with diverse colour patterns
in microarray data (see Materials and methods). At the
EDNRB and MITF selected regions detected by sequencing,
heterozygosity dropped dramatically in all five
white-spotted breeds, but no such reduction was
observed in solid black Laiwu pigs (Fig. 3b). In intervals
with extremely low heterozygosity, white-spotted breeds
shared an identical haplotype, which comprised five
neighbouring SNPs (ALGA0062329, INRA0036477,
MARC0048926, ASGA0050902 and INRA0036487) ranging
from 54.71 to 54.90 Mb on chromosome 11 (Fig. S3A,
Supporting information). At the MITF locus, white-spotted
pigs were fixed for a single haplotype consisting of
eight adjacent SNPs (ASGA0057575, ASGA0057576,
INRA0040199, ASGA0057578, ALGA0070125, ALGA0070129,
ALGA0070134 and SIRI0000807), which spanned
56.31–56.56 Mb on chromosome 13 (Fig. S3B, Supporting
information). These shared haplotypes are in close
positional concordance with the selected regions identified
in the Tongcheng pigs, strongly suggesting that
EDNRB and MITF underlie the white spotting
patterns in Chinese pigs. Notably, two Ningxiang pigs
were found to be heterozygous for the shared haplotypes
at the MITF and EDNRB loci. Ningxiang pigs have various
colour patterns: most Ningxiang pigs show ‘black
clouds overhanging snows with a silver ring around the
neck’, and a few Ningxiang pigs exhibit ‘two-end black
with an additional black patch in the back’. Thus, the
unstable inheritance of the colour pattern in this breed
may explain why a few heterozygous individuals were
observed at the MITF and EDNRB loci. In the Tongcheng
pool, 488 and 35 SNPs in the MITF and EDNRB genes,
respectively, were found to be fixed for the alternative allele.
According to ENSEMBL annotation and reported transcript
information (Bourneuf et al. 2011), none of these
variants are located in gene coding regions.
[Figure 2 panels: (a) selection signatures on chromosome X, showing Hp (Tongcheng pigs), Hp (Chinese wild boars) and FST (TC vs CWB) against position (0–140 Mb), with regions R1 and R2 marked; (b) haplotype similarity (identity score) in pairwise comparison between Tongcheng and other breeds (CWB; Chinese domestic: MS, JH, NJ, WJ, YN, PZ; European domestic: LW, LR, DU).]
Fig. 2 Selection analysis of chromosome X. (a) Distribution of heterozygosity (Hp) for Tongcheng pigs (blue spots) and Chinese wild
boars (orange spots), and the fixation index (FST, grey line) between the two populations on chromosome X. All values were estimated
based on 150 kb sliding windows. One highly heterozygous region is found at 51.45–63.53 Mb (R1). A remarkable homozygosity was
observed in a region ranging from 64.7 to 101.2 Mb in Tongcheng pigs (R2). (b) Haplotype similarity in pairwise comparison between
Tongcheng pigs and other breeds. The haplotype similarity was estimated by the identity score with 75-kb windows. Meishan (MS),
Jinghua (JH), Neijiang (NJ), Wujin (WJ) and Yanan (YN) pigs are homozygous for the same haplotype as Tongcheng pigs. Large white
(LW), Landrace (LR) and Duroc (DU) pigs are fixed for the distinct haplotype. Moreover, Chinese wild boars (CWB) and Pengzhou (PZ)
pigs are not fixed in this region.
Four putative sweeps related to reproductive traits
Only one gene, oestrogen receptor 1 (ESR1), was
detected in the selected region found at 16.73–16.88 Mb
on chromosome 1 (Fig. 4a). A PvuII polymorphism in the
ESR1 gene is highly associated with litter size, and the
beneficial allele from the Meishan breed increases production
by 2.3 pigs in the first parities and by 1.5 pigs on
average over all parities (Rothschild et al. 1996). Within the
selected region of ESR1, a DAF A to G SNP (chr1:
16779229) was found to be incorrectly annotated by ENSEMBL.
Based on its mRNA reference (GenBank:
NM_214220), we re-annotated this variant as
a synonymous substitution, c.669T>C, in the third
exon. This variant C allele was first reported from cDNA
sequence scanning in a Chinese-European pig line, in
which c.669T>C showed the same segregation pattern as
the PvuII polymorphism (Munoz et al. 2007). Moreover,
this synonymous substitution was significantly associated
with the nonreturn rate in a boar fertility study
(Gunawan et al. 2011). We genotyped this SNP in individually
sequenced pigs, and the variant C allele was detected
in all Chinese pigs and in Large White pigs, but not in
other European breeds (Fig. 4b).
In addition, three putative selected regions related to
reproductive traits were identified (Fig. 4c). The first
region (Hp = 0.087, FST = 0.217) at 32.55–32.70 Mb on
chromosome 3 contains three physically linked genes:
PRM1 (sperm protamine p1), PRM2 (sperm protamine
p2) and TNP2 (transition protein 2). PRM1 and PRM2
58 Mb54 56 60500.
00.
10.
20.
30.
464 Mb
0.0
0.1
0.2
0.3
0.4
150
K bW
indo
ws
150
KbW
indo
ws
Hp/
Fst
Hp/
Fst
Chr11: Chr13:
MITFEDNRB
HpFst
5 SN
Ps w
indo
w
BamaNingxiangWuzhishanTongcheng
Laiwu
Luchuan
54 55 56
0.0
0.1
0.2
0.3
0.4
0.5
55 570.
00.
10.
20.
30.
40.
5
Selected region in EDNRB locus Selected region in MITF locus
Chr11: Chr13:
Hp
5 SN
Ps w
indo
w
Hp
(a)
(b)
Fig. 3 Two candidate genes associated
with white spotting patterns. (a) Selection
signatures observed in the EDNRB and
MITF loci of Tongcheng pigs. Selected
regions are indicated by the grey back-
ground. (b) At selected regions of the
MITF and EDNRB genes, coinciding selec-
tion signals were found in five white-
spotted breeds from 60 K SNP microarray
data. Selection signals were assessed by
calculating the heterozygosity of five adja-
cent SNPs in each breed, and solid black
Laiwu pigs were used as control.
© 2014 John Wiley & Sons Ltd
GENOME-WIDE ANALYSIS OF CHINESE PIGS 7
encode protamines, which are necessary for sperm head
condensation and DNA stabilization. It has been found
that PRM1 and PRM2 deficiency in mice leads to sperm
morphology defects, motility reduction and infertility
(Cho et al. 2001). Tnp2 is essential for maintaining normal
Prm2 processing and completing chromatin condensa-
tion (Zhao et al. 2001). The second region (Hp = 0.081,
FST = 0.242) found at 102.38–102.53 Mb on chromosome
13 contains the GPR149 (G protein-coupled receptor 149)
gene. The GPR149 gene is conserved in vertebrates and
highly expressed in ovaries. Gpr149 null mice are one of
a few models with increased fertility and enhanced
ovulation (Edson et al. 2010). The third putative region
(Hp = 0.075, FST = 0.229) was identified at 71.85–72.38 Mb
chr1:14 16 18 Mb
0.0
0.1
0.2
0.3
0.4
ESR1
TC vs CWBLWLRDRMSJHNJPZWJYN
TC vs
TC vs
32.48 32.78 102.3 102.68 71.78 72.15 Mb
*07899 RMI2 PRM1PRM2TNP2
SOCS1
ARHGEF26 GPR149
DHX36
NRBF2JMJD1C
1
0
0.0
0.1
0.2
0.3
0.4
Hp/
Fst
Hp/
Fst
chr3:30 32 34
0.0
0.1
0.2
0.3
0.4
chr13:100 102 1040.
00.
10.
20.
30.
4
chr14:69 71 73 Mb
HpFst
HpFst
Iden
tity
Scor
e
0 10 20 30
Chinese wild boarsTongcheng pigs
Meishan pigsJinhua pigs
Neijiang pigsPenzhou pigs
Yanan pigsEuropean wild boars
Large white pigsLandrace pigs
Duroc pigs
C allelecount
T allelecount
Chin
ese
Euro
pean
Allele frequency of c.669 in ESR1 in each population(a)
(c)
(d)
(b)
Fig. 4 Four selected regions related to the reproductive traits. (a) Selection signature observed at the ESR1 locus. (b) The C and T allele
frequency of the c.669 substitution in ESR1 in each population. The counts for the C and T alleles were assessed by genotypes called
from individually sequenced pigs for each population. (c) Three selected regions putatively associated with reproduction traits in Ton-
gcheng pigs. (d) Haplotype similarity in pairwise comparisons between Tongcheng pigs (TC) and other breeds. The majority of Chinese
populations, including Jinghua (JH), Meishan (MS), Yanan (YN), Neijiang (NJ), Penzhou (PZ) and Wujing (WJ) pigs, showed high hap-
lotype similarity at the three genetic loci compared with Chinese wild boars (CWB), Large white (LW), Landrace (LR) and Duroc (DU)
pigs.
© 2014 John Wiley & Sons Ltd
8 C. WANG ET AL .
on chromosome 14 and contains the JMJD1C gene (jumonji
domain-containing protein), which is important for
spermiogenesis and contributes to long-term maintenance
of the male germ line in mice (Kuroki et al. 2013).
A genome-wide association study indicated that JMJD1C
may influence serum androgen levels in men (Jin et al.
2012). A pairwise comparison between Tongcheng and
other populations demonstrated that Chinese domestic
breeds share high sequence similarity in the three
selected regions. To confirm this finding, we constructed
a phylogenetic tree using the candidate genes within
the three selected regions. As expected, the majority of
Chinese pigs clustered in the same clade (Fig. S4,
Supporting information), suggesting that selection at
these loci occurred before the creation of pig breeds.
Interestingly, this clade grouped with the North
and South Chinese wild boars for the GPR149 gene, but
clustered with the Southwest Chinese wild boars for
the JMJD1C gene. This result suggests that the favoured
alleles at these two loci originated from different wild
boar populations.
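As a reading aid for the Hp values quoted for the selected regions, the following minimal Python sketch computes a pooled heterozygosity score for one window. It assumes the definition used in earlier pooled-sequencing scans (Rubin et al. 2010), Hp = 2ΣnMAJΣnMIN/(ΣnMAJ + ΣnMIN)^2, where nMAJ and nMIN are the read counts of the more and less frequent allele at each SNP in the window. The function name and the read counts are hypothetical and are not taken from this study.

```python
def pooled_heterozygosity(snp_counts):
    """Return Hp for one window of SNPs.

    snp_counts: iterable of (reads_allele_1, reads_allele_2), one pair per SNP.
    Assumed definition: Hp = 2 * sum(n_MAJ) * sum(n_MIN) / (sum(n_MAJ) + sum(n_MIN))**2
    """
    sum_major = 0
    sum_minor = 0
    for a, b in snp_counts:
        sum_major += max(a, b)  # reads supporting the more frequent allele
        sum_minor += min(a, b)  # reads supporting the less frequent allele
    total = sum_major + sum_minor
    if total == 0:
        raise ValueError("window contains no allele observations")
    return 2.0 * sum_major * sum_minor / (total * total)


# Made-up read counts for a four-SNP window (illustration only).
example_window = [(18, 2), (20, 0), (15, 5), (19, 1)]
example_hp = pooled_heterozygosity(example_window)
```

Under this definition, a window in which one allele is fixed at every SNP scores 0 and a window with balanced alleles scores 0.5, so low Hp values such as those reported here are consistent with reduced diversity in the pool.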
Discussion
It is well known that artificial selection has greatly
shaped pig genomic variability during the process of pig
domestication and breeding. In this study, we used
whole-genome resequencing to screen for regions under
artificial selection in Tongcheng pigs. By functionally
categorising the genes within the selected regions, we
discussed the genetic model of selection in Tongcheng pigs
during domestication. We intersected the selected
regions in our analysis with 61.44 Mb of selective sweeps
of European domestic pigs (Rubin et al. 2012), and only
1.18 Mb of the regions overlapped between the two data
sets, implying that the European and Asian pigs were
subject to distinct selection patterns during their
independent domestications. This finding is also in
agreement with the genetic discrepancy found between
Tibetan and Duroc pigs (Li et al. 2013).
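The intersection of two sets of selected regions can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes each region is a (chromosome, start, end) tuple with half-open base-pair coordinates, and the function names and toy coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def overlap_bp(regions_a, regions_b):
    """Total number of base pairs covered by both region sets."""
    a = merge_intervals(regions_a)
    b = merge_intervals(regions_b)
    total = 0
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a == chrom_b:
            total += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance whichever interval finishes first in (chromosome, end) order.
        if (chrom_a, end_a) <= (chrom_b, end_b):
            i += 1
        else:
            j += 1
    return total


# Toy coordinates (illustration only, not real sweep regions).
toy_set_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
toy_set_b = [("chr1", 250, 400), ("chr2", 40, 60), ("chr3", 0, 10)]
toy_overlap = overlap_bp(toy_set_a, toy_set_b)
```

Merging each set first prevents overlapping windows within one data set from being counted twice, which matters when sweep regions are reported as adjacent or overlapping windows.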
Previous studies of the KIT allele in Chinese pigs
showed no apparent linkage between colour pattern and
Dominant white (Xu et al. 2006; Lai et al. 2007), which is a
classic colour locus responsible for the white spotting
colours in European pigs (Johansson Moller et al. 1996;
Marklund et al. 1998; Giuffra et al. 1999). In this study,
we demonstrate that two colour-related genes, MITF and
EDNRB, are under strong artificial selection in Chinese
white-spotted pigs. To date, this is the first report of the
association between MITF and white spotting variation
in Chinese pig populations. We infer that the tested
white-spotted breeds may possess different mutant
alleles for MITF and EDNRB, and that interactions between
them result in different regulatory effects that shape
diverse colour patterns. Generally, coregulation by multiple
mutant alleles provides much more phenotypic variation
than a single mutant allele, which is a reasonable explanation
for why there are so many colour patterns in Chinese
populations. In European pigs, although different white
spotting colours are all governed by the KIT gene, at least
four duplications and one splicing mutation have been
identified at this locus (Rubin et al. 2012). The haplotype
effect of these mutations creates a wide range of colour
patterns, including the belt colour in Hampshire pigs,
the patch colour in Pietrain pigs and the completely
white colour in Large White and Landrace pigs.
Examination of Tongcheng pool sequencing data for MITF and
EDNRB revealed no obvious candidate mutations in coding
sequences, suggesting that the causal variants are most probably
regulatory mutations. The relationship between
regulatory mutations in MITF and EDNRB and
depigmentation has been reported in many cases. The Mitf(mi-bw)
allele has a long interspersed element-1 insertion in
intron 3, which decreases the expression of functional
Mitf-M and makes mice completely white (Takeda et al.
2014). In dogs, regulatory mutations in the
melanocyte-specific promoter of MITF cause white spotting in boxers
and bull terriers (Karlsson et al. 2007). A 5.5-kb
retroposon-like element insertion in intron 1 markedly reduces
the expression of EDNRB and causes the Piebald white
spotting colour in mice (Yamada et al. 2006).
At the DAF c.669T>C SNP identified in the selected
gene ESR1, the C allele is strikingly higher in frequency
in most Chinese domestic breeds than in Chinese
wild boars, which may be the result of selection
favouring high litter size. In contrast, the appearance of the C allele
in the Large White breed only, and not in other European
pigs, could be caused by introgression of near Asian pigs
into Europe during the 18th–19th centuries (Giuffra et al.
2000). Male reproduction is an important criterion in
pig breeding. Chinese indigenous boars usually undergo
pubertal development at a young age and have smaller
adult testicular size and greater serum FSH concentrations
(Borg et al. 1993; Wise et al. 1996). In our study, four
candidate genes, PRM1, PRM2, TNP2 and JMJD1C, were
functionally associated with male reproduction.
Interestingly, a testicular weight QTL identified in the Meishan
× Duroc F2 population colocalized with PRM1, PRM2
and TNP2 (Sato et al. 2003), indicating that these genes
are most probably involved in shaping the particular
reproductive characteristics of Chinese boars.
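The allele-frequency comparison at ESR1 c.669 rests on counting C and T alleles from the genotypes called in each population. A minimal sketch of that counting step is given below. It assumes diploid genotypes written as two-letter strings, and the population labels and genotypes are made up for illustration and are not the study's data.

```python
from collections import Counter


def count_alleles(genotypes):
    """Count alleles across diploid genotype strings such as 'CC', 'CT' or 'TT'."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each character is one allele copy
    return counts


def allele_frequency(counts, allele):
    """Frequency of one allele among all counted allele copies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotyped alleles")
    return counts[allele] / total


# Hypothetical genotype calls for two populations (illustration only).
toy_genotypes = {
    "domestic_example": ["CC", "CC", "CT", "CC"],
    "wild_example": ["TT", "CT", "TT", "TT"],
}
toy_c_freq = {
    population: allele_frequency(count_alleles(calls), "C")
    for population, calls in toy_genotypes.items()
}
toy_c_freq_difference = toy_c_freq["domestic_example"] - toy_c_freq["wild_example"]
```

A large difference in C allele frequency between a domestic population and wild boars, as in this toy case, is the kind of contrast the text describes for most Chinese breeds.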
Acknowledgements
We thank BerryGenomics (Beijing, China) for preparing the
DNA libraries and performing sequencing, and Dr. Carl-Johan
Rubin and Alvaro Martinez Barrio from Uppsala University for
providing assistance with data analysis. We also thank Prof. Leif
Andersson from Uppsala University for providing comments on
this work. This study was supported by the Major International
Cooperation NSFC (31210103917) and the National High
Technology Research and Development Program of China
(2011AA100304, 2011AA100302).
References
Ai H, Huang L, Ren J (2013) Genetic diversity, linkage disequilibrium
and selection signatures in Chinese and Western pigs revealed by
genome-wide SNP markers. PLoS One, 8, e56001.
Bokonyi S (1974) History of Domestic Mammals in Central and Eastern
Europe. Akademiai Kiado, Budapest.
Borg KE, Lunstra DD, Christenson RK (1993) Semen characteristics,
testicular size, and reproductive hormone concentrations in mature
Duroc, Meishan, Fengjing, and Minzhu boars. Biology of Reproduction,
49, 515–521.
Bourneuf E, Du ZQ, Estelle J et al. (2011) Genetic and functional
evaluation of MITF as a candidate gene for cutaneous melanoma
predisposition in pigs. Mammalian Genome, 22, 602–612.
Camacho C, Coulouris G, Avagyan V et al. (2009) BLAST+: architecture
and applications. BMC Bioinformatics, 10, 421.
Ceccherini I, Zhang AL, Matera I et al. (1995) Interstitial deletion of the
endothelin-B receptor gene in the spotting lethal (sl) rat. Human
Molecular Genetics, 4, 2089–2096.
Cho C, Willis WD, Goulding EH et al. (2001) Haploinsufficiency of
protamine-1 or -2 causes infertility in mice. Nature Genetics, 28, 82–86.
Darwin C (1868) The Variation of Animals and Plants Under Domestication.
D. Appleton and Company, New York.
Edson MA, Lin YN, Matzuk MM (2010) Deletion of the novel
oocyte-enriched gene, Gpr149, leads to increased fertility in mice.
Endocrinology, 151, 358–368.
Fang M, Larson G, Ribeiro HS, Li N, Andersson L (2009) Contrasting
mode of evolution at a coat color locus in wild and domestic pigs. PLoS
Genetics, 5, e1000341.
Giuffra E, Evans G, Tornsten A et al. (1999) The Belt mutation in pigs is
an allele at the Dominant white (I/KIT) locus. Mammalian Genome, 10,
1132–1136.
Giuffra E, Kijas JM, Amarger V et al. (2000) The origin of the domestic
pig: independent domestication and subsequent introgression.
Genetics, 154, 1785–1791.
Groenen MA, Archibald AL, Uenishi H et al. (2012) Analyses of pig
genomes provide insight into porcine demography and evolution.
Nature, 491, 393–398.
Gunawan A, Kaewmala K, Uddin MJ et al. (2011) Association study and
expression analysis of porcine ESR1 as a candidate gene for boar
fertility and sperm quality. Animal Reproduction Science, 128, 11–21.
Hauswirth R, Haase B, Blatter M et al. (2012) Mutations in MITF and
PAX3 cause “splashed white” and other white spotting phenotypes in
horses. PLoS Genetics, 8, e1002653.
Hodgkinson CA, Moore KJ, Nakayama A et al. (1993) Mutations at the
mouse microphthalmia locus are associated with defects in a gene
encoding a novel basic-helix-loop-helix-zipper protein. Cell, 74,
395–404.
Hou L, Pavan WJ (2008) Transcriptional and signaling regulation in
neural crest stem cell-derived melanocyte development: do all roads lead
to Mitf? Cell Research, 18, 1163–1176.
da Huang W, Sherman BT, Lempicki RA (2009a) Bioinformatics
enrichment tools: paths toward the comprehensive functional analysis of
large gene lists. Nucleic Acids Research, 37, 1–13.
da Huang W, Sherman BT, Lempicki RA (2009b) Systematic and
integrative analysis of large gene lists using DAVID bioinformatics resources.
Nature Protocols, 4, 44–57.
Jin G, Sun J, Kim ST et al. (2012) Genome-wide association study
identifies a new locus JMJD1C at 10q21 that may influence serum androgen
levels in men. Human Molecular Genetics, 21, 5222–5228.
Johansson Moller M, Chaudhary R, Hellmen E et al. (1996) Pigs with the
dominant white coat color phenotype carry a duplication of the KIT
gene encoding the mast/stem cell growth factor receptor. Mammalian
Genome, 7, 822–830.
Karlsson EK, Baranowska I, Wade CM et al. (2007) Efficient mapping of
mendelian traits in dogs through genome-wide association. Nature
Genetics, 39, 1321–1328.
Kijas JM, Andersson L (2001) A phylogenetic study of the origin of the
domestic pig estimated from the near-complete mtDNA genome.
Journal of Molecular Evolution, 52, 302–308.
Kuroki S, Akiyoshi M, Tokura M et al. (2013) JMJD1C, a JmjC
domain-containing protein, is required for long-term maintenance of male
germ cells in mice. Biology of Reproduction, 89, 93.
Lai FJ, Ren J, Ai HS et al. (2007) Chinese white Rongchang pig does not
have the dominant white allele of KIT but has the dominant black
allele of MC1R. Journal of Heredity, 98, 84–87.
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie
2. Nature Methods, 9, 357–359.
Larson G, Dobney K, Albarella U et al. (2005) Worldwide
phylogeography of wild boar reveals multiple centers of pig domestication. Science,
307, 1618–1621.
Larson G, Albarella U, Dobney K et al. (2007) Ancient DNA, pig
domestication, and the spread of the Neolithic into Europe. Proceedings of the
National Academy of Sciences, USA, 104, 15276–15281.
Larson G, Liu R, Zhao X et al. (2010) Patterns of East Asian pig
domestication, migration, and turnover revealed by modern and ancient DNA.
Proceedings of the National Academy of Sciences, USA, 107, 7686–7691.
Li H, Handsaker B, Wysoker A et al. (2009) The sequence alignment/map
format and SAMtools. Bioinformatics, 25, 2078–2079.
Li M, Tian S, Jin L et al. (2013) Genomic analyses identify distinct patterns
of selection in domesticated pigs and Tibetan wild boars. Nature
Genetics, 45, 1431–1438.
Marklund S, Kijas J, Rodriguez-Martinez H et al. (1998) Molecular basis
for the dominant white phenotype in the domestic pig. Genome
Research, 8, 826–833.
McKenna A, Hanna M, Banks E et al. (2010) The Genome Analysis
Toolkit: a MapReduce framework for analyzing next-generation DNA
sequencing data. Genome Research, 20, 1297–1303.
Munoz G, Ovilo C, Estelle J et al. (2007) Association with litter size of
new polymorphisms on ESR1 and ESR2 genes in a Chinese-European
pig line. Genetics, Selection, Evolution, 39, 195–206.
Philipp U, Lupp B, Momke S et al. (2011) A MITF mutation associated
with a dominant white phenotype and bilateral deafness in German
Fleckvieh cattle. PLoS One, 6, e28857.
Rothschild M, Jacobson C, Vaske D et al. (1996) The estrogen receptor
locus is associated with a major gene influencing litter size in pigs.
Proceedings of the National Academy of Sciences, USA, 93, 201–205.
Rubin CJ, Zody MC, Eriksson J et al. (2010) Whole-genome resequencing
reveals loci under selection during chicken domestication. Nature, 464,
587–591.
Rubin CJ, Megens HJ, Martinez Barrio A et al. (2012) Strong signatures of
selection in the domestic pig genome. Proceedings of the National
Academy of Sciences, USA, 109, 19529–19536.
Sato S, Oyamada Y, Atsuji K et al. (2003) Quantitative trait loci analysis
for growth and carcass traits in a Meishan x Duroc F2 resource
population. Journal of Animal Science, 81, 2938–2949.
Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based
phylogenetic analyses with thousands of taxa and mixed models.
Bioinformatics, 22, 2688–2690.
Steingrimsson E, Moore KJ, Lamoreux ML et al. (1994) Molecular basis of
mouse microphthalmia (mi) mutations helps explain their
developmental and phenotypic consequences. Nature Genetics, 8, 256–263.
Takeda K, Hozumi H, Nakai K et al. (2014) Insertion of long interspersed
element-1 in the Mitf gene is associated with altered neurobehavior of
the black-eyed white Mitf(mi-bw) mouse. Genes to Cells, 19, 126–140.
Van Laere AS, Nguyen M, Braunschweig M et al. (2003) A regulatory
mutation in IGF2 causes a major QTL effect on muscle growth in the
pig. Nature, 425, 832–836.
Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of
genetic variants from high-throughput sequencing data. Nucleic Acids
Research, 38, e164.
Wang L, Wang A, Wang L et al. (2011) Animal Genetic Resources in China:
Pigs. Chinese Agricultural Press, Beijing, China.
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of
population structure. Evolution, 38, 1358–1370.
Wise T, Lunstra DD, Ford JJ (1996) Differential pituitary and gonadal
function of Chinese Meishan and European white composite boars:
effects of gonadotropin-releasing hormone stimulation, castration, and
steroidal feedback. Biology of Reproduction, 54, 146–153.
Xu GL, Ren J, Ding NS et al. (2006) Genetic analysis of the KIT and MC1R
genes in Chinese indigenous pigs with belt-like coat color phenotypes.
Animal Genetics, 37, 518–519.
Yamada T, Ohtani S, Sakurai T et al. (2006) Reduced expression of the
endothelin receptor type B gene in piebald mice caused by insertion of
a retroposon-like element in intron 1. The Journal of Biological Chemistry,
281, 10799–10807.
Yang S, Li X, Li K, Fan B, Tang Z (2014) A genome-wide scan for
signatures of selection in Chinese indigenous and commercial pig breeds.
BMC Genetics, 15, 7.
Zhao M, Shirley CR, Yu YE et al. (2001) Targeted disruption of the
transition protein 2 gene affects sperm chromatin structure and reduces
fertility in mice. Molecular and Cellular Biology, 21, 7243–7255.
B.L., C.W., T.Z. and K.L. designed the study, and C.W.
collected the ear samples and prepared the DNA for
sequencing. C.W., H.W. and Y.Z. performed the bioinformatics
analysis, and B.L. and C.W. wrote the article.
Data Accessibility
All of the Tongcheng pig sequencing data were submitted
to the SRA database in NCBI under Accession nos
SRX473146–SRX473149 and SRX510749. SNP data used
for selective sweep analysis and haplotype similarity
analysis have been deposited in Dryad under
doi: 10.5061/dryad.8c930.
Supporting Information
Additional Supporting Information may be found in the online
version of this article:
Fig. S1 Distribution of SNP counts with 50-, 100-, 150- and 200-kb
window sizes.
Fig. S2 Images of five Chinese indigenous breeds with different
colour patterns.
Fig. S3 Genotypes of different pigs in the EDNRB and MITF
regions.
Fig. S4 Phylogenetic tree of genes in three selected regions.
Table S1 Information of downloaded individual sequencing
data.
Table S2 Information of breeds used for 60 K SNPs microarray
genotyping.
Table S3 Information of four regions used for haplotype
comparison.
Table S4 SNPs with different allele frequency between Tongcheng
and Chinese wild boars.
Table S5 Putative selected regions in Tongcheng pigs.
Table S6 Gene ontology analysis of the candidate genes.