UNIVERSIDADE DE LISBOA
FACULDADE DE CIÊNCIAS
DEPARTAMENTO DE BIOLOGIA ANIMAL
Differentiation and genetic variability in cork oak populations
(Quercus suber L.)
Joana Seabra Pulido Neves da Costa
MESTRADO EM BIOLOGIA HUMANA E AMBIENTE
Lisboa
2011
Dissertation supervised by:
Prof. Doutor Octávio Fernando de Sousa Salgueiro Godinho Paulo
Doutora Dora Cristina Vicente Batista Lyon de Castro
Preliminary note
This master’s thesis is written in English, since English is considered the universal language of
science. Knowledge of and practice in writing it are therefore of considerable importance for
anyone intending to pursue a career in scientific research in Biology. Writing the thesis in
English is also intended to speed up the preparation of manuscripts and subsequent scientific
publications.
The bibliographic references follow the guidelines of the international scientific journal
“Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is one
of the most relevant journals in the field in which this thesis was developed, and it has a
citation system that is convenient for reading scientific review texts. Given also its high
impact factor in the scientific community, this journal seemed an appropriate reference for
presenting the bibliography.
The study presented in this thesis was carried out within the project
PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos genéticos e genómicos do sobreiro: bases para
uma gestão prospectiva”, funded by the Fundação para a Ciência e Tecnologia (FCT).
Foreword
This master’s thesis is written in English, which is regarded as the universal language of
science. Practice in English writing and grammar is therefore of the utmost importance for those
who intend to pursue a career in biology and scientific research. Writing the thesis in English
also speeds up the preparation of manuscripts for subsequent publication.
The bibliographic references follow the format of the international scientific journal
“Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is one
of the most relevant journals in the field of this thesis, with a high impact factor in the
scientific community, and its citation system makes long texts comfortable to read.
This study is part of the project PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos
genéticos e genómicos do sobreiro: bases para uma gestão prospectiva”, funded by Fundação
para a Ciência e Tecnologia (FCT).
Acknowledgements
As this thesis comes to an end, I feel the need to thank all those who, in one way or another,
made it possible.
My first thanks go to my supervisors, Octávio Paulo and Dora Batista. To Professor Octávio, for
the encouragement and the vote of confidence he placed in me from the very beginning. To Dora,
for proposing the topic of this master’s thesis and for awakening my interest in plants.
To Professor Deodália Dias, for the opportunities she gave me and for her unconditional support
when problems are beyond our control and do not depend on us.
I particularly thank Professor Helena Almeida of the Instituto Superior de Agronomia for access
to Herdade Monta da Fava, the source of some of the cork oak populations most important to this
work. And to everyone who, directly or indirectly, was important in collecting the samples.
To CoBiG2, for the group that came together and for the good times. To Francisco Pina-Martins
and Vera Nunes, for raising me in the early days of my laboratory life. To Sofia Seabra, for her
natural calm and friendship. To Eduardo Marabuto, for his good humour, much needed in difficult
times. To Sara Ema: incredible as it may seem to you, I think you stress more than I do, and
that is an enormous help, as is your staying with me until the day of submission… even on
crutches. To Diogo Silva, for sharing some difficult moments with supervisors and the like. To
Catarina Dourado, Ana Sofia, Patrícia Brás, Renata Martins, Inês Modesto and Bruno Vieira, who
kept helping me through the usual difficulties of a thesis. To the remaining members of CoBiG2,
as well as to former members and the most recent additions, thank you very much!
To Rita Oliveira and Raquel Vaz: you are not part of CoBiG2, but you are part of the family and
deserve due recognition and thanks for what you “put up with” from me.
Special thanks to four people who must have suffered a great deal with me. It would take pages
of acknowledgements, but since I cannot do that, the intention remains. To Catarina Dourado, a
particular thank you. It was a long road, and I am grateful for the affection, support and
friendship. To Eduardo Marabuto, for the valuable comments, above all on the Introduction. You
are right about many things, but compromises must be made. To Sofia Seabra, for the all-important
corrections; it would have been much harder without you. To Bruno Vieira, for the endless hours
he listened to me complain about life in general and the thesis in particular. Always with much
love and affection! Thank you to all four!
To Diana Martins. You are not always with me, but you are always thinking of me, and your timing
is impeccable for when I need you most.
And to Dad, Mum and my grandparents. Although they come at the end of this list, they were
probably the people who contributed most to the completion of this thesis. I clearly would not
be here without the precious support of my family.
Summary
The year 2011 was designated “The International Year of Forests” by the United Nations General
Assembly, in an attempt to raise public interest and promote sustainable forest management and
conservation for the benefit of future generations. Estimates by the FAO (Food and Agriculture
Organization) for 2010 showed that 31% of the Earth’s land surface is still covered by forests
and that trees account for 90% of terrestrial biomass, comprising a total of 60,000 to 100,000
taxa. However, certain human-induced changes, mainly deforestation and climate change, have
raised the number of species threatened with extinction to 10%.
In recent times, forest species have been widely used in population and evolutionary genetics
studies, as well as in genomic studies. The main reasons are the particular characteristics of
these non-classical models: they result from millions of years of divergence and
diversification, and therefore show impressive levels of morphological diversity, evolutionary
divergence and ecological diversity. Although the impact of global changes on these species will
depend largely on their capacity to react and on that of their ecosystems, genetic studies allow
us, to some extent, to predict the evolutionary consequences of those changes, since they
increase our knowledge of the biodiversity and evolution of these species.
The concept of “Phylogeography” was introduced by Avise et al. in 1987, and over the last 25
years it has had a major impact on research, particularly in animals. In plants the results have
not been as clear-cut, mainly owing to the lack of genetic variability applicable to
phylogeographic analysis. It has been considerably difficult to find a genetic marker in plants
with a resolving power similar to that of animal mitochondrial DNA. Nevertheless, plant
phylogeography has developed considerably, especially in recent years, with the growing use of
nuclear molecular markers and the collection of information from larger fragments of the
chloroplast genome.
The genus Quercus (oaks) (Fagaceae) is one of the most important groups of woody angiosperms in
the northern hemisphere, notably in terms of species diversity, ecological dominance and
economic value. The genus is quite old, considering that the oldest fossil found belongs to the
Oligocene (34-23 million years). Oaks are the dominant members of a wide variety of habitats,
and 500-600 species are thought to exist on Earth.
Cork oak (Quercus suber L.) is one of the most important tree species of the Western
Mediterranean region, both economically and ecologically, where it defines open woodlands
(created and maintained by man) known in Portugal as “montados”. The distribution range of cork
oak, although discontinuous, extends from the Atlantic coast of North Africa and the Iberian
Peninsula to the south-western regions of Italy, including the Mediterranean islands of Sicily
and Sardinia, as well as the Mediterranean coastal areas of Algeria and Tunisia. Cork oak
forests cover a total area of about 2.2 million hectares, from which 340,000 tonnes of cork are
extracted per year. The largest extent is found in Portugal, with about 700,000 hectares,
corresponding to 21% of the Portuguese forest area and 30% of the world’s cork-producing area.
Cork oak has been used since Antiquity for cork production, and this natural product has great
economic value. The greatest threats, however, are faced by natural and marginal populations,
which are often small, scattered and confined to restricted habitats. Many of these populations
may be at risk of disappearing, mainly owing to a lack of regeneration.
Because of its economic value, and also because cork oak woodlands are reservoirs of
biodiversity and shelter a wide variety of species threatened with extinction, these populations
represent important material for genetic studies that can serve as a basis for designing
conservation programmes. It is therefore necessary to strengthen and expand knowledge of the
spatial organization of the species’ genetic variation, so that conscious and informed decisions
can be made on the conservation of genetic resources.
Phylogeographic studies of Quercus suber have lacked depth, and some have even been
inconclusive. As a result, the evolutionary history of the species is not well understood, most
probably because of the limited number of areas sampled or the low information content of the
markers used. For example, in studies involving Portuguese populations, inferences were based on
a deficient sampling of Portugal; since this is one of the most relevant regions in the present
and past history of cork oak, broader coverage of the distribution range is needed, including
some areas reported as potential glacial refugia for other species. On the other hand, since
most phylogeographic studies are supported by data derived from chloroplast DNA (cpDNA)
(PCR-RFLPs and SSRs), it should be considered whether other molecular approaches or genetic
markers, evolving at faster rates than cpDNA, would indicate a different evolutionary scenario.
This master’s thesis proposes an approach different from previous studies, complementing data
obtained from chloroplast and nuclear DNA. This approach has never been applied to cork oak, and
it is expected to add relevant phylogeographic information. More specifically, the objectives
of this work were: 1. To infer the evolutionary history and demographic patterns of Quercus
suber; 2. To explore the patterns of hybridization and introgression of cork oak with other
Quercus species; 3. To assess the levels of diversity and differentiation among and within some
key cork oak populations.
Sequencing several fragments made it possible to infer some details of the evolutionary history
of the species. The traditional cpDNA was selected for sequencing three intergenic regions
(TrnL-F, TrnS-PsbC and TrnH-PsbA), in a total of 148 samples from 26 populations. However,
because phylogeographic inferences based on a single type of non-recombining marker can give
misleading information about the evolutionary history of the species, the nuclear genome (nuDNA)
was also explored by sequencing a candidate gene potentially involved in osmotic stress
(EST 2T13), in 104 samples from the same 26 populations. For both data sets, two lineages were
detected in cork oak. One lineage, the “pure lineage”, appears to be almost exclusive to cork
oak and is divided into three sub-lineages possibly resulting from three refuge areas, one
predominant in the Western Mediterranean and the other two in the Eastern Mediterranean. The
other lineage appears associated with Quercus ilex (holm oak) and Quercus coccifera (kermes oak)
and was named the “introgressed lineage”. This lineage appears to result from several events of
hybridization and introgression with Quercus ilex. The combined analysis of cpDNA and nuDNA
sequences suggests that this introgression occurred in both directions between the two species,
and that these events were frequent and consecutive over a period of time.
Finally, nuclear microsatellites, both derived from ESTs (Expressed Sequence Tags) (EST-SSRs)
and anonymous (nuSSRs), provided a perspective on the patterns of genetic diversity and
population structure of cork oak. In a first stage, it was possible to establish EST-SSRs as
valid markers in cork oak, contradicting the idea that EST-SSRs tend to show little
polymorphism. Subsequently, a combined analysis of these two marker types (5 EST-SSRs and 3
nuSSRs) in 379 individuals from 13 populations detected relatively low but highly significant
genetic diversity. Although no population structure was detected among the Portuguese
populations, which appear together in a single population group, there is a tendency to regard
Catalonia (Spain) as one of the most differentiated populations.
Overall, the objectives of the work were met, clarifying some aspects of the phylogeography and
evolutionary history of cork oak. The introduction of the new molecular markers was clearly
informative, revealing new and unexpected aspects of the genetic patterns of the species and
thus generating entirely new explanatory hypotheses for cork oak.
Key words
Quercus suber, geographic structure, microsatellites, ESTs, introgression
Abstract
Cork oak (Quercus suber L.) is one of the most important tree species, both economically and
ecologically, in the Western Mediterranean region. Consequently, there is great interest in
understanding its evolutionary history and current population structure. Although some details
of the genetic divergence of cork oak populations have been uncovered, a complementary analysis
of chloroplastidial and nuclear DNA markers (cpDNA and nuDNA) is likely to provide additional
phylogeographically relevant information. The molecular approach proposed in the present study
has not previously been attempted for cork oak: it combines cpDNA and nuDNA sequence variation
with polymorphism data from anonymous nuclear microsatellites (nuSSRs) and EST-derived
(Expressed Sequence Tag) microsatellites (EST-SSRs) to infer phylogeographical patterns and
history, possible glacial refuges, diversity levels and geographic structure.
A genetic survey was conducted sampling populations throughout the entire distribution
range of the species. Genetic diversity was monitored at 8 nuclear microsatellite loci (3 EST-
SSRs and 5 nuSSRs) in 379 individuals derived from 13 populations, and at 4 DNA
sequences (3 cpDNA intergenic spacer regions and 1 osmotic-stress related candidate gene)
in 148 samples from 26 populations.
DNA sequences from both cpDNA and nuDNA confirmed two main lineages of cork oak haplotypes: the
first, named the pure lineage, is mostly exclusive to cork oak but also shared with Q. cerris,
and the second, named the introgressed lineage, is shared with Q. ilex and Q. coccifera. The
cpDNA sequences, however, reveal the complexity of the introgressed lineage, apparently
indicating that hybridization and introgression events happened frequently and consecutively
over a period of time. The theory of cork oak refugia during the last glaciations was also
revisited using the pure lineage of the cpDNA haplotypes; three major haplotypes were detected,
reflecting three possible refuge areas. Finally, in the microsatellite data, population
differentiation was low but significant. The geographic subdivisions that could be defined
grouped the Portuguese populations in one cluster and identified the Catalonia (Spain)
population as possibly the most differentiated.
Key words
Quercus suber, geographical structure, microsatellites, ESTs, introgressive hybridization
List of abbreviations
AFLP – Amplified Fragment Length Polymorphisms
BA – Bayesian analysis
BLAST – Basic Local Alignment Search Tool
bp – base pairs
BP – Before Present
CBOL – The Consortium for the Barcode of Life
COI or Cox1 – cytochrome c oxidase I
cpDNA – Chloroplastidial DNA
ESTs – Expressed Sequence Tags
EST-SSRs – EST-derived SSRs
FAO – Food and Agriculture Organization (of the United Nations)
ITS – Internal Transcribed Spacer
kb – Kilobases
MCMC – Markov Chain Monte Carlo
MP – Maximum Parsimony
mtDNA – Mitochondrial DNA
nuDNA – Nuclear DNA
nuSSRs – nuclear SSRs
PCR – Polymerase Chain Reaction
RAPDs – Random Amplified Polymorphic DNA
rDNA – Ribosomal DNA
RFLP – Restriction Fragment Length Polymorphism
sncDNA – Single-copy nuclear DNA
SNPs – Single Nucleotide Polymorphisms
SSR – Simple sequence repeats; microsatellites
Table of Contents
1. Introduction
Thesis Main Goals
1.1 An emblematic tree: Quercus suber L.
1.1.1 General aspects on cork oak and the “montado”
1.1.2 Taxonomic classification and phylogenetic studies
1.1.2.1 Barcoding in oak phylogenetics
1.1.3 Geographical distribution
1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization
1.1.5 Genetic diversity studies
1.1.6 Hybridization and cytoplasmatic introgression
1.2 Molecular markers in phylogeography
1.2.1 Mitochondrial DNA (mtDNA)
1.2.2 Chloroplastidial DNA (cpDNA)
1.2.3 Nuclear DNA (nuDNA)
1.2.4 Simple Sequence Repeats (SSRs)
1.2.5 Expressed Sequence Tags (ESTs)
2. Materials and Methods
2.1 Sampling and DNA extraction
2.2 DNA sequencing
2.3 Microsatellite genotyping
2.4 Phylogenetic and phylogeographic analysis
2.5 Selective neutrality tests and demographic history
2.6 Genetic diversity and population differentiation
2.7 Genetic structure of populations
3. Results
3.1 Sequencing of chloroplast and nuclear DNA fragments
3.1.1 cpDNA and nuDNA diversity levels
3.1.2 Differentiation patterns
3.1.3 Mismatch distribution and neutrality tests
3.2 Microsatellite analysis
3.2.1 Genetic diversity values
3.2.2 Genetic differentiation among populations
3.2.3 Population structure
4. Discussion
4.1 Differentiation and demographic patterns
4.2 Hybridization and introgression
4.3 Genetic diversity and population structure
5. Final Remarks
6. Bibliographic References
Supporting Information
Supporting Information 1
Supporting Information 2
Supporting Information 3
Supporting Information 4
Supporting Information 5
Supporting Information 6
Supporting Information 7
Supporting Information 8
Supporting Information 9
1. Introduction
The year 2011 has been designated ‘The International Year of Forests’ by the United Nations
General Assembly, in an attempt to raise awareness and to strengthen the sustainable management
and conservation of all types of forests for the benefit of current and future generations.
Estimates by the Food and Agriculture Organization of the United Nations (FAO) for 2010 showed
that 31% of the Earth’s terrestrial surface is still covered by forests, and that trees account
for 90% of the Earth’s biomass [1]. Some estimates of global tree species richness put the
number at 60,000 to 100,000 taxa, and forests harbour the majority of the world’s terrestrial
biodiversity [2]. However, ongoing deforestation and other human-induced global changes (such as
changes in climate and land use) have brought the proportion of the world’s tree species
threatened with extinction close to 10% [3,4]. The overall rate of deforestation remains
alarmingly high (estimated at 9.4 million hectares per year in the late 1990s), although it is
slowing down [5].
In recent years forest trees have gained much attention as non-classical models for several
types of studies. They are particularly interesting for population and evolutionary genetics and
for genomics, since forest trees result from millions of years of lineage divergence and
diversification and show remarkable levels of diversity in morphology, adaptation, and ecology
[6,7]. Although the impact of global changes on forest trees will ultimately depend to a great
extent on the reaction of these trees and their ecosystems, genetic studies make it possible to
predict the evolutionary consequences of future global changes by increasing our knowledge of
tree biodiversity and evolution [8]. Phylogeographic studies are an important step towards
understanding these processes.
Avise et al. introduced the concept of “phylogeography” in 1987 [9], and over the past 25 years
or so it has had a major impact on research, particularly in animal species. In plants, however,
the results have been less clear-cut. One of the major problems has been a lack of genetic
variation suitable for phylogeographic analysis: it is quite difficult to find genetic markers
in plants with a resolving power comparable to that of animal mitochondrial DNA (mtDNA) [7,10].
Nonetheless, plant phylogeography has come a long way over the last few years with the
availability of nuclear markers and the collection of data from larger sections of the
chloroplastidial genome [11].
The genus Quercus (oaks) of the Fagaceae family is one of the most important groups of woody
angiosperms in the northern hemisphere in terms of species diversity, ecological dominance, and
economic value. The genus is old: the oldest unequivocal oak fossils date to the Oligocene,
which ranges from 34 to 23 million years before present. Oaks are dominant members of a wide
variety of habitats, and some 500-600 species exist worldwide [12,13].
Quercus suber L. (commonly known as cork oak) is among the most important tree species, both
economically and ecologically, in the Western Mediterranean region, to which it is endemic. It
defines unique open woods (created and maintained by man) known in Portugal as “montados” and in
Spain as “dehesas”. Quercus suber has been used mostly to produce cork, a natural product of
great economic value. The biggest threats, however, are faced by the marginal natural
populations, which often grow in small, scattered stands and in restricted habitats that are at
risk of disappearing, mainly due to a lack of regeneration [14].
Because of the species’ economic value, and because cork oak woodlands are renowned reservoirs
of biodiversity, home to a variety of threatened and endangered species, and crucial for
preventing soil erosion, Q. suber populations represent valuable material for genetic studies as
well as for gene conservation programs. To that end, greater knowledge of the spatial
organization of genetic variation within the species is needed to support decisions on tree
breeding and the conservation of genetic resources.
Thesis Main Goals
Although previous studies have addressed population genetics in Quercus suber, there is still a
gap in the current understanding of the evolutionary history of the species, mostly due to the
limited number of geographical areas sampled or to the low information content of the markers
used. In particular, Portuguese populations have been poorly represented in those studies. Since
Portugal is one of the most relevant regions in the recent and past history of cork oak, a more
complete range of cork oak distribution and differentiation should be covered, including some
areas reported as potential glacial refuges for other species. Moreover, since most previous
phylogeographical inferences are supported by data from chloroplastidial markers [Restriction
Fragment Length Polymorphism (RFLP) and microsatellites (SSRs)], the question arises whether
other molecular approaches, or genetic markers evolving at faster rates than chloroplastidial
DNA (cpDNA), would provide a different microevolutionary scenario. Therefore, the main
objectives of this work were to:
1. Infer the evolutionary history and demographic patterns of Quercus suber;
2. Assess the hybridization and introgression patterns between cork oak and other Quercus species;
3. Evaluate the diversity and differentiation levels among and within some key cork
oak populations.
To achieve these goals, populations from the entire Mediterranean distribution of the species
were analysed using different approaches and several molecular markers. A multi-locus sequencing
approach was applied to infer the evolutionary history of the species, its relationships with
other Quercus species, and the patterns of introgression between them. The traditional cpDNA was
selected for the sequencing of several fragments. Additionally, because phylogeographic
inferences based on a single non-recombining marker can be misleading, the nuclear genome was
also explored by sequencing one candidate gene. Finally, expressed sequence tag (EST)-derived
SSRs (EST-SSRs) and anonymous nuclear SSRs (nuSSRs) were used to provide a perspective on
patterns of genetic diversity and population structure.
Figure 1.1: Quercus suber L. – Cork oak’s natural population in Serra da Estrela, Portugal
1.1 An emblematic tree: Quercus suber L.
1.1.1 General aspects on cork oak and the “montado”
Cork oak (Quercus suber Linné, 1753) is an emblematic Mediterranean evergreen sclerophyllous
tree. It is a slow-growing, extremely long-lived tree, reaching about 20 meters in height, with
massive branches forming a round crown (Fig. 1.1). It is a diploid (2n=24), monoecious (both
male and female reproductive organs in one individual) species with a protandrous system
(anthers mature before carpels) that ensures cross-pollination. Plant propagation in natural
populations occurs by seed (acorn) dispersal and subsequent germination (sexual reproduction),
which is called natural regeneration. Cork oak’s natural regeneration, like that of most oak
species, is mostly assured by wind and animals [15-17].
Cork oak and holm oak (Quercus ilex L., 1753) are the two main evergreen oak species in
the western part of the Mediterranean Basin [17]. These two species, particularly in the
Iberian Peninsula, are mostly present as semi-natural stands known as “montados”, which are
open woods with a delicate and particular ecosystem, created and maintained by man.
The montado semi-natural landscape is valued because it represents a viable land use that
still preserves a rich biodiversity at all levels, from insects and flora to top predators such as
the Iberian Imperial Eagle (Aquila adalberti) or the Iberian Lynx (Lynx pardinus), the world's
most endangered cat, and their shared prey species, the rabbit (Oryctolagus cuniculus).
Montados also represent an important economic resource, but with the exception of central
Spain, holm oak forests can be regarded as rare cases of woodlands that have undergone very
little or no silvicultural management. Cork oak management is, however, at a different level,
since its high economic importance is associated not only with harvesting of acorns, but
also of cork. The thick and soft bark of cork oak is used to produce the familiar cork, which is
the main product responsible for the important economic role of this partly domesticated
species. Trees are first stripped of cork from the lower portion of the trunk at about 14 years
of age and subsequently every 9-12 years, and can live through this process for 100 to 500
years without any apparent effect on tree physiology. Acorns are eaten by birds and they are
highly valued as fattening fodder for domestic Iberian pigs. Since ancient times, cork oak has
been favoured, and sometimes widely spread, by preferentially using acorns from trees
producing good quality cork [15,18-20]. Therefore, Q. suber is widely cultivated within its
natural range, but according to Carrión et al. [21], without human activities, cork oak would
never develop pure stands in the Iberian Peninsula, and would form mixed forests with other
sclerophyllous and deciduous oaks, and with Pinus pinaster.
Cork oak forests cover 2.2 million hectares worldwide, from which 340,000 tons/year of cork
are extracted (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). The largest stands,
covering about 700,000 ha, are located in Portugal and correspond to 21% of the forest area in
Portugal and to 30% of the world's cork-producing area. Currently, the cork industry represents
3% of all Portuguese exports. Cork stoppers for wine bottles are the most representative
product of this industry, responsible for 70% of the exports (see:
http://www.amorim.com/cor_glob_cortica.php). In spite of the economic importance of this
renewable material, there is still much to discover about both the biological and the genetic
mechanisms involved in its formation. Human intervention through extensive plantations and
systematic clear-cutting in forests with the objective of empirically selecting varieties with
higher quality levels of cork is supposed to have strongly contributed to the genetic
homogenization of Q. suber populations in the Iberian Peninsula [22].
Cork oak plantations are very important for the economy and play an important social and
environmental role that has to be taken into consideration as the unparalleled decline
occurring in the Iberian Peninsula and in Morocco is threatening the entire ecosystem [22].
Although the marginal and natural populations of cork oak are possibly the most endangered,
Iberian cork oak montados are also currently threatened and in decline due to multiple
factors. The main factor contributing to this decline is the occurrence of very severe drought
periods over several consecutive years [18]. The lack of natural regeneration (mainly due to
overgrazing and insolation, particularly in North Africa) is one of the most important factors
and so stand sustainability cannot rely exclusively on the decreasing resprouting ability of
aged and decaying adult trees [23,24]. In Portugal and Spain another contributing factor to
this decline is the occurrence of ink disease, a root disease caused by the soil-borne pathogen
Phytophthora cinnamomi. Moreover, the increasing use of synthetic stoppers in wine bottles
replacing the traditional cork is an additional factor that, in conjunction with those stated
above, threatens this ecosystem in the medium term [18,22].
Holm oak montados are also endangered for some of these and a number of other reasons.
Thus, the admired sustainability of montados is jeopardized, and these formations may
become 'fossil forests' [23]. In this sense, the outlook is more favourable for Q. ilex, which is
a more euryecious species than Q. suber (whose presence is limited by cold, drought and soil
type). In recent years, there has been increasing recognition of the important contribution
made by these species to the preservation of seminatural habitats and landscapes in Europe
[18,23,24]. Several studies on the regeneration of Mediterranean forest have been published,
and some of them are centred on Q. ilex and Q. suber [24,25]. However, these works have
focused mainly on ecological aspects of regeneration, silviculture and land use [23], without
addressing the genetic bases of montado regeneration or the populations' diversity with
consequences on adaptation, which is now an immediate priority to allow informed decisions
for conservation of genetic resources.
1.1.2 Taxonomic classification and phylogenetic studies
Several proposals for Quercus taxonomy based on morphology have been presented [12,26];
however, these classifications have always been surrounded by controversy, mainly due to a
generalised intraspecific morphological variation, especially abundant in oaks, that may be
produced by hybridization and adaptation to ecological changes in the environment [27]. As a
result, classifications have been anything but straightforward and, especially at the subgenus
level, remain uncertain. The taxonomic scheme proposed by Schwarz in 1964 [26] is possibly the
most accepted for the classification of cork oak, and appears to be the most suitable in
describing the systematics of European oaks [19,28,29]. According to the Flora Europaea
[17], and that same taxonomic scheme, the genus Quercus is divided into four subgenera (or
subsections), as follows:
Order Fagales
    Family Fagaceae
        Genus Quercus
            Subgenus Cerris
            Subgenus Quercus
            Subgenus Sclerophyllodrys
            Subgenus Erythrobalanus
Quercus suber belongs to the family Fagaceae, genus Quercus and subgenus (or subsection)
Cerris (Spach) Oersted.
Quercus comprises 500-600 species, of which 350–500 species are distributed throughout the
Northern Hemisphere [12,13,30]. They are conspicuous members of the temperate deciduous
forests of North America, Europe, Asia, as well as the evergreen Mediterranean maquis. A
smaller number of oak species (30–35) are evergreen and grow mainly in south-western Asia,
western North America and around the Mediterranean Basin. In the Mediterranean area, only
four evergreen oak species have been identified. These include Quercus alnifolia Poech.
(golden oak) endemic to Cyprus and Quercus suber L. (cork oak) distributed exclusively in
the western part of the Mediterranean Basin. The third species is the holly oak which is a
complex including Quercus coccifera L. and Quercus calliprinos Webb. Allozyme studies
suggest that holly oak should perhaps be considered as a single species (Q. coccifera L.) with
subspecies coccifera and calliprinos [19]. The fourth Mediterranean oak species, Quercus ilex
L. (holm oak), shows two morphological types, rotundifolia and ilex type [18,19,26], which
are sometimes regarded as distinct species.
According to Schwarz [26], Q. ilex and Q. coccifera (including subsp. calliprinos) belong to
subgenus Sclerophyllodrys (O. Schwarz) whereas Q. suber and Q. alnifolia belong to
subgenus Cerris (Spach). This classification was also supported by RFLP analysis of the
nuclear ribosomal DNA (rDNA) 18S and 25S and spacer regions [28] and chloroplast
DNA [30] and by nuclear DNA (nuDNA) internal transcribed spacer (ITS) sequences
[27,30]; however, Q. alnifolia was not included in these studies. Moreover, from the study of
Manos et al. [30], evidence was obtained that the two groups of Mediterranean oaks (subg.
Sclerophyllodrys and subg. Cerris sensu Schwarz; “Ilex group” and “Cerris group” sensu
Nixon) are monophyletic, as reported previously by Nixon [29]. More recently, they have been
shown to constitute a larger group (the Eurasian Cerris group) which includes all the European
and Asiatic evergreen oak species analysed [27,30]. When considering the subg. Cerris, several
systematic studies support that Quercus cerris and Quercus crenata are the most closely
related species to Q. suber [27,29-31].
1.1.2.1 Barcoding in oak phylogenetics
Tree species share several attributes, such as longevity, complex reproductive strategies, great
potential for local adaptation, and slow mutation and speciation rates [2], that make
barcoding of forest trees a captivating issue from both speculative and practical points of
view. “DNA Barcoding” is a molecular approach to identify the species to which any living
organism belongs by the use of a standardised gene region of the genome (or several loci
used together as a complementary unit). Ideally, the barcode system would be a universal
and valuable resource that would allow fast and unequivocal species identification and taxon
characterization at any life stage of the specimen and from minimal tissue samples
(http://www.barcoding.si.edu) [29,32,33]. Besides taxonomy, a widespread application of
barcoding would be a powerful research complement for molecular ecology, phylogenetics,
and population genetics [34].
The success of a DNA sequence as a species identification tool - the barcode - depends on the
existence of unique substitutions that distinguish closely related species, and on ease of
application across a broad range of taxa. A portion of the mitochondrial
cytochrome c oxidase I (COI or cox1) gene sequence is currently being used as a universal
barcode in certain groups of animals, fungi, diatoms, and red algae. However, COI has
proved to be unsuitable in land plants, mainly because of the low nucleotide substitution rates
of the plant mitochondrial genome [7,35,36]. The nuclear and plastid plant genomes therefore
offer the best expectation of yielding a suitable sequence (or pool of sequences) for DNA
barcoding, i.e., a sequence(s) that will be variable enough to differentiate species, but at the
same time stable enough within species to show low infraspecific
variability [33,35]. The difficulty in finding a single locus for barcoding in plants suggested a
multilocus approach, focusing on the chloroplast genome, as the most promising strategy for
barcoding plant species. Therefore a pool of loci has been recently considered, with the
greatest interest turned to seven candidates: rpoB, rpoC1 and rbcL as three easy-to-align
coding regions, a section of matK as a rapidly evolving coding region, and trnH-psbA,
atpF-atpH, and psbK-psbI as three rapidly evolving intergenic spacers [36,37]. Based on the
relative ease of amplification, sequencing and multialignment, and on the amount of variation
displayed, many research groups have proposed different combinations of these loci [32,36-39].
However, in 2009, the CBOL (Consortium for the Barcode of Life) Plant Working
Group recommended the combination of rbcL and matK as the most convenient in terms of
universality, sequence quality and discrimination power. Nevertheless, it is still argued that
regardless of the regions adopted for barcoding, some species will always be better resolved
with the use of other regions [29,36,40]. The oaks are one such example, and they represent an
obstacle to the idea of barcoding in plants.
A recent attempt at barcoding the Italian wild dendroflora, with the use of four plastid
regions (trnH-psbA, rbcL, rpoC1, matK), revealed that the genus Quercus is refractory to
barcoding (0% discrimination success) [29], a probable consequence of factors like low
variation rate at the plastid genome level and hybridization. It thus appears that the
main obstacle to barcoding success in difficult genera, such as Quercus, cannot simply be
overcome by adding further plastid DNA data. Nuclear DNA may offer some advantages
due to its higher mutation rates and mode of inheritance. Discrimination of the same set of oak
species was already obtained by means of internal transcribed spacer region of ribosomal
DNA (ITS) sequence variation [27], and it even supports the recognition of the subgenera
Sclerophyllodrys, Cerris, and Quercus, as proposed by Schwarz [26]. The rapidly evolving
ITS may thus represent a useful supplementary barcode in difficult genera, although not
without first overcoming extant problems, namely paralogy and other factors
associated with the complex concerted evolution of this highly repeated part of the nuclear
genome, which still require further refinement of current protocols [7,35].
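As an illustration of what a "discrimination success" figure measures, the following minimal sketch scores a marker by the fraction of species whose sequences are shared with no other species. This exact-identity criterion is a simplifying assumption and is not necessarily the criterion used in the cited study [29]; the function name and the toy sequences are hypothetical and are not real Quercus data.

```python
from collections import defaultdict


def discrimination_success(samples):
    """Fraction of species whose barcode sequences occur in no other species.

    samples: iterable of (species_name, sequence) pairs, one per individual.
    Exact sequence identity is used as the (simplified, assumed) criterion.
    """
    samples = list(samples)
    species_by_sequence = defaultdict(set)
    for species, sequence in samples:
        species_by_sequence[sequence].add(species)

    all_species = {species for species, _ in samples}
    # A species is unresolved if any of its sequences is also carried
    # by a different species.
    unresolved = set()
    for carriers in species_by_sequence.values():
        if len(carriers) > 1:
            unresolved.update(carriers)
    return (len(all_species) - len(unresolved)) / len(all_species)


# Hypothetical toy data: three oaks sharing a single plastid sequence.
plastid = [
    ("Q. suber", "ACGTTA"),
    ("Q. ilex", "ACGTTA"),
    ("Q. cerris", "ACGTTA"),
]
# Hypothetical nuclear sequences that differ between the same species.
nuclear = [
    ("Q. suber", "ACGTTA"),
    ("Q. ilex", "ACGATA"),
    ("Q. cerris", "TCGTTA"),
]

print(discrimination_success(plastid))  # 0.0
print(discrimination_success(nuclear))  # 1.0
```

In this toy case a shared plastid sequence yields 0% success regardless of how many individuals are sampled, whereas a marker with species-specific variants resolves all three species.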
1.1.3 Geographical distribution
The Mediterranean evergreen Quercus species are a group with overlapping habitats. In the
Western Mediterranean Basin, holm oak, cork oak and holly oak are the dominant
broadleaved species. These three species are sympatric in many areas, but some differences
in their ecological requirements produce distinct responses to environmental conditions and
Figure 1.2: Geographical distribution of cork oak, Quercus suber, represented in dark grey. Based
on Magri et al. [16]
hence different evolutionary histories as interestingly confirmed by several studies showing
differences in their genetic variation patterns at both nuclear and cytoplasmic levels
[18,19,30,41,42].
Q. suber has quite a narrow geographical range when compared to the other main evergreen
Mediterranean oak species, mainly due to its ecological restrictions. The modern distribution
of cork oak, rather discontinuous, ranges from the Atlantic coasts of North Africa and Iberian
Peninsula to the southeastern regions of Italy, and includes the main western Mediterranean
islands of Sicily and Sardinia as well as the coastal belts of Algeria and Tunisia, Provence
(France) and Catalonia (Spain) [16,43] (Fig. 1.2).
As opposed to holm oak, which shows a great ecological amplitude, cork oak is restricted to
hot (>4°C – 5°C mean temperature for the coldest month) variants of the humid and
sub-humid Mediterranean areas with at least 450 mm mean annual rainfall [18,20].
In Europe, low winter temperatures appear, in theory, to set the geographic
distribution limits, and most cork oak stands are located in areas below 800 meters in altitude,
since cork oak leaves are less tolerant to frost and to drought than those of the more
widespread holm oak. In addition, whereas holm oak is indifferent to soil types, cork oak
usually grows in acidic soils on granite, schist, or sandy substrates and it avoids limestone
and other carbonate substrates. Cork oak distribution is therefore shifted further to the west
and is patchier than that of holm oak (sensu lato), which constitutes a continuum from
Turkey to Portugal, including all the larger Mediterranean islands [17,18,20,43]. In spite of
this, within its geographical range, cork oak shows high levels of morphological and
phenological variability, although most of this diversity is considered to be the result of past
introgressive hybridization with other sympatric species [15,44,45]. Nowadays, in their
common distribution area, cork and holm oaks often grow together and the local occurrence
of morphologically intermediate trees has been reported [18,27].
1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization
Several hypotheses have been advanced concerning the evolutionary history of cork oak as
well as the geographical location of its centre of origin; however the details of its
differentiation processes are still largely unknown.
It was originally suggested that Q. suber may have originated in the Iberian Peninsula where
the species has its current main range (Fig. 1.2). This hypothesis was based on geobotanical
studies and on allozyme variation from the whole cork oak range, which revealed a
substantially higher genetic diversity in the Iberian populations as compared with those from
North Africa, Italy and France [18,27]. Paleoecological data indicate that both cork and holm
oak species have been present in southern Europe since the end of the Tertiary period. Also, two
fossil records of cork oak of Miocene age were found in Portugal and two belonging to the
Pliocene were recovered in Tunisia and Galicia (Spain). Therefore an early
Cenozoic origin for Q. suber in Iberia seems plausible, followed, at the end of the Miocene, by
the colonization of North Africa across the Strait of Gibraltar [16,18,43].
Alternatively, according to fossil records of other oak species of subgenus Cerris, dating to
the Tertiary and found in the Balkan Peninsula, it has also been considered that Q. suber
might have appeared first in more eastern countries (either in the Balkan Peninsula or,
alternatively, in the Middle Eastern-Peri-Caucasian area), in common with the whole Cerris
group. It has been suggested that the species expanded westward during the late Miocene and
was widespread throughout the Mediterranean Basin during the Pliocene, where it survived
thanks to the lack of climatic constraints, but went extinct in the eastern part of its
distribution area [27,43]. Data from PCR–RFLPs over cpDNA fragments seem to constitute
additional evidence to support an eastern origin for cork oak [43].
Glacial and periglacial environments have had a significant effect on the modern vegetation
of Europe. It is widely accepted that the climatic oscillations that occurred during the
Quaternary (i.e., over the past 1.8 million years) are one of the most crucial determinants of
the current distribution of biota in temperate latitudes. The spatial patterns of several tree
species throughout the European continent are the long-term result of late glacial and post-
glacial migration from refugial populations that were able to withstand the severe climatic
conditions of Pleistocene stadials [46-49]. With few exceptions [8,50,51], during the coldest
periods of the last full glacial epoch (37,000 – 16,000 years BP – before present) the locations
postulated for glacial refugia of most European woody angiosperms have been south of the
40° N parallel, which runs from central Portugal to Sardinia, Calabria and northern Greece.
This is considered to be the boundary between polar aridity and warmer climates during part
of the Quaternary. The theory that southern Europe (particularly the three southern peninsulas
- Balkan, Italian and Iberian) and the Near East provided appropriate conditions for refugia of
temperate tree taxa is based on a number of assumptions relating to the full-glacial
environments of those regions and their ability to supply the necessary conditions for growth
[52,53].
The original refugial model implied that 'forests' could have survived in these southern
locations during the cold stages of the Quaternary. However, extensive populations of trees
have never been detected. Instead, the traditional palaeogeographical models (although
inferred from scarce palynological evidence) suggest a small number of refugia - the “few
southern refugia” hypothesis [52,53]. Temperate tree taxa possibly survived in small pockets
of microenvironmentally favourable locations where usually only a few tree taxa are detected
and in low concentrations. Aridity was probably a significant limiting factor for tree growth.
Assuming the hypothesis of “few southern refugia”, common patterns of post-glacial
colonization for temperate European tree species are defined and expected, with high
diversity levels in southern Europe and decreasing northwards [54-57]. Some of the main
European tree species have been analysed using molecular markers, including for example
several Quercus species and Abies alba (European silver fir), and the resulting patterns of
diversity correspond to the expected ones [58,59].
However, more complete palaeobotanical data sets [50,53], palaeoclimatic modeling [60] and
genetic research [52] are starting to question the paradigm of “few southern refugia” in
southern Europe (and in particular in the three southern peninsulas) during full-glaciations.
Increasing evidence indicates that during the last full-glacial period populations of coniferous
and some deciduous trees grew much further north and east than previously assumed [53]. In
addition, new palaeoclimatic simulations suggest that full-glacial conditions in central and
eastern Europe were not nearly as severe as previously anticipated [60,61]. While some
refugia for Mediterranean trees were previously identified in the Iberian Peninsula, the results
of López de Heredia et al. (2007), based on cpDNA PCR-RFLPs and a review of paleobotanical
data, support the presence of multiple refugia for the evergreen oaks within the Iberian
Peninsula (e.g. Cantabrian mountain ranges, south-eastern Spain or even central Spain) during,
at least, the last glacial period [52]. Under the “multiple refugia” hypothesis, tree species that
nowadays are present in northern and central Europe would have recolonized these areas
from populations located in the north of the Iberian Peninsula. Moreover, these populations
would have been barriers preventing expansion from southern refugia. If that was the case,
cpDNA data should show complex patterns of spatial distribution that would have resulted
from the generation of multiple secondary contact zones [8].
For the last glacial and postglacial periods, results from palynological data indicate the
occurrence of cork oak in south-western Iberia since the Late Glacial period (17,000-12,000
years BP) and in North Africa since the early Postglacial (approximately 8,500 years BP)
[18,27]. It is accepted that during the Quaternary glaciations, cork oak may have survived in
scattered refugia which possessed favourable microclimate conditions, and from which
postglacial colonization occurred over recent millennia. Palynological [21] and molecular
data [43,52] indicate a glacial refugium in south-western Iberia that expanded northwards in the
absence of mountain barriers and which was favoured by the existence of siliceous substrates.
The extensive introgression of Q. suber with Q. ilex may also indicate
several potential refugia in eastern Iberia [52]. RFLP analysis of the whole cpDNA shows a
phylogeographical pattern of three groups corresponding to potential glacial refuges in Italy,
North Africa and Iberian Peninsula [43], from which, after the last glaciations, Q. suber may
have begun migrating northward to the southern part of France. However, no fossil record
supports the molecular data for the Italian and North African refuges.
Reliable scientific evidence is lacking to confirm the presence of Q. suber in more northern
and eastern European countries. The Tertiary and Quaternary remains (megafossils and
pollen) found in several European countries did not allow taxonomic identification at the
species level and could be attributable to any Mediterranean oak species of the Cerris group
[43,62]. In fact, Q. suber is more thermophilous and has stricter soil requirements than many
other Quercus species, so a greater reduction of this species' range is expected to have
occurred during glacial times. However, the uncertainty of palynological discrimination and the
lower cpDNA variation itself could bias the identification of glacial refugia for Q. suber [52].
1.1.5 Genetic diversity studies
As cork oak is predominantly allogamous, i.e. favouring cross-fertilization, with a life span
of up to 500 years or more and a low replacement rate, it can be expected that at least
in some places, and mostly for selectively neutral characters, selection over time may have
resulted in reduced genetic differentiation both among trees of the same population and
between populations. However, differentiation among populations has been detected around
the Mediterranean Basin by investigating both chloroplast and mitochondrial DNA (which
are maternally inherited in oaks, as demonstrated by Dumolin et al. [63]), as well as allozyme
variation [16,18,19,67]. Isozyme variation in the genus Quercus also shows that genetic
variability is high and similar to that found in conifers [15,65]. One of the main causes for the
high polymorphism found in cork oak, as well as in holm oak, may be attributable to the
physiological plasticity of the species, which allows them to adapt to variable and
unpredictable climatic conditions, characteristic of the Mediterranean climate. High levels of
diversity within populations are observed; conversely, low inter-population variability
indicates that most of the total genetic diversity in the species is found within rather than
among populations [15,18,19,23]. According to Elena-Rosselló & Cabrera [15], more than
83% of the total diversity in this species is found within populations, and the decline of
kinship estimates with distance suggests that isolation by distance has led to this structure.
The results obtained for Q. suber contrast with those found for most temperate forest species,
for which a generally weak and narrower within-population structure is the trend [23]. In cork
oak, gene flow between populations was estimated as more than one migrant per generation
(F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) and is theoretically enough to prevent
genetic drift from causing local genetic differentiation and therefore population divergence,
under Wright's island model [66].
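The "one migrant per generation" threshold can be made concrete with the classical equilibrium approximation for Wright's infinite-island model, FST ≈ 1 / (4Nm + 1), which applies to diploid, biparentally inherited nuclear loci. The sketch below is only an illustration under that model's strong assumptions (equal population sizes, symmetric migration, neutrality, equilibrium); the function name is hypothetical and the code is not taken from any of the cited studies.

```python
def expected_fst(nm):
    """Equilibrium FST under Wright's infinite-island model.

    nm: effective number of migrants per generation (N * m), for a
    diploid nuclear locus. Uses the approximation FST = 1 / (4Nm + 1).
    """
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)


# Expected differentiation for a range of illustrative migration levels.
for nm in (0.1, 0.5, 1.0, 5.0, 10.0):
    print(f"Nm = {nm:>4}: expected FST = {expected_fst(nm):.3f}")
```

Under this approximation Nm = 1 corresponds to FST = 0.2, and differentiation falls off quickly as the number of migrants per generation rises above one.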
PCR-RFLPs over specific cpDNA fragments illustrate a complex pattern of variation in the
evergreen oaks [19,41]. Jiménez et al. [41] detected three very distinct lineages of cpDNA
haplotypes, two of them being present in cork oak. One of the lineages, the “suber” lineage,
is specific to cork oak populations and may be considered as the original and most widely
distributed lineage in this species. The partial geographical distribution of this lineage was
reported by López de Heredia et al. [64], from peninsular Italy, Sardinia, Sicily, Corsica,
northern Africa and the island of Minorca. Cork oak populations from the Spanish mainland
and from the island of Majorca were characterized by another maternal lineage also shared
with Q. ilex and Q. coccifera, the “ilex-coccifera I” lineage [41,64]. This fact was interpreted
as the result of multiple and mainly unidirectional cytoplasmic introgression of Q. suber by
Q. ilex.
RFLP analysis over the whole chloroplastidial DNA was used by Lumaret et al. [43] for the
first time in Q. suber to analyse the phylogeographical variation over the whole species range
(Fig. 1.3). The chlorotypes showed a clear phylogeographical pattern of three groups
corresponding to potential glacial refuges in Italy, North Africa and Iberian Peninsula. The
most ancestral and recent groups were observed in populations located in the eastern and
western parts of the species range, respectively. Unrelated chlorotypes of an “ilex” cpDNA
lineage were also identified in specific western populations [43]. From the cpDNA variants
of the 'ilex' lineage recovered through interspecific introgression, additional successive cpDNA
changes may have occurred in Q. suber, and so two distinct cpDNA lineages in cork oak
were predicted. A particular chlorotype S1, observed predominantly in continental Italy and
in Sicily, was identified by Lumaret et al. [43] in a few populations from Sardinia, and from
Corsica which also shared a rare chlorotype S7 with Tunisia. This situation possibly reflects
the occurrence of rare natural events of long-distance dispersal from several geographical
sources located in the closest areas to those islands. Moreover, the possibility of an
intentional acorn transport by people for economic purposes cannot be ruled out and its
impact on the geographical patterns of cork oak genetic variation should not be
underestimated [43]. López de Heredia et al. [64] also proposed the possibility of
long-distance dispersal events to explain the sharing of a rare chlorotype by cork oak populations
located in Minorca and in Sardinia.
Using cpDNA microsatellites, Magri et al. [16] analysed cork oak populations throughout the
species distribution range and found a high geographical structure characterized by five
distinct haplotypes (Fig. 1.4). It was assumed that H3 (north Africa-Sardinia-Corsica-
Provence) and H4 (Portugal-western Spain-southwest France-northern Morocco) were the
ancestral Q. suber haplotypes, with H1, H2 (Italy) and H5 (scattered populations) originating
through ancient or recent introgression with Q. cerris (H1 and H2) and Q. ilex (H5). Also, the
cpDNA SSR data combined with paleobotanical and geodynamic models suggested that
cork oak populations may have experienced genetic drift geographically consistent with
the Oligocene and Miocene break-up events of the European–Iberian continental margins and
persisted in some of the separate microplates that are currently found in Tunisia, Sardinia,
Figure 1.3: Geographical distribution of the eight and six chlorotypes of the 'suber' and 'ilex' lineages
identified in Q. suber populations by Lumaret et al. [43]. Chlorotypes were scored by RFLP variation over
the whole cpDNA molecule. The identity of sampled populations and cpDNA chlorotypes assayed through
RFLP as well as affiliation to the 'suber' or 'ilex' cpDNA lineages are indicated in the Figure. Source:
Lumaret et al. [43].
Figure 1.4: Distribution of cpDNA haplotypes found by Magri et al. [16] with cpDNA SSRs and
phylogenetic reconstruction of the relationships between haplotypes. The black circle in the network
indicates a hypothesized mutation, which is required to connect existing haplotypes within the network
with maximum parsimony. The grey area corresponds to the current distribution of Q. suber. Source:
Magri et al. [16].
Corsica, and Provence [16] (Fig. 1.5). All these events seemed to have occurred without
detectable cpDNA modifications for a time span of over 15 million years.
The modern history of Quercus suber is closely related to human activity surrounding the use of
its cork. For this reason, humans have been considered responsible for a reduction in genetic
variation in some stands of cork oak, as well as for hybridization with congeners [67]. Other
cultivated tree species in the Mediterranean area display a similar low geographical structure
in genetic variation, arguing for a multidirectional diffusion because of human activity. For
example, in Castanea sativa, the low geographical structure of the chloroplast genetic
diversity may be explained by the effect of a strong human impact [67,68]. However, the
geographical distribution of the cork oak haplotypes found by Magri et al. [16] does not
appear to be related to cultivation. In fact, fossil pollen and wood records suggest that cork
oak was distributed in approximately the same areas as today even before the Neolithic.
Figure 1.5: Reconstructions of the Western
Mediterranean palaeogeography and possible
location of Quercus suber haplotypes found by
Magri et al. [16] (colours as in Fig. 1.4).
Continental microterranes rifted off the European-
Iberian continental margin: Rif (R), Betic range (B),
Balearics (Ba), Kabylies (Ka), Corsica (Co),
Sardinia (Sa), Calabria (Ca). Source: Magri et al.
[16].
Another possible hypothesis to explain these results is postglacial population expansion from
the potential glacial refuges in Italy, North Africa and the Iberian Peninsula [43].
Some studies have also assessed the genetic variability of cork oak populations in Portugal.
Coelho et al. [22] used AFLP markers and reported low levels of differentiation among cork oak
populations. The proposed reasons are the outcrossing nature of the species, long-distance
anemophilous pollination and occasional secondary acorn dispersal by animals, which lead to
extensive gene flow and an increased homogeneity of allele frequencies between populations
[22,45]. The value of population differentiation reported by Coelho et al. [22] (FST = 0.0172)
is below the average of 0.07–0.09 expected for long-lived, wind-pollinated woody species. These
results are similar to those found by Simões de Matos (F. Simões de Matos, PhD thesis, INETI
Lisbon, 2007) with nuclear SSRs (FST = 0.02), confirming the absence of population structure.
This pattern of genetic differentiation within Portuguese cork oak stands, some located over a
distance of 700 km, may be explained by anthropogenic pressure in addition to constant gene flow.
This study shows that 90% of the polymorphic markers identified in cork oak genotypes are
uniformly distributed through the populations of the Algarve, Alentejo and Trás-os-Montes regions.
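To make the FST values quoted above easier to read, the following minimal sketch computes Wright's FST for a single locus as (HT - HS) / HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled allele frequencies. The allele frequencies are invented for illustration and are not taken from any of the cited studies; populations are assumed to be equally weighted.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """Wright's FST = (HT - HS) / HT for equally weighted populations."""
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # HS: mean within-population expected heterozygosity
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # HT: expected heterozygosity of the mean allele frequencies
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    ht = expected_heterozygosity(mean_freqs)
    return (ht - hs) / ht


# Hypothetical biallelic locus in two populations
weak = fst([[0.55, 0.45], [0.45, 0.55]])    # nearly identical frequencies
strong = fst([[0.90, 0.10], [0.10, 0.90]])  # strongly diverged frequencies
print(round(weak, 4), round(strong, 4))
```

With nearly identical allele frequencies the statistic falls to about 0.01, the same order of magnitude as the values reported for Portuguese cork oak stands, whereas strongly diverged frequencies give a value well above the 0.07–0.09 range.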
1.1.6 Hybridization and cytoplasmatic introgression
Capture of unexpected chloroplast haplotypes by hybridization and introgression has been
proposed as the most likely explanation for the sharing of cytoplasmic genes both in
deciduous and evergreen oaks [69] as well as in others [41,70]. Q. suber was reported to
hybridize with several species of the evergreen oak group, particularly with holm oak [17,45],
this being regarded as one factor contributing to the increase of genetic diversity in cork oak
[22]. Q. suber and Q. ilex possess overlapping geographical distributions [17], and
hybridization occurs in nature, although it is not a frequent event [45]. Nevertheless, these
species are not very closely related, as shown from both cytoplasmic and nuclear genetic
analyses [19,27,69] and belong to subgenera Cerris and Sclerophyllodrys, respectively [26],
although the more recent classification includes both species within the same Eurasian Cerris
group [30].
The two most easily recognizable oak hybrids are Quercus x crenata (Q. cerris x Q. suber)
and Quercus x morisii (Q. suber x Q. ilex) [27]. It must be noted that these relatively rare
hybrids (0.3%) are found only when both parental species co-occur. Mature hybrid
individuals are easily recognized due to intermediate morphological traits between the two
parental oaks [27], but seedlings and even juvenile trees show very similar morphological
traits so that, in mixed stands, species identification is usually very difficult or even
impossible until the adult stage [71]. Asymmetric hybridization has been confirmed by
Boavida et al. [45], upon the description of post-pollination barriers in Q. suber to
interspecific crosses with Q. ilex, Q. coccifera, Q. faginea and Q. robur. The cross between
Q. ilex and Q. suber shows evidence of unidirectional compatibility and a higher success rate
was reported in the interspecific crosses in which Q. suber acts as pollen donor rather than as
female parent due to a differential growth in the pollen tubes of both species [45]. Also, since
both species are protandrous and Q. ilex flowers earlier, early cork oak male flowers can
pollinate late holm oak female flowers, the reverse not usually occurring.
By analysing polymorphism at allozyme loci and DNA markers for which alleles are distinct
in the two species growing in separate areas (diagnostic markers), evidence was obtained for
the occurrence of hybrids and genetic introgression (backcrosses between hybrids and
parental species) between sympatric holm oak (female) and cork oak (male) in several
locations [18,43,71]. Further evidence was advanced that, in initial hybridization and in
backcrosses, Q. ilex is predominantly, but not exclusively, the maternal species. This
interpretation is supported by the discovery of “ilex-coccifera I” haplotypes (chlorotype
shared by Q. ilex and Q. coccifera) in Q. suber individuals, and the absence of the opposite
situation, that is, no Q. suber haplotypes within the Q. ilex pool [41].
The effect of hybridization and introgression in Q. suber cpDNA can produce the total
replacement of the Q. suber chlorotype by the “ilex-coccifera I” lineage (chlorotype shared
by Q. ilex and Q. coccifera) [41,64]. This situation is common in eastern Spain, where
siliceous soils are scarce and the effective population size is lower than in the continuous
forests from western Iberia. It has been suggested, on the basis of the differences between Q.
ilex and Q. suber chlorotypes found in sympatric populations, that hybridization and
introgression in these populations may be ancient [43]. Therefore, as reported by López de
Heredia et al. [52], it cannot be ruled out that in the eastern range of the species some
populations withstood the glacial conditions by hybridizing with Q. ilex. For instance, a
particular chlorotype found by these authors (named “c66”) is predominant in all Q. suber
populations from Catalonia (north-eastern Spain), being very rare in Q. ilex.
The absence of cork oak populations possessing 'ilex' chlorotypes in the eastern
Mediterranean range of the species was reported both by López-de-Heredia et al. [64] and by
Lumaret et al. [43]. However, in a Corsican population, one of the 50 cork oak individuals
scored for cpDNA RFLPs was shown to possess an 'ilex' chlorotype [43], suggesting that
cytoplasmic introgression of Q. suber by Q. ilex does occur in the eastern range, although
apparently much less commonly. A substantial number of trees showing intermediate
morphology between both species have been observed in south-eastern continental Italy [27],
in Sardinia and Provence [42], also possessing predominantly an 'ilex' chlorotype, and for
many of them a hybrid origin was confirmed on the basis of nuclear interspecific diagnostic
markers. So, interspecific hybridization is likely to have happened quite frequently in the
eastern part of the range of Q. suber as well [43].
1.2 Molecular markers in phylogeography
The use of molecular markers has revolutionized research fields such as conservation
biology, population biology, and ecology. Markers provide a means of observing otherwise
hidden aspects of natural history, whether this involves population level interactions on
ecological timescales, or the evolutionary relationships of genes, populations, and taxa [10].
As stated before, there is a lack of phylogeographic studies in plants in comparison with
animal studies. One of the major problems is finding useful genetic variation applicable to
this type of analysis, and it has been quite difficult to find genetic markers with a resolving
power comparable to the animal mitochondrial DNA [7,10]. To address this problem, and
also the choice of the molecular markers used for this study, it seemed necessary to review
the literature concerning plant genomes and the molecular markers available.
Plants are characterized by three types of genomes within the cell: the nuclear genome, and
two cytoplasmic genomes – mitochondrial and chloroplastidial DNA. The latter are of
endosymbiotic origin and have lost various genes to the nucleus over time (and, sometimes,
vice versa). These organelle genomes, because of their supposed shared prokaryotic origin,
are similar to animal mtDNA in overall structure (closed-circle chromosomes), replication
mode (with large populations of molecules per cell), and non-Mendelian inheritance.
However, they also differ from animal mtDNA, and from one another, in some important
molecular and evolutionary aspects [10,72].
1.2.1 Mitochondrial DNA (mtDNA)
Although the phylogeographic studies in animals rely heavily on the mitochondrial genome,
in plants several characteristics make it poorly suited for these studies [7,11].
Plant mtDNA is highly variable in size across species, ranging from about 20 kilobases (kb)
to 2500 kb. Inheritance is often maternal, but not always. Surprisingly, plant mtDNA evolves
rapidly with respect to gene order (gene rearrangements are common), but rather slowly
regarding primary nucleotide sequence. This leads to low rates of sequence evolution (about
100 times slower than in animals), such that specific loci do not contain adequate variation
for generating phylogeographic, intraspecific signal. So, in these regards, the evolutionary
dynamics of plant mtDNA and animal mtDNA differ greatly, and one must look in
alternative genomes for informative variation [7,10].
1.2.2 Chloroplastidial DNA (cpDNA)
Although similar to mtDNA, the plant cpDNA plays by different evolutionary rules. It varies
moderately in size among species (from about 120 to 217 kb), with much of the size variation
attributable to the extent of sequence repetition in a large inverted repeat region. The
molecule contains about 120 genes that code for ribosomal and transfer RNAs, and several
polypeptides involved in protein synthesis and photosynthesis. The chloroplastidial genome
is transmitted maternally in most species, biparentally in some, and paternally in others
(notably, most gymnosperms), and tends to evolve somewhat slowly with regard to gene
rearrangements and also in terms of primary nucleotide sequences (about 3 to 4 times faster
than plant mtDNA, but still much slower than animal mtDNA). For this latter reason, cpDNA
sequences have proven especially useful for estimating phylogenetic relationships in plants
[7,10].
Intraspecific variation has been reported in a growing number of species, so that almost all
published plant phylogeographic studies have relied on the chloroplast genome as their only
source of genetic variation [7,10]. Most of this variation has been revealed by restriction
enzyme digestion of cpDNA (RFLP technique), in which genetic variants reflect the gain or
loss of restriction sites or length variation [73]. A more recent restriction enzyme-based
approach involves the digestion of PCR-amplified chloroplast loci to reveal fragment length
polymorphisms (PCR-RFLP) within the amplified fragment [52,59]. Using these readily accessible
laboratory techniques, large portions of the chloroplast genome may be evaluated in
numerous individuals. Furthermore, it is believed that at least 50% of all cpDNA variation
may be attributable to small insertion/deletion mutations. However, concerns about the
homology of length variants associated with simple-sequence repeat (SSR) polymorphisms
need to be addressed before this technique can be widely applied to construct useful gene
trees [7]. Ultimately, direct knowledge of the sequences of cpDNA variation would be most
desirable for gene tree construction. Unlike restriction enzyme analyses, direct sequencing of
cpDNA loci has so far not retrieved comparably useful levels of variation for phylogeographic
analysis. In the search for cpDNA loci with useful levels of sequence variation, it is
necessary to consider that the mutation rate of cpDNA varies for different regions of the
genome and non-coding regions are more prone to mutation. Therefore, several small regions
of the chloroplast genome (such as some intergenic spacers) show potential for
phylogeographic analysis [59,74,75].
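The gain or loss of a restriction site mentioned above can be illustrated with a small in silico digest. The sketch below cuts a linear sequence at every occurrence of a recognition site and returns the fragment lengths. The two haplotype sequences are invented for illustration; EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example and is not necessarily an enzyme used in the cited studies.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE, ECORI_CUT = "GAATTC", 1  # EcoRI cuts G^AATTC

# Hypothetical amplified fragment carrying two EcoRI sites
haplotype_a = "ATGC" * 3 + "GAATTC" + "TTAA" * 4 + "GAATTC" + "CCGG" * 2
# A single point mutation (C -> T) destroys the first site
haplotype_b = haplotype_a.replace("GAATTC", "GAATTT", 1)

fragments_a = digest(haplotype_a, ECORI_SITE, ECORI_CUT)
fragments_b = digest(haplotype_b, ECORI_SITE, ECORI_CUT)
print(fragments_a, fragments_b)
```

The loss of one site merges two fragments into a longer one, which is the kind of banding difference scored as an RFLP variant on a gel.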
Several attempts indicate that single cpDNA loci are only occasionally useful at the
intraspecific level, but as technology progresses and the sequencing of larger fragments of
DNA becomes easily achievable, with diligent sequencing efforts, it seems likely that
sufficient genetic variation can be uncovered and studies will utilize more of the potentially
available variation in the chloroplast. Ultimately, finer phylogeographic resolution can be
obtained [7,10]. Indeed, a few studies have already shown intergenic spacers to be useful
regions for direct sequencing, for example trnT-L-F in Ficus carica [76][74] and
trnH-psbA in Eucalyptus perriniana [77][78], as well as psbC-trnS intergenic spacer region
[79] in several Quercus species [75].
1.2.3 Nuclear DNA (nuDNA)
The remaining alternative is the nuclear genome that is still largely unexplored but offers a
potentially inexhaustible source of informative genetic variation, and lately many
investigators are developing techniques and strategies for locating and efficiently sampling
appropriate variation in nuclear DNA.
The ITS (internal transcribed spacer) region, useful for plant systematics, is however generally not very helpful for
phylogeographic studies. First, for most species examined, intraspecific variation has not
always been detected in this region. Furthermore, as part of a multicopy gene family, the ITS
region is subjected to poorly understood processes of concerted evolution, which may lead to
problems with the interpretation of sequence polymorphism at the intraspecific level. Also,
when a locus is part of a multicopy gene or multigene family, PCR amplification with
conserved primers may produce multiple fragments, including duplicated gene copies,
pseudogenes, and even recombinant PCR artifacts. Care is thus necessary to avoid comparing
paralogous loci, which may be especially difficult to detect in cases where there has been
differential homogenization of gene copies among populations [7,80].
In principle, single-copy nuclear (scn) genes should also provide sufficient sequence data for
phylogeographic assessments at the intraspecific level, but three technical and biological
obstacles need to be considered: first, the considerably slow rate of sequence evolution at
many nuclear loci; second, in diploid organisms, the difficulty of isolating alleles one at a
time; and third, intragenic recombination. Nonetheless, scnDNA has been employed successfully in some
phylogeographic assessments, with some of the most informative results coming from intron
sequences at protein-coding genes [10,81]. However, so far, no single locus appears to be
universally useful in all species of plants.
Additional features of the nuclear genome also need to be taken into account for
phylogeographic analysis: complications arising from heterozygosity and interallelic
recombination (crossing-over among alleles of a locus produces recombinant, chimeric
haplotypes), and the need to verify the homology of the loci in use. Some (and probably many)
'single-copy' nuclear genes exist as part of small gene families consisting of two to ten
expressed loci and possibly additional pseudogenes [7,10,80,81].
Despite all of these potential problems the nuclear genome is still, perhaps, the most dynamic
and useful marker for studying plant phylogeography because it is much larger than the
others and includes most of the information behind the shaping and adaptation of the
individual to the environment.
1.2.4 Simple Sequence Repeats (SSRs)
Microsatellites (or simple sequence repeats, SSRs) are short repetitive nucleotide sequences
with motifs of typically 1-5 base pairs (bp) that are repeated in tandem, usually up to a
maximum of about 60 times, and are widespread in both eukaryotic and prokaryotic genomes
[82,83]. The lower accuracy of traditional molecular markers in estimating genetic
differences between taxa, and their insufficient statistical power, led researchers
to look towards better alternatives such as microsatellites. They present a group of characteristics
that make them eligible as markers of choice for several studies, such as: 1- PCR-based, 2-
co-dominant, 3- usually multiallelic and highly variable, 4- randomly dispersed throughout
the genome, and 5- easily scorable by different methods [82,84,85]. Neutral nuclear SSRs
(nuSSRs) are the choice for diversity analysis, genetic mapping and association studies [86].
The use of microsatellites as polymorphic DNA markers has considerably increased over the
years, and although they were originally designed for research in humans, they have been
extensively used for genetic analysis in all classes of organisms, including plants [82,85].
With the development of other genetic markers like single nucleotide polymorphisms (SNPs)
and AFLPs, it was thought that the use of microsatellites would decline. However, recent
research has improved its application so much that microsatellites will probably still be used
in the near future as important genetic markers in various biological disciplines [11,82]. The
initial cost associated with microsatellites may be high due to the requirement of sequence
information, but once developed they can be easily maintained and shared between
laboratories. The ease of use, high reproducibility, low cost and abundance of SSR loci in
living organisms makes them ideal markers for genetic analysis. Also they are multi-allelic
and generally have high heterozygosity and mutation rates (ranging from 10⁻⁶ to 10⁻² events
per locus per generation), which can make them more informative than other markers, such as
Random Amplified Polymorphic DNA (RAPDs) and AFLPs [82,85].
Particularly in Quercus suber, Simões de Matos (F. Simões de Matos, PhD thesis, INETI
Lisbon, 2007) developed the only specific nuSSRs for cork oak (as a rule SSRs are species-
specific markers which must be developed de novo for each species, mainly because they
usually occur in non-coding regions of the genome which are not highly conserved) but some
studies [87,88] have shown the transferability of nuSSRs from other oak species to cork oak,
which potentially reduces the need to develop species-specific nuSSRs for this species.
1.2.5 Expressed Sequence Tags (ESTs)
Expressed Sequence Tags (ESTs) can serve as a source of molecular markers as gene
sequences, SSRs or SNPs, and are an easy way to access fragments of the transcriptome.
They are short (200-800 bases), randomly selected sequences derived from cDNA libraries.
Even if ESTs are not available from the organism under study, EST collections can serve as a
bridge between the genomic resources of model organisms and diverse species of interest,
usually nonmodel organisms. ESTs provide information on the transcribed mRNA
populations within a given set of tissues, developmental stages, environmental conditions and
genotypes [89,90]. For instance, the direct sequencing of EST fragments and subsequent
detection of SNPs would be the most useful way of studying geographical distribution of
genetic variation within species. As most ESTs are directly involved in the genetic control of
an adaptive trait and have a known function, ESTs are genetic markers that offer real
potential for detecting adaptive genetic diversity [90].
As an alternative to the conventional strategy for detecting anonymous SSRs, large numbers
of novel SSRs can be isolated with comparatively minor effort simply by in silico mining of
EST databases [91,92]. This approach has become routine for some species, and there
are many characteristics that EST-SSRs (EST-derived SSRs) present and that make them
valuable as genetic markers. These include their presence in large numbers, high levels of
polymorphism compared with many other types of genetic markers, co-dominant inheritance,
repeatability and clarity of scoring, and enhanced transferability across related species
[91,93]. Perhaps the greatest concern about the utility of EST-SSRs in population genetic
analysis is that selection on these loci might influence the estimation of population
parameters. Indeed, divergent selection will increase differentiation among and reduce
variability within populations, whereas the opposite effect is expected under balancing
selection. However, studies of large-scale comparative analyses suggest that only a very
small percentage of all genes are experiencing positive selection [91,93]. Inevitably some
fraction of all EST-SSRs will be subjected to selection.
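As a rough illustration of what in silico mining of EST sequences for SSRs involves, the sketch below scans a sequence for perfect tandem repeats using a regular expression. It is a simplified stand-in, not the pipeline used in the cited studies: the thresholds (`min_repeats`, `max_motif`) and the example sequence are assumptions made for illustration, and only perfect, uninterrupted repeats are reported.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats."""
    # Lazy motif of 1..max_motif bases, followed by at least
    # (min_repeats - 1) further copies of the same motif.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical EST fragment with a (GA)8 and a (CAT)5 repeat
est = "TTCG" + "GA" * 8 + "CTAGCG" + "CAT" * 5 + "GG"
ssrs = find_ssrs(est)
print(ssrs)
```

In practice, primers would then be designed in the flanking sequence of each hit, and repeat-count thresholds are usually set per motif length.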
Recently, a significant number of ESTs has been generated in oaks, and particularly in cork oak.
Since ESTs are conserved gene sequences, primers designed for one species are likely to work well in
related ones.
Ultimately, the potential of phylogeography may be fully accomplished when multiple loci
are considered. The combined analysis of different marker types should allow a
reconstruction of past population events in great detail, and also help understand their spatial
structure and the dynamics of genetic diversity.
2. Materials and Methods
2.1 Sampling and DNA extraction
Sampling of 26 natural populations was performed from the entire Mediterranean distribution
(Fig. 1.2 and Table 2.1). In Portugal sampling was performed surveying the following
locations: Gerês, Serra da Estrela, Serra de São Mamede, Serra da Arrábida, Serra de
Monchique, Serra do Buçaco, Azeitão and Serra de Sintra. Stands were considered as natural
populations when they consisted of irregularly arranged trees over 50 years old. The
remaining populations from Portugal (São Brás de Alportel), Spain (Cataluña, Montes de
Toledo, Haza del Lino, Sierra de Aracena, Sierra Morena, Sierra de Guadarrama), Italy
(Puglia, Lazio and Sicily), France (Var, Landes and Corsica), Algeria (Forêt des Guerbès),
Tunisia (Mekna and Fermana) and Morocco (Taza and Kenitra) were obtained from an international
cork oak provenance trial located at Herdade Monte da Fava (Ermidas do Sado), established in 1998
within the framework of the EUFORGEN Q. suber network and covering the complete distribution
range of the species. Access to these
populations was kindly provided by Helena Almeida from Instituto Superior de Agronomia.
From each population 3-5 trees were sampled for the cpDNA and nuDNA fragment analysis.
Young leaves were collected from Spring 2009 to Summer 2010 from a total of 119 adult trees
distributed among the 26 sampled populations (Table 2.1).
Of the 26 populations chosen, 13 were selected for a wider sampling for the SSR study. The
selected locations are representative of the entire Mediterranean distribution, and are the
following: Portugal (Gerês, Serra da Estrela, Serra da Arrábida, Serra de Monchique, Serra
do Buçaco and Serra de Sintra), Spain (Cataluña and Haza del Lino), Italy (Puglia), Algeria
(Forêt des Guerbès), Tunisia (Mekna) and Morocco (Taza and Kenitra). For each population
22-32 trees were sampled. Young leaves were also collected from Spring 2009 until Summer
2010, from a total of 379 adult trees distributed among the 13 sampled populations (Table 2.1).
Several other Quercus species (namely Q. robur, Q. pyrenaica, Q. faginea, Q. rubra, Q.
lusitanica, Q. canariensis, Q. cerris, Q. ilex (subsp rotundifolia and subsp ilex) and Q.
coccifera) were also sampled from natural populations and used to help determine the Q.
suber lineages, and also to more accurately establish the phylogenetic relationships of these
lineages. According to the taxonomic classification of Schwartz [26], Q. cerris is part of the
subgenus Cerris, together with Q. suber. Q. petraea, Q. robur, Q. pyrenaica, Q. faginea
and Q. lusitanica belong to the subgenus Quercus. The species Q. coccifera and Q. ilex
belong to the subgenus Sclerophyllodrys. Finally, Q. rubra is part of the subgenus
Erythrobalanus. Castanea crenata was used as an outgroup (Table 2.1). Species identification
of each tree was checked based on the leaf morphology, and presence of bark in Q. suber,
assessed during the growing season on fully elongated leaves.
The leaves were ground thoroughly in liquid nitrogen with a mortar and pestle, and then
genomic DNA was extracted according to Qiagen's protocol for the DNeasy Plant Mini Kit
(Qiagen). The samples were analysed by electrophoresis on 1% w/v agarose gels stained with
Red Safe 20,000x (iNtRON Biotechnology) to determine DNA integrity.
2.2 DNA sequencing
Polymerase chain reaction (PCR) amplifications were performed for 148 Quercus samples
(Table 2.1) for fragments of three different chloroplastidial DNA regions [intergenic spacer
regions TrnL-F [74], TrnS-PsbC [79] and TrnH-PsbA [78]]. Considering preliminary results
of the cpDNA fragments analysis, 104 individuals, out of the 148, were selected for
amplification of one nuclear DNA fragment [Expressed Sequence Tag (EST) 2T13 [94], an
osmotic stress-related gene] (Table 2.1). The primers used to amplify each fragment were
those described by each mentioned author (Supporting Information 1 – Table S1.1 and Table
S1.2).
To confirm Quercus species and assess the usefulness of barcodes as phylogeographical
markers the official cpDNA barcode fragments (matK and rbcL) were amplified with the
primers described by Cuénoud et al. [95] and Kress & Erickson [32], respectively
(Supporting Information 1 – Table S1.1). Three individuals of each cork oak lineage,
identified in the previous analysis of cpDNA regions, and one individual of each other
Quercus species were selected for the analysis.
PCRs were performed in a final volume of 25 μL, with 1 μL of DNA (50–100 ng), 1x PCR
buffer (Promega), 1U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4
µM of each primer. PCR amplification conditions were as follows: an initial denaturation
step at 94 °C for 5 min followed by 30 cycles consisting of denaturation at 94 °C for 20 s,
annealing at 65 °C for 30 s for intergenic cpDNA fragments and 55 °C for the nuclear
candidate gene and barcode fragments, extension at 72 °C for 40 s, and a final extension step
at 72 °C for 7 min. PCR and amplification conditions were the same for all oak species.
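As a quick check on the cycling programme described above, the sketch below adds up the hold times quoted in the text. It ignores ramping between temperatures, which depends on the thermocycler, so the result is a lower bound on the actual run time. The function and variable names are illustrative only.

```python
def program_seconds(initial_s, cycles, step_seconds, final_s):
    """Total hold time of a PCR programme, ignoring temperature ramping."""
    return initial_s + cycles * sum(step_seconds) + final_s


# Annealing temperature (°C) per fragment type, as stated in the text
ANNEALING_C = {"cpDNA intergenic spacers": 65, "nuclear EST and barcodes": 55}

# 94 °C for 5 min; 30 x (94 °C 20 s, annealing 30 s, 72 °C 40 s); 72 °C for 7 min
total_s = program_seconds(5 * 60, 30, (20, 30, 40), 7 * 60)
total_min = total_s / 60
print(total_s, total_min)
```

Both fragment types share the same step durations and differ only in annealing temperature, so the hold time is the same for the two programmes.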
PCR amplification products were verified on 1% w/v agarose gels stained with Red Safe 20,000x
(iNtRON Biotechnology), alongside the molecular weight marker HyperLadder™ IV (Bioline).
Amplicons were purified using SureClean (Bioline).
The nuclear EST fragments Phyt B (Phytochrome B, involved in flower phenology) [96] and
Cons 58 (Auxin repressed protein) [97] were also tested, with the primers described by the
mentioned authors (Supporting Information 1 – Table S1.2); however, after several optimization
attempts, no amplification product was obtained.
Sequencing reactions were carried out using the BigDye v3.1 chemistry (Applied
Biosystems, ABI) on an ABI PRISM 310 automated sequencer. Amplicons were sequenced in
both directions with an initial denaturation at 96ºC for 1 min, followed by 25 cycles of 96ºC
for 10s, annealing temperature of 50ºC for 5s, and a final extension step at 60ºC for 4 min.
The amplified products were purified through a 70% ethanol precipitation, described as
follows. The total reaction volume was transferred to a 1.5 ml tube containing 1 μl of 3 M
sodium acetate and 25 μl of absolute ethanol. This mixture was subsequently incubated on ice
for 30 min, and then centrifuged at 10,000 g for 25 min. The supernatant was discarded and
300 μl of 70% ethanol were added to each tube, which were centrifuged for 15 min at the
same speed; this last step was performed a second time. Finally, the supernatant was
completely discarded and the samples were air-dried in the dark, until further processing.
The products were sequenced in an ABI PRISM® 310 Genetic Analyzer (Applied
Biosystems, USA) available in the laboratory.
Chromatograms were manually checked for errors in SEQUENCHER v4.0.5 (Gene Codes
Co.). For the nuclear fragment, nucleotide ambiguities of similar peak size in chromatograms
were considered as evidence of potential heterozygous sites. The IUPAC ambiguity code was
used for subsequent analyses.
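The coding of heterozygous sites with IUPAC ambiguity symbols can be sketched as follows. The two-base codes in the lookup table are the standard IUPAC ones; the two allele sequences are invented for illustration, and the function names are hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_site(base1, base2):
    """Return the base itself if homozygous, else the IUPAC ambiguity code."""
    if base1 == base2:
        return base1
    return IUPAC[frozenset((base1, base2))]


def genotype_sequence(allele1, allele2):
    """Collapse two aligned alleles into one sequence with ambiguity codes."""
    return "".join(call_site(a, b) for a, b in zip(allele1, allele2))


# Hypothetical alleles differing at positions 3 (G/A) and 5 (A/C)
coded = genotype_sequence("ACGTAC", "ACATCC")
print(coded)
```

Reading a chromatogram works in the opposite direction: a double peak of similar height at one position is recorded as the corresponding ambiguity code.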
A BLAST (Basic Local Alignment Search Tool) search against the NCBI database
(http://blast.ncbi.nlm.nih.gov) was always performed to confirm the fragments' identity. The
                                                              GPS coordinates      Sample size
Species          Country  Site                           Code Latitude  Longitude  cpDNA nuDNA SSRs
Q. suber         Portugal Azeitão                        AZT  38° 30'N  9° 02'W    5     3     -
                          Gerês                          GER  41° 40'N  8° 10'W    5     3     29
                          Serra de Monchique             MON  37° 19'N  8° 34'W    5     3     29
                          Serra da Arrábida              ARR  38° 50'N  9° 03'W    5     4     30
                          Serra São Mamede               SSM  39° 23'N  7° 22'W    5     4     -
                          Serra de Sintra                SIN  38° 45'N  9° 25'W    5     3     30
                          Serra do Buçaco                BUC  40° 22'N  8° 21'W    5     3     30
                          São Brás de Alportel           SBA  37° 20'N  7° 56'W    5     3     -
                          Serra da Estrela               EST  40° 32'N  7° 51'W    5     4     32
                 Tunisia  Mekna, Tabarka                 MEK  36° 57'N  8° 51'E    5     3     28
                          Fermana                        FER  36° 35'N  8° 32'E    3     3     -
                 Algeria  Forêt des Guerbès              ALG  36° 54'N  7° 15'E    5     3     30
                 Italy    Puglia, Brindisi               PUG  40° 34'N  17° 40'E   5     3     22
                          Lazio, Tuscany                 LAZ  42° 25'N  11° 57'E   5     3     -
                          Sicily, Catania                SIC  37° 07'N  14° 30'E   3     3     -
                 France   Landes, Soustons               LAN  43° 45'N  1° 20'W    5     2     -
                          Var, Bomes les Mimoses         VAR  43° 08'N  6° 15'E    3     3     -
                          Corsica, Sartene               COR  41° 37'N  8° 58'E    3     3     -
                 Morocco  Kenitra                        KEN  34° 05'N  6° 35'W    5     3     30
                          Rif, Taza                      TAZ  34° 12'N  4° 15'W    5     3     30
                 Spain    Sierra de Guadarrama           GUA  40° 31'N  3° 45'W    5     3     -
                          Montes de Toledo, Cañamero     TOL  39° 22'N  5° 21'W    5     3     -
                          Haza del Lino                  HAZ  36° 50'N  3° 18'W    5     5     29
                          Sierra Morena, Fuencaliente    MOR  38° 24'N  4° 16'W    4     3     -
                          Cataluña, Sta Coloma de Farnes CAT  41° 51'N  2° 32'W    5     3     30
                          Sierra de Aracena, Jabugo      ARC  37° 54'N  6° 44'W    3     3     -
Q. rotundifolia  Portugal Ermidas do Sado                     38° 00'N  8° 07'W    2     2     -
                          Serra da Arrábida                   38° 50'N  9° 03'W    1     1     -
                          Serra da Estrela                    40° 32'N  7° 51'W    1     1     -
                          Serra de São Mamede                 39° 23'N  7° 22'W    9     6     -
                          Fátima                              39° 37'N  8° 40'W    1     1     -
Q. ilex          France                                       43° 09'N  3° 03'E    2     1     -
Q. coccifera     Portugal Cascais, Aldeia de Juzo             38° 72'N  9° 09'W    5     3     -
Q. faginea       Portugal Serra da Arrábida                   38° 50'N  9° 03'W    1     1     -
Q. pyrenaica     Portugal Serra da Estrela                    40° 32'N  7° 51'W    1     1     -
Q. robur         Portugal Serra da Estrela                    40° 32'N  7° 51'W    1     1     -
Q. canariensis   Portugal Lisbon                              38° 45'N  9° 09'W    1     -     -
Q. lusitanica    Portugal Negrais                             38° 52'N  9° 17'W    1     1     -
Q. rubra         Portugal Lisbon                              38° 45'N  9° 09'W    1     1     -
Q. cerris        Italy    Greve in Chianti                    43° 35'N  11° 18'E   1     1     -
Castanea crenata Portugal Vila Real                                                1     1     -
Total                                                                              148   104   379
Table 2.1: Description of the sampled populations for the several species, and sample size for each marker
(cpDNA and nuDNA sequences, and SSRs).
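The Q. suber sample sizes in Table 2.1 can be cross-checked against the totals quoted in Section 2.1 (26 populations, 119 trees for sequencing, and 379 trees in 13 populations for SSRs). The sketch below simply re-enters the table values by population code and sums them; the data structure and variable names are illustrative.

```python
# (cpDNA, nuDNA, SSR) sample sizes per Q. suber population, copied from Table 2.1.
# None marks populations that were not genotyped for SSRs.
Q_SUBER = {
    "AZT": (5, 3, None), "GER": (5, 3, 29), "MON": (5, 3, 29), "ARR": (5, 4, 30),
    "SSM": (5, 4, None), "SIN": (5, 3, 30), "BUC": (5, 3, 30), "SBA": (5, 3, None),
    "EST": (5, 4, 32), "MEK": (5, 3, 28), "FER": (3, 3, None), "ALG": (5, 3, 30),
    "PUG": (5, 3, 22), "LAZ": (5, 3, None), "SIC": (3, 3, None), "LAN": (5, 2, None),
    "VAR": (3, 3, None), "COR": (3, 3, None), "KEN": (5, 3, 30), "TAZ": (5, 3, 30),
    "GUA": (5, 3, None), "TOL": (5, 3, None), "HAZ": (5, 5, 29), "MOR": (4, 3, None),
    "CAT": (5, 3, 30), "ARC": (3, 3, None),
}

cpdna_total = sum(cp for cp, _, _ in Q_SUBER.values())
nudna_total = sum(nu for _, nu, _ in Q_SUBER.values())
ssr_sizes = [ssr for _, _, ssr in Q_SUBER.values() if ssr is not None]

print(len(Q_SUBER), cpdna_total, nudna_total, len(ssr_sizes), sum(ssr_sizes))
```

The remaining 29 cpDNA and 22 nuDNA samples in the table totals (148 and 104) correspond to the other Quercus species and the Castanea crenata outgroup.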
matK sequence for Quercus crenata was retrieved from GenBank (accession number
FN675334, [29]).
2.3 Microsatellite genotyping
A total of 9 dinucleotide nuclear anonymous microsatellite (nuSSR) markers previously
developed in other oak species were used in this study; one of them, MSQ13, was first
described in Q. macrocarpa Michx. [98], five in Q. petraea (Matt) Liebl. (QpZAG9,
QpZAG15, QpZAG36, QpZAG46 and QpZAG110) [99], and three in Q. robur L.
(QrZAG11, QrZAG7 and QrZAG20) [100]. Transferability of these SSRs to cork oak had
been previously reported [87,88]. These microsatellites are considered as unlinked and
anonymous markers [101]. Amplifications were performed with the primers designed by the
previously mentioned authors and the conditions were as follows: an initial denaturation step
at 94 °C for 5 min followed by 30 cycles consisting of denaturation at 94 °C for 60 s,
annealing at 50 °C for 30 s (specific annealing temperatures in Table S2.1 – Supporting
Information 2), extension at 72 °C for 60 s, and a final extension step at 72 °C for 10 min.
PCRs were performed in a final volume of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR
buffer (Promega), 1U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4
µM of each primer. However, despite following the authors' PCR guidelines, and after several
attempts, the loci QpZAG36 and QpZAG46 yielded no amplification product for most of
the samples, or very unreliable scoring, and were therefore abandoned.
Two nuSSRs developed by Simões de Matos (F. Simões de Matos, PhD thesis, INETI
Lisbon, 2007) specifically for cork oak (QsA11 and QsD8) were also tested. However,
despite several optimization attempts, clear scoring was never possible and these loci were also
discarded.
At the onset of this work there were no EST-derived microsatellites (EST-SSRs) developed specifically
for cork oak. However, since ESTs derive from expressed genes, whose sequences tend to be conserved, primers designed for one species
are likely to work in related ones. Six polymorphic EST-SSRs were therefore selected from
Quercus mongolica (QmOST1, QmD12, QmAJ1, QmDN1, QmDN2, QmDN3) [92]. The locus
names were chosen for this work and correspond, respectively, to the following NCBI dbEST
(http://www.ncbi.nlm.nih.gov/dbEST) accession numbers: DN949770, CR627959,
AJ577265, DN950717, DN949776, and DN950726. The specific primers for
each SSR can be found in Ueno & Tsumura [92] (Supporting Information 2 – Table
S2.2). PCR amplification conditions were as follows: an initial denaturation step at 94 °C for
5 min, followed by 30 cycles of denaturation at 94 °C for 30 s, annealing at 57 °C
for 30 s (specific annealing temperatures in Table S2.2 – Supporting Information 2) and
extension at 72 °C for 30 s, and a final extension step at 72 °C for 10 min. PCRs were
performed in a final volume of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR buffer
(Promega), 1 U Taq polymerase (Promega), 1.5 mM MgCl2, 0.12 mM dNTPs and 0.3 µM of
each primer. After several amplification attempts, the locus QmDN2 yielded no PCR
products.
PCR products were separated by electrophoresis on an ABI PRISM 310 automated sequencer,
and the genotypes were scored and visually checked using the GENEMAPPER software
v3.7 (Applied Biosystems, Inc.). The software MICRO-CHECKER v2.2.3 [102] was used to
identify and correct possible genotyping errors.
2.4 Phylogenetic and phylogeographic analysis
Datasets for each sequenced fragment were aligned in CLUSTAL X v2.0.12 [103,104],
followed by manual refinement in BIOEDIT v7.0.9 [105]. To create the cpDNA concatenated
matrix from the individual datasets of TrnL-F, TrnS-PsbC and TrnH-PsbA fragments, the
CONCATENATOR v1.1.0 software was used [106].
Phylogenetic analysis was performed using PAUP* v4.0.b4a [107]. Maximum parsimony
(MP) analyses were carried out on all data sets. The optimal tree was found by a heuristic
search with tree-bisection–reconnection as the branch-swapping algorithm. Initial trees were
obtained via stepwise addition with 1000 replicates of random addition sequence.
Bootstrapping with 1000 replicates was performed to evaluate the robustness of the nodes of
the phylogenetic trees.
Bayesian analyses (BA) were undertaken using MRBAYES v3.1.2 [108] with the optimal
model selected under the Akaike Information Criterion (AIC), as implemented in
MrMODELTEST v2.3 [109]. For analysis of the combined data, model selection was carried
out separately for each cpDNA data set with MrMODELTEST and then implemented
according to the author's recommendations. Additionally, indels were included and scored as
binary characters (absent/present). The posterior probabilities of the phylogenetic trees were
estimated by a Metropolis-coupled Markov chain Monte Carlo sampling algorithm
(MCMCMC), sampling every 1000th generation. For the individual and combined cpDNA
datasets, Bayesian posterior probabilities were generated from 6×10^6 and 5×10^8 generations,
respectively. For the nuclear fragment dataset, 3×10^6 generations were used to calculate the
Bayesian phylogeny and respective posterior probability values. Each analysis was run three
times with one cold and three incrementally heated Metropolis-coupled Markov chain Monte Carlo
chains, starting from random trees. Ten percent of the generations were discarded as burn-in.
Trees were then combined and summarized in a 50% majority-rule consensus tree.
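The burn-in and consensus steps can be illustrated with a minimal Python sketch. It is not the MRBAYES implementation: each sampled tree is assumed to be represented simply as a set of clades (frozensets of taxon labels), and the function names and toy taxa are hypothetical.

```python
from collections import Counter


def discard_burnin(samples, fraction=0.10):
    """Drop the first `fraction` of the sampled trees."""
    n_burn = int(len(samples) * fraction)
    return samples[n_burn:]


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: posterior frequency} for clades found in more than
    `threshold` of the trees (the clades kept in a majority-rule consensus)."""
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Toy posterior sample of 10 trees over the hypothetical taxa A, B and C.
AB, AC, BC, ABC = (frozenset(s) for s in ("AB", "AC", "BC", "ABC"))
samples = [{BC, ABC}] + [{AB, ABC}] * 6 + [{AC, ABC}] * 3

kept = discard_burnin(samples)          # first 10% removed
consensus = majority_rule_clades(kept)  # clades with frequency > 0.5
for clade, support in consensus.items():
    print(sorted(clade), round(support, 3))
```

The frequency attached to each retained clade plays the role of the clade credibility value reported on the trees.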
Because the cpDNA fragments, when aligned for all oak species, presented several indels,
only the MP and BA analyses were performed, since only these allow indels to be treated
as informative data.
The program NETWORK v4.6 [110] was used to construct a median-joining network of
haplotypes showing the number of mutational steps between them.
2.5 Selective neutrality tests and demographic history
Selective neutrality of each microsatellite locus was examined based on the sampling
distribution of neutral alleles under the infinite-alleles model. The Ewens–Watterson
homozygosity test [111] and the Ewens–Watterson–Slatkin exact test [112,113] were
performed using the absolute allele frequency distribution, as implemented in ARLEQUIN
v3.5 software [114]. In these tests, the expected null distribution of the homozygosity statistic
(Fexp) is generated by simulating random neutral samples, which is then compared with the
homozygosity observed in the original sample (Fobs). If the null hypothesis of selective
neutrality is rejected (p<0.05), an Fobs/Fexp ratio less than 1 implies balancing selection in
favour of heterozygotes and a ratio greater than 1 implies directional selection in favour of
advantageous alleles.
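As an illustration of the statistic involved, the observed homozygosity is the sum of squared allele frequencies, and the direction of the departure is read from the Fobs/Fexp ratio. The sketch below covers only this part; the expected value Fexp is assumed to come from the neutral simulations performed in ARLEQUIN, and the allele counts and the Fexp value are invented for the example.

```python
def observed_homozygosity(allele_counts):
    """F_obs = sum of squared allele frequencies at one locus."""
    n = sum(allele_counts)
    return sum((c / n) ** 2 for c in allele_counts)


def departure_direction(f_obs, f_exp):
    """Interpret the F_obs / F_exp ratio once neutrality has been rejected."""
    ratio = f_obs / f_exp
    if ratio < 1:
        return "balancing selection"
    if ratio > 1:
        return "directional selection"
    return "no departure"


counts = [10, 6, 4]            # hypothetical allele counts at one locus
f_obs = observed_homozygosity(counts)
print(round(f_obs, 3))         # 0.38
print(departure_direction(f_obs, f_exp=0.45))   # f_exp is an assumed value
```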
The mismatch distribution (1000 replicates) was used to infer the demographic history of the
cork oak lineages present in each cpDNA and nuDNA dataset. Pairwise distances between
haplotypes, time since population expansion (τ), and relative population size before (θ0) and after
(θ1) expansion were calculated in ARLEQUIN. Harpending's (1994) raggedness index
(r) and the sum of squared deviations (SSD) were used to assess the statistical significance of the
fit of the distribution to the rapid expansion model, tested with 1000 bootstrap replicates in
ARLEQUIN.
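The two quantities can be sketched as follows, assuming gap-free aligned sequences of equal length. This is a simplified illustration and not ARLEQUIN's code; the raggedness index is computed as the sum of squared differences between successive classes of the mismatch distribution, and the toy sequences are invented.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of sequence pairs differing at 0, 1, 2, ... sites."""
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts.get(i, 0) / len(diffs) for i in range(max(diffs) + 1)]


def raggedness(freqs):
    """Harpending's r = sum over i = 1..d+1 of (x_i - x_{i-1})^2, with x_{d+1} = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


seqs = ["AAAA", "AAAT", "AATT"]        # hypothetical aligned haplotypes
freqs = mismatch_distribution(seqs)    # [0.0, 2/3, 1/3]
print(freqs, round(raggedness(freqs), 4))
```

Smooth, unimodal distributions give small r values, whereas ragged, multimodal ones give larger values.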
Both Tajima's D [116] and Fu's Fs [117] tests were implemented to test deviations from
neutrality. Fu's Fs uses information from the haplotype distribution and is particularly
sensitive to population demographic expansion where low Fs values indicate an excess of
single substitutions usually due to expansion [117,118]. Tajima's D uses the average number
of pairwise differences and number of segregating sites in the intraspecific DNA sequence to
test for departure from neutral expectations, generally assuming negative values in
populations that have experienced size changes, or for sequences that have undergone
selection [116,118]. Fu's Fs and Tajima's D were calculated in ARLEQUIN.
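For reference, Tajima's D can be written out in a few lines of Python using the standard formula. This is a sketch under simplifying assumptions (gap-free alignment, at least four sequences) and is not ARLEQUIN's implementation; returning 0.0 when there are no segregating sites is a convention chosen for the example, and the toy sequences are invented.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D from aligned, gap-free sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are required")
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    if seg_sites == 0:
        return 0.0  # convention assumed here: no variation, D reported as 0
    pairs = list(combinations(seqs, 2))
    # Average number of pairwise differences.
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Excess of singletons (as expected after an expansion): negative D.
d_singletons = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])
# Two divergent haplotypes at equal frequency: positive D.
d_balanced = tajimas_d(["AA", "AA", "TT", "TT"])
print(round(d_singletons, 3), round(d_balanced, 3))
```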
2.6 Genetic diversity and population differentiation
Linkage disequilibrium (LD) between all pairs of polymorphic SSR loci was calculated using
the probability test implemented in GENEPOP v4.0 software [119]. Using the complete
sampling, the nucleotide diversity (π) and its standard deviation, Haplotype diversity (Hd)
and Indel Haplotype diversity (IndelHd) were estimated for each selected sequenced
fragment in DnaSP v10.01 [120].
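A minimal sketch of the two sequence diversity measures is given below, assuming a gap-free alignment of equal-length sequences (DnaSP handles gaps and missing data, which this example does not). Function names and data are illustrative.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs) / len(seqs[0])


def haplotype_diversity(seqs):
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p ** 2 for p in freqs))


seqs = ["AAAA", "AAAT", "AATT"]   # hypothetical aligned sequences
print(round(nucleotide_diversity(seqs), 4), round(haplotype_diversity(seqs), 4))
```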
Gene diversity statistics (gene diversity He [121] and allelic richness A) were estimated for
microsatellites using the program FSTAT v2.9.3.2 [122,123]. Allelic richness (A) was
corrected using the rarefaction method based on a minimum sample size of 21 diploid
individuals, which corresponded to the smallest number of individuals successfully
genotyped for a given locus in a population. The private alleles were calculated in GenAlEx
v6.3 [124]. The inbreeding coefficient Fis [125] was calculated using ARLEQUIN and its
deviation from zero tested by 10,000 allele permutations. Population differentiation was
calculated by FST [125] and RST [126] in ARLEQUIN.
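The rarefaction idea can be sketched as follows: allelic richness is the number of alleles expected in a subsample of g gene copies, which makes populations with different sample sizes comparable. The code is a simplified illustration and not the FSTAT implementation; the gene diversity function uses a simple sample-size correction, and the allele counts are invented. For 21 diploid individuals, g would be 42.

```python
from math import comb


def gene_diversity(allele_counts):
    """Unbiased expected heterozygosity from allele counts (gene copies)."""
    n = sum(allele_counts)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in allele_counts))


def rarefied_richness(allele_counts, g):
    """Expected number of alleles in a random subsample of g gene copies."""
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample larger than the sample")
    # Each allele contributes the probability of being drawn at least once.
    return sum(1 - comb(n - c, g) / comb(n, g) for c in allele_counts)


counts = [3, 1]    # hypothetical counts of two alleles among four gene copies
print(gene_diversity(counts))          # 0.5
print(rarefied_richness(counts, 2))    # 1.5
```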
SMOGD software v1.2.5 [127] was used to measure the actual differentiation among
populations (Dest) according to Jost [128], the standardized measure of genetic
differentiation G'ST [129] and the nearly unbiased estimator of relative differentiation GST [130].
Pairwise genetic differentiation between populations was estimated with FST, RST and Dest, in
FSTAT, ARLEQUIN and SMOGD, respectively. Standard Bonferroni corrections were
applied to account for multiple testing.
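The relationship between the three measures can be made concrete with a short sketch that computes them from within-population (HS) and total (HT) heterozygosity at one locus. It uses plain frequency-based values rather than the sample-size-corrected estimators implemented in SMOGD, and the allele frequencies are invented.

```python
def differentiation(pop_freqs):
    """Return (GST, G'ST, Jost's D) for one locus.

    pop_freqs: list of {allele: frequency} dictionaries, one per population.
    """
    n = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    hs = sum(1 - sum(p ** 2 for p in f.values()) for f in pop_freqs) / n
    mean_p = {a: sum(f.get(a, 0.0) for f in pop_freqs) / n for a in alleles}
    ht = 1 - sum(p ** 2 for p in mean_p.values())
    if ht == 0:
        return 0.0, 0.0, 0.0   # locus monomorphic across all populations
    gst = (ht - hs) / ht
    g_prime = gst * (n - 1 + hs) / ((n - 1) * (1 - hs))
    jost_d = (ht - hs) / (1 - hs) * n / (n - 1)
    return gst, g_prime, jost_d


# Two hypothetical populations that share no alleles at all.
pops = [{"a": 0.5, "b": 0.5}, {"c": 0.5, "d": 0.5}]
gst, g_prime, jost_d = differentiation(pops)
print(round(gst, 3), round(g_prime, 3), round(jost_d, 3))
```

In this toy case the populations are completely differentiated, yet GST is only about 0.33, while G'ST and D both reach 1; this is the behaviour that motivates reporting the standardized measures alongside GST.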
Geographic patterns of genetic differentiation were tested by regressing genetic
differentiation (FST) against geographic distance between pairs of samples, following Rousset
[131], i.e. FST/(1-FST) against the logarithm of the geographic distance between populations. Reduced
major axis regression was used to fit this relationship, using the IBDWS v3.03 software
[132]. Mantel tests were used to test the null hypothesis of no relationship between the
genetic and geographic matrices.
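The isolation-by-distance procedure can be outlined in plain Python. This is a simplified, one-sided permutation version written for illustration and not the IBDWS implementation; the five sampling positions and the FST values are synthetic, constructed so that FST/(1-FST) is exactly linear in log distance.

```python
import math
import random


def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rma_slope(x, y):
    """Reduced major axis slope: sign(r) * sd(y) / sd(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return math.copysign(math.sqrt(syy / sxx), pearson(x, y))


def mantel(gen, geo, permutations=999, seed=1):
    """Mantel correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    x = upper_triangle(geo)
    r_obs = pearson(x, upper_triangle(gen))
    n, hits = len(gen), 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel populations in the genetic matrix
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Synthetic example: five populations along a transect (positions in km).
positions = [1, 2, 4, 8, 16]
n = len(positions)
geo = [[math.log(abs(a - b)) if a != b else 0.0 for b in positions] for a in positions]
y = [[0.02 + 0.01 * geo[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
fst = [[v / (1 + v) for v in row] for row in y]
gen = [[f / (1 - f) for f in row] for row in fst]      # Rousset's FST/(1-FST)

r_obs, p_value = mantel(gen, geo)
slope = rma_slope(upper_triangle(geo), upper_triangle(gen))
print(round(r_obs, 3), round(slope, 4), p_value)
```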
2.7 Genetic structure of populations
The Bayesian clustering method implemented in STRUCTURE v2.3.3 [133] was used to
determine the genetic structure of the sampled populations for the microsatellite loci. Because
preliminary analyses showed that overall differentiation was low, the more recent clustering model
was used, which is based not only on the individual multilocus genotypes but also takes into
account the sampling locations [134]. This LocPrior model considers that the prior
distribution of cluster assignments can vary among populations. The approach is
recommended by the authors when the genetic data alone are not informative enough to
detect population structure. A parameter r indicates the extent to which the sampling
locations are informative (small values, <1, indicate that locations are informative). Twenty
independent runs were performed, following a Markov chain Monte Carlo (MCMC) scheme, for
each value of K (the number of putative clusters) ranging from 1 to 13 (the number of
populations sampled). The admixture model with sampling locations as prior information
[134] was selected and correlated allele frequencies among populations were assumed [135].
Each run consisted of an MCMC length of 1,000,000 iterations and a burn-in of 50,000. The
posterior probability of the data for a given K, LnP(D), was used to identify the most probable number
of clusters, using both the ΔK (DeltaK) ad hoc statistic [136] and the guidelines of the software
documentation [133]. Once the most likely K value was determined, the run with the highest
posterior probability and lowest variance was chosen for interpreting the results. Final results
from STRUCTURE were visualized using the software DISTRUCT v1.1 [137].
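The ΔK statistic is the absolute second-order rate of change of the mean LnP(D) across successive K values, divided by the standard deviation of LnP(D) among runs at that K. A minimal sketch is given below; the LnP(D) values are invented (two runs per K for brevity, whereas twenty runs were used here), and the guard for a zero standard deviation is an assumption of the example.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K for every K that has both K-1 and K+1 available.

    lnp_by_k: {K: [LnP(D) of each independent run]}.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # not defined at the ends of the K range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical LnP(D) values.
runs = {1: [-1000, -1002], 2: [-900, -902], 3: [-890, -892], 4: [-885, -887]}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print({k: round(v, 2) for k, v in dk.items()}, best_k)
```

Because ΔK is undefined for the smallest and largest K tested, it cannot by itself support K = 1, which is one reason to consult LnP(D) directly as well.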
The degree of population subdivision was also explored with the R package
GENELAND v3.2.4 [138]. This approach determines the number of groups (K) using a
Bayesian clustering model run in an MCMC scheme to detect the location of genetic
discontinuities using individual geo-referenced multilocus genotypes [139]. GENELAND
uses the geographical locations of individuals as prior information. This model treats the number
of clusters as a parameter processed by the MCMC scheme without any approximation and
may provide a better estimate of the number of clusters than other proposed procedures that
do not take the geographical locations into account [139,140]. Twenty independent MCMC
runs were performed, allowing K to vary from 1 to 13 (the number of populations sampled),
with the following parameters: 1,000,000 iterations, of which every hundredth was saved
(after 10% burn-in), treating the number of genetic clusters as unknown and using the Dirichlet
model for allele frequencies (assumed as correlated).
Results obtained following the GENELAND and STRUCTURE approaches were further
tested with an Analysis of Molecular Variance (AMOVA) approach [141].
3. Results
3.1 Sequencing of chloroplast and nuclear DNA fragments
3.1.1 cpDNA and nuDNA diversity levels
Initially, 148 samples were sequenced for the cpDNA fragments studied (Table 2.1). The
cpDNA fragments, when aligned for all oak species, presented several indels. When
ambiguous alignments were produced, several slightly different alignments, including the
removal of the ambiguous positions or indels, were tested, without producing any major
differences in the results. The nucleotide diversity found in each dataset was 0.00400 (+/-
0.0009), 0.00925 (+/- 0.0007) and 0.00549 (+/- 0.0004) for the fragments TrnS/PsbC,
TrnH/PsbA and TrnL-F, respectively (Table 3.1). For the cork oak samples (119 out of the
148 individuals), a total of 8 TrnS/PsbC haplotypes, 7 TrnH/PsbA haplotypes and 5 TrnL-F
haplotypes were obtained. For the cpDNA concatenated dataset, 17 cork oak haplotypes were
detected, with a nucleotide diversity of 0.00658 (+/- 0.0005) (Table 3.1). After a preliminary
analysis of the cpDNA sequences, 104 samples (out of the 148) from the main groups were
selected and sequenced for the candidate gene EST 2T13 (Table 2.1). The alignment was
straightforward, showing no potential heterozygous sites for the cork oak samples. The
nucleotide diversity estimated for the EST 2T13 fragment was 0.02387 (+/- 0.0126) (Table 3.1),
with 8 cork oak haplotypes.
Table 3.1: The length (bp), number of parsimony informative sites (PI) and
estimated nucleotide diversity (π) and its standard deviation for each dataset,
using the complete sampling.

Dataset             Length (bp)  Variable sites  Total characters  Indels  PI  π (+/- SD)
Individual cpDNA
  TrnS/PsbC         250          20              238               12      15  0.00400 +/- 0.0009
  TrnH/PsbA         478          54              448               30      34  0.00925 +/- 0.0007
  TrnL-F            381          18              374               7       14  0.00549 +/- 0.0004
Concatenated
  TrnS/TrnH/TrnL    1109         92              1060              49      63  0.00658 +/- 0.0005
Individual nuDNA
  EST 2T13          249          48              240               9       20  0.02387 +/- 0.0126
3.1.2 Differentiation patterns
Maximum parsimony (MP) trees for the cpDNA fragments TrnS/PsbC, TrnH/PsbA and
TrnL-F are presented in Fig. 3.1a, Fig. 3.2 and Fig. 3.3, respectively. The trees derived from the
Bayesian analyses (BA) were very similar to those of the MP analysis; therefore, the MP tree of
each fragment is presented with the respective bootstrap and clade
credibility values. The concatenated tree supported the results of the individual trees
(Supporting Information 3). In all the cpDNA phylogenetic trees (Fig. 3.1a, Fig. 3.2 and Fig.
3.3), four major groups were distinguished and named Groups A, B, C and D. Group
A (highlighted in yellow in the figures) is composed exclusively of samples of the subgenus
Cerris, namely cork oak samples from several populations and Q. cerris. Group B is
a more complex group, since it is composed of samples of several Quercus species, namely Q.
suber (highlighted in orange in the trees; subgenus Cerris), and Q. coccifera (green), Q. ilex ilex
(pink) and Q. ilex rotundifolia (red) of the subgenus Sclerophyllodrys. Considering the
presence of cork oak samples in these two groups, and that the samples present in each group
were always the same for all the cpDNA fragments, the haplotypes belonging to Group A
were considered a pure lineage of cork oak, while the samples belonging to Group B were
considered an introgressed lineage of cork oak. Group C (highlighted in blue) is composed
of several Quercus species, specifically Q. faginea, Q. robur, Q. pyrenaica, Q. lusitanica
and Q. canariensis, from the subgenus Quercus. Finally, Group D is constituted by Q. rubra
of the subgenus Erythrobalanus.
In particular, Group A is composed of 92 cork oak samples (out of the 119) and the sample of
Q. cerris, and was characterized by low levels of variation and a low number of haplotypes (Table
3.2). This was particularly evident for the TrnL-F fragment, for which only one haplotype was
found (Fig. 3.3), and for the TrnH/PsbA fragment, where again only one major haplotype is
present, although two derived low-frequency haplotypes are present in Puglia (Italy) (Fig. 3.2
and Table 3.2). In both these fragments Q. cerris shares the same haplotype as Q. suber.
Higher variation was found for the TrnS/PsbC fragment in the cork oak pure lineage (Table 3.2),
allowing the distinction of three sublineages, which were named A1, A2 and A3 (Fig. 3.1 and
Table 3.2). In Fig. 3.1b, a reconstruction of the phylogenetic tree for the pure lineage of the
TrnS/PsbC fragment shows the major haplotypes of each sublineage and the mutational
events that occurred during the formation of those haplotypes. The sublineage A1 (sl A1) is
exclusive to the island of Sicily, the sublineage A2 (sl A2) is present in West Mediterranean
Figure 3.1: a) Maximum parsimony tree of the cpDNA TrnS/PsbC intergenic spacer region. Four groups are
represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage and Q. cerris (Bright
Yellow – Sublineage A2 (Sl A2); Brownish-Yellow – Sublineage A3 (Sl A3); Light Yellow – Sublineage A1
(Sl A1)); Group B (orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink –
Q. ilex); Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q.
pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra.
Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and
the Bayesian credibility value; b) Detailed phylogenetic reconstruction of the sublineages from Group A.
Bootstrap support and Bayesian credibility value are provided above each branch. The site combinations below
each branch represent the mutational events that occurred along the evolution of the three sublineages.
Figure 3.2: Maximum parsimony tree of the cpDNA TrnH/PsbA intergenic spacer region. Four groups are
represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: cork oak's
introgressed lineage – orange; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted
in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and
Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the
bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.
Figure 3.3: Maximum parsimony tree of the cpDNA TrnL-F intergenic spacer region. Four groups are
represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: cork oak's
introgressed lineage – orange; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted
in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and
Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the
bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.
Fragment    Lineage                    Hd     Indel Hd  nr H
TrnS/PsbC   Pure lineage (A)           0.000  0.471     4
            Introgressed lineage (B)   0.000  0.649     4
TrnH/PsbA   Pure lineage (A)           0.024  0.024     3
            Introgressed lineage (B)   0.442  0.695     5
TrnL-F      Pure lineage (A)           0.000  0.000     1
            Introgressed lineage (B)   0.613  0.413     4
EST 2T13    Pure lineage (α)           0.494  0.028     6
            Introgressed lineage (β)   0.000  0.182     2

Table 3.2: Haplotype diversity (Hd), indel haplotype diversity (Indel Hd) and number of
haplotypes (nr H) for each cpDNA and nuclear fragment, according to the pure and
introgressed cork oak lineages.
populations, and the sublineage A3 (sl A3) is present in East Mediterranean populations (Fig.
3.4). For this fragment, Q. cerris shows a haplotype derived from the sublineage A3
(Fig. 3.1a).
Group B is composed of all the Q. coccifera and Q. ilex (subspecies ilex and
rotundifolia) samples that were analysed, as well as 27 of the 119 cork oak
samples. Most of the cork oak haplotypes that belong to this group are shared with, or
seem to be derived from, the haplotypes present in the other species.
The cork oak samples in Group B (the introgressed lineage) generally presented more
variability than those in Group A (the pure lineage) (Table 3.2).
All three cpDNA fragments seem able to distinguish groups C and D, albeit with
low resolution, which is most evident in the TrnL-F fragment. These groups are also quite inconsistent
regarding their position in the trees, as well as their phylogenetic relationships with each other
and with the other groups (Fig. 3.1a, Fig. 3.2 and Fig. 3.3).
The analysis of the barcode matK fragment provided roughly the same nucleotide diversity
(π=0.0067 +/- 0.00095) as the remaining cpDNA fragments analysed, whereas the rbcL
fragment presented no variation for any of the species. The analysis of the matK fragment and
the resulting phylogenetic tree (Fig. 3.5) corroborated the presence of two cork oak lineages.
Quercus cerris and Quercus crenata are classified, both by classic taxonomy [17] and recent
DNA barcode analysis [29], as the species most closely related to Q. suber (subgenus Cerris),
and these species appear with the same haplotype as the cork oak samples characterized as pure
lineage (subgenus Cerris). Cork oak samples characterized as the introgressed lineage appear
closely related to Quercus ilex ilex, Quercus ilex rotundifolia and Quercus coccifera
(subgenus Sclerophyllodrys).
Figure 3.4: Geographical distribution of cork oak cpDNA haplotype lineages according to the TrnS/PsbC
fragment. Pie charts represent the haplotype frequencies in the analysed populations. Pie chart sizes reflect
the number of samples per population (3-5). Colour codes reflect those in the TrnS/PsbC tree (Fig. 3.1);
Yellow: cork oak's pure lineage (Bright Yellow – Sublineage A2; Brownish-Yellow – Sublineage A3;
Light Yellow – Sublineage A1); Orange: cork oak's introgressed lineage. The present
distribution of the species is shown in grey.
Figure 3.5: Maximum parsimony tree of the cpDNA fragment matK. Four groups are represented and
classified according to classic taxonomy (Tutin et al. 1993), following the four subgenera identified by
Schwartz 1964 (Sclerophyllodrys, Cerris, Quercus and Erythrobalanus). Numbers at the nodes are the
bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility
value.
For the nuclear fragment EST 2T13, the best model of sequence evolution for the BA was
calculated, and the resultant tree presented a very similar topology to that of the MP. The MP
tree is shown in Fig. 3.6a with the respective bootstrap values of MP and clade credibility
values for the BA. The nuclear tree reflects the same pattern as the cpDNA trees. The same
four major groups are present, here named as Group α, β, γ and δ.
Group α is constituted by Q. suber (40 samples out of the 52 used; Table 2.1) and Q. cerris,
similarly to the cpDNA results. The pattern of three sublineages is also present (α1, α2 and
α3), although in the nuclear fragment Q. cerris appears to share the same haplotype as the
cork oak samples from sublineage α3. In Fig. 3.6b, a reconstruction of the phylogenetic tree
for the nuclear pure lineage of the EST 2T13 fragment shows the major haplotypes of each
sublineage and the six mutational events that occurred during the formation of those
haplotypes. Group β, as in the cpDNA trees, is constituted by cork oak, Q. coccifera and Q.
ilex samples. Also as in the cpDNA trees, Group γ is composed of the Quercus species
belonging to the subgenus Quercus, and Group δ is constituted by Q. rubra from the subgenus
Erythrobalanus. The phylogenetic relationships between the four groups most closely
resemble those of the cpDNA fragment TrnH/PsbA tree. The major difference found
between the nuDNA and the cpDNA datasets is that the cork oak samples that compose Group α
(which one could call the nuclear pure lineage) and its sublineages, and Group β (the
nuclear introgressed lineage), are not always the same when comparing the fragments
from both genomes. In particular, sublineage α1 in the nuclear DNA is not exclusively
composed of cork oak samples from the island of Sicily, as in the cpDNA fragments, but includes
samples that in the cpDNA belonged to sublineage A3; sublineage α2 is not completely West
Mediterranean in the nuclear DNA, including samples that in the cpDNA belong to sublineage A3 as well as
to the introgressed lineage; and sublineage α3 also loses its exclusiveness to East
Mediterranean populations, being constituted by samples that in the cpDNA trees belong to
sublineage A2 and to the introgressed lineage (Fig. 3.7). Another difference between the
cpDNA and the nuclear DNA was the arrangement of the cork oak samples that compose
Group β. These samples do not share the same haplotypes with Q. ilex and Q. coccifera as
they did in the cpDNA fragments. Instead they present a major haplotype derived from a
Figure 3.6: a) Maximum parsimony tree of the candidate gene EST 2T13. Four groups are represented and color
coded. Group α is highlighted in yellow: cork oak's pure lineage and Q. cerris (Bright Yellow – Sublineage α2;
Brownish-Yellow – Sublineage α3; Light Yellow – Sublineage α1); Group β (Orange – cork oak's introgressed
lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group γ is highlighted in dark blue and is
composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica;
Group δ is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the bootstrap
support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value. b) Detailed
phylogenetic reconstruction of the sublineages from Group α. Bootstrap support and Bayesian credibility value
are provided above each branch. The site combinations below each branch represent the 6 mutational events
that occurred along the evolution of the three sublineages.
Figure 3.7: Geographical distribution of cork oak nuDNA EST 2T13 haplotype lineages. Pie charts
represent the haplotype frequencies in the analysed populations. Pie chart sizes reflect the number of
samples per population (3-5). Colour codes reflect those in the EST 2T13 tree (Fig. 3.6); Yellow: cork oak's
nuclear pure lineage (Bright Yellow – Sublineage A2'; Brownish-Yellow – Sublineage A3'; Light Yellow –
Sublineage A1'); Orange: cork oak's nuclear introgressed lineage. The present
distribution of the species is shown in grey.
common ancestor, shared with Q. ilex and Q. coccifera. The diversity levels in the nuclear
pure lineage are not as low as in the cpDNA (Table 3.2), showing 6 haplotypes.
However, the diversity levels of the cork oak samples that belong to the nuclear introgressed
lineage in Group β are lower than the diversity of the other species in this
group (Table 3.2 and Fig. 3.6). In Group γ, however, the diversity is higher than in the
cpDNA, since each species is characterized by its own haplotype (Fig. 3.6).
Median-joining analysis of the cpDNA fragments resulted in haplotype networks (Supporting
Information 4) reflecting the four major groups in the trees and the shared haplotypes for Q.
suber, in clade B, with Quercus coccifera, Q. rotundifolia and Q. ilex.
3.1.3 Mismatch distribution and neutrality tests
Demographic histories of both cork oak lineages were evaluated with mismatch distributions
and tests of the standard neutral model for a demographically stable population (Tajima's D
[116] and Fu's Fs [117]) (Table 3.3). The TrnS/PsbC fragment sequence analysis provided
slightly contradictory results. The null hypothesis of population demographic expansion was
not rejected based on the mismatch distribution for either of the lineages (pure lineage:
SSD=0.098, p=0.061; r=0.436, p=0.164; introgressed lineage: SSD=0.026, p=0.086;
r=0.186, p=0.055), but the p values are somewhat marginal and these statistics are conservative
and use little of the information in the data. Detecting changes in population size can be
difficult with small sample sizes or few haplotypes, or when the population has experienced a
very recent expansion. Fu's Fs has been shown to be more powerful than mismatch
distributions in detecting both very recent and older population expansions [117,118], and
this statistic (like Tajima's D) did not support population expansion for either of the
lineages (Table 3.3). The TrnH/PsbA fragment analysis for the pure lineage showed strong
evidence of recent population expansion, given the non-significant sum of squared deviations
(SSD=0.000, p=0.183) and Harpending's raggedness index (r=0.822, p=0.837), and the
significant (p<0.001) negative value of Fu's Fs. Tajima's D, although not significant
(p=0.119), was also negative (D=-1.047). The TrnH/PsbA introgressed lineage
presented no evidence of population expansion: the SSD and r values rejected the null hypothesis
of expansion, which is supported by the values of D and Fs. It was not possible to calculate the
mismatch distribution and Fu's Fs for the pure lineage of the TrnL-F fragment because only one
haplotype is present. The TrnL-F introgressed lineage presented a mismatch distribution that
did not depart significantly from the stepwise growth model, although the p value was marginal
(SSD=0.132, p=0.076), whereas Harpending's raggedness index was significant (r=0.491,
p=0.019), rejecting the stepwise population expansion model. Fu's Fs and Tajima's D values
were positive and not significant, also failing to support population expansion (Table 3.3).
The nuDNA fragment EST 2T13 was also evaluated for its demographic history and
neutrality. For the nuclear pure lineage the null hypothesis of demographic expansion based
on the Harpending's raggedness index of the mismatch distribution was not rejected
(r=0.145, p=1.00). However the SSD value rejected the null hypothesis at a highly significant
level (SSD=0.328, p=0.00), supported by the non-significant values of Fu's Fs and Tajima's
D (although both values are negative) (Table 3.3). For the nuclear introgressed lineage the
mismatch analysis indicates demographic expansion, although this is not supported by Fu's
Fs and Tajima's D tests (Table 3.3).
                              Mismatch                                              Tajima's D   Fu's Fs
                              τ        θ0      θ1          SSD         r            D            Fs
TrnS-PsbC
  Pure lineage (A)            2.648    0.000   1.115       0.098 ns    0.436 ns     0.000 ns     0.947 ns
  Introgressed lineage (B)    0.959    0.000   99999.000   0.026 ns    0.186 ns     0.000 ns     -0.271 ns
TrnH-PsbA
  Pure lineage (A)            3.000    0.000   0.050       0.000 ns    0.882 ns     -1.047 ns    -3.773 ***
  Introgressed lineage (B)    10.273   0.000   8.887       0.236 **    0.461 ***    0.047 ns     5.891 ns
TrnL-F
  Pure lineage (A)            -        -       -           -           -            0.000 ns     -
  Introgressed lineage (B)    2.484    0.002   3.000       0.132 ns    0.491 *      0.771 ns     1.290 ns
EST 2T13
  Pure lineage (α)            0.000    0.000   3413.950    0.328 ***   0.145 ns     -1.323 ns    -0.741 ns
  Introgressed lineage (β)    2.965    0.450   0.450       0.021 ns    0.457 ns     0.000 ns     -0.176 ns
3.2 Microsatellite analysis
3.2.1 Genetic diversity values
For the EST-SSR markers, the QmDN1 locus was apparently monomorphic and was discarded
from all subsequent analyses. Global evaluation of the microsatellite data set using
MICRO-CHECKER [102] revealed no evidence of genotyping errors due to stuttering or large allele
dropout, but identified possible null alleles at two markers: QmOST1 and QmDN3. For the
QmOST1 locus there is a possibility, although marginal, of null alleles in the
populations HAZ and MEK (Supporting Information 5 – Fig. 5.1). The QmDN3 locus
showed evidence of null alleles in all populations and was therefore eliminated from all
subsequent analyses (Supporting Information 5 – Fig. 5.2). For the three remaining
EST-SSRs, no linkage disequilibrium between loci was detected (Supporting Information 6).
The total number of alleles (NA) in each population ranged from nine to fourteen and the
allelic richness (A) from 3.000 to 4.109, with the SIN population clearly presenting the
highest number of alleles and consequently the highest allelic richness. Gene diversity (expected
heterozygosity over loci) ranged from 0.400 in SIN to 0.598 in CAT (Table 3.4). Only the
population of SIN departed significantly from Hardy-Weinberg equilibrium at the 0.01
Table 3.3: Estimates of mismatch distribution parameters and neutrality tests. τ (tau) = time since population
expansion; θ = relative population size before (θ0) and after (θ1) expansion; SSD = sum of squared deviations; r
= Harpending's raggedness index; D = Tajima's D; Fs = Fu's Fs; ns = not significant; * significant at p<0.05;
*** significant at p<0.001
significance level (Table 3.4). The inbreeding coefficient for the SIN population was positive
(Fis=0.1691), and as the species, although monoeicious, presents a protandrous system to
ensure cross-pollination, significant deviation from zero should reflect biparental inbreeding
or population substructure (Table 3.4). In total, 18 alleles were identified at the three loci, and
7 alleles were exclusive to a single population (private alleles). Of these, 5 were exclusive to
SIN and the others to MEK and CAT (Table 3.4). The private alleles were at the extremes of
the allele size distribution and occurred at very low frequencies.
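As a rough cross-check of the inbreeding coefficients, the simple relation Fis = 1 - Ho/He can be applied to the heterozygosities reported in Table 3.4. The sketch below is illustrative only: the function name is hypothetical, and this plain ratio is an assumption rather than the estimator actually used by the software in this work, so small differences from the tabulated Fis values are to be expected.

```python
def inbreeding_coefficient(h_obs: float, h_exp: float) -> float:
    """Simple estimate Fis = 1 - Ho/He (no sample-size correction)."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# (Ho, He) for the EST-SSRs, copied from Table 3.4
est_ssr_heterozygosity = {
    "SIN": (0.333, 0.400),
    "KEN": (0.700, 0.570),
    "BUC": (0.422, 0.422),
}

for code, (h_obs, h_exp) in est_ssr_heterozygosity.items():
    # A positive value indicates a heterozygote deficit, a negative one an excess.
    print(code, round(inbreeding_coefficient(h_obs, h_exp), 4))
```

A positive result, as for SIN, corresponds to fewer heterozygotes than expected under Hardy-Weinberg equilibrium; the sign and magnitude agree with the tabulated values only approximately.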
Table 3.4: Populations of Quercus suber sampled for the molecular genetic work with SSRs, including country
and population abbreviations, number of total alleles (NA), allelic richness (A), number of private alleles (PA),
expected (He) and observed (Ho) heterozygosities, and within-population inbreeding coefficients (Fis).
                 EST-SSRs (3 loci)                         nuSSRs (5 loci)
Country   Code+  NA  A      PA  Fis         Ho     He      NA  A      PA  Fis         Ho     He
Portugal  ARR    10  3.315  -    0.0162 ns  0.522  0.531   34  6.165  3    0.1064 ns  0.538  0.601
          BUC    10  3.025  -   -0.0005 ns  0.422  0.422   34  5.997  1    0.0399 ns  0.553  0.576
          EST    10  3.167  -    0.2190 ns  0.375  0.479   32  5.703  -    0.1449 ns  0.506  0.591
          GER    10  3.228  -   -0.0840 ns  0.506  0.467   30  5.635  -    0.0586 ns  0.557  0.591
          MON    11  3.479  -    0.1769 ns  0.393  0.476   35  6.494  1    0.1579 *   0.521  0.617
          SIN    14  4.109  5    0.1691 **  0.333  0.400   33  6.158  1   -0.0665 ns  0.621  0.583
Algeria   ALG    10  3.202  -   -0.0055 ns  0.522  0.519   38  6.759  1    0.0920 ns  0.510  0.548
Spain     CAT    11  3.465  1   -0.0218 ns  0.611  0.598   29  5.354  -    0.0628 ns  0.533  0.569
          HAZ    10  3.230  -    0.1188 ns  0.512  0.580   32  5.796  1    0.1181 ns  0.469  0.531
Morocco   TAZ    10  3.276  -    0.1083 ns  0.478  0.535   34  6.138  1    0.0261 ns  0.538  0.552
          KEN    11  3.365  -   -0.2336 ns  0.700  0.570   26  4.938  -    0.0433 ns  0.630  0.658
Tunisia   MEK    11  3.480  1    0.1689 ns  0.441  0.528   29  5.433  -    0.0348 ns  0.508  0.526
Italy     PUG     9  3.000  -    0.0403 ns  0.444  0.463   28  5.600  -    0.0642 ns  0.591  0.630
+See Fig. 3.4 for visual location on a map of Europe.
Significance levels after Bonferroni corrections: ns = not significant; * significant at p<0.05; ** significant at
p<0.01
For the nuSSRs, as previously shown by Burgarella et al. [142], the locus MSQ13 appears to
be particularly informative for detecting F1 hybrids between Q. suber and Q. rotundifolia because
the allele sizes do not overlap [88,142]. The locus was tested in some individuals from each
population (including all the individuals detected as belonging to the introgressed
lineages) and proved to be monomorphic at the expected allele size for Q. suber. Thus, the
locus was not used in the following analyses. Global evaluation of the microsatellite data set
using Micro-Checker revealed no evidence of genotyping errors due to stuttering or large
allele dropout, but identified possible null alleles in a few populations for markers
QpZAG110, QrZAG20, QrZAG11 and QpZAG15 (Supporting Information 5 – Fig. 5.3, Fig.
5.4, Fig. 5.5). Also, the QpZAG15 locus revealed a departure from Hardy-Weinberg
equilibrium (HWE) in 9 populations (data not shown). For these reasons, this locus was
removed from subsequent analyses. No linkage disequilibrium between the remaining loci
was detected (Supporting Information 6). The total number of alleles (NA) in each population
ranged from 26 (KEN) to 36 (ALG) and the allelic richness (A) from 4.938 to 6.759. The
gene diversity (expected heterozygosity over loci) ranged from 0.526 in Mekna (MEK) to 0.658 in
Kenitra (KEN) (Table 3.4). These values are slightly higher than those obtained for the EST-SSRs in
every population, with the exception of the Spanish and Tunisian populations. Only the
MON population departed significantly from HWE at the 0.05 significance level, after
Bonferroni correction (Table 3.4). Fis for the MON population was positive, which
could reflect biparental inbreeding or population substructure (Table 3.4). However, in this
case, considering that Micro-Checker marginally detected null alleles for this population at
the QrZAG20 locus, an effect of null alleles cannot be ruled out. In total, 56 alleles were identified at the
five loci, of which nine were private alleles. Unlike those of the EST-SSRs, the private alleles
showed no particular distribution over the populations, although there was a slight tendency towards
the ARR population, which holds 3 of the nine alleles (Table 3.4). The private alleles were
mostly at the extremes of the allele size distribution and occurred at low frequencies.
No microsatellite (either nuSSR or EST-SSR) revealed evidence of non-neutrality in the
Ewens–Watterson and Ewens–Watterson–Slatkin tests (data not shown).
3.2.2 Genetic differentiation among populations
Different coefficients of genetic differentiation among populations were estimated for both
types of SSR markers (Table 3.5). With the exception of Dest, all the coefficients displayed higher
values for the EST-SSRs than for the nuSSRs, and consistently in both marker types G'ST and D
displayed slightly higher values than FST, GST and RST (Table 3.5). GST and FST showed that differentiation
among populations was more than double in the case of EST-SSRs (GST=0.066 EST-SSRs vs.
0.031 nuSSRs; FST=0.071 EST-SSRs vs. 0.032 nuSSRs). Nevertheless, for the remaining
coefficients (RST, G'ST and D) the differences between the markers are not significant and,
interestingly, the value of actual differentiation among populations calculated according to
Jost's D was the same for both types of SSRs (Dest=0.077) (Table 3.5).
Table 3.5: Genetic statistics for EST-SSRs and nuSSRs. Number of alleles (NA),
allelic richness (A), observed (Ho) and expected (He) heterozygosities, FST
differentiation among populations according to Weir and Cockerham [125]; RST
differentiation among populations according to Slatkin [126], GST proportion among
population differentiation according to Nei & Chesser [130], G'ST standardized measure
of genetic differentiation according to Hedrick [129], and Dest estimator of actual
differentiation according to Jost [128]
Locus      NA  A       Ho     He     FST    RST     GST    G'ST   Dest
EST-SSRs
QrOST1      9   4.320  0.540  0.610  0.064   0.038  0.060  0.148  0.093
QpD12       3   2.999  0.403  0.474  0.139   0.142  0.130  0.229  0.114
QmAJ1       6   3.182  0.501  0.542  0.017   0.019  0.020  0.045  0.025
All        18   1.750  0.481  0.542  0.071   0.066  0.066  0.141  0.077
nuSSRs
QpZAG110   23  13.149  0.817  0.872  0.022  -0.004  0.023  0.169  0.149
QpZAG9      7   3.124  0.138  0.142  0.014   0.015  0.014  0.016  0.003
QrZAG20     5   3.894  0.449  0.557  0.035   0.042  0.036  0.081  0.047
QrZAG7     10   6.625  0.689  0.756  0.057   0.138  0.055  0.204  0.158
QrZAG11    11   5.755  0.580  0.628  0.010   0.032  0.015  0.042  0.027
All        56   6.509  0.535  0.591  0.032   0.045  0.031  0.102  0.077
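To illustrate why G'ST and D can exceed GST at highly variable loci (for example QpZAG110 in Table 3.5), the sketch below computes the three coefficients from the mean within-population heterozygosity (Hs) and the total heterozygosity (Ht). It uses the basic, uncorrected forms of GST, Hedrick's standardization and Jost's D; the function name and the input values are hypothetical, and the sample-size corrections applied by the estimators used in this work are deliberately omitted.

```python
def differentiation_stats(h_s: float, h_t: float, n_pops: int) -> dict:
    """Uncorrected GST, Hedrick's G'ST and Jost's D from Hs, Ht and k populations."""
    if n_pops < 2 or not (0.0 <= h_s < 1.0) or h_t <= 0.0:
        raise ValueError("need n_pops >= 2, 0 <= Hs < 1 and Ht > 0")
    g_st = (h_t - h_s) / h_t
    # Largest GST attainable for this level of within-population diversity
    g_st_max = (n_pops - 1) * (1.0 - h_s) / (n_pops - 1 + h_s)
    g_st_std = g_st / g_st_max
    jost_d = ((h_t - h_s) / (1.0 - h_s)) * (n_pops / (n_pops - 1))
    return {"GST": g_st, "G'ST": g_st_std, "D": jost_d}


# Hypothetical heterozygosities (not taken from the thesis data), 13 populations
stats = differentiation_stats(h_s=0.50, h_t=0.55, n_pops=13)
for name, value in stats.items():
    print(name, round(value, 4))
```

Because the maximum attainable GST shrinks as Hs approaches 1, a locus with high within-population diversity can show a small GST and still a sizeable G'ST or D.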
Tests of pairwise FST and RST were performed for the thirteen populations, for both EST-
SSRs and nuSSRs. Values tended to be higher in the EST-SSR data
matrix, but not always. Therefore both SSR matrices were analysed together. The overall
genetic differentiation at the microsatellite loci was low (pairwise FST from 0.000 to 0.123),
though highly significant (p<0.001) after Bonferroni correction in 51 out of 78 pairs (Table
3.6). The values of the RST matrix closely resembled those of the FST matrix. The highest values
were obtained for the populations CAT and KEN, followed by PUG. The Dest values,
although similar to the FST and RST values, tend to be lower (Supporting Information 7).
Isolation by distance was tested using a Mantel test but no correlation was found between
genetic differentiation and geographic distance among populations (r=0.1082, p=0.26).
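The logic of a Mantel test can be sketched as follows: the correlation between the two distance matrices is compared with the correlations obtained after randomly relabelling the populations in one of them. The code is a minimal, illustrative implementation with hypothetical function names and toy distances; it is not the software used in this work, and it reports a one-tailed p-value.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Pairwise values above the diagonal, with populations relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (r, one-tailed p) for two symmetric distance matrices."""
    identity = list(range(len(genetic)))
    geo_vec = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo_vec)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel rows and columns of the genetic matrix together
        if pearson(upper_triangle(genetic, order), geo_vec) >= r_obs - 1e-12:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Toy example (hypothetical): five populations on a line, perfect isolation by distance
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.01 * d for d in row] for row in geo]
r_value, p_value = mantel_test(gen, geo)
print(round(r_value, 4), p_value)
```

Permuting rows and columns together, rather than shuffling individual pairwise values, preserves the dependence among distances that share a population.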
Table 3.6: Pairwise FST (below the diagonal) and RST (above the diagonal) values between every pair of populations.
ns = not significant; * p<0.05; ** p<0.01; *** p<0.001
     ALG        ARR        BUC        CAT        HAZ        EST        GER        PUG        KEN        TAZ        MON        SIN        MEK
ALG  --         0.023      0.044      0.055      0.013      0.063      0.021      0.075      0.073      0.022      0.059      0.041      0.000
ARR  0.023 ***  --         0.000      0.052      0.020      0.004      0.000      0.055      0.065      0.034      0.001      0.008      0.046
BUC  0.043 ***  0.000 ns   --         0.094      0.056      0.007      0.004      0.060      0.108      0.043      0.013      0.009      0.067
CAT  0.052 ***  0.050 ***  0.086 ***  --         0.051      0.094      0.084      0.141      0.083      0.067      0.096      0.092      0.086
HAZ  0.013 ns   0.020 ns   0.053 ***  0.049 ***  --         0.041      0.025      0.087      0.043      0.021      0.050      0.065      0.031
EST  0.035 ***  0.003 ns   0.007 ns   0.086 ***  0.039 ***  --         0.000      0.056      0.092      0.034      0.006      0.035      0.044
GER  0.021 ***  0.000 ns   0.004 ns   0.077 ***  0.025 ***  0.000 ns   --         0.039      0.073      0.028      0.004      0.023      0.037
PUG  0.070 ***  0.052 ***  0.057 ***  0.123 ***  0.080 ***  0.053 ***  0.038 ***  --         0.101      0.055      0.072      0.100      0.084
KEN  0.068 ***  0.061 ***  0.097 ***  0.076 ***  0.042 ***  0.084 ***  0.068 ***  0.092 ***  --         0.052      0.120      0.132      0.093
TAZ  0.021 ns   0.033 ***  0.041 ***  0.063 ***  0.020 ns   0.033 ns   0.027 ***  0.052 ***  0.049 ***  --         0.068      0.082      0.036
MON  0.056 ***  0.001 ns   0.013 ns   0.087 ***  0.048 ***  0.006 ns   0.004 ns   0.067 ***  0.107 ***  0.063 ***  --         0.035      0.076
SIN  0.039 ***  0.008 ns   0.009 ns   0.084 ***  0.061 ***  0.034 ***  0.022 ***  0.091 ***  0.117 ***  0.076 ***  0.034 ***  --         0.069
MEK  0.000 ns   0.044 ***  0.063 ***  0.079 ***  0.030 ***  0.042 ***  0.036 ***  0.077 ***  0.085 ***  0.035 ***  0.070 ***  0.065 ***  --
3.2.3 Population structure
The EST-SSR and nuSSR datasets were analysed separately and then merged to
determine the genetic structure of the populations (Fig. 3.8). For the EST-SSRs, in the software
STRUCTURE, the logarithm of the probability of the data [LnP(D)] as a function of K reached
a peak at K=3 (mean values: LnP(D)=-2055.3; var[LnP(D)]=131.2), which was confirmed
using Evanno's criterion [136] (Supporting Information 8 – Fig. S8.1). For the nuSSR
dataset the LnP(D) reached a peak at K=4 (mean values: LnP(D)=-4749.3;
var[LnP(D)]=238.9) and then decreased, but the ΔK value was higher for K=3 than for
K=4 using Evanno's criterion [136] (Supporting Information 8 – Fig. S8.2). For the
combined dataset, the LnP(D) reached a peak at K=4 (mean values: LnP(D)=-6704.3;
var[LnP(D)]=303.8), but when ΔK was used to infer the number of clusters, K=2 presented
the highest value, with a second peak at K=4 (Supporting Information 8 – Fig.
S8.3).
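The ΔK criterion compares the second-order change in LnP(D) around each K with the spread of LnP(D) across replicate runs at that K. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, the replicate values are invented for the example, and ΔK is computed from the mean LnP(D) per K (one common way of applying Evanno's formula), which may differ in detail from the calculation behind the Supporting Information figures.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_runs: dict) -> dict:
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)], using mean L per K.

    lnp_runs maps each K to the list of LnP(D) values from replicate runs.
    Delta K is undefined for the smallest and largest K tested.
    """
    delta_k = {}
    for k in sorted(lnp_runs):
        if k - 1 not in lnp_runs or k + 1 not in lnp_runs:
            continue
        spread = stdev(lnp_runs[k])
        if spread == 0:
            continue  # identical replicates: the ratio is not defined
        second_diff = abs(
            mean(lnp_runs[k + 1]) - 2 * mean(lnp_runs[k]) + mean(lnp_runs[k - 1])
        )
        delta_k[k] = second_diff / spread
    return delta_k


# Hypothetical replicate LnP(D) values (two runs per K), for illustration only
runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -79.0],
    4: [-77.0, -79.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, best_k)
```

This also shows why the peak of LnP(D) and the peak of ΔK need not coincide: LnP(D) may keep rising slowly after the K at which its rate of change drops most sharply.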
For the most likely run for each K, the r value was always low and below 1, indicating that
the sample locations were informative and helped greatly to find the population structure.
The results from the EST-SSR and nuSSR datasets are slightly
different, which is not completely unexpected considering the different types of SSRs (Fig.
3.8a and Fig. 3.8b). However, for both datasets each population can be almost completely
assigned to one of the clusters detected. At K=2, for the EST-SSRs the populations CAT
and KEN can be assigned to one cluster (pink cluster), and ALG, ARR, BUC, EST, GER,
PUG, MON and SIN to the other cluster (blue cluster). The populations HAZ, TAZ and MEK
appear as a mixture of both clusters (Fig. 3.8a). For the nuSSR dataset, the groups are
different, as MEK appears differentiated from the remaining populations in the blue cluster
and ALG and HAZ appear as mixed populations, although slightly more similar to MEK (Fig. 3.8b).
Despite the validation of K=3 for the EST-SSRs, most of the populations appear as a
mixture of clusters. The population of CAT appears differentiated, alone in one of the clusters
(pink cluster), in the same way as SIN appears in another cluster (blue cluster) (although some
individuals show a higher probability of belonging to the pink cluster, along with CAT) (Fig.
3.8a). For the nuSSRs, at K=4, CAT also appears differentiated, alone in one cluster (blue
cluster). The Italian population, PUG, can also be placed alone in another cluster (green
cluster). The Portuguese populations (ARR, BUC, EST, GER, MON and SIN) can all be placed,
to some extent, in a third cluster (pink cluster), and HAZ and TAZ appear as mixed
populations (Fig. 3.8b).
For a more robust analysis both matrices were merged (Fig. 3.8c). At K=2 the
populations of ALG, CAT and MEK appear as part of the same pink cluster (79%, 88% and
91% assignment probabilities, respectively), and the Portuguese populations and PUG as
part of the blue cluster (94% on average for the Portuguese populations and 85% for PUG).
HAZ, KEN and TAZ appear as mixed populations, with a slight tendency towards the pink cluster
(Fig. 3.8c). At K=4 CAT differentiates from the other populations (75%) in a green cluster.
PUG and KEN appear as part of the same yellow cluster (79% and 74%, respectively). The
MEK population differentiates in another cluster (85% for the pink cluster) and HAZ and
ALG appear as mixed populations, although more closely related to the MEK cluster. The
Portuguese populations all appear together in the blue cluster (83% on average), with the
GER population as the most mixed population in the group. The population TAZ is a mixture
of several clusters (Fig. 3.8c). The geographic distribution of the clusters
obtained by STRUCTURE for the combined SSR dataset is presented in Fig. 3.9a for K=2
and in Fig. 3.9b for K=4.
To complement the analyses run in STRUCTURE, a GENELAND analysis was performed on
the merged dataset. The geographical distribution of the six clusters detected is shown in Fig.
3.9c. The first cluster (purple) was composed of the Portuguese populations (EST, GER,
BUC, MON, SIN and ARR); the second (orange) was composed only of KEN; the third
cluster (green) grouped the populations HAZ and TAZ; the fourth cluster (grey) included a
single population, CAT; the fifth cluster (blue) comprised the populations of ALG and MEK;
and the sixth cluster (red) contained only PUG.
AMOVA considering the clusters formed in the GENELAND and STRUCTURE analyses (Fig.
3.8 and Fig. 3.9) was always significant at the 0.001 level for the clusters detected, but also
showed that the great majority of genetic variation was found within populations (94%).
Also, the analysis considering the 6 clusters (structure obtained by the software
GENELAND) yielded the highest value for the genetic differentiation
between groups (FCT=4.99) (Supporting Information 9).
Figure 3.8: Structure clustering results obtained for the a) EST-SSRs dataset (K=2 and 3); b) nuSSRs dataset
(K=2, 3 and 4); and c) combined dataset (K=2, 3 and 4). Populations are separated by black bars and identified
at the bottom. In all analyses, each distinct cluster is represented by a unique colour. Each individual is
represented by a thin bar and the colours on each vertical bar represent the probability of the individual
belonging to each cluster.
Figure 3.9: Geographic distribution of the clusters obtained by STRUCTURE and GENELAND:
a) combined dataset with STRUCTURE for K=2; b) combined dataset with STRUCTURE for K=4; and c)
combined dataset with GENELAND for K=6. Pie charts represent the assignment probabilities to
each cluster, and each cluster is colour coded. Pie chart sizes reflect the number of samples per
population (22-32). For a) and b) the colour codes are the ones used in Fig. 3.8c to code each
cluster.
4. Discussion
4.1 Differentiation and demographic patterns
Maternally inherited cpDNA markers yield valuable information about genetic variability
associated with local populations or provenances [143]. The geographic patterns of
cpDNA haplotypes in many widespread European forest trees are therefore sometimes interpreted
under the assumption of survival in glacial refugia in Southern and Eastern Europe, outside
the limits of the Weichselian ice sheet, followed by postglacial migration. Some species appear to
have spread northwards and westwards from a single refuge while others spread from
multiple refugia [48,54,57,70,144].
Analysis of the sequencing data from the cpDNA regions clearly shows (with the exception of the
rbcL fragment) the presence of two well-established cork oak lineages, the pure lineage and
the introgressed lineage (supported as well by the sequencing of the nuclear candidate gene).
The cpDNA pure lineage described here seems to be related to the “suber” lineage
described previously by Jiménez et al. [41], which is almost specific to cork oak populations
and may be considered the original and most widely distributed lineage in this species
[41,64]. The TrnS/PsbC fragment presented the highest resolution power regarding this
lineage, and three main haplotypes (A1, A2 and A3) are evident (Fig. 3.1 and Fig. 3.4). These
three sublineages have well-delimited geographic areas and possibly reflect refuge areas from
which expansion events putatively occurred after the last glaciation, which is somewhat
supported by the values from the mismatch distribution and neutrality tests (Table 3.3). The
previous works of López de Heredia et al. [52] and Lumaret et al. [43] have indicated the
southern Iberian Peninsula as a possible refuge area, supported by palynological data.
Although the results found in this work are not conclusive enough to support this idea, the
sublineage A2 appears to have spread from a western Mediterranean area, consistent with a
refuge area in the Iberian Peninsula. Lumaret et al. [43], based on RFLP analysis of the
whole cpDNA, indicated two more possible refuge areas for cork oak, namely the
southern Italian Peninsula and North Africa, although this is not supported by the fossil record
[52]. It is difficult to determine the origin of sl A3 because this haplotype is distributed
throughout most of Peninsular Italy and North Africa (Algeria and Tunisia). However, either of
these geographic areas could have been a refuge for this lineage in cork oak, in agreement
with the results presented by Lumaret et al. [43]. Nevertheless, the presence of a haplotype (sl
A2) restricted to the island of Sicily was unexpected. Although no previous work suggests
Sicily as a refuge area, the geographic restriction of this lineage and the fact that it is more or
less contemporaneous with the other two sublineages suggest that this might indeed be a
refuge area for cork oak.
It is also possible that the extensive introgression of Q. suber by Q. ilex indicates several
potential refuge areas. In fact, López de Heredia et al. [52] present north-eastern Spain
(Catalonia) as a potential refuge area resulting from extensive hybridization with Q. ilex.
The authors argue that the populations from this area present a predominant “ilex chlorotype”
that is very rare in holm oak. Therefore, the hypothesis that some
populations might have withstood the glacial conditions in this area (or any other area) by
hybridizing cannot be discarded. Although it is not possible to fully corroborate this, there is an
indication of a total replacement of the cpDNA pure lineage in the CAT population, which
might indicate that the introgression events are ancient and indeed reflect a glacial
refuge area. The same complete replacement of the cpDNA pure lineage appears to have
happened in MEK, and almost completely in HAZ.
However, more detailed inferences about the geographic origins of the haplotypes and their
migration scenarios will require additional sampling of populations and most likely other
genomic regions, because the low cpDNA variation itself could bias the identification of
glacial refugia for Quercus suber.
There are no previous works using sequences from the nuclear genome in this species.
However, in comparison with the results from the cpDNA sequences, the nuclear DNA
fragment seems in fact to be more informative than the cpDNA. The nucleotide diversity is
higher than that of the cpDNA fragments, as is the number of haplotypes found for the
pure lineage (Table 3.1 and Table 3.2). Also, the analysis shows a more complex geographic
distribution history for cork oak. The results obtained, just as for the cpDNA, showed a
pure lineage composed of three sublineages, but the distribution of the sublineages is not as
geographically structured as it was for the cpDNA dataset (Fig. 3.6 and Fig. 3.7). The
sublineage α3 from the nuDNA dataset, which in the cpDNA was restricted to the island of
Sicily, extends to Lazio (Italy) and Tunisia. The sublineage α1, equivalent to the cpDNA
sublineage A1, was still the most frequent sublineage, but in the nuDNA it is not restricted to
the western part of the Mediterranean as it was in the cpDNA, showing an extended
distribution, although at lower frequency, into the eastern part of the Mediterranean. The same
was detected for the sublineage α2, which did not seem to be restricted to the eastern part of the
species distribution. These differences between cpDNA and nuDNA sequence data can be
explained by long-distance pollen dispersal and/or high levels of polymorphism. However,
the levels of polymorphism in the candidate gene (Table 3.1 and
Table 3.2) do not appear to be high enough to justify these differences, and long-distance
pollen dispersal, together with the more limited acorn dispersal, seems to be a better explanation. This
is consistent with indirect methods based on measures of genetic differentiation for nuclear
versus cpDNA markers in oaks, which suggest that pollen flow is much higher (by two orders
of magnitude) than seed flow [145-147].
The pattern of three sublineages obtained in this work clearly contrasts with the one
previously found by Magri et al. [16]. Using cpDNA microsatellites, the authors analysed
cork oak populations throughout the species distribution range and found a strong geographical
structure characterized by five distinct haplotypes (Fig. 1.4). The cpDNA SSR data combined
with paleobotanical and geodynamic models led the authors to suggest an early Cenozoic
origin for cork oak in the Iberian Peninsula and a subsequent genetic drift geographically
consistent with the Oligocene and Miocene break-up events [16] (Fig. 1.5). All these events
seem to have occurred without detectable cpDNA modifications for a time span of at least
15-25 million years. This is also somewhat inconsistent with the results found in this work.
Although most of the cpDNA fragments sequenced here showed no resolution, and therefore no
haplotype variation that could detect the three sublineages, the TrnS/PsbC fragment
shows that the sublineages are formed by a single mutational event (Fig. 3.1), which is
unlikely to date to the early Cenozoic.
4.2 Hybridization and introgression
Several proposals for Quercus taxonomy based on morphology have been presented [12,26].
Classifications have not been straightforward and, especially at the subgenus level, are
uncertain. The taxonomic scheme proposed by Schwarz [26] is possibly the most accepted for
the classification of cork oak, and appears to be the most suitable for describing the
systematics of European oaks [19,31,32].
Upon sequencing of the cpDNA fragments for the eleven Quercus species used in this study,
with the exception of the rbcL fragment, which presented no sequence variation among the
11 species used, the remaining 4 cpDNA fragments (matK, TrnS/PsbC, TrnL-F and
TrnH/PsbA) were in general able to distinguish the 4 subgenera (or subsections) (Fig. 3.1a,
Fig. 3.2, Fig. 3.3 and Fig. 3.5) proposed by Schwarz [26] (Quercus, Erythrobalanus,
Sclerophyllodrys and Cerris). However, the phylogenetic relationships between the subgenera
are inconsistent among fragments and it is not possible to make accurate inferences about those
relationships. Also, in accordance with the latest work of Piredda et al. [29], the idea remains
that the genus Quercus is not amenable to barcoding with the most common cpDNA
sequences, since most of the species analysed within the same subgenus share the same
cpDNA haplotype. The low rate of cpDNA variation and hybridization events are
likely to be the cause [29].
The nuclear DNA, however, has much more discrimination power than the cpDNA. In fact the
EST 2T13 fragment supports the recognition of the subgenera Sclerophyllodrys, Cerris,
Erythrobalanus and Quercus, in agreement with the works of Bellarosa et al.,
which also used fragments of the nuclear genome [27,28]. Also, the EST
2T13 fragment distinguishes all the species analysed and, although this issue requires further
study, it supports the idea that the nuclear DNA might be a useful supplementary barcoding tool
in difficult genera such as Quercus.
The complex evolutionary history of the Mediterranean evergreen oaks has already been
addressed by other authors, who showed that Q. suber, Q. ilex and Q. coccifera present shared
haplotypes as a result of successful hybridization and introgression of Q. suber by Q. ilex
[41,43,52]. However, those results were based on RFLP analysis of the cpDNA only,
with no insight into the nuclear genome. The sequencing of the cpDNA fragments immediately
reveals the introgression events in Q. suber. Since the subgenera Sclerophyllodrys and
Cerris are clearly distinguishable in the phylogenetic trees constructed (Fig. 3.1, Fig. 3.2, Fig.
3.3 and Fig. 3.5), the presence of cork oak samples in both subgenera readily points to
introgression of Q. suber, allowing the identification of a pure lineage of cork oak haplotypes
in the subg. Cerris, and an introgressed lineage in the subg. Sclerophyllodrys.
The distribution of the cpDNA introgressed lineage appears restricted to the western area of
the species distribution and peripheral to the distribution of the pure lineage
(specifically the sublineage A3). Although it is not possible to date the introgression
events precisely, some may in fact reflect glacial refugia in this area of the distribution [possibly in
north-eastern Spain (Catalonia) and/or Morocco] where cork oak populations survived with
introgression with Q. rotundifolia. During postglacial range expansion,
the rapid expansion of cork oak from the pure lineage refuge may have limited the expansion
of the introgressed lineage, forming the mixed populations that present both haplotype
lineages (Fig. 3.4). On the other hand, the analysis of the phylogenetic trees does not allow
the hypothesis of more recent or current introgression events to be ruled out (Fig. 3.1, Fig. 3.2,
Fig. 3.3). Hybridization is still happening, most frequently in central and eastern
Iberia, with the first-generation hybrids between Q. suber and Q. ilex being easily identified
in the field [52].
The same introgressed lineage seems to be present in the nuclear DNA, although there is no
previous reference to it. However, the cork oak samples belonging to the introgressed lineage are
not always the same in both genomes. That is, some of the samples of the cpDNA
introgressed lineage present a nuclear genome of the pure lineage while others present evidence
of a nuclear introgressed lineage, and some samples with cpDNA belonging to the
pure lineage present a nuclear genome from the introgressed lineage.
The flowering phenology and present-day ecology of the two species suggest that pollen flow
might be expected to be predominantly from Q. suber into Q. ilex. Quercus suber performs
better than Q. ilex as a pollen parent in interspecific crosses [45]. Molecular evidence provides
support for this expectation [18,43,71]. This evidence would explain the cork oak samples
that present an introgressed cpDNA but in which the nuclear fragment belongs to the pure
lineage (see, for example, samples TAZ 1 or HAZ 5). However, the reverse also seems to
happen, because samples were found that present a cpDNA from the pure lineage while the
nuclear DNA belongs to the introgressed lineage (see TOL 3 or LAZ 2). Interestingly, some
samples (see GER 5 or TAZ 2) present both cpDNA and nuDNA fragments of the
introgressed lineage at the same time.
The fact that in the subg. Sclerophyllodrys the species Q. ilex and Q. coccifera present the
same haplotypes was suggested previously by some authors to be a result of introgression
between these species or of incomplete lineage sorting [41,64]. The same happens in the subg.
Cerris, between Q. cerris and Q. suber. The lack of resolution of the cpDNA might argue for
incomplete lineage sorting, but previous authors suggested introgression between these
closely related species [16]. Although Quercus suber and Quercus cerris belong to the same
taxonomic group, subgenus Cerris [17,30], they are morphologically quite distinct and have
different geographical and ecological ranges. The natural distribution range of Q. cerris is
from central and southern Europe to Asia Minor. However, in peninsular Italy and in Sicily
the ranges of Q. cerris and Q. suber overlap. In fact, Q. crenata is hypothesized to be a
hybrid between Q. suber and Q. cerris, although some other authors considered it instead as a
fixed species.
The analyses of the cpDNA datasets show that Quercus cerris and Quercus ilex share the
same haplotype for most of the fragments, which could point to incomplete lineage
sorting. However, the higher resolution power of the TrnS/PsbC fragment (Fig. 3.1) places the
Q. cerris haplotype as highly derived from the sublineage A3, in the Eastern Mediterranean
area. Although this cpDNA fragment differentiates the species, it does not exclude possible,
and perhaps somewhat ancient, hybridization events between Q. suber sl A3 and Q. cerris.
The nuclear fragment shows that Q. cerris shares the same haplotype as Q. suber samples
from sublineage α1, one of the lineages from the Eastern Mediterranean area. Considering
both types of markers, although the cpDNA does not immediately suggest introgression
events between these species, the nuclear candidate gene does not discriminate between this
hypothesis and incomplete lineage sorting. Nevertheless, retention of ancestral polymorphism
also needs to be considered, given the impossibility of confirming introgression between
these species. These two hypotheses might be confounded with each other, particularly when
contemporary introgression cannot be discarded, due to the presence of both species in some
areas.
4.3 Genetic diversity and population structure
The populations for the SSR analyses were selected based on the sequencing
results and throughout the entire range, in order to maximize the chances of surveying a large
part of the species' genetic diversity.
Recent work on genetic diversity and population structure in several species has
used a combined analysis of EST and genomic SSRs. Although a small amount of work has
been done in cork oak with nuSSRs, there were no previous studies using EST-SSRs. Tests for
neutrality indicate that selection did not differentially affect the performance of EST-SSRs and nuSSRs
in characterizing cork oak populations. Even though EST-SSRs are potentially exposed to
selection, only a small percentage shows evidence of positive selection [91,93]. However, it is
important to conduct selective neutrality tests on EST-SSRs before using them in population
genetics analyses because, even though most will probably not be under strong selection
pressure, a small percentage may indeed be [91,93]. Also, the results show that the genetic diversity
measured by EST-SSRs is similar to that measured by nuSSRs, and there is no evidence of null alleles or other
genotyping errors. Therefore, the evidence suggests that EST-SSRs are appropriate markers for
population genetics studies in cork oak.
The population differentiation found, although low, was significant and is, at least for the
EST-SSRs (FST=0.071; RST=0.066; Dest=0.077), close to the lower limit of the range of
average values (0.07-0.09) expected for long-lived, wind-pollinated woody species (Table
3.5 and Table 3.6) [22]. The general values of population differentiation found here
(RST=0.066 for EST-SSRs vs. 0.045 for nuSSRs; FST=0.071 for EST-SSRs vs. 0.032 for nuSSRs)
(Table 3.5) were considerably higher than those reported by Coelho et al. [22] and Simões
de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) (FST=0.0172 and
FST=0.02/RST=0.013, respectively), which considered only Portuguese populations. However,
pairwise FST and RST values between Portuguese populations tend to be lower and
non-significant (Table 3.6), denoting the small differentiation between these populations,
as also found by Coelho et al. [22] and Simões de Matos (F. Simões de Matos, PhD thesis,
INETI Lisbon, 2007). In agreement with these results, we also found that most of the
species' diversity (94%) is found within rather than among populations.
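As a rough illustration of how differentiation measures of this kind partition diversity within and among populations, the following minimal Python sketch computes Nei's GST and Jost's D for a single locus from allele frequencies. It is not the estimator used in this work: the function name and the allele frequencies are hypothetical, and no sample-size correction is applied.

```python
import numpy as np

def differentiation(freqs):
    """Single-locus Nei's GST and Jost's D (no sample-size correction).

    freqs: array of shape (n_pops, n_alleles); each row holds the allele
    frequencies of one population and sums to 1. Requires HT > 0.
    """
    p = np.asarray(freqs, dtype=float)
    n_pops = p.shape[0]
    # HS: mean expected heterozygosity within populations
    hs = np.mean(1.0 - np.sum(p ** 2, axis=1))
    # HT: expected heterozygosity of the pooled population
    # (unweighted mean allele frequencies across populations)
    p_bar = p.mean(axis=0)
    ht = 1.0 - np.sum(p_bar ** 2)
    gst = (ht - hs) / ht
    d = (ht - hs) / (1.0 - hs) * n_pops / (n_pops - 1)
    return hs, ht, gst, d

# Hypothetical allele frequencies: three populations, three alleles
example = [[0.6, 0.3, 0.1],
           [0.5, 0.4, 0.1],
           [0.7, 0.2, 0.1]]
hs, ht, gst, d = differentiation(example)
print(f"HS={hs:.3f} HT={ht:.3f} GST={gst:.3f} D={d:.3f}")
print(f"Share of diversity within populations (HS/HT): {hs / ht:.1%}")
```

In this framework the share of diversity found within populations is HS/HT = 1 - GST, which is analogous to, but not the same calculation as, the within-population percentage reported above.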
The locus MsQ13 was previously suggested to be particularly informative for detecting F1
hybrids between Q. suber and Q. rotundifolia because allele sizes do not overlap [88,142].
Even though the locus was tested here in individuals of every population, including all the
individuals detected as belonging to the introgressed lineages (either cpDNA or nuDNA), it
was monomorphic at the expected allele size for Q. suber. Nevertheless, the work of
Burgarella et al. [142] clearly demonstrates the difficulty of detecting introgressed
hybrids in these species, even though the microsatellite loci chosen for their work were
highly differentiated between species and had good diagnostic power. Also, although we
initially attempted to recreate the SSR battery used in that work, some of the loci were
discarded because there was no amplification product, the scoring was extremely doubtful or
there was a high deviation from HWE. In the future, a more targeted choice of easily
reproducible markers may be required, as well as investment in some key holm oak
populations for comparative purposes in detecting hybrids.
Isolation by distance was tested, but no correlation was found between genetic
differentiation and geographic distance among populations throughout the Mediterranean.
However, in a previous study of Spanish cork oak populations, Ramírez-Valiente et al.
[148], using the same nuSSR battery as in this work (with the exception of QpZAG46, which
had no clear scoring), found that FST for the neutral markers was correlated with
geographic distance. In the same study, the authors also found an association between leaf
size and the microsatellite QpZAG46, which suggests a possible linkage between QpZAG46 and
genes involved in leaf size [148].
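An isolation-by-distance test of this kind is commonly carried out as a Mantel test between a matrix of pairwise genetic differentiation and a matrix of pairwise geographic distances. The sketch below is a minimal Python illustration under stated assumptions: the function name, the number of permutations and the toy matrices are hypothetical, and it is not the software used in this work.

```python
import numpy as np

def mantel_test(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test for a positive correlation between two
    symmetric distance matrices (hypothetical helper, illustration only).

    gen: pairwise genetic differentiation, e.g. FST (n x n)
    geo: pairwise geographic distance (n x n)
    Returns the observed Pearson correlation and a permutation p-value.
    """
    gen = np.asarray(gen, dtype=float)
    geo = np.asarray(geo, dtype=float)
    n = gen.shape[0]
    upper = np.triu_indices(n, k=1)  # each population pair counted once

    def corr(a, b):
        return np.corrcoef(a[upper], b[upper])[0, 1]

    r_obs = corr(gen, geo)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        # Permute population labels: rows and columns are shuffled together
        perm = rng.permutation(n)
        if corr(gen[np.ix_(perm, perm)], geo) >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value

# Toy data: five populations along a line, with differentiation that
# increases linearly with distance (purely hypothetical values)
coords = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
geo = np.abs(coords[:, None] - coords[None, :])
gen = 0.01 * geo
r, p = mantel_test(gen, geo)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual matrix cells, preserves the dependence among pairwise values that share a population, which is the reason a permutation test is used here instead of an ordinary correlation test.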
The population structure results from the EST-SSR and nuSSR datasets are slightly
different, which is not completely unexpected considering the different types of SSRs
(Fig. 8a and Fig. 8b). However, when the datasets are merged, which is expected to provide
the most consistent information, the results from STRUCTURE and GENELAND, although not in
complete agreement, present the same emerging pattern: 1) The Portuguese populations
grouped together in one cluster. There was no differentiation between the Portuguese
populations, which is in agreement with the results of Simões de Matos (F. Simões de Matos,
PhD thesis, INETI Lisbon, 2007). This might be explained by the geographic distance between
these populations, over which gene flow may be expected to homogenize their alleles. It is
also in agreement with the low and mostly non-significant pairwise FST and RST values found
between these populations; 2) Catalonia is clearly the most differentiated population. The
results always placed this population alone in its own cluster, and it scored the highest
pairwise FST and RST values.
Overall, GENELAND provided a more plausible scenario regarding the distribution of the
clusters. In the STRUCTURE results, the population of Puglia (PUG) appeared in clusters
that are difficult to explain: at K=2 it falls in the same cluster as the Portuguese
populations, whereas at K=4 it falls in the same cluster as Kenitra (KEN), even though at
K=2 KEN and PUG are in opposite clusters. Although STRUCTURE groups KEN and PUG in one
cluster, GENELAND separates these two populations into one cluster each (Figs. 8 and 9).
The small number of SSRs and the low levels of differentiation might explain the
implausible distribution of some clusters in the STRUCTURE analysis. However, the fact that
GENELAND identified a greater number of clusters than STRUCTURE (six versus two/four), and
that independent GENELAND runs identified the same clusters with similar posterior
probabilities, could indicate that the algorithm employed in GENELAND is more sensitive in
finding weak spatial clusters when differentiation is low. A similar finding was recently
reported by Wellenreuther et al. [149] in a study of Ischnura elegans, the blue-tailed
damselfly.
5. Final Remarks
Extending over a surface of about 2.2 million ha in seven Mediterranean countries (Portugal,
Spain, Algeria, Morocco, Italy, Tunisia and France), cork oak forest landscapes represent one
of the best examples of the multi-functional role of forests, maintained over thousands of
years while promoting high biodiversity levels. Well-managed cork oak forests provide
valuable ecological functions such as soil conservation, buffering against climate change
and desertification, water table recharge and run-off control, and contribute to the
survival of many species. Cork oak trees are extremely important in ensuring that these
ecosystems maintain their ecological balance without harm to the forest. These semi-natural
woodlands thus provide a valuable income to local populations, both directly through the
harvesting of cork and indirectly by providing other economically valuable resources such
as grazing grounds for animals and, above all, the maintenance of an ecological balance.
Mediterranean regions have been facing a growing number of extreme weather events due to
rapid climate change. Assessing the impacts of climate extremes on cork oak trees can help
in planning better forest management practices for coping with future climate change, and
in achieving the sustainable development of ecosystems and societies within the
Mediterranean area.
Studying the consequences of past climate shifts on biodiversity is among the best tools to
validate models of the ecological and evolutionary consequences of future changes. Advances
in DNA analysis are allowing the reconstruction of the evolutionary history of forest trees.
This work constitutes a first molecular approach assessing the potential of a combined
analysis of chloroplast and nuclear DNA markers, using both sequence data and
microsatellites. The importance of such synergistic analyses is highlighted when addressing
questions such as the evolutionary history and geographic patterns of population diversity.
Overall, the three major objectives of this work were achieved. It was possible to gather
valuable information on the evolutionary history of Quercus suber. Sequencing data allowed
the detection of two major haplotype lineages, consistent in both the nuclear and
chloroplast genomes. Within the pure lineage, three sublineages and some signs of recent
population expansion were unveiled. It is hypothesised that during the coldest periods cork
oak would only have survived in more benign climatic areas (possibly three refugia), from
where, after the warming at the end of the last glacial period, it might have colonized its
current distribution area.
It was also possible to explore the phylogenetic relationships between cork oak and other
Quercus species from all four recognized subgenera. This also helped to detect the
introgressed lineage in cork oak resulting from several hybridization events with Q. ilex.
Although some of the hybridization events appear to be old, current hybridization cannot be
ruled out. Moreover, although hybridization and DNA introgression from Q. ilex have already
been reported by other authors, it became evident in this work that the introgression events
are also detectable in the nuclear genome.
Finally, microsatellites allowed the identification of some differentiation and structure
among some key cork oak populations. Although the differentiation and the clusters found
are somewhat weak, adding more microsatellites and populations will possibly strengthen the
results found here.
6. Bibliographic References
1 Food and Agriculture Organization of the United Nations (FAO) (2011) State of the
World's Forests. FAO World Forests
2 Petit, R. J. and Hampe, A. (2006) Some Evolutionary Consequences of Being a Tree.
Annual Review of Ecology, Evolution, and Systematics. 37, 187-214
3 Oldfield, S. et al. (1998) The World List of Threatened Trees, Cambridge, World
Conservation Press
4 Hansen, A. J. et al. (2001) Global Change in Forests: Responses of Species,
Communities, and Biomes. BioScience. 51, 765-779
5 Food and Agriculture Organization of the United Nations (FAO) (2010) Global Forest
Resources Assessment 2010. Main report
6 González-Martínez, S. C. et al. (2006) Forest-tree population genomics and adaptive
evolution. The New phytologist. 170, 227-38
7 Schaal, B. A. et al. (1998) Phylogeographic studies in plants: problems and prospects.
Molecular Ecology. 7, 465-474
8 Petit, R. J. et al. (2005) Climate changes and tree phylogeography in the
Mediterranean. Taxon. 54, 877-885
9 Avise, J. C. et al. (1987) Intraspecific Phylogeography: The Mitochondrial DNA
Bridge Between Population Genetics and Systematics. Annual Review of Ecology and
Systematics. 18, 489-522
10 Avise, J. C. (2009) Phylogeography: retrospect and prospect. Journal of
Biogeography. 36, 3-15
11 Beheregaray, L. B. (2008) Twenty years of phylogeography: the state of the field
and the challenges for the Southern Hemisphere. Molecular ecology. 17, 3754-74
12 Nixon, K. C. (1993) Infrageneric classification of Quercus (Fagaceae) and typification
of sectional names. Annales Des Sciences Forestières. 50, 25s-34s
13 Nixon, K. C. (2006) Global and Neotropical Distribution and Diversity of Oak (genus
Quercus) and Oak Forests. In Ecology and conservation of neotropical montane oak
forests 185 (Kappelle, M., ed), pp. 3-13, Springer-Verlag
14 Pausas, J. G. et al. (2006) Regeneration of a marginal Quercus suber forest in the
eastern Iberian Peninsula. Journal of Vegetation Science. 17, 729
15 Elena-Rosselló, J. A. and Cabrera, E. (1996) Isozyme Variation in Natural Populations
of Cork-Oak (Quercus suber L.). Population Structure, Diversity, Differentiation and
Gene Flow. Silvae Genetica. 45, 229-235
16 Magri, D. et al. (2007) The distribution of Quercus suber chloroplast haplotypes
matches the palaeogeographical history of the western Mediterranean. Molecular
Ecology. 16, 5259-5266
17 Tutin, T. G. et al. (1993) Flora Europaea, Volume 1, (2nd edn) Cambridge University
Press
18 Toumi, L. and Lumaret, R. (1998) Allozyme variation in cork oak (Quercus suber L.):
the role of phylogeography and genetic introgression by other Mediterranean oak
species and human activities. Theoretical and Applied Genetics (TAG). 97, 647-656
19 Toumi, L. and Lumaret, R. (2001) Allozyme characterisation of four Mediterranean
evergreen oak species. Biochemical systematics and ecology. 29, 799-817
20 Pausas, J. G. et al. (2009) The tree. In Cork Oak Woodlands on the Edge. Ecology,
Adaptive Management, and Restoration (1st edn) (Aronson, J. et al., eds), pp. 11-21,
Island Press
21 Carrión, J. S. et al. (2000) Past distribution and ecology of the cork oak (Quercus
suber) in the Iberian Peninsula: a pollen-analytical approach. Diversity and
Distributions. 6, 29-44
22 Coelho, A. C. et al. (2006) Genetic Diversity of Two Evergreen Oaks [Quercus suber
(L.) and Quercus ilex subsp. rotundifolia (Lam.)] in Portugal using AFLP Markers.
Silvae Genetica. 55, 146-152
23 Soto, A. et al. (2007) Differences in fine-scale genetic structure and dispersal in
Quercus ilex L. and Q. suber L.: consequences for regeneration of mediterranean open
woods. Heredity. 99, 601-7
24 Pulido, F. J. et al. (2001) Size structure and regeneration of Spanish holm oak Quercus
ilex forests and dehesas: effects of agroforestry use on their long-term sustainability.
Forest Ecology and Management. 146, 1-13
25 Pons, J. and Pausas, J. G. (2006) Oak regeneration in heterogeneous landscapes: The
case of fragmented Quercus suber forests in the eastern Iberian Peninsula. Forest
Ecology and Management. 231, 196-204
26 Schwarz, O. (1964) Quercus L. In Flora Europaea, Volume 1 (2nd edn) (Tutin, T. G.
et al., eds), pp. 71-76, Cambridge University Press
27 Bellarosa, R. et al. (2005) Utility of ITS sequence data for phylogenetic reconstruction
of Italian Quercus spp. Molecular Phylogenetics and Evolution. 34, 355-370
28 Bellarosa, R. et al. (1990) Ribosomal RNA genes in Quercus spp. (Fagaceae). Plant
Systematics and Evolution. 172, 127-139
29 Piredda, R. et al. (2011) Prospects of barcoding the Italian wild dendroflora: oaks
reveal severe limitations to tracking species identity. Molecular ecology resources. 11,
72-83
30 Manos, P. S. et al. (1999) Phylogeny, Biogeography, and Processes of Molecular
Differentiation in Quercus subgenus Quercus (Fagaceae). Molecular Phylogenetics and
Evolution. 12, 333-349
31 Manos, P. S. et al. (2001) Systematics of Fagaceae: Phylogenetic test of reproductive
trait evolution. International journal of plant sciences. 162, 1361-1379
32 Kress, W. J. and Erickson, D. L. (2007) A two-locus global DNA barcode for land
plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.
PloS one. 2, e508
33 Cowan, R. S. et al. (2006) 300,000 Species to Identify: Problems, Progress, and
Prospects in DNA Barcoding of Land Plants. Taxon. 55, 611
34 Hajibabaei, M. et al. (2007) DNA barcoding: how it complements taxonomy,
molecular phylogenetics and population genetics. Trends in genetics. 23, 167-72
35 Chase, M. W. et al. (2005) Land plants and DNA barcodes: short-term and long-term
goals. Philosophical transactions of the Royal Society of London. Series B, Biological
sciences. 360, 1889-95
36 Fazekas, A. J. et al. (2008) Multiple multilocus DNA barcodes from the plastid
genome discriminate plant species equally well. PloS one. 3, e2802
37 Chase, M. W. et al. (2007) A proposal for a standardised protocol to barcode all land
plants. Taxon. 56, 295-299
38 Lahaye, R. et al. (2008) DNA barcoding the floras of biodiversity hotspots. PNAS.
105, 2923-8
39 CBOL, P. W. G. (2009) A DNA barcode for land plants. PNAS. 106, 12794-7
40 Neubig, K. M. et al. (2008) Phylogenetic utility of ycf1 in orchids: a plastid gene more
variable than matK. Plant Systematics and Evolution. 277, 75-84
41 Jiménez, P. et al. (2004) High variability of chloroplast DNA in three Mediterranean
evergreen oaks indicates complex evolutionary history. Heredity. 93, 510-5
42 Lumaret, R. et al. (2002) Phylogeographical variation of chloroplast DNA in holm oak
(Quercus ilex L.). Molecular ecology. 11, 2327-36
43 Lumaret, R. et al. (2005) Phylogeographical Variation of Chloroplast DNA in Cork
Oak (Quercus suber). Annals of Botany. 96, 853-861
44 Rushton, B. S. (1993) Natural hybridization within the genus Quercus L. Annals of
forest science. 50, 73-90
45 Boavida, L. C. et al. (2001) Sexual reproduction in the cork oak (Quercus suber L). II.
Crossing intra- and interspecific barriers. Sexual Plant Reproduction. 14, 143-152
46 Bennett, K. D. (1997) Evolution and Ecology: the pace of life, Cambridge University
Press.
47 French, H. M. (2007) The Periglacial Environment, (3rd edn) Longman.
48 Comes, H. P. and Kadereit, J. W. (1998) The effect of Quaternary climatic changes on
plant distribution and evolution. Trends in Plant Science. 3, 432-438
49 Hewitt, G. M. (1999) Post-glacial re-colonization of European biota. Biological
Journal of the Linnean Society. 68, 87-112
50 Willis, K. J. et al. (2000) The Full-Glacial Forests of Central and Southeastern Europe.
Quaternary Research. 53, 203-213
51 Palmé, A. E. et al. (2003) Postglacial recolonization and cpDNA variation of silver
birch, Betula pendula. Molecular ecology. 12, 201-12
52 López de Heredia, U. et al. (2007) Molecular and palaeoecological evidence for
multiple glacial refugia for evergreen oaks on the Iberian Peninsula. Journal of
Biogeography. 34, 1505-1517
53 Willis, K. J. and Van Andel, T. H. (2004) Trees or no trees? The environments of
central and eastern Europe during the Last Glaciation. Quaternary Science Reviews.
23, 2369-2387
54 Hewitt, G. M. (1996) Some genetic consequences of ice ages, and their role in
divergence and speciation. Biological Journal of the Linnean Society.
55 Hewitt, G. M. (2000) The genetic legacy of the Quaternary ice ages. Nature. 405, 907-913
56 Petit, R. J. et al. (1997) Chloroplast DNA footprints of postglacial recolonization by
oaks. PNAS. 94, 9996-10001
57 Taberlet, P. et al. (1998) Comparative phylogeography and postglacial colonization
routes in Europe. Molecular ecology. 7, 453-64
58 Konnert, M. and Bergmann, F. (1995) The geographical distribution of genetic
variation of silver fir (Abies alba, Pinaceae) in relation to its migration history. Plant
Systematics and Evolution. 196, 19-30
59 Dumolin-Lapègue, S. et al. (1997) Phylogeographic structure of white oaks throughout
the European continent. Genetics. 146, 1475-87
60 Pollard, D. and Barron, E. J. (2003) Causes of model-data discrepancies in European
climate during Oxygen Isotope Stage 3 with insights from the last glacial maximum.
Quaternary Research. 59, 108-113
61 Barron, E. and Pollard, D. (2002) High-Resolution Climate Simulations of Oxygen
Isotope Stage 3 in Europe. Quaternary Research. 58, 296-309
62 Kvacek, Z. and Walther, H. (1989) Paleobotanical studies in Fagaceae of the European
Tertiary. Plant Systematics and Evolution. 162, 213-229
63 Dumolin, S. et al. (1995) Inheritance of chloroplast and mitochondrial genomes in
pedunculate oak investigated with an efficient PCR method. Theoretical and Applied
Genetics. 91, 1253-1256
64 López de Heredia, U. et al. (2005) The Balearic Islands: a reservoir of cpDNA genetic
variation for evergreen oaks. Journal of Biogeography. 32, 939-949
65 Kremer, A. and Petit, R. J. (1993) Gene diversity in natural populations of oak species.
Annals of forest science. 50, 186-202
66 Wright, S. (1931) Evolution in Mendelian Populations. Genetics. 16, 97-159
67 Thompson, J. D. (2005) Plant Evolution in the Mediterranean, Oxford University
Press.
68 Fineschi, S. et al. (2000) Chloroplast DNA polymorphism reveals little geographical
structure in Castanea sativa Mill. (Fagaceae) throughout southern European countries.
Molecular Ecology. 9, 1495-1503
69 Petit, R. J. et al. (2002) Chloroplast DNA variation in European white oaks:
Phylogeography and patterns of diversity based on data from over 2600 populations.
Forest Ecology and Management. 156, 5-26
70 Palmé, A. E. and Vendramin, G. G. (2002) Chloroplast DNA variation, postglacial
recolonization and hybridization in hazel, Corylus avellana. Molecular ecology. 11,
1769-79
71 Elena-Rosselló, J. A. et al. (1992) Evidence for hybridization between sympatric
holm-oak and cork-oak in Spain based on diagnostic enzyme markers. Vegetation. 99,
115-118
72 Hamza, N. B. (2010) Cytoplasmic and nuclear DNA markers as powerful tools in
populations' studies and in setting conservation strategies. African Journal of
Biotechnology. 9, 4510-4515
73 Levy, F. et al. (1996) A population genetic analysis of chloroplast DNA in Phacelia.
Heredity. 76, 143-55
74 Taberlet, P. et al. (1991) Universal primers for amplification of three non-coding
regions of chloroplast DNA. Plant Molecular Biology. 17, 1105-1109
75 Aoki, K. et al. (2003) Intraspecific sequence variation of chloroplast DNA among the
component species of evergreen broad-leaved forests in Japan. Journal of plant
research. 116, 337-44
76 Baraket, G. et al. (2008) Chloroplast DNA analysis in Tunisian fig cultivars (Ficus
carica L.): Sequence variations of the trnL-trnF intergenic spacer. Biochemical
Systematics and Ecology. 36, 828-835
77 Rathbone, D. A. et al. (2007) Microsatellite and cpDNA variation in island and
mainland populations of a regionally rare eucalypt, Eucalyptus perriniana
(Myrtaceae). Australian journal of botany. 55, 513-520
78 Kress, W. J. et al. (2005) Use of DNA barcodes to identify flowering plants. PNAS.
102, 8369-8374
79 Nishizawa, T. and Watano, Y. (2000) Primer pairs suitable for PCR-SSCP analysis of
chloroplast DNA in angiosperms. Journal of Phytogeography Taxon. 48, 63-66
80 Calonje, M. et al. (2008) Non-coding nuclear DNA markers in phylogenetic
reconstruction. Plant Systematics and Evolution. 282, 257-280
81 Hare, M. P. (2001) Prospects for nuclear gene phylogeography. Trends in Ecology &
Evolution. 16, 700-706
82 Bhargava, A. and Fuentes, F. F. (2010) Mutational dynamics of microsatellites.
Molecular biotechnology. 44, 250-66
83 Goldstein, D. B. and Pollock, D. D. (1997) Launching Microsatellites: A Review of
Mutation Processes and Methods of Phylogenetic Inference. Journal of Heredity. 88,
335-342
84 Qureshi, S. N. et al. (2004) EST-SSR: A New Class of Genetic Markers in Cotton. The
Journal of Cotton Science. 8, 112-123
85 Oliveira, E. J. et al. (2006) Origin, evolution and genome distribution of
microsatellites. Genetics and Molecular Biology. 29, 294-307
86 Lazrek, F. et al. (2009) The use of neutral and non-neutral SSRs to analyse the genetic
structure of a Tunisian collection of Medicago truncatula lines and to reveal
associations with eco-environmental variables. Genetica. 135, 391-402
87 Hornero, J. et al. (2001) Testing the Conservation of Quercus spp. Microsatellites in
the Cork Oak, Q. suber L. Silvae Genetica. 50, 3-4
88 Soto, A. et al. (2003) Nuclear Microsatellite Markers for the Identification of Quercus
ilex L. and Q. suber L. hybrids. Silvae Genetica. 52, 63-66
89 Nagaraj, S. H. et al. (2007) A hitchhiker's guide to expressed sequence tag (EST)
analysis. Briefings in bioinformatics. 8, 6-21
90 Bouck, A. and Vision, T. (2007) The molecular ecologist's guide to expressed
sequence tags. Molecular ecology. 16, 907-24
91 Kim, K. S. et al. (2008) Utility of EST-derived SSRs as population genetics markers in
a beetle. The Journal of heredity. 99, 112-24
92 Ueno, S. and Tsumura, Y. (2007) Development of ten microsatellite markers for
Quercus mongolica var. crispula by database mining. Conservation Genetics. 9, 1083-1085
93 Ellis, J. R. and Burke, J. M. (2007) EST-SSRs as a resource for population genetic
analyses. Heredity. 99, 125-32
94 Porth, I. et al. (2005) Linkage mapping of osmotic stress induced genes of oak. Tree
Genetics & Genomes. 1, 31-40
95 Cuénoud, P. et al. (2002) Molecular phylogenetics of Caryophyllales based on nuclear
18S rDNA and plastid rbcL, atpB and matK DNA sequences. American Journal of
Botany. 89, 132-144
96 Jeffrey, J. A. and Lexer, C. (2008) A set of novel DNA polymorphisms within
candidate genes potentially involved in ecological divergence between Populus alba
and P. tremula, two hybridizing European forest trees. Molecular Ecology Resources.
8, 188-192
97 Casasoli, M. et al. (2006) Comparison of Quantitative Trait Loci for Adaptive Traits
Between Oak and Chestnut Based on an Expressed Sequence Tag Consensus Map.
Genetics. 172, 533-546
98 Dow, B. D. et al. (1995) Characterization of highly variable (GA/CT)n microsatellites
in the bur oak, Quercus macrocarpa. Theoretical and Applied Genetics. 91, 137-141
99 Steinkellner, H. et al. (1997) Identification and characterization of (GA/CT)n-
microsatellite loci from Quercus petraea. Plant molecular biology. 33, 1093-6
100 Kampfer, S. et al. (1998) Characterization of (GA)n Microsatellite Loci from Quercus
robur. Hereditas. 129, 183-186
101 Alberto, F. et al. (2010) Population differentiation of sessile oak at the altitudinal front
of migration in the French Pyrenees. Molecular ecology. 19, 2626-39
102 Van Oosterhout, C. et al. (2004) Micro-Checker: Software for Identifying and
Correcting Genotyping Errors in Microsatellite Data. Molecular Ecology Notes. 4,
535-538
103 Thompson, J. D. et al. (1997) The CLUSTAL_X windows interface: flexible strategies
for multiple sequence alignment aided by quality analysis tools. Nucleic acids
research. 25, 4876-82
104 Larkin, M. A. et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics
(Oxford, England). 23, 2947-8
105 Hall, T. A. (1999) BioEdit: A biological user-friendly sequence alignment editor and
analysis program. Nucleic Acids Symposium. 41, 95-98
106 Pina-Martins, F. and Paulo, O. S. (2008) Concatenator: Sequence Data Matrices
Handling Made Easy. Molecular ecology resources. 8, 1254-5
107 Swofford, D. L. (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other
Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts
108 Ronquist, F. and Huelsenbeck, J. P. (2003) MrBayes 3: Bayesian phylogenetic
inference under mixed models. Bioinformatics. 19, 1572-1574
109 Nylander, J. (2004) MrModeltest V2. Evolutionary Biology Centre.
110 Bandelt, H. J. et al. (1999) Median-joining networks for inferring intraspecific
phylogenies. Molecular biology and evolution. 16, 37-48
111 Watterson, G. A. (1978) The homozygosity test of neutrality. Genetics. 88, 405-17
112 Slatkin, M. (1994) An exact test for neutrality based on the Ewens sampling
distribution. Genetical Research. 64, 71-74
113 Slatkin, M. (1996) A correction to the exact test based on the Ewens sampling
distribution. Genetical Research. 68, 259-260
114 Excoffier, L. and Lischer, H. E. L. (2010) Arlequin suite ver 3.5: a new series of
programs to perform population genetics analyses under Linux and Windows.
Molecular Ecology Resources. 10, 564-567
115 Harpending, H. C. (1994) Signature of ancient population growth in a low-resolution
mitochondrial DNA mismatch distribution. Human biology an international record of
research. 66, 591-600
116 Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by
DNA polymorphism. Genetics. 123, 585-95
117 Fu, Y.-X. (1997) Statistical Tests of Neutrality of Mutations Against Population
Growth, Hitchhiking and Background Selection. Genetics. 147,
915-925
118 Ramos-Onsins, S. E. and Rozas, J. (2002) Statistical properties of new neutrality tests
against population growth. Molecular biology and evolution. 19, 2092-100
119 Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop
software for Windows and Linux. Molecular Ecology Resources. 8, 103-106
120 Librado, P. and Rozas, J. (2009) DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics (Oxford, England). 25, 1451-2
121 Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New
York, USA. 512 pp
122 Goudet, J. (1995) FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics.
Journal of Heredity. 86, 485-486
123 Goudet, J. (2001) FSTAT, a program to estimate and test gene diversities and fixation
indices (version 2.9.3). Available ,
124 Peakall, R. and Smouse, P. E. (2006) genalex 6: genetic analysis in Excel. Population
genetic software for teaching and research. Molecular Ecology Notes. 6, 288-295
125 Weir, B. S. and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of
population structure. Evolution. 38, 1358-1370
126 Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele
frequencies. Genetics. 139, 457-62
127 Crawford, N. G. (2010) Smogd: Software for the Measurement of Genetic Diversity.
Molecular ecology resources. 10, 556-7
128 Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular
Ecology. 17, 4015-4026
129 Hedrick, P. W. (2005) A Standardized genetic differentiation measure. Evolution. 59,
1633-1638
130 Nei, M. and Chesser, R. K. (1983) Estimation of fixation indices and gene diversities.
Annals of Human Genetics. 47, 253-259
131 Rousset, F. (1997) Genetic differentiation and estimation of gene flow from F-statistics
under isolation by distance. Genetics. 145, 1219-1228
132 Jensen, J. L. et al. (2005) Isolation by distance, web service. BMC genetics. 6, 13
133 Pritchard, J. K. et al. (2000) Inference of population structure using multilocus
genotype data. Genetics. 155, 945-59
134 Hubisz, M. J. et al. (2009) Inferring weak population structure with the assistance of
sample group information. Molecular ecology resources. 9, 1322-32
135 Falush, D. et al. (2003) Inference of population structure using multilocus genotype
data: linked loci and correlated allele frequencies. Genetics. 164, 1567-87
136 Evanno, G. et al. (2005) Detecting the number of clusters of individuals using the
software STRUCTURE: a simulation study. Molecular ecology. 14, 2611-20
137 Rosenberg, N. A. (2004) Distruct: a program for the graphical display of population
structure. Molecular Ecology Notes. 4, 137-138
138 Guillot, G. et al. (2005) Geneland: a computer package for landscape genetics.
Molecular Ecology Notes. 5, 712-715
139 Guillot, G. et al. (2005) A spatial statistical model for landscape genetics. Genetics.
170, 1261-80
140 François, O. et al. (2006) Bayesian clustering using hidden Markov random fields in
spatial population genetics. Genetics. 174, 805-16
141 Excoffier, L. et al. (1992) Analysis of molecular variance inferred from metric
distances among DNA haplotypes: application to human mitochondrial DNA
restriction data. Genetics. 131, 479-91
142 Burgarella, C. et al. (2009) Detection of hybrids in nature: application to oaks
(Quercus suber and Q. ilex). Heredity. 102, 442-52
143 Lexer, C. et al. (2004) Hybrid zones as a tool for identifying adaptive genetic variation
in outbreeding forest trees: lessons from wild annual sunflowers (Helianthus spp.).
Forest ecology and management. 197, 49-64
144 Petit, R. J. et al. (2002) Identification of refugia and post-glacial colonisation routes of
European white oaks based on chloroplast DNA and fossil pollen evidence. Forest
Ecology and Management. 156, 49-74
145 Dow, B. D. and Ashley, M. V. (1996) Microsatellite analysis of seed dispersal and
parentage of saplings in bur oak, Quercus macrocarpa. Molecular ecology. 5, 615-627
146 Hu, X. S. and Ennos, R. A. (1999) Impacts of seed and pollen flow on population
genetic structure for plant genomes with three contrasting modes of inheritance.
Genetics. 152, 441-50
147 Streiff, R. et al. (1999) Pollen dispersal inferred from paternity analysis in a mixed oak
stand of Quercus robur L. and Q. petraea (Matt.) Liebl. Molecular Ecology. 8, 831-841
148 Ramírez-Valiente, J. A. et al. (2009) Elucidating the role of genetic drift and natural
selection in cork oak differentiation regarding drought tolerance. Molecular ecology.
18, 3803-15
149 Wellenreuther, M. et al. (2011) Environmental and climatic determinants of molecular
diversity and genetic population structure in a coenagrionid damselfly. PloS one. 6,
e20440
150 Lewontin, R. C. (1964) The Interaction of Selection and Linkage. I. General
Considerations; Heterotic Models. Genetics. 49, 49-67
151 Meirmans, P. G. and Hedrick, P. W. (2011) Assessing population structure: F(ST) and
related measures. Molecular ecology resources. 11, 5-18
Supporting Information
Supporting Information 1
Information regarding the primers used for the amplification of each cpDNA fragment is
summarized in Table S1.1, as well as the annealing temperatures for PCR amplification and
fragment sizes.
Table S1.1: Description of the cpDNA fragments used concerning primer sequences, annealing temperature (Ta
in °C) and fragment size (in base pairs).
Locus | Forward primer | Reverse primer | Ta | Size | Reference
TrnL-F | 5' GGT TCA AGT CCC TCT ATC CC 3' | 5' ATT TGA ACT GGT GAC ACG AG 3' | 65 | 381 | Taberlet et al., 1991
TrnS-PsbC | 5' TGA ACC TGT TCT TTC CAT GA 3' | 5' GAA CTA TCG AGG GTT CGA AT 3' | 65 | 250 | Nishizawa & Watano, 2000
TrnH-PsbA | 5' CGC GCA TGG TGG ATT CAC AAT CC 3' | 5' GTT ATG CAT GAA CGT AAT GCT C 3' | 65 | 478 | Kress et al., 2005
matK | 5' CGA TCT ATT CAT TCA ATA TTT C 3' | 5' TCT AGC ACA CGA AAG TCG AAG T 3' | 65 | 740 | Cuénoud et al., 2002
rbcla | 5' ATG TCA CCA CAA ACA GAG ACT AAA GC 3' | 5' GTA AAA TCA AGT CCA CCR CG 3' | 65 | 552 | Kress & Erickson, 2007
A description of the three nuclear candidate genes tested in this study is summarized
in Table S1.2, as well as the annealing temperatures and primers for PCR amplification and
fragment sizes.
Table S1.2: Primer sequences and bibliographic references, annealing temperature (in °C), fragment size (in
base pairs) and locus information for the nuDNA fragments.
Locus | Forward primer | Reverse primer | Description | Ta | Size | Reference
EST 2T13 | 5' CAT GCA CTG CCA ATC TCA GAG A 3' | 5' ATA ATT TGC CTC ATC ACT ACA TAA GA 3' | Osmotic stress related gene | 55 | 249 | Porth et al., 2005
Cons 58 | 5' CCA ATT CTC TTA GTG GCA AGG 3' | 5' GCT TTG GGA TGA TGT TTT GG 3' | Auxin repressed protein | * | * | Casasoli et al., 2006
Phyt B | 5' ATA TGG CGA ATA TGG GGT CA 3' | 5' GGC ATC CAT TTC TGC ATT CT 3' | Phytochrome B, involved in flower phenology | * | * | Jeffrey & Lexer, 2008
* Amplification product was never obtained for cork oak.
Supporting Information 2
Information regarding the 11 dinucleotide nuclear microsatellite (nuSSR) markers is
summarized in Table S2.1. A description and relevant information about the 6 EST-SSRs
tested in this study is also summarized in Table S2.2.
Table S2.1: Description of the nuSSRs used concerning primer sequences, annealing temperatures (Ta in ºC),
repeat motif and size ranges (in base pairs).
Locus | Forward primer | Reverse primer | Ta | Repeat motif | Expected size range (bp) | Found size range (bp) | Reference
MsQ13 | 5' TGG CTG CAC CTA TGG CTC TTA G 3' | 5' ACA CTC AGA CCC ACC ATT TTT CC 3' | 55 | (AG)n | 222-246 | 218 | Dow et al., 1995
QpZAG9 | 5' GCA ATT ACA GGC TAG GCT GG 3' | 5' GTC TGG ACC TAG CCC TCA TG 3' | 50 | (AG)12 | 182-210 | 223-249 | Steinkellner et al., 1997
QpZAG15 | 5' CGA TTT GAT AAT GAC ACT ATG G 3' | 5' CAT CGA CTC ATT GTT AAG CAC 3' | 57 | (AG)23 | 108-152 | 101-135 | Steinkellner et al., 1997
QpZAG36 | 5' GAT CAAA AAT TTG GAA TAT TAA GAG AG 3' | 5' ACT GTG GTG GTG AGT CTA ACA TGT AG 3' | * | (AG)19 | 210-236 | * | Steinkellner et al., 1997
QpZAG46 | 5' CCC CTA TTG AAG TCC TAG CCG 3' | 5' TCT CCC ATG TAA GTA GCT CTG 3' | * | (AG)13 | 190-222 | * | Steinkellner et al., 1997
QpZAG110 | 5' GGA GGC TTC CTT CAA CCT ACT 3' | 5' GAT CTC TTG TGT GCT GTA TTT 3' | 50 | (AG)15 | 206-262 | 208-258 | Steinkellner et al., 1997
QrZAG11 | 5' CCT TGA ACT CGA AGG TGT CCT T 3' | 5' GTA GGT CAA AAC CAT TGG TTG ACT 3' | 50 | (TC)18 | 238-263 | 255-281 | Kampfer et al., 2004
QrZAG7 | 5' CAA CTT GGT GTT CGG ATC AA 3' | 5' GTG CAT TTC TTT TAT AGC ATT CAC 3' | 50 | (TC)17 | 115-153 | 115-133 | Kampfer et al., 2004
QrZAG20 | 5' CCA TTA AAA GAA GCA GTA TTT TGT 3' | 5' GCA ACA CTC AGC CTA TAT CTA GAA 3' | 50 | (TC)22 | 160-200 | 161-171 | Kampfer et al., 2004
QsA11 | 5' GAT CTC TTT GTC AAC CCA GAC 3' | 5' ATG TGT GTG GTG ATG GGT TT 3' | * | (CA)n | 258-276 | * | Simões de Matos 2007
QsD8 | 5' GAT CCT CTG CTT CTC TCT G 3' | 5' CTG CAA CTT TAT CCG CCT CC 3' | * | (CA)n | 140-150 | * | Simões de Matos 2007
* Amplification product was never obtained, or the scoring was unreliable.
Table S2.2: Primer sequences [92], annealing temperature (in ºC), repeat motif, size ranges (in base pairs) and
locus information for the EST-SSRs.
Locus | Forward primer | Reverse primer | Ta | Repeat motif | Expected size range (bp) | Found size range (bp) | Description
QmOST1 DN949770 | 5' CAA CCA TCG AGG CCA TTA CGA A 3' | 5' TCA CCG ATC TTG AAG GTC CTC GA 3' | 58 | (AG)19 | 149-171 | 134-152 | EST Non-coding
QmD12 CR627959 | 5' GCT CCC TGG TAG TCG GCT AAA GA 3' | 5' CAA TTG GGA CAA CAT GGA AGC AT 3' | 58 | (GCA)7 | 243-251 | 240-246 | EST Coding; Zinc finger protein
QmAJ1 AJ577265 | 5' ATT CAG GCC GCA AAT CAA TAA GG 3' | 5' GAA ACT GGT CCC CTT CTC TTG GA 3' | 57 | (GAA)6 | 374-380 | 360-375 | EST Coding; Pheromone receptor-like protein
QmDN1 DN950717 | 5' TAG TTT TCC CAG CGA ATC CAA CA 3' | 5' CTT CTT GAA GGG ACT GAC CCC AT 3' | 58 | (GGA)6 | 242-261 | 236 | EST Coding; Salt tolerance protein
QmDN2 DN949776 | 5' CAA CCA TCG AGG CCA TTA CGA A 3' | 5' TCA CCG ATC TTG AAG GTC CTC AG 3' | * | (AG)9 | 156-168 | * | EST Non-coding; 60S ribosomal protein L21
QmDN3 DN950726 | 5' TCA AAC AAT CTC AAG GCT CCC AA 3' | 5' GCT TTT GAG AAA CTT TGG CCA CC 3' | 58 | (TC)10 | 361-381 | 361-375 | EST Non-coding; Putative carboxyl-terminal proteinase
* Amplification product was never obtained, or the scoring was unreliable.
Supporting Information 3
The concatenated cpDNA matrix is 1109 bp long, of which 92 sites are variable. The model
of sequence evolution for the Bayesian analysis (BA) was calculated separately for each
cpDNA data set. The BA tree was very similar to the MP tree; therefore only the MP tree for
the concatenated dataset is presented (Fig. S3.1). The concatenated tree supports the results
of the individual trees, in which the four major groups are present (Fig. 3.1a, Fig. 3.2 and
Fig. 3.3). Group A, highlighted in yellow, is composed of the cork oak samples belonging to
the pure lineage, distributed in the three sublineages (A1, A2 and A3), in accordance with
the TrnS/PsbC tree (Fig. 3.1). Group B is the most variable group; it is composed of several
haplotypes of cork oak samples from the introgressed lineage, together with samples from
Quercus ilex (subsp. rotundifolia and subsp. ilex) and Quercus coccifera. Group C, composed
of several Quercus species, is closely related to Group A. Group D consists of Quercus
rubra, which is placed as the most distant species from cork oak, as in the phylogeny of the
TrnH/PsbA fragment (Fig. 3.2).
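The two summary figures given for the concatenated matrix (total length and number of
variable sites) can be recomputed from any set of aligned fragments. The following is a
minimal sketch, not the procedure used in this study: the function names and the toy
sequences are hypothetical, and it assumes that all fragments list the samples in the same
order and that gaps and missing data should be ignored.

```python
def concatenate(fragments):
    """Join per-fragment alignments sample by sample (same sample order assumed)."""
    return ["".join(parts) for parts in zip(*fragments)]


def count_variable_sites(alignment):
    """Count alignment columns that contain more than one nucleotide state."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    variable = 0
    for column in zip(*alignment):
        # Gaps and missing data are not treated as a nucleotide state (assumption).
        states = {base.upper() for base in column} - {"-", "N", "?"}
        if len(states) > 1:
            variable += 1
    return variable


# Hypothetical toy alignments: three samples, two fragments.
frag_1 = ["ACGT", "ACGA", "ACGT"]
frag_2 = ["GG-C", "GATC", "GGTC"]

matrix = concatenate([frag_1, frag_2])
print(len(matrix[0]), count_variable_sites(matrix))
```

Whether gapped columns count as variable is a choice that differs between programs, so the
result may not match the number reported by a given phylogenetic package.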
Figure S3.1: Maximum parsimony tree of the cpDNA concatenated dataset. Four groups are represented and
color coded. Group A is highlighted in yellow: cork oak's pure lineage (Bright Yellow - Sublineage A2 (Sl
A2); Brownish-Yellow – Sublineage A3 (Sl A3); Light Yellow – Sublineage A1 (Sl A1)); Group B (orange –
cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group C is
highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q.
canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at
the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian
credibility values.
Supporting Information 4
Median-joining analysis of the cpDNA fragments produced haplotype networks (Fig. S4.1,
Fig. S4.2 and Fig. S4.3) that reflect the four major groups found in the phylogenetic trees.
They also show that Q. suber shares haplotypes, in clade B, with Quercus coccifera, Q. ilex
ilex and Q. ilex rotundifolia. Although the networks do not clearly reflect the phylogenetic
relationships between the groups, they give visual support for the distances between them,
since a network is a simple and clear way to represent the mutational steps between
haplotypes, as well as the haplotype frequencies. The median-joining networks of the
haplotypes observed in each Quercus species for the fragments TrnS/PsbC, TrnH/PsbA and
TrnL-F are presented in Fig. S4.1, Fig. S4.2 and Fig. S4.3, respectively.
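The "mutational steps" shown on the network branches correspond to the number of positions
at which two aligned haplotypes differ. The sketch below computes these pairwise counts for
hypothetical haplotypes; it only illustrates the distances a network is built from and is
not an implementation of the median-joining algorithm. All names and sequences are
illustrative assumptions.

```python
from itertools import combinations


def mutational_steps(seq_a, seq_b):
    """Number of aligned positions at which two haplotypes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical haplotypes (not taken from the study).
haplotypes = {"H1": "ACGTAC", "H2": "ACGTAT", "H3": "TCGTAT"}

steps = {
    (a, b): mutational_steps(haplotypes[a], haplotypes[b])
    for a, b in combinations(haplotypes, 2)
}
for pair, n in steps.items():
    print(pair, n)
```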
Figure S4.1: A median-joining haplotype network generated from 250 bases of the TrnS/PsbC intergenic spacer
region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading
indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure
lineage and Q. cerris (Bright Yellow - Sublineage A2; Brownish-Yellow – Sublineage A3, including Q. cerris;
Light Yellow – Sublineage A1); Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q.
rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light
Blue: Q. rubra). Each number in the network indicates the number of mutations between the haplotypes. Black
circles indicate the presence of a missing ancestral haplotype.
Figure S4.2: A median-joining haplotype network generated from 478 bases of the TrnH/PsbA intergenic
spacer region. Circle size reflects the relative frequency of each haplotype across all 10 Quercus species.
Shading indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork
oak's pure lineage, with Q. cerris; Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q.
rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light
Blue: Q. rubra). Each number in the network indicates the number of mutations between the haplotypes. Black
circles indicate the presence of a missing ancestral haplotype.
Figure S4.3: A median-joining haplotype network generated from 381 bases of the TrnL-F intergenic spacer
region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading
indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure
lineage, with Q. cerris; Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q. rotundifolia;
Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light Blue: Q. rubra).
Each number in the network indicates the number of mutations between the haplotypes. Black circles indicate
the presence of a missing ancestral haplotype.
Supporting Information 5
Global evaluation of the EST-SSR dataset using MICRO-CHECKER v2.2.3 [102] revealed no
evidence of genotyping errors due to stuttering or large allele dropout, but identified
possible null alleles, indicated by a general excess of homozygotes, at two loci: QmOST1 and
QmDN3 (p<0.05). For the QmOST1 locus, null alleles are possible in the populations of
Haza del Lino (HAZ) and Mekna (MEK) (Fig. S5.1). However, the charts show that in both
populations the observed homozygote frequencies are barely outside the range of the expected
values. Therefore this microsatellite was not discarded from the subsequent analyses.
For the QmDN3 locus the observed homozygote frequencies were clearly outside the expected
range. The fact that this was detected in all 13 populations is a strong indication that
null alleles are indeed present at this locus. A representative example is shown in
Fig. S5.2. As a result this locus was discarded from all subsequent analyses.
Regarding the nuSSR dataset, the global evaluation with MICRO-CHECKER again revealed
no evidence of genotyping errors due to stuttering or large allele dropout, but identified
possible null alleles in a few populations for the markers QpZAG110 (Fig. S5.3), QrZAG11
(Fig. S5.4) and QrZAG20 (Fig. S5.5). Specifically, for the QpZAG110 locus null alleles were
detected in the populations of Serra da Arrábida (ARR) and Serra do Buçaco (BUC); for the
QrZAG11 locus the possibility of null alleles was detected in the Serra da Estrela (EST)
population; and for the QrZAG20 locus in the populations of Puglia (PUG) and Serra de
Monchique (MON). However, the charts show that the observed homozygote frequencies are
barely outside the range of the expected values, and null alleles are indicated for only
one or two populations out of the 13 analysed. Therefore these microsatellites were not
discarded from the subsequent analyses.
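The homozygote excess that signals possible null alleles can be illustrated by comparing
the observed homozygote frequency at a locus with the frequency expected under
Hardy-Weinberg proportions. The sketch below is an illustration under stated assumptions,
not the MICRO-CHECKER procedure (which assesses the excess by simulation): the genotypes
are hypothetical, no sample-size correction is applied, and the null allele frequency uses
the Brookfield (1996) estimator 1, r = (He - Ho) / (1 + He).

```python
from collections import Counter


def homozygosity_summary(genotypes):
    """Return (observed homozygosity, expected homozygosity, null allele estimate)."""
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    # Expected homozygosity under Hardy-Weinberg: sum of squared allele frequencies.
    expected_hom = sum((c / total_alleles) ** 2 for c in allele_counts.values())
    observed_hom = sum(1 for a, b in genotypes if a == b) / n
    he = 1.0 - expected_hom
    ho = 1.0 - observed_hom
    # Brookfield (1996) estimator 1 of the null allele frequency.
    null_freq = (he - ho) / (1.0 + he)
    return observed_hom, expected_hom, null_freq


# Hypothetical diploid genotypes (allele sizes in bp) for one locus in one population.
sample = [(100, 100), (100, 100), (102, 102), (100, 102)]
observed, expected, r = homozygosity_summary(sample)
print(observed, expected, round(r, 3))
```

A positive estimate only suggests null alleles; inbreeding or population substructure can
produce the same excess, which is one reason for checking whether it recurs across
populations.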
Figure S5.1: MICRO-CHECKER charts for the QmOST1 locus for the populations of HAZ (Figs. a and b) and
MEK (Figs. c and d). The significance level is 0.05; a) Frequency differences in base pairs for the population
HAZ; b) Homozygote frequencies for the population HAZ; c) Frequency differences in base pairs for the
population MEK; d) Homozygote frequencies for the population MEK.
Figure S5.2: MICRO-CHECKER charts of the QmDN3 locus for the population of Serra da Arrábida (ARR),
shown as a representative example of the indication of null alleles in all 13 populations. The significance
level is 0.05; a) Frequency differences in base pairs for the population ARR; b) Homozygote frequencies for
the population ARR.
Figure S5.3: MICRO-CHECKER charts for the locus QpZAG110 for the populations ARR and BUC. The
significance level is 0.05; a) Frequency differences in base pairs for the population ARR; b) Homozygote
frequencies for the population ARR; c) Frequency differences in base pairs for the population BUC; d)
Homozygote frequencies for the population BUC.
Figure S5.4: MICRO-CHECKER charts for the locus QrZAG11 for the population EST. The significance level
is 0.05; a) Frequency differences in base pairs; b) Homozygote frequencies.
Figure S5.5: MICRO-CHECKER charts for the QrZAG20 locus for the populations of PUG (Figs. a and b) and
MON (Figs. c and d). The significance level is 0.05; a) Frequency differences in base pairs for the population
PUG; b) Homozygote frequencies for the population PUG; c) Frequency differences in base pairs for the
population MON; d) Homozygote frequencies for the population MON.
Supporting Information 6
Linkage disequilibrium is the non-random association of alleles at two or more loci. This is a
statistical association, and the loci do not necessarily have to be physically linked [150].
Genotypic linkage disequilibrium between all pairs of loci was tested by means of a
contingency exact test using GenePop v4 [119] (Table S6.1). No significant departure from
the null hypothesis of linkage equilibrium was detected. Therefore the eight polymorphic
microsatellite markers should be useful for this study.
Loci combination | p
EST-SSRs
QmOST1 & QmD12 | 0.37
QmOST1 & QmAJ1 | 0.10
QmD12 & QmAJ1 | 0.07
nuSSRs
QpZAG110 & QpZAG9 | 0.38
QpZAG110 & QrZAG20 | 0.90
QpZAG9 & QrZAG20 | 0.43
QpZAG110 & QrZAG7 | 0.96
QpZAG9 & QrZAG7 | 0.51
QrZAG20 & QrZAG7 | 0.34
QpZAG110 & QrZAG11 | 0.81
QpZAG9 & QrZAG11 | 0.38
QrZAG20 & QrZAG11 | 0.10
QrZAG7 & QrZAG11 | 0.97
Complete dataset
QmOST1 & QpZAG110 | 0.95
QmOST1 & QpZAG9 | 0.88
QmOST1 & QrZAG20 | 0.18
QmOST1 & QrZAG7 | 0.00
QmOST1 & QrZAG11 | 0.44
QmD12 & QpZAG110 | 0.88
QmD12 & QpZAG9 | 0.95
QmD12 & QrZAG20 | 0.05
QmD12 & QrZAG7 | 0.25
QmD12 & QrZAG11 | 0.86
QmAJ1 & QpZAG110 | 0.58
QmAJ1 & QpZAG9 | 0.96
QmAJ1 & QrZAG20 | 0.59
QmAJ1 & QrZAG7 | 0.15
QmAJ1 & QrZAG11 | 0.82
Table S6.1: Test for linkage disequilibrium for all pairs of loci
using Fisher's method, implemented in GenePop software.
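Fisher's method, named in the caption of Table S6.1, combines the p-values of several
independent tests (for example, one exact test per population for a given pair of loci)
into a single p-value. The following is a minimal sketch of the method itself, assuming
independent tests; it is not GenePop code, and the input p-values are hypothetical. The
statistic -2 Σ ln(p_i) follows a chi-square distribution with 2k degrees of freedom, whose
upper tail has a closed form for even degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value) with 2k degrees of freedom.
    """
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for 2k degrees of freedom (closed form).
    combined = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return statistic, combined


# Hypothetical per-population p-values for one pair of loci.
stat, p_combined = fisher_combined_p([0.5, 0.5])
print(round(stat, 4), round(p_combined, 4))
```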
Supporting Information 7
Although FST is widely used as a measure of population differentiation and structure, it has
been criticized because of its dependency on within-population diversity, which has led to the
development of replacement statistics such as D, the measure of actual differentiation among
populations, according to Jost [128]. Nevertheless, Meirmans & Hedrick [151] recommend
continuing to use FST in combination with the new statistics.
Tests of pairwise Dest were performed for the thirteen populations. Both SSR matrices
were analysed together. The overall genetic differentiation at the microsatellite loci was low
(pairwise Dest from 0.000 to 0.097) (Table S7.1). The Dest values closely resembled the FST
and RST matrices (Table 3.6), although with a tendency to be lower.
Table S7.1: Pairwise Dest values between all populations.
ALG – Forêt des Guerbès (Algeria); ARR – Arrábida (Portugal); BUC – Buçaco (Portugal); CAT – Cataluña
(Spain); HAZ – Haza del Lino (Spain); EST – Estrela (Portugal); GER – Gerês (Portugal); PUG – Puglia
(Italy); KEN – Kenitra (Morocco); TAZ – Taza (Morocco); MON – Monchique (Portugal); SIN – Sintra
(Portugal); MEK – Mekna (Tunisia).
 | ALG | ARR | BUC | CAT | HAZ | EST | GER | PUG | KEN | TAZ | MON | SIN | MEK
ALG | -- | 0.010 | 0.021 | 0.031 | 0.005 | 0.012 | 0.007 | 0.056 | 0.039 | 0.006 | 0.033 | 0.012 | 0.000
ARR | | -- | 0.000 | 0.017 | 0.016 | 0.001 | 0.000 | 0.050 | 0.040 | 0.008 | 0.000 | 0.000 | 0.024
BUC | | | -- | 0.031 | 0.060 | 0.002 | 0.002 | 0.041 | 0.065 | 0.009 | 0.005 | 0.003 | 0.028
CAT | | | | -- | 0.035 | 0.050 | 0.029 | 0.097 | 0.051 | 0.045 | 0.043 | 0.037 | 0.070
HAZ | | | | | -- | 0.032 | 0.017 | 0.073 | 0.026 | 0.009 | 0.039 | 0.030 | 0.012
EST | | | | | | -- | 0.000 | 0.037 | 0.057 | 0.013 | 0.001 | 0.016 | 0.015
GER | | | | | | | -- | 0.033 | 0.030 | 0.013 | 0.001 | 0.005 | 0.013
PUG | | | | | | | | -- | 0.042 | 0.034 | 0.055 | 0.072 | 0.066
KEN | | | | | | | | | -- | 0.025 | 0.069 | 0.070 | 0.062
TAZ | | | | | | | | | | -- | 0.037 | 0.016 | 0.010
MON | | | | | | | | | | | -- | 0.029 | 0.046
SIN | | | | | | | | | | | | -- | 0.031
MEK | | | | | | | | | | | | | --
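The dependence of FST-type measures on within-population diversity, and the way Jost's D
avoids it, can be seen in a small worked example. The sketch below uses the
population-parameter forms G_ST = (H_T - H_S) / H_T (Nei's multi-allele analogue of FST) and
D = [(H_T - H_S) / (1 - H_S)] · [n / (n - 1)]; it is an illustration with hypothetical allele
frequencies and omits the sample-size corrections that distinguish the estimator Dest.

```python
def heterozygosity(freqs):
    """Expected heterozygosity (gene diversity) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst_and_jost_d(populations):
    """Return (G_ST, Jost's D) for equally weighted populations at one locus."""
    n = len(populations)
    h_s = sum(heterozygosity(p) for p in populations) / n  # mean within-population
    mean_freqs = [sum(column) / n for column in zip(*populations)]
    h_t = heterozygosity(mean_freqs)  # total, from pooled mean frequencies
    g_st = (h_t - h_s) / h_t
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))
    return g_st, d


# Hypothetical case: two populations sharing no alleles, each with four equally
# frequent alleles (frequencies listed over the same eight alleles).
pop_1 = [0.25, 0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0]
pop_2 = [0.0, 0.0, 0.0, 0.0, 0.25, 0.25, 0.25, 0.25]

g_st, d = gst_and_jost_d([pop_1, pop_2])
print(round(g_st, 3), d)
```

In this constructed case the populations are completely differentiated, yet G_ST stays
low because within-population diversity is high, whereas D reaches its maximum of 1.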
Supporting Information 8
The estimated number of populations (K) should be treated with care, and a biological
interpretation of K may not be straightforward. We used the posterior probability of the data
for a given K, LnP(D), to identify the most probable number of clusters, both by plotting the
average values of LnP(D) and by using the ad hoc statistic DeltaK (DK) [136]. Because the K
with the highest LnP(D), the (ad hoc) estimate of the number of groups given by STRUCTURE,
does not always correspond to the real number of clusters, DeltaK was also considered: this
ad hoc quantity, related to the second-order rate of change of the log probability of the
data with respect to the number of clusters, tends to be a good predictor of the real number
of clusters.
The EST-SSR and nuSSR datasets were analysed separately and then merged to determine the
genetic structure of the species (Fig. 3.8). The plots of the logarithm of the probability
of the data [LnP(D)] and of Evanno's criterion [136] are shown in Fig. S8.1, Fig. S8.2 and
Fig. S8.3 for the EST-SSR, nuSSR and combined datasets, respectively.
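The DeltaK statistic described above can be sketched as the absolute second-order
difference of the mean LnP(D) across successive K values, divided by the standard deviation
of LnP(D) among replicate runs at that K. The code below is an illustrative sketch, not the
software used in this study: the LnP(D) values are hypothetical, and taking the second
difference of the replicate means is an assumption about how the statistic is implemented.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute DeltaK for every K that has both K-1 and K+1 available.

    lnp_by_k maps K to the list of LnP(D) values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # DeltaK is undefined for the smallest and largest K tested
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        # stdev is zero if all replicates agree exactly, which would need handling.
        result[k] = second_difference / stdev(lnp_by_k[k])
    return result


# Hypothetical LnP(D) values from two replicate runs per K.
lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(best_k, {k: round(v, 2) for k, v in dk.items()})
```

Because DeltaK needs both neighbouring K values, it cannot be evaluated at K = 1, so
it cannot by itself indicate the absence of structure.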
Figure S8.1: Estimated number of populations (K) derived from the STRUCTURE clustering analyses, for the
EST-SSRs dataset. Mean posterior probabilities of the data [LnP(D)] with standard deviation over 20
replicated runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1
to 13).
Figure S8.2: Estimated number of populations (K) derived from the STRUCTURE clustering analyses, for the
nuSSRs dataset. Mean posterior probabilities of the data [LnP(D)] with standard deviation over 20
replicated runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1
to 13).
Figure S8.3: Estimated number of populations (K) derived from the STRUCTURE clustering analyses, for the
combined dataset. Mean posterior probabilities of the data [LnP(D)] with standard deviation over 20
replicated runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1
to 13).
Supporting Information 9
Several AMOVA (hierarchical Analysis of Molecular Variance) [141] analyses, each with 1000
permutations, were performed based on the allelic frequencies (FST values) (Table S9.1). The
aim was to assess how the genetic variability is distributed among the hierarchical levels:
groups (FCT), populations (FSC) and individuals (FST). The structures considered correspond
to the clusters (K) obtained with the programs STRUCTURE [133] (Fig. 3.8) and GENELAND [138]
(Fig. 3.9). The best genetic structure is assumed to be the one in which the groups explain
the largest part of the variation (FCT), that is, the one that maximizes the break between
populations.
Structure | Among groups: % | Fct | Among populations within groups: % | Fsc | Within populations: % | Fst
Two clusters (K=2) | 2.81 | 0.02814 *** | 3.16 | 0.03249 *** | 94.03 | 0.02814 ***
Four clusters (K=4) | 3.37 | 0.05817 *** | 2.45 | 0.02532 *** | 94.18 | 0.03370 ***
Six clusters (K=6) | 4.99 | 0.05844 *** | 0.85 | 0.00897 ** | 94.16 | 0.04992 ***
Table S9.1: Variation percentages over different levels estimated with AMOVA. The analysis
was performed for the SSR loci combined dataset, based on Fst values.
% = Percentage of the total molecular variance explained
Significance level **P<0.01, ***P<0.001
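The relation between the variance components of a hierarchical AMOVA and the three fixation
indices can be made explicit with the standard definitions: FCT is the among-group share of
the total variance, FSC is the among-population share of the variance found within groups,
and FST is the share due to both levels combined. The sketch below uses hypothetical
variance components rather than the values of Table S9.1, and the function name is an
illustrative assumption.

```python
def amova_fixation_indices(var_among_groups, var_among_pops, var_within_pops):
    """Fixation indices from the three variance components of a hierarchical AMOVA."""
    total = var_among_groups + var_among_pops + var_within_pops
    f_ct = var_among_groups / total
    f_sc = var_among_pops / (var_among_pops + var_within_pops)
    f_st = (var_among_groups + var_among_pops) / total
    return f_ct, f_sc, f_st


# Hypothetical variance components (not the values of Table S9.1).
f_ct, f_sc, f_st = amova_fixation_indices(0.05, 0.01, 0.94)
print(round(f_ct, 4), round(f_sc, 4), round(f_st, 4))

# The three indices are linked by (1 - FST) = (1 - FSC) * (1 - FCT).
print(abs((1 - f_st) - (1 - f_sc) * (1 - f_ct)) < 1e-12)
```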