
UNIVERSIDADE DE LISBOA

FACULDADE DE CIÊNCIAS

DEPARTAMENTO DE BIOLOGIA ANIMAL

Differentiation and genetic variability in cork oak populations

(Quercus suber L.)

Joana Seabra Pulido Neves da Costa

MESTRADO EM BIOLOGIA HUMANA E AMBIENTE

Lisboa


Dissertation supervised by:

Prof. Doutor Octávio Fernando de Sousa Salgueiro Godinho Paulo

Doutora Dora Cristina Vicente Batista Lyon de Castro


Preliminary Note

This master's thesis is written in English, since English is considered the universal language of science. For this reason, knowledge of and training in writing it are of considerable importance for anyone who intends to pursue a career in scientific research in Biology. Writing the thesis in English is also intended to speed up the preparation of manuscripts and subsequent scientific publications.

The bibliographic references follow the guidelines of the international scientific journal “Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is one of the most relevant journals in the field in which this thesis was developed, and its citation system is convenient for reading scientific review texts. Given also its high impact factor in the scientific community, this journal seemed an appropriate reference for the presentation of the bibliography.

The study presented in this thesis was carried out within the project PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos genéticos e genómicos do sobreiro: bases para uma gestão prospectiva”, funded by the Fundação para a Ciência e Tecnologia (FCT).

Foreword

The present master's thesis is written in English, which is considered the universal language of science; practising its writing and grammar is therefore of the utmost importance for those who intend to pursue a career in Biology and scientific research. Writing the thesis in English also speeds up the submission of the manuscripts for publication.

The bibliographic references follow the guidelines of the international scientific journal “Trends in Ecology and Evolution” (www.cell.com/trends/ecology-evolution/authors). This is one of the most relevant journals in the field in which this thesis was developed, and it has a high impact factor in the scientific community. It also uses a citation system that is convenient for reading long texts.

This study is part of the project PTDC/AGR-GLP/104966/2008, “Avaliação dos recursos

genéticos e genómicos do sobreiro: bases para uma gestão prospectiva”, funded by Fundação

para a Ciência e Tecnologia (FCT).

Acknowledgements

Now that this thesis is finished, I feel the need to thank everyone who, in one way or another, made it possible.

My first thanks go to my supervisors, Octávio Paulo and Dora Batista: to Professor Octávio for his encouragement and the trust he placed in me from the very beginning, and to Dora for proposing the topic of this master's thesis and for awakening my interest in plants.

To Professor Deodália Dias for the opportunities she gave me and for her unconditional support when problems are beyond our control and do not depend on us.

I particularly thank Professor Helena Almeida of the Instituto Superior de Agronomia for access to Herdade Monta da Fava, the source of some of the cork oak populations most important to this work. And to everyone who, directly or indirectly, helped with collecting the samples.

To CoBiG2, for the group that came together and for the good times. To Francisco Pina-Martins and Vera Nunes for bringing me up in my first days of laboratory life. To Sofia Seabra for her natural calm and her friendship. To Eduardo Marabuto for his good humour, much needed in difficult times. To Sara Ema: as incredible as it may seem to you, I think you get more stressed than I do, and that is an enormous help, as is your staying with me until the day of submission… even on crutches. To Diogo Silva for sharing some difficult moments with supervisors and the like. To Catarina Dourado, Ana Sofia, Patrícia Brás, Renata Martins, Inês Modesto and Bruno Vieira, who helped me along with the usual difficulties of a thesis. To the remaining members of CoBiG2, as well as to former members and the newest arrivals, thank you very much!

To Rita Oliveira and Raquel Vaz: you are not part of CoBiG2, but you are part of the family and deserve due recognition and thanks for everything you have “put up with” from me.

Special thanks to four people who must have suffered a great deal with me. It would take pages of thanks, but since that is not possible, the intention will have to do. To Catarina Dourado in particular: it was a long road, and I thank you for your affection, support and friendship. To Eduardo Marabuto for his valuable comments, above all on the Introduction. You are right about many things, but compromises have to be made. To Sofia Seabra for the all-important corrections; it would have been much harder without you. To Bruno Vieira for the endless hours spent listening to me complain about life in general and the thesis in particular, always with much love and affection! Thank you, all four of you!

To Diana Martins. You are not always with me, but you are always thinking of me, and your timing is impeccable for when I need you most.

And to my father, mother and grandparents. Although you come at the end of this list, you were probably the people who contributed most to the completion of this thesis. I clearly would not be here without the precious support of my family.

Summary

The year 2011 was designated “The International Year of Forests” by the United Nations General Assembly, in an attempt to raise public interest and to promote sustainable forest management and conservation for the benefit of future generations. Estimates by the FAO (Food and Agriculture Organization) for 2010 showed that 31% of the Earth's land surface is still covered by forests and that trees account for 90% of terrestrial biomass, comprising a total of 60,000 to 100,000 taxa. However, certain human-induced changes, mainly deforestation and climate change, have raised the proportion of tree species threatened with extinction to 10%.

Forest species have recently been widely used in population and evolutionary genetics, as well as in genomic studies. The main reasons are the particular characteristics of these non-classical models: as the product of millions of years of divergence and diversification, they show impressive levels of morphological diversity, evolutionary divergence and ecological diversity. Although the impact of global change on these species will depend largely on their capacity to respond, and on that of their ecosystems, genetic studies allow us, to some extent, to predict the evolutionary consequences of these changes, since they increase our knowledge of the biodiversity and evolution of these species.

The concept of “Phylogeography” was introduced by Avise et al. in 1987 and, over the last 25 years, has had a major impact on research, particularly in animals. In plants the results have been less clear-cut, mainly because of a lack of genetic variability suitable for phylogeographic analysis. It has proved considerably difficult to find a genetic marker in plants with a resolving power comparable to that of animal mitochondrial DNA. Nevertheless, plant phylogeography has developed considerably, especially in recent years, with the growing use of nuclear molecular markers and the collection of data from larger fragments of the chloroplastidial genome.

The genus Quercus (oaks, Fagaceae) is one of the most important groups of woody angiosperms in the Northern Hemisphere in terms of species diversity, ecological dominance and economic value. The genus is quite old, since the oldest fossil found dates from the Oligocene (34-23 million years ago). Oaks are dominant members of a wide variety of habitats, and an estimated 500-600 species exist worldwide.

Cork oak (Quercus suber L.) is one of the most important tree species of the Western Mediterranean region, both economically and ecologically, where it defines open woodlands (created and maintained by humans) known in Portugal as “montados”. Although discontinuous, the distribution range of cork oak extends from the Atlantic coast of North Africa and the Iberian Peninsula to south-western Italy, including the Mediterranean islands of Sicily and Sardinia, as well as the Mediterranean coastal areas of Algeria and Tunisia. Cork oak woodlands cover a total area of about 2.2 million hectares, from which 340,000 tonnes of cork are extracted per year. The largest areas are located in Portugal, with about 700,000 hectares, corresponding to 21% of the Portuguese forest area and 30% of the world's cork-producing area. Cork oak has been used since Antiquity for cork production, and this natural product has great economic value. The greatest threats, however, are faced by natural and marginal populations, which are often small, scattered and confined to restricted habitats. Many of these populations may be at risk of disappearing, mainly because of a lack of regeneration.

Because of its economic value, and also because cork oak woodlands are reservoirs of biodiversity and shelter a wide variety of endangered species, these populations are important material for genetic studies that can underpin the design of conservation programmes. It is therefore necessary to strengthen and expand knowledge of the spatial organization of genetic variation in the species, so that conscious and informed decisions on the conservation of genetic resources can be made.

Phylogeographic studies in Quercus suber have been limited in depth, and some have even been inconclusive. As a result, the evolutionary history of the species is not well understood, most likely because of the limited number of areas sampled or the low information content of the markers used. For example, studies involving Portuguese populations drew inferences from poor sampling of Portugal; since this is one of the most relevant regions in the past and present history of cork oak, broader coverage of the distribution range is needed, including some areas referred to as potential glacial refugia for other species. Moreover, since most phylogeographic studies are based on chloroplastidial DNA (cpDNA) data (PCR-RFLPs and SSRs), it should be asked whether other molecular approaches or genetic markers evolving faster than cpDNA would reveal a different evolutionary scenario. This master's thesis proposes an approach different from previous studies, combining data from chloroplastidial and nuclear DNA. This approach has never been applied to cork oak, and it is expected to add relevant phylogeographic information. More specifically, the objectives of this work were: 1. to infer the evolutionary history and demographic patterns of Quercus suber; 2. to explore the patterns of hybridization and introgression between cork oak and other Quercus species; 3. to assess the levels of diversity and differentiation among and within some key cork oak populations.

Sequencing several fragments made it possible to infer some details of the evolutionary history of the species. The traditionally used cpDNA was selected for sequencing of three intergenic regions (trnL-F, trnS-psbC and trnH-psbA) in a total of 148 samples from 26 populations. However, because phylogeographic inferences based on a single type of non-recombining marker can give misleading information about the evolutionary history of a species, the nuclear genome (nuDNA) was also explored by sequencing a candidate gene potentially involved in osmotic stress (EST 2T13) in 104 samples from the same 26 populations. Both datasets revealed two lineages in cork oak. One lineage, the “pure lineage”, appears to be practically exclusive to cork oak and is divided into three sub-lineages, possibly resulting from three refugial areas, one predominant in the western Mediterranean and the other two in the eastern Mediterranean. The other lineage appears associated with Quercus ilex (holm oak) and Quercus coccifera (kermes oak) and was named the “introgressed lineage”. This lineage appears to result from several hybridization and introgression events with Quercus ilex. The combined analysis of cpDNA and nuDNA sequences suggests that this introgression occurred in both directions between the two species, and that these events were frequent and repeated over a period of time.
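The haplotype-based reasoning above can be illustrated with a minimal sketch. Because the chloroplast genome does not recombine, the cpDNA spacers of each individual can be concatenated, and identical concatenated sequences counted as one haplotype; Nei's (1987) unbiased haplotype diversity, h = n/(n - 1)(1 - Σ p_i²), then summarizes diversity within a population. The sequences and the function name below are hypothetical, and the code assumes a clean alignment without gaps or missing data; it is an illustration, not the software or pipeline used in this work.

```python
from collections import Counter


def haplotype_diversity(sequences):
    """Nei's (1987) unbiased haplotype diversity for one population.

    Each element of `sequences` is one individual's aligned, concatenated
    cpDNA sequence; identical strings are counted as the same haplotype.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    counts = Counter(sequences)  # haplotype -> number of individuals carrying it
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


# Hypothetical toy data: three short spacer fragments concatenated per individual
samples = [
    "ACGT" + "TTA" + "GC",
    "ACGT" + "TTA" + "GC",
    "ACGA" + "TTA" + "GC",
    "ACGT" + "TCA" + "GC",
]

print(len(set(samples)))                       # 3 distinct haplotypes
print(round(haplotype_diversity(samples), 4))  # 0.8333
```

With real alignments, indels and missing data need an explicit policy (for example, excluding gapped sites) before sequences can be compared as plain strings.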

Finally, nuclear microsatellites, both derived from ESTs (Expressed Sequence Tags) (EST-SSRs) and anonymous (nuSSRs), provided a perspective on the patterns of genetic diversity and population structure of cork oak. First, EST-SSRs were established as valid markers in cork oak, contradicting the idea that EST-SSRs tend to show low polymorphism. Subsequently, a combined analysis of these two marker types (5 EST-SSRs and 3 nuSSRs) in 379 individuals from 13 populations detected relatively low but highly significant genetic diversity. Although no population structure was detected among the Portuguese populations, which grouped together in a single cluster, there is a tendency for Catalonia (Spain) to emerge as one of the most differentiated populations.

Overall, the objectives of this work were achieved, clarifying some aspects of the phylogeography and evolutionary history of cork oak. The introduction of new molecular markers was clearly informative, revealing new and unexpected aspects of the genetic patterns of the species and thus generating entirely new explanatory hypotheses for cork oak.

Keywords

Quercus suber, geographical structure, microsatellites, ESTs, introgression

Abstract

Cork oak (Quercus suber L.) is one of the most important tree species, economically and ecologically, in the Western Mediterranean region. Consequently, there is great interest in understanding its evolutionary history and current population structure. Although some details of the genetic divergence of cork oak populations have been uncovered, a different and complementary analysis of chloroplastidial and nuclear DNA markers (cpDNA and nuDNA) is likely to provide additional phylogeographically relevant information. To date, the molecular approach proposed in the present study has not been applied to cork oak: it combines cpDNA and nuDNA sequence variation with polymorphism data from anonymous nuclear microsatellites (nuSSRs) and EST-derived (Expressed Sequence Tags) microsatellites (EST-SSRs) to infer phylogeographic patterns and history, possible glacial refugia, diversity levels and geographic structure.

A genetic survey was conducted by sampling populations throughout the entire distribution range of the species. Genetic diversity was assessed at 8 nuclear microsatellite loci (3 EST-SSRs and 5 nuSSRs) in 379 individuals from 13 populations, and at 4 DNA fragments (3 cpDNA intergenic spacer regions and 1 osmotic stress-related candidate gene) in 148 samples from 26 populations.
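As an illustration of what genetic diversity at a microsatellite locus measures, the sketch below computes observed heterozygosity and Nei's (1978) unbiased expected heterozygosity, H_E = 2N/(2N - 1)(1 - Σ p_i²), for one diploid locus, where N is the number of genotyped individuals and p_i are the allele frequencies. The genotypes (allele sizes in base pairs) and the function names are hypothetical, and individuals with missing genotypes are assumed to have been removed beforehand; this is not necessarily the estimator or the software used in this thesis.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Nei's (1978) unbiased expected heterozygosity at one diploid locus.

    `genotypes` is a list of (allele1, allele2) tuples, e.g. fragment sizes in bp.
    """
    alleles = [a for pair in genotypes for a in pair]  # 2N gene copies
    two_n = len(alleles)
    if two_n < 2:
        raise ValueError("at least one genotyped individual is needed")
    counts = Counter(alleles)
    sum_p2 = sum((c / two_n) ** 2 for c in counts.values())
    return two_n / (two_n - 1) * (1.0 - sum_p2)


# Hypothetical genotypes of four trees at one SSR locus
locus = [(180, 184), (180, 180), (184, 188), (180, 188)]

print(observed_heterozygosity(locus))            # 0.75
print(round(expected_heterozygosity(locus), 4))  # 0.7143
```

Averaging these per-locus values across loci gives the per-population summaries usually reported for SSR data.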

DNA sequences from both cpDNA and nuDNA confirmed two main lineages of cork oak haplotypes: the first, named the pure lineage, is mostly exclusive to cork oak but also shared with Q. cerris, and the second, the introgressed lineage, is shared with Q. ilex and Q. coccifera. The cpDNA sequences, however, reveal the complexity of the introgressed lineage, apparently indicating that hybridization and introgression events may have occurred frequently and repeatedly over a period of time. The hypothesis of cork oak refugia during the last glaciations was also revisited (using the pure-lineage cpDNA haplotypes), and three major haplotypes were detected, reflecting three possible refugial areas. Finally, the microsatellite data showed low but significant population differentiation, and the geographic subdivisions that could be defined grouped the Portuguese populations in a single cluster, while characterizing the Catalonia (Spain) population as possibly the most differentiated.
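To make "low but significant population differentiation" more concrete, the sketch below computes Nei's G_ST = (H_T - H_S)/H_T for a single locus, where H_S is the mean gene diversity within populations and H_T is the gene diversity of the allele frequencies averaged across populations. It is a simplified estimator, without sample-size corrections and with populations weighted equally; the thesis may have used a different statistic, and assessing significance would additionally require a permutation test, which is not shown. The population names, allele data and function names are hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele in a list of gene copies."""
    n = len(alleles)
    return {a: c / n for a, c in Counter(alleles).items()}


def nei_gst(populations):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, without sample-size corrections.

    `populations` maps a population name to the list of allele copies sampled
    there (two per diploid individual). Populations are weighted equally.
    """
    freqs = [allele_frequencies(a) for a in populations.values()]
    k = len(freqs)
    # H_S: mean gene diversity within populations
    h_s = sum(1.0 - sum(p ** 2 for p in f.values()) for f in freqs) / k
    # H_T: gene diversity of the mean allele frequencies across populations
    all_alleles = set().union(*freqs)
    mean_p = {a: sum(f.get(a, 0.0) for f in freqs) / k for a in all_alleles}
    h_t = 1.0 - sum(p ** 2 for p in mean_p.values())
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele copies (fragment sizes in bp) at one SSR locus
pops = {
    "pop_A": [180, 180, 184, 184],
    "pop_B": [180, 184, 184, 184],
}

print(round(nei_gst(pops), 4))  # 0.0667
```

With several loci, H_S and H_T are usually averaged across loci before taking the ratio, rather than averaging per-locus G_ST values.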

Key words

Quercus suber, geographical structure, microsatellites, ESTs, introgressive hybridization

List of abbreviations

AFLP – Amplified Fragment Length Polymorphisms

BA – Bayesian analysis

BLAST – Basic Local Alignment Search Tool

bp – base pairs

BP – Before Present

CBOL – The Consortium for the Barcode of Life

COI or Cox1 – cytochrome c oxidase I

cpDNA – Chloroplastidial DNA

ESTs – Expressed Sequence Tags

EST-SSRs – EST-derived SSRs

FAO – Food and Agriculture Organization (of the United Nations)

ITS – Internal Transcribed Spacer

kb – Kilobases

MCMC – Markov Chain Monte Carlo

MP – Maximum Parsimony

mtDNA – Mitochondrial DNA

nuDNA – Nuclear DNA

nuSSRs – nuclear SSRs

PCR – Polymerase chain reaction

RAPDs – Random Amplified Polymorphic DNA

rDNA – Ribosomal DNA

RFLP – Restriction Fragment Length Polymorphism

sncDNA – Single-copy nuclear DNA

SNPs – Single Nucleotide Polymorphisms

SSR – Simple sequence repeats; microsatellites

Table of Contents

1. Introduction
Thesis Main Goals
1.1 An emblematic tree: Quercus suber L.
1.1.1 General aspects on cork oak and the “montado”
1.1.2 Taxonomic classification and phylogenetic studies
1.1.2.1 Barcoding in oak phylogenetics
1.1.3 Geographical distribution
1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization
1.1.5 Genetic diversity studies
1.1.6 Hybridization and cytoplasmatic introgression
1.2 Molecular markers in phylogeography
1.2.1 Mitochondrial DNA (mtDNA)
1.2.2 Chloroplastidial DNA (cpDNA)
1.2.3 Nuclear DNA (nuDNA)
1.2.4 Simple Sequence Repeats (SSRs)
1.2.5 Expressed Sequence Tags (ESTs)
2. Materials and Methods
2.1 Sampling and DNA extraction
2.2 DNA sequencing
2.3 Microsatellite genotyping
2.4 Phylogenetic and phylogeographic analysis
2.5 Selective neutrality tests and demographic history
2.6 Genetic diversity and population differentiation
2.7 Genetic structure of populations
3. Results
3.1 Sequencing of chloroplast and nuclear DNA fragments
3.1.1 cpDNA and nuDNA diversity levels
3.1.2 Differentiation patterns
3.1.3 Mismatch distribution and neutrality tests
3.2 Microsatellite analysis
3.2.1 Genetic diversity values
3.2.2 Genetic differentiation among populations
3.2.3 Population structure
4. Discussion
4.1 Differentiation and demographic patterns
4.2 Hybridization and introgression
4.3 Genetic diversity and population structure
5. Final Remarks
6. Bibliographic References
Supporting Information
Supporting Information 1
Supporting Information 4


1. Introduction

The year 2011 was designated “The International Year of Forests” by the United Nations General Assembly, in an attempt to raise awareness and to promote more sustainable management and conservation of all types of forests for the benefit of current and future generations. In 2010, estimates by the Food and Agriculture Organization of the United Nations (FAO) showed that 31% of the Earth's land surface is still covered by forests, and that trees account for 90% of the Earth's biomass [1]. Some estimates put global tree species richness at 60,000 to 100,000 taxa, and forests harbour the majority of the world's terrestrial biodiversity [2]. However, ongoing deforestation and other human-induced global changes (such as in climate and land use) have brought the proportion of the world's tree species threatened with extinction close to 10% [3,4] and, although the overall rate of deforestation remains alarmingly high (estimated at 9.4 million hectares per year in the late 1990s), this rate is, surprisingly, slowing down [5].

In recent years, forest trees have attracted much attention as non-classical models for several types of studies. For population genetics, evolutionary genetics and genomics they are particularly interesting, since forest trees are the product of millions of years of lineage divergence and diversification and show remarkable levels of diversity in morphology, adaptation and ecology [6,7]. Although the impact of global change on forest trees will ultimately depend to a great extent on the response of these trees and their ecosystems, genetic studies open the possibility of predicting the evolutionary consequences of future global change by increasing our knowledge of tree biodiversity and evolution [8]. Phylogeographic studies are therefore an important step towards understanding these processes.

Avise et al. introduced the concept of “Phylogeography” in 1987 [9], and over the past 25 years or so phylogeography has had a major impact on research, particularly on animal species. In plants, however, the results have been less clear-cut. One of the major problems has been a lack of genetic variation useful for phylogeographic analysis: it is quite difficult to find plant genetic markers with a resolving power comparable to that of animal mitochondrial DNA (mtDNA) [7,10]. Nonetheless, plant phylogeography has come a long way over the last few years, thanks to the availability of nuclear markers and the collection of data from larger sections of the chloroplastidial genome [11].

The genus Quercus (oaks) of the family Fagaceae is one of the most important groups of woody angiosperms in the Northern Hemisphere in terms of species diversity, ecological dominance and economic value. The genus is quite old: the oldest unequivocal oak fossils date from the Oligocene, which spans 34 to 23 million years before present. Oaks are dominant members of a wide variety of habitats, and some 500-600 species exist worldwide [12,13].

Quercus suber L. (commonly known as cork oak) is among the most important tree species, both economically and ecologically, in the Western Mediterranean region, where it is endemic and defines unique open woodlands (created and maintained by humans) known in Portugal as “montados” and in Spain as “dehesas”. Quercus suber has been used mostly to produce cork, a natural product of great economic value. The greatest threats, however, are faced by the marginal natural populations, which often grow in small, scattered stands and in restricted habitats and are at risk of disappearing, mainly due to a lack of regeneration [14].

Because of the species' economic value, and also because cork oak woodlands are renowned reservoirs of biodiversity, home to a variety of threatened and endangered species, and crucial in preventing soil erosion, Q. suber populations represent valuable material for genetic studies as well as for gene conservation programmes. For that purpose, better knowledge of the spatial organization of genetic variation within the species is needed to support informed decisions on tree breeding and the conservation of genetic resources.

Thesis Main Goals

Although previous studies have already addressed population genetics in Quercus suber, there is still a gap in the current understanding of the evolutionary history of the species, mostly due to the limited number of geographical areas sampled or to the low information content of the markers used. In particular, Portuguese populations have been poorly represented in those studies and, since Portugal is one of the most relevant regions in the recent and past history of cork oak, a more complete range of cork oak distribution and differentiation should be covered, including some areas referred to as potential glacial refugia for other species. Moreover, since most previous phylogeographic inferences are supported by data from chloroplastidial markers [Restriction Fragment Length Polymorphism (RFLP) and microsatellites (SSRs)], the question arises whether other molecular approaches, or genetic markers evolving at faster rates than chloroplastidial DNA (cpDNA), would provide a different microevolutionary scenario. Therefore, the main objectives of this work were to:

1. Infer the evolutionary history and demographic patterns of Quercus suber;

2. Assess the patterns of hybridization and introgression with other Quercus species;

3. Evaluate the diversity and differentiation levels among and within some key cork

oak populations.

To achieve these goals, populations from across the entire Mediterranean distribution of the species were analysed with several molecular markers, using different approaches. A multi-locus sequencing approach was applied to infer the evolutionary history of the species, as well as its relationships and patterns of introgression with other Quercus species. Several fragments of the traditionally used cpDNA were selected for sequencing. Additionally, because phylogeographic inferences based on a single non-recombining marker can be misleading, the nuclear genome was also explored by sequencing one candidate gene. Finally, Expressed Sequence Tag (EST)-derived SSRs (EST-SSRs) and anonymous nuclear SSRs (nuSSRs) were used to provide a perspective on patterns of genetic diversity and population structure.

Figure 1.1: Quercus suber L. – natural cork oak population in Serra da Estrela, Portugal

1.1 An emblematic tree: Quercus suber L.

1.1.1 General aspects on cork oak and the “montado”

Cork oak (Quercus suber Linné, 1753) is an emblematic Mediterranean evergreen sclerophyllous tree. It is a slow-growing, extremely long-lived tree, reaching about 20 m in height, with massive branches forming a rounded crown (Fig. 1.1). It is a diploid (2n = 24), monoecious species (male and female reproductive organs on the same individual) with a protandrous system (anthers mature before carpels), which promotes cross-pollination. In natural populations, propagation occurs through seed (acorn) dispersal and subsequent germination (sexual reproduction), a process called natural regeneration. As in most oak species, the natural regeneration of cork oak depends mostly on wind and animals [15-

Cork oak and holm oak (Quercus ilex L., 1753) are the two main evergreen oak species in the western part of the Mediterranean Basin [17]. In the Iberian Peninsula in particular, these two species occur mostly as semi-natural stands known as “montados”, open woodlands with a delicate and distinctive ecosystem, created and maintained by humans.

The montado semi-natural landscape is valued because it represents a viable land use that still preserves rich biodiversity at all levels, from insects and flora to top predators such as the Iberian Imperial Eagle (Aquila adalberti) or the Iberian Lynx (Lynx pardinus), the world's most endangered cat, and their shared prey species, the rabbit (Oryctolagus cuniculus).

Montados also represent an important economic resource. With the exception of central Spain, however, holm oak forests can be regarded as rare cases of woodlands that have undergone little or no silvicultural management. Cork oak is managed far more intensively, since its high economic value derives not only from acorns but also from cork. The thick, soft bark of cork oak yields the familiar cork, the main product behind the economic importance of this partly domesticated species. Trees are first stripped of cork from the lower portion of the trunk at about 14 years of age and subsequently every 9-12 years, and can live through this process for 100 to 500 years without any apparent effect on their physiology. Acorns are eaten by birds and are highly valued as fattening fodder for domestic Iberian pigs. Since ancient times, cork oak has been favoured, and sometimes widely spread, by preferentially using acorns from trees producing good-quality cork [15,18-20]. Q. suber is therefore widely cultivated within its natural range; however, according to Carrión et al. [21], without human activity cork oak would never have developed pure stands in the Iberian Peninsula, and would instead form mixed forests with other sclerophyllous and deciduous oaks and with Pinus pinaster.

Cork oak forests cover 2.2 million hectares worldwide, from which about 340,000 tonnes of cork are extracted per year (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). The largest stands, covering about 700,000 ha, are located in Portugal, where they correspond to 21% of the national forest area and to 30% of the world's cork-producing area. The cork industry currently represents 3% of all Portuguese exports. Cork stoppers for wine bottles are the most representative product of this industry, accounting for 70% of its exports (see: http://www.amorim.com/cor_glob_cortica.php). Despite the economic importance of this renewable material, much remains to be discovered about the biological and genetic mechanisms involved in its formation. Human intervention through extensive plantation and systematic clear-cutting, aimed at empirically selecting varieties with higher-quality cork, is thought to have contributed strongly to the genetic homogenization of Q. suber populations in the Iberian Peninsula [22].

Cork oak plantations are economically important and also play a social and environmental role that must be taken into account, as the unprecedented decline occurring in the Iberian Peninsula and in Morocco threatens the entire ecosystem [22]. Although marginal and natural cork oak populations are possibly the most endangered, Iberian cork oak montados are also currently threatened and in decline owing to multiple factors. The main factor contributing to this decline is the occurrence of very severe droughts over several consecutive years [18]. The lack of natural regeneration (mainly due to overgrazing and insolation, particularly in North Africa) is another major factor, so stand sustainability cannot rely solely on the decreasing resprouting ability of aged and decaying adult trees [23,24]. In Portugal and Spain, a further contributing factor is ink disease, a root disease caused by the soil-borne pathogen Phytophthora cinnamomi. Moreover, the increasing replacement of traditional cork by synthetic stoppers in wine bottles is an additional factor that, together with those above, threatens this ecosystem in the medium term [18,22].

Holm oak montados are also endangered, for some of the same reasons and for others. The admired sustainability of montados is thus jeopardized, and these formations may become 'fossil forests' [23]. In this respect, the outlook is more favourable for Q. ilex, which is a more euryecious species than Q. suber (whose presence is limited by cold, drought and soil type). In recent years, there has been increasing recognition of the important contribution made by these species to the preservation of semi-natural habitats and landscapes in Europe [18,23,24]. Several studies on the regeneration of Mediterranean forests have been published, some of them centred on Q. ilex and Q. suber [24,25]. However, these works have focused mainly on ecological aspects of regeneration, silviculture and land use [23], without addressing the genetic bases of montado regeneration or the populations' genetic diversity and its consequences for adaptation. Addressing these questions is now an immediate priority for informed decisions on the conservation of genetic resources.

1.1.2 Taxonomic classification and phylogenetic studies

Several morphology-based proposals for Quercus taxonomy have been presented [12,26]; however, these classifications have always been controversial, mainly because of widespread intraspecific morphological variation, which is especially abundant in oaks and may result from hybridization and adaptation to environmental change [27]. As a result, classification has been far from straightforward and remains uncertain, especially at the subgenus level. The taxonomic scheme proposed by Schwarz in 1964 [26] is possibly the most widely accepted for the classification of cork oak, and appears to be the most suitable for describing the systematics of European oaks [19,28,29]. According to the Flora Europaea [17] and that same taxonomic scheme, the genus Quercus is divided into four subgenera (or subsections), as follows:

Order Fagales
    Family Fagaceae
        Genus Quercus
            Subgenus Cerris
            Subgenus Quercus
            Subgenus Sclerophyllodrys
            Subgenus Erythrobalanus

Quercus suber belongs to the family Fagaceae, genus Quercus and subgenus (or subsection)

Cerris (Spach) Oersted.

Quercus comprises 500-600 species, of which 350–500 are distributed throughout the Northern Hemisphere [12,13,30]. Oaks are conspicuous members of the temperate deciduous forests of North America, Europe and Asia, as well as of the evergreen Mediterranean maquis. A smaller number of oak species (30–35) are evergreen and grow mainly in south-western Asia, western North America and around the Mediterranean Basin. In the Mediterranean area, only four evergreen oak species have been identified. These include Quercus alnifolia Poech. (golden oak), endemic to Cyprus, and Quercus suber L. (cork oak), distributed exclusively in the western part of the Mediterranean Basin. The third species is holly oak, a complex including Quercus coccifera L. and Quercus calliprinos Webb. Allozyme studies suggest that holly oak should perhaps be considered a single species (Q. coccifera L.) with subspecies coccifera and calliprinos [19]. The fourth Mediterranean oak species, Quercus ilex L. (holm oak), shows two morphological types, rotundifolia and ilex [18,19,26], which are sometimes regarded as distinct species.

According to Schwarz [26], Q. ilex and Q. coccifera (including subsp. calliprinos) belong to subgenus Sclerophyllodrys (O. Schwarz), whereas Q. suber and Q. alnifolia belong to subgenus Cerris (Spach). This classification was also supported by RFLP analysis of nuclear ribosomal DNA (rDNA, 18S and 25S genes and spacer regions) [28] and of chloroplast DNA [30], and by nuclear DNA (nuDNA) internal transcribed spacer (ITS) sequences [27,30]; however, Q. alnifolia was not included in these studies. Moreover, the study of Manos et al. [30] provided evidence that the two groups of Mediterranean oaks (subg. Sclerophyllodrys and subg. Cerris sensu Schwarz; “Ilex group” and “Cerris group” sensu Nixon) are monophyletic, as previously reported by Nixon [29]. More recently, both have been placed in a larger group (the Eurasian Cerris group), which includes all the European and Asian evergreen oak species analysed [27,30]. Within subg. Cerris, several systematic studies indicate that Quercus cerris and Quercus crenata are the species most closely related to Q. suber [27,29-31].

1.1.2.1 Barcoding in oak phylogenetics

Tree species share several attributes, such as longevity, complex reproductive strategies, great potential for local adaptation, and slow mutation and speciation rates [2], that make barcoding of forest trees an intriguing issue from both theoretical and practical points of view. “DNA barcoding” is a molecular approach for identifying the species to which any living organism belongs using a standardised region of the genome (or several loci used together as a complementary unit). Ideally, the barcode system would be a universal and valuable resource allowing fast and unequivocal species identification and taxon characterization at any life stage of the specimen and from minimal tissue samples (http://www.barcoding.si.edu) [29,32,33]. Beyond taxonomy, widespread application of barcoding would be a powerful research complement for molecular ecology, phylogenetics and population genetics [34].

The success of a DNA sequence as a species identification tool (the barcode) depends on the existence of unique substitutions that distinguish closely related species, and on its ease of application across a broad range of taxa. A portion of the mitochondrial cytochrome c oxidase I (COI or cox1) gene is currently used as a universal barcode in several groups of animals, fungi, diatoms and red algae. However, COI has proved unsuitable for land plants, mainly because of the low nucleotide substitution rates of the plant mitochondrial genome [7,35,36]. The nuclear and plastid genomes therefore offer the best prospects of yielding a suitable sequence (or set of sequences) for plant DNA barcoding, i.e., one or more sequences variable enough to differentiate species, yet conserved enough to show low intraspecific variability [33,35]. The difficulty of finding a single barcode locus in plants led to a multilocus approach, with the chloroplast genome regarded as the most promising target for barcoding plant species. A pool of loci has therefore been considered recently, with the greatest interest focused on seven candidates: rpoB, rpoC1 and rbcL, three easy-to-align coding regions; a section of matK, a rapidly evolving coding region; and trnH-psbA, atpF-atpH and psbK-psbI, three rapidly evolving intergenic spacers [36,37]. Based on the relative ease of amplification, sequencing and multiple alignment, and on the amount of variation displayed, many research groups have proposed different combinations of these loci [32,36-39]. However, in 2009 the CBOL (Consortium for the Barcode of Life) Plant Working Group recommended the combination of rbcL and matK as the most convenient in terms of universality, sequence quality and discrimination power. Nevertheless, it is still argued that, regardless of the regions adopted for barcoding, some species will always be better resolved with other regions [29,36,40]. Oaks are one such example, and they represent an obstacle to the idea of a universal plant barcode.

A recent attempt to barcode the Italian wild dendroflora using four plastid regions (trnH-psbA, rbcL, rpoC1, matK) revealed that the genus Quercus is refractory to barcoding (0% discrimination success) [29], probably as a consequence of factors such as the low rate of variation in the plastid genome and hybridization. Moreover, it appears that the main obstacle to barcoding success in difficult genera such as Quercus cannot be overcome simply by adding more plastid DNA data. Nuclear DNA may offer some advantages owing to its higher mutation rates and biparental mode of inheritance. Discrimination of the same set of oak species had already been obtained using sequence variation in the internal transcribed spacer (ITS) region of ribosomal DNA [27], and these data even support the recognition of subgenera Sclerophyllodrys, Cerris and Quercus, as proposed by Schwarz [26]. The rapidly evolving ITS may thus represent a useful supplementary barcode in difficult genera, although it does not fully overcome existing problems, namely paralogy and other factors associated with the complex concerted evolution of this highly repeated region of the nuclear genome, and current protocols still require further refinement [7,35].
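The "0% discrimination success" reported for Quercus can be made concrete with a simple criterion: a species counts as discriminated only if none of its barcode haplotypes is shared with another species. The Python sketch below implements that exact-match criterion under stated assumptions: sequences are already aligned (and, for a multilocus barcode such as rbcL + matK, concatenated), and the function name and toy sequences are hypothetical. Published studies often use distance- or tree-based criteria instead, so this illustrates the idea rather than reproducing the method of [29].

```python
from collections import defaultdict


def discrimination_success(samples):
    """Fraction of species whose barcode haplotypes are never shared.

    samples: list of (species, sequence) pairs. Sequences are assumed to be
    aligned and, for multilocus barcodes, already concatenated.
    """
    species_by_haplotype = defaultdict(set)
    haplotypes_by_species = defaultdict(set)
    for species, seq in samples:
        species_by_haplotype[seq].add(species)
        haplotypes_by_species[species].add(seq)

    # A species is discriminated only if every haplotype it carries
    # is found in that species alone.
    discriminated = [
        sp for sp, haps in haplotypes_by_species.items()
        if all(species_by_haplotype[h] == {sp} for h in haps)
    ]
    return len(discriminated) / len(haplotypes_by_species)


# Hypothetical toy data: one haplotype shared by Q. suber and Q. ilex,
# as plastid introgression would produce.
samples = [
    ("Q_suber", "ACGTTA"), ("Q_suber", "ACGTTA"),
    ("Q_ilex", "ACGTTA"), ("Q_ilex", "ACGTCA"),
    ("Pinus_pinaster", "TTGACA"),
]
print(discrimination_success(samples))
```

Because the two oaks share a haplotype, both fail the criterion and only the outgroup is discriminated, which is the kind of pattern that drives discrimination success towards zero in hybridizing genera.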

1.1.3 Geographical distribution

The Mediterranean evergreen Quercus species form a group with overlapping habitats. In the western Mediterranean Basin, holm oak, cork oak and holly oak are the dominant broadleaved species. These three species are sympatric in many areas, but differences in their ecological requirements produce distinct responses to environmental conditions and hence different evolutionary histories, as confirmed by several studies showing differences in their patterns of genetic variation at both nuclear and cytoplasmic levels [18,19,30,41,42].

Figure 1.2: Geographical distribution of cork oak, Quercus suber, represented in dark grey. Based on Magri et al. [16]

Q. suber has a rather narrow geographical range compared with the other main evergreen Mediterranean oak species, mainly because of its ecological requirements. The modern, rather discontinuous distribution of cork oak extends from the Atlantic coasts of North Africa and the Iberian Peninsula to south-eastern Italy, and includes the main western Mediterranean islands of Sicily and Sardinia as well as the coastal belts of Algeria and Tunisia, Provence (France) and Catalonia (Spain) [16,43] (Fig. 1.2).

Unlike holm oak, which has a broad ecological amplitude, cork oak is restricted to warm variants (mean temperature of the coldest month above 4–5 ºC) of the humid and sub-humid Mediterranean areas with at least 450 mm of mean annual rainfall [18,20].

In Europe, low winter temperatures appear to set the limits of the geographic distribution, and most cork oak stands are located below 800 m in altitude, since cork oak leaves are less tolerant of frost and drought than those of the more widespread holm oak. In addition, whereas holm oak is indifferent to soil type, cork oak usually grows in acidic soils on granite, schist or sandy substrates and avoids limestone and other carbonate substrates. Cork oak distribution is therefore shifted further west and is more patchy than that of holm oak (sensu lato), which forms a continuum from Turkey to Portugal, including all the larger Mediterranean islands [17,18,20,43]. Despite this, within its geographical range cork oak shows high levels of morphological and phenological variability, although most of this diversity is thought to result from past introgressive hybridization with other sympatric species [15,44,45]. In their common distribution area, cork and holm oaks often grow together, and the local occurrence of morphologically intermediate trees has been reported [18,27].

1.1.4 Evolutionary history – Origin, glacial refugia and post-glacial recolonization

Several hypotheses have been advanced concerning the evolutionary history of cork oak and the geographical location of its centre of origin; however, the details of its differentiation processes are still largely unknown.

It was originally suggested that Q. suber may have originated in the Iberian Peninsula, where the species has its current main range (Fig. 1.2). This hypothesis was based on geobotanical studies and on allozyme variation across the whole cork oak range, which revealed substantially higher genetic diversity in Iberian populations than in those from North Africa, Italy and France [18,27]. Palaeoecological data indicate that both cork and holm oak have been present in southern Europe since the end of the Tertiary period. In addition, two Miocene cork oak fossils were found in Portugal, and two Pliocene fossils were recovered in Tunisia and Galicia (Spain). An early Cenozoic origin of Q. suber in Iberia therefore seems plausible, followed at the end of the Miocene by the colonization of North Africa across the Strait of Gibraltar [16,18,43].

Alternatively, based on Tertiary fossil records of other oak species of subgenus Cerris found in the Balkan Peninsula, it has been proposed that Q. suber, like the whole Cerris group, may have first appeared further east (either in the Balkan Peninsula or in the Middle Eastern-Peri-Caucasian area). According to this view, the species expanded westward during the late Miocene and was widespread throughout the Mediterranean Basin during the Pliocene, surviving there thanks to the absence of climatic constraints but later going extinct in the eastern part of its distribution area [27,43]. PCR-RFLP data on cpDNA fragments appear to provide additional evidence for an eastern origin of cork oak [43].

Glacial and periglacial environments have had a significant effect on the modern vegetation of Europe. It is widely accepted that the climatic oscillations of the Quaternary (i.e., approximately the past 2.6 million years) are among the most crucial determinants of the current distribution of biota in temperate latitudes. The spatial patterns of several tree species across the European continent are the long-term result of late-glacial and post-glacial migration from refugial populations that were able to withstand the severe climatic conditions of Pleistocene stadials [46-49]. With few exceptions [8,50,51], the glacial refugia postulated for most European woody angiosperms during the coldest periods of the last full glacial (37,000–16,000 years before present, BP) lay south of latitude 40º N, which runs from central Portugal to Sardinia, Calabria and northern Greece. This latitude is considered to have been the boundary between polar aridity and warmer climates during part of the Quaternary. The theory that southern Europe (particularly the three southern peninsulas: Balkan, Italian and Iberian) and the Near East provided suitable conditions for refugia of temperate tree taxa rests on a number of assumptions about the full-glacial environments of those regions and their ability to supply the conditions necessary for growth [52,53].

The original refugial model implied that 'forests' could have survived in these southern locations during the cold stages of the Quaternary. However, extensive tree populations have never been detected. Instead, traditional palaeogeographical models (although inferred from scarce palynological evidence) suggest a small number of refugia, the “few southern refugia” hypothesis [52,53]. Temperate tree taxa possibly survived in small pockets of microenvironmentally favourable sites, where usually only a few tree taxa are detected, and at low concentrations. Aridity was probably a significant limiting factor for tree growth. Under the “few southern refugia” hypothesis, common patterns of post-glacial colonization are expected for temperate European tree species, with high diversity in southern Europe decreasing northwards [54-57]. Several of the main European tree species, including several Quercus species and Abies alba (European silver fir), have been analysed with molecular markers, and the resulting patterns of diversity match these expectations [58,59].

However, more complete palaeobotanical data sets [50,53], palaeoclimatic modelling [60] and genetic research [52] are beginning to question the paradigm of “few southern refugia” in southern Europe (and in particular in the three southern peninsulas) during full-glacial periods. Increasing evidence indicates that during the last full-glacial period populations of coniferous and some deciduous trees grew much further north and east than previously assumed [53]. In addition, new palaeoclimatic simulations suggest that full-glacial conditions in central and eastern Europe were not nearly as severe as previously thought [60,61]. While some refugia for Mediterranean trees had previously been identified in the Iberian Peninsula, the results of López de Heredia et al., based on cpDNA PCR-RFLPs and a review of palaeobotanical data, support the presence of multiple refugia for evergreen oaks within the Iberian Peninsula (e.g. the Cantabrian mountain ranges, south-eastern Spain or even central Spain) during at least the last glacial period [52]. Under the “multiple refugia” hypothesis, tree species now present in northern and central Europe would have recolonized these areas from populations located in the north of the Iberian Peninsula, and these northern populations would have acted as barriers to expansion from southern refugia. If so, cpDNA data should show complex spatial patterns resulting from multiple secondary contact zones [8].

For the last glacial and postglacial periods, palynological data indicate the occurrence of cork oak in south-western Iberia since the Late Glacial (17,000-12,000 years BP) and in North Africa since the early Postglacial (approximately 8,500 years BP) [18,27]. It is widely thought that during the Quaternary glaciations cork oak survived in scattered refugia with favourable microclimatic conditions, from which postglacial colonization occurred over recent millennia. Palynological [21] and molecular data [43,52] indicate a glacial refugium in south-western Iberia from which populations expanded northwards, favoured by the absence of mountain barriers and by the presence of siliceous substrates. The extensive introgression of Q. suber with Q. ilex may also indicate several potential refugia in eastern Iberia [52]. RFLP analysis of the whole cpDNA shows a phylogeographical pattern of three groups corresponding to potential glacial refugia in Italy, North Africa and the Iberian Peninsula [43], from which, after the last glaciations, Q. suber may have begun migrating northward into southern France. However, no fossil record supports the molecular data for the Italian and North African refugia.

Reliable evidence confirming the presence of Q. suber in more northern and eastern European countries is lacking. The Tertiary and Quaternary remains (megafossils and pollen) found in several European countries did not allow identification to species level and could be attributed to any Mediterranean oak species of the Cerris group [43,62]. In fact, Q. suber is more thermophilous and has stricter soil requirements than many other Quercus species, so a greater reduction of its range during glacial times would be expected. However, the uncertainty of palynological discrimination and the low level of cpDNA variation itself could bias the identification of glacial refugia for Q. suber [52].

1.1.5 Genetic diversity studies

As cork oak is predominantly allogamous (i.e., it favours cross-fertilization), has a life span of up to 500 years or more and has a low replacement rate, it can be expected that, at least in some places and mostly for selectively neutral characters, selection over time may have resulted in reduced genetic differentiation both among trees of the same population and between populations. However, differentiation among populations has been detected around the Mediterranean Basin using both chloroplast and mitochondrial DNA (both maternally inherited in oaks, as demonstrated by Dumolin et al. [63]), as well as allozyme variation [16,18,19,67]. Isozyme variation in the genus Quercus also shows that genetic variability is high and similar to that found in conifers [15,65]. The high polymorphism found in cork oak, as in holm oak, may be largely attributable to the physiological plasticity of these species, which allows them to adapt to the variable and unpredictable conditions characteristic of the Mediterranean climate. High levels of diversity are observed within populations, whereas low inter-population variability indicates that most of the species' total genetic diversity lies within rather than among populations [15,18,19,23]. According to Elena-Rosselló & Cabrera [15], more than 83% of the total diversity in this species is found within populations, and the decline of kinship estimates with distance suggests that isolation by distance has produced this structure. These results for Q. suber contrast with those for most temperate forest species, which generally show a weak and narrower within-population structure [23]. In cork oak, gene flow between populations was estimated at more than one migrant per generation (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007), which under Wright's island model is theoretically sufficient to prevent genetic drift from causing local genetic differentiation and hence population divergence [66].
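The within/among partition of diversity quoted above can be illustrated with Nei's gene-diversity statistics: H_S is the mean within-population expected heterozygosity, H_T is the expected heterozygosity of the pooled allele frequencies, and G_ST = (H_T - H_S)/H_T is the share of diversity found among populations, so H_S/H_T is the share found within them. The Python sketch below is a single-locus version that weights populations equally; the allele frequencies are invented, and this is not necessarily the statistic used by Elena-Rosselló & Cabrera [15].

```python
def expected_heterozygosity(freqs):
    """Nei's gene diversity for one locus: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def partition_diversity(pop_freqs):
    """Return (H_S, H_T, G_ST) for one locus.

    pop_freqs: one allele-frequency list per population, all listing the
    same alleles in the same order. Populations are weighted equally.
    """
    n_pops = len(pop_freqs)
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # Pooled frequencies are the mean frequency of each allele across populations.
    pooled = [sum(column) / n_pops for column in zip(*pop_freqs)]
    h_t = expected_heterozygosity(pooled)
    g_st = (h_t - h_s) / h_t
    return h_s, h_t, g_st


# Hypothetical frequencies of three alleles in three populations.
pops = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.2, 0.2]]
h_s, h_t, g_st = partition_diversity(pops)
print(f"H_S={h_s:.3f} H_T={h_t:.3f} G_ST={g_st:.3f} within={h_s / h_t:.1%}")
```

With similar allele frequencies in every population, almost all of H_T is already present within each population, which is the situation described for cork oak.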

PCR-RFLP analyses of specific cpDNA fragments reveal a complex pattern of variation in the evergreen oaks [19,41]. Jiménez et al. [41] detected three very distinct lineages of cpDNA haplotypes, two of which are present in cork oak. One of these, the “suber” lineage, is specific to cork oak populations and may be considered the original and most widely distributed lineage in this species. López de Heredia et al. [64] reported part of the geographical distribution of this lineage, covering peninsular Italy, Sardinia, Sicily, Corsica, northern Africa and the island of Minorca. Cork oak populations from the Spanish mainland and from the island of Majorca were characterized by another maternal lineage, shared with Q. ilex and Q. coccifera, the “ilex-coccifera I” lineage [41,64]. This was interpreted as the result of multiple and mainly unidirectional cytoplasmic introgression of Q. suber by Q. ilex.

Lumaret et al. [43] used RFLP analysis of the whole chloroplast DNA for the first time in Q. suber to analyse phylogeographical variation across the whole species range (Fig. 1.3). The chlorotypes showed a clear phylogeographical pattern of three groups corresponding to potential glacial refugia in Italy, North Africa and the Iberian Peninsula. The most ancestral and the most recent groups were observed in populations located in the eastern and western parts of the species range, respectively. Unrelated chlorotypes of an “ilex” cpDNA lineage were also identified in certain western populations [43]. Starting from the 'ilex' lineage variants acquired through interspecific introgression, additional successive cpDNA changes may have occurred in Q. suber, so two distinct cpDNA lineages were predicted in cork oak. Lumaret et al. [43] also identified chlorotype S1, observed predominantly in continental Italy and Sicily, in a few populations from Sardinia and Corsica; Corsica also shared a rare chlorotype, S7, with Tunisia. This situation possibly reflects rare natural long-distance dispersal events from several source areas close to those islands. Moreover, intentional transport of acorns by people for economic purposes cannot be ruled out, and its impact on the geographical patterns of cork oak genetic variation should not be underestimated [43]. López de Heredia et al. [64] also proposed long-distance dispersal events to explain the sharing of a rare chlorotype by cork oak populations in Minorca and Sardinia.

Using cpDNA microsatellites, Magri et al. [16] analysed cork oak populations throughout the species' distribution range and found a strong geographical structure characterized by five distinct haplotypes (Fig. 1.4). H3 (North Africa-Sardinia-Corsica-Provence) and H4 (Portugal-western Spain-south-western France-northern Morocco) were assumed to be the ancestral Q. suber haplotypes, with H1 and H2 (Italy) and H5 (scattered populations) originating through ancient or recent introgression with Q. cerris (H1 and H2) and Q. ilex (H5). Moreover, the cpDNA SSR data, combined with palaeobotanical and geodynamic models, suggested that cork oak populations may have undergone a genetic drift geographically consistent with the Oligocene and Miocene break-up events of the European–Iberian continental margin, and persisted on some of the separate microplates now found in Tunisia, Sardinia, Corsica and Provence [16] (Fig. 1.5). All these events seem to have occurred without detectable cpDNA modifications over a time span of more than 15 million years.

Figure 1.3: Geographical distribution of the eight and six chlorotypes of the 'suber' and 'ilex' lineages identified in Q. suber populations by Lumaret et al. [43]. Chlorotypes were scored by RFLP variation over the whole cpDNA molecule. The identity of sampled populations and of the cpDNA chlorotypes assayed through RFLP, as well as their affiliation to the 'suber' or 'ilex' cpDNA lineages, are indicated in the figure. Source: Lumaret et al. [43].

Figure 1.4: Distribution of the cpDNA haplotypes found by Magri et al. [16] with cpDNA SSRs, and phylogenetic reconstruction of the relationships between haplotypes. The black circle in the network indicates a hypothesized mutation required to connect existing haplotypes within the network under maximum parsimony. The grey area corresponds to the current distribution of Q. suber. Source: Magri et al. [16].
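Because the chloroplast genome does not recombine, a cpSSR haplotype is simply the combination of allele sizes observed across all scored loci. The Python sketch below labels each distinct combination and computes Nei's unbiased haplotype diversity, h = n/(n - 1) * (1 - sum(p_i^2)), for a sample. The function names and allele sizes are hypothetical, and the labels are assigned in order of first appearance, so they do not correspond to the H1 to H5 of Magri et al. [16].

```python
from collections import Counter


def assign_haplotypes(profiles):
    """Label each distinct multilocus cpSSR profile as H1, H2, ...

    profiles: one tuple of allele sizes (bp) per individual, loci in a
    fixed order. cpDNA is non-recombining, so the whole tuple is one haplotype.
    """
    labels = {}
    assigned = []
    for profile in profiles:
        if profile not in labels:
            labels[profile] = f"H{len(labels) + 1}"
        assigned.append(labels[profile])
    return assigned


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: four trees scored at three cpSSR loci.
sample = [(120, 98, 143), (120, 98, 143), (121, 98, 143), (120, 98, 143)]
haps = assign_haplotypes(sample)
print(haps, round(haplotype_diversity(haps), 3))
```

A population fixed for a single haplotype has h = 0, so the strong geographical structure described above appears as low within-population h combined with different haplotypes in different regions.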

The modern history of Quercus suber is closely linked to the human use of its cork. For this reason, humans have been considered responsible for a reduction of genetic variation in some cork oak stands, as well as for hybridization with congeners [67]. Other cultivated tree species in the Mediterranean area display a similarly low geographical structure of genetic variation, suggesting multidirectional diffusion due to human activity. For example, in Castanea sativa, the low geographical structure of chloroplast genetic diversity may be explained by strong human impact [67,68]. However, the geographical distribution of the cork oak haplotypes found by Magri et al. [16] does not appear to be related to cultivation. In fact, fossil pollen and wood records suggest that cork oak was distributed in approximately the same areas as today even before the Neolithic.

Another possible explanation for these results is postglacial population expansion from the potential glacial refugia in Italy, North Africa and the Iberian Peninsula [43].

Figure 1.5: Reconstructions of the western Mediterranean palaeogeography and possible location of the Quercus suber haplotypes found by Magri et al. [16] (colours as in Fig. 1.4). Continental microterranes rifted off the European-Iberian continental margin: Rif (R), Betic range (B), Balearics (Ba), Kabylies (Ka), Corsica (Co), Sardinia (Sa), Calabria (Ca). Source: Magri et al. [16].

Some studies have also assessed the genetic variability of cork oak populations in Portugal. Coelho et al. [22] used AFLP markers and reported low levels of differentiation among cork oak populations. They attributed this to the outcrossing mating system of the species, long-distance wind (anemophilous) pollination and occasional secondary acorn dispersal by animals, which together lead to extensive gene flow and increasingly homogeneous allele frequencies among populations [22,45]. The population differentiation reported by Coelho et al. [22] (FST = 0.0172) is below the average of 0.07–0.09 expected for long-lived, wind-pollinated woody species. These results are similar to those obtained by Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) with nuclear SSRs (FST = 0.02), supporting the absence of population structure. This weak genetic differentiation among Portuguese cork oak stands, some of them separated by up to 700 km, may be explained by anthropogenic pressure in addition to constant gene flow. This study showed that 90% of the polymorphic markers identified in cork oak genotypes are uniformly distributed across the populations of the Algarve, Alentejo and Trás-os-Montes regions.
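The FST values quoted above measure how much of the total genetic diversity lies between rather than within populations. As a minimal sketch of that idea, the Python example below computes Nei's GST, (HT - HS)/HT, for a single locus from hypothetical allele frequencies. It weights populations equally and applies no sample-size correction, so it is not necessarily the estimator used in the studies cited.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Nei's gene diversity, 1 - sum(p_i^2), for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: Sequence[Sequence[float]]) -> float:
    """G_ST = (H_T - H_S) / H_T for a single locus.

    pop_freqs[k][i] is the frequency of allele i in population k. Populations
    are weighted equally and no sample-size correction is applied.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean expected heterozygosity within populations
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: expected heterozygosity of the pooled (averaged) allele frequencies
    pooled = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    if h_t == 0.0:  # monomorphic locus: no differentiation to measure
        return 0.0
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at one locus in three populations
similar = [[0.50, 0.30, 0.20], [0.52, 0.28, 0.20], [0.48, 0.32, 0.20]]
divergent = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.05, 0.15, 0.80]]
print(f"similar populations:   G_ST = {nei_gst(similar):.4f}")
print(f"divergent populations: G_ST = {nei_gst(divergent):.4f}")
```

When allele frequencies are nearly identical across populations, as reported for Portuguese cork oak stands, HS approaches HT and the index falls close to zero.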

1.1.6 Hybridization and cytoplasmic introgression

Capture of unexpected chloroplast haplotypes through hybridization and introgression has been proposed as the most likely explanation for the sharing of cytoplasmic genes among both deciduous and evergreen oaks [69], as well as in other taxa [41,70]. Q. suber has been reported to hybridize with several species of the evergreen oak group, particularly with holm oak [17,45], and this is regarded as one factor contributing to increased genetic diversity in cork oak [22]. Q. suber and Q. ilex have overlapping geographical distributions [17], and hybridization occurs in nature, although it is not frequent [45]. Nevertheless, these species are not very closely related, as shown by both cytoplasmic and nuclear genetic analyses [19,27,69], and they were placed in the subgenera Cerris and Sclerophyllodrys, respectively [26], although a more recent classification includes both species within the same Eurasian Cerris group [30].

The two most easily recognizable oak hybrids are Quercus x crenata (Q. cerris x Q. suber) and Quercus x morisii (Q. suber x Q. ilex) [27]. These relatively rare hybrids (0.3%) are found only where both parental species co-occur. Mature hybrid individuals are easily recognized by their morphological traits, which are intermediate between the two parental oaks [27], but seedlings and even juvenile trees are morphologically so similar that, in mixed stands, species identification is usually very difficult or even impossible until the adult stage [71]. Asymmetric hybridization was confirmed by Boavida et al. [45], who described post-pollination barriers in Q. suber to interspecific crosses with Q. ilex, Q. coccifera, Q. faginea and Q. robur. The cross between Q. ilex and Q. suber shows evidence of unidirectional compatibility: a higher success rate was reported for interspecific crosses in which Q. suber acts as pollen donor rather than as female parent, owing to differential pollen-tube growth in the two species [45]. In addition, since both species are protandrous and Q. ilex flowers earlier, early cork oak male flowers can pollinate late holm oak female flowers, whereas the reverse does not usually occur.

By analysing polymorphism at allozyme loci and DNA markers whose alleles are distinct in the two species where they grow in separate areas (diagnostic markers), evidence was obtained for the occurrence of hybrids and of genetic introgression (backcrosses between hybrids and parental species) between sympatric holm oak (female) and cork oak (male) at several locations [18,43,71]. Further evidence showed that, in both initial hybridization and backcrosses, Q. ilex is predominantly, but not exclusively, the maternal species. This interpretation is supported by the discovery of "ilex-coccifera I" haplotypes (a chlorotype shared by Q. ilex and Q. coccifera) in Q. suber individuals, and by the absence of the opposite situation, i.e. no Q. suber haplotypes within the Q. ilex pool [41].
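The logic of diagnostic markers can be illustrated with a simple hybrid index. The Python sketch below assumes fully diagnostic, biallelic nuclear loci, with hypothetical allele labels "S" (fixed in Q. suber) and "I" (fixed in Q. ilex). It reports the proportion of Q. suber alleles and the fraction of loci carrying one allele from each species: first-generation hybrids are expected to be heterozygous at every diagnostic locus (index 0.5), whereas backcrosses show intermediate values with partial heterozygosity. The categories are illustrative only; dedicated analyses typically rely on model-based assignment, and the maternal species is inferred separately from the chloroplast haplotype.

```python
from typing import List, Tuple

# One diploid genotype at a fully diagnostic locus: "S" = Q. suber allele,
# "I" = Q. ilex allele (hypothetical labels for illustration)
Genotype = Tuple[str, str]


def hybrid_index(genotypes: List[Genotype]) -> float:
    """Proportion of Q. suber-type alleles across all diagnostic loci."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    return alleles.count("S") / len(alleles)


def interspecific_heterozygosity(genotypes: List[Genotype]) -> float:
    """Fraction of diagnostic loci carrying one allele from each species."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def classify(genotypes: List[Genotype]) -> str:
    """Rough, illustrative categories; not a formal assignment test."""
    index = hybrid_index(genotypes)
    if index == 1.0:
        return "pure Q. suber"
    if index == 0.0:
        return "pure Q. ilex"
    if interspecific_heterozygosity(genotypes) == 1.0:
        return "F1-like hybrid"
    return "later-generation hybrid or backcross"


f1 = [("S", "I")] * 6
backcross = [("S", "I"), ("I", "I"), ("S", "I"), ("I", "I"), ("I", "I"), ("S", "I")]
for label, sample in [("F1", f1), ("backcross to Q. ilex", backcross)]:
    print(label, hybrid_index(sample), interspecific_heterozygosity(sample), classify(sample))
```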

Hybridization and introgression can lead to the total replacement of the Q. suber chlorotype by the "ilex-coccifera I" lineage [41,64]. This situation is common in eastern Spain, where siliceous soils are scarce and the effective population size is lower than in the continuous forests of western Iberia. On the basis of the differences between Q. ilex and Q. suber chlorotypes found in sympatric populations, it has been suggested that hybridization and introgression in these populations may be ancient [43]. Therefore, as reported by López-de-Heredia et al. [52], it cannot be ruled out that some populations in the eastern range of the species withstood glacial conditions by hybridizing with Q. ilex. For instance, a particular chlorotype found by these authors (named "c66") is predominant in all Q. suber populations from Catalonia (north-eastern Spain) but very rare in Q. ilex.

The absence of cork oak populations possessing "ilex" chlorotypes in the eastern Mediterranean range of the species was reported both by López-de-Heredia et al. [64] and by Lumaret et al. [43]. However, in a Corsican population, one of the 50 cork oak individuals scored for cpDNA RFLPs was shown to possess an "ilex" chlorotype [43], suggesting that cytoplasmic introgression of Q. suber by Q. ilex does occur in the eastern range, although apparently much less commonly. A substantial number of trees with morphology intermediate between the two species have been observed in south-eastern continental Italy [27], Sardinia and Provence [42]; these trees also predominantly possess an "ilex" chlorotype, and for many of them a hybrid origin was confirmed with nuclear interspecific diagnostic markers. Interspecific hybridization is therefore likely to have occurred quite frequently in the eastern part of the range of Q. suber as well [43].

1.2 Molecular markers in phylogeography

The use of molecular markers has revolutionized research fields such as conservation biology, population biology and ecology. Markers provide a means of observing otherwise hidden aspects of natural history, whether population-level interactions on ecological timescales or the evolutionary relationships of genes, populations and taxa [10]. As stated before, phylogeographic studies are scarce in plants compared with animals. One of the major problems is finding genetic variation suitable for this type of analysis: it has been quite difficult to find plant genetic markers with a resolving power comparable to that of animal mitochondrial DNA [7,10]. To address this problem, and to guide the choice of molecular markers for this study, the literature on plant genomes and the molecular markers available was reviewed.

Plant cells contain three genomes: the nuclear genome and two cytoplasmic genomes, mitochondrial and chloroplast DNA. The latter two are of endosymbiotic origin and have lost various genes to the nucleus over time (and, occasionally, vice versa). Because of their shared prokaryotic origin, these organelle genomes resemble animal mtDNA in overall structure (closed-circle chromosomes), replication mode (with large numbers of molecules per cell) and non-Mendelian inheritance. However, they also differ from animal mtDNA, and from one another, in some important molecular and evolutionary aspects [10,72].

1.2.1 Mitochondrial DNA (mtDNA)

Although phylogeographic studies in animals rely heavily on the mitochondrial genome, in plants several characteristics make it poorly suited to such studies [7,11].

Plant mtDNA is highly variable in size across species, ranging from about 20 kilobases (kb) to 2500 kb. Inheritance is often, but not always, maternal. Surprisingly, plant mtDNA evolves rapidly in gene order, with frequent rearrangements, but slowly in primary nucleotide sequence. Its rate of sequence evolution is therefore low (about 100 times slower than in animals), so that individual loci do not contain enough variation to generate an intraspecific phylogeographic signal. In these respects, the evolutionary dynamics of plant and animal mtDNA differ greatly, and informative variation must be sought in other genomes [7,10].

1.2.2 Chloroplastidial DNA (cpDNA)

Although similar to mtDNA, plant cpDNA follows different evolutionary rules. It varies moderately in size among species (from about 120 to 217 kb), with much of the size variation attributable to the extent of sequence repetition in a large inverted repeat region. The molecule contains about 120 genes coding for ribosomal and transfer RNAs and for several polypeptides involved in protein synthesis and photosynthesis. The chloroplast genome is transmitted maternally in most species, biparentally in some and paternally in others (notably most gymnosperms), and tends to evolve slowly both in gene arrangement and in primary nucleotide sequence (about 3 to 4 times faster than plant mtDNA, but still much slower than animal mtDNA). Because of this slow rate, cpDNA sequences have proven especially useful for estimating phylogenetic relationships in plants [7,10].
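Taking the approximate relative rates quoted above at face value, a short calculation places cpDNA relative to animal mtDNA. These are order-of-magnitude figures derived only from the ratios stated in the text, not measured rates for any particular locus.

```latex
\frac{\mu_{\text{cp}}}{\mu_{\text{animal mt}}}
  = \frac{\mu_{\text{cp}}}{\mu_{\text{plant mt}}}
    \times \frac{\mu_{\text{plant mt}}}{\mu_{\text{animal mt}}}
  \approx (3\text{ to }4) \times \frac{1}{100}
  = 0.03\text{ to }0.04
```

In other words, a cpDNA fragment would be expected to accumulate roughly 25 to 33 times fewer substitutions than an animal mtDNA fragment of the same length over the same period.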

Intraspecific cpDNA variation has been reported in a growing number of species, and almost all published plant phylogeographic studies have relied on the chloroplast genome as their only source of genetic variation [7,10]. Most of this variation has been revealed by restriction enzyme digestion of cpDNA (the RFLP technique), in which genetic variants reflect the gain or loss of restriction sites or length variation [73]. A more recent restriction enzyme-based approach digests PCR-amplified chloroplast loci to reveal restriction fragment length polymorphisms within the amplified fragment (PCR-RFLP) [52,59]. With these readily accessible laboratory techniques, large portions of the chloroplast genome can be surveyed in numerous individuals. Furthermore, it is believed that at least 50% of all cpDNA variation may be attributable to small insertion/deletion mutations. However, concerns about the homology of length variants associated with simple-sequence repeat (SSR) polymorphisms need to be addressed before such length variation can be widely used to construct reliable gene trees [7]. Ultimately, direct knowledge of cpDNA sequence variation would be most desirable for gene-tree construction. Unlike restriction enzyme analyses, however, direct sequencing of cpDNA loci has so far rarely recovered levels of variation adequate for phylogeographic analysis. In the search for cpDNA loci with useful levels of sequence variation, it must be considered that the mutation rate varies among regions of the chloroplast genome and that non-coding regions are more prone to mutation. Several small non-coding regions of the chloroplast genome (such as some intergenic spacers) therefore show potential for phylogeographic analysis [59,74,75].
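The principle behind (PCR-)RFLP can be sketched in a few lines: a point mutation that creates or destroys a restriction site changes the number and sizes of the fragments seen on a gel. The Python example below performs an in silico digest of two hypothetical haplotypes with the EcoRI recognition site (G^AATTC, cutting after the G), chosen purely as an illustration. It considers a single, non-degenerate, palindromic site on one strand and ignores gel resolution.

```python
import re
from typing import List


def digest(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the position within the recognition site where the top
    strand is cut (1 for EcoRI, G^AATTC). Palindromic sites need no separate
    reverse-complement scan.
    """
    seq = sequence.upper()
    site = site.upper()
    # Zero-width lookahead so overlapping recognition sites are all found
    starts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + [s + cut_offset for s in starts] + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Two hypothetical haplotypes of the same amplicon; a single substitution
# (A -> G) destroys the second EcoRI site in haplotype B.
hap_a = "ATGC" + "GAATTC" + "ACGTACGTAC" + "GAATTC" + "TTAA"
hap_b = "ATGC" + "GAATTC" + "ACGTACGTAC" + "GAGTTC" + "TTAA"
print(digest(hap_a, "GAATTC", 1))  # [5, 16, 9]
print(digest(hap_b, "GAATTC", 1))  # [5, 25]
```

The two haplotypes differ at a single base, yet they would be distinguished on a gel as a three-band versus a two-band pattern.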

Several studies indicate that single cpDNA loci are only occasionally informative at the intraspecific level. However, as technology progresses and sequencing larger DNA fragments becomes routine, it seems likely that diligent sequencing efforts will uncover sufficient genetic variation, allowing studies to exploit more of the variation potentially available in the chloroplast genome and, ultimately, to achieve finer phylogeographic resolution [7,10]. Indeed, a few studies have already shown intergenic spacer regions to be useful for direct sequencing, for example trnT-L-F in Ficus carica [74,76] and trnH-psbA in Eucalyptus perriniana [77,78], as well as the psbC-trnS intergenic spacer [79] in several Quercus species [75].

1.2.3 Nuclear DNA (nuDNA)

The remaining alternative is the nuclear genome, which is still largely unexplored but offers a potentially inexhaustible source of informative genetic variation; many investigators are now developing techniques and strategies for locating and efficiently sampling appropriate variation in nuclear DNA.

The internal transcribed spacer (ITS) region, although useful for plant systematics, is generally not very helpful for phylogeographic studies. First, intraspecific variation in this region has often not been detected. Furthermore, as part of a multicopy gene family, the ITS region is subject to poorly understood processes of concerted evolution, which can complicate the interpretation of sequence polymorphism at the intraspecific level. In addition, when a locus belongs to a multicopy gene or multigene family, PCR amplification with conserved primers may yield multiple products, including duplicated gene copies, pseudogenes and even recombinant PCR artifacts. Care is thus needed to avoid comparing paralogous loci, which may be especially difficult to detect where gene copies have been differentially homogenized among populations [7,80].

In principle, single-copy nuclear (scn) genes should also provide sufficient sequence data for phylogeographic assessments at the intraspecific level, but three technical and biological obstacles need to be considered: first, the relatively slow rate of sequence evolution at many nuclear loci; second, in diploid organisms, the difficulty of isolating alleles one at a time; and third, intragenic recombination. Nonetheless, scnDNA has been employed successfully in some phylogeographic assessments, with some of the most informative results coming from intron sequences of protein-coding genes [10,81]. However, no single locus has so far proved universally useful across plant species.

Additional features of the nuclear genome also need to be taken into account in phylogeographic analysis, such as complications arising from interallelic recombination and heterozygosity, recombinant alleles produced by crossing-over between alleles of a locus (resulting in chimeric haplotypes), and the need to verify the homology of the loci in use. Some (and probably many) "single-copy" nuclear genes exist as part of small gene families consisting of two to ten expressed loci and possibly additional pseudogenes [7,10,80,81].
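Intragenic recombination can be screened for with the four-gamete test of Hudson and Kaplan: under an infinite-sites model, two biallelic sites can show all four allele combinations only if recombination (or recurrent mutation) has occurred between them. The Python sketch below applies the test to hypothetical phased haplotypes. It assumes aligned sequences of equal length and treats every character, including gaps, as a state.

```python
from itertools import combinations
from typing import List, Tuple


def four_gamete_violations(haplotypes: List[str]) -> List[Tuple[int, int]]:
    """Pairs of biallelic sites (0-based) at which all four gametic types occur.

    Haplotypes must be phased, aligned and of equal length.
    """
    n_sites = len(haplotypes[0])
    biallelic = [i for i in range(n_sites) if len({h[i] for h in haplotypes}) == 2]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # AB, Ab, aB and ab all present
            violations.append((i, j))
    return violations


recombinant = ["AAT", "AGT", "GAC", "GGC"]
non_recombinant = ["AAT", "AGT", "GAC"]
print(four_gamete_violations(recombinant))      # [(0, 1), (1, 2)]
print(four_gamete_violations(non_recombinant))  # []
```

Each reported pair brackets an interval in which at least one recombination event is inferred; such intervals would need to be excluded or modelled before building a gene tree.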

Despite all these potential problems, the nuclear genome is perhaps still the most dynamic and useful source of markers for studying plant phylogeography, because it is much larger than the organelle genomes and contains most of the information underlying the adaptation of individuals to their environment.

1.2.4 Simple Sequence Repeats (SSRs)

Microsatellites (or simple sequence repeats, SSRs) are short nucleotide motifs, typically 1-5 base pairs (bp) long, repeated in tandem up to a usual maximum of about 60 copies; they are widespread in both eukaryotic and prokaryotic genomes [82,83]. The limited accuracy and insufficient statistical power of traditional molecular markers for estimating genetic differences between taxa led researchers to look for better alternatives such as microsatellites. Microsatellites have a set of characteristics that make them markers of choice for many studies: they are (1) PCR-based, (2) co-dominant, (3) usually multiallelic and highly variable, (4) randomly dispersed throughout the genome, and (5) easily scored by different methods [82,84,85]. Neutral nuclear SSRs (nuSSRs) are the markers of choice for diversity analysis, genetic mapping and association studies [86].

The use of microsatellites as polymorphic DNA markers has increased considerably over the years; although originally developed for research in humans, they have been used extensively for genetic analysis in all classes of organisms, including plants [82,85]. With the development of other genetic markers such as single nucleotide polymorphisms (SNPs) and AFLPs, the use of microsatellites was expected to decline. However, recent research has improved their application so much that microsatellites will probably remain important genetic markers in various biological disciplines in the near future [11,82]. The initial cost of microsatellites may be high because sequence information is required, but once developed they can be easily maintained and shared between laboratories. Their ease of use, high reproducibility, low cost and abundance in living organisms make SSR loci ideal markers for genetic analysis. They are also multiallelic and generally show high heterozygosity and high mutation rates (from 10⁻⁶ to 10⁻² events per locus per generation), which can make them more informative than other markers such as Random Amplified Polymorphic DNA (RAPD) and AFLPs [82,85].
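The high variability of SSR loci is usually summarised per locus by the number of alleles and by observed (Ho) and expected (He) heterozygosity. The Python sketch below computes these from hypothetical diploid genotypes scored as allele sizes. He assumes Hardy-Weinberg proportions, and the unbiased version applies Nei's 2N/(2N-1) small-sample correction.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # diploid genotype: two allele sizes in bp


def locus_diversity(genotypes: List[Genotype]) -> Dict[str, float]:
    """Allele number, observed and expected heterozygosity at one co-dominant locus."""
    alleles = [a for g in genotypes for a in g]
    n_genes = len(alleles)  # 2N gene copies for N diploid individuals
    freqs = [count / n_genes for count in Counter(alleles).values()]
    # Expected heterozygosity under Hardy-Weinberg proportions
    h_exp = 1.0 - sum(p * p for p in freqs)
    return {
        "n_alleles": len(freqs),
        "Ho": sum(1 for a, b in genotypes if a != b) / len(genotypes),
        "He": h_exp,
        # Nei's small-sample correction: multiply by 2N / (2N - 1)
        "He_unbiased": n_genes / (n_genes - 1) * h_exp,
    }


# Hypothetical genotypes (allele sizes in bp) of five trees at one nuSSR locus
sample = [(150, 154), (150, 150), (152, 154), (150, 156), (154, 154)]
print(locus_diversity(sample))
```

A large gap between Ho and He at a single locus is one of the signals that genotyping-error checks look for, since it may indicate null alleles.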

For Quercus suber in particular, Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) developed the only nuSSRs specific to cork oak. As a rule, SSRs are species-specific markers that must be developed de novo for each species, mainly because they usually occur in non-coding regions of the genome, which are not highly conserved. However, some studies [87,88] have shown that nuSSRs from other oak species transfer to cork oak, which potentially reduces the need to develop species-specific nuSSRs for this species.

1.2.5 Expressed Sequence Tags (ESTs)

Expressed Sequence Tags (ESTs) can serve as a source of molecular markers (gene sequences, SSRs or SNPs) and are an easy way to access fragments of the transcriptome. They are short (200-800 bases), randomly selected sequences derived from cDNA libraries. Even when no ESTs are available for the organism under study, EST collections can serve as a bridge between the genomic resources of model organisms and diverse species of interest, usually non-model organisms. ESTs provide information on the mRNA populations transcribed in a given set of tissues, developmental stages, environmental conditions and genotypes [89,90]. For instance, direct sequencing of EST fragments and subsequent detection of SNPs would be a particularly useful way of studying the geographical distribution of genetic variation within species. Because ESTs derive from expressed genes, many of known function and some potentially involved in the genetic control of adaptive traits, they are genetic markers with real potential for detecting adaptive genetic diversity [90].
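Detecting SNPs in sequenced EST fragments amounts to scanning an alignment for variable columns. The Python sketch below does this for hypothetical aligned sequences. It assumes the sequences are already aligned, reports 1-based positions, and skips columns containing gaps, missing data or IUPAC ambiguity codes, so heterozygous sites would need separate handling.

```python
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")


def find_snps(alignment: Dict[str, str]) -> List[Tuple[int, List[str]]]:
    """Return (1-based position, observed bases) for variable alignment columns.

    Columns containing gaps, missing data or ambiguity codes are skipped, so
    indels and unresolved heterozygous sites are not reported as SNPs.
    """
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(length):
        column = {s[pos] for s in seqs}
        if column <= VALID_BASES and len(column) > 1:
            snps.append((pos + 1, sorted(column)))
    return snps


# Hypothetical aligned EST fragments from four trees
aligned = {
    "tree1": "ACGTACGTAC",
    "tree2": "ACGTTCGTAC",
    "tree3": "ACGTACGAAC",
    "tree4": "AC-TACGTAC",
}
print(find_snps(aligned))  # [(5, ['A', 'T']), (8, ['A', 'T'])]
```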

As an alternative to the conventional strategy of isolating anonymous SSRs, large numbers of novel SSRs can be identified with comparatively little effort simply by in silico mining of EST databases [91,92]. This approach has become routine for some species, and EST-derived SSRs (EST-SSRs) have several characteristics that make them valuable as genetic markers: they are present in large numbers, are more polymorphic than many other types of genetic markers, are inherited co-dominantly, are reproducible and clearly scorable, and show enhanced transferability across related species [91,93]. Perhaps the greatest concern about the utility of EST-SSRs in population genetic analysis is that selection on these loci might bias the estimation of population parameters. Indeed, divergent selection increases differentiation among populations and reduces variability within them, whereas the opposite effect is expected under balancing selection. However, large-scale comparative analyses suggest that only a very small percentage of all genes experience positive selection [91,93]. Nevertheless, some fraction of EST-SSRs will inevitably be subject to selection.
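In silico mining of EST-SSRs essentially means scanning sequences for tandem repeats of short motifs. The Python sketch below reads FASTA records and reports perfect repeats of 1-5 bp motifs. The minimum repeat numbers are illustrative assumptions, compound and imperfect repeats are not handled, and the EST records are hypothetical; dedicated tools (for example MISA) implement more complete versions of this search.

```python
import re
from typing import Dict, List, Tuple

# Minimum number of tandem copies per motif length. These thresholds are an
# illustrative assumption; published mining pipelines use various criteria.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5}


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA reader: record ID (first word of header) -> sequence."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line.upper()
    return records


def is_compound(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Perfect SSRs as (1-based start, motif, number of copies)."""
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-mer immediately followed by at least (min_copies - 1) copies of itself
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if not is_compound(motif):
                hits.append((m.start() + 1, motif, len(m.group(1)) // k))
    return sorted(hits)


# Hypothetical EST records
demo = (
    ">EST1 hypothetical\n" + "GGATCC" + "AG" * 8 + "TTCGA\n"
    + ">EST2 hypothetical\n" + "TTG" + "CAAAT" * 5 + "CC\n"
    + ">EST3 hypothetical\nACGTTGCATGCA\n"
)
for est_id, sequence in parse_fasta(demo).items():
    print(est_id, find_ssrs(sequence))
```

Primers would then be designed in the flanking sequence of each hit, which is the step that makes EST-SSRs relatively transferable between related species.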

Recently, a significant number of ESTs has been generated in oaks, and particularly in cork oak. Because ESTs derive from relatively conserved gene sequences, primers designed for one species are likely to work well in related ones.

Ultimately, the potential of phylogeography may be fully realized only when multiple loci are considered. The combined analysis of different marker types should allow past population events to be reconstructed in great detail, and should also help in understanding the spatial structure and dynamics of genetic diversity.

2. Materials and Methods

2.1 Sampling and DNA extraction

Twenty-six natural populations were sampled across the entire Mediterranean distribution of the species (Fig. 1.2 and Table 2.1). In Portugal, the following locations were surveyed: Gerês, Serra da Estrela, Serra de São Mamede, Serra da Arrábida, Serra de Monchique, Serra do Buçaco, Azeitão and Serra de Sintra. Stands were considered natural populations when they consisted of irregularly spaced trees more than 50 years old. The remaining populations, from Portugal (São Brás de Alportel), Spain (Cataluña, Montes de Toledo, Haza del Lino, Sierra de Aracena, Sierra Morena, Sierra de Guadarrama), Italy (Puglia, Lazio and Sicily), France (Var, Landes and Corsica), Algeria (Forêt des Guerbès), Tunisia (Mekna and Fermana) and Morocco (Taza and Kenitra), were sampled in the international cork oak provenance trial at Herdade Monte da Fava (Ermidas do Sado), established in 1998 within the framework of the EUFORGEN Q. suber network and covering the complete distribution range of the species. Access to these populations was kindly provided by Helena Almeida from Instituto Superior de Agronomia. From each population, 3-5 trees were sampled for the cpDNA and nuDNA fragment analyses. Young leaves were collected from spring 2009 to summer 2010 from a total of 119 adult trees distributed among the 26 sampled populations (Table 2.1).

Of the 26 populations, 13 were selected for wider sampling for the SSR study. These locations are representative of the entire Mediterranean distribution: Portugal (Gerês, Serra da Estrela, Serra da Arrábida, Serra de Monchique, Serra do Buçaco and Serra de Sintra), Spain (Cataluña and Haza del Lino), Italy (Puglia), Algeria (Forêt des Guerbès), Tunisia (Mekna) and Morocco (Taza and Kenitra). For each population, 22-32 trees were sampled. Young leaves were likewise collected from spring 2009 to summer 2010, from a total of 379 adult trees distributed among the 13 sampled populations (Table 2.1).

Several other Quercus species (Q. robur, Q. pyrenaica, Q. faginea, Q. rubra, Q. lusitanica, Q. canariensis, Q. cerris, Q. ilex (subsp. rotundifolia and subsp. ilex) and Q. coccifera) were also sampled from natural populations and used to help define the Q. suber lineages and to establish the phylogenetic relationships of these lineages more accurately. According to the taxonomic classification of Schwartz [26], Q. cerris belongs to the subgenus Cerris, together with Q. suber. Q. petraea, Q. robur, Q. pyrenaica, Q. faginea and Q. lusitanica belong to the subgenus Quercus, and Q. coccifera and Q. ilex to the subgenus Sclerophyllodrys. Finally, Q. rubra belongs to the subgenus Erythrobalanus. Castanea crenata was used as an outgroup (Table 2.1). The species identity of each tree was checked on the basis of leaf morphology, assessed during the growing season on fully elongated leaves, and of the presence of cork bark in Q. suber.

Leaves were ground thoroughly in liquid nitrogen with a mortar and pestle, and genomic DNA was then extracted following Qiagen's protocol for the DNeasy Plant Mini Kit (Qiagen). DNA integrity was checked by electrophoresis on 1% w/v agarose gels stained with Red Safe 20,000x (iNtRON Biotechnology).

2.2 DNA sequencing

Polymerase chain reaction (PCR) amplifications were performed for 148 Quercus samples (Table 2.1) for fragments of three chloroplast DNA regions, the intergenic spacers TrnL-F [74], TrnS-PsbC [79] and TrnH-PsbA [78]. Based on preliminary results of the cpDNA fragment analysis, 104 of the 148 individuals were selected for amplification of one nuclear DNA fragment, the Expressed Sequence Tag (EST) 2T13 [94], an osmotic stress-related gene (Table 2.1). The primers used to amplify each fragment were those described by the respective authors (Supporting Information 1, Tables S1.1 and S1.2).

To confirm the identity of the Quercus species and to assess the usefulness of DNA barcodes as phylogeographic markers, the standard cpDNA barcode fragments (matK and rbcL) were amplified with the primers described by Cuénoud et al. [95] and Kress & Erickson [32], respectively (Supporting Information 1, Table S1.1). Three individuals of each cork oak lineage identified in the preceding analysis of cpDNA regions, and one individual of each of the other Quercus species, were selected for this analysis.

PCRs were performed in a final volume of 25 μL containing 1 μL of DNA (50-100 ng), 1x PCR buffer (Promega), 1 U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4 µM of each primer. Amplification conditions were: an initial denaturation at 94 °C for 5 min; 30 cycles of denaturation at 94 °C for 20 s, annealing for 30 s (at 65 °C for the intergenic cpDNA fragments and at 55 °C for the nuclear candidate gene and barcode fragments) and extension at 72 °C for 40 s; and a final extension at 72 °C for 7 min. PCR reagents and amplification conditions were the same for all oak species. Amplification was verified on 1% w/v agarose gels stained with Red Safe 20,000x (iNtRON Biotechnology), alongside the molecular weight marker HyperLadder™ IV (Bioline). Amplicons were purified with SureClean (Bioline).
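Preparing the PCR mix described above is a matter of applying C1V1 = C2V2 to each reagent. The Python sketch below computes per-reaction and master-mix volumes for the 25 μL cpDNA/nuDNA reaction. The stock concentrations (5x buffer, 25 mM MgCl2, 10 mM dNTPs, 10 µM primers, 5 U/μL Taq) and the 10% pipetting excess are assumptions for illustration, not values stated in the protocol, and the dNTP concentration is taken as per nucleotide.

```python
from typing import Dict, Tuple

# Reagent: (stock concentration, final concentration). Final values follow the
# protocol above; STOCK values are assumptions for illustration only.
COMPONENTS: Dict[str, Tuple[float, float]] = {
    "PCR buffer (x)": (5.0, 1.0),
    "MgCl2 (mM)": (25.0, 2.0),
    "dNTPs (mM each)": (10.0, 0.12),
    "forward primer (uM)": (10.0, 0.4),
    "reverse primer (uM)": (10.0, 0.4),
}
TAQ_UNITS = 1.0      # U per reaction (protocol)
TAQ_STOCK = 5.0      # U/uL (assumed)
FINAL_VOLUME = 25.0  # uL per reaction (protocol)
DNA_VOLUME = 1.0     # uL template, added to each tube separately (protocol)


def per_reaction_volumes() -> Dict[str, float]:
    """Volume (uL) of each reagent per reaction, from C1 * V1 = C2 * V2."""
    volumes = {
        name: final * FINAL_VOLUME / stock
        for name, (stock, final) in COMPONENTS.items()
    }
    volumes["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK
    # Water makes up the remainder once template DNA is accounted for
    volumes["water"] = FINAL_VOLUME - DNA_VOLUME - sum(volumes.values())
    return volumes


def master_mix(n_reactions: int, excess: float = 0.1) -> Dict[str, float]:
    """Scale per-reaction volumes to n reactions plus a pipetting excess."""
    factor = n_reactions * (1.0 + excess)
    return {name: vol * factor for name, vol in per_reaction_volumes().items()}


for reagent, volume in master_mix(24).items():
    print(f"{reagent:22s} {volume:8.1f} uL")
```

With these assumed stocks each tube receives 24 μL of master mix plus 1 μL of template; different stock concentrations only change the per-reagent volumes and the water balance.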

The nuclear EST fragments Phyt B (phytochrome B, involved in flowering phenology) [96] and Cons 58 (auxin-repressed protein) [97] were also tested with the primers described by the respective authors (Supporting Information 1, Table S1.2); however, despite several optimization attempts, no amplification product was obtained.

Sequencing reactions were carried out using BigDye v3.1 chemistry (Applied Biosystems, ABI). Amplicons were sequenced in both directions with an initial denaturation at 96 ºC for 1 min, followed by 25 cycles of 96 ºC for 10 s and annealing at 50 ºC for 5 s, and a final extension step at 60 ºC for 4 min. The sequencing products were purified by ethanol precipitation as follows. The total reaction volume was transferred to a 1.5 mL tube containing 1 μL of 3 M sodium acetate and 25 μL of absolute ethanol. This mixture was incubated on ice for 30 min and then centrifuged at 10,000 g for 25 min. The supernatant was discarded, 300 μL of 70% ethanol were added to each tube, and the tubes were centrifuged for 15 min at the same speed; this last step was performed twice. Finally, the supernatant was completely discarded and the samples were air-dried in the dark until further processing. The products were sequenced on an ABI PRISM® 310 Genetic Analyzer (Applied Biosystems, USA) available in the laboratory.

Chromatograms were checked manually for errors in SEQUENCHER v4.0.5 (Gene Codes Corporation). For the nuclear fragment, double peaks of similar height in the chromatograms were considered evidence of potential heterozygous sites and were coded with the IUPAC ambiguity code for subsequent analyses.
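Coding heterozygous sites with IUPAC ambiguity codes leaves the gametic phase unresolved, which is the allele-isolation problem noted in section 1.2.3. The Python sketch below enumerates all haplotype pairs compatible with a hypothetical diploid sequence containing two-base ambiguity codes. With k heterozygous sites there are 2^(k-1) possible pairs, which is why cloning or statistical phasing is needed to recover individual alleles.

```python
from itertools import product
from typing import List, Set, Tuple

# Two-base IUPAC ambiguity codes, as used to record heterozygous sites
IUPAC_PAIRS = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def heterozygous_sites(seq: str) -> List[int]:
    """0-based positions coded with a two-base ambiguity code."""
    return [i for i, base in enumerate(seq.upper()) if base in IUPAC_PAIRS]


def possible_phasings(seq: str) -> Set[Tuple[str, str]]:
    """All unordered haplotype pairs compatible with a diploid sequence."""
    seq = seq.upper()
    sites = heterozygous_sites(seq)
    pairs: Set[Tuple[str, str]] = set()
    for choice in product((0, 1), repeat=len(sites)):
        hap1, hap2 = list(seq), list(seq)
        for site, c in zip(sites, choice):
            bases = IUPAC_PAIRS[seq[site]]
            hap1[site], hap2[site] = bases[c], bases[1 - c]
        first, second = sorted(("".join(hap1), "".join(hap2)))
        pairs.add((first, second))
    return pairs


genotype = "ACRTGY"  # hypothetical: R = A/G and Y = C/T heterozygous sites
for pair in sorted(possible_phasings(genotype)):
    print(pair)
```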

A BLAST (Basic Local Alignment Search Tool) search against the NCBI database (http://blast.ncbi.nlm.nih.gov) was always performed to confirm the identity of the fragments. The matK sequence for Quercus crenata was retrieved from GenBank (accession number FN675334, [29]).

Species          Country   Site                               Code  Lat       Long      cpDNA  nuDNA  SSRs
Q. suber         Portugal  Azeitão                            AZT   38º 30'N  9º 02'W   5      3      -
                           Gerês                              GER   41º 40'N  8º 10'W   5      3      29
                           Serra de Monchique                 MON   37º 19'N  8º 34'W   5      3      29
                           Serra da Arrábida                  ARR   38º 50'N  9º 03'W   5      4      30
                           Serra de São Mamede                SSM   39º 23'N  7º 22'W   5      4      -
                           Serra de Sintra                    SIN   38º 45'N  9º 25'W   5      3      30
                           Serra do Buçaco                    BUC   40º 22'N  8º 21'W   5      3      30
                           São Brás de Alportel               SBA   37º 20'N  7º 56'W   5      3      -
                           Serra da Estrela                   EST   40º 32'N  7º 51'W   5      4      32
                 Tunisia   Mekna, Tabarka                     MEK   36º 57'N  8º 51'E   5      3      28
                           Fermana                            FER   36º 35'N  8º 32'E   3      3      -
                 Algeria   Forêt des Guerbès                  ALG   36º 54'N  7º 15'E   5      3      30
                 Italy     Puglia, Brindisi                   PUG   40º 34'N  17º 40'E  5      3      22
                           Lazio, Tuscany                     LAZ   42º 25'N  11º 57'E  5      3      -
                           Sicily, Catania                    SIC   37º 07'N  14º 30'E  3      3      -
                 France    Landes, Soustons                   LAN   43º 45'N  1º 20'W   5      2      -
                           Var, Bormes-les-Mimosas            VAR   43º 08'N  6º 15'E   3      3      -
                           Corsica, Sartene                   COR   41º 37'N  8º 58'E   3      3      -
                 Morocco   Kenitra                            KEN   34º 05'N  6º 35'W   5      3      30
                           Rif, Taza                          TAZ   34º 12'N  4º 15'W   5      3      30
                 Spain     Sierra de Guadarrama               GUA   40º 31'N  3º 45'W   5      3      -
                           Montes de Toledo, Cañamero         TOL   39º 22'N  5º 21'W   5      3      -
                           Haza del Lino                      HAZ   36º 50'N  3º 18'W   5      5      29
                           Sierra Morena, Fuencaliente        MOR   38º 24'N  4º 16'W   4      3      -
                           Cataluña, Santa Coloma de Farners  CAT   41º 51'N  2º 32'E   5      3      30
                           Sierra de Aracena, Jabugo          ARC   37º 54'N  6º 44'W   3      3      -
Q. rotundifolia  Portugal  Ermidas do Sado                          38º 00'N  8º 07'W   2      2      -
                           Serra da Arrábida                        38º 50'N  9º 03'W   1      1      -
                           Serra da Estrela                         40º 32'N  7º 51'W   1      1      -
                           Serra de São Mamede                      39º 23'N  7º 22'W   9      6      -
                           Fátima                                   39º 37'N  8º 40'W   1      1      -
Q. ilex          France                                             43º 09'N  3º 03'E   2      1      -
Q. coccifera     Portugal  Cascais, Aldeia de Juzo                  38º 72'N  9º 09'W   5      3      -
Q. faginea       Portugal  Serra da Arrábida                        38º 50'N  9º 03'W   1      1      -
Q. pyrenaica     Portugal  Serra da Estrela                         40º 32'N  7º 51'W   1      1      -
Q. robur         Portugal  Serra da Estrela                         40º 32'N  7º 51'W   1      1      -
Q. canariensis   Portugal  Lisbon                                   38º 45'N  9º 09'W   1      -      -
Q. lusitanica    Portugal  Negrais                                  38º 52'N  9º 17'W   1      1      -
Q. rubra         Portugal  Lisbon                                   38º 45'N  9º 09'W   1      1      -
Q. cerris        Italy     Greve in Chianti                         43º 35'N  11º 18'E  1      1      -
Castanea crenata Portugal  Vila Real
Total                                                                                   148    104    379

Table 2.1: Description of the sampled populations of each species, and sample size for each marker (cpDNA and nuDNA sequences, and SSRs).

2.3 Microsatellite genotyping

A total of 9 dinucleotide anonymous nuclear microsatellite markers (nuSSRs) previously developed for other oak species were used in this study: MSQ13, first described in Q. macrocarpa Michx. [98]; five from Q. petraea (Matt.) Liebl. (QpZAG9, QpZAG15, QpZAG36, QpZAG46 and QpZAG110) [99]; and three from Q. robur L. (QrZAG11, QrZAG7 and QrZAG20) [100]. Transferability of these SSRs to cork oak had been reported previously [87,88]. These microsatellites are considered unlinked, anonymous markers [101]. Amplifications were performed with the primers designed by the authors cited above under the following conditions: an initial denaturation at 94 °C for 5 min; 30 cycles of denaturation at 94 °C for 60 s, annealing at 50 °C for 30 s (locus-specific annealing temperatures in Table S2.1, Supporting Information 2) and extension at 72 °C for 60 s; and a final extension at 72 °C for 10 min. PCRs were performed in a final volume of 15 μL containing 0.5 μL of DNA (50-100 ng), 1x PCR buffer (Promega), 1 U Taq polymerase (Promega), 2.0 mM MgCl2, 0.12 mM dNTPs and 0.4 µM of each primer. However, despite following the authors' PCR guidelines and several attempts, loci QpZAG36 and QpZAG46 yielded no amplification product for most samples or very unreliable scoring, and were therefore abandoned.

Two nuSSRs developed specifically for cork oak by Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007), QsA11 and QsD8, were also tested. However, despite optimization attempts, clear scoring was never achieved and these loci were also discarded.

At the onset of this work there were no EST-derived microsatellites (EST-SSRs) specifically

for cork oak, but since ESTs are gene conservative sequences, primers designed for a species

are likely to work in related ones. So, six polymorphic EST-SSRs were selected from

Quercus mongolica (QmOST1, QmD12, QmAJ1, QmDN1, QmDN2, QmDN3) [92]. The loci

names were chosen for this work and correspond, respectively, to the following NCBI dbEST

(http://www.ncbi.nlm.nih.gov/dbEST) accession numbers: DN949770, CR627959,

AJ577265, DN950717, DN949776, and DN950726. The selected sets of specific primers for

each SSR used can be found in Ueno & Tsumura [92] (Supporting Information 2 – Table

S2.2). PCR amplification conditions were as follows: an initial denaturation step at 94 °C for

5 min followed by 30 cycles consisting of denaturation at 94 °C for 30 s, annealing at 57 °C

for 30 s (specific annealing temperatures in Table S2.2 – Supporting Information 2),

extension at 72 °C for 30 s, and a final extension step at 72 °C for 10 min. PCRs were

performed in a final volume of 15 μL, with 0.5 μL of DNA (50–100 ng), 1x PCR buffer

(Promega), 1U Taq polymerase (Promega), 1.5 mM MgCl2, 0.12 mM dNTPs and 0.3 µM of

each primer. Despite several amplification attempts, locus QmDN2 yielded no PCR product.

PCR product electrophoresis was performed with an ABI PRISM 310 automated sequencer

and the genotypes were scored and visually checked using the GENEMAPPER software v3.7 (Applied Biosystems, Inc.). Possible genotyping errors were identified and corrected with MICRO-CHECKER v2.2.3 [102].

2.4 Phylogenetic and phylogeographic analysis

Datasets for each sequenced fragment were aligned in CLUSTAL X v2.0.12 [103,104],

followed by manual refinement in BIOEDIT v7.0.9 [105]. To create the cpDNA concatenated

matrix from the individual datasets of TrnL-F, TrnS-PsbC and TrnH-PsbA fragments, the

CONCATENATOR v1.1.0 software was used [106].

Phylogenetic analysis was performed using PAUP* v4.0.b4a [107]. Maximum parsimony

(MP) analyses were carried out on all data sets. The optimal tree was found by a heuristic

search with tree-bisection–reconnection as the branch-swapping algorithm. Initial trees were

obtained via stepwise addition with 1000 replicates of random addition sequence.

Bootstrapping with 1000 replicates was performed to evaluate the robustness of the nodes of

the phylogenetic trees.

Bayesian analyses (BA) were undertaken using MRBAYES v3.1.2 [108] with the optimal model selected under the Akaike Information Criterion (AIC), as implemented in MrMODELTEST v2.3 [109]. For the combined data, model selection was carried out separately for each cpDNA data set with MrMODELTEST, and the selected models were then implemented according to the author's recommendations. Additionally, indels were included and scored as binary characters (absent/present). The posterior probabilities of the phylogenetic trees were estimated with a Metropolis-coupled Markov chain Monte Carlo (MCMCMC) sampling algorithm, sampling every 1000th generation. For the individual and combined cpDNA datasets, Bayesian posterior probabilities were generated from 6 × 10⁶ and 5 × 10⁸ generations, respectively. For the nuclear fragment dataset, 3 × 10⁶ generations were used to calculate the Bayesian phylogeny and the respective posterior probability values. The analysis was run three times, each with one cold and three incrementally heated Metropolis-coupled Markov chains, starting from random trees. Ten percent of the generations were discarded as burn-in. The remaining trees were then combined and summarized in a 50% majority-rule consensus tree.
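To make the burn-in and consensus step concrete: sampling every 1000th generation, a run of 6 × 10⁶ generations yields roughly 6,000 trees, of which the first 10% (about 600) are discarded. The sketch below, which is not MRBAYES's own implementation, pools the post-burn-in trees of several runs and keeps the clades present in more than 50% of them; their frequencies are the clade credibility (posterior probability) values. As a simplifying assumption, each tree is represented as a set of clades (frozensets of taxon labels), and `majority_rule_clades` is a hypothetical helper name.

```python
from collections import Counter


def majority_rule_clades(runs, burnin_fraction=0.10, threshold=0.5):
    """Clade posterior probabilities for a majority-rule consensus.

    runs: one list per independent run, holding the sampled trees in sampling order.
    Each tree is a set of clades; each clade is a frozenset of taxon labels.
    Returns {clade: frequency} for clades found in more than `threshold` of the
    post-burn-in trees pooled over all runs.
    """
    retained = []
    for run in runs:
        n_burn = int(len(run) * burnin_fraction)  # drop the first 10% of each run
        retained.extend(run[n_burn:])
    if not retained:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in retained for clade in tree)
    n = len(retained)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}
```

Clades occurring in more than half of the trees are always mutually compatible, which is why they can be assembled into a single consensus tree.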

Because the cpDNA fragments presented several indels when aligned across all oak species, only MP and BA analyses were performed, as only these methods allowed indels to be treated as informative data.

The program NETWORK v4.6 [110] was used to construct a median-joining network of

haplotypes showing the number of mutational steps between them.
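In its simplest form, the number of mutational steps between two haplotypes is the number of aligned sites at which they differ. The sketch below computes these distances and links haplotypes with a minimum spanning tree (Prim's algorithm). It is only a simplified stand-in: the median-joining algorithm in NETWORK combines minimum spanning trees and additionally infers median vectors (unsampled or extinct haplotypes), which this code does not do. Treating gaps as an ordinary character state, and the function names, are assumptions.

```python
def mutational_steps(a, b):
    """Number of aligned sites at which two haplotypes differ (gaps count as a state)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm on the complete graph of pairwise mutational steps.

    haplotypes: dict {name: aligned sequence}. Returns a list of (u, v, steps) edges."""
    names = list(haplotypes)
    if len(names) < 2:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # cheapest edge joining a haplotype already in the tree to one outside it
        u, v, steps = min(
            ((p, q, mutational_steps(haplotypes[p], haplotypes[q]))
             for p in in_tree for q in names if q not in in_tree),
            key=lambda edge: edge[2],
        )
        edges.append((u, v, steps))
        in_tree.add(v)
    return edges


haps = {"H1": "ACGT", "H2": "ACGA", "H3": "TCGA"}
print(minimum_spanning_tree(haps))  # [('H1', 'H2', 1), ('H2', 'H3', 1)]
```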

2.5 Selective neutrality tests and demographic history

Selective neutrality of each microsatellite locus was examined based on the sampling

distribution of neutral alleles under the infinite-alleles model. The Ewens–Watterson

homozygosity test [111] and the Ewens–Watterson–Slatkin exact test [112,113] were

performed using the absolute allele frequency distribution, as implemented in ARLEQUIN

v3.5 software [114]. In these tests, the expected null distribution of the homozygosity statistic

(Fexp) is generated by simulating random neutral samples, which is then compared with the

homozygosity observed in the original sample (Fobs). If the null hypothesis of selective

neutrality is rejected (p<0.05), an Fobs/Fexp ratio less than 1 implies balancing selection in

favour of heterozygotes and a ratio greater than 1 implies directional selection in favour of

advantageous alleles.
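The logic of the Ewens–Watterson test can be sketched by simulation: the homozygosity F = Σ pᵢ² of the observed allele counts is compared with that of neutral samples having the same size n and the same number of alleles k. The example below is a simplified illustration, not ARLEQUIN's algorithm: neutral samples are drawn with a Hoppe urn (infinite-alleles model) and only those with exactly k alleles are kept (rejection sampling). This is valid because, under neutrality, the distribution of allele counts given n and k does not depend on θ; θ is tuned here only so that k alleles are drawn often. All function names are hypothetical.

```python
import random


def homozygosity(counts):
    """F = sum of squared allele frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def hoppe_urn(n, theta, rng):
    """Allele counts for n genes sampled under the neutral infinite-alleles model."""
    counts = []
    for i in range(n):  # i genes already drawn
        if rng.random() < theta / (theta + i):
            counts.append(1)  # mutation: a new allele
        else:
            r = rng.randrange(i)  # copy one of the i existing genes at random
            for j, c in enumerate(counts):
                if r < c:
                    counts[j] += 1
                    break
                r -= c
    return counts


def theta_for_k(n, k):
    """theta whose expected allele number sum_i theta/(theta+i) equals k.

    Only used to make the rejection step below efficient."""
    lo, hi = 1e-6, 1e6
    for _ in range(200):
        mid = (lo * hi) ** 0.5  # bisection on a log scale
        if sum(mid / (mid + i) for i in range(n)) < k:
            lo = mid
        else:
            hi = mid
    return mid


def ewens_watterson_test(counts, n_sim=1000, seed=1):
    """Return (F_obs, mean F_exp, P(F_sim <= F_obs), P(F_sim >= F_obs))."""
    n, k = sum(counts), len(counts)
    f_obs = homozygosity(counts)
    theta = theta_for_k(n, k)
    rng = random.Random(seed)
    sims = []
    while len(sims) < n_sim:
        sample = hoppe_urn(n, theta, rng)
        if len(sample) == k:  # condition on the observed number of alleles
            sims.append(homozygosity(sample))
    f_exp = sum(sims) / n_sim
    p_low = sum(f <= f_obs for f in sims) / n_sim
    p_high = sum(f >= f_obs for f in sims) / n_sim
    return f_obs, f_exp, p_low, p_high
```

A ratio F_obs/F_exp below 1 with a small lower-tail p value points towards balancing selection, and a ratio above 1 with a small upper-tail p value towards directional selection, matching the interpretation given above.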

The mismatch distribution (1000 replicates) was used to infer the demographic history of the cork oak lineages present in each cpDNA and nuDNA dataset. Pairwise distances between haplotypes, time since population expansion (τ), and relative population size before (θ0) and after (θ1) expansion were calculated in ARLEQUIN. Harpending's (1994) raggedness index (r) and the sum of squared deviations (SSD) were used to assess the fit of the observed distribution to the rapid expansion model, with statistical significance tested by 1000 bootstrap replicates in ARLEQUIN.
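The observed mismatch distribution and Harpending's raggedness index can be computed directly from aligned haplotypes, as in the sketch below (a simplified illustration with hypothetical function names). The expected distribution under the expansion model, needed for the SSD, depends on the fitted τ, θ0 and θ1 and is not derived here; `ssd` simply assumes it is supplied, for example from ARLEQUIN output. The bootstrap used to obtain p values is also omitted.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Observed mismatch distribution: x[i] = fraction of sequence pairs differing at i sites."""
    diffs = [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]
    if not diffs:
        raise ValueError("need at least two sequences")
    counts = Counter(diffs)
    return [counts.get(i, 0) / len(diffs) for i in range(max(diffs) + 1)]


def raggedness(x):
    """Harpending's r = sum_{i=1}^{d+1} (x_i - x_{i-1})^2, with x_{d+1} = 0."""
    padded = list(x) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


def ssd(observed, expected):
    """Sum of squared deviations between observed and model-expected mismatch frequencies."""
    n = max(len(observed), len(expected))
    obs = list(observed) + [0.0] * (n - len(observed))
    exp = list(expected) + [0.0] * (n - len(expected))
    return sum((o - e) ** 2 for o, e in zip(obs, exp))
```

Smooth, unimodal distributions give small r values (as expected after an expansion), whereas ragged, multimodal ones give large values.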

Both Tajima's D [116] and Fu's Fs [117] were used to test for deviations from neutrality. Fu's Fs uses information from the haplotype distribution and is particularly sensitive to population demographic expansion, with large negative Fs values indicating an excess of rare haplotypes, as usually expected after an expansion [117,118]. Tajima's D compares the average number of pairwise differences with the number of segregating sites in intraspecific DNA sequences to test for departure from neutral expectations; it generally takes negative values in populations that have undergone expansion, or in sequences that have been under selection [116,118]. Fu's Fs and Tajima's D were calculated in ARLEQUIN.
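Both neutrality statistics can be written compactly. The sketch below implements Tajima's D with the constants of Tajima (1989), and Fu's Fs as ln(S'/(1 - S')), where S' is the probability, under the Ewens sampling formula with θ set to the mean number of pairwise differences, of observing at least as many haplotypes as were sampled. It is an illustration under simplifying assumptions (no missing data, gaps treated as a state, no significance testing, which ARLEQUIN obtains by simulation), not the ARLEQUIN code; function names are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)


def tajimas_d(seqs):
    """Tajima's D for aligned sequences; returns 0.0 if there are no segregating sites."""
    n = len(seqs)
    S = sum(len({s[i] for s in seqs}) > 1 for i in range(len(seqs[0])))
    if S == 0:
        return 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    diff = mean_pairwise_differences(seqs) - S / a1  # pi minus Watterson's theta
    return diff / math.sqrt(e1 * S + e2 * S * (S - 1))


def fus_fs(seqs):
    """Fu's Fs = ln(S' / (1 - S')), S' = P(K >= observed haplotypes | theta = pi)."""
    n = len(seqs)
    theta = mean_pairwise_differences(seqs)
    k_obs = len(set(seqs))
    # probs[k] = P(K = k) under the Ewens sampling formula, built one gene at a time
    probs = [0.0, 1.0]  # after the first gene there is exactly one haplotype
    for m in range(1, n):  # m genes already drawn
        p_new = theta / (theta + m)  # next gene carries a new haplotype
        nxt = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            nxt[k] += p * (1 - p_new)
            nxt[k + 1] += p * p_new
        probs = nxt
    s_prime = sum(probs[k_obs:])
    below = sum(probs[:k_obs])  # 1 - S', summed directly to avoid cancellation
    if s_prime == 0.0 or below == 0.0:
        return float("nan")  # undefined, e.g. when only one haplotype is present
    return math.log(s_prime / below)
```

A single-haplotype sample returns NaN, consistent with Fs not being computable when only one haplotype is present.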

2.6 Genetic diversity and population differentiation

Linkage disequilibrium (LD) between all pairs of polymorphic SSR loci was tested using the probability test implemented in GENEPOP v4.0 [119]. Using the complete sampling, nucleotide diversity (π) and its standard deviation, haplotype diversity (Hd) and indel haplotype diversity (IndelHd) were estimated for each selected sequenced fragment in DnaSP v10.01 [120].
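For reference, nucleotide diversity and Nei's haplotype diversity can be computed as below. This is a simplified sketch, not DnaSP's implementation: it assumes equal-length aligned sequences without missing data and counts every differing site, whereas DnaSP treats gaps and missing data according to its own settings. The standard deviation of π is not computed, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences divided by the number of sites."""
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    return mean_diff / len(seqs[0])


def haplotype_diversity(seqs):
    """Nei's Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))
```

For three 4-bp sequences AAAA, AAAT and AATT, the mean pairwise difference is 4/3, so π = 1/3, and Hd = 1 because every sequence is a distinct haplotype.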

Gene diversity statistics (gene diversity He [121] and allelic richness A) were estimated for

microsatellites using the program FSTAT v2.9.3.2 [122,123]. Allelic richness (A) was

corrected using the rarefaction method based on a minimum sample size of 21 diploid

individuals, which corresponded to the smallest number of individuals successfully

genotyped for a given locus in a population. Private alleles were identified in GenAlEx v6.3 [124]. The inbreeding coefficient Fis [125] was calculated using ARLEQUIN, and its deviation from zero was tested with 10,000 allele permutations. Population differentiation was estimated using FST [125] and RST [126] in ARLEQUIN.
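The rarefaction correction and the single-population statistics can be illustrated as follows. With 21 diploid individuals, the rarefaction size is g = 2 × 21 = 42 gene copies. The sketch uses the standard rarefaction formula (the expected number of distinct alleles in a random subsample of g copies) and, for Fis, the simple ratio 1 - Ho/He with Nei's unbiased He; FSTAT and ARLEQUIN use Weir & Cockerham-type estimators, so their values may differ slightly. Function names are hypothetical.

```python
from collections import Counter
from math import comb


def allelic_richness(allele_counts, g):
    """Rarefied allelic richness: expected number of distinct alleles in a random
    subsample of g gene copies, sum_i [1 - C(N - N_i, g) / C(N, g)]."""
    N = sum(allele_counts)
    if not 0 < g <= N:
        raise ValueError("g must be between 1 and the number of gene copies")
    total = comb(N, g)
    return sum(1 - comb(N - n_i, g) / total for n_i in allele_counts)


def locus_stats(genotypes):
    """Ho, unbiased He and Fis = 1 - Ho/He for one locus in one population.

    genotypes: list of (allele1, allele2) tuples for diploid individuals."""
    n = len(genotypes)
    two_n = 2 * n
    counts = Counter(a for g in genotypes for a in g)
    ho = sum(a != b for a, b in genotypes) / n
    he = two_n / (two_n - 1) * (1 - sum((c / two_n) ** 2 for c in counts.values()))
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis
```

For example, `allelic_richness(counts, 42)` would give the expected number of alleles in 42 gene copies for a population at one locus; a positive Fis, as in SIN, indicates fewer heterozygotes than expected.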

SMOGD v1.2.5 [127] was used to estimate Jost's measure of actual differentiation among populations (Dest) [128], the standardized measure of genetic differentiation G'ST [129], and the nearly unbiased estimator of relative differentiation GST [130].

Pairwise genetic differentiation between populations was estimated with FST, RST and Dest, in

FSTAT, ARLEQUIN and SMOGD, respectively. Standard Bonferroni corrections were

applied to account for multiple testing.
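The three differentiation measures can be contrasted on toy allele frequencies. The sketch uses the textbook formulas GST = (HT - HS)/HT, Hedrick's G'ST = GST(k - 1 + HS)/((k - 1)(1 - HS)) and Jost's D = [(HT - HS)/(1 - HS)]·k/(k - 1) for k populations, computed with plain HS and HT. SMOGD is understood to apply bias-corrected estimators of HS and HT instead, so this is an illustration rather than a reproduction of its output. Function names are hypothetical.

```python
def differentiation(pop_freqs):
    """GST, Hedrick's G'ST and Jost's D for one locus from per-population allele frequencies.

    pop_freqs: list of dicts {allele: frequency}, one per population (equal weights).
    Uses plain (not bias-corrected) HS and HT."""
    k = len(pop_freqs)
    if k < 2:
        raise ValueError("need at least two populations")
    alleles = set().union(*pop_freqs)
    hs = sum(1 - sum(f * f for f in p.values()) for p in pop_freqs) / k
    pooled = {a: sum(p.get(a, 0.0) for p in pop_freqs) / k for a in alleles}
    ht = 1 - sum(f * f for f in pooled.values())
    gst = (ht - hs) / ht if ht > 0 else 0.0
    g_prime = gst * (k - 1 + hs) / ((k - 1) * (1 - hs))
    d_jost = (ht - hs) / (1 - hs) * k / (k - 1)
    return {"GST": gst, "G'ST": g_prime, "D": d_jost}


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a standard Bonferroni correction."""
    return alpha / n_tests


print(differentiation([{"A": 0.5, "B": 0.5}, {"C": 0.5, "D": 0.5}]))
```

In this toy case each population carries two private alleles: GST is only 1/3, whereas G'ST and D both reach 1, which illustrates why standardized measures are used for highly polymorphic markers. For the Bonferroni step, 13 populations give 78 pairwise comparisons, so a 0.05 level corresponds to a per-test threshold of about 0.00064.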

Geographic patterns of genetic differentiation were tested by regressing genetic differentiation against geographic distance between pairs of populations, following Rousset [131], i.e. FST/(1-FST) against the logarithm of geographic distance. The regression was estimated by reduced major axis (RMA) regression in IBDWS v3.03 [132]. Mantel tests were used to test the null hypothesis of no relationship between the genetic and geographic matrices.
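The isolation-by-distance analysis above can be sketched as follows: FST is linearised as FST/(1-FST), regressed on ln(distance) by reduced major axis, and the association is tested with a Mantel permutation test that shuffles population labels of the genetic matrix. This is a simplified illustration, not IBDWS's code: it uses a one-tailed test for positive association, omits bootstrap confidence intervals, and its function names are hypothetical. Geographic distances must be strictly positive for the logarithm.

```python
import math
import random


def rousset_pairs(fst, dist):
    """x = ln(geographic distance), y = FST/(1 - FST) for every population pair i < j."""
    xs, ys = [], []
    n = len(fst)
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(math.log(dist[i][j]))
            ys.append(fst[i][j] / (1 - fst[i][j]))
    return xs, ys


def _moments(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy


def rma_fit(xs, ys):
    """Reduced major axis regression: slope = sign(r) * sd(y) / sd(x); returns (slope, intercept)."""
    mx, my, sxx, syy, sxy = _moments(xs, ys)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx


def pearson(xs, ys):
    _, _, sxx, syy, sxy = _moments(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(fst, dist, n_perm=999, seed=1):
    """One-tailed Mantel test (positive association) permuting population labels
    of the genetic matrix; returns (r_observed, p_value)."""
    xs, ys = rousset_pairs(fst, dist)
    r_obs = pearson(xs, ys)
    n = len(fst)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[fst[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(xs, rousset_pairs(permuted, dist)[1]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

Under Rousset's model for two-dimensional populations, the slope of this regression is inversely related to the product of effective density and dispersal variance, which is why the regression rather than the correlation alone is usually reported.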

2.7 Genetic structure of populations

The Bayesian clustering method implemented in STRUCTURE v2.3.3 [133] was used to

determine the genetic structure of the sampled populations from the microsatellite loci. Because preliminary analyses showed low overall differentiation, the newer clustering model was used, which is based not only on individual multilocus genotypes but also on sampling locations [134]. This LocPrior model allows the prior distribution of cluster assignments to vary among populations, and its authors recommend it to help detect population structure when the genetic data are only weakly informative. A parameter r indicates the extent to which the sampling locations are informative (values below 1 indicate informative locations). Twenty independent runs of the Markov chain Monte Carlo (MCMC) scheme were performed for each value of K (the number of putative clusters) from 1 to 13 (the number of populations sampled). The admixture model with sampling locations as prior information [134] was selected, and correlated allele frequencies among populations were assumed [135]. Each run consisted of an MCMC length of 1,000,000 iterations with a burn-in of 50,000. The estimated log probability of the data for a given K, LnP(D), was used to identify the most probable number of clusters, using both the ΔK (DeltaK) ad hoc statistic [136] and the guidelines in the software documentation [133]. Once the most likely K was determined, the run with the highest LnP(D) and lowest variance was chosen for interpretation. Final results from STRUCTURE were visualized using DISTRUCT v1.1 [137].
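The ΔK statistic of Evanno et al. [136] combines the replicate LnP(D) values obtained for each K. The sketch below follows the commonly used form ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], computing the second difference from the run means of L(K); that averaging choice is an assumption about implementation detail, and the code is not that of STRUCTURE or of any particular post-processing tool. The function name is hypothetical.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """DeltaK from replicate LnP(D) values.

    lnp_by_k: {K: [LnP(D) of each replicate run]}. Uses run means for L(K) and
    returns {K: DeltaK} for every K whose neighbours K-1 and K+1 were also run."""
    mean = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean or k + 1 not in mean:
            continue  # DeltaK is undefined at the smallest and largest K tested
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])  # |L''(K)|
        sd = statistics.stdev(lnp_by_k[k])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta
```

Because ΔK is undefined at K = 1 and at the largest K tested, it cannot by itself indicate the absence of structure (K = 1).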

The degree of population subdivision was also explored as implemented in the R-package

GENELAND v3.2.4 [138]. This latter approach determines the number of groups (K) using a

Bayesian clustering model executed in a MCMC scheme to detect the location of genetic

discontinuities using individual geo-referenced multilocus genotypes [139]. GENELAND

uses geographical locations of individuals as prior information. This model treats the number

of clusters as a parameter processed by the MCMC scheme without any approximation and

may provide a better estimation of the number of clusters than other proposed procedures that

do not take the geographical locations into account [139,140]. Twenty independent MCMC runs were performed, allowing K to vary from 1 to 13 (the number of populations sampled), with the following parameters: 1,000,000 iterations, of which every hundredth was saved (after a 10% burn-in), the number of genetic clusters treated as unknown, and the Dirichlet model for allele frequencies (assumed to be correlated).

Results obtained following the GENELAND and STRUCTURE approaches were further

tested with an Analysis of Molecular Variance (AMOVA) approach [141].
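For orientation, a single-level AMOVA (populations only) can be written from a matrix of squared distances using the standard sums-of-squares decomposition; the sketch below returns the among- and within-population variance components and ΦST. It is a simplified illustration: the hierarchical design used to test the GENELAND and STRUCTURE groupings would add an among-groups level, and the permutation tests that ARLEQUIN uses for significance are omitted. Function and argument names are hypothetical.

```python
def amova_phi_st(dist2, groups):
    """One-level AMOVA (populations only).

    dist2: N x N matrix of squared distances between individuals or haplotypes
           (e.g. number of differing sites); groups: population label per row.
    Returns (sigma2_among, sigma2_within, phi_st)."""
    N = len(groups)
    pops = {}
    for i, g in enumerate(groups):
        pops.setdefault(g, []).append(i)
    P = len(pops)
    if P < 2 or N <= P:
        raise ValueError("need at least two populations and more samples than populations")
    # sums of squared deviations: ordered pairs divided by twice the sample size
    ssd_total = sum(dist2[i][j] for i in range(N) for j in range(N)) / (2 * N)
    ssd_within = sum(
        sum(dist2[i][j] for i in idx for j in idx) / (2 * len(idx))
        for idx in pops.values()
    )
    ssd_among = ssd_total - ssd_within
    n_c = (N - sum(len(idx) ** 2 for idx in pops.values()) / N) / (P - 1)
    sigma2_w = ssd_within / (N - P)
    sigma2_a = (ssd_among / (P - 1) - sigma2_w) / n_c
    total = sigma2_a + sigma2_w
    phi_st = sigma2_a / total if total != 0 else float("nan")
    return sigma2_a, sigma2_w, phi_st
```

Negative ΦST estimates are possible when populations are less differentiated than random subsets would be; they are usually interpreted as zero differentiation.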

Results

3. Results

3.1 Sequencing of chloroplast and nuclear DNA fragments

3.1.1 cpDNA and nuDNA diversity levels

Initially, 148 samples were sequenced for the cpDNA fragments studied (Table 2.1). When aligned for all oak species, the cpDNA fragments presented several indels. Where the alignment was ambiguous, several slightly different alignments (including ones with the ambiguous positions or indels removed) were tested, without any major differences in the results. The nucleotide diversity was 0.00400 (+/- 0.0009), 0.00925 (+/- 0.0007) and 0.00549 (+/- 0.0004) for the TrnS/PsbC, TrnH/PsbA and TrnL-F fragments, respectively (Table 3.1). For the cork oak samples (119 of the 148 individuals), a total of 8 TrnS/PsbC haplotypes, 7 TrnH/PsbA haplotypes and 5 TrnL-F haplotypes were obtained. For the concatenated cpDNA dataset, 17 cork oak haplotypes were detected, with a nucleotide diversity of 0.00658 (+/- 0.0005) (Table 3.1). After a preliminary analysis of the cpDNA sequences, 104 of the 148 samples, representing the main groups, were selected and sequenced for the candidate gene EST 2T13 (Table 2.1). The alignment was straightforward, showing no potential heterozygous sites in the cork oak samples. The nucleotide diversity estimated for the EST 2T13 fragment was 0.02387 (+/- 0.0126) (Table 3.1), with 8 cork oak haplotypes.

Table 3.1: Length (bp), number of parsimony informative sites (PI) and estimated nucleotide diversity (π) with its standard deviation for each dataset, using the complete sampling.

Dataset              Length (bp)  Variable sites  Total characters  Indels  PI   π (+/- SD)
Individual cpDNA
  TrnS/PsbC          250          20              238               12      15   0.00400 (+/- 0.0009)
  TrnH/PsbA          478          54              448               30      34   0.00925 (+/- 0.0007)
  TrnL-F             381          18              374               7       14   0.00549 (+/- 0.0004)
Concatenated cpDNA
  TrnS/TrnH/TrnL     1109         92              1060              49      63   0.00658 (+/- 0.0005)
Individual nuDNA
  EST 2T13           249          48              240               9       20   0.02387 (+/- 0.0126)


3.1.2 Differentiation patterns

Maximum parsimony (MP) trees for the cpDNA fragments TrnS/PsbC, TrnH/PsbA and TrnL-F are presented in Fig. 3.1a, Fig. 3.2 and Fig. 3.3, respectively. The trees derived from the Bayesian analyses (BA) were very similar to those from the MP analysis; therefore, the MP tree of each fragment is presented with the respective bootstrap and clade credibility values. The concatenated tree supported the results of the individual trees (Supporting Information 3). In all the cpDNA phylogenetic trees (Fig. 3.1a, Fig. 3.2 and Fig. 3.3), four major groups were distinguished and named Groups A, B, C and D. Group A (highlighted in yellow in the figures) is composed exclusively of samples of the subgenus Cerris, namely cork oak samples from several populations and Q. cerris. Group B is more complex, since it is composed of samples of several Quercus species, namely Q. suber (highlighted in orange in the trees; subg. Cerris) and Q. coccifera (green), Q. ilex ilex (pink) and Q. ilex rotundifolia (red) of the subgenus Sclerophyllodrys. Because cork oak samples were present in both groups, and the samples in each group were always the same for all the cpDNA fragments, the haplotypes belonging to Group A were considered a pure lineage of cork oak, while the samples belonging to Group B were considered an introgressed lineage of cork oak. Group C (highlighted in blue) is composed of several Quercus species from the subgenus Quercus, specifically Q. faginea, Q. robur, Q. pyrenaica, Q. lusitanica and Q. canariensis. Finally, Group D consists of Q. rubra of the subgenus Erythrobalanus.

In particular, Group A is composed of 92 of the 119 cork oak samples and the sample of Q. cerris, and was characterized by low levels of variation and few haplotypes (Table 3.2). This was particularly evident for the TrnL-F fragment, for which only one haplotype was found (Fig. 3.3), and for the TrnH/PsbA fragment, where again only one major haplotype is present, although two derived low-frequency haplotypes occur in Puglia (Italy) (Fig. 3.2 and Table 3.2). In both these fragments Q. cerris shares the same haplotype as Q. suber. Higher variation was found for the TrnS/PsbC fragment in the cork oak pure lineage (Table 3.2), allowing three sublineages to be distinguished, named A1, A2 and A3 (Fig. 3.1 and Table 3.2). Fig. 3.1b shows a reconstruction of the phylogenetic tree of the pure lineage for the TrnS/PsbC fragment, with the major haplotypes of each sublineage and the mutational events that occurred during the formation of those haplotypes. Sublineage A1 (sl A1) is exclusive to the island of Sicily, sublineage A2 (sl A2) is present in West Mediterranean


Figure 3.1: a) Maximum parsimony tree of the cpDNA TrnS/PsbC intergenic spacer region. Four groups are represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage and Q. cerris (Bright Yellow - Sublineage A2 (Sl A2); Brownish-Yellow – Sublineage A3 (Sl A3); Light Yellow – Sublineage A1 (Sl A1)); Group B (orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility values; b) Detailed phylogenetic reconstruction of the sublineages from Group A. Bootstrap support and Bayesian credibility values are provided above each branch. The site combinations below each branch represent the mutational events that occurred during the evolution of the three sublineages.


Figure 3.2: Maximum parsimony tree of the cpDNA TrnH/PsbA intergenic spacer region. Four groups are represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: cork oak's introgressed lineage – orange; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.


Figure 3.3: Maximum parsimony tree of the cpDNA TrnL-F intergenic spacer region. Four groups are represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage; Group B: cork oak's introgressed lineage – orange; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex; Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and is constituted by Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value.


Table 3.2: Haplotype diversity (Hd), indel haplotype diversity (Indel Hd) and number of haplotypes (nr H) for each cpDNA and nuclear fragment, according to pure and introgressed cork oak lineages.

Fragment     Lineage                     Hd      Indel Hd   nr H
TrnS/PsbC    Pure lineage (A)            0.000   0.471      4
             Introgressed lineage (B)    0.000   0.649      4
TrnH/PsbA    Pure lineage (A)            0.024   0.024      3
TrnL-F       Pure lineage (A)            0.000   0.000      1
EST 2T13     Pure lineage (α)            0.494   0.028      6
             Introgressed lineage (β)    0.000   0.182      2

populations, and sublineage A3 (sl A3) is present in East Mediterranean populations (Fig. 3.4). For this fragment, Q. cerris shows a haplotype derived from sublineage A3 (Fig. 3.1a).

Group B is composed of all the Q. coccifera and Q. ilex (subspecies ilex and rotundifolia) samples analysed, as well as 27 of the 119 cork oak samples. Most of the cork oak haplotypes in this group are either shared with, or appear to derive from, haplotypes present in the other species.

Cork oak samples in Group B (the introgressed lineage) generally presented more variability than those in Group A (the pure lineage) (Table 3.2).

All three cpDNA fragments appear to distinguish Groups C and D, albeit with low resolution, which is most evident in the TrnL-F fragment. Moreover, the position of these groups in the trees, and their phylogenetic relationships with each other and with the other groups, are quite inconsistent across fragments (Fig. 3.1a, Fig. 3.2 and Fig. 3.3).

The analysis of the matK barcode fragment yielded roughly the same nucleotide diversity (π=0.0067 +/- 0.00095) as the remaining cpDNA fragments, whereas the rbcL fragment presented no variation in any of the species. The matK analysis and the resulting phylogenetic tree (Fig. 3.5) corroborated the presence of two cork oak lineages. Quercus cerris and Quercus crenata are classified, both by classic taxonomy [17] and by recent DNA barcode analysis [29], as the species most closely related to Q. suber (subgenus Cerris), and these species share the same haplotype as the cork oak samples characterized as the pure lineage (subgenus Cerris). Cork oak samples characterized as the introgressed lineage appear closely related to Quercus ilex ilex, Quercus ilex rotundifolia and Quercus coccifera (subgenus Sclerophyllodrys).


Figure 3.4: Geographical distribution of cork oak cpDNA haplotype lineages according to the TrnS/PsbC fragment. Pie charts represent the haplotype frequencies in the analysed populations, and pie chart sizes reflect the number of samples per population (3-5). Colour codes follow those in the TrnS/PsbC tree (Fig. 3.1); Yellow: cork oak's pure lineage (Bright Yellow - Sublineage A2; Brownish-Yellow – Sublineage A3; Light Yellow – Sublineage A1); Orange: cork oak's introgressed lineage. The present distribution of the species is shown in grey.

Figure 3.5: Maximum parsimony tree of the cpDNA fragment matK. Four groups are represented and

classified according to classic taxonomy (Tutin et al. 1993), following the four subgenera identified by

Schwartz 1964 (Sclerophyllodrys, Cerris, Quercus and Erythrobalanus). Numbers at the nodes are the

bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility

value.


For the nuclear fragment EST 2T13, the best-fitting model of sequence evolution was selected for the BA, and the resulting tree presented a topology very similar to that of the MP tree. The MP tree is shown in Fig. 3.6a with the respective MP bootstrap values and BA clade credibility values. The nuclear tree reflects the same pattern as the cpDNA trees: the same four major groups are present, here named Groups α, β, γ and δ.

Group α consists of Q. suber [40 of the 52 samples used (Table 2.1)] and Q. cerris, similarly to the cpDNA results. The pattern of three sublineages (α1, α2 and α3) is also present, although in the nuclear fragment Q. cerris appears to share the same haplotype as the cork oak samples of sublineage α3. Fig. 3.6b shows a reconstruction of the phylogenetic tree of the nuclear pure lineage for the EST 2T13 fragment, with the major haplotypes of each sublineage and the six mutational events that occurred during the formation of those haplotypes. Group β, as in the cpDNA trees, consists of cork oak, Q. coccifera and Q. ilex samples. Also as in the cpDNA trees, Group γ is composed of the Quercus species belonging to subg. Quercus, and Group δ consists of Q. rubra from the subgenus Erythrobalanus. The phylogenetic relationships between the four groups most closely resemble those of the TrnH/PsbA cpDNA tree. The major difference between the nuDNA and cpDNA datasets lies in the cork oak samples composing Group α (which could be called the nuclear pure lineage) and its sublineages, and Group β (the nuclear introgressed lineage): these are not always the same when the fragments from both genomes are compared. In particular, sublineage α1 in the nuclear DNA is not composed exclusively of cork oak samples from Sicily, as in the cpDNA fragments, but also includes samples that in the cpDNA belonged to sublineage A3; sublineage α2 is not exclusively West Mediterranean in the nuclear DNA, including samples that in the cpDNA belonged to sublineage A3 as well as samples from the introgressed lineage; and sublineage α3 also loses its exclusiveness to East Mediterranean populations, including samples that in the cpDNA trees belong to sublineage A2 and to the introgressed lineage (Fig. 3.7). Another difference between the cpDNA and the nuclear DNA is the arrangement of the cork oak samples that compose Group β. These samples do not share the same haplotypes with Q. ilex and Q. coccifera as they did in the cpDNA fragments. Instead, they present a major haplotype derived from a


Figure 3.6: a) Maximum parsimony tree of the candidate gene EST 2T13. Four groups are represented and color coded. Group α is highlighted in yellow: cork oak's pure lineage and Q. cerris (Bright Yellow - Sublineage α2; Brownish-Yellow – Sublineage α3; Light Yellow – Sublineage α1); Group β (Orange – cork oak's introgressed lineage; green – Q. coccifera; red – Q. rotundifolia; pink – Q. ilex); Group γ is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group δ is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility value. b) Detailed phylogenetic reconstruction of the sublineages from Group α. Bootstrap support and Bayesian credibility values are provided above each branch. The site combinations below each branch represent the 6 mutational events that occurred during the evolution of the three sublineages.


Figure 3.7: Geographical distribution of cork oak nuDNA EST 2T13 haplotype lineages. Pie charts represent the haplotype frequencies in the analysed populations, and pie chart sizes reflect the number of samples per population (3-5). Colour codes follow those in the EST 2T13 tree (Fig. 3.6); Yellow: cork oak's nuclear pure lineage (Bright Yellow - Sublineage A2'; Brownish-Yellow – Sublineage A3'; Light Yellow – Sublineage A1'); Orange: cork oak's nuclear introgressed lineage. The present distribution of the species is shown in grey.

common ancestor shared with Q. ilex and Q. coccifera. Diversity in the nuclear pure lineage is not as low as in the cpDNA (Table 3.2), with 6 haplotypes. However, the diversity of the cork oak samples belonging to the nuclear introgressed lineage in Group β is lower than that of the other species in this group (Table 3.2 and Fig. 3.6). In Group γ, by contrast, diversity is higher than in the cpDNA, since each species is characterized by its own haplotype (Fig. 3.6).

Median-joining analysis of the cpDNA fragments produced haplotype networks (Supporting Information 4) that reflect the four major groups in the trees and the haplotypes shared by Q. suber in Group B with Q. coccifera, Q. rotundifolia and Q. ilex.

3.1.3 Mismatch distribution and neutrality tests

Demographic histories of both cork oak lineages were evaluated with mismatch distributions and tests of the standard neutral model for a demographically stable population (Tajima's D [116] and Fu's Fs [117]) (Table 3.3). The TrnS/PsbC fragment sequence analysis provided slightly contradictory results. The null hypothesis of population demographic expansion was not rejected by the mismatch distribution for either lineage (pure lineage: SSD=0.098, p=0.061; r=0.436, p=0.164; introgressed lineage: SSD=0.026, p=0.086; r=0.186, p=0.055), but the p values are somewhat marginal, and these statistics are conservative and use little of the information in the data. Detecting demographic size changes can be difficult with small sample sizes or few haplotypes, or when the population has experienced a very recent expansion. Fu's Fs has been shown to be more powerful than mismatch distributions in detecting both very recent and older population expansions [117,118], and this statistic (like Tajima's D) did not support population expansion for either lineage (Table 3.3). For the pure lineage, the TrnH/PsbA fragment analysis showed strong evidence of recent population expansion, given the non-significant sum of squared deviations (SSD=0.000, p=0.183) and Harpending's raggedness index (r=0.822, p=0.837) and the significantly negative (p<0.001) value of Fu's Fs. Tajima's D, although not significant (p=0.119), also showed a negative tendency (D=-1.047). The TrnH/PsbA introgressed lineage presented no evidence of population expansion: SSD and r values rejected the null hypothesis of expansion, a result supported by the non-significant values of D and Fs. The mismatch distribution and Fu's Fs could not be calculated for the pure lineage of the TrnL-F fragment because only one haplotype is present. For the TrnL-F introgressed lineage, the mismatch distribution did not depart from the stepwise expansion model according to the SSD, although only marginally (SSD=0.132, p=0.076), but the model was rejected by Harpending's raggedness index (r=0.491, p=0.019). Fu's Fs and Tajima's D values were positive and not significant, and thus did not support population expansion (Table 3.3).

The nuDNA fragment EST 2T13 was also evaluated for its demographic history and neutrality. For the nuclear pure lineage, the null hypothesis of demographic expansion was not rejected by Harpending's raggedness index of the mismatch distribution (r=0.145, p=1.00). However, the SSD value rejected the null hypothesis at a highly significant level (SSD=0.328, p=0.00), a result consistent with the non-significant values of Fu's Fs and Tajima's D (although both values are negative) (Table 3.3). For the nuclear introgressed lineage, the mismatch analysis indicates demographic expansion, although this is not supported by Fu's Fs and Tajima's D (Table 3.3).


                            Mismatch distribution                              Tajima's D   Fu's Fs
Fragment / lineage          τ        θ0      θ1          SSD        r          D            Fs
TrnS-PsbC
  Pure lineage (A)          2.648    0.000   1.115       0.098 ns   0.436 ns   0.000 ns     0.947 ns
  Introgressed lineage (B)  0.959    0.000   99999.000   0.026 ns   0.186 ns   0.000 ns     -0.271 ns
TrnH-PsbA
  Pure lineage (A)          3.000    0.000   0.050       0.000 ns   0.882 ns   -1.047 ns    -3.773 ***
  Introgressed lineage (B)  10.273   0.000   8.887       0.236 **   0.461 ***  0.047 ns     5.891 ns
TrnL-F
  Pure lineage (A)          -        -       -           -          -          0.000 ns     -
  Introgressed lineage (B)  2.484    0.002   3.000       0.132 ns   0.491 *    0.771 ns     1.290 ns
EST 2T13
  Pure lineage (α)          0.000    0.000   3413.950    0.328 ***  0.145 ns   -1.323 ns    -0.741 ns
  Introgressed lineage (β)  2.965    0.450   0.450       0.021 ns   0.457 ns   0.000 ns     -0.176 ns

Table 3.3: Estimates of mismatch distribution parameters and neutrality tests. τ (tau) = time since population expansion; θ0 and θ1 = relative population size before and after expansion; SSD = sum of squared deviations; r = Harpending's raggedness index; D = Tajima's D; Fs = Fu's Fs; ns = not significant; * significant at p<0.05; ** significant at p<0.01; *** significant at p<0.001.

3.2 Microsatellite analysis

3.2.1 Genetic diversity values

For the EST-SSR markers, locus QmDN1 was apparently monomorphic and was discarded from all subsequent analyses. Global evaluation of the microsatellite data set using MICRO-CHECKER [102] revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles at two markers: QmOST1 and QmDN3. For locus QmOST1, there is a marginal possibility of null alleles in populations HAZ and MEK (Supporting Information 5 – Fig. 5.1). Locus QmDN3, by contrast, showed signs of null alleles in all populations and was therefore eliminated from all subsequent analyses (Supporting Information 5 – Fig. 5.2). No linkage disequilibrium was detected among the three remaining EST-SSRs (Supporting Information 6).

The total number of alleles (NA) per population ranged from nine to fourteen and the allelic richness (A) from 3.000 to 4.109, with the SIN population clearly presenting the highest number of alleles and, consequently, the highest allelic richness. Gene diversity (expected heterozygosity over loci) ranged from 0.400 in SIN to 0.598 in CAT (Table 3.4). Only the SIN population departed significantly from Hardy-Weinberg equilibrium, at the 0.01 significance level (Table 3.4). The inbreeding coefficient for the SIN population was positive (Fis=0.1691). Because the species, although monoecious, has a protandrous system that promotes cross-pollination, a significant deviation from zero should reflect biparental inbreeding or population substructure (Table 3.4). In total, 18 alleles were identified at the three loci, of which 7 were exclusive to a single population (private alleles): 5 were exclusive to SIN and the others to MEK and CAT (Table 3.4). The private alleles were at the extremes of the allele size distribution and occurred at very low frequencies.

                  EST-SSRs (3 loci)                            nuSSRs (5 loci)
Country   Code+   NA  A      PA  Fis          Ho     He        NA  A      PA  Fis          Ho     He
Portugal  ARR     10  3.315  -   0.0162 ns    0.522  0.531     34  6.165  3   0.1064 ns    0.538  0.601
          BUC     10  3.025  -   -0.0005 ns   0.422  0.422     34  5.997  1   0.0399 ns    0.553  0.576
          EST     10  3.167  -   0.2190 ns    0.375  0.479     32  5.703  -   0.1449 ns    0.506  0.591
          GER     10  3.228  -   -0.0840 ns   0.506  0.467     30  5.635  -   0.0586 ns    0.557  0.591
          MON     11  3.479  -   0.1769 ns    0.393  0.476     35  6.494  1   0.1579 *     0.521  0.617
          SIN     14  4.109  5   0.1691 **    0.333  0.400     33  6.158  1   -0.0665 ns   0.621  0.583
Algeria   ALG     10  3.202  -   -0.0055 ns   0.522  0.519     38  6.759  1   0.0920 ns    0.510  0.548
Spain     CAT     11  3.465  1   -0.0218 ns   0.611  0.598     29  5.354  -   0.0628 ns    0.533  0.569
          HAZ     10  3.230  -   0.1188 ns    0.512  0.580     32  5.796  1   0.1181 ns    0.469  0.531
Morocco   TAZ     10  3.276  -   0.1083 ns    0.478  0.535     34  6.138  1   0.0261 ns    0.538  0.552
          KEN     11  3.365  -   -0.2336 ns   0.700  0.570     26  4.938  -   0.0433 ns    0.630  0.658
Tunisia   MEK     11  3.480  1   0.1689 ns    0.441  0.528     29  5.433  -   0.0348 ns    0.508  0.526
Italy     PUG     9   3.000  -   0.0403 ns    0.444  0.463     28  5.600  -   0.0642 ns    0.591  0.630

Table 3.4: Populations of Quercus suber sampled for the molecular genetic work with SSRs, including country and population abbreviations, number of total alleles (NA), allelic richness (A), number of private alleles (PA), expected (He) and observed (Ho) heterozygosities, and within-population inbreeding coefficients (Fis).
+See Fig. 3.4 for visual location on a map of Europe.
Significance levels after Bonferroni corrections: ns – not significant; * significant at p<0.05; ** significant at p<0.01.

For the nuSSRs, as previously shown by Burgarella et al. [142], the locus MSQ13 appears to be particularly informative for detecting F1 hybrids between Q. suber and Q. rotundifolia because their allele sizes do not overlap [88,142]. The locus was tested in some individuals of each population (including all the individuals detected as belonging to the introgressed lineages) and proved monomorphic at the expected allele size for Q. suber. Thus, the locus was not used in the following analyses.

Global evaluation of the microsatellite dataset using Micro-Checker revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles in a few populations for the markers QpZAG110, QrZAG20, QrZAG11 and QpZAG15 (Supporting Information 5 – Fig. 5.3, Fig. 5.4, Fig. 5.5). In addition, the QpZAG15 locus departed from Hardy-Weinberg equilibrium (HWE) in 9 populations (data not shown). Given both problems, this locus was removed from subsequent analyses. No linkage disequilibrium was detected between the remaining loci (Supporting Information 6). The total number of alleles (NA) in each population ranged from 26 (KEN) to 36 (ALG) and allelic richness (A) from 4.938 to 6.759. Gene diversity (expected heterozygosity over loci) ranged from 0.526 in Mekna (MEK) to 0.658 in Kenitra (KEN) (Table 3.4). These values are slightly higher than those obtained with the EST-SSRs in every population except the Spanish and Tunisian populations. Only the MON population departed significantly from HWE at the 0.05 significance level after Bonferroni correction (Table 3.4). Fis for the MON population was positive, which could reflect biparental inbreeding or population substructure (Table 3.4). However, because Micro-Checker marginally detected null alleles at the QrZAG20 locus in this population, an effect of null alleles cannot be ruled out. In total, 56 alleles were identified at the five loci, nine of which were private alleles. Unlike those of the EST-SSRs, the private alleles showed no particular distribution across populations, apart from a slight concentration in ARR, which had 3 of the nine (Table 3.4). The private alleles were mostly at the extremes of the allele size distribution and occurred at low frequencies.
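The inbreeding coefficients in Table 3.4 express the deficit of observed relative to expected heterozygosity. The tabulated values were presumably obtained with a dedicated multilocus estimator, so they will not be reproduced exactly; the Python sketch below only illustrates the underlying relation Fis = 1 − Ho/He at a single locus, using Nei's 2n/(2n − 1) sample-size correction for He. The genotypes and function names are hypothetical.

```python
from collections import Counter


def observed_expected_heterozygosity(genotypes):
    """Ho and unbiased He (Nei's 2n/(2n-1) correction) at one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    n = len(genotypes)
    ho = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    two_n = 2 * n
    gene_diversity = 1 - sum((c / two_n) ** 2 for c in allele_counts.values())
    he = two_n / (two_n - 1) * gene_diversity
    return ho, he


def fis(ho, he):
    """Positive values indicate a heterozygote deficit relative to HWE."""
    return 1 - ho / he


# Hypothetical genotypes (allele sizes in bp) for four trees.
genotypes = [(150, 150), (150, 152), (152, 152), (150, 152)]
ho, he = observed_expected_heterozygosity(genotypes)
print(round(fis(ho, he), 3))  # 0.125

# The simple ratio applied to the multilocus means reported for MON (nuSSRs)
# gives about 0.156, close to but not identical with the tabulated 0.1579.
print(round(fis(0.521, 0.617), 3))
```

Because null alleles make true heterozygotes appear homozygous, they lower Ho and inflate Fis in exactly this ratio, which is why they cannot be ruled out as the cause of the positive value in MON.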

No microsatellite (either nuSSR or EST-SSR) showed evidence of non-neutrality in the Ewens–Watterson and Ewens–Watterson–Slatkin tests (data not shown).
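The Ewens–Watterson test compares the observed homozygosity of a locus, F = Σp², with its distribution under the neutral infinite-alleles model, conditional on the number of gene copies and the number of alleles. The implementation used here is not stated; the Python sketch below is a Monte Carlo version under stated assumptions: neutral samples are drawn with Hoppe's urn (which generates Ewens sampling formula configurations), θ is chosen so that the expected number of alleles equals the observed number (only to improve the acceptance rate), and only samples with exactly the observed number of alleles are kept. It is not Slatkin's exact algorithm, and for microsatellites the infinite-alleles model is itself an approximation. The allele counts are hypothetical.

```python
import random


def homozygosity(counts):
    """Observed homozygosity F = sum of squared allele frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def expected_allele_number(theta, n):
    """E[k] under the infinite-alleles model for n sampled gene copies."""
    return sum(theta / (theta + i) for i in range(n))


def theta_for_k(k, n):
    """Bisection for theta such that E[k] equals the observed k."""
    lo, hi = 1e-9, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_allele_number(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hoppe_urn(n, theta, rng):
    """One neutral sample of n gene copies; returns allele counts."""
    counts = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            counts.append(1)  # new allele (always the case for the first copy)
        else:
            # copy an existing allele chosen in proportion to its count
            r = rng.randrange(i)
            for j, c in enumerate(counts):
                if r < c:
                    counts[j] += 1
                    break
                r -= c
    return counts


def ewens_watterson(counts, reps=1000, seed=1):
    """Return (F_obs, P(F_sim <= F_obs), P(F_sim >= F_obs))."""
    n, k = sum(counts), len(counts)
    f_obs = homozygosity(counts)
    theta = theta_for_k(k, n)
    rng = random.Random(seed)
    sims = []
    while len(sims) < reps:
        sample = hoppe_urn(n, theta, rng)
        if len(sample) == k:  # condition on the observed number of alleles
            sims.append(homozygosity(sample))
    p_low = sum(f <= f_obs for f in sims) / reps
    p_high = sum(f >= f_obs for f in sims) / reps
    return f_obs, p_low, p_high


# Hypothetical counts of five alleles among 60 gene copies (30 diploid trees).
f, p_low, p_high = ewens_watterson([25, 20, 10, 4, 1])
print(f"F = {f:.3f}, P(F_sim <= F) = {p_low:.3f}, P(F_sim >= F) = {p_high:.3f}")
```

A small lower-tail probability points to allele frequencies that are too even (as under balancing selection), and a small upper-tail probability to excess homozygosity (as under directional selection).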

3.2.2 Genetic differentiation among populations

Different coefficients of genetic differentiation among populations were estimated for both types of SSR markers (Table 3.5). All coefficients except Dest displayed higher values for the EST-SSRs than for the nuSSRs, and for both marker types G'ST and D were slightly higher than FST, GST and RST (Table 3.5). GST and FST indicated that differentiation among populations was more than twice as high for the EST-SSRs (GST=0.066 for EST-SSRs vs. 0.031 for nuSSRs; FST=0.071 for EST-SSRs vs. 0.032 for nuSSRs). Nevertheless, for the remaining coefficients (RST, G'ST and D) the differences between the marker types are not significant and, interestingly, the actual differentiation among populations calculated according to Jost's D was the same for both SSR types (Dest=0.077) (Table 3.5).

Table 3.5: Genetic statistics for EST-SSRs and nuSSRs. Number of alleles (NA), allelic richness (A), observed (Ho) and expected (He) heterozygosities; FST, differentiation among populations according to Weir and Cockerham [125]; RST, differentiation among populations according to Slatkin [126]; GST, proportion of among-population differentiation according to Nei & Chesser [130]; G'ST, standardized measure of genetic differentiation according to Hedrick [129]; and Dest, estimator of actual differentiation according to Jost [128].

Locus      NA  A       Ho     He     FST    RST     GST    G'ST   Dest
EST-SSRs
QrOST1     9   4.320   0.540  0.610  0.064  0.038   0.060  0.148  0.093
QpD12      3   2.999   0.403  0.474  0.139  0.142   0.130  0.229  0.114
QmAJ1      6   3.182   0.501  0.542  0.017  0.019   0.020  0.045  0.025
All        18  1.750   0.481  0.542  0.071  0.066   0.066  0.141  0.077
nuSSRs
QpZAG110   23  13.149  0.817  0.872  0.022  -0.004  0.023  0.169  0.149
QpZAG9     7   3.124   0.138  0.142  0.014  0.015   0.014  0.016  0.003
QrZAG20    5   3.894   0.449  0.557  0.035  0.042   0.036  0.081  0.047
QrZAG7     10  6.625   0.689  0.756  0.057  0.138   0.055  0.204  0.158
QrZAG11    11  5.755   0.580  0.628  0.010  0.032   0.015  0.042  0.027
All        56  6.509   0.535  0.591  0.032  0.045   0.031  0.102  0.077
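The measures in Table 3.5 respond differently to within-population diversity, which helps explain why GST is small for highly polymorphic loci such as QpZAG110 while G'ST and Dest are larger. As an illustration only, the Python sketch below computes HS, HT, GST, Hedrick's G'ST and Jost's D for one locus from population allele frequencies using their standard population-level formulas. The Nei and Chesser sample-size corrections used by estimators such as Dest are deliberately omitted, so these numbers will not reproduce Table 3.5; the frequencies and function names are hypothetical.

```python
def heterozygosities(freqs_by_pop):
    """Mean within-population (HS) and total (HT) gene diversity at one locus.

    freqs_by_pop: list of allele-frequency lists (same allele order in every
    population); populations are weighted equally.
    """
    k = len(freqs_by_pop)
    hs = sum(1 - sum(p * p for p in freqs) for freqs in freqs_by_pop) / k
    mean_freqs = [sum(col) / k for col in zip(*freqs_by_pop)]
    ht = 1 - sum(p * p for p in mean_freqs)
    return hs, ht


def g_st(hs, ht):
    """Nei's GST: proportion of total diversity found among populations."""
    return (ht - hs) / ht


def hedrick_g_prime_st(hs, ht, k):
    """GST divided by its maximum attainable value given HS and k populations."""
    return g_st(hs, ht) * (k - 1 + hs) / ((k - 1) * (1 - hs))


def jost_d(hs, ht, k):
    """Jost's D: differentiation based on effective numbers of alleles."""
    return (ht - hs) / (1 - hs) * k / (k - 1)


# Two hypothetical populations fixed for different alleles: all measures equal 1.
hs, ht = heterozygosities([[1.0, 0.0], [0.0, 1.0]])
print(g_st(hs, ht), hedrick_g_prime_st(hs, ht, 2), jost_d(hs, ht, 2))  # 1.0 1.0 1.0

# A hypothetical highly polymorphic locus (HS=0.80, HT=0.85) across 13 populations:
# GST stays small, whereas G'ST and D are several times larger.
print(g_st(0.80, 0.85), hedrick_g_prime_st(0.80, 0.85, 13), jost_d(0.80, 0.85, 13))
```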

Pairwise FST and RST were estimated for the thirteen populations with both the EST-SSRs and the nuSSRs. The EST-SSR matrix tended to give higher values, but not consistently; therefore, the two SSR datasets were analysed together. Overall genetic differentiation at the microsatellite loci was low (pairwise FST from 0.000 to 0.123), though highly significant (p<0.001) after Bonferroni correction in 51 out of 78 pairs (Table 3.6). The RST values closely resembled those of the FST matrix. The highest values were obtained for the populations CAT and KEN, followed by PUG. The Dest values, although similar to the FST and RST values, tended to be lower (Supporting Information 7).

Isolation by distance was tested using a Mantel test, but no correlation was found between genetic differentiation and geographic distance among populations (r=0.1082, p=0.26).
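The isolation-by-distance test correlates two distance matrices, and the Bonferroni threshold depends on the number of pairwise comparisons (13 populations give 78 pairs). The software and options used are not stated; the Python sketch below shows the generic permutation logic of a Mantel test, assuming a one-tailed test for positive correlation and Pearson's r on untransformed values (transformations such as FST/(1 − FST) against log distance are also common). The coordinates, FST values and function names are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=1):
    """One-tailed Mantel test (H1: positive correlation, i.e. isolation by distance)."""
    n = len(genetic)
    geo = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo)
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # permute rows and columns together so the matrix stays a valid distance matrix
        permuted = [[genetic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), geo) >= r_obs:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)


def bonferroni_threshold(alpha, n_populations):
    """Number of population pairs and the per-test significance level."""
    n_pairs = n_populations * (n_populations - 1) // 2
    return n_pairs, alpha / n_pairs


# Hypothetical populations along a transect (km) with FST rising linearly with distance.
positions = [0.0, 120.0, 300.0, 650.0, 1100.0]
geo = [[abs(a - b) for b in positions] for a in positions]
fst = [[0.00005 * d for d in row] for row in geo]
r, p = mantel(fst, geo)
print(round(r, 3), p)
n_pairs, alpha_per_test = bonferroni_threshold(0.05, 13)
print(n_pairs)  # 78
```

With only five populations there are just 120 distinct permutations, so the smallest attainable p-value is limited; with 13 populations this constraint is negligible.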

     ALG       ARR       BUC       CAT       HAZ       EST       GER       PUG       KEN       TAZ       MON       SIN       MEK
ALG  --        0.023     0.044     0.055     0.013     0.063     0.021     0.075     0.073     0.022     0.059     0.041     0.000
ARR  0.023 *** --        0.000     0.052     0.020     0.004     0.000     0.055     0.065     0.034     0.001     0.008     0.046
BUC  0.043 *** 0.000 ns  --        0.094     0.056     0.007     0.004     0.060     0.108     0.043     0.013     0.009     0.067
CAT  0.052 *** 0.050 *** 0.086 *** --        0.051     0.094     0.084     0.141     0.083     0.067     0.096     0.092     0.086
HAZ  0.013 ns  0.020 ns  0.053 *** 0.049 *** --        0.041     0.025     0.087     0.043     0.021     0.050     0.065     0.031
EST  0.035 *** 0.003 ns  0.007 ns  0.086 *** 0.039 *** --        0.000     0.056     0.092     0.034     0.006     0.035     0.044
GER  0.021 *** 0.000 ns  0.004 ns  0.077 *** 0.025 *** 0.000 ns  --        0.039     0.073     0.028     0.004     0.023     0.037
PUG  0.070 *** 0.052 *** 0.057 *** 0.123 *** 0.080 *** 0.053 *** 0.038 *** --        0.101     0.055     0.072     0.100     0.084
KEN  0.068 *** 0.061 *** 0.097 *** 0.076 *** 0.042 *** 0.084 *** 0.068 *** 0.092 *** --        0.052     0.120     0.132     0.093
TAZ  0.021 ns  0.033 *** 0.041 *** 0.063 *** 0.020 ns  0.033 ns  0.027 *** 0.052 *** 0.049 *** --        0.068     0.082     0.036
MON  0.056 *** 0.001 ns  0.013 ns  0.087 *** 0.048 *** 0.006 ns  0.004 ns  0.067 *** 0.107 *** 0.063 *** --        0.035     0.076
SIN  0.039 *** 0.008 ns  0.009 ns  0.084 *** 0.061 *** 0.034 *** 0.022 *** 0.091 *** 0.117 *** 0.076 *** 0.034 *** --        0.069
MEK  0.000 ns  0.044 *** 0.063 *** 0.079 *** 0.030 *** 0.042 *** 0.036 *** 0.077 *** 0.085 *** 0.035 *** 0.070 *** 0.065 *** --

ns = not significant; * p<0.05; ** p<0.01; *** p<0.001

Table 3.6: Pairwise FST (below the diagonal) and RST (above the diagonal) values between all pairs of populations.

3.2.3 Population structure

The EST-SSR and nuSSR datasets were analysed separately and then merged to determine the genetic structure of the populations (Fig. 3.8). For the EST-SSRs, the logarithm of the probability of the data [LnP(D)] estimated in STRUCTURE as a function of K peaked at K=3 (mean values: LnP(D)=-2055.3; var[LnP(D)]=131.2), which was confirmed using Evanno's criterion [136] (Supporting Information 8 – Fig. S8.1). For the nuSSR dataset, LnP(D) peaked at K=4 (mean values: LnP(D)=-4749.3; var[LnP(D)]=238.9) and then decreased, but Evanno's ΔK was higher for K=3 than for K=4 [136] (Supporting Information 8 – Fig. S8.2). For the combined dataset, LnP(D) peaked at K=4 (mean values: LnP(D)=-6704.3; var[LnP(D)]=303.8), but when ΔK was used to infer the number of clusters, K=2 presented the highest value, with a second peak at K=4 (Supporting Information 8 – Fig. S8.3).
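Evanno's criterion, used above to choose K, looks for the largest second-order change in LnP(D) relative to the spread of LnP(D) among replicate runs. The Python sketch below assumes replicate STRUCTURE runs at consecutive K values and computes ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)] from the mean LnP(D) at each K; implementations differ in whether the second difference is taken on mean values or averaged over runs, so results may differ slightly. The LnP(D) values are hypothetical, not those of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Evanno's Delta K from replicate STRUCTURE runs.

    lnp_by_k: {K: [LnP(D) of each replicate run]}; needs at least two runs per K
    and consecutive K values. Delta K is undefined for the smallest and largest K.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / stdev(lnp_by_k[k])
    return delta


# Hypothetical LnP(D) values from two replicate runs per K.
runs = {
    1: [-100.0, -100.0],
    2: [-81.0, -79.0],
    3: [-75.5, -74.5],
    4: [-74.0, -74.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(best_k)  # 2
```

As in the combined dataset above, ΔK can favour a smaller K than the raw LnP(D) peak, because it rewards the sharpest change in likelihood rather than the highest value.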

For the most likely run for each K, the r value was always low and below 1, indicating that the sampling locations were informative and greatly helped to detect the population structure. The EST-SSR and nuSSR datasets gave slightly different results, which is not unexpected given the different types of SSRs (Fig. 3.8a and Fig. 3.8b). However, for both datasets each population can be assigned almost completely to one of the clusters detected. At K=2, for the EST-SSRs, the populations CAT and KEN can be assigned to one cluster (pink cluster) and ALG, ARR, BUC, EST, GER, PUG, MON and SIN to the other (blue cluster), while HAZ, TAZ and MEK appear as a mixture of both clusters (Fig. 3.8a). For the nuSSR dataset the groups differ: MEK appears differentiated from the remaining populations in the blue cluster, and ALG and HAZ appear as mixed populations, although slightly more similar to MEK (Fig. 3.8b).

Although K=3 was validated for the EST-SSRs, most populations appear as a mixture of clusters. The CAT population appears differentiated, alone in one cluster (pink cluster), as does SIN in another cluster (blue cluster), although some SIN individuals show a higher probability of belonging to the pink cluster, along with CAT (Fig. 3.8a). For the nuSSRs at K=4, CAT also appears differentiated, alone in one cluster (blue cluster). The Italian population, PUG, can also be placed alone in another cluster (green cluster). The Portuguese populations (ARR, BUC, EST, GER, MON and SIN) can all be placed, to some extent, in a third cluster (pink cluster), and HAZ and TAZ appear as mixed populations (Fig. 3.8b).

For a more robust analysis, both matrices were merged (Fig. 3.8c). At K=2 the populations ALG, CAT and MEK appear as part of the same pink cluster (79%, 88% and 91% assignment probabilities, respectively), and the Portuguese populations and PUG as part of the blue cluster (94% on average for the Portuguese populations and 85% for PUG). HAZ, KEN and TAZ appear as mixed populations, with a slight tendency towards the pink cluster (Fig. 3.8c). At K=4, CAT differentiates from the other populations (75%) in a green cluster. PUG and KEN appear as part of the same yellow cluster (79% and 74%, respectively). The MEK population differentiates into another cluster (85% for the pink cluster), and HAZ and ALG appear as mixed populations, although more closely related to the MEK cluster. The Portuguese populations all appear together in the blue cluster (83% on average), with GER as the most mixed population in the group. TAZ is mixed between several clusters (Fig. 3.8c). The geographic distribution of the clusters obtained by STRUCTURE for the combined SSR dataset is presented in Fig. 3.9a for K=2 and in Fig. 3.9b for K=4.
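Population-level percentages such as those quoted above (for example 79%, 88% and 91% at K=2) and the pie charts of Fig. 3.9 are typically obtained by averaging the individual membership coefficients (Q) within each population. The procedure is not described in detail here, so the Python sketch below is an illustration under that assumption, with hypothetical individuals and two clusters. It also assumes that label switching between runs (the same cluster receiving different indices) has already been resolved, for example with a tool such as CLUMPP.

```python
def population_membership(q_matrix, population_of):
    """Average individual membership coefficients (Q) within each population.

    q_matrix: {individual: [q_1, ..., q_K]} from the selected STRUCTURE run.
    population_of: {individual: population code}.
    """
    totals, sizes = {}, {}
    for individual, q in q_matrix.items():
        pop = population_of[individual]
        if pop not in totals:
            totals[pop] = [0.0] * len(q)
            sizes[pop] = 0
        totals[pop] = [t + x for t, x in zip(totals[pop], q)]
        sizes[pop] += 1
    return {pop: [t / sizes[pop] for t in totals[pop]] for pop in totals}


def main_cluster(membership):
    """(cluster index, mean membership) of the dominant cluster in each population."""
    return {pop: max(enumerate(qs), key=lambda iq: iq[1]) for pop, qs in membership.items()}


# Hypothetical individuals and their memberships in two clusters.
q = {"A1": [0.90, 0.10], "A2": [0.70, 0.30], "B1": [0.20, 0.80], "B2": [0.40, 0.60]}
pops = {"A1": "POP_A", "A2": "POP_A", "B1": "POP_B", "B2": "POP_B"}
membership = population_membership(q, pops)
print({pop: [round(x, 2) for x in qs] for pop, qs in membership.items()})
# {'POP_A': [0.8, 0.2], 'POP_B': [0.3, 0.7]}
```

An average near 50% can mean either that every individual is admixed or that the population contains individuals of two distinct origins; the bar plots of Fig. 3.8 are needed to tell these cases apart.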

To complement the analyses run in STRUCTURE, a GENELAND analysis was performed on the merged dataset. The geographical distribution of the six clusters detected is shown in Fig. 3.9c. The first cluster (purple) comprised the Portuguese populations (EST, GER, BUC, MON, SIN and ARR); the second (orange) comprised only KEN; the third (green) grouped HAZ and TAZ; the fourth (grey) included a single population, CAT; the fifth (blue) comprised ALG and MEK; and the sixth (red) included only PUG.

AMOVA considering the clusters formed in the GENELAND and STRUCTURE analyses (Fig. 3.8 and Fig. 3.9) was always significant at the 0.001 level for the clusters detected, but also showed that the great majority of genetic variation was found within populations (94%). The analysis considering the six clusters inferred by GENELAND yielded the highest value of genetic differentiation between groups (FCT=4.99) (Supporting Information 9).
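In a hierarchical AMOVA the fixation indices are ratios of variance components (among groups, among populations within groups, and within populations), so each is a proportion between 0 and 1 and is often reported ×100 as a percentage of variation. The Python sketch below shows these standard relations; the variance components are hypothetical inputs, not the values in Supporting Information 9, and estimating the components themselves (from sums of squared distances) is left to software such as Arlequin.

```python
def amova_indices(sigma2_among_groups, sigma2_among_pops_within_groups, sigma2_within_pops):
    """Hierarchical fixation indices from AMOVA variance components."""
    va = sigma2_among_groups
    vb = sigma2_among_pops_within_groups
    vc = sigma2_within_pops
    total = va + vb + vc
    return {
        "FCT": va / total,  # among groups, relative to total
        "FSC": vb / (vb + vc),  # among populations within groups
        "FST": (va + vb) / total,  # among populations overall
        "percent_within": 100 * vc / total,
    }


# Hypothetical variance components (arbitrary units).
indices = amova_indices(0.05, 0.01, 0.94)
print(indices)
# The three indices are linked by (1 - FST) = (1 - FSC) * (1 - FCT).
```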


Figure 3.8: STRUCTURE clustering results obtained for the a) EST-SSRs dataset (K=2 and 3); b) nuSSRs dataset (K=2, 3 and 4); and c) combined dataset (K=2, 3 and 4). Populations are separated by black bars and identified at the bottom. In all analyses, each distinct cluster is represented by a unique colour. Each individual is represented by a thin bar, and the colours on each vertical bar represent the probability of the individual belonging to each cluster.


Figure 3.9: Geographic distribution of the clusters obtained by STRUCTURE and GENELAND: a) combined dataset with STRUCTURE for K=2; b) combined dataset with STRUCTURE for K=4; and c) combined dataset with GENELAND for K=6. Pie charts represent the assignment probabilities to each cluster, and each cluster is colour coded. Pie chart sizes reflect the number of samples per population (22-32). For a) and b) the colour codes match those used in Fig. 3.8c to code each cluster.

Discussion

4. Discussion

4.1 Differentiation and demographic patterns

Maternally inherited cpDNA markers yield valuable information about genetic variability associated with local populations or provenances [143]; therefore, the geographic patterns of cpDNA haplotypes in many widespread European forest trees are often interpreted on the assumption of survival in glacial refugia in southern and eastern Europe, outside the limits of the Weichselian ice sheet, followed by postglacial migration. Some species appear to have spread northwards and westwards from a single refuge, while others spread from multiple refugia [48,54,57,70,144].

Analysis of the sequencing data from the cpDNA regions clearly shows (with the exception of the rbcL fragment) the presence of two well-established cork oak lineages, the pure lineage and the introgressed lineage, which are also supported by the sequencing of the nuclear candidate gene. The cpDNA pure lineage described here seems to be related to the “suber” lineage described previously by Jiménez et al. [41], which is almost specific to cork oak populations and may be considered the original and most widely distributed lineage in this species [41,64]. The TrnS/PsbC fragments presented the highest resolving power for this lineage, and three main haplotypes (A1, A2 and A3) are evident (Fig. 3.1 and Fig. 3.4). These three sublineages have well-delimited geographic areas and possibly reflect refuge areas from which expansion events putatively occurred after the last glaciation, which is somewhat supported by the mismatch distribution and neutrality test values (Table 3.3). The previous works of López de Heredia et al. [52] and Lumaret et al. [43] have indicated the southern Iberian Peninsula as a possible refuge area, supported by palynological data. Although the results found in this work are not conclusive enough to support this idea, sublineage A2 appears to have spread from a western Mediterranean area, consistent with a refuge area in the Iberian Peninsula. Lumaret et al. [43], based on RFLP analysis of the whole cpDNA, indicated two more possible refuge areas for cork oak, namely the southern Italian Peninsula and North Africa, although this is not supported by the fossil record [52]. It is difficult to determine the origin of sl A3 because this haplotype is distributed throughout most of peninsular Italy and North Africa (Algeria and Tunisia). However, any of these geographic areas could have been a refuge for this lineage of cork oak, in agreement with the results presented by Lumaret et al. [43]. Nevertheless, the presence of a haplotype (sl A2) restricted to Sicily was unexpected. Although no previous work suggests Sicily as a refuge area, the geographic restriction of this lineage and the fact that it is roughly contemporaneous with the other two sublineages suggest that Sicily might indeed have been a refuge area for cork oak.
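Tajima's D, one of the neutrality statistics in Table 3.3 cited above as support for postglacial expansion, contrasts two estimates of sequence diversity: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a1. After population growth, rare variants are in excess, so π falls below S/a1 and D becomes negative. The Python sketch below computes D with the standard constants of Tajima (1989); the aligned sequences are hypothetical, and significance testing (usually by coalescent simulation, as in Arlequin or DnaSP) is not included.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (pi per sequence)."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)


def tajimas_d(n, s, pi):
    """Tajima's D for n sequences, s segregating sites and mean pairwise differences pi."""
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical aligned haplotypes.
seqs = ["ACGTAC", "ACGTAT", "ACGAAT", "TCGAAT", "ACGTAC"]
n, s, pi = len(seqs), segregating_sites(seqs), mean_pairwise_differences(seqs)
print(n, s, round(pi, 2), round(tajimas_d(n, s, pi), 3))
```

With the very low cpDNA variation reported in this work, S is small and the variance term in the denominator is correspondingly imprecise, which is one reason neutrality tests here offer only partial support for expansion scenarios.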

It is also possible that the extensive introgression of Q. suber by Q. ilex indicates several potential refugia. In fact, López de Heredia et al. [52] present north-eastern Spain (Catalonia) as a potential refuge area resulting from extensive hybridization with Q. ilex. The authors argue that the populations from this area present a predominant “ilex chlorotype” that is very rare in holm oak. Therefore, the hypothesis that some populations withstood glacial conditions in this area (or any other area) by hybridizing cannot be ruled out. Although this cannot be fully corroborated, the CAT population showed an apparent total replacement of the cpDNA pure lineage, which might indicate that the introgression events are ancient and indeed reflect a glacial refuge area. The same complete replacement of the cpDNA pure lineage appears to have happened in MEK, and almost completely in HAZ.

However, more detailed inferences about the geographic origins of the haplotypes and their migration scenarios will require additional sampling of populations and most likely other genomic regions, because the low cpDNA variation itself could bias the identification of glacial refugia for Quercus suber.

There are no previous works using sequences from the nuclear genome in this species. However, in comparison with the cpDNA sequences, the nuclear DNA fragment seems to be more informative. Its nucleotide diversity is higher than that of the cpDNA fragments, as is the number of haplotypes found for the pure lineage (Table 3.1 and Table 3.2). The analysis also reveals a more complex geographic distribution history for cork oak. As with the cpDNA, the results showed a pure lineage composed of three sublineages, but the distribution of the sublineages is not as geographically structured as in the cpDNA dataset (Fig. 3.6 and Fig. 3.7). Sublineage α3 of the nuDNA dataset, which in the cpDNA was restricted to Sicily, extends to Lazio (Italy) and Tunisia. Sublineage α1, equivalent to cpDNA sublineage A1, was still the most frequent, but in the nuDNA it is not restricted to the western part of the Mediterranean as it was in the cpDNA, extending, although less frequently, to the eastern Mediterranean. The same was observed for sublineage α2, which did not seem to be restricted to the eastern part of the species distribution. These differences between cpDNA and nuDNA sequence data can be explained by long-distance pollen dispersal and/or high levels of polymorphism. However, the levels of polymorphism in the candidate gene (Table 3.1 and Table 3.2) do not appear high enough to justify these differences, and long-distance pollen dispersal, combined with more limited acorn dispersal, seems to be a better explanation. This is consistent with indirect methods based on measures of genetic differentiation for nuclear versus cpDNA markers in oaks, which suggest that pollen flow is much higher (by two orders of magnitude) than seed flow [145-147].
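The indirect methods mentioned above typically compare differentiation at biparentally inherited nuclear markers with differentiation at maternally inherited cpDNA. Under an island model at migration–drift equilibrium and without inbreeding, a widely used formulation (Ennos 1994) gives the ratio of pollen to seed migration as [(1/FST(b) − 1) − 2(1/FST(m) − 1)] / (1/FST(m) − 1), where b and m denote biparental and maternal markers. The Python sketch below simply evaluates this expression; the FST values are hypothetical, not estimates from this work.

```python
def pollen_to_seed_migration_ratio(fst_nuclear, fst_cp):
    """Ratio of pollen to seed migration rates (island model, no inbreeding).

    Under the island model, 1/FST - 1 is approximately 4N*m_s + 2N*m_p for
    biparental nuclear markers and 2N*m_s for maternally inherited cpDNA, so
    subtracting twice the cpDNA term isolates the pollen contribution.
    """
    nuclear_term = 1 / fst_nuclear - 1
    cp_term = 1 / fst_cp - 1
    return (nuclear_term - 2 * cp_term) / cp_term


# Hypothetical values: weak nuclear but strong cpDNA differentiation.
print(round(pollen_to_seed_migration_ratio(0.05, 0.8), 1))  # 74.0
```

Because cpDNA FST in oaks is typically high and nuclear FST low, ratios of this order follow naturally, which is the kind of result behind the statement that pollen flow exceeds seed flow by about two orders of magnitude.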

The pattern of three sublineages obtained in this work clearly contrasts with the one previously found by Magri et al. [16]. Using cpDNA microsatellites, the authors analysed cork oak populations throughout the species distribution range and found a strong geographical structure characterized by five distinct haplotypes (Fig. 1.4). The cpDNA SSR data, combined with palaeobotanical and geodynamic models, led the authors to suggest an early Cenozoic origin for cork oak in the Iberian Peninsula and subsequent genetic drift geographically consistent with the Oligocene and Miocene break-up events [16] (Fig. 1.5). All these events seem to have occurred without detectable cpDNA modifications over a time span of at least 15-25 million years. This is also somewhat inconsistent with the results found in this work. Although most of the cpDNA fragments sequenced here showed no resolution, and therefore no haplotype variation able to detect the three sublineages, the TrnS/PsbC fragment shows that the sublineages are separated by a single mutational event (Fig. 3.1), which is unlikely to date to the early Cenozoic.

4.2 Hybridization and introgression

Several proposals for Quercus taxonomy based on morphology have been presented [12,26]. Classifications have not been straightforward and, especially at the subgenus level, remain uncertain. The taxonomic scheme proposed by Schwarz [26] is possibly the most widely accepted for the classification of cork oak, and appears to be the most suitable for describing the systematics of European oaks [19,31,32].

After sequencing of the cpDNA fragments for the eleven Quercus species used in this study, and with the exception of the rbcL fragment, which presented no sequence variation among the 11 species, the remaining four cpDNA fragments (matK, TrnS/PsbC, TrnL-F and TrnH/PsbA) were generally able to distinguish the four subgenera (or subsections) (Fig. 3.1a, Fig. 3.2, Fig. 3.3 and Fig. 3.5) proposed by Schwarz [26] (Quercus, Erythrobalanus, Sclerophyllodrys and Cerris). However, the phylogenetic relationships between the subgenera are inconsistent among fragments, and it is not possible to make accurate inferences about them. Also, in accordance with the latest work of Piredda et al. [29], the genus Quercus appears to be refractory to barcoding with the most common cpDNA sequences, since most of the species analysed within the same subgenus share the same cpDNA haplotype. The low rate of cpDNA variation and hybridization events are likely to be the cause [29].

The nuclear DNA, however, has much more discriminating power than the cpDNA. In fact, the EST 2T13 fragment supports the recognition of the subgenera Sclerophyllodrys, Cerris, Erythrobalanus and Quercus, in agreement with the works of Bellarosa et al. [27,28], which also used fragments of the nuclear genome. The EST 2T13 fragment also distinguishes all the species analysed and, although this issue requires further study, supports the idea that nuclear DNA might be a useful supplementary barcoding tool in difficult genera such as Quercus.

The complex evolutionary history of the Mediterranean evergreen oaks has already been addressed by other authors, who showed that Q. suber, Q. ilex and Q. coccifera share haplotypes as a result of successful hybridization and introgression of Q. suber by Q. ilex [41,43,52]. However, those results were based on RFLP analysis of the cpDNA only, with no insight into the nuclear genome. The sequencing of the cpDNA fragments immediately reveals the introgression events in Q. suber. Since the subgenera Sclerophyllodrys and Cerris are clearly distinguishable in the phylogenetic trees constructed (Fig. 3.1, Fig. 3.2, Fig. 3.3 and Fig. 3.5), the presence of cork oak samples in both subgenera readily points to introgression of Q. suber, allowing the identification of a pure lineage of cork oak haplotypes in subg. Cerris and an introgressed lineage in subg. Sclerophyllodrys.

The distribution of the cpDNA introgressed lineage appears restricted to the western area of the species distribution and peripheral to the distribution of the pure lineage (specifically sublineage A3). Although it is not possible to date the introgression events precisely, some may in fact reflect glacial refugia in this area of the distribution [possibly north-eastern Spain (Catalonia) and/or Morocco] where cork oak populations survived through introgression with Q. rotundifolia. During postglacial range expansion, the rapid spread of cork oak from the pure-lineage refuge may have limited the expansion of the introgressed lineage, forming the mixed populations that present both haplotype lineages (Fig. 3.4). On the other hand, the phylogenetic trees do not allow the hypothesis of more recent or current introgression events to be ruled out (Fig. 3.1, Fig. 3.2, Fig. 3.3). Hybridization is still occurring, most frequently in central and eastern Iberia, and first-generation hybrids between Q. suber and Q. ilex are easily identified in the field [52].

The same introgressed lineage seems to be present in the nuclear DNA, although this has not been reported before. However, the cork oak samples belonging to the introgressed lineage are not always the same in both genomes. That is, some samples of the cpDNA introgressed lineage have a nuclear genome from the pure lineage, while others show evidence of a nuclear introgressed lineage; conversely, some samples with cpDNA from the pure lineage have a nuclear genome from the introgressed lineage.

The flowering phenology and present-day ecology of the two species suggest that pollen flow should be predominantly from Q. suber into Q. ilex. Quercus suber performs better than Q. ilex as a pollen parent in interspecific crosses [45], and molecular evidence supports this expectation [18,43,71]. This would explain the cork oak samples that present an introgressed cpDNA but a nuclear fragment from the pure lineage (see, for example, samples TAZ 1 or HAZ 5). However, the reverse also seems to happen, because some samples present cpDNA from the pure lineage and nuclear DNA from the introgressed lineage (see TOL 3 or LAZ 2). Interestingly, some samples (see GER 5 or TAZ 2) present both cpDNA and nuDNA fragments of the introgressed lineage.

The sharing of haplotypes between Q. ilex and Q. coccifera in subg. Sclerophyllodrys was previously suggested to result from introgression between these species or from incomplete lineage sorting [41,64]. The same happens in subg. Cerris, between Q. cerris and Q. suber. The lack of resolution of the cpDNA might argue for incomplete lineage sorting, but previous authors suggested introgression between these closely related species [16]. Although Quercus suber and Quercus cerris belong to the same taxonomic group, subgenus Cerris [17,30], they are morphologically well distinct and have different geographical and ecological ranges. The natural distribution range of Q. cerris extends from central and southern Europe to Asia Minor. However, in peninsular Italy and in Sicily the ranges of Q. cerris and Q. suber overlap. In fact, Q. crenata is hypothesized to be a hybrid between Q. suber and Q. cerris, although other authors consider it a fixed species.

The analyses of the cpDNA datasets show that Quercus cerris and Quercus suber share the same haplotype for most of the fragments, which could point to incomplete lineage sorting. However, the higher resolving power of the TrnS/PsbC fragment (Fig. 3.1) places the Q. cerris haplotype as highly derived from sublineage A3, in the eastern Mediterranean area. Although this cpDNA fragment differentiates the species, it does not exclude possible, and perhaps somewhat ancient, hybridization events between Q. suber sl A3 and Q. cerris. The nuclear fragment shows that Q. cerris shares the same haplotype as Q. suber samples from sublineage α1, one of the lineages from the eastern Mediterranean area. Considering both types of markers, although the cpDNA does not immediately suggest introgression between these species, the nuclear candidate gene does not allow this hypothesis to be distinguished from incomplete lineage sorting. Retention of ancestral polymorphism also needs to be considered, given that introgression between these species could not be confirmed. These two hypotheses are easily confounded, particularly when contemporary introgression cannot be ruled out because both species co-occur in some areas.


4.3 Genetic diversity and population structure

The populations for the SSR analyses were selected on the basis of the sequencing results and across the entire range, in order to maximize the chances of surveying a large part of the species' genetic diversity.

Recent studies of genetic diversity and population structure in several species have used a combined analysis of EST and genomic SSRs. Although some work has been done in cork oak with nuSSRs, there were no previous studies with EST-SSRs. Tests for neutrality indicate that selection did not differentially affect the performance of EST-SSRs and nuSSRs in characterizing cork oak populations. Even though EST-SSRs are potentially exposed to selection, only a small percentage show evidence of positive selection [91,93]. However, it is important to conduct selective neutrality tests on EST-SSRs before using them in population genetic analyses because, even though most will probably not be under strong selection pressure, a small percentage may be [91,93]. The results also show that the genetic diversity measured with EST-SSRs is similar to that of the nuSSRs and, apart from the QmDN3 locus, which was excluded, there was little evidence of null alleles or other genotyping errors. Therefore, the evidence suggests that EST-SSRs are appropriate markers for population genetic studies in cork oak.

The population differentiation found, although low, was significant. At least for the EST-SSRs (FST=0.071; RST=0.066; Dest=0.077), it was close to the lower limit of the range of average values (0.07-0.09) expected for long-lived, wind-pollinated woody species (Table 3.5 and Table 3.6) [22]. The studies of Coelho et al. [22] and Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007) considered only Portuguese populations (FST=0.0172 and FST=0.02/RST=0.013, respectively). The overall values of population differentiation found here were considerably higher (RST=0.066 for EST-SSRs vs. 0.045 for nuSSRs; FST=0.071 for EST-SSRs vs. 0.032 for nuSSRs) (Table 3.5). However, pairwise FST and RST values between Portuguese populations tend to be lower and non-significant (Table 3.6). This indicates little differentiation among these populations, as also found by Coelho et al. [22] and Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). Also in agreement with these results, most of the species' diversity (94%) is found within rather than among populations.
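The differentiation statistics quoted above rest on partitioning gene diversity into its within-population (HS) and total (HT) components. The sketch below computes two statistics for one locus and k populations:

- Nei's GST = (HT - HS)/HT;
- Jost's D = [(HT - HS)/(1 - HS)]·[k/(k - 1)] [128].

The quantity 1 - GST is the share of gene diversity found within populations. It is a rough gene-diversity counterpart of the within-population component reported above.

This is a minimal sketch that assumes known allele frequencies and equal population weights, and the function name differentiation is hypothetical. Published estimators, such as Weir and Cockerham's FST [125] and Jost's Dest [128], correct for sample size, so this sketch is not expected to reproduce the values in Tables 3.5 and 3.6.

```python
def differentiation(pop_freqs):
    """Return (HS, HT, GST, Jost's D) for one locus.

    pop_freqs: list of dicts, one per population, mapping allele -> frequency.
    Populations are weighted equally and frequencies are treated as known,
    i.e. no sample-size correction is applied.
    """
    k = len(pop_freqs)
    if k < 2:
        raise ValueError("need at least two populations")
    alleles = set().union(*pop_freqs)
    # HS: mean within-population gene diversity, 1 - sum(p^2)
    hs = sum(1 - sum(p * p for p in pop.values()) for pop in pop_freqs) / k
    # HT: gene diversity of the pooled (mean) allele frequencies
    pooled = {a: sum(pop.get(a, 0.0) for pop in pop_freqs) / k for a in alleles}
    ht = 1 - sum(p * p for p in pooled.values())
    gst = (ht - hs) / ht if ht > 0 else 0.0
    # Jost (2008): D = [(HT - HS) / (1 - HS)] * [k / (k - 1)]
    jost_d = (ht - hs) / (1 - hs) * k / (k - 1)
    return hs, ht, gst, jost_d


# Hypothetical allele frequencies at one locus in two populations
pops = [{218: 0.8, 220: 0.2}, {218: 0.2, 220: 0.8}]
hs, ht, gst, d = differentiation(pops)
print(f"HS={hs:.3f} HT={ht:.3f} GST={gst:.3f} D={d:.3f} within={1 - gst:.0%}")
```

GST is bounded by the within-population diversity, whereas D reaches 1 whenever populations share no alleles. This is one reason why both kinds of measures are often reported together.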


The locus MsQ13 was previously suggested to be particularly informative for detecting F1 hybrids between Q. suber and Q. rotundifolia because the allele sizes of the two species do not overlap [88,142]. The locus was tested here in individuals from every population, including all individuals assigned to the introgressed lineages (either cpDNA or nuDNA). Even so, it was monomorphic at the allele size expected for Q. suber. Nevertheless, the work of Burgarella et al. [142] clearly demonstrates how difficult it is to detect introgressed hybrids in these species, even though the microsatellite loci chosen for that work were highly differentiated between species and had good diagnostic power. Moreover, there was an initial attempt to reproduce that SSR battery in the present work, but some loci were discarded for one of three reasons: they produced no amplification product, their scoring was highly doubtful, or they deviated strongly from HWE. In the future, a more targeted choice of easily reproducible markers may be required. Sampling some key holm oak populations for comparison would also help when detecting hybrids.
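The reasoning behind a species-diagnostic locus such as MsQ13 can be made explicit. If the allele size ranges of the two species do not overlap, an F1 hybrid must carry one allele from each range. The sketch below classifies a diploid genotype accordingly.

The allele ranges in the usage example are hypothetical placeholders, not the published MsQ13 ranges, and the function name classify_genotype is illustrative. A backcrossed individual may carry two Q. suber alleles at this locus, or one allele from each species. A single locus therefore cannot separate F1s from later-generation hybrids, and this check only flags putative F1s.

```python
def classify_genotype(genotype, suber_range, rotundifolia_range):
    """Classify a diploid genotype at a species-diagnostic SSR locus.

    Ranges are inclusive (min_bp, max_bp) allele sizes for each species and
    must not overlap, otherwise the locus cannot be diagnostic.
    """
    (s_lo, s_hi), (r_lo, r_hi) = suber_range, rotundifolia_range
    if not (s_hi < r_lo or r_hi < s_lo):
        raise ValueError("allele size ranges overlap: locus is not diagnostic")

    def origin(allele):
        if s_lo <= allele <= s_hi:
            return "suber"
        if r_lo <= allele <= r_hi:
            return "rotundifolia"
        return "unknown"

    origins = sorted(origin(a) for a in genotype)
    if origins == ["suber", "suber"]:
        return "Q. suber"
    if origins == ["rotundifolia", "rotundifolia"]:
        return "Q. rotundifolia"
    if origins == ["rotundifolia", "suber"]:
        # one allele from each species' range, as expected for an F1
        return "putative F1"
    return "unassigned"  # at least one allele outside both ranges


# Hypothetical, non-overlapping allele ranges (bp); not the real MsQ13 ranges
SUBER, ROTUNDIFOLIA = (200, 230), (240, 270)
for g in [(218, 218), (218, 250), (250, 262), (218, 235)]:
    print(g, classify_genotype(g, SUBER, ROTUNDIFOLIA))
```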

Isolation by distance was tested, but no correlation was found between genetic differentiation and geographic distance among populations across the Mediterranean. However, Ramírez-Valiente et al. [148] found that FST values for the neutral markers were correlated with geographic distance in a previous study of Spanish cork oak populations. That study used the same nuSSR battery as this work, except for QpZAG46, which could not be scored clearly here. In the same study, the authors also found an association between leaf size and the microsatellite QpZAG46. This suggests possible linkage between QpZAG46 and genes controlling leaf size [148].
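Isolation by distance is usually assessed with a Mantel test between a matrix of pairwise genetic differentiation and a matrix of geographic distances. Rousset [131] suggested regressing FST/(1 - FST) on the logarithm of distance. The sketch below implements a one-sided Mantel permutation test using only the Python standard library.

It is a minimal illustration under stated assumptions: distances are in km, off-diagonal distances are non-zero, and all FST values are below 1. The function names and the example matrices are hypothetical, and the code is not a reimplementation of the IBD web service [132].

```python
import math
import random


def _upper(matrix, order):
    """Upper-triangle entries of a square matrix after reordering rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_ibd(fst, dist_km, permutations=9999, seed=1):
    """One-sided Mantel test of FST/(1 - FST) against ln(distance).

    fst, dist_km: symmetric square matrices (lists of lists) in the same
    population order. Returns (r, p), p being the permutation p-value for r > 0.
    """
    n = len(fst)
    identity = list(range(n))
    # Rousset (1997) transformations of genetic and geographic distances
    gen = [[f / (1 - f) for f in row] for row in fst]
    geo = [[math.log(d) if d > 0 else 0.0 for d in row] for row in dist_km]
    x = _upper(geo, identity)
    r_obs = _pearson(x, _upper(gen, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute rows and columns of the genetic matrix together
        if _pearson(x, _upper(gen, perm)) >= r_obs - 1e-12:  # tolerate float noise
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical pairwise FST and distances (km) for four populations
fst = [[0.00, 0.02, 0.05, 0.06],
       [0.02, 0.00, 0.03, 0.05],
       [0.05, 0.03, 0.00, 0.02],
       [0.06, 0.05, 0.02, 0.00]]
dist = [[0, 150, 600, 900],
        [150, 0, 450, 750],
        [600, 450, 0, 300],
        [900, 750, 300, 0]]
r, p = mantel_ibd(fst, dist)
print(f"r = {r:.3f}, one-sided p = {p:.4f}")
```

Permuting whole populations, rather than individual matrix cells, respects the non-independence of pairwise values. With n populations there are only n! distinct arrangements, so very small p-values are unattainable when few populations are sampled.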

The population structure results from the EST-SSR and nuSSR datasets differ slightly, which is not entirely unexpected given the different types of SSRs (Fig. 8a and Fig. 8b). However, the merged dataset is expected to yield the most consistent information, and for it the STRUCTURE and GENELAND results, although not in complete agreement, show the same emerging pattern: 1) The Portuguese populations grouped together in one cluster. There was no differentiation among the Portuguese populations, in agreement with the results of Simões de Matos (F. Simões de Matos, PhD thesis, INETI Lisbon, 2007). This might be explained by the short geographic distance between these populations, which would favour gene flow and thereby homogenize their allele frequencies. It is also consistent with the low and mostly non-significant pairwise FST and RST values found between these populations; 2) Catalonia is clearly the most differentiated population. The analyses always placed it alone in its own cluster, and it had the highest pairwise FST and RST values.

Overall, GENELAND provided a more plausible scenario regarding the distribution of the clusters. In the STRUCTURE results, the Puglia population (PUG) fell into clusters that are difficult to explain. For K=2 it grouped with the Portuguese populations, whereas for K=4 it grouped with Kenitra (KEN), even though KEN and PUG were in opposite clusters for K=2. STRUCTURE thus groups KEN and PUG in one cluster, whereas GENELAND places each of these two populations in its own cluster (Figs. 8 and 9). The small number of SSRs and the low levels of differentiation might explain the implausible assignment of some clusters in the STRUCTURE analysis. However, GENELAND identified more clusters than STRUCTURE (six versus two/four), and independent GENELAND runs identified the same clusters with similar posterior probabilities. This could indicate that the algorithm used in GENELAND is more sensitive in detecting weak spatial clusters when differentiation is low. A similar finding was recently reported by Wellenreuther et al. [149] for the blue-tailed damselfly, Ischnura elegans.
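Evanno et al. [136] proposed choosing the number of STRUCTURE clusters using ΔK. This statistic is the absolute second-order rate of change of ln P(D) across K, divided by the standard deviation of ln P(D) among replicate runs at that K. This section does not state how K=2 and K=4 were selected, so the sketch below is only an illustration assuming ΔK was used.

The sketch computes the second-order difference on the run means, which is a common simplification, and it needs at least two replicate runs per K. The function name evanno_delta_k and the ln P(D) values are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Evanno et al. (2005) Delta K from replicate STRUCTURE runs.

    lnp: dict mapping K -> list of ln P(D) values (one per replicate run,
    at least two runs per K). Delta K is only defined for K values whose
    neighbours K - 1 and K + 1 were also run.
    """
    means = {k: mean(runs) for k, runs in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in lnp and k + 1 in lnp:
            sd = stdev(lnp[k])
            if sd == 0:
                raise ValueError(f"replicate runs at K={k} are identical")
            # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, computed here on run means
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / sd
    return delta


# Hypothetical ln P(D) values from two replicate runs per K
runs = {1: [-1000.0, -1002.0], 2: [-900.0, -902.0],
        3: [-895.0, -897.0], 4: [-893.0, -895.0]}
dk = evanno_delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, "-> K with highest Delta K:", best_k)
```

ΔK cannot be evaluated for the smallest or largest K tested, and it never favours K=1. With weak structure, as found here, it is therefore best read alongside the ln P(D) curve and the bar plots.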


5. Final Remarks

Extending over about 2.2 million ha in seven Mediterranean countries (Portugal, Spain, Algeria, Morocco, Italy, Tunisia and France), cork oak forest landscapes are one of the best examples of the multi-functional role of forests. They have been maintained over thousands of years while promoting high levels of biodiversity. Well-managed cork oak forests provide valuable ecological functions such as soil conservation, buffering against climate change and desertification, water-table recharge and run-off control, and they contribute to the survival of many species. Cork oak trees are essential to maintaining the ecological balance of these ecosystems. These semi-natural woodlands thus provide valuable income to local populations in two ways. Directly, they supply cork for harvesting. Indirectly, they provide other economically valuable resources, such as grazing grounds for livestock, and, above all, they maintain that ecological balance.

Mediterranean regions have been facing a growing number of extreme weather events due to rapid climate change. Assessing the impacts of climate extremes on cork oak trees can help plan better forest management practices for coping with future climate change, and for achieving sustainable development of ecosystems and societies in the Mediterranean area.

Studying the consequences of past climate shifts on biodiversity is among the best tools for validating models of the ecological and evolutionary consequences of future changes. Advances in DNA analysis are allowing the reconstruction of the evolutionary history of forest trees. This work represents the first molecular approach to assess the potential of a combined analysis of chloroplast and nuclear DNA markers, using both sequence data and microsatellites. The importance of such synergistic analyses is highlighted when addressing questions such as the evolutionary history and geographic patterns of population diversity.

Overall, the three major objectives of this work were achieved. It was possible to gather valuable information on the evolutionary history of Quercus suber. Sequence data allowed the detection of two major haplotype lineages, consistent across the nuclear and chloroplast genomes. Within the pure lineage, three sublineages and some signs of recent population expansion were revealed. It is hypothesised that during the coldest periods cork oak survived only in climatically more benign areas (possibly three refugia). From these areas it may have colonized its current distribution range after the warming at the end of the last glacial period.

It was also possible to explore the phylogenetic relationships between cork oak and other Quercus species from all four recognized subgenera. This also helped detect the introgressed lineage in cork oak, which resulted from several hybridization events with Q. ilex. Although some of the hybridization events appear to be old, current hybridization cannot be ruled out. Hybridization and DNA introgression from Q. ilex have already been reported by other authors, but this work showed that the introgression events are also detectable in the nuclear genome.

Finally, microsatellites allowed the identification of some differentiation and structure among key cork oak populations. The differentiation and clusters found might be somewhat weak, but adding more microsatellites and populations may strengthen these results.


6. Bibliographic References

1 Food and Agriculture Organization of the United Nations (FAO) (2011) State of the

World's Forests. FAO

2 Petit, R. J. and Hampe, A. (2006) Some Evolutionary Consequences of Being a Tree.

Annual Review of Ecology, Evolution, and Systematics. 37, 187-214

3 Oldfield, S. et al. (1998) The World List of Threatened Trees, World Conservation Press, Cambridge

4 Hansen, A. J. et al. (2001) Global Change in Forests: Responses of Species,

Communities, and Biomes. BioScience. 51, 765-779

5 Food and Agriculture Organization of the United Nations (FAO) (2010) Global Forest

Resources Assessment 2010. Main report

6 González-Martínez, S. C. et al. (2006) Forest-tree population genomics and adaptive

evolution. The New phytologist. 170, 227-38

7 Schaal, B. A. et al. (1998) Phylogeographic studies in plants: problems and prospects.

Molecular Ecology. 7, 465-474

8 Petit, R. J. et al. (2005) Climate changes and tree phylogeography in the

Mediterranean. Taxon. 54, 877-885

9 Avise, J. C. et al. (1987) Intraspecific Phylogeography: The Mitochondrial DNA

Bridge Between Population Genetics and Systematics. Annual Review of Ecology and

Systematics. 18, 489-522

10 Avise, J. C. (2009) Phylogeography: retrospect and prospect. Journal of

Biogeography. 36, 3-15

11 Beheregaray, L. B. (2008) Twenty years of phylogeography: the state of the field

and the challenges for the Southern Hemisphere. Molecular ecology. 17, 3754-74

12 Nixon, K. C. (1993) Infrageneric classification of Quercus (Fagaceae) and typification

of sectional names. Annales Des Sciences Forestières. 50, 25s-34s

13 Nixon, K. C. (2006) Global and Neotropical Distribution and Diversity of Oak (genus Quercus) and Oak Forests. In Ecology and conservation of neotropical montane oak

forests 185 (Kappelle, M., ed), pp. 3-13, Springer-Verlag

14 Pausas, J. G. et al. (2006) Regeneration of a marginal Quercus suber forest in the

eastern Iberian Peninsula. Journal of Vegetation Science. 17, 729

15 Elena-Rosselló, J. A. and Cabrera, E. (1996) Isozyme Variation in Natural Populations

of Cork-Oak (Quercus suber L.). Population Structure, Diversity, Differentiation and

Gene Flow. Silvae Genetica. 45, 229-235

16 Magri, D. et al. (2007) The distribution of Quercus suber chloroplast haplotypes

matches the palaeogeographical history of the western Mediterranean. Molecular

Ecology. 16, 5259-5266

17 Tutin, T. G. et al. (1993) Flora Europaea, Volume 1, (2nd edn) Cambridge University

18 Toumi, L. and Lumaret, R. (1998) Allozyme variation in cork oak (Quercus suber L.):

the role of phylogeography and genetic introgression by other Mediterranean oak

species and human activities. Theoretical and Applied Genetics (TAG). 97, 647-656

19 Toumi, L. and Lumaret, R. (2001) Allozyme characterisation of four Mediterranean

evergreen oak species. Biochemical systematics and ecology. 29, 799-817

20 Pausas, J. G. et al. (2009) The tree. In Cork Oak Woodlands on the Edge. Ecology,

Adaptive Management, and Restoration (1st edn) (Aronson, J. et al., eds), pp. 11-21,

Island Press

21 Carrión, J. S. et al. (2000) Past distribution and ecology of the cork oak (Quercus

suber) in the Iberian Peninsula: a pollen-analytical approach. Diversity and

Distributions. 6, 29-44

22 Coelho, A. C. et al. (2006) Genetic Diversity of Two Evergreen Oaks [Quercus suber

(L.) and Quercus ilex subsp. rotundifolia (Lam.)] in Portugal using AFLP Markers.

Silvae Genetica. 55, 146-152

23 Soto, A. et al. (2007) Differences in fine-scale genetic structure and dispersal in

Quercus ilex L. and Q. suber L.: consequences for regeneration of mediterranean open

woods. Heredity. 99, 601-7

24 Pulido, F. J. et al. (2001) Size structure and regeneration of Spanish holm oak Quercus

ilex forests and dehesas: effects of agroforestry use on their long-term sustainability.

Forest Ecology and Management. 146, 1-13

25 Pons, J. and Pausas, J. G. (2006) Oak regeneration in heterogeneous landscapes: The

case of fragmented Quercus suber forests in the eastern Iberian Peninsula. Forest

Ecology and Management. 231, 196-204

26 Schwarz, O. (1964) Quercus L. In Flora Europaea, Volume 1 (2nd edn) (Tutin, T. G.

et al., eds), pp. 71-76, Cambridge University Press

27 Bellarosa, R. et al. (2005) Utility of ITS sequence data for phylogenetic reconstruction

of Italian Quercus spp. Molecular Phylogenetics and Evolution. 34, 355-370

28 Bellarosa, R. et al. (1990) Ribosomal RNA genes in Quercus spp. (Fagaceae). Plant

Systematics and Evolution. 172, 127-139

29 Piredda, R. et al. (2011) Prospects of barcoding the Italian wild dendroflora: oaks

reveal severe limitations to tracking species identity. Molecular ecology resources. 11,

30 Manos, P. S. et al. (1999) Phylogeny, Biogeography, and Processes of Molecular

Differentiation in Quercus subgenus (Fagaceae). Molecular Phylogenetics and

Evolution. 12, 333-349

31 Manos, P. S. et al. (2001) Systematics of Fagaceae: Phylogenetic test of reproductive

trait evolution. International journal of plant sciences. 162, 1361-1379

32 Kress, W. J. and Erickson, D. L. (2007) A two-locus global DNA barcode for land

plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

PloS one. 2, e508

33 Cowan, R. S. et al. (2006) 300,000 Species to Identify: Problems, Progress, and

Prospects in DNA Barcoding of Land Plants. Taxon. 55, 611

34 Hajibabaei, M. et al. (2007) DNA barcoding: how it complements taxonomy,

molecular phylogenetics and population genetics. Trends in genetics. 23, 167-72

35 Chase, M. W. et al. (2005) Land plants and DNA barcodes: short-term and long-term

goals. Philosophical transactions of the Royal Society of London. Series B, Biological

sciences. 360, 1889-95

36 Fazekas, A. J. et al. (2008) Multiple multilocus DNA barcodes from the plastid

genome discriminate plant species equally well. PloS one. 3, e2802

37 Chase, M. W. et al. (2007) A proposal for a standardised protocol to barcode all land

plants. Taxon. 56, 295-299

38 Lahaye, R. et al. (2008) DNA barcoding the floras of biodiversity hotspots. PNAS.

105, 2923-8

39 CBOL Plant Working Group (2009) A DNA barcode for land plants. PNAS. 106, 12794-7

40 Neubig, K. M. et al. (2008) Phylogenetic utility of ycf1 in orchids: a plastid gene more

variable than matK. Plant Systematics and Evolution. 277, 75-84

41 Jiménez, P. et al. (2004) High variability of chloroplast DNA in three Mediterranean

evergreen oaks indicates complex evolutionary history. Heredity. 93, 510-5

42 Lumaret, R. et al. (2002) Phylogeographical variation of chloroplast DNA in holm oak

(Quercus ilex L.). Molecular ecology. 11, 2327-36

43 Lumaret, R. et al. (2005) Phylogeographical Variation of Chloroplast DNA in Cork

Oak (Quercus suber). Annals of Botany. 96, 853-861

44 Rushton, B. S. (1993) Natural hybridization within the genus Quercus L. Annals of

forest science. 50, 73-90

45 Boavida, L. C. et al. (2001) Sexual reproduction in the cork oak (Quercus suber L). II.

Crossing intra- and interspecific barriers. Sexual Plant Reproduction. 14, 143-152

46 Bennett, K. D. (1997) Evolution and Ecology: the pace of life, Cambridge University

Press.

47 French, H. M. (2007) The Periglacial Environment, (3rd edn) Longman.

48 Comes, H. P. and Kadereit, W. K. (1998) The effect of Quaternary climatic changes on

plant distribution and evolution. Trends in Plant Science. 3, 432-438

49 Hewitt, G. M. (1999) Post-glacial re-colonization of European biota. Biological

Journal of the Linnean Society. 68, 87-112

50 Willis, K. J. et al. (2000) The Full-Glacial Forests of Central and Southeastern Europe.

Quaternary Research. 53, 203-213

51 Palmé, A. E. et al. (2003) Postglacial recolonization and cpDNA variation of silver

birch, Betula pendula. Molecular ecology. 12, 201-12

52 López de Heredia, U. et al. (2007) Molecular and palaeoecological evidence for

multiple glacial refugia for evergreen oaks on the Iberian Peninsula. Journal of

Biogeography. 34, 1505-1517

53 Willis, K. J. and Van Andel, T. H. (2004) Trees or no trees? The environments of

central and eastern Europe during the Last Glaciation. Quaternary Science Reviews.

23, 2369-2387

54 Hewitt, G. M. (1996) Some genetic consequences of ice ages, and their role in

divergence and speciation. Biological Journal of the Linnean Society.

55 Hewitt, G. M. (2000) The genetic legacy of the Quaternary ice ages. Nature. 405, 907-

56 Petit, R. J. et al. (1997) Chloroplast DNA footprints of postglacial recolonization by

oaks. PNAS. 94, 9996-10001

57 Taberlet, P. et al. (1998) Comparative phylogeography and postglacial colonization

routes in Europe. Molecular ecology. 7, 453-64

58 Konnert, M. and Bergmann, F. (1995) The geographical distribution of genetic

variation of silver fir (Abies alba, Pinaceae) in relation to its migration history. Plant

Systematics and Evolution. 196, 19-30

59 Dumolin-Lapègue, S. et al. (1997) Phylogeographic structure of white oaks throughout

the European continent. Genetics. 146, 1475-87

60 Pollard, D. and Barron, E. J. (2003) Causes of model-data discrepancies in European

climate during Oxygen Isotope Stage 3 with insights from the last glacial maximum.

Quaternary Research. 59, 108-113

61 Barron, E. and Pollard, D. (2002) High-Resolution Climate Simulations of Oxygen

Isotope Stage 3 in Europe. Quaternary Research. 58, 296-309

62 Kvacek, Z. and Walther, H. (1989) Paleobotanical studies in Fagaceae of the European

Tertiary. Plant systematics and Evolution. 162, 213-229

63 Dumolin, S. et al. (1995) Inheritance of chloroplast and mitochondrial genomes in

pedunculate oak investigated with an efficient PCR method. Theoretical and Applied

Genetics. 91, 1253-1256

64 López de Heredia, U. et al. (2005) The Balearic Islands: a reservoir of cpDNA genetic

variation for evergreen oaks. Journal of Biogeography. 32, 939-949

65 Kremer, A. and Petit, R. J. (1993) Gene diversity in natural populations of oak species.

Annals of forest science. 50, 186-202

66 Wright, S. (1931) Evolution in Mendelian Populations. Genetics. 16, 97-159

67 Thompson, J. D. (2005) Plant Evolution in the Mediterranean, Oxford University

Press.

68 Fineschi, S. et al. (2000) Chloroplast DNA polymorphism reveals little geographical

structure in Castanea sativa Mill. (Fagaceae) throughout southern European countries.

Molecular Ecology. 9, 1495 -1503

69 Petit, R. J. et al. (2002) Chloroplast DNA variation in European white oaks

Phylogeography and patterns of diversity based on data from over 2600 populations.

Forest Ecology and Management. 156, 5-26

70 Palmé, A. E. and Vendramin, G. G. (2002) Chloroplast DNA variation, postglacial

recolonization and hybridization in hazel, Corylus avellana. Molecular ecology. 11,

1769-79

71 Elena-Rosselló, J. A. et al. (1992) Evidence for hybridization between sympatric

holm-oak and cork-oak in Spain based on diagnostic enzyme markers. Vegetatio. 99,

115-118

72 Hamza, N. B. (2010) Cytoplasmic and nuclear DNA markers as powerful tools in

populations' studies and in setting conservation strategies. African Journal of

Biotechnology. 9, 4510-4515

73 Levy, F. et al. (1996) A population genetic analysis of chloroplast DNA in Phacelia.

Heredity. 76, 143-55

74 Taberlet, P. et al. (1991) Universal primers for amplification of three non-coding

regions of chloroplast DNA. Plant Molecular Biology. 17, 1105-1109

75 Aoki, K. et al. (2003) Intraspecific sequence variation of chloroplast DNA among the

component species of evergreen broad-leaved forests in Japan. Journal of plant

research. 116, 337-44

76 Baraket, G. et al. (2008) Chloroplast DNA analysis in Tunisian fig cultivars (Ficus

carica L.): Sequence variations of the trnL-trnF intergenic spacer. Biochemical

Systematics and Ecology. 36, 828-835

77 Rathbone, D. A. et al. (2007) Microsatellite and cpDNA variation in island and

mainland populations of a regionally rare eucalypt, Eucalyptus perriniana

(Myrtaceae). Australian journal of botany. 55, 513-520

78 Kress, W. J. et al. (2005) Use of DNA barcodes to identify flowering plants. PNAS.

102, 8369-8374

79 Nishizawa, T. and Watano, Y. (2000) Primer pairs suitable for PCR-SSCP analysis of

chloroplast DNA in angiosperms. Journal of Phytogeography and Taxonomy. 48, 63-66

80 Calonje, M. et al. (2008) Non-coding nuclear DNA markers in phylogenetic

reconstruction. Plant Systematics and Evolution. 282, 257-280

81 Hare, M. P. (2001) Prospects for nuclear gene phylogeography. Trends in Ecology &

Evolution. 16, 700-706

82 Bhargava, A. and Fuentes, F. F. (2010) Mutational dynamics of microsatellites.

Molecular biotechnology. 44, 250-66

83 Goldstein, D. B. and Pollock, D. D. (1997) Launching Microsatellites: A Review of

Mutation Processes and Methods of Phylogenetic Inference. Journal of Heredity. 88,

335-342

84 Qureshi, S. N. et al. (2004) EST-SSR: A New Class of Genetic Markers in Cotton. The

Journal of Cotton Science. 8, 112-123

85 Oliveira, E. J. et al. (2006) Origin, evolution and genome distribution of

microsatellites. Genetics and Molecular Biology. 29, 294-307

86 Lazrek, F. et al. (2009) The use of neutral and non-neutral SSRs to analyse the genetic

structure of a Tunisian collection of Medicago truncatula lines and to reveal

associations with eco-environmental variables. Genetica. 135, 391-402

87 Hornero, J. et al. (2001) Testing the Conservation of Quercus spp. Microsatellites in

the Cork Oak, Q. suber L. Silvae Genetica. 50, 3-4

88 Soto, A. et al. (2003) Nuclear Microsatellite Markers for the Identification of Quercus

ilex L. and Q. suber L. hybrids. Silvae Genetica. 52, 63-66

89 Nagaraj, S. H. et al. (2007) A hitchhiker's guide to expressed sequence tag (EST)

analysis. Briefings in bioinformatics. 8, 6-21

90 Bouck, A. and Vision, T. (2007) The molecular ecologist's guide to expressed

sequence tags. Molecular ecology. 16, 907-24

91 Kim, K. S. et al. (2008) Utility of EST-derived SSRs as population genetics markers in

a beetle. The Journal of heredity. 99, 112-24

92 Ueno, S. and Tsumura, Y. (2007) Development of ten microsatellite markers for

Quercus mongolica var. crispula by database mining. Conservation Genetics. 9, 1083-

93 Ellis, J. R. and Burke, J. M. (2007) EST-SSRs as a resource for population genetic

analyses. Heredity. 99, 125-32

94 Porth, I. et al. (2005) Linkage mapping of osmotic stress induced genes of oak. Tree

Genetics & Genomes. 1, 31-40

95 Cuénoud, P. et al. (2002) Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA and plastid rbcL, atpB and matK DNA sequences. American Journal of

Botany. 89, 132-144

96 Jeffrey, J. A. and Lexer, C. (2008) A set of novel DNA polymorphisms within

candidate genes potentially involved in ecological divergence between Populus alba

and P. tremula, two hybridizing European forest trees. Molecular Ecology Resources.

8, 188-192

97 Casasoli, M. et al. (2006) Comparison of Quantitative Trait Loci for Adaptive Traits

Between Oak and Chestnut Based on an Expressed Sequence Tag Consensus Map.

Genetics. 172, 533-546

98 Dow, B. D. et al. (1995) Characterization of highly variable (GA/CT)n microsatellites

in the bur oak, Quercus macrocarpa. Theoretical and Applied Genetics. 91, 137-141

99 Steinkellner, H. et al. (1997) Identification and characterization of (GA/CT)n-

microsatellite loci from Quercus petraea. Plant molecular biology. 33, 1093-6

100 Kampfer, S. et al. (1998) Characterization of (GA)n Microsatellite Loci from Quercus

robur. Hereditas. 129, 183-186

101 Alberto, F. et al. (2010) Population differentiation of sessile oak at the altitudinal front

of migration in the French Pyrenees. Molecular ecology. 19, 2626-39

102 Van Oosterhout, C. et al. (2004) Micro-Checker: Software for Identifying and

Correcting Genotyping Errors in Microsatellite Data. Molecular Ecology Notes. 4,

535-538

103 Thompson, J. D. et al. (1997) The CLUSTAL_X windows interface: flexible strategies

for multiple sequence alignment aided by quality analysis tools. Nucleic acids

research. 25, 4876-82

104 Larkin, M. A. et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics

(Oxford, England). 23, 2947-8

105 Hall, T. A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program. Nucleic Acids Symposium Series. 41, 95-98

106 Pina-Martins, F. and Paulo, O. S. (2008) Concatenator: Sequence Data Matrices

Handling Made Easy. Molecular ecology resources. 8, 1254-5

107 Swofford, D. L. (2003) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other

Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts

108 Ronquist, F. and Huelsenbeck, J. P. (2003) MrBayes 3: Bayesian phylogenetic

inference under mixed models. Bioinformatics. 19, 1572-1574

109 Nylander, J. (2004) MrModeltest V2. Evolutionary Biology Centre.

110 Bandelt, H. J. et al. (1999) Median-joining networks for inferring intraspecific

phylogenies. Molecular biology and evolution. 16, 37-48

111 Watterson, G. A. (1978) The homozygosity test of neutrality. Genetics. 88, 405-17

112 Slatkin, M. (1994) An exact test for neutrality based on the Ewens sampling

distribution. Genetical Research. 64, 71-74

113 Slatkin, M. (1996) A correction to the exact test based on the Ewens sampling

distribution. Genetical Research. 68, 259-260

114 Excoffier, L. and Lischer, H. E. L. (2010) Arlequin suite ver 3.5: a new series of

programs to perform population genetics analyses under Linux and Windows.

Molecular Ecology Resources. 10, 564-567

115 Harpending, H. C. (1994) Signature of ancient population growth in a low-resolution

mitochondrial DNA mismatch distribution. Human Biology. 66, 591-600

116 Tajima, F. (1989) Statistical method for testing the neutral mutation hypothesis by

DNA polymorphism. Genetics. 123, 585-95

117 Fu, Y.-X. (1997) Statistical Tests of Neutrality of Mutations Against Population

Growth, Hitchhiking and Background Selection. Genetics. 147,

915-925

118 Ramos-Onsins, S. E. and Rozas, J. (2002) Statistical properties of new neutrality tests

against population growth. Molecular biology and evolution. 19, 2092-100

119 Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop

software for Windows and Linux. Molecular Ecology Resources. 8, 103-106

120 Librado, P. and Rozas, J. (2009) DnaSP v5: a software for comprehensive analysis of

DNA polymorphism data. Bioinformatics (Oxford, England). 25, 1451-2

121 Nei, M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New

York, USA. 512 pp

122 Goudet, J. (1995) FSTAT (Version 1.2): A Computer Program to Calculate F-Statistics. Journal of Heredity. 86, 485-486

123 Goudet, J. (2001) FSTAT, a program to estimate and test gene diversities and fixation

indices (version 2.9.3). Available ,

124 Peakall, R. and Smouse, P. E. (2006) genalex 6: genetic analysis in Excel. Population

genetic software for teaching and research. Molecular Ecology Notes. 6, 288-295

125 Weir, B. S. and Cockerham, C. C. (1984) Estimating F-statistics for the analysis of

population structure. Evolution. 38, 1358-1370

126 Slatkin, M. (1995) A measure of population subdivision based on microsatellite allele

frequencies. Genetics. 139, 457-62

127 Crawford, N. G. (2010) Smogd: Software for the Measurement of Genetic Diversity.

Molecular ecology resources. 10, 556-7

128 Jost, L. (2008) GST and its relatives do not measure differentiation. Molecular

Ecology. 17, 4015-4026

129 Hedrick, P. W. (2005) A Standardized genetic differentiation measure. Evolution. 59,

1633-1638

130 Nei, M. and Chesser, R. K. (1983) Estimation of fixation indices and gene diversities.

Annals of Human Genetics. 47, 253-259

131 Rousset, F. (1997) Genetic differentiation and estimation of gene flow from F-statistics

under isolation by distance. Genetics. 145, 1219-1228

132 Jensen, J. L. et al. (2005) Isolation by distance, web service. BMC genetics. 6, 13

133 Pritchard, J. K. et al. (2000) Inference of population structure using multilocus

genotype data. Genetics. 155, 945-59

134 Hubisz, M. J. et al. (2009) Inferring weak population structure with the assistance of

sample group information. Molecular ecology resources. 9, 1322-32

135 Falush, D. et al. (2003) Inference of population structure using multilocus genotype

data: linked loci and correlated allele frequencies. Genetics. 164, 1567-87

136 Evanno, G. et al. (2005) Detecting the number of clusters of individuals using the

software STRUCTURE: a simulation study. Molecular ecology. 14, 2611-20

137 Rosenberg, N. A. (2004) Distruct: a program for the graphical display of population

structure. Molecular Ecology Notes. 4, 137-138

138 Guillot, G. et al. (2005) Geneland: a computer package for landscape genetics.

Molecular Ecology Notes. 5, 712-715

139 Guillot, G. et al. (2005) A spatial statistical model for landscape genetics. Genetics.

170, 1261-80

140 François, O. et al. (2006) Bayesian clustering using hidden Markov random fields in

spatial population genetics. Genetics. 174, 805-16

141 Excoffier, L. et al. (1992) Analysis of molecular variance inferred from metric

distances among DNA haplotypes: application to human mitochondrial DNA

restriction data. Genetics. 131, 479-91

142 Burgarella, C. et al. (2009) Detection of hybrids in nature: application to oaks

(Quercus suber and Q. ilex). Heredity. 102, 442-52

143 Lexer, C. et al. (2004) Hybrid zones as a tool for identifying adaptive genetic variation

in outbreeding forest trees: lessons from wild annual sunflowers (Helianthus spp.).

Forest ecology and management. 197, 49-64

144 Petit, R. J. et al. (2002) Identification of refugia and post-glacial colonisation routes of

European white oaks based on chloroplast DNA and fossil pollen evidence. Forest

Ecology and Management. 156, 49-74

145 Dow, B. D. and Ashley, M. V. (1996) Microsatellite analysis of seed dispersal and

parentage of saplings in bur oak, Quercus macrocarpa. Molecular ecology. 5, 615-

146 Hu, X. S. and Ennos, R. A. (1999) Impacts of seed and pollen flow on population

genetic structure for plant genomes with three contrasting modes of inheritance.

Genetics. 152, 441-50

147 Streiff, R. et al. (1999) Pollen dispersal inferred from paternity analysis in a mixed oak

stand of Quercus robur L. and Q. petraea (Matt.) Liebl. Molecular Ecology. 8, 831-

148 Ramírez-Valiente, J. A. et al. (2009) Elucidating the role of genetic drift and natural

selection in cork oak differentiation regarding drought tolerance. Molecular ecology.

18, 3803-15

149 Wellenreuther, M. et al. (2011) Environmental and climatic determinants of molecular

diversity and genetic population structure in a coenagrionid damselfly. PloS one. 6,

e20440

150 Lewontin, R. C. (1964) The Interaction of Selection and Linkage. I. General

Considerations; Heterotic Models. Genetics. 49, 49-67

151 Meirmans, P. G. and Hedrick, P. W. (2011) Assessing population structure: F(ST) and

related measures. Molecular ecology resources. 11, 5-18

Supporting Information

Supporting Information 1

Information regarding the primers used for the amplification of each cpDNA fragment is summarized in Table S1.1, together with the annealing temperatures for PCR amplification and the fragment sizes.

Table S1.1: Description of the cpDNA fragments used concerning primer sequences, annealing temperature (Ta in ºC) and fragment size (in base pairs).

Locus | Forward primer | Reverse primer | Ta (ºC) | Size (bp) | Reference
trnL-F | 5' GGT TCA AGT CCC TCT ATC CC 3' | 5' ATT TGA ACT GGT GAC ACG AG 3' | 65 | 381 | Taberlet et al., 1991
trnS-psbC | 5' TGA ACC TGT TCT TTC CAT GA 3' | 5' GAA CTA TCG AGG GTT CGA AT 3' | 65 | 250 | Nishizawa & Watano, 2000
trnH-psbA | 5' CGC GCA TGG TGG ATT CAC AAT CC 3' | 5' GTT ATG CAT GAA CGT AAT GCT C 3' | 65 | 478 | Kress et al., 2005
matK | 5' CGA TCT ATT CAT TCA ATA TTT C 3' | 5' TCT AGC ACA CGA AAG TCG AAG T 3' | 65 | 740 | Cuénoud et al., 2002
rbcLa | 5' ATG TCA CCA CAA ACA GAG ACT AAA GC 3' | 5' GTA AAA TCA AGT CCA CCR CG 3' | 65 | 552 | Kress & Erickson, 2007

A description of the three nuclear candidate genes tested in this study is summarized in Table S1.2, together with the primers and annealing temperatures for PCR amplification and the fragment sizes.

Table S1.2: Primer sequences and bibliographic references, annealing temperature (in ºC), fragment size (in base pairs) and locus information for the nuDNA fragments.

Locus | Forward primer | Reverse primer | Description | Ta (ºC) | Size (bp) | Reference
EST 2T13 | 5' CAT GCA CTG CCA ATC TCA GAG A 3' | 5' ATA ATT TGC CTC ATC ACT ACA TAA GA 3' | Osmotic stress related gene | 55 | 249 | Porth et al., 2005
Cons 58 | 5' CCA ATT CTC TTA GTG GCA AGG 3' | 5' GCT TTG GGA TGA TGT TTT GG 3' | Auxin-repressed protein | * | * | Casasoli et al., 2006
Phyt B | 5' ATA TGG CGA ATA TGG GGT CA 3' | 5' GGC ATC CAT TTC TGC ATT CT 3' | Phytochrome B, involved in flowering phenology | * | * | Jeffrey & Lexer, 2008

* Amplification product was never obtained for cork oak.

Information on the 11 dinucleotide nuclear microsatellite (nuSSR) markers is summarized in Table S2.1. A description of, and relevant information about, the 6 EST-SSRs tested in this study are summarized in Table S2.2.

Table S2.1: Description of the nuSSRs used concerning primer sequences, annealing temperatures (Ta in ºC), repeat motif and size ranges (in base pairs).

Locus | Forward primer (5'-3') | Reverse primer (5'-3') | Ta (ºC) | Repeat motif | Expected size range (bp) | Found size range (bp) | Reference
MsQ13 | TGG CTG CAC CTA TGG CTC TTA G | ACA CTC AGA CCC ACC ATT TTT CC | 55 | (AG)n | 222-246 | 218 | Dow et al., 1995
QpZAG9 | GCA ATT ACA GGC TAG GCT GG | GTC TGG ACC TAG CCC TCA TG | 50 | (AG)12 | 182-210 | 223-249 | Steinkellner et al., 1997
QpZAG15 | CGA TTT GAT AAT GAC ACT ATG G | CAT CGA CTC ATT GTT AAG CAC | 57 | (AG)23 | 108-152 | 101-135 | Steinkellner et al., 1997
QpZAG36 | GAT CAAA AAT TTG GAA TAT TAA GAG AG | ACT GTG GTG GTG AGT CTA ACA TGT AG | * | (AG)19 | 210-236 | * | Steinkellner et al., 1997
QpZAG46 | CCC CTA TTG AAG TCC TAG CCG | TCT CCC ATG TAA GTA GCT CTG | * | (AG)13 | 190-222 | * | Steinkellner et al., 1997
QpZAG110 | GGA GGC TTC CTT CAA CCT ACT | GAT CTC TTG TGT GCT GTA TTT | 50 | (AG)15 | 206-262 | 208-258 | Steinkellner et al., 1997
QrZAG11 | CCT TGA ACT CGA AGG TGT CCT | GTA GGT CAA AAC CAT TGG TTG ACT | 50 | (TC)18 | 238-263 | 255-281 | Kampfer et al., 2004
QrZAG7 | CAA CTT GGT GTT CGG ATC AA | GTG CAT TTC TTT TAT AGC ATT CAC | 50 | (TC)17 | 115-153 | 115-133 | Kampfer et al., 2004
QrZAG20 | CCA TTA AAA GAA GCA GTA TTT TGT | GCA ACA CTC AGC CTA TAT CTA GAA | 50 | (TC)22 | 160-200 | 161-171 | Kampfer et al., 2004
[name missing] | GAT CTC TTT GTC AAC CCA GAC | ATG TGT GTG GTG ATG GGT TT | * | (CA)n | 258-276 | * | Simões de Matos 2007
QsD8 | GAT CCT CTG CTT CTC TCT G | CTG CAA CTT TAT CCG CCT CC | * | (CA)n | 140-150 | * | Simões de Matos 2007

* Amplification product was never obtained, or the scoring was unreliable.

Table S2.2: Primer sequences [92], annealing temperature (in ºC), repeat motif, size ranges (in base pairs) and locus information for the EST-SSRs.

Locus (accession) | Forward primer (5'-3') | Reverse primer (5'-3') | Ta (ºC) | Repeat motif | Expected size range (bp) | Found size range (bp) | Description
QmOST1 (DN949770) | CAA CCA TCG AGG CCA TTA CGA A | TCA CCG ATC TTG AAG GTC CTC GA | 58 | (AG)19 | 149-171 | 134-152 | EST, non-coding
QmD12 (CR627959) | GCT CCC TGG TAG TCG GCT AAA GA | CAA TTG GGA CAA CAT GGA AGC AT | 58 | (GCA)7 | 243-251 | 240-246 | EST, coding; zinc finger protein
QmAJ1 (AJ577265) | ATT CAG GCC GCA AAT CAA TAA GG | GAA ACT GGT CCC CTT CTC TTG GA | 57 | (GAA)6 | 374-380 | 360-375 | EST, coding; pheromone receptor-like protein
QmDN1 (DN950717) | TAG TTT TCC CAG CGA ATC CAA CA | CTT CTT GAA GGG ACT GAC CCC AT | 58 | (GGA)6 | 242-261 | 236 | EST, coding; salt tolerance protein
QmDN2 (DN949776) | CAA CCA TCG AGG CCA TTA CGA A | TCA CCG ATC TTG AAG GTC CTC AG | * | (AG)9 | 156-168 | * | EST, non-coding; 60S ribosomal protein L21
QmDN3 (DN950726) | TCA AAC AAT CTC AAG GCT CCC AA | GCT TTT GAG AAA CTT TGG CCA CC | 58 | (TC)10 | 361-381 | 361-375 | EST, non-coding; putative carboxyl-terminal proteinase

* Amplification product was never obtained, or the scoring was unreliable.

The concatenated cpDNA matrix is 1109 bp long, of which 92 sites are variable. The model of sequence evolution for the Bayesian analysis (BA) was determined separately for each cpDNA data set. Because the BA tree was very similar to the MP tree, only the MP tree for the concatenated dataset is presented (Fig. S3.1). The concatenated tree supports the results of the individual trees, recovering the same 4 major groups (Fig. 3.1a, Fig. 3.2 and Fig. 3.3). Group A, highlighted in yellow, comprises the cork oak samples of the pure lineage, distributed across three sublineages (A1, A2 and A3) in agreement with the TrnS/PsbC tree (Fig. 3.1). Group B is the most variable group: it comprises several haplotypes of cork oak samples from the introgressed lineage, together with samples of Quercus ilex (subsp. rotundifolia and subsp. ilex) and Quercus coccifera. Group C, composed of several Quercus species, is closely related to Group A. Group D consists of Quercus rubra, which is placed as the species most distant from cork oak, as in the TrnH/PsbA phylogeny (Fig. 3.2).
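Counting variable sites in a concatenated matrix amounts to joining the per-fragment alignments sample by sample and checking, column by column, whether more than one nucleotide state occurs. The following is a minimal sketch under stated assumptions: each fragment is already aligned, every sample is present in every fragment, and gaps and IUPAC ambiguity codes are ignored when deciding whether a column is variable (phylogenetic software may treat them differently). The function names and toy sequences are hypothetical.

```python
def concatenate(fragments):
    """Join per-fragment alignments (dict: sample -> aligned sequence) into one matrix.

    Hypothetical helper; assumes every sample appears in every fragment.
    """
    samples = fragments[0].keys()
    return {s: "".join(frag[s] for frag in fragments) for s in samples}


def count_variable_sites(alignment, ignore="-?NRYSWKMBDHV"):
    """Count columns with more than one state, skipping gaps and ambiguity codes."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    variable = 0
    for column in zip(*seqs):
        states = {base.upper() for base in column if base.upper() not in ignore}
        if len(states) > 1:
            variable += 1
    return variable


# Hypothetical toy alignments for two fragments
frag_a = {"s1": "ACGT", "s2": "ACGA", "s3": "ACGT"}
frag_b = {"s1": "GG-", "s2": "GGC", "s3": "GAC"}
matrix = concatenate([frag_a, frag_b])
n_variable = count_variable_sites(matrix)  # columns 4 and 6 are variable
```

For reference, the fragment sizes given in Table S1.1 for TrnS/PsbC, TrnH/PsbA and TrnL-F (250 + 478 + 381 bp) add up to 1109 bp, the length of the concatenated matrix.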

Figure S3.1: Maximum parsimony tree of the concatenated cpDNA dataset. Four groups are represented and color coded. Group A is highlighted in yellow: cork oak's pure lineage (Bright Yellow: Sublineage A2 (Sl A2); Brownish-Yellow: Sublineage A3 (Sl A3); Light Yellow: Sublineage A1 (Sl A1)); Group B (orange: cork oak's introgressed lineage; green: Q. coccifera; red: Q. rotundifolia; pink: Q. ilex); Group C is highlighted in dark blue and is composed of several Quercus species: Q. faginea, Q. robur, Q. pyrenaica, Q. canariensis and Q. lusitanica; Group D is highlighted in light blue and consists of Q. rubra. Numbers at the nodes are the bootstrap support values obtained from 1000 replicates for the MP analysis and the Bayesian credibility values.

Median-joining analysis of the cpDNA fragments produced haplotype networks (Fig. S4.1, Fig. S4.2 and Fig. S4.3) that reflect the four major groups of the phylogenetic trees. The networks also show haplotypes shared between Q. suber (clade B) and Quercus coccifera, Q. ilex subsp. ilex and Q. ilex subsp. rotundifolia. Although the networks do not clearly resolve the phylogenetic relationships between the groups, they provide visual support for the distances between them, since networks are a simple and clear way to represent both the mutational steps between haplotypes and the haplotype frequencies. The median-joining networks of the haplotypes observed for each Quercus species are presented in Fig. S4.1, Fig. S4.2 and Fig. S4.3 for the TrnS/PsbC, TrnH/PsbA and TrnL-F fragments, respectively.
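Haplotype networks are built on the two quantities mentioned above: the frequency of each distinct haplotype and the number of mutational steps separating haplotypes. The sketch below computes only these ingredients from aligned sequences. It is not the median-joining algorithm itself, which additionally infers median vectors (the missing ancestral haplotypes drawn as black circles in Figs. S4.1 to S4.3) and links haplotypes into a network. Sample names and sequences are hypothetical, and gaps are simply counted as differences.

```python
from collections import Counter
from itertools import combinations


def haplotype_table(sequences):
    """Collapse aligned sequences (sample -> sequence) into haplotype counts."""
    return Counter(sequences.values())


def mutational_steps(a, b):
    """Number of differing aligned positions (gaps count as differences)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_steps(haplotypes):
    """Mutational steps for every pair of distinct haplotypes."""
    return {(a, b): mutational_steps(a, b)
            for a, b in combinations(sorted(haplotypes), 2)}


# Hypothetical aligned sequences for four samples
seqs = {"q1": "ACGTA", "q2": "ACGTA", "q3": "ACGAA", "q4": "TCGAA"}
counts = haplotype_table(seqs)   # frequency of each haplotype (circle size)
steps = pairwise_steps(counts)   # numbers written on the network edges
```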

Figure S4.1: A median-joining haplotype network generated from 250 bases of the TrnS/PsbC intergenic spacer region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure lineage and Q. cerris (Bright Yellow: Sublineage A2; Brownish-Yellow: Sublineage A3, including Q. cerris; Light Yellow: Sublineage A1); Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q. rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light Blue: Q. rubra). Each number in the network indicates the number of mutations between haplotypes. Black circles indicate missing ancestral haplotypes.

Figure S4.2: A median-joining haplotype network generated from 478 bases of the TrnH/PsbA intergenic spacer region. Circle size reflects the relative frequency of each haplotype across all 10 Quercus species. Shading indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure lineage, with Q. cerris; Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q. rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light Blue: Q. rubra). Each number in the network indicates the number of mutations between haplotypes. Black circles indicate missing ancestral haplotypes.

Figure S4.3: A median-joining haplotype network generated from 381 bases of the TrnL-F intergenic spacer region. Circle size reflects the relative frequency of each haplotype across 10 Quercus species. Shading indicates the proportion of individuals with a particular haplotype for a given species (Yellow: cork oak's pure lineage, with Q. cerris; Orange: cork oak's introgressed lineage; Green: Q. coccifera; Red: Q. rotundifolia; Pink: Q. ilex; Dark Blue: Q. robur, Q. pyrenaica, Q. faginea, Q. lusitanica, Q. canariensis; Light Blue: Q. rubra). Each number in the network indicates the number of mutations between haplotypes. Black circles indicate missing ancestral haplotypes.

Global evaluation of the EST-SSR dataset using MICRO-CHEKER v2.2.3 [102] revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles, indicated by a general excess of homozygotes, at two loci: QmOST1 and QmDN3 (p < 0.05). At the QmOST1 locus, null alleles were possible in the Haza del Lino (HAZ) and Mekna (MEK) populations (Fig. S5.1). However, in both populations the charts show that the observed homozygote frequencies lie only slightly outside the range of expected values. This locus was therefore retained for the subsequent analyses.

At the QmDN3 locus, the observed homozygote frequencies were clearly outside the expected range. Because this was detected in all 13 populations, it strongly indicates that null alleles are present at this locus. A representative example is shown in Fig. S5.2. This locus was therefore discarded from all subsequent analyses.

For the nuSSR dataset, the global evaluation with MICRO-CHEKER again revealed no evidence of genotyping errors due to stuttering or large allele dropout, but identified possible null alleles in a few populations at three markers: QpZAG110 (Fig. S5.3), QrZAG11 (Fig. S5.4) and QrZAG20 (Fig. S5.5). Specifically, possible null alleles were detected at QpZAG110 for the Serra da Arrábida (ARR) and Serra do Buçaco (BUC) populations, at QrZAG11 for the Serra da Estrela (EST) population, and at QrZAG20 for the Puglia (PUG) and Serra de Monchique (MON) populations. However, the charts show that the observed homozygote frequencies lie only slightly outside the expected range, and null alleles were indicated in only one or two of the 13 populations analysed. These microsatellites were therefore retained for the subsequent analyses.
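A null allele makes true heterozygotes appear homozygous, so its signature is an observed heterozygosity (Ho) below the Hardy-Weinberg expectation (He). MICRO-CHEKER itself uses simulations to obtain the expected range of homozygote frequencies shown in the charts below and offers several null-allele estimators; the sketch below only illustrates the underlying idea with one simple estimator, Brookfield's (1996) estimator 1, r = (He - Ho) / (1 + He). It is not MICRO-CHEKER's procedure, and the genotype data are hypothetical.

```python
from collections import Counter


def observed_and_expected_heterozygosity(genotypes):
    """Return (Ho, He) for one locus in one population.

    genotypes: list of (allele1, allele2) size calls, e.g. (150, 152).
    He is the Hardy-Weinberg expectation 1 - sum(p_i^2), with no
    small-sample correction.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    he = 1.0 - sum((count / total) ** 2 for count in allele_counts.values())
    ho = sum(1 for a, b in genotypes if a != b) / n
    return ho, he


def brookfield1_null_frequency(ho, he):
    """Brookfield (1996) estimator 1 of null-allele frequency."""
    return (he - ho) / (1.0 + he)


# Hypothetical genotypes (allele sizes in bp) showing a homozygote excess
genotypes = [(150, 150), (150, 150), (152, 152), (152, 152), (150, 152)]
ho, he = observed_and_expected_heterozygosity(genotypes)
r = brookfield1_null_frequency(ho, he)  # positive r suggests possible null alleles
```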

Figure S5.1: MICRO-CHEKER charts for the QmOST1 locus for the populations of HAZ (Figs. a and b) and

MEK (Figs. c and d). The significance level is 0.05; a) Frequency differences in base pair for the population

HAZ; b) Homozygote frequencies for the population HAZ; c) Frequency differences in base pair for the

population MEK; d) Homozygote frequencies for the population MEK.

Figure S5.2: MICRO-CHEKER charts of the QmDN3 locus for the population of Serra da Arrábida (ARR), as

a representative of the indication of null alleles for all the 13 populations. The significance level is 0.05; a)

Frequency differences in base pair for the population ARR; b) Homozygote frequencies for the population ARR.

Figure S5.3: MICRO-CHEKER charts for the locus QpZAG110 for the populations ARR and BUC. The

significance level is 0.05; a) Frequency differences in base pair for the population ARR; b) Homozygote

frequencies for the population ARR; c) Frequency differences in base pair for the population BUC; d)

Homozygote frequencies for the population BUC.

Figure S5.4: MICRO-CHEKER charts for the locus QrZAG11 for the population EST. The significance level is 0.05; a) Frequency differences in base pair; b) Homozygote frequencies.

Figure S5.5: MICRO-CHEKER charts for the QrZAG20 locus for the populations of PUG (Figs. a and b) and

MON (Figs. c and d). The significance level is 0.05; a) Frequency differences in base pair for the population

PUG; b) Homozygote frequencies for the population PUG; c) Frequency differences in base pair for the

population MON; d) Homozygote frequencies for the population MON.

Linkage disequilibrium is the non-random association of alleles at two or more loci. It is a statistical association, and the loci need not be physically linked [150]. Genotypic linkage disequilibrium between all pairs of loci was tested with a contingency exact test in GenePop v4 [119] (Table S6.1). No significant departure from the null hypothesis of linkage equilibrium was detected, so the eight polymorphic microsatellite markers can be treated as independent markers in this study.
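As the caption of Table S6.1 indicates, the per-population test results for each pair of loci are combined across populations with Fisher's method: X2 = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom when all k null hypotheses hold. A minimal sketch of this combination step only, assuming independent per-population p-values; it is not GenePop's code and does not reproduce the exact tests that produce each p-value. Because the degrees of freedom are even, the chi-square upper tail has a closed form and no statistics library is needed.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Upper tail of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    x2 = -2.0 * sum(math.log(p) for p in p_values)
    half = x2 / 2.0
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        tail += term
    return math.exp(-half) * tail


# Hypothetical per-population p-values for one pair of loci
combined = fisher_combined_p([0.40, 0.12, 0.75])
```

With a single population the combined value equals the input p-value, which is a convenient sanity check.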

Loci combination | p

EST-SSRs
QmOST1 & QmD12 | 0.37
QmOST1 & QmAJ1 | 0.10
QmD12 & QmAJ1 | 0.07

nuSSRs
QpZAG110 & QpZAG9 | 0.38
QpZAG110 & QrZAG20 | 0.90
QrZAG20 & QrZAG7 | 0.34

Complete dataset
QmOST1 & QpZAG110 | 0.95
QmOST1 & QrZAG20 | 0.18
QmD12 & QpZAG110 | 0.88
QmD12 & QpZAG9 | 0.95
QmD12 & QrZAG20 | 0.05
QmD12 & QrZAG7 | 0.25
QmAJ1 & QpZAG110 | 0.58
QmAJ1 & QpZAG9 | 0.96
QmAJ1 & QrZAG20 | 0.59
QmAJ1 & QrZAG7 | 0.15
QmAJ1 & QrZAG11 | 0.82

Table S6.1: Test for linkage disequilibrium for all pairs of loci using Fisher's method, implemented in GenePop software.

Table S7.1: Pairwise Dest values between all pairs of populations.

ALG – Forêt des Guerbès (Algeria); ARR – Arrábida (Portugal); BUC – Buçaco (Portugal); CAT – Cataluña (Spain); HAZ – Haza del Lino (Spain); EST – Estrela (Portugal); GER – Gerês (Portugal); PUG – Puglia (Italy); KEN – Kenitra (Morocco); TAZ – Taza (Morocco); MON – Monchique (Portugal); SIN – Sintra (Portugal); MEK – Mekna (Tunisia).

Although FST is widely used as a measure of population differentiation and structure, it has been criticized because it depends on within-population diversity. This has led to the development of alternative statistics such as Jost's D, a measure of actual differentiation among populations [128]. Nevertheless, Meirmans & Hedrick [151] recommend continuing to report FST alongside the newer statistics.
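The dependency on within-population diversity can be made concrete. With H_S the mean within-population expected heterozygosity, H_T the expected heterozygosity of the pooled populations and n the number of populations, G_ST (an FST analogue) is (H_T - H_S) / H_T, whereas Jost's D is [(H_T - H_S) / (1 - H_S)] * [n / (n - 1)]. The sketch below plugs population allele frequencies directly into these formulas; Dest, as reported in Table S7.1, is an estimator of D that additionally corrects H_S and H_T for sample size, so this is an illustration only. The allele frequencies are hypothetical.

```python
def heterozygosities(freqs_by_pop):
    """Return (H_S, H_T) for one locus.

    freqs_by_pop: list of dicts mapping allele -> frequency, one per population.
    Populations are weighted equally and no sample-size correction is applied.
    """
    n = len(freqs_by_pop)
    # Mean within-population expected heterozygosity
    hs = sum(1.0 - sum(p * p for p in pop.values()) for pop in freqs_by_pop) / n
    # Expected heterozygosity of the pooled (averaged) allele frequencies
    alleles = set().union(*freqs_by_pop)
    pooled = [sum(pop.get(a, 0.0) for pop in freqs_by_pop) / n for a in alleles]
    ht = 1.0 - sum(p * p for p in pooled)
    return hs, ht


def gst_and_jost_d(freqs_by_pop):
    """Return (G_ST, Jost's D) for one locus and n >= 2 populations."""
    n = len(freqs_by_pop)
    hs, ht = heterozygosities(freqs_by_pop)
    gst = (ht - hs) / ht
    d = (ht - hs) / (1.0 - hs) * n / (n - 1)
    return gst, d


# Hypothetical example: two diverse populations that share no alleles
pop1 = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
pop2 = {"e": 0.25, "f": 0.25, "g": 0.25, "h": 0.25}
gst, d = gst_and_jost_d([pop1, pop2])
```

Here the two populations are completely differentiated (D = 1), yet G_ST is only about 0.14 because within-population diversity is high.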

Pairwise Dest was estimated for the thirteen populations, with both SSR matrices analysed together. Overall genetic differentiation at the microsatellite loci was low (pairwise Dest from 0.000 to 0.097; Table S7.1). The Dest values closely resembled the FST and RST matrices (Table 3.6), although they tended to be lower.

      ALG    ARR    BUC    CAT    HAZ    EST    GER    PUG    KEN    TAZ    MON    SIN    MEK
ALG   --     0.010  0.021  0.031  0.005  0.012  0.007  0.056  0.039  0.006  0.033  0.012  0.000
ARR          --     0.000  0.017  0.016  0.001  0.000  0.050  0.040  0.008  0.000  0.000  0.024
BUC                 --     0.031  0.060  0.002  0.002  0.041  0.065  0.009  0.005  0.003  0.028
CAT                        --     0.035  0.050  0.029  0.097  0.051  0.045  0.043  0.037  0.070
HAZ                               --     0.032  0.017  0.073  0.026  0.009  0.039  0.030  0.012
EST                                      --     0.000  0.037  0.057  0.013  0.001  0.016  0.015
GER                                             --     0.033  0.030  0.013  0.001  0.005  0.013
PUG                                                    --     0.042  0.034  0.055  0.072  0.066
KEN                                                           --     0.025  0.069  0.070  0.062
TAZ                                                                  --     0.037  0.016  0.010
MON                                                                         --     0.029  0.046
SIN                                                                                --     0.031
MEK                                                                                       --

The estimation of the number of populations (K) should be treated with care, and a biological interpretation of K may not be straightforward. We used the posterior probability of the data for a given K, LnP(D), to identify the most probable number of clusters, both by plotting the mean values of LnP(D) and with the ad hoc DeltaK (DK) statistic [136]. Because the number of groups suggested by LnP(D) in STRUCTURE does not always correspond to the real number of clusters, we relied on DeltaK, an ad hoc quantity related to the second-order rate of change of the log probability of the data with respect to the number of clusters, which tends to be a good predictor of the real number of clusters.
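Following Evanno et al. [136], DeltaK for a given K is the absolute second-order difference of LnP(D) across the neighbouring values K - 1, K and K + 1, divided by the standard deviation of LnP(D) among replicate runs at K; the K with the largest DeltaK is taken as the most likely number of clusters. A minimal sketch, assuming the LnP(D) values of the replicate STRUCTURE runs are available for each K. It uses the mean LnP(D) per K in the second difference, whereas the original formulation averages the absolute second difference over runs, so values can differ slightly. The LnP(D) values below are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """DeltaK for every K whose neighbours K-1 and K+1 are both present.

    lnp_by_k maps K -> list of LnP(D) values from replicate STRUCTURE runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            # Second-order rate of change of the mean log probability
            second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
            # Sample SD across runs; needs at least 2 runs and nonzero spread
            spread = stdev(lnp_by_k[k])
            delta_k[k] = abs(second_diff) / spread
    return delta_k


# Hypothetical LnP(D) values from three replicate runs per K
hypothetical_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-880, -882, -878],
    4: [-870, -871, -869],
    5: [-865, -866, -864],
}
dk = evanno_delta_k(hypothetical_runs)
best_k = max(dk, key=dk.get)
```

DeltaK cannot be computed for the smallest and largest K tested (K = 1 and K = 13 in Figs. S8.1 to S8.3), because both neighbouring values are needed.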

The EST-SSR and nuSSR datasets were analysed separately and then merged to determine the genetic structure of the species (Fig. 3.8). The plots of the log probability of the data [LnP(D)] and of Evanno's criterion [136] are shown in Fig. S8.1, Fig. S8.2 and Fig. S8.3 for the EST-SSR, nuSSR and combined datasets, respectively.

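Figure S8.1: Estimated number of populations (K) derived from the STRUCTURE clustering analyses for the EST-SSR dataset. Mean log probability of the data [LnP(D)] with standard deviation over 20 replicate runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1 to 13).

Figure S8.2: Estimated number of populations (K) derived from the STRUCTURE clustering analyses for the nuSSR dataset. Mean log probability of the data [LnP(D)] with standard deviation over 20 replicate runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1 to 13).

Figure S8.3: Estimated number of populations (K) derived from the STRUCTURE clustering analyses for the combined dataset. Mean log probability of the data [LnP(D)] with standard deviation over 20 replicate runs (above) and DeltaK (below) are plotted as a function of the number of clusters tested (K from 1 to 13).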

Several hierarchical analyses of molecular variance (AMOVA) [141] were performed on allele frequencies (FST-based), with 1000 permutations (Table S9.1). The aim was to assess how genetic variability is distributed among the hierarchical levels: among groups (FCT), among populations within groups (FSC) and within populations (FST). The structures tested followed the clusters (K) obtained with STRUCTURE [133] (Fig. 3.8) and GENELAND [138] (Fig. 3.9). The best genetic structure is assumed to be the one that explains the largest share of variation among groups (FCT), i.e. the one that maximizes differentiation between groups of populations.
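In an AMOVA, the three fixation indices are ratios of the estimated variance components: FCT is the share of total variance found among groups, FSC is the among-population variance relative to the total variance within groups, and FST is the combined among-group and among-population share of the total. A minimal sketch of these relationships, using hypothetical variance components (not the values of Table S9.1); it does not estimate the components themselves or run the permutation tests.

```python
def hierarchical_f_statistics(var_among_groups, var_among_pops, var_within):
    """Return (FCT, FSC, FST) from the three AMOVA variance components.

    Because the indices are ratios, percentages of the total variance can be
    passed instead of absolute components.
    """
    total = var_among_groups + var_among_pops + var_within
    fct = var_among_groups / total
    fsc = var_among_pops / (var_among_pops + var_within)
    fst = (var_among_groups + var_among_pops) / total
    return fct, fsc, fst


# Hypothetical variance components
fct, fsc, fst = hierarchical_f_statistics(0.10, 0.05, 0.85)
```

The three indices are linked by 1 - FST = (1 - FCT)(1 - FSC), which is a quick consistency check on any reported AMOVA table.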

Structure | Among groups: % | FCT | Among populations within groups: % | FSC | Within populations: % | FST
Two clusters (K=2) | 2.81 | 0.02814*** | 3.16 | 0.03249*** | 94.03 | 0.02814
Four clusters (K=4) | 3.37 | 0.05817*** | 2.45 | 0.02532*** | 94.18 | 0.03370
Six clusters (K=6) | 4.99 | 0.05844*** | 0.85 | 0.00897** | 94.16 | 0.04992

Table S9.1: Variation percentages over different hierarchical levels estimated with AMOVA. The analysis was performed for the combined SSR dataset, based on FST values.

% = percentage of the total molecular variance explained at each level.

Significance levels: ** P < 0.01; *** P < 0.001.
