+ All Categories
Home > Documents > Supplementary Materials for -...

Supplementary Materials for -...

Date post: 21-Mar-2018
Category:
Upload: phungnhi
View: 215 times
Download: 2 times
Share this document with a friend
63
www.sciencemag.org/cgi/content/full/336/6080/466/DC1 Supplementary Materials for Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe Pontus Skoglund,* Helena Malmström, Maanasa Raghavan, Jan Storå, Per Hall, Eske Willerslev, M. Thomas P. Gilbert, Anders Götherström,* Mattias Jakobsson* *To whom correspondence should be addressed. E-mail: [email protected] (P.S.); [email protected] (A.G.); [email protected] (M.J.) Published 27 April 2012, Science 336, 466 (2012) DOI: 10.1126/science.1216304 This PDF file includes: Materials and Methods Supplementary Text Figs. S1 to S22 Tables S1 to S15 References (3385)
Transcript

www.sciencemag.org/cgi/content/full/336/6080/466/DC1

Supplementary Materials for

Origins and Genetic Legacy of Neolithic Farmers and Hunter-Gatherers in Europe

Pontus Skoglund,* Helena Malmström, Maanasa Raghavan, Jan Storå, Per Hall, Eske Willerslev, M. Thomas P. Gilbert, Anders Götherström,* Mattias Jakobsson*

*To whom correspondence should be addressed. E-mail: [email protected] (P.S.);

[email protected] (A.G.); [email protected] (M.J.)

Published 27 April 2012, Science 336, 466 (2012) DOI: 10.1126/science.1216304

This PDF file includes:

Materials and Methods Supplementary Text Figs. S1 to S22 Tables S1 to S15 References (33–85)

Contents

1 Archaeological context 41.1 Burial sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2 Dating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.3 Osteoarchaeology of the analysed samples . . . . . . . . . . . . . . . . . . . 6

2 Sample selection and DNA extraction 6

3 Library preparation and sequencing 7

4 Data preprocessing and alignment 84.1 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84.2 Stampy performance for ancient DNA . . . . . . . . . . . . . . . . . . . . . 84.3 Ste7 resequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

5 mtDNA analysis 95.1 Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95.2 Haplogroup assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

6 Autosomal data processing 116.1 HM3+FINHM+HGDP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116.2 FINHM+POPRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126.3 1KGPomni . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126.4 1KGPomni+Sweden . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126.5 Complete genomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

7 Relative sequencing/alignment error rates 13

8 PCA and Procrustes analysis 138.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148.2 Validation using Saqqaq data . . . . . . . . . . . . . . . . . . . . . . . . . 14

9 Model-based clustering 14

10 Authenticity 1510.1 Patterns of molecular degradation . . . . . . . . . . . . . . . . . . . . . . . 1510.2 Testing for autosomal contamination . . . . . . . . . . . . . . . . . . . . . 15

11 Allele sharing analysis 16

12 Estimation of population divergence time from complete genomes 1612.1 Divergence from genomes of Northwest European ancestry . . . . . . . . . . 1712.2 Divergence from genomes of East Asian and West African ancestry . . . . . 18

13 Modelling European population history 1813.1 Testing population topology using genealogical concordance . . . . . . . . . 1813.2 Gene flow between a population related to Gok4 and extant European populations 1913.3 Estimation of Neolithic farmer ancestry in European populations . . . . . . . 20

2

14 The Neolithic hunter-gatherer gene pool is not fully represented in modernEuropeans 21

List of Figures

S1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22S2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23S3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24S4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25S5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26S6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27S7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28S8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29S9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30S10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31S11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32S12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33S13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34S14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35S15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36S16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37S17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38S18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39S19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40S20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41S21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42S22 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

List of Tables

S1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44S2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45S3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46S4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47S5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48S6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49S7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50S8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51S9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52S10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53S11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54S12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55S13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56S14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57S15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3

1 Archaeological context

The process of Neolithization in Scandinavia starts around 6,000 before present (BP) whenthe oldest traces of farming, husbandry and material culture expressions of the Funnel BeakerCulture (TRB; after the German term Trichterbecher Kultur) appear in the archaeologicalrecord (33, 34). The TRB culture of Scandinavia represents the northernmost distribution ofa large Western European complex (e.g. 35). The oldest TRB finds in the Southern parts ofScandinavia are roughly contemporary with the end-phase of the Mesolithic hunter-gathererErtebolle Culture (33). In Sweden, the introduction of TRB marks the transition to the Ne-olithic Stone Age despite the fact that hunter-gatherer groups are present in large parts of thearea (12). The TRB complex is visible in the archaeological record until approximately 5,000BP (12, 15). In the coastal areas of Southern and Eastern Middle Sweden the Pitted WareCulture (PWC), representing a local hunter-gatherer complex, appears between the time span5,300-4,400/4,000 BP (12). Material culture expressions such as pottery style and generalsubsistence patterns bear the greatest similarities with groups East of the Baltic Sea (e.g. 36).The PWC culture represents the youngest (last) hunter-gatherer culture in the area and iscontemporary with TRB groups in the early phase, but later the PWC appears side-by-sidewith the Corded Ware Culture (or Battle Axe Culture; 12).

The relationship between the farming and the hunter-gatherer complexes has been one ofthe most debated questions in Scandinavian archaeology (e.g. 12) and the question is ofgeneral relevance for archaeology and more specifically for the Neolithisation of the Northernparts of Europe (e.g. 35). The key question has been whether the TRB culture appear as aresult of migrations to the area or whether the change represent adoption of new traditionsand material culture expressions by local groups (12). Similar questions have been debatedfor much of the European continent (3, 5)

Osteological studies have revealed a certain degree of morphological differences in the skeletonbetween individuals from TRB and PWC contexts. It has been shown that PWC individualsexhibit skeletal traits characteristic of cold-adaptation, such as certain facial features and limbproportions (crural index) which are absent in TRB individuals (37,38). Traits in the dentitionexhibit both differences (39) and simlarities between the two groups (40). The mean statureof the two groups is roughly similar for males (164.0 cm for TRB and 164.6 cm for PWC)but there is a larger difference for females (154.6 cm for TRB and 157.7 for PWC) (38). Theosteological observations are affected by small sample sizes and especially the TRB samplesare affected by the collective burial custom. The bones are normally recovered commingledand few skeletal elements may be associated to articulated skeletons. We also note the gen-eral difficulties involved in interpreting the reasons for morphological differences in skeletalmaterial. Thus, it has not been possible to make conclusive interpretations of the relationshipbetween the TRB and PWC individuals based on osteological data. However, chemical andbiomolecular analyses of human bones have in recent years shed new light on the relationshipbetween material culture units and human groups in general and more specifically the relation-ship between the TRB and PWC complexes. Isotopic studies (Strontium, Sr) indicate thatsome level of mobility occurred of those buried in the megaliths in Falbygden, Vastergotland,and that as much as 25% of the population may have been non-local (by birth) (41). Inter-estingly, the data indicate that the most likely area of origin of those interpreted as non-locals

4

was to the southwest of the Falbygden area, i.e., areas with abundant TRB remains and arte-facts but with few traces of PWC. The analyses indicate a low level of mobility between thegeographical areas of PWC and TRB.

The relationship between the TRB and PWC has further been investigated by stable analysisof dietary patterns (Carbon, C and Nitrogen, N). Such analyses have shown that on the islandof Oland in the Baltic Sea, the two groups apparently lived side by side but kept distinctand different dietary patterns (42). TRB individuals buried in megalithic structures exhibita dietary pattern with mainly terrestrial proteins while PWC individuals exhibit a diet basedon marine proteins. The dietary pattern at Ajvide and Ire on the island of Gotland is similarto that of the Pitted Ware Culture individuals on Oland indicating a high intake of marineproteins (43–45). Stable isotope analyses of human remains from several megalithic structuresin Vastergotland, among them samples from the same parish as the TRB individual (Gok4)analysed in the present study, show a dietary pattern including proteins mainly of terrestrialorigin (44, 46, 47). The TRB and PWC are two distinct archaeological cultural units andthe differences in material culture expressions between them may be regarded as fundamentaland includes artifact types, burial constructions and burial customs, settlement pattern andsubsistence strategies as well as dietary patterns.

1.1 Burial sites

In the present study, aDNA of four individuals from the Stone Age were successfully analysed(Table S1). Three of the individuals originate from two Pitted Ware Culture cemetery andsettlement sites on the island of Gotland (Ajv52, Ajv70 and Ire8). One of these sites, Ajvide,is located in the southwestern part of the of Gotland while Ire is located in the northern partof the island. The material culture expressions of the two Gotland sites are characteristicof the PWC and show striking resemblances. One individual (Gok4) originates from themegalithic burial structure (Gokhem 94) characteristic of the Funnel Beaker Culture in Gokhemparish, Vastergotland, Sweden. The fifth sample (Ste7) that was excluded from the analyses(see sections 4 and 7) originates from the same megalithic structure. Gokhem 94 containedmore than 9,000 fragments of commingled human bones, together with the remains of fourarticulated skeletons. The megalith was in (primary) use between approximately 3,400-2,900BC.

1.2 Dating

The sample from Gokhem (Gok4) has been 14C-dated to 4,921 ± 50 calBP (approx. 3100-2900 BC; 15 . The three Neolithic hunter-gatherer samples from Gotland have not been dateddirectly but several radiocarbon dates of other burials from the two cemeteries indicate arougly similar age as the individual from Gokhem. Radiocarbon dates of several burials fromthe sites fall roughly between 4,360-4,005 BP at Ajvide (48) and between 4,280-3,850 BP forIre (49). This corresponds roughly to 3,300-2,350 BC and 3,100-2,150 BC in calibrated years.Note that the dates for Ire were done with conventional dating methods while the Ajvide dateswere obtained using accelerator mass spectrometry (AMS) and are therefore more reliable.

5

1.3 Osteoarchaeology of the analysed samples

• Ajv52 (Ajvide 52; ID:146): Child, 7 years old. Grave 52 at Ajvide contained skeletonsof two children, one aged approximately 7 years and another 1.5-2 years (50). The olderindividual was sampled for the present study. Grave goods associated with the olderchild were bone points, beads made of seal teeth and bird bones, a worked pig tusk,and stone implements.

• Ajv70 (Ajvide 70; ID:164): Male aged 20-25 years (50 ; Fig. S1). Burial goods recoveredwere the base of a ceramic pot (PWC) and a worked bone artefact.

• Ire8 : Female, aged 40-50 years. Close to the skeleton one shard of pottery and a fewfish bones were recovered (49).

• Gok4 : Female aged approximately 20 years recovered in the megalithic structure Gokhem94 (39, 41). The skeleton was not complete. Analyses of Sr in teeth (87Sr/86Sr =0.7171) indicate that the individual was born within an area between 50-100 km southor southwest from the megalithic structure (17).

• Ste7 : Analysed tooth found as strayfind in the megalithic structure Gokhem 94. Noosteological data available.

2 Sample selection and DNA extraction

DNA extracts from seven Neolithic samples, five of which human (Ajv52, Ajv70, Ire8, Ste7and Gok4), were chosen from a previous study (8) based on several authenticity criteria thatindicated suitable preservation and low levels of contamination:

• Quantitative real-time PCR (qPCR) had shown that the extracts contained more than1,000 template copies of an 80 bp mtDNA fragment (between 47,334 and 1,497 copiesfor Ajv70 and Gok4, Table S4A in (8))

• qPCR had revealed an inverse correlation between fragment length and DNA yield (8).The degradation ratio was between 9.07 and 3.23 for Ajv70 and Ste7, see Table S4Ain (8).

• Roche GS FLX deep sequencing of short overlapping d-loop amplicons produced vastamount of synthetic clones that gave consistent results between clones deriving fromindependent DNA extractions (8).

• Consensus mtDNA sequences comprising 341 bp of the d-loop yielded phylogeneticallyconsistent results (8).

• The samples Ajv52, Ajv70 and Ire8 had also yielded nuclear DNA in an earlier study (14).

Two animal samples, a Harp seal (Pagophilus groenlandicus) from Ajvide (GSAj15) and adomestic dog from Linkoping (CFBe3a), that had been handled in parallel with the humansamples were used as negative controls. The GSAj15 sample was conservatively chosen basedon it being the non-human sample with the most evidence of human contamination in a

6

previous study (197 template copies of the same 80 bp human mtDNA fragment referred toabove; 8). The CFBe3a sample had 11 template copies of the human mtDNA fragment (8).Thus, while both animal samples had previously shown evidence of a low degree of humanmtDNA contamination, the amount of human mtDNA in the animal samples was much lowercompared to all human samples used in this study (8).

Bones and teeth from the seven samples had been decontaminated (13) and DNA extracted (8)according to previous protocols. The outer surfaces of bones were removed by sandpaper pol-ishing and teeth were wiped with 0.1M HCl. All samples were soaked in 0.1M HCl, rinsed withddH2O (DNA free ELGA grade) and dried. Between 50-200mg bone powder was obtainedper sample with a Dremel automated drill and soaked in 0.5% sodium hypochlorite for 15 minand rinsed three times with LiChrosolv water (Merck, Darmstadt, Germany). The powder wassubsequently incubated for 24 h at 55◦C in 1.6 ml buffer containing 0.5M EDTA (pH 8.0),1M urea and 100g/ml proteinase K. The sample was then briefly spun at 2,000rpm for 5 minbefore transferring the supernatant to an Amicon Ultra-15 centrifugal filter (Millipore, MA,USA) which was centrifuged at 4,000 g for 15 minutes, or until 50-100µl remained in thecolumn. The extract was then purified using QIAquickTM silica-based spin columns (Qiagen,Hilden, Germany) based on the method of (51). First, the extract was added to a spin col-umn together with approximately 5 times its volume of PB buffer. Following centrifugationat 13,000rpm for 1 minute, 750µl PE buffer was added and the column was centrifuged asbefore until there was no visible liquid left. DNA was then eluded by adding 2 X 45µl EBbuffer to the column membrane and centrifuging for 1 minute at 13,000 rpm.

3 Library preparation and sequencing

Paired-end Illumina libraries were built on all samples following the manufacturer’s protocol(Illumina Paired-End Sample Prep Kit), with the given modifications. The nebulization stepwas bypassed due to the fragmented nature of ancient DNA. Adaptor ligation was performedusing 1µl of Index PE Adaptor Oligo Mix (Illumina Multiplexing Sample Preparation Oligonu-cleotide Kit). Libraries were amplified using both Phusion polymerase (Finnzymes, Espoo,Finland) and Platinum Taq DNA Polymerase High Fidelity (HiFi) (Invitrogen, Carlsbad, CA),the former of which is known to not faithfully amplify molecules that contain uracil residues.

The set-up for the first round of amplification using Phusion polymerase was as follows: 5µlDNA library, 1x Phusion HF buffer, 200µM dNTP each (Invitrogen, Carlsbad, CA), 500nMMultiplexing PCR primer 1.0, 10nM Multiplexing PCR primer 2.0, 500nM Index PCR primer,0.02U/µl Phusion DNA Polymerase and water up to 50µl (primers obtained from the IlluminaMultiplexing Sample Preparation Oligonucleotide Kit). Cycling conditions were: 98◦C for 30seconds; 18 cycles of 98◦C for 10 seconds, 65◦C for 30 seconds, 72◦C for 30 seconds; and, afinal extension at 72◦C for 5 minutes.

The Hifi PCR was set up in two successive amplification rounds, with the first amplifica-tion protocol as follows: 5µl DNA library, 1x High Fidelity PCR buffer, 2mM MgSO4, 200µMdNTP each (Invitrogen, Carlsbad, CA), 500nM Multiplexing PCR primer 1.0, 10nM Multi-plexing PCR primer 2.0, 500nM Index PCR primer, 1 unit of Platinum Taq High Fidelity andup to 50µl water. Cycling conditions for the HiFi amplification were: 94◦C for 4 minutes; 18

7

cycles of 94◦C for 30 seconds, 60◦C for 30 seconds, 68◦C for 30 seconds; and, a final extensionat 72◦C for 7 minutes. PCR products from this first amplification round were purified throughQIAquick spin columns (Qiagen, Hilden, Germany), from which 5µl of library was used to setup a second HiFi amplification round using the same PCR set-up as above and similar cyclingconditions except that the number of cycles was increased to 22.

All Phusion and HiFi (second round) PCR products were purified using QIAquick Gel Ex-traction Kit (Qiagen, Hilden, Germany) with no modifications to the provided protocol andeluted in a final volume of 30µl. Libraries were sequenced on at least one lane each on theIllumina GAIIx platform.

4 Data preprocessing and alignment

Raw data was analyzed using 1.7.0 of CASAVA and 1.8.0 of the OLB (offline basecaller). Toavoid possible contamination that could have entered the library outside clean room facilities,we stringently discarded all reads that did not have the exact correct index sequence (Ta-ble S2). The retained fraction was generally 96%-99% in all libraries except the one fromthe seal, for which the fraction was only 48%. In addition, we trimmed remaining adaptorsequences, again requiring an exact match of a portion of the adaptor sequence and that itwas located in the 5’ end of the sequence read. Sequencing errors can lead to untrimmedadaptor sequences when only exact matches are identified and trimmed, but these should haveminimal effect on our analysis due to further filtering criteria applied below.

4.1 Mapping

We mapped all sequence reads to the human reference genome (UCSC hg18) with the mito-chondrion replaced by the Cambridge reference sequence (rCRS). We used Stampy-1.0.10

(18), an accurate mapper for Illumina sequence reads used e.g. for indel discovery in the 1000genomes project (25). We ran Stampy in ’sensitive’ mode under default parameters. Duplicatereads were filtered based on mapping positions (’rmdup’) using the samtools package (52).

The fraction of reads mapping to the human genome (Table 1 in the main text) is simi-lar to the fraction obtained from the best preserved Neandertal fossils (53,54), but lower thana permafrost preserved modern human of similar age (11).

4.2 Stampy performance for ancient DNA

To assess Stampy’s applicability to ancient genomic data that is affected by e.g. postmortemnucleotide misincorporations (19), we compared the fraction of mapped reads obtained byStampy and 50,000 randomly subsampled reads from the same library that were aligned us-ing megablast (55) to an extensive database (hg18, panTro2, NCBI ”other genomic” and”ref seq”). We used a word size of 12 and required an e-value of less than 0.001. Aligningagainst a large database with genomic information for several vertebrates closely related tohumans together with the high sensitivity of megablast allows distinguishing between se-quences that have their absolute best match to humans from sequences that align to humansas well as other organisms. We find that the fraction of human reads obtained for the 5 human

8

samples sequenced using both Hifi and Phusion polymerase is highly similar when mappingwith Stampy and megablast (Fig. S2).

4.3 Ste7 resequencing

One of the five human samples, Ste7, had a significantly lower fraction of reads mapping tothe human genome than the other four samples. Attempting to remedy this, a new library wasbuilt for this sample using the GS Rapid Library Prep Kit (Roche, Branford, CO) and followingthe provided protocol with some modifications. 1µl of a 5µM dilution of the Index PE AdaptorOligo Mix was added to the end-repaired and A-tailed DNA fragments with 1µl of RL ligase,and the adaptor ligation reaction was carried out at 25◦C for 15 minutes. Adaptor-ligatedlibraries were purified using MinElute spin columns (Qiagen, Hilden, Germany) and eluted in30µl of EB buffer. The HiFi PCR protocol and cycling conditions outlined in section 3 werefollowed for the new Ste7 library with the exception that two reactions with 3µl DNA librarywere set up for each of the two PCR rounds, and, shorter primers than the ones providedby Illumina were used in both the amplification rounds in order to minimize the formationof primer dimers (see [56] for primer sequences, Index 2). The two PCR reactions afterthe second amplification round were pooled together and gel purified using the E.Z.N.A gelpurification kit (Omega Bio-Tek) and eluted with 30µl elution buffer. The gel-purified librarywas sequenced on six additional Illumina GAIIx lanes. However, we found that saturation inthe number of unique sequences had been reached, as the total number of unique humanreads did not increase proportional to the amount of sequencing (Table S3). This was alsoreflected in the number of new SNPs obtained as data from each sequencing lane was added.On the basis of these observations and the increased error rate in Ste7 data (see section 7),this individual was excluded from all population genetic analyses.

5 mtDNA analysis

The 5 Neolithic samples had previously been assigned to mtDNA haplogroups based on 341bp of the mitochondrial d-loop (8). To authenticate our data and, if possible, obtain betterresolution of the haplogroups, we assembled mitochondrial genome sequences and comparedthe results with those obtained previously.

5.1 Assembly

We extracted all reads that aligned to the human mitochondrion (rCRS), and called a consen-sus sequence using the samtools package with default parameters assuming a haploid model.We find that the mtDNA genome coverage shows little correlation with the total amount ofhuman DNA obtained from each sample (Table S4), which is in line with other observationsof highly variable mtDNA-nuDNA ratios in ancient DNA extracts (57).

5.2 Haplogroup assignment

The haplogroup affiliation of each sample was assigned using HaploGrep (58) and PhyloTree

version 12 (http://www.phylotree.org; 59). The mutations are described relative to therCRS (60). The haplogroup affiliation of all samples corresponded to earlier results. The

9

increased amount of sequence data added further support, and in three cases, the place ofthe sample in mtDNA gene tree was refined down to subhaplogroup level.

Two of the Neolithic hunter-gatherer samples, Ire8 and Ajv70, were previously classified asU4/H1b based on substitutions at np 16093 and 16356 (8). The mitochondrial genome ofIre8 displays 22 of the 26 mutations expected for subhaplogroup U4d (Table S5). The missingmutations are T195C (that define U4’9 together with G499A and T5999C), T4646C (that de-fine U4 together with A6047G, C11332T, C14620T, T15693C, T16356C) and 2405.1C (thatdefine U4d together with T629C). The consensus sequence lacks coverage for A8860G (whichtogether with A263G and A15326G define H2a2a). The mutational pattern for the Ajv70 mi-tochondrial genome also corresponds to that of haplogroup U4, either within subhaplogroupU4d or U4a2a. In both cases, Ajv70 displays 23 of the 26 expected mutations but lacks,as do Ire8, the T195C and 2405.1C mutations and sequence data for A8860G. The missingmutations leading to U4a2 are T195C and C8818T (defining U4a). However, the sample doesdisplay the T310C substitution that defines U4a2. The third Neolithic hunter-gatherer sample,Ajv52, was previously assigned to haplogroup V based on a transition at nucleotide position(np) 16298 (8). The new data further provide 10 of 12 mutations leading to V (Table S5).The two expected transitions that are missing are A8860G, for which there is no sequencedata, and A4769G (defining H2a).

The Neolithic farmer sample, Gok4, was previously classified as belonging to haplogroupH as no d-loop mutations were observed (8) and the shotgun sequence assembly providedfour out of six expected mutations that further supported the classification. The transitionat np 8860 is missing as there is no sequence data for that position and the transition at np4769 display the ancestral allele state. The second Neolithic farmer sample, Ste7, had previ-ously been assigned to haplogroup T as it displayed transitions at np 16126-16294-16304 (8).The new assembly could further place the sample within subhaplogroup T2b3 as 28 of 32expected mutations were present. The ancestral allele state was present for A8860G, A4769Gand C16296T (that defines T2 together with A11812G and A14233G). No sequence data wereavailable for mutation G5147A (defining T2b together with G930A and T16304C). Further,Ste7 seems related to T2b3a, as the sample displays one (C151T) of four (151T, 152C, 5656Gand C16292T) polymorphisms that define the subhaplogroup.

The number of non-diagnostic mutations (mutations not observed in PhyloTree) were con-siderably higher in assemblies with lower coverage (Gok4 had 92 such mutations comparedto 10, 19, 21 and 26 for Ajv70, Ajv52, Ste7 and Ire8 respectively) (Table S6). Note thatthe relative sequence error estimates presented for the autosomal data (see section 7) arefor positions that are already known to be polymorphic, and thus the total fraction of errorsfor low-coverage data when taking all possible positions into account is much greater. Themajority of the mtDNA polymorphisms that have not been documented previously are likelyerroneous, and they display the expected relationship to coverage, with the mtDNA genomeassembly with the lowest coverage (Gok4) also having the highest number of this type ofpolymorphisms (Table S6).

In addition, the mtDNA genomes display a third category of polymorphisms: mutations thatare observed in association with other haplogroups (Table S5). The samples displayed between

10

zero and nine such mutations. These were all transitions, found between one and 36 times atwidely dispersed positions in the phylogenetic tree. However, some of these could representtrue polymorphisms in the Neolithic individuals. For example, both Ire8 and Ajv70 displayedtransitions at np 16093 in the previous study (8). Other polymorphisms in this category arelikely due to errors, for example the transitions at np 16145 and 16167 in the Gok4 mtDNA,for which the previous study showed no support.

6 Autosomal data processing

For our population genetic analyses, we obtained haploid genotypes for all autosomal positionsthat overlapped with SNPs in the particular reference dataset (see below and Fig. S3) withthe following criteria:

• We retained reads with mapping quality greater than 30.

• We retained bases with quality greater than 15.

• We excluded positions where a T was found in the sequencing read and a C in thereference human SNP data set.

• We excluded positions where a A was found in the sequencing read and a G in thereference human SNP data set.

• We excluded positions with more than two alleles (see Table S7 and Table S8).

• We excluded positions with gaps or an N in the sequencing read.

• We randomly sampled one read from positions covered by multiple reads.

We base all autosomal population genetic analyses on known human single nucleotide poly-morphisms (SNPs) from five different reference sets. Compared to sequence-based analyses,the advantage is that most sequencing errors (which are prevalent in second-generation se-quencing data, in particular from ancient DNA) will be weeded out and therefore have a muchsmaller impact on the analyses. Haplotypes and missing genotypes were estimated for eachdataset of extant humans using fastPHASE (61) version 1.4.5 (except for the data that werealready provided with phase information, see below). The number of haplotype clusters wasset to 25, and we used 25 runs of the EM algorithm. This analysis was used to generate a”best guess” estimate of the true underlying patterns of haplotype structure. When multipledata sets were to be combined, we merged them separately based on position, excluded A/Tand G/C SNPs, and flipped genotypes according to data sets that were provided with hg18strand orientation.

6.1 HM3+FINHM+HGDP

We obtained phased Finnish HapMap (FINHM) data from 81 individuals in two groups (FINand LSFIN) (ftp.fimm.fi/pub/FIN HAPMAP3/; 22), phased hapmap3 (HM3) genotypesfrom 7 populations (CEU; TSI; GIH; JPT+CHB; MKK; LWK; YRI; ftp.ncbi.nlm.nih.gov/hapmap/phasing/2009-02 phaseIII/HapMap3 r2/; downloaded 30 nov 2010, 23), and

11

phased genotypes for 938 individuals from the Human Genome Diversity Project (HGDP;http://hgdp.uchicago.edu/Phased data/; 21, 62). To resolve strand issues in the datasets, we flipped SNPs to hg18 strand orientation for HGDP, FINHM, and HM3 data separately,and intersected the 1,163,280 SNPs common to HM3 and FINHM with the HDGP IlluminaSNPs. We excluded SNPs with MAF≤0.5%, leaving a total of 519,951 SNPs.

6.2 FINHM+POPRES

We merged the FINHM data set with a data set consisting of 1385 individuals from thePOPRES collection that had been genotyped on Affymetrix 6.0 arrays (24, 63). We used2 outlier iterations in EIGENSOFT with pairwise r2 limit of 0.2 to exclude 7 individuals inthe FINHM data that were identified as outliers on the first or second eigenvector (LSFIN2,LSFIN22, FIN12, LSFIN10, LSFIN30, LSFIN40, LSFIN21, LSFIN28). We excluded SNPs withMAF≤0.5% to a final total of 279,344 SNPs.

6.3 1KGPomni

We used 2,379,175 autosomal SNPs genotyped on Illumina 2.5 Omni arrays (25) from 241individuals from 5 European populations (CEU, TSI, FIN, GBR, IBS). We performed 5 outlieridentification iterations in EIGENSOFT 4.0 with pairwise r2 limit of 0.2, and exluded 3 CEUoutliers (NA12329, NA06989, NA06984) based on visual inspection of the PCA results. Weexcluded SNPs with MAF≤0.5%.

6.4 1KGPomni+Sweden

We phased genotype data from 353,565 autosomal SNP typed in 1,525 Swedish individuals(28) as described above. This data set was then merged with 1KGPomni (see above) in orderto flip to hg18 strand orientation, excluding all G/C and A/T SNPs, resulting in 250,725autosomal SNPs that were present in both data sets. We excluded SNPs with MAF≤0.5%.

6.5 Complete genomes

SNP data genotyped using arrays suffers from ascertainment bias, and reports a limited num-ber of polymorphisms for each individual. We attempted to circumvent these issues by iden-tifying SNPs between 9 human genomes sequenced to high-coverage. These consisted ofthe 7.5X genome of Craig Venter (64) obtained with capillary seqeuncing technology to-gether with 30X genome assemblies obtained with Illumina sequencing technology from 2individuals from Utah of European descent (25), one individual of Chinese descent (65), oneindividual of Korean descent (66) and two Yoruba individuals from Ibadan, Nigeria (25), andfinally the genome of an individual of Northern European ancestry sequenced with Helicostechnology (67). We added data on the chimpanzee allele (panTro2; 68, 69 , and excludedhypermutable CpG sites to a total data set comprising 6,001,729 SNPs. To avoid biasesfrom using different sequencing platforms and genotyping methodologies, we restricted allpopulation genetic analyses to the seven genomes sequenced by Illumina technology (theVenter (64) and Patient 0 (67) genomes were excluded), and used genotypes that had been

12

called by the 1000 genomes project and filtered for mapping uniqueness and genotyping qual-ity (25 ; ftp://ftp.sanger.ac.uk/pub/rd/humanSequences/; retrieved April 2011). Weextracted genotypes from the Neolithic data separately for each sample as described above.

We note that ascertainment bias also affects this data since we only use sites that havebeen found to be polymorphic among the 9 diploid genomes. We therefore ignore potentialsites where the Neolithic data has an ancestral allele (shared with the chimpanzee) that isderived in all the 18 chromosomes from the modern genomes. This would lead to an under-estimation of population divergence time between Neolithic individuals and modern genomes(see section 12). However, these sites are rare, and are more susceptible to sequencing errorscompared to sites known to be polymorphic in the human population.

7 Relative sequencing/alignment error rates

To investigate relative sequencing error rates in sequences from different Neolithic samples,we counted the number of sites in which the Neolithic individual had a third allele that hadnot been seen in the 9 high-coverage human genomes and the chimpanzee. This fraction ofsites comprises positions in which there is a sequencing or alignment error in the Neolithicdata, and possibly also mismatches caused by sequencing and alignment errors in the moderndata. Regardless, these sites should be informative on the relative rates of sequencing andalignment errors between the different Neolithic data sets.

We find that Ste7 has an approximately 3 to 7 times inflated error rate compared to theother 4 samples (Table S7). This could possibly be due to the lower fraction of reads map-ping to the human genome, which could cause a proportionally higher number of reads withinthis fraction to be the result of mapping error. We also find similar results when computingthe fraction of triallelic sites using the 1KGPomni genotype data (Table S8), with Ste7 havingapproximately 2 to 3 times as high error rate as the other individuals. Error rates for theremaining 4 Neolithic individuals are rather homogeneous, but slightly inflated for the Ire8and Gok4 individuals.

8 PCA and Procrustes analysis

A central problem for ancient genomic projects is the large fraction (usually >95%) of retainedshotgun sequences that are comprised of non-endogenous sequences (e.g. from microbes inbone material [57 ]). This has the consequence that it is very difficult to obtain high-coveragedata even from relatively well-preserved material (53, 54). The data from the four differentNeolithic individuals presented here cover only about 1%-3.5% of the human genome each,resulting in few overlapping positions that could be used for direct population genetic com-parisons between individuals. However, the vast amount of reference data for modern humangenetic variation offers an opportunity of indirect comparison, where each individual is analyzedtogether with a modern reference data set separately, whereafter the individual analyses areconsidered jointly. Here we present such an approach based on Procrustes analysis (70) thatis tailored for joint analysis of non-overlapping data from different individuals using principalcomponent analysis or other methods based on the same conceptual framework (71).

13

8.1 Implementation

Under the assumption that the genealogical history of each genetic locus–although non-overlapping–represents a random outcome of the same ancestral process, the relationshipbetween non-overlapping data from different individuals from the same population (i.e. indi-viduals with similar ancestry) and individuals from other populations is largely independent ofwhich loci are used. We transformed PC1 and PC2 from each individual analysis to matchthe PC1 and PC2 configuration obtained using only the reference individuals (using all SNPsin the data set, and where each ancient (Neolithic) individual was given coordinates (0, 0))(Fig. S4). As long as the ancient individual does not heavily influence the loadings of the refer-ence individuals, the PC1 and PC2 loadings for a given reference individual in an analyses of asubset of SNPs from the complete set should just reflect the sampling variance in the numberof SNPs. Therefore, we take the mean of PC1 and PC2 obtained in each separate analysis foreach reference individual, but retain the transformed PC1 and PC2 configuration for ancientindividuals. All principal component analyses were performed using EIGENSOFT 4.0 (20),with one SNP from each pair with r2 > 0.2 excluded. Results are shown in Figs. S5-S9.

8.2 Validation using Saqqaq data

As a proof of concept, we analyzed sequence reads from two Illumina libraries prepared froma paleo-eskimo hair sample (”Saqqaq”; 11) that had been amplified by both Hifi polymeraseand Phusion polymerase by (72) and mapped to the human genome using BWA (73) byAmhed Missael Vargas Velazquez and Ludovic Orlando. We used the same genotype filtersas described above and obtained 23,587 SNPs from the Hifi data and 38,107 SNPs fromthe Phusion data that overlapped with the HM3+FINHM+HGDP data set. We used 48non-African populations in the data set to generate a PC1-PC2 projection as in Figure 3Bin (11), and used Procrustes analysis to obtain a joint analysis of the non-overlapping datafrom the two different libraries as described above. The results were highly consistent betweenthe haploid low-coverage data analyzed using our pipeline and the diploid high-coverage dataanalyzed by (11), as well as highly consistent between the two libraries amplified by Hifi andPhusion (Fig. S10).

9 Model-based clustering

Each Neolithic individual was clustered together with extant European individuals from HapMap,Finnish HapMap, and HGDP (HM3+FINHM+HGDP) plus the Levantine Druze populationusing an unsupervised clustering algorithm implemented in ADMIXTURE (26). For each Ne-olithic individual was the clustering performed based on SNPs that overlapped between theNeolithic individual and the set of extant individual (see Table 1 in main text). All runs ofADMIXTURE were based on the ’admixture’ model, in which each individual is assumed to haveancestry in multiple clusters. Default settings were used (except for generating the randomseed). For a given value of K, the number of clusters considered, 10 replicate analyses wereperformed for each dataset comprising of one Neolithic individual and 384 extant individuals.The 10 replicate runs of ADMIXTURE (for each set of individuals and each K) were analyzedwith CLUMPP version 1.1.2 (27) to identify common modes among replicates. The CLUMPP

analysis used the fullSearch algorithm for K = 2 and 3, and the Greedy algorithm with 10,000

14

random permutations for K = 4. The 10 runs of ADMIXTURE produced very similar solutions,the average similarity coefficient H ′ was greater than 0.98 for all choices of K (2, 3 and 4) andfor all four Neolithic individuals. The clustering results were visualized with Distruct (74).

Fig. S11 shows the clustering solutions for the four Neolithic individuals assuming 2, 3 and 4clusters (each plot shows the average across 10 replicate ADMIXTURE runs). Figure 1D in themain text was generated by extracting the (averaged) ancestry matrix of the extant individu-als from each analysis containing one Neolithic individual and averaging them using CLUMPP.These four ancestry matrices were similar (H ′ = 0.795) and the four Neolithic individualswere merged to the plot so that the original clustering solutions were respected (comparewith Fig. S11).

10 Authenticity

Several observations, such as high fraction of sequences mapping to the human genomecompared to animal controls and matching mitochondrial genome assemblies to previouslyobtained mtDNA data, point to the presence of endogenous DNA in the human sequencedata. To investigate authenticity more closely we characterized two physical features that havepreviously been shown to differ between authentic ancient DNA and contaminating modernhuman molecules (19,29): (i) increased rate of thymine residues at positions where a cytosineis found in the reference sequence due to nucleotide misincorporation, and (ii) increased rate ofpurine residues to the 5’ end of the mapped sequence reads. However, increased rate of purinesin the 5’ end can also be caused by bleach treatment (75), which was used to decontaminatethe samples. We therefore base our conclusions and method for assessing authenticity in theautosomal data (see below) only on the nucleotide misincorporation pattern. Since Phusionpolymerase does not faithfully replicate Uracil (11, 72, 76), these analyses only utilized datafrom libraries amplified by Hifi.

10.1 Patterns of molecular degradation

In each alignment to the human genome, we extracted the positions covered by the read plus5 bp in the 5’ and 3’ from the hg18 reference sequence, and locally realigned the read and thereference sequence using MUSCLE3.8 (77). We then computed the average base frequencyat different positions of both the sequence read and the sequence at and around the alignedregion in the reference genome. We required reads to have a mapping quality of at least 30and averaged the frequency over all analyzed positions. We find clear evidence of patterns (i)and (ii), which were highly homogenous among samples (Fig. S12).

10.2 Testing for autosomal contamination

The presence of endogenous DNA does not preclude that a small fraction of contaminationcould influence the population genetic analyses and comparisons with modern populations. Totest whether contamination could be present in sufficient amounts to affect our conclusions, wedevised a new approach that relies on the expectation that C→T nucleotide misincorporationsare many times more frequent in endogenous molecules than contaminating molecules. Bydividing the sequences into those that have and those that do not have evidence of nucleotide

15

misincorporation we can create fractions that are highly enriched for endogenous and putativecontaminating molecules, respectively. Assuming significant (but not overwhelming) presenceof contamination, the latter category without evidence for misincorporation is expected tostill contain a portion of endogenous molecules that do not have misincorporations, but theformer category with misincorporations is expected to be composed mostly of endogenoussequences. All population genetic analyses that could be subject to contamination can thenbe assessed for different behaviour between the two fractions.

To implement this idea, we created additional sets of genotypes where we only consideredsequence reads that either had, or did not have a T where the reference base was a C withinthe first 10 bp (in the 5’ end) (Table S9). We used the same method for local realignment asabove. We performed PCA and Procrustes analysis on each separate fraction (with- and with-out nucleotide misincorporations) separately, together with the TSI (Tuscans from Italy) andFIN (Finnish from Finland) in the 1KGPomni reference data set, since differentiation betweenthese two populations summarizes the main component of differentiation between Neolithichunter-gatherers and the Neolithic farmer in our analyses (Fig. 1 in the main text). We findno evidence suggesting that contamination could influence our analyses and conclusions, withthe separation and different population genetic affinities between the Neolithic farmer and theNeolithic hunter-gatherers remaining evident in all analyses (Figs. 2, S13 and S14).

11 Allele sharing analysis

We computed the mean frequency of the allele observed in Neolithic farmer data (Gok4), andNeolithic hunter-gatherer data (Ajv52+Ajv70+Ire8) separately in various extant populations,summing over the total number of SNPs. To create a data set representing all three Ne-olithic hunter-gatherers we pooled the sequence reads from all individuals and genotyped allSNPs using the procedure described in section 6, resulting in 12,609 SNPs that overlappedwith the FINHM+POPRES data. We restricted the analysis to the 25 populations in theFINHM+POPRES data set with sample size of at least 8 chromosomes, and computed thefrequency of the allele observed in either Neolithic hunter-gatherers or the Neolithic farmerin extant populations by averaging over 8 randomly sampled chromosomes (without replace-ment). This procedure was replicated 100 times (for each SNP) and the mean across the 100repititions is reported in Table S10. Spatial interpolations of Neolithic allele frequencies wereperformed using the Krig function in the R package ”fields” (78). Coordinates for POPRESpopulations were obtained with Geonames (http://www.geonames.org/). For some pop-ulations (Sweden, Russia, United Kingdom) the coordinates were placed in the capital (63).FINHM coordinates were placed in Helsinki (FIN) and Oulo (LSFIN).

12 Estimation of population divergence time from com-plete genomes

To provide an alternative view of the genetic relationship between the Neolithic individualsand modern humans, we computed the population divergence time between each Neolithicindividual and one of seven complete genomes from extant humans. We used the same frame-work for estimation of population divergence time using reference genomes as in (32), with the

16

main difference being we estimate population divergence time to each modern diploid genomeseparately, using the chimpanzee as an outgroup. To implement the method, we compare thetwo gene copies in a modern genome to the Neolithic gene copy and the chimpanzee. Weinfer a genealogical topology only in those cases where two copies are identical, but differfrom the other two (one of which is the chimpanzee–representing the ancestral state), whichare also identical. We then count sites where the two copies from the modern genome share aderived allele not found in the Neolithic sample as ’concordant’ (assuming a model where theNeolithic individual splits from the modern population). The remaining two configurationsresult in a ’discordant’ topology. When two copies from the same population (in this casea single individual) are sampled in this way, the internode time T in the framework outlinedin (32) becomes equal to the total divergence time between the modern population and theNeolithic population.

Since the expected fraction of concordant topologies increases proportional to the time sincepopulation divergence, we can estimate T (measured in units of 2Ne generations where Ne

is the effective population size) and obtain confidence intervals in a maximum likelihoodframework using a coalescent model (e.g. 79). We have the log-likelihood function

Log(L(T )) = Nconc ×(

1− 2e−T

3

)−Ndisc ×

2e−T

3

where Nconc is the number of concordant genealogies, and Ndisc is the number of discordantgenealogies (the sum of the of two types) (79).

For estimating a joint population divergence times for all the three Neolithic hunter-gatherers,we pooled the sequence reads from the three individuals (Ajv52+Ajv70+Ire8) and obtainedgenotypes as outlined above.

12.1 Divergence from genomes of Northwest European ancestry

We find that the estimated population divergence time T between all Neolithic humans andmodern individuals of Northwest European ancestry (CEU) is on the order of 0.01, with pointestimates from 0.0069 to 0.0241. Comparing estimates for the Neolithic farmer to the pooledNeolithic hunter-gatherer data obtained using NA12891 we obtain T = 0.0150 (95% ML CI:0.003-0.027) and T = 0.0155 (95% ML CI: 0.008-0.023), respectively (Table S11). The esti-mates obtained using NA12892 are more disparate but with overlapping confidence intervals,specifically we obtain T = 0.0241 (95% ML CI: 0.012-0.037) for the Neolithic farmer and T= 0.0103 (95% ML CI: 0.003-0.018) for the Neolithic hunter-gatherers.

Assuming a recent European effective population size of 2Ne=10,000 (80) and a 25 yeargeneration time, the estimates obtained for NA12891 would translate to a divergence time of0.015× 10, 000× 2× 25 = 7, 500 years (95% CI:1,500-13,500) for the Neolithic farmer and0.0155 × 10, 000 × 2 × 25 = 7, 750 years (95% CI:4,000-11,500) for the Neolithic hunter-gatherers. The estimates for NA12892 are 12,050 years (95% ML CI: 6,000-18,500) for theNeolithic farmer and 5,150 years (95% ML CI: 1,500-9,000) for the Neolithic hunter-gatherers.The maximum and minimum values for each Neolithic group obtained using the two differentCEU genomes are reported in the main text, together with the mean of the two point estimates.

17

If the history between the modern genome and a Neolithic individual was more complexthan an instantanuous divergence model (79), the interpretation of these estimates wouldhave to be modified accordingly. If, for example, the modern genome would be the result ofadmixture between the Neolithic individual and some other population, T would be a functionpartly of genetic drift in the two source populations since their divergence, as well as therespective fractions they contributed to the ancestry of the Neolithic individual.

With this in regard, these estimaed divergence times are in line with the age of the spec-imens, and the similarity of the estimates for the Neolithic farmer and the Neolithic huntergatherers in the comparison with the NA12891 genome is in line with an estimated fraction of51.7% Neolithic farmer ancestry in the CEU population (Table S15). The difference betweenthe two genomes, where NA12892 appear to be non-significantly closer to Neolithic hunter-gatherers than the Neolithic farmer, could possibly be due to different ancestry in NA12892compared to NA12891. This difference is however within the confidence intervals of the esti-mates (Fig. S15).

While these estimates can be inflated by sequencing errors (32), the estimated error ratesfor Gok4 are not widely different from the Neolithic hunter-gatherer data (see section 7). Onthe other hand, it has been shown that the ancestral effective population size influences thismethod when single SNPs are used to infer genealogical topologies, but this bias is small fordivergence times on the order of 0.01 (32).

12.2 Divergence from genomes of East Asian and West African an-cestry

In contrast to the results obtained using extant European genomes, we find that the inferredpopulation divergence times from East Asians is significantly higher (0.12-0.18; Fig. S16;Table S12). This likely reflects both a more ancient divergence between ancestors of Europeansand East Asians as well as greater genetic drift in East Asians than Europeans (81). However,we are still unable to detect any significant difference between the Neolithic farmer Gok4 andthe three Neolithic hunter-gatherers. We obtained similar results when comparing the Neolithicdata to three genomes from individuals of West African ancestry (Fig. S16, Table S12).

13 Modelling European population history

Here, we use the observed patterns of differentiation between Neolithic farmers and hunter-gatherers to investigate models of population history relating the Neolithic individuals tomodern populations.

13.1 Testing population topology using genealogical concordance

To build a tree-like framework relating the Neolithic individuals to modern populations, wetake a similar approach as (82), testing the support of individual four taxon topologies andconsidering the results jointly. Consider a case where a single gene copy is sampled from

18

each of four populations A, B, C and D as for the divergence time estimation above, thereare only three possible genealogical topologies, of which the one that mirrors the populationtopology (i.e. the ’concordant’ topology) is expected to be most frequent. By scoring whichgenealogical topology is supported by the configuration of alleles at each site (assuming aninfinite sites model of mutation), we use a ’concordance’ test of the hypothesis that thetopology (A,B), (C,D) is significantly more frequent than the topologies (A,C), (B,D) and(A,D), (B,C). We conduct this test by computing the excess of the putatively ’concordant’category over the second most frequent category (the most frequent ’discordant’ category,denoted disc1), as

C =Nconc −Ndisc1

Nconc +Ndisc1

To obtain standard errors (SEs) for this statistic, we use a block jackknife procedure wherewe divide the genome in 130-140 contiguous blocks, and obtain Z-scores by normalizing withthe SE. If the C-statistic is significantly positive (> 3 SEs from 0), we consider the topologysupported.

We use European and Levantine (Druze) populations in the HM3+FINHM+HGDP data set,and add San and Yoruba as outgroups. Most tests support the results obtained from the pre-vious analysis, suggesting a close relation between Neolithic hunter-gatherers and NorthernEuropean populations (such as Finnish) on one hand, and Neolithic farmers and SouthernEuropean populations (such as Sardinians), on the other hand (Table S13). Based on all the1,271 significant four-wise combinations of tests in the data (108 that included the Neolithichunter-gatherers and 97 that included the Neolithic farmer), we constructed a neighbour-joining tree using an algorithm in the clann software package (83), and find that the resultsof this clustering is largely similar to the results based on PCA and ADMIXTURE analyses(Fig. S17).

This approach provides a means for clustering haploid data from populations and representingthe result as a topology, but we note that the neighbour-joining algorithm allows for conflictsbetween inferred topologies, and it utilizes information such as how many times two groupsappear in the same significant concordance test (for example in tests including the outgroup)rather than explicitly testing if there is support for each node in the inferred topology. In thefuture, more sophisticated algorithms for constructing the supertree might allow populationtopologies to be investigated in a more formal setting that is directly related to populationgenetic models of divergence with- or without gene flow.

13.2 Gene flow between a population related to Gok4 and extantEuropean populations

While the concordance tests are designed to portray the major pattern underlying the data,human population history often comprises more complex events such as migration and admix-ture. The observation that a Neolithic farmer individual in Scandinavia 5,000 years ago hasa similar genetic makeup as modern-day Southern Europeans suggest that it belonged to apopulation that expanded in concert with agriculture, and as such may have interacted andadmixed with resident populations. Observing that Neolithic hunter-gatherers tend to cluster

19

in the vicinity of Finnish individuals and the Neolithic farmer clusters with individuals fromsoutheastern Europe, we considered the possibility that other European populations in theHM3+FINHM+HGDP data set are derived from admixture between resident hunter-gatherergroups and expanding farming populations related to Gok4. Using the topology inferred aboveas a framework, we tested the hypothesis of no gene flow in the unrooted population topology(Outgroup,Gok4), (X, LSFIN), were Outgroup is either Papuans, Native Americans (Su-rui, Karitiana and Colombians; 21,84), Yoruba, or Levantine Druze. X is one of 9 Europeanpopulations in the HM3+FINHM+HGDP data set. We restrict the analysis to the 7,712SNPs not affected by putative nucleotide misincorporations. We compute the 4-populationtest based on the estimated allele frequencies in each population as in (31, 82).

D =

∑ni=1[p

iOutgroup(1− pi

Gok4)piX(1− pi

LSFIN)]− [(1− piOutgroup)p

iGok4p

iX(1− pi

LSFIN)]∑ni=1[p

iOutgroup(1− pi

Gok4)piX(1− pi

LSFIN)] + [(1− piOutgroup)p

iGok4p

iX(1− pi

LSFIN)]

where pi is the allele frequency at locus i. The statistic is summed over all n loci and wecompute a standard error using a block jackknife procedure over 132 windows with 50 SNPsin each.

We find evidence of a violation of the basic population topology using all outgroups (Ta-ble S14). The signal is consistently (but not always significantly) negatively skewed regardlessof which outgroup is used. This suggests that gene flow from Asia into Northern Europe orfrom Africa into Southern Europe is not enough to explain the data, since the signal is seenacross all populations when using outgroups from both these regions.

13.3 Estimation of Neolithic farmer ancestry in European popula-tions

Based on the 4-population tests and the available data, we considered a model (Fig. S18)where the late-settlement Finnish founder population LSFIN (22) has received a negligableamount of Neolithic farmer-related gene flow since the Neolithic expansion. Secondly, we as-sume that Gok4 instantanuosly split from the ancestors of extant Druze in the Levant (Druzewas chosen over Palestinians due to low degree of recent African admixture (85). Using threeNative American populations with no evidence of European admixture (Surui, Karitiana andColombians; (21, 84) as an outgroup, we estimate the fraction of Gok4-related ancestry inan extant European population X, as a mixture between populations related to LSFIN andGok4, using the f4-ancestry estimation framework (31, 85). Using the notation of (85), wecompute the statistic

f4(Outgroup,Druze;Gok4, LSFIN) =

∑ni=1(p

iOutgroup − pi

Druze)(piX − pi

LSFIN)∑ni=1(p

iOutgroup − pi

Druze)(piGok4 − pi

LSFIN)

where pi is the allele frequency at locus i in each population, and we compute the statisticsumming over all n loci. To obtain standard errors, we used a block jackknife, with 132 con-tiguous windows with the same number of SNPs. To also estimate ancestry in the Swedishpopulation, we merged the HM3+FINHM+HGDP data with the Swedish data, which resulted

20

in 4,817 SNPs that could be used in the analysis. We divided the Swedish data into individ-uals from Southern- (743 individuals), Central- (545 individuals), and Northern Sweden (237individuals).

A number of possible model violations could influence these estimates. Most notably, geneflow from East Asia into the European populations (e.g. the Finnish, Russian, Adygei andDruze populations) in the model. To assess the robustness of the model we replaced theNative Americans used as outgroup with Papuans, JPT+CHB (Japanese and Chinese ances-try) or Uygur (Central Asian), and obtained correlations with the original estimates of 0.99,0.98 and 0.99 respectively (Fig. S20). While the absolute values of ancestry varied with theoutgroup, we found that using Native Americans was the only case when estimates did notexceed 100%.

14 The Neolithic hunter-gatherer gene pool is not fullyrepresented in modern Europeans

To maximize the population genetic resolution of the genetic profile of the Neolithic hunter-gatherers, with special regard to their relationship with modern Scandinavians, we analyzed14,113 SNPs obtained by merging the Swedish and HM3+FINHM+HGDP data, and callinggenotypes based on all sequences available from the available Neolithic hunter-gatherer data(Ajv52, Ajv70 and Ire8), with the same criteria as above. To avoid the issue of a much largersample size in the Swedish data, we randomly sampled 20 individuals from each of Southern,Central, and Northern Sweden (Gotaland, Svealand, and Norrland; 28). We performed PCAand model-based clustering using ADMIXTURE as above but without Procrustes transformationor CLUMPP analysis. We find that Neolithic hunter-gatherers fall outside the variation of extantEuropeans in both these analyses (Figs. S21 and S22).

21

Figure S1: The Ajvide 70 (Ajv70) individual. Photograph by Johan Norderang

22

2 4 6 8 10 12 14

24

68

1012

14

% Human reads (Stampy)

% H

uman

rea

ds (

meg

abla

st)

R = 0.96, P < 1e−5

Figure S2: Benchmarking of Stampy by comparison with megablast.

23

Gökhem (TRB, Neolithic farmers)Gotland (PWC, Neolithic hunter−gatherers)1KGPomni+SwedenFINHM+POPRESHM3+FINHM+HGDP1KGPomni

Figure S3: Sample locations of all data sets used in the study. Some sampling locations arerepresented in multiple data sets (e.g. Spain), and thus have overlapping symbols.

24

−40 −20 0 20 40 60

−20

020

40Procrustes errors

Dimension 1

Dim

ensi

on 2

●●

●●

●●

●●

●●●

●●

●●

● ●

●●●

●●

●●

●●

●●●

●●

●●

●●

● ●●

●●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●●

● ●●

●●

●●●

●●

●●

●●●●

●●

●●

●●

●●●

● ●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●●

●● ●

●●

●●●

●●

●●●

● ●

● ●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

● ●

●● ●

●● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●● ●●●

●● ● ●

● ●

●●

●●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●●

●●●●

●●●

●●

●●

● ●●

●●

●●●●

●●

●●

● ●●

●●

●●

●●

●●●

●●●

● ●

●●●●

●●

●●●

●●

●●

●●

●● ●

●●

●●●

●●

●●

●●●

●●

● ●●

●●

●●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

●●●

●●

● ●

● ●

●●

● ●●

●●

●● ●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●● ●●

● ●●

●●

●●

● ●● ●●

● ●

●● ●

● ●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●●

●●●

●●

●●●

● ●

●●● ●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

Figure S4: Procrustes transformation of (PC1,PC2) coordinates from two different data sets.Arrows represent the residuals between reference (”target”) and the coordinates for rotationafter transformation.

25

−0.03 −0.01 0.01 0.03

−0.

04−

0.02

0.00

0.02

Modern individuals

PC1

PC

2

Africa

Middle East

EastAsia

Oceania

America

C/SAsia Europe

A

−0.03 −0.01 0.01 0.03

−0.

04−

0.02

0.00

0.02

Merged

PC1

PC

2

B

−0.02 0.00 0.02 0.04

−0.

06−

0.04

−0.

020.

000.

02

Ajv52

PC1

PC

2

C

−0.03 −0.01 0.01 0.03

−0.

04−

0.02

0.00

0.02

Ajv70

PC1

PC

2

D

−0.02 0.00 0.02 0.04

−0.

04−

0.02

0.00

0.02

Gök4

PC1

PC

2

E

−0.04 −0.02 0.00 0.02 0.04−

0.06

−0.

04−

0.02

0.00

0.02

Ire8

PC1

PC

2

F

Neolithic hunter−gatherer: Ajv52Ajv70Ire8Neolithic farmer:Gök4

Figure S5: PCA and Procrustes analysis using individuals sampled worldwide in theHM3+FINHM+HGDP data set. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformedPCAs of each Neolithic individual separately.

26

−0.10 0.00 0.10 0.20

−0.

10−

0.05

0.00

0.05

0.10

Modern individuals

PC2

PC

1

Orcadian

Middle East

Basque

Sardinian

NorthItalian

FIN

LSFINRussian

AdygeiTSI

French

A

−0.15 −0.10 −0.05 0.00 0.05 0.10

−0.

050.

000.

050.

10

Merged

PC2

PC

1

Orcadian

Middle East

Basque

Sardinian

NorthItalian

FIN

LSFINRussian

AdygeiTSI

French

B

−0.15 −0.05 0.05 0.10 0.15

−0.

050.

000.

050.

10

Ajv52

PC2

PC

1

Orcadian

Middle East

Basque

Sardinian

NorthItalian

FIN

LSFINRussian

AdygeiTSI

French

C

−0.15 −0.05 0.00 0.05 0.10

−0.

10−

0.05

0.00

0.05

0.10

Ajv70

PC2

PC

1

Orcadian

Middle East

Basque

SardinianNorthItalian

FIN

LSFINRussian

AdygeiTSI

French

D

−0.15 −0.10 −0.05 0.00 0.05

−0.

10−

0.05

0.00

0.05

0.10

Gök4

PC2

PC

1

Orcadian

Middle East

Basque

Sardinian

NorthItalian

FIN

LSFINRussian

AdygeiTSI

French

E

−0.15 −0.05 0.00 0.05 0.10−

0.10

−0.

050.

000.

050.

10

Ire8

PC2

PC

1

Orcadian

Middle East

Basque

Sardinian

NorthItalian

FIN

LSFINRussian

AdygeiTSI

French

F

Neolithic hunter−gatherer: Ajv52Ajv70Ire8Neolithic farmer:Gök4

Figure S6: PCA and Procrustes analysis using individuals sampled from Europe and theLevant in the HM3+FINHM+HGDP data set. A) PCA of modern individuals only. B)Procrustes-transformed and combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individual separately.

27

−0.04 0.00 0.04 0.08

0.05

0.00

−0.

05−

0.10

Modern individuals

PC2

PC

1

Cyprus

GreeceTurkeyItalyAlbania

SpainRomania

HungaryFrance

GermanyIrelandDenmarkSweden

UkraineRussia

Finland

Finland (LS)

Latvia

A

−0.02 0.00 0.02 0.04

0.04

0.02

0.00

−0.

04

Merged

PC2

PC

1

CyprusGreece TurkeyItaly

AlbaniaSpainRomania

HungaryFrance

GermanyIrelandDenmarkSweden UkraineRussia

Finland

Latvia

B

−0.05 0.00 0.05 0.10

−0.

040.

000.

040.

08

Ajv52

PC2

PC

1

CyprusGreece

TurkeyItalyAlbania

SpainRomaniaHungary

FranceGermanyIreland

Denmark

SwedenUkraine

Russia

Finland

Finland (LS)

Latvia

C

−0.06 −0.02 0.02 0.06

−0.

040.

000.

040.

08

Ajv70

PC2

PC

1

CyprusGreeceTurkeyItalyAlbaniaSpain

RomaniaHungary

France

GermanyIreland

DenmarkSwedenUkraineRussia

Finland

Finland (LS)

Latvia

D

−0.04 0.00 0.02 0.04 0.06

−0.

040.

000.

040.

08

Gök4

PC2

PC

1

Cyprus

GreeceTurkeyItaly

AlbaniaSpainRomania

HungaryFrance

GermanyIreland Denmark

Sweden

UkraineRussia

Finland

Finland (LS)

Latvia

E

−0.03 −0.01 0.01 0.03−

0.02

0.00

0.02

0.04

0.06

Ire8

PC2

PC

1

CyprusGreece

TurkeyItalyAlbaniaSpain

RomaniaHungaryFranceGermanyIrelandDenmark

Sweden UkraineRussia

Finland

Finland (LS)

Latvia

F

Neolithic hunter−gatherer: Ajv52Ajv70Ire8Neolithic farmer:Gök4

Figure S7: PCA and Procrustes analysis using individuals in the FINHM+POPRES dataset. A) PCA of modern individuals only. B) Procrustes-transformed and combined PCAs ofall 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of each Neolithic individualseparately.

28

−0.10 −0.05 0.00 0.05

0.05

0.00

−0.

05−

0.10

Modern individuals

PC2

PC

1

A

−0.10 −0.05 0.00 0.05

0.05

0.00

−0.

05−

0.10

Merged

PC2

PC

1

B

−0.10 −0.05 0.00 0.05 0.10

0.05

0.00

−0.

05−

0.10

Ajv52

PC2

PC

1

C

−0.10 −0.05 0.00 0.05 0.10

0.05

0.00

−0.

05−

0.10

Ajv70

PC2

PC

1

D

−0.15 −0.05 0.00 0.05 0.10

0.05

0.00

−0.

05−

0.10

Gök4

PC2

PC

1

E

−0.10 −0.05 0.00 0.05 0.100.

050.

00−

0.05

−0.

10

Ire8

PC2

PC

1

F

FIN (Finland)GBR+CEU (Gr. Britain/Utah)IBS (Spain)TSI (Italy)

Neolithic hunter−gatherer: Ajv52Ajv70Ire8

Neolithic farmer:Gök4

Figure S8: PCA and Procrustes analysis using individuals sampled from Europe in the1KGPomni data set. A) PCA of modern individuals only. B) Procrustes-transformed andcombined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of eachNeolithic individual separately.

29

−0.02 0.02 0.06 0.10

0.06

0.02

−0.

02−

0.06

Modern individuals

PC2

PC

1

Northern SwedenCentral Sweden

Southern Sweden

A

−0.02 −0.01 0.00 0.01 0.02 0.03

0.02

0.00

−0.

02−

0.04

Merged

PC2

PC

1 Northern SwedenCentral Sweden

Southern Sweden

B

−0.04 −0.02 0.00 0.02 0.04

0.06

0.02

−0.

02

Ajv52

PC2

PC

1

TSIIBS

CEUGBR

FIN

Northern SwedenCentral SwedenSouthern Sweden

C

−0.06 −0.02 0.02 0.06

0.06

0.02

−0.

02

Ajv70

PC2

PC

1

Northern SwedenCentral Sweden

Southern Sweden

D

−0.04 −0.02 0.00 0.02 0.04

0.04

0.02

0.00

−0.

02

Gök4

PC2

PC

1

Northern SwedenCentral SwedenSouthern Sweden

E

−0.02 −0.01 0.00 0.01 0.02 0.030.

020.

010.

00−

0.01

−0.

03

Ire8

PC2

PC

1

Northern SwedenCentral SwedenSouthern Sweden

F

FIN (Finland)GBR+CEU (Gr. Britain/Utah)IBS (Spain)TSI (Italy)Northern SwedenCentral SwedenSouthern SwedenNeolithic hunter−gatherer: Ajv52Ajv70Ire8Neolithic farmer:Gök4

Figure S9: PCA and Procrustes analysis using individuals sampled from Europe in the1KGPomni+Sweden data set. A) PCA of modern individuals only. B) Procrustes-transformedand combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs ofeach Neolithic individual separately. Centroids for Southern-, Central and Northern Swedenare indicated.

30

−0.15 −0.10 −0.05 0.00

−0.

020.

000.

020.

04

PC2

PC

1

●●●●●●●●●●●●

●●●

●●●●●●●●●●

●●●

●●

●●

●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

● ●●●

●●●●●●●●●●●●

●●●●●●

●●●●●●●

● ●●●●●●●

●● ●●●●●●●●●●●●●

●●●●●●●●●●●

●●●●●●

●●●● ●●●●●●

●●●

●●●●●● ●●●●●●●●●●●●

●●●●●●●

●●

●●●●●●●●●●●●●●●

●●●●●

●●●●●●

●●

●●●

●●●●● ●●●●●

●● ●●●●●●●●●●

●●●● ●●●●●●●

●●

●●●●●●●●●●●●●

●●●●●

●●●●●●●●

●●●●

●●●

●●

●●●●●●

●●●●

●●

● HP

East Asia

Oceania

America

Central/SouthAsia

Europe

Figure S10: PCA and Procrustes analysis using data from two libraries prepared from apaleo-eskimo hair sample (Saqqaq; 11,72) using Phusion (denoted ”P”) and Hifi polymerase(denoted ”H”) comprising 38, 107 and 23, 587 SNPs, respectively, together with non-Africanindividuals in the HM3+FINHM+HGDP data set.

31

Ajv52 (hunter-gatherer)

Ajv70 (hunter-gatherer)

Ire8 (hunter-gatherer)

Gok4 (farmer)

Finl

and

(LS)

Finl

and

Rus

sian

Orc

adia

n

NW

Eur

opea

n an

cest

ry (C

EU)

Fren

ch

Bas

que

Nor

th It

alia

n

Italy

(TSI

)

Tusc

an

Sard

inia

n

Ady

gei

Dru

ze

Ajv52 (hunter-gatherer)

Ajv70 (hunter-gatherer)

Ire8 (hunter-gatherer)

Gok4 (farmer)

Ajv52 (hunter-gatherer)

Ajv70 (hunter-gatherer)

Ire8 (hunter-gatherer)

Gok4 (farmer)

K = 2

K = 3

K = 4

Figure S11: Model-based clustering of Neolithic and extant Europeans (plus the levantine Druze)assuming 2, 3 and 4 clusters

32

0 2 4 6 8 10 12 14

0.00

0.10

0.20

0.30

Ajv52

ATG

A

Bas

e fr

eque

ncy

in r

ead

give

n C

in r

efer

ence

B

ase

freq

uenc

y

in r

efer

ence

Distance from 5' end of sequence read

0 2 4 6 8 10 12 14

0.00

0.10

0.20

0.30

Ajv70

ATG

B

0 2 4 6 8 10 12 14

0.00

0.10

0.20

0.30

Ire8

ATG

C

0 2 4 6 8 10 12 14

0.00

0.10

0.20

0.30

Gök4

ATG

D

−5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

ATGC

E

−5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

ATGC

F

−5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

ATGC

G

−5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

ATGC

H

Figure S12: Patterns of nucleotide misincorporation and fragmentation in libraries preparedfrom Neolithic samples and amplified by Hifi polymerase. A to D shows average base frequencyin sequence reads at sites where a C is found in the reference sequence (hg18) for Ajv52, Ajv70,Ire8 and Gok4, respectively. E to H shows base frequencies in the reference around the 5’-endof the sequence reads.

33

−0.2 0.0 0.2 0.4

0.10

0.05

0.00

−0.

05

Modern individuals

PC2

PC

1

A

−0.04 0.00 0.02 0.04 0.06

0.06

0.04

0.02

0.00

−0.

04

Merged

PC2

PC

1

B

−0.05 0.00 0.05 0.10

0.06

0.04

0.02

0.00

−0.

04

Ajv52

PC2

PC

1

C

−0.10 0.00 0.05 0.10 0.15 0.20

0.05

0.00

−0.

05

Ajv70

PC2

PC

1

D

−0.05 0.00 0.05 0.10

0.08

0.04

0.00

−0.

04

Gök4

PC2

PC

1

E

−0.10 −0.05 0.00 0.050.

100.

050.

00−

0.05

Ire8

PC2

PC

1

F

FIN (Finland)TSI (Italy)

Neolithic hunter−gatherer: Ajv52Ajv70Ire8

Neolithic farmer:Gök4

Figure S13: PCA and Procrustes analysis using individuals of Italian (TSI) and Finnish(FIN) ancestry in the 1KGPomni data set and Neolithic sequences with evidence of cytosinedeamination in the first 10 bp. A) PCA of modern individuals only. B) Procrustes-transformedand combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of eachNeolithic individual separately.

34

−0.2 0.0 0.2 0.4

0.10

0.05

0.00

−0.

05

Modern individuals

PC2

PC

1

A

−0.05 0.00 0.05

0.06

0.02

−0.

02−

0.06

Merged

PC2

PC

1

B

−0.10 −0.05 0.00 0.05 0.10

0.06

0.02

−0.

02−

0.06

Ajv52

PC2

PC

1

C

−0.15 −0.10 −0.05 0.00 0.05 0.10

0.04

0.02

0.00

−0.

04

Ajv70

PC2

PC

1

D

−0.15 −0.05 0.00 0.05 0.10

0.06

0.02

−0.

02−

0.06

Gök4

PC2

PC

1

E

−0.10 −0.05 0.00 0.05 0.100.

060.

02−

0.02

−0.

06

Ire8

PC2

PC

1

F

FIN (Finland)TSI (Italy)

Neolithic hunter−gatherer: Ajv52Ajv70Ire8

Neolithic farmer:Gök4

Figure S14: PCA and Procrustes analysis using individuals of Italian (TSI) and Finnish(FIN) ancestry in the 1KGPomni data set and Neolithic sequences without evidence of cytosinedeamination in the first 10 bp. A) PCA of modern individuals only. B) Procrustes-transformedand combined PCAs of all 4 Neolithic individuals. (C-F) Procrustes-transformed PCAs of eachNeolithic individual separately.

35

0.00 0.02 0.04 0.06 0.08 0.10

−3.

0−

2.0

−1.

00.

0NA12891 (CEU)

Divergence time T (2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

ANeolithic hunter−gatherers (PWC)Pooled Neolithic hunter−gatherersNeolithic farmer (TRB)

0.00 0.02 0.04 0.06 0.08 0.10

−3.

0−

2.0

−1.

00.

0

NA12892 (CEU)

Divergence time T (2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

BNeolithic hunter−gatherers (PWC)Pooled Neolithic hunter−gatherersNeolithic farmer (TRB)

Figure S15: Relative log-likelihood curves for population divergence time T between Neolithicindividuals and complete genomes of European descent.

36

0.00 0.05 0.10 0.15 0.20

−3.

0−

2.0

−1.

00.

0

YH (Chinese)

Divergence time T (2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

A

0.00 0.05 0.10 0.15 0.20

−3.

0−

2.0

−1.

00.

0

SJK (Korean)

Divergence time T (2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

B

0.00 0.05 0.10 0.15 0.20

−3.

0−

2.0

−1.

00.

0

NA19238 (YRI)

Divergence time T (2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

C

0.00 0.05 0.10 0.15 0.20

−3.

0−

2.0

−1.

00.

0

NA19239 (YRI)

Divergence time T(2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

D

0.00 0.05 0.10 0.15 0.20

−3.

0−

2.0

−1.

00.

0

NA18507 (YRI)

Divergence time T(2Ne generations)

Rel

ativ

e lo

g−lik

elih

ood

E

Neolithic hunter−gatherers (PWC)Neolithic farmer (TRB)

Figure S16: Relative log-likelihood curves for population divergence time T between Neolithicindividuals and complete genomes from East Asian and African individuals.

37

SanYoruba Druze Neolithic farmer

TSI

Sardinian

NorthItalian

FrenchBasqueFrench

CEU

OrcadianRussian

FINLSFIN

Neolithic hunter−gatherers

Figure S17: Neighbour-joining supertree of 1,271 four-taxa topologies inferred by concor-dance tests.

38

Gök4

DruzePopulation XLSFINColombianKaritiana

Surui

Figure S18: Illustration of the model used for estimating the fraction of Neolithic farmerancestry in extant European populations.

39

Figure S19: Estimated Neolithic farmer ancestry (red) in European populations.

40

0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

Neolithic farmer ancestry (Outgroup: Americans)

Neo

lithi

c fa

rmer

anc

estr

y (O

utgr

oup:

Pap

uans

)

Sardinian

FIN

French

NorthItalian

CEU

FrenchBasque

Russian

Tuscan

TSI

Orcadian

Adygei

1

3

2

R = 0.99, P < 1e−10

A

0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

Neolithic farmer ancestry (Outgroup: Americans)

Neo

lithi

c fa

rmer

anc

estr

y (O

utgr

oup:

JP

T+

CH

B)

Sardinian

FIN

French

NorthItalian

CEU

FrenchBasque

Russian

Tuscan

TSI

Orcadian

Adygei1

3

2

R = 0.98, P < 1e−8

B

0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

1.2

Neolithic farmer ancestry (Outgroup: Americans)

Neo

lithi

c fa

rmer

anc

estr

y (O

utgr

oup:

Uyg

ur)

Sardinian

FIN

French

NorthItalian

CEU

FrenchBasque

Russian

Tuscan

TSI

Orcadian

Adygei

1

32

R = 0.99, P < 1e−13

C

Figure S20: Substituting Native Americans as an outgroup yields highly correlated estimatesof Neolithic ancestry in European populations. The numbers 1, 2 and 3 in the plots refer toNorthern, Central, and Southern Sweden, respectively. In all panels, estimates using NativeAmericans are displayed on the x-axis, and estimates using a different outgroup are on they-axis. Alternative outgroups: A) Papuans, B) JPT+CHB (Japanese and Chinese), and C)Uygur (Central Asia).

41

−0.10 −0.05 0.00 0.05 0.10 0.15

−0.

10−

0.05

0.00

0.05

0.10

PC2

PC

1 ●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

● ●

●●

● ●●●●

●●●● ●●

●● ●

●●●

● ●●

●●●

●●●

●●●

● ●

● ● ●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●● ●

●●

●●●

● ●●●

●●

●●

● ●

●●● ●

●●●

● ●

●●

●●

●● ●

●●

●●●●●

●● ●

●●

●●

●●

●●●

● ●

●●●

●● ●

●●

●●

●●

●●

●●

●●●

●●

●●● ●

●●

●● ●

● ●

●●●

●●

●●

● ●●●

●●

●●●

● ●

●●●

●●

● ●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

Orcadian

Basque

Sardinian

North Italian

Druze

Finland

Russian

AdygeiTuscans

French

Southern/Central Sweden

Northern Sweden

Neolithichunter−gatherers

Figure S21: Principal component analysis of combined data (14,113 SNPs) from the threeNeolithic hunter-gatherers (Ajv52, Ajv70 and Ire8) and extant Europeans (including Swedes)and the Levantine Druze.

42

GR

K (h

unte

r-ga

ther

er)

Finl

and

(LS)

Finl

and

Rus

sian

Nor

ther

n Sw

eden

Cen

tral

Sw

eden

Sout

hern

Sw

eden

Orc

adia

n

NW

Eur

opea

n an

cest

ry (C

EU)

Fren

ch

Bas

que

Nor

th It

alia

n

Italy

(TSI

)

Tusc

an

Sard

inia

n

Ady

gei

Dru

ze

K = 2

K = 4

K = 3

Figure S22: Model-based clustering of combined data (14,113 SNPs) from the three Ne-olithic hunter-gatherers (Ajv52, Ajv70 and Ire8) and extant Europeans (including Swedes) andthe Levantine Druze assuming 2, 3 and 4 clusters.

43

Table S1: Summary of samples. *Pagophilus groenlandicus, #Canis familiaris.

Sample Species Context Age (years) Sex LocalityAjv52 Human PWC ca. 7 - AjvideAjv70 ” PWC 20-25 Male AjvideIre8 ” PWC 40-50 Female IreGok4 ” TRB ca. 20 Female Gokhem 94:1Ste7 ” TRB - - Gokhem 94:1GSAj15 Harp seal* - - - AjvideCFBe3a Domestic dog# - - - Linkoping

44

Table S2: Index sequences

Sample IndexAjv52 TGGTCAAjv70 ATTGGCIre8 GATCTGGok4 GCCTAASte7 ACATCGGSAj15 CACTGTCFBe3a CGTGAT

45

Table S3: Summary of initial quality control and mapping. In the main text we reportfraction of human reads based on Phusion-amplified libraries to allow direct comparison withthe negative animal controls.

Sample Indexed reads Hits to hg18 (%) Hits after rmdup (%)

PhusionAjv52 r1 22,799,659 766,058 (3.36%) 543,568 (2.38%)Ajv70 r1 19,842,862 1, 375,902 (6.93%) 962,367 (4.85%)Gok4 r1 10,409,532 1,464,876 (14.07%) 656,564 (6.31%)Ire8 r1 15,582,175 842, 520 (5.41%) 422,951 (2.71%)Ste7 r1 12,811,226 96,619 (0.75%) 45,661 (0.36%)GSAj15 6,033,355 16,430 (0.27%) 1,172 (0.02%)CFBe3a 12,278,562 28,697 (0.23%) 14,954 (0.12%)

HifiAjv52 r2 18,768,393 1,046,737 (5.58%) 862,396 (4.59%)Ajv70 r2 20,367,126 1,487,263 (7.30%) 1,309,407 (6.43%)Gok4 r2 19,460,817 2,121,404 (10.90%) 1,059,387 (5.44%)Ire8 r2 17,032,858 720,100 (4.23%) 564,792 (3.32%)Ste7 r2 18,633,041 223,274 (1.20%) 68,207 (0.37%)

Ste7 r3 13,079,481 190,104 (1.45%) 30,048 (0.23%)Ste7 r4 13,724,774 205,805 (1.50%) 26,147 (0.19%)Ste7 r5 13,761,976 205,213 (1.49%) 24,691 (0.18%)Ste7 r6 13,678,842 202,368 (1.48%) 24,164 (0.18%)Ste7 r7 13,685,086 205,167 (1.50%) 24,483 (0.18%)Ste7 r8 13,531,935 204,339 (1.51%) 24,323 (0.18%)

TotalAjv52 41,568,052 1,812,795 (4.36%) 1,405,964 (3.38%)Ajv70 40,209,988 2,863,165 (7.12%) 2,271,774 (5.65%)Gok4 29,870,349 3,586,280 (12.01%) 1,715,951 (5.74%)Ire8 32,615,033 1,562,620 (4.79%) 987,743 (3.03%)Ste7 112,906,361 318,710 (0.28%) 173,849 (0.15%)

46

Table S4: Summary of mtDNA assembly.

Sample Reads Mean depth rCRS coverage % rCRS Total bases Mean lengthAjv52 4,030 14.0 16,545 99.86% 232,332 57.7Ajv70 3,752 13.8 16,541 99.83% 227,824 60.7Ire8 3,027 9.8 16,553 99.90% 162,753 53.8Gok4 1,183 4.2 16,037 96.79% 67,861 57.4Ste7 3,892 14.2 16,551 99.89% 235,119 60.4

47

Ta

ble

S5

:S

umm

ary

ofha

plog

roup

(Hg)

defi

ning

pol

ymor

phis

ms.

Pos

itio

nsw

ere

nose

quen

ceda

taw

ere

avai

labl

ear

ein

dica

ted

wit

ha

*.

Sam

ple

Hg

inre

f.8

(d-l

oop

)N

ewH

gP

rese

ntH

gde

fini

ngS

NP

sA

bsen

tH

gde

fini

ngS

NP

sO

ther

diag

nost

icS

NP

sA

jv52

VV

72C

263G

750G

1438

G27

06G

4580

A70

28T

1532

6G15

904T

1629

8C

4769

G88

60G

*-

Ajv

70U

4/H

1bU

4d/U

4a2

73G

263G

T31

0C49

9A62

9C75

0G14

38G

1811

G27

06G

4646

C47

69G

5999

C60

47G

7028

T11

332T

1146

7G11

719A

1230

8G12

372A

1462

0T14

766T

1532

6G15

693C

1635

6C

195C

2405

.1C

8860

G*/

195C

8818

T88

60G

*93

G43

94T

4577

T46

16T

6083

T74

24G

1008

8T16

093C

1651

9C

Ire8

U4/

H1b

U4d

73G

263G

499A

629C

750G

1438

G18

11G

2706

G47

69G

5999

C60

47G

7028

T11

332T

1146

7G11

719A

1230

8G12

372A

1462

0T14

766T

1532

6G15

693C

1635

6C

195C

2405

.1C

4646

C88

60G

*93

G67

55A

7978

T91

92A

1008

8T11

337G

1504

9T16

093C

1651

9C

Gok

4H

H26

3G75

0G14

38G

1532

6G47

69G

8860

G*

298T

4814

T60

86C

8345

T11

071T

1614

516

167T

1651

9CS

te7

TT

2b3/

T2b

3a73

G15

1T26

3G70

9A75

0G93

0A14

38G

1888

A27

06G

4216

C49

17G

7028

T86

97A

1046

3C10

750G

1125

1G11

719A

1181

2G13

368A

1423

3G14

766T

1490

5A15

326G

1545

2A15

607G

1592

8A16

126C

1629

4T16

304C

4769

G51

47A

*88

60G

1629

6/15

2C47

69G

5147

A*

5656

G88

60G

1629

2T16

296T

1473

T88

38A

8856

A16

519C

48

Table S6: Summary of mtDNA polymorphisms not informative for haplogroup assignment.

Sample Non-diagnostic SNPsAjv52 1224T 1443C 1449A 1933T 4776A 5228T 5763A 5882T 5890T 6880T 7095T

7098T 7380A 7382A 7422A 7588A 7743T 9693T 14622TAjv70 4398T 4585T 4590G 4599T 4604T 4617T 4624T 4633T 4636T 6869TIre8 4T 17T 18T 29T 33T 1224T 1226T 1315A 2174A 2182A 2185A 4153A 4189T

5211T 5974T 6075A 6181T 6993A 7412T 7427T 7974T 8351T 8375T 8998A10213T 13860T

Gok4 320T 791A 1084T 1108T 1223T 1334A 1472A 1601T 1918T 2956G 2981T 3835T3868C 4003C 4092A 4220C 4656A 4724T 4848A 4849A 5172A 5245C 5490T5491T 5540A 5671A 5679A 5688A 5693C 5784T 5818T 5871A 5899T 5901T6049A 6368T 6402G 6421T 6434T 6447T 6448T 6449T 6463T 6568T 6573A6576A 6884T 7090A 7108A 7183G 7185G 7618T 7627T 7628T 7633T 7792T7794T 7801T 8243A 8346T 8352T 8610T 8719A 8923G 8941T 9030T 9693T10255G 11073T 11074T 11219T 12110T 12264A 12267T 13431T 13436T 13455T13458T 13464T 13625T 14528T 14532T 14533T 14535T 14685A 14771T 15434T15436T 15437A 16369A 16379T 16380T

Ste7 1951T 4358A 4488T 4521T 4776A 6081A 6087A 6156T 6753A 6754A 7367T7369T 7370T 7382A 7462T 7470T 7957T 7958T 8812C 9693T 13347T

49

Table S7: Triallelic site frequency in genome sequence data.

Total number of SNPs Neolithic data has Neolithic data hasSample (deamination excluded) third allele (count) third allele (%)Ajv52 94,650 2,276 2.46%Ajv70 153,973 3,833 2.55%Ire8 49,744 2,248 4.73%Gok4 105,274 2,929 2.86%Ste7 8,794 1,303 17.39%

50

Table S8: Triallelic site frequency in genotype array data.

Total number of SNPs Neolithic data has Neolithic data hasSample (deamination excluded) third allele (count) third allele (%)Ajv52 26,495 435 1.61%Ajv70 44,788 837 1.83%Ire8 11,763 279 2.32%Gok4 27,910 763 2.66%Ste7 1,348 68 4.80%

51

Table S9: Number of SNPs called using sequences with (’Degraded’) and without (’Notdegraded’) evidence of cytosine deamination in the the 5’ end.

Sample Degraded Not degradedAjv52 2,551 4,870Ajv70 3,070 8,946Ire8 686 1,532Gok4 1,273 5,926

52

Table S10: Allele sharing between Neolithic- and extant European populations. Sortedaccording to allele sharing with Neolithic farmer.

Population Label Sharing with Sharing with Neolithic Number ofin Fig. 3C Neolithic farmer hunter-gatherers Long Lat chr.

Cyprus Cyp 68.20% 68.21% 33 35 8Greece Gre 67.94% 68.51% 22 39 16France Fra 67.89% 68.80% 2 46 178Netherlands Net 67.88% 68.79% 5 52 34Romania Rom 67.84% 68.62% 25 46 28Italy Ita 67.81% 68.43% 12 42 438Germany Ger 67.80% 68.80% 10 51 142Croatia Cro 67.76% 68.67% 15 45 16Portugal Por 67.75% 68.59% -8 39 256Belgium Bel 67.73% 68.78% 4 50 86Spain Spa 67.72% 68.59% -4 40 272Poland Pol 67.71% 68.98% 20 52 44Austria Aus 67.69% 68.65% 13 47 28United Kingdom UK 67.68% 68.79% -2 53 400Serbia Ser 67.67% 68.62% 20 44 88Macedonia Mac 67.62% 68.58% 22 41 8Sweden Swe 67.61% 68.84% 15 62 20Ireland Ire 67.61% 68.71% -8 53 122Hungary Hun 67.60% 68.58% 20 47 38Russian Rus 67.56% 68.72% 37 55 12Turkey Tur 67.55% 67.98% 35 39 8FIN FIN 67.47% 68.77% 25 61 80LSFIN LSFIN 67.44% 68.79% 26 64 162Bosnia Bos 67.39% 68.81% 17 44 18Scotland Sco 67.35% 68.81% -4 56 10

53

Table S11: Population divergence time of Neolithic individuals from modern genomes ofNorthwest European ancestry (CEU).

NA12891 NA12892

Sample Nconc Ndisc T Nconc Ndisc TAjv52 3,986 7,549 0.0185 3,843 7,469 0.0096Ajv70 6,132 11,727 0.0151 6,075 11,750 0.0113Ire8 2,016 3,914 0.0100 1,960 3,840 0.0069Gok4 4,324 8,274 0.0150 4,362 8,129 0.0241PWC (Ajv52+Ajv70+Ire8) 11,843 22,628 0.0155 11,608 22,514 0.0103

54

Ta

ble

S1

2:

Pop

ulat

ion

dive

rgen

ceti

me

ofN

eolit

hic

indi

vidu

als

from

mo

dern

geno

mes

ofE

ast

Asi

anan

cest

ryan

dW

est

Afr

ican

ance

stry

(YR

I).

YH

SJK

NA

1923

8(Y

RI)

NA

1923

9(Y

RI)

NA

1850

7(Y

RI)

Nco

nc

Ndis

cT

Nco

nc

Ndis

cT

Nco

nc

Ndis

cT

Nco

nc

Ndis

cT

Nco

nc

Ndis

cT

Ajv

524,

918

6,64

20.

1490

4,90

86,

381

0.16

504,

252

7,49

60.

0438

4,18

77,

773

0.02

544,

177

8,06

60.

0118

Ajv

707,

761

10,2

700.

1574

7,83

49,

903

0.17

736,

826

11,4

870.

0609

6,75

012

,244

0.03

366,

521

12,3

490.

0185

Ire8

2,41

03,

434

0.12

622,

481

3,23

70.

1635

2,18

23,

793

0.04

592,

113

3,99

40.

0192

2,12

14,

197

0.00

36G

ok4

5,55

67,

226

0.16

495,

378

6,95

10.

1676

4,65

38,

001

0.05

294,

730

8,57

70.

0337

4,67

68,

723

0.02

37P

WC

14,7

2419

,854

0.14

9314

,861

19,0

550.

1710

12,9

2122

,244

0.05

2512

,721

23,3

970.

0287

12,5

2423

,987

0.01

46

55

Table S13: Examples of concordance tests relating the Neolithic individuals with extanthuman populations.

(A,B), (C,D) Nconc Ndisc1 Ndisc2 C SE Z(Sardinian,Gok4),(French,TSI) 564 493 437 0.067 0.028 2.43(Yoruba,Druze),(Gok4,LSFIN) 598 500 498 0.089 0.025 3.50(Druze,Gok4),(CEU,FIN) 556 476 475 0.078 0.024 3.20(FrenchBasque,Gok4),(CEU,Russian) 532 457 457 0.076 0.023 3.30(TSI,Gok4),(Orcadian,FIN) 578 477 474 0.096 0.027 3.55

(Druze,Sardinian),(Russian,Neolithic HGs) 1,609 1,439 1,399 0.055 0.016 3.47(Druze,TSI),(LSFIN,Neolithic HGs) 1,637 1,486 1,486 0.048 0.015 3.20(Druze,Sardinian),(LSFIN,Neolithic HGs) 1,591 1,426 1,376 0.055 0.015 3.66(Druze,Sardinian),(Orcadian,Neolithic HGs) 1,599 1,443 1,406 0.051 0.017 3.10(Druze,Sardinian),(FIN,Neolithic HGs) 1,670 1,487 1,436 0.058 0.016 3.73

56

Table S14: 4-population tests for admixture with Gok4 assuming four different outgroups(O). Tests with |Z| > 3 are highlighted in bold.

Pop X Z(O = Yoruba) Z(O = N. Am.) Z(O = Papuan) Z(O = Druze)Sardinian -5.74 -4.40 -4.50 -1.15TSI+Tuscan -5.03 -3.33 -3.66 -0.64North Italian -5.40 -3.58 -3.80 -0.67French Basque -5.53 -3.46 -3.74 -1.13French -5.30 -3.21 -3.53 -1.13CEU -5.35 -3.12 -3.64 -1.22Orcadian -4.97 -2.61 -3.39 -0.82Russian -4.78 -1.66 -2.51 -0.84FIN -4.84 -1.91 -2.70 -1.00

57

Table S15: Estimated Neolithic farmer ancestry in European populations.

Population Gok4-related ancestry (c) Standard errorSardinian 95.3% 13.4%North Italian 78.9% 12.1%TSI 77.4% 11.1%Tuscan 69.8% 11.6%French Basque 65.0% 10.2%French 55.3% 8.5%CEU 51.7% 7.8%Orcadian 46.9% 7.8%Adygei 46.4% 7.7%Southern Sweden 41.1% 7.8%Central Sweden 35.8% 7.1%Northern Sweden 31.1% 6.3%FIN 14.0% 3.4%Russian 11.3% 4.1%

58

References and Notes 1. G. V. Childe, The Dawn of European Civilization (Kegan, Paul, Trench & Trubner. London, 1925).

2. P. Menozzi, A. Piazza, L. Cavalli-Sforza, Synthetic maps of human gene frequencies in Europeans. Science 201, 786 (1978). doi:10.1126/science.356262 Medline

3. L. Cavalli-Sforza, P. Menozzi, A. Piazza, The History and Geography of Human Genes (Princeton Univ. Press, Princeton, NJ, 1994).

4. J. Novembre, S. Ramachandran, Perspectives on human population structure at the cusp of the sequencing era. Annu. Rev. Genomics Hum. Genet. 12, 245 (2011). doi:10.1146/annurev-genom-090810-183123 Medline

5. C. Renfrew, P. G. Bahn, Archaeology: Theories, Methods, and Practice (Thames and Hudson, New York, 1991).

6. M. Stoneking, J. Krause, Learning about human population history from ancient and modern genomes. Nat. Rev. Genet. 12, 603 (2011). doi:10.1038/nrg3029 Medline

7. B. Bramanti et al., Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. Science 326, 137 (2009). doi:10.1126/science.1176869 Medline

8. H. Malmström et al., Ancient DNA reveals lack of continuity between neolithic hunter-gatherers and contemporary Scandinavians. Curr. Biol. 19, 1758 (2009). doi:10.1016/j.cub.2009.09.017 Medline

9. W. Haak et al.; Members of the Genographic Consortium, Ancient DNA from European early neolithic farmers reveals their near eastern affinities. PLoS Biol. 8, e1000536 (2010). doi:10.1371/journal.pbio.1000536 Medline

10. M. Lacan et al., Ancient DNA reveals male diffusion through the Neolithic Mediterranean route. Proc. Natl. Acad. Sci. U.S.A. 108, 9788 (2011). doi:10.1073/pnas.1100723108 Medline

11. M. Rasmussen et al., Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757 (2010). doi:10.1038/nature08835 Medline

12. M. Malmer, The Neolithic of South Sweden: TRB, GRK, and STR (Royal Swedish Academy of Letters History and Antiquities; Almquist & Wiksell International, Stockholm, Sweden, 2002).

13. H. Malmström et al., More on contamination: The use of asymmetric molecular behavior to identify authentic ancient human DNA. Mol. Biol. Evol. 24, 998 (2007). doi:10.1093/molbev/msm015 Medline

14. H. Malmström et al., High frequency of lactose intolerance in a prehistoric hunter-gatherer population in northern Europe. BMC Evol. Biol. 10, 89 (2010). doi:10.1186/1471-2148-10-89 Medline

15. K.-G. Sjögren, Megaliths and Identities, paper presented at the third European Megalithic Studies Group Meeting, 13 to 15 May 2010, Kiel University, Germany, M. Furholt, F. Lüth, J. Müller, C. Scarre, Eds.; www.jungsteinsite.de.

16. See supplementary materials on Science Online.

17. K.-G. Sjögren, T. Price, T. Ahlström, Megaliths and mobility in south-western Sweden. Investigating relationships between a local society and its neighbours using strontium isotopes. J. Anthropol. Archaeol. 28, 85 (2009). doi:10.1016/j.jaa.2008.10.001

18. G. Lunter, M. Goodson, Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21, 936 (2011). doi:10.1101/gr.111120.110 Medline

19. A. W. Briggs et al., Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl. Acad. Sci. U.S.A. 104, 14616 (2007). doi:10.1073/pnas.0704665104 Medline

20. N. Patterson, A. L. Price, D. Reich, Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006). doi:10.1371/journal.pgen.0020190 Medline

21. J. Z. Li et al., Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100 (2008). doi:10.1126/science.1153717 Medline

22. I. Surakka et al., Founder population-specific HapMap panel increases power in GWA studies through improved imputation accuracy and CNV tagging. Genome Res. 20, 1344 (2010). doi:10.1101/gr.106534.110 Medline

23. D. M. Altshuler et al.; International HapMap 3 Consortium, Integrating common and rare genetic variation in diverse human populations. Nature 467, 52 (2010). doi:10.1038/nature09298 Medline

24. M. R. Nelson et al., The Population Reference Sample, POPRES: A resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet. 83, 347 (2008). doi:10.1016/j.ajhg.2008.08.005 Medline

25. R. M. Durbin et al.; 1000 Genomes Project Consortium, A map of human genome variation from population-scale sequencing. Nature 467, 1061 (2010). doi:10.1038/nature09534 Medline

26. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655 (2009). doi:10.1101/gr.094052.109 Medline

27. M. Jakobsson, N. A. Rosenberg, CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801 (2007). doi:10.1093/bioinformatics/btm233 Medline

28. E. Salmela et al., Swedish population substructure revealed by genome-wide single nucleotide polymorphism data. PLoS ONE 6, e16747 (2011). doi:10.1371/journal.pone.0016747 Medline

29. J. Krause et al., A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr. Biol. 20, 231 (2010). doi:10.1016/j.cub.2009.11.068 Medline

30. L. Orlando et al., True single-molecule DNA sequencing of a pleistocene horse bone. Genome Res. 21, 1705 (2011). doi:10.1101/gr.122747.111 Medline

31. D. Reich, K. Thangaraj, N. Patterson, A. L. Price, L. Singh, Reconstructing Indian population history. Nature 461, 489 (2009). doi:10.1038/nature08365 Medline

32. P. Skoglund, A. Götherström, M. Jakobsson, Estimation of population divergence times from non-overlapping genomic sequences: Examples from dogs and wolves. Mol. Biol. Evol. 28, 1505 (2011). doi:10.1093/molbev/msq342 Medline

33. A. Fischer, Food for Feasting? An Evaluation of Explanations of the Neolithisation of Denmark and Southern Sweden, no. 12 in Sheffield Archaeological Monographs (J. R. Collis Publication, Sheffield, UK, 2002).

34. F. Hallgren, Identitet i praktik: lokala, regionala och överregionala sociala sammanhang inom nordlig trattbägarkultur, no. 17 in Coast to Coast-Book (Department of Archaeology and Ancient History, Uppsala University, Uppsala, Sweden, 2008).

35. M. Midgley, The Megaliths of Northern Europe (Routledge, New York, 2008).

36. N. Stenbäck, Människorna vid havet: platser och keramik på Ålands-öarna perioden 3500-2000 f.Kr., Stockholm studies in archaeology 28 (Department of Archaeology, Stockholm University, Stockholm, Sweden, 2008).

37. T. Ahlström, Lund Archaeol. Rev. 3, 37 (1997).

38. T. Ahlström, Stockholm Archaeol. Rep. 33, 325 (1997).

39. T. Ahlström, Grave or Ossuary? Osteological Finds from a Recently Excavated Passage Tomb in Falbygden, no. 1201 in British Archaelogical Reports International Series (Archaeopress, Oxford, 2003).

40. T. Ahlström, Underjordiska dödsriken humanosteologiska studier av neolitiska kollektivgravar, no. 18 of Coast to Coast Books (Göteborgs Universitet, Lund, Sweden, 2009).

41. K.-G. Sjögren, Fragment av ordning. Undersökning av överplöjda megalitgravar vid Frälsegården, Gökhems socken, Västergötland, no. 23 in Västergötlands Museum Rapport (Göteborg University, Lund, Sweden, 2008).

42. G. Eriksson et al.,, J. Anthropol. Archaeol. 27, 520 (2008).

43. G. Eriksson, Journal of Anthropological Archaeology pp. 135-162 (2004).

44. K. Lidén, Prehistoric Diet Transitions—An Archaeological Perspective, no. 1 in Theses and Papers in Scientific Archaeology (Stockholm University, Hässelby, Sweden, 1995).

45. C. Lindqvist, G. Possnert, The Subsistence Economy and Diet at Jakobs/Ajvide, Eksta Parish and other Prehistoric Dwelling and Burial sites on Gotland in Long-term Perspective, no. 13:a in Theses and Papers in Scientific Archaeology (Stockholm University, Hässelby, Sweden, 1997).

46. K. Lidén, Laborativ Arkeologi 9, 5 (1995).

47. K. Lidén, G. Eriksson, B. Nordqvist, A. Götherström, E. Bendixen, Antiquity 79, 23 (2004).

48. J. Norderäng, 14C-dateringar från Ajvide, no. 3 in Gotland University Press Monograph (Gotland University, Visby, Sweden, 2008).

49. G. Janzon, Gotlands Mellanneolitiska Gravar, Studies in North-European Archaeology (Almqvist & Wiksell, Stockholm, Sweden, 1974).

50. P. Molnar, Tracing Prehistoric Activities. Life Ways, Habitual Behaviour and Health of Hunter-Gatherers on Gotland, Theses and Papers in Osteoarchaeology no. 4. (Stockholm University, Stockholm, Sweden, 2002).

51. D. Y. Yang, B. Eng, J. S. Waye, J. C. Dudar, S. R. Saunders, Improved DNA extraction from ancient bones using silica-based spin columns. Am. J. Phys. Anthropol. 105, 539 (1998). doi:10.1002/(SICI)1096-8644(199804)105:4<539::AID-AJPA10>3.0.CO;2-1 Medline

52. H. Li et al.; 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078 (2009). doi:10.1093/bioinformatics/btp352 Medline

53. R. E. Green et al., Analysis of one million base pairs of Neanderthal DNA. Nature 444, 330 (2006). doi:10.1038/nature05336 Medline

54. R. E. Green et al., A draft sequence of the Neandertal genome. Science 328, 710 (2010). doi:10.1126/science.1188021 Medline

55. Z. Zhang, S. Schwartz, L. Wagner, W. Miller, A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203 (2000). doi:10.1089/10665270050081478 Medline

56. M. L. Kampmann et al., A simple method for the parallel deep sequencing of full influenza A genomes. J. Virol. Methods 178, 243 (2011). doi:10.1016/j.jviromet.2011.09.001 Medline

57. H. N. Poinar et al., Metagenomics to paleogenomics: Large-scale sequencing of mammoth DNA. Science 311, 392 (2006). doi:10.1126/science.1123360 Medline

58. A. Kloss-Brandstätter et al., HaploGrep: A fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum. Mutat. 32, 25 (2011). doi:10.1002/humu.21382 Medline

59. M. van Oven, M. Kayser, Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386 (2009). doi:10.1002/humu.20921 Medline

60. R. M. Andrews et al., Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23, 147 (1999). doi:10.1038/13779 Medline

61. P. Scheet, M. Stephens, A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629 (2006). doi:10.1086/502802 Medline

62. J. K. Pickrell et al., Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826 (2009). doi:10.1101/gr.087577.108 Medline

63. J. Novembre et al., Genes mirror geography within Europe. Nature 456, 98 (2008). doi:10.1038/nature07331 Medline

64. S. Levy et al., The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007). doi:10.1371/journal.pbio.0050254 Medline

65. J. Wang et al., The diploid genome sequence of an Asian individual. Nature 456, 60 (2008). doi:10.1038/nature07484 Medline

66. S. M. Ahn et al., The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group. Genome Res. 19, 1622 (2009). doi:10.1101/gr.092197.109 Medline

67. D. Pushkarev, N. F. Neff, S. R. Quake, Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 27, 847 (2009). doi:10.1038/nbt.1561 Medline

68. Chimpanzee Sequencing and Analysis Consortium, Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69 (2005). doi:10.1038/nature04072 Medline

69. S. C. Schuster et al., Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943 (2010). doi:10.1038/nature08795 Medline

70. C. Wang et al., Stat. Appl. Genet. Mol. Biol. 9, 13 (2010).

71. B. E. Engelhardt, M. Stephens, Analysis of population structure: A unifying framework and novel methods based on sparse factor analysis. PLoS Genet. 6, e1001117 (2010). doi:10.1371/journal.pgen.1001117 Medline

72. A. Ginolhac, M. Rasmussen, M. Gilbert, E. Willerslev, L. Orlando, Bioinformatics 27, 2153 (2011).

73. H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589 (2010). doi:10.1093/bioinformatics/btp698 Medline

74. N. Rosenberg, distruct: A program for the graphical display of population structure. Mol. Ecol. Notes 4, 137 (2004). doi:10.1046/j.1471-8286.2003.00566.x

75. M. García-Garcerà et al., Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human palaeogenomics. PLoS ONE 6, e24161 (2011). doi:10.1371/journal.pone.0024161 Medline

76. M. Fogg, L. Pearl, B. Connolly, Structural basis for uracil recognition by archaeal family B DNA polymerases. Nat. Struct. Mol. Biol. 9, 922 (2002). doi:10.1038/nsb867

77. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792 (2004). doi:10.1093/nar/gkh340 Medline

78. Fields Development Team, Fields: Tools for Spatial Data (National Center for Atmospheric Research, Boulder, CO, 2006); www.cgd.ucar.edu/Software/Fields.

79. J. Wakeley, Coalescent Theory (Roberts & Company, Greenwood Village, CO, 2008).

80. I. Gronau, M. Hubisz, B. Gulko, C. Danko, A. Siepel, Nat. Genet. 43, 1031 (2011).

81. A. Keinan, J. C. Mullikin, N. Patterson, D. Reich, Measurement of the human allele frequency spectrum demonstrates greater genetic drift in East Asians than in Europeans. Nat. Genet. 39, 1251 (2007). doi:10.1038/ng2116 Medline

82. D. Reich et al., Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053 (2010). doi:10.1038/nature09710 Medline

83. C. J. Creevey, J. O. McInerney, Clann: Investigating phylogenetic information through supertree analyses. Bioinformatics 21, 390 (2005). doi:10.1093/bioinformatics/bti020 Medline

84. M. Jakobsson et al., Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998 (2008). doi:10.1038/nature06742 Medline

85. P. Moorjani et al., The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7, e1001373 (2011). doi:10.1371/journal.pgen.1001373 Medline


Recommended