Date post: 21-Mar-2020
Recent polygenic selection on educational attainment: a replication

Davide Piffer
Email: pifferdavide@gmail.com

Abstract

The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model. Average frequencies of alleles with positive (Beta) effect on the phenotype (polygenic scores) were compared across populations (N=26) using data from 1000 Genomes. The polygenic score of 152 SNPs that reached genome-wide significance in the meta-analysis by Okbay et al. (2016) of the discovery and replication samples (N=405,072) was highly correlated with population IQ (r=0.863).

Moreover, the polygenic scores obtained from the three independent GWAS exhibited strong intercorrelations even after pruning for linkage disequilibrium. The method of correlated vectors revealed a Jensen effect of SNP p value on population IQ and on the factor extracted from the two previous GWAS (r= -0.25). Factor analysis produced similar estimates of polygenic selection strength for educational attainment across the three datasets. The SNPs from the largest GWAS were divided into subsets by p value (N=7) and factor analyzed. An SNP set’s p value rank correlated substantially (0.4) with a composite index including measures of predictive validity and reliability (r x population IQ, average factor loadings, r x factor scores from the 2 previous GWAS, and the SAC (spatial autocorrelation)-free effect on population IQ). Moreover, the composite index of factor reliability and validity was strongly correlated (r=0.96) with loadings on a factor extracted from the 7 factors (“meta-factor”). That is, the factors with stronger independent correlations to measures of accuracy had stronger loadings on the “meta-factor”. Nine hits were found to be in LD across publications. This produced replicated factor and polygenic scores with strong correlations to population IQ (0.89 and 0.82-0.9, respectively), surviving control for spatial autocorrelation (B= 0.69 and 0.35-0.79, respectively). The results together constitute a replication of preliminary findings and provide unequivocal evidence for recent diversifying polygenic selection on educational attainment and underlying cognitive ability.

Introduction

The aim of this study is to replicate the findings of Piffer (2015, 2013) that educational attainment and cognition GWAS hits have different frequencies across populations and thus were subject to different selection pressures. To this end, the hits from the two latest GWAS on educational attainment (Davies et al., 2016; Okbay et al., 2016) will be used in the analysis. The first GWAS was carried out using the UK Biobank sample (N=100K+). Over a thousand SNPs reached genome-wide significance (P < 5 × 10⁻⁸), but after controlling for linkage disequilibrium (genotypes were LD pruned using clumping to obtain SNPs in linkage equilibrium with an r² < 0.25 within a 200 bp window), a few independent signals were identified (Davies et al., 2016). For the sake of simplicity, the three hits found by Rietveld et al. (2013) were lumped together with this polygenic score.

The second GWAS was carried out on a sample of 293K+ individuals (Okbay et al., 2016) and produced 74 independent (“LD-free”) hits. Factor analysis will be used to extract a factor accounting for cross-population variation in allele frequency, hence representing a signal of polygenic selection. Factor loadings will be examined to ascertain the reliability of the factor (i.e. do most alleles with positive GWAS effect load positively on the factor?). Predictive validity will be measured by computing the correlation between factor scores and population IQ. If alleles with positive GWAS beta (within-population effect) load positively on a factor that is positively correlated with population IQ, this is interpreted as evidence of directional selection on the phenotype (educational attainment or related cognitive abilities).

Methods

1000 Genomes frequencies were calculated from VCF files belonging to the phase 3 data: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/

Rietveld et al. (2013) produced 3 SNPs reaching GWAS significance for educational attainment.

Davies et al. (2016) reported 1115 SNPs reaching GWAS significance, of which 15 were independent signals for educational attainment. 942 SNPs were found on 1000 Genomes. Among the 15 independent signals, one (2:48696432_G_A) was missing.

Okbay et al. (2016) reported 74 SNPs associated with years of education. 70 were found in 1000 Genomes (the other 4 variants were flagged because they had more than 3 different alleles).

Population IQs for the 1000 Genomes populations were obtained from Piffer (2015).

Polygenic score refers to the average frequency of alleles with positive effect at the individual level (i.e. GWAS beta).

Statistical analyses were carried out using R (v. 3.2.3).

Results

Polygenic scores

Rietveld et al., 2013

A polygenic score was created using the top 3 SNPs in Rietveld et al.

Davies et al., 2016

Davies et al. (2016) reported 1115 SNPs reaching GWAS significance, of which 15 were independent signals for educational attainment. 942 SNPs were found on 1000 Genomes. Among the 15 independent signals, one (2:48696432_G_A) was missing. Thus, a polygenic score (I.S. PS) was calculated using 14 SNPs.
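The polygenic score as defined in the Methods (the average frequency of the alleles with a positive GWAS beta) can be illustrated with a minimal Python sketch. The population labels, the frequencies, and the helper name `polygenic_score` are hypothetical and serve only to show the arithmetic; they are not 1000 Genomes values, and the flip to 1 - f assumes biallelic SNPs.

```python
from statistics import mean

def polygenic_score(freqs, positive_is_listed):
    """Mean frequency of the positive-effect allele across SNPs.

    freqs: frequency of the listed allele, one value per SNP.
    positive_is_listed: True where the listed allele has a positive beta;
    where False, the frequency is flipped to 1 - f (biallelic SNPs assumed).
    """
    oriented = [f if pos else 1.0 - f for f, pos in zip(freqs, positive_is_listed)]
    return mean(oriented)

# Hypothetical frequencies for three SNPs in two made-up populations.
example = {
    "PopA": [0.20, 0.70, 0.40],
    "PopB": [0.35, 0.60, 0.55],
}
positive_is_listed = [True, False, True]

scores = {pop: polygenic_score(f, positive_is_listed) for pop, f in example.items()}
```

Each score is an unweighted mean, so every SNP counts equally regardless of its GWAS effect size, which matches the definition given above.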


Okbay et al., 2016

Okbay et al. (2016) reported 74 loci independently associated with educational attainment (years of education). Polygenic scores and population IQs are reported in table 1.

Table 1. Polygenic scores and population IQ.

Population PS_Rietveld_et_al_2013 PS_Ed_Att_Davies PS_Ed_Att_Okbay IQ

Afr.Car.Barbados 0.106 0.419 0.508 83

US Blacks 0.129 0.447 0.517 85

Bengali Bangladesh 0.227 0.516 0.507 81

Chinese Dai 0.418 0.610 0.547

Utah Whites 0.374 0.493 0.506 99

Chinese, Beijing 0.434 0.671 0.563 105

Chinese, South 0.414 0.648 0.555 105

Colombian 0.252 0.500 0.509 83.5

Esan, Nigeria 0.096 0.416 0.507 71

Finland 0.387 0.560 0.523 101

British, GB 0.397 0.526 0.512 100

Gujarati Indian, Tx 0.311 0.498 0.508

Gambian 0.085 0.438 0.507 62

Iberian, Spain 0.366 0.512 0.519 97

Indian Telugu, UK 0.229 0.510 0.502

Japan 0.417 0.652 0.556 105

Vietnam 0.461 0.618 0.552 99.4

Luhya, Kenya 0.079 0.425 0.502 74

Mende, Sierra Leone 0.127 0.416 0.509 64

Mexican in L.A. 0.237 0.499 0.505 88

Peruvian, Lima 0.196 0.477 0.488 85

Punjabi, Pakistan 0.257 0.511 0.513 84

Puerto Rican 0.279 0.489 0.503 83.5


Sri Lankan, UK 0.222 0.506 0.501 79

Toscani, Italy 0.354 0.501 0.518 99

Yoruba, Nigeria 0.097 0.421 0.512 71

The polygenic scores have strong intercorrelations and are also strongly correlated to population IQ (table 2).

Table 2. Correlation matrix

Factor analysis

The 17 hits from the two GWAS (Rietveld et al., 2013 and Davies et al., 2016) were lumped together and one hit (rs1906252) was removed because it was in LD with rs9320913 from Rietveld et al., 2013. This yielded a set of 16 LD-free SNPs.

A factor analysis (function “fa”, package “psych”) was carried out using Ordinary Least Squares to find the minimum residual solution. The proportion of variance explained was 0.54. This factor was correlated with population IQ (r= 0.89). 14/16 alleles loaded positively and the average loading was 0.494 (table 3).

Table 3. Factor loadings (structure matrix)

SNP Factor Loading

rs13086611_T -0.76

rs11130222_A 0.03


rs12553324_G 0.87

rs55686445_C 0.88

rs9393692_G 0.03

rs3847225_C 0.43

rs4799950_G 0.7

rs4318611_A 0.52

rs112374913_A 0.89

rs12042107_C -0.67

rs11210887_A 0.93

rs482507_T 0.53

rs7701440_T 0.98

rs9320903_A 0.75

rs11584700_G 0.85

rs4851266_T 0.95

Mean 0.494
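To make the idea of SNP loadings on a single cross-population factor concrete, the following Python sketch may help. It is not the minimum residual solution of `fa` in the R package `psych` used in this study: as a simpler stand-in it extracts the first principal component of the SNP correlation matrix by power iteration. The data are synthetic and the function names are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_component_loadings(columns, iterations=500):
    """Loadings of each SNP on the first principal component.

    columns: one list of per-population allele frequencies per SNP.
    Returns (loadings, share of total variance explained).
    """
    k = len(columns)
    corr = [[pearson_r(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):  # power iteration towards the dominant eigenvector
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k))
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    return loadings, eigenvalue / k

# Synthetic data: 26 populations, 10 SNPs driven by one latent variable plus noise.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(26)]
columns = [[0.5 + 0.1 * l + 0.05 * random.gauss(0, 1) for l in latent] for _ in range(10)]

loadings, share = first_component_loadings(columns)
mean_loading = sum(loadings) / len(loadings)
n_positive = sum(1 for l in loadings if l > 0)
```

The two summary quantities at the end, the mean loading and the number of positively loading alleles, correspond to the reliability checks reported for table 3.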

A factor analysis was carried out for 7 sets of 10 SNPs belonging to the 74 Okbay et al. (2016) independent hits (4 were missing). The number 10 was chosen for two reasons: 1) to follow the recommendation that the subject to item ratio be >2:1; 2) because 70 (the total number of SNPs) is a multiple of 10. The SNPs were sorted by p value, with the first group having the lowest p values (i.e. highest GWAS significance). Factor scores are reported in table 4.

Table 4. Factor scores and population IQ

Population FactorRietvDavies Fac_Okbay_1 Fac_Okbay_2 Fac_Okbay_3 Fac_Okbay_4 Fac_Okbay_5 Fac_Okbay_6 Fac_Okbay_7

Afr.Car.Barbados -1.386 -0.896 -1.360 1.351 0.125 -1.352 -0.174 1.080

US Blacks -1.041 -0.534 -0.964 1.231 0.531 -0.874 -0.206 1.322

Bengali Bangladesh -0.010 0.613 0.862 -0.582 -0.949 0.530 0.023 0.159

Chinese Dai 1.229 -0.801 1.090 -0.742 1.371 0.034 2.009 -0.247

Utah Whites 0.385 1.278 0.006 -0.764 -0.949 0.898 -0.818 -1.093

Chinese, Beijing 1.614 0.347 1.483 -0.395 1.507 -0.091 2.045 -0.341

Chinese, South 1.399 0.142 1.266 -0.371 1.627 0.067 1.882 -0.478

Colombian 0.155 0.637 -0.320 -0.590 -0.437 0.992 -0.299 -0.749

Esan, Nigeria -1.517 -1.148 -1.578 1.703 0.188 -1.634 -0.400 1.428

Finland 0.873 1.378 -0.238 -1.163 -0.668 0.972 -0.976 -1.470

British, GB 0.568 1.547 -0.290 -0.296 -1.270 0.850 -0.698 -1.006

Gujarati Indian, Tx 0.065 -0.149 0.547 -0.575 -1.229 0.352 -0.337 -0.450

Gambian -1.380 -1.471 -1.274 1.645 0.597 -1.528 -0.144 1.829

Iberian, Spain 0.431 1.928 0.014 -0.458 -0.882 1.003 -0.843 -0.861

Indian Telugu, UK 0.030 0.112 0.542 -0.442 -0.968 0.436 -0.618 -0.046

Japan 1.422 0.058 1.456 -0.508 1.542 0.011 1.731 -0.224

Vietnam 1.252 -0.201 1.419 -0.607 1.510 -0.075 1.694 -0.468

Luhya, Kenya -1.439 -1.496 -1.372 1.624 0.135 -1.370 -0.570 1.524

Mende, Sierra Leone -1.404 -1.449 -1.347 1.676 0.390 -1.669 -0.111 1.646

Mexican in L.A. 0.018 -0.284 0.036 -0.624 0.370 1.409 -0.352 -0.614

Peruvian, Lima -0.008 -0.975 -0.031 -0.777 0.827 0.945 -0.025 -0.673

Punjabi, Pakistan -0.050 0.481 0.836 -0.545 -1.197 0.453 -0.390 -0.314

Puerto Rican 0.027 0.616 -0.036 -0.384 -0.476 0.545 -0.656 -0.616

Sri Lankan, UK 0.071 0.556 0.640 -0.777 -0.873 0.145 -0.488 -0.322

Toscani, Italy 0.266 0.893 -0.029 -0.283 -1.129 0.570 -0.888 -0.600

Yoruba, Nigeria -1.572 -1.183 -1.358 1.654 0.307 -1.621 -0.391 1.586

Spatial Autocorrelation (SAC)

Spatial (phylogenetic) correlation was calculated using the procedure illustrated in a previous paper (Piffer, 2015), which was based (then unknown to the author) on the Mantel test (Mantel, 1967). Regression analysis applied to the Mantel test enables estimation of polygenic selection pressures (Piffer, 2015).

Pairwise Fst distances and pairwise score distances (absolute value of the difference in polygenic scores) were calculated.

Table 5. SAC control for polygenic scores: Betas.

Source Fst (B) PS (B)

P.S. Davies et al. 2016 0.385 0.294

P.S. Davies et al. 2016 + Rietveld et al. 2013 0.329 0.361

P.S. Okbay et al. 2016 0.540 0.154

Table 6. SAC control for factor scores: Betas. Factor scores extracted from Okbay et al. 2016 GWAS. 7 sets of 10 SNPs sorted by p value and factor score extracted from Rietveld et al. (2013) and Davies et al. (2016).

Source Fst Factor

Fac_Rietveld_Davies. B= -0.162 0.861

Fac_1. B= 0.516 0.122

Fac_2. B= 0.650 -0.076

Fac_3. B= 0.598 -0.011

Fac_4. B= 0.622 -0.090

Fac_5. B= 0.699 -0.138

Fac_6. B= 0.557 0.095

Fac_7. B= 0.428 0.204
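The SAC control behind the Betas in the tables above can be sketched as a regression on pairwise distances. The Python sketch below assumes that the outcome is the pairwise distance in population IQ and that the two predictors are the pairwise Fst distance and the pairwise score distance; the exact model specification is given in Piffer (2015), not here, so this layout is an assumption. All inputs are hypothetical, and the closed-form expression for two standardized predictors stands in for a general regression routine.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_abs_diff(values):
    """Absolute difference for every unordered pair of populations."""
    return [abs(a - b) for a, b in combinations(values, 2)]

def upper_triangle(matrix):
    """Off-diagonal upper triangle of a symmetric matrix, same pair order as above."""
    return [matrix[i][j] for i, j in combinations(range(len(matrix)), 2)]

def standardized_betas(y, x1, x2):
    """Standardized OLS coefficients for two predictors (closed form)."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    denom = 1.0 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom

# Hypothetical inputs for four made-up populations.
ps = [0.42, 0.45, 0.52, 0.61]
iq = [70, 76, 90, 108]  # constructed as 200 * ps - 14, so IQ distance tracks PS distance exactly
fst = [[0.00, 0.05, 0.12, 0.10],
       [0.05, 0.00, 0.11, 0.09],
       [0.12, 0.11, 0.00, 0.03],
       [0.10, 0.09, 0.03, 0.00]]

beta_fst, beta_ps = standardized_betas(pairwise_abs_diff(iq),
                                       upper_triangle(fst),
                                       pairwise_abs_diff(ps))
```

Because the toy IQ values are an exact linear function of the toy scores, the score distance absorbs the whole effect and the Fst coefficient is zero. Pairwise distances are not independent observations, which is why the Mantel test relies on permutations for significance rather than on ordinary regression p values.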

MCV

The method of correlated vectors was applied to the 70 SNPs from Okbay et al. (2016): the vector of each SNP’s GWAS p value was correlated with the vector of correlations between each SNP’s frequency and population IQ (r x IQ) and with the vector of correlations between each SNP’s frequency and the factor extracted from the two previous GWAS. Negative correlations were found between p value and r x IQ and between p value and r x Rietveld_Davies factor (r= -0.26 and -0.25, respectively).

Table 7. MCV

SNP p value r x IQ r x fact Rietv_Davies

rs10061788 2.46E-09 0.217 0.155

rs1008078 6.01E-10 -0.704 -0.817

rs1043209 1.82E-11 0.640 0.836

rs10496091 5.62E-10 0.468 0.740

rs11191193 5.44E-11 -0.756 -0.695

rs11210860 2.36E-10 0.204 -0.066

rs112634398 4.61E-08 -0.389 -0.190

rs113520408 1.97E-08 0.589 0.409

rs114598875 2.41E-08 -0.505 -0.683

rs11588857 5.27E-10 0.776 0.842

rs11689269 1.28E-08 0.056 0.233

rs11690172 1.99E-08 -0.297 -0.258

rs11712056 3.3E-19 0.382 0.605

rs11768238 9.9E-10 0.282 0.429

rs12531458 3.11E-08 -0.571 -0.660

rs12646808 4E-08 -0.727 -0.886

rs12671937 9.15E-10 -0.422 -0.334

rs12682297 3.93E-09 -0.677 -0.815

rs12772375 1.56E-08 0.564 0.564

rs12969294 7.24E-09 0.431 0.512

rs12987662 2.69E-24 0.897 0.960


rs13294439 2.2E-17 0.797 0.913

rs13402908 1.7E-11 0.200 0.389

rs1402025 3.42E-08 -0.867 -0.888

rs148734725 1.36E-18 -0.180 -0.493

rs165633 2.86E-09 0.452 0.496

rs16845580 2.65E-09 -0.466 -0.574

rs17119973 3.55E-10 -0.205 -0.008

rs17167170 1.14E-09 0.441 0.688

rs1777827 1.55E-08 0.821 0.898

rs17824247 2.77E-09 0.085 -0.185

rs2245901 4.54E-09 -0.413 -0.551

rs2431108 5.27E-09 -0.343 -0.341

rs2456973 1.06E-12 0.696 0.597

rs2457660 7.11E-10 -0.642 -0.801

rs2568955 1.8E-08 0.749 0.868

rs2610986 2.01E-08 -0.413 -0.465

rs2615691 4.71E-08 0.873 0.829

rs2837992 3.8E-08 -0.253 -0.520

rs2964197 3.02E-08 0.689 0.697

rs2992632 8.23E-09 0.423 0.499

rs301800 1.79E-08 0.129 0.132

rs3101246 1.43E-08 0.632 0.796

rs324886 1.91E-08 -0.500 -0.493

rs34072092 3.91E-08 -0.365 -0.261

rs34305371 3.76E-14 0.354 0.239

rs35761247 3.82E-08 0.318 0.171


rs4493682 3.32E-08 -0.814 -0.914

rs4500960 3.75E-10 0.548 0.407

rs4851251 1.91E-08 -0.395 -0.424

rs4863692 1.56E-10 0.686 0.867

rs55830725 5.37E-10 -0.254 -0.419

rs56231335 2.07E-09 0.862 0.888

rs572016 3.46E-08 0.028 0.031

rs61160187 3.49E-10 0.876 0.924

rs62259535 2.63E-09 -0.330 -0.145

rs62263923 7.01E-09 0.793 0.893

rs62379838 3.3E-08 -0.131 -0.055

rs6739979 4.7E-08 -0.742 -0.779

rs6799130 2.82E-08 0.138 0.345

rs7131944 9.02E-09 0.029 -0.235

rs7306755 1.26E-12 0.154 0.054

rs76076331 3.63E-08 0.276 0.218

rs7767938 2.44E-08 -0.093 -0.007

rs7854982 1.29E-08 0.716 0.801

rs7945718 1.54E-08 -0.156 -0.256

rs7955289 4.49E-10 -0.224 -0.361

rs895606 2.25E-08 0.390 0.655

rs9320913 2.46E-19 0.793 0.747

rs9537821 1.5E-16 -0.395 -0.363

Mean 0.089 (CI: -0.035, 0.213) 0.091 (CI: -0.047, 0.229)
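The method of correlated vectors reduces to a correlation between two per-SNP vectors. The Python sketch below uses five rows copied from table 7 purely as an illustration, so its result is not the coefficient reported for all 70 SNPs. Correlating the raw p values (rather than, say, their logarithms) is an assumption, since the transformation, if any, is not stated.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# (SNP, GWAS p value, r x IQ): the first five rows of table 7, a subset for illustration only.
rows = [
    ("rs10061788", 2.46e-09, 0.217),
    ("rs1008078", 6.01e-10, -0.704),
    ("rs1043209", 1.82e-11, 0.640),
    ("rs10496091", 5.62e-10, 0.468),
    ("rs11191193", 5.44e-11, -0.756),
]

p_values = [row[1] for row in rows]
r_x_iq = [row[2] for row in rows]

# A negative value means SNPs with lower p values tend to correlate more strongly with IQ.
mcv_r = pearson_r(p_values, r_x_iq)
```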


Four indicators of factor reliability were devised: 1) average factor loading (mean loading of the 10 SNPs on the factor); 2) correlation with the factor scores obtained from Rietveld et al. (2013) and Davies et al. (2016); 3) correlation with population IQ; 4) SAC-free Beta (“SAC Beta” for short). The values of these indicators are reported in table 8 for each of the 7 SNP sets, along with their p value rank.

Table 8. Factor validity and reliability indicators.

SNP set Average Fac. Loading r x Fac Rietv_Dav r x IQ SAC Beta P value rank

Set 1 0.39 0.608 0.698 0.122 1

Set 2 0.221 0.896 0.715 -0.076 2

Set 3 0.051 -0.847 -0.720 -0.011 3

Set 4 0.152 0.199 0.094 -0.090 4

Set 5 0.199 0.684 0.643 -0.138 5

Set 6 0.046 0.560 0.394 0.096 6

Set 7 0.269 -0.813 -0.782 0.204 7

Mean 0.190 0.184 0.149 0.015

Table 9 reports the intercorrelations between the accuracy measures and p value rank.

Table 9. Intercorrelations between the accuracy measures and p value rank


A novel measure of factor accuracy (“meta-accuracy”) was calculated as the mean of the four indicators (table X). In turn, the Spearman-rank correlation between the meta-accuracy vector and p value rank was computed. A negative correlation was found: r= -0.408.

With the aim of validating the meta-accuracy measure, a meta-factor was created by factor analyzing the scores of the 7 factors. The factor loadings (“meta-loadings”) were in turn correlated with the meta-accuracy vector, thus producing a “meta-Jensen coefficient” (table 10). The correlation between the two meta-vectors was r= 0.969.

Table 10. Meta-indicator of factor accuracy (“meta-accuracy”).

SNP set P value rank Meta-accuracy Meta-loadings

Set 1 1 0.455 0.76

Set 2 2 0.439 0.8

Set 3 3 -0.382 -0.99

Set 4 4 0.089 -0.2

Set 5 5 0.347 0.93

Set 6 6 0.274 0.16

Set 7 7 -0.281 -0.96
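The meta-accuracy column can be recomputed from table 8 as the row-wise mean of the four indicators, and then correlated with the p value rank and with the meta-loadings. The Python sketch below does this with the rounded table values. It uses Pearson's r for both correlations, which is an assumption of the sketch; a rank-based coefficient need not give the same value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Table 8, Sets 1 to 7: average loading, r x Fac Rietv_Dav, r x IQ, SAC Beta.
indicators = [
    [0.390, 0.608, 0.698, 0.122],
    [0.221, 0.896, 0.715, -0.076],
    [0.051, -0.847, -0.720, -0.011],
    [0.152, 0.199, 0.094, -0.090],
    [0.199, 0.684, 0.643, -0.138],
    [0.046, 0.560, 0.394, 0.096],
    [0.269, -0.813, -0.782, 0.204],
]
p_rank = [1, 2, 3, 4, 5, 6, 7]
meta_loadings = [0.76, 0.80, -0.99, -0.20, 0.93, 0.16, -0.96]  # table 10

meta_accuracy = [sum(row) / len(row) for row in indicators]
r_rank = pearson_r(meta_accuracy, p_rank)         # meta-accuracy vs p value rank
r_meta = pearson_r(meta_accuracy, meta_loadings)  # "meta-Jensen coefficient"
```

With the rounded inputs, the recomputed means agree with the meta-accuracy column of table 10 to three decimals, and the two Pearson coefficients come out close to the reported -0.408 and 0.969.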

Factor scores for the meta-factor are reported in table 11.

Table 11. Meta-factor scores

Population Metafactor_Okbay2016

Afr.Car.Barbados -1.356

US Blacks -1.230

Bengali Bangladesh 0.476

Chinese Dai 0.572

Utah Whites 0.825

Chinese, Beijing 0.534

Chinese, South 0.532

Colombian 0.593

Esan, Nigeria -1.703

Finland 1.077

British, GB 0.560

Gujarati Indian, Tx 0.524

Gambian -1.739

Iberian, Spain 0.668

Indian Telugu, UK 0.364


Japan 0.540

Vietnam 0.621

Luhya, Kenya -1.645

Mende, Sierra Leone -1.729

Mexican in L.A. 0.625

Peruvian, Lima 0.603

Punjabi, Pakistan 0.559

Puerto Rican 0.431

Sri Lankan, UK 0.587

Toscani, Italy 0.405

Yoruba, Nigeria -1.692

A linear regression of population IQ on the three factors was carried out. Scatterplots are reported in figures 1a and 1b.

Figure 1a. Regression of population IQ on factor extracted from the Okbay et al. (2016) dataset.


Figure 1b. Regression of population IQ on factor extracted from the Rietveld et al. (2013) & Davies et al. (2016) datasets.


There were substantial intercorrelations between the meta-factor, the Rietveld+Davies factor scores and IQ (table 12). SAC-control was applied to the meta-factor. This produced a very weak SAC-free effect (B= 0.097; Fst B= 0.508).

Table 12. Intercorrelations between the meta-factor, the Rietveld+Davies factor scores and IQ

To extract a reliable estimate of polygenic selection, the average of the two factors (“average factor”) was computed. The correlation between the average factor and population IQ was r= 0.858.

LD pruning

Cross-GWAS linkage was checked by feeding SNPSNAP with the list of 86 SNPs, with LD thresholds of 500kb and r= 0.5. In total, 8 SNP pairs were found to be in LD. One SNP was present in two GWAS datasets (rs9320913).


A list of replicated or pseudo-replicated (in LD across studies) SNPs was created, composed of one of the two linked SNPs (one for each pair) and the 8 SNPs in LD across GWAS (table 13). The polygenic scores from the linked SNPs are reported in table 14. The correlation between the two scores is r= 0.919.

Table 13. Pseudo-replicated and replicated SNPs. Sites in LD (r>0.5).

Publication Index SNP Linked SNP Publication

Davies et al., 2016 rs12042107 rs1008078 Okbay et al., 2016

Rietveld et al., 2013 rs11584700 rs11588857 Okbay et al., 2016

Rietveld et al., 2013 rs4851266 rs12987662 Okbay et al., 2016

Davies et al., 2016 rs13086611 rs148734725 Okbay et al., 2016

Davies et al., 2016 rs11130222 rs11712056 Okbay et al., 2016

Davies et al., 2016 rs55686445 rs62263923 Okbay et al., 2016

Davies et al., 2016 rs12553324 rs13294439 Okbay et al., 2016

Davies et al., 2016 rs4799950 rs12969294 Okbay et al., 2016

Rietveld et al., 2013 rs9320913* rs9320913 Okbay et al., 2016

*Replicated

Table 14. Replicated/pseudo-replicated PS score.

Population PS (Rietveld_Davies) PS (Okbay et al., 2016)

Afr.Car.Barbados 0.355 0.224

US Blacks 0.379 0.259

Bengali Bangladesh 0.412 0.353

Chinese Dai 0.481 0.425

Utah Whites 0.425 0.384

Chinese, Beijing 0.559 0.481

Chinese, South 0.532 0.462

Colombian 0.395 0.343

Esan, Nigeria 0.357 0.227

Finland 0.469 0.442

British, GB 0.465 0.416

Gujarati Indian, Tx 0.441 0.389


Gambian 0.368 0.218

Iberian, Spain 0.433 0.396

Indian Telugu, UK 0.423 0.365

Japan 0.536 0.458

Vietnam 0.520 0.450

Luhya, Kenya 0.343 0.231

Mende, Sierra Leone 0.371 0.231

Mexican in L.A. 0.387 0.335

Peruvian, Lima 0.351 0.299

Punjabi, Pakistan 0.429 0.378

Puerto Rican 0.399 0.345

Sri Lankan, UK 0.410 0.361

Toscani, Italy 0.431 0.396

Yoruba, Nigeria 0.363 0.231

Frequencies of the replicated hits were also calculated for the 5 super-populations (i.e. races) of 1000 Genomes for both SNP sets. Boxplots are shown in figures 2a and 2b.

Figure 2a. PS of linked/replicated SNPs by race. SNPs from Rietveld et al. (2013) and Davies et al. (2016).

Figure 2b. PS of linked/replicated SNPs by race. SNPs from Okbay et al. (2016).


The replicated SNPs were factor analyzed. Factor scores and loadings are reported in tables 15 and 16, respectively.

Table 15. Factor scores (replicated SNPs)

Population Factor Repl. (Davies and Rietveld, 2016) Factor Repl. (Okbay et al., 2016)

Afr.Car.Barbados -1.305 -1.309

US Blacks -1.139 -1.144

Bengali Bangladesh -0.233 -0.355

Chinese Dai 1.068 1.051

Utah Whites 0.434 0.437

Chinese, Beijing 1.627 1.608

Chinese, South 1.421 1.488

Colombian -0.047 -0.092

Esan, Nigeria -1.430 -1.411

Finland 0.600 0.555


British, GB 0.709 0.742

Gujarati Indian, Tx 0.246 0.230

Gambian -1.280 -1.328

Iberian, Spain 0.345 0.318

Indian Telugu, UK 0.028 -0.014

Japan 1.493 1.443

Vietnam 1.441 1.468

Luhya, Kenya -1.434 -1.426

Mende, Sierra Leone -1.268 -1.319

Mexican in L.A. -0.052 0.024

Peruvian, Lima -0.108 0.034

Punjabi, Pakistan 0.151 0.190

Puerto Rican -0.035 -0.049

Sri Lankan, UK 0.034 -0.029

Toscani, Italy 0.153 0.268

Yoruba, Nigeria -1.419 -1.380

Table 16. Factor loadings (replicated SNPs)

SNP Loading (Davies and Rietveld, 2016) SNP Loading (Okbay et al., 2016)

rs12042107_C -0.59 rs1008078 0.14

rs11584700_G 0.85 rs11588857 0.82

rs4851266_T 0.94 rs12987662 0.97

rs9320913_A 0.71 rs148734725 -0.52

rs13086611_T -0.75 rs11712056 0.61

rs11130222_A 0.06 rs62263923 0.88

rs55686445_C 0.87 rs13294439 0.88

rs12553324_G 0.86 rs12969294 0.49

rs4799950_G 0.69 rs9320913 0.71

Average 0.404 0.553

The two factors were almost identical (r= 0.998).

Finally, a list of cross-GWAS clumped SNPs was created by keeping only one SNP for each LD pair (e.g. for the pair rs12042107 (Davies) and rs1008078 (Okbay), only the latter was preserved). The replicated SNP (rs9320913) was counted only once. This resulted in a list of (86-8-1)= 77 “LD-clumped” SNPs. An LD-clumped polygenic score was calculated. This is reported in table 17.


LD-clumped polygenic score (LD clumping across independent hits from three GWAS. Pre-clumping N=86; Post clumping and overlap: N=77).

Table 17. LD-clumped Polygenic Score.

Population PS Clumped

Afr.Car.Barbados 0.498

US Blacks 0.509

Bengali Bangladesh 0.511

Chinese Dai 0.563

Utah Whites 0.508

Chinese, Beijing 0.579

Chinese, South 0.571

Colombian 0.512

Esan, Nigeria 0.496

Finland 0.530

British, GB 0.516

Gujarati Indian, Tx 0.508

Gambian 0.498

Iberian, Spain 0.523

Indian Telugu, UK 0.506

Japan 0.573

Vietnam 0.566

Luhya, Kenya 0.495

Mende, Sierra Leone 0.497

Mexican in L.A. 0.510

Peruvian, Lima 0.494

Punjabi, Pakistan 0.516

Puerto Rican 0.505

Sri Lankan, UK 0.506

Toscani, Italy 0.519

Yoruba, Nigeria 0.500

The LD-clumped PS had the following correlations with the other variables: r x IQ: 0.766; r x FactorRietvDavies: 0.835; r x Metafactor: 0.475.


The population IQ variable had some missing cases so the correlations are reported both with and without IQ (table 18a and 18b, respectively).

Table 18a. Correlation plot (all polygenic and factor scores). With IQ.


Table 18b. Correlation plot (all polygenic and factor scores). Without IQ.

Figure 3 reports the boxplot of the LD clumped PS by race.

Figure 3. LD-clumped polygenic score by race.


Okbay et al. (2016). 162 independent SNPs that reached genome-wide significance (P < 5 × 10⁻⁸) in the pooled-sex EduYears meta-analysis of the discovery and replication samples (N=405,072).

154 SNPs were found in 1000 Genomes. The polygenic score was computed (table 19). Its correlation to population IQ was r= 0.863 (scatterplot figure 4).

Population PS

Afr.Car.Barbados 0.4853493506

US Blacks 0.4849350649

Bengali Bangladesh 0.5049357143

Chinese Dai 0.5171298701

Utah Whites 0.5056584416

Chinese, Beijing 0.5298993506

Chinese, South 0.5240006494

Colombian 0.5015116883

Esan, Nigeria 0.4792058442

Finland 0.5170064935

British, GB 0.5086090909


Gujarati Indian, Tx 0.5079487013

Gambian 0.4844811688

Iberian, Spain 0.5171688312

Indian Telugu, UK 0.5080181818

Japan 0.530288961

Vietnam 0.5233383117

Luhya, Kenya 0.4777168831

Mende, Sierra Leone 0.475287013

Mexican in L.A. 0.4983077922

Peruvian, Lima 0.4769019481

Punjabi, Pakistan 0.5071402597

Puerto Rican 0.501524026

Sri Lankan, UK 0.5033376623

Toscani, Italy 0.5162746753

Yoruba, Nigeria 0.4824142857

Figure 4. Relationship between P.S. computed from hits by Okbay et al. (2016)’s pooled meta-analysis and population IQ.


SAC: clumped and replicated SNPs

Spatial autocorrelation analysis was run on the three scores. The effect size is reported in table 20.

Table 20. SAC control for polygenic and factor scores (values are Betas, B).

Source                          Fst       Factor/PS
PS clumped                      0.476     0.250
PS replicated_Rietv_Davies      0.395     0.352
PS replicated_Okbay            -0.062     0.791
PS meta-analysis_Okbay          0.229     0.500
Factor replicated               0.002     0.695

Simulation

Factor loadings

100 sets of 10 SNPs matched to the top significant SNPs in Okbay et al. (2016) were obtained from SNPSNAP. Problematic SNPs were then removed: when the frequency was 0 for a population, that population was not counted, creating a mismatch between rows. Among the remaining sets, the first 200 sets of 10 random SNPs were chosen for a simulation (the number was limited to speed up computation). Factor analysis was iterated over each set. The average factor loading was 0.268 (SD = 0.176). This was used as a baseline (null model) against which to test polygenic selection. Z scores were calculated for the factor analyses of GWAS hits as:

Z = (average loading - 0.268) / 0.176

Table 21. Z-scores of factor loadings

Factor                                                               Average loading   Z-score
LD-clumped (Davies et al., 2016 + Rietveld et al., 2013)             0.494              1.284
Pseudo-replicated (Davies et al., 2016 and Rietveld et al., 2013)    0.404              0.773
Pseudo-replicated (Okbay et al., 2016)                               0.533              1.506
Okbay et al., 2016: Set 1                                            0.39               0.693
Okbay et al., 2016: Set 2                                            0.221             -0.267
Okbay et al., 2016: Set 3                                            0.051             -1.233
Okbay et al., 2016: Set 4                                            0.152             -0.659
Okbay et al., 2016: Set 5                                            0.199             -0.392
Okbay et al., 2016: Set 6                                            0.046             -1.261
Okbay et al., 2016: Set 7                                            0.269              0.006
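The Z-scores in Table 21 follow directly from the null mean (0.268) and SD (0.176) reported for the simulation. The short sketch below recomputes them from the average loadings in the table; the function and variable names are illustrative only.

```python
NULL_MEAN = 0.268  # average factor loading across the random SNP sets
NULL_SD = 0.176    # standard deviation of those loadings


def loading_z(average_loading, null_mean=NULL_MEAN, null_sd=NULL_SD):
    """Standardize an average factor loading against the simulated null."""
    return (average_loading - null_mean) / null_sd


# Average loadings as reported in Table 21.
average_loadings = {
    "LD-clumped": 0.494,
    "Pseudo-replicated (Davies + Rietveld)": 0.404,
    "Pseudo-replicated (Okbay)": 0.533,
    "Okbay Set 1": 0.39,
    "Okbay Set 2": 0.221,
    "Okbay Set 3": 0.051,
    "Okbay Set 4": 0.152,
    "Okbay Set 5": 0.199,
    "Okbay Set 6": 0.046,
    "Okbay Set 7": 0.269,
}

for name, loading in average_loadings.items():
    print(f"{name}: Z = {loading_z(loading):.3f}")
```

Rounded to three decimals, these values agree with the Z-score column of the table (for example, 1.284 for the LD-clumped set and -0.267 for Set 2).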

Factor scores

The correlations between the factor scores for the 200 sets of 10 SNPs and population IQ were computed. The average Pearson’s r was 0.22 (95% C.I. = -0.757; 0.823; 99% C.I. = -0.826; 0.886). Thus, the correlation between the factors (pseudo-replicated hits) and IQ (r = 0.89), which exceeds the upper limit of the 95% C.I., is significant at the conventional threshold (p < 0.05).
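The comparison above amounts to checking an observed correlation against an interval built from correlations obtained with random SNP sets. The sketch below illustrates that logic under explicit assumptions: the interval is taken to be an empirical percentile range (the text does not state how the C.I. was derived), and the IQ values and random scores are synthetic placeholders, so the resulting interval will not match the one reported.

```python
import random
from statistics import fmean, quantiles


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def percentile_interval(values, level=0.95):
    """Empirical two-sided percentile interval (assumed method)."""
    cuts = quantiles(values, n=200)  # 199 cut points in 0.5% steps
    tail = (1 - level) / 2
    lower = cuts[round(tail * 200) - 1]
    upper = cuts[round((1 - tail) * 200) - 1]
    return lower, upper


rng = random.Random(0)
n_populations = 26

# Synthetic placeholders: NOT the population IQs or SNP-based scores of the study.
iq = [rng.gauss(90, 10) for _ in range(n_populations)]
null_rs = []
for _ in range(200):  # one correlation per random set
    random_scores = [rng.gauss(0.5, 0.02) for _ in range(n_populations)]
    null_rs.append(pearson_r(random_scores, iq))

lo, hi = percentile_interval(null_rs, level=0.95)
observed_r = 0.89  # correlation reported for the pseudo-replicated factor
print(f"null 95% interval: ({lo:.3f}, {hi:.3f})")
print("observed r exceeds upper limit:", observed_r > hi)
```

With real allele frequencies the null correlations would be far from independent, since random SNPs share population structure; that is presumably why the interval reported in the text is so much wider than the one this synthetic example produces.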

Polygenic scores


The set of 7914 SNPs was divided into 52 sets of 152 SNPs. N = 152 was chosen because it corresponds to the number of SNPs in the pooled Okbay et al. (2016) sample. The average correlation between the polygenic scores for the 52 sets of 152 SNPs and IQ was 0.467 (95% C.I. = -0.100; 0.817). The upper limit of the 95% C.I. was almost identical to that obtained for the factor scores simulation (0.823). Hence, the correlation between the polygenic score of the 152 GWAS hits and population IQ (r = 0.863), which exceeds this upper limit, is significant at the conventional threshold (p < 0.05).
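Dividing 7914 SNPs into sets of 152 yields 52 complete sets with 10 SNPs left over (52 × 152 = 7904). The sketch below shows one way to form such sets; taking consecutive blocks and discarding the remainder are assumptions, since the text does not say how the SNPs were ordered or how the leftover was handled, and the SNP identifiers are placeholders.

```python
def partition(items, size):
    """Split items into consecutive, non-overlapping sets of exactly `size`.

    Assumption: any leftover items that do not fill a complete set are dropped.
    """
    n_sets = len(items) // size
    return [items[i * size:(i + 1) * size] for i in range(n_sets)]


# Placeholder identifiers standing in for the 7914 matched SNPs.
snp_ids = [f"snp_{i}" for i in range(7914)]

snp_sets = partition(snp_ids, 152)
leftover = len(snp_ids) - sum(len(s) for s in snp_sets)
print(len(snp_sets), leftover)  # prints: 52 10
```

Each of the 52 sets can then be scored per population and correlated with IQ to build the null distribution against which the r = 0.863 of the GWAS hits is compared.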

Discussion

The genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment were used to test a polygenic selection model.

Strong inter-correlations were observed among population-level polygenic scores of alleles found by three independent GWAS to be associated with educational attainment. Moreover, these polygenic scores were substantially correlated with estimates of average population IQ (table 2).

The polygenic score of 152 SNPs that reached genome-wide significance in the meta-analysis by Okbay et al. (2016) of the discovery and replication samples (N = 405,072) was highly correlated with population IQ (r = 0.863).

Using the hits by Okbay et al. (2016), the method of correlated vectors revealed the presence of a Jensen effect of SNP p value on population IQ and on the factor from the two previous GWAS (r = -.25). That is, frequencies of alleles with lower p values (more GWAS significance) had stronger correlations with population IQ and with a factor extracted from an independent dataset.

Factor analysis produced similar estimates of polygenic selection strength for educational attainment across the three datasets. The SNPs from the largest GWAS (Okbay et al., 2016) were subset by p value (N = 7) and factor analyzed. Variables indicating reliability (factor loadings), predictive validity (r x population IQ and SAC-free Beta) and convergent validity (r x factor extracted from previous GWAS) were computed. The correlation with p value rank was in the expected direction in three out of four instances (table 8) and of low magnitude. A composite index was created by calculating the average of the four variables. This index correlated substantially with the SNP sets’ p value rank (r = -0.4). There was also a positive correlation between average factor loading and r x IQ, r x factor, and r x SAC-free effect.

Factor analysis was carried out on the 7 factors belonging to the 7 sets. The loadings on this “meta-factor” were strongly correlated with the composite index of factor accuracy (r = 0.96). That is, the factors’ independent correlations with measures of accuracy were in turn correlated with their loadings on the “meta-factor”.

The factor extracted from the hits of the first two GWAS (Rietveld et al., 2013; Davies et al., 2016) survived control for phylogenetic correlation using Fst distances (a test of the hypothesis that random factors such as drift or migration confound the results; see Piffer, 2015) quite well (B = 0.861), but among the Okbay et al. (2016) hits, only the top 10 significant ones had some residual predictive validity (B = 0.122). This discrepancy was also reflected in the average factor loadings, which were much higher in the former than in the latter (0.49 vs. 0.19).

There were substantial intercorrelations (r = 0.77-0.89) between the meta-factor, the Rietveld+Davies factor scores and IQ (table 11).

LD clumping was performed on the entire SNP set from the three GWAS (N = 86). This produced 77 LD-clumped SNPs, 1 SNP in common between two publications and 8 linked SNPs. The polygenic score calculated from the LD-clumped SNPs was correlated with population IQ (r = 0.77) and with the factor (r = 0.85) and polygenic scores extracted from the 9 replicated/linked SNPs (r = 0.94 and 0.8) (tables 18a and 18b). The two PS obtained from the cross-publication linked/replicated hits were strongly correlated with population IQ (r = 0.82; 0.9). The two factors of the replicated/linked hits were similarly correlated with population IQ (r = 0.89).

All four vectors survived control (partialling out of Fst distances) for spatial autocorrelation (table 20), reaching Betas of 0.25, 0.35, 0.79 and 0.69. The PS obtained from the 152 SNPs of the Okbay et al. (2016) meta-analysis had a “SAC-free” B = 0.500.

The replicated/linked SNPs exhibit a clear clustering of alleles across 5 major racial groups (figure 2), showing a pattern matching cross-racial differences in IQ: East Asians > Europeans > South Asians; Hispanics; Africans.

The results together constitute a replication of preliminary findings (Piffer, 2013; Piffer, 2015) and provide unequivocal evidence for recent diversifying polygenic selection on educational attainment and underlying cognitive ability.

This study provides strong evidence that there has been recent polygenic and diversifying selection on educational attainment, hence producing different levels of cognitive capacity and other traits related to educational attainment among populations.

A limitation of this study is the reliance on GWAS hits for a complex phenotype such as educational attainment, which shares the majority of its additive genetic variation with general intelligence, but also with other personality and health-related traits (Krapohl et al., 2014 and 2015).

It is possible that there are other SNPs affecting cognitive variables unrelated to educational attainment, and these may not necessarily be subject to the same selection pressures.

References

Davies, G., Marioni, R.E., Liewald, D.C., et al. (2016). Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N =112 151). Molecular Psychiatry, 1-10. doi:10.1038/mp.2016.45

Krapohl, E., Rimfeld, K., Shakeshaft, N.G., Trzaskowski, M., McMillan, A., Pingault, J.-B., Asbury, K., Harlaar, N., Kovas, Y., Dale, P.S. & Plomin, R. (2014). The high heritability of educational achievement reflects many genetically influenced traits, not just intelligence. PNAS, 111, 15273–15278. doi:10.1073/pnas.1408777111

Krapohl, E., Euesden, J., Zabaneh, D., Pingault, J.-B., Rimfeld, K., von Stumm, S., Dale, P.S., Breen, G., O’Reilly, P.F. & Plomin, R. (2015). Phenome-wide analysis of genome-wide polygenic scores. Molecular Psychiatry, 1-6. doi:10.1038/mp.2015.126

Okbay, A., Beauchamp, J.P., Fontana, M.A., Lee, J., Pers, T.H., et al. (2016). Genome-wide association study identifies 74 loci associated with educational attainment. Nature, doi:10.1038/nature17671

Piffer, D. (2013). Factor Analysis of Population Allele Frequencies as a Simple, Novel Method of Detecting Signals of Recent Polygenic Selection: The Example of Educational Attainment and IQ. Mankind Quarterly, 54, 168-200.

Piffer, D. (2015). A review of intelligence GWAS hits: Their relationship to country IQ and the issue of spatial autocorrelation. Intelligence, 53, 43-50.

Rietveld, C.A., Medland, S.E., Derringer, J., Yang, J., Esko, T., Martin, N.W., et al. (2013). GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science, 340, 1467-1471. doi:10.1126/science.1235488
