Contact: wyan@ggebiplot.com Biplot Analysis of Multi-Environment Trial Data Weikai Yan May 2006.

Post on 30-Mar-2015


Biplot Analysis of Multi-Environment Trial Data

Weikai Yan, May 2006

Contact: wyan@ggebiplot.com

Multi-Environment Trials (MET)

• MET are essential

• MET are expensive

• MET data are valuable

• MET data are not fully used

Why biplot analysis?

• Biplot analysis can help understand MET data
– Graphically
– Effectively
– Conveniently

Outline

• Multi-environment trial (MET) data
• Basics of biplot analysis
• Biplot analysis of G-by-E data
• Biplot analysis of G-by-T data
• Better understanding of MET data
• Conclusions

Multi-environment trial data

MET data is a genotype-environment-trait (G-E-T) 3-way table

• Multiple Genotypes

• Multiple Environments

• Multiple Traits

A G-E-T 3-way table contains many 2-way tables

• G by E: for each trait
• G by T (trait): in each environment; across environments
• E by T: for each genotype; across genotypes

G-E-T data >> G-E data

A G-E-T 3-way table is an extended 2-way table

• G by V:
– each E-T combination as a variable (V)
• P by T:
– each G-E combination as a phenotype (P)
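The two-way views of a G-E-T table can be sketched with array slicing and reshaping. This is only an illustration: the array `get` and its sizes (4 genotypes, 3 environments, 2 traits) are made up and are not data from the talk.

```python
import numpy as np

# Hypothetical G-E-T table: 4 genotypes x 3 environments x 2 traits.
n_g, n_e, n_t = 4, 3, 2
get = np.arange(n_g * n_e * n_t, dtype=float).reshape(n_g, n_e, n_t)

# G by E table for one trait (trait 0 here).
g_by_e = get[:, :, 0]

# G by T tables: in one environment, and averaged across environments.
g_by_t_env0 = get[:, 0, :]
g_by_t_mean = get.mean(axis=1)

# G by V: every environment-trait combination becomes one variable (column).
g_by_v = get.reshape(n_g, n_e * n_t)

# P by T: every genotype-environment combination becomes one phenotype (row).
p_by_t = get.reshape(n_g * n_e, n_t)

print(g_by_e.shape, g_by_t_mean.shape, g_by_v.shape, p_by_t.shape)
```

With this layout, column `e * n_t + t` of `g_by_v` holds trait `t` in environment `e`, and row `g * n_e + e` of `p_by_t` holds genotype `g` in environment `e`.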

A G-E-T 3-way table implies informative 2-way tables

• Association by environment 2-way tables
– Associations:
  • among traits
  • between traits and genetic markers

Goals of MET data analysis

• Short-term goals:
– Variety evaluation
  • Response to the environment (G x E)
  • Trait profiles (G x T)
• Long-term goals:
– To understand
  • the target environment (G x E)
  • the test environments (G x E)
  • the crop (G x T)
  • the genotype x environment interaction (A x T)

Basics of biplot analysis

Most two-way tables can be visually studied using biplots

Origin of biplot

• Gabriel (1971)
– One of the most important advances in data analysis in recent decades
• Currently…
– > 50,000 web pages
– Numerous academic publications
– Included in most statistical analysis packages
• Still a very new technique to most scientists

[Photo: Prof. Ruben Gabriel, “The founder of biplot”. Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain]

What is a biplot?

• “Biplot” = “bi” + “plot”
– “plot”
  • scatter plot of two rows OR of two columns, or
  • scatter plot summarizing the rows OR the columns
– “bi”
  • BOTH rows AND columns
• 1 biplot >> 2 plots

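Mathematical definition of a biplot

Graphical display of matrix multiplication

• “Inner product property”
– $P_{ij} = |OA_i| \cdot |OB_j| \cdot \cos\alpha_{ij}$
– Implies the product matrix

Matrix multiplication: A(4, 2) × B(2, 3) = P(4, 3)

$$
\begin{array}{c|rr}
 & x & y \\ \hline
a_1 & 4 & 3 \\
a_2 & -3 & 3 \\
a_3 & 1 & -3 \\
a_4 & 4 & 0
\end{array}
\;\times\;
\begin{array}{c|rrr}
 & b_1 & b_2 & b_3 \\ \hline
x & 2 & -3 & 3 \\
y & 4 & 1 & -2
\end{array}
\;=\;
\begin{array}{c|rrr}
 & b_1 & b_2 & b_3 \\ \hline
a_1 & 20 & -9 & 6 \\
a_2 & 6 & 12 & -15 \\
a_3 & -10 & -6 & 9 \\
a_4 & 8 & -12 & 12
\end{array}
$$

[Figure: biplot of the row points A1 to A4 and the column points B1 to B3, with origin O and X and Y axes running from -4 to 5. Annotated: |OA1| = 5.0, |OB1| = 4.472, cos α11 = 0.8944.]

P11 = 5 × 4.472 × 0.8944 = 20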
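The inner-product property can be checked numerically. The sketch below assumes the coordinates of the slide's small example; the minus signs are an assumption, chosen so that A times B reproduces the magnitudes shown for P. The function name `inner_product` is illustrative only.

```python
import math

# Row points A1..A4 and column points B1..B3 as (x, y) pairs.
# Signs are assumed; the slide text only preserves the magnitudes.
A = [(4, 3), (-3, 3), (1, -3), (4, 0)]
B = [(2, 4), (-3, 1), (3, -2)]

def inner_product(a, b):
    """Vector length of a, times vector length of b, times cosine of the angle between them."""
    len_a = math.hypot(*a)
    len_b = math.hypot(*b)
    angle = math.atan2(a[1], a[0]) - math.atan2(b[1], b[0])
    return len_a * len_b * math.cos(angle)

# Every element of P is recovered from lengths and angles alone.
P = [[round(inner_product(a, b)) for b in B] for a in A]

for row in P:
    print(row)
```

The first element uses |OA1| = 5, |OB1| ≈ 4.472 and cos ≈ 0.8944, which is the P11 = 20 calculation on the slide.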

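Practical definition of a biplot

“Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix.” (Gabriel, 1971)

Matrix decomposition of a G-by-E table: P(4, 3) = G(4, 2) × E(2, 3)

$$
\begin{array}{c|rrr}
 & e_1 & e_2 & e_3 \\ \hline
g_1 & 20 & -9 & 6 \\
g_2 & 6 & 12 & -15 \\
g_3 & -10 & -6 & 9 \\
g_4 & 8 & -12 & 12
\end{array}
\;=\;
\begin{array}{c|rr}
 & x & y \\ \hline
g_1 & 4 & 3 \\
g_2 & -3 & 3 \\
g_3 & 1 & -3 \\
g_4 & 4 & 0
\end{array}
\;\times\;
\begin{array}{c|rrr}
 & e_1 & e_2 & e_3 \\ \hline
x & 2 & -3 & 3 \\
y & 4 & 1 & -2
\end{array}
$$

[Figure: biplot of the genotype points G1 to G4 and the environment points E1 to E3, with origin O and X and Y axes running from -4 to 5.]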

(Now 3D-biplots are also possible…)

Singular Value Decomposition (SVD) & Singular Value Partitioning (SVP)

SVD:

$$Y_{ij} = \sum_{k=1}^{r} a_{ik} \lambda_k b_{kj}$$

SVP:

$$Y_{ij} = \sum_{k=1}^{r} (a_{ik} \lambda_k^{f}) (\lambda_k^{1-f} b_{kj}) \qquad (0 \le f \le 1)$$

• $a_{ik}$: matrix characterising the rows
• $\lambda_k$: “singular values”
• $b_{kj}$: matrix characterising the columns
• $r$: the ‘rank’ of Y, i.e., the minimum number of PC required to fully represent Y
• $a_{ik} \lambda_k^{f}$: row scores (one plot)
• $\lambda_k^{1-f} b_{kj}$: column scores (one plot)
• Plot of row scores + plot of column scores = biplot

SVD = PCA?
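A minimal sketch of SVD followed by singular value partitioning, using NumPy's `np.linalg.svd`. The function name `svp_scores` and the example table `Y` (the small rank-2 example, with signs assumed) are illustrative choices, not part of the GGEbiplot software.

```python
import numpy as np

def svp_scores(Y, f=0.5, n_components=2):
    """Row and column scores for a biplot, with partitioning factor f in [0, 1]."""
    A, lam, Bt = np.linalg.svd(Y, full_matrices=False)   # Y = A @ diag(lam) @ Bt
    k = n_components
    row_scores = A[:, :k] * lam[:k] ** f                      # a_ik * lambda_k^f
    col_scores = (lam[:k] ** (1 - f))[:, None] * Bt[:k, :]    # lambda_k^(1-f) * b_kj
    return row_scores, col_scores.T                           # both as (n_points, k)

# Rank-2 example table (signs assumed), so two components reproduce it fully.
Y = np.array([[20.0, -9.0, 6.0],
              [6.0, 12.0, -15.0],
              [-10.0, -6.0, 9.0],
              [8.0, -12.0, 12.0]])

rows, cols = svp_scores(Y, f=0.0)
print(np.round(rows @ cols.T, 6))
```

Changing `f` moves the singular values between the row and column scores, but the product of the two score matrices stays the same, which is why every choice of `f` keeps the inner-product property.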

Biplot interpretations

• Inner-product property
• Interpretations based on biplots with f = 1
– approximates $YY^T$, the distance matrix
– similarity/dissimilarity among row (genotype) factors
• Interpretations based on biplots with f = 0
– approximates $Y^TY$, the variance matrix
– similarity/dissimilarity among column (environment) factors
• Combined use of f = 0 and f = 1

(Gabriel, 2002 Biometrika; Yan, 2002, Agron J; Built in the GGEbiplot software)

$$Y_{ij} = \sum_{k=1}^{r} (a_{ik} \lambda_k^{f}) (\lambda_k^{1-f} b_{kj})$$
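Why f = 1 suits the rows and f = 0 suits the columns can be seen in matrix form. The notation here is assumed for this note only: $Y = A \Lambda B$ is the SVD with $A^T A = I$ and $B B^T = I$, $G = A \Lambda^{f}$ holds the row scores, and $E = \Lambda^{1-f} B$ holds the column scores. Keeping all $r$ components:

$$
\begin{aligned}
f = 1: &\quad G G^T = (A \Lambda)(A \Lambda)^T = A \Lambda^2 A^T = Y Y^T \\
f = 0: &\quad E^T E = (\Lambda B)^T (\Lambda B) = B^T \Lambda^2 B = Y^T Y
\end{aligned}
$$

With only the first two components retained, as in a 2D biplot, these equalities become approximations whose quality depends on the goodness of fit.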

Biplot analysis is…

to use biplots to display
– a two-way data set per se (Y),
– its distance matrix ($YY^T$), and
– its variance matrix ($Y^TY$)

so that
– relationships among rows,
– relationships among columns, and
– interactions between rows and columns

can be graphically visualized.

Data centering prior to biplot analysis

• The general linear model for a G-by-E data set (P)
– P = M + G + E + GE
• Possible two-way “tables” (Y):
– Y = P = M + G + E + GE: original data, QQE biplot
– Y = P - M = G + E + GE: global-centered (PCA)
– Y = P - M - E = G + GE: column-centered, GGE biplot
– Y = P - M - G = E + GE: row-centered
– Y = P - M - G - E = GE: double-centered, GE biplot

All models are useful, depending on the research objectives (built in GGEbiplot)
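The centering options can be sketched as follows, assuming genotypes are rows and environments are columns and estimating M, G and E from the table's means. The function name `center`, its mode labels, and the yield table `P` are hypothetical.

```python
import numpy as np

def center(P, mode):
    """Return the two-way table Y for one of the five centering models."""
    M = P.mean()                               # grand mean
    G = P.mean(axis=1, keepdims=True) - M      # genotype (row) main effects
    E = P.mean(axis=0, keepdims=True) - M      # environment (column) main effects
    if mode == "none":
        return P.copy()                        # M + G + E + GE
    if mode == "global":
        return P - M                           # G + E + GE
    if mode == "column":
        return P - M - E                       # G + GE (GGE biplot)
    if mode == "row":
        return P - M - G                       # E + GE
    if mode == "double":
        return P - M - G - E                   # GE only
    raise ValueError(f"unknown centering mode: {mode}")

# Made-up yields: 5 genotypes (rows) in 3 environments (columns).
P = np.array([[4.0, 5.5, 3.0],
              [5.0, 5.0, 4.5],
              [3.5, 6.0, 2.5],
              [6.0, 4.0, 5.5],
              [4.5, 4.5, 4.0]])

print(np.round(center(P, "column"), 2))
```

Column-centering leaves every environment with a mean of zero, so what remains in each column is the genotype main effect plus the interaction.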

Data scaling prior to biplot analysis

• Different GGE biplots
– $Y_{ij} = (G_i + GE_{ij}) / s_j$
• $s_j = 1$: no scaling
• $s_j = (\text{s.d.})_j$: all environments are equally important
• $s_j = (\text{s.e.})_j$: heterogeneity among environments is removed

(built in GGEbiplot)
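A sketch of the first two scaling options applied to a column-centered table. The function name `scaled_gge_table` and the table `P` are hypothetical; scaling by the standard error is left out because it needs replicated data, which a table of means does not carry.

```python
import numpy as np

def scaled_gge_table(P, scaling="none"):
    """Column-center P (leaving G + GE) and divide each environment by s_j."""
    Y = P - P.mean(axis=0, keepdims=True)
    if scaling == "none":
        s = np.ones(P.shape[1])             # s_j = 1
    elif scaling == "sd":
        s = Y.std(axis=0, ddof=1)           # s_j = s.d. among genotype means in environment j
    else:
        raise ValueError(f"unsupported scaling: {scaling}")
    return Y / s

# Made-up yields: 5 genotypes (rows) in 3 environments (columns).
P = np.array([[4.0, 5.5, 3.0],
              [5.0, 5.0, 4.5],
              [3.5, 6.0, 2.5],
              [6.0, 4.0, 5.5],
              [4.5, 4.5, 4.0]])

print(np.round(scaled_gge_table(P, "sd"), 3))
```

After s.d. scaling every environment has unit variance, so no environment dominates the biplot simply because its yields vary more.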

Four questions must be asked before trying to interpret a biplot

1. What is the model?
– How were the data centered and scaled?
– What are we looking at?
2. What is the goodness of fit?
– How confident are we about what we see?
– What if the data are fitted poorly?
3. How are the singular values partitioned?
– What questions can be asked?
4. Are the axes drawn to scale?
– Are the patterns artifacts?

(All are addressed explicitly in GGEbiplot)
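For the second question, one common way to express the goodness of fit of a 2D biplot is the share of the total sum of squares carried by the first two singular values. The sketch below assumes that definition; the function name `goodness_of_fit` and the test matrix are illustrative.

```python
import numpy as np

def goodness_of_fit(Y, n_components=2):
    """Proportion of the sum of squares of Y explained by the first n_components PCs."""
    lam = np.linalg.svd(Y, compute_uv=False)      # singular values, largest first
    return float((lam[:n_components] ** 2).sum() / (lam ** 2).sum())

# A table built from two outer products has rank 2, so a 2D biplot fits it fully.
rank2 = np.outer([1, 2, 3, 4], [1, 0, 2]) + np.outer([0, 1, 0, -1], [2, 1, 0])

print(round(goodness_of_fit(rank2), 6))
print(round(goodness_of_fit(np.eye(3)), 6))
```

A value well below 1 means the 2D picture hides part of the table, and patterns seen in the biplot deserve less confidence.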

Biplot Analysis of G-by-E data

• MEGA-ENVIRONMENT ANALYSIS
• TEST ENVIRONMENT EVALUATION
• GENOTYPE EVALUATION

Sample G-by-E data

(Yield data of 18 genotypes in 9 environments, 1993, Ontario, Canada)

Before trying to interpret a biplot…

1. Model selection?
– Centering = 2 (“G+GE”)
– Scaling = 0
2. Goodness of fit?
– 78%.
3. Singular value partitioning?
– SVP = 2 (environment-metric)
4. Draw to scale?
– Yes.

G by E data analysis

• MEGA-ENVIRONMENT ANALYSIS
• TEST ENVIRONMENT EVALUATION
• GENOTYPE EVALUATION

A mega-environment is a group of geographical locations that share the same (set of) best genotypes consistently across years.

Relationships among environments

The “Environment-vector” view
• Angle vs. correlation
• The angles among test environments
• Environment grouping
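The link between angle and correlation can be illustrated numerically. The sketch assumes environment-centered data and environment-metric scores (f = 0); when all components are kept, the cosine of the angle between two environment vectors equals their correlation, and a 2D biplot approximates it. The function name `env_cosines` and the random table are illustrative only.

```python
import numpy as np

def env_cosines(P, n_components=None):
    """Cosines of the angles between environment vectors (f = 0, column-centered)."""
    Y = P - P.mean(axis=0)
    _, lam, Bt = np.linalg.svd(Y, full_matrices=False)
    k = len(lam) if n_components is None else n_components
    env = (lam[:k, None] * Bt[:k]).T                 # one row of scores per environment
    unit = env / np.linalg.norm(env, axis=1, keepdims=True)
    return unit @ unit.T

# Hypothetical table: 18 genotypes (rows) in 4 environments (columns).
rng = np.random.default_rng(0)
P = rng.normal(size=(18, 4))

full = env_cosines(P)                    # all components: exact correlations
biplot_2d = env_cosines(P, 2)            # what a 2D biplot shows

print(np.round(full, 3))
print(np.round(biplot_2d, 3))
```

An acute angle then suggests a positive correlation, an obtuse angle a negative one, and a right angle no correlation, to the extent that the 2D fit is good.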

“Which-won-where”

(Crossover GE is GE that caused genotype rank changes and different “winners” in different test environments)

[Figure: which-won-where view of the GGE biplot; genotype labels recovered from the plot: G12, G7, G18, G8, G13]

Is there meaningful crossover GE?

The “which-won-where” view

(Crossover GE is GE that caused genotype rank changes and different “winners” in different test environments)
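The idea behind “which-won-where” can be sketched directly from a table of means. Note that this is not the polygon construction used in the biplot view; it simply looks up the top genotype in each environment and checks whether the winners differ. The function names and the yield table are hypothetical.

```python
import numpy as np

def winners_by_environment(P, genotypes, environments):
    """Map each environment to the genotype with the highest value in it."""
    best = np.argmax(P, axis=0)
    return {env: genotypes[i] for env, i in zip(environments, best)}

def has_different_winners(P):
    """True when at least two environments have different top genotypes."""
    return len(set(np.argmax(P, axis=0).tolist())) > 1

# Made-up yields: 5 genotypes (rows) in 3 environments (columns).
P = np.array([[4.0, 5.5, 3.0],
              [5.0, 5.0, 4.5],
              [3.5, 6.0, 2.5],
              [6.0, 4.0, 5.5],
              [4.5, 4.5, 4.0]])
genotypes = ["G1", "G2", "G3", "G4", "G5"]
environments = ["E1", "E2", "E3"]

print(winners_by_environment(P, genotypes, environments))
print(has_different_winners(P))
```

In this made-up table G4 wins in E1 and E3 while G3 wins in E2, which is the kind of pattern the biplot view is meant to reveal at a glance.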

Are the crossover patterns* repeatable?

• If YES…
– The target environment can be divided into multiple mega-environments
– GE can be exploited by selecting for each mega-environment
– GE → G
• If NO…
– The target environment CANNOT be divided into multiple mega-environments
– GE CANNOT be exploited
– GE must be avoided by testing across locations and years

• *Not the environment-grouping patterns
• Mega-environment is a group of geographical locations that share the same (set of) best genotypes consistently across years.
• Multi-year data are needed

Classify your target environment into one of three categories

(1) No crossover GE: single simple ME. A single test location, single year suffices to select a single best variety.
(2) Repeatable crossover GE: multiple MEs. Select for specifically adapted genotypes for each ME.
(3) Non-repeatable crossover GE: single complex ME. Select for generally adapted genotypes across the whole region across multiple years.

ME: mega-environment

G by E data analysis

• MEGA-ENVIRONMENT ANALYSIS
• TEST ENVIRONMENT EVALUATION
• GENOTYPE EVALUATION

Discriminating ability and representativeness

• Vector length: discriminating ability
• Angle to the AE (average-environment) axis: representativeness

[Figure labels: average-environment axis; average environment]
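Both measures can be computed from the environment scores of a 2D GGE biplot. The sketch assumes environment-centered data, environment-metric scores (f = 0), and an average environment placed at the mean of the environment scores; the function name `evaluate_environments` and the table `P` are hypothetical.

```python
import numpy as np

def evaluate_environments(P):
    """Vector length and cosine of the angle to the average-environment axis."""
    Y = P - P.mean(axis=0)
    _, lam, Bt = np.linalg.svd(Y, full_matrices=False)
    env = (lam[:2, None] * Bt[:2]).T           # environment scores on PC1 and PC2
    ae = env.mean(axis=0)                      # the "average environment" point
    length = np.linalg.norm(env, axis=1)       # discriminating ability
    cos_to_ae = env @ ae / (length * np.linalg.norm(ae))   # near 1 = representative
    return length, cos_to_ae

# Made-up yields: 5 genotypes (rows) in 3 environments (columns).
P = np.array([[4.0, 5.5, 3.0],
              [5.0, 5.0, 4.5],
              [3.5, 6.0, 2.5],
              [6.0, 4.0, 5.5],
              [4.5, 4.5, 4.0]])

length, cos_to_ae = evaluate_environments(P)
print(np.round(length, 3))
print(np.round(cos_to_ae, 3))
```

A long vector with a cosine close to 1 points to a test environment that is both discriminating and representative.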

Ideal test environments: discriminating and representative

[Figure label: ideal test environment]

Classify each test environment into one of three categories

(1) Not discriminative: useless
(2) Discriminative and representative: good for selecting (more important)
(3) Discriminative but not representative: useful for culling (less important)

• For each “good” or “useful” test environment: is it essential?

Vector length = discrimination = GE = GE1 + GE2

[Figure labels: contribution to proportionate GE; contribution to non-proportionate GE]

G by E data analysis

• MEGA-ENVIRONMENT ANALYSIS
• TEST ENVIRONMENT EVALUATION
• GENOTYPE EVALUATION

Vector length = GGE = G + GE

[Figure labels: contribution to GE (instability); contribution to G (mean performance)]
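The split of a genotype vector into G and GE can be sketched by projecting genotype scores onto the average-environment axis and onto the direction perpendicular to it. This is an illustration under assumptions: environment-centered data, the first two components, and an average environment at the mean of the environment scores. The function name `mean_and_stability`, its default `f`, and the table `P` are hypothetical.

```python
import numpy as np

def mean_and_stability(P, f=0.0):
    """Project genotype scores onto the average-environment axis and its normal."""
    Y = P - P.mean(axis=0)
    A, lam, Bt = np.linalg.svd(Y, full_matrices=False)
    gen = A[:, :2] * lam[:2] ** f                        # genotype scores
    env = ((lam[:2] ** (1 - f))[:, None] * Bt[:2]).T     # environment scores
    ae = env.mean(axis=0)                                # average environment
    u = ae / np.linalg.norm(ae)                          # unit vector along the axis
    v = np.array([-u[1], u[0]])                          # perpendicular direction
    mean_coord = gen @ u                                 # contribution to G
    instability = np.abs(gen @ v)                        # contribution to GE
    return mean_coord, instability

# Made-up yields: 5 genotypes (rows) in 3 environments (columns).
P = np.array([[4.0, 5.5, 3.0],
              [5.0, 5.0, 4.5],
              [3.5, 6.0, 2.5],
              [6.0, 4.0, 5.5],
              [4.5, 4.5, 4.0]])

mean_coord, instability = mean_and_stability(P)
print(np.round(mean_coord, 3))
print(np.round(instability, 3))
```

When the table has rank two after centering, the coordinate along the axis is exactly proportional to each genotype's mean across environments; otherwise it is an approximation.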

Mean vs. Stability

Genotype ranking on both MEAN and STABILITY

[Figure label: “The ideal genotype”]

Genotype classification

• High mean performance, high stability: generally adapted (VERY GOOD)
• Low mean performance, high stability: bad everywhere (VERY BAD)
• High mean performance, low stability: specifically adapted (GOOD)
• Low mean performance, low stability: bad somewhere (BAD)

Are there stability genes?!

G x E data analysis summary

1) Mega-environment analysis
2) Test environment evaluation
3) Genotype evaluation

Important comments:
– (2) and (3) are meaningful only for a single mega-environment
– Any stability analysis is meaningful only for a single mega-environment
– Any stability index can be used only as a modifier to the ranking based on mean performance

Other ways to view a GGE biplot

Inner-product property

Ranking on a single environment

Ranking on two environments

Relative adaptation of a genotype

Compare any two genotypes

Biplot analysis of genotype by trait data

Objectives of G by T data analysis

• Genotype evaluation based on trait profiles

• Relationship among breeding objectives

Data of 4 traits for 19 covered oat varieties (Ontario 2004)

(Background info: High yield, high groat, high protein, and low oil are desirable for milling oats)

Relationships among traits

Trait profile of each genotype

Trait profile of a genotype

Trait profile comparison between two genotypes

Genotype ranking based on a trait

Parent selection based on trait profiles

Independent culling

Fuller understanding of MET data

MET data are more informative than you thought

A G-E-T 3-way dataset contains various 2-way tables

• G by E data
• G by T data
• E by T data:
– for each genotype; all genotypes
• G by V data:
– each E-T as a variable (V)
• P by T data:
– each G-E as a phenotype (P)
• Genetic association by environment data
• Trait association by environment data

Genetic-covariate by environment biplot (QTL by environment biplot)

[Figure: barley genomics data]

Trait-association by environment biplot

[Figure: oat MET data]

Four-way data analysis

• Year…

Conclusions

Conclusion (1)

• “GGE biplot analysis” is an effective tool for G by E data analysis to gain an understanding of…

1. the target environment,
2. the test environments, and
3. the genotypes
4. stability analysis is meaningful only for a single mega-environment

Conclusion (2)

• “GGE biplot analysis” is an effective tool for G by T data analysis to gain an understanding of…

1. the interconnected plant system,
2. positively correlated traits,
3. negatively correlated traits, and
4. the strengths and weaknesses of the genotypes

Conclusion (3)

• “Biplot analysis” is an effective tool for other two-way table analysis
– Marker by environment
– QTL by environment
– Gene by treatment
– Diallel cross
– …

Conclusion (4)

• Biplot analysis can be VERY EASY…
– From reading data to displaying the biplot: 2 seconds
– Displaying any of the perspectives of a biplot and changing from one to another: 1 second
– Displaying the biplot for any subset: 1 second
– Learning how to use the software and interpret biplots: 30 minutes
– Everything can be just one mouse-click away

Thank you

Contact: Weikai Yan: wyan@ggebiplot.com

web: www.ggebiplot.com