PROC QTL

A SAS Procedure for Mapping Quantitative Trait Loci

Version 2.0

The correct bibliographic citation for this program is

Zhiqiu Hu and Shizhong Xu (2009). PROC QTL - A SAS Procedure for Mapping Quantitative Trait Loci. International Journal of Plant Genomics 2009: 3 doi:10.1155/2009/141234.

PROC QTL Version 2.0

Copyright © 2008, University of California, Riverside, CA, USA

All rights reserved.

University of California, Riverside

900 University Ave., Riverside, CA 92521

Contents

OVERVIEW: QTL PROCEDURE

GETTING STARTED: QTL PROCEDURE

SYNTAX: QTL PROCEDURE

PROC QTL Statement

BY Statement

CLASS Statement

ESTIMATE Statement

GENOTYPE Statement

MARKER Statement

MATINGTYPE Statement

MODEL Statement

RANGE Statement

WEIGHT Statement

DETAILS: QTL PROCEDURE

EXAMPLES: QTL PROCEDURE

Example 1: QTL mapping for continuous trait

Example 2: QTL mapping for discrete traits

Example 3: QTL mapping in a four-way cross design

Example 4: QTL mapping via the Bayesian method

Example 5: Estimating genomewide epistatic effects via the empirical Bayesian method

Example 6: Joint mapping of QTL for multiple traits

Example 7: Estimating genomewide QTL effects for discrete traits that follow some special distributions

Example 8: Composite interval mapping using PROC QTL

Example 9: Permutation for the Bayesian shrinkage analysis

REFERENCES

Overview: QTL procedure

PROC QTL is a user-defined SAS procedure for mapping quantitative trait loci (QTL). The program was coded in C++, and the interface with the SAS system was built using the SAS/Toolkit software (SAS INSTITUTE INC 1991). Because it is not a built-in SAS procedure, users need to obtain a copy of the PROC QTL executable file and install it on their computers before PROC QTL can be executed. Users also need a regular SAS license before installing PROC QTL. Once PROC QTL is installed, users can call it just like any built-in SAS procedure.

PROC QTL differs from stand-alone QTL mapping software packages, such as QTL Cartographer (WANG et al. 2007), in that it must be executed within the SAS system to perform all QTL analyses. It behaves like a parasite of the SAS system, except that it does no harm to the SAS system or to the computers that run the program. The SAS system provides services to the procedure such as statement processing, data set management and memory allocation. PROC QTL can read SAS data sets and data views, perform data analysis, print results, and create other SAS data sets. There are many advantages to performing QTL mapping under SAS rather than using stand-alone programs:

• Familiarity – using PROC QTL is easy for SAS users because they already understand data input, data manipulation, and general SAS syntax.

• Convenience – a program incorporated into the SAS system allows you to put all your programming tools in one place.

• Integration – the data used by PROC QTL can easily be sorted, printed, and analyzed using other SAS procedures during a single job.

• Special Capabilities – special features, such as BY-group processing and Weight variable handling, can be used.

• Reduced documentation – only the new language statements, the output of the procedure, and any special calculations in the procedure need to be explained.

Getting started: QTL procedure

QTL mapping usually needs the following information: the phenotypic values of a quantitative trait of interest, the genotypes of molecular markers and the linkage map of the markers. Users need to create two SAS datasets, the primary dataset and the map dataset. The primary dataset should contain the phenotypic values, the marker genotypes and all other variables relevant to the QTL mapping. The map dataset contains only three variables, the marker name, the position of marker and the chromosome. Following is an example showing how to create the primary SAS dataset for a BC (backcross) population. The first variable y is the phenotypic value and M1-M10 are the genotypes for ten markers. In this example, A and B indicate the two genotypes per locus and U indicates missing genotype.

/* Program 2-1 */
data one;
   input y (m1-m10)($);
   cards;
19.87 U B B A A A A A A A
17.74 B B A A A A A A A A
21.74 A A A A A A A A A A
18.76 B A A U A B B B B B
20.39 A A A A A A A A A A
21.75 B B A A A U A A A A
18.44 A B B B B B B B B B
19.77 A A A A A A A B B B
22.37 A A A A A A A A A A
16.89 A A A B B B B B U B
20.06 A A A A A A A A A A
20.53 A U B B B B B B B A
23.04 A A A A A A A B A A
20.11 B U B B B B B B B A
20.95 A A A A A A A A A A
20.84 A A A A A A A A A A
20.30 B B B U A A A A B B
21.29 A A A A A A A A B B
19.19 A A A A A A A A A A
22.43 B A A A A A A A A A
20.29 A A A A A A A A A A
19.31 B B B B B B B B B B
18.75 B B A A B B B B B B
19.75 A A B B B B B B B B
21.26 A B B B B B B B B B
20.35 A A A A A A A A A A
16.93 B B B B B B B B B B
20.21 U A A U U B B B B B
20.78 A B B B B A A B B B
20.72 B B B B B B A U B B
23.87 A A A A U A A A B B
18.26 B U U B B B B B B B
21.01 A A A A A A A A A A
20.47 B B B B B U B B B B
18.54 B B B B B B B U B B
20.46 B B B U B B B B U A
21.05 B B B A A A U B B B
21.87 A A A A A A U A A A
19.57 B B B B B B A B U B
21.12 B B B B B B B U U U
18.46 B B B B B B B B B B
19.09 B B B B B A B B B B
19.68 A A A A A B B A A A
17.22 A A A A B B B B B B
20.68 U B B B U B B B B A
20.61 B B U A A A A A A B
21.74 A A A A A U A A A A
20.08 B B B B B B B B B B
21.40 A A U A B B B B B A
18.80 B B B B B B B B B B
18.89 A A A A A A A A A A
21.26 U B B B B B A A A A
18.56 A B B B B B U B U B
20.47 B B B B U B B B B B
21.38 B A A A A A A B B B
21.23 A A A A A A A A A A
19.42 B B B B B B B A A A
18.71 B B B B B B B B B U
18.13 B B B B B B B B B B
18.59 A A A B B B B B B B
23.43 A A A A A A A A A A
20.02 A A A A A A U A A B
16.68 A A A A A A U A A A
18.26 A A A A A B B A A A
19.68 A A A U A A A A A A
18.18 B B B B B B B B B B
21.74 U A A A A A A A A A
18.02 A U A A A B B B B B
17.67 B B B B B B B U B B
22.83 A A A A A A A A A U
19.76 A A A A B B B B B B
22.01 B B B B B B B B B B
21.54 B B U B A A A B B B
16.64 B B B B B B B B B B
20.16 B B B B A A B B B B
19.73 A B B B A U B B B B
21.58 B B B B B B B B B A
20.95 A A A A A A A A A A
22.45 A A A A A A A A A A
19.96 A A A A A A A A B B
24.17 A A U A A A A A A B
22.63 B B B B B B B B B B
17.56 B B B B B B B B B B
19.60 A A A A A A A A A A
19.93 A A A A A A A A U A
19.12 B B B B B B B B B A
20.96 B B B B B B B B B B
21.89 B B B B B B B B B U
19.77 B A U A A A A A A B
17.51 A A A A A A B A A A
20.84 U U A A A A A B U U
19.62 B A A A A A A A A A
19.57 A A A A B B B B B B
21.29 B A A A U A A A A A
18.73 U B B B B B B B B B
22.34 A A A A A A A A A A
19.23 A A A A A A A A A A
23.35 A A A A A A A A A A
20.06 B B B B B U B B B B
17.84 B A A A A A A A A A
;
run;

The following dataset is an example of the map dataset. The three variables must be entered in this order: marker, position and chromosome. The marker variable stores the names of the markers and must be of character type. The marker names must match the marker variables defined in the primary dataset.

/* Program 2-2 */
data two;
   input marker $ position chromoso;
   cards;
M1 0 1
M2 10.2 1
M3 18.6 1
M4 25.8 1
M5 33.9 1
M6 42.1 1
M7 51.8 1
M8 62.0 1
M9 71.4 1
M10 80.3 1
;
run;
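Because the marker names in the map dataset must match the marker variables in the primary dataset, it can help to check for mismatches before calling PROC QTL. The following is a minimal sketch using standard SAS procedures (PROC CONTENTS and PROC SQL), not a feature of PROC QTL. It assumes the datasets one and two from Programs 2-1 and 2-2; the dataset name vars is hypothetical.

```sas
/* Collect the variable names of the primary dataset (hypothetical name: vars) */
proc contents data=one out=vars(keep=name) noprint;
run;

/* List every marker in the map dataset that has no matching variable
   in the primary dataset; names are compared case-insensitively      */
proc sql;
   title "Map markers missing from the primary dataset";
   select marker
      from two
      where upcase(marker) not in (select upcase(name) from vars);
quit;
title;
```

If the query returns no rows, every marker named in the map dataset has a corresponding variable in the primary dataset.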

The following code shows how to invoke PROC QTL.

/* Program 2-3 */
proc qtl data=one map=two out=result method="ml" step=5/fixed;
   model y =;
   matingtype "BC";
   genotype A1A1='A' A1A2='B';
   estimate "A"= .5 -.5;
run;

The following figure shows the display of the output generated by PROC QTL.

The QTL Procedure

Mapping information:
   Mapping method: Maximum likelihood method
   Step: 5.00 centiMorgan(cM) / Fixed
   Maximum number of iterations: 100
   Convergence error: 1.000E-08

Population Information:
   Sample size: 100
   Number of non-QTL effects: 0
   Number of markers: 10
   Number of traits: 1
   Mating type: Backcross

Marker Information:
   AA: 486
   AB: 460
   Number of missing marker genotypes: 54
   Total number of marker genotypes: 1000
   Missing marker proportion: 5.40%

QTL Effect(s) defined:
   A : 0.50 AA - 0.50 AB

The main result of PROC QTL is a stored SAS dataset named RESULT. This output dataset has 21 observations and 12 variables as shown below.

Table 2.1 output of Program 2-3

Obs  trait  chr  marker  position  n_Iter  conv_err  LRT     Wald    ve     intercpt  A      var_1
1    y      1    M1      0         2       4.66E-11  3.598   3.695   2.604  20.114    0.621  0.104
2    y      1            5         2       7.30E-09  4.492   5.139   2.569  20.108    0.728  0.103
3    y      1            10        2       8.19E-10  4.638   4.799   2.577  20.104    0.705  0.104
4    y      1    M2      10.2      2       1.97E-10  4.627   4.750   2.578  20.103    0.702  0.104
5    y      1            15.2      3       1.12E-10  4.295   4.618   2.581  20.096    0.694  0.104
6    y      1    M3      18.6      2       3.80E-09  3.800   3.884   2.600  20.092    0.640  0.105
7    y      1            23.6      3       1.82E-10  6.311   6.792   2.529  20.079    0.835  0.103
8    y      1    M4      25.8      3       5.86E-12  7.223   7.495   2.512  20.073    0.875  0.102
9    y      1            30.8      3       6.65E-10  10.051  11.210  2.428  20.076    1.049  0.098
10   y      1    M5      33.9      2       1.60E-09  11.112  11.764  2.416  20.081    1.070  0.097
11   y      1            38.9      3       1.04E-10  14.966  17.012  2.307  20.103    1.255  0.093
12   y      1    M6      42.1      2       3.81E-13  16.588  18.065  2.287  20.108    1.286  0.092
13   y      1            47.1      3       1.75E-09  20.415  24.227  2.174  20.117    1.452  0.087
14   y      1    M7      51.8      3       1.92E-11  22.463  25.667  2.149  20.119    1.485  0.086
15   y      1            56.8      4       6.46E-10  16.905  21.091  2.230  20.144    1.372  0.089
16   y      1            61.8      2       8.57E-09  9.509   10.184  2.451  20.160    1.001  0.099
17   y      1    M8      62        2       2.68E-10  9.211   9.685   2.462  20.160    0.979  0.099
18   y      1            67        3       3.55E-09  9.316   10.813  2.436  20.169    1.031  0.098
19   y      1    M9      71.4      3       1.26E-12  7.802   8.147   2.497  20.183    0.909  0.101
20   y      1            76.4      3       2.08E-09  7.990   9.248   2.471  20.166    0.960  0.100
21   y      1    M10     80.3      2       1.97E-09  6.909   7.273   2.517  20.156    0.858  0.101

The 12 variables in the output dataset are: Trait (the name of the dependent variable specified in the MODEL statement), Chr (the chromosome identification), Marker (the marker name), Position (the location on the chromosome that is scanned by PROC QTL), n_Iter (the number of iterations required for the ML method to converge), Conv_Err (the convergence error), LRT (the likelihood ratio test statistic), Wald (the Wald test statistic), Ve (the residual error variance), Intercpt (the intercept or mean), A (the additive QTL effect, labeled "A" in the ESTIMATE statement) and var_1 (the variance of the estimated QTL effect; its square root is the standard error).
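Since var_1 stores a variance, the standard error has to be computed by the user. The sketch below uses only standard SAS steps to add the standard error to the output and to list the scanned position with the largest LRT. It assumes the RESULT dataset produced by Program 2-3; the names result_se, result_sorted and se_A are hypothetical.

```sas
/* Add the standard error of the QTL effect: square root of var_1 */
data result_se;
   set result;
   se_A = sqrt(var_1);
run;

/* Order the scanned positions from the largest to the smallest LRT */
proc sort data=result_se out=result_sorted;
   by descending LRT;
run;

/* Print the position with the largest likelihood ratio test statistic */
proc print data=result_sorted(obs=1) noobs;
   var chr marker position LRT Wald A se_A;
run;
```

For the values shown in Table 2.1, the largest LRT (22.463) is at marker M7 (51.8 cM), where var_1 = 0.086 gives a standard error of about 0.29.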

Syntax: QTL procedure

The following statements are available in PROC QTL. Items within the < > are optional.

PROC QTL < options >;
   CLASS variable list;
   MODEL trait list = non-QTL-effects;
   MARKER variable list;
   MATINGTYPE 'label';
   GENOTYPE genotype = 'label1' genotype = 'label2' <genotype = 'label3' genotype = 'label4'>;
   ESTIMATE 'label1' = effect-contrast < 'label2' = effect-contrast 'label3' = effect-contrast >;
   RANGE number list;
   WEIGHT variable;
   BY variable list;

The PROC QTL statement invokes the procedure. The MODEL statement, the GENOTYPE statement and the ESTIMATE statement are required along with the PROC QTL statement. All other statements are optional. The following table gives a brief description of each statement of PROC QTL.

Table 3.1 PROC QTL Statement Options

Statement    Description
PROC QTL     invokes the procedure
CLASS        declares classification (discrete) variables
MODEL        defines the linear model to be fit for non-QTL effects, e.g., location and gender
MARKER       provides names of markers to be included in the analysis
MATINGTYPE   defines the type of line cross
GENOTYPE     defines marker genotypes
ESTIMATE     defines QTL effects (linear contrasts of genotypic values)
RANGE        specifies a region of the genome for analysis
WEIGHT       declares a weight variable
BY           declares variables as subgroups for separate analysis (data must be sorted before the procedure is called)
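The sorting requirement for the BY statement can be sketched as follows. This is an illustration under stated assumptions rather than a tested program: the dataset multi and the grouping variable location are hypothetical, and the PROC QTL statements are copied from Program 2-3 with a BY statement added as listed in the syntax summary.

```sas
/* Hypothetical primary dataset MULTI with a grouping variable LOCATION. */
/* Sort by the BY variable before calling the procedure.                 */
proc sort data=multi;
   by location;
run;

/* Same analysis as Program 2-3, run separately for each location */
proc qtl data=multi map=two out=result_by method="ml" step=5;
   by location;
   model y =;
   matingtype "BC";
   genotype A1A1='A' A1A2='B';
   estimate "A" = .5 -.5;
run;
```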

PROC QTL Statement

PROC QTL < options >;

The QTL procedure starts with the PROC QTL statement. Table 3.2 summarizes some important options in the PROC QTL statement by function. These and other options in the PROC QTL statement are then described fully in alphabetical order. Table 3.3 summarizes the options that are compatible with the methods in the PROC QTL statement.

Table 3.2 PROC QTL Statement Options

Option             Description
BURNIN =           sets the number of iterations that will be discarded before we collect the posterior samples
COVERAGE =         sets the average genome coverage (in cM) of each QTL
DATA =             specifies the input data set
DISTRIBUTION       selects an appropriate distribution to describe a discrete variable
EBAYESPARM         provides the values of hyperparameters for the empirical Bayes method
GENOTYPE =         defines the genotype updating algorithm to be used when the Bayesian method is selected
INTERACTION        includes epistatic effects in the EBAYES method
MAP =              specifies the map data set
MAXERR =           sets the convergence criterion
MAXITER =          specifies the maximum number of iterations
METHOD =           determines the estimation method
OUT =              specifies the output data set
OUTPOST =          provides the result of the post MCMC analysis
PERMUTATION        allows users to perform the permutation analysis for the Bayesian method
POSTERIORSAMPLE =  sets the posterior sample size
POSITION =         allows users to update the QTL position during the MCMC sampling process
SEED =             sets a seed to initialize the pseudorandom generator
STEP =             gives the increment (cM) for genome scanning
TRIM =             specifies the number of iterations that will be skipped for collection of the posterior sample

Table 3.3 Options that are compatible with the methods in the PROC QTL statement.

Option Default Setting LS IRLS ML FISHER BAYES EBAYES
BURNIN 2000 Yes
COVERAGE 20 Yes
DATA Yes Yes Yes Yes Yes Yes
DISTORTION OFF Yes
DISTRIBUTION Yes Yes Yes
EBAYESPARM -2, 0 Yes
GENOTYPE IMPUTE Yes
INTERACTION OFF Yes
MAP Yes Yes Yes Yes Yes Yes
MAXERR 1.00E-08 Yes Yes Yes Yes
MAXITER 100 Yes Yes Yes Yes
OUT Yes Yes Yes Yes Yes Yes
OUTPOST OFF Yes
PERMUTATION OFF Yes
POSITION DYNAMIC Yes
POSTERIORSAMPLE 500 Yes
SEED 0 Yes
STEP 1 Yes Yes Yes Yes
TRIM 20 Yes

You can specify the following options in the PROC QTL statement.

BURNIN = number

This option sets the number of iterations that will be discarded before we collect the posterior samples. The default value is 2000, but a larger number may be specified to avoid collection of samples before MCMC converges to the stationary distribution. This option is only valid for the Bayes method.

COVERAGE = number

This option sets the average genome coverage (in cM) of each QTL. The default value for the COVERAGE option is 20, i.e., one QTL is placed in every 20 cM of the genome. The proper value for this option depends on the value of the POSITION option. If POSITION = "DYNAMIC" is specified, the QTL positions will move along the genome, and thus a large value of the genome coverage per QTL can be set. The genome coverage per QTL determines the number of QTL included in the model: the larger the coverage per QTL, the fewer QTL are placed on the genome. If POSITION = "STATIC" is specified, however, a small value of the genome coverage per QTL should be specified, i.e., more QTL should be placed on the genome. This ensures that every region of the genome has an equal chance of being evaluated. The program will place at least one QTL per chromosome. Setting COVERAGE = 1000, i.e., one QTL per 1000 cM of the genome, is equivalent to putting no QTL on the genome. In this case, the Bayesian method will only evaluate the markers. This is a Bayesian version of the all-marker analysis. The option is valid for the BAYES method only.
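As an illustration of how BURNIN, COVERAGE and POSITION might be combined, the following sketch reuses the datasets and statements of Program 2-3 with the Bayesian method. It is an untested assumption about how the options fit together, not an example taken from the procedure's own documentation; the option names and default values come from Tables 3.2 and 3.3, and the output dataset name post is hypothetical.

```sas
/* Sketch only: Bayesian analysis with moving QTL positions.          */
/* COVERAGE=20 places one QTL in every 20 cM (the documented default); */
/* with POSITION="STATIC" a smaller COVERAGE value would be chosen.   */
proc qtl data=one map=two out=post method="bayes"
         burnin=2000 coverage=20 position="dynamic";
   model y =;
   matingtype "BC";
   genotype A1A1='A' A1A2='B';
   estimate "A" = .5 -.5;
run;
```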

DATA = SAS-dataset

This option names an input SAS data set used by the QTL procedure. This dataset is called the primary SAS dataset throughout the manual. If no dataset is named, by default, PROC QTL takes the current SAS data set as the primary input dataset for QTL mapping. The primary dataset should contain the phenotypic values, marker genotypes and any other variables relevant to the QTL mapping, e.g., sex, location, year and so on. Depending on the type of mapping population, there are two formats of the primary dataset. One format is for BC, F2, RIL and DH populations. The other format is for the FW cross (four-way cross). The first format is commonly seen in QTL mapping in line crosses. The second format (required by the FW cross) is a typical format required for pedigree data analysis. The following example shows the primary dataset of the first format, which has 29 variables and 110 observations. Marker genotype variables can be of either character type or numeric type. The 26 variables, m1-m26, represent 26 markers. The values of the markers are A (homozygote), B (homozygote), H (heterozygote) and U (missing value).

data format1;
   input id sex $ wt10 (m1-m26)($);
   datalines;
1 M 55.0 A A A A A H H H A U H H U B B B B B H B B B B B B B
2 M 54.2 B B B B H U U H A A A H H H H H U H U B B B H H A A
3 F 61.6 H H H H H H H H H H H H B H H H H H H H B B H H H A
4 M 66.6 H H H B H B B H B B B B B B B B B H H H B B B H H H
5 M 67.4 B B B B B B B B B B B B B B B B B B H H H H H H H H
…
110 M 53.2 U U A A A A A H H U H H U A A U A A A U A U U U U U
;
run;

The following example shows the primary dataset of the second format, which has 15 variables and 112 observations. The first column is the id (identification) of an individual; the second and third columns are the ids of the sire and the dam, respectively. A sire or dam id of -1 (or any negative number) indicates that the individual is a founder. The genotype of each marker is represented by two characters (or a two-digit number), the first character (or digit) for the paternal allele and the second for the maternal allele. For the founders, the paternal and maternal alleles must be entered in the correct order. For non-founder individuals, we may not have the information about the paternal and maternal allelic origins, and thus the two alleles can be entered in either order. When the marker genotypes are entered in the second format, the GENOTYPE statement is not needed to convert the genotypes. The second format requires parental data to be entered into the pedigree before the data of their children. The FW design involves only two parents, and thus the first two rows are reserved for the two parents and the data of the children occupy the remaining rows of the dataset.

data format2;
   input id sir dam sex $ wt10 (m1-m10)($);
   datalines;
1 -1 -1 F -999 AB AB AB AB AB AB AB AB AB AB
2 -1 -1 M -999 AB AB AB AB AB AB AB AB AB AB
3 1 2 M 55.0 AA AA AA AA AA AB AB AB AA UU
4 1 2 M 54.2 BB BB BB BB AB UU UU AB AA AA
5 1 2 F 61.6 AB AB AB AB AB AB AB AB AB AB
6 1 2 M 66.6 AB AB AB BB AB BB BB AB BB BB
7 1 2 M 67.4 BB BB BB BB BB BB BB BB BB BB
…
112 1 2 M 53.2 UU UU AA AA AA AA AA AB AB UU
;
run;

Users may use either "." or "-999.999" to indicate missing phenotypic values in the primary dataset. PROC QTL handles missing (unobserved) phenotypic values in one of two ways. For the Bayesian method (and only the Bayesian method), the missing phenotypic values are sampled from their conditional posterior distribution within each iteration. For all other (non-Bayesian) methods, missing phenotypic values are replaced by the observed population means of the traits. Users may also delete observations with missing phenotypic values in the data step before calling the QTL procedure. The current version of PROC QTL (except for the Bayesian method) cannot handle missing phenotypic values measured as discrete or count data. Users are strongly encouraged to delete observations with missing phenotypic values in the data step. Optimal strategies for handling missing values are under development and will be available soon.
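Deleting observations with missing phenotypic values in a data step could look like the following sketch. It assumes the primary dataset one with the trait y from Program 2-1; the output dataset name one_complete is hypothetical.

```sas
/* Keep only observations with an observed phenotype.              */
/* Both missing-value codes described above are treated as missing. */
data one_complete;
   set one;
   if missing(y) or y = -999.999 then delete;
run;
```

The cleaned dataset would then be passed to the procedure through the DATA= option in place of the original one.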

DISTRIBUTION = 'distribution-type'

This option allows users to select an appropriate distribution to describe a discrete variable. The current version of PROC QTL supports two distributions: the 'POISSON' distribution and the 'BINOMIAL' distribution. More distributions will be added later. Although Poisson and binomial variables are discrete, users should NOT declare them in the CLASS statement once this option is specified. This option is valid for the LS, FISHER and ML methods.

EBAYESPARM = {number, number}

Users can use this option to provide the values of the hyperparameters $\{\tau, \omega\}$ for the empirical Bayes method (EBAYES).

By default, $\{\tau, \omega\} = \{-2, 0\}$, which is equivalent to an unbounded uniform prior for the variance component for each QTL effect (regression coefficient). Under this setting, an explicit solution exists for each variance component conditional on other variance components in each iteration. Otherwise, the SIMPLEX algorithm will be used to find the numerical values. In most cases, PROC QTL generates reasonably good results with less computing time by using the default hyperparameter setting. Therefore, it is strongly recommended that users ignore this option. This option is valid for the EBAYES method only. In the Bayesian shrinkage analysis (WANG et al. 2005; XU 2003), the hyperparameter values are $\{\tau, \omega\} = \{0, 0\}$, which is also called the Jeffreys' prior or vague prior, i.e., $p(\sigma_k^2) = 1/\sigma_k^2$.

GENOTYPE = 'genotype-updating-approach'

This option defines the genotype updating algorithm to be used when the Bayesian method of QTL mapping is selected. It has no effect if any other method is selected. The current version provides two alternatives: "IMPUTE" and "EXPECT". When "IMPUTE" is specified, the QTL genotype of each individual will be sampled from its conditional posterior probability distribution. If "EXPECT" is specified, the QTL genotypes will be replaced by their conditional expectations. "IMPUTE" is the default value for the GENOTYPE option. This option is valid only for the BAYES method.

INTERACTION

This option is a switch to indicate whether or not epistatic effects are included in the analysis. This option is only valid for the EBAYES method in the current version of the program.

MAP = SAS-dataset

This option names the SAS data set for the linkage map of marker loci. A valid map dataset must always contain the following three variables in the correct order: marker-name, marker-location and chromosome. Marker-name must be a character variable. The location of a marker must be numerical and measured in cM. When the MAP=SAS-dataset option is used, PROC QTL will perform interval mapping based on this map. If this option is not chosen, you need to provide names of markers you want to analyze using the MARKER statement. PROC QTL will then conduct marker analysis without map (see the MARKER statement described later). The following example shows a map dataset with three variables and 26 observations. The marker names in the MAP dataset must match the marker variables defined in the primary SAS dataset.

data map;
   input marker $ position chromosome;
   datalines;
M1 0.0 1
M2 19.6 1
M3 25.5 1
M4 26.4 1
…
M26 87.9 2
;
run;

MAXERR= maximum-convergence-error

This option sets the convergence criterion for any methods that require iterations. The default value is MAXERR=1E-8. A smaller value may increase the number of iterations required to converge.

MAXITER= maximum number of iterations

This option defines the maximum number of iterations allowed for any numerical methods specified in the METHOD option. If this option is absent, the default maximum number of iterations is 100.

METHOD = 'method' < /DISTORTION >

This option specifies a statistical method for QTL mapping. Six methods are available in the current version of the program. They are the least squares method (LS) (HALEY and KNOTT 1992), the iteratively reweighted least squares method (IRLS) (XU 1998a, b), the maximum likelihood method (ML) (LANDER and BOTSTEIN 1989), the Fisher scoring method (FISHER) (HAN and XU 2008), the Bayesian method (BAYES) (WANG et al. 2005) and the empirical Bayesian method (EBAYES) (XU 2007). If the METHOD option is not specified, the default method is ML. With the ML method, users may further test segregation distortion and map QTL based on non-Mendelian segregation ratios by adding the DISTORTION option (XU 2008; XU and HU 2009). For example, users who analyze the mouse data may want to test segregation distortion using the following option,

METHOD='ML'/DISTORTION

If the DISTORTION option is omitted, Mendelian segregation is assumed for calculating the conditional probability of QTL genotype given marker information. The DISTORTION option only takes effect when the ML method is used. You may specify the DISTORTION option under other methods, but it will have no effect.

Table 3.4 Legal values of the METHOD option

METHOD; single trait (continuous, discrete, Poisson / Binomial); multiple traits (continuous, discrete, continuous & discrete); distortion analysis

LS Q Q Q Q
IRLS Q Q
ML Q Q Q Q Q
FISHER Q Q Q
BAYES Q Q Q Q Q
EBAYES M

Note: Q, the method can be used to perform QTL mapping; M, the method is only valid for marker analysis.

OUT = SAS-dataset

This option names an output data set that contains the results of QTL mapping. The number of variables and observations depends on other options or statements provided by the user. Basically, the dataset contains the chromosome position scanned (called the virtual map), the number of iterations taken for convergence, the test statistics (LRT and Wald), the regression coefficients for non-QTL effects, the estimated residual error variance, the estimated QTL effects (linear contrasts of genotypic values) and the variance-covariance matrix of the estimated QTL effects. Note that the output data set will also contain the calculated segregation proportions of genotypes for the scanned positions and the likelihood ratio test statistic for the deviation from the Mendelian segregation ratio if the DISTORTION suboption is specified with the METHOD='ML' option (see the METHOD= option). If METHOD='BAYES', the output is entirely different from that of any other method. For the LS, IRLS, FISHER and ML methods, the following variables are the ones most likely to appear in the output SAS dataset.

Table 3.5 Variables in output dataset of LS, IRLS, FISHER and ML methods

Variables                  Description
trait                      The name of the dependent variable specified in the MODEL statement
chr                        Chromosome identification of the position scanned
marker                     The name of the molecular marker variable described in the primary dataset
position                   Location of each assumed locus in the linkage map
n_iter                     Number of iterations required for convergence
conv_err                   Convergence error
LRT                        Likelihood ratio test statistic
Wald                       Wald test statistic
ve                         Residual error variance
intercpt                   The intercept or the mean of the dependent variable (trait)
var_i                      The variance of the i-th user-defined QTL effect
cov_i_j                    The covariance between the i-th and the j-th QTL effects
intcpt_1 … intcpt_n        The intercepts for ordinal data analysis. The first and last values are -1E10 and 1E10, respectively.
LRT_dist                   The test statistic for segregation distortion analysis
freq_AA, freq_AB, freq_BB  The estimated frequencies of genotypes AA, AB and BB for the locus scanned

Table 3.6 Variables in output dataset of BAYES method

Variables                Description
fixed_i                  The i-th fixed effect (non-QTL effect)
ve                       The residual error variance
intcpt_1, … , intcpt_n   The intercepts for ordinal data analysis. The first and last values are -1E10 and 1E10, respectively.
chr_i                    Chromosome identification of the i-th QTL included in the model
p_i                      The location of the i-th QTL in the corresponding linkage group (chromosome)
a_i, b_i, …              a_i: the first user-defined effect for the i-th QTL; b_i: the second user-defined effect for the i-th QTL
va_i, vb_i, …            va_i: the variance of the first user-defined effect for the i-th QTL; vb_i: the variance of the second user-defined effect for the i-th QTL

The BAYES method produces an output file containing the posterior sample for all variables generated in the MCMC sampling process. The variables in the posterior sample are listed in Table 3.6.

OUTPOST = SAS-dataset </ { options }>

This option allows users to perform post MCMC analysis for the Bayesian method. The default result of post MCMC analysis contains the posterior sample size, the posterior means and the posterior variance-covariance matrix of estimated QTL effects for each putative position of the genome. For example, if you use the following option,

OUTPOST = RESULT / {STEP = 5.0}

the result dataset will contain the posterior sample size (count), the posterior means and the posterior variance-covariance matrix of all estimated QTL effects for every segment of the genome that covers 5 cM. The content of the OUTPOST dataset for the Bayesian analysis is similar to that of the OUT dataset of the non-Bayesian methods (see Table 3.5) except that a new variable named 'count' is added. The count represents the number of hits by a QTL for each defined segment of the genome. The OUTPOST = option is valid for the BAYES method only. The sub-options available in the OUTPOST option are described as follows.

STEP = number

This sub-option specifies the bin size of the genome for the post MCMC analysis. A valid value is a number larger than 0.05 and smaller than 10; any other value is ignored. By default, STEP = 1.0. This sub-option is valid only when POSITION = "DYNAMIC" is specified; otherwise, it is ignored.
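The role of the bin size can be illustrated with a minimal Python sketch (not part of PROC QTL). It tallies sampled QTL positions into bins of STEP cM, which is how the 'count' variable is described. The function names, the fallback to the default of 1.0 for an invalid value, and the convention that bin k covers [k*step, (k+1)*step) are assumptions made for illustration only.

```python
from collections import Counter

DEFAULT_STEP = 1.0  # documented default bin size (cM)

def effective_step(step):
    """Return the bin size; values outside 0.05 < step < 10 are ignored
    and the default is used instead (assumed meaning of 'ignored')."""
    if step is None or not (0.05 < step < 10):
        return DEFAULT_STEP
    return step

def count_hits(positions, step=None):
    """Count how many sampled QTL positions (cM) fall into each bin.
    Bin k is assumed to cover [k*step, (k+1)*step)."""
    width = effective_step(step)
    return Counter(int(p // width) for p in positions)

# Made-up positions sampled for one QTL across five MCMC draws
hits = count_hits([0.4, 3.2, 4.9, 5.0, 12.7], step=5.0)
print(sorted(hits.items()))  # [(0, 3), (1, 1), (2, 1)]
```

With STEP = 5.0, the first three draws fall in the 0 to 5 cM segment, so that segment receives a count of 3.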

MCMCINPUT=SAS-dataset

This sub-option names a SAS dataset that contains a previously generated MCMC sample. When a dataset is provided here, PROC QTL skips the sampling process and directly performs the post MCMC analysis on the supplied posterior sample.

QUANTILE ={ number list }

This sub-option specifies the quantiles (percentiles) of the posterior sample to be reported for each estimated QTL effect. Valid values are numbers between 0 and 1. Any number outside this range is replaced with the nearest boundary value, i.e., 0 for a negative number and 1 for any value larger than one. This sub-option is valid only when POSITION = "STATIC" is specified; otherwise, it is ignored.
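The boundary rule for the quantile levels can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the posterior draws are made up, and the linear-interpolation quantile is one common definition, not necessarily the one PROC QTL uses.

```python
def clamp_quantiles(levels):
    """Replace out-of-range levels with the nearest boundary (0 or 1)."""
    return [min(1.0, max(0.0, float(q))) for q in levels]

def sample_quantile(sample, q):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(sample)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

levels = clamp_quantiles([-0.1, 0.5, 1.3])   # -> [0.0, 0.5, 1.0]
effects = [0.8, 1.1, 0.9, 1.4, 1.0]          # made-up posterior draws of one QTL effect
print(levels)
print([sample_quantile(effects, q) for q in levels])  # [0.8, 1.0, 1.4]
```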

PERMUTATION

This option allows users to perform permutation analysis for the Bayesian method. Once the PERMUTATION option is turned on in the PROC QTL statement, the phenotypic values are reshuffled before the parameters are sampled in every cycle of the MCMC sampling process. Because the QTL effects in the posterior sample are then drawn from their null distributions, users can obtain the 95% and 99% intervals of the null distributions from the posterior sample. QTL effects that fall outside the 95% interval of the null distribution are considered significant. The permutation test is best performed in marker analysis or with fixed QTL positions, i.e., POSITION = "STATIC". A demonstration of the permutation analysis can be found in Example 9. Note that the PERMUTATION option is valid for the Bayesian method only.
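The significance rule described above can be sketched in Python. The sketch assumes the null draws of one QTL effect have already been extracted from the posterior sample of a PERMUTATION run; the function names and the order-statistic definition of the interval are illustrative assumptions, not the procedure's documented algorithm.

```python
import math

def null_interval(null_draws, level=0.95):
    """Central interval of the null distribution taken from order statistics."""
    xs = sorted(null_draws)
    tail = (1.0 - level) / 2.0
    lo = xs[math.floor(tail * (len(xs) - 1))]
    hi = xs[math.ceil((1.0 - tail) * (len(xs) - 1))]
    return lo, hi

def is_significant(effect, null_draws, level=0.95):
    """An estimated effect is flagged when it falls outside the null interval."""
    lo, hi = null_interval(null_draws, level)
    return effect < lo or effect > hi

# Made-up null draws, evenly spread between -5 and 5
null = [x / 100 for x in range(-500, 501)]
print(is_significant(0.3, null))   # False: well inside the null interval
print(is_significant(4.9, null))   # True: beyond the 95% interval
```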

POSITION = 'position-updating-approach' </RANDOM>

This option controls how the QTL positions are updated during the MCMC sampling process. There are two approaches: "DYNAMIC" and "STATIC". When the DYNAMIC approach is specified, the QTL position is updated using the Metropolis-Hastings algorithm. If the /RANDOM suboption is turned on, the QTL position is updated randomly, i.e., a new position is randomly selected in the neighborhood of the old position and is always accepted, without using the Metropolis-Hastings criterion to decide whether the new position should be accepted. When "STATIC" is specified, the QTL positions stay where they are throughout the entire MCMC sampling process. This option is valid for the BAYES method only.
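The difference between the two DYNAMIC variants can be sketched in Python. Everything here is a hypothetical illustration: `log_post` stands for a user-supplied log posterior density of the position, the uniform proposal of half-width `width` cM and the chromosome length are assumed values, and proposals that leave the chromosome are simply discarded. The actual proposal scheme of PROC QTL is not described in this manual.

```python
import math
import random

def update_position(pos, log_post, rng, width=2.0, chrom_len=100.0,
                    always_accept=False):
    """One update of a QTL position (cM).

    always_accept=False mimics a Metropolis-Hastings step with a symmetric
    proposal; always_accept=True mimics the /RANDOM behaviour, where the
    neighbouring position is taken without the acceptance test.
    """
    proposal = pos + rng.uniform(-width, width)
    if not 0.0 <= proposal <= chrom_len:
        return pos                      # proposal left the chromosome: keep old position
    if always_accept:
        return proposal
    log_ratio = log_post(proposal) - log_post(pos)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal                 # accepted by the Metropolis-Hastings criterion
    return pos                          # rejected: stay at the old position

# Hypothetical log posterior that peaks at 50 cM
log_post = lambda x: -0.5 * ((x - 50.0) / 5.0) ** 2
rng = random.Random(2024)
pos = 20.0
for _ in range(2000):
    pos = update_position(pos, log_post, rng)
print(round(pos, 2))  # a position drawn near the posterior peak
```

Under the Metropolis-Hastings variant the chain concentrates where the posterior is high, whereas the always-accept variant performs a plain random walk along the chromosome.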

POSTERIORSAMPLE = number

This option sets the posterior sample size, i.e., the number of observations saved in the MCMC sampling process. This option is valid for the BAYES method only.

SEED = number

This option sets the seed that initializes the pseudorandom number generator used by the BAYES method. The default is SEED = 0, in which case PROC QTL takes the current time from the system clock as the seed, so the results differ each time the program is executed. When SEED = x with x > 0, the results are repeatable: a separate run of the program with the same non-zero seed generates exactly the same result.
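The seeding convention can be mimicked in a few lines of Python. This is only an analogy using Python's own generator; `make_rng` is a hypothetical helper and says nothing about the generator PROC QTL uses internally.

```python
import random
import time

def make_rng(seed=0):
    """seed == 0: seed from the system clock (runs differ);
    seed > 0: fixed seed (runs are repeatable)."""
    if seed == 0:
        seed = time.time_ns()
    return random.Random(seed)

first = make_rng(123).random()
second = make_rng(123).random()
print(first == second)  # True: same non-zero seed, same stream
```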

STEP=number </FIXED>

This option gives the increment (step size in cM) for genome scanning. The default is 1 cM. Without the /FIXED suboption, the step size may vary from one interval to another to make sure that marker positions are included in the virtual map. The number assigned to the step size is then the maximum increment allowed in the scanning. For example, if STEP=2 is chosen, PROC QTL scans the genome every d cM, where d ≤ 2. The value of d equals 2 if and only if the interval size divided by 2 is a whole number (integer). With the /FIXED suboption, the number assigned is exactly the step size, except that marker positions are forced to be included in the virtual map. For example, if STEP=2/FIXED is specified, the genome is scanned every 2 cM, except that the step prior to a marker may be less than 2 cM if the interval size divided by 2 is not a whole number. Since PROC QTL generates a virtual map that always includes the marker positions, users may specify STEP=1E8 to perform marker analysis only.
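The two step-size rules can be sketched in Python for a single marker interval. The function below is a reading of the description above, not the procedure's source code; in particular, dividing each interval into equal steps when /FIXED is absent is an assumption, although it agrees with Example 1, where the 19.6 cM interval between M1 and M2 is scanned every 0.98 cM under STEP=1.0.

```python
import math

def scan_positions(left, right, step, fixed=False):
    """Scan points (cM) from the marker at `left` to the marker at `right`,
    both marker positions included."""
    length = right - left
    if length <= 0:
        return [left]
    eps = 1e-9  # guards against floating-point round-off
    if fixed:
        # exact steps of `step`; the last step before the marker may be shorter
        n_full = math.floor(length / step + eps)
        points = [left + k * step for k in range(n_full + 1)]
        if right - points[-1] > eps:
            points.append(right)
        else:
            points[-1] = right
        return points
    # equal steps d <= step, chosen so that the interval is divided evenly
    n = max(1, math.ceil(length / step - eps))
    d = length / n
    return [left + k * d for k in range(n)] + [right]

print(scan_positions(0.0, 5.0, 2.0, fixed=True))  # [0.0, 2.0, 4.0, 5.0]
print(len(scan_positions(0.0, 19.6, 1.0)))        # 21 points, d = 0.98
print(scan_positions(0.0, 30.0, 1e8))             # [0.0, 30.0]: markers only
```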

TRIM = number

This option specifies how many iterations are run for each observation collected into the posterior sample after the burn-in period; the iterations in between are skipped. For example, if TRIMMING=20 is specified, one observation is collected in every 20 iterations after the burn-in period. A larger trimming value decreases the serial correlation between consecutive observations of the posterior sample. Again, TRIMMING is only valid when the Bayesian method is used for QTL mapping.
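The bookkeeping implied by this option can be sketched in Python. The parameter names `burnin` and `posterior_sample` are illustrative, and the choice to keep the last iteration of each block of `trim` iterations is an assumption.

```python
def kept_iterations(burnin, trim, posterior_sample):
    """1-based iteration numbers saved to the posterior sample:
    one observation in every `trim` iterations after the burn-in period."""
    return [burnin + trim * k for k in range(1, posterior_sample + 1)]

def total_iterations(burnin, trim, posterior_sample):
    """Total chain length needed to collect the requested sample."""
    return burnin + trim * posterior_sample

print(kept_iterations(1000, 20, 3))     # [1020, 1040, 1060]
print(total_iterations(1000, 20, 500))  # 11000
```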

BY Statement

BY variable;

Users may use a BY statement to obtain separate analyses of observations in groups defined by the BY variable. The primary dataset must be sorted by the BY variable before PROC QTL is executed. Users may declare more than one BY variable, just as in the BY statement of any built-in SAS procedure.

CLASS Statement

CLASS variable list;

The CLASS statement declares one or more variables (variable list) as discrete variables. Typical class variables are TREATMENT, SEX, RACE, GROUP and REPLICATION. Users may also declare a TRAIT (DEPENDENT variable) in the CLASS statement. If a trait is declared as a CLASS variable, it is treated as a binary or ordered categorical trait, and the generalized linear model (GLM) with the PROBIT link function is used to perform the QTL mapping. Variables declared in the CLASS statement are recoded and decomposed into one or more single contrasts by using the full rank design function (see the DESIGNF function in the PROC IML environment). Therefore, users may expect to see more non-QTL effects in the output dataset than the number of variables included in the MODEL statement. Variables with non-integer values are valid in the CLASS statement, but each distinct value is treated as a separate category during the recoding process. An excessive number of categories may cause the program to halt, because PROC QTL can handle a maximum of 10 categories.
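The kind of recoding produced by a full-rank design function can be sketched in Python. The sketch follows the usual behaviour of DESIGNF in SAS/IML (k levels give k-1 columns, and the last level is coded -1 in every column); whether PROC QTL orders the levels and codes them in exactly this way is an assumption.

```python
def full_rank_design(values, max_levels=10):
    """Recode a class variable with k levels into k-1 contrast columns."""
    levels = sorted(set(values))
    k = len(levels)
    if k > max_levels:
        raise ValueError("more than %d categories" % max_levels)
    rows = []
    for v in values:
        idx = levels.index(v)
        if idx == k - 1:
            rows.append([-1] * (k - 1))   # last level: -1 in every column
        else:
            rows.append([1 if j == idx else 0 for j in range(k - 1)])
    return levels, rows

# A two-level variable such as SEX becomes a single contrast column
print(full_rank_design(["F", "M", "F"]))   # (['F', 'M'], [[1], [-1], [1]])
# A three-level variable becomes two columns
print(full_rank_design([1, 1, 2, 3])[1])   # [[1, 0], [1, 0], [0, 1], [-1, -1]]
```

This is why a class variable with three or more levels contributes more than one non-QTL effect to the output dataset.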

ESTIMATE Statement

ESTIMATE 'QTL-effect-name' = contrast < …'QTL-effect-name' = contrast >;

With the ESTIMATE statement, a user can define up to three (one, two or three) different QTL effects expressed as linear functions (or contrasts) of the genotypic values. For the BC, RIL1, RIL2 and DH mating designs, there are only two genotypes. Therefore, only one QTL effect can be defined. For the BC design, the two genotypes follow this order: A1A1 and A1A2. Let G11 and G12 be the genotypic values. The QTL effect is defined as A=G11-G12. Therefore, the estimate statement appears as

ESTIMATE 'A' =1 -1;

For the remaining two-genotype mating designs (RIL1, RIL2 and DH), the order of the two genotypes is A1A1 and A2A2. The QTL effect is defined as A=G11-G22. Therefore, the estimate statement is

ESTIMATE 'A'=1 -1;

Users can use any other linear combination as the QTL effect. For example, a user may prefer A=0.5G11-0.5G22 as the QTL effect, in which case the estimate statement should be

ESTIMATE 'A'=0.5 -0.5;

Users may also want to express the QTL effect as A=G11 assuming that G22=0. The estimate statement should be written as

ESTIMATE 'A'=1 0;

For an F2 mating design, the order of the three genotypes is A1A1, A1A2 and A2A2. Users can define up to two QTL effects, additive and dominance effects. Let A=G11-G22 be the additive effect and D=G12-0.5(G11+G22) be the dominance effect. The estimate statement is

ESTIMATE 'A'=1 0 -1 'D'=-0.5 1 -0.5;

Users have the flexibility to choose any other scale to define the QTL effects with the estimate statement. If a user only wants to fit an additive model, the corresponding estimate statement should appear as ESTIMATE 'A'=1 0 -1, simply omitting the contrast for the dominance effect.

For an FW cross design, the order of the four possible genotypes is A1A3, A1A4, A2A3 and A2A4, assuming that the parental mating type is A1A2×A3A4. Users can define up to three QTL effects (i.e., one, two or three QTL effects can be estimated). The three effects can be defined in an arbitrary fashion, but we recommend the following estimate statement,

ESTIMATE 'A_m'=1 1 -1 -1 'A_f'=1 -1 1 -1 'D'=1 -1 -1 1;

where A_m is the difference between the two alleles of the male parent, A_f is the difference between the two alleles of the female parent and D is the interaction between A_m and A_f (the so-called dominance effect).
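The contrasts in the ESTIMATE statement are simply coefficients applied to the genotypic values in the documented genotype order. The Python sketch below evaluates the contrasts given above for made-up genotypic values; the function name and the numbers are illustrative only.

```python
def qtl_effects(genotypic_values, contrasts):
    """Evaluate each named contrast as a linear function of the genotypic values."""
    effects = {}
    for name, coefs in contrasts.items():
        if len(coefs) != len(genotypic_values):
            raise ValueError("contrast %r has the wrong number of coefficients" % name)
        effects[name] = sum(c * g for c, g in zip(coefs, genotypic_values))
    return effects

# F2: genotype order A1A1, A1A2, A2A2 (made-up values G11, G12, G22)
f2 = qtl_effects([12.0, 11.0, 8.0], {"A": [1, 0, -1], "D": [-0.5, 1, -0.5]})
print(f2)  # {'A': 4.0, 'D': 1.0}

# FW: genotype order A1A3, A1A4, A2A3, A2A4 (made-up values)
fw = qtl_effects([10.0, 8.0, 7.0, 3.0],
                 {"A_m": [1, 1, -1, -1], "A_f": [1, -1, 1, -1], "D": [1, -1, -1, 1]})
print(fw)  # {'A_m': 8.0, 'A_f': 6.0, 'D': -2.0}
```

For the F2 values, A = G11 - G22 = 4 and D = G12 - 0.5(G11 + G22) = 1, matching the definitions given for the additive and dominance effects.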

GENOTYPE Statement

GENOTYPE genotype = 'label1' genotype = 'label2' <genotype = 'label3'>;

This statement is required for all matingtypes except the FW design. Depending on the type of line crosses (specified by the MATINGTYPE statement), the number of possible marker genotypes varies from two (e.g., BC population) to four (e.g., four way cross). Investigators may code the genotypes arbitrarily in the primary dataset. However, PROC QTL requires a conversion from the genotype labels in the primary dataset to the labels that are recognizable by the procedure. This process is accomplished by using the genotype statement. Since the number of possible genotypes per locus depends on the type of line crosses (matingtype), the genotype labels need to be described under each matingtype.

BC mating design

The genotype statement for the BC design is either

GENOTYPE A='label1' H='label2';

or

GENOTYPE A1A1='label1' A1A2='label2';

where label1 and label2 are the homozygote and heterozygote, respectively, given in the primary dataset. Note that a BC population contains only two genotypes.

DH mating design

The genotype statement for the double haploid design is either

GENOTYPE A='label1' H='label2';

or

GENOTYPE A1A1='label1' A1A2='label2';

where label1 and label2 are the two homozygotes. Note that a DH population contains only two genotypes.

F2 mating design

The genotype statement is

GENOTYPE A1A1='label1' A1A2='label2' A2A2='label3';

The three labels (label1, label2 and label3) are simply the character values used in the primary dataset to indicate the three genotypes (first homozygote, heterozygote and second homozygote). Any other character value (not label1, label2 or label3) appearing in the primary dataset is treated as a missing value. An alternative genotype conversion system is

GENOTYPE A='label1' H='label2' B='label3';

Again, label1, label2 and label3 are character values used by the users in the primary dataset to indicate the first homozygote, the heterozygote and the second homozygote, respectively.
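The conversion performed by the GENOTYPE statement amounts to a lookup from the user's labels to the genotype codes, with every unlisted label treated as missing. A minimal Python sketch of that lookup, using the labels of the F2 example (GENOTYPE A1A1='A' A1A2='H' A2A2='B'), is given below; the function name and the use of None for a missing value are illustrative choices.

```python
def recode_genotypes(raw_labels, genotype_statement):
    """Map user labels to genotype codes; unlisted labels become missing (None)."""
    lookup = {label: code for code, label in genotype_statement.items()}
    return [lookup.get(label) for label in raw_labels]

# Equivalent of: GENOTYPE A1A1='A' A1A2='H' A2A2='B';
f2_statement = {"A1A1": "A", "A1A2": "H", "A2A2": "B"}
print(recode_genotypes(["A", "H", "B", "-", "H"], f2_statement))
# ['A1A1', 'A1A2', 'A2A2', None, 'A1A2']
```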

FW mating design

The four way mating design requires an entirely different system for genotype data input and the GENOTYPE statement is not needed (see the primary SAS dataset of format2).

RIL mating design

The genotype statement for the recombinant inbred lines (RIL1 and RIL2) is either

GENOTYPE A='label1' B='label2';

or

GENOTYPE A1A1='label1' A2A2='label2';

where label1 and label2 are the two homozygotes. Note that an RIL population contains only two genotypes.

MARKER Statement

MARKER variable list;

This statement defines the marker variables to be included in the marker analysis when a map dataset is not provided. Sometimes, investigators may be interested in marker analysis before a map has been generated (map not available). In this case, users must provide the names of the markers to be included in the analysis. The markers are declared using the marker statement. If users provide both the map dataset and the marker statement, the marker statement has no effect.

MATINGTYPE Statement

MATINGTYPE 'matingtype';

The matingtype statement is used to define the type of population. The current version of the program can handle five different populations (mating types): BC (backcross), F2, RIL (recombinant inbred lines), DH (double haploid) and FW (four-way cross). There are two different types of RIL. RIL1 is created by selfing the F2 for many generations. RIL2 is generated by brother-sister mating for many generations. If MATINGTYPE 'RIL' is used without specifying which of the two RIL types is meant, it is interpreted as MATINGTYPE 'RIL1'. We recommend using MATINGTYPE 'RIL1' or MATINGTYPE 'RIL2' explicitly to eliminate any confusion. Note that the MATINGTYPE statement affects the declaration of the GENOTYPE statement and the ESTIMATE statement.

MODEL Statement

MODEL trait = < non-QTL-effects >;

The MODEL statement names the traits (also called the dependent variables) and the non-QTL effects (independent variables). The dependent variables appear on the left-hand side of the equation and the non-QTL effects appear on the right-hand side. Users may specify multiple continuous variables (traits) on the left-hand side of the MODEL equation, but categorical traits must be analyzed one at a time, i.e., only one discrete variable is allowed on the left-hand side of the "=" sign. All variables that appear as traits in the MODEL statement must be numerical variables defined in the primary dataset. If a categorical variable has been coded as a character variable, it must be recoded as a numerical variable in the primary dataset and then declared in the CLASS statement before PROC QTL can analyze it as an ordinal trait. A valid categorical trait may contain 2 to 10 categories, and the observations in each category must make up at least 5% of the sample size. If the variable has more than 10 categories, or one or more categories contain less than 5% of the sample, the program stops execution and users are asked to combine some of the categories before the program is re-executed.

Both numerical and character variables are acceptable as non-QTL effects in the MODEL statement. Discrete variables, however, need to be declared in the CLASS statement before they are entered into the MODEL statement. If the DISTRIBUTION option in the PROC QTL statement is defined as "BINOMIAL", users can specify the trait in the form of a single variable (binary trait only) or in the form of a ratio of two variables denoted by events/trials.

MODEL events/trials = < non-QTL-effects >;

This form is applicable only to summarized Bernoulli response data. When each observation in the input dataset contains the number of events (for example, successes) and the number of trials from a set of Bernoulli trials, use the events/trials syntax. In this syntax, users specify two variables that contain the event and trial counts, separated by a slash (/). The values of both events and trials must be nonnegative, and the value of the trials variable must be greater than 0 for an observation to be valid. When each observation in the input dataset contains a single trial from a Bernoulli experiment, use the first form of the MODEL specification.

If no non-QTL effects occur in the data, simply use "MODEL trait = ;" or "MODEL events/trials = ;" without specifying any variables on the right-hand side of the "=" sign. By default, the intercept (or mean) is always included in the analysis unless the /NOINTERCEPT option is specified in the MODEL statement.

The markers are independent variables and, normally, all independent variables should appear on the right-hand side of the model equation. However, the MODEL statement already assumes that all markers in the map dataset have been included, and thus marker variables should not appear again on the right-hand side of the equation. If a map dataset is not provided, PROC QTL conducts marker analysis (not interval mapping) and the markers to be analyzed should be declared in the MARKER statement (see the MARKER statement).
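The validity rules stated for categorical traits and for events/trials data can be sketched as simple checks in Python. These helpers are hypothetical and only restate the rules given in the text; in particular, computing the 5% threshold from the non-missing observations is an assumption.

```python
from collections import Counter

def check_categorical_trait(values, max_levels=10, min_share=0.05):
    """Return a list of problems; an empty list means the trait is acceptable."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    problems = []
    if not 2 <= len(counts) <= max_levels:
        problems.append("number of categories is %d, expected 2 to %d"
                        % (len(counts), max_levels))
    for level, count in sorted(counts.items()):
        if count < min_share * n:
            problems.append("category %r holds fewer than %g%% of the sample"
                            % (level, 100 * min_share))
    return problems

def valid_events_trials(events, trials):
    """Both values must be nonnegative and trials must be greater than 0."""
    if events is None or trials is None:
        return False
    return events >= 0 and trials > 0

trait = [1] * 50 + [2] * 48 + [3] * 2     # made-up grades; grade 3 is too rare
print(check_categorical_trait(trait))     # one problem reported for category 3
print(valid_events_trials(3, 10), valid_events_trials(0, 0))  # True False
```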

RANGE Statement

RANGE number-list;

This statement is optional and allows users to analyze a subset of the genome. The valid number range is from 1 to m, where m is the total number of points to be scanned in the virtual map. This statement is useful once a user has completed interval mapping and wants to carry out further analysis, such as composite interval mapping. Users can hand-pick markers as co-factors, put these co-factors into the MODEL statement as non-QTL effects, and then scan a region of interest using the RANGE statement. The result of the scan for the specified region is equivalent to that of composite interval mapping, because the co-factors (markers) have been taken into account in the non-QTL effects listed in the MODEL statement.

WEIGHT Statement

WEIGHT variable;

This statement declares a variable in the primary dataset as the weight of each observation. The WEIGHT statement is very useful if the data points for the phenotypic values are averages of several individual plants of the same genotype (replicated experiment). In this case, the WEIGHT variable is the number of plants used to calculate the data point (average phenotype). Only one weight variable can be defined. If more than one variable is listed, only the first one is used as the weight variable. Missing values and negative values of the weight variable are treated as 0, and an observation with a weight of 0 is eliminated from the data analysis.
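The treatment of invalid weights can be sketched in Python. The helper names are hypothetical and the weighted mean is shown only to illustrate that an observation with weight 0 drops out of a weighted calculation; it is not the estimator used by PROC QTL.

```python
import math

def clean_weights(weights):
    """Missing and negative weights are treated as 0."""
    cleaned = []
    for w in weights:
        missing = w is None or (isinstance(w, float) and math.isnan(w))
        cleaned.append(0.0 if missing or w < 0 else float(w))
    return cleaned

def weighted_mean(values, weights):
    """Weighted mean; observations with weight 0 do not contribute."""
    ws = clean_weights(weights)
    total = sum(ws)
    if total == 0:
        raise ValueError("all weights are zero")
    return sum(w * v for w, v in zip(ws, values)) / total

# Made-up line means, each based on a different number of plants
print(clean_weights([3, None, -2, 1.5]))                               # [3.0, 0.0, 0.0, 1.5]
print(weighted_mean([10.0, 99.0, 99.0, 20.0], [3, None, -2, 1]))      # 12.5
```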

Details: QTL procedure

Details of the methods and algorithms implemented by PROC QTL can be found in a book entitled "Principles and Procedures of QTL Mapping". After PROC QTL is installed, users may access its contents through the shortcut "start menu -> Programs -> PROC QTL". Users may also download the PDF version of the book from our website: http://www.statgen.ucr.edu

Examples: QTL procedure

Example 1: QTL mapping for continuous trait

This example shows the application of PROC QTL to real data from an F2 mouse population (LAN et al. 2006). The number of mice is 145 and the number of markers is 196. However, the example uses only the data of the first two chromosomes, with 26 markers. The trait is the body weight at 10 weeks of age (wt10). The primary dataset and the MAP dataset in this example are named E1DATA and E1MAP, respectively. All datasets used in examples 1-6 are copied to the SASUSER library when PROC QTL is installed.

The variable SEX can be used as a non-QTL effect, and wt10 is the body weight at week 10. The following code scans the two chromosomes for QTL for the trait wt10.

/* Program 4-1-1 */
proc qtl data=sasuser.E1data map=sasuser.E1map out=result
         method='ml'/distortion step=1.0;
   class sex;
   model wt10 = sex;
   matingtype 'F2';
   genotype A1A1='A' A1A2='H' A2A2='B';
   estimate 'additive'=1 0 -1;
run;

The output is a SAS dataset named RESULT, which has 222 observations and 17 variables as shown below in Table 4.1.1.

PROC QTL 1.0 also provides a BY statement. By using this statement, users may perform the analysis separately for each sex. Note that, as with other SAS procedures, the primary dataset must be sorted by the BY variable before the BY statement can be used in the analysis. The following statements prepare the primary dataset and conduct QTL mapping separately for each sex.

/* Program 4-1-2 */
/* Sort the primary dataset by the BY variable first */
proc sort data=sasuser.E1data out=mouse2;
   by sex;
run;

/* The map dataset resides in the SASUSER library */
proc qtl data=mouse2 map=sasuser.E1map out=result
         method='ml'/distortion step=1.0;
   model wt10 = ;
   matingtype 'F2';
   genotype A1A1='A' A1A2='H' A2A2='B';
   estimate 'additive'=1 0 -1;
   by sex;
run;

The output generated by the above code is a SAS dataset named RESULT, which has 444 observations and 17 variables, as shown in Table 4.1.2.

Table 4.1.1 Output dataset of the mouse wt10 QTL mapping generated by Program 4-1-1.

Obs  trait  chr  marker  position  n_Iter  conv_err  LRT     Wald    ve       intercpt  fix_A_1  additive  var_1   LRT_dist  freq_AA  freq_AB  freq_BB
1    wt10   1    M1      0         4       4.84E-09  2.9103  3.2649  31.9097  60.0733   -1.6046  -1.4203   0.6178  1.7158    0.2000   0.5116   0.2884
2    wt10   1            0.98      5       2.12E-09  3.2571  3.8162  31.7571  60.0532   -1.6173  -1.5333   0.6160  1.9153    0.1967   0.5120   0.2913
3    wt10   1            1.96      6       1.04E-09  3.6267  4.4236  31.5911  60.0315   -1.6301  -1.6479   0.6139  2.1249    0.1935   0.5123   0.2942
4    wt10   1            2.94      6       8.07E-09  4.0136  5.0764  31.4151  60.0087   -1.6428  -1.7617   0.6113  2.3414    0.1904   0.5125   0.2971
5    wt10   1            3.92      7       2.89E-09  4.4106  5.7590  31.2335  59.9853   -1.6551  -1.8720   0.6085  2.5610    0.1875   0.5125   0.3001
6    wt10   1            4.9       7       9.91E-09  4.8105  6.4496  31.0523  59.9621   -1.6666  -1.9762   0.6055  2.7796    0.1848   0.5123   0.3029
7    wt10   1            5.88      8       2.50E-09  5.2051  7.1251  30.8773  59.9395   -1.6770  -2.0719   0.6025  2.9935    0.1824   0.5119   0.3057
8    wt10   1            6.86      8       4.59E-09  5.5868  7.7613  30.7144  59.9183   -1.6860  -2.1570   0.5995  3.1996    0.1803   0.5114   0.3083
9    wt10   1            7.84      8       6.30E-09  5.9487  8.3371  30.5684  59.8989   -1.6936  -2.2303   0.5966  3.3959    0.1785   0.5107   0.3108
10   wt10   1            8.82      8       6.73E-09  6.2858  8.8356  30.4429  59.8817   -1.6997  -2.2911   0.5941  3.5813    0.1770   0.5099   0.3131
11   wt10   1            9.8       8       5.76E-09  6.5942  9.2451  30.3402  59.8669   -1.7043  -2.3394   0.5919  3.7556    0.1757   0.5090   0.3153
12   wt10   1            10.78     8       4.01E-09  6.8717  9.5591  30.2614  59.8545   -1.7076  -2.3753   0.5902  3.9191    0.1747   0.5081   0.3173
13   wt10   1            11.76     8       2.29E-09  7.1169  9.7757  30.2066  59.8446   -1.7097  -2.3996   0.5890  4.0727    0.1739   0.5071   0.3191
14   wt10   1            12.74     8       1.06E-09  7.3313  9.8968  30.1749  59.8372   -1.7108  -2.4130   0.5883  4.2172    0.1732   0.5061   0.3207
15   wt10   1            13.72     7       4.87E-09  7.5161  9.9273  30.1651  59.8322   -1.7110  -2.4164   0.5882  4.3535    0.1727   0.5052   0.3221
16   wt10   1            14.7      7       1.64E-09  7.6738  9.8748  30.1753  59.8295   -1.7104  -2.4109   0.5886  4.4823    0.1724   0.5044   0.3232
17   wt10   1            15.68     6       7.65E-09  7.8078  9.7474  30.2036  59.8288   -1.7092  -2.3973   0.5896  4.6042    0.1722   0.5037   0.3241
18   wt10   1            16.66     6       1.65E-09  7.9225  9.5551  30.2477  59.8301   -1.7076  -2.3767   0.5912  4.7195    0.1720   0.5032   0.3248
19   wt10   1            17.64     5       7.67E-09  8.0231  9.3071  30.3055  59.8332   -1.7057  -2.3498   0.5933  4.8284    0.1720   0.5028   0.3252
20   wt10   1            18.62     5       6.39E-10  8.1156  9.0135  30.3748  59.8380   -1.7035  -2.3176   0.5959  4.9308    0.1720   0.5026   0.3254
...
222  wt10   2    M26     87.9      2       1.07E-10  0.3420  0.4113  32.7317  60.1958   -1.4577  0.4695    0.5360  1.4148    0.2899   0.4438   0.2663

Table 4.1.2. Output dataset of the mouse wt10 QTL mapping for each sex, generated by Program 4-1-2.

Obs  sex  trait  chr  marker  position  n_Iter  conv_err  LRT     Wald    ve       intercpt  additive  var_1   LRT_dist  freq_AA  freq_AB  freq_BB
1    F    wt10   1    M1      0         2       9.06E-14  2.6893  2.7652  26.6041  61.6819   -1.6512   0.9860  0.3004    0.2414   0.5345   0.2241
2    F    wt10   1            0.98      4       2.28E-10  2.9157  3.1669  26.4294  61.6791   -1.7713   0.9907  0.3599    0.2373   0.5399   0.2228
3    F    wt10   1            1.96      5       1.02E-09  3.1583  3.6112  26.2388  61.6755   -1.8959   0.9954  0.4302    0.2331   0.5454   0.2215
4    F    wt10   1            2.94      6       1.40E-09  3.4125  4.0871  26.0377  61.6711   -2.0213   0.9997  0.5091    0.2289   0.5509   0.2202
5    F    wt10   1            3.92      7       1.08E-09  3.6713  4.5768  25.8339  61.6658   -2.1430   1.0034  0.5928    0.2249   0.5561   0.2191
6    F    wt10   1            4.9       7       5.58E-09  3.9269  5.0582  25.6367  61.6598   -2.2564   1.0065  0.6773    0.2210   0.5608   0.2182
7    F    wt10   1            5.88      8       1.51E-09  4.1715  5.5095  25.4545  61.6532   -2.3576   1.0089  0.7588    0.2174   0.5650   0.2176
8    F    wt10   1            6.86      8       2.58E-09  4.3984  5.9123  25.2941  61.6459   -2.4445   1.0107  0.8344    0.2142   0.5685   0.2173
9    F    wt10   1            7.84      8       3.00E-09  4.6017  6.2543  25.1594  61.6381   -2.5158   1.0120  0.9030    0.2113   0.5713   0.2174
10   F    wt10   1            8.82      8       2.61E-09  4.7793  6.5289  25.0523  61.6299   -2.5718   1.0131  0.9645    0.2087   0.5735   0.2178
11   F    wt10   1            9.8       8       1.80E-09  4.9288  6.7336  24.9731  61.6215   -2.6130   1.0140  1.0195    0.2063   0.5752   0.2185
12   F    wt10   1            10.78     8       1.02E-09  5.0495  6.8684  24.9212  61.6128   -2.6403   1.0149  1.0695    0.2041   0.5764   0.2195
13   F    wt10   1            11.76     7       5.17E-09  5.1416  6.9350  24.8956  61.6041   -2.6544   1.0160  1.1156    0.2021   0.5771   0.2207
14   F    wt10   1            12.74     7       2.18E-09  5.2059  6.9363  24.8952  61.5953   -2.6563   1.0172  1.1592    0.2003   0.5775   0.2222
15   F    wt10   1            13.72     7       7.26E-10  5.2439  6.8754  24.9186  61.5867   -2.6465   1.0187  1.2013    0.1986   0.5776   0.2238
16   F    wt10   1            14.7      6       2.98E-09  5.2573  6.7560  24.9647  61.5782   -2.6257   1.0205  1.2429    0.1970   0.5774   0.2256
17   F    wt10   1            15.68     6       5.56E-10  5.2485  6.5831  25.0317  61.5701   -2.5944   1.0225  1.2846    0.1955   0.5769   0.2276
18   F    wt10   1            16.66     5       1.69E-09  5.2202  6.3619  25.1180  61.5623   -2.5533   1.0248  1.3270    0.1940   0.5762   0.2297
19   F    wt10   1            17.64     4       4.77E-09  5.1758  6.0991  25.2212  61.5551   -2.5032   1.0273  1.3706    0.1927   0.5753   0.2320
20   F    wt10   1            18.62     4       3.37E-11  5.1191  5.8027  25.3386  61.5485   -2.4448   1.0301  1.4160    0.1914   0.5742   0.2343
21   F    wt10   1    M2      19.6      3       2.11E-11  5.0548  5.4814  25.4672  61.5427   -2.3796   1.0330  1.4636    0.1902   0.5730   0.2368
...
444  M    wt10   2    M26     87.9      3       6.14E-10  0.0187  0.1188  38.3229  58.7801   -0.3698   1.1507  4.1384    0.3465   0.3572   0.2963

Example 2: QTL mapping for discrete traits

This example uses real data from an F2 population for rice sheath blight disease (ZOU et al. 2000). There are 12 molecular markers, distributed along two chromosomes covering 268 cM in length. The sample size is 119. The disease resistance of each individual is graded from grade 1 to grade 6. The primary dataset and the MAP dataset are named E2data and E2map, respectively. All datasets in examples 1-6 are copied to the SASUSER library when PROC QTL is installed.

The following code invokes PROC QTL. Since RESISTENC is declared as a CLASS variable, RESISTENC is treated as an ordered categorical variable in the analysis. Although up to two QTL effects may be defined in the F2 population, this example shows that the dominance effect can be ignored.

/* Program 4-2 */
proc qtl data=sasuser.E2data map=sasuser.E2map
         out=result method="Fisher" step=1.0;
   class resistenc;
   model resistenc = ;
   matingtype "F2";
   genotype A="1" B="3" H="2";
   estimate "a"=1 0 -1;
run;

The output is a SAS dataset named RESULT, which has 270 observations and 18 variables as shown in Table 4.2.

Table 4.2. Output of the rice blight disease data generated by Program 4-2.

Obs  trait     chr  marker  position  n_Iter  conv_err  LRT      Wald     ve  intcpt_0  intcpt_1  intcpt_2  intcpt_3  intcpt_4  intcpt_5  intcpt_6  a        var_1
1    resisten  1    RM245   0         4       1.81E-11  6.9127   6.9278   1   -1E+10    -1.1924   -0.5065   -0.0621   0.3423    1.3260    1E+10     0.3546   0.0181
2    resisten  1            1         4       7.56E-11  7.4031   7.4615   1   -1E+10    -1.1970   -0.5085   -0.0622   0.3441    1.3308    1E+10     0.3764   0.0190
3    resisten  1            2         4       2.26E-10  7.8886   7.9745   1   -1E+10    -1.2021   -0.5111   -0.0627   0.3453    1.3350    1E+10     0.3966   0.0197
4    resisten  1            3         4       4.64E-10  8.3600   8.4549   1   -1E+10    -1.2076   -0.5141   -0.0637   0.3460    1.3384    1E+10     0.4148   0.0204
5    resisten  1            4         4       7.1E-10   8.8072   8.8945   1   -1E+10    -1.2131   -0.5174   -0.0651   0.3461    1.3408    1E+10     0.4305   0.0208
6    resisten  1            5         4       9E-10     9.2202   9.2889   1   -1E+10    -1.2186   -0.5208   -0.0668   0.3457    1.3424    1E+10     0.4436   0.0212
7    resisten  1            6         4       1.02E-09  9.5894   9.6372   1   -1E+10    -1.2238   -0.5244   -0.0688   0.3448    1.3429    1E+10     0.4538   0.0214
8    resisten  1            7         4       1.1E-09   9.9067   9.9405   1   -1E+10    -1.2286   -0.5279   -0.0711   0.3434    1.3423    1E+10     0.4612   0.0214
9    resisten  1            8         4       1.15E-09  10.1657  10.2009  1   -1E+10    -1.2328   -0.5313   -0.0735   0.3415    1.3408    1E+10     0.4657   0.0213
10   resisten  1            9         4       1.2E-09   10.3624  10.4210  1   -1E+10    -1.2365   -0.5346   -0.0761   0.3392    1.3383    1E+10     0.4674   0.0210
11   resisten  1            10        4       1.25E-09  10.4953  10.6027  1   -1E+10    -1.2395   -0.5378   -0.0788   0.3366    1.3347    1E+10     0.4661   0.0205
12   resisten  1            11        4       1.28E-09  10.5655  10.7476  1   -1E+10    -1.2419   -0.5408   -0.0816   0.3335    1.3303    1E+10     0.4620   0.0199
13   resisten  1    RM205   12        4       1.23E-09  10.5762  10.8567  1   -1E+10    -1.2437   -0.5436   -0.0845   0.3301    1.3249    1E+10     0.4549   0.0191
14   resisten  1            13        4       2.32E-09  10.9574  11.1468  1   -1E+10    -1.2489   -0.5465   -0.0854   0.3314    1.3300    1E+10     0.4744   0.0202
15   resisten  1            14        4       3.81E-09  11.3470  11.4234  1   -1E+10    -1.2546   -0.5497   -0.0864   0.3327    1.3352    1E+10     0.4942   0.0214
16   resisten  1            15        4       5.48E-09  11.7428  11.6827  1   -1E+10    -1.2606   -0.5531   -0.0876   0.3339    1.3404    1E+10     0.5139   0.0226
17   resisten  1            16        4       6.93E-09  12.1426  11.9211  1   -1E+10    -1.2669   -0.5568   -0.0889   0.3350    1.3458    1E+10     0.5334   0.0239
18   resisten  1            17        4       7.73E-09  12.5436  12.1359  1   -1E+10    -1.2734   -0.5607   -0.0904   0.3361    1.3510    1E+10     0.5524   0.0251
19   resisten  1            18        4       7.64E-09  12.9426  12.3253  1   -1E+10    -1.2801   -0.5647   -0.0921   0.3369    1.3562    1E+10     0.5708   0.0264
20   resisten  1            19        4       6.84E-09  13.3361  12.4884  1   -1E+10    -1.2869   -0.5689   -0.0940   0.3376    1.3611    1E+10     0.5883   0.0277
...
270  resisten  2    RM20B   101       3       4.01E-09  0.8907   0.8856   1   -1E+10    -1.2106   -0.5360   -0.1000   0.2893    1.2422    1E+10     -0.1244  0.0175

Example 3: QTL mapping in a four-way cross design

This example shows QTL mapping in a four-way cross design. There are 202 individuals in the primary dataset. The first two individuals are the parents and the remaining 200 individuals are progeny. The phenotypic values of the two parents will not be analyzed in QTL mapping. The parents only provide the marker genotypes. If there are no phenotypic values for the parents, simply place zeros because these two phenotypic values will not be used anyway. The primary dataset and the MAP dataset are named E3data and E3map, respectively. All datasets that are used in examples 1-6 will be copied to the SASUSER library when PROC QTL is installed.

The following statements perform QTL mapping for the FW design. Note that users may analyze a discrete trait (y2) as if it were a continuous trait in the analysis if the trait is not declared in the CLASS statement.

/* Program 4-3-1 */
proc qtl data=sasuser.E3data map=sasuser.E3map
         out=result method="irls" step=1.0;
   model y2 = ;
   estimate "a_m"=1 1 -1 -1 "a_f"=1 -1 1 -1;
   matingtype "fw";
run;

The output SAS dataset is named RESULT, which has 175 observations and 15 variables, as shown in Table 4.3.1.

The second trait in the primary dataset (y2) is a discrete trait. We can analyze this trait as an ordinal trait using the following statements. The map dataset is not used here; thus, the MARKER statement is needed to list the markers to be included in the analysis.

/* Program 4-3-2 */
proc qtl data=sasuser.E3data out=result method="ml"
         step=1.0;
   class y2;
   model y2 = ;
   marker M1-M19;
   estimate "a_m"=1 1 -1 -1 "a_f"=1 -1 1 -1;
   matingtype "fw";
run;

The output SAS dataset named RESULT has 19 observations and 17 variables as shown in Table 4.3.2.

Table 4.3.1. Output of Program 4-3-1.

Obs  trait  chr  marker  position  n_Iter  Conv_err  LRT     Wald    ve      intercpt  a_m      a_f     var_1   cov_1_2  var_2
1    y2     1    M1      0         1       0         3.3164  3.3440  1.1604  2.4892    0.0513   0.1305  0.0058  -0.0002  0.0060
2    y2     1            0.927     3       1.79E-10  3.5537  3.4652  1.1582  2.4903    0.0570   0.1298  0.0058  -0.0002  0.0060
3    y2     1            1.855     3       2.27E-09  3.7919  3.6031  1.1562  2.4915    0.0629   0.1289  0.0058  -0.0003  0.0060
4    y2     1            2.782     3       8.40E-09  4.0310  3.7586  1.1543  2.4927    0.0689   0.1281  0.0058  -0.0004  0.0060
5    y2     1            3.709     4       1.03E-10  4.2705  3.9324  1.1525  2.4940    0.0751   0.1273  0.0058  -0.0004  0.0060
6    y2     1            4.636     4       1.61E-10  4.5085  4.1255  1.1508  2.4954    0.0815   0.1264  0.0058  -0.0005  0.0060
7    y2     1            5.564     4       1.72E-10  4.7423  4.3386  1.1493  2.4968    0.0880   0.1256  0.0058  -0.0005  0.0060
8    y2     1            6.491     4       1.25E-10  4.9681  4.5726  1.1480  2.4983    0.0947   0.1248  0.0058  -0.0006  0.0060
9    y2     1            7.418     4       5.77E-11  5.1805  4.8286  1.1471  2.4999    0.1015   0.1240  0.0058  -0.0006  0.0059
10   y2     1            8.345     3       3.87E-09  5.3732  5.1080  1.1465  2.5016    0.1085   0.1232  0.0058  -0.0006  0.0059
11   y2     1            9.273     3       3.97E-10  5.5376  5.4124  1.1464  2.5034    0.1157   0.1226  0.0058  -0.0006  0.0059
12   y2     1    M2      10.2      1       0         5.6634  5.7443  1.1468  2.5053    0.1231   0.1219  0.0058  -0.0006  0.0059
13   y2     1            11.133    3       6.17E-10  5.9631  5.8674  1.1440  2.5064    0.1277   0.1201  0.0058  -0.0006  0.0059
14   y2     1            12.067    3       5.70E-09  6.2343  5.9987  1.1415  2.5075    0.1323   0.1184  0.0058  -0.0006  0.0059
15   y2     1            13        4       1.01E-10  6.4749  6.1381  1.1395  2.5085    0.1369   0.1167  0.0058  -0.0006  0.0059
16   y2     1            13.933    4       1.50E-10  6.6823  6.2862  1.1379  2.5095    0.1414   0.1151  0.0058  -0.0005  0.0059
17   y2     1            14.867    4       1.30E-10  6.8531  6.4436  1.1368  2.5105    0.1460   0.1135  0.0058  -0.0005  0.0059
18   y2     1            15.8      4       6.59E-11  6.9829  6.6113  1.1363  2.5114    0.1506   0.1120  0.0058  -0.0005  0.0059
19   y2     1            16.733    3       3.54E-09  7.0665  6.7906  1.1364  2.5122    0.1552   0.1106  0.0058  -0.0005  0.0059
20   y2     1            17.667    3       3.29E-10  7.0971  6.9835  1.1372  2.5130    0.1599   0.1093  0.0058  -0.0005  0.0059
...
175  y2     2    M19     83.2      1       0         2.2702  2.2831  1.1665  2.5146    -0.0126  0.1158  0.0060  -0.0005  0.0059

Table 4.3.2. Output of Program 4-3-2.

Obs  trait  marker  n_Iter  conv_err  LRT      Wald     ve  intcpt_0  intcpt_1  intcpt_2  intcpt_3  intcpt_4  a_m      a_f      var_1   cov_1_2  var_2
1    y2     M1      4       1.84E-11  3.6957   3.7435   1   -1E+10    -0.7519   0.0561    0.7242    1E+10     -0.0621  -0.1348  0.0058  -0.0001  0.0060
2    y2     M2      3       6.72E-09  6.0040   6.0544   1   -1E+10    -0.7744   0.0377    0.7116    1E+10     -0.1292  -0.1253  0.0059  -0.0006  0.0060
3    y2     M3      3       6.63E-09  6.9683   6.9752   1   -1E+10    -0.7866   0.0259    0.7061    1E+10     -0.1637  -0.1098  0.0059  -0.0004  0.0060
4    y2     M4      4       9.59E-13  9.9955   10.0300  1   -1E+10    -0.7780   0.0405    0.7257    1E+10     -0.1874  -0.1445  0.0059  -0.0004  0.0061
5    y2     M5      4       4.88E-12  17.4851  17.4554  1   -1E+10    -0.7941   0.0435    0.7426    1E+10     -0.1977  -0.2617  0.0059  0.0002   0.0060
6    y2     M6      4       4.27E-10  32.3860  31.6667  1   -1E+10    -0.8480   0.0250    0.7588    1E+10     -0.3181  -0.3468  0.0062  0.0008   0.0062
7    y2     M7      70      9.86E-09  27.3604  38.9494  1   -1E+10    -1.8158   0.0928    1.6222    1E+10     -0.4179  -1.6224  0.0145  -0.0060  0.1223
8    y2     M8      4       1.68E-10  19.9627  19.5890  1   -1E+10    -0.8219   0.0162    0.7222    1E+10     -0.2248  -0.2782  0.0060  0.0005   0.0060
9    y2     M9      4       1.22E-10  12.6076  12.5270  1   -1E+10    -0.7970   0.0228    0.7144    1E+10     -0.1708  -0.2213  0.0060  0.0003   0.0060
10   y2     M10     3       6.7E-09   3.4708   3.4456   1   -1E+10    -0.7802   0.0211    0.6937    1E+10     -0.1125  -0.0969  0.0059  0.0005   0.0059
11   y2     M11     3       5.54E-10  0.8295   0.8311   1   -1E+10    -0.7762   0.0242    0.6909    1E+10     -0.0229  -0.0659  0.0058  0.0001   0.0058
12   y2     M12     22      6.95E-09  3.1222   3.1047   1   -1E+10    -0.7702   0.0360    0.7082    1E+10     -0.0431  -0.1397  0.0189  -0.0135  0.0193
13   y2     M13     3       1.84E-09  3.8660   3.8527   1   -1E+10    -0.7732   0.0346    0.7068    1E+10     -0.0425  -0.1400  0.0058  -0.0005  0.0059
14   y2     M14     3       3.82E-09  4.1467   4.0948   1   -1E+10    -0.7815   0.0243    0.6993    1E+10     -0.1318  -0.0755  0.0058  -0.0002  0.0058
15   y2     M15     3       1.3E-09   2.2302   2.2114   1   -1E+10    -0.7655   0.0360    0.7074    1E+10     -0.0983  -0.0547  0.0059  -0.0002  0.0058
16   y2     M16     3       8.5E-09   4.3384   4.2811   1   -1E+10    -0.7580   0.0469    0.7239    1E+10     -0.1382  -0.0727  0.0060  -0.0003  0.0058
17   y2     M17     3       5.99E-10  1.5769   1.5711   1   -1E+10    -0.7582   0.0418    0.7115    1E+10     -0.0954  -0.0116  0.0060  -0.0005  0.0058
18   y2     M18     3       9.85E-10  0.9288   0.9229   1   -1E+10    -0.7757   0.0218    0.6904    1E+10     -0.0212  -0.0691  0.0060  -0.0003  0.0058
19   y2     M19     3       1.15E-09  2.4180   2.4229   1   -1E+10    -0.7773   0.0218    0.6933    1E+10     0.0173   -0.1188  0.0060  -0.0005  0.0058

Example 4: QTL mapping via the Bayesian method

This example shows the analysis of a simulated dataset of a BC population for a continuous trait (WANG et al. 2005). There are 121 markers evenly distributed along a chromosome covering 2400 cM in length, and the population consists of 498 individuals. The primary dataset and the MAP dataset are named E4data and E4map, respectively. All datasets used in examples 1-6 will be copied to the SASUSER library when PROC QTL is installed.

The following code invokes Bayesian QTL mapping; the posterior sample is then analyzed in 1 cM bins.

    /* Program 4-4 */
    proc qtl data=sasuser.E4data map=sasuser.E4map
             out=MCMC outpost=result/{step=1.0}
             method="bayes" genotype="expect" position="dynamic"
             coverage=25 burnin=2000 trimming=20 seed=0
             posteriorsample=500;
       model trait=;
       genotype A="A" H="H";
       estimate 'a'=-1 1;
       matingtype 'BC';
    run;

The output SAS dataset named RESULT has 2401 observations and 6 variables as shown in Table 4.4.

Table 4.4. Output of Program 4-4.

Obs   chr  marker  position  count  a       var_1
1     1    COL1    0         9      0.1417  0.0119
2     1            1         10     0.0787  0.0134
3     1            2         13     0.1064  0.0208
4     1            3         15     0.2197  0.0801
5     1            4         19     0.2256  0.0459
6     1            5         13     0.1863  0.0705
7     1            6         20     0.2990  0.0800
8     1            7         21     0.1705  0.0475
9     1            8         17     0.1505  0.1059
10    1            9         16     0.2484  0.0818
11    1            10        14     0.3160  0.0760
12    1            11        15     0.2122  0.0557
13    1            12        21     0.2449  0.0587
14    1            13        16     0.3824  0.4287
2401  1    COL121  2400      0      0.0000  0.0000
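To locate the bins that receive the most posterior support, the RESULT dataset can be sorted by the count variable listed in Table 4.4. The following is only a minimal sketch: it assumes RESULT sits in the WORK library and that count records how often the sampler placed a QTL in each bin. The dataset name top_bins is hypothetical.

```sas
/* Sort the bins by descending posterior count.
   top_bins is a hypothetical dataset name. */
proc sort data=result out=top_bins;
   by descending count;
run;

/* Print the ten bins with the largest counts */
proc print data=top_bins(obs=10);
   var chr marker position count a var_1;
run;
```

Sorting into a separate dataset leaves RESULT in genome order, so it can still be plotted against position afterwards.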

Example 5: Estimating genomewide epistatic effects via the empirical Bayesian method

This example uses the mice data (ATCHLEY et al. 1997) of Example 1 to demonstrate marker analysis for continuous traits via the empirical Bayesian method (eBayes). Epistatic effects can be estimated by turning on the INTERACTION option in the PROC QTL statement.

The following code invokes the empirical Bayesian method.

    /* Program 4-5 */
    proc qtl data=sasuser.E1data map=sasuser.E1map
             out=result method='ebayes' interaction
             ebayesparm={-2, 0};
       model wt10= ;
       matingtype 'F2';
       genotype A1A1='A' A1A2='H' A2A2='B';
       estimate 'additive'=1 0 -1;
    run;

The output SAS dataset named RESULT has 351 observations and 14 variables as shown in Table 4.5.

Table 4.5. Output of Program 4-5 for the empirical Bayes method.

Obs  chr1  marker1  pos1  chr2  marker2  pos2   u_additi  s_additi  v_additi  f_additi  n_Iter  conv_err  ve        intercpt
1    1     M1       0     1     M1       0      4.33E-31  1E-30     1E-15     1.87E-31  20      6.57E-09  16.8822   63.1644
2    1     M1       0     1     M2       19.6   -1.62     3.128471  0.709898  5.207614  20      6.57E-09  16.8822   63.1644
3    1     M1       0     1     M3       25.5   -9.2E-32  1E-30     1E-15     8.44E-33  20      6.57E-09  16.8822   63.1644
4    1     M1       0     1     M4       26.4   -9.2E-32  1E-30     1E-15     8.51E-33  20      6.57E-09  16.8822   63.1644
5    1     M1       0     1     M5       42.9   5.45E-31  1E-30     1E-15     2.97E-31  20      6.57E-09  16.8822   63.1644
6    1     M1       0     1     M6       48.7   4.62E-31  1E-30     1E-15     2.14E-31  20      6.57E-09  16.8822   63.1644
7    1     M1       0     1     M7       50.1   4.96E-31  1E-30     1E-15     2.46E-31  20      6.57E-09  16.8822   63.1644
8    1     M1       0     1     M8       62.9   5.9E-31   1E-30     1E-15     3.48E-31  20      6.57E-09  16.8822   63.1644
9    1     M1       0     1     M9       69.6   2.63E-31  1E-30     1E-15     6.91E-32  20      6.57E-09  16.8822   63.1644
10   1     M1       0     1     M10      71.7   2.84E-31  1E-30     1E-15     8.06E-32  20      6.57E-09  16.8822   63.1644
11   1     M1       0     1     M11      73.9   1.18E-31  1E-30     1E-15     1.4E-32   20      6.57E-09  16.8822   63.1644
12   1     M1       0     1     M12      77.9   2.57E-31  1E-30     1E-15     6.59E-32  20      6.57E-09  16.8822   63.1644
13   1     M1       0     1     M13      85.4   4.39E-31  1E-30     1E-15     1.93E-31  20      6.57E-09  16.8822   63.1644
14   1     M1       0     1     M14      98.7   8.99E-31  1E-30     1E-15     8.09E-31  20      6.57E-09  16.8822   63.1644
15   1     M1       0     1     M15      101.7  8.3E-31   1E-30     1E-15     6.89E-31  20      6.57E-09  16.8822   63.1644
16   1     M1       0     1     M16      104.2  5.67E-31  1E-30     1E-15     3.22E-31  20      6.57E-09  16.8822   63.1644
17   1     M1       0     1     M17      110.5  2.86E-31  1E-30     1E-15     8.18E-32  20      6.57E-09  16.8822   63.1644
18   1     M1       0     1     M18      121.6  -1E-31    1E-30     1E-15     1.02E-32  20      6.57E-09  16.8822   63.1644
19   1     M1       0     2     M19      0      3.29E-32  1E-30     1E-15     1.08E-33  20      6.57E-09  16.8822   63.1644
20   1     M1       0     2     M20      13.2   -6.3E-31  1E-30     1E-15     3.95E-31  20      6.57E-09  16.8822   63.1644
351  2     M26      87.9  2     M26      87.9   -6.1E-31  1E-30     1E-15     3.77E-31  20      6.57E-09  16.88219  63.1644
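Most rows of the eBayes output carry estimates that have been shrunk to values on the order of 1E-31, so it can help to keep only the marker pairs whose estimate is clearly non-zero. The sketch below assumes that u_additi holds the estimated effect; the 1E-6 cutoff and the dataset name nonzero are arbitrary choices made for illustration only.

```sas
/* Keep marker pairs whose estimated effect was not shrunk to (near) zero.
   The 1E-6 cutoff is an arbitrary illustrative value,
   and nonzero is a hypothetical dataset name. */
data nonzero;
   set result;
   where abs(u_additi) > 1e-6;
run;

proc print data=nonzero;
   var chr1 marker1 pos1 chr2 marker2 pos2 u_additi s_additi f_additi;
run;
```

Of the rows displayed in Table 4.5, only observation 2 (u_additi = -1.62) would pass this filter.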

Example 6: Joint mapping of QTL for multiple traits

This example demonstrates how to perform joint mapping for multiple continuous traits using an interval mapping method and the Bayesian method.

This example uses simulated data from an F2 population. There are 241 markers dispersed evenly along a chromosome covering 2400 cM in length. Two continuous traits, y1 and y2, and all 241 markers were generated for a total of 500 individuals. The two traits were subsequently converted to ordinal traits, named yy1 and yy2, which have three and four categories, respectively. The primary dataset and the MAP dataset are named E6data and E6map, respectively.

In terms of program syntax, there is no significant difference between QTL mapping of multiple traits and that of a single trait. Users only need to specify multiple dependent variables in the MODEL statement. However, not all methods in the PROC QTL statement are valid for multiple trait analysis. Please refer to the summary of the METHOD option in the PROC QTL statement for more details about the availability of methods for joint mapping of multiple traits.

In the current version of PROC QTL, only two interval mapping methods, LS and ML, are valid for joint mapping of multiple continuous traits. The following code invokes the ML method to jointly map two continuous traits, y1 and y2.

    /* Program 4-6-1 */
    proc qtl data=sasuser.E6data map=sasuser.E6map
             out=RESULT method="ML" step=1.0;
       model y1 y2=;
       matingtype 'F2';
       genotype A1A1='1' A1A2='2' A2A2='3';
       estimate 'additive'=1 0 -1;
    run;

The output SAS dataset named RESULT has 2401 observations and 16 variables as shown in Table 4.6.1.
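Once the scan has finished, the position with the largest joint test statistic is usually of most interest. One way to pull it out is sketched here, assuming RESULT is in the WORK library and using the variable names that appear in the header of Table 4.6.1:

```sas
/* Report the scan position(s) with the largest joint LRT.
   PROC SQL remerges the MAX() summary back onto each row
   and writes a NOTE about this to the log. */
proc sql;
   select chr1, position, LRT, additiv1, additiv2
      from result
      having LRT = max(LRT);
quit;
```

If several positions tie for the maximum, all of them are listed.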

The next program shows how to map multiple traits jointly using the Bayesian method. It performs joint mapping for the continuous trait y1 and the ordinal trait yy2 and directly provides the post-analysis result of the Bayesian analysis.

    /* Program 4-6-2 */
    proc qtl data=sasuser.E6data map=sasuser.E6map
             out=MCMC outpost=RESULT/{step=1.0}
             method="bayes" genotype="impute" position="dynamic"
             coverage=20 burnin=2000 trim=20
             posteriorsample=300;
       class yy2;
       model y1 yy2=;
       matingtype 'F2';
       genotype A1A1='1' A1A2='2' A2A2='3';
       estimate 'additive'=1 0 -1;
    run;

The output SAS dataset named RESULT has 2401 observations and 6 variables as shown in Table 4.6.2.

Table 4.6.1. Output of Program 4-6-1.

Obs   chr1  marker1  position  n_Iter  conv_err  LRT      ve1      cov1_2   ve2      trait1  LRT1     intecpt1  additiv1  trait2  LRT2    intecpt2  additiv2
1     1     M1       0         1       2.19E-09  7.6747   33.1105  -4.1286  30.8966  y1      7.6747   10.0396   0.9834    y2      0.7937  5.0295    -0.3077
2     1              1         1       4.63E-09  7.6940   33.1109  -4.1263  30.8951  y1      7.6940   10.0382   1.0031    y2      0.8144  5.0300    -0.3188
3     1              2         1       7.95E-09  7.6587   33.1146  -4.1250  30.8939  y1      7.6587   10.0369   1.0158    y2      0.8313  5.0305    -0.3282
4     1              3         2       7.01E-11  7.5653   33.1220  -4.1248  30.8930  y1      7.5653   10.0356   1.0209    y2      0.8439  5.0309    -0.3354
5     1              4         2       1.16E-10  7.4123   33.1331  -4.1259  30.8925  y1      7.4123   10.0344   1.0178    y2      0.8516  5.0314    -0.3402
6     1              5         2       1.68E-10  7.2006   33.1478  -4.1282  30.8924  y1      7.2006   10.0334   1.0062    y2      0.8541  5.0319    -0.3423
7     1              6         2       2.13E-10  6.9335   33.1659  -4.1317  30.8927  y1      6.9335   10.0325   0.9862    y2      0.8509  5.0323    -0.3415
8     1              7         2       2.15E-10  6.6171   33.1869  -4.1364  30.8934  y1      6.6171   10.0319   0.9581    y2      0.8416  5.0326    -0.3377
9     1              8         2       1.48E-10  6.2593   33.2102  -4.1421  30.8945  y1      6.2593   10.0314   0.9225    y2      0.8261  5.0329    -0.3309
10    1              9         2       5.08E-11  5.8703   33.2352  -4.1485  30.8959  y1      5.8703   10.0311   0.8804    y2      0.8046  5.0331    -0.3214
11    1     M2       10        2       3.15E-14  5.4613   33.2611  -4.1556  30.8976  y1      5.4613   10.0311   0.8330    y2      0.7777  5.0332    -0.3095
12    1              11        2       6.33E-10  6.0297   33.2218  -4.1497  30.8985  y1      6.0297   10.0280   0.9020    y2      0.7274  5.0337    -0.3051
13    1              12        2       7.63E-09  6.6088   33.1819  -4.1448  30.8998  y1      6.6088   10.0248   0.9672    y2      0.6759  5.0340    -0.2986
14    1              13        3       3.92E-10  7.1809   33.1423  -4.1410  30.9015  y1      7.1809   10.0214   1.0260    y2      0.6258  5.0343    -0.2905
15    1              14        3       9.16E-10  7.7275   33.1043  -4.1387  30.9036  y1      7.7275   10.0182   1.0761    y2      0.5795  5.0346    -0.2815
16    1              15        3       1.13E-09  8.2309   33.0689  -4.1380  30.9060  y1      8.2309   10.0150   1.1159    y2      0.5384  5.0348    -0.2720
17    1              16        3       8.29E-10  8.6764   33.0372  -4.1388  30.9086  y1      8.6764   10.0121   1.1446    y2      0.5029  5.0350    -0.2625
18    1              17        3       3.65E-10  9.0531   33.0100  -4.1412  30.9113  y1      9.0531   10.0094   1.1619    y2      0.4729  5.0351    -0.2531
19    1              18        3       8.68E-11  9.3543   32.9879  -4.1451  30.9142  y1      9.3543   10.0070   1.1681    y2      0.4479  5.0353    -0.2440
20    1              19        2       3E-09     9.5779   32.9712  -4.1501  30.9171  y1      9.5779   10.0050   1.1640    y2      0.4272  5.0354    -0.2351
2401  1     M241     2400      2       4.85E-10  20.9651  32.5633  -4.7156  30.7649  y1      20.9651  9.9730    1.4156    y2      2.9831  4.9946    0.5889

Table 4.6.2. Output of Program 4-6-2.

Obs   chr  marker  position  count  additiv1  additiv2
1     1    M1      0         13     -0.0449   -0.3178
2     1            1         50     -0.0426   -0.3195
3     1            2         50     0.0149    -0.3640
4     1            3         35     0.0346    -0.2984
5     1            4         51     0.0683    -0.2998
6     1            5         38     -0.0547   -0.2630
7     1            6         20     -0.0762   -0.2712
8     1            7         18     -0.0375   -0.1729
9     1            8         19     -0.1490   -0.0951
10    1            9         15     0.0032    -0.1559
11    1    M2      10        10     -0.0730   -0.0483
12    1            11        8      -0.0166   -0.0294
13    1            12        1      -0.0863   -0.0431
14    1            13        9      -0.1531   -0.2603
15    1            14        13     -0.2016   -0.2588
16    1            15        7      -0.0920   -0.0806
17    1            16        7      -0.1385   -0.2393
18    1            17        3      -0.0925   -0.0966
19    1            18        1      -0.3508   -0.6257
20    1            19        4      -0.1891   -0.1460
21    1    M3      20        3      -0.1143   -0.1384
22    1            21        2      -0.4757   -0.3172
23    1            22        2      -0.2336   -0.2880
24    1            23        2      -0.3554   -0.0198
25    1            24        3      -0.5092   0.0164
26    1            25        5      -0.1678   -0.1048
27    1            26        2      -0.3457   -0.2301
28    1            27        8      -0.3280   0.0124
29    1            28        6      -0.4372   0.0680
30    1            29        6      -0.0316   0.0254
31    1    M4      30        10     -0.3139   0.0296
33    1            32        6      -0.2115   0.0282
34    1            33        7      -0.4435   0.0230
35    1            34        5      -0.2716   0.0951
36    1            35        13     -0.3034   0.1016
37    1            36        5      -0.3328   0.1946
38    1            37        6      -0.1580   0.1198
39    1            38        10     -0.1873   0.2163
2401  1    M241    2400      0      0.0000    0.0000

Example 7: Estimating genomewide QTL effects for discrete traits that follow some special distributions

The first part of this example uses simulated data to demonstrate how to perform QTL mapping for a discrete trait that follows a Poisson distribution. There are 500 individuals sampled from an F2 population. The genotypes of 481 markers and the phenotype of a Poisson-distributed trait have been simulated for each individual. The 481 markers are evenly dispersed along a 2400 cM chromosome. The primary dataset and the MAP dataset are named E7data1 and E7map1, respectively.

The following code will invoke the ML method.

    /* Program 4-7-1 */
    proc qtl data=SASUser.E7data1 map=SASUser.E7map1
             out=result method="ML" distribution="poisson";
       model y=;
       matingtype "F2";
       genotype A="1" B="-1" H="0";
       Estimate "Add"=1 0 -1;
    run;

The output SAS dataset named RESULT has 2401 observations and 12 variables as shown in Table 4.7.1.

Please note that although the Poisson variable is discrete, users should NOT declare it in the CLASS statement.

The second part uses real data from an F2 population for the trait of wheat female sterility (DOU et al. 2009) to demonstrate QTL mapping for binomial traits. The seed setting ratio on fully pollinated spikes of 243 plants was evaluated to measure female fertility; it is the ratio of the number of seed-setting spikelets (seed_setting) to the total number of spikelets (spikelets). There are 28 molecular markers distributed along 5 chromosomes measured for each individual. The primary dataset and the MAP dataset are named E7data2 and E7map2, respectively.

The following code invokes the FISHER method.

    /* Program 4-7-2 */
    proc qtl data=SASUser.E7data2 map=SASUser.E7map2
             out=result method="FISHER" distribution="binomial";
       model seed_setting/spikelets =;
       matingtype "F2";
       genotype A="A" B="B" H="H";
       Estimate "Add"=1 0 -1;
    run;

The output SAS dataset named RESULT has 376 observations and 12 variables as shown in Table 4.7.2.

A pseudo variable named 'binary' was generated using the following rule to indicate whether the plant is completely sterile or not:

\[
\text{binary} =
\begin{cases}
1 & \text{for seed\_setting} > 0 \\
0 & \text{for seed\_setting} = 0
\end{cases}
\]
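The rule for the pseudo variable can be reproduced in a short DATA step, which is also a convenient place to compute the seed setting ratio used in Program 4-7-2. This is a sketch only: the dataset name e7check and the variable names binary_chk and ratio are hypothetical, chosen so that nothing already stored in E7data2 is overwritten.

```sas
/* Recompute the pseudo variable and the seed setting ratio.
   e7check, binary_chk and ratio are hypothetical names. */
data e7check;
   set sasuser.E7data2;
   /* 1 if at least one spikelet set seed, 0 if the plant is completely sterile */
   if not missing(seed_setting) then binary_chk = (seed_setting > 0);
   /* seed setting ratio; left missing when no spikelets were counted */
   if spikelets > 0 then ratio = seed_setting / spikelets;
run;
```

A quick PROC FREQ on binary_chk shows how many plants fall into each class before the trait is mapped.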

The variable 'binary' can be treated as a special binomial trait in which the number of trials is one for all individuals. In this situation, users can omit the number of trials in the MODEL statement, as shown below.

    /* Program 4-7-3 */
    proc qtl data=SASUser.E7data2 map=SASUser.E7map2
             out=result method="FISHER" distribution="binomial";
       model binary =;
       matingtype "F2";
       genotype A="A" B="B" H="H";
       Estimate "Add"=1 0 -1;
    run;

The output dataset named 'result' includes 376 observations and 12 variables as shown in Table 4.7.3.

Table 4.7.1. Output of Program 4-7-1.

Obs   trait  chr  marker  position  n_Iter  conv_err  LRT      Wald     ve  intercpt  Add     var_1
1     y      1    M1      0         3       5.64E-16  38.0860  37.6590  1   0.5246    0.3044  0.0025
2     y      1            1         3       2.35E-09  37.3180  37.1450  1   0.5249    0.3080  0.0026
3     y      1            2         4       1.98E-10  35.5820  35.6630  1   0.5259    0.3049  0.0026
4     y      1            3         4       8.45E-10  32.7950  32.8800  1   0.5280    0.2941  0.0026
5     y      1            4         4       6.97E-10  28.8170  28.4980  1   0.5314    0.2726  0.0026
6     y      1    M2      5         2       4.34E-09  23.4580  23.2960  1   0.5374    0.2311  0.0023
7     y      1            6         3       1.14E-10  26.0970  25.8090  1   0.5368    0.2477  0.0024
8     y      1            7         3       8.83E-10  28.4490  28.1070  1   0.5366    0.2600  0.0024
9     y      1            8         3       8.49E-10  30.4250  30.0560  1   0.5365    0.2681  0.0024
10    y      1            9         3       1.17E-10  31.9600  31.5830  1   0.5367    0.2719  0.0023
11    y      1    M3      10        3       1.43E-16  33.0010  32.7310  1   0.5373    0.2710  0.0022
12    y      1            11        4       4.06E-09  42.2450  41.1980  1   0.5281    0.3275  0.0026
13    y      1            12        4       4.34E-09  49.3450  48.8000  1   0.5230    0.3551  0.0026
14    y      1            13        4       1.01E-09  54.5250  53.8520  1   0.5201    0.3679  0.0025
15    y      1            14        4       3.42E-11  58.0440  57.0110  1   0.5188    0.3706  0.0024
16    y      1    M4      15        3       2.14E-14  60.0470  58.9690  1   0.5190    0.3646  0.0023
2401  y      1    M481    2400      2       7.55E-16  0.4810   0.4810   1   0.5493    0.0339  0.0024

Table 4.7.2. Output of Program 4-7-2.

Obs  trait     chr  marker   position  n_Iter  conv_err  LRT     Wald    ve  intercpt  Add      var_1
1    seed_set  1    Xwmc667  0.00      3       1.77E-10  194.99  187.47  1   0.7834    0.3403   0.0006
2    seed_set  1             0.99      3       2.18E-10  197.86  190.73  1   0.7846    0.3513   0.0006
3    seed_set  1             1.98      3       3.15E-10  200.68  193.95  1   0.7856    0.3624   0.0007
4    seed_set  1             2.97      3       4.81E-10  203.41  197.13  1   0.7867    0.3735   0.0007
5    seed_set  1             3.96      3       7.25E-10  206.05  200.22  1   0.7877    0.3847   0.0007
6    seed_set  1             4.95      3       1.06E-09  208.57  203.22  1   0.7886    0.3958   0.0008
7    seed_set  1             5.94      3       1.49E-09  210.95  206.08  1   0.7895    0.4066   0.0008
8    seed_set  1             6.93      3       2.01E-09  213.17  208.79  1   0.7902    0.4173   0.0008
9    seed_set  1             7.92      3       2.62E-09  215.20  211.32  1   0.7908    0.4275   0.0009
10   seed_set  1             8.91      3       3.31E-09  217.02  213.64  1   0.7913    0.4373   0.0009
11   seed_set  1             9.90      3       4.05E-09  218.62  215.72  1   0.7916    0.4465   0.0009
12   seed_set  1             10.89     3       4.81E-09  219.98  217.56  1   0.7917    0.4550   0.0010
13   seed_set  1             11.88     3       5.55E-09  221.08  219.11  1   0.7916    0.4628   0.0010
14   seed_set  1             12.87     3       6.25E-09  221.91  220.37  1   0.7912    0.4696   0.0010
15   seed_set  1             13.86     3       6.86E-09  222.45  221.31  1   0.7907    0.4753   0.0010
16   seed_set  1             14.85     3       7.36E-09  222.70  221.93  1   0.7898    0.4800   0.0010
376  seed_set  5    cft21    155.66    2       6.60E-09  9.40    9.36    1   0.7022    -0.0792  0.0007

Table 4.7.3. Output of Program 4-7-3.

Obs  trait   chr  marker   position  n_Iter  conv_err  LRT   Wald  ve  intercpt  Add      var_1
1    Binary  1    Xwmc667  0.00      4       7.77E-11  8.06  7.43  1   1.0918    0.3827   0.0197
2    Binary  1             0.99      4       1.00E-10  8.10  7.50  1   1.0931    0.3935   0.0207
3    Binary  1             1.98      4       1.32E-10  8.14  7.56  1   1.0944    0.4043   0.0216
4    Binary  1             2.97      4       1.76E-10  8.17  7.62  1   1.0957    0.4150   0.0226
5    Binary  1             3.96      4       2.40E-10  8.20  7.68  1   1.0968    0.4256   0.0236
6    Binary  1             4.95      4       3.28E-10  8.21  7.73  1   1.0979    0.4360   0.0246
7    Binary  1             5.94      4       4.48E-10  8.23  7.78  1   1.0988    0.4461   0.0256
8    Binary  1             6.93      4       6.03E-10  8.23  7.82  1   1.0996    0.4557   0.0266
9    Binary  1             7.92      4       7.96E-10  8.23  7.85  1   1.1001    0.4649   0.0275
10   Binary  1             8.91      4       1.02E-09  8.22  7.88  1   1.1005    0.4734   0.0284
11   Binary  1             9.90      4       1.28E-09  8.20  7.90  1   1.1006    0.4813   0.0293
12   Binary  1             10.89     4       1.54E-09  8.18  7.91  1   1.1005    0.4883   0.0301
13   Binary  1             11.88     4       1.80E-09  8.15  7.91  1   1.1001    0.4943   0.0309
14   Binary  1             12.87     4       2.01E-09  8.11  7.91  1   1.0993    0.4994   0.0315
15   Binary  1             13.86     4       2.17E-09  8.06  7.89  1   1.0983    0.5033   0.0321
16   Binary  1             14.85     4       2.25E-09  8.00  7.86  1   1.0969    0.5061   0.0326
17   Binary  1             15.84     4       2.24E-09  7.94  7.82  1   1.0951    0.5076   0.0329
18   Binary  1             16.83     4       2.13E-09  7.87  7.78  1   1.0931    0.5078   0.0332
19   Binary  1             17.82     4       1.93E-09  7.79  7.72  1   1.0906    0.5066   0.0333
20   Binary  1             18.81     4       1.68E-09  7.70  7.64  1   1.0879    0.5041   0.0332
21   Binary  1             19.80     4       1.38E-09  7.61  7.56  1   1.0848    0.5002   0.0331
22   Binary  1             20.79     4       1.08E-09  7.51  7.47  1   1.0814    0.4950   0.0328
23   Binary  1             21.79     4       8.05E-10  7.40  7.36  1   1.0778    0.4885   0.0324
24   Binary  1             22.78     4       5.65E-10  7.28  7.24  1   1.0740    0.4808   0.0319
25   Binary  1             23.77     4       3.74E-10  7.15  7.11  1   1.0699    0.4719   0.0313
26   Binary  1             24.76     4       2.31E-10  7.02  6.97  1   1.0658    0.4620   0.0306
27   Binary  1             25.75     4       1.34E-10  6.88  6.82  1   1.0616    0.4512   0.0298
28   Binary  1             26.74     4       7.15E-11  6.73  6.66  1   1.0573    0.4396   0.0290
29   Binary  1             27.73     4       3.50E-11  6.57  6.50  1   1.0531    0.4273   0.0281
30   Binary  1             28.72     4       1.56E-11  6.41  6.32  1   1.0489    0.4145   0.0272
376  Binary  5    cft21    155.66    3       6.35E-11  0.37  0.36  1   0.9867    -0.0853  0.0200

Example 8: Composite interval mapping using PROC QTL

PROC QTL does not directly support composite interval mapping (CIM, ZENG 1994), but users can perform CIM with some extra coding by wrapping PROC QTL in a SAS macro. This example provides such a macro.

The following code will invoke CIM analysis in two steps: 1) select marker cofactors according to the result of marker analysis; 2) perform CIM analysis using cofactors selected in the first step.

    /* Program 4-8 */
    /* NOTE: the libref OUT used below is not assigned in this listing;
       it must point to a library that holds the full data and map. */
    data map; set SASUser.E7map1; run;
    data data; set SASUser.E7data1; run;

    /* Step 1: estimate effects of markers by marker analysis */
    proc qtl data=out.data map=out.map out=lrtmarker
             method="fisher" step=100;
       model y1= ;
       matingtype "F2";
       genotype A="1" B="-1" H="0";
       Estimate "Add"=1 0 -1;
    run;

    /* Select markers whose LRT exceeds the threshold as cofactors */
    proc iml;
       use lrtmarker;
       read all var {marker LRT Position Chr};
       free index;
       do i=1 to nROW(LRT);
          /* the threshold to select cofactors */
          if (LRT[i]>80) then index=index//i;
       end;
       create cofactor var {marker position chr};
       marker=marker[index];
       position=position[index];
       chr=chr[index];
       append;
    quit;

    /* Step 2: scan each marker interval with the remaining cofactors in the model */
    %macro CIM;
       %do M=1 %to 480;
          proc iml;
             /* names of the two markers flanking the current interval */
             Call SYMPUTX("M1", %str(compress("M"+char(&M))));
             Call SYMPUTX("M2", %str(compress("M"+char(&M+1))));
             use cofactor;
             read all var {marker chr};
             %put M1 &M1 M2 &M2;
             /* collect all cofactors except the two flanking markers */
             CoMark="";
             do i=1 to nrow(marker);
                if ("&M1"^=marker[i])&("&M2"^=marker[i])
                   then CoMark=CoMark+(marker[i]);
             end;
             Call SYMPUTX("CoFact", %str(coMark));
          quit;

          /* restrict the map to the current interval */
          data map;
             set out.map;
             if (mark="&M1") | (mark="&M2");
          run;

          proc qtl data=data map=map out=tmp
                   method="fisher" step=1;
             class &CoFact;
             model y1= &CoFact;
             matingtype "F2";
             genotype A="1" B="-1" H="0";
             Estimate "Add"=1 0 -1;
          run;

          proc append base=RESULT data=tmp FORCE; run;
       %end;
    %mend;
    %CIM;

The output dataset contains 2880 observations and 42 variables, which includes 30 cofactors from 15 markers. For convenience of coding, all markers except for the first and the last are calculated twice in this code. However, users may avoid this duplication by adding the RANGE statement in PROC QTL.

Example 9: Permutation for the Bayesian shrinkage analysis

There are two permutation strategies in the Bayesian shrinkage analysis, “permutation outside the Markov chain” and “permutation inside the Markov chain” (CHE and XU 2010). PROC QTL provides no special option for the first strategy because none is needed: users simply permute the data themselves and call PROC QTL to analyze the permuted data. The second strategy (permutation inside the Markov chain) is supported by PROC QTL. Users can turn on the PERMUTATION option in the PROC QTL statement to generate the posterior sample from the permuted data. The posterior distribution for each parameter then mimics the null distribution.
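For the first strategy, the permutation itself can be done in a few DATA and SORT steps before PROC QTL is called. The sketch below shuffles the phenotype among individuals while leaving the marker genotypes in place. It assumes that the phenotype in E4data is the variable trait (as in the MODEL statement of Program 4-4); the dataset names pheno_key and E4perm and the seed value are hypothetical.

```sas
/* Attach a random sort key to the phenotype column only.
   pheno_key and E4perm are hypothetical dataset names. */
data pheno_key;
   set sasuser.E4data(keep=trait);
   if _n_ = 1 then call streaminit(2010);   /* arbitrary seed */
   u = rand('uniform');
run;

/* Sorting by the random key puts the phenotypes in random order */
proc sort data=pheno_key;
   by u;
run;

/* One-to-one merge (no BY statement): genotypes keep their original
   order, phenotypes are reassigned at random */
data E4perm;
   merge sasuser.E4data(drop=trait) pheno_key(drop=u);
run;
```

E4perm can then be supplied to the DATA= option in place of sasuser.E4data, and the whole sequence repeated with a different seed for each permuted replicate.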

The following code will analyze the permuted data and generate the null distribution for each parameter.

    /* Program 4-9 */
    proc qtl data=sasuser.E4data map=sasuser.E4map
             out=mcmcsample
             outpost=post/{quantile={0.005 0.025 0.975 0.995}}
             method="bayes" genotype="expect" position="static"
             coverage=25 burnin=2000 trimming=10 seed=0
             posteriorsample=1000 permutation;
       model trait=;
       genotype A="A" H="H";
       estimate 'a'=-1 1;
       matingtype 'BC';
    run;

The output dataset named post includes 96 observations and 10 variables as shown in Table 4.9 given below. The last four variables (Q1_1 – Q1_4) represent the 0.5, 2.5, 97.5 and 99.5 percentiles of the posterior samples.

Table 4.9. Output of Program 4-9.

Obs  chr  marker  position  count  a        var_1   Q1_1     Q1_2     Q1_3    Q1_4
1    1            12.5      1000   -0.0177  0.0601  -1.4888  -0.5774  0.3256  1.0308
2    1            37.5      1000   -0.0031  0.0298  -0.9477  -0.2730  0.1864  0.9452
3    1            62.5      1000   0.0012   0.0455  -1.1648  -0.3751  0.4248  0.9222
4    1            87.5      1000   -0.0017  0.0365  -1.1433  -0.3094  0.3434  0.9824
5    1            112.5     1000   0.0021   0.0451  -1.1313  -0.4514  0.4194  1.1619
6    1            137.5     1000   0.0049   0.0199  -0.6139  -0.0923  0.1664  0.9805
7    1            162.5     1000   0.0004   0.0351  -0.9348  -0.2300  0.2177  1.0350
8    1            187.5     1000   -0.0012  0.0273  -0.9603  -0.2001  0.1184  1.2596
9    1            212.5     1000   0.0124   0.0381  -0.8040  -0.2210  0.3988  1.3504
10   1            237.5     1000   -0.0033  0.0324  -0.8677  -0.3894  0.2549  1.0250
11   1            262.5     1000   0.0098   0.0406  -0.8971  -0.2022  0.4294  1.2351
12   1            287.5     1000   0.0022   0.0290  -0.7955  -0.2117  0.2380  0.8539
13   1            312.5     1000   -0.0032  0.0344  -1.0110  -0.3552  0.2905  0.9551
14   1            337.5     1000   0.0065   0.0397  -0.8985  -0.2967  0.3468  1.3176
15   1            362.5     1000   -0.0057  0.0367  -1.1300  -0.4153  0.2819  1.0034
16   1            387.5     1000   0.0098   0.0542  -1.0089  -0.3598  0.4822  1.4772
17   1            412.5     1000   0.0035   0.0395  -0.9550  -0.3672  0.3750  1.1798
18   1            437.5     1000   -0.0007  0.0317  -0.9669  -0.3305  0.2667  0.9534
19   1            462.5     1000   -0.0067  0.0489  -1.2084  -0.5667  0.4427  1.0023
20   1            487.5     1000   -0.0047  0.0410  -1.1607  -0.3466  0.2027  0.9376
21   1            512.5     1000   -0.0011  0.0650  -1.3829  -0.5293  0.5804  1.1739
22   1            537.5     1000   -0.0016  0.0317  -0.9768  -0.3057  0.2744  1.0987
23   1            562.5     1000   0.0045   0.0126  -0.5987  -0.0645  0.1752  0.6525
24   1            587.5     1000   -0.0006  0.0325  -0.9279  -0.3793  0.3702  0.8921
25   1            612.5     1000   0.0044   0.0340  -0.9805  -0.3345  0.3615  1.0276
26   1            637.5     1000   0.0066   0.0417  -1.0039  -0.3614  0.4175  1.0229
27   1            662.5     1000   -0.0073  0.0356  -1.0884  -0.3679  0.2829  0.9232
28   1            687.5     1000   -0.0007  0.0495  -1.1536  -0.4222  0.4656  0.8986
29   1            712.5     1000   -0.0023  0.0425  -1.0611  -0.3268  0.2902  1.0233
30   1            737.5     1000   -0.0041  0.0392  -1.0285  -0.3175  0.2239  1.2047
31   1            762.5     1000   -0.0018  0.0450  -1.0455  -0.3505  0.2830  1.2809
32   1            787.5     1000   0.0083   0.0481  -1.1236  -0.2784  0.4541  1.2732
33   1            812.5     1000   -0.0156  0.0493  -1.2771  -0.5003  0.3509  0.8893
34   1            837.5     1000   0.0053   0.0337  -0.8350  -0.3215  0.3717  1.1037
35   1            862.5     1000   0.0004   0.0295  -0.8416  -0.2996  0.2918  0.9336
36   1            887.5     1000   0.0078   0.0290  -0.6803  -0.2570  0.3356  1.1436
37   1            912.5     1000   -0.0015  0.0484  -1.3270  -0.3704  0.4736  1.0440
38   1            937.5     1000   -0.0108  0.0364  -1.1106  -0.3686  0.1809  0.8488
96   1            2387.5    1000   0.0100   0.0409  -1.0118  -0.2834  0.4688  1.1288
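To see how much the null percentiles vary from bin to bin, the POST dataset can be summarized with PROC MEANS. This is only an illustrative sketch that assumes POST is in the WORK library; how the resulting ranges are turned into a significance threshold is left to the user and is not prescribed here.

```sas
/* Summarize the null percentiles across all bins of the genome */
proc means data=post min mean max;
   var Q1_1 Q1_2 Q1_3 Q1_4;
run;
```

The minimum of Q1_1 and the maximum of Q1_4, for example, give the most extreme 0.5 and 99.5 percentiles observed in any bin.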

References:

Atchley, W. R., S. Xu and D. E. Cowley, 1997 Altering developmental trajectories in mice by restricted index selection. Genetics 146: 629-640.

Che, X., and S. Xu, 2010 Significance test and genome selection in Bayesian shrinkage analysis. International Journal of Plant Genomics 2010: 11 doi: 10.1155/2010/893206.

Dou, B., B. Hou, H. Xu, X. Lou, X. Chi et al., 2009 Efficient mapping of a female sterile gene in wheat (Triticum aestivum L.). Genetical Research 91: 337-343 doi: 10.1017/S0016672309990218.

Haley, C. S., and S. A. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315-324.

Han, L., and S. Xu, 2008 A Fisher scoring algorithm for the weighted regression method of QTL mapping. Heredity 101: 453-464 doi: 10.1038/hdy.2008.78.

Lan, H., M. Chen, J. Flowers, B. Yandell, D. Stapleton et al., 2006 Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2: e6.

Lander, E. S., and D. Botstein, 1989 Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185-199.

SAS Institute Inc., 1991 SAS/TOOLKIT® Software: Usage and Reference, Version 6, First Edition. Cary, NC: SAS Institute Inc.

Wang, H., Y. Zhang, X. Li, G. L. Masinde, S. Mohan et al., 2005 Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465-480 doi: 10.1534/genetics.104.039354.

Wang, S., C. J. Basten and Z. B. Zeng, 2007 Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC (http://statgen.ncsu.edu/qtlcart/WQTLCart.htm).

Xu, S., 1998a Further investigation on the regression method of mapping quantitative trait loci. Heredity 80: 364-373.

Xu, S., 1998b Iteratively reweighted least squares mapping of quantitative trait loci. Behavior Genetics 28: 341-355.

Xu, S., 2003 Estimating Polygenic Effects Using Markers of the Entire Genome. Genetics 163: 789-801.

Xu, S., 2007 An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63: 513-521 doi: 10.1111/j.1541-0420.2006.00711.x.

Xu, S., 2008 Quantitative trait locus mapping can benefit from segregation distortion. Genetics 180: 2201-2208 doi: 10.1534/genetics.108.090688.

Xu, S., and Z. Hu, 2009 Mapping quantitative trait loci using distorted markers. International Journal of Plant Genomics 2009: Article ID 410825, 11 doi: 10.1155/2009/410825.

Zeng, Z. B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457-1468.

Zou, J. H., X. B. Pan, Z. X. Chen, J. Y. Xu, J. F. Lu et al., 2000 Mapping quantitative trait loci controlling sheath blight resistance in two rice cultivars (Oryza sativa L.). Theoretical and Applied Genetics 101: 569-573 doi: 10.1007/s001220051517.

