ADMIXMAP: a program to model admixture using marker genotype data
User options
The program requires a list of options to be specified by the user, either as command-line arguments or in a text file whose name is given as a single argument to the program. As explained above, the most convenient way to specify these arguments is to use a Perl script (see "admixmap.pl"). A list of these options is given in the following table. Required arguments are in bold.
samples 
Integer specifying total number of iterations of the Markov chain, including burnin. With strong priors and informative markers, a run of about 500 should suffice for inference. Otherwise, a run of at least 20 000 iterations may be necessary. See here for how to determine if the run has been long enough. 
burnin 
Integer specifying number of iterations for burnin of the Markov chain, before posterior samples are output. A burnin of at least 50 iterations is recommended for inference. For analyses requiring long runs, a burnin of up to 500 may be required. 
every 
Integer specifying the "thinning" of samples from the posterior distribution that are written to the output files, after the burnin period. For example, if every=10, sampled values are written to the output files every 10 iterations. We recommend using a value of 5 to keep down the size of the output files. Sampling more frequently than this does not much improve the precision of results, because successive draws are not independent. Thinning the output samples does not affect the calculation of ergodic averages or test statistics, which are based on all sampled values. Note that every must be no greater than (samples - burnin) / 10, or some output files may be empty.
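The constraint on every can be checked before launching a run. The following is a minimal sketch in Python, not part of ADMIXMAP itself; the function names are hypothetical, and the count of written draws assumes that one draw is written every `every` iterations after burnin.

```python
def thinning_is_valid(samples, burnin, every):
    """Return True if every <= (samples - burnin) / 10, as the text requires."""
    if burnin >= samples:
        raise ValueError("burnin must be smaller than samples")
    return every <= (samples - burnin) / 10


def approx_written_draws(samples, burnin, every):
    """Rough number of draws written to file (assumption: one per `every` iterations)."""
    return (samples - burnin) // every


# A long run with the recommended thinning of 5 satisfies the constraint.
print(thinning_is_valid(20000, 500, 5))      # True
print(approx_written_draws(20000, 500, 5))   # 3900

# A short run with heavy thinning does not: (500 - 50) / 10 = 45 < 50.
print(thinning_is_valid(500, 50, 50))        # False
```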
numannealedruns 
If thermo=0,
this specifies the number of "annealing" runs during burnin. This
usually improves mixing.
If thermo=1, this specifies the number of "temperatures" at which to run in order to estimate the marginal likelihood by thermodynamic integration. Default is 20. 
displaylevel 
0 - silent mode: only start and finish times are output to screen.
1 - quiet mode: model specification, priors, test results and diagnostics are written to screen.
2 - normal mode: more verbose information and an iteration counter are output to screen.
>2 - monitor mode: population-level parameters are also written to screen, with frequency specified by every.
resultsdir  Path of directory for output files. Default is 'results'. 
logfile 
Name of log file written by the program. Default is 'logfile.txt'.
seed 
Can be used to specify a seed for the random number generator.
Allele / Haplotype Frequency Model
The program requires one of the following four options, any one of which specifies the number of subpopulations in the model: populations, allelefreqfile, priorallelefreqfile, or historicallelefreqfile. These options are mutually exclusive.
populations 
Integer specifying number of subpopulations that have contributed to the admixed population under study. If specified as 1, the program fits a model based on a single homogeneous population. This option is not required (and is ignored) if information about allele frequencies is supplied in allelefreqfile, priorallelefreqfile, or historicallelefreqfile, as the number of columns in any of these files defines the number of subpopulations in the model. If none of these files are specified, the parameters of the Dirichlet priors for allele or haplotype frequencies default to 1/n, where n is the number of alleles or haplotypes at each compound locus. 
allelefreqfile 
Pathname of file containing the allele
frequencies of the genotyped loci for each subpopulation. When this option
is specified, the model treats the allele frequencies as fixed constants. This option is obsolete, and retained only for backward compatibility. Instead, use option priorallelefreqfile to specify the allele frequencies, and specify option fixedallelefreqs=1. This allows you to use the same format for the allele frequency file, whether the allele frequencies are fixed, have a prior distribution with no dispersion, or are specified with a dispersion model. 
priorallelefreqfile 
Pathname of file containing parameters of the Dirichlet prior distributions for allele frequencies (or haplotype frequencies) at each compound locus in each subpopulation. Where allele frequencies have been estimated from a sample of unadmixed individuals, the prior distribution parameters for the corresponding subpopulation should be specified as the observed allele counts plus 0.5. Where no allele frequency data are available, specify the prior parameters as 0.5 for each allele ("reference" prior). When this option is specified, the program fits a model in which the allele frequencies in each subpopulation are estimated simultaneously from the unadmixed samples and the admixed sample under study.
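The rule for deriving prior parameters from observed counts can be sketched as follows. This is an illustrative Python fragment only: the function names and the example counts are hypothetical, and it does not write the priorallelefreqfile format itself (see Input files for that).

```python
def dirichlet_prior_from_counts(counts):
    """Prior parameters for one locus in one subpopulation: observed allele counts plus 0.5."""
    return [c + 0.5 for c in counts]


def reference_prior(n_alleles):
    """"Reference" prior used when no allele frequency data are available: 0.5 per allele."""
    return [0.5] * n_alleles


# Hypothetical diallelic locus with 30 and 10 copies observed in an unadmixed sample.
print(dirichlet_prior_from_counts([30, 10]))  # [30.5, 10.5]

# No data for this subpopulation at a diallelic locus.
print(reference_prior(2))                     # [0.5, 0.5]
```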
historicallelefreqfile
Pathname of file containing observed allele counts at the genotyped loci from samples of unadmixed individuals in each subpopulation. When this option is specified, the program fits a model that allows the "historic" allele frequencies in the unadmixed population to vary from the corresponding ancestry-specific allele frequencies in the admixed population under study.
Details of file formats are under Input files
locusfile  path to file containing information about each locus typed 
genotypesfile  path to file containing genotypes for each individual typed 
outcomevarfile  path to file containing values of outcome variables 
coxoutcomevarfile  path to file containing data for a Cox regression 
covariatesfile  path to file containing covariates for a regression model 
targetindicator  Integer specifying column in outcomevarfile that contains the first outcome variable to be modelled. This column number should be specified as an offset from column 1: thus to select the variable in column 1, specify targetindicator=0. The default is 0. 
outcomes 
Valid only with outcomevarfile. Integer specifying the number of columns of the outcomevarfile to use, starting with targetindicator.
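The way targetindicator (a zero-based offset) and outcomes together pick out columns can be illustrated with a short Python sketch. The function name and the column labels are hypothetical, and the sketch only mirrors the description above; it does not read an actual outcomevarfile.

```python
def selected_outcome_columns(column_names, targetindicator=0, outcomes=1):
    """Return the columns used as outcome variables.

    targetindicator is an offset from column 1 (so 0 selects column 1);
    outcomes is the number of consecutive columns to use from there.
    """
    if targetindicator < 0 or outcomes < 1:
        raise ValueError("targetindicator must be >= 0 and outcomes >= 1")
    end = targetindicator + outcomes
    if end > len(column_names):
        raise ValueError("selection runs past the last column")
    return column_names[targetindicator:end]


# Hypothetical column labels for illustration only.
columns = ["diabetes", "bmi", "height"]
print(selected_outcome_columns(columns))                                  # ['diabetes']
print(selected_outcome_columns(columns, targetindicator=1, outcomes=2))   # ['bmi', 'height']
```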
reportedancestry 
not fully tested or documented: allows prior information about each individual’s ancestry to be specified in the model 
testgenotypesfile  specifies genotypes for offline score tests at loci that have not been included in the model. 
indadmixhiermodel 
0 - Model for a collection of individuals in which the admixture proportions of each individual's parents, and the sum of intensities on each parental gamete, are statistically independent given the priors on these parameters. This option is useful in two situations: (1) when you already have strong prior information about the distribution of admixture in the population from which the individuals have been sampled, and want to specify a Dirichlet prior for each individual's parental admixture proportions using the option initalpha0; or (2) when you want to calculate the marginal likelihood of the model given the genotype data on each individual.
1 - Hierarchical model on individual admixture.
The default is 1.
randommatingmodel 
0 - assortative mating model (admixture proportions the same in both parents).
1 - random mating model.
The default is 0.
globalrho 
0 - the sum of intensities parameter r is allowed to vary between individuals (or between gametes if a random mating model is specified). This specifies a hierarchical model, with a gamma distribution for the variation of r between individuals specified as below.
1 - the sum of intensities r is modelled as a global parameter, set to be the same on all parental gametes.
The default is 1.
fixedallelefreqs 
1 - specifies that priorallelefreqfile contains fixed allele frequencies.
0 - otherwise.
The default is 0.
correlatedallelefreqs 
Valid only with the 'populations' or 'priorallelefreqfile' options.
1 - specifies a correlated allele frequency model.
0 - otherwise.
The default is 0.
sumintensitiesprior, globalsumintensitiesprior
In a model with global sumintensities or without a hierarchical model of individual admixture, the sum of intensities parameter has a Gamma(a, b) prior specified as globalsumintensitiesprior="a,b". Default values for a and b are 3 and 0.5, giving a prior mean of 6 and prior variance of 12.
Otherwise (indadmixhiermodel=1 and globalrho=0), the sum of intensities parameter r has a Gamma(a, b) prior distribution and the scale parameter b has a beta hyperprior with parameters b_{0} and b_{1}. This specifies a "Gamma-Gamma" prior, which has mean E(r) = a b_{1} / (b_{0} - 1) and variance E(r)(E(r) + 1) / (b_{0} - 2). The three parameters of this prior are specified with sumintensitiesprior. The three values must be enclosed in quotes and separated by commas, e.g. sumintensitiesprior="2,3,4".
Thus, for instance, to model an African-American population, for which we have prior information that the sum of intensities parameter is about 6 per morgan, we could specify sumintensitiesprior="6,40,39". This specifies the prior for the sum of intensities parameter r as Gamma(6, 1) which has mean 6 and variance 1.
"0,1,0" specifies a flat prior on log r.
"1,1,0" specifies a flat prior on r.
The default, if this option is not specified, is "4,3,3".
Where there is not enough data for reliable inference of the sum of intensities parameter, it is often useful to specify that the prior distribution should be truncated at some upper limit of plausible values, using the option truncationpoint.
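The prior means quoted above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the Gamma(a, b) prior is treated as having shape a and rate b (the reading under which the defaults 3 and 0.5 give mean 6 and variance 12), and the Gamma-Gamma mean uses the expression E(r) = a b_{1} / (b_{0} - 1) given in the text.

```python
def parse_prior(value):
    """Split a comma-separated option value such as "3,0.5" into floats."""
    return [float(x) for x in value.split(",")]


def gamma_mean_and_variance(a, b):
    """Mean and variance of a Gamma prior with shape a and rate b (assumed parameterisation)."""
    return a / b, a / b ** 2


def gamma_gamma_mean(a, b0, b1):
    """Prior mean of r under the Gamma-Gamma prior, as given in the text."""
    if b0 <= 1:
        raise ValueError("the mean is defined only for b0 > 1")
    return a * b1 / (b0 - 1)


# Default globalsumintensitiesprior.
print(gamma_mean_and_variance(*parse_prior("3,0.5")))   # (6.0, 12.0)

# The African-American example for sumintensitiesprior.
print(gamma_gamma_mean(*parse_prior("6,40,39")))        # 6.0
```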
etapriormean, etapriorvar  Specify the prior mean and variance of the dispersion parameter(s), h, in a dispersion or correlated allele frequency model. 
etapriorfile 
File containing parameters of the gamma prior distribution specified for the allele frequency dispersion parameter h in each subpopulation. This option can be used only when a dispersion model has been specified with the option historicallelefreqfile. This is useful when there are not enough data for the dispersion parameter to be inferred from the data, and we want to use prior information from population genetics. This file has one row for each subpopulation (in the same order as the order of subpopulations by columns in historicallelefreqfile), and two columns specifying the shape and location parameters of the gamma distribution.
Thus, for a sample from an African-American population, in which historicallelefreqfile contains counts of alleles in samples of modern west Africans (in the first column) and Europeans (in the second column), we might specify an etapriorfile containing these two lines:
50 1
500 1
This specifies a prior with mean 50 for the parameter for dispersion of allele frequencies between modern unadmixed west Africans and the African gene pool in African-Americans, and a prior with mean 500 and variance 500 for the parameter for dispersion of allele frequencies between modern unadmixed Europeans and the European gene pool in African-Americans. The dispersion parameter is related to the fixation index F_{ST} by x = (1 + F_{ST}) / F_{ST}, so values of 50 and 500 for x correspond roughly to values of 0.02 and 0.002 for F_{ST}.
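The correspondence between dispersion values and F_{ST} can be checked numerically. The sketch below, in Python, simply inverts the relation exactly as it is stated in the text, x = (1 + F_{ST}) / F_{ST}, which gives F_{ST} = 1 / (x - 1); the function name is hypothetical.

```python
def fst_from_dispersion(x):
    """Invert x = (1 + F_ST) / F_ST, the relation quoted in the text, to get F_ST."""
    if x <= 1:
        raise ValueError("the relation requires a dispersion value greater than 1")
    return 1.0 / (x - 1.0)


for dispersion in (50, 500):
    # Both values come out close to the rounded figures quoted in the text.
    print(dispersion, round(fst_from_dispersion(dispersion), 4))
```

For dispersion values of this size the result is dominated by 1 / x, which is why 50 and 500 map to roughly 0.02 and 0.002.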
admixtureprior, admixtureprior1
When indadmixhiermodel=0, each of these two options can be used to specify a Dirichlet parameter vector for parental admixture proportions. The parameter vector is specified as a string of numbers separated by commas. For instance, with a model based on 3 subpopulations, admixtureprior="2, 8, 3.5" would specify the prior for parental admixture proportions (or the maternal gamete if option randommatingmodel=1 has been specified) with parameter vector c(2, 8, 3.5). admixtureprior1 can be used similarly to specify the prior for paternal admixture proportions if option randommatingmodel=1 has been specified. For example, admixtureprior="1,1,0" and admixtureprior1="1,1,1" would specify that one parent has 2-way admixture (between subpopulations 1 and 2) and the other has 3-way admixture between all three subpopulations.
If indadmixhiermodel=1, admixtureprior can be used to specify initial values for the population admixture Dirichlet parameters.
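To see what a given parameter vector implies, the prior mean of the admixture proportions under a Dirichlet distribution is each parameter divided by their sum. The Python sketch below uses hypothetical function names and is only an illustration of that arithmetic; a zero entry gives the corresponding subpopulation a prior mean of zero, matching the 2-way admixture example above.

```python
def parse_admixture_prior(value):
    """Turn a string such as "2, 8, 3.5" into a list of Dirichlet parameters."""
    return [float(x) for x in value.split(",")]


def dirichlet_mean(alpha):
    """Prior mean admixture proportions: each parameter divided by the total."""
    total = sum(alpha)
    if total <= 0:
        raise ValueError("at least one parameter must be positive")
    return [a / total for a in alpha]


alpha = parse_admixture_prior("2, 8, 3.5")
print([round(p, 3) for p in dirichlet_mean(alpha)])    # [0.148, 0.593, 0.259]

# 2-way admixture between subpopulations 1 and 2: subpopulation 3 contributes nothing.
print(dirichlet_mean(parse_admixture_prior("1,1,0")))  # [0.5, 0.5, 0.0]
```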
regressionpriorprecision  Prior precision (1 / variance) of regression parameters 
popadmixproportionsequal  Specifies that the population-level admixture proportions are to be kept equal 
Pathnames of output files, details of file formats in Output files.
paramfile  Population-level admixture and sum-of-intensities 
regparamfile  Regression parameters 
dispparamfile  Allele/haplotype frequency dispersion in historicallelefreqs model 
indadmixturefile  Individual-level admixture proportions and sum-of-intensities 
allelefreqoutputfile  Name of output file containing samples from the posterior distribution of ancestry-specific allele frequencies. Valid only when the allele frequencies are specified as random variables, i.e. when one of the two options priorallelefreqfile or historicallelefreqfile is specified and fixedallelefreqs is 0. 
ergodicaveragefile  Ergodic averages of population-level parameters and of the mean and variance of the deviance. 
The options below specify additional tests or output, but do not change the model itself.
chib 
1 - Calculate marginal likelihood for the first individual using the Chib algorithm.
0 - default.
thermo 
1 - Use thermodynamic integration to compute marginal likelihood.
0 - default.
testoneindiv 
1 - Compute marginal likelihood for the first individual listed in the genotypes file. This individual will not be included as part of the sample and should not be included in an outcomevarfile or covariatesfile.
0 - default.
indadmixmodefile  Name of output file containing posterior estimates of the modes of individual admixture proportions and individual-level sum of intensities (if globalrho=0). 
admixturescorefile 
Pathname of file to which results of a score test for the association of the trait with individual admixture will be written. This option is valid only if an outcome variable has been specified. This option is used only to obtain a formal test of the null hypothesis of no association between the trait and individual admixture. If admixturescorefile is specified, the regression model will not include individual admixture proportions as explanatory variables, and tests for allelic association or linkage will not be adjusted for the effect of individual admixture. Provided an outcomevarfile is specified and unless option admixturescorefile is specified the program will fit a regression model with the outcome variable as dependent variable and individual admixture proportions (plus any covariates specified in inputfile) as explanatory variables. 
allelicassociationscorefile 
Name of output file containing score tests for association of the outcome variable with alleles at each simple locus, adjusting for individual admixture. 
residualallelicassocscorefile  Name of output file containing score tests for residual allelic association between pairs of unlinked loci. 
haplotypeassociationscorefile 
Name of output file containing score tests for association of the outcome variable with haplotypes for all compound loci containing haplotypes, adjusting for individual admixture. 
ancestryassociationscorefile 
Name of output file containing score tests at each compound locus for linkage with genes underlying ethnic variation in the trait. This is a test for association of the trait with locus ancestry, adjusting for individual admixture and covariates. This test should be used in a cross-sectional or cohort study design. For a case-control study of a rare disease, the affected-only test below has greater statistical power.
affectedonlyscorefile 
Name of output file containing score tests at each compound locus for linkage with ancestry, based on comparing the observed and expected proportions of gene copies at this locus that have ancestry from each subpopulation. This test is calculated from affected individuals only: individuals are their own controls. Even when the sample includes both cases and controls, this test is more powerful than the regression model score test in ancestryassociationscorefile if the disease is rare. 
likratiofile  Name of output file containing likelihood ratios for the affected-only score test at values of 0.5 and 2 for the ancestry risk ratio. 
allelefreqscorefile 
Name of output file containing score tests of misspecified ancestry-specific allele frequencies. This option is valid only when the allele frequencies are fixed, i.e. when option allelefreqfile is specified or fixedallelefreqs is 1. There is a test for each population at each locus as well as a summary chi-squared test across populations. 
hwscoretestfile  Name of output file containing score tests for heterozygosity across loci, as a test for departure from Hardy-Weinberg equilibrium. These can be used to detect genotyping errors. 
Name of output file containing test for residual population stratification (stratification not accounted for by the fitted model). 

Name of output file containing test for dispersion of allele frequencies between the unadmixed populations sampled and the corresponding ancestry-specific allele frequencies in the admixed population under study. This is evaluated for each subpopulation at each locus, and as a global test over all loci. This option is valid only if option priorallelefreqfile is specified. The results are "Bayesian p-values", as above. 

fstoutputfile 
This option is used only with option historicallelefreqfile.