SimM v. 040504

CITATION
    If you use SimM, please cite the following:

    Lemire M, Roslin NM, Laprise C, Hudson TJ, Morgan K (2004)
    Transmission ratio distortion and allele sharing in affected
    sib-pairs --- a new linkage statistic with reduced bias, with
    application to chromosome 6q25.3. Am J Hum Genet. 75:000-000.

NAME
    SimM - a gene dropping simulation program

SYNOPSIS
    SimM [options] [pedfile] [datafile] [output]

DESCRIPTION
    Perform gene dropping simulations under Mendelian autosomal
    inheritance rules or transmission ratio distortion in all the
    pedigrees found in the file [pedfile], using the marker
    information given in [datafile]. The structures of [pedfile] and
    [datafile] are similar to pre-makeped LINKAGE format files.

    -append
        Append output to file [output] without prompting. Cannot be
        used in combination with -overw.

    -atrd
        Simulate markers under allele-specific transmission ratio
        distortion (TRD), for the markers listed in file atrd.txt.
        This file contains the marker index (the first marker being
        1), the allele under TRD, and the probability that a person
        heterozygous for that allele transmits it. Compare with -trd.

    -dna
        Indicates that [pedfile] contains a column, following gender
        (or diagnosis, if present), that flags whether a person has
        DNA available (1) or not (0). Persons marked with 0 will have
        missing genotypes at all markers in [output]. Cannot be used
        in combination with -match.

    -dx
        Read the diagnosis column found in [pedfile], which appears
        after the required input fields. Also write diagnosis plus a
        liability column (all individuals are put in class 1) to
        [output], before the simulated genotypes.

    -fg [file]
        Force founders to have the genotypes given in [file]. [file]
        contains the following fields: pedigree ID, founder ID,
        marker index (where 1 is the first marker), and the specific
        alleles.

    -match
        Constrain any allele that is zero in [pedfile] to be zero in
        [output].
        The "markers" in [pedfile] must follow the required fields
        (and diagnosis, if present), and must be in the same order as
        in [datafile]. Cannot be used in combination with -dead.

    -nl
        Suppress printing a liability column in [output]. Can only be
        used with -dx.

    -overw
        Overwrite output file [output] without prompting. Cannot be
        used in combination with -append.

    -repl [r]
        Create r replicates of [pedfile]; default: 100.

    -seed [n]
        Use random number seed n (a long integer); default: the long
        integer time(0).

    -sex [file]
        Create a male-specific genetic map by adding the values found
        in [file] to the sex-averaged intermarker distances found in
        [datafile]. The female map is adjusted accordingly.

    -templates
        Print a description of the input files to the screen.

    -trd
        Simulate under transmission ratio distortion based on the
        grandparental origin of alleles, for the markers listed in
        file "trd.txt". This file lists the marker index (1 is the
        first marker), the probability of a father transmitting his
        grandpaternal allele, and the probability of a mother
        transmitting her grandmaternal allele. Compare with -atrd.

    -uf [file]
        Randomly set alleles to missing for the markers listed in
        [file]. [file] lists the marker number (the first marker
        being 1) and the probability of an allele being set to
        missing. Note that you will likely end up with hemizygous
        individuals. Contrast with -unk.

    -unk [p]
        Randomly set any allele of any marker to missing with
        probability [p]. Contrast with -uf.

INPUT FILES
    For examples of [pedfile] and [datafile], use -templates, or see
    the directory TestFiles.

    Pedfile
        Required fields: Pedigree ID, Individual ID, Father, Mother,
        Gender.
        Optional fields: Diagnosis (if -dx is used), Dead (if -dead)
        OR genotypes to be matched (if -match). Note that -dead and
        -match are mutually exclusive.

        A pre-makeped LINKAGE format file can be used without
        modification, provided no liability column is present.

    Datafile
        A parameter file suitable for use in GENEHUNTER or LINKAGE is
        accepted.
        See the LINKAGE documentation for details. The general format
        of the file is as follows, with all fields separated by
        whitespace. It is assumed that a trait locus is present.

        line 1:  4 integers. SimM uses only the first, which is the
                 number of markers in the file, including the trait
                 locus.
        line 2:  4 numbers, all of which are ignored.
        line 3:  The order of the markers, by index, starting with 2
                 (the first autosomal marker). SimM assumes the
                 markers in [pedfile] are in index order. The trait
                 locus (the first marker) is not simulated by SimM,
                 hence its index must not appear on this line.
        line 4:  Two integers, the first of which cannot be a "3".
                 Otherwise, this line is ignored by SimM. This
                 corresponds to the trait locus.
        line 5:  Allele frequencies for the trait locus.
        line 6:  Number of liability classes (should be "1").
        line 7:  3 numbers in [0,1] for the penetrances of the
                 allelic groups.
        line 8:  Two integers. First, a "3", then the number of
                 alleles for the first marker. A pound sign can be
                 used to insert a comment (usually the marker name)
                 after the integers.
        line 9:  The allele frequencies for the marker.

        The format of lines 8 and 9 is repeated for each autosomal
        marker. Hence, the length of this file depends on the number
        of markers present. The last three lines of the file are:

        line -3: Two zeroes (not used by SimM).
        line -2: The intermarker distances, in Haldane cM.
        line -1: 3 numbers, not used by SimM.

        If a trait locus is not present in [datafile], the following
        changes should be made:

        line 1:  Subtract 1 from the first integer.
        line 3:  Change the order to run from 1 to the number of
                 autosomal markers.
        lines 4 to 7: Omit these lines.

OUTPUT FILE
    The output file re-writes the required input pedigree fields,
    followed by diagnosis and liability if -dx is used, followed by
    the marker data generated. To distinguish between replicates,
    the string "_R" is appended to the Pedigree ID field, where R is
    the replicate number.
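
    Programs that post-process [output] may need to recover the
    original pedigree ID and the replicate number from the tagged ID.
    The following is a minimal sketch of one way to do so. It is not
    part of SimM: the names ReplicateId and splitReplicateId are
    hypothetical. It assumes the tagged ID has the form
    <pedigree ID>_<R> and splits at the last underscore, so that
    pedigree IDs which themselves contain underscores are preserved.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper types, not taken from the SimM sources.
struct ReplicateId {
    std::string pedigree;  // original pedigree ID, e.g. "12"
    long replicate;        // replicate number R, e.g. 3
};

// Splits a tagged ID such as "12_3" at its LAST underscore.
// Returns false (leaving 'out' untouched) if there is no underscore,
// if either side of it is empty, or if the suffix is not all digits.
bool splitReplicateId(const std::string& tagged, ReplicateId& out) {
    const std::string::size_type pos = tagged.rfind('_');
    if (pos == std::string::npos || pos == 0 || pos + 1 == tagged.size()) {
        return false;
    }
    const std::string suffix = tagged.substr(pos + 1);
    for (char c : suffix) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            return false;
        }
    }
    out.pedigree = tagged.substr(0, pos);
    out.replicate = std::strtol(suffix.c_str(), nullptr, 10);
    return true;
}
```

    With this sketch, "12_3" yields pedigree "12" and replicate 3,
    while an untagged ID such as "12" is rejected.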

    The file "recombination.log" is also created, indicating where
    recombination occurred during the segregation process.

FURTHER DETAILS
    Haplotypes are assigned to the founders of each pedigree
    according to the allele frequencies given in [datafile], assuming
    no linkage disequilibrium between the markers. Recombination
    occurs according to the map in [datafile], assumed to be in
    Haldane cM. Haplotypes are transmitted to offspring according to
    the rules of Mendelian inheritance (by default), or under
    transmission ratio distortion (if -trd or -atrd is used).

    Only autosomal marker data are simulated in [output]. The
    diagnosis column, if present, is an exact copy of the column in
    [pedfile].

    Random numbers are generated using the C function drand48(). If
    no seed is provided, the long integer time(0), the number of
    seconds elapsed since midnight 1 January 1970, is used as the
    seed. To avoid identical replicates when relying on this default
    seed, users must ensure that consecutive calls of SimM are at
    least one second apart.

INCLUDED FILES
    SimM.cc
    Indiv.cc
    Indiv.hh
    AlleleGenerator.cc
    AlleleGenerator.hh
    PedStruct.cc
    PedStruct.hh
    Makefile
    GNUGeneralPublicLicense.txt
    README
    TestFiles/pedfile
    TestFiles/pedfile2
    TestFiles/datafile.dat
    TestFiles/datafile2.dat
    TestFiles/forced
    TestFiles/_trd.txt
    TestFiles/README.test

    Compiles under Solaris 8 (gcc 2.95.3) and Linux (gcc 3.3.2).

AUTHOR
    Written by Mathieu Lemire, Research Institute of the McGill
    University Health Centre.
    Version 040504 (4 May 2004). Distribution version 0.0.
    Code written in C/C++.

COPYRIGHT
    SimM -- a gene dropping simulation program
    Copyright (C) 2004 Mathieu Lemire.

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 2 of
    the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
    02111-1307, USA.

CONTACT
    Mathieu Lemire
    mlemire [at] sdf dot lonestar dot org
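
EXAMPLE: HALDANE MAP DISTANCES
    The intermarker distances in [datafile] are assumed to be in
    Haldane cM. When preparing or checking a map, it can help to see
    what recombination fraction a given distance implies. The sketch
    below applies the standard Haldane map function,
    theta = (1 - exp(-2d)) / 2 with d in Morgans, and its inverse.
    The function names are hypothetical and the code is not taken
    from the SimM sources; how SimM performs this conversion
    internally is not described here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, not taken from the SimM sources.

// Haldane map function: converts a map distance in centiMorgans into
// a recombination fraction in [0, 0.5).
double haldaneCmToTheta(double cM) {
    assert(cM >= 0.0);                  // map distances are non-negative
    const double morgans = cM / 100.0;  // 100 cM = 1 Morgan
    return 0.5 * (1.0 - std::exp(-2.0 * morgans));
}

// Inverse Haldane map function: converts a recombination fraction in
// [0, 0.5) into a map distance in centiMorgans.
double thetaToHaldaneCm(double theta) {
    assert(theta >= 0.0 && theta < 0.5);
    // d = -ln(1 - 2*theta) / 2 Morgans, i.e. -50 * ln(1 - 2*theta) cM
    return -50.0 * std::log(1.0 - 2.0 * theta);
}
```

    For instance, an intermarker distance of 10 cM corresponds to a
    recombination fraction of about 0.0906, and very large distances
    approach the unlinked value of 0.5.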