GENEHUNTER-MODSCORE 3.1.1

GENEHUNTER-MODSCORE (GHM) is a further extension of GENEHUNTER-IMPRINTING. The program is based on the original GENEHUNTER version 2.1 release 6 (Kruglyak et al. 1996; Kruglyak and Lander 1998; Markianos et al. 2001); it can handle autosomal or pseudoautosomal loci. GENEHUNTER-MODSCORE allows for a MOD-score analysis, in which parametric LOD scores are maximized over the parameters of the trait model, i.e., the penetrances and disease allele frequency. In this way, the disease-model parameter space is explored efficiently, so researchers do not have to rely on a single trait model when performing a parametric linkage analysis. This can be of great help in the context of genetically complex traits, for which the disease-model parameters are usually unknown prior to the analysis. Please note that, because of the additional maximization, MOD scores are inflated when compared to LOD scores that were calculated under a single trait model. Therefore, in the context of a MOD-score analysis, significance criteria for LOD scores cannot be applied without correction. For details regarding this issue, please see the references (Strauch et al. 2000; 2005) listed below. The latest version 3.1.1 of GENEHUNTER-MODSCORE includes a permutation procedure to obtain empirical p values for the MOD-score-based test on genomic imprinting (MOBIT) as well as a small script (‘run_mobit_permutations.sh’) to enable parallel calculation of p values on multiple CPUs. The MOBIT is explained in detail in Brugger et al. (2019), also listed below.
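
In symbols, the maximization described above can be written as follows. The notation is chosen here for illustration and is not taken from the program documentation: x denotes a genetic position, f the vector of penetrances, and p the disease allele frequency.

```latex
\mathrm{MOD}(x) \;=\; \max_{f,\,p}\; \mathrm{LOD}(x \mid f, p)
\qquad\Longrightarrow\qquad
\mathrm{MOD}(x) \;\ge\; \mathrm{LOD}(x \mid f_0, p_0)
\quad \text{for every trait model } (f_0, p_0) \text{ included in the maximization}
```

The inequality on the right is the reason for the inflation: the MOD score is at least as large as the LOD score under any single trait model that is part of the maximization, so thresholds derived for single-model LOD scores are too lenient for MOD scores.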

For version 3.1, we have developed a completely new algorithm that substantially speeds up MOD-score calculation. In particular, our new algorithm reduces the effective number of inheritance vectors by collapsing them into classes with identical disease-locus-likelihood contribution. To this end, the disease-locus-likelihood contribution of each inheritance vector is stored in its algebraic form as a sum of products of penetrances and disease-allele frequencies. Inheritance vectors with the same disease-locus-likelihood contribution are joined together in an inheritance-vector class. This concept is even extended across pedigrees, such that the disease-locus-likelihood contributions of all inheritance-vector classes can be numerically calculated in a single step for the entire dataset rather than having to recompute them many times. Unlike the previous algorithm, the new algebraic algorithm requires neither peeling of nuclear families nor loop-breaking. For details about the new algorithm, please see Brugger & Strauch (2014).
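
The idea of collapsing inheritance vectors into classes can be illustrated with a small Python sketch. This is not the GENEHUNTER-MODSCORE implementation; it only shows the grouping principle under simplifying assumptions. Each likelihood contribution is represented as a polynomial (a mapping from exponent tuples to integer coefficients), and the parameter names, the toy inheritance vectors, and their expressions are all made up for illustration.

```python
from collections import defaultdict
from math import prod

# Hypothetical parameter order: three penetrances and the disease-allele frequency.
PARAMS = ("f0", "f1", "f2", "p")

def canonical(poly):
    """Return a hashable, order-independent form of a polynomial.

    A polynomial maps monomials (tuples of exponents, one per parameter)
    to integer coefficients, i.e. it is a sum of products.
    """
    return tuple(sorted((m, c) for m, c in poly.items() if c != 0))

def collapse(vector_polys):
    """Group inheritance vectors whose algebraic expressions are identical."""
    classes = defaultdict(list)
    for vec, poly in vector_polys.items():
        classes[canonical(poly)].append(vec)
    return classes

def evaluate(key, values):
    """Numerically evaluate one class polynomial for given parameter values."""
    return sum(c * prod(v ** e for v, e in zip(values, m)) for m, c in key)

# Toy input (made up): four inheritance vectors, only two distinct expressions.
toy = {
    0b00: {(0, 1, 0, 1): 1, (0, 0, 1, 2): 1},   # f1*p + f2*p^2
    0b01: {(1, 0, 0, 1): 2},                    # 2*f0*p
    0b10: {(0, 0, 1, 2): 1, (0, 1, 0, 1): 1},   # same as 0b00, terms reordered
    0b11: {(1, 0, 0, 1): 2},                    # same as 0b01
}

classes = collapse(toy)

# Each class is evaluated once per trait model, not once per inheritance vector.
values = (0.01, 0.5, 0.9, 0.1)  # f0, f1, f2, p
class_values = {key: evaluate(key, values) for key in classes}
```

In this toy case, four inheritance vectors reduce to two classes, so each tested trait model requires two numerical evaluations instead of four. The saving grows with the number of vectors that share an expression.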


To determine the significance of MOD scores, the new algebraic algorithm is also used in the simulation routine of GENEHUNTER-MODSCORE, which allows researchers to determine empirical p values by performing simulations under the null hypothesis of no linkage. Hence, the evaluation of many sets of tested trait-model parameters during a MOD-score analysis in conjunction with empirical p-value calculation is now feasible within a reasonable amount of time. It is of note, however, that the computational speed-up of the new algebraic algorithm over the old algorithm depends on the number of different types of pedigrees in the dataset. It can therefore be advisable to also try the old algorithm for MOD-score analysis, especially if the dataset contains many larger pedigrees and/or many different types of pedigrees.
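
How an empirical p value follows from such simulations can be sketched in a few lines of Python. The function name and the toy numbers are hypothetical, and the "+1" correction in numerator and denominator is a common convention assumed here for illustration; it is not confirmed to be the estimator used by GENEHUNTER-MODSCORE.

```python
def empirical_p_value(observed_mod, null_mods):
    """Estimate a p value from MOD scores simulated under no linkage.

    Counts the null replicates whose MOD score is at least as large as the
    observed one. Adding 1 to numerator and denominator (an assumed
    convention) keeps the estimate from ever being exactly zero.
    """
    exceed = sum(1 for s in null_mods if s >= observed_mod)
    return (exceed + 1) / (len(null_mods) + 1)

# Made-up MOD scores from nine hypothetical null replicates.
null_mods = [0.4, 1.2, 2.9, 0.8, 3.5, 1.0, 0.2, 1.7, 0.6]
p = empirical_p_value(3.0, null_mods)  # one replicate (3.5) reaches 3.0
```

Because the observed MOD score and every null replicate are maximized over the same set of trait models, the inflation caused by the maximization is accounted for in the comparison.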


For further information regarding the simulation routine for the calculation of empirical p values and the option to use sex-specific recombination frequencies, please see Mattheisen et al. (2008) and Dietter et al. (2007), respectively.


A Perl script, GH_modview (written by Franz Rüschendorf), is provided with GENEHUNTER-MODSCORE. It creates a Gnuplot graph of the LOD or MOD score, broken down by the contributions of the individual families. An example of such a plot obtained for a sample with three pedigrees is shown below. Each family is represented by a different color. For every genetic position, the contribution of a family that yields a score above zero is added to the positive side of the y-axis, and the contribution of a family that yields a score below zero is added to the negative side of the y-axis. The overall score at a genetic position equals the total positive score (i.e., the sum over all families with positive contribution) minus the total negative score. This type of diagram is useful for both Mendelian and complex traits, since it identifies families with positive versus negative contribution to the linkage signal at a particular genetic position.
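
The arithmetic behind the stacked family contributions at a single genetic position can be written out as a short Python sketch. It is independent of the GH_modview script itself; the function name, family labels, and scores are made up for illustration.

```python
def stack_contributions(family_scores):
    """Split per-family scores at one position into stacked totals.

    Returns (total positive score, total negative score as a magnitude,
    overall score), where overall = positive - negative.
    """
    positive = sum(s for s in family_scores.values() if s > 0)
    negative = -sum(s for s in family_scores.values() if s < 0)
    return positive, negative, positive - negative

# Made-up scores of three families at one genetic position.
scores = {"fam1": 1.2, "fam2": -0.5, "fam3": 0.8}
positive, negative, overall = stack_contributions(scores)
# positive is about 2.0, negative about 0.5, overall about 1.5
```

The overall score is the same as the plain sum over all families; the split only determines on which side of the y-axis each family's contribution is drawn.
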

GENEHUNTER-MODSCORE can perform separate maximizations over penetrances of several liability classes, e.g. for males and females, individuals of different ages, or different levels of risk due to environmental factors. In this way, it is also possible to study gene-environment interactions.

If a genome scan for a certain trait yields at least two linkage peaks, it is reasonable to perform a linkage analysis that explicitly models two trait loci. Such an analysis can be done with the program GENEHUNTER-TWOLOCUS. In the parametric context, the best-fitting trait models at the two loci obtained by a MOD-score analysis with GENEHUNTER-MODSCORE can be used to derive the underlying two-locus trait model. Please see the GENEHUNTER-TWOLOCUS subpage for details regarding this issue.


More information about GENEHUNTER-MODSCORE can be found in the file INSTALL.ghm that is included in the archives provided below. Please also see the online help for details, e.g. by typing 'help modcalc' or 'help modscore' at the GHM prompt, or refer to the PDF or PostScript version of the online help (files ghm.pdf and ghm.ps, respectively).


The archives contain sample input files for a MOD-score analysis (sample.in and sample_use_map.in), in conjunction with the files linkloci.dat, linkloci.dat.sxp, linkloci.imp, linkloci.imp.sxp, and linkped.pre. Furthermore, map files with the genetic positions of markers according to the Duffy, Marshfield, Nievergelt-Schork, and Rutgers maps are included (with kind permission of David Duffy, Karl Broman, Nicholas Schork, and Tara Matise). A sample run can be executed, e.g., by typing 'run sample_use_map.in' at the GHM prompt, or by calling 'ghm < sample_use_map.in' from the command shell.

Reference publications for GENEHUNTER-MODSCORE:

  • Brugger M, Strauch K (2014): Fast linkage analysis with MOD scores using algebraic calculation. Human Heredity 78(3-4):179-194
  • Mattheisen M, Dietter J, Knapp M, Baur MP, Strauch K (2008): Inferential testing for linkage with GENEHUNTER-MODSCORE: the impact of the pedigree structure on the null distribution of multipoint MOD scores. Genetic Epidemiology 32:73-83
  • Dietter J, Mattheisen M, Fürst R, Rüschendorf F, Wienker TF, Strauch K (2007): Linkage analysis using sex-specific recombination fractions with GENEHUNTER-MODSCORE. Bioinformatics 23:64-70
  • Strauch K, Fürst R, Rüschendorf F, Windemuth C, Dietter J, Flaquer A, Baur MP, Wienker TF (2005): Linkage analysis of alcohol dependence using MOD scores. BMC Genetics 6(Suppl1):S162
  • Strauch K (2003): Parametric linkage analysis with automatic optimization of the disease model parameters. American Journal of Human Genetics 73(Suppl1):A2624


Reference publication for the imprinting analysis option:

  • Strauch K, Fimmers R, Kurz T, Deichmann KA, Wienker TF, Baur MP (2000): Parametric and nonparametric multipoint linkage analysis with imprinting and two-locus-trait models: application to mite sensitization. American Journal of Human Genetics 66:1945-1957


Reference publication for the permutation procedure to obtain empirical p values for the MOBIT imprinting test:

  • Brugger M, Knapp M, Strauch K (2019): Properties and evaluation of the MOBIT – a novel linkage-based test statistic and quantification method for imprinting. Statistical Applications in Genetics and Molecular Biology 18(4):20180025

Original GENEHUNTER references:

  • Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996): Parametric and nonparametric linkage analysis: a unified multipoint approach. American Journal of Human Genetics 58:1347-1363
  • Kruglyak L, Lander ES (1998): Faster multipoint linkage analysis using Fourier transforms. Journal of Computational Biology 5:1-7
  • Markianos K, Daly MJ, Kruglyak L (2001): Efficient multipoint linkage analysis through reduction of inheritance space. American Journal of Human Genetics 68:963-977


Download

Click here to download