B. Background and Significance
Significance of the Proposed Research.
We have assembled an outstanding research team with expertise in statistical genetics, linkage analysis, and computer algorithms to properly analyze the large amount of existing genome-scan data generated by the NHLBI MGS. The primary benefit of our project will be the creation of very precise sex-specific maps while properly modeling genotyping errors. These maps should be very useful for the entire disease mapping community, as proper usage of these maps should reduce false evidence of linkage in regions containing no disease loci and increase power to map disease genes when there is true linkage. Our collated data set will also allow us to critically evaluate the hypothesis that recombination rates may vary between populations. Finally, note that it is impossible to generate more precise genetic maps from physical data; the only way to obtain more precise genetic maps is via observation of large collections of actual recombination events such as those we propose to analyze here.
Review of Literature that is Relevant to this Application.
There are at least three types of map misspecification. First, map distance estimates may be incorrect (Note that this occurs whenever sex-averaged maps are used if the sex-specific map distances differ.). Second, the order of markers may be incorrect. Third, rates of recombination may vary among different subpopulations.
Effect Of Imprecise Estimates Of Map Distance. The Marshfield maps were built using a set of 8 CEPH pedigrees that contain a total of 184 potentially informative meioses. This small sample size implies that the current recombination fraction estimates are quite imprecise. As Daw et al. (2000) state: "Given error-free marker data, the 95% confidence interval (CI) on a 10-cM distance may be 4-19 cM with the sample size (< 200 meioses) and marker quality often used to construct human maps." This implies that the true underlying map distance may often be as little as half or as much as twice the estimated distance. This imprecision is further compounded by genotyping errors.
The map distance estimates are even less accurate for sex-specific maps. Since these are derived from approximately half as many meioses as sex-averaged maps, the confidence intervals (CIs) are even larger. The sex-specific 95% CI for a 10 cM region is approximately 3-25 cM using the 8 CEPH families. In addition, it is harder to precisely estimate map distance for smaller intervals. This effect is greatest on the male-specific maps since males, on average, undergo considerably less recombination than females.
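The dependence of interval width on the number of meioses can be illustrated with a short calculation. The sketch below is not the method used by Daw et al. (2000); it is a simplified illustration that assumes every meiosis is fully informative and phase-known, uses the Haldane map function, and applies a Wilson score interval to the recombination fraction. The function names are hypothetical. Because real marker data are only partially informative, published intervals such as those quoted above are expected to be wider than the ones this sketch produces.

```python
import math


def haldane_theta(d_cm):
    """Haldane map function: map distance (cM) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def haldane_cm(theta):
    """Inverse Haldane map function: recombination fraction -> distance (cM)."""
    return -50.0 * math.log(1.0 - 2.0 * theta)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 ~ 95%)."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def distance_ci_cm(d_cm, n_meioses):
    """Approximate 95% CI (in cM) for a true distance d_cm.

    Assumption: all n_meioses are fully informative, and the observed
    recombinant count equals its expected value (rounded).
    """
    k = round(haldane_theta(d_cm) * n_meioses)
    lo, hi = wilson_interval(k, n_meioses)
    # Clamp to the valid range of a recombination fraction, [0, 0.5).
    return haldane_cm(max(lo, 0.0)), haldane_cm(min(hi, 0.499999))


if __name__ == "__main__":
    # 184 meioses (sex-averaged) versus roughly half that (sex-specific).
    for n in (184, 92):
        lo, hi = distance_ci_cm(10.0, n)
        print(f"n = {n:3d} meioses: 95% CI for 10 cM is {lo:.1f}-{hi:.1f} cM")
```

Halving the number of meioses, as happens when a sex-specific map is estimated from the same pedigrees, visibly widens the interval in this idealized setting, which is the qualitative point made above.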
Daw et al. (2000) quantified the effect of using incorrect map distances on multipoint linkage analysis. They analyzed the effects of using sex-averaged distances in regions of unequal female:male (F:M) rates of recombination, the effects of misspecifying the map distance, and the combined effects of both types of misspecification. For their true models, they used sets of two linked markers and a trait locus, with F:M recombination ratios ranging from 1:1 to 20:1 and true distances ranging from 0 to 25 cM, but analyzed the data assuming combinations of incorrect distances and incorrect F:M ratios. They evaluated the bias under two models, one where the trait locus was linked to the two markers, and one where it was unlinked.
They found that "with actual linkage, any map misspecification causes negative bias in lod scores, resulting in loss of power to detect linkage" (Daw et al. 2000). For simple monogenic diseases, this bias in the presence of true linkage may be modest. However, larger biases were found when the trait locus is unlinked: under-estimation of the map distance biases the lod scores downward, while over-estimation biases them upward. In addition, using a sex-averaged map instead of a sex-specific map always biased the lod scores upward, markedly increasing the false-positive rate. One additional analysis looked at the effects of misspecification over a genome screen based on a small set of families and a low lod score threshold of 1 for further follow-up. While a lod score of 1 is not statistically significant, it is a commonly used cutoff for follow-up. For a true 10 cM interval, use of a sex-averaged map instead of the correct sex-specific map increased the false-positive rate by 50%. This increase in false-positive rates is a particular problem when sample sizes are modest, which has often been the case in studies of complex disease. Since it is very costly to follow up many false-positive results, there is a clear need for more precise and accurate sex-specific genetic maps.
These findings may be even more important when considering linkage analysis of a complex trait. In the case of a complex trait, misspecified maps can interact with misspecified disease models to produce false-positive evidence for linkage. For example, Halpern and Whittemore (1999) discuss a non-parametric linkage analysis of prostate cancer in which there was a linkage signal in a region between two markers with conflicting marker maps: one map indicated the region was only 5.5 cM, while the Généthon map estimate was 9 cM. Less evidence of linkage was obtained with the 5.5 cM map (a Z-score of 0.719) than with the 9 cM map (a Z-score of 1.126). Since it was not clear which result should be relied upon, Halpern and Whittemore suggest relying on single-point results instead of multipoint results until accurate meiotic maps become available. Similarly, Daw et al. (2000) discuss the need for better linkage maps and point out that a great quantity of genotyping data already exists that could be used for this purpose. The genotypes generated by the NHLBI Mammalian Genotyping Service are precisely the type of data required to produce more accurate maps.
Effect Of Incorrect Marker Order. Although linkage maps are constructed using a high statistical threshold, the correct ordering of markers is not guaranteed. Several cases where marker order on a linkage map does not match true marker order have been documented, and many others likely remain to be identified (Dunham et al. 1999; Matise et al. 2001). Incorrect marker order can have a profound effect on linkage analyses. A genome scan for markers linked to type 2 diabetes demonstrates this effect (Hanis et al. 1996; Stringham and Boehnke 2001). The marker map in a critical region was not well defined, and several different marker orders (and estimates of map distances) were equally likely. Depending on which map was used, the maximum multipoint lod scores ranged from 2.7 to 4.3. This wide range is due to a combination of uncertain marker order and uncertain map distance estimates. Previous work has shown that the order of 98% of the markers on the current Weber screening sets is highly consistent with physical order (DeWan et al. 2001). Therefore, while our project should resolve the few remaining order uncertainties, the maps we will create will be very similar to the existing maps with respect to marker order. As the human DNA sequence becomes more complete, we will continue to validate marker orders.
Effect Of Ignoring Population-Specific Rates Of Recombination. A few small analyses have suggested that there may be variation in recombination rates among different individuals or sub-populations (Broman et al. 1998; Weitkamp 1974). However, this has never been evaluated systematically in any large samples. Since allele frequencies vary among populations, it seems likely that recombination rates may also vary across populations. However, most studies use map distances estimated from the CEPH pedigrees (primarily of Northern European descent). If recombination rates do vary significantly across populations, using CEPH map distances for a different ethnic group would result in map misspecification, with effects as described above.
Description Of The Map We Propose To Improve. The Mammalian Genotyping Service (MGS), directed by Dr. James L. Weber, is funded by the NHLBI to perform genome-wide genotyping for appropriate genetics studies. These genome scans currently employ the Version 10 screening set, which consists of 405 short tandem repeat markers at an average resolution of 9 cM (range 0-19 cM). The markers were selected to optimize informativeness, map position, ease of scoring, and amplification efficiency. The Version 10 set is 73% tetra-nucleotide repeats, 16% tri-nucleotide repeats, and 12% di-nucleotide repeats. Additional information on the MGS and the screening sets is available at http://research.marshfieldclinic.org/.
Maps for the Weber screening set are derived from higher-resolution "Marshfield maps" that contain over 8,000 markers (Broman et al. 1998). These maps were built using genotypes from only eight of the 60+ CEPH reference pedigrees (Broman et al. 1998; Dib et al. 1996; Murray et al. 1994). Therefore, as the Weber group has made clear all along, map distances based on this small number of pedigrees are "only crude estimates" (Yuan et al. 1997).
Our project will result in linkage maps based on substantially more genotype data than any other mapping endeavor to date. At least one other group is constructing maps using more data than the standard 8 CEPH pedigrees provide. Dr. Kong and colleagues at deCODE Genetics are constructing high-resolution genetic maps using 146 Icelandic pedigrees containing 1,257 meioses (Kong et al. 2001). However, as this work is not yet published (other than in meeting abstracts), it is not clear whether this map will be made publicly available, and the details of the marker set are not known. While the results of their work will be very interesting and complementary to ours, our maps will be based on a much larger data set with many more meioses, and can also uniquely address the hypothesis that there is population-level variation in recombination rates. If there is significant population-level variation, then this will limit the value of the Kong et al. data for other populations.
Impact Of Improving Estimates Of Map Distance. The MGS has completed genome scans on over 100 human studies searching for genes for a wide variety of diseases. In addition, over 130 Weber screening sets were sold commercially by Research Genetics between 1998 and 2001 (Donna Brown, personal communication). Therefore, the impact of improved sex-specific and sex-averaged maps for the Weber screening set should be quite high. In addition, if we find that recombination rates vary significantly between different ethnic groups, group-specific map distances will be of particular use to investigators working with these sub-populations.
Map Distances Cannot Be Determined By Any Other Means. While the human draft DNA sequence can be used to validate marker order on linkage maps, estimates of meiotic map distance cannot be obtained by any means other than linkage analysis using genotype data. The relationship between recombination rate and physical distance varies dramatically across the genome (Broman et al. 1998; Matise et al. 2001; Yu et al. 2001), making it impossible to estimate genetic map distance from physical distance.
In conclusion, the body of work reviewed in this section makes clear that there is a need to construct more precise sex-specific linkage maps. Recent projects in our laboratories make it clear that we have the expertise and materials necessary to carry out this important project.