
Chromosome 1 is the largest human chromosome and constitutes approximately 10% of the 3000 Mb human genome. The Sanger Centre has been funded to generate the complete map and sequence of this chromosome as part of its contribution of one-third of the genome, which is scheduled for completion by 2003. Generating a chromosome 1-specific sequence map has two distinct advantages over more global approaches. Firstly, it allows the international collaboration to determine the sequence of the whole genome to be coordinated and shared on a chromosome-by-chromosome basis. Secondly, it allows a physical and transcriptional map to be generated in close collaboration with the chromosome 1 community. This section outlines the rapid progress that the chromosome 1 project has made since the last workshop. It provides details of physical map generation, sequence production and analysis, and identifies points of access to the data.

Mapping

We continue to follow our landmark-based strategy for the generation of a physical map, as outlined in the 3rd (Vance et al., 1997) and 4th (Gregory et al., 1998) chromosome 1 workshop reports. Our most recent chromosome 1 RH map (3/11/98) exceeds our target density of 15 landmark markers/Mb and is now complete. A total of 5486 STSs, 88% of which are RH-mapped, have been used to identify 30,700 large-insert clones [P1-derived artificial chromosome (PAC) and bacterial artificial chromosome (BAC)]. Fluorescent restriction digest fingerprinting of 23,000 clones has produced contigs covering an estimated 144.6 Mb (Figure 1).
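The mapping figures above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: it assumes a chromosome 1 size of about 300 Mb (10% of 3000 Mb, as stated in the introduction) and assumes that the RH-mapped STSs are the landmarks counted towards the target density. The function and constant names are hypothetical and are not part of any Sanger Centre software.

```python
# Figures quoted in the text; the derived values are only rough estimates.
GENOME_MB = 3000.0          # approximate size of the human genome
CHR1_FRACTION = 0.10        # chromosome 1 is roughly 10% of the genome
TOTAL_STS = 5486            # STSs used to identify large-insert clones
RH_MAPPED_FRACTION = 0.88   # proportion of those STSs that are RH-mapped
TARGET_DENSITY = 15.0       # target landmark markers per Mb
CONTIG_MB = 144.6           # estimated coverage of fingerprinted contigs


def map_summary():
    """Return derived mapping statistics as a dictionary."""
    chr1_mb = GENOME_MB * CHR1_FRACTION
    # Assumption: only RH-mapped STSs count as landmarks on the RH map.
    rh_mapped = round(TOTAL_STS * RH_MAPPED_FRACTION)
    density = rh_mapped / chr1_mb
    return {
        "chr1_mb": chr1_mb,
        "rh_mapped_sts": rh_mapped,
        "rh_density_per_mb": density,
        "meets_target": density > TARGET_DENSITY,
        "contig_coverage": CONTIG_MB / chr1_mb,
    }


if __name__ == "__main__":
    for name, value in map_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the RH map carries roughly 16 landmarks/Mb, consistent with the statement that the target of 15 markers/Mb has been exceeded, and the fingerprinted contigs cover a little under half of the chromosome.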
Gaps in the bacterial clone map are identified and closed using a combination of techniques. Fiber fluorescence in situ hybridization (FISH) is used to estimate the size of gaps between contigs whose order and orientation are known. De novo STSs and probes are generated from clones at the ends of contigs using clone end sequencing and vectorette techniques, respectively. Homologies between unfinished or finished sequence and TIGR genomic survey sequences are used to identify BACs that bridge existing contigs. Additional clone coverage is achieved by incorporating other BAC contigs generated as part of the whole genome BAC fingerprinting study at the Washington University Genome Sequencing Center.

Sequencing

Sequence production for each BAC or PAC clone is based upon shotgun sequencing of 1.4-2.2 kb M13 and pUC subclones, followed by directed finishing. Templates are now being sequenced using BigDye chemistries in 384-well format. Significant improvements in sequence production have been achieved by using 96-lane ABI 377 slab gels and by the introduction of ABI 3700 and MegaBACE capillary sequencing machines. A bacterial clone is considered finished when the entire sequence is contiguous, has been determined on both strands or using two chemistries, and all ambiguities have been resolved to provide sequence data with an accuracy of ≥ 99.99%. Modifications to finishing protocols, in particular the new version of Gap4 (Staden group), which is used to visualize and assemble sequence reads and to establish a consensus during the finishing process, have increased finished sequence output. As of June 28th, 1999, a total of 16.4 Mb of sequence had been finished and 13.3 Mb submitted to public databases (Figure 2).

The international consortium sequencing the human genome agreed in March, 1999 to determine a 'working draft' of the genome by Spring, 2000.
As a result, the chromosome 1 project has accelerated the generation of bacterial clone contigs, which has dramatically increased map coverage, in preparation for the production of draft sequence. Production of the working draft sequence uses only pUC clones and is designed to give, on average, 3X depth of sequence coverage. The working draft will comprise genomic sequence of each bacterial clone, covering at least 90% of the insert, in one or a few contigs. Each clone will be placed within a bacterial clone map.

Analysis

The chromosome 1 project provides detailed manual annotation and experimental analysis of 'finished' sequence clones. Computational methods provide evidence for the existence of gene features, which is then supported by an experimental approach to isolate the corresponding cDNA sequences. A combination of prediction programs, primarily FGENES, Genscan, Genewise, Grail and HEXON, is used to determine potential coding sequences (CDSs) from repeat-masked sequence. BLASTN and BLASTX are used to determine DNA and protein homologies, respectively, from unmasked sequence. PCR-based screening of one of 17 cDNA libraries is initiated when in silico prediction provides strong evidence for a gene. Both computational and experimental CDSs are collated and displayed graphically in an ACEDB format (see Data access/release). To date, analysis of 10.44 Mb of 'finished' chromosome 1 genomic sequence has identified 117 chromosome 1 genes from 97 finished PAC/BAC sequences (Rhodes et al., this report).

Collaborations

The Sanger Centre chromosome 1 project actively encourages the establishment of mapping and sequencing collaborations. Many of our current collaborations (Figure 3) have arisen directly from previous chromosome 1 workshops, illustrating the value of single chromosome workshops as a vehicle for coordination. All collaborations follow the established data release policy of the Sanger Centre (refer to Data access/release).
Several abstracts from this report (Carpten et al., Lloyd et al., Doudney et al., Labay et al., Bjork et al., Pavari et al.) describe examples of progress made through such collaborations.

Data access/release

Chromosome 1 mapping and sequence data are released freely into the public domain. Unfinished assembled sequence is released nightly, and finished sequence is annotated and submitted to the EBI directly upon completion. Maps are updated weekly on the Internet. A chromosome 1-specific BLAST server and conditions of data use, as well as all of the above sites, are accessible via the Sanger Centre Chromosome 1 Homepage.
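As a rough cross-check of the working draft targets quoted under Sequencing (3X average depth and at least 90% of each insert covered), the two figures can be compared using the Lander-Waterman expectation for random shotgun reads. This model is not mentioned in the report and is introduced here purely as an assumption; it idealizes reads as uniformly and independently placed, which real subclone libraries only approximate. The function names below are hypothetical.

```python
import math


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Under the idealized Lander-Waterman model, a base is missed by
    every read with probability exp(-depth).
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


def depth_for_fraction(fraction: float) -> float:
    """Average depth needed to expect the given covered fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


if __name__ == "__main__":
    print(f"Expected coverage at 3X: {expected_fraction_covered(3.0):.3f}")
    print(f"Depth needed for 90%:    {depth_for_fraction(0.90):.2f}X")
```

Under this idealized model, 3X depth is expected to cover about 95% of a clone insert, and 90% coverage is reached at a little over 2X, so the two working draft targets are mutually consistent with some margin for non-random read placement.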