
1Division of Neurology, Duke University Medical Center, Durham, NC
(USA)
2 Lab of Statistical Genetics, The Rockefeller University,
New York, NY (USA)
3Sanger Centre, Hinxton Hall, Hinxton, Cambridgeshire,
UK
4 Department of Pediatrics, University of Iowa, Iowa City,
IA (USA)
5Children's Hospital Medical Center, Genetics Division,
Boston, MA (USA)
6Department of Medical Genetics, University Hospital
Ghent, Ghent (Belgium)
7Division of Oncology, Children's Hospital of
Philadelphia, Philadelphia, PA (USA)
8Research Institute of Molecular Pathology (I.M.P.),
Vienna (Austria)
Running Title: Third International Chromosome 1 Workshop
Corresponding Author:
Jeffery M. Vance
Box 2903, Duke University Medical Center
Durham, NC, 27710, USA
Fax: 1-919-681-7894
Telephone: 1-919-681-5696
Email: jeff@dnadoc.mc.duke.edu
Radiation Hybrid and Integrated Maps
Chromosome 1p Physical Mapping Report
Physical maps on chromosome 1q
Electronic and Genomic Resources
Chromosome 1 Abnormalities in Human Neoplasia
Systematic Mapping and Sequence Analysis
The Third International Chromosome 1 mapping workshop was held at Duke University in Durham, North Carolina, USA between the 25th and 27th of April, 1997. The meeting was more focused than previous meetings, with primary emphasis on 3 points: 1) providing an update of chromosome 1 mapping data as in previous reports; 2) discussion and initiation of a framework to build an integrated chromosome 1 map; and 3) discussion of how to integrate the chromosome 1 sequencing efforts at the Sanger Centre in Cambridge, England with the existing chromosome 1 community.
In concurrence with the previous SCW in Vienna in September of 1995, an active Chromosome 1 web site was constructed at http://linkage.rockefeller.edu/chr1, which is maintained by Dr. Peter White at Children's Hospital of Philadelphia and Dr. Tara Matise at Rockefeller University. This site has been very active, with over 5000 visits in the past year. It will hold this report as well.
The workshop led to discussions on several general points by the participants: 1) Given the initiation of sequencing of chromosome 1 at the Sanger Centre, it was agreed that an effort would be made to ensure that the markers in the Sanger RH framework map, which is being used as the sequencing framework, will be included in the integrated mapping effort already initiated by Drs. White and Matise (see abstract). As the genetic maps are currently more accurate in many areas, this may be difficult for some of the markers. 2) The large number of aliases for many markers, particularly ESTs, is a confusing and significant problem, and was felt to be one of the primary obstacles to integrating maps. GDB has made some headway on this problem and was encouraged by the participants to consider serving as a primary site for alias lists that could be linked to other data sources. 3) A direct link from the Sanger sequencing effort was introduced into the chromosome 1 web site just prior to the workshop. The participants felt this was a very valuable step toward integrating the sequencing effort with the chromosome 1 community, the actual "users of the data". The data on this web site, although naturally limited relative to the 1ace ACeDB ftp site, will be updated daily. Considerable discussion during the workshop produced several suggestions that all participants felt would make the data much more useful to the average web site user. The feasibility of incorporating these points into the web site will be evaluated by the Sanger Centre. 4) Discussion occurred on how to work with the Sanger Centre on prioritizing areas of high interest to the chromosome 1 community for sequencing. 5) It was decided to try to expand the use of ACeDB as an instrument to manage the large amount of chromosome 1 data as it continues to grow.
Finally, the participants felt that the work initiated at this meeting should be continued and Drs. Wooster, Gregory and Vaudin will investigate possible sites and sources of funding for another small meeting in the future.
Genetic Maps
(Prepared by Tara C. Matise)
There have been several genome-wide genetic maps published in the past few years (Dib et al., 1996; Murray et al., 1994; Sheffield et al., 1995). These maps provide valuable information regarding the map locations of hundreds of genetic markers located on chromosome 1 (Table 1). In addition, two genome-wide integrated maps which include genetic markers have been recently published (Collins et al., 1996; Hudson et al., 1995). While there are many markers in common across these maps, there is no single map which contains linkage map positions for all available genetic markers on chromosome 1. Therefore, unless working only with markers developed exclusively by a single group, such as Généthon or the Cooperative Human Linkage Center (CHLC), extrapolating map locations from markers placed on different maps requires manual integration which can be quite difficult.
To overcome the limitations of multiple maps, we (T.C.M. and P.S.W.) have constructed a genetic map which localizes nearly all chromosome 1 markers that have been genotyped in the CEPH pedigrees (Dausset et al., 1990). This map contains over 800 chromosome 1 markers, and includes markers for anonymous D-segments as well as gene-based markers, hybridization-based markers (RFLPs, VNTRs) in addition to PCR-based markers (di-, tri-, and tetra-nucleotide microsatellite repeats), and markers that were developed and genotyped in several different laboratories. Genotypes for a total of 853 markers were extracted from the CEPH database (version 8.0). Forty-three of these had been genotyped in more than one laboratory or represent the same locus as another marker in this set, leaving 810 markers thought to represent unique loci. Of these, 44 identify genes and 766 are anonymous DNA segments, 691 of which are PCR-based, with 465 developed by Généthon, 109 by CHLC, 88 by the Eccles Institute of Human Genetics, and 29 by other investigators.
To facilitate map construction and provide a basis for comparison with other genetic maps, a set of 126 Généthon markers was used as an initial framework. For our framework, we modified the original Généthon map (Dib et al., 1996), so that the local order of all markers is supported by odds of at least 1000:1. A preliminary map was produced by simply assigning each of the 684 non-framework markers against the framework map. We have termed this a "comprehensive positional" map. An expanded version of the map was produced by allowing non-framework markers to fill in gaps in the framework where there were odds of at least 1000:1 in support of one map position over all others. The resulting map, termed a "comprehensive expanded" map, contains 154 ordered (backbone) markers and 656 off-backbone markers placed in 1000:1 odds support intervals (Figure 1). The MultiMap (Matise et al., 1994) and CRI-MAP (Lander and Green, 1987) programs were used for map construction, applying previously published mapping algorithms. An error threshold was applied so that markers whose inclusion in the backbone map would increase the map length by more than 10% were not added to the map. The sex-averaged, female and male map lengths are 289, 363, and 216 cM, respectively (Table 1). Chromosome 1 is estimated to be 263 Mb (Morton, 1991). Averaged over the entire chromosome, one centiMorgan is approximately equal to 910 kb. Thus, the average map resolution, or distance between markers, is approximately 1.9 cM and 1.7 Mb.
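The resolution figures quoted above follow from simple division. The short Python sketch below reproduces them from the numbers given in the text; the function names are illustrative only, and it assumes that "average resolution" means total length divided by the number of backbone markers (dividing by the 153 intervals between them gives the same rounded values).

```python
# Figures taken from the paragraph above.
SEX_AVERAGED_CM = 289.0   # sex-averaged genetic length of the map
CHROMOSOME_MB = 263.0     # estimated physical length (Morton, 1991)
BACKBONE_MARKERS = 154    # ordered markers on the comprehensive expanded map


def kb_per_centimorgan(length_mb, length_cm):
    """Average physical distance (kb) corresponding to one centiMorgan."""
    return length_mb * 1000.0 / length_cm


def mean_spacing(total_length, n_markers):
    """Average distance between markers.

    Assumption: total length divided by marker count, not by the
    number of intervals (n_markers - 1).
    """
    return total_length / n_markers


if __name__ == "__main__":
    print(round(kb_per_centimorgan(CHROMOSOME_MB, SEX_AVERAGED_CM)))  # kb per cM
    print(round(mean_spacing(SEX_AVERAGED_CM, BACKBONE_MARKERS), 1))  # cM
    print(round(mean_spacing(CHROMOSOME_MB, BACKBONE_MARKERS), 1))    # Mb
```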
This map provides relative map locations of all markers in a single unified resource. It should prove especially useful for identifying polymorphic markers in specific regions of interest. The comprehensive expanded genetic map is being submitted to the Genome Database (GDB) and will be viewable using their MAPVIEW Java Applet. In addition, both maps can be viewed and queried on the chromosome 1 web site (http://linkage.rockefeller.edu/chr1). The web site contains the relative likelihood for the positions of the off-backbone markers, as well as tables of basic marker-related information including links to the GDB.
The genetic map location of D1S160 is thought to be incorrect in some publicly available maps of chromosome 1. We addressed this problem at the previous chromosome 1 workshop (Weith et al., 1996) and found likely genotyping errors for this locus in CEPH family 1362. Dr. Peter White provided new genotypes for D1S160 for this family, and these new genotypes were found to differ significantly from those present in the CEPH database for this locus. Hence, the new genotypes were used in the current genetic map construction and the resulting map position of D1S160 is consistent with physical mapping evidence (Jensen et al., 1997). D1S160 is correctly localized 17-23 cM from the most p-telomeric marker on this map, D1S243.
Radiation Hybrid and Integrated Maps
(Prepared by Tara C. Matise and Peter White)
Radiation hybrid maps
The Radiation Hybrid Database (RHdb; www.ebi.ac.uk/RHdb) contains radiation hybrid (RH) score data for over 35,000 markers typed in several RH panels, including the Genebridge 4 (GB4) panel (Gyapay et al., 1996) and Stanford's G3 panel (Stewart et al., 1997). The majority of this RH data has been generated by the genome centers comprising the "RH consortium." Several of the consortium members have published or made available whole-genome RH maps. In addition, the consortium has cooperatively produced a genome-wide RH map consisting of over 16,000 transcripts (Schuler et al., 1996). For each of these maps, a subset of polymorphic microsatellite markers was used to construct a high likelihood (usually 1000:1) framework map of each chromosome, to which additional markers were related. A summary of each of these maps for chromosome 1 is presented in Table 1. The current RHdb data release, 8.0, contains RH scores for 2496 chromosome 1 markers screened in the GB4 panel and 1126 markers screened in the G3 panel.
The Sanger Centre presented a radiation hybrid map of chromosome 1. The Sanger map currently consists of 961 markers typed on the GB4 panel and is being expanded to eventually include 5500 chromosome 1 markers. This RH map will serve as an important resource in integrating the Sanger Centre's physical mapping and sequencing efforts.
Integrated maps
"Integrated map" is becoming an increasingly common and wide-ranging term that can refer to various mapping procedures. Here we define "integrated" as a map relating data derived from two or more distinct experimental procedures onto a single scale. Since the previous workshop, a number of genome-wide integrated maps have been published or presented (Table 2). The CEPH-Généthon integrated map combines genetic linkage and YAC-based physical data (Chumakov et al., 1995), while the Whitehead Institute map includes linkage, RH, and YAC data (Hudson et al., 1995). While the recent RH consortium gene map (see RH maps above) relies mainly upon RH genotyping of transcript markers, the framework incorporates microsatellite polymorphisms and displays the map in centiMorgans rather than centiRays. An integrated map that combines RH, genetic, physical, and cytogenetic data by computational means into a weighted linear ordering of various marker types has been constructed by researchers at the University of Southampton (Collins et al., 1996). Chromosome 1-specific statistics for each of these maps are shown in Table 2. In addition, a regional integrated map of 1p35-36 that combines data from a distal 1p-specific RH panel and genetic linkage data has been constructed by Jensen and colleagues (1997) and is available at the chromosome 1 web site (linkage.rockefeller.edu/chr1).
A major focus of the workshop was discussing ways to combine disparate genome-wide and chromosome 1-specific mapping data into a single map that reflects physical distance. It was proposed that the most effective means would relate markers to a well-ordered set of framework markers rather than to place all markers in a linear order with reduced support for order. The largest dataset of markers mapped by a single technique is the 2496 markers typed on the GB4 RH panel. This RH dataset has the potential for constructing a high resolution, as well as marker-dense, map that includes a large number of genetic markers. The inclusion of genetic markers can allow the integration of cytogenetic, genetic, and physical mapping data with relative ease.
Tara Matise and Peter White proposed a specific procedure to create an integrated chromosome 1 map based upon GB4 data in the RHdb. A well-characterized set of linearly-ordered Généthon markers from the Wellcome Trust Centre for Human Genetics will be used as a skeletal RH map, onto which additional markers will be placed sequentially to establish a framework map with 1000:1 likelihood odds. Additional markers in the GB4 dataset will then be intervaled relative to the framework markers. The data will be presented relative to the Unigene clusters, allowing the comparison of mapped transcripts with corresponding sequence and functional data. Future efforts will focus upon incorporating YAC contigs from the Whitehead Institute and CEPH, PAC contigs and sequence information from the Sanger Centre, additional genetic markers from the CEPH genotype database, and STSs typed on the Stanford G3 RH panel. Each of these marker sets can be related to the framework map by the genetic markers used in common between the maps. This proposal can potentially integrate markers from all of the existing large-scale genome-wide maps onto a single comprehensive map, and this procedure can easily be adapted to other chromosomes.
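The distinction between framework markers and "intervaled" markers rests on the 1000:1 odds criterion, which corresponds to a difference of 3 in log10 likelihood between the best placement and each alternative. The following Python sketch illustrates that decision rule only. It assumes that a mapping program has already produced a log10 likelihood for the marker in each candidate interval; the function names, interval labels, and likelihood values are hypothetical, and this is not the MultiMap or CRI-MAP implementation.

```python
ODDS_THRESHOLD_LOG10 = 3.0  # 1000:1 odds expressed as a log10 likelihood difference


def support_intervals(log10_likelihoods, threshold=ODDS_THRESHOLD_LOG10):
    """Return the intervals that cannot be excluded at the given odds.

    log10_likelihoods maps an interval label to the log10 likelihood of
    the map with the marker placed in that interval (hypothetical input).
    """
    best = max(log10_likelihoods.values())
    return [
        interval
        for interval, value in log10_likelihoods.items()
        if best - value < threshold
    ]


def is_uniquely_placed(log10_likelihoods, threshold=ODDS_THRESHOLD_LOG10):
    """True when one interval is favored over all others by the threshold odds."""
    return len(support_intervals(log10_likelihoods, threshold)) == 1


if __name__ == "__main__":
    # Made-up values for two markers tested against three framework intervals.
    ambiguous = {"F1-F2": -120.0, "F2-F3": -118.5, "F3-F4": -125.2}
    unique = {"F1-F2": -130.0, "F2-F3": -118.5, "F3-F4": -125.2}
    print(support_intervals(ambiguous), is_uniquely_placed(ambiguous))
    print(support_intervals(unique), is_uniquely_placed(unique))
```

Under this rule, a marker with a single surviving interval is a candidate for the ordered framework, while a marker with several surviving intervals is reported with its support interval instead of being forced into a linear order.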
Chromosome 1p Physical Mapping Report
(Prepared by Andreas Weith)
Physical mapping data and contig constructs generated in the past two years on chromosome 1 have still not reached the desired goal of covering the entire chromosome with evenly distributed and sufficiently detailed maps. On chromosome 1p in particular, mapping data again accumulate in regions long known to be relevant to human disease, while other parts of the chromosome remain entirely untouched. The efforts of genome centers such as Whitehead or CEPH provide chromosome-wide information (Tables 2, 3 and 4). In addition to the genome mapping projects, three chromosome 1-specific maps were generated (Kugoh et al., 1995; Ariyama et al., 1995; Roberts et al., 1996; see Table 2), and one integrated map specific for chromosome 1, already presented at the first chromosome 1 workshop, was updated (Forabosco et al., 1995, Table 2).
As a region of extensive interest, again the terminal 1p region (1pter-1p35) was the main focus of several groups for either FISH or contig mapping (Tables 5, 6). Ten maps were generated mostly centering around distinct subregions with the purpose of either identifying disease gene loci or mapping cytogenetic aberrations (Table 4). Gene identification by exon trapping and cDNA selection from P1 clones and PACs and subsequent FISH mapping has yielded a partial transcript map in 1p36.32-p36.31 (Onyango et al., 1997b, Table 4). Comprehensive mapping efforts at the Sanger Centre (Cambridge, UK) as a prerequisite for genome sequencing focus on 1p36, too. The contigs generated are listed in Table 5; further details are presented elsewhere in this report, and are updated on the chromosome 1 web site. The 1p32 region was another subject of YAC mapping and long range restriction analysis, centering around the infantile neuronal ceroid lipofuscinosis locus (Hellsten et al., 1995, Table 4).
Despite extensive efforts to create comprehensive maps for chromosome 1pter-p35, contig construction or even the creation of a consensus map of this approximately 30-40 Mb region is still pending. Strikingly, parts of this region appear to be almost inaccessible to cloning. As an indication, contigs generated by the Whitehead effort in this region are markedly smaller than those on the rest of the chromosome. The number of contigs is correspondingly higher: 10 out of 17 contigs on 1p are localized within the terminal 1p region, but it seems to be difficult to generate overlaps between them.
Most of the mapping, in particular the genome-wide efforts, has still been performed using YACs as clone material. However, critical evaluation of YACs with respect to their stability as well as their susceptibility to extended chimerism has resulted in an increased use of prokaryotic long-insert clones. BAC, PAC, P1 and cosmid clones have recently been used with increasing frequency (Ariyama et al., 1995; Barnas et al., 1995; Kugoh et al., 1995; Maris et al., 1996; Onyango et al., 1997a, 1997b); their use may also be appropriate because they are considered suitable targets for subsequent gene identification strategies.
A total of 21 gene assignments to chromosome 1p were made by either somatic cell hybrid analysis or by FISH or by both methods (Table 6). Most of these mapping data represent novel assignments of genes to chromosome 1. Some of these genes are also new to the human genome: PO42 corresponds to a gene of unknown function (Onyango et al., 1997b). Human xylosidase is the newly identified human homolog of the bacterial gene (Onyango et al., 1997b) and so is hMYH (Slupska et al., 1996). Another gene identified by means of its evolutionary conservation is DVL1 (Pizzuti et al., 1996). Furthermore, MBP1, previously identified as a protein binding to the myc gene promoter, is co-localized to the ENO1 locus (Onyango et al., 1997b). Owing to its almost complete identity to ENO1 apart from two nucleotide sequence changes in the coding region, the relation of MBP1 to ENO1 must be re-considered.
Apart from important new assignments, the newly available mapping information has also created at least one apparent map conflict on 1p: FISH analysis of the MYCL locus on extended pro-metaphase chromosomes led to re-localization of this gene from the previously assigned 1p32 band to 1p34.3 (Speleman et al., 1996). This has an important impact on the localization of the YAC contigs and PFGE map of Hellsten and colleagues (1995), who used MYCL as one of the anchoring markers. Provided a correct assignment of MYCL to 1p34.3, the infantile neuronal ceroid lipofuscinosis locus would also be localized in this genomic area.
Physical maps on chromosome 1q
(Prepared by Brian C. Schutte)
The review of physical mapping on the long arm of chromosome 1 is divided into three sections: FISH maps, contig maps and gene structures. The data were compiled from searches of the GDB, OMIM and Medline databases and were restricted to reports published in the last 18 months.
FISH maps
A total of 52 chromosome 1 assignments were made by in situ hybridization since the last chromosome 1 workshop (Table 7). The assignments can be divided into 27 single-gene, 3 contigs, 5 gene clusters and 17 genetic markers. For the purpose of this review, only reports containing multiple loci are described in greater detail below.
In an effort to integrate cytogenetic maps with physical and genetic maps, one study (Bray-Ward et al., 1996) performed FISH analysis with YAC clones that contained genetic markers. From this genome-wide effort, 17 FISH assignments were made on 1q.
FISH assignments were made on 5 gene clusters including the FCGR1, EDC, EPH-related kinase ligands, pentraxin and RCA gene clusters. The high-affinity receptor for immunoglobulin G, Fc gamma RI (FCGR1), is encoded by a family of three genes within humans that share over 98% of DNA sequence homology. Maresco and colleagues (1996) performed fluorescence in situ hybridization of human cells and Southern analysis of cell lines containing 1p and 1q to show that the three genes flank the centromere of chromosome 1 at bands 1p12 and 1q21. FCGR1B was found at 1p12, whereas both FCGR1A and FCGR1C were localized to 1q21.
The epidermal differentiation complex (EDC) unites a number of related genes on 1q21 that play an important role in terminal differentiation of the human epidermis. Marenholz et al. (1996) determined the chromosomal orientation of the EDC by fluorescence in situ hybridization with two YACs that flank a contig which contains the EDC.
Cerretti and colleagues (1996) identified five cDNAs encoding membrane-bound ligands to EPH-related kinases or EPLGs. Using Southern hybridization analysis of human x rodent somatic cell hybrid DNAs, they assigned EPLG1, EPLG3, and EPLG4 to human chromosome 1. Fluorescence in situ hybridization to metaphase chromosome preparations using genomic clones from each locus refined this localization to chromosome 1q21-q22.
The pentraxin genes are found as a cluster in the 1q21-q25 region and encode proteins with immune- and inflammation-associated functions. Walsh and colleagues (1996) used fluorescence in situ hybridization with YACs to refine the map location of the pentraxin cluster to 1q23. In addition, this cluster contains the histone H3F2 and H4F2 genes and the gene for erythroid alpha-spectrin (SPTA1).
The regulator of complement activation (RCA) gene cluster maps to 1q31-q32. Pardo-Manuel de Villena and colleagues (1996) used two-color in situ hybridization to determine the chromosomal position of genes in this cluster. They found that HF1/F13B are proximal to the C4BP/MCP genes and that the PTPRC and REN genes map to a region of the RCA gene cluster between the HF1/F13B and C4BP/MCP.
FISH assignments were also performed on single genes that were contained within YAC contigs. Gelb and colleagues (1996) constructed a YAC contig that contained the connexin 40 (Cx40; HGMW-approved symbol GJA5), natriuretic peptide receptor A (NPR1) and flavin-containing monooxygenase 5 (FMO5) genes. Single-color FISH using PAC clones containing the Cx40 and NPR1 STSs localized these genes to 1q21.1. Clancy and colleagues (1996) used fluorescence in situ hybridization to localize the glutamine synthetase gene (GLUL) to 1q23. They also mapped GLUL to five CEPH YACs between the polymorphic PCR markers D1S117 and D1S466 by analysis of the Whitehead Institute's recently described chromosome 1 contig map. Finally, Hardas and colleagues (1996) used fluorescence in situ hybridization to localize the psoriasin gene to 1q21. Psoriasin is an abundant low molecular weight protein in keratinocytes from psoriatic lesions and is similar in gene structure and expression to other genes on human chromosomal band 1q21. In addition, the psoriasin gene was present on a 380-kb yeast artificial chromosome clone that was previously mapped to 1q21 and shown to contain the genes calcyclin, MRP8 and CaN19.
Contig maps
A total of 11 YAC contigs were reported on chromosome 1q since the previous workshop and are described in greater detail below (Table 8). In addition, the human genome map (release 11; October, 1996) from the Whitehead Institute contains 7 YAC contigs (WC1.17 to WC1.23) which span most of chromosome 1q. Detailed descriptions of Whitehead contigs can be obtained at their Web site (http://www-genome.wi.mit.edu).
The epidermal differentiation complex (EDC) includes at least 18 genes that play an important role in terminal differentiation of the human epidermis. Mischke et al. (1996) showed that the EDC is localized in a 2.05 Mb region of 1q21. Marenholz et al. (1996) constructed a YAC contig that covers about 6 Mb of 1q21 and includes the entire EDC. Their contig contains 24 YAC clones and 32 STSs. The STSs were derived from genes (18), genetic markers (4) and new region-specific probes (10). Two subsequent reports by Wicki et al. (1996a, b) showed that 3 new genes from the S100A family (S100A11, S100A12 and S100A13) also colocalize to the EDC, raising the total number of genes to 21.
Gelb and colleagues (1996) constructed a contig over the connexin 40 (GJA5) gene that contained 5 YAC clones and 6 STSs covering about 1.5 Mb (assuming an average clone size of 1 Mb). The STSs were derived from 3 genes and 3 genetic markers.
A 1.4 Mb YAC contig was constructed by Walsh and colleagues (1996) which included the pentraxin genes for C-reactive protein (CRP) and serum amyloid P (SAP) protein (APCS), and a CRP pseudogene (CRPP1). The four-YAC contig included other genes with immune functions, including the FCER1A gene, which encodes the alpha-subunit of the IgE high-affinity Fc receptor, and the IFI-16 gene, an interferon-gamma-induced gene. In addition, it contains the histone H3F2 and H4F2 genes and the gene for erythroid alpha-spectrin (SPTA1). The gene order was cen-SPTA1-H4F2-H3F2-IFI-16-CRP-CRPP1-APCS-FCER1A-tel.
Two independent YAC contigs were constructed across the critical region of the primary open angle glaucoma (GLCA1) locus. The contig by Sunden and colleagues (1996) spanned 6.5 Mb and contained 37 YACs and 46 STSs of which 8 STSs were derived from cDNA or EST sequences. The contig by Clepet and colleagues (1996) spanned 9.4 Mb and contained 68 YACs and 47 STSs. A YAC contig that combines these studies would contain 85 YACs and about 78 STSs over 9.4 Mb (20 YACs and a minimum of 15 STSs were used by both groups).
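The combined totals above are the sizes of the union of the two clone and marker sets: the counts from each contig are added and the shared items subtracted once. A minimal Python sketch of that arithmetic, using only the figures quoted in the text (the function name is illustrative):

```python
def merged_count(n_first, n_second, n_shared):
    """Size of the union of two sets, given their sizes and their overlap."""
    return n_first + n_second - n_shared


if __name__ == "__main__":
    # Sunden et al.: 37 YACs, 46 STSs; Clepet et al.: 68 YACs, 47 STSs.
    print(merged_count(37, 68, 20))  # YACs in the combined contig
    # At least 15 STSs are shared, so this value is an upper bound.
    print(merged_count(46, 47, 15))  # STSs in the combined contig
```

Because 15 is given as a minimum for the shared STSs, the STS total is an upper bound, which is why the text says "about 78".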
Clancy and colleagues (1996) mapped the glutamine synthetase gene (GLUL) to 5 CEPH YACs. According to the Whitehead physical map for chromosome 1 these YACs also contained the polymorphic PCR markers D1S117 and D1S466.
In an effort to identify new neuromuscular disease genes, Neri and colleagues (1996) screened human reference cDNAs and EST sequences in Genbank for the presence of CAG/CTG repeats. Of the selected clones, 286 were found to represent new genes, and 72 have thus far been shown to contain CAG/CTG repeats. One of the cDNAs, Unigene A006A02, was mapped to YACs 852g6, 884f5 and 891b4 which localize to 1q32-q41 on the Whitehead physical map. These YACs are also contained in the YAC contig that spans the VWS critical region (Schutte et al., 1996). Preliminary STS content analysis confirmed the presence of this cDNA in the VWS contig but outside of the VWS critical region (B. Schutte, unpublished results).
Van der Woude syndrome (VWS) is the most frequent form of syndromic clefting. Linkage analysis has localized the gene between D1S245 and D1S414, an interval of 4.1 cM. A microdeletion around D1S205 and an additional genetic recombinant aided in narrowing the critical region to a 1.6 cM region between D1S491 and D1S205. Schutte and colleagues (1996) constructed a 3.5-Mb YAC contig from D1S245 through D1S414. The contig contained 18 YACs and 10 STSs. One single YAC, yCEPH785B2, contains both flanking STSs (D1S491, D1S205) and both the proximal and distal ends of the microdeletion.
The gene for Usher syndrome type II (USH2A), an autosomal recessive syndromic deafness, has been mapped to a region of 1q41 flanked by D1S217 and D1S439. Sumegi et al. (1996) constructed an 11 Mb YAC contig across this region. The contig contained 21 YACs and 17 STSs. Linkage analysis with two new markers D1S474 and AFM144XF2 narrowed the critical region to approximately 1 Mb. In addition, a long-range physical map of the Usher type IIa critical region, using MluI, BssHII, NotI, EagI, and SacII, was developed.
The Chediak-Higashi syndrome (CHS) is a severe autosomal recessive condition, features of which are partial oculocutaneous albinism, increased susceptibility to infections, deficient natural killer cell activity, and the presence of large intracytoplasmic granulations in various cell types. The CHS locus was mapped to a 5-cM interval in chromosome segment 1q42.1-q42.2. Barrat et al. (1996) constructed a 3 Mb YAC contig with 10 STSs over this region.
Gene structures
The genomic structures of 9 genes from chromosome 1q were reported since the previous workshop (Table 9). Three of these genes belong to gene families: S100A12, SCM-1alpha and SCM-1beta. As expected, the overall structure of the S100A12 gene was very similar to the structure of the other members of the S100A gene family (Wicki et al., 1996). Also, the structures of SCM-1alpha and SCM-1beta were very similar except that, while both genes contain a pseudogene of the ribosomal large subunit L7a in the first intron, one of the two SCM-1 genes had a 1.5 kb region deleted from the first intron which included part of the L7a pseudogene (Yoshida et al., 1996).
Electronic and Genomic Resources
(Prepared by Peter White)
Electronic resources
A World Wide Web site (see Table 10 for Internet addresses) dedicated to chromosome 1 was established just prior to the 1995 workshop. In the interim, this site has been greatly expanded to provide links to chromosome 1 information on the Internet, to serve as a communications forum, and to display relevant genomic data. The workshop section of the website contains information about the 2nd and 3rd SCW1 workshops, including contact information of meeting participants, workshop abstracts, and summaries of meeting discussions and content. Additional sections contain descriptions of the chromosome 1 mailing list (see below) and a related on-line discussion forum for posting of relevant comments.
The resources section of the web site is a comprehensive listing of links to chromosome 1 resources existing on the Internet. This section is divided into links for singular chromosome 1 maps (constructed primarily with one mapping method), integrated maps, genomic databases, and additional useful resources. Corresponding links to help pages are provided, when available, to explain individual resources to first-time users. Links are provided both to chromosome 1-specific maps and to data from genome-wide mapping efforts of the large genome centers. Information is available for genetic, radiation hybrid, physical, FISH, and transcript maps, as well as for sequencing efforts at the Sanger Centre. Additions to the resources page and suggestions for improving the overall chromosome 1 site are encouraged.
The data section is a new addition to the web site that is of great potential utility. This section contains presentations of unreviewed chromosome 1-specific data as a service to the genome mapping community. Currently, the data section contains three submissions: comprehensive positional genetic maps of chromosome 1 by T. Matise (see genetic maps), a characterization of 100 ESTs from 1q21 by H. Vos, and an integrated transcript map of 1p35-36 by Jensen and colleagues. Further contributions to the data page were discussed by several workshop participants. This repository is intended to provide a forum for the display of detailed mapping and sequencing data, including information that would not otherwise be published, and for the creation of web interfaces that complement published articles. Additional submissions of regional or whole chromosome data are welcomed.
A related web site for chromosome 1 has recently been created by the Sanger Centre for dissemination of their mapping and sequencing data. This site includes a radiation hybrid map, physical mapping, sequencing, available software, and the 1ace ACEDB database. Queries to 1ace can be made through the web; 1ace can also be downloaded from the Sanger ftp site, as can finished and unfinished sequence data. Currently only the Sanger ftp site provides the full information available to public users. A partial integration of the chromosome 1 web site and the Sanger site was discussed at the workshop. It is anticipated that a Sanger-specific section will be created on the chromosome 1 site in the future.
In 1996, an Email list server was established to distribute chromosome 1-specific news and announcements, and to promote interactions between members of the chromosome 1 community. To subscribe to the group, send an Email message to lists@genome.chop.edu with SUBSCRIBE Chr1_list in the body of the message. Further details regarding the mailing list can be found on the chromosome 1 web site.
Genomic resources
Many of the resources developed for mapping of the entire genome are applicable to chromosome 1. A number of these resources, especially those for genetic, RH, transcript, and physical maps, are described in greater detail elsewhere in this report. An excellent summary of genome-wide large insert clone libraries established through 1995 can be found in the previous workshop report. This summary will concentrate on resources developed more recently.
Significant advances in whole-genome BAC and PAC libraries have been made in the last year, as these libraries are the source of sequence-ready contigs for many chromosome sequencing projects. The Caltech BAC library has been expanded to 9 genome equivalents with an average insert size of 130 kb, and the RPCI-4, 5, and 6 PAC libraries together constitute 14 genome equivalents with an average insert size of 150 kb. Both the BAC and PAC libraries exhibit high clone stability and low chimerism rates. Individual clones, whole or partial screening sets, and screening services are available from commercial sources.
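To give a feel for what these redundancy figures imply, the following minimal sketch converts genome equivalents and insert size into an approximate clone count, and estimates the chance that any given locus is present in at least one clone. It assumes a haploid genome of 3,000 Mb and perfectly random, unbiased cloning; neither assumption comes from this report, and real libraries deviate from both.

```python
import math

# Assumed haploid human genome size in Mb (an illustrative value, not from this report).
GENOME_MB = 3000.0


def clones_for_coverage(genome_equivalents, insert_kb, genome_mb=GENOME_MB):
    """Approximate number of clones whose inserts add up to the stated redundancy."""
    return round(genome_equivalents * genome_mb * 1000.0 / insert_kb)


def prob_locus_represented(genome_equivalents):
    """Poisson approximation: probability that a given point lies in at least one clone."""
    return 1.0 - math.exp(-genome_equivalents)


if __name__ == "__main__":
    for name, equivalents, insert_kb in (("BAC", 9, 130), ("PAC", 14, 150)):
        print(name,
              clones_for_coverage(equivalents, insert_kb),
              prob_locus_represented(equivalents))
```

Under these idealized assumptions both libraries would be expected to contain almost every locus, which is why gaps in practice usually reflect cloning bias rather than insufficient depth.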
Perhaps the most significant genome mapping advance in the last year has been the genome-wide transcript map of Schuler and colleagues. A large number of the mapped EST clones (currently greater than 230,000 from the I.M.A.G.E. consortium) are available from several sources. The transcript mapping efforts primarily use two genome-wide RH mapping panels: the Stanford G3 and Genebridge4 panels. A more recent high resolution panel (TNG) has subsequently been created by the Stanford Human Genome Center. This panel consists of 90 hybrids and has a resolution approaching 100 kb. All 3 panels can be purchased from Research Genetics, and on-line RH data servers for the G3 and GB4 panels are also available (Table 10).
The Chromosome Abnormality Database is a collection of acquired and constitutional chromosomal abnormalities from cytogenetic labs in the United Kingdom. A similar resource, the Dysmorphic Human-Mouse Homology Database, lists several characterized human and mouse cytogenetic aberrations involving chromosome 1. Each of these resources has useful web interfaces (Table 10).
Chromosome 1-specific resources
The most comprehensive collection of chromosome 1 mapping information can be found in the 1ace ACEDB database created by the Sanger Centre. 1ace compiles current mapping information generated at Sanger, as well as chromosome 1 data from OMIM, GDB, RHdb, and several genome centers, into a hypertext-linked, object-oriented database with graphical views. ACEDB was designed for the Unix operating system, although Windows- and Mac-compatible versions have also been written. More information about the capabilities and requirements for 1ace can be found at the Sanger chromosome 1 web site.
A somatic cell hybrid containing chromosome 1 (A9-1neo, #GM13139) as its only human component is available from the American Type Culture Collection (ATCC), and a number of cell lines containing rearranged chromosomes 1 are available through the ATCC and the Coriell Cell Repositories (Table 10). Chromosome painting probes for various portions of chromosome 1 have been developed by the Cytogenetics Unit at the University of Bari, and DNA samples of these probes can be freely obtained. DNA probes from microdissection libraries specific for 22 cytogenetically-defined chromosome 1 bands are commercially available from Research Genetics. At the present workshop, A. Weith presented a continued analysis of a 1p35-36-specific microdissection library. P. White described the construction and analysis of a radiation hybrid panel for distal 1p with a resolution of 800 kb. Richard Wooster of the Sanger Centre announced the availability of a flow-sorted chromosome 1 small insert library which is being used to develop additional STSs.
Three recent manuscripts are of particular interest to the chromosome 1 mapping community. Roberts and colleagues describe a chromosome 1-specific mapping panel of eight somatic cell hybrids containing various translocations, which was subsequently used to regionally map 192 ESTs. Wimmer and co-workers applied the emerging technology of two-dimensional electrophoretic genome scanning to flow-sorted chromosome 1 DNA, which enabled them to construct a reproducible NotI-EcoRV restriction pattern of the chromosome. The authors also demonstrated that specific fragments were recoverable for cloning, sequencing, and FISH. Finally, a first-generation methodology for constructing chromosome-specific cDNA libraries was tested on chromosome 1 by Mancini and colleagues. Alu-PCR products from the chromosome 1-specific A9-1neo somatic hybrid were used as probes for identifying human cDNAs, the majority of which mapped to chromosome 1. Both the genome scanning and chromosome-specific library methods are innovative approaches to genome mapping that could significantly impact chromosomal mapping in the future.
New Genes
(Prepared by Gail Bruns)
There are now 557 loci on chromosome 1 defined by genes or pseudogenes. Assignments to the chromosome since the last workshop on chromosome 1 (Weith et al., 1996) include: (1) the leptin receptor gene (LEPR) to 1p31 (Chung et al., 1996); (2) the SCNN1D locus for the delta subunit of the nonvoltage-gated sodium channel 1 to 1p36.3-p36.2 (Waldmann et al., 1996); (3) loci for the kidney-specific chloride channels CLCNKa and CLCNKb in close proximity at 1p36 (Saito-Ohara et al., 1996); and (4) the KCNK1 gene for member 1 of subfamily K of inwardly rectifying potassium channels to 1q42-q43 (Lesage et al., 1996). Loci for three GPI-anchored ligands of the eph family of receptor tyrosine kinases (EPLG1, EPLG3, EPLG4) have been localized to 1q21-q22 by FISH and to the homologous region of mouse chromosome 3 (Cerretti et al., 1996). The epithelial cell receptor protein tyrosine kinase gene ECK maps to 1p36 (Sulman et al., 1997) and the PPP2R5A locus for the alpha isoform of the regulatory B subunit of the abundant protein phosphatase 2A localizes to 1q41 (McCright et al., 1996).
A locus for an additional human member of the forkhead family of DNA binding proteins (FKHL12) has been mapped to 1p32 (Larsson et al., 1995). The NFYC gene for the C subunit of the CCAAT-binding nuclear transcription factor Y (NFY) has likewise been assigned to 1p32 by homology with the mouse (Sinha et al., 1996). The YB1 gene for the Y-box binding protein, a member of a family of DNA binding proteins with a conserved cold shock domain, is located at 1p34 (Makino et al., 1996), while the HDAC1 locus for a histone deacetylase related to the yeast transcriptional regulator Rpd3p maps to 1p34.1 (Furukawa et al., 1996). The Ras homolog gene ARHC, for a retinal small GTP-binding protein, is located at 1p21-p13 (Fagan et al., 1994). Other new chromosome 1 loci of interest are those for the homeobox protein prox 1 (PROX1) at 1q32.2 (Zinovieva et al., 1996); mitochondrial capsule selenoprotein of sperm (MCSP) at 1q21 (Aho et al., 1996); an RNA-specific adenosine deaminase (ADAR) at 1q21 (Weier et al., 1995); a modifier of methylation for class 1 HLA (MEMO1) at 1p36.1-p35 (Cheng et al., 1996); acidic calponin (CNN3) at 1p21 (Magachi et al., 1995); the glutamate-cysteine ligase regulatory subunit (GLCLR) at 1p21 (Tsuchiya et al., 1995); the LY9 cell membrane antigen homolog at 1q21.3-q22 (Kingsmore et al., 1995); the neuronal adhesion protein astrotactin (ASTN) at 1q25 (Fink et al., 1995); and the extracellular matrix protein PRELP at 1q32.1 (Grover et al., 1996). Loci for 3 additional members of the family of S100 calcium binding proteins (S100A11, S100A12, S100A13) have been assigned to 1q21 (Wicki et al., 1996a,b).
Disease Mapping
(Prepared by Jeff Vance)
Chromosome 1 continues to demonstrate the presence of several important disease loci. There are now over 62 disorders localized to chromosome 1 (Figure 2). These include 15 disorders that either have been mapped to chromosome 1 since the previous workshop or were previously mapped and have since had the gene defect identified.
(A) Previously mapped disorders with gene identification (OMIM numbers)
Fundus Flavimaculatus with Macular Dystrophy (248200). Also known as Stargardt's disease, this is a common cause of macular degeneration in childhood. It is due to a defect in the ABCR gene encoding an ATP-binding cassette transporter (Allikmets et al., 1997).
Juvenile-onset glaucoma (137750) (GLC1A). This disorder is the rarer, juvenile-onset form of glaucoma, shown to be different from the common idiopathic form of the disease (Wiggs et al., 1996). This autosomal dominant form was localized to 1q21-q31 by linkage analysis (Sheffield et al., 1993; Wiggs et al., 1994). An EST, TIGR 601652, was found to map in the candidate interval and was subsequently shown to be the gene defect (Stone et al., 1997).
Multiple epiphyseal dysplasia (600204). This autosomal dominant (AD) form has recently been found to be due to a mutation in a collagen gene, COL9A2, in a large Dutch family (Muragaki et al., 1996).
(B) New Disorders
Posterior polar cataract (116600). This (AD) congenital posterior cataract was linked (Ionides et al., 1997) using a large English family to a region overlapping that previously noted for Volkman congenital cataract (Eiberg et al., 1995).
Primary congenital glaucoma (GLC3B) (600975). Akarsu and colleagues (1996) mapped this second locus for autosomal recessive congenital glaucoma to a region just proximal to the CMT2A locus, using consanguineous families. An initial locus (GLC3A) is located on 2p21.
Schnyder's crystalline corneal dystrophy (121800). This locus was mapped (Shearman et al., 1996) to the 1p34.1-p36 region in two families of Scandinavian descent. The disorder is characterized by crystalline deposits of cholesterol in the stroma.
Mom1 (Pla2G2A) (172411). This locus contributes to the variability of colonic polyps in mice and is therefore named as a modifier of the mutations in the Apc gene leading to multiple intestinal neoplasia. The MOM1 locus, Pla2G2A, has been placed by FISH in the 1p36-35 region, and YACs containing this locus cohybridized with D1S199. Interestingly, LOH at D1S199 occurs in 48% of sporadic colorectal tumors (Praml et al., 1995).
Retinitis Pigmentosa (RP18) (601414). Xu and colleagues (1996) studied a large AD Danish family and mapped this locus to the pericentromeric region of chromosome 1 using linkage analysis.
Vohwinkel syndrome (124500). Also known as a variant of keratoderma hereditaria mutilans, this disorder presents with multiple abnormalities, including hyperkeratosis and constricting bands of the digits (pseudoainhum), with associated findings of alopecia and deafness as well as spastic paraplegia and myopathy (Maestrini et al., 1996). It is due to a defect in loricrin, the first defect demonstrated in a member of the epidermal differentiation complex.
Autosomal dominant non-syndromic deafness (DFNA7) (601412). This large Norwegian family with isolated, progressive high tone hearing loss was linked to D1S196 (Fagerheim et al., 1996), near a known POU gene, which has been shown to be the defect in DFN3.
Prostate Cancer (HPC1) (601518). A susceptibility gene for prostate cancer (Smith et al., 1996) was localized to 1q24-25 in a large set of multiplex families, using both parametric and non-parametric analyses.
Pycnodysostosis (265800). Localized in 1995 to 1q21 (Gelb et al., 1995; Polymeropoulos et al., 1995), this autosomal recessive disorder is best known as the condition that most likely afflicted the artist Henri de Toulouse-Lautrec. Shi et al. (1994) had localized a cysteine protease gene in this area; such proteases are implicated in bone resorption and remodeling. Gelb and colleagues (1996) demonstrated that the lysosomal enzyme Cathepsin K is indeed the defect in this osteochondrodysplasia.
Variegate porphyria (176200). The gene for protoporphyrinogen oxidase, the primary enzymatic abnormality in this AD disorder, was localized to 1q23 using FISH. Variegate porphyria was similarly mapped to the same area using linkage analysis (Roberts et al., 1995).
Arrhythmogenic right ventricular cardiomyopathy (ARVD2) (107970). This ventricular cardiomyopathy is characterized by an effort-induced tachycardia and was mapped near the alpha-actinin2 locus using linkage analysis (Rampazzo et al., 1995).
Chediak-Higashi syndrome (214500). This autosomal recessive disorder, characterized by oculocutaneous albinism and immunologic deficiency, was mapped to 1q42.1-42.2 by two groups (Barrat et al., 1996; Fukai et al., 1996). Both focused on chromosome 1 based on the hypothesis that the beige phenotype in mice, located on a chromosome 1 homologous region on mouse chromosome 13, is the same gene defect. Barrat et al. (1996) have established a YAC contig covering the region.
Chromosome 1 Abnormalities in Human Neoplasia
(Prepared by Garrett Brodeur)
Structural abnormalities of chromosome 1 have been identified in a variety of solid tumors and hematopoietic malignancies (Dracopoli et al., 1994; Schwab et al., 1996; Weith et al., 1996). Furthermore, there is functional evidence that transfer of chromosome 1, or subchromosomal fragments, can affect the tumorigenicity of certain types of cancer (Bader et al., 1991; Horikawa et al., 1995; Tanaka et al., 1993). Finally, there is evidence that escape from senescence in somatic cell hybrids maps to two distinct regions of 1q (Karlsson et al., 1996). Taken together, these data suggest that several important genes contributing to malignant transformation, progression or predisposition map to this chromosome.
This report will review visible abnormalities of chromosome 1, categorized as deletions, translocations or duplications (such as trisomy or gene amplification), that have been reported in particular cancers, with emphasis on new findings reported since the Second Chromosome 1 Workshop (Weith et al., 1996). The same criteria for inclusion that were used in previous Workshop reports (Dracopoli et al., 1994; Weith et al., 1996) were also used in this report. The reports of chromosome 1 abnormalities in cancer will be reviewed by disease type, with references given for new or previously uncited reports. However, the tables and figures will show the cumulative list of abnormalities according to type of rearrangement.
Deletions of chromosome 1 have been found in a variety of malignant diseases, and presumably these deletions represent loss of one or more suppressor genes from this region. Deletions were found more commonly in solid tumors (rather than leukemias), and most involved the short arm (1p), rather than the long arm (1q). Indeed, distal 1p (1p35-36) was the most common site of deletion, although several other discrete chromosomal regions on 1p and 1q are deleted in specific malignancies. This suggests there are several suppressor loci on chromosome 1, but no unequivocal suppressor genes on this chromosome have been cloned to date. The findings for deletions of chromosome 1 in human cancers are summarized in Table 11, and the regions of deletion are shown diagrammatically in Figure 3.
Translocations have been reported primarily in leukemias or myeloproliferative disorders (MDS), although an increasing number of reports involve solid tumors as well. In most cases that have been studied in detail, translocations result in the juxtaposition of regions of two genes, resulting in "oncogene activation". This may occur by deregulation of a gene on one of the partner chromosomes, or more commonly by the formation of a chimeric gene with enhanced or altered activity. The findings for translocations of chromosome 1 in human cancers are summarized in Table 12, and the locations of the breakpoints are shown diagrammatically in Figure 4.
Duplication of a chromosomal region can be accomplished by trisomy of the whole chromosome, a chromosomal arm, or a portion of the chromosome. For example, trisomy 1q is one of the most common changes in human neoplasia. This abnormality is seen in leukemias as well as solid tumors. Presumably three or more copies of a gene or genes in this region provide a selective advantage to cancer cells. Amplification of DNA from chromosome 1 may be seen cytogenetically as extrachromosomal double minutes (DMs), or as a chromosomally integrated homogeneously staining region (HSR). However, the cytogenetic origin of the amplified DNA cannot be discerned without a Southern or in situ hybridization procedure. The findings for trisomy or amplification of a region of chromosome 1 in human cancers are summarized in Table 13, and the regions involved in these rearrangements are shown diagrammatically in Figure 5.
Chromosome 1 Abnormalities in Specific Cancers
Brain tumors. Allelic loss on 1p is seen in several types of brain tumors, including gliomas, medulloblastomas and meningiomas (Bello et al., 1995a; Kraus et al., 1996b; Sulman et al., 1997). Deletion in gliomas occurred on 1p, particularly in oligodendrogliomas (Bello et al., 1995a; Bello et al., 1995b; Hashimoto et al., 1995). Loss of heterozygosity of 1p31-32 (between D1S1591 and D1S1596) was found in 35% of meningiomas (Bello et al., 1995a; Sulman et al., 1997). Finally, allelic loss in medulloblastoma occurs on 1q (between D1S1604 and D1S237) in 36% of the cases (Kraus et al., 1996b).
Breast cancer. Breast cancers have deletions that appear to cluster to 4 discrete regions of chromosome 1, suggesting multiple suppressor genes are involved in this disease as well (Weith et al., 1996). A portion of 1p31.1 that is frequently deleted has been cloned in yeast artificial chromosomes (YACs) (Hoggard et al., 1995). Deletion of 1p occurs as a frequent alteration in ductal carcinoma in situ of the breast, suggesting it may be an early change in the development of breast cancer (Munn et al., 1995). Deletion of 1p32-pter has been associated with amplification of the MYC gene in breast cancers (Bieche et al., 1994). Duplications of regions of chromosome 1 have also been reported recently in breast cancer. These involve either most of the short arm of chromosome 1, or the very distal portion of 1q (1q41-q44) (Weith et al., 1996).
Colorectal cancer. Three different regions on the distal short arm of chromosome 1 (1p34.2-pter) have been defined recently (Praml et al., 1995; Weith et al., 1996). There is controversy over the frequency and timing of 1p deletions in colorectal carcinomas, but the frequency may depend upon the number of probes used for detection. One study found 1p loss of heterozygosity (LOH) frequently in colorectal polyps, suggesting it is an early event (Lothe et al., 1995), but this point is controversial (Schwab et al., 1996). As previously mentioned, suppression of tumorigenicity in colon carcinoma cells by introduction of the 1p36 region has been reported (Tanaka et al., 1993).
Endometrial cancer. Frequent deletion of 1p sequences was found in endometrial cancers. The region of consistent loss was localized to 1p32-p33 between D1S211 and D1S190, and was more common in the highly aggressive papillary serous type (Arlt et al., 1996).
Ewing sarcoma. There is a t(1;16)(q11;q11) in a subset of Ewing sarcomas (also known as peripheral primitive neuroectodermal tumors or pPNETs) (Weith et al., 1996), whereas the characteristic finding is t(11;22). However, the genes involved in this t(1;16) translocation have not yet been cloned.
Gastric cancer. Allelic loss of 1p was observed in 12 of 26 gastric adenocarcinomas (Ezaki et al., 1996). The deletion has been mapped to 1p34-p35, between D1S57 and D1S62. However, this finding needs to be confirmed.
Germ cell tumors. Allelic loss of the maternal allele at 1p36 has been identified in pediatric germ cell tumors (Weith et al., 1996), whereas four different sites of allelic loss have been found in male germ cell tumors in adults (Mathew et al., 1994). The regions of most frequent LOH include: 1p13.3 (D1S73-26%), 1p22 (D1S16-38%) and 1p31-32 (D1S17-33%).
Hepatocellular cancer. Allelic loss of 1p appears to be the most frequent genetic abnormality in hepatic tumors (hepatoblastomas and hepatocellular carcinomas). The distal short arm (1p35-p36) shows loss or rearrangement most consistently, but other regions of 1p are deleted as well (Chen et al., 1996; Kraus et al., 1996a; Weith et al., 1996; Yeh et al., 1994).
Leukemias, lymphomas and MDS. The t(1;19) translocation is characteristic of pre-B acute lymphoblastic leukemia (ALL), and this results in the formation of a chimeric gene between PBX1 on chromosome 1 and E2A on chromosome 19 (Weith et al., 1996). Several translocations have been identified in T-cell ALL involving fusion of the TAL1 gene (or LCK), usually with members of the T-cell receptor (TCR) gene family (Table 12; Figure 4). The t(1;22) translocation is found in almost all cases of the M7 or megakaryoblastic variant of acute myelogenous leukemia (AML) (Weith et al., 1996). Two different translocations are found in myeloid leukemias involving chromosomes 1 and 11. The first involves the AF1P gene at 1p, and the second involves the AF1Q gene at 1q. Both rearrangements involve the MLL1 gene at 11q23. Finally, there are two different translocations involving chromosome 1 in AML or MDS for which the translocation breakpoints and putative fusion partner genes have not yet been identified (Table 12; Figure 4).
Lung cancer. Chromosome 1 abnormalities have been described in small cell and non-small cell lung cancer, but their frequency, location and significance are not clear due to a lack of detailed molecular studies. Amplification of the MYCL gene at 1p32 is a common change in small cell lung cancer, but it has not been reported in other types of cancer (Weith et al., 1996). In some cases that have been studied in detail, an intrachromosomal rearrangement has been identified between the MYCL gene and another gene called RFL that is about 500 kb upstream of MYCL.
Melanoma. Evidence for at least two discrete regions of allelic loss has been reported in melanomas: one at 1p36 and another at 1p22-31 (Weith et al., 1996). Indeed, predisposition to melanoma with dysplastic nevi has been mapped by linkage analysis to 1p36, but other sites of melanoma predisposition have been identified on chromosome 9.
Merkel cell tumor. Deletions or rearrangements involving 1p36 appear to be the most frequent type of abnormality in Merkel cell tumors, but relatively few cases have been studied (Weith et al., 1996).
Mesothelioma. Deletion of the short arm of chromosome 1, especially 1p21-p22, is a frequent finding in mesotheliomas (Lee et al., 1996; Weith et al., 1996). The consensus region of deletion is from D1S435 to D1S236. Gain of the long arm of chromosome 1 is also seen (Bjorkqvist et al., 1997).
Neuroblastoma. The largest number of reports concern the region of deletion in distal 1p in neuroblastoma. The region of 1p that was consistently deleted in neuroblastoma involved subbands of 1p36 (1p36.2-36.3) (White et al., 1995), but some tumors had much larger deletions. These larger deletions, particularly those associated with MYCN amplification, suggest that there may be a more proximal region containing another suppressor gene at 1p35-p36.1 (Weith et al., 1996). Two neuroblastoma patients have been identified with constitutional rearrangements of 1p36 (Weith et al., 1996), suggesting a predisposition gene might reside in this region. However, linkage analysis in families with hereditary neuroblastoma has failed to show linkage to this chromosomal region (Maris et al., 1996).
Ovarian cancer. Two regions of frequent deletion of 1p were found in primary ovarian cancers: one at 1p35-p36 (between D1S199 and D1S470) and the other at 1p31-32 (between D1S201 and MYCL) (Futrcal et al., 1995).
Parathyroid adenoma. Frequent loss of 1p has been identified in 40% of parathyroid adenomas (Cryns et al., 1995). Most showed loss from 1p32-1pter, but the region has not been more precisely defined.
Pheochromocytoma and medullary thyroid cancer (MTC). The deletions in pheochromocytomas and MTCs usually involved almost all of 1p, suggesting that more than one locus may be affected (Weith et al., 1996). Alternatively, a rearrangement in proximal 1p may be involved.
Prostate cancer. A genome-wide search has identified a major susceptibility locus for prostate cancer on chromosome 1 (Smith et al., 1996). The locus has been mapped by linkage analysis to the long arm of chromosome 1 (1q24-q25). However, no candidate gene has been identified yet.
Renal cell cancer. A t(X;1)(p11;q21) translocation has been described as a cytogenetically defined subtype of papillary renal cell carcinoma (Weith et al., 1996). Recently, the translocation breakpoint has been cloned, and the translocation results in a fusion between the novel PRCC gene on chromosome 1 and the TFE3 transcription factor on the X chromosome (Sidhar et al., 1996). Deletion of the long arm of chromosome 1 has been demonstrated in renal collecting duct carcinoma, and the region of consistent deletion has been localized to subbands of 1q32 (Steiner et al., 1996).
Rhabdomyosarcoma and other sarcomas. Similarly, there is a consistent t(1;13) translocation in alveolar rhabdomyosarcoma, which is found in 10-20% of this histologic subtype; the majority of cases have a t(2;13) translocation. This t(1;13) translocation results in a fusion gene that involves the juxtaposition of PAX7 on chromosome 1 with FKHR on chromosome 13 (Davis et al., 1994). Interestingly, there is recent evidence that the region involving PAX7 and FKHR is frequently amplified in alveolar rhabdomyosarcoma. In addition to this locus, there are recent reports of amplification of a region of 1q (1q21-22) in soft tissue and bone sarcomas (Weith et al., 1996). This may appear as DMs, HSR or a ring chromosome.
Wilms tumor. An individual with Wilms tumor has been described with features suggestive of predisposition as well as a t(1;7)(q42;p15) constitutional translocation (Reynolds et al., 1996). The breakpoint has been spanned in YACs, but a putative predisposition gene has not yet been identified on either chromosome, and confirmatory evidence for predisposition at this locus is needed.
Systematic Mapping and Sequence Analysis
(Compiled by Richard Wooster, Simon Gregory and Mark Vaudin)
In the second half of 1996 the Sanger Centre embarked on the characterization of chromosome 1. Our aim is to construct physical clone maps covering chromosome 1 and use these to determine the DNA sequence of the whole chromosome. The strategy we are following involves establishing a high density framework map (on the order of 15 to 20 sequence tagged sites (STSs) per Megabase across the whole chromosome) using radiation hybrid (RH) mapping (Figure 6). The STSs are then used to identify large insert genomic clones, mainly P1-derived and bacterial artificial chromosome (PAC and BAC) clones. These are analyzed by restriction fingerprinting and STS content mapping. These data are used to generate genomic clone contigs. Any gaps between the contigs are being closed by directed walking experiments. A minimum tiling path of clones is submitted for sequencing. The sequence is annotated and submitted to the public databases (EMBL and GenBank). The whole process is set out in Figure 1, which shows how data can be integrated from other sources, at what stages, and how the data are made available. We hope to be able to provide data that is both timely and relevant to the chromosome 1 community.
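The choice of a minimum tiling path can be pictured as an interval-cover problem. The sketch below is only an illustration of that idea, assuming each clone in a contig has already been assigned start and end coordinates in kb; in practice positions are inferred from fingerprint and STS content data, and some overlap between neighbouring clones is required. The clone names and coordinates are hypothetical.

```python
def minimal_tiling_path(clones):
    """Pick the fewest clones that cover a contig from end to end.

    clones: list of (name, start, end) tuples with positions along the contig.
    Returns the chosen clone names in map order, or raises ValueError if the
    clones leave a gap.
    """
    if not clones:
        return []
    # Sort by start position; for equal starts, consider the longest clone first.
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    target_end = max(c[2] for c in ordered)
    covered = ordered[0][1]  # left edge of the contig
    path = []
    i = 0
    while covered < target_end:
        best = None
        # Among clones that start inside the covered region, keep the one
        # that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError("gap in contig at position %s" % covered)
        path.append(best[0])
        covered = best[2]
    return path


if __name__ == "__main__":
    # Hypothetical clones (name, start kb, end kb).
    contig = [("A", 0, 120), ("B", 50, 180), ("C", 100, 230),
              ("D", 170, 300), ("E", 220, 340)]
    print(minimal_tiling_path(contig))
```

The greedy choice of the furthest-reaching clone at each step yields a smallest covering set for intervals on a line, which is why redundant clones such as B and D in the example are skipped.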
RH mapping
4272 established markers were imported by anonymous ftp from the Genome Data Base, Généthon, the Whitehead Institute, the Cooperative Human Linkage Centre and Unigene in order to facilitate the integration of mapping data. To supplement these STSs we generated a further 1990 random chromosome 1 STSs as follows. Human chromosome 1 DNA was isolated using a fluorescence activated cell sorter. The DNA was digested to completion with Hind III and subcloned into the plasmid Bluescript. Plasmid DNA was isolated from 6336 unique clones for sequencing. High quality sequences were submitted for PCR primer design. Sequences were rejected for primer design if they were not long enough, contained repetitive motifs or matched existing human sequence. Of the 2487 sets of STS primers that were designed, 80% successfully mapped back to chromosome 1 (10% did not map back to chromosome 1 and 10% failed to amplify human DNA). Primer pairs for all STSs were tested and then used to screen the Généthon/Cambridge (GB4) panel of 85 radiation hybrids. Two point analysis of the data was used to define the intervals across the chromosome. Totally linked (i.e. inseparable) markers were subsequently masked from further analysis and the RH map was constructed using maximum likelihood and stepwise locus ordering algorithms through the RHMAPPER program. So far 2771 STSs have been successfully typed; these represent 1446 ESTs and 1325 anonymous STSs. The present map has a framework of 84 STSs and a length of 805 cR. 1754 STSs have been successfully placed on the map with a LOD score of 2.5 or more. The chromosome 1 RH map is available via the Sanger Centre's Web pages. Further, any STS specific for chromosome 1 can be integrated into the RH map: the sequence of STSs and primers can be sent to the Sanger Centre and, after typing, they will be placed on the RH map.
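For readers unfamiliar with two point RH analysis, the following minimal sketch shows one textbook way of scoring a pair of markers from their retention patterns across a hybrid panel. It assumes the standard equal-retention model, in which each hybrid retains a fragment with probability r and a break occurs between the two markers with probability theta, and it finds theta by a simple grid search. It is not the RHMAPPER implementation, and the function name and example vectors are hypothetical.

```python
import math


def two_point_rh(a, b, steps=1000):
    """Two-point RH analysis for two markers typed on the same hybrids.

    a, b: retention vectors (1 = retained, 0 = not retained); any other
    value is treated as untyped and that hybrid is skipped.
    Returns (theta_hat, lod, distance_in_centirays).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x in (0, 1) and y in (0, 1)]
    n = len(pairs)
    n11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    n00 = sum(1 for x, y in pairs if x == 0 and y == 0)
    ndis = n - n11 - n00  # hybrids retaining exactly one of the two markers
    if n == 0:
        return 1.0, 0.0, float("inf")
    r = (2 * n11 + ndis) / (2.0 * n)  # average retention frequency
    if r <= 0.0 or r >= 1.0:
        return 1.0, 0.0, float("inf")  # no information about linkage

    def loglik(theta):
        p11 = r * (1.0 - theta * (1.0 - r))
        p00 = (1.0 - r) * (1.0 - theta * r)
        pdis = theta * r * (1.0 - r)  # probability of each discordant pattern
        total = 0.0
        for count, p in ((n11, p11), (n00, p00), (ndis, pdis)):
            if count:
                if p <= 0.0:
                    return float("-inf")
                total += count * math.log10(p)
        return total

    grid = [i / steps for i in range(steps + 1)]
    theta_hat = max(grid, key=loglik)
    lod = loglik(theta_hat) - loglik(1.0)  # theta = 1 means unlinked
    if theta_hat >= 1.0:
        distance = float("inf")
    else:
        distance = -100.0 * math.log(1.0 - theta_hat)  # centiRays
    return theta_hat, lod, distance


if __name__ == "__main__":
    m1 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    m2 = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
    print(two_point_rh(m1, m2))
```

Markers with identical retention patterns give theta of zero, which corresponds to the totally linked markers that are masked before map construction.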
Bacterial Clone Identification
In the initial stage of bacterial clone map construction, markers from 1p35 to the p telomere and 1q24 were used to screen a whole genome PAC library of approximately 7 genome equivalents (from Pieter de Jong). During this work the latter half of this PAC library became available, giving a total of approximately 15 genome equivalents. This section of the library is reported to contain clones with larger inserts and fewer non-recombinants and is being used for most of the screening at present. The strategy for PAC identification is as follows. Up to 25 STSs are radiolabeled before being incubated with total human DNA to suppress repetitive sequences and pooled together. The pools are hybridized to high-density grids of the PAC library. Positive clones are picked from the library into fresh microtiter plates. The STS content of the PACs is determined either by a second round of hybridization (where the STSs are hybridized one by one to the positive clones arrayed on a single filter) or by a colony PCR assay. All positive clones are also submitted for restriction enzyme fingerprinting. To date 1598 STSs have been screened against the PAC library. These have identified over 9,000 positive PACs. The screening of STSs from the remainder of the p arm is in progress and will be followed by STSs for the rest of the q arm. Most chromosome 1 specific STSs can be used to screen the PAC library. The sequence of STSs and their primers can be sent to the Sanger Centre. Where possible, the STSs will be hybridized to the PAC library and the positive clones passed on for fingerprinting. All of the hybridization data is available via the Sanger Centre Web pages or in ACeDB format from the Sanger Centre's ftp site.
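The two-round screen described above can be sketched as simple bookkeeping: a pooled first round identifies candidate clones, and a second round assigns individual STSs to them. The example below is a hedged illustration only; the function names, clone names and STS names are hypothetical, and the pool size of 25 is taken from the text as an upper bound.

```python
import math


def pools_needed(n_sts, pool_size=25):
    """Number of pooled hybridizations if every pool holds up to pool_size STSs."""
    return math.ceil(n_sts / pool_size)


def sts_content(pool_positives, second_round_hits):
    """Assign STSs to clones picked in the pooled screen.

    pool_positives: set of clone names picked from the high-density grids.
    second_round_hits: dict mapping each STS to the set of clones it hit
    when tested one by one (hybridization or colony PCR).
    Returns (confirmed, unconfirmed): a dict clone -> sorted STS list, and a
    sorted list of picked clones that no single STS confirmed.
    """
    content = {clone: [] for clone in pool_positives}
    for sts, clones in second_round_hits.items():
        for clone in clones:
            if clone in content:  # ignore clones never picked in round one
                content[clone].append(sts)
    confirmed = {c: sorted(s) for c, s in content.items() if s}
    unconfirmed = sorted(c for c, s in content.items() if not s)
    return confirmed, unconfirmed


if __name__ == "__main__":
    print(pools_needed(1598))
    picked = {"dJ1", "dJ2", "dJ3"}  # hypothetical clone names
    hits = {"stsA": {"dJ1", "dJ2"}, "stsB": {"dJ2", "dJ9"}}
    print(sts_content(picked, hits))
```

Clones that are positive in the pooled round but unconfirmed in the second round are worth flagging, since they may reflect residual repeat hybridization rather than true STS content.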
Restriction Fingerprinting
We have implemented a fluorescent fingerprinting technique (based on the method used in the C. elegans sequencing project by Coulson and Sulston). Fluorescent dye-terminators are incorporated onto the sticky end of Hind III digested clone DNA before recutting the Hind III fragments with Sau3AI. Labeled fragments are separated on a polyacrylamide gel and the data collected using an automated ABI 377 sequencer. Editing and data entry are performed using Image3 (a UNIX based image editing program) prior to analysis of the fingerprint data in our clone manipulation program FPC. Clones are overlapped on the basis of statistically significant probability matches and formed into contigs. A minimally overlapping set (or tiling path) is chosen for sequencing where clones have assembled into a deep contig and clone integrity is verified by good STS content data. All clones that are chosen for sequencing are routinely subjected to fluorescent in situ hybridization to check chimerism and to confirm the location on the cytogenetic map. Our evolving chromosome 1 bacterial clone database currently contains 1725 PACs, 859 of which are contained within 143 contigs, yielding an estimated coverage of 18 Mb. As of April 1997, 55 clones have been sent for sequencing. The fluorescent fingerprinting method enables us to integrate any type of bacterial clone into our whole chromosome database. This includes PACs, BACs, cosmids and fosmids. All of the contigs that contain clones being sequenced can be viewed on the Sanger Centre Web pages and in ACeDB format from the Sanger Centre's ftp site.
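The idea of a "statistically significant probability match" between two fingerprints can be illustrated with a binomial coincidence score of the kind commonly associated with Sulston-style fingerprint comparison. The sketch below is an assumption-laden illustration and not the FPC code: the band-matching rule, the tolerance and the gel length are all hypothetical values chosen for the example.

```python
from math import comb


def count_shared_bands(bands_a, bands_b, tolerance):
    """Count bands that pair up within the given tolerance (two-pointer scan)."""
    a, b = sorted(bands_a), sorted(bands_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def coincidence_score(n_low, n_high, shared, tolerance, gel_length):
    """Probability of seeing at least `shared` matching bands by chance.

    n_low, n_high: band counts of the clone with fewer and with more bands.
    tolerance, gel_length: match window and usable gel range, same units.
    Smaller scores indicate a more convincing overlap.
    """
    # Chance that one band of the smaller clone matches some band of the
    # larger clone purely by coincidence.
    p_match = 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** n_high
    return sum(
        comb(n_low, m) * p_match ** m * (1.0 - p_match) ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


if __name__ == "__main__":
    clone_1 = [100, 200, 300, 410, 520]  # hypothetical band positions
    clone_2 = [101, 250, 299, 411, 600, 700]
    shared = count_shared_bands(clone_1, clone_2, tolerance=2)
    print(shared, coincidence_score(len(clone_1), len(clone_2), shared, 2, 3000))
```

Two clones would be declared overlapping when the score falls below a chosen cutoff; the cutoff trades false joins against missed overlaps and has to be tuned to the data.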
Sequencing
The sequencing strategy is an initial random shotgun phase, in which 1.2-1.8 kb fragments are sequenced in M13, followed by directed finishing. A recent modification is being introduced to incorporate 25% or more of similar-sized fragments cloned in pUC. DNA prepared from the subclones is sequenced using 50% dye primer and 50% dye terminator chemistry. The use of Thermosequenase and energy transfer (ET) primers has increased the consistency of good quality reads, with an average read length of 350-400 base pairs (after vector and quality clipping) obtained in a 3.5-hour run on an ABI 377 sequencer. Inclusion of terminator sequencing in the shotgun has significantly reduced the number of sequencing compression problems. Using a UNIX table editor, data are entered into the laboratory database built on the ACeDB client/server model. Clone details, subclone information and other laboratory data are tracked by this mechanism. The raw sequence data are processed using the program ASP, which delivers quality statistics back to the system. Sequence readings are base called with PHRED and assembled with PHRAP (Phil Green, Seattle). Automated ordering of directed reads is performed with the program FINISH. A robotic system has been developed to automate selection, rearraying and execution of these reads. The project is considered finished when contiguous, double-stranded sequence with all problems resolved to a final accuracy of 99.99% is achieved.
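For readers who wish to relate the 99.99% accuracy target to base-calling quality values, the following minimal sketch applies the standard Phred scale, Q = -10 log10(P), where P is the per-base error probability. How the finished accuracy is actually assessed is not described above, so the conversion is offered only as an illustration; the function names are ours.

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred quality value Q."""
    return 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred quality value corresponding to a per-base accuracy."""
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(length_bp: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a sequence of given length."""
    return length_bp * (1.0 - accuracy)


if __name__ == "__main__":
    target = 0.9999  # finishing accuracy quoted in the text
    print(f"Phred equivalent of target: Q{accuracy_to_phred(target):.0f}")
    # 100 kb is an illustrative clone size, not a figure from the text.
    print(f"Expected errors in 100 kb: {expected_errors(100_000, target):.0f}")
```

On this scale the finishing target corresponds to a quality value of about 40, that is, no more than about one error per 10,000 bases.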
Analysis of finished sequence is partially automated. All finished sequences are searched against various protein and nucleic acid sequence databases. A variety of gene/exon prediction packages are used and the collated results are entered into an ACeDB database, from which they can be displayed graphically, allowing interactive gene building and other annotation.
PAC clones from chromosome 1 entered the sequencing pipeline in January 1997. To date, 33 PACs from 1q24-25 and 22 PACs from 1p35-1pter have been submitted for sequencing. Twelve clones (1.2 Mb) from 1q24-25 have entered shotgun. Eight of these are in assembly and one is finished. PAC clones dJ24M15 and dJ518E13 combine to form a 200 kb sequence contig. Analysis has revealed the entire sequence of the previously identified Tenascin R gene. Figure 7 shows the annotation of this gene in ACeDB. The alignment of the exons with the predicted exons and EST hits is clearly shown, along with the distribution of various repeat families. It is anticipated that all of the sequence produced at the Sanger Centre will be annotated in a similar fashion.
Data release
The Sanger Centre policy on genomic sequence is for early and open release into the public domain of all genomic sequence data, including pre-release of data before the final finishing stage. Furthermore, the RH mapping data, the fingerprinting data and all of the PAC identification results are made available to the chromosome 1 community. These data will include any unpublished information that is provided to us by other investigators. The information can be obtained in ACeDB format from ftp://ftp.sanger.ac.uk/pub/human/chr1/rawdata/, which is updated on a weekly basis. It is also available on an interactive basis at http://www.sanger.ac.uk/HGP/Chr1; this version is updated daily. All of the DNA sequence is available either from ftp://ftp.sanger.ac.uk/pub/human/sequences/Chr_1/ or from the public databases EMBL and GenBank.
Acknowledgement
This work was supported in part by NIDR/NIH grant P50-D309170 and by a grant from the Muscular Dystrophy Association (MDA).