Mutagenesis, Vol. 17, No. 5, 361-364,
September 2002
© 2002 UK Environmental Mutagen Society/Oxford University Press
The discovery and confirmation of single nucleotide polymorphisms in the human p53R2 gene by EST database analysis
Centre for Molecular Genetics and Toxicology, School of Biological Sciences, University of Wales Swansea, Singleton Park, Swansea SA2 8PP, UK
| Abstract |
|---|
The human expressed sequence tag (EST) database provides a wealth of resources, which can be used to rapidly screen for potential polymorphisms in proteins of physiological interest. The human p53R2 gene, a recently identified ribonucleotide reductase, plays an important role in DNA repair and is involved in the pathway of p53 activity in response to the presence of DNA damage. On the basis of the alignment of human EST sequences, we identified three candidate polymorphisms at nt 2752, 2759 and 4696 in the 3'-untranslated region of the p53R2 gene. The presence of these polymorphisms was confirmed in a Caucasian population (n = 82) by allele-specific PCR and PCR/restriction fragment length polymorphism analyses. The rare allele frequency at position 4696 (15.5%) is higher than that at position 2752 or 2759 (6 and 6%, respectively). Our results suggest that the human EST data may serve as a valuable source for the rapid identification of genetic variation.
| Introduction |
|---|
The human expressed sequence tag (EST) database consists of >3 700 000 entries of partial cDNA sequences. These sequences have been generated from many different tissues and are derived from a range of individuals. ESTs can reflect part or all of the transcribed sequence of a gene, which includes the coding sequence as well as the 5'- and 3'-untranslated regions (UTRs). Currently, the EST database is accessible online from the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/dbEST/). Because ESTs have been collected from many different sources, the EST database provides investigators with overlapping sequences of the same region, thus potentially allowing the identification of new single nucleotide polymorphisms (SNPs). Over the past few years other investigators have taken advantage of bioinformatic searching strategies to identify many SNPs (Garg et al., 1999).
p53R2 is a recently identified ribonucleotide reductase that catalyses the conversion of ribonucleoside diphosphates to their corresponding deoxyribonucleotides to provide precursors for DNA synthesis (Tanaka et al., 2000) and it is also part of the p53 pathway. The p53R2 gene is one of the genes that function in p53-induced DNA repair. In response to various levels of genotoxic stress, inhibition of p53R2 expression in cells with an intact p53-dependent DNA damage checkpoint has been shown to result in reduced levels of ribonucleotide reductase activity, DNA repair and cell survival (Tanaka et al., 2000). Thus, p53R2 plays an important role in the repair of DNA damage.
Since p53R2 plays a crucial role in DNA repair and is involved in the pathway of p53 activity in response to the presence of DNA damage, the discovery and understanding of any genetic variation in the p53R2 gene present in the human population would be valuable. To our knowledge, two studies from the same laboratory have identified three polymorphisms in the p53R2 5'-UTR (Smeds et al., 2001a,b). In this study we searched the EST database directly for previously unidentified genetic polymorphisms in the p53R2 gene, and identified and experimentally confirmed three SNPs in the 3'-UTR of the gene.
| Materials and methods |
|---|
Materials
Oligonucleotides were synthesized using an automated DNA synthesizer (Cruachem, Glasgow, UK). Restriction enzymes and their reaction buffers were obtained from Promega Corp. (Southampton, UK) (HindIII) and New England BioLabs (Hertfordshire, UK) (Tsp509I). Agarose gel was obtained from Cambrex (Rockland, USA).
EST database screening
The EST database is available at the NCBI website (http://www.ncbi.nlm.nih.gov/dbEST/). Database screening was performed using gapped BLAST programs (Ulrich et al., 2000), which are obtainable from the home page of the NCBI (Altschul et al., 1997).
Primer design and restriction enzyme selection
Primers were designed for the candidate polymorphisms using the PRIMER program (obtained from the Whitehead Genome Center), which was used to evaluate primer melting temperature, annealing temperature and the likelihood of oligonucleotide self-priming. Restriction enzymes were selected for the polymorphic sites using the CUTTER 2.0 program (http://www.firstmarket.com/cutter/cut2.html), which can identify all the restriction enzyme sites in any given cDNA sequence.
Confirmation of p53R2 candidate polymorphisms in a Caucasian population
The presence of human p53R2 candidate polymorphisms was verified in a Caucasian population. DNA was extracted from peripheral blood lymphocytes by standard methods (Stratagene, UK). Eighty-two healthy individuals were randomly selected and included in this study (Gao et al., 1998). The average age of the healthy individuals was 43.5 years (range 15–79), with 47 women and 35 men.
The BLAST program was used to search the EST database to identify candidate polymorphisms in the p53R2 gene. No candidate polymorphism was identified in the coding region by multiple sequence alignment. Five p53R2 candidate polymorphisms were identified in the 3'-UTR by searching the EST database, three of which have been experimentally confirmed in our study population (Figure 1). A PCR-based assay was used to verify potential polymorphisms. Primer sequences are presented in Table I. The first polymorphism, an A→C transversion at nucleotide position 2752, was detected by allele-specific PCR with one downstream (R2-3) and two upstream (R2-1 and R2-2) primers, the latter differing in the terminal base (A or C). Each sample was tested in two parallel reactions, with the same downstream primer and one of the upstream primers. For homozygous individuals, amplification takes place in only one of the tubes, the one containing the exactly matching upstream primer; for heterozygotes, it takes place in both tubes. The reactions were performed in a total volume of 50 µl, containing 1.5 mM MgCl2, 10 mM Tris–HCl, pH 8.8, 100 µM each dNTP, 10 pmol each primer, 2.5 U Taq DNA polymerase (Promega Corp.) and 0.5 µg template DNA. Amplification was performed in a PTC-225 Peltier Thermal Cycler (MJ Research) with 5 min of initial denaturation at 94°C, followed by 32 cycles of 94°C for 45 s, 57°C for 30 s and 72°C for 45 s, with a final extension at 72°C for 5 min.
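The two-tube readout of the allele-specific PCR can be written as a simple decision rule. The sketch below is illustrative only: the function name, the sample labels and the gel readings are hypothetical, and the text does not state which of R2-1 and R2-2 carries the A or the C terminal base, so the tubes are labelled by allele rather than by primer.

```python
def call_genotype(amplified_a: bool, amplified_c: bool) -> str:
    """Call the nt 2752 genotype from two parallel allele-specific reactions.

    amplified_a: product seen in the tube with the A-terminal upstream primer
    amplified_c: product seen in the tube with the C-terminal upstream primer
    """
    if amplified_a and amplified_c:
        return "A/C"      # both primers matched: heterozygote
    if amplified_a:
        return "A/A"      # only the A-specific primer matched
    if amplified_c:
        return "C/C"      # only the C-specific primer matched
    return "no call"      # neither tube amplified: failed reaction


# Hypothetical gel readings (not data from this study).
samples = {"S1": (True, False), "S2": (True, True), "S3": (False, True)}
calls = {name: call_genotype(a, c) for name, (a, c) in samples.items()}
print(calls)
```

A sample with no product in either tube is reported as "no call" rather than assigned a genotype, since the absence of both bands cannot be distinguished from a failed amplification.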
The second polymorphism, an A→G transition at nucleotide position 2759, was investigated by the PCR/restriction fragment length polymorphism (PCR–RFLP) method. The A→G transition eliminates a recognition site for the restriction enzyme Tsp509I. The fragment containing the polymorphism was amplified by PCR using primers R2-4 and R2-5 in the same reaction mix as described above. The cycling conditions were 5 min of initial denaturation at 94°C, followed by 32 cycles of 94°C for 25 s, 57°C for 30 s and 72°C for 45 s, with a final extension at 72°C for 5 min. The amplification product was incubated overnight with Tsp509I restriction enzyme at 65°C and digested fragments were visualized on a 2.5% agarose gel with ethidium bromide staining.
The third polymorphism identified is a G→C transversion at nucleotide position 4696, which creates a recognition site for the HindIII restriction enzyme. The fragment containing the polymorphism was amplified by PCR using primers R2-6 and R2-7 in the same reaction mix as described above. The cycling conditions were 5 min of initial denaturation at 94°C, followed by 32 cycles of 94°C for 10 s, 57°C for 20 s and 72°C for 45 s, with a final extension at 72°C for 5 min. The amplification product was digested overnight with HindIII restriction enzyme at 37°C and the digested fragments were analyzed on a 2.5% agarose gel with ethidium bromide staining.
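The logic of the two PCR–RFLP assays, in which one allele carries a restriction site and the other does not, can be illustrated with a short in silico digest. This is a sketch under stated assumptions: the recognition sequences and cut positions are those of Tsp509I (^AATT) and HindIII (A^AGCTT), but the short amplicons below are invented toy sequences, not the p53R2 sequence, and the function name is hypothetical.

```python
ENZYMES = {
    # name: (recognition site, cut offset within the site on the top strand)
    "Tsp509I": ("AATT", 0),    # cuts before the first A: ^AATT
    "HindIII": ("AAGCTT", 1),  # cuts after the first A: A^AGCTT
}


def digest(sequence: str, enzyme: str) -> list:
    """Return top-strand fragment lengths after complete digestion."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:                    # collect every occurrence of the site
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy amplicons (hypothetical, for illustration only).
tsp_a_allele = "GGCCATGAATTCGGACTGC"    # A allele: Tsp509I site present
tsp_g_allele = "GGCCATGAGTTCGGACTGC"    # A->G change removes the site
hind_g_allele = "CCGGTACCAAGGTTGGCATGC"  # G allele: no HindIII site
hind_c_allele = "CCGGTACCAAGCTTGGCATGC"  # G->C change creates the site

print(digest(tsp_a_allele, "Tsp509I"), digest(tsp_g_allele, "Tsp509I"))
print(digest(hind_g_allele, "HindIII"), digest(hind_c_allele, "HindIII"))
```

On a gel, a homozygote shows either the uncut product or the two digestion fragments, whereas a heterozygote shows all three bands.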
After completion of the EST database screening, the potential polymorphisms in the p53R2 gene were checked against the SNP consortium database (http://snp.cshl.org/), the NCBI SNP database (http://www.ncbi.nlm.nih.gov/SNP/index.html) and the Human Genome Variation database, HGVbase (http://hgvbase.cgb.ki.se/) (Fredman et al., 2002). The potential polymorphisms identified in this study had not been reported in the above databases. Once potential polymorphisms in the p53R2 gene have been experimentally confirmed, the newly identified polymorphisms can be submitted to the SNP database.
Statistical analysis
χ² analysis was used to test for agreement with Hardy–Weinberg equilibrium in our study population and for linkage disequilibrium between the two polymorphic sites. All P values shown are two-sided and P < 0.05 was judged statistically significant.
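As an illustration of the Hardy–Weinberg test, the sketch below computes the rare allele frequency, the expected genotype counts and the χ² statistic with one degree of freedom for a biallelic site. The genotype counts in the example are hypothetical placeholders (chosen only to sum to 82) and are not the values of Table II; the function name is likewise an assumption.

```python
import math


def hardy_weinberg_chi2(n_common_hom: int, n_het: int, n_rare_hom: int):
    """Return (rare allele frequency, chi-square statistic, P value)."""
    n = n_common_hom + n_het + n_rare_hom
    q = (2 * n_rare_hom + n_het) / (2 * n)   # rare allele frequency
    p = 1.0 - q                              # common allele frequency
    observed = (n_common_hom, n_het, n_rare_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 degree of
    # freedom (3 genotype classes - 1 - 1 estimated allele frequency).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return q, chi2, p_value


# Hypothetical genotype counts for 82 individuals (not data from this study).
q, chi2, p_value = hardy_weinberg_chi2(60, 20, 2)
print(f"rare allele frequency = {q:.3f}, chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

When the expected count of the rare homozygote is small, as is likely for an allele frequency of about 6% in 82 individuals, the χ² approximation becomes unreliable and an exact test may be preferable.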
| Results |
|---|
EST database screening
When the complete cDNA sequence of p53R2 (GenBank accession no. AB036063) was used as the query sequence, we identified 100 matching human EST sequences in the human EST database using a BLAST search (Altschul et al., 1997).
On the basis of this BLAST search, a number of sequence variants were observed. Since ESTs are usually generated by single pass automated sequencing, errors are quite common. Given that sequencing errors are random, the number of candidate polymorphic sites can be substantially reduced by keeping only those that show the same base substitution in more than one EST sequence. Figure 2 shows the alignment of human EST sequences obtained with the p53R2 cDNA sequence. In this alignment, variations at nt 2752, 2759 and 4696 in the 3'-UTR are frequent. The sequence variations at positions 2752 and 2759 occurred in eight and nine of 14 aligned ESTs, respectively, while the variation at position 4696 occurred in 16 of 17 aligned ESTs. No other position in the region presented variations at these frequencies. Moreover, a review of these aligned EST sequences showed that they were derived from different tissue sources and research communities. These observations support the possibility that the three identified variations are due to allelic variation rather than random sequencing errors.
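The filtering step described above, keeping only substitutions seen in more than one EST, can be sketched as follows. This is a minimal illustration, not the procedure actually run on the BLAST output: it assumes gap-free alignments given as (1-based start position, sequence) pairs, and the function name, reference and ESTs are invented toy data.

```python
from collections import Counter


def candidate_snps(reference: str, aligned_ests: list, min_support: int = 2) -> dict:
    """Return {position: {variant base: number of supporting ESTs}}.

    aligned_ests holds (start, sequence) pairs, where start is the 1-based
    reference position of the first EST base (gap-free alignment assumed).
    Only substitutions supported by at least min_support ESTs are kept.
    """
    support = {}
    for start, est in aligned_ests:
        for i, base in enumerate(est.upper()):
            pos = start + i
            if pos > len(reference):
                break                         # EST runs past the reference
            ref_base = reference[pos - 1].upper()
            if base in "ACGT" and base != ref_base:
                support.setdefault(pos, Counter())[base] += 1
    return {
        pos: {b: n for b, n in counts.items() if n >= min_support}
        for pos, counts in support.items()
        if any(n >= min_support for n in counts.values())
    }


# Toy data (hypothetical): two ESTs share a G->C change at position 3,
# while a single EST carries an isolated T->A change at position 8.
reference = "ACGTACGTAC"
ests = [(1, "ACGTACGTAC"), (1, "ACCTACGTAC"), (2, "CCTACGTAC"), (3, "GTACGAAC")]
print(candidate_snps(reference, ests))
```

The singleton change at position 8 is discarded as a probable sequencing error, whereas the change at position 3, seen in two independent ESTs, is retained as a candidate.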
Identification of three new p53R2 candidate polymorphisms in a Caucasian population
We used a PCR-based assay to confirm the existence of three candidate polymorphisms in the 3'-UTR of the p53R2 gene in a Caucasian population. The results of the PCR-based assay are shown in Figure 3. The A→C transversion at position 2752 was detected by the presence or absence of a 270 bp fragment after allele-specific PCR. The A→G transition at position 2759 was characterized by the presence or absence of a Tsp509I restriction site in the 249 bp PCR product after PCR–RFLP analysis. Similarly, the G→C transversion at position 4696 was characterized by the presence or absence of a HindIII restriction site in the 228 bp PCR product after PCR–RFLP analysis.
The genotype frequencies among 82 Caucasian individuals are shown in Table II. χ² analysis of these alleles indicated that there is significant linkage disequilibrium between these two loci (P < 0.05).
| Discussion |
|---|
Genetic polymorphisms in the human population have been studied in order to gain insight into their influence on the activity of specific genes involved in disease susceptibility. Finding previously unknown polymorphisms has often relied on the detection of a related phenotype or the chance sequencing of variant cDNA (Seidegard et al., 1988).
The use of the human ESTs database to identify candidate polymorphisms has advantages that can be exploited to facilitate the development of highly dense genetic maps for the analysis of a human population. One of the main advantages of this approach is that it is undoubtedly rapid and cost-effective, which allows investigators to increase the pace of their research.
Although this approach has obvious advantages, there are some disadvantages and limitations that should be considered. EST sequences are usually generated by single pass automated sequencing, thus sequencing errors are common (2%) (Hillier et al., 1996). Many of the sequences deposited in the EST database contain errors that produce false positives in the search for polymorphisms, so a quality control measure is required to eliminate them. The 'more than one EST' rule was employed in this study (Brett et al., 2000; Ulrich et al., 2000), i.e. a candidate polymorphism is considered for further analysis only when more than one EST shows the same potential polymorphic change.
The EST database contains a number of sequences from multiple sources. It provides an excellent resource to search for polymorphisms in widely expressed genes. However, this approach may be limited by the number of available ESTs, as the database contains mainly data for widely expressed genes. Another limitation of this approach is that ESTs are normally sequenced from the 5'- and 3'-ends and there are relatively few sequences spanning the central region of complete cDNAs. Thus, polymorphisms in these areas may be discovered less frequently than those in the 5'- and 3'-UTRs (Figure 1). The 3'-end of mRNA is a non-coding region and, although it is not translated into protein, it contains sequence information that determines and maintains mRNA stability. Since p53R2 is a recently identified gene, the role of the 3'-UTR in this gene, including any binding sites or functional sites it may contain, is still not clear. However, it is conceivable that polymorphic changes in this region may have an impact on mRNA turnover. Differences in mRNA turnover can modify the steady-state levels of a given mRNA and thus determine protein expression levels.
In summary, the human EST database has been demonstrated to be a valuable tool to search for potential polymorphisms by using different sequence alignment strategies. We have used this bioinformatic searching strategy to identify three new SNPs in the 3'-UTR of the human p53R2 gene. The presence of the polymorphisms has been confirmed in a Caucasian population (n = 82). All three new polymorphisms of the p53R2 gene reported here have been deposited in HGVbase (SNP001025927–SNP001025929).
| Acknowledgments |
|---|
We thank Professor Julian M. Hopkin and Dr Pei-Song Gao for kindly providing the healthy samples. During the period of the study Zheng Ye was supported by an ORS award and a PhD studentship provided by Phillip Morris Products SA, Switzerland.
| Notes |
|---|
1 To whom correspondence should be addressed. Tel: +44 1792 205678; Fax: +44 1792 295447; Email: bazheye{at}swansea.ac.uk
| Reference |
|---|
Ali-Osman,F., Akande,O., Antoun,G., Mao,J.X. and Buolamwini,J. (1997) Molecular cloning, characterization and expression in Escherichia coli of full-length cDNAs of three human glutathione S-transferase Pi gene variants. Evidence for differential catalytic activity of the encoded proteins. J. Biol. Chem., 272, 10004–10012.
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Blackburn,A.C., Tzeng,H., Anders,M.W. and Board,P.G. (2000) Discovery of a functional polymorphism in human glutathione transferase zeta by expressed sequence tag database analysis. Pharmacogenetics, 10, 49–57.
Board,P.G., Chelvanayagam,G., Jermin,L.S., Tetlow,N., Tzeng,H., Anders,M.W. and Blackburn,A. (2001) Identification of novel glutathione transferase and polymorphic variation by expressed sequence tag database analysis. Drug Metab. Dispos., 29, 544–547.
Brett,D., Lehmann,G., Hanke,J., Gross,S., Reich,J. and Bork,P. (2000) EST analysis online: WWW tools for detection of SNPs and alternative splice forms. Trends Genet., 16, 416–418.
Fredman,D., Siegfried,M., Yuan,Y.P., Bork,P., Lehvaslaiho,H. and Brookes,A.J. (2002) HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res., 30, 387–391.
Gao,P.S., Mao,X.Q., Kawai,M., Enomoto,T., Sasaki,S., Tanabe,O., Yoshimura,K., Shaldon,S.R., Dake,Y., Kitano,H., Coull,P., Shirakawa,T. and Hopkin,J.M. (1998) Negative association between asthma and variants of CC16(CC10) on chromosome 11q13 in British and Japanese populations. Hum. Genet., 103, 57–59.
Garg,K., Green,P. and Nickerson,D.A. (1999) Identification of candidate coding region single nucleotide polymorphisms in 165 human genes using assembled Expressed Sequence Tags. Genome Res., 9, 1087–1092.
Hillier,L., Lennon,G., Becker,M., Bonaldo,M.F., Chiapelli,B., Chissoe,S., Dietrich,N., DuBuque,T., Favello,A., Gish,W., Hawkins,M., Hultman,M., Kucaba,T., Lacy,M., Le,M., Le,N., Mardis,E., Moore,B., Morris,M., Parsons,J., Prange,C., Rifkin,L., Rohlfing,T., Schellenberg,K. and Marra,M. (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res., 6, 807–828.
Picoult-Newberg,L., Ideker,T.E., Pohl,M.G., Taylor,S.L., Donaldson,M.A., Nickerson,D.A. and Boyce-Jacino,M. (1999) Mining SNPs from EST databases. Genome Res., 9, 167–174.
Seidegard,J., Vorachek,W.R., Pero,R.W. and Pearson,W.R. (1988) Hereditary differences in the expression of the human glutathione transferase active on trans-stilbene oxide are due to a gene deletion. Proc. Natl Acad. Sci. USA, 85, 7293–7297.
Smeds,J., Kumar,R. and Hemminki,K. (2001a) Polymorphic insertion of additional repeat within an area of direct 8 bp tandem repeats in the 5'-untranslated region of the p53R2 gene and cancer risk. Mutagenesis, 16, 547–550.
Smeds,J., Nava,M., Kumar,R. and Hemminki,K. (2001b) A novel polymorphism (88 C→A) in the 5' UTR of the p53R2 gene. Hum. Mutat., 17, 82.
Tanaka,H., Arakawa,H., Yamaguchi,T., Shiraishi,K., Fukuda,S., Matsui,K., Takei,Y. and Nakamura,Y. (2000) A ribonucleotide reductase gene involved in a p53-dependent cell-cycle checkpoint for DNA damage. Nature, 404, 42–49.
Ulrich,C.M., Bigler,J., Velicer,C.M., Greene,E.A., Farin,F.M. and Potter,J.D. (2000) Searching Expressed Sequence Tag databases: discovery and confirmation of a common polymorphism in the Thymidylate Synthase gene. Cancer Epidemiol. Biomarkers Prev., 9, 1381–1385.
Received on January 14, 2002; accepted on April 11, 2002.