Abstract
Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large-scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs), both to identify full-length protein-encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. Using a suffix-array-based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig (TC) and 4,801 singleton sequences. Using both BLASTX and FASTY comparisons to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full-length proteins, and these have been matched to publicly available IMAGE clones where such clones exist. Each sequence has been compared to the KOG database, and approximately 67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. The results of the analysis have been stored in a publicly available database, XenDB (http://bibiserv.techfak.uni-bielefeld.de/xendb/). A unique capability of the database is the ability to batch-upload cross-species queries to identify potential Xenopus homologues and their associated full-length clones. Examples are provided, including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate results from various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches.
Figure 3. Identification of chimeric TCs. Matches of at least 100 bp in length were mapped back to the TC sequences to identify the regions that are covered by a match (yellow boxes). If two matches overlap, the region is extended accordingly. If two clearly separated regions remain after the mapping, as shown here, the TC is flagged as a potential chimera.
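The flagging rule in the Figure 3 caption can be read as an interval-merging problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the half-open coordinate convention, the choice to merge abutting matches, and the `min_gap` parameter (a stand-in for "clearly separated", which the caption does not quantify) are all hypothetical. Only the 100 bp minimum match length comes from the caption.

```python
from typing import List, Tuple

# Assumed convention: (start, end) on the TC, 0-based, end exclusive.
Interval = Tuple[int, int]

MIN_MATCH_LEN = 100  # from the caption: matches of at least 100 bp


def merge_regions(matches: List[Interval],
                  min_len: int = MIN_MATCH_LEN) -> List[Interval]:
    """Merge overlapping matches of at least min_len bp into covered regions."""
    kept = sorted(m for m in matches if m[1] - m[0] >= min_len)
    regions: List[Interval] = []
    for start, end in kept:
        if regions and start <= regions[-1][1]:
            # Overlaps (or abuts, by assumption) the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def is_potential_chimera(matches: List[Interval], min_gap: int = 1) -> bool:
    """Flag a TC when two covered regions are separated by >= min_gap bp.

    min_gap is a hypothetical threshold; the caption only says
    "clearly separated".
    """
    regions = merge_regions(matches)
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(regions, regions[1:])]
    return any(gap >= min_gap for gap in gaps)
```

For example, `is_potential_chimera([(0, 150), (100, 300), (500, 700)])` merges the first two matches into one region and flags the TC because a second, separate region remains.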
Figure 1. Full-length clone selection (top) and TC categories (bottom). ESTs derived from different clones were clustered and assembled. The CAP3 contig was compared to protein databases using BLASTX and FASTY, and hits were sorted into four classes. Class 1 hits had to match the whole protein sequence, start with an ATG in the TC and an M in the protein, and end at a STOP codon. Class 2 hits had to match the whole protein sequence and start with an ATG in the TC and an M in the protein. Class 3 hits had to match the full protein sequence (without further restrictions). Class 4 hits had to cover the protein over almost its full length, with the match allowed to start at most ten amino acids after the start of the protein or end at most ten amino acids before its end. Predicted 5' TCs (P5P) had to have enough sequence to fill in the missing 5' end of the protein sequence. Clone selection: Clones A and B were discarded because they lack an IMAGE id. Clone 54321 does not span the 5' end of the protein match. Clone 21345 was selected as the most 5' clone fulfilling the requirements.
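The class definitions in the Figure 1 caption can be expressed as a short decision cascade. This is an illustrative sketch only: the `Hit` record, its field names, and the `classify` function are hypothetical and do not reflect the actual XenDB pipeline. The P5P test in particular is an assumption, reading "enough sequence" as "at least as many in-frame TC codons upstream of the alignment as there are unmatched residues at the protein's amino terminus".

```python
from dataclasses import dataclass

MAX_END_SLACK = 10  # from the caption: at most ten amino acids at either end


@dataclass
class Hit:
    """Hypothetical summary of one TC-versus-protein alignment."""
    prot_start: int          # first aligned protein residue, 1-based
    prot_end: int            # last aligned protein residue, 1-based inclusive
    prot_len: int            # length of the matched protein
    starts_atg_and_m: bool   # alignment starts at ATG in the TC and M in the protein
    ends_at_stop: bool       # alignment ends at a stop codon in the TC
    upstream_codons: int = 0 # in-frame TC codons 5' of the alignment start


def classify(hit: Hit) -> str:
    """Assign a hit to class 1-4, P5P, or 'unclassified'."""
    full = hit.prot_start == 1 and hit.prot_end == hit.prot_len
    if full and hit.starts_atg_and_m and hit.ends_at_stop:
        return "class 1"
    if full and hit.starts_atg_and_m:
        return "class 2"
    if full:
        return "class 3"
    start_slack = hit.prot_start - 1          # unmatched residues at the N-terminus
    end_slack = hit.prot_len - hit.prot_end   # unmatched residues at the C-terminus
    if start_slack <= MAX_END_SLACK and end_slack <= MAX_END_SLACK:
        return "class 4"
    # Assumption: P5P only requires enough upstream TC sequence to cover
    # the unmatched 5' end; the caption states no further condition.
    if hit.upstream_codons >= start_slack:
        return "P5P"
    return "unclassified"
```

Ordering the checks from strictest to loosest means each hit receives the most stringent class it satisfies, which matches the nesting of the definitions in the caption.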
Figure 2. Comparison of a BLASTX alignment with the corresponding full-length FASTY alignment, as generated by the Genlight system. Blue boxes in (a) indicate open reading frames; green boxes indicate start codons and red boxes stop codons. The assembled TC sequence has a frameshift at position 1150 from frame 1 to frame 3, generating two distinct HSPs in the BLASTX alignment (b). FASTY corrects for this frameshift and generates a full-length alignment (c).
Figure 4. Two examples of TCs derived from clones predicted to have a full-length insert (P5P). The start positions in the hit suggest that the unmatched amino-terminal protein sequence is not well conserved between X. laevis and the matched organisms, here rabbit (top) and human (bottom), but the open reading frames (blue boxes) indicate that the clones from which the sequences were derived do contain a full-length insert. (Screenshots of the results were generated by the Genlight system.)
Figure 5. Cluster view of the XenDB Web interface. The best FASTY hits to the NR protein database, five model organisms, and Xenopus proteins are shown at the top. Gene Ontology (GO) annotations are based on the best human and mouse IPI hits, and functional categories on hits to the COG and KOG databases. Below, additional information for each EST in the cluster is shown, such as accession, UniGene and TGI id, clone, cell type, and tissue type. Clones predicted not to be full length are colored red. Links to the CAP3 assembly and the TC sequence are provided.