Patterns & Motifs
Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes
Searching for sequence similarity by comparing physico-chemical properties of cDNA
DILIMOT is a server for finding short (3-8 amino acids), over-represented peptide patterns, or linear motifs, in a set of proteins.
PoSSuMsearch: Fast and Sensitive Matching of Position Specific Scoring Matrices Using Enhanced Suffix Arrays. In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs. We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of such matrices in large databases.
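To make "matching a PSSM" concrete, here is a minimal brute-force sketch: it builds a log2-odds matrix from a toy alignment (the sites, pseudocount of 0.5 and uniform background are made-up assumptions) and slides it along a sequence, reporting windows whose summed score reaches a threshold. This is the naive scan that ESAsearch is designed to speed up with enhanced suffix arrays; it is not ESAsearch itself, and it assumes uppercase ACGT input only.

```python
import math

# Hypothetical toy alignment of binding sites, for illustration only.
SITES = ["TATAAT", "TATAAT", "TACAAT", "TATACT", "GATAAT"]
ALPHABET = "ACGT"

def build_pssm(sites, pseudocount=0.5, background=None):
    """Return a log2-odds PSSM: one {base: score} dict per motif column."""
    background = background or {b: 0.25 for b in ALPHABET}
    pssm = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2((counts[b] / total) / background[b]) for b in ALPHABET})
    return pssm

def scan(sequence, pssm, threshold):
    """Slide the matrix along the sequence; keep (position, score) >= threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(pssm[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits

print(scan("GGTATAATCC", build_pssm(SITES), threshold=5.0))
```

The only hit is the TATAAT window at position 2 (score about 9.04). The scan costs one matrix evaluation per window, which is exactly the cost that index-based methods try to avoid on large databases.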
The Gibbs Motif Sampler identifies motifs (conserved regions) in DNA or protein sequences.
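The core idea behind Gibbs motif sampling can be sketched as follows, assuming exactly one motif occurrence per sequence, a fixed motif width, a uniform background and a pseudocount of 1 (all simplifying assumptions). On each iteration one sequence is held out, a profile is built from the sites in the remaining sequences, and a new site for the held-out sequence is drawn with probability proportional to its motif/background likelihood ratio. This is an illustration of the technique, not the Gibbs Motif Sampler's implementation, which adds refinements this sketch omits.

```python
import random

ALPHABET = "ACGT"

def profile_from(seqs, starts, width, skip, pseudocount=1.0):
    """Column base frequencies of the current sites, leaving out sequence `skip`."""
    profile = []
    for col in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for k, (seq, s) in enumerate(zip(seqs, starts)):
            if k != skip:
                counts[seq[s + col]] += 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in ALPHABET})
    return profile

def gibbs_sample(seqs, width, iterations=500, seed=0):
    """One-occurrence-per-sequence site sampler; returns a start index per sequence.
    Assumes every sequence is ACGT-only and at least `width` long."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        k = rng.randrange(len(seqs))  # sequence whose site is resampled
        profile = profile_from(seqs, starts, width, skip=k)
        seq = seqs[k]
        # Likelihood ratio (profile vs. uniform background) of each candidate window.
        weights = []
        for i in range(len(seq) - width + 1):
            ratio = 1.0
            for j in range(width):
                ratio *= profile[j][seq[i + j]] / 0.25
            weights.append(ratio)
        starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

seqs = ["ATTGACGTCAAT", "GCCGTGACGTTA", "TGACGTACCGAT", "CCATTTGACGTG"]
starts = gibbs_sample(seqs, width=6, iterations=200, seed=1)
print([s[i:i + 6] for s, i in zip(seqs, starts)])
```

Because sampling is stochastic, real samplers run many iterations and restarts and keep the highest-scoring alignment; a fixed `seed` here only makes a single run reproducible.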
Allows selection from a variety of motif databases.
Over-represented Transcription Factor Binding Site Prediction Tool (OTFBS)
PhyloGibbs is an algorithm for discovering regulatory sites in a collection of DNA sequences, including multiple alignments of orthologous sequences from related organisms. Many existing approaches search either for sequence motifs that are overrepresented in the input data or for sequence segments that are more evolutionarily conserved than expected. PhyloGibbs combines these two approaches and identifies significant sequence motifs by taking both over-representation and conservation signals into account.
The SCOPE motif finder is designed to identify candidate regulatory DNA motifs from sets of genes that are coordinately regulated. Behind the scenes, SCOPE uses an ensemble of three programs to identify different kinds of motifs: BEAM identifies nondegenerate motifs (e.g. ACGTGC), PRISM identifies degenerate motifs (e.g. AWCGRYH), and SPACER identifies bipartite motifs (e.g. ACCNNNNNNNNNGTT). All parameters are set automatically to find the optimal motif length and degree of degeneracy of the reported motifs.
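The three motif notations above (plain, IUPAC-degenerate, and bipartite with N spacers) can all be searched by translating each IUPAC code into a regular-expression character class. The sketch below does that and reports overlapping forward-strand matches. It only illustrates what the example motifs mean; it is not how BEAM, PRISM or SPACER search or score motifs.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate an IUPAC motif string into an equivalent regex."""
    return "".join(IUPAC[c] for c in motif.upper())

def find_motif(sequence, motif):
    """Start positions of all (possibly overlapping) forward-strand matches."""
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

print(find_motif("TTACGTGCAA", "ACGTGC"))                      # nondegenerate
print(find_motif("GAACGACTA", "AWCGRYH"))                      # degenerate
print(find_motif("GACC" + "T" * 9 + "GTT", "ACCNNNNNNNNNGTT"))  # bipartite
```

The zero-width lookahead `(?=...)` is what lets overlapping matches be reported; a plain `finditer` would skip them.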
Improbizer searches DNA or RNA sequences for motifs that occur with a frequency too improbable to be explained by chance, using a variation of the expectation maximization (EM) algorithm.
The main question R'MES addresses is: does this motif occur in that biological sequence with the expected frequency? In other words, could we observe it this many times, or this few times, just by chance? When the answer is no, the motif is a candidate for having a particular biological meaning, but only a candidate: statistical significance is not equivalent to biological significance.
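To see what "expected frequency" means, here is a sketch of the standard expected-count estimator for a word of length h under a Markov chain of order h-2 fitted to the sequence itself: E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1]), where N counts overlapping occurrences. The `naive_score` function is an assumption added only for illustration; R'MES uses proper variance estimates and approximations, so do not read this as its statistic.

```python
import math
from collections import Counter

def word_counts(seq, k):
    """Overlapping occurrence counts of every k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count(seq, word):
    """Expected count of `word` (length h >= 3) under the order-(h-2) Markov model:
    E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1])."""
    h = len(word)
    if h < 3:
        raise ValueError("word must have at least 3 letters")
    long_counts = word_counts(seq, h - 1)
    short_counts = word_counts(seq, h - 2)
    denominator = short_counts[word[1:-1]]
    if denominator == 0:
        return 0.0
    return long_counts[word[:-1]] * long_counts[word[1:]] / denominator

def naive_score(seq, word):
    """(observed - expected) / sqrt(expected); a rough deviation, NOT R'MES's statistic.
    If expected is 0 the word cannot occur either, so the score is 0."""
    observed = word_counts(seq, len(word))[word]
    expected = expected_count(seq, word)
    return (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0

print(expected_count("ACGACGACGT", "ACG"), naive_score("ACGACGACGT", "ACG"))
```

Conditioning on the (h-1)-letter sub-words means a word is only flagged when its count is surprising given the counts of its own parts. This is why a model of this order is a stringent baseline.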
ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif. We have used ELPH to find patterns such as ribosome binding sites (RBSs) and exon splicing enhancers (ESEs).
MEME is a tool for discovering motifs in a group of related DNA or protein sequences. MAST is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.
Tools for MOtif Discovery in nucleotide sequences
A Suite of Web-based Programs to Search for Regulatory Motifs in Prokaryotes and Eukaryotes
YMF is a program that detects statistically overrepresented words (motifs) in DNA sequences. The user may specify the characteristics of the motifs to be detected. A motif here is a short string of nucleotides, degenerate symbols, and spacers.
Motif discovery in data sets that include both intraspecies overrepresentation and interspecies conservation.
This tool locates sites (short DNA sequences) within the human, mouse, rat, fruit fly and nematode (C. elegans) genomes.
The Regulatory Sequence Analysis Tools (RSAT) is an online suite for analyzing regulatory sequences. The site offers a series of tools dedicated to the detection of regulatory signals in non-coding sequences. The only input required is a list of genes of interest (e.g. a family of co-regulated genes). From this list, you can retrieve the upstream sequences over a desired distance, discover putative regulatory signals, search for matches to these signals in your original dataset or in whole genomes, and display the results graphically as a feature map.
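Retrieving "upstream sequences over a desired distance" depends on the gene's strand. For a minus-strand gene, upstream lies at higher chromosome coordinates and must be reverse-complemented. The sketch below shows that logic. The 0-based TSS coordinate, the `upstream` helper and the toy chromosome are hypothetical; RSAT's own retrieval works from its genome annotations and handles details this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def upstream(chrom_seq, tss, strand, length):
    """`length` bases immediately upstream of a 0-based TSS, read 5'->3' on the
    gene's own strand; clipped at chromosome ends (hypothetical helper)."""
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    if strand == "-":
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")

chrom = "AAAACCCCGGGGTTTT"
print(upstream(chrom, 8, "+", 4))   # bases 4..7 on the forward strand
print(upstream(chrom, 7, "-", 4))   # bases 8..11, reverse-complemented
```

Returning every region 5'->3' on the gene's strand means all sequences share one orientation relative to the gene. Motif discovery tools usually assume this.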
jPREdictor was designed to predict cis-regulatory elements for which short motifs (protein binding sites on the DNA) are known. Given these motifs, it searches sequences for them, weights each motif using a positive (model) and a negative (background) training set, and scores a sequence.
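The "weight, then score" idea can be sketched with log-odds weights. Each motif gets w = ln(frequency in the model set / frequency in the background set), with frequency measured as occurrences per base plus a pseudocount, and each sliding window is scored by summing the weights of the motif occurrences it contains. The frequency definition, pseudocount, window and step sizes are all assumptions for illustration; jPREdictor's exact weighting scheme and options are not reproduced here.

```python
import math
import re

def occurrences(seq, motif):
    """Number of overlapping exact occurrences of motif in seq."""
    return len(re.findall("(?=" + re.escape(motif) + ")", seq))

def train_weights(motifs, model_seqs, background_seqs, pseudocount=1.0):
    """Log-odds weight per motif: ln(model frequency / background frequency)."""
    model_len = sum(len(s) for s in model_seqs)
    bg_len = sum(len(s) for s in background_seqs)
    weights = {}
    for m in motifs:
        f_model = (sum(occurrences(s, m) for s in model_seqs) + pseudocount) / model_len
        f_bg = (sum(occurrences(s, m) for s in background_seqs) + pseudocount) / bg_len
        weights[m] = math.log(f_model / f_bg)
    return weights

def score_windows(seq, weights, window=500, step=100):
    """Sum motif weights inside sliding windows; returns [(start, score), ...]."""
    scores = []
    for start in range(0, max(1, len(seq) - window + 1), step):
        chunk = seq[start:start + window]
        scores.append((start, sum(w * occurrences(chunk, m) for m, w in weights.items())))
    return scores

w = train_weights(["GAGA", "TTTT"], model_seqs=["GAGAGAGA"], background_seqs=["TTTTTTTT"])
print(w)
print(score_windows("GAGAGA", w, window=6, step=1))
```

Motifs enriched in the model set get positive weights and those enriched in the background get negative ones, so a high window score reflects a local cluster of model-like motifs.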
TRRD is a unique information resource that accumulates data on the structural and functional organization of transcription regulatory regions of eukaryotic genes. Only experimentally confirmed information is included in TRRD.
Clover is a program for identifying functional sites in DNA sequences. If you give it a set of DNA sequences that share a common function, it will compare them to a library of sequence motifs (e.g. transcription factor binding patterns) and identify which, if any, of the motifs are statistically overrepresented in the sequence set.
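A common generic way to ask whether a motif is over-represented in a sequence set is a one-sided hypergeometric test on presence/absence counts. Of all target plus background sequences, how likely is it that at least as many motif-containing sequences would land in the target set by chance? This is a stand-in for illustration only; Clover uses its own scoring and significance procedure, which is not reproduced here. `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total

def motif_enrichment(target_hits, target_size, background_hits, background_size):
    """One-sided p-value that a motif is present in at least `target_hits` target
    sequences, if target and background were drawn from one shared pool."""
    population = target_size + background_size
    successes = target_hits + background_hits
    return hypergeom_sf(target_hits, population, successes, target_size)

# Hypothetical counts: motif found in 8/10 target and 10/90 background sequences.
print(motif_enrichment(8, 10, 10, 90))
```

Presence/absence testing ignores how many copies each sequence carries and how well each site matches. Scoring-based methods such as Clover's are designed to capture that information.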
The Motif Analysis Workbench is a WWW interface for automated comprehensive analyses of promoter regulatory motifs and the effect they exert on mRNA expression profiles. The server provides a wide spectrum of analysis tools that allow de novo discovery of regulatory motifs, along with refinement and in-depth investigation of fully or partially characterized motifs.
SeqVISTA: A Graphical Tool for Sequence Feature Visualization and Comparison. SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet.
Cis-acting regulatory elements in plants
TOUCAN is a workbench for regulatory sequence analysis on metazoan genomes: comparative genomics, detection of significant transcription factor binding sites, and detection of cis-regulatory modules (combinations of binding sites) in sets of coexpressed/coregulated genes.
SwissRegulon is a repository of databases and bioinformatic tools related to transcription regulatory processes; it includes SwissRegulon, PhyloGibbs, MARA and TCS.
TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs.
The composite regulatory signature database (CRSD) is a web server for investigating complex regulatory behaviors involving gene expression signatures (GESs), microRNA regulatory signatures (MRSs) and TF regulatory signatures (TRSs). Six well-known, large-scale databases were integrated to provide comprehensive analyses in CRSD: the human UniGene (1), mature microRNA (2,3), putative promoter (4), TRANSFAC (5), pathway, and Gene Ontology (6) databases. Two new genome-wide databases, the MRS and TRS databases, were also constructed and integrated into CRSD.
INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow for normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval, and detection of known and unknown regulatory elements.
REDUCE is an acronym that stands for Regulatory Element Detection Using Correlation with Expression, not coincidentally also the title of our paper. Based on a simple model for transcriptional regulation by independently acting transcription factors, REDUCE makes it possible to find regulatory elements based on a single microarray experiment.
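Assuming the "independently acting transcription factors" model means that each gene's log expression ratio is a constant plus a contribution proportional to its motif count (A_g = C + F * N_g), one step of the regression can be sketched as follows. For each candidate motif, fit C and F by least squares and measure the fractional drop in the sum of squared residuals (chi-square). The motif with the largest drop explains the most expression variance. The data are made up. REDUCE's full procedure, which iterates over many motifs and assesses significance, is not reproduced here.

```python
def fit_single_motif(log_ratios, motif_counts):
    """Least-squares fit of A_g = C + F * N_g; returns (C, F, fractional chi^2 drop)."""
    n = len(log_ratios)
    mean_a = sum(log_ratios) / n
    mean_n = sum(motif_counts) / n
    s_nn = sum((x - mean_n) ** 2 for x in motif_counts)
    s_na = sum((x - mean_n) * (a - mean_a) for x, a in zip(motif_counts, log_ratios))
    chi2_before = sum((a - mean_a) ** 2 for a in log_ratios)
    if s_nn == 0 or chi2_before == 0:
        return mean_a, 0.0, 0.0  # motif counts (or expression) carry no signal
    f = s_na / s_nn
    c = mean_a - f * mean_n
    chi2_after = sum((a - c - f * x) ** 2 for x, a in zip(motif_counts, log_ratios))
    return c, f, 1 - chi2_after / chi2_before

def best_motif(log_ratios, counts_by_motif):
    """Motif whose counts explain the largest fraction of expression variance."""
    return max(counts_by_motif,
               key=lambda m: fit_single_motif(log_ratios, counts_by_motif[m])[2])

# Hypothetical single-array data: 4 genes, counts of two candidate motifs.
ratios = [0.0, 1.0, 2.0, 3.0]
counts = {"AAAA": [0, 1, 2, 3], "CCCC": [1, 0, 1, 0]}
print(best_motif(ratios, counts), fit_single_motif(ratios, counts["AAAA"]))
```

Because the fit uses only one expression value per gene, a single microarray experiment suffices, which matches the claim above.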
Sequence Motif Databases
The cisRED database holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations.
The UCSC ENCODE Project portal contains information related to the ENCODE project at NHGRI. The University of California Santa Cruz (UCSC) manages the official repository of sequence-related data for the ENCODE Consortium and supports the coordination of data submission, storage, retrieval, and visualization.
The Open REGulatory ANNOtation database (ORegAnno) is an open database for the curation of known regulatory elements from scientific literature.
RegTransBase consists of two modules: a database of regulatory interactions based on literature and an expertly curated database of transcription factor binding sites.
CENSOR is a software tool which screens query sequences against a reference collection of repeats and "censors" (masks) homologous portions with masking symbols, as well as generating a report classifying all found repeats.
The REPuter program family provides state-of-the-art software to compute and visualize repeats in whole genomes or chromosomes. REPuter computes all maximal duplications (forward repeats) as well as reverse, complemented and reverse-complemented repeats in a DNA input sequence.
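The four repeat types can be illustrated with a brute-force sketch. Index every length-k window, then for each window look up its forward, reversed, complemented and reverse-complemented forms, and report pairs of distinct start positions (i, j) with i < j. Unlike REPuter, this finds fixed-length exact repeats only, not maximal or approximate ones, and uses a hash index rather than a suffix tree; `k` and the example sequences are assumptions.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def kmer_index(seq, k):
    """Map each length-k word to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def repeat_pairs(seq, k):
    """Exact length-k repeats of the four types, as sorted (i, j) pairs, i < j."""
    index = kmer_index(seq, k)
    transforms = {
        "forward": lambda w: w,
        "reverse": lambda w: w[::-1],
        "complement": lambda w: w.translate(COMPLEMENT),
        "reverse_complement": lambda w: w.translate(COMPLEMENT)[::-1],
    }
    result = {}
    for name, transform in transforms.items():
        pairs = set()
        for i in range(len(seq) - k + 1):
            for j in index.get(transform(seq[i:i + k]), []):
                if i < j:
                    pairs.add((i, j))
        result[name] = sorted(pairs)
    return result

print(repeat_pairs("ACGTTACGT", 4)["forward"])
print(repeat_pairs("GATTACATGTAATC", 7)["reverse_complement"])
```

Each relation is symmetric (if window j is the reverse complement of window i, the converse also holds), so keeping only i < j lists each pair once.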
Imperfect Microsatellite Extractor (IMEx) is a tool for extracting perfect as well as imperfect microsatellites, also called Simple Sequence Repeats (SSRs) or Short Tandem Repeats (STRs), from genome sequences. IMEx is an efficient, fast and user-friendly program whose search criteria the user can customize.
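The basic microsatellite search can be sketched for perfect repeats only. For each unit length from 1 to 6, extend a unit as long as the next copy matches exactly, and keep runs with at least `min_repeats` copies. Units that are themselves repeats (e.g. `AT` doubled into `ATAT`) are skipped so each microsatellite is reported with its shortest unit. The unit range and threshold are assumptions, and imperfect repeats, which IMEx also handles, are deliberately out of scope.

```python
def is_primitive(unit):
    """True unless unit is a repeat of a shorter unit ('AT' -> True, 'ATAT' -> False)."""
    return unit not in (unit + unit)[1:-1]

def find_ssrs(seq, max_unit=6, min_repeats=3):
    """Perfect tandem repeats as sorted (start, unit, copies) tuples."""
    seq = seq.upper()
    hits = []
    for u in range(1, max_unit + 1):
        i = 0
        while i + u <= len(seq):
            unit = seq[i:i + u]
            j = i + u
            while seq[j:j + u] == unit:  # extend while the next copy matches exactly
                j += u
            copies = (j - i) // u
            if copies >= min_repeats and is_primitive(unit):
                hits.append((i, unit, copies))
                i = j  # resume scanning after this repeat
            else:
                i += 1
    return sorted(hits)

print(find_ssrs("GCAAAAAAGCATATATATGC"))
```

The `(unit + unit)[1:-1]` check is a standard trick. A string occurs inside its own doubling (minus the ends) exactly when it is periodic with a shorter period.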
Spectral Repeat Finder (SRF) is a program to find repeats through an analysis of the power spectrum of a given DNA sequence.
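The idea behind spectral repeat detection can be sketched as follows. Map the sequence to four 0/1 indicator signals (one per base), take the discrete Fourier transform of each, and sum the squared magnitudes. A tandem repeat with period p in a sequence of length N shows up as a peak at frequency index k = N/p. The sketch uses a naive O(N^2) DFT from the standard library for clarity, so real use would call an FFT. It does not reproduce SRF's windowing or significance assessment.

```python
import cmath

def power_spectrum(seq):
    """Sum over A, C, G, T of |DFT of the base's 0/1 indicator sequence|^2."""
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        x = [1.0 if c == base else 0.0 for c in seq]
        for k in range(n):
            coef = sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            spectrum[k] += abs(coef) ** 2
    return spectrum

def dominant_period(seq):
    """Period N/k at the strongest non-zero frequency k in 1..N/2."""
    spectrum = power_spectrum(seq)
    n = len(seq)
    k_best = max(range(1, n // 2 + 1), key=lambda k: spectrum[k])
    return n / k_best

print(dominant_period("ACG" * 10))
```

For `"ACG" * 10` (N = 30) all power outside k = 0 sits at multiples of k = 10, so the reported period is 30 / 10 = 3.0.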
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, together with a modified version of the query sequence in which all annotated repeats have been masked (by default, replaced by Ns). On average, almost 50% of a human genomic DNA sequence is masked by the program.
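Producing the masked sequence from a repeat annotation is simple once the repeat coordinates are known. The sketch below hard-masks (replaces with N) or soft-masks (lowercases) 0-based half-open intervals and reports the masked fraction. The intervals are hypothetical stand-ins for the coordinates RepeatMasker reports. Identifying the repeats is the hard part and is not attempted here.

```python
def mask(seq, intervals, soft=False):
    """Mask 0-based half-open [start, end) intervals with N (hard) or lowercase (soft)."""
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)

def masked_fraction(masked):
    """Fraction of positions that are N or lowercase."""
    return sum(c == "N" or c.islower() for c in masked) / len(masked)

print(mask("ACGTACGTAC", [(2, 5)]))             # hard-masked
print(mask("ACGTACGTAC", [(2, 5)], soft=True))  # soft-masked
```

Soft-masking keeps the original bases available while still flagging repeats, which some downstream tools prefer over Ns.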
QGRS Mapper is a software program that generates information on the composition and distribution of putative Quadruplex forming G-Rich Sequences (QGRS) in nucleotide sequences. It is also designed to handle the analysis of mammalian pre-mRNA sequences, including those that are alternatively processed (alternatively spliced or alternatively polyadenylated). QGRS Mapper is based on published algorithms for recognition and mapping of putative QGRS.
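A putative quadruplex consists of four runs of guanines separated by loops. The sketch below uses a widely used simplified pattern (runs of at least 3 Gs, loops of 1 to 7 nucleotides) as an assumption. It is not QGRS Mapper's own recognition rules, which allow other G-group and loop settings and assign a G-score, and it scans only the given strand.

```python
import re

# Simplified putative G-quadruplex: >= 4 runs of G{3,} separated by 1-7 nt loops.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")

def find_g4(seq):
    """(start, matched sequence) for non-overlapping putative quadruplexes."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]

print(find_g4("TTGGGAGGGAGGGAGGGTT"))
print(find_g4("GGGAGGGAGGG"))  # only three G-runs
```

To cover the opposite strand, the same search can be run on the reverse complement (equivalently, C-runs on the given strand).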
UTRscan looks for UTR functional elements by searching through user submitted sequence data for the patterns defined in the UTRsite collection.
UTRsite is a collection of functional sequence patterns located in 5' or 3' UTR sequences.
Detection of polyadenylation signals in human DNA sequences.
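The simplest first step toward polyadenylation signal detection is locating the canonical AATAAA hexamer and its most common variant ATTAAA. The sketch below does only that. Hexamer occurrences alone are a weak predictor, so dedicated predictors combine them with other sequence features. The default signal list is an assumption and can be extended.

```python
import re

# AATAAA is the canonical poly(A) signal; ATTAAA is its most common variant.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")

def find_polya_signals(seq, signals=POLYA_SIGNALS):
    """Sorted (position, hexamer) for every overlapping occurrence of a signal."""
    seq = seq.upper()
    hits = []
    for hexamer in signals:
        hits.extend((m.start(), hexamer) for m in re.finditer("(?=" + hexamer + ")", seq))
    return sorted(hits)

print(find_polya_signals("ccAATAAAttATTAAAgg"))
```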
PolyA_DB is a web resource for pre-mRNA cleavage and polyadenylation sites (polyA sites). PolyA_DB version 1 contains human and mouse polyA sites mapped by cDNA/EST sequences. PolyA_DB version 2 contains polyA sites in human, mouse, rat, chicken and zebrafish that are mapped by cDNA/EST and Trace sequences. PolyA_SVM predicts polyA sites using 15 cis elements identified for human polyA sites.