Patterns & Motifs
This tool locates sites (short DNA sequences) within the human, mouse, rat, fruit fly, and nematode (C. elegans) genomes.
http://zlab.bu.edu/site2genome/
The main question R'MES addresses is "does this motif occur in that biological sequence with an expected frequency?" In other words, can we observe it so many times, or so few times, just by chance? Usually, when the answer is no, such a motif is a candidate to have a particular biological meaning; only a candidate: statistical significance is not equivalent to biological significance.
http://migale.jouy.inra.fr/outils/mig/rmes
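As a rough illustration of the question above, the following sketch compares the observed count of a word with the count expected if letters were drawn independently at the frequencies seen in the sequence. This order-0 model is a simplifying assumption made here for brevity; it is not the statistical model R'MES itself uses, and the function names are hypothetical.

```python
from collections import Counter


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of word in seq, allowing overlapping matches."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count_order0(seq: str, word: str) -> float:
    """Expected count of word if letters were independent (order-0 assumption).

    Letter probabilities are estimated from seq itself.
    """
    freq = Counter(seq)
    n = len(seq)
    p = 1.0
    for letter in word:
        p *= freq[letter] / n
    # Number of start positions times the probability of the word at one position.
    return (n - len(word) + 1) * p


seq = "ACGTACGTACGTACGT"
observed = count_overlapping(seq, "ACGT")
expected = expected_count_order0(seq, "ACGT")
print(observed, expected)
```

A large gap between the two numbers only flags the word as a candidate; judging significance requires a proper null model and a p-value or score, which this sketch does not compute.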
Java Word Frequencies (JFreq) is a front end to Schbath's R'MES.
http://athena.bioc.uvic.ca/workbench.php?tool=jfreq&db=
PhyloGibbs is an algorithm for discovering regulatory sites in a collection of DNA sequences, including multiple alignments of orthologous sequences from related organisms. Many existing approaches search either for sequence motifs that are overrepresented in the input data or for sequence segments that are more evolutionarily conserved than expected. PhyloGibbs combines these two approaches and identifies significant sequence motifs by taking both over-representation and conservation signals into account.
http://www.phylogibbs.unibas.ch/cgi-bin/phylogibbs.pl
PoSSuMsearch: Fast and Sensitive Matching of Position Specific Scoring Matrices Using Enhanced Suffix Arrays. In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs. We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of such matrices in large databases.
http://bibiserv.techfak.uni-bielefeld.de/possumsearch/
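For readers unfamiliar with PSSM matching, here is a minimal brute-force sketch: slide the matrix along the sequence and report every window whose summed score reaches a threshold. This is the naive baseline, not the ESAsearch algorithm; the toy matrix, the threshold, and the function name are assumptions chosen for illustration, and the sequence is assumed to contain only A, C, G and T.

```python
def scan_pssm(seq, pssm, threshold):
    """Return (position, score) for every window scoring >= threshold.

    pssm is a list of dicts, one per motif position, mapping each
    nucleotide to a score (for example a log-odds value).
    """
    m = len(pssm)
    hits = []
    for i in range(len(seq) - m + 1):
        # Sum the per-column scores of the letters in this window.
        score = sum(col[seq[i + j]] for j, col in enumerate(pssm))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical 3-column matrix that favours the word "TAT".
toy_pssm = [
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
]

print(scan_pssm("GGTATACTATG", toy_pssm, threshold=6))
```

The scan costs time proportional to sequence length times matrix length, which is the cost that index-based methods aim to reduce on large databases.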
Improbizer searches for motifs in DNA or RNA sequences that occur with a frequency too improbable to be due to chance alone, using a variation of the expectation maximization (EM) algorithm.
http://www.soe.ucsc.edu/~kent/improbizer/improbizer.html
ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif. We have used ELPH to find patterns such as ribosome binding sites (RBSs) and exon splicing enhancers (ESEs).
http://www.cbcb.umd.edu/software/ELPH/
Searching for sequence similarity by comparing physico-chemical properties of cDNA
http://gnaweb.gbf.de/cgi-bin/FeatureScan/FeatureScan.pl
MEME is a tool for discovering motifs in a group of related DNA or protein sequences. MAST is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.
http://meme.sdsc.edu/meme/intro.html
BEST is a suite of four motif-finding programs (AlignACE, BioProspector, Consensus, and MEME) together with the optimization program BioOptimizer, configured for each of these programs.
http://webster.cs.uga.edu/~che/BEST/
Tools for MOtif Discovery in nucleotide sequences
http://159.149.109.9/modtools/
AlignACE (Aligns Nucleic Acid Conserved Elements) is a program which finds sequence elements conserved in a set of DNA sequences.
http://atlas.med.harvard.edu/download/index.html
A Suite of Web-based Programs to Search for Regulatory Motifs in Prokaryotes and Eukaryotes
http://ai.stanford.edu/~iliu/seqmotifs/
MONKEY is a set of programs designed to search alignments of non-coding DNA sequence for matches to matrices representing the sequence specificity of transcription factors.
http://rana.lbl.gov/%7Ealan/Monkey.htm
DILIMOT is a server for finding short (3-8 amino acids), over-represented peptide patterns, or linear motifs, in a set of proteins.
http://dilimot.embl.de/
YMF is a program that detects statistically overrepresented words (motifs) in DNA sequences. The user may specify the characteristics of the motifs to be detected. A motif here is a short string of nucleotides, degenerate symbols, and spacers.
http://wingless.cs.washington.edu/YMF/YMFWeb/YMFInput.pl
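A motif written with degenerate symbols and spacers can be matched with a regular expression. The sketch below expands the standard IUPAC nucleotide codes into character classes and treats N as a one-base spacer; it only locates occurrences and does not reproduce YMF's statistics. The function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif into a regex that reports overlapping hits."""
    body = "".join(IUPAC[symbol] for symbol in motif.upper())
    # The lookahead lets finditer report matches that overlap each other.
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif):
    """Return the 0-based start positions of every occurrence of motif."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


print(find_motif("TTCACGTGAACACATGTT", "CACRTG"))
print(find_motif("TTCACGTGAACACATGTT", "CANNTG"))
```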
PhyME discovers motifs by integrating two important aspects of the motif's significance, overrepresentation and interspecies conservation, into one probabilistic score. The algorithm is based on multiple alignment and expectation-maximization.
http://bio.cs.washington.edu/software.html#phyme
Over-represented Transcription Factor Binding Site Prediction Tool (OTFBS)
http://www.bioinfo.tsinghua.edu.cn/~zhengjsh/OTFBS/
The Gibbs Motif Sampler identifies motifs (conserved regions) in DNA or protein sequences.
http://bayesweb.wadsworth.org/gibbs/gibbs.html
Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes
http://ai.stanford.edu/~xsliu/BioProspector/
Allows you to select among various motif databases to search.
http://motif.genome.jp/
cis-Regulatory Elements
jPREdictor was designed for the prediction of cis-regulatory elements for which short motifs (protein binding sites on the DNA) are known. Using these motifs, it is possible to search for them in sequences, to weight them by applying a positive or negative training set (model or background, respectively), and to score a sequence.
http://bibiserv.techfak.uni-bielefeld.de/jpredictor/
SeqVISTA: A Graphical Tool for Sequence Feature Visualization and Comparison. SeqVISTA presents a holistic, graphical view of features annotated on nucleotide or protein sequences. This interactive tool highlights the residues in the sequence that correspond to features chosen by the user, and allows easy searching for sequence motifs or extraction of particular subsequences. SeqVISTA is able to display results from diverse sequence analysis tools in an integrated fashion, and aims to provide much-needed unity to the bioinformatics resources scattered around the Internet.
http://zlab.bu.edu/SeqVISTA/
Allows you to check your sequence against known regulatory sequences, and to query the regulatory sequence database for specific motifs.
http://www.ebi.ac.uk/asd-srv/wb.cgi?method=3
TRRD is a unique information resource that accumulates information on the structural and functional organization of transcription regulatory regions of eukaryotic genes. Only experimentally confirmed information is included in TRRD.
http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd//
The Regulatory Sequence Analysis Tools (RSAT) is a complex online tool to analyze regulatory sequences. The site offers a series of tools dedicated to the detection of regulatory signals in non-coding sequences. The only input required is a list of genes of interest (e.g. a family of co-regulated genes). From this information, you can retrieve the upstream sequences over a desired distance, discover putative regulatory signals, search the matching positions for these signals in your original dataset or in whole genomes, and display the results graphically in the form of a feature map.
http://rsat.ulb.ac.be/rsat/index.html
The Motif Analysis Workbench is a WWW interface for automated comprehensive analyses of promoter regulatory motifs and the effect they exert on mRNA expression profiles. The server provides a wide spectrum of analysis tools that allow de-novo discovery of regulatory motifs, along with refinement and in-depth investigation of fully or partially characterized motifs.
http://bioportal.weizmann.ac.il/~lapidotm/rMotif/html/
Cis-acting regulatory elements in plants
http://bioinformatics.psb.ugent.be/webtools/plantcare/html/
Clover is a program for identifying functional sites in DNA sequences. If you give it a set of DNA sequences that share a common function, it will compare them to a library of sequence motifs (e.g. transcription factor binding patterns), and identify which if any of the motifs are statistically overrepresented in the sequence set.
http://zlab.bu.edu/clover/
TOUCAN is a workbench for regulatory sequence analysis on metazoan genomes: comparative genomics, detection of significant transcription factor binding sites, and detection of cis-regulatory modules (combinations of binding sites) in sets of coexpressed/coregulated genes.
http://homes.esat.kuleuven.be/~bioiuser/toucan/
TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs.
http://fraenkel.mit.edu/TAMO/
Swissregulon is a database with genome-wide annotations of regulatory sites.
http://www.swissregulon.unibas.ch/cgi-bin/regulon
The composite regulatory signature database (CRSD) is a web server for investigating complex regulatory behaviors involving gene expression signatures (GESs), microRNA regulatory signatures (MRSs) and TF regulatory signatures (TRSs). Six well-known, large-scale databases, including the human UniGene (1), mature microRNAs (2,3), putative promoter (4), TRANSFAC (5), pathway, and Gene Ontology (6) databases, were integrated to provide comprehensive analysis in CRSD. Two new genome-wide databases, the MRS and TRS databases, were also constructed and integrated into the CRSD.
http://biochip.nchu.edu.tw/crsd1/
INCLUSive is a suite of algorithms and tools for the analysis of gene expression data and the discovery of cis-regulatory sequence elements. The tools allow for normalization, filtering and clustering of microarray data, functional scoring of gene clusters, sequence retrieval and detection of known and unknown regulatory elements.
http://tomcatbackup.esat.kuleuven.be/inclusive/index.jsp
REDUCE is an acronym that stands for Regulatory Element Detection Using Correlation with Expression, not coincidentally also the title of our paper. Based on a simple model for transcriptional regulation by independently acting transcription factors, REDUCE makes it possible to find regulatory elements based on a single microarray experiment.
http://bussemaker.bio.columbia.edu:8080/reduce/
Databases of genome-wide regulatory module and element predictions
http://www.cisred.org/
Sequence motif Databases
An interactive database providing access to mRNA sequences and associated regulatory elements. The mRNA sequences are derived from all gene sequence data in Genbank, including complete genomes, divided into putative 5' UTRs and 3' UTRs, initiation and termination regions and the full CDS sequences. This data can be searched for defined regulatory elements.
http://uther.otago.ac.nz/Transterm.html
It is known that eukaryotic mRNAs often contain various regulatory signals. Some of these signals are described in detail but most are hidden within relatively long segments. We developed a database on the TRanslational SIGnals (TRSIG) collecting information on mRNA sequence segments possessing experimentally verified regulatory activities. In particular, TRSIG collects primary experimental data and experimental sequences rather than a description of a limited set of well-investigated posttranscriptional signals.
http://gibk26.bse.kyutech.ac.jp/jouhou/trsig/trsig.html
The cisRED database holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations.
http://www.cisred.org/
The UCSC ENCODE Project portal contains information related to the ENCODE project at NHGRI. The University of California Santa Cruz (UCSC) manages the official repository of sequence-related data for the ENCODE Consortium and supports the coordination of data submission, storage, retrieval, and visualization.
http://genome.ucsc.edu/ENCODE/
The Open REGulatory ANNOtation database (ORegAnno) is an open database for the curation of known regulatory elements from scientific literature.
http://www.oreganno.org/oregano/Index.jsp
RegTransBase consists of two modules - a database of regulatory interactions based on literature and an expertly curated database of transcription factor binding sites.
http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main
Repeats
CENSOR is a software tool that screens query sequences against a reference collection of repeats.
http://www.ebi.ac.uk/Tools/censor/index.html
The REPuter program family provides state of the art software solutions to compute and visualize repeats in whole genomes or chromosomes. REPuter computes all maximal duplications and reverse, complemented and reverse complemented repeats in a DNA input sequence.
http://bibiserv.techfak.uni-bielefeld.de/reputer/
Imperfect Microsatellite Extractor (IMEx) is a tool for extracting perfect as well as imperfect microsatellites, also called Simple Sequence Repeats (SSRs) or Short Tandem Repeats (STRs), from genome sequences. IMEx is an efficient, fast and user-friendly program that can search for microsatellites according to the user's criteria.
http://203.197.254.154/IMEX/index.html
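To make the notion of a perfect microsatellite concrete, the sketch below finds exact tandem repeats of a short unit with a backreference regular expression. It handles only perfect repeats, so it is not a substitute for IMEx, which also reports imperfect ones. The function name and the default limits (unit of 1 to 6 bases, at least 3 copies) are assumptions made for illustration.

```python
import re


def find_perfect_ssrs(seq, max_unit=6, min_repeats=3):
    """Return (start, unit, copies) for each perfect tandem repeat found.

    The lazy quantifier makes the regex prefer the shortest repeating unit,
    so "CACACACA" is reported as 4 copies of "CA" rather than 2 of "CACA".
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    results = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        results.append((m.start(), unit, copies))
    return results


print(find_perfect_ssrs("GTCACACACATGG"))
```

Positions are 0-based. With these defaults a run of three or more identical bases also counts as a repeat of a one-base unit; raise `min_repeats` to filter such short runs out.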
Spectral Repeat Finder (SRF) is a program to find repeats through an analysis of the power spectrum of a given DNA sequence.
http://www.imtech.res.in/raghava/srf/
Tandem Repeats Finder
http://tandem.bu.edu/trf/trf.html
Tandem Repeats Database (TRDB) is a public repository of information on tandem repeats in genomic DNA and contains a variety of tools for their analysis.
https://tandem.bu.edu/cgi-bin/trdb/trdb.exe
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). On average, almost 50% of a human genomic DNA sequence currently will be masked by the program.
http://www.repeatmasker.org/
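The masked output described above amounts to overwriting annotated repeat intervals with Ns. A minimal sketch of that step follows, assuming the repeat coordinates are already known and given as 0-based, half-open intervals; that convention and the function name are choices made here, not RepeatMasker's own file format or interface.

```python
def mask_intervals(seq, intervals, mask_char="N"):
    """Replace every position inside each [start, end) interval with mask_char."""
    chars = list(seq)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range intervals are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


print(mask_intervals("ACGTACGTAC", [(2, 5), (8, 10)]))
```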
The Variable Number Tandem Repeat Locus Database (VNTRDB)
http://vntr.csie.ntu.edu.tw/
Quadruplex
QGRS Mapper is a software program that generates information on composition and distribution of putative Quadruplex forming G-Rich Sequences (QGRS) in nucleotide sequences. It is also designed to handle the analysis of mammalian pre-mRNA sequences, including those that are alternatively processed (alternatively spliced or alternatively polyadenylated). QGRS Mapper is based on published algorithms for recognition and mapping of putative QGRS.
http://bioinformatics.ramapo.edu/QGRS/index.php
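A putative quadruplex-forming sequence is commonly described as four runs of guanines separated by three short loops. The sketch below searches for that pattern with a regular expression. The parameter values (G-runs of at least 3, loops of 1 to 7 bases) are a widely used simplification assumed here; they are not claimed to be QGRS Mapper's actual rules, and no scoring is attempted. The function name is hypothetical.

```python
import re


def find_putative_g4(seq, min_g=3, max_loop=7):
    """Return (start, matched_text) for non-overlapping G4-like patterns."""
    tract = "G{%d,}" % min_g          # a run of guanines
    loop = "[ACGT]{1,%d}" % max_loop  # a short loop of any bases
    # Four G-tracts separated by three loops.
    pattern = re.compile(tract + (loop + tract) * 3)
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq.upper())]


print(find_putative_g4("TTGGGAGGGTGGGAGGGTT"))
```

Positions are 0-based, and only the plus strand is searched; scanning the reverse complement as well would be needed to cover both strands.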
UTRs
UTRscan looks for UTR functional elements by searching through user submitted sequence data for the patterns defined in the UTRsite collection.
http://www.ba.itb.cnr.it/BIG/UTRScan/
UTRSite is a collection of functional sequence patterns located in 5' or 3' UTR sequences.
http://www2.ba.itb.cnr.it/UTRSite/
Polyadenylation sites
Database of polyadenylation sites
http://polya.umdnj.edu/PolyA_DB2/index.php