Integrated search in PROSITE, Pfam, PRINTS and other family and domain databases. InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
Conserved Domain Database Search @ NCBI
PANTHER version 7.0 contains 6594 protein families, each with a phylogenetic tree relating modern-day genes in 48 organisms. Expert biologists have divided each family into subfamilies, which are generally orthologous groups but may also contain recently duplicated paralogs. Each family and subfamily is also represented as a hidden Markov model (HMM), which can be used to classify new sequences to an existing subfamily.
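PANTHER assigns a new sequence to whichever subfamily model explains it best. As a minimal sketch of that idea, the code below scores a sequence against two hypothetical hidden Markov models using the forward algorithm and picks the model with the higher log-likelihood. The two-state models, the reduced alphabet (`h` = hydrophobic, `p` = polar) and every probability are invented for illustration. Real family HMMs are profile HMMs with match, insert and delete states for each alignment column, and they are normally searched with dedicated software such as HMMER rather than with code like this.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_forward(seq, start, trans, emit):
    """Return log P(seq | HMM), summed over all state paths (forward algorithm)."""
    states = list(start)
    # alpha[s]: log probability of emitting the prefix seen so far and ending in state s
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: logsumexp([alpha[p] + math.log(trans[p][s]) for p in states])
            + math.log(emit[s][symbol])
            for s in states
        }
    return logsumexp(list(alpha.values()))

# Hypothetical two-state models that share start and transition probabilities
START = {"core": 0.5, "loop": 0.5}
TRANS = {"core": {"core": 0.9, "loop": 0.1}, "loop": {"core": 0.1, "loop": 0.9}}
MODELS = {
    "subfamily_A": {"core": {"h": 0.9, "p": 0.1}, "loop": {"h": 0.3, "p": 0.7}},
    "subfamily_B": {"core": {"h": 0.4, "p": 0.6}, "loop": {"h": 0.1, "p": 0.9}},
}

def classify(seq):
    """Assign seq to the subfamily whose model gives the highest log-likelihood."""
    scores = {name: log_forward(seq, START, TRANS, emit) for name, emit in MODELS.items()}
    return max(scores, key=scores.get)

print(classify("hhhhhh"))  # → subfamily_A
```

Working in log space with `logsumexp` avoids the numerical underflow that multiplying many small probabilities would cause on realistic sequence lengths.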
TIGRFAMs is a resource of protein families based on hidden Markov models (HMMs). It consists of curated multiple sequence alignments, HMMs for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins.
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Find Pfam families within your sequence of interest.
ProDom is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.
Try a DOUTfinder analysis of your protein, which will help you evaluate subsignificant domain hits when searches of other databases have failed.
SYSTERS (short for SYSTEmatic Re-Searching) is a collection of graph-based algorithms that hierarchically partition a large set of protein sequences into homologous families and superfamilies. The methods now unified under the name SYSTERS are based on an all-against-all database search using Smith-Waterman comparisons on a GeneMatcher machine.
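SYSTERS starts from an all-against-all Smith-Waterman search. The sketch below shows the local-alignment recurrence and an all-against-all loop over a few named sequences. It assumes a simple match/mismatch score and a linear gap penalty with arbitrary values; real searches use a substitution matrix such as BLOSUM62 with affine gap penalties, and SYSTERS ran its comparisons on dedicated GeneMatcher hardware rather than in Python.

```python
from itertools import combinations

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so alignments can restart
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def all_against_all(seqs):
    """Score every unordered pair of named sequences."""
    return {
        (x, y): smith_waterman(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }

# Hypothetical toy sequences
scores = all_against_all({"p1": "MKVLAAGH", "p2": "MKVLSAGH", "p3": "WWPQRST"})
print(scores)
```

Only two rows of the matrix are kept at a time, which is enough when just the score (not the alignment itself) is needed, as in a clustering pipeline.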
The Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the NCBI Entrez Protein Database based on domain architecture, defined as the sequential order of conserved domains in proteins.
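CDART compares proteins by the order of their conserved domains rather than by raw sequence. One simple way to illustrate that idea (an assumption for illustration, not CDART's actual scoring) is to treat each architecture as an ordered list of domain names and score two architectures by their longest common subsequence, normalized by the longer list. The domain names below are only labels.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of two lists."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x, start=1):
        for j, b in enumerate(y, start=1):
            if a == b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]

def architecture_similarity(arch1, arch2):
    """Fraction of the longer architecture covered by shared domains in the same order."""
    if not arch1 or not arch2:
        return 0.0
    return lcs_length(arch1, arch2) / max(len(arch1), len(arch2))

query = ["SH3", "SH2", "Pkinase"]
print(architecture_similarity(query, ["SH2", "Pkinase"]))  # 2 of 3 domains, in order
print(architecture_similarity(query, ["Pkinase", "SH2"]))  # order matters: 1 of 3
```

Using a subsequence rather than a set comparison is what makes the score sensitive to domain order, which is the property the domain-architecture definition above emphasizes.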
PANDIT is a collection of multiple sequence alignments and phylogenetic trees covering many common protein domains.
AnDom helps assign structural domains to protein sequences and classify them according to SCOP.
SUPERFAMILY is a database of structural and functional protein annotations for all completely sequenced organisms.
Proteins from complete genomes have been clustered by sequence similarity into groups called COGs (or, for viruses, VOGs). Genome ProtMap maps each protein in a COG/VOG back to its genome and displays all the genomic segments that encode members of that group of related proteins.
The NCBI Entrez Protein Clusters database is a collection of Reference Sequence (RefSeq) proteins from the complete genomes of prokaryotes, plasmids, and organelles grouped and annotated based on sequence similarity and protein function.
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.
Scans a sequence against PROSITE, or a pattern against the UniProt Knowledgebase (Swiss-Prot and TrEMBL).
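PROSITE patterns use a compact syntax: elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden ones, `(n)` or `(n,m)` for repeats, and `<` or `>` to anchor at the N- or C-terminus. The sketch below translates such a pattern into a Python regular expression and scans a sequence for (possibly overlapping) matches. It ignores less common forms, such as `>` inside brackets, and it is not the implementation used by the scanning service. The example pattern is the N-glycosylation site motif; the sequence is invented.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simplified PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        body, lo, hi = m.group(1), m.group(2), m.group(3)
        if body == "x":
            regex = "."
        elif body.startswith("{"):
            regex = "[^" + body[1:-1] + "]"  # forbidden residues
        else:
            regex = body  # a single residue or a [..] class of allowed residues
        if lo and hi:
            regex += "{" + lo + "," + hi + "}"
        elif lo:
            regex += "{" + lo + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)

def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every match, including overlaps."""
    # A capturing group inside a lookahead lets finditer report overlapping hits
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "MNGTANPSA"))  # [(2, 'NGTA')]
```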
High-quality Automated and Manual Annotation of microbial Proteomes. HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is applied to bacterial, archaeal and plastid-encoded proteins.
SVM-Prot: Web-Based Support Vector Machine Software for Functional Classification of a Protein from Its Primary Sequence.
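Sequence-based classifiers such as SVM-Prot first turn a variable-length sequence into a fixed-length feature vector that an SVM can then separate. The sketch below computes only the simplest such feature, amino acid composition. SVM-Prot's published descriptors are richer (physicochemical properties and more) and are not reproduced here, and training the SVM itself would be left to a machine-learning library.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid composition of seq as fractions."""
    seq = seq.upper()
    counted = [c for c in seq if c in AMINO_ACIDS]  # skip X, B, Z, gaps, etc.
    if not counted:
        return [0.0] * len(AMINO_ACIDS)
    return [counted.count(aa) / len(counted) for aa in AMINO_ACIDS]

vector = composition("MKVLAAGH")  # always 20 values, whatever the sequence length
print(vector)
```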
The PIRSF classification system is based on whole proteins rather than on the component domains; therefore, it allows annotation of generic biochemical and specific biological functions, as well as classification of proteins without well-defined domains.
CDTree: a protein domain hierarchy viewer and editor
EVEREST performs automatic identification and classification of protein domains. It combines methods from finite metric spaces, machine learning and statistical modeling, and it reports state-of-the-art results. The process begins by constructing a database of protein segments that emerge from an all-vs-all pairwise sequence comparison.
ProtoNet provides automatic hierarchical classification of protein sequences. The site lets users explore the clustering and assess its quality.
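Hierarchical classification from pairwise similarity scores can be illustrated with single-linkage clustering: two proteins share a cluster if a chain of pairs scoring at or above a threshold links them. Lowering the threshold only merges clusters, so the clusters found at each threshold nest inside those found at a lower one. This is a generic sketch with made-up scores and thresholds; ProtoNet's own merging rule is not reproduced here.

```python
def single_linkage(names, scores, threshold):
    """Group names into clusters linked by pair scores >= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())

names = ["a", "b", "c", "d"]
scores = {("a", "b"): 50, ("b", "c"): 20, ("c", "d"): 5}  # hypothetical similarity scores
for t in (40, 10, 1):  # decreasing thresholds give a nested hierarchy
    print(t, single_linkage(names, scores, t))
```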
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.
Jevtrace is an implementation of the evolutionary trace method. The software extends the evolutionary trace by allowing manipulation of the input data and analysis parameters, and it offers a number of novel tree-inspired analyses of protein families.
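The evolutionary trace ranks alignment positions by how early in the tree they become invariant within every group: a column identical across the whole family ranks first, while one that is invariant only inside small subgroups ranks lower. A minimal sketch, assuming the tree has already been cut into nested groups at successive levels (the alignment and groupings are hypothetical, and gaps get no special treatment):

```python
def trace_ranks(alignment, levels):
    """Rank each column by the first tree level at which it is invariant in every group.

    alignment: dict of sequence name -> aligned sequence (all the same length)
    levels: list of partitions; levels[0] is one group holding every sequence
    """
    length = len(next(iter(alignment.values())))
    ranks = []
    for col in range(length):
        rank = len(levels) + 1  # never invariant at any listed level
        for level, groups in enumerate(levels, start=1):
            if all(len({alignment[name][col] for name in group}) == 1 for group in groups):
                rank = level
                break
        ranks.append(rank)
    return ranks

alignment = {"s1": "ACD", "s2": "ACE", "s3": "AGF", "s4": "AGF"}
levels = [
    [["s1", "s2", "s3", "s4"]],          # root: the whole family
    [["s1", "s2"], ["s3", "s4"]],        # first split of the tree
    [["s1"], ["s2"], ["s3"], ["s4"]],    # leaves
]
print(trace_ranks(alignment, levels))  # [1, 2, 3]
```

Column 1 is conserved everywhere, column 2 separates the two subfamilies, and column 3 varies even within a subfamily, which is the kind of distinction trace-based tools use to highlight functionally important positions.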
Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. BLOCKS is being discontinued.
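Because a block is an ungapped alignment, it can be turned directly into a position-specific scoring matrix and slid along a query sequence without any gap handling. The sketch below uses log-odds scores with a pseudocount of 1 and a uniform background over the 20 amino acids; both are simplifying assumptions (the BLOCKS tools applied sequence weighting and other refinements not shown here), and the block itself is invented.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def block_to_pssm(block, pseudocount=1.0):
    """Log-odds score for each residue at each column of an ungapped block."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for column in zip(*block):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_window(pssm, seq):
    """Return (1-based start, score) of the highest-scoring ungapped window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        # Residues outside the 20 standard amino acids contribute 0
        score = sum(pssm[k].get(seq[start + k], 0.0) for k in range(width))
        if best is None or score > best[1]:
            best = (start + 1, score)
    return best

block = ["GDSGGP", "GDSGGP", "GDAGGP", "GESGGP"]  # hypothetical ungapped block
pssm = block_to_pssm(block)
print(best_window(pssm, "MKTAYGDSGGPLVK"))
```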
SBASE is a collection of protein domain sequences collected from the literature, from protein sequence databases and from genomic databases (Vlahovicek et al., 2002). The protein domains are defined by their sequence boundaries as given by the publishing authors or in one of the primary sequence databases (Swiss-Prot, PIR, TrEMBL, etc.). Domain groups are included if they have well-defined sequence boundaries and can be distinguished from other sequences using a similarity search technique.
mkdom 2 is the program used to build the ProDom database.
The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE,