Protein Domains & Families

Integrated search in PROSITE, Pfam, PRINTS and other family and domain databases. InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
Conserved Domain Database Search @ NCBI
PANTHER version 6.1 contains 5,547 protein families, divided into 24,582 functionally distinct protein subfamilies by expert biologist curators.
TIGRFAMs are protein families based on hidden Markov models (HMMs). Use this page to see the curated seed alignment for each TIGRFAM, the full alignment of all family members, and the cutoff scores for inclusion in each family. You can also search the TIGRFAMs and their HMMs by text (TIGRFAMs Text Search) or by sequence (TIGRFAMs Sequence Search).
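Inclusion cutoffs are easiest to understand as score thresholds on an HMM hit. The sketch below assumes the usual trusted-cutoff / noise-cutoff convention: a hit at or above the trusted cutoff counts as a family member, a hit below the noise cutoff does not, and anything in between is left for manual review. The function name, labels and the example scores are hypothetical illustrations, not values taken from any real TIGRFAM.

```python
def classify_hit(score: float, trusted_cutoff: float, noise_cutoff: float) -> str:
    """Classify an HMM bit score against a family's trusted and noise cutoffs (hypothetical helper)."""
    if noise_cutoff > trusted_cutoff:
        raise ValueError("noise cutoff should not exceed trusted cutoff")
    if score >= trusted_cutoff:
        return "member"        # at or above the trusted cutoff: assign to the family
    if score < noise_cutoff:
        return "non-member"    # below the noise cutoff: treat as unrelated
    return "uncertain"         # between the two cutoffs: needs manual review


# Hypothetical hits and cutoffs, in bits.
hits = {"seqA": 412.5, "seqB": 268.0, "seqC": 35.2}
for name, score in hits.items():
    print(name, classify_hit(score, trusted_cutoff=300.0, noise_cutoff=150.0))
```

Keeping a gray zone between the two cutoffs, rather than a single threshold, is what lets curators avoid both false inclusions and silent exclusions.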
ProDom is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.
Try DOUTfinder analysis of your protein, which will help you evaluate subsignificant domain hits when other databases have failed.
Large-scale Protein Clustering based on Sequence Similarity
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Find Pfam families within your sequence of interest.
The Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the NCBI Entrez Protein Database based on domain architecture, defined as the sequential order of conserved domains in proteins.
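Treating a domain architecture as "the sequential order of conserved domains" suggests a simple data model: an ordered tuple of domain names. The sketch below is a toy illustration of that idea only, not CDART's actual search or scoring; the protein names are hypothetical, and the domain names are used as Pfam-style labels for illustration.

```python
from typing import Dict, List, Tuple

Architecture = Tuple[str, ...]  # ordered domain names, N- to C-terminus


def contains_in_order(target: Architecture, query: Architecture) -> bool:
    """True if all of query's domains occur in target in the same order (gaps allowed)."""
    remaining = iter(target)
    # 'in' consumes the iterator, so each domain must be found after the previous one.
    return all(domain in remaining for domain in query)


def search(database: Dict[str, Architecture], query: Architecture) -> Tuple[List[str], List[str]]:
    """Split proteins into exact architecture matches and proteins containing the query in order."""
    exact = [name for name, arch in database.items() if arch == query]
    partial = [name for name, arch in database.items()
               if arch != query and contains_in_order(arch, query)]
    return exact, partial


# Hypothetical proteins annotated with Pfam-style domain names.
database = {
    "protA": ("SH3", "SH2", "Pkinase"),
    "protB": ("SH2", "Pkinase"),
    "protC": ("Pkinase", "SH2"),
}
print(search(database, ("SH2", "Pkinase")))  # (['protB'], ['protA'])
```

Note that protC carries the same two domains as the query but in the opposite order, so an order-aware comparison does not report it.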
PANDIT is a collection of multiple sequence alignments and phylogenetic trees covering many common protein domains.
AnDom helps to assign structural domains to protein sequences and to classify them according to SCOP.
SUPERFAMILY is a database of structural and functional protein annotations for all completely sequenced organisms.
Proteins from complete genomes have been clustered by sequence similarity into groups: COGs or, for viruses, VOGs. Genome ProtMap maps each protein in a COG or VOG back to its genome and displays all the genomic segments that encode members of that group of related proteins.
The NCBI Entrez Protein Clusters database is a collection of Reference Sequence (RefSeq) proteins from the complete genomes of prokaryotes, plasmids, and organelles grouped and annotated based on sequence similarity and protein function.
The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, Gene3D, SUPERFAMILY, PIR Superfamily and PANTHER.
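Clustering from all pairwise comparisons can be illustrated with single-linkage grouping: any two proteins whose similarity passes a threshold end up in the same cluster, directly or through a chain of neighbours. This is a generic sketch using a union-find structure; the threshold, the score function and the example scores are hypothetical, and CluSTr's actual scoring and multi-level clustering are not reproduced here.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def single_linkage(items: Sequence[str],
                   score: Callable[[str, str], float],
                   threshold: float) -> List[List[str]]:
    """Group items so that any pair scoring >= threshold shares a cluster (single linkage)."""
    parent: Dict[str, str] = {x: x for x in items}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider every pair once, as in an all-vs-all comparison.
    for a, b in combinations(items, 2):
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


# Hypothetical pairwise similarity scores; unlisted pairs score 0.
scores: Dict[Tuple[str, str], float] = {("p1", "p2"): 0.9, ("p2", "p3"): 0.8, ("p1", "p3"): 0.2}

def pair_score(a: str, b: str) -> float:
    return scores.get((a, b), scores.get((b, a), 0.0))

print(single_linkage(["p1", "p2", "p3", "p4"], pair_score, threshold=0.5))
```

Here p1 and p3 are grouped together even though their own score is low, because both are linked through p2; this chaining is the characteristic behaviour of single linkage.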
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.
Scans a sequence against PROSITE, or a pattern against the UniProt Knowledgebase (Swiss-Prot and TrEMBL).
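Scanning a sequence with a PROSITE pattern amounts to translating the pattern into a regular expression and searching with it. The sketch below handles only the common pattern elements ('x', '[...]', '{...}', repeat counts such as x(2,4), and the '<' / '>' terminal anchors) and is not the ScanProsite implementation; the helper names and the test sequence are hypothetical. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site (PS00001).

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # '<' anchors the pattern to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # '>' anchors the pattern to the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                      # 'x' means any residue
            core = "."
        elif core.startswith("{"):           # '{...}' means any residue except those listed
            core = "[^" + core[1:-1] + "]"
        # Single residues and '[...]' alternatives are already valid regex syntax.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for every match, including overlapping ones."""
    lookahead = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
print(scan("MNGSANKTQ", "N-{P}-[ST]-{P}."))  # [(2, 'NGSA'), (6, 'NKTQ')]
```

The zero-width lookahead is used so that overlapping sites are all reported, which matters for short, degenerate patterns like this one.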
Implementation of the evolutionary trace method. The software extends the evolutionary trace by allowing manipulation of the input data and analysis parameters, and presents a number of novel tree-inspired analyses of protein families.
Java GUI for InterProScan (JIPS) is a graphical user interface for finding new motifs, domains and fingerprints across repeated InterProScan searches. JIPS compares the results of InterProScan searches performed with two versions of the InterPro database and highlights the motifs, domains and fingerprints that are new in the updated database. Results are displayed in an easy-to-use tabular format. JIPS also contains tools to assist with ortholog-based comparative studies of protein signatures.
High-quality Automated and Manual Annotation of microbial Proteomes. HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is applied to bacterial, archaeal and plastid-encoded proteins.
PIR SuperFamily (PIRSF) classification system. Based on the evolutionary relationships of whole proteins, this classification system allows annotation of both specific biological and generic biochemical functions.
This tool allows you to search our library of functional diagnostic profiles with a protein sequence. A match to one or more of these profiles provides you with an effective multiple alignment of your query sequence to the best-matched profile's defining set of proteins, along with its annotations.
mkdom 2 is the program used to build the ProDom database.
CDTree: a protein domain hierarchy viewer and editor
EVEREST performs automatic identification and classification of protein domains. EVEREST combines methodologies from the fields of finite metric spaces, machine learning and statistical modeling, and achieves state-of-the-art results. Our process begins by constructing a database of protein segments that emerge in an all-vs-all pairwise sequence comparison.
ProtoNet provides automatic hierarchical classification of protein sequences. The site allows users to study the clustering as well as its quality.
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.
SBASE is a collection of protein domain sequences collected from the literature, from protein sequence databases and from genomic databases (Vlahovicek et al, 2002). The protein domains are defined by their sequence boundaries as given by the publishing authors or in one of the primary sequence databases (Swiss-Prot, PIR, TrEMBL, etc.). Domain groups are included if they have well-defined sequence boundaries and if they can be distinguished from other sequences using a similarity search technique.
SVM-Prot: Web-Based Support Vector Machine Software for Functional Classification of a Protein from Its Primary Sequence.
Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. BLOCKS is being discontinued.
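Because a block is an ungapped alignment, each of its columns can be turned directly into a column of position-specific scores and slid along a query without any gap handling. The sketch below uses simple log-odds scores with pseudocounts and a uniform background; it is a generic position-specific scoring matrix (PSSM) illustration, not the weighting or scoring actually used for BLOCKS, and the helper names and toy block are hypothetical. It assumes query and block use only the 20 standard uppercase residue letters.

```python
import math
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block: Sequence[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log2-odds score per residue per column of an ungapped block."""
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("block segments must all have the same length (ungapped)")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background frequency (simplifying assumption)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for segment in block:
            counts[segment[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pssm


def best_hit(query: str, pssm: List[Dict[str, float]]) -> Tuple[float, int]:
    """Slide the PSSM along the query without gaps; return (best score, 0-based start)."""
    width = len(pssm)
    best = (-math.inf, -1)
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best


pssm = block_to_pssm(["GKS", "GKT", "GRS"])  # toy three-column block
print(best_hit("AAGKSAA", pssm))
```

The pseudocounts keep residues that never appear in a column from receiving an infinitely negative score.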