SWISS-PROT RELEASE 24.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 24.0 of SWISS-PROT contains 28154 sequence entries, comprising
9'545'427 amino acids abstracted from 27750 references. This represents
an increase of 5.9% over release 23. The recent growth of the data bank
is summarized below.
Release Date Number of entries Nb of amino acids
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
6.0 01/88 6102 1 653 982
7.0 04/88 6821 1 885 771
8.0 08/88 7724 2 224 465
9.0 11/88 8702 2 498 140
10.0 03/89 10008 2 952 613
11.0 07/89 10856 3 265 966
12.0 10/89 12305 3 797 482
13.0 01/90 13837 4 347 336
14.0 04/90 15409 4 914 264
15.0 08/90 16941 5 486 399
16.0 11/90 18364 5 986 949
17.0 02/91 20024 6 524 504
18.0 05/91 20772 6 792 034
19.0 08/91 21795 7 173 785
20.0 11/91 22654 7 500 130
21.0 03/92 23742 7 866 596
22.0 05/92 25044 8 375 696
23.0 08/92 26706 9 011 391
24.0 12/92 28154 9 545 427
1.2 Source of data
Release 24.0 has been updated using protein sequence data from release
34.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 33.0 of the
EMBL Nucleotide Sequence Database.
<PAGE>
As an indication to the source of the sequence data in the SWISS-PROT
data bank we list here the statistics concerning the DR (Database cross-
references) pointer lines:
Entries with pointer(s) to only PIR entri(es): 4411
Entries with pointer(s) to only EMBL entri(es): 3691
Entries with pointer(s) to both EMBL and PIR entri(es): 19493
Entries with no pointers lines: 559
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 23
2.1 Sequences and annotations
About 1466 sequences have been added since release 23, the sequence data
of 196 existing entries has been updated and the annotations of 3300
entries have been revised. In particular we have used reviews articles
to update the annotations of the following groups or families of
proteins:
- 14-3-3 proteins
- 5'-nucleotidases
- 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (HPPK)
- Actin-capping proteins alpha subunits
- Bacterial regulatory proteins, crp/fnr family
- Beta-lactamases class B
- Calreticulins and calnexins
- Chaperonins TCP-1
- Chitinases
- Clusterins
- Cold shock proteins
- Dihydropteroate synthase (DHPS)
- DNA-directed DNA polmyerases (C family)
- Glutamyl-tRNA reductases
- Glycoprotein hormones
- Granulins
- Guanine-nucleotide releasing factors CDC24 family
- Hirudins
- Leader peptidase family
- Neurotransmitters transporters
- Pancreatic hormone family
- Prokaryotic-type release factors
- Prolyl endopeptidases
- Receptor tyrosine kinase class V (eph, eck, elk, etc.)
- secY proteins
- Serine proteases, subtilisin family (subtilases)
- Tranketolases
- Transcription factor TFIIB
- Transthyretins
- Visual pigments (opsins)
- Wnt-1 family
<PAGE>
2.2 Weekly update of SWISS-PROT
Starting with this release we are providing weekly updates of SWISS-
PROT. These updates are available by anonymous FTP. Three files will be
updated every week:
new_seq.dat Contains all the new entries since the last full release.
upd_seq.dat Contains the entries for which the sequence data has been
updated since the last release.
upd_ann.dat Contains the entries for which one or more annotation
fields have been updated since the last release.
Currently these files are available on the following anonymous ftp
servers:
Organism EMBL ftp server
Address ftp.embl-heidelberg.de (or 192.54.41.33)
Directory /pub/databases/swissprot/new
Organism ExPASy (Geneva University Expert Protein Analysis System)
Address expasy.hcuge.ch (or 129.195.254.61)
Directory /databases/swiss-prot/updates
Organism National Center for Biotechnology Information (NCBI)
Address ncbi.nlm.nih.gov (or 130.14.20.1)
Directory /repository/swiss-prot/updates
!! Important notes !!!
Although we are going to try to follow a regular schedule, we do not
promise to update these files every week. In some cases two weeks will
elapse in-between two updates.
Due to the current mechanism used to build a release the entries that
are provided in these updates are not guaranteed to be error free. Also,
for the same reason, new entries will not contain an OC (Organism
Classification) line.
3.0 CHANGES PLANNED FOR FUTURE RELEASES
3.1 Change in the RA line concerning the author names format
As from release 25 in March 1993 we will change the format of author
names on RA lines to conform to that used by major bibliographic
databases such as Medline. The main change is that the periods and
hyphens ("-") which currently appear within initials will not appear any
more. For example, the current:
RA Wilson A.C., Smith J.-C.;
<PAGE>
will appear as:
RA Wilson AC, Smith JC;
4. ENZYME AND PROSITE
4.1 The ENZYME data bank
Release 11.0 of the ENZYME data bank is distributed with release 24 of
SWISS-PROT. ENZYME release 11.0 contains information relative to 3489
enzymes. The data bank has been significantly modified to take into
account the information available in the new edition of the IUPAC-IUB
Enzyme Nomenclature book which describes many new enzymes and updates
the information concerning existing ones.
4.2 The PROSITE data bank
Release 10.0 of the PROSITE data bank is distributed with release 24 of
SWISS-PROT. Release 10.0 contains 635 documentation chapters that
describes 803 different patterns. Since the last major release of
PROSITE (release 9.00 of June 1992), 55 new chapters have been added and
about 255 chapters have been updated. The new chapters are listed below.
- 14-3-3 proteins signatures
- 5'-nucleotidase signatures
- 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature
- Aminotransferases class-IV signature
- AP endonucleases family 1 signatures
- AP endonucleases family 2 signatures
- ArgE / dapE / CPG2 family signatures
- Barwin domain signatures
- Beta-lactamases class B signatures
- Calreticulin family signatures
- Chaperonins TCP-1 signatures
- Chitinases class I signatures
- Chorismate synthase signatures
- Dihydropteroate synthase signatures
- Electron transfer flavoprotein alpha-subunit signature
- Endonuclease III iron-sulfur binding region signature
- Enterobacterial virulence outer membrane protein signatures
- Formate--tetrahydrofolate ligase signatures
- F-actin capping protein alpha subunit signatures
- Germin family signature
- GltP / dctA family of transporters signatures
- Glutamyl-tRNA reductase signature
- Glycoprotein hormones alpha chain signatures
- Glycosyl hydrolases family 11 active site signatures
- Glycosyl hydrolases family 3 active site
- Granulins signature
<PAGE>
- Granulocyte-macrophage colony-stimulating factor signature
- Guanine-nucleotide dissociation stimulators CDC24 family signature
- Guanine-nucleotide dissociation stimulators CDC25 family signature
- Involucrin signature
- Phosphoglucomutase & phosphomannomutase phosphohistidine signature
- Prokaryotic ornithine and lysine decarboxylases pyridoxal-phosphate
- Prokaryotic-type carbonic anhydrases signatures
- Prokaryotic-type peptide chain release factors signature
- Prolyl endopeptidase family serine active site
- Protein secY signatures
- Receptor tyrosine kinase class V signatures
- Riboflavin synthase alpha chain family Lum-binding site signature
- Ribosomal protein L13 signature
- Ribosomal protein L30e signature
- Ribosomal protein L34 signature
- Ribosomal protein S16 signature
- Ribosomal protein S17e signature
- Ribosomal protein S26e signature
- Sigma-54 factors family signatures
- Sigma-70 factors family signatures
- Single-strand binding protein family signatures
- Stress-induced proteins SRP1/TIP1 family signature
- S-adenosyl-L-homocysteine hydrolase signatures
- Tetrahydrofolate dehydrogenase/cyclohydrolase signatures
- Transcription factor TFIIB repeat signature
- Transketolase signatures
- Transthyretin signatures
- WHEP-TRS domain signature
- XPAC protein signatures
5. WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate that
you notify us if you find that sequences belonging to your field of
expertise are missing from the data bank. We also would like to be
notified about annotations to be updated, if, for example, the function
of a protein has been clarified or if new post-translational information
has become available.
<PAGE>
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.68 Gln (Q) 4.03 Leu (L) 9.15 Ser (S) 7.07
Arg (R) 5.25 Glu (E) 6.25 Lys (K) 5.80 Thr (T) 5.85
Asn (N) 4.43 Gly (G) 7.11 Met (M) 2.34 Trp (W) 1.30
Asp (D) 5.26 His (H) 2.26 Phe (F) 3.97 Tyr (Y) 3.22
Cys (C) 1.81 Ile (I) 5.50 Pro (P) 5.07 Val (V) 6.51
Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.03
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
A.2 Repartition of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 3698
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 1665
2x: 617
3x: 359
4x: 236
5x: 153
6x: 117
7x: 86
8x: 65
9x: 76
10x: 31
11- 20x: 147
21- 50x: 86
51-100x: 24
>100x: 36
<PAGE>
A.2.2 Table of the most represented species
Number Frequency Species
1 2094 Human
2 1991 Escherichia coli
3 1297 Mouse
4 1198 Rat
5 1092 Baker's yeast (Saccharomyces cerevisiae)
6 576 Bovine
7 496 Fruit fly (Drosophila melanogaster)
8 448 Chicken
9 423 Bacillus subtilis
10 318 African clawed frog (Xenopus laevis)
11 316 Salmonella typhimurium
12 303 Rabbit
13 278 Pig
14 251 Vaccinia virus (strain Copenhagen)
15 209 Maize
16 193 Human cytomegalovirus (strain AD169)
17 168 Bacteriophage T4
18 162 Vaccinia virus (strain WR)
19 158 Rice
20 147 Tobacco
21 140 Wheat
22 136 Pseudomonas aeruginosa
23 134 Arabidopsis thaliana (Mouse-ear cress)
134 Pea
25 128 Barley
26 120 Staphylococcus aureus
27 117 Fission yeast (Schizosaccharomyces pombe)
117 Marchantia polymorpha (Liverwort)
29 116 Spinach
30 113 Sheep
31 111 Slime mold (Dictyostelium discoideum)
111 Soybean
33 109 Caenorhabditis elegans
34 104 Dog
35 103 Neurospora crassa
36 101 Pseudomonas putida
<PAGE>
A.3 Repartition of the sequences by size
From To Number From To Number
1- 50 1706 1001-1100 270
51- 100 2911 1101-1200 157
101- 150 4223 1201-1300 133
151- 200 2703 1301-1400 86
201- 250 2317 1401-1500 71
251- 300 2091 1501-1600 38
301- 350 1909 1601-1700 36
351- 400 1878 1701-1800 33
401- 450 1423 1801-1900 36
451- 500 1633 1901-2000 27
501- 550 1095 2001-2100 10
551- 600 786 2101-2200 33
601- 650 543 2201-2300 40
651- 700 402 2301-2400 13
701- 750 385 2401-2500 15
751- 800 309 >2500 78
801- 850 227
851- 900 236
901- 950 145
951-1000 156
Currently the ten largest sequences are:
RYNR_RABIT 5037 a.a.
RYNR_HUMAN 5032 a.a.
APB_HUMAN 4563 a.a.
APOA_HUMAN 4548 a.a.
DYHC_TRIGR 4466 a.a.
POLG_BVDV 3988 a.a.
VGF1_IBVB 3951 a.a.
POLG_HCVA 3898 a.a.
POLG_HCVB 3898 a.a.
ACVT_PENCH 3791 a.a.
<PAGE>
APPENDIX B: ON-LINE EXPERTS
B.1 List of on-line experts for PROSITE and SWISS-PROT
Field of expertise Name Email address
--------------------------- ------------------ -----------------------------
Alcohol dehydrogenases Joernvall H. hans.jornvall@k1m.ki.se
Persson B. bengt.persson@embl-heidelberg.
de
Aldehyde dehydrogenases Joernvall H. hans.jornvall@k1m.ki.se
Persson B. bengt.persson@embl-heidelberg.
de
Alpha-crystallins/HSP-20 Leunissen J.A.M. jackl@caos.caos.kun.nl
de Jong W. u629000@hnykun11.bitnet
Alpha-2-macroglobulins Van Leuven F. fred@blekul13.bitnet
AA-tRNA synthetases class II Leberman R. leberman@frembl51.bitnet
Apolipoproteins Boguski M.S. boguski@ncbi.nlm.nih.gov
Arrestins Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.
edu
ATP synthase c subunit Recipon H. recipon@ncbi.nlm.nih.gov
Band 4.1 family proteins Rees J. jrees@vax.oxford.ac.uk
Beta-lactamases Brannigan J. jab5@vaxa.york.ac.uk
Beta-transducin family Boguski M.S. boguski@ncbi.nlm.nih.gov
C-type lectin domain Drickamer K. drick@cuhhca.hhmi.columbia.
edu
Chalcone/stilbene synthases Schroeder J. raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60 Georgopoulos C. georgopo@cmu.unige.ch
Chaperonins TCP1 family Willison K.R. willison@icr.ac.uk
Chitinases Henrissat B. bernie@cermav.grenet.fr
Clusterin Peitsch M.C. peitsch@ulbio1.unil.ch
Cold shock domain Landsman D. landsman@ncbi.nlm.nih.gov
CTF/NF-I Mermod N. nmermod@ulys.unil.ch
Gronostajski R. gronosr@ccsmtp.ccf.org
Cytochromes P450 Holsztynska E.J. ela@netcom.uucp
netcom!ela@apple.com
DEAD-box helicases Linder P. linder@urz.unibas.ch
dnaJ family Kelley W. kelley@cmu.unige.ch
EF-hand calcium-binding Cox J.A. cox@sc2a.unige.ch
Kretsinger R.H. rhk5i@virginia.bitnet
Enoyl-CoA hydratase Hofmann K.O. khofmann@biomed.biolan.
uni-koeln.de
fruR/lacI family HTH proteins Reizer J. jreizer@ucsd.edu
GATA-type zinc-fingers Boguski M.S. boguski@ncbi.nlm.nih.gov
GDT/GTP dissociation stimul. Boguski M.S. boguski@ncbi.nlm.nih.gov
GltP family of transporters Hofmann K.O. khofmann@biomed.biolan.
uni-koeln.de
Glucanases Henrissat B. bernie@cermav.grenet.fr
Beguin P. phycel@pasteur.bitnet
Glutamine synthetase Tateno Y. ytateno@genes.nig.ac.jp
<PAGE>
G-protein coupled receptors Chollet A. chollet@clients.switch.ch
Attwood T.K. bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins Boguski M.S. boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17 Landsman D. landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.
edu
Integrases Roy P.H. 2020000@saphir.ulaval.ca
Kringle domain Ikeo K. kikeo@genes.nig.ac.jp
Lipocalins Boguski M.S. boguski@ncbi.nlm.nih.gov
Peitsch M.C. peitsch@ulbio1.unil.ch
lysR family HTH proteins Henikoff S. henikoff@sparky.fhcrc.org
MAC components / perforin Peitsch M.C. peitsch@ulbio1.unil.ch
Malic enzymes Glynias M. mglynias@ncsa.uiuc.edu
Myelin proteolipid protein Hofmann K.O. khofmann@biomed.biolan.
uni-koeln.de
Pancreatic trypsin inhibitor Ikeo K. kikeo@genes.nig.ac.jp
PEP requiring enzymes Reizer J. jreizer@ucsd.edu
pfkB carbohydrate kinases Reizer J. jreizer@ucsd.edu
Phytochromes Partis M.D. partis@gcri.afrc.ac.uk
Protein kinases Hanks S. hanks@vuctrvax.bitnet
Hunter T. hunter@salk.bitnet
PTS proteins Reizer J. jreizer@ucsd.edu
Restriction-modification Bickle T. bickle@urz.unibas.ch
enzymes Roberts R.J. roberts@neb.com
Ribosomal protein S3 Hallick R. hallick%biotec@arizona.edu
Ribosomal protein S15 Ellis S.R. srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases Harayama S. harayama@cmu.unige.ch
Signal sequence peptidases von Heijne G. gvh@csb.ki.se
Dalbey R.E. rdalbey@magnus.acs.ohio-state.
edu
Sodium symporters Reizer J. jreizer@ucsd.edu
Subtilases Brannigan J. jab5@vaxa.york.ac.uk
Siezen R.J. nizo@caos.caos.kun.nl
Thiol proteases Turk B. turk@ijs.ac.mail.yu
Thiol proteases inhibitors Turk B. turk@ijs.ac.mail.yu
TNF family Jongeneel C.V. vjongene@isrecmail.unil.ch
TPR repeats Boguski M.S. boguski@ncbi.nlm.nih.gov
Transit peptides von Heijne G. gvh@csb.ki.se
Type-II membrane antigens Levy S. levy@cellbio.stanford.edu
Uracil-DNA glycosylase Aasland R. aasland@bio.uib.no
Vitamin K-depend. Gla domain Price P.A. pprice@ucsd.edu
Xylose isomerase Jenkins J. jenkins@frira.afrc.ac.uk
WAP-type domain Claverie J.-M. jmc@ncbi.nlm.nih.gov
ZP domain Bork P. bork@embl-heidelberg.de
African swine fever virus Yanez R.J. ryanez@cbm2.uam.es
Bacteriophage P4 Halling C. chh9@midway.uchicago.edu
Drosophila Ashburner M. ma11@phx.cam.ac.uk
Escherichia coli Rudd K. rudd@ncbi.nlm.nih.gov
Salmonella typhimurium Rudd K. rudd@ncbi.nlm.nih.gov
Snakes Stocklin R. stocklin@cmu.unige.ch
Yeast chromosome I Ouellette F. francis@monod.biol.mcgill.ca
<PAGE>
B.2 Requirements to fulfill to become an on-line expert
An expert should be a scientist working with specific famili(es) of
proteins (or specific domains) and which would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people that have obtained new sequence(s)
which seem to belong to "their" familie(s) of proteins.
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to be part of this scheme please contact Amos Bairoch
at one of the following electronic mail addresses:
bairoch@cmu.unige.ch
bairoch@cgecmu51.bitnet
<PAGE>
APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
**********************
*********************** <----- * EPD [Euk. Promot.] *
* EMBL Nucleotide * -----> **********************
* Sequence Data *
***************** ----> * Library * **********************
* FLYBASE * <---- *********************** <----- * ECD [E. coli map] *
* [Drosophila * ^ | ^ **********************
* genetic maps] * --------+ | | |
***************** <-----+ | | | +--------- **********************
| | | | +--------- * TFD [Trans. fact.] *
| | | | | +------> **********************
| | | | | |
***************** | v | v v | **********************
* REBASE * *********************** * ENZYME [Nomencl.] *
* [Restriction * <---- * SWISS-PROT * <----- **********************
* enzymes] * * Protein Sequence * |
***************** * Data Bank * v
*********************** **********************
***************** | ^ | ^ | ^ | | * OMIM [Diseases] *
* PROSITE * <-------+ | | | | | | +--------> **********************
* [Patterns] * ----------+ | | | | |
***************** | | | | +-----------> **********************
| | | | +-------------- * E. coli 2D gels *
| | | | **********************
| | | |
| | | +----------------> **********************
| | +------------------- * EcoGene/EcoSeq *
| v **********************
| ***********************
+--------> * PDB [3D structures] *
***********************
<PAGE>