SWISS-PROT RELEASE 12.0 RELEASE NOTES
Date: October 14, 1989
Author: A. Bairoch
1. INTRODUCTION
1.1 Evolution
Release 12.0 of SWISS-PROT contains 12305 sequence entries, comprising
3'797'482 amino acids abstracted from 12147 references. This represents
an increase of 16% over release 11.0. The recent growth of the data bank
is summarized below:
Release Date Number of entries Nb of amino acids
3.0 11/86 4160 969 641
4.0 04/87 4387 1 036 010
5.0 09/87 5205 1 327 683
6.0 01/88 6102 1 653 982
7.0 04/88 6821 1 885 771
8.0 08/88 7724 2 224 465
9.0 11/88 8702 2 498 140
10.0 03/89 10008 2 952 613
11.0 07/89 10856 3 265 966
12.0 10/89 12305 3 797 482
1.2 Source of data
Release 12.0 has been updated using protein sequence data from release
21.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 20.0 of the
EMBL Nucleotide Sequence Data Library.
As an indication to the source of the sequence data in the SWISS-PROT
data bank we list here the statistics concerning the DR (Databank
Reference) pointer lines:
Entries with pointer(s) to only PIR entri(es): 3125
Entries with pointer(s) to only EMBL entri(es): 4873
Entries with pointer(s) to both EMBL and PIR entri(es): 3575
Entries with no pointers lines (entered in house): 732
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 11
2.1 Sequences and annotations
Some 1466 new sequences have been added since the last release, the
sequence data of 173 existing entries has been updated and the
annotations of 2400 entries have been revised. In particular we have
used reviews articles to update the annotations of the following groups
or families of proteins:
Acyl carrier proteins
Aminoacyl-transfer RNA synthetases
Biotin-requiring enzymes
Chloroplast photosystems I and II proteins
Creatine kinases
Crp bacterial activator proteins
Enolases
Glucose-6-phosphate dehydrogenases
Glutaredoxins
GTP-binding elongation factors
Heat shock hsp90 proteins
Insulin family proteins
Insulin-like growth factor binding proteins
Insect-type alcohol dehydrogenases / ribitol dehydrogenase family
Integrins
Iron-containing alcohol dehydrogenases
LysR bacterial activator proteins
Malate dehydrogenase
Mammalian defensins
Mitochondrial energy transfer proteins
Myc-type proteins
Phosphoglucose isomerases
Phosphoglycerate kinases
Serine/threonine specific protein phosphatases
Sugar transporters
Uracil-DNA glycosylases
Vertebrate galactoside-binding lectins
Zinc-containing alcohol dehydrogenases
2.2 New line-type
This release introduce an new type of data line, the OG line. The OG
(OrGanelle) lines indicate if the gene coding for a protein originates
from the mitochondria, the chloroplast, or a plasmid. The format for the
OG line is:
OG CHLOROPLAST.
OG MITOCHONDRION.
OG PLASMID name.
Where 'name' is the name of the plasmid.
Previously this information was stored in the OS line, as shown in the
example below.
OS WHEAT (TRITICUM AESTIVUM) CHLOROPLAST.
The above example will now be stored as:
OS WHEAT (TRITICUM AESTIVUM).
OG CHLOROPLAST.
2.3 New topic for the comments (CC) line type
As of release 12 we have added a new 'topic' for the comments (CC) line-
type: CAUTION, which is used to warn about possible errors and/or
grounds for confusion. Example of its usage:
CC -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE
CC TO A FRAMESHIFT.
2.4 Documentation changes
- ACINDEX.TXT is a new document file which is an index of all the
accession numbers which appear in SWISS-PROT and the name of the
entries in which they occur.
- PDBTOSP.TXT is a new document file which is an index of all the
Brookhaven PDB entries referenced in SWISS-PROT.
- The JOURLIST.TXT document now indicates the abbreviation and the full
names of all journals cited in SWISS-PROT.
3. THE NEXT RELEASE
SWISS-PROT release 13.0 will be available in January 1990.
Starting with release 13 SWISS-PROT will be distributed with PROSITE, a
data bank of sites and patterns in proteins. Both data banks will be
fully cross-referenced.
4. WE NEED YOUR HELP !
We welcome any feedback from our users. We especially would appreciate
that you notify us if you find that sequences belonging to your field of
expertise are missing from the data bank. We also would like to be
notified about annotations to be updated, as for example if the function
of a protein has been clarified or if new post-translational information
has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.70 Gln (Q) 4.10 Leu (L) 9.12 Ser (S) 7.03
Arg (R) 5.21 Glu (E) 6.23 Lys (K) 5.85 Thr (T) 5.85
Asn (N) 4.39 Gly (G) 7.21 Met (M) 2.29 Trp (W) 1.34
Asp (D) 5.21 His (H) 2.27 Phe (F) 3.95 Tyr (Y) 3.21
Cys (C) 1.85 Ile (I) 5.38 Pro (P) 5.14 Val (V) 6.50
Asx (B) 0.01 Glx (Z) 0.01 Xaa (X) 0.03
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Ser, Val, Glu, Thr = Lys, Ile, Arg = Asp, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
A.2 Repartition of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 1841
A.2.1 Table of the frequency of occurrence of species
Species represented 1x: 834
2x: 345
3x: 180
4x: 117
5x: 73
6x: 60
7x: 27
8x: 29
9x: 38
10x: 16
11- 20x: 61
21-100x: 49
>100x: 12
A.2.2 Table of the most represented species
Number Frequency Species
1 1093 Human
2 951 Escherichia coli
3 621 Mouse
4 519 Rat
5 397 Baker's yeast (Saccharomyces cerevisiae)
6 337 Bovine
7 210 Fruit fly (Drosophila melanogaster)
8 205 Chicken
9 174 Rabbit
10 149 Pig
11 133 Bacillus subtilis
12 113 African clawed frog (Xenopus laevis)
13 98 Tobacco
14 96 Salmonella typhimurium
15 94 Maize
16 89 Rice
17 84 Liverwort (Marchantia polymorpha)
18 80 Bacteriophage T4
19 77 Wheat
20 70 Herpes virus (Type 1, Strain 17)
21 69 Spinach
22 68 Vaccinia Virus
23 67 Varicella-Zoster virus (Strain Dumas)
24 63 Soybean
25 62 Bacteriophage Lambda
A.3 Repartition of the sequences by size
From To Number From To Number
1- 50 782 1001-1100 96
51- 100 1520 1101-1200 63
101- 150 2297 1201-1300 51
151- 200 1257 1301-1400 31
201- 250 978 1401-1500 22
251- 300 827 1501-1600 13
301- 350 728 1601-1700 15
351- 400 708 1701-1800 11
401- 450 540 1801-1900 9
451- 500 615 1901-2000 14
501- 550 476 >2000 68
551- 600 282
601- 650 215
651- 700 156
701- 750 130
751- 800 93
801- 850 93
851- 900 112
901- 950 51
951-1000 52
Currently the three largest sequences are:
RYNR$RABIT 5037 a.a.
APB$HUMAN 4563 a.a.
APOA$HUMAN 4548 a.a.
APPENDIX B: DISKS FOR SWISS-PROT
B.1 IBM PC/AT 1.2 Mb disks
SWISS-PROT release 12 is stored on sixteen 1.2 Mb disks. Each of these
disk contains a single bulk file (PRT12_01.BLK to PRT12_16.BLK):
Disk First sequence Last Sequence
1 10K5$ECOLI ATP6$YEAST
2 ATP8$ASPAM CHLN$ECOLI
3 CHOA$STRSP CYC$MIRLE
4 CYC$MOUSE FA10$HUMAN
5 FA11$HUMAN GTA1$RAT
6 GTA2$RAT HMEN$DROME
7 HMEN$DROVI KAD1$HUMAN
8 KAD1$PIG M4$DICDI
9 M5$ECOLI NRAM$INACR
10 NRAM$INADA POL$HIV2I
11 POL$HIV2N RBS4$LYCES
12 RBS4$SOYBN SMS1$HUMAN
13 SMS1$ICTPU TRPC$ACICA
14 TRPC$ASPNG VIP$HUMAN
15 VIP$PIG YU74$ECOLI
16 YVL1$HCMVA ZP3$MOUSE
B.2 IBM PS/2 1.4 Mb disks
The number and content of the 1.4 Mb disks for the PS/2 systems are
exactly identical to those of the 1.2 Mb disks (see above).