
             SWISS-PROT RELEASE 13.0 RELEASE NOTES


   Date:     January 18, 1990
   Author:   A. Bairoch


                               1. INTRODUCTION

   1.1  Evolution

   Release 13.0 of SWISS-PROT contains 13837 sequence entries, comprising
   4'347'336 amino acids abstracted from 13560 references. This represents
   an increase of 12% in entries and 14% in amino acids over release 12.0.
   The recent growth of the data bank is summarized below:

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
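
   The growth figures can be recomputed from the last two rows of the
   table. The following is a minimal Python sketch; the names REL_12,
   REL_13 and percent_increase are illustrative only and are not part of
   any SWISS-PROT tool.

```python
# Figures copied from the table above (releases 12.0 and 13.0).
REL_12 = {"entries": 12305, "amino_acids": 3797482}
REL_13 = {"entries": 13837, "amino_acids": 4347336}


def percent_increase(old, new):
    """Return the relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


for key in ("entries", "amino_acids"):
    growth = percent_increase(REL_12[key], REL_13[key])
    print(f"{key}: {growth:.1f}%")
```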


   1.2  Source of data

   Release 13.0  has been  updated using protein sequence data from release
   22.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 21.0 of the
   EMBL Nucleotide Sequence Data Library.

   As an indication of the source of the sequence data in the SWISS-PROT
   data bank, we list here the statistics concerning the DR (Databank
   Reference) pointer lines:

   Entries with pointer(s) to only PIR entry(ies):          3104
   Entries with pointer(s) to only EMBL entry(ies):         5894
   Entries with pointer(s) to both EMBL and PIR entry(ies): 3989
   Entries with no pointer lines (entered in house):         850


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 11

   As the last SWISS-PROT release to be distributed to PC/Gene users was
   release 11, we list here the changes that were made in both release 12
   and release 13.


   2.1  Sequences and annotations

   Almost 3000 sequences have been added since release 11, the sequence
   data of 380 existing entries have been updated, and the annotations of
   4900 entries have been revised. In particular, we have used review
   articles to update the annotations of the following groups or families
   of proteins:

   -  11-S plant seed storage proteins
   -  3'5'-cyclic nucleotide phosphodiesterases
   -  Acyl carrier proteins
   -  Aldehyde dehydrogenases
   -  Aminoacyl-transfer RNA synthetases
   -  AraC family bacterial transcription regulation proteins
   -  Arginases
   -  Bacteriophage P22 proteins
   -  Bacteriophage T4 proteins
   -  Band 3 protein family
   -  Biotin-requiring enzymes
   -  Chloroplast photosystems I and II proteins
   -  Creatine kinases
   -  Crp family bacterial activator proteins
   -  Cyclins
   -  Dehydrins
   -  DNA mismatch repair proteins
   -  Endothelins / sarafotoxins
   -  Enolases
   -  Eukaryotic thiol proteases
   -  Extradiol ring-cleavage dioxygenases
   -  Flavodoxins
   -  Globins, annelids
   -  Globins, molluscs
   -  Glucose-6-phosphate dehydrogenases
   -  Glutaredoxins
   -  Glutamate dehydrogenases
   -  Glycerate and 3-phosphoglycerate dehydrogenases
   -  Glycophorins
   -  Granzymes
   -  GTP-binding elongation factors
   -  Heat-labile enterotoxins
   -  Heat shock hsp90 proteins
   -  Insulin family proteins
   -  Insulin-like growth factor binding proteins
   -  Insect larval cuticle proteins
   -  Insect-type alcohol dehydrogenases / ribitol dehydrogenase family
   -  Integrins
   -  Interleukin 7
   -  Iron-containing alcohol dehydrogenases
   -  Lysosome-associated membrane glycoproteins
   -  LysR family bacterial activator proteins
   -  L-lactate dehydrogenases
   -  Macrolide-lincosamide-streptogramin B resistance proteins
   -  Malate dehydrogenase
   -  Mammalian defensins
   -  MHC class II proteins
   -  Mitochondrial energy transfer proteins
   -  Myc-type proteins
   -  Nerve growth factors
   -  N-4 cytosine-specific DNA methylases
   -  Peroxidases
   -  Phosphoglucose isomerases
   -  Phosphoglycerate kinases
   -  Phospholipases A2
   -  Picornaviruses genome polyproteins
   -  Ribosomal proteins
   -  Rubredoxins
   -  Serine hydroxymethyltransferases
   -  Serine/threonine specific protein phosphatases
   -  Staphylococcal enterotoxins / Streptococcal pyrogenic exotoxins
   -  Sugar transporters
   -  Thaumatin family proteins
   -  Transferrins
   -  Tryptophan synthase alpha and beta chains
   -  Tyrosine protein kinases
   -  Uracil-DNA glycosylases
   -  Vertebrate galactoside-binding lectins
   -  Zinc-containing alcohol dehydrogenases


   2.2  New line-type

   Release 12  introduced a  new type  of data  line, the  OG line.  The OG
   (OrGanelle) lines  indicate  whether  the  gene  coding  for  a  protein
   originates from  the mitochondria,  the chloroplast,  or a  plasmid. The
   format for the OG line is:

   OG   CHLOROPLAST.
   OG   MITOCHONDRION.
   OG   PLASMID name.

   Where 'name' is the name of the plasmid.

   Previously this  information was  stored in the OS line, as shown in the
   example below.

   OS   WHEAT (TRITICUM AESTIVUM) CHLOROPLAST.

   The above example is now stored as:

   OS   WHEAT (TRITICUM AESTIVUM).
   OG   CHLOROPLAST.
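
   The conversion shown above can be sketched in Python as follows. This
   is only an illustration: the function name split_os_line is
   hypothetical, and the handling of MITOCHONDRION in old-style OS lines
   is an assumption made by analogy with the CHLOROPLAST example. Plasmid
   names are not handled because their old-style layout is not shown here.

```python
import re

# Old-style OS line: species name followed by a trailing organelle word.
# Only CHLOROPLAST is documented above; MITOCHONDRION is an assumption.
_OLD_OS = re.compile(
    r"^OS   (?P<species>.+?) (?P<organelle>CHLOROPLAST|MITOCHONDRION)\.$"
)


def split_os_line(line):
    """Convert an old-style OS line into new-style OS and OG lines."""
    match = _OLD_OS.match(line)
    if match is None:
        return [line]  # no organelle word to move; keep the line as is
    return [
        f"OS   {match['species']}.",
        f"OG   {match['organelle']}.",
    ]


for new_line in split_os_line("OS   WHEAT (TRITICUM AESTIVUM) CHLOROPLAST."):
    print(new_line)
```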


   2.3  New topic for the comments (CC) line type

   As of release 12 we have added a new 'topic' for the comments (CC) line-
   type: CAUTION.  It is  used to  indicate  that  possible  errors  and/or
   grounds for confusion may exist. Example of its usage:

   CC   -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE
   CC       TO A FRAMESHIFT.
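
   A program reading entries could collect these warnings along the
   following lines. This Python sketch assumes that every topic starts
   with '-!- ' and that continuation lines are CC lines without that
   marker, as in the example above. The function name caution_comments is
   hypothetical.

```python
def caution_comments(lines):
    """Collect the text of every CAUTION topic found in CC lines."""
    cautions = []
    current = None  # parts of the CAUTION topic being read, if any
    for line in lines:
        if not line.startswith("CC   "):
            current = None  # left the comment block
            continue
        body = line[5:]
        if body.startswith("-!- "):
            # A new topic starts; keep it only if it is CAUTION.
            topic, _, text = body[4:].partition(": ")
            current = [text] if topic == "CAUTION" else None
            if current is not None:
                cautions.append(current)
        elif current is not None:
            current.append(body.strip())  # continuation of a CAUTION topic
    return [" ".join(parts) for parts in cautions]


example = [
    "CC   -!- CAUTION: ALSO SEE VERSION 2 OF THIS PROTEIN THAT DIFFERS DUE",
    "CC       TO A FRAMESHIFT.",
]
print(caution_comments(example))
```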


   2.4  Small change in the RL lines for submissions

   RL lines for data submitted to EMBL or GenBank were previously
   represented by two subtypes, as illustrated in the following examples:

   RL   SUBMITTED (OCT-1989) TO THE EMBL DATA LIBRARY.
   RL   SUBMITTED (OCT-1989) TO GENBANK.

   Starting with release 13, all such lines use the following format:

   RL   SUBMITTED (OCT-1989) TO EMBL/GENBANK DATA BANKS.
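
   Software that still holds RL lines in either of the old subtypes could
   rewrite them to the unified format along these lines. This Python
   sketch assumes the date always has the MMM-YYYY layout seen in the
   examples; the function name normalize_rl is hypothetical.

```python
import re

# Matches the two old submission subtypes shown above.
_OLD_RL = re.compile(
    r"^RL   SUBMITTED \((?P<date>[A-Z]{3}-\d{4})\) TO "
    r"(?:THE EMBL DATA LIBRARY|GENBANK)\.$"
)


def normalize_rl(line):
    """Rewrite an old-style submission RL line to the release 13 format."""
    match = _OLD_RL.match(line)
    if match is None:
        return line  # not an old-style submission line; leave untouched
    return f"RL   SUBMITTED ({match['date']}) TO EMBL/GENBANK DATA BANKS."


print(normalize_rl("RL   SUBMITTED (OCT-1989) TO GENBANK."))
```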


   2.5  Documentation changes

   -  ACINDEX.TXT is a new document file which is an index of all the
      accession numbers which appear in SWISS-PROT and the names of the
      entries in which they occur.
   -  PDBTOSP.TXT is a new document file which is an index of all the
      Brookhaven PDB entries referenced in SWISS-PROT.
   -  The JOURLIST.TXT document now indicates the abbreviations and the
      full names of all journals cited in SWISS-PROT.

   Important: for more detailed information concerning the SWISS-PROT
   documentation please consult appendix B of these release notes.



     3. IMPORTANT NOTES CONCERNING SWISS-PROT RELEASE 13 AND PC/GENE 6.01

   3.1  The ryanodine receptor

   The rabbit skeletal muscle ryanodine receptor (RYNR$RABIT) is a protein
   of 5037 amino acid residues. PC/Gene release 6.01 can only analyze
   proteins of up to 5000 residues. This limit will be raised in the next
   major version (6.50). Until then, we have dealt with this protein in the
   following way: the sequence entry RYNR$RABIT was split into two parts.
   RYN1$RABIT contains the first 4563 residues (which correspond to the
   cytoplasmic domain), and RYN2$RABIT contains residues 4561 to 5037.

   Note: due  to this  modification there are 13838 sequence entries in the
   PC/Gene version  of the SWISS-PROT data bank (instead of 13837 as listed
   in section 1.1 of these release notes).
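
   The split can be illustrated with the short Python sketch below. The
   residue ranges are those quoted above, taken as 1-based and inclusive.
   The function name split_entry is hypothetical, and the sequence used
   is a placeholder of the right length, not the real RYNR$RABIT data.

```python
MAX_PCGENE_LENGTH = 5000  # limit of PC/Gene release 6.01, as stated above

# Residue ranges quoted in section 3.1 for RYN1$RABIT and RYN2$RABIT.
RYNR_RANGES = [(1, 4563), (4561, 5037)]


def split_entry(sequence, ranges):
    """Return one sub-sequence per (first, last) pair, 1-based inclusive."""
    parts = []
    for first, last in ranges:
        if not 1 <= first <= last <= len(sequence):
            raise ValueError(f"range {first}-{last} outside 1-{len(sequence)}")
        parts.append(sequence[first - 1:last])
    return parts


placeholder = "X" * 5037  # stand-in for the 5037-residue sequence
for part in split_entry(placeholder, RYNR_RANGES):
    print(len(part), len(part) <= MAX_PCGENE_LENGTH)
```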

   3.2  The OG line

   The OG line-type introduced in release 12 (see section 2.2) is not
   supported by release 6.01 of PC/Gene. This means that although these
   lines are present in the SWISS-PROT data bank (either on the CD-ROM disk
   or in the bulk files on the floppy disks), you cannot make use of them.
   Release 6.50 will fully support OG lines.


   Note to  CD-ROM users:  library files  containing the  names of  all the
   sequences  which   originate  from   either  the   mitochondria  or  the
   chloroplast are available on the CD-ROM (for more details see the CD-ROM
   release notes).



                            4. WE NEED YOUR HELP!

   We welcome feedback from our users. We would especially appreciate
   being notified if you find that sequences belonging to your field of
   expertise are missing from the data bank. We would also like to be
   notified about annotations that need updating, for example if the
   function of a protein has been clarified or if new post-translational
   information has become available.



                         APPENDIX A: SOME STATISTICS


   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.69   Gln (Q) 4.10   Leu (L) 9.11   Ser (S) 7.03
   Arg (R) 5.22   Glu (E) 6.29   Lys (K) 5.87   Thr (T) 5.84
   Asn (N) 4.42   Gly (G) 7.18   Met (M) 2.30   Trp (W) 1.33
   Asp (D) 5.23   His (H) 2.27   Phe (F) 3.93   Tyr (Y) 3.20
   Cys (C) 1.83   Ile (I) 5.37   Pro (P) 5.13   Val (V) 6.49

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.04


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
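
   The ranking above follows directly from the percentages in A.1.1, as
   the following Python sketch shows. The dictionary simply restates the
   table; the names COMPOSITION and ranking are illustrative only.

```python
# Percentages copied from table A.1.1 (the 20 standard amino acids only).
COMPOSITION = {
    "Ala": 7.69, "Gln": 4.10, "Leu": 9.11, "Ser": 7.03,
    "Arg": 5.22, "Glu": 6.29, "Lys": 5.87, "Thr": 5.84,
    "Asn": 4.42, "Gly": 7.18, "Met": 2.30, "Trp": 1.33,
    "Asp": 5.23, "His": 2.27, "Phe": 3.93, "Tyr": 3.20,
    "Cys": 1.83, "Ile": 5.37, "Pro": 5.13, "Val": 6.49,
}

# Sort the three-letter codes from most to least frequent.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```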



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 2032

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 916
                            2x: 389
                            3x: 187
                            4x: 127
                            5x:  87
                            6x:  67
                            7x:  32
                            8x:  28
                            9x:  44
                           10x:  20
                       11- 20x:  66
                       21-100x:  53
                         >100x:  16


        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        1200          Human
         2        1072          Escherichia coli
         3         685          Mouse
         4         576          Rat
         5         443          Baker's yeast (Saccharomyces cerevisiae)
         6         366          Bovine
         7         239          Fruit fly (Drosophila melanogaster)
         8         227          Chicken
         9         195          Rabbit
        10         160          Bacillus subtilis
        11         159          Pig
        12         131          African clawed frog (Xenopus laevis)
        13         122          Bacteriophage T4
        14         112          Salmonella typhimurium
        15         103          Tobacco
        16         102          Maize
        17          92          Rice
        18          84          Liverwort (Marchantia polymorpha)
        19          78          Wheat
        20          74          Staphylococcus aureus
        21          71          Vaccinia virus
        22          70          Herpes virus (Type 1, strain 17)
                    70          Pea
                    70          Spinach
        25          69          Soybean


   A.3  Distribution of the sequences by size

      From   To  Number             From   To   Number
         1-  50     923             1001-1100      112
        51- 100    1635             1101-1200       72
       101- 150    2474             1201-1300       59
       151- 200    1405             1301-1400       39
       201- 250    1113             1401-1500       33
       251- 300     960             1501-1600       16
       301- 350     842             1601-1700       15
       351- 400     815             1701-1800       13
       401- 450     618             1801-1900        9
       451- 500     678             1901-2000       15
       501- 550     532             2001-2100        6
       551- 600     329             2101-2200       18
       601- 650     243             2201-2300       18
       651- 700     173             2301-2400       10
       701- 750     174             2401-2500        8
       751- 800     110             >2500           26
       801- 850     101
       851- 900     121
       901- 950      60
       951-1000      62

   Currently the five largest sequences are:   RYNR$RABIT  5037 a.a.
                                               APB$HUMAN   4563 a.a.
                                               APOA$HUMAN  4548 a.a.
                                               DMD$HUMAN   3685 a.a.
                                               DMD$CHICK   3660 a.a.


                          APPENDIX B: DOCUMENTATION

   SWISS-PROT documentation consists of the following items:

   USERMAN .TXT   SWISS-PROT user manual
   SP13_REL.TXT   Release notes (this document)
   SHORTDES.TXT   Short description of entries in  SWISS-PROT (this document
                  contains  the  same  information as  that available  in the
                  catalog file (PROT_CAT.TXT) but it is formatted differently)
   JOURLIST.TXT   List of abbreviations for journals cited
   KEYWLIST.TXT   List of keywords in use
   SPECLIST.TXT   List of organism (species) identification codes
   SPECODES.TXT   List of sequence entry codes classified by species
   ACINDEX .TXT   Accession number index
   AUTINDEX.TXT   Author index
   CITINDEX.TXT   Citation index
   ECINDEX .TXT   Index of enzymes classified by their EC number
   EMBLTOSP.TXT   Index  of the  EMBL Data Library sequences referenced in
                  SWISS-PROT
   PDBTOSP .TXT   Index of Brookhaven PDB entries referenced in SWISS-PROT

   -  All these  document files  are available on the CD-ROM disk. They are
      stored in the '\DOC_DBAS\SPROT' directory.
   -  Except for AUTINDEX.TXT and SHORTDES.TXT, all the other document
      files are stored on the two SWISS-PROT documentation floppy disks.
   -  Some of  these documents  are also distributed in a printed form (see
      table below).

             Document         Documentation   Printed
                              Disk N#         copy
             ----------------------------------------
             USERMAN .TXT     1               Yes
             SP13_REL.TXT     1               Yes
             SHORTDES.TXT     N.A.            [*]
             JOURLIST.TXT     1               Yes
             KEYWLIST.TXT     1               Yes
             SPECLIST.TXT     1               Yes
             SPECODES.TXT     1               No
             ACINDEX .TXT     1               No
             AUTINDEX.TXT     N.A.            No
             CITINDEX.TXT     2               No
             ECINDEX .TXT     1               Yes
             EMBLTOSP.TXT     1               No
             PDBTOSP .TXT     1               Yes

   [*] The content of the catalog file (PROT_CAT.TXT) is provided in a
       printed form. It contains the same information as that available in
       SHORTDES.TXT, but it is formatted differently.



                       APPENDIX C: FLOPPY DISK VERSION

   C.1  IBM PC/AT 1.2 Mb disks

   SWISS-PROT release  13 is  stored on  eighteen 1.2  Mb disks.  Each  one
   contains a single bulk file (PRT13_01.BLK to PRT13_18.BLK):

   Disk     First sequence        Last sequence
    1       10KA$MYCTU            ATCD$RAT
    2       ATCE$PIG              CATL$CHICK
    3       CATL$HUMAN            CRAM$CRAAB
    4       CRB$DROME             DRN1$SHEEP
    5       DRNE$VIBCH            FM19$BACNO
    6       FM1A$ECOLI            H3$CAEEL
    7       H3$CHICK              HPIS$RHOTE
    8       HPIS$THIPF            KABL$MOUSE
    9       KAC$HUMAN             LPID$EDWTA
   10       LPIV$ECOLI            NEF$HIV1R
   11       NEF$HIV1S             PEPA$HUMAN
   12       PEPA$MACFU            PRTS$HUMAN
   13       PRTS$SERMA            RL7$DICDI
   14       RL7$ECOLI             SST2$YEAST
   15       ST12$YEAST            TRY1$ECOLI
   16       TRY1$HUMAN            VIP$GADMO
   17       VIP$GOAT              YP3$CHLTR
   18       YP4$CHLTR             ZP3$MOUSE
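
   Given the table above, the bulk file holding a given entry can be
   found with a binary search on the first entry of each disk. This
   Python sketch assumes that entries are ordered by plain ASCII
   comparison of their names, which is consistent with the table but is
   not stated in these notes. It does not check that the entry actually
   exists. The function name bulk_file_for is hypothetical.

```python
from bisect import bisect_right

# First entry of disks 1 to 18, copied from the table above.
FIRST_ENTRIES = [
    "10KA$MYCTU", "ATCE$PIG", "CATL$HUMAN", "CRB$DROME", "DRNE$VIBCH",
    "FM1A$ECOLI", "H3$CHICK", "HPIS$THIPF", "KAC$HUMAN", "LPIV$ECOLI",
    "NEF$HIV1S", "PEPA$MACFU", "PRTS$SERMA", "RL7$ECOLI", "ST12$YEAST",
    "TRY1$HUMAN", "VIP$GOAT", "YP4$CHLTR",
]
LAST_ENTRY = "ZP3$MOUSE"  # last entry of disk 18


def bulk_file_for(entry_name):
    """Return the bulk file name expected to contain entry_name."""
    if not FIRST_ENTRIES[0] <= entry_name <= LAST_ENTRY:
        raise ValueError(f"{entry_name} is outside the range of the disks")
    # Number of disks whose first entry sorts at or before entry_name.
    disk = bisect_right(FIRST_ENTRIES, entry_name)
    return f"PRT13_{disk:02d}.BLK"


print(bulk_file_for("DMD$HUMAN"))
```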


   C.2  IBM PS/2 1.4 Mb disks

   The number and content of the 1.4 Mb disks for the PS/2 systems are
   identical to those of the 1.2 Mb disks (see above).

   C.3  Catalog file

   The SWISS-PROT catalog file for PC/Gene (PROT_CAT.TXT) is stored on two
   disks (CATALOG disks 1 and 2). Insert the first disk in your floppy
   drive and type: INSCAT. Then follow the program instructions. You will
   be prompted to insert the second disk once the content of the first one
   has been copied.

   C.4  Documentation disks

   There are  two documentation  disks. The  content of  these two disks is
   described in appendix B.
