
             SWISS-PROT RELEASE 8.0 RELEASE NOTES


   Date:     August 4, 1988
   Author:   A. Bairoch


                         1. INTRODUCTION

   1.1  Evolution

   Release 8.0  of SWISS-PROT  contains 7724 sequence entries,
   comprising  2'224'465  amino  acids  abstracted  from  8088
   references. This represents  an increase of 13% in the number
   of entries over release 7.0. The recent growth of the data
   bank is summarised below:

   Release    Date   Number of entries   Number of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
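
   The 13% increase quoted above corresponds to the number of
   entries. As an illustrative check only (the names below are
   ours and are not part of the distribution), the growth
   between the last two releases can be recomputed from the
   table:

```python
# (release, number of entries, number of amino acids), copied from the table.
RELEASES = [
    ("3.0", 4160, 969641),
    ("4.0", 4387, 1036010),
    ("5.0", 5205, 1327683),
    ("6.0", 6102, 1653982),
    ("7.0", 6821, 1885771),
    ("8.0", 7724, 2224465),
]


def growth_percent(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


(_, prev_entries, prev_aa), (_, cur_entries, cur_aa) = RELEASES[-2], RELEASES[-1]
entry_growth = growth_percent(prev_entries, cur_entries)
residue_growth = growth_percent(prev_aa, cur_aa)
print(f"entries: +{entry_growth:.1f}%   amino acids: +{residue_growth:.1f}%")
```

   The number of amino acids grew somewhat faster than the
   number of entries over the same period.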



   1.2  Source of data

   Release 8.0  has been  updated using  protein sequence data
   from  release  16.0  of  the  PIR  (Protein  Identification
   Resource) protein  data bank,  as well  as  translation  of
   nucleotide sequence  data from  release 15.0  of  the  EMBL
   nucleotide sequence Data Library.

   As an  indication of the source of the sequence data in the
   SWISS-PROT data bank we list here the statistics concerning
   the DR (Databank Reference) pointer lines:

   Entries with pointer(s) to PIR entries only:            3385
   Entries with pointer(s) to EMBL entries only:           2084
   Entries with pointer(s) to both EMBL and PIR entries:   2057
   Entries with no pointer lines (entered in house):        198
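
   The four categories above are mutually exclusive, so they
   should add up to the total number of entries. A small sketch
   (illustrative only; the dictionary labels are ours) that
   recomputes the total and the share of each category:

```python
# Counts copied from the list above.
DR_COUNTS = {
    "PIR only": 3385,
    "EMBL only": 2084,
    "EMBL and PIR": 2057,
    "no pointer lines": 198,
}

total = sum(DR_COUNTS.values())
for label, count in DR_COUNTS.items():
    share = 100.0 * count / total
    print(f"{label:<18}{count:>6}{share:>7.1f}%")
print(f"{'total':<18}{total:>6}")
```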



     2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE
                            RELEASE 7


   2.1  Sequences and annotations

   Some 903  new sequences  have been  added  since  the  last
   release, the  sequence data of 93 existing entries has been
   updated and  the annotations  of almost  2400 entries  have
   been revised.  In particular we  have used review articles
   to update  the  annotations  of  the  following  groups  or
   families of proteins:

        1,6-fructose-bisphosphate aldolases.
        Alkaline phosphatases.
        Annexins.
        Endo- and Exo-glucanases.
        Bacterial ferredoxins.
        Beta-adrenergic receptors.
        Chemotaxis proteins.
        Collagens.
        Complement proteins.
        Cytochrome B5.
        Cytosolic lipid-binding proteins.
        Fumarases.
        Malate dehydrogenases.
        Muscarinic acetylcholine receptors.
        Opsins.
        Protamines.
        PTS system proteins.
        RNA polymerase sigma factors.
        Serpins.
        Small, acid-soluble spore proteins (SASP).
        Tachykinins.
        TGF-beta/Inhibins/MIS.
        Zinc-finger proteins.
        Zinc metallo endopeptidases.


   2.2  New feature table keys

   Two  new  feature  keys  have  been  introduced  with  this
   release:

   ZN_FING:   Extent of a "zinc-finger" region.

   NP_BIND:   Extent  of   a   nucleotide   phosphate  binding
              region. The  nature of  the nucleotide phosphate
              is indicated  in the description field. Examples
              of NP_BIND key feature lines:

   FT   NP_BIND      13     25       ATP.
   FT   NP_BIND      45     49       GTP (POTENTIAL).
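
   As a rough illustration of how such lines can be read by a
   program, the sketch below splits a feature line on white
   space. This is an assumption that suits the two examples
   above; it is not a description of the fixed column layout,
   and the names Feature and parse_ft_line are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str          # feature key, e.g. NP_BIND
    start: int        # first residue of the region
    end: int          # last residue of the region
    description: str  # free text, e.g. the nucleotide phosphate


def parse_ft_line(line: str) -> Feature:
    """Parse one FT line of the form shown in the examples above."""
    fields = line.split(None, 4)
    if len(fields) < 4 or fields[0] != "FT":
        raise ValueError(f"not a feature line: {line!r}")
    description = fields[4].strip() if len(fields) == 5 else ""
    return Feature(fields[1], int(fields[2]), int(fields[3]), description)


EXAMPLES = [
    "FT   NP_BIND      13     25       ATP.",
    "FT   NP_BIND      45     49       GTP (POTENTIAL).",
]
features = [parse_ft_line(line) for line in EXAMPLES]
for feature in features:
    print(feature.key, feature.start, feature.end, feature.description)
```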


   2.3  Enhancement in the format of the CC line

   A major  proportion of  the comment blocks are now arranged
   according to what we designate as 'topics'. The format of a
   comment block which belongs to a 'topic' is:

   CC   -!- TOPIC: FREE TEXT DESCRIPTION.

   The current topics and their definitions are:

   CATALYTIC ACTIVITY Description of the reaction(s) catalysed
                      by an enzyme [*].
   DISEASE            Description of the disease(s) associated
                      with a deficiency of a protein.
   FUNCTION           General description  of the  function(s)
                      of a protein.
   INDUCTION          Description  of  the  compound(s)  which
                      stimulate the synthesis of a protein.
   PATHWAY            Description of  the metabolic pathway(s)
                      with which a protein is associated.
   SIMILARITY         Description  of  the  similarities
                      (sequence or  structural) of  a  protein
                      with other proteins.
   SUBCELLULAR LOCATION
                      Description  of   the  subcellular
                      location of a mature protein product.
   SUBUNIT            Description of  the quaternary structure
                      of a protein.

   [*]  Wherever possible, we  have described  the catalytic
      activity of  an enzyme  following the recommendations of
      the Nomenclature Committee of the International Union of
      Biochemistry (IUB) as published in:

      Enzyme Nomenclature,  NC-IUB, Academic  Press, New York,
      (1984).
      and
      Supplement  1:   Corrections  and   Additions,  Eur.  J.
      Biochem. 157:1-26(1986).

   Here is,  for each  of the topics defined above, an example
   of its usage:

   CC   -!- CATALYTIC ACTIVITY: ATP + L-GLUTAMATE + NH(3) =
   CC       ADP + GLUTAMINE + ORTHOPHOSPHATE.
   CC   -!- DISEASE: NADH-CYTOCHROME B5 REDUCTASE DEFICIENCY
   CC       CAUSES HEREDITARY METHEMOGLOBINEMIA.
   CC   -!- FUNCTION: PREVENTS THE POLYMERIZATION OF ACTIN.
   CC   -!- INDUCTION: ARGINASE ACTIVITY CAN BE INDUCED BY
   CC       ARGININE OR HOMOARGININE.
   CC   -!- PATHWAY: FIRST STEP IN PROLINE BIOSYNTHESIS.
   CC   -!- SIMILARITY: WITH SUBTILISINS, THERMITASE, AND
   CC       PROTEINASE K.
   CC   -!- SUBCELLULAR LOCATION: MITOCHONDRIAL MATRIX.
   CC   -!- SUBUNIT: TETRAMER OF IDENTICAL CHAINS.
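
   The topic convention makes comment blocks easy to process
   by program. The following is a minimal sketch, not an
   official parser: it assumes that every block starts with
   '-!-', that any other CC line continues the current block,
   and that a topic is recognised only if it is one of those
   listed above. The function name parse_cc_blocks is
   hypothetical.

```python
TOPICS = {
    "CATALYTIC ACTIVITY", "DISEASE", "FUNCTION", "INDUCTION",
    "PATHWAY", "SIMILARITY", "SUBCELLULAR LOCATION", "SUBUNIT",
}


def parse_cc_blocks(lines):
    """Group CC lines into (topic, text) pairs."""
    blocks = []
    for line in lines:
        if not line.startswith("CC"):
            continue
        body = line[2:].strip()
        if body.startswith("-!-"):
            blocks.append(body[3:].strip())   # a new comment block
        elif blocks:
            blocks[-1] += " " + body          # continuation line
    comments = []
    for block in blocks:
        topic, sep, text = block.partition(": ")
        if sep and topic in TOPICS:
            comments.append((topic, text))
        else:
            comments.append(("", block))      # block without a known topic
    return comments


EXAMPLE = """\
CC   -!- CATALYTIC ACTIVITY: ATP + L-GLUTAMATE + NH(3) =
CC       ADP + GLUTAMINE + ORTHOPHOSPHATE.
CC   -!- DISEASE: NADH-CYTOCHROME B5 REDUCTASE DEFICIENCY
CC       CAUSES HEREDITARY METHEMOGLOBINEMIA.
CC   -!- FUNCTION: PREVENTS THE POLYMERIZATION OF ACTIN.
CC   -!- INDUCTION: ARGINASE ACTIVITY CAN BE INDUCED BY
CC       ARGININE OR HOMOARGININE.
CC   -!- PATHWAY: FIRST STEP IN PROLINE BIOSYNTHESIS.
CC   -!- SIMILARITY: WITH SUBTILISINS, THERMITASE, AND
CC       PROTEINASE K.
CC   -!- SUBCELLULAR LOCATION: MITOCHONDRIAL MATRIX.
CC   -!- SUBUNIT: TETRAMER OF IDENTICAL CHAINS.
""".splitlines()

comments = parse_cc_blocks(EXAMPLE)
for topic, text in comments:
    print(f"{topic}: {text}")
```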


   2.4  Small change in the format of the RN line

   The syntax  of RN  lines  for  articles  which  report  the
   sequence of a protein translated from a nucleotide sequence
   used to  be "SEQUENCE  TRANSL. FROM N.A. SEQ.", and has now
   been changed to "SEQUENCE FROM N.A."

   Examples:

   RN   [1] (SEQUENCE FROM N.A.)
   RN   [1] (SEQUENCE OF 12-143 FROM N.A.)
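
   Programs that read RN lines may need to recognise the new
   wording. The sketch below is one possible way to do so with
   regular expressions. It handles only the two forms shown in
   the examples, and the name parse_rn_line is hypothetical.

```python
import re

# Reference number in brackets, followed by a comment in parentheses.
RN_LINE = re.compile(r"^RN\s+\[(\d+)\]\s+\((.+)\)\s*$")
# New wording, with an optional residue range.
FROM_NA = re.compile(r"^SEQUENCE(?: OF (\d+)-(\d+))? FROM N\.A\.$")


def parse_rn_line(line):
    """Return (reference number, residue range or None) for an RN line
    written in the new 'SEQUENCE FROM N.A.' syntax, otherwise None."""
    line_match = RN_LINE.match(line)
    if line_match is None:
        return None
    comment = FROM_NA.match(line_match.group(2))
    if comment is None:
        return None
    number = int(line_match.group(1))
    if comment.group(1) is None:
        return number, None
    return number, (int(comment.group(1)), int(comment.group(2)))


print(parse_rn_line("RN   [1] (SEQUENCE FROM N.A.)"))
print(parse_rn_line("RN   [1] (SEQUENCE OF 12-143 FROM N.A.)"))
```

   A line that still uses the old wording does not match and
   yields None, which can be used to detect entries that have
   not been converted.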


   2.5  Changes in the documentation

   The document file SPECFREQ.TXT, which was used to list the
   frequency of usage of the species identification codes, has
   been discontinued. This information is now included in a
   new appendix of these release notes (Appendix B), along
   with other statistics such as the amino acid composition of
   the data bank and the distribution of the sequences by size
   (number of residues).



                       3. THE NEXT RELEASE

   SWISS-PROT release  9.0 will be available in December 1988.
   We plan to upgrade the OC lines so as to add at least a few
   levels of taxonomic nodes (currently  only the top node,
   viridae, prokaryota or eukaryota, is available).



                     4. WE NEED YOUR HELP!

   We welcome any feedback from our users. We would especially
   appreciate being notified if sequences belonging to your
   field of expertise are missing from the data bank. We would
   also like to be notified about annotations that need
   updating, for example if the function of a protein has been
   clarified or if new post-translational information has
   become available.



                APPENDIX A: DISKS FOR SWISS-PROT


   A.1  IBM PC/AT 1.2 Mb disks

   SWISS-PROT is  stored on  nine 1.2  Mb disks. Each of these
   disks  contains  a  single   bulk  file   (PROT8_01.BLK  to
   PROT8_09.BLK):

   Disk     First sequence        Last Sequence
    1       16K$TRVPS             CD8$HUMAN
    2       CD8$MOUSE             DYR$LACCA
    3       DYR$MESAU             HBA$CAVPO
    4       HBA$CEBAP             KNL1$BOVIN
    5       KNL1$HUMAN            NRAM$INATO
    6       NRAM$INATR            RASN$MOUSE
    7       RB1$HUMAN             TRA$MAIZE
    8       TRA$PSEAE             Y8K6$VACCV
    9       Y90K$VACCV            ZEG1$MAIZE
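
   The table above can be used to locate the disk that holds a
   given entry. The sketch below assumes that entries are
   stored in plain ASCII order of their names, which is
   consistent with the table but is not stated explicitly. The
   name bulk_file_for is hypothetical.

```python
from bisect import bisect_left

# Last entry on each 1.2 Mb disk, copied from the table above.
LAST_ON_DISK = [
    "CD8$HUMAN", "DYR$LACCA", "HBA$CAVPO", "KNL1$BOVIN", "NRAM$INATO",
    "RASN$MOUSE", "TRA$MAIZE", "Y8K6$VACCV", "ZEG1$MAIZE",
]


def bulk_file_for(entry_name):
    """Return (disk number, bulk file name) expected to hold entry_name."""
    index = bisect_left(LAST_ON_DISK, entry_name)
    if index == len(LAST_ON_DISK):
        raise ValueError(f"{entry_name} sorts after the last entry")
    disk = index + 1
    return disk, f"PROT8_{disk:02d}.BLK"


for name in ("APB$HUMAN", "CD8$MOUSE", "ZEG1$MAIZE"):
    print(name, *bulk_file_for(name))
```

   The function only reports where an entry of that name would
   be stored. It does not check that the entry exists.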


   A.2  IBM PS/2 1.4 Mb disks

   The number  and content  of the  1.4 Mb  disks for the PS/2
   systems are identical to those of the 1.2 Mb disks (see
   above).


   A.3  IBM PC 360 Kb disks

   SWISS-PROT is  stored on  twenty-eight  (28) 360  Kb disks.
   Each  of   these  disks   contains  a   single  bulk   file
   (PROT8_01.BLK to PROT8_28.BLK):

   Disk     First sequence        Last Sequence
    1       16K$TRVPS             AMEG$BOVIN
    2       AMID$XENLA            AZUR$ALCDE
    3       AZUR$ALCFA            CART$CHICK
    4       CAS1$BOVIN            COA1$POVMA
    5       COA1$SV40             CRG2$RANTE
    6       CRG2$RAT              CYS1$DICDI
    7       CYS2$DICDI            ELA2$MOUSE
    8       ELA2$PIG              FIBG$RAT
    9       FIBH$HUMAN            GLEM$HUMAN
   10       GLGA$ECOLI            HB2I$HUMAN
   11       HB2I$MOUSE            HEMA$MEASH
   12       HEMA$NDV              IG1R$HUMAN
   13       IGAO$DACDE            KAG2$MOUSE
   14       KAG3$MOUSE            KV5C$MOUSE
   15       KV5D$MOUSE            MAL3$DROME
   16       MALE$ECOLI            MYSB$RABIT
   17       MYSP$RAT              NUO5$PANPA
   18       NUO5$PONPY            PG40$HUMAN
   19       PGCA$CHICK            POLN$SFV
   20       POLN$SINDV            RADX$YEAST
   21       RAN1$SCHPO            ROB2$HUMAN
   22       ROC$HUMAN             SP0A$BACSU
   23       SP0B$BACSU            THRB$BOVIN
   24       THRB$HUMAN            TX1$ANESU
   25       TX11$NAJHH            VGE$BPS13
   26       VGF$BPG4              VNST$VSVJ
   27       VNST$VSVJM            YGYW$ECOLI
   28       YHL1$EPBAR            ZEG1$MAIZE


   A.4  IBM PS/2 720 Kb disks

   SWISS-PROT is stored on fourteen (14) 720 Kb disks. Each of
   these disks contains two of the 360 Kb format bulk files.



                   APPENDIX B: SOME STATISTICS


   B.1  Amino acid composition

        Composition in percent for the complete data bank

   Ala (A) 7.76   Gln (Q) 4.11   Leu (L) 9.07   Ser (S) 7.04
   Arg (R) 5.12   Glu (E) 6.12   Lys (K) 5.88   Thr (T) 5.86
   Asn (N) 4.36   Gly (G) 7.33   Met (M) 2.28   Trp (W) 1.36
   Asp (D) 5.21   His (H) 2.30   Phe (F) 3.95   Tyr (Y) 3.25
   Cys (C) 1.93   Ile (I) 5.28   Pro (P) 5.21   Val (V) 6.52

   Asx (B) 0.02   Glx (Z) 0.02   Xaa (X) 0.01


        Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Lys, Thr, Ile, Pro, Asp, Arg,
   Asn, Gln, Phe, Tyr, His, Met, Cys, Trp
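
   The classification above follows directly from the
   composition table. As an illustration (the names below are
   ours), it can be reproduced by sorting the percentages. Pro
   and Asp have the same value at the precision given (5.21),
   so their relative order cannot be recovered from the rounded
   figures.

```python
# Composition in percent, copied from the table above.
COMPOSITION = {
    "Ala": 7.76, "Arg": 5.12, "Asn": 4.36, "Asp": 5.21, "Cys": 1.93,
    "Gln": 4.11, "Glu": 6.12, "Gly": 7.33, "His": 2.30, "Ile": 5.28,
    "Leu": 9.07, "Lys": 5.88, "Met": 2.28, "Phe": 3.95, "Pro": 5.21,
    "Ser": 7.04, "Thr": 5.86, "Trp": 1.36, "Tyr": 3.25, "Val": 6.52,
}

# Most frequent amino acid first.
ranking = sorted(COMPOSITION, key=COMPOSITION.get, reverse=True)
print(", ".join(ranking))
```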


   B.2   Distribution of  the sequences  by their organism of
         origin

   Total number  of species represented in this release of the
   data bank: 1345

   Species represented 1x:  636
                       2x:  259
                       3x:  140
                       4x:   88
                       5x:   40
                       6x:   39
                       7x:   25
                       8x:   23
                       9x:   15
                      10x:    8
                     >10x:   72


        Table of the most represented species

   739: HUMAN       133: CHICK       62: BACSU       55: TOBAC
   628: ECOLI       125: RABIT       62: LAMBD       54: BPT7
   414: MOUSE       117: DROME       59: EPBAR       52: MARPO
   312: RAT         113: PIG         59: MAIZE
   256: YEAST        68: XENLA       58: SALTY
   250: BOVIN        66: BPT4        58: VACCV

   B.3  Distribution of the sequences by size

   From   To Number             From   To Number
      1-  50    384             1001-1100     36
     51- 100   1000             1101-1200     30
    101- 150   1743             1201-1300     26
    151- 200    856             1301-1400     18
    201- 250    581             1401-1500     12
    251- 300    494             1501-1600      2
    301- 350    450             1601-1700     10
    351- 400    432             1701-1800      7
    401- 450    309             1801-1900      5
    451- 500    350             1901-2000      2
    501- 550    284             >2000         42
    551- 600    166
    601- 650    123             Currently the two largest sequences are:
    651- 700     74             APB$HUMAN   4563 a.a.
    701- 750     76             APOA$HUMAN  4548 a.a.
    751- 800     55
    801- 850     46
    851- 900     56
    901- 950     36
    951-1000     22
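
   The size classes used in the table above are 50 residues
   wide up to 1000 residues and 100 residues wide from 1001 to
   2000 residues. A minimal sketch of the corresponding
   assignment (the name size_class is hypothetical):

```python
def size_class(length):
    """Return the label of the size class for a sequence length."""
    if length < 1:
        raise ValueError("length must be a positive number of residues")
    if length > 2000:
        return ">2000"
    width = 50 if length <= 1000 else 100
    first = (length - 1) // width * width + 1
    return f"{first}-{first + width - 1}"


# The two largest sequences listed above.
for name, length in (("APB$HUMAN", 4563), ("APOA$HUMAN", 4548)):
    print(name, size_class(length))
```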
