
             SWISS-PROT RELEASE 5.0 RELEASE NOTES


   Date:     September 8, 1987
   Author:   A. Bairoch


                          INTRODUCTION

   Release 5.0 of SWISS-PROT contains 5205 sequence entries,
   comprising 1'327'683 amino acids abstracted from 5673
   references. This represents an increase of 28% in the
   number of amino acids (and of about 19% in the number of
   entries) over release 4.0. The recent growth of the data
   bank is summarized below:

   Release      Date         Number of entries           Amino acids

   3.0          11/86        4160                            969 641
   4.0          04/87        4387                          1 036 010
   5.0          09/87        5205                          1 327 683
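
   The growth figures can be rechecked directly from the table.
   The short Python sketch below does so; the names `releases`
   and `growth_percent` are illustrative only and are not part
   of any SWISS-PROT software.

```python
# Figures copied from the growth table: release -> (entries, amino acids).
releases = {
    "3.0": (4160, 969_641),
    "4.0": (4387, 1_036_010),
    "5.0": (5205, 1_327_683),
}


def growth_percent(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


old_entries, old_residues = releases["4.0"]
new_entries, new_residues = releases["5.0"]

entry_growth = growth_percent(old_entries, new_entries)
residue_growth = growth_percent(old_residues, new_residues)

print(f"entries:     +{entry_growth:.1f}%")
print(f"amino acids: +{residue_growth:.1f}%")
```

   Rounded to the nearest percent, the amino-acid count grows
   by 28% between releases 4.0 and 5.0, and the entry count by
   19%.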


                         SOURCE OF DATA

   Release 5.0  has been  updated using  protein sequence data
   from release  12.0 of the PIR protein data bank, as well as
   translation of nucleic-acid sequence data from release 12.0
   of the EMBL nucleotide Data Library. The statistics for the
   source of the sequences in release 5.0 of SWISS-PROT are:

   Entries adapted from PIR entries                   4419
   Entries translated from EMBL entries                708
   Entries entered "in house"                           78


   DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 4


   New sequence data

   To make  SWISS-PROT as complete as possible we have scanned
   the EMBL  Data Library  for sequences  not yet available in
   the protein  data bank and we have translated and annotated
   some 700 additional sequences.

   In addition  to those  new  sequences,  some  200  existing
   entries have  been updated  with sequence  data  translated
   from EMBL entries.


   We have also started to add DR lines to entries which were
   sequenced at the DNA or RNA level. These lines point to the
   primary accession number of the corresponding nucleotide
   sequence entry in EMBL. The current statistics for DR lines
   are as follows:

   Entries with pointer line(s) to a PIR entry        4419
   Entries with pointer line(s) to an EMBL entry      1512
   Entries with pointer line(s) to a PDB entry         140


   Modifications in the feature keys

   -  A new feature key has been introduced with this release:
      MUTAGEN which  is used to describe sites which have been
      experimentally altered.

   -  The description  part of  the CONFLICT  and VARIANT keys
      now follows a precise format.

      a) If the described position is missing, the description
         part follows the format shown in the following
         examples:

      FT   CONFLICT     33     33       MISSING (IN REF. 2).
      FT   VARIANT      12    245       MISSING (IN SMALL VARIANT).

      b) Positions with amino acid change(s) are now described
         as shown in the following examples:

      FT   CONFLICT     60     60       P -> A (IN REF. 2).
      FT   CONFLICT      1      5       ASTQS -> GAWTL (IN REF. 3).
      FT   VARIANT      39     39       P -> G (IN 50% OF THE MOLECULES).


   Format modifications

   We have  made a number of small modifications to the format
   of SWISS-PROT  to make  it more compatible with that of the
   EMBL Data Library. Those format changes are the following:

   -  Journal abbreviations  are now  identical in  both  data
      banks.
   -  The format  of the  RL line  for submitted  data and for
      Ph.D. thesis  has been  modified to  be compatible  with
      that used in the EMBL Data Library.
   -  In the  DT line,  there are  now two  spaces between the
      date and  the comment  part (instead  of one in previous
      releases).


   Organism identification code changes

   We have modified the organism identification codes of some
   viruses to make them compatible with the abbreviations
   commonly used. Among the codes changed are those of the AIDS
   viruses, which now have organism codes starting with HIV1 or
   HIV2.


   Various changes

   -  Keywords have  been updated  in a  significant number of
      entries.
   -  Cytochrome P450 entry names and descriptions have been
      completely revised to take into account the new
      nomenclature.
   -  The annotations of a number of protein families have been
      updated and completed using recent reviews (among those
      reviewed are: cystatins, ubiquitins, RAS oncogenes,
      "EF-hand" calcium-binding proteins, ferritins,
      serine/threonine protein kinases, GTP-binding proteins,
      etc.).



                        THE NEXT RELEASE

   SWISS-PROT release 6.0 will probably be ready at the end of
   January or the beginning of February 1988. It will include
   new data from PIR release 13 and a significant number of
   additional sequences translated from EMBL release 13.

   By that release, all the sequences stored in SWISS-PROT
   which were sequenced at the DNA or RNA level should have DR
   lines pointing to the primary accession number of the
   nucleotide sequence entry in EMBL.

   We also  plan to  add some new feature keys and to make the
   keyword usage in SWISS-PROT more consistent.



                APPENDIX A: DISKS FOR SWISS-PROT


   IBM PC/AT 1.2Mb disks

   -  SWISS-PROT is stored on six 1.2 Mb disks.
   -  Disks 1 to 5 each contain a single bulk file
      (PROT5_01.BLK to PROT5_05.BLK) holding sequences adapted
      from entries in the PIR data bank. These sequences are
      stored in the order in which they are found in the PIR
      data bank.
   -  Disk 6 contains two bulk files: PROT5_NW.BLK and
      PROT5_PR.BLK. The first holds the new sequences
      translated from EMBL Data Library entries; the second
      holds a few "preliminary" entry sequences.


   IBM PC 360 Kb disks

   -  SWISS-PROT is stored on eighteen 360 Kb disks.
   -  Disks 1 to 15 each contain a single bulk file
      (PROT5_01.BLK to PROT5_15.BLK) holding sequences adapted
      from entries in the PIR data bank. These sequences are
      stored in the order in which they are found in the PIR
      data bank.
   -  Disks 16 and 17 each contain a single bulk file
      (PROT5_N1.BLK and PROT5_N2.BLK) holding new sequences
      translated from EMBL Data Library entries.
   -  Disk 18 contains two bulk files: PROT5_N3.BLK and
      PROT5_PR.BLK. The first holds the last part of the new
      sequences translated from EMBL Data Library entries; the
      second holds a few "preliminary" entry sequences.
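
   For checking that a disk set is complete, the two layouts
   described above can be written out as a small table. The
   Python sketch below lists the expected bulk file names per
   disk; the function name `bulk_files` and the format labels
   "1.2Mb" and "360Kb" are illustrative assumptions, not part
   of the distribution.

```python
def bulk_files(disk_format):
    """Return {disk number: [bulk file names]} for a release 5.0 disk set."""
    if disk_format == "1.2Mb":
        pir_count = 5
        embl_only = []
        last_disk = ["PROT5_NW.BLK", "PROT5_PR.BLK"]
    elif disk_format == "360Kb":
        pir_count = 15
        embl_only = ["PROT5_N1.BLK", "PROT5_N2.BLK"]
        last_disk = ["PROT5_N3.BLK", "PROT5_PR.BLK"]
    else:
        raise ValueError(f"unknown disk format: {disk_format!r}")

    layout = {}
    # Disks holding the sequences adapted from PIR entries, one file each.
    for disk in range(1, pir_count + 1):
        layout[disk] = [f"PROT5_{disk:02d}.BLK"]

    # Disks holding only sequences translated from EMBL entries.
    disk = pir_count
    for name in embl_only:
        disk += 1
        layout[disk] = [name]

    # Final disk: remaining EMBL translations plus the preliminary entries.
    layout[disk + 1] = last_disk
    return layout


for fmt in ("1.2Mb", "360Kb"):
    layout = bulk_files(fmt)
    print(fmt, "->", len(layout), "disks")
    for disk, names in layout.items():
        print(f"  disk {disk:2d}: {', '.join(names)}")
```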
