


                    SWISS-PROT RELEASE 24.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 24.0  of SWISS-PROT  contains 28154 sequence entries, comprising
   9'545'427 amino  acids abstracted from 27750 references. This represents
   an increase  of 5.9%  in the  number of  amino acids  over release  23.
   The recent growth of the data bank is summarized below.

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
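
   As an illustration, the growth figure quoted above can be recomputed
   from the last two rows of the table. This is only a minimal sketch; the
   function name percent_increase is our own and is not part of any
   SWISS-PROT tool.

```python
def percent_increase(old: int, new: int) -> float:
    """Return the growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Figures for releases 23.0 and 24.0, copied from the table above.
entries_23, residues_23 = 26706, 9_011_391
entries_24, residues_24 = 28154, 9_545_427

entry_growth = percent_increase(entries_23, entries_24)
residue_growth = percent_increase(residues_23, residues_24)
print(f"entries: {entry_growth:.1f}%  amino acids: {residue_growth:.1f}%")
```

   The 5.9% figure corresponds to the amino acid count; the number of
   entries grew slightly less over the same period.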

   1.2  Source of data

   Release 24.0  has been  updated using protein sequence data from release
   34.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 33.0 of the
   EMBL Nucleotide Sequence Database.

   As an  indication of  the source  of the sequence data in the SWISS-PROT
   data bank we list here the statistics concerning the DR (Database cross-
   references) pointer lines:

   Entries with pointer(s) to only PIR entries:            4411
   Entries with pointer(s) to only EMBL entries:           3691
   Entries with pointer(s) to both EMBL and PIR entries:  19493
   Entries with no pointer lines:                           559
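
   Statistics of this kind could be gathered with a short script such as
   the sketch below. It assumes that each DR line starts with "DR"
   followed by three spaces and the database name up to the first
   semicolon, and that "//" terminates an entry. The function name, the
   category labels and the three sample entries are invented for the
   example, and "none" is taken here to mean "no pointer to PIR or EMBL".

```python
from collections import Counter


def classify_entries(flat_file_text: str) -> Counter:
    """Count entries by the databases their DR lines point to."""
    counts: Counter = Counter()
    databases: set[str] = set()
    for line in flat_file_text.splitlines():
        if line.startswith("DR   "):
            # Assumption: the database name is the first ';'-terminated token.
            databases.add(line[5:].split(";", 1)[0].strip())
        elif line.startswith("//"):
            # '//' closes an entry: classify it, then reset for the next one.
            has_pir = "PIR" in databases
            has_embl = "EMBL" in databases
            if has_pir and has_embl:
                counts["both"] += 1
            elif has_pir:
                counts["PIR only"] += 1
            elif has_embl:
                counts["EMBL only"] += 1
            else:
                counts["none"] += 1
            databases = set()
    return counts


# Three invented entries; identifiers and accession numbers are placeholders.
SAMPLE = """\
ID   TEST1_HUMAN
DR   EMBL; X00001; HSTEST1.
DR   PIR; A00001; A00001.
//
ID   TEST2_ECOLI
DR   PIR; B00002; B00002.
//
ID   TEST3_YEAST
//
"""

print(classify_entries(SAMPLE))
```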


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 23


   2.1  Sequences and annotations

   About 1466 sequences have been added since release 23, the sequence data
   of 196  existing entries  has been  updated and  the annotations of 3300
   entries have  been revised.  In particular we have used review  articles
   to update  the annotations  of  the  following  groups  or  families  of
   proteins:

   -  14-3-3 proteins
   -  5'-nucleotidases
   -  7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase (HPPK)
   -  Actin-capping proteins alpha subunits
   -  Bacterial regulatory proteins, crp/fnr family
   -  Beta-lactamases class B
   -  Calreticulins and calnexins
   -  Chaperonins TCP-1
   -  Chitinases
   -  Clusterins
   -  Cold shock proteins
   -  Dihydropteroate synthase (DHPS)
   -  DNA-directed DNA polymerases (C family)
   -  Glutamyl-tRNA reductases
   -  Glycoprotein hormones
   -  Granulins
   -  Guanine-nucleotide releasing factors CDC24 family
   -  Hirudins
   -  Leader peptidase family
   -  Neurotransmitter transporters
   -  Pancreatic hormone family
   -  Prokaryotic-type release factors
   -  Prolyl endopeptidases
   -  Receptor tyrosine kinase class V (eph, eck, elk, etc.)
   -  secY proteins
   -  Serine proteases, subtilisin family (subtilases)
   -  Transketolases
   -  Transcription factor TFIIB
   -  Transthyretins
   -  Visual pigments (opsins)
   -  Wnt-1 family

   2.2  Weekly update of SWISS-PROT

   Starting with  this release  we are  providing weekly  updates of SWISS-
   PROT. These  updates are available by anonymous FTP. Three files will be
   updated every week:

   new_seq.dat    Contains all the new entries since the last full release.
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release.
   upd_ann.dat    Contains the  entries for  which one  or more  annotation
                  fields have been updated since the last release.

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   EMBL ftp server
   Address        ftp.embl-heidelberg.de (or 192.54.41.33)
   Directory      /pub/databases/swissprot/new

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates
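
   For users who prefer to script the retrieval, the following sketch
   shows one possible way to fetch the three update files by anonymous
   FTP with the Python standard library. The host names, directories and
   file names are those listed above; the function name fetch_updates and
   the dictionary keys are our own, and nothing here guarantees that a
   given server is reachable at the time the script is run.

```python
import os
from ftplib import FTP

# Host and directory pairs copied from the server list above.
SERVERS = {
    "EMBL": ("ftp.embl-heidelberg.de", "/pub/databases/swissprot/new"),
    "ExPASy": ("expasy.hcuge.ch", "/databases/swiss-prot/updates"),
    "NCBI": ("ncbi.nlm.nih.gov", "/repository/swiss-prot/updates"),
}
UPDATE_FILES = ("new_seq.dat", "upd_seq.dat", "upd_ann.dat")


def fetch_updates(server: str, dest_dir: str = ".") -> list[str]:
    """Download the three weekly update files from one of the servers."""
    host, directory = SERVERS[server]
    saved = []
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(directory)
        for name in UPDATE_FILES:
            path = os.path.join(dest_dir, name)
            with open(path, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)
            saved.append(path)
    return saved


# Example call (needs network access, so it is left commented out):
# fetch_updates("ExPASy", dest_dir="/tmp")
```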


   !!! Important notes !!!

   Although we  are going  to try  to follow  a regular schedule, we do not
   promise to  update these  files every week. In some cases two weeks will
   elapse between two updates.

   Due to  the current  mechanism used  to build a release the entries that
   are provided in these updates are not guaranteed to be error free. Also,
   for the  same reason,  new entries  will not  contain  an  OC  (Organism
   Classification) line.


                    3. CHANGES PLANNED FOR FUTURE RELEASES

   3.1  Change in the RA line concerning the author names format

   As from  release 25  in March  1993 we  will change the format of author
   names on  RA lines  to conform  to  that  used  by  major  bibliographic
   databases such  as Medline.  The main  change is  that the  periods  and
   hyphens ("-") which currently appear within initials will not appear any
   more. For example, the current:

   RA   Wilson A.C., Smith J.-C.;

   will appear as:

   RA   Wilson AC, Smith JC;
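
   Programs that need to convert existing RA lines to the planned format
   could proceed along the lines of the sketch below. It assumes that
   authors are separated by a comma and a space, and that the initials are
   the last blank-separated token of each name. The function name
   convert_ra_line is our own; names carrying a suffix after the initials
   are not given any special treatment.

```python
def convert_ra_line(line: str) -> str:
    """Rewrite one RA line from the current format to the planned one."""
    prefix, body = line[:5], line[5:].rstrip()
    terminator = ""
    if body.endswith((";", ",")):
        # ';' ends the author list, ',' marks a continuation line.
        terminator, body = body[-1], body[:-1]
    authors = []
    for author in body.split(", "):
        surname, sep, initials = author.rpartition(" ")
        if not sep:
            # No blank found: leave the name untouched.
            authors.append(author)
            continue
        # Periods and hyphens are dropped from the initials only.
        initials = initials.replace(".", "").replace("-", "")
        authors.append(f"{surname} {initials}")
    return prefix + ", ".join(authors) + terminator


print(convert_ra_line("RA   Wilson A.C., Smith J.-C.;"))
```

   Hyphens in the surname itself are left untouched, since only the
   initials are rewritten.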



                            4. ENZYME AND PROSITE

   4.1  The ENZYME data bank

   Release 11.0  of the  ENZYME data bank is distributed with release 24 of
   SWISS-PROT. ENZYME  release 11.0  contains information  relative to 3489
   enzymes. The  data bank  has been  significantly modified  to take  into
   account the  information available  in the  new edition of the IUPAC-IUB
   Enzyme Nomenclature  book which  describes many  new enzymes and updates
   the information concerning existing ones.

   4.2  The PROSITE data bank

   Release 10.0  of the PROSITE data bank is distributed with release 24 of
   SWISS-PROT.  Release  10.0  contains  635  documentation  chapters  that
   describe 803  different patterns.  Since  the  last  major  release  of
   PROSITE (release 9.00 of June 1992), 55 new chapters have been added and
   about 255 chapters have been updated. The new chapters are listed below.

   -  14-3-3 proteins signatures
   -  5'-nucleotidase signatures
   -  7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature
   -  Aminotransferases class-IV signature
   -  AP endonucleases family 1 signatures
   -  AP endonucleases family 2 signatures
   -  ArgE / dapE / CPG2 family signatures
   -  Barwin domain signatures
   -  Beta-lactamases class B signatures
   -  Calreticulin family signatures
   -  Chaperonins TCP-1 signatures
   -  Chitinases class I signatures
   -  Chorismate synthase signatures
   -  Dihydropteroate synthase signatures
   -  Electron transfer flavoprotein alpha-subunit signature
   -  Endonuclease III iron-sulfur binding region signature
   -  Enterobacterial virulence outer membrane protein signatures
   -  Formate--tetrahydrofolate ligase signatures
   -  F-actin capping protein alpha subunit signatures
   -  Germin family signature
   -  GltP / dctA family of transporters signatures
   -  Glutamyl-tRNA reductase signature
   -  Glycoprotein hormones alpha chain signatures
   -  Glycosyl hydrolases family 11 active site signatures
   -  Glycosyl hydrolases family 3 active site
   -  Granulins signature
   -  Granulocyte-macrophage colony-stimulating factor signature
   -  Guanine-nucleotide dissociation stimulators CDC24 family signature
   -  Guanine-nucleotide dissociation stimulators CDC25 family signature
   -  Involucrin signature
   -  Phosphoglucomutase & phosphomannomutase phosphohistidine signature
   -  Prokaryotic ornithine and lysine decarboxylases pyridoxal-phosphate
   -  Prokaryotic-type carbonic anhydrases signatures
   -  Prokaryotic-type peptide chain release factors signature
   -  Prolyl endopeptidase family serine active site
   -  Protein secY signatures
   -  Receptor tyrosine kinase class V signatures
   -  Riboflavin synthase alpha chain family Lum-binding site signature
   -  Ribosomal protein L13 signature
   -  Ribosomal protein L30e signature
   -  Ribosomal protein L34 signature
   -  Ribosomal protein S16 signature
   -  Ribosomal protein S17e signature
   -  Ribosomal protein S26e signature
   -  Sigma-54 factors family signatures
   -  Sigma-70 factors family signatures
   -  Single-strand binding protein family signatures
   -  Stress-induced proteins SRP1/TIP1 family signature
   -  S-adenosyl-L-homocysteine hydrolase signatures
   -  Tetrahydrofolate dehydrogenase/cyclohydrolase signatures
   -  Transcription factor TFIIB repeat signature
   -  Transketolase signatures
   -  Transthyretin signatures
   -  WHEP-TRS domain signature
   -  XPAC protein signatures



                            5. WE NEED YOUR HELP!

   We welcome feedback from our users. We would especially appreciate it if
   you notified us when sequences belonging to your field of expertise are
   missing from the data bank. We would also like to be notified about
   annotations that need updating, for example when the function of a
   protein has been clarified or when new post-translational information
   has become available.


                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.68   Gln (Q) 4.03   Leu (L) 9.15   Ser (S) 7.07
   Arg (R) 5.25   Glu (E) 6.25   Lys (K) 5.80   Thr (T) 5.85
   Asn (N) 4.43   Gly (G) 7.11   Met (M) 2.34   Trp (W) 1.30
   Asp (D) 5.26   His (H) 2.26   Phe (F) 3.97   Tyr (Y) 3.22
   Cys (C) 1.81   Ile (I) 5.50   Pro (P) 5.07   Val (V) 6.51

   Asx (B) 0.01   Glx (Z) 0.01   Xaa (X) 0.03


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Ser, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
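
   The classification in A.1.2 follows directly from the percentages in
   A.1.1. As a small check, the sketch below sorts the twenty standard
   amino acids by the values copied from the table; the variable names are
   our own.

```python
# Percentages copied from table A.1.1 (standard amino acids only).
composition = {
    "Ala": 7.68, "Gln": 4.03, "Leu": 9.15, "Ser": 7.07,
    "Arg": 5.25, "Glu": 6.25, "Lys": 5.80, "Thr": 5.85,
    "Asn": 4.43, "Gly": 7.11, "Met": 2.34, "Trp": 1.30,
    "Asp": 5.26, "His": 2.26, "Phe": 3.97, "Tyr": 3.22,
    "Cys": 1.81, "Ile": 5.50, "Pro": 5.07, "Val": 6.51,
}

# Most frequent residue first.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```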



   A.2  Distribution of the sequences by organism of origin

   Total number of species represented in this release of SWISS-PROT: 3698

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1665
                            2x:  617
                            3x:  359
                            4x:  236
                            5x:  153
                            6x:  117
                            7x:   86
                            8x:   65
                            9x:   76
                           10x:   31
                       11- 20x:  147
                       21- 50x:   86
                       51-100x:   24
                         >100x:   36

        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        2094          Human
         2        1991          Escherichia coli
         3        1297          Mouse
         4        1198          Rat
         5        1092          Baker's yeast (Saccharomyces cerevisiae)
         6         576          Bovine
         7         496          Fruit fly (Drosophila melanogaster)
         8         448          Chicken
         9         423          Bacillus subtilis
        10         318          African clawed frog (Xenopus laevis)
        11         316          Salmonella typhimurium
        12         303          Rabbit
        13         278          Pig
        14         251          Vaccinia virus (strain Copenhagen)
        15         209          Maize
        16         193          Human cytomegalovirus (strain AD169)
        17         168          Bacteriophage T4
        18         162          Vaccinia virus (strain WR)
        19         158          Rice
        20         147          Tobacco
        21         140          Wheat
        22         136          Pseudomonas aeruginosa
        23         134          Arabidopsis thaliana (Mouse-ear cress)
                   134          Pea
        25         128          Barley
        26         120          Staphylococcus aureus
        27         117          Fission yeast (Schizosaccharomyces pombe)
                   117          Marchantia polymorpha (Liverwort)
        29         116          Spinach
        30         113          Sheep
        31         111          Slime mold (Dictyostelium discoideum)
                   111          Soybean
        33         109          Caenorhabditis elegans
        34         104          Dog
        35         103          Neurospora crassa
        36         101          Pseudomonas putida
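
   In the table above, species with the same frequency share a rank and
   the following rank is skipped (for example 23, 23, 25). A minimal
   sketch of that numbering scheme, with a function name of our own
   choosing, is given below; the frequencies are copied from the table.

```python
def competition_ranks(frequencies: list[int]) -> list[int]:
    """Rank a descending list; ties share a rank, the next rank is skipped."""
    ranks: list[int] = []
    for position, value in enumerate(frequencies, start=1):
        if ranks and value == frequencies[position - 2]:
            # Same frequency as the previous species: reuse its rank.
            ranks.append(ranks[-1])
        else:
            ranks.append(position)
    return ranks


# Frequencies from table A.2.2, already in descending order.
frequencies = [
    2094, 1991, 1297, 1198, 1092, 576, 496, 448, 423, 318, 316, 303,
    278, 251, 209, 193, 168, 162, 158, 147, 140, 136, 134, 134, 128,
    120, 117, 117, 116, 113, 111, 111, 109, 104, 103, 101,
]
ranks = competition_ranks(frequencies)
print(ranks[21:25])
```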


   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    1706             1001-1100      270
                 51- 100    2911             1101-1200      157
                101- 150    4223             1201-1300      133
                151- 200    2703             1301-1400       86
                201- 250    2317             1401-1500       71
                251- 300    2091             1501-1600       38
                301- 350    1909             1601-1700       36
                351- 400    1878             1701-1800       33
                401- 450    1423             1801-1900       36
                451- 500    1633             1901-2000       27
                501- 550    1095             2001-2100       10
                551- 600     786             2101-2200       33
                601- 650     543             2201-2300       40
                651- 700     402             2301-2400       13
                701- 750     385             2401-2500       15
                751- 800     309             >2500           78
                801- 850     227
                851- 900     236
                901- 950     145
                951-1000     156


   Currently the ten largest sequences are:


                            RYNR_RABIT  5037 a.a.
                            RYNR_HUMAN  5032 a.a.
                            APB_HUMAN   4563 a.a.
                            APOA_HUMAN  4548 a.a.
                            DYHC_TRIGR  4466 a.a.
                            POLG_BVDV   3988 a.a.
                            VGF1_IBVB   3951 a.a.
                            POLG_HCVA   3898 a.a.
                            POLG_HCVB   3898 a.a.
                            ACVT_PENCH  3791 a.a.
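
   The size classes used in the table above are 50 residues wide up to
   1000, 100 residues wide from 1001 to 2500, with a single class beyond
   that. The sketch below shows one way to assign a sequence length to its
   class; the function name size_bin is our own, and the lengths used as
   sample data are those of the ten largest sequences listed above.

```python
from collections import Counter


def size_bin(length: int) -> str:
    """Return the label of the size class that a sequence length falls in."""
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    # Classes are 50 wide up to 1000 residues and 100 wide above that.
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1
    return f"{low}-{low + width - 1}"


# Lengths of the ten largest sequences, copied from the list above.
largest = [5037, 5032, 4563, 4548, 4466, 3988, 3951, 3898, 3898, 3791]
histogram = Counter(size_bin(n) for n in largest)
print(histogram)
```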


                         APPENDIX B: ON-LINE EXPERTS


   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise            Name               Email address
---------------------------   ------------------ -----------------------------
Alcohol dehydrogenases        Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt.persson@embl-heidelberg.de
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt.persson@embl-heidelberg.de
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl@caos.caos.kun.nl
                              de Jong W.         u629000@hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred@blekul13.bitnet
AA-tRNA synthetases class II  Leberman R.        leberman@frembl51.bitnet
Apolipoproteins               Boguski M.S.       boguski@ncbi.nlm.nih.gov
Arrestins                     Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
ATP synthase c subunit        Recipon H.         recipon@ncbi.nlm.nih.gov
Band 4.1 family proteins      Rees J.            jrees@vax.oxford.ac.uk
Beta-lactamases               Brannigan J.       jab5@vaxa.york.ac.uk
Beta-transducin family        Boguski M.S.       boguski@ncbi.nlm.nih.gov
C-type lectin domain          Drickamer K.       drick@cuhhca.hhmi.columbia.edu
Chalcone/stilbene synthases   Schroeder J.       raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60       Georgopoulos C.    georgopo@cmu.unige.ch
Chaperonins TCP1 family       Willison K.R.      willison@icr.ac.uk
Chitinases                    Henrissat B.       bernie@cermav.grenet.fr
Clusterin                     Peitsch M.C.       peitsch@ulbio1.unil.ch
Cold shock domain             Landsman D.        landsman@ncbi.nlm.nih.gov
CTF/NF-I                      Mermod N.          nmermod@ulys.unil.ch
                              Gronostajski R.    gronosr@ccsmtp.ccf.org
Cytochromes P450              Holsztynska E.J.   ela@netcom.uucp
                                                 netcom!ela@apple.com
DEAD-box helicases            Linder P.          linder@urz.unibas.ch
dnaJ family                   Kelley W.          kelley@cmu.unige.ch
EF-hand calcium-binding       Cox J.A.           cox@sc2a.unige.ch
                              Kretsinger R.H.    rhk5i@virginia.bitnet
Enoyl-CoA hydratase           Hofmann K.O.       khofmann@biomed.biolan.uni-koeln.de
fruR/lacI family HTH proteins Reizer J.          jreizer@ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski@ncbi.nlm.nih.gov
GDP/GTP dissociation stimul.  Boguski M.S.       boguski@ncbi.nlm.nih.gov
GltP family of transporters   Hofmann K.O.       khofmann@biomed.biolan.uni-koeln.de
Glucanases                    Henrissat B.       bernie@cermav.grenet.fr
                              Beguin P.          phycel@pasteur.bitnet
Glutamine synthetase          Tateno Y.          ytateno@genes.nig.ac.jp
G-protein coupled receptors   Chollet A.         chollet@clients.switch.ch
                              Attwood T.K.       bph6tka@biovax.leeds.ac.uk
GTPase-activating proteins    Boguski M.S.       boguski@ncbi.nlm.nih.gov
HMG1/2 and HMG-14/17          Landsman D.        landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. kolakowski@helix.mgh.harvard.edu
Integrases                    Roy P.H.           2020000@saphir.ulaval.ca
Kringle domain                Ikeo K.            kikeo@genes.nig.ac.jp
Lipocalins                    Boguski M.S.       boguski@ncbi.nlm.nih.gov
                              Peitsch M.C.       peitsch@ulbio1.unil.ch
lysR family HTH proteins      Henikoff S.        henikoff@sparky.fhcrc.org
MAC components / perforin     Peitsch M.C.       peitsch@ulbio1.unil.ch
Malic enzymes                 Glynias M.         mglynias@ncsa.uiuc.edu
Myelin proteolipid protein    Hofmann K.O.       khofmann@biomed.biolan.uni-koeln.de
Pancreatic trypsin inhibitor  Ikeo K.            kikeo@genes.nig.ac.jp
PEP requiring enzymes         Reizer J.          jreizer@ucsd.edu
pfkB carbohydrate kinases     Reizer J.          jreizer@ucsd.edu
Phytochromes                  Partis M.D.        partis@gcri.afrc.ac.uk
Protein kinases               Hanks S.           hanks@vuctrvax.bitnet
                              Hunter T.          hunter@salk.bitnet
PTS proteins                  Reizer J.          jreizer@ucsd.edu
Restriction-modification      Bickle T.          bickle@urz.unibas.ch
            enzymes           Roberts R.J.       roberts@neb.com
Ribosomal protein S3          Hallick R.         hallick%biotec@arizona.edu
Ribosomal protein S15         Ellis S.R.         srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        harayama@cmu.unige.ch
Signal sequence peptidases    von Heijne G.      gvh@csb.ki.se
                              Dalbey R.E.        rdalbey@magnus.acs.ohio-state.edu
Sodium symporters             Reizer J.          jreizer@ucsd.edu
Subtilases                    Brannigan J.       jab5@vaxa.york.ac.uk
                              Siezen R.J.        nizo@caos.caos.kun.nl
Thiol proteases               Turk B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk@ijs.ac.mail.yu
TNF family                    Jongeneel C.V.     vjongene@isrecmail.unil.ch
TPR repeats                   Boguski M.S.       boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gvh@csb.ki.se
Type-II membrane antigens     Levy S.            levy@cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland@bio.uib.no
Vitamin K-depend. Gla domain  Price P.A.         pprice@ucsd.edu
Xylose isomerase              Jenkins J.         jenkins@frira.afrc.ac.uk
WAP-type domain               Claverie J.-M.     jmc@ncbi.nlm.nih.gov
ZP domain                     Bork P.            bork@embl-heidelberg.de

African swine fever virus     Yanez R.J.         ryanez@cbm2.uam.es
Bacteriophage P4              Halling C.         chh9@midway.uchicago.edu
Drosophila                    Ashburner M.       ma11@phx.cam.ac.uk
Escherichia coli              Rudd K.            rudd@ncbi.nlm.nih.gov
Salmonella typhimurium        Rudd K.            rudd@ncbi.nlm.nih.gov
Snakes                        Stocklin R.        stocklin@cmu.unige.ch
Yeast chromosome I            Ouellette F.       francis@monod.biol.mcgill.ca


   B.2  Requirements to fulfill to become an on-line expert

   An expert  should be  a scientist  working with  specific families  of
   proteins (or specific domains) who would:

   a) Review the  protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to  be contacted  by people  who have obtained new sequence(s)
      that seem to belong to "their" families of proteins.
   c) Have access  to electronic  mail and be willing to use it to send and
      receive data.

   If you are willing to be part of this scheme please contact Amos Bairoch
   at one of the following electronic mail addresses:

                             bairoch@cmu.unige.ch
                           bairoch@cgecmu51.bitnet


           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES


   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:


                                                       **********************
                        *********************** <----- * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * -----> **********************
                        *  Sequence Data      *
***************** ----> *  Library            *        **********************
* FLYBASE       * <---- *********************** <----- * ECD [E. coli map]  *
* [Drosophila   *                ^  |       ^          **********************
* genetic maps] * --------+      |  |       |
***************** <-----+ |      |  |       +--------- **********************
                        | |      |  |       +--------- * TFD [Trans. fact.] *
                        | |      |  |       | +------> **********************
                        | |      |  |       | |
*****************       | v      |  v       v |        **********************
* REBASE        *       ***********************        * ENZYME [Nomencl.]  *
* [Restriction  * <---- *  SWISS-PROT         * <----- **********************
*  enzymes]     *       *  Protein Sequence   *            |
*****************       *  Data Bank          *            v
                        ***********************        **********************
*****************         | ^  |  ^ |  ^ |  |          * OMIM   [Diseases]  *
* PROSITE       * <-------+ |  |  | |  | |  +--------> **********************
* [Patterns]    * ----------+  |  | |  | |
*****************              |  | |  | +-----------> **********************
             |                 |  | |  +-------------- * E. coli 2D gels    *
             |                 |  | |                  **********************
             |                 |  | |
             |                 |  | +----------------> **********************
             |                 |  +------------------- * EcoGene/EcoSeq     *
             |                 v                       **********************
             |          ***********************
             +--------> * PDB [3D structures] *
                        ***********************