                   SWISS-PROT RELEASE 35.0 RELEASE NOTES

                   1.  INTRODUCTION

 Release 35.0 of SWISS-PROT contains 69'113 sequence entries, comprising
 25'083'768 amino acids abstracted from 59'101 references. This
 represents an increase of 18.3% in the number of amino acids over
 release 34. The growth of the data bank is summarized below.

 Release      Date           Number of       Number of amino
                               entries                 acids
    2.0       09/86               3939               900 163
    3.0       11/86               4160               969 641
    4.0       04/87               4387             1 036 010
    5.0       09/87               5205             1 327 683
    6.0       01/88               6102             1 653 982
    7.0       04/88               6821             1 885 771
    8.0       08/88               7724             2 224 465
    9.0       11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
   28.0       02/94              36000            12 496 420
   29.0       06/94              38303            13 464 008
   30.0       10/94              40292            14 147 368
   31.0       02/95              43470            15 335 248
   32.0       11/95              49340            17 385 503
   33.0       02/96              52205            18 531 384
   34.0       10/96              59021            21 210 389
   35.0       11/97              69113            25 083 768
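
 The growth between the last two releases can be checked directly from
 the final two rows of the table. The short Python sketch below is only
 an illustration (the variable and function names are hypothetical and
 not part of SWISS-PROT). It suggests that the 18.3% quoted in the
 introduction corresponds to the growth in amino acids, while the number
 of entries grew by about 17.1%.

```python
# Figures copied from the table above (releases 34.0 and 35.0).
entries = {"34.0": 59021, "35.0": 69113}
amino_acids = {"34.0": 21210389, "35.0": 25083768}


def percent_increase(old: int, new: int) -> float:
    """Relative growth from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


entries_growth = percent_increase(entries["34.0"], entries["35.0"])
amino_acid_growth = percent_increase(amino_acids["34.0"], amino_acids["35.0"])

print(f"entries:     {entries_growth:.1f}%")     # 17.1%
print(f"amino acids: {amino_acid_growth:.1f}%")  # 18.3%
```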



     2.  DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 34

 2.1  Sequences and annotations

 10'189 sequences have been added since  release 34, the sequence data of
 1654 existing entries  has been  updated and  the annotations  of 15'683
 entries have been revised.


 2.2  What's happening with the model organisms

 We have selected  a number of  organisms that  are the  target of genome
 sequencing and/or mapping projects and for which we intend to:

 . Be as complete  as possible. All  sequences available at  a given time
   should be  immediately  included  in  SWISS-PROT.  This also  includes
   sequence corrections and updates;
 . Provide a higher level of annotation;
 . Provide cross-references  to  specialized  database(s)  that  contain,
   among other data, some  genetic information about the  genes that code
   for these proteins;
 . Provide specific indices or documents.

 What was done  since the  last release  or in  preparation for  the next
 release concerning model organisms:

 . We have added Methanococcus jannaschii, Helicobacter pylori and
   Synechocystis PCC 6803 to the list of model organisms. The genomes of
   these organisms have been completely sequenced and we plan to annotate
   them fully in SWISS-PROT. Specific documents have been added (see
   section 4) for each of these organisms.

 . We have also added mouse (Mus musculus) as a model organism. A
   significant effort has been made to add new mouse sequences (542 have
   been added since the last release); we have added links to MGD (the
   Mouse Genome Database; see section 2.5.2) and we have also created a
   specific document (MGDTOSP.TXT) that lists the cross-references
   between MGD and SWISS-PROT.

 . We have continued our effort to catch up with the backlog of
   sequences from other model organisms. In particular, we added 410
   entries from yeast, 644 from human, 89 from S.pombe, 527 from
   C.elegans, 95 from A.thaliana and 92 from D.melanogaster.

 . We have added to SWISS-PROT all the sequences from yeast chromosome
   XIII. We plan to integrate data from the remaining chromosomes (IV,
   XII, XV and XVI) very soon so as to have a complete set of annotated
   yeast sequences.

 . We have finished the annotation of all Mycoplasma genitalium entries.

 . We plan  to  finish  as quickly  as  possible  the  annotation  of the
   Escherichia coli and Haemophilus influenzae sequence entries which are
   not yet part of SWISS-PROT.

 Here is the current status of the model organisms in SWISS-PROT:

 Organism        Database            Index file       Number of
                 cross-referenced                     sequences
 --------------  ----------------    --------------   ---------
 A.thaliana      None yet            In preparation         658
 B.subtilis      SubtiList           SUBTILIS.TXT          1882
 C.albicans      None yet            CALBICAN.TXT           167
 C.elegans       Wormpep             CELEGANS.TXT          1735
 D.discoideum    DictyDB             DICTY.TXT              272
 D.melanogaster  FlyBase             FLY.TXT               1002
 E.coli          EcoGene             ECOLI.TXT             4098
 H.influenzae    HiDB (TIGR)         HAEINFLU.TXT          1687
 H.sapiens       MIM                 MIMTOSP.TXT           4644
 H.pylori        HpDB (TIGR)         HPYLORI.TXT            257
 M.genitalium    MgDB (TIGR)         MGENITAL.TXT           470
 M.musculus      MGD                 MGDTOSP.TXT           2971
 M.jannaschii    MjDB (TIGR)         MJANNASC.TXT          1064
 M.tuberculosis  None yet            None yet               796
 S.cerevisiae    SGD                 YEAST.TXT             4750
 S.typhimurium   StyGene             SALTY.TXT              680
 S.pombe         None yet            POMBE.TXT             1045
 S.solfataricus  None yet            None yet                42


 Collectively the entries from the above  model organisms represent 35.4%
 of all SWISS-PROT entries.


 2.3  Changes affecting the accession numbers

 With the creation of the  TrEMBL database (see section  6) and the rapid
 increase in the amount of sequence data, we  are faced with a problem of
 availability of accession numbers. Currently we  use a system based on a
 one-letter prefix followed by 5 digits. This system was also used by the
 nucleotide sequence databases, which had originally reserved the prefix
 letters 'P' and 'Q' for SWISS-PROT. The nucleotide databases, having run
 out of space (mainly because of ESTs), have been forced to start using a
 new format based on a two-letter prefix followed by 6 digits.

 We have used up all possible numbers with 'P' and 'Q', and the only
 letter prefix not used by the nucleotide databases is 'O'. As we
 believe that changing the format of the accession numbers to that now
 used by the nucleotide databases would wreak havoc on the numerous
 software packages using SWISS-PROT, we have decided to keep a system of
 accession numbers based on a six-character code, but with the following
 changes:

 1)   We have started using 'O'. This extra letter should allow the
 continuation of the present format (1 prefix letter + 5 digits) for
 approximately one year.
 2)   Once we have used up 'O', we will introduce a system based on the
 following format:

      1        2       3          4            5            6
     [O,P,Q]  [0-9]  [A-Z, 0-9]  [A-Z, 0-9]   [A-Z, 0-9]   [0-9]

 What the above means is that we will keep a six-character code, but that
 in positions 3,  4 and  5 of this  code any  combination of  letters and
 numbers can  be  present.  This format  allows  a  total  of  14 million
 accession numbers (up from 300'000 with the current system).

 We only allow numbers in positions 2 and 6 so that SWISS-PROT accession
 numbers cannot be mistaken for gene names, acronyms, other types of
 accession numbers or ordinary words.

 Examples: P0A3S2, Q2ASD4, O13YX2, P9B123
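
 The planned format can be expressed as a regular expression, and the
 capacity figures quoted above follow from counting the possibilities at
 each position. The Python sketch below is an illustration only; the
 names OLD_FORMAT, NEW_FORMAT and is_valid_accession are hypothetical
 and not part of any SWISS-PROT software.

```python
import re

# Current format: one prefix letter (O, P or Q) followed by five digits.
OLD_FORMAT = re.compile(r"[OPQ][0-9]{5}")
# Planned format: positions 3 to 5 may hold a letter or a digit, while
# positions 2 and 6 remain numeric.
NEW_FORMAT = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")


def is_valid_accession(ac: str) -> bool:
    """Return True if `ac` matches the planned six-character format."""
    return NEW_FORMAT.fullmatch(ac) is not None


# Capacity of each scheme, counted position by position.
old_capacity = 3 * 10**5            # 300'000
new_capacity = 3 * 10 * 36**3 * 10  # 13'996'800, i.e. about 14 million

for ac in ("P0A3S2", "Q2ASD4", "O13YX2", "P9B123", "P12345", "PA1234"):
    print(ac, is_valid_accession(ac))
print(old_capacity, new_capacity)
```

 Accession numbers in the current format (such as the made-up P12345)
 remain valid under the planned format, whereas a code with a letter in
 position 2 (such as PA1234) is rejected.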


 2.4  Introduction of a new CC line-type topic (DATABASE)

 There is an increasing number of databases that cater for a specific
 protein or for a very limited number of proteins. Most of these
 databases are mutation databases, reporting defects linked to a genetic
 disease. We want to add cross-references to these databases when they
 are available electronically, either by WWW or by FTP. We have therefore
 added in this release a new comments (CC) line-type 'topic', "DATABASE",
 whose syntax is the following:

  CC   -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][;
          FTP="Address"].

 Where
 `NAME' is the name of the database;
 `NOTE' (optional) is a free text note;
 `WWW'  (optional) is the WWW address (URL) of the database;
 `FTP'  (optional) is the  anonymous FTP address (including the directory
        name) where the database file(s) are stored.

 Examples of its usage:

 CC   -!- DATABASE: NAME=CD40Lbase;
 CC       NOTE=European CD40L defect database (mutation db);
 CC       WWW="http://www.expasy.ch/www/cd40lbase.html";
 CC       FTP="ftp://www.expasy.ch/databases/cd40lbase".

 CC   -!- DATABASE: NAME=PROW; NOTE=CD guide CD80 entry;
 CC       WWW="http://www.ncbi.nlm.nih.gov/prow/cd/cd80.htm".

 Please note that this topic, along with some forms of the DR lines (see
 next section), is the first place in SWISS-PROT where lower case
 characters occur (yes, we plan to go to mixed case soon!).

 It is also, currently, the only part of SWISS-PROT where lines longer
 than 75 characters can be found, as we do not reformat long URLs or FTP
 addresses.
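
 As an illustration of the syntax described in this section, the Python
 sketch below splits a DATABASE comment block into its NAME, NOTE, WWW
 and FTP fields. The function name parse_database_comment is
 hypothetical, and the sketch assumes that a block is terminated by a
 period and that unquoted values contain no semicolon.

```python
import re

# A field is KEY=value; WWW and FTP values are quoted, NAME and NOTE are not.
FIELD = re.compile(r'(NAME|NOTE|WWW|FTP)=(?:"([^"]*)"|([^;]*))')


def parse_database_comment(lines):
    """Return a dict of the fields found in one CC DATABASE block."""
    # Drop the "CC" line code and join the continuation lines.
    text = " ".join(line.strip()[2:].strip() for line in lines)
    prefix = "-!- DATABASE:"
    if not text.startswith(prefix):
        raise ValueError("not a DATABASE comment block")
    body = text[len(prefix):].strip()
    if body.endswith("."):
        body = body[:-1]  # remove the terminating period
    fields = {}
    for key, quoted, bare in FIELD.findall(body):
        fields[key] = quoted if quoted else bare.strip()
    return fields


example = [
    'CC   -!- DATABASE: NAME=CD40Lbase;',
    'CC       NOTE=European CD40L defect database (mutation db);',
    'CC       WWW="http://www.expasy.ch/www/cd40lbase.html";',
    'CC       FTP="ftp://www.expasy.ch/databases/cd40lbase".',
]
parsed = parse_database_comment(example)
for key, value in parsed.items():
    print(f"{key}: {value}")
```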


 2.5  Changes concerning cross-references (DR line)

 2.5.1  TIGR

 We have added cross-references  from SWISS-PROT to the  TIGR database, a
 collection  of  genomic  databases  for  microbes,  plants  and  animals
 maintained by The  Institute for  Genomic Research (TIGR)  in Rockville,
 Maryland, USA. These cross-references are present in the DR lines:

 Data bank identifier : TIGR
 Primary identifier   : The genome Open Reading Frame (ORF) code
 Secondary identifier : Not defined, a dash ("-") is stored in that field
 Example              : DR   TIGR; HP1563; -.


 2.5.2  MGD

 We have  added  cross-references  from SWISS-PROT  to  the  Mouse Genome
 Database (MGD),  maintained by  The  Jackson Laboratory  in  Bar Harbor,
 Maine, USA. These cross-references are present in the DR lines:

 Data bank identifier : MGD
 Primary identifier   : The accession number
 Secondary identifier : The gene designation
 Example              : DR   MGD; MGI:109323; HTR2B.


 2.5.3  LISTA

 We have  removed  the  cross-references  from  SWISS-PROT  to the  LISTA
 database which is no longer maintained  and which has been superseded by
 the SGD database to which SWISS-PROT is fully cross-referenced.


 2.5.4  PROSITE

 The format for cross-references to the PROSITE protein domain and family
 database used to be:

 DR   PROSITE; ACCESSION_NUMBER; ENTRY_NAME.

 It has been changed to:

 DR   PROSITE; ACCESSION_NUMBER; ENTRY_NAME; STATUS.

 Where "ACCESSION_NUMBER" stands for the accession number of the PROSITE
 pattern or profile entry; "ENTRY_NAME" is the name of the entry and
 "STATUS" is one of the following:

 n
 FALSE_NEG
 PARTIAL
 UNKNOWN_n

 Where "n" is the number of hits of the pattern or profile in that
 particular protein sequence. The "FALSE_NEG" status indicates that while
 the pattern or profile did not detect the protein sequence, it is a
 member of that particular family or domain. The "PARTIAL" status
 indicates that the pattern or profile did not detect the sequence
 because that sequence is incomplete and lacks the region on which the
 pattern/profile is based. Finally, the "UNKNOWN" status indicates
 uncertainty as to whether the sequence is a member of the family or
 domain described by the pattern/profile.

 Example of PROSITE cross-references:

 DR   PROSITE; PS00107; PROTEIN_KINASE_ATP; 1.
 DR   PROSITE; PS00028; ZINC_FINGER_C2H2; 6.
 DR   PROSITE; PS00237; G_PROTEIN_RECEPTOR; FALSE_NEG.
 DR   PROSITE; PS01128; SHIKIMATE_KINASE; PARTIAL.


 2.5.5  REBASE

 Two small changes  have been made  to the syntax  of cross-references to
 the REBASE database:

 - REBASE has recently changed its accession numbers to add an additional
   digit (an extra leading zero).
 - We are  now using  mixed case  characters in the  secondary identifier
   (the name  of the restriction system)  so as to  represent exactly the
   information as stored in REBASE.

 Example:

 DR   REBASE; RB0005; ECORI.

 has been changed to:

 DR   REBASE; RB00005; EcoRI.



                  3. PLANNED CHANGES

 3.1  Extension of the accession number system

 As already explained in detail under 2.3, we will extend the accession
 number system once we have used up the 'O' series of accession
 numbers. This is expected to happen in October 1998.


 3.2  Switch to the NCBI taxonomy

 To standardize the taxonomies used by different databases, we will
 change our taxonomy with release 37. We will switch to the NCBI
 taxonomy, which is already used as the common taxonomy by the
 DDBJ/EMBL/GenBank nucleotide sequence databases.


 3.3  Introduction of RT lines

 With release 37 we will introduce a new line type, the RT (Reference
 Title) line. This optional line will be placed between the RA and RL
 lines. The RT line gives the title of the paper (or other work) as
 exactly as possible given the limitations of the computer character set.
 The form which will  be used is that  which would be used  in a citation
 rather than displayed at  the top of the  published paper. For instance,
 where journals capitalize major  title words this is  not preserved. The
 title is enclosed  in double quotes,  and may be  continued over several
 lines as necessary.  The title lines  are terminated by  a semicolon. An
 example of the use of RT lines is shown below:

 RT   "Sequence analysis of the genome of the unicellular cyanobacterium
 RT   Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb
 RT   region from map positions 64% to 92% of the genome.";
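
 A program reading RT lines has to join the continuation lines and
 remove the enclosing double quotes and the terminating semicolon. The
 Python sketch below illustrates this under the rules stated above; the
 function name join_rt_lines is hypothetical.

```python
def join_rt_lines(lines):
    """Rebuild a reference title from consecutive RT lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("RT"):
            raise ValueError(f"not an RT line: {line!r}")
        parts.append(line[2:].strip())  # drop the line code
    text = " ".join(parts)
    # The title is enclosed in double quotes and terminated by a semicolon.
    if not (text.startswith('"') and text.endswith('";')):
        raise ValueError("title is not quoted and terminated by a semicolon")
    return text[1:-2]


example = [
    'RT   "Sequence analysis of the genome of the unicellular cyanobacterium',
    'RT   Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb',
    'RT   region from map positions 64% to 92% of the genome.";',
]
title = join_rt_lines(example)
print(title)
```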



                  4. STATUS OF THE DOCUMENTATION FILES

 SWISS-PROT is distributed  with a  large number of  documentation files.
 Some of  these files  have  been available  for  a long  time  (the user
 manual, release  notes,  the  various  indices  for authors,  citations,
 keywords, etc.),  but  many  have  been  created  recently  and  we  are
 continuously adding new  files. Since release  34, we have  added 15 new
 document files.  The following table lists  all  the documents  that are
 currently available.

 USERMAN.TXT    User manual
 RELNOTES.TXT   Release notes
 SHORTDES.TXT   Short description of entries in SWISS-PROT
 JOURLIST.TXT   List of abbreviations for journals cited
 KEYWLIST.TXT   List of keywords in use
 SPECLIST.TXT   List of organism identification codes
 TISSLIST.TXT   List of tissues
 EXPERTS.TXT    List of on-line experts for PROSITE and SWISS-PROT
 SUBMIT.TXT     Submission of sequence data to SWISS-PROT

 ACINDEX.TXT    Accession number index
 AUTINDEX.TXT   Author index
 CITINDEX.TXT   Citation index
 KEYINDEX.TXT   Keyword index
 SPEINDEX.TXT   Species index
 DELETEAC.TXT   Deleted accession number index [1]

 7TMRLIST.TXT   List of 7-transmembrane G-linked receptor entries
 AATRNASY.TXT   List of aminoacyl-tRNA synthetases
 ALLERGEN.TXT   Nomenclature and index of allergen sequences
 BLOODGRP.TXT   List of blood group antigen proteins [1]
 CALBICAN.TXT   Index   of  Candida  albicans  entries   and  their
                corresponding gene designations
 CDLIST.TXT     CD  nomenclature  for  surface  proteins  of  human
                leucocytes
 CELEGANS.TXT   Index  of Caenorhabditis elegans entries  and their
                corresponding gene Wormpep cross-references
 DICTY.TXT      Index   of  Dictyostelium  discoideum  entries  and
                their  corresponding gene designations  and DictyDB
                cross-references
 EC2DTOSP.TXT   Index  of  Escherichia coli  Gene-protein  database
                entries referenced in SWISS-PROT
 ECOLI.TXT      Index  of Escherichia coli K12  chromosomal entries
                and their corresponding EcoGene cross-references
 EMBLTOSP.TXT   Index  of   EMBL  Database  entries  referenced  in
                SWISS-PROT
 EXTRADOM.TXT   Nomenclature of extracellular domains
 FLY.TXT        Index  of  Drosophila  entries and  FlyBase  cross-
                references [1]
 GLYCOSID.TXT   Classification  of glycosyl hydrolase  families and
                index of glycosyl hydrolase entries
 HAEINFLU.TXT   Index  of  Haemophilus  influenzae  RD  chromosomal
                entries
 HOXLIST.TXT    Vertebrate  homeotic Hox proteins: nomenclature and
                index
 HPYLORI.TXT    Index   of   Helicobacter   pylori   strain   26695
                chromosomal entries [1]
 HUMCHR18.TXT   Index of protein  sequence entries encoded on human
                chromosome 18 [1]
 HUMCHR19.TXT   Index of protein  sequence entries encoded on human
                chromosome 19 [1]
 HUMCHR20.TXT   Index of protein  sequence entries encoded on human
                chromosome 20
 HUMCHR21.TXT   Index of protein  sequence entries encoded on human
                chromosome 21
 HUMCHR22.TXT   Index of protein  sequence entries encoded on human
                chromosome 22
 HUMCHRX.TXT    Index of protein  sequence entries encoded on human
                chromosome X
 HUMCHRY.TXT    Index of protein  sequence entries encoded on human
                chromosome Y
 INITFACT.TXT   List and index of translation initiation factors [1]
 MIMTOSP.TXT    Index of MIM entries referenced in SWISS-PROT
 METALLO.TXT    Classification  of  metallothioneins and  index  of
                entries in SWISS-PROT [1]
 MGDTOSP.TXT    Index of MGD entries referenced in SWISS-PROT [1]
 MGENITAL.TXT   Index  of Mycoplasma genitalium chromosomal entries
                [1]
 MJANNASC.TXT   Index of Methanococcus jannaschii entries [1]
 NGR234.TXT     Table  of   putative  genes  in  Rhizobium  plasmid
                pNGR234a [1]
 NOMLIST.TXT    List   of  nomenclature   related  references   for
                proteins
 PCC6803.TXT    Index of Synechocystis strain PCC 6803 entries [1]
 PDBTOSP.TXT    Index  of X-ray  crystallography Protein Data  Bank
                (PDB) entries referenced in SWISS-PROT
 PEPTIDAS.TXT   Classification  of peptidase families and  index of
                peptidase entries
 PLASTID.TXT    List of chloroplast and cyanelle encoded proteins
 POMBE.TXT      Index   of  Schizosaccharomyces  pombe  entries  in
                SWISS-PROT    and    their    corresponding    gene
                designations
 RESTRIC.TXT    List of restriction enzyme and methylase entries
 RIBOSOMP.TXT   Index of  ribosomal proteins classified by families
                on the basis of sequence similarities
 SALTY.TXT      Index  of  Salmonella typhimurium  LT2  chromosomal
                entries  and  their  corresponding  StyGene  cross-
                references
 SUBTILIS.TXT   Index of  Bacillus subtilis 168 chromosomal entries
                and their corresponding SubtiList cross-references
 UPFLIST.TXT    UPF  (Uncharacterized  Protein Families)  list  and
                index of members [1]
 YEAST.TXT      Index   of  Saccharomyces  cerevisiae  entries  and
                their corresponding gene designations
 YEAST1.TXT     Yeast Chromosome I entries
 YEAST2.TXT     Yeast Chromosome II entries
 YEAST3.TXT     Yeast Chromosome III entries
 YEAST5.TXT     Yeast Chromosome V entries
 YEAST6.TXT     Yeast Chromosome VI entries
 YEAST7.TXT     Yeast Chromosome VII entries
 YEAST8.TXT     Yeast Chromosome VIII entries
 YEAST9.TXT     Yeast Chromosome IX entries
 YEAST10.TXT    Yeast Chromosome X entries
 YEAST11.TXT    Yeast Chromosome XI entries
 YEAST13.TXT    Yeast Chromosome XIII entries [1]
 YEAST14.TXT    Yeast Chromosome XIV entries

 Notes:
 [1]  New in release 35.

 We have  continued  to include  in  some SWISS-PROT  document  files the
 references of  World-Wide  Web  sites  relevant  to  the  subject  under
 consideration. There are now 12 documents that include such links.



                  5. THE EXPASY WORLD-WIDE WEB SERVER

 5.1  Background information

 The most efficient and user-friendly way to browse interactively in
 SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and other databases is to use
 the World-Wide Web (WWW) molecular biology server ExPASy. The ExPASy
 server was made available to the public in September 1993 and is
 reachable at the following address:

                              http://www.expasy.ch/

 The ExPASy WWW server  allows access, using  the user-friendly hypertext
 model, to the  SWISS-PROT, PROSITE,  ENZYME, SWISS-2DPAGE, SWISS-3DIMAGE
 and CD40Lbase  databases and,  through any  SWISS-PROT  protein sequence
 entry, to  other databases  such  as EMBL,  Eco2DBASE,  EcoCyc, FlyBase,
 GCRDb, MaizeDB, SubtiList/NRSub,  OMIM, PDB, HSSP,  ProDom, REBASE, SGD,
 YEPD and Medline. ExPASy also offers many tools for the analysis of
 protein sequences and 2D gels.


 5.2  SWISS-SHOP

 We provide, on ExPASy, a service called SWISS-SHOP. SWISS-SHOP allows
 any user of SWISS-PROT to indicate which proteins he or she is
 interested in. This can be done using various criteria that can be
 combined:

 -    By entering  one  or  more words  that  should  be  present  in the
      description line;
 -    By entering one or more species name(s) or taxonomic division(s);
 -    By entering one or more keywords;
 -    By entering one or more author names;
 -    By entering  the  accession number  (or  entry name)  of  a PROSITE
      pattern or a user-defined sequence pattern;
 -    By entering the  accession number  (or entry  name) of  an existing
      SWISS-PROT entry or by entering a "private" sequence.

 Every week, the  new sequences  entered in SWISS-PROT  are automatically
 compared with all the criteria that have been defined by the users. If a
 sequence corresponds to the  selection criteria defined by  a user, that
 sequence is sent by electronic mail.


 5.3  What is new on ExPASy

 ExPASy is constantly modified and improved. If you wish to be informed
 of the changes made to the server you can either:

 -    Read  the  document  "History  of  changes,  improvements  and  new
      features" which is available at the address:

              http://www.expasy.ch/www/history.html

 -    Subscribe to SWISS-Flash, a service that reports news of databases,
      software and services developments. By subscribing to this service,
      you will  automatically  get  SWISS-Flash  bulletins  by electronic
      mail. To subscribe use the address:

              http://www.expasy.ch/www/swiss-flash.html



                  6. TREMBL - A SUPPLEMENT TO SWISS-PROT

 The ongoing genome sequencing and mapping projects have dramatically
 increased the number of protein sequences to be incorporated into
 SWISS-PROT. Since we do not want to dilute the quality standards of
 SWISS-PROT by incorporating sequences into the database without proper
 sequence analysis and annotation, we cannot speed up the incorporation
 of new incoming data indefinitely. However, as we also want to make the
 sequences available as quickly as possible, we have introduced a
 computer-annotated supplement to SWISS-PROT. This supplement consists
 of entries in SWISS-PROT-like format derived from the translation of all
 coding sequences (CDS) in the EMBL nucleotide sequence database, except
 those already included in SWISS-PROT.

 We have named this supplement TrEMBL (Translation from EMBL). It can be
 considered as a preliminary section of SWISS-PROT. This SWISS-PROT
 release is supplemented by TrEMBL release 5. TrEMBL is split into two
 main sections, SP-TrEMBL and REM-TrEMBL:

 - SP-TrEMBL (SWISS-PROT TrEMBL) contains the entries (140'555 in release
   5) which should  be incorporated into SWISS-PROT. SWISS-PROT accession
   numbers have been assigned for all SP-TrEMBL entries.

 - REM-TrEMBL (REMaining TrEMBL) contains the entries (25'806 in release
   5) that we do not want to include in SWISS-PROT for a variety of
   reasons (synthetic sequences, pseudogenes, translations of incorrect
   open reading frames, fragments with fewer than eight amino acids,
   patent-derived sequences, immunoglobulins and T-cell receptors, etc.).

 TrEMBL is available  by FTP from  the EBI server  (ftp.ebi.ac.uk) in the
 directory '/pub/databases/trembl'. It can  be queried on WWW  by the EBI
 SRS server (http://www.ebi.ac.uk/). It  is also available  on the SWISS-
 PROT CD-ROM and is searchable on the  FASTA, BIC_SW and BLAST servers of
 the EBI.



                  7. WEEKLY UPDATES OF SWISS-PROT

 Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
 are updated at each update:

 new_seq.dat    Contains all the new entries since the last full release;
 upd_seq.dat    Contains the entries for which the sequence data has been
                updated since the last release;
 upd_ann.dat    Contains  the entries  for which  one or  more annotation
                fields have been updated since the last release.

 Currently these  files  are  available on  the  following  anonymous FTP
 servers:

 Organization   ExPASy (Geneva University Expert Protein Analysis System)
 Address        expasy.hcuge.ch  (or 129.195.254.61)
 Directory      /databases/swiss-prot/updates

 Organization   European Bioinformatics Institute (EBI)
 Address        ftp.ebi.ac.uk (or 193.62.196.6)
 Directory      /pub/databases/swissprot/new


 !!! Important notes !!!

 - Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases two weeks will elapse
   between two updates.
 - Due to the current mechanism used to build a release, the entries that
   are provided in these updates are not guaranteed to be error free.



                  8.  ENZYME and PROSITE

 8.1  The ENZYME data bank

 Release 22.0 of the ENZYME  data bank is distributed  with release 35 of
 SWISS-PROT. ENZYME release  22.0 contains  information relative  to 3651
 enzymes.


 8.2  The PROSITE data bank

 Release 14.0 of the PROSITE data bank  is distributed with release 35 of
 SWISS-PROT. This release  of PROSITE contains  997 documentation entries
 that describe  1'335  different patterns,  rules  and profiles/matrices.
 Release 14.0 is the first completely new release of PROSITE since
 November 1995. Since that date we have added 114 entries and modified
 566 entries. The long time that elapsed between this release of PROSITE
 and the last one is partially due to a complete rewriting of the
 software tools that maintain the database and allow it to be
 bi-directionally linked to SWISS-PROT. Thanks to those changes, we will
 now be able to produce PROSITE releases at each release of SWISS-PROT
 and also to offer frequent updates of the database on the ExPASy server.



                  9. WE NEED YOUR HELP !

 We welcome feedback from our users. We would especially appreciate
 being notified if you find that sequences belonging to your field of
 expertise are missing from the data bank. We would also like to be
 notified about annotations to be updated if, for example, the function
 of a protein has been clarified or if new post-translational information
 has become available. To facilitate such feedback we offer on the
 ExPASy WWW server a form that allows the submission of updates and/or
 corrections to SWISS-PROT:

               http://www.expasy.ch/sprot/sp_update_form.html

 It is also possible, from any SWISS-PROT entry displayed by the
 ExPASy server, to submit updates and/or corrections for that particular
 entry. Finally, you can also send your comments by electronic mail to
 the address:

                            swiss-prot@expasy.ch


 ========================================================================


                         APPENDIX A: SOME STATISTICS


   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.57   Gln (Q) 4.00   Leu (L) 9.39   Ser (S) 7.15
   Arg (R) 5.15   Glu (E) 6.34   Lys (K) 5.95   Thr (T) 5.70
   Asn (N) 4.50   Gly (G) 6.83   Met (M) 2.35   Trp (W) 1.24
   Asp (D) 5.29   His (H) 2.23   Phe (F) 4.08   Tyr (Y) 3.18
   Cys (C) 1.68   Ile (I) 5.78   Pro (P) 4.91   Val (V) 6.55

   Asx (B) 0.001  Glx (Z) 0.001  Xaa (X) 0.01


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Ile, Thr, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
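
   The ordering above follows directly from the percentages in A.1.1.
   As a check, the Python sketch below sorts the twenty standard amino
   acids by decreasing frequency; the variable names are illustrative
   only.

```python
# Percentages copied from table A.1.1 (standard amino acids only).
composition = {
    "Ala": 7.57, "Gln": 4.00, "Leu": 9.39, "Ser": 7.15,
    "Arg": 5.15, "Glu": 6.34, "Lys": 5.95, "Thr": 5.70,
    "Asn": 4.50, "Gly": 6.83, "Met": 2.35, "Trp": 1.24,
    "Asp": 5.29, "His": 2.23, "Phe": 4.08, "Tyr": 3.18,
    "Cys": 1.68, "Ile": 5.78, "Pro": 4.91, "Val": 6.55,
}

# Sort the residue names by decreasing frequency.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```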



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 5713

   The first twenty species represent 34020 sequences: 49.2 % of the total
   number of entries.


   A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 2609
                            2x:  891
                            3x:  480
                            4x:  321
                            5x:  225
                            6x:  209
                            7x:  148
                            8x:   94
                            9x:  113
                           10x:   58
                       11- 20x:  261
                       21- 50x:  165
                       51-100x:   64
                         >100x:   75
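
   The classes in this table add up to the total number of species given
   in A.2. The Python sketch below performs that check and also derives
   the share of species represented by a single sequence; the variable
   names are illustrative only.

```python
# Number of species per occurrence class, copied from table A.2.1.
species_by_occurrence = {
    "1x": 2609, "2x": 891, "3x": 480, "4x": 321, "5x": 225,
    "6x": 209, "7x": 148, "8x": 94, "9x": 113, "10x": 58,
    "11-20x": 261, "21-50x": 165, "51-100x": 64, ">100x": 75,
}

total_species = sum(species_by_occurrence.values())
singleton_share = 100.0 * species_by_occurrence["1x"] / total_species

print(total_species)              # 5713
print(f"{singleton_share:.1f}%")  # 45.7%
```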


   A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        4750          Baker's yeast (Saccharomyces cerevisiae)
         2        4644          Human
         3        4098          Escherichia coli
         4        2971          Mouse
         5        2398          Rat
         6        1882          Bacillus subtilis
         7        1735          Caenorhabditis elegans
         8        1687          Haemophilus influenzae
         9        1064          Methanococcus jannaschii
        10        1047          Bovine
        11        1045          Fission yeast (Schizosaccharomyces pombe)
        12        1002          Fruit fly (Drosophila melanogaster)
        13         799          Chicken
        14         786          Mycobacterium tuberculosis
        15         680          Salmonella typhimurium
        16         658          Arabidopsis thaliana (Mouse-ear cress)
        17         648          African clawed frog (Xenopus laevis)
        18         551          Pig
        19         541          Rabbit
        20         494          Synechocystis sp. (strain PCC 6803)
        21         489          Mycoplasma pneumoniae
        22         470          Mycoplasma genitalium
        23         403          Rhizobium sp. (strain NGR234)
        24         398          Maize
        25         340          Pseudomonas aeruginosa
        26         292          Rice
        27         273          Bacteriophage T4
        28         272          Slime mold (Dictyostelium discoideum)
        29         257          Helicobacter pylori
        30         256          Tobacco
        31         253          Vaccinia virus (strain Copenhagen)
        32         248          Dog
        33         231          Pea
        34         223          Sheep
        35         219          Porphyra purpurea
        36         209          Barley
        37         203          Neurospora crassa
        38         199          Wheat
                   199          Staphylococcus aureus
        40         196          Mycobacterium leprae
        41         193          Human cytomegalovirus (strain AD169)
        42         192          Soybean
        43         190          Klebsiella pneumoniae
        44         184          Vaccinia virus (strain WR)
        45         183          Rhodobacter capsulatus
                   183          Pseudomonas putida
        47         180          Bacillus stearothermophilus
        48         175          Potato
        49         174          Tomato
        50         167          Candida albicans
        51         162          Agrobacterium tumefaciens
        52         156          Spinach
        53         154          Rhizobium meliloti
                   154          Autographa californica nuclear polyhedrosis virus
        55         151          Chlamydomonas reinhardtii
        56         150          Marchantia polymorpha (Liverwort)
        57         149          Guinea pig
        58         146          Variola virus
        59         145          Cyanophora paradoxa
        60         139          Odontella sinensis
        61         138          Aspergillus nidulans
        62         134          Orgyia pseudotsugata multicapsid polyhedrosis virus
        63         132          Lactococcus lactis (subsp. lactis)
        64         131          Streptomyces coelicolor
        65         122          Thermus aquaticus (subsp. thermophilus)
        66         120          Horse
        67         116          Golden hamster
        68         113          Trypanosoma brucei brucei
                   113          Anabaena sp. (strain PCC 7120)
                   113          Synechococcus sp. (strain PCC 7942)
        71         108          Kluyveromyces lactis
        72         107          Bombyx mori (Silk moth)
        73         105          Bradyrhizobium japonicum
                   105          Alcaligenes eutrophus
        75         102          Yersinia enterocolitica



   A.3  Distribution of the sequences by size

               From   To  Number             From   To   Number
                  1-  50    2882             1001-1100      627
                 51- 100    5886             1101-1200      484
                101- 150    8453             1201-1300      339
                151- 200    6661             1301-1400      226
                201- 250    6184             1401-1500      186
                251- 300    5742             1501-1600      115
                301- 350    5369             1601-1700      102
                351- 400    5392             1701-1800       79
                401- 450    4149             1801-1900       86
                451- 500    3905             1901-2000       52
                501- 550    2927             2001-2100       30
                551- 600    2053             2101-2200       67
                601- 650    1560             2201-2300       64
                651- 700    1159             2301-2400       32
                701- 750    1032             2401-2500       39
                751- 800     831             >2500          203
                801- 850     652
                851- 900     685
                901- 950     464
                951-1000     396
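
   As an illustration of how the size classes above are delimited, here
   is a minimal Python sketch. The function name size_class is
   hypothetical and is not part of any SWISS-PROT tool. It assumes only
   the class boundaries shown in the table: classes of 50 residues up to
   1000, classes of 100 residues up to 2500, and one open class beyond.

```python
# Hypothetical helper for illustration; not part of the SWISS-PROT distribution.
def size_class(length):
    """Return the (low, high) bounds of the size class containing `length`.

    `high` is None for the open-ended class (>2500 residues).
    """
    if length < 1:
        raise ValueError("sequence length must be a positive integer")
    if length > 2500:
        return (2501, None)
    # Classes are 50 residues wide up to 1000 and 100 residues wide after.
    width = 50 if length <= 1000 else 100
    low = ((length - 1) // width) * width + 1
    return (low, low + width - 1)


if __name__ == "__main__":
    # 5217 is the length listed for HTS1_COCCA in section A.4.
    for n in (50, 51, 1000, 1001, 2500, 5217):
        print(n, size_class(n))
```

   With these boundaries, every sequence listed in section A.4 falls in
   the open class (>2500).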



   A.4  Longest sequences

   The longest sequences (>=4000 residues) are listed here:

                               HTS1_COCCA  5217
                               MUC2_HUMAN  5179
                               FAT_DROME   5147
                               RYNR_RABIT  5037
                               RYNR_PIG    5035
                               RYNR_HUMAN  5032
                               RYNC_RABIT  4969
                               LRP_CAEEL   4753
                               DYHC_DICDI  4725
                               PLEC_RAT    4687
                               LRP2_RAT    4660
                               DYHC_RAT    4644
                               DYHC_DROME  4639
                               DYHC_CAEEL  4568
                               DYHB_CHLRE  4568
                               APB_HUMAN   4563
                               APOA_HUMAN  4548
                               LRP1_HUMAN  4544
                               LRP1_CHICK  4543
                               DYHC_PARTE  4540
                               RRPA_CVMJH  4488
                               DYHG_CHLRE  4485
                               DYHC_ANTCR  4466
                               DYHC_TRIGR  4466
                               GRSB_BACBR  4451
                               PKSK_BACSU  4447
                               PKSL_BACSU  4427
                               PGBM_HUMAN  4393
                               YP73_CAEEL  4385
                               DYHC_NEUCR  4367
                               DYHC_NECHA  4349
                               DYHC_EMENI  4344
                               PKD1_HUMAN  4303
                               DYHC_YEAST  4092
                               RRPA_CVH22  4085


   A.5  Statistics for journal citations


   Total number of journals cited in this release of SWISS-PROT: 861


   A.5.1 Table of the frequency of journal citations

        Journals cited 1x: 326
                       2x: 117
                       3x:  61
                       4x:  39
                       5x:  30
                       6x:  23
                       7x:  14
                       8x:  13
                       9x:  10
                      10x:  12
                  11- 20x:  66
                  21- 50x:  58
                  51-100x:  23
                    >100x:  69
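
   The frequency classes above can be cross-checked against the total
   number of journals given in A.5. The following is a minimal Python
   sketch of that check. The names JOURNALS_PER_CLASS, TOTAL_JOURNALS
   and class_total are hypothetical, chosen for illustration; the
   counts are copied from the table.

```python
# Counts copied from table A.5.1; keys are the citation-frequency classes.
JOURNALS_PER_CLASS = {
    "1x": 326,
    "2x": 117,
    "3x": 61,
    "4x": 39,
    "5x": 30,
    "6x": 23,
    "7x": 14,
    "8x": 13,
    "9x": 10,
    "10x": 12,
    "11-20x": 66,
    "21-50x": 58,
    "51-100x": 23,
    ">100x": 69,
}

# Total number of journals cited, as stated in section A.5.
TOTAL_JOURNALS = 861


def class_total(classes):
    """Sum the number of journals over all frequency classes."""
    return sum(classes.values())


if __name__ == "__main__":
    total = class_total(JOURNALS_PER_CLASS)
    print(total, "matches" if total == TOTAL_JOURNALS else "differs")
```

   The ">100x" class (69 journals) corresponds to the journals listed
   in A.5.2.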


   A.5.2  List of the most cited journals in SWISS-PROT

   Citations       Journal abbreviation
   ---------       ----------------------------------
   6038            J. BIOL. CHEM.
   3672            PROC. NATL. ACAD. SCI. U.S.A.
   3356            NUCLEIC ACIDS RES.
   2604            J. BACTERIOL.
   2352            GENE
   1992            FEBS LETT.
   1853            EUR. J. BIOCHEM.
   1693            BIOCHEM. BIOPHYS. RES. COMMUN.
   1651            EMBO J.
   1596            BIOCHEMISTRY
   1540            NATURE
   1367            BIOCHIM. BIOPHYS. ACTA
   1244            J. MOL. BIOL.
   1177            CELL
   1137            MOL. CELL. BIOL.
    920            MOL. GEN. GENET.
    899            PLANT MOL. BIOL.
    850            BIOCHEM. J.
    764            SCIENCE
    750            VIROLOGY
    748            GENOMICS
    731            MOL. MICROBIOL.
    661            J. BIOCHEM.
    502            J. VIROL.
    444            J. CELL BIOL.
    439            YEAST
    435            J. GEN. VIROL.
    418            PLANT PHYSIOL.
    381            GENES DEV.
    333            HUM. MOL. GENET.
    323            J. IMMUNOL.
    313            CURR. GENET.
    305            ARCH. BIOCHEM. BIOPHYS.
    303            INFECT. IMMUN.
    287            ONCOGENE
    287            MOL. BIOCHEM. PARASITOL.
    262            BIOL. CHEM. HOPPE-SEYLER
    248            FEMS MICROBIOL. LETT.
    230            MOL. ENDOCRINOL.
    230            HUM. MUTAT.
    220            J. CLIN. INVEST.
    220            AM. J. HUM. GENET.
    219            NAT. GENET.
    219            DEVELOPMENT
    216            J. GEN. MICROBIOL.
    213            HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
    194            J. MOL. EVOL.
    185            GENETICS
    180            STRUCTURE
    178            MICROBIOLOGY
    177            BLOOD
    172            HUM. GENET.
    169            DNA CELL BIOL.
    168            J. EXP. MED.
    163            APPL. ENVIRON. MICROBIOL.
    158            DEV. BIOL.
    156            NEURON
    152            DNA
    136            IMMUNOGENETICS
    124            ENDOCRINOLOGY
    123            DNA SEQ.
    122            PLANT CELL
    115            NAT. STRUCT. BIOL.
    109            HEMOGLOBIN
    108            PROTEIN SCI.
    108            BIOCHIMIE
    106            AGRIC. BIOL. CHEM.
    105            BIOORG. KHIM.
    101            CANCER RES.


 ===========================================================================


   APPENDIX B: RELATIONSHIPS BETWEEN SWISS-PROT AND SOME BIOMOLECULAR
               DATABASES

   The current  status of  the relationships (cross-references) between
   SWISS-PROT and some biomolecular databases is shown in the following
   schematic:


                         ***********************
                         *  EMBL Nucleotide    *
                         *  Sequence Database  *
                         *       [EBI]         *
                         ***********************
                           ^ ^ ^  ^  ^ ^ ^ ^ ^
******************         | | |  I  | | | | |         **********************
* FlyBase        * <-------+ | |  I  | | | | +-------> * MGD [Mouse]        *
******************         | | |  I  | | | | |         **********************
                           | | |  I  | | | | |
******************         | | |  I  | | | | |         **********************
* SubtiList      * <---------+ |  I  | | | +---------> * GCRDb [7TM recep.] *
* [B.subtilis]   *         | | |  I  | | | | |         **********************
******************         | | |  I  | | | | |
                           | | |  I  | | | | |         **********************
******************         | | |  I  | | +-----------> * EcoGene [E.coli]   *
* Mendel [Plant] * <-----+ | | |  I  | | | | |         **********************
******************       | | | |  I  | | | | |
                         | | | |  I  | | | | |         **********************
******************       | | | |  I  +---------------> * SGD [Yeast]        *
* MaizeDb        * <-----------+  I  | | | | |         **********************
* [Zea mays]     *       | | | |  I  | | | | |
******************       | | | |  I  | | | | |         **********************
                         | | | |  I  | +-------------> * DictyDB [D.disco.] *
******************       | | | |  I  | | | | |         **********************
* WormPep        *       | | | |  I  | | | | |
* [C.elegans]    * <---+ | | | |  I  | | | | |         **********************
******************     | | | | |  I  | | | | | +-----> * ENZYME [Nomencl.]  *
                       | | | | |  I  | | | | | |       **********************
******************     | v v v v  v  v v v v v v           v
* REBASE         *     *************************       **********************
* [Restriction   * <-- *   SWISS-PROT          * ----> * OMIM [Human]       *
*  enzymes]      *     *   Protein Sequence    *       **********************
******************     *   Data Bank           *
                       *************************       **********************
******************      ^ ^ ^ ^ ^ ^ ^ | ^ ^ ^          * ECO2DBASE     [2D] *
* StyGene        *      | | | | | | | | | | +--------> **********************
* [S.typhimurium]* <----+ | | | | | | | | |
******************        | | | | | | | | |            **********************
                          | | | | | | | | +----------> * Maize-2DPAGE  [2D] *
******************        | | | | | | | |              **********************
* Transfac       * <------+ | | | | | | |
******************          | | | | | | |              **********************
                            | | | | | | +------------> * SWISS-2DPAGE  [2D] *
******************          | | | | | |                **********************
* Harefield [2D] * <--------+ | | | | |
******************            | | | | |                **********************
                              | | | | +--------------> * Aarhus/Ghent  [2D] *
******************            | | | |                  **********************
* PROSITE        *            | | | |
* [Patterns and  * <----------+ | | +----------------> **********************
* profiles]      *              | |                    * YEPD [Yeast]  [2D] *
******************              | +----------------+   **********************
             |                  v                  |
             |          ***********************    +-> **********************
             +--------> * PDB [3D structures] * <----- * HSSP [3D similar.] *
                        ***********************        **********************

 ===End=of=SWISS-PROT=release=35=notes=====================================
