Attention: World-2DPAGE is no longer maintained.

SWISS-2DPAGE USER MANUAL

Release 18, September 2006

What is SWISS-2DPAGE ?
Structure of a SWISS-2DPAGE entry
The different line types

The ID line

Entry Name
Entry class

The AC line
The DT line
The DE line
The GN line
The OS line
The OC line
The OX line
The MT line
The IM line
The reference (RN, RP, RX, RA, RT, RL) lines

The RN line
The RP line
The RX line
The RA line
The RT line
The RL line

Journal citations
Book citations
Unpublished results
Unpublished observations
Thesis
Patent applications
Submissions

The CC line
The 2D lines
The 1D lines
The DR line
The // line

What is SWISS-2DPAGE ?

SWISS-2DPAGE is an annotated two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) and SDS-PAGE database established in 1993 and maintained collaboratively by the Biomedical Proteomics Research Group (BPRG) of Geneva University and the Swiss Institute of Bioinformatics (SIB).

The SWISS-2DPAGE database assembles data on proteins identified on various 2-D PAGE and SDS-PAGE maps. Each SWISS-2DPAGE entry contains textual data on one protein, including mapping procedures, physiological and pathological information, experimental data (isoelectric point, molecular weight, amino acid composition, peptide masses) and bibliographical references. In addition to this textual data, SWISS-2DPAGE provides several 2-D PAGE and SDS-PAGE images showing the experimentally determined location of the protein, as well as a theoretical region computed from the protein sequence, indicating where the protein might be found in the gel.
Cross-references are provided to Medline and other federated 2-DE databases (COMPLUYEAST-2DPAGE, Cornea-2DPAGE, DOSAC-COBS 2D Page, ECO2DBASE, HSC-2DPAGE, LENS-2DPAGE, OGP-WWW, PHCI-2DPAGE, PMMA-2DPAGE, Siena-2DPAGE, YEPD) and to UniProtKB/Swiss-Prot, which provides many links to other molecular databases (EMBL, GenBank, PROSITE, OMIM, etc.).

For detailed information specific to the current SWISS-2DPAGE release, see the release notes.

If you want to cite SWISS-2DPAGE in a publication please use the following reference:

Hoogland C., Mostaguir K., Sanchez J.-C., Hochstrasser D.F., Appel R.D.

SWISS-2DPAGE, ten years later.

Proteomics 2004, 4(8): 2352-2356.

Copyright Notice

SWISS-2DPAGE is copyright the Swiss Institute of Bioinformatics. There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme see: http://www.expasy.org/ch2d/license.html or send an email to legal@sib.swiss.

The above copyright notice also applies to this user manual as well as to any other SWISS-2DPAGE documents.

Structure of a SWISS-2DPAGE entry

The entries in SWISS-2DPAGE are text files structured to be readable by humans as well as by computer programs. The explanations, descriptions, classifications and comments are in everyday English. However, symbols familiar to biochemists, chemists and molecular biologists are also used. Each entry corresponds to one protein and is composed of lines. Different types of lines, each with their own format, are used to record the various data which make up the entry. A sample protein entry is shown below:


ID   ALF_ECOLI; STANDARD; 2DG.
AC   P0AB71; P11604;
DT   01-SEP-1997, integrated into SWISS-2DPAGE (release 6).
DT   15-MAY-2003, 2D annotation version 4.
DT   26-SEP-2006, general annotation version 10.
DE   Fructose-bisphosphate aldolase class 2 (EC 4.1.2.13) (Fructose-
DE   bisphosphate aldolase class II) (FBP aldolase).
GN   Name=fbaA; Synonyms=fba, fda; OrderedLocusNames=b2925;
OS   Escherichia coli.
OC   Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
OC   Enterobacteriaceae; Escherichia.
OX   NCBI_TaxID=562;
MT   ECOLI, ECOLI-DIGE4.5-6.5, ECOLI5-6.
IM   ECOLI, ECOLI-DIGE4.5-6.5, ECOLI5-6.
RN   [1]
RP   MAPPING ON GEL.
RX   MEDLINE=98410772; PubMed=9740056;
RA   Tonella L., Walsh B.J., Sanchez J.-C., Ou K., Wilkins M.R., Tyler M.,
RA   Frutiger S., Gooley A.A., Pescaru I., Appel R.D., Yan J.X., Bairoch A.,
RA   Hoogland C., Morch F.S., Hughes G.J., Williams K.L., Hochstrasser D.F.;
RT   "'98 Escherichia coli SWISS-2DPAGE database update";
RL   Electrophoresis 19:1960-1971(1998).
RN   [2]
RP   MAPPING ON GEL.
RX   PubMed=11680886;
RA   Tonella L., Hoogland C., Binz P.-A., Appel R.D., Hochstrasser D.F.,
RA   Sanchez J.-C.;
RT   "New perspectives in the Escherichia coli proteome investigation";
RL   Proteomics 1:409-423(2001).
RN   [3]
RP   MAPPING ON GEL.
RX   PubMed=12469338;
RA   Yan J.X., Devenish A.T., Wait R., Stone T., Lewis S., Fowler S.;
RT   "Fluorescence 2-D difference gel electrophoresis and mass spectrometry
RT   based proteomic analysis of E. coli";
RL   Proteomics 2:1682-1698(2002).
2D   -!- MASTER: ECOLI;
2D   -!-   PI/MW: SPOT 2D-000L0H=5.55/40732;
2D   -!-   PI/MW: SPOT 2D-000L1R=5.43/39855;
2D   -!-   AMINO ACID COMPOSITION: SPOT 2D-000L1R: B=10.9, Z=10.5, S=7.2, H=3,
2D         G=10.5, T=5.3, A=9.4, P=4.3, Y=3.6, R=3.2, V=7.4, M=1.7, I=5.3, L=8.6,
2D         F=4.3, K=4.8;
2D   -!-   MAPPING: AMINO ACID COMPOSITION AND SEQUENCE TAG (SKIF) [1].
2D   -!- MASTER: ECOLI-DIGE4.5-6.5;
2D   -!-   PI/MW: SPOT 2D-001WMY=5.49/39104;
2D   -!-   PEPTIDE MASSES: SPOT 2D-001WMY: 955.51; 1502.78; 1762.98; 1878.01; TRYPSIN.
2D   -!-   MAPPING: Peptide mass fingerprinting [3].
2D   -!- MASTER: ECOLI5-6;
2D   -!-   PI/MW: SPOT 2D-001L5L=5.56/50220;
2D   -!-   PI/MW: SPOT 2D-001L6U=5.56/49421;
2D   -!-   PEPTIDE MASSES: SPOT 2D-001L5L: 1320.801; 1502.988; 1878.257;
2D         2591.649; 2719.736; 2871.916; TRYPSIN.
2D   -!-   PEPTIDE MASSES: SPOT 2D-001L6U: 934.591; 950.583; 953.573;
2D         1320.797; 1502.947; 1878.221; 2591.534; 2719.66; TRYPSIN.
2D   -!-   MAPPING: Peptide mass fingerprinting [2].
CC   ---------------------------------------------------------------------------
CC   This SWISS-2DPAGE entry is copyright the Swiss Institute of Bioinformatics.
CC   There are no restrictions on its use by non-profit institutions as long as
CC   its content is in no way modified and this statement is not removed.
CC   Usage by and for commercial entities requires a license agreement (See
CC   http://www.expasy.org/ch2d/license.html or send email to legal@sib.swiss).
CC   ---------------------------------------------------------------------------
DR   UniProtKB/Swiss-Prot; P0AB71; ALF_ECOLI.
//

Each line begins with a two-character line code, which indicates the type of data contained in the line. The current line types and line codes, and the order in which they appear in a SWISS-2DPAGE entry are shown below and are described extensively in the following sections.


---------  ----------------------------    ----------------------
Line code  Content                         Occurrence in an entry
---------  ----------------------------    ----------------------
ID         Identification                  Once; starts the entry
AC         Accession number(s)             One or more
DT         Date                            Three times
DE         Description                     One or more
GN         Gene name(s)                    Optional
OS         Organism species                One or more
OC         Organism classification         One or more
OX         Taxonomy cross-reference(s)     Once
MT         Masters                         One or more
IM         Images                          One or more
RN         Reference number                One or more
RP         Reference position              One or more
RX         Reference cross-reference(s)    Optional
RA         Reference authors               One or more
RT         Reference title                 Optional
RL         Reference location              One or more
CC         Comments or notes               Optional
2D         2-D PAGE specific data          Several
1D         SDS-PAGE specific data          Several
DR         Database cross-references       Optional
//         Termination line                Once; ends the entry
---------  ----------------------------    ----------------------

As shown in the above table, some entries do not contain all of the line types, and some line types occur many times in a single entry. Each entry must begin with an identification line (ID) and end with a terminator line (//).

It must be noted that, for standardization purposes, most of the SWISS-2DPAGE line types and formats are kept from the Swiss-Prot knowledgebase. One can thus refer to the UniProtKB/Swiss-Prot user manual for an extended description of these lines. Only the MT, IM, 2D and 1D lines (2-D PAGE and SDS-PAGE data) are specific to the SWISS-2DPAGE database.

The two-character line type code which begins each line is always followed by three blanks, so that the actual information begins with the sixth character. Information is not extended beyond character position 75.
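The layout rule above (a two-character code, three blanks, data from the sixth character) can be illustrated with a minimal Python sketch. The function names `split_line` and `read_entries` are hypothetical helpers chosen for this illustration; they are not part of any SWISS-2DPAGE tool.

```python
def split_line(line):
    """Split one entry line into (line_code, data).

    The code occupies the first two characters; the data starts at the
    sixth character, i.e. after the three blanks that follow the code.
    """
    line = line.rstrip("\n")
    return line[:2], line[5:]


def read_entries(lines):
    """Yield each entry as a list of (code, data) pairs.

    An entry ends at the '//' termination line.
    """
    entry = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        code, data = split_line(line)
        if code == "//":
            yield entry
            entry = []
        else:
            entry.append((code, data))
    if entry:  # input that lacks a final terminator (e.g. truncated file)
        yield entry
```

Passing an open file object to `read_entries` should work, since it only iterates over lines.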

The different line types

The ID line

The ID (IDentification) line is always the first line of an entry. The general form of the ID line is:

ID   ENTRY_NAME; ENTRY_CLASS; 2DG.

Entry Name

The first item on the ID line is the entry name. This name is a useful means of identifying a protein. The entry name consists of up to 12 uppercase alphanumeric characters.

SWISS-2DPAGE uses the general purpose naming convention used by UniProtKB/Swiss-Prot which can be symbolized as X_Y, where:

For entries coming from UniProtKB/Swiss-Prot: X is a mnemonic code of at most 5 alphanumeric characters representing the protein name.
Examples: B2MG is for Beta-2-microglobulin, HBA is for Hemoglobin alpha chain and ALBU is for albumin.

For entries coming from UniProtKB/TrEMBL: X is identical to the accession number of the entry (thus 6 alphanumeric characters).

The '_' sign serves as a separator.

Y is a mnemonic species identification code of at most 5 alphanumeric characters representing the biological source of the protein.
Examples: ECOLI for Escherichia coli, HUMAN for Homo sapiens or YEAST for Baker's yeast (Saccharomyces cerevisiae).
The names of all the species identification codes currently defined in the Swiss-Prot knowledgebase are listed in the Swiss-Prot document file speclist.txt.

An example of a complete protein entry name is: A1AT_HUMAN for the human alpha-1-antitrypsin.

Entry class

The entry class defines the type of data, which may be STANDARD (for data which are complete) or PRELIMINARY (for entries in which certain information is missing or has not yet been verified).
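As an illustration of the ID line format, the following Python sketch checks a line against the general form given above and splits the entry name into its X and Y parts. The regular expression and the function name `parse_id_line` are assumptions made for this example; the length limits (up to 6 characters for X, up to 5 for Y) are taken from the description above.

```python
import re

# General form: ID   ENTRY_NAME; ENTRY_CLASS; 2DG.
ID_RE = re.compile(
    r"^ID   (?P<name>[A-Z0-9]{1,6}_[A-Z0-9]{1,5}); "
    r"(?P<cls>STANDARD|PRELIMINARY); 2DG\.$"
)


def parse_id_line(line):
    """Return the entry name, its two components and the entry class."""
    m = ID_RE.match(line)
    if m is None:
        raise ValueError("not a valid ID line: %r" % line)
    name = m.group("name")
    protein_code, species_code = name.split("_")
    return {
        "entry_name": name,
        "protein_code": protein_code,   # the X part
        "species_code": species_code,   # the Y part
        "entry_class": m.group("cls"),
    }
```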

The AC line

The AC (ACcession number) line lists the accession number(s) associated with an entry. The format of the AC line is:


AC   AC_number_1;[ AC_number_2;]...[ AC_number_N;]

An example of an accession number line is shown below:

AC   P02649;

The accession numbers are separated by semicolons and the list is terminated by a semicolon. If necessary, more than one AC line will be used. Most SWISS-2DPAGE entries currently have only one accession number.

The purpose of accession numbers is to provide a stable way of identifying entries from release to release. It is sometimes necessary for reasons of consistency to change the names of the entries, for example, to ensure that related entries have similar names. However, an accession number is always conserved, and therefore allows unambiguous citation of SWISS-2DPAGE entries.

Researchers who wish to cite entries in their publications should always cite the first accession number. This is commonly referred to as the 'primary accession number'.

Entries will have more than one accession number if they have been merged or split. For example, when two entries are merged into one, a new accession number goes at the start of the AC line, and those from the merged entries are listed after this one. Similarly, if an existing entry is split into two or more entries (a rare occurrence), the original accession number list is retained in all the derived entries.

An accession number is dropped only when the data to which it was assigned have been completely removed from the database.

Accession numbers consist of 6 alphanumerical characters in the following format:


    1       2     3         4         5         6

    [O,P,Q] [0-9] [A-Z,0-9] [A-Z,0-9] [A-Z,0-9] [0-9]

Here are some examples of valid accession numbers: O08709, Q9ZPF5, Q9Z1T6 and P54638.
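The six-position pattern above translates directly into a regular expression. The Python sketch below is one possible way to validate accession numbers and to collect them from AC lines; the names `is_valid_accession` and `parse_ac_lines` are hypothetical.

```python
import re

# Positions 1-6: [O,P,Q] [0-9] [A-Z,0-9] [A-Z,0-9] [A-Z,0-9] [0-9]
AC_RE = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]")


def is_valid_accession(acc):
    """True if acc matches the accession number format described above."""
    return AC_RE.fullmatch(acc) is not None


def parse_ac_lines(lines):
    """Collect accession numbers from one or more AC lines, in order.

    The first element of the result is the primary accession number.
    """
    accessions = []
    for line in lines:
        data = line[5:]  # skip the line code and the three blanks
        accessions.extend(t.strip() for t in data.split(";") if t.strip())
    return accessions
```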

The DT line

The DT (DaTe) lines show the date of creation and last modification of the database entry. The format of the DT lines is:


DT   DD-MMM-YYYY, integrated into SWISS-2DPAGE (release n).

DT   DD-MMM-YYYY, 2D annotation version x.

DT   DD-MMM-YYYY, general annotation version x.

where 'DD' is the day, 'MMM' the month and 'YYYY' the year. The dates shown in DT lines correspond to the date of the biweekly release at which an entry was integrated or updated. There are always three DT lines in each entry, each associated with a specific comment:

The first DT line indicates when the entry first appeared in the database. The associated comment 'integrated into SWISS-2DPAGE (release n)' indicates the release number.
The second DT line indicates when the 2D annotations were last modified. The associated comment '2D annotation version x' indicates the 2D annotation version number. This version number is incremented by one when new spots have been identified for this protein, when new maps have been added to this protein, or when existing spot annotations have been modified.
The third DT line indicates when the general annotations were last modified. The associated comment 'general annotation version x' indicates the general annotation version number. This version number is incremented by one whenever any data in other fields of the entry is modified (protein name, accession number, description, gene name, etc.).

Example of a block of DT lines:


DT   01-AUG-1995, integrated into SWISS-2DPAGE (release 2).

DT   01-OCT-2001, 2D annotation version 3.

DT   07-AUG-2006, general annotation version 7.
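A DT line can be split into its date and its comment as in the Python sketch below. The month table and the function name `parse_dt_line` are assumptions of this example; an explicit table is used instead of locale-dependent month parsing.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}

# DT   DD-MMM-YYYY, comment.
DT_RE = re.compile(r"^DT   (\d{2})-([A-Z]{3})-(\d{4}), (.+)\.$")


def parse_dt_line(line):
    """Return (date, comment) for one DT line."""
    m = DT_RE.match(line)
    if m is None:
        raise ValueError("not a valid DT line: %r" % line)
    day, month, year, comment = m.groups()
    return date(int(year), MONTHS[month], int(day)), comment
```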

The DE line

The DE (DEscription) lines contain general descriptive information about the protein stored. This information is generally sufficient to identify the protein precisely. The format of the DE lines is:

DE   Description.

The description is given in ordinary English and is free-format. In some cases, more than one DE line is required; in this case, the text is divided only between words and only the last DE line is terminated by a period.

When the complete sequence was not determined, the last information given on the DE lines will be '(Fragment)' or '(Fragments)'.

Two examples of description lines are given here:

DE   Apolipoprotein E (Apo-E).

DE   Aldehyde dehydrogenase A (EC 1.2.1.22) (Lactaldehyde dehydrogenase).

For a detailed description of the current rules applied to the DE line, one should refer to the UniProtKB/Swiss-Prot user manual.

The GN line

The GN (Gene Name) line contains the name(s) of the gene(s) that code for the stored protein sequence. The format of the GN line is:


GN   NAME1[ AND|OR NAME2...].

Examples:


GN   APOE.

GN   ATPA.

It often occurs that more than one gene name has been assigned to an individual locus. In that case all the synonyms will be listed. The word 'OR' separates the different designations. The first name in the list is assumed to be the most correct (or most current) designation. Example:


GN   ATPA OR UNCA OR PAPA OR B3734 OR Z5232 OR ECS4676.

In a few cases, multiple genes code for an identical protein sequence. In that case all the different gene names will be listed. The word 'AND' separates the designations. Example:


GN   HBA1 AND HBA2.

In very rare cases 'AND' and 'OR' can both be present. In that case parentheses are used as shown in the following example:


GN   (TUFA OR B3339) AND (TUFB OR B3980).
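The 'AND'/'OR' convention can be unpacked as in the following Python sketch, which returns one list of synonyms per gene. It assumes the 'AND'/'OR' form described in this section only (not the 'Name=...; Synonyms=...;' form seen in the sample entry earlier), and it takes the data part of the line, without the 'GN' code. The function name `parse_gn` is hypothetical.

```python
def parse_gn(data):
    """Parse GN data into a list of genes, each a list of synonyms.

    'AND' separates distinct genes; 'OR' separates synonyms of one gene.
    The first synonym of each gene is the preferred designation.
    """
    text = data.strip().rstrip(".")
    genes = []
    for part in text.split(" AND "):
        part = part.strip()
        # Parentheses group the synonyms when AND and OR are both present.
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1]
        genes.append([name.strip() for name in part.split(" OR ")])
    return genes
```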

The OS line

The OS (Organism Species) line specifies the organism(s) which was the source of the stored protein. In the rare case where all the species information will not fit on a single line more than one OS line is used. The last OS line is terminated by a period.

The species designation consists, in most cases, of the Latin genus and species designation followed by the English name (in parentheses).

Examples of OS lines are shown here:

OS   Escherichia coli.

OS   Homo sapiens (Human).

OS   Saccharomyces cerevisiae (Baker's yeast).

The OC line

The OC (Organism Classification) lines contain the taxonomic classification of the source organism. The taxonomic classification used is that maintained at the NCBI (see http://www.ncbi.nlm.nih.gov/Taxonomy/) and used by the nucleotide sequence databases (EMBL/GenBank/DDBJ).

The classification is listed top-down as nodes in a taxonomic tree in which the most general grouping is given first. The classification may be distributed over several OC lines, but nodes are not split or hyphenated between lines. The individual items are separated by semicolons and the list is terminated by a period.

The format of the OC lines is:

OC   Node[; Node...].

For example the classification lines for a human sequence would be:

OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;

OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.

The OX line

The OX (Organism taXonomy cross-reference) line is used to indicate the identifier assigned to a specific source organism in a taxonomic database. The format of the OX line is:


OX   Taxonomy-database_Qualifier=Taxonomic code;

Currently the cross-references are made to the NCBI's taxonomic classification (see http://www.ncbi.nlm.nih.gov/Taxonomy/), which is associated with the qualifier 'TaxID' and a one- to six-digit taxonomic code.

For example:


OX   NCBI_TaxID=9606;

The MT line

The MT (MasTer) lines are specific to SWISS-2DPAGE. These lines indicate on what types of maps the protein has been identified (such as PLASMA, LIVER, etc.).

A master line example is shown here.

MT   NUCLEI_LIVER_HUMAN, NUCLEOLI_HELA_2D_HUMAN, NUCLEOLI_HELA_1D_HUMAN.

The IM line

The IM (IMages) lines list the 2-D PAGE and SDS-PAGE images which are associated with the entry. These may be, for example, TUMORAL LIVER, NORMAL LIVER or just LIVER.

The reference (RN, RP, RX, RA, RT, RL) lines

These lines comprise the literature citations within SWISS-2DPAGE. The citations indicate the sources from which the data has been abstracted. The reference lines for a given citation occur in a block, and are always in the order RN, RP, RX, RA, RT, RL. Within each such reference block the RN and RP lines occur once, the RX and RT lines occur zero or more times, and the RA and RL lines each occur one or more times. If several references are given, there will be a reference block for each.

An example of a complete reference is:


RN   [1]

RP   MAPPING ON GEL.

RX   MEDLINE=98410772; PubMed=9740056;

RA   Tonella L., Walsh B.J., Sanchez J.-C., Ou K., Wilkins M.R., Tyler M.,

RA   Frutiger S., Gooley A.A., Pescaru I., Appel R.D., Yan J.X., Bairoch A.,

RA   Hoogland C., Morch F.S., Hughes G.J., Williams K.L., Hochstrasser D.F.;

RT   "'98 Escherichia coli SWISS-2DPAGE database update.";

RL   Electrophoresis 19:1960-1971(1998).

The formats of the individual lines are explained below.
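Because each reference block starts with an RN line, the reference lines of an entry can be grouped with a simple loop. The Python sketch below assumes the entry has already been split into (code, data) pairs; the function name `group_references` and the dictionary layout are choices made for this illustration.

```python
REF_CODES = ("RN", "RP", "RX", "RA", "RT", "RL")


def group_references(pairs):
    """Group (code, data) pairs into one dictionary per reference block.

    A new block starts at each RN line; other line types are ignored.
    """
    references = []
    for code, data in pairs:
        if code not in REF_CODES:
            continue
        if code == "RN":
            references.append({
                "number": int(data.strip().strip("[]")),
                "lines": {c: [] for c in REF_CODES[1:]},
            })
        elif references:
            references[-1]["lines"][code].append(data)
    return references
```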

The RN line

The RN (Reference Number) line gives a sequential number to each reference citation in an entry. This number is used to indicate the reference in comments and 2D lines. The format of the RN line is:

RN   [N]

where N denotes the Nth reference for this entry. The reference number is always enclosed in square brackets.

The RP line

The RP (Reference Position) line describes the extent of the work carried out by the authors of the reference cited. The format of the RP line is:

RP   COMMENT.

Typical examples of RP lines are shown below:

RP   PROTEIN SEQUENCE.

RP   AMINO ACID COMPOSITION.

RP   MAPPING ON GEL.

RP   CHARACTERIZATION.

RP   REVIEW.

The RX line

The RX (Reference cross-reference) line is an optional line which is used to indicate the identifier assigned to a specific reference in a bibliographic database. The format of the RX line is:

RX   BIBLIOGRAPHIC_DATABASE=IDENTIFIER[; BIBLIOGRAPHIC_DATABASE=IDENTIFIER...];

Where the valid bibliographic database names and their associated identifier are:


-------  ------------------------------------------
Name     Identifier
-------  ------------------------------------------
MEDLINE  Eight-digit MEDLINE Unique Identifier (UI)
PubMed   PubMed Unique Identifier (PMID)
-------  ------------------------------------------

Example of RX lines:


RX   PubMed=11503206;

RX   MEDLINE=98410772; PubMed=9740056;
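The 'NAME=IDENTIFIER;' pairs of an RX line map naturally onto a dictionary, as in this minimal Python sketch. The function name `parse_rx` is hypothetical, and the function expects the data part of the line, without the 'RX' code.

```python
def parse_rx(data):
    """Return {database_name: identifier} for the data of one RX line."""
    xrefs = {}
    for item in data.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the final semicolon
        database, _, identifier = item.partition("=")
        xrefs[database] = identifier
    return xrefs
```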

The RA line

The RA (Reference Author) lines list the authors of the paper (or other work) cited. All of the authors are included, and are listed in the order given in the paper. The names are listed surname first followed by a blank followed by initial(s) with periods. The authors' names are separated by commas and terminated by a semicolon. Author names are not split between lines. An example of the use of RA lines is shown below:

RA   Anderson N.L., Anderson N.G.;

As many RA lines as necessary are included for each reference.

The RT line

The RT (Reference Title) lines give the title of the paper (or other work) as exactly as possible given the limitations of the computer character set. The title is enclosed in double quotes, and may be continued over several lines as necessary. The title lines are terminated by a semicolon. An example of the use of RT lines is shown below:

RT   "High resolution two-dimensional electrophoresis of human plasma

RT   proteins.";

The RT line is optional; when a title is given, as many RT lines as necessary are included for each reference.

The RL line

The RL (Reference Location) lines contain the conventional citation information for the reference. In general, the RL lines alone are sufficient to find the paper in question.

Journal citations

The RL line for a journal citation includes the journal abbreviation, the volume number, the page range, and the year. The format for such an RL line is:

RL   Journal_abbrev Volume:First_page-Last_page(YEAR).

Journal names are abbreviated according to the conventions used by the National Library of Medicine (NLM) and are based on the existing ISO and ANSI standards.

An example of an RL line is:

RL   Proc. Natl. Acad. Sci. U.S.A. 74:5421-5425(1977).

When a reference is made to a paper which is 'in press' at the time when the data bank is released, the page range and possibly the volume number are indicated as '0' (zero) if unknown. Examples of RL lines of this type are shown here:

RL   Electrophoresis 0:0-0(1997).

RL   Electrophoresis 14:in press(1993).
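The journal citation format can be matched with a regular expression, as sketched below in Python. The pattern and the function name `parse_journal_rl` are assumptions of this example. The function takes the data part of the RL line and returns None for anything that does not follow the numeric form, which includes the 'in press' variant shown above.

```python
import re

# Journal_abbrev Volume:First_page-Last_page(YEAR).
JOURNAL_RL_RE = re.compile(
    r"^(?P<journal>.+) (?P<volume>\d+):"
    r"(?P<first>\d+)-(?P<last>\d+)\((?P<year>\d{4})\)\.$"
)


def parse_journal_rl(data):
    """Parse the data of a journal RL line, or return None if no match."""
    m = JOURNAL_RL_RE.match(data)
    if m is None:
        return None
    return {
        "journal": m.group("journal"),
        "volume": int(m.group("volume")),   # 0 means unknown (in press)
        "first_page": int(m.group("first")),
        "last_page": int(m.group("last")),
        "year": int(m.group("year")),
    }
```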

Book citations

A variation of the RL line format is used for papers found in books or other similar publications, which are cited as shown below:

RL   (In) Editor1 I.[, Editor2 I., EditorX I.] (eds.);

RL   Book Title, pp.[Vol:]First-Last, Publisher, City (Year).

The first RL line contains the designation '(In)', which indicates that this is a book reference. These citations generally include the following information: the title of the book, the name of the editor(s), the page range, the publisher name, the city where it is published, and the year of publication (which is always shown between parentheses).

An example of a book citation is given here:


RL   (In) Neidhardt et al. (eds.);

RL   Escherichia coli and Salmonella: Cellular and Molecular Biology (2nd

RL   ed.), pp.2067-2117, ASM Press, Washington DC (1996).

Unpublished results

RL lines for unpublished results follow the format shown in the following example:

RL   Unpublished results, cited by:

RL   ULRICH E.L., KROGMANN D.W., MARKLEY J.L.;

RL   J. Biol. Chem. 257:9356-9364(1982).

Unpublished observations

For unpublished observations the format of the RL line is:

RL   Unpublished observations (MMM-YEAR).

Where 'MMM' is the month and 'YEAR' is the year.

We use the 'unpublished observations' RL line to cite communications by scientists to SWISS-2DPAGE of unpublished information concerning various aspects of an entry.

Thesis

For Ph.D. theses the format of the RL line is:

RL   Thesis (YEAR), Institution Name, Country.

An example of such a line is given here:

RL   Thesis (1994), Geneva University, Switzerland.

Patent applications

For patent applications the format of the RL line is:

RL   Patent number PAT_NUMB, DD-MMM-YYYY.

Where 'PAT_NUMB' is the international publication number of the patent, 'DD' is the day, 'MMM' is the month and 'YYYY' is the year.

Submissions

The final form that an RL line can take is that used for submissions. The format of such an RL line is:

RL   Submitted (MMM-YEAR) to the DATABASE_NAME database.

Where 'MMM' is the month, 'YEAR' is the year and 'DATABASE_NAME' is the database name (for example SWISS-2DPAGE or Swiss-Prot).

An example of a submission RL line is given here:

RL   Submitted (JUN-1994) to the SWISS-2DPAGE database.

The CC line

The CC lines are free text comments on the entry, and may be used to convey any useful information. The comments always appear below the last reference line and are grouped together in comment blocks, a block being made of 1 or more comment lines. The first line of a block is marked with the characters '-!-'.

The format of a comment block is:

CC   -!- FIRST LINE OF A COMMENT BLOCK

CC       SECOND AND SUBSEQUENT LINES.

A major proportion of the comment blocks are arranged according to what we designate as 'topics'. The format of a comment block which belongs to a 'topic' is:

CC   -!- TOPIC: FREE TEXT DESCRIPTION.

The current topics used in SWISS-2DPAGE are:

SUBUNIT: Description of the quaternary structure of a protein.
MISCELLANEOUS: Any comment which does not belong to any of the other defined topics.

For a detailed description of current topics used in Swiss-Prot, one should refer to the UniProtKB/Swiss-Prot user manual.

We show here an example of its usage:

CC   -!- SUBUNIT: HOMOTETRAMER.

CC   -!- MISCELLANEOUS: POSSIBLE SIGNAL/TRANSIT 1-10.

The 2D lines

The 2D lines contain data specific to 2-D PAGE reference maps. The 2D lines may start with free text comments concerning all the reference maps available for the entry. Then appear 2D lines grouped by master.

The 2D comment lines form a block of one or more lines; the first line is marked with the characters '-!-' and the last one is terminated by a period. The format of these 2D comment lines is similar to that of the CC lines:

2D   -!- FIRST LINE OF 2D COMMENT BLOCK

2D       SECOND AND SUBSEQUENT LINES.

As for CC lines, the 2D comment lines are arranged in 'topics'. The current topic is:

MAPPING COMMENT: General comments about the mapping procedure concerning all the reference maps available for the entry. The format is as follows:

2D   -!- MAPPING COMMENT: FREE TEXT DESCRIPTION.

Here is an example of the 2D mapping comment usage:

2D   -!- MAPPING COMMENT: CROSS-SPECIES IDENTIFICATION

2D       (UniProtKB/Swiss-Prot; PGMU_RAT; P38652).

Then appear the 2D lines block for each master. Each block is made of two or more 2D lines. The first line of a block specifies the master and has the following format:

2D   -!- MASTER: 'MASTER';

where 'MASTER' is one of the SWISS-2DPAGE masters available in the current release.

Examples of the first line of a 2D master block are:

2D   -!- MASTER: LIVER_HUMAN;

2D   -!- MASTER: HEPG2SP_HUMAN;

A major proportion of the 2D blocks are arranged according to what we designate as 'topics'. There are fixed format and free text topics. The first line of a topic is marked with the characters '-!-'.

Current fixed format topics are:

1. PI/MW: Description of the isoelectric point and molecular weight of the entry on the SWISS-2DPAGE master gel

The format of the PI/MW topic is:

2D   -!-   PI/MW: SPOT 'SERIAL NUMBER'='PI'/'MW';

Where 'SERIAL NUMBER' is the spot serial number (a unique spot identifier across all maps in SWISS-2DPAGE), 'PI' is the experimental isoelectric point and 'MW' the experimental molecular weight of the spot as determined on the master map.

Here is an example for the PI/MW topic:

2D   -!-   PI/MW: SPOT 2D-0000GG=5.80/65958;
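A PI/MW topic line can be read with a short regular expression, as in the Python sketch below. The pattern and the function name `parse_pimw` are assumptions of this example, based only on the format line and example above.

```python
import re

# -!-   PI/MW: SPOT 'SERIAL NUMBER'='PI'/'MW';
PIMW_RE = re.compile(
    r"PI/MW: SPOT (?P<spot>\S+)=(?P<pi>\d+(?:\.\d+)?)/(?P<mw>\d+);"
)


def parse_pimw(text):
    """Return (spot serial number, pI, MW) or None if text is no PI/MW topic."""
    m = PIMW_RE.search(text)
    if m is None:
        return None
    return m.group("spot"), float(m.group("pi")), int(m.group("mw"))
```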

2. AMINO ACID COMPOSITION: Description of the amino acid composition of the entry (in %) determined after 2-D PAGE transfer on a PVDF membrane, hydrolysis, Fmoc derivatisation and HPLC analysis (see protocols)

The format of the AMINO ACID COMPOSITION topic is:

2D   -!-   AMINO ACID COMPOSITION: SPOT 'SERIAL NUMBER': AAC_LIST
2D         SUBSEQUENT AAC_LIST;

where AAC_LIST contains the beginning of the amino acid composition and SUBSEQUENT AAC_LIST contains the remaining parts of the amino acid composition. This topic may take one or more lines. The amino acid composition is a list of comma-separated items of the form 'X=AAC', where X is the one-letter code for an amino acid, and AAC is its value in percent of the total amount of amino acids.

We give here an example of an AMINO ACID COMPOSITION topic:


2D   -!-   AMINO ACID COMPOSITION: SPOT 2D-000SEW: B=11.30, S=10.80,
2D         Z=12.10, G=11.20, T=3.90, H=0.90, Y=4.60, A=9.40, P=2.30, R=3.40,
2D         M=1.40, V=9.20, I=3.60, L=6.30, F=2.30, K=7.30;

The one-letter and three-letter codes for amino acids used in SWISS-2DPAGE are those adopted by the commission on Biochemical Nomenclature of the IUPAC-IUB.
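An AMINO ACID COMPOSITION topic, which may span several 2D lines, can be turned into a dictionary as in the following Python sketch. It assumes the caller has already collected the data parts (without the '2D' code) of the lines that belong to the topic; the function name `parse_aac` is hypothetical.

```python
def parse_aac(data_lines):
    """Return (spot serial number, {one_letter_code: percent}).

    data_lines holds the data of the first topic line and of its
    continuation lines, in order.
    """
    text = " ".join(part.strip() for part in data_lines)
    _, _, body = text.partition("AMINO ACID COMPOSITION: SPOT ")
    spot, _, items = body.partition(":")
    composition = {}
    for item in items.rstrip("; ").split(","):
        item = item.strip()
        if not item:
            continue
        letter, _, value = item.partition("=")
        composition[letter] = float(value)
    return spot.strip(), composition
```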

3. PEPTIDE MASSES: Description of the experimental peptide masses of the entry (in daltons) obtained by mass spectrometry. Only the peptide masses allowing the identification using the Expasy PeptIdent tool are given.

The parameters used for identification are:

Enzyme: Trypsin, allowing for up to 2 missed cleavages
Cysteine treated with Iodoacetamide to form carboxyamidomethyl cysteine (Cys_CAM) considered
Methionine in oxidized form also considered
Peptide masses are monoisotopic and interpreted as [M+H]+

The format of the PEPTIDE MASSES topic is:


2D   -!-   PEPTIDE MASSES: SPOT 'SERIAL NUMBER': MASSES_LIST;
2D         SUBSEQUENT MASSES_LIST; 'ENZYME'.

where MASSES_LIST contains the beginning of the peptide masses list and SUBSEQUENT MASSES_LIST contains the remaining parts (if needed), and ENZYME is the enzyme used for the digestion. This topic may take one or more lines. The peptide masses are separated by semicolons.

An example is shown below:


2D   -!-   PEPTIDE MASSES: SPOT 2D-0015H6: 1001.631; 1267.653; 1731.898;

2D         1821.909; TRYPSIN.
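A PEPTIDE MASSES topic can be parsed along the same lines: the semicolon-separated items are masses, except for the last one, which names the enzyme. The Python sketch below assumes the data parts of the topic lines (without the '2D' code) have been collected by the caller; `parse_peptide_masses` is a hypothetical name.

```python
def parse_peptide_masses(data_lines):
    """Return (spot serial number, list of masses, enzyme name)."""
    text = " ".join(part.strip() for part in data_lines)
    _, _, body = text.partition("PEPTIDE MASSES: SPOT ")
    spot, _, rest = body.partition(":")
    # Drop the final period, then split on the semicolon separators.
    tokens = [t.strip() for t in rest.rstrip(". ").split(";") if t.strip()]
    enzyme = tokens.pop()  # the last item is the enzyme
    return spot.strip(), [float(t) for t in tokens], enzyme
```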

4. PEPTIDE SEQUENCES: Description of the experimental peptide sequences of the entry obtained by tandem mass spectrometry. Only the peptide sequences identified are given.

The format of the PEPTIDE SEQUENCES topic is:


2D   -!-   PEPTIDE SEQUENCES: SPOT 'SERIAL NUMBER': SEQUENCES_LIST;
2D         SUBSEQUENT SEQUENCES_LIST.

where 'SERIAL NUMBER' is the spot serial number, SEQUENCES_LIST contains the beginning of the peptide sequences list and SUBSEQUENT SEQUENCES_LIST contains the remaining parts (if needed). This topic may take one or more lines. The peptide sequences are given as a list of semicolon-separated items of the form '(S)EQUENC(E),X-Y', where (S)EQUENC(E) is the peptide sequence found, and X and Y are respectively the start and end positions of the peptide in the protein sequence.

An example is shown below:


2D   -!-   PEPTIDE SEQUENCES: SPOT 2D-001WFZ: (R)VASWSTAR(H),318-325;

2D         (R)QPVSASDFALQFTPGKR(Y),391-407.
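The items of a PEPTIDE SEQUENCES topic can be extracted with a regular expression, as in this Python sketch. It rests on one assumption that the manual does not state explicitly: in the example above the X-Y range has the same length as the letters outside the parentheses, so the parenthesized letters are read here as the residues flanking the peptide. The pattern and the name `parse_peptide_sequences` are hypothetical.

```python
import re

# (S)EQUENC(E),X-Y  with the parenthesized letters read as flanking residues
PEPTIDE_RE = re.compile(
    r"\((?P<before>[A-Z-]?)\)(?P<seq>[A-Z]+)\((?P<after>[A-Z-]?)\),"
    r"(?P<start>\d+)-(?P<end>\d+)"
)


def parse_peptide_sequences(text):
    """Return a list of (before, sequence, after, start, end) tuples."""
    return [
        (
            m.group("before"),
            m.group("seq"),
            m.group("after"),
            int(m.group("start")),
            int(m.group("end")),
        )
        for m in PEPTIDE_RE.finditer(text)
    ]
```

Under the stated assumption, `end - start + 1` equals the length of the sequence for both items of the example above, which can serve as a consistency check.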

The format of a free text topic is:

2D   -!-   TOPIC: FREE TEXT DESCRIPTION.

Current free text topics are:

1. MAPPING: Description of the biochemical technique which has allowed the identification of the entry on the SWISS-2DPAGE master gel.

2. NORMAL LEVEL: Description of the physiological protein expression.

3. PATHOLOGICAL LEVEL: Description of pathological protein expressions (an increase or decrease).

4. NORMAL POSITIONAL VARIANTS: Description of physiological polymorphisms.

5. PATHOLOGICAL POSITIONAL VARIANTS: Description of pathological polymorphisms.

6. EXPRESSION: Description of the protein expression modifications including level and/or post-translational modifications.

Examples of free text topics are:

2D   -!-   MAPPING: MATCHING WITH A PLASMA GEL.

2D   -!-   NORMAL LEVEL: 30-60 MG/L.

2D   -!-   PATHOLOGICAL LEVEL: INCREASED DURING THE ACUTE-

2D         PHASE REACTION; DECREASED DURING EMPHYSEMA.

2D   -!-   NORMAL POSITIONAL VARIANTS: 30 GENETICS

2D         VARIANTS KNOWN AS PI ALLELES.

2D   -!-   PATHOLOGICAL POSITIONAL VARIANTS: ALPHA-1-

2D         ANTITRYPSIN PITTSBURGH.

2D   -!-   EXPRESSION: decrease after benzoic acid treatment [1].
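Since a free text topic may continue over several lines, a reader of the file has to reassemble it before use. The sketch below is one way this could be done in Python, under assumptions that are not stated in this manual: a line carrying the '-!-' marker starts a new topic, any other 2D line continues the previous one, and a line ending with a hyphen is joined to the next line without a space. The function name `parse_free_text_topics` is illustrative.

```python
def parse_free_text_topics(lines):
    """Collect 'TOPIC: text' pairs from the 2D (or 1D) lines of one master block."""
    topics = {}
    current = None
    for line in lines:
        body = line[2:].strip()          # drop the two-character line code
        if not body:
            continue                     # skip blank lines
        if body.startswith("-!-"):
            # A new topic: everything before the first colon is its name.
            name, _, text = body[3:].strip().partition(":")
            current = name.strip()
            topics[current] = text.strip()
        elif current is not None:
            # Continuation line. Assumption: a trailing hyphen means the
            # hyphenated term continues directly on the next line.
            separator = "" if topics[current].endswith("-") else " "
            topics[current] += separator + body
    return topics


example = [
    "2D   -!-   NORMAL LEVEL: 30-60 MG/L.",
    "2D   -!-   PATHOLOGICAL LEVEL: INCREASED DURING THE ACUTE-",
    "2D         PHASE REACTION; DECREASED DURING EMPHYSEMA.",
]
for name, text in parse_free_text_topics(example).items():
    print(name, "=>", text)
```

A dictionary keeps only the last occurrence of a topic name, so this sketch is not suitable for topics that are repeated within one block; a list of pairs would be needed in that case.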

The 1D lines

The 1D lines contain data specific to SDS-PAGE reference gels. These lines are arranged like the 2D ones. That is, the 1D lines for a given master occur in a block. A block is made of two or more 1D lines. The first line of a block specifies the master and has the following format:

1D   -!- MASTER: 'MASTER';

where 'MASTER' is one of the SWISS-2DPAGE masters available in the current release.

An example of the first line of a 1D master block is:

1D   -!- MASTER: NUCLEOLI_HELA_1D_HUMAN;

The current fixed format topic for 1D lines is:

1. MW: Description of the experimental molecular weight of the entry on the SWISS-2DPAGE master gel.

The format of the MW topic is:

1D   -!-   MW: BAND 'SERIAL NUMBER'='MW';

where 'SERIAL NUMBER' is the SWISS-2DPAGE serial number (a unique spot identifier across all gels in SWISS-2DPAGE), and 'MW' is the experimental molecular weight of the band as determined on the master gel.

Here is an example for the MW topic:

1D   -!-   MW: BAND 1D-001V8C=63488;

See 2D lines for other fixed format topics available.
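As an illustration of the MASTER and MW formats just described, the following Python sketch reads the master name and the band weights from a 1D block. It is only a sketch under stated assumptions: each MW line is taken to describe a single band, the weight is read as a plain number, and the names `MASTER_RE`, `MW_RE` and `parse_1d_block` are hypothetical.

```python
import re

# First line of a block: 1D   -!- MASTER: 'MASTER';
MASTER_RE = re.compile(r"^1D\s+-!-\s+MASTER:\s*(\S+);$")
# MW topic: 1D   -!-   MW: BAND 'SERIAL NUMBER'='MW';
MW_RE = re.compile(r"^1D\s+-!-\s+MW:\s*BAND\s+(\S+)=(\d+(?:\.\d+)?);$")


def parse_1d_block(lines):
    """Return (master name, {band serial number: molecular weight})."""
    master = None
    bands = {}
    for line in lines:
        line = line.rstrip()
        match = MASTER_RE.match(line)
        if match:
            master = match.group(1)
            continue
        match = MW_RE.match(line)
        if match:
            # Assumption: one band per MW line.
            bands[match.group(1)] = float(match.group(2))
    return master, bands


block = [
    "1D   -!- MASTER: NUCLEOLI_HELA_1D_HUMAN;",
    "1D   -!-   MW: BAND 1D-001V8C=63488;",
]
print(parse_1d_block(block))
```

Lines that match neither pattern, such as free text topics, are ignored by this sketch.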

The format of a free text topic is:

1D   -!-   TOPIC: FREE TEXT DESCRIPTION.

The current free text topic is:

1. MAPPING: Description of the biochemical technique which has allowed the identification of the entry on the SWISS-2DPAGE master gel.

See 2D lines for other free text topics available.

The DR line

The DR (Database cross-Reference) lines are used as pointers to information related to SWISS-2DPAGE entries and found in other databases.

The format of the DR line is:

DR   DATABASE; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER.

The first item on the DR line, the database identifier, is the abbreviated name of the data collection to which reference is made. The currently defined database identifiers are:


-------------------- ---------------------------------------------------------------------
Identifier           Database description
-------------------- ---------------------------------------------------------------------
Cornea-2DPAGE        2-DE database at Aarhus University, Denmark
COMPLUYEAST-2DPAGE   2-DE database at Madrid University, Spain
DOSAC-COBS 2D-PAGE   2-DE database at Palermo University, Italy
ECO2DBASE            Escherichia coli gene-protein database (2D gel spots)
HSC-2DPAGE           Harefield hospital 2D gel protein databases
LENS-2DPAGE          2-DE database of mammalian lens samples of the Oregon Health & Science University, US
OGP-WWW              Oxford GlycoProteomics database (Human platelet) at Oxford University, UK
PHCI-2DPAGE          Parasite host cell interaction 2D-PAGE, Aarhus University, Denmark
PMMA-2DPAGE          2D-PAGE database at Purkyne Military Medical Academy, Czech Republic
Siena-2DPAGE         2-DE protein database, Siena University, Italy
UniProtKB/Swiss-Prot Protein Knowledgebase from the Swiss Institute of Bioinformatics and
                     the EMBL Outstation - the European Bioinformatics Institute.
UniProtKB/TrEMBL     Computer-annotated supplement to Swiss-Prot from the Swiss Institute of Bioinformatics and
                     the EMBL Outstation - the European Bioinformatics Institute.
YEPD                 Yeast electrophoresis protein database
-------------------- ---------------------------------------------------------------------

The second item on the DR line, the primary identifier, is an unambiguous pointer to the information entry in the database to which reference is being made. For a UniProtKB/Swiss-Prot reference, the primary identifier is the first accession number (also called the Unique Identifier in some databases) of the entry to which reference is being made.

The third and last item on the DR line, the secondary identifier, is used to complement the information given by the primary identifier. For a UniProtKB/Swiss-Prot reference the secondary identifier is the entry name.

Examples of complete DR lines are shown here:

DR   UniProtKB/Swiss-Prot; P00352; DHAC_HUMAN.

DR   ECO2DBASE; G052.0; 6TH EDITION.

DR   HSC-2DPAGE; P47985; HUMAN.

DR   Siena-2DPAGE; P38646; GR75_HUMAN.

DR   PHCI-2DPAGE; P09211; GTP_HUMAN.

DR   YEPD; 4270; -.
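The DR format above can be split into its three items with a few lines of code. The Python sketch below is a minimal illustration, not a reference implementation. The function name `parse_dr_line` is hypothetical, and treating a secondary identifier of "-" as "no secondary identifier" is an assumption based on the YEPD example rather than a rule stated in this manual.

```python
def parse_dr_line(line):
    """Split a DR line into (database, primary identifier, secondary identifier)."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    body = line[2:].strip()
    if not body.endswith("."):
        raise ValueError("a DR line must end with a period")
    # Remove the final period only; identifiers such as G052.0 contain periods.
    fields = [field.strip() for field in body[:-1].split(";")]
    if len(fields) != 3:
        raise ValueError("expected three items separated by semicolons")
    database, primary, secondary = fields
    # Assumption: "-" stands for an absent secondary identifier.
    return database, primary, (None if secondary == "-" else secondary)


for dr in ("DR   UniProtKB/Swiss-Prot; P00352; DHAC_HUMAN.",
           "DR   ECO2DBASE; G052.0; 6TH EDITION.",
           "DR   YEPD; 4270; -."):
    print(parse_dr_line(dr))
```

Only the trailing period is removed before splitting, so that a primary identifier such as G052.0 is kept intact.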

The // line

The // (terminator) line contains no data or comments. It designates the end of an entry.
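Because every entry ends with a // line, a flat file holding several entries can be cut into individual entries before any line is interpreted. The following Python sketch shows the idea; the function name `split_entries` is illustrative, and dropping trailing lines that are not followed by a terminator is a design choice of this sketch, not a rule of the format.

```python
def split_entries(text):
    """Return a list of entries, each entry being the list of its lines."""
    entries = []
    current = []
    for line in text.splitlines():
        if line.strip() == "//":
            # The terminator closes the current entry and carries no data.
            if current:
                entries.append(current)
            current = []
        elif line.strip():
            current.append(line)
    # Lines after the last terminator belong to an incomplete entry
    # and are deliberately not returned.
    return entries


sample = (
    "DR   UniProtKB/Swiss-Prot; P00352; DHAC_HUMAN.\n"
    "//\n"
    "DR   YEPD; 4270; -.\n"
    "//\n"
)
for entry in split_entries(sample):
    print(len(entry), "line(s):", entry[0][:2])
```

The sample text reuses two DR lines from this manual purely as placeholder content; real entries start with an ID line and contain the other line types described above.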