Attention: World-2DPAGE is no longer maintained. Previously submitted data can still be queried at World-2DPAGE Repository.
The Make2D-DB II Package
current version: 3.10.2 -- December 2011
Read-Me: Data Preparation
Related documents:
- The Make2D-DB II Site - For up-to-date documentation, news and FAQ
- Readme: Main - Introduction and Installation
- Readme: Configuration
- The configuration process
- Readme: Interface - The server query and management interface
- Changes - Recent major changes and fixed bugs
- SWISS-2DPAGE user manual - (cf. also http://world-2dpage.expasy.org/swiss-2dpage/docs/manch2d.html for the most recent version)
This document contains the instructions on how to prepare your data before launching the checking or the conversion (transform / update) process. If you are only installing a Web portal with no local data, then you are not concerned by this document.
By default, use the 'data' directory to hold your flat file, XML or text/CSV reports, as well as your map images and other specific files (like the hidden* files, cf. below). You may also opt for a different directory for your data (to be defined within the configuration files). In that case, use your own directory path rather than the default 'data' directory in the following guidelines.
A test database has been prepared so that you can try the tool. The data set, combined with some additional examples, is found in the data_test directory, so you can also check the nature of the data to be used. Please note that this is an entirely fictitious data set with no real biological significance. If you are testing the flat file format, the database name within the PostgreSQL catalog will be 'test_2dpage' by default; otherwise, define the database name yourself during the configuration process. You can have a look at this test database by clicking here.
Note: Since version 1.0, you can provide your data in various formats (simple text reports, spreadsheets or XML files), in addition to the previously required flat file format (the SWISS-2DPAGE-like text file listing your proteins sequentially). Nevertheless, a format similar to the flat file format is still used *internally* by the tool. If you provide your data in any other format, the tool automatically generates an intermediate flat file to work with. A copy of this generated file is placed in your data directory (as well as a similar copy in the temp directory). This file is named "last_generated_flat_file.dat". You may have a look at it during the conversion process. If you wish, you may even interrupt the process to edit this file and add or change anything you need. You should then restart the process, defining this flat file as the data source to be read.
The images
For each map, create 2 corresponding images. One with exactly the same dimensions (width and height) as the original map image, and a small one (with approximately a size of 100 pixels x 100 pixels). It is possible to use images of different size than the original ones, and even to shift their origin, which by default is located at the top-left corner (more details are available in the configuration document, see Readme: Configuration).
Both image files should have exactly the same name as the original map name referred to in the IM line of your flat file (the text file listing your proteins sequentially, if you are providing a SWISS-2DPAGE-like flat file) and/or as listed in your 'existing.maps' file (see below), except that the small image name should be preceded by the prefix 'small_'. Map names should be upper-cased and should not contain any spaces (e.g. LIVER_MOUSE, PLASMA_4-7). Use any valid graphic type (gif, tif, png, jpg..) and add its extension to the image name. Using the 'png' or 'jpg' image format speeds up the display of your images, while using 'tif' images slows it down.
Example for a map called PLASMA (e.g. an "IM PLASMA" line in a SWISS-2DPAGE-like flat file, or simply a map called PLASMA in your 'existing.maps' file): create, for example, 'PLASMA.png' (usually the same dimensions as the original map image) and 'small_PLASMA.png' (the 'small_MapName' image).
Put all your images in the 'data' directory, or place them respectively in two sub-directories called 'images' and 'small_images'.
Tip: If you have a logo image that you want to display on your Web interface, then put it also in this 'data' directory.
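The naming rule above can be checked before launching the conversion. The following Python sketch is only an illustration and is not part of Make2D-DB II: the function name find_map_images and the list of extensions tried are assumptions made for this example.

```python
from pathlib import Path

# Extensions to try; the text above only asks for "any valid graphic type",
# so this list is an assumption of the sketch.
EXTENSIONS = ("png", "jpg", "gif", "tif")


def find_map_images(data_dir, map_name):
    """Return (full_image, small_image) paths for a map, or None where missing.

    Looks in data_dir itself, then in the 'images' / 'small_images'
    sub-directories mentioned above.
    """
    data_dir = Path(data_dir)

    def first_existing(stem, subdir):
        for folder in (data_dir, data_dir / subdir):
            for ext in EXTENSIONS:
                candidate = folder / f"{stem}.{ext}"
                if candidate.is_file():
                    return candidate
        return None

    return (
        first_existing(map_name, "images"),
        first_existing(f"small_{map_name}", "small_images"),
    )
```

Calling find_map_images("data", "PLASMA") would return the two paths when both 'PLASMA.*' and 'small_PLASMA.*' exist, and None for whichever image is missing.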
Listing your maps
Important:
During the configuration process (perl make2db.pl -m config), you are asked if you want to generate a maps' file (third choice from the very first level). You will then be guided to input several annotations for each of your maps. It is highly recommended to generate your file this way, as this offers richer annotations for your maps.
The new version of the tool lets you annotate the following fields (in the maps' file generated during the configuration process, the parameters of each map are separated by TABs and are all on a single line):
- the map name itself, one single upper-cased word, e.g. PLASMA or PLASMA_4-7; all your subsequent gel text reports - if any - should have the same name used in here, plus a '.txt' extension (required parameter)
- a more descriptive and longer name (optional parameter)
- width of the image in pixel / X-coordinates (required)
- height of the image in pixel / Y-coordinates (required)
- pI start and end values (optional)
- Mw start and end values (optional)
- taxonomy ID (optional)
- a description of the species strain if needed (optional)
- a tissue name - only names listed in the UniProtKB tissue list are accepted (the ID or the SY line, e.g. Abdomen) - you may contact us if you wish to add any tissue not present in this list (optional)
- a list of mapping (identification) methods applied to the whole map for the spots' identification (optional)
- URL (uri) for both the preparation and informatics parts (optional)
- local documents for both the preparation and informatics parts, e.g. PSI-MIAPE documents, PSI-Gel documents (optional)
- short comments for both the preparation and informatics parts (optional)
- software used for the detection (optional)
- any related comments (optional)
- number of detected spots for statistical data; this will override the number of detected spots read from the Melanie XML reports when given (optional)
- shift the X position of the image in pixel (optional) -- note: this value will be overridden by any *defined* shifting in '2d_include.pl:map_shift_left', which acts on *all* maps together
- shift the Y position of the image in pixel (optional) -- note: this value will be overridden by any *defined* shifting in '2d_include.pl:map_shift_down', which acts on *all* maps together
- adapt spots position horizontally using a ratio value (optional) -- note: this value will be overridden by any *defined* ratio in '2d_include.pl:map_x_ratio', which acts on *all* maps together
- adapt spots position vertically using a ratio value (optional) -- note: this value will be overridden by any *defined* ratio in '2d_include.pl:map_y_ratio', which acts on *all* maps together
Be sure to place your generated 'existing.maps' file into your 'data' directory.
The following old format, still accepted but deprecated, has been kept only for compatibility:
The old way:
Create a text file called "existing.maps" containing the list of all your map images. Each line should contain the details for one map. The minimal syntax to follow is:
map_short_name map_long_name width height
"map_short_name" is the name of your map (e.g. PLASMA) and of the corresponding image files (if you are not a Melanie / ImageMasterTM 2D Platinum 5.0 user, this name, combined with the name of your database, will also be used as a unique identifier for your map). "map_long_name" is a more descriptive name for the map to be displayed; spaces are allowed to separate its words. width and height are the X and Y dimensions in pixels of your original map image. Finally, separate fields with a tab.
Example of an existing.maps file (spaces could be tabs or just spaces):
LIVER Human Liver original Geneva 600 800
PLASMA Human Plasma number 11 1200 1600
When using a flat file: remember that each map name should be written exactly as it is written in the IM fields of your data flat file.
Finally, you may also add a tissue name at the end of each line (optional). Only tissues listed in tisslist.txt, tisslist_initial.txt or tisslist_aliases.txt will be retained.
Put your existing.maps file in the 'data' directory.
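As an illustration of the minimal (old) syntax, here is a short Python sketch that reads one line of an 'existing.maps' file. It is not the tool's own parser: the function name is hypothetical, and the sketch assumes that fields are separated by tabs (so that the long name may contain spaces) and that the optional tissue name, when present, is the fifth field.

```python
def parse_existing_maps_line(line):
    """Parse one tab-delimited line of the minimal 'existing.maps' syntax."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields: {line!r}")
    short_name, long_name, width, height = fields[:4]
    # Map names are a single upper-cased word (e.g. PLASMA, PLASMA_4-7).
    if short_name != short_name.upper() or " " in short_name:
        raise ValueError(f"map name must be one upper-cased word: {short_name!r}")
    return {
        "short_name": short_name,
        "long_name": long_name,
        "width": int(width),    # X dimension of the original image, in pixels
        "height": int(height),  # Y dimension of the original image, in pixels
        "tissue": fields[4] if len(fields) > 4 and fields[4] else None,
    }
```

Whether a given tissue name is accepted still depends on the tissue lists mentioned above; the sketch does not check it.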
The spots and entries identification and annotation
This section describes how to prepare your spots' data.
The term 'entry' (or entries) is commonly used in this document as a synonym for 'protein'.
Before going any further, remember that there are three ways to provide this data to the tool:
- A Flat File: A SWISS-2DPAGE-like text file, listing sequentially your proteins. This file has to be combined either with simple spot lists (defining the spots position), or with some Melanie / ImageMasterTM 2D Platinum text reports or XML exports.
- Spreadsheets (CSV / tab-delimited text files, e.g. EXCEL exports)
- Melanie / ImageMasterTM 2D Platinum XML exports alone.
Depending on your choice, you will have different levels of granularity for your annotations. Internally, the tool will always partially rely on a flat file, be it provided by the user, or generated by the tool itself from the spreadsheets or the Melanie exports.
The flat file is the most structured way to provide data to the tool. It has the advantage of being strict (in the positive sense) and can be extremely rich. The drawback is that a highly annotated and correctly formatted flat file is hard to generate manually.
The spreadsheets have the advantage of being simple to generate. Many laboratories already use this format to store their 2D data. With spreadsheets, the user is totally free to define the extent of the annotations, from very basic ones to extremely rich, user-defined ones. The main drawback is that many user-defined annotation categories make it harder to link data between researchers (a semantic problem), especially since no unambiguous ontology has been defined yet.
The Melanie XML exports are, among the three options, the easiest way to generate data, assuming of course that the maps are accurately annotated with this software. However, those annotations are currently quite limited and the XML schema itself does not follow any widespread standard, as such a standard does not exist yet.
We will detail each of those options separately. Keep in mind that if you are not providing your own flat file, you can always work with the automatically generated one (found within your data directory, as well as in the temp directory, under the name last_generated_flat_file.dat whenever you run the tool in the other two modes). You may sometimes want to edit this generated file manually and restart your conversion process based on your edited copy (by switching to the flat file mode in your configuration files and setting the $db_file variable to 'last_generated_flat_file.dat', or to any other name if you save your modified copy under another name).
One more remark. During the data/syntax checking, you may encounter error messages complaining about some data inconsistency. Some of those messages would point to a section from a flat file where the error has been detected, even if you have not provided any flat file. As a major part of your data is translated internally into the flat file format, inconsistency in this flat file may be traced back so you may find the source of error in your original spreadsheets or text reports.
Finally, note that a major part of the external updates relies on a protein index, namely the Swiss-Prot/UniProtKB accession numbers. By providing such identifiers for your identified proteins (as your accession numbers or as cross-references), you get the most out of this feature and become more "visible" to other remote Make2D-DB II databases (the tool creates dynamic cross-references between the remote databases based on this index or on the SWISS-2DPAGE one).
WORKING WITH FLAT FILES
The spots' text reports
(Melanie / ImageMasterTM 2D Platinum users working with XML exported reports or with text exported reports, combined with a flat file, do not need to read this first sub-section, you may directly go to "Other supported report format")
The spots' reports are text files that list the spots' coordinates within a map image. There should be one report per map in the 'data' directory. This report should be given the name of the corresponding map exactly as it is written in the 'IM' line of your flat file. It should also have a '.txt' extension (e.g. PLASMA.txt).
Each report should contain a line for each identified spot on the corresponding map, indicating the spot identifier (spot's name) and its position on the image (given in pixels). Fields may be separated by a tab or by simple spaces. Several line syntaxes are accepted.
Tip: many 2D-PAGE software packages let you easily export this type of report file.
Once generated, put all of your reports in the 'data' directory. Make sure they have been saved in text format.
General syntax of a report line:
Spot_ID x_position y_position [%Od] [%Vol]
Separate fields with spaces (or tabs). Spot_ID is the identifier given to the spot/band, x_position is the spot position (in pixels) from left to right, y_position is the spot position from top to bottom. In 1D maps (SDS) you can even omit the x_position field, a default value will then be read from the configuration file (see Readme: Configuration).
You may also include values for both the relative optical density (%Od) and the relative volume (%Vol) of each spot (expressed in %). If you give a single value, it will be interpreted as the %Vol value.
- Example of a MAP.txt file (header lines containing double quotes around field names are optional and will be ignored). Spaces can be a tab or just simple spaces:
423 120 210
424 100 120
...
or
425 300 400 0.012 0.0345
...
- for an SDS.txt file (minimal data):
426 300
Note: If you want to use any field other than the SpotID as your spot identifier (e.g. using SWISS-2DPAGE-like SerialNumbers), then simply replace the SpotID field by the annotation field you want to use as your spot identifier (make sure, though, that this annotation field is unique per spot), e.g.
2D-ABC123 120 210
Other examples: PLASMA.txt, PLASMA2.txt from the test_2dpage database. In the second example, you may notice some extra annotations, i.e. "pi:4.85 mw:22158" (the syntax is "pi:value mw:value", separated by spaces or a tab). The full syntax is "Spot_ID x_position y_position [%Od] [%Vol] [pi:value] [mw:value]", where square brackets mean that the values within are optional.
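To make the full line syntax concrete, here is a minimal Python sketch that reads one report line. It is an illustration only, not the parser used by the tool. The function name and the default_x parameter are hypothetical, and the sketch assumes that a line with a single number is a 1D (SDS) line with only the y position, while any line with two or more numbers starts with x and y.

```python
def parse_spot_line(line, default_x=None):
    """Parse one spot report line; return None for blank or header lines."""
    if not line.strip() or '"' in line:
        return None  # header lines carry double-quoted field names
    tokens = line.split()
    spot = {"id": tokens[0], "x": default_x, "y": None,
            "od": None, "vol": None, "pi": None, "mw": None}
    numbers = []
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if sep and key.lower() in ("pi", "mw"):
            spot[key.lower()] = float(value)   # optional pi:value / mw:value
        else:
            numbers.append(float(token))
    if not numbers:
        raise ValueError(f"no position given: {line!r}")
    if len(numbers) == 1:
        spot["y"] = numbers[0]                 # 1D (SDS) map: x_position omitted
        return spot
    spot["x"], spot["y"] = numbers[0], numbers[1]
    extra = numbers[2:]
    if len(extra) == 1:
        spot["vol"] = extra[0]                 # a single value is read as %Vol
    elif len(extra) >= 2:
        spot["od"], spot["vol"] = extra[0], extra[1]
    return spot
```

For instance, parse_spot_line("2D-ABC123 120 210 pi:4.85 mw:22158") yields a record whose identifier is "2D-ABC123", with x = 120, y = 210 and the two optional annotations filled in.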
Other supported spot report formats (Melanie / ImageMasterTM 2D Platinum users):
By defining the configuration variable '$Melanie = 1' in your include.cfg configuration file, you tell the tool to look for Melanie reports.
The tool will start by searching for any file with the extension .xml (XML files) inside your data directory (e.g. anything.xml) and will parse them.
If none is present, it will then look for text files (.txt) corresponding to the different maps listed in your database (e.g. PLASMA.txt and PLASMA2.txt).
You may use the default text spot reports generated by Melanie:
If you are using the free of charge Melanie / ImageMaster Viewer (tested up to version 5.02), you can directly use the generated spot report which also exports the following data by default (make sure that they are listed in the following order):
"GelName" SpotID X Y Pi Mw Od Area Vol %Od %Vol Circularity/Saliency.
To use another annotation (SerialNumber) instead of the default SpotID to be your spot identifiers, simply add (export) this annotation as an additional last field in each of your report lines:
e.g.
"GelName" SpotID X Y Pi Mw Od Area Vol %Od %Vol Circularity/Saliency "SerialNumber/your_annotation"
You may also work in combination with the Melanie XML exports:
Make sure you have the common perl XML::Parser and libxml-perl modules installed on your system (the tool will need to use the XML::Parser::PerlSAX perl module). If not, ask your system administrator to install a recent version, or simply prepare your reports as described in the sub-sections above.
The Make2D-DB MelanieXMLParser module will extract the name of the gel from the Melanie XML file. If the original Melanie image file name is different from the gel name used in the flat file IM lines, then rename the corresponding XML files to the appropriate gel names (e.g. PLASMA.xml and PLASMA2.xml), one gel per file. Otherwise, you may use any name and group several gels inside one single XML file, provided it has the extension .xml.
It is not strictly required to prepare an "existing.maps" file (if present, it will override Melanie XML values). The pI/Mw values will then be read from the Melanie export, overriding any values given in your flat file (for your flat file, the syntax "2D -!- PI/MW: SPOT spotID" without any given pI/Mw values will then be accepted).
All graphically detected spots will be integrated into your database (whether annotated/identified or not) if you set the variable "$include_not_identified_spots = 1" in your configuration file include.cfg.
Extracting spots' positions directly from annotated Melanie / ImageMasterTM 2D Platinum:
-- deprecated --
If you are a Melanie 4 / ImageMasterTM 2D Platinum 5.0 user (or higher) and you don't wish to export those spots' reports yourself, then simply do not create them. When it finds no spots' reports, the tool will try to analyze the Melanie / ImageMasterTM 2D Platinum 5.0 maps themselves to extract the spots' positions. If your maps are saved in the Melanie II or the Melanie 3 format, and you do not have a copy of Melanie 4 or ImageMasterTM 2D Platinum 5.0 (or higher), you can still convert your maps using the ImageMaster/Melanie Viewer (version 4.08 and up).
In that case, you should be aware that the tool will rely on the Melanie / ImageMasterTM 2D Platinum 5.0 SpotID field to refer to your spots, and on the pI/Mw values given in your flat file.
-- deprecated --
To extract spot annotations, the tool will try at first to read any exported Melanie XML file with a .xml extension, provided it has been configured to read Melanie / ImageMaster data ("$Melanie = 1" in include.cfg).
It will then look at all text reports (files named MAP.txt, where MAP stands for each of the map names given in your database flat file).
If no reports are found, the tool will try to extract annotations directly from the Melanie images themselves. It is recommended not to rely on this last step, as this option is being deprecated!
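The lookup order described above (XML exports first, then MAP.txt reports, then the Melanie images as a last resort) can be summarised by the following Python sketch. It only mirrors the order given in this document for the case "$Melanie = 1"; the function name and return values are hypothetical, and the tool's real implementation may differ.

```python
from pathlib import Path


def pick_spot_source(data_dir, map_names):
    """Return (kind, files) following the lookup order described above."""
    data_dir = Path(data_dir)

    # 1. Any exported XML file in the data directory takes precedence.
    xml_files = sorted(data_dir.glob("*.xml"))
    if xml_files:
        return "xml", xml_files

    # 2. Otherwise, one text report per map (MAP.txt).
    txt_files = [data_dir / f"{name}.txt" for name in map_names]
    txt_files = [path for path in txt_files if path.is_file()]
    if txt_files:
        return "txt", txt_files

    # 3. Deprecated fallback: the Melanie images themselves.
    return "images", []
```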
The Database Flat File
Create your database flat file (text file) containing one entry per protein and place it in the data directory. Entries are separated by a // line. The usual headers used with the first version of make2ddb are optional and will be ignored, except for the database name if no name has been given in the configuration file.
Before going any further, please make sure you are familiar with the syntax described in the SWISS-2DPAGE user manual, which lists in more detail a large part of the syntax to be adopted.
Compared to the syntax described in the above link, the tool is much more tolerant regarding syntax. It also lets you define a list of default values to be applied whenever required information is missing. Finally, some specific additions have been adopted for the Mass Spectrometry annotations.
Example of a simple xxx.dat file (fictitious entry; some Make2D-DB II optional keywords are not displayed for simplification):
ID HC_HUMAN; STANDARD; 2DG.
AC P02760; P02759; P00977;
DE Alpha-1-microglobulin/ Inter-alpha-trypsin inhibitor light chain
DE (PROTEIN HC) (HI30).
IM LIVER, PLASMA.
RN [1]
RP MAPPING ON GEL.
RX MEDLINE; 78094420.
RA Anderson N.L., Anderson N.G.;
RT "High Resolution 2-DE of human Liver";
RL Proc. Natl. Acad. Sci. U.S.A. 74:5421-5425(1977).
2D -!- MASTER: LIVER;
2D -!- PI/MW: SPOT 1=5.12/30851;
2D -!- PI/MW: SPOT 2=5.07/29736;
2D -!- MASTER: PLASMA;
2D -!- PI/MW: SPOT 1=4.86/33544;
2D -!- PI/MW: SPOT 2=4.96/32167;
2D -!- PI/MW: SPOT 3=5.07/31046;
DR Swiss-Prot; P02760; HC_HUMAN.
//
ID CRP_HUMAN; PRELIMINARY; 2DG.
AC P02741;
DE C-reactive protein precursor.
IM PLASMA.
RN [1]
RP MAPPING ON GEL.
RA Anderson N.L.;
RL Personal Communication(1993).
CC -!- SUBUNIT: HOMOPENTAMER.
2D -!- MASTER: PLASMA;
2D -!- PI/MW: SPOT 999=5.12/23908;
DR Swiss-Prot; P02741; CRP_HUMAN.
DR SWISS-2DPAGE; P02741; CRP_HUMAN.
//
Notes:
The image names used in the existing.maps file, in the IM line and in the ' 2D -!- MASTER ' line should be exactly the same (e.g. PLASMA). Consequently, the image names should be upper-cased, and they should not have any characters other than letters, underscores, digits and '-'.
The database text file is structured to be readable by humans as well as by computer programs. Each line describing an entry begins with a two-character line code, which indicates the type of data contained in the line. The remaining part of the line should follow the given rules, otherwise the conversion will not work properly (errors are signaled). In particular, when the lines described in detail below are provided, their structure should be strictly respected:
- The ID line (optional):
The ID (IDentification) line is the first line of an entry. The general form of the ID line is:
ID Entry_Name; ENTRY_CLASS; 2DG.
Entry_Class and 2DG are optional.
If you omit the ID line, the AC value will also be taken as the ID Entry_Name, until the external data integration is performed over your data.
- The AC line:
The AC (ACcession number) line lists the accession numbers associated with an entry. The accession numbers are separated by semicolons and the list is terminated by a semicolon. If necessary, more than one AC line will be used. An example of an accession number line is shown below:
AC P07237; P30037; P32079;
Entries will have more than one accession number if they have been merged or split. For example, when two entries are merged into one, a new accession number goes at the start of the AC line, and those from the merged entries are listed after this one. Similarly, if an existing entry is split into two or more entries, the original accession number list is retained in all the derived entries.
- The DE line (optional):
The DE (DEscription) lines contain general descriptive information about the protein stored. This information is generally sufficient to identify the protein precisely. The format of the DE lines is:
DE Description of my protein.
The description is given in ordinary English and is free-text. In some cases, more than one DE line is necessary; in this case, the text is divided only between words and only the last DE line is terminated by a period.
- The IM line (optional):
The IM (IMages) line lists the 2-D PAGE images which are associated with the entry. The images are separated by commas, and the list is terminated by a period. An images line example is shown here:
IM LIVER, PLASMA.
This line is not necessary anymore, as the map names are read either from the 2D sections or from a given default value.
- The RA lines (optional if a default bibliographic reference is defined in the configuration files):
The RA (Reference Author) lines list the authors of the paper (or any other type of work) cited. All of the authors are included, and are listed in the order given in the paper. The names are listed surname first, followed by a blank, followed by initial(s) with periods. The authors' names are separated by commas and terminated by a semicolon. Author names are not split between lines. An example of the use of RA lines is shown below:
RA Edwards J., Anderson N.G., Nance S.L.,
RA Anderson N.L.;
As many RA lines as necessary are included for each reference.
- The DR lines (optional):
The DR (Database cross-Reference) lines are used as pointers to information related to an entry and found in other databases. The format of the DR line is:
DR DATABASE; PRIMARY_IDENTIFIER; SECONDARY_IDENTIFIER.
Examples of complete DR lines are shown here:
DR Swiss-Prot; P00352; DHAC_HUMAN.
DR ECO2DBASE; G052.0; 6TH EDITION.
DR HSC-2DPAGE; P47985; HUMAN.
DR YEPD; 4270; -.
- The // line:
The // (terminator) line contains no data or comments. It designates the end of an entry.
For Make2D-DB II, many lines (ID, DE, DT, GN, OS, OC, OX, IM, RP, RX, RA, RL, RT, CC, DR,..) are not explicitly required within your database text file. However, you may need to set up default values for some of them (DT, OS, OC, OX, RP, RA, RL) in the configuration file 'include.cfg' (see Readme: Configuration). The fields IM and MA can be totally omitted from the database text file. When the IM field is missing, it is evaluated internally by reading the 2D -!- MASTER lines. If you define a Taxonomy ID value for one or more of your maps within your existing.maps file, then entries belonging to those maps will also adopt their TaxID (except when you force a specific species annotation for an individual entry by defining a specific OX field for it).
A different set of entries, forming the test database, is listed in the flat file test.dat. The first entry, Z02760 (HC_HUMAN), is an extended entry. A "minimal entry" (with the minimal required data) is also shown within this test database (test.dat). It has the accession number "ZI|GI.MINIMAL" and only contains 3 types of lines: AC, 2D and DR. The tool then tries to add some missing values based on the given configuration files and on the external data extracted for the UniProtKB (Swiss-Prot or TrEMBL) entry given in the UniProtKB DR cross-reference line. Entry "P12345" has the very strict minimum required for an entry (one AC line and two 2D lines for 1 spot location). The tool recognizes that "P12345" is a UniProtKB/Swiss-Prot accession number and automatically cross-references the entry based on this identifier.
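The line-code structure described above lends itself to a very small reader. The Python sketch below splits a flat file into entries on the // terminator and groups the lines of each entry by their two-character code; it also collects the accession numbers from the AC lines. It is an illustration under simplifying assumptions (no syntax checking, hypothetical function names), not the tool's own reader.

```python
def split_entries(text):
    """Yield one dict per entry, mapping each line code to its list of values."""
    entry = {}
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):          # terminator: end of the current entry
            if entry:
                yield entry
            entry = {}
            continue
        code, value = line[:2], line[2:].strip()
        entry.setdefault(code, []).append(value)
    if entry:                              # tolerate a missing final terminator
        yield entry


def accession_numbers(entry):
    """Return all accession numbers of an entry, in the order they are listed."""
    joined = " ".join(entry.get("AC", []))
    return [ac.strip() for ac in joined.split(";") if ac.strip()]
```

Applied to an entry whose AC line is "AC P07237; P30037; P32079;", accession_numbers returns the three identifiers in the order they are written.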
Compared to the original SWISS-2DPAGE manual, some syntax modifications on the 2D lines have been adopted by the tool to suit the need for a more elaborate annotation of PMF lines (peptide mass fingerprinting) and MS/MS lines (tandem mass spectrometry) combined with peptide sequences. The rules are:
- All the standard syntax is still perfectly sufficient, e.g. for a PMF list:
2D -!- PEPTIDE MASSES: SPOT 'SERIAL NUMBER': MASSES_LIST;
2D SUBSEQUENT MASSES_LIST; 'ENZYME'.
e.g.
2D -!- PEPTIDE MASSES: SPOT 1234: 1001.631; 1267.653; 1731.898;
2D 1821.909; TRYPSIN.
Each mass may be followed by its intensity in parentheses, e.g.
2D -!- PEPTIDE MASSES: SPOT 1234: 1001.631 (243.34); 1267.653 (190.11); 1731.898 (340.81);
2D 1821.909 (301.11); TRYPSIN.
- For tandem mass spectrometry (MS/MS) data, the syntax is:
2D -!- MASS SPECTROMETRY: SPOT 89: [ParentPeptideMass:ParentPeptideCharge] MASSES_LIST;
2D SUBSEQUENT MASSES_LIST.
The ParentPeptideMass and the ParentPeptideCharge are optional. If present, they are separated by a colon and given inside square brackets. If just one value is given, it is considered to be the parent charge.
The syntax for the masses and their intensities is similar to the PMF syntax.
A final period '.' is required at the end of the very last line of the section.
e.g.
2D -!- MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94); 286.107811 (280.22);
2D 458.7317 (595.24); 859.5010 (379.048); 859.7686 (171.43); 860.4855 (113.333).
2D -!- MASS SPECTROMETRY: [1200.7:1-] 869.468(3.09);524.448(2.67);635.708(3.17);712.129(1.2)::777.77(3.7);888.48(2.8);...
- You can include related local MS files to be displayed, or external URLs if the data is stored on the Web (e.g. in some repository). The keywords to use are: file, ident-file, uri and ident-uri. A colon separates each keyword from its value (a file path or a Web address).
- file for a local MS file, e.g.
2D -!- MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; file:/some_path/msms.pkl.
- ident-file for a local MS identification report (e.g. a Mascot report), e.g.
2D -!- MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; ident-file:/some_path/msIdentResults.dat.
- uri for an MS file located on the Web, e.g.
2D -!- MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; uri:http://www.ebi.ac.uk/pride/search.do?someID.
- ident-uri for an MS identification report located on the Web, e.g.
2D -!- MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; ident-uri:http://www.ebi.ac.uk/pride/search.do?someID.
e.g.
2D file:msms.pkl uri:http://www.ebi.ac.uk/pride/search.do?directLink=true&experimentAccessionNumber=1
2D ident-file:msIdentResults.dat
You will probably not need to give any MASSES_LIST when pointing to a file (as those files should contain the peak list values themselves). Nevertheless, you should still give an enzyme name when dealing with "PEPTIDE MASSES" (PMF) data.
- A keyword telling Make2D-DB II that the identification is to be hidden from public access (by default this keyword is 'private') may be added between curly brackets before the final period.
2D -!- MASS SPECTROMETRY: [1723.9581:1+] 270.074448 (491.94);...; file:/some_path/msms.pkl {private}.
However, there is a far better way to control the visibility of your data. The section on hiding parts of your data from public access explains how to set this up.
- When listing several lists of "MASS SPECTROMETRY" and their corresponding identified "PEPTIDE SEQUENCES", the order of correspondence between the MS data section and the identified peptides section follows the same order in which they are given. e.g. the first "PEPTIDE SEQUENCES" list correlates with the first "MASS SPECTROMETRY" list, and so on...
- The mapping (identification) methods are vocabulary controlled and are defined in the editable basic_include.pl main configuration file inside the %mapping_methods_description list. You may redefine or add your own mapping methods within this list (contact us if any help is needed).
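As an illustration of the PMF syntax given in the rules above, the following Python sketch reads the 2D lines of one "PEPTIDE MASSES" topic into a spot identifier, a list of (mass, intensity) pairs and an enzyme name. It is a simplified example, not the tool's parser: the function name is hypothetical, the two-character line codes are assumed to be already removed, and the file/uri keywords and the {private} marker are deliberately not handled.

```python
import re

# mass, optionally followed by an intensity in parentheses
MASS_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*(?:\(\s*(\d+(?:\.\d+)?)\s*\))?$")
HEAD_RE = re.compile(r"^-!-\s*PEPTIDE MASSES:\s*SPOT\s+(\S+):\s*(.*)$", re.S)


def parse_peptide_masses(lines):
    """Parse the 2D lines of one PEPTIDE MASSES topic (line codes removed)."""
    text = " ".join(part.strip() for part in lines)
    match = HEAD_RE.match(text)
    if not match:
        raise ValueError("not a PEPTIDE MASSES topic")
    spot_id, body = match.groups()
    body = body.rstrip()
    if body.endswith("."):                 # final period closing the topic
        body = body[:-1]
    peaks, enzyme = [], None
    for item in (part.strip() for part in body.split(";")):
        if not item:
            continue
        mass = MASS_RE.match(item)
        if mass:
            intensity = float(mass.group(2)) if mass.group(2) else None
            peaks.append((float(mass.group(1)), intensity))
        else:
            enzyme = item                  # the non-numeric item is the enzyme
    return {"spot": spot_id, "peaks": peaks, "enzyme": enzyme}
```

For the second PMF example above, the sketch would return spot "1234", four peaks with their intensities, and the enzyme TRYPSIN.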
WORKING WITH SPREADSHEETS (Tab-delimited files)
Being simple text files, it is also possible to write manually such reports in any text editor, taking care to separate fields with tabs, and to save in simple text format.
You instruct Make2D-DB II to work in the spreadsheets mode by leaving the $db_file variable (the flat file name) empty in your include.cfg configuration file ($db_file = "") and by setting the $Melanie variable to 0 ($Melanie = 0).
You should provide a separate report file for each of your maps. The report file name should be written exactly like the Gel name you would have given in your existing.maps configuration file. You should always use the extension '.txt' (e.g. PLASMA.txt, PLASMA2.txt or PLASMA3.txt)
The first line of your report should contain the headers for the various columns. Those headers will be used by the tool to know what is the annotation category of each column. Headers can follow any order in your report, except for the very first header which has always to be the "SPOT" header.
Do not duplicate any header, instead, check below for each header category how to separate different elements.
All headers will be upper-cased by the tool. They may be contained inside double quotes or not.
There are three main categories of headers:
- The mandatory headers: those are required headers, the tool will complain if they are missing, if the values in the columns are not defined or if they are syntactically incorrect
- The pre-defined headers: those are optional headers, the tool will only complain if the values are not following the expected syntax
- The free-text headers: those are defined by the user, they fall into 2 different classes: the "2D" and the "COMMENT" class, no syntax check is applied
The mandatory headers
- "SPOT" header: The column for this header should be the first column to be defined. It contains the spot ID. You may use any single word for the values (e.g. 900 or 2D-TWX222).
- "X" header: This is the x-coordinate of the spot on the gel image (the width value) in pixels. Values should be positive or 0.
- "Y" header: This is the y-coordinate of the spot on the gel image (the height value) in pixels. Values should be positive or 0.
- "MW" header: The apparent molecular weight of the spot on the gel. Values are given in Dalton*. Use only integer numbers.
You may have several lines with the same spot ID. This is useful when you want to include several annotations for the same spot (for example, when several proteins have been identified for the same spot, or when you have several independent MS analyses). When a spot is listed more than once, its X/Y coordinates, as well as its pI/Mw values, are retained only from their last occurrence. It is also not necessary to repeat the X, Y, MW and PI values for a spot once they have been given in a preceding line (cf. PLASMA2.txt).
The origin to evaluate the X/Y positions is the top-left corner of the image.
*You may also give MW values in kDa. The tool will assume they are in kDa if their values are low enough not to be in Dalton (e.g. 20.5).
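The handling of repeated spot lines described above can be illustrated with a short Python sketch. It is not the tool's implementation: the report content, the spot IDs and the function name `collect_spots` are made up for the example, and treating an empty cell as "value not given again" is an assumption.

```python
import csv
import io

# Hypothetical report content; the values are for illustration only.
REPORT = (
    "SPOT\tX\tY\tMW\tPI\tAC\n"
    "S1\t120\t340\t45000\t5.61\tP04040\n"
    "S1\t\t\t\t\tP02768\n"
    "S2\t200\t95\t20500\t6.80\t\n"
)

POSITION_FIELDS = ("X", "Y", "MW", "PI")


def collect_spots(text):
    """Group report lines by spot ID, keeping the last given position values."""
    spots = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        spot = spots.setdefault(row["SPOT"], {"position": {}, "accessions": []})
        for field in POSITION_FIELDS:
            value = (row.get(field) or "").strip()
            if value:
                # A later non-empty value replaces an earlier one;
                # an empty cell keeps what was given on a previous line.
                spot["position"][field] = value
        accession = (row.get("AC") or "").strip()
        if accession:
            spot["accessions"].append(accession)
    return spots


for spot_id, data in collect_spots(REPORT).items():
    print(spot_id, data["position"], data["accessions"])
```

In this sketch the second line for spot S1 only adds a second accession number; the coordinates and the pI/Mw values given on the first line are kept.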
The pre-defined headers
- "PI" header: The apparent pI of the spot on the gel. If this column is not present, the gel is treated as an SDS gel (bands). Otherwise, give a value of 0 or greater (use real numbers, e.g. 7.443). The tool expects to find a defined value for all spots or no value at all for all of them.
- "AC" header: This is the column to hold the identified protein accession numbers (if known). Give a Swiss-Prot (UniProtKB) accession number for best results. Leave blank if no protein has been identified. When several proteins are identified for the same spot, write an independent line for each of them (e.g. spot 397 from the PLASMA.txt report).
- "MAPPING METHODS" header: You may use this column to list the different mapping (identification) methods used for the spot's identification. The mapping methods are vocabulary controlled and are defined in the editable basic_include.pl main configuration file inside the %mapping_methods_description list. You may use here the keywords separated by commas (e.g. "MS/MS, Gm, Co" to display 'Tandem mass spectrometry', 'Gel matching' and 'Comigration' within your entries). You may redefine or add your own mapping / identification methods within this list (contact us if any help is needed).
- "OD" header (alias "%OD"): Relative optical densities (%Od) are listed here. Values range from 0.0 up to 100.0 (use real numbers, e.g. 0.32112).
- "VOL" header (alias "%VOL"): Relative volumes (%Vol) are listed here. Values range from 0.0 up to 100.0 (use real numbers, e.g. 0.32112).
- "AMINO ACID" header: This column is used to list the experimental analysis results by amino acid composition. The syntax follows the one shown in the SWISS-2DPAGE 2D lines manual for the "AMINO ACID COMPOSITION".
- "PMF" header: Peptide mass fingerprinting peak lists are listed here and basically follow the SWISS-2DPAGE 2D lines manual syntax for "PEPTIDE MASSES". You may also include the intensities of the peaks, following the intensity rule and the ident data rule given in the previous section.
- "MS" header (alias "MS/MS" or "MASS SPECTROMETRY"): Tandem mass spectrometry peak lists are listed here and follow the Mass Spectrometry rule, as well as the intensity rule and the ident data rule given in the previous section.
- "PMF FILE" header: Instead of listing your PMF peak lists yourself, you may just give the absolute or relative path for your local PMF experimental data file (e.g. a pmf.dta file) in this column. The tool will execute the appropriate conversion over your files to include their content within your database.
- "MS FILE" header: Instead of listing your tandem MS peak lists yourself, you may just give the absolute or relative path for your local MS experimental data file (e.g. a msms.mgf file) in this column. The tool will execute the appropriate conversion over your files to include their content within your database. The tool usually relies on the file extension to "guess" the file format. Depending on the format you are using, you may need to tell Make2D-DB II explicitly which format is used. Read the note entitled "Input formats for MS/MS" below for more details.
- "PMF URI" header: Here you can give a URL (namely a URI) pointing to your experimental data, to be viewed if the latter is stored in some repository (e.g. PRIDE) or is accessible from the Web. You can still populate the column "PMF" with peak list data if you wish to.
- "MS URI" header: Here you can give a URL (namely a URI) pointing to your experimental data if the latter is stored in some repository (e.g. PRIDE) or is accessible from the Web. You can still populate the column "MS" with peak list data if you wish to.
- "PMF IDENT-FILE" header: PMF Analysis documents/reports can be given here (e.g. a Mascot search report) when they are present. Give an absolute or relative path for your local files.
- "MS IDENT-FILE" header: MS Analysis documents/reports can be given here (e.g. a PSI AnalysisXML or a Phenyx search report) when they are present. Give an absolute or relative path for your local files.
- "PMF IDENT-URI" header: Like the "PMF URI" header, you may give URLs pointing to some repository or any Web location where your PMF identification/analysis report may be viewed.
- "MS IDENT-URI" header: Like the "MS URI" header, you may give URLs pointing to some repository or any Web location where your MS identification/analysis report may be viewed.
- "PEPTIDES" header: The peptides are the identified peptide sequences related to the MS/MS data. The syntax follows exactly the one given in the SWISS-2DPAGE 2D lines manual for "PEPTIDE SEQUENCES".
The file report PLASMA2.txt gives many examples of MS annotations.
Listing several PMF/MS files or URIs: In order to list more than one element under the PMF/MS file and URI categories (the "PMF FILE" to "MS IDENT-URI" headers, items 9 to 16 of the pre-defined headers), simply separate them by spaces. To ensure correspondence between elements across different categories (e.g. between analysis and identification files), keep the elements in the same order across the different columns.
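The order-based correspondence between columns can be sketched in Python as follows. This is only an illustration: the file names and the function name `pair_by_order` are hypothetical, and what the tool does when the two columns hold a different number of elements is not stated here, so the sketch simply pads the shorter list with `None`.

```python
from itertools import zip_longest


def pair_by_order(data_cell, ident_cell):
    """Pair the n-th data file with the n-th identification report."""
    # Elements inside one cell are separated by spaces.
    return list(zip_longest(data_cell.split(), ident_cell.split()))


# Hypothetical cell contents for the "MS FILE" and "MS IDENT-FILE" columns.
pairs = pair_by_order("run1.mgf run2.mgf", "run1_report.xml run2_report.xml")
print(pairs)  # [('run1.mgf', 'run1_report.xml'), ('run2.mgf', 'run2_report.xml')]
```

Since spaces act as separators, a path that itself contains a space would be split into two elements by this sketch.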
- "REFERENCE" header: List your bibliographic references, following the SWISS-2DPAGE format, in a separate file called 'reference.txt' in your data directory (example). You can then list in this column the reference numbers related to each entry. Several references can be given, separated by commas (e.g. 1,2,8); see PLASMA2.txt for an example. Remember that the RP, RA (or RG) and RL lines - respectively the 'Reference Position', the 'Reference Author' (or the 'Reference Group') and the 'Reference Location' - must be defined in all references; all the other lines are optional (and there is no need for an RN line).
- "XREF" header (alias "CROSS-REFERENCES"): If a protein has been identified for your spot, you may list here as many cross-references to external resources as you wish. The syntax to follow is "Xref_Database ID1 & Xref_Database ID1; ID2 & ..." (e.g. "Swiss-Prot P04040 & SWISS-2DPAGE P04040"). A large collection of cross-references is automatically integrated, with no need to define anything for the XREF field, only if your main accession number is already a UniProtKB (Swiss-Prot or TrEMBL) identifier. On the other hand, if your identifier is not a UniProtKB AC, you may find it very useful to define here a cross-reference to UniProtKB (Swiss-Prot or TrEMBL) to activate external data retrieval related to UniProtKB. For more information on the cross-reference database list available with this tool, see cross-references.
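To make the XREF syntax more concrete, here is a minimal Python sketch that splits such a cell into database names and identifiers. It is not the tool's parser: the function name `parse_xref` is hypothetical, and the sketch assumes that each database name is a single word followed by one or more identifiers separated by semicolons.

```python
def parse_xref(cell):
    """Split an XREF cell into (database, [identifiers]) pairs."""
    xrefs = []
    # Cross-references to different databases are separated by '&'.
    for part in cell.split("&"):
        part = part.strip()
        if not part:
            continue
        # Assumption: the first word is the database name.
        database, _, ids = part.partition(" ")
        # Several identifiers for one database are separated by ';'.
        identifiers = [item.strip() for item in ids.split(";") if item.strip()]
        xrefs.append((database, identifiers))
    return xrefs


print(parse_xref("Swiss-Prot P04040 & SWISS-2DPAGE P04040"))
# [('Swiss-Prot', ['P04040']), ('SWISS-2DPAGE', ['P04040'])]
```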
The free-text headers
- The "COMMENT" class: Whenever your header begins with the keyword "COMMENT:" then it is considered a general comment related to the identified protein (e.g. "COMMENT: SUBUNIT" or "COMMENT: MISCELLANEOUS" columns in PLASMA.txt). No syntax check is applied.
- The "2D" class: All the other free-text headers will fall into this class. Those are considered as free-text 2D annotations. (e.g. "PATHOLOGY LEVEL" or "EXPRESSION" columns in PLASMA3.txt). No syntax check is applied.
A free 2D annotation is applied specifically to the spot it is given for. A convenient manner to apply a free 2D annotation to all spots of a map all at once is to precede the header name of the annotation by a star '*', e.g. "* EXPRESSION". If we would like to only apply the annotation to all the spots related to a particular protein, then precede the annotation itself by a star '*', and define the annotation for only one of the spots related to this protein, e.g. "* method not applicable on this protein".
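The three header categories can be summarized with a small Python sketch. It is only an illustration of the classification described in this section, not the tool's code: the function name `classify_header` is hypothetical, and the sets below simply collect the header names and aliases listed above.

```python
MANDATORY = {"SPOT", "X", "Y", "MW"}

PREDEFINED = {
    "PI", "AC", "MAPPING METHODS", "OD", "%OD", "VOL", "%VOL",
    "AMINO ACID", "PMF", "MS", "MS/MS", "MASS SPECTROMETRY",
    "PMF FILE", "MS FILE", "PMF URI", "MS URI",
    "PMF IDENT-FILE", "MS IDENT-FILE", "PMF IDENT-URI", "MS IDENT-URI",
    "PEPTIDES", "REFERENCE", "XREF", "CROSS-REFERENCES",
}


def classify_header(header):
    """Return the category of a column header."""
    # Headers may be quoted and are compared in upper case.
    name = header.strip().strip('"').strip().upper()
    if name in MANDATORY:
        return "mandatory"
    if name in PREDEFINED:
        return "pre-defined"
    if name.startswith("COMMENT:"):
        return "free-text (COMMENT)"
    # Everything else, including headers preceded by a star,
    # falls into the free-text "2D" class.
    return "free-text (2D)"


for header in ["Spot", "%Od", "COMMENT: SUBUNIT", "PATHOLOGY LEVEL", "* EXPRESSION"]:
    print(header, "->", classify_header(header))
```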
For completeness, we should mention that the older format for spreadsheets is still accepted by the tool. This older format is much more restricted and does not support headers. It has 2 possible syntaxes:
the short syntax (without identification annotations)
Spot X Y pI Mw [AC1 AC2 AC3]
and the long one (with ordered identification annotations, e.g. PLASMA3_noheaders.txt)
Spot X Y pI Mw [AC1] [IdentMethod1,IdentMethod2,..] [PMF] [MS/MS] [AMINO ACID COMPOSITION] [%od] [%vol]
Based on your CSV reports, the tool will generate an intermediate 'last_created_flat_file.dat' file. You may then choose to continue, or to interrupt the conversion process. If you interrupt the process, you will be able to manually edit the 'last_created_flat_file.dat' file now present in your data directory if you wish to add more annotations or to change others. You should then save the edited file under another name (e.g. newFlatFile.dat) and set the flat file variable $db_file to this new file name (without any path) before resuming your installation. This will then switch you to the flat file mode. Otherwise, continue to proceed without interruption.
WORKING WITH MELANIE (with no flat file)
Make sure the common Perl modules XML::Parser and libxml-perl are installed on your system (the tool needs the XML::Parser::PerlSAX Perl module). If not, ask your system administrator to install a recent version.
Based on your Melanie XML exports, the tool will generate an intermediate 'last_created_flat_file.dat' file. You may then choose to continue, or to interrupt the conversion process. If you interrupt the process, you will be able to manually edit the 'last_created_flat_file.dat' file now present in your data directory if you wish to add more annotations or to change others. You should then save the edited file under another name (e.g. newFlatFile.dat) and set the flat file variable $db_file to this new file name (without any path) before resuming your installation. This will then switch you to the flat file mode. Otherwise, continue to proceed without interruption.
Hiding parts of your data from public access
- The hiddenGels.txt file: This file takes the list of gels to be hidden from public users.
- The hiddenEntries.txt file: This file takes the list of protein accession numbers to be hidden from public users.
- The hiddenSpots.txt file: This file controls if an association between a spot and an identified protein is to be shown or not. It also controls if identification data from 'MS/MS' (tandem mass spectrometry), 'PMF' (peptide mass fingerprinting) or 'Aa' (amino acid composition) are to be displayed for public users or not.
All three master files can be found in the readme directory. You may copy them to your data directory and then edit them using any text editor. The three master files fully describe the syntax to follow. Here are three examples of edited files located in the data_test directory: (hiddenGels.txt example, hiddenEntries.txt example and hiddenSpots.txt example)
The test database
- in include.cfg: set $db_file = "" and $Melanie = 0;
- in include.cfg: set $db_file = "" and $Melanie = 1;
- in include.cfg: set $db_file = "test.dat" and $Melanie = 0;
- in include.cfg: set $db_file = "test.dat" and $Melanie = 1;
- The tool will always look for the presence of a file called "subtitle.html" in your 'data' directory to include it as a subtitle in your Web interface. So, it is a good place to write some description of your database, your institution, to include some logos, and so on.
The database cross-reference links
If you edit this file directly in the 'text' directory, the changes will apply to all your subsequent installations, but they may not be permanent (because the file is automatically brought up to date by contacting the ExPASy server). It is recommended that you update this file for one specific installation by editing it, after your installation is complete, in the Web server directory to which it has been copied (by default the copy of this file should be found in '/www/var/cgi-bin/2d/inc/links.txt' or similar). See Readme: Main for more details.