
The Make2D-DB II Package

current version: 3.10.2 -- December 2011



The Make2D-DB II site on ExPASy: for up-to-date documentation, news and the FAQ




Read-Me: Data Preparation





This document explains how to prepare your data before launching the checking or the conversion (transform / update) process. If you are only installing a Web portal with no local data, this document does not apply to you.

By default, place your flat file, XML or text/CSV reports, map images and other specific files (such as the hidden* files, see below) in the 'data' directory. You may also choose a different directory for your data (to be defined within the configuration files). In that case, use your own directory path in the following guidelines instead of the default 'data' directory.


A test database has been prepared so that you can try out the tool. The data set, combined with some additional examples, is found in the data_test directory, so you can also check the nature of the data to be used. Please note that this is an entirely fictitious data set with no real biological significance. If you are testing the flat file format, the database name within the PostgreSQL catalog will be 'test_2dpage' by default; otherwise, define the database name yourself during the configuration process. You can have a look at this test database by clicking here.

Note: Since version 1.0, you can provide your data in various formats (simple text reports, spreadsheets or XML files), in addition to the previously required flat file format (the SWISS-2DPAGE-like text file listing your proteins sequentially). Nevertheless, a format similar to the flat file format is still used *internally* by the tool. If you provide your data in any other format, the tool automatically generates an intermediate flat file to work with. A copy of this generated file is placed in your data directory (and a similar copy in the temp directory). This file is named "last_generated_flat_file.dat". You may have a look at it during the conversion process. If you wish, you may even interrupt the process to edit this file and add or change anything you need. You should then restart the process, defining your data source to be read from this flat file.




The images

For each map, create two corresponding images: one with exactly the same dimensions (width and height) as the original map image, and a small one (approximately 100 x 100 pixels). It is possible to use images whose size differs from the original ones, and even to shift their origin, which by default is located at the top-left corner (more details are available in the configuration document, see Readme: Configuration).
Both image files should have exactly the same name as the original map name referred to in the IM line of your flat file (the text file listing your proteins sequentially, in case you are providing a SWISS-2DPAGE-like flat file) and/or as listed in your 'existing.maps' file (see below), except that the small image name should be preceded by the prefix 'small_'. Map names should be upper-cased and should not contain any spaces (e.g. LIVER_MOUSE, PLASMA_4-7). Use any valid graphic type (gif, tif, png, jpg, ...) and add its extension to the image name. The 'png' and 'jpg' formats display faster, while 'tif' images display more slowly.

example for a map called PLASMA: (e.g. "IM   PLASMA" line in a SWISS-2DPAGE like flat file or simply a map called PLASMA in your 'existing.maps' file)
create, for example, 'PLASMA.png' (usually the same dimension as the original map image) and 'small_PLASMA.png' (the 'small_MapName' image)

Put all your images in the 'data' directory, or place the full-size and the small images in two sub-directories called 'images' and 'small_images', respectively.
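
The naming rules above can be checked before running the tool. The following Python sketch is not part of Make2D-DB II: the helper names are hypothetical, the accepted character set (upper-case letters, digits, underscores and '-') follows the map naming rules given in this document, and it assumes that the 'images' and 'small_images' sub-directories sit inside the data directory.

```python
import re
from pathlib import Path

# Hypothetical helper, not part of Make2D-DB II.
# Map names: upper-case letters, digits, underscores and '-'.
MAP_NAME_RE = re.compile(r"[A-Z0-9_-]+")


def expected_image_names(map_name, extension="png"):
    """Return (full-size image name, small image name) for a map."""
    if not MAP_NAME_RE.fullmatch(map_name):
        raise ValueError(f"invalid map name: {map_name!r}")
    return f"{map_name}.{extension}", f"small_{map_name}.{extension}"


def missing_images(data_dir, map_name, extension="png"):
    """List expected image files found neither in data_dir itself nor in
    its 'images' / 'small_images' sub-directories (assumed location)."""
    data_dir = Path(data_dir)
    full, small = expected_image_names(map_name, extension)
    candidates = {
        full: [data_dir / full, data_dir / "images" / full],
        small: [data_dir / small, data_dir / "small_images" / small],
    }
    return [name for name, paths in candidates.items()
            if not any(path.is_file() for path in paths)]
```

For a map called PLASMA, `expected_image_names("PLASMA")` gives the two file names used in the example above, and `missing_images("data", "PLASMA")` lists whichever of them still has to be created.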

Tip: If you have a logo image that you want to display on your Web interface, then put it also in this 'data' directory.




Listing your maps

 Note: Melanie / ImageMaster™ 2D Platinum 5.0 users working with XML exported reports do not strictly need to perform this task, but doing so provides many more annotations for your maps than can be extracted from the exported XML files.
Important:
During the configuration process (perl make2db.pl -m config), you are asked if you want to generate a maps' file (third choice from the very first level). You will then be guided to input several annotations for each of your maps. It is highly recommended to generate your file this way, as this offers richer annotations for your maps.

The new version of the tool lets you annotate the following fields:
  • the map name itself, one single upper-cased word, e.g. PLASMA or PLASMA_4-7; all your subsequent gel text reports - if any - should have the same name used in here, plus a '.txt' extension (required parameter)
  • a more descriptive and longer name (optional parameter)
  • width of the image in pixel / X-coordinates (required)
  • height of the image in pixel / Y-coordinates (required)
  • pI start and end values (optional)
  • Mw start and end values (optional)
  • taxonomy ID (optional)
  • a description of the species strain if needed (optional)
  • a tissue name - only names listed in the UniProtKB tissue list are accepted (the ID or the SY line, e.g. Abdomen) - you may contact us if you wish to add any tissue not present in this list (optional)
  • a list of mapping (identification) methods applied to the whole map for the spots' identification (optional)
  • URL (uri) for both the preparation and informatics parts (optional)
  • local documents for both the preparation and informatics parts, e.g. PSI-MIAPE documents, PSI-Gel documents (optional)
  • short comments for both the preparation and informatics parts (optional)
  • software used for the detection (optional)
  • any related comments (optional)
  • number of detected spots for statistical data; this will override the number of detected spots read from the Melanie XML reports when given (optional)
  • shift the X position of the image in pixels (optional) -- note: this value will be overridden by any *defined* shift in '2d_include.pl:map_shift_left', which acts on *all* maps together
  • shift the Y position of the image in pixels (optional) -- note: this value will be overridden by any *defined* shift in '2d_include.pl:map_shift_down', which acts on *all* maps together
  • adapt the spots' position horizontally using a ratio value (optional) -- note: this value will be overridden by any *defined* ratio in '2d_include.pl:map_x_ratio', which acts on *all* maps together
  • adapt the spots' position vertically using a ratio value (optional) -- note: this value will be overridden by any *defined* ratio in '2d_include.pl:map_y_ratio', which acts on *all* maps together
example of a maps' file generated during the configuration process (note that, for each map, parameters are separated by TABs and are all in a single line).

Be sure to place your generated 'existing.maps' file into your 'data' directory.

The following old format, still accepted but deprecated, has been kept only for compatibility:
The old way:
Create a text file called "existing.maps" containing the list of all your map images. Each line should contain details for one map. The minimal syntax to follow is:
map_short_name   map_long_name   width   height
"map_short_name" is the name of your map (e.g. PLASMA) and of the corresponding image files (if you are not a Melanie / ImageMaster™ 2D Platinum 5.0 user, this name, combined with the name of your database, will also be used as a unique identifier for your map). "map_long_name" is a more descriptive name for the map to be displayed; spaces are allowed between its words. width and height are the X and Y dimensions in pixels of your original map image. Separate fields with a tab.
Example of an existing.maps file (spaces could be tabs or just spaces):  
LIVER   Human Liver original Geneva   600   800
PLASMA   Human Plasma number 11   1200   1600
When using a flat file: remember that each map name should be written exactly as it is written in your data flat file IM fields.
Finally, you may also add a tissue name at the end of each line (optional). Only tissues listed in tisslist.txt, tisslist_initial.txt or tisslist_aliases.txt will be retained.
Put your existing.maps file in the 'data' directory.
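
As an illustration of the old format, here is a minimal Python sketch that reads one tab-separated line of an 'existing.maps' file. It is not the tool's own parser: the function name and the returned dictionary keys are hypothetical, and it assumes that fields are separated by tabs, so that the long name may contain spaces.

```python
def parse_existing_maps_line(line):
    """Parse one line of the old-style 'existing.maps' format.

    Hypothetical helper: expects tab-separated fields
    map_short_name, map_long_name, width, height and an optional tissue.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields: {line!r}")
    short_name, long_name, width, height = fields[:4]
    tissue = fields[4] if len(fields) > 4 and fields[4] else None
    return {
        "short_name": short_name,
        "long_name": long_name,
        "width": int(width),    # X dimension in pixels
        "height": int(height),  # Y dimension in pixels
        "tissue": tissue,       # optional trailing field
    }
```

Checking that the tissue name is present in one of the tissue list files is left out of this sketch.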

 



The spots and entries identification and annotation


This section describes how to prepare your spots' data.

The term 'entry' (or entries) is commonly used in this document as a synonym for 'protein'.

Before going any further, remember that there are three ways to provide this data to the tool. You have the choice between the following three options:

  1. A Flat File: A SWISS-2DPAGE-like text file, listing sequentially your proteins. This file has to be combined either with simple spot lists (defining the spots position), or with some Melanie / ImageMaster™ 2D Platinum text reports or XML exports.
  2. Spreadsheets (CSV / tab-delimited text files, e.g. EXCEL exports)
  3. Melanie / ImageMaster™ 2D Platinum XML exports alone.

Depending on your choice, you will have different levels of granularity for your annotations. Internally, the tool will always partially rely on a flat file, be it provided by the user, or generated by the tool itself from the spreadsheets or the Melanie exports.

The flat file offers the most structured way to provide data to the tool. It has the advantage of being strict (in a positive sense) and can be extremely rich. The drawback is that it is hard to manually produce a highly annotated and correctly formatted flat file.
Spreadsheets have the advantage of being simple to generate. Many laboratories already work with this format to store their 2D data. With spreadsheets, users are free to define the extent of their annotations, from very basic ones to extremely rich, user-defined ones. The main drawback is that many user-defined annotation categories make it harder to link data between researchers (a semantic problem), especially since no unambiguous ontology has been defined yet.
The Melanie XML exports are, among the three options, the easiest way to generate data, assuming of course that the maps are accurately annotated with this software. However, those annotations are currently quite limited, and the XML schema itself does not follow any widespread standard, as no such standard exists yet.

We will detail each of those options separately. Keep in mind that if you are not providing your own flat file, you can always work with the automatically generated one (found in your data directory, as well as in the temp directory, under the name last_generated_flat_file.dat whenever you run the tool in the other two modes). You may sometimes want to edit this generated file manually and restart your conversion process based on your edited copy (by switching to the flat file mode in your configuration files and setting the $db_file variable to 'last_generated_flat_file.dat', or to any other name if you save your modified copy under another name).

One more remark: during the data/syntax checking, you may encounter error messages reporting data inconsistencies. Some of those messages point to a section of a flat file where the error was detected, even if you have not provided any flat file. Because a major part of your data is translated internally into the flat file format, an inconsistency in this flat file can be traced back to its source in your original spreadsheets or text reports.

Finally, it is important to note that a major part of the external updates relies on a protein index, namely the Swiss-Prot/UniProtKB accession numbers. By providing such identifiers for your identified proteins (as your accession numbers or as cross-references), you get the most out of this feature and become more "visible" to other remote Make2D-DB II databases (the tool creates dynamic cross-references between the remote databases based on this index or the SWISS-2DPAGE one).





WORKING WITH FLAT FILES


By defining a non-empty string for the $db_file variable in your config.cfg file, you tell Make2D-DB II to work in the flat file mode.

The spots' text reports

(Melanie / ImageMaster™ 2D Platinum users working with XML exported reports, or with text exported reports combined with a flat file, do not need to read this first sub-section; you may go directly to "Other supported spot report formats".)

The spots reports are text files that list the spots' coordinates within a map image. There should be one report per map in the 'data' directory. This report should be given the name of the corresponding map exactly as it is written in the 'IM' line of your flat file. It should also have a '.txt' extension (e.g: PLASMA.txt).

Each report should contain a line for each identified spot on the corresponding map, indicating the spot identifier (spot's name) and its position on the image (given in pixels). Fields may be separated by tabs or by simple spaces. Several line syntaxes are accepted.

Tip: most 2D-PAGE software should let you easily export this type of report file.

Once generated, put all of your reports in the 'data' directory. Make sure they have been saved in text format.
 

General syntax of a report line: 

Spot_ID   x_position   y_position [%Od] [%Vol]
Separate fields with spaces (or tabs). Spot_ID is the identifier given to the spot/band, x_position is the spot position (in pixels) from left to right, and y_position is the spot position from top to bottom. In 1D maps (SDS) you can even omit the x_position field; a default value will then be read from the configuration file (see Readme: Configuration).
You may also include values for both the relative optical density (%Od) and the relative volume (%Vol) of each spot (expressed in %). If you give a single value, it will be interpreted as the %Vol value.

- Example of a MAP.txt file (header lines containing double quotes around field names are optional and will be ignored). Spaces can be a tab or just simple spaces:

423   120   210
424   100   120
...

or

425   300   400   0.012   0.0345
...

- for an SDS.txt file (minimal data):
426   300
Note: If you want to use any field other than the SpotID as your spot identifier (e.g. using SWISS-2DPAGE-like SerialNumbers), simply replace the SpotID field with the annotation field you want to use as your spot identifier (though make sure this annotation field is unique per spot), e.g.

2D-ABC123   120   210

Other examples: PLASMA.txt, PLASMA2.txt from the test_2dpage database. In the second example, you may notice that some extra annotations, i.e.

"pi:4.85   mw:22158" (syntax is  "pi:value   mw:value", separated by spaces or a tabulation)
(full syntax: "Spot_ID   x_position   y_position [%Od] [%Vol] [pi:value] [mw:value]"; square brackets mark optional values)

were added at the end of the line. Those parameters (pI and Mw) are optional as the flat file already contains this information (see below). Defining those values inside the spots' report will make the tool ignore those read from the flat file and even accept the following syntax for a spot within your flat file: "2D   -!-   PI/MW: SPOT spotID", without values for pI and Mw. It is your own choice to decide to list those values here (reducing redundancy) or not.
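
The report line syntax can be summarized with a short Python sketch. This is only an illustration, not the tool's parser: the function name, the `one_d` flag (meaning that the x_position field is omitted, as allowed for SDS maps) and the `default_x` parameter are hypothetical, and lines containing double quotes are treated as header lines and skipped.

```python
def parse_spot_line(line, one_d=False, default_x=None):
    """Parse 'Spot_ID x y [%Od] [%Vol] [pi:value] [mw:value]'.

    Hypothetical helper. With one_d=True the x_position field is taken
    to be omitted and default_x is used instead.
    Returns None for blank lines and for quoted header lines.
    """
    if not line.strip() or '"' in line:
        return None
    spot_id, *rest = line.split()  # fields separated by spaces or tabs
    record = {"spot_id": spot_id, "x": default_x, "y": None,
              "od": None, "vol": None, "pi": None, "mw": None}
    numbers = []
    for token in rest:
        key, sep, value = token.partition(":")
        if sep and key.lower() in ("pi", "mw"):
            record[key.lower()] = float(value)  # optional pi:/mw: annotations
        else:
            numbers.append(float(token))
    n_coords = 1 if one_d else 2
    if not n_coords <= len(numbers) <= n_coords + 2:
        raise ValueError(f"unexpected number of fields: {line!r}")
    if one_d:
        record["y"] = numbers[0]
    else:
        record["x"], record["y"] = numbers[:2]
    extra = numbers[n_coords:]
    if len(extra) == 1:
        record["vol"] = extra[0]  # a single value is read as %Vol
    elif len(extra) == 2:
        record["od"], record["vol"] = extra
    return record
```

Spot identifiers are kept as strings, so an identifier such as 2D-ABC123 is handled the same way as a numeric SpotID.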
     

    Other supported spot report formats (Melanie / ImageMaster™ 2D Platinum users):

    By defining the configuration variable '$Melanie = 1' in your include.cfg configuration file, you tell the tool to look for some Melanie reports. The tool will start by searching any file with the extension .xml (XML files) inside your data directory (e.g. anything.xml) and will parse them. If none is present, then it will look for text files (.txt) corresponding to the different maps listed in your database (e.g. PLASMA.txt and PLASMA2.txt).

    You may use the default text spot reports generated by Melanie:

    If you are using the free of charge Melanie / ImageMaster Viewer (tested up to version 5.02), you can directly use the generated spot report which also exports the following data by default (make sure that they are listed in the following order):

"GelName"  SpotID  X  Y  Pi  Mw  Od  Area  Vol  %Od  %Vol  Circularity/Saliency.

Those reports can be read and treated directly by the tool with no need to manipulate them.

To use another annotation (SerialNumber) instead of the default SpotID to be your spot identifiers, simply add (export) this annotation as an additional last field in each of your report lines:
e.g.

"GelName"  SpotID  X  Y  Pi  Mw  Od  Area  Vol  %Od  %Vol  Circularity/Saliency "SerialNumber/your_annotation"


    You may also work in combination with the Melanie  XML exports:

    Make sure you have the common perl XML::Parser and libxml-perl modules installed on your system (the tool will need to use the XML::Parser::PerlSAX perl module). If not, ask your system administrator to install a recent version, or simply prepare your reports as described in the sub-sections above.

    The Make2D-DB MelanieXMLParser module will extract the name of the gel from the Melanie XML file. If the original Melanie image file name is different from the gel name used in the flat file IM lines, then rename the corresponding XML files to the appropriate gel names (e.g. PLASMA.xml and PLASMA2.xml), one gel per file. Otherwise, you may use any name and group several gels inside one single XML file, provided it has the extension .xml.

    It is not strictly required to prepare an "existing.maps" file (if present, it will override Melanie XML values). The pI/Mw values will then be read from the Melanie export, overriding any values given in your flat file (for your flat file, the syntax "2D   -!-   PI/MW: SPOT spotID" without any given pI/Mw values will be then accepted).

    All graphically detected spots will be integrated into your database (being annotated/identified or not) if you set the variable "$include_not_identified_spots = 1" in your configuration file include.cfg.


    Extracting spots' positions directly from annotated Melanie / ImageMaster™ 2D Platinum:

    -- deprecated --
    If you are a Melanie 4 / ImageMaster™ 2D Platinum 5.0 user (or higher) and you do not wish to export those spots' reports yourself, simply do not create them. When it does not find the spots' reports, the tool will try to analyze the Melanie / ImageMaster™ 2D Platinum 5.0 maps themselves to extract the spots' positions. If your maps are saved in the Melanie II or the Melanie 3 format, and you do not have a copy of Melanie 4 or ImageMaster™ 2D Platinum 5.0 (or higher), you can still convert your maps using the ImageMaster/Melanie Viewer (version 4.08 and up).
    In that case, be aware that the tool will rely on the Melanie / ImageMaster™ 2D Platinum 5.0 SpotID field to refer to your spots, and on the pI/Mw values given in your flat file.
    -- deprecated --


Order for spot annotations' extraction (Melanie / ImageMaster™ 2D Platinum users):

To extract spot annotations, the tool will try at first to read any exported Melanie XML file with a .xml extension, provided it has been configured to read Melanie / ImageMaster data ("$Melanie = 1" in include.cfg).

It will then look for text reports (files named MAP.txt, where MAP stands for each map name given in your database flat file).

If no reports are found, the tool will try to extract annotations directly from the Melanie images themselves. It is recommended not to rely on this step, as this option is deprecated.
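
The lookup order can be pictured with the following Python sketch. It is an illustration only: the function name and the returned labels are hypothetical, and the function works on a plain list of file names rather than on the data directory itself.

```python
def select_spot_sources(file_names, map_names, melanie=True):
    """Return (kind, files) following the lookup order described above.

    Hypothetical helper: 'file_names' are the names found in the data
    directory, 'map_names' the maps listed in the flat file, and
    'melanie' mirrors the $Melanie configuration variable.
    """
    if melanie:
        xml_files = sorted(f for f in file_names if f.endswith(".xml"))
        if xml_files:
            return "xml", xml_files       # 1. Melanie XML exports
    wanted = {f"{name}.txt" for name in map_names}
    text_reports = sorted(f for f in file_names if f in wanted)
    if text_reports:
        return "text", text_reports       # 2. MAP.txt text reports
    return "images", []                   # 3. deprecated image analysis
```

For example, with `gels.xml`, `PLASMA.txt` and `PLASMA2.txt` all present, only the XML file is selected; removing it makes the two text reports the selected source.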


The Database Flat File


Create and place in the data directory your database flat file (text file) containing one entry per protein. Entries are separated by a // line. The usual headers used with the first version of make2ddb are optional and will be ignored except for the database name if no name has been given in the configuration file.

Before going any further, please make sure you are familiar with the syntax described in the SWISS-2DPAGE user manual, which describes a large part of the syntax to be adopted in more detail.

Compared to the syntax described in that manual, the tool is much more tolerant. It also lets you define a list of default values to be applied whenever required information is missing. Finally, some specific additions have been adopted for the Mass Spectrometry annotations.

Example of a simple xxx.dat file (fictitious entries; some optional Make2D-DB II keywords are omitted for simplicity):

ID   HC_HUMAN;     STANDARD;     2DG.
AC   P02760; P02759; P00977;
DE   Alpha-1-microglobulin/ Inter-alpha-trypsin inhibitor light chain
DE   (PROTEIN HC) (HI30).
IM   LIVER, PLASMA.
RN   [1]
RP   MAPPING ON GEL.
RX   MEDLINE; 78094420.
RA   Anderson N.L., Anderson N.G.;
RT   "High Resolution 2-DE of human Liver";
RL   Proc. Natl. Acad. Sci. U.S.A. 74:5421-5425(1977).
2D   -!- MASTER: LIVER;
2D   -!-   PI/MW: SPOT 1=5.12/30851;
2D   -!-   PI/MW: SPOT 2=5.07/29736;
2D   -!- MASTER: PLASMA;
2D   -!-   PI/MW: SPOT 1=4.86/33544;
2D   -!-   PI/MW: SPOT 2=4.96/32167;
2D   -!-   PI/MW: SPOT 3=5.07/31046;
DR   Swiss-Prot; P02760; HC_HUMAN.
//
ID   CRP_HUMAN;     PRELIMINARY;     2DG.
AC   P02741;
DE   C-reactive protein precursor.
IM   PLASMA.
RN   [1]
RP   MAPPING ON GEL.
RA   Anderson N.L.;
RL   Personal Communication(1993).
CC   -!- SUBUNIT: HOMOPENTAMER.
2D   -!- MASTER: PLASMA;
2D   -!-   PI/MW: SPOT 999=5.12/23908;
DR   Swiss-Prot; P02741; CRP_HUMAN.
DR   SWISS-2DPAGE; P02741; CRP_HUMAN.
//

Notes:
  • The image names used in the existing.maps file, in the IM line and in the ' 2D   -!- MASTER ' line should be exactly the same (e.g. PLASMA). Consequently, the image names should be upper-cased, and they should not have any characters other than letters, underscores, digits and '-'.

  • The database text file is structured to be readable by humans as well as by computer programs. The lines describing one entry each begin with a two-character line code, which indicates the type of data contained in the line. The remaining part of the line should follow the given rules, otherwise the conversion will not work properly (errors are signaled). In particular, for the lines described extensively below, the given structure should be strictly respected whenever they are provided.
  • For more details on each line type, please refer to the SWISS-2DPAGE manual. You can get the latest release of this manual from the ExPASy server at http://world-2dpage.expasy.org/swiss-2dpage/docs/manch2d.html, or download it by FTP from ftp://ftp.expasy.org/databases/swiss-2dpage/manch2d.htm.
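
To make the line-code structure concrete, the Python sketch below splits a flat file into entries at the '//' lines, groups each entry's lines by their two-character code, and collects the spots listed under each '2D -!- MASTER' line. It is a simplified illustration with hypothetical function names, not the parser used by Make2D-DB II, and it ignores most of the syntax accepted by the tool.

```python
import re
from collections import defaultdict

# Matches 'SPOT 1=5.12/30851' as well as a bare 'SPOT spotID'.
SPOT_RE = re.compile(r"SPOT\s+([^=;\s]+)(?:=([0-9.]+)/([0-9.]+))?")


def split_entries(text):
    """Return one {line_code: [contents, ...]} dict per flat-file entry."""
    entries, current = [], defaultdict(list)
    for line in text.splitlines():
        if line.strip() == "//":          # entry separator
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
        elif line.strip():
            current[line[:2]].append(line[2:].strip())
    if current:
        entries.append(dict(current))
    return entries


def spots_by_map(entry):
    """Collect {map_name: [(spot_id, pI, Mw), ...]} from the 2D lines."""
    result, master = {}, None
    for content in entry.get("2D", []):
        body = content.replace("-!-", "", 1).strip()
        if body.startswith("MASTER:"):
            master = body[len("MASTER:"):].strip().rstrip(";")
            result.setdefault(master, [])
        elif body.startswith("PI/MW:") and master is not None:
            match = SPOT_RE.search(body)
            if match:
                spot_id, pi, mw = match.groups()
                result[master].append((spot_id,
                                       float(pi) if pi else None,
                                       float(mw) if mw else None))
    return result
```

Applied to the HC_HUMAN entry shown in the example file, `spots_by_map` would return two spots for LIVER and three for PLASMA.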



    For Make2D-DB II, many lines (ID, DE, DT, GN, OS, OC, OX, IM, RP, RX, RA, RL, RT, CC, DR, ...) are not explicitly required within your database text file. However, you may need to set up default values for some of them (DT, OS, OC, OX, RP, RA, RL) in the configuration file 'include.cfg' (see Readme: Configuration). The IM and MA fields can be omitted entirely from the database text file. When missing, the IM field is evaluated internally by reading the '2D -!- MASTER' lines. If you define a Taxonomy ID value for one or more of your maps within your existing.maps file, then entries belonging to those maps will also adopt their TaxID (except when you force a specific species annotation for an individual entry by defining a specific OX field for it).

    A different set of entries, forming the test database, is listed in this flat file (test.dat). The first entry, Z02760 (HC_HUMAN), is an extended entry.
    A "minimal entry" (with the minimal required data) is shown within this test database (test.dat). It has the accession number "ZI|GI.MINIMAL" and only contains 3 types of lines: AC, 2D and DR. The tool then tries to add some missing values based on the given configuration files and on the external data extracted for the UniProtKB (Swiss-Prot or TrEMBL) entry given in the DR cross-reference line.
    Entry "P12345" contains only the strict minimum required for an entry (one AC line and two 2D lines for 1 spot location). The tool recognizes that "P12345" is a UniProtKB/Swiss-Prot accession number and automatically cross-references the entry based on this identifier.



    Compared to the original SWISS-2DPAGE manual, some syntax modifications to the 2D lines have been adopted by the tool to suit the need for a more elaborate annotation of PMF lines (peptide mass fingerprinting) and MS/MS lines (tandem mass spectrometry) combined with peptide sequences. The rules are:

    - All the standard syntax is still perfectly sufficient, e.g. for a PMF list:

    2D   -!- PEPTIDE MASSES: SPOT 'SERIAL NUMBER': MASSES_LIST;
    2D SUBSEQUENT MASSES_LIST; 'ENZYME'.
    e.g.
    2D   -!-   PEPTIDE MASSES: SPOT 1234: 1001.631; 1267.653; 1731.898;
    2D 1821.909; TRYPSIN.

    - Intensities can be included in parentheses following each m/z value, e.g.
    2D   -!-   PEPTIDE MASSES: SPOT 1234: 1001.631 (243.34); 1267.653 (190.11); 1731.898 (340.81);
    2D 1821.909 (301.11); TRYPSIN.

    - A new "MASS SPECTROMETRY" category is introduced to list Tandem MS peak lists:
    2D   -!-   MASS SPECTROMETRY: SPOT 89: [ParentPeptideMass:ParentPeptideCharge] MASSES_LIST;
    2D SUBSEQUENT MASSES_LIST.

    The ParentPeptideMass and the ParentPeptideCharge are optional. If present, they are separated
    by a colon and given inside square brackets. If just one value is given, it is considered to be the parent charge.
    The syntax for the masses and their intensities is similar to the PMF syntax.
    A final period '.' is required at the end of the very last line of the section.

    e.g.

    2D   -!-   MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94); 286.107811 (280.22);
    2D 458.7317 (595.24); 859.5010 (379.048); 859.7686 (171.43); 860.4855 (113.333).

    - For both "MASS SPECTROMETRY" and "PEPTIDE MASSES", the peaks retained as significant for the identification (analysis) can be separated from the remaining experimental data by a double colon, e.g.
    2D   -!-   MASS SPECTROMETRY: [1200.7:1-] 869.468(3.09);524.448(2.67);635.708(3.17);712.129(1.2)::777.77(3.7);888.48(2.8);...

    The left part is supposed to be the significant values for the identification (analysis), while the right part lists all values, or just the additional other values not retained for the identification.
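
The peak list syntax described above can be illustrated with a small Python sketch. It is not the tool's parser: the function names and the dictionary keys are hypothetical, and the input is assumed to be the text that follows the "SPOT nn:" part, with continuation lines already joined and their "2D" prefixes removed.

```python
import re

# One peak: a mass, optionally followed by an intensity in parentheses.
PEAK_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*(?:\(\s*([0-9]+(?:\.[0-9]+)?)\s*\))?")
# Optional '[ParentPeptideMass:ParentPeptideCharge]' prefix.
PARENT_RE = re.compile(r"^\s*\[([^\]]*)\]")


def parse_peaks(chunk):
    """Return [(mass, intensity or None), ...] from a ';'-separated list."""
    peaks = []
    for item in chunk.split(";"):
        item = item.strip().rstrip(".")   # drop the final period, if any
        match = PEAK_RE.fullmatch(item) if item else None
        if match is None:
            continue                      # e.g. an enzyme name such as TRYPSIN
        mass, intensity = match.groups()
        peaks.append((float(mass), float(intensity) if intensity else None))
    return peaks


def parse_ms_section(text):
    """Split a MASS SPECTROMETRY section into parent data and peak lists."""
    parent_mass = parent_charge = None
    match = PARENT_RE.match(text)
    if match:
        parts = [part.strip() for part in match.group(1).split(":")]
        if len(parts) == 2:
            parent_mass, parent_charge = float(parts[0]), parts[1]
        elif parts[0]:
            parent_charge = parts[0]      # a single value is the parent charge
        text = text[match.end():]
    left, _, right = text.partition("::")
    return {"parent_mass": parent_mass, "parent_charge": parent_charge,
            "peaks": parse_peaks(left),          # whole list, or left of '::'
            "other_peaks": parse_peaks(right)}   # right of '::', if present
```

The same `parse_peaks` helper also handles a PMF list, where the trailing enzyme name is simply skipped.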

    - You can include related local MS files to be displayed, or external URLs if the data is stored on the Web (e.g. in some repository). The keywords to use are: file, ident-file, uri and ident-uri. A colon separates each keyword from its value (a file path or a Web address).
    • file for a local MS file, e.g. 
      2D   -!-   MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; file:/some_path/msms.pkl.
    • ident-file for a local MS identification report (e.g. a Mascot report), e.g.
      2D   -!-   MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; ident-file:/some_path/msIdentResults.dat.
    • uri for a MS file located on the Web, e.g. 
      2D   -!-   MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; uri:http://www.ebi.ac.uk/pride/search.do?someID.
    • ident-uri for a MS identification report located on the Web, e.g. 
      2D   -!-   MASS SPECTROMETRY: SPOT 89: [1723.9581:1+] 270.074448 (491.94);...; ident-uri:http://www.ebi.ac.uk/pride/search.do?someID.
    All those document annotations are optional and may be combined in any order (separate document annotations by spaces). Remember to always terminate the section with a final period.
    e.g.
    2D   -!-   TANDEM MASS SPECTROMETRY: SPOT 111: [630.878:1+] 86.1001 (6.2857); 120.0644 (29.8095); 120.1283 (2.1905);
    2D        file:msms.pkl uri:http://www.ebi.ac.uk/pride/search.do?directLink=true&experimentAccessionNumber=1
    2D        ident-file:msIdentResults.dat.


    You will probably not need to give any MASSES_LIST when pointing to a file (as such files should contain the peak list values themselves). Nevertheless, you should still give an enzyme name when dealing with "PEPTIDE MASSES" (PMF) data.
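
The document keywords (file, ident-file, uri and ident-uri) can be picked out of a section with a few lines of Python. This is a hedged sketch, not the tool's implementation: the function name is hypothetical, and it assumes that each annotation is a single space-free "keyword:value" token and that the section ends with its final period.

```python
DOC_KEYWORDS = ("file", "ident-file", "uri", "ident-uri")


def extract_documents(section):
    """Return {keyword: value} for the document annotations of a section.

    Hypothetical helper; 'section' may span several '2D' lines.
    """
    text = section.strip()
    if text.endswith("."):
        text = text[:-1]                 # final period closing the section
    found = {}
    for token in text.split():
        # Split at the first colon only, so URLs keep their own colons.
        key, sep, value = token.partition(":")
        if sep and key in DOC_KEYWORDS and value:
            found[key] = value
    return found
```

Because each token is split at its first colon, "ident-file" and "file" are never confused, and a value such as "http://..." is kept whole.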

    - A keyword telling Make2D-DB II that the identification is to be hidden from public access (by default this keyword is 'private') may be added between curly braces before the final period.
    2D   -!-   MASS SPECTROMETRY: [1723.9581:1+] 270.074448 (491.94);...; file:/some_path/msms.pkl {private}.
    However, there is a far better way to control the visibility of your data. The section on hiding parts of your data from public access explains how to set this up.

    - When listing several lists of "MASS SPECTROMETRY" and their corresponding identified "PEPTIDE SEQUENCES", the order of correspondence between the MS data section and the identified peptides section follows the same order in which they are given. e.g. the first "PEPTIDE SEQUENCES" list correlates with the first "MASS SPECTROMETRY" list, and so on...

    - The mapping (identification) methods are vocabulary controlled and are defined in the editable basic_include.pl main configuration file inside the %mapping_methods_description list. You may redefine or add your own mapping methods within this list (contact us if any help is needed).




    WORKING WITH SPREADSHEETS (Tab-delimited files)

    In the spreadsheets mode, users can work with a large range of pre-defined annotations, but also with any number of their own free-text annotations. The spreadsheets mode covers any text report with fields separated by tabs (tab-delimited files/CSV), for example spreadsheet software exports (e.g. EXCEL) into text files. When you export such reports, make sure to select the tab as your delimiter!
    Since these are simple text files, you can also write such reports manually in any text editor, taking care to separate fields with tabs and to save in simple text format.

    You instruct Make2D-DB II to work in the spreadsheets mode by leaving the $db_file variable (the flat file name) empty in your config.cfg configuration file ($db_file = "") and by setting the $Melanie variable to zero ($Melanie = 0).
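
The way the two variables select the working mode can be summarized in a few lines of Python. This is only a reading aid with a hypothetical function name. The first two branches follow the text of this document; the third one (an empty $db_file with $Melanie set to 1 selecting the Melanie XML exports alone) is an assumption that this section does not state explicitly.

```python
def input_mode(db_file, melanie):
    """Map the $db_file / $Melanie settings to a working mode (sketch)."""
    if db_file:
        return "flat file"      # non-empty $db_file
    if not melanie:
        return "spreadsheets"   # $db_file = "" and $Melanie = 0
    return "melanie xml"        # assumption: $db_file = "" and $Melanie = 1
```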

    You should provide a separate report file for each of your maps. The report file name should be written exactly like the gel name given in your existing.maps configuration file, followed by the extension '.txt' (e.g. PLASMA.txt, PLASMA2.txt or PLASMA3.txt).

    The first line of your report should contain the headers for the various columns. The tool uses those headers to determine the annotation category of each column. Headers can follow any order in your report, except for the very first header, which always has to be the "SPOT" header.
    Do not duplicate any header; instead, check below, for each header category, how to separate different elements.
    All headers will be upper-cased by the tool. They may or may not be enclosed in double quotes.

    There are three main categories of headers:

    The mandatory headers

    There are four required headers. They are:
    1. "SPOT" header: The column for this header should be the first column to be defined. It contains the spot ID. You may use any single word for the values (e.g. 900 or 2D-TWX222).
    2. "X" header: This is the x-coordinates of the spot on the gel image (the width value) in pixel. Values should be positive or 0.
    3. "Y" header: This is the y-coordinates of the spot on the gel image (the height value) in pixel. Values should be positive or 0.
    4. "MW" header: The apparent molecular weight of the spot on the gel. Values are given in Dalton*. Use only integer numbers.
    Only one single value per data line is admitted for these headers.

    You may have several lines with the same spot ID. This is useful when you want to include several annotations for the same spot (for example when several proteins have been identified for the same spot, or when you have several independent MS analyses). When a spot is listed more than once, its X/Y coordinates, as well as its pI/Mw values, are retained only from their last occurrence. It is also not necessary to repeat the X, Y, MW and PI values for a spot once they have been given in a preceding line (cf. PLASMA2.txt).

    The origin to evaluate the X/Y positions is the top-left corner of the image.

    *You may also give MW values in kDa. The tool will assume they are in kDa if their values are low enough not to be in Dalton (e.g. 20.5).
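
    As an illustration of the rules for mandatory headers and repeated spot lines, here is a minimal Python sketch that groups the lines of a report by spot ID. It is not the tool's own code: the function name read_spots and the sample values are hypothetical, and the choice to let a later non-empty value replace an earlier one is an assumption derived from the "last occurrence" rule. No Dalton/kDa conversion is attempted.

```python
import csv
import io

MANDATORY = ("SPOT", "X", "Y", "MW")


def read_spots(report_text):
    """Group the data lines of a tab-delimited report by spot ID."""
    reader = csv.reader(io.StringIO(report_text), delimiter="\t")
    headers = [h.strip().upper() for h in next(reader)]
    missing = [h for h in MANDATORY if h not in headers]
    if missing:
        raise ValueError("missing mandatory headers: " + ", ".join(missing))

    spots = {}
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip empty lines
        record = dict(zip(headers, (cell.strip() for cell in row)))
        spot = spots.setdefault(
            record["SPOT"], {"X": None, "Y": None, "MW": None, "lines": 0}
        )
        spot["lines"] += 1
        for key in ("X", "Y", "MW"):
            if record.get(key):
                value = float(record[key])
                if value < 0:
                    raise ValueError(key + " must be positive or 0")
                spot[key] = value  # assumption: a later value replaces an earlier one
    return spots


# Illustrative data only: spot 900 is listed twice, once per identified protein.
report = "SPOT\tX\tY\tMW\tAC\n900\t120\t340\t45000\tP04040\n900\t\t\t\tP02768\n"
print(read_spots(report))
```

    In this sample, the second line for spot 900 leaves X, Y and MW empty, so the values given on the first line are kept.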

    The pre-defined headers

    Those headers have a special definition. They are optional, but their values are restricted to an associated syntax. You may use any combination of them, in any order, without duplicating any of them:
    1. "PI" header: The apparent pI of the spot on the gel. If this column is not present, the gel is assumed to be an SDS gel (bands). Otherwise, define a positive value starting from 0 (use real numbers, e.g. 7.443). The tool expects to find a defined value for all spots or no value at all for all of them.
    2. "AC" header: This is the column to hold the identified protein accession numbers (if known). Give a Swiss-Prot (UniProtKB) accession number for best results. Leave blank if no protein has been identified. When several proteins are identified for the same spot, write an independent line for each of them (e.g. spot 397 from the PLASMA.txt report).
    3. "MAPPING METHODS" header: You may use this column to list the different mapping (identification) methods used for the spot's identification. The mapping methods follow a controlled vocabulary and are defined in the editable basic_include.pl main configuration file, inside the %mapping_methods_description list. Give the keywords separated by commas (e.g. "MS/MS, Gm, Co" to display 'Tandem mass spectrometry', 'Gel matching' and 'Comigration' within your entries). You may redefine or add your own mapping / identification methods within this list (contact us if any help is needed).
    4. "OD" header (alias "%OD"): Relative optical densities (%Od) are listed here. Values range from 0.0 up to 100.0 (use real numbers, e.g. 0.32112).
    5. "VOL" header (alias "%VOL"): Relative volumes (%Vol) are listed here. Values range from 0.0 up to 100.0 (use real numbers, e.g. 0.32112).
    6. "AMINO ACID" header: This column is used to list the experimental analysis results by amino acid composition. The syntax follows the one shown in the SWISS-2DPAGE 2D lines manual for the "AMINO ACID COMPOSITION".
    7. "PMF" header: Peptide fingerprinting peak lists are listed here and basically follow the SWISS-2DPAGE 2D lines manual syntax for "PEPTIDE MASSES". You may also include the intensities of the peaks, following the intensity rule and the ident data rule given in the previous section.
    8. "MS" header (alias "MS/MS" or "MASS SPECTROMETRY"): Tandem mass spectrometry peak lists are listed here and follow the Mass Spectrometry rule, as well as the intensity rule and the ident data rule given in the previous section.
    9. "PMF FILE" header: Instead of listing your PMF peak lists yourself, you may just give the absolute or relative path for your local PMF experimental data file (e.g. a pmf.dta file) in this column. The tool will execute the appropriate conversion over your files to include their content within your database.
    10. "MS FILE" header: Instead of listing your tandem MS peak lists yourself, you may just give the absolute or relative path for your local MS experimental data file (e.g. a msms.mgf file) in this column. The tool will execute the appropriate conversion over your files to include their content within your database. The tool usually relies on the file extension to "guess" the format. Depending on the format you are using, you may need to tell Make2D-DB II explicitly which format is used. Read the note entitled "Input formats for MS/MS" below for more details.
    11. "PMF URI" header: Here you can give a URL (namely a URI) pointing to your experimental data, to be viewed if the latter is stored in some repository (e.g. PRIDE) or is accessible from the Web. You can still populate the column "PMF" with peak list data if you wish to.
    12. "MS URI" header: Here you can give a URL (namely a URI) pointing to your experimental data if the latter is stored in some repository (e.g. PRIDE) or is accessible from the Web. You can still populate the column "MS" with peak list data if you wish to.
    13. "PMF IDENT-FILE" header: PMF Analysis documents/reports can be given here (e.g. a Mascot search report) when they are present. Give an absolute or relative path for your local files.
    14. "MS IDENT-FILE" header: MS Analysis documents/reports can be given here (e.g. a PSI AnalysisXML or a Phenyx search report) when they are present. Give an absolute or relative path for your local files.
    15. "PMF IDENT-URI" header: Like the "PMF URI" header, you may give URLs pointing to some repository or any Web location where your PMF identification/analysis report may be viewed.
    16. "MS IDENT-URI" header: Like the "MS URI" header, you may give URLs pointing to some repository or any Web location where your MS identification/analysis report may be viewed.
    17. "PEPTIDES" header: The peptides are the identified peptide sequences related to the MS/MS data. The syntax follows exactly the one given in the SWISS-2DPAGE 2D lines manual for "PEPTIDE SEQUENCES".

    Input formats for MS/MS: idj, mzdata, mzxml, btdx, dta, mgf, peptMatches, pkl. The tool relies on the extension of the given file to "guess" its format. When dealing with the PSI mzData or mzXML formats (which both usually have the extension .xml), you should specify the format by giving the format name, followed by a colon, before the path to your file, e.g. "mzdata:/some_path/my_MS_file.xml" or "mzxml:/some_path/my_MS_file.xml". The prefix is also accepted for files whose extension already matches their format, which means that "/some_path/some_MS_file.pkl" and "pkl:/some_path/some_MS_file.pkl" are both correct.
    The file report PLASMA2.txt gives many examples of MS annotations.

    Listing several PMF/MS files or URIs: In order to list more than one element under the PMF/MS file and URI categories (headers 9 to 16), simply separate them by spaces. To ensure correspondence between elements across different categories (e.g. between analysis and identification files), keep the elements in the same order across the different columns.
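
    The two notes above (format prefix and space-separated lists) can be combined in a short Python sketch. This is only an illustration of the described syntax, not the parser used by the tool; the function name parse_ms_file_cell is hypothetical, and matching format names case-insensitively is an assumption.

```python
import os

# Format names as listed in the "Input formats for MS/MS" note.
MS_FORMATS = ("idj", "mzdata", "mzxml", "btdx", "dta", "mgf", "peptMatches", "pkl")
_BY_LOWER_NAME = {name.lower(): name for name in MS_FORMATS}


def parse_ms_file_cell(cell):
    """Split an "MS FILE" cell into (format, path) pairs; format is None if unknown."""
    results = []
    for item in cell.split():  # several files are separated by spaces
        prefix, colon, rest = item.partition(":")
        if colon and prefix.lower() in _BY_LOWER_NAME:
            # Explicit "format:path" notation.
            results.append((_BY_LOWER_NAME[prefix.lower()], rest))
            continue
        # Otherwise guess the format from the file extension.
        extension = os.path.splitext(item)[1].lstrip(".").lower()
        results.append((_BY_LOWER_NAME.get(extension), item))
    return results


print(parse_ms_file_cell("mzdata:/some_path/my_MS_file.xml /some_path/some_MS_file.pkl"))
```

    A file ending in .xml without a prefix yields None as its format here, which reflects why the explicit "mzdata:" or "mzxml:" prefix is needed for those files.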

    1. "REFERENCE" header: By listing your bibliographic references, following the SWISS-2DPAGE format, in a separate file called 'reference.txt' in your data directory (example), you can list in this column the reference numbers related to each entry. Several references can be given, separated by commas (e.g. 1,2,8); see PLASMA2.txt for an example. Remember that the RP, RA (or RG) and RL lines - respectively the 'Reference Position', the 'Reference Author' (or the 'Reference Group') and the 'Reference Location' - must be defined in all references; all the other lines are optional (and there is no need for a RN line).
    2. "XREF" header (alias "CROSS-REFERENCES"): If a protein has been identified for your spot, you may list here as many cross-references to external resources as you wish. The syntax to follow is "Xref_Database ID1 & Xref_Database ID1; ID2 & ..." (e.g. "Swiss-Prot P04040 & SWISS-2DPAGE P04040"). Only if your main accession number is already a UniProtKB (Swiss-Prot or TrEMBL) identifier will a large collection of cross-references be integrated automatically, with no need to define anything for the XREF field. On the other hand, if your identifier is not a UniProtKB AC, you may find it very useful to define here a cross-reference to UniProtKB (Swiss-Prot or TrEMBL) to activate the retrieval of external data related to UniProtKB. For more information on the cross-reference database list available with this tool, see cross-references.
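
    To make the XREF syntax more concrete, here is a minimal Python sketch that splits such a cell into databases and identifiers. The function name parse_xref is hypothetical, and the sketch assumes that a database name contains no blank and that '&' and ';' never occur inside an identifier; the tool itself may be more tolerant.

```python
def parse_xref(cell):
    """Parse "Db ID1; ID2 & OtherDb ID3" into a list of (database, [ids]) pairs."""
    xrefs = []
    for part in cell.strip().strip('"').split("&"):
        part = part.strip()
        if not part:
            continue
        # The first word is the database name, the rest is the identifier list.
        database, _, ids = part.partition(" ")
        identifiers = [i.strip() for i in ids.split(";") if i.strip()]
        if not identifiers:
            raise ValueError("no identifier given for " + repr(database))
        xrefs.append((database, identifiers))
    return xrefs


print(parse_xref("Swiss-Prot P04040 & SWISS-2DPAGE P04040"))
```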

    The free-text headers

    You may include as many free-text columns as you wish. Two classes are distinguished, though:

    - The "COMMENT" class:  Whenever your header begins with the keyword "COMMENT:" then it is considered a general comment related to the identified protein (e.g. "COMMENT: SUBUNIT" or "COMMENT: MISCELLANEOUS" columns in PLASMA.txt). No syntax check is applied.

    - The "2D" class:  All the other free-text headers will fall into this class. Those are considered as free-text 2D annotations. (e.g. "PATHOLOGY LEVEL" or "EXPRESSION" columns in PLASMA3.txt). No syntax check is applied.

    A free 2D annotation applies specifically to the spot it is given for. A convenient way to apply a free 2D annotation to all the spots of a map at once is to precede the header name of the annotation by a star '*', e.g. "* EXPRESSION". To apply the annotation only to the spots related to a particular protein, precede the annotation itself by a star '*' and define the annotation for only one of the spots related to this protein, e.g. "* method not applicable on this protein".
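
    The way a column header falls into one of the categories described in this section can be sketched in Python as follows. This is an illustration only: the function name classify_header and the returned labels are hypothetical, and the set of reserved names is simply collected from the headers and aliases listed in this document.

```python
# Mandatory and pre-defined header names (with aliases), as listed in this document.
RESERVED = {
    "SPOT", "X", "Y", "MW", "PI", "AC", "MAPPING METHODS", "OD", "%OD",
    "VOL", "%VOL", "AMINO ACID", "PMF", "MS", "MS/MS", "MASS SPECTROMETRY",
    "PMF FILE", "MS FILE", "PMF URI", "MS URI", "PMF IDENT-FILE",
    "MS IDENT-FILE", "PMF IDENT-URI", "MS IDENT-URI", "PEPTIDES",
    "REFERENCE", "XREF", "CROSS-REFERENCES",
}


def classify_header(header):
    """Return (class, name, applies_to_whole_map) for one column header."""
    name = header.strip().strip('"').strip().upper()
    # A leading star marks a free 2D annotation applied to all spots of the map.
    whole_map = name.startswith("*")
    if whole_map:
        name = name[1:].strip()
    if name in RESERVED:
        return ("mandatory or pre-defined", name, False)
    if name.startswith("COMMENT:"):
        return ("COMMENT", name, False)
    return ("2D", name, whole_map)


for header in ('"COMMENT: SUBUNIT"', "* expression", "Pathology Level", "pI"):
    print(classify_header(header))
```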



    For completeness, we should mention that the older format for spreadsheets is still accepted by the tool. This older format is much more restricted and does not support headers. It has two possible syntaxes:

    the short syntax (without identification annotations)
    Spot  X   Y   pI  Mw  [AC1 AC2 AC3]
    and the long one (with ordered identification annotations, e.g. PLASMA3_noheaders.txt)
    Spot  X   Y   pI   Mw   [AC1]   [IdentMethod1,IdentMethod2,..]   [PMF]   [MS/MS]   [AMINO ACID COMPOSITION]   [%od]   [%vol]
    
    


    Based on your CSV reports, the tool will generate an intermediate 'last_created_flat_file.dat' file. You may then choose to continue or to interrupt the conversion process. If you interrupt the process, you will be able to manually edit the 'last_created_flat_file.dat' file now present in your data directory, if you wish to add more annotations or to change others. You should then save the edited file under another name (e.g. newFlatFile.dat) and set the flat file variable $db_file to this new file name (without any path) before resuming your installation. This will switch you to the flat file mode. Otherwise, continue to proceed without interruption.




    WORKING WITH MELANIE (with no flat file)


    Make sure you have the common Perl XML::Parser and libxml-perl modules installed on your system (the tool needs the XML::Parser::PerlSAX Perl module). If not, ask your system administrator to install a recent version.

    By giving an empty string to the $db_file variable in your config.cfg file and a positive value to the $Melanie variable ($Melanie = 1), you tell Make2D-DB II to work in the Melanie/Image Master XML mode. The Make2D-DB MelanieXMLParser module takes the name of the gel image file exported within the Melanie file as the gel name to use (e.g. PLASMA or PLASMA2). Before exporting your XML files with Melanie, make sure that the name of the exported gel image file is written exactly as you would like your gel to be called within the new database (the tool automatically removes the path and the extensions '.tif' or '.mel' from the gel file name). You may use as many separate Melanie XML files as you wish; the tool parses all the files it finds in the data directory that have the extension '.xml'.
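
    To preview the gel name that would result from an exported image file name, the truncation described above can be imitated with a few lines of Python. This is a sketch, not the MelanieXMLParser code: the function name gel_name_from_image is hypothetical, and accepting both '/' and '\' as path separators, as well as upper-case extensions, are assumptions.

```python
import ntpath


def gel_name_from_image(image_file):
    """Drop the directory part and a trailing '.tif' or '.mel' extension."""
    base = ntpath.basename(image_file)  # ntpath splits on both '/' and '\'
    root, extension = ntpath.splitext(base)
    return root if extension.lower() in (".tif", ".mel") else base


print(gel_name_from_image("C:\\gels\\PLASMA.tif"))
print(gel_name_from_image("/data/gels/PLASMA2.mel"))
```

    The two calls print PLASMA and PLASMA2, the gel names used as examples in this section.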

    It is not strictly required to prepare an "existing.maps" file (but, if present, it will override the Melanie XML values). However, an "existing.maps" file gives you the opportunity to attach many more annotations to your maps.

    Based on your Melanie XML exports, the tool will generate an intermediate 'last_created_flat_file.dat' file. You may then choose to continue or to interrupt the conversion process. If you interrupt the process, you will be able to manually edit the 'last_created_flat_file.dat' file now present in your data directory, if you wish to add more annotations or to change others. You should then save the edited file under another name (e.g. newFlatFile.dat) and set the flat file variable $db_file to this new file name (without any path) before resuming your installation. This will switch you to the flat file mode. Otherwise, continue to proceed without interruption.




    Hiding parts of your data from public access

    The Make2D-DB II tool lets you control which data is displayed to public users and which data is restricted to administrators and privileged/private users. You may use three distinct files to control which entire gels, which protein entries, and which spot experimental identification data and analyses are to be private:
    • The hiddenGels.txt file: This file takes the list of gels to be hidden from public users.
    • The hiddenEntries.txt file: This file takes the list of protein accession numbers to be hidden from public users.
    • The hiddenSpots.txt file: This file controls if an association between a spot and an identified protein is to be shown or not. It also controls if identification data from 'MS/MS' (tandem mass spectrometry), 'PMF' (peptide mass fingerprinting) or 'Aa' (amino acid composition) are to be displayed for public users or not.
    You may create those files yourself from scratch (comment lines beginning with a '#' character are ignored) or start from the master files in the readme directory. Place them, with their respective names, inside your data directory. The tool will read those files and also copy them to your server directories, so you may decide at any later moment to modify them, either to make some hidden data public again or to hide more data. A section of the Web administration interface explains how to manage this task.

    All three master files can be found in the readme directory. You may copy them to your data directory and then edit them using any text editor. The three master files fully describe the syntax to follow. Here are three examples of edited files located in the data_test directory: (hiddenGels.txt example, hiddenEntries.txt example and hiddenSpots.txt example)
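
    When editing the hidden*.txt files, it can help to see which lines actually count once the '#' comment lines are ignored. The short Python sketch below does only that; it does not interpret the syntax of each file, which is described in the master files. The function name read_significant_lines and the sample content are hypothetical, and skipping blank lines is an assumption.

```python
def read_significant_lines(text):
    """Return the non-empty lines that are not '#' comment lines."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            lines.append(stripped)
    return lines


# Illustrative content only; see the master files for the real syntax.
sample = "# gels hidden from public users\nPLASMA2\n\n# PLASMA3\n"
print(read_significant_lines(sample))
```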

    The administrator always has full access to private data. The administrator may also define a password that privileged users must provide to access such data. This password is configurable within the generated server configuration file 2d_include.pl.




    The test database

    A test dataset containing various data source formats is included within this package in the data_test directory. You should read the package content section of the Readme: Main page, which describes in detail the content of both the data_test directory and its sub-directory examples. You may try different combinations of settings, for example using the spreadsheets mode (1) (with PLASMA.txt and PLASMA2.txt), using the Melanie Export.xml file (2), or using the test.dat flat file (3) combined with the text reports (PLASMA_example_report.txt and PLASMA2_example_report.txt, to be copied to data_test and renamed to PLASMA.txt and PLASMA2.txt). You may also try the flat file mode combined with the Melanie XML export (4) (both Export.xml and test.dat) as your source data. You may edit the different hidden*.txt files to check their effect on the query interface, and so on. To try those different approaches, you will first have to set up adequate configuration files, as described in the Readme: Configuration page.
    1. in include.cfg: set $db_file = "" and $Melanie = 0;
    2. in include.cfg: set $db_file = "" and $Melanie = 1;
    3. in include.cfg: set $db_file = "test.dat" and $Melanie = 0;
    4. in include.cfg: set $db_file = "test.dat" and $Melanie = 1;
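
    The four combinations listed above can be summarized as a small decision function. The Python sketch below is only a reading aid; the function name source_mode and the mode labels it returns are hypothetical and are not values used by the tool.

```python
def source_mode(db_file, melanie):
    """Name the data-source mode implied by the $db_file and $Melanie settings."""
    has_flat_file = bool(db_file)  # $db_file = "" means no flat file
    use_melanie = bool(melanie)    # $Melanie = 0 or a positive value
    if has_flat_file and use_melanie:
        return "flat file + Melanie XML"
    if has_flat_file:
        return "flat file"
    if use_melanie:
        return "Melanie XML"
    return "spreadsheets"


for settings in (("", 0), ("", 1), ("test.dat", 0), ("test.dat", 1)):
    print(settings, "->", source_mode(*settings))
```
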
    You may also edit the file subtitle.html*, which will be displayed in the Web interface as a subtitle section. This file can be a simple text file or an HTML-tagged file (without headers!), and may then contain any HTML tags, including images and external links. Also have a look at the file references.txt, which lists some bibliographic references cited from within the spreadsheet report PLASMA2.txt.

    *The tool will always look for a file called "subtitle.html" in your 'data' directory to include it as a subtitle in your Web interface. It is therefore a good place to describe your database and your institution, to include some logos, and so on.




    The database cross-reference links

    A file listing some URL links to different database cross-references (mainly for the DR lines) is provided within this package (in the 'text' directory). The file name is 'DbCrossRefs.txt' (this file is only present if you allow the tool to extract data from the ExPASy server). Otherwise, the tool will use the file called 'links.txt'. You can let the tool use this file as it is, or choose to edit it yourself to add or update URLs.
    If you edit this file directly in the 'text' directory, the changes will apply to all your subsequent installations, but they may not be permanent (because the file is automatically brought up to date by contacting the ExPASy server). It is recommended that you update this file for one specific installation by editing it, after your installation is complete, in your Web server directory where it has been copied (by default, the copy of this file should be found in '/www/var/cgi-bin/2d/inc/links.txt' or similar). See Readme: Main for more details.





    For any question, suggestion or comment: Please, contact .
