[Feature-dev] [ feature-Features-1381 ] Featurize should read alternative PDB structural models

noreply at simtk.org noreply at simtk.org
Fri Oct 15 15:00:13 PDT 2010


Features item #1381, was opened at 2010-10-15 14:00
You can respond by visiting: 
https://simtk.org/tracker/?func=detail&atid=148&aid=1381&group_id=16

Category: None
Group: None
Status: Open
Resolution: None
Priority: 2
Submitted By: Mike Wong (mikewong899)
Assigned to: Gurgen Tumanyan (tumanian)
Summary: Featurize should read alternative PDB structural models

Initial Comment:
Protein Data Bank (PDB) files often contain more than one structural model. Each of these alternate configurations places the atoms in slightly different locations. Rarely, one of the alternate configurations will have extra atoms (e.g. hydrogen atoms).

Currently, FEATURE ignores all but the first PDB structural model. However, some machine learning (ML) methods may include the other structural models as training data. After all, they are valid evidence gathered through accepted experimental methods.

Therefore, the following alterations should be made to support this feature request.

1. Point files should accept a model accession number in the first column:


1kft.21
2ys1.9
1lhe.1

The grammar should be <PDB ID>.<MODEL ACCESSION NUMBER>

Note that 1lhe has only one model, so its PDB file contains no MODEL or ENDMDL records; the single implicit model is addressed as model 1.

2. Protein.cc will have to be modified to skip to the requested model. If that model does not exist, it should fail with a fatal error that tells the user which model accession numbers are available.

3. featurize.cc will have to be modified to take a new model accession parameter as a command-line option.
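
The three changes above could be sketched roughly as follows. This is only an illustration, not the actual FEATURE code: the names ModelRef, parseModelRef, and readModel are hypothetical, the sketch assumes that an ID without a ".<number>" suffix means model 1 (matching the current behavior), and it reports a missing model by throwing an exception that the caller would turn into a fatal error.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type: a point-file ID split into PDB ID and model number.
struct ModelRef {
    std::string pdbId;
    int model;
};

// Parse "<PDB ID>.<MODEL ACCESSION NUMBER>", e.g. "1kft.21".
// Assumption: a bare PDB ID (no suffix) refers to model 1.
ModelRef parseModelRef(const std::string& token) {
    ModelRef ref{token, 1};
    const std::string::size_type dot = token.find('.');
    if (dot == std::string::npos) return ref;
    ref.pdbId = token.substr(0, dot);
    const std::string digits = token.substr(dot + 1);
    if (ref.pdbId.empty() || digits.empty() ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        throw std::invalid_argument("Bad model reference: " + token);
    }
    ref.model = std::stoi(digits);
    return ref;
}

// Collect the ATOM/HETATM lines that belong to the requested model.
// A file with no MODEL records is treated as one model numbered 1.
// Throws std::runtime_error listing the available models if the
// requested one is absent.
std::vector<std::string> readModel(std::istream& pdb, int wanted) {
    std::vector<std::string> atoms;
    std::set<int> seen;   // model numbers found in the file
    int current = 1;      // implicit model until a MODEL record appears
    std::string line;
    while (std::getline(pdb, line)) {
        if (line.compare(0, 6, "MODEL ") == 0) {
            current = std::stoi(line.substr(6));  // serial follows the record name
            seen.insert(current);
        } else if (line.compare(0, 6, "ATOM  ") == 0 ||
                   line.compare(0, 6, "HETATM") == 0) {
            if (current == wanted) atoms.push_back(line);
        }
    }
    if (seen.empty()) seen.insert(1);  // no MODEL records: single implicit model
    if (seen.count(wanted) == 0) {
        std::ostringstream msg;
        msg << "Model " << wanted << " not found; available models:";
        for (int m : seen) msg << ' ' << m;
        throw std::runtime_error(msg.str());
    }
    return atoms;
}
```

Under these assumptions, featurize.cc would only need to pass the model number from its new command-line option (or from parseModelRef on a point-file ID) through to the reader. ENDMDL records need no special handling here because each MODEL record resets the current model number.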
