Protein-DNA Interface Modeling Software
The interface modeling program (intf_model.exe) can be used to predict the conformations of the sidechains within a protein-DNA binding interface, as described in Siggers and Honig (2007) Nucleic Acids Res. 35:1085-1097. The program lets the user change the identity of the amino acid and base residues in the interface. Modeling the binding interface of a protein-DNA complex with multiple DNA sequences produces multiple modeled complexes. A binding energy calculation on these modeled complexes then provides a way to predict the binding specificity of the protein for different DNA sequences.
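The first step of such a specificity scan is to enumerate the DNA sequences to be modeled. The Python sketch below is not part of the distributed software, and its function names are hypothetical. It only lists each candidate site together with its complementary strand. Building the models with intf_model.exe and running the binding energy calculation are separate steps.

```python
from itertools import product

# Watson-Crick partner of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def enumerate_sites(length: int):
    """Yield (top, bottom) strands for every DNA site of `length` base pairs.

    Each pair would become one modeled complex. The binding energy
    calculation on the models is a separate step and is not shown.
    """
    for bases in product("ACGT", repeat=length):
        top = "".join(bases)
        yield top, reverse_complement(top)


if __name__ == "__main__":
    sites = list(enumerate_sites(3))
    print(len(sites))  # 64 = 4**3 sites for a 3-bp interface
    print(sites[:2])   # [('AAA', 'TTT'), ('AAC', 'GTT')]
```

The number of models grows as 4^n with the site length n, so long sites quickly become expensive to scan exhaustively.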
Download files
See the explanation document for a description of these files.
- intf_model.exe: Compiled Linux executable. For the source code, contact Trevor Siggers: tws11 <at> columbia.edu
- pdb_to_Amber.pl: Perl script for PDB file format conversion
- comfile: Example command file (COMFILE)
- RESLIST: Example RESLIST file
- AMBER98.top, AMBER98_0.5Phosph.top: Two topology files.
- Xiang_large_TOR.lib, Xiang_medium_XYZ.lib: Torsional and Cartesian sidechain rotamer library files.
- pol, nopol: Rotatable/polar hydrogen description files.
Tutorial
To run the program, one needs a topology file (*.top) and a command file (COMFILE); both are described below.
Topology File: The program requires a topology file that describes the topology and force field of the biomolecules being modeled (e.g. protein and DNA). These files are analogous to the *.top and *.crg files used by CHARMM. Two topology files are provided here for the AMBER98 force field: AMBER98.top is the standard AMBER98 force field (with the exception of improper dihedral terms), and AMBER98_0.5Phosph.top has the charges of the DNA phosphate groups scaled as described in Siggers & Honig (2007). The program locates the topology file through the environment variable TROLLTOP. It is easiest to set this variable in your .tcshrc (or equivalent) file:
setenv TROLLTOP /foo/AMBER98.top
Command File (COMFILE): The program is run using a command file (COMFILE) that specifies all the input parameters and input files. The COMFILE is a plain text file listing the arguments, one per line. The arguments needed in the COMFILE, with short explanations, are listed below; running the program with the option -help (> intf_model.exe -help) will also list the options with a short description of each. Comment lines are indicated by a leading '//'. For several of the arguments, extra details on file formats are provided below.
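A small Python sketch for inspecting a COMFILE, for example before launching many runs. It assumes that each argument line has the form "-flag value"; that layout is inferred from the argument descriptions on this page, so compare it with the example comfile. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_comfile(text: str) -> List[Tuple[str, str]]:
    """Split COMFILE text into (flag, value) pairs.

    Assumptions: one argument per line written as '-flag value', lines
    starting with '//' are comments, blank lines are ignored, and a
    bare flag gets an empty value.
    """
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank and comment lines
        parts = line.split(maxsplit=1)  # flag, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        args.append((parts[0], value))
    return args


def read_comfile(path: str) -> List[Tuple[str, str]]:
    """Read and parse a COMFILE from disk."""
    with open(path) as handle:
        return parse_comfile(handle.read())
```

The parsed pairs can then be compared with the options printed by intf_model.exe -help.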
COMFILE arguments/syntax:
Additional parameter/file information:
-i PDB.file: This PDB file needs to be formatted to agree with the syntax in the topology file. A Perl script (pdb_to_Amber.pl) is included here to convert a standard PDB file so that it agrees with the two AMBER98 topology files provided. The script can be run as shown below. Parameters for metal ions are currently not included, so PDB atom lines for metal ions, such as the Zn atoms in zinc-finger proteins, need to be removed manually before running the script.
> perl pdb_to_Amber.pl -i FOO.pdb > FOO_converted.pdb
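The manual removal of metal ion lines can also be scripted before pdb_to_Amber.pl is run. The following is a minimal Python sketch that assumes the standard fixed-column PDB layout. The set of metal residue names is an illustrative assumption, not a list taken from the software.

```python
# Residue names treated as metal ions (illustrative assumption; extend as needed).
METAL_RESNAMES = frozenset({"ZN", "MG", "CA", "MN", "FE", "NI", "CO", "CU", "CD"})


def strip_metals(pdb_lines, metals=METAL_RESNAMES):
    """Return the PDB lines with ATOM/HETATM records of metal ions removed.

    Uses the fixed-column PDB layout: record name in columns 1-6 and
    residue name in columns 18-20 (Python slice [17:20]). The alpha-carbon
    atom name 'CA' sits in columns 13-16, so it is not affected.
    """
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        resname = line[17:20].strip()
        if record in ("ATOM", "HETATM") and resname in metals:
            continue  # drop the metal ion atom line
        kept.append(line)
    return kept
```

To use it, read FOO.pdb with readlines(), write the returned lines to a new file, and pass that file to pdb_to_Amber.pl. Only ATOM/HETATM records are filtered; any CONECT records that refer to the removed atoms are left in place.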
-res RESLIST: The residue list (RESLIST) file describes which sidechains and nucleotides will be re-modeled. The syntax of the file is as follows:
LINE 1: subset description of the residues to model
LINES 2-N: identities of the residues selected in LINE 1
LINES N+1-M: constraint lines that restrict rotamer sampling
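When many DNA sequences are modeled, one RESLIST is needed per sequence, and these files can be generated programmatically. The Python sketch below follows the layout of the example RESLIST file on this page: a selection line, then identity lines, then constraint lines. The exact spacing is an assumption, in particular the separator between paired nucleotides, so check it against the distributed RESLIST file. The function names are hypothetical.

```python
# Three-letter nucleotide names as used in the example RESLIST file.
BASE_NAMES = {"A": "ADE", "C": "CYT", "G": "GUA", "T": "THY"}
PARTNER = {"A": "T", "C": "G", "G": "C", "T": "A"}


def dna_lines(seq, chain1, start1, chain2, end2, sep="    "):
    """One line per base pair: base i of `seq` on chain1 (numbered up from
    start1) next to its Watson-Crick partner on chain2 (numbered down
    from end2). The separator `sep` is an assumed layout."""
    lines = []
    for i, base in enumerate(seq):
        left = f"{BASE_NAMES[base]} {chain1} {start1 + i}"
        right = f"{BASE_NAMES[PARTNER[base]]} {chain2} {end2 - i}"
        lines.append(left + sep + right)
    return lines


def build_reslist(selection, protein_lines, seq, chain1, start1, chain2, end2,
                  constraints=()):
    """Assemble LINE 1 (selection), identity lines, then constraint lines."""
    body = [selection, *protein_lines,
            *dna_lines(seq, chain1, start1, chain2, end2), *constraints]
    return "\n".join(body) + "\n"


if __name__ == "__main__":
    print(build_reslist(
        "(chain A and range 10-12) or (chain B and range 1-3) or (chain C and range 5-7)",
        ["ASP A10", "LEU A11", "TRP A12"],
        "GTT", "B", 1, "C", 7,
        ["CON 1.0 :chain B or chain C"]))
```

Calling build_reslist once per candidate sequence yields one RESLIST per modeled complex.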
Example RESLIST file:
(chain A and range 10-12) or (chain B and range 1-3) or (chain C and range 5-7)
ASP A10
LEU A11
TRP A12
CYS A13
GUA B 1    CYT C 7
THY B 2    ADE C 6
THY B 3    ADE C 5
CON 1.0 :chain B or chain C
Line 1 indicates that residues 10-12 of chain A, residues 1-3 of chain B and residues 5-7 of chain C should all be modeled. Residues do not need to be contiguous; for example, to select residues 1 and 3 from chain A one would write: chain A and (range 1 or range 3). The following lines (2-8) give the residue identities: protein sidechains are written one per line, while each nucleotide is paired with its base-pairing partner on the same line. Only residues selected in line 1 will be modeled; therefore CYS A13 (line 5) will not be modeled. The constraint line (line 9) indicates that for chains B and C, only rotamers with an RMSD ≤ 1.0 Å from the crystal structure (the input PDB) will be allowed. Such a constraint only makes sense for sidechains whose identity does not change and for nucleotide rotamers. For nucleotides, the RMSD is calculated using the sugar heavy atoms and the N1 (pyrimidines) or N9 (purines) atom.
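The CON check for a nucleotide can be illustrated with a short RMSD calculation. This Python sketch makes two assumptions. First, it computes the RMSD directly in the crystal-structure frame, without superposition, which is consistent with rotamers being placed on a fixed backbone. Second, its default sugar atom names (the deoxyribose ring atoms) are a hypothetical choice. The exact atom set used by intf_model.exe may differ.

```python
import math

# Glycosidic nitrogen included in the RMSD: N9 for purines, N1 for pyrimidines.
GLYCOSIDIC_N = {"ADE": "N9", "GUA": "N9", "CYT": "N1", "THY": "N1"}

# Hypothetical default sugar atom set (deoxyribose ring atoms).
SUGAR_ATOMS = ("C1'", "C2'", "C3'", "C4'", "O4'")


def rmsd(model, ref, names):
    """RMSD over the named atoms, without superposition.

    `model` and `ref` map atom names to (x, y, z) tuples in the same frame.
    """
    total = 0.0
    for name in names:
        total += sum((m - r) ** 2 for m, r in zip(model[name], ref[name]))
    return math.sqrt(total / len(names))


def passes_constraint(model, ref, resname, cutoff=1.0, sugar_atoms=SUGAR_ATOMS):
    """True if a nucleotide rotamer lies within `cutoff` (Å) of the reference,
    mirroring a 'CON 1.0' line."""
    names = list(sugar_atoms) + [GLYCOSIDIC_N[resname]]
    return rmsd(model, ref, names) <= cutoff
```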
-lib SC_LIB: A copy of the large torsional rotamer library derived from the Cartesian library of Xiang & Honig (2001) J. Mol. Biol. 311:421 is provided, as well as a smaller Cartesian (XYZ) version of the Xiang & Honig library. Rotamer libraries must follow the syntax of these files, and the library type (TOR or XYZ) must be specified with the -prot_lib_type argument.
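A torsional (TOR) library stores sidechain chi angles, whereas a Cartesian (XYZ) library stores atom positions. The two representations are related through dihedral angles; for example, chi1 of leucine is the N-CA-CB-CG dihedral. The Python sketch below computes a signed dihedral angle using the IUPAC sign convention. It is a generic illustration of that relationship and is not taken from this software.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying dihedral to the N, CA, CB and CG coordinates of a Cartesian rotamer gives its chi1 value in torsional form.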
-pol pol: This file contains one line that indicates which hydrogen atoms should be treated as rotatable. Rotatable hydrogens (e.g. CYS HG1) are rotationally sampled during modeling: when selecting the lowest-energy rotamer, the CA-CB-SG-HG1 dihedral angle is sampled at 15-degree increments for each CYS rotamer. The rotatable hydrogens are normally those of CYS, THR, SER and TYR. Two files are included here. The file pol indicates that the CYS, THR, SER and TYR hydrogens (with the correct atom names) should be treated as rotatable. The file nopol is a dummy file indicating that no hydrogens should be treated as rotatable.
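The 15-degree sampling of a rotatable hydrogen can be sketched with the standard NeRF (natural extension reference frame) construction. NeRF places an atom from its bond length, bond angle and dihedral relative to three preceding atoms. This Python sketch is illustrative only. The S-H bond length (1.336 Å) and C-S-H angle (96°) are approximate AMBER-style values assumed here, and the program's internal procedure may differ.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF: place atom d so that |cd| = bond, angle b-c-d = angle_deg
    and dihedral a-b-c-d = torsion_deg."""
    theta = math.radians(angle_deg)
    chi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal to the a-b-c plane
    m = _cross(n, bc)                  # completes the local frame
    dx = -bond * math.cos(theta)
    dy = bond * math.sin(theta) * math.cos(chi)
    dz = bond * math.sin(theta) * math.sin(chi)
    return tuple(c[i] + dx * bc[i] + dy * m[i] + dz * n[i] for i in range(3))


def sample_rotatable_h(a, b, c, bond=1.336, angle=96.0, step=15):
    """Candidate hydrogen positions for a-b-c-H torsions 0, step, ..., 360-step
    (e.g. CA-CB-SG-H for CYS). Defaults are assumed S-H values."""
    return [(t, place_atom(a, b, c, bond, angle, t)) for t in range(0, 360, step)]
```

With a 15-degree step this gives 24 candidate positions per rotamer; each candidate would then be scored and the lowest-energy one kept.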
Running the program:
> intf_model.exe -com comfile
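To model many DNA sequences, the program can be run once per COMFILE from a script. The following is a hedged Python sketch using subprocess. The executable name and topology path are placeholders; use a full path (e.g. ./intf_model.exe) if the program is not on your PATH. As an alternative to setting TROLLTOP in .tcshrc, the sketch sets it for each child process.

```python
import os
import subprocess

# Placeholder locations (assumptions); adjust to your installation.
EXE = "intf_model.exe"
TOPOLOGY = "/foo/AMBER98.top"


def build_command(comfile, exe=EXE):
    """Command line for one modeling run; only the -com option is used."""
    return [exe, "-com", comfile]


def model_env(topology=TOPOLOGY):
    """Copy of the current environment with TROLLTOP pointing at the topology file."""
    return dict(os.environ, TROLLTOP=topology)


def run_model(comfile, topology=TOPOLOGY, exe=EXE):
    """Run intf_model.exe on one COMFILE; raises CalledProcessError on failure."""
    return subprocess.run(build_command(comfile, exe), env=model_env(topology), check=True)
```

For example, looping over a list of COMFILE names (one per DNA sequence) and calling run_model on each produces one modeled complex per sequence.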
References
Siggers TW, Honig B. Structure-based prediction of C2H2 zinc-finger binding specificity: sensitivity to docking geometry. Nucleic Acids Res. 2007;35(4):1085-97.
Questions
If you have questions about the Protein-DNA Interface Modeling Software, contact honig_software@c2b2.columbia.edu.