CYANA Macro: noeassign: Difference between revisions

From CYANA Wiki

Revision as of 08:56, 8 August 2009

Parameters

peaks=string
(required)
format=string
(default: none)
prot=string
(required)
cycles=integer range
(default: 1..7)
combination=integer range
(default: 1..2)
keep=string
(default: none)
confine=real
(default: 1.0E10)
calculation=string
(default: structcalc)
autoaco
shiftassign
multiple
stereoexpand
details

Description

Performs a complete structure calculation with several cycles of automated NOE assignment (Herrmann et al., 2002) according to the following parameters:

peaks
Names of the input NOESY peak lists. Multiple peak list names must be separated by commas without blanks before or after a comma. Peak list names with the extension .xpk refer to NMRView peak lists, otherwise XEASY format peak lists with default extension .peaks are assumed.
format
Order of the dimensions in each of the peak lists given by the parameter peaks. Multiple format declarations, given in the same order as the corresponding peak list names in the peaks parameter, must be separated by commas without blanks before or after a comma. The order of dimensions of a peak list is given by a string with one character for each dimension. H and h denote 1H dimensions, C or N stand for the dimension of the 13C or 15N nucleus that is covalently bound to the proton of the H dimension in 3D and 4D NOESY spectra, and c or n stand for the dimension of the 13C or 15N nucleus that is covalently bound to the proton of the h dimension in 4D NOESY spectra. For instance, the format string ChH for a 3D NOESY spectrum indicates that the three dimensions in the peak list correspond, in this order, to 13C, the “free” 1H, and the 1H that is bound to the 13C of the first dimension. The format parameter is optional if the order of the dimensions in the peak lists can be determined otherwise, i.e. if either all peak lists are 2D, or if format declarations are included in the peak lists with “#CYANAFORMAT” entries, or if the program can unambiguously deduce the correct ordering of the dimensions from assigned peaks in the peak lists. In general, it is most convenient to declare the order of the dimensions in the peak list by a “#CYANAFORMAT” statement.
prot
Names of the input chemical shift lists. Multiple chemical shift list names must be separated by commas without blanks before or after a comma. The default extension for chemical shift list names is .prot. Each chemical shift list is used for the corresponding peak list given in the peaks parameter. If fewer chemical shift lists are given than there are peak lists, then the last chemical shift list is used for all remaining peak lists. In particular, it is possible, and recommended, to use a single chemical shift list for all peak lists.
cycles
Cycles of automated NOESY assignment and structure calculation that are performed, given as an integer range. By default, seven cycles numbered 1–7 are performed, followed by a final structure calculation using the NOE assignments of the last cycle. Cycles are skipped if their output structure (cyclen.pdb), or the output structure of a later cycle, already exists. This makes it possible to continue an interrupted noeassign calculation automatically with the next unfinished cycle. To repeat a complete calculation, it is necessary to remove the output files of the previous run, for instance with the cyanaclean command.
combination
Cycles of automated NOESY assignment in which constraint combination is applied, given as an integer range. Since the largest numbers of incorrect distance restraints occur in the first cycles, and because constraint combination entails a (temporary) loss of structural data, constraint combination is typically applied in the first two cycles only.
keep
Name of a CYANA macro or command that selects those assigned peaks whose assignment should be kept unchanged during automated NOE assignment. By default, i.e. if this parameter is absent, the program will discard any preexisting NOE assignments in the input peak lists, and will try to assign all peaks. For example, defining a command
	  subroutine KEEP
	    peaks select "*, *"
	  end
and calling the noeassign command with the additional parameter
	  noeassign ... keep=KEEP
keeps all peak assignments in the input peak lists fixed and lets the program search for new NOE assignments only for previously unassigned peaks.
confine
Maximal effective distance restraint violation contributing to the target function with violation confinement in cycles 1–2. See variable viocap for details. The confine value is assigned to the variable viocap in cycles 1–2. In subsequent cycles violation confinement is not applied.
calculation
CYANA command used to execute the structure calculation in each cycle. The standard macro structcalc is used by default. A user-defined alternative command must understand the same parameters as structcalc.
autoaco
Option to specify the use of temporary torsion angle restraints that favor the allowed regions of the Ramachandran plot (Laskowski et al., 1996) and the staggered rotamer positions for torsion angles between atoms with four covalent bonds, e.g. tetrahedral carbons. The temporary torsion angle restraints are generated with the commands ramaaco and rotameraco in the intermediate cycles, and with the commands ramaaco minimal and rotameraco for the final structure calculation. Temporary torsion angle restraints are applied in the initial and intermediate stages but not in the final stage of the standard simulated annealing schedule with the command anneal.
shiftassign
Option to enable the assignment of missing chemical shifts during automated NOE assignment. In this case a new chemical shift list, cyclen.prot, is produced and used in each cycle. The reliability of new chemical shift assignments made in this way should be evaluated critically by the user.
multiple
Option to allow for ambiguous distance restraints also in the final structure calculation. By default, only unambiguous distance restraints are used for the final structure calculation. Ambiguous distance restraints from the last intermediate cycle are split into multiple unambiguous distance restraints, or discarded.
stereoexpand
Option to use the command distance stereoexpand instead of the standard command distance modify to account for the absence of stereospecific assignments.
details
Option to produce, in addition to the normal output, assigned peak lists from every cycle, not only from the last one. In addition, XEASY assignment (.assign) files are produced if XEASY format peak lists are used.
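The conventions for the peaks, prot and format parameters can be illustrated outside CYANA. The following Python sketch is not CYANA code; the function names and the file names in the usage note are hypothetical, and the code only mirrors the rules stated above: names separated by commas without blanks, reuse of the last chemical shift list, and one character per dimension in a format string.

```python
# Illustrative sketch only; not CYANA code. All names are hypothetical.

DIMENSION_MEANING = {
    "H": "1H",
    "h": "1H",
    "C": "13C bound to H",
    "N": "15N bound to H",
    "c": "13C bound to h",
    "n": "15N bound to h",
}


def split_names(value):
    """Split a comma-separated parameter value; blanks are not allowed."""
    if " " in value:
        raise ValueError("names must be separated by commas without blanks")
    return value.split(",")


def pair_shift_lists(peaks, prot):
    """Pair each peak list with the chemical shift list used for it.

    If fewer shift lists than peak lists are given, the last shift list
    is reused for all remaining peak lists.
    """
    peak_names = split_names(peaks)
    prot_names = split_names(prot)
    missing = len(peak_names) - len(prot_names)
    padded = prot_names + [prot_names[-1]] * max(missing, 0)
    # Surplus shift lists are simply ignored here (an assumption of this sketch).
    return list(zip(peak_names, padded))


def describe_format(fmt):
    """Translate a format string such as 'ChH' into one label per dimension."""
    return [DIMENSION_MEANING[ch] for ch in fmt]
```

For example, pair_shift_lists("c13.peaks,n15.peaks", "demo.prot") assigns demo.prot to both peak lists, and describe_format("ChH") labels the three dimensions as 13C bound to H, 1H, and 1H.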

Algorithm overview

The algorithm for automated NOE assignment is a re-implementation of the former CANDID procedure (Herrmann et al., 2002) on the basis of a probabilistic treatment of the NOE assignment process. The key features of the algorithm are network anchoring to reduce the initial ambiguity of NOESY peak assignments, ambiguous distance restraints to generate conformational restraints from NOESY cross peaks with multiple possible assignments, and constraint combination to minimize the impact of erroneous distance restraints on the structure. Automated NOE assignment and the structure calculation are combined in an iterative process that comprises, typically, seven cycles of automated NOE assignment and structure calculation, followed by a final structure calculation using only unambiguously assigned distance restraints. Between subsequent cycles, information is transferred exclusively through the intermediary 3D structures. The molecular structure obtained in a given cycle is used to guide the NOE assignments in the following cycle. Otherwise, the same input data are used for all cycles, that is, the amino acid sequence of the protein, one or several chemical shift lists from the sequence-specific resonance assignment, and one or several lists containing the positions and volumes of cross peaks in 2D, 3D or 4D NOESY spectra. The input may further include previously assigned NOE upper distance bounds or other previously assigned conformational restraints for the structure calculation.

In each cycle, first all assignment possibilities of a peak are generated on the basis of the chemical shift values that match the peak position within given tolerance values, and the quality of the fit is expressed by a Gaussian probability, Pshifts. Second, the probability Pstructure for agreement with the preliminary structure from the preceding cycle, represented by a bundle of conformers, is computed as the fraction of the conformers in which the corresponding distance is shorter than the upper distance bound plus the acceptable distance restraint violation cutoff. Assignment possibilities for which the product of these two probabilities is below the required probability threshold are discarded. Third, each remaining assignment possibility is evaluated for its network anchoring, i.e., its embedding in the network formed by the assignment possibilities of all the other peaks and the covalently constrained short-range distances. The network anchoring probability Pnetwork that the distance corresponding to an assignment possibility is shorter than the upper distance bound plus the acceptable violation is computed given the assignments of the other peaks but independently of knowledge of the three-dimensional structure. Contributions to the network anchoring probability for a given “current” assignment possibility result from other peaks with the same assignment, from pairs of peaks that indirectly connect the two atoms of the current assignment possibility via a third atom, and from peaks that connect an atom in the vicinity of the first atom of the current assignment with an atom in the vicinity of the second atom of the current assignment. Short-range distances that are constrained by the covalent geometry can, for network anchoring, take the same role as an unambiguously assigned NOE.
Individual contributions to the network anchoring of the current assignment possibility are expressed as probabilities, P1, P2, …, that the distance corresponding to the current assignment possibility satisfies the upper distance bound. The network anchoring probability is obtained from the individual probabilities as Pnetwork = 1 − (1 − P1)·(1 − P2)·…, which is never smaller than the highest probability of an individual network anchoring contribution. Only assignment possibilities for which the product of the three probabilities is above a threshold, Ptot = Pshifts·Pnetwork·Pstructure ≥ Pmin, are accepted. Cross peaks with a single accepted assignment yield a conventional unambiguous distance restraint. Otherwise, an ambiguous distance restraint is generated that embodies the multiple accepted assignments.
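The way the three probabilities are combined can be made concrete with a small Python sketch. It is not the CYANA implementation: the function names are hypothetical, the Gaussian Pshifts is taken as a given number because its width is not specified here, and treating a product exactly equal to Pmin as accepted is an assumption.

```python
# Illustrative sketch only; not the CYANA implementation.
from math import prod


def p_structure(distances, upper_bound, violation_cutoff):
    """Fraction of conformers in which the distance is shorter than
    the upper bound plus the acceptable violation cutoff."""
    limit = upper_bound + violation_cutoff
    return sum(d < limit for d in distances) / len(distances)


def p_network(contributions):
    """Combine individual contributions P1, P2, ... as
    1 - (1 - P1)(1 - P2)..."""
    return 1.0 - prod(1.0 - p for p in contributions)


def is_accepted(p_shifts, p_net, p_struct, p_min):
    """Accept an assignment possibility if the product of the three
    probabilities reaches the threshold (boundary case: assumption)."""
    return p_shifts * p_net * p_struct >= p_min
```

With contributions of 0.5 and 0.4, p_network gives approximately 0.7, which is larger than either individual contribution, as stated above. A bundle with distances of 4.8, 5.2, 6.5 and 5.0 Å, an upper bound of 5.0 Å and a cutoff of 0.5 Å yields a p_structure of 0.75.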

Spurious distance restraints may arise from the misinterpretation of noise and spectral artifacts, in particular at the outset of a structure determination before 3D structure-based filtering of the restraint assignments can be applied. CYANA uses “constraint combination” (Herrmann et al., 2002) to reduce structural distortions from erroneous distance restraints. Medium-range and long-range distance restraints are incorporated into “combined distance restraints”, which are ambiguous distance restraints with combined assignments from different, in general unrelated, cross peaks. The basic property of ambiguous distance restraints that the restraint will be fulfilled by the correct structure whenever at least one of its assignments is correct, regardless of the presence of additional, erroneous assignments, then implies that such combined restraints have a lower probability of being erroneous than the corresponding original restraints, provided that the fraction of erroneous original restraints is smaller than 50%. Constraint combination aims at minimizing the impact of such imperfections on the resulting structure at the expense of a temporary loss of information, and is applied to medium- and long-range distance restraints in the first two cycles of combined automated NOE assignment and structure calculation with CYANA.
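The benefit of combining restraints can be seen from a short calculation. The Python sketch below rests on a simplifying assumption that is not stated in the text, namely that errors in the original restraints are independent; the function name is hypothetical. A combined restraint is then erroneous only if every one of its original restraints is erroneous.

```python
# Illustrative arithmetic only; assumes independent errors in the
# original restraints, which is a simplification.

def combined_error_probability(error_rates):
    """Probability that all original restraints merged into one
    combined restraint are erroneous at the same time."""
    result = 1.0
    for p in error_rates:
        result *= p
    return result
```

Under this assumption, combining two restraints that are each erroneous with probability 0.1 gives a combined restraint that is erroneous with a probability of about 0.01.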

The distance restraints are then included in the input for the structure calculation with simulated annealing by the fast CYANA torsion angle dynamics algorithm (Güntert et al., 1997). The structure calculations typically comprise seven cycles. The second and subsequent cycles differ from the first cycle by the use of additional selection criteria for cross peaks and NOE assignments that are based on assessments relative to the protein 3D structure from the preceding cycle. The precision of the structure determination normally improves with each subsequent cycle. Accordingly, the cutoff for acceptable distance restraint violations in the calculation of Pstructure is tightened from cycle to cycle. In the final cycle, an additional filtering step ensures that all NOEs either have unique assignments to a single pair of hydrogen atoms or are eliminated from the input for the structure calculation. This facilitates the use of subsequent refinement and analysis programs that cannot handle ambiguous distance restraints.

Input data

Required input data consist of one or several NOESY peak lists in XEASY or NMRView format, and one or several corresponding chemical shift lists. In addition, the noeassign command needs an init.cya macro in the current working directory that reads the residue library and the sequence. Optional input files include additional conformational restraints stored in files with their respective default file name extension and specified by the variable restraints as a comma-separated list of their file names without intervening blanks. It is possible to provide an input structure file called cyclen.pdb for cycle n. In this case the calculation will be started with cycle n+1 by making NOESY cross peak assignments based on the structure read from the file cyclen.pdb. The file may contain a bundle of conformers or a single structure. It is also possible to provide an input upper distance limit file called cycle0.upl containing the list of short, covalently constrained distances which is used in the network anchoring of NOE assignments. All distance restraints in this file must have their relative weight (in column 6 of the file) set to zero. By default, the cycle0.upl file is automatically produced by the noeassign command.
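The rule for continuing from an existing cyclen.pdb file can be sketched as follows. This Python fragment is only an illustration of the stated rule, not CYANA code; the function name is hypothetical, and the default range 1 to 7 corresponds to the default of the cycles parameter.

```python
# Illustrative sketch only; not CYANA code. Names are hypothetical.
import re


def cycles_to_run(existing_files, first=1, last=7):
    """Return the cycle numbers that still have to be calculated.

    A cycle is skipped if its output structure cycle<n>.pdb, or the
    output structure of a later cycle, is already present.
    """
    latest = first - 1
    for name in existing_files:
        match = re.fullmatch(r"cycle(\d+)\.pdb", name)
        if match:
            latest = max(latest, int(match.group(1)))
    start = max(first, latest + 1)
    return list(range(start, last + 1))
```

With cycle1.pdb and cycle3.pdb present, the sketch returns cycles 4 to 7; with no structure files present, it returns all cycles 1 to 7.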

Output data

Output data for each cycle n comprise the files

cyclen.noa
Assignment details about every NOESY peak that is
cyclen.upl
NOE upper distance bounds obtained in cycle n
cyclen.pdb
Structure bundle obtained in cycle n
cyclen.ovw
Overview table for the structure calculation in cycle n

Two additional files are written in the last cycle, say cycle 7:

A-cycle7.peaks
Assigned peak lists, where A is the name of the input peak list. A separate file is written for each input peak list. These output peak lists normally contain peaks with ambiguous assignments and can be read by CYANA but in general not by XEASY.
A-cycle7-ref.peaks
Copy of the input peak list A in which the XEASY color code of a peak is set to 1, if the assignment made by the program is compatible with the assignment present in the input peak list, 2, if the two assignments are incompatible, 3, if the peak is assigned by the program but was not assigned in the input peak list, or 4, if the peak is unassigned. Apart from the color code, the input peak list is unchanged. The input assignment of a peak, if present, is only used for comparison, not for assigning the peak in the automated procedure. The A-cycle7-ref.peaks files are readable by XEASY and can be used to visualize the results of the automated NOE assignment in the spectra.
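The four color codes can be summarized as a small decision function. The Python sketch below is an interpretation, not CYANA code: the function name is hypothetical, "compatible" is modelled as the input assignment being among the assignments accepted by the program, "unassigned" is read as unassigned by the program, and the order in which the conditions are tested is an assumption.

```python
# Illustrative sketch only; the meaning of "compatible" and the order
# of the tests are assumptions.

def color_code(program_assignments, input_assignment):
    """Return the XEASY color code for one peak.

    program_assignments: set of atom pairs accepted by the program
                         (empty if the peak was left unassigned)
    input_assignment:    atom pair from the input peak list, or None
    """
    if not program_assignments:
        return 4  # peak not assigned by the program
    if input_assignment is None:
        return 3  # assigned by the program, unassigned in the input
    if input_assignment in program_assignments:
        return 1  # program and input assignments are compatible
    return 2      # program and input assignments are incompatible
```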

These output peak lists can be saved for all cycles if the option details is set. In this case also XEASY assignment files named A-cyclen.assign are written, if the input peak lists were in XEASY format, and the NOE distance restraints before applying constraint combination are saved as cyclen-uncombined.upl. If the autoaco option is set, then the temporary torsion angle restraints to favor allowed regions of the Ramachandran plot and staggered rotamer positions are saved in the file cycle.aco, which is used in all cycles. The final structure calculation produces the following files:

final.upl
Final NOE upper distance bounds
final.aco
Temporary torsion angle restraints used in the final structure calculation (only if the autoaco option is set)
final.pdb
Final structure bundle
final.ovw
Overview table for the final structure calculation
finalstereo.cya
Stereospecific assignments determined on the basis of the NOE distance restraints
A-final.prot
Copy of the input chemical shift list A in which the chemical shifts of stereospecifically assigned diastereotopic partners are swapped, if necessary
rama.ps
Ramachandran plot for the final structure

The cyanatable command can be used to generate a summary table of a complete structure calculation with automated NOESY assignment by the noeassign command.

Consistency checks

Before starting NOE assignment, the noeassign command checks the input peak lists and chemical shift lists for possible inconsistencies with the peakcheck command, and reports possible problems on standard output. First, the completeness of the 1H chemical shift assignments is evaluated. Missing 1H assignments of backbone amide protons and aliphatic protons and the percentage of assignment completeness for these nuclei are reported. A high degree of completeness is crucial for successful automated NOE assignment with the noeassign command (Jee & Güntert, 2003). If the input peak lists contain assigned peaks, then their positions are compared with the chemical shifts of the atoms to which they are assigned, and deviations that exceed the defined chemical shift tolerances are indicated. Similarly, deviations of the chemical shift values for the same atom in different chemical shift lists are reported, as well as chemical shifts that deviate by more than 4 standard deviations from their average value in the residue library. The consistency of the cis/trans proline declarations in the sequence file with the 13Cβ/13Cγ chemical shift values in the (first) chemical shift list is checked with the command cisprocheck (Schubert et al., 2002).
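Two of these checks, the assignment completeness and the test for unusual chemical shifts, are easy to express in code. The Python sketch below only illustrates the criteria described above and is not the peakcheck implementation; the function names and the data layout (dictionaries keyed by atom name) are hypothetical.

```python
# Illustrative sketch only; not the peakcheck implementation.

def completeness(expected_atoms, assigned_atoms):
    """Percentage of the expected atoms that have a chemical shift."""
    expected = set(expected_atoms)
    return 100.0 * len(expected & set(assigned_atoms)) / len(expected)


def unusual_shifts(shifts, library, n_sigma=4.0):
    """Atoms whose shift deviates by more than n_sigma standard
    deviations from the library average.

    shifts:  {atom: observed shift}
    library: {atom: (average, standard deviation)}
    """
    return sorted(
        atom
        for atom, value in shifts.items()
        if atom in library
        and abs(value - library[atom][0]) > n_sigma * library[atom][1]
    )
```

The default of n_sigma=4.0 matches the 4 standard deviations mentioned above; the library values themselves would come from the residue library and are not reproduced here.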

Covalently constrained short distances

Next, the list of short, covalently constrained distances is prepared, unless it is provided explicitly by the user in the file cycle0.upl. To this end, a quick “cycle 0” structure calculation of 100 conformers is performed without any NOE distance restraints using variable target function minimization by the command vtfmin. The 20 conformers with the lowest target function values are searched for short 1H–1H distances with the command distances short. Of these, all intraresidual distances and all sequential distances shorter than 5 Å are retained and saved in the file cycle0.upl for later use in network anchoring.

Cycles

Typically seven cycles of automated NOE assignment followed by structure calculation are performed. Cross peaks are read and calibrated by the calibration command. The calibration constant for each peak list is determined automatically by default, or can be specified explicitly by setting the variable calibration_constant to a comma-separated list of the calibration constants, given in the same order as the peak lists in the peaks parameter of the noeassign command. The structure from the preceding cycle is read, except in the first cycle, cycle 1, which is performed without an initial structure. The assign command is used to assign all NOESY peaks, except those that are to be kept as in the input peak lists according to the keep parameter. Parameters of the assign command depend on the cycle number: The acceptable violation of the distance bound that corresponds to an NOE assignment is decreased from 1.5 Å in cycle 2 to 0.9 Å in cycle 3, 0.6 Å in cycle 4, 0.3 Å in cycle 5, and 0.1 Å in cycles 6 and 7. The potential violations are computed with respect to the structure from the preceding cycle. The threshold probability for acceptable assignments is set to Pmin=0.1 in cycle 1, and Pmin=0.2 in all other cycles. In cycle 1, a cutoff of 0.45 is imposed on the overall quality of a peak assignment (see assign command). Upwards elasticity of the upper distance bounds of up to 25% is allowed from cycle 3 onwards. Constraint combination is applied in cycles 1 and 2 to all distance restraints spanning at least two residues. Stereospecific assignments are optimally swapped in all cycles, i.e. the variable swap is set to 1.
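The cycle-dependent settings listed above can be collected in a lookup table. The Python sketch below simply restates these numbers for the default seven cycles; it is not CYANA code, the dictionary keys are hypothetical names, and None marks settings for which the text gives no value in that cycle.

```python
# Illustrative restatement of the cycle-dependent settings; not CYANA code.

# Acceptable distance bound violation in Å, relative to the structure
# from the preceding cycle (no structure is available in cycle 1).
VIOLATION_CUTOFF = {2: 1.5, 3: 0.9, 4: 0.6, 5: 0.3, 6: 0.1, 7: 0.1}


def cycle_settings(cycle):
    """Return the settings described in the text for cycles 1 to 7."""
    return {
        "violation_cutoff": VIOLATION_CUTOFF.get(cycle),
        "p_min": 0.1 if cycle == 1 else 0.2,
        "quality_cutoff": 0.45 if cycle == 1 else None,
        "elasticity": 0.25 if cycle >= 3 else 0.0,
        "constraint_combination": cycle in (1, 2),
    }
```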

Final structure calculation

The final structure calculation uses the NOE distance restraints from the last NOE assignment cycle (normally, cycle 7), which are converted to unambiguous distance restraints by the command distance split, and interpreted without automatic, on-the-fly swapping of stereospecific assignments, both in order to make them usable in refinement and validation programs that can handle conventional distance restraints only. Stereospecific assignments that are consistent over all 20 conformers from the last cycle are fixed. Distance restraints that involve diastereotopic pairs without stereospecific assignment are modified to account for the absence of the stereospecific assignment by the command distances modify or, if the stereoexpand option is set, by the command distances stereoexpand.