CYANA Command: assign
Parameters
- alignfactor=real (default: 0.5)
- matchfactor=real (default: 0.5)
- violation=real (default: -1.0)
- probability=real (default: 0.2)
- quality=real (default: 0.5)
- elasticity=real range (default: 1.0..1.0)
- confidence=real (default: 1.0)
- supportweight=real (default: 1.0)
- prefer=integer (default: 999999)
- interrange=integer range (default: 0..)
- unassigned=real (default: 0.1)
- short
- changevol
Description
The assign command performs automated assignment of NOESY cross peaks on the basis of the given chemical shifts, knowledge of covalently constrained short distances, and the selected 3D conformers, if available. It is used in the noeassign macro to implement a combined automated NOESY assignment and structure calculation strategy (Güntert, 2003; Güntert, 2004; Herrmann et al., 2002; Jee & Güntert, 2003).
Input data
Required input data consists of unassigned (or assigned) NOESY peaks from one or several peak lists, and one or several chemical shift lists. Optional input data comprises a group of selected conformers and a list of covalently constrained short distances. An upper distance bound must have been attributed to each input peak, for instance using the peaks simplecal command or the calibration macro, which convert peak intensities or volumes into distance bounds.
Output data
Output data comprises assignments made by the assign command for the peaks that were NOT selected in the input peak lists, as well as a report including details on the assignment of each individual peak and a summary table. Peaks that were selected on input are not modified. If peaks are assigned and unselected on input, the report also provides a comparison between the input assignment and the new assignment made by the assign command that overwrites the input assignment.
Assignment strategy
First, all assignment possibilities of a peak are generated on the basis of the chemical shift values that match the peak position within the tolerance defined by the tolerance variable; each possibility carries a probability for the match between the peak position and the chemical shifts. Second, the probability for agreement with the bundle of selected conformers, if present, is computed as the fraction of the conformers in which the corresponding distance is shorter than the upper distance bound plus the acceptable violation. Assignment possibilities for which the product of these two probabilities (chemical shift match and structure agreement) is below the required probability threshold are discarded. Third, each remaining assignment possibility is evaluated for its network anchoring, i.e., its embedding in the network formed by the assignment possibilities of all the other peaks and the covalently constrained distances. The network anchoring probability that the distance corresponding to an assignment is shorter than the upper distance bound plus the acceptable violation is computed given the assignments of the other peaks, but independently of any knowledge of the three-dimensional structure. Only assignment possibilities for which the product of the three probabilities is above the required probability threshold are accepted. Next, the overall quality Q of the assignment of a peak is computed from the probabilities of its individual accepted assignment possibilities. The overall quality of a peak assignment is always at least as large as the highest probability of an accepted assignment possibility. Peaks are kept assigned only if their quality exceeds the quality cutoff.
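The filtering steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the code used by CYANA: the function names are hypothetical, the chemical shift and network anchoring probabilities are taken as given numbers because their formulas are not spelled out here, and a non-negative violation is assumed.

```python
def structure_probability(distances, upper_bound, violation):
    """Fraction of conformers in which the distance is shorter than
    the upper distance bound plus the acceptable violation."""
    if not distances:
        # Assumption: without conformers the structure imposes no filter.
        return 1.0
    limit = upper_bound + violation
    return sum(d < limit for d in distances) / len(distances)


def accept(shift_prob, struct_prob, network_prob, threshold=0.2):
    """Accept a possibility when the product of the three
    probabilities is above the probability threshold."""
    return shift_prob * struct_prob * network_prob > threshold


# Hypothetical distances (in A) of one atom pair in four conformers.
p_struct = structure_probability([2.1, 2.2, 2.3, 3.5], 3.08, 0.0)  # 0.75
print(p_struct, accept(0.99, p_struct, 0.91))
```

The product rule agrees with the percentages in the example report of this page: for the first assignment possibility, 99% (shift match), 100% (structure) and 91% (network anchoring) multiply to about 90%, the overall probability printed in that line.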
Example assignment report for a peak
    Peak 165 from c13.peaks (8.72, 4.11, 59.86 ppm; 3.08 A):
    2 out of 4 assignments used, quality = 0.97:
    * H   ILE 64 + HA  ILE 63  OK   90  99 100  91  2.1-2.3   1260=69, 63/50=24...(10)
      H   ILE 63 + HA  ILE 63  OK   71  71 100 100  2.8-2.8   3.0=100
      H   SER 43 - HA  ILE 63  far   0  95   0   -  6.4-9.0
      H   ALA 22 - HA  ILE 63  far   0  99   0   -  9.9-14.6
    Violated in 0 structures by 0.00 A.
- Line 1: Peak number, peak list, peak position, upper distance bound.
- Line 2: Number of used assignments, number of assignment possibilities, overall quality of the peak assignment (0..1). Quality values below the quality cutoff are marked as "low quality", and the peak remains unassigned.
- Lines 3-6: Individual assignment possibilities, one per line, with the following fields from left to right:
- Flag that indicates the input assignment, if present, by a * if it is among the used assignments, or by a ! otherwise.
- First atom, identified by its name, residue name, and residue number
- Flag: +, used assignment; -, assignment possibility not used
- Second atom, identified by its name, residue name and number
- Decision on assignment possibility:
   OK: good assignment with probability above the probability cutoff
   far: structure-based probability too low
   lone: network-anchoring-based probability too low
   poor: individual probabilities OK but overall probability too low
- Overall probability for the assignment possibility (%)
- Probability for match between peak position and chemical shifts (%)
- Probability for agreement with input structure bundle (%)
- Probability derived from network anchoring (%)
- Minimal and maximal distance in the selected conformers (Angstrom)
- Most important individual contributions to the network anchoring
   based probability, ordered by decreasing size. The number after the
   equal sign is the probability in percent for the contribution
   identified in front of the equal sign, as follows (only the first
   three possibilities appear in the example above):
   real: covalently constrained distance shorter than real A.
   integer: peak number of a (symmetrically related) peak with the
           same assignment
   integer/integer: numbers of two peaks that relate the two atoms
           of the present assignment through a third atom
   integer/real: peak with number integer connects the first atom
           to a third atom whose distance from the second atom is
           covalently restrained to be shorter than real A.
   real/integer: peak with number integer connects the second
           atom to a third atom whose distance from the first atom
           is covalently restrained to be shorter than real A.
   ~integer: the peak with number integer connects two atoms that
           are covalently restrained to be less than x A from the
           first and second atom of the present assignment
           possibility, respectively.
   For reasons of space, only the first few contributions are printed.
   An ellipsis "..." followed by the total number of contributions
   in parenthesis indicates that not all contributions with probability
   greater than 1% are printed.
- Line 7 (last line): Number of conformers in which the upper distance limit of the ambiguous distance restraint formed by the accepted assignments (marked by + in lines 3-6) is violated by more than the violation threshold, and the average size of the violation.
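For post-processing a report, the fields of one assignment possibility line can be split as in the following Python sketch. It is not part of CYANA: the class and function names are hypothetical, and the column layout is assumed to be exactly that of the example report above, including a distance range in every line.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Possibility:
    input_flag: str               # "*", "!" or "" (no input assignment)
    atom1: Tuple[str, str, int]   # atom name, residue name, residue number
    used: bool                    # True for "+", False for "-"
    atom2: Tuple[str, str, int]
    decision: str                 # OK, far, lone or poor
    overall: Optional[int]        # probabilities in percent; None for "-"
    shift: Optional[int]
    structure: Optional[int]
    network: Optional[int]
    dmin: float                   # distance range in the conformers (A)
    dmax: float
    contributions: str            # network anchoring contributions, unparsed


def _percent(token: str) -> Optional[int]:
    """Convert a percentage column; a lone "-" means no value."""
    return None if token == "-" else int(token)


def parse_possibility(line: str) -> Possibility:
    tokens = line.split()
    flag = tokens.pop(0) if tokens[0] in ("*", "!") else ""
    (a1, r1, n1, sign, a2, r2, n2, decision,
     overall, shift, struct, net, drange) = tokens[:13]
    dmin, dmax = drange.split("-")
    return Possibility(
        flag, (a1, r1, int(n1)), sign == "+", (a2, r2, int(n2)), decision,
        _percent(overall), _percent(shift), _percent(struct), _percent(net),
        float(dmin), float(dmax), " ".join(tokens[13:]),
    )


first = parse_possibility(
    "* H ILE 64 + HA ILE 63 OK 90 99 100 91 2.1-2.3 1260=69, 63/50=24...(10)"
)
print(first.used, first.overall, first.contributions)
```

The contribution list is kept as a raw string because its entries take several forms (real, integer, integer/integer, and so on) and may be truncated by an ellipsis.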
Covalently constrained distances:
The covalently constrained short distances are normally taken from distance restraints with weight zero, which can be obtained, for instance, by analyzing a bundle of randomized conformers with the distance short command, as implemented in the noeassign macro. If no distance restraints with weight zero exist, the short distances are calculated internally from the selected conformers (which should be randomized), if available and if violation is negative, or by an analytical calculation otherwise.
Elasticity of upper distance bounds:
When searching for peak assignments the algorithm can adapt individual upper distance bounds in the input peak lists by a factor within the allowed elasticity range. An individual upper bound can be increased if a slight violation of the original upper distance bound can be avoided by the increased distance limit in at least 80% of the conformers. An individual upper bound can be decreased if the actual distances in the input conformers are consistently shorter than the upper distance bound. By default, there is no "elasticity" of the upper distance bounds, i.e. the input distance limits are used without change. If an upper distance is changed, its modified value is indicated in the first line of the report on the assignment of the peak. The additional option changevol can be used to correct peak volumes according to the internal change of the corresponding upper distance bound using an inverse sixth power relationship.
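The inverse sixth power relationship behind changevol can be illustrated with a short Python sketch. It assumes the usual form in which the volume is proportional to the distance to the power of minus six, so that the calibration constant cancels out; the exact expression used internally is not given here, and the function name is hypothetical.

```python
def corrected_volume(volume, old_bound, new_bound):
    """Rescale a peak volume after its upper distance bound changed,
    assuming volume is proportional to bound ** -6."""
    return volume * (old_bound / new_bound) ** 6


# Raising the bound from 3.0 A to 3.3 A lowers the volume accordingly.
print(corrected_volume(1000.0, 3.0, 3.3))
```

With the default elasticity range of 1.0..1.0 the bounds never change, so under this assumption the volumes would be left as they are.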
Additional control parameters:
The probability for the chemical shift matching is calculated using the tolerance values multiplied by matchfactor. A smaller matchfactor implies a higher weight for good agreement between the peak coordinates and the chemical shifts. The mutual alignment of peaks is controlled by the variable tolerance, and the probability for network anchoring is calculated using the tolerance values multiplied by alignfactor. A smaller alignfactor implies a higher weight for good mutual alignment between peaks with assignment possibilities to the same atom(s). When calculating the network anchoring probability of a given peak assignment, the probabilities of other aligned peaks may be scaled by a confidence factor between 0 and 1. Chemical shift assignments with an attached chemical shift error larger than the unassigned cutoff are treated as "unassigned" when determining the initial assignment possibilities of peaks: Only one of the two atoms of an assignment may be "unassigned", and, if in addition the short option is set, only short-range assignments for covalently constrained distances are considered.
Symmetric homodimers:
The assign command provides special features for symmetric homodimers that can be defined with the molecules define command. In the case of a homodimer, only assignments with the first atom in the first monomer are made. The corresponding symmetric distance restraint can be added afterwards with the molecules symmetrize command. Homodimer assignments are restricted to be only intramolecular or only intermolecular for peaks with (XEASY) color codes 8 or 9, respectively. Furthermore, intermolecular homodimer assignments between residues i and j are considered only if |i-j| is within the interrange. Intermolecular assignments of a peak are also excluded if the peak has at least one intramolecular assignment between residues i and j with |i-j| smaller than prefer.
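The two residue-separation rules for intermolecular homodimer assignments can be written as a small Python predicate. This is a sketch with hypothetical names, not the CYANA implementation; it ignores the color code restriction and represents an open-ended interrange such as 0.. by an upper limit of None.

```python
def intermolecular_allowed(i, j, intra_separations,
                           interrange=(0, None), prefer=999999):
    """Decide whether an intermolecular assignment between residues
    i and j of a peak is considered.

    intra_separations holds |i-j| for every intramolecular assignment
    of the same peak.
    """
    low, high = interrange
    separation = abs(i - j)
    # Rule 1: |i-j| must lie within the interrange.
    if separation < low or (high is not None and separation > high):
        return False
    # Rule 2: an intramolecular assignment with |i-j| smaller than
    # prefer excludes all intermolecular assignments of the peak.
    if any(s < prefer for s in intra_separations):
        return False
    return True


print(intermolecular_allowed(10, 12, []))             # True
print(intermolecular_allowed(10, 12, [3]))            # False
print(intermolecular_allowed(10, 12, [3], prefer=2))  # True
```

With the default prefer=999999, any intramolecular assignment of a peak in effect rules out its intermolecular assignments; a smaller value lets only short-range intramolecular assignments take precedence.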
Further reading:
- Herrmann et al. J. Mol. Biol. 319, 209-227 (2002).
(Note that the algorithm implemented in the assign command differs significantly from the original CANDID algorithm described in this publication.)
- Güntert. Meth. Mol. Biol. 278, 353-378 (2004).
- Güntert. Prog. NMR Spectrosc. 43, 105-125 (2003).
- Jee & Güntert. J. Struct. Funct. Genom. 4, 179-189 (2003).