CYANA Command: assign

Latest revision as of 12:13, 12 January 2010

Parameters

alignfactor=real (default: 0.5)
matchfactor=real (default: 0.5)
violation=real (default: -1.0)
probability=real (default: 0.2)
quality=real (default: 0.5)
elasticity=real range (default: 1.0..1.0)
confidence=real (default: 1.0)
supportweight=real (default: 1.0)
pathlength=integer (default: 3)
prefer=integer (default: 999999)
interrange=integer range (default: 0..)
unassigned=real (default: 0.1)
noartifact=string (default: none)
short
nearest
changevol

Description

The assign command performs automated assignment of the NOESY cross peaks on the basis of the given chemical shifts, knowledge of covalently constrained short distances, and the selected 3D conformers, if available. The assign command is used in the noeassign macro to implement a combined automated NOESY assignment and structure calculation strategy.

Input data

Required input data consists of unassigned (or assigned) NOESY peaks from one or several peak lists, and one or several chemical shift lists. Optional input data comprises a group of selected conformers and a list of covalently constrained short distances. An upper distance bound must have been attributed to each input peak, for instance using the 'peaks simplecal' command or the 'calibration' macro, which convert peak intensities or volumes into distance bounds.

Output data

Output data comprises assignments made by the assign command for the peaks that were NOT selected in the input peak lists, as well as a report including details on the assignment of each individual peak and a summary table. Peaks that were selected on input are not modified. If peaks are assigned and unselected on input, the report also provides a comparison between the input assignment and the new assignment made by the assign command that overwrites the input assignment.

Assignment strategy

First, all assignment possibilities of a peak are generated on the basis of the chemical shift values that match the peak position within the tolerance defined by the tolerance variable; each possibility carries a probability for this chemical shift match. Second, the probability for agreement with the bundle of selected conformers, if present, is computed as the fraction of the conformers in which the corresponding distance is shorter than the upper distance bound plus the acceptable violation. Assignment possibilities for which the product of these two probabilities (chemical shift match and structural agreement) is below the required probability threshold are discarded. Third, each remaining assignment possibility is evaluated for its network anchoring, i.e., its embedding in the network formed by the assignment possibilities of all the other peaks and the covalently constrained distances. The network anchoring probability that the distance corresponding to an assignment is shorter than the upper distance bound plus the acceptable violation is computed given the assignments of the other peaks, but independently of any knowledge of the three-dimensional structure. Only assignment possibilities for which the product of the three probabilities is above the required probability threshold are accepted. Next, the overall quality Q of the assignment of a peak is computed from the probabilities of its individual accepted assignment possibilities. The overall quality of a peak assignment is always at least as large as the highest probability of an accepted assignment possibility. Peaks are kept assigned only if their quality exceeds the quality cutoff.
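The two threshold tests described above can be sketched in Python. This is only an illustration of the filtering logic, not CYANA code: the Candidate container, the function names, and the choice to treat a missing conformer bundle as structure probability 1 are assumptions made for the example, and the chemical shift and network anchoring probabilities are taken as given inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One assignment possibility of a peak (hypothetical container)."""
    label: str
    shift_prob: float        # chemical shift match probability, 0..1
    distances: List[float]   # distance in each selected conformer (Angstrom)
    network_prob: float      # network anchoring probability, 0..1


def structure_probability(distances, upper_bound, violation):
    """Fraction of conformers with distance below upper_bound + violation."""
    if not distances:
        return 1.0  # assumption: without conformers there is no structural penalty
    limit = upper_bound + violation
    return sum(d < limit for d in distances) / len(distances)


def accepted(candidates, upper_bound, violation, probability):
    """Return labels of candidates that pass both product-threshold tests."""
    kept = []
    for c in candidates:
        p_struct = structure_probability(c.distances, upper_bound, violation)
        # Second step: product of shift and structure probabilities.
        if c.shift_prob * p_struct < probability:
            continue
        # Third step: product of all three probabilities must exceed the threshold.
        if c.shift_prob * p_struct * c.network_prob > probability:
            kept.append(c.label)
    return kept


candidates = [
    Candidate("H ILE 64 / HA ILE 63", 0.99, [2.1, 2.3], 0.91),
    Candidate("H SER 43 / HA ILE 63", 0.95, [6.4, 9.0], 0.50),
]
# The violation value 0.5 is an arbitrary illustrative choice.
print(accepted(candidates, upper_bound=3.08, violation=0.5, probability=0.2))
```

The computation of the overall quality Q is not specified in enough detail here to be reproduced, so the sketch stops at the list of accepted possibilities.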

Example assignment report for a peak

 Peak 165 from c13.peaks (8.72, 4.11, 59.86 ppm; 3.08 A):
 2 out of 4 assignments used, quality = 0.97:
 * H     ILE   64 + HA    ILE   63  OK    90    99 100  91  2.1-2.3   1260=69, 63/50=24...(10)
   H     ILE   63 + HA    ILE   63  OK    71    71 100 100  2.8-2.8   3.0=100
   H     SER   43 - HA    ILE   63  far    0    95   0   -  6.4-9.0
   H     ALA   22 - HA    ILE   63  far    0    99   0   -  9.9-14.6
 Violated in 0 structures by 0.00 A.

- Line 1: Peak number, peak list, peak position, upper distance bound.

- Line 2: Number of used assignments, number of assignment possibilities, overall quality of the peak assignment (0..1). Quality values below the quality cutoff are marked as "low quality", and the peak remains unassigned.

- Lines 3-6: Individual assignment possibilities

  • Flag that indicates the input assignment, if present, by a * if it is among the used assignments, or by a ! otherwise.
  • First atom, identified by its name, residue name, and residue number
  • Flag: +, used assignment; -, assignment possibility not used
  • Second atom, identified by its name, residue name and number
  • Decision on assignment possibility:
      OK: good assignment with probability above the probability cutoff
      far: structure based probability too low
      lone: network anchoring based probability too low
      poor: individual probabilities ok but overall probability too low
  • Overall probability for the assignment possibility (%)
  • Probability for match between peak position and chemical shifts (%)
  • Probability for agreement with input structure bundle (%)
  • Probability derived from network anchoring (%)
  • Minimal and maximal distance in the selected conformers (Angstrom)
  • Most important individual contributions to the network anchoring based probability, ordered by decreasing size. The number after the equal sign is the probability in percent for the contribution identified in front of the equal sign, as follows (only the first three possibilities appear in the example above):
      real: covalently constrained distance shorter than real Å.
      integer: peak number of a (symmetrically related) peak with the same assignment.
      integer/integer: numbers of two peaks that relate the two atoms of the present assignment through a third atom.
      integer/real: peak with number integer connects the first atom to a third atom whose distance from the second atom is covalently restrained to be shorter than real Å.
      real/integer: peak with number integer connects the second atom to a third atom whose distance from the first atom is covalently restrained to be shorter than real Å.
      ~integer: the peak with number integer connects two atoms that are covalently restrained to be less than x Å from the first and second atom of the present assignment possibility, respectively.

For reasons of space, only the first few contributions are printed. An ellipsis "..." followed by the total number of contributions in parentheses indicates that not all contributions with probability greater than 1% are printed.

- Line 7 (last line): Number of conformers in which the upper distance limit of the ambiguous distance restraint formed by the accepted assignments (marked by '+' in lines 3-6) is violated by more than the violation threshold, and the average size of the violation.
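As an illustration of the column layout described above, the following Python sketch splits one assignment-possibility line (lines 3-6 of the example report) into its fields. It assumes whitespace-separated columns exactly as in the example, with a '-' standing for a probability that is not available; the function and key names are invented for this sketch and are not part of CYANA.

```python
from typing import Optional


def _pct(token: str) -> Optional[int]:
    """Convert a percentage column to int; '-' means not available."""
    return None if token == "-" else int(token)


def parse_possibility(line: str) -> dict:
    """Parse one assignment-possibility line of the report."""
    tokens = line.split()
    # Optional leading flag for the input assignment: '*' (used) or '!' (not used).
    input_flag = tokens.pop(0) if tokens[0] in ("*", "!") else None
    dmin, _, dmax = tokens[12].partition("-")
    return {
        "input_flag": input_flag,
        "atom1": (tokens[0], tokens[1], int(tokens[2])),
        "used": tokens[3] == "+",
        "atom2": (tokens[4], tokens[5], int(tokens[6])),
        "decision": tokens[7],
        "overall": _pct(tokens[8]),
        "shift": _pct(tokens[9]),
        "structure": _pct(tokens[10]),
        "network": _pct(tokens[11]),
        "distance_range": (float(dmin), float(dmax)),
        # Network anchoring contributions are kept as raw text.
        "contributions": " ".join(tokens[13:]),
    }


line = " * H     ILE   64 + HA    ILE   63  OK    90    99 100  91  2.1-2.3   1260=69, 63/50=24...(10)"
print(parse_possibility(line))
```

The contributions column is left unparsed because its entries can take any of the six forms listed above and may be truncated by an ellipsis.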

Covalently constrained distances

The covalently constrained short distances are normally taken from distance restraints with weight zero, which can be obtained, for instance, by analyzing a bundle of randomized conformers with the distances short command, as implemented in the noeassign macro. If no distance restraints with weight zero exist, the short distances are calculated internally from the selected conformers (which should be randomized), if these are available and if violation is negative, or by an analytical calculation otherwise.

Elasticity of upper distance bounds

When searching for peak assignments, the algorithm can adapt individual upper distance bounds in the input peak lists by a factor within the allowed elasticity range. An individual upper bound can be increased if a slight violation of the original upper distance bound can be avoided by the increased distance limit in at least 80% of the conformers. An individual upper bound can be decreased if the actual distances in the input conformers are consistently shorter than the upper distance bound. By default, there is no "elasticity" of the upper distance bounds, i.e., the input distance limits are used without change. If an upper distance bound is changed, its modified value is indicated in the first line of the report on the assignment of the peak. The additional option changevol can be used to correct peak volumes according to the internal change of the corresponding upper distance bound using an inverse sixth power relationship.
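A minimal Python sketch of the volume correction follows, assuming that the inverse sixth power relationship means the volume is proportional to the upper distance bound raised to the power -6. The function name and arguments are hypothetical; the exact formula used by changevol is not given here.

```python
def corrected_volume(volume: float, old_bound: float, new_bound: float) -> float:
    """Rescale a peak volume after its upper distance bound was changed.

    Assumes volume is proportional to bound**-6, so that the ratio of the
    new to the old volume equals (old_bound / new_bound)**6.
    """
    return volume * (old_bound / new_bound) ** 6


# Doubling the bound reduces the volume by a factor of 2**6 = 64.
print(corrected_volume(64.0, 1.0, 2.0))
```

With the default elasticity range of 1.0..1.0 the bounds are never changed, so the correction factor would always be 1.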

Additional control parameters

The probability for the chemical shift matching is calculated using the tolerance values multiplied by matchfactor. A smaller matchfactor implies a higher weight for good agreement between the peak coordinates and the chemical shifts. The mutual alignment of peaks is controlled by the variable tolerance, and the probability for network anchoring is calculated using the tolerance values multiplied by alignfactor. A smaller alignfactor implies a higher weight for good mutual alignment between peaks with assignment possibilities to the same atom(s). When calculating the network anchoring probability of a given peak assignment, the probabilities of other aligned peaks may be scaled by a confidence factor between 0 and 1. Chemical shift assignments with an attached chemical shift error larger than the unassigned cutoff are treated as "unassigned" when determining the initial assignment possibilities of peaks: Only one of the two atoms of an assignment may be “unassigned”, and, if in addition the 'short' option is set, only short-range assignments for covalently constrained distances are considered.

Symmetric homodimers

The assign command provides special features for symmetric homodimers that can be defined with the molecules define command. In the case of a homodimer, only assignments with the first atom in the first monomer are made. The corresponding symmetric distance restraint can be added afterwards with the molecules symmetrize command. Homodimer assignments are restricted to be only intramolecular or only intermolecular for peaks with (XEASY) color codes 8 or 9, respectively. Furthermore, intermolecular homodimer assignments between residues i and j are considered only if |i - j| is within the interrange. Intermolecular assignments of a peak are also excluded if the peak has at least one intramolecular assignment between residues i and j with |i - j| smaller than the parameter prefer.
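The rules for intermolecular homodimer assignments can be summarized in a short Python sketch. It is a reading of the paragraph above, not CYANA code: the function name, the argument layout, and the representation of an open-ended range such as 0.. as (0, None) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def allow_intermolecular(
    i: int,
    j: int,
    interrange: Tuple[int, Optional[int]],
    intramolecular_pairs: List[Tuple[int, int]],
    prefer: int,
    color: Optional[int] = None,
) -> bool:
    """Decide whether an intermolecular assignment between residues i and j is considered."""
    # Peaks with color code 8 are restricted to intramolecular assignments.
    if color == 8:
        return False
    # |i - j| must lie within interrange; an upper limit of None means open-ended.
    low, high = interrange
    separation = abs(i - j)
    if separation < low or (high is not None and separation > high):
        return False
    # A sufficiently short-range intramolecular assignment of the same peak
    # excludes all intermolecular assignments of that peak.
    if any(abs(a - b) < prefer for a, b in intramolecular_pairs):
        return False
    return True


# With the default prefer=999999, any intramolecular assignment takes precedence.
print(allow_intermolecular(10, 12, (0, None), [(10, 11)], prefer=999999))
```

Peaks with color code 9 are restricted to intermolecular assignments; in this sketch that case corresponds to calling the function with an empty list of intramolecular pairs.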

Further reading

  • Herrmann et al. J. Mol. Biol. 319, 209-227 (2002). (Note that the algorithm implemented in the 'assign' command differs significantly from the original CANDID algorithm described in this publication.)
  • Guntert. Meth. Mol. Biol. 278, 353-378 (2004).
  • Guntert. Prog. NMR Spectrosc. 43, 105-125 (2003).
  • Jee & Guntert. J. Struct. Funct. Genom. 4, 179-189 (2003).

See also