Automated resonance assignment with FLYA (Gothenburg 2014): Difference between revisions

 
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
In this tutorial we want to determine the resonance assignments and the structure of a protein using the program CYANA.   
In this tutorial we want to determine the resonance assignments and the structure of a protein using the program CYANA.   


== CYANA setup for the EMBO Practical Course NMR in Basel (15 October 2014) ==
== CYANA setup for the NMR course in Gothenburg (15 October 2014) ==


Please follow the following steps carefully (exact Linux commands are given below; you may copy them to a terminal):
Please follow the following steps carefully (exact Linux commands are given below; you may copy them to a terminal):


# From your home directory, change to the directory 'guentert'.
# In your home directory, make a directory 'cyana' and change into it.
# Execute the setup script.
# Download the [[Media:flyadata.tgz|input data for the practical]].
# Download the [[Media:flyaembo.tgz|input data for the practical]].
# Unpack the input data for the practical.
# Unpack the input data for the practical.
# Move it to different directory, 'flyaembo1' (such that you later unpack the original data again to run another calculation)
# Copy it to different directory, 'flya01' (to keep the original data in 'flyadata' to run another calculation later)
# Change into the subdirectory 'flyaembo1' where the input data for the practical are.
# Change into the subdirectory 'flya01' where the input data for the practical are.
# Test whether CYANA can be started by typing its name, 'cyana'.
# Test whether CYANA can be started by typing its name, 'cyana'.
# Exit from CYANA by typing 'q' or 'quit'.
# Exit from CYANA by typing 'q' or 'quit'.


  cd ~/guentert
  cd ~
  source setup-guentert
  mkdir cyana
  wget 'http://www.cyana.org/wiki/images/3/31/Flyaembo.tgz'
cd cyana
  tar zxf Flyaembo.tgz
  wget 'http://www.cyana.org/wiki/images/4/46/Flyadata.tgz'
  mv flyaembo flyaembo1
  tar zxf Flyadata.tgz
  cd flyaembo1
  cp -a flyadata flya01
  cd flya01
  cyana
  cyana
  ___________________________________________________________________
  ___________________________________________________________________
Line 29: Line 29:
   
   
     Demo license valid for specific sequences until 2014-12-31.
     Demo license valid for specific sequences until 2014-12-31.
     Library file "/Users/guentert/Documents/Meetings/1410NMRCourseGothenburg/Practical/cyana-3.97/lib/cyana.lib" read, 38 residue types.
     Library file "/Users/guentert/bin/cyana-3.97/lib/cyana.lib" read, 38 residue types.
    Sequence file "demo.seq" read, 114 residues.
    Sequence file "demo.seq" read, 114 residues.
  cyana> q
  cyana> q


Line 85: Line 85:
'''Hint:''' The formats of other CYANA files are described in the [[CYANA 3.0 Reference Manual]].
'''Hint:''' The formats of other CYANA files are described in the [[CYANA 3.0 Reference Manual]].


<!--
== FLYA initialization script ==
== FLYA initialization script ==


Line 95: Line 96:


The command 'cyanalib' reads the standard CYANA library. The second command reads the protein sequence.
The command 'cyanalib' reads the standard CYANA library. The second command reads the protein sequence.
 
-->
== Experiment definitions in the CYANA library ==
== Experiment definitions in the CYANA library ==


Line 101: Line 102:


  SPECTRUM HNCA  HN N C
  SPECTRUM HNCA  HN N C
   0.980  HN:H_AMI  N:N_AM* C_A*  C_BYL C:C_ALI
   0.980  HN:H_AMI  N:N_AM*  C:C_ALI C_BYL
   0.800  HN:H_AMI  N:N_AMI  C_BYL C_ALI  N_AMI C:C_ALI
   0.800  HN:H_AMI  N:N_AMI  (C_ALI) C_BYL  C:C_ALI  


The first line corresponds to the '#SPECTRUM' line in the peak list. It specifies the experiment name and a label for the atoms that are detected in each dimension of the spectrum. The number of labels defines the dimensionality of the experiment (3 in case of HNCA).
The first line corresponds to the '#SPECTRUM' line in the peak list. It specifies the experiment name and a label for the atoms that are detected in each dimension of the spectrum. The number of labels defines the dimensionality of the experiment (3 in case of HNCA).
Line 113: Line 114:


  structurepeaks:=N15NOESY,C13NOESY
  structurepeaks:=N15NOESY,C13NOESY
  #assignpeaks:=N15NOESY,C13NOESY,C13HSQC,N15HSQC,HCCHTOCSY,HCCHCOSY,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH,HBHAcoNH,CcoNH,HCcoNH
  assignpeaks:=N15NOESY,C13NOESY,C13HSQC,N15HSQC,HCCHTOCSY,HCCHCOSY,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH,HBHAcoNH,CcoNH,HCcoNH
assignpeaks:=N15HSQC,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH


Experiments used for automated NOESY assignment and structure calculation are given in the 'structurepeaks:=...' statement (not relevant for the present practical); those for automated chemical shift assignment in the 'assignpeaks:=...' statement. The peak list names are separated by commas (without blanks!).  
Experiments used for automated NOESY assignment and structure calculation are given in the 'structurepeaks:=...' statement (not relevant for the present practical); those for automated chemical shift assignment in the 'assignpeaks:=...' statement. The peak list names are separated by commas (without blanks!).  
Line 120: Line 120:
The commands above will use all available peak lists. You can choose any subset of them by modifying the 'noesy:=...' or 'scalar:=...' statements.  
The commands above will use all available peak lists. You can choose any subset of them by modifying the 'noesy:=...' or 'scalar:=...' statements.  


'''Try it!''' For instance, to make only backbone assignment using the triple resonance backbone assignment spectra, you can set
'''Try it!''' For instance, to make only backbone assignment using the <sup>15</sup>N-HSQC and triple resonance backbone assignment spectra, you can set


  noesy:=
  assignpeaks:=N15HSQC,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH
scalar:=HNCA,HN_CA_CO,HNCO,HN_CO_CA,CBCANH,CBCACONH,HBHACONH


These are followed by tolerances for chemical shift matching:
These are followed by tolerances for chemical shift matching:
Line 145: Line 144:
These define
These define


* The population size for the genetic algorithm, i.e. how many assignments form one generation (20; chosen smaller than in normal production runs in order to speed up the calculation)   
* The population size for the genetic algorithm, i.e. how many assignments form one generation (25; chosen smaller than in normal production runs in order to speed up the calculation)   
* The maximal number of iterations during local optimization (15000).
* The maximal number of iterations during local optimization (15000).
* A file with reference chemical shifts (ref.prot). These will not be used by the algorithm but only for a subsequent analysis of its results.
* An option to choose the "quick" optimization schedule.
* Groups of atoms for which assignment statistics will be calculated and reported in the 'flya.txt' output file.
* Groups of atoms for which assignment statistics will be calculated and reported in the 'flya.txt' output file.
* The seed value for the random number generator. Any positive integer number can be used.
* The seed value for the random number generator. Any positive integer number can be used.
Line 167: Line 166:
Finally, there is the command to start the FLYA algorithm:
Finally, there is the command to start the FLYA algorithm:


  flya refprot=ref.prot noesy=$noesy scalar=$scalar runs=5 #stage=1
  flya runs=10 shiftreference=ref.prot structurepeaks=$structurepeaks assignpeaks=$assignpeaks


Here, the given parameters of the 'flya' command specify that
Here, the given parameters of the 'flya' command specify that


* The results will be compared with the reference chemical shifts in the file 'ref.prot' (which have been determined independently by conventional methods).
* The number of independent runs of the algorithm, from which the consolidated shift will be calculated (chosen smaller than in normal production runs in order to speed up the calculation).
* The results will be compared with the reference chemical shifts in the file 'ref.prot' (which have been determined independently by conventional methods). The reference chemical shifts will not be used by the algorithm but only for a subsequent analysis of its results.
* The input peak lists that will be used (as defined above).
* The input peak lists that will be used (as defined above).
* The number of independent runs of the algorithm, from which the consolidated shift will be calculated (chosen smaller than in normal production runs in order to speed up the calculation).
<!--* Whether or not the resonance assignment and the NOESY peak lists will be used for a structure calculation-->
<!--* Whether or not the resonance assignment and the NOESY peak lists will be used for a structure calculation-->


== Run the FLYA calculation ==
== Run the FLYA calculation ==


To run the FLYA calculation, you start CYANA and execute the 'RUNFLYA.cya' script:
To run the FLYA calculation, you start CYANA and execute the 'CALC.cya' script:


  cyana "nproc=5; RUNFLYA"
  cyana "nproc=2; CALC"


By specifying the 'nproc=5' command, the 5 independent runs of the algorithm will be performed in parallel. On a computer with multiple processors this will speed up the calculation, which is expected to take a few minutes.
By specifying the 'nproc=2' command, the independent runs of the algorithm will be performed in parallel. On a computer with multiple processors this will speed up the calculation, which is expected to take a few minutes.


== FLYA output files ==
== FLYA output files ==
Line 197: Line 196:
=== The flya.txt file ===
=== The flya.txt file ===


This output file starts with overall assignment statistics for each group of atoms as defined by 'analyze_assign_group:=...' in RUNFLYA.cya':
This output file starts with overall assignment statistics for each group of atoms as defined by 'analyzeassign_group:=...' in CALC.cya':


     ____________________________________________________________
     ____________________________________________________________
Line 207: Line 206:
     assigned from frequencies
     assigned from frequencies
   
   
     CONBB: REF 493 [FREQ 493 [ASN 493 T 489(99.2%) F 4 [INNERRES 0]]] ADDFREQ 11 [ASN 11]
     CONBB: REF 494 [FREQ 494 [ASN 494 T 491(99.4%) F 3 [INNERRES 0]]] ADDFREQ 4 [ASN 4]
   
   
     CONALL: REF 1141 [FREQ 1141 [ASN 1139 T 1073(94.0%) F 68 [INNERRES 45]]] ADDFREQ 30 [ASN 29]
     CONALL: REF 1096 [FREQ 1096 [ASN 1094 T 1047(95.5%) F 49 [INNERRES 35]]] ADDFREQ 18 [ASN 17]
   
   
     BB: REF 512 [FREQ 512 [ASN 512 T 502(98.0%) F 10 [INNERRES 4]]] ADDFREQ 30 [ASN 28]
     BB: REF 512 [FREQ 512 [ASN 512 T 498(97.3%) F 14 [INNERRES 3]]] ADDFREQ 30 [ASN 28]
   
   
     ALL: REF 1264 [FREQ 1261 [ASN 1259 T 1151(91.1%) F 110 [INNERRES 63]]] ADDFREQ 89 [ASN 83]
     ALL: REF 1264 [FREQ 1261 [ASN 1257 T 1141(90.3%) F 120 [INNERRES 71]]] ADDFREQ 89 [ASN 85]


* REF is the number of reference assignments (in the selected group)
* REF is the number of reference assignments (in the selected group)
Line 227: Line 226:
   
   
     Lists      #Expected  noRef  noPeak  Assigned        Match    #Measured Assigned  exp/meas Assigned
     Lists      #Expected  noRef  noPeak  Assigned        Match    #Measured Assigned  exp/meas Assigned
     N15NOESY      1497     174     502   932( 62.26%)  789( 52.71%)  3008    768( 25.53%)  1.2
     N15NOESY      1495     177     499   937( 62.68%)  788( 52.71%)  3008    766( 25.47%)  1.2
     C13NOESY      4864     299   2091 2811( 57.79%)  2220( 45.64%)  10886  2258( 20.74%)  1.2
     C13NOESY      4852     304   2074 2808( 57.87%)  2169( 44.70%)  10886  2219( 20.38%)  1.3
     C13H1          556      27      85  464( 83.45%)  406( 73.02%)    407    350( 86.00%)  1.3
     C13HSQC        556      27      85  457( 82.19%)  396( 71.22%)    407    345( 84.77%)  1.3
     N15H1          135      16      4  127( 94.07%)  113( 83.70%)    131    112( 85.50%)  1.1
     N15HSQC        135      16      4  128( 94.81%)  113( 83.70%)    131    113( 86.26%)  1.1
     HCCH24        2734     79   1403 1295( 47.37%)  1039( 38.00%)  2363  1015( 42.95%)  1.3
     HCCHTOCSY    2796     87   1445 1344( 48.07%)  1042( 37.27%)  2363  1022( 43.25%)  1.3
     HCCH7        1926      69    1102  852( 44.24%)  686( 35.62%)  2005    675( 33.67%)  1.3
     HCCHCOSY      1926      69    1102  878( 45.59%)  679( 35.25%)  2005    670( 33.42%)  1.3
     HNCA          219     15      19   195( 89.04%)  184( 84.02%)    329    177( 53.80%)  1.1
     HNCA          211     15      11   192( 91.00%)  182( 86.26%)    329    175( 53.19%)  1.1
     HN_CA_CO      211      15      11  194( 91.94%)  180( 85.31%)    246    175( 71.14%)  1.1
     HNcaCO        211      15      11  193( 91.47%)  179( 84.83%)    246    175( 71.14%)  1.1
     HNCO          105      7      1  100( 95.24%)    96( 91.43%)    158    97( 61.39%)  1.0
     HNCO          105      7      1  101( 96.19%)    95( 90.48%)    158    97( 61.39%)  1.0
     HN_CO_CA      105      7      0  103( 98.10%)    96( 91.43%)    158    98( 62.03%)  1.1
     HNcoCA        105      7      0  103( 98.10%)    96( 91.43%)    158    98( 62.03%)  1.1
     CBCANH        415     26      41   363( 87.47%)  344( 82.89%)    623    335( 53.77%)  1.1
     CBCANH        399     26      25   361( 90.48%)  344( 86.22%)    623    335( 53.77%)  1.1
     CBCACONH       216     13     18   195( 90.28%)  182( 84.26%)    324    185( 57.10%)  1.1
     CBCAcoNH       200     13       2   196( 98.00%)  183( 91.50%)    324    187( 57.72%)  1.0
     HBHACONH       288      20      82  205( 71.18%)  188( 65.28%)    364    183( 50.27%)  1.1
     HBHAcoNH       288      20      82  207( 71.88%)  188( 65.28%)    364    183( 50.27%)  1.1
     C_CO_NH        370      16      53  309( 83.51%)  277( 74.86%)    365    284( 77.81%)  1.1
     CcoNH          370      16      53  311( 84.05%)  277( 74.86%)    365    287( 78.63%)  1.1
     HC_CO_NH      540      22    225  319( 59.07%)  282( 52.22%)    442    259( 58.60%)  1.2
     HCcoNH        540      22    225  313( 57.96%)  276( 51.11%)    442    256( 57.92%)  1.2
     ALL          14181     805   5637 8464( 59.69%)  7082( 49.94%)  21809  6971( 31.96%)  1.2
     ALL          14189     821   5619 8529( 60.11%)  7007( 49.38%)  21809  6928( 31.77%)  1.2


It contains the following data:
It contains the following data:
Line 248: Line 247:
* '''#Expected:''' Total number of expected peaks
* '''#Expected:''' Total number of expected peaks
* '''noRef:''' Number of expected peaks with missing reference shifts
* '''noRef:''' Number of expected peaks with missing reference shifts
* '''noPeak:''' Number of expected peaks for wich no peak can be measured
* '''noPeak:''' Number of expected peaks for which no peak can be measured
* '''Assigned:''' Number of expected peaks that could be assigned based on the reference chemical shift assignments. The theoretical maximum of 100% corresponds to the situation that the spectra “explain” all expected peaks. Each expected peak can be mapped to at most one measured peak. Remaining expected peaks correspond to missing peaks in the measured peak list.
* '''Assigned:''' Number of expected peaks that could be assigned based on the reference chemical shift assignments. The theoretical maximum of 100% corresponds to the situation that the spectra “explain” all expected peaks. Each expected peak can be mapped to at most one measured peak. Remaining expected peaks correspond to missing peaks in the measured peak list.
* '''Match:''' Number of assigned peaks that fit (within tolerance) reference shifts. The theoretical maximum of 100% corresponds to having all measured peaks assigned. Note that several expected peaks can be mapped to the same measured peak, i.e. the assignments of measured peaks can be unambiguous or ambiguous. Remaining unassigned measured peaks are likely to be artifacts.
* '''Match:''' Number of assigned peaks that fit (within tolerance) reference shifts. The theoretical maximum of 100% corresponds to having all measured peaks assigned. Note that several expected peaks can be mapped to the same measured peak, i.e. the assignments of measured peaks can be unambiguous or ambiguous. Remaining unassigned measured peaks are likely to be artifacts.
Line 262: Line 261:


     Atom  Residue      Ref  Shift    Dev  Extent  inside  inref
     Atom  Residue      Ref  Shift    Dev  Extent  inside  inref
     N    LYS  11 120.779 120.805  -0.026    5.0  100.0  100.0  consolidated=
     N    LYS  11 120.779 120.805  -0.026    5.0  100.0  100.0  strong=
     H    LYS  11  7.516  7.514  0.002    5.0  100.0  100.0  consolidated=
     H    LYS  11  7.516  7.514  0.002    5.0  100.0  100.0  strong=
     CA    LYS  11  55.129  54.976  0.153    5.0  100.0  100.0  consolidated=
     CA    LYS  11  55.129  54.976  0.153    5.0  100.0  100.0  strong=
     HA    LYS  11  4.244  4.241  0.003    5.0    99.9  100.0  consolidated=
     HA    LYS  11  4.244  4.241  0.003    5.0    99.9  100.0  strong=
     CB    LYS  11  31.380  31.357  0.023    5.0  100.0  100.0  consolidated=
     CB    LYS  11  31.380  31.357  0.023    5.0  100.0  100.0  strong=
     HB2  LYS  11  1.490  1.486  0.004    5.0  100.0  100.0  consolidated=
     HB2  LYS  11  1.490  1.486  0.004    5.0  100.0  100.0  strong=
     HB3  LYS  11  1.490  1.487  0.003    5.0  100.0  100.0  consolidated=
     HB3  LYS  11  1.490  1.487  0.003    5.0  100.0  100.0  strong=
     CG    LYS  11  25.362  25.427  -0.065    5.0  100.0  100.0  consolidated=
     CG    LYS  11  25.362  25.427  -0.065    5.0  100.0  100.0  strong=
     HG2  LYS  11  1.270  1.263  0.007    5.0  100.0  100.0  consolidated=
     HG2  LYS  11  1.270  1.263  0.007    5.0  100.0  100.0  strong=
     HG3  LYS  11  1.446  1.456  -0.010    5.0    99.8  100.0  consolidated=
     HG3  LYS  11  1.446  1.456  -0.010    5.0    99.8  100.0  strong=
     CD    LYS  11  29.374  29.445  -0.071    5.0    65.6    60.0  =
     CD    LYS  11  29.374  29.445  -0.071    5.0    65.6    60.0  =
     HD2  LYS  11  1.495  1.487  0.008    5.0    59.9    60.0  =
     HD2  LYS  11  1.495  1.487  0.008    5.0    59.9    60.0  =
     HD3  LYS  11  1.495  2.045  -0.550    5.0    91.4    0.0  consolidated! (HB3 10)
     HD3  LYS  11  1.495  2.045  -0.550    5.0    91.4    0.0  strong! (HB3 10)
     CE    LYS  11  42.023  42.013  0.010    5.0  100.0  100.0  consolidated=
     CE    LYS  11  42.023  42.013  0.010    5.0  100.0  100.0  strong=
     HE2  LYS  11  2.744  2.749  -0.005    5.0    99.9  100.0  consolidated=
     HE2  LYS  11  2.744  2.749  -0.005    5.0    99.9  100.0  strong=
     HE3  LYS  11  2.824  2.758  0.066    5.0    74.2    0.0  ! (HE2)
     HE3  LYS  11  2.824  2.758  0.066    5.0    74.2    0.0  ! (HE2)
     C    LYS  11 174.616 176.282  -1.666    5.0    80.0    20.0  consolidated! (C 10)
     C    LYS  11 174.616 176.282  -1.666    5.0    80.0    20.0  strong! (C 10)


* '''Ref:''' Chemical shift value in the reference chemical shift list (ref.prot). It was not used in the calculation.
* '''Ref:''' Chemical shift value in the reference chemical shift list (ref.prot). It was not used in the calculation.
Line 287: Line 286:
* '''inref:''' Percentage of chemical shift values from the (5) independent runs of FLYA that agree (within the tolerance) with the reference value.
* '''inref:''' Percentage of chemical shift values from the (5) independent runs of FLYA that agree (within the tolerance) with the reference value.
* Outcome of the assignment:
* Outcome of the assignment:
** '''consolidated:''' "strong" assignment, i.e. Inside > 80%.
** '''strong:''' "strong" assignment, i.e. Inside > 80%.
** '''=:''' Assignment that agrees with reference, i.e. Dev < tolerance.
** '''=:''' Assignment that agrees with reference, i.e. Dev < tolerance.
** '''!:''' Assignment that does not agree with the reference, i.e. Dev > tolerance.
** '''!:''' Assignment that does not agree with the reference, i.e. Dev > tolerance.
Line 310: Line 309:
These shifts will only be used for comparison (e.g. in flya.tab, flya.txt, flya.pdf):
These shifts will only be used for comparison (e.g. in flya.tab, flya.txt, flya.pdf):


  run_assign_reference:=ref.prot
  shiftassign_reference:=ref.prot


Shifts and standard deviations in the file 'predicted.prot' (not provided in this practical) will replace the general statistics from cyana.lib (CSTABLE):
Shifts and standard deviations in the file 'predicted.prot' (not provided in this practical) will replace the general statistics from cyana.lib (CSTABLE):


  run_assign_statistics:=predicted.prot
  shiftassign_statistics:=predicted.prot


Shifts in the file 'fix.prot' will be fixed to the input values
Shifts in the file 'fix.prot' will be fixed to the input values


  run_assign_fix:=fix.prot
  shiftassign_fix:=fix.prot


The latter approach can for instance be used to perform sidechain assignment when the backbone assignment is already known.  
The latter approach can for instance be used to perform sidechain assignment when the backbone assignment is already known.  
Line 336: Line 335:
  q
  q


The file 'fix.prot' will contain the reference chemical shifts only for the backbone (and CB) atoms H, N, CA, CB, C'. Now you can repeat the assignment calculation by inserting the 'run_assign_fix:=fix.prot' statement in 'RUNFLYA.cya' and choosing only the input peak lists that are relevant for sidechain assignment:
The file 'fix.prot' will contain the reference chemical shifts only for the backbone (and CB) atoms H, N, CA, CB, C'. Now you can repeat the assignment calculation by inserting the 'shiftassign_fix:=fix.prot' statement in 'CALC.cya' and choosing only the input peak lists that are relevant for sidechain assignment:


  run_assign_fix:=fix.prot
  shiftassign_fix:=fix.prot
  noesy:=N15NOESY,C13NOESY
  noesy:=N15NOESY,C13NOESY
  scalar:=C13H1,N15H1,HCCH24,HCCH7,HBHACONH,C_CO_NH,HC_CO_NH
  assignpeaks:=C13H1,N15H1,HCCH24,HCCH7,HBHACONH,C_CO_NH,HC_CO_NH


== Fully automated structure calculation ==
== Fully automated structure calculation ==


Automated resonance assignment, automated NOE restraint assignment, and the structure calculation by torsion angle dynamics can be combined by running the 'flya' command in 'RUNFLYA.cya' with the additional parameter 'stage=1' (which was commented out so far):
Automated resonance assignment, automated NOE restraint assignment, and the structure calculation by torsion angle dynamics can be combined by running the 'flya' command in 'CALC.cya' with the additional parameter 'stage=1' (which was commented out so far):


  flya refprot=ref.prot noesy=$noesy scalar=$scalar runs=5 stage=1
  flya runs=10 shiftreference=ref.prot structurepeaks=$structurepeaks assignpeaks=$assignpeaks stage=1


The 'flya.prot' file from the automated resonance assignment will be used together with the (unassigned) NOESY peak lists to assign the NOESY peaks and to generate distance restraints in order to compute the three-dimensional structure of the protein.  
The 'flya.prot' file from the automated resonance assignment will be used together with the (unassigned) NOESY peak lists to assign the NOESY peaks and to generate distance restraints in order to compute the three-dimensional structure of the protein.  


To speed up the calculation, you can set in 'RUNFLYA.cya' (above the 'flya' command):
To speed up the calculation, you can set in 'CALC.cya' (above the 'flya' command):


  structures:=25,5
  structures:=25,5

Latest revision as of 09:00, 14 October 2014

In this tutorial we want to determine the resonance assignments and the structure of a protein using the program CYANA.

CYANA setup for the NMR course in Gothenburg (15 October 2014)

Please follow these steps carefully (the exact Linux commands are given below; you may copy them to a terminal):

  1. In your home directory, make a directory 'cyana' and change into it.
  2. Download the input data for the practical.
  3. Unpack the input data for the practical.
  4. Copy it to a different directory, 'flya01' (so that the original data stay in 'flyadata' for running another calculation later).
  5. Change into the subdirectory 'flya01' where the input data for the practical are.
  6. Test whether CYANA can be started by typing its name, 'cyana'.
  7. Exit from CYANA by typing 'q' or 'quit'.
cd ~
mkdir cyana
cd cyana
wget 'http://www.cyana.org/wiki/images/4/46/Flyadata.tgz'
tar zxf Flyadata.tgz
cp -a flyadata flya01
cd flya01
cyana
___________________________________________________________________

CYANA 3.97 (linux-intel)
 
Copyright (c) 2002-12 Peter Guentert. All rights reserved.
___________________________________________________________________

   Demo license valid for specific sequences until 2014-12-31.
   Library file "/Users/guentert/bin/cyana-3.97/lib/cyana.lib" read, 38 residue types.
   Sequence file "demo.seq" read, 114 residues.
cyana> q

If all worked, you are ready to go!

If you want to return to your practical later, using your own Linux or Mac OS X computer, you can download the demo version of CYANA from here. This is not necessary on the workshop computers where the software is already installed.

Hint: More information on the CYANA commands etc. is in the CYANA 3.0 Reference Manual.

Experimental input data

The protein sequence is stored in three-letter code in the file 'demo.seq'.

The following spectra have been measured:

  • [1H,13C]-HSQC (called 'C13HSQC' in FLYA)
  • [1H,15N]-HSQC (called 'N15HSQC' in FLYA)
  • 3D [13C]-resolved NOESY (called 'C13NOESY' in FLYA)
  • 3D [15N]-resolved NOESY (called 'N15NOESY' in FLYA)
  • HNCA
  • HN(CO)CA (called 'HNcoCA' in FLYA)
  • HNCO
  • HN(CA)CO (called 'HNcaCO' in FLYA)
  • CBCANH
  • CBCACONH (called 'CBCAcoNH' in FLYA)
  • HBHACONH (called 'HBHAcoNH' in FLYA)
  • HCCH-TOCSY (called 'HCCHTOCSY' in FLYA)
  • HCCH-COSY (called 'HCCHCOSY' in FLYA)
  • C(CO)NH (called 'CcoNH' in FLYA)
  • HC(CO)NH (called 'HCcoNH' in FLYA)

Peak lists in XEASY format that have been prepared by automatic peak picking with the program NMRView are stored in files XXX.peaks, where XXX denotes the FLYA spectrum type.

Each peak list starts with a header that defines the experiment type and the order of dimensions. For instance, for HNCA.peaks:

# Number of dimensions 3
#FORMAT xeasy3D
#INAME 1 HN
#INAME 2 C
#INAME 3 N
#SPECTRUM HNCA  HN C N
      5   6.475  58.033  98.548 1 U   2.769E+02  0.000E+00 e 0     0     0     0
      6   6.476  62.123  98.126 1 U   2.571E+01  0.000E+00 e 0     0     0     0
      7   6.475  54.017  98.159 1 U   2.547E+01  0.000E+00 e 0     0     0     0

The first line specifies the number of dimensions (3 in this case). The '#SPECTRUM' line gives the experiment type (HNCA, which refers to the corresponding experiment definition in the CYANA library), followed by an identifier for each dimension of the peak list (HN C N) that specifies which chemical shift is stored in that dimension. These labels must match those in the corresponding experiment definition in the general CYANA library (see below). The '#SPECTRUM' line is followed by one line for every peak. For example, the first peak in the 'HNCA.peaks' list has

  • Peak number 5
  • HN chemical shift 6.475 ppm
  • C (CA) chemical shift 58.033 ppm
  • N chemical shift 98.548 ppm

The other data are irrelevant for automated chemical shift assignment with FLYA. In particular, the peak volume or intensity (2.769E+02) is not used by the algorithm.
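The peak list layout described above can be read with a few lines of Python. This is only a minimal sketch for inspecting the files outside CYANA, not the parser CYANA uses. The function name 'parse_peak_list' is hypothetical, and the sketch assumes that the '#SPECTRUM' line precedes all peak lines and that the chemical shifts directly follow the peak number.

```python
from typing import Dict, List, Tuple

# Header and first peaks of HNCA.peaks, copied from the example above.
HNCA_EXAMPLE = """\
# Number of dimensions 3
#FORMAT xeasy3D
#INAME 1 HN
#INAME 2 C
#INAME 3 N
#SPECTRUM HNCA  HN C N
      5   6.475  58.033  98.548 1 U   2.769E+02  0.000E+00 e 0     0     0     0
      6   6.476  62.123  98.126 1 U   2.571E+01  0.000E+00 e 0     0     0     0
"""


def parse_peak_list(text: str) -> Tuple[str, List[str], List[Dict[str, float]]]:
    """Return (experiment name, dimension labels, peaks) from XEASY-style text.

    Hypothetical helper: assumes '#SPECTRUM' comes before the first peak line.
    """
    experiment = ""
    labels: List[str] = []
    peaks: List[Dict[str, float]] = []
    for line in text.splitlines():
        if line.startswith("#SPECTRUM"):
            fields = line.split()
            experiment, labels = fields[1], fields[2:]
        elif line.strip() and not line.startswith("#"):
            fields = line.split()
            peak: Dict[str, float] = {"number": int(fields[0])}
            # One chemical shift per dimension, in the order of the labels.
            for label, value in zip(labels, fields[1:1 + len(labels)]):
                peak[label] = float(value)
            peaks.append(peak)
    return experiment, labels, peaks


experiment, labels, peaks = parse_peak_list(HNCA_EXAMPLE)
print(experiment, labels, peaks[0])
```

The remaining columns (including the peak volume) are ignored on purpose, since they are not used for the chemical shift assignment.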

Hint: The formats of other CYANA files are described in the CYANA 3.0 Reference Manual.

Experiment definitions in the CYANA library

When you start CYANA, the program reads the library and displays the full path name of the library file. You can open the standard library file to inspect, for example, the NMR experiment definitions that define which expected peaks are generated by FLYA. For instance, the definition for the HNCA spectrum (search for 'HNCA' in the library file 'cyana.lib') is

SPECTRUM HNCA  HN N C
 0.980  HN:H_AMI  N:N_AM*  C:C_ALI  C_BYL
 0.800  HN:H_AMI  N:N_AMI  (C_ALI) C_BYL  C:C_ALI 

The first line corresponds to the '#SPECTRUM' line in the peak list. It specifies the experiment name and a label for the atoms that are detected in each dimension of the spectrum. The number of labels defines the dimensionality of the experiment (3 in case of HNCA).

Each line below defines a (formal) magnetization transfer pathway that gives rise to an expected peak. In the case of HNCA there are two lines, corresponding to the intraresidual and the sequential peak. For instance, the definition for the intraresidual peak starts with the probability to observe the peak (0.980), followed by a series of atom types, e.g. H_AMI for amide proton etc. An expected peak is generated for each molecular fragment in which these atom types occur connected by single covalent bonds. The atoms whose chemical shifts appear in the spectrum are identified by their labels followed by ':', e.g. for HNCA 'HN:', 'N:', and 'C:'.
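To make the structure of such a definition explicit, the following sketch splits the HNCA entry into its parts. It is an illustration only and does not reproduce how CYANA reads its library. The function name 'parse_definition' is hypothetical. Tokens without a ':' label are stored with label None, and the parenthesized token '(C_ALI)' is kept verbatim because its meaning is not explained in this text.

```python
# HNCA experiment definition, copied from the library excerpt above.
HNCA_DEFINITION = """\
SPECTRUM HNCA  HN N C
 0.980  HN:H_AMI  N:N_AM*  C:C_ALI  C_BYL
 0.800  HN:H_AMI  N:N_AMI  (C_ALI) C_BYL  C:C_ALI
"""


def parse_definition(text):
    """Split a SPECTRUM definition into name, dimension labels and pathways.

    Hypothetical helper for illustration only.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    name, dims = header[1], header[2:]
    pathways = []
    for line in lines[1:]:
        fields = line.split()
        atoms = []
        for token in fields[1:]:
            if ":" in token:
                # Labeled atom: its shift appears in the dimension 'label'.
                label, atom_type = token.split(":", 1)
            else:
                # Unlabeled atom: part of the pathway, but not detected.
                label, atom_type = None, token
            atoms.append((label, atom_type))
        pathways.append({"probability": float(fields[0]), "atoms": atoms})
    return name, dims, pathways


name, dims, pathways = parse_definition(HNCA_DEFINITION)
for pathway in pathways:
    detected = [label for label, _ in pathway["atoms"] if label is not None]
    print(name, pathway["probability"], detected)
```

For both pathways the detected labels are the three dimension labels of the 'SPECTRUM' line, which is what ties an expected peak to the columns of the measured peak list.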

FLYA execution script

The CYANA script 'CALC.cya' contains the commands to perform the automated resonance assignment. It starts with the specification of the names of the input peak lists:

structurepeaks:=N15NOESY,C13NOESY
assignpeaks:=N15NOESY,C13NOESY,C13HSQC,N15HSQC,HCCHTOCSY,HCCHCOSY,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH,HBHAcoNH,CcoNH,HCcoNH

Experiments used for automated NOESY assignment and structure calculation are given in the 'structurepeaks:=...' statement (not relevant for the present practical); those for automated chemical shift assignment in the 'assignpeaks:=...' statement. The peak list names are separated by commas (without blanks!).

The commands above will use all available peak lists. You can choose any subset of them by modifying the 'structurepeaks:=...' or 'assignpeaks:=...' statements.

Try it! For instance, to make only backbone assignment using the 15N-HSQC and triple resonance backbone assignment spectra, you can set

assignpeaks:=N15HSQC,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH
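Because the peak list names must be separated by commas without blanks, and must match the names of the available peak lists, it can help to check a value before pasting it into 'CALC.cya'. The following sketch is an assumption-laden convenience check, not part of CYANA. The function name 'check_peaklist_value' is hypothetical, and the list of available names is simply copied from the 'assignpeaks' statement above.

```python
# Peak list names taken from the full 'assignpeaks' statement in CALC.cya.
AVAILABLE = [
    "N15NOESY", "C13NOESY", "C13HSQC", "N15HSQC", "HCCHTOCSY", "HCCHCOSY",
    "HNCA", "HNcaCO", "HNCO", "HNcoCA", "CBCANH", "CBCAcoNH", "HBHAcoNH",
    "CcoNH", "HCcoNH",
]


def check_peaklist_value(value, available=AVAILABLE):
    """Return a list of problems found in a comma-separated peak list value."""
    problems = []
    if any(ch.isspace() for ch in value):
        problems.append("value contains blanks")
    for name in value.split(","):
        name = name.strip()
        if name and name not in available:
            problems.append(f"unknown peak list: {name}")
    return problems


backbone = "N15HSQC,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH"
print(check_peaklist_value(backbone))          # no problems expected
print(check_peaklist_value("HNCA, HNCO"))      # blank after the comma
print(check_peaklist_value("HN_CA_CO"))        # not one of the names listed above
```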

These are followed by tolerances for chemical shift matching:

tolerance:=0.03,0.03,0.4
assigncs_accH:=0.03
assigncs_accC:=0.4
assigncs_accN:=0.4

In this case a tolerance of 0.03 ppm will be used for protons, and 0.4 ppm will be used for carbon and nitrogen.
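As a rough illustration of what such tolerances mean, the sketch below tests whether a measured peak position agrees with an expected position in every dimension. It uses a simple hard cutoff with the values given above. How FLYA weighs deviations internally is not described here, so this should be read as an assumption, and the function name 'peak_matches' is hypothetical.

```python
# Tolerances in ppm per nucleus type, as set in CALC.cya above.
TOLERANCE = {"H": 0.03, "C": 0.4, "N": 0.4}


def peak_matches(nuclei, expected, measured, tolerance=TOLERANCE):
    """True if every dimension deviates by less than its tolerance.

    nuclei   -- nucleus type per dimension, e.g. ("H", "C", "N")
    expected -- expected chemical shifts in ppm, one per dimension
    measured -- measured peak position in ppm, one per dimension
    """
    return all(
        abs(e - m) < tolerance[n]
        for n, e, m in zip(nuclei, expected, measured)
    )


dims = ("H", "C", "N")
reference = (6.475, 58.033, 98.548)
print(peak_matches(dims, reference, (6.480, 58.300, 98.200)))  # all within tolerance
print(peak_matches(dims, reference, (6.520, 58.033, 98.548)))  # proton off by 0.045 ppm
```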

Parameters for the FLYA algorithm come next:

shiftassign_population:=25
shiftassign_iterations:=15000
shiftassign_quick=1
analyzeassign_group:=CONBB: N H CA C CB / CONSOLIDATED, CONALL: CONSOLIDATED, BB: N H CA C CB, ALL:*

randomseed  := 3771

These define

  • The population size for the genetic algorithm, i.e. how many assignments form one generation (25; chosen smaller than in normal production runs in order to speed up the calculation)
  • The maximal number of iterations during local optimization (15000).
  • An option to choose the "quick" optimization schedule.
  • Groups of atoms for which assignment statistics will be calculated and reported in the 'flya.txt' output file.
  • The seed value for the random number generator. Any positive integer number can be used.

There might be a command to restrict the generation of expected peaks to a subset of atoms:

command select_atoms
  atoms select "* - CZ ?H* @ARG - ?Z @LYS"
end

Here, the zeta atoms of Arg and Lys are excluded, i.e. no expected peaks will be generated for these atoms (because they are only rarely observed in the spectra).

Specific labeling can be handled in the same way, and peak list-specific atom selections can be defined (not in this practical):

command XXX_select
  atoms select "..."
end

Finally, there is the command to start the FLYA algorithm:

flya runs=10 shiftreference=ref.prot structurepeaks=$structurepeaks assignpeaks=$assignpeaks

Here, the parameters of the 'flya' command specify:

  • The number of independent runs of the algorithm, from which the consolidated shift will be calculated (chosen smaller than in normal production runs in order to speed up the calculation).
  • The results will be compared with the reference chemical shifts in the file 'ref.prot' (which have been determined independently by conventional methods). The reference chemical shifts will not be used by the algorithm but only for a subsequent analysis of its results.
  • The input peak lists that will be used (as defined above).

Run the FLYA calculation

To run the FLYA calculation, you start CYANA and execute the 'CALC.cya' script:

cyana "nproc=2; CALC"

By specifying the 'nproc=2' command, the independent runs of the algorithm will be performed in parallel. On a computer with multiple processors this will speed up the calculation, which is expected to take a few minutes.

FLYA output files

The FLYA algorithm will produce the following output files:

  • flya.prot: Consensus assigned chemical shifts. This file contains a chemical shift for every atom that has been assigned to at least one peak.
  • flya.tab: Table with details about the chemical shift assignment of each atom (comparison with reference shifts). In this file you can see for each atom whether the assignment is "strong" (self-consistent) or "weak" (only tentative).
  • flya.txt: Assignment statistics
  • flya.pdf: Graphical representation of the assignment results
  • XXX_exp.peaks: List of expected peaks, corresponding to input peak list XXX.peaks
  • XXX_asn.peaks: Assigned peak list, corresponding to input peak list XXX.peaks

The flya.txt file

This output file starts with overall assignment statistics for each group of atoms as defined by 'analyzeassign_group:=...' in 'CALC.cya':

   ____________________________________________________________

   CHEMICAL SHIFT ASSIGNMENT
   ____________________________________________________________

   SEED: 1
   assigned from frequencies

   CONBB: REF 494 [FREQ 494 [ASN 494 T 491(99.4%) F 3 [INNERRES 0]]] ADDFREQ 4 [ASN 4]

   CONALL: REF 1096 [FREQ 1096 [ASN 1094 T 1047(95.5%) F 49 [INNERRES 35]]] ADDFREQ 18 [ASN 17]

   BB: REF 512 [FREQ 512 [ASN 512 T 498(97.3%) F 14 [INNERRES 3]]] ADDFREQ 30 [ASN 28]

   ALL: REF 1264 [FREQ 1261 [ASN 1257 T 1141(90.3%) F 120 [INNERRES 71]]] ADDFREQ 89 [ASN 85]
  • REF is the number of reference assignments (in the selected group)
  • FREQ is the number of atoms (with reference assignments) that could in principle be assigned by FLYA, i.e. for which at least one expected peak was generated.
  • ASN is the number of atoms assigned by FLYA
  • T is the number of correct FLYA assignments that agree (within the tolerance) with the reference. In the sample output above, the percentage corresponds to T relative to REF (e.g. 1141/1264 = 90.3% for ALL).
  • F is the number of erroneous FLYA assignments that do not agree (within the tolerance) with the reference.
  • INNERRES are erroneous assignments for which the correct assignment is in the same residue.
  • ADDFREQ are additional atoms (without reference assignments) that could in principle be assigned by FLYA.
  • ASN are additional atoms (without reference assignments) that were assigned by FLYA.
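
To check these numbers outside CYANA, one line of the statistics block can be parsed and the percentage recomputed. The following is a minimal Python sketch that assumes the line layout shown in the sample output above; the pattern `LINE` and the function `parse_group` are hypothetical helpers, not part of CYANA:

```python
import re

# Pattern for one group line of the statistics block (layout assumed from the sample above).
LINE = re.compile(
    r"(?P<group>\w+): REF (?P<ref>\d+) \[FREQ (?P<freq>\d+) "
    r"\[ASN (?P<asn>\d+) T (?P<t>\d+)\((?P<pct>[\d.]+)%\) F (?P<f>\d+) "
    r"\[INNERRES (?P<inner>\d+)\]\]\] ADDFREQ (?P<addfreq>\d+) \[ASN (?P<addasn>\d+)\]"
)

def parse_group(line):
    """Parse one statistics line into a dict (counts as int, percentage as float)."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("not a statistics line: " + line)
    fields = m.groupdict()
    out = {k: int(v) for k, v in fields.items() if k not in ("group", "pct")}
    out["group"] = fields["group"]
    out["pct"] = float(fields["pct"])
    return out

sample = ("   ALL: REF 1264 [FREQ 1261 [ASN 1257 T 1141(90.3%) F 120 "
          "[INNERRES 71]]] ADDFREQ 89 [ASN 85]")
s = parse_group(sample)
# Recompute the percentage as T relative to REF and compare with the reported value.
print(s["group"], round(100.0 * s["t"] / s["ref"], 1), s["pct"])
```

For the sample lines shown above, T divided by REF reproduces the percentage printed by FLYA.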

Further below comes a table with information about each peak list:

   Lists      #Expected  noRef   noPeak   Assigned        Match    #Measured Assigned  exp/meas Assigned
   N15NOESY      1495     177     499   937( 62.68%)   788( 52.71%)   3008    766( 25.47%)   1.2
   C13NOESY      4852     304    2074  2808( 57.87%)  2169( 44.70%)  10886   2219( 20.38%)   1.3
   C13HSQC        556      27      85   457( 82.19%)   396( 71.22%)    407    345( 84.77%)   1.3
   N15HSQC        135      16       4   128( 94.81%)   113( 83.70%)    131    113( 86.26%)   1.1
   HCCHTOCSY     2796      87    1445  1344( 48.07%)  1042( 37.27%)   2363   1022( 43.25%)   1.3
   HCCHCOSY      1926      69    1102   878( 45.59%)   679( 35.25%)   2005    670( 33.42%)   1.3
   HNCA           211      15      11   192( 91.00%)   182( 86.26%)    329    175( 53.19%)   1.1
   HNcaCO         211      15      11   193( 91.47%)   179( 84.83%)    246    175( 71.14%)   1.1
   HNCO           105       7       1   101( 96.19%)    95( 90.48%)    158     97( 61.39%)   1.0
   HNcoCA         105       7       0   103( 98.10%)    96( 91.43%)    158     98( 62.03%)   1.1
   CBCANH         399      26      25   361( 90.48%)   344( 86.22%)    623    335( 53.77%)   1.1
   CBCAcoNH       200      13       2   196( 98.00%)   183( 91.50%)    324    187( 57.72%)   1.0
   HBHAcoNH       288      20      82   207( 71.88%)   188( 65.28%)    364    183( 50.27%)   1.1
   CcoNH          370      16      53   311( 84.05%)   277( 74.86%)    365    287( 78.63%)   1.1
   HCcoNH         540      22     225   313( 57.96%)   276( 51.11%)    442    256( 57.92%)   1.2
   ALL          14189     821    5619  8529( 60.11%)  7007( 49.38%)  21809   6928( 31.77%)   1.2

It contains the following data:

  • #Expected: Total number of expected peaks
  • noRef: Number of expected peaks with missing reference shifts
  • noPeak: Number of expected peaks for which no peak can be measured
  • Assigned: Number of expected peaks that could be assigned based on the reference chemical shift assignments. The theoretical maximum of 100% corresponds to the situation that the spectra “explain” all expected peaks. Each expected peak can be mapped to at most one measured peak. Remaining expected peaks correspond to missing peaks in the measured peak list.
  • Match: Number of assigned peaks that fit (within tolerance) reference shifts. The theoretical maximum of 100% corresponds to having all measured peaks assigned. Note that several expected peaks can be mapped to the same measured peak, i.e. the assignments of measured peaks can be unambiguous or ambiguous. Remaining unassigned measured peaks are likely to be artifacts.
  • #Measured: Total number of peaks in peak list
  • Assigned: Number of measured peaks that could be assigned to expected peaks
  • exp/meas: Ratio of the number of assigned expected peaks to the number of assigned measured peaks
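
The percentages and the exp/meas ratio in a row follow directly from the counts. A minimal Python sketch, assuming the column order of the table above; `parse_row` and `row_statistics` are hypothetical helpers, not part of CYANA:

```python
def parse_row(line):
    """Split one row of the peak list table into named counts (assumed column order)."""
    # Turn "937( 62.68%)" into "937 62.68" so that plain whitespace splitting works.
    tokens = line.replace("(", " ").replace("%)", " ").split()
    return {
        "list": tokens[0],
        "expected": int(tokens[1]),
        "no_ref": int(tokens[2]),
        "no_peak": int(tokens[3]),
        "exp_assigned": int(tokens[4]),
        "match": int(tokens[6]),
        "measured": int(tokens[8]),
        "meas_assigned": int(tokens[9]),
    }

def row_statistics(r):
    """Recompute the derived columns from the counts."""
    return {
        "assigned_pct": round(100.0 * r["exp_assigned"] / r["expected"], 2),
        "match_pct": round(100.0 * r["match"] / r["expected"], 2),
        "meas_assigned_pct": round(100.0 * r["meas_assigned"] / r["measured"], 2),
        "exp_per_meas": round(r["exp_assigned"] / r["meas_assigned"], 1),
    }

row = "   N15NOESY      1495     177     499   937( 62.68%)   788( 52.71%)   3008    766( 25.47%)   1.2"
print(row_statistics(parse_row(row)))
```

For the N15NOESY row this reproduces the values printed in the table, which shows that the first two percentages refer to the number of expected peaks and the third to the number of measured peaks.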

There is more information on the results of the assignment calculation in the 'flya.txt' file (not described here).

The flya.tab file

This file provides information about the chemical shift assignment of each individual atom:

   Atom  Residue      Ref   Shift     Dev  Extent  inside   inref
   N     LYS   11 120.779 120.805  -0.026     5.0   100.0   100.0  strong=
   H     LYS   11   7.516   7.514   0.002     5.0   100.0   100.0  strong=
   CA    LYS   11  55.129  54.976   0.153     5.0   100.0   100.0  strong=
   HA    LYS   11   4.244   4.241   0.003     5.0    99.9   100.0  strong=
   CB    LYS   11  31.380  31.357   0.023     5.0   100.0   100.0  strong=
   HB2   LYS   11   1.490   1.486   0.004     5.0   100.0   100.0  strong=
   HB3   LYS   11   1.490   1.487   0.003     5.0   100.0   100.0  strong=
   CG    LYS   11  25.362  25.427  -0.065     5.0   100.0   100.0  strong=
   HG2   LYS   11   1.270   1.263   0.007     5.0   100.0   100.0  strong=
   HG3   LYS   11   1.446   1.456  -0.010     5.0    99.8   100.0  strong=
   CD    LYS   11  29.374  29.445  -0.071     5.0    65.6    60.0  =
   HD2   LYS   11   1.495   1.487   0.008     5.0    59.9    60.0  =
   HD3   LYS   11   1.495   2.045  -0.550     5.0    91.4     0.0  strong! (HB3 10)
   CE    LYS   11  42.023  42.013   0.010     5.0   100.0   100.0  strong=
   HE2   LYS   11   2.744   2.749  -0.005     5.0    99.9   100.0  strong=
   HE3   LYS   11   2.824   2.758   0.066     5.0    74.2     0.0  ! (HE2)
   C     LYS   11 174.616 176.282  -1.666     5.0    80.0    20.0  strong! (C 10)
  • Ref: Chemical shift value in the reference chemical shift list (ref.prot). It was not used in the calculation.
  • Shift: Consensus chemical shift value from FLYA
  • Dev = Ref - Shift
  • Extent: Number of runs in which the atom was assigned by FLYA.
  • Inside: Percentage of chemical shift values from the (5) independent runs of FLYA that agree (within the tolerance) with the consensus value.
  • inref: Percentage of chemical shift values from the (5) independent runs of FLYA that agree (within the tolerance) with the reference value.
  • Outcome of the assignment:
    • strong: "strong" assignment, i.e. inside ≥ 80% (in the excerpt above, C of LYS 11 with inside = 80.0 is still marked as strong).
    • =: Assignment that agrees with the reference, i.e. |Dev| < tolerance.
    • !: Assignment that does not agree with the reference, i.e. |Dev| > tolerance.
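    • (atom name): For an erroneous assignment, the atom to which the assigned shift actually belongs according to the reference, reported if it is in the same residue (no residue number given) or in a neighboring residue (residue number given).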
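
The most critical entries in this file are those marked "strong!", i.e. self-consistent assignments that nevertheless disagree with the reference. The following Python sketch lists them; it assumes the whitespace-separated column layout of the excerpt above for atoms that have a reference shift, and `parse_tab_line` is a hypothetical helper, not part of CYANA:

```python
def parse_tab_line(line):
    """Parse one atom line of flya.tab (column layout assumed from the excerpt above)."""
    t = line.split()
    return {
        "atom": t[0], "residue": t[1], "number": int(t[2]),
        "ref": float(t[3]), "shift": float(t[4]), "dev": float(t[5]),
        "extent": float(t[6]), "inside": float(t[7]), "inref": float(t[8]),
        "outcome": t[9],            # e.g. "strong=", "=", "strong!" or "!"
        "note": " ".join(t[10:]),   # e.g. "(HB3 10)", empty if absent
    }

lines = [
    "   CA    LYS   11  55.129  54.976   0.153     5.0   100.0   100.0  strong=",
    "   HD3   LYS   11   1.495   2.045  -0.550     5.0    91.4     0.0  strong! (HB3 10)",
    "   HE3   LYS   11   2.824   2.758   0.066     5.0    74.2     0.0  ! (HE2)",
]
rows = [parse_tab_line(l) for l in lines]

# Strong assignments that disagree with the reference.
strong_wrong = [r for r in rows if r["outcome"] == "strong!"]
for r in strong_wrong:
    print(r["atom"], r["residue"], r["number"], r["dev"], r["note"])
```

Lines for atoms without a reference shift may have a different layout and would need separate handling.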

The flya.pdf file

This PDF file provides a graphical representation of the 'flya.tab' file. Each assignment for an atom is represented by a colored rectangle.

Color code used in flya.pdf:
  • Green: Assignment by FLYA agrees with the manually determined reference assignment (within tolerance)
  • Red: Assignment by FLYA does not agree with the manually determined reference assignment
  • Blue: Assigned by FLYA but no reference available
  • Black: With reference assignment but not assigned by FLYA.

The corresponding lighter colors indicate assignments that were not classified as strong by the chemical shift consolidation. The row labeled HN/Hα shows for each residue HN on the left and Hα in the center. The N/Cα/C’ row shows for each residue the N, Cα, and C’ assignments from left to right. The rows β-η show the side-chain assignments, with the heavy atoms in the center and the hydrogen atoms to the left and right. In the case of branched side-chains, the corresponding row is split into an upper part for one branch and a lower part for the other branch.

Using input chemical shifts: shift predictions or partial assignments (optional)

Input chemical shifts can be used in three ways.

Shifts in the file 'ref.prot' will only be used for comparison (e.g. in flya.tab, flya.txt, flya.pdf):

shiftassign_reference:=ref.prot

Shifts and standard deviations in the file 'predicted.prot' (not provided in this practical) will replace the general statistics from cyana.lib (CSTABLE):

shiftassign_statistics:=predicted.prot

Shifts in the file 'fix.prot' will be fixed to the input values:

shiftassign_fix:=fix.prot

The latter approach can for instance be used to perform sidechain assignment when the backbone assignment is already known.

If you want to do this, copy the original data to a new directory:

cd ~/cyana
cp -r flyadata flyasc
cd flyasc

Then make a list of only the reference backbone chemical shifts. Start CYANA. In CYANA, enter the commands

read ref.prot
atom set "* - H N CA CB C" shift=none
write fix.prot
q

The file 'fix.prot' will contain the reference chemical shifts only for the backbone (and CB) atoms H, N, CA, CB, C'. Now you can repeat the assignment calculation by inserting the 'shiftassign_fix:=fix.prot' statement in 'CALC.cya' and choosing only the input peak lists that are relevant for sidechain assignment:

shiftassign_fix:=fix.prot
structurepeaks:=N15NOESY,C13NOESY
assignpeaks:=C13HSQC,N15HSQC,HCCHTOCSY,HCCHCOSY,HBHAcoNH,CcoNH,HCcoNH

Fully automated structure calculation

Automated resonance assignment, automated NOE restraint assignment, and the structure calculation by torsion angle dynamics can be combined by running the 'flya' command in 'CALC.cya' with the additional parameter 'stage=1' (which was commented out so far):

flya runs=10 shiftreference=ref.prot structurepeaks=$structurepeaks assignpeaks=$assignpeaks stage=1

The 'flya.prot' file from the automated resonance assignment will be used together with the (unassigned) NOESY peak lists to assign the NOESY peaks and to generate distance restraints in order to compute the three-dimensional structure of the protein.

To speed up the calculation, you can set in 'CALC.cya' (above the 'flya' command):

structures:=25,5
steps=4000

These commands tell the program to calculate, in each cycle, 25 conformers, and to analyze the best 5 of them. 4000 torsion angle dynamics steps will be applied per conformer.
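
The selection implied by 'structures:=25,5' can be pictured as follows. This is only a sketch, assuming that "best" means the conformers with the lowest final target function values; `best_conformers` is a hypothetical helper and the target function values are invented for illustration:

```python
def best_conformers(target_functions, n_best=5):
    """Return the indices of the n_best conformers with the lowest target function."""
    order = sorted(range(len(target_functions)), key=lambda i: target_functions[i])
    return order[:n_best]

# Invented target function values for six conformers (illustration only).
tf = [3.2, 1.1, 5.0, 0.9, 2.4, 4.8]
print(best_conformers(tf, n_best=3))
```

With 'structures:=25,5' the same idea applies with 25 calculated conformers and n_best=5.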

Seven cycles of automated NOE assignment and structure calculation will be performed. Statistics on the NOE assignment and the structure calculation will be in the file 'Table', which can also be produced with the command 'cyanatable -lp'.

The final structure will be 'final.pdb'. You can visualize it, for example, with the command

molmol -r 8-110 final.pdb

The optimal residue range for superposition can be found with the command

cyana overlay final.pdb

or with the CYRANGE web server.

Download results of fully automated structure calculation

If you cannot complete the fully automated structure calculation but want to look at the results that have been calculated previously, you may download them here (about 24 MB).