Automated resonance assignment with FLYA (EMBO 2017)
Revision as of 13:32, 6 August 2017
In this tutorial we will determine the resonance assignments and the structure of a protein using the program CYANA.
CYANA setup for the EMBO Practical Course NMR in Basel (5-12 August 2017)
Please follow these steps carefully (exact Linux commands are given below; you may copy them to a terminal):
- In your home directory, make a directory 'cyana' and change into it.
- Download the input data for the practical.
- Unpack the input data for the practical.
- Copy it to a different directory, 'flya01' (this keeps the original data in 'flyadata' so that you can run another calculation later).
- Change into the subdirectory 'flya01' where the input data for the practical are.
- Test whether CYANA can be started by typing its name, 'cyana'.
- Exit from CYANA by typing 'q' or 'quit'.
cd ~
mkdir cyana
cd cyana
wget 'http://www.cyana.org/wiki/images/4/46/Flyadata.tgz'
tar zxf Flyadata.tgz
cp -a flyadata flya01
cd flya01
cyana
___________________________________________________________________
CYANA 3.98 (linux64-intel)
Copyright (c) 2002-17 Peter Guentert. All rights reserved.
___________________________________________________________________
Demo license valid for specific sequences until 2017-12-31
Library file "/home/guentert_l/cyana-3.98/lib/cyana.lib" read, 41 residue types.
Sequence file "demo.seq" read, 114 residues.
cyana> q
If all worked, you are ready to go!
If you want to return to your practical later, using your own Linux or Mac OS X computer, you can download the demo version of CYANA from here. This is not necessary on the workshop computers where the software is already installed.
Hint: More information on the CYANA commands etc. is in the CYANA 3.0 Reference Manual.
Experimental input data
The protein sequence is stored in three-letter code in the file 'demo.seq'.
The following spectra have been measured:
- [1H,13C]-HSQC (called 'C13HSQC' in FLYA)
- [1H,15N]-HSQC (called 'N15HSQC' in FLYA)
- 3D [13C]-resolved NOESY (called 'C13NOESY' in FLYA)
- 3D [15N]-resolved NOESY (called 'N15NOESY' in FLYA)
- HNCA
- HN(CO)CA (called 'HNcoCA' in FLYA)
- HNCO
- HN(CA)CO (called 'HNcaCO' in FLYA)
- CBCANH
- CBCACONH (called 'CBCAcoNH' in FLYA)
- HBHACONH (called 'HBHAcoNH' in FLYA)
- HCCH-TOCSY (called 'HCCHTOCSY' in FLYA)
- HCCH-COSY (called 'HCCHCOSY' in FLYA)
- C(CO)NH (called 'CcoNH' in FLYA)
- HC(CO)NH (called 'HCcoNH' in FLYA)
Peak lists in XEASY format that have been prepared by automatic peak picking with the program NMRView are stored in files XXX.peaks, where XXX denotes the FLYA spectrum type.
Each peak list starts with a header that defines the experiment type and the order of dimensions. For instance, for HNCA.peaks:
# Number of dimensions 3
#FORMAT xeasy3D
#INAME 1 HN
#INAME 2 C
#INAME 3 N
#SPECTRUM HNCA HN C N
      5   6.475  58.033  98.548 1 U   2.769E+02  0.000E+00 e 0     0     0     0
      6   6.476  62.123  98.126 1 U   2.571E+01  0.000E+00 e 0     0     0     0
      7   6.475  54.017  98.159 1 U   2.547E+01  0.000E+00 e 0     0     0     0
The first line specifies the number of dimensions (3 in this case). The '#SPECTRUM' line gives the experiment type (HNCA, which refers to the corresponding experiment definition in the CYANA library), followed by an identifier for each dimension of the peak list (HN C N) that specifies which chemical shift is stored in the corresponding dimension of the peak list. These labels must match those in the corresponding experiment definition in the general CYANA library (see below). The '#SPECTRUM' line is followed by one line for every peak. For example, the first peak in the 'HNCA.peaks' list has
- Peak number 5
- HN chemical shift 6.475 ppm
- C (CA) chemical shift 58.033 ppm
- N chemical shift 98.548 ppm
The other data are irrelevant for automated chemical shift assignment with FLYA. In particular, the peak volume or intensity (2.769E+02) is not used by the algorithm.
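The peak list layout described above can be read with a few lines of Python. This is only a minimal sketch for inspecting peak lists outside CYANA. The function name 'parse_xeasy_peaks' is hypothetical, and the sketch assumes that the '#SPECTRUM' line precedes the peak lines and that every peak occupies a single line.

```python
def parse_xeasy_peaks(text):
    """Parse an XEASY peak list into (experiment, dimension labels, peaks)."""
    experiment, labels, peaks = None, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#SPECTRUM"):
            # Experiment type followed by one label per dimension
            fields = line.split()
            experiment, labels = fields[1], fields[2:]
        elif line.startswith("#"):
            continue  # other header lines (#FORMAT, #INAME, ...)
        else:
            # Peak number, then one chemical shift per dimension;
            # the remaining columns are ignored here
            fields = line.split()
            number = int(fields[0])
            shifts = [float(value) for value in fields[1:1 + len(labels)]]
            peaks.append((number, dict(zip(labels, shifts))))
    return experiment, labels, peaks


example = """\
# Number of dimensions 3
#FORMAT xeasy3D
#INAME 1 HN
#INAME 2 C
#INAME 3 N
#SPECTRUM HNCA HN C N
      5   6.475  58.033  98.548 1 U   2.769E+02  0.000E+00 e 0     0     0     0
      6   6.476  62.123  98.126 1 U   2.571E+01  0.000E+00 e 0     0     0     0
"""

experiment, labels, peaks = parse_xeasy_peaks(example)
print(experiment, labels)
print(peaks[0])
```

The volume and the trailing columns are skipped on purpose, since they are not used for the chemical shift assignment.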
Hint: The formats of other CYANA files are described in the CYANA 3.0 Reference Manual.
Experiment definitions in the CYANA library
When you start CYANA, the program reads the library and displays the full path name of the library file. You can open the standard library file to inspect, for example, the NMR experiment definitions that define which expected peaks are generated by FLYA. For instance, the definition for the HNCA spectrum (search for 'HNCA' in the library file 'cyana.lib') is
SPECTRUM HNCA  HN N C
 0.980  HN:H_AMI  N:N_AM*  C:C_ALI  C_BYL
 0.800  HN:H_AMI  N:N_AMI  (C_ALI)  C_BYL  C:C_ALI
The first line corresponds to the '#SPECTRUM' line in the peak list. It specifies the experiment name and a label for the atoms that are detected in each dimension of the spectrum. The number of labels defines the dimensionality of the experiment (3 in case of HNCA).
Each line below defines a (formal) magnetization transfer pathway that gives rise to an expected peak. In the case of HNCA there are two lines, corresponding to the intraresidual and the sequential peak. For instance, the definition for the intraresidual peak starts with the probability to observe the peak (0.980), followed by a series of atom types, e.g. H_AMI for amide proton etc. An expected peak is generated for each molecular fragment in which these atom types occur connected by single covalent bonds. The atoms whose chemical shifts appear in the spectrum are identified by their labels followed by ':', e.g. for HNCA 'HN:', 'N:', and 'C:'.
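To make the structure of such a definition concrete, the following Python sketch splits the HNCA entry into its experiment name, dimension labels, and pathways. It is an illustration only: the function name 'parse_spectrum_definition' is hypothetical, and the sketch does not reproduce how CYANA itself reads the library or searches for molecular fragments.

```python
definition = """\
SPECTRUM HNCA  HN N C
 0.980  HN:H_AMI  N:N_AM*  C:C_ALI  C_BYL
 0.800  HN:H_AMI  N:N_AMI  (C_ALI)  C_BYL  C:C_ALI
"""


def parse_spectrum_definition(text):
    """Return (name, labels, pathways) for one SPECTRUM entry."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0]
    name, labels = header[1], header[2:]
    pathways = []
    for fields in lines[1:]:
        probability = float(fields[0])  # probability to observe the peak
        atom_types = fields[1:]         # atom types along the pathway
        # Atoms that appear in the spectrum carry a 'label:' prefix
        observed = {}
        for token in atom_types:
            if ":" in token:
                label, atom_type = token.split(":", 1)
                observed[label] = atom_type
        pathways.append((probability, atom_types, observed))
    return name, labels, pathways


name, labels, pathways = parse_spectrum_definition(definition)
print(name, labels)
for probability, atom_types, observed in pathways:
    print(probability, observed)
```

The number of labels on the first line (three here) gives the dimensionality, and each pathway contributes exactly one observed atom type per label.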
FLYA execution script
The CYANA script 'CALC.cya' contains the commands to perform the automated resonance assignment. It starts with the specification of the names of the input peak lists:
peaks:=N15NOESY,C13NOESY,C13HSQC,N15HSQC,HCCHTOCSY,HCCHCOSY,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH,HBHAcoNH,CcoNH,HCcoNH
The peak list names are separated by commas (without blanks!). The files on disk have the file name extension .peaks, e.g. N15NOESY.peaks.
The commands above will use all available peak lists. You can choose any subset of them by modifying the 'peaks:=...' statement.
Try it! For instance, to make only backbone assignment using the 15N-HSQC and triple resonance backbone assignment spectra, you can set
peaks:=N15HSQC,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH
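Before starting a calculation it can be useful to check that every name in the 'peaks:=...' statement has a matching file on disk. The Python sketch below does this outside CYANA. The helper names 'peak_list_files' and 'missing_peak_lists' are hypothetical and not part of CYANA.

```python
from pathlib import Path


def peak_list_files(peaks, directory="."):
    """Map a comma-separated 'peaks' value to the expected .peaks file paths."""
    if any(ch.isspace() for ch in peaks):
        raise ValueError("peak list names must be separated by commas without blanks")
    return [Path(directory) / f"{name}.peaks" for name in peaks.split(",")]


def missing_peak_lists(peaks, directory="."):
    """Return the names of expected .peaks files that are not present on disk."""
    return [path.name for path in peak_list_files(peaks, directory) if not path.is_file()]


# Backbone-only selection from the text, checked in the current directory
backbone = "N15HSQC,HNCA,HNcaCO,HNCO,HNcoCA,CBCANH,CBCAcoNH"
print("missing:", missing_peak_lists(backbone))
```

Run in the 'flya01' directory, an empty list indicates that all selected peak lists are available.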
These are followed by tolerances for chemical shift matching:
assigncs_accH=0.03
assigncs_accC=0.4
assigncs_accN=assigncs_accC
tolerance:=$assigncs_accH,$assigncs_accH,$assigncs_accC
In this case a tolerance of 0.03 ppm will be used for protons, and 0.4 ppm will be used for carbon and nitrogen.
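What "within tolerance" means for a single peak can be sketched in Python as follows. The tolerance values are those given above. The expected shift values in the example are made up for illustration, and the sketch shows only the per-dimension comparison, not how FLYA uses the tolerances internally.

```python
# Tolerances in ppm, as set by assigncs_accH, assigncs_accC and assigncs_accN
TOLERANCE = {"H": 0.03, "C": 0.4, "N": 0.4}


def within_tolerance(measured, expected, nuclei):
    """True if every dimension of the peak agrees with the expected shift."""
    return all(
        abs(m - e) <= TOLERANCE[nucleus]
        for m, e, nucleus in zip(measured, expected, nuclei)
    )


# Peak 5 of HNCA.peaks (HN, C, N dimensions)
peak = (6.475, 58.033, 98.548)
nuclei = ("H", "C", "N")

# Hypothetical expected positions, chosen only to show both outcomes
print(within_tolerance(peak, (6.490, 58.300, 98.700), nuclei))  # all three dimensions agree
print(within_tolerance(peak, (6.520, 58.300, 98.700), nuclei))  # proton differs by 0.045 ppm
```

A peak matches only if all of its dimensions agree, so the narrow proton tolerance is usually the limiting criterion.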
Parameters for the FLYA algorithm come next:
shiftassign_population:=25
shiftassign_iterations:=15000
shiftassign_quick=1
analyzeassign_group:=CONBB: N H CA C CB / CONSOLIDATED, CONALL: CONSOLIDATED, BB: N H CA C CB, ALL:*
randomseed := 3771
These define
- The population size for the genetic algorithm, i.e. how many assignments form one generation (25; chosen smaller than in normal production runs in order to speed up the calculation)
- The maximal number of iterations during local optimization (15000).
- An option to choose the "quick" optimization schedule.
- Groups of atoms for which assignment statistics will be calculated and reported in the 'flya.txt' output file.
- The seed value for the random number generator. Any positive integer number is fine.
There might be a command to restrict the generation of expected peaks to a subset of atoms:
command select_atoms
  atoms select "* - CZ ?H* @ARG - ?Z @LYS"
end
Here, the zeta atoms of Arg and Lys are excluded, i.e. no expected peaks will be generated for these atoms (because they are only rarely observed in the spectra).
Specific labeling can be handled in the same way, and peak list-specific atom selections can be defined (not in this practical):
command XXX_select
  atoms select "..."
end
Finally, there is the command to start the FLYA algorithm:
flya runs=10 shiftreference=ref.prot structurepeaks=$structurepeaks assignpeaks=$assignpeaks
Here, the parameters of the 'flya' command specify the following:
- The number of independent runs of the algorithm (10), from which the consolidated shifts will be calculated (chosen smaller than in normal production runs in order to speed up the calculation).
- The results will be compared with the reference chemical shifts in the file 'ref.prot' (which have been determined independently by conventional methods). The reference chemical shifts will not be used by the algorithm but only for a subsequent analysis of its results.
- The input peak lists that will be used (as defined above).
Run the FLYA calculation
To run the FLYA calculation, you start CYANA and execute the 'ASSIGN.cya' script:
cyana "nproc=2; ASSIGN"
By specifying the 'nproc=2' command, the independent runs of the algorithm will be performed in parallel. On a computer with multiple processors this will speed up the calculation, which is expected to take a few minutes.
FLYA output files
The FLYA algorithm will produce the following output files:
- flya.prot: Consensus assigned chemical shifts. This file contains a chemical shift for every atom that has been assigned to at least one peak.
- flya.tab: Table with details about the chemical shift assignment of each atom (comparison with reference shifts). In this file you can see for each atom whether the assignment is "strong" (self-consistent) or "weak" (only tentative).
- flya.txt: Assignment statistics
- flya.pdf: Graphical representation of the assignment results
- XXX_exp.peaks: List of expected peaks, corresponding to input peak list XXX.peaks
- XXX_asn.peaks: Assigned peak list, corresponding to input peak list XXX.peaks
The flya.txt file
This output file starts with overall assignment statistics for each group of atoms as defined by 'analyzeassign_group:=...' in 'ASSIGN.cya':
____________________________________________________________
CHEMICAL SHIFT ASSIGNMENT
____________________________________________________________
SEED: 1
chemical shifts for 1350 atoms found
Peaks assigned from frequencies
CONBB: REFERENCES(2):494 CHEMICALSHIFTS(1):498 (1)and(2):494 MATCH:491(99.4% of (2))
CONALL: REFERENCES(2):1096 CHEMICALSHIFTS(1):1114 (1)and(2):1096 MATCH:1047(95.5% of (2))
BB: REFERENCES(2):512 CHEMICALSHIFTS(1):542 (1)and(2):512 MATCH:498(97.3% of (2))
ALL: REFERENCES(2):1264 CHEMICALSHIFTS(1):1350 (1)and(2):1261 MATCH:1141(90.3% of (2))
- REFERENCES(2) is the number of reference assignments (in the selected group)
- CHEMICALSHIFTS(1) is the number of atoms assigned by FLYA
- (1)and(2) is the number of atoms that are assigned by FLYA and in the reference.
- MATCH is the number of atoms with the same assignment by FLYA and in the reference. The percentage is relative to the number of reference assignments.
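The relation between these numbers can be checked with a short Python sketch that reads one statistics line and recomputes the percentage as MATCH divided by REFERENCES(2). The regular expression and the function name 'parse_stat_line' are assumptions based on the example output shown above, not an official format specification.

```python
import re

# Pattern derived from the example lines in 'flya.txt' shown above
PATTERN = re.compile(
    r"(?P<group>\w+):\s+REFERENCES\(2\):(?P<ref>\d+)\s+"
    r"CHEMICALSHIFTS\(1\):(?P<shifts>\d+)\s+"
    r"\(1\)and\(2\):(?P<both>\d+)\s+MATCH:(?P<match>\d+)"
)


def parse_stat_line(line):
    """Extract the counts of one group and recompute the match percentage."""
    found = PATTERN.search(line)
    if found is None:
        return None
    stats = {key: int(found.group(key)) for key in ("ref", "shifts", "both", "match")}
    stats["group"] = found.group("group")
    # The percentage is relative to the number of reference assignments
    stats["match_percent"] = 100.0 * stats["match"] / stats["ref"]
    return stats


line = "CONBB: REFERENCES(2):494 CHEMICALSHIFTS(1):498 (1)and(2):494 MATCH:491(99.4% of (2))"
stats = parse_stat_line(line)
print(stats["group"], round(stats["match_percent"], 1))
```

For the CONBB group this gives 491 / 494, which rounds to the 99.4% reported in the file.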
Further below comes a table with information about each peak list:
Lists      #Expected  noRef  noPeak        Assigned           Match  #Measured        Assigned  exp/meas Assigned
N15NOESY        1495    177     499    940( 62.88%)    790( 52.84%)       3008    768( 25.53%)                1.2
C13NOESY        4852    304    2074   2786( 57.42%)   2137( 44.04%)      10886   2178( 20.01%)                1.3
C13HSQC          556     27      85    453( 81.47%)    387( 69.60%)        407    335( 82.31%)                1.4
N15HSQC          135     16       4    128( 94.81%)    113( 83.70%)        131    113( 86.26%)                1.1
HCCHTOCSY       2796     87    1445   1289( 46.10%)   1000( 35.77%)       2363    971( 41.09%)                1.3
HCCHCOSY        1926     69    1102    869( 45.12%)    673( 34.94%)       2005    661( 32.97%)                1.3
HNCA             211     15      11    192( 91.00%)    182( 86.26%)        329    175( 53.19%)                1.1
HNcaCO           211     15      11    193( 91.47%)    179( 84.83%)        246    175( 71.14%)                1.1
HNCO             105      7       1    101( 96.19%)     95( 90.48%)        158     97( 61.39%)                1.0
HNcoCA           105      7       0    103( 98.10%)     96( 91.43%)        158     98( 62.03%)                1.1
CBCANH           399     26      25    361( 90.48%)    344( 86.22%)        623    335( 53.77%)                1.1
CBCAcoNH         200     13       2    196( 98.00%)    183( 91.50%)        324    187( 57.72%)                1.0
HBHAcoNH         288     20      82    207( 71.88%)    188( 65.28%)        364    183( 50.27%)                1.1
CcoNH            370     16      53    311( 84.05%)    277( 74.86%)        365    287( 78.63%)                1.1
HCcoNH           540     22     225    313( 57.96%)    276( 51.11%)        442    256( 57.92%)                1.2
ALL            14189    821    5619   8442( 59.50%)   6920( 48.77%)      21809   6819( 31.27%)                1.2
It contains the following data:
- #Expected: Total number of expected peaks
- noRef: Number of expected peaks with missing reference shifts
- noPeak: Number of expected peaks for which no peak can be measured
- Assigned: Number of expected peaks that could be assigned based on the reference chemical shift assignments. The theoretical maximum of 100% corresponds to the situation that the spectra “explain” all expected peaks. Each expected peak can be mapped to at most one measured peak. Remaining expected peaks correspond to missing peaks in the measured peak list.
- Match: Number of assigned peaks that fit (within tolerance) reference shifts. The theoretical maximum of 100% corresponds to having all measured peaks assigned. Note that several expected peaks can be mapped to the same measured peak, i.e. the assignments of measured peaks can be unambiguous or ambiguous. Remaining unassigned measured peaks are likely to be artifacts.
- #Measured: Total number of peaks in peak list
- Assigned: Number of measured peaks that could be assigned to expected peaks
- exp/meas: Ratio of assigned expected and measured peaks
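The percentages and the ratio in this table can be recomputed from the raw counts. The Python sketch below does so for the N15NOESY row. Which denominator belongs to which column is inferred from the numbers in the example table rather than from a format specification, and the function name 'peak_list_percentages' is hypothetical.

```python
def peak_list_percentages(expected, assigned_expected, matched, measured, assigned_measured):
    """Recompute the derived columns of one row of the peak list table."""
    return {
        # Assigned expected peaks, relative to all expected peaks
        "assigned_expected_pct": round(100.0 * assigned_expected / expected, 2),
        # Peaks matching the reference, relative to all expected peaks
        "match_pct": round(100.0 * matched / expected, 2),
        # Assigned measured peaks, relative to all measured peaks
        "assigned_measured_pct": round(100.0 * assigned_measured / measured, 2),
        # Assigned expected peaks per assigned measured peak
        "exp_per_meas": round(assigned_expected / assigned_measured, 1),
    }


# Counts taken from the N15NOESY row
n15noesy = peak_list_percentages(
    expected=1495, assigned_expected=940, matched=790,
    measured=3008, assigned_measured=768,
)
print(n15noesy)
```

A ratio above 1 reflects that several expected peaks can be mapped to the same measured peak.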
There is more information on the results of the assignment calculation in the 'flya.txt' file (not described here).
The flya.tab file
This file provides information about the chemical shift assignment of each individual atom:
Atom Residue       Ref     Shift      Dev  Extent  inside   inref
N    LYS  11   120.779   120.805   -0.026     5.0   100.0   100.0  strong=
H    LYS  11     7.516     7.514    0.002     5.0   100.0   100.0  strong=
CA   LYS  11    55.129    54.976    0.153     5.0   100.0   100.0  strong=
HA   LYS  11     4.244     4.241    0.003     5.0    99.9   100.0  strong=
CB   LYS  11    31.380    31.357    0.023     5.0   100.0   100.0  strong=
HB2  LYS  11     1.490     1.486    0.004     5.0   100.0   100.0  strong=
HB3  LYS  11     1.490     1.487    0.003     5.0   100.0   100.0  strong=
CG   LYS  11    25.362    25.427   -0.065     5.0   100.0   100.0  strong=
HG2  LYS  11     1.270     1.263    0.007     5.0   100.0   100.0  strong=
HG3  LYS  11     1.446     1.456   -0.010     5.0    99.8   100.0  strong=
CD   LYS  11    29.374    29.445   -0.071     5.0    65.6    60.0        =
HD2  LYS  11     1.495     1.487    0.008     5.0    59.9    60.0        =
HD3  LYS  11     1.495     2.045   -0.550     5.0    91.4     0.0  strong! (HB3 10)
CE   LYS  11    42.023    42.013    0.010     5.0   100.0   100.0  strong=
HE2  LYS  11     2.744     2.749   -0.005     5.0    99.9   100.0  strong=
HE3  LYS  11     2.824     2.758    0.066     5.0    74.2     0.0        ! (HE2)
C    LYS  11   174.616   176.282   -1.666     5.0    80.0    20.0  strong! (C 10)
- Ref: Chemical shift value in the reference chemical shift list (ref.prot). It was not used in the calculation.
- Shift: Consensus chemical shift value from FLYA
- Dev = Ref - Shift
- Extent: Number of runs in which the atom was assigned by FLYA.
- Inside: Percentage of chemical shift values from the (5) independent runs of FLYA that agree (within the tolerance) with the consensus value.
- inref: Percentage of chemical shift values from the (5) independent runs of FLYA that agree (within the tolerance) with the reference value.
- Outcome of the assignment:
  - strong: "strong" assignment, i.e. Inside > 80%.
  - =: Assignment that agrees with the reference, i.e. |Dev| < tolerance.
  - !: Assignment that does not agree with the reference, i.e. |Dev| > tolerance.
  - (atom name): Atom to which the FLYA shift belongs according to the reference (i.e. the correct assignment), reported if it is in the same residue (no residue number given) or in a neighboring residue.
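To list, for example, all atoms whose assignment disagrees with the reference, the rows of 'flya.tab' can be read with a small Python sketch such as the one below. It assumes whitespace-separated columns in the order shown in the example above and handles only rows that contain all columns. The function name 'parse_tab_row' and the dictionary keys are hypothetical.

```python
def parse_tab_row(line):
    """Split one row of flya.tab into named fields."""
    fields = line.split()
    atom, residue, number = fields[0], fields[1], int(fields[2])
    ref, shift, dev, extent, inside, inref = (float(value) for value in fields[3:9])
    outcome = fields[9]  # e.g. 'strong=', '=', 'strong!' or '!'
    return {
        "atom": atom,
        "residue": residue,
        "number": number,
        "ref": ref,
        "shift": shift,
        "dev": dev,
        "extent": extent,
        "inside": inside,
        "inref": inref,
        "strong": outcome.startswith("strong"),
        "agrees": outcome.endswith("="),
        # Optional '(atom name)' entry at the end of the row
        "hint": " ".join(fields[10:]).strip("()") or None,
    }


rows = [
    "CD   LYS  11    29.374    29.445   -0.071     5.0    65.6    60.0        =",
    "HD3  LYS  11     1.495     2.045   -0.550     5.0    91.4     0.0  strong! (HB3 10)",
]
for row in map(parse_tab_row, rows):
    status = "agrees" if row["agrees"] else "differs"
    print(row["atom"], row["number"], status, "strong" if row["strong"] else "weak", row["hint"])
```

Strong assignments that differ from the reference, such as HD3 of Lys 11 here, are the cases most worth inspecting by hand.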
The flya.pdf file
This PDF file provides a graphical representation of the 'flya.tab' file. Each assignment for an atom is represented by a colored rectangle.
- Green: Assignment by FLYA agrees with the manually determined reference assignment (within tolerance)
- Red: Assignment by FLYA does not agree with the manually determined reference assignment
- Blue: Assigned by FLYA but no reference available
- Black: With reference assignment but not assigned by FLYA.
Respective light colors indicate assignments not classified as strong by the chemical shift consolidation. The row labeled HN/Hα shows for each residue HN on the left and Hα in the center. The N/Cα/C’ row shows for each residue the N, Cα, and C’ assignments from left to right. The rows β-η show the side-chain assignments for the heavy atoms in the center and hydrogen atoms to the left and right. In the case of branched side-chains, the corresponding row is split into an upper part for one branch and a lower part for the other branch.
Using input chemical shifts: shift predictions or partial assignments (optional)
Input chemical shifts can be used in three ways.
These shifts will only be used for comparison (e.g. in flya.tab, flya.txt, flya.pdf):
shiftassign_reference:=ref.prot
Shifts and standard deviations in the file 'predicted.prot' (not provided in this practical) will replace the general statistics from cyana.lib (CSTABLE):
shiftassign_statistics:=predicted.prot
Shifts in the file 'fix.prot' will be fixed to the input values:
shiftassign_fix:=fix.prot
The latter approach can for instance be used to perform sidechain assignment when the backbone assignment is already known.
If you want to do this, copy the original data to a new directory:
cd ~/guentert
tar zxf Flyaembo.tgz
mv flyaembo flyasc
cd flyasc
Then make a list of only the reference backbone chemical shifts. Start CYANA. In CYANA, enter the commands
read ref.prot
atom set "* - H N CA CB C" shift=none
write fix.prot
q
The file 'fix.prot' will contain the reference chemical shifts only for the backbone (and CB) atoms H, N, CA, CB, C'. Now you can repeat the assignment calculation by inserting the 'shiftassign_fix:=fix.prot' statement in 'ASSIGN.cya' and choosing only the input peak lists that are relevant for sidechain assignment:
shiftassign_fix:=fix.prot
noesy:=N15NOESY,C13NOESY
assignpeaks:=C13H1,N15H1,HCCH24,HCCH7,HBHACONH,C_CO_NH,HC_CO_NH
Automated NOESY assignment and structure calculation
Automated NOE restraint assignment and the structure calculation by torsion angle dynamics can be done with the 'CALC.cya' macro. The 'flya.prot' file from the automated resonance assignment will be used together with the (unassigned) NOESY peak lists to assign the NOESY peaks and to generate distance restraints in order to compute the three-dimensional structure of the protein.
TALOS+ can be used to generate torsion angle restraints from the backbone chemical shifts in 'flya.prot'. To do this, use the CYANA commands
read flya.prot
talos
write talos.aco
This will call the program TALOS+ and store the resulting torsion angle restraints in the file 'talos.aco'.
For further information about automated NOESY assignment you can consult the Tutorial Structure calculation with automated NOESY assignment (which uses different file names than we have here).
To speed up the calculation, you can set in 'CALC.cya':
structures:=50,10
steps=5000
These commands tell the program to calculate, in each cycle, 50 conformers, and to analyze the best 10 of them. 5000 torsion angle dynamics steps will be applied per conformer.
Seven cycles of automated NOE assignment and structure calculation will be performed by running the command
cyana "nproc=2; CALC"
Statistics on the NOE assignment and the structure calculation will be in the file 'Table', which can also be produced with the command 'cyanatable -lp'.
The final structure will be 'final.pdb'. You can visualize it, for example, with the command
molmol -r 8-110 final.pdb
The optimal residue range for superposition can be found with the command
cyana overlay final.pdb
or with the CYRANGE web server.
Download results of fully automated structure calculation
If you cannot complete the fully automated structure calculation but want to look at the results that have been calculated previously, you may download them here (about 24 MB).