Automated resonance assignment with FLYA (EMBO 2013): Difference between revisions
Revision as of 13:31, 22 July 2013
In this tutorial we want to determine the resonance assignments and the structure of a protein using the program CYANA.
CYANA setup for the EMBO Practical Course NMR in Basel (20–27 July 2013)
Please follow these steps carefully (exact Linux commands are given below):
- From your home directory, change to the directory 'guentert'.
- Execute the setup script.
- Download the input data for the practical.
- Unpack the input data for the practical.
- Change into the subdirectory 'flyaembo' where the input data for the practical are.
- Test whether CYANA can be started by typing its name, 'cyana'.
- Exit from CYANA by typing 'q' or 'quit'.
 cd ~/guentert
 source setup-guentert
 wget 'http://www.cyana.org/wiki/images/3/31/Flyaembo.tgz'
 tar zxf Flyaembo.tgz
 cd flyaembo
 cyana
 ___________________________________________________________________
 CYANA 3.96 (linux-intel)
 Copyright (c) 2002-12 Peter Guentert. All rights reserved.
 ___________________________________________________________________
 Time-limited license valid until 2013-12-31.
 Library file "/home/guentert/src/cyana-3.96/lib/cyana.lib" read, 38 residue types.
 Sequence file "demo.seq" read, 114 residues.
 cyana> q
If all worked, you are ready to go!
If you want to return to your practical later, using your own Linux or Mac OS X computer, you can download the demo version of CYANA from here. This is not necessary on the workshop computers where the software is already installed.
Experimental input data
The protein sequence is stored in three-letter code in the file 'demo.seq'.
The following spectra have been measured:
- [1H,13C]-HSQC (called 'C13H1' in FLYA)
- [1H,15N]-HSQC (called 'N15H1' in FLYA)
- HNCA
- HN(CO)CA (called 'HN_CO_CA' in FLYA)
- HNCO
- HN(CA)CO (called 'HN_CA_CO' in FLYA)
- CBCANH
- CBCACONH
- HBHACONH
- HCCH-TOCSY (called 'HCCH24' in FLYA)
- HCCH-COSY (called 'HCCH7' in FLYA)
- C(CO)NH (called 'C_CO_NH' in FLYA)
- HC(CO)NH (called 'HC_CO_NH' in FLYA)
Peak lists in XEASY format that have been prepared by automatic peak picking with the program NMRView are stored in files XXX.peaks, where XXX denotes the FLYA spectrum type.
Each peak list starts with a header that defines the experiment type and the order of dimensions. For instance, for HNCA.peaks:
 # Number of dimensions 3
 #FORMAT xeasy3D
 #INAME 1 HN
 #INAME 2 C
 #INAME 3 N
 #SPECTRUM HNCA HN C N
 5 6.475 58.033 98.548 1 U 2.769E+02 0.000E+00 e 0 0 0 0
 6 6.476 62.123 98.126 1 U 2.571E+01 0.000E+00 e 0 0 0 0
 7 6.475 54.017 98.159 1 U 2.547E+01 0.000E+00 e 0 0 0 0
The first line specifies the number of dimensions (3 in this case). The '#SPECTRUM' line gives the experiment type (HNCA, which refers to the corresponding experiment definition in the CYANA library), followed by an identifier for each dimension of the peak list (HN C N) that specifies which chemical shift is stored in the corresponding dimension of the peak list. These labels must match those in the corresponding experiment definition in the general CYANA library (see below). The '#SPECTRUM' line is followed by one line for every peak. For example, the first peak in the 'HNCA.peaks' list has
- Peak number 5
- HN chemical shift 6.475 ppm
- C (CA) chemical shift 58.033 ppm
- N chemical shift 98.548 ppm
The other data are irrelevant for automated chemical shift assignment with FLYA. In particular, the peak volume or intensity (2.769E+02) is not used by the algorithm.
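The peak list layout described above can be read with a few lines of Python. This is only a minimal sketch for inspecting peak lists outside CYANA; the function name 'parse_peak_list' and the 'Peak' class are hypothetical helpers, not part of CYANA, and the sketch assumes that the '#SPECTRUM' line precedes the peak lines and that the chemical shifts directly follow the peak number.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    number: int
    shifts: dict  # dimension label -> chemical shift in ppm


def parse_peak_list(text):
    """Return (experiment, labels, peaks) from XEASY-style peak list text."""
    experiment, labels, peaks = None, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#SPECTRUM"):
            # '#SPECTRUM <experiment> <one label per dimension>'
            fields = line.split()
            experiment, labels = fields[1], fields[2:]
        elif line.startswith("#"):
            continue  # remaining header lines are not needed here
        else:
            fields = line.split()
            # Peak number first, then one shift per dimension; the rest is ignored.
            ppm = [float(x) for x in fields[1:1 + len(labels)]]
            peaks.append(Peak(int(fields[0]), dict(zip(labels, ppm))))
    return experiment, labels, peaks


HNCA_PEAKS = """\
# Number of dimensions 3
#FORMAT xeasy3D
#INAME 1 HN
#INAME 2 C
#INAME 3 N
#SPECTRUM HNCA HN C N
5 6.475 58.033 98.548 1 U 2.769E+02 0.000E+00 e 0 0 0 0
6 6.476 62.123 98.126 1 U 2.571E+01 0.000E+00 e 0 0 0 0
7 6.475 54.017 98.159 1 U 2.547E+01 0.000E+00 e 0 0 0 0
"""

experiment, labels, peaks = parse_peak_list(HNCA_PEAKS)
print(experiment, labels)
for peak in peaks:
    print(peak.number, peak.shifts)
```

Keying the shifts by dimension label rather than by column position keeps the result independent of the order of dimensions in a particular peak list.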
Hint: The formats of other CYANA files are described in the CYANA 3.0 Reference Manual.
FLYA initialization script
The CYANA commands to run the automated assignment calculation are stored in two CYANA scripts or "macros".
One has the fixed name 'init.cya' and is executed automatically each time CYANA is started. It can also be called whenever one wants to reinitialize the program. It normally contains at least two commands that read the CYANA library and the protein sequence:
 cyanalib
 read demo.seq
The command 'cyanalib' reads the standard CYANA library. The second command reads the protein sequence.
Experiment definitions in the CYANA library
When you start CYANA, the program reads the library and displays the full path name of the library file. You can open the standard library file to inspect, for example, the NMR experiment definitions that define which expected peaks are generated by FLYA. For instance, the definition for the HNCA spectrum (search for 'HNCA') is
 SPECTRUM HNCA HN N C
 0.980 HN:H_AMI N:N_AM* C_A* C_BYL C:C_ALI
 0.800 HN:H_AMI N:N_AMI C_BYL C_ALI N_AMI C:C_ALI
The first line corresponds to the '#SPECTRUM' line in the peak list. It specifies the experiment name and a label for the atoms that are detected in each dimension of the spectrum. The number of labels defines the dimensionality of the experiment (3 in case of HNCA).
Each line below defines a (formal) magnetization transfer pathway that gives rise to an expected peak. In the case of HNCA there are two lines, corresponding to the intraresidual and sequential peaks. For instance, the definition for the intraresidual peak starts with the probability to observe the peak (0.980), followed by a series of atom types, e.g. H_AMI for amide proton etc. An expected peak is generated for each molecular fragment in which these atom types occur connected by single covalent bonds. The atoms whose chemical shifts appear in the spectrum are identified by their labels followed by ':', e.g. for HNCA 'HN:', 'N:', and 'C:'.
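To make the structure of such a definition concrete, the following Python sketch splits the HNCA entry into its experiment name, dimension labels, and pathways. It is an illustration only: 'parse_spectrum_definition' is a hypothetical helper, the line breaks of the entry are assumed to fall before each probability, and the sketch does not attempt to reproduce how CYANA itself generates expected peaks.

```python
def parse_spectrum_definition(text):
    """Split a SPECTRUM entry into (name, dimensions, pathways).

    Each pathway is (probability, steps); each step is (label, atom_type),
    where label is None for atoms whose shift is not observed.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    if header[0] != "SPECTRUM":
        raise ValueError("definition must start with a SPECTRUM line")
    name, dimensions = header[1], header[2:]
    pathways = []
    for fields in lines[1:]:
        probability = float(fields[0])
        steps = []
        for token in fields[1:]:
            # 'HN:H_AMI' -> observed atom with label HN; 'C_BYL' -> unobserved atom
            label, sep, atom_type = token.partition(":")
            steps.append((label, atom_type) if sep else (None, token))
        pathways.append((probability, steps))
    return name, dimensions, pathways


HNCA_DEFINITION = """\
SPECTRUM HNCA HN N C
0.980 HN:H_AMI N:N_AM* C_A* C_BYL C:C_ALI
0.800 HN:H_AMI N:N_AMI C_BYL C_ALI N_AMI C:C_ALI
"""

name, dimensions, pathways = parse_spectrum_definition(HNCA_DEFINITION)
print(name, dimensions)
for probability, steps in pathways:
    observed = [label for label, _ in steps if label is not None]
    print(probability, observed, len(steps), "atoms in pathway")
```

In both pathways the observed labels are exactly the three dimension labels of the 'SPECTRUM' line, which is what makes the experiment three-dimensional.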
FLYA execution script
The CYANA script 'RUNFLYA.cya' contains the commands to perform the automated resonance assignment. It starts with the specification of the names of the input peak lists:
 noesy:=N15NOESY,C13NOESY
 scalar:=C13H1,N15H1,HCCH24,HCCH7,HNCA,HN_CA_CO,HNCO,HN_CO_CA,CBCANH,CBCACONH,HBHACONH,C_CO_NH,HC_CO_NH
 #scalar:=C13H1,N15H1,CBCACONH,C_CO_NH,HC_CO_NH
"Through-space" experiments are given in the 'noesy:=...' statement; "through-bond" ones in the 'scalar:=...' statement. The peak list names are separated by commas (without blanks!).
The commands above will use all available peak lists. You can choose any subset of them by modifying the 'noesy:=...' or 'scalar:=...' statements.
Try it! For instance, to make only backbone assignments using the triple-resonance backbone assignment spectra, you can set
 noesy:=
 scalar:=HNCA,HN_CA_CO,HNCO,HN_CO_CA,CBCANH,CBCACONH,HBHACONH
These are followed by tolerances for chemical shift matching:
 tolerance:=0.03,0.03,0.4
 assigncs_accH:=tolerance(1)
 assigncs_accC:=tolerance(3)
 assigncs_accN:=tolerance(3)
In this case a tolerance of 0.03 ppm will be used for protons, and 0.4 ppm will be used for carbon and nitrogen.
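As a rough illustration of what "matching within the tolerance" means, the Python sketch below tests whether each measured peak lies within the tolerances of a set of candidate chemical shifts. This is a simplified assumption about the matching criterion and not the FLYA scoring itself; the function 'matches', the rule that the first letter of a dimension label identifies the nucleus, and the candidate shifts are all made up for the example. The peak positions are those from 'HNCA.peaks' shown earlier.

```python
# Tolerances in ppm per nucleus, as set by tolerance:=0.03,0.03,0.4 with
# assigncs_accH = tolerance(1), assigncs_accC = assigncs_accN = tolerance(3).
TOLERANCE = {"H": 0.03, "C": 0.4, "N": 0.4}


def matches(peak, shifts, tolerance=TOLERANCE):
    """True if every peak dimension lies within tolerance of the candidate shift."""
    for label, position in peak.items():
        nucleus = label[0]  # assumption: first letter of the label names the nucleus
        if abs(position - shifts[label]) > tolerance[nucleus]:
            return False
    return True


# Peak positions from the HNCA.peaks excerpt (peak number -> shifts in ppm).
peaks = {
    5: {"HN": 6.475, "C": 58.033, "N": 98.548},
    6: {"HN": 6.476, "C": 62.123, "N": 98.126},
    7: {"HN": 6.475, "C": 54.017, "N": 98.159},
}

# Hypothetical candidate shifts, chosen only to illustrate the comparison.
candidate = {"HN": 6.48, "C": 58.0, "N": 98.5}

matching = [number for number, peak in peaks.items() if matches(peak, candidate)]
print(matching)
```

All three peaks agree with the candidate in the proton dimension, but only peak 5 also agrees in the carbon dimension, so only it is reported.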
Parameters for the FLYA algorithm come next:
 run_assign_population:=20
 run_assign_iterations:=15000
 run_assign_reference:=ref.prot
 analyze_assign_group:=CONBB: N H CA C CB / CONSOLIDATED, CONALL: CONSOLIDATED, BB: N H CA C CB, ALL:*
 randomseed := 3771
These define
- The population size for the genetic algorithm, i.e. how many assignments form one generation (20; chosen smaller than the
- The maximal number of iterations during local optimization (15000).
- A file with reference chemical shifts (ref.prot). These will not be used by the algorithm but only for a subsequent analysis of its results.
- Groups of atoms for which assignment statistics will be calculated and reported in the 'flya.txt' output file.
- The seed value for the random number generator. Any positive integer number can be used.
There might be a command to restrict the generation of expected peaks to a subset of atoms:
 command select_atoms
   atoms select "* - CZ ?H* @ARG - ?Z @LYS"
 end
Here, the zeta atoms of Arg and Lys are excluded, i.e. no expected peaks will be generated for these atoms (because they are only rarely observed in the spectra).
Specific labeling can be handled in the same way, and peak list-specific atom selections can be defined (not in this practical):
 command XXX_select
   atoms select "..."
 end
Finally, there is the command to start the FLYA algorithm:
flya refprot=ref.prot noesy=$noesy scalar=$scalar runs=5 #stage=1
Here, the given parameters of the 'flya' command specify that
- The results will be compared with the reference chemical shifts in the file 'ref.prot' (which have been determined independently by conventional methods).
- The input peak lists defined above will be used.
- The algorithm will be run independently 5 times, and the consolidated shifts will be calculated from these runs.
Run the FLYA calculation
To run the FLYA calculation, you start CYANA and execute the 'RUNFLYA.cya' script:
cyana "nproc=5; RUNFLYA"
By specifying the 'nproc=5' command, the 5 independent runs of the algorithm will be performed in parallel. On a computer with multiple processors this will speed up the calculation, which is expected to take a few minutes.
FLYA output files
The FLYA algorithm will produce the following output files:
- flya.prot: Consensus assigned chemical shifts. This file contains a chemical shift for every atom that has been assigned to at least one peak.
- flya.tab: Table with details about the chemical shift assignment of each atom (comparison with reference shifts). In this file you can see for each atom whether the assignment is "strong" (self-consistent) or "weak" (only tentative).
- flya.txt: Assignment statistics
- flya.pdf: Graphical representation of the assignment results
- XXX_exp.peaks: List of expected peaks, corresponding to input peak list XXX.peaks
- XXX_asn.peaks: Assigned peak list, corresponding to input peak list XXX.peaks
The flya.txt file
This output file starts with overall assignment statistics for each group of atoms as defined by 'analyze_assign_group:=...' in 'RUNFLYA.cya':
 ____________________________________________________________
 CHEMICAL SHIFT ASSIGNMENT
 ____________________________________________________________
 SEED: 1
 assigned from frequencies
 CONBB: REF 493 [FREQ 493 [ASN 493 T 489(99.2%) F 4 [INNERRES 0]]] ADDFREQ 11 [ASN 11]
 CONALL: REF 1141 [FREQ 1141 [ASN 1139 T 1073(94.0%) F 68 [INNERRES 45]]] ADDFREQ 30 [ASN 29]
 BB: REF 512 [FREQ 512 [ASN 512 T 502(98.0%) F 10 [INNERRES 4]]] ADDFREQ 30 [ASN 28]
 ALL: REF 1264 [FREQ 1261 [ASN 1259 T 1151(91.1%) F 110 [INNERRES 63]]] ADDFREQ 89 [ASN 83]
- REF is the number of reference assignments (in the selected group)
- FREQ is the number of atoms with reference assignments that could in principle be assigned by FLYA
- ASN is the number of atoms assigned by FLYA
- T is the number of correct FLYA assignments that agree (within the tolerance) with the reference. In the output above, the percentage corresponds to T relative to REF (e.g. 1151/1264 = 91.1% for ALL).
- F is the number of erroneous FLYA assignments that do not agree (within the tolerance) with the reference.
- INNERRES are erroneous assignments for which the correct assignment is in the same residue.
- ADDFREQ are additional atoms (without reference assignments) that could in principle be assigned by FLYA.
- ASN (following ADDFREQ) is the number of these additional atoms (without reference assignments) that were assigned by FLYA.
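The statistics lines have a regular layout, so they can be extracted with a short script, for example to compare several FLYA runs. The Python sketch below is an assumption about how one might do this, not a CYANA tool; the regular expression is written against the four lines shown above and the field name 'ADD_ASN' for the second ASN value is made up. For these four lines, the printed percentage is reproduced by T divided by REF, and the script checks this rather than assuming it.

```python
import re

# Pattern for one statistics line of flya.txt, as in the example output.
STAT_LINE = re.compile(
    r"^(?P<group>\w+):\s+REF\s+(?P<REF>\d+)\s+\[FREQ\s+(?P<FREQ>\d+)\s+"
    r"\[ASN\s+(?P<ASN>\d+)\s+T\s+(?P<T>\d+)\(\s*(?P<pct>[\d.]+)%\)\s+"
    r"F\s+(?P<F>\d+)\s+\[INNERRES\s+(?P<INNERRES>\d+)\]\]\]\s+"
    r"ADDFREQ\s+(?P<ADDFREQ>\d+)\s+\[ASN\s+(?P<ADD_ASN>\d+)\]"
)


def parse_statistics(text):
    """Return {group: {field: value}} for every statistics line found."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m is None:
            continue  # skip headings and other lines
        fields = m.groupdict()
        group = fields.pop("group")
        pct = float(fields.pop("pct"))
        stats[group] = {key: int(value) for key, value in fields.items()}
        stats[group]["pct"] = pct
    return stats


FLYA_TXT = """\
SEED: 1
assigned from frequencies
CONBB: REF 493 [FREQ 493 [ASN 493 T 489(99.2%) F 4 [INNERRES 0]]] ADDFREQ 11 [ASN 11]
CONALL: REF 1141 [FREQ 1141 [ASN 1139 T 1073(94.0%) F 68 [INNERRES 45]]] ADDFREQ 30 [ASN 29]
BB: REF 512 [FREQ 512 [ASN 512 T 502(98.0%) F 10 [INNERRES 4]]] ADDFREQ 30 [ASN 28]
ALL: REF 1264 [FREQ 1261 [ASN 1259 T 1151(91.1%) F 110 [INNERRES 63]]] ADDFREQ 89 [ASN 83]
"""

stats = parse_statistics(FLYA_TXT)
for group, s in stats.items():
    recomputed = 100.0 * s["T"] / s["REF"]
    print(f"{group:7s} T={s['T']:5d} REF={s['REF']:5d} "
          f"reported={s['pct']:.1f}% recomputed={recomputed:.1f}%")
```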
Using input chemical shifts (predictions or partial assignments)
Input chemical shifts can be used in three ways.
These shifts will only be used for comparison (e.g. in flya.tab, flya.txt, flya.pdf):
run_assign_reference:=ref.prot
Shifts and standard deviations in the file 'predicted.prot' (not provided in this practical) will replace the general statistics from cyana.lib (CSTABLE):
run_assign_statistics:=predicted.prot
Shifts in the file 'fix.prot' will be fixed to the input values:
run_assign_fix:=fix.prot
The latter approach can for instance be used to perform sidechain assignment when the backbone assignment is already known.