Automated calculation of a protein-ligand complex structure (AUREMN, Brazil 2018)
In this tutorial we will determine the resonance assignments and the structure of a protein-ligand complex using modules of CYANA.
To this end we will first run FLYA on the complexed protein structure without intermolecular cross peak lists and use noeassign to assign the NOESY spectra and calculate the protein structure by itself.
In a next step we will create a ligand.mol2 from scratch and generate and ligand.lib file. Then we will assign intermolecular peaks lists and redo the structure calculation, this time to the protein-ligand complex.
To finalize you will compare the calculated NMR structure to an Xray structure and generate statistics.
CYANA setup for the AUREMN Practical Course NMR in Campino (24-26 February 2018)
Please follow the following steps carefully (exact Linux commands are given below; you may copy them to a terminal):
- Go to your home directory.
- Get the input data for the practical from the server.
- Unpack the input data for the practical.
- Change into the newly created directory 'cyana'
- Run the setup script 'setupcyana'.
cd ~ cp -r xxxx/xxx/xxx/CyanaAUREMN2018.tgz tar zxf CyanaAUREMN2018.tgz cd cyana ./setupcyana
- Copy the directory 'flyaquick' containing the input data for the practical to 'flyabb'.
- Change into the subdirectory 'flyabb'.
- Test whether CYANA can be started by typing its name, 'cyana'.
- Exit from CYANA by typing 'q' or 'quit'.
cp -r flyaquick flyabb cd flyabb cyana ___________________________________________________________________ CYANA 3.98 (linux64-intel) Copyright (c) 2002-17 Peter Guentert. All rights reserved. ___________________________________________________________________ Demo license valid for specific sequences until 2017-12-31 Library file "/home/guentert_l/cyana/cyana-3.98/lib/cyana.lib" read, 41 residue types. Sequence file "demo.seq" read, 114 residues. cyana> q
If all worked, you are ready to go!
If you want to return to your practical later, using your own Linux or Mac OS X computer, you can download the demo version of CYANA from here.
Hint: More information on the CYANA commands etc. is in the CYANA 3.0 Reference Manual.
Experimental input data
The protein sequence is stored in three-letter code in the file 'demo.seq'.
Experimental peak lists are available for the following spectra:
- HNtrosy (spectrum type 'N15HSQC' in the CYANA library)
- trHNCA (spectrum type 'HNCA' in the CYANA library)
- HNCOCA (spectrum type 'HNcoCA' in the CYANA library)
- HNCACB (spectrum type 'CBCANH' in the CYANA library)
- HCCCHTOCSY (the spectrum type will have to be determined in the first short excercise)
- NTOCSY (spectrum type 'N15TOCSY' in the CYANA library)
- 3D [13C]-resolved NOESY called aro (spectrum type 'C13NOESY' in the CYANA library)
- 3D [13C]-resolved NOESY called cnoesy (spectrum type 'C13NOESY' in the CYANA library)
- 3D [15N]-resolved NOESY called nnoesy (spectrum type 'N15NOESY' in the CYANA library)
Peak lists in XEASY format that have been prepared by automatic peak picking with the program NMRView are stored in files XXX.peaks, where XXX denotes the FLYA spectrum type.
Each peak list starts with a header that defines the experiment type and the order of dimensions. For instance, for HNCA.peaks:
# Number of dimensions 3 #FORMAT xeasy3D #INAME 1 HN #INAME 2 C #INAME 3 N #SPECTRUM HNCA HN C N 5 6.475 58.033 98.548 1 U 2.769E+02 0.000E+00 e 0 0 0 0 6 6.476 62.123 98.126 1 U 2.571E+01 0.000E+00 e 0 0 0 0 7 6.475 54.017 98.159 1 U 2.547E+01 0.000E+00 e 0 0 0 0
The first line specifies the number of dimensions (3 in this case). The '#SPECTRUM' lines gives the experiment type (HNCA, which refers to the corresponding experiment definition in the CYANA library), followed by an identifier for each dimension of the peak list (HN C N) that specifies which chemical shift is stored in the corresponding dimension of the peak list. These labels must match those in the corresponding experiment definition in the general CYANA library (see below). After the '#SPECTRUM' line follows one line for every peak. For example, the first peak in the 'HNCA.peaks' list has
- Peak number 5
- HN chemical shift 6.475 ppm
- C (CA) chemical shift 58.033 ppm
- N chemical shift 98.548 ppm
The other data are irrelevant for automated chemical shift assignment with FLYA. In particular, the peak volume or intensity (2.769E+02) is not used by the algorithm.
Hint: The formats of other CYANA files are described in the CYANA 3.0 Reference Manual.
Experiment definitions in the CYANA library
When you start CYANA, the program reads the library and displays the full path name of the library file. You can open the standard library file to inspect, for example, the NMR experiment definitions that define which expected peaks are generated by FLYA. For instance, the definition for the HNCA spectrum (search for 'HNCA' in the library file 'cyana.lib') is
SPECTRUM HNCA HN N C 0.980 HN:H_AMI N:N_AM* C:C_ALI C_BYL 0.800 HN:H_AMI N:N_AMI (C_ALI) C_BYL C:C_ALI
The first line corresponds to the '#SPECTRUM' line in the peak list. It specifies the experiment name and a label for the atoms that are detected in each dimension of the spectrum. The number of labels defines the dimensionality of the experiment (3 in case of HNCA).
Each line below defines a (formal) magnetization transfer pathway that gives rise to an expected peak. in the case of HNCA there are two lines, corresponding to the intraresidual and sequential peak. For instance, the definition for the intraresidual peak starts with the probability to observe the peak (0.980), followed by a series of atom types, e.g. H_AMI for amide proton etc. An expected peak is generated for each molecular fragment in which these atom types occur connected by single covalent bonds. The atoms whose chemical shifts appear in the spectrum are identified by their labels followed by ':', e.g. for HNCA 'HN:', 'N:', and 'C:'.
Exercise 1: Determine the spectrum type
For the HCCCHTOCSY, determine the spectrum type and put the definition in the HCCCHTOCSY.peaks file in the appropriate syntax.
The experiment is a TOCSY, a through-bond experiment. It allows you to see, in this case, from the backbone all the way out into the side chains.
- Use the less command (to view files in the terminal but not change) to search the spectrum type in the cyana.lib file
Hint: Look at the definitions themselves and not just the SPECTRUM names, to determine which TOCSY is the appropriate one. Take the experiment with the most through-bond transfers.
- Use the vi editor to manipulate the HCCCHTOCSY.peaks file and enter appropriate spectrum type
Hint: For information on how to use the vi editor visit: https://www.cs.colostate.edu/helpdocs/vi.html
Execution scripts or "macros" in cyana
For more complex task within cyana, rather than to enter the execution commands line by line at the cyana prompt, the necessary commands are collected in a file called xxx.cya.
Therefore, CYANA scripts ("macros") 'CALC*.cya' contain the various commands to perform the required tasks.
The init script
One has the fixed name 'init.cya' and is executed automatically each time CYANA is started. It can also be called any time one wants to reinitialize the program. It contains normally at least two commands that read the CYANA library and the protein sequence:
name:=demo rmsdrange:=15-111 cyanalib read demo.seq
The first line sets the variable name to demo, the second line sets the appropriate rmsdrange, the command 'cyanalib' reads the standard CYANA library. The next command reads the protein sequence.
The FLYA CALC script
The 'CALC.cya' starts with the specification of the names of the input peak lists:
peaks:=aro,cnoesy,nnoesy,HNtrosy,trHNCA,HNCOCA,HNCACB,NTocsy,HCCCHTocsy
The peak list names are separated by commas (without blanks!). The files on disk have the file name extension .peaks, e.g. HNCA.peaks.
The commands above will use all available peak lists. You can choose any subset of them by modifying the 'peaks:=...' statement.
These are followed by tolerances for chemical shift matching:
assigncs_accH=0.03 assigncs_accC=0.4 assigncs_accN=assigncs_accC tolerance:=$assigncs_accH,$assigncs_accH,$assigncs_accC
In this case, a tolerance of 0.03 ppm will be used for protons, and 0.4 ppm for carbon and nitrogen.
The next parameter specifies the seed value for the random number generator (an arbitrary positive integer is ok).
randomseed=101
Two parameters of the assignment algorithm are set in order to speed up the calculation for this practical:
shiftassign_population=25 (we do not set this, so the default of XX is used) shiftassign_quick=1
In production runs, better results can be expected (at the expense of longer computation times) if these parameters are not set. These parameters specify:
- The population size for the genetic algorithm, i.e. how many assignments form one generation (25; chosen smaller than in normal production runs in order to speed up the calculation)
- An option to choose the "quick" optimization schedule.
Finally, there is the command to start the FLYA algorithm:
flya runs=10 assignpeaks=$peaks shiftreference=manREF.prot
Here, the given parameters of the 'flya' command specify that
- The number of independent runs of the algorithm, from which the consolidated shift will be calculated (chosen smaller than in normal production runs in order to speed up the calculation).
- The input peak lists that will be used (as defined above).
- No ensemble of random structures will be calculated for generating expected peaks (leads to accurate prediction for long range NOES in NOESY-type experiments).
- The results will be compared with the reference chemical shifts in the file 'manREF.prot' (which have been determined by conventional methods). The reference chemical shifts will not be used by the algorithm but only for a subsequent analysis of its results.
Exercise 2: Compose the CALC.cya
Using the vi editor, create your CALC.cya file to run FLYA as outlined above. Be extra careful to avoid typos and unwanted spaces in coma list etc.
When you are done start your FLYA script using 10 processors by calling it as outlined just below. This will take about twenty minutes once the calculation starts.
Run the FLYA calculation
To run the FLYA calculation, you start CYANA and execute the 'CALC.cya' script. On a computer with multiple processors we canspeed up the calculation, by running the the CALC script through a MPI scheduler as:
cyana -n 10 CALC.cya
This starts 10 independent calculations on 10 processors by using the MPI scheduler (if installed on your system, otherwise shared memory will be used).
To check the queuing on the server use:
xxx
And to cancel the processes started by you before completion:
xxx
To check the general load on your local computer use:
top
FLYA output files
The FLYA algorithm will produce the following output files:
- flya.prot: Consensus assigned chemical shifts. This file contains a chemical shift for every atom that has been assigned to least one peak.
- flya.tab: Table with details about the chemical shift assignment of each atom (comparison with reference shifts). In this file you can see for each atom whether the assignment is "strong" (self-consistent) or "weak" (only tentative).
- flya.txt: Assignment statistics
- flya.pdf: Graphical representation of the assignment results
- XXX_exp.peaks: List of expected peaks, corresponding to input peak list XXX.peaks
- XXX_asn.peaks: Assigned peak list, corresponding to input peak list XXX.peaks
The flya.txt file
This output file starts with overall assignment statistics for each group of atoms as defined by 'analyzeassign_group:=...' in CALCbackbone.cya':
____________________________________________________________ CHEMICAL SHIFT ASSIGNMENT ____________________________________________________________ SEED: 1 chemical shifts for 542 atoms found Peaks assigned from frequencies BB: REFERENCES(2):512 CHEMICALSHIFTS(1):542 (1)and(2):512 MATCH:507(99.0% of (2))
- REFERENCES(2) is the number of reference assignments (in the selected group)
- CHEMICALSHIFTS(1) is is the number of atoms assigned by FLYA
- (1)and(2) is the number of atoms that are assigned by FLYA and in the reference.
- MATCH is the number of atoms with the same assignment by FLYA and in the reference. The percentage is relative to the number of reference assignments.
Further below comes a table with information about each peak list:
PEAKLISTS #Expected: Total number of expected peaks noRef: Number of expected peaks with missing reference shifts noPeak: Number of expected peaks for wich no peak can be measured Assigned: Number of expected peaks that could be assigned Match: Number of assigned peaks that fit reference shifts #Measured: Total number of peaks in peak list Assigned: Number of measured peaks that could be assigned to expected peaks exp/meas: Ratio of assigned expected and measured peaks Lists #Expected noRef noPeak Assigned Match #Measured Assigned exp/meas Assigned N15HSQC 106 8 1 104( 98.11%) 97( 91.51%) 131 96( 73.28%) 1.1 HNCA 211 15 11 194( 91.94%) 186( 88.15%) 329 179( 54.41%) 1.1 HNcaCO 211 15 11 197( 93.36%) 183( 86.73%) 246 176( 71.54%) 1.1 HNCO 105 7 1 101( 96.19%) 97( 92.38%) 158 97( 61.39%) 1.0 HNcoCA 105 7 0 101( 96.19%) 97( 92.38%) 158 99( 62.66%) 1.0 CBCANH 399 26 25 361( 90.48%) 350( 87.72%) 623 339( 54.41%) 1.1 CBCAcoNH 200 13 2 196( 98.00%) 185( 92.50%) 324 192( 59.26%) 1.0 ALL 1337 91 51 1254( 93.79%) 1195( 89.38%) 1969 1178( 59.83%) 1.1
It contains the following data:
- #Expected: Total number of expected peaks
- noRef: Number of expected peaks with missing reference shifts
- noPeak: Number of expected peaks for which no peak can be measured
- Assigned: Number of expected peaks that could be assigned based on the reference chemical shift assignments. The theoretical maximum of 100% corresponds to the situation that the spectra “explain” all expected peaks. Each expected peak can be mapped to at most one measured peak. Remaining expected peaks correspond to missing peaks in the measured peak list.
- Match: Number of assigned peaks that fit (within tolerance) reference shifts. The theoretical maximum of 100% corresponds to having all measured peaks assigned. Note that several expected peaks can be mapped to the same measured peak, i.e. the assignments of measured peaks can be unambiguous or ambiguous. Remaining unassigned measured peaks are likely to be artifacts.
- #Measured: Total number of peaks in peak list
- Assigned: Number of measured peaks that could be assigned to expected peaks
- exp/meas: Ratio of assigned expected and measured peaks
There is more information on the results of the assignment calculation in the 'flya.txt' file (not described here).
The flya.tab file
This file provides information about the chemical shift assignment of each individual atom:
Atom Residue Ref Shift Dev Extent inside inref ... N GLY 57 102.109 102.043 0.066 10.0 100.0 100.0 strong= H GLY 57 8.571 8.570 0.001 10.0 100.0 100.0 strong= CA GLY 57 45.415 45.433 -0.018 10.0 100.0 100.0 strong= HA2 GLY 57 4.042 HA3 GLY 57 3.436 C GLY 57 173.621 173.662 -0.041 10.0 89.4 90.0 strong= N LEU 58 120.640 120.649 -0.009 10.0 80.0 80.0 = H LEU 58 7.488 7.492 -0.004 10.0 79.8 80.0 = CA LEU 58 51.943 51.940 0.003 10.0 70.0 70.0 = HA LEU 58 4.995 CB LEU 58 45.602 45.568 0.034 10.0 82.7 80.0 strong= CG LEU 58 26.528 HG LEU 58 1.515 CD1 LEU 58 24.745 C LEU 58 173.619 174.576 -0.957 10.0 40.1 10.0 ! (C 59) ...
- Ref: Chemical shift value in the reference chemical shift list (ref.prot). It was not used in the calculation.
- Shift: Consensus chemical shift value from FLYA
- Dev = Ref - Shift
- Extent: Number of runs in which the atom was assigned by FLYA.
- Inside: Percentage of chemical shift values from the (10) independent runs of FLYA that agree (within the tolerance) with the consensus value.
- inref: Percentage of chemical shift values from the (10) independent runs of FLYA that agree (within the tolerance) with the reference value.
- Outcome of the assignment:
- strong: "strong" assignment, i.e. Inside > 80%.
- =: Assignment that agrees with reference, i.e. Dev < tolerance.
- !: Assignment that does not agree with the reference, i.e. Dev > tolerance.
- (atom name): Correct assignment, if within the same residue (no residue number given), or the neighboring residues.
The flya.pdf file
This PDF file provides a graphical representation of the 'flya.tab' file. Each assignment for an atom is represented by a colored rectangle.
- Green: Assignment by FLYA agrees with the manually determined reference assignment (within tolerance)
- Red: Assignment by FLYA does not agree with the manually determined reference assignment
- Blue: Assigned by FLYA but no reference available
- Black: With reference assignment but not assigned by FLYA.
Respective light colors indicate assignments not classified as strong by the chemical shift consolidation. The row labeled HN/Hα shows for each residue HN on the left and Hα in the center. The N/Cα/C’ row shows for each residue the N, Cα, and C’ assignments from left to right. The rows β-η show the side-chain assignments for the heavy atoms in the center and hydrogen atoms to the left and right. In the case of branched side-chains, the corresponding row is split into an upper part for one branch and a lower part for the other branch.
Exercise 3: Analyze the FLYA results
Analyze your FLYA results.
What do you think?
How does the automated assignment compare to the provided assignment?
How robust you think are the results?
What could you do to likely improve the result?
Automated NOESY assignment and structure calculation
We will perform an automated NOE restraint assignment and structure calculation by torsion angle dynamics.
The 'flya.prot' file from the automated resonance assignment will be used together with the (unassigned) NOESY peak lists to assign the NOESY peaks and to generate distance restraints in order to compute the three-dimensional structure of the protein.
Using Talos to generate aco restraints
Angle restraints from the backbone chemical shifts help restrict angular conformational space. We wish to use only "strong assignments" to generate these restraints.
TALOS is used to generate torsion angle restraints from the backbone chemical shifts in 'flya.prot' (PREP_aco.cya).
consolidate reference=flya.prot file=flya.tab plot=flya.pdf prot=details/a[0-9][0-9][0-9].prot
This overwrites the original flya.prot with only strong assignments, as well as the statistics files.
read prot flya.prot unknown=skip talos talos=talos+ talosaco pred.tab write aco talos.aco
This will call the program TALOS and store the resulting torsion angle restraints in the file 'talos.aco'.
Exercise 4: Calculate your backbone angels using Talos
Hint: copy the FLYA results into a new folder, since otherwise you will overwrite your original flya.prot file. Essentially you will need to copy the details directory and the flya.prot file
cp -r flyabb acoPREP
Running noeassign
peaks := cnoesy.peaks,nnoesy.peaks,aro.peaks prot := flya.prot restraints := talos.aco tolerance := 0.040,0.030,0.45 structures := 100,20 steps := 10000 randomseed := 434726 noeassign peaks=$peaks prot=$prot autoaco
To speed up the calculation, you can set alternativly in 'CALC.cya':
structures:=50,10 steps=5000
These commands tell the program to calculate, in each cycle, 50 conformers, and to analyze the best 10 of them. 5000 torsion angle dynamics steps will be applied per conformer.
7 cycle of automated NOE assignment and structure calculation will be performed by running the CALC.cya macro:
cyana -n 33 CALC.cya
Statistics on the NOE assignment and the structure calculation will be in the file 'Table', which can also be produced with the command 'cyanatable -lp'.
The final structure will be 'final.pdb'. You can visualize it, for example, with the command
chimera final.pdb
The optimal residue range for superposition can be found with the command
cyana overlay final.pdb
For further information about automated NOESY assignment you can consult the Tutorial Structure calculation with automated NOESY assignment (which uses different file names than we have here).
Creating the ligand library file for cyana
Exercise x: In this exercise you will create the ligand library file for cyana from scratch.
Drawing the molecule and obtaining the SMILES code
Go to "http://zinc.docking.org/search/structure"
Click on the Structure tab and draw the molecule using the supplied drawing of the compound as a guide.
Be well aware of the stereochemistry.
Copy the SMILES code.
Converting the SMILES code to mol2
There many options and programs to do this, we outline two:
If you want to use Avogadro:
Build -- > Insert --> SMILES
Paste the SMILES code
Extensions -- > Optimize Geometry
Save as
LIG.mol2 (*.mol2)
If you want to use chimera:
Tools --> Structure Editing --> Build Structure Start Structure
SMILES string
set the Residue name to LIG (capital letters)
--> Apply Save your mol2 file as: lig.mol2
Now, there is one issue we have to take care of: The intermolecular NOE assignments have to match the ligand structure assignment, otherwise the intermolecular NOEs will be wrong.
Therefore, open the supplied lig.pdb structure in chimera, as well as the created mol2 structure. Then using an editor, manually change the protons in your mol2 file to match those of the pdb.
Converting the mol2 file to a lib file for cyana
run cylib with the options -nc -sc
./cylib -nc -sc LIG.mol2
this will create the LIG.lib file.
The -sc option keeps the angles of the rings fixed. We can do this since they are in this molecule either aromatic or have sp3 conjugate carbons in them, fixing the ring geometry. If they had to be flexible, you would need to keep the angeles flexible and supply additional restraints to close the rings.
In the LIG.lib file, RESIDUE UNL1, replace the UNL1 residue name with LIG
To test the lib file we need cyana:
create a sequence file containing LIG 333
Start cyana This will read the cyana library file correctlly but give you the error:
*** ERROR: Illegal residue name "LIG". *** ERROR: Cannot read line 1: LIG 333
Because we do not have an init file and have not read the LIG.lib file yet, the program tries to read the default sequence file in the directory.
read lib LIG.lib append read seq LIG.seq anneal write pdb lig.pdb
Carefully analyze the WARNING and ERROR messages if any.
Then take a look at your lig.pdb in chimera and check that the chemistry and bonds are all as expected (ring closure!)
chimera LIG.pdb
If there are any issues go back to the drawing board to fix the issues.
To help find problems, you may use the command:
write lib LIG.lib names
This will write the library file containing actual atom names rather than numbers.
Calculating the structure of the protein-ligand complex
Intermolecular cross peaks
The structure calculation
Comparing the calculated NMR structure to an XRAY reference structure
Deposited structures often lack specific features in order for you to meaningfully work with a structure. Such as Xray structures lack proton atom coordinates.
Using the regularize command you can get a structure calculated within cyana that has these features but still is very close to the input structure of your choice.
Using regularize to attach missing atoms and normalize the structure
Exercise x: Using the cyana wiki, determine yourself the necessary commands to convert the supplied demoXray.pdb structure and compile a XXX.cya file to do the job.
Hints: Use the same sequence you used for the calculation of the complex structure in the exercise before.
To read pdb's containing molecules and atoms not specified in your sequence file:
read pdb xxxx.pdb unknown=warn
When you have your xray structure ready, load your calculated nmr structure and the xray structure in chimera.
Use to chimera specific commands to overlay the two structures and compare the structures visually.
Using cyana, compare the rmsd for the protein structure and the rmsd for the ligand.