Dataset
The dataset can be downloaded here. The data are organized into two folders: inputs and results. The inputs folder contains all files needed to prepare the inputs and to run the template matching and the peak analysis. The results folder contains the results you should obtain.
For convenience, we also provide a Jupyter notebook, tm_processing.ipynb, that can be used directly to run some of the preprocessing and postprocessing functions.
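Before starting, it can help to confirm that the download is complete. The sketch below checks for the input files listed on this page. The helper missing_files is hypothetical (it is not part of GAPSTOP), and it assumes every input file sits directly under inputs/, with the template and masks in the microtubule subfolder; adjust the list if your copy is organized differently.

```python
from pathlib import Path

# Input file names as listed on this page, relative to the inputs folder.
# Their exact placement inside inputs/ is an assumption of this sketch.
EXPECTED_INPUTS = [
    "126_b4.mrc",
    "microtubule/microtubule.em",
    "microtubule/mask_microtubule.em",
    "microtubule/tight_mask_microtubule.em",
    "wedge_list.star",
    "angles_5_c13.txt",
    "angles_10_c13.txt",
    "tm_param.star",
    "126.mrc.mdoc",
    "126_gctf.star",
    "template_list.csv",
]


def missing_files(dataset_root, folder="inputs", expected=EXPECTED_INPUTS):
    """Return the expected files that are not present under dataset_root/folder."""
    base = Path(dataset_root) / folder
    return [name for name in expected if not (base / name).is_file()]
```

For example, missing_files("/path/to/dataset") should return an empty list when all files listed on this page are present.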
Template matching inputs
- 126_b4.mrc
Tomogram on which the template matching is run. It was downsampled from the unbinned data by a factor of 4 (i.e., binning 4 in our notation). To further reduce its size, only the most interesting part of the tomogram was cut out and provided for the tutorial. The unbinned pixel size is 2.176 Å, which gives a pixel size of 2.176 × 4 = 8.704 Å ≈ 8.7 Å at binning 4.
- microtubule/microtubule.em
Template of the microtubule in EM format. The template was downloaded from EMDB (accession code: EMD-6351) and downsampled to the pixel size of the tomogram, 8.7 Å. Note that the EM format does not store the pixel size in its header; to display the template at its correct physical size, e.g. in ChimeraX, you have to set the pixel size manually there.
The template is in the microtubule subfolder because the peak analysis requires this folder structure.
- microtubule/mask_microtubule.em
Mask for the template matching in EM format.
The mask is in the microtubule subfolder because the peak analysis requires this folder structure.
- wedge_list.star
Wedge list in STAR format containing all necessary information about tomogram 126.
- angles_5_c13.txt
List of angles in “zxz” order. Both the angular and the in-plane sampling are 5 degrees.
- tm_param.star
Parameter file for the template matching. The only parameter that has to be adapted is rootdir, which should be set to the correct path of the template matching folder.
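If you prefer to set rootdir from a script rather than a text editor, the sketch below replaces one value in a STAR loop. It assumes that tm_param.star stores its parameters as a single loop_ block with a column labeled _rootdir, whitespace-separated values and no quoted values containing spaces; check the header of your tm_param.star before using it. The function set_star_loop_value is a hypothetical helper, not a GAPSTOP function, and it is not a full STAR parser.

```python
def set_star_loop_value(star_text, column, new_value, row=0):
    """Return a copy of star_text with one value of a loop_ column replaced.

    Minimal sketch: only the first loop_ block is handled, values must be
    whitespace-separated, and quoted values containing spaces are not supported.
    """
    lines = star_text.splitlines()
    label = "_" + column.lstrip("_")  # STAR column labels start with "_"
    in_loop = False
    columns = []
    data_row = 0
    for i, line in enumerate(lines):
        token = line.strip()
        if token == "loop_":
            if in_loop:
                break  # a second loop_ block: stop, only the first one is handled
            in_loop = True
            continue
        if in_loop and token.startswith("data_"):
            break  # a new data block ends the first loop
        if not in_loop or not token or token.startswith("#"):
            continue  # skip text before the loop, blank lines and comments
        if token.startswith("_"):
            columns.append(token.split()[0])  # drop an optional "#N" column index
            continue
        if data_row == row:
            values = token.split()
            values[columns.index(label)] = new_value  # ValueError if column missing
            lines[i] = " ".join(values)
            return "\n".join(lines) + "\n"
        data_row += 1
    raise ValueError(f"data row {row} not found in the first loop_ block")
```

For example, with p = Path("tm_param.star"), calling p.write_text(set_star_loop_value(p.read_text(), "rootdir", "/path/to/tm_folder/")) rewrites the file in place. Column alignment is not preserved, which STAR readers generally do not require.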
Wedge list creation inputs
- 126.mrc.mdoc
Mdoc file containing all necessary information about the tilt series and the exposure dose.
- 126_gctf.star
File with the defocus estimates obtained with GCTF.
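To inspect what the mdoc file contributes to the wedge list, the sketch below reads the per-tilt sections of a SerialEM-style mdoc ([ZValue = N] headers followed by key = value lines) and extracts the tilt angle and exposure dose of each image. The helpers parse_mdoc and tilt_and_dose are hypothetical and are not the GAPSTOP wedge list code; sorting by ZValue gives the order in the stack, not the acquisition order, which would have to be recovered separately (e.g. from the DateTime entries) for cumulative dose.

```python
import re

ZVALUE = re.compile(r"^\[ZValue\s*=\s*(\d+)\]")


def parse_mdoc(text):
    """Return one dict per [ZValue = N] section of a SerialEM-style mdoc file."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = ZVALUE.match(line)
        if match:
            current = {"ZValue": int(match.group(1))}
            sections.append(current)
        elif line.startswith("["):
            current = None  # other bracketed sections (e.g. the [T = ...] title) are ignored
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def tilt_and_dose(sections):
    """Return (ZValue, TiltAngle, ExposureDose) tuples ordered by position in the stack."""
    rows = [
        (s["ZValue"], float(s["TiltAngle"]), float(s.get("ExposureDose", "nan")))
        for s in sections
    ]
    return sorted(rows)
```

For example, tilt_and_dose(parse_mdoc(open("126.mrc.mdoc").read())) lists one tuple per tilt image.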
Peak analysis inputs
- template_list.csv
CSV file containing the information about the template that is necessary to run the peak analysis.
- microtubule/tight_mask_microtubule.em
A tight, sharp binary mask that follows the shape of the microtubule and is used to compute the number of voxels the structure occupies. It should not be used for TM because it contains sharp edges.
The mask is in the microtubule subfolder because the peak analysis requires this folder structure.
- angles_10_c13.txt
List of angles in “zxz” order. Both the angular and the in-plane sampling are 10 degrees.
This file is used only in the peak analysis, but it can also be used to run the TM if a coarser angular sampling (larger angular step) is needed.
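To relate a row of an angle list to an actual rotation, the sketch below builds the matrix Rz(phi) @ Rx(theta) @ Rz(psi) from one “zxz” triplet and parses an angle list into triplets. The column order of angles_5_c13.txt and angles_10_c13.txt, and whether GAPSTOP applies the rotation to the template or to the coordinate frame, are assumptions here; the function names are hypothetical and only illustrate the zxz convention.

```python
import math


def _rz(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rx(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def zxz_matrix(phi, theta, psi):
    """Rotation matrix Rz(phi) @ Rx(theta) @ Rz(psi); angles in degrees.

    Assumption: phi, theta, psi are the first, second and third rotation
    about z, x and z, respectively.
    """
    phi, theta, psi = (math.radians(v) for v in (phi, theta, psi))
    return _matmul(_matmul(_rz(phi), _rx(theta)), _rz(psi))


def read_angle_list(text):
    """Parse one angle triplet per non-empty line; commas or whitespace as separators."""
    rows = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if fields:
            rows.append(tuple(float(f) for f in fields[:3]))
    return rows
```

A quick consistency check of the convention: with theta = 0 the two z rotations simply add, so zxz_matrix(20, 0, 25) equals zxz_matrix(45, 0, 0).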
Template matching results
- tm_outputs/scores_0_126.em
The scores map with CCC values for the tomogram.
- tm_outputs/angles_0_126.em
The angles map; each voxel contains an index that encodes the position of the rotation in angles_5_c13.txt.
- tm_outputs/0.log
The log file produced by GAPSTOP™.
- particle_list.em
A particle list in EM format that contains 31 particles.
This list is not created during the template matching. Instead, it is created by calling the function scores_extract_particles; see Results evaluation for more information.
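The relationship between the scores map, the angles map and the angle list can be illustrated with a small lookup: find the voxel with the highest score and translate its angle index back into a row of the angle list. This is only a sketch of the index encoding, not a replacement for scores_extract_particles. It assumes both maps have already been read and flattened in the same voxel order (reading EM files is not shown), and whether the stored indices are 0-based or 1-based is an assumption exposed as the hypothetical index_base parameter.

```python
def best_hit(scores, angle_indices, angle_list, index_base=0):
    """Return (flat voxel index, score, angle triplet) for the highest score.

    scores and angle_indices are flat sequences of equal length (the scores and
    angles maps flattened in the same order). index_base is 0 if the angles map
    stores 0-based row positions in the angle list and 1 if it stores 1-based
    positions; check this against your data.
    """
    if len(scores) != len(angle_indices):
        raise ValueError("scores and angles maps must have the same number of voxels")
    best = max(range(len(scores)), key=scores.__getitem__)
    # Index maps may be stored as floats, so round before using them as positions.
    row = int(round(angle_indices[best])) - index_base
    return best, scores[best], angle_list[row]
```

Extracting many particles additionally requires suppressing neighboring maxima around each hit, which this sketch does not attempt.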
Peak analysis results
- template_list.csv
The same template list as in the inputs folder, extended with additional information from the peak analysis.
- peak_analysis/id_0_results
Folder with results of the peak analysis corresponding to the first row of the template list, i.e. to the angular sampling of 10 degrees.
The files id_0_scores.em and id_0_angles.em are the equivalents of the scores and angles maps; they are produced by running TM of the microtubule template against itself.
The files with *dist_all*, *dist_inplane* and *dist_normals* in their names contain voxel-wise information on the total, in-plane and cone angular distance from the starting orientation, respectively. To evaluate some statistics, each of these basic distance files was also labeled using two different labeling techniques.
The file id_0.csv contains information for each rotation from the angle list file, such as the total, in-plane and cone angular distances, the number of overlapping voxels (with both the TM mask and the tight mask), the maximum CCC (masked and unmasked) and the z-scores (masked and unmasked).
Additional CSV files contain line profiles from the scores maps and information on the gradual angles analysis.
The most relevant results are summarized in id_0_summary.pdf.
- peak_analysis/id_1_results
Folder with results of the peak analysis corresponding to the second row of the template list, i.e. to the angular sampling of 5 degrees.
The file content is the same as for the id_0_results folder.