File structure

This section describes how the results generated by the Zonkey pipeline are organized on disk, using as an example the results of analyzing the example files included with the pipeline (see Zonkey Pipeline example project).

Single sample analysis

The following is based on running case 4a, as described in the Pipeline usage section; more specifically, the example in which the analyses are carried out on a BAM alignment file containing both nuclear and mitochondrial alignments:

# Case 4a: Analyses both nuclear and mitochondrial genome; results are placed in 'combined.zonkey'
$ paleomix zonkey run database.tar combined.bam

As noted in the comment, this command places the results in the directory 'combined.zonkey'. For a completed analysis, the results directory is expected to contain an HTML report (with its stylesheet) and a directory containing each of the figures generated by the pipeline:

  • report.css
  • report.html
  • figures/

The report may be opened with any modern browser. Each figure displayed in the report is also available as a PDF file, accessed by clicking on a given figure in the report, or directly in the figures/ sub-directory.
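As a rough sketch, the layout described above can be checked from the command line once a run has finished. The function `check_zonkey_report` is hypothetical and not part of PALEOMIX; it only tests for the entries listed above and prints any PDF files found under figures/.

```shell
#!/usr/bin/env bash
# check_zonkey_report DIR
# Hypothetical helper (not part of PALEOMIX): verifies that a single-sample
# Zonkey results directory contains the HTML report, its stylesheet and the
# figures/ sub-directory, then prints the paths of the PDF figures.
check_zonkey_report() {
    local dir="$1" status=0 entry
    for entry in report.html report.css; do
        if [ ! -f "$dir/$entry" ]; then
            echo "missing file: $dir/$entry" >&2
            status=1
        fi
    done
    if [ ! -d "$dir/figures" ]; then
        echo "missing directory: $dir/figures" >&2
        return 1
    fi
    # List the PDF versions of the figures (prints nothing if there are none)
    find "$dir/figures" -name '*.pdf' -print
    return "$status"
}
```

For the example above, this would be run as `check_zonkey_report combined.zonkey`; a non-zero exit status indicates that part of the report is missing.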

Analysis result files

In addition, the analytical steps generate the following directories, which contain the files used or generated by the programs run as part of the Zonkey pipeline:

  • admixture/
  • mitochondria/
  • pca/
  • plink/
  • treemix/

In general, files in these directories are grouped by the prefixes 'incl_ts' and 'excl_ts', which indicate that sites containing transitions (A<->G and C<->T) have been included in or excluded from the analyses, respectively. For a detailed description of the files generated by each analysis, please refer to the documentation for the respective programs used in said analyses.
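The following sketch lists the result files of each analysis grouped by these two prefixes. It assumes that the relevant file names begin with 'incl_ts' or 'excl_ts'; since this only holds "in general", directories or files that do not follow the convention are simply skipped. The function name `list_by_ts_prefix` is hypothetical.

```shell
#!/usr/bin/env bash
# list_by_ts_prefix DIR
# Hypothetical helper (not part of PALEOMIX). Assumes result files start with
# 'incl_ts' (transitions included) or 'excl_ts' (transitions excluded), and
# prints one line per file: "<analysis> <prefix> <file name>".
list_by_ts_prefix() {
    local dir="$1" analysis prefix file
    for analysis in admixture mitochondria pca plink treemix; do
        [ -d "$dir/$analysis" ] || continue
        for prefix in incl_ts excl_ts; do
            for file in "$dir/$analysis/$prefix"*; do
                # Skip the literal, unexpanded pattern when nothing matches
                [ -e "$file" ] || continue
                echo "$analysis $prefix ${file##*/}"
            done
        done
    done
}
```

Filtering the output, e.g. `list_by_ts_prefix combined.zonkey | grep ' excl_ts '`, then shows only the results computed without transitions.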

Additionally, the results directory is expected to contain a 'temp' directory. This directory should be empty unless one or more analytical steps have failed, and it may safely be removed once a Zonkey run has completed.
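A conservative way to clean up after a run is sketched below. It relies on `rmdir`, which refuses to delete a directory that is not empty, so a leftover 'temp' directory (a likely sign of a failed step) is kept for inspection rather than deleted. The function name `cleanup_zonkey_temp` is hypothetical.

```shell
#!/usr/bin/env bash
# cleanup_zonkey_temp DIR
# Hypothetical helper (not part of PALEOMIX): removes DIR/temp only if it is
# empty. A non-empty temp directory suggests that an analytical step failed,
# so its contents are left in place and a non-zero status is returned.
cleanup_zonkey_temp() {
    local temp="$1/temp"
    # Nothing to do if there is no temp directory
    [ -d "$temp" ] || return 0
    if rmdir "$temp" 2>/dev/null; then
        echo "removed $temp"
    else
        echo "not removed: $temp is not empty; check for failed steps" >&2
        return 1
    fi
}
```

For the single-sample example, this would be invoked as `cleanup_zonkey_temp combined.zonkey`.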

Multi-sample analysis

When multiple samples are processed at once, as described in case 5 (Pipeline usage), all results are written to a single results directory. This directory will contain a summary report for all samples, as well as a sub-directory for each sample listed in the table of samples provided when running the pipeline. Thus, for the samples table shown in case 5:

$ cat my_samples.txt
example1    combined.bam
example2    nuclear.bam
example3    mitochondrial.bam
example4    nuclear.bam mitochondrial.bam

# Case 5a: Analyse 4 samples; results are placed in 'my_samples.zonkey'
$ paleomix zonkey run database.tar my_samples.txt

The results directory is expected to contain the following files and directories:

  • summary.html
  • summary.css
  • example1/
  • example2/
  • example3/
  • example4/
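Since the sub-directories are named after the samples in the table, the expected top-level layout can be derived from the table itself. The sketch below assumes, as in the example above, that the sample name is the first whitespace-separated column of each line; the function name `check_zonkey_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# check_zonkey_summary TABLE DIR
# Hypothetical helper (not part of PALEOMIX): checks that DIR contains the
# summary report and one sub-directory per sample, taking sample names from
# the first whitespace-separated column of TABLE (assumed format).
check_zonkey_summary() {
    local table="$1" dir="$2" status=0 name rest entry
    for entry in summary.html summary.css; do
        if [ ! -f "$dir/$entry" ]; then
            echo "missing file: $dir/$entry" >&2
            status=1
        fi
    done
    # '|| [ -n "$name" ]' also handles a final line without a newline
    while read -r name rest || [ -n "$name" ]; do
        [ -n "$name" ] || continue   # skip blank lines
        if [ ! -d "$dir/$name" ]; then
            echo "missing sample directory: $dir/$name" >&2
            status=1
        fi
    done < "$table"
    return "$status"
}
```

For case 5a, this would be run as `check_zonkey_summary my_samples.txt my_samples.zonkey`.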

The summary report may be opened with any modern browser, and offers a quick overview of all samples processed as part of this analysis. The individual report for each sample may furthermore be accessed by clicking on the header corresponding to the name of a given sample.

The per-sample directories correspond exactly to the results directories that would have been generated had each sample been processed by itself (see above), except that only a single 'temp' directory, located in the root of the results directory, is used.
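The one difference from the single-sample layout, the shared 'temp' directory, can be checked with a short loop over the sample sub-directories. This is only a sketch: the function name `check_sample_dirs` is hypothetical, and it checks just for a per-sample 'report.html' and the absence of a per-sample 'temp' directory.

```shell
#!/usr/bin/env bash
# check_sample_dirs DIR
# Hypothetical helper (not part of PALEOMIX): for every sub-directory of a
# multi-sample results directory DIR, other than the shared 'temp', checks
# that it contains a per-sample report and no 'temp' directory of its own.
check_sample_dirs() {
    local dir="$1" status=0 sample
    for sample in "$dir"/*/; do
        [ -d "$sample" ] || continue      # no sub-directories at all
        sample="${sample%/}"
        [ "${sample##*/}" = temp ] && continue   # the shared temp directory
        if [ ! -f "$sample/report.html" ]; then
            echo "missing report: $sample/report.html" >&2
            status=1
        fi
        if [ -e "$sample/temp" ]; then
            echo "unexpected per-sample temp directory: $sample/temp" >&2
            status=1
        fi
    done
    return "$status"
}
```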