Example projects and data-sets

PALEOMIX includes small example projects for the larger pipelines. These are designed to run in a short amount of time and to help verify that the pipelines have been installed correctly.

BAM Pipeline example project

The example project for the BAM pipeline involves the processing of a small data set consisting of (simulated) ancient sequences derived from the human mitochondrial genome. The runtime of this project on a typical desktop or laptop ranges from around 1 minute to around 1 hour (when building models of the ancient DNA damage patterns is enabled). To access this example project, use the ‘example’ command for the bam_pipeline to copy the project files to a given directory (here, the current directory):

$ paleomix bam_pipeline example .
$ cd bam_pipeline
$ paleomix bam_pipeline run 000_makefile.yaml

By default, this example project includes the recalibration of quality scores for bases that are identified as putative post-mortem damage (see [Jonsson2013]). However, this greatly increases the time needed to run the example. While running this step is recommended, it may be disabled by setting the value of the ‘RescaleQualities’ option in the ‘000_makefile.yaml’ file to ‘no’.

Before:

# Carry out quality base re-scaling of libraries using mapDamage
# This will be done using the options set for mapDamage below
RescaleQualities: yes

After:

# Carry out quality base re-scaling of libraries using mapDamage
# This will be done using the options set for mapDamage below
RescaleQualities: no
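As an alternative to editing the file by hand, the change can be made from the command line. The following is a minimal sketch, assuming a POSIX shell and a ‘sed’ that accepts ‘-i.bak’ (both GNU and BSD sed do); ‘set_rescale_qualities’ is a hypothetical helper and not part of PALEOMIX. It rewrites every uncommented ‘RescaleQualities:’ line, keeps the original indentation, and leaves a backup copy of the file with a ‘.bak’ suffix.

```shell
#!/bin/sh
# Hypothetical helper (not part of PALEOMIX): set the 'RescaleQualities'
# option in a BAM pipeline makefile to 'yes' or 'no'.
# Usage: set_rescale_qualities FILE VALUE
set_rescale_qualities() {
    # Only accept the two values used in the makefile above.
    case "$2" in
        yes|no) ;;
        *) echo "VALUE must be 'yes' or 'no'" >&2; return 1 ;;
    esac
    # Replace the value on uncommented 'RescaleQualities:' lines, keeping
    # the leading indentation; '-i.bak' works with both GNU and BSD sed
    # and leaves the original file as FILE.bak.
    sed -i.bak "s/^\([[:space:]]*RescaleQualities:[[:space:]]*\).*/\1$2/" "$1"
}
```

For example, running ‘set_rescale_qualities 000_makefile.yaml no’ in the ‘bam_pipeline’ directory disables the step before the pipeline is started; passing ‘yes’ re-enables it.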

The output generated by the pipeline is described in the File structure section. Please see the Troubleshooting section if you run into problems running the pipeline.

Phylogenetic Pipeline example project

The example project for the Phylogenetic pipeline involves the processing and mapping of a small data set consisting of (simulated) sequences derived from the human and primate mitochondrial genomes, followed by the genotyping of gene sequences and the construction of a maximum likelihood phylogeny. Since this example project starts from raw reads, it requires that the BAM pipeline has been correctly installed (see the Software requirements section). The runtime of this project on a typical desktop or laptop ranges from around 30 minutes to around 1 hour.

To access this example project, use the ‘example’ command for the phylo_pipeline to copy the project files to a given directory (here, the current directory), and then run the ‘setup.sh’ script in the root of the example directory to generate the data set:

$ paleomix phylo_pipeline example .
$ cd phylo_pipeline
$ ./setup.sh

Once the example data has been generated, the two pipelines may be executed:

$ cd alignment
$ paleomix bam_pipeline run 000_makefile.yaml
$ cd ../phylogeny
$ paleomix phylo_pipeline genotype+msa+phylogeny 000_makefile.yaml

The output generated by the pipeline is described in the File structure section. Please see the Troubleshooting section if you run into problems running the pipeline.

Zonkey Pipeline example project

The example project for the Zonkey pipeline is based on a synthetic hybrid between a domestic donkey and an Arabian horse (obtained from [Orlando2013]), using a small number of reads (1200). The runtime of these examples on a typical desktop or laptop ranges from around 30 minutes to around 1 hour, depending on your local configuration.

To access this example project, download the Zonkey reference database (see the ‘Prerequisites’ section of the Pipeline usage page for instructions), and use the ‘example’ command for zonkey to copy the project files to a given directory. Here, the current directory is used; to place the example files in a different location, simply replace the ‘.’ with the full path to the desired directory:

$ paleomix zonkey example database.tar .
$ cd zonkey_pipeline

The example directory contains three BAM files: one containing a nuclear alignment (‘nuclear.bam’), one containing a mitochondrial alignment (‘mitochondrial.bam’), and one containing a combined nuclear and mitochondrial alignment (‘combined.bam’). In addition, a sample table (‘samples.txt’) is included, which shows how multiple samples may be specified and processed at once. Each of these may be run as follows:

# Process only the nuclear BAM;
# by default, results are saved in 'nuclear.zonkey'
$ paleomix zonkey run database.tar nuclear.bam

# Process only the mitochondrial BAM;
# by default, results are saved in 'mitochondrial.zonkey'
$ paleomix zonkey run database.tar mitochondrial.bam

# Process both the nuclear and the mitochondrial BAMs;
# note that it is necessary to specify an output directory
$ paleomix zonkey run database.tar nuclear.bam mitochondrial.bam results

# Process the combined nuclear and mitochondrial BAM;
# by default, results are saved in 'combined.zonkey'
$ paleomix zonkey run database.tar combined.bam

# Process multiple samples; the table corresponds to the four
# cases listed above.
$ paleomix zonkey run database.tar samples.txt
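To run the three single-BAM examples in one go, a loop such as the following may be used. This is a sketch under stated assumptions: it must be run from the ‘zonkey_pipeline’ directory with ‘database.tar’ present there, and the helper names ‘zonkey_default_output’ and ‘run_zonkey_examples’ are hypothetical. The output-directory naming (‘nuclear.bam’ → ‘nuclear.zonkey’) is inferred from the comments in the commands above rather than taken from the Zonkey documentation.

```shell
#!/bin/sh
# Sketch: run the single-BAM Zonkey examples one after another.
# Assumes the current directory is the 'zonkey_pipeline' example directory
# and that 'database.tar' is present there.

# Hypothetical helper: default output directory for a BAM file, inferred
# from the examples ('nuclear.bam' -> 'nuclear.zonkey').
zonkey_default_output() {
    printf '%s.zonkey\n' "${1%.bam}"
}

# Hypothetical helper: process each BAM separately, stopping at the
# first run that fails.
run_zonkey_examples() {
    for bam in nuclear.bam mitochondrial.bam combined.bam; do
        paleomix zonkey run database.tar "$bam" || return 1
        echo "Results for $bam: $(zonkey_default_output "$bam")"
    done
}
```

Calling ‘run_zonkey_examples’ covers three of the four cases listed above; the combined nuclear and mitochondrial run still needs an explicit output directory, as shown earlier.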

Please see the Troubleshooting section if you run into problems running the pipeline. The output generated by the pipeline is described in the File structure section.