.. _examples:

Example projects and data-sets
==============================

PALEOMIX includes small example projects for each of the larger pipelines. These projects are designed to run in a short amount of time and to help verify that the pipelines have been correctly installed.
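
Before running any of the examples, it can help to confirm that the 'paleomix' command is available on the PATH at all. The following is a minimal sketch of such a check and is not part of PALEOMIX itself; the 'have_command' helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of PALEOMIX): succeeds if the named
# command can be found on the current PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command paleomix; then
    echo "paleomix found at: $(command -v paleomix)"
else
    echo "paleomix was not found on PATH; check the installation" >&2
fi
```

If the command is reported as missing, the example projects cannot be run until the installation has been fixed.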


.. _examples_bam:

BAM Pipeline example project
----------------------------

The example project for the BAM pipeline involves the processing of a small data set consisting of (simulated) ancient sequences derived from the human mitochondrial genome. The runtime of this project on a typical desktop or laptop ranges from around 1 minute to around 1 hour (when full modeling of ancient DNA damage patterns is enabled). To access this example project, use the 'example' command for the BAM pipeline to copy the project files to a given directory (here, the current directory)::

    $ paleomix bam example .
    $ cd bam_pipeline
    $ paleomix bam run makefile.yaml

The output generated by the pipeline is described in the :ref:`bam_filestructure` section. Please see the :ref:`troubleshooting` section if you run into problems running the pipeline.


.. _examples_phylo:

Phylogenetic Pipeline example project
-------------------------------------

The example project for the phylogenetic pipeline involves the processing and mapping of a small data set consisting of (simulated) sequences derived from human and primate mitochondrial genomes, followed by the genotyping of gene sequences and the construction of a maximum likelihood phylogeny. Since this example project starts from raw reads, it requires that the BAM pipeline has been correctly installed, as described in the :ref:`bam_requirements` section. The runtime of this project on a typical desktop or laptop ranges from around 30 minutes to around 1 hour.

To access this example project, use the 'example' command for the phylogenetic pipeline to copy the project files to a given directory (here, the current directory), and then run the 'setup.sh' script in the root directory, to generate the data set::

    $ paleomix phylo example .
    $ cd phylo_pipeline
    $ ./setup.sh

Once the example data has been generated, the two pipelines may be executed::

    $ cd alignment
    $ paleomix bam run makefile.yaml
    $ cd ../phylogeny
    $ paleomix phylo genotype+msa+phylogeny makefile.yaml
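
The commands above expect a 'makefile.yaml' in both the 'alignment' and the 'phylogeny' directories. As a rough sketch, the following check reports which of these files, if any, cannot be found when run from the 'phylo_pipeline' directory. The 'check_makefiles' helper is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of PALEOMIX): checks that every directory
# given as an argument contains a 'makefile.yaml'. Returns non-zero if
# any of them is missing.
check_makefiles() {
    status=0
    for dir in "$@"; do
        if [ ! -f "$dir/makefile.yaml" ]; then
            echo "missing: $dir/makefile.yaml" >&2
            status=1
        fi
    done
    return "$status"
}

# Directory names are taken from the commands shown above.
if check_makefiles alignment phylogeny; then
    echo "both makefiles found"
else
    echo "expected files not found; check the current directory" >&2
fi
```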

The output generated by the pipeline is described in the :ref:`phylo_filestructure` section. Please see the :ref:`troubleshooting` section if you run into problems running the pipeline.


.. _examples_zonkey:

Zonkey Pipeline example project
-------------------------------

The example project for the Zonkey pipeline is based on a synthetic hybrid between a domestic donkey and an Arabian horse (obtained from [Orlando2013]_), using a low number of reads (1200). The runtime of these examples on a typical desktop or laptop ranges from around 30 minutes to around 1 hour, depending on your local configuration.

To access this example project, download the Zonkey reference database (see the 'Prerequisites' section of the :ref:`zonkey_usage` page for instructions), and use the 'example' command for Zonkey to copy the project files to a given directory. Here, the current directory is used; to place the example files in a different location, simply replace the '.' with the full path to the desired directory::

    $ paleomix zonkey example database.tar .
    $ cd zonkey_pipeline


The example directory contains three BAM files: one containing a nuclear alignment ('nuclear.bam'), one containing a mitochondrial alignment ('mitochondrial.bam'), and one containing a combined nuclear and mitochondrial alignment ('combined.bam'). In addition, a sample table is included which shows how multiple samples may be specified and processed at once. Each of these may be run as follows::

    # Process only the nuclear BAM;
    # by default, results are saved in 'nuclear.zonkey'
    $ paleomix zonkey run database.tar nuclear.bam

    # Process only the mitochondrial BAM;
    # by default, results are saved in 'mitochondrial.zonkey'
    $ paleomix zonkey run database.tar mitochondrial.bam

    # Process both the nuclear and the mitochondrial BAMs;
    # note that it is necessary to specify an output directory
    $ paleomix zonkey run database.tar nuclear.bam mitochondrial.bam results

    # Process the combined nuclear and mitochondrial BAM;
    # by default, results are saved in 'combined.zonkey'
    $ paleomix zonkey run database.tar combined.bam

    # Process multiple samples; the table corresponds to the four
    # cases listed above.
    $ paleomix zonkey run database.tar samples.txt
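
To see which of the single-BAM runs have already produced results, the default output names noted in the comments above ('nuclear.bam' gives 'nuclear.zonkey', and so on) can be derived and checked with a few lines of shell. This is a minimal sketch that only tests whether the expected directories exist; the 'default_output' helper is a hypothetical name, and the check assumes the runs were started from the current directory without an explicit output directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of PALEOMIX): derive the default output
# name for a BAM file by replacing the '.bam' suffix with '.zonkey',
# following the naming shown in the comments above.
default_output() {
    printf '%s.zonkey\n' "${1%.bam}"
}

for bam in nuclear.bam mitochondrial.bam combined.bam; do
    out=$(default_output "$bam")
    if [ -d "$out" ]; then
        echo "results for $bam found in $out"
    else
        echo "no results yet for $bam (expected $out)"
    fi
done
```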


Please see the :ref:`troubleshooting` section if you run into problems running the pipeline. The output generated by the pipeline is described in the :ref:`zonkey_filestructure` section.