PIIM: Population genetic Inference In Metagenomics

Philip Johnson

Documentation for version 2

Contents:

About

PIIM uses the site-frequency spectrum to estimate the population genetic parameters θ=2Nu (where u is the per-site mutation rate) and R=Nr (where r is the exponential growth rate). Version 2 adds the ability to look at the patterns of pairs of sites to estimate ρ=2Nc (where c is the per-site recombination rate). Note that the ρ estimation routine assumes a constant-size population (i.e. R=0). For details on the method, see:

Questions or bug reports may be e-mailed to me (plfjohnson) at my institution (@berkeley.edu; soon to be @emory.edu).



Changes

Version 2.0.3 (10 April 2013) Version 2.0.2 (13 March 2013) Version 2.0.1 Version 2.0

Files

The PIIM distribution includes this documentation and source code for the following programs:


Unpacking & compiling


Preparing your input data

Unfortunately, there are as many assembly output formats as assembly programs. As a result, I chose to implement a simple XML format that includes all the information required for PIIM: reads, assembly, information about read pairs and quality scores. The PERL script ace2xml.pl will convert from the output format used by the phrap assembler to the XML format used by PIIM. If you need the gory details, the XML specification is formalized in "piim_r.dtd", and an example can be found in "example.xml".

Please note that ace2xml.pl must find "FF_Index.pm"; the easiest way to make sure this happens is to keep the two files in the same directory. Running './ace2xml.pl' by itself will produce a short description of its usage. Typical usage looks like:

   ./ace2xml.pl -i my_assembly.ace -q read_qualities.qfa
The quality file should be a single fasta-style file with qualities instead of bases.

Running piim

Running './piim_r' by itself will produce a short list of the command-line parameters. The typical usage would be something like:

   ./piim_r -i example.xml -l lk_n50_t0.01_allsmooth -joint-est

Here, we take our input file (example.xml) and search for the maximum likelihood recombination rate using the likelihood in the specified table. Not that the maximum read depth in your input data should be no more than the maximum n in the likelihood table (otherwise those positions will be ignored and a warning issued). The output is the maximum likelihood parameter estimates separated by tabs (θ, ρ, R [growth rate], λ [gene-conversion parameter]):

  0.009375	0.0031875	0	500
Note that the λ parameter is fixed at 500 (never estimated) and the growth parameter can only estimated if using the -freqspec version of piim (in which case ρ is NOT estimated).

By default, piim uses the Nelder-Mead simplex algorithm to efficiently search parameter space. However, my implementation requires the caching all pairs of SNPs and uses the hard disk extensively. An alternative is to calculate the entire likelihood surface at fixed grid points in a single pass (option "-surface") and then interpolate to find the MLE. The former (default) is generally faster if you are not running anything else on the same computer that uses the hard disk extensively. However, the latter ("-surface") will be faster if you have a multi-core computer and are trying to run multiple instances simultaneously.

Details about command line parameters:


License

  Copyright 2009 The Regents of the University of California.  All
  rights reserved.
 
  IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT,
  INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
  LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
  DOCUMENTATION, EVEN IF REGENTS HAS BEEN ADVISED OF THE POSSIBILITY
  OF SUCH DAMAGE.
 
  REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  FOR A PARTICULAR PURPOSE. THE SOFTWARE AND ACCOMPANYING
  DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED "AS
  IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT,
  UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 
  This software is distributed under the terms of Version 2 of the
  GNU General Public License as published by the Free Software
  Foundation.  If you have not received a copy of Version 2 of the
  GNU General Public License with this program, you may download the
  GPL from the following url: http://www.gnu.org/copyleft/gpl.html.