.. vcfsim documentation master file. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. .. raw:: html

vcfsim 1.0

.. image:: images/vcfsim_logo.png :width: 200 :align: center What is vcfsim? =============== ``vcfsim`` is a command-line tool for generating simulated VCFs (Variant Call Format files used to encode genetic variation data). It pairs a coalescent simulation backend (`msprime `_) with lightweight postprocessing to produce biologically realistic VCFs with parameterized missing data — a full simulated dataset can be created from just a few command-line arguments. In particular, ``vcfsim`` makes it easy to simulate **all-sites VCFs** that contain both variant and invariant sites. All-sites VCFs are required for unbiased estimation of π and d\ :sub:`xy` (see `pixy `_), and are typically expensive to obtain from real data — ``vcfsim`` is designed to drop straight into pixy-style workflows for testing, benchmarking, and methods development. ``vcfsim`` also supports two missing-data models (uniform and HMM-based spatial clustering), arbitrary ploidy, custom sample names, two-population splits, and multi-chromosome batch runs from a parameter file. .. toctree:: :caption: Documentation :maxdepth: -1 about installation arguments changelog contributing .. toctree:: :maxdepth: -1 :caption: Guides usage missing_data populations multi_chromosome How should I cite vcfsim? ========================= If you use ``vcfsim`` in your research, please cite the repository and the underlying coalescent simulator: * ``vcfsim``: https://github.com/samuk-lab/vcfsim * Baumdicker, F. *et al.* (2022). Efficient ancestry and mutation simulation with msprime 1.0. *Genetics*, 220(3), iyab229. https://doi.org/10.1093/genetics/iyab229