Multi-chromosome runs
For genome-scale simulations — or any time you want to vary parameters
across chromosomes — pass a parameter file to --chromosome_file
instead of specifying the per-chromosome parameters on the command line.
vcfsim runs each row of the file as an independent simulation and
concatenates the results into a single VCF.
The parameter file format
The file is whitespace-separated, headerless, and has exactly five columns:
chromosome ploidy sequence_length Ne mu
For example:
1 2 10000 100000 1e-6
2 2 15000 100000 1e-6
X 1 8000 50000 1e-6
- chromosome: name written into the #CHROM column for the corresponding rows.
- ploidy: must be a positive integer.
- sequence_length, Ne, mu: may be plain numbers or simple Python numeric expressions (1e4, 1.7e6, 5.5e-9, 2*10000).
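The column rules above can be checked before launching a long run. The following is a minimal sketch of such a check, not vcfsim's own parser: the helper names eval_numeric and parse_chromosome_lines are hypothetical, and the set of operators accepted in numeric expressions is an assumption.

```python
import ast
import operator

# Operators accepted in numeric fields (an assumption of this sketch,
# not a statement about what vcfsim itself accepts).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def eval_numeric(text):
    """Evaluate a simple arithmetic expression such as '2*10000' without eval()."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {text!r}")
    return walk(ast.parse(text, mode="eval").body)


def parse_chromosome_lines(lines):
    """Parse 'chromosome ploidy sequence_length Ne mu' rows into dictionaries."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skipping blank lines is a choice made by this sketch
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(fields)}")
        name, ploidy_text, length_text, ne_text, mu_text = fields
        ploidy = int(ploidy_text)
        if ploidy < 1:
            raise ValueError(f"line {lineno}: ploidy must be a positive integer")
        rows.append({
            "chromosome": name,
            "ploidy": ploidy,
            "sequence_length": eval_numeric(length_text),
            "Ne": eval_numeric(ne_text),
            "mu": eval_numeric(mu_text),
        })
    return rows


rows = parse_chromosome_lines([
    "1 2 10000 100000 1e-6",
    "X 1 8000 50000 1e-6",
])
```

Because columns are split on whitespace, an expression containing spaces (for example 2 * 10000) would be counted as extra columns and rejected by this check.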
Each row is run as a separate msprime simulation; the per-chromosome
ploidy, sequence length, Ne, and mutation rate can therefore vary freely.
This is particularly useful for simulating sex chromosomes alongside
autosomes — give the X-chromosome row ploidy = 1 and the autosome
rows ploidy = 2.
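For genomes with many chromosomes it can be easier to generate the parameter file than to type it. The sketch below is one hypothetical way to do that: the function parameter_rows and its defaults are illustrative only and are not part of vcfsim. It marks the chromosomes named in haploid with ploidy 1 and all others with ploidy 2.

```python
def parameter_rows(lengths, haploid=("X",), ne=100000, haploid_ne=50000, mu="1e-6"):
    """Return one 'chromosome ploidy sequence_length Ne mu' line per chromosome.

    lengths maps chromosome name to sequence length. Chromosomes listed in
    haploid get ploidy 1 and haploid_ne; all others get ploidy 2 and ne.
    """
    lines = []
    for name, length in lengths.items():
        is_haploid = name in haploid
        ploidy = 1 if is_haploid else 2
        chrom_ne = haploid_ne if is_haploid else ne
        lines.append(f"{name} {ploidy} {length} {chrom_ne} {mu}")
    return lines


lines = parameter_rows({"1": 10000, "2": 15000, "X": 8000})

# Write the rows to a file that can be passed to --chromosome_file.
with open("input.txt", "w") as handle:
    handle.write("\n".join(lines) + "\n")
```

With the default arguments this reproduces the three-row example file shown earlier.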
Running with a parameter file
When using --chromosome_file, omit the per-chromosome flags
(--chromosome, --replicates, --sequence_length, --ploidy,
--Ne, --mu) — they come from the file instead. Everything else
(--seed, sample specification, missingness model, --output_file)
works as usual:
vcfsim \
--seed 1234 \
--percent_missing_sites 0 \
--percent_missing_genotypes 0 \
--output_file myvcf \
--sample_size 10 \
--chromosome_file input.txt
This writes a single concatenated VCF (myvcf.vcf) containing all
chromosomes from input.txt. The header is written once at the top
and the per-chromosome bodies are appended in order.
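The layout of the concatenated file can be pictured with a short sketch. This is an illustration of the "header once, bodies in order" idea under the assumption that each per-chromosome result is available as VCF text; the function concatenate_vcfs is hypothetical and is not vcfsim's implementation.

```python
def concatenate_vcfs(vcf_texts):
    """Join per-chromosome VCF texts, keeping header lines only from the first."""
    out = []
    for index, text in enumerate(vcf_texts):
        for line in text.splitlines():
            # Header lines start with '#'; drop them for every text after the first.
            if line.startswith("#") and index > 0:
                continue
            out.append(line)
    return "\n".join(out) + "\n"


header = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts0\n"
)
chrom_1 = header + "1\t12\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\n"
chrom_x = header + "X\t40\t.\tG\tC\t.\tPASS\t.\tGT\t1\n"

combined = concatenate_vcfs([chrom_1, chrom_x])
```

The records keep the order of the input list, which mirrors the row order of the parameter file.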
Combining with custom sample names
The sample-specification flags (--sample_size, --samples,
--samples_file) work in batch mode exactly as they do for single-chromosome
runs. To label the same set of samples across all chromosomes:
vcfsim \
--seed 1234 \
--percent_missing_sites 0 \
--percent_missing_genotypes 0 \
--output_file myvcf \
--samples_file names.txt \
--chromosome_file input.txt
The same sample names are used in every per-chromosome simulation, so the final concatenated VCF has a single, consistent column ordering.
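One way to confirm the sample columns after a run is to read them back from the #CHROM header line, where sample names follow the nine fixed VCF columns. The helper sample_names below is a hypothetical sketch and works on VCF text; the sample names in the example are made up.

```python
def sample_names(vcf_text):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are CHROM through FORMAT.
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind_A\tind_B\n"
    "1\t12\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\t1|1\n"
)
names = sample_names(example_vcf)
```

To check a real output file, read it with open("myvcf.vcf").read() and pass the text to the same function.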