Simulating a population split
By default vcfsim simulates a single panmictic population. To produce a
VCF that contains two populations with a shared history, switch to
two-population mode:
vcfsim \
--chromosome 1 --replicates 1 --seed 1234 \
--sequence_length 10000 --ploidy 2 --Ne 100000 --mu 1e-6 \
--percent_missing_sites 0 --percent_missing_genotypes 0 \
--sample_size 10 \
--population_mode 2 --div_time 1000 \
--output_file myvcf
The model
--population_mode 2 simulates a clean two-population split: an
ancestral population C of effective size --Ne splits into two
present-day populations A and B, both of size --Ne, at
--div_time generations before present.
--population_mode 2 activates the split.
--div_time sets the split time in generations before present. It is required when
--population_mode is 2 and ignored otherwise.
Sample distribution
Samples are split evenly between the two populations, so the total sample count must be even. This applies to all three sample-specification flags:
--sample_size 10 puts 5 samples in A and 5 in B.
--samples A1 A2 A3 A4 puts the first two in A and the last two in B.
--samples_file names.txt likewise splits the names in order.
If an odd number of samples is requested in mode 2, vcfsim raises an
error rather than silently truncating.
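The assignment rule described above can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not vcfsim's own code; the function name split_samples is hypothetical.

```python
def split_samples(names):
    """Assign the first half of `names` to population A and the rest to B.

    Mirrors the rule described above: an odd number of samples is an
    error, never a silent truncation.
    """
    if len(names) % 2 != 0:
        raise ValueError(
            f"two-population mode needs an even number of samples, got {len(names)}"
        )
    half = len(names) // 2
    return names[:half], names[half:]


pop_a, pop_b = split_samples(["A1", "A2", "A3", "A4"])
print(pop_a, pop_b)
```

Keeping the order of the input names is what lets downstream tools recover population membership from position alone.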
Using the output for FST simulations
Two-population mode is the natural choice for benchmarking
between-population statistics (FST, dxy). The split time
gives you a single, intuitive knob for the level of differentiation:
larger --div_time produces deeper splits and higher FST.
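As a rough guide to how --div_time maps onto differentiation, standard coalescent theory for a clean split (no migration, equal and constant diploid sizes) gives an expected pairwise coalescence time of 2Ne generations within a population and T + 2Ne between populations. A Hudson-style FST, defined as 1 minus the ratio of within- to between-population diversity, is then approximately T / (T + 2Ne). The sketch below evaluates that approximation. It is a textbook expectation under those assumptions, not something vcfsim computes, and the function name is hypothetical.

```python
def expected_fst(div_time, ne):
    """Approximate expected Hudson-style FST for a clean two-population split.

    Assumptions (not taken from vcfsim itself): diploid populations, no
    migration, all three populations of constant size `ne`, and mutations
    accumulating in proportion to coalescence time.
    """
    if div_time < 0:
        raise ValueError("div_time must be non-negative")
    if ne <= 0:
        raise ValueError("ne must be positive")
    t_within = 2 * ne              # expected coalescence time, same population
    t_between = div_time + 2 * ne  # expected coalescence time, across the split
    return 1 - t_within / t_between


# Parameters from the example command: --div_time 1000 --Ne 100000
print(expected_fst(1000, 100000))
```

With the example parameters this comes out at about 0.005, so a --div_time that is small relative to --Ne yields only weak differentiation. Estimates from a single simulated replicate will scatter around the expectation.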
When using the output for pixy-style analyses, build the populations file
from the sample names emitted by vcfsim: the first half belong to
population A and the second half to population B.
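A minimal sketch of that step follows. It reads the sample names from the #CHROM header line of the VCF and writes the headerless, tab-separated sample/population table that pixy expects. The function names, the population labels "A" and "B", and the file names in the usage comment are assumptions made for illustration.

```python
import gzip


def read_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def populations_table(names):
    """Label the first half of `names` as A and the second half as B."""
    if len(names) % 2 != 0:
        raise ValueError("expected an even number of samples")
    half = len(names) // 2
    rows = [f"{name}\tA" for name in names[:half]]
    rows += [f"{name}\tB" for name in names[half:]]
    return "\n".join(rows) + "\n"


# Usage (file names are placeholders):
# names = read_sample_names("myvcf.vcf")
# with open("populations.txt", "w") as out:
#     out.write(populations_table(names))
```

Reading the names back from the VCF, instead of regenerating them, keeps the populations file in sync with whatever names vcfsim actually wrote.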