How much accuracy do you get?
Probability values are stored to at least 4 decimal places in BGEN v1.1. In BGEN v1.2 they are stored to within 1/(2^b - 1), where b is the number of bits used. This gives the following table:
Number of bits | Probabilities accurate to within | Decimal places of accuracy |
---|---|---|
1 | 1 | 0 |
2 | 1/3 | 0 |
3 | 1/7 | 0 |
4 | 1/15 | 0 |
5 | 1/31 | 1 |
6 | 1/63 | 1 |
7 | 1/127 | 1 |
8 | 1/255 | 2 |
9 | 1/511 | 2 |
10 | 1/1023 | 2 |
11 | 1/2047 | 3 |
12 | 1/4095 | 3 |
13 | 1/8191 | 3 |
14 | 1/16383 | 3 |
15 | 1/32767 | 4 |
16 | 1/65535 | 4 |
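
The table can be reproduced with a short script. This is a minimal sketch, not part of any BGEN tool: the function names `max_error` and `decimal_places` are made up for illustration, and the rule used for the last column (d decimal places means the error bound is below 0.5 × 10^-d) is an assumption inferred from the table values rather than something stated in the text.

```python
from fractions import Fraction


def max_error(bits: int) -> Fraction:
    """Accuracy bound quoted for BGEN v1.2: 1 / (2^bits - 1)."""
    return Fraction(1, 2**bits - 1)


def decimal_places(bits: int) -> int:
    """Largest d such that the bound is below 0.5 * 10^-d.

    Assumed rule: it matches the table, but is not taken from the spec.
    """
    err = max_error(bits)
    d = 0
    # Moving from d to d + 1 places requires err < 0.5 * 10^-(d+1) = 5 / 10^(d+2).
    while err < Fraction(5, 10 ** (d + 2)):
        d += 1
    return d


if __name__ == "__main__":
    print("Number of bits | Probabilities accurate to within | Decimal places of accuracy |")
    print("---|---|---|")
    for bits in range(1, 17):
        print(f"{bits} | {str(max_error(bits))} | {decimal_places(bits)} |")
```

Exact fractions are used instead of floats so that comparisons near the thresholds (for example 1/2047 against 0.0005) are not affected by rounding.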
How much accuracy do you need?
The following graphs compare -log10 P-values for an association test conducted using an imputed GEN file (probabilities stored to about 3 decimal places of accuracy) with those from the same data converted to BGEN v1.2 at different precisions. At 8 bits there is a maximum discrepancy of around 0.03 in the -log10 P-values. (The data here represent imputed genotypes at 48,000 and a simulated case/control trait. Only SNPs with a minor allele count of at least 100, i.e. MAF > 0.1%, were tested; no IMPUTE info threshold was applied.)