QCTOOL

View Ticket
Ticket UUID: 4d88dff8cce297233bcea8b0f17c8e66da4996a1
Title: MalformedInputError with GP Data
Status: Fixed
Type: Code_Defect
Severity: Critical
Priority: Immediate
Subsystem:
Resolution: Fixed
Last Modified: 2020-06-30 17:04:40
Version Found In: 2.0.6
User Comments:
gav added on 2020-06-27 15:59:04:

I’ve attached a fake test file that produces the following error.

$ qctool_v2.0.6 -g test.vcf -vcf-genotype-field GP -og test.bgen
Welcome to qctool
(version: 2.0.6, revision )
(C) 2009-2017 University of Oxford
Opening genotype files : [******************************] (1/1,0.0s,158.7/s)
========================================================================
Input SAMPLE file(s): Output SAMPLE file: "(n/a)".
Sample exclusion output file: "(n/a)".
Input GEN file(s):
 (not computed) "test.vcf"
 (total 1 sources, number of snps not computed).
 Number of samples: 10
Output GEN file(s): "test.bgen"
Output SNP position file(s): (n/a)
Sample filter: .
# of samples in input files: 10.
# of samples after filtering: 10 (0 filtered out).
========================================================================
Processing SNPs : (0/?,0.1s,0.0/s)
========================================================================
Number of SNPs:
in input file(s): (not computed).
in output file(s): 0
Number of samples in input file(s): 10.
Output GEN files: (0 snps) "test.bgen"
 (total 0 snps).
========================================================================
!! Error (genfile::MalformedInputError): Source "test.vcf" is malformed on line 17..
Thank you for using qctool.

gav added on 2020-06-27 16:01:12:

(From Albert Vernon Smith): This failure arises in the absence of a GT field. If GT is present, then GP can be accessed and converted to BGEN. Attached is a modified version of the test case that works.
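
To check ahead of time whether a VCF record falls into this case, one can inspect its FORMAT column. The following is a minimal Python sketch and not part of QCTOOL: the helper names `format_fields` and `lacks_gt` and the two example records are made up for illustration (they are not the attached test files), and it assumes an uncompressed, tab-delimited VCF data line.

```python
def format_fields(vcf_line):
    """Return the FORMAT keys of one VCF data line (hypothetical helper)."""
    columns = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) of a VCF data line is FORMAT, e.g. "GT:GP" or "GP".
    if len(columns) < 9:
        return []
    return columns[8].split(":")


def lacks_gt(vcf_line):
    """True if the record declares per-sample fields but no GT key."""
    fields = format_fields(vcf_line)
    return bool(fields) and "GT" not in fields


# Illustrative records only; these are not taken from the attached test files.
gp_only = "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGP\t0.9,0.1,0.0"
gt_and_gp = "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT:GP\t0/0:0.9,0.1,0.0"
print(lacks_gt(gp_only), lacks_gt(gt_and_gp))
```

Records for which `lacks_gt` is true are the ones that triggered the error in 2.0.6.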


avsmith added on 2020-06-29 01:22:29:

From the BitBucket Issue:

How critical is this to your workflow, e.g. is there a simple workaround that puts the GT field in before using QCTOOL?

Currently, there is no easy workaround. We have a large dataset in Savvy format (https://github.com/statgen/savvy). Savvy currently exports GP without GT to VCF, which would be used as an intermediate on the way to BGEN. We're hoping to make this large imputed dataset available to the community in BGEN format.


gav added on 2020-06-29 07:51:15:

Thanks Albert for the additional info, I will look at solving this.


gav added on 2020-06-30 09:22:38:

Hi Albert,

Can you please try the version (2.0.8-beta) in branch 4d88dff8cc-fix and let me know if this solves the problem?

$ qctool_v2.0.8-beta -g test.vcf -vcf-genotype-field GP -og /tmp/test.bgen 

(...snipped output)

========================================================================

Number of SNPs:
                     -- in input file(s):                 (not computed).
 -- in output file(s):                1

Number of samples in input file(s):   10.

Output GEN files:                     (1      snps)  "/tmp/x.bgen"
                                      (total 1 snps).
========================================================================


Thank you for using qctool.

You should be able to get it at https://enkre.net/cgi-bin/code/qctool/tarball/4d88dff8cc-fix/qctool.tar.gz.

Thanks, g.


gav added on 2020-06-30 09:25:35:

(Note there is an outstanding issue [b952a6be] where you must supply a sample file with `-s` if you want sample identifiers in the output BGEN file -
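
For anyone who needs to create a sample file to pass with `-s`, here is a minimal Python sketch. It assumes the Oxford-style sample layout (a header row, then a row of column types, then one row per sample); please check that layout against the QCTOOL documentation before relying on it. The function name `sample_file_text` and the `sample_N` identifiers are hypothetical.

```python
def sample_file_text(sample_ids):
    """Build the text of a minimal sample file (assumed Oxford-style layout)."""
    # Assumed layout: header row, then a row of column types, then the samples.
    lines = ["ID_1 ID_2 missing", "0 0 0"]
    for sample_id in sample_ids:
        lines.append(f"{sample_id} {sample_id} 0")
    return "\n".join(lines) + "\n"


# Ten made-up identifiers, matching the number of samples in the test VCF.
ids = [f"sample_{i}" for i in range(1, 11)]
text = sample_file_text(ids)
print(text, end="")
```

The identifiers must be listed in the same order as the samples in the VCF.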


avsmith added on 2020-06-30 12:24:41:

Unfortunately, I'm still getting an error. It works on the test example with GT and GP, but not on the one without.

Command:

~/qctool/build/default/qctool_v2.0.8-beta -g test.vcf -og test.bgen

Result:

Welcome to qctool
(version: 2.0.8-beta, revision )

(C) 2009-2017 University of Oxford

Opening genotype files                                      : [******************************] (1/1,0.0s,452.5/s)
========================================================================

Input SAMPLE file(s):         Output SAMPLE file:             "(n/a)".
Sample exclusion output file:   "(n/a)".

Input GEN file(s):
                                                    (not computed)  "test.vcf"
                                         (total 1 sources, number of snps not computed).
                      Number of samples: 10
Output GEN file(s):             "test.bgen"
Output SNP position file(s):    (n/a)
Sample filter:                  .
# of samples in input files:    10.
# of samples after filtering:   10 (0 filtered out).

========================================================================

Processing SNPs                                             :  (0/?,0.0s,0.0/s)terminate called after throwing an instance of 'genfile::OperationUnsupportedError'
  what():  genfile::OperationUnsupportedError
[1]    7051 abort (core dumped)  ~/qctool/build/default/qctool_v2.0.8-beta -

PS. Thanks for your prompt attention on this.


gav added on 2020-06-30 12:31:46:

Hi Albert, you will need to have -vcf-genotype-field GP on the command line to get BGEN output; this relates to the way QCTOOL treats the GT field as the default in VCF. g.
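
Since the option is easy to forget, a small wrapper that always includes it may help in scripted pipelines. This is only a sketch: it uses just the options that appear in this ticket (`-g`, `-vcf-genotype-field`, `-og`, `-s`), and the function name `qctool_command` and the default binary name `qctool` are assumptions. It builds the argument list without running anything.

```python
import shlex


def qctool_command(vcf_path, bgen_path, genotype_field="GP",
                   sample_path=None, binary="qctool"):
    """Assemble a VCF-to-BGEN command line (hypothetical helper)."""
    args = [binary, "-g", vcf_path,
            "-vcf-genotype-field", genotype_field,  # without this, GT is used
            "-og", bgen_path]
    if sample_path is not None:
        # Optional sample file, for sample identifiers in the output.
        args += ["-s", sample_path]
    return args


cmd = qctool_command("test.vcf", "test.bgen")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the binary name matches the local installation.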


avsmith added on 2020-06-30 13:18:52:

Silly me. I knew that, and promptly forgot. It works....

Thanks.


gav added on 2020-06-30 17:01:57:

Fixed in v2.0.8

