AI consensus calling error on WGS samples #131

@GACGAMA

I'm trying to run somaticseq_parallel.py on VCFs from some WGS samples to produce the AI consensus calls.
I'm using SomaticSeq v3.7.3 with XGBoost 2.0.2.
I ran all the mutation callers and then, with the resulting VCF files, ran the following command:

somaticseq_parallel.py \
  --classifier-snv /scratch4/nsobrei2/ggama1/training/somaticseq/ai_model_titration_ffpe_wgs_synth/SNV_model.classifier \
  --classifier-indel /scratch4/nsobrei2/ggama1/training/somaticseq/ai_model_titration_ffpe_wgs_synth/INDEL_model.classifier \
  --output-directory /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR \
  --genome-reference /scratch4/nsobrei2/references/ncbi_grch38_cipher/GRCh38_full_analysis_set_plus_decoy_hla.fa \
  -dbsnp /scratch4/nsobrei2/references/dbsnp/138_cipher/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
  --threads 38 \
  paired \
  --tumor-bam-file /scratch4/nsobrei2/ggama1/germline-tumor/bams/BH12847_1_TUMOR.bam \
  --normal-bam-file /scratch4/nsobrei2/ggama1/germline-tumor/bams/BH12847_1_GERMLINE.bam \
  --mutect2-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.MuTect2.vcf.gz \
  --vardict-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarDict.vcf.gz \
  --somaticsniper-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.SomaticSniper.vcf.gz \
  --muse-vcf /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.MuSE.vcf.gz \
  --strelka-snv /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.Strelka.snv.vcf.gz \
  --strelka-indel /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.Strelka.indel.vcf.gz \
  --varscan-snv /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarScan2.snv.vcf.gz \
  --varscan-indel /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarScan2.indel.vcf.gz \
  --lofreq-snv /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.LoFreq.snv.vcf.gz \
  --lofreq-indel /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.LoFreq.indel.vcf.gz

This is the output, which ends with the error:


INFO 2024-01-29 21:25:59,514 SomaticSeq           SomaticSeq Input Arguments: output_directory=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR, genome_reference=/scratch4/nsobrei2/references/ncbi_grch38_cipher/GRCh38_full_analysis_set_plus_decoy_hla.fa, truth_snv=None, truth_indel=None, classifier_snv=/scratch4/nsobrei2/ggama1/training/somaticseq/ai_model_titration_ffpe_wgs_synth/SNV_model.classifier, classifier_indel=/scratch4/nsobrei2/ggama1/training/somaticseq/ai_model_titration_ffpe_wgs_synth/INDEL_model.classifier, pass_threshold=0.5, lowqual_threshold=0.1, algorithm=xgboost, homozygous_threshold=0.85, heterozygous_threshold=0.01, minimum_mapping_quality=1, minimum_base_quality=5, minimum_num_callers=0.5, dbsnp_vcf=/scratch4/nsobrei2/references/dbsnp/138_cipher/Homo_sapiens_assembly38.dbsnp138.vcf.gz, cosmic_vcf=None, inclusion_region=None, exclusion_region=None, threads=38, somaticseq_train=False, seed=0, tree_depth=12, iterations=None, features_excluded=[], extra_hyperparameters=None, keep_intermediates=False, tumor_bam_file=/scratch4/nsobrei2/ggama1/germline-tumor/bams/BH12847_1_TUMOR.bam, normal_bam_file=/scratch4/nsobrei2/ggama1/germline-tumor/bams/BH12847_1_GERMLINE.bam, tumor_sample=TUMOR, normal_sample=NORMAL, mutect_vcf=None, indelocator_vcf=None, mutect2_vcf=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.MuTect2.vcf.gz, varscan_snv=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarScan2.snv.vcf.gz, varscan_indel=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarScan2.indel.vcf.gz, jsm_vcf=None, somaticsniper_vcf=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.SomaticSniper.vcf.gz, 
vardict_vcf=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.VarDict.vcf.gz, muse_vcf=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.MuSE.vcf.gz, lofreq_snv=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.LoFreq.snv.vcf.gz, lofreq_indel=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.LoFreq.indel.vcf.gz, scalpel_vcf=None, strelka_snv=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.Strelka.snv.vcf.gz, strelka_indel=/scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/vcf_per_sample/extracted_vcf/kids_first/unsorted/BH12847_1_TUMOR.Strelka.indel.vcf.gz, tnscope_vcf=None, platypus_vcf=None, arbitrary_snvs=[], arbitrary_indels=[], which=paired
***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

***** WARNING: File /scratch4/nsobrei2/ggama1/germline-tumor/cavatica/somaticseq/consensus_AI/kids_first/BH12847_1_TUMOR/38.th.input.bed has inconsistent naming convention for record:
HLA-A*01:01:01:01	0	3503

2024-01-29 21:29:24,802 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:29:24,802 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:29:43,208 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:29:43,208 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:29:55,957 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:29:55,957 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:29:57,641 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:29:57,641 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:00,880 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:00,880 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:03,324 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:03,324 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:05,665 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:05,665 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:05,670 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:05,670 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:06,451 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:06,451 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:07,968 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:07,968 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:08,179 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:08,179 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:08,784 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:08,784 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:17,032 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:17,032 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:17,879 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:17,879 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:17,993 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:17,993 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:18,751 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:18,751 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:23,687 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:23,687 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:24,247 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:24,247 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:24,306 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:24,306 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:25,604 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:25,604 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:26,489 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:26,489 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:26,632 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:26,632 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:27,644 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:27,644 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:27,884 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:27,884 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:28,425 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:28,425 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:28,616 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:28,616 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:29,069 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:29,069 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:29,767 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:29,767 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:30,179 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:30,179 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:30,292 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:30,292 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:30,705 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:30,705 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:30,930 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:30,930 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:31,435 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:31,435 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:31,742 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:31,742 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:31,956 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:31,956 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:32,202 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:32,202 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:34,058 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:34,058 somatic_vcf2tsv.py   NO RE-SCALING
2024-01-29 21:30:36,062 - somatic_vcf2tsv.py - INFO - NO RE-SCALING
INFO 2024-01-29 21:30:36,062 somatic_vcf2tsv.py   NO RE-SCALING
INFO 2024-01-29 22:26:09,775 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:26:09,775 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 22:43:11,993 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:43:11,993 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 22:44:54,332 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:44:54,332 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 22:53:46,696 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:53:46,696 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 22:57:05,534 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:57:05,534 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 22:57:38,264 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:57:38,264 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 22:58:31,952 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 22:58:31,952 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:00:51,089 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:00:51,089 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:03:32,194 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:03:32,194 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:04:09,206 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:04:09,206 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:07:29,075 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:07:29,075 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:08:31,220 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:08:31,220 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:08:58,106 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:08:58,106 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:09:43,311 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:09:43,311 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:10:37,123 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:10:37,123 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:11:15,132 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:11:15,132 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:14:03,066 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:14:03,066 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:15:32,899 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:15:32,900 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:18:17,799 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:18:17,799 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:18:40,118 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:18:40,118 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:19:20,634 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:19:20,634 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:21:47,766 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:21:47,766 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:30:41,076 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:30:41,076 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:31:09,867 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:31:09,868 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:31:36,892 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:31:36,892 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:32:03,360 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:32:03,361 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:33:42,153 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:33:42,153 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:34:06,125 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:34:06,126 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:45:14,909 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:45:14,909 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-29 23:54:18,994 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-29 23:54:18,994 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:02:30,329 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:02:30,329 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:02:43,281 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:02:43,281 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:04:03,375 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:04:03,375 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:07:53,272 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:07:53,272 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:20:54,193 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:20:54,193 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:26:38,802 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:26:38,802 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:29:20,286 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:29:20,286 xgboost_predictor    Number of trees to use = 100
INFO 2024-01-30 00:38:09,574 xgboost_predictor    Columns removed for prediction: CHROM,POS,ID,REF,ALT,Strelka_QSS,Strelka_TQSS,if_COSMIC,COSMIC_CNT,TrueVariant_or_False
INFO 2024-01-30 00:38:09,575 xgboost_predictor    Number of trees to use = 100
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/ggama1/.conda/envs/somaticseq/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/ggama1/.conda/envs/somaticseq/lib/python3.11/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/home/ggama1/programs/somaticseq/somaticseq/somaticseq_parallel.py", line 84, in runPaired_by_region
    run_somaticseq.runPaired(
  File "/home/ggama1/programs/somaticseq/somaticseq/run_somaticseq.py", line 169, in runPaired
    modelPredictor(ensembleSnv, classifiedSnvTsv, algo, classifier_snv, iterations=iterations, features_to_exclude=features_excluded)
  File "/home/ggama1/programs/somaticseq/somaticseq/run_somaticseq.py", line 87, in modelPredictor
    somatic_xgboost.predictor(classifier, input_file, output_file, non_features, iterations)
  File "/home/ggama1/programs/somaticseq/somaticseq/somatic_xgboost.py", line 173, in predictor
    scores = xgb_model.predict(dtest, ntree_limit=iterations)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Booster.predict() got an unexpected keyword argument 'ntree_limit'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ggama1/.conda/envs/somaticseq/bin/somaticseq_parallel.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/ggama1/programs/somaticseq/somaticseq/somaticseq_parallel.py", line 308, in <module>
    subdirs = pool.map(runPaired_by_region_i, bed_splitted)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggama1/.conda/envs/somaticseq/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ggama1/.conda/envs/somaticseq/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
TypeError: Booster.predict() got an unexpected keyword argument 'ntree_limit'
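
For context, the `ntree_limit` keyword was removed from `Booster.predict()` in XGBoost 2.0, and `iteration_range` is its replacement, which would explain the `TypeError` above under XGBoost 2.0.2. A minimal sketch of one way to build the right keyword for whichever XGBoost version is installed follows. `predict_kwargs` is a hypothetical helper written for illustration, not part of SomaticSeq or XGBoost. It also assumes one tree per boosting round (as with `objective=binary:logistic` and no parallel trees), so that a tree count and a round count mean the same thing.

```python
import inspect


def predict_kwargs(predict_fn, n_trees):
    """Build keyword arguments that limit prediction to the first n_trees rounds.

    predict_fn is the bound predict method of a booster object.
    Hypothetical helper: it only inspects the method's signature, so it does
    not import xgboost itself.
    """
    if n_trees is None:
        # No limit requested: let the library use the whole model.
        return {}

    params = inspect.signature(predict_fn).parameters
    if "iteration_range" in params:
        # Newer API: half-open range of boosting rounds [0, n_trees).
        return {"iteration_range": (0, n_trees)}
    if "ntree_limit" in params:
        # Older API, removed in XGBoost 2.0.
        return {"ntree_limit": n_trees}
    # Neither keyword is available: fall back to predicting with all trees.
    return {}
```

Under these assumptions, the failing call in `somatic_xgboost.py` could be written as `scores = xgb_model.predict(dtest, **predict_kwargs(xgb_model.predict, iterations))`. Pinning XGBoost to a 1.x release in the conda environment is another possible workaround.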

The log from training the AI model used in the command above was:


INFO 2024-01-27 08:52:51,190 xgboost_builder      Columns removed before training: CHROM, POS, ID, REF, ALT, Strelka_QSS, Strelka_TQSS, if_COSMIC, COSMIC_CNT, TrueVariant_or_False
INFO 2024-01-27 08:52:51,190 xgboost_builder      Number of boosting rounds = 1000
INFO 2024-01-27 08:52:51,191 xgboost_builder      Hyperparameters: max_depth=8, nthread=48, objective=binary:logistic, seed=0, tree_method=hist, grow_policy=lossguide
/home/ggama1/.conda/envs/somaticseq/lib/python3.11/site-packages/xgboost/core.py:160: UserWarning: [09:07:04] WARNING: /workspace/src/c_api/c_api.cc:1240: Saving into deprecated binary model format, please consider using `json` or `ubj`. Model format will default to JSON in XGBoost 2.2 if not specified.
  warnings.warn(smsg, UserWarning)

