HRD Prediction Workflow Using WES Data in Tumor-Only Mode

Answered using ChatGPT 4o by ChatGPT4o

Question:

Can you describe the steps for HRD prediction using different software in tumor-only mode with WES data?

Answer:

Certainly! Homologous Recombination Deficiency (HRD) prediction using Whole Exome Sequencing (WES) data in tumor-only mode involves several steps. Here’s a general workflow using different software tools:

1. Data Preparation

  • Sequencing Data: Obtain raw WES data (FASTQ files) from the tumor sample.
  • Quality Control: Use tools like FastQC to assess the quality of the raw sequencing data.
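Before or alongside FastQC, a quick sanity check for paired-end data is that R1 and R2 contain the same number of reads. The sketch below assumes gzip-compressed FASTQ with the standard four lines per record (no wrapped sequences); the function names count_fastq_reads and check_pair_counts are hypothetical helpers, not part of FastQC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a gzip-compressed FASTQ file,
# assuming four lines per record (no wrapped sequence lines).
count_fastq_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  echo $(( lines / 4 ))
}

# Hypothetical helper: print the read count if both mates agree,
# otherwise report the mismatch on stderr and return 1.
check_pair_counts() {
  local r1 r2
  r1=$(count_fastq_reads "$1")
  r2=$(count_fastq_reads "$2")
  if [ "$r1" -ne "$r2" ]; then
    echo "read count mismatch: $1=$r1 $2=$r2" >&2
    return 1
  fi
  echo "$r1"
}
```

Usage: check_pair_counts sample_R1.fastq.gz sample_R2.fastq.gz. A mismatch usually points to a truncated or corrupted file, which is cheaper to catch here than after alignment.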

2. Read Alignment

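  • Align Reads: Index the reference once with bwa index, then use BWA-MEM to map the reads to the reference genome (e.g., GRCh38). Add a read group with -R, because the GATK tools used downstream refuse reads without one.
    bwa mem -t 8 -R '@RG\tID:tumor\tSM:tumor\tPL:ILLUMINA' reference_genome.fa sample_R1.fastq.gz sample_R2.fastq.gz > aligned_reads.sam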
    
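The -R value given to bwa mem is a SAM @RG header line in which the tab separators are written as literal \t sequences that BWA expands. A minimal sketch of building it from a sample name; the function name make_read_group, the default library name lib1, and the choice to reuse the sample name for both ID and SM are hypothetical conventions, not BWA requirements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a read-group string for `bwa mem -R`.
# The backslash-t sequences are kept literal on purpose; BWA converts
# them to real tabs when it writes the @RG header line.
make_read_group() {
  local sample="$1" library="${2:-lib1}"
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA' "$sample" "$sample" "$library"
}
```

Usage: bwa mem -t 8 -R "$(make_read_group tumor01)" reference_genome.fa sample_R1.fastq.gz sample_R2.fastq.gz > aligned_reads.sam. Generating the string in one place keeps the SM tag identical across lanes of the same tumor sample.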

3. Post-Alignment Processing

  • Convert SAM to BAM: Use SAMtools to sort the alignments by coordinate; samtools sort reads SAM directly and writes a sorted BAM file.
    samtools sort -@ 8 -o sorted_reads.bam aligned_reads.sam
    
  • Mark Duplicates: Use Picard to mark duplicate reads.
    picard MarkDuplicates -I sorted_reads.bam -O dedup_reads.bam -M metrics.txt
    
  • Index BAM File: Index the BAM file using SAMtools.
    samtools index dedup_reads.bam
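Before variant calling, the BAM header can confirm that the preprocessing steps above actually took effect. This sketch reads a SAM header on stdin (e.g., from samtools view -H dedup_reads.bam) and reports missing steps; the function name check_bam_header is hypothetical, and the MarkDuplicates check assumes Picard's default @PG record ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect a SAM header on stdin and report which
# preprocessing steps appear to be missing. Returns 1 if any check fails.
check_bam_header() {
  local header status=0
  header=$(cat)
  # Sorted BAMs carry SO:coordinate on the @HD line.
  if ! grep -q $'^@HD\t.*SO:coordinate' <<< "$header"; then
    echo "not coordinate-sorted"; status=1
  fi
  # GATK requires at least one read group.
  if ! grep -q $'^@RG\t' <<< "$header"; then
    echo "no read group"; status=1
  fi
  # Assumes Picard's default @PG ID for MarkDuplicates.
  if ! grep -q $'^@PG\t.*ID:MarkDuplicates' <<< "$header"; then
    echo "duplicates not marked"; status=1
  fi
  return "$status"
}
```

Usage: samtools view -H dedup_reads.bam | check_bam_header. Checking the header is much faster than scanning the reads and catches the most common ordering mistakes.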
    

4. Variant Calling

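  • Call Variants: In tumor-only mode, use a somatic caller such as GATK Mutect2 rather than HaplotypeCaller, which models a diploid germline sample. Without a matched normal, Mutect2 relies on a population germline resource (e.g., gnomAD allele frequencies) and a panel of normals to remove germline variants and artifacts; -L restricts calling to the exome targets. If the calls will be passed to PureCN, also keep germline sites with --genotype-germline-sites true, since heterozygous germline SNPs carry the allelic-imbalance signal.
    gatk Mutect2 -R reference_genome.fa -I dedup_reads.bam -L exome_targets.interval_list --germline-resource af-only-gnomad.vcf.gz --panel-of-normals pon.vcf.gz -O raw_variants.vcf.gz
    

5. Variant Filtering

  • Filter Variants: Filter the Mutect2 calls with GATK FilterMutectCalls, which reads the statistics file Mutect2 writes next to its output. Hard filters on QD, FS, and MQ are designed for germline HaplotypeCaller calls and do not apply here.
    gatk FilterMutectCalls -R reference_genome.fa -V raw_variants.vcf.gz -O filtered_variants.vcf.gz
    

6. HRD Score Calculation

  • HRD Tools: Use specialized tools for HRD prediction. Some popular tools include:
    • HRDetect: A model that combines several mutational features (substitution and rearrangement signatures, microhomology-mediated deletions, and a scar-based HRD index). It was developed for whole-genome data; rearrangement signatures cannot be derived reliably from WES, so its performance on exomes is limited.
    • ScarHRD: An R package that computes a genomic scar score as the sum of loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transitions (LST), using allele-specific copy-number segments rather than a variant list.

Example with ScarHRD:

  • Prepare Input: ScarHRD does not read VCF files. It expects either Sequenza output or a segment table with the columns SampleID, Chromosome, Start_position, End_position, total_cn, A_cn, B_cn, and ploidy. Sequenza requires a matched normal, so in tumor-only mode derive the segments with a tool such as PureCN and reformat them into this table.
  • Run ScarHRD: Call scar_score from the scarHRD R package on the segment table (seqz = FALSE marks the input as a segment table rather than Sequenza output).
    library(scarHRD)
    scar_score("segments.txt", reference = "grch38", seqz = FALSE)
    

7. Interpretation and Reporting

  • Interpret Results: Compare the HRD score with a threshold validated for your assay and cohort. A cutoff of 42 is often cited, but it comes from the Myriad myChoice assay and is not calibrated for WES-derived scores. Without a matched normal, pathogenic BRCA1/2 variants also cannot be classified as germline or somatic.
  • Generate Report: Compile the findings into a comprehensive report for clinical or research purposes.

Additional Considerations

  • Tumor Purity and Ploidy: Purity and ploidy shape the allele-specific copy-number segments and therefore the HRD score. FACETS can estimate them but requires a matched normal; PureCN supports tumor-only WES by using a pool of process-matched normals together with the Mutect2 calls.
  • Annotation: Use tools like ANNOVAR or VEP to annotate the variants, for example to flag pathogenic variants in BRCA1, BRCA2, and other homologous recombination genes.

Summary

  1. Quality Control: FastQC
  2. Read Alignment: BWA-MEM
  3. Post-Alignment Processing: SAMtools, Picard
  4. Variant Calling: GATK Mutect2 (tumor-only mode)
  5. Variant Filtering: GATK FilterMutectCalls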
  6. HRD Score Calculation: ScarHRD or HRDetect
  7. Interpretation and Reporting
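As a worked illustration of the last two steps, a scar-based HRD score is the sum of the LOH, TAI, and LST counts, which is then compared with a cutoff. A minimal sketch; the function name classify_hrd is hypothetical, and the default cutoff of 42 (the value used by the Myriad myChoice assay) is an assumption that should be replaced by a threshold validated for WES-derived scores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine scar counts into an HRD score and classify it.
# Usage: classify_hrd LOH TAI LST [CUTOFF]
# The default cutoff of 42 is an assumption borrowed from the Myriad myChoice
# assay; it is not calibrated for WES-derived scores.
classify_hrd() {
  local loh="$1" tai="$2" lst="$3" cutoff="${4:-42}"
  local score=$(( loh + tai + lst ))
  if [ "$score" -ge "$cutoff" ]; then
    echo "$score HRD-positive"
  else
    echo "$score HRD-negative"
  fi
}
```

For example, classify_hrd 15 14 16 prints "45 HRD-positive". Keeping the cutoff as a parameter makes it easy to swap in a cohort-specific threshold.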

This workflow provides a comprehensive approach to HRD prediction using WES data in tumor-only mode. Each step involves specific tools and commands that need to be executed in sequence to ensure accurate and reliable results.
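Because the steps run in sequence and depend on several external programs, a driver script can check up front that every tool is installed instead of failing halfway through. A minimal sketch; the function name missing_tools is hypothetical, and the tool names in the usage line assume the command names used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument that is not an available command
# and return 1 if any are missing, so a pipeline can stop before step 1.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: missing_tools fastqc bwa samtools picard gatk Rscript || exit 1. Printing every missing tool at once avoids fixing the environment one failure at a time.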