HRD Prediction Workflow Using WES Data in Tumor-Only Mode

Answer:

Homologous Recombination Deficiency (HRD) prediction from Whole Exome Sequencing (WES) data in tumor-only mode involves several steps. Here’s a general workflow and the tools commonly used at each step:

1. Data Preparation

Sequencing Data: Obtain raw WES data (FASTQ files) from the tumor sample.
Quality Control: Use tools like FastQC to assess the quality of the raw sequencing data.
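
Alongside FastQC, it can help to confirm that the two mate files contain the same number of reads before alignment. The sketch below assumes gzip-compressed FASTQ files with four lines per record; the function names and file names are illustrative only.

```shell
# Count reads in a gzip-compressed FASTQ file (assumes 4 lines per record).
count_reads() {
    gzip -cd "$1" | awk 'END { print int(NR / 4) }'
}

# Print both counts and succeed only if the two mate files agree.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" -eq "$n2" ]
}

# Example (hypothetical file names; fastqc must be installed):
# check_pairs sample_R1.fastq.gz sample_R2.fastq.gz && \
#     fastqc sample_R1.fastq.gz sample_R2.fastq.gz
```

A mismatch usually points to a truncated download or a mate file from a different run, which is cheaper to catch here than after alignment.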

2. Read Alignment

Align Reads: Use an aligner like BWA-MEM to map the reads to the reference genome (e.g., GRCh38).
```
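# The reference must be indexed first (bwa index reference_genome.fa).
# -R adds a read group, which GATK tools expect downstream; the ID/SM values are placeholders.
bwa mem -t 8 -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA' reference_genome.fa sample_R1.fastq.gz sample_R2.fastq.gz > aligned_reads.sam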
```
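
When several samples are processed, the read-group string passed to `bwa mem -R` is easier to manage if it is built by a small helper. This is a minimal sketch; the helper name, the default library label, and the `PL:ILLUMINA` platform value are assumptions to adapt to your data.

```shell
# Build a read-group header line for `bwa mem -R`.
# Arguments: sample name, optional library name (hypothetical labels).
make_read_group() {
    sample=$1
    library=${2:-lib1}
    # The literal "\t" sequences are intentional: bwa converts them to tabs.
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA\n' "$sample" "$sample" "$library"
}

# Example (requires bwa and samtools on PATH):
# bwa mem -t 8 -R "$(make_read_group tumor01)" reference_genome.fa \
#     sample_R1.fastq.gz sample_R2.fastq.gz | samtools sort -o sorted_reads.bam -
```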

3. Post-Alignment Processing

Convert SAM to BAM: Use SAMtools to convert SAM to BAM and sort the BAM file.
```
# -S is no longer needed: current samtools versions detect the input format automatically.
samtools view -b aligned_reads.sam | samtools sort -o sorted_reads.bam -
```

Mark Duplicates: Use Picard to mark duplicate reads.
```
picard MarkDuplicates I=sorted_reads.bam O=dedup_reads.bam M=metrics.txt
```

Index BAM File: Index the BAM file using SAMtools.
```
samtools index dedup_reads.bam
```
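
The metrics file written by MarkDuplicates is worth checking before moving on, since a very high duplication rate reduces the usable depth. The sketch below pulls the `PERCENT_DUPLICATION` column out of that file. It assumes the usual Picard layout (comment lines starting with `#`, a tab-separated header row, one data row per library, then a histogram section starting with `BIN`); the function name is illustrative.

```shell
# Print the PERCENT_DUPLICATION value(s) from a Picard MarkDuplicates metrics file.
duplication_rate() {
    awk -F '\t' '
        /^#/ || NF == 0 { next }          # skip comment and blank lines
        col == 0 {                         # first remaining row is the header
            for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
            if (col == 0) exit 1           # column not found: layout differs from the assumption
            next
        }
        $1 == "BIN" { exit }               # stop when the histogram section begins
        { print $col }                     # one value per library
    ' "$1"
}

# Example:
# duplication_rate metrics.txt
```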

4. Variant Calling

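Call Variants: Use a variant caller like GATK HaplotypeCaller to call variants. Note that HaplotypeCaller is a germline caller: its heterozygous SNP calls are useful for allelic-imbalance analysis, but somatic mutation calling from a tumor-only sample is usually done with GATK Mutect2 in tumor-only mode.
```
gatk HaplotypeCaller -R reference_genome.fa -I dedup_reads.bam -O raw_variants.vcf
```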

5. Variant Filtering

Filter Variants: Apply filters to the raw variant calls using GATK VariantFiltration.
```
gatk VariantFiltration -R reference_genome.fa -V raw_variants.vcf -O filtered_variants.vcf \
    --filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0" --filter-name "basic_snp_filter"
```
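
VariantFiltration does not remove records; it writes the filter name into the FILTER column of failing records and `PASS` into the rest. A quick tally shows how many calls survive. This sketch assumes an uncompressed VCF; the function name is illustrative.

```shell
# Tally VCF records by their FILTER column (column 7), skipping header lines.
filter_summary() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Example:
# filter_summary filtered_variants.vcf
```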

6. HRD Score Calculation

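HRD Tools: Use specialized tools for HRD prediction. Some popular tools include:
- HRDetect: A machine learning-based classifier that combines mutational signatures, microhomology-mediated deletions, rearrangement signatures, and an HRD index. It was developed on whole-genome data, so expect reduced performance on WES.
- scarHRD: An R package that calculates HRD scores from allele-specific copy number segments, based on the number of large-scale state transitions (LSTs), loss of heterozygosity (LOH), and telomeric allelic imbalance (TAI).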

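Example with scarHRD:

Prepare Input: scarHRD does not read VCF files. It expects allele-specific copy number segments, either as a tab-separated segment table or as a Sequenza seqz file. In tumor-only mode, the segments have to come from a copy number caller that can run without a matched normal (for example PureCN).

Run scarHRD: Use the scarHRD R package to calculate the HRD score. The call below follows the package README; check the argument names against your installed version.
```
library(scarHRD)
# segments.txt: allele-specific copy number segments (not a VCF)
scores <- scar_score("segments.txt", reference = "grch38", seqz = FALSE)
write.table(scores, "HRD_results.txt", sep = "\t", quote = FALSE)
```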
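
Because scarHRD works on an allele-specific copy number segment table, a quick consistency check on that table can catch formatting problems before the R step. The sketch below assumes a tab-separated file with a header row and eight columns in this order: sample, chromosome, start, end, total copy number, A-allele copy number, B-allele copy number, ploidy. That column order is an assumption based on the scarHRD documentation and should be verified against your version; the function name is illustrative.

```shell
# Sanity-check a segment table before passing it to scarHRD.
# Assumed columns: sample, chromosome, start, end, total_cn, A_cn, B_cn, ploidy
check_segments() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NR == 1 { next }                   # skip the header row
        NF != 8 { printf "line %d: expected 8 columns, found %d\n", NR, NF; bad = 1; next }
        $4 + 0 <= $3 + 0 { printf "line %d: end <= start\n", NR; bad = 1 }
        $6 + $7 != $5 + 0 { printf "line %d: A_cn + B_cn != total_cn\n", NR; bad = 1 }
        END { exit bad }                   # non-zero exit status if any row failed
    ' "$1"
}

# Example (hypothetical file name):
# check_segments segments.txt && Rscript run_scarhrd.R
```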

7. Interpretation and Reporting

Interpret Results: Analyze the HRD scores and other relevant metrics to determine the HRD status of the tumor.
Generate Report: Compile the findings into a comprehensive report for clinical or research purposes.
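
The interpretation step often reduces to summing the three scar components (LOH, TAI, LST) and comparing the sum with a cutoff. The sketch below shows that arithmetic. The default cutoff of 42 is an assumption taken from commonly cited practice for the combined score; it was not derived for tumor-only WES and should be validated for your assay and tumor type. The function name and output format are illustrative.

```shell
# Sum the three scar components and compare the total with a cutoff.
# Arguments: LOH, TAI, LST counts (integers), optional cutoff (default 42, an assumption).
hrd_status() {
    loh=$1; tai=$2; lst=$3; cutoff=${4:-42}
    sum=$((loh + tai + lst))
    if [ "$sum" -ge "$cutoff" ]; then
        echo "HRD-sum=$sum status=HRD-high"
    else
        echo "HRD-sum=$sum status=HRD-low"
    fi
}

# Example:
# hrd_status 15 20 10
```

Passing the cutoff as an argument keeps the threshold visible in the report rather than buried in the script.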

Additional Considerations

Tumor Purity and Ploidy: Tools like PureCN, which supports tumor-only analysis, can estimate tumor purity and ploidy, both of which affect HRD prediction. FACETS is another option, but it expects a matched normal sample.
Annotation: Use tools like ANNOVAR or VEP to annotate the variants for better interpretation.

Summary

Quality Control: FastQC
Read Alignment: BWA-MEM
Post-Alignment Processing: SAMtools, Picard
Variant Calling: GATK HaplotypeCaller
Variant Filtering: GATK VariantFiltration
HRD Score Calculation: scarHRD or HRDetect
Interpretation and Reporting

This workflow provides a comprehensive approach to HRD prediction using WES data in tumor-only mode. Each step involves specific tools and commands that need to be executed in sequence to ensure accurate and reliable results.