Note

Our pipeline configuration is still in development, and the contents of this document are subject to change.

https://nf-co.re/sarek/usage v2.7.1

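| Data Type           | Method                          | Output                                   | Tried yet? |
|---------------------|---------------------------------|------------------------------------------|------------|
| WES or WGS          | DeepVariant                     | Germline SNV, INDEL                      | Yes        |
| WES or WGS          | Strelka, Mutect2, FreeBayes (?) | Somatic SNV, INDEL                       |            |
| WES or WGS          | TBD                             | Germline and Somatic Structural Variants |            |
| WES or WGS          | TBD                             | Germline and Somatic CNV                 |            |
| WES or WGS          | TBD                             | Tumor MSI                                |            |
| SNV, INDEL variants | TBD                             | Annotated Variants                       |            |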

https://nf-co.re/rnaseq v3.5

| Data Type | Method | Output                 | Tried yet? |
|-----------|--------|------------------------|------------|
| RNA-Seq   | Salmon | Gene expression counts | Yes        |

WES and WGS Variant Calling (SNV & INDEL)

Germline SNV + INDEL

This step transforms WES or WGS input files (FASTQ or CRAM) into variant calls in VCF format (.vcf files).

...

Code Block
nf-core/sarek	v2.7.1
Nextflow	v21.10.5
BWA	0.7.17
GATK	v4.1.7.0
FreeBayes	v1.3.2
samtools	v1.9
Strelka	v2.9.10
Manta	v1.6.0
TIDDIT	v2.7.1
AlleleCount	v4.0.2
ASCAT	v2.5.2
Control-FREEC	v11.6
msisensor	v0.5
SnpEff	v4.3t
VEP	v99.2
MultiQC	v1.8
FastQC	v0.11.9
bcftools	v1.9
CNVkit	v0.9.6
htslib	v1.9
QualiMap	v2.2.2-dev
Trim Galore	v0.6.4_dev
vcftools	v0.1.16
R	v4.0.2

Commands used for running JHU samples on DeepVariant:

All files and sample sheets are first staged in S3 buckets linked to NFTower. Then the following commands are used to launch the processing pipeline.

...

Profiles:

Code Block
aws_tower

Estimated costs for germline variant calling (per 50 samples)

...

According to the DeepVariant docs, it costs about $1 per WES sample and $12 per WGS sample on Google Cloud using an n1-standard-16 machine (16 vCPUs, 60 GB of memory, $0.76/hour).

...

If we infer the run time by dividing the cost per sample by the price per hour, it should be roughly 1.3 hours per WES sample ($1 / $0.76 per hour) and 16 hours per WGS sample ($12 / $0.76 per hour).
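The inference above is a simple division, sketched here in Python. The prices are the approximate figures quoted from the DeepVariant docs, so treat the result as a back-of-envelope estimate rather than a measured run time. The function name is illustrative only.

```python
# Approximate figures quoted from the DeepVariant docs (see text above).
PRICE_PER_HOUR = 0.76                        # USD per hour, n1-standard-16
COST_PER_SAMPLE = {"WES": 1.0, "WGS": 12.0}  # USD per sample, rounded


def inferred_hours(data_type: str) -> float:
    """Run time implied by (cost per sample) / (price per hour)."""
    return COST_PER_SAMPLE[data_type] / PRICE_PER_HOUR


for data_type in COST_PER_SAMPLE:
    # Prints "WES: ~1.3 h per sample" and "WGS: ~15.8 h per sample".
    print(f"{data_type}: ~{inferred_hours(data_type):.1f} h per sample")
```

Because the per-sample costs are rounded to the nearest dollar, the WES estimate in particular carries a wide margin.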

...

Somatic SNV + INDEL

TBD

Annotated Variants

Currently, germline variant calls in VCF format are being processed manually using VEP and vcf2maf.

...

  • The compute cost should range from $50 to $2,500 depending on how many of the 50 samples are WGS and how many mutations they have.

...

RNA Sequencing Data Quantification

Processing RNA-seq files involves transforming raw data (FASTQ files) into transcript-level counts (Salmon quant.sf files).
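For orientation, a quant.sf file is a tab-separated table with the columns Name, Length, EffectiveLength, TPM and NumReads. The sketch below reads such a table with the Python standard library. The transcript IDs and values are made up for illustration and are not taken from the JHU data; the function name is hypothetical.

```python
import csv
import io

# Hypothetical two-transcript example in Salmon's quant.sf layout.
EXAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENST00000000001\t1500\t1320.5\t12.5\t100.0\n"
    "ENST00000000002\t900\t720.0\t3.0\t14.0\n"
)


def read_quant(handle):
    """Return {transcript_id: (tpm, num_reads)} from a quant.sf-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (float(row["TPM"]), float(row["NumReads"]))
        for row in reader
    }


quants = read_quant(io.StringIO(EXAMPLE_QUANT_SF))
total_reads = sum(num_reads for _tpm, num_reads in quants.values())
print(f"{len(quants)} transcripts, {total_reads:.0f} estimated reads")
```

For a real file, pass an open file handle (for example `open("quant.sf")`) instead of the in-memory example.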

...

  • The files are pulled into the Nextflow workflow and processed using the following software versions:

    Code Block
    BEDTOOLS_GENOMECOV:
      bedtools: 2.30.0
    CAT_FASTQ:
      cat: 8.3
    CUSTOM_DUMPSOFTWAREVERSIONS:
      python: 3.9.5
      yaml: 5.4.1
    DESEQ2_QC_STAR_SALMON:
      bioconductor-deseq2: 1.28.0
      r-base: 4.0.3
    DUPRADAR:
      bioconductor-dupradar: 1.18.0
      r-base: 4.0.2
    FASTQC:
      fastqc: 0.11.9
    GET_CHROM_SIZES:
      samtools: 1.1
    GTF_GENE_FILTER:
      python: 3.8.3
    PICARD_MARKDUPLICATES:
      picard: 2.25.7
    PRESEQ_LCEXTRAP:
      preseq: 3.1.1
    QUALIMAP_RNASEQ:
      qualimap: 2.2.2-dev
    RSEM_PREPAREREFERENCE_TRANSCRIPTS:
      rsem: 1.3.1
      star: 2.7.6a
    RSEQC_BAMSTAT:
      rseqc: 3.0.1
    RSEQC_INFEREXPERIMENT:
      rseqc: 3.0.1
    RSEQC_INNERDISTANCE:
      rseqc: 3.0.1
    RSEQC_JUNCTIONANNOTATION:
      rseqc: 3.0.1
    RSEQC_JUNCTIONSATURATION:
      rseqc: 3.0.1
    RSEQC_READDISTRIBUTION:
      rseqc: 3.0.1
    RSEQC_READDUPLICATION:
      rseqc: 3.0.1
    SALMON_QUANT:
      salmon: 1.5.2
    SALMON_SE_GENE:
      bioconductor-summarizedexperiment: 1.20.0
      r-base: 4.0.3
    SALMON_TX2GENE:
      python: 3.8.3
    SALMON_TXIMPORT:
      bioconductor-tximeta: 1.8.0
      r-base: 4.0.3
    SAMPLESHEET_CHECK:
      python: 3.8.3
    SAMTOOLS_FLAGSTAT:
      samtools: 1.13
    SAMTOOLS_IDXSTATS:
      samtools: 1.13
    SAMTOOLS_INDEX:
      samtools: 1.13
    SAMTOOLS_SORT:
      samtools: 1.13
    SAMTOOLS_STATS:
      samtools: 1.13
    STAR_ALIGN:
      star: 2.6.1d
    STRINGTIE:
      stringtie: 2.1.7
    TRIMGALORE:
      cutadapt: 3.4
      trimgalore: 0.6.7
    UCSC_BEDCLIP:
      ucsc: 377
    UCSC_BEDGRAPHTOBIGWIG:
      ucsc: 377
    Workflow:
      Nextflow: 21.10.5
      nf-core/rnaseq: '3.4'

Command used to process JHU Biobank samples:

Params:

Code Block
input: s3://jhu-biobank-nf-project-tower-bucket/jobs/01-nfcore-rnaseq-3.4/inputs/sample-sheet.csv
outdir: s3://jhu-biobank-nf-project-tower-bucket/jobs/01-nfcore-rnaseq-3.4/outputs/
genome: GRCh38
igenomes_base: s3://sage-igenomes/igenomes

...

Profile:

Code Block
aws_tower 
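The parameters and profile above are entered through NFTower. As a rough sketch of what an equivalent command-line launch could look like, the Python below serializes the same parameters as JSON and assembles a `nextflow run` invocation. This is an assumption, not the command actually used: the params file name is hypothetical, and the `aws_tower` profile would have to be defined in the Nextflow configuration available where the command runs.

```python
import json
import shlex

# Parameters copied from the Params block above.
params = {
    "input": "s3://jhu-biobank-nf-project-tower-bucket/jobs/01-nfcore-rnaseq-3.4/inputs/sample-sheet.csv",
    "outdir": "s3://jhu-biobank-nf-project-tower-bucket/jobs/01-nfcore-rnaseq-3.4/outputs/",
    "genome": "GRCh38",
    "igenomes_base": "s3://sage-igenomes/igenomes",
}

PARAMS_FILE = "params.json"  # hypothetical file name


def build_command(params_file, revision="3.4", profile="aws_tower"):
    """Assemble a `nextflow run` command line as a list of arguments."""
    return [
        "nextflow", "run", "nf-core/rnaseq",
        "-r", revision,              # pipeline revision
        "-profile", profile,         # configuration profile (assumed to exist)
        "-params-file", params_file, # pipeline parameters as JSON or YAML
    ]


params_json = json.dumps(params, indent=2)  # content to save as PARAMS_FILE
print(params_json)
print(shlex.join(build_command(PARAMS_FILE)))
```

Keeping the parameters in a file rather than on the command line makes the run easier to record alongside its outputs.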

Estimated costs for processing

...

Estimated Cost per sample = $0.20 ($51 for 261 samples)
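The per-sample figure follows from dividing the total by the number of samples. The sketch below reproduces it and projects the cost of another batch, assuming cost scales linearly with sample count; that linear scaling is an assumption, since real costs depend on file sizes and instance pricing.

```python
TOTAL_COST_USD = 51.0  # observed total for the batch
N_SAMPLES = 261        # samples in that batch

cost_per_sample = TOTAL_COST_USD / N_SAMPLES


def projected_cost(n_samples: int) -> float:
    """Linear projection from the observed per-sample cost (assumption)."""
    return n_samples * cost_per_sample


print(f"${cost_per_sample:.2f} per sample")       # $0.20 per sample
print(f"${projected_cost(50):.2f} for 50 samples")  # $9.77 for 50 samples
```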

...