Data Type	Method	Output	Tried yet?
https://nf-co.re/sarek/usage v2.7.1
WES or WGS	DeepVariant	Germline SNV, INDEL	Yes
WES or WGS	Strelka, Mutect2, Freebayes	Somatic SNV, INDEL
WES or WGS	TBD	Germline and Somatic Structural Variants
WES or WGS	TBD	Germline and Somatic CNV
WES or WGS	TBD	Tumor MSI
SNV, INDEL variants	TBD	Annotated Variants

Data Type	Method	Output	Tried yet?
https://nf-co.re/rnaseq v3.5
RNA-Seq	Salmon	Gene expression counts	Yes

WES and WGS Variant Calling (SNV & INDEL)

Germline SNV + INDEL

This step transforms WES fastq or cram files into variant call files in VCF format (.vcf files).

As of Jan 2022, the reference genome used is GRCh38.

The processing steps include the following:

Raw fastq files are uploaded to Synapse by the researcher in a folder named in the format experiment_name_rnaseq_fastq_date. Filenames must not contain white space (use _ in place of each space).
All experiment- and sample-related annotations must be added on Synapse before processing can start. This step is required so that a sample sheet can be generated to trigger the processing workflow.
The sample sheet should contain the following information in the following format (saved as a .txt file):

sample	subject	status	sex	file_1	file_2	lane	parentId
Synapse specimenID	Synapse individualID	1 or 0 (Tumor = 1, Normal = 0)	XX or XY	synID	synID	Lane information	Synapse folder information
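
The sample sheet layout above can be sketched in Python as follows. This is only an illustration, not the workflow's own tooling: the helper names (make_row, to_tsv) and the placeholder IDs are hypothetical, and it assumes the header line is written into the .txt file exactly as shown in the table.

```python
import csv
import io

# Column order copied from the sample sheet header above.
COLUMNS = ["sample", "subject", "status", "sex", "file_1", "file_2", "lane", "parentId"]


def make_row(specimen_id, individual_id, is_tumor, sex, file_1, file_2, lane, parent_id):
    """Build one sample sheet row; status is 1 for tumor and 0 for normal."""
    if sex not in ("XX", "XY"):
        raise ValueError(f"sex must be XX or XY, got {sex!r}")
    return {
        "sample": specimen_id,
        "subject": individual_id,
        "status": 1 if is_tumor else 0,
        "sex": sex,
        "file_1": file_1,
        "file_2": file_2,
        "lane": lane,
        "parentId": parent_id,
    }


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical placeholder values; a real sheet uses actual Synapse IDs.
rows = [
    make_row("specimen_A", "individual_1", True, "XX",
             "syn0000001", "syn0000002", "lane_1", "syn0000003"),
    make_row("specimen_B", "individual_1", False, "XX",
             "syn0000004", "syn0000005", "lane_1", "syn0000003"),
]
sheet_text = to_tsv(rows)
print(sheet_text)
```

Keeping the column list in one constant means the header and every row stay in the same order if the sheet format changes.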

The files are pulled into the Nextflow workflow and processed using the following software versions:


nf-core/sarek	v2.7.1
Nextflow	v21.10.5
BWA	0.7.17
GATK	v4.1.7.0
FreeBayes	v1.3.2
samtools	v1.9
Strelka	v2.9.10
Manta	v1.6.0
TIDDIT	v2.7.1
AlleleCount	v4.0.2
ASCAT	v2.5.2
Control-FREEC	v11.6
msisensor	v0.5
SnpEff	v4.3t
VEP	v99.2
MultiQC	v1.8
FastQC	v0.11.9
bcftools	v1.9
CNVkit	v0.9.6
htslib	v1.9
QualiMap	v2.2.2-dev
Trim Galore	v0.6.4_dev
vcftools	v0.1.16
R	v4.0.2

Estimated costs for germline variant calling (per 50 samples)

According to the DeepVariant docs, it costs about $1 per WES sample and $12 per WGS sample on Google Cloud using an n1-standard-16 machine (16 vCPUs, 60 GB of memory, $0.76/hour).
If we infer the run time by dividing the cost by the price per hour, it should be roughly 1.3 hours per WES sample ($1 / $0.76 per hour) and 16 hours per WGS sample ($12 / $0.76 per hour).
More information here: Jira issue WORKFLOWS-98 (System JIRA)
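
The inference above is a simple division, and the same figures can be scaled to a 50-sample batch. The sketch below uses only the numbers quoted above; the function name is hypothetical, and it assumes the cost is dominated by machine time on a single n1-standard-16 instance and scales linearly with the number of samples.

```python
# Figures quoted above from the DeepVariant docs.
PRICE_PER_HOUR = 0.76                        # USD per hour, n1-standard-16
COST_PER_SAMPLE = {"WES": 1.0, "WGS": 12.0}  # USD per sample
BATCH_SIZE = 50                              # samples per batch, as in the heading


def inferred_hours(cost_per_sample, price_per_hour=PRICE_PER_HOUR):
    """Infer run time per sample as cost divided by hourly price."""
    return cost_per_sample / price_per_hour


for data_type, cost in COST_PER_SAMPLE.items():
    hours = inferred_hours(cost)
    batch_cost = cost * BATCH_SIZE  # assumes linear scaling with sample count
    print(f"{data_type}: ~{hours:.1f} h per sample, ~${batch_cost:.0f} per {BATCH_SIZE} samples")
```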

Somatic SNV + INDEL

TBD

Annotated Variants

Currently, germline variant calls in VCF format are processed manually using VEP and vcf2maf.

Estimated costs for germ-line variant annotation (per 50 samples) using VEP

The compute cost should range from $50 to $2,500 depending on how many of the 50 samples are WGS and how many mutations they have.

RNA SEQUENCING DATA QUANTIFICATION

Processing RNA-seq files involves transformation of raw data (fastq files) to transcript counts (quant.sf files).

The quantification software of choice is Salmon.

As of Jan 2022, the reference genome used is GRCh38.

Processing involves the following steps:

Raw fastq files are uploaded to Synapse by the researcher in a folder named in the format experiment_name_rnaseq_fastq_date. Filenames must not contain white space (use _ in place of each space).
All experiment- and sample-related annotations must be added on Synapse before processing can start. This step is required so that a sample sheet can be generated to trigger the processing workflow.
The sample sheet should contain the following information in the following format (saved as a .csv file):

sample	single_end	fastq_1	fastq_2	strandedness
Synapse specimenID	0 (1 if single-end)	synID	synID	reverse or forward
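
The naming rule and the sample sheet layout above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the workflow's own tooling: the helper names and placeholder IDs are hypothetical, the single_end flag is passed through as given (0 or 1), and the header line is assumed to be part of the .csv file.

```python
import csv
import io
import re

# Column order copied from the sample sheet header above.
COLUMNS = ["sample", "single_end", "fastq_1", "fastq_2", "strandedness"]
ALLOWED_STRANDEDNESS = ("reverse", "forward")


def underscore_name(name):
    """Replace each run of whitespace in a file or folder name with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


def make_row(sample, single_end, fastq_1, fastq_2, strandedness):
    """Build one row, rejecting values outside those listed in the table above."""
    if single_end not in (0, 1):
        raise ValueError(f"single_end must be 0 or 1, got {single_end!r}")
    if strandedness not in ALLOWED_STRANDEDNESS:
        raise ValueError(f"strandedness must be reverse or forward, got {strandedness!r}")
    return dict(zip(COLUMNS, (sample, single_end, fastq_1, fastq_2, strandedness)))


def to_csv(rows):
    """Serialize rows as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical placeholder values; a real sheet uses actual Synapse IDs.
folder_name = underscore_name("my experiment rnaseq fastq 2022")
rows = [make_row("specimen_A", 0, "syn0000001", "syn0000002", "reverse")]
sheet_text = to_csv(rows)
print(folder_name)
print(sheet_text)
```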

The files are pulled into the Nextflow workflow and processed using the following software versions:


BEDTOOLS_GENOMECOV:
  bedtools: 2.30.0
CAT_FASTQ:
  cat: 8.3
CUSTOM_DUMPSOFTWAREVERSIONS:
  python: 3.9.5
  yaml: 5.4.1
DESEQ2_QC_STAR_SALMON:
  bioconductor-deseq2: 1.28.0
  r-base: 4.0.3
DUPRADAR:
  bioconductor-dupradar: 1.18.0
  r-base: 4.0.2
FASTQC:
  fastqc: 0.11.9
GET_CHROM_SIZES:
  samtools: 1.1
GTF_GENE_FILTER:
  python: 3.8.3
PICARD_MARKDUPLICATES:
  picard: 2.25.7
PRESEQ_LCEXTRAP:
  preseq: 3.1.1
QUALIMAP_RNASEQ:
  qualimap: 2.2.2-dev
RSEM_PREPAREREFERENCE_TRANSCRIPTS:
  rsem: 1.3.1
  star: 2.7.6a
RSEQC_BAMSTAT:
  rseqc: 3.0.1
RSEQC_INFEREXPERIMENT:
  rseqc: 3.0.1
RSEQC_INNERDISTANCE:
  rseqc: 3.0.1
RSEQC_JUNCTIONANNOTATION:
  rseqc: 3.0.1
RSEQC_JUNCTIONSATURATION:
  rseqc: 3.0.1
RSEQC_READDISTRIBUTION:
  rseqc: 3.0.1
RSEQC_READDUPLICATION:
  rseqc: 3.0.1
SALMON_QUANT:
  salmon: 1.5.2
SALMON_SE_GENE:
  bioconductor-summarizedexperiment: 1.20.0
  r-base: 4.0.3
SALMON_TX2GENE:
  python: 3.8.3
SALMON_TXIMPORT:
  bioconductor-tximeta: 1.8.0
  r-base: 4.0.3
SAMPLESHEET_CHECK:
  python: 3.8.3
SAMTOOLS_FLAGSTAT:
  samtools: 1.13
SAMTOOLS_IDXSTATS:
  samtools: 1.13
SAMTOOLS_INDEX:
  samtools: 1.13
SAMTOOLS_SORT:
  samtools: 1.13
SAMTOOLS_STATS:
  samtools: 1.13
STAR_ALIGN:
  star: 2.6.1d
STRINGTIE:
  stringtie: 2.1.7
TRIMGALORE:
  cutadapt: 3.4
  trimgalore: 0.6.7
UCSC_BEDCLIP:
  ucsc: 377
UCSC_BEDGRAPHTOBIGWIG:
  ucsc: 377
Workflow:
  Nextflow: 21.10.5
  nf-core/rnaseq: '3.4'

Estimated costs for processing

Estimated cost per sample: about $0.20 ($51 for 261 samples)
Estimated time per 100 samples: about 6 hours
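
These two estimates can be projected to other batch sizes with a small calculation. The sketch below uses only the figures above; the function name is hypothetical, and it assumes both cost and run time scale linearly with the number of samples, which may not hold if samples are processed in parallel.

```python
# Figures quoted above.
TOTAL_COST_USD = 51.0        # observed cost for the batch below
SAMPLES_IN_ESTIMATE = 261    # batch size behind the cost figure
HOURS_PER_100_SAMPLES = 6.0  # approximate run time per 100 samples

COST_PER_SAMPLE = TOTAL_COST_USD / SAMPLES_IN_ESTIMATE


def estimate(n_samples):
    """Return (cost in USD, run time in hours), assuming linear scaling."""
    cost = n_samples * COST_PER_SAMPLE
    hours = n_samples / 100 * HOURS_PER_100_SAMPLES
    return cost, hours


for n in (100, 261):
    cost, hours = estimate(n)
    print(f"{n} samples: ~${cost:.2f}, ~{hours:.1f} h")
```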
