Comment: removed freebayes as it is no longer being included in our standard set
Note

Our pipeline configuration is still in development, and the contents of this document are subject to change.

Summary of Processing

https://nf-co.re/sarek/usage v3.0

| Data type | Method | Output | Tried yet? |
|---|---|---|---|
| WES or WGS | DeepVariant | Germline SNV, INDEL | Yes |
| WES or WGS | Strelka, Mutect2 | Somatic SNV, INDEL | Yes |
| WES or WGS | TBD | Germline and Somatic Structural Variants | No |
| WES or WGS | TBD | Germline and Somatic CNV | No |
| WES or WGS | TBD | Tumor MSI | No |
| SNV, INDEL variants | TBD | Annotated Variants | No |

https://nf-co.re/rnaseq v3.7

| Data type | Method | Output | Tried yet? |
|---|---|---|---|
| RNA-Seq | STAR with Salmon in alignment-based mode | Gene expression counts | Yes |

More information about the workflows is available here: /wiki/spaces/WF/pages/2363359776

bam/cram to fastq conversion

  • When fastq files are not available, cram/bam files are converted to fastq using this pipeline: https://github.com/qbic-pipelines/bamtofastq (v1.2.0).

  • If unaligned bam files are available instead of fastq files, we recommend providing u-bam files for direct input to sarek 3.0.

WES and WGS Variant Calling (SNV & INDEL)

...

  • Raw fastq files are uploaded to Synapse by the researcher in a folder named in the format experiment_name_rnaseq_fastq_date. Filenames must not contain whitespace (use _ in place of each space).

  • All experiment- and sample-related annotations need to be added on Synapse before processing can start. This step is required so that a sample sheet can be generated to trigger the processing workflow.

  • The sample sheet should be a comma-separated file (.csv) with at least 3 columns and a header row, as shown below. (More information here)

| sample | subject | status | sex | file_1 | file_2 | lane | parentId | bed_file | output_parent_Id |
|---|---|---|---|---|---|---|---|---|---|
| Synapse specimenID | Synapse individualID | 1 (Tumor = 1, Normal = 0) | XX or XY | syn://synId | syn://synId | Lane information | Synapse ID of parent folder | Synapse ID of BED file (if WES data) | Synapse ID of folder where all processed files will be indexed |

  • The files are pulled into the Nextflow workflow and processed using the following software versions:

...
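The sample sheet layout described above can be checked before it is submitted. The following is a minimal sketch, not part of the pipeline itself: the column names come from the table above, while the function names, the validation rules (status is 0 or 1, sex is XX or XY, file columns start with syn://), and all IDs in the example row are assumptions made for illustration.

```python
import csv
import io

# Column order taken from the sample sheet table above.
COLUMNS = ["sample", "subject", "status", "sex", "file_1", "file_2",
           "lane", "parentId", "bed_file", "output_parent_Id"]


def validate_row(row):
    """Return a list of problems found in one sample sheet row (empty if none)."""
    missing = [c for c in COLUMNS if c not in row]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if str(row["status"]) not in {"0", "1"}:
        problems.append("status must be 1 (tumor) or 0 (normal)")
    if row["sex"] not in {"XX", "XY"}:
        problems.append("sex must be XX or XY")
    # Assumption: file columns always use the syn:// prefix shown in the table.
    for col in ("file_1", "file_2"):
        if not str(row[col]).startswith("syn://"):
            problems.append(f"{col} must start with syn://")
    return problems


def write_sample_sheet(rows, handle):
    """Validate every row, then write the header and rows as CSV."""
    for i, row in enumerate(rows, start=1):
        problems = validate_row(row)
        if problems:
            raise ValueError(f"row {i}: " + "; ".join(problems))
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example row; none of these IDs refer to real Synapse entities.
example = {
    "sample": "specimen_01", "subject": "patient_01", "status": 1, "sex": "XX",
    "file_1": "syn://syn00000001", "file_2": "syn://syn00000002",
    "lane": "lane_1", "parentId": "syn00000003", "bed_file": "syn00000004",
    "output_parent_Id": "syn00000005",
}
buffer = io.StringIO()
write_sample_sheet([example], buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an open file handle instead would produce the .csv file on disk.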

Currently, germline variant calls in VCF format are processed manually using VEP and vcf2maf.

...

RNA

...

Sequencing Data Quantification

Processing RNA-seq files involves transforming raw data (fastq files) into transcript counts (quant.sf files).

...

  • Raw fastq files are uploaded to Synapse by the researcher in a folder named in the format experiment_name_rnaseq_fastq_date. Filenames must not contain whitespace (use _ in place of each space). The naming convention is a best-practice recommendation rather than a strict rule, but the exclusion of whitespace is required.

  • All experiment- and sample-related annotations need to be added on Synapse before processing can start. This step is required so that a sample sheet can be generated to trigger the processing workflow.

  • The sample sheet should contain the following information in the following format, saved as a .csv file. (More information here)

| sample | single_end | fastq_1 | fastq_2 | strandedness |
|---|---|---|---|---|
| Synapse specimenID | 0 (1 if paired-end) | synID | synID | auto |

  • The files are pulled into the Nextflow workflow and processed using the following software versions:

    Code Block
    BEDTOOLS_GENOMECOV:
      bedtools: 2.30.0
    CAT_FASTQ:
      cat: 8.3
    CUSTOM_DUMPSOFTWAREVERSIONS:
      python: 3.9.5
      yaml: 5.4.1
    DESEQ2_QC_STAR_SALMON:
      bioconductor-deseq2: 1.28.0
      r-base: 4.0.3
    DUPRADAR:
      bioconductor-dupradar: 1.18.0
      r-base: 4.0.2
    FASTQC:
      fastqc: 0.11.9
    GET_CHROM_SIZES:
      samtools: 1.1
    GTF_GENE_FILTER:
      python: 3.8.3
    PICARD_MARKDUPLICATES:
      picard: 2.25.7
    PRESEQ_LCEXTRAP:
      preseq: 3.1.1
    QUALIMAP_RNASEQ:
      qualimap: 2.2.2-dev
    RSEM_PREPAREREFERENCE_TRANSCRIPTS:
      rsem: 1.3.1
      star: 2.7.6a
    RSEQC_BAMSTAT:
      rseqc: 3.0.1
    RSEQC_INFEREXPERIMENT:
      rseqc: 3.0.1
    RSEQC_INNERDISTANCE:
      rseqc: 3.0.1
    RSEQC_JUNCTIONANNOTATION:
      rseqc: 3.0.1
    RSEQC_JUNCTIONSATURATION:
      rseqc: 3.0.1
    RSEQC_READDISTRIBUTION:
      rseqc: 3.0.1
    RSEQC_READDUPLICATION:
      rseqc: 3.0.1
    SALMON_QUANT:
      salmon: 1.5.2
    SALMON_SE_GENE:
      bioconductor-summarizedexperiment: 1.20.0
      r-base: 4.0.3
    SALMON_TX2GENE:
      python: 3.8.3
    SALMON_TXIMPORT:
      bioconductor-tximeta: 1.8.0
      r-base: 4.0.3
    SAMPLESHEET_CHECK:
      python: 3.8.3
    SAMTOOLS_FLAGSTAT:
      samtools: 1.13
    SAMTOOLS_IDXSTATS:
      samtools: 1.13
    SAMTOOLS_INDEX:
      samtools: 1.13
    SAMTOOLS_SORT:
      samtools: 1.13
    SAMTOOLS_STATS:
      samtools: 1.13
    STAR_ALIGN:
      star: 2.6.1d
    STRINGTIE:
      stringtie: 2.1.7
    TRIMGALORE:
      cutadapt: 3.4
      trimgalore: 0.6.7
    UCSC_BEDCLIP:
      ucsc: 377
    UCSC_BEDGRAPHTOBIGWIG:
      ucsc: 377
    Workflow:
      Nextflow: 21.10.5
      nf-core/rnaseq: '3.4'
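The whitespace requirement on filenames from the upload step above can be checked locally before files are sent to Synapse. This is a minimal sketch under stated assumptions: the function names are hypothetical, and collapsing each run of whitespace into a single underscore is one possible reading of "use _ for whitespace", not a rule stated in this document.

```python
import re
from pathlib import Path


def has_whitespace(filename):
    """Return True if the filename contains any whitespace character."""
    return re.search(r"\s", filename) is not None


def underscore_name(filename):
    """Replace each run of whitespace with one underscore (assumed convention)."""
    return re.sub(r"\s+", "_", filename.strip())


def check_upload_folder(folder):
    """Map each offending filename in `folder` to a suggested replacement name."""
    return {
        path.name: underscore_name(path.name)
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and has_whitespace(path.name)
    }


# Hypothetical filename, used only to show the suggested renaming.
print(underscore_name("tumor 01 R1.fastq.gz"))
```

The sketch only reports suggested names and leaves renaming to the uploader, so nothing on disk is changed without review.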

...