...

Look in your S3 bucket for the results.

How to get more parallelism by splitting your input files into multiple chunks

The Python 2.7 script that splits the input files is in Subversion: phaseSplit.py

  • Usage:
    
    ~/>python2.7 phaseSplit.py --help
    usage: phaseSplit.py [-h] --phaseInputFile PHASEINPUTFILE
                         [--minColumnsPerFile MINCOLUMNSPERFILE]
                         [--columnOverlap COLUMNOVERLAP]
    
    Split phase input files into smaller chunks
    
    optional arguments:
      -h, --help            show this help message and exit
      --phaseInputFile PHASEINPUTFILE, -p PHASEINPUTFILE
                            the file path to the phase input file to be split
      --minColumnsPerFile MINCOLUMNSPERFILE, -m MINCOLUMNSPERFILE
                            the minimum number of columns to output per file
      --columnOverlap COLUMNOVERLAP, -o COLUMNOVERLAP
                            the number of columns to overlap with each file
    
  • How to run it:
    
    ~/>python2.7 phaseSplit.py -p ProSM_chrom_21.phase.inp
    Sample 0 chunk 0 startColumn 0 endColumn 100
    Sample 0 chunk 1 startColumn 80 endColumn 180
    Sample 0 chunk 2 startColumn 160 endColumn 260
    Sample 0 chunk 3 startColumn 240 endColumn 340
    ...
    Sample 73 chunk 155 startColumn 12400 endColumn 12500
    Sample 73 chunk 156 startColumn 12480 endColumn 12564
    SUCCESS: ProSM_chrom_21.phase.inp and ProSM_chrom_21.phase.inp_sanityCheck are equivalent
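The sample output is consistent with a simple sliding window. With what appear to be the defaults (100 columns per chunk and 20 columns of overlap, inferred from the output above rather than read from the script), each chunk starts 80 columns after the previous one, and the last chunk is cut short at the final column, apparently column 12564 for this file. The closing SUCCESS line suggests that the script also stitches the chunks back together into the `_sanityCheck` file and compares it with the input. The following minimal Python 3 sketch shows both steps for a single row of columns. `split_row` and `merge_chunks` are hypothetical names, not functions from phaseSplit.py, and the real script may handle the tail chunk and the PHASE file layout differently.

```python
def split_row(row, min_columns_per_file=100, column_overlap=20):
    """Split one row of columns into overlapping chunks (hypothetical helper).

    Assumption: each chunk starts (min_columns_per_file - column_overlap)
    columns after the previous one, and the last chunk stops at the end of
    the row, so it may be shorter than min_columns_per_file.
    Returns a list of (start, end, columns) with `end` exclusive.
    """
    step = min_columns_per_file - column_overlap
    if column_overlap < 0 or step <= 0:
        raise ValueError("need 0 <= columnOverlap < minColumnsPerFile")
    chunks = []
    start = 0
    while True:
        end = min(start + min_columns_per_file, len(row))
        chunks.append((start, end, row[start:end]))
        if end == len(row):
            return chunks
        start += step


def merge_chunks(chunks, column_overlap=20):
    """Reassemble chunks into one row, checking every overlap (hypothetical helper)."""
    merged = list(chunks[0][2])
    for _, _, columns in chunks[1:]:
        head = list(columns[:column_overlap])
        # The first column_overlap columns should repeat the tail already merged.
        if column_overlap and head != merged[-column_overlap:]:
            raise ValueError("overlapping columns disagree")
        merged.extend(columns[column_overlap:])
    return merged


if __name__ == "__main__":
    row = list(range(12564))  # stand-in for one sample's 12564 columns
    chunks = split_row(row)
    for index in (0, 1, len(chunks) - 1):
        start, end, _ = chunks[index]
        print("chunk %d startColumn %d endColumn %d" % (index, start, end))
    # Round trip: the reassembled row should equal the original.
    print("equivalent:", merge_chunks(chunks) == row)
```

Running it prints chunk 0 (0 to 100), chunk 1 (80 to 180), and chunk 156 (12480 to 12564), which match the corresponding lines of the sample run, followed by `equivalent: True`. Dropping the first `column_overlap` columns of every chunk after the first is enough to undo the split, because each overlap duplicates exactly the tail of the previous full-length chunk.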