...
Look in your S3 bucket for the results.
How to gain more parallelization by splitting your input files into multiple chunks
You can find the Python 2.7 script that splits the phase input files in Subversion: phaseSplit.py
- Usage:
    ~/>python2.7 phaseSplit.py --help
    usage: phaseSplit.py [-h] --phaseInputFile PHASEINPUTFILE
                         [--minColumnsPerFile MINCOLUMNSPERFILE]
                         [--columnOverlap COLUMNOVERLAP]

    Split phase input files into smaller chunks

    optional arguments:
      -h, --help            show this help message and exit
      --phaseInputFile PHASEINPUTFILE, -p PHASEINPUTFILE
                            the file path to the phase input file to be split
      --minColumnsPerFile MINCOLUMNSPERFILE, -m MINCOLUMNSPERFILE
                            the minimum number of columns to output per file
      --columnOverlap COLUMNOVERLAP, -o COLUMNOVERLAP
                            the number of columns to overlap with each file
- How to run it:
    ~/>python2.7 phaseSplit.py -p ProSM_chrom_21.phase.inp
    Sample 0 chunk 0 startColumn 0 endColumn 100
    Sample 0 chunk 1 startColumn 80 endColumn 180
    Sample 0 chunk 2 startColumn 160 endColumn 260
    Sample 0 chunk 3 startColumn 240 endColumn 340
    ...
    Sample 73 chunk 155 startColumn 12400 endColumn 12500
    Sample 73 chunk 156 startColumn 12480 endColumn 12564
    SUCCESS: ProSM_chrom_21.phase.inp and ProSM_chrom_21.phase.inp_sanityCheck are equivalent
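To see how the chunk boundaries in the run above relate to the -m and -o options, here is a minimal sketch of the windowing arithmetic. It is not taken from phaseSplit.py: the function name chunk_bounds and its parameters are hypothetical, and the values of 100 columns per file and 20 columns of overlap are only inferred from the sample output (each chunk starts 80 columns after the previous one), so the script's real defaults may differ.

```python
def chunk_bounds(total_columns, columns_per_file=100, column_overlap=20):
    """Yield (start_column, end_column) pairs that cover total_columns.

    Hypothetical helper, not part of phaseSplit.py. The default values
    are guesses read off the sample output, not confirmed defaults.
    """
    if columns_per_file <= 0:
        raise ValueError("columns_per_file must be positive")
    if not 0 <= column_overlap < columns_per_file:
        raise ValueError("column_overlap must be >= 0 and < columns_per_file")
    if total_columns <= 0:
        return

    # Each chunk starts this many columns after the previous one.
    stride = columns_per_file - column_overlap
    start = 0
    while True:
        # The final chunk is clipped to the end of the input.
        end = min(start + columns_per_file, total_columns)
        yield start, end
        if end >= total_columns:
            break
        start += stride


if __name__ == "__main__":
    # 12564 is the last endColumn shown in the sample run.
    for chunk, (start, end) in enumerate(chunk_bounds(12564)):
        print("chunk %d startColumn %d endColumn %d" % (chunk, start, end))
```

With 12564 columns, this sketch yields 157 chunks whose boundaries match the ones printed in the sample run, including a shorter final chunk from column 12480 to 12564.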