Streaming data from an Aspera host to an S3 bucket
One way to perform a large data transfer (e.g., for a sequencing project) is to write the data directly from the data provider to an S3 bucket, without first downloading it to local storage. This requires that the data provider has an Aspera license. The data is never written to the machine running the transfer; it goes directly to the S3 bucket. Supposedly this works equally well when streaming to Google Cloud Storage, but I have not tested it.
Step-by-step guide
To run the data transfer from an AWS EC2 machine:
1. Create an EC2 instance with permissions to access the S3 bucket (for example, through an IAM role attached to the instance).
2. Install the Aspera software.
3. Copy the Aspera license to /opt/aspera/etc/aspera-license.
4. Install the AWS CLI.
5. Check that you can access the S3 bucket, for example by running "aws s3 ls s3://YOUR_BUCKET".
6. Run the data copy:
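aspera copy
# Enable the Aspera configuration used for writing to S3
/opt/aspera/bin/astrap-config.sh enable
# Print ascp version and license information to confirm the license is installed
ascp -A
# Stream DIRECTORY_TO_DOWNLOAD from the Aspera server directly into the bucket
ascp -k 3 -v -p -l400m --mode recv --user="USERNAME" --host=ASPERA.HOST.ADDRESS DIRECTORY_TO_DOWNLOAD s3://s3.amazonaws.com/YOUR_BUCKET/
where: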
- USERNAME is your login ID on the Aspera server.
- ASPERA.HOST.ADDRESS is the hostname or IP address of the Aspera server.
- DIRECTORY_TO_DOWNLOAD is the name of the file or directory you want to copy to S3.
- YOUR_BUCKET is the name of the S3 bucket to which the data will be copied.
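Because a long transfer is easy to start before everything is in place, the setup steps can be checked in a single pass. This is a minimal sketch under assumptions: aspera_preflight is a hypothetical helper (not part of the Aspera or AWS tools), it assumes the license path used in this guide, and it only checks for the license file, the ascp and aws commands, and bucket access. It does not install anything or validate the license contents.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the setup steps in this guide.
# Usage: aspera_preflight YOUR_BUCKET [LICENSE_PATH]
# Prints "ok: ..." or "missing: ..." per item; returns 1 if anything is missing.
aspera_preflight() {
  local bucket="$1"
  local license="${2:-/opt/aspera/etc/aspera-license}"
  local status=0 tool

  # The Aspera license must have been copied into place (non-empty file).
  if [ -s "$license" ]; then
    echo "ok: license $license"
  else
    echo "missing: license $license"
    status=1
  fi

  # The Aspera client (ascp) and the AWS CLI must both be on PATH.
  for tool in ascp aws; do
    if command -v "$tool" > /dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done

  # The instance's permissions must allow listing the bucket.
  if command -v aws > /dev/null 2>&1 && aws s3 ls "s3://$bucket" > /dev/null 2>&1; then
    echo "ok: s3://$bucket"
  else
    echo "missing: access to s3://$bucket"
    status=1
  fi

  return "$status"
}
```

Running aspera_preflight YOUR_BUCKET before the copy lists everything that is still missing instead of stopping at the first problem. Confirming that the license itself is recognized is still done with ascp -A.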
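The copy command can be wrapped so that the four placeholders become arguments. This is a minimal sketch under assumptions: the function name aspera_s3_copy and the DRY_RUN switch are hypothetical, while the ascp flags and the s3://s3.amazonaws.com/ destination form are taken unchanged from the command in this guide. Building the command with set -- keeps each argument intact, so a DIRECTORY_TO_DOWNLOAD containing spaces still reaches ascp as one argument.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the ascp command in this guide.
# Usage: aspera_s3_copy USERNAME ASPERA.HOST.ADDRESS DIRECTORY_TO_DOWNLOAD YOUR_BUCKET
# Set DRY_RUN=1 to print the command instead of running it.
aspera_s3_copy() {
  if [ "$#" -ne 4 ]; then
    echo "usage: aspera_s3_copy USERNAME ASPERA.HOST.ADDRESS DIRECTORY_TO_DOWNLOAD YOUR_BUCKET" >&2
    return 2
  fi

  # Before a real run, make sure this instance can list the bucket.
  if [ "${DRY_RUN:-0}" != 1 ]; then
    if ! aws s3 ls "s3://$4" > /dev/null; then
      echo "cannot list s3://$4; check the instance's S3 permissions" >&2
      return 1
    fi
  fi

  # Replace the positional parameters with the full ascp command line:
  #   -k 3          resume, comparing full-file checksums
  #   -v            verbose output
  #   -p            preserve file timestamps
  #   -l400m        cap the transfer rate at 400 Mbps
  #   --mode recv   download from the Aspera server
  set -- ascp -k 3 -v -p -l400m --mode recv \
    "--user=$1" "--host=$2" "$3" "s3://s3.amazonaws.com/$4/"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
    return 0
  fi
  "$@"
}
```

For example, DRY_RUN=1 aspera_s3_copy USERNAME ASPERA.HOST.ADDRESS DIRECTORY_TO_DOWNLOAD YOUR_BUCKET prints the command without contacting AWS or the Aspera server. A real run still prompts for the Aspera password; for unattended runs, the ascp documentation describes reading it from the ASPERA_SCP_PASS environment variable instead.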
Note that I was given a temporary copy of the full Aspera software and license to test this procedure. At the time (fall 2014), Aspera's CEO told me that the company was interested in offering a product for this type of transfer.
I tested this procedure from an Amazon EC2 instance using ami-8997afe0. I do not know what performance to expect when running it from belltown or other local Hutch compute resources.