Mike DeLaurentis edited this page Oct 2, 2012 · 10 revisions

Running RUM

This page covers some common use cases for running RUM. Note that you can get detailed help by running rum_runner help. All these examples assume you have installed RUM system-wide or have added the bin directory that contains rum_runner to your PATH.

RUM will create several output files and many intermediate temporary files. When you run rum_runner, you will need to specify an output directory for these files. Once you've started a job running in an output directory, you can run other RUM operations like checking the job status or killing the job just by specifying the output directory (more details below). You can only use an output directory to contain the results of mapping one set of input reads. If you have multiple samples and you want to map them separately, please use a different output directory for each one, or else the output file names will clash.

Aligning reads

Let's assume your data is in a directory called "data/Lane1" and your indexes are in "$RUM_INDEXES". For single-end data, you would execute the command shown below. The reads files can be FASTA or FASTQ.

rum_runner align                  \
    --index $RUM_INDEXES/ORGANISM \
    --output data/Lane1           \
    --name Lane1                  \
    --chunks 1                    \
    data/Lane1/reads.txt

For paired-end data, run it as follows:

rum_runner align                  \
    --index $RUM_INDEXES/ORGANISM \
    --output data/Lane1           \
    --name Lane1                  \
    --chunks 1                    \
    data/Lane1/forwardreads.txt data/Lane1/reversereads.txt

Change "ORGANISM" to your specific organism. Note that you'll need an index corresponding to that organism. You can find more information about installing indexes here.

This will run the entire alignment job in one piece. To parallelize, you must be on a machine with multiple cores or have access to multiple machines. To run in parallel on a single machine, set the --chunks option to N, where N is the number of chunks to split the input into. If you are on a cluster, please take a look at these instructions for running a job on a cluster.
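
One way to pick a starting value for N on a single machine is to ask the system how many cores are online. This is only a sketch: using one chunk per core is an assumption rather than a RUM recommendation, and it ignores memory, which may limit how many chunks are practical.

```shell
# Assumption: one chunk per online core is a reasonable first guess.
# Falls back to 1 if the core count cannot be determined.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "Suggested option: --chunks $cores"
```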

Unless you have processors or nodes to spare, wait until one lane has finished completely before starting another. Each lane can take many hours.
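
Mapping several lanes one after another, each into its own output directory, can be sketched as a small shell function. The lane names, the data/<lane>/reads.txt layout, the align_lanes function, and the RUM_CMD variable are all assumptions made for illustration and are not part of RUM. The sketch also assumes that rum_runner align returns only once the job has finished, which you should confirm on your setup.

```shell
# Hypothetical override: set RUM_CMD=echo to print the commands instead of
# running them (a dry run).
RUM_CMD="${RUM_CMD:-rum_runner}"

# Align each lane given as an argument, in order, using the single-end
# layout from the example above. Each lane gets its own output directory
# so that output file names do not clash.
align_lanes() {
    for lane in "$@"; do
        "$RUM_CMD" align \
            --index "$RUM_INDEXES/ORGANISM" \
            --output "data/$lane" \
            --name "$lane" \
            --chunks 1 \
            "data/$lane/reads.txt" || return 1   # stop at the first failure
    done
}
```

For example, align_lanes Lane1 Lane2 would start Lane2 only after the command for Lane1 has exited successfully.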

The --name Lane1 argument gives the job a name. You may want to change it to something more descriptive, but it may contain only letters, numbers, dashes, underscores, and periods.
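
The naming rule can be checked in the shell before starting a job. This is a minimal sketch: the valid_job_name function is hypothetical and not part of RUM, and it assumes the rule means ASCII letters and digits only.

```shell
# Hypothetical helper, not part of RUM: succeeds only if the name is
# non-empty and consists solely of letters, digits, dashes, underscores,
# and periods.
valid_job_name() {
    case "$1" in
        ""|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

valid_job_name "Lane1" && echo "Lane1: ok"
valid_job_name "Lane 1" || echo "Lane 1: not allowed"
```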

Checking a job's status

After you start a job, you can check on its status by running

rum_runner status -o OUTPUT_DIR

This will print a short report showing which steps of the pipeline have been completed. Please see rum_runner help status for more information.

Stopping a job

If you need to stop a job and restart it later, use

rum_runner stop -o OUTPUT_DIR

Please see rum_runner help stop for more information.

Resuming a job

If you have a job that crashed or that you had to stop for some reason, you should be able to resume it from where it stopped with:

rum_runner resume -o OUTPUT_DIR

If you want to restart it from an earlier step, you can use the --from-step option:

rum_runner resume -o OUTPUT_DIR --from-step 17

You can find the step numbers for your job using rum_runner status. Note that unless you ran RUM with the --no-clean option, RUM may actually have to back up to an earlier step if the intermediate files it needs have been cleaned up.

Killing a job

If you want to completely abort a job, you can use

rum_runner kill -o OUTPUT_DIR

This will stop the job and remove all the output files associated with it. DO NOT run this if you want to restart the job later from where it left off; use rum_runner stop instead in that case.

Cleaning up

By default, RUM should remove all of the intermediate and temporary files it creates. However, if it encounters any errors during the run, it may leave some files around in order to help with debugging. To clean those files up, you can run

rum_runner clean -o OUTPUT_DIR

Please see rum_runner help clean for more information.

Creating gene profiles

Once you have multiple lanes mapped, there is a script called 'featurequant2geneprofiles.pl' in the scripts directory that will create one spreadsheet of the normalized intensities with rows=genes and columns=samples. Run it without parameters to get the usage:

perl rum/bin/featurequant2geneprofiles.pl

Next: RUM output files
