Sunday, November 28, 2010

How to split one file into multiple files on Linux?

When you have a list of input files to analyze but the software you are using doesn't support parallelization, a convenient way to speed up the analysis is to run several copies of the program at the same time, each on a subset of your files.
If the list of input files is stored in one file (input.files.txt), it can easily be split into multiple files with the Linux split command:

split -a 2 -l 500 -d input.files.txt input.file.

which will produce files named:

input.file.00
input.file.01
...
input.file.99

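with each file containing 500 rows (the last file may contain fewer). The -l 500 option sets the number of lines per file, -d uses numeric suffixes and -a 2 makes the suffixes two digits long. split creates only as many files as the list needs, up to a maximum of 100 (input.file.00 to input.file.99).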
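Since the point of splitting is to run the chunks at the same time, here is a minimal bash sketch of the whole workflow. It assumes GNU split, and analyze_chunk is a hypothetical placeholder for your actual program (here it only counts the entries in a chunk); chunk_size and split_and_run are likewise illustrative names, not standard tools. chunk_size uses ceiling division to pick a lines-per-file value that divides the list into at most the requested number of pieces.

```shell
#!/usr/bin/env bash
# Sketch: split a list of input files into chunks and analyze the chunks in parallel.
# Assumption: analyze_chunk is a hypothetical stand-in for your real program.
set -euo pipefail

# Placeholder analysis: just counts the entries (lines) in one chunk.
analyze_chunk() {
  wc -l < "$1"
}

# Print the lines per chunk needed to divide list $1 into at most $2 chunks
# (ceiling division: (total + jobs - 1) / jobs).
chunk_size() {
  local total
  total=$(wc -l < "$1")
  echo $(( (total + $2 - 1) / $2 ))
}

# Split list $1 into files of $2 lines named <prefix>00, <prefix>01, ...
# and run analyze_chunk on every chunk as a background job.
split_and_run() {
  local list=$1 lines_per_chunk=$2 prefix=$3
  # -a 2: two-character suffixes, -d: numeric suffixes, -l: lines per file
  split -a 2 -l "$lines_per_chunk" -d "$list" "$prefix"
  local chunk
  for chunk in "$prefix"[0-9][0-9]; do
    # Each chunk's result is written to <chunk>.out
    analyze_chunk "$chunk" > "$chunk.out" &
  done
  # Block until all background jobs have finished
  wait
}
```

For example, split_and_run input.files.txt "$(chunk_size input.files.txt 4)" input.file. would split the list into at most four chunks and start one background job per chunk. The sketch does not check whether individual jobs failed. GNU split can also do the division itself: split -n l/4 -d input.files.txt input.file. produces four files of roughly equal size in bytes without breaking any line.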

5 comments:

  1. Can you use this for FASTA files and make sure that a FASTA ID and its sequence don't end up in different files?

  2. I have no idea how to do it.

  3. Here are other ideas on how to do this for FASTA files:
    http://biostar.stackexchange.com/questions/1853/code-golf-digesting-fasta-sequences-into-a-set-of-smaller-sequences

    Thanks frenkiboy for sending the link!! :)

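Regarding the FASTA question in the comments: split itself only counts lines, so it can cut a record in half. One possible approach, sketched here with awk rather than split, is to count header lines instead. It assumes every record starts with a header line beginning with ">"; split_fasta and the output names (prefix00, prefix01, ...) are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split a FASTA file into chunks of at most N records each, so that a
# header line (starting with ">") always stays in the same file as its sequence.
# Assumption: every record begins with a ">" header line.
set -euo pipefail

split_fasta() {
  local fasta=$1 records_per_chunk=$2 prefix=$3
  awk -v n="$records_per_chunk" -v prefix="$prefix" '
    /^>/ {
      # Every n records, close the current chunk and open the next one
      if (count % n == 0) {
        if (out != "") close(out)
        out = sprintf("%s%02d", prefix, count / n)
      }
      count++
    }
    # Header and sequence lines go to the current chunk; any lines before
    # the first header are skipped because no chunk is open yet.
    out != "" { print > out }
  ' "$fasta"
}
```

For example, split_fasta seqs.fa 1000 seqs.part. would write 1000 records per file. Unlike split -a 2, the %02d name format simply grows to three digits after 99 chunks instead of stopping with an error.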