Bash Multiprocessing

If you haven’t gotten that ninja feeling yet, this one is for you. We’ll combine all our newly learned superpowers and perform parallel computing, all with a single terminal command!

Xargs reads items from standard input (meaning you can pipe data to it) and executes the specified command with those items as arguments.

The basic syntax for xargs is:

xargs [options] [command [initial-arguments]]

At first sight, you might not see the benefits of this. Why not create a while loop and run each command? The benefit of xargs is that it can batch arguments and call your command once on many files, instead of individually for each file.
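To make the batching visible, here is a small sketch that uses echo as a stand-in for a real command. The file names are made up for illustration. By default xargs packs all items into one call, while -n 1 forces one call per item:

```shell
# Three items on standard input, one per line (hypothetical file names).
items=$(printf '%s\n' a.txt b.txt c.txt)

# Default behaviour: xargs passes all items to a single echo call.
batched=$(printf '%s\n' "$items" | xargs echo)

# With -n 1, xargs runs echo once for every item.
per_item=$(printf '%s\n' "$items" | xargs -n 1 echo)

echo "$batched"    # one line containing all three names
echo "$per_item"   # three lines, one name each
```

For a cheap command like echo the difference hardly matters, but for a command with a noticeable startup cost, one call on many files is usually faster than many calls on one file each.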

But find can do that too!

Yup, you’re right: find can batch arguments as well, with -exec ... {} +. Still, xargs has more advantages. For one, it works with any input, not just the output of find. But xargs also has a special trick up its sleeve: it can run commands in parallel with the -P option.

This option takes a number that defines the maximum number of processes xargs runs at the same time. You read that right: in parallel!
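A quick way to see the effect is to time a few sleep commands. This is only a sketch: it assumes an xargs that supports -P (GNU and BSD versions do, although the option is not part of POSIX) and a date command that understands +%s.

```shell
# Four jobs that each sleep for one second.
start=$(date +%s)

# -n 1 gives every sleep its own argument, -P 4 allows four at a time.
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep

end=$(date +%s)
elapsed=$((end - start))

# Run one after another, the four sleeps would need about four seconds.
echo "parallel run took about ${elapsed}s"
```

Changing -P 4 to -P 1 in the same command shows the sequential timing for comparison.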

One real-world example is to use this when doing video conversion on lots of files. Let’s dissect the following command together:

find . -name "*.mpeg" | xargs -P 4 -I {} ffmpeg -i {} {}.mp4

First, we find all mpeg files. We feed these files to xargs. Next we tell xargs, with -P 4, to use four processes concurrently. With the -I option, we also tell xargs to substitute the file name wherever it encounters {}, which makes it run the command once per file. So xargs gets the first video file and starts ffmpeg. Instead of waiting for ffmpeg to finish, xargs starts another instance of ffmpeg to process the second file in parallel. This goes on until it reaches four processes. If all four slots are taken, xargs waits for one to finish before starting the next process.
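Before starting a long conversion, it can help to preview what xargs is going to run. The sketch below puts echo in front of ffmpeg, so the commands are only printed, never executed. It also uses find's -print0 together with xargs -0, which are widely supported extensions that keep file names with spaces intact. The scratch directory and file names are invented for this example.

```shell
# Create a scratch directory with a few empty placeholder files.
workdir=$(mktemp -d)
touch "$workdir/holiday.mpeg" "$workdir/my birthday.mpeg" "$workdir/notes.txt"

# -print0 and -0 separate file names with NUL bytes, so spaces survive.
# The leading echo prints each ffmpeg command instead of running it.
preview=$(find "$workdir" -name "*.mpeg" -print0 |
  xargs -0 -P 4 -I {} echo ffmpeg -i {} {}.mp4)

echo "$preview"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Once the printed commands look right, removing echo runs the real conversion. Because the jobs run in parallel, the order of the printed lines is not guaranteed.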

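Video conversion is mostly CPU-bound. If your computer has four CPU cores, running four conversions at once can be up to about four times as fast as a regular find or a while loop that handles one file at a time. The actual gain depends on your disk speed and on how many cores each ffmpeg process already uses by itself, since ffmpeg can run multiple threads for many codecs. Still, isn’t that awesome?!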
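Rather than hard-coding the number four, you could ask the system how many cores it has. The sketch below assumes getconf knows the _NPROCESSORS_ONLN variable, which is the case on Linux and macOS but not guaranteed everywhere, so it falls back to 1. Again, echo stands in for the real work.

```shell
# Ask the system how many CPU cores are online; fall back to 1 if unknown.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Run eight trivial jobs, with as many parallel slots as there are cores.
results=$(printf '%s\n' 1 2 3 4 5 6 7 8 | xargs -n 1 -P "$cores" echo job)

echo "using $cores parallel slots"
echo "$results"
```

Matching -P to the core count is a reasonable starting point for CPU-bound work. If each process is already multi-threaded, a lower number may work just as well.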

