<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Luke Piszkin &#8211; The Bowman Lab</title>
	<atom:link href="https://www.polarmicrobes.org/author/luke/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.polarmicrobes.org</link>
	<description>Marine Microbial Ecology</description>
	<lastBuildDate>Tue, 10 May 2022 18:48:22 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
<site xmlns="com-wordpress:feed-additions:1">188265837</site>	<item>
		<title>A short tutorial on Gnu Parallel</title>
		<link>https://www.polarmicrobes.org/a-short-tutorial-on-gnu-parallel/</link>
					<comments>https://www.polarmicrobes.org/a-short-tutorial-on-gnu-parallel/#comments</comments>
		
		<dc:creator><![CDATA[Luke Piszkin]]></dc:creator>
		<pubDate>Wed, 20 Jan 2021 05:36:12 +0000</pubDate>
				<category><![CDATA[Computer tutorials]]></category>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=3179</guid>

					<description><![CDATA[This post comes from Luke Piszkin, an undergraduate researcher in the Bowman Lab. Gnu Parallel is a must-have utility for anyone who spends a lot of time in Linux Land, and Luke recently had to gain some Gnu Parallel fluency &#8230; <a href="https://www.polarmicrobes.org/a-short-tutorial-on-gnu-parallel/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
										<content:encoded><![CDATA[
<p>This post comes from <a href="https://www.polarmicrobes.org/people/">Luke Piszkin</a>, an undergraduate researcher in the Bowman Lab.  <a href="https://www.gnu.org/software/parallel/">Gnu Parallel</a> is a must-have utility for anyone who spends a lot of time in Linux Land, and Luke recently had to gain some Gnu Parallel fluency for his project.  Enjoy!</p>



<p class="has-text-align-center">*******</p>



<div class="wp-block-jetpack-markdown"><p>GNU parallel is a Linux shell tool for executing jobs in parallel using multiple CPU cores.
This is a quick tutorial for speeding up your workflow and getting the most out of your machine with parallel.</p>
<p>You can find the current distribution here: https://www.gnu.org/software/parallel/. Please try some basic commands to make sure it is working.</p>
<p>You will need some basic understanding of “piping” in the command line. I will describe command pipes briefly, just for our purposes, but for a more detailed look please see https://www.howtogeek.com/438882/how-to-use-pipes-on-linux/.</p>
<p>Piping data in the command line involves taking the output of one command and using it as the input for another. A basic example looks like this:</p>
<pre><code>command_1 | command_2 | command_3 | … 
</code></pre>
<p>Here the output of <strong>command_1</strong> is used as the input for <strong>command_2</strong>, the output of <strong>command_2</strong> is used by <strong>command_3</strong>, and so on. For now, we will only need one pipe with parallel. Now let&#8217;s look at a basic command run in parallel.</p>
<pre><code>Input: find . -type f -name &quot;*.txt&quot; | parallel cat
</code></pre>
<pre><code>Output: 
The house stood on a slight rise just on the edge of the village.
It stood on its own and looked over a broad spread of West Country farmland.
Not a remarkable house by any means - it was about thirty years old, squattist, squarish, made of brick, and had four windows set in the front size and proportion which more or less exactly failed to please the eye
The only person for whom the house was in any way special was Arthur Dent, and that was only because it happened to be the one he lived in.
He had lived in it for about three years, ever since he had moved out of London because it made him nervous and irritable
</code></pre>
<p>This command uses <strong>find</strong> to list all the <strong>.txt</strong> files in my directory, then runs <strong>cat</strong> on them in parallel, which prints the contents of each file. We can already see how this is much easier than running each command separately, i.e.:</p>
<pre><code>In: cat file1.txt
</code></pre>
<pre><code>The house stood on a slight rise just on the edge of the village.
</code></pre>
<pre><code>In: cat file2.txt
</code></pre>
<pre><code>It stood on its own and looked over a broad spread of West Country farmland.
</code></pre>
<p>Also, notice that we do not need a placeholder for the file names in the second command: when no placeholder is given, parallel appends each piped-in line to the end of the command.
Now let&#8217;s take a more complicated example:</p>
<pre><code>find . -type f -name &quot;*beta_gal_vibrio_vulnificus_1_100000_0__H_flex=up_*.txt&quot; ! -name &quot;*tally*&quot; | parallel -j 4 python3 PEPCplots.py {} flex log
</code></pre>
<pre><code>0.001759374417007663, 0.00033497120199255527, 0.9969940359705531
0.0019773468515624356, 0.00022978867370935437, 0.9969940359705531
0.001332602651915014, 0.0005953339816183529, 0.9969940359705531
0.0015118302435556904, 0.0005040931537659636, 0.9969940359705531
0.001320879258211107, 0.0006907926578169569, 0.9969940359705531
0.0016753759966792244, 0.00041583739269117386, 0.9969940359705302
0.0017187095827331082, 0.00036931151058880094, 0.9969940359705531
0.0017045099726521733, 0.00031386214441070197, 0.9969940359705531
0.001399703145023273, 0.0005196629341168314, 0.9969940359705531
0.001436129272321403, 0.0004806654291442482, 0.9969940359705531
</code></pre>
<p>This is an example from my research: the script takes in a <strong>.txt</strong> data file and prints some parameters that I want to put in a spreadsheet. Like before, we use <strong>find</strong> to get a list of all the files we want the second command to process. We use <strong>! -name &quot;*tally*&quot;</strong> to exclude any files that have “tally” anywhere in the name, because we don’t want to process those.</p>
<p>In the second command, we have the option <strong>-j 4</strong>. This tells parallel to run up to 4 jobs at a time, which keeps 4 CPU cores busy. You can check your computer specs to see how many cores you have available. If your machine has hyper-threading, each physical core shows up as two logical cores, and parallel can run jobs on those too. For instance, my dinky laptop only has 2 physical cores, but with hyper-threading I can run 4 jobs at once. This is another way to improve your efficiency.</p>
<p>In the second command you also see a <strong>{}</strong> placeholder. This spot is filled by each line that the first command outputs, one line per job. In this case, we need the placeholder because the input file name goes in the middle of the command, between the script name and its other arguments.</p>
<p>You can also use parallel to run a number of identical commands at the same time. This is helpful if you have a program to run on the same file multiple times. For example:</p>
<pre><code>seq 10 | parallel -N0 cat file1.txt
</code></pre>
<pre><code>The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
The house stood on a slight rise just on the edge of the village.
</code></pre>
<p>Here we use <strong>seq</strong> as a counting mechanism for how many times to run the second command. You can adjust the number of jobs by changing the <strong>seq</strong> argument. We include the <strong>-N0</strong> flag, which tells parallel to read one input line per job but insert none of it into the command, because this time the first command only counts jobs and does not supply file names.</p>
<p>Often, I like to include both the <strong>time</strong> shell tool and the <strong>--progress</strong> parallel option to see the current job status and the total run time:</p>
<pre><code>seq 10 | time parallel --progress -N0 cat file1.txt
</code></pre>
<pre><code>Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:4/0/100%/0.0s The house stood on a slight rise just on the edge of the village.
local:4/1/100%/1.0s The house stood on a slight rise just on the edge of the village.
local:4/2/100%/0.5s The house stood on a slight rise just on the edge of the village.
local:4/3/100%/0.3s The house stood on a slight rise just on the edge of the village.
local:4/4/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:4/5/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:4/6/100%/0.2s The house stood on a slight rise just on the edge of the village.
local:3/7/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:2/8/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:1/9/100%/0.1s The house stood on a slight rise just on the edge of the village.
local:0/10/100%/0.1s
0.21user 0.46system 0:00.63elapsed 108%CPU (0avgtext+0avgdata 15636maxresident)k
0inputs+0outputs (0major+12089minor)pagefaults 0swaps
</code></pre>
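<p>If you want to experiment with these options without touching real data, here is a minimal sketch of a scratch-directory test. The file names (<strong>run_1.txt</strong>, <strong>run_tally.txt</strong>, and so on) are made up for illustration, and <strong>echo</strong> stands in for a real program. The <strong>-k</strong> (<strong>--keep-order</strong>) option is an extra parallel flag not used above; I include it here only so the results print in input order and the output is repeatable.</p>
<pre><code>#!/usr/bin/env bash
# Sketch only: a throwaway sandbox for practising find | parallel.
# Every file name here is hypothetical; echo stands in for a real program.
set -eu

# Stop early if GNU parallel is not on the PATH.
command -v parallel >/dev/null || { echo "GNU parallel not found"; exit 0; }

workdir=$(mktemp -d)             # scratch directory
trap 'rm -rf "$workdir"' EXIT    # remove it when the script exits
cd "$workdir"

# Three small input files plus one "tally" file that should be skipped.
printf 'alpha\n'   > run_1.txt
printf 'bravo\n'   > run_2.txt
printf 'charlie\n' > run_3.txt
printf 'skip me\n' > run_tally.txt

# find lists the .txt files, "! -name" drops the tally file, and sort
# fixes the input order. parallel then substitutes each path for {} in
# the middle of the command. -j 2 allows at most two jobs at once.
listing=$(find . -type f -name "*.txt" ! -name "*tally*" | sort |
    parallel -k -j 2 echo processed {} flex log)
echo "$listing"
# processed ./run_1.txt flex log
# processed ./run_2.txt flex log
# processed ./run_3.txt flex log

# With -N0 the numbers from seq only count jobs; none is inserted.
repeats=$(seq 3 | parallel -k -N0 echo job)
echo "$repeats"
# job
# job
# job
</code></pre>
<p>Save this as a script and run it with bash. Once the output looks right, swapping <strong>echo</strong> for your own program is a small step.</p>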
<p>And with that, you are well on your way to significantly increasing your computing throughput and using the full potential of your machine. You should now understand parallel well enough to construct a command for your own projects and to explore more complicated applications of parallelization.
(Bonus points to whoever knows the book that I used for the text files.)</p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.polarmicrobes.org/a-short-tutorial-on-gnu-parallel/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">3179</post-id>	</item>
	</channel>
</rss>
