<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	
	>
<channel>
	<title>
	Comments on: Some things should be easy&#8230;	</title>
	<atom:link href="https://www.polarmicrobes.org/some-things-should-be-easy/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.polarmicrobes.org/some-things-should-be-easy/</link>
	<description>Marine Microbial Ecology</description>
	<lastBuildDate>Wed, 04 Feb 2015 22:02:31 +0000</lastBuildDate>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9</generator>
	<item>
		<title>
		By: Jeff		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-264</link>

		<dc:creator><![CDATA[Jeff]]></dc:creator>
		<pubDate>Wed, 04 Feb 2015 22:02:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-264</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.polarmicrobes.org/some-things-should-be-easy/#comment-263&quot;&gt;mialian&lt;/a&gt;.

Mialian,
That&#039;s not C code; sfederman&#039;s snippet is still Python at that point.  If you replace the c.execute line in the Option #3 code with the one suggested above, it should work.  Alternatively, try this method instead: https://www.polarmicrobes.org/?p=859.]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.polarmicrobes.org/some-things-should-be-easy/#comment-263">mialian</a>.</p>
<p>Mialian,<br />
That&#8217;s not C code; sfederman&#8217;s snippet is still Python at that point.  If you replace the c.execute line in the Option #3 code with the one suggested above, it should work.  Alternatively, try this method instead: <a href="https://www.polarmicrobes.org/?p=859" rel="ugc">https://www.polarmicrobes.org/?p=859</a>.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: mialian		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-263</link>

		<dc:creator><![CDATA[mialian]]></dc:creator>
		<pubDate>Wed, 21 Jan 2015 10:21:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-263</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.polarmicrobes.org/some-things-should-be-easy/#comment-141&quot;&gt;sfederman&lt;/a&gt;.

sfederman,
I would like to implement your suggestion to (hopefully) speed up the process further, but I don&#039;t understand exactly how to do it. I am not used to C code. Could you show where and how you would implement it in Jeff&#039;s code?

Thanks in advance!]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.polarmicrobes.org/some-things-should-be-easy/#comment-141">sfederman</a>.</p>
<p>sfederman,<br />
I would like to implement your suggestion to (hopefully) speed up the process further, but I don&#8217;t understand exactly how to do it. I am not used to C code. Could you show where and how you would implement it in Jeff&#8217;s code?</p>
<p>Thanks in advance!</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jeff		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-147</link>

		<dc:creator><![CDATA[Jeff]]></dc:creator>
		<pubDate>Thu, 04 Jul 2013 16:57:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-147</guid>

					<description><![CDATA[If you are interested in setting up a taxonomy database please read this post describing a better method: https://www.polarmicrobes.org/?p=859]]></description>
			<content:encoded><![CDATA[<p>If you are interested in setting up a taxonomy database please read this post describing a better method: <a href="https://www.polarmicrobes.org/?p=859" rel="ugc">https://www.polarmicrobes.org/?p=859</a></p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jeff		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-144</link>

		<dc:creator><![CDATA[Jeff]]></dc:creator>
		<pubDate>Tue, 18 Jun 2013 18:31:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-144</guid>

					<description><![CDATA[In reply to &lt;a href=&quot;https://www.polarmicrobes.org/some-things-should-be-easy/#comment-139&quot;&gt;Jeff&lt;/a&gt;.

Further modifications... if you use the above fgrep --max-count=1 method you will invariably match some lines in the .dmp file incorrectly (wherever your query gi appears inside a longer target gi).  Since --max-count=1 aborts the search after one match, you will often not return the line you want.  Use grep(z) -P &quot;23435\t&quot; instead.  The -P option uses Perl regex, allowing the use of the \t (tab) character.  This example would match:

23435     46
123435    893

but not:

234357    21456

This effectively reduces the .dmp file to a size that allows dictionary creation, but it will contain a number of erroneous matches (not a problem, because the dictionary will force exact matches downstream).]]></description>
			<content:encoded><![CDATA[<p>In reply to <a href="https://www.polarmicrobes.org/some-things-should-be-easy/#comment-139">Jeff</a>.</p>
<p>Further modifications&#8230; if you use the above fgrep --max-count=1 method you will invariably match some lines in the .dmp file incorrectly (wherever your query gi appears inside a longer target gi).  Since --max-count=1 aborts the search after one match, you will often not return the line you want.  Use grep(z) -P "23435\t" instead.  The -P option uses Perl regex, allowing the use of the \t (tab) character.  This example would match:</p>
<p>23435     46<br />
123435    893</p>
<p>but not:</p>
<p>234357    21456</p>
<p>This effectively reduces the .dmp file to a size that allows dictionary creation, but it will contain a number of erroneous matches (not a problem, because the dictionary will force exact matches downstream).</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: sfederman		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-141</link>

		<dc:creator><![CDATA[sfederman]]></dc:creator>
		<pubDate>Thu, 06 Jun 2013 06:34:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-141</guid>

					<description><![CDATA[I&#039;ve also found this post very helpful in setting up a taxonomy database. Thanks for putting me on the right path.

I want to let you know one adjustment I made in order to speed up the queries. When creating the database, I adjusted your script above slightly:

&lt;code&gt;c.execute(&#039;&#039;&#039;CREATE TABLE gi_taxid (
			gi INTEGER PRIMARY KEY,
			taxid INTEGER)&#039;&#039;&#039;)&lt;/code&gt;

This makes the gi a primary key, and indexes this field in the file - greatly speeding up searches.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve also found this post very helpful in setting up a taxonomy database. Thanks for putting me on the right path.</p>
<p>I want to let you know one adjustment I made in order to speed up the queries. When creating the database, I adjusted your script above slightly:</p>
<p><code>c.execute('''CREATE TABLE gi_taxid (<br />
			gi INTEGER PRIMARY KEY,<br />
			taxid INTEGER)''')</code></p>
<p>This makes the gi a primary key, and indexes this field in the file &#8211; greatly speeding up searches.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: Jeff		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-139</link>

		<dc:creator><![CDATA[Jeff]]></dc:creator>
		<pubDate>Sun, 26 May 2013 17:29:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-139</guid>

					<description><![CDATA[Glad it helped!  I&#039;ve since modified my use of grep in two ways and now use &quot;zgrep -F&quot;.  Zgrep allows a search on a gzipped dmp file (dmp.gz), which saves a little disk space.  The -F parameter (the same as using fgrep on a non-gzipped file) does not interpret regular expressions (not needed here), speeding up the grep search.]]></description>
			<content:encoded><![CDATA[<p>Glad it helped!  I&#8217;ve since modified my use of grep in two ways and now use &#8220;zgrep -F&#8221;.  Zgrep allows a search on a gzipped dmp file (dmp.gz), which saves a little disk space.  The -F parameter (the same as using fgrep on a non-gzipped file) does not interpret regular expressions (not needed here), speeding up the grep search.</p>
]]></content:encoded>
		
			</item>
		<item>
		<title>
		By: cjcook		</title>
		<link>https://www.polarmicrobes.org/some-things-should-be-easy/#comment-138</link>

		<dc:creator><![CDATA[cjcook]]></dc:creator>
		<pubDate>Wed, 22 May 2013 17:03:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.polarmicrobes.org/?p=759#comment-138</guid>

					<description><![CDATA[Thank you so much for posting this.  I&#039;ve been struggling with the same problem for a couple of weeks now and it never occurred to me to grep for it!]]></description>
			<content:encoded><![CDATA[<p>Thank you so much for posting this.  I&#8217;ve been struggling with the same problem for a couple of weeks now and it never occurred to me to grep for it!</p>
]]></content:encoded>
		
			</item>
	</channel>
</rss>
