10-03-2018, 05:48 AM
Programming Snapshot: Implementing Fast Queries for Local Files in Go
<div style="margin: 5px 5% 10px 5%;"><img src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/programming-snapshot-implementing-fast-queries-for-local-files-in-go.png" width="346" height="92" title="" alt="" /></div><div><div class="article_intro">
<p>To find files quickly in the deeply nested subdirectories of his home directory, Mike whips up a Go program to index file metadata in an SQLite database.</p>
</div>
<div class="article_body">
<p>…the GitHub Codesearch <a class="info" href="http://www.linuxpromagazine.com/Online/Features/Programming-Snapshot-Go/(offset)/3#article_i1">[1]</a> project, with its indexer built in Go, at least lets you browse locally available repositories, index them, and then search for code snippets in a flash. Its author, Russ Cox, then an intern at Google, explained later how the search works <a class="info" href="http://www.linuxpromagazine.com/Online/Features/Programming-Snapshot-Go/(offset)/3#article_i2">[2]</a>.</p>
</div>
<p>How about using a similar method to create an index of files below a start directory to perform quick queries such as: “Which files have recently been modified?” “Which are the biggest wasters of space?” Or “Which file names match the following pattern?”</p>
<p>Unix filesystems store metadata in inodes, which reside in flattened structures on disk that cause database-style queries to run at a snail’s pace. To inspect a file’s metadata, run the <code>stat</code> command on it and look at the file size and timestamps, such as the time of the last modification (<a class="figure" href="http://www.linuxpromagazine.com/Online/Features/Programming-Snapshot-Go#article_f2">Figure 2</a>).</p>
<div class="object-center">
<div class="imagecenter"><img alt="" src="http://www.sickgaming.net/blog/wp-content/uploads/2018/10/programming-snapshot-implementing-fast-queries-for-local-files-in-go.png" />
<p>Figure 2: Inode metadata of a file, here determined by stat, can be used to build an index.</p>
</div>
</div>
<p>Newer filesystems like ZFS or Btrfs take a more database-like approach in the way they organize the files they contain but do not go far enough to be able to support meaningful queries from userspace.</p>
<h4>Fast Forward Instead of Pause</h4>
<p>For example, if you want to find all files over 100MB on the disk, you can do this with a <code>find</code> call like:</p>
<pre class="auto">
find / -type f -size +100M</pre>
<p>If you are running the search on a traditional hard disk, take a coffee break. Even on a fast SSD, expect the search to take minutes. The reason is that the data is scattered across the sectors of the disk in a query-unfriendly way.</p>
<p>Read more at <a href="http://www.linuxpromagazine.com/Online/Features/Programming-Snapshot-Go">Linux Pro Magazine</a></p>
</div>