{"id":133167,"date":"2023-04-17T08:00:00","date_gmt":"2023-04-17T08:00:00","guid":{"rendered":"https:\/\/fedoramagazine.org\/?p=38109"},"modified":"2023-04-17T08:00:00","modified_gmt":"2023-04-17T08:00:00","slug":"linux-bcache-with-writeback-cache-how-it-works-and-doesnt-work","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2023\/04\/17\/linux-bcache-with-writeback-cache-how-it-works-and-doesnt-work\/","title":{"rendered":"Linux bcache with writeback cache (how it works and doesn\u2019t work)"},"content":{"rendered":"<p>bcache is a simple and effective way to give large disks (typically slow rotary ones) performance quite similar to that of an SSD, using a small SSD or a small part of one.<\/p>\n<p>In general, bcache builds devices out of large, slow disks with small, fast disks attached as a cache.<\/p>\n<p>This article discusses bcache performance, some optimization tips, and how to configure bcache.<\/p>\n<p> <span id=\"more-38109\"><\/span> <\/p>\n<p>bcache uses the following terms to describe its parts and how it works:<\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td><em>backing device<\/em><\/td>\n<td>large, slow disk that actually holds the data<\/td>\n<\/tr>\n<tr>\n<td><em>cache device<\/em><\/td>\n<td>small, fast disk used as the cache<\/td>\n<\/tr>\n<tr>\n<td><em>dirty cache<\/em><\/td>\n<td>data present only on the cache device, not yet written to the backing device<\/td>\n<\/tr>\n<tr>\n<td><em>writeback<\/em><\/td>\n<td>writing to the cache device first and to the backing device later (possibly much later)<\/td>\n<\/tr>\n<tr>\n<td><em>writeback rate<\/em><\/td>\n<td>speed at which dirty data is written from the cache device to the backing device<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>A disk data cache has always existed: the free RAM in the operating system. When data is read from the disk, it is copied to RAM. If the data is already in RAM, it is read from RAM rather than from the disk again. 
When data is written to the disk, it is written to RAM first and to the disk a few moments later. Because RAM is volatile, data spends only a very short time in RAM alone.<\/p>\n<p>bcache works similarly, but it offers several cache modes. The fastest mode for writing data is writeback. It works the same way as the RAM cache, except that a SATA or NVMe SSD takes the place of RAM. Data may reside only in the cache for much longer, even indefinitely, so this mode is riskier: if the SSD fails, the data that existed only in the cache is lost, and there is a good chance that the whole filesystem becomes inaccessible.<\/p>\n<h2>Performance Comparison<\/h2>\n<p>It is very difficult to gather reliable data from tests, whether with real workloads or with benchmarking programs: the results are always highly variable and unstable. The various caches in the system and the type of filesystem (btrfs, journaled, etc.) make the values vary considerably, so it is advisable to ignore small differences (say 5-10%).<\/p>\n<p>The following performance data comes from the test below (random mixed reads\/writes), run three times in immediate succession while keeping conditions as constant as possible. Each table column corresponds to one run.<\/p>\n<pre class=\"wp-block-preformatted\">$ sysbench fileio --file-total-size=2G prepare\n$ sysbench fileio --file-total-size=2G --file-test-mode=rndrw --time=30 --max-requests=0 run<\/pre>\n<p>The tables below show the performance of the separate devices:<\/p>\n<p class=\"has-text-align-center\"><strong>Performance of the backing device (RAID 1 with 1TB rotary disks)<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 0.22<\/td>\n<td>read, MiB\/s: 0.23<\/td>\n<td>read, MiB\/s: 0.19<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 0.15<\/td>\n<td>written, MiB\/s: 0.16<\/td>\n<td>written, MiB\/s: 0.13<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 174.92<\/td>\n<td>max: 879.59 
<\/td>\n<td>max: 1335.30<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 87.56<\/td>\n<td>95th percentile: 87.56 <\/td>\n<td>95th percentile: 89.16<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>RAID 1 with 1TB rotary disks<\/figcaption><\/figure>\n<p class=\"has-text-align-center\"><strong>Performance of the cache device (SSD SATA 100GB)<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 7.28 <\/td>\n<td>read, MiB\/s: 7.21<\/td>\n<td>read, MiB\/s: 7.51<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 4.86<\/td>\n<td>written, MiB\/s: 4.81<\/td>\n<td>written, MiB\/s: 5.01<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td> <\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 126.55<\/td>\n<td>max: 102.39<\/td>\n<td>max: 107.95<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 1.47<\/td>\n<td>95th percentile: 1.47 <\/td>\n<td>95th percentile: 1.47<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Cache device (SSD SATA 100GB)<\/figcaption><\/figure>\n<p>One might expect a bcache device to be as fast as its cache device, but this is (physically) impossible to achieve. On average, bcache is significantly slower and only sometimes approaches the performance of the cache device. Improving performance almost always requires compromises.<\/p>\n<p>For example, consider a 1TB bcache device with a 100GB cache. When writing a 1TB file, the cache device is filled, then partially flushed to the backing device, then refilled, and so on until the file is fully written.<\/p>\n<p>Because of this (and also because part of the cache must serve data for reads), bcache limits how much sequential data from a file is written to the cache (the <em>sequential_cutoff<\/em> setting). 
Once the limit is exceeded, the file data is written (or read) directly to the backing device, bypassing the cache.<\/p>\n<p>bcache also throttles traffic to the cache device when its response latency exceeds a threshold (<em>congested_read_threshold_us<\/em> and <em>congested_write_threshold_us<\/em>), but it does so too aggressively, especially with SATA SSDs, which degrades the performance of the cache.<\/p>\n<p>The dirty cache should be flushed to the backing device to reduce the risk of data loss and to keep cache space available when it is needed. This should be done only when the devices show little or no activity; otherwise, the performance available for normal use collapses.<\/p>\n<p>Unfortunately, the default settings are too low, and the writeback rate adjustment is crude. Improving the writeback rate adjustment requires a separate program (I wrote a script for this).<\/p>\n<p>The following commands apply the optimizations needed to get better performance from the bcache device. They must be applied again at each startup. Writing 0 to the two congestion thresholds disables latency-based throttling, <em>sequential_cutoff<\/em> raises the sequential bypass limit to 600000000 bytes (about 600MB), and <em>writeback_percent<\/em> sets the percentage of the cache that bcache tries to keep dirty.<\/p>\n<pre class=\"wp-block-preformatted\"># echo 0 &gt; \/sys\/block\/bcache0\/bcache\/cache\/congested_write_threshold_us\n# echo 0 &gt; \/sys\/block\/bcache0\/bcache\/cache\/congested_read_threshold_us\n# echo 600000000 &gt; \/sys\/block\/bcache0\/bcache\/sequential_cutoff\n# echo 40 &gt; \/sys\/block\/bcache0\/bcache\/writeback_percent<\/pre>\n<p>The following tables compare the performance with the default values, with these optimizations, and with the writeback rate adjustment script.<\/p>\n<p class=\"has-text-align-center\"><strong>Performance with default values<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 3.37<\/td>\n<td>read, MiB\/s: 2.67<\/td>\n<td>read, MiB\/s: 2.61<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 2.24<\/td>\n<td>written, MiB\/s: 1.78<\/td>\n<td>written, MiB\/s: 1.74<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 128.51 <\/td>\n<td> max: 102.61<\/td>\n<td>max: 142.04<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 9.22<\/td>\n<td>95th percentile: 10.84<\/td>\n<td>95th 
percentile: 11.04<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Default values (SSD SATA 100GB)<\/figcaption><\/figure>\n<p class=\"has-text-align-center\"><strong>Performance with optimizations<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 5.96 <\/td>\n<td> read, MiB\/s: 3.89 <\/td>\n<td>read, MiB\/s: 3.81<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 3.98 <\/td>\n<td>written, MiB\/s: 2.59<\/td>\n<td> written, MiB\/s: 2.54<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 131.95 <\/td>\n<td> max: 133.23<\/td>\n<td>max: 117.76<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 2.61<\/td>\n<td>95th percentile: 2.66<\/td>\n<td> 95th percentile: 2.66<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Optimization (SSD SATA 100GB)<\/figcaption><\/figure>\n<p class=\"has-text-align-center\"><strong>Performance with the writeback rate adjustment script<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 6.25<\/td>\n<td>read, MiB\/s: 4.29<\/td>\n<td>read, MiB\/s: 5.12<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 4.17<\/td>\n<td> written, MiB\/s: 2.86<\/td>\n<td>written, MiB\/s: 3.41<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 130.92<\/td>\n<td>max: 115.96<\/td>\n<td>max: 122.69<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 2.61<\/td>\n<td>95th percentile: 2.66<\/td>\n<td>95th percentile: 2.61<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Writeback rate adjustment (SSD SATA 100GB)<\/figcaption><\/figure>\n<p>For single operations on large files (with nothing else happening in the system), adjusting the writeback rate makes no noticeable difference.<\/p>\n<div style=\"height:9px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n<h2>Prepare the backing, cache and bcache 
devices<\/h2>\n<p>To create a bcache device, you need to install the bcache-tools package:<\/p>\n<pre class=\"wp-block-preformatted\"># dnf install bcache-tools<\/pre>\n<p>bcache devices are visible as <em>\/dev\/bcacheN<\/em> (for example, <em>\/dev\/bcache0<\/em>). Once created, they are managed like any other disk.<\/p>\n<p>More details are available at <a href=\"https:\/\/docs.kernel.org\/admin-guide\/bcache.html\">https:\/\/docs.kernel.org\/admin-guide\/bcache.html<\/a><\/p>\n<p class=\"has-pale-pink-background-color has-background\">CAUTION: Any of the following operations can immediately destroy the data on the partitions and disks you are operating on. Back up your data first.<\/p>\n<p>In the following example, <em>\/dev\/md0<\/em> is the backing device and <em>\/dev\/sda7<\/em> is the cache device.<\/p>\n<p><strong>WARNING:<\/strong> <em>a bcache device cannot be resized.<\/em><br \/><strong>NOTE:<\/strong> <em>bcache refuses to use partitions or disks that already contain a filesystem.<\/em><\/p>\n<p>To delete an existing filesystem signature, you can use:<\/p>\n<pre class=\"wp-block-preformatted\"># wipefs -a \/dev\/md0\n# wipefs -a \/dev\/sda7<\/pre>\n<h3>Create the backing device (and therefore the bcache device)<\/h3>\n<pre class=\"wp-block-preformatted\"># bcache make -B \/dev\/md0\nIf necessary (device status is inactive):\n# bcache register \/dev\/md0<\/pre>\n<h3>Create the cache device (and attach it to the backing device)<\/h3>\n<pre class=\"wp-block-preformatted\"># bcache make -C \/dev\/sda7\nIf necessary (device status is inactive):\n# bcache register \/dev\/sda7<\/pre>\n<pre class=\"wp-block-preformatted\"># bcache attach \/dev\/sda7 \/dev\/md0<\/pre>\n<pre class=\"wp-block-preformatted\"># bcache set-cachemode \/dev\/md0 writeback<\/pre>\n<h3>Check the status<\/h3>\n<pre class=\"wp-block-preformatted\"># bcache show<\/pre>\n<p>The output from this command includes information similar to the following:<br \/>(if the status of a device is inactive, 
it means that it must be registered)<\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td class=\"has-text-align-center\" data-align=\"center\">Name<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">Type<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">State<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">Bname<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">AttachToDev<\/td>\n<\/tr>\n<tr>\n<td class=\"has-text-align-center\" data-align=\"center\">\/dev\/md0<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">1 (data)<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">clean(running)<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">bcache0<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">\/dev\/sda7<\/td>\n<\/tr>\n<tr>\n<td class=\"has-text-align-center\" data-align=\"center\">\/dev\/sda7<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">3 (cache)<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">active<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">N\/A<\/td>\n<td class=\"has-text-align-center\" data-align=\"center\">N\/A<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>bcache show<\/figcaption><\/figure>\n<h3>Optimize<\/h3>\n<pre class=\"wp-block-preformatted\"># echo 0 &gt; \/sys\/block\/bcache0\/bcache\/cache\/congested_write_threshold_us\n# echo 0 &gt; \/sys\/block\/bcache0\/bcache\/cache\/congested_read_threshold_us\n# echo 600000000 &gt; \/sys\/block\/bcache0\/bcache\/sequential_cutoff\n# echo 40 &gt; \/sys\/block\/bcache0\/bcache\/writeback_percent<\/pre>\n<h2>In closing<\/h2>\n<p>Hopefully this article provides some insight into the benefits of bcache and whether it suits your needs.<\/p>\n<p>As always, nothing fits all cases and all people&#8217;s preferences. 
However, understanding (even roughly) how things work, and especially how they don&#8217;t work, as well as how to adapt them, makes the difference between satisfactory and unsatisfactory results.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2>Addendum<\/h2>\n<p>The following tables show the performance with an NVMe SSD cache device instead of the SATA SSD used above.<\/p>\n<p class=\"has-text-align-center\"><strong>Performance of the cache device (SSD NVME 100GB)<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 16.31<\/td>\n<td>read, MiB\/s: 16.17<\/td>\n<td>read, MiB\/s: 15.77<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 10.87<\/td>\n<td> written, MiB\/s: 10.78<\/td>\n<td>written, MiB\/s: 10.51<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 17.50<\/td>\n<td>max: 15.30<\/td>\n<td>max: 46.61<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 1.10<\/td>\n<td>95th percentile: 1.10<\/td>\n<td>95th percentile: 1.10<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Cache device (SSD NVME 100GB)<\/figcaption><\/figure>\n<p class=\"has-text-align-center\"><strong>Performance with optimizations<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 7.96<\/td>\n<td> read, MiB\/s: 6.87 <\/td>\n<td>read, MiB\/s: 7.73<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 5.31 <\/td>\n<td>written, MiB\/s: 4.58<\/td>\n<td> written, MiB\/s: 5.15<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 50.79 <\/td>\n<td> max: 84.40<\/td>\n<td>max: 108.71<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 2.00<\/td>\n<td>95th percentile: 2.03<\/td>\n<td> 95th percentile: 2.00<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Optimization (SSD NVME 100GB)<\/figcaption><\/figure>\n<p 
class=\"has-text-align-center\"><strong>Performance with the writeback rate adjustment script<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table>\n<tbody>\n<tr>\n<td>Throughput:<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>read, MiB\/s: 8.43<\/td>\n<td>read, MiB\/s: 7.52<\/td>\n<td>read, MiB\/s: 7.34<\/td>\n<\/tr>\n<tr>\n<td>written, MiB\/s: 5.62<\/td>\n<td> written, MiB\/s: 5.02<\/td>\n<td>written, MiB\/s: 4.89<\/td>\n<\/tr>\n<tr>\n<td>Latency (ms):<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>max: 72.71<\/td>\n<td>max: 78.60<\/td>\n<td>max: 50.61<\/td>\n<\/tr>\n<tr>\n<td>95th percentile: 2.00<\/td>\n<td>95th percentile: 2.03<\/td>\n<td>95th percentile: 2.11<\/td>\n<\/tr>\n<\/tbody>\n<\/table><figcaption>Writeback rate adjustment (SSD NVME 100GB)<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>bcache is a simple and good way to have large disks (typically rotary and slow) exhibit performance quite similar to an SSD disk, using a small SSD disk or a small part of an SSD. 
In general, bcache is a system for having devices composed of slow and large disks, with fast and small disks [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[48],"tags":[45,61,46,47],"class_list":["post-133167","post","type-post","status-publish","format-standard","hentry","category-fedora-os","tag-fedora","tag-fedora-project-community","tag-magazine","tag-news"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/133167","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=133167"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/133167\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=133167"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=133167"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=133167"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}