03-27-2022, 04:07 AM
Python os.walk() – A Simple Illustrated Guide
<div><p>According to the official documentation for <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-check-your-python-version/" data-type="post" data-id="1371" target="_blank">Python version</a> <strong>3.10.3</strong>, the <a rel="noreferrer noopener" href="https://docs.python.org/3/library/os.html" target="_blank"><code>os</code> module</a> provides miscellaneous operating system interfaces. It gives us access to many operating-system-dependent features, and one of them is to <strong>generate the file names in a directory tree</strong> with <strong><code>os.walk()</code></strong>.</p>
<p>If that sounds useful, read on. Python code snippets and visualizations will show you exactly how <code>os.walk</code> works.</p>
<p>In this article, I will first introduce the usage of <code>os.walk</code> and then address three common questions about it: what happens when you pass a file’s path to <code>os.walk</code>, how <code>os.walk</code> differs from <code>os.listdir</code>, and how to use <code>os.walk</code> to traverse a directory tree recursively.</p>
<h2>How to Use os.walk and the topdown Parameter?</h2>
<h3>Syntax</h3>
<p>Here is the syntax for <code>os.walk</code>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/fOktz0IKYE27u2A2pR5erisdxrh9IVnuRFquolz6k4FQSuBIV5Q9kN5mskJT8ICqATNhi59SqpJpOb629v9Nnu2yTB2wVOWUAsbT6x-tp9eCR8YPUIy2YtcqNjCerNNAj3HUgiOp" alt=""/></figure>
<h3>Input</h3>
<p>1. Must-have parameters:</p>
<ul>
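<li><strong><code>top</code></strong>: accepts the path (a string or path-like object) of the directory you want to use as the root of the walk. You can pass a file path too, but, as we will see later, <code>os.walk</code> then yields nothing.</li>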
</ul>
<p>2. Optional parameters:</p>
<ul>
<li><strong><code>topdown</code></strong>: accepts a boolean value, <code>default=True</code>. If <code>True</code> or not specified, directories are scanned top-down: the 3-tuple for a directory is generated before the 3-tuples of its subdirectories. Otherwise, directories are scanned bottom-up: the 3-tuple for a directory is generated after those of all its subdirectories. If this parameter confuses you, as it confused me when I first got to know <code>os.walk</code>, the example below includes a visualization.</li>
<li><code>onerror</code>: accepts a function, <code>default=None</code>. By default, errors raised while listing a directory are ignored. If you pass a function, it is called with one argument, an <code>OSError</code> instance; it can report the error and let the walk continue, or raise the exception to abort the walk.</li>
<li><code>followlinks</code>: accepts a boolean value, <code>default=False</code>. If <code>True</code>, we visit directories pointed to by symlinks, on systems that support them.</li>
</ul>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Tip</strong>: Generally, you only need to use the first two parameters in bold format.</p>
<h3>Output</h3>
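<p>Yields 3-tuples <code>(dirpath, dirnames, filenames)</code> for each directory in the tree rooted at <code>top</code> (including <code>top</code> itself). <code>dirpath</code> is the path to the directory as a string, while <code>dirnames</code> and <code>filenames</code> are lists of the bare names of its subdirectories and other files; use <code>os.path.join(dirpath, name)</code> to get a full path.</p>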
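<p>Here is a small self-contained sketch of what each yielded 3-tuple looks like. It builds a hypothetical mini tree (<code>a.txt</code> and <code>sub/b.txt</code>) in a temporary directory, so its paths are assumptions rather than the article’s <code>learn_os_walk</code> tree, and it records paths relative to the root to keep the output short.</p>

```python
import os
import tempfile

walked = []
with tempfile.TemporaryDirectory() as root:
    # Hypothetical mini tree: root/a.txt and root/sub/b.txt
    os.mkdir(os.path.join(root, 'sub'))
    for rel_path in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(root, rel_path), 'w') as f:
            f.write('demo\n')

    for dirpath, dirnames, filenames in os.walk(root):
        # dirpath is a path string; dirnames and filenames hold bare names
        full_paths = [os.path.join(dirpath, name) for name in filenames]
        assert all(os.path.isfile(p) for p in full_paths)
        walked.append((os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames)))

for entry in walked:
    print(entry)
# ('.', ['sub'], ['a.txt'])
# ('sub', [], ['b.txt'])
```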
<h3>Example</h3>
<p>I think the best way to understand <code>os.walk</code> is to walk through an example.</p>
<p>Our example directory tree and its labels are:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/RU-gX4CYwtmG96UTi_Ex0nA2CILb1a48aRda8sbayxWSVg0GnVrbUGFpOoJyL2NwkjgjQjRRMDk_DMW8PMLf20FBAYfq6NCi_Js-FPAA1EkwPW5442O78oGNaTvqQk6AY-if_XM-" alt=""/></figure>
</div>
<p><em>By the way, the difference between a directory and a file is that a directory can contain files (and other directories), just as directory D above contains <code>4.txt</code> and <code>5.txt</code>.</em></p>
<p>Back to our example. Our goals are to:</p>
<ul>
<li>Generate filenames based on the root directory, <code>learn_os_walk</code></li>
<li>Understand the difference between <code>topdown=True</code> and <code>topdown=False</code></li>
</ul>
<p>To use the <code>os.walk()</code> function, we first need to import the <code>os</code> module:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/8eEm16wd6EFfpJPZ9WAF_uF_wzfFTtbmDhpZn2MQVLg8lnN8K9wPQirXIK0qrjKmEULn9TRj6seW90skeLmTZb-isx9YsfItdPng5C2n05Xv5bWmJtSWuSecU6MMjyfUQun2BGnu" alt=""/></figure>
<p>Then we can pass the input parameters to <code>os.walk</code> and generate the filenames. The code snippet is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'


def take_a_walk(fp, topdown_flag=True):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a directory path
take_a_walk(a_directory_path)
# Outputs more than just 'What a walk!':
# also all the subdirnames and subfilenames at each level of the tree.
# BTW, if you want to look through all files in a directory, you can add
# another `for subfilename in subfilenames` loop inside.
</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/k3nva1QZzCfunGoLdjSuye6EqvOwv8tZ_W5lLrxEHMegRgTzPVyOmDuRg1L9dLUCf2AFoqc46WRg-sDuNUPnU6OXOAVqncRL4wgGJG00qgOqch6cr7ptnxsgZAsctJsljeoErm_c" alt=""/></figure>
<p>The code above defines a function, <code>take_a_walk</code>, that calls <code>os.walk</code> inside a <a href="https://blog.finxter.com/python-loops/" data-type="post" data-id="4596" target="_blank" rel="noreferrer noopener">for loop</a>. This is the most common way to use <code>os.walk</code>: it lets you visit every level of the directory tree below the root, one directory at a time, together with the names of its subdirectories and files.</p>
<p>For those with advanced knowledge of Python’s <a href="https://blog.finxter.com/understanding-generators-in-python/" data-type="post" data-id="33873" target="_blank" rel="noreferrer noopener">generator</a>s, you have probably already figured out that <code>os.walk</code> actually gives you a generator that will <a href="https://blog.finxter.com/yield-keyword-in-python-a-simple-illustrated-guide/" data-type="post" data-id="14682" target="_blank" rel="noreferrer noopener">yield</a> one 3-tuple after another.</p>
<p>Back to the code: here we pass <code>True</code> for the <code>topdown</code> argument. Visually, the top-down traversal follows the orange arrow in the picture below:</p>
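<p>A minimal sketch of that generator behavior, assuming a hypothetical nested tree <code>A/B</code> created in a temporary directory: instead of a <code>for</code> loop, it calls <code>next()</code> by hand to pull one 3-tuple at a time.</p>

```python
import os
import tempfile
import types

with tempfile.TemporaryDirectory() as root:
    # Hypothetical nested directories: root/A/B
    os.makedirs(os.path.join(root, 'A', 'B'))

    walker = os.walk(root)
    is_generator = isinstance(walker, types.GeneratorType)

    # Each next() call produces the next (dirpath, dirnames, filenames) tuple
    first = next(walker)     # root itself comes first (topdown=True)
    second = next(walker)    # then root/A
    third = next(walker)     # then root/A/B
    leftover = list(walker)  # the generator is now exhausted

print(is_generator)                   # True
print(first[1], second[1], third[1])  # ['A'] ['B'] []
print(leftover)                       # []
```

<p>Because the tuples are produced on demand, you can stop early (for example with <code>break</code>) without scanning the rest of the tree.</p>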
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh3.googleusercontent.com/FUigmU9wamX-jo4ONI50A_6-licIWn7CMrmJ4X60kp75q_XfqucUm0o2kD8XbKBkKiJH-5APk03_rjhmMx8sC-6JBiNrUUdF-g_UWCZHPCZvqa4MteUauBIW_2cozcC1TyIutfK_" alt=""/></figure>
</div>
<p>If we run the code above, we get the following result:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/Nw-hjhSoa1DUojcYmZ8Z7fBVB5kSpZ_5j3UCUbAx7OvO20MooQrtWJSn9F3A2KCMow-DtmhRgrySMYW3JW0ePhZGU_xToQoNiGNMQrcXyYr-8_uL-fTf8ZKSnPs-SZkF-PJ1yIC-" alt="" width="251" height="341"/></figure>
</div>
<p>If we set <code>topdown</code> to <code>False</code>, we walk the directory tree bottom-up, starting from its bottom directory D, like this:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh5.googleusercontent.com/6npy_jaH9M5LFze668IIkmpGvw8KnloD9Z0y6BFrIuk9mq9tzKLQu_HpPCv8ObQsfr8GEwcGoXovruJ-3yV4Bd1XwjLyJagTyIV4ApTQah9spOwuoLxDCZ7oNdoDXgrpJ9wYC63I" alt=""/></figure>
</div>
<p>The corresponding code snippet is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">a_directory_path = './learn_os_walk' def take_a_walk(fp, topdown_flag=False): print(f'\ntopdown_flag:{topdown_flag}\n') for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag): print(pathname) print(subdirnames) print(subfilenames) print('--------------------------------') print('What a walk!') # *Try to walk in a directory path
take_a_walk(a_directory_path)
# Output more than Just 'What a walk!'
# Also all the subdirnames and subfilenames in each file tree level.
# BTW if you want to look through all files in a directory, you can add
# another for subfilename in subfilenames loop inside.
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh3.googleusercontent.com/UX_xHkW0YGkMhQugMEUd84GcqOCF9G-egBHx34R4lYLSXZ30VxcW8YgiWbpYafXfO7JsE1TaWskGPulzShfH3kyLYmnCAI4auYGulY8perE_1LJmhTP4byBj24n6OSTiBGYEyxAL" alt=""/></figure>
</div>
<p>If we run the code above, we get the following result:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/Wl58daoFZum7nTYG0iygDw6m8Hy4anHBD63i3vw2I4hH2SSkiBAm80U1qzNVWyPPCBxhxtYlGGw0aXHwRxANh_W02inT3Rky9E7WvkNtEuVDcWTBwW5Viyo2g_zXWf2klPgsmfYK" alt="" width="257" height="349"/></figure>
</div>
<p>Now, I hope you understand how to use <code>os.walk</code> and the difference between <code>topdown=True</code> and <code>topdown=False</code>. <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
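<p>If you would rather check the two orders in code than in pictures, the sketch below records the order in which directories are visited. It assumes a hypothetical tree <code>A/B</code> built in a temporary directory; <code>visit_order</code> is an illustrative helper, not part of the <code>os</code> module.</p>

```python
import os
import tempfile


def visit_order(top, topdown):
    """Return directory paths relative to top, in the order os.walk yields them."""
    return [os.path.relpath(dirpath, top)
            for dirpath, _dirnames, _filenames in os.walk(top, topdown=topdown)]


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'A', 'B'))  # hypothetical tree: root/A/B
    top_down = visit_order(root, topdown=True)
    bottom_up = visit_order(root, topdown=False)

print(top_down)   # each parent appears before its subdirectories
print(bottom_up)  # each parent appears after its subdirectories
```

<p>With <code>topdown=True</code>, <code>top_down</code> starts with <code>'.'</code> (the root); with <code>topdown=False</code>, <code>bottom_up</code> ends with it. Bottom-up order is handy when children must be processed before their parent, for example when deleting a tree.</p>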
<p>Here’s the full code for this example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">__author__ = 'Anqi Wu' import os a_directory_path = './learn_os_walk'
a_file_path = './learn_os_walk.py' # same as a_file_path = __file__ def take_a_walk(fp, topdown_flag=True): print(f'\ntopdown_flag:{topdown_flag}\n') for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag): print(pathname) print(subdirnames) print(subfilenames) print('--------------------------------') print('What a walk!') # *Try to walk in a file path
take_a_walk(a_file_path)
# Output Just 'What a walk!'
# Because there are neither subdirnames nor subfilenames in a single file !
# It is like:
# for i in []:
# print('hi!') # We are not going to execute this line. # *Try to walk in a directory path
take_a_walk(a_directory_path)
# Output more than Just 'What a walk!'
# Also all the subdirnames and subfilenames in each file tree level.
# BTW if you want to look through all files in a directory, you can add
# another for subfilename in subfilenames loop inside. # *Try to list all files and directories in a directory path
print('\n')
print(os.listdir(a_directory_path))
print('\n')</pre>
<h2>How to Pass a File’s filepath to os.walk?</h2>
<p>Of course, you might wonder what happens if we pass a file’s path, for example a Python module path such as <code>'./learn_os_walk.py'</code>, to the <code>os.walk</code> function.</p>
<p>This is exactly what I wondered when I started using this function. The short answer is that <strong>the code inside your for loop will never run</strong>.</p>
<p>For example, suppose you run the following code in our <code>learn_os_walk.py</code>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_file_path = './learn_os_walk.py'  # this script itself, similar to __file__


def take_a_walk(fp, topdown_flag=False):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a file path
take_a_walk(a_file_path)
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/vMK_nlz-ns8zdF5n5OYSk-8l1gPUmYfLx1xQIa40eNtcuIsMkzGFJLmw4mfLIBxUkpQdjWtqgfap258Oz2AScJFo2oGzXKW-EyvrKyFknZhs2lf3uw-NzB0aMCnt2xIs_hTQQYeM" alt=""/></figure>
</div>
<p>The only output is:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/WnJPLIJ97Z4SuO1W9z7BVkj0MsMc9G3z9JqRGgbSZ_v5pnCK-4IOH6Hw_xhoiPNbuuUihrf7yhOFqTwOa1bSsRoFPZ_JH9-on-3AP3G6EspIfMHhw1kkC_whqnif9HTLpA4aaer5" alt="" width="285" height="62"/></figure>
</div>
<p>Why is that?</p>
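<p><strong>Because there are neither subdirnames nor subfilenames in a single file</strong>! <code>os.walk</code> only yields 3-tuples for directories. When <code>top</code> is a file, listing it fails with an <code>OSError</code> that is silently ignored unless you pass <code>onerror</code>, so the generator yields nothing at all. It is like writing the code below:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">for i in []:
    print('hi!')</pre>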
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh6.googleusercontent.com/cYsMXquGBpvFT-sjlCmlM2l2krJlae7R-tKnuvzCjfqxDPppk5hYViyNPlWvgXOHzLVVUON2M84lcFN7-6s-MQWLzfURiZas2yuwOkbRRox3g7YwET3OUAgfsxrSOwBdn7TAuEZB" alt=""/></figure>
</div>
<p>You will not get any <code>'hi!'</code> output, because an <a href="https://blog.finxter.com/how-to-check-if-a-python-list-is-empty/" data-type="post" data-id="9090" target="_blank" rel="noreferrer noopener">empty list</a> has no elements to iterate over.</p>
<p>Now I hope you understand why the official documentation tells us to pass the path of a directory rather than a file <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
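<p>Here is a short sketch that makes this behavior visible. It assumes a throwaway file named <code>learn_os_walk.py</code> in a temporary directory, uses <code>onerror</code> to surface the error that <code>os.walk</code> otherwise swallows, and wraps the call in a hypothetical guard, <code>walk_dir_only</code>, that checks <code>os.path.isdir</code> first.</p>

```python
import os
import tempfile


def walk_dir_only(top, **kwargs):
    """Hypothetical helper: refuse non-directories instead of yielding nothing."""
    if not os.path.isdir(top):
        raise NotADirectoryError(f'os.walk expects a directory, got {top!r}')
    return os.walk(top, **kwargs)


with tempfile.TemporaryDirectory() as root:
    file_path = os.path.join(root, 'learn_os_walk.py')
    with open(file_path, 'w') as f:
        f.write('# demo module\n')

    # Plain os.walk on a file: no tuples and no exception
    silent = list(os.walk(file_path))

    # With onerror, the hidden error becomes visible
    errors = []
    list(os.walk(file_path, onerror=errors.append))

    # The guard turns the silent case into an explicit error
    try:
        walk_dir_only(file_path)
        raised = False
    except NotADirectoryError:
        raised = True

print(silent, len(errors), raised)  # [] 1 True
```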
<h2>os.walk vs os.listdir — When to Use Each?</h2>
<p>A common question among programmers is how <code>os.walk</code> differs from <code>os.listdir</code>.</p>
<p>The simple answer is:</p>
<p class="has-global-color-8-background-color has-background">The <code>os.listdir()</code> method returns a <a href="https://blog.finxter.com/python-lists/" data-type="post" data-id="7332" target="_blank" rel="noreferrer noopener">list</a> of every file and folder in a directory. The <code>os.walk()</code> method returns a list of every file in an entire file tree.</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh6.googleusercontent.com/BC9Egy0ipAEJQuQdwZuGUTcwK7PcDp8jap_ozfpq6EL60-O2sfdB2HELJxvbzh-wBSD-cJK0QL738jlN8Ap2ZEEdgjGt4jxPsKa3J7aHoT-VsgMgpSO49P14XSCG8pEvxwpdiThT" alt=""/></figure>
</div>
<p>If that still feels a bit abstract, a code example will help.</p>
<p>We will stick with the same example directory tree:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/RU-gX4CYwtmG96UTi_Ex0nA2CILb1a48aRda8sbayxWSVg0GnVrbUGFpOoJyL2NwkjgjQjRRMDk_DMW8PMLf20FBAYfq6NCi_Js-FPAA1EkwPW5442O78oGNaTvqQk6AY-if_XM-" alt=""/></figure>
</div>
<p>Suppose we call <code>os.listdir()</code> and pass it the path of the <code>learn_os_walk</code> directory, as in the code below:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'

# *Try to list all files and directories in a directory path
print('\n')
print(os.listdir(a_directory_path))
print('\n')
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh3.googleusercontent.com/apX569Vd-f2mJtiwLkn4KZ_5o5lbvbYgr-NsFD55QMDcheCSGP-T4kH89NP-AZzM1BXZ9-6YjvCeaicx_SD2Ro3x_OqkpdlTEWrDqw9yxSXy3cpSZZwrGrBHsOcrT_l96_b6OoXZ" alt=""/></figure>
</div>
<p>We get this output:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/PVpb-AJH8hABEYiEQZSDPks7Iy51KA5X9PGsV6Ix7twt1EL6SGyp9-6mKOC1qqahWNB0s7aZPK_ashhpX4gL9C8i6g2DpjkAwhTlbNSqmuOwTLGyWgxgR1TuitJvJ4d4IBoj1mww" alt="" width="251" height="50"/></figure>
</div>
<p>That’s it! Only the first level of the directory tree is included. In other words, <code>os.listdir()</code> only looks at what is directly inside the given directory, instead of searching through the entire directory tree as <code>os.walk</code> did in the earlier example.</p>
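<p>The sketch below puts the two side by side on a hypothetical mini tree (<code>a.txt</code> plus <code>sub/b.txt</code>) in a temporary directory. It uses only the standard library and shows that <code>os.listdir</code> covers the same entries as the first 3-tuple that <code>os.walk</code> yields, while the full walk also reaches <code>sub/b.txt</code>.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: root/a.txt and root/sub/b.txt
    os.mkdir(os.path.join(root, 'sub'))
    for rel_path in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(root, rel_path), 'w') as f:
            f.write('demo\n')

    # os.listdir: names directly inside root, files and directories mixed
    top_level = sorted(os.listdir(root))

    # The first triple from os.walk covers the same level, split by type
    _, dirnames, filenames = next(os.walk(root))

    # Walking the whole tree also reaches sub/b.txt
    all_files = sorted(os.path.relpath(os.path.join(dp, name), root)
                       for dp, _, names in os.walk(root) for name in names)

print(top_level)                            # ['a.txt', 'sub']
print(sorted(dirnames), sorted(filenames))  # ['sub'] ['a.txt']
print(all_files)
```

<p><code>os.listdir</code> returns names in arbitrary order, which is why the sketch sorts them before comparing.</p>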
<h3>Summary</h3>
<p class="has-global-color-8-background-color has-background"><strong>Summary</strong>: If you want to get a list of all filenames and directory names within a root directory, go with the <code>os.listdir()</code> method. If you want to iterate over an entire directory tree, you should consider <code>os.walk()</code> method.</p>
<p>Now, I hope you understand when to use <code>os.listdir</code> and when to use <code>os.walk</code> <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h2>os.walk() Recursive — How to traverse a Directory Tree?</h2>
<p>Our last question is how to actually <a href="https://blog.finxter.com/iterators-iterables-and-itertools/" data-type="post" data-id="29507" target="_blank" rel="noreferrer noopener">iterate</a> over the entire directory tree with <code>os.walk</code>.</p>
<p>Concretely, we have some small goals for our next example:</p>
<ul>
<li>Iterate over all files within a directory tree</li>
<li>Iterate over all directories within a directory tree</li>
</ul>
<p>All examples below are still based on our old friend, the example directory tree:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/RU-gX4CYwtmG96UTi_Ex0nA2CILb1a48aRda8sbayxWSVg0GnVrbUGFpOoJyL2NwkjgjQjRRMDk_DMW8PMLf20FBAYfq6NCi_Js-FPAA1EkwPW5442O78oGNaTvqQk6AY-if_XM-" alt=""/></figure>
</div>
<h3>Iterate over all files within a directory tree</h3>
<p>First, let’s iterate over all files within a directory tree. We can do this with a <a href="https://blog.finxter.com/how-to-write-a-nested-for-loop-in-one-line-python/" data-type="post" data-id="11859" target="_blank" rel="noreferrer noopener">nested <code>for</code> loop</a> in Python.</p>
<p>One possible application is running sanity checks on, or counting, all files within one folder. How about counting the <code>.txt</code> files in a folder and all of its subfolders? Let’s do it!</p>
<p>The code for this application is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'
total_file = 0

for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
    # Inner loop: visit every file name in the current directory
    for subfilename in subfilenames:
        if subfilename.endswith('.txt'):
            total_file += 1

print(f'\n{total_file}\n')
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/lqr4wjEItFHP4H9oot4cePf3kEk747skshJupXlG2A-MJnX9HxAtSs56m-n5dgtoZvggQh8KKIAFF5Uj8OOssJbxIHN_PSMklx4hV9GxvWKOrrBGiOJBS7dDKZUl5hanLJCEvzgY" alt=""/></figure>
</div>
<p>As you can see, we use another <code>for</code> loop to iterate over <code>subfilenames</code> to get every file within the directory tree. The output is <code>7</code>, which is correct for our example directory tree.</p>
<p>The full code for this example can be found <a href="https://github.com/anqiwoo/InterestingPythonPuzzles/blob/master/learn_os_walk_count_files.py" target="_blank" rel="noreferrer noopener">here</a>.</p>
<h3>Iterate over all directories within a directory tree</h3>
<p>Finally, we can also iterate over all directories within a directory tree, again with a nested for loop.</p>
<p>Another possible application is running sanity checks on, or counting, all directories within one folder. For our example, let’s check whether every subdirectory contains an <code><a href="https://blog.finxter.com/python-init/" data-type="post" data-id="5133" target="_blank" rel="noreferrer noopener">__init__.py</a></code> file, and add an empty <code>__init__.py</code> file if it doesn’t.</p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Idea</strong>: An <code>__init__.py</code> file marks a directory as a regular Python package. (Since Python 3.3, a directory without one can still be imported as a namespace package.)</p>
<p>The code for this application is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os a_directory_path = './learn_os_walk' for pathname, subdirnames, subfilenames in os.walk(a_directory_path): for subdirname in subdirnames: init_filepath = os.path.join(pathname, subdirname, '__init__.py') if not os.path.exists(init_filepath): print(f'Create a new empty [{init_filepath}] file.') with open(init_filepath, 'w') as f: pass
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh6.googleusercontent.com/OciDbIUlpkWjiu56KMYYf4uOZJ1wQxlidMyMix3oZx4zRHan0Nw4DJvVWXrJNJnq8vWMraLvoyGiB1zuiGZA9mVNipsJz6ciOUgV3I415p2TehqEnieNxLzk4nYxIrrcU1fa-qIr" alt=""/></figure>
</div>
<p>As you can see, we use another <code>for</code> loop to iterate over <code>subdirnames</code> to get every directory within the directory tree.</p>
<p>Before running this code, the output of the <code>take_a_walk</code> function from earlier looks like this:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/Nw-hjhSoa1DUojcYmZ8Z7fBVB5kSpZ_5j3UCUbAx7OvO20MooQrtWJSn9F3A2KCMow-DtmhRgrySMYW3JW0ePhZGU_xToQoNiGNMQrcXyYr-8_uL-fTf8ZKSnPs-SZkF-PJ1yIC-" alt="" width="305" height="414"/></figure>
</div>
<p>After running it, we can take a walk along the directory tree again, and we get this result:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/NXiGZRZyez7cSSCdeNu_XiITJln-vv8OuxI-o3aE8BEoWSeTk6XzW_Ebdz-jsL7YJ5LcLwwabF1TDceDaErItXB1xiccN3NSkFz4A3CtD_vZ7A48mv20FKoVnBwhx8eqc5WmHSCz" alt="" width="261" height="359"/></figure>
</div>
<p>Hooray! We successfully iterated over every directory in the tree and completed the <code>__init__.py</code> sanity check.</p>
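<p>If you would rather inspect the tree before changing anything, a dry-run variant can collect the offending directories instead of creating files. The helper <code>find_missing_init</code> and the <code>pkg/sub</code> layout below are hypothetical; like the code above, it only checks subdirectories of the root, not the root itself.</p>

```python
import os
import tempfile


def find_missing_init(top):
    """Return subdirectories of `top` (recursively) that lack an __init__.py.

    Dry-run variant: nothing is created or modified.
    """
    missing = []
    for dirpath, dirnames, _filenames in os.walk(top):
        for dirname in dirnames:
            subdir = os.path.join(dirpath, dirname)
            if not os.path.exists(os.path.join(subdir, '__init__.py')):
                missing.append(os.path.relpath(subdir, top))
    return sorted(missing)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical package tree: pkg/ has __init__.py, pkg/sub/ does not
    os.makedirs(os.path.join(root, 'pkg', 'sub'))
    open(os.path.join(root, 'pkg', '__init__.py'), 'w').close()
    result = find_missing_init(root)

print(result)
```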
<p>The full code for this example can be found <a href="https://github.com/anqiwoo/InterestingPythonPuzzles/blob/master/learn_os_walk_init_check.py" target="_blank" rel="noreferrer noopener">here</a>.</p>
<p>In summary, you can use <code>os.walk</code> to recursively traverse every file or directory within a directory tree with a nested for loop.</p>
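<p>Because <code>os.walk</code> already handles the recursion, you never need to call it recursively yourself. With <code>topdown=True</code>, the official documentation also lets you modify <code>dirnames</code> in place to stop the walk from descending into some directories. A sketch, assuming you want to skip hypothetical <code>.git</code> and <code>__pycache__</code> folders (<code>walk_pruned</code> and <code>SKIP</code> are illustrative names):</p>

```python
import os
import tempfile

SKIP = {'.git', '__pycache__'}  # hypothetical names to skip


def walk_pruned(top, skip=SKIP):
    """Yield relative dirpaths, never descending into directories named in `skip`."""
    for dirpath, dirnames, _filenames in os.walk(top):  # topdown=True (default)
        # Slice assignment edits the list os.walk is using, so the pruning sticks
        dirnames[:] = [d for d in dirnames if d not in skip]
        yield os.path.relpath(dirpath, top)


with tempfile.TemporaryDirectory() as root:
    for rel in ('src', os.path.join('src', '__pycache__'), '.git'):
        os.makedirs(os.path.join(root, rel), exist_ok=True)
    visited = sorted(walk_pruned(root))

print(visited)  # ['.', 'src']
```

<p>The slice assignment <code>dirnames[:] = ...</code> is essential: rebinding the name with <code>dirnames = ...</code> would leave the list that <code>os.walk</code> uses untouched. With <code>topdown=False</code>, pruning has no effect, because the subdirectories have already been visited by the time their parent is yielded.</p>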
<h2>Conclusion</h2>
<p>That’s it for our <code>os.walk()</code> article!</p>
<p>We learned about its syntax, its inputs and outputs, and the difference between <code>os.walk</code> and <code>os.listdir</code>.</p>
<p>We also worked through practical examples: changing the traversal direction with the <code>topdown</code> parameter, counting <code>.txt</code> files, and an <code>__init__.py</code> sanity check.</p>
<p>I hope you enjoyed all this, and happy coding!</p>
<hr class="wp-block-separator"/>
<h2>About the Author</h2>
<p>Anqi Wu is an aspiring Data Scientist and self-employed Technical Consultant. She is an incoming student for a Master’s program in Data Science and builds her technical consultant profile on Upwork.</p>
<p>Anqi is passionate about machine learning, statistics, data mining, programming, and many other data-science-related fields. During her undergraduate years, she proved her expertise with multiple wins and top placements in mathematical modeling contests. She loves supporting and enabling data-driven decision-making, developing data services, and teaching.</p>
<p>Here is a link to the author’s personal website: <a href="https://www.anqiwu.one/">https://www.anqiwu.one/</a>. She uploads data science blogs weekly there to document her data science learning and practicing for the past week, along with some best learning resources and inspirational thoughts.</p>
<p>Hope you enjoy this article! Cheers!</p>
</div>
https://www.sickgaming.net/blog/2022/03/...ted-guide/
<div><p>According to the <a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-check-your-python-version/" data-type="post" data-id="1371" target="_blank">Python version</a> <strong>3.10.3</strong> official doc, the <a rel="noreferrer noopener" href="https://docs.python.org/3/library/os.html" target="_blank"><code>os</code> module</a> provides built-in miscellaneous operating system interfaces. We can achieve many operating system dependent functionalities through it. One of the functionalities is to <strong>generate the file names in a directory tree</strong> through <strong><code>os.walk()</code></strong>.</p>
<p>If it sounds great to you, please continue reading, and you will fully understand os.walk through Python code snippets and vivid visualization.</p>
<p>In this article, I will first introduce the usage of <code>os.walk</code> and then address three top questions about <code>os.walk</code>, including passing a file’s filepath to <code>os.walk</code>, <code>os.walk</code> vs. <code>os.listdir</code>, and <code>os.walk</code> recursive.</p>
<h2>How to Use os.walk and the topdown Parameter?</h2>
<h3>Syntax</h3>
<p>Here is the syntax for <code>os.walk</code>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/fOktz0IKYE27u2A2pR5erisdxrh9IVnuRFquolz6k4FQSuBIV5Q9kN5mskJT8ICqATNhi59SqpJpOb629v9Nnu2yTB2wVOWUAsbT6x-tp9eCR8YPUIy2YtcqNjCerNNAj3HUgiOp" alt=""/></figure>
<h3>Input</h3>
<p>1. Must-have parameters:</p>
<ul>
<li><strong><code>top</code></strong>: accepts a directory(or file) path string that you want to use as the root to generate filenames.</li>
</ul>
<p>2. Optional parameters:</p>
<ul>
<li><strong><code>topdown</code></strong>: accepts a boolean value, <code>default=True</code>. If <code>True</code> or not specified, directories are scanned from top-down. Otherwise, directories are scanned from the bottom-up. If you are still confused about this <code>topdown</code> parameter like I first get to know <code>os.walk</code>, I have a nicely visualization in the example below.</li>
<li><code>onerror</code>: accepts a function with one argument, <code>default=None</code>. It can report the error to continue with the walk, or raise the exception to abort the walk.</li>
<li><code>followlinks</code>: accepts a boolean value, <code>default=False</code>. If <code>True</code>, we visit directories pointed to by symlinks, on systems that support them.</li>
</ul>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Tip</strong>: Generally, you only need to use the first two parameters in bold format.</p>
<h3>Output</h3>
<p>Yields 3-tuples (dirpath, dirnames, filenames) for each directory in the tree rooted at directory top (including top itself).</p>
<h3>Example</h3>
<p>I think the best way to comprehend <code>os.walk</code> is walking through an example.</p>
<p>Our example directory tree and its labels are:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/RU-gX4CYwtmG96UTi_Ex0nA2CILb1a48aRda8sbayxWSVg0GnVrbUGFpOoJyL2NwkjgjQjRRMDk_DMW8PMLf20FBAYfq6NCi_Js-FPAA1EkwPW5442O78oGNaTvqQk6AY-if_XM-" alt=""/></figure>
</div>
<p><em>By the way, the difference between a directory and a file is that a directory can contains many files like the above directory D contains <code>4.txt</code> and <code>5.txt</code>.</em></p>
<p>Back to our example, our goal is to </p>
<ul>
<li>Generate filenames based on the root directory, <code>learn_os_walk</code></li>
<li>Understand the difference between <code>topdown=True</code> and <code>topdown=False</code></li>
</ul>
<p>To use the <code>os.walk()</code> method, we need to first import <code>os</code> module:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/8eEm16wd6EFfpJPZ9WAF_uF_wzfFTtbmDhpZn2MQVLg8lnN8K9wPQirXIK0qrjKmEULn9TRj6seW90skeLmTZb-isx9YsfItdPng5C2n05Xv5bWmJtSWuSecU6MMjyfUQun2BGnu" alt=""/></figure>
<p>Then we can pass the input parameters to the <code>os.walk</code> and generate filenames. The code snippet is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">a_directory_path = './learn_os_walk' def take_a_walk(fp, topdown_flag=True): print(f'\ntopdown_flag:{topdown_flag}\n') for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag): print(pathname) print(subdirnames) print(subfilenames) print('--------------------------------') print('What a walk!') # *Try to walk in a directory path
take_a_walk(a_directory_path)
# Output more than Just 'What a walk!'
# Also all the subdirnames and subfilenames in each file tree level.
# BTW if you want to look through all files in a directory, you can add
# another for subfilename in subfilenames loop inside.
</pre>
<figure class="wp-block-image"><img src="https://lh6.googleusercontent.com/k3nva1QZzCfunGoLdjSuye6EqvOwv8tZ_W5lLrxEHMegRgTzPVyOmDuRg1L9dLUCf2AFoqc46WRg-sDuNUPnU6OXOAVqncRL4wgGJG00qgOqch6cr7ptnxsgZAsctJsljeoErm_c" alt=""/></figure>
<p>The above code has a function <code>take_a_walk</code> to use <code>os.walk</code> along with a <a href="https://blog.finxter.com/python-loops/" data-type="post" data-id="4596" target="_blank" rel="noreferrer noopener">for loop</a>. This is the most often usage of <code>os.walk</code> so that you can get every file level and filenames from the root directory iteratively. </p>
<p>For those with advanced knowledge in Python’s <a href="https://blog.finxter.com/understanding-generators-in-python/" data-type="post" data-id="33873" target="_blank" rel="noreferrer noopener">generator</a>, you would probably have already figured out that <code>os.walk</code> actually gives you a generator to <a href="https://blog.finxter.com/yield-keyword-in-python-a-simple-illustrated-guide/" data-type="post" data-id="14682" target="_blank" rel="noreferrer noopener">yield</a> next and next and next 3-tuple……</p>
<p>Back in this code, we set a <code>True</code> flag for the <code>topdown</code> argument. Visually, the topdown search way is like the orange arrow in the picture below:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh3.googleusercontent.com/FUigmU9wamX-jo4ONI50A_6-licIWn7CMrmJ4X60kp75q_XfqucUm0o2kD8XbKBkKiJH-5APk03_rjhmMx8sC-6JBiNrUUdF-g_UWCZHPCZvqa4MteUauBIW_2cozcC1TyIutfK_" alt=""/></figure>
</div>
<p>If we run the above code, we get the following result:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/Nw-hjhSoa1DUojcYmZ8Z7fBVB5kSpZ_5j3UCUbAx7OvO20MooQrtWJSn9F3A2KCMow-DtmhRgrySMYW3JW0ePhZGU_xToQoNiGNMQrcXyYr-8_uL-fTf8ZKSnPs-SZkF-PJ1yIC-" alt="" width="251" height="341"/></figure>
</div>
<p>If we set <code>topdown</code> to <code>False</code>, we walk the directory tree bottom-up, starting from its bottom directory D, like this:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh5.googleusercontent.com/6npy_jaH9M5LFze668IIkmpGvw8KnloD9Z0y6BFrIuk9mq9tzKLQu_HpPCv8ObQsfr8GEwcGoXovruJ-3yV4Bd1XwjLyJagTyIV4ApTQah9spOwuoLxDCZ7oNdoDXgrpJ9wYC63I" alt=""/></figure>
</div>
<p>The corresponding code snippet is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'


def take_a_walk(fp, topdown_flag=False):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    # With topdown=False, subdirectories are yielded before their parent
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a directory path
take_a_walk(a_directory_path)
# Outputs more than just 'What a walk!':
# also all the subdirnames and subfilenames at each level of the file tree.
# BTW, if you want to look through every file in a directory, you can add
# another `for subfilename in subfilenames` loop inside.
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh3.googleusercontent.com/UX_xHkW0YGkMhQugMEUd84GcqOCF9G-egBHx34R4lYLSXZ30VxcW8YgiWbpYafXfO7JsE1TaWskGPulzShfH3kyLYmnCAI4auYGulY8perE_1LJmhTP4byBj24n6OSTiBGYEyxAL" alt=""/></figure>
</div>
<p>If we run the above code, we get the following result:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/Wl58daoFZum7nTYG0iygDw6m8Hy4anHBD63i3vw2I4hH2SSkiBAm80U1qzNVWyPPCBxhxtYlGGw0aXHwRxANh_W02inT3Rky9E7WvkNtEuVDcWTBwW5Viyo2g_zXWf2klPgsmfYK" alt="" width="257" height="349"/></figure>
</div>
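<p>The same difference can also be checked without screenshots. Below is a minimal sketch under one assumption: a throwaway tree that is a single chain of folders (root, then <code>A</code>, then <code>A/B</code>), so the visiting order is fully determined. The helper name <code>visit_order</code> is hypothetical:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import tempfile


def visit_order(root, topdown):
    """Hypothetical helper: directory paths relative to root, in walk order."""
    return [os.path.relpath(dirpath, root)
            for dirpath, _dirs, _files in os.walk(root, topdown=topdown)]


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'A', 'B'))  # a single chain of folders
    top_down = visit_order(root, topdown=True)
    bottom_up = visit_order(root, topdown=False)

print(top_down)   # ['.', 'A', 'A/B'] on POSIX
print(bottom_up)  # ['A/B', 'A', '.'] on POSIX
</pre>
<p>With this simple chain, the bottom-up order is exactly the reverse of the top-down order; in a tree with several sibling folders, the order among siblings follows whatever order the file system lists them in.</p>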
<p>Now, I hope you understand how to use <code>os.walk</code> and the difference between <code>topdown=True</code> and <code>topdown=False</code>. <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>Here’s the full code for this example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">__author__ = 'Anqi Wu'

import os

a_directory_path = './learn_os_walk'
a_file_path = './learn_os_walk.py'  # same as a_file_path = __file__


def take_a_walk(fp, topdown_flag=True):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a file path
take_a_walk(a_file_path)
# Outputs just 'What a walk!'
# because os.walk yields no 3-tuples at all for a single file!
# It is like:
# for i in []:
#     print('hi!')  # We are not going to execute this line.

# *Try to walk in a directory path
take_a_walk(a_directory_path)
# Outputs more than just 'What a walk!':
# also all the subdirnames and subfilenames at each level of the file tree.
# BTW, if you want to look through every file in a directory, you can add
# another `for subfilename in subfilenames` loop inside.

# *Try to list all files and directories in a directory path
print('\n')
print(os.listdir(a_directory_path))
print('\n')</pre>
<h2>How to Pass a File’s filepath to os.walk?</h2>
<p>Of course, you might wonder what happens if we pass a file’s path, such as a Python module path string like <code>'./learn_os_walk.py'</code>, to the <code>os.walk</code> function.</p>
<p>This is exactly what I wondered when I started using this function. The short answer is that <strong>the code under your for loop will not execute</strong>.</p>
<p>For example, suppose you run code like this in our <code>learn_os_walk.py</code>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_file_path = './learn_os_walk.py'  # same as a_file_path = __file__


def take_a_walk(fp, topdown_flag=False):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a file path
take_a_walk(a_file_path)
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/vMK_nlz-ns8zdF5n5OYSk-8l1gPUmYfLx1xQIa40eNtcuIsMkzGFJLmw4mfLIBxUkpQdjWtqgfap258Oz2AScJFo2oGzXKW-EyvrKyFknZhs2lf3uw-NzB0aMCnt2xIs_hTQQYeM" alt=""/></figure>
</div>
<p>The only output looks like this:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/WnJPLIJ97Z4SuO1W9z7BVkj0MsMc9G3z9JqRGgbSZ_v5pnCK-4IOH6Hw_xhoiPNbuuUihrf7yhOFqTwOa1bSsRoFPZ_JH9-on-3AP3G6EspIfMHhw1kkC_whqnif9HTLpA4aaer5" alt="" width="285" height="62"/></figure>
</div>
<p>Why is that?</p>
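<p><strong>Because a single file has no subdirnames or subfilenames, <code>os.walk</code> yields no 3-tuples at all</strong>! Under the hood, listing the path fails, and <code>os.walk</code> silently ignores that error unless you pass an <code>onerror</code> callback. It is as if you had written the code below:</p>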
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">for i in []:
    print('hi!')</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh6.googleusercontent.com/cYsMXquGBpvFT-sjlCmlM2l2krJlae7R-tKnuvzCjfqxDPppk5hYViyNPlWvgXOHzLVVUON2M84lcFN7-6s-MQWLzfURiZas2yuwOkbRRox3g7YwET3OUAgfsxrSOwBdn7TAuEZB" alt=""/></figure>
</div>
<p>You will not get any <code>'hi!'</code> output, because an <a href="https://blog.finxter.com/how-to-check-if-a-python-list-is-empty/" data-type="post" data-id="9090" target="_blank" rel="noreferrer noopener">empty list</a> has no elements to iterate over.</p>
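<p>If you want to confirm that the walk over a file path fails silently, rather than yielding empty lists, the <code>onerror</code> parameter from the syntax above makes the hidden error visible. This is a minimal sketch, assuming a temporary <code>.py</code> file stands in for <code>./learn_os_walk.py</code>; the exact exception type may vary by platform:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import tempfile

errors = []  # os.walk appends every OSError it reports via onerror

# A throwaway regular file standing in for './learn_os_walk.py'
with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as tmp:
    file_path = tmp.name

try:
    # Default (onerror=None): the listing error is ignored, nothing is yielded
    default_result = list(os.walk(file_path))
    # With a callback, the same error is handed to us instead of being dropped
    reported_result = list(os.walk(file_path, onerror=errors.append))
finally:
    os.remove(file_path)

print(default_result)            # []
print(reported_result)           # []
print(type(errors[0]).__name__)  # e.g. NotADirectoryError on Linux/macOS
</pre>
<p>In both calls the generator is empty, which is why the body of the <code>for</code> loop never runs.</p>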
<p>Now, I hope you understand why the official doc tells us to pass a directory path instead of a file path <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h2>os.walk vs os.listdir — When to Use Each?</h2>
<p>A common question among programmers concerns the difference between <code>os.walk</code> and <code>os.listdir</code>. </p>
<p>The simple answer is:</p>
<p class="has-global-color-8-background-color has-background">The <code>os.listdir()</code> function returns a <a href="https://blog.finxter.com/python-lists/" data-type="post" data-id="7332" target="_blank" rel="noreferrer noopener">list</a> of the names of every file and folder directly inside one directory. The <code>os.walk()</code> function returns a generator that yields the directory path, subdirectory names, and filenames for every directory in an entire file tree.</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh6.googleusercontent.com/BC9Egy0ipAEJQuQdwZuGUTcwK7PcDp8jap_ozfpq6EL60-O2sfdB2HELJxvbzh-wBSD-cJK0QL738jlN8Ap2ZEEdgjGt4jxPsKa3J7aHoT-VsgMgpSO49P14XSCG8pEvxwpdiThT" alt=""/></figure>
</div>
<p>If you still feel a little uncertain, some code examples will help us understand better!</p>
<p>We will stick to the same example directory tree:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/RU-gX4CYwtmG96UTi_Ex0nA2CILb1a48aRda8sbayxWSVg0GnVrbUGFpOoJyL2NwkjgjQjRRMDk_DMW8PMLf20FBAYfq6NCi_Js-FPAA1EkwPW5442O78oGNaTvqQk6AY-if_XM-" alt=""/></figure>
</div>
<p>Suppose we call the <code>os.listdir()</code> function and pass it the path of the <code>learn_os_walk</code> directory, as in the code below:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'

# *Try to list all files and directories in a directory path
print('\n')
print(os.listdir(a_directory_path))
print('\n')
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh3.googleusercontent.com/apX569Vd-f2mJtiwLkn4KZ_5o5lbvbYgr-NsFD55QMDcheCSGP-T4kH89NP-AZzM1BXZ9-6YjvCeaicx_SD2Ro3x_OqkpdlTEWrDqw9yxSXy3cpSZZwrGrBHsOcrT_l96_b6OoXZ" alt=""/></figure>
</div>
<p>We get output like this:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/PVpb-AJH8hABEYiEQZSDPks7Iy51KA5X9PGsV6Ix7twt1EL6SGyp9-6mKOC1qqahWNB0s7aZPK_ashhpX4gL9C8i6g2DpjkAwhTlbNSqmuOwTLGyWgxgR1TuitJvJ4d4IBoj1mww" alt="" width="251" height="50"/></figure>
</div>
<p>That’s it! Only the first level of the directory tree is included. In other words, <code>os.listdir()</code> cares only about what sits directly in the root directory, instead of searching through the entire directory tree as we saw in the <code>os.walk</code> example.</p>
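<p>To make the contrast concrete, here is a minimal sketch on a hypothetical two-level tree (<code>root/top.txt</code> and <code>root/sub/deep.txt</code>, built with <code>tempfile</code>). It shows that <code>os.listdir()</code> returns the same names as the dirnames and filenames of the first <code>os.walk</code> tuple, while only the full walk reaches <code>deep.txt</code>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: root/top.txt and root/sub/deep.txt
    os.makedirs(os.path.join(root, 'sub'))
    open(os.path.join(root, 'sub', 'deep.txt'), 'w').close()
    open(os.path.join(root, 'top.txt'), 'w').close()

    listed = sorted(os.listdir(root))             # first level only
    _, dirnames, filenames = next(os.walk(root))  # first tuple of the walk
    walked_files = sorted(                        # every file in the tree
        f for _, _, files in os.walk(root) for f in files
    )

print(listed)                        # ['sub', 'top.txt']
print(sorted(dirnames + filenames))  # ['sub', 'top.txt']
print(walked_files)                  # ['deep.txt', 'top.txt']
</pre>
<p>Note that <code>os.listdir()</code> mixes files and folders in one list, whereas <code>os.walk</code> already separates them for you.</p>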
<h3>Summary</h3>
<p class="has-global-color-8-background-color has-background"><strong>Summary</strong>: If you want a list of all filenames and directory names directly within a root directory, go with the <code>os.listdir()</code> function. If you want to iterate over an entire directory tree, consider the <code>os.walk()</code> function.</p>
<p>Now, I hope you understand when to use <code>os.listdir</code> and when to use <code>os.walk</code> <img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h2>os.walk() Recursive — How to traverse a Directory Tree?</h2>
<p>Our last question about <code>os.walk</code> is how to actually <a href="https://blog.finxter.com/iterators-iterables-and-itertools/" data-type="post" data-id="29507" target="_blank" rel="noreferrer noopener">iterate</a> over the entire directory tree. </p>
<p>Concretely, we have some small goals for our next example:</p>
<ul>
<li>Iterate over all files within a directory tree</li>
<li>Iterate over all directories within a directory tree</li>
</ul>
<p>All examples below are still based on our old friend, the example directory tree:</p>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/RU-gX4CYwtmG96UTi_Ex0nA2CILb1a48aRda8sbayxWSVg0GnVrbUGFpOoJyL2NwkjgjQjRRMDk_DMW8PMLf20FBAYfq6NCi_Js-FPAA1EkwPW5442O78oGNaTvqQk6AY-if_XM-" alt=""/></figure>
</div>
<h3>Iterate over all files within a directory tree</h3>
<p>First, let’s tackle iterating over all files within a directory tree. This can be achieved with a <a href="https://blog.finxter.com/how-to-write-a-nested-for-loop-in-one-line-python/" data-type="post" data-id="11859" target="_blank" rel="noreferrer noopener">nested <code>for</code> loop</a> in Python.</p>
<p>Potential applications include sanity checks or file counts for everything within one folder. How about counting the number of <code>.txt</code> files within one folder? Let’s do it!</p>
<p>The code for this application is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'
total_file = 0
for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
    # Inner loop: check every file at this level of the tree
    for subfilename in subfilenames:
        if subfilename.endswith('.txt'):
            total_file += 1
print(f'\n{total_file}\n')
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh4.googleusercontent.com/lqr4wjEItFHP4H9oot4cePf3kEk747skshJupXlG2A-MJnX9HxAtSs56m-n5dgtoZvggQh8KKIAFF5Uj8OOssJbxIHN_PSMklx4hV9GxvWKOrrBGiOJBS7dDKZUl5hanLJCEvzgY" alt=""/></figure>
</div>
<p>As you can see, we use another <code>for</code> loop to iterate over <code>subfilenames</code> to get every file within a directory tree. The output is <code>7</code>, which is correct according to our example directory tree.</p>
<p>The full code for this example can be found <a href="https://github.com/anqiwoo/InterestingPythonPuzzles/blob/master/learn_os_walk_count_files.py" target="_blank" rel="noreferrer noopener">here</a>.</p>
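<p>Keep in mind that <code>subfilenames</code> holds bare file names, not paths. If you need to open or move the matching files, a sketch like the one below joins each name with its directory path using <code>os.path.join</code>. The helper <code>find_files</code> and the temporary tree are hypothetical, for illustration only:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import tempfile


def find_files(root, suffix):
    """Hypothetical helper: full paths of files under root ending in suffix."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                # filenames holds bare names, so join them with dirpath
                matches.append(os.path.join(dirpath, filename))
    return matches


with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: three .txt files at different depths plus one .py file
    os.makedirs(os.path.join(root, 'A', 'B'))
    for rel in ('a.txt', os.path.join('A', 'b.txt'),
                os.path.join('A', 'B', 'c.txt'), 'd.py'):
        open(os.path.join(root, rel), 'w').close()
    found = find_files(root, '.txt')

print(len(found))                                   # 3
print(sorted(os.path.basename(p) for p in found))  # ['a.txt', 'b.txt', 'c.txt']
</pre>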
<h3>Iterate over all directories within a directory tree</h3>
<p>Last, we can also iterate over all directories within a directory tree. This can be achieved by a nested for loop in Python.</p>
<p>Potential applications could also be sanity checks or counts for all directories within one folder. For our example, let’s check whether every directory contains an <code><a href="https://blog.finxter.com/python-init/" data-type="post" data-id="5133" target="_blank" rel="noreferrer noopener">__init__.py</a></code> file and add an empty <code>__init__.py</code> file if it does not. </p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/13.1.0/72x72/1f4a1.png" alt="💡" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Idea</strong>: An <code>__init__.py</code> file marks a directory as a regular Python package. (Since Python 3.3, a directory without one can still be imported as a namespace package.)</p>
<p>The code for this application is:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os

a_directory_path = './learn_os_walk'
for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
    # Inner loop: check every subdirectory at this level of the tree
    for subdirname in subdirnames:
        init_filepath = os.path.join(pathname, subdirname, '__init__.py')
        if not os.path.exists(init_filepath):
            print(f'Create a new empty [{init_filepath}] file.')
            with open(init_filepath, 'w') as f:
                pass  # opening in 'w' mode is enough to create an empty file
</pre>
<div class="wp-block-image">
<figure class="aligncenter"><img src="https://lh6.googleusercontent.com/OciDbIUlpkWjiu56KMYYf4uOZJ1wQxlidMyMix3oZx4zRHan0Nw4DJvVWXrJNJnq8vWMraLvoyGiB1zuiGZA9mVNipsJz6ciOUgV3I415p2TehqEnieNxLzk4nYxIrrcU1fa-qIr" alt=""/></figure>
</div>
<p>As you can see, we use another <code>for</code> loop to iterate over <code>subdirnames</code> to visit every directory within a directory tree. </p>
<p>Before execution, our directory tree, as printed by the <code>take_a_walk</code> function from earlier, looks like this:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh3.googleusercontent.com/Nw-hjhSoa1DUojcYmZ8Z7fBVB5kSpZ_5j3UCUbAx7OvO20MooQrtWJSn9F3A2KCMow-DtmhRgrySMYW3JW0ePhZGU_xToQoNiGNMQrcXyYr-8_uL-fTf8ZKSnPs-SZkF-PJ1yIC-" alt="" width="305" height="414"/></figure>
</div>
<p>After execution, we can take a walk along the directory tree again and get a result like this:</p>
<div class="wp-block-image">
<figure class="aligncenter is-resized"><img loading="lazy" src="https://lh4.googleusercontent.com/NXiGZRZyez7cSSCdeNu_XiITJln-vv8OuxI-o3aE8BEoWSeTk6XzW_Ebdz-jsL7YJ5LcLwwabF1TDceDaErItXB1xiccN3NSkFz4A3CtD_vZ7A48mv20FKoVnBwhx8eqc5WmHSCz" alt="" width="261" height="359"/></figure>
</div>
<p>Hooray! We successfully iterated over every directory within a directory tree and completed the <code>__init__.py</code> sanity check.</p>
<p>The full code for this example can be found <a href="https://github.com/anqiwoo/InterestingPythonPuzzles/blob/master/learn_os_walk_init_check.py" target="_blank" rel="noreferrer noopener">here</a>.</p>
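<p>Since the nested loop above works with <code>subdirnames</code>, it is also worth knowing that with <code>topdown=True</code> you can modify that list in place to stop <code>os.walk</code> from descending into certain folders; the official docs describe this behavior. Here is a minimal sketch, assuming a hypothetical helper <code>walk_skipping</code> and a throwaway tree that contains a <code>__pycache__</code> folder:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import os
import tempfile


def walk_skipping(root, skip):
    """Hypothetical helper: yield directory paths, never entering names in skip."""
    for dirpath, dirnames, _filenames in os.walk(root):  # topdown=True by default
        # Slice assignment edits the very list os.walk uses next, so the
        # pruned directories are never visited. Rebinding dirnames would not.
        dirnames[:] = [d for d in dirnames if d not in skip]
        yield dirpath


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'pkg', 'sub'))
    os.makedirs(os.path.join(root, '__pycache__', 'inner'))
    visited = sorted(os.path.relpath(p, root)
                     for p in walk_skipping(root, {'__pycache__'}))

print(visited)  # ['.', 'pkg', 'pkg/sub'] on POSIX
</pre>
<p>With <code>topdown=False</code> this trick has no effect, because the subdirectories are already walked before their parent’s 3-tuple is yielded.</p>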
<p>In summary, you can use <code>os.walk</code> with a nested for loop to recursively traverse every file or directory within a directory tree.</p>
<h2>Conclusion</h2>
<p>That’s it for our <code>os.walk()</code> article!</p>
<p>We learned about its syntax, its inputs and outputs, and the difference between <code>os.walk</code> and <code>os.listdir</code>. </p>
<p>We also worked through real usage examples, ranging from changing the search direction with the <code>topdown</code> parameter to counting <code>.txt</code> files and running an <code>__init__.py</code> sanity check. </p>
<p>I hope you enjoyed all this, and happy coding!</p>
<hr class="wp-block-separator"/>
<h2>About the Author</h2>
<p>Anqi Wu is an aspiring Data Scientist and self-employed Technical Consultant. She is an incoming student for a Master’s program in Data Science and is building her technical consultant profile on Upwork.</p>
<p>Anqi is passionate about machine learning, statistics, data mining, programming, and many other data science related fields. During her undergraduate years, she proved her expertise with multiple wins and top placements in mathematical modeling contests. She loves supporting and enabling data-driven decision-making, developing data services, and teaching. </p>
<p>Here is a link to the author’s personal website: <a href="https://www.anqiwu.one/">https://www.anqiwu.one/</a>. She posts weekly data science blogs there documenting her data science learning and practice from the past week, along with some of the best learning resources and inspirational thoughts.</p>
<p>Hope you enjoy this article! Cheers!</p>
</div>
https://www.sickgaming.net/blog/2022/03/...ted-guide/