[Tut] Python | Split String by Whitespace

<p class="has-background" style="background-color:#d2fbfd"><strong>Summary:&nbsp;</strong>Use&nbsp;<code>"given string".split()</code>&nbsp;to split the given string by whitespace and store each word as an individual item in a list.<br /><strong>Minimal Example:</strong><br /><code>print("Welcome Finxter".split())</code><br /><code># OUTPUT: ['Welcome', 'Finxter']</code></p>
<h2><strong>Problem Formulation</strong></h2>
<p><strong>Problem</strong>: Given a string, how do you split it into a list of words using whitespace as the separator/delimiter?</p>
<p>Let’s understand the problem with the help of a few examples:</p>
<figure class="wp-block-table is-style-stripes"><table><tbody><tr><td><p><strong>Example 1:</strong><br /><strong>Input:</strong> text = "Welcome to the world of Python"<br /><strong>Explanation: </strong>Split the string into a list of words using a space " " as the delimiter to separate the words from the given string. <br /><strong>Output: </strong><br />['Welcome', 'to', 'the', 'world', 'of', 'Python']</p>
<p><strong>Example 2: </strong><br /><strong>Input:</strong><br />text = """Item_1<br />Item_2<br />Item_3"""<br />print(text.split('\n'))<br /><strong>Explanation: </strong>Split the string into a list of words using a newline "\n" as the delimiter to separate the words from the given string. <br /><strong>Output:</strong> ['Item_1', 'Item_2', 'Item_3']</p>
<p><strong>Example 3: </strong><br /><strong>Input:</strong> text = "This is just a\trandom text:\nNew Line"<br /><strong>Explanation: </strong>The given string contains a mix of whitespace characters between the words, such as spaces, a tab, and a newline character. All of these whitespace characters have to be treated as delimiters while separating the words from the given string and storing them as items in a list. Here’s how the output looks: <br /><strong>Output:</strong><br />['This', 'is', 'just', 'a', 'random', 'text:', 'New', 'Line']</p></td></tr></tbody></table></figure>
<p>So, we have two situations at hand: strings in which a single, known whitespace character acts as the delimiter, and strings in which several different whitespace characters act as delimiters. Let’s dive into the ways of solving this problem.</p>
<h2><strong>Method 1: Using split()</strong> </h2>
<p><code>split()</code> is a built-in string method in Python that splits a string at a given separator and returns a list of substrings. Here’s a minimal example that demonstrates how <code>split()</code> works: <code>'finxterx42'.split('x')</code> splits the string at every <code>'x'</code> and returns the list <code>['fin', 'ter', '42']</code>. When no separator is passed, <code>split()</code> treats any run of whitespace characters, such as <code>'\n'</code>, <code>' '</code>, or <code>'\t'</code>, as the delimiter.</p>
<p class="has-base-background-color has-background">Read more about the <code>split()</code> method in this blog tutorial: <strong><a rel="noreferrer noopener" href="" target="_blank">Python String split()</a></strong>.</p>
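<p>As a quick illustration of the two calling styles described above, here is a minimal sketch. The variable names <code>parts</code> and <code>words</code> and the second sample string are made up for this example.</p>

```python
# Explicit separator: the string is cut at every 'x'.
parts = 'finxterx42'.split('x')
print(parts)

# No separator: any run of whitespace (space, tab, newline) is a delimiter.
# The sample string below is illustrative only.
words = "one two\tthree\nfour".split()
print(words)
```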
<p><strong>Approach: </strong>To split a string on one specific whitespace character, pass that character as the separator to <code>split()</code>, for example <code>split(' ')</code> or <code>split('\n')</code>. To split on any whitespace, call <code>split()</code> without arguments.</p>
<p><strong>Code: </strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="3,10,15" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Example 1:
text = "Welcome to the world of Python"
print(text.split(' '))
# OUTPUT: ['Welcome', 'to', 'the', 'world', 'of', 'Python']

# Example 2:
text = """Item_1
Item_2
Item_3"""
print(text.split('\n'))
# OUTPUT: ['Item_1', 'Item_2', 'Item_3']

# Example 3:
text = "This is just a\trandom text:\nNew Line"
print(text.split())
# OUTPUT: ['This', 'is', 'just', 'a', 'random', 'text:', 'New', 'Line']</pre>
<p>Note that to separate the words in the third example we did not specify any separator within the <code>split()</code> method. This is because when you don’t specify a separator, Python treats any whitespace character that occurs within the given string as a separator.</p>
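<p>The difference between passing <code>' '</code> and passing nothing matters most when a string has repeated or leading whitespace. The following sketch uses a made-up sample string to show it; the names <code>by_space</code> and <code>by_any</code> are illustrative only.</p>

```python
# Sample string with leading spaces, a run of three spaces, a tab,
# and a trailing newline (illustrative input, not from the article).
text = "  Welcome   to\tPython\n"

# With an explicit ' ' separator every single space is a cut point,
# so empty strings appear and the tab/newline stay inside the last item.
by_space = text.split(' ')
print(by_space)
# ['', '', 'Welcome', '', '', 'to\tPython\n']

# Without a separator, runs of any whitespace count as one delimiter
# and leading/trailing whitespace is ignored.
by_any = text.split()
print(by_any)
# ['Welcome', 'to', 'Python']
```

<p>If the input may contain irregular whitespace, calling <code>split()</code> without arguments is usually the simpler choice.</p>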
<h2><strong>Method 2: Using <a href="" target="_blank" rel="noreferrer noopener">regex</a></strong></h2>
<p>Another handy way of separating a string with whitespace characters as separators is to use Python’s built-in <code>re</code> (regular expression) module.</p>
<p><strong>Approach 1: </strong>Import the <code>re</code> module and call <code>re.split(r'\s+', text)</code>, where the pattern <code>\s+</code> matches one or more consecutive whitespace characters. The string is therefore split at every run of whitespace. Writing the pattern as a raw string (<code>r'...'</code>) keeps Python from interpreting the backslash itself.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="4, 11, 16" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
# Example 1:
text = "Welcome to the world of Python"
print(re.split(r'\s+', text))
# OUTPUT: ['Welcome', 'to', 'the', 'world', 'of', 'Python']

# Example 2:
text = """Item_1
Item_2
Item_3"""
print(re.split(r'\s+', text))
# OUTPUT: ['Item_1', 'Item_2', 'Item_3']

# Example 3:
text = "This is just a\trandom text:\nNew Line"
print(re.split(r'\s+', text))
# OUTPUT: ['This', 'is', 'just', 'a', 'random', 'text:', 'New', 'Line']</pre>
<p class="has-base-background-color has-background"><strong>Related Tutorial: <a href="" target="_blank" rel="noreferrer noopener">Python Regex Split</a></strong></p>
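<p>One thing to watch for with <code>re.split()</code>: if the string starts or ends with whitespace, the result contains empty strings at the edges. The sketch below, using a made-up sample string and the illustrative names <code>raw</code> and <code>clean</code>, shows one possible way to avoid that by stripping the string first.</p>

```python
import re

# Illustrative input with leading and trailing whitespace.
text = "  Welcome to\tPython \n"

# The pattern matches at the very start and end, so empty strings appear.
raw = re.split(r'\s+', text)
print(raw)
# ['', 'Welcome', 'to', 'Python', '']

# Stripping the outer whitespace first removes those empty items.
clean = re.split(r'\s+', text.strip())
print(clean)
# ['Welcome', 'to', 'Python']
```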
<p><strong>Approach 2: </strong>Another way of using the <code>re</code> module to solve this problem is its <code>findall()</code> function. Import the module and call <code>re.findall(r'\S+', text)</code>, where the pattern <code>\S+</code> matches one or more consecutive non-whitespace characters. Python scans the string and collects every such sequence as one word. As soon as a whitespace character is found, the current word ends, and the next continuous sequence of non-whitespace characters becomes the next item in the returned list.</p>
<p>Here’s a graphical representation of the above explanation:</p>
<figure class="wp-block-image size-full is-style-default"><img loading="lazy" width="715" height="456" src="" alt="" class="wp-image-831613" srcset=" 715w, 300w" sizes="(max-width: 715px) 100vw, 715px" /></figure>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="4,11,16" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re
# Example 1:
text = "Welcome to the world of Python"
print(re.findall(r'\S+', text))
# OUTPUT: ['Welcome', 'to', 'the', 'world', 'of', 'Python']

# Example 2:
text = """Item_1
Item_2
Item_3"""
print(re.findall(r'\S+', text))
# OUTPUT: ['Item_1', 'Item_2', 'Item_3']

# Example 3:
text = "This is just a\trandom text:\nNew Line"
print(re.findall(r'\S+', text))
# OUTPUT: ['This', 'is', 'just', 'a', 'random', 'text:', 'New', 'Line']</pre>
<p class="has-base-background-color has-background"><strong>Related Tutorial: <a href="" target="_blank" rel="noreferrer noopener">Python re.findall() – Everything You Need to Know</a></strong></p>
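<p>To check that the approaches covered in this article agree with each other, here is a small side-by-side sketch on the third example string. The variable names are illustrative, and the <code>strip()</code> call before <code>re.split()</code> is an added precaution rather than part of the original examples.</p>

```python
import re

text = "This is just a\trandom text:\nNew Line"

# Method 1: str.split() with no separator.
with_split = text.split()

# Method 2, approach 1: re.split() on runs of whitespace
# (strip() guards against empty items at the edges).
with_re_split = re.split(r'\s+', text.strip())

# Method 2, approach 2: re.findall() on runs of non-whitespace.
with_findall = re.findall(r'\S+', text)

print(with_split)
# ['This', 'is', 'just', 'a', 'random', 'text:', 'New', 'Line']
print(with_split == with_re_split == with_findall)
# True
```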
<p><strong><em>Do you want to master the regex superpower?</em></strong> Check out my new book <em><strong><a href="" target="_blank" rel="noreferrer noopener" title="[eBook] The Smartest Way to Learn Python Regex">The Smartest Way to Learn Regular Expressions in Python</a></strong></em> with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video. </p>
<p>We have successfully solved the given problem using different approaches. I hope you enjoyed this&nbsp;<a rel="noreferrer noopener" href="" target="_blank">article</a>&nbsp;and it helps you in your Python coding journey. Please&nbsp;<a rel="noreferrer noopener" href="" target="_blank">subscribe and stay tuned</a>&nbsp;for more interesting articles!</p>
<p class="has-base-2-background-color has-background"><strong>Related Reads:</strong><br /><a rel="noreferrer noopener" href="" target="_blank">⦿</a>&nbsp;<a rel="noreferrer noopener" href="" target="_blank"><strong>How To Split A String And Keep The Separators?</strong></a><a rel="noreferrer noopener" href="" target="_blank"><br />⦿</a>&nbsp;<a rel="noreferrer noopener" href="" target="_blank"><strong>How To Cut A String In Python?</strong></a> <a rel="noreferrer noopener" href="" target="_blank"><br />⦿&nbsp;<strong>Python | Split String into Characters</strong></a></p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
