[Tut] Python | Split String with Regex


<div>
<p><strong>Summary: </strong>The different methods to split a string using regex are:</p>
<ul>
<li>re.split()</li>
<li>re.sub() followed by str.split()</li>
<li>re.findall()</li>
<li>re.compile() combined with findall()</li>
</ul>
<h2><strong>Minimal Example</strong></h2>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="6,10,14,18" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = "Earth:Moon::Mars:Phobos"

# Method 1
res = re.split("[:]+", text)
print(res)

# Method 2
res = re.sub(r':', " ", text).split()
print(res)

# Method 3
res = re.findall(r"[^:\s]+", text)
print(res)

# Method 4
pattern = re.compile(r"[^:\s]+").findall
print(pattern(text))
# Each print outputs: ['Earth', 'Moon', 'Mars', 'Phobos']</pre>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h2>Problem Formulation</h2>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4dc.png" alt="📜" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Problem:</strong> Given a string and a delimiter, how can you split the string on that delimiter using different functions from Python's regular expression library?</p>
<p><strong>Example: </strong>In the following example, the given string has to be split using an underscore as the delimiter. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Input
text = "abc_lmn_xyz"

# Expected Output
['abc', 'lmn', 'xyz']</pre>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<h2>Method 1: re.split</h2>
<p>The&nbsp;<code>re.split(pattern, string)</code>&nbsp;method matches all occurrences of the&nbsp;<code>pattern</code>&nbsp;in the&nbsp;<code>string</code>&nbsp;and divides the string along the matches resulting in a list of strings&nbsp;<em>between&nbsp;</em>the matches. For example,&nbsp;<code>re.split('a', 'bbabbbab')</code>&nbsp;results in the list of strings&nbsp;<code>['bb', 'bbb', 'b']</code>.</p>
<figure class="wp-block-image size-full is-style-default"><img loading="lazy" decoding="async" width="768" height="432" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-166.png" alt="" class="wp-image-975043" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-166.png 768w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w" sizes="(max-width: 768px) 100vw, 768px" /></figure>
<p><strong>Approach: </strong>Use the <code>re.split</code> function and pass <code>[_]+</code> as the pattern, which splits the given string at every run of one or more underscores. </p>
<p><strong>Code:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = "abc_lmn_xyz"
res = re.split("[_]+", text)
print(res)  # ['abc', 'lmn', 'xyz']</pre>
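<p>A few behaviors of <code>re.split</code> are easy to miss, so here is a small sketch of them. The input string with leading, trailing, and doubled underscores is an assumption made up for illustration and is not part of the original example.</p>

```python
import re

text = "_abc__lmn_xyz_"  # hypothetical input with extra underscores

# "[_]+" treats a run of underscores as a single delimiter, but a delimiter
# at the very start or end still yields an empty string there.
parts = re.split(r"[_]+", text)
print(parts)  # ['', 'abc', 'lmn', 'xyz', '']

# Drop the empty strings if they are not wanted.
cleaned = [p for p in parts if p]
print(cleaned)  # ['abc', 'lmn', 'xyz']

# maxsplit limits the number of splits; the remainder stays in one piece.
limited = re.split(r"_", "abc_lmn_xyz", maxsplit=1)
print(limited)  # ['abc', 'lmn_xyz']

# A capturing group keeps the delimiters in the result.
with_delims = re.split(r"(_)", "abc_lmn_xyz")
print(with_delims)  # ['abc', '_', 'lmn', '_', 'xyz']
```

<p>Filtering out empty strings is only one way to handle delimiters at the edges. Stripping them first with <code>text.strip("_")</code> would work too.</p>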
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Related Read: <a href="https://blog.finxter.com/python-regex-split" target="_blank" rel="noreferrer noopener">Python Regex Split</a></strong></p>
<h2>Method 2: re.sub</h2>
<p>The regex function&nbsp;<code>re.sub(P, R, S)</code>&nbsp;replaces all occurrences of the pattern&nbsp;<code>P</code>&nbsp;with the replacement&nbsp;<code>R</code>&nbsp;in string&nbsp;<code>S</code>. It returns a new string. For example, if you call&nbsp;<code>re.sub('a', 'b', 'aabb')</code>, the result will be the new string&nbsp;<code>'bbbb'</code>&nbsp;with all characters&nbsp;<code>'a'</code>&nbsp;replaced by&nbsp;<code>'b'</code>.</p>
<figure class="wp-block-image size-full is-style-default"><img decoding="async" loading="lazy" width="768" height="432" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-167.png" alt="" class="wp-image-975054" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-167.png 768w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w" sizes="(max-width: 768px) 100vw, 768px" /></figure>
<p><strong>Approach: </strong>The idea here is to use the <code>re.sub</code> function to replace every underscore with a space and then call the string's <code>split()</code> method to split at whitespace. Note that this also splits on any whitespace that was already present in the string. </p>
<p><strong>Code:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = "abc_lmn_xyz"
res = re.sub(r'_', " ", text).split()
print(res)  # ['abc', 'lmn', 'xyz']</pre>
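<p>The substitute-then-split idea depends on the replacement character not occurring in the data itself. The sketch below uses a made-up string containing spaces (an assumption, not from the original example) to show where the space-based version goes wrong, along with one possible variant that normalizes the delimiter instead.</p>

```python
import re

text = "new york_los angeles__san diego"  # hypothetical input containing spaces

# Replacing underscores with spaces and calling split() also breaks on the
# spaces that were already part of the data.
naive = re.sub(r"_", " ", text).split()
print(naive)  # ['new', 'york', 'los', 'angeles', 'san', 'diego']

# Collapse each run of underscores to one "_" and split on that instead.
safer = re.sub(r"_+", "_", text).split("_")
print(safer)  # ['new york', 'los angeles', 'san diego']
```

<p>When the fields can never contain whitespace, as in <code>"abc_lmn_xyz"</code>, both versions give the same result.</p>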
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Related Read: <a href="https://blog.finxter.com/python-regex-sub/" target="_blank" rel="noreferrer noopener">Python Regex Sub</a></strong></p>
<h2>Method 3: re.findall</h2>
<p>The&nbsp;<code>re.findall(pattern, string)</code>&nbsp;method scans&nbsp;<code>string</code>&nbsp;from&nbsp;<strong>left to right</strong>, searching for all&nbsp;<strong>non-overlapping matches</strong>&nbsp;of the&nbsp;<code>pattern</code>. It returns a&nbsp;<strong>list of strings</strong>&nbsp;in the matching order when scanning the string from left to right.</p>
<figure class="wp-block-image size-full is-style-default"><img decoding="async" loading="lazy" width="768" height="432" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-168.png" alt="" class="wp-image-975065" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-168.png 768w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w" sizes="(max-width: 768px) 100vw, 768px" /></figure>
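<p><strong>Approach: </strong>Use <code>re.findall()</code> with the pattern <code>[^_\s]+</code> to find every run of characters that contains neither an underscore nor whitespace. Instead of describing the delimiter, the pattern describes the pieces you want to keep.</p>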
<p><strong>Code:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = "abc_lmn_xyz"
res = re.findall(r"[^_\s]+", text)
print(res)  # ['abc', 'lmn', 'xyz']</pre>
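<p>Because <code>re.findall</code> collects the tokens rather than cutting at the delimiters, stray delimiters do not produce empty strings. The following sketch illustrates this with a made-up input (an assumption for demonstration) and also shows <code>re.finditer</code>, which additionally reports where each token sits.</p>

```python
import re

text = "_abc__lmn_xyz_"  # hypothetical input with extra underscores

# findall returns only the matched runs, so leading, trailing, and repeated
# underscores produce no empty strings.
tokens = re.findall(r"[^_\s]+", text)
print(tokens)  # ['abc', 'lmn', 'xyz']

# finditer yields match objects, which also expose each token's position.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[^_\s]+", text)]
print(spans)  # [('abc', 1, 4), ('lmn', 6, 9), ('xyz', 10, 13)]
```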
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Related Read:</strong> <strong><a href="https://blog.finxter.com/python-re-findall/" target="_blank" rel="noreferrer noopener">Python re.findall()</a></strong></p>
<h2>Method 4: re.compile</h2>
<p>The method&nbsp;<code>re.compile(pattern)</code>&nbsp;returns a regular expression object from the&nbsp;<code>pattern</code>&nbsp;that provides basic regex methods such as&nbsp;<code>pattern.search(string)</code>,&nbsp;<code>pattern.match(string)</code>, and&nbsp;<code>pattern.findall(string)</code>. The explicit two-step approach of (1) compiling and (2) searching the pattern is more efficient than calling, say,&nbsp;<code>re.search(pattern, string)</code>&nbsp;directly if you match the same pattern multiple times, because it avoids redundant compilations of the same pattern.</p>
<figure class="wp-block-image size-full is-style-default"><img decoding="async" loading="lazy" width="1024" height="576" src="https://blog.finxter.com/wp-content/uploads/2022/12/image-170.png" alt="" class="wp-image-975084" srcset="https://blog.finxter.com/wp-content/uploads/2022/12/image-170.png 1024w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w, https://blog.finxter.com/wp-content/uplo...68x432.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>
<p><strong>Code:</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = "abc_lmn_xyz"
# Compile once and keep a reference to the bound findall method
pattern = re.compile(r"[^_\s]+").findall
print(pattern(text))  # ['abc', 'lmn', 'xyz']</pre>
<h3><strong>Why use re.compile?</strong></h3>
<ul>
<li><strong>Efficiency: </strong>Compiling with <code>re.compile()</code> pays off when the same expression is used more than once. The compiled pattern object can be applied to many different strings without rewriting or recompiling the expression each time, which saves both effort and time. </li>
<li><strong>Readability: </strong>Another advantage of <code>re.compile</code> is readability: it lets you separate the definition of the regex from the places where it is used. </li>
</ul>
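<p>To make the reuse argument concrete, here is a minimal sketch that compiles a delimiter pattern once and applies it to several strings. The names <code>delimiter</code>, <code>lines</code>, and <code>rows</code>, as well as the sample data, are assumptions chosen for illustration.</p>

```python
import re

# Compile once; the resulting pattern object offers split() and findall().
delimiter = re.compile(r"_+")

lines = ["abc_lmn_xyz", "one__two", "solo"]  # hypothetical sample data
rows = [delimiter.split(line) for line in lines]
print(rows)  # [['abc', 'lmn', 'xyz'], ['one', 'two'], ['solo']]

# Pattern.split accepts maxsplit as well.
head_tail = delimiter.split("a_b_c", maxsplit=1)
print(head_tail)  # ['a', 'b_c']
```

<p>The <code>re</code> module also keeps an internal cache of recently compiled patterns, so the speed gain over the module-level functions is often modest. Giving the pattern a descriptive name tends to be the more visible benefit.</p>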
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" /><strong>Read: <a rel="noreferrer noopener" href="https://blog.finxter.com/python-regex-compile/#:~:text=Is%20It%20Worth%20Using%20Python%E2%80%99s%20re.compile()%3F" target="_blank">Is It Worth Using Python’s re.compile()?</a></strong></p>
<h2><strong>Exercise</strong></h2>
<p class="has-global-color-8-background-color has-background"><strong>Problem: </strong>Split a string on spaces, commas, and periods, but leave a comma or period alone when it sits next to a digit, as in 1,000 or 1.50.</p>
<p class="has-base-2-background-color has-background"><strong>Given:<br /></strong><code>my_string = "one two 3.4 25.6 seven.eight nine,ten"</code><strong><br />Expected Output:<br /></strong><code>["one", "two", "3.4", "25.6", "seven", "eight", "nine", "ten"]</code></p>
<p><strong>Solution</strong></p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

my_string = "one two 3.4 25.6 seven.eight nine,ten"
res = re.split(r'\s|(?&lt;!\d)[,.](?!\d)', my_string)
print(res)  # ['one', 'two', '3.4', '25.6', 'seven', 'eight', 'nine', 'ten']</pre>
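<p>The lookarounds in the solution are easier to follow when the pattern is spelled out piece by piece. The sketch below rewrites the same pattern with <code>re.VERBOSE</code> and tries it on two made-up strings (assumptions for illustration), including one that shows a limitation: a comma or period is only treated as a delimiter when there is no digit on either side of it.</p>

```python
import re

# re.VERBOSE lets the pattern carry its own explanation.
splitter = re.compile(
    r"""
    \s            # any whitespace character
    |             # or
    (?<!\d)       # not preceded by a digit
    [,.]          # a comma or a period
    (?!\d)        # not followed by a digit
    """,
    re.VERBOSE,
)

sample = "cost 1,000 or 1.50 maybe,more.now"  # hypothetical input
print(splitter.split(sample))
# ['cost', '1,000', 'or', '1.50', 'maybe', 'more', 'now']

# Limitation: a comma directly after a digit is never a delimiter here,
# even when no digit follows it.
edge = splitter.split("total 3, then")
print(edge)  # ['total', '3,', 'then']
```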
<h2>Conclusion</h2>
<p>We have covered four different ways of splitting a string using the regular expressions package in Python. Use whichever technique best fits your needs. The aim of this tutorial was to acquaint you with the various ways of using regex to split a string, and I hope it helped. </p>
<p>Please stay tuned and <strong><a href="https://blog.finxter.com/subscribe/" target="_blank" rel="noreferrer noopener">subscribe</a></strong> for more interesting discussions and tutorials in the future. Happy coding! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f642.png" alt="🙂" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<hr class="wp-block-separator has-alpha-channel-opacity" />
<p><strong><em>Do you want to master the regex superpower?</em></strong> Check out my new book <em><strong><a href="https://blog.finxter.com/ebook-the-smartest-way-to-learn-python-regex/" target="_blank" rel="noreferrer noopener" title="[eBook] The Smartest Way to Learn Python Regex">The Smartest Way to Learn Regular Expressions in Python</a></strong></em> with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video. </p>
</div>


https://www.sickgaming.net/blog/2022/12/...ith-regex/