[Tut] Python Regex And Operator [Tutorial + Video]


<div><p>This tutorial is all about the <strong>AND operator of Python’s&nbsp;<a rel="noreferrer noopener" target="_blank" href="https://docs.python.org/3/library/re.html">re library</a>.</strong> You may ask: what? (And rightly so.) </p>
<p>Sure, there’s the OR operator (example: <code>'iPhone|iPad'</code>). But what’s the meaning of matching one regular expression AND another? </p>
<p>There are different interpretations for the AND operator in a regular expression (regex):</p>
<ul>
<li><strong>Ordered</strong>: Match one regex pattern after another. In other words, you first match pattern <code>A</code> AND then you match pattern <code>B</code>. Here the answer is simple: you use the pattern <code>AB</code> to match both.</li>
<li><strong>Unordered</strong>: Match multiple patterns in a string but in no particular order (<a href="https://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator">source</a>). In this case, you’ll use a bag-of-words approach.</li>
</ul>
<p>I’ll discuss both in the following. (You can also watch the video as you read the tutorial.)</p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-rich is-provider-embed-handler wp-embed-aspect-16-9 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<div class="ast-oembed-container"><iframe title="Python Regex And Operator [Tutorial + Video]" width="1100" height="619" src="https://www.youtube.com/embed/r9Gaauyf1Qk?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>
</div>
</figure>
<h2>Ordered Python Regex AND Operator</h2>
<p><strong>Suppose you are given a string and your goal is to find all substrings that match the string <code>'iPhone'</code> immediately followed by the string <code>'iPad'</code>. You can view this as the AND operator of two regular expressions. How can you achieve this?</strong></p>
<p><strong>The straightforward AND operation of both strings is the regular expression pattern <code>iPhoneiPad</code>. </strong></p>
<p>In the following example, you want to match pattern ‘aaa’ and pattern ‘bbb’—in this order.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'aaabaaaabbb'
>>> A = 'aaa'
>>> B = 'bbb'
>>> re.findall(A+B, text)
['aaabbb']
>>> </pre>
<p>You use the re.findall() method. In case you don’t know it, here’s the definition from the <a href="https://blog.finxter.com/python-re-findall/">Finxter blog article</a>:</p>
<p><strong><em>The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.</em></strong></p>
<p><a href="https://blog.finxter.com/python-re-findall/">Please consult the blog article to learn everything you need to know about this fundamental Python method.</a></p>
<p>The first argument is the pattern <code>A+B</code> which evaluates to <code>'aaabbb'</code>. There’s nothing fancy about this: each time you write a string consisting of more than one character, you essentially use the <em>ordered </em>AND operator.</p>
<p>The second argument is the text <code>'aaabaaaabbb'</code> which you want to search for the pattern. </p>
<p>The result shows that there’s a matching substring in the text: <code>'aaabbb'</code>. </p>
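<p>Plain concatenation works here because <code>A</code> and <code>B</code> are simple literals. If a pattern contains the OR operator <code>|</code>, concatenating the raw strings can change its meaning. The following is a minimal sketch of one way around that; the helper name <code>ordered_and</code> and the sample patterns are made up for illustration and are not part of the <code>re</code> library.</p>

```python
import re

def ordered_and(*patterns):
    """Concatenate patterns, wrapping each one in a non-capturing group."""
    # (?:...) groups a pattern without creating a capture group,
    # so re.findall() still returns the whole matched substring.
    return ''.join(f'(?:{p})' for p in patterns)

A = 'iPhone|iPad'   # hypothetical pattern with an OR inside
B = ' Pro'          # hypothetical second pattern
text = 'iPhone Pro and iPad Pro'

# Naive concatenation yields 'iPhone|iPad Pro': 'iPhone' OR 'iPad Pro'.
naive = re.findall(A + B, text)
# Grouped concatenation yields '(?:iPhone|iPad)(?: Pro)'.
grouped = re.findall(ordered_and(A, B), text)

print(naive)    # ['iPhone', 'iPad Pro']
print(grouped)  # ['iPhone Pro', 'iPad Pro']
```

<p>For the literal patterns from the example above, <code>ordered_and('aaa', 'bbb')</code> finds the same match as <code>A+B</code>.</p>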
<h2>Unordered Python Regex AND Operator</h2>
<p>But what if you want to search a given text for pattern <code>A</code> AND pattern <code>B</code>—but in no particular order? In other words: if both patterns appear anywhere in the string, the whole string should be returned as a match.</p>
<p>Now this is a bit more complicated because any regular expression pattern is ordered from left to right. A simple solution is to use the lookahead assertion <code>(?=.*A)</code> to check whether regex <code>A</code> appears anywhere in the string. (Note that we assume a single-line string because the <code>.*</code> pattern doesn’t match the newline character by default.)</p>
<p>Let’s first have a look at the minimal solution to check for two patterns anywhere in the string (say, patterns <code>'hi'</code> AND <code>'you'</code>). </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> pattern = '(?=.*hi)(?=.*you)'
>>> re.findall(pattern, 'hi how are yo?')
[]
>>> re.findall(pattern, 'hi how are you?')
['']</pre>
<p>In the first example, only <code>'hi'</code> appears but <code>'you'</code> does not, so there is no match. In the second example, both words appear.</p>
<p>But how does the lookahead assertion work? You must know that any other regex pattern “consumes” the matched substring. The consumed substring cannot be matched by any other part of the regex.</p>
<p>Think of the lookahead assertion as a non-consuming pattern match. The regex engine goes from left to right, searching for the pattern. At each point, it has one “current” position and checks whether the remaining pattern can match starting there. In other words, the regex engine tries to “consume” the next character as a (partial) match of the pattern. </p>
<p>The advantage of the lookahead expression is that it doesn’t consume anything. It just “looks ahead” from the current position to check whether what follows would match the lookahead pattern. If it doesn’t, the match attempt fails at this position.</p>
<figure class="wp-block-image size-large"><img src="https://blog.finxter.com/wp-content/uploads/2020/02/lookahead-1024x576.jpg" alt="" class="wp-image-6115" srcset="https://blog.finxter.com/wp-content/uploads/2020/02/lookahead-scaled.jpg 1024w, https://blog.finxter.com/wp-content/uplo...00x169.jpg 300w, https://blog.finxter.com/wp-content/uplo...68x432.jpg 768w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>A simple example of lookahead. The regular expression engine matches (“consumes”) the string partially. Then it checks whether the remaining pattern could be matched without actually matching it.</figcaption></figure>
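<p>To see the “non-consuming” behavior in the interpreter, you can compare the span of a lookahead match with the span of a regular match. This is only a small sketch; the sample string is the one used in this tutorial.</p>

```python
import re

text = 'hi how are you?'

# The lookahead succeeds, but the match is empty: nothing was consumed.
look = re.search(r'(?=.*you)', text)
print(look.span())         # (0, 0)
print(repr(look.group()))  # ''

# The same check without lookahead consumes everything up to 'you'.
plain = re.search(r'.*you', text)
print(plain.span())        # (0, 14)

# After a lookahead, the rest of the pattern continues at the SAME position.
both = re.match(r'(?=.*you)hi', text)
print(both.group())        # hi
```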
<p>Let’s go back to the expression <code>(?=.*hi)(?=.*you)</code> to match strings that contain both <code>'hi'</code> and <code>'you'</code>. Why does it work?</p>
<p>The reason is that the lookahead expressions don’t consume anything. You first search for an arbitrary number of characters <code>.*</code>, followed by the word <code>hi</code>. But because the regex engine hasn’t consumed anything, it’s still at the <strong>same position at the beginning of the string</strong>. So, you can repeat the same for the word <code>you</code>.</p>
<p>Note that this method doesn’t care about the order of the two words:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> pattern = '(?=.*hi)(?=.*you)'
>>> re.findall(pattern, 'hi how are you?')
['']
>>> re.findall(pattern, 'you are how? hi!')
['']</pre>
<p>No matter which word <code>"hi"</code> or <code>"you"</code> appears first in the text, the regex engine finds both.</p>
<p>You may ask: why’s the output the empty string? The reason is that the regex engine hasn’t consumed any character. It just checked the lookaheads. So the easy fix is to consume all characters as follows:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> pattern = '(?=.*hi)(?=.*you).*'
>>> re.findall(pattern, 'you fly high')
['you fly high']</pre>
<p>Now, the whole string is a match because after checking the lookaheads with <code>'(?=.*hi)(?=.*you)'</code>, you also consume the whole string with <code>'.*'</code>. </p>
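<p>The same idea extends to any number of patterns. Below is a rough sketch of a helper that builds one lookahead per pattern; the name <code>contains_all</code> is hypothetical. It assumes you only need a yes/no answer, and it passes <code>re.DOTALL</code> so that <code>.*</code> also crosses newline characters in multi-line strings.</p>

```python
import re

def contains_all(text, *patterns):
    """Return True if every pattern matches somewhere in text, in any order."""
    # One lookahead per pattern; (?:...) keeps an inner '|' from leaking out.
    lookaheads = ''.join(f'(?=.*(?:{p}))' for p in patterns)
    # re.match() anchors at position 0, where all lookaheads are checked.
    return re.match(lookaheads, text, flags=re.DOTALL) is not None

print(contains_all('you fly high', 'hi', 'you'))      # True
print(contains_all('hi how are yo?', 'hi', 'you'))    # False
print(contains_all('hi\nhow are you?', 'hi', 'you'))  # True
```

<p>If you don’t need a single regex, calling <code>re.search()</code> once per pattern and combining the results with <code>all()</code> gives the same answer and is often easier to read.</p>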
<h2>Python Regex Not</h2>
<p>How can you search a string for substrings that do NOT match a given pattern? In other words, what’s the “negative pattern” in Python regular expressions?</p>
<p>The answer is two-fold:</p>
<ul>
<li>If you want to match all characters except a set of specific characters, you can use the negative character class <code>[^...]</code>. </li>
<li>If you want to match all substrings except the ones that match a regex pattern, you can use the feature of <a href="https://www.regular-expressions.info/lookaround.html">negative lookahead</a> <code>(?!...)</code>. </li>
</ul>
<p>Here’s an example for the negative character class:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('[^a-m]', 'aaabbbaababmmmnoopmmaa')
['n', 'o', 'o', 'p']</pre>
<p>And here’s an example for the negative lookahead pattern to match all “words that are not followed by words”:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('[a-z]+(?![a-z]+)', 'hello world')
['hello', 'world']</pre>
<p>The negative lookahead <code>(?![a-z]+)</code> doesn’t consume (<em>match</em>) any character. It just checks whether the pattern <code>[a-z]+</code> does NOT match at a given position. The only positions where this happens are just before the empty space and at the end of the string.</p>
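<p>Positive and negative lookaheads can be combined, which gives you “A AND NOT B”. The snippet below is a small sketch with made-up sample lines: it keeps only the lines that contain <code>'hi'</code> but do not contain <code>'you'</code>.</p>

```python
import re

lines = ['hi how are you?', 'hi there', 'you are how? hi!']

# (?=.*hi)  -> 'hi' must appear somewhere in the line
# (?!.*you) -> 'you' must NOT appear anywhere in the line
pattern = re.compile(r'(?=.*hi)(?!.*you).*')

matches = [line for line in lines if pattern.fullmatch(line)]
print(matches)  # ['hi there']
```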
<h2>[Collection] What Are The Different Python Re Quantifiers?</h2>
<p>The “and”, “or”, and “not” operators are not the only regular expression operators you need to understand. So what are other operators?</p>
<p>Next, you’ll get a quick and dirty overview of the most important regex operations and how to use them in Python. Here are the most important regex quantifiers:</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr>
<td><strong>Quantifier</strong></td>
<td><strong>Description</strong></td>
<td><strong>Example</strong></td>
</tr>
<tr>
<td><code>.</code></td>
<td>The <strong>wild-card</strong> (‘dot’) matches any character in a string except the newline character <code>'\n'</code>.</td>
<td>Regex <code>'...'</code> matches all words with three characters such as ‘abc’, ‘cat’, and ‘dog’.</td>
</tr>
<tr>
<td><code>*</code></td>
<td>The <strong>zero-or-more</strong> asterisk matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex.</td>
<td>Regex ‘cat*’ matches the strings ‘ca’, ‘cat’, ‘catt’, ‘cattt’, and ‘catttttttt’.</td>
</tr>
<tr>
<td><code>?</code></td>
<td>The <strong>zero-or-one</strong> matches (as the name suggests) either zero or one occurrences of the immediately preceding regex. </td>
<td>Regex ‘cat?’ matches both strings ‘ca’ and ‘cat’ — but not ‘catt’, ‘cattt’, and ‘catttttttt’.</td>
</tr>
<tr>
<td><code>+</code></td>
<td>The <strong>at-least-one</strong> matches one or more occurrences of the immediately preceding regex. </td>
<td>Regex ‘cat+’ does not match the string ‘ca’ but matches all strings with at least one trailing character ‘t’ such as ‘cat’, ‘catt’, and ‘cattt’.</td>
</tr>
<tr>
<td><code>^</code></td>
<td>The <strong>start-of-string</strong> matches the beginning of a string. </td>
<td>Regex ‘^p’ matches the strings ‘python’ and ‘programming’ but not ‘lisp’ and ‘spying’ where the character ‘p’ does not occur at the start of the string.</td>
</tr>
<tr>
<td><code>$</code></td>
<td>The <strong>end-of-string</strong> matches the end of a string. </td>
<td>Regex ‘py$’ would match the strings ‘main.py’ and ‘pypy’ but not the strings ‘python’ and ‘pypi’.</td>
</tr>
<tr>
<td><code>A|B</code></td>
<td>The <strong>OR</strong> matches either the regex A or the regex B. Note that the intuition is quite different from the standard interpretation of the or operator that can also satisfy both conditions. </td>
<td>Regex ‘(hello)|(hi)’ matches strings ‘hello world’ and ‘hi python’. It wouldn’t make sense to try to match both of them at the same time.</td>
</tr>
<tr>
<td><code>AB</code></td>
<td>&nbsp;The <strong>AND</strong> matches first the regex A and second the regex B, in this sequence. </td>
<td>We’ve already seen it trivially in the regex ‘ca’ that matches first regex ‘c’ and second regex ‘a’.</td>
</tr>
</tbody>
</table>
</figure>
<p>Note that I gave the above operators some more meaningful names (in bold) so that you can immediately grasp the purpose of each regex. For example, the ‘^’ operator is usually denoted as the ‘caret’ operator. Those names are not descriptive so I came up with more kindergarten-like words such as the “start-of-string” operator.</p>
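<p>You can check the <code>cat*</code>, <code>cat?</code>, and <code>cat+</code> rows of the table yourself. The sketch below uses <code>re.fullmatch()</code> so that a regex only counts as a hit if it matches the whole candidate string; the list of candidates is taken from the table’s examples.</p>

```python
import re

candidates = ['ca', 'cat', 'catt', 'cattt']

results = {}
for regex in ['cat*', 'cat?', 'cat+']:
    # Keep every candidate that the regex matches from start to end.
    results[regex] = [s for s in candidates if re.fullmatch(regex, s)]
    print(regex, results[regex])

# cat* ['ca', 'cat', 'catt', 'cattt']
# cat? ['ca', 'cat']
# cat+ ['cat', 'catt', 'cattt']
```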
<p>We’ve already seen many examples but let’s dive into even more!</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

text = '''
Ha! let me see her: out, alas! he's cold:
Her blood is settled, and her joints are stiff;
Life and these lips have long been separated:
Death lies on her like an untimely frost
Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''

print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character.
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character '\n'
occurs at the end of the string.
['\n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''
</pre>
<p>In these examples, you’ve already seen the special symbol <code>'\n'</code>, which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions. Next, we’ll discover the most important special symbols.</p>
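<p>One possible answer to the <code>'^Ha.*'</code> puzzle: by default, <code>^</code> only matches at the very beginning of the string, and a triple-quoted text that starts on a new line begins with <code>'\n'</code>, not with <code>'Ha'</code>. The sketch below uses a shortened sample text and assumes the lines are not indented; with the <code>re.MULTILINE</code> flag, <code>^</code> also matches right after each newline.</p>

```python
import re

# Assumption: like the sample text above, the string starts with a newline.
text = '\nHa! let me see her\nHer blood is settled\n'

# Default: '^' matches only at position 0, where the character is '\n'.
default = re.findall('^Ha.*', text)

# re.MULTILINE: '^' also matches at the start of every line.
multiline = re.findall('^Ha.*', text, flags=re.MULTILINE)

print(default)    # []
print(multiline)  # ['Ha! let me see her']
```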
<h2>Related Re Methods</h2>
<p>There are seven important regular expression methods which you must master:</p>
<ul>
<li>The <strong>re.findall(pattern, string)</strong> method returns a list of string matches. Read more in <a href="https://blog.finxter.com/python-re-findall/">our blog tutorial</a>.</li>
<li>The <strong>re.search(pattern, string)</strong> method returns a match object of the first match. Read more in <a href="https://blog.finxter.com/python-regex-search/">our blog tutorial</a>.</li>
<li>The <strong>re.match(pattern, string)</strong> method returns a match object if the regex matches at the beginning of the string. Read more in <a href="https://blog.finxter.com/python-regex-match/">our blog tutorial</a>.</li>
<li>The <strong>re.fullmatch(pattern, string)</strong> method returns a match object if the regex matches the whole string. Read more in <a href="https://blog.finxter.com/python-regex-fullmatch/">our blog tutorial</a>.</li>
<li>The <strong>re.compile(pattern)</strong> method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in <a href="https://blog.finxter.com/python-regex-compile/">our blog tutorial</a>.</li>
<li>The <strong>re.split(pattern, string)</strong> method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those. Read more in <a href="https://blog.finxter.com/python-regex-split/">our blog tutorial</a>.</li>
<li>The <strong>re.sub(pattern, repl, string, count=0, flags=0)</strong> method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in <a href="https://blog.finxter.com/python-regex-sub/">our blog tutorial</a>.</li>
</ul>
<p>These seven methods are 80% of what you need to know to get started with Python’s regular expression functionality.</p>
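<p>As a quick reference, here is a short sketch that calls each of the seven methods once. The sample text and the variable names are invented for this illustration.</p>

```python
import re

text = 'iPhone 11, iPad 7, iPhone 12'

found = re.findall(r'iPhone \d+', text)               # all matching substrings
first = re.search(r'iPad \d+', text).group()          # first match anywhere
at_start = re.match(r'iPad', text)                    # None: text starts with 'iPhone'
whole = re.fullmatch(r'iPhone \d+', 'iPhone 11').group()  # must match entire string
parts = re.split(r',\s*', text)                       # split along the commas
masked = re.sub(r'\d+', 'X', text)                    # replace every number

# Compile once, reuse many times; groups make findall() return tuples.
device = re.compile(r'(iPhone|iPad) (\d+)')
pairs = device.findall(text)

print(found)     # ['iPhone 11', 'iPhone 12']
print(first)     # iPad 7
print(at_start)  # None
print(whole)     # iPhone 11
print(parts)     # ['iPhone 11', 'iPad 7', 'iPhone 12']
print(masked)    # iPhone X, iPad X, iPhone X
print(pairs)     # [('iPhone', '11'), ('iPad', '7'), ('iPhone', '12')]
```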
<h2>Where to Go From Here?</h2>
<p>You’ve learned everything you need to know about the <strong><em>Python Regex AND </em></strong>Operator. </p>
<p><em><strong>Summary</strong>: </em></p>
<p><em>There are different interpretations for the AND operator in a regular expression (regex):</em></p>
<ul>
<li><em><strong>Ordered</strong>: Match one regex pattern after another. In other words, you first match pattern <code>A</code> AND then you match pattern <code>B</code>. Here the answer is simple: you use the pattern <code>AB</code> to match both.</em></li>
<li><em><strong>Unordered</strong>: Match multiple patterns in a string but in no particular order. In this case, you’ll use a bag-of-words approach.</em></li>
</ul>
</div>


https://www.sickgaming.net/blog/2020/02/...ial-video/