[Tut] Python Regex Fullmatch - xSicKxBot - 01-21-2020 Python Regex Fullmatch <div><p>Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!</p> <p>This article is all about the <strong>re.fullmatch(pattern, string)</strong> method of Python’s <a rel="noreferrer noopener" target="_blank" href="https://docs.python.org/3/library/re.html">re library</a>. There are three similar methods to help you use regular expressions:</p> <ul> <li>The <strong>findall(pattern, string)</strong> method returns a list of string matches. Check out <a href="https://blog.finxter.com/python-re-findall/">our blog tutorial</a>.</li> <li>The <strong>search(pattern, string)</strong> method returns a match object of the first match. Check out <a href="https://blog.finxter.com/python-regex-search/">our blog tutorial</a>. </li> <li>The <strong>match(pattern, string)</strong> method returns a match object if the regex matches at the beginning of the string. Check out <a href="https://blog.finxter.com/python-regex-match/">our blog tutorial</a>.</li> </ul> <p>So how does the re.fullmatch() method work? Let’s study the specification.</p> <h2>How Does re.fullmatch() Work in Python?</h2> <p><strong>The re.fullmatch(pattern, string) method returns a match object<em> if the pattern matches the whole string</em>. 
</strong></p>
<p><strong>Specification</strong>:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">re.fullmatch(pattern, string, flags=0)</pre>
<p>The re.fullmatch() method takes up to three arguments.</p>
<ul>
<li><strong>pattern</strong>: the regular expression pattern that you want to match.</li>
<li><strong>string</strong>: the string that the pattern must match from beginning to end.</li>
<li><strong>flags </strong>(optional argument): a more advanced modifier that allows you to customize the behavior of the function. Want to know <a href="https://blog.finxter.com/python-regex-flags/">how to use those flags? Check out this detailed article</a> on the Finxter blog.</li>
</ul>
<p>We’ll explore them in more detail later. </p>
<p><strong>Return Value:</strong></p>
<p>The re.fullmatch() method returns a match object if the pattern matches the whole string, and None otherwise. You may ask (and rightly so):</p>
<h2>What’s a Match Object?</h2>
<p>If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched, and where? </p>
<p>The <a href="https://docs.python.org/3/library/re.html#match-objects">match object</a> is a simple wrapper for this information. Some regex methods of the re package in Python, such as fullmatch(), automatically create a match object when the pattern matches.</p>
<p>At this point, you don’t need to explore the match object in detail. 
Just know that you can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> m = re.fullmatch('h...o', 'hello')
>>> m.start()
0
>>> m.end()
5</pre>
<p>After importing the re module, you create a match object m by using the re.fullmatch() method. The pattern ‘h...o’ matches the string ‘hello’ from start position 0 to end position 5. Because the fullmatch() method always has to match the whole string, a successful match always starts at position 0 and ends at the length of the string.</p>
<p>Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.fullmatch()!</p>
<h2>A Guided Example for re.fullmatch()</h2>
<p>First, you import the re module and create the text string to be matched against the regex patterns:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
Call me Ishmael. Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
to interest me on shore, I thought I would sail about a little
and see the watery part of the world.
'''</pre>
<p>Let’s say you want to match the full text with this regular expression: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.fullmatch('Call(.|\n)*', text)
>>> </pre>
<p>The first argument is the pattern to be found: <code>'Call(.|\n)*'</code>. The second argument is the text to be analyzed. 
You stored the multi-line string in the variable text, so you pass it as the second argument. The third argument <em>flags</em> of the fullmatch() method is optional, and the code skips it.</p>
<p>There’s no output! This means that the re.fullmatch() method returned None instead of a match object. Why? Because the string starts with a newline character (the empty first line of the triple-quoted string), so the ‘Call’ part of the regex does not match at the beginning of the string.</p>
<p>So how can you fix this? Simple: match a newline character ‘\n’ at the beginning of the string. </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.fullmatch('\nCall(.|\n)*', text)
<re.Match object; span=(0, 229), match='\nCall me Ishmael. Some years ago--never mind how></pre>
<p>The regex (.|\n)* matches an arbitrary number of characters (newline characters or not) after the prefix ‘\nCall’. This matches the whole text, so the result is a match object. Note that the match spans 229 characters, but the printed representation of the match object shows only a prefix of the matched string. Beginners often overlook this.</p>
<h2>What’s the Difference Between re.fullmatch() and re.match()?</h2>
<p>The methods re.fullmatch() and re.match(pattern, string) both return a match object on success and None otherwise. Both attempt to match at the beginning of the string. 
The only difference is that re.fullmatch() must also match up to the end of the string: it wants to match the whole string!</p>
<p>You can see this difference in the following code:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = 'More with less'
>>> re.match('More', text)
<re.Match object; span=(0, 4), match='More'>
>>> re.fullmatch('More', text)
>>> </pre>
<p>The call re.match(‘More’, text) matches the string ‘More’ at the beginning of the string ‘More with less’. But the call re.fullmatch(‘More’, text) does not match the whole text. Therefore, it returns None, and nothing is printed to your shell!</p>
<h2>What’s the Difference Between re.fullmatch() and re.findall()?</h2>
<p>There are two differences between the re.fullmatch(pattern, string) and re.findall(pattern, string) methods:</p>
<ul>
<li>re.fullmatch(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.</li>
<li>re.fullmatch(pattern, string) can only match the whole string, while re.findall(pattern, string) can return multiple matches in the string.</li>
</ul>
<p>Both can be seen in the following example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = 'the 42th truth is 42'
>>> re.fullmatch('.*?42', text)
<re.Match object; span=(0, 20), match='the 42th truth is 42'>
>>> re.findall('.*?42', text)
['the 42', 'th truth is 42']</pre>
<p>Note that the regex .*? matches an arbitrary number of characters but attempts to consume as few characters as possible. This is called a “non-greedy” match (the *? operator). 
The fullmatch() method returns a match object only if the pattern matches the whole string, so here even the non-greedy .*? has to expand until the final ‘42’. The findall() method returns a list of all non-overlapping occurrences. As the match is non-greedy, it finds two such matches. </p>
<h2>What’s the Difference Between re.fullmatch() and re.search()?</h2>
<p>The methods re.fullmatch() and re.search(pattern, string) both return a match object on success. However, re.fullmatch() attempts to match the whole string while re.search() matches anywhere in the string.</p>
<p>You can see this difference in the following code:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = 'Finxter is fun!'
>>> re.search('Finxter', text)
<re.Match object; span=(0, 7), match='Finxter'>
>>> re.fullmatch('Finxter', text)
>>> </pre>
<p>The re.search() method retrieves the match of the ‘Finxter’ substring as a match object. But the re.fullmatch() method returns None because the pattern ‘Finxter’ does not match the whole string ‘Finxter is fun!’. </p>
<h2>How to Use the Optional Flag Argument?</h2>
<p>As you’ve seen in the specification, the fullmatch() method comes with an optional third ‘flags’ argument:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">re.fullmatch(pattern, string, flags=0)</pre>
<p>What’s the purpose of the flags argument?</p>
<p>Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex). 
</p>
<figure class="wp-block-table is-style-stripes">
<table>
<tbody>
<tr> <td><strong>Syntax</strong></td> <td><strong>Meaning</strong></td> </tr>
<tr> <td> <strong>re.ASCII</strong></td> <td>If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests. </td> </tr>
<tr> <td> <strong>re.A</strong> </td> <td>Same as re.ASCII </td> </tr>
<tr> <td> <strong>re.DEBUG</strong> </td> <td>If you use this flag, Python will print some useful information to the shell that helps you debug your regex. </td> </tr>
<tr> <td> <strong>re.IGNORECASE</strong> </td> <td>If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z]. </td> </tr>
<tr> <td> <strong>re.I</strong> </td> <td>Same as re.IGNORECASE </td> </tr>
<tr> <td> <strong>re.LOCALE</strong> </td> <td>Don’t use this flag, ever. Its use is discouraged: the idea was to make \w, \W, \b, \B and case-insensitive matching depend on your current locale. But the locale mechanism isn’t reliable, and the flag works only with bytes patterns. </td> </tr>
<tr> <td> <strong>re.L</strong> </td> <td>Same as re.LOCALE </td> </tr>
<tr> <td> <strong>re.MULTILINE</strong> </td> <td>This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now matches also at the end of each line in a multi-line string. </td> </tr>
<tr> <td> <strong>re.M</strong> </td> <td>Same as re.MULTILINE </td> </tr>
<tr> <td> <strong>re.DOTALL</strong> </td> <td>Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character. 
</td> </tr>
<tr> <td> <strong>re.S</strong> </td> <td>Same as re.DOTALL </td> </tr>
<tr> <td> <strong>re.VERBOSE</strong> </td> <td>To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and everything from an unescaped ‘#’ to the end of the line is treated as a comment. </td> </tr>
<tr> <td> <strong>re.X</strong> </td> <td>Same as re.VERBOSE </td> </tr>
</tbody>
</table>
</figure>
<p>Here’s how you’d use it in a practical example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'></pre>
<p>Although your regex ‘PYTHON’ is all-caps, the flag re.IGNORECASE tells the engine to ignore the capitalization. The example uses re.search(), but the flags argument works the same way for re.fullmatch().</p>
<h2>Where to Go From Here?</h2>
<p><strong>This article has introduced the re.fullmatch(pattern, string) method that attempts to match the whole string and returns a match object if it succeeds or None if it doesn’t.</strong></p>
<p>Learning Python is hard. But if you cheat, it isn’t as hard as it has to be:</p>
<p><a href="https://blog.finxter.com/subscribe/">Download 8 Free Python Cheat Sheets now!</a></p>
</div>
https://www.sickgaming.net/blog/2020/01/12/python-regex-fullmatch/
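<p>As a rough sketch of how the flags argument combines with re.fullmatch() itself, the helper below checks whether a whole string is a six-digit hex color. The function name is_hex_color and the pattern are illustrative assumptions, not part of the original article.</p>

```python
import re

def is_hex_color(value):
    """Return True only if the whole string looks like '#1a2B3c'.

    Hypothetical helper for illustration; not taken from the article.
    """
    # fullmatch() succeeds only when the pattern covers the entire string.
    # re.IGNORECASE lets the class [0-9a-f] also accept the letters A-F.
    match = re.fullmatch(r'#[0-9a-f]{6}', value, flags=re.IGNORECASE)
    return match is not None

print(is_hex_color('#1a2B3c'))         # whole string matches
print(is_hex_color('#1a2B3c; extra'))  # trailing text prevents a full match
print(is_hex_color('#1a2B3'))          # one hex digit too few
```

<p>Because fullmatch() has to consume the entire string, the pattern needs no ‘^’ and ‘$’ anchors.</p>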