[Tut] Python Regex – How to Match the Start of Line (^) and End of Line ($)


[Tut] Python Regex – How to Match the Start of Line (^) and End of Line ($) - xSicKxBot - 02-02-2020

Python Regex – How to Match the Start of Line (^) and End of Line ($)

<div>This article is all about the start of line ^ and end of line $ regular expressions in Python’s <a rel="noreferrer noopener" target="_blank" href="https://docs.python.org/3/library/re.html">re library</a>. These two regexes are fundamental to all regular expressions—even outside the Python world. So invest 5 minutes now and master them once and for all!
<h2>Python Re Start-of-String (^) Regex</h2>
You can use the caret operator ^ to match the beginning of the string. For example, this is useful if you want to ensure that a pattern appears at the beginning of a string. Here’s an example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('^PYTHON', 'PYTHON is fun.')
['PYTHON']</pre>
The findall(pattern, string) method finds all occurrences of the pattern in the string. The caret at the beginning of the pattern ‘^PYTHON’ ensures that you match the word Python only at the beginning of the string. In the previous example, this doesn’t make any difference. But in the next example, it does:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('^PYTHON', 'PYTHON! PYTHON is fun')
['PYTHON']</pre>
Although there are two occurrences of the substring ‘PYTHON’, there’s only one matching substring—at the beginning of the string.
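To see what the caret changes, the following sketch runs the same search with and without it. The variable names are only illustrative.

```python
import re

text = 'PYTHON! PYTHON is fun'

# Without the caret, findall() reports every occurrence.
unanchored = re.findall('PYTHON', text)

# With the caret, only an occurrence at position 0 can match.
anchored = re.findall('^PYTHON', text)

# If the string does not start with the pattern, nothing matches at all.
no_match = re.findall('^PYTHON', 'I like PYTHON')

print(unanchored)  # ['PYTHON', 'PYTHON']
print(anchored)    # ['PYTHON']
print(no_match)    # []
```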
But what if you want to match not only at the beginning of the string but at the beginning of each line in a multi-line string? In other words:
<h3>Python Re Start-of-Line (^) Regex</h3>
By default, the caret operator only applies to the start of a string. So if you’ve got a multi-line string (for example, when reading a text file), it will still only match once: at the beginning of the string.
However, you may want to match at the beginning of each line. For example, you may want to find all lines that start with ‘Python’ in a given string.
You can specify that the caret operator matches the beginning of each line via the re.MULTILINE flag. Here’s an example showing both usages—without and with setting the re.MULTILINE flag:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
Python is great.
Python is the fastest growing
major programming language in
the world.
Pythonistas thrive.'''
>>> re.findall('^Python', text)
[]
>>> re.findall('^Python', text, re.MULTILINE)
['Python', 'Python', 'Python']
>>> </pre>
The first output is the empty list because the string starts with a newline character (the line break right after the opening triple quotes), so ‘Python’ does not appear at the very beginning of the string.
The second output is the list of three matching substrings because the string ‘Python’ appears three times at the beginning of a line.
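As a small sketch of the “find all lines that start with ‘Python’” use case, the pattern below extends the match to the rest of each line. The sample text and variable names are made up for illustration; the inline flag (?m) is the in-pattern equivalent of re.MULTILINE.

```python
import re

text = '''Python is great.
Java is verbose.
Pythonistas thrive.'''

# '.' does not match a newline by default, so '.*' stops at the end of each line.
lines = re.findall(r'^Python.*', text, flags=re.MULTILINE)

# The inline flag (?m) at the start of the pattern switches on MULTILINE as well.
same_lines = re.findall(r'(?m)^Python.*', text)

print(lines)  # ['Python is great.', 'Pythonistas thrive.']
```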
<h3>Python re.sub()</h3>
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in <a href="https://blog.finxter.com/python-regex-sub/">the Finxter blog tutorial</a>.
You can use the caret operator to substitute wherever some pattern appears at the beginning of the string:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.sub('^Python', 'Code', 'Python is \nPython')
'Code is \nPython'</pre>
Only the beginning of the string matches the regex pattern so you’ve got only one substitution.
Again, you can use the re.MULTILINE flag to match the beginning of each line with the caret operator:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('^Python', 'Code', 'Python is \nPython', flags=re.MULTILINE)
'Code is \nCode'</pre>
Now, you replace both appearances of the string ‘Python’.
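Two related points, sketched with the same small input: the caret can be used on its own in re.sub() to prefix every line, and the flag should be passed by keyword, because the fourth positional parameter of re.sub() is count, not flags.

```python
import re

text = 'Python is \nPython'

# '^' alone matches the empty position at the start of each line,
# so the replacement is inserted there.
quoted = re.sub(r'^', '> ', text, flags=re.MULTILINE)
print(repr(quoted))  # '> Python is \n> Python'

# Pitfall: re.MULTILINE has the integer value 8. Passed as the fourth
# positional argument it would be read as count=8, which behaves like this:
as_count = re.sub(r'^Python', 'Code', text, count=int(re.MULTILINE))
print(repr(as_count))  # 'Code is \nPython' (MULTILINE was never switched on)
```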
<h3>Python re.match(), re.search(), re.findall(), and re.fullmatch()</h3>
Let’s quickly recap the most important regex methods in Python:
<ul>
<li>The re.findall(pattern, string, flags=0) method returns a list of string matches. Read more in <a href="https://blog.finxter.com/python-re-findall/">our blog tutorial</a>.</li>
<li>The re.search(pattern, string, flags=0) method returns a match object of the first match. Read more in <a href="https://blog.finxter.com/python-regex-search/">our blog tutorial</a>.</li>
<li>The re.match(pattern, string, flags=0) method returns a match object if the regex matches at the beginning of the string. Read more in <a href="https://blog.finxter.com/python-regex-match/">our blog tutorial</a>.</li>
<li>The re.fullmatch(pattern, string, flags=0) method returns a match object if the regex matches the whole string. Read more in <a href="https://blog.finxter.com/python-regex-fullmatch/">our blog tutorial</a>.</li>
</ul>
You can see that all four methods search for a pattern in a given string. You can use the caret operator ^ within each pattern to match the beginning of the string. Here’s one example per method:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Python is Python'
>>> re.findall('^Python', text)
['Python']
>>> re.search('^Python', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.match('^Python', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.fullmatch('^Python', text)
>>> </pre>
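So you can use the caret operator to match at the beginning of the string. However, it doesn’t make a lot of sense to use it with the match() and fullmatch() methods because they, by definition, start matching at the first character of the string. Note that re.fullmatch() returns None here (the interpreter prints nothing) because the pattern ‘^Python’ doesn’t cover the whole string ‘Python is Python’.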
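A short sketch of why the caret is redundant for these two methods, and of what re.fullmatch() needs in order to succeed. The longer pattern ‘^Python.*’ is an illustrative assumption, not part of the original example.

```python
import re

text = 'Python is Python'

# match() is already anchored at position 0, so the caret changes nothing.
with_caret = re.match('^Python', text)
without_caret = re.match('Python', text)
print(with_caret.span(), without_caret.span())  # (0, 6) (0, 6)

# fullmatch() only succeeds if the pattern consumes the entire string.
partial = re.fullmatch('^Python', text)
whole = re.fullmatch('^Python.*', text)
print(partial)        # None
print(whole.group())  # Python is Python
```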
You can also use the re.MULTILINE flag to match the beginning of each line (rather than only the beginning of the string):
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = '''Python is
Python'''
>>> re.findall('^Python', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('^Python', text, flags=re.MULTILINE)
<re.Match object; span=(0, 6), match='Python'>
>>> re.match('^Python', text, flags=re.MULTILINE)
<re.Match object; span=(0, 6), match='Python'>
>>> re.fullmatch('^Python', text, flags=re.MULTILINE)
>>> </pre>
Again, it’s questionable whether this makes sense for the re.match() and re.fullmatch() methods: even with re.MULTILINE set, they only look for a match at the beginning of the string, never at the beginning of a later line.
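The difference shows up as soon as the pattern only occurs on a later line. A minimal sketch with a made-up two-line string:

```python
import re

text = 'First line\nPython rocks'

# search() scans the whole string; with MULTILINE, '^' also matches after the newline.
found = re.search('^Python', text, flags=re.MULTILINE)
print(found.span())  # (11, 17)

# match() still only tries position 0, even with MULTILINE.
not_found = re.match('^Python', text, flags=re.MULTILINE)
print(not_found)  # None
```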
<h2>Python Re End of String ($) Regex</h2>
Similarly, you can use the dollar-sign operator $ to match the end of the string. Here’s an example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('fun$', 'PYTHON is fun')
['fun']</pre>
The findall() method finds all occurrences of the pattern in the string, but the trailing dollar sign $ ensures that the regex matches only at the end of the string.
This can significantly alter the meaning of your regex as you can see in the next example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('fun$', 'fun fun fun')
['fun']</pre>
Although there are three occurrences of the substring ‘fun’, there’s only one matching substring: the one at the end of the string.
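One detail the examples do not show: by default, $ also matches just before a newline that ends the string, which matters when you process lines read from a file. The sketch below, with a made-up input line, contrasts this with \Z, which matches only at the very end of the string.

```python
import re

# Without '$', every occurrence is found.
all_hits = re.findall('fun', 'fun fun fun')
print(all_hits)  # ['fun', 'fun', 'fun']

line = 'PYTHON is fun\n'  # e.g. a line read from a file, newline included

# '$' tolerates one trailing newline at the end of the string ...
dollar_hits = re.findall(r'fun$', line)
print(dollar_hits)  # ['fun']

# ... whereas '\Z' insists on the absolute end of the string.
strict_hits = re.findall(r'fun\Z', line)
print(strict_hits)  # []
```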
But what if you want to match not only at the end of the string but at the end of each line in a multi-line string?
<h3>Python Re End of Line ($)</h3>
By default, the dollar-sign operator only applies to the end of the string (or the position just before a newline that ends the string). So if you’ve got a multi-line string (for example, when reading a text file), it will still only match once: at the end of the string.
However, you may want to match at the end of each line. For example, you may want to find all lines that end with ‘.py’.
To achieve this, you can specify that the dollar-sign operator matches the end of each line via the re.MULTILINE flag. Here’s an example showing both usages—without and with setting the re.MULTILINE flag:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
Coding is fun
Python is fun
Games are fun
Agreed?'''
>>> re.findall('fun$', text)
[]
>>> re.findall('fun$', text, flags=re.MULTILINE)
['fun', 'fun', 'fun']
>>> </pre>
The first output is the empty list because the string ‘fun’ does not appear at the end of the string. 
The second output is the list of three matching substrings because the string ‘fun’ appears three times at the end of a line.
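The “lines that end with ‘.py’” use case can be sketched in the same way. The file names are invented for illustration.

```python
import re

listing = '''main.py
README.md
utils.py
notes.pyc'''

# r'\.' matches a literal dot; '$' with MULTILINE anchors at the end of each line.
py_files = re.findall(r'^.*\.py$', listing, flags=re.MULTILINE)
print(py_files)  # ['main.py', 'utils.py']
```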
<h3>Python re.sub()</h3>
The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in <a href="https://blog.finxter.com/python-regex-sub/">the Finxter blog tutorial</a>.
You can use the dollar-sign operator to substitute wherever some pattern appears at the end of the string:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.sub('Python$', 'Code', 'Is Python\nPython')
'Is Python\nCode'</pre>
Only the end of the string matches the regex pattern so there’s only one substitution.
Again, you can use the re.MULTILINE flag to match the end of each line with the dollar-sign operator:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('Python$', 'Code', 'Is Python\nPython', flags=re.MULTILINE)
'Is Code\nCode'</pre>
Now, you replace both appearances of the string ‘Python’.
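A common practical use of this combination is stripping trailing whitespace from every line. A minimal sketch with made-up input:

```python
import re

text = 'alpha  \nbeta\t\ngamma '

# With MULTILINE, '$' anchors at the end of every line.
cleaned = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
print(repr(cleaned))    # 'alpha\nbeta\ngamma'

# Without the flag, only the whitespace at the end of the string is removed.
last_only = re.sub(r'[ \t]+$', '', text)
print(repr(last_only))  # 'alpha  \nbeta\t\ngamma'
```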
<h3>Python re.match(), re.search(), re.findall(), and re.fullmatch()</h3>
All four methods—re.findall(), re.search(), re.match(), and re.fullmatch()—search for a pattern in a given string. You can use the dollar-sign operator $ within each pattern to match the end of the string. Here’s one example per method:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Python is Python'
>>> re.findall('Python$', text)
['Python']
>>> re.search('Python$', text)
<re.Match object; span=(10, 16), match='Python'>
>>> re.match('Python$', text)
>>> re.fullmatch('Python$', text)
>>></pre>
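So you can use the dollar-sign operator to match at the end of the string. Here, re.match() and re.fullmatch() both return None (the interpreter prints nothing): anchored at the first character, ‘Python’ is followed by more text rather than by the end of the string. Also note that it doesn’t make a lot of sense to use the dollar sign with the fullmatch() method, as it, by definition, already requires that the last character of the string is part of the matching substring.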
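Put together, ^ and $ with re.search() give much the same effect as re.fullmatch() without anchors, which is a common way to validate input. The sketch below uses two hypothetical helpers, is_number and is_number_anchored, and a digits-only pattern chosen purely for illustration.

```python
import re

def is_number(s):
    """Return True if s consists of one or more digits and nothing else."""
    return re.fullmatch(r'\d+', s) is not None

def is_number_anchored(s):
    """Same idea with explicit anchors; '$' also tolerates one trailing newline."""
    return re.search(r'^\d+$', s) is not None

print(is_number('12345'), is_number_anchored('12345'))  # True True
print(is_number('123a'), is_number_anchored('123a'))    # False False
```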
You can also use the re.MULTILINE flag to match the end of each line (rather than only the end of the whole string):
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = '''Is Python
Python'''
>>> re.findall('Python$', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('Python$', text, flags=re.MULTILINE)
<re.Match object; span=(3, 9), match='Python'>
>>> re.match('Python$', text, flags=re.MULTILINE)
>>> re.fullmatch('Python$', text, flags=re.MULTILINE)
>>></pre>
As the pattern doesn’t match at the very beginning of the string, both re.match() and re.fullmatch() return None.
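re.search() only reports the first match. If you need the position of every line ending that matches, re.finditer() (another function of the re module, not covered in this tutorial) yields one match object per hit. A minimal sketch:

```python
import re

text = '''Is Python
Python'''

# Collect the (start, end) span of every match at the end of a line.
spans = [m.span() for m in re.finditer('Python$', text, flags=re.MULTILINE)]
print(spans)  # [(3, 9), (10, 16)]
```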
<h2>How to Match the Caret (^) or Dollar ($) Symbols in Your Regex?</h2>
You know that the caret and dollar symbols have a special meaning in Python’s regular expression module: they match the beginning or end of each string/line. But what if you search for the caret (^) or dollar ($) symbols themselves? How can you match them in a string?
The answer is simple: escape the caret or dollar symbols in your regular expression using the backslash. In particular, use ‘\^’ instead of ‘^’ and ‘\$’ instead of ‘$’. Write the patterns as raw strings (with the r prefix) so that Python passes the backslash to the regex engine unchanged; in a normal string literal, ‘\^’ and ‘\$’ are invalid escape sequences that trigger a warning in recent Python versions. Here’s an example:
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'The product ^^^ costs $3 today.'
>>> re.findall(r'\^', text)
['^', '^', '^']
>>> re.findall(r'\$', text)
['$']</pre>
By escaping the special symbols ^ and $, you tell the regex engine to ignore their special meaning.
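Two alternatives to a hand-written backslash, sketched here as options rather than recommendations: re.escape() escapes special characters in a literal string for you, and inside a character class both symbols are literal (the caret only as long as it is not the first character).

```python
import re

text = 'The product ^^^ costs $3 today.'

# re.escape() builds a pattern that matches the given text literally.
price_pattern = re.escape('$3')
prices = re.findall(price_pattern, text)
print(prices)   # ['$3']

# Inside [...], '$' is literal, and '^' is literal unless it comes first.
symbols = re.findall(r'[$^]', text)
print(symbols)  # ['^', '^', '^', '$']
```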
<h2>Where to Go From Here?</h2>
You’ve learned everything you need to know about the caret operator ^ and the dollar-sign operator $ in this regex tutorial. 
Summary: The caret operator ^ matches at the beginning of a string. The dollar-sign operator $ matches at the end of a string. If you want to match at the beginning or end of each line in a multi-line string, you can set the re.MULTILINE flag in all the relevant re methods.
</div>

https://www.sickgaming.net/blog/2020/02/01/python-regex-how-to-match-the-start-of-line-and-end-of-line/