[Tut] Python Regex – How to Match the Start of Line (^) and End of Line ($) - xSicKxBot - 02-02-2020
<div><p>This article is all about the <strong>start-of-line ^ and end-of-line $ regular expressions in Python’s <a rel="noreferrer noopener" target="_blank" href="https://docs.python.org/3/library/re.html">re library</a>.</strong> These two anchors are fundamental to regular expressions, even outside the Python world. So invest 5 minutes now and master them once and for all!</p>
<h2>Python Re Start-of-String (^) Regex</h2>
<p>You can use the caret operator ^ to match the beginning of the string. This is useful if you want to ensure that a pattern appears at the beginning of a string. Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('^PYTHON', 'PYTHON is fun.')
['PYTHON']</pre>
<p>The findall(pattern, string) method finds all occurrences of the pattern in the string. The caret at the beginning of the pattern ‘^PYTHON’ ensures that you match the word ‘PYTHON’ only at the beginning of the string. In the previous example, this doesn’t make any difference. But in the next example, it does:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('^PYTHON', 'PYTHON! PYTHON is fun')
['PYTHON']</pre>
<p>Although there are two occurrences of the substring ‘PYTHON’, there’s only one matching substring, at the beginning of the string.</p>
<p>But what if you want to match not only at the beginning of the string but at the beginning of each line in a multi-line string? 
In other words:</p>
<h3>Python Re Start-of-Line (^) Regex</h3>
<p>By default, the caret operator only applies to the start of a string. So if you’ve got a multi-line string (for example, when reading a text file), it will still only match once: at the beginning of the string.</p>
<p>However, you may want to match at the beginning of each line. For example, you may want to find all lines that start with ‘Python’ in a given string.</p>
<p>You can specify that the caret operator matches the beginning of each line via the re.MULTILINE flag. Here’s an example showing both usages, without and with the re.MULTILINE flag:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
... Python is great.
... Python is the fastest growing major programming language in the world.
... Pythonistas thrive.'''
>>> re.findall('^Python', text)
[]
>>> re.findall('^Python', text, re.MULTILINE)
['Python', 'Python', 'Python']</pre>
<p>The first output is the empty list because the string starts with a newline character, so ‘Python’ does not appear at the very beginning of the string.</p>
<p>The second output is the list of three matching substrings because the string ‘Python’ appears three times at the beginning of a line.</p>
<h3>Python re.sub()</h3>
<p><strong>The re.sub(pattern, repl, string, count=0, flags=0)</strong> method returns a new string where all occurrences of the pattern in the old string are replaced by repl. 
Read more in <a href="https://blog.finxter.com/python-regex-sub/">the Finxter blog tutorial</a>.</p>
<p>You can use the caret operator to substitute a pattern only where it appears at the beginning of the string:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.sub('^Python', 'Code', 'Python is \nPython')
'Code is \nPython'</pre>
<p>Only the beginning of the string matches the regex pattern, so you get only one substitution.</p>
<p>Again, you can use the re.MULTILINE flag to match the beginning of each line with the caret operator:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('^Python', 'Code', 'Python is \nPython', flags=re.MULTILINE)
'Code is \nCode'</pre>
<p>Now, you replace both appearances of the string ‘Python’.</p>
<h3>Python re.match(), re.search(), re.findall(), and re.fullmatch()</h3>
<p>Let’s quickly recap the most important regex methods in Python:</p>
<ul>
<li>The <strong>re.findall(pattern, string, flags=0)</strong> method returns a list of string matches. Read more in <a href="https://blog.finxter.com/python-re-findall/">our blog tutorial</a>.</li>
<li>The <strong>re.search(pattern, string, flags=0)</strong> method returns a match object of the first match. Read more in <a href="https://blog.finxter.com/python-regex-search/">our blog tutorial</a>.</li>
<li>The <strong>re.match(pattern, string, flags=0)</strong> method returns a match object if the regex matches at the beginning of the string. 
Read more in <a href="https://blog.finxter.com/python-regex-match/">our blog tutorial</a>.</li>
<li>The <strong>re.fullmatch(pattern, string, flags=0)</strong> method returns a match object if the regex matches the whole string. Read more in <a href="https://blog.finxter.com/python-regex-fullmatch/">our blog tutorial</a>.</li>
</ul>
<p>You can see that all four methods search for a pattern in a given string. You can use the caret operator ^ within each pattern to match the beginning of the string. Here’s one example per method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Python is Python'
>>> re.findall('^Python', text)
['Python']
>>> re.search('^Python', text)
&lt;re.Match object; span=(0, 6), match='Python'&gt;
>>> re.match('^Python', text)
&lt;re.Match object; span=(0, 6), match='Python'&gt;
>>> re.fullmatch('^Python', text)
>>> </pre>
<p>The last call prints nothing because re.fullmatch() returns None: the pattern ‘^Python’ does not cover the whole string.</p>
<p>So you can use the caret operator to match at the beginning of the string. 
However, you should note that it doesn’t make a lot of sense to use it for the match() and fullmatch() methods as they, by definition, start by trying to match the first character of the string.</p>
<p>You can also use the re.MULTILINE flag to match the beginning of each line (rather than only the beginning of the string):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = '''Python is
... Python'''
>>> re.findall('^Python', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('^Python', text, flags=re.MULTILINE)
&lt;re.Match object; span=(0, 6), match='Python'&gt;
>>> re.match('^Python', text, flags=re.MULTILINE)
&lt;re.Match object; span=(0, 6), match='Python'&gt;
>>> re.fullmatch('^Python', text, flags=re.MULTILINE)
>>> </pre>
<p>Again, the flag makes no difference for the re.match() and re.fullmatch() methods: even with re.MULTILINE, they only look for a match at the beginning of the string, not at the beginning of each line.</p>
<h2>Python Re End of String ($) Regex</h2>
<p>Similarly, you can use the dollar-sign operator $ to match the end of the string. 
Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('fun$', 'PYTHON is fun')
['fun']</pre>
<p>The findall() method finds all occurrences of the pattern in the string, but the trailing dollar-sign $ ensures that the regex matches only at the end of the string.</p>
<p>This can significantly alter the meaning of your regex, as you can see in the next example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.findall('fun$', 'fun fun fun')
['fun']</pre>
<p>Although there are three occurrences of the substring ‘fun’, there’s only one matching substring, at the end of the string.</p>
<p>But what if you want to match not only at the end of the string but at the end of each line in a multi-line string?</p>
<h3>Python Re End of Line ($)</h3>
<p>By default, the dollar-sign operator only applies to the end of a string (or the position just before a newline that ends the string). So if you’ve got a multi-line string (for example, when reading a text file), it will still only match once: at the end of the string.</p>
<p>However, you may want to match at the end of each line. For example, you may want to find all lines that end with ‘.py’.</p>
<p>To achieve this, you can specify that the dollar-sign operator matches the end of each line via the re.MULTILINE flag. 
Here’s an example showing both usages, without and with the re.MULTILINE flag:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = '''
... Coding is fun
... Python is fun
... Games are fun
... Agreed?'''
>>> re.findall('fun$', text)
[]
>>> re.findall('fun$', text, flags=re.MULTILINE)
['fun', 'fun', 'fun']</pre>
<p>The first output is the empty list because the string ends with ‘Agreed?’, so ‘fun’ does not appear at the end of the string.</p>
<p>The second output is the list of three matching substrings because the string ‘fun’ appears three times at the end of a line.</p>
<h3>Python re.sub()</h3>
<p><strong>The re.sub(pattern, repl, string, count=0, flags=0)</strong> method returns a new string where all occurrences of the pattern in the old string are replaced by repl. Read more in <a href="https://blog.finxter.com/python-regex-sub/">the Finxter blog tutorial</a>.</p>
<p>You can use the dollar-sign operator to substitute a pattern only where it appears at the end of the string:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.sub('Python$', 'Code', 'Is Python\nPython')
'Is Python\nCode'</pre>
<p>Only the end of the string matches the regex pattern, so there’s only one substitution.</p>
<p>Again, you can use the re.MULTILINE flag to match the end of each line with the dollar-sign operator:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> re.sub('Python$', 'Code', 'Is Python\nPython', flags=re.MULTILINE)
'Is Code\nCode'</pre>
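<p>To check how many substitutions each variant makes, a minimal sketch can use re.subn(), which works like re.sub() but also returns the number of replacements. The variable names below are only illustrative:</p>

```python
import re

text = 'Is Python\nPython'

# Default mode: $ anchors only at the end of the whole string.
default_result, n_default = re.subn('Python$', 'Code', text)

# re.MULTILINE: $ also anchors just before each newline.
multiline_result, n_multiline = re.subn('Python$', 'Code', text, flags=re.MULTILINE)

print(n_default, repr(default_result))      # 1 'Is Python\nCode'
print(n_multiline, repr(multiline_result))  # 2 'Is Code\nCode'
```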
<p>Now, you replace both appearances of the string ‘Python’.</p>
<h3>Python re.match(), re.search(), re.findall(), and re.fullmatch()</h3>
<p>All four methods (re.findall(), re.search(), re.match(), and re.fullmatch()) search for a pattern in a given string. You can use the dollar-sign operator $ within each pattern to match the end of the string. Here’s one example per method:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'Python is Python'
>>> re.findall('Python$', text)
['Python']
>>> re.search('Python$', text)
&lt;re.Match object; span=(10, 16), match='Python'&gt;
>>> re.match('Python$', text)
>>> re.fullmatch('Python$', text)
>>></pre>
<p>So you can use the dollar-sign operator to match at the end of the string. However, you should note that it doesn’t make a lot of sense to use it for the fullmatch() method as it, by definition, already requires that the last character of the string is part of the matching substring.</p>
<p>You can also use the re.MULTILINE flag to match the end of each line (rather than only the end of the whole string):</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> text = '''Is Python
... Python'''
>>> re.findall('Python$', text, flags=re.MULTILINE)
['Python', 'Python']
>>> re.search('Python$', text, flags=re.MULTILINE)
&lt;re.Match object; span=(3, 9), match='Python'&gt;
>>> re.match('Python$', text, flags=re.MULTILINE)
>>> re.fullmatch('Python$', text, flags=re.MULTILINE)
>>></pre>
<p>As the pattern doesn’t match the string prefix, both re.match() and re.fullmatch() return None, so the shell prints nothing.</p>
<h2>How to Match the Caret (^) or Dollar ($) Symbols in Your Regex?</h2>
<p>You know 
that the caret and dollar symbols have a special meaning in Python’s regular expression module: they match the beginning or end of a string or line. But what if you search for the caret (^) or dollar ($) symbols themselves? How can you match them in a string?</p>
<p>The answer is simple: escape the caret or dollar symbols in your regular expression using the backslash. In particular, use ‘\^’ instead of ‘^’ and ‘\$’ instead of ‘$’. Write the pattern as a raw string (with the r prefix) so that the backslash reaches the regex engine unchanged and Python does not warn about an invalid escape sequence. Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> text = 'The product ^^^ costs $3 today.'
>>> re.findall(r'\^', text)
['^', '^', '^']
>>> re.findall(r'\$', text)
['$']</pre>
<p>By escaping the special symbols ^ and $, you tell the regex engine to ignore their special meaning.</p>
<h2>Where to Go From Here?</h2>
<p>You’ve learned the essentials of the caret operator ^ and the dollar-sign operator $ in this regex tutorial.</p>
<p><strong>Summary</strong>: <em>The caret operator ^ matches at the beginning of a string. The dollar-sign operator $ matches at the end of a string. If you want to match at the beginning or end of each line in a multi-line string, you can set the re.MULTILINE flag in all the relevant re methods.</em></p>
</div>
https://www.sickgaming.net/blog/2020/02/01/python-regex-how-to-match-the-start-of-line-and-end-of-line/