[Tut] Python Regex Capturing Groups – A Helpful Guide (+Video) - xSicKxBot - 04-07-2023 Python Regex Capturing Groups – A Helpful Guide (+Video) <div>
<p><strong>Python’s regex capturing groups allow you to extract parts of a string that match a pattern. </strong></p> <ul> <li><strong>Enclose the desired pattern in parentheses <code>()</code> to create a capturing group. </strong></li> <li><strong>Use <code><a rel="noreferrer noopener" href="https://blog.finxter.com/python-regex-search/" data-type="URL" data-id="https://blog.finxter.com/python-regex-search/" target="_blank">re.search()</a></code> to find matches, and access captured groups with the <code>.group()</code> method or by indexing the result. </strong></li> </ul> <p><strong>For example: <code>match = re.search(r'(\d+)', 'abc123')</code> captures the digits, and <code>match.group(1)</code> returns <code>'123'</code>.</strong></p> <hr class="wp-block-separator has-alpha-channel-opacity"/> <p>One of the powerful aspects of Python’s regular expression capabilities is the use of <strong><em>capturing groups</em></strong>. By using capturing groups, you can easily extract specific portions of a matching string and efficiently process and manipulate data that meets a particular pattern.</p> <figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/python-regex-capturing-groups-a-helpful-guide-video/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2FJwkciuqcDH4%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure> <p>I like to use capturing groups to isolate and extract relevant data from a given text. To define a capturing group, I simply place the desired regex rule within parentheses, like this: <code>(rule)</code>. 
This helps me match portions of a string based on the rule and output the captured data for further processing.</p> <p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Tip</strong>: An essential technique I employ while working with capturing groups is using the <code><a rel="noreferrer noopener" href="https://blog.finxter.com/python-regex-finditer/" data-type="post" data-id="17635" target="_blank">finditer()</a></code> method, as it finds all the matches and returns an <a rel="noreferrer noopener" href="https://blog.finxter.com/iterators-iterables-and-itertools/" data-type="post" data-id="29507" target="_blank">iterator</a> yielding one match object per match of the regex pattern. I can then iterate through the <code>match</code> objects and extract the captured values from each.</p> <p>Before I teach you everything about capturing groups, allow me to give some background information on Python regular expressions. If you’re already an expert, you can <a href="#groups" target="_blank" rel="noreferrer noopener">jump directly to the “capturing groups”</a> part of the article.</p> <h2 class="wp-block-heading">Understanding Regular Expressions</h2> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="925" height="612" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-74.png" alt="" class="wp-image-1271835" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-74.png 925w, https://blog.finxter.com/wp-content/uploads/2023/04/image-74-300x198.png 300w, https://blog.finxter.com/wp-content/uploads/2023/04/image-74-768x508.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure> <p>As someone who works with Python, I often find myself using regular expressions. 
</p> <p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="??" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/python-regex/" data-type="post" data-id="6210" target="_blank" rel="noreferrer noopener">Python Regex Superpower [Full Tutorial]</a></p> <p>They provide a powerful tool for dealing with strings, patterns, and parsing text data. In this section, I’ll guide you through the basics of regular expressions and shed some light on capturing groups, which can be extremely helpful in many situations. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f60a.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h3 class="wp-block-heading">Basic Syntax</h3> <p>Regular expressions, or regex, are patterns that represent varying sets of characters. In Python, we can use the <code>re</code> module to perform various operations with regular expressions. A key component of regex is the set of metacharacters, which help define specific patterns. 
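</p> <p>As a minimal sketch of how the <code>re</code> module applies such patterns (the sample sentence and variable names are made up for illustration), <code>re.findall()</code> can show what a few common metacharacters pick out of a string:</p>

```python
import re

# Made-up sample sentence, used only for illustration.
text = "Order 66 shipped to Bay_7 on day 3"

digits = re.findall(r'\d', text)     # each single digit
words = re.findall(r'\w+', text)     # runs of letters, digits, and underscores
spaces = re.findall(r'\s', text)     # each whitespace character
after_b = re.findall(r'B.', text)    # 'B' followed by any one character

print(digits)
print(words)
print(len(spaces))
print(after_b)
```

<p>Each call returns a list of the non-overlapping matches, in the order in which they appear in the text.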
</p> <p>Some common metacharacters are:</p> <ul> <li><code>.</code> – matches any single character except a newline</li> <li><code>\w</code> – matches any word character (letters, digits, and underscores)</li> <li><code>\d</code> – matches any digit (0-9)</li> <li><code>\s</code> – matches any whitespace character (including spaces, tabs, and newlines)</li> </ul> <p>Note that <code>.</code> is special on its own, whereas <code>\w</code>, <code>\d</code>, and <code>\s</code> need the leading backslash: without it, <code>w</code>, <code>d</code>, and <code>s</code> simply match those literal letters. To match a literal dot, escape it as <code>\.</code>.</p> <h3 class="wp-block-heading">Special Characters</h3> <p>Several special characters act as quantifiers that control how often the preceding element (a character, character class, or group) may repeat:</p> <ul> <li><code>*</code> – matches zero or more occurrences of the preceding element</li> <li><code>+</code> – matches one or more occurrences of the preceding element</li> <li><code>?</code> – matches zero or one occurrence of the preceding element</li> <li><code>{n}</code> – matches exactly <code>n</code> occurrences of the preceding element</li> <li><code>{n,m}</code> – matches a minimum of <code>n</code> and a maximum of <code>m</code> occurrences of the preceding element</li> </ul> <p>These <a href="https://blog.finxter.com/regex-special-characters-examples-in-python-re/" data-type="post" data-id="6421" target="_blank" rel="noreferrer noopener">special characters</a> can be combined with metacharacters and other characters to create complex patterns. In my experience, Python’s regex capturing groups are incredibly useful for extracting and manipulating specific parts of text data. Once you get the hang of it, you’ll find many ways to leverage these tools for your projects. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="?" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h2 class="wp-block-heading">Python Regex Module</h2> <div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="925" height="616" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-75.png" alt="" class="wp-image-1271837" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-75.png 925w, https://blog.finxter.com/wp-content/uploads/2023/04/image-75-300x200.png 300w, https://blog.finxter.com/wp-content/uploads/2023/04/image-75-768x511.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure> </div> <p>In this section, I will share my knowledge on importing the regex module and some useful common functions when working with Python regex capturing groups. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f60a.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h3 class="wp-block-heading">Importing the Module</h3> <p>Before I can use the regex module, I need to import it into my Python script. To do so, I simply add the following line of code at the beginning of my script:</p> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re </pre> <p>After importing the <code>re</code> module, you can start using regular expressions to perform various text searching and manipulation tasks. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h3 class="wp-block-heading">Common Functions</h3> <p>The Python regex module has several helpful functions that make working with regular expressions easier. 
Some of the most commonly used functions include:</p> <ul> <li><strong><code><a href="https://blog.finxter.com/python-regex-compile/" data-type="post" data-id="5783" target="_blank" rel="noreferrer noopener">re.compile()</a></code></strong>: Compiles a regular expression pattern into an object for later use. The pattern can then be applied to various texts using the object’s methods. Example:</li> </ul> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">pattern = re.compile(r'\d+') </pre> <ul> <li><strong><code><a href="https://blog.finxter.com/python-regex-search/" target="_blank" rel="noreferrer noopener">re.search()</a></code></strong>: Searches the given string for a match to the specified pattern. Returns a match object if a match is found, and <code>None</code> if no matches are found. Example:</li> </ul> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">result = re.search(pattern, "Hello 123 World!") </pre> <ul> <li><strong><code><a href="https://blog.finxter.com/python-re-findall/" data-type="URL" data-id="https://blog.finxter.com/python-re-findall/" target="_blank" rel="noreferrer noopener">re.findall()</a></code></strong>: Returns a list of all non-overlapping matches of the pattern in the target string. If no matches are found, an empty list is returned. 
Example:</li> </ul> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">result = re.findall(pattern, "My number is 555-1234, and my friend's number is 555-5678") </pre> <ul> <li><strong><code><a href="https://blog.finxter.com/python-regex-finditer/" data-type="post" data-id="17635" target="_blank" rel="noreferrer noopener">re.finditer()</a></code></strong>: Returns an iterator containing match objects for all non-overlapping matches in the target string. Example:</li> </ul> <pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">result = re.finditer(pattern, "I have 3 cats, 2 dogs, and 1 turtle") </pre> <p>By using these functions, I can effectively search and manipulate text data using regular expressions. Python regex capturing groups make it even simpler to extract specific pieces of information from the text. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f3af.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h2 class="wp-block-heading" id="groups">Capturing Groups</h2> <figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="925" height="615" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-76.png" alt="" class="wp-image-1271838" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-76.png 925w, https://blog.finxter.com/wp-content/uploads/2023/04/image-76-300x199.png 300w, https://blog.finxter.com/wp-content/uploads/2023/04/image-76-768x511.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure> <p>As I dive into Python regex, one concept that has consistently come up is capturing groups. 
These groups simplify the process of isolating parts of a matched string for further use. In this section, I’ll discuss creating capturing groups, referencing captured groups, and the concept of non-capturing groups. Let’s dive in! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f30a.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h3 class="wp-block-heading">Creating Capturing Groups</h3> <p>Creating a capturing group is as simple as encasing a part of a regular expression pattern in parentheses. For instance, if I have the pattern <code>(\d+)-(\d+)</code>, there are two capturing groups: one for each set of digits. </p> <p>You can see this in action using the Python regex library like this:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

pattern = re.compile(r'(\d+)-(\d+)')
match = pattern.search('Product: 123-456')
</pre>
<p>Now, the <code>match</code> object contains two captured groups <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f3c6.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" />: one for <code>'123'</code> and another for <code>'456'</code>.</p> <h3 class="wp-block-heading">Referencing Captured Groups</h3> <p class="has-global-color-8-background-color has-background">After capturing groups, you might want to reference them for various operations. Using the <code>group()</code> method, you can obtain the values captured. You can access them by their index, where <code>group(0)</code> represents the entire matched string, and <code>group(1)</code>, <code>group(2)</code>, etc., correspond to the subsequent captured groups. 
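</p> <p>As a minimal sketch of these indexing rules, the snippet below reuses the pattern and input string from the “Creating Capturing Groups” example. The <code>groups()</code> method, the multi-index call <code>group(1, 2)</code>, and the subscript form <code>match[1]</code> are standard <code>re</code> features that this guide does not otherwise cover; the variable names are chosen only for illustration:</p>

```python
import re

# Same pattern and input as in the "Creating Capturing Groups" example.
pattern = re.compile(r'(\d+)-(\d+)')
match = pattern.search('Product: 123-456')

# search() returns None when nothing matches, so check before calling group().
if match is not None:
    whole = match.group(0)      # the entire matched text
    both = match.groups()       # all captured groups as a tuple
    pair = match.group(1, 2)    # several indices at once, also a tuple
    first = match[1]            # subscript shorthand for match.group(1)
    print(whole, both, pair, first)
```

<p>The subscript form <code>match[1]</code> requires Python 3.6 or later; on older versions, use <code>match.group(1)</code>.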
</p> <p>In my previous example, I can quickly access the captured groups like this:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">first_group = match.group(1)   # '123'
second_group = match.group(2)  # '456'
</pre>
<p>Pretty straightforward, right? <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f604.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h3 class="wp-block-heading">Non-Capturing Groups</h3> <p>Sometimes, you want a group only for the regex pattern, without capturing its content. This can be achieved by using non-capturing groups. To create one, add <code>?:</code> following the opening parenthesis: <code>(?:...)</code>.</p> <p>Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

pattern = re.compile(r'(?:ID: )(\d+)')
match = pattern.search('User ID: 789')
</pre>
<p>In this case, the <code>'ID: '</code> portion is within a non-capturing group, and only the digits afterwards are captured. Now, if I reference the captured group, I only get the user ID:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">user_id = match.group(1)  # '789'
</pre>
<p>And there you have it! I hope this illustrates the basics of Python regex capturing groups, including creating captures, referencing them, and when to use non-capturing groups. Happy regex-ing! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="?" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h2 class="wp-block-heading">Advanced Techniques</h2> <figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="925" height="617" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-77.png" alt="" class="wp-image-1271839" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-77.png 925w, https://blog.finxter.com/wp-content/uploads/2023/04/image-77-300x200.png 300w, https://blog.finxter.com/wp-content/uploads/2023/04/image-77-768x512.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure> <p>In this section, I will discuss some advanced techniques for working with capturing groups in Python regular expressions. These techniques, such as <strong>named capturing groups</strong> and <strong>conditional matching</strong>, can make your regex patterns more powerful and easier to read. Let’s dive in! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f30a.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h3 class="wp-block-heading">Named Capturing Groups</h3> <p>Named capturing groups allow you to assign a name to a specific capturing group. This makes your regex patterns more readable and easier to understand. In Python, you can define a named capturing group using the following syntax: <code>(?P&lt;name&gt;...)</code>, where “name” is the desired name for the group, and “…” represents the pattern you want to capture.</p> <p>For example, let’s say I want to extract dates with the format “<code>MM/DD/YYYY</code>”. 
Here’s how I can use named capturing groups:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

pattern = r"(?P&lt;month&gt;\d\d)/(?P&lt;day&gt;\d\d)/(?P&lt;year&gt;\d\d\d\d)"
date_string = "12/25/2020"
match = re.search(pattern, date_string)

if match:
    print('Month:', match.group('month'))
    print('Day:', match.group('day'))
    print('Year:', match.group('year'))
</pre>
<p>This will output:</p>
<pre class="wp-block-preformatted"><code>Month: 12
Day: 25
Year: 2020
</code></pre>
<p>As you can see, using named capturing groups made our regex pattern more readable, and accessing the captured groups is much simpler. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f60a.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <p class="has-base-2-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f469-200d-1f4bb.png" alt="??" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/python-regex-named-groups/" data-type="post" data-id="836544" target="_blank" rel="noreferrer noopener">Named Capturing Groups Made Easy</a></p> <h3 class="wp-block-heading">Conditional Matching</h3> <p class="has-global-color-8-background-color has-background">Conditional matching in regex allows you to match different patterns depending on whether a specific capturing group took part in the match. In Python, you can use the following syntax for conditional matching: <code>(?(id)yes|no)</code>, where “<code>id</code>” is the number or name of a capturing group. The “<code>yes</code>” pattern is tried if that group matched; otherwise the “<code>no</code>” pattern, which is optional, is tried.</p> <p>For example, let’s say I want to find all occurrences of the word <code>"color"</code> or <code>"colour"</code> in a text. 
I can use conditional matching to achieve this:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

# If the optional group 1 ('ou') matched, expect 'r'; otherwise expect 'or'.
pattern = r"col(ou)?(?(1)r|or)"
text = "I like the color red. My favourite colour is blue."

for match in re.finditer(pattern, text):
    print(match.group(0), match.group(1))
</pre>
<p>This will output:</p>
<pre class="wp-block-preformatted"><code>color None
colour ou
</code></pre>
<p>Here, we used conditional matching to identify both the American and British spellings of <code>"color/colour"</code>. Group 1 is <code>None</code> for the American spelling and <code>'ou'</code> for the British one. For this particular task the simpler pattern <code>colou?r</code> would also work; the conditional is shown only to illustrate the syntax. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f3a8.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <p>I hope you find these advanced techniques useful in your Python regex adventures. Good luck exploring even more regex possibilities! <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f40d.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> <h2 class="wp-block-heading">Practical Examples</h2> <figure class="wp-block-image size-full"><img decoding="async" loading="lazy" width="925" height="609" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-73.png" alt="" class="wp-image-1271832" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-73.png 925w, https://blog.finxter.com/wp-content/uploads/2023/04/image-73-300x198.png 300w, https://blog.finxter.com/wp-content/uploads/2023/04/image-73-768x506.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure> <p>In this section, I’ll demonstrate a couple of practical examples using Python regex capturing groups, <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f40d.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f9e9.png" alt="?" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /> focusing on email validation and URL parsing.</p> <h3 class="wp-block-heading">Email Validation</h3> <p>Validating email addresses is a common task in many applications. Using capturing groups, I can create a regex pattern to match and validate email addresses. Note that this pattern is a simplified check and does not cover every address the email standards allow. First, here’s the regex pattern:</p> <pre class="wp-block-preformatted"><code>'^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})$'</code></pre> <p>In this pattern, I’ve used several capturing groups:</p> <ul> <li>The first group <code>([a-zA-Z0-9._%+-]+)</code> captures the username part of the email address. It allows letters, digits, and the characters <code>._%+-</code>.</li> <li>The second group <code>([a-zA-Z0-9.-]+)</code> captures the domain name, which consists of letters, digits, dots, and hyphens.</li> <li>The third group <code>([a-zA-Z]{2,})</code> captures the top-level domain, consisting of at least two letters.</li> </ul> <p>Now, let’s use this regex pattern in a Python function to validate an email address:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

def validate_email(email):
    pattern = r'^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})$'
    # Match the input email against the pattern; re.match() returns None on failure
    return re.match(pattern, email) is not None
</pre>
<h3 class="wp-block-heading">URL Parsing</h3> <p>In this example, I’ll show you how to use capturing groups to parse and extract components from a URL. 
Let’s start with the regex pattern:</p> <pre class="wp-block-preformatted"><code>'^(https?)://([^\s/:]+)(:\d+)?(/)?(.*)?$'</code></pre> <p>In this pattern, I’ve used several capturing groups:</p> <ul> <li>The first group <code>(https?)</code> captures the protocol (http or https).</li> <li>The second group <code>([^\s/:]+)</code> captures the domain name.</li> <li>The third group <code>(:\d+)?</code> captures the optional port number, including the leading colon (for example <code>:8080</code>).</li> <li>The fourth group <code>(/)?</code> captures the optional slash after the domain and port.</li> <li>The fifth group <code>(.*)?</code> captures the remaining URL path, if any.</li> </ul> <p>Now, let’s create a Python function to extract the components from a URL:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import re

def parse_url(url):
    pattern = r'^(https?)://([^\s/:]+)(:\d+)?(/)?(.*)?$'
    match = re.match(pattern, url)  # Match the input URL against the pattern
    if match:
        return {
            'protocol': match.group(1),
            'domain': match.group(2),
            'port': match.group(3),   # includes the leading ':'; None if absent
            'slash': match.group(4),  # None if absent
            'path': match.group(5)
        }
    else:
        return None
</pre>
<p>With this <code>parse_url</code> function, I can now extract and analyze various components of a URL. <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f310.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f50d.png" alt="?" 
class="wp-smiley" style="height: 1em; max-height: 1em;" /></p> </div> https://www.sickgaming.net/blog/2023/04/06/python-regex-capturing-groups-a-helpful-guide-video/ |