{"id":108031,"date":"2020-01-25T12:37:14","date_gmt":"2020-01-25T12:37:14","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=5861"},"modified":"2020-01-25T12:37:14","modified_gmt":"2020-01-25T12:37:14","slug":"python-regex-sub","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2020\/01\/25\/python-regex-sub\/","title":{"rendered":"Python Regex Sub"},"content":{"rendered":"<p>Do you want to replace all occurrences of a pattern in a string? You&#8217;re in the right place! This article is all about the <strong>re.sub(pattern, repl, string)<\/strong> method of Python&#8217;s&nbsp;<a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/docs.python.org\/3\/library\/re.html\">re library<\/a>. <\/p>\n<p>Let&#8217;s answer the following question:<\/p>\n<h2>How Does re.sub() Work in Python?<\/h2>\n<p><strong>The <strong>re.sub(pattern, repl, string, count=0, flags=0)<\/strong> method returns a new string where all occurrences of the pattern in the old string are replaced by repl.<\/strong><\/p>\n<p>Here&#8217;s a minimal example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import re\n>>> text = 'C++ is the best language. C++ rocks!'\n>>> re.sub(r'C\\+\\+', 'Python', text)\n'Python is the best language. Python rocks!'<\/pre>\n<p>The text contains two occurrences of the string &#8216;C++&#8217;. You use the re.sub() method to find all of those occurrences and replace each of them with the new string &#8216;Python&#8217; (Python is the best language, after all).<\/p>\n<p>Note that you must escape each &#8216;+&#8217; symbol in &#8216;C++&#8217;. Otherwise, the regex engine would interpret it as the <em>at-least-one<\/em> quantifier. The raw string prefix <code>r<\/code> keeps Python from treating the backslashes as string escape sequences. 
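<\/p>\n<p>If you&#8217;d rather not escape special characters by hand, the re.escape() function can do it for you. Here&#8217;s a minimal sketch that builds the pattern from the literal string &#8216;C++&#8217;. The variable names are just for illustration:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import re\n\ntext = 'C++ is the best language. C++ rocks!'\n# re.escape() puts a backslash in front of regex metacharacters such as '+'\npattern = re.escape('C++')\nprint(pattern)\n# C\\+\\+\nprint(re.sub(pattern, 'Python', text))\n# Python is the best language. Python rocks!<\/pre>\n<p>This approach is handy whenever the text you want to replace comes from a variable or user input rather than being typed into the pattern by hand.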
<\/p>\n<p>You can also see that the sub() method replaces all matched patterns in the string&#8212;not only the first one.<\/p>\n<p>But there&#8217;s more! Let&#8217;s have a look at the formal definition of the sub() method.<\/p>\n<p><strong>Specification<\/strong><\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">re.sub(pattern, repl, string, count=0, flags=0)<\/pre>\n<p>The method has five arguments. The first three are required, and the remaining two are optional.<\/p>\n<ul>\n<li><strong>pattern<\/strong>: the regular expression pattern that describes the substrings you want to replace. <\/li>\n<li><strong>repl<\/strong>: the replacement string or function. If it&#8217;s a function, it must take a single argument (the <a href=\"https:\/\/blog.finxter.com\/python-regex-match\/\">match object<\/a>) and is called once for each occurrence of the pattern. Its return value is the string that replaces the matched substring. <\/li>\n<li><strong>string<\/strong>: the text in which you want to perform the replacements.<\/li>\n<li><strong>count <\/strong>(optional argument): the maximum number of replacements you want to perform. By default, count=0, which means <em>replace all occurrences of the pattern<\/em>. <\/li>\n<li><strong>flags <\/strong>(optional argument): a more advanced modifier that allows you to customize the behavior of the method. By default, no flags are set. Want to know <a href=\"https:\/\/blog.finxter.com\/python-regex-flags\/\">how to use those flags? Check out this detailed article<\/a> on the Finxter blog.<\/li>\n<\/ul>\n<p>You&#8217;ll learn about those arguments in more detail later. 
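<\/p>\n<p>Here&#8217;s a short sketch that passes all five arguments at once. The input string is made up for illustration. Note that the original string stays unchanged: re.sub() always returns a new string because Python strings are immutable.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import re\n\ntext = 'Cats and cats and CATS'\n# pattern, repl, and string, plus the optional count and flags keyword arguments\nresult = re.sub('cats', 'dogs', text, count=2, flags=re.IGNORECASE)\nprint(result)\n# dogs and dogs and CATS\nprint(text)\n# Cats and cats and CATS<\/pre>\n<p>Because count=2, only the first two case-insensitive matches are replaced. The third one, &#8216;CATS&#8217;, stays as it is.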
<\/p>\n<p><strong>Return Value:<\/strong><\/p>\n<p><em>A new string in which the first <strong>count<\/strong> substrings that match the <strong>pattern<\/strong> (or all of them if count=0) are replaced by the <strong>repl<\/strong> string. If <strong>repl<\/strong> is a function, they are replaced by its return value instead. If the pattern doesn&#8217;t match anywhere, the string is returned unchanged.<\/em><\/p>\n<h2>Regex Sub Minimal Example<\/h2>\n<p>Let&#8217;s study some more examples&#8212;from simple to more complex.<\/p>\n<p>The easiest use is with only three arguments: the pattern &#8216;sing&#8217;, the replacement string &#8216;program&#8217;, and the string you want to modify (<code>text<\/code> in our example). <\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import re\n>>> text = 'Learn to sing because singing is fun.'\n>>> re.sub('sing', 'program', text)\n'Learn to program because programing is fun.'<\/pre>\n<p>Just ignore the grammar mistake for now. You get the point: we don&#8217;t sing, we program.<\/p>\n<p>But what if you want to actually fix this grammar mistake? After all, it&#8217;s <em>programming<\/em>, not <em>programing<\/em>. In this case, you need to replace &#8216;sing&#8217; with &#8216;program&#8217;, but &#8216;singing&#8217; with &#8216;programming&#8217;. <\/p>\n<p>You see where this leads us: the repl argument should be a function! 
So let&#8217;s try this:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import re\n\n# Called once per match with the match object; returns the replacement text\ndef sub(matched):\n    if matched.group(0) == 'singing':\n        return 'programming'\n    else:\n        return 'program'\n\ntext = 'Learn to sing because singing is fun.'\nprint(re.sub('sing(ing)?', sub, text))\n# Learn to program because programming is fun.<\/pre>\n<p>In this example, you first define a substitution function sub. The function takes the match object as input and returns a string. If the match is the longer form &#8216;singing&#8217;, it returns &#8216;programming&#8217;. Otherwise, the match is the shorter form &#8216;sing&#8217;, so it returns the shorter replacement string &#8216;program&#8217; instead. <\/p>\n<h2>How to Use the count Argument of the Regex Sub Method?<\/h2>\n<p>What if you don&#8217;t want to substitute all occurrences of a pattern but only a limited number of them? Just use the <strong>count <\/strong>argument! Here&#8217;s an example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import re\n>>> s = 'xxxxxxhelloxxxxxworld!xxxx'\n>>> re.sub('x+', '', s, count=2)\n'helloworld!xxxx'\n>>> re.sub('x+', '', s, count=3)\n'helloworld!'<\/pre>\n<p>In the first substitution operation, you replace only two occurrences of the pattern &#8216;x+&#8217;. 
In the second, you replace all three.<\/p>\n<p>You can also pass count as a positional argument to save some characters:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> re.sub('x+', '', s, 3)\n'helloworld!'<\/pre>\n<p>But many coders don&#8217;t know about the <strong>count <\/strong>argument, so you should use the keyword argument for readability. Moreover, since Python 3.13, passing count and flags as positional arguments is deprecated.<\/p>\n<h2>How to Use the Optional Flag Argument?<\/h2>\n<p>As you&#8217;ve seen in the specification, the re.sub() method comes with an optional fifth <strong>flags <\/strong>argument:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">re.sub(pattern, repl, string, count=0, flags=0)<\/pre>\n<p>What&#8217;s the purpose of the <a href=\"https:\/\/blog.finxter.com\/python-regex-flags\/\">flags argument<\/a>?<\/p>\n<p>Flags allow you to control the regular expression engine. Because regular expressions are so powerful, it&#8217;s useful to be able to switch certain features on and off (for example, whether to ignore capitalization when matching your regex). <\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table>\n<tbody>\n<tr>\n<td><strong>Syntax<\/strong><\/td>\n<td><strong>Meaning<\/strong><\/td>\n<\/tr>\n<tr>\n<td> <strong>re.ASCII<\/strong><\/td>\n<td>If you don&#8217;t use this flag, the special Python regex symbols \\w, \\W, \\b, \\B, \\d, \\D, \\s and \\S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters &#8212; as the name suggests. 
<\/td>\n<\/tr>\n<tr>\n<td> <strong>re.A<\/strong> <\/td>\n<td>Same as re.ASCII <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.DEBUG<\/strong> <\/td>\n<td>If you use this flag, Python will print some useful information to the shell that helps you debug your regex. <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.IGNORECASE<\/strong> <\/td>\n<td>If you use this flag, the regex engine will perform case-insensitive matching. So if you&#8217;re searching for [A-Z], it will also match [a-z]. <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.I<\/strong> <\/td>\n<td>Same as re.IGNORECASE <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.LOCALE<\/strong> <\/td>\n<td>Don&#8217;t use this flag &#8212; ever. Its use is discouraged, and in Python 3 it works only with bytes patterns. The idea was to perform case-insensitive matching depending on your current locale, but the locale mechanism isn&#8217;t reliable. <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.L<\/strong> <\/td>\n<td>Same as re.LOCALE <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.MULTILINE<\/strong> <\/td>\n<td>This flag switches on the following feature: the start-of-the-string regex &#8216;^&#8217; matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex &#8216;$&#8217;, which now also matches at the end of each line in a multi-line string. <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.M<\/strong> <\/td>\n<td>Same as re.MULTILINE <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.DOTALL<\/strong> <\/td>\n<td>Without this flag, the dot regex &#8216;.&#8217; matches all characters except the newline character &#8216;\\n&#8217;. Switch on this flag to really match all characters, including the newline character. <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.S<\/strong> <\/td>\n<td>Same as re.DOTALL <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.VERBOSE<\/strong> <\/td>\n<td>To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. 
This is possible with this flag: whitespace in the pattern is ignored (except inside a character class or when escaped), and everything from an unescaped &#8216;#&#8217; outside a character class to the end of the line is treated as a comment. <\/td>\n<\/tr>\n<tr>\n<td> <strong>re.X<\/strong> <\/td>\n<td>Same as re.VERBOSE <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>Here&#8217;s how you&#8217;d use it in a minimal example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> import re\n>>> s = 'xxxiiixxXxxxiiixXXX'\n>>> re.sub('x+', '', s)\n'iiiXiiiXXX'\n>>> re.sub('x+', '', s, flags=re.I)\n'iiiiii'<\/pre>\n<p>In the second substitution operation, you ignore capitalization by using the flag re.I, which is short for re.IGNORECASE. That&#8217;s why it also removes the uppercase &#8216;X&#8217; characters, which now match the regex &#8216;x+&#8217; as well.<\/p>\n<h2>What&#8217;s the Difference Between Regex Sub and String Replace? <\/h2>\n<p>In a way, the re.sub() method is the more powerful variant of the <a href=\"https:\/\/blog.finxter.com\/python-string-replace\/\">string.replace() method, which is described in detail in this Finxter blog article<\/a>. <\/p>\n<p>Why? 
Because you can replace all occurrences of a regex pattern rather than only all occurrences of one fixed string.<\/p>\n<p>So with re.sub() you can do everything you can do with string.replace(), and more!<\/p>\n<p>Here&#8217;s an example:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> 'Python is python is PYTHON'.replace('python', 'fun')\n'Python is fun is PYTHON'\n>>> re.sub('(Python)|(python)|(PYTHON)', 'fun', 'Python is python is PYTHON')\n'fun is fun is fun'<\/pre>\n<p>The string.replace() method only replaces the lowercase word &#8216;python&#8217;, while the re.sub() method replaces all three capitalization variants.<\/p>\n<p>Note that you can accomplish the same thing even more easily with the <strong>flags <\/strong>argument.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> re.sub('python', 'fun', 'Python is python is PYTHON', flags=re.I)\n'fun is fun is fun'<\/pre>\n<h2>How to Remove a Regex Pattern in Python?<\/h2>\n<p>Nothing simpler than that. Just use the empty string as the replacement string:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">>>> re.sub('p', '', 'Python is python is PYTHON', flags=re.I)\n'ython is ython is YTHON'<\/pre>\n<p>You replace all occurrences of the pattern <code>'p'<\/code> with the empty string <code>''<\/code>. In other words, you remove all occurrences of <code>'p'<\/code>. 
As you use the <code>flags=re.I<\/code> argument, you ignore capitalization.<\/p>\n<h2>Related Re Methods<\/h2>\n<p>There are six other important regular expression methods which you should master:<\/p>\n<ul>\n<li>The <strong>re.findall(pattern, string)<\/strong> method returns a list of string matches. Read more in <a href=\"https:\/\/blog.finxter.com\/python-re-findall\/\">our blog tutorial<\/a>.<\/li>\n<li>The <strong>re.search(pattern, string)<\/strong> method returns a match object of the first match. Read more in <a href=\"https:\/\/blog.finxter.com\/python-regex-search\/\">our blog tutorial<\/a>.<\/li>\n<li>The <strong>re.match(pattern, string)<\/strong> method returns a match object if the regex matches at the beginning of the string. Read more in <a href=\"https:\/\/blog.finxter.com\/python-regex-match\/\">our blog tutorial<\/a>.<\/li>\n<li>The <strong>re.fullmatch(pattern, string)<\/strong> method returns a match object if the regex matches the whole string. Read more in <a href=\"https:\/\/blog.finxter.com\/python-regex-fullmatch\/\">our blog tutorial<\/a>.<\/li>\n<li>The <strong>re.compile(pattern)<\/strong> method compiles the regular expression pattern and returns a regex object that you can reuse multiple times in your code. Read more in <a href=\"https:\/\/blog.finxter.com\/python-regex-compile\/\">our blog tutorial<\/a>.<\/li>\n<li>The <strong>re.split(pattern, string)<\/strong> method returns a list of strings by matching all occurrences of the pattern in the string and splitting the string at those matches. 
Read more in <a href=\"https:\/\/blog.finxter.com\/python-regex-split\/\">our blog tutorial<\/a>.<\/li>\n<\/ul>\n<p>These six methods are 80% of what you need to know to get started with Python&#8217;s regular expression functionality.<\/p>\n<h2>Where to Go From Here?<\/h2>\n<p><strong>You&#8217;ve learned that the re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences of the pattern in the old string are replaced by repl.<\/strong><\/p>\n<p>Learning Python is hard. But if you cheat, it isn&#8217;t as hard as it has to be:<\/p>\n<p><a href=\"https:\/\/blog.finxter.com\/subscribe\/\">Download 8 Free Python Cheat Sheets now!<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Do you want to replace all occurrences of a pattern in a string? You&#8217;re in the right place! This article is all about the re.sub(pattern, repl, string) method of Python&#8217;s&nbsp;re library. Let&#8217;s answer the following question: How Does re.sub() Work in Python? The re.sub(pattern, repl, string, count=0, flags=0) method returns a new string where all occurrences 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-108031","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/108031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=108031"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/108031\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=108031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=108031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=108031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}