[Tut] Python Regex Multiple Repeat Error - xSicKxBot - 03-02-2020 Python Regex Multiple Repeat Error <div><p>Just like me an hour ago, you’re probably sitting in front of your regular expression code, puzzled by a strange error message:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">re.error: multiple repeat at position x</pre>
<p>How does it arise? Where does it come from? And, most importantly, how can you get rid of it?</p>
<p>This article answers all of those questions. Alternatively, you can watch my short explainer video that quickly shows you how to resolve this error:</p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-rich is-provider-embed-handler wp-embed-aspect-16-9 wp-has-aspect-ratio"> <div class="wp-block-embed__wrapper"> <div class="ast-oembed-container"><iframe title="Python Regex Multiple Repeat Error" width="1100" height="619" src="https://www.youtube.com/embed/BtogzCIT4zA?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div> </div> </figure>
<h2>How Does the Multiple Repeat Error Arise in Python Re?</h2>
<p><strong>Python’s <a rel="noreferrer noopener" aria-label=" (opens in a new tab)" href="https://blog.finxter.com/python-regex/" target="_blank">regex library re</a> raises the multiple repeat error when you stack two regex quantifiers on top of each other. For example, in Python versions before 3.11, the regex <code>'a++'</code> causes the multiple repeat error. You can get rid of this error by not stacking quantifiers on top of each other. 
</strong></p> <p>Here’s an example:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('a++', 'aaaa')
Traceback (most recent call last):
  File "&lt;pyshell#29&gt;", line 1, in &lt;module&gt;
    re.findall('a++', 'aaaa')
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\re.py", line 223, in findall
    ...
re.error: multiple repeat at position 2</pre>
<p>I have shortened the error message to focus on the relevant parts. In the code, you first import the regex library re. You then use the <code>re.findall(pattern, string)</code> function (<a rel="noreferrer noopener" aria-label="see this blog tutorial (opens in a new tab)" href="https://blog.finxter.com/python-re-findall/" target="_blank">see this blog tutorial</a>) to find the pattern <code>'a++'</code> in the string <code>'aaaa'</code>.</p>
<p>However, on the Python 3.7 interpreter shown in the traceback, this pattern doesn’t make a lot of sense: what would <code>a++</code> mean anyway?</p>
<h2>[Tips] What’s the Source of the Multiple Repeat Error and How to Avoid It?</h2>
<p>The error happens if you use the Python <a rel="noreferrer noopener" aria-label="regex (opens in a new tab)" href="https://blog.finxter.com/python-regex/" target="_blank">regex</a> package <code>re</code>. There are many different reasons for it, but all of them have the same source: you stack quantifiers on top of each other. </p>
<p>If you don’t know what a quantifier is, scroll down to the section on quantifiers, where I show you exactly what it is.</p>
<p>Here’s a list of reasons for the error message. Maybe your reason is among them?</p>
<ul> <li>You use the regex pattern <code>'X++'</code> for any regular expression <code>X</code>. To avoid this error, get rid of one quantifier. (Python 3.11 and later parse <code>++</code> as a possessive quantifier and no longer raise the error for this pattern.)</li>
<li>You use the regex pattern <code>'X+*'</code> for any regular expression <code>X</code>. 
To avoid this error, get rid of one quantifier.</li>
<li>You use the regex pattern <code>'X**'</code> for any regular expression <code>X</code>. To avoid this error, get rid of one quantifier.</li>
<li>You use the regex pattern <code>'X{m,n}*'</code> for any regular expression <code>X</code> and repetition counts <code>m</code> and <code>n</code>. To avoid this error, get rid of one quantifier.</li>
<li>You try to match a run of literal <code>'+'</code> characters and put a quantifier after the unescaped plus, as in <code>'X+*'</code>. In this case, escape the literal plus symbol: <code>'X\+*'</code>. (A pattern such as <code>'X+?'</code> does not raise the error, because <code>+?</code> is the valid non-greedy form of <code>+</code>.)</li>
<li>You try to match a run of literal <code>'*'</code> characters and put a quantifier after the unescaped asterisk, as in <code>'X**'</code>. Avoid this error by escaping the literal asterisk: <code>'X\**'</code>. </li> </ul>
<p>Oftentimes, the error appears if you don’t properly escape the special quantifier metacharacters in your regex pattern. </p>
<p>Here’s a <a rel="noreferrer noopener" aria-label="StackOverflow (opens in a new tab)" href="https://stackoverflow.com/questions/19942314/python-multiple-repeat-error" target="_blank">StackOverflow</a> post that shows some code where this happened:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">...
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
p = re.compile(term, re.IGNORECASE)
...</pre>
<p>I edited the given code snippet to show the important part. The code fails with a <code>multiple repeat</code> error. Can you see why?</p>
<p>The reason is that the regex <code>'lg incite" OR author:"http++www.dealitem.com" OR "for sale'</code> contains two plus quantifiers stacked on top of each other in the substring <code>'http++'</code>. 
Remove them, or escape them as <code>'http\+\+'</code> if you want to match literal plus signs, and the code will run again!</p>
<h2>Python Regex Quantifiers</h2>
<p>The word “<a href="https://www.merriam-webster.com/dictionary/quantity" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">quantifier</a>” originates from Latin: its meaning is <strong>quantus = how much / how often</strong>.</p>
<p><strong>This is precisely what a regular expression quantifier means: you tell the regex engine how often you want to match a given pattern. </strong></p>
<p>Even if you don’t write a quantifier, you define one implicitly: no quantifier means to match the regular expression exactly once.</p>
<p>So what are the regex quantifiers in Python?</p>
<figure class="wp-block-table is-style-stripes"> <table> <tbody>
<tr> <td>Quantifier</td> <td>Meaning</td> </tr>
<tr> <td><code>A?</code></td> <td>Match regular expression <code>A</code> zero or one times</td> </tr>
<tr> <td><code>A*</code></td> <td>Match regular expression <code>A</code> zero or more times</td> </tr>
<tr> <td><code>A+</code></td> <td>Match regular expression <code>A</code> one or more times</td> </tr>
<tr> <td><code>A{m}</code></td> <td>Match regular expression <code>A</code> exactly <code>m</code> times</td> </tr>
<tr> <td><code>A{m,n}</code></td> <td>Match regular expression <code>A</code> between <code>m</code> and <code>n</code> times (inclusive)</td> </tr>
</tbody> </table> </figure>
<p>Note that in this tutorial, I assume you have at least a rough idea of what regular expressions are. If you don’t, no problem: check out my <a rel="noreferrer noopener" aria-label="detailed regex tutorial on this blog (opens in a new tab)" href="https://blog.finxter.com/python-regex/" target="_blank">detailed regex tutorial on this blog</a>.</p>
<p>You see in the table that the quantifiers <code>?</code>, <code>*</code>, <code>+</code>, <code>{m}</code>, and <code>{m,n}</code> define how often you repeat the matching of regex <code>A</code>. 
</p> <p>Let’s have a look at some examples, one for each quantifier:</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> import re
>>> re.findall('a?', 'aaaa')
['a', 'a', 'a', 'a', '']
>>> re.findall('a*', 'aaaa')
['aaaa', '']
>>> re.findall('a+', 'aaaa')
['aaaa']
>>> re.findall('a{3}', 'aaaa')
['aaa']
>>> re.findall('a{1,2}', 'aaaa')
['aa', 'aa']</pre>
<p>In each line, you try a different quantifier on the same text <code>'aaaa'</code>. And, interestingly, each line leads to a different output:</p>
<ul> <li>The <a rel="noreferrer noopener" aria-label="zero-or-one (opens in a new tab)" href="https://blog.finxter.com/python-re-question-mark/" target="_blank">zero-or-one</a> regex <code>'a?'</code> matches a single <code>'a'</code> four times. Because the quantifier is greedy, it matches the empty string only where no <code>'a'</code> is left, which is why the result ends with <code>''</code>.</li>
<li>The <a rel="noreferrer noopener" href="https://blog.finxter.com/python-re-question-mark/" target="_blank">zero-or-more</a> regex <code>'a*'</code> matches all four <code>'a'</code>s at once and consumes them. At the end of the string, it can still match the empty string.</li>
<li>The <a rel="noreferrer noopener" href="https://blog.finxter.com/python-re-question-mark/" target="_blank">one-or-more</a> regex <code>'a+'</code> matches all four <code>'a'</code>s at once. In contrast to the previous quantifier, it cannot match an empty string.</li>
<li>The repeating regex <code>'a{3}'</code> matches exactly three <code>'a'</code>s in a single run. It can do so only once because only one <code>'a'</code> remains afterwards.</li>
<li>The repeating regex <code>'a{1,2}'</code> matches one or two <code>'a'</code>s. It tries to match as many as possible, so it finds <code>'aa'</code> twice.</li> </ul>
<p>You’ve learned the basic quantifiers of Python regular expressions. 
</p> <h2>Where to Go From Here?</h2>
<p>To summarize, you’ve learned that the multiple repeat error appears whenever you stack multiple quantifiers on top of each other. Avoid this and the error message will disappear. </p>
<p>If you want to boost your Python regex skills to the next level, check out my free <a href="https://blog.finxter.com/python-regex/" target="_blank" rel="noreferrer noopener" aria-label="in-depth regex superpower tutorial (20,000+) words (opens in a new tab)">in-depth regex superpower tutorial (20,000+ words)</a>. Or just bookmark the article to read later.</p> </div>
https://www.sickgaming.net/blog/2020/02/29/python-regex-multiple-repeat-error/
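<p>As a small sketch of the two fixes discussed in this article, the snippet below first catches the error and reports where it occurred, then escapes a search term so that its plus signs are matched literally. The helper name <code>compile_or_explain</code> and the sample strings are made up for illustration and are not part of the <code>re</code> module. The failing pattern is <code>'a**'</code> rather than <code>'a++'</code> because, as far as I know, <code>'a++'</code> is accepted as a possessive quantifier from Python 3.11 on.</p>

```python
import re


def compile_or_explain(pattern):
    """Hypothetical helper: return (compiled_pattern, None) on success,
    or (None, message) if the pattern is not a valid regex."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        # exc.msg is the short reason, exc.pos the offset inside the pattern
        return None, f"{exc.msg} at position {exc.pos}"


# 'a**' stacks two star quantifiers, so compiling it fails.
_, message = compile_or_explain('a**')
print(message)

# A single quantifier compiles without any problem.
ok_pattern, ok_message = compile_or_explain('a+')
print(ok_pattern.findall('aaaa'))  # ['aaaa']

# If the text is meant literally, re.escape() backslash-escapes every
# metacharacter, so '++' is matched as two plus signs.
term = 'http++www.dealitem.com'
literal = re.compile(re.escape(term), re.IGNORECASE)
print(literal.findall('see HTTP++www.dealitem.com today'))
```

<p>Escaping is the safer choice whenever the pattern comes from user input, as in the StackOverflow example, because the user almost certainly meant the characters literally.</p>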