{"id":109211,"date":"2020-02-14T09:11:14","date_gmt":"2020-02-14T09:11:14","guid":{"rendered":"https:\/\/blog.finxter.com\/?p=6297"},"modified":"2020-02-14T09:11:14","modified_gmt":"2020-02-14T09:11:14","slug":"python-regex-syntax-2-minute-primer","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2020\/02\/14\/python-regex-syntax-2-minute-primer\/","title":{"rendered":"Python Regex Syntax [2-Minute Primer]"},"content":{"rendered":"<p>A regular expression is a decades-old concept in computer science. Invented in the 1950s by the mathematician Stephen Cole Kleene, regular expressions have accumulated a huge variety of operations over decades of evolution. Collecting all operations and writing up a comprehensive list would result in a very thick and unreadable book by itself.<\/p>\n<figure class=\"wp-block-embed-youtube wp-block-embed is-type-rich is-provider-embed-handler wp-embed-aspect-4-3 wp-has-aspect-ratio\">\n<div class=\"wp-block-embed__wrapper\">\n<div class=\"ast-oembed-container\"><iframe loading=\"lazy\" title=\"Python Regex Syntax [15-Minute Primer]\" width=\"1100\" height=\"825\" src=\"https:\/\/www.youtube.com\/embed\/G1JLUpc-bvY?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/p><\/div>\n<\/figure>\n<p>Fortunately, you don\u2019t have to learn every operation before you can start using regular expressions in your practical code projects. Next, you\u2019ll get a quick and dirty overview of the most important regex operations and how to use them in Python. In follow-up chapters, you\u2019ll then study them in detail &#8212; with many practical applications and code puzzles.<\/p>\n<p>Here are the most important regex operators:<\/p>\n<ul>\n<li><code>.<\/code> The <strong>wild-card<\/strong> operator (\u2018dot\u2019) matches any character in a string except the newline character \u2018\\n\u2019.
For example, the regex \u2018...\u2019 matches any three-character string such as \u2018abc\u2019, \u2018cat\u2019, and \u2018dog\u2019.<\/li>\n<li><code>*<\/code> The <strong>zero-or-more<\/strong> asterisk operator matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. For example, the regex \u2018cat*\u2019 matches the strings \u2018ca\u2019, \u2018cat\u2019, \u2018catt\u2019, \u2018cattt\u2019, and \u2018catttttttt\u2019.<\/li>\n<li><code>?<\/code> The <strong>zero-or-one<\/strong> operator matches (as the name suggests) either zero or one occurrences of the immediately preceding regex. For example, the regex \u2018cat?\u2019 matches both strings \u2018ca\u2019 and \u2018cat\u2019 &#8212; but not \u2018catt\u2019, \u2018cattt\u2019, and \u2018catttttttt\u2019.<\/li>\n<li><code>+<\/code> The <strong>at-least-one<\/strong> operator matches one or more occurrences of the immediately preceding regex. For example, the regex \u2018cat+\u2019 does not match the string \u2018ca\u2019 but matches all strings with at least one trailing character \u2018t\u2019 such as \u2018cat\u2019, \u2018catt\u2019, and \u2018cattt\u2019.<\/li>\n<li><code>^<\/code> The <strong>start-of-string<\/strong> operator matches the beginning of a string. For example, the regex \u2018^p\u2019 would match the strings \u2018python\u2019 and \u2018programming\u2019 but not \u2018lisp\u2019 and \u2018spying\u2019 where the character \u2018p\u2019 does not occur at the start of the string.<\/li>\n<li><code>$<\/code> The <strong>end-of-string<\/strong> operator matches the end of a string. For example, the regex \u2018py$\u2019 would match the strings \u2018main.py\u2019 and \u2018pypy\u2019 but not the strings \u2018python\u2019 and \u2018pypi\u2019.<\/li>\n<li><code>A|B<\/code> The <strong>OR<\/strong> operator matches either the regex A or the regex B.
Note that this differs from the logical OR, which is also satisfied when both conditions hold. For example, the regex \u2018(hello)|(hi)\u2019 matches strings \u2018hello world\u2019 and \u2018hi python\u2019. It wouldn\u2019t make sense to try to match both of them at the same time.<\/li>\n<li><code>AB<\/code> The <strong>AND<\/strong> operator matches first the regex A and second the regex B, in this sequence. We\u2019ve already seen it trivially in the regex \u2018ca\u2019 that matches first regex \u2018c\u2019 and second regex \u2018a\u2019.<\/li>\n<\/ul>\n<p>Note that I gave the above operators some more meaningful names (in bold) so that you can immediately grasp the purpose of each regex. For example, the \u2018^\u2019 operator is usually denoted as the \u2018caret\u2019 operator. Those names are not descriptive so I came up with more kindergarten-like words such as the \u201cstart-of-string\u201d operator.<\/p>\n<p>Let\u2019s dive into some examples!<\/p>\n<h2>Examples<\/h2>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import re\n\ntext = '''\n    Ha! let me see her: out, alas! he's cold:\n    Her blood is settled, and her joints are stiff;\n    Life and these lips have long been separated:\n    Death lies on her like an untimely frost\n    Upon the sweetest flower of all the field.\n'''\n\nprint(re.findall('.a!', text))\n'''\nFinds all occurrences of an arbitrary character that is\nfollowed by the character sequence 'a!'.\n['Ha!']\n'''\n\nprint(re.findall('is.*and', text))\n'''\nFinds all occurrences of the word 'is',\nfollowed by an arbitrary number of characters\nand the word 'and'.\n['is settled, and']\n'''\n\nprint(re.findall('her:?', text))\n'''\nFinds all occurrences of the word 'her',\nfollowed by zero or one occurrences of the colon ':'.\n['her:', 'her', 'her']\n'''\n\nprint(re.findall('her:+', text))\n'''\nFinds all occurrences of the word 'her',\nfollowed by one or more occurrences of the colon ':'.\n['her:']\n'''\n\nprint(re.findall('^Ha.*', text))\n'''\nFinds all occurrences where the string starts with\nthe character sequence 'Ha', followed by an arbitrary\nnumber of characters except for the new-line character.\nCan you figure out why Python doesn't find any?\n[]\n'''\n\nprint(re.findall('\\n$', text))\n'''\nFinds all occurrences where the new-line character '\\n'\noccurs at the end of the string.\n['\\n']\n'''\n\nprint(re.findall('(Life|Death)', text))\n'''\nFinds all occurrences of either the word 'Life' or the\nword 'Death'.\n['Life', 'Death']\n'''\n<\/pre>\n<p>In these examples, you\u2019ve already seen the special symbol <code>\\n<\/code> which denotes the new-line character in Python (and most other languages). There are many special characters, specifically designed for regular expressions.<\/p>\n<h2>Where to Go From Here?<\/h2>\n<p>If you want to master regular expressions once and for all, I&#8217;d recommend that you read the massive regular expression tutorial on the Finxter blog &#8212; for free!<\/p>\n<p><a href=\"https:\/\/blog.finxter.com\/python-regex\/\">https:\/\/blog.finxter.com\/python-regex\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A regular expression is a decades-old concept in computer science. Invented in the 1950s by the mathematician Stephen Cole Kleene, regular expressions have accumulated a huge variety of operations over decades of evolution.
Collecting all operations and writing up a comprehensive list would result in a very thick and unreadable book by itself. Fortunately, you don\u2019t have to [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[857],"tags":[73,468,528],"class_list":["post-109211","post","type-post","status-publish","format-standard","hentry","category-python-tut","tag-programming","tag-python","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/109211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=109211"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/109211\/revisions"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=109211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=109211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=109211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}