Posted on Leave a comment

Python Regex Split

Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!

This article is all about the re.split(pattern, string) method of Python’s re library.

Let’s answer the following question:

How Does re.split() Work in Python?

The re.split(pattern, string, maxsplit=0, flags=0) method returns a list of strings by matching all occurrences of the pattern in the string and dividing the string along those matches.

Here’s a minimal example:

>>> import re
>>> string = 'Learn Python with\t Finxter!'
>>> re.split(r'\s+', string)
['Learn', 'Python', 'with', 'Finxter!']

The string contains four words that are separated by whitespace characters (in particular, the space character ‘ ’ and the tab character ‘\t’). You use the regular expression ‘\s+’ to match every run of one or more consecutive whitespace characters. The matched substrings serve as delimiters. The result is the string divided along those delimiters.

But that’s not all! Let’s have a look at the formal definition of the split method.

Specification

re.split(pattern, string, maxsplit=0, flags=0)

The method has four arguments—two of which are optional.

  • pattern: the regular expression pattern you want to use as a delimiter.
  • string: the text you want to break up into a list of strings.
  • maxsplit (optional argument): the maximum number of split operations. The returned list then has at most maxsplit + 1 elements, and the last element holds the unsplit remainder of the string. By default, maxsplit is 0, which means there is no limit.
  • flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. By default, no flags are set. Want to know how to use those flags? Check out this detailed article on the Finxter blog.

The first and second arguments are required. The third and fourth arguments are optional.

You’ll learn about those arguments in more detail later.

Return Value:

The regex split method returns a list of substrings obtained by using the regex as a delimiter.

Regex Split Minimal Example

Let’s study some more examples—from simple to more complex.

The easiest use is with only two arguments: the delimiter regex and the string to be split.

>>> import re
>>> string = 'fgffffgfgPythonfgisfffawesomefgffg'
>>> re.split('[fg]+', string)
['', 'Python', 'is', 'awesome', '']

You use an arbitrary number of ‘f’ or ‘g’ characters as regular expression delimiters. How do you accomplish this? By combining the character class regex [A] and the one-or-more regex A+ into the following regex: [fg]+. The strings in between are added to the return list.
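The two empty strings in the result appear because the string starts and ends with a delimiter match. As a minimal sketch (the variable names parts and words are only illustrative), you could filter them out like this:

```python
import re

string = 'fgffffgfgPythonfgisfffawesomefgffg'

# The string begins and ends with a run of 'f'/'g' characters, so
# re.split() reports an empty string before the first delimiter and
# after the last one.
parts = re.split('[fg]+', string)
print(parts)   # ['', 'Python', 'is', 'awesome', '']

# Dropping the empty strings keeps only the words in between.
words = [part for part in parts if part]
print(words)   # ['Python', 'is', 'awesome']
```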

How to Use the maxsplit Argument?

What if you don’t want to split the whole string but only a limited number of times? Here’s an example:

>>> string = 'a-bird-in-the-hand-is-worth-two-in-the-bush'
>>> re.split('-', string, maxsplit=5)
['a', 'bird', 'in', 'the', 'hand', 'is-worth-two-in-the-bush']
>>> re.split('-', string, maxsplit=2)
['a', 'bird', 'in-the-hand-is-worth-two-in-the-bush']

We use the simple delimiter regex ‘-’ to divide the string into substrings. In the first method call, we set maxsplit=5 to obtain six list elements. In the second method call, we set maxsplit=2 to obtain three list elements. Can you see the pattern? With maxsplit=n, you get at most n + 1 list elements.

You can also use positional arguments to save some characters:

>>> re.split('-', string, 2)
['a', 'bird', 'in-the-hand-is-worth-two-in-the-bush']

But as many coders don’t know about the maxsplit argument, you probably should use the keyword argument for readability. Python 3.13 and later also deprecate passing maxsplit as a positional argument to re.split().
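To check the relationship between maxsplit and the length of the result, here is a small sketch that counts the list elements for a few values. The dictionary name lengths is only chosen for illustration:

```python
import re

string = 'a-bird-in-the-hand-is-worth-two-in-the-bush'

# Map each maxsplit value to the number of returned list elements.
# maxsplit=0 means "no limit", so all 10 hyphens are used as delimiters.
lengths = {n: len(re.split('-', string, maxsplit=n)) for n in (0, 1, 2, 5)}
print(lengths)   # {0: 11, 1: 2, 2: 3, 5: 6}

# The last element always holds the unsplit remainder.
remainder = re.split('-', string, maxsplit=2)[-1]
print(remainder)   # in-the-hand-is-worth-two-in-the-bush
```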

How to Use the Optional Flag Argument?

As you’ve seen in the specification, the re.split() method comes with an optional fourth ‘flag’ argument:

re.split(pattern, string, maxsplit=0, flags=0)

What’s the purpose of the flags argument?

Flags allow you to control the regular expression engine. They are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).

Syntax | Meaning
re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests.
re.A | Same as re.ASCII
re.DEBUG | If you use this flag, Python will print some useful information to the shell that helps you debug your regex.
re.IGNORECASE | If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match lowercase letters.
re.I | Same as re.IGNORECASE
re.LOCALE | Don’t use this flag. Its use is discouraged: the idea was to make matching (for example, case-insensitive matching) depend on your current locale, but the locale mechanism isn’t reliable.
re.L | Same as re.LOCALE
re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now also matches at the end of each line in a multi-line string.
re.M | Same as re.MULTILINE
re.DOTALL | Without this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S | Same as re.DOTALL
re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern and everything from an unescaped ‘#’ to the end of the line are ignored (except inside a character class).
re.X | Same as re.VERBOSE

Here’s how you’d use it in a practical example:

>>> import re
>>> re.split('[xy]+', text, flags=re.I)
['the', 'russians', 'are', 'coming']

Although your regex is lowercase, we ignore the capitalization by using the flag re.I, which is short for re.IGNORECASE. If we didn’t, the result would be quite different:

>>> re.split('[xy]+', text)
['theXXXYYYrussiansXX', 'are', 'Y', 'coming']

As the character class [xy] only contains the lowercase characters ‘x’ and ‘y’, their uppercase variants appear in the returned list rather than being used as delimiters.
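The snippets above never show how text is defined. The following self-contained sketch uses a hypothetical input string, chosen only because it reproduces both outputs shown above. The original value of text may have been different:

```python
import re

# Hypothetical input (an assumption, not taken from the original article):
# runs of 'x'/'y' in mixed case separate the words.
text = 'theXXXYYYrussiansXXxareyYycoming'

# With re.I, uppercase 'X' and 'Y' also count as delimiters.
with_flag = re.split('[xy]+', text, flags=re.I)
print(with_flag)      # ['the', 'russians', 'are', 'coming']

# Without the flag, only lowercase 'x' and 'y' split the string.
without_flag = re.split('[xy]+', text)
print(without_flag)   # ['theXXXYYYrussiansXX', 'are', 'Y', 'coming']
```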

What’s the Difference Between re.split() and string.split() Methods in Python?

The method re.split() is much more powerful. The re.split(pattern, string) method can split a string along all occurrences of a matched pattern. The pattern can be arbitrarily complicated. This is in contrast to the string.split(delimiter) method which also splits a string into substrings along the delimiter. However, the delimiter must be a normal string.

An example where the more powerful re.split() method is superior is in splitting a text along any whitespace characters:

import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely Frost
    Upon the sweetest flower of all the field.
'''

print(re.split(r'\s+', text))
'''
['', 'Ha!', 'let', 'me', 'see', 'her:', 'out,', 'alas!', "he's", 'cold:', 'Her', 'blood', 'is', 'settled,', 'and', 'her', 'joints', 'are', 'stiff;', 'Life', 'and', 'these', 'lips', 'have', 'long', 'been', 'separated:', 'Death', 'lies', 'on', 'her', 'like', 'an', 'untimely', 'Frost', 'Upon', 'the', 'sweetest', 'flower', 'of', 'all', 'the', 'field.', '']
'''

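The re.split() method divides the string along any positive number of whitespace characters. You couldn’t achieve such a result with string.split(delimiter) because an explicit delimiter must be a fixed string. Calling string.split() without any argument does split on runs of whitespace, but that is the only built-in pattern it supports.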
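To make the difference concrete, here is a short comparison on a made-up sample string (both the string and the variable names are illustrative assumptions):

```python
import re

text = 'Learn  Python\twith\n Finxter!'

# A fixed delimiter splits at every single space: the double space
# yields an empty string, and the tab and newline stay inside a piece.
by_space = text.split(' ')
print(by_space)     # ['Learn', '', 'Python\twith\n', 'Finxter!']

# Without an argument, str.split() splits on runs of whitespace.
by_default = text.split()
print(by_default)   # ['Learn', 'Python', 'with', 'Finxter!']

# re.split() gives the same result here ...
by_regex = re.split(r'\s+', text)
print(by_regex)     # ['Learn', 'Python', 'with', 'Finxter!']

# ... and also handles delimiters that a fixed string cannot express,
# such as "a comma or a semicolon with optional surrounding whitespace".
mixed = re.split(r'\s*[,;]\s*', 'a, b;c ,d')
print(mixed)        # ['a', 'b', 'c', 'd']
```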

Related Re Methods

There are five important regular expression methods which you should master:

  • The re.findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
  • The re.search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
  • The re.match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
  • The re.fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.
  • The re.compile(pattern) method prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code. Read more in our blog tutorial.

These five methods are 80% of what you need to know to get started with Python’s regular expression functionality.
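As a quick, non-exhaustive sketch, the five methods can be compared on one sample string (the string and patterns are chosen only for illustration):

```python
import re

text = 'Python is great'

# findall: list of all non-overlapping matches as strings
all_matches = re.findall('[a-z]+', text)
print(all_matches)        # ['ython', 'is', 'great']

# search: match object of the first match anywhere in the string
first = re.search('is', text)
print(first.span())       # (7, 9)

# match: only succeeds at the beginning of the string
at_start = re.match('Python', text)
print(at_start.group(0))  # Python

# fullmatch: only succeeds if the whole string matches
whole = re.fullmatch('Python', text)
print(whole)              # None

# compile: prepare a pattern once, then call methods on the regex object
compiled = re.compile('gr..t')
reused = compiled.search(text)
print(reused.group(0))    # great
```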

Where to Go From Here?

You’ve learned about the re.split(pattern, string) method that divides the string along the matched pattern occurrences and returns a list of substrings.


Python Regex Compile

Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!

This article is all about the re.compile(pattern) method of Python’s re library. Before we dive into re.compile(), let’s get an overview of the four related methods you must understand:

  • The findall(pattern, string) method returns a list of string matches. Read more in our blog tutorial.
  • The search(pattern, string) method returns a match object of the first match. Read more in our blog tutorial.
  • The match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Read more in our blog tutorial.
  • The fullmatch(pattern, string) method returns a match object if the regex matches the whole string. Read more in our blog tutorial.

Equipped with this quick overview of the most critical regex methods, let’s answer the following question:

How Does re.compile() Work in Python?

The re.compile(pattern) method returns a regular expression object (see next section).

You then use the object to call important regex methods such as search(string), match(string), fullmatch(string), and findall(string).

In short: You compile the pattern first. You search the pattern in a string second.

This two-step approach is more efficient than calling, say, re.search(pattern, string) directly, but only if you use the same pattern multiple times. Why? Because you can reuse the compiled pattern instead of preparing it again.

Here’s an example:

import re

# These two lines ...
regex = re.compile('Py...n')
match = regex.search('Python is great')

# ... are equivalent to ...
match = re.search('Py...n', 'Python is great')

In both instances, the match variable contains the following match object:

<re.Match object; span=(0, 6), match='Python'>

But in the first case, we can find the pattern not only in the string ‘Python is great’ but also in other strings—without any redundant work of compiling the pattern again and again.
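Reusing the compiled pattern on several strings could look like the following sketch. The list of lines is made up for illustration:

```python
import re

# Compile once ...
regex = re.compile('Py...n')

# ... and reuse the regex object for every string.
lines = ['Python is great', 'I like Pyxxxn too', 'no match here']

hits = []
for line in lines:
    match = regex.search(line)
    # search() returns None if the pattern is not found
    hits.append(match.group(0) if match else None)

print(hits)   # ['Python', 'Pyxxxn', None]
```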

Specification:

re.compile(pattern, flags=0)

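The method has up to two arguments.

  • pattern: the regular expression pattern you want to compile.
  • flags (optional argument): a modifier that allows you to customize the behavior of the regex engine. By default, no flags are set.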

We’ll explore those arguments in more detail later.

Return Value:

The re.compile(pattern, flags) method returns a regular expression object. You may ask (and rightly so):

What’s a Regular Expression Object?

Python internally creates a regular expression object (from the Pattern class) to prepare the pattern matching process. You can call the following methods on the regex object:

Method | Description
Pattern.search(string[, pos[, endpos]]) | Searches the regex anywhere in the string and returns a match object or None. You can define start and end positions of the search.
Pattern.match(string[, pos[, endpos]]) | Searches the regex at the beginning of the string and returns a match object or None. You can define start and end positions of the search.
Pattern.fullmatch(string[, pos[, endpos]]) | Matches the regex with the whole string and returns a match object or None. You can define start and end positions of the search.
Pattern.split(string, maxsplit=0) | Divides the string into a list of substrings. The regex is the delimiter. You can define a maximum number of splits.
Pattern.findall(string[, pos[, endpos]]) | Searches the regex anywhere in the string and returns a list of matching substrings. You can define start and end positions of the search.
Pattern.finditer(string[, pos[, endpos]]) | Returns an iterator that goes over all matches of the regex in the string (returns one match object after another). You can define the start and end positions of the search.
Pattern.sub(repl, string, count=0) | Returns a new string by replacing the first count occurrences of the regex in the string (from left to right) with the replacement string repl.
Pattern.subn(repl, string, count=0) | Returns a new string by replacing the first count occurrences of the regex in the string (from left to right) with the replacement string repl. However, it returns a tuple with the replaced string as the first and the number of successful replacements as the second tuple value.

If you’re familiar with the most basic regex methods, you’ll realize that all of them appear in this table. But there’s one distinction: you don’t have to define the pattern as an argument. For example, the regex method re.search(pattern, string) will internally compile a regex object p and then call p.search(string).

You can see this fact in the official implementation of the re.search(pattern, string) method:

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

(Source: GitHub repository of the re package)

The re.search(pattern, string) method is a mere wrapper for compiling the pattern first and calling the p.search(string) function on the compiled regex object p.
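The methods of the regex object can be tried out directly on a compiled pattern. The following is a small sketch with a made-up pattern and string, not an exhaustive tour:

```python
import re

p = re.compile(r'\d+')
s = 'a1 b22 c333'

found = p.findall(s)
print(found)        # ['1', '22', '333']

# The optional pos argument starts the search at index 3.
from_pos = p.search(s, 3).group(0)
print(from_pos)     # 22

spans = [m.span() for m in p.finditer(s)]
print(spans)        # [(1, 2), (4, 6), (8, 11)]

pieces = p.split(s, maxsplit=1)
print(pieces)       # ['a', ' b22 c333']

# count=2 replaces only the first two matches; count=0 replaces all.
replaced = p.sub('#', s, count=2)
print(replaced)     # a# b# c333

# subn() also reports how many replacements were made.
replaced_all = p.subn('#', s)
print(replaced_all) # ('a# b# c#', 3)
```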

Is It Worth Using Python’s re.compile()?

No, in the vast majority of cases, it’s not worth the extra line.

Consider the following example:

import re

# These two lines ...
regex = re.compile('Py...n')
match = regex.search('Python is great')

# ... are equivalent to ...
match = re.search('Py...n', 'Python is great')

Don’t get me wrong. Compiling a pattern once and using it many times throughout your code (e.g., in a loop) comes with a big performance benefit. In some anecdotal cases, compiling the pattern first led to a 10x to 50x speedup compared to compiling it again and again.

But the reason it is not worth the extra line is that Python’s re library ships with an internal cache. At the time of this writing, the cache has a limit of up to 512 compiled regex objects. So as long as your code uses no more than 512 different patterns, each pattern is compiled only on its first use. Every later call of re.search(pattern, string) with the same pattern and flags finds the compiled pattern in the cache.

Here’s the relevant code snippet from re’s GitHub repository:

# --------------------------------------------------------------------
# internals

_cache = {}  # ordered!

_MAXCACHE = 512
def _compile(pattern, flags):
    # internal: compile pattern
    if isinstance(flags, RegexFlag):
        flags = flags.value
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    if isinstance(pattern, Pattern):
        if flags:
            raise ValueError(
                "cannot process flags argument with a compiled pattern")
        return pattern
    if not sre_compile.isstring(pattern):
        raise TypeError("first argument must be string or compiled pattern")
    p = sre_compile.compile(pattern, flags)
    if not (flags & DEBUG):
        if len(_cache) >= _MAXCACHE:
            # Drop the oldest item
            try:
                del _cache[next(iter(_cache))]
            except (StopIteration, RuntimeError, KeyError):
                pass
        _cache[type(pattern), pattern, flags] = p
    return p

Can you find the spots where the cache is initialized and used?

While in most cases, you don’t need to compile a pattern, in some cases, you should. These follow directly from the previous implementation:

  • You’ve got more than _MAXCACHE different patterns in your code.
  • You use more than _MAXCACHE different patterns between two uses of the same pattern. Only in this case will you see “cache misses” where the cache has already evicted the seemingly stale pattern to make room for newer ones.
  • In both cases, you reuse the pattern multiple times. If you don’t, it makes no sense to spend scarce memory on keeping the compiled pattern around.
  • (Even then, it may only be useful if the patterns are relatively complicated. Otherwise, you won’t see a lot of performance benefits in practice.)

To summarize, compiling the pattern first and storing the compiled pattern in a variable for later use is often nothing but “premature optimization”—one of the deadly sins of beginner and intermediate programmers.
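If you want to check the effect on your own machine, a rough timing sketch with the standard timeit module could look like this. The function names and the repetition count are arbitrary choices, and the measured numbers depend on your Python version and hardware, so none are shown here:

```python
import re
import timeit

text = 'Python is great'
compiled = re.compile('Py...n')

def with_module_function():
    # Looks up the compiled pattern in re's internal cache on every call.
    return re.search('Py...n', text)

def with_compiled_pattern():
    # Skips the cache lookup and calls the regex object directly.
    return compiled.search(text)

t_module = timeit.timeit(with_module_function, number=10_000)
t_compiled = timeit.timeit(with_compiled_pattern, number=10_000)

print(f're.search():       {t_module:.4f} s')
print(f'compiled.search(): {t_compiled:.4f} s')
```

Both variants return the same match. Any difference you measure comes from the per-call overhead of the module-level function, not from recompiling the pattern.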

What Does re.compile() Really Do?

It doesn’t seem like a lot, does it? My intuition was that the real work is in finding the pattern in the text, which happens after compilation. And, of course, matching the pattern is the hard part. But a sensible compilation helps a lot in preparing the pattern to be matched efficiently by the regex engine. Without it, the regex engine would have to do this work itself.

Regex’s compile() method does a lot of things such as:

  • Combine two subsequent characters in the regex if they together indicate a special symbol such as certain Greek symbols.
  • Prepare the regex to ignore uppercase and lowercase.
  • Check for certain (smaller) patterns in the regex.
  • Analyze matching groups in the regex enclosed in parentheses.

Here’s the implementation of the internal _compile() function that does this work. It looks more complicated than expected, no?

def _compile(code, pattern, flags):
    # internal: compile a (sub)pattern
    emit = code.append
    _len = len
    LITERAL_CODES = _LITERAL_CODES
    REPEATING_CODES = _REPEATING_CODES
    SUCCESS_CODES = _SUCCESS_CODES
    ASSERT_CODES = _ASSERT_CODES
    iscased = None
    tolower = None
    fixes = None
    if flags & SRE_FLAG_IGNORECASE and not flags & SRE_FLAG_LOCALE:
        if flags & SRE_FLAG_UNICODE:
            iscased = _sre.unicode_iscased
            tolower = _sre.unicode_tolower
            fixes = _ignorecase_fixes
        else:
            iscased = _sre.ascii_iscased
            tolower = _sre.ascii_tolower
    for op, av in pattern:
        if op in LITERAL_CODES:
            if not flags & SRE_FLAG_IGNORECASE:
                emit(op)
                emit(av)
            elif flags & SRE_FLAG_LOCALE:
                emit(OP_LOCALE_IGNORE[op])
                emit(av)
            elif not iscased(av):
                emit(op)
                emit(av)
            else:
                lo = tolower(av)
                if not fixes:  # ascii
                    emit(OP_IGNORE[op])
                    emit(lo)
                elif lo not in fixes:
                    emit(OP_UNICODE_IGNORE[op])
                    emit(lo)
                else:
                    emit(IN_UNI_IGNORE)
                    skip = _len(code); emit(0)
                    if op is NOT_LITERAL:
                        emit(NEGATE)
                    for k in (lo,) + fixes[lo]:
                        emit(LITERAL)
                        emit(k)
                    emit(FAILURE)
                    code[skip] = _len(code) - skip
        elif op is IN:
            charset, hascased = _optimize_charset(av, iscased, tolower, fixes)
            if flags & SRE_FLAG_IGNORECASE and flags & SRE_FLAG_LOCALE:
                emit(IN_LOC_IGNORE)
            elif not hascased:
                emit(IN)
            elif not fixes:  # ascii
                emit(IN_IGNORE)
            else:
                emit(IN_UNI_IGNORE)
            skip = _len(code); emit(0)
            _compile_charset(charset, flags, code)
            code[skip] = _len(code) - skip
        elif op is ANY:
            if flags & SRE_FLAG_DOTALL:
                emit(ANY_ALL)
            else:
                emit(ANY)
        elif op in REPEATING_CODES:
            if flags & SRE_FLAG_TEMPLATE:
                raise error("internal: unsupported template operator %r" % (op,))
            if _simple(av[2]):
                if op is MAX_REPEAT:
                    emit(REPEAT_ONE)
                else:
                    emit(MIN_REPEAT_ONE)
                skip = _len(code); emit(0)
                emit(av[0])
                emit(av[1])
                _compile(code, av[2], flags)
                emit(SUCCESS)
                code[skip] = _len(code) - skip
            else:
                emit(REPEAT)
                skip = _len(code); emit(0)
                emit(av[0])
                emit(av[1])
                _compile(code, av[2], flags)
                code[skip] = _len(code) - skip
                if op is MAX_REPEAT:
                    emit(MAX_UNTIL)
                else:
                    emit(MIN_UNTIL)
        elif op is SUBPATTERN:
            group, add_flags, del_flags, p = av
            if group:
                emit(MARK)
                emit((group-1)*2)
            # _compile_info(code, p, _combine_flags(flags, add_flags, del_flags))
            _compile(code, p, _combine_flags(flags, add_flags, del_flags))
            if group:
                emit(MARK)
                emit((group-1)*2+1)
        elif op in SUCCESS_CODES:
            emit(op)
        elif op in ASSERT_CODES:
            emit(op)
            skip = _len(code); emit(0)
            if av[0] >= 0:
                emit(0) # look ahead
            else:
                lo, hi = av[1].getwidth()
                if lo != hi:
                    raise error("look-behind requires fixed-width pattern")
                emit(lo) # look behind
            _compile(code, av[1], flags)
            emit(SUCCESS)
            code[skip] = _len(code) - skip
        elif op is CALL:
            emit(op)
            skip = _len(code); emit(0)
            _compile(code, av, flags)
            emit(SUCCESS)
            code[skip] = _len(code) - skip
        elif op is AT:
            emit(op)
            if flags & SRE_FLAG_MULTILINE:
                av = AT_MULTILINE.get(av, av)
            if flags & SRE_FLAG_LOCALE:
                av = AT_LOCALE.get(av, av)
            elif flags & SRE_FLAG_UNICODE:
                av = AT_UNICODE.get(av, av)
            emit(av)
        elif op is BRANCH:
            emit(op)
            tail = []
            tailappend = tail.append
            for av in av[1]:
                skip = _len(code); emit(0)
                # _compile_info(code, av, flags)
                _compile(code, av, flags)
                emit(JUMP)
                tailappend(_len(code)); emit(0)
                code[skip] = _len(code) - skip
            emit(FAILURE) # end of branch
            for tail in tail:
                code[tail] = _len(code) - tail
        elif op is CATEGORY:
            emit(op)
            if flags & SRE_FLAG_LOCALE:
                av = CH_LOCALE[av]
            elif flags & SRE_FLAG_UNICODE:
                av = CH_UNICODE[av]
            emit(av)
        elif op is GROUPREF:
            if not flags & SRE_FLAG_IGNORECASE:
                emit(op)
            elif flags & SRE_FLAG_LOCALE:
                emit(GROUPREF_LOC_IGNORE)
            elif not fixes:  # ascii
                emit(GROUPREF_IGNORE)
            else:
                emit(GROUPREF_UNI_IGNORE)
            emit(av-1)
        elif op is GROUPREF_EXISTS:
            emit(op)
            emit(av[0]-1)
            skipyes = _len(code); emit(0)
            _compile(code, av[1], flags)
            if av[2]:
                emit(JUMP)
                skipno = _len(code); emit(0)
                code[skipyes] = _len(code) - skipyes + 1
                _compile(code, av[2], flags)
                code[skipno] = _len(code) - skipno
            else:
                code[skipyes] = _len(code) - skipyes + 1
        else:
            raise error("internal: unsupported operand type %r" % (op,))

Don’t worry, you don’t need to understand the code. Just note that all this work would have to be repeated on every match if the pattern weren’t compiled (or cached) first. If we can do it only once, it’s certainly a low-hanging fruit for performance optimizations, especially for long regular expression patterns.

How to Use the Optional Flag Argument?

As you’ve seen in the specification, the compile() method comes with an optional second ‘flags’ argument:

re.compile(pattern, flags=0)

What’s the purpose of the flags argument?

Flags allow you to control the regular expression engine. They are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).

Syntax | Meaning
re.ASCII | If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests.
re.A | Same as re.ASCII
re.DEBUG | If you use this flag, Python will print some useful information to the shell that helps you debug your regex.
re.IGNORECASE | If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match lowercase letters.
re.I | Same as re.IGNORECASE
re.LOCALE | Don’t use this flag. Its use is discouraged: the idea was to make matching (for example, case-insensitive matching) depend on your current locale, but the locale mechanism isn’t reliable.
re.L | Same as re.LOCALE
re.MULTILINE | This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now also matches at the end of each line in a multi-line string.
re.M | Same as re.MULTILINE
re.DOTALL | Without this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S | Same as re.DOTALL
re.VERBOSE | To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: whitespace in the pattern and everything from an unescaped ‘#’ to the end of the line are ignored (except inside a character class).
re.X | Same as re.VERBOSE

Here’s how you’d use it in a practical example:

import re

text = 'Python is great (python really is)'

regex = re.compile('Py...n', flags=re.IGNORECASE)

matches = regex.findall(text)
print(matches)
# ['Python', 'python']

Although your regex ‘Py...n’ starts with an uppercase ‘P’, we ignore the capitalization by using the flag re.IGNORECASE. This is why the lowercase ‘python’ is matched as well.
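Several flags can be combined with the bitwise OR operator |. The following sketch uses a made-up two-line string to show re.IGNORECASE together with re.MULTILINE, as well as the equivalent inline notation (?im):

```python
import re

text = 'python rocks\nPython rules'

# Case-insensitive matching, and '^' matches at the start of every line.
regex = re.compile('^py...n', flags=re.IGNORECASE | re.MULTILINE)
both_lines = regex.findall(text)
print(both_lines)   # ['python', 'Python']

# The same flags written inline at the start of the pattern.
inline = re.findall('(?im)^py...n', text)
print(inline)       # ['python', 'Python']

# Without re.MULTILINE, '^' only matches at the start of the string.
first_only = re.findall('^py...n', text, flags=re.IGNORECASE)
print(first_only)   # ['python']
```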

Where to Go From Here?

You’ve learned about the re.compile(pattern) method that prepares the regular expression pattern—and returns a regex object which you can use multiple times in your code.


JetBrains Mono–A Font For Programmers

JetBrains, the makers of programmer tools such as IntelliJ, WebStorm, CLion and Rider, as well as the programming language Kotlin, have been working on a font specifically designed for code. JetBrains Mono is an open source font family consisting of 8 fonts designed with reading and writing code in mind.

Details from the JetBrains blog:

For the most part of our day we, as developers, look at the code. And it is no wonder that we are always on the lookout for the best font to make looking at the text on the screen easier on our eyes. However, the logic in many popular fonts does not always take into account the difference between reading through code and reading a book. Our eyes move along code in a very different way, often having to move vertically as often as they do horizontally, which is opposed to reading a book where they slide along the text always in the same direction.

Therefore, while working on JetBrains Mono we focused, among other things, on the issues that can cause eye fatigue during long sessions of working with code. We have considered things like the size and shape of letters; the amount of space between them, a balance naturally engineered in monospace fonts; unnecessary details and unclear distinctions between symbols, such as I’s and l’s for example; and programming ligatures when developing our font.

Today, we proudly present JetBrains Mono – a new open-source typeface specifically made for developers. Check out what makes JetBrains Mono unique in the big family of monospaced fonts and try it in your favorite code editor. Have a look at JetBrains Mono, your eyes will thank you for it.

More details about Mono are available here. It is the default font on all 2020 JetBrains IDEs and is available as an option in version 2019.3 and beyond of all JetBrains products. If you use another IDE you can download the zip here. Learn more about JetBrains Mono, including how to install and configure it in Visual Studio Code, in the video below.

[youtube https://www.youtube.com/watch?v=AyCZ0dVlz4A&w=853&h=480]


Python Regex Fullmatch

Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!

This article is all about the re.fullmatch(pattern, string) method of Python’s re library. There are three similar methods to help you use regular expressions:

  • The findall(pattern, string) method returns a list of string matches. Check out our blog tutorial.
  • The search(pattern, string) method returns a match object of the first match. Check out our blog tutorial.
  • The match(pattern, string) method returns a match object if the regex matches at the beginning of the string. Check out our blog tutorial.

So how does the re.fullmatch() method work? Let’s study the specification.

How Does re.fullmatch() Work in Python?

The re.fullmatch(pattern, string) method returns a match object if the pattern matches the whole string.

Specification:

re.fullmatch(pattern, string, flags=0)

The re.fullmatch() method has up to three arguments.

  • pattern: the regular expression pattern that you want to match.
  • string: the string which you want to search for the pattern.
  • flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. Want to know how to use those flags? Check out this detailed article on the Finxter blog.

We’ll explore them in more detail later.

Return Value:

The re.fullmatch() method returns a match object. You may ask (and rightly so):

What’s a Match Object?

If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?

The match object is a simple wrapper for this information. Some regex methods of the re package in Python—such as fullmatch()—automatically create a match object upon the first pattern match.

At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:

>>> m = re.fullmatch('h...o', 'hello')
>>> m.start()
0
>>> m.end()
5

In the first line, you create a match object m by using the re.fullmatch() method. The pattern ‘h...o’ matches in the string ‘hello’ at start position 0 and end position 5. But note that as the fullmatch() method always attempts to match the whole string, the m.start() method will always return zero.
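Besides start() and end(), a match object also exposes the matched text and any groups. Here is a minimal sketch that adds parentheses to the pattern purely for illustration:

```python
import re

# Same pattern as above, but split into three groups.
m = re.fullmatch('(h)(...)(o)', 'hello')

whole_match = m.group(0)
print(whole_match)   # hello

groups = m.groups()
print(groups)        # ('h', 'ell', 'o')

# span() returns (start, end); with an argument, it refers to a group.
print(m.span())      # (0, 5)
print(m.span(2))     # (1, 4)

# If the pattern does not cover the whole string, you get None,
# so check the result before calling methods on it.
no_match = re.fullmatch('h...o', 'hello world')
print(no_match is None)   # True
```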

Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.fullmatch()!

A Guided Example for re.fullmatch()

First, you import the re module and create the text string to be searched for the regex patterns:

>>> import re
>>> text = '''
Call me Ishmael. Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
to interest me on shore, I thought I would sail about a little
and see the watery part of the world. '''

Let’s say you want to match the full text with this regular expression:

>>> re.fullmatch('Call(.|\n)*', text)
>>> 

The first argument is the pattern to be found: 'Call(.|\n)*'. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. The third argument flags of the fullmatch() method is optional and we skip it in the code.

There’s no output! This means that the re.fullmatch() method did not return a match object. Why? Because at the beginning of the string, there’s no match for the ‘Call’ part of the regex. The string starts with a newline character, not with ‘Call’!

So how can we fix this? Simple, by matching a new line character ‘\n’ at the beginning of the string.

>>> re.fullmatch('\nCall(.|\n)*', text)
<re.Match object; span=(0, 229), match='\nCall me Ishmael. Some years ago--never mind how>

The regex (.|\n)* matches an arbitrary number of characters (new line characters or not) after the prefix ‘\nCall’. This matches the whole text so the result is a match object. Note that the match spans 229 characters, so the string shown in the printed match object is only a truncated prefix of the whole matching string. You can get the full match by calling m.group(0) on a match object m. This fact is often overlooked by beginner coders.
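An alternative to the (.|\n)* construct is the re.DOTALL flag, which lets the dot match newline characters as well. This is a sketch of that variant, not the approach used in the article:

```python
import re

text = '''
Call me Ishmael. Some years ago--never mind how long precisely
--having little or no money in my purse, and nothing particular
to interest me on shore, I thought I would sail about a little
and see the watery part of the world. '''

# With re.DOTALL, '.*' also consumes the newline characters.
m = re.fullmatch(r'\nCall.*', text, flags=re.DOTALL)

# group(0) returns the complete match, not the shortened preview
# that the printed match object shows.
full_text_matched = m.group(0) == text
print(full_text_matched)      # True
print(m.end() == len(text))   # True

# Without the flag, '.' stops at the first newline, so there is no full match.
no_flag = re.fullmatch(r'\nCall.*', text)
print(no_flag)                # None
```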

What’s the Difference Between re.fullmatch() and re.match()?

The methods re.fullmatch() and re.match(pattern, string) both return a match object. Both attempt to match at the beginning of the string. The only difference is that re.fullmatch() also requires the match to reach the end of the string: it wants to match the whole string!

You can see this difference in the following code:

>>> text = 'More with less'
>>> re.match('More', text)
<re.Match object; span=(0, 4), match='More'>
>>> re.fullmatch('More', text)
>>> 

The re.match(‘More’, text) method matches the string ‘More’ at the beginning of the string ‘More with less’. But the re.fullmatch(‘More’, text) method does not match the whole text. Therefore, it returns the None object—nothing is printed to your shell!
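This difference matters in practice when you validate input. As a hedged sketch, the two helper functions below (hypothetical names, with a simple date-like pattern chosen for illustration) show how re.match() accepts trailing text while re.fullmatch() does not:

```python
import re

DATE_PATTERN = r'\d{4}-\d{2}-\d{2}'

def is_date(s):
    # True only if the whole string has the form YYYY-MM-DD.
    return re.fullmatch(DATE_PATTERN, s) is not None

def starts_with_date(s):
    # True if the string merely begins with YYYY-MM-DD.
    return re.match(DATE_PATTERN, s) is not None

print(is_date('2020-01-15'))                  # True
print(is_date('2020-01-15 and more'))         # False
print(starts_with_date('2020-01-15 and more'))  # True
```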

What’s the Difference Between re.fullmatch() and re.findall()?

There are two differences between the re.fullmatch(pattern, string) and re.findall(pattern, string) methods:

  • re.fullmatch(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
  • re.fullmatch(pattern, string) can only match the whole string, while re.findall(pattern, string) can return multiple matches in the string.

Both can be seen in the following example:

>>> text = 'the 42th truth is 42'
>>> re.fullmatch('.*?42', text)
<re.Match object; span=(0, 20), match='the 42th truth is 42'>
>>> re.findall('.*?42', text)
['the 42', 'th truth is 42']

Note that the regex .*? matches an arbitrary number of characters but it attempts to consume as few characters as possible. This is called “non-greedy” match (the *? operator). The fullmatch() method only returns a match object that matches the whole string. The findall() method returns a list of all occurrences. As the match is non-greedy, it finds two such matches.
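
To see what the non-greedy operator changes, you can compare it with its greedy counterpart on the same text. This is a small sketch; the variable names lazy and greedy are chosen only for illustration:

```python
import re

text = 'the 42th truth is 42'

# Non-greedy: each match stops at the earliest possible '42'.
lazy = re.findall('.*?42', text)

# Greedy: the single match runs up to the last '42'.
greedy = re.findall('.*42', text)

print(lazy)    # ['the 42', 'th truth is 42']
print(greedy)  # ['the 42th truth is 42']
```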

What’s the Difference Between re.fullmatch() and re.search()?

The methods re.fullmatch() and re.search(pattern, string) both return a match object. However, re.fullmatch() attempts to match the whole string while re.search() matches anywhere in the string.

You can see this difference in the following code:

>>> text = 'Finxter is fun!'
>>> re.search('Finxter', text)
<re.Match object; span=(0, 7), match='Finxter'>
>>> re.fullmatch('Finxter', text)
>>> 

The re.search() method retrieves the match of the ‘Finxter’ substring as a match object. But the re.fullmatch() method returns None because the substring ‘Finxter’ does not match the whole string ‘Finxter is fun!’.
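
Because a failed match yields None, a common pattern is to test the result before using it. Here is a minimal sketch; the helper name is_exact is hypothetical and not part of the re library:

```python
import re

def is_exact(pattern, text):
    """Return True if the pattern matches the entire text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(is_exact('Finxter', 'Finxter is fun!'))    # False
print(is_exact('Finxter.*', 'Finxter is fun!'))  # True
```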

How to Use the Optional Flag Argument?

As you’ve seen in the specification, the fullmatch() method comes with an optional third ‘flag’ argument:

re.fullmatch(pattern, string, flags=0)

What’s the purpose of the flags argument?

Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).

Syntax Meaning
re.ASCII If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests.
re.A Same as re.ASCII
re.DEBUG If you use this flag, Python will print some useful information to the shell that helps you debugging your regex.
re.IGNORECASE If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
re.I Same as re.IGNORECASE
re.LOCALE Don’t use this flag. Its use is discouraged: the idea was to make matching (including case-insensitive matching) depend on your current locale, but the locale mechanism isn’t reliable.
re.L Same as re.LOCALE
re.MULTILINE This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now matches also at the end of each line in a multi-line string.
re.M Same as re.MULTILINE
re.DOTALL Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S Same as re.DOTALL
re.VERBOSE To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: all whitespace characters and lines that start with the character ‘#’ are ignored in the regex.
re.X Same as re.VERBOSE

Here’s how you’d use it in a practical example:

>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>

Although your regex ‘PYTHON’ is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.
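
The same flag works with re.fullmatch() itself. The following sketch assumes you want the whole string to match regardless of capitalization; the variable names are illustrative:

```python
import re

text = 'Python is great!'

# Without the flag, the all-caps pattern does not match the mixed-case text.
strict = re.fullmatch('PYTHON IS GREAT!', text)

# With re.IGNORECASE, the same pattern matches the whole string.
relaxed = re.fullmatch('PYTHON IS GREAT!', text, flags=re.IGNORECASE)

print(strict)          # None
print(relaxed.span())  # (0, 16)
```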

Where to Go From Here?

This article has introduced the re.fullmatch(pattern, string) method that attempts to match the whole string—and returns a match object if it succeeds or None if it doesn’t.

Learning Python is hard. But if you cheat, it isn’t as hard as it has to be:

Download 8 Free Python Cheat Sheets now!

Python Regex Match

Why have regular expressions survived seven decades of technological disruption? Because coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!

This article is all about the match() method of Python’s re library. There are two similar methods to help you use regular expressions:

  • The easy-to-use but less powerful findall() method returns a list of string matches. Check out our blog tutorial.
  • The search() method returns a match object of the first match. Check out our blog tutorial.

So how does the re.match() method work? Let’s study the specification.

How Does re.match() Work in Python?

The re.match(pattern, string) method matches the pattern at the beginning of the string and returns a match object.

Specification:

re.match(pattern, string, flags=0)

The re.match() method has up to three arguments.

  • pattern: the regular expression pattern that you want to match.
  • string: the string which you want to search for the pattern.
  • flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. Want to know how to use those flags? Check out this detailed article on the Finxter blog.

We’ll explore them in more detail later.

Return Value:

The re.match() method returns a match object if the pattern matches at the beginning of the string, and None otherwise. You may ask (and rightly so):

What’s a Match Object?

If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?

The match object is a simple wrapper for this information. Some regex methods of the re package in Python—such as match()—automatically create a match object upon the first pattern match.

At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:

>>> m = re.match('h...o', 'hello world')
>>> m.start()
0
>>> m.end()
5
>>> 'hello world'[m.start():m.end()]
'hello'

In the first line, you create a match object m by using the re.match() method. The pattern ‘h...o’ matches in the string ‘hello world’ at start position 0. You use the start and end position to access the substring that matches the pattern (using the popular Python technique of slicing). But note that as the match() method always attempts to match only at the beginning of the string, the m.start() method will always return zero.
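
Beyond start() and end(), the match object also exposes the matched text and any capture groups. The sketch below adds parentheses to the pattern purely to illustrate this; the grouping is an assumption and not part of the original example:

```python
import re

# Three capture groups: 'h', any three characters, and 'o'.
m = re.match(r'(h)(...)(o)', 'hello world')

print(m.group(0))            # hello
print(m.span())              # (0, 5)
print(m.groups())            # ('h', 'ell', 'o')
print(m.start(2), m.end(2))  # 1 4
```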

Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.match()!

A Guided Example for re.match()

First, you import the re module and create the text string to be searched for the regex patterns:

>>> import re
>>> text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

Let’s say you want to search the text for the string ‘lips’:

>>> re.match('lips', text)
>>>

The first argument is the pattern to be found: the string ‘lips’. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. The third argument flags of the match() method is optional.

There’s no output! This means that the re.match() method did not return a match object. Why? Because at the beginning of the string, there’s no match for the regex pattern ‘lips’.

So how can we fix this? Simple, by matching all the characters that precede the string ‘lips’ in the text:

>>> re.match('(.|\n)*lips', text)
<re.Match object; span=(0, 122), match="\n    Ha! let me see her: out, alas! he's cold:\n>

The regex (.|\n)*lips matches all prefixes (an arbitrary number of characters including new lines) followed by the string ‘lips’. This results in a new match object that matches a huge substring from position 0 to position 122. Note that the match object doesn’t print the whole substring to the shell. If you access the matched substring, you’ll get the following result:

>>> m = re.match('(.|\n)*lips', text)
>>> text[m.start():m.end()]
"\n    Ha! let me see her: out, alas! he's cold:\n    Her blood is settled, and her joints are stiff;\n    Life and these lips"

Interestingly, you can also achieve the same thing by specifying the third flag argument as follows:

>>> m = re.match('.*lips', text, flags=re.DOTALL)
>>> text[m.start():m.end()]
"\n    Ha! let me see her: out, alas! he's cold:\n    Her blood is settled, and her joints are stiff;\n    Life and these lips"

The re.DOTALL flag ensures that the dot operator . matches all characters including the new line character.
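
The flag can also be switched on inside the pattern itself with the inline modifier (?s). The following sketch compares the three variants; the indentation of the text is an assumption reconstructed from the spans shown above:

```python
import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

# Three ways to let the prefix run across line breaks.
explicit = re.match(r'(.|\n)*lips', text)               # newline as an alternative
with_flag = re.match(r'.*lips', text, flags=re.DOTALL)  # flag argument
inline = re.match(r'(?s).*lips', text)                  # inline modifier

print(explicit.span() == with_flag.span() == inline.span())  # True
print(inline.group(0).endswith('lips'))                      # True
```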

What’s the Difference Between re.match() and re.findall()?

There are two differences between the re.match(pattern, string) and re.findall(pattern, string) methods:

  • re.match(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
  • re.match(pattern, string) returns only the first match in the string—and only at the beginning—while re.findall(pattern, string) returns all matches in the string.

Both can be seen in the following example:

>>> text = 'Python is superior to Python'
>>> re.match('Py...n', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.findall('Py...n', text)
['Python', 'Python']

The string ‘Python is superior to Python’ contains two occurrences of ‘Python’. The match() method only returns a match object of the first occurrence. The findall() method returns a list of all occurrences.

What’s the Difference Between re.match() and re.search()?

The methods re.search(pattern, string) and re.match(pattern, string) both return a match object of the first match. However, re.match() attempts to match at the beginning of the string while re.search() matches anywhere in the string.

You can see this difference in the following code:

>>> text = 'Slim Shady is my name'
>>> re.search('Shady', text)
<re.Match object; span=(5, 10), match='Shady'>
>>> re.match('Shady', text)
>>>

The re.search() method retrieves the match of the ‘Shady’ substring as a match object. But if you use the re.match() method, there is no match and the method returns None because the substring ‘Shady’ does not occur at the beginning of the string ‘Slim Shady is my name’.

How to Use the Optional Flag Argument?

As you’ve seen in the specification, the match() method comes with an optional third ‘flag’ argument:

re.match(pattern, string, flags=0)

What’s the purpose of the flags argument?

Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).

Syntax Meaning
re.ASCII If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters, as the name suggests.
re.A Same as re.ASCII
re.DEBUG If you use this flag, Python will print some useful information to the shell that helps you debugging your regex.
re.IGNORECASE If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
re.I Same as re.IGNORECASE
re.LOCALE Don’t use this flag. Its use is discouraged: the idea was to make matching (including case-insensitive matching) depend on your current locale, but the locale mechanism isn’t reliable.
re.L Same as re.LOCALE
re.MULTILINE This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now matches also at the end of each line in a multi-line string.
re.M Same as re.MULTILINE
re.DOTALL Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S Same as re.DOTALL
re.VERBOSE To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: all whitespace characters and lines that start with the character ‘#’ are ignored in the regex.
re.X Same as re.VERBOSE

Here’s how you’d use it in a practical example:

>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>

Although your regex ‘PYTHON’ is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.

Where to Go From Here?

This article has introduced the re.match(pattern, string) method that attempts to match the first occurrence of the regex pattern at the beginning of a given string—and returns a match object.

Python soars in popularity. There are two types of people: those who understand coding and those who don’t. The latter will have larger and larger difficulties participating in the era of massive adoption and penetration of digital content. Do you want to increase your Python skills daily without investing a lot of time?

Then join my “Coffee Break Python” email list of tens of thousands of ambitious coders!

Building ArmorPaint From Source

ArmorPaint is an open source competitor to Substance Painter, from the creator of the Armory game engine (tutorial series available here).  It is available for just 16 Euro in binary form, but can also be built from source code.  This guide walks you step by step through the process of building ArmorPaint from source.

There are a few requirements before you can build.  Download and install the following programs if not already installed:

First step, we clone the repository.  Make sure to add the --recursive flag (that’s two ‘-‘ characters, by the way).

Open a command prompt, cd to the directory where you want to install ArmorPaint’s source code and run the command:

git clone --recursive https://github.com/armory3d/armorpaint.git

Depending on your internet speed this could take a minute to several minutes while all of the files are downloaded. 

In Explorer, go to the installation directory, then navigate to armorpaint\Kromx\V8\Libraries\win32\release and, using 7zip, extract v8_monolith.7z to the same directory as the .7z file.

Next in the command prompt run the following commands

(Assuming you are reusing the same CMD that you did the git clone from)

cd armorpaint

node Kromx/make -g direct3d11

cd Kromx

node Kinc/make -g direct3d11

explorer .

If you receive any errors above, the most likely cause is node not being installed.  The final command will now open a Windows Explorer window in the Kromx subdirectory.  Open the build directory and load the file Krom.sln.

This will open the project in Visual Studio.  If you haven’t run VS yet, you may have to do some initial installation steps.  Worst case scenario, run through the initial install, close it, and double-click Krom.sln again.

First make sure that you are building for x64 and Release mode at the top:

In the Solution Explorer, select Krom then hit ALT + ENTER or right click and select Properties.

Then select Debugging, in Command Arguments enter ..\..\build.krom then click Apply.

You are now ready to build ArmorPaint.  Press Ctrl + SHIFT + B or select Build->Build Solution.

Assuming no errors, your exe should be built.  Now go to the folder armorpaint\Kromx\build\x64\Release, copy the file Krom.exe, and paste it into armorpaint\build\krom.  You can now run Krom.exe and you’re good to go. 

Step by step instructions are available in the video below.

[youtube https://www.youtube.com/watch?v=y6h2KOP47ZY&w=853&h=480]


Python Regex Search

When I first learned about regular expressions, I didn’t appreciate their power. But there’s a reason regular expressions have survived seven decades of technological disruption: coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!

This article is all about the search() method of Python’s re library. To learn about the easy-to-use but less powerful findall() method that returns a list of string matches, check out our article about the similar findall() method.

So how does the re.search() method work? Let’s study the specification.

How Does re.search() Work in Python?

The re.search(pattern, string) method matches the first occurrence of the pattern in the string and returns a match object.

Specification:

re.search(pattern, string, flags=0)

The re.search() method has up to three arguments.

  • pattern: the regular expression pattern that you want to match.
  • string: the string which you want to search for the pattern.
  • flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. Want to know how to use those flags? Check out this detailed article on the Finxter blog.

We’ll explore them in more detail later.

Return Value:

The re.search() method returns a match object if the pattern occurs anywhere in the string, and None otherwise. You may ask (and rightly so):

What’s a Match Object?

If a regular expression matches a part of your string, there’s a lot of useful information that comes with it: what’s the exact position of the match? Which regex groups were matched—and where?

The match object is a simple wrapper for this information. Some regex methods of the re package in Python—such as search()—automatically create a match object upon the first pattern match.

At this point, you don’t need to explore the match object in detail. Just know that we can access the start and end positions of the match in the string by calling the methods m.start() and m.end() on the match object m:

>>> m = re.search('h...o', 'hello world')
>>> m.start()
0
>>> m.end()
5
>>> 'hello world'[m.start():m.end()]
'hello'

In the first line, you create a match object m by using the re.search() method. The pattern ‘h...o’ matches in the string ‘hello world’ at start position 0. You use the start and end position to access the substring that matches the pattern (using the popular Python technique of slicing).
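
Unlike re.match(), the search() method can report a start position other than zero. The sketch below uses a named group, word, which is an illustrative choice and not part of the original example:

```python
import re

# The named group 'word' captures a word that starts with 'w'.
m = re.search(r'(?P<word>w\w+)', 'hello world')

print(m.group('word'))  # world
print(m.span('word'))   # (6, 11)
print(m.start())        # 6
```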

Now, you know the purpose of the match object in Python. Let’s check out a few examples of re.search()!

A Guided Example for re.search()

First, you import the re module and create the text string to be searched for the regex patterns:

>>> import re
>>> text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

Let’s say you want to search the text for the string ‘her’:

>>> re.search('her', text)
<re.Match object; span=(20, 23), match='her'>

The first argument is the pattern to be found. In our case, it’s the string ‘her’. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. You don’t need to define the optional third argument flags of the search() method because you’re fine with the default behavior in this case.

Look at the output: it’s a match object! The match object gives the span of the match, that is, the start and stop indices of the match. We can also directly access those boundaries by using the start() and end() methods of the match object:

>>> m = re.search('her', text)
>>> m.start()
20
>>> m.end()
23

The problem is that the search() method only retrieves the first occurrence of the pattern in the string. If you want to find all matches in the string, you may want to use the findall() method of the re library.

What’s the Difference Between re.search() and re.findall()?

There are two differences between the re.search(pattern, string) and re.findall(pattern, string) methods:

  • re.search(pattern, string) returns a match object while re.findall(pattern, string) returns a list of matching strings.
  • re.search(pattern, string) returns only the first match in the string while re.findall(pattern, string) returns all matches in the string.

Both can be seen in the following example:

>>> text = 'Python is superior to Python'
>>> re.search('Py...n', text)
<re.Match object; span=(0, 6), match='Python'>
>>> re.findall('Py...n', text)
['Python', 'Python']

The string ‘Python is superior to Python’ contains two occurrences of ‘Python’. The search() method only returns a match object of the first occurrence. The findall() method returns a list of all occurrences.
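
If you want all occurrences but still need match objects (for example, to read their positions), the re.finditer() function is one option. A minimal sketch using the same text:

```python
import re

text = 'Python is superior to Python'

# re.finditer() yields one match object per occurrence.
spans = [m.span() for m in re.finditer('Py...n', text)]

print(spans)  # [(0, 6), (22, 28)]
```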

What’s the Difference Between re.search() and re.match()?

The methods re.search(pattern, string) and re.match(pattern, string) both return a match object of the first match. However, re.match() attempts to match at the beginning of the string while re.search() matches anywhere in the string.

You can see this difference in the following code:

>>> text = 'Slim Shady is my name'
>>> re.search('Shady', text)
<re.Match object; span=(5, 10), match='Shady'>
>>> re.match('Shady', text)
>>>

The re.search() method retrieves the match of the ‘Shady’ substring as a match object. But if you use the re.match() method, there is no match and no return value because the substring ‘Shady’ does not occur at the beginning of the string ‘Slim Shady is my name’.

How to Use the Optional Flag Argument?

As you’ve seen in the specification, the search() method comes with an optional third ‘flag’ argument:

re.search(pattern, string, flags=0)

What’s the purpose of the flags argument?

Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (for example, whether to ignore capitalization when matching your regex).

Syntax Meaning
re.ASCII If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters — as the name suggests.
re.A Same as re.ASCII
re.DEBUG If you use this flag, Python will print some useful information to the shell that helps you debugging your regex.
re.IGNORECASE If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
re.I Same as re.IGNORECASE
re.LOCALE Don’t use this flag. Its use is discouraged: the idea was to make matching (including case-insensitive matching) depend on your current locale, but the locale mechanism isn’t reliable.
re.L Same as re.LOCALE
re.MULTILINE This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now matches also at the end of each line in a multi-line string.
re.M Same as re.MULTILINE
re.DOTALL Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S Same as re.DOTALL
re.VERBOSE To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: all whitespace characters and lines that start with the character ‘#’ are ignored in the regex.
re.X Same as re.VERBOSE

Here’s how you’d use it in a practical example:

>>> text = 'Python is great!'
>>> re.search('PYTHON', text, flags=re.IGNORECASE)
<re.Match object; span=(0, 6), match='Python'>

Although your regex ‘PYTHON’ is all-caps, we ignore the capitalization by using the flag re.IGNORECASE.

Where to Go From Here?

This article has introduced the re.search(pattern, string) method that attempts to match the first occurrence of the regex pattern in a given string—and returns a match object.

Python soars in popularity. There are two types of people: those who understand coding and those who don’t. The latter will have larger and larger difficulties participating in the era of massive adoption and penetration of digital content. Do you want to increase your Python skills daily without investing a lot of time?

Then join my “Coffee Break Python” email list of tens of thousands of ambitious coders!

Python Regex Flags

In many functions, you see a third argument flags. What are they and how do they work?

Flags allow you to control the regular expression engine. Because regular expressions are so powerful, flags are a useful way of switching certain features on and off (e.g. whether to ignore capitalization when matching your regex).

For example, here’s how the third argument flags is used in the re.findall() method:

re.findall(pattern, string, flags=0)

So the flags argument seems to be an integer argument with the default value of 0. To control the default regex behavior, you simply use one of the predefined integer values. You can access these predefined values via the re library:

Syntax Meaning
re.ASCII If you don’t use this flag, the special Python regex symbols \w, \W, \b, \B, \d, \D, \s and \S will match Unicode characters. If you use this flag, those special symbols will match only ASCII characters — as the name suggests.
re.A Same as re.ASCII
re.DEBUG If you use this flag, Python will print some useful information to the shell that helps you debugging your regex.
re.IGNORECASE If you use this flag, the regex engine will perform case-insensitive matching. So if you’re searching for [A-Z], it will also match [a-z].
re.I Same as re.IGNORECASE
re.LOCALE Don’t use this flag. Its use is discouraged: the idea was to make matching (including case-insensitive matching) depend on your current locale, but the locale mechanism isn’t reliable.
re.L Same as re.LOCALE
re.MULTILINE This flag switches on the following feature: the start-of-the-string regex ‘^’ matches at the beginning of each line (rather than only at the beginning of the string). The same holds for the end-of-the-string regex ‘$’ that now matches also at the end of each line in a multi-line string.
re.M Same as re.MULTILINE
re.DOTALL Without using this flag, the dot regex ‘.’ matches all characters except the newline character ‘\n’. Switch on this flag to really match all characters including the newline character.
re.S Same as re.DOTALL
re.VERBOSE To improve the readability of complicated regular expressions, you may want to allow comments and (multi-line) formatting of the regex itself. This is possible with this flag: all whitespace characters and lines that start with the character ‘#’ are ignored in the regex.
re.X Same as re.VERBOSE
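
To make two of the table entries concrete, here is a small sketch of re.MULTILINE and re.DOTALL on a two-line string. The sample text is made up for illustration:

```python
import re

text = 'first line\nsecond line'

# '^' matches only at the start of the string by default ...
start_default = re.findall(r'^\w+', text)
# ... and at the start of every line with re.MULTILINE.
start_multi = re.findall(r'^\w+', text, flags=re.MULTILINE)

# '.' stops at the newline by default ...
dot_default = re.findall(r'.+', text)
# ... and crosses it with re.DOTALL.
dot_all = re.findall(r'.+', text, flags=re.DOTALL)

print(start_default)  # ['first']
print(start_multi)    # ['first', 'second']
print(dot_default)    # ['first line', 'second line']
print(dot_all)        # ['first line\nsecond line']
```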

How to Use These Flags?

Simply include the flag as the optional flag argument as follows:

import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

print(re.findall('HER', text, flags=re.IGNORECASE))
# ['her', 'Her', 'her', 'her']

As you see, the re.IGNORECASE flag ensures that all occurrences of the string ‘her’ are matched—no matter their capitalization.

How to Use Multiple Flags?

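Simply add them together (sum them up) as follows. Because each flag is a distinct bit, the bitwise OR operator | gives the same result and is the more common idiom: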

import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

print(re.findall(' HER # Ignored', text, flags=re.IGNORECASE + re.VERBOSE))
# ['her', 'Her', 'her', 'her']

You use both flags re.IGNORECASE (all occurrences of lower- or uppercase string variants of ‘her’ are matched) and re.VERBOSE (ignore comments and whitespaces in the regex). You sum them together re.IGNORECASE + re.VERBOSE to indicate that you want to take both.
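
Summing works as long as each flag appears only once. The sketch below shows why the bitwise OR operator | is often considered the safer way to combine flags; the short sample string is made up for illustration:

```python
import re

both = re.IGNORECASE | re.VERBOSE

# For two different flags, OR and sum give the same value.
print(both == re.IGNORECASE + re.VERBOSE)                # True

# OR-ing a flag with itself changes nothing ...
print((re.IGNORECASE | re.IGNORECASE) == re.IGNORECASE)  # True
# ... but adding it twice produces a different flag (re.LOCALE).
print((re.IGNORECASE + re.IGNORECASE) == re.LOCALE)      # True

matches = re.findall(' HER # Ignored', 'her Her HER', flags=both)
print(matches)  # ['her', 'Her', 'HER']
```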

Python re.findall() – Everything You Need to Know

When I first learned about regular expressions, I didn’t really appreciate their power. But there’s a reason regular expressions have survived seven decades of technological disruption: coders who understand regular expressions have a massive advantage when working with textual data. They can write in a single line of code what takes others dozens!

This article is all about the findall() method of Python’s re library. The findall() method is the most basic way of using regular expressions in Python: If you want to master them, start here!

So how does the re.findall() method work? Let’s study the specification.

How Does the findall() Method Work in Python?

The re.findall(pattern, string) method finds all occurrences of the pattern in the string and returns a list of all matching substrings.

Specification:

re.findall(pattern, string, flags=0)

The re.findall() method has up to three arguments.

  • pattern: the regular expression pattern that you want to match.
  • string: the string which you want to search for the pattern.
  • flags (optional argument): a more advanced modifier that allows you to customize the behavior of the function. Want to know how to use those flags? Check out this detailed article on the Finxter blog.

We will have a look at each of them in more detail.

Return Value:

The re.findall() method returns a list of strings. Each string element is a matching substring of the string argument.
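
One detail worth knowing: when the pattern contains capture groups, findall() returns the groups instead of the whole match. A short sketch with a made-up sample string:

```python
import re

text = 'a1 b2 c3'

# No groups: a list of full matches.
plain = re.findall(r'[a-z]\d', text)
# One group: a list of that group's content.
one_group = re.findall(r'([a-z])\d', text)
# Several groups: a list of tuples.
two_groups = re.findall(r'([a-z])(\d)', text)

print(plain)       # ['a1', 'b2', 'c3']
print(one_group)   # ['a', 'b', 'c']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
```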

Let’s check out a few examples!

Examples re.findall()

First, you import the re module and create the text string to be searched for the regex patterns:

import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

Let’s say, you want to search the text for the string ‘her’:

>>> re.findall('her', text)
['her', 'her', 'her']

The first argument is the pattern you look for. In our case, it’s the string ‘her’. The second argument is the text to be analyzed. You stored the multi-line string in the variable text—so you take this as the second argument. You don’t need to define the optional third argument flags of the findall() method because you’re fine with the default behavior in this case.

Also note that the findall() function returns a list of all matching substrings. In this case, this may not be too useful because we only searched for an exact string. But if we search for more complicated patterns, this may actually be very useful:

>>> re.findall('\\bf\w+\\b', text)
['frost', 'flower', 'field']

The regex ‘\\bf\w+\\b’ matches all words that start with the character ‘f’.

You may ask: why enclose the regex in a leading and trailing ‘\\b’? This is the word boundary that matches the empty string at the beginning or at the end of a word. You can define a word as a sequence of characters that are not whitespace characters or other delimiters such as ‘.:,?!’.

In the previous example, you need to escape the backslash of the boundary character ‘\b’ because in a normal Python string, the character sequence ‘\b’ means the backspace character.
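
A common way to avoid the double backslashes is a raw string (prefix r), in which Python leaves backslashes untouched. The sketch below uses a shortened sample text for illustration. Note that newer Python versions also warn about sequences such as ‘\w’ in ordinary strings, so the escaped variant doubles every backslash:

```python
import re

text = 'frost on the flower of the field'

# Ordinary string: every backslash has to be doubled.
escaped = re.findall('\\bf\\w+\\b', text)
# Raw string: the pattern is written as the regex engine sees it.
raw = re.findall(r'\bf\w+\b', text)

print(escaped == raw)          # True
print(raw)                     # ['frost', 'flower', 'field']
print(len('\b'), len(r'\b'))   # 1 2
```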

Where to Go From Here?

This article has introduced the re.findall(pattern, string) method that attempts to match all occurrences of the regex pattern in a given string—and returns a list of all matches as strings.



Why choose the Godot Game Engine over Unity or Unreal Engine


This is a very common question, so this guide and video set out to answer why *I* might choose to use Godot over those other engines. Keep in mind, this isn’t me saying Godot is better or worse than those engines. Additionally, I have a video on Unreal vs Unity in the works, so if you want to decide which of those engines to use, stay tuned for that.

Without further ado, let’s jump in.

Free

The lack of a price tag is one of the most obvious features of Godot. Yes, you can start for free with both Unity and Unreal Engine, but both ultimately have a price tag. With Unity, you pay a per-seat license fee if you make over $100K a year. With Unreal Engine, you pay a fixed 5% royalty after the first $3,000 earned. If you’re not making money nor plan to, this obviously doesn’t matter… but the more successful your game is, the better a deal free is!
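To put rough numbers on that last point, here is a minimal sketch that uses only the Unreal figures quoted above (5% after the first $3,000). The function name and its parameters are made up for illustration, and the actual license terms carry more detail than this.

```python
def unreal_royalty(gross_revenue, rate=0.05, exempt=3_000):
    """Royalty owed under the simplified terms quoted in this article.

    Assumption: a flat rate applied to everything above a single exempt
    amount. Real license agreements are more detailed than this.
    """
    return max(0, gross_revenue - exempt) * rate


# The royalty grows with revenue, while an MIT-licensed engine stays at zero.
for gross in (1_000, 100_000, 1_000_000):
    print(gross, unreal_royalty(gross))
```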

Open Source

On the topic of free, we also have free as in freedom. Godot is free in both regards, price tag and license, being released under the MIT license. Unity trails in this regard, having only select subsets of the code available. Unreal Engine has the source code available: you can build the engine completely from source, and you can fix problems yourself by walking through a debug build and applying fixes.

UE4, however, is under a more restrictive proprietary license, while Godot is under the incredibly flexible and permissive MIT license.

Another aspect in Godot’s favor… it’s also by far the smallest code base and very modular in design from a code perspective. This makes it among the easiest engines to contribute code to. The learning curve to understand the source code is a fraction of that to get started contributing to Unreal, while contributing to Unity is frankly impossible without a very expensive negotiated source license.

Language Flexibility

Over the years Unity have *REMOVED* language support. Once there was UnityScript and Boo, a Python-like language, in addition to C#. Now it’s pretty much just C# and their in-development visual scripting language.

Unreal, on the other hand, has C++ support, and thanks to Live++ that C++ is usable very much like a scripting language (although final build times are by far the worst of all 3 engines!). It also has the (IMHO) single best visual programming language available, Blueprints.

For Godot the options are much more robust. First off there is the Python-lite scripting language, GDScript. You can also use C++, although the workflow for gameplay programming may be suboptimal. Additionally, C# support is being added as a first-class language and there is a visual programming language available here as well, although I can’t really think of a reason to use it as it stands now.

Where Godot really shines though is its modularity. GDScript itself is implemented as a module, meaning making other custom scripting languages is a borderline trivial task, as is extending or customizing GDScript. Additionally, there is GDNative/NativeScript, which makes it fairly simple to link to external code or to write performance-critical code in C or C++, without having to jump into the guts of Godot (or having to compile Godot). Finally, you have the ability to create C++ “modules” that have access to all of the C++ classes available in Godot without having to make changes to the underlying codebase.

Ease of Use

This one is obviously subjective, but if you are looking to create a game, especially as a beginner, the learning curve and ease of use with GDScript make this the easiest of the 3 engines to pick up, at least in my opinion. Unreal Engine is frankly fairly appalling for 2D titles, having basically left Paper2D (their 2D API) to wither on the vine. Over the last couple of years Unity have really been focusing more heavily on dedicated 2D support, but you still must dig through a lot of cruft and overhead to get to the meat of your game.

With Godot you get pretty much everything you need for 2D out of the box, plus the ability to work directly with pixel (or % based) coordinates.

It’s Tiny

Unreal and Unity are multi-GB installs and both have a hub or launcher app. Godot… a 50ish MB zip file (plus a couple hundred more MB of templates needed when deploying). Download, unzip and start game development!

You Like it Better?

You may or may not like the coding model of Godot. Chances are if you like the node-based approach to game development, you will love Godot. All three game engines (and almost all modern game engines) take a composition-based approach to scene modeling. Godot takes it one step further, making everything nodes: trees of nodes, and even scenes, are simply nodes. The model is different enough that users tend to either love it or hate it. If you love the approach Godot takes, you will be productive in it. If you don’t like it, you’re probably better served using Unity or Unreal.
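To make the “everything is a node” idea concrete, here is a minimal Python sketch of a tree in which a reusable scene is just another node attached to a parent. The Node class, its methods, and the node names are hypothetical and only illustrate the structure; this is not Godot’s API or GDScript.

```python
class Node:
    """Hypothetical tree node; a 'scene' is simply a Node with children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Attach the child and remember its parent.
        child.parent = self
        self.children.append(child)
        return child

    def walk(self, depth=0):
        # Depth-first traversal yielding (depth, name) pairs.
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


def make_player():
    # A reusable "scene": a root node with its own subtree.
    player = Node("Player")
    player.add_child(Node("Sprite"))
    player.add_child(Node("CollisionShape"))
    return player


# The level is a node too, and the player scene is attached like any child.
level = Node("Level")
level.add_child(make_player())
level.add_child(Node("Camera"))

for depth, name in level.walk():
    print("  " * depth + name)
```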

Why Not Pick Godot Then?

I am not even going to pretend that Godot is the perfect game engine and ideal in every situation… there are certainly areas where Unity and Unreal have a small to huge advantage. This could be its own entire video, but a quick list includes:

  • Performance concerns, especially on large 3D scenes (hopefully resolved with proper culling and the upcoming Vulkan renderer). In 3D, both engines outperform Godot quite often.
  • Platforms… Unity and Unreal support every single platform you can imagine, while Godot supports most of the common consumer categories and takes longer to get support for devices like AR/VR. Hardware manufacturers work with Unity and Epic from the design stages, while Godot pretty much must wait for hardware to come to market and then for someone to implement it. Another huge difference, and one of the few downsides to open source software, is that it isn’t compatible with the closed proprietary licenses of console hardware. While Godot has been ported to run on console hardware, it isn’t supported out of the box and probably never will be.
  • Ecosystem. Godot has a vibrant community but can’t hold a candle to the ecosystem around Unreal and especially Unity. There are simply more users, more books, larger asset stores, etc.
  • The resume factor… this is a part of ecosystem continued. It’s easier to get a job with Unity experience or Unreal experience on the resume than Godot. While many people wouldn’t (and really for a full-time hire, shouldn’t) care what engine you use, when people are hunting for employees, they often look for Unity or UE experience specifically. The other side of this coin is the number of people with Unity or UE experience is larger if you are the one doing the hiring.
  • As with many open source projects, it’s still heavily dependent on one or two key developers. If the leads left the project, it would be a massive blow to the future of Godot. Meanwhile there are hundreds or thousands of people being paid to develop Unity or Unreal, and the departure of any individual member isn’t likely to have a tangible impact.

The Longer Video Version

[youtube https://www.youtube.com/watch?v=l7BrpcboJno&w=853&h=480]
