Create an account


Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
[Tut] Python | Split String with Regex

#1
Python | Split String with Regex

Rate this post

Summary: The different methods to split a string using regex are:

  • re.split()
  • re.sub()
  • re.findall()
  • re.compile()

Minimal Example


import re text = "Earth:Moon::Mars:Phobos" # Method 1
res = re.split("[:]+", text)
print(res) # Method 2
res = re.sub(r':', " ", text).split()
print(res) # Method 3
res = re.findall("[^:\s]+", text)
print(res) # Method 4
pattern = re.compile("[^:\s]+").findall
print(pattern(text)) # Output
['Earth', 'Moon', 'Mars', 'Phobos']

Problem Formulation


?Problem: Given a string and a delimiter. How will you split the string using the given delimiter using different functions from the regular expressions library?

Example: In the following example, the given string has to be split using a hyphen as the delimiter.

# Input
text = "abc-lmn-xyz" # Expected Output
['abc', 'lmn', 'xyz']

Method 1: re.split


The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches resulting in a list of strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].


Approach: Use the re.split function and pass [_]+ as the pattern which splits the given string on occurrence of an underscore.

Code:

import re text = "abc_lmn_xyz"
res = re.split("[_]+", text)
print(res) # ['abc', 'lmn', 'xyz']

?Related Read: Python Regex Split

Method 2: re.sub


The regex function re.sub(P, R, S) replaces all occurrences of the pattern P with the replacement R in string S. It returns a new string. For example, if you call re.sub('a', 'b', 'aabb'), the result will be the new string 'bbbb' with all characters 'a' replaced by 'b'.


Approach: The idea here is to use the re.sub function to replace all occurrences of underscores with a space and then use the split function to split the string at spaces.

Code:

import re text = "abc_lmn_xyz"
res = re.sub(r'_', " ", text).split()
print(res) # ['abc', 'lmn', 'xyz']

?Related Read: Python Regex Sub

Method 3: re.findall


The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.


Approach: Find all occurrences of characters that are separated by underscores using the re.findall().

Code:

import re text = "abc_lmn_xyz"
res = re.findall("[^_\s]+", text)
print(res) # ['abc', 'lmn', 'xyz']

?Related Read: Python re.findall()

Method 4: re.compile


The method re.compile(pattern) returns a regular expression object from the pattern that provides basic regex methods such as pattern.search(string)pattern.match(string), and pattern.findall(string). The explicit two-step approach of (1) compiling and (2) searching the pattern is more efficient than calling, say, search(pattern, string) at once, if you match the same pattern multiple times because it avoids redundant compilations of the same pattern.


Code:

import re text = "abc_lmn_xyz"
pattern = re.compile("[^-\s]+").findall
print(pattern(text)) # ['abc', 'lmn', 'xyz']

Why use re.compile?


  • Efficiency: Using re.compile() to assemble regular expressions is effective when the expression has to be used more than once. Thus, by using the classes/objects created by compile function, we can search for instances that we need within different strings without having to rewirte the expressions again and again. This increases productivity as well as saves time.
  • Readability: Another advantage of using re.compile is the readability factor as it leverages you the power to decouple the specification of the regex.

?Read: Is It Worth Using Python’s re.compile()?

Exercise


Problem: Python regex split by spaces, commas, and periods, but not in cases like 1,000 or 1.50.

Given:
my_string = "one two 3.4 5,6 seven.eight nine,ten"
Expected Output:
["one", "two", "3.4", "25.6" , "seven", "eight", "nine", "ten"]

Solution

my_string = "one two 3.4 25.6 seven.eight nine,ten"
res = re.split('\s|(?<!\d)[,.](?!\d)', my_string)
print(res) # ['one', 'two', '3.4', '25.6', 'seven', 'eight', 'nine', 'ten']

Conclusion


Therefore, we have learned four different ways of splitting a string using the regular expressions package in Python. Feel free to use the suitable technique that fits your needs. The idea of this tutorial was to get you acquainted with the numerous ways of using regex to split a string and I hope it helped you.

Please stay tuned and subscribe for more interesting discussions and tutorials in the future. Happy coding! ?


Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.



https://www.sickgaming.net/blog/2022/12/...ith-regex/
Reply



Possibly Related Threads…
Thread Author Replies Views Last Post
  [Tut] Python Int to String with Trailing Zeros xSicKxBot 0 38 12-01-2025, 05:47 PM
Last Post: xSicKxBot
  [Tut] Wrap and Truncate a String with Textwrap in Python xSicKxBot 0 2,057 09-01-2023, 07:45 PM
Last Post: xSicKxBot
  [Tut] Write a Long String on Multiple Lines in Python xSicKxBot 0 1,506 08-17-2023, 11:05 AM
Last Post: xSicKxBot
  [Tut] 5 Effective Methods to Sort a List of String Numbers Numerically in Python xSicKxBot 0 1,557 08-16-2023, 08:49 AM
Last Post: xSicKxBot
  [Tut] Sort a List, String, Tuple in Python (sort, sorted) xSicKxBot 0 1,697 08-15-2023, 02:08 PM
Last Post: xSicKxBot
  [Tut] Python Regex Capturing Groups – A Helpful Guide (+Video) xSicKxBot 0 1,419 04-07-2023, 10:07 AM
Last Post: xSicKxBot
  [Tut] How to Access Multiple Matches of a Regex Group in Python? xSicKxBot 0 1,491 04-04-2023, 02:26 PM
Last Post: xSicKxBot
  [Tut] F-String Python Hex, Oct, and Bin: Efficient Number Conversions xSicKxBot 0 1,672 03-28-2023, 12:01 PM
Last Post: xSicKxBot
  [Tut] How to Correctly Write a Raw Multiline String in Python: Essential Tips xSicKxBot 0 1,493 03-27-2023, 05:54 PM
Last Post: xSicKxBot
  [Tut] How To Extract Numbers From A String In Python? xSicKxBot 0 1,319 02-26-2023, 02:45 PM
Last Post: xSicKxBot

Forum Jump:


Users browsing this thread:

Forum software by © MyBB Theme © iAndrew 2016