re Module
The built-in re library in Python is a module that provides support for regular expressions. A regular expression, also known as a regex or regexp, is a sequence of characters that defines a search pattern. The re module allows you to search for patterns within strings and perform various operations on them, such as matching, replacing, or splitting.
The re module provides functions for working with regular expressions, including compile(), search(), match(), fullmatch(), findall(), finditer(), sub(), split(), and escape(). The match objects returned by the matching functions provide methods such as group() and groups() for retrieving the matched text. Together, these let you compile regular expressions, search for patterns, replace substrings, split strings, and more.
Using the re module, you can perform complex text-processing tasks, such as parsing log files, validating user input, and more. Regular expressions can be difficult to learn and use effectively, but they are a powerful tool for working with text data.
Here is an example of using the re module to search for a pattern within a string:
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = r"fox"
result = re.search(pattern, text)
if result:
    print("Match found:", result.group())
else:
    print("No match found.")
In this example, we import the re module and define a string variable called text. We also define a regular expression pattern that matches the word "fox". We then use the re.search() method to search for the pattern within the text variable. If a match is found, we print the matched substring using the group() method. If no match is found, we print a message indicating that no match was found.
Methods
The re module in Python provides several methods for working with regex. In this article, I will explore these methods and provide example code and output for each one.
re.compile()
The re.compile() method is used to compile a regex pattern into a regex object that can be used for pattern matching. Here's an example:
import re
pattern = re.compile(r'\d+')
result = pattern.findall('There are 123 apples and 456 oranges')
print(result)
['123', '456']
In this example, we compile the regex pattern \d+ into a regex object using re.compile(). We then use the findall() method of this object to search for all occurrences of one or more digits in the given string.
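One common reason to compile a pattern is to reuse it across many strings. The following is a minimal sketch of that idea; the sample lines and the names digits, lines, and counts are made up for illustration.

```python
import re

# Compile once, then reuse the same pattern object for every line.
digits = re.compile(r"\d+")

# Hypothetical input lines, invented for this example.
lines = ["order 66 shipped", "no numbers here", "7 items, 2 boxes"]

# Count how many runs of digits each line contains.
counts = [len(digits.findall(line)) for line in lines]
print(counts)  # [1, 0, 2]
```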
re.search()
The re.search() method searches for the first occurrence of a regex pattern in a string. Here's an example:
import re
result = re.search(r'\d+', 'There are 123 apples and 456 oranges')
print(result.group())
123
In this example, we use re.search() to search for the first occurrence of one or more digits in the given string. We then use the group() method of the resulting match object to get the matched string.
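Note that re.search() returns None when nothing matches, so calling group() on the result without checking it first raises an AttributeError. One way to guard against that is sketched below; the helper name first_number is hypothetical.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    # A match object is truthy; a failed search returns None.
    return match.group() if match else None

print(first_number("There are 123 apples"))  # 123
print(first_number("No digits at all"))      # None
```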
re.match()
The re.match() method is similar to re.search(), but it only searches at the beginning of the string. Here's an example:
import re
result = re.match(r'\d+', '123 apples and 456 oranges')
print(result.group())
123
In this example, we use re.match() to search for the first occurrence of one or more digits at the beginning of the given string.
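To make the difference from re.search() concrete, here is a small sketch that runs both on a string where the digits are not at the start. The variable names are chosen for illustration only.

```python
import re

text = "There are 123 apples"

# re.match() only tries the pattern at position 0, so it fails here.
at_start = re.match(r"\d+", text)
# re.search() scans forward and finds the digits later in the string.
anywhere = re.search(r"\d+", text)

print(at_start)          # None
print(anywhere.group())  # 123
```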
re.fullmatch()
The re.fullmatch() method is similar to re.match(), but it matches the entire string, not just the beginning. Here's an example:
import re
result = re.fullmatch(r'\d+', '123')
print(result.group())
123
In this example, we use re.fullmatch() to match the entire string to the regex pattern \d+.
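The contrast with re.match() shows up when the string has trailing characters that the pattern does not cover. A short sketch, with an invented input string:

```python
import re

candidate = "123abc"

# re.match() is satisfied by the leading digits.
prefix_only = re.match(r"\d+", candidate)
# re.fullmatch() fails because "abc" is left over.
whole_string = re.fullmatch(r"\d+", candidate)

print(prefix_only.group())  # 123
print(whole_string)         # None
```

This makes re.fullmatch() a natural fit for validating input, where the whole string has to conform to the pattern.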
re.findall()
The re.findall() method returns a list of all non-overlapping matches of a regex pattern in a string. Here's an example:
import re
result = re.findall(r'\d+', 'There are 123 apples and 456 oranges')
print(result)
['123', '456']
In this example, we use re.findall() to find all occurrences of one or more digits in the given string.
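One detail worth knowing is that the shape of the result changes when the pattern contains capturing groups: with two or more groups, re.findall() returns a list of tuples instead of a list of strings. The sketch below uses a made-up key=value string to show this.

```python
import re

text = "apples=123, oranges=456"

# No groups: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)
print(whole)  # ['apples=123', 'oranges=456']

# Two groups: each item is a tuple of the captured groups.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('apples', '123'), ('oranges', '456')]
```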
re.finditer()
The re.finditer() method returns an iterator that produces match objects for all non-overlapping matches of a regex pattern in a string. Here's an example:
import re
for match in re.finditer(r'\d+', 'There are 123 apples and 456 oranges'):
    print(match.group())
123
456
In this example, we use re.finditer() to find all occurrences of one or more digits in the given string. We then loop through the resulting iterator and print the matched strings.
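Because re.finditer() yields match objects rather than plain strings, it also gives access to where each match was found. As a rough sketch, the start() and end() methods of each match object can be collected alongside the text:

```python
import re

text = "There are 123 apples and 456 oranges"

# Record each match together with its start and end offsets.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('123', 10, 13), ('456', 25, 28)]
```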
re.sub()
The re.sub() method is used to replace all occurrences of a pattern in a string with a replacement string. The syntax for re.sub() is as follows:
re.sub(pattern, repl, string, count=0, flags=0)
pattern: the regular expression pattern to search for
repl: the replacement string
string: the string to search in
count (optional): the maximum number of occurrences to replace (defaults to 0, which means replace all occurrences)
flags (optional): regular expression flags
Here's an example that replaces all occurrences of "world" in a string with "python":
import re
string = "hello world, welcome to the world of python"
new_string = re.sub("world", "python", string)
print(new_string)
hello python, welcome to the python of python
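The count parameter and group references in the replacement string are easier to see in action. The sketch below reuses the string from the example above; the second call, which swaps two words using \1 and \2, is an extra illustration that is not part of the original example.

```python
import re

string = "hello world, welcome to the world of python"

# count=1 replaces only the first occurrence.
first_only = re.sub("world", "python", string, count=1)
print(first_only)  # hello python, welcome to the world of python

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello
```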
re.split()
The re.split() method is used to split a string into a list of substrings using a regular expression pattern as the delimiter. The syntax for re.split() is as follows:
re.split(pattern, string, maxsplit=0, flags=0)
pattern: the regular expression pattern to use as the delimiter
string: the string to split
maxsplit (optional): the maximum number of splits to perform (defaults to 0, which means split all occurrences)
flags (optional): regular expression flags
Here's an example that splits a string using a regular expression pattern that matches one or more whitespace characters:
import re
string = "hello world of python"
new_list = re.split(r"\s+", string)
print(new_list)
['hello', 'world', 'of', 'python']
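Two behaviors of re.split() are worth a quick sketch: limiting the number of splits with maxsplit, and keeping the delimiters by putting a capturing group in the pattern. The comma-separated input in the second call is invented for illustration.

```python
import re

string = "hello world of python"

# maxsplit=1 splits once; the rest of the string stays intact.
head_and_rest = re.split(r"\s+", string, maxsplit=1)
print(head_and_rest)  # ['hello', 'world of python']

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r"(,)", "a,b,c")
print(with_delims)  # ['a', ',', 'b', ',', 'c']
```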
re.escape()
The re.escape() method is used to escape special characters in a string so that they can be used as literal characters in a regular expression pattern. The syntax for re.escape() is as follows:
re.escape(string)
string: the string to escape
Here's an example that uses re.escape() to escape special characters in a string:
import re
string = "hello (world)"
escaped_string = re.escape(string)
print(escaped_string)
hello\ \(world\)
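A typical use for re.escape() is building a pattern from text you do not control, such as a user's search term. The sketch below assumes a hypothetical term that contains parentheses and compares the results with and without escaping.

```python
import re

# Hypothetical user-supplied search term containing regex metacharacters.
term = "(world)"
text = "hello (world) and hello world"

# Without escaping, the parentheses form a group that matches plain "world".
unescaped = re.findall(term, text)
# With escaping, only the literal "(world)" matches.
escaped = re.findall(re.escape(term), text)

print(unescaped)  # ['world', 'world']
print(escaped)    # ['(world)']
```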
Match.group()
The group() method belongs to the match object returned by functions such as re.search() and re.match(), not to the re module itself. It retrieves the matched substring for one or more groups in a regular expression pattern. The syntax for group() is as follows:
match.group([group1, ...])
group1, group2, ... (optional): the group numbers or names to retrieve; with no arguments, group() returns the entire match
Here's an example that uses group() to retrieve the matched substring for a specific group:
import re
string = "hello world"
pattern = r"(\w+)\s(\w+)"
match = re.search(pattern, string)
print(match.group(1))
hello
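The group() method of a match object accepts more than a single group number. As a brief sketch, the pattern below adds the group names first and second (chosen here for illustration) to show the entire match, several groups at once, and lookup by name:

```python
import re

match = re.search(r"(?P<first>\w+)\s(?P<second>\w+)", "hello world")

print(match.group())         # hello world (same as group(0))
print(match.group(2))        # world
print(match.group(1, 2))     # ('hello', 'world')
print(match.group("first"))  # hello
```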
Match.groups()
The groups() method of a match object returns a tuple containing all the captured groups of the match. A group is defined using parentheses in the pattern. The method returns an empty tuple if the pattern contains no groups; a group that did not take part in the match appears as None by default.
Here's an example:
import re
pattern = r'(\d{3})-(\d{2})-(\d{4})'
string = 'My SSN is 123-45-6789.'
match = re.search(pattern, string)
if match:
    groups = match.groups()
    print(groups)
('123', '45', '6789')
In this example, we define a regular expression pattern that matches a Social Security Number (SSN) in the format xxx-xx-xxxx, where x is a digit. We use the re.search() method to search for this pattern in the given string. If a match is found, we call the groups() method on the resulting match object to extract all the captured groups in the pattern. The resulting tuple contains the three groups of digits that make up the SSN.
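When a pattern contains an optional group that does not take part in the match, groups() reports it as None unless a default value is supplied. The version-number pattern below is a made-up illustration of that behavior.

```python
import re

# The minor version group is optional, so it may not take part in the match.
pattern = r"(\d+)(?:\.(\d+))?"
match = re.search(pattern, "version 3")

print(match.groups())     # ('3', None)
# Passing a default replaces None for groups that did not participate.
print(match.groups("0"))  # ('3', '0')
```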