Introduction to Regex in Python

Repeats a character one or more times (non-greedy)
[aeiou]     Matches a single character in the listed set
[^XYZ]      Matches a single character not in the listed set
[a-z0-9]    The set of characters can include a range
(           Indicates where string extraction is to start
)           Indicates where string extraction is to end

Using the above cheat sheet as a guide, you can pretty much come up with any syntax.
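As a quick illustration of the cheat sheet entries, here is a minimal sketch. The sample strings are made up for the example, and `+?` is the standard non-greedy form of `+`.

```python
import re

address = "alice99@example.org"  # made-up sample input

# [aeiou] matches one character from the listed set
vowels = re.findall(r"[aeiou]", "regex")

# [^XYZ] matches one character that is NOT in the listed set
not_xyz = re.findall(r"[^XYZ]", "XaYbZ")

# [a-z0-9] uses ranges inside the set; + repeats it one or more times
tokens = re.findall(r"[a-z0-9]+", address)

# ( and ) mark the part of the match that findall returns
user = re.findall(r"([a-z0-9]+)@", address)

# +? stops as early as possible, + grabs as much as possible
non_greedy = re.findall(r"<.+?>", "<a><b>")
greedy = re.findall(r"<.+>", "<a><b>")

print(vowels)      # ['e', 'e']
print(not_xyz)     # ['a', 'b']
print(tokens)      # ['alice99', 'example', 'org']
print(user)        # ['alice99']
print(non_greedy)  # ['<a>', '<b>']
print(greedy)      # ['<a><b>']
```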

Let’s take a closer look at some more complicated search patterns.

Stepping it Up

Imagine that you are building some sort of validation on an input field where the user can input any number followed by the letter d, m or y.

Your regex would look something like this:

^[0-9]+[dmy]$

Decomposing the above: ^ anchors the match at the beginning of the string, and [0-9] matches a single digit from 0 to 9.

The + sign means there must be at least one digit, though there can be more.

The digits must then be followed by d, m or y, which has to be at the end of the string because of $.
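To see what each piece contributes, here is a small sketch that compares the pattern with and without its anchors. The input strings are invented for the illustration.

```python
import re

candidate = "10d then 3x"  # made-up input with extra text around the value

# Without anchors, the pattern can match anywhere inside the string
unanchored = re.findall(r"[0-9]+[dmy]", candidate)

# With ^ and $, the whole string must be digits followed by d, m or y
anchored = re.findall(r"^[0-9]+[dmy]$", candidate)

# + requires at least one digit, so a bare unit does not match
no_digits = re.findall(r"^[0-9]+[dmy]$", "d")
with_digits = re.findall(r"^[0-9]+[dmy]$", "45m")

print(unanchored)   # ['10d']
print(anchored)     # []
print(no_digits)    # []
print(with_digits)  # ['45m']
```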

Testing the above in Python:

    import re

    str1 = '1d'
    str2 = '200y'
    str3 = 'y200'

    lst1 = re.findall(r'^[0-9]+[dmy]$', str1)
    lst2 = re.findall(r'^[0-9]+[dmy]$', str2)
    lst3 = re.findall(r'^[0-9]+[dmy]$', str3)

    print(lst1)
    print(lst2)
    print(lst3)

Returning:

    ['1d']
    ['200y']
    []

Escaping Special Characters

When it comes to regular expressions, certain characters are special.

For instance, dot, star and dollar sign are all used for matching purposes.

So what happens if you want to match those characters? In that case, we can use the backslash.
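A minimal sketch of the same idea for the star and the dollar sign. The sample string is made up, and `re.escape` is the standard library helper that adds the backslashes for you.

```python
import re

price = "Total: $5 * 3 = $15"  # made-up sample input

# \$ matches a literal dollar sign instead of "end of string"
dollars = re.findall(r"\$[0-9]+", price)

# \* matches a literal star instead of "repeat the previous item"
stars = re.findall(r"\*", price)

# re.escape backslash-escapes the special characters in a string
escaped = re.escape("$5*3")

print(dollars)  # ['$5', '$15']
print(stars)    # ['*']
print(escaped)  # \$5\*3
```

Writing the patterns as raw strings (the `r` prefix) keeps Python from interpreting the backslash before the regex engine sees it.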

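    import re

    text = 'Sentences have dots. How do we escape them?'

    lst = re.findall(r'.', text)     # unescaped dot: matches any character
    lst1 = re.findall(r'\.', text)   # escaped dot: matches only a literal dot

    print(lst)
    print(lst1)

The above example uses a dot and a backslash-dot.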

As you would expect, it returns two results.

The first one matches all characters, while the second one, only the dot.

    ['S', 'e', 'n', 't', 'e', 'n', 'c', 'e', 's', ' ', 'h', 'a', 'v', 'e', ' ', 'd', 'o', 't', 's', '.', ' ', 'H', 'o', 'w', ' ', 'd', 'o', ' ', 'w', 'e', ' ', 'e', 's', 'c', 'a', 'p', 'e', ' ', 't', 'h', 'e', 'm', '?']
    ['.']

Matching exact number of characters

Imagine that you want to match a date.

You know what the format will be: DD/MM/YYYY.

Sometimes the day or the month will have two digits, sometimes just one, but the year will always have four.
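Counted repetition is what handles this: `{4}` means exactly four of the preceding item, and `{1,2}` means at least one and at most two. A short sketch with made-up inputs:

```python
import re

# {4} matches exactly four digits in a row
four = re.findall(r"[0-9]{4}", "Years: 2018, 2019, 99")

# {1,2} matches one or two digits, taking two when it can
one_or_two = re.findall(r"[0-9]{1,2}", "3 22 456")

print(four)        # ['2018', '2019']
print(one_or_two)  # ['3', '22', '45', '6']
```

Note how `456` is split into `45` and `6`: without anchors or surrounding context, a counted pattern will happily match inside a longer run of digits.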

    import re

    str1 = 'The date is 22/10/2018'
    str2 = 'The date is 3/1/2019'

    lst1 = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', str1)
    lst2 = re.findall(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}', str2)

    print(lst1)
    print(lst2)

Which gives the following results:

    ['22/10/2018']
    ['3/1/2019']

Extracting the matched pattern

There are times when knowing that a pattern matches is not enough.

You want to have the ability to extract information from the match.

For instance, imagine that you are scanning a large data set looking for email addresses.

Using what we have learnt so far, you could search for a pattern that:

- Could start with a letter, number, dot or underscore
- Is then followed by at least one more letter or number
- Which could be followed by a dot or an underscore
- Then has an @
- Then follows the same logic again as before the @
- Finally has a dot followed by at least one letter

^[a-zA-Z0-9. … [a-zA-Z]+

From the above match, you only want to extract the domain name, i.e. everything after the @.
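The steps listed above can be sketched as follows. This is one possible reading of those steps, not necessarily the exact pattern the article used: the names `local` and `full_pattern` and the way the optional dot or underscore is handled are assumptions made for the illustration.

```python
import re

# Assumed building block for the part before the @ (and reused after it):
# one letter/number/dot/underscore, at least one more letter or number,
# an optional dot or underscore, then any further letters or numbers
local = r"[a-zA-Z0-9._][a-zA-Z0-9]+[._]?[a-zA-Z0-9]*"

# Same logic on both sides of the @, ending in a dot plus letters
full_pattern = "^" + local + "@" + local + r"\.[a-zA-Z]+"

# Parentheses around the part after the @ make findall return only that part
domain_pattern = "^" + local + "@(" + local + r"\.[a-zA-Z]+)"

email = "email123_test@gmail.com"

whole = re.findall(full_pattern, email)
domain = re.findall(domain_pattern, email)

print(whole)   # ['email123_test@gmail.com']
print(domain)  # ['gmail.com']
```

The final set is written `[a-zA-Z]` rather than `[a-zA-z]`, because the range `A-z` also covers the punctuation characters that sit between `Z` and `a` in ASCII.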

All you have to do is add brackets around what you’re after:

    import re

    text = 'email123_test@gmail.com'
    lst = re.findall('… [a-zA-Z]+)', text)
    print(lst)

Returning:

    ['gmail.com']

In Summary

In summary, you can use regex to match strings of data, and it can be used in a number of different ways.

Python includes a regex module called re, which lets you do all of the above.

Should you find yourself on a Unix machine, however, you can use regular expressions along with grep, awk or sed.

On Windows, should you want access to these commands, you can use tools like Cygwin.

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by +443,678 people.
