Role of Regex in String Manipulation
This introductory article shows the power of regex through a worked example.
A regular expression (regex) is a sequence of characters that specifies a search pattern. It can be used to find patterns in files (pattern matching), to extract data from websites (web scraping), or to pre-process text (data cleaning). String manipulation becomes much easier with regex, and a working knowledge of regular expressions is extremely useful in data science.
A simple analogy for understanding regex is the grep command, which we use to search for a pattern in files.
A regex cheat sheet from a good source helps in recalling the syntax. I used the cheat sheet at https://www.rexegg.com/regex-quickstart.html
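To make the grep analogy concrete, here is a minimal sketch in Python. The list of lines is made up for illustration and stands in for the contents of a file; the search keeps only the lines that contain the pattern, roughly what grep would print.

```python
import re

# Hypothetical lines standing in for the contents of a file.
lines = [
    "error: disk full",
    "info: backup complete",
    "error: network timeout",
]

# Keep only the lines that contain the pattern, much like `grep error`.
matches = [line for line in lines if re.search(r"error", line)]
print(matches)
```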
Let's say we have a file in which each line holds an IP address followed by the user's ID, in the following format.
Example: '173.140.74.179 - rippin3809'
If we are now interested in collecting the IP addresses of users, our regex would be:
'(.*)\s-\s'
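In Python, the re module provides the regular expression functions. The pattern breaks down as follows:
( ) defines a capture group, which captures whatever the pattern inside it matches.
.* matches any character except a line break, zero or more times. Because each entry in the file sits on its own line, the match cannot run past the end of a line, and the \s-\s that follows ends the group just before the separator, so only the IP address is captured.
\s matches a single whitespace character.
- matches the literal hyphen from the file format, and it is followed by another \s for the whitespace after it.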
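The breakdown above can be tried out with a short Python sketch. The sample text is hypothetical: the first line is the example from this article, and the other two lines are made up in the same format. It assumes the user IDs themselves do not contain the " - " separator.

```python
import re

# Hypothetical sample in the "IP - user id" format; only the first
# line comes from the article, the rest are invented for illustration.
sample = """173.140.74.179 - rippin3809
10.0.0.1 - alice42
192.168.1.20 - bob_smith"""

# One capture group, followed by whitespace, a hyphen, and whitespace.
pattern = re.compile(r"(.*)\s-\s")

# With a single capture group, findall returns the group's text
# for every match found in the sample.
ip_addresses = pattern.findall(sample)
print(ip_addresses)  # ['173.140.74.179', '10.0.0.1', '192.168.1.20']
```

Since .* accepts any characters, this pattern does not check that the captured text is a valid IP address; it only relies on the position of the separator.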
Thank you for reading the article.