Regular Expressions in Ruby
In Ruby, a Regexp holds a regular expression, which is used to match a pattern against strings. Regular expressions are created with the /.../ (forward slash) and %r{...} literals, or with the Regexp.new constructor.
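As a quick illustration of the three forms, here is a minimal sketch. The pattern ab+c and the variable names are made up for this example and carry no special meaning.

```ruby
# Three ways to build the same pattern (the pattern itself is only illustrative).
slash_literal   = /ab+c/
percent_literal = %r{ab+c}
from_string     = Regexp.new("ab+c")

# All three are Regexp objects, and they compare equal to one another.
p slash_literal.class                 # Regexp
p slash_literal == percent_literal    # true
p slash_literal == from_string        # true

# %r{...} is convenient when the pattern itself contains forward slashes,
# because they do not need to be escaped.
path_pattern = %r{/usr/local}
path_index = "/usr/local/bin" =~ path_pattern
p path_index                          # 0
```

Which form to use is mostly a matter of readability: the slash literal is the most common, and Regexp.new is useful when the pattern is only known at run time.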
What is a regular expression?
Regular expressions are a compact pattern language, built from ordinary characters and special symbols, that is supported in nearly every programming language. The characters form patterns which describe the contents of a string, and they are used to test whether a particular string contains a given pattern. If the string does contain the pattern, it is said to be a 'match'. Regular expressions can be complex, but in return they provide a powerful mechanism for working with text.
At least a basic understanding of regular expressions is valuable in nearly every career involving programming. Many beginners tend to avoid regular expressions, finding them to be more trouble than they're worth. This is truly unfortunate. Having a strong command of regular expressions is like receiving your first wand at Hogwarts. The additional power they provide is well worth any initial struggle or frustration. Stick with them, do not shy away. Harness their power and reap the many rewards of your efforts.
Matching patterns in Ruby can be done with the =~ operator or the match method. When one operand is a regular expression and the other is a string, the regular expression is used as the pattern to match against the string. Remember, regular expression literals in Ruby are enclosed in forward slashes to differentiate them from other language syntax. Here's an example:
>> text = "Here is a string that contains the numbers 3, 4, and 5, as well as the word 'puppy'"
"Here is a string that contains the numbers 3, 4, and 5, as well as the word 'puppy'"
# the 'text' variable contains our string
# we want to test for the capital letter 'H'
>> text =~ /H/
0
To test for the occurrence of a capital letter 'H' we use the =~ operator and put the letter 'H' between forward slashes / /. Ruby returned 0, which is the index of the first occurrence. Let's try another test. This time we'll test for the number 3:
>> text =~ /3/
43
Once again Ruby returned the index of the first occurrence of the number 3. What if we test for the number 6?
>> text =~ /6/
nil
There are no occurrences of the number 6 in the string we specified and therefore nil is returned.
If we use the match method instead of =~, Ruby returns a MatchData object if there is a match, or nil if there is not.
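To see the difference from =~, here is a small sketch that reuses the same example string. The variable names found and missing are just for illustration.

```ruby
text = "Here is a string that contains the numbers 3, 4, and 5, as well as the word 'puppy'"

# A successful match returns a MatchData object describing what was found.
found = text.match(/3/)
p found.class       # MatchData
p found[0]          # "3"  (the matched text)
p found.begin(0)    # 43   (the same index that =~ reported)

# An unsuccessful match returns nil, just like =~.
missing = text.match(/6/)
p missing           # nil
```

Because nil is falsy in Ruby, the result of match can be used directly in a condition such as `if text.match(/3/)`.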
Character classes
Character classes are delimited with square brackets [ ] and list the characters that may appear at that point in the match. For example, /abc/ defines the pattern of letters abc, in that order. However, if we are searching for a single character that is a or b or c, we can use the regular expression /[abc]/.
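The difference between the two patterns is easiest to see side by side. This is a minimal sketch, and the test string "cab" is chosen only for illustration.

```ruby
# /abc/ needs the three letters in that exact order.
sequence_index = "cab" =~ /abc/
p sequence_index    # nil ("cab" does not contain "abc")

# /[abc]/ needs just one character that is a, b, or c.
class_index = "cab" =~ /[abc]/
p class_index       # 0 (the 'c' at the start is in the class)

# The class matches wherever the first listed character appears.
later_index = "xyzb" =~ /[abc]/
p later_index       # 3
```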
Ranges
A range can be specified inside a character class. This saves time and makes expressions easier to type. For example, [0-9] matches any single digit from 0 to 9. A range can also specify letters, so the range [a-z] matches any lowercase letter 'a' through 'z'. Regular expressions are case sensitive, so to match any uppercase letter the expression would be [A-Z]. Since it's common to search for multiple ranges, there are several shorthand expressions that are useful.
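Here is a short sketch of ranges in action. The sample strings are invented for the example.

```ruby
# [0-9] finds the first digit.
digit_index = "Room 42" =~ /[0-9]/
p digit_index     # 5

# [a-z] and [A-Z] are case sensitive.
lower_index = "ABCdef" =~ /[a-z]/
p lower_index     # 3
upper_index = "abcDEF" =~ /[A-Z]/
p upper_index     # 3
no_upper = "abcdef" =~ /[A-Z]/
p no_upper        # nil

# Several ranges can share one character class.
alnum_index = "--x9" =~ /[a-z0-9]/
p alnum_index     # 2
```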
Shorthand Patterns
- \w is equivalent to [0-9a-zA-Z_] which matches any word character (letter, number, or underscore)
- \d is equivalent to [0-9] which matches any digit
- \s matches any white space character
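The shorthand patterns can be tried out the same way as the earlier examples. This is a small sketch with a made-up sample string.

```ruby
sample = "Order #42 shipped"

word_index  = sample =~ /\w/   # first word character
digit_index = sample =~ /\d/   # first digit
space_index = sample =~ /\s/   # first white space character

p word_index    # 0 (the 'O')
p digit_index   # 7 (the '4')
p space_index   # 5 (the space after "Order")

# \w counts the underscore as a word character, but not the hyphen or space.
word_chars = "a_1 b-2".scan(/\w/)
p word_chars    # ["a", "_", "1", "b", "2"]
```

The scan method used at the end is a String method that returns every match as an array, which is handy for seeing exactly what a pattern picks up.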
Wildcard
The period ( . ) matches any character except a newline.
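A minimal sketch of the wildcard, using the illustrative pattern /c.t/:

```ruby
# The period stands in for exactly one character.
letter_index = "cat" =~ /c.t/
p letter_index     # 0
digit_index = "c9t" =~ /c.t/
p digit_index      # 0

# There must be a character in that position.
empty_index = "ct" =~ /c.t/
p empty_index      # nil

# A newline is the one character the period does not match.
newline_index = "c\nt" =~ /c.t/
p newline_index    # nil
```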
Modifiers
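So far we have used patterns to match a single character. To match a run of characters, follow a pattern with a modifier (more commonly called a quantifier), which says how many times the preceding element may repeat.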
| Modifier | Description |
|---|---|
| + | 1 or more |
| * | 0 or more |
| ? | 0 or 1 |
| {2,5} | between 2 and 5 |
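Each row of the table can be tried out with a short example. This is a sketch with made-up sample strings, and each modifier applies to the element immediately before it.

```ruby
# + : one or more of the preceding 'a'
one_or_more = "caaat".match(/ca+t/)[0]
p one_or_more      # "caaat"
plus_needs_one = "ct" =~ /ca+t/
p plus_needs_one   # nil (there is no 'a' at all)

# * : zero or more of the preceding 'a'
zero_or_more = "ct".match(/ca*t/)[0]
p zero_or_more     # "ct"

# ? : the preceding 'u' is optional
optional = "color colour".scan(/colou?r/)
p optional         # ["color", "colour"]

# {2,5} : between 2 and 5 digits; the match takes as many as it can
bounded = "1234567".match(/\d{2,5}/)[0]
p bounded          # "12345"
```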