Thursday 8 May 2008

Confused by Groovy Regex?

Ever wonder what the heck is going on with Groovy regex? I was recently confused by the syntax matcher[0][1] while doing some regex work. What the heck does it mean, and where does it come from?

Let's start from the top and work our way down. (Note: when I am trying to figure out a regex while doing Grails work, I normally pop open the "grails console" and test everything in there.)

// These are both the same string. The benefit of /.../ is that you do not have to escape so many characters.
assert "\\S" == /\S/
// Match any character that is not whitespace.
def matcher = 'a b c' =~ /\S/

matcher is now a java.util.regex.Matcher, and indexing it returns each match in turn.
assert matcher[0] == 'a'
assert matcher[1] == 'b'
assert matcher[2] == 'c'


OK, that makes sense: the matcher found 3 matches for the regex that was defined.
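The same thing can be checked without indexing each match by hand. This is only a small sketch to paste into the console; the variable names simpleMatcher and found are mine, not part of any API, and it relies on the count property and collect method that Groovy adds to Matcher.

```groovy
// Same regex as above: no groups, so every match is a plain String.
def simpleMatcher = 'a b c' =~ /\S/

assert simpleMatcher instanceof java.util.regex.Matcher
assert simpleMatcher.count == 3            // number of matches found
assert simpleMatcher[0] instanceof String  // no groups, so a String comes back

// Walk over every match and gather them into a list.
def found = simpleMatcher.collect { it }
assert found == ['a', 'b', 'c']
```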

So what happens when you add grouping?
matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/

Now matcher is again an instance of java.util.regex.Matcher, except that indexing it now behaves like a two-dimensional array: each match comes back as a list instead of a plain string. The first index selects which match of the regex in the string you want, and the second selects the group within that match. Confused? Well, let me show you an example that should explain everything.

matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/
assert matcher[0][0] == 'a:1'   // The full string that was matched.
assert matcher[0][1] == 'a'     // The first group.
assert matcher[0][2] == '1'     // The second group.

assert matcher[1][0] == 'b:2'
assert matcher[1][1] == 'b'
assert matcher[1][2] == '2'

assert matcher[2][0] == 'c:3'
assert matcher[2][1] == 'c'
assert matcher[2][2] == '3'
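Once the two indexes make sense, you usually do not need to write them out at all. Here is a rough sketch of looping over the matches and letting Groovy spread each match's list into closure parameters. The names groupMatcher, pairs, full, key and value are just ones I picked for illustration.

```groovy
def groupMatcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/

// With groups in the regex, each indexed match is a List:
// [full match, first group, second group]
assert groupMatcher[0] instanceof List
assert groupMatcher[0] == ['a:1', 'a', '1']

// each() hands that list to the closure; with three parameters
// declared, Groovy spreads the list across them.
def pairs = [:]
groupMatcher.each { full, key, value ->
    pairs[key] = value
}
assert pairs == [a: '1', b: '2', c: '3']
```

So matcher[1][2] is the same value that shows up as value on the second pass through the loop.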

Hopefully this answered some questions. Here are some useful Groovy regex links:
http://groovy.codehaus.org/Regular+Expressions
http://docs.codehaus.org/display/GROOVY/Tutorial+5+-+Capturing+regex+groups
http://www.regular-expressions.info/reference.html
