What's the meaning of Groups and Captures?

I am confused by Groups and Captures in .NET regular expressions. My code:
private void button1_Click_1(object sender, EventArgs e)
{
    Regex regex = new Regex(@"Password\s*=\s*(.*?)(?:[;""']|$)");
    Debug.WriteLine("regex.GetGroupNumbers().Length=" + regex.GetGroupNumbers().Length);
    string s = "Data Source=server1;Initial Catalog=myDB;Persist Security Info=True;User ID=testuser;Password=topsecret";
    if (regex.IsMatch(s))
    {
        Match m = regex.Match(s);
        int mCount = 0;
        while (m.Success)
        {
            mCount++;
            int[] groupNums = regex.GetGroupNumbers();
            Debug.WriteLine("Match #"+mCount+" = [" + m.Value + "]");
            for (int i = 0; i < groupNums.Length; i++)
            {
                Group g = m.Groups[groupNums[i]];
                Debug.WriteLine("  Group #" + groupNums[i] + " = [" + g.ToString() + "]");
                CaptureCollection cc = g.Captures;
                for (int j = 0; j < cc.Count; j++)
                {
                    Capture c = cc[j];
                    Debug.WriteLine("    Capture #" + j + " = [" + c.Value + "] Index:" +
                        c.Index + " Length: " + c.Length);
                }
            }
            m = m.NextMatch();
        }
    }
    return;
}
produces the following output:

regex.GetGroupNumbers().Length=2
Match #1 = [Password=topsecret]
  Group #0 = [Password=topsecret]
    Capture #0 = [Password=topsecret] Index:85 Length: 18
  Group #1 = [topsecret]
    Capture #0 = [topsecret] Index:94 Length: 9

My questions:

1. Why does regex.GetGroupNumbers() have a Length of 2? I have only one group, (.*?), in my regular expression.
2. Why do Groups 0 and 1 have the values Password=topsecret and topsecret as shown?
3. My aim is to extract the Password value (so that I can mask it out with ****). Does it mean that if there is a match, I can safely look for the Password value in Match.Groups[1].ToString() or Match.Groups[1].Captures[0].Value?
4. I am lost. What is Group and what is Capture? Perhaps this should be question 1.

Thanks.

  • Edited by K.Kong, Wednesday, September 02, 2009 1:09 PM: clean up code layout
  • Edited by K.Kong, Wednesday, September 02, 2009 1:11 PM: remove lots of blank lines from code
  • Edited by K.Kong, Wednesday, September 02, 2009 1:10 PM: clean up code
K.Kong
>>>>
1. Why does regex.GetGroupNumbers() have a Length of 2? I have only one group, (.*?), in my regular expression.
2. Why do Groups 0 and 1 have the values Password=topsecret and topsecret as shown?
3. My aim is to extract the Password value (so that I can mask it out with ****). Does it mean that if there is a match, I can safely look for the Password value in Match.Groups[1].ToString() or Match.Groups[1].Captures[0].Value?
4. I am lost. What is Group and what is Capture? Perhaps this should be question 1.
<<<<

1. There is always at least one group: Groups[0], which holds the entire match. Explicit capturing groups are numbered starting from one, which is why GetGroupNumbers() returns two entries for your pattern.
2. Password=topsecret is the entire match (Groups[0]). topsecret is what your explicit group, (.*?), captured (Groups[1]).
3. Yes. Given your pattern, if you have a match, then Groups[1] holds the password. Use Groups[1].Value, .Index, and .Length for the text, its position in the string, and its length. You do not need Captures, because the group is not inside a quantifier and so captures only once per match.
4. Let me add an example. If you had the pattern @"(?<many>[abc])+" and the string @"123abc456", you would get one match, "abc", in which the group captured three times. match.Groups[1].Value will be "c", i.e., the last capture of the group, but match.Groups[1].Captures[0].Value will be "a". I named the group "many" so that I can use match.Groups["many"] instead of match.Groups[1].
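
To make point 4 concrete, here is a minimal console sketch using the same pattern and input string as above. The class name GroupsVersusCaptures is only illustrative, and Console.WriteLine is used instead of Debug.WriteLine so that the snippet runs on its own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsVersusCaptures
{
    public static void Main()
    {
        // Same pattern and input as in point 4.
        Regex regex = new Regex(@"(?<many>[abc])+");
        Match match = regex.Match("123abc456");

        // Group 0 is the whole match; the explicit group "many" is group 1.
        Console.WriteLine(regex.GetGroupNumbers().Length); // 2
        Console.WriteLine(match.Groups[0].Value);          // abc

        // A Group's Value is the LAST thing the group captured.
        Console.WriteLine(match.Groups["many"].Value);     // c
        Console.WriteLine(match.Groups[1].Value);          // c

        // The group sits inside a + quantifier, so it captured once per
        // repetition. Captures keeps every one of them, in order.
        foreach (Capture capture in match.Groups["many"].Captures)
        {
            Console.WriteLine(capture.Value + " at index " + capture.Index);
        }
        // a at index 3
        // b at index 4
        // c at index 5
    }
}
```

In short: a Group is one parenthesized part of the pattern, and a Capture is one piece of text that the group matched. A group that is not repeated has exactly one capture per match, so Groups[n].Value and Groups[n].Captures[0].Value are the same.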

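For point 3, the masking itself could look roughly like the sketch below. It reuses the pattern from the question. The PasswordMasker class and the MaskPassword method are hypothetical names made up for this illustration, and the sketch masks only the first Password entry it finds.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordMasker
{
    // Pattern from the question: group 1 holds the password value.
    private static readonly Regex PasswordRegex =
        new Regex(@"Password\s*=\s*(.*?)(?:[;""']|$)");

    // Hypothetical helper: replaces the first password value with "****".
    public static string MaskPassword(string connectionString)
    {
        Match match = PasswordRegex.Match(connectionString);
        if (!match.Success)
        {
            return connectionString; // no Password entry, nothing to mask
        }

        // Index and Length locate the captured value inside the input.
        Group password = match.Groups[1];
        return connectionString.Substring(0, password.Index)
            + "****"
            + connectionString.Substring(password.Index + password.Length);
    }

    public static void Main()
    {
        string s = "Data Source=server1;Initial Catalog=myDB;Persist Security Info=True;User ID=testuser;Password=topsecret";
        Console.WriteLine(MaskPassword(s));
        // Data Source=server1;Initial Catalog=myDB;Persist Security Info=True;User ID=testuser;Password=****
    }
}
```

Working from Groups[1].Index and .Length, rather than replacing the text "topsecret" wherever it occurs, avoids touching other parts of the string that happen to contain the same characters.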

Les Potter, Xalnix Corporation, Yet Another C# Blog
  • Marked As Answer by K.Kong, Friday, September 04, 2009 11:54 AM
  • Edited by xalnix, Monday, September 07, 2009 1:44 PM
xalnix
Thanks, I get it all.

Just one more question on your example. Do you mean to say "since the Group has many captures" rather than "because the Group is named 'many'"?
K.Kong
Yes, that is what I meant, and I've reworded my answer a little. Thanks (I was under some pressure to hurry up when I wrote that).
Les Potter, Xalnix Corporation, Yet Another C# Blog
xalnix
