Regular expression to count number of upper cased letters in string.

This should be a simple one for you reg-ex gurus. I'm writing a function to detect shouting when a web form is submitted. I want to determine how many characters in a given string are upper-cased. A regular expression seems more elegant than iterating through the string via a For...Next loop.

My code is in VB.Net below, but I'm also conversant with C#.

    Public Function PreventShouting(ByVal Words As String, _
                                    ByVal CapsThresholdPercentage As Integer) As String

        If Words.Length < 15 Then Return Words
        Dim strOut As String = Words
        If Words.Length > 0 Then

            'Calculate the percent of characters in Words that are upper cased.
            Dim pattern As String = "[A-Z]{*}"
            'Dim pattern As String = "[^A-Z]" 'How many non-uppercase letters are in the string?
            Dim rx As Regex = New Regex(pattern)

            Dim MatchList As System.Text.RegularExpressions.Match
            MatchList = rx.Match(Words)
            Dim num As Integer = MatchList.Length
            Dim per As Integer = num / Words.Length * 100

            If per > CapsThresholdPercentage Then
               strOut = StrConv(Words, VbStrConv.ProperCase)
            End If

        End If

        Return strOut

    End Function

My expectation from the above is that MatchList.Length should return the number of upper cased characters found in the string. Thanks for your help.
  • Edited by KenPalmer, Thursday, September 17, 2009 7:16 PM (fix spelling)
KenPalmer

Hello KenPalmer

In your original code, System.Text.RegularExpressions.Match.Length returns the length of the first matched substring, not the number of matches. Thiago is right that you need Regex.Matches.

The regular expression "[A-Z]{*}" does not match upper-cased letters the way you intend, because {*} is not a valid quantifier.
You may want to use [A-Z] to match individual upper-cased letters. For example:
Dim ms As MatchCollection = Regex.Matches("I Am JIALIANG", "[A-Z]")
Console.WriteLine(ms.Count)

outputs 10 because of the capitalized letters IAJIALIANG

or use \b[A-Z]+\b to match words written entirely in capitals. For example,
Dim ms As MatchCollection = Regex.Matches("I Am JIALIANG", "\b[A-Z]+\b")
Console.WriteLine(ms.Count)
outputs 2 because of the "I" and "JIALIANG" words.
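
To make the difference between Match.Length and Matches.Count concrete, here is a minimal sketch that runs both on the same sample string. The module and helper names (MatchVersusMatches, FirstMatchLength, MatchCount) are made up for illustration and are not part of the original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchVersusMatches

    ' Hypothetical helper: length of the FIRST match only,
    ' which is what Match.Length reports.
    Function FirstMatchLength(ByVal input As String, ByVal pattern As String) As Integer
        Return Regex.Match(input, pattern).Length
    End Function

    ' Hypothetical helper: total number of non-overlapping matches.
    Function MatchCount(ByVal input As String, ByVal pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        Dim sample As String = "I Am JIALIANG"
        Console.WriteLine(FirstMatchLength(sample, "[A-Z]"))  ' 1: the first match is the single letter "I"
        Console.WriteLine(MatchCount(sample, "[A-Z]"))        ' 10: one match per upper-cased letter
        Console.WriteLine(MatchCount(sample, "\b[A-Z]+\b"))   ' 2: the words "I" and "JIALIANG"
    End Sub

End Module
```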

Regards,
Jialiang Ge


  • Proposed As Answer by Thiago Valença, Friday, September 18, 2009 12:58 PM
  • Marked As Answer by KenPalmer, Friday, September 18, 2009 12:59 PM
Jialiang Ge [MSFT]
Try using Matches instead of Match.

Example:

MatchCollection a = Regex.Matches("aAaBaA", "[A-Z]");
int count = a.Count; // Count is the number of matches, i.e. the number of upper-cased characters
Thiago Valença

Thanks to both of you! Matches was the fix. My implementation appears below, along with a companion method named EnglishCase() which applies upper case letters after sentence terminators.

#Region " English Case "
    ''' <summary>
    ''' Called by the PreventShouting() method to suppress all upper-case entries. This method
    ''' capitalizes letters from input string using standard sentence terminators (.!?).
    ''' Differs from the StrConv(Words, VbStrConv.ProperCase), which is actually "Title Case",
    ''' capitalizing all words in a string.
    ''' 
    ''' We're not trying to build a grammar engine here.
    ''' This gets called only when the PreventShouting() method determines that
    ''' the allowed percentage of upper-cased letters in a string was exceeded.
    ''' </summary>
    ''' <param name="Words">The string that will be re-capitalized.</param>
    ''' <returns>The re-capitalized string.</returns>
    ''' <remarks>
    ''' 9/18/2009 - Ken Palmer - Created
    ''' </remarks>
    Public Shared Function EnglishCase(ByVal Words As String) As String

        If Words.Length = 0 Then Return String.Empty

        Dim ThisASCII As Integer = 0
        Dim LastASCII As Integer = 0

        Dim strOut As String = String.Empty
        Dim sb As New StringBuilder
        For Each Letter As Char In Words

            ThisASCII = Asc(Letter)
            Select Case LastASCII
                Case 0, 33, 46, 63 'To detect NULL, !, ., ?
                    sb.Append(UCase(Letter))
                Case Else
                    sb.Append(LCase(Letter))
            End Select

            'Ignore white space characters when evaluating the LastASCII.
            Select Case ThisASCII
                Case 0 To 32, 127
                Case Else
                    LastASCII = ThisASCII
            End Select
        Next
        strOut = sb.ToString

        'Fix common misspellings.
        strOut = Replace(strOut, " i ", " I ")

        Return strOut

    End Function 'EnglishCase

#End Region

#Region " Prevent Shouting "
    ''' <summary>
    ''' Suppress shouting on form entries.
    ''' This does not prevent an entry from being submitted.  Instead, it evaluates the
    ''' entry and changes the capitalization of the string if the set threshold of allowed
    ''' capital letters is exceeded.
    ''' This method calls EnglishCase() to apply capitalization rules.
    ''' </summary>
    ''' <param name="Words">The input string to evaluate.</param>
    ''' <param name="IgnoreStringLength">
    ''' String size limit to ignore rules.  For example, pass 20 to tell this
    ''' method not to alter a string with a length of 20 characters or less.
    ''' </param>
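    ''' <param name="CapsThresholdPercentage">
    ''' Percentage of upper-cased letters above which the string is re-capitalized.
    ''' </param>
    ''' <returns>The original string, or the re-capitalized string if the threshold was exceeded.</returns>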
    ''' <remarks>
    ''' 9/18/2009 - Ken Palmer - Created
    ''' </remarks>
    Public Shared Function PreventShouting(ByVal Words As String, _
                                           ByVal IgnoreStringLength As Integer, _
                                           ByVal CapsThresholdPercentage As Integer) As String

        'Exit this function if the string doesn't exceed the IgnoreLength.
        If Words.Length <= IgnoreStringLength Then Return Words

        Dim strOut As String = Words

        If Words.Length > 0 Then

            'Calculate the percent of characters in Words that are upper cased.
            Dim pattern As String = "[A-Z]" 'RegEx for all upper cased English letters.
            Dim rx As Regex = New Regex(pattern)
            Dim MatchList As System.Text.RegularExpressions.MatchCollection
            MatchList = rx.Matches(Words)
            Dim num As Integer = MatchList.Count
            Dim per As Integer = num / Words.Length * 100

            If per > CapsThresholdPercentage Then
                strOut = EnglishCase(Words)
                'strOut = StrConv(Words, VbStrConv.ProperCase)
            End If

        End If

        Return strOut

    End Function 'PreventShouting

#End Region
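
Since this is the regular expressions forum, the sentence-capitalization step can also be sketched with Regex.Replace instead of a character loop. This is only an illustration under stated assumptions, not a drop-in replacement for EnglishCase(): the module and function names (SentenceCaseSketch, SentenceCase) are hypothetical, it only upper-cases ASCII letters a-z, and it does not include the " i " fix.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SentenceCaseSketch

    ' Hypothetical helper: lower-case everything, then upper-case the first
    ' letter of the string and the first letter after each ., ! or ?.
    Function SentenceCase(ByVal words As String) As String
        If String.IsNullOrEmpty(words) Then Return String.Empty

        Dim lowered As String = words.ToLowerInvariant()

        ' Group 1: start of string or a sentence terminator, plus optional white space.
        ' Group 2: the letter to capitalize.
        Return Regex.Replace(lowered, "(^\s*|[.!?]\s*)([a-z])", _
            Function(m As Match) m.Groups(1).Value & m.Groups(2).Value.ToUpperInvariant())
    End Function

    Sub Main()
        ' Prints: Hello there! How are you? Fine.
        Console.WriteLine(SentenceCase("HELLO THERE! HOW ARE YOU? fine."))
    End Sub

End Module
```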
KenPalmer
Try using Matches instead of Match.

Example:

MatchCollection a = Regex.Matches("aAaBaA", "[A-Z]");
int count = a.Count; // Count is the number of matches, i.e. the number of upper-cased characters

How about a "shorter version"?

Dim count As Integer = Regex.Matches("aBaBaA", "[A-Z]").Count
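
One caveat worth sketching: [A-Z] only covers the 26 unaccented English letters. If the form may receive accented input, the Unicode category \p{Lu} or a plain Char.IsUpper count may be closer to what is wanted. The module and function names below are hypothetical, and whether accented capitals should count as shouting is an assumption on my part.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module UpperCaseCounting

    ' Counts only the unaccented letters A through Z.
    Function CountAsciiUpper(ByVal input As String) As Integer
        Return Regex.Matches(input, "[A-Z]").Count
    End Function

    ' Counts every character in the Unicode "uppercase letter" category.
    Function CountUnicodeUpper(ByVal input As String) As Integer
        Return Regex.Matches(input, "\p{Lu}").Count
    End Function

    ' Same idea without a regular expression.
    Function CountWithoutRegex(ByVal input As String) As Integer
        Return input.Count(Function(c) Char.IsUpper(c))
    End Function

    Sub Main()
        ' ChrW(&HC9) is a capital E with an acute accent.
        Dim sample As String = ChrW(&HC9) & "COLE"
        Console.WriteLine(CountAsciiUpper(sample))    ' 4: the accented letter is skipped
        Console.WriteLine(CountUnicodeUpper(sample))  ' 5
        Console.WriteLine(CountWithoutRegex(sample))  ' 5
    End Sub

End Module
```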
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
JohnGrove
Cool! I like that John. Also, I should have set that string builder to nothing in the EnglishCase() function. Debugging and refactoring is like pulling weeds.
KenPalmer
Here is the revised code. Thanks again.

    Public Shared Function PreventShouting(ByVal Words As String, _
                                           ByVal IgnoreStringLength As Integer, _
                                           ByVal CapsThresholdPercentage As Integer) As String

        'Exit if the string doesn't exceed the IgnoreLength.
        If Words.Length <= IgnoreStringLength Then Return Words

        Dim strOut As String = Words

        If Words.Length > 0 Then
            'RegEx to find all upper cased English letters.
            Dim UpperCaseCount As Integer = Regex.Matches(Words, "[A-Z]").Count
            Dim per As Integer = UpperCaseCount / Words.Length * 100

            If per > CapsThresholdPercentage Then
                strOut = EnglishCase(Words)
            End If
        End If

        Return strOut

    End Function 'PreventShouting
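
For anyone checking the threshold arithmetic, here is a small self-contained sketch of the percentage test on its own. The names (ShoutingThresholdSketch, UpperCasePercent, ExceedsThreshold) are hypothetical. It uses integer division, so it truncates the percentage, whereas the code above relies on the implicit Double-to-Integer conversion, which rounds.

```vbnet
Option Strict On
Imports System
Imports System.Text.RegularExpressions

Module ShoutingThresholdSketch

    ' Hypothetical helper: percentage (0 to 100) of characters that are A-Z.
    Function UpperCasePercent(ByVal words As String) As Integer
        If String.IsNullOrEmpty(words) Then Return 0

        Dim upperCaseCount As Integer = Regex.Matches(words, "[A-Z]").Count

        ' Multiply first, then integer-divide, so no Double conversion is needed.
        Return (upperCaseCount * 100) \ words.Length
    End Function

    ' Hypothetical helper: True when the percentage is above the threshold.
    Function ExceedsThreshold(ByVal words As String, ByVal capsThresholdPercentage As Integer) As Boolean
        Return UpperCasePercent(words) > capsThresholdPercentage
    End Function

    Sub Main()
        Console.WriteLine(UpperCasePercent("HELLO world"))      ' 45: 5 of 11 characters
        Console.WriteLine(ExceedsThreshold("HELLO world", 40))  ' True
        Console.WriteLine(ExceedsThreshold("Hello world", 40))  ' False: only 9 percent
    End Sub

End Module
```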
KenPalmer
