This should be a simple one for you reg-ex gurus. I'm writing a function to detect shouting when a web form is submitted. I want to determine how many characters in a given string are upper-cased. A regular expression seems more elegant than iterating through the string via a for-next loop. My code is in VB.Net below, but I'm also conversant with C#.
Public Function PreventShouting(ByVal Words As String, _
ByVal CapsThresholdPercentage As Integer) As String
If Words.Length < 15 Then Return Words
Dim strOut As String = Words
If Words.Length > 0 Then
'Calculate the percent of characters in Words that are upper cased.
Dim pattern As String = "[A-Z]{*}"
'Dim pattern As String = "[^A-Z]" 'How many non-uppercase letters are in the string?
Dim rx As Regex = New Regex(pattern)
Dim MatchList As System.Text.RegularExpressions.Match
MatchList = rx.Match(Words)
Dim num As Integer = MatchList.Length
Dim per As Integer = num / Words.Length * 100
If per > CapsThresholdPercentage Then
strOut = StrConv(Words, VbStrConv.ProperCase)
End If
End If
Return strOut
End Function
My expectation from the above is that MatchList.Length should return the number of upper cased characters found in the string. Thanks for your help.
- Edited by KenPalmer, Thursday, September 17, 2009 7:16 PM (Fix spelling)
| | KenPalmer | Hello KenPalmer
In your original code, System.Text.RegularExpressions.Match.Length returns the length of the captured substring, not the number of matches. Thiago is right that you need Regex.Matches.
The regular expression "[A-Z]{*}" is not suitable for counting capitalized letters. You may want to use [A-Z] to match capitalized letters. For example,
Dim ms As MatchCollection = Regex.Matches("I Am JIALIANG", "[A-Z]")
Console.WriteLine(ms.Count)
outputs 10 because of the capitalized letters IAJIALIANG.
Or use \b[A-Z]+\b to match capitalized words. For example,
Dim ms As MatchCollection = Regex.Matches("I Am JIALIANG", "\b[A-Z]+\b")
Console.WriteLine(ms.Count)
outputs 2 because of the words "I" and "JIALIANG".
Regards, Jialiang Ge
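A minimal sketch of the difference described above, assuming a small console program (the module name MatchVersusMatches and the variable names are illustrative, not from the original post). Match.Length is the length of the first match only, while Matches.Count is the number of matches found.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchVersusMatches
    Sub Main()
        Dim words As String = "I Am JIALIANG"

        ' Regex.Match returns only the first match ("I");
        ' Length is the length of that single match.
        Dim firstMatch As Match = Regex.Match(words, "[A-Z]")
        Console.WriteLine(firstMatch.Length)   ' 1

        ' Regex.Matches returns every match;
        ' Count is the number of upper-cased English letters.
        Dim allMatches As MatchCollection = Regex.Matches(words, "[A-Z]")
        Console.WriteLine(allMatches.Count)    ' 10
    End Sub
End Module
```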
- Proposed as answer by Thiago Valença, Friday, September 18, 2009 12:58 PM
- Marked as answer by KenPalmer, Friday, September 18, 2009 12:59 PM
| | Jialiang Ge [MSFT] | Try to use Matches instead of Match. Ex:
MatchCollection a = Regex.Matches("aAaBaA", "[A-B]");
int count = a.Count; // Count returns the number of matched upper-cased characters
| | Thiago Valença | Thanks to both of you! Matches was the fix. My implementation appears below, along with a companion method named EnglishCase(), which capitalizes the first letter after each sentence terminator.
'Requires Imports System.Text and Imports System.Text.RegularExpressions at the top of the file.
#Region " English Case "
''' <summary>
''' Called by the PreventShouting() method to suppress all upper-case entries. This method
''' capitalizes letters from input string using standard sentence terminators (.!?).
''' Differs from the StrConv(Words, VbStrConv.ProperCase), which is actually "Title Case",
''' capitalizing all words in a string.
'''
''' We're not trying to build a grammar engine here.
''' This gets called only when the PreventShouting() method determines that
''' the allowed percentage of upper-cased letters in a string was exceeded.
''' </summary>
''' <param name="Words">The string that will be re-capitalized.</param>
''' <returns>The re-capitalized string.</returns>
''' <remarks>
''' 9/18/2009 - Ken Palmer - Created
''' </remarks>
Public Shared Function EnglishCase(ByVal Words As String) As String
If Words.Length = 0 Then Return String.Empty
Dim ThisASCII As Integer = 0
Dim LastASCII As Integer = 0
Dim strOut As String = String.Empty
Dim sb As New StringBuilder
For Each Letter As Char In Words
ThisASCII = Asc(Letter)
Select Case LastASCII
Case 0, 33, 46, 63 'To detect NULL, !, ., ?
sb.Append(UCase(Letter))
Case Else
sb.Append(LCase(Letter))
End Select
'Ignore white space characters when evaluating the LastASCII.
Select Case ThisASCII
Case 0 To 32, 127
Case Else
LastASCII = ThisASCII
End Select
Next
strOut = sb.ToString
'Fix common misspellings.
strOut = Replace(strOut, " i ", " I ")
Return strOut
End Function 'EnglishCase
#End Region
#Region " Prevent Shouting "
''' <summary>
''' Suppress shouting on form entries.
''' This does not prevent an entry from being submitted. Instead, it evaluates the
''' entry and changes the capitalization of the string if the set threshold of allowed
''' capital letters is exceeded.
''' This method calls EnglishCase() to apply capitalization rules.
''' </summary>
''' <param name="Words">The input string to evaluate.</param>
''' <param name="IgnoreStringLength">
''' String size limit to ignore rules. For example, pass 20 to tell this
''' method not to alter a string with a length of 20 characters or less.
''' </param>
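''' <param name="CapsThresholdPercentage">
''' Maximum percentage (0-100) of upper-cased letters allowed before the
''' string is re-capitalized.
''' </param>
''' <returns>The original string, or the re-capitalized string if the threshold was exceeded.</returns>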
''' <remarks>
''' 9/18/2009 - Ken Palmer - Created
''' </remarks>
Public Shared Function PreventShouting(ByVal Words As String, _
ByVal IgnoreStringLength As Integer, _
ByVal CapsThresholdPercentage As Integer) As String
'Exit this function if the string doesn't exceed the IgnoreLength.
If Words.Length <= IgnoreStringLength Then Return Words
Dim strOut As String = Words
If Words.Length > 0 Then
'Calculate the percent of characters in Words that are upper cased.
Dim pattern As String = "[A-Z]" 'RegEx for all upper cased English letters.
Dim rx As Regex = New Regex(pattern)
Dim MatchList As System.Text.RegularExpressions.MatchCollection
MatchList = rx.Matches(Words)
Dim num As Integer = MatchList.Count
Dim per As Integer = num / Words.Length * 100
If per > CapsThresholdPercentage Then
strOut = EnglishCase(Words)
'strOut = StrConv(Words, VbStrConv.ProperCase)
End If
End If
Return strOut
End Function 'PreventShouting
#End Region
| | KenPalmer | Try to use Matches instead of Match.
Ex:
MatchCollection a = Regex.Matches("aAaBaA", "[A-B]");
int count = a.Count; // Count returns the number of matched upper-cased characters
How about a "shorter version":
Dim count As Integer = Regex.Matches("aBaBaA", "[A-Z]").Count
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
- Proposed as answer by Jialiang Ge [MSFT], Tuesday, September 22, 2009 5:17 AM
| | JohnGrove | Cool! I like that John. Also, I should have set that string builder to nothing in the EnglishCase() function. Debugging and refactoring is like pulling weeds. | | KenPalmer | Here is the revised code. Thanks again.
Public Shared Function PreventShouting(ByVal Words As String, _
ByVal IgnoreStringLength As Integer, _
ByVal CapsThresholdPercentage As Integer) As String
'Exit if the string doesn't exceed the IgnoreLength.
If Words.Length <= IgnoreStringLength Then Return Words
Dim strOut As String = Words
If Words.Length > 0 Then
'RegEx to find all upper cased English letters.
Dim UpperCaseCount As Integer = Regex.Matches(Words, "[A-Z]").Count
'Percent of characters in Words that are upper cased.
Dim per As Integer = CInt(UpperCaseCount / Words.Length * 100)
If per > CapsThresholdPercentage Then
strOut = EnglishCase(Words)
End If
End If
Return strOut
End Function 'PreventShouting
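The threshold is a percentage, so it should be compared against the percentage of upper-cased letters, not the raw count. A minimal sketch of that calculation, assuming a standalone console module (the module name ShoutingCheck and the helper UpperCasePercentage are hypothetical, not part of the code above):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ShoutingCheck
    ' Hypothetical helper: percentage (0-100) of the characters in value
    ' that are upper-cased English letters.
    Function UpperCasePercentage(ByVal value As String) As Integer
        If String.IsNullOrEmpty(value) Then Return 0
        Dim upperCaseCount As Integer = Regex.Matches(value, "[A-Z]").Count
        ' The / operator performs floating-point division in VB;
        ' CInt rounds the result to the nearest integer.
        Return CInt(upperCaseCount / value.Length * 100)
    End Function

    Sub Main()
        ' 15 upper-cased letters out of 18 characters.
        Console.WriteLine(UpperCasePercentage("PLEASE HELP ME NOW"))   ' 83
        ' 1 upper-cased letter out of 18 characters.
        Console.WriteLine(UpperCasePercentage("Please help me now"))   ' 6
    End Sub
End Module
```

With a threshold of 50, for example, only the first string would be re-capitalized.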
| | KenPalmer |
|