Regular expression for url

  • Edited by waterding, Tuesday, September 22, 2009 11:26 AM
waterding
http:\/\/[\w\.\/\?=\#&+-]*
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
JohnGrove

Thanks for your reply. I don't think I described my question clearly.

What I actually need is a regex that extracts the values of the parameters 'p' and 'q' from a URL, not one that validates the URL.
waterding
Oh, I see. How far should the match after p= extend in your examples?

Try this for starters, though I am not sure how far you want the match to go:

String pattern = @"(?:p|q)=[\w+]*";
  • Edited by JohnGrove, Tuesday, September 22, 2009 1:29 PM
  • Edited by JohnGrove, Tuesday, September 22, 2009 1:39 PM
JohnGrove

For example:

I need to extract 'google+search+engine' from this URL: http://www.google.co.uk/#hl=en&q=google+search+engine&meta=&fp=2d600fc9f1842146

I need to extract 'banana' from this URL: http://www.bing.com/search?q=banana&FORM=MSNH90&mkt=en-gb

I need to extract 'google+apple' from this URL: http://uk.search.yahoo.com/search?rd=r1&p=google+apple&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-702
waterding
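// Match "p=" or "q=" and capture the value (word characters and '+') in the group "Word".
String pattern = @"(?:p|q)=(?<Word>[\w+]*)";
Regex rx = new Regex(pattern, RegexOptions.IgnoreCase);
// inputStringOfUrls is the text that contains the URLs to scan.
Match m = rx.Match(inputStringOfUrls);
while (m.Success)
{
    Console.WriteLine(m.Value);                 // whole match, e.g. "q=banana"
    Console.WriteLine(m.Groups["Word"].Value);  // captured value, e.g. "banana"
    Console.WriteLine("");
    m = m.NextMatch();
}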
  • Edited by JohnGrove, Tuesday, September 22, 2009 1:36 PM
  • Edited by JohnGrove, Tuesday, September 22, 2009 1:38 PM
JohnGrove
String pattern = "(?:p|q)=(?<Word>[\w+]*)";

I just tried it, and it says that the regular expression contains an error.
waterding

Sorry, I forgot the verbatim @ prefix. Without it, C# treats \w in a regular string literal as an unrecognized escape sequence, so the code does not compile.
JohnGrove
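
For anyone wondering why the @ matters, here is a minimal sketch. The class and method names are made up for illustration and are not from the thread. A verbatim literal passes backslashes through to the regex engine unchanged, while a regular literal needs each backslash doubled; the two forms below produce the same string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not from the thread.
public static class VerbatimDemo
{
    // Verbatim literal: backslashes are taken as-is.
    public const string Verbatim = @"(?:p|q)=(?<Word>[\w+]*)";

    // Regular literal: each backslash must be doubled.
    public const string Escaped = "(?:p|q)=(?<Word>[\\w+]*)";

    // Returns the first captured value, or an empty string when nothing matches.
    public static string FirstValue(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["Word"].Value : string.Empty;
    }

    public static void Main()
    {
        // Both literals describe the same pattern text.
        Console.WriteLine(Verbatim == Escaped); // True
        Console.WriteLine(FirstValue(Verbatim,
            "http://www.bing.com/search?q=banana&FORM=MSNH90&mkt=en-gb")); // banana
    }
}
```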

Thanks very much for this. I just tried it on this URL: http://www.google.co.uk/#hl=en&q=google+search+engine&meta=&fp=2d600fc9f1842146

It extracts both google+search+engine and 2d600fc9f1842146. The latter belongs to the parameter 'fp', not 'p'. Can you match only the 'p' and 'q' parameters?
waterding
This pattern will do it:

\b[pq]=(.+?)[\&\s]
or
[?&][pq]=(.+?)[&\s]

The value is in the captured group.
Alternatively, you can use lookahead and lookbehind.


John, your pattern will also match the p=... part of

fp=2d600fc9f1842146
cop=mss

so it's not accurate.
www.wonderstudio.cn
Eping Wang
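
The lookbehind idea mentioned above could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, and the value is matched with a negated character class [^&\s]+ rather than .+? plus a lookahead. That is a substitution on my part. The two patterns in the post require a trailing & or whitespace, so they miss a parameter at the very end of the input; the negated class does not have that limitation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread.
public static class LookbehindDemo
{
    // (?<=[?&][pq]=)  the value must directly follow "?p=", "&p=", "?q=" or "&q="
    // [^&\s]+         the value itself: everything up to the next '&' or whitespace
    private static readonly Regex Rx = new Regex(@"(?<=[?&][pq]=)[^&\s]+");

    // Returns every p/q value found in the input text.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in Rx.Matches(input))
        {
            values.Add(m.Value); // the whole match is the value; no group needed
        }
        return values;
    }

    public static void Main()
    {
        string url = "http://uk.search.yahoo.com/search?rd=r1&p=google+apple&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-702";
        foreach (string v in Extract(url))
        {
            Console.WriteLine(v); // google+apple
        }
    }
}
```

Because the lookbehind is not part of the match, m.Value is already the bare value, and cop=mss is skipped since the character before p is not ? or &.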
I was mistaken; I thought the user wanted every match of p=[anything] or q=[anything].

Try adding a word boundary to it, like so:

String pattern = @"\b(?:p|q)=(?<Word>[\w+]*)";

JohnGrove
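
Put together as a complete program, the word-boundary pattern might be used like this. It is a sketch under assumptions: the wrapper class and method names are illustrative and not from the thread, and the input is taken to be the three URLs from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern above; names are illustrative.
public static class SearchTermExtractor
{
    // \b keeps "fp=" and "cop=" from matching, because there is no
    // word boundary between "f" and "p" or between "o" and "p".
    private static readonly Regex Rx =
        new Regex(@"\b(?:p|q)=(?<Word>[\w+]*)", RegexOptions.IgnoreCase);

    // Returns the captured value of every p= or q= parameter in the input.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        for (Match m = Rx.Match(input); m.Success; m = m.NextMatch())
        {
            values.Add(m.Groups["Word"].Value);
        }
        return values;
    }

    public static void Main()
    {
        string[] urls =
        {
            "http://www.google.co.uk/#hl=en&q=google+search+engine&meta=&fp=2d600fc9f1842146",
            "http://www.bing.com/search?q=banana&FORM=MSNH90&mkt=en-gb",
            "http://uk.search.yahoo.com/search?rd=r1&p=google+apple&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-702"
        };
        foreach (string url in urls)
        {
            // Prints google+search+engine, banana, google+apple (one per line).
            Console.WriteLine(string.Join(", ", Extract(url)));
        }
    }
}
```

Two caveats worth keeping in mind: \b also matches after any non-word character such as '-' or '/', and [\w+]* stops at characters like % or -, so percent-encoded search terms would be cut short.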
Not a big deal: simply add \b to the start of the pattern and that solves it.
JohnGrove
Yes, all roads lead to Rome.
There are other patterns that can do this as well.

This is a very good case for learning regex; I would like to write it up on my website or blog.
Eping Wang
Yeah, programming is interesting in how it reflects the way the mind works: we can come up with different solutions to the same problem.
JohnGrove
