Visual Basic Extract Links, VB 2008 Extracting URL Links, Visual basic 2008 extracting html links, vb.net extract html links

I'm new to Visual Basic 2008 and was wondering if anyone could show a good code sample for extracting URLs/links from a web page's HTML.

I've seen this code in other threads, but not sure how to use it:
(?<=\<form\sname="dlf"\ action=")[a-zA-Z0-9\.\\_:\/^<="]+ 

Maybe I don't need this and there's another way to extract links. I know that somebody here knows how. ;o]

Please specify an exact code sample for this page (since it has a lot of links):
http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f


Thanks!
Inforequester123
Then try this...

        ' Requires at the top of the file:
        '   Imports System.IO
        '   Imports System.Text.RegularExpressions
        Dim test As String
        Using tr As TextReader = File.OpenText("..\..\results.txt")
            test = tr.ReadToEnd()
        End Using

        ' "link" captures the href value, "username" captures the anchor text.
        Dim pattern As String = "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"
        For Each mx As Match In Regex.Matches(test, pattern, RegexOptions.Multiline Or RegexOptions.Compiled)
            Console.WriteLine("{0} ({1}, {2})", mx.Value, mx.Groups("link").Value, mx.Groups("username").Value)
            TextBox1.AppendText(mx.Groups("username").Value)
            TextBox1.AppendText(vbCrLf)
        Next


Les Potter, Xalnix Corporation, Yet Another C# Blog
xalnix
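
For anyone unsure what the two named groups in the pattern above pick up, here is a minimal sketch that runs it against a single anchor tag in the format quoted elsewhere in this thread. The module and function names (UsernamePatternDemo, FirstUsernameMatch) are made up for illustration; only the pattern itself comes from the post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UsernamePatternDemo

    ' Pattern from the post above:
    '   "link"     captures the value of the href attribute
    '   "username" captures the anchor text up to the closing </a>
    Public Const UsernamePattern As String = _
        "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"

    ' Returns the first match of the pattern in html.
    ' Check .Success on the result before reading the groups.
    Public Function FirstUsernameMatch(ByVal html As String) As Match
        Return Regex.Match(html, UsernamePattern)
    End Function

End Module
```

Called on the sample tag `<a id="video-from-username-kcqPq5189VQ" class="hLink" href="/user/redhooknoodles">redhooknoodles</a>`, the returned match has `Groups("link").Value` equal to `/user/redhooknoodles` and `Groups("username").Value` equal to `redhooknoodles`. An anchor without the `video-from-username` id prefix does not match at all.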
Hi Inforequester,

All the hyperlink elements related to usernames on the page have this HTML format:

<a id="video-from-username-kcqPq5189VQ" class="hLink" href="/user/redhooknoodles">redhooknoodles</a>


So you can locate them and retrieve their text programmatically like this:

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        TextBox1.Multiline = True
        WebBrowser1.Navigate("http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim theElementCollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("a")
        For Each curElement As HtmlElement In theElementCollection
            If curElement.GetAttribute("href").Contains("/user/") Then
                TextBox1.Text += curElement.GetAttribute("innerText") & vbCrLf
            End If
        Next
    End Sub

End Class



Best regards,
Martin Xie


Please remember to mark the replies as answers if they help and unmark them if they provide no help.
Welcome to the All-In-One Code Framework! If you have any feedback on our support, please contact msdnmg@microsoft.com
Martin Xie - MSFT

Thank you xalnix for your friendly help!


Hi Inforequester,


You're welcome!

Actually, xalnix didn't misunderstand you. He came up with another solution, regular expressions, to retrieve all the username text from the YouTube page. I tested his code sample and it works as well as mine :)

Prerequisites: Drag and drop a WebBrowser (WebBrowser1) and a TextBox (TextBox1) onto Form1.

Imports System.IO
Imports System.Text.RegularExpressions

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        TextBox1.Multiline = True
        WebBrowser1.Navigate("http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' Get the HTML content of the web page and save it next to the executable.
        Dim htmlCode As String = WebBrowser1.DocumentText
        Dim resultsPath As String = Path.Combine(Application.StartupPath, "results.txt")
        Using sw As New StreamWriter(resultsPath)
            sw.WriteLine(htmlCode)
        End Using

        ' Read the saved HTML back in.
        Dim test As String
        Using tr As TextReader = File.OpenText(resultsPath)
            test = tr.ReadToEnd()
        End Using

        ' "link" captures the href value, "username" captures the anchor text.
        Dim pattern As String = "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"
        For Each mx As Match In Regex.Matches(test, pattern, RegexOptions.Multiline Or RegexOptions.Compiled)
            Console.WriteLine("{0} ({1}, {2})", mx.Value, mx.Groups("link").Value, mx.Groups("username").Value)
            TextBox1.AppendText(mx.Groups("username").Value)
            TextBox1.AppendText(vbCrLf)
        Next
    End Sub

End Class


Tutorial: Introduction to Regular Expressions

http://msdn.microsoft.com/en-us/library/28hw3sce.aspx


Best regards,
Martin Xie


Martin Xie - MSFT
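
The round trip through results.txt is convenient for inspecting the page source, but the pattern can also be applied to the HTML string directly. The sketch below assumes that is acceptable for your case; the names UsernameExtractor and ExtractUsernames are hypothetical, and the pattern is the one from xalnix's post.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UsernameExtractor

    ' Same pattern as in the posts above; "username" captures the anchor text.
    Private Const UsernamePattern As String = _
        "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"

    ' Returns the anchor text of every username link found in html,
    ' in the order the links appear.
    Public Function ExtractUsernames(ByVal html As String) As List(Of String)
        Dim names As New List(Of String)
        For Each m As Match In Regex.Matches(html, UsernamePattern)
            names.Add(m.Groups("username").Value.Trim())
        Next
        Return names
    End Function

End Module
```

Inside WebBrowser1_DocumentCompleted this could be used as `TextBox1.Lines = ExtractUsernames(WebBrowser1.DocumentText).ToArray()`, which puts one username on each line of the multiline textbox.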

You need to be more specific about your request. You think "extract links from web page" is specific? Not really, when you also say you are new to VB and we are not sure how much example code you need.

Have you looked at the page source? Are you only interested in the URLs inside the <link> tags, or do you want <a> tags?

Here's a start. If you grab a copy of the page source and put it in a file locally, then this snippet might help. It only finds the URLs in the href attributes inside <link> tags. If letter case varies, add RegexOptions.IgnoreCase.

        ' Requires Imports System.IO and Imports System.Text.RegularExpressions.
        Dim test As String
        Using tr As TextReader = File.OpenText("..\..\results.txt")
            test = tr.ReadToEnd()
        End Using

        ' Captures the href value of each <link> tag in the "link" group.
        Dim pattern As String = "<link[^>]*?href=""(?<link>[^""]*?)"""
        For Each mx As Match In Regex.Matches(test, pattern, RegexOptions.Multiline Or RegexOptions.Compiled)
            Console.WriteLine("{0}", mx.Groups("link").Value)
        Next


Les Potter, Xalnix Corporation, Yet Another C# Blog
xalnix
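
If it is the <a> tags you are after, a similar pattern can be pointed at anchors instead. This is only a sketch under a few assumptions: attribute values are double-quoted, RegexOptions.IgnoreCase is used because tag case may vary, and relative links such as /user/... are resolved against the page address with Uri.TryCreate. The names LinkExtractor and ExtractLinks are invented for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkExtractor

    ' Matches the href value of <a> tags. Assumes double-quoted attribute values.
    Private Const AnchorPattern As String = "<a\b[^>]*?\bhref\s*=\s*""(?<link>[^""]*)"""

    ' Returns every anchor href found in html as an absolute URL.
    ' Relative hrefs are resolved against baseUrl; hrefs that cannot be
    ' turned into a valid URI are skipped.
    Public Function ExtractLinks(ByVal html As String, ByVal baseUrl As String) As List(Of String)
        Dim links As New List(Of String)
        Dim baseUri As New Uri(baseUrl)
        For Each m As Match In Regex.Matches(html, AnchorPattern, RegexOptions.IgnoreCase)
            Dim resolved As Uri = Nothing
            If Uri.TryCreate(baseUri, m.Groups("link").Value, resolved) Then
                links.Add(resolved.AbsoluteUri)
            End If
        Next
        Return links
    End Function

End Module
```

A regular expression like this is fine for a quick scrape of a page whose markup you have looked at, but it will miss single-quoted or unquoted href values.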

Hey Xalnix!,

Thanks for your reply. You're right, I wasn't being very specific.

Here's a more specific inquiry:

How would I extract the USER NAME text of the following page (http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f) and send only the usernames into a textbox1.text???


Even more specific: How would I extract the "YOUTUBE" "USER NAMES" off of this page and send them to a textbox with multiple lines???


Hope this is specific enough and thanks again for your help !!!

Inforequester123

You need to be more specific with where to put this code and how to use this code because all I'm getting is a bunch of errors.


I was very specific with what I'm trying to do and your reply seems to be off the topic.

I want to extract the "Usernames" off of youtube.com and send the "Usernames/Account Names" to Textbox1.text w/multiple lines. Understand?

I know that I said I wanted to extract the links at first and then I specified that I wanted you to extract the usernames. I'm sure that the codes are rather similar, but I'd rather get the "usernames" off of any page and send them to a textbox.
Inforequester123
I'm new to Visual Basic 2008 & was wondering if anyone could show a great code sample for extracting TEXT from web pages and then sending the text to Textbox1.text with multiple lines.

To be exact, I would like to extract the usernames from the following page:
http://www.youtube.com/results?search_query=hip+hop+beat&search_type=&aq=f and send them to textbox1.text with multiple lines.


I guess this is similar to harvesting, but I know that it is legal to ask, answer and this isn't the same as email harvesting (just requesting how to harvest text, that's all). :)

Please specify an exact code sample for this page:
http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f


Remember, I only need to extract/collect the text of all the "Usernames" at once & send them to Textbox1.text.

Thanks!
Inforequester123
I'm sorry, I must disagree. The pattern extracts "usernames" from your example page. It places those "usernames" in a textbox, one on each line. As an added bonus, it prints additional information to the Console (which is only visible in the debugger under the Output window). Since you haven't said where the "username" is located, I've taken a stab at it myself. If you now want both "username" and "account name", you need to ask for that. We can't read minds (though we often make very good guesses at what folks are thinking).

Now, if I don't understand what you are asking for, and I am the only one even attempting to understand you, you could consider that maybe you are not being clear.

If you want both Usernames and Account Names, why don't you copy a few lines of text from your sample and highlight which is which and what parts of the smaller sample you really want to capture.

I am assuming you know VB well enough. If that's not the case, that's fine too. We can work through that.
Les Potter, Xalnix Corporation, Yet Another C# Blog
xalnix
To run the code example I provided, do the following...
1. In a browser, navigate to the page you specified.
2. Grab the source (right click, view source, save-as "results.txt" in your solution directory)
3. In your VB solution, add a button and a textbox.
4. Set the textbox1.multiline property to True.
5. Add the code I provided to your button handler (double click on the button you added and paste the code).


Les Potter, Xalnix Corporation, Yet Another C# Blog
xalnix
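
The steps above can be sketched as a small helper that the button handler calls. This is an illustration rather than the exact code from the post: the names SavedPageReader and UsernamesFromFile are hypothetical, and it assumes results.txt holds the saved page source from step 2.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module SavedPageReader

    ' Pattern from the earlier post; "username" captures the anchor text.
    Private Const UsernamePattern As String = _
        "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"

    ' Reads the saved page source from fileName and returns the usernames
    ' found in it, one per line, ready to assign to a multiline textbox.
    Public Function UsernamesFromFile(ByVal fileName As String) As String
        Dim html As String = File.ReadAllText(fileName)
        Dim sb As New StringBuilder()
        For Each m As Match In Regex.Matches(html, UsernamePattern)
            sb.AppendLine(m.Groups("username").Value)
        Next
        Return sb.ToString()
    End Function

End Module
```

In the button's Click handler (step 5) the call would then be a single line such as `TextBox1.Text = UsernamesFromFile("..\..\results.txt")`. The relative path works when the program runs from bin\Debug and results.txt sits in the project directory, as in the original snippet.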
THANKS MARTIN !!!,

At least Xalnix wasn't the only one trying to understand me after all. You seem to understand on your first attempt. LOL !!! ;oD

Thanks and you're still number one here as I always say. Keep up the great work because I'm sure many here, honestly appreciate your help.

Best regards,
Ace B.
Inforequester123
See Xalnix??? Mr. Xie above easily understood my inquiry. I guess I wasn't crazy after all. LOL ;oD

But what the heck, I will give your code a shot as well and hopefully I won't get a bunch of errors this time. :)
Inforequester123

Thanks, both of you. I re-read my post and recognize I sounded belligerent. My apologies.
Les Potter, Xalnix Corporation, Yet Another C# Blog
xalnix
It is ok to disagree as long as we are not disagreeable.
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
JohnGrove
No wonder I got a bunch of errors from this code. You never specified that I needed to put

Imports System.IO
Imports System.Text.RegularExpressions

above

Public Class Form1


You only said then try this:


        Dim tr As TextReader
        tr = File.OpenText("..\..\results.txt")
        Dim test As String = tr.ReadToEnd()
        Dim mx As Match

        Dim pattern As String = "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"
        For Each mx In Regex.Matches(test, pattern, RegexOptions.Multiline + RegexOptions.Compiled)
            Console.WriteLine("{0} ({1}, {2})", mx.Value, mx.Groups("link").Value, mx.Groups("username").Value)
            TextBox1.AppendText(mx.Groups("username").Value)
            TextBox1.AppendText(vbCrLf)
        Next

Trying that alone will only give a new vb user a bunch of errors. :)

Mr Xie hit the nail on the head with a first attempt. :)

I just think that it's funny when you tell me that I'm not being very specific and then you go and do the same thing. Actually, I believe I was being specific enough when I said, "I'm new to visual basic 2008 & was wondering if anyone could show a great code sample for extracting url/links from a web page html." and then said "Please specify an exact code sample for this page (since it has a lot of links.):
http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f" and also said, "How would I extract the USER NAME text of the following page (http://www.youtube.com/results?search_query=hip+hop+samples&search_type=&aq=f) and send only the usernames into a textbox1.text???"


"Even more specified inquiry: How would I extract the "YOUTUBE" "USER NAMES" off of this page and send them to a textbox with multiple lines????"

Enough said :)



Inforequester123
It is ok to disagree as long as we are not disagreeable.
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com



I choose both. :)
Inforequester123

I said Mr. xalnix didn't understand me because I asked him to be specific with the code, and he only gave me the cake without the icing when he said "

Dim tr As TextReader
tr = File.OpenText("..\..\results.txt")
Dim test As String = tr.ReadToEnd()
Dim mx As Match

Dim pattern As String = "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"
For Each mx In Regex.Matches(test, pattern, RegexOptions.Multiline + RegexOptions.Compiled)
Console.WriteLine("{0} ({1}, {2})", mx.Value, mx.Groups("link").Value, mx.Groups("username").Value)
TextBox1.AppendText(mx.Groups("username").Value)
TextBox1.AppendText(vbCrLf)
Next
"


This alone will give errors, especially when you're new and need the entire code. :)

You on the other hand knew exactly what it was that I was looking for and wasn't acting like I wasn't being specific enough. Superman to the rescue flying around these forums. LOL :)

Thanks man!!!

Inforequester123
Hi Inforequester,

There are many volunteers in the MSDN community, such as xalnix and John, who often spend their spare time helping other community members. It's understandable that volunteers sometimes supply only the essential code snippet or the main idea, because they are short on time and assume that the original poster has basic programming experience in VB.NET.
I, on the other hand, work at Microsoft as a support engineer, so it's my responsibility to supply complete and clear answers to customers :)

Sincerely thank you xalnix and All for your friendly help and support in MSDN community!
Thank you Inforequester for your active participation in MSDN community and encouragement to me.

Martin Xie - MSFT
It's difficult to gauge every user's programming level simply from the question. Sometimes the question makes it easy, sometimes not so much. We try to do the best we can given the question and our ability to answer it.

Sometimes we make assumptions. Sometimes we are wrong. I can tell you Les has been worthy of not only the MVP award but definitely deserves being a moderator.

We are glad that your question has been answered, and we are always thankful for our Microsoft friends' assistance when they can offer it.
John Grove - TFD Group, Senior Software Engineer, EI Division, http://www.tfdg.com
JohnGrove
Hello Xalnix,

1- If I use the MSHTML method instead of the WebBrowser control for parsing hundreds of web pages at one time, will this be better than using the WebBrowser control?

2- How would I navigate to any url using a textbox with this code?:

Dim tr As TextReader
        tr = File.OpenText("..\..\results.txt")
        Dim test As String = tr.ReadToEnd()
        Dim mx As Match

        Dim pattern As String = "<a\s+id=""video-from-username[^>]*?href=""(?<link>[^""]*?)""[^>]*>(?<username>.*?(?=</a>))"
        For Each mx In Regex.Matches(test, pattern, RegexOptions.Multiline + RegexOptions.Compiled)
            Console.WriteLine("{0} ({1}, {2})", mx.Value, mx.Groups("link").Value, mx.Groups("username").Value)
            TextBox1.AppendText(mx.Groups("username").Value)
            TextBox1.AppendText(vbCrLf)
        Next





3- Using this code, where do I put the innertext that I want to extract?

I would greatly appreciate your help

Kind regards,
Velarz P.
VBNETMAN
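
Questions 2 and 3 went unanswered in the thread, so what follows is only one possible approach, not something the earlier posters proposed. It assumes the raw HTML is enough (System.Net.WebClient fetches the response without running any scripts) and that the text you want can be described by a named group in a pattern. PageFetcher, DownloadHtml and ExtractGroup are hypothetical names.

```vbnet
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module PageFetcher

    ' Downloads the raw HTML at url. No browser control is involved,
    ' so content that a page builds with JavaScript will not be present.
    Public Function DownloadHtml(ByVal url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    ' Applies pattern to html and returns the value of the named group
    ' groupName for every match. The "inner text" you want to extract is
    ' whatever that named group is written to capture.
    Public Function ExtractGroup(ByVal html As String, ByVal pattern As String, ByVal groupName As String) As List(Of String)
        Dim values As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern)
            values.Add(m.Groups(groupName).Value)
        Next
        Return values
    End Function

End Module
```

With a second textbox holding the address (TextBox2 here is an assumed control name), a button handler could call `ExtractGroup(DownloadHtml(TextBox2.Text), pattern, "username")` with the pattern from the posts above and assign the result to `TextBox1.Lines` via `.ToArray()`. On newer .NET versions WebClient is marked obsolete in favor of HttpClient, so treat this as a sketch for the VB 2008 era.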
