Remove HTML unwanted tags and attributes

Salaam!

I have an application that takes an HTML document and performs some operations on it:

1- remove unwanted tags such as script, meta, object, and form entirely, meaning the tag and its content

2- remove unwanted tags but keep their inner HTML, for example span and div

3- remove unwanted attributes from a tag, such as style, size ... etc.

I want to use regular expressions to solve these problems. Any idea or piece of code will help.

regards

Hadi.Leb
Regex cannot handle hierarchical data too well, and HTML is the prime example.

The problem with using regex is the fact that not all HTML is XHTML, meaning that not all tags are closed. So where the HTML still displays despite an unclosed tag, a pattern that relies on the closing tag can end up deleting that whole section. And if you have nested tags, you may end up not removing enough and leaving unwanted information behind, or deleting things you need. For example, with nested tags like these:


<span><p>xxx</p><span><p>YYYY</p></span><p>ZZZ</p></span>
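
To make the nesting problem concrete, here is a minimal sketch that runs two naive patterns against the exact string above. Python is used only for illustration (an assumption, since the thread is about .NET and does not show any code), and the two patterns are hypothetical first attempts, not something taken from the original posts.

```python
import re

# The nested example from the post above.
html = "<span><p>xxx</p><span><p>YYYY</p></span><p>ZZZ</p></span>"

# Lazy match: stops at the FIRST </span>, which belongs to the inner span.
# The outer opening tag gets paired with the inner closing tag.
lazy = re.sub(r"<span>.*?</span>", "", html)

# Greedy match: runs to the LAST </span> and swallows everything.
greedy = re.sub(r"<span>.*</span>", "", html)

print(repr(lazy))    # '<p>ZZZ</p></span>'  (xxx is gone, a stray </span> is left)
print(repr(greedy))  # ''                   (nothing survives)
```

Neither pattern can remove just the inner span and its YYYY paragraph, because a plain pattern has no notion of which closing tag belongs to which opening tag.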


OmegaMan
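
If regex is the wrong tool, the usual alternative is to let an HTML parser track the nesting. The sketch below is one possible way to cover the three requirements from the question. It is written in Python with the standard library's `html.parser` purely as an assumption for illustration; the class name `Sanitizer`, the function `sanitize`, and the tag and attribute sets are hypothetical and only mirror the examples named in the question.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy, mirroring the three requirements in the question.
DROP_WITH_CONTENT = {"script", "meta", "object", "form"}  # 1: tag and content
UNWRAP = {"span", "div"}                                  # 2: keep inner HTML
DROP_ATTRS = {"style", "size"}                            # 3: strip attributes
VOID = {"meta", "br", "hr", "img", "input", "link"}       # tags with no end tag


class Sanitizer(HTMLParser):
    def __init__(self):
        # Keep entities such as &amp; unconverted so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            if tag not in VOID:
                self.skip_depth += 1
            return
        if self.skip_depth or tag in UNWRAP:
            return
        kept = [(k, v) for k, v in attrs if k not in DROP_ATTRS]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in kept
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit the start tag only.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth and tag not in VOID:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag in UNWRAP or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def sanitize(html):
    parser = Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(sanitize("<span><p>xxx</p><span><p>YYYY</p></span><p>ZZZ</p></span>"))
# <p>xxx</p><p>YYYY</p><p>ZZZ</p>
```

This sketch silently drops comments and doctype declarations, and it does not solve the unclosed-tag problem described above: an unclosed form tag would still cause everything after it to be dropped. A parser that repairs malformed markup would be needed for that case.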

Dear Hadi,

This post might be of some use.

HTH,

Suprotim Agarwal