.net - Stripping MS Word Tags Using Html Agility Pack
I have a DB with some text fields pasted from MS Word, and I'm having trouble stripping just the <div>, <font> and <span> tags while keeping their inner text.
    Public Function StripHTML(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)
        Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div | //font | //span")
        For Each node As HtmlNode In invalidNodes
            ' False: the node is removed together with its children
            node.ParentNode.RemoveChild(node, False)
        Next
        Return htmlDoc.DocumentNode.WriteTo()
    End Function
This code simply selects the desired elements and removes them... but it doesn't keep their inner text.
Well... I think I found a solution:
    Public Function StripHTML(ByVal html As String) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)
        Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div | //font | //span | //p")
        For Each node As HtmlNode In invalidNodes
            ' True: keep the grandchildren, so the node's content moves up to its parent
            node.ParentNode.RemoveChild(node, True)
        Next
        Return htmlDoc.DocumentNode.WriteContentTo()
    End Function
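One thing to watch for with this approach: as far as I know, SelectNodes returns Nothing rather than an empty collection when the XPath matches no nodes, so the For Each loop would throw on text that has no Word markup at all. Below is a minimal sketch of a guarded variant. The module name WordMarkupCleaner and the function name StripWrapperTags are made up for illustration; the HtmlAgilityPack calls are the same ones used above.

```vbnet
Imports HtmlAgilityPack

Public Module WordMarkupCleaner
    ' Hypothetical helper: removes <div>, <font>, <span> and <p> wrappers
    ' but keeps whatever was inside them.
    Public Function StripWrapperTags(ByVal html As String) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)

        Dim wrappers As HtmlNodeCollection =
            htmlDoc.DocumentNode.SelectNodes("//div | //font | //span | //p")

        ' Assumption: SelectNodes yields Nothing when there is no match,
        ' so return the input untouched in that case.
        If wrappers Is Nothing Then
            Return html
        End If

        For Each node As HtmlNode In wrappers
            ' keepGrandChildren = True moves the node's children up to its parent
            node.ParentNode.RemoveChild(node, True)
        Next

        Return htmlDoc.DocumentNode.WriteContentTo()
    End Function
End Module
```

Calling StripWrapperTags on a field with no matching tags then just hands the original string back instead of failing.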
I was almost there... :P