Removing HTML contents from a web request using C#
I have the following code in C# that gets the contents of a web page and stores them in a string variable.
using System.IO;
using System.Net;

WebRequest request = WebRequest.Create("http://www.arsenal.com");
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = string.Empty;
// Read the whole response body into a string.
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
The code works, but I need to store the content of the page without the HTML tags and JavaScript. Is there a way to do this (any built-in method or ready-made library for such things)?
I have found ways of removing HTML tags, but JavaScript and CSS styles still bother me. I should mention that my way of removing the HTML is not working well either; I'm using regular expressions to do so.
As this question suggests, parsing HTML is a tricky process, and the best approach is to use a library.
I've used the HTML Agility Pack before with success, though this question lists other options.
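As a rough illustration of the library approach, here is a minimal sketch using the HTML Agility Pack (a third-party NuGet package, not part of the .NET framework). The class and method names `HtmlTextExtractor` and `ExtractText` are made up for this example. It assumes the page has already been downloaded into a string, removes `script` and `style` elements and comments, and returns whatever text is left. Treat it as a starting point, not a complete solution: whitespace is not cleaned up, and content generated by JavaScript at run time will not be present.

```csharp
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text of an HTML string with
    // script blocks, style blocks, comments, and all tags removed.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (unwanted != null)
        {
            // Copy to a list first so removal cannot disturb the enumeration.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes; DeEntitize converts
        // entities such as &amp; back into plain characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }
}
```

With the code from the question, the call would be `string text = HtmlTextExtractor.ExtractText(html);`. Removing the script and style nodes before reading `InnerText` matters, because otherwise their source code ends up in the extracted text.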