Removing HTML content from a web request using C#


I have the following C# code that gets the contents of a web page and stores them in a string variable.

    using System.IO;
    using System.Net;

    WebRequest request = WebRequest.Create("http://www.arsenal.com");
    WebResponse response = request.GetResponse();
    Stream data = response.GetResponseStream();
    string html = string.Empty;
    // Read the whole response body into a single string.
    using (StreamReader sr = new StreamReader(data))
    {
        html = sr.ReadToEnd();
    }

The code works, but I need to store the content of the page without the HTML tags and the JavaScript. Is there a way to do this (a built-in method, or something ready-made)?
I have found ways of removing the HTML tags, but the JavaScript and CSS styles still bother me. I should mention that my way of removing the HTML is not working well; I'm using regular expressions for it.

As this question suggests, parsing HTML is a tricky process, and the best approach is to use a library.

I've used Html Agility Pack before with success, though this question lists other options.
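
As a rough illustration of the library approach, here is a minimal sketch using Html Agility Pack (a third-party NuGet package, not part of .NET). The class and method names (HtmlTextExtractor, ExtractText) are made up for this example, and collapsing whitespace at the end is my own assumption about what "content without tags" should look like. It drops script, style and comment nodes first, then reads the remaining text.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party package: install "HtmlAgilityPack" from NuGet

public static class HtmlTextExtractor
{
    // Hypothetical helper: returns the text of an HTML string
    // with script, style and comment content removed.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy the matches to a list first so that removing nodes
        // does not disturb the enumeration of the document tree.
        var unwanted = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "script"
                     || n.Name == "style"
                     || n.NodeType == HtmlNodeType.Comment)
            .ToList();

        foreach (var node in unwanted)
        {
            node.Remove();
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Assumption: runs of whitespace are collapsed to single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

With the code from the question, this would be called as `string text = HtmlTextExtractor.ExtractText(html);` once the page has been read into `html`. It only handles markup present in the downloaded source; anything a page builds later with JavaScript is not in the response at all.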

