Removing HTML content from a web request using C#
I have the following code in C# that gets the contents of a web page and stores them in a string variable.
    // Requires: using System.IO; using System.Net;
    WebRequest request = WebRequest.Create("http://www.arsenal.com");
    WebResponse response = request.GetResponse();
    Stream data = response.GetResponseStream();
    string html = string.Empty;
    using (StreamReader sr = new StreamReader(data))
    {
        // Read the whole response body into a single string.
        html = sr.ReadToEnd();
    }

The code works, but I need to store the content of the page without the HTML tags and the JavaScript stuff. Is there a way to do this (any built-in method or something ready-made)?
I have found ways of removing the HTML tags, but the JavaScript and CSS styles still bother me. I should mention that my way of removing the HTML is not working well either; I'm using regular expressions for it.
As this question suggests, parsing HTML is a tricky process, and the best approach is to use a library.
I've used Html Agility Pack before with success, though this question lists other options.
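For what it's worth, here is a rough sketch of how that could look with Html Agility Pack, assuming the HtmlAgilityPack NuGet package is installed. The class and method names (HtmlTextExtractor, ToPlainText) are made up for illustration and are not part of the library, and collapsing whitespace at the end is my own choice rather than something the question asked for.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class HtmlTextExtractor
{
    // Hypothetical helper (not part of Html Agility Pack):
    // returns the visible text of an HTML string.
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select script blocks, style blocks and comments.
        // SelectNodes returns null when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (unwanted != null)
        {
            // Copy to a list first so we iterate over a stable snapshot.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the remaining text nodes;
        // DeEntitize turns entities such as &amp; back into characters.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
        return string.Join(" ",
            text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

With the html string from the question, calling HtmlTextExtractor.ToPlainText(html) should give you the page text without the markup, scripts or styles. Pages that build their content with JavaScript at runtime will still come back mostly empty, since nothing here executes scripts.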