java - XML parser not parsing UTF-8 despite correct encoding


Note: there are countless questions on this general subject here, but I couldn't find one targeted toward my specific problem.

I'm working on parsing the XML at http://rss.cnn.com/rss/cnn_latest.rss, and the parser was working fine, getting what I was looking for. No problems. Then, out of the blue, after hours of working fine, I started getting encoding errors.

What I've been doing is writing the source XML to a file and then parsing that file, as below:

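    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    File xmlFile = new File("cnnxml.txt");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(xmlFile);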
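The question doesn't show how the feed gets written to cnnxml.txt. If that step goes through a Writer (for example FileWriter, which uses the platform default charset, and that default is UTF-8 only since Java 18), non-ASCII characters can end up in some other encoding while the declaration still says utf-8. That is an assumption, not something the post confirms. A minimal sketch that sidesteps it by copying the response bytes to disk unchanged (FeedSaver and saveRaw are hypothetical names):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (assumes the feed arrives as an InputStream, e.g. from URL.openStream()).
// Copies bytes as-is: no Reader/Writer is involved, so no charset conversion can happen
// and the file keeps exactly the encoding the server sent.
final class FeedSaver {
    private FeedSaver() {}

    /** Writes every byte of body to target, overwriting any previous copy; returns the byte count. */
    static long saveRaw(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Usage, assuming the feed URL from the question: FeedSaver.saveRaw(URI.create("http://rss.cnn.com/rss/cnn_latest.rss").toURL().openStream(), Paths.get("cnnxml.txt")); after which the file can be parsed exactly as in the original code.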

What's weird is that the first line of the XML file seems to say the encoding is, in fact, UTF-8:

<?xml version="1.0" encoding="utf-8"?> 

Below are the errors I'm getting in Eclipse:

    com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 3 of 4-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
        at getrss.main(getrss.java:87)
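For reference, "Invalid byte 3 of 4-byte UTF-8 sequence" means the parser saw a lead byte of the form 11110xxx, which announces a four-byte character, and then a third byte that was not a continuation byte (10xxxxxx). Text saved in a single-byte encoding such as ISO-8859-1 or windows-1252 can produce this kind of error, since a character like "ð" is stored there as the single byte 0xF0. A small sketch, using only the JDK's CharsetDecoder, that reports the byte offset of the first invalid sequence so that spot can be inspected in a hex view or in Notepad++ (Utf8Check and firstMalformedOffset are hypothetical names):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: locate the first byte sequence that is not valid UTF-8.
final class Utf8Check {
    private Utf8Check() {}

    /** Returns the offset of the first malformed UTF-8 sequence, or -1 if all bytes decode cleanly. */
    static int firstMalformedOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // stop instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On an error the input position points at the start of the bad sequence.
                return in.position();
            }
            if (result.isOverflow()) {
                out.clear(); // the decoded text isn't needed; make room and keep going
                continue;
            }
            return -1; // underflow with endOfInput=true: every byte was consumed
        }
    }
}
```

Usage: int offset = Utf8Check.firstMalformedOffset(Files.readAllBytes(Paths.get("cnnxml.txt"))); a non-negative result is the position to look at in the saved file.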

And, again, this had been working all day, and then entirely out of the blue I started getting these problems. What's going on?

The solution is something you will have to explore yourself. As @MichaelKay suggested, a better answer is unlikely.

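The file declares UTF-8 but is not actually UTF-8. Use a programmer's editor such as jEdit or Notepad++ to play around with the encodings. If it is a data error, catch the exception and make a copy of the file for examination. It might also be an error message from the server; in that case the solution is to check the response status. Note: maybe the bad sequence is inside an entity; see the stack trace, which shows the parser failing while reading a CDATA section (scanCDATASection).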
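A sketch of the response-status check, assuming the feed is downloaded with HttpURLConnection (the post doesn't say how it is fetched; FeedFetcher and fetchTo are hypothetical names). If the server answers with anything other than 200, the body is probably an error page rather than the feed, so it is reported instead of being saved and parsed:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (assumes an http:// URL, so openConnection() returns an HttpURLConnection).
final class FeedFetcher {
    private FeedFetcher() {}

    /** Saves the response body to target only when the server replies 200 OK. */
    static void fetchTo(URL url, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            int status = conn.getResponseCode();   // does not throw for 4xx/5xx codes
            String type = conn.getContentType();   // may be null
            if (status != HttpURLConnection.HTTP_OK) {
                // An error page is not the feed; don't hand it to the XML parser.
                throw new IOException("HTTP " + status + " (" + type + ") from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                // Raw byte copy: no charset conversion on the way to disk.
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Logging the Content-Type in the message also helps, since an HTML error page usually announces itself as text/html rather than an RSS or XML type.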

My conjecture is that the XML is corrupt, so the try-catch should preserve the data: store it along with the stack trace or similar. Best if the failure is repeatable.
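A sketch of that try-catch, assuming the file-based parsing from the question: on failure it keeps a byte-for-byte copy of the input and writes the stack trace next to it, so the failing download can be examined and re-parsed later. SafeParse, parseOrKeepCopy and the .bad/.bad.log suffixes are hypothetical. Depending on the JDK version, the malformed-byte error may surface either as a SAXParseException or directly as the MalformedByteSequenceException from the question (which extends CharConversionException), so both are caught:

```java
import java.io.CharConversionException;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parse, and on a malformed-input failure keep the evidence.
final class SafeParse {
    private SafeParse() {}

    /**
     * Parses xmlFile. If the content is rejected, copies the file to "<name>.bad",
     * writes the stack trace to "<name>.bad.log", and returns null.
     */
    static Document parseOrKeepCopy(File xmlFile) throws IOException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            return builder.parse(xmlFile);
        } catch (SAXException | CharConversionException e) {
            Path source = xmlFile.toPath();
            // Keep the exact bytes that failed, so the problem can be reproduced later.
            Files.copy(source, source.resolveSibling(source.getFileName() + ".bad"),
                    StandardCopyOption.REPLACE_EXISTING);
            // Store the stack trace alongside the copy.
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace));
            Files.write(source.resolveSibling(source.getFileName() + ".bad.log"),
                    trace.toString().getBytes(StandardCharsets.UTF_8));
            return null;
        }
    }
}
```

Other I/O failures, such as a missing file, are deliberately left to propagate, since there is nothing useful to copy in that case.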

It could be that the data relates to an "out of order" message, or to some boundary case.

