Spark checkpoint doesn't remember state (Java, HDFS)


I already looked at "Spark Streaming not remembering previous state", but it doesn't help. I also looked at http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing, but I can't find JavaStreamingContextFactory there, even though I'm using spark-streaming_2.11 version 2.0.1.

My code runs fine, but when I restart it, it doesn't remember the last checkpoint.

    Function0<JavaStreamingContext> scFunction = new Function0<JavaStreamingContext>() {
        @Override
        public JavaStreamingContext call() throws Exception {
            // Spark Streaming needs to checkpoint enough information to a fault-tolerant storage system
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
            // checkpointDir = "hdfs://user:pw@192.168.1.50:54310/spark/checkpoint";
            ssc.sparkContext().setCheckpointDir(checkpointDir);
            StorageLevel.MEMORY_AND_DISK();
            return ssc;
        }
    };

    JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, scFunction);
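For comparison, a sketch of the factory pattern as the Spark 2.x streaming guide describes it (assuming the same `conf`, `checkpointDir`, and `SPARK_DURATION` as above). Note that `getOrCreate` recovers a context only from checkpoint data written by calling `checkpoint(...)` on the JavaStreamingContext itself; `sparkContext().setCheckpointDir(...)` only sets the directory for plain RDD checkpoints. This fragment is not runnable on its own without a Spark project:

```java
// Sketch only: factory for JavaStreamingContext.getOrCreate (Spark 2.x).
Function0<JavaStreamingContext> factory = () -> {
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
    // Enable *streaming* checkpointing on the context itself; this is the
    // metadata that getOrCreate reads back on restart.
    ssc.checkpoint(checkpointDir);
    // The DStream setup must also happen here, inside the factory, so that
    // the recovered context contains the same computation.
    return ssc;
};

JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, factory);
```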

Currently I'm streaming data from Kafka and performing a transformation and an action:

    JavaPairDStream<Integer, Long> responseCodeCountDStream = logObject.transformToPair(MainApplication::responseCodeCount);
    JavaPairDStream<Integer, Long> cumulativeResponseCodeCountDStream = responseCodeCountDStream.updateStateByKey(COMPUTE_RUNNING_SUM);
    cumulativeResponseCodeCountDStream.foreachRDD(rdd -> {
        rdd.checkpoint();
        log.warn("Response code counts: " + rdd.take(100));
    });
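`COMPUTE_RUNNING_SUM` isn't shown above; with `updateStateByKey` it is typically a function that adds a batch's new values to the previous running total per key. A minimal, hypothetical reconstruction of that logic in plain Java (using `java.util.Optional` where Spark would use its own `org.apache.spark.api.java.Optional` inside a `Function2`):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RunningSumSketch {
    // Core of a typical updateStateByKey update function:
    // new state = previous state (or 0) + sum of the batch's new values.
    static Optional<Long> computeRunningSum(List<Long> newValues, Optional<Long> state) {
        long sum = state.orElse(0L);
        for (Long v : newValues) {
            sum += v;
        }
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        // First batch: no previous state, values 3 and 4.
        Optional<Long> s1 = computeRunningSum(Arrays.asList(3L, 4L), Optional.empty());
        // Second batch: previous state 7, new value 5.
        Optional<Long> s2 = computeRunningSum(Arrays.asList(5L), s1);
        System.out.println(s2.get()); // 12
    }
}
```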

Could you point me in the right direction if I'm missing something?

Also, I can see the checkpoint being saved in HDFS. Why won't it read it back?

