Spark checkpoint doesn't remember state (Java, HDFS)
I've already looked at "Spark Streaming not remembering previous state", which doesn't help. I also looked at http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing, but I can't find JavaStreamingContextFactory there, although I'm using spark-streaming_2.11 v2.0.1.

My code works fine when I restart it, but it won't resume from the last checkpoint.
```java
Function0<JavaStreamingContext> scFunction = new Function0<JavaStreamingContext>() {
    @Override
    public JavaStreamingContext call() throws Exception {
        // Spark Streaming needs enough checkpoint information in a
        // fault-tolerant storage system
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.milliseconds(SPARK_DURATION));
        // checkpointDir = "hdfs://user:pw@192.168.1.50:54310/spark/checkpoint";
        ssc.sparkContext().setCheckpointDir(checkpointDir);
        StorageLevel.MEMORY_AND_DISK();
        return ssc;
    }
};

JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, scFunction);
```
Currently the data is streamed from Kafka, and I'm performing transformations and actions on it.
```java
JavaPairDStream<Integer, Long> responseCodeCountDStream =
        logObject.transformToPair(MainApplication::responseCodeCount);
JavaPairDStream<Integer, Long> cumulativeResponseCodeCountDStream =
        responseCodeCountDStream.updateStateByKey(COMPUTE_RUNNING_SUM);
cumulativeResponseCodeCountDStream.foreachRDD(rdd -> {
    rdd.checkpoint();
    log.warn("Response code counts: " + rdd.take(100));
});
```
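For context, `COMPUTE_RUNNING_SUM` is my running-sum state-update function (new batch values added to the previous total per key, as `updateStateByKey` expects). A minimal plain-Java sketch of that update logic, without Spark, looks like this; the class name and sample values are just for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RunningSum {

    // Mimics the update function passed to updateStateByKey:
    // takes the new values seen for a key in this batch plus the
    // previous running total, and returns the new running total.
    static Optional<Long> computeRunningSum(List<Long> newValues, Optional<Long> state) {
        long sum = state.orElse(0L);
        for (Long v : newValues) {
            sum += v;
        }
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        Optional<Long> state = Optional.empty();
        state = computeRunningSum(Arrays.asList(2L, 3L), state); // first batch
        state = computeRunningSum(Arrays.asList(5L), state);     // second batch
        System.out.println(state.get()); // prints 10
    }
}
```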
Could you point me in the right direction if I'm missing something?

Also, I can see the checkpoint being saved in HDFS, so why won't it read it back?