How to implement a trait with a generic case class that creates a Dataset in Scala


I want to create a Scala trait that should be implemented with a case class T. The trait should load the data and transform it into a Spark Dataset of type T. I got the error that no encoder can be stored, and I think that's because Scala doesn't know that T should be a case class. How can I tell the compiler that? I've seen somewhere that I should mention Product, but there is no such class defined... Feel free to suggest other ways to solve this!

I have the following code, which does not compile, giving the error:

    42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._
    [INFO]       .as[T]

I'm using Spark 1.6.1.

Code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}

/**
  * A trait that moves data on Hadoop with Spark, based on the location and the granularity of the data.
  */
trait Agent[T] {

  /**
    * Load a DataFrame from the location and convert it to a Dataset.
    * @return a Dataset[T]
    */
  protected def load(): Dataset[T] = {
    // Read in the data
    SparkContextKeeper.sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", header)           // Use the first line of the files as the header
      .option("inferSchema", "true")      // Automatically infer data types
      .option("delimiter", "|")           // Deloitte expects a pipe delimiter
      .option("dateFormat", "yyyy-MM-dd") // Deloitte expects this kind of date format
      .load("/iacc/eandis/landing/raw/" + location + "/2016/10/01/")
      .as[T]
  }
}

Your code is missing 3 things:

  • Indeed, you must let the compiler know that T is a subclass of Product (the superclass of all Scala case classes and tuples); see the sketch after this list
  • The compiler also requires a TypeTag and a ClassTag of the actual case class, which Spark uses implicitly to overcome type erasure
  • An import of sqlContext.implicits._
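To see the first two requirements concretely: Spark 1.6 also exposes these encoders non-implicitly through Encoders.product, which demands exactly the bounds listed above. A minimal sketch (the encoderFor helper is purely illustrative, not part of the fix below):

import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.{Encoder, Encoders}

// Encoders.product requires both the Product upper bound and a TypeTag,
// which is precisely what a bare, unconstrained T in the trait cannot provide
def encoderFor[T <: Product : TypeTag]: Encoder[T] = Encoders.product[T]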

Unfortunately, you can't add type parameters with context bounds to a trait, so the simplest workaround is to use an abstract class instead:

import scala.reflect.runtime.universe.TypeTag
import scala.reflect.ClassTag

abstract class Agent[T <: Product : ClassTag : TypeTag] {
  protected def load(): Dataset[T] = {
    val sqlContext: SQLContext = SparkContextKeeper.sqlContext
    import sqlContext.implicits._
    sqlContext.read. // same...
  }
}
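To make this concrete, a subclass then fixes T to an actual case class. A minimal sketch, assuming a hypothetical Reading case class whose name and fields are invented for illustration:

// Hypothetical row type matching the pipe-delimited CSV (names are made up)
case class Reading(meterId: String, date: String, value: Double)

// Because Reading is a case class (hence a Product) and is fully known at
// compile time, the ClassTag/TypeTag context bounds are satisfied and Spark
// can derive the Encoder[Reading] that .as[Reading] needs
class ReadingAgent extends Agent[Reading] {
  // load() is protected, so use or expose it from inside the subclass
  def readings: Dataset[Reading] = load()
}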

Obviously, this isn't equivalent to using a trait, and it might suggest that this design isn't the best fit for the job. An alternative is placing load in an object and moving the type parameter onto the method:

object Agent {
  protected def load[T <: Product : ClassTag : TypeTag](): Dataset[T] = {
    // same...
  }
}
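A sketch of the call site for this variant, reusing the hypothetical Reading case class from above (note that load would have to be made public for this, since a protected member of a top-level object isn't reachable from outside it):

// Each call site now picks the concrete case class via the type parameter
val readings: Dataset[Reading] = Agent.load[Reading]()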

Which one is preferable depends mostly on where and how you're going to call load and what you're planning to do with the result.

