How to implement a trait with a generic case class that creates a dataset in Scala
I want to create a Scala trait that should be implemented with a case class T. The trait simply loads data and transforms it into a Spark Dataset of type T. I got the error that no encoder can be stored, which I think is because Scala does not know that T should be a case class. How can I tell the compiler that? I've seen somewhere that I should mention Product, but there is no such class defined. Feel free to suggest other ways to do this!
I have the following code, but it does not compile and fails with this error:
    42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._
    [INFO] .as[T]
I'm using Spark 1.6.1.
Code:
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Dataset, SQLContext}

    /**
      * A trait that moves data on Hadoop with Spark based on location and granularity of data.
      */
    trait Agent[T] {
      /**
        * Load a DataFrame from the location and convert it into a Dataset
        * @return Dataset[T]
        */
      protected def load(): Dataset[T] = {
        // Read in the data
        SparkContextKeeper.sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", header)            // Use the first line of all files as header
          .option("inferSchema", "true")       // Automatically infer data types
          .option("delimiter", "|")            // Deloitte always expects a pipe as delimiter
          .option("dateFormat", "yyyy-MM-dd")  // Deloitte always expects this kind of date format
          .load("/iacc/eandis/landing/raw/" + location + "/2016/10/01/")
          .as[T]
      }
    }
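Your code is missing 3 things:
- Indeed, you must let the compiler know that T is a subclass of Product (the superclass of all Scala case classes and tuples).
- The compiler also requires the TypeTag and ClassTag of the actual case class. Spark uses these implicitly to overcome type erasure.
- The import of sqlContext.implicits._
Unfortunately, you can't add type parameters with context bounds to a trait: a context bound is shorthand for an implicit constructor parameter, and traits cannot take constructor parameters in the Scala 2 versions that Spark 1.6 targets. The simplest workaround is to use an abstract class instead: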
    import scala.reflect.runtime.universe.TypeTag
    import scala.reflect.ClassTag

    abstract class Agent[T <: Product : ClassTag : TypeTag] {
      protected def load(): Dataset[T] = {
        val sqlContext: SQLContext = SparkContextKeeper.sqlContext
        import sqlContext.implicits._
        sqlContext.read.// same...
      }
    }
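To illustrate how a concrete subclass supplies the two tags, here is a minimal sketch that leaves Spark out entirely. The names `TagAgent`, `Reading` and `ReadingAgent` are hypothetical and not part of the original code, and instead of loading data the class only inspects the tags it receives.

```scala
import scala.reflect.{ClassTag, classTag}
import scala.reflect.runtime.universe._

// Hypothetical stand-in for the real Agent: no Spark, only the type evidence.
// The context bounds are shorthand for an implicit constructor parameter list:
//   abstract class TagAgent[T <: Product](implicit ct: ClassTag[T], tt: TypeTag[T])
abstract class TagAgent[T <: Product : ClassTag : TypeTag] {
  // Runtime class recovered from the ClassTag, even though T itself is erased
  def recordClass: Class[_] = classTag[T].runtimeClass

  // Names of the case class fields, recovered from the TypeTag
  def fieldNames: Set[String] =
    typeOf[T].members.collect {
      case m: MethodSymbol if m.isCaseAccessor => m.name.toString
    }.toSet
}

// Hypothetical record type, used only for illustration
case class Reading(meterId: String, value: Double)

// Both tags are materialized by the compiler here, where T is a concrete case class
object ReadingAgent extends TagAgent[Reading]
```

The subclass passes nothing explicitly: extending `TagAgent[Reading]` is enough for the compiler to fill in the implicit constructor parameters, which is the same mechanism the abstract `Agent` relies on.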
Obviously, an abstract class isn't equivalent to a trait, and needing one might suggest that this design isn't the best fit for the job. Another alternative is placing load in an object and moving the type parameter to the method:
    object Agent {
      protected def load[T <: Product : ClassTag : TypeTag](): Dataset[T] = {
        // same...
      }
    }
Which one is preferable is mostly up to where and how you're going to call load, and what you're planning to do with the result.
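As a rough illustration of the call-site difference, the sketch below uses the object-based variant without Spark. `AgentSketch`, `recordClassOf`, `Reading`, `Outage` and `CallSites` are hypothetical names, and the method only returns the runtime class instead of loading a Dataset. Note that the sketch drops `protected`, since a protected member of an object cannot be called from unrelated code.

```scala
import scala.reflect.{ClassTag, classTag}
import scala.reflect.runtime.universe.TypeTag

// Hypothetical stand-in for `object Agent`: no Spark, only the type evidence.
object AgentSketch {
  // Same bounds as load[T] in the answer; the type parameter lives on the method
  def recordClassOf[T <: Product : ClassTag : TypeTag](): Class[_] =
    classTag[T].runtimeClass
}

// Hypothetical record types, used only for illustration
case class Reading(meterId: String, value: Double)
case class Outage(region: String, minutes: Int)

object CallSites {
  // One object serves several record types; T is chosen at each call
  val readingClass: Class[_] = AgentSketch.recordClassOf[Reading]()
  val outageClass: Class[_]  = AgentSketch.recordClassOf[Outage]()
}
```

With the abstract class, the record type is fixed once per subclass; with the object, every caller names the type explicitly, which may suit code that loads many different case classes through one shared entry point.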