scala - How to use NOT IN from a CSV file in Spark
I use Spark SQL to load data like this:

val customers = sqlContext.sql("SELECT * FROM customers")
But I have a separate txt file that contains a single column, cust_id, with about 50,000 rows, i.e.:

cust_id
1
2
3
I want a customers val that contains the customers that are in the customers table but not in the txt file. Using SQL I would do:

SELECT * FROM customers WHERE cust_id NOT IN ('1','2','3')

How can I do this using Spark?
I've read in the text file and can print its rows, but I'm not sure how to match this up to the SQL query:

scala> val custids = sc.textFile("cust_ids.txt")
scala> custids.take(4).foreach(println)
cust_id
1
2
3
You can import the text file as a DataFrame and do a left outer join, keeping only the rows where the joined column is null:
val customers = Seq(
  ("1", "aaa", "shipped"),
  ("2", "ada", "delivered"),
  ("3", "fga", "never received")
).toDF("id", "name", "status")

val custid = Seq(1, 2).toDF("custid")

customers.join(custid, 'id === 'custid, "leftouter")
  .where('custid.isNull)
  .drop("custid")
  .show()

+---+----+--------------+
| id|name|        status|
+---+----+--------------+
|  3| fga|never received|
+---+----+--------------+
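The example above uses a hard-coded Seq of ids. As a minimal sketch (not part of the original answer), assuming the spark-shell's sc and sqlContext, the toy customers DataFrame above with its "id" column, and a cust_ids.txt file whose first line is the cust_id header, the same pattern could be applied to the real file roughly like this:

import sqlContext.implicits._

// Read the id file, drop the header line, and turn it into a single-column DataFrame.
// The file name, header handling and column names are assumptions about the asker's data.
val raw = sc.textFile("cust_ids.txt")
val header = raw.first()                               // "cust_id"
val custIds = raw.filter(_ != header).toDF("cust_id")

// Same left outer join + null filter as above, but against the ids from the file.
val remaining = customers
  .join(custIds, customers("id") === custIds("cust_id"), "leftouter")
  .where(custIds("cust_id").isNull)
  .drop("cust_id")

remaining.show()

The null check works because a left outer join leaves cust_id null exactly for the customers rows that have no match in the file, which is the same set that SELECT * FROM customers WHERE cust_id NOT IN (...) would return (note that SQL NOT IN behaves differently if the id list contains nulls, so make sure the file has no empty lines).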