scala - How to use NOT IN from a CSV file in Spark


I load the data into a val with Spark SQL like this:

val customers = sqlContext.sql("select * from customers")

But I have a separate txt file that contains a single column, cust_id, with 50,000 rows, i.e.

cust_id
1
2
3

I want the customers val to contain only the customers from the customers table whose ids are not in the txt file.

Using SQL I would write something like select * from customers where cust_id not in ('1','2','3').

How can I do this with Spark?

I've read in the text file and can print its rows, but I'm not sure how to match it against the SQL query:

scala> val custids = sc.textFile("cust_ids.txt")
scala> custids.take(4).foreach(println)
cust_id
1
2
3
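One way to make that RDD joinable is to drop the header row and convert it to a single-column DataFrame. This is only a minimal sketch, assuming Spark 1.x with sqlContext.implicits._ in scope and a file laid out exactly like the sample above (a cust_id header followed by one id per line); the name custIdsDF is just illustrative:

import sqlContext.implicits._

// Read the ids, skip the "cust_id" header line, and wrap each id in a tuple
// so toDF can build a one-column DataFrame.
val custIdsDF = sc.textFile("cust_ids.txt")
  .filter(_.trim != "cust_id")
  .map(line => Tuple1(line.trim))
  .toDF("cust_id")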

You can import the text file into a DataFrame and do a left outer join:

val customers = Seq(("1", "aaa", "shipped"), ("2", "ada", "delivered"), ("3", "fga", "never received")).toDF("id", "name", "status")
val custid = Seq(1, 2).toDF("custid")

customers.join(custid, 'id === 'custid, "leftouter")
         .where('custid.isNull)
         .drop("custid")
         .show()

+---+----+--------------+
| id|name|        status|
+---+----+--------------+
|  3| fga|never received|
+---+----+--------------+
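If you are on Spark 2.0 or later (an assumption here), the same result can be expressed with a left anti join, which keeps only the left-side rows that have no match on the right, so the isNull filter and the drop are no longer needed. A sketch reusing the customers and custid DataFrames from above, with the same implicits in scope for the 'id symbol syntax:

// Left anti join: keep only customers whose id has no match in custid
// (the "left_anti" join type requires Spark 2.0+).
customers.join(custid, 'id === 'custid, "left_anti")
         .show()
// Same output as above: only customer 3 ("never received") remains.

Either variant also works with a DataFrame built from cust_ids.txt (as sketched earlier) in place of the hard-coded custid, so the exclusion list comes straight from the file.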
