scala - How to use NOT IN from a CSV file in Spark


I load the data into a val with Spark SQL like this:

val customers = sqlContext.sql("select * from customers")

But I have a separate txt file that contains a single column, cust_id, with 50,000 rows, i.e.

cust_id
1
2
3

I want the customers val to contain only the customers from the customers table whose ids are not in the txt file.

Using SQL I would write something like select * from customers where cust_id not in ('1','2','3').

How can I do this with Spark?

I've read in the text file and can print its rows, but I'm not sure how to match it against the SQL query:

scala> val custids = sc.textFile("cust_ids.txt")
scala> custids.take(4).foreach(println)
cust_id
1
2
3
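One way to make that RDD joinable is to drop the header row and convert it to a single-column DataFrame. This is only a minimal sketch, assuming Spark 1.x with sqlContext.implicits._ in scope and a file laid out exactly like the sample above (a cust_id header followed by one id per line); the name custIdsDF is just illustrative:

import sqlContext.implicits._

// Read the ids, skip the "cust_id" header line, and wrap each id in a tuple
// so toDF can build a one-column DataFrame.
val custIdsDF = sc.textFile("cust_ids.txt")
  .filter(_.trim != "cust_id")
  .map(line => Tuple1(line.trim))
  .toDF("cust_id")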

You can import the text file into a DataFrame and do a left outer join:

val customers = Seq(("1", "aaa", "shipped"), ("2", "ada", "delivered"), ("3", "fga", "never received")).toDF("id", "name", "status")
val custid = Seq(1, 2).toDF("custid")

customers.join(custid, 'id === 'custid, "leftouter")
         .where('custid.isNull)
         .drop("custid")
         .show()

+---+----+--------------+
| id|name|        status|
+---+----+--------------+
|  3| fga|never received|
+---+----+--------------+
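If you are on Spark 2.0 or later (an assumption here), the same result can be expressed with a left anti join, which keeps only the left-side rows that have no match on the right, so the isNull filter and the drop are no longer needed. A sketch reusing the customers and custid DataFrames from above, with the same implicits in scope for the 'id symbol syntax:

// Left anti join: keep only customers whose id has no match in custid
// (the "left_anti" join type requires Spark 2.0+).
customers.join(custid, 'id === 'custid, "left_anti")
         .show()
// Same output as above: only customer 3 ("never received") remains.

Either variant also works with a DataFrame built from cust_ids.txt (as sketched earlier) in place of the hard-coded custid, so the exclusion list comes straight from the file.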
