Get filename without path in Spark DataFrame SQL


I have a Spark DataFrame that holds data parsed from a folder of XML files through spark-xml. I want to add a column containing the source file, which is done through the input_file_name() function.

The problem is that it returns the whole path, and I want only the filename. I tried registering a UDF in Spark SQL that extracts the filename, but I get empty columns in the end. The function itself works, but apparently it gets empty values as input, and I don't understand why.

Does anyone know this issue and how to solve it?

Edit: example

If I select the filename column through df.selectExpr('input_file_name()'), I get the path and filename. If I define a function that just returns its input:

    def f(path):
        return path

and register it through session.udf.register('f', f), and then select the column again through df.selectExpr('f(input_file_name())'), I get an empty column.

We can register a UDF that returns the part of the string after the last "/", and apply this function to the output of input_file_name():

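    from pyspark.sql.functions import input_file_name

    # Register a UDF that keeps only the text after the last "/"
    spark.udf.register("filename", lambda x: x.rsplit('/', 1)[-1])
    # Apply it to the full path and name the resulting column "file"
    df.selectExpr('filename(input_file_name()) file')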
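The lambda above calls rsplit directly on its argument, so it would fail if it ever received a null (None) value. As a minimal sketch, the same extraction can be written as a named, null-safe function. The name file_name_from_path and the sample path are hypothetical and used only for illustration; they are not part of the original answer.

```python
from typing import Optional


def file_name_from_path(path: Optional[str]) -> Optional[str]:
    """Return the part of `path` after the last '/', or None for a null input."""
    if path is None:
        # Pass nulls through instead of raising AttributeError on None.rsplit
        return None
    # rsplit with maxsplit=1 splits once, at the last '/'; [-1] is the final piece
    return path.rsplit('/', 1)[-1]


# Hypothetical sample path, for illustration only
sample = "hdfs://host/data/xml/a.xml"
print(file_name_from_path(sample))
```

Assuming the same spark session as above, this function could be registered in place of the lambda with spark.udf.register("filename", file_name_from_path) and used in the same selectExpr call.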
