Get filename without path in Spark DataFrame SQL
I have a Spark DataFrame holding data parsed from a folder of XML files through spark-xml. I want to add a column containing the source file, which is done through the input_file_name() function.
The problem is that it returns the whole path, and I only want the filename. I tried registering a UDF in Spark SQL that extracts the filename, but I end up with empty columns. The function itself works; apparently it gets empty values as input, and I don't understand why.
Does anyone know this issue and how to solve it?
Edit: example
If I select the filename column through df.selectExpr('input_file_name()'), I get the path and filename. If I define a function that just returns its input:
def f(path): return path
and register it through session.udf.register('f', f), and then select the column again through df.selectExpr('f(input_file_name())'), I get an empty column.
We can register a UDF that returns the part of the string after the last "/", and then apply this function to the output of input_file_name():
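from pyspark.sql.functions import input_file_name

# register a UDF that keeps only the text after the last "/"
spark.udf.register("filename", lambda x: x.rsplit('/', 1)[-1])
# apply it to the full path and name the resulting column "file"
df.selectExpr('filename(input_file_name()) file')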
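The lambda above assumes it always receives a non-empty string. Since the question mentions empty values reaching the UDF, here is a minimal sketch of a more defensive version of the same rsplit logic, written as plain Python so it can be checked without a Spark session. The name file_name_from_path and the sample paths are made up for illustration, and returning None for missing input is an assumption about the desired behaviour.

```python
from typing import Optional


def file_name_from_path(path: Optional[str]) -> Optional[str]:
    """Return the part of `path` after the last '/', or None for missing input."""
    if not path:
        # None or empty string: there is nothing to extract
        return None
    # rsplit with maxsplit=1 splits only at the last '/'; [-1] is what follows it
    return path.rsplit("/", 1)[-1]


# Hypothetical sample paths, only for illustration
print(file_name_from_path("file:///data/xml/part-0001.xml"))  # part-0001.xml
print(file_name_from_path("part-0001.xml"))                   # part-0001.xml
print(file_name_from_path(""))                                # None
```

A function like this could be passed to spark.udf.register in place of the lambda, in the same way as shown above.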