bigdata - Flume takes a long time to upload a file to HDFS
I need assistance with figuring out why Flume takes so long to upload flat files to HDFS. I tried uploading one file (10 MB in size); however, 17 hours have passed and it is still uploading as a ".tmp" file. When I checked the log details, it seems to be stuck in the channel:
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-1
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-2
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-3
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-4
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-5
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-6
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-7
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-8
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.EventQueueBackingStoreFile Checkpoint backup completed.
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-9
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-10
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-11
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-12
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-13
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-14
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-15
Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-16
Here is my configuration:

agent.sources = source1
agent.channels = channel1
agent.sinks = sinks1

agent.sources.source1.type = spooldir
agent.sources.source1.spoolDir = /data1/forupload
agent.sources.source1.channels = channel1
agent.sources.source1.basenameHeader = true

agent.channels.channel1.type = file
agent.channels.channel1.capacity = 1000000
agent.channels.channel1.transactionCapacity = 10000
agent.channels.channel1.checkpointDir = /data5/checkpoint
agent.channels.channel1.dataDirs = /data5/flumedata
agent.channels.channel1.useDualCheckpoints = true
agent.channels.channel1.backupCheckpointDir = /data5/backupcheckpoint
agent.channels.channel1.maxFileSize = 900000000

agent.sinks.sinks1.type = hdfs
agent.sinks.sinks1.hdfs.path = /user/flume
agent.sinks.sinks1.hdfs.fileType = DataStream
agent.sinks.sinks1.channel = channel1
agent.sinks.sinks1.hdfs.filePrefix = %{basename}
agent.sinks.sinks1.hdfs.fileSuffix = .csv
agent.sinks.sinks1.hdfs.rollInterval = 0
agent.sinks.sinks1.hdfs.rollSize = 0
agent.sinks.sinks1.hdfs.rollCount = 0
I'd appreciate any help with this.
I think the data has already been sent. You can check whether the file you wanted to send has been renamed to file.name.COMPLETED. If it has been renamed, the file should have been sent.
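For example, assuming the spooling directory from the configuration above, a quick check (".COMPLETED" is the spooldir source's default suffix for consumed files):

```shell
# Files the spooling directory source has fully consumed are renamed
# with a ".COMPLETED" suffix; any match here means the file was read.
ls -l /data1/forupload/*.COMPLETED
```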
However, there might be data still sitting in the file channel, since data is transmitted in batches. If the amount of data left over is less than the batch size, it stays in the channel.
To finish sending it, you can use kill -SIGTERM flume_process_id to stop the process. When Flume receives the signal, it flushes the remaining data to HDFS, and the file on HDFS is renamed, i.e. the .tmp suffix is removed.
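A minimal sketch of that shutdown step; the process-lookup pattern is an assumption (it matches Flume's standard agent main class, so adjust it to however your agent was started):

```shell
# Find the Flume agent's PID (assumes it runs the standard
# org.apache.flume.node.Application main class).
flume_pid=$(pgrep -f org.apache.flume.node.Application)

# Graceful stop: Flume's shutdown hook flushes the channel and closes
# the HDFS file, which drops the ".tmp" suffix.
kill -SIGTERM "$flume_pid"

# Verify: the file under /user/flume should no longer end in ".tmp".
hdfs dfs -ls /user/flume
```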