big data - Flume takes a long time to upload a file to HDFS


I need assistance in checking why Flume takes so long to upload flat files to HDFS. I tried uploading a single file (10 MB in size), but 17 hours have passed and it is still uploading as a ".tmp" file. When I checked the log details, it seems to be stuck in the channel:

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-1

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-2

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-3

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-4

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-5

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-6

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-7

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-8

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.EventQueueBackingStoreFile Checkpoint backup completed.

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-9

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-10

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-11

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-12

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-13

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-14

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-15

Nov 10, x:xx:xx.xxx PM INFO org.apache.flume.channel.file.LogFile Closing RandomReader /data5/flumedata/log-16

Here is my configuration:

agent.sources = source1
agent.channels = channel1
agent.sinks = sinks1

agent.sources.source1.type = spooldir
agent.sources.source1.spoolDir = /data1/forupload
agent.sources.source1.channels = channel1
agent.sources.source1.basenameHeader = true

agent.channels.channel1.type = file
agent.channels.channel1.capacity = 1000000
agent.channels.channel1.transactionCapacity = 10000
agent.channels.channel1.checkpointDir = /data5/checkpoint
agent.channels.channel1.dataDirs = /data5/flumedata
agent.channels.channel1.useDualCheckpoints = true
agent.channels.channel1.backupCheckpointDir = /data5/backupcheckpoint
agent.channels.channel1.maxFileSize = 900000000

agent.sinks.sinks1.type = hdfs
agent.sinks.sinks1.hdfs.path = /user/flume
agent.sinks.sinks1.hdfs.fileType = DataStream
agent.sinks.sinks1.channel = channel1
agent.sinks.sinks1.hdfs.filePrefix = %{basename}
agent.sinks.sinks1.hdfs.fileSuffix = .csv
agent.sinks.sinks1.hdfs.rollInterval = 0
agent.sinks.sinks1.hdfs.rollSize = 0
agent.sinks.sinks1.hdfs.rollCount = 0

I'd appreciate any help with this.

I think the data has already been sent. You can check whether the file you wanted to send has been renamed to file.name.COMPLETED. If it has been renamed, the file should have been sent.

However, there might still be data sitting in the file channel, since data is transmitted in batches. If the amount of data left is smaller than the batch size, it remains in the channel.
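The batch-remainder effect can be sketched as follows (the batch size and event count here are illustrative, not values taken from the question):

```python
# Sketch: a sink drains the channel in full batches and leaves any
# remainder behind until more events arrive or the agent is flushed.
batch_size = 10000          # e.g. the sink's batch size / transactionCapacity
events_in_channel = 25000   # hypothetical number of queued events

full_batches, remainder = divmod(events_in_channel, batch_size)
print(full_batches)  # batches the sink can take immediately -> 2
print(remainder)     # events stranded in the channel -> 5000
```

So with these illustrative numbers, 5000 events would sit in the channel until the next flush.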

To force the remaining data to be sent, you can use kill -SIGTERM flume_process_id to stop the process. When Flume receives the signal, it flushes the data left to HDFS, and the file on HDFS is renamed, i.e. the .tmp suffix is removed.
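Alternatively, if the goal is just to get the open .tmp file closed and renamed without stopping the agent, the HDFS sink supports hdfs.idleTimeout (seconds with no new events before Flume closes the open file; it is disabled by default). A sketch against the sink names used above, with an illustrative timeout:

agent.sinks.sinks1.hdfs.idleTimeout = 60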

