HDFS cannot obtain block length - hdfs

When I try to "hdfs dfs -cat myFile" from the command line I get the exception "Cannot obtain block length for LocatedBlock", and there are many files affected by this issue.
Probably my HDFS cluster was in a bad state.
Is there any solution to this problem?

This means the file is still open for writing (it was never closed), probably because the data producer lost its connection to the datanodes. The file needs to be closed, which you can force with lease recovery:
hdfs debug recoverLease -path <path-of-the-file> [-retries <retry-times>]
Given a block id, you can find the file it belongs to by executing:
hdfs fsck -blockId blk_523076021
Then you can run recoverLease on that path.
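Put together, a minimal recovery workflow (the block id and file path below are placeholders for your own values) looks like:

```shell
# 1. find which file an open/corrupt block belongs to
hdfs fsck -blockId blk_523076021

# 2. force lease recovery so the file is closed and becomes readable
hdfs debug recoverLease -path /path/to/affected/file -retries 5
```

If many files are affected, you can repeat the recoverLease step for each path reported by the NameNode or fsck.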

Related

Why won't node zero execute some of the write statements to a log file

I have a production job where I use two nodes (0 = master and 1 = slave) via Open MPI, and all the threads on each node via OpenMP.
I submit the job on the master.
The job opens a disk file on the master to log some info. (I see the same file is opened on the slave as well during the run.)
I have statements like
write(lu,*) 'pid=',pid,' some text'
and
write(6, *) 'pid=',pid,' some text'
one after the other (unit 6 is stdout, the screen, in gfortran).
I see on screen that both statements are printed one after the other (pid=0 and pid=1).
Strangely enough, most (but not all) of the master's prints (pid=0) are absent from the log file.
This is puzzling; I would like to understand the rule. I thought both master and slave shared the log file.
I have a host file with two hosts, each requesting 32 threads (via the slots and max_slots settings), and I am running the following command from a script:
mpirun --hostfile hostfile --map-by node:PE=32 myexecutable
I would appreciate it if some expert could shed light on the issue.

Fluentd S3 output plugin not recognizing index

I am facing problems while using the S3 output plugin with Fluentd.
s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
Using %{index} at the end never resolves to _0, _1, etc. I always end up with log file names like
sflow_.log
while I need sflow_0.log
Can you paste your fluent.conf? It's hard to find the problem without the full config. File creation is mainly controlled by the time slice setting and the buffer configuration:
<buffer>
#type file or memory
path /fluentd/buffer/s3
timekey_wait 1m
timekey 1m
chunk_limit_size 64m
</buffer>
time_slice_format %Y%m%d%H%M
With the above, you create a file every minute; and within that minute, if your buffer limit is reached (or another flush is triggered for any other reason), a second file is created with index 1 under the same minute.
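For context, a sketch of a full S3 match section along those lines (the tag pattern, bucket name, and paths are illustrative placeholders, not taken from the asker's config) could look like:

```
<match sflow.**>
  @type s3
  s3_bucket my-log-bucket
  path sflow_
  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  <buffer time>
    @type file
    path /fluentd/buffer/s3
    timekey 1m
    timekey_wait 1m
    chunk_limit_size 64m
  </buffer>
</match>
```

Since %{index} only increments when more than one chunk is flushed within the same time slice, low traffic should mostly produce files ending in _0.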

Apache Flume taking more time than copyFromLocal command

I have a 24 GB folder in my local file system. My task is to move that folder to HDFS. I did it two ways.
1) hdfs dfs -copyFromLocal /home/data/ /home/
This took around 15mins to complete.
2) Using Flume.
Here is my agent
spool_dir.sources = src-1
spool_dir.channels = channel-1
spool_dir.sinks = sink_to_hdfs
# source
spool_dir.sources.src-1.type = spooldir
spool_dir.sources.src-1.channels = channel-1
spool_dir.sources.src-1.spoolDir = /home/data/
spool_dir.sources.src-1.fileHeader = false
# HDFS sinks
spool_dir.sinks.sink_to_hdfs.type = hdfs
spool_dir.sinks.sink_to_hdfs.hdfs.fileType = DataStream
spool_dir.sinks.sink_to_hdfs.hdfs.path = hdfs://192.168.1.71/home/user/flumepush
spool_dir.sinks.sink_to_hdfs.hdfs.filePrefix = customevent
spool_dir.sinks.sink_to_hdfs.hdfs.fileSuffix = .log
spool_dir.sinks.sink_to_hdfs.hdfs.batchSize = 1000
spool_dir.channels.channel-1.type = file
spool_dir.channels.channel-1.checkpointDir = /home/user/spool_dir_checkpoint
spool_dir.channels.channel-1.dataDirs = /home/user/spool_dir_data
spool_dir.sources.src-1.channels = channel-1
spool_dir.sinks.sink_to_hdfs.channel = channel-1
This step took almost an hour to push data to HDFS.
As far as I know, Flume is distributed, so shouldn't Flume load data faster than the copyFromLocal command?
If you're looking simply at read and write operations, Flume is going to be at least 2x slower with your configuration because you're using a file channel: every file read from disk is encapsulated into a Flume event (in memory) and then serialized back down to disk via the file channel. The sink then reads the event back from the file channel (disk) before pushing it up to HDFS.
You also haven't set a blob deserializer on your spoolDir source (so it's reading one line at a time from your source files, wrapping each line in a Flume event and then writing it to the file channel). Paired with the HDFS sink's default rollXXX values, you'll be getting a file in HDFS per 10 events / 30 s / 1 KB rather than one file per input file as you'd get with copyFromLocal.
All of these factors add up to give you slower performance. If you want more comparable performance, you should use the BlobDeserializer on the spoolDir source, coupled with a memory channel (but understand that a memory channel doesn't guarantee delivery of events if the JRE terminates prematurely).
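As a sketch of those two suggestions applied to the asker's agent (property names per the Flume User Guide; the capacity values are illustrative):

```
# read each spooled file as a single event instead of line-by-line
spool_dir.sources.src-1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
spool_dir.sources.src-1.deserializer.maxBlobLength = 100000000

# memory channel: avoids the double disk round-trip of a file channel,
# but events are lost if the JVM terminates prematurely
spool_dir.channels.channel-1.type = memory
spool_dir.channels.channel-1.capacity = 10000
spool_dir.channels.channel-1.transactionCapacity = 1000
```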
Apache Flume is not intended for moving or copying folders from local file system to HDFS. Flume is meant for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. (Reference: Flume User Guide)
If you want to move large files or directories, you should use hdfs dfs -copyFromLocal as you have already mentioned.

Flume HDFS Sink generates lots of tiny files on HDFS

I have a toy setup sending log4j messages to HDFS using Flume. I'm not able to configure the HDFS sink to avoid many small files. I thought I could configure the HDFS sink to create a new file every time the file size reaches 10 MB, but it is still creating files of around 1.5 KB.
Here is my current flume config:
a1.sources=o1
a1.sinks=i1
a1.channels=c1
#source configuration
a1.sources.o1.type=avro
a1.sources.o1.bind=0.0.0.0
a1.sources.o1.port=41414
#sink config
a1.sinks.i1.type=hdfs
a1.sinks.i1.hdfs.path=hdfs://localhost:8020/user/myName/flume/events
#never roll-based on time
a1.sinks.i1.hdfs.rollInterval=0
#10MB=10485760
a1.sinks.il.hdfs.rollSize=10485760
#never roll base on number of events
a1.sinks.il.hdfs.rollCount=0
#channel config
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sources.o1.channels=c1
a1.sinks.i1.channel=c1
It's a typo in your conf.
#sink config
a1.sinks.i1.type=hdfs
a1.sinks.i1.hdfs.path=hdfs://localhost:8020/user/myName/flume/events
#never roll-based on time
a1.sinks.i1.hdfs.rollInterval=0
#10MB=10485760
a1.sinks.il.hdfs.rollSize=10485760
#never roll base on number of events
a1.sinks.il.hdfs.rollCount=0
In the 'rollSize' and 'rollCount' lines, you typed il instead of i1.
Please try running with DEBUG logging; then you will find something like:
[SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.shouldRotate:465) - rolling: rollSize: 1024, bytes: 1024
Because of the il typo, the default rollSize value of 1024 is being used.
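For reference, the corrected lines would be:

```
a1.sinks.i1.hdfs.rollSize=10485760
a1.sinks.i1.hdfs.rollCount=0
```

With these applied, the sink should roll at 10 MB instead of at the 1024-byte default.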
HDFS Sink has a property hdfs.batchSize (default 100) which describes "number of events written to file before it is flushed to HDFS". I think that's your problem here.
Consider also checking all other properties: HDFS Sink .
This can possibly happen because of the memory channel and its capacity. I guess it's dumping data to HDFS as soon as its capacity becomes full. Did you try using a file channel instead of memory?
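If you do try a file channel, a minimal sketch for this agent (the checkpoint and data directories are placeholders) would be:

```
a1.channels.c1.type=file
a1.channels.c1.checkpointDir=/home/myName/flume/checkpoint
a1.channels.c1.dataDirs=/home/myName/flume/data
a1.sources.o1.channels=c1
a1.sinks.i1.channel=c1
```

Note that a file channel persists events across restarts, but it does not by itself change when the HDFS sink rolls files.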

How to handle FTE queued transfers

I have an FTE monitor with '*.txt' as the trigger condition: whenever a text file lands at the source, FTE transfers the file to the destination. But when 10 files land at the source at the same time, FTE triggers 10 transfer requests simultaneously, and all the transfers get queued and stuck.
Please suggest how to handle this scenario.
OK, I have just tested this case:
I want to transfer four *.xml files from a directory as soon as they appear in it. So I have the monitor set to *.xml and the transfer pattern set to *.xml (see the commands below).
Created with following commands:
fteCreateTransfer -sa AGENT1 -sm QM.FTE -da AGENT2 -dm QM.FTE -dd c:\\workspace\\FTE_tests\\OUT -de overwrite -sd delete -gt /var/IBM/WMQFTE/config/QM.FTE/FTE_TEST_TRANSFER.xml c:\\workspace\\FTE_tests\\IN\\*.xml
fteCreateMonitor -ma AGENT1 -mn FTE_TEST_TRANSFER -md c:\\workspace\\FTE_tests\\IN -mt /var/IBM/WMQFTE/config/TQM.FTE/FTE_TEST_TRANSFER.xml -tr match,*.xml
I got three different results depending on configuration changes:
1) Just as the commands are, with default agent.properties:
4 transfers appeared in the transfer log
all 4 transfers tried to transfer all four XML files
3 of them ended with partial success because the agent couldn't delete the source files
1 succeeded, transferring all files and deleting all source files
Well, with transfer type File to File, the final state is in fact OK: four files in the destination directory, because the previous files are overwritten. But with File to Queue I got 16 messages in the destination queue.
2) The fteCreateMonitor command modified with the parameter "-bs 100", default agent.properties:
there is only one transfer in the transfer log
this transfer ended with a partial-success result
this transfer tried to transfer 16 files (each XML four times)
the agent was not able to delete any file, so the source files remained in the source directory
So in sum I got the same total number of files transferred (16) as in the first result, and the source files were not even deleted.
3) Just as the commands are, with agent.properties modified with the parameter "monitorMaxResourcesInPoll=1":
there is only one transfer in the transfer log
this transfer ended with a success result
this transfer tried to transfer four files and succeeded
the agent was able to delete all source files
So I was able to get the expected result only with these settings. But I am still not sure about the appropriateness of setting monitorMaxResourcesInPoll to "1".
Therefore, for me the answer is: add
monitorMaxResourcesInPoll=1
to agent.properties. But this conflicts with other answers posted here, so I am a little bit confused now.
Tested on version 7.0.4.4.
Check the box that says "Batch together the file transfers when multiple trigger files are found in one poll interval" (screen three).
Make sure that you set the maxFilesForTransfer in the agent.properties file to a value that is large enough for you, but be careful as this will affect all transfers.
You can also set monitorMaxResourcesInPoll=1 in the agent.properties file. I don't recommend this for two reasons: 1) it will affect all monitors, and 2) depending on your volume and poll interval, it may make it impossible to ever catch up on all the files you have to transfer.
Set your "Batch together the file transfers..." to a value more than 10:
Max Batch Size = 100
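From the command line, the equivalent of that checkbox is the -bs (batch size) option on fteCreateMonitor, as tested in the answer above; the monitor name, directories, and limit value here are illustrative:

```shell
fteCreateMonitor -ma AGENT1 -mn TXT_MONITOR -md /source/dir \
  -mt transfer_def.xml -tr match,*.txt -bs 100
```

```
# agent.properties - applies to every transfer on this agent
maxFilesForTransfer=500
```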