Flink checkpoint on HDFS with HA NameNode config - hdfs

Flink's checkpoint path is an absolute HDFS path such as hdfs://address:port/path. When HDFS runs in HA mode, how should the Flink configuration be set so that it is not tied to a NameNode URL that can change on failover?

With Hadoop HA, using the nameservice ID (cluster ID) is recommended. The URI then looks like hdfs://nameservice_id/path/file; use this instead of hdfs://activeNamenodeHost/path. Hope this helps.
How to set this up is described here: NameNode HA when using hdfs:// URI.
More details here: https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
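For illustration, a minimal sketch of what this looks like in practice (the nameservice name ha-cluster, host names, and paths below are assumptions, not values from the question): the hdfs-site.xml visible to Flink defines the nameservice and its NameNodes, and the checkpoint URI then uses the nameservice instead of a single host.

    <!-- hdfs-site.xml on the machine(s) running Flink; all names are placeholders -->
    <property><name>dfs.nameservices</name><value>ha-cluster</value></property>
    <property><name>dfs.ha.namenodes.ha-cluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.ha-cluster.nn1</name><value>namenode1.example.com:8020</value></property>
    <property><name>dfs.namenode.rpc-address.ha-cluster.nn2</name><value>namenode2.example.com:8020</value></property>
    <property><name>dfs.client.failover.proxy.provider.ha-cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>

    # flink-conf.yaml -- point the checkpoint directory at the nameservice, not at a NameNode host
    state.checkpoints.dir: hdfs://ha-cluster/flink/checkpoints
    # Flink must see the Hadoop config above, e.g. via HADOOP_CONF_DIR or:
    env.hadoop.conf.dir: /etc/hadoop/conf

With that in place, the HDFS client resolves hdfs://ha-cluster/... to whichever NameNode is currently active, so a failover does not require any change to the Flink configuration.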

Related

Where to put an application property file for a Spark application running on AWS EMR

I am submitting a Spark application jar to EMR, and it uses a property file. I could put the file in S3 and, while creating the EMR cluster, download it and copy it to some location on the EMR nodes. If that is the best way, how can I do this at bootstrap time, while the cluster itself is being created?
In the EMR console, under "Edit software settings" you can add your own configuration inline or point to a JSON file stored in S3, and use it to pass configuration parameters to the EMR cluster at creation time. For more details, see the following links:
Amazon EMR Cluster Configurations
Configuring Applications
AWS CLI
Hope this helps.
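For illustration, a minimal sketch of the two pieces involved (the classification, property values, bucket names, and release label below are placeholders, not values from the question): a configurations JSON stored in S3, and a create-cluster call that references it plus a bootstrap action that copies the property file onto the nodes.

    [
      {
        "Classification": "spark-defaults",
        "Properties": {
          "spark.driver.extraJavaOptions": "-Dconfig.file=/home/hadoop/app.properties"
        }
      }
    ]

    # create the cluster, passing the JSON above and a bootstrap script that
    # copies the property file from S3 onto the nodes (all paths are placeholders)
    aws emr create-cluster --name my-cluster --release-label emr-5.30.0 \
      --applications Name=Spark \
      --configurations https://s3.amazonaws.com/my-bucket/emr-config.json \
      --bootstrap-actions Path=s3://my-bucket/copy-app-properties.sh \
      --instance-type m5.xlarge --instance-count 3 --use-default-roles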

How to create hdfs replication using Cloudera CM API

Hi, I am new to Hadoop and want to create HDFS replication using the Cloudera Manager (CM) API. How can I do that?
After a lot of research, I was able to put together a command that replicates from one HDFS location to another within the same cluster. With slight variations, it can do remote-cluster replication as well.
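As a rough illustration only (this is not the command from the answer above; the API version, endpoint, JSON field names, cluster/service names, and paths are assumptions to be checked against the Cloudera Manager API documentation for your version), an intra-cluster HDFS replication schedule can be created by POSTing to the replications resource of the HDFS service:

    # hypothetical sketch of a CM API call creating an HDFS replication schedule
    curl -X POST -u admin:admin -H "Content-Type: application/json" \
      "http://cm-host:7180/api/v19/clusters/Cluster1/services/HDFS-1/replications" \
      -d '{
            "items": [{
              "hdfsArguments": {
                "sourceService": { "clusterName": "Cluster1", "serviceName": "HDFS-1" },
                "sourcePath": "/user/source/dir",
                "destinationPath": "/user/target/dir",
                "mapreduceServiceName": "YARN-1"
              },
              "interval": 1,
              "intervalUnit": "DAY",
              "paused": true
            }]
          }'

For replication between clusters, the sourceService block would additionally refer to a configured peer Cloudera Manager instance.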

YARN or HDFS logs in Filebeat

If I want to ingest logs that are stored in HDFS into Filebeat, how can I do that? I can specify any directory on local drives, but I want Filebeat to pick up data from HDFS. Is there any way this can be done? Any help will be greatly appreciated.
You can mount HDFS onto the local file system; follow the instructions from this answer.
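For illustration, a minimal sketch assuming the HDFS NFS Gateway is enabled on the cluster (the gateway host, mount point, and log paths below are placeholders): mount HDFS over NFS, then point an ordinary Filebeat log input at the mounted path.

    # mount HDFS onto the local file system via the HDFS NFS Gateway
    sudo mkdir -p /mnt/hdfs
    sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync nfs-gateway-host:/ /mnt/hdfs

    # filebeat.yml -- the mounted HDFS directory is read like any local path
    filebeat.inputs:
      - type: log
        paths:
          - /mnt/hdfs/app-logs/*/*.log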

HDF to HDP data store

We have two clusters. One is an HDF cluster that includes NiFi, and the other is an HDP cluster that includes HDFS, Hive, and other components. We are reading data from files and want to place it in HDFS on the HDP cluster.
Can anybody point me to documentation or examples for this?
Thanks in advance
NiFi's PutHDFS processor will write data to HDFS. You configure it with your hdfs-site.xml and core-site.xml files.
Sometimes network, security, or application configurations make it difficult to securely write files from a remote NiFi to a Hadoop cluster. A common pattern is to use two NiFis - one NiFi collects, formats, and aggregates records before transmitting to a second NiFi inside the Hadoop cluster via NiFi site-to-site protocol. Because the second NiFi is inside the Hadoop cluster, it can make it easier to write files securely to HDFS.
PutHDFS features in a couple of the NiFi Example Dataflow Templates, which also demonstrate related activities like aggregating data, directory and file naming, and NiFi site-to-site communication.
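For illustration, the core PutHDFS settings look roughly like this (the config-file paths and target directory are placeholders; on the HDF cluster the two XML files are copied over from an HDP node):

    # PutHDFS processor properties (values are placeholders)
    Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
    Directory                      : /data/landing
    Conflict Resolution Strategy   : replace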

How to configure the PutHDFS processor in Apache NiFi so that I can transfer a file from a local machine to HDFS over the network?

I have data in a file on my local Windows machine, which has Apache NiFi running on it. I want to send this file to HDFS over the network using NiFi. How can I configure the PutHDFS processor in NiFi on the local machine so that it sends the data to HDFS over the network?
Thank you!
You need to copy core-site.xml and hdfs-site.xml from one of your Hadoop nodes to the machine where NiFi is running. Then configure PutHDFS so that the Hadoop Configuration Resources are "/path/to/core-site.xml,/path/to/hdfs-site.xml". That is all that is required from the NiFi perspective; those files contain all of the information it needs to connect to the Hadoop cluster.
You'll also need to ensure that the machine where NiFi is running has network access to all of the machines in your Hadoop cluster. You can look through those config files and find any hostnames and IP addresses and make sure they can be accessed from the machine where NiFi is running.
Using the GetFile processor, or the combination of ListFile/FetchFile, you can bring this file from your local disk into NiFi and pass it on to the PutHDFS processor. The PutHDFS processor relies on the associated core-site.xml and hdfs-site.xml files in its configuration.
Just add the Hadoop configuration files to the first field (Hadoop Configuration Resources), e.g. $HADOOP_HOME/conf/hadoop/hdfs-site.xml,$HADOOP_HOME/conf/hadoop/core-site.xml, set the HDFS directory where the ingested data should be stored in the Directory field, and leave everything else at its defaults.
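As a rough sketch of the whole flow (the Windows paths, config-file locations, and HDFS directory below are placeholders, not values from the question), a GetFile processor picks the file up from the local disk and passes it to PutHDFS:

    # GetFile processor -- reads the file from the local Windows disk
    Input Directory  : C:\data\incoming
    Keep Source File : false

    # PutHDFS processor -- writes the flow file into HDFS on the remote cluster
    Hadoop Configuration Resources : C:\hadoop-conf\core-site.xml,C:\hadoop-conf\hdfs-site.xml
    Directory                      : /landing/from-nifi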