GetHDFS to PutEmail Flow in Apache NiFi - hdfs

I am curious whether I will need any extra processors for a GetHDFS -> count -> PutEmail flow using Apache NiFi.
I will be reading a CSV file from an HDFS location and I want to email the contents of the directory using PutEmail.

If you want to put the contents of the FlowFile into the email message body, then you need to extract the contents into an attribute, which can be done with the ExtractText processor.
Otherwise, you don't need any other processor: you can simply send the FlowFile as an attachment and you're done.
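A minimal sketch of that configuration for the message-body case, assuming the CSV is small enough to hold in an attribute and using a hypothetical attribute name csv.body (ExtractText's Maximum Buffer Size and Maximum Capture Group Length defaults are small, so raise them for anything beyond a tiny file):
ExtractText, dynamic property:
    csv.body = (?s)(^.*$)
PutEmail:
    Message = ${csv.body}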

Related

Generating Single Flow file for loading it into S3

I have a NiFi flow which fetches data from RDS tables and loads it into S3 as flat files. Now I need to generate another file whose content is the name of the file that I am loading into the S3 bucket, and this needs to be a separate flow.
Example: if the RDS-extracted flat file name is RDS.txt, then the newly generated file should have rds.txt as its content, and I need to load this file into the same S3 bucket.
The problem I face is that I am using a GenerateFlowFile processor and adding the flat file name as custom text in the flowfile, but I cannot set up any upstream connection for GenerateFlowFile, so it keeps generating more files; if I use a MergeContent processor after GenerateFlowFile, I see duplicate values in the flowfile.
Can anyone help me out with this?
Easiest path to do this is to chain something after PutS3Object that will update the flowfile contents with what you want. It would be really simple to write with ExecuteScript. Something like this:
import org.apache.nifi.processor.io.OutputStreamCallback

def ff = session.get()
if (ff) {
    // overwrite the flowfile content with the value of its "filename" attribute
    def updated = session.write(ff, {
        it.write(ff.getAttribute("filename").bytes)
    } as OutputStreamCallback)
    // mark the flowfile so it can be routed differently on the next pass
    updated = session.putAttribute(updated, "is_updated", "true")
    session.transfer(updated, REL_SUCCESS)
}
Then you can put a RouteOnAttribute after PutS3Object and have it route either to a null route if it detects the is_updated attribute, or back to PutS3Object if it has not been updated.
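A sketch of the RouteOnAttribute rule, using a hypothetical route name already_updated (with the default Routing Strategy each dynamic property becomes a relationship, and anything that doesn't match goes to unmatched, which you would loop back to PutS3Object):
    already_updated = ${is_updated:equals('true')}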
I got a simple solution for this: I added a funnel before PutS3Object. The funnel's upstream receives two files, one with the extract and the other with the file name, and its downstream is connected to PutS3Object, so both files are loaded at the same time.

Regex pattern to pick the latest file in NiFi

I have a NiFi flow where I am getting all data from S3 and putting it in the destination folder. Now the requirement is that if there is any newer data, only the latest data should be transferred. I have data files in S3 like below:
20201130-011101493.parquet
20201129-011101493.parquet
And the regex I tried:
\d[0-9]{8}.parquet
The problem is that it is not picking the first file, which is the latest data, i.e. 30/11/2020.
How can I modify my regex so that it picks only the latest file, given that the job runs once per day? I also referred to this SO post but I guess I am not able to get my regex correct.
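One sketch of a filter for this, assuming the job runs once per day and the property holding the filter evaluates NiFi Expression Language (if it does not, the current date has to be substituted some other way):
    ${now():format('yyyyMMdd')}-\d{9}\.parquet
The \d{9} and the extension are taken from the sample names above, so only a file whose name starts with today's date will match.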

How to create an Apache NiFi template to get data from a URL?

I want to get the data from an FTP link and store it as a Hive table.
ashok,
Use the following processors to achieve your requirement:
GetFTP --> PutFile --> ReplaceText --> PutHiveQL
GetFTP --> gets a file from the FTP server using the hostname & port and username & password.
PutFile --> stores the file from FTP on a local drive.
ReplaceText --> replaces the flowfile content with your Hive query, which references the PutFile location so the downloaded file can be loaded into Hive.
PutHiveQL --> executes the Hive query present in the flowfile.
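As a sketch of the ReplaceText Replacement Value, assuming PutFile wrote to a directory HiveServer2 can read and my_ftp_table is a placeholder for your target table (${filename} is the flowfile's filename attribute):
    LOAD DATA LOCAL INPATH '/data/nifi/${filename}' INTO TABLE my_ftp_table;
PutHiveQL then runs this statement through its configured Hive connection pool.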
Please let me know if you have any queries

How to read a file in Apache Samza from the local file system or HDFS

Looking for an approach in Apache Samza to read a file from the local file system or HDFS,
and then apply filters, aggregation, where conditions, order by and group by to the batch of data.
Please provide some help.
You should create a system for each source of data you want to use. For example, to read from a file, you should create a system with the FileReaderSystemFactory -- for HDFS, create a system with the HdfsSystemFactory. Then, you can use the regular process callback or windowing to process your data.
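A minimal sketch of the system definitions in the job's properties file; the system names myfile and myhdfs are arbitrary, and the factory class names and stream-naming conventions should be checked against your Samza version:
    systems.myfile.samza.factory=org.apache.samza.system.filereader.FileReaderSystemFactory
    systems.myhdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory
    task.inputs=myfile.some-input-file
The filtering, aggregation and grouping would then happen inside the task's process() or window() callbacks.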
You can feed your Samza job using a standard Kafka producer. To make it easy for you, you can use Logstash: you need to create a Logstash config where you specify
an input (a local file or HDFS),
filters (optional), where you can do basic filtering, aggregation, etc., and
a Kafka output with the specific topic you want to feed.
I was using this approach to feed my Samza job from a local file.
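A rough sketch of such a Logstash config (paths, broker address and topic name are placeholders, and the Kafka output option names vary between plugin versions, e.g. older releases used broker_list instead of bootstrap_servers):
    input {
      file { path => "/var/data/input.log" }
    }
    filter {
      # optional filtering / aggregation
    }
    output {
      kafka {
        bootstrap_servers => "localhost:9092"
        topic_id => "samza-input"
      }
    }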
Another approach could be using Kafka Connect
http://docs.confluent.io/2.0.0/connect/
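For the Kafka Connect route, a standalone worker with the stock FileStreamSource example connector can push a file into the topic your Samza job consumes; the file and topic values below are placeholders:
    name=local-file-source
    connector.class=FileStreamSource
    tasks.max=1
    file=/var/data/input.log
    topic=samza-input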

How To Pass Different Parameters To a Web Service Every Time It Is Called With JMeter

I am testing a web service using JMeter; the web service has several methods which take some parameters. What I need to do is pass different parameters every time a user (thread) calls a web service method.
I know that I can do something like this if I write the SOAP messages in XML files and then give JMeter the path of the folder containing these XML files, but JMeter picks those files randomly, so there is a chance the same file is used twice or more. I want JMeter to use a different, unique SOAP message for every request.
Can anyone help me?
Use a CSV Data Set Config; prepare a CSV file containing your parameters.
Set the CSV config's Recycle on EOF to false.
Set the CSV config's Stop thread on EOF to true.
Set the CSV config's Sharing mode to All threads.
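A sketch of what that could look like, with hypothetical parameter names: a file soap_params.csv such as
    userId,amount
    1001,250
    1002,300
Leave Variable Names empty so the header row supplies them (or set it to userId,amount), and reference ${userId} and ${amount} inside the SOAP request body. With the settings above, each row is consumed once across all threads and the test stops when the data runs out.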