I am working on a NiFi project in which I want to get the next business day.
For example, if I input Jan-14-2022, the next business day I should get is Jan-18-2022, since the 15th and 16th fall on a weekend and the 17th is a holiday.
I currently have
:toDate('MMM-dd-yyyy'):toNumber():plus(86400000):format('MMM-dd-yyyy')
which adds 24 hours and returns the date one day later.
But how can I get the next business day?
This sounds like something that a NiFi processor could be configured to do, for example with a script or another option inside a processor.
However, this would be better done with code rather than in NiFi itself, as it is not something NiFi can do natively; as far as I am aware, NiFi has no built-in support for business-day calculations.
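As a rough illustration of the logic such a script would need (not NiFi-specific; the holiday list below is a made-up placeholder you would replace with your own calendar):

from datetime import date, timedelta

# Placeholder holiday calendar -- replace with your organisation's actual holidays.
HOLIDAYS = {date(2022, 1, 17)}

def next_business_day(d):
    """Return the first day after d that is neither a weekend day nor a holiday."""
    d += timedelta(days=1)
    while d.weekday() >= 5 or d in HOLIDAYS:   # weekday() 5/6 = Saturday/Sunday
        d += timedelta(days=1)
    return d

# Friday Jan-14-2022 skips the weekend and the Jan-17 holiday:
print(next_business_day(date(2022, 1, 14)).strftime('%b-%d-%Y'))   # Jan-18-2022

Inside an ExecuteScript processor you would read the incoming date attribute, apply a function like this, and write the result back to an attribute; the key point is that the holiday calendar has to be maintained somewhere, since Expression Language has no concept of it.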
Related
I need a scheduled query that runs only Monday to Friday, between 9:00 and 19:00.
The scheduled query is currently set to: every hour from 9:00 to 19:00
But how do I modify this for Monday-Friday?
every monday to friday from 9:00 to 19:00 - not working
every monday from 9:00 to 19:00 - working (so is a day-of-the-week range in general not supported?)
Thanks
UPDATE: The question at hand is much more complex than the Custom setting in BigQuery Scheduled Queries allows. For this purpose, @guillaume blaquiere has the best suggestion: use Cloud Scheduler to run a cron job. Tools like Crontab Guru can be helpful in creating a statement such as 00 9-19 * * 1-5.
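To sketch what that could look like (the project, dataset and query below are hypothetical): a Cloud Scheduler job with the schedule 00 9-19 * * 1-5 triggers a small Cloud Function that runs the query itself through the BigQuery client, instead of relying on a BigQuery scheduled query.

# Hypothetical Cloud Function entry point, invoked by a Cloud Scheduler job
# whose schedule is "00 9-19 * * 1-5" (hourly, 9:00-19:00, Monday-Friday).
from google.cloud import bigquery

QUERY = """
INSERT INTO `my_project.my_dataset.hourly_snapshot`   -- placeholder query
SELECT CURRENT_TIMESTAMP() AS run_at, COUNT(*) AS row_count
FROM `my_project.my_dataset.source_table`
"""

def run_hourly_query(request):
    client = bigquery.Client()
    client.query(QUERY).result()   # block until the query job finishes
    return "ok"

The cron fields read as: minute 00, hours 9-19, any day of month, any month, days of week 1-5 (Monday to Friday).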
For simpler Scheduled Queries, please review the following from the official documentation: Set up scheduled queries.
Specifically,
To specify a custom frequency, select Custom, then enter a Cron-like time specification in the Custom schedule field; for example every 3 hours.
There is excellent documentation in the Custom Interval tab here on the many options you have available in this field.
Thanks for the feedback. So like this one? But this is not working.
I log simple tabular data: ISO8601 date/time, number, number, text
Once a day a small lambda job downloads it as a CSV and runs a few simple stats on it and emails me.
Once every few days, I might need to visually glance at something that might have happened earlier that day.
Currently, I'm pushing the events as they happen to an SQS pipeline, then popping them off the queue in a batch lambda job once every 10 mins to a Google sheet. It's easy to just click a bookmark from any browser: there's the data, nice and easy and tabular, and it provides native csv export.
But I just hit the Google Sheets 500,000 cell limit, and now that everything is soon to run from one EC2 server, I might as well go all-AWS. And the thing which served me well as a sticking plaster for 3 years is now feeling horribly clunky.
I looked at CloudWatch, but it appears that it can't export to CSV without fiddling via jq (?)
I looked at https://aws.amazon.com/products/databases/ - DynamoDB is out as you can't glance at the output in chronological order. Perhaps SimpleDB is the solution? https://aws.amazon.com/simpledb/
But again, I don't need a DB. Just something that can cheaply and simply receive about 50 lines an hour, in an SQS-like manner, has no row limit, allows easy automated export to CSV via the AWS Node SDK, and will let me cast a human eye over a few rows from time to time, like a spreadsheet.
I keep coming back to CloudWatch because of CloudWatch Logs Insights. It's only the lack of CSV export with the SDK that's putting me off.
Am I looking in the right direction? Have I missed something obvious? Thanks.
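To make the SDK angle concrete, exporting Logs Insights results to CSV would look roughly like the sketch below (shown with Python/boto3 rather than the Node SDK; the log group and query are placeholders), which is the kind of fiddling I'd prefer to avoid:

import csv
import time
import boto3
from datetime import datetime, timedelta, timezone

logs = boto3.client("logs")
now = datetime.now(timezone.utc)

# Kick off a Logs Insights query over the last 24 hours (placeholder log group).
query_id = logs.start_query(
    logGroupName="/my-app/events",
    startTime=int((now - timedelta(days=1)).timestamp()),
    endTime=int(now.timestamp()),
    queryString="fields @timestamp, @message | sort @timestamp asc",
)["queryId"]

# Logs Insights runs asynchronously, so poll until the query finishes.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

# Each result row is a list of {"field": ..., "value": ...} pairs; flatten to CSV.
with open("events.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "message"])
    for row in result["results"]:
        cells = {c["field"]: c["value"] for c in row}
        writer.writerow([cells.get("@timestamp", ""), cells.get("@message", "")])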
As you all know, AWS Timestream was made generally available in the last week.
Since then, I have been trying to experiment with it and understanding how it models and stores the data.
I am facing an issue in ingesting records into Timestream.
I have certain records dated 23rd April 2020. On trying to insert these records into a Timestream table, I get the RecordRejected error.
According to this link, a record is rejected if it has the same dimensions and timestamp as an existing record, or if its timestamp is beyond the retention period of the table's memory store.
I have set the retention period of the memory store of my table to 12 months. According to the documentation, any record with a timestamp more than 12 months old would be rejected.
However, the above-mentioned record gets rejected despite having a timestamp within the last 12 months.
On investigating further, I have noticed that records with today's date (5th Oct 2020) get ingested successfully, but records dated 30 days earlier, i.e. 5th September 2020, do not. To confirm this, I have also tried inserting records dated 6th September and a few more dates between 5th September and today's date; all of these get inserted successfully.
Could somebody explain why I am not able to insert records with a timestamp within the retention period of the memory store? It only lets me insert records that are at most 30 days old.
I would also like to know if there is a way we could insert historical data directly into the magnetic store. The memory store retention period may not be sufficient for my use case and I may need to insert data that is 2 years old or more. I understand this is not a classic use case of timestream, but I am still curious to know.
I am stuck on this issue and would really appreciate some help.
Thank you in advance.
I had a very similar issue, and for me it turned out that I had to set the Memory Store Retention Period to 8766 hours - which is slightly MORE than one year. I have no clue why that is or why it works, but it let me import older data.
PS: I'm pretty sure it's a bug in Timestream.
PPS: I found the value by using the default set in the AWS console. No other value worked for me.
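For reference, a rough boto3 sketch of the two calls involved - bumping the memory store retention to 8766 hours and then writing an older record. Database, table, dimension and measure names here are placeholders:

import boto3
from datetime import datetime, timedelta, timezone

tsw = boto3.client("timestream-write")

# Set the memory store retention to 8766 hours (just over one year).
# RetentionProperties takes both values, so a placeholder magnetic store
# retention is passed alongside it.
tsw.update_table(
    DatabaseName="my_db",
    TableName="my_table",
    RetentionProperties={
        "MemoryStoreRetentionPeriodInHours": 8766,
        "MagneticStoreRetentionPeriodInDays": 2000,
    },
)

# Write a record dated roughly 6 months back; Time is epoch milliseconds as a string.
old_ts = datetime.now(timezone.utc) - timedelta(days=180)
tsw.write_records(
    DatabaseName="my_db",
    TableName="my_table",
    Records=[{
        "Dimensions": [{"Name": "device", "Value": "sensor-1"}],
        "MeasureName": "temperature",
        "MeasureValue": "21.5",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(old_ts.timestamp() * 1000)),
        "TimeUnit": "MILLISECONDS",
    }],
)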
Timestream loads data into the memory store only if the timestamp is within the timespan of its retention period. So if the retention period is 1 day, the timestamp can't be more than 1 day ago.
Related: AWS TimeStream: Records that are older than one day are rejected
I need to find the names of the mappings in an Informatica folder that have not been run in the last 6 months.
Is this possible?
Could anyone help me find this out?
Thanks
Pandia
The best way I can suggest is to look at the session logs. Typically a log would be named session_name.log.
You can filter the log files for the period you are looking for, or write a Unix script to find the latest run date of each mapping and then filter on that, roughly as sketched below.
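A rough sketch of such a script in Python (the log directory path is a placeholder, and this assumes the session logs are still on disk with their original modification times):

import glob
import os
import time

LOG_DIR = "/infa_shared/SessLogs"            # placeholder session log directory
SIX_MONTHS = 182 * 24 * 60 * 60              # roughly six months, in seconds
cutoff = time.time() - SIX_MONTHS

# Session logs are typically named after the session, e.g. s_m_load_orders.log
latest_run = {}
for path in glob.glob(os.path.join(LOG_DIR, "*.log*")):
    session = os.path.basename(path).split(".log")[0]
    mtime = os.path.getmtime(path)
    latest_run[session] = max(latest_run.get(session, 0.0), mtime)

# Sessions whose newest log is older than the cutoff have not run in ~6 months.
for session, mtime in sorted(latest_run.items()):
    if mtime < cutoff:
        print(session, time.strftime("%Y-%m-%d", time.localtime(mtime)))

Note that this only covers sessions that still have a log on disk; anything purged, or never run at all, will not show up.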
Query the OPB_TASK_INST_RUN table for all sessions which have run over the timespan that you care about.
My Stream Analytics job is getting data for the last 24 hours.
There is a lot of data to look at here, and whilst this worked for a while, it has now stopped generating output events.
This prevents data from being sent to Power BI.
I only want the last 24 hours of data to be shown in Power BI.
How can I do this?
I have tried reducing the time window, but I don't want to do that as a fix.
SELECT [TimeStamp], Broker, Price, COUNT(*)
INTO [powerbi2]
FROM [eventhubinput] TIMESTAMP BY [TimeStamp]
GROUP BY [TimeStamp], Broker, Price, TUMBLINGWINDOW(hh, 3)
The query looks correct. There are a couple things that could be happening here:
Your PowerBI account is being throttled (see here for limits on data ingress). If this is occurring, there should be warnings in your job's Activity Log. If this is the case, you may have to decrease the rate of your job's egress and/or upgrade your PowerBI account.
Your job is falling behind your Event hub due to the high rate of ingress. You can check this by looking at the Input Events Backlogged metric in the Portal. If this is the case, scaling your job may help.
If neither of these suggestions help, I'd recommend reaching out to Azure support so the team can take a closer look.