I'm practicing for the Data Engineer GCP certification exam and got the following question:
You have a Google Cloud Dataflow streaming pipeline running with a
Google Cloud Pub/Sub subscription as the source. You need to make an
update to the code that will make the new Cloud Dataflow pipeline
incompatible with the current version. You do not want to lose any
data when making this update.
What should you do?
Possible answers:
1. Update the current pipeline and use the drain flag.
2. Update the current pipeline and provide the transform mapping JSON object.
The correct answer according to the website is 1; my answer was 2. I'm not convinced my answer is incorrect, and these are my reasons:
Drain is a way to stop the pipeline and does not solve the incompatibility issues.
Mapping solves the incompatibility issue.
The only way I can see 1 being the correct answer is if you don't care about compatibility.
So which one is right?
I'm studying for the same exam, and this question hinges on two points:
1. Don't lose data ← Drain is perfect for this, because the pipeline processes all buffered data and stops receiving new messages. Unacknowledged Pub/Sub messages are normally retained for 7 days, so when you start the new job you will receive them all without losing any data.
2. Incompatible new code ← Mapping solves some incompatibilities, such as renaming a ParDo, but not a genuinely incompatible update. So draining and launching a new job with the new code is the only option.
So the answer is option 1.
I think the main point is that you cannot solve all the incompatibilities with the transform mapping. Mapping can be done for simple pipeline changes (for example, names), but it doesn't generalize well.
The recommended solution is to drain the pipeline running the legacy version: it stops taking any data from the reading components, finishes all work pending on the workers, and shuts down.
When you start a new pipeline, you don't have to worry about state compatibility, as workers are starting fresh.
However, the question is indeed ambiguous, and it should be more precise about the type of incompatibility, or state that it is asking in general. Arguably, you can always first try to update the job with a transform mapping; if Dataflow finds the new job to be incompatible, the update fails without affecting the running pipeline, and then your only choice is the drain option.
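For reference, draining can be triggered from the Cloud Console or from the gcloud CLI; roughly (JOB_ID and REGION are placeholders for your job):

```
gcloud dataflow jobs drain JOB_ID --region=REGION
```

Once the drain completes, you launch a new job from the updated code against the same Pub/Sub subscription, and it picks up the retained messages.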
I have never worked with Flyway, but all of a sudden I'm managing it. It appears outOfOrder is false by default and I am now working with a failed migration that is blocking continuous deployment to our QA environment. I merged V21__alter_sequence.sql after V22__add_column.sql was already merged. Migrations are all run by a Jenkins build process.
My question is, how do I fix this? My guesses are:
1. merge U21__alter_sequence.sql, then merge V23__alter_sequence.sql?
2. update flyway.conf, add the line flyway.outOfOrder=true, and maybe some magic happens on the next merge?
(see my ignorance on this topic!) Any help is greatly appreciated.
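On the second guess: enabling out-of-order migrations is the usual fix when a lower-versioned migration is merged after a higher one has already been applied. A sketch of the relevant flyway.conf line (this is the standard Flyway CLI setting name):

```
# Allow V21 to be applied even though V22 has already been applied
flyway.outOfOrder=true
```

Separately, if a migration actually failed mid-run, the failed entry in the schema history table usually has to be cleaned up before any new migration will run; Flyway's `repair` command exists for that.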
I'd like to optimize my notification system, so here is how it works now:
Every time some change occurs in the application, we enqueue a background job (Sidekiq) to compute some values and then notify users via email.
This approach worked very well for a while, but then we got a memory leak: the actions happen very frequently and we had about 30-50 workers per second, so I need to refactor this.
What I would like to do is, instead of running a worker immediately, store the work in an array and perform it a bit later.
But I'm afraid that will just cause the same problem, only delayed.
I'm looking forward to hearing other approaches and solutions as well.
Thanks in advance
So I found one very interesting solution:
I'm storing values directly in Redis as key-value pairs, where each value is the dataset I'll need later for the computation. Then I'm using a simple cron job that triggers a service responsible for reading the data from Redis and computing it. I optimized the Sidekiq workers to run only when the cron fires; everything works perfectly fine and is even much faster than before.
I'm still eager to hear if there is any other approach/solution.
Thanks
I have been looking for a solution for 2 weeks now. I have seen solutions like schtasks, timeouts, timer loops, xml-triggering, etc.
However, I can't get it working in my C++ application, which I compile with the MinGW GCC compiler. What I'm looking for is a C++ solution to schedule a task, e.g. starting notepad.exe. I also have to be able to adjust the task afterwards, for instance by changing the trigger time. (The biggest problem is that the application doing the scheduling will be run by non-administrators!)
Unfortunately, I haven't been able to get it to work. I have also tried this example, but it won't compile because I'm missing comdef.h, wincred.h and taskschd.h. Besides that, I also can't use the pragma statements with my GCC compiler...
So, if anyone has a solution for my problem, please speak :)
Is there some cron-like library that would let me schedule some function to be run at a certain time (15:30, for example, not x hours from now)? If there isn't such a library, how should this be implemented? Should I just set a callback to be called every second, check the time, and start the jobs scheduled for that time, or what?
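On the roll-your-own variant from the question: you don't have to wake up every second. You can compute the delay until the next occurrence of the target time and arm a single setTimeout, re-arming after each run. A minimal sketch (the helper names are mine):

```javascript
// Milliseconds from `now` until the next occurrence of hh:mm.
function msUntilNext(hour, minute, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, minute, 0, 0);
  if (next <= now) next.setDate(next.getDate() + 1); // already passed today → tomorrow
  return next - now;
}

// Run `fn` every day at hh:mm using a single re-armed timer.
function scheduleDaily(hour, minute, fn) {
  setTimeout(() => {
    fn();
    scheduleDaily(hour, minute, fn); // re-arm for the next day
  }, msUntilNext(hour, minute));
}
```

The re-arming step recomputes the delay each time, so timer drift stays bounded; note that plain Date math like this ignores DST edge cases, which dedicated scheduling libraries handle for you.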
node-cron does just what I described
node-schedule A cron-like and not-cron-like job scheduler for Node.
agenda is a lightweight job scheduling library for Node. This will help you.
later.js is a pretty good JavaScript "scheduler" library. Can run on Node.js or in a web browser.
I am using kue: https://github.com/learnboost/kue . It is pretty nice.
The official features and my comments:
delayed jobs.
If you want a job to run at a specific time, calculate the milliseconds between that time and now, and call job.delay(milliseconds). (The doc says minutes, which is wrong.) Don't forget to call jobs.promote() when you initialize jobs.
job event and progress pubsub.
I don't understand it.
rich integrated UI.
Very useful. You can check the job status (done, running, delayed) in integrated UI and don't need to write any code. And you can delete old records in UI.
infinite scrolling
Sometimes not working. Have to refresh.
UI progress indication
Good for the time-consuming jobs.
job specific logging
Because they are delayed jobs, you should log useful info in the job and check later through UI.
powered by Redis
Very useful. When you restart your node.js app, all job records are still there and the scheduled jobs will execute too!
optional retries
Nice.
full-text search capabilities
Good.
RESTful JSON API
Sound good, but I never use it.
Edit:
kue is not a cron-like library.
By default, kue does not support jobs that run repeatedly (e.g. every Sunday).
You can use timexe
It's simple to use, lightweight, has no dependencies, has an improved syntax over cron with a resolution in milliseconds, and works in the browser.
Install:
npm install timexe
Use:
var timexe = require('timexe');
var res = timexe("* * * 15 30", function(){ console.log("It's now 3:30 pm"); });
(I'm the author)
node-crontab allows you to edit system cron jobs from Node.js. Using this library will allow you to run programs even after your main process terminates. Disclaimer: I'm the developer.
I am the author of node-runnr. It has a very simple approach to creating jobs. It's also very easy and clear to declare times and intervals.
For example, to execute a job every 10 minutes and 20 seconds:
Runnr.addIntervalJob('10:20', function(){...}, 'myjob')
To do a job at 10am and 3pm daily,
Runnr.addDailyJob(['10:0:0', '15:0:0'], function(){...}, 'myjob')
It's that simple.
For further detail: https://github.com/Saquib764/node-runnr
All these answers, and no one has pointed to the most popular npm package: cron.
https://www.npmjs.com/package/cron
Both node-schedule and node-cron can be used to implement cron-based schedulers.
Note: to generate cron expressions, you can use this cron_maker.
This won't be suitable for everyone, but if your application is already set up to take commands via a socket, you can use netcat to issue commands via cron proper.
echo 'mycommand' | nc -U /tmp/myapp.sock
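Tied together with a crontab entry (reusing the socket path from the example above), running that command every day at 15:30 would look something like:

```
# m  h  dom mon dow  command
30 15 *   *   *      echo 'mycommand' | nc -U /tmp/myapp.sock
```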