OptaPlanner Project Job Scheduling: start a successor job before the predecessor job completes

I am learning to use OptaPlanner job scheduling to schedule jobs in my factory line. My particular query is this:
In the book-binding example provided on the OptaPlanner website, can a successor operation start before 100% of the predecessor job is complete? Specifically, the design activity's duration is 2 hours, but can its successor activities (making the cover and pages) start at 0.5 hours?
If this is possible, what are the changes required in the input file to achieve this objective?
Your help will be really appreciated!
Thanks
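For context, the project job scheduling example is usually described as deriving each job's start date from its predecessors' end dates, i.e. a strict finish-to-start constraint. A hypothetical Java sketch of relaxing that into a start-to-start constraint with a lag; overlapLag is an invented field, not part of the example or its input file format:

import java.util.List;

// Hypothetical sketch, NOT the stock example code: in the project job
// scheduling example, an allocation's start date is derived from its
// predecessors' END dates (finish-to-start). To let a successor start
// partway through its predecessor, the earliest start would instead be
// derived from the predecessors' START dates plus a lag.
public class Allocation {

    private List<Allocation> predecessors;
    private int overlapLag; // e.g. 1 = may start 1 time unit after a predecessor starts
    private Integer startDate;

    /** Earliest start under a start-to-start constraint with a lag. */
    public int calculateEarliestStartDate() {
        int earliest = 0;
        for (Allocation predecessor : predecessors) {
            // A finish-to-start rule would use the predecessor's end date here.
            earliest = Math.max(earliest, predecessor.getStartDate() + overlapLag);
        }
        return earliest;
    }

    public Integer getStartDate() { return startDate; }
    // other getters/setters omitted
}

If that reading of the example is right, allowing an overlap would be a change to the example's domain model and shadow-variable logic, not something the stock input file can express on its own.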

Related

How to read/write a value from/in the Context Object in CDK Step Functions

I am writing, in AWS CDK, a Step Function which runs two tasks in parallel. From one of the tasks, I would like to access a value of the other task that runs in parallel (for example, in task 1 I would like to know the start time of task 2, or maybe the ID of task 2).
Here is a screenshot of the state machine definition in Step Functions.
In the example in the screenshot, I would like to use the ID of GlueStartRunJob (1) in GlueStartRunJob (2).
I was thinking about using the Context Object for that purpose. Nevertheless, I am not sure if this is the right approach...
The Context Object is read-only and allows a given state to access contextual information about itself, not about other states elsewhere in the workflow.
I'm not 100% clear what you are aiming to accomplish here, but I can see a couple of possible approaches.
First, you might just want to order these Glue Jobs to run sequentially so the output from the first can be used in the second.
Second, if you need the workflow to take action after the Glue Jobs have started but before they have completed, you'd need to take an approach that does not use the .sync integration pattern. With that integration pattern, Step Functions puts a synchronous facade over an asynchronous interaction, taking care of the steps to track completion and return you the results. You could instead use the default RequestResponse pattern to start the jobs in your parallel state, then do whatever you need to after. You'd then need to include your own polling logic if you wanted the workflow to wait for the jobs to complete and return data on them, or to take action on completion. You can see an example of such polling for Glue Crawlers in this blog post (for which you can find sample code here).
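For illustration, a minimal CDK sketch (Java, CDK v2) of that second approach: two Glue jobs started inside a Parallel state with the default RequestResponse pattern, so the workflow moves on as soon as the runs are started. The job names and construct IDs are placeholders:

import software.amazon.awscdk.services.stepfunctions.IntegrationPattern;
import software.amazon.awscdk.services.stepfunctions.Parallel;
import software.amazon.awscdk.services.stepfunctions.StateMachine;
import software.amazon.awscdk.services.stepfunctions.tasks.GlueStartJobRun;
import software.constructs.Construct;

// Minimal sketch: start two Glue jobs in parallel without waiting for them.
public class ParallelGlueJobs {

    public static StateMachine build(Construct scope) {
        // REQUEST_RESPONSE starts each job run and moves on immediately,
        // unlike the .sync pattern, which waits for completion.
        GlueStartJobRun job1 = GlueStartJobRun.Builder.create(scope, "StartGlueJob1")
                .glueJobName("my-glue-job-1")
                .integrationPattern(IntegrationPattern.REQUEST_RESPONSE)
                .build();

        GlueStartJobRun job2 = GlueStartJobRun.Builder.create(scope, "StartGlueJob2")
                .glueJobName("my-glue-job-2")
                .integrationPattern(IntegrationPattern.REQUEST_RESPONSE)
                .build();

        Parallel startBoth = new Parallel(scope, "StartBothJobs");
        startBoth.branch(job1);
        startBoth.branch(job2);

        // Your own polling/completion handling would follow here.
        return StateMachine.Builder.create(scope, "GlueWorkflow")
                .definition(startBoth)
                .build();
    }
}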

How to update an incompatible Dataflow pipeline without losing any data?

I'm practicing for the Data Engineer GCP certification exam and got the following question:
You have a Google Cloud Dataflow streaming pipeline running with a
Google Cloud Pub/Sub subscription as the source. You need to make an
update to the code that will make the new Cloud Dataflow pipeline
incompatible with the current version. You do not want to lose any
data when making this update.
What should you do?
Possible answers:
1. Update the current pipeline and use the drain flag.
2. Update the current pipeline and provide the transform mapping JSON object.
The correct answer according to the website is 1; my answer was 2. I'm not convinced my answer is incorrect, and these are my reasons:
Drain is a way to stop the pipeline and does not solve the incompatibility issues.
Mapping solves the incompatibility issue.
The only way that I see 1 as the correct answer is if you don't care about compatibility.
So which one is right?
I'm studying for the same exam, and the two core points of this question are:
1- Don't lose data ← drain is perfect for this, because it processes all buffered data and stops receiving new messages; unacknowledged Pub/Sub messages are normally retained for up to 7 days of retries, so when you start a new job you will receive all of them without losing any data.
2- Incompatible new code ← the mapping solves some incompatibilities, like renaming a ParDo, but not a version issue. So launching a new job with the new code is the only option.
So the answer is option 1.
I think the main point is that you cannot solve all the incompatibilities with the transform mapping. Mapping can be done for simple pipeline changes (for example, names), but it doesn't generalize well.
The recommended solution is to drain the pipeline running the legacy version: it will stop taking any data from the reading components, finish all the work pending on the workers, and shut down.
When you start a new pipeline, you don't have to worry about state compatibility, as workers are starting fresh.
However, the question is indeed ambiguous, and it should be more precise about the type of incompatibility, or say something like "in general". Arguably, you can always try to update the job with the mapping; if Dataflow finds the new job to be incompatible, it will not affect the running pipeline, and then your only choice would be the drain option.
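As a concrete illustration of why both options revolve around transform names: the mapping relates old transform names to new ones at update time, while an incompatible change forces a drain of the running job. A minimal Beam sketch (Java; the project, subscription, and names are placeholders):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;

// Minimal sketch: the transform names given to apply() are what
// --transformNameMapping refers to when updating a Dataflow job.
public class StreamingJob {
    public static void main(String[] args) {
        StreamingOptions options =
                PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
        options.setStreaming(true);

        Pipeline p = Pipeline.create(options);

        p.apply("ReadFromPubSub", PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-sub"));
        // ... further transforms and a sink would go here ...

        p.run();
    }

    // A compatible update could then be launched with flags like:
    //   --update --jobName=<running-job-name>
    //   --transformNameMapping={"ReadFromPubSub":"ReadEvents"}
    // An incompatible change would instead be drained first:
    //   gcloud dataflow jobs drain <JOB_ID> --region=<REGION>
}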

GCP Workflows: can we set up a step that waits for other steps to complete?

I am using GCP Workflows Beta to check whether I can build some of my workflows with it. The documentation mentions how we can conditionally execute steps with switch cases and use next for jumps. However, can we have flows where:
A step waits for two or more previous steps to complete
Multiple steps are triggered at the same time
As you can see, what I am implying is conditional parallel execution of steps. Is there a way to do this?
Also, I see some basic functions like len, string, etc. in the examples. Can you please point me to a list of all such available functions? I was looking for something to manipulate JSON.
You can't make a step wait, because you can't run multiple steps in parallel for now.
In my company we are also expecting, and waiting for, parallelization, and that's why I had a meeting with the PM yesterday to share these expectations with him. It's one of the top priorities on the roadmap, but I don't know when it will be released (something like Q1 or Q2 2021, I guess).
The functions look like Python code, but it's not really clear in the documentation. I will share this with the PM too.

Quartz wait for a set of jobs to finish

We have a Quartz job that does a lot of calculations and is taking a while to complete. In order to speed it up, we want to split the primary job into multiple smaller jobs that do the calculations and return the results. After all the small jobs complete, we need a final job that will pull the subtotals together.
Currently the idea is that each small job will write to a store, and when creating the final job we pass all the small job names to it via its JobDataMap. The final job will look for these jobs and reschedule itself if any are still found; otherwise it will run the totals.
Is there a better way to accomplish this in Quartz?
This isn't necessarily answering the question, but I'm afraid I don't think Quartz is the tool for the job here. It's a scheduler, not a mechanism for load balancing. You could look at using Quartz in combination with NServiceBus or MassTransit. The job could fire multiple messages for the small jobs, maybe even using the same message type and a Distributor, and use a Saga to pull everything back together.
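That said, if you do stay within Quartz, one event-driven alternative to the polling/reschedule idea in the question is a JobListener that triggers the final job once the last small job has completed. A minimal sketch (the job keys and names are hypothetical):

import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.JobKey;
import org.quartz.SchedulerException;
import org.quartz.listeners.JobListenerSupport;

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: triggers the final "totals" job once every small job
// has reported in.
public class FanInListener extends JobListenerSupport {

    private final Set<JobKey> pending = ConcurrentHashMap.newKeySet();
    private final JobKey totalsJobKey;

    public FanInListener(Set<JobKey> smallJobKeys, JobKey totalsJobKey) {
        this.pending.addAll(smallJobKeys);
        this.totalsJobKey = totalsJobKey;
    }

    @Override
    public String getName() {
        return "fan-in-listener";
    }

    @Override
    public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
        // Remove the finished job; when the set empties, all subtotals are in the store.
        pending.remove(context.getJobDetail().getKey());
        if (pending.isEmpty()) {
            try {
                context.getScheduler().triggerJob(totalsJobKey);
            } catch (SchedulerException e) {
                getLog().error("Could not trigger the totals job", e);
            }
        }
    }
}

It would be registered once, e.g. with scheduler.getListenerManager().addJobListener(listener, GroupMatcher.jobGroupEquals("calc")), so that it only watches the small jobs' group.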

How do you model a business workflow in ColdFusion?

Since there's no complete BPM framework/solution in ColdFusion as of yet, how would you model a workflow in a ColdFusion app so that it stays easily extensible and maintainable?
A business workflow is more than a flowchart that maps nicely onto a programming language. For example:
How do you model a task X that is followed by multiple tasks Y0, Y1, Y2 happening in parallel, where Y0 is a human process (it needs to wait for input), Y1 is a web service that might go wrong and might need automatic retries, and Y2 is an automated process, followed by a task Z that should only be carried out when all the Y's are completed?
My thoughts...
Seems like I need to do a whole lot of storing, managing, and keeping track of states, with frequent checking via cfschedule.
cfthread isn't going to help much, since some tasks can take days (e.g. waiting for a user's confirmation).
I can already imagine the flow being spread around multiple UDFs, the DB, and CFCs.
Is there any open-source workflow engine in another language that we could maybe port over to CF?
Thank you for your brain power. :)
Study the Java Process Definition Language (jPDL) specification; JBoss has an execution engine for it. Using this Java-based engine may be your easiest solution, and it solves many of the problems you've outlined.
If you intend to write your own, you will probably end up modelling states and transitions: vertices and edges in a directed graph. And these, as Ciaran Archer wrote, are the components of a state machine. The best persistence approach IMO is capturing versions of whatever data is being sent through the workflow via serialization, along with the current state and a history of transitions between states and of changes to that data. The mechanism probably also needs a way to keep track of who or what has responsibility for taking the next action in that workflow.
Based on your question, one thing to consider is whether or not you really need to represent parallel tasks in your solution. It might instead be possible to enqueue a set of messages and then specify a wait state for all of them to complete. Representing actual parallelism implies you are moving data simultaneously through several different processes; in that case, when they join again, you need an algorithm to resolve the deltas, which is very much a non-trivial task.
In the context of ColdFusion and what you're trying to accomplish, a scheduled task may be necessary if the system you're writing needs to poll other systems. Consider WDDX as a serialization format. JSON, while seductively simple, has (as I recall) some edge cases around numbers and dates that can cause you grief.
Finally see my answer to this question for some additional thoughts.
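To make the states-and-transitions idea concrete, here is a minimal sketch in Java (matching the jPDL suggestion above); the states and edges are invented examples, loosely following the X, Y0..Y2, Z scenario from the question:

import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of workflow states and transitions as a directed graph.
public class WorkflowGraph {

    enum State { X_DONE, Y0_WAITING_ON_HUMAN, Y1_CALLING_SERVICE, Y2_RUNNING, ALL_Y_DONE, Z_DONE }

    // Adjacency list: which states each state may legally transition to.
    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.X_DONE,
                EnumSet.of(State.Y0_WAITING_ON_HUMAN, State.Y1_CALLING_SERVICE, State.Y2_RUNNING));
        TRANSITIONS.put(State.Y0_WAITING_ON_HUMAN, EnumSet.of(State.ALL_Y_DONE));
        TRANSITIONS.put(State.Y1_CALLING_SERVICE,
                EnumSet.of(State.Y1_CALLING_SERVICE, State.ALL_Y_DONE)); // self-edge = retry
        TRANSITIONS.put(State.Y2_RUNNING, EnumSet.of(State.ALL_Y_DONE));
        TRANSITIONS.put(State.ALL_Y_DONE, EnumSet.of(State.Z_DONE));
        TRANSITIONS.put(State.Z_DONE, EnumSet.noneOf(State.class));
    }

    /** Validate a transition before persisting it to the history table. */
    public static boolean canTransition(State from, State to) {
        return TRANSITIONS.getOrDefault(from, EnumSet.noneOf(State.class)).contains(to);
    }
}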
Off the top of my head, I'm thinking of the State design pattern, with the state persisted to a database. Check out the Gumball Machine example in Head First Design Patterns.
Generally this will work if you have something (like a client, an order, etc.) going through a number of changes of state.
Different things will happen to your object depending on which state it is in, and that might mean sitting in a database table waiting for a flag to be updated manually by a user.
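For reference, a minimal Java sketch of that idea, in the spirit of the Gumball Machine example; the order states and events are invented, and the current state's class name is what would be persisted to the database:

// Each state decides what happens next; the object just delegates.
interface OrderState {
    OrderState next(Order order, String event);
}

class AwaitingApproval implements OrderState {
    @Override
    public OrderState next(Order order, String event) {
        // Sits here, possibly for days, until a user flips a flag.
        return "approved".equals(event) ? new Processing() : this;
    }
}

class Processing implements OrderState {
    @Override
    public OrderState next(Order order, String event) {
        return "completed".equals(event) ? new Done() : this;
    }
}

class Done implements OrderState {
    @Override
    public OrderState next(Order order, String event) {
        return this; // terminal state
    }
}

class Order {
    private OrderState state = new AwaitingApproval();

    /** Apply an event, then persist the resulting state name. */
    public void handle(String event) {
        state = state.next(this, event);
        // e.g. UPDATE orders SET state = ? WHERE id = ?  (persistence omitted)
    }

    public String stateName() {
        return state.getClass().getSimpleName();
    }
}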
In terms of other languages, I know Grails has a workflow module available. I don't know if you would be better off porting that to CF or jumping ship to Grails (right tool for the job and all that).
It's just a thought, hope it helps.