How to get a placeholder/dummy step in SageMaker pipelines? - amazon-web-services

I'd like to sketch out some steps in a SageMaker pipeline, and only fill them in one at a time, but I don't think there's an EmptyStep option anywhere.
I've considered using some vacuously true ConditionalSteps, or subclassing sagemaker.workflow.steps.Step, but the former can't be chained, and the latter seems likely to break things, given my implementation wouldn't necessarily conform to what the service is looking for.
Is there a good way to go about this? An empty processor step?

There's no way to create your own empty step in a SageMaker Pipeline. The easiest way to achieve this would be to use a LambdaStep and create stub Lambda functions. With a Processing step you would pay the cold-start penalty of spinning up a full processing job for each placeholder.
I work at AWS and my opinions are my own.
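For illustration, here is a minimal Python sketch of that approach, assuming a trivial stub Lambda function already exists (the function ARN and step names below are placeholders, not anything from the question):

    # Stand in for future pipeline steps with LambdaSteps that call a no-op
    # Lambda function, so the DAG shape can be defined up front and real
    # steps swapped in one at a time later.
    from sagemaker.lambda_helper import Lambda
    from sagemaker.workflow.lambda_step import LambdaStep
    from sagemaker.workflow.pipeline import Pipeline

    # Assumes a trivial Lambda already exists, e.g. one whose handler is just
    # `def handler(event, context): return {"status": "stub"}`.
    stub_fn = Lambda(function_arn="arn:aws:lambda:us-east-1:111122223333:function:pipeline-stub")

    preprocess_stub = LambdaStep(name="PreprocessPlaceholder", lambda_func=stub_fn)
    train_stub = LambdaStep(
        name="TrainPlaceholder",
        lambda_func=stub_fn,
        depends_on=[preprocess_stub],  # placeholders can still be chained
    )

    pipeline = Pipeline(name="sketch-pipeline", steps=[preprocess_stub, train_stub])

Each placeholder can then be replaced with a real ProcessingStep or TrainingStep without changing the overall DAG.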

Related

My chatbot (Lex V2) exceeded intent & utterance quotas and I need more intents [AWS][LEXV2]

I am creating a large chatbot and have already exceeded the limits that Amazon has. I need approximately 1800 intents (it is a large project), and the "hard limits" cannot be increased (I already spoke with an Amazon agent). I wanted to know if anyone has experienced this problem and how they solved it (without switching to tools like Dialogflow or Watson).
I was thinking of creating a "Chatbot Orchestrator" and splitting the chatbot into several parts (experiences) and invoking the corresponding bot and intent.
Any ideas?
A possible solution is to use Kendra to search for the answer: basically, activate the fallback intent and use Kendra in a Lambda function to look up the response.
There is an example in this document.
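As a rough illustration, a fallback-intent fulfillment Lambda along those lines might look like the sketch below. It assumes a Kendra index already exists and its ID is supplied via a KENDRA_INDEX_ID environment variable, and that the function is attached to the bot's AMAZON.FallbackIntent:

    import os
    import boto3

    kendra = boto3.client("kendra")

    def handler(event, context):
        # The user's raw utterance that fell through to the fallback intent.
        query = event.get("inputTranscript", "")
        result = kendra.query(IndexId=os.environ["KENDRA_INDEX_ID"], QueryText=query)

        items = result.get("ResultItems", [])
        answer = items[0]["DocumentExcerpt"]["Text"] if items else "Sorry, I couldn't find an answer."

        # Lex V2 fulfillment response: close the dialog and reply with the excerpt.
        return {
            "sessionState": {
                "dialogAction": {"type": "Close"},
                "intent": {
                    "name": event["sessionState"]["intent"]["name"],
                    "state": "Fulfilled",
                },
            },
            "messages": [{"contentType": "PlainText", "content": answer}],
        }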
Kendra, as mentioned already, is an alternative.
However, I would suggest you do a deep dive through your intents to see how many pertain to the same context and can be combined and effectively managed through the use of slots and Lambdas to get the right behaviour.
Another approach would be to use separate bots if you have clean divisions between the intents. Note that your costs here could increase quite substantially as you'd need to invoke all the bots, evaluate the confidence scores and then decide which response to return to the client.
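For what it's worth, a very rough sketch of that multi-bot orchestration idea with boto3 might look like this (the bot IDs and aliases are placeholders; note that every bot gets invoked, which is where the extra cost comes from):

    import boto3

    lex = boto3.client("lexv2-runtime")

    BOTS = [
        {"botId": "BOT_ID_A", "botAliasId": "ALIAS_A"},
        {"botId": "BOT_ID_B", "botAliasId": "ALIAS_B"},
    ]

    def best_response(text, session_id, locale_id="en_US"):
        best = None
        for bot in BOTS:
            resp = lex.recognize_text(
                botId=bot["botId"],
                botAliasId=bot["botAliasId"],
                localeId=locale_id,
                sessionId=session_id,
                text=text,
            )
            # Take the top interpretation's NLU confidence for this bot.
            interpretations = resp.get("interpretations", [])
            score = 0.0
            if interpretations:
                score = interpretations[0].get("nluConfidence", {}).get("score", 0.0)
            if best is None or score > best[0]:
                best = (score, resp)
        return best[1]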

How to easily do DAG analysis of a pipeline?

To optimize a pipeline, it's important to know the rate-limiting rules or paths. Based on DAG analysis, are there any approaches that could easily calculate the critical path or the key events?
This is a broad question to answer, but you may be interested in looking at the --runtime-profile command line argument, which can be used to profile Snakemake code using yappi.
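If you can get per-job runtimes out of such a profile, one generic (not Snakemake-specific) way to find the critical path is to treat it as the longest weighted path through the DAG, for example with networkx. The job names and runtimes below are invented purely for illustration:

    import networkx as nx

    G = nx.DiGraph()
    # Nodes are jobs, annotated with their measured runtime in seconds.
    G.add_node("download", runtime=30)
    G.add_node("align", runtime=600)
    G.add_node("call_variants", runtime=900)
    G.add_node("report", runtime=60)
    G.add_edges_from([("download", "align"), ("align", "call_variants"), ("call_variants", "report")])

    # Put each job's runtime on its incoming edges so the longest weighted path
    # approximates total elapsed time (root-job runtime is ignored here for simplicity).
    for u, v in G.edges:
        G[u][v]["weight"] = G.nodes[v]["runtime"]

    critical_path = nx.dag_longest_path(G, weight="weight")
    print(critical_path)  # the chain of jobs that bounds the whole pipeline's runtime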

Separation of Runtime and History

I would like to use separate databases for runtime and history data without implementing a custom HistoryEventHandler. Does anyone know whether this is possible?
I read the Camunda user guides, but they did not help much because they only hint at the custom-implementation route.
Currently, every time I query history data (about 2 million activity entries), the performance of the system drops, as it effectively blocks the runtime too. I'd like to avoid this without losing the ability to query historic data.
That would be a really cool feature, but it is currently not supported. You will have to disable the default history and implement a custom handler.
Camunda BPM offers Optimize, which pulls the history data from the engine into an Elasticsearch database. If you are using the Enterprise version, it may be a way to solve this.
(Based on your comments to other answers, it appears that you're interested in learning more about custom HistoryEventHandler implementations. Thus, I'm adding this answer in the hope that it will help.)
Implementing a custom History Event Handler isn't difficult, but there are a few important points to keep in mind:
Unless you want to skip the storage of history information in the standard Camunda history tables, you'll want to use their CompositeHistoryEventHandler. This simply gives you the ability to use multiple HistoryEventHandler implementations.
Any HistoryEventHandler implementations will complete in the same threads as the ones executing process instances; thus, you will want to be cognizant of the performance impacts your custom HistoryEventHandler will have.
You may want to consider publishing your history events through a message bus or messaging system to allow for reliable delivery without impacting Camunda workflow instance performance.
Finally, it may make sense to use your custom HistoryEventHandler along with Camunda's default HistoryEventHandler and their functionality for deleting process instances after a period of time. This would allow you to use their querying capabilities for some period of time without having the history stack up (and thus slowing down your system).

Set BatchSize for specific function of a WebJob

Is it possible to set the batch size at the function level within a WebJob?
I have multiple functions in a WebJob; some of them depend on external APIs that do not allow a high degree of parallelization.
I have seen only the Singleton attribute which is not exactly what I am looking for.
I just figured out that this is possible with the custom QueueProcessorFactory I already use.
An example from MS is here:
https://github.com/Azure/azure-webjobs-sdk-samples/blob/master/BasicSamples/MiscOperations/CustomQueueProcessorFactory.cs
Having attributes for this would be nice ;-)
Alex
Yeah, custom QueueProcessor instances were designed to be the "escape hatch" allowing you full control in advanced scenarios. We want to keep the mainline paths simple and easy to use, while allowing you to drop down and deeply customize when needed. Adding a bunch of override options on QueueTriggerAttribute itself would be possible, but could also complicate the programming model.
If you would like to suggest a change, please log an issue in the public repo: https://github.com/Azure/azure-webjobs-sdk/issues
Thanks :)

High level PHP library for Amazon SWF deciders to check state of activity tasks

I'm writing PHP for a fairly simple workflow for Amazon SWF. I've found myself starting to write a library to check if certain actions have been started or completed: essentially looping over the event list to check how things have progressed, and then starting an appropriate activity if it's needed. This can be a bit faffy at times, as the activity type and input information isn't in every event; it seems to be in the ActivityTaskScheduled event. This is the sort of thing I've discovered along the way, and I'm concerned that I could be missing subtle things about event lists.
It makes me suspect that someone must have already written some sort of generic library for finding the current state of various activities. Maybe even some sort of more declarative way of coding up the flowcharts that are associated with SWF. Does anything like this exist for PHP?
(Googling hasn't come up with anything)
I'm not aware of anything out there that does what you want, but you are doing it right. What you're talking about is coding up the decider, which necessarily has to look at the entire execution state (basically loop through the event list) and decide what to do next.
Here's an example written in Python (Using Amazon SWF To communicate between servers) that looks for events of type 'ActivityTaskCompleted' to decide what to do next, and then, yes, looks at the previous 'ActivityTaskScheduled' entry to figure out what the attributes of the previous task were.
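For reference, here's a rough boto3 sketch of that same loop in Python (the domain and task-list names are placeholders, and a real decider would also handle event-history pagination):

    import boto3

    swf = boto3.client("swf")

    # Poll for a decision task; the returned event list is the full execution history.
    task = swf.poll_for_decision_task(domain="my-domain", taskList={"name": "my-task-list"})
    events = task.get("events", [])
    events_by_id = {e["eventId"]: e for e in events}

    for event in events:
        if event["eventType"] == "ActivityTaskCompleted":
            attrs = event["activityTaskCompletedEventAttributes"]
            # Look up the matching ActivityTaskScheduled event to recover the
            # activity type and input of the task that just finished.
            scheduled = events_by_id[attrs["scheduledEventId"]]
            sched_attrs = scheduled["activityTaskScheduledEventAttributes"]
            print(sched_attrs["activityType"], sched_attrs.get("input"))
            # ...decide what to schedule next and return it via
            # swf.respond_decision_task_completed(taskToken=task["taskToken"], decisions=[...])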
If you write a PHP framework that specifies the workflow in a declarative way and a generic decider that implements it, please consider sharing it :)
I've since found https://github.com/cbalan/aws-swf-fluent-php which looks promising, but I haven't really used it, so I can't speak to whether it works or not.
I've forked it and started a bit of very light refactoring to allow some testing, available at https://github.com/michalc/aws-swf-fluent-php