In Camunda , What is the correct way of optimised coding among two options? - camunda

Retrieving userTasks first and then from task, retrive the processInstanceId.
First retrieving ProcessInstances and then iterate it with a loop and find usertasks in each process instance.
Please give reason too.

If you want to build a task list you should use the task queries, as you can do the right queries and filters there easily. Process instance id's are part of the result.
But it would indeed be interesting to understand the use case, what exactly you need the process instance id for.

Related

Azure WebJobs - how to keep the state?

I have a need to implement some kind of orchestrator by a WebJob. So it needs to keep a state for some kind of internal queue.
Is there any way apart from static field and from using database to keep that state?
In general the idea is simple: I have a calculation job. It get's e.g. ProductId and is doing calculations for it. It takes some time, so when another message for same ProductId is coming I need to wait, until previous calculations will finish. But at the same time I can pick a message for another ProductId if there are no running calculations for that.
I haven't found any way to make sequential processing of messages based on a specific conditions. So end up with idea to implement a stateful orchestrator which will do the trick.
Am I doing it in a wrong way?

Hbase Update/Insert using Get/Put

Could anyone advise what would be the best way to go for my requirement.
I have the below
An Hbase table
An input file in HDFS
My Requirement is as below
Read the input file and fetch the key. Using key get the data from
Hbase.
Do a comparison to check.
If comparison fails, insert
If comparison is successful update.
I know i can use get to fetch the data and put to write it back. Is this the best way to go forward.I hope i will use mapreduce so that i can get the process to run in parallel.
HBase has a checkAndPut() and a checkAndDelete() operation. which allows you to perform a put or a delete if you have the value you expect (compare=NO_OP if you don't care about the value but just about the key).
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html
Depending on the size of your problem I actually recommend a slightly different approach here. While it probably is feasible to implement HBase puts inside a MapReduce Job, it sounds like a rather complex task.
I'd recommend loading data from HBase into MapReduce joining the two tables and then exporting them back into HBase.
Using Pig this would be rather easy to achieve.
Take a look at Pig HBaseStorage.
Going this route you'd load both files, join them and then write back to HBase. If all there is to it, is comparing keys, this can be achieved in 5 lines of PigLatin.
HTH

Powercenter - concurrent target instances

We have a situation where we have a different execution order of instances of the same target being loaded from a single source qualifier.
We have a problem when we promote a mapping from DEV to TEST when we execute in TEST after promoting there are problems.
For instance we have a router with 3 groups for Insert, Update and Delete followed by the appropriate update strategies to set the row type accordingly followed by three target instances.
RTR ----> UPD_Insert -----> TGT_Insert
\
\__> UPD_Update -------> TGT_Update
\
\__> UPD_Delete ---------> TGT_Delete
When we test this out using data to do an insert followed by an update followed by a delete all based on the same primary key we get a different execution order in TEST compared to the same data in our DEV environment.
Anyone have any thoughts - I would post an image but I don't have enough cred yet.
Cheers,
Gil.
You can not controll the load order as long as you have a single source. I you could separate the loads to use separate sources the target load order setting in the mapping could be used, or you could even create separate mappings for them.
As it is now you should use a single target and utilize the update strategy transformation to determine the wanted operation for each record passing through. It is then possible to use a sort to define in what order the different operations is made to the physical table.
You can use the sorter transformation just before update strategy......based on update strategy condition you can sort the incoming rows....So first date will go through the Insert, than update at last through delete strategy.
Simple solution is try renaming the target definition in alphabetical order... like INSERT_A, UPDATE_B, DELETE_C then start loading
This will load in A,B,C order. Try and let me know

Write to multiple tables in HBASE

I have a situation here where I need to write to two of the hbase tables say table1,table 2. Whenever a write happens on table 1, I need to do some operation on table 2 say increment a counter in table 2 (like triggering). For this purpose I need to access (write) to two tables in the same task of a map-reduce program. I heard that it can be done using MultiTableOutputFormat. But I could not find any good example explaining in detail. Could some one please answer whether is it possible to do so. If so how can/should I do it. Thanks in advance.
Please provide me an answer that should not include co-processors.
To write into more than one table in map-reduce job, you have to specify that in job configuration. You are right this can be done using MultiTableOutputFormat.
Normally for a single table you use like:
TableMapReduceUtil.initTableReducerJob("tableName", MyReducer.class, job);
Instead of this write:
job.setOutputFormatClass(MultiTableOutputFormat.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setNumReduceTasks(2);
TableMapReduceUtil.addDependencyJars(job);
TableMapReduceUtil.addDependencyJars(job.getConfiguration());
Now at the time of writing data in table write as:
context.write(new ImmutableBytesWritable(Bytes.toBytes("tableName1")),put1);
context.write(new ImmutableBytesWritable(Bytes.toBytes("tableName2")),put2);
For this you can use HBase Observer, You have to create an observer and have to deploy on your server(applicable only for HBase Version >0.92), It will automatic trigger to another table.
And I think HBase Observer has similar concepts of like Aspects.
For more details -
https://blogs.apache.org/hbase/entry/coprocessor_introduction

Auto-increment on Azure Table Storage

I am currently developing an application for Azure Table Storage. In that application I have table which will have relatively few inserts (a couple of thousand/day) and the primary key of these entities will be used in another table, which will have billions of rows.
Therefore I am looking for a way to use an auto-incremented integer, instead of GUID, as primary key in the small table (since it will save lots of storage and scalability of the inserts is not really an issue).
There've been some discussions on the topic, e.g. on http://social.msdn.microsoft.com/Forums/en/windowsazure/thread/6b7d1ece-301b-44f1-85ab-eeb274349797.
However, since concurrency problems can be really hard to debug and spot, I am a bit uncomfortable with implementing this on own. My question is therefore if there is a well tested impelemntation of this?
For everyone who will find it in search, there is a better solution. Minimal time for table lock is 15 seconds - that's awful. Do not use it if you want to create a truly scalable solution. Use Etag!
Create one entity in table for ID (you can even name it as ID or whatever).
1) Read it.
2) Increment.
3) InsertOrUpdate WITH ETag specified (from the read query).
if last operation (InsertOrUpdate) succeeds, then you have a new, unique, auto-incremented ID. If it fails (exception with HttpStatusCode == 412), it means that some other client changed it. So, repeat again 1,2 and 3.
The usual time for Read+InsertOrUpdate is less than 200ms. My test utility with source on github.
See UniqueIdGenerator class by Josh Twist.
I haven't implemented this yet but am working on it ...
You could seed a queue with your next ids to use, then just pick them off the queue when you need them.
You need to keep a table to contain the value of the biggest number added to the queue. If you know you won't be using a ton of the integers, you could have a worker every so often wake up and make sure the queue still has integers in it. You could also have a used int queue the worker could check to keep an eye on usage.
You could also hook that worker up so if the queue was empty when your code needed an id (by chance) it could interupt the worker's nap to create more keys asap.
If that call failed you would need a way to (tell the worker you are going to do the work for them (lock), then do the workers work of getting the next id and unlock)
lock
get the last key created from the table
increment and save
unlock
then use the new value.
The solution I found that prevents duplicate ids and lets you autoincrement it is to
lock (lease) a blob and let that act as a logical gate.
Then read the value.
Write the incremented value
Release the lease
Use the value in your app/table
Then if your worker role were to crash during that process, then you would only have a missing ID in your store. IMHO that is better than duplicates.
Here is a code sample and more information on this approach from Steve Marx
If you really need to avoid guids, have you considered using something based on date/time and then leveraging partition keys to minimize the concurrency risk.
Your partition key could be by user, year, month, day, hour, etc and the row key could be the rest of the datetime at a small enough timespan to control concurrency.
Of course you have to ask yourself, at the price of date in Azure, if avoiding a Guid is really worth all of this extra effort (assuming a Guid will just work).