In my current project set up I have a version with a set due date.
Now all the tasks for that version have an assigned start/end date (which in this specific case is the same day).
E.g.
task x -> due 20.3.
task y -> due 21.3.
task [...] -> due [...]
Now it could happen that I am not able to finish a certain task as planned on the given day.
The only option I found to postpone a set of tasks was to bulk edit the start/end date, which only gives me the option to set the same start/end date for all selected tasks.
Is there a way to simply postpone a set of tasks by amount x?
E.g.:
task #1 -> due someday+x
... -> ....
task #N -> due otherday+x
What would be the best approach for the problem described above?
My issue has been resolved, please see the redmine forum in case you're interested in the solution.
Related
I would like to get some advice to see whether step function is suitable for my use case.
I have a bunch of user records generated at random time. I need to do some pre-processing and validation before putting them into a pool. I have a stage which runs periodically (1-5min) to collect records from the pool and combine them, then publish them.
I need realtime traceability/monitor of each record and I need to notify the user once the record is published.
Here is a diagram to illustrate the flow.
Is a step function suitable for my use case? if not, is there any alternative which help me to simplify the solution? Thanks
Yes, Step Functions is an option. Step Function "State Machines" add the greatest value vs other AWS serverless workflow patterns such as event-driven or pub/sub when the scenario involves complex branching/retry logic and observability requirements. SM logic is explicit and visual, which makes it simple to reason about the workflow. For each State Machine (SM) execution, you can easily trace the exact path the execution took and where it failed. This added functionality is reflected in its higher cost.
In any case, you need to gather records until its time to collect them. This batching requirement means that your achitecture will need more elements than just a State Machine. Here are some ideas:
(1) A SM preprocesses Records one-by-one as they arrive
One option is to use State Machines to orchestrate the preprocessing and validation only. Each arriving event record kicks off a SM execution. Pre-processed records go into a queue, from which they are periodically polled and sent to be combined.
[Records EventBrige event] -> [preprocessing SM] -> [Record queue] -> [polling lambda] -> [Combining Service]
(2) Preprocess and process bached records in a end-to-end State Machine
Gather records in a queue as they arrive. A lambda periodically polls the queue and begins the SM execution on a batch of records. A SM Map Task pre-processes and validates the records in parallel then calls the combining service, all within a single execution. This setup gives you the greatest visibility, but is more complex because you have to handle cases where a single record causes the batched execution to fail.
[Records arrive] -> [Record source queue] -> [polling lambda gets batch] -> [SM for preprocessing, collecting and combining]
Other
There are plenty of other combinations, including chaining SM's together if necessary. Or avoiding SM's altogether. Which option is best for you will depend on which pain points matter most to you: observability, error handling, simplicity, cost.
I have a dataflow job processing data from pub/sub defined like this:
read from pub/sub -> process (my function) -> group into day windows -> write to BQ
I'm using Write.Method.FILE_LOADS because of bounded input.
My job works fine, processing lots of GBs of data but it fails and tries to retry forever when it gets to create another table. The job is meant to run continuously and create day tables on its own, it does fine on the first few ones but then gives me indefinitely:
Processing stuck in step write-bq/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 05h30m00s without outputting or completing in state finish
Before this happens it also throws:
Load job <job_id> failed, will retry: {"errorResult":{"message":"Not found: Table <name_of_table> was not found in location US","reason":"notFound"}
It is indeed a right error because this table doesn't exists. Problem is that the job should create it on its own because of defined option CreateDisposition.CREATE_IF_NEEDED.
The number of day tables that it creates correctly without a problem depens on number of workers. It seems that when some worker creates one table its CreateDisposition changes to CREATE_NEVER causing the problem, but it's only my guess.
The similar problem was reported here but without any definite answer:
https://issues.apache.org/jira/browse/BEAM-3772?focusedCommentId=16387609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16387609
ProcessElement definition here seems to give some clues but I cannot really say how it works with multiple workers: https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L138
I use 2.15.0 Apache SDK.
I encountered the same issue, which is still not fixed in BEAM 2.27.0 of january 2021. Therefore I had to develop a workaround: a custom PTransform which checks if the target table exist before the the BigQueryIO stage. It uses the bigquery java client for this and a Guava cache, as well as a windowing strategy (fixed, check every 15s) to sustain a heavy traffic of about 5000 elements per second. Here is the code: https://gist.github.com/matthieucham/85459eff5fdea8d115be520e2dd5ccc1
There was a bug in the past that caused this error, but that particular one was fixed in commit https://github.com/apache/beam/commit/d6b4dcec5f297f5c1bd08f345f0e1e5c756775c2#diff-3f40fd931c8b8b972772724369cea310 Can you check to see if the version of Beam you are running includes this commit?
I created a schedule in IBM OPL:
dvar sequence schedule in all(j in Jobs) job[j];
If the CP-Module generates a solution, the solution is sometimes not a non-delay solution. This is however not allowed and thus I want to enforce a non-delay schedule.
I tried different solutions in the subject to-Section...
forall(t in Jobs)
if (t > 1)
startOf(job[t]) == endOf(job[t-1]);
... but these fail (obviously) when job t-1 is not followed by job t.
Anyone who can give me a hint on how to solve this problem?
Kind regards,
Franz
you should try to use endAtStart.
(OPL constraint to restrict the relative positions of interval variables.)
regards
endAtStart will only work if your Jobs are always ready to start when the preceding Job finishes. If this is not the case, you will receive an error.
A better solution would be to regard the no-delay in the objective. What you are trying to achieve is to start every new job as soon as possible. So you can for example use the staticLex function:
minimize staticLex(startOf(job[1]), startOf(job[2]),...);
However, this expression can become very long and you would need to hardcode the number of intervals.
A little workaround would be to assign the intervals decreasing weighting factors:
minimize sum(j in Jobs) startOf(job[j])*100^-ord(Jobs,j);
I hope this works for you!
Regards
Jan
We’re using JCo 3.0 to connect to RFCs and read data from SAP R/3. We use one RFC RFC_READ_TABLE often and use a second custom RFC to read employee information. My questions revolve around a third RFC RSAQ_REMOTE_QUERY_CALL. I'm calling an ad-hoc query I built in SAP using this RFC but I’m not getting the expected results. The main problem is that it appears that SAP is ignoring one of my selection criteria and using what was saved in SAP when I originally built it. The date criterion stored in my ad-hoc is 6/23/2013. If I pass in 6/28/2013 from JCo, I get the same results as if I had passed 6/23/2013 from JCo.
We have built several ad-hoc queries whose only criteria is a personnel number and call them successfully using RFC RSAQ_REMOTE_QUERY_CALL.
Background on my ad-hoc query: reporting period of today, joining together four aspects of an employee’s information: their latest action (hire, rehire, etc.), organization (e.g. company), pay (e.g. pay scale level) and communication (e.g. email). The query will run every workday.
Here are my questions:
My ad-hoc has three selection criteria. The first two are simple strings. The third is a date. The date will vary each time the query runs. We are referencing the first criteria using SP$00001, the second with SP$00002 and the third with SP$00003. The order of the criteria changes from the ad-hoc to SQ01 (what was SP$00001 in the ad-hoc is now SP$00003). Shouldn’t we reference them in the order defined in the ad-hoc (e.g. SP$00001)?
The two simple string selections are using OPTION “EQ”. The date criteria is using OPTION GT (greater than). Is “GT” correct?
We have some limited accessibility to SAP. Is there a way to see which SP$ parameters are mapped to which criteria?
If my ad-hoc was saved with five criteria but four of them never change when I call the ad-hoc from JCo, do I just need to set the value of the one or do I need to set the other four as well?
Do I have to call this ad-hoc using a variant (function.getImportParameterList().setValue(“VARIANT”, “VARIANT_NAME”))?
Does the Reporting Period have an impact on the date criteria? I have tried changing the Reporting Period to be PNPBEGDA = today and PNPENDDA = today and noticed no change.
Is there a way in SAP to get a “declaration” of your ad-hoc (name, inputs, outputs, criteria)? I have looked at JCoFunction.toXml() and JCoFunctionTemplate. These are good if you want to see something at runtime before it goes to SAP, but I’m looking for something I can use on the JCo end to help me write Java code that matches the ad-hoc.
I have looked at length on the web for answers to my questions and have not found anything that is useful. If there is anything which would help me, please let me know.
Thanks,
LM
Since I don't know much about SQnn, I won't be able to answer all of your questions...
I don't know, sorry.
It should be, at least it's the usual operator for greater than.
Yes - set an external breakpoint right inside the function module and trace its execution while performing the RFC call. Warning: At least basic ABAP knowledge required.
I don't know, sorry.
I don't know either, sorry.
That would depend on the query, I suspect...
JCo won't be able to help you out there - it doesn't know about queries, it only knows function modules. There might be other RSAQ_* function modules to get that information though.
I played with setting up a variant in SQ01 for my query. I added some settings in the variant that solved my problem and answered several of my questions in my post. The main thing I did was add a dynamically calculated date as part of my criteria. Here's how:
1. In SQ01, access menu "Go To" -> "Maintain Variants".
2. Choose your variant and in subobjects, choose "Attributes" and click "Change".
3. In the displayed list, find your date criterion.
4. Choose "D" in Selection Variable, choose a comparison option (mine was GT for greater than), and a "Name of a Variable" (really, this is the type of dynamic date calculation you need).
5. Go back to the Subobjects panel, choose "Values" and click "Change".
6. Enter any other criteria you need in the "Program selections" section.
7. Save the variant.
By doing this, I don't need to pass anything into the query from JCo. Also, SAP will automatically update the date criteria you entered in step #4 above.
So to to answer my questions from my original post:
1 and 4. It doesn't matter because I'm no longer passing anything in from JCo.
2. "GT" is Greater Than.
3 and 7. If anyone knows, I'd really like to find out.
5. Use the name you as it is in SAP (step #2 above).
6. I still don't know, but it's not holding me up.
I'm posting this in case anyone out there needs this type of information. Thanks to Esti and vwegert for helping me out.
Long story short, I'm rewriting a piece of a system and am looking for a way to store some hit counters in AWS SimpleDB.
For those of you not familiar with SimpleDB, the (main) problem with storing counters is that the cloud propagation delay is often over a second. Our application currently gets ~1,500 hits per second. Not all those hits will map to the same key, but a ballpark figure might be around 5-10 updates to a key every second. This means that if we were to use a traditional update mechanism (read, increment, store), we would end up inadvertently dropping a significant number of hits.
One potential solution is to keep the counters in memcache, and using a cron task to push the data. The big problem with this is that it isn't the "right" way to do it. Memcache shouldn't really be used for persistent storage... after all, it's a caching layer. In addition, then we'll end up with issues when we do the push, making sure we delete the correct elements, and hoping that there is no contention for them as we're deleting them (which is very likely).
Another potential solution is to keep a local SQL database and write the counters there, updating our SimpleDB out-of-band every so many requests or running a cron task to push the data. This solves the syncing problem, as we can include timestamps to easily set boundaries for the SimpleDB pushes. Of course, there are still other issues, and though this might work with a decent amount of hacking, it doesn't seem like the most elegant solution.
Has anyone encountered a similar issue in their experience, or have any novel approaches? Any advice or ideas would be appreciated, even if they're not completely flushed out. I've been thinking about this one for a while, and could use some new perspectives.
The existing SimpleDB API does not lend itself naturally to being a distributed counter. But it certainly can be done.
Working strictly within SimpleDB there are 2 ways to make it work. An easy method that requires something like a cron job to clean up. Or a much more complex technique that cleans as it goes.
The Easy Way
The easy way is to make a different item for each "hit". With a single attribute which is the key. Pump the domain(s) with counts quickly and easily. When you need to fetch the count (presumable much less often) you have to issue a query
SELECT count(*) FROM domain WHERE key='myKey'
Of course this will cause your domain(s) to grow unbounded and the queries will take longer and longer to execute over time. The solution is a summary record where you roll up all the counts collected so far for each key. It's just an item with attributes for the key {summary='myKey'} and a "Last-Updated" timestamp with granularity down to the millisecond. This also requires that you add the "timestamp" attribute to your "hit" items. The summary records don't need to be in the same domain. In fact, depending on your setup, they might best be kept in a separate domain. Either way you can use the key as the itemName and use GetAttributes instead of doing a SELECT.
Now getting the count is a two step process. You have to pull the summary record and also query for 'Timestamp' strictly greater than whatever the 'Last-Updated' time is in your summary record and add the two counts together.
SELECT count(*) FROM domain WHERE key='myKey' AND timestamp > '...'
You will also need a way to update your summary record periodically. You can do this on a schedule (every hour) or dynamically based on some other criteria (for example do it during regular processing whenever the query returns more than one page). Just make sure that when you update your summary record you base it on a time that is far enough in the past that you are past the eventual consistency window. 1 minute is more than safe.
This solution works in the face of concurrent updates because even if many summary records are written at the same time, they are all correct and whichever one wins will still be correct because the count and the 'Last-Updated' attribute will be consistent with each other.
This also works well across multiple domains even if you keep your summary records with the hit records, you can pull the summary records from all your domains simultaneously and then issue your queries to all domains in parallel. The reason to do this is if you need higher throughput for a key than what you can get from one domain.
This works well with caching. If your cache fails you have an authoritative backup.
The time will come where someone wants to go back and edit / remove / add a record that has an old 'Timestamp' value. You will have to update your summary record (for that domain) at that time or your counts will be off until you recompute that summary.
This will give you a count that is in sync with the data currently viewable within the consistency window. This won't give you a count that is accurate up to the millisecond.
The Hard Way
The other way way is to do the normal read - increment - store mechanism but also write a composite value that includes a version number along with your value. Where the version number you use is 1 greater than the version number of the value you are updating.
get(key) returns the attribute value="Ver015 Count089"
Here you retrieve a count of 89 that was stored as version 15. When you do an update you write a value like this:
put(key, value="Ver016 Count090")
The previous value is not removed and you end up with an audit trail of updates that are reminiscent of lamport clocks.
This requires you to do a few extra things.
the ability to identify and resolve conflicts whenever you do a GET
a simple version number isn't going to work you'll want to include a timestamp with resolution down to at least the millisecond and maybe a process ID as well.
in practice you'll want your value to include the current version number and the version number of the value your update is based on to more easily resolve conflicts.
you can't keep an infinite audit trail in one item so you'll need to issue delete's for older values as you go.
What you get with this technique is like a tree of divergent updates. you'll have one value and then all of a sudden multiple updates will occur and you will have a bunch of updates based off the same old value none of which know about each other.
When I say resolve conflicts at GET time I mean that if you read an item and the value looks like this:
11 --- 12
/
10 --- 11
\
11
You have to to be able to figure that the real value is 14. Which you can do if you include for each new value the version of the value(s) you are updating.
It shouldn't be rocket science
If all you want is a simple counter: this is way over-kill. It shouldn't be rocket science to make a simple counter. Which is why SimpleDB may not be the best choice for making simple counters.
That isn't the only way but most of those things will need to be done if you implement an SimpleDB solution in lieu of actually having a lock.
Don't get me wrong, I actually like this method precisely because there is no lock and the bound on the number of processes that can use this counter simultaneously is around 100. (because of the limit on the number of attributes in an item) And you can get beyond 100 with some changes.
Note
But if all these implementation details were hidden from you and you just had to call increment(key), it wouldn't be complex at all. With SimpleDB the client library is the key to making the complex things simple. But currently there are no publicly available libraries that implement this functionality (to my knowledge).
To anyone revisiting this issue, Amazon just added support for Conditional Puts, which makes implementing a counter much easier.
Now, to implement a counter - simply call GetAttributes, increment the count, and then call PutAttributes, with the Expected Value set correctly. If Amazon responds with an error ConditionalCheckFailed, then retry the whole operation.
Note that you can only have one expected value per PutAttributes call. So, if you want to have multiple counters in a single row, then use a version attribute.
pseudo-code:
begin
attributes = SimpleDB.GetAttributes
initial_version = attributes[:version]
attributes[:counter1] += 3
attributes[:counter2] += 7
attributes[:version] += 1
SimpleDB.PutAttributes(attributes, :expected => {:version => initial_version})
rescue ConditionalCheckFailed
retry
end
I see you've accepted an answer already, but this might count as a novel approach.
If you're building a web app then you can use Google's Analytics product to track page impressions (if the page to domain-item mapping fits) and then to use the Analytics API to periodically push that data up into the items themselves.
I haven't thought this through in detail so there may be holes. I'd actually be quite interested in your feedback on this approach given your experience in the area.
Thanks
Scott
For anyone interested in how I ended up dealing with this... (slightly Java-specific)
I ended up using an EhCache on each servlet instance. I used the UUID as a key, and a Java AtomicInteger as the value. Periodically a thread iterates through the cache and pushes rows to a simpledb temp stats domain, as well as writing a row with the key to an invalidation domain (which fails silently if the key already exists). The thread also decrements the counter with the previous value, ensuring that we don't miss any hits while it was updating. A separate thread pings the simpledb invalidation domain, and rolls up the stats in the temporary domains (there are multiple rows to each key, since we're using ec2 instances), pushing it to the actual stats domain.
I've done a little load testing, and it seems to scale well. Locally I was able to handle about 500 hits/second before the load tester broke (not the servlets - hah), so if anything I think running on ec2 should only improve performance.
Answer to feynmansbastard:
If you want to store huge amount of events i suggest you to use distributed commit log systems such as kafka or aws kinesis. They allow to consume stream of events cheap and simple (kinesis's pricing is 25$ per month for 1K events per seconds) – you just need to implement consumer (using any language), which bulk reads all events from previous checkpoint, aggregates counters in memory then flushes data into permanent storage (dynamodb or mysql) and commit checkpoint.
Events can be logged simply using nginx log and transfered to kafka/kinesis using fluentd. This is very cheap, performant and simple solution.
Also had similiar needs/challenges.
I looked at using google analytics and count.ly. the latter seemed too expensive to be worth it (plus they have a somewhat confusion definition of sessions). GA i would have loved to use, but I spent two days using their libraries and some 3rd party ones (gadotnet and one other from maybe codeproject). unfortunately I could only ever see counters post in GA realtime section, never in the normal dashboards even when the api reported success. we were probably doing something wrong but we exceeded our time budget for ga.
We already had an existing simpledb counter that updated using conditional updates as mentioned by previous commentor. This works well, but suffers when there is contention and conccurency where counts are missed (for example, our most updated counter lost several million counts over a period of 3 months, versus a backup system).
We implemented a newer solution which is somewhat similiar to the answer for this question, except much simpler.
We just sharded/partitioned the counters. When you create a counter you specify the # of shards which is a function of how many simulatenous updates you expect. this creates a number of sub counters, each which has the shard count started with it as an attribute :
COUNTER (w/5shards) creates :
shard0 { numshards = 5 } (informational only)
shard1 { count = 0, numshards = 5, timestamp = 0 }
shard2 { count = 0, numshards = 5, timestamp = 0 }
shard3 { count = 0, numshards = 5, timestamp = 0 }
shard4 { count = 0, numshards = 5, timestamp = 0 }
shard5 { count = 0, numshards = 5, timestamp = 0 }
Sharded Writes
Knowing the shard count, just randomly pick a shard and try to write to it conditionally. If it fails because of contention, choose another shard and retry.
If you don't know the shard count, get it from the root shard which is present regardless of how many shards exist. Because it supports multiple writes per counter, it lessens the contention issue to whatever your needs are.
Sharded Reads
if you know the shard count, read every shard and sum them.
If you don't know the shard count, get it from the root shard and then read all and sum.
Because of slow update propogation, you can still miss counts in reading but they should get picked up later. This is sufficient for our needs, although if you wanted more control over this you could ensure that- when reading- the last timestamp was as you expect and retry.