Making a method idempotent


Suppose we have a method deposit(Integer amount) in a class AccountManager. This method would modify the accountTotal and therefore is not idempotent.
How would you make this method idempotent?

A method that changes something based on the user's input, as yours seems to do, cannot be made idempotent. Idempotence would mean that calling deposit(10) a hundred times has the same effect as calling it once, which would not make any sense for a deposit.

You can do it by passing the initial state that you plan to change. Give each modification a unique id and pass the id of the state with respect to which you are making the change. Since the id increases (or changes in some way) after each change, subsequent requests made against the old id do nothing, because the object is no longer in the state with that id. Think of this as a blanket "check in" operation in a CMS: files that haven't changed do not need to be checked in, and after the initial check-in of the files that did change, subsequent blanket check-ins do nothing.
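The state-id idea above can be sketched roughly as follows. This is only an illustration in Python, not the asker's code: the version field, the expected_version parameter, and the boolean return value are all assumptions introduced here.

```python
class AccountManager:
    """Toy account whose deposit is idempotent per state version."""

    def __init__(self) -> None:
        self.account_total = 0
        self.version = 0  # hypothetical state id, bumped on every applied change

    def deposit(self, amount: int, expected_version: int) -> bool:
        """Apply the deposit only if the caller saw the current state."""
        if expected_version != self.version:
            # The account already moved past that state, so this is a
            # repeated (or stale) request and must not change anything.
            return False
        self.account_total += amount
        self.version += 1
        return True


manager = AccountManager()
seen = manager.version         # the caller reads the state id first
manager.deposit(10, seen)      # applied
manager.deposit(10, seen)      # retry with the same id: ignored
print(manager.account_total)   # 10
```

A caller that really wants to deposit twice has to read the new state id before the second call, which is what distinguishes an intentional second deposit from a retry.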

Related

How does django decide which transaction to choose on transaction.on_commit()

I had to use transaction.on_commit() for synchronous behaviour in one of the signals of my project. Though it works fine, I couldn't understand how transaction.on_commit() decides which transaction to use. There can be multiple transactions open at the same time, so how does Django know which one transaction.on_commit() refers to?

According to the docs
You can also wrap your function in a lambda:
transaction.on_commit(lambda: some_celery_task.delay('arg1'))
The function you pass in will be called immediately after a hypothetical database write made where on_commit() is called would be successfully committed.
If you call on_commit() while there isn’t an active transaction, the callback will be executed immediately.
If that hypothetical database write is instead rolled back (typically when an unhandled exception is raised in an atomic() block), your function will be discarded and never called.
If you are using it in a post_save handler with sender=SomeModel, the on_commit is probably executed each time a SomeModel object is saved. Without the actual code we cannot tell the exact case.

If I understand the question correctly, I think the docs on Savepoints explain this.
Essentially, you can nest any number of transactions, but on_commit() callbacks are only called after the topmost one commits. However, an on_commit() that's nested within a savepoint will only be called if that savepoint was committed and all the ones above it are committed. So, it's tied to whichever one is currently open at the point it's called.
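The nesting rules described above can be modelled with a small toy class. This is a sketch in plain Python and not Django's implementation: ToyConnection, its depth counter, and its callback list are hypothetical. In Django itself, on_commit() registers the callback on the database connection selected by its using argument (the default connection if omitted), so the callback follows whatever transaction is open on that connection.

```python
from contextlib import contextmanager


class ToyConnection:
    """Toy stand-in for one database connection (not Django's code)."""

    def __init__(self):
        self.depth = 0       # how many nested atomic blocks are open
        self.callbacks = []  # pairs of (depth the callback belongs to, func)

    def on_commit(self, func):
        if self.depth == 0:
            func()  # no transaction open: run immediately
        else:
            self.callbacks.append((self.depth, func))

    @contextmanager
    def atomic(self):
        self.depth += 1
        level = self.depth
        try:
            yield
        except Exception:
            # Rollback: drop callbacks registered in this block or deeper.
            self.callbacks = [(d, f) for d, f in self.callbacks if d < level]
            self.depth -= 1
            raise
        self.depth -= 1
        if self.depth == 0:
            # Outermost commit: run whatever survived, in registration order.
            pending, self.callbacks = self.callbacks, []
            for _, func in pending:
                func()
        else:
            # Inner commit: surviving callbacks now belong to the parent block.
            self.callbacks = [(min(d, self.depth), f) for d, f in self.callbacks]


conn = ToyConnection()
log = []
with conn.atomic():
    conn.on_commit(lambda: log.append("outer"))
    try:
        with conn.atomic():
            conn.on_commit(lambda: log.append("inner"))
            raise ValueError("inner block fails")
    except ValueError:
        pass  # inner savepoint rolled back; outer block continues
print(log)  # ['outer']
```

The callback registered inside the failed inner block is discarded, while the one registered in the outer block runs once the outermost block exits.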

Backing up a running rocksdb-instance

I would like to backup a running rocksdb-instance to a location on the same disk in a way that is safe, and without interrupting processing during the backup.
I have read:
Rocksdb Backup Instructions
Checkpoints Documentation
Documentation in rocksdb/utilities/{checkpoint.h,backupable_db.{h,cc}}
My question is whether CreateNewBackupWithMetadata is marked as NOT thread-safe to express that two concurrent calls to this function have unsafe behavior, or to indicate that ANY concurrent call on the database is unsafe. I have checked the implementation, which appears to create a checkpoint (which the second article claims is used for online backups of MyRocks), but I am still unsure which part of the call is not thread-safe.
I currently interpret this as follows: it is unsafe because CreateBackup... calls DisableFileDeletions and later EnableFileDeletions, which may of course cause trouble if two overlapping calls are made. Since the SST files are immutable, I am not worried about them, but I am unsure whether modifying the WAL through insertions can corrupt the backup. I would assume that triggering a flush on backup should prevent this, but I would like to be sure.
Any pointers or help are appreciated.

I ended up looking into the implementation way deeper, and here is what I found:
Recall a rocksdb database consists of Memtables, SSTs and a single WAL, which protects data in the Memtables against crashes.
When you call rocksdb::BackupEngine::CreateNewBackupWithMetadata, no lock is taken internally, so the call can race if two calls are active at the same time. Most notably, the call invokes DisableFileDeletions and later EnableFileDeletions, which spells doom for one call if another call issues them while the first is still active.
The process of copying the files from the database to the backup is protected from modifications while the call is active by creating a rocksdb::Checkpoint, which, if flush_before_backup was set to true, will first flush the Memtables, thus clearing the active WAL.
Internally the call to CreateCustomCheckpoint calls DB::GetLiveFiles in db_filecheckpoint.cc. GetLiveFiles takes the global database lock (_mutex), optionally flushes the Memtables, and retrieves the list of SSTs. If a flush in GetLiveFiles happens while holding the global database-lock, the WAL must be empty at this time, which means the list should always contain the SST-files representing a complete and consistent database state from the time of the checkpoint. Since the SSTs are immutable, and since file deletion through compaction is turned off by the backup-call, you should always get a complete backup without holding writes on the database. However this, of course, means it is not possible to determine the exact last write/sequence number in the backup when concurrent updates happen - at least not without inspecting the backup after it has been created.
For the non-flushing version, there may be WAL files, which are retrieved in a different call than GetLiveFiles with no lock held in between; that is, they are not necessarily consistent. I did not investigate further, since the non-flushing case was not applicable to my use.

Dirtied plug still has the old value during setDependentsDirty()

When I dirty an input plug for example mFileAttr, the setDependentsDirty() gets properly invoked, but the value of fileName plug is still the old value! I only see it getting updated once it goes through compute(). How can I access the new value in setDependentsDirty() function since it's indeed triggered by the plug value update?
MStatus FNode::setDependentsDirty(const MPlug& plug, MPlugArray& plugArray)
{
    if (plug == mFileAttr)
    {
        MPlug fileNamePlug(thisMObject(), plug);
        MString fileName = fileNamePlug.asString();
    }
    return MPxNode::setDependentsDirty(plug, plugArray);
}
Edit:
Just to clarify, reading plug value itself, plug.asString(), it still holds the old value.

If you take a close look at the docs, you will see why you are not getting the updated value:
"IMPORTANT NOTE: since the setDependentsDirty() method is called during dirty propagation, you must be careful not to perform any dependency graph computations from within the routine. Instead, if you want to know the value of a plug, use MDataBlock::outputValue() because it will not result in computation (and thus recursion). In general, the majority of setDependentsDirty() methods which users will implement should involve only fixed relationships. In the rare occurrence where you need to look at plug values, please heed the warning with MDataBlock::outputValue() and use plugs which contain values which you know to be up to date prior to the start of dirty propagation."

Consistently using the value of "now" throughout the transaction

I'm looking for guidelines to using a consistent value of the current date and time throughout a transaction.
By transaction I loosely mean an application service method, such methods usually execute a single SQL transaction, at least in my applications.
Ambient Context
One approach described in answers to this question is to put the current date in an ambient context, e.g. DateTimeProvider, and use that instead of DateTime.UtcNow everywhere.
However the purpose of this approach is only to make the design unit-testable, whereas I also want to prevent errors caused by unnecessary multiple querying into DateTime.UtcNow, an example of which is this:
// In an entity constructor:
this.CreatedAt = DateTime.UtcNow;
this.ModifiedAt = DateTime.UtcNow;
This code creates an entity with slightly differing created and modified dates, whereas one expects these properties to be equal right after the entity was created.
Also, an ambient context is difficult to implement correctly in a web application, so I've come up with an alternative approach:
Method Injection + DeterministicTimeProvider
The DeterministicTimeProvider class is registered as an "instance per lifetime scope" AKA "instance per HTTP request in a web app" dependency.
It is constructor-injected to an application service and passed into constructors and methods of entities.
The IDateTimeProvider.UtcNow method is used instead of the usual DateTime.UtcNow / DateTimeOffset.UtcNow everywhere to get the current date and time.
Here is the implementation:
/// <summary>
/// Provides the current date and time.
/// The provided value is fixed when it is requested for the first time.
/// </summary>
public class DeterministicTimeProvider : IDateTimeProvider
{
    private readonly Lazy<DateTimeOffset> _lazyUtcNow =
        new Lazy<DateTimeOffset>(() => DateTimeOffset.UtcNow);

    /// <summary>
    /// Gets the current date and time in the UTC time zone.
    /// </summary>
    public DateTimeOffset UtcNow => _lazyUtcNow.Value;
}
Is this a good approach? What are the disadvantages? Are there better alternatives?

Sorry for the logical fallacy of appeal to authority here, but this is rather interesting:
John Carmack once said:
"There are four principle inputs to a game: keystrokes, mouse moves, network packets, and time. (If you don't consider time an input value, think about it until you do -- it is an important concept)"
Source: John Carmack's .plan posts from 1998 (scribd)
(I have always found this quote highly amusing, because the suggestion that if something does not seem right to you, you should think of it really hard until it seems right, is something that only a major geek would say.)
So, here is an idea: consider time as an input. It is probably not included in the xml that makes up the web service request, (you wouldn't want it to anyway,) but in the handler where you convert the xml to an actual request object, obtain the current time and make it part of your request object.
So, as the request object is being passed around your system during the course of processing the transaction, the time to be considered as "the current time" can always be found within the request. So, it is not "the current time" anymore, it is the request time. (The fact that it will be one and the same, or very close to one and the same, is completely irrelevant.)
This way, testing also becomes even easier: you don't have to mock the time provider interface, the time is always in the input parameters.
Also, this way, other fun things become possible, for example servicing requests to be applied retroactively, at a moment in time which is completely unrelated to the actual current moment in time. Think of the possibilities. (Picture of SpongeBob SquarePants with a rainbow goes here.)
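A minimal sketch of the "time as an input" idea, written in Python for brevity rather than in the question's C#. The Request and Entity types, the received_at field, and the parse_request and create_entity helpers are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Request:
    """Hypothetical request object; received_at is captured once, at the edge."""
    payload: str
    received_at: datetime


@dataclass
class Entity:
    name: str
    created_at: datetime
    modified_at: datetime


def parse_request(raw: str) -> Request:
    # The handler reads the clock exactly once and stores it on the request.
    return Request(payload=raw, received_at=datetime.now(timezone.utc))


def create_entity(request: Request) -> Entity:
    # Everything downstream uses the request time, never the system clock.
    return Entity(request.payload, request.received_at, request.received_at)


entity = create_entity(parse_request("new-account"))
print(entity.created_at == entity.modified_at)  # True

# In a test, or for a retroactive request, the time is just another input.
past = Request("backdated", datetime(1998, 1, 1, tzinfo=timezone.utc))
print(create_entity(past).created_at.year)  # 1998
```

Because the request is immutable here, nothing further down the call chain can accidentally refresh the time in the middle of the transaction.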

Hmmm.. this feels like a better question for CodeReview.SE than for StackOverflow, but sure - I'll bite.
Is this a good approach?
If used correctly, in the scenario you described, this approach is reasonable. It achieves the two stated goals:
Making your code more testable. This is a common pattern I call "Mock the Clock", and is found in many well-designed apps.
Locking the time to a single value. This is less common, but your code does achieve that goal.
What are the disadvantages?
Since you are creating another new object for each request, it will create a mild amount of additional memory usage and additional work for the garbage collector. This is somewhat of a moot point since this is usually how it goes for all objects with per-request lifetime, including the controllers.
There is a tiny fraction of time being added before you take the reading from the clock, caused by the additional work being done in loading the object and from doing lazy loading. It's negligible though - probably on the order of a few milliseconds.
Since the value is locked down, there's always the risk that you (or another developer who uses your code) might introduce a subtle bug by forgetting that the value won't change until the next request. You might consider a different naming convention. For example, instead of "now", call it "requestReceivedTime" or something like that.
Similar to the previous item, there's also the risk that your provider might be loaded with the wrong lifecycle. You might use it in a new project and forget to set the instancing, loading it up as a singleton. Then the values are locked down for all requests. There's not much you can do to enforce this, so be sure to comment it well. The <summary> tag is a good place.
You may find you need the current time in a scenario where constructor injection isn't possible - such as a static method. You'll either have to refactor to use instance methods, or will have to pass either the time or the time-provider as a parameter into the static method.
Are there better alternatives?
Yes, see Mike's answer.
You might also consider Noda Time, which has a similar concept built in, via the IClock interface, and the SystemClock and FakeClock implementations. However, both of those implementations are designed to be singletons. They help with testing, but they don't achieve your second goal of locking the time down to a single value per request. You could always write an implementation that does that though.

Code looks reasonable.
Drawback - the lifetime of the object will most likely be controlled by the DI container, so a user of the provider can't be sure that it will always be configured correctly (per invocation, and not with a longer lifetime such as app/singleton).
If you have a type representing a "transaction", it may be better to put the "Started" time there instead.

This isn't something that can be answered with a realtime clock and a query, or by testing. The developer may have figured out some obscure way of reaching the underlying library call...
So don't do that. Dependency injection also won't save you here; the issue is that you want a standard pattern for time at the start of the 'session.'
In my view, the fundamental problem is that you are expressing an idea and looking for a mechanism for it. The right mechanism is to name it, say what you mean in the name, and then set it only once. readonly is a good way to handle setting this only once in the constructor, and it lets the compiler and runtime enforce what you mean, which is that it is set only once.
// In an entity constructor:
this.CreatedAt = DateTime.UtcNow;

Handling PL/pgSQL function's message in C++ code

I am running a calculation in a PL/pgSQL function and I want to use the result of that calculation in my C++ code. What's the best way to do that?
I can insert that result into a table and use it from there, but I'm not sure how well that fares with best practices. Also, I can send a message to stderr with RAISE NOTICE, but I don't know whether I can use that message in my code.

The details here are a bit thin on the ground, so it's hard to say for sure.
Strongly preferable whenever possible is to just get the function's return value directly: SELECT my_function(args); if it returns a single result, or SELECT * FROM my_function(args); if it returns a row or set of rows. Then process the result like any other query result. This is part of the basic use of simple SQL and PL/pgSQL functions.
Other options include:
Returning a refcursor. This can be useful in some circumstances where you want to return a dynamic result set or multiple result sets, though it's now mostly superseded by RETURN QUERY and RETURN QUERY EXECUTE. You then FETCH from the refcursor to get the result rows.
LISTENing for an event and having the function NOTIFY when the work is done, possibly with the result as a notify payload. This is useful when the function isn't necessarily called on the same connection as the program that wants to use its results.
Create a temporary table in the function, then SELECT from the table from the session that called the function.
Emitting log messages via RAISE and setting client_min_messages so you receive them, then processing them. This is a very ugly way to do it and should really be avoided at all costs.
INSERTing the results into an existing non-temporary table, then SELECTing them out once the transaction commits and the rows become visible to other transactions.
Which is better? It depends entirely on what you're trying to do. In almost all cases the correct thing to do is just call the function and process the return value, but there are exceptions in special cases.
