How to retrieve the new integer data after completion of a call to Document().Set(MapFieldValue{string_key, Increment()}) in Firestore C++ - c++

I'm attempting to atomically increment integer values in Firestore and read the value on the client side after the Set() operation is complete on the server side, with the guarantee that another Set() call won't overwrite the value on the server before the value is retrieved for the client. Without this guarantee it seems that there could be a chance of unwanted data duplication.
However, I can't seem to find a means to guarantee this in my code.
Here's what I've written so far:
doc_ref.Get().OnCompletion([this, ref_name](const firebase::Future<firebase::firestore::DocumentSnapshot>& future) {
    if (future.error() == 0) {
        const firebase::firestore::DocumentSnapshot& document = *future.result();
        // MapFieldValue needs an outer brace for the map and an inner brace for each key/value pair.
        db->Collection("collection_name").Document(ref_name).Set({{ ref_name, firebase::firestore::FieldValue::Increment(1) }})
            .OnCompletion([this, ref_name](const firebase::Future<void>& void_future) {
                dr_uc.Get().OnCompletion([this, ref_name](const firebase::Future<firebase::firestore::DocumentSnapshot>& new_future) {
                    if (new_future.error() == 0) {
                        const firebase::firestore::DocumentSnapshot& new_document = *new_future.result();
                        int user_name_count = new_document.Get(ref_name).integer_value();
                    }
                });
            });
    }
});
The call to firebase::firestore::FieldValue::Increment(1) guarantees that the integer value will be updated atomically on the server side, but the subsequent Get() call doesn't appear to guarantee that the value it reads has not been changed by another Set() call in the meantime.
Is there some means to provide this guarantee using Firebase's Firestore?

The increment operator does nothing more than ensure the increment happens atomically on the server, and does not involve any transfer of values to/from the client.
If you want full control over the order of the operations, including the read, you probably want to use a transaction to accomplish that.

Related

Using a lock in C++ across multiple tasks

I am not really seeking code examples, but I'm hoping someone can review my program design and provide feedback. I am trying to figure out how to ensure that only one instance of my "workflow" is running at a time.
I am working in C++.
This is my workflow:
1. I read rows off of a Postgres database.
2. If the table has any records, I want to do these instructions:
   - Read the records and transform them to JSON
   - Send the JSON document to a remote Web service
   - Parse the response from the service. The service tells me which records were saved or not saved, based on their primary key.
   - I delete the successfully saved records
   - I log the unsuccessful records (there's another process that consumes the logs and so my work is done).
I want to perform all of these steps using a separate thread (or "task", whatever higher-level abstraction is available in C++), and I want to make sure that if my function for [1] gets called multiple times, the additional calls basically get "dropped" if step 1 is already in flight.
In C++, I believe I can use a flag and a mutex. I use something like std::lock_guard<std::mutex> at the top of my method. Then the next line checks for a flag.
// MyWorkflow.cpp
#include <mutex>
#include <string>
#include <vector>

std::mutex myMutex;
int inFlight = 0;

void process() {
    std::lock_guard<std::mutex> guard(myMutex);
    if (inFlight) {
        return;
    }
    inFlight = 1;
    std::vector<Widget> widgets = readFromMyTable();
    std::string json = getJson(&widgets);
    ... // Send the json to the remote service and handle the response
}
Okay, let me explain my confusion. I want to use Curl to perform the HTTP request. But Curl works asynchronously. And so if I make the asynchronous HTTP call via Curl, my update function will just return and myMutex will be released, right?
I think in my asynchronous response handler, I need to call a second function that's in MyWorkflow.cpp
void markCompletion() {
    std::lock_guard<std::mutex> guard(myMutex);
    inFlight = 0; // Reset the inflight flag here
}
Is this the right approach? I am worried that if an exception is thrown anywhere before I call markCompletion(), I will block all future callers. I think I need to ensure I have proper exception handling and always call markCompletion().
I am terribly sorry for asking such a noob question, but I really want to learn to do this the right way.
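One way to address the exception concern above is to tie the flag to an object's lifetime, so it is cleared whether the work finishes, throws, or is handed to an asynchronous completion handler. The following is only a minimal sketch under assumptions: InFlightToken, AsyncSend, Completion and demo() are hypothetical names, the flag is a std::atomic<bool> instead of the mutex-protected int above, and sendAsync stands in for whatever asynchronous HTTP call is actually used.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>

std::atomic<bool> inFlight{false};

// Hypothetical helper: claims the in-flight flag and clears it on destruction.
class InFlightToken {
public:
    explicit InFlightToken(std::atomic<bool>& flag) : flag_(flag) {}
    InFlightToken(const InFlightToken&) = delete;
    InFlightToken& operator=(const InFlightToken&) = delete;
    ~InFlightToken() {
        if (owns_) flag_.store(false);
    }
    // Returns true only for the single caller that flips the flag from false to true.
    bool tryAcquire() {
        bool expected = false;
        owns_ = flag_.compare_exchange_strong(expected, true);
        return owns_;
    }
private:
    std::atomic<bool>& flag_;
    bool owns_ = false;
};

using Completion = std::function<void()>;
// Stand-in (assumption) for the asynchronous HTTP request: it receives the completion handler.
using AsyncSend = std::function<void(Completion)>;

// Returns false when the call was dropped because a run is already in flight.
bool process(const AsyncSend& sendAsync) {
    auto token = std::make_shared<InFlightToken>(inFlight);
    if (!token->tryAcquire()) {
        return false;
    }
    // Reading rows and building the JSON would go here. If anything throws,
    // the token is destroyed during unwinding and the flag is cleared.
    sendAsync([token] {
        // Response handling would go here. The flag is cleared when the last
        // copy of this handler (and with it the token) is destroyed.
    });
    return true;
}

// Exercises the sketch with a fake sender that stores the handler for later.
void demo() {
    Completion pending;
    AsyncSend fakeSend = [&pending](Completion onDone) { pending = std::move(onDone); };

    assert(process(fakeSend));    // first call wins the claim
    assert(!process(fakeSend));   // second call is dropped while in flight
    pending();                    // the "response" arrives
    pending = nullptr;            // handler destroyed, flag cleared
    assert(!inFlight.load());

    bool threw = false;
    try {
        process([](Completion) { throw std::runtime_error("send failed"); });
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
    assert(!inFlight.load());     // the flag is cleared even though process() threw
}
```

Because the token is held by a shared_ptr captured in the completion handler, there is no separate markCompletion() call to forget. The trade-off is that the flag stays set for as long as any copy of the handler is alive.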

Pagination in Dynamo DB Results with Completable Future

I am querying Dynamo DB for a given primary key. Primary Key consists of two UUID fields (fieldUUID1, fieldUUID2).
I have a lot of queries to execute for the above primary key combination with a list of values. For this I am using asynchronous CompletableFuture with an ExecutorService backed by a thread pool of size 4.
After all the queries return results, each a CompletableFuture<Object>, I join them using the allOf method of CompletableFuture, which ensures that all the query executions are complete and gives me a CompletableFuture<Void>, from which I obtain a CompletableFuture<List<Object>> using a stream.
If some of the queries return a paginated result, i.e. return a lastEvaluatedKey, there is no way for me to know which query request returned it.
If I do a .get() call when I receive the CompletableFuture, this will be a blocking operation, which defeats the purpose of using asynchronous calls. Is there a way I can handle this scenario?
Example:
I can try the thenCompose method, but how do I know at what point I need to stop when lastEvaluatedKey is absent?
for (final QueryRequest queryRequest : queryRequests) {
    final CompletableFuture<QueryResult> futureResult =
        CompletableFuture.supplyAsync(() ->
            dynamoDBClient.query(queryRequest), executorService);
    if (futureResult == null) {
        continue;
    }
    futures.add(futureResult);
}
// Wait for completion of all of the Futures provided
final CompletableFuture<Void> allfuture = CompletableFuture
    .allOf(futures.toArray(new CompletableFuture[futures.size()]));
// The return type of the CompletableFuture.allOf() is a
// CompletableFuture<Void>. The limitation of this method is that it does not
// return the combined results of all Futures. Instead we have to manually get
// results from Futures. CompletableFuture.join() method and Java 8 Streams API
// makes it simple:
final CompletableFuture<List<QueryResult>> allFutureList = allfuture.thenApply(val -> {
    return futures.stream().map(f -> f.join()).collect(Collectors.toList());
});
final List<QueryOutcome> completableResults = new ArrayList<>();
try {
    try {
        // at this point all the Futures should be done, because we already executed
        // CompletableFuture.allOf method.
        final List<QueryResult> returnedResult = allFutureList.get();
        for (final QueryResult queryResult : returnedResult) {
            if (MapUtils.isNotEmpty(queryResult.getLastEvaluatedKey())) {
                // how to get hold of original request and include last evaluated key ?
            }
        }
    } finally {
    }
} finally {
}
I can rely on .get() method, but it will be a blocking call.
The quick solution to your need is to change your futures list. Instead of storing CompletableFuture<QueryResult>, you can store CompletableFuture<RequestAndResult>, where RequestAndResult is a simple data class holding a QueryRequest and a QueryResult. To do that you need to change your first loop.
Then, once the allfuture completes you can iterate over futures and get access to both the requests and the results.
However, there is a deeper issue here. What are you planning to do once you have access to the original QueryRequest? My guess is that you want to issue a follow-up request with exclusiveStartKey set to whatever the response's lastEvaluatedKey holds. This means that you will wait for all original queries to complete and only then issue the next bunch. This is inefficient: if a query returned with a lastEvaluatedKey, you want to issue its follow-up query ASAP.
To achieve this, my advice is to introduce a new method which takes a single QueryRequest object and returns a CompletableFuture<QueryResult>. Its implementation will be roughly as follows:
1. Issue a query with the given request.
2. Once the result arrives, check it. If its lastEvaluatedKey is empty, return it as the result of the method.
3. Otherwise, update request.exclusiveStartKey and go back to the first step.
Yes, it's a bit harder to do that with CompletableFutures (compared to blocking code), but it is totally doable.
Once you have that method, your code needs to call it once for each of the requests in queryRequests, put the returned CompletableFutures in a list, and do a CompletableFuture.allOf() on that list. Once the allOf future completes you can just use the results, with no need to issue follow-up queries.

DynamoDB concurrent write

I have an existing DynamoDB table which has attributes say
---------------------------------------------------------
hk(hash-key)| rk(range-key)| a1 | a2 | a3 |
---------------------------------------------------------
I have an existing DynamoDB client which only updates a1 on an existing record. I want to create a second writer (DDB client) which will also update an existing record, but for a2 and a3 only. If both DDB clients try to update the same record (one for a1 and the other for a2 and a3) at the exact same time, will DynamoDB guarantee that a1, a2 and a3 are all updated with the correct (new) values? Is using the save behavior UPDATE_SKIP_NULL_ATTRIBUTES sufficient for this purpose, or do I need to implement some kind of optimistic locking?
If not, is there something that DDB provides on the fly for this purpose?
If you happen to be using the Dynamo Java SDK you are in luck, because the SDK supports just that with Optimistic Locking. I'm not sure if the other SDKs support anything similar; I suspect they do not.
Optimistic locking is a strategy to ensure that the client-side item
that you are updating (or deleting) is the same as the item in
DynamoDB. If you use this strategy, then your database writes are
protected from being overwritten by the writes of others — and
vice-versa.
Consider using this distributed locking library: https://www.npmjs.com/package/dynamodb-lock-client. Here is the sample code we use in our codebase:
const DynamoDBLockClient = require('dynamodb-lock-client');
const PARTITION_KEY = 'id';
const HEARTBEAT_PERIOD_MS = 3e3;
const LEASE_DURATION_MS = 1e4;
const RETRY_COUNT = 1e2;
function dynamoLock(dynamodb, lockKey, callback) {
    const failOpenClient = new DynamoDBLockClient.FailOpen({
        dynamodb,
        lockTable: process.env.LOCK_STORE_TABLE,// replace this with your own lock store table
        partitionKey: PARTITION_KEY,
        heartbeatPeriodMs: HEARTBEAT_PERIOD_MS,
        leaseDurationMs: LEASE_DURATION_MS,
        retryCount: RETRY_COUNT,
    });
    return new Promise((resolve, reject) => {
        let error;
        // Locking required as several lambda instances may attempt to update the table at the same time and
        // we do not want to get lost updates.
        failOpenClient.acquireLock(lockKey, async (lockError, lock) => {
            if (lockError) {
                return reject(lockError);
            }
            let result = null;
            try {
                result = await callback(lock);
            } catch (callbackError) {
                error = callbackError;
            }
            return lock.release((releaseError) => {
                if (releaseError || error) {
                    return reject(releaseError || error);
                }
                return resolve(result);
            });
        });
    });
}
async function doStuff(id) {
    await dynamoLock(dynamodb, `Lock-DataReset-${id}`, async () => {
        // do your ddb stuff here
    });
}
Reads from DynamoDB are eventually consistent by default.
See this: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html
DynamoDB supports eventually consistent and strongly consistent reads.
Eventually Consistent Reads
When you read data from a DynamoDB table, the response might not
reflect the results of a recently completed write operation. The
response might include some stale data. If you repeat your read
request after a short time, the response should return the latest
data.
Strongly Consistent Reads
When you request a strongly consistent read, DynamoDB returns a
response with the most up-to-date data, reflecting the updates from
all prior write operations that were successful. A strongly consistent
read might not be available if there is a network delay or outage.
Note DynamoDB uses eventually consistent reads, unless you specify
otherwise. Read operations (such as GetItem, Query, and Scan) provide
a ConsistentRead parameter. If you set this parameter to true,
DynamoDB uses strongly consistent reads during the operation.
Basically, you have to specify that you need strongly consistent data when you read.
That should solve your problem: with consistent reads you should see the updates to all three fields.
Do note that there are pricing impacts for strongly consistent reads.

Querying a growing data-set

We have a data set that grows while the application is processing the data set. After a long discussion we have come to the decision that we do not want blocking or asynchronous APIs at this time, and we will periodically query our data store.
We thought of two options to design an API for querying our storage:
A query method returns a snapshot of the data and a flag indicating whether we might have more data. When we finish iterating over the last returned snapshot, we query again to get another snapshot for the rest of the data.
A query method returns a "live" iterator over the data, and when this iterator advances it returns one of the following options: Data is available, No more data, Might have more data.
We are using C++ and we borrowed the .NET style enumerator API for reasons which are out of scope for this question. Here is some code to demonstrate the two options. Which option would you prefer?
/* ======== FIRST OPTION ============== */
// similar to the familiar .NET enumerator.
class IFooEnumerator
{
public:
    // true --> A data element may be accessed using the Current() method
    // false --> End of sequence. Calling Current() is an invalid operation.
    virtual bool MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IFooEnumerator() {}
};
enum class Availability
{
    EndOfData,
    MightHaveMoreData,
};
class IDataProvider
{
public:
    // Query params allow specifying the ID of the starting element. Here is the intended usage pattern:
    // 1. Call GetFoo() without specifying a starting point.
    // 2. Process all elements returned by IFooEnumerator until it ends.
    // 3. Check the availability.
    // 3.1 MightHaveMoreData --> Invoke GetFoo() again after some time by specifying the last processed element as the starting point
    //     and repeat steps (2) and (3)
    // 3.2 EndOfData --> The data set will not grow any more and we know that we have finished processing.
    virtual std::tuple<std::unique_ptr<IFooEnumerator>, Availability> GetFoo(query-params) = 0;
};
/* ====== SECOND OPTION ====== */
enum class Availability
{
    HasData,
    MightHaveMoreData,
    EndOfData,
};
class IGrowingFooEnumerator
{
public:
    // HasData:
    //   We might access the current data element by invoking Current()
    // EndOfData:
    //   The data set has finished growing and no more data elements will arrive later
    // MightHaveMoreData:
    //   The data set will grow and we need to continue calling MoveNext() periodically (preferably after a short delay)
    //   until we get a "HasData" or "EndOfData" result.
    virtual Availability MoveNext() = 0;
    virtual Foo Current() const = 0;
    virtual ~IGrowingFooEnumerator() {}
};
class IDataProvider
{
public:
    virtual std::unique_ptr<IGrowingFooEnumerator> GetFoo(query-params) = 0;
};
Update
Given the current answers, I have some clarification. The debate is mainly over the interface - its expressiveness and intuitiveness in representing queries for a growing data-set that at some point in time will stop growing. The implementation of both interfaces is possible without race conditions (at least we believe so) because of the following properties:
The 1st option can be implemented correctly if the pair of the iterator + the flag represent a snapshot of the system at the time of querying. Getting snapshot semantics is a non-issue, as we use database transactions.
The 2nd option can be implemented given a correct implementation of the 1st option. The "MoveNext()" of the 2nd option will, internally, use something like the 1st option and re-issue the query if needed.
The data-set can change from "Might have more data" to "End of data", but not vice versa. So if we, wrongly, return "Might have more data" because of a race condition, we just get a small performance overhead because we need to query again, and the next time we will receive "End of data".
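To make the second property concrete, here is a minimal sketch of how a "growing" enumerator could be layered on a snapshot-style query. Everything in it is an assumption made for illustration: Foo is aliased to int, FooStore is an in-memory stand-in for the real data store, SnapshotState plays the role of the first option's flag, and the class is concrete instead of implementing the interfaces above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Foo = int;  // placeholder element type (assumption)

enum class SnapshotState { EndOfData, MightHaveMoreData };          // flag of the 1st option
enum class Availability { HasData, MightHaveMoreData, EndOfData };  // result of the 2nd option

// Hypothetical in-memory stand-in for the data store.
class FooStore {
public:
    void Append(Foo f) { items_.push_back(f); }
    void Close() { closed_ = true; }  // the data set stops growing

    // Snapshot query: elements at index >= start together with the flag,
    // both taken at the same moment.
    std::pair<std::vector<Foo>, SnapshotState> Query(std::size_t start) const {
        std::vector<Foo> out;
        for (std::size_t i = start; i < items_.size(); ++i) out.push_back(items_[i]);
        return {out, closed_ ? SnapshotState::EndOfData : SnapshotState::MightHaveMoreData};
    }

private:
    std::vector<Foo> items_;
    bool closed_ = false;
};

// 2nd-option style enumerator that re-issues the snapshot query when its buffer runs out.
class GrowingFooEnumerator {
public:
    explicit GrowingFooEnumerator(const FooStore& store) : store_(store) {}

    Availability MoveNext() {
        hasCurrent_ = false;
        if (next_ == buffer_.size()) {
            // The last snapshot is used up. If it was flagged EndOfData, we are done.
            if (state_ == SnapshotState::EndOfData) return Availability::EndOfData;
            auto result = store_.Query(consumed_);
            buffer_ = std::move(result.first);
            state_ = result.second;
            next_ = 0;
            if (buffer_.empty()) {
                return state_ == SnapshotState::EndOfData ? Availability::EndOfData
                                                          : Availability::MightHaveMoreData;
            }
        }
        current_ = buffer_[next_++];
        ++consumed_;
        hasCurrent_ = true;
        return Availability::HasData;
    }

    Foo Current() const {
        assert(hasCurrent_);  // only valid after MoveNext() returned HasData
        return current_;
    }

private:
    const FooStore& store_;
    std::vector<Foo> buffer_;
    std::size_t next_ = 0;      // next unread position in buffer_
    std::size_t consumed_ = 0;  // elements handed out so far, used as the next query start
    SnapshotState state_ = SnapshotState::MightHaveMoreData;
    Foo current_ = Foo();
    bool hasCurrent_ = false;
};
```

In this sketch a caller that receives MightHaveMoreData waits and calls MoveNext() again. A stale MightHaveMoreData only costs one extra query, which matches the third property above.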
"Invoke GetFoo() again after some time by specifying the last processed element as the starting point"
How are you planning to do that? If it's using the earlier-returned IFooEnumerator, then functionally the two options are equivalent. Otherwise, letting the caller destroy the "enumerator" then however-long afterwards call GetFoo() to continue iteration means you're losing your ability to monitor the client's ongoing interest in the query results. It might be that right now you have no need for that, but I think it's poor design to exclude the ability to track state throughout the overall result processing.
It really depends on many things whether the overall system will at all work (not going into details about your actual implementation):
No matter how you twist it, there will be a race condition between checking for "Is there more data" and more data being added to the system. Which means that it's possibly pointless to try to capture the last few data items?
You probably need to limit the number of repeated runs for "is there more data", or you could end up in an endless loop of "new data came in while processing the last lot".
How easy is it to know whether data has been updated? If all the updates are "new items" with new IDs that are sequentially higher, you can simply query "Is there data above X", where X is your last ID. But if you are, for example, counting how many items in the data have property Y set to value A, and data may be updated anywhere in the database at any time (e.g. a database of where taxis are at present, which gets updated via GPS every few seconds and has thousands of cars), it may be hard to determine which cars have had updates since the last time you read the database.
As to your implementation, in option 2, I'm not sure what you mean by the MightHaveMoreData state - either it has, or it hasn't, right? Repeated polling for more data is a bad design in this case - given that you will never be able to say 100% certain that there hasn't been "new data" provided in the time it took from fetching the last data until it was processed and acted on (displayed, used to buy shares on the stock market, stopped the train or whatever it is that you want to do once you have processed your new data).
A read-write lock could help. Many readers can access the data set simultaneously, but only one writer at a time.
The idea is simple:
- when you need read-only access, the reader takes a read lock, which can be shared with other readers and excludes writers;
- when you need write access, the writer takes a write lock, which excludes both readers and writers.
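In standard C++ (C++17 and later) this idea maps onto std::shared_mutex, with std::shared_lock for readers and std::unique_lock for writers. A minimal sketch, assuming a hypothetical SharedDataSet class that stores plain ints:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical container for the growing data set (illustration only).
class SharedDataSet {
public:
    // Writer: exclusive lock, blocks both readers and other writers.
    void Append(int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        items_.push_back(value);
    }

    // Reader: shared lock, may run concurrently with other readers.
    // Returns a copy of the elements at index >= from.
    std::vector<int> SnapshotFrom(std::size_t from) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        if (from >= items_.size()) return {};
        return std::vector<int>(items_.begin() + static_cast<std::ptrdiff_t>(from),
                                items_.end());
    }

private:
    mutable std::shared_mutex mutex_;
    std::vector<int> items_;
};

// Single-threaded walk-through of the API.
void demo() {
    SharedDataSet data;
    data.Append(10);
    data.Append(20);
    data.Append(30);
    assert(data.SnapshotFrom(1) == (std::vector<int>{20, 30}));
    assert(data.SnapshotFrom(3).empty());
}
```

Returning a copy keeps the lock held only for the duration of the copy, so readers never iterate over the container while a writer is appending.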

Asynchronous network calls

I made a class that has an asynchronous OpenWebPage() function. Once you call OpenWebPage(someUrl), a handler gets called - OnPageLoad(reply). I have been using a global variable called lastAction to take care of stuff once a page is loaded - handler checks what is the lastAction and calls an appropriate function. For example:
this->lastAction = "homepage";
this->OpenWebPage("http://www.hardwarebase.net");
void OnPageLoad(reply)
{
    if(this->lastAction == "homepage")
    {
        this->lastAction = "login";
        this->Login(); // POSTs a form and OnPageLoad gets called again
    }
    else if(this->lastAction == "login")
    {
        this->PostLogin(); // Checks did we log in properly, sets lastAction as new topic and goes to new topic URL
    }
    else if(this->lastAction == "new topic")
    {
        this->WriteTopic(); // Does some more stuff ... you get the point
    }
}
Now, this is rather hard to write and keep track of when we have a large number of "actions". When I was doing stuff in Python (synchronously) it was much easier, like:
OpenWebPage("http://hardwarebase.net") // Stores the loaded page HTML in self.page
OpenWebpage("http://hardwarebase.net/login", {"user": username, "pw": password}) // POSTs a form
if(self.page == ...): // now do some more checks etc.
// do something more
Imagine now that I have a queue class which holds the actions: homepage, login, new topic. How am I supposed to execute all those actions (in proper order, one after one!) via the asynchronous callback? The first example is totally hard-coded obviously.
I hope you understand my question, because frankly I fear this is the worst question ever written :x
P.S. All this is done in Qt.
You are inviting all manner of bugs if you try and use a single member variable to maintain state for an arbitrary number of asynchronous operations, which is what you describe above. There is no way for you to determine the order that the OpenWebPage calls complete, so there's also no way to associate the value of lastAction at any given time with any specific operation.
There are a number of ways to solve this, e.g.:
Encapsulate web page loading in an immutable class that processes one page per instance
Return an object from OpenWebPage which tracks progress and stores the operation's state
Fire a signal when an operation completes and attach the operation's context to the signal
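As a rough illustration of attaching the context to each operation, the sketch below keeps a queue of steps where every step carries its own handler, and the next step starts only after the previous reply has been handled. It is plain C++ without Qt, and all names (Step, ActionQueue, Loader, Reply, demo) are hypothetical. The loader in demo() calls back immediately, whereas a real asynchronous loader would call back later from the event loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Reply = std::string;  // placeholder for the real reply type (assumption)
using ReplyHandler = std::function<void(const Reply&)>;

// One queued action: where to go, and what to do with the reply for this step.
struct Step {
    std::string url;
    ReplyHandler onLoaded;
};

class ActionQueue {
public:
    // Stand-in for OpenWebPage: loads `url` and calls `done` with the reply.
    using Loader = std::function<void(const std::string& url, ReplyHandler done)>;

    explicit ActionQueue(Loader loader) : loader_(std::move(loader)) {}

    void Add(Step step) { steps_.push_back(std::move(step)); }
    void Start() { RunNext(); }

private:
    void RunNext() {
        if (steps_.empty()) return;
        Step step = std::move(steps_.front());
        steps_.pop_front();
        // The handler travels with the request, so no shared "lastAction" is needed.
        // The queue object must outlive any pending callback.
        loader_(step.url, [this, handler = std::move(step.onLoaded)](const Reply& reply) {
            handler(reply);
            RunNext();  // start the next step only after this one was handled
        });
    }

    Loader loader_;
    std::deque<Step> steps_;
};

// Runs two steps through a fake loader and checks the order of events.
void demo() {
    std::vector<std::string> log;
    ActionQueue queue([&log](const std::string& url, ReplyHandler done) {
        log.push_back("load " + url);
        done("reply for " + url);
    });
    queue.Add({"homepage", [&log](const Reply&) { log.push_back("handle homepage"); }});
    queue.Add({"login", [&log](const Reply&) { log.push_back("handle login"); }});
    queue.Start();
    assert((log == std::vector<std::string>{
        "load homepage", "handle homepage", "load login", "handle login"}));
}
```

In Qt the same shape could be expressed by connecting each request's completion signal to a slot or lambda that owns the step's context.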
You need to add a "return" statement at the end of every "if" branch: in your code, all "if" branches are executed in the first OnPageLoad call.
Generally, asynchronous state management is always more complicated than synchronous. Consider replacing the lastAction type with an enumeration. Also, if the OnPageLoad thread context is arbitrary, you need to synchronize access to global variables.