What is the pattern for Google Cloud Functions to implement a mutex?

I'm using HTTPS-triggered Google Cloud Functions to handle client requests that perform database writes. The data is structured in a way that most parallel writes will not result in corruption.
There are a few cases where I need to prevent multiple write actions from happening at once for the same item. What are the common patterns to lock access to some resource at the function level? I'm looking for some "mutex-like" functionality.
I was thinking of some external service that could grant or deny access to the resource for requesting function instances, but the connection overhead would be huge - a handshake each time, etc.
Added an example as requested. In this specific case, restructuring the data to keep track of updates isn't a suitable solution.
import * as admin from "firebase-admin";

function updateUserState(userId: string) {
  // Query the current state
  return admin
    .database()
    .ref()
    .child(`/users/${userId}/state`)
    .once("value")
    .then(snapshot => snapshot.val() || 0)
    // Perform some (asynchronous) operation on the state
    .then(currentState => modifyStateAsync(currentState))
    // Write the result back; the read and the write are not atomic, so
    // concurrent invocations of this function can overwrite each other here
    .then(newState =>
      admin
        .database()
        .ref()
        .child(`/users/${userId}/state`)
        .set(newState)
    );
}

This is not a pattern that you want to implement in Cloud Functions. Restricting the parallelism of Cloud Functions would limit its scalability, which is counter to the way Cloud Functions works.
If you have a database that needs to have some protection against concurrent access, you should be using the database's own transaction features. Pretty much every database that provides concurrent access to data also provides some ability to perform atomic transactions. Use these transactions, and let the serverless container scale up and down in the way it sees fit.
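For the code in the question, the Realtime Database's own transaction primitive provides exactly this protection. A minimal sketch using firebase-admin; note that a transaction's update function must be synchronous, so computeNextState below is a hypothetical synchronous stand-in for the question's modifyStateAsync:

import * as admin from "firebase-admin";

function updateUserState(userId: string) {
  // transaction() re-reads the current value and automatically retries the
  // update function if another client writes concurrently - no external lock needed
  return admin
    .database()
    .ref(`/users/${userId}/state`)
    .transaction(currentState => computeNextState(currentState || 0));
}

// Hypothetical synchronous replacement for modifyStateAsync; a transaction's
// update function must return the new value synchronously
function computeNextState(state: number): number {
  return state + 1;
}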

In the Google Cloud there is an elegant way to have a global distributed mutex for a critical section in a Cloud Function:
gcslock
This is a library written in Go, and hence available to Cloud Functions written in Go, that utilises the atomicity guarantees of the Google Cloud Storage service. This approach is apparently not available on AWS because the S3 service lacks such guarantees.
The tool is not applicable to every use case. Acquiring and releasing the lock are operations on the order of 10 ms, which might be too slow for high-speed processing use cases.
For a typical batch process that is not time critical, the tool provides a pretty interesting option for guaranteeing that your Cloud Function is not running concurrently over the same target resource. Just create the lock file in GCS with a name that is unique to the operation you'd like to put into the critical section, and release it once it's done (or rely on GCS object lifecycle management to clean up stale locks). The idea is sketched below.
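gcslock itself is written in Go, but the underlying trick - create an object only if it does not exist yet - can be reproduced in other languages. A minimal, hypothetical TypeScript sketch using the @google-cloud/storage client; the bucket and lock-file names are made up for illustration. Creating the object with the ifGenerationMatch: 0 precondition is atomic: exactly one concurrent caller succeeds, everyone else receives a 412 error and must retry or give up:

import { Storage } from "@google-cloud/storage";

const storage = new Storage();

async function withGcsLock(lockName: string, criticalSection: () => Promise<void>) {
  // Hypothetical bucket dedicated to lock files
  const lockFile = storage.bucket("my-locks-bucket").file(lockName);
  // Acquire: the write succeeds only if the object does not exist yet
  await lockFile.save("", { preconditionOpts: { ifGenerationMatch: 0 } });
  try {
    await criticalSection();
  } finally {
    // Release: delete the lock file so the next caller can proceed
    // (or rely on GCS object lifecycle rules to expire stale locks)
    await lockFile.delete();
  }
}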
Please see more considerations and pros and cons in the original tool GitHub project.
There is also apparently an implementation of the same in Python.
Here is a nice article that summarises use cases for distributed locking on GCP in particular.

Related

Common DynamoDB table through a shared Lambda or direct access from each microservice?

We have a "shared" layer that has a few resources accessed by different services in the project. There is a table storing shared information (user permission on each of the resources in the project, since it can get big so not being stored in JWT token)
Should we have a Lamba read the dynamoDB table and give other microservices access to the shared lambda only or should we give the microservices access to the table directly so that they can just use a lib method to read the permissions from the table? I am leaning towards direct DynamoDB table access since that avoids the extra hoop through a lambda.
Both approaches have advantages & disadvantages:
Direct Access to DynamoDB - Good Sides
The authors of the other Lambda functions can build at their own pace. Faster teams can sprint ahead and not wait for slower teams.
If one Lambda function is misbehaving or failing, the other Lambdas remain decoupled from it and the blast radius is limited.
Direct Access to DynamoDB - Bad sides
The effort of writing similar logic is duplicated across the different Lambdas.
Each Lambda can implement its own logic and introduce differences between implementations. This could be intentionally designed to work that way, but it could also be that one developer misunderstood the requirements.
If this DynamoDB table gets poisoned by bad writes from one of the consuming Lambdas, the other Lambdas can go down as well.
It becomes hard to measure the reserved capacity; some of the Lambdas can easily become greedy when it comes to read units.
Mediating Lambda - Good Sides
Reduces the effort required to implement similar logic for different consumers
If the shared Lambda that manages the DynamoDB table is performing actions like storing an audit trail, you will be able to easily measure the required read & write capacity units.
If it is decoupled from the consumers, then failures can be reduced and contained within it.
Mediating Lambda - Bad Sides
This shared Lambda can easily become a single point of failure if the consuming Lambdas expect return values from it.
More communication is required between the team managing this Lambda and the consuming teams. Politics can easily be introduced by this Lambda :D
If the consuming teams are developing at a much faster rate than the owner of this shared Lambda, it could easily become a blocker to other teams if integration is done poorly.
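For illustration, a minimal sketch of the direct-access option: each microservice imports a small shared library function instead of invoking a mediating Lambda. The table name, key schema, and attribute names below are assumptions, not from the question:

import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

// Shared library function each consuming Lambda can call directly
export async function getUserPermissions(userId: string, resourceId: string): Promise<string[]> {
  const result = await client.send(
    new GetItemCommand({
      TableName: "SharedPermissions", // hypothetical table name
      Key: {
        userId: { S: userId },
        resourceId: { S: resourceId },
      },
    })
  );
  // Permissions assumed to be stored as a DynamoDB string set
  return result.Item?.permissions?.SS ?? [];
}

The trade-off in the lists above then becomes concrete: this function is duplicated logic that every consumer must keep in sync, whereas the mediating-Lambda option centralizes it behind an invocation.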

Load testing AWS SDK client

What is the recommended way to performance test AWS SDK clients? I'm basically just listing/describing resources and would like to see what happens when I query 10k objects. Does AWS provide some type of mock API, or do I really need to request 10k of each type of resource to do this?
I can of course mock in at least two levels:
SDK: I wrap the SDK with my own interfaces and create mocks. This doesn't exercise the SDK's JSON-to-object code, and my mocks affect the AppDomain with additional memory, garbage collection, etc.
REST API: As I understand it, the SDKs are just wrappers around the REST API (hence the HTTP response codes shown in the objects). It seems I can configure the SDK to go to custom endpoints.
This isolates the mocks from the main AppDomain and is more representative, but of course I'm still making some assumptions about response time, limits, etc.
Besides the above taking a long time to implement, I would like to make sure my code won't fail at scale, either locally or at AWS. The only way I see to guarantee that is creating (and paying for) the resources at AWS. Am I missing anything?
When you query 10k or more objects you'll have to deal with:
Pagination - the API usually returns only a limited number of items per call, providing a NextToken for the next call.
Rate limiting - if you hammer some AWS APIs too much they'll rate-limit you, which the SDK will probably report as some kind of "Rate Limit Exceeded" exception.
Memory usage - hopefully you don't collect all the results in memory before processing. Process them as they arrive to conserve your operating memory, as in the sketch below.
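A minimal sketch of the first two points with the AWS SDK for JavaScript v3, using S3 object listing as an example; the built-in paginator follows the continuation token for you, and the client's retry configuration absorbs throttling:

import { S3Client, paginateListObjectsV2 } from "@aws-sdk/client-s3";

// maxAttempts raises the built-in retry budget, useful when you get throttled
const client = new S3Client({ maxAttempts: 10 });

async function countObjects(bucket: string): Promise<number> {
  let count = 0;
  // The paginator handles the ContinuationToken/NextToken behind the scenes
  for await (const page of paginateListObjectsV2({ client }, { Bucket: bucket })) {
    for (const object of page.Contents ?? []) {
      count++; // process each item here instead of buffering all 10k in memory
    }
  }
  return count;
}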
Other than that I don't see why it shouldn't work.
Update: Also check out Moto - the AWS mocking library (for Python) that can also run in a standalone mode for use with other languages. However, as with any mocking, it may not behave 100% the same as the real thing, for instance around the rate-limiting behaviour.
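As a hypothetical example of that standalone mode, you can point the SDK at a locally running Moto server (moto_server listens on port 5000 by default); the dummy region and credentials are placeholders:

import { S3Client, ListObjectsV2Command } from "@aws-sdk/client-s3";

// The endpoint override sends all requests to the local mock instead of AWS
const client = new S3Client({
  region: "us-east-1",
  endpoint: "http://localhost:5000",
  forcePathStyle: true, // required for non-AWS S3 endpoints
  credentials: { accessKeyId: "testing", secretAccessKey: "testing" },
});

async function smokeTest() {
  const response = await client.send(new ListObjectsV2Command({ Bucket: "test-bucket" }));
  console.log(response.Contents?.length ?? 0);
}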

Implement atomic transactions over multiple AWS resources

I want to implement atomic transactions over multiple AWS resources -- e.g. uploading an object to S3 and adding a record to a DynamoDB table. Both should happen in lockstep -- or not at all. If one of the operations fails, the other should be rolled back. I understand I can implement it myself, but I was wondering if there is an existing library that does it.
One of the challenges in implementing this is the expiry of temporary credentials. What if the credentials expire after one of the operations has been performed?
Any suggestions?
Transactions are hard! Especially in a distributed system. Transactions are also slow.
If there is any way to redesign your system to not require transactional semantics, I strongly encourage you to try.
If you really need transactions involving multiple AWS resources across different services, you sort of have to roll your own. You can leverage a distributed data store that supports atomic operations and build on top of that.
It won’t be easy.
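For illustration, a minimal sketch of the roll-your-own approach using compensating actions: perform the S3 upload first, then the DynamoDB write, and undo the upload if the write fails. Bucket, table, and attribute names are assumptions. Note this is best effort rather than truly atomic - the compensation itself can fail, for example if temporary credentials expire between the two operations, which is exactly the problem the question raises:

import { S3Client, PutObjectCommand, DeleteObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const s3 = new S3Client({});
const dynamo = new DynamoDBClient({});

async function putObjectWithRecord(key: string, body: string): Promise<void> {
  await s3.send(new PutObjectCommand({ Bucket: "my-bucket", Key: key, Body: body }));
  try {
    await dynamo.send(
      new PutItemCommand({
        TableName: "ObjectIndex", // hypothetical table tracking uploaded objects
        Item: { objectKey: { S: key } },
      })
    );
  } catch (err) {
    // Compensating action: roll back the upload so the two stores stay consistent
    await s3.send(new DeleteObjectCommand({ Bucket: "my-bucket", Key: key }));
    throw err;
  }
}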

API Gateway, multiple Lambdas in the same JAR

I'm trying to deploy an API suite using API Gateway and implementing the code in Java using Lambda. Is it OK to have many (related, of course) Lambdas in a single JAR (which is what I'm planning to do), or is it better to create a single JAR for each Lambda I want to deploy? (The latter would become a mess very easily.)
This is really a matter of taste but there are a few things you have to consider.
First of all there are limitations to how big a single Lambda upload can be (50MB at time of writing).
Second, there is also a limit to the total size of all the code that you upload (currently 1.5GB).
These limitations may not be a problem for your use case but are good to be aware of.
The next thing you have to consider is where you want your overhead.
Let's say you deploy a CRUD interface to a single Lambda and you pass an "action" parameter from API Gateway so that you know which operation you want to perform when you execute the Lambda function.
This adds a slight overhead to your execution, as you have to route the action to the appropriate operation. The routing is likely very fast, but it nevertheless adds CPU cycles to your function execution.
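The question is about Java, but the dispatch idea is the same in any runtime; a minimal TypeScript sketch with a hypothetical event shape:

// Hypothetical event: API Gateway passes an "action" field plus a payload
type CrudEvent = { action: "create" | "read" | "update" | "delete"; payload: unknown };

const operations: Record<CrudEvent["action"], (payload: unknown) => Promise<unknown>> = {
  create: async payload => ({ created: payload }),
  read: async payload => ({ found: payload }),
  update: async payload => ({ updated: payload }),
  delete: async payload => ({ deleted: payload }),
};

// Single Lambda entry point that dispatches to the requested operation
export async function handler(event: CrudEvent): Promise<unknown> {
  const operation = operations[event.action];
  if (!operation) {
    throw new Error(`Unsupported action: ${event.action}`);
  }
  return operation(event.payload);
}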
On the other hand, deploying the same JAR over several Lambda functions will quickly get you closer to the limits I mentioned earlier, and it also adds administrative overhead in managing your Lambda functions as that number grows. They can of course be managed via CloudFormation or CLI scripts, but it will still add an administrative overhead.
I wouldn't say there is a right and a wrong way to do this. Look at what you are trying to do, think about what you would need to manage the deployment and take it from there. If you get it wrong you can always start over with another approach.
Personally I like very small service Lambdas that do internal routing and handle more than just a single operation, but are still very small and focused on a specific type of task, be it CRUD for a database table or managing a select few very closely related operations.
There's some nice advice on serverless.com
As polythene says, the answer is "it depends". But they've listed the pros and cons of four ways of going about it:
Microservices Pattern
Services Pattern
Monolithic Pattern
Graph Pattern
https://serverless.com/blog/serverless-architecture-code-patterns/

What is low-level storage management like iRODS exactly for (in Fedora Commons)?

I am not clear about the actual advantage of having iRODS or any other low-level storage management. What are its benefits exactly, and when should we use it?
In Fedora Commons with normal file-system low-level storage:
a datastream created on May 8th, 2009 might be located in the 2009/0508/20/48/ directory.
How is iRODS helpful here?
I wanted to close the loop here, for other Stack Overflow users.
You posted the same question to our Google Group https://groups.google.com/d/msg/irod-chat/fti4ZHvmS-Y/LU8CQCZQHwAJ The question was answered there, and, thanks to you, the response is now also posted on the iRODS.org FAQ: http://irods.org/faq/
Here it is, once again, for posterity:
Don't think of iRODS as simply low-level storage management.
iRODS is really the only platform for policy-managed data preservation. It does indeed virtualize storage, providing a global, logical namespace over heterogeneous types of storage, but it also allows you to enforce preservation policies at each storage location, no matter what client or access method is used. It also provides a global metadata catalog that is automatically maintained and reflects the application of your preservation policies, allowing audit and verification of those policies.
iRODS is developing a powerful metadata management capability, with pluggable indexing and query capabilities that allow synchronization with external indices (e.g. Elasticsearch, MAUI, Jena triple store).
With the pluggable rule engine and asynchronous messaging architecture, it becomes rather straightforward to generate audit and provenance metadata that will track every single (pre- and post-) operation on your data, including any plugins you may develop or utilize.
iRODS is middleware rather than a prepackaged solution. This middleware supports plugins and configurable policies at all points, so you are not limited to a predefined set of tools. iRODS can also be connected to a wide range of preservation, computation, and enterprise services; it can manage large amounts of data (both in the number of objects and in their size), and it can efficiently move and manage data using high-performance protocols, including third-party data transfer protocols.
iRODS is built to support federation, so that your preservation environment may share data with other institutions or organizations while remaining under your own audit and policy control. Many organizations are doing this for many millions of objects, many thousands of users, and with a large range of object sizes.