Dynamically update variables for Google Cloud Functions - google-cloud-platform

I currently have a Firebase Function that needs a key to do its task. This key changes every 4-20 days, and I want the functions to be able to update it themselves. What would be the best way to do this? Getting the key requires a slow network call to a 3rd-party API, so I'd rather store it. Currently I have an environment variable that I change myself whenever I find the functions failing, but I would rather have this happen automatically.
I don't think I can change the environment variables at run time, so is the only option to store the value in my database and query for it every time I need it? This seems a bit slow, but I'm not sure.

is the only option to store the value in my database and query for that every time I need it?
Cloud Functions is stateless and will not retain any information outside of the code and data that was deployed with the function. So, you will need some sort of persistent storage to hold the key. It doesn't have to be a database. It can be any persistent storage you want.
You can certainly read the key once (from wherever you choose to store it) and cache it in memory if it was not previously read, for as long as you are allowed to keep using it without refreshing the value. Memory does persist for some time per server instance, but it is not shared among all of your function invocations, since each one might run on a different instance.
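For example, here is a minimal sketch of that idea for a Firebase Function, assuming the key is kept in a Firestore document at config/thirdPartyKey (that path and fetchKeyFromThirdParty are hypothetical names, not anything from the question):
const admin = require('firebase-admin');
admin.initializeApp();

// Per-instance cache; survives between invocations only while this
// server instance stays warm.
let cachedKey = null;

// Read the key from persistent storage (Firestore here) only when it
// is not already cached in memory.
async function getKey() {
  if (!cachedKey) {
    const snap = await admin.firestore().doc('config/thirdPartyKey').get();
    cachedKey = snap.get('value');
  }
  return cachedKey;
}

// When the third-party API rejects the key, fetch a fresh one, persist
// it for other instances, and update this instance's cache.
async function refreshKey() {
  const newKey = await fetchKeyFromThirdParty(); // hypothetical slow 3rd-party call
  await admin.firestore().doc('config/thirdPartyKey').set({ value: newKey });
  cachedKey = newKey;
  return newKey;
}
With this shape, a function invocation calls getKey(), and only falls back to refreshKey() when the third-party API reports the key as invalid.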

As Paul Rudin pointed out to me on the Google Cloud Slack, you can cache the key in a global variable, which is, in practice, often reused across invocations: https://cloud.google.com/functions/docs/bestpractices/tips#use_global_variables_to_reuse_objects_in_future_invocations
Use global variables to reuse objects in future invocations
There is no guarantee that the state of a Cloud Function will be preserved for future invocations. However, Cloud Functions often recycles the execution environment of a previous invocation. If you declare a variable in global scope, its value can be reused in subsequent invocations without having to be recomputed.
This way you can cache objects that may be expensive to recreate on each function invocation. Moving such objects from the function body to global scope may result in significant performance improvements. The following example creates a heavy object only once per function instance, and shares it across all function invocations reaching the given instance:
// Global (instance-wide) scope
// This computation runs at instance cold-start
const instanceVar = heavyComputation();

/**
 * HTTP function that declares a variable.
 *
 * @param {Object} req request context.
 * @param {Object} res response context.
 */
exports.scopeDemo = (req, res) => {
  // Per-function scope
  // This computation runs every time this function is called
  const functionVar = lightComputation();
  res.send(`Per instance: ${instanceVar}, per function: ${functionVar}`);
};

Related

What is the timeout time for simple caching in AWS Lambda?

So I was looking for caching solutions for my AWS Lambda functions and I found something called 'simple caching'. It fits perfectly for what I want, since my data is not changed frequently. However, one thing I was unable to find is the timeout for this cache. When is the data refreshed by the function, and is there any way I can control it?
An example of the code I am using for the function:
let cachedValue;

module.exports.handler = function(event, context, callback) {
  console.log('Starting Lambda.');
  if (!cachedValue) {
    console.log('Setting cachedValue now...');
    cachedValue = 'Foobar';
  } else {
    console.log('Cached value is already set: ', cachedValue);
  }
  callback(null, cachedValue); // complete the invocation and return the value
};
What you're doing here is taking advantage of a side effect of container reuse. There is no lower or upper bound for how long such values will persist, and no guarantee that they will persist at all. It's a valid optimization to use, but it's entirely outside your control.
Importantly, you need to be aware that this stores the value in one single container. It lives for as long as the Node process in the container is alive, and is accessible whenever a future invocation of the function reuses that process in that container.
If you have two or more invocations of the same function running concurrently, they will not be in the same container, and they will not see each other's global variables. This doesn't make it an invalid technique, but you need to be aware of that fact. The /tmp/ directory will exhibit very similar behavior, which is why you need to clean that up when you use it.
If you throw any exception, the process and possibly the container will be destroyed; either way, the cached values will be gone on the next invocation, since there's only one Node process per container.
If you don't invoke the function at all for an undefined/undocumented number of minutes, the container will be released by the service, so this goes away.
Re-deploying the function will also clear this "cache," since a new function version won't reuse containers from older function versions.
It's a perfectly valid strategy as long as you recognize that it is a feature of a black box with no user-serviceable parts.
See also https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/ -- a post that is several years old but still accurate.
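If you want the refresh under your own control rather than left to container lifetime, one common approach is to pair the cached value with a timestamp and re-read it once it is older than a TTL you choose. A rough sketch, where loadValueFromSource() stands in for whatever slow lookup you are caching (both names are illustrative):
let cachedValue;
let cachedAt = 0;
const TTL_MS = 5 * 60 * 1000; // refresh at most every 5 minutes; pick what suits your data

module.exports.handler = async function(event) {
  if (!cachedValue || Date.now() - cachedAt > TTL_MS) {
    console.log('Cache empty or stale, reloading...');
    cachedValue = await loadValueFromSource(); // hypothetical slow lookup
    cachedAt = Date.now();
  }
  return cachedValue;
};
The container-reuse behavior still decides how long the cache survives at all, but the TTL puts an upper bound on how stale a value you will ever serve.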

Is it safe to store a CFC user content handler in the Application scope?

I have read a lot of posts about storing CFCs in the Application scope, and I understand that if a CFC stores data then it should not be in the Application scope. But all CFCs that do non-util work would seem to store data - you pass in parameters like a username or email address - so I don't understand when to use the Application scope for a non-util CFC and when not to.
My question is that I have a posthandler.cfc component of about 500 lines of code which handles posts from a user (just like SO would handle each question being posted on this site). The posthandler.cfc component:
'cleans' any images and text submitted by the user
places the images in the correct folder
writes all the text to a database
returns a URL where the post can be viewed
The returned URL is received by a simple Jquery ajax call which redirects the user to the URL.
This happens quite regularly on the site, and at the moment a new CFC instance is created for each post. Would it be safe to put it in the Application scope instead without causing race conditions or locking problems?
Just passing in parameters doesn't "save" anything. Conceptually, each thread has its own arguments and local scope, which are not visible to any other thread, and cease to exist when the function exits. So from that perspective, there's no conflict.
Also, storing data doesn't mean saving it to a database table. It refers to components that maintain state by storing data in a shared scope/object/etc.. With "shared" meaning the resource is accessible to other threads, and can potentially be modified by multiple threads at the same time, leading to race conditions.
For example, take this (contrived) component function that "saves" information in the variables scope. If you create a new instance of that component each time, the function is safe because each request gets its own instance and separate copy of the variables scope to play with.
public numeric function doStuff( numeric num1, numeric num2 ) {
    variables.firstNum = arguments.num1 * 12;
    variables.secondNum = arguments.num2 * 10;
    return variables.firstNum / variables.secondNum;
}
Now take that same component and put it in the application scope. It's no longer safe. As soon as you store it in the application scope, the instance - AND its variables - become application scoped as well. So when the function "saves" data to the variables scope it's essentially updating an application variable. Obviously those aren't thread safe because they're accessible to all requests. So multiple threads could easily read/modify the same variables at the same time, resulting in a race condition.
// "Essentially" becomes this ....
public numeric function doStuff( numeric num1, numeric num2 ) {
    application.firstNum = arguments.num1 * 12;
    application.secondNum = arguments.num2 * 10;
    return application.firstNum / application.secondNum;
}
Also, as James A Mohler pointed out, the same issue occurs when you omit the scope. Declaring a function variable without a scope does NOT make it local to the function. It makes it part of the default scope: variables - (creating the same thread safety problem described above). This behavior has led to many a threading bug, when developers forget to scope a single query variable or even a loop index. So be sure to explicitly scope EVERY function variable.
// Implicitly creates "variables.firstNum" and "variables.secondNum"
public numeric function doStuff( numeric num1, numeric num2 ) {
    firstNum = arguments.num1 * 12;
    secondNum = arguments.num2 * 10;
    return firstNum / secondNum;
}
Aside from adding locking, both examples could be made thread safe by explicitly using the local scope instead. Data stored in the transient local scope is not visible to other threads and ceases to exist once the function exits.
public numeric function doStuff( numeric num1, numeric num2 ) {
    local.firstNum = arguments.num1 * 12;
    local.secondNum = arguments.num2 * 10;
    return local.firstNum / local.secondNum;
}
Obviously there are other cases to consider, such as complex objects or structures, which are passed by reference, and whether or not those objects are modified within the function. But hopefully that sheds some light on what's meant by "saving data" and how scoping can make the difference between a stateless component (safe for the application scope) and stateful components (which are not).
TL;DR:
In your case, it sounds like most of the information is not shared and is request-level (user info, uploaded images, etc.), so it's probably safe to store the component in the application scope.

Is decrypting and setting environment variables inside an AWS Lambda function a vulnerability?

Inside of AWS Lambda functions I have some libs that look for sensitive information in the OS environment. I can encrypt the env vars using KMS, but I've found myself having to overwrite the encrypted env vars in the Lambda handler module -- is this a vulnerability? E.g.:
# lambda_handler.py
import os

encrypted_env_var = os.environ["SECRET_KEY"]
decrypted_env_var = decrypt(encrypted_env_var)  # decrypt() stands in for the KMS call
os.environ["SECRET_KEY"] = decrypted_env_var

def lambda_function(event, context):
    ...  # libs get and use SECRET_KEY
I understand that encrypting them covers you, e.g., when using the awscli, but could setting this in the container be a vulnerability? As I understand from here, the container may not be destroyed immediately.
Furthermore, in the suggested decryption code snippet that AWS gives you (in the lambda dashboard), the comments caught my attention:
# lambda_handler.py
import os
from base64 import b64decode

import boto3

ENCRYPTED = os.environ['SECRET_KEY']
# Decrypt code should run once and variables stored outside of the function
# handler so that these are decrypted once per container
DECRYPTED = boto3.client('kms').decrypt(CiphertextBlob=b64decode(ENCRYPTED))['Plaintext']

def lambda_handler(event, context):
    # handle the event here
    ...
Would it be sufficient (albeit messy) to just unset the relevant variables at the end of the function?
Thanks
the container may not be destroyed immediately.
The container almost certainly will not be destroyed immediately... but that isn't bad.
Containers can persist for minutes to hours, and are reused -- this is why Lambda functions are usually able to execute so quickly, because under ideal conditions, the majority of your function invocations will find an idle container that they can reuse.
However... containers are reused only by one single version of one single function of yours. Nobody else, and no other functions or different versions of the same function, even from the same account, reuse your containers. Only the function version that caused the container to be created.
The infrastructure destroys containers that are no longer needed, and it does this based on the traffic your function is seeing (or not seeing). Anecdotal observations suggest that all idle containers will disappear completely after 10-15 minutes if you completely stop invoking your function, but this is not documented and may vary.
the comments caught my attention
The comments are saying that you should store the decrypted variables outside the handler function so they are in a global scope -- it would be very inefficient to call the KMS API to decrypt the variables with each invocation of the function. You could conceivably do that, but you should store the encrypted environment variables under a different name, and then set the decrypted values into the names your code expects.
Would it be sufficient (albeit messy) to just unset the relevant variables at the end of the function?
That won't work, unless you store the original encrypted values elsewhere (such as using different initial environment variable names), because your example code would overwrite the encrypted values, and the next time your function runs in that container, it would fail to decrypt those unset values.
The point of encrypted environment variables is that while they are stored at rest by the Lambda infrastructure, they are encrypted. Once they are in the memory space of your container, there is not a meaningful chance of them being compromised, since (as noted above) those containers are exclusive to your function.
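To make the different-name suggestion concrete, here is a hedged sketch (written in Node.js to match the earlier examples in this thread; the SECRET_KEY_ENCRYPTED name is purely illustrative): deploy the ciphertext under one name, decrypt it once at cold start, and expose the plaintext under the name your libs expect.
const AWS = require('aws-sdk');
const kms = new AWS.KMS();

// Runs once per container, at cold start. The ciphertext stays untouched
// in SECRET_KEY_ENCRYPTED; only SECRET_KEY holds the plaintext.
const decrypted = kms.decrypt({
  CiphertextBlob: Buffer.from(process.env.SECRET_KEY_ENCRYPTED, 'base64'),
}).promise().then((result) => {
  process.env.SECRET_KEY = result.Plaintext.toString('ascii');
});

exports.handler = async (event) => {
  await decrypted; // make sure decryption finished before the libs read SECRET_KEY
  // ... libs get and use SECRET_KEY ...
};
Because the encrypted value is never overwritten, a warm container can always re-decrypt if it ever needs to.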

Best way to continually stream out variables as a function is computed?

I have a function that I need to make multiple instances of, but that function requires variables from the previous instance to run. There are 5 variables at different stages of the function that the other ones need to run, so I want to be able to create 5 different instances of the function because the function that inputs data into this function is much faster.
What I am in the process of doing is creating a class with a buffer so the instances can notify each other when each stage is computed, using condition variables and, of course, a mutex for locking.
What is the fastest way to do this, to minimize any time lost, since the whole goal is to create multiple instances of this function to process data in a multi-threaded manner?

Consistently using the value of "now" throughout the transaction

I'm looking for guidelines to using a consistent value of the current date and time throughout a transaction.
By transaction I loosely mean an application service method; such methods usually execute a single SQL transaction, at least in my applications.
Ambient Context
One approach described in answers to this question is to put the current date in an ambient context, e.g. DateTimeProvider, and use that instead of DateTime.UtcNow everywhere.
However, the purpose of that approach is only to make the design unit-testable, whereas I also want to prevent errors caused by unnecessarily querying DateTime.UtcNow multiple times, an example of which is this:
// In an entity constructor:
this.CreatedAt = DateTime.UtcNow;
this.ModifiedAt = DateTime.UtcNow;
This code creates an entity with slightly differing created and modified dates, whereas one expects these properties to be equal right after the entity was created.
Also, an ambient context is difficult to implement correctly in a web application, so I've come up with an alternative approach:
Method Injection + DeterministicTimeProvider
The DeterministicTimeProvider class is registered as an "instance per lifetime scope" AKA "instance per HTTP request in a web app" dependency.
It is constructor-injected to an application service and passed into constructors and methods of entities.
The IDateTimeProvider.UtcNow method is used instead of the usual DateTime.UtcNow / DateTimeOffset.UtcNow everywhere to get the current date and time.
Here is the implementation:
/// <summary>
/// Provides the current date and time.
/// The provided value is fixed when it is requested for the first time.
/// </summary>
public class DeterministicTimeProvider : IDateTimeProvider
{
    private readonly Lazy<DateTimeOffset> _lazyUtcNow =
        new Lazy<DateTimeOffset>(() => DateTimeOffset.UtcNow);

    /// <summary>
    /// Gets the current date and time in the UTC time zone.
    /// </summary>
    public DateTimeOffset UtcNow => _lazyUtcNow.Value;
}
Is this a good approach? What are the disadvantages? Are there better alternatives?
Sorry for the logical fallacy of appeal to authority here, but this is rather interesting:
John Carmack once said:
"There are four principle inputs to a game: keystrokes, mouse moves, network packets, and time. (If you don't consider time an input value, think about it until you do -- it is an important concept)"
Source: John Carmack's .plan posts from 1998 (scribd)
(I have always found this quote highly amusing, because the suggestion that if something does not seem right to you, you should think of it really hard until it seems right, is something that only a major geek would say.)
So, here is an idea: consider time as an input. It is probably not included in the XML that makes up the web service request (you wouldn't want it to be, anyway), but in the handler where you convert the XML to an actual request object, obtain the current time and make it part of your request object.
So, as the request object is being passed around your system during the course of processing the transaction, the time to be considered as "the current time" can always be found within the request. So, it is not "the current time" anymore, it is the request time. (The fact that it will be one and the same, or very close to one and the same, is completely irrelevant.)
This way, testing also becomes even easier: you don't have to mock the time provider interface, the time is always in the input parameters.
Also, this way, other fun things become possible, for example servicing requests to be applied retroactively, at a moment in time which is completely unrelated to the actual current moment in time. Think of the possibilities. (Picture of bob squarepants-with-a-rainbow goes here.)
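Sketching that idea in JavaScript just to keep it short (the thread's own examples are C#, and all names below are illustrative, not from the question): the handler stamps the request exactly once, and everything downstream reads the time from the request object instead of asking the clock.
// At the boundary of the system: capture the time once and attach it to the request.
function buildRequest(rawPayload) {
  return {
    payload: rawPayload,
    requestTime: new Date(), // this is "now" for the whole transaction
  };
}

// Downstream code never calls the clock directly; it uses request.requestTime.
function createEntity(request, data) {
  return {
    ...data,
    createdAt: request.requestTime,
    modifiedAt: request.requestTime, // guaranteed equal to createdAt
  };
}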
Hmmm.. this feels like a better question for CodeReview.SE than for StackOverflow, but sure - I'll bite.
Is this a good approach?
If used correctly, in the scenario you described, this approach is reasonable. It achieves the two stated goals:
Making your code more testable. This is a common pattern I call "Mock the Clock", and is found in many well-designed apps.
Locking the time to a single value. This is less common, but your code does achieve that goal.
What are the disadvantages?
Since you are creating another new object for each request, it will create a mild amount of additional memory usage and additional work for the garbage collector. This is somewhat of a moot point since this is usually how it goes for all objects with per-request lifetime, including the controllers.
There is a tiny fraction of time being added before you take the reading from the clock, caused by the additional work being done in loading the object and from doing lazy loading. It's negligible though - probably on the order of a few milliseconds.
Since the value is locked down, there's always the risk that you (or another developer who uses your code) might introduce a subtle bug by forgetting that the value won't change until the next request. You might consider a different naming convention: for example, instead of "now", call it "requestReceivedTime" or something like that.
Similar to the previous item, there's also the risk that your provider might be loaded with the wrong lifecycle. You might use it in a new project and forget to set the instancing, loading it up as a singleton. Then the values are locked down for all requests. There's not much you can do to enforce this, so be sure to comment it well. The <summary> tag is a good place.
You may find you need the current time in a scenario where constructor injection isn't possible - such as a static method. You'll either have to refactor to use instance methods, or will have to pass either the time or the time-provider as a parameter into the static method.
Are there better alternatives?
Yes, see Mike's answer.
You might also consider Noda Time, which has a similar concept built in, via the IClock interface, and the SystemClock and FakeClock implementations. However, both of those implementations are designed to be singletons. They help with testing, but they don't achieve your second goal of locking the time down to a single value per request. You could always write an implementation that does that though.
Code looks reasonable.
Drawback - most likely the lifetime of the object will be controlled by the DI container, so a user of the provider can't be sure that it will always be configured correctly (per-invocation, and not some longer lifetime like app/singleton).
If you have a type representing the "transaction", it may be better to put a "Started" time there instead.
This isn't something you can guarantee just by hiding a real-time clock behind a provider, or by testing; a developer may still find some obscure way of reaching the underlying clock call directly. So don't rely on that alone. Dependency injection also won't save you here; the issue is that you want a standard pattern for capturing the time at the start of the 'session'.
In my view, the fundamental problem is that you are expressing an idea and looking for a mechanism to express it with. The right mechanism is to name it, say what you mean in the name, and then set it only once. A readonly field is a good way to handle setting it only once in the constructor, and it lets the compiler and runtime enforce what you mean, which is that it is set only once.