How to increase deploy timeout limit at AWS Opsworks? - amazon-web-services

I would like to increase the deploy time, in a stack layer that hosts many apps (AWS Opsworks).
Currenlty I get the following error:
Eror
[2014-05-05T22:27:51+00:00] ERROR: Running exception handlers
[2014-05-05T22:27:51+00:00] ERROR: Exception handlers complete
[2014-05-05T22:27:51+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache/chef-stacktrace.out
[2014-05-05T22:27:51+00:00] ERROR: deploy[/srv/www/lakers_test] (opsworks_delayed_job::deploy line 65) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 600s:
Thanks in advance.

First of all, as mentioned in this ticket reporting a similar issue, the Opsworks guys recommend trying to speed up the call first (there's always room for optimization).
If that doesn't work, we can go down the rabbit hole: this gets called, which in turn calls Mixlib::ShellOut.new, which happens to have a timeout option that you can pass in the initializer!
Now you can use an Opsworks custom cookbook to overwrite the initial method, and pass the corresponding timeout option. Opsworks merges the contents of its base cookbooks with the contents of your custom cookbook - therefore you only need to add & edit one single file to your custom cookbook: opsworks_commons/libraries/shellout.rb:
module OpsWorks
module ShellOut
extend self
# This would be your new default timeout.
DEFAULT_OPTIONS = { timeout: 900 }
def shellout(command, options = {})
cmd = Mixlib::ShellOut.new(command, DEFAULT_OPTIONS.merge(options))
cmd.run_command
cmd.error!
[cmd.stderr, cmd.stdout].join("\n")
end
end
end
Notice how the only additions are just DEFAULT_OPTIONS and merging these options in the Mixlib::ShellOut.new call.
An improvement to this method would be changing this timeout option via a chef attribute, that you could in turn update via your custom JSON in the Opsworks interface. This means passing the timeout attribute in the initial Opsworks::ShellOut.shellout call - not in the method definition. But this depends on how the shellout method actually gets called...

Related

The create_profile_job() method is not accepting the new parameters

I am trying to use a Lambda function (written in python) to create a series of profile jobs in DataBrew. AWS recently added a new parameter to this function ("Configuration) which I have added in my code. However, when I call the function, I get the following error message: "Unknown parameter in input: "Configuration", must be one of: DatasetName, EncryptionKeyArn, EncryptionMode, Name, LogSubscription, MaxCapacity, MaxRetries, OutputLocation, RoleArn, Tags, Timeout, JobSample." This does not match the parameter list in the boto3 documentation, which was recently updated to align with the new features added to DataBrew on 07/23/21. Has anyone else had this issue? If so, is there a timeline for this bug to be fixed?
It turns out that the version of boto3 that is available in Lambda by default is not the most updated version. Hence, in order to use all the parameters for this method, you have to add the latest version of boto3 (and all dependencies) as a Lambda layer.

Google App Engine deferred.defer task not getting executed

I have a Google App Engine Standard Environment application that has been working fine for a year or more, that, quite suddenly, refuses to enqueue new deferred tasks using deferred.defer.
Here's the Python 2.7 code that is making the deferred call:
# Find any inventory items that reference the product, and change them too.
# because this could take some time, we'll do it as a deferred task, and only
# if needed.
if upd:
updater = deferredtasks.InvUpdate()
deferred.defer(updater.run, product_key)
My app.yaml file has the necessary bits to support deferred.defer:
- url: /_ah/queue/deferred
script: google.appengine.ext.deferred.deferred.application
login: admin
builtins:
- deferred: on
And my deferred task has logging in it so I should see it running when it does:
#-------------------------------------------------------------------------------
# DEFERRED routine that updates the inventory items for a particular product. Should be callecd
# when ANY changes are made to the product, because it should trigger a re-download of the
# inventory record for that product to the iPad.
#-------------------------------------------------------------------------------
class InvUpdate(object):
def __init__(self):
self.to_put = []
self.product_key = None
self.updcount = 0
def run(self, product_key, batch_size=100):
updproduct = product_key.get()
if not updproduct:
logging.error("DEFERRED ERROR: Product key passed in does not exist")
return
logging.info(u"DEFERRED BEGIN: beginning inventory update for: {}".format(updproduct.name))
self.product_key = product_key
self._continue(None, batch_size)
...
When I run this in the development environment on my development box, everything works fine. Once I deploy it to the App Engine server, the inventory updates never get done (i.e. the deferred task is not executed), and there are no errors (and no other logging from the deferred task in fact) in the log files on the server. I know that with the sudden move to get everybody on Python 3 as quickly as possible, the deferred.defer library has been marked as not recommended because it only works with the 2.7 Python environment, and I planned on moving to task queues for this, but I wasn't expecting deferred.defer to suddenly stop working in the existing python environment.
Any insight would be greatly appreciated!
I'm pretty sure you cant pass the method of an instance to appengine taskqueue, because that instance will not get exist when your task runs since it will be running in a different process. I actually dont understand how your task ever worked when running remotely in the first place (and running locally is not an accurate representation of how things will run remotely)
Try changing your code to this:
if upd:
deferred.defer(deferredtasks.InvUpdate.run_cls, product_key)
and then InvUpdate is the same but has a new function run_cls:
class InvUpdate(object):
#classmethod
def run_cls(cls, product_key):
cls().run(product_key)
And I'm still on the process of migrating to cloud tasks and my deferred tasks still work

Should I have concern about datastoreRpcErrors?

When I run dataflow jobs that writes to google cloud datastore, sometime I see the metrics show that I had one or two datastoreRpcErrors:
Since these datastore writes usually contain a batch of keys, I am wondering in the situation of RpcError, if some retry will happen automatically. If not, what would be a good way to handle these cases?
tl;dr: By default datastoreRpcErrors will use 5 retries automatically.
I dig into the code of datastoreio in beam python sdk. It looks like the final entity mutations are flushed in batch via DatastoreWriteFn().
# Flush the current batch of mutations to Cloud Datastore.
_, latency_ms = helper.write_mutations(
self._datastore, self._project, self._mutations,
self._throttler, self._update_rpc_stats,
throttle_delay=_Mutate._WRITE_BATCH_TARGET_LATENCY_MS/1000)
The RPCError is caught by this block of code in write_mutations in the helper; and there is a decorator #retry.with_exponential_backoff for commit method; and the default number of retry is set to 5; retry_on_rpc_error defines the concrete RPCError and SocketError reasons to trigger retry.
for mutation in mutations:
commit_request.mutations.add().CopyFrom(mutation)
#retry.with_exponential_backoff(num_retries=5,
retry_filter=retry_on_rpc_error)
def commit(request):
# Client-side throttling.
while throttler.throttle_request(time.time()*1000):
try:
response = datastore.commit(request)
...
except (RPCError, SocketError):
if rpc_stats_callback:
rpc_stats_callback(errors=1)
raise
...
I think you should first of all determine which kind of error occurred in order to see what are your options.
However, in the official Datastore documentation, there is a list of all the possible errors and their error codes . Fortunately, they come with recommended actions for each.
My advice is that your implement their recommendations and see for alternatives if they are not effective for you

Set or modify an AWS Lambda environment variable with Python boto3

i want to set or modify an environment variable in my lambda script.
I need to save a value for the next call of my script.
For exemple i create an environment variable with the aws lambda console and don't set value. After that i try this :
import boto3
import os
if os.environ['ENV_VAR']:
print(os.environ['ENV_VAR'])
os.environ['ENV_VAR'] = "new value"
In this case my value will never print.
I tried with :
os.putenv()
but it's the same result.
Do you know why this environment variable is not set ?
Thank you !
Consider using the boto3 lambda command, update_function_configuration to update the environment variable.
response = client.update_function_configuration(
FunctionName='test-env-var',
Environment={
'Variables': {
'env_var': 'hello'
}
}
)
I need to save a value for the next call of my script.
That's not how environment variables work, nor is it how lambda works. Environment variables cannot be set in a child process for the parent - a process can only set environment variables in its own and child process environments.
This may be confusing to you if you set environment variables at the shell, but in that case, the shell is the long running process setting and getting your environment variables, not the programs it calls.
Consider this example:
from os import environ
print environ['A']
environ['A'] = "Set from python"
print environ['A']
This will only set env A for itself. If you run it several times, the initial value of A is always the shell's value, never the value python sets.
$ export A="set from bash"
$ python t.py
set from bash
Set from python
$ python t.py
set from bash
Set from python
Further, even if that wasn't the case, it wouldn't work reliably with aws lambda. Lambda runs your code on whatever compute resources are available at the time; it will typically cache runtimes for frequently executed functions, so in these cases data could be written to the filesystem to preserve it. But if the next invocation wasn't run in that runtime, your data would be lost.
For your needs, you want to preserve your data outside the lambda. Some obvious options are: write to s3, write to dynamo, or, write to sqs. The next invocation would read from that location, achieving the desired result.
AWS Lambda just executes the piece of code with given set of inputs. Once executed, it returns the output and that's all. If you want to preserve the output for your next call, then you probably need to store that in DB or Queue as Dan said. I personally use SQS in conjunction with SNS that sends me notifications about current state. You can even store the end result like success or failure in SQS which you can use for next trigger. Just throwing the options here, rest all depends on your requirements.

Loopback: return error from beforeValidation hook

I need to make custom validation of instance before saving it to MySQL DB.
So I perform (async) check inside beforeValidate model hook.
MyModel.beforeValidate = function(next){
// async check that finally calls next() or next(new Error('fail'))
}
But when check fails and I pass Error obj to next function, the execution continues anyway.
Is there any way to stop execution and response to client with error?
This is a known bug in the framework, see https://github.com/strongloop/loopback/issues/614
I am working on a new hook implementation that will not have issues like the one you have experienced, see loopback-datasource-juggler#367 and the pull request loopback-datasource-juggler#403