Scheduled Django management commands using Zappa & Lambda - django

I've got my Django site working on Lambda using Zappa. It was very simple. I'm now trying to find out how to set up scheduled Django management commands. From what I've read, the workaround is to create Python functions that execute the management commands and then schedule those functions to run using the Zappa settings file. Is this still the right method? The help manual doesn't say anything about it.

At the time of writing, there is an open Zappa issue about this.
symroe came up with this solution, which seems to work nicely:
class Runner:
    def __getattr__(self, attr):
        # Any attribute access returns a callable that runs the management command of the same name.
        from django.core.management import call_command
        return lambda: call_command(attr)


import sys
sys.modules[__name__] = Runner()  # replace this module with a Runner instance
This allows you to specify any Django management command in your zappa_settings.json file without further code modifications. That bit looks like this, where zappa_schedule.py is the name of the file containing the code above and publish_scheduled_pages is a registered management command:
"events": [{
"function": "zappa_schedule.publish_scheduled_pages",
"expression": "rate(1 hour)"
}],

Related

GCP Composer: Run Python Script in another GCS bucket

I'm new to Airflow, and I'm trying to run a Python script that reads data from BigQuery, does some preprocessing, and exports a table back to BigQuery. This is the DAG I have:
from airflow.models import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

YESTERDAY = datetime.now() - timedelta(days=1)

default_args = {
    'owner': 'me',
    'depends_on_past': False,
    'start_date': YESTERDAY,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'max_tries': 0,
}

with DAG(
    dag_id='my_code',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=False
) as dag:
    import_data = BashOperator(
        task_id='daily_task',
        bash_command='python gs://project_id/folder1/python_script.py'
    )
This gives an error of 'No such file or directory found'. I did not set up the environment in Composer, so I'm not sure if it requires specific credentials. I tried storing the script in the dags folder, but then it wasn't able to access the BigQuery tables.
I have two questions:
How do I properly define the location of the Python script within another GCS bucket? Should the gs:// location work if proper credentials are applied? Or do I necessarily have to store the scripts in a folder within the dags folder?
How do I provide the proper credentials (like login ID and password) within the DAG, in case that is all that's needed to solve the issues?
I handwrote the code since the original is on a work laptop and I cannot copy it. Let me know if there are any errors. Thank you!
To solve your issue, I propose a solution which, in my opinion, is easier to manage.
Whenever possible, it is better to keep Python scripts within Composer's bucket.
Copy your Python script to the Composer bucket's DAG folder, either with a separate process outside of Composer (gcloud) or directly in the DAG. If you want to do that in the DAG, you can check this link.
Use a PythonOperator that invokes your Python script inside the DAG.
The service account used by Composer needs the right privileges to read and write data to BigQuery. If you copy the Python script directly in the DAG, the SA also needs the privilege to download the file from GCS in the second project.
from your_script import your_method_with_bq_logic

with airflow.DAG(
        'your_dag',
        default_args=your_args,
        schedule_interval=None) as dag:

    bq_processing = PythonOperator(
        task_id='bq_processing',
        python_callable=your_method_with_bq_logic
    )

    bq_processing
You can import the Python script's main method in the DAG code because the script exists in the DAG folder.
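As a rough illustration of what that imported module might contain (a minimal sketch; your_script, your_method_with_bq_logic and the table name are placeholders, and it assumes the Composer service account already has BigQuery access):

# your_script.py -- hypothetical module stored in the DAG folder next to the DAG file
from google.cloud import bigquery


def your_method_with_bq_logic():
    # The client picks up the Composer environment's service account credentials automatically.
    client = bigquery.Client()
    rows = client.query("SELECT COUNT(*) AS n FROM `project.dataset.table`").result()
    for row in rows:
        print(row.n)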

Running Taurus BlazeMeter on AWS Lambda

I am trying to run a BlazeMeter Taurus script with a JMeter script inside via AWS Lambda. I'm hoping that there is a way to run bzt via a local installation in /tmp/bzt instead of looking for a bzt installation on the system, which doesn't really exist since it's Lambda.
This is my lambda_handler.py:
import subprocess
import json


def run_taurus_test(event, context):
    subprocess.call(['mkdir', '/tmp/bzt/'])
    subprocess.call(['pip', 'install', '--target', '/tmp/bzt/', 'bzt'])
    # subprocess.call('ls /tmp/bzt/bin'.split())
    subprocess.call(['/tmp/bzt/bin/bzt', 'tests/taurus_test.yaml'])
    return {
        'statusCode': 200,
        'body': json.dumps('Executing Taurus Test hopefully!')
    }
The taurus_test.yaml runs as expected when testing on my computer with bzt installed normally via pip, so I know the issue isn't with the test script. The same traceback as below appears if I uninstall bzt from my system and try to use a local installation targeted at a certain directory.
This is the traceback in the execution results:
Traceback (most recent call last):
  File "/tmp/bzt/bin/bzt", line 5, in <module>
    from bzt.cli import main
ModuleNotFoundError: No module named 'bzt'
It's technically /tmp/bzt/bin/bzt, the executable, that's failing, and I think it's because it isn't using the local/targeted installation.
So, I'm hoping there is a way to tell bzt to keep using the targeted installation in /tmp/bzt, instead of calling the executable there and having it look for an installation that doesn't exist anywhere else. Feedback on whether AWS Fargate or EC2 would be better suited for this is also appreciated.
Depending on the size of the bzt package, the solutions are:
Use the recent Lambda container image (Docker) feature; this way, what you run locally is what you get on Lambda.
Use Lambda layers (similar to Docker); the layer has the bzt module in the python directory, as described there.
When you package your Lambda, instead of uploading a simple Python file, create a ZIP file containing both /path/to/zip_root/lambda_handler.py and the output of pip install --target /path/to/zip_root bzt.
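If you would rather keep the original install-into-/tmp approach, one hedged workaround (my own sketch, not part of the answer above) is to pass PYTHONPATH to the subprocess so the /tmp/bzt/bin/bzt entry script can resolve the module that pip placed in /tmp/bzt:

# Hypothetical variant of lambda_handler.py: still installs into /tmp/bzt at run time,
# but exports PYTHONPATH so the bzt entry script can import the bundled module.
import json
import os
import subprocess


def run_taurus_test(event, context):
    subprocess.call(['pip', 'install', '--target', '/tmp/bzt/', 'bzt'])
    env = os.environ.copy()
    env['PYTHONPATH'] = '/tmp/bzt' + os.pathsep + env.get('PYTHONPATH', '')
    exit_code = subprocess.call(['/tmp/bzt/bin/bzt', 'tests/taurus_test.yaml'], env=env)
    return {'statusCode': 200, 'body': json.dumps({'bzt_exit_code': exit_code})}

Whether the 15-minute Lambda timeout and the /tmp size limit are acceptable for a load test is a separate question, which is why the container, Fargate, or EC2 routes may still be preferable.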

AWS SageMaker Model Monitor - ImportError: cannot import name 'ModelQualityMonitor'

I am trying to create a model quality monitoring job using the ModelQualityMonitor class from the SageMaker model_monitor module. I think I have all the import statements defined, yet I get a 'cannot import name' error:
from sagemaker import get_execution_role, session, Session
from sagemaker.model_monitor import ModelQualityMonitor

role = get_execution_role()
session = Session()

model_quality_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=session
)
Any pointers are appreciated
Are you using an Amazon SageMaker Notebook? When I run your code above in a new conda_python3 Amazon SageMaker notebook, I don't get any errors at all.
Example screenshot output showing no errors.
If you're getting something like ImportError: cannot import name 'ModelQualityMonitor', then I suspect you are running in a Python environment that doesn't have the Amazon SageMaker SDK installed in it. Perhaps try running pip install sagemaker and then see if this resolves your error.
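A quick way to check which SDK the notebook kernel is actually using (my addition, not part of the answer above; as far as I know ModelQualityMonitor ships with the 2.x SageMaker Python SDK, so an old 1.x install would also explain the import failure):

# Run this in the same kernel that raises the ImportError.
import sagemaker
print(sagemaker.__version__)  # expect a 2.x release; if not, try `pip install -U sagemaker`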

Issue with importing dependencies while running Dataflow from Google Cloud Composer

I'm running Dataflow from Google Cloud Composer. The Dataflow script contains some non-standard dependencies like zeep and googleads,
which need to be installed on the Dataflow worker nodes, so I packaged them with setup.py. When I try to run this in a DAG, Composer validates the Dataflow files and complains about 'No module named zeep, googleads'. So I created a PythonVirtualenvOperator, installed all the required non-standard dependencies, and tried to run the Dataflow job, and it still complained about importing zeep and googleads.
Here is my codebase:
PULL_DATA = PythonVirtualenvOperator(
    task_id=PROCESS_TASK_ID,
    python_callable=execute_dataflow,
    op_kwargs={
        'main': 'main.py',
        'project': PROJECT,
        'temp_location': 'gs://bucket/temp',
        'setup_file': 'setup.py',
        'max_num_workers': 2,
        'output': 'gs://bucket/output',
        'project_id': PROJECT_ID},
    requirements=['google-cloud-storage==1.10.0', 'zeep==3.2.0',
                  'argparse==1.4.0', 'google-cloud-kms==0.2.1',
                  'googleads==15.0.2', 'dill'],
    python_version='2.7',
    use_dill=True,
    system_site_packages=True,
    on_failure_callback=on_failure_handler,
    on_success_callback=on_success_handler,
    dag='my-dag')
and my Python callable code:
def execute_dataflow(**kwargs):
    import subprocess
    TEMPLATED_COMMAND = """
    python main.py \
        --runner DataflowRunner \
        --project {project} \
        --region us-central1 \
        --temp_location {temp_location} \
        --setup_file {setup_file} \
        --output {output} \
        --project_id {project_id}
    """.format(**kwargs)
    process = subprocess.Popen(['/bin/bash', '-c', TEMPLATED_COMMAND])
    process.wait()
    return process.returncode
My main.py file:
import zeep
import googleads

# {Apache Beam code to construct the Dataflow pipeline}
Any suggestions?
My job has a requirements.txt. Rather than using the --setup_file option as yours does, it specifies the following:
--requirements_file prod_requirements.txt
This tells Dataflow to install the libraries listed in the requirements file prior to running the job.
Reference: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
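For illustration, here is a minimal sketch of how such a pipeline might pass the requirements file through Beam's pipeline options (the project, bucket, and file names are placeholders; the real main.py would build its own transforms):

# Hypothetical main.py skeleton: workers install prod_requirements.txt before the job starts.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='your-project-id',                  # placeholder
    region='us-central1',
    temp_location='gs://your-bucket/temp',      # placeholder
    requirements_file='prod_requirements.txt',  # zeep, googleads, ... listed here
)

with beam.Pipeline(options=options) as p:
    p | beam.Create(['placeholder']) | beam.Map(print)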
Using a sample Dataflow pipeline script with import googleads, zeep, I set up a test Composer environment. The DAG is just as yours, and I get the same error.
Then I make a couple of changes, to make sure the dependencies can be found on the worker machines.
In the DAG, I use a plain PythonOperator, not a PythonVirtualenvOperator.
I have my dataflow pipeline and setup file (main.py and setup.py) in a Google Cloud Storage bucket, so Composer can find them.
The setup file has a list of requirements where I need to have e.g. zeep and googleads. I adapted a sample setup file from here, changing this:
REQUIRED_PACKAGES = [
    'google-cloud-storage==1.10.0', 'zeep==3.2.0',
    'argparse==1.4.0', 'google-cloud-kms==0.2.1',
    'googleads==15.0.2', 'dill'
]

setuptools.setup(
    name='Imports test',
    version='1',
    description='Imports test workflow package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    cmdclass={
        # Command class instantiated and run during pip install scenarios.
        'build': build,
        'CustomCommands': CustomCommands,
    }
)
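For context, the build and CustomCommands names referenced in cmdclass come from the sample setup file mentioned above; here is a trimmed sketch of what they look like (CUSTOM_COMMANDS is left empty and would hold any shell commands to run on the workers):

# Sketch of the command classes referenced in cmdclass, adapted from the Beam sample setup.py.
import subprocess
import setuptools
from distutils.command.build import build as _build


class build(_build):
    """Build command that also triggers the custom commands below."""
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


CUSTOM_COMMANDS = []  # e.g. [['apt-get', 'update']] if system packages are needed on the workers


class CustomCommands(setuptools.Command):
    """Runs each custom command during pip install on the Dataflow worker."""
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        for command in CUSTOM_COMMANDS:
            subprocess.check_call(command)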
My DAG is
with models.DAG(
        'composer_sample',
        schedule_interval=datetime.timedelta(days=1),
        default_args=default_dag_args) as dag:

    PULL_DATA = PythonOperator(
        task_id='PULL_DATA',
        python_callable=execute_dataflow,
        op_kwargs={
            'main': '/home/airflow/gcs/data/main.py',
            'project': PROJECT,
            'temp_location': 'gs://dataflow-imports-test/temp',
            'setup_file': '/home/airflow/gcs/data/setup.py',
            'max_num_workers': 2,
            'output': 'gs://dataflow-imports-test/output',
            'project_id': PROJECT_ID})

    PULL_DATA
with no changes to the Python callable. However, with this configuration I still get the error.
Next step, in the Google Cloud Platform (GCP) console, I go to "Composer" through the navigation menu, and then click on the environment's name. On the tab "PyPI packages", I add zeep and googleads, and click "submit". It takes a while to update the environment, but it works.
After this step, my pipeline is able to import the dependencies and run successfully. I also tried running the DAG with the dependencies indicated on the GCP console but not in the requirements of setup.py, and the workflow breaks again, in different places. So make sure to indicate them in both places.
You need to install the libraries in your Cloud Composer environment (check out this link). There is a way to do it within the console but I find these steps easier:
Open your environments page
Select the actual environment where your Composer is running
Navigate to the PyPI Packages tab
Click edit
Manually add each line of your requirements.txt
Save
You might get an error if the version you provided for a library is too old, so check the logs and update the numbers, as needed.

Get files from S3 using Jython in Grinder test script

I have master/worker EC2 instances that I'm using for Grinder tests. I need to try out a load test that directly gets files from an S3 bucket, but I'm not sure how that would look in Jython for the Grinder test script.
Any ideas or tips? I've looked into it a little and saw that Python has the boto package for working with AWS - would that work in Jython as well?
(Edit - adding code and import errors for clarification.)
Python approach:
Did "pip install boto3"
Test script:
from net.grinder.script.Grinder import grinder
from net.grinder.script import Test
import boto3
# boto3 for Python

test1 = Test(1, "S3 request")
resource = boto3.resource('s3')

def accessS3():
    obj = resource.Object(<bucket>, <key>)

test1.record(accessS3)

class TestRunner:
    def __call__(self):
        accessS3()
The error for this is:
net.grinder.scriptengine.jython.JythonScriptExecutionException: : No module named boto3
Java approach:
Added aws-java-sdk-1.11.221 jar from .m2\repository\com\amazonaws\aws-java-sdk\1.11.221\ to CLASSPATH
from net.grinder.script.Grinder import grinder
from net.grinder.script import Test
import com.amazonaws.services.s3 as s3
# aws s3 for Java

test1 = Test(1, "S3 request")
s3Client = s3.AmazonS3ClientBuilder.defaultClient()
test1.record(s3Client)

class TestRunner:
    def __call__(self):
        result = s3Client.getObject(s3.model.getObjectRequest(<bucket>, <key>))
The error for this is:
net.grinder.scriptengine.jython.JythonScriptExecutionException: : No module named amazonaws
I'm also running things on a Windows computer, but I'm using Git Bash.
Given that you are using Jython, I'm not sure whether you want to execute the S3 request in Java or Python syntax.
However, I would suggest following along with the Python guide at the link below:
http://docs.ceph.com/docs/jewel/radosgw/s3/python/
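For reference, a minimal sketch in the style of that guide, using the older boto (2.x) library rather than boto3; the credentials, bucket, and key names are placeholders, and the library still has to be installed somewhere on Jython's module path:

# Plain boto (2.x) S3 fetch, adapted from the linked guide; all names below are placeholders.
import boto

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
)
bucket = conn.get_bucket('my-bucket')
key = bucket.get_key('my-object-key')
data = key.get_contents_as_string()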