I am doing this in a Django / GAE / Python environment:
cron:
#run events every 12 hours
and
def events(request):
    # read all records
    # Do some processing on a few records
    return http.HttpResponseGone('Some Records are modified')
Result in production:
The job runs on time, but with a 'failed' message.
However, it has done exactly what was required on the datastore.
No error log entry is seen.
Dev: no errors; it returns the message 'Some Records are modified'.
Is it possible to avoid returning an HTTP response? I have no need for the HttpResponse, but I have kept it because dev-server testing fails in its absence. Can someone help me clean up the code?
Gone is error 410. You should return 200 (OK) if the operation succeeds. When you return a plain HttpResponse, the default status is 200.
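For example, here is a minimal sketch of the view from the question returning the default 200 status:

from django.http import HttpResponse

def events(request):
    # read all records
    # do some processing on a few records
    # A plain HttpResponse defaults to status 200, so the cron job is reported as succeeded.
    return HttpResponse('Some Records are modified')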
I am not sure where I am going wrong. I am running a DAG and I have set retries to 5. My expectation is that if a task fails, it will retry 5 times, and if it still doesn't pass it will be marked as failed. Instead, the task is marked as success and triggers the next (downstream) task, even though it has not done all the updates required for it to be complete. Am I missing any required parameter? Thanks.
Note: I am using a Dataproc cluster and updating records in a BigQuery table, which has a limit of 1500 updates/table/day. That limit was exceeded, but Airflow didn't mark the task as failed; instead it triggered the downstream task. Is there a fix so that the task fails when the limit/quota is exceeded?
The issue: "write to table failed: Reason: 403 limit error due to append/update limit exceeded". After this, it just kept running the rest of the code. How can I tell Airflow that this is an error and it should stop the task?
def writing_to_bq(df, table, retry):
    for _ in range(retry + 1):
        try:
            df.to_gbq(table, if_exists="append")
            break
        except Exception as e:
            # The exception is only logged, so the caller never sees the failure.
            log.info("Trying again")

writing_to_bq(df=ent1, table=table, retry=3)
Normally, if you set the args globally (for all your tasks) or only on the task, it should work as expected:
dag_default_args = {
    'owner': 'Airflow',
    'retries': 5,
    'retry_delay': timedelta(minutes=2),
    ...
}

with airflow.DAG(
        "your_dag_id",
        default_args=dag_default_args,
        schedule_interval=None) as dag:
    ....
In this example, I set the retry_delay to 2 minutes.
After we discussed this together, you indicated that the BigQueryInsertJobOperator doesn't mark the task as failed when you receive a 403 status for an exceeded quota limit.
You can also use a PythonOperator with the Python BigQuery client; in that case the client will normally throw an error for a 403 status. If it doesn't, you have more flexibility to apply logic based on the result:
def execute_bq_query():
    from google.cloud import bigquery

    client = bigquery.Client()
    QUERY = 'SELECT ....'
    query_job = client.query(QUERY)
    rows = query_job.result()

PythonOperator(
    task_id="task_id",
    python_callable=execute_bq_query
)
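If you want the failure to be explicit, here is a minimal sketch of how the callable could be wired into the DAG from the snippet above. The import paths assume Airflow 2.x, and catching google.api_core.exceptions.Forbidden to re-raise it as an AirflowException is my suggestion, not something from the original question:

import airflow
from airflow.exceptions import AirflowException
from airflow.operators.python import PythonOperator
from google.api_core.exceptions import Forbidden

def execute_bq_query():
    from google.cloud import bigquery

    client = bigquery.Client()
    try:
        query_job = client.query('SELECT ....')  # placeholder query, as above
        return query_job.result()
    except Forbidden as exc:
        # A 403 (e.g. the append/update quota being exceeded) lands here;
        # re-raising makes the Airflow task fail and use the configured retries.
        raise AirflowException("BigQuery quota/permission error: %s" % exc)

with airflow.DAG(
        "your_dag_id",
        default_args=dag_default_args,  # reusing the default args defined above
        schedule_interval=None) as dag:

    PythonOperator(
        task_id="execute_bq_query",
        python_callable=execute_bq_query,
    )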
New to using pytest on APIs. From my understanding, testing creates another instance of Flask. Additionally, the tutorials I have seen suggest creating a separate DB table instance to add, fetch and remove data for test purposes. However, I simply plan to use the remote API URL as the host and make the call against it.
Now, I set up my conftest like this, where the flag --testenv indicates which of the hosts listed below the GET/POST calls should go to:
import pytest
import subprocess

def pytest_addoption(parser):
    """Add option to pass --testenv=api_server to pytest cli command"""
    parser.addoption(
        "--testenv", action="store", default="exodemo", help="my option: type1 or type2"
    )

@pytest.fixture(scope="module")
def testenv(request):
    return request.config.getoption("--testenv")

@pytest.fixture(scope="module")
def testurl(testenv):
    if testenv == 'api_server':
        return 'http://api_url:5000/'
    else:
        return 'http://localhost:5000'
And my test file is written like this:
import json

import pytest
from app import app
from flask import request

def test_nodes(app):
    t_client = app.test_client()
    truth = [
        {
            *body*
        }
    ]
    res = t_client.get('/topology/nodes')
    print(res)
    assert res.status_code == 200
    assert truth == json.loads(res.get_data())
I run the code using this:
python3 -m pytest --testenv api_server
What I expect is that the test would simply make a call to the remote API with the creds, fetch the data regardless of how it gets pulled in the remote code, and bring it back for assertion. However, I am getting a 400 BAD REQUEST error, which looks like this:
assert 400 == 200
E + where 400 = <WrapperTestResponse streamed [400 BAD REQUEST]>.status_code
single_test.py:97: AssertionError
--------------------- Captured stdout call ----------------------
{"timestamp": "2022-07-28 22:11:14,032", "level": "ERROR", "func": "connect_to_mysql_db", "line": 23, "message": "Error connecting to the mysql database (2003, \"Can't connect to MySQL server on 'mysql' ([Errno -3] Temporary failure in name resolution)\")"}
<WrapperTestResponse streamed [400 BAD REQUEST]>
Does this mean that the test is still trying to look up the database locally? I am also unable to figure out which host the test requests are being sent to, so I am kind of stuck here. Looking to get some help here.
Thanks.
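For context, if the goal is to hit the deployed API rather than an in-process Flask app, the test could consume the testurl fixture and call the remote host directly. A rough sketch, assuming the requests library is available (the endpoint path and expected body are placeholders):

import requests

def test_nodes_remote(testurl):
    # Call the host selected via --testenv instead of creating an in-process
    # test client (which would also need the MySQL database reachable locally).
    res = requests.get(testurl.rstrip('/') + '/topology/nodes')

    assert res.status_code == 200
    assert res.json() == [
        {
            # expected body goes here
        }
    ]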
I am trying to run a query in BigQuery right after a Dataflow job completes successfully. I have defined 3 different functions in main.py.
The first one runs the Dataflow job. The second one checks the Dataflow job's status. And the last one runs the query in BigQuery.
The trouble is that the second function checks the Dataflow job status multiple times over a period of time, and even after the Dataflow job completes successfully, it does not stop checking the status.
The function deployment then fails due to a 'function load attempt timed out' error.
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
import os
import re
import config
from google.cloud import bigquery
import time

global flag

def trigger_job(gcs_path, body):
    credentials = GoogleCredentials.get_application_default()
    service = build('dataflow', 'v1b3', credentials=credentials, cache_discovery=False)
    request = service.projects().templates().launch(projectId=config.project_id, gcsPath=gcs_path, body=body)
    response = request.execute()

def get_job_status(location, flag):
    credentials = GoogleCredentials.get_application_default()
    dataflow = build('dataflow', 'v1b3', credentials=credentials, cache_discovery=False)
    result = dataflow.projects().jobs().list(projectId=config.project_id, location=location).execute()
    for job in result['jobs']:
        if re.findall(r'' + re.escape(config.job_name) + '', job['name']):
            while flag == 0:
                if job['currentState'] != "JOB_STATE_DONE":
                    print('NOT DONE')
                else:
                    flag = 1
                    print('DONE')
                    break

def bq(sql):
    client = bigquery.Client()
    query_job = client.query(sql, location='US')

# Module-level code: this runs at import time, when the function is deployed.
gcs_path = config.gcs_path
body = config.body
trigger_job(gcs_path, body)
flag = 0
location = 'us-central1'
get_job_status(location, flag)
sql = """CREATE OR REPLACE TABLE 'table' AS SELECT * FROM 'table'"""
bq(sql)
The Cloud Function timeout is set to 540 seconds, but the deployment fails in 3-4 minutes.
Any help is much appreciated.
It appears from the code snippet provided that your HTTP-triggered Cloud Function is not returning an HTTP response.
All HTTP-based Cloud Functions must return an HTTP response to terminate properly. From the Google documentation, Ensure HTTP functions send an HTTP response (emphasis mine):
If your function is HTTP-triggered, remember to send an HTTP response,
as shown below. Failing to do so can result in your function executing
until timeout. If this occurs, you will be charged for the entire
timeout time. Timeouts may also cause unpredictable behavior or cold
starts on subsequent invocations, resulting in unpredictable behavior
or additional latency.
Thus, you must have a function in your main.py that returns some sort of value, ideally one that can be coerced into a Flask HTTP response.
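Concretely, that usually means wrapping the module-level calls from the question in a single entry-point function and returning a value from it. A minimal sketch, where the entry-point name main, the placeholder SQL, and the return message are assumptions:

def main(request):
    # HTTP-triggered entry point; deploy the function with this as the entry point.
    trigger_job(config.gcs_path, config.body)
    get_job_status('us-central1', 0)
    bq("CREATE OR REPLACE TABLE `dataset.table` AS SELECT * FROM `dataset.source`")  # placeholder SQL

    # Returning a value gives the platform an HTTP response to send back,
    # so the invocation terminates instead of running until timeout.
    return 'Dataflow job checked and BigQuery query submitted', 200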
Update
This issue was caused by me not including a token in the APIClient's header. This is resolved.
I have a standard ModelViewSet at /test-endpoint. I am trying to use APIClient to test the endpoint.
from rest_framework.test import APIClient
... # During this process, a file is uploaded to S3. Could this cause the issue? Again, no errors are thrown. I just get a 500.
self.client = APIClient()
...
sample_call = {
    "name": "test_document",
    "description": "test_document_description"
}
response = self.client.post('/test-endpoint', sample_call, format='json')
self.assertEqual(response.status_code, 201)
This call works with the parameters I set in sample_call and returns a 201. When I run the test, however, I get a 500. How can I modify this so that the test gets the 201?
I run the tests with python src/manage.py test modulename
To rule out the obvious, I copy-pasted the sample call into Postman and ran it without issue. I believe the 500 status code comes from the fact that I'm testing the call rather than using it in a live environment.
No error messages are being thrown beyond the AssertionError:
AssertionError: 500 != 201
Full Output of testing
/home/bryant/.virtualenvs/REDACTED/lib/python3.4/site-packages/django_boto/s3/shortcuts.py:28: RemovedInDjango110Warning: Backwards compatibility for storage backends without support for the `max_length` argument in Storage.get_available_name() will be removed in Django 1.10.
s3.save(full_path, fl)
F
======================================================================
FAIL: test_create (sample.tests.SampleTestCase)
Test CREATE Document
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/bryant/api/redacted/src/sample/tests.py", line 31, in test_create
self.assertEqual(response.status_code, 201)
AssertionError: 500 != 201
----------------------------------------------------------------------
Ran 1 test in 2.673s
FAILED (failures=1)
Destroying test database for alias 'default'...
The S3 warning is expected. Otherwise, all appears normal.
To debug a failing test case in Django/DRF:
Put import pdb; pdb.set_trace() just before the assertion and inspect response.content as suggested by @Igonato, or just add a print(response.content); there is no shame in it.
Increase verbosity of your tests by adding -v 3
Use dot notation to investigate the specific test case: python src/manage.py test modulename.tests.<TestCase>.<function>
I hope these are useful to keep in mind.
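For instance, the first suggestion might look like this inside the failing test (a sketch that reuses the names from the question):

def test_create(self):
    """Test CREATE Document (with the debugging additions)."""
    response = self.client.post('/test-endpoint', sample_call, format='json')

    # Print the body of the 500 response to see the real error...
    print(response.content)

    # ...or drop into the debugger right before the assertion.
    import pdb; pdb.set_trace()

    self.assertEqual(response.status_code, 201)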
I have some code written in Django/Python. The principle is that the HTTP response is a generator function: it spits the output of a subprocess onto the browser window line by line. This works really well when I am using the Django test server. When I use the real server it fails; it basically just beachballs when you press submit on the previous page.
import subprocess

from django.http import HttpResponse
from django.views.decorators.http import condition

@condition(etag_func=None)
def pushviablah(request):
    if 'hostname' in request.POST and request.POST['hostname']:
        hostname = request.POST['hostname']
        command = "blah.pl --host " + hostname + " --noturn"
        return HttpResponse(stream_response_generator(hostname, command),
                            mimetype='text/html')

def stream_response_generator(hostname, command):
    proc = subprocess.Popen(command.split(), bufsize=0, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    yield "<pre>"
    # Yield each line of the subprocess output as soon as it is available.
    for line in iter(proc.stdout.readline, ''):
        yield line
Does anyone have any suggestions on how to get this working on the real server? Or even how to debug why it is not working?
I discovered that the generator function is actually running, but it has to complete before the HttpResponse puts a page on screen. I don't want to have to wait for it to complete before the user sees output; I would like the user to see the output as the subprocess progresses.
I'm wondering if this issue could be related to something in apache2 rather than django.
@evolution, did you use gunicorn to deploy your app? If yes, then you have created a service. I am having a similar kind of issue, but with LibreOffice. From what I have researched, PATH is overriding the command path used by your subprocess. I don't have a solution yet. If you run your app with gunicorn in a terminal, then your code will also work.
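If PATH really is the difference between your shell and the server environment, one workaround (a sketch, not from the original posts; the paths shown are assumptions) is to pass an explicit environment, or an absolute path to the script, when launching the subprocess:

import os
import subprocess

hostname = "example-host"  # placeholder

# Pin PATH (or point at the script by absolute path) so the subprocess resolves
# the same command under Apache/gunicorn as it does in an interactive shell.
env = dict(os.environ, PATH="/usr/local/bin:/usr/bin:/bin")
proc = subprocess.Popen(
    ["/usr/local/bin/blah.pl", "--host", hostname, "--noturn"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    env=env,
)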