Every task failing to execute on Google Cloud Tasks - django

I need to run some asynchronous tasks in a Django app, and I started looking into Google Cloud Tasks. I think I have followed all the instructions, and tried every possible variation I could think of, without success so far.
The problem is that all created tasks go to the queue but fail to execute. The console and the logs report only an HTTP 301 (permanent redirect). For the sake of simplicity, I deployed the same code to two services of an App Engine (standard) app, and routed the task requests to only one of them.
It looks like the code itself is working fine. When I go to "https://[proj].appspot.com/api/v1/tasks", the routine executes nicely and there is no redirection according to DevTools/Network. When Cloud Tasks tries to call "/api/v1/tasks", it fails every time.
If anyone could take a look at the code below and point out what may be causing this failure, I'd appreciate it very much.
Thank you.
#--------------------------------
# [proj]/.../urls.py
#--------------------------------
from [proj].api import tasks

urlpatterns += [
    # tasks api
    path('api/v1/tasks', tasks, name='tasks'),
]
#--------------------------------
# [proj]/api.py:
#--------------------------------
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def tasks(request):
    print('Start api')
    payload = request.body.decode("utf-8")
    print(payload)
    print('End api')
    return HttpResponse('OK')
#--------------------------------
# [proj]/views/manut.py
#--------------------------------
from django.views.generic import View
from django.shortcuts import redirect
from [proj].tasks import TasksCreate

class ManutView(View):
    template_name = '[proj]/manut.html'

    def post(self, request, *args, **kwargs):
        relative_url = '/api/v1/tasks'
        testa_task = TasksCreate()
        resp = testa_task.send_task(
            url=relative_url,
            schedule_time=5,
            payload={'task_type': 1, 'id': 21}
        )
        print(resp)
        return redirect(request.META['HTTP_REFERER'])
#--------------------------------
# [proj]/tasks/tasks.py:
#--------------------------------
from django.conf import settings
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
from typing import Dict, Optional, Union
import json
import time

class TasksCreate:
    def send_task(self,
                  url: str,
                  payload: Optional[Union[str, Dict]] = None,
                  schedule_time: Optional[int] = None,  # in seconds
                  name: Optional[str] = None,
                  ):
        client = tasks_v2.CloudTasksClient()
        parent = client.queue_path(
            settings.GCP_PROJECT,
            settings.GCP_LOCATION,
            settings.GCP_QUEUE,
        )
        # App Engine task:
        task = {
            'app_engine_http_request': {  # Specify the type of request.
                'http_method': 'POST',
                'relative_uri': url,
                'app_engine_routing': {'service': 'tasks'}
            }
        }
        if name:
            task['name'] = name
        if isinstance(payload, dict):
            payload = json.dumps(payload)
        if payload is not None:
            converted_payload = payload.encode()
            # task['http_request']['body'] = converted_payload
            task['app_engine_http_request']['body'] = converted_payload
        if schedule_time is not None:
            now = time.time() + schedule_time
            seconds = int(now)
            nanos = int((now - seconds) * 10 ** 9)
            # Create Timestamp protobuf.
            timestamp = timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos)
            # Add the timestamp to the task.
            task['schedule_time'] = timestamp
        resp = client.create_task(parent, task)
        return resp
# --------------------------------
# [proj]/dispatch.yaml:
# --------------------------------
dispatch:
  - url: "*/api/v1/tasks"
    service: tasks
  - url: "*/api/v1/tasks/"
    service: tasks
  - url: "*appspot.com/*"
    service: default
#--------------------------------
# [proj]/app.yaml & tasks.yaml:
#--------------------------------
runtime: python37
instance_class: F1
automatic_scaling:
  max_instances: 2
service: default
#handlers:
#- url: .*
#  secure: always
#  redirect_http_response_code: 301
#  script: auto
entrypoint: gunicorn -b :$PORT --chdir src server.wsgi
env_variables:
  ...
UPDATE:
Here are the logs for an execution:
{
  insertId: "1lfs38fa9"
  jsonPayload: {
    @type: "type.googleapis.com/google.cloud.tasks.logging.v1.TaskActivityLog"
    attemptResponseLog: {
      attemptDuration: "0.008005s"
      dispatchCount: "5"
      maxAttempts: 0
      responseCount: "5"
      retryTime: "2020-03-09T21:50:33.557783Z"
      scheduleTime: "2020-03-09T21:50:23.548409Z"
      status: "UNAVAILABLE"
      targetAddress: "POST /api/v1/tasks"
      targetType: "APP_ENGINE_HTTP"
    }
    task: "projects/[proj]/locations/us-central1/queues/tectaq/tasks/09687434589619534431"
  }
  logName: "projects/[proj]/logs/cloudtasks.googleapis.com%2Ftask_operations_log"
  receiveTimestamp: "2020-03-09T21:50:24.375681687Z"
  resource: {
    labels: {
      project_id: "[proj]"
      queue_id: "tectaq"
      target_type: "APP_ENGINE_HTTP"
    }
    type: "cloud_tasks_queue"
  }
  severity: "ERROR"
  timestamp: "2020-03-09T21:50:23.557842532Z"
}

At last I could make Cloud Tasks work, but only using the http_request type (with an absolute URL). There was no way I could make the tasks run when they were defined as app_engine_http_request (relative URL).
I had already tried the http_request type with POST, but that was before I exempted the api function from CSRF token checking, and the missing exemption was causing the error Forbidden (Referer checking failed - no Referer.): /api/v1/tasks, which I failed to connect to the CSRF check.
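For reference, here is roughly what the working task body looked like, a minimal sketch assuming the same client and parent as in tasks.py above (the absolute URL is a placeholder for my real service URL):

# http_request targets an absolute URL and worked where the
# app_engine_http_request (relative URL) variant kept failing for me.
task = {
    'http_request': {
        'http_method': 'POST',
        'url': 'https://[proj].appspot.com/api/v1/tasks',
        'body': json.dumps({'task_type': 1, 'id': 21}).encode(),
    }
}
resp = client.create_task(parent, task)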
If someone stumbles across this issue in the future and finds a way to make app_engine_http_request work on Cloud Tasks with Django, I'd still very much like to know the solution.

The problem is that App Engine task handlers do not follow redirects, so you have to find out why the request is being redirected and make an exception for App Engine requests. In my case I was redirecting HTTP to HTTPS and had to make an exception like so (Node Express):
app.use((req, res, next) => {
  const protocol = req.headers['x-forwarded-proto']
  const userAgent = req.headers['user-agent']
  if (userAgent && userAgent.includes('AppEngine-Google')) {
    console.log('USER AGENT IS GAE, SKIPPING REDIRECT TO HTTPS.')
    return next()
  } else if (protocol === 'http') {
    res.redirect(301, `https://${req.headers.host}${req.url}`)
  } else {
    next()
  }
})
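Since this thread is about Django, a rough equivalent of the same idea as Django middleware (my own sketch, not part of the original answer; the class name is made up):

from django.http import HttpResponsePermanentRedirect

class SkipHttpsRedirectForAppEngine:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        user_agent = request.META.get('HTTP_USER_AGENT', '')
        proto = request.META.get('HTTP_X_FORWARDED_PROTO', '')
        # Let App Engine task requests through; redirect everyone else to https.
        if proto == 'http' and 'AppEngine-Google' not in user_agent:
            return HttpResponsePermanentRedirect(
                'https://{}{}'.format(request.get_host(), request.get_full_path()))
        return self.get_response(request)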

The problem is that all created tasks go to the queue but fail to execute. The console and the logs report only an HTTP 301 (permanent redirect).
Maybe the request handler for your task endpoint wants a trailing slash.
Try changing this:
class ManutView(View):
    template_name = '[proj]/manut.html'

    def post(self, request, *args, **kwargs):
        relative_url = '/api/v1/tasks'
        ...
to this:
class ManutView(View):
    template_name = '[proj]/manut.html'

    def post(self, request, *args, **kwargs):
        relative_url = '/api/v1/tasks/'
        ...
Also, just try hitting the task URL yourself and see if you can get a task to run, for example from curl.
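If curl isn't handy, a quick check in Python works too (a sketch; the host and payload are placeholders taken from the question):

import requests

resp = requests.post('https://[proj].appspot.com/api/v1/tasks',
                     data='{"task_type": 1, "id": 21}')
# resp.history lists any redirect hops, which is exactly what Cloud Tasks chokes on.
print(resp.status_code, resp.history)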

If someone stumbles across this issue in the future and finds a way to make app_engine_http_request work on Cloud Tasks with Django, I'd still very much like to know the solution.
@JCampos I managed to make it work in my Django app (I additionally use DRF, but I do not think that makes a big difference).
from django.conf import settings
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime

class CloudTasksMixin:
    @property
    def _cloud_task_client(self):
        return tasks_v2.CloudTasksClient()

    def send_to_cloud_tasks(self, url, http_method='POST', payload=None, in_seconds=None, name=None):
        """Send task to be executed."""
        parent = self._cloud_task_client.queue_path(settings.TASKS['PROJECT_NAME'], settings.TASKS['QUEUE_REGION'], queue=settings.TASKS['QUEUE_NAME'])
        task = {
            'app_engine_http_request': {
                'http_method': http_method,
                'relative_uri': url
            }
        }
        ...
...
And then I use a view like this one:
class CloudTaskView(views.APIView):
    authentication_classes = []

    def post(self, request, *args, **kwargs):
        # Do your stuff
        return Response()
Finally I register this URL in urls.py (with DRF) as csrf_exempt(CloudTaskView.as_view()); see the sketch below.
At first I had a 403 error, but thanks to you and your comment about csrf_exempt, it is now working.
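Spelled out, that registration might look like this (a minimal sketch; the exact path and route name are assumptions):

from django.urls import path
from django.views.decorators.csrf import csrf_exempt

urlpatterns = [
    path('api/v1/tasks/', csrf_exempt(CloudTaskView.as_view()), name='cloud-tasks-handler'),
]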

It seems that Cloud Tasks calls App Engine using an HTTP URL (that's OK because they are probably on the same network), but if you are using HTTPS, Django will redirect (http -> https) any request it receives, including the one to your handler endpoint.
To solve this, you should tell Django not to redirect your handler.
You can use settings.SECURE_REDIRECT_EXEMPT for this.
For instance:
SECURE_REDIRECT_EXEMPT = [r"^api/v1/tasks/$"]
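Note that SECURE_REDIRECT_EXEMPT only takes effect when Django's HTTPS redirect is itself enabled, i.e. when the settings also contain:

SECURE_SSL_REDIRECT = True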

Related

Triggering DAG from Cloud Function gen2 throws requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://<composer-url>/api/v1/dags/test-dag/dagRuns

I am trying to trigger a Composer 2 DAG from a Cloud Function (gen2) when records are inserted into a BigQuery table.
The event I am listening to is google.cloud.bigquery.v2.JobService.InsertJob
and the source is /projects/MYPROJECT/datasets/DATASET/tables/test_trigger.
I am able to get the trigger to the Cloud Function when a record is inserted, but when the Cloud Function tries to trigger the DAG, it throws the following error:
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://<composer-url>/api/v1/dags/test-dag/dagRuns
Here is my main.py:

from typing import Any
import composer2_airflow_rest_api

def trigger_dag_bq(data, context=None):
    web_server_url = (
        "https:<URL>composer.googleusercontent.com"
    )
    dag_id = 'test-dag'
    composer2_airflow_rest_api.trigger_dag(web_server_url, dag_id, data)
And composer2_airflow_rest_api.py:

from typing import Any
import google.auth
from google.auth.transport.requests import AuthorizedSession
import requests

# Following GCP best practices, these credentials should be
# constructed at start-up time and used throughout
# https://cloud.google.com/apis/docs/client-libraries-best-practices
AUTH_SCOPE = "https://www.googleapis.com/auth/cloud-platform"
CREDENTIALS, _ = google.auth.default(scopes=[AUTH_SCOPE])

def make_composer2_web_server_request(url: str, method: str = "GET", **kwargs: Any) -> google.auth.transport.Response:
    """
    Make a request to Cloud Composer 2 environment's web server.
    Args:
      url: The URL to fetch.
      method: The request method to use ('GET', 'OPTIONS', 'HEAD', 'POST', 'PUT',
        'PATCH', 'DELETE')
      **kwargs: Any of the parameters defined for the request function:
        https://github.com/requests/requests/blob/master/requests/api.py
        If no timeout is provided, it is set to 90 by default.
    """
    authed_session = AuthorizedSession(CREDENTIALS)
    # Set the default timeout, if missing
    if "timeout" not in kwargs:
        kwargs["timeout"] = 90
    return authed_session.request(method, url, **kwargs)

def trigger_dag(web_server_url: str, dag_id: str, data: dict) -> str:
    """
    Make a request to trigger a dag using the stable Airflow 2 REST API.
    https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
    Args:
      web_server_url: The URL of the Airflow 2 web server.
      dag_id: The DAG ID.
      data: Additional configuration parameters for the DAG run (json).
    """
    endpoint = f"api/v1/dags/{dag_id}/dagRuns"
    request_url = f"{web_server_url}/{endpoint}"
    json_data = {"conf": data.decode('utf-8')}
    response = make_composer2_web_server_request(
        request_url, method="POST", json=json_data
    )
    if response.status_code == 403:
        raise requests.HTTPError(
            "You do not have a permission to perform this operation. "
            "Check Airflow RBAC roles for your account."
            f"{response.headers} / {response.text}"
        )
    elif response.status_code != 200:
        response.raise_for_status()
    else:
        return response.text
Does anyone know what's going wrong?
All the resources are in the same project.

Using Websocket in Django View Not Working

Problem Summary
I am sending data to a front-end (React component) using Django and web-sockets. When I run the app and send the data from my console, everything works. When I use a button on the front-end to trigger a Django view that runs the same function, it does not work and generates a confusing error message.
I want to be able to click a front-end button which begins sending the data to the websocket.
I am new to Django, websockets and React, so I respectfully ask you to be patient.
Overview
Django back-end and React front-end connected using Django Channels (web-sockets).
User clicks a button on the front-end, which calls fetch() on a Django REST API endpoint.
[NOT WORKING] The above endpoint's view begins sending data through the web-socket.
Front-end is updated with this value.
Short Error Description
The error Traceback is long, so it is included at the end of this post. It begins with:
Internal Server Error: /api/run-create
And ends with:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
What I've Tried
Sending Data Outside The Django View
The function below sends data to the web-socket.
It works perfectly when I run it in my console - the front-end updates as expected.
Note: the same function causes the attached error when run from inside the Django view.
import json
import time
import numpy as np
import websocket

def gen_fake_path(num_cities):
    path = list(np.random.choice(num_cities, num_cities, replace=False))
    path = [int(num) for num in path]
    return json.dumps({"path": path})

def fake_run(num_cities, limit=1000):
    ws = websocket.WebSocket()
    ws.connect("ws://localhost:8000/ws/canvas_data")
    while limit:
        path_json = gen_fake_path(num_cities)
        print(f"Sending {path_json} (limit: {limit})")
        ws.send(path_json)
        time.sleep(3)
        limit -= 1
    print("Sending complete!")
    ws.close()
    return
Additional Detail
Relevant Files and Configuration
consumer.py
import json
from channels.generic.websocket import AsyncWebsocketConsumer

class AsyncCanvasConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.group_name = "dashboard"
        await self.channel_layer.group_add(self.group_name, self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard(self.group_name, self.channel_name)

    async def receive(self, text_data=None, bytes_data=None):
        print(f"Received: {text_data}")
        data = json.loads(text_data)
        to_send = {"type": "prep", "path": data["path"]}
        await self.channel_layer.group_send(self.group_name, to_send)

    async def prep(self, event):
        send_json = json.dumps({"path": event["path"]})
        await self.send(text_data=send_json)
Relevant views.py
#api_view(["POST", "GET"])
def run_create(request):
serializer = RunSerializer(data=request.data)
if not serializer.is_valid():
return Response({"Bad Request": "Invalid data..."}, status=status.HTTP_400_BAD_REQUEST)
# TODO: Do run here.
serializer.save()
fake_run(num_cities, limit=1000)
return Response(serializer.data, status=status.HTTP_200_OK)
Relevant settings.py
WSGI_APPLICATION = 'evolving_salesman.wsgi.application'
ASGI_APPLICATION = 'evolving_salesman.asgi.application'
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels.layers.InMemoryChannelLayer"
    }
}
Relevant routing.py
websocket_url_pattern = [
    path("ws/canvas_data", AsyncCanvasConsumer.as_asgi()),
]
Full Error
https://pastebin.com/rnGhrgUw
EDIT: SOLUTION
The suggestion by Kunal Solanke solved the issue. Instead of using fake_run() I used the following:

from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

layer = get_channel_layer()
for i in range(10):
    path = list(np.random.choice(4, 4, replace=False))
    path = [int(num) for num in path]
    async_to_sync(layer.group_send)("dashboard", {"type": "prep", "path": path})
    time.sleep(3)
Rather than creating a new connection from the same server to itself, I'd suggest you use the get_channel_layer utility, because otherwise you are increasing the server load by opening so many connections.
Once you get the channel layer, you can simply do a group send, as we normally do to send events.
You can read more about it here
from channels.layers import get_channel_layer
from asgiref.sync import async_to_sync

def media_image(request, chat_id):
    if request.method == "POST":
        data = {}
        if request.FILES["media_image"] is not None:
            item = Image.objects.create(owner=request.user, file=request.FILES["media_image"])
            message = Message.objects.create(item=item, user=request.user)
            chat = Chat.objects.get(id=chat_id)
            chat.messages.add(message)
            layer = get_channel_layer()
            item = {
                "media_type": "image",
                "url": item.file.url,
                "user": request.user.username,
                "caption": item.title
            }
            async_to_sync(layer.group_send)(
                'chat_%s' % str(chat_id),
                # this is the channel group name, which is defined inside your consumer
                {
                    "type": "send_media",
                    "item": item
                }
            )
    return HttpResponse("media sent")
In the error log, I can see that the handshake succeeded for the first iteration and failed for the 2nd. You can check that by printing something in the for loop. If that's the case, the handshake most probably failed due to multiple connections. I don't know how many connections the InMemoryChannelLayer supports from the same origin, but that can be the reason the 2nd connection is getting disconnected. You can get some idea in the Channels docs. Try using Redis if you don't want to change your code; it's pretty easy if you are using Linux.
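For reference, switching the channel layer to Redis looks something like this (a sketch, assuming channels_redis is installed and Redis is listening on localhost:6379):

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("127.0.0.1", 6379)],
        },
    }
}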

asyncio task hangs midway

So I am building a scraper which takes a batch of URLs and a success callback that will run via Celery for each URL that is fetched successfully; if any error occurs, the URLs that were not successful are collected and sent to another Celery function that schedules them again.
Below is the code.
import asyncio
import logging

import aiohttp
from aiohttp import ClientTimeout
from aiohttp.client_exceptions import ClientHttpProxyError

logger = logging.getLogger(__name__)

class AsyncRequest:
    def __init__(self, urls_batch, callback, task_name, method, finish_callback=None, *args, **kwargs):
        """
        :param urls_batch: List of urls to fetch in async
        :param callback: Callback that processes a successful response
        """
        self.tasks = []
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
        }
        self.urls_batch = urls_batch
        self.task_name = task_name
        self.method = method
        self.callback = callback
        self.finish_callback = finish_callback
        self.args = args
        self.kwargs = kwargs
        self.proxy = kwargs["proxy"] if "proxy" in kwargs.keys() else None
        self.successfull_urls = []
        self.verify_ssl = kwargs["verify_ssl"] if "verify_ssl" in kwargs.keys() else True

    async def fetch(self, session, url, time_out=15, retry_limit=3, *args, **kwargs):
        try:
            for i in range(retry_limit):
                try:
                    async with session.request(self.method, url, headers=self.headers,
                                               timeout=ClientTimeout(total=None, sock_connect=time_out,
                                                                     sock_read=time_out),
                                               verify_ssl=self.verify_ssl, proxy=self.proxy) as response:
                        if response.status in [200, 203]:
                            result = await response.text()
                            self.successfull_urls.append(url)
                            # I don't think it's a celery issue because even if I comment out the below line it still gets stuck
                            # self.callback.delay(result, url=url, *self.args, **self.kwargs)
                            return
                        else:
                            logger.error(
                                "{} ERROR===============================================>".format(self.task_name))
                            logger.error("status: {}".format(response.status))
                except ClientHttpProxyError as e:
                    logger.error("{} ---> {} url: {}, timeout: {}".format(self.task_name, type(e), url, time_out))
                    self.proxy = UpdateProxy()  # UpdateProxy comes from elsewhere in the project
        except Exception as e:
            logger.error(
                "{} ---> {} url: {}, timeout: {}!!!! returning".format(self.task_name, type(e), url, time_out))
            logger.error("pkm: {} errored".format(self.kwargs["search_url_pkm_mapping"][url]))

    async def main(self):
        async with aiohttp.ClientSession(timeout=100) as session:
            results = await asyncio.gather(*(self.fetch(session, url) for url in self.urls_batch),
                                           return_exceptions=True)
            logger.info("Gather operation done ----------> Results: {}".format(results))
            logger.info("{} Successful Urls".format(len(self.successfull_urls)))
            errored_urls = [url for url in self.urls_batch if url not in self.successfull_urls]
            logger.error("{} urls errored".format(len(errored_urls)))
            # Below code to send errored urls to a celery task that schedules them again
            # if self.finish_callback and len(errored_urls) > 0:
            #     self.finish_callback.delay(errored_urls, self.task_name, *self.args, **self.kwargs)
So what happens is: if I send a batch of 50 URLs, 40 to 45 of them work perfectly and the remaining just hang. Nothing happens; I'd expect the tasks to at least throw some error due to a network issue, the server, or anything, so that they finish and the code after gather (which logs the number of successful and errored URLs) executes. But that does not happen: the code just hangs, the log lines after gather are never executed, and I don't know where the error is.
Any help will be highly appreciated.
EDIT: I have removed the Celery code and am just fetching and keeping track of the successful and errored URLs. It still gets stuck. If it's relevant, I am sending the requests to Google (but even if Google were blocking my requests, some error should be thrown, right?).
EDIT2: One more thing I'd like to note: if I keep the URL batch size small, say 15 to 20, then there is no hang and everything works as expected. But the moment I increase the batch to, say, 50, it gets stuck on 3 to 5 URLs.
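One way to narrow down where it hangs (a diagnostic sketch, not a confirmed fix): wrap each fetch in asyncio.wait_for, so a task that stalls past a hard deadline raises asyncio.TimeoutError instead of blocking gather forever:

results = await asyncio.gather(
    *(asyncio.wait_for(self.fetch(session, url), timeout=60) for url in self.urls_batch),
    return_exceptions=True,
)
# With return_exceptions=True, stalled tasks show up as TimeoutError
# entries in results instead of hanging the whole batch.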

Authentication with GitLab to a terminal

I have a terminal served in a web browser with Wetty. I want to authenticate the user with GitLab to let them interact with the terminal (it is inside a Docker container; when the user is authenticated I'll allow them to see the container's terminal).
I am trying to do OAuth 2.0 but couldn't manage to get it working.
This is what I tried:
I created an application on GitLab.
I got the code and secret and made an HTTP call with a Python script.
The script directed me to the login and authentication page.
I tried to get the code but failed (there is no mistake in the code, I think).
Now the problem starts here. I need to get the auth code from the redirect URL to obtain the access token, but couldn't figure out how. I used the Flask library to get the code.
from flask import Flask, abort, request
from uuid import uuid4
import requests
import requests.auth
import urllib

CLIENT_ID = "clientid"
CLIENT_SECRET = "clientsecret"
REDIRECT_URI = "https://UnrelevantFromGitlabLink.com/console"

def user_agent():
    raise NotImplementedError()

def base_headers():
    return {"User-Agent": user_agent()}

app = Flask(__name__)

@app.route('/')
def homepage():
    text = '<a href="%s">Authenticate with gitlab</a>'
    return text % make_authorization_url()

def make_authorization_url():
    # Generate a random string for the state parameter
    # Save it for use later to prevent xsrf attacks
    state = str(uuid4())
    save_created_state(state)
    params = {"client_id": CLIENT_ID,
              "response_type": "code",
              "state": state,
              "redirect_uri": REDIRECT_URI,
              "scope": "api"}
    url = "https://GitlapDomain/oauth/authorize?" + urllib.urlencode(params)
    print(url)
    return url

# Left as an exercise to the reader.
# You may want to store valid states in a database or memcache.
def save_created_state(state):
    pass

def is_valid_state(state):
    return True

@app.route('/console')
def reddit_callback():
    print("-----------------")
    error = request.args.get('error', '')
    if error:
        return "Error: " + error
    state = request.args.get('state', '')
    if not is_valid_state(state):
        # Uh-oh, this request wasn't started by us!
        abort(403)
    code = request.args.get('code')
    print(code)
    access_token = get_token(code)
    # Note: In most cases, you'll want to store the access token, in, say,
    # a session for use in other parts of your web app.
    return "Your gitlab username is: %s" % get_username(access_token)

def get_token(code):
    client_auth = requests.auth.HTTPBasicAuth(CLIENT_ID, CLIENT_SECRET)
    post_data = {"grant_type": "authorization_code",
                 "code": code,
                 "redirect_uri": REDIRECT_URI}
    headers = base_headers()
    response = requests.post("https://MyGitlabDomain/oauth/token",
                             auth=client_auth,
                             headers=headers,
                             data=post_data)
    token_json = response.json()
    return token_json["access_token"]

if __name__ == '__main__':
    app.run(host="0.0.0.0", debug=True, port=65010)
I think my problem is with my redirect URL, because it is just an irrelevant link from GitLab and there is no API there I can call.
If I could get the
@app.route('/console')
route to fire in Python, my problem would probably be solved.
I need a correction to my Python script, or a different angle to solve my problem. Please help.
I had totally misunderstood the concept of OAuth 2.0. The main aim is to obtain the access_token. When I corrected the callback URL to localhost, it worked like a charm.
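In other words, the redirect URI has to point back at this Flask app so the /console route actually receives the code. A sketch of the corrected value (the port matches the app.run call above; the GitLab application's redirect URI must be changed to match as well):

REDIRECT_URI = "http://localhost:65010/console"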

Django system check stuck on unreachable url

In my project I use the requests library to send a POST request. The URL for that request is hardcoded in a function, which is accessed from views.py.
The problem is that when I don't have an internet connection, or the host the URL points to is down, I can't launch the development server; it gets stuck on "Performing system checks...". However, if I comment out the line with the URL, or change it to a host that is guaranteed to work, the check passes.
What is a good workaround here?
views.py
def index(request):
    s = Sync()
    s.do()
    return HttpResponse("Hello, world. You're at the polls index.")
sync.py
import json
import requests

class Sync:
    def do(self):
        reservations = Reservation.objects.filter(is_synced=False)
        for reservation in reservations:
            serializer = ReservationPKSerializer(reservation)
            dictionary = {'url': 'url', 'hash': 'hash', 'json': serializer.data}
            encoded_data = json.dumps(dictionary)
            r = requests.post('http://gservice.ca29983.tmweb.ru/gdocs/do.php', headers={'Content-Type': 'application/json'}, data=encoded_data)
            if r.status_code == 200:
                reservation.is_synced = True
                reservation.save()
It might appear to be stuck because requests can retry the connection several times and waits a long time for each attempt. Try reducing the retry count to 0 or 1, as described in:
Can I set max_retries for requests.request?
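A sketch of that approach (limiting retries via an HTTPAdapter and adding an explicit timeout so a dead host fails fast; the 5-second value is an arbitrary choice):

import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
session.mount('http://', HTTPAdapter(max_retries=1))
session.mount('https://', HTTPAdapter(max_retries=1))
r = session.post('http://gservice.ca29983.tmweb.ru/gdocs/do.php',
                 headers={'Content-Type': 'application/json'},
                 data=encoded_data, timeout=5)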