Django, Apache2 on Google Kubernetes Engine writing Opencensus Traces to Stackdriver Trace

I have a Django web app served from Apache2 with mod_wsgi in Docker containers running on a Kubernetes cluster in Google Cloud Platform, protected by Identity-Aware Proxy. Everything is working great, but I want to send GCP Stackdriver traces for all requests without writing one for each view in my project. I found middleware to handle this, using OpenCensus. I went through this documentation and was able to manually generate traces that exported to Stackdriver Trace in my project by specifying the StackdriverExporter and passing the project_id parameter as the Google Cloud Platform project number for my project.
Now to make this automatic for ALL requests, I followed the instructions to set up the middleware. In settings.py, I added the module to INSTALLED_APPS, MIDDLEWARE, and set up the OPENCENSUS_TRACE dictionary of options. I also added the OPENCENSUS_TRACE_PARAMS. This works great with the default exporter 'opencensus.trace.exporters.print_exporter.PrintExporter', as I can see the Trace and Span information, including Trace ID and all details in my Apache2 web server logs. However, I want to send these to my Stackdriver Trace processor for analysis.
I tried setting the EXPORTER parameter to opencensus.trace.exporters.stackdriver_exporter.StackdriverExporter, which works when run manually from the shell, as long as you supply the project number.
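For reference, the kind of manual test that worked from the shell looked roughly like this (a minimal sketch; the exporter module path matches the one used in settings.py below, and the span name and project number are placeholders):
from opencensus.trace import tracer as tracer_module
from opencensus.trace.exporters import stackdriver_exporter

# Project number supplied explicitly, as described above (placeholder value).
exporter = stackdriver_exporter.StackdriverExporter(project_id='my_project_number')
tracer = tracer_module.Tracer(exporter=exporter)

# Placeholder span; it is exported to Stackdriver Trace when the block exits.
with tracer.span(name='manual_test_span'):
    pass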
When it is set up to use StackdriverExporter, the web page will not load, the health check starts to fail, and ultimately the page comes back with a 502 error stating I should try again in 30 seconds (I believe the Identity-Aware Proxy generates this error once it detects the failed health check). The server itself generates no errors, and there is nothing in the Apache2 access or error logs.
There is another dictionary in settings.py named OPENCENSUS_TRACE_PARAMS, which I presume is needed to determine which project number the exporter should be using. The example has GCP_EXPORTER_PROJECT set as None, and SERVICE_NAME set as 'my_service'.
What options do I need to set to get the exporter to send back to Stackdriver instead of printing to logs? Do you have any idea about how I can set this up?
settings.py
MIDDLEWARE = (
    ...
    'opencensus.trace.ext.django.middleware.OpencensusMiddleware',
)

INSTALLED_APPS = (
    ...
    'opencensus.trace.ext.django',
)

OPENCENSUS_TRACE = {
    'SAMPLER': 'opencensus.trace.samplers.probability.ProbabilitySampler',
    'EXPORTER': 'opencensus.trace.exporters.stackdriver_exporter.StackdriverExporter',  # This one just makes the server hang with no response or error and kills the health check.
    'PROPAGATOR': 'opencensus.trace.propagation.google_cloud_format.GoogleCloudFormatPropagator',
    # 'EXPORTER': 'opencensus.trace.exporters.print_exporter.PrintExporter',  # This one works to print the Trace and Span with IDs and details in the logs.
}

OPENCENSUS_TRACE_PARAMS = {
    'BLACKLIST_PATHS': ['/health'],
    'GCP_EXPORTER_PROJECT': 'my_project_number',  # Should this be None like the example, or the Project ID, or the Project Number?
    'SAMPLING_RATE': 0.5,
    'SERVICE_NAME': 'my_service',  # Not sure if this is my app name or some other service name.
    'ZIPKIN_EXPORTER_HOST_NAME': 'localhost',  # Are the following even necessary, or are they causing a failure that is not detected by Apache2?
    'ZIPKIN_EXPORTER_PORT': 9411,
    'ZIPKIN_EXPORTER_PROTOCOL': 'http',
    'JAEGER_EXPORTER_HOST_NAME': None,
    'JAEGER_EXPORTER_PORT': None,
    'JAEGER_EXPORTER_AGENT_HOST_NAME': 'localhost',
    'JAEGER_EXPORTER_AGENT_PORT': 6831,
}
Here's an example (I prettified the format for readability) of the Apache2 log when it is set to use the PrintExporter:
[Fri Feb 08 09:00:32.427575 2019]
[wsgi:error]
[pid 1097:tid 139801302882048]
[client 10.48.0.1:43988]
[SpanData(
    name='services.views.my_view',
    context=SpanContext(
        trace_id=e882f23e49e34fc09df621867d753532,
        span_id=None,
        trace_options=TraceOptions(enabled=True),
        tracestate=None
    ),
    span_id='bcbe7b96906a482a',
    parent_span_id=None,
    attributes={
        'http.status_code': '200',
        'http.method': 'GET',
        'http.url': '/',
        'django.user.name': ''
    },
    start_time='2019-02-08T17:00:29.845733Z',
    end_time='2019-02-08T17:00:32.427455Z',
    child_span_count=0,
    stack_trace=None,
    time_events=[],
    links=[],
    status=None,
    same_process_as_parent_span=None,
    span_kind=1
)]
Thanks in advance for any tips, assistance, or troubleshooting advice!
Edit 2019-02-08 6:56 PM UTC:
I found this in the middleware:
# Initialize the exporter
transport = convert_to_import(settings.params.get(TRANSPORT))

if self._exporter.__name__ == 'GoogleCloudExporter':
    _project_id = settings.params.get(GCP_EXPORTER_PROJECT, None)
    self.exporter = self._exporter(
        project_id=_project_id,
        transport=transport)
elif self._exporter.__name__ == 'ZipkinExporter':
    _service_name = self._get_service_name(settings.params)
    _zipkin_host_name = settings.params.get(
        ZIPKIN_EXPORTER_HOST_NAME, 'localhost')
    _zipkin_port = settings.params.get(
        ZIPKIN_EXPORTER_PORT, 9411)
    _zipkin_protocol = settings.params.get(
        ZIPKIN_EXPORTER_PROTOCOL, 'http')
    self.exporter = self._exporter(
        service_name=_service_name,
        host_name=_zipkin_host_name,
        port=_zipkin_port,
        protocol=_zipkin_protocol,
        transport=transport)
elif self._exporter.__name__ == 'TraceExporter':
    _service_name = self._get_service_name(settings.params)
    _endpoint = settings.params.get(
        OCAGENT_TRACE_EXPORTER_ENDPOINT, None)
    self.exporter = self._exporter(
        service_name=_service_name,
        endpoint=_endpoint,
        transport=transport)
elif self._exporter.__name__ == 'JaegerExporter':
    _service_name = self._get_service_name(settings.params)
    self.exporter = self._exporter(
        service_name=_service_name,
        transport=transport)
else:
    self.exporter = self._exporter(transport=transport)
The exporter is now named StackdriverExporter instead of GoogleCloudExporter. I set up a class in my app named GoogleCloudExporter that inherits from StackdriverExporter and updated my settings.py to use GoogleCloudExporter, but it didn't seem to work. I wonder if there is other code referencing the old naming scheme, possibly for the transport; I'm searching the source code for clues. This at least tells me I can get rid of the ZIPKIN and JAEGER param options, since the branch taken is determined by the EXPORTER param.
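For the record, the subclass was nothing more than a rename, so that the middleware's __name__ check above would take the GCP branch (the module path is hypothetical):
# myapp/tracing.py (hypothetical module) -- rename only, so that
# self._exporter.__name__ == 'GoogleCloudExporter' matches in the middleware above.
from opencensus.trace.exporters.stackdriver_exporter import StackdriverExporter

class GoogleCloudExporter(StackdriverExporter):
    pass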
Edit 2019-02-08 11:58 PM UTC:
I scrapped Apache2 to isolate the problem and just set my Docker image to use Django's built-in web server (CMD ["python", "/path/to/manage.py", "runserver", "0.0.0.0:80"]) and it works! When I go to the site, it writes traces to Stackdriver Trace for each request; the span name is the module and method being executed.
Somehow Apache2 is not being allowed to send these, even though I can do so from the shell when running as root. I'm adding apache2 and mod-wsgi tags to the question, because I have a funny feeling this has to do with how Apache2 and mod_wsgi fork child processes. Could the exporter be failing because Apache2's child processes are sandboxed, or could this be a permissions issue? It seems strange, because it is just calling Python modules, not external OS binaries, as far as I'm aware. Any other ideas would be greatly appreciated!
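One configuration detail I still need to rule out (an assumption on my part, not something I've confirmed): mod_wsgi runs each application in a Python sub-interpreter by default, and C-extension libraries like grpc (which the Stackdriver exporter uses) are known to have trouble there, so forcing the main interpreter is a common workaround:
# Hypothetical Apache vhost fragment (names are placeholders), not my actual config.
# WSGIApplicationGroup %{GLOBAL} forces the app into the main interpreter
# instead of a sub-interpreter.
WSGIDaemonProcess myapp processes=2 threads=15
WSGIProcessGroup myapp
WSGIApplicationGroup %{GLOBAL}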

I had this problem while using gunicorn with gevent as the worker class. To resolve it and get Cloud Trace working, the solution was to monkey-patch grpc like so:
from gevent import monkey
monkey.patch_all()
import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()
See https://github.com/grpc/grpc/issues/4629#issuecomment-376962677
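One possible placement (my assumption, not stated in the linked issue) is at the very top of the WSGI entry module, before Django or grpc are imported; the project and settings names are placeholders:
# myproject/wsgi.py (hypothetical) -- patch before anything else is imported
from gevent import monkey
monkey.patch_all()

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()  # make grpc cooperate with gevent's event loop

import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()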

Related

Airflow SSHOperator: How To Securely Access Pem File Across Tasks?

We are running Airflow via AWS's managed MWAA Offering. As part of their offering they include a tutorial on securely using the SSH Operator in conjunction with AWS Secrets Manager. The gist of how their solution works is described below:
Run a Task that fetches the pem file from a Secrets Manager location and stores it on the filesystem at /tmp/mypem.pem.
In the SSH Connection include the extra information that specifies the file location
{"key_file":"/tmp/mypem.pem"}
Use the SSH Connection in the SSHOperator.
In short the workflow is supposed to be:
Task1 gets the pem -> Task2 uses the pem via the SSHOperator
All of this is great in theory, but it doesn't actually work. It doesn't work because Task1 may run on a different node from Task2, which means Task2 can't access the /tmp/mypem.pem file location that Task1 wrote the file to. AWS is aware of this limitation according to AWS Support, but now we need to understand another way to do this.
Question
How can we securely store and access a pem file that can then be used by Tasks running on different nodes via the SSHOperator?
I ran into the same problem. I extended the SSHOperator to do both steps in one call.
In AWS Secrets Manager, two keys are added for airflow to retrieve on execution.
{variables_prefix}/airflow-user-ssh-key : the value of the private key
{connections_prefix}/ssh_airflow_user : ssh://replace.user@replace.remote.host?key_file=%2Ftmp%2Fairflow-user-ssh-key
from typing import Optional, Sequence
from os.path import basename, splitext

from airflow.models import Variable
from airflow.providers.ssh.operators.ssh import SSHOperator
from airflow.providers.ssh.hooks.ssh import SSHHook


class SSHOperator(SSHOperator):
    """
    SSHOperator to execute commands on given remote host using the ssh_hook.

    :param ssh_conn_id: :ref:`ssh connection id<howto/connection:ssh>`
        from airflow Connections.
    :param ssh_key_var: name of Variable holding private key.
        Creates "/tmp/{variable_name}.pem" to use in SSH connection.
        May also be inferred from "key_file" in "extras" in "ssh_conn_id".
    :param remote_host: remote host to connect (templated)
        Nullable. If provided, it will replace the `remote_host` which was
        defined in `ssh_hook` or predefined in the connection of `ssh_conn_id`.
    :param command: command to execute on remote host. (templated)
    :param timeout: (deprecated) timeout (in seconds) for executing the command. The default is 10 seconds.
        Use conn_timeout and cmd_timeout parameters instead.
    :param environment: a dict of shell environment variables. Note that the
        server will reject them silently if `AcceptEnv` is not set in SSH config.
    :param get_pty: request a pseudo-terminal from the server. Set to ``True``
        to have the remote process killed upon task timeout.
        The default is ``False`` but note that `get_pty` is forced to ``True``
        when the `command` starts with ``sudo``.
    """

    template_fields: Sequence[str] = ("command", "remote_host")
    template_ext: Sequence[str] = (".sh",)
    template_fields_renderers = {"command": "bash"}

    def __init__(
        self,
        *,
        ssh_conn_id: Optional[str] = None,
        ssh_key_var: Optional[str] = None,
        remote_host: Optional[str] = None,
        command: Optional[str] = None,
        timeout: Optional[int] = None,
        environment: Optional[dict] = None,
        get_pty: bool = False,
        **kwargs,
    ) -> None:
        super().__init__(
            ssh_conn_id=ssh_conn_id,
            remote_host=remote_host,
            command=command,
            timeout=timeout,
            environment=environment,
            get_pty=get_pty,
            **kwargs,
        )
        if ssh_key_var is None:
            key_file = SSHHook(ssh_conn_id=self.ssh_conn_id).key_file
            key_filename = basename(key_file)
            key_filename_no_extension = splitext(key_filename)[0]
            self.ssh_key_var = key_filename_no_extension
        else:
            self.ssh_key_var = ssh_key_var

    def import_ssh_key(self):
        with open(f"/tmp/{self.ssh_key_var}", "w") as file:
            file.write(Variable.get(self.ssh_key_var))

    def execute(self, context):
        self.import_ssh_key()
        super().execute(context)
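A hypothetical DAG snippet using this operator could look like the following (the dag id, task id, connection id, variable name, and command are all placeholders):
from datetime import datetime
from airflow import DAG

with DAG(dag_id="ssh_example", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    # Writes the key from the Variable to /tmp before running the command over SSH.
    run_remote = SSHOperator(  # the subclass defined above
        task_id="run_remote_command",
        ssh_conn_id="ssh_airflow_user",
        ssh_key_var="airflow-user-ssh-key",
        command="uptime",
    )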
The answer by holly is good. I am sharing a different way I solved this problem. I used the strategy of converting the SSH Connection into a URI and then inputting that into Secrets Manager under the expected connections path, and everything worked great via the SSHOperator. Below are the general steps I took.
Generate an encoded URI
import json
from airflow.models.connection import Connection
from pathlib import Path

pem = Path("/my/pem/file.pem").read_text()

myconn = Connection(
    conn_id="connX",
    conn_type="ssh",
    host="10.x.y.z",
    login="mylogin",
    extra=json.dumps(dict(private_key=pem)),
)

print(myconn.get_uri())
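As a quick sanity check (my addition, not part of the original steps), you can parse the URI back into a Connection to confirm the private key survives the round trip:
# Hypothetical round-trip check: rebuild the Connection from the generated URI.
restored = Connection(conn_id="connX", uri=myconn.get_uri())
assert "private_key" in restored.extra_dejson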
Input that URI under the environment's configured path in Secrets Manager. The important note here is to input the value in the plaintext field without including a key. Example:
airflow/connections/connX and under Plaintext only include the URI value
Now in the SSHOperator you can reference this connection Id like any other.
remote_task = SSHOperator(
    task_id="ssh_and_execute_command",
    ssh_conn_id="connX",
    command="whoami",
)

Why is python-pdfkit hanging on printing page with OpenLayers3 content when run with uWSGI and NGINX?

I'm using Django served by uWSGI and NGINX.
Ubuntu 14.04.1 LTS 64-bit
Python 3.4
Django 1.7.4
uWSGI 1.9.17.1-debian (64bit)
NGINX 1.4.6
python-pdfkit 0.5.0
wkhtmltopdf 0.12.2.1
OpenLayers v3.0.0
When I try running pdfkit.from_url(...) to print a map to pdf the request times out.
More specifically it hangs in python's subprocess.py communicate, self._communicate:
with _PopenSelector() as selector:
    if self.stdin and input:
        selector.register(self.stdin, selectors.EVENT_WRITE)
    if self.stdout:
        selector.register(self.stdout, selectors.EVENT_READ)
    if self.stderr:
        selector.register(self.stderr, selectors.EVENT_READ)

    while selector.get_map():
        ...
selector.get_map() always returns a non-empty result, so the loop never exits.
If I run this in the Django development server (instead of uWSGI+NGINX) everything runs fine.
in my view:
wkhtmltopdfBinLocationString = '/usr/local/bin/wkhtmltopdf'
wkhtmltopdfBinLocationBytes = wkhtmltopdfBinLocationString.encode('utf-8')
# this fixes some leftover python2 assumptions about strings
config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdfBinLocationBytes)

pdfkit.from_url(reportPdfUrl, reportPdfFile, configuration=config, options={
    'javascript-delay': 1500
})
In several places I have seen answers along the lines of "set the close-on-exec flag on the socket" for similar issues.
Is this something I can set from my "from_url" options (wkhtmltopdf does not accept it by that name) or can I configure uWSGI to assume 'close-on-exec'? I have not been able to make either of these work, but maybe I just need help with changing my uWSGI customization file:
[uwsgi]
workers = 1
chdir = [...]
plugins = python34
wsgi-file = [...]/wsgi.py
pythonpath = [...]
I tried something like
close-on-exec = true
but that didn't seem to do anything.
NOTE: the wsgi.py file is simple:
"""
WSGI config for dst project.
It exposes the WSGI callable as a module-level variable named ``application``.
For more information on this file, see
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/
"""
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "[my_project].settings")
from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
Any thoughts?

Django and Bad Request (400)

I created a new Django project;
added to my settings.py:
DEBUG = False
ALLOWED_HOSTS = [
    'localhost',
    'my_site.com',
]
created app test_view;
added hello_world to test_view.views
from django.http.response import HttpResponse

def hello_world(request):
    return HttpResponse('Hello World!!!')
added test route to urls.py url(r'test/', 'test_view.views.hello_world');
fixed /etc/hosts
127.0.0.1 localhost my_site.com
Now when I try to access http://my_site.com:8000/test/, Django returns Bad Request (400). But when the URL is http://localhost:8000/test/, I can see my Hello World page. What can be wrong?
UPD:
The same result with DEBUG = True
UPD2:
One more working hostname is ubuntu-virtualbox (the computer's name).
But even when I changed the computer's name to my_site, ubuntu-virtualbox was still reachable and my_site still returned Bad Request (400).
Could it be because of some system settings? (It's a clean Ubuntu install in VirtualBox.)
Or maybe the problem is in the virtualenv? Is there a way to trace the error?
It might be a bad cookie. Try deleting your cookies.
It looks like Django can tell when the request isn't resolved through a DNS server. Installing and configuring bind9, instead of editing /etc/hosts, solved this problem.
You need another line in your hosts file.
127.0.0.1 localhost
127.0.0.1 my_site.com
Then in your ALLOWED_HOSTS...
ALLOWED_HOSTS = [
    'localhost',
    '.my_site.com',  # not 'my_site.com'
]
ALSO, and this is probably important seeing as you are running your site from a virtual machine, when you run the site with python manage.py runserver, run it like this...
python manage.py runserver virtual.server.ip.address:8000
Obviously replace 'virtual.server.ip.address' with that virtual machine's actual IP address.
I set DEBUG = None and my Django site works.

Running periodic tasks with django and celery

I'm trying to create a simple periodic background task using the Django-Celery-RabbitMQ combination. I installed Django 1.3.1 and downloaded and set up djcelery. Here is what my settings.py file looks like:
BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
....
import djcelery
djcelery.setup_loader()
...
INSTALLED_APPS = (
    'djcelery',
)
And I put a 'tasks.py' file in my application folder with the following contents:
from celery.task import PeriodicTask
from celery.registry import tasks
from datetime import timedelta
from datetime import datetime

class MyTask(PeriodicTask):
    run_every = timedelta(minutes=1)

    def run(self, **kwargs):
        self.get_logger().info("Time now: " + str(datetime.now()))
        print("Time now: " + str(datetime.now()))

tasks.register(MyTask)
And then I start up my django server (local development instance):
python manage.py runserver
Then I start up the celerybeat process:
python manage.py celerybeat --logfile=<path_to_log_file> -l DEBUG
I can see entries like this in the log:
[2012-04-29 07:50:54,671: DEBUG/MainProcess] tasks.MyTask sent. id->72a5963c-6e15-4fc5-a078-dd26da663323
I can also see the corresponding entries getting created in the database, but I can't find where it is logging the text I specified in the run function of the MyTask class.
I tried fiddling with the logging settings and tried using the Django logger instead of the Celery logger, but to no avail. I'm not even sure my task is getting executed. If I print any debug information in the task, where does it go?
Also, this is the first time I'm working with any type of message queuing system. It looks like the task will get executed as part of the celerybeat process, outside the Django web framework. Will I still be able to access all the Django models I created?
Thanks,
Venkat.
Celerybeat only schedules tasks and pushes them onto the queue when they are due; it does not execute them. Your task instances are stored on the RabbitMQ server. You need to run the celeryd daemon to actually execute your tasks.
python manage.py celeryd --logfile=<path_to_log_file> -l DEBUG
Also, if you are using RabbitMQ, I recommend enabling the RabbitMQ management plugin:
rabbitmq-plugins list
rabbitmq-plugins enable rabbitmq_management
service rabbitmq-server restart
It will be available at http://<server>:55672/ (login: guest, password: guest). There you can check how many tasks are currently in your RabbitMQ instance.
You should check the RabbitMQ logs, since celery sends the tasks to RabbitMQ and it should execute them. So all the prints of the tasks should be in RabbitMQ logs.

django generator function not working with real server

I have some code written in Django/Python. The principle is that the HTTP response is a generator function that spits the output of a subprocess onto the browser window line by line. This works really well when I am using the Django test server. When I use the real server it fails; basically it just beachballs when you press submit on the previous page.
import subprocess

from django.http import HttpResponse
from django.views.decorators.http import condition

@condition(etag_func=None)
def pushviablah(request):
    if 'hostname' in request.POST and request.POST['hostname']:
        hostname = request.POST['hostname']
        command = "blah.pl --host " + hostname + " --noturn"
        return HttpResponse(stream_response_generator(hostname, command), mimetype='text/html')

def stream_response_generator(hostname, command):
    proc = subprocess.Popen(command.split(), 0, None, subprocess.PIPE, subprocess.PIPE, subprocess.PIPE)
    yield "<pre>"
    var = 1
    while (var == 1):
        for line in proc.stdout.readline():
            yield line
Does anyone have any suggestions on how to get this working on the real server? Or even how to debug why it is not working?
I discovered that the generator function is actually running, but it has to complete before the HttpResponse puts a page on screen. I don't want to wait for it to complete before the user sees output; I would like the user to see output as the subprocess progresses.
I'm wondering if this issue could be related to something in apache2 rather than django.
@evolution, did you use gunicorn to deploy your app? If yes, then you have created a service. I am having a similar kind of issue, but with LibreOffice. As far as I have researched, PATH is overriding the command path used by your subprocess. I do not have a solution yet. If you bind your app with gunicorn in a terminal, then your code will also work.