Google Cloud Functions missing logs issue - google-cloud-platform

I have a small Python CF connected to a Pub/Sub topic that should send out some emails using the SendGrid API.
The CF can dynamically load & run functions based on an env var (CF_FUNCTION_NAME) provided (monorepo architecture):
# main.py
import logging
import os
from importlib import import_module


def get_function(function_name):
    return getattr(import_module(f"functions.{function_name}"), function_name)


def do_nothing(*args):
    return "no function"


cf_function_name = os.getenv("CF_FUNCTION_NAME", False)
disable_logging = os.getenv("CF_DISABLE_LOGGING", False)


def run(*args):
    if not disable_logging and cf_function_name:
        import google.cloud.logging

        client = google.cloud.logging.Client()
        client.get_default_handler()
        client.setup_logging()
        print("Logging enabled")
    cf = get_function(cf_function_name) if cf_function_name else do_nothing
    return cf(*args)
This works fine, except for some issues related to Stackdriver logging:
The print statement "Logging enabled" should be printed on every invocation, but it only appears once?
Exceptions raised in the dynamically loaded function are missing from the logs; instead the logs just show 'finished with status crash', which is not very useful.
Screenshot of the Stackdriver logs of multiple subsequent executions:
[Stackdriver screenshot]
Is there something I'm missing here?
Is my dynamic loading of functions somehow messing with the logging?
Thanks.

I don't see any issue here. When your function is loaded for the first time, one instance is created and logging is set up (hence your logging trace). The instance then stays up until it is evicted (which is unpredictable!).
If you want to see the trace several times, perform 2 calls at the same time. A Cloud Function instance can handle only one request at a time, so 2 parallel calls imply the creation of another instance and thus a new logging initialisation.
The same goes for the exception: if you don't catch and log it, nothing will be logged. Simply catch it!
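If it helps, here is a minimal sketch of what catching and logging the exception could look like in the run() wrapper from the question (reusing the same main.py names from above; logging.exception records the message plus the full traceback, which setup_logging() forwards to Stackdriver):
def run(*args):
    if not disable_logging and cf_function_name:
        import google.cloud.logging

        client = google.cloud.logging.Client()
        client.get_default_handler()
        client.setup_logging()
        print("Logging enabled")
    cf = get_function(cf_function_name) if cf_function_name else do_nothing
    try:
        return cf(*args)
    except Exception:
        # Log the traceback explicitly before re-raising, so you see more
        # than just "finished with status: 'crash'" in the logs.
        logging.exception("Unhandled exception in %s", cf_function_name)
        raise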

It seems there has been an issue with Cloud Functions and Python for about a month now, where errors no longer get logged automatically with tracebacks and categorized correctly as "Error": GCP Cloud Functions no longer categorizes errors correctly with tracebacks

Related

What would be a good way to initialize a subscription for pubsub

I am trying to use Pub/Sub for the first time outside of Cloud Functions, in a Cloud Run service, and I am confused about the use of createSubscription.
I trigger the message from a job in Cloud Scheduler, and when I set the topic, it creates the topic in the project for me if it doesn't exist yet.
Now the Cloud Run service, when it starts, could call createSubscription, because initially there is no subscription yet. But it seems that createSubscription should only be called once (like createTopic), because after that I get an error saying a subscription with that name already exists.
I could place a try/catch around createSubscription and ignore the error on subsequent service deployments, but that seems a bit odd.
What would be a good way to initialize the subscription?
This is what we do in production - we have a try-catch block, so if the sub is already there, we ignore the exception. Make sure to also check the filters if necessary; they might change, and if you change them programmatically you need to recreate the subscription.
TopicName topic = TopicName.ofProjectTopicName(projectId, this.topic);
try {
    client.createSubscription("projects/xxx/subscriptions/" + subscriptionId, topic, PushConfig.getDefaultInstance(), 600);
} catch (AlreadyExistsException e) {
    // sub is already there, nothing to do
}
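Not part of the answer above, but for reference, the same try/except pattern in Python might look like this (a sketch assuming the google-cloud-pubsub 2.x client; the project, topic and subscription ids are placeholders):
from google.api_core.exceptions import AlreadyExists
from google.cloud import pubsub_v1

project_id = "my-project"             # placeholder
topic_id = "my-topic"                 # placeholder
subscription_id = "my-subscription"   # placeholder

subscriber = pubsub_v1.SubscriberClient()
topic_path = f"projects/{project_id}/topics/{topic_id}"
subscription_path = subscriber.subscription_path(project_id, subscription_id)

try:
    subscriber.create_subscription(
        request={"name": subscription_path, "topic": topic_path}
    )
except AlreadyExists:
    # Subscription is already there, nothing to do.
    pass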

How to use a logger in DAG callbacks with Airflow running on Google Cloud Composer?

We are running Apache Airflow in a Google Cloud Composer environment. This runs a pre-built Airflow on Kubernetes, our image version is composer-2.0.32-airflow-2.3.4.
In my_dag.py, we can use the logging module to log something, and the output is visible under "Logs" in Cloud Composer.
import logging
log = logging.getLogger("airflow")
log.setLevel(logging.INFO)
log.info("Hello Airflow logging!")
However, when using the same logger in a callback (e.g. on_failure_callback of a DAG), the log lines do not appear anywhere - not in the Airflow workers, nor the airflow-scheduler, nor dag-processor-manager. I am triggering a DAG failure by setting a short (e.g. 5 minute) timeout, and I confirmed that the callback is indeed running by making an HTTP request to a webhook inside the callback. The webhook is called but the logs are nowhere to be found.
Is there a way to log something in a callback, and find the logs somewhere in Airflow?
Unfortunately, in the on_failure_callback method the logs don't appear in the DAG task logs (Webserver), but they are normally written to Cloud Logging.
In Cloud Logging, select the Cloud Composer Environment resource, then the location (europe-west1) and, finally, the name of the Composer environment: composer-log-error-example.
Then select the airflow-worker logs.
Also, for logging in Airflow DAGs and in methods called by on_failure_callback, I usually use Python logging directly, without any other initialisation, and it works well:
import logging

def task_failure_alert(context):
    logging.info("Hello Airflow logging!")
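For completeness, a minimal sketch of wiring such a callback into a DAG (assuming Airflow 2.3 as in the question; the DAG id, start date and task are placeholders):
import logging
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator


def task_failure_alert(context):
    # context carries dag_run, task_instance, exception, etc.
    logging.info("DAG %s failed: %s", context["dag"].dag_id, context.get("exception"))


with DAG(
    dag_id="failure_callback_example",   # placeholder
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    on_failure_callback=task_failure_alert,
) as dag:
    EmptyOperator(task_id="noop")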

Possible to already have a "ready cursor" in a serverless environment?

Take the following two timings on a trivial SQL statement:
timeit.timeit("""
import MySQLdb;
import settings;
conn = MySQLdb.connect(host=settings.DATABASES['default']['HOST'], port=3306, user=settings.DATABASES['default']['USER'], passwd=settings.DATABASES['default']['PASSWORD'], db=settings.DATABASES['default']['NAME'], charset='utf8');
cursor=conn.cursor();
cursor.execute('select 1');
cursor.fetchone()
""", number=100
)
# 2.5417470932006836
And, the same thing but assuming we already have a cursor that is ready to execute a statement:
timeit.timeit("""
cursor.execute('select 1');
cursor.fetchone()""",
setup="""
import MySQLdb;
import settings;
conn = MySQLdb.connect(host=settings.DATABASES['default']['HOST'], port=3306, user=settings.DATABASES['default']['USER'], passwd=settings.DATABASES['default']['PASSWORD'], db=settings.DATABASES['default']['NAME'], charset='utf8');
cursor=conn.cursor()
""", number=100
)
# 0.1153109073638916
And so we see that the second approach is about 20x faster on initialization time when we don't have to create a new connection/cursor each time.
But how would it be possible to do something like this in a serverless environment? For example, if I were using Google Cloud Functions or Cloud Run, would it be possible to:
Authenticate the user in order to set up a cursor to the database; and
Open a websocket where they can then send the query each time? (For an open websocket, do we need to check authentication on the user each time?)
Or, is there a possible approach to deal with the above overhead in a serverless environment?
As @John Hanley mentioned, on Cloud Run your code can run either continuously as a service or as a job. Both services and jobs run in the same environment and can use the same integrations with other services on Google Cloud.
Cloud Run services: used to run code that responds to web requests or events.
Cloud Run jobs: used to run code that performs work (a job) and quits when the work is done.
A Cloud Run instance that has at least one open WebSocket connection is considered active and is therefore billed for as long as that connection stays open.
WebSockets applications are supported on Cloud Run with no additional configuration required. However, WebSocket streams are HTTP requests and are still subject to the request timeout configured for your Cloud Run service, so you need to make sure this setting works for your use of WebSockets, for example by implementing reconnects in your clients.
I would also recommend checking the documentation on socket instances and the SQL connect functions.
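Separately (not from the answer above), a common way to amortize that connection cost on Cloud Run or Cloud Functions is to keep the connection in a module-level global so warm instances reuse it across requests; a minimal sketch assuming the same MySQLdb/settings setup as in the question:
import MySQLdb
import settings

_conn = None  # reused across requests while the instance stays warm


def get_cursor():
    # Create the connection only on a cold start; later requests hitting
    # the same warm instance skip the expensive connect step.
    global _conn
    if _conn is None:
        _conn = MySQLdb.connect(
            host=settings.DATABASES['default']['HOST'],
            port=3306,
            user=settings.DATABASES['default']['USER'],
            passwd=settings.DATABASES['default']['PASSWORD'],
            db=settings.DATABASES['default']['NAME'],
            charset='utf8',
        )
    return _conn.cursor()
In practice you would also want to handle connections dropped by the server (e.g. ping and reconnect), but with this pattern only cold starts pay the connection setup cost measured above.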

How can I manually specify an X-Cloud-Trace-Context header value to correlate and trace logs in separate Cloud Run requests?

I'm using Cloud Run and Cloud Tasks to do some async processing of webhooks. When I get a request to my Cloud Run service, I queue up a task in my Cloud Tasks queue and return a response from my service immediately. Cloud Tasks will then trigger my service again (different endpoint) and do some processing. I want to correlate all the logs in these steps by using the same trace id, but it is not working.
When creating a task in Cloud Tasks, I request it to send the X-Cloud-Trace-Context header and I fill it with the original request's X-Cloud-Trace-Context header value. Theoretically, when the request comes to my Cloud Run service from Cloud Tasks, it should have this header value, and all my logs will be correlated correctly. However, when this second request comes, it looks like Cloud Run is overriding the header with a new trace id.
Is there a way to prevent this from happening? If not, what is the recommended solution to correlate all the logs (generated by service code and also the logs auto generated by GCP) in the steps described above?
Thanks for the help!
We found that passing along the traceparent header into the Cloud Task works. The trace id is preserved and a new span/parent id is automatically assigned by Cloud Run.
task = {
    "http_request": {
        "url": url,
        "headers": {
            "traceparent": request.headers.get('traceparent', "")
        }
    }
}
Note it also appears to work with "X-Cloud-Trace-Context", but you have to split the value and only pass along the trace id (the Cloud Run header value looks like "trace_id/span_id;flags" -- you have to split out just the trace_id and set that as the task header value). Otherwise it seems like Cloud Run considers the header invalid and, as you mentioned, sets a whole new trace context.
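For reference, that split could be as simple as this (assuming a Flask-style request object; the header format is "TRACE_ID/SPAN_ID;o=1"):
# Keep only the trace id portion of "TRACE_ID/SPAN_ID;o=1"
trace_id = request.headers.get("X-Cloud-Trace-Context", "").split("/")[0]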
As a related note: while this gets the right header into place, you still need to actually log the trace_id in some fashion for your logs to correlate. It looks like the logs generated by Cloud Run itself do this, but I had to configure my logger so that my own logs would also be correlated.
I don't think you can override the HTTP headers set by Cloud Tasks, but you can override the trace member in the log records sent to Stackdriver.
So you could include the original trace ID in the task payload and then override the trace in the logs produced by your Cloud Run endpoint which performs the real work.
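A minimal sketch of that approach using structured logs written to stdout (project_id and trace_id here are whatever you carried in the task payload; Cloud Logging picks up the special logging.googleapis.com/trace field):
import json
import sys


def log_with_trace(message, project_id, trace_id, severity="INFO"):
    # The "logging.googleapis.com/trace" field tells Cloud Logging which
    # trace this entry belongs to, so it is grouped with the request logs.
    entry = {
        "message": message,
        "severity": severity,
        "logging.googleapis.com/trace": f"projects/{project_id}/traces/{trace_id}",
    }
    print(json.dumps(entry), file=sys.stdout, flush=True)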

App Engine local datastore content does not persist

I'm running some basic test code, with web.py and GAE (Windows 7, Python27). The form enables messages to be posted to the datastore. When I stop the app and run it again, any data posted previously has disappeared. Adding entities manually using the admin (http://localhost:8080/_ah/admin/datastore) has the same problem.
I tried setting the path in the Application Settings using Extra flags:
--datastore_path=D:/path/to/app/
(Wasn't sure about syntax there). It had no effect. I searched my computer for *.datastore, and couldn't find any files, either, which seems suspect, although the data is obviously being stored somewhere for the duration of the app running.
from google.appengine.ext import db
import web

urls = (
    '/', 'index',
    '/note', 'note',
    '/crash', 'crash'
)

render = web.template.render('templates/')


class Note(db.Model):
    content = db.StringProperty(multiline=True)
    date = db.DateTimeProperty(auto_now_add=True)


class index:
    def GET(self):
        notes = db.GqlQuery("SELECT * FROM Note ORDER BY date DESC LIMIT 10")
        return render.index(notes)


class note:
    def POST(self):
        i = web.input('content')
        note = Note()
        note.content = i.content
        note.put()
        return web.seeother('/')


class crash:
    def GET(self):
        import logging
        logging.error('test')
        crash


app = web.application(urls, globals())


def main():
    app.cgirun()


if __name__ == '__main__':
    main()
UPDATE:
When I run it via command line, I get the following:
WARNING 2012-04-06 19:07:31,266 rdbms_mysqldb.py:74] The rdbms API is not available because the MySQLdb library could not be loaded.
INFO 2012-04-06 19:07:31,778 appengine_rpc.py:160] Server: appengine.google.com
WARNING 2012-04-06 19:07:31,783 datastore_file_stub.py:513] Could not read datastore data from c:\users\amy\appdata\local\temp\dev_appserver.datastore
WARNING 2012-04-06 19:07:31,851 dev_appserver.py:3394] Could not initialize images API; you are likely missing the Python "PIL" module. ImportError: No module named _imaging
INFO 2012-04-06 19:07:32,052 dev_appserver_multiprocess.py:647] Running application dev~palimpsest01 on port 8080: http://localhost:8080
INFO 2012-04-06 19:07:32,052 dev_appserver_multiprocess.py:649] Admin console is available at: http://localhost:8080/_ah/admin
Suggesting that the datastore... didn't install properly?
As of 1.6.4, we stopped saving the datastore after every write. This method did not work when simulating the transactional model found in the High Replication Datastore (you would lose the last couple of writes). It is also horribly inefficient. We changed it so the datastore dev stub flushes all writes and saves its state on shutdown. It sounds like the dev_appserver is not shutting down correctly. You should see:
Applying all pending transactions and saving the datastore
in the logs when shutting down the server (see the source code). If you don't, it means that the dev_appserver is not being shut down cleanly (with a TERM signal or KeyboardInterrupt).