Stackdriver log entries with same trace ID not associated

I have a NodeJS project hosted on GKE with trace agent and Stackdriver logging enabled. The project is logging to stdout using winston like this:
const { createLogger, format, transports } = require('winston');
const { combine, json } = format;

const addTraceId = format(info => {
  const agent = global._google_trace_agent;
  if (agent) {
    const traceProjectId = agent.getWriterProjectId();
    const traceId = agent.getCurrentContextId();
    if (traceProjectId && traceId) {
      info['logging.googleapis.com/trace'] = `projects/${traceProjectId}/traces/${traceId}`;
    }
  }
  return info;
});

const logger = createLogger({
  level: 'debug',
  transports: new transports.Console(),
  format: combine(
    addTraceId(),
    json()
  )
});
I can see the traceId appear in Stackdriver, and it is consistent across the logs within the same trace. But they all show up as individual log entries instead of being collapsed under the first (request) entry.
I checked that the request log has the header x-cloud-trace-context: "a54d7110fc59c879b7ae67fb481fb89b/113593995793831;o=1" as well.
Also, I can see the tracing done properly in the Trace list console.
And when I deploy the same code to GAE, I can see the logs associated and collapsed under the first entry. Any ideas?

Use the logName app instead of stdout.
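A minimal sketch of one way to do that, assuming the @google-cloud/logging-winston transport (its logName option names the target log; the value 'app' is taken from the answer above):

const { createLogger } = require('winston');
const { LoggingWinston } = require('@google-cloud/logging-winston');

// Writes entries to a log named "app" through the Cloud Logging API,
// bypassing the container's stdout stream entirely
const logger = createLogger({
  level: 'debug',
  transports: [new LoggingWinston({ logName: 'app' })]
});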


Get generated API key from AWS AppSync API created with CDK

I'm trying to access data from my stack where I'm creating an AppSync API. I want to be able to use the generated stack's url and apiKey, but I'm running into issues with them being encoded/tokenized.
In my stack I'm setting some fields to the outputs of the deployed stack:
this.ApiEndpoint = graphAPI.url;
this.Authorization = graphAPI.graphqlApi.apiKey;
When trying to access these properties I get something like ${Token[TOKEN.209]} and not the values.
If I try to resolve the token like so: this.resolve(graphAPI.graphqlApi.apiKey), I instead get { 'Fn::GetAtt': [ 'AppSyncAPIApiDefaultApiKey537321373E', 'ApiKey' ] }.
But I would like to retrieve the key itself as a string, like da2-10lksdkxn4slcrahnf4ka5zpeemq5i.
How would I go about actually extracting the string values for these properties?
The actual values of such Tokens are available only at deploy-time. Until then you can safely pass these token properties between constructs in your CDK code, but they remain opaque placeholders. Depending on your use case, one of these options can help you retrieve the deploy-time values:
If you define a CloudFormation Output for a variable, CDK will (apart from creating it in CloudFormation) print its value to the console after cdk deploy, and can optionally write it to a JSON file you pass with the --outputs-file flag.
// AppsyncStack.ts
new cdk.CfnOutput(this, 'ApiKey', {
  value: this.api.apiKey ?? 'UNDEFINED',
  exportName: 'api-key',
});

// at deploy-time, if you use the flag: --outputs-file cdk.outputs.json
{
  "AppsyncStack": {
    "ApiKey": "da2-ou5z5di6kjcophixxxxxxxxxx",
    "GraphQlUrl": "https://xxxxxxxxxxxxxxxxx.appsync-api.us-east-1.amazonaws.com/graphql"
  }
}
Alternatively, you can write a script to fetch the data post-deploy using the listGraphqlApis and listApiKeys commands of the AppSync JS SDK client. You can run the script locally or, for advanced use cases, wrap it in a CDK Custom Resource construct for deploy-time integration.
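For the Custom Resource route, a rough sketch (illustrative only; the construct id, physical resource id, and response-field path are assumptions) could use CDK's AwsCustomResource to call listApiKeys at deploy time:

const cr = require('aws-cdk-lib/custom-resources');

// Calls appsync:ListApiKeys during deployment and exposes the response
const apiKeyLookup = new cr.AwsCustomResource(this, 'ApiKeyLookup', {
  onUpdate: {
    service: 'AppSync',
    action: 'listApiKeys',
    parameters: { apiId: graphAPI.graphqlApi.apiId },
    physicalResourceId: cr.PhysicalResourceId.of('api-key-lookup'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});

// Still a token at synth time, but resolves to the real key at deploy time
const apiKey = apiKeyLookup.getResponseField('apiKeys.0.id');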
Thanks to @fedonev I was able to extract the API key and url like so:
// Assumes the AWS SDK v3 AppSync client and lodash's flatMap; run inside an async function
const { AppSyncClient, ListGraphqlApisCommand, ListApiKeysCommand } = require('@aws-sdk/client-appsync');
const { flatMap } = require('lodash');

const client = new AppSyncClient({ region: "eu-north-1" });
const command = new ListGraphqlApisCommand({ maxResults: 1 });
const res = await client.send(command);
if (res.graphqlApis) {
  const apiKeysCommand = new ListApiKeysCommand({
    apiId: res.graphqlApis[0].apiId,
  });
  const apiKeyResponse = await client.send(apiKeysCommand);
  const urls = flatMap(res.graphqlApis[0].uris);
  if (apiKeyResponse.apiKeys && res.graphqlApis[0].uris) {
    sendSlackMessage(urls[1], apiKeyResponse.apiKeys[0].id || "");
  }
}

GCP Cloud Tasks: shorten period for creating a previously created named task

We are developing a GCP Cloud Tasks based queue process that sends a status email whenever a particular Firestore doc write-trigger fires. We use Cloud Tasks for two reasons: a delay can be introduced before the email is sent (using a scheduledTime property set 2 minutes in the future), and duplicates can be controlled via a task name formatted as [firestore-collection-name]-[doc-id], since the write trigger on the Firestore doc can fire several times as the document is created and then quickly updated by backend cloud functions.
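For reference, creating such a named, delayed task with the Node.js Cloud Tasks client looks roughly like this (a sketch; PROJECT_ID, LOCATION, QUEUE, and EMAIL_HANDLER_URL are placeholders, not from our actual setup):

const { CloudTasksClient } = require('@google-cloud/tasks');
const client = new CloudTasksClient();

async function enqueueStatusEmail(collection, docId) {
  const parent = client.queuePath(PROJECT_ID, LOCATION, QUEUE);
  const task = {
    // Dedup: the same document always maps to the same task name
    name: client.taskPath(PROJECT_ID, LOCATION, QUEUE, `${collection}-${docId}`),
    // Delay execution by 2 minutes
    scheduleTime: { seconds: Math.floor(Date.now() / 1000) + 120 },
    httpRequest: { httpMethod: 'POST', url: EMAIL_HANDLER_URL },
  };
  const [response] = await client.createTask({ parent, task });
  return response;
}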
Once the task's delay period has been reached, the Cloud Task runs and the email is sent with the updated Firestore document info included. After that the task is deleted from the queue and all is good.
Except:
If the user updates the Firestore doc (say 20 or 30 minutes later), we want to resend the status email but are unable to create a task with the same task name. We get the following error:
409 The task cannot be created because a task with this name existed too recently. For more information about task de-duplication see https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/create#body.request_body.FIELDS.task.
This was unexpected, as the queue is empty at this point and the last task completed successfully. The documentation referenced in the error message says:
If the task's queue was created using Cloud Tasks, then another task with the same name can't be created for ~1hour after the original task was deleted or executed.
Question: is there some way in which this restriction can be bypassed, either by lowering the amount of time or by removing the restriction altogether?
The short answer is no. As you've already pointed out, the docs are very clear regarding this behavior: you must wait ~1 hour before creating a task with the same name as one that was previously created. Neither the API nor the client libraries allow you to decrease this time.
Having said that, I would suggest that instead of reusing the same task ID, you use a different one for each task and add an identifier to the body of the request. For example, using Python:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime

def create_task(project, queue, location, payload=None, in_seconds=None):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = {
        'app_engine_http_request': {
            'http_method': 'POST',
            'relative_uri': '/task/' + queue
        }
    }
    if payload is not None:
        converted_payload = payload.encode()
        task['app_engine_http_request']['body'] = converted_payload
    if in_seconds is not None:
        d = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)
        timestamp = timestamp_pb2.Timestamp()
        timestamp.FromDatetime(d)
        task['schedule_time'] = timestamp
    response = client.create_task(parent, task)
    print('Created task {}'.format(response.name))
    print(response)

# You can change DOCUMENT_ID to USER_ID or something else to identify the task
create_task(PROJECT_ID, QUEUE, REGION, DOCUMENT_ID)
Facing a similar problem of needing to debounce multiple instances of Firestore write-trigger functions, we worked around the default Cloud Tasks task-name based dedup mechanism (still a constraint in Nov 2022) by building a small debounce "helper" using Firestore transactions.
We're using a helper collection _syncHelper_ to implement a delayed throttle for side effects of write-trigger fires; in the OP's case, send one email for all writes within 2 minutes.
In our case we are using the Firebase Functions task queue utils rather than interacting with Cloud Tasks directly, but that's immaterial to the solution. The key is to determine the task's execution time in advance and use that as the "dedup key":
// Assumes the firebase-admin modular APIs
const { getFirestore, Timestamp } = require('firebase-admin/firestore');
const { getFunctions } = require('firebase-admin/functions');

async function enqueueTask(shopId) {
  const queueName = 'doSomething';
  const now = new Date();
  const next = new Date(now.getTime() + 2 * 60 * 1000);
  try {
    const shouldEnqueue = await getFirestore().runTransaction(async t => {
      const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
      const doc = await t.get(syncRef);
      let data = doc.data();
      if (data?.timestamp.toDate() > now) {
        return false;
      }
      await t.set(syncRef, { timestamp: Timestamp.fromDate(next) });
      return true;
    });
    if (shouldEnqueue) {
      let queue = getFunctions().taskQueue(queueName);
      await queue.enqueue(
        { timestamp: next.toISOString() },
        { scheduleTime: next }
      );
    }
  } catch (e) {
    // ...
  }
}
This will ensure a new task is enqueued only if the "next execution" time has passed.
The execution operation (also a cloud function in our case) will remove the sync data entry if it hasn't been changed since it was executed:
// Assumes the firebase-functions task queue API
const functions = require('firebase-functions');
const { getFirestore } = require('firebase-admin/firestore');

exports.doSomething = functions.tasks.taskQueue({
  retryConfig: {
    maxAttempts: 2,
    minBackoffSeconds: 60,
  },
  rateLimits: {
    maxConcurrentDispatches: 2,
  }
}).onDispatch(async data => {
  let { timestamp } = data;
  await sendYourEmailHere();
  await getFirestore().runTransaction(async t => {
    const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
    const doc = await t.get(syncRef);
    let data = doc.data();
    if (data?.timestamp.toDate() <= new Date(timestamp)) {
      await t.delete(syncRef);
    }
  });
});
This isn't a bulletproof solution (if the doSomething() execution function has high latency, for example) but it is good enough for 99% of our use cases.

AWS Lambda using Winston logging loses Request ID

When using console.log to add log rows in AWS CloudWatch, the Lambda Request ID is added to each row, as described in the docs.
A simplified example based on the above-mentioned doc:
exports.handler = async function(event, context) {
  console.log("Hello");
  return context.logStreamName;
};
Would produce output such as
START RequestId: c793869b-ee49-115b-a5b6-4fd21e8dedac Version: $LATEST
2019-06-07T19:11:20.562Z c793869b-ee49-115b-a5b6-4fd21e8dedac INFO Hello
END RequestId: c793869b-ee49-115b-a5b6-4fd21e8dedac
REPORT RequestId: c793869b-ee49-115b-a5b6-4fd21e8dedac Duration: 170.19 ms Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 73 MB
The relevant detail here is the Request ID, c793869b-ee49-115b-a5b6-4fd21e8dedac, which is added after the timestamp on the row containing "Hello".
The AWS documentation states
To output logs from your function code, you can use methods on the console object, or any logging library that writes to stdout or stderr.
The Node.js runtime logs the START, END, and REPORT lines for each invocation, and adds a timestamp, request ID, and log level to each entry logged by the function.
When using Winston as a logger, the Request ID is lost; it could be an issue with formatters or transports. The logger is created like this:
const { createLogger, format, transports } = require('winston');
const { combine, timestamp, printf } = format;

const logger = createLogger({
  level: 'debug',
  format: combine(
    timestamp(),
    printf(
      ({ timestamp, level, message }) => `${timestamp} ${level}: ${message}`
    )
  ),
  transports: [new transports.Console()]
});
I also tried the simple() formatter instead of printf(), but that has no effect on whether the Request ID is present. Removing formatting altogether still prints the plain text, i.e. no timestamp or Request ID.
I also checked the source code of Winston's Console transport, and it uses either console._stdout.write if present, or console.log, for writing, which is what the AWS documentation says is supported.
Is there some way to configure Winston to keep the AWS Lambda Request ID as part of the message?
P.S. There are separate Winston transports for AWS CloudWatch that I am aware of, but they require other setup that I'd like to avoid if possible. Since the Request ID is readily available, they seem like overkill.
P.P.S. The Request ID can also be fetched from the Lambda context and a custom logger object initialized with it, but I'd like to avoid that too, for pretty much the same reason: extra work for something that should be readily available.
The issue is with the usage of console._stdout.write() (effectively process.stdout.write()), which Winston's built-in Console transport uses when present.
For some reason, lines written to stdout go to CloudWatch as-is, and the timestamp/Request ID are not added to the log rows as they are with console.log() calls.
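To illustrate the difference (the console.log output shape is taken from the example above):

// Inside a Lambda handler:
console.log('Hello');
// -> 2019-06-07T19:11:20.562Z c793869b-ee49-115b-a5b6-4fd21e8dedac INFO Hello

process.stdout.write('Hello\n');
// -> Hello   (no timestamp, no Request ID)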
There is a discussion on GitHub about making this a constructor option that could be selected at transport creation, but it was closed as a problem related to specific IDEs and how they handle stdout logs. The issue with AWS Lambdas is mentioned only as a side note in the discussion.
My solution was to make a custom transport for Winston which always uses console.log() to write the messages, leaving the timestamp and Request ID to be filled in by the AWS Lambda Node runtime.
Addition 5/2020:
Below is an example of my solution. Unfortunately I cannot remember many details of the implementation, but I pretty much looked at the Winston sources on GitHub, took the bare minimum implementation, and forced the use of console.log:
'use strict';
const { createLogger, format } = require('winston');
const { combine, printf } = format;
const TransportStream = require('winston-transport');

// Symbol winston uses to store the final formatted message on the info object
const MESSAGE = Symbol.for('message');

class SimpleConsole extends TransportStream {
  constructor(options = {}) {
    super(options);
    this.name = options.name || 'simple-console';
  }

  log(info, callback) {
    setImmediate(() => this.emit('logged', info));
    // Use console.log so the Lambda runtime prepends timestamp and Request ID
    console.log(info[MESSAGE]);
    if (callback) {
      callback();
    }
  }
}

const logger = createLogger({
  level: 'debug',
  format: combine(
    printf(({ level, message }) => `${level.toUpperCase()}: ${message}`)
  ),
  transports: [new SimpleConsole()]
});

const debug = (...args) => logger.debug(args);
// ...and similar definitions for the other logging levels: info, warn, error, etc.

module.exports = {
  debug
  // Also export the other logging levels...
};
Another option
As pointed out by @sanrodari in the comments, the same can be achieved by directly overriding the log method of the built-in Console transport to force the use of console.log:
const winston = require('winston');

// Symbols winston uses to store the level and final message on the info object
const LEVEL = Symbol.for('level');
const MESSAGE = Symbol.for('message');

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console({
      log(info, callback) {
        setImmediate(() => this.emit('logged', info));
        if (this.stderrLevels[info[LEVEL]]) {
          console.error(info[MESSAGE]);
          if (callback) {
            callback();
          }
          return;
        }
        console.log(info[MESSAGE]);
        if (callback) {
          callback();
        }
      }
    })
  ]
});
See the full example for more details.
I know the OP said they would like to avoid using the Lambda context object to add the Request ID, but I wanted to share my solution with others who may not have this requirement. While the other answers require defining a custom transport or overriding the log method of the Console transport, with this solution you just need to add one line to the top of your handler function.
import { APIGatewayTokenAuthorizerEvent, Callback, Context } from "aws-lambda";
import { createLogger, format, transports } from "winston";

const logger = createLogger({
  level: "debug",
  format: format.json({ space: 2 }),
  transports: new transports.Console()
});

export const handler = (
  event: APIGatewayTokenAuthorizerEvent,
  context: Context,
  callback: Callback
): void => {
  // Add this line to add the requestId to logs
  logger.defaultMeta = { requestId: context.awsRequestId };

  logger.info("This is an example log message"); // prints:
  // {
  //   "level": "info",
  //   "message": "This is an example log message",
  //   "requestId": "ac1de841-ca30-4a09-9950-dd4fe7e37af8"
  // }
};
Documentation for Lambda context object in Node.js
For other Winston formats like printf, you will need to add the requestId property to the format string yourself. Not only is this more concise, but it has the benefit of letting you customize where the Request ID appears in your log output, rather than always prepending it like CloudWatch does.
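For instance, a minimal sketch of a printf format that includes the Request ID (the placement of requestId in the string is just an illustration):

const { createLogger, format, transports } = require('winston');
const { combine, timestamp, printf } = format;

const logger = createLogger({
  format: combine(
    timestamp(),
    // requestId comes from defaultMeta, set in the handler as shown above
    printf(({ timestamp, level, message, requestId }) =>
      `${timestamp} ${requestId} ${level}: ${message}`)
  ),
  transports: [new transports.Console()]
});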
As already mentioned by @kaskelloti, AWS does not transform messages logged by console._stdout.write() and console._stderr.write().
Here is my modified solution, which respects levels in the AWS logs:
const winston = require('winston');

const LEVEL = Symbol.for('level');
const MESSAGE = Symbol.for('message');

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console({
      log(logPayload, callback) {
        setImmediate(() => this.emit('logged', logPayload));
        const message = logPayload[MESSAGE];
        switch (logPayload[LEVEL]) {
          case "debug":
            console.debug(message);
            break;
          case "info":
            console.info(message);
            break;
          case "warn":
            console.warn(message);
            break;
          case "error":
            console.error(message);
            break;
          default:
            // TODO: handle missing levels
            break;
        }
        if (callback) {
          callback();
        }
      }
    })
  ],
});
According to the AWS docs:
To output logs from your function code, you can use methods on the console object, or any logging library that writes to stdout or stderr.
I ran a quick test using the following Winston setup in a Lambda:
const path = require('path');
const { createLogger, format, transports } = require('winston');
const { combine, errors, timestamp } = format;

const baseFormat = combine(
  timestamp({ format: 'YYYY-MM-DD HH:mm:ss' }),
  errors({ stack: true }),
  format((info) => {
    info.level = info.level.toUpperCase();
    return info;
  })(),
);

const splunkFormat = combine(
  baseFormat,
  format.json(),
);

const prettyFormat = combine(
  baseFormat,
  format.prettyPrint(),
);

const createCustomLogger = (moduleName) => createLogger({
  level: process.env.LOG_LEVEL,
  format: process.env.PRETTY_LOGS ? prettyFormat : splunkFormat,
  defaultMeta: { module: path.basename(moduleName) },
  transports: [
    new transports.Console(),
  ],
});

module.exports = createCustomLogger;
In CloudWatch, I am NOT getting my Request ID. I am getting a timestamp from my own logs, so I'm less concerned about that; not getting the Request ID is what bothers me.

AWS RDSDataService query not running

I'm trying to use RDSDataService to query an Aurora Serverless database. When I try to query, my Lambda just times out (I've set the timeout to 5 minutes just to make sure it isn't a problem with that). I have 1 record in my database, and when I try to query it I get no results, and neither the error nor the data callbacks are invoked. I've verified executeSql is called by removing the dbClusterOrInstanceArn from my params, which throws the exception for not having it.
I have also run SHOW FULL PROCESSLIST in the query editor to see if the queries were still running, and they are not. I've given the Lambda both the AmazonRDSFullAccess and AmazonRDSDataFullAccess policies without any luck either. As you can see from the code below, I've already tried what was recommended in issue #2376.
Not that this should matter, but this Lambda is triggered by a Kinesis event trigger.
const AWS = require('aws-sdk');

exports.handler = (event, context, callback) => {
  const RDS = new AWS.RDSDataService({apiVersion: '2018-08-01', region: 'us-east-1'});
  for (const record of event.Records) {
    const payload = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf-8'));
    const data = compileItem(payload);
    const params = {
      awsSecretStoreArn: 'arn:aws:secretsmanager:us-east-1:149070771508:secret:xxxxxxxxx',
      dbClusterOrInstanceArn: 'arn:aws:rds:us-east-1:149070771508:cluster:xxxxxxxxx',
      sqlStatements: `select * from MY_DATABASE.MY_TABLE`
      // database: 'MY_DATABASE'
    };
    console.log('calling executeSql');
    RDS.executeSql(params, (error, data) => {
      if (error) {
        console.log('error', error);
        callback(error, null);
      } else {
        console.log('data', data);
        callback(null, { success: true });
      }
    });
  }
};
EDIT: We've run the command through the AWS CLI and it returns results.
EDIT 2: I'm able to connect using the mysql2 package and the connection URI, so it's definitely an issue with either the aws-sdk or how I'm using it.
Node.js execution is not waiting for the result; that's why the process exits before the request completes.
Either use a MySQL library such as https://www.npmjs.com/package/serverless-mysql
OR
set context.callbackWaitsForEmptyEventLoop = false
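As a minimal sketch of the second option, applied to the handler from the question:

exports.handler = (event, context, callback) => {
  // Tell Lambda not to wait for the event loop to drain
  // before returning once callback() has been invoked
  context.callbackWaitsForEmptyEventLoop = false;
  // ...issue the RDSDataService call and invoke callback as before
};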
The problem was that the RDS cluster had been created in a VPC which the Lambdas were not in.

How to return an entire Datastore table by name using Node.js on a Google Cloud Function

I want to retrieve a table (with all rows) by name, via an HTTP request with a body like {"table": "user"}.
I tried this code without success:
'use strict';
const {Datastore} = require('@google-cloud/datastore');

// Instantiates a client
const datastore = new Datastore();

exports.getUsers = (req, res) => {
  // Get list
  const query = this.datastore.createQuery('users');
  this.datastore.runQuery(query).then(results => {
    const customers = results[0];
    console.log('User:');
    customers.forEach(customer => {
      const cusKey = customer[this.datastore.KEY];
      console.log(cusKey.id);
      console.log(customer);
    });
  })
  .catch(err => { console.error('ERROR:', err); });
};
Google Datastore is a NoSQL database that works with entities rather than tables. What you want is to load all the "records", which are key identifiers in Datastore, together with their "properties", i.e. the "columns" that you see in the console, filtered by the "Kind" name, which is the "table" you are referring to.
Here is a solution for retrieving all the key identifiers and their properties from Datastore, using an HTTP-triggered Cloud Function running in the Node.js 8 environment:
Create a Google Cloud Function and choose HTTP as the trigger.
Choose the runtime to be Node.js 8
In index.js replace all the code with this GitHub code.
In package.json add:
{
  "name": "sample-http",
  "version": "0.0.1",
  "dependencies": {
    "@google-cloud/datastore": "^3.1.2"
  }
}
Under Function to execute add loadDataFromDatastore, since this is the name of the function that we want to execute.
NOTE: This will log all the loaded records into the Stackdriver logs of the Cloud Function. The response for each record is JSON, therefore you will have to convert the response to a JSON object to get the data you want. Get the idea and modify the code accordingly.
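Since the linked code isn't reproduced here, a minimal sketch of what such a loadDataFromDatastore function might look like (assuming the kind name arrives in the request body's table field, as in the question):

'use strict';
const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

exports.loadDataFromDatastore = async (req, res) => {
  const kind = req.body.table; // e.g. { "table": "users" }
  const query = datastore.createQuery(kind);
  const [entities] = await datastore.runQuery(query);
  entities.forEach((entity) => {
    // Log each record's key identifier and its properties
    console.log(entity[datastore.KEY].id || entity[datastore.KEY].name, entity);
  });
  res.status(200).json(entities);
};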