AWS Lambda using Winston logging loses Request ID - amazon-web-services

When using console.log to add log rows to AWS CloudWatch, the Lambda Request ID is added to each row, as described in the docs.
A simplified example based on the above-mentioned doc:
exports.handler = async function(event, context) {
  console.log("Hello");
  return context.logStreamName;
};
Would produce output such as
START RequestId: c793869b-ee49-115b-a5b6-4fd21e8dedac Version: $LATEST
2019-06-07T19:11:20.562Z c793869b-ee49-115b-a5b6-4fd21e8dedac INFO Hello
END RequestId: c793869b-ee49-115b-a5b6-4fd21e8dedac
REPORT RequestId: c793869b-ee49-115b-a5b6-4fd21e8dedac Duration: 170.19 ms Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 73 MB
The relevant detail here is the Request ID, c793869b-ee49-115b-a5b6-4fd21e8dedac, which is added after the timestamp on the row with "Hello".
The AWS documentation states
To output logs from your function code, you can use methods on the console object, or any logging library that writes to stdout or stderr.
The Node.js runtime logs the START, END, and REPORT lines for each invocation, and adds a timestamp, request ID, and log level to each entry logged by the function.
When using Winston as the logger, the Request ID is lost; it could be an issue with formatters or transports. The logger is created like this:
const { createLogger, format, transports } = require('winston');
const { combine, timestamp, printf } = format;

const logger = createLogger({
  level: 'debug',
  format: combine(
    timestamp(),
    printf(
      ({ timestamp, level, message }) => `${timestamp} ${level}: ${message}`
    )
  ),
  transports: [new transports.Console()]
});
I also tried the simple() formatter instead of printf(), but that has no effect on whether the Request ID is present. Removing formatting altogether still prints plain text, i.e. no timestamp or Request ID.
I also checked the source code of Winston Console transport, and it uses either console._stdout.write if present, or console.log for writing, which is what the AWS documentation said to be supported.
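For reference, the write logic in that transport looks roughly like this (paraphrased from the Winston source, not verbatim; MESSAGE is the Symbol.for('message') key):
log(info, callback) {
  setImmediate(() => this.emit('logged', info));
  if (console._stdout) {
    // writes straight to the stream, bypassing console.log
    console._stdout.write(`${info[MESSAGE]}${this.eol}`);
  } else {
    console.log(info[MESSAGE]);
  }
  if (callback) {
    callback();
  }
}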
Is there some way to configure Winston to keep the AWS Lambda Request ID as part of the message?
P.S. There are separate Winston transports for AWS CloudWatch that I am aware of, but they require other setup that I'd like to avoid if possible. And since the Request ID is readily available, they seem like overkill.
P.P.S. The Request ID can also be fetched from the Lambda context and a custom logger object initialized with it, but I'd like to avoid that too, for pretty much the same reason: extra work for something that should be readily available.

The issue is with the usage of console._stdout.write() (i.e. process.stdout.write()), which Winston's built-in Console transport uses when present.
For some reason lines written to stdout go to CloudWatch as is, and timestamp/request ID are not added to log rows as they are with console.log() calls.
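A minimal sketch demonstrating the difference inside a handler (the expected CloudWatch output in the comments follows from the behavior described above):
exports.handler = async (event, context) => {
  console.log('via console.log');        // CloudWatch: 2019-06-07T19:11:20.562Z <request id> INFO via console.log
  process.stdout.write('via stdout\n');  // CloudWatch: via stdout (no timestamp, Request ID, or level)
};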
There is a discussion on GitHub about making this a constructor option that could be selected on transport creation, but it was closed as a problem related to specific IDEs and how they handle stdout logs. The issue with AWS Lambdas is mentioned only as a side note in the discussion.
My solution was to make a custom transport for Winston, which always uses console.log() to write the messages and leave timestamp and request ID to be filled in by AWS Lambda Node runtime.
Addition 5/2020:
Below is an example of my solution. Unfortunately I cannot remember many of the details of this implementation, but I pretty much looked at the Winston sources on GitHub, took the bare minimum implementation, and forced the use of console.log.
'use strict';

const TransportStream = require('winston-transport');

const MESSAGE = Symbol.for('message');

class SimpleConsole extends TransportStream {
  constructor(options = {}) {
    super(options);
    this.name = options.name || 'simple-console';
  }

  log(info, callback) {
    setImmediate(() => this.emit('logged', info));
    console.log(info[MESSAGE]);
    if (callback) {
      callback();
    }
  }
}
const { createLogger, format } = require('winston');
const { combine, printf } = format;

const logger = createLogger({
  level: 'debug',
  format: combine(
    printf(({ level, message }) => `${level.toUpperCase()}: ${message}`)
  ),
  transports: [new SimpleConsole()]
});
const debug = (...args) => logger.debug(...args);
// ... And similar definitions for the other logging levels: info, warn, error, etc.

module.exports = {
  debug
  // Also export other logging levels..
};
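A minimal usage sketch, assuming the module above is saved as logger.js (the file name is my own choice, not part of the original):
const { debug } = require('./logger');

exports.handler = async (event, context) => {
  debug('Handling event'); // printed via console.log, so the runtime adds timestamp and Request ID
  return context.logStreamName;
};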
Another option
As pointed out by @sanrodari in the comments, the same can be achieved by directly overriding the log method in the built-in Console transport to force the use of console.log.
const winston = require('winston');
const LEVEL = Symbol.for('level');
const MESSAGE = Symbol.for('message');

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console({
      log(info, callback) {
        setImmediate(() => this.emit('logged', info));

        if (this.stderrLevels[info[LEVEL]]) {
          console.error(info[MESSAGE]);
          if (callback) {
            callback();
          }
          return;
        }

        console.log(info[MESSAGE]);
        if (callback) {
          callback();
        }
      }
    })
  ]
});
See full example for more details

I know OP said they would like to avoid using the Lambda context object to add the request ID, but I wanted to share my solution with others who may not have this requirement. While the other answers require defining a custom transport or overriding the log method of the Console transport, this solution only needs one line added at the top of your handler function.
import { APIGatewayTokenAuthorizerEvent, Callback, Context } from "aws-lambda";
import { createLogger, format, transports } from "winston";

const logger = createLogger({
  level: "debug",
  format: format.json({ space: 2 }),
  transports: new transports.Console()
});

export const handler = (
  event: APIGatewayTokenAuthorizerEvent,
  context: Context,
  callback: Callback
): void => {
  // Add this line to add the requestId to logs
  logger.defaultMeta = { requestId: context.awsRequestId };

  logger.info("This is an example log message"); // prints:
  // {
  //   "level": "info",
  //   "message": "This is an example log message",
  //   "requestId": "ac1de841-ca30-4a09-9950-dd4fe7e37af8"
  // }
};
Documentation for Lambda context object in Node.js
For other Winston formats like printf, you will need to add the requestId property to the format string, as sketched below. Not only is this more concise, but it also lets you customize where the request ID appears in your log output, rather than always prepending it like CloudWatch does.
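For instance, a sketch of such a printf format (the placement of requestId here is an arbitrary choice; requestId is populated via logger.defaultMeta in the handler, as above):
const { createLogger, format, transports } = require("winston");
const { combine, timestamp, printf } = format;

const logger = createLogger({
  level: "debug",
  format: combine(
    timestamp(),
    printf(({ timestamp, level, message, requestId }) => `${timestamp} [${requestId}] ${level}: ${message}`)
  ),
  transports: new transports.Console()
});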

As already mentioned by @kaskelloti, AWS does not transform messages logged by console._stdout.write() and console._stderr.write().
Here is my modified solution, which respects levels in AWS logs:
const winston = require('winston');

const LEVEL = Symbol.for('level');
const MESSAGE = Symbol.for('message');

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console({
      log(logPayload, callback) {
        setImmediate(() => this.emit('logged', logPayload));

        const message = logPayload[MESSAGE];
        switch (logPayload[LEVEL]) {
          case 'debug':
            console.debug(message);
            break;
          case 'info':
            console.info(message);
            break;
          case 'warn':
            console.warn(message);
            break;
          case 'error':
            console.error(message);
            break;
          default:
            // TODO: handle missing levels
            break;
        }

        if (callback) {
          callback();
        }
      }
    })
  ],
});

According to the AWS docs:
To output logs from your function code, you can use methods on the console object, or any logging library that writes to stdout or stderr.
I ran a quick test using the following Winston setup in a lambda:
const path = require('path');
const { createLogger, format, transports } = require('winston');

const { combine, errors, timestamp } = format;

const baseFormat = combine(
  timestamp({ format: 'YYYY-MM-DD HH:mm:ss' }),
  errors({ stack: true }),
  format((info) => {
    info.level = info.level.toUpperCase();
    return info;
  })(),
);

const splunkFormat = combine(
  baseFormat,
  format.json(),
);

const prettyFormat = combine(
  baseFormat,
  format.prettyPrint(),
);

const createCustomLogger = (moduleName) => createLogger({
  level: process.env.LOG_LEVEL,
  format: process.env.PRETTY_LOGS ? prettyFormat : splunkFormat,
  defaultMeta: { module: path.basename(moduleName) },
  transports: [
    new transports.Console(),
  ],
});

module.exports = createCustomLogger;
and in CloudWatch I am NOT getting my Request ID. I am getting a timestamp from my own logs, so I'm less concerned about that; not getting the Request ID is what bothers me.

Related

How to get path params in CDK + APIGateway + Lambda

So, it turns out I had it all along but I was logging it out incorrectly. I had been doing an Object.keys(event).forEach and console logging each key and value. I guess this didn't display the value as it's a nested object. Using JSON.stringify, as per @robC3's answer, shows all the nested objects and values properly and is easier too! TL;DR: just use curly braces in your gateway paths and they will be present in event.pathParameters.whateverYouCalledThem.
I'm used to express land where you just write /stuff/:things in your route and then req.params.things becomes available in your handler for 'stuff'.
I'm struggling to get the same basic functionality in CDK. I have a RestAPI called 'api' and resources like so...
const api = new apigateway.RestApi(this, "image-cache-api", { /* options */ });
const stuff = api.root.addResource("stuff");
const stuffWithId = stuff.addResource("{id}");
stuffWithId.addMethod("GET", new apigateway.LambdaIntegration(stuffLambda, options));
Then I deploy the function and call it at https://<api path>/stuff/1234
Then in my lambda I check event.pathParameters and it is this: {id: undefined}
I've had a look through the event object and the only place I can see 1234 is in the path /stuff/1234 and while I could just parse it out of that I'm sure that's not how it's supposed to work.
:/
Most of the things I have turned up while googling mention "mapping templates". That seems overly complicated for such a common use case, so I had been working on the assumption that there would be some sort of default mapping. Now I'm starting to think there isn't. Do I really have to specify a mapping template just to get access to path params and, if so, where should it go in my CDK stack code?
I tried the following...
stuffWithId.addMethod("GET", new apigateway.LambdaIntegration(stuffLambda, {
  requestTemplate: {
    "id": "$input.params('id')",
  }
}))
But got the error...
error TS2559: Type '{ requestTemplate: { id: string; }; }' has no properties in common with type 'LambdaIntegrationOptions'.
I'm pretty confused as to whether I need requestTemplate, requestParametes, or something else entirely as all the examples I have found so far are for the console rather than CDK.
This works fine, and you can see where the full path, path params, query params, etc., are in the event structure when you test it in a browser.
// lambdas/handler.ts
// This code uses @types/aws-lambda for typescript convenience.
// Build first, and then deploy the .js handler.
import { APIGatewayProxyHandler, APIGatewayProxyResult } from 'aws-lambda';

export const main: APIGatewayProxyHandler = async (event, context, callback) => {
  return <APIGatewayProxyResult> {
    body: JSON.stringify([ event, context ], null, 4),
    statusCode: 200,
  };
}
// apig-lambda-proxy-demo-stack.ts
import * as path from 'path';
import { aws_apigateway, aws_lambda, Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class ApigLambdaProxyDemoStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const stuffLambda = new aws_lambda.Function(this, 'stuff-lambda', {
      code: aws_lambda.Code.fromAsset(path.join('dist', 'lambdas')),
      handler: 'handler.main',
      runtime: aws_lambda.Runtime.NODEJS_14_X,
    });

    const api = new aws_apigateway.RestApi(this, 'image-cache-api');
    const stuff = api.root.addResource('stuff');
    const stuffWithId = stuff.addResource('{id}');
    stuffWithId.addMethod('GET', new aws_apigateway.LambdaIntegration(stuffLambda));
  }
}
This sample query:
https://[id].execute-api.[region].amazonaws.com/prod/stuff/1234?q1=foo&q2=bar
gives a response echoing the full event and context object.
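Since the handler above just echoes the event, the interesting part of that response would look roughly like this (an illustrative sketch based on the standard API Gateway proxy event shape, not the verbatim excerpt):
{
  "resource": "/stuff/{id}",
  "path": "/stuff/1234",
  "httpMethod": "GET",
  "pathParameters": { "id": "1234" },
  "queryStringParameters": { "q1": "foo", "q2": "bar" }
}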
If you want to handle arbitrary paths at a certain point in your API, you'll want to explore the IResource.addProxy() CDK method. For example,
api.root.addProxy({
defaultIntegration: new aws_apigateway.LambdaIntegration(stuffLambda),
});
That creates a {proxy+} resource at the API root in the example and would forward all requests to the lambda. Rather than configuring every single endpoint in API Gateway, you could handle them all in the same handler, as sketched below.
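A minimal sketch of what the single handler might do with the greedy path (assuming the addProxy() setup above; the routing logic itself is up to you):
exports.handler = async (event) => {
  // With a {proxy+} resource, the rest of the path arrives in pathParameters.proxy,
  // e.g. "stuff/1234" for GET /stuff/1234
  const subPath = event.pathParameters && event.pathParameters.proxy;
  return {
    statusCode: 200,
    body: JSON.stringify({ subPath }),
  };
};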
First thing to note: all CDK methods using the LambdaIntegration module actually have to be POST. GET methods with LambdaIntegration don't function, as you are sending data to the Lambda; if you want to do a GET specifically, you have to write custom methods in the API for it.
Now, I have only done this in Python, but hopefully you can get the idea:
my_rest_api = apigateway.RestApi(
    self, "MyAPI",
    retain_deployments=True,
    deploy_options=apigateway.StageOptions(
        logging_level=apigateway.MethodLoggingLevel.INFO,
        stage_name="Dev"
    )
)

a_resource = apigateway.Resource(
    self, "MyResource",
    parent=my_rest_api.root,
    path_part="Stuff"
)

my_method = apigateway.Method(
    self, "MyMethod",
    http_method="POST",
    resource=a_resource,
    integration=apigateway.AwsIntegration(
        service="lambda",
        integration_http_method="POST",
        path="my:function:arn"
    )
)
Your Resource construct defines your path; you can chain multiple resources together if you want to have methods off each level, or just put them all together in path_part. So you could have resourceA defined and use it as the parent in resourceB, which would get you resourceAPathPart/resourceBPathPart/ to access your lambda.
Or you can put it all together in resourceA with path_part = stuff/path/etc.
I used the AwsIntegration method here instead of LambdaIntegration because, in the full code, I'm using stage variables to dynamically pick different lambdas depending on what stage I'm in, but the effect is rather similar.

GCP Cloud Tasks: shorten period for creating a previously created named task

We are developing a GCP Cloud Task based queue process that sends a status email whenever a particular Firestore doc write-trigger fires. The reason we use Cloud Tasks is so a delay can be created (using scheduledTime property 2-min in the future) before the email is sent, and to control dedup (by using a task-name formatted as: [firestore-collection-name]-[doc-id]) since the 'write' trigger on the Firestore doc can be fired several times as the document is being created and then quickly updated by backend cloud functions.
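For context, the task creation described above might look roughly like this with the Node.js client (queue name, URL, and handler are illustrative assumptions, not the OP's actual code):
const { CloudTasksClient } = require('@google-cloud/tasks');
const client = new CloudTasksClient();

async function enqueueStatusEmail(project, location, queue, collectionName, docId) {
  const parent = client.queuePath(project, location, queue);
  const task = {
    // The task name doubles as the dedup key: [firestore-collection-name]-[doc-id]
    name: client.taskPath(project, location, queue, `${collectionName}-${docId}`),
    scheduleTime: { seconds: Math.floor(Date.now() / 1000) + 120 }, // 2 min in the future
    httpRequest: {
      httpMethod: 'POST',
      url: 'https://example.com/send-status-email', // illustrative target
      body: Buffer.from(JSON.stringify({ docId })).toString('base64'),
    },
  };
  return client.createTask({ parent, task });
}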
Once the task's delay period has been reached, the cloud-task runs, and the email is sent with updated Firestore document info included. After which the task is deleted from the queue and all is good.
Except:
If the user updates the Firestore doc (say 20 or 30 min later) we want to resend the status email but are unable to create the task using the same task-name. We get the following error:
409 The task cannot be created because a task with this name existed too recently. For more information about task de-duplication see https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/create#body.request_body.FIELDS.task.
This was unexpected, as the queue is empty at this point because the last task completed successfully. The documentation referenced in the error message says:
If the task's queue was created using Cloud Tasks, then another task with the same name can't be created for ~1hour after the original task was deleted or executed.
Question: is there some way in which this restriction can be bypassed by lowering the amount of time, or even removing the restriction altogether?
The short answer is no. As you've already pointed out, the docs are very clear regarding this behavior: you should wait 1 hour to create a task with the same name as one that was previously created. The API and client libraries do not allow you to decrease this time.
Having said that, I would suggest that instead of using the same task ID, you use different ones for the tasks and add an identifier in the body of the request. For example, using Python:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime

def create_task(project, queue, location, payload=None, in_seconds=None):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = {
        'app_engine_http_request': {
            'http_method': 'POST',
            'relative_uri': '/task/' + queue
        }
    }
    if payload is not None:
        converted_payload = payload.encode()
        task['app_engine_http_request']['body'] = converted_payload
    if in_seconds is not None:
        d = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)
        timestamp = timestamp_pb2.Timestamp()
        timestamp.FromDatetime(d)
        task['schedule_time'] = timestamp
    response = client.create_task(parent, task)
    print('Created task {}'.format(response.name))
    print(response)

# You can change DOCUMENT_ID with USER_ID or something to identify the task
create_task(PROJECT_ID, QUEUE, REGION, DOCUMENT_ID)
Facing a similar problem of needing to debounce multiple instances of Firestore write-trigger functions, we worked around the default Cloud Tasks task-name based dedup mechanism (still a constraint in Nov 2022) by building a small debounce "helper" using Firestore transactions.
We're using a helper collection _syncHelper_ to implement a delayed throttle for side effects of write-trigger fires; in the OP's case, send 1 email for all writes within 2 minutes.
In our case we are using the Firebase Functions task queue utils and not directly interacting with Cloud Tasks, but that's immaterial to the solution. The key is to determine the task's execution time in advance and use that as the "dedup key":
const { getFirestore, Timestamp } = require('firebase-admin/firestore');
const { getFunctions } = require('firebase-admin/functions');

async function enqueueTask(shopId) {
  const queueName = 'doSomething';
  const now = new Date();
  const next = new Date(now.getTime() + 2 * 60 * 1000);
  try {
    const shouldEnqueue = await getFirestore().runTransaction(async t => {
      const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
      const doc = await t.get(syncRef);
      let data = doc.data();
      if (data?.timestamp.toDate() > now) {
        return false;
      }
      await t.set(syncRef, { timestamp: Timestamp.fromDate(next) });
      return true;
    });
    if (shouldEnqueue) {
      let queue = getFunctions().taskQueue(queueName);
      await queue.enqueue(
        { timestamp: next.toISOString() },
        { scheduleTime: next }
      );
    }
  } catch {
    // ...
  }
}
This will ensure a new task is enqueued only if the "next execution" time has passed.
The execution operation (also a cloud function in our case) will remove the sync data entry if it hasn't been changed since it was executed:
exports.doSomething = functions.tasks.taskQueue({
  retryConfig: {
    maxAttempts: 2,
    minBackoffSeconds: 60,
  },
  rateLimits: {
    maxConcurrentDispatches: 2,
  }
}).onDispatch(async data => {
  let { timestamp } = data;
  await sendYourEmailHere();
  await getFirestore().runTransaction(async t => {
    const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
    const doc = await t.get(syncRef);
    let data = doc.data();
    if (data?.timestamp.toDate() <= new Date(timestamp)) {
      await t.delete(syncRef);
    }
  });
});
This isn't a bulletproof solution (if the doSomething() execution function has high latency, for example), but it's good enough for 99% of our use cases.

Stackdriver log entries with same trace ID not associated

I have a NodeJS project hosted on GKE with trace agent and Stackdriver logging enabled. The project is logging to stdout using winston like this:
const { createLogger, format, transports } = require('winston');
const { combine, json } = format;

const addTraceId = format(info => {
  const agent = global._google_trace_agent;
  if (agent) {
    const traceProjectId = agent.getWriterProjectId();
    const traceId = agent.getCurrentContextId();
    if (traceProjectId && traceId) {
      info['logging.googleapis.com/trace'] = `projects/${traceProjectId}/traces/${traceId}`;
    }
  }
  return info;
});

const logger = createLogger({
  level: 'debug',
  transports: new transports.Console(),
  format: combine(
    addTraceId(),
    json()
  )
});
I can see the traceId appear in Stackdriver, consistent across the logs within the same trace, but they are all individual log entries instead of being collapsed under the first entry.
I checked that the request log has the header x-cloud-trace-context: "a54d7110fc59c879b7ae67fb481fb89b/113593995793831;o=1" as well.
Also, I'm able to see the traces done properly in the trace list console.
And when I deploy the same to GAE, I can see logs associated and collapsed under the first entry. Any ideas?
Use the logName app instead of stdout.

AWS RDSDataService query not running

I'm trying to use RDSDataService to query an Aurora Serverless database. When I query, my lambda just times out (I've set the timeout to 5 minutes just to make sure it isn't a problem with that). I have 1 record in my database, and when I try to query it I get no results; neither the error nor the data callbacks are called. I've verified executeSql is called by removing the dbClusterOrInstanceArn from my params, which throws the exception for it being missing.
I have also run SHOW FULL PROCESSLIST in the query editor to see if the queries were still running, and they are not. I've given the lambda both the AmazonRDSFullAccess and AmazonRDSDataFullAccess policies without any luck either. As you can see from the code below, I've already tried what was recommended in issue #2376.
Not that this should matter, but this lambda is triggered by a Kinesis event trigger.
const AWS = require('aws-sdk');

exports.handler = (event, context, callback) => {
  const RDS = new AWS.RDSDataService({ apiVersion: '2018-08-01', region: 'us-east-1' });

  for (const record of event.Records) {
    const payload = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf-8'));
    const data = compileItem(payload);
    const params = {
      awsSecretStoreArn: 'arn:aws:secretsmanager:us-east-1:149070771508:secret:xxxxxxxxx',
      dbClusterOrInstanceArn: 'arn:aws:rds:us-east-1:149070771508:cluster:xxxxxxxxx',
      sqlStatements: `select * from MY_DATABASE.MY_TABLE`
      // database: 'MY_DATABASE'
    };
    console.log('calling executeSql');
    RDS.executeSql(params, (error, data) => {
      if (error) {
        console.log('error', error);
        callback(error, null);
      } else {
        console.log('data', data);
        callback(null, { success: true });
      }
    });
  }
};
EDIT: We've run the command through the aws cli and it returns results.
EDIT 2: I'm able to connect to it using the mysql2 package and connecting through the URI, so it's definitely an issue with either the aws-sdk or how I'm using it.
Node.js execution is not waiting for the result; that's why the process exits before completing the request.
Use the serverless-mysql library (https://www.npmjs.com/package/serverless-mysql)
OR
use context.callbackWaitsForEmptyEventLoop = false, as sketched below.
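A minimal sketch of the second option (the flag tells the runtime to return as soon as the callback fires, instead of waiting for the event loop to drain):
exports.handler = (event, context, callback) => {
  // Don't wait for open connections in the event loop before freezing the function
  context.callbackWaitsForEmptyEventLoop = false;
  // ... run the RDSDataService query here, then invoke callback(...)
};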
Problem was the RDS had to be created in a VPC, which the Lambdas were not in.

Aws Pass Value from Lambda Trigger to Step Function

When using a Lambda Function to trigger a Step Function, how do you get the output of the function that triggered it?
Ok, so if you want to pass an input to a Step Function execution (or, more exactly, your State Machine's execution), you just need to set said input in the input property when calling StartExecution (see AWS Documentation: Start Execution).
In your case, it would most likely be your lambda's last step before calling its callback.
If it's a Node.js lambda, this is what it would look like:
const AWS = require("aws-sdk");
const stepfunctions = new AWS.StepFunctions();

exports.myHandler = function(event, context, callback) {
  // ... your function's code

  const params = {
    stateMachineArn: 'YOUR_STATE_MACHINE_ARN', /* required */
    input: 'STRINGIFIED INPUT',
    name: 'AN EXECUTION NAME (such as an uuid or whatever)'
  };

  stepfunctions.startExecution(params, function(err, data) {
    if (err) callback(err); // an error occurred
    else callback(null, "some success message"); // successful response
  });
}
Alternatively, if your payload is too big, you could store the data in S3 or DynamoDB and pass a reference to it as your State Machine execution's input, for example:
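A rough sketch of the S3 variant (the bucket name and key format are my own illustrative choices):
const AWS = require("aws-sdk");
const s3 = new AWS.S3();
const stepfunctions = new AWS.StepFunctions();

exports.myHandler = async (event) => {
  // Hypothetical names, for illustration only
  const bucket = "my-payload-bucket";
  const key = `step-function-inputs/${Date.now()}.json`;

  // Store the large payload in S3...
  await s3.putObject({ Bucket: bucket, Key: key, Body: JSON.stringify(event) }).promise();

  // ...and pass only the reference as the execution input
  return stepfunctions.startExecution({
    stateMachineArn: "YOUR_STATE_MACHINE_ARN",
    input: JSON.stringify({ bucket, key }),
  }).promise();
};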