Kinesis Data Stream GetRecords giving Junk Data - amazon-web-services

While doing a GetRecords call on the Kinesis Data Stream, it intermittently returns values like ����Y�ݽ#��5���I$ݔѵ�����ս��aQ���̽�9������ռ��. The payload is large, and the junk data appears in between the correct data.
My code is Node.js in Lambda and is triggered by SQS (which is a DLQ of an upstream Lambda that processes the Kinesis stream). This Lambda is expected to run the reprocessing logic for the failed messages of the upstream Lambda.
Any suggestion would be very helpful. Thanks.
Sample data:
{ "Attribute1" : "Value with over 400 characters available ����Y�ݽ#��5���I$ݔѵ�����ս��aQ���̽�9������ռ�� . to make it work" , "Attribute2" : "value2" , "Attribute3" : "value3"}
// Assumed setup, not shown in the original snippet:
var AWS = require('aws-sdk');
var kinesis = new AWS.Kinesis();

var getKinesisRecords = async function (shardIterator) {
    var getParams = {
        ShardIterator: shardIterator,
        Limit: 1
    };
    await kinesis.getRecords(getParams, function (err, result) {
        if (err) { console.log(err, err.stack); return null; } // an error occurred
        else {
            try {
                for (var record in result.Records) {
                    var data = result.Records[record].Data;
                    console.log("Kinesis Record data" + data);
                }
            } catch (err) {
                console.log("Error while getting Record from Kinesis");
                console.log(err);
                return null;
            }
            if (result.NextShardIterator) getKinesisRecords(result.NextShardIterator);
        } // successful response
    }).promise();
    // Process the record
};
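For reference, the Node.js SDK returns each record's Data as a Buffer. Below is a minimal sketch of decoding it explicitly as UTF-8, assuming the producer wrote plain UTF-8 JSON; if the output is still garbled after an explicit decode, the bytes themselves are probably not plain UTF-8 (for example compressed or KPL-aggregated payloads).
// Minimal sketch: decode each record's Data explicitly as UTF-8 before logging.
// Assumes the producer put plain UTF-8 JSON on the stream.
for (const record of result.Records) {
    const text = Buffer.from(record.Data).toString('utf8');
    console.log('Kinesis record data: ' + text);
}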

Related

Piping an API Gateway response to the client through a Lambda handler

I have a REST API using AWS API Gateway. The API is handled by a custom Lambda function. I have a /prompts endpoint in my API, for which the Lambda function calls the OpenAI API, sends it the prompt, and streams the result to the user as it is being generated (which can take a few seconds).
I'm able to stream and handle the response from OpenAI's API in my Lambda function.
I would now like to re-stream / pipe that response to the client.
My question is: how do I do that?
Is there a way to simply pipe the stream being received from the OpenAI API to my client?
My Lambda function is:
try {
    const res = await openai.createCompletion({
        ...params,
        stream: true,
    }, { responseType: 'stream' });
    res.data.on('data', data => {
        const lines = data.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') {
                // store the response to DynamoDB
                storeRecord(content)
                return content
            }
            try {
                const parsed = JSON.parse(message);
                content += parsed.choices[0].text
                // ****** I want to send content to the front-end client... *******
            } catch (error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    });
} catch (error) {
    if (error.response?.status) {
        console.error(error.response.status, error.message);
        error.response.data.on('data', data => {
            const message = data.toString();
            try {
                const parsed = JSON.parse(message);
                console.error('An error occurred during OpenAI request: ', parsed);
            } catch (error) {
                console.error('An error occurred during OpenAI request: ', message);
            }
        });
    } else {
        console.error('An error occurred during OpenAI request', error);
    }
}
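One approach, if the function can sit behind a Lambda Function URL (API Gateway buffers Lambda responses, so it cannot stream them), is Lambda response streaming via the awslambda.streamifyResponse wrapper in the Node.js managed runtime. What follows is a hedged sketch, not the definitive solution, reusing the question's own openai, params and storeRecord (not shown here).
// Hedged sketch: stream OpenAI tokens to the caller with Lambda response streaming.
// Assumes a Node.js 18+ runtime behind a Function URL in RESPONSE_STREAM invoke mode;
// openai, params and storeRecord come from the question's existing code.
exports.handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
    let content = '';
    const res = await openai.createCompletion(
        { ...params, stream: true },
        { responseType: 'stream' }
    );
    for await (const chunk of res.data) {
        const lines = chunk.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') continue;
            try {
                const text = JSON.parse(message).choices[0].text;
                content += text;
                responseStream.write(text); // push each token to the client as it arrives
            } catch (error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    }
    await storeRecord(content); // persist the full response once the stream ends
    responseStream.end();
});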

Self invoking lambda invocation timing out

We're trying to develop a self-invoking lambda to process S3 files in chunks. The lambda role has the policies needed for the invocation attached.
Here's the code for the self-invoking lambda:
export const processFileHandler: Handler = async (
    event: S3CreateEvent,
    context: Context,
    callback: Callback,
) => {
    let bucket = loGet(event, 'Records[0].s3.bucket.name');
    let key = loGet(event, 'Records[0].s3.object.key');
    let totalFileSize = loGet(event, 'Records[0].s3.object.size');
    const lastPosition = loGet(event, 'position', 0);
    const nextRange = getNextSizeRange(lastPosition, totalFileSize);
    context.callbackWaitsForEmptyEventLoop = false;
    let data = await loadDataFromS3ByRange(bucket, key, nextRange);
    await database.connect();
    log.debug(`Successfully connected to the database`);
    const docs = await getParsedDocs(data, lastPosition);
    log.debug(`upserting ${docs.length} records to database`);
    if (docs.length) {
        try {
            // upserting logic
            log.debug(`total documents added: ${await docs.length}`);
        } catch (err) {
            await recurse(nextRange.end, event, context);
            log.debug(`error inserting docs: ${JSON.stringify(err)}`);
        }
    }
    if (nextRange.end < totalFileSize) {
        log.debug(`Last ${context.getRemainingTimeInMillis()} milliseconds left`);
        if (context.getRemainingTimeInMillis() < 10 * 10 * 10 * 6) {
            log.debug(`Less than 6000 milliseconds left`);
            log.debug(`Invoking next iteration`);
            await recurse(nextRange.end, event, context);
            callback(null, {
                message: `Lambda timed out processing file, please continue from LAST_POSITION: ${nextRange.start}`,
            });
        }
    } else {
        callback(null, { message: `Successfully completed the chunk processing task` });
    }
};
Here, recurse is an invocation call to the same Lambda. Everything else works as expected; it just times out whenever execution reaches this invocation request:
const recurse = async (position: number, event: S3CreateEvent, context: Context) => {
    let newEvent = Object.assign(event, { position });
    let request = {
        FunctionName: context.invokedFunctionArn,
        InvocationType: 'Event',
        Payload: JSON.stringify(newEvent),
    };
    let resp = await lambda.invoke(request).promise();
    console.log('Invocation complete', resp);
    return resp;
};
This is the stack trace logged to CloudWatch:
{
    "errorMessage": "connect ETIMEDOUT 63.32.72.196:443",
    "errorType": "NetworkingError",
    "stackTrace": [
        "Object._errnoException (util.js:1022:11)",
        "_exceptionWithHostPort (util.js:1044:20)",
        "TCPConnectWrap.afterConnect [as oncomplete] (net.js:1198:14)"
    ]
}
It's not a good idea to create a self-invoking Lambda function. In case of an error (which could also be a bad handler call on the AWS side), a Lambda function might re-run several times, which is very hard to monitor and debug.
I would suggest using Step Functions. I believe this tutorial can help: Iterating a Loop Using Lambda.
Off the top of my head, if you prefer not to deal with Step Functions, you could create a Lambda trigger for an SQS queue. Then you pass a message to the queue whenever you want to run the Lambda function another time.
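To make that concrete, here is a minimal sketch of the SQS idea, assuming the queue's URL is passed in a hypothetical NEXT_CHUNK_QUEUE_URL environment variable and the queue is configured as an event source for this same function.
// Hedged sketch: instead of lambda.invoke() on itself, enqueue the next chunk's start
// position; the SQS event source mapping re-invokes this Lambda with the message body.
// NEXT_CHUNK_QUEUE_URL is a hypothetical environment variable, not from the question.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

const enqueueNextChunk = async (position, event) => {
    const newEvent = Object.assign({}, event, { position });
    await sqs.sendMessage({
        QueueUrl: process.env.NEXT_CHUNK_QUEUE_URL,
        MessageBody: JSON.stringify(newEvent),
    }).promise();
};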

AWS Kinesis Firehose Lambda Data Transformation with encryption in node

I am trying to write a data transformation for an Aurora Postgres database activity stream. The identity transformation gives me objects like this:
{
    "type": "DatabaseActivityMonitoringRecords",
    "version": "1.0",
    "databaseActivityEvents": "AYADeLZrKReAa/7tgBmd/d06ybQAXwA...BABVoaWV1Fi8LyA==",
    "key": "AADDDVAS ...pztPgaw=="
}
So it appears to me that I need to decrypt these. I have the ARN of the key attached to the database in question, but I cannot get the data transformation to work. Here is what I have so far:
console.log('Loading function');
const zlib = require('zlib');
const AWS = require('aws-sdk');

exports.handler = async (event, context) => {
    /* Process the list of records and transform them */
    const output = event.records.map((record) => {
        const keyId = "arn:aws:kms:us-east-1:501..89:key/1...d7"; // I don't need this?
        const CiphertextBlob = record.data;
        const kmsClient = new AWS.KMS({ region: 'us-east-1' });
        const pt = kmsClient.decrypt({ CiphertextBlob }, (err, data) => {
            if (err) console.log(err, err.stack); // an error occurred
            else {
                const { Plaintext } = data;
                console.log(data);
                return Plaintext;
            }
        });
        return { old_data: record.data, };
    });
    console.log(`Processing completed. Successful records ${output.length}.`);
    return { records: output };
};
This gives me the baffling error "errorMessage": "Error: Unable to stringify response body". I am following the example outlined here: https://docs.aws.amazon.com/kms/latest/developerguide/programming-encryption.html#decryption
Any idea what I am doing wrong here?
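For reference, a Firehose transformation Lambda is expected to return, for every input record, an object with recordId, result ('Ok', 'Dropped' or 'ProcessingFailed') and base64-encoded data, and any asynchronous work has to be awaited before the handler returns. Below is a hedged structural sketch of that contract only; decryptActivityEvents is a hypothetical placeholder, since activity-stream payloads are envelope-encrypted and fully decrypting them needs more than a single KMS Decrypt call.
// Hedged sketch of the Firehose transformation contract only; decryptActivityEvents
// is a hypothetical placeholder, not a working decryption of the activity stream.
exports.handler = async (event) => {
    const records = await Promise.all(event.records.map(async (record) => {
        try {
            // record.data is the base64-encoded source record delivered by Firehose
            const source = JSON.parse(Buffer.from(record.data, 'base64').toString('utf8'));
            const plaintext = await decryptActivityEvents(source);
            return {
                recordId: record.recordId,
                result: 'Ok',
                data: Buffer.from(JSON.stringify(plaintext)).toString('base64'),
            };
        } catch (err) {
            console.error('Transformation failed', err);
            return { recordId: record.recordId, result: 'ProcessingFailed', data: record.data };
        }
    }));
    return { records };
};

// Placeholder only: real payloads need the envelope data key ("key") decrypted via KMS
// and the databaseActivityEvents blob decrypted with the AWS Encryption SDK.
async function decryptActivityEvents(source) {
    return source; // pass the parsed record through unchanged in this sketch
}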

Is it possible to extract contents of a Cloudwatch log from a subscription

I have a lambda that subscribes to a CloudWatch log stream. This all works tickety-boo, i.e. when the log stream is written to, the lambda receives a notification. Now, is there a way of receiving the contents of the log (or a section of the log) with the notification, or do I then have to query the log stream to garner the information that I need?
Regards
Angus
Yes you can. Here's how to do it with a Node.js Lambda:
var zlib = require('zlib');

exports.handler = function(input, context) {
    // decode input from base64
    var zippedInput = Buffer.from(input.awslogs.data, 'base64');
    // decompress the input
    zlib.gunzip(zippedInput, function(error, buffer) {
        if (error) { context.fail(error); return; }
        // parse the input from JSON
        var payload = JSON.parse(buffer.toString('utf8'));
        // ignore control messages
        if (payload.messageType === 'CONTROL_MESSAGE') {
            return null;
        }
        // print the timestamp and message of each log event
        payload.logEvents.forEach(function(logEvent) {
            console.log(logEvent.timestamp + ' ' + logEvent.message);
        });
    });
};

AWS javascript SDK request.js send request function execution time gradually increases

I am using aws-sdk to push data to a Kinesis stream.
I am using PutRecord to achieve real-time data pushes.
I observe the same delay with putRecords in the case of batch writes.
I have tried this with 4 records, so I am not crossing any shard limits.
Below is my Node.js HTTP agent configuration. The default maxSockets value is set to Infinity.
Agent {
  domain: null,
  _events: { free: [Function] },
  _eventsCount: 1,
  _maxListeners: undefined,
  defaultPort: 80,
  protocol: 'http:',
  options: { path: null },
  requests: {},
  sockets: {},
  freeSockets: {},
  keepAliveMsecs: 1000,
  keepAlive: false,
  maxSockets: Infinity,
  maxFreeSockets: 256 }
Below is my code.
I am using the following code to trigger the putRecord call:
event.Records.forEach(function(record) {
    var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
    // put record request
    evt = transformEvent(payload);
    promises.push(writeRecordToKinesis(kinesis, streamName, evt));
});
The event structure is:
evt = {
    Data: new Buffer(JSON.stringify(payload)),
    PartitionKey: payload.PartitionKey,
    StreamName: streamName,
    SequenceNumberForOrdering: dateInMillis.toString()
};
This event is used in the put request.
function writeRecordToKinesis(kinesis, streamName, evt) {
    console.time('WRITE_TO_KINESIS_EXECUTION_TIME');
    var deferred = Q.defer();
    try {
        kinesis.putRecord(evt, function(err, data) {
            if (err) {
                console.warn('Kinesis putRecord %j', err);
                deferred.reject(err);
            } else {
                console.log(data);
                deferred.resolve(data);
            }
            console.timeEnd('WRITE_TO_KINESIS_EXECUTION_TIME');
        });
    } catch (e) {
        console.error('Error occurred while writing data to Kinesis' + e);
        deferred.reject(e);
    }
    return deferred.promise;
}
Below is the output for 3 messages.
WRITE_TO_KINESIS_EXECUTION_TIME: 2026ms
WRITE_TO_KINESIS_EXECUTION_TIME: 2971ms
WRITE_TO_KINESIS_EXECUTION_TIME: 3458ms
Here we can see a gradual increase in response time and function execution time.
I have added counters in the aws-sdk request.js class and can see the same pattern there as well.
Below is the code snippet from the aws-sdk request.js class which executes the put request.
send: function send(callback) {
    console.time('SEND_REQUEST_TO_KINESIS_EXECUTION_TIME');
    if (callback) {
        this.on('complete', function (resp) {
            console.timeEnd('SEND_REQUEST_TO_KINESIS_EXECUTION_TIME');
            callback.call(resp, resp.error, resp.data);
        });
    }
    this.runTo();
    return this.response;
},
Output for send request:
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 1751ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 1816ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 2761ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 3248ms
Here you can see it increasing gradually.
Can anyone please suggest how I can reduce this delay?
Three seconds to push a single record to Kinesis is not at all acceptable.
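One thing the agent dump above suggests (keepAlive: false) is that every call pays TCP and TLS setup again; a low-risk experiment is to give the SDK client a keep-alive HTTPS agent so connections are reused. A minimal sketch for aws-sdk v2:
// Hedged sketch: reuse TCP/TLS connections for Kinesis calls instead of the default
// keepAlive: false agent shown above. With aws-sdk v2 this can be set per client,
// or globally by setting the AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 environment variable.
const https = require('https');
const AWS = require('aws-sdk');

const agent = new https.Agent({ keepAlive: true, maxSockets: 50 });
const kinesis = new AWS.Kinesis({
    httpOptions: { agent }
});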