I basically want to load the following page every minute and create a screenshot of it: https://www.amazon.com/b?ie=UTF8&node=24088939011
It takes about one minute to create an AWS CloudWatch canary: I open AWS CloudWatch, click "Synthetics Canaries" on the left-hand side, click "Create canary", enter the URL of the webpage, and use the default settings except that I change the schedule from running every 5 minutes to running every minute.
Availability tab of the canary I created:
Configuration tab of the canary I created:
The canary runs and says it's 100% successful but when I look at the screenshots I see that the page never loads fully:
The screenshots should show what you would see when opening the page in your browser: https://www.amazon.com/b?ie=UTF8&node=24088939011
This is the default script that is used by the canary:
const { URL } = require('url');
const synthetics = require('Synthetics');
const log = require('SyntheticsLogger');
const syntheticsConfiguration = synthetics.getConfiguration();
const syntheticsLogHelper = require('SyntheticsLogHelper');

const loadBlueprint = async function () {
    const urls = ['https://www.amazon.com/b?ie=UTF8&node=24088939011'];

    // Set screenshot option
    const takeScreenshot = true;

    /* Disabling default step screenshots taken during Synthetics.executeStep() calls.
     * The step will be used to publish metrics on the time taken to load DOM content, but
     * screenshots will be taken outside executeStep to allow the page to load completely with domcontentloaded.
     * You can change it to load, networkidle0, or networkidle2 depending on what works best for you.
     */
    syntheticsConfiguration.disableStepScreenshots();
    syntheticsConfiguration.setConfig({
        continueOnStepFailure: true,
        includeRequestHeaders: true, // Enable if headers should be displayed in HAR
        includeResponseHeaders: true, // Enable if headers should be displayed in HAR
        restrictedHeaders: [], // Values of these headers will be redacted from logs and reports
        restrictedUrlParameters: [] // Values of these URL parameters will be redacted from logs and reports
    });

    let page = await synthetics.getPage();

    for (const url of urls) {
        await loadUrl(page, url, takeScreenshot);
    }
};

// Reset the page in between URLs
const resetPage = async function (page) {
    try {
        await page.goto('about:blank', { waitUntil: ['load', 'networkidle0'], timeout: 30000 });
    } catch (ex) {
        synthetics.addExecutionError('Unable to open a blank page ', ex);
    }
};

const loadUrl = async function (page, url, takeScreenshot) {
    let stepName = null;
    let domcontentloaded = false;

    try {
        stepName = new URL(url).hostname;
    } catch (error) {
        const errorString = `Error parsing url: ${url}. ${error}`;
        log.error(errorString);
        /* If we fail to parse the URL, don't emit a metric with a stepName based on it.
           It may not be a legal CloudWatch metric dimension name, and we may not have alarms
           set up on the malformed URL stepName. Instead, fail this step, which will
           show up in the logs and will fail the overall canary and alarm on the overall canary
           success rate.
        */
        throw error;
    }

    await synthetics.executeStep(stepName, async function () {
        const sanitizedUrl = syntheticsLogHelper.getSanitizedUrl(url);

        /* You can customize the wait condition here. For instance, use 'networkidle2' or 'networkidle0' to load the page completely.
           networkidle0: Navigation is successful when the page has had no network requests for half a second. This might never happen if the page is constantly loading multiple resources.
           networkidle2: Navigation is successful when the page has no more than 2 network requests for half a second.
           domcontentloaded: Fired as soon as the page DOM has been loaded, without waiting for resources to finish loading. Can be combined with an explicit await page.waitFor(timeInMs).
        */
        const response = await page.goto(url, { waitUntil: ['networkidle0'], timeout: 30000 });
        if (response) {
            domcontentloaded = true;
            const status = response.status();
            const statusText = response.statusText();

            const logResponseString = `Response from url: ${sanitizedUrl} Status: ${status} Status Text: ${statusText}`;
            log.info(logResponseString);

            // If the response status code is not a 2xx success code
            if (status < 200 || status > 299) {
                throw new Error(`Failed to load url: ${sanitizedUrl} ${status} ${statusText}`);
            }
        } else {
            const logNoResponseString = `No response returned for url: ${sanitizedUrl}`;
            log.error(logNoResponseString);
            throw new Error(logNoResponseString);
        }
    });

    // Wait for 15 seconds to let the page load fully before taking the screenshot.
    if (domcontentloaded && takeScreenshot) {
        await page.waitFor(15000);
        await synthetics.takeScreenshot(stepName, 'loaded');
        await resetPage(page);
    }
};

const urls = [];

exports.handler = async () => {
    return await loadBlueprint();
};
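One thing I'm considering (untested) is scrolling through the page before the screenshot, in case the missing parts are lazy-loaded below the fold. A sketch of a helper that could be called just before synthetics.takeScreenshot:

// Untested sketch: scroll through the page to trigger lazy-loaded content,
// then return to the top before the screenshot is taken.
const scrollFullPage = async function (page) {
    await page.evaluate(async () => {
        const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
        let scrolled = 0;
        while (scrolled < document.body.scrollHeight) {
            window.scrollBy(0, window.innerHeight);
            scrolled += window.innerHeight;
            await delay(500); // give lazy-loaded resources time to start fetching
        }
        window.scrollTo(0, 0);
    });
};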
I tried creating a canary in exactly the same way but for another similar page (https://www.amazon.ca/b?ie=UTF8&node=6548466011) and it just works:
What am I doing wrong? Why aren't the screenshots that the canary takes showing the fully loaded page?
Why are parts of the page missing in the screenshot but they show up correctly when I open the page (https://www.amazon.com/b?ie=UTF8&node=24088939011) in my browser?
I have a Lambda function which performs a series of actions. I have a React application which triggers the Lambda function.
Is there a way I can send a partial response from the Lambda function after each action is complete?
// Must be async to use await inside
const testFunction = async (event, context, callback) => {
    let partialResponse1 = await action1(event);
    // send partial response to client
    let partialResponse2 = await action2(partialResponse1);
    // send partial response to client
    let partialResponse3 = await action3(partialResponse2);
    // send partial response to client
    let response = await action4(partialResponse3);
    // send final response
}
Is this possible in Lambda functions? If so, how can we do this? Any reference docs or sample code would be a great help.
Thanks.
Note: This is a fairly simple case of showing a loader with a percentage on the client side. I don't want to overcomplicate things with SQS or Step Functions.
I am still looking for an answer for this.
From what I understand you're using API Gateway + Lambda and are looking to show the progress of the Lambda via UI.
Since each step must finish before the next step begins, I see no reason not to call the Lambda 4 times, or to split it into 4 separate Lambdas.
E.g.:
// Not real syntax!
try {
    res1 = await ajax.post(/process, { stage: 1, data: ... });
    out(stage 1 complete);
    res2 = await ajax.post(/process, { stage: 2, data: res1 });
    out(stage 2 complete);
    res3 = await ajax.post(/process, { stage: 3, data: res2 });
    out(stage 3 complete);
    res4 = await ajax.post(/process, { stage: 4, data: res3 });
    out(stage 4 complete);
    out(process finished);
} catch (err) {
    out(stage {$err.stage-number} failed to complete);
}
If you still want all 4 calls to be executed during the same Lambda execution, you may do the following (this applies especially if the process is expected to be very long, since it's usually not good practice to keep a long-hanging HTTP transaction open).
You may implement it by saving the "progress" in a database, and when the process is complete, saving the results to the database as well.
All the client needs to do is query the status every X seconds.
// Not real syntax
Gateway-API --> lambda1 - startProcess(): returns ID {
    uuid = randomUUID();
    write to dynamoDB { status: starting };
    send sqs-message-to-start-process(data, uuid);
    return response { uuid: uuid };
}

SQS --> lambda2 - execute(): returns void {
    try {
        let partialResponse1 = await action1(event);
        write to dynamoDB { status: action 1 complete };
        let partialResponse2 = await action2(partialResponse1);
        write to dynamoDB { status: action 2 complete };
        let partialResponse3 = await action3(partialResponse2);
        write to dynamoDB { status: action 3 complete };
        let response = await action4(partialResponse3);
        write to dynamoDB { status: action 4 complete, response: response };
    } catch (err) {
        write to dynamoDB { status: failed, error: err };
    }
}

Gateway-API --> lambda3 -> getStatus(uuid): returns status {
    return status from dynamoDB (uuid);
}
Your UI code:
res = ajax.get(/startProcess);
uuid = res.uuid;
in interval every X (e.g. 3) seconds {
    status = ajax.get(/getStatus?uuid=uuid);
    show(status);
    if (status.error) {
        handle(status.error) and break;
    }
    if (status.response) {
        handle(status.response) and break;
    }
}
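If it helps, here is the same polling loop as a minimal real-JavaScript sketch (the endpoint paths and the show/handle helpers are placeholders carried over from the pseudocode above):

// Browser-side sketch; /startProcess, /getStatus, show() and handle()
// are placeholders from the pseudocode above.
async function runAndPoll() {
    const { uuid } = await fetch('/startProcess').then(r => r.json());
    const timer = setInterval(async () => {
        const status = await fetch(`/getStatus?uuid=${uuid}`).then(r => r.json());
        show(status); // e.g. update the loader percentage
        if (status.error || status.response) {
            clearInterval(timer);
            status.error ? handle(status.error) : handle(status.response);
        }
    }, 3000); // poll every 3 seconds
}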
Just remember that Lambdas cannot exceed 15 minutes of execution. Therefore, you need to be 100% certain that whatever the process does, it never exceeds this hard limit.
What you are looking for is to have the response exposed as a stream that you can write to and flush. Unfortunately, it's not there in Node.js:
How to stream AWS Lambda response in node?
https://docs.aws.amazon.com/lambda/latest/dg/programming-model.html
But you can still do the streaming if you use Java:
https://docs.aws.amazon.com/lambda/latest/dg/java-handler-io-type-stream.html
package example;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import com.amazonaws.services.lambda.runtime.Context;

public class Hello implements RequestStreamHandler {
    // RequestStreamHandler requires handleRequest; a method named
    // 'handler' would not implement the interface
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        int letter;
        while ((letter = inputStream.read()) != -1) {
            outputStream.write(Character.toUpperCase(letter));
        }
    }
}
Aman,
You can push the partial outputs into SQS and read the SQS messages to process them. This is a simple and scalable architecture. AWS provides SQS SDKs in different languages, for example JavaScript, Java, Python, etc.
Reading from and writing to SQS is very easy using the SDK, and it can be implemented server-side or in your UI layer (with proper IAM).
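For example, a minimal sketch with the AWS SDK for JavaScript (v2); the queue URL is a placeholder, and the calls are assumed to run inside an async function:

// Sketch only: the queue URL is a placeholder, and both calls are
// assumed to run inside an async function.
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({ apiVersion: '2012-11-05' });
const QueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/partial-results';

// Producer: push a partial output after each action completes
await sqs.sendMessage({
    QueueUrl,
    MessageBody: JSON.stringify({ stage: 1, data: partialResponse1 })
}).promise();

// Consumer: poll for partial outputs (long polling for up to 10 seconds)
const { Messages } = await sqs.receiveMessage({ QueueUrl, WaitTimeSeconds: 10 }).promise();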
I found that AWS Step Functions may be what you need:
AWS Step Functions lets you coordinate multiple AWS services into serverless workflows so you can build and update apps quickly.
Check this link for more detail:
In our example, you are a developer who has been asked to create a serverless application to automate handling of support tickets in a call center. While you could have one Lambda function call the other, you worry that managing all of those connections will become challenging as the call center application becomes more sophisticated. Plus, any change in the flow of the application will require changes in multiple places, and you could end up writing the same code over and over again.
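As a rough illustration (my own sketch, not from the tutorial), a state machine chaining four actions as Lambda tasks could look like this in Amazon States Language, with placeholder ARNs:

{
  "StartAt": "Action1",
  "States": {
    "Action1": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:action1", "Next": "Action2" },
    "Action2": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:action2", "Next": "Action3" },
    "Action3": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:action3", "Next": "Action4" },
    "Action4": { "Type": "Task", "Resource": "arn:aws:lambda:us-east-1:123456789012:function:action4", "End": true }
  }
}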
To hopefully save on system resources, I want to run user requests through the same Chromium instance in Puppeteer.
If a user submits a form on my site which calls Puppeteer, and Chromium is already running, how can I use the same Chromium instance up to a maximum of 4 tabs?
If there are more than 4 tabs open in the Chromium instance then I want to launch a new Chromium instance.
How can I achieve this? Would I need to store the browserWSEndpoint of each Chromium instance to a file and then retrieve it every time a new user submits a request? (This would use browserWSEndpoint with puppeteer.connect().)
If I have to do it this way, let's say there are 2 Chromium browsers active. The most recent browser has the maximum four open tabs, so I could not use it. I would then check the next browserWSEndpoint and, if it has fewer than 4 open tabs, create a new page; if not, launch a new browser.
Does that sound OK?
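Roughly, I imagine the pooling logic looking something like this (untested sketch; I'm keeping the endpoints in an in-memory array just for illustration, a file or database would work the same way):

// Untested sketch of the pooling idea.
const puppeteer = require('puppeteer');
const endpoints = []; // browserWSEndpoint strings of running Chromium instances
const MAX_TABS = 4;

async function getPage() {
    for (const browserWSEndpoint of endpoints) {
        const browser = await puppeteer.connect({ browserWSEndpoint });
        const pages = await browser.pages();
        if (pages.length < MAX_TABS) {
            return browser.newPage(); // reuse this instance
        }
        browser.disconnect(); // this instance is full; try the next one
    }
    // Every instance is full (or none exist yet): launch a new Chromium
    const browser = await puppeteer.launch();
    endpoints.push(browser.wsEndpoint());
    return browser.newPage();
}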
You can use Lambda, which would save you cost; just make sure to avoid the 30-second timeout if you are going to be using API Gateway.
https://github.com/alixaxel/chrome-aws-lambda
NodeJS
const chromium = require('chrome-aws-lambda');

exports.handler = async (event, context) => {
    let result = null;
    let browser = null;

    try {
        browser = await chromium.puppeteer.launch({
            args: chromium.args,
            defaultViewport: chromium.defaultViewport,
            executablePath: await chromium.executablePath,
            headless: chromium.headless,
        });

        let page = await browser.newPage();
        await page.goto(event.url || 'https://example.com');
        result = await page.title();
    } catch (error) {
        return context.fail(error);
    } finally {
        if (browser !== null) {
            await browser.close();
        }
    }

    return context.succeed(result);
};
I'd like to alert on the lack of a heartbeat (or 0 bytes received) from any one of a large number of Google IoT Core devices. I can't seem to do this in Stackdriver. It instead appears to let me alert on the entire device registry, which does not give me what I'm looking for (how would I know that a particular device is disconnected?).
So how does one go about doing this?
I have no idea why this question was downvoted as 'too broad'.
The truth is that Google IoT Core doesn't have per-device alerting; it offers only alerting on an entire device registry. If this is not true, please reply to this post. The page that clearly states this is here:
Cloud IoT Core exports usage metrics that can be monitored
programmatically or accessed via Stackdriver Monitoring. These metrics
are aggregated at the device registry level. You can use Stackdriver
to create dashboards or set up alerts.
The importance of having per-device alerting is built into the promise assumed in this statement:
Operational information about the health and functioning of devices is
important to ensure that your data-gathering fabric is healthy and
performing well. Devices might be located in harsh environments or in
hard-to-access locations. Monitoring operational intelligence for your
IoT devices is key to preserving the business-relevant data stream.
So it's not easy today to get an alert if one among many globally dispersed devices loses connectivity. One needs to build that, and depending on what one is trying to do, it would entail different solutions.
In my case I wanted to alert if the last heartbeat time or last event state publish was older than 5 minutes. For this I need to run a looping function that scans the device registry and performs this check regularly. The usage of this API is outlined in this other SO post: Google iot core connection status
For reference, here's a Firebase function I just wrote to check a device's online status. It probably needs some tweaks and further testing, but it should give anybody else something to start with:
// Example code to call this function
// const checkDeviceOnline = functions.httpsCallable('checkDeviceOnline');
// Include 'current' key for 'current' online status to force update on db with delta
// const isOnline = await checkDeviceOnline({ deviceID: 'XXXX', current: true })
export const checkDeviceOnline = functions.https.onCall(async (data, context) => {
    if (!context.auth) {
        throw new functions.https.HttpsError('failed-precondition', 'You must be logged in to call this function!');
    }

    // deviceID is passed in deviceID object key
    const deviceID = data.deviceID

    const dbUpdate = (isOnline) => {
        if (('wasOnline' in data) && data.wasOnline !== isOnline) {
            db.collection("devices").doc(deviceID).update({ online: isOnline })
        }
        return isOnline
    }

    const deviceLastSeen = () => {
        // We only want to use these to determine "latest seen timestamp";
        // they are read from the IoT device record fetched below
        const stamps = ["lastHeartbeatTime", "lastEventTime", "lastStateTime", "lastConfigAckTime", "deviceAckTime"]
        return stamps.map(key => moment(iotDevice[key], "YYYY-MM-DDTHH:mm:ssZ").unix()).filter(epoch => !isNaN(epoch) && epoch > 0).sort().reverse().shift()
    }

    await dm.setAuth()

    const iotDevice: any = await dm.getDevice(deviceID)

    if (!iotDevice) {
        // 'not-found' is a valid HttpsError code
        throw new functions.https.HttpsError('not-found', 'Failed to get device!');
    }

    console.log('iotDevice', iotDevice)

    // If there is no error status and there is a last heartbeat time, assume the device is online
    if (!iotDevice.lastErrorStatus && iotDevice.lastHeartbeatTime) {
        return dbUpdate(true)
    }

    // Add iotDevice.config.deviceAckTime to root of object
    // For some reason in all my tests, I NEVER receive anything on lastConfigAckTime, so this is my workaround
    if (iotDevice.config && iotDevice.config.deviceAckTime) iotDevice.deviceAckTime = iotDevice.config.deviceAckTime

    // If there is a last error status, let's make sure it's not a stale (old) one
    const lastSeenEpoch = deviceLastSeen()
    const errorEpoch = iotDevice.lastErrorTime ? moment(iotDevice.lastErrorTime, "YYYY-MM-DDTHH:mm:ssZ").unix() : false

    console.log('lastSeen:', lastSeenEpoch, 'errorEpoch:', errorEpoch)

    // Device should be online: the error timestamp is older than the latest timestamp for heartbeat, state, etc.
    if (lastSeenEpoch && errorEpoch && (lastSeenEpoch > errorEpoch)) {
        return dbUpdate(true)
    }

    // error status code 4 matches
    // lastErrorStatus.code = 4
    // lastErrorStatus.message = mqtt: SERVER: The connection was closed because MQTT keep-alive check failed.
    // will also be 4 for other mqtt errors like command not sent (qos 1 not acknowledged, etc)
    if (iotDevice.lastErrorStatus && iotDevice.lastErrorStatus.code && iotDevice.lastErrorStatus.code === 4) {
        return dbUpdate(false)
    }

    return dbUpdate(false)
})
I also created a function to use with commands, to send a command to the device to check if it's online:
export const isDeviceOnline = functions.https.onCall(async (data, context) => {
    if (!context.auth) {
        throw new functions.https.HttpsError('failed-precondition', 'You must be logged in to call this function!');
    }

    // deviceID is passed in deviceID object key
    const deviceID = data.deviceID

    await dm.setAuth()

    const dbUpdate = (isOnline) => {
        if (('wasOnline' in data) && data.wasOnline !== isOnline) {
            console.log('updating db', deviceID, isOnline)
            db.collection("devices").doc(deviceID).update({ online: isOnline })
        } else {
            console.log('NOT updating db', deviceID, isOnline)
        }
        return isOnline
    }

    try {
        await dm.sendCommand(deviceID, 'alive?', 'alive')
        console.log('Assuming device is online after successful alive? command')
        return dbUpdate(true)
    } catch (error) {
        console.log("Unable to send alive? command", error)
        return dbUpdate(false)
    }
})
This also uses my modified version of DeviceManager; you can find all the example code in this gist (to make sure you're using the latest update, and to keep this post small):
https://gist.github.com/tripflex/3eff9c425f8b0c037c40f5744e46c319
All of this code, just to check if a device is online or not ... which could be easily handled by Google emitting some kind of event or adding an easy way to handle this. COME ON GOOGLE GET IT TOGETHER!
I have a Lambda function transportKickoff which receives an input and then sends/proxies that input forward into a Step Function. The code below runs and I am getting no errors, but at the same time the Step Function is NOT executing.
Also critical to the design: I do not want the transportKickoff function to wait around for the Step Function to complete, as it can be quite long running. I was, however, expecting that any errors in calling the Step Function would be reported back synchronously. Maybe this thought is at fault and I'm somehow missing an error that is thrown somewhere. If that's the case, I'd still like a way to achieve the goal of having the kickoff Lambda exit as soon as the Step Function has started execution.
note: I can execute the step function independently and I know that it works correctly
const stepFn = new StepFunctions({ apiVersion: "2016-11-23" });
const stage = process.env.AWS_STAGE;
const name = `transport-steps ${message.command} for "${stage}" environment at ${Date.now()}`;
const params: StepFunctions.StartExecutionInput = {
    stateMachineArn: `arn:aws:states:us-east-1:999999999:stateMachine:transportion-${stage}-steps`,
    input: JSON.stringify(message),
    name
};
const request = stepFn.startExecution(params);
request.send();

console.info(
    `startExecution request for step function was sent, context sent was:\n`,
    JSON.stringify(params, null, 2)
);
callback(null, {
    statusCode: 200
});
I have also checked from the console that I have what I believe to be the right permissions to start the execution of a step function:
I've now added more permissions (see below) but am still experiencing the same problem:
'states:ListStateMachines'
'states:CreateActivity'
'states:StartExecution'
'states:ListExecutions'
'states:DescribeExecution'
'states:DescribeStateMachineForExecution'
'states:GetExecutionHistory'
OK, I have figured this one out myself; hopefully this answer will be helpful for others.
First of all, the send() method is not a synchronous call, but it does not return a promise either. Instead you must set up listeners on the Request object before sending so that you can respond appropriately to success/failure states.
I've done this with the following code:
const stepFn = new StepFunctions({ apiVersion: "2016-11-23" });
const stage = process.env.AWS_STAGE;
const name = `${message.command}-${message.upc}-${message.accountName}-${stage}-${Date.now()}`;
const params: StepFunctions.StartExecutionInput = {
    stateMachineArn: `arn:aws:states:us-east-1:837955377040:stateMachine:transportation-${stage}-steps`,
    input: JSON.stringify(message),
    name
};
const request = stepFn.startExecution(params);

// listen for success
request.on("extractData", req => {
    console.info(
        `startExecution request for step function was sent and validated, context sent was:\n`,
        JSON.stringify(params, null, 2)
    );
    callback(null, {
        statusCode: 200
    });
});

// listen for error
request.on("error", (err, response) => {
    console.warn(
        `There was an error -- ${err.message} [${err.code}, ${err.statusCode}] -- that blocked the kickoff of the ${message.command} ITMS command for ${message.upc} UPC, ${message.accountName} account.`
    );
    callback(err.statusCode, {
        message: err.message,
        errors: [err]
    });
});

// send request
request.send();
Now please bear in mind there is a "success" event, but I used "extractData" to capture success because I wanted to get a response as quickly as possible. It's possible that "success" would have worked equally well, but looking at the language in the TypeScript typings it wasn't entirely clear, and in my testing I'm certain that the "extractData" event does work as expected.
As for why I was not getting any execution on my step functions ... it had to do with the way I was naming the execution. You're limited to a subset of characters in the name, and I'd stepped over that restriction but didn't realize it until I was able to capture the error with the code above.
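For example, a guard like this (my own helper, not an SDK feature) keeps the name within the allowed character set and the 80-character limit:

// My own helper, not an SDK feature: execution names are capped at 80
// characters and must avoid whitespace and special characters.
const safeName = (raw) => raw.replace(/[^A-Za-z0-9\-_]/g, "-").slice(0, 80);

const name = safeName(`${message.command}-${message.upc}-${message.accountName}-${stage}-${Date.now()}`);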
For anyone encountering issues executing state machines from Lambdas: make sure the permission 'states:StartExecution' is added to the Lambda's role and the regions match up.
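For reference, a minimal IAM policy statement for that (the state machine ARN is a placeholder):

{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "states:StartExecution",
        "Resource": "arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine"
    }]
}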
Promise-based version:
import { StepFunctions } from 'aws-sdk';

const clients = {
    stepFunctions: new StepFunctions()
};

const createExecutor = ({ clients }) => async (event) => {
    console.log('Executing media pipeline job');

    const params = {
        stateMachineArn: '<state-machine-arn>',
        input: JSON.stringify({}),
        name: 'new-job',
    };
    const result = await clients.stepFunctions.startExecution(params).promise();

    // { executionArn: "string", startDate: number }
    return result;
};

const startExecution = createExecutor({ clients });

// Pass in the event from the Lambda, e.g. S3 Put, SQS Message
await startExecution(event);
The result should contain the execution ARN and start date.
I am using PhantomJS to print the webpage and create a PDF. As the UI needs the user's authentication before finding the data, I used persistent cookies to authenticate the user. But somehow I get the login screen every time in the created PDF. I observed that the user authenticates successfully, and the results webpage shows the proper results (debug logs show the proper data array), but when printing the web page or creating a PDF it somehow gets the login screen. Sometimes I observed two different cookies: one in my PHP code while getting the report data and one in JavaScript's document.cookie.
Please let me know how I can fix this.
var page = require('webpage').create(),
    system = require('system'), t, address;

page.settings.userName = 'myusername';
page.settings.password = 'mypassword';

if (system.args.length === 1) {
    console.log('Usage: scrape.js ');
    phantom.exit();
} else {
    t = Date.now();
    address = system.args[1];
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
        } else {
            t = Date.now() - t;
            var title = page.evaluate(function () { return document.title; });
            console.log('Page title is ' + title);
            console.log('Loading time ' + t + ' msec');
        }
        phantom.exit();
    });
}
Another piece of code, sending a cookie file:
bin/phantomjs --cookies-file=/tmp/cookies.txt --disk-cache=yes --ignore-ssl-errors=yes /phantomjs/pdf.js 'username' 'params' '/tmp/phantomjs_file' /tmp/phantom_pdf.pdf
And
phantomjs --cookies-file=cookies.txt examples/rasterize.js localhost:7000/reports /tmp/report.pdf
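In case it's relevant, one thing I'm considering to rule out the cookie mismatch is injecting the session cookie explicitly before page.open (untested sketch; the cookie name, value, and domain are placeholders for my real session cookie):

// Untested sketch: inject the session cookie before opening the page.
// Name, value, and domain below are placeholders.
phantom.addCookie({
    name: 'PHPSESSID',
    value: 'abc123sessionvalue',
    domain: 'myserver.example.com',
    path: '/'
});

page.open(address, function (status) {
    // Render the PDF once the authenticated page has loaded
    page.render('/tmp/report.pdf');
    phantom.exit();
});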