I am using AWS CloudFormation to deploy my application inside AWS.
I'm using a t2.2xlarge EC2 instance inside an ECS Cluster with Load Balancing.
I have a microservice written in Node.js that processes some HTML, converts it to PDF, and uploads the output to S3. That's where I use Puppeteer.
The problem is that whenever I run the application on the EC2 instance, the code reaches the point where it opens a new page and then hangs: it never resolves, never ends. Honestly, I don't know what is happening.
This is the part of the code it executes:
const browser = await puppeteer.launch({
  args: [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',
    '--disable-web-security'
  ]
});
console.log('Puppeteer before launch ...');
console.log(await browser.version());
console.log('Puppeteer launched ...');
const page = await browser.newPage();
console.log('Fetching contents ...');
const URL = `http://${FRONTEND_ENDPOINT}/Invoice/${invoiceData.id}`;
await page.goto(URL, {
  waitUntil: ['networkidle0']
});
await page.content();
const bodyHandle = await page.$('body');
await page.evaluate(body => body.innerHTML, bodyHandle);
await bodyHandle.dispose();
console.log('Saving pdf file ...');
await page.emulateMedia('screen');
await page.pdf({
  path: path.join(__dirname, '../tmp/page1.pdf'),
  format: 'A4',
  printBackground: true
});
await browser.close();
I am basically crawling a page, getting its HTML contents and converting it to PDF.
These are my logs:
Puppeteer before launch ...
Puppeteer launched ...
And it does not print anything else.
UPDATE:
These are the logs printed with the DEBUG flag enabled:
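For reference, Puppeteer's verbose logging is enabled through the debug module's namespace when starting the process, e.g. (index.js stands in for the service's actual entry file):
DEBUG="puppeteer:*" node index.js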
I have this app: https://github.com/ChristianOConnor/google-cloudfunction-callfromreactapp. It works by simply displaying some text when a button is pressed. The text is delivered by a Netlify function. I set up the Netlify function by adding a netlify.toml file to the root directory:
netlify.toml:
[functions]
directory = "functions/"
and adding this file:
functions/hello-netlify.js:
exports.handler = async (event) => {
  return {
    statusCode: 200,
    body: process.env.GREETING_TEST,
  };
};
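On the React side the button handler just fetches the function over Netlify's standard functions path; a simplified sketch (not the exact code from the repo, and setGreeting is a placeholder for however the component stores the response):
// functions/hello-netlify.js is exposed at this path by Netlify's convention
const handleClick = async () => {
  const res = await fetch('/.netlify/functions/hello-netlify');
  setGreeting(await res.text());
};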
I added a GREETING_TEST environment variable in Netlify's deploy settings and set it to "this variable is now working".
The app works perfectly after deploying.
I have a default python Google Cloud Function that simply prints "Hello World!"
The question is: if I replace the test Netlify function that spits out "this variable is now working" with this,
import { JWT } from "google-auth-library";

exports.handler = async (event) => {
  const client = new JWT({
    email: process.env.CLIENT_EMAIL,
    key: process.env.PRIVATE_KEY
  });
  const url = process.env.RUN_APP_URL;
  const res = await client.request({ url });
  const resData = res.data;
  return {
    statusCode: 200,
    body: resData,
  };
};
set CLIENT_EMAIL and PRIVATE_KEY to those of my relevant Google Cloud Function service account, and set RUN_APP_URL to the Google Cloud Function's trigger URL, would that be safe? My secret environment variables like PRIVATE_KEY would never be visible, right?
I have 2 Amplify web apps and I want both of them to use the same AWS backend. So I followed the instructions at https://docs.amplify.aws/cli/teams/multi-frontend/#workflow. App A has the full Amplify backend src and App B shall use it too. So I ran on App B:
amplify pull
>Amplify AppID found: df4xxxxxxx. Amplify App name is: XXXXX
>Backend environment dev found in Amplify Console app: XXXXX
Seems to work.
But when I now try to make an api call via:
AWS_API = 'api_name_on_aws';

async getUserInfosByUsername(username) {
  var userInfos;
  await API.get(this.AWS_API, `/users/infos/testuser`, {
    headers: {},
    response: true,
    body: {},
    queryStringParameters: {},
  })
    .then((response) => {
      userInfos = response;
    })
    .catch((error) => {
      console.log(error.response);
    });
  return userInfos;
}
then no API request is sent (I can see in the Google Chrome dev console/network tab that no request goes out).
The "request" method just returns "undefined" and that's all... On App A everything works fine.
Did I miss something? Should I do something else so that App B can use the API of App A?
Our app allows our clients large file uploads. Files are stored on AWS S3, we use Uppy for the upload, and we dockerize it to be used under a Kubernetes deployment where we can scale up the number of instances.
It works well, but we noticed that all > 5GB uploads fail. I know Uppy has a plugin for AWS multipart uploads, but even with it installed during the container image creation, the result is the same.
Here's our Dockerfile. Has someone ever succeeded in uploading > 5GB files to S3 via Uppy? Is there anything we're missing?
FROM node:alpine AS companion
RUN yarn global add @uppy/companion@3.0.1
RUN yarn global add @uppy/aws-s3-multipart
ARG UPPY_COMPANION_DOMAIN=[...redacted..]
ARG UPPY_AWS_BUCKET=[...redacted..]
ENV COMPANION_SECRET=[...redacted..]
ENV COMPANION_PREAUTH_SECRET=[...redacted..]
ENV COMPANION_DOMAIN=${UPPY_COMPANION_DOMAIN}
ENV COMPANION_PROTOCOL="https"
ENV COMPANION_DATADIR="COMPANION_DATA"
# ENV COMPANION_HIDE_WELCOME="true"
# ENV COMPANION_HIDE_METRICS="true"
ENV COMPANION_CLIENT_ORIGINS=[...redacted..]
ENV COMPANION_AWS_KEY=[...redacted..]
ENV COMPANION_AWS_SECRET=[...redacted..]
ENV COMPANION_AWS_BUCKET=${UPPY_AWS_BUCKET}
ENV COMPANION_AWS_REGION="us-east-2"
ENV COMPANION_AWS_USE_ACCELERATE_ENDPOINT="true"
ENV COMPANION_AWS_EXPIRES="3600"
ENV COMPANION_AWS_ACL="public-read"
# We don't need to store data for just S3 uploads, but Uppy throws unless this dir exists.
RUN mkdir COMPANION_DATA
CMD ["companion"]
EXPOSE 3020
EDIT:
I made sure I had:
uppy.use(AwsS3Multipart, {
  limit: 5,
  companionUrl: '<our uppy url>',
})
And it still doesn't work: I see all the chunks of the 9GB file sent on the network tab, but as soon as it hits 100%, Uppy throws a "cannot post" error (to our S3 URL) and that's it. Failure.
Has anyone ever encountered this? The upload goes fine till 100%, then the last chunk gets an HTTP 413 error, making the entire upload fail.
Thanks!
Here I'm adding some code samples from my repository that will help you understand the flow of using the busboy package to stream data to an S3 bucket. I'm also adding reference links here so you can get the details of the packages I'm using:
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/index.html
https://www.npmjs.com/package/busboy
import Busboy from 'busboy';
import { Request, Response } from 'express';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// S3 client (SDK v3); region and credentials come from the environment
const S3 = new S3Client({});

// Minimal shape of the response DTO used below
interface ResponseDto {
  status: boolean;
  message: string;
  data?: any;
  error?: string;
}

export const uploadStreamFile = async (req: Request, res: Response) => {
  const busboy = new Busboy({ headers: req.headers });
  const streamResponse = await busboyStream(busboy, req);
  const uploadResponse = await s3FileUpload(streamResponse.data.buffer);
  return res.send(uploadResponse);
};

const busboyStream = async (busboy: any, req: Request): Promise<any> => {
  return new Promise((resolve, reject) => {
    try {
      const fileData: any[] = [];
      let fileBuffer: Buffer;
      let metaData: any;
      busboy.on('file', async (fieldName: any, file: any, fileName: any, encoding: any, mimetype: any) => {
        // ! File is missing in the request
        if (!fileName)
          reject("File not found!");
        // keep the file metadata so it can be returned alongside the buffer
        metaData = { fileName, encoding, mimetype };
        let totalBytes: number = 0;
        file.on('data', (chunk: any) => {
          fileData.push(chunk);
          // ! the following code is only for logging purposes
          // TODO will remove once project is live
          totalBytes += chunk.length;
          console.log('File [' + fieldName + '] got ' + chunk.length + ' bytes');
        });
        file.on('error', (err: any) => {
          reject(err);
        });
        file.on('end', () => {
          fileBuffer = Buffer.concat(fileData);
        });
      });
      // ? Haa, finally file parsing went well
      busboy.on('finish', () => {
        const responseData: ResponseDto = {
          status: true, message: "File parsing done", data: {
            buffer: fileBuffer,
            metaData
          }
        };
        resolve(responseData);
        console.log('Done parsing data! -> File uploaded');
      });
      req.pipe(busboy);
    } catch (error) {
      reject(error);
    }
  });
};

const s3FileUpload = async (fileData: any): Promise<ResponseDto> => {
  try {
    const params: any = {
      Bucket: <BUCKET_NAME>,
      Key: <path>,
      Body: fileData,
      ContentType: <content_type>,
      ServerSideEncryption: "AES256",
    };
    const command = new PutObjectCommand(params);
    const uploadResponse: any = await S3.send(command);
    return { status: true, message: "File uploaded successfully", data: uploadResponse };
  } catch (error: any) {
    const responseData = { status: false, message: "Monitor connection failed, please contact tech support!", error: error.message };
    return responseData;
  }
};
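For completeness, a handler like this is mounted as a normal Express route; the route path and module path below are illustrative:
import express from 'express';
import { uploadStreamFile } from './uploadStreamFile'; // illustrative module path

const app = express();

// stream multipart uploads through busboy to S3
app.post('/files/upload', uploadStreamFile);

app.listen(3000);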
In AWS S3, a single PUT operation can upload an object of at most 5 GB.
To upload > 5GB files to S3 you need to use the S3 multipart upload API, and on the client the AwsS3Multipart Uppy plugin.
Check your upload code to make sure you are using AwsS3Multipart correctly, for example by setting the limit properly; in this case a limit between 5 and 15 is recommended.
import AwsS3Multipart from '@uppy/aws-s3-multipart'

uppy.use(AwsS3Multipart, {
  limit: 5,
  companionUrl: 'https://uppy-companion.myapp.net/',
})
Also, check this issue on GitHub: Uploading a large >5GB file to S3 errors out #1945
If you're getting Error: request entity too large in your Companion server logs, I fixed this in my Companion Express server by increasing the body-parser limit:
app.use(bodyparser.json({ limit: '21GB', type: 'application/json' }))
This is a good working example of Uppy S3 multipart uploads (it works without this limit increase): https://github.com/jhanitesh10/uppy
I'm able to upload files up to a (self-imposed) limit of 20GB using this code.
https://uppy.io/docs/aws-s3-multipart/
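A self-imposed cap like that is typically enforced through Uppy's restrictions option; a minimal sketch (the 20 GB figure is just the example value):
import Uppy from '@uppy/core'

const uppy = new Uppy({
  restrictions: {
    // reject files larger than 20 GB before the upload even starts
    maxFileSize: 20 * 1024 * 1024 * 1024,
  },
})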
The Uppy multipart plugin sounds like exactly what I need, but I can't see how to do the backend part of things. The impression I get is that I need to set up a Companion to route the upload to S3, but I can't find any details on setting up Companion for this.
I can see lots of references about using Companion to fetch external content, but none on multipart S3 uploading.
Nor do I see anywhere inside Uppy to provide AWS credentials, which makes me suspect Companion all the more.
But there are 4 steps to complete a multipart upload and I can't see how providing one Companion URL will help Uppy.
Thanks in advance to anyone who can help or nudge me in the right direction.
Providing Uppy a companion URL makes it so that Uppy will fire off a series of requests to the-passed-url.com/s3/multipart. You then need to configure your server to handle these requests; your server is where your AWS credentials are handled (a rough sketch of such endpoints follows the list below).
In short when you click the upload button in Uppy, this is what happens:
Uppy sends a post request to /s3/multipart to create/initiate the multipart upload.
Using the data returned from the previous request, Uppy will send a get request to /s3/multipart/{uploadId} to generate AWS S3 pre-signed URLs to use for uploading the parts.
Uppy will then upload the parts using the pre-signed URLs from the previous request.
Finally, Uppy will send a post request to /s3/multipart/{uploadId}/complete to complete the multipart upload.
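To make that concrete, here is a rough sketch of what such endpoints can look like on a Node/Express server with the AWS SDK v3. The route shapes and response fields follow the four steps above but are illustrative; the exact contract Uppy expects is defined by the AwsS3Multipart plugin docs (and implemented for you by Companion), so treat this as a sketch rather than a drop-in server.
const express = require('express')
const {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} = require('@aws-sdk/client-s3')
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner')

const s3 = new S3Client({ region: 'us-east-2' }) // credentials come from the server environment
const BUCKET = 'my-bucket'                       // placeholder bucket name

const app = express()
app.use(express.json())

// Step 1: create/initiate the multipart upload
app.post('/s3/multipart', async (req, res) => {
  const { filename, type } = req.body
  const { UploadId, Key } = await s3.send(new CreateMultipartUploadCommand({
    Bucket: BUCKET,
    Key: `uploads/${filename}`,
    ContentType: type,
  }))
  res.json({ uploadId: UploadId, key: Key })
})

// Step 2: generate a pre-signed URL for an individual part
app.get('/s3/multipart/:uploadId/:partNumber', async (req, res) => {
  const url = await getSignedUrl(s3, new UploadPartCommand({
    Bucket: BUCKET,
    Key: req.query.key,
    UploadId: req.params.uploadId,
    PartNumber: Number(req.params.partNumber),
  }), { expiresIn: 3600 })
  res.json({ url })
})

// Step 3 happens entirely in the browser: Uppy PUTs each part to its pre-signed URL.

// Step 4: complete the multipart upload once every part has been uploaded
app.post('/s3/multipart/:uploadId/complete', async (req, res) => {
  const { Location } = await s3.send(new CompleteMultipartUploadCommand({
    Bucket: BUCKET,
    Key: req.query.key,
    UploadId: req.params.uploadId,
    MultipartUpload: { Parts: req.body.parts },
  }))
  res.json({ location: Location })
})

app.listen(3020)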
I was able to accomplish this using Laravel/Vue. I don't know what your environment is but I've posted my solution which should help, especially if your server is using PHP.
Configuring Uppy to Use Multipart Uploads with Laravel/Vue
I am sharing code snippets for AWS S3 Multipart [github]
If you add Companion to the mix, your users will be able to select files from remote sources, such as Instagram, Google Drive, and Dropbox, bypassing the client (so a 5 GB video isn’t eating into your users’ data plans), and then uploaded to the final destination. Files are removed from Companion after an upload is complete, or after a reasonable timeout. Access tokens also don’t stick around for long, for security reasons.
Set up the Companion server:
1: Set up the S3 configuration.
Uppy automatically generates the upload URL and puts the file in the uploads directory.
s3: {
  getKey: (req, filename) => {
    return `uploads/${filename}`;
  },
  key: 'AWS KEY',
  secret: 'AWS SECRET',
  bucket: 'AWS BUCKET NAME',
},
2: Support uploads from a remote resource
Uppy handles everything for us. We just need to provide a key and secret for each remote resource, like Instagram, Google Drive, etc.
Example: Google Drive upload
Generate a Google key and secret and add them to the code
Add a redirect URL for authentication
3: Run node server locally
const fs = require('fs')
const path = require('path')
const rimraf = require('rimraf')
const companion = require('@uppy/companion')
const app = require('express')()

const DATA_DIR = path.join(__dirname, 'tmp')

app.use(require('cors')({
  origin: true,
  credentials: true,
}))
app.use(require('cookie-parser')())
app.use(require('body-parser').json())
app.use(require('express-session')({
  secret: 'hello planet',
}))

const options = {
  providerOptions: {
    drive: {
      key: 'YOUR GOOGLE DRIVE KEY',
      secret: 'YOUR GOOGLE DRIVE SECRET'
    },
    s3: {
      getKey: (req, filename) => {
        return `uploads/${filename}`;
      },
      key: 'AWS KEY',
      secret: 'AWS SECRET',
      bucket: 'AWS BUCKET NAME',
    },
  },
  server: { host: 'localhost:3020' },
  filePath: DATA_DIR,
  secret: 'blah blah',
  debug: true,
}

try {
  fs.accessSync(DATA_DIR)
} catch (err) {
  fs.mkdirSync(DATA_DIR)
}
process.on('exit', () => {
  rimraf.sync(DATA_DIR)
})

app.use(companion.app(options))

// handle server errors
const server = app.listen(3020, () => {
  console.log('listening on port 3020')
})

companion.socket(server, options)
Set up the client:
1: Client HTML code:
This code allows uploads from Google Drive, the webcam, local files, etc. You can customize it to support more remote sources.
Set the companion URL to the URL the node server above is running on (http://localhost:3020).
<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Uppy</title>
    <link href="https://releases.transloadit.com/uppy/v1.29.1/uppy.min.css" rel="stylesheet">
  </head>
  <body>
    <div id="drag-drop-area"></div>
    <script src="https://releases.transloadit.com/uppy/v1.29.1/uppy.min.js"></script>
    <script>
      Uppy.Core({
        debug: false,
        autoProceed: false,
        restrictions: {
          maxNumberOfFiles: 5,
        }
      })
        .use(Uppy.AwsS3Multipart, {
          limit: 4,
          companionUrl: 'http://localhost:3020'
        })
        .use(Uppy.Dashboard, {
          inline: true,
          showProgressDetails: true,
          showLinkToFileUploadResult: false,
          proudlyDisplayPoweredByUppy: false,
          target: '#drag-drop-area',
        })
        .use(Uppy.GoogleDrive, { target: Uppy.Dashboard, companionUrl: 'http://localhost:3020' })
        .use(Uppy.Url, { target: Uppy.Dashboard, companionUrl: 'http://localhost:3020' })
        .use(Uppy.Webcam, { target: Uppy.Dashboard, companionUrl: 'http://localhost:3020' });
    </script>
  </body>
</html>
I have created a project in Express:
const express = require('express');
const app = express();
const PORT = 5555;

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

app.get('/tr', (req, res, next) => {
  res.json({ status: 200, data: 'tr' })
});

app.get('/po', (req, res, next) => {
  res.json({ status: 200, data: 'po' })
});

module.exports = {
  app
};
It is deployed as a Cloud Function with the name my-transaction,
and I am scheduling it with Google Cloud Scheduler, giving a URL like
http://url/my-transaction/po
When I deploy without authentication, the scheduler runs the job successfully, but when I do it with authentication, it fails.
Similarly, if I create a sample project like the one below,
exports.helloHttp = (req, res) => {
  res.json({ status: 200, data: 'test hello' })
};
and deploy it, configured the same as above with authentication, it works.
The only difference is that in the last sample the function name matches the entry point, while above the entry point is app, which exposes different endpoints.
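For reference, deploying the Express app this way means pointing the function's entry point at the exported app, roughly like this (runtime and flags are illustrative):
gcloud functions deploy my-transaction \
  --runtime=nodejs16 \
  --trigger-http \
  --entry-point=app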
Any help is appreciated, thanks.
This is because you need to add auth information to the HTTP request in Cloud Scheduler.
First you need to create a service account with the role Cloud Functions Invoker.
When you have created the service account, you can see that it has an associated email, for example:
cfinvoker@fakeproject.iam.gserviceaccount.com
After that you can create a new scheduler job with auth information by following these steps:
Select target HTTP
Write the URL (the Cloud Function URL)
Click on "Show more"
Select Auth header > Add OIDC token
Write the full email address of the service account
This new scheduler job will send the HTTP request with the auth information needed to execute your Cloud Function successfully.
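The same job can also be created from the gcloud CLI along these lines (job name, schedule, and URLs are placeholders):
gcloud scheduler jobs create http my-transaction-po \
  --schedule="0 * * * *" \
  --uri="https://REGION-PROJECT.cloudfunctions.net/my-transaction/po" \
  --http-method=GET \
  --oidc-service-account-email="cfinvoker@fakeproject.iam.gserviceaccount.com" \
  --oidc-token-audience="https://REGION-PROJECT.cloudfunctions.net/my-transaction"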