I am working on a Cloud Function that moves Avro files from GCS to BigQuery whenever a new file lands in GCS. I am using the Cloud Functions UI in GCP, with 512 MB of memory allocated. The trigger is Cloud Storage, the event type is Finalize/Create, and the source code is entered in the Inline Editor.
Below is my code. It deploys successfully, but after deployment I receive the error below, and nothing moves to BigQuery.
Additionally, I am trying to move Avro files from a folder WITHIN a bucket, not from the top level of the bucket. That is the purpose of the folder check in the code below; the folder is called "example_spend/".
error: Cannot find module google-cloud/bigquery
'use strict';
exports.createExampleTableFromFile = function(event, callback) {
  const file = event.data;
  if (file.resourceState === 'exists' && file.name &&
      file.name.indexOf('example_spend/') !== -1) {
    console.log('Processing file: ' + event.data.name);
    const BigQuery = require('@google-cloud/bigquery');
    const Storage = require('@google-cloud/storage');
    const assert = require('assert');
    const filename = event.data.name;
    const bucketName = event.data.bucket;
    const projectId = "gcp-pilot-192921";
    const datasetId = "example_etl";
    const tableId = filename.slice(0,filename.indexOf(".")).toLowerCase();
    const bigquery = new BigQuery({
      projectId: projectId,
    });
    const storage = Storage({
      projectId: projectId
    });
    const metadata = {
      sourceFormat: 'AVRO',
      autodetect: true,
      writeDisposition: 'WRITE_TRUNCATE'
    };
    bigquery
      .dataset(datasetId)
      .table(tableId)
      .load(storage.bucket(bucketName).file(filename), metadata)
      .then(results => {
        const job = results[0];
        assert.equal(job.status.state, 'DONE');
        console.log(`Job ${job.id} to load table ${tableId} completed.`);
        const errors = job.status.errors;
        if (errors && errors.length > 0) {
          throw errors;
        }
      })
      .catch(err => {
        console.error('Error during load job:', err);
      });
    callback();
  }
};
It looks like you haven't added any dependencies to your function.
When using the inline editor, click on "requirements.txt" (Python) or "package.json" (JavaScript), where you can list the packages your function needs to run; these are then installed for you so your function can import them when it spins up. For this function, that means listing @google-cloud/bigquery and @google-cloud/storage under "dependencies" in package.json. Note that you can also specify versions if required, for example in Python: requests==2.19.0.
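Once the modules resolve, two further problems are likely to surface. First, with recent versions of these libraries the classes are named exports, so the imports become `const {BigQuery} = require('@google-cloud/bigquery');` and `const {Storage} = require('@google-cloud/storage');` (with `new Storage(...)`). Second, the question derives `tableId` from the full object name, so a file such as `example_spend/foo.avro` yields `example_spend/foo`, and a slash is not valid in a BigQuery table ID. A minimal sketch of one fix: `tableIdFromObjectName` is a hypothetical helper (not a library API) that keeps only the file's base name and conservatively replaces anything other than lowercase letters, digits and underscores.

```javascript
const path = require('path');

// Hypothetical helper (not a library API): turn a GCS object name such as
// "example_spend/Spend-2018.avro" into a string usable as a BigQuery table ID.
function tableIdFromObjectName(objectName) {
  // GCS object names use "/" as the folder separator, so use the POSIX variant
  // and keep only the last segment (this drops "example_spend/").
  const base = path.posix.basename(objectName);
  // Keep everything before the first dot, as the original slice() intended.
  const dot = base.indexOf('.');
  const stem = dot === -1 ? base : base.slice(0, dot);
  // Conservative sanitising: any character outside [a-z0-9_] becomes "_".
  return stem.toLowerCase().replace(/[^a-z0-9_]/g, '_');
}

console.log(tableIdFromObjectName('example_spend/Spend-2018.avro'));
```

In the question's function it would replace the `slice` line: `const tableId = tableIdFromObjectName(filename);`.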
Related
I upload files to Google Cloud Storage using "@ffmpeg-installer/ffmpeg" and @google-cloud/storage in my Node.js app.
Step 1. Writing the files to fs happens in child processes, one process per resolution (six in total).
Step 2. Encryption (converting to a stream).
Step 3. Upload to Google Cloud Storage.
I use "Upload a directory to a bucket" to send the video from the client to the Google Cloud Storage bucket.
This works fine, but only with small videos.
For example, when I upload a video that is one hour long, it is split into chunks, and in total I get more than three thousand files. The problem occurs when there are more than 1500 files.
So I am uploading a folder with a large number of files, but not all of these files end up in the bucket.
Maybe someone has had a similar problem and can help fix it.
const uploadFolder = async (bucketName, directoryPath, socketInstance) => {
  try {
    let dirCtr = 1;
    let itemCtr = 0;
    const fileList = [];
    const onComplete = async () => {
      const folderName = nanoid(46);
      await Promise.all(
        fileList.map(filePath => {
          const fileName = path.relative(directoryPath, filePath);
          const destination = `${ folderName }/${ fileName }`;
          return storage
            .bucket(bucketName)
            .upload(filePath, { destination })
            .then(
              uploadResp => ({ fileName: destination, status: uploadResp[0] }),
              err => ({ fileName: destination, response: err })
            );
        })
      );
      if (socketInstance) socketInstance.emit('uploadProgress', {
        message: `Added files to Google bucket`,
        last: false,
        part: false
      });
      return folderName;
    };
    const getFiles = async directory => {
      const items = await fs.readdir(directory);
      dirCtr--;
      itemCtr += items.length;
      for (const item of items) {
        const fullPath = path.join(directory, item);
        const stat = await fs.stat(fullPath);
        itemCtr--;
        if (stat.isFile()) {
          fileList.push(fullPath);
        } else if (stat.isDirectory()) {
          dirCtr++;
          await getFiles(fullPath);
        }
      }
    }
    await getFiles(directoryPath);
    return onComplete();
  } catch (e) {
    log.error(e.message);
    throw new Error('Can\'t store folder.');
  }
};
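No answer is included here, but one plausible cause is visible in the code (an assumption, not a confirmed diagnosis): every failed upload is converted into a resolved `{ fileName, response: err }` value and the array returned by `Promise.all` is discarded, so failures are silently dropped; all of the 3000+ uploads are also started at once. The sketch below uses a hypothetical `uploadAllWithLimit` helper that keeps at most `limit` uploads in flight and returns the failures so they can be logged or retried.

```javascript
// Hypothetical helper: run `uploadOne` over `items` with at most `limit`
// calls in flight, and report failures instead of discarding them.
async function uploadAllWithLimit(items, limit, uploadOne) {
  const succeeded = [];
  const failed = [];
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  // There is no await between the check and `next++`, so no index is claimed twice.
  async function worker() {
    while (next < items.length) {
      const item = items[next++];
      try {
        succeeded.push({ item, result: await uploadOne(item) });
      } catch (err) {
        failed.push({ item, error: err });
      }
    }
  }

  // Start `limit` workers (or fewer if there are fewer items) and wait for all.
  const workers = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return { succeeded, failed };
}
```

In the question's `onComplete`, `uploadOne` would be a function that calls `storage.bucket(bucketName).upload(filePath, { destination })` and lets errors throw; a limit of around 10 is an arbitrary starting point, and a non-empty `failed` array should be logged or retried rather than ignored.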
I am scouring the documentation, and it only provides pseudo-code for the credentials in v3 (e.g. const client = new S3Client(clientParams)).
How do I initialize an S3Client with the bucket and credentials to perform a getSignedUrl request? Any resources pointing me in the right direction would be most helpful. I've searched YouTube, SO, etc. and can't find any specific info on v3. Even the documentation and examples don't provide actual code for using credentials. Thanks!
As an aside, do I have to include the fake folder structure in the filename, or can I just use the actual filename? For example: bucket/folder1/folder2/uniqueFilename.zip or uniqueFilename.zip?
Here's the code I have so far. (Keep in mind I was returning wasabiObjKey to make sure I was getting the correct file name, and I am. It's the client, GetObjectCommand, and getSignedUrl that I'm having issues with.)
exports.getPresignedUrl = functions.https.onCall(async (data, ctx) => {
  const wasabiObjKey = `${data.bucket_prefix ? `${data.bucket_prefix}/` : ''}${data.uid.replace(/-/g, '_').toLowerCase()}${data.variation ? `_${data.variation.replace(/\./g, '').toLowerCase()}` : ''}.zip`
  const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3')
  const s3 = new S3Client({
    bucketEndpoint: functions.config().s3_bucket.name,
    region: functions.config().s3_bucket.region,
    credentials: {
      secretAccessKey: functions.config().s3.secret,
      accessKeyId: functions.config().s3.access_key
    }
  })
  const command = new GetObjectCommand({
    Bucket: functions.config().s3_bucket.name,
    Key: wasabiObjKey,
  })
  const { getSignedUrl } = require("@aws-sdk/s3-request-presigner")
  const url = getSignedUrl(s3, command, { expiresIn: 60 })
  return wasabiObjKey
})
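On the aside: S3 and S3-compatible stores such as Wasabi have no real folders, so the "folder" path is simply part of the object key, and the `Key` passed to `GetObjectCommand` must be the full key (for example `folder1/folder2/uniqueFilename.zip`, without the bucket name). Separately, `getSignedUrl` returns a Promise, so `const url = getSignedUrl(...)` holds a pending Promise rather than a URL string; it needs `await` before `url` can be returned. The key expression from the question is easier to check when pulled out into a function; `buildObjectKey` below is a hypothetical refactor of that one line with the same behavior.

```javascript
// Hypothetical refactor of the question's wasabiObjKey expression.
// The optional prefix ("folder1/folder2") is part of the key itself,
// because S3-style object stores have no real directories.
function buildObjectKey({ bucket_prefix, uid, variation }) {
  const prefix = bucket_prefix ? `${bucket_prefix}/` : '';
  // Dashes become underscores, then lowercase, exactly as in the original.
  const base = uid.replace(/-/g, '_').toLowerCase();
  // Dots are stripped from the variation before it is appended.
  const suffix = variation ? `_${variation.replace(/\./g, '').toLowerCase()}` : '';
  return `${prefix}${base}${suffix}.zip`;
}

console.log(buildObjectKey({ bucket_prefix: 'folder1/folder2', uid: 'AB-CD', variation: '1.2' }));
```

Inside the callable function this would become `const wasabiObjKey = buildObjectKey(data);`.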
There is a credential provider chain that supplies credentials to the SDK's API calls:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/setting-credentials-node.html
Loaded from AWS Identity and Access Management (IAM) roles for Amazon EC2
Loaded from the shared credentials file (~/.aws/credentials)
Loaded from environment variables
Loaded from a JSON file on disk
Other credential-provider classes provided by the JavaScript SDK
You can embed the credentials in your source code, but it's not the preferred way:
new S3Client(configuration: S3ClientConfig): S3Client
where S3ClientConfig contains a credentials property:
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/modules/credentials.html
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
const client = new S3Client({
  region: 'ap-southeast-1',
  credentials: {
    accessKeyId: '',
    secretAccessKey: ''
  }
});
(async () => {
  const response = await client.send(new GetObjectCommand({ Bucket: "BucketNameHere", Key: "ObjectNameHere" }));
  console.log(response);
})();
Sample response (excerpt):
'$metadata': {
  httpStatusCode: 200,
  requestId: undefined,
  extendedRequestId: '7kwrFkEp3lEnLU+OtxjrgdmS6gQmvPdbnqqR7I8P/rdFrUPBkdKYPYykWivuHPXCF1IHgjCIbe8=',
  cfId: undefined,
  attempts: 1,
  totalRetryDelay: 0
},
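Rather than hard-coding keys as in the snippet above, the question's Firebase setup already stores them in `functions.config()`. A minimal sketch, assuming the config keys used in the question (`s3.access_key`, `s3.secret`, `s3_bucket.region`): the hypothetical `s3ClientConfigFrom` builds the `{ region, credentials }` object that `new S3Client(...)` accepts and fails loudly if a value is missing.

```javascript
// Hypothetical helper: map the question's functions.config() layout
// ({ s3: { access_key, secret }, s3_bucket: { region } }) to the
// { region, credentials } shape accepted by `new S3Client(...)` in SDK v3.
function s3ClientConfigFrom(cfg) {
  const s3 = (cfg && cfg.s3) || {};
  const bucket = (cfg && cfg.s3_bucket) || {};
  if (!s3.access_key || !s3.secret) {
    throw new Error('Missing s3.access_key or s3.secret in function config');
  }
  if (!bucket.region) {
    throw new Error('Missing s3_bucket.region in function config');
  }
  return {
    region: bucket.region,
    credentials: {
      accessKeyId: s3.access_key,
      secretAccessKey: s3.secret,
    },
  };
}
```

Usage would be `const s3 = new S3Client(s3ClientConfigFrom(functions.config()));`. For Wasabi the client also has to be pointed at Wasabi's service rather than AWS, presumably via the `endpoint` option; check the S3ClientConfig reference and Wasabi's documentation for the exact value.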
Here's a simple approach I use (in Deno) for testing, in case you don't want to use the signed-URL approach and would rather let the SDK do the heavy lifting for you:
import { config as env } from 'https://deno.land/x/dotenv/mod.ts' // https://github.com/pietvanzoen/deno-dotenv
import { S3Client, ListObjectsV2Command } from 'https://cdn.skypack.dev/@aws-sdk/client-s3' // https://github.com/aws/aws-sdk-js-v3
const {AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY} = env()
// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/modules/credentials.html
const credentials = {
  accessKeyId: AWS_ACCESS_KEY_ID,
  secretAccessKey: AWS_SECRET_ACCESS_KEY,
}
// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/interfaces/s3clientconfig.html
const config = {
  region: 'ap-southeast-1',
  credentials,
}
// https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/s3client.html
const client = new S3Client(config)
export async function list() {
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/interfaces/listobjectsv2commandinput.html
  const input = {
    Bucket: 'BucketNameHere'
  }
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/command.html
  const cmd = new ListObjectsV2Command(input)
  // https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/listobjectsv2command.html
  return await client.send(cmd)
}
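`ListObjectsV2` returns at most 1,000 keys per call, so listing a larger bucket means following `NextContinuationToken` while `IsTruncated` is true. A sketch of that loop, written against an injected `listPage` function (a hypothetical parameter, used so the logic can be exercised without AWS); with the answer's Deno client it would be `input => client.send(new ListObjectsV2Command(input))`.

```javascript
// Sketch: collect every key in a bucket by following ListObjectsV2 pagination.
// `listPage` takes a ListObjectsV2 input object and resolves to its output.
async function listAllKeys(listPage, bucket) {
  const keys = [];
  let token; // undefined on the first request
  do {
    const page = await listPage({ Bucket: bucket, ContinuationToken: token });
    // Contents is absent when a page (or the bucket) is empty.
    for (const obj of page.Contents || []) keys.push(obj.Key);
    // Only continue while the service reports more results.
    token = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (token);
  return keys;
}
```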
I'm trying to use imagemagick in my Google Cloud function. The function is triggered by uploading a file to a Google Cloud Storage bucket. I have grander plans, but trying to get there one step at a time. Starting with identify.
// imagemagick_setup
const gm = require('gm').subClass({imageMagick: true});
const path = require('path');
const {Storage} = require('@google-cloud/storage');
const storage = new Storage();
exports.processListingImage = (event, context) => {
  const object = event.data || event; // Node 6: event.data === Node 8+: event
  const filename = object.name;
  console.log("Filename: ", filename);
  const fullFileObject = storage.bucket(object.bucket).file(object.name);
  console.log("Calling resize function");
  let resizePromise = resizeImage( fullFileObject );
  <more stuff>
};
function resizeImage( file, sizes ) {
  const tempLocalPath = `/tmp/${path.parse(file.name).base}`;
  return file
    .download({destination: tempLocalPath})
    .catch(err => {
      console.error('Failed to download file.', err);
      return Promise.reject(err);
    })
    .then( () => {
      // file now downloaded, get its metadata
      return new Promise((resolve, reject) => {
        gm( tempLocalPath )
          .identify( (err, result) => {
            if (err)
            {
              console.log("Error reading metadata: ", err);
            }
            else
            {
              console.log("Well, there seems to be metadata: ", result);
            }
          });
      });
    });
} // end resizeImage()
The local file path is: "/tmp/andy-test.raw". But when the identify function runs, I get an error:
identify-im6.q16: unable to open image `/tmp/magick-12MgKrSna0qp9U.ppm': No such file or directory # error/blob.c/OpenBlob/2701.
Why is identify looking for a different file than the one I (believe) I told it to look for? Eventually I am going to resize the image and write it back out to Cloud Storage, but I wanted to get identify to run first.
Mark had the right answer - if I upload a jpg file, it works. Onto the next challenge.
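Separately from the file-format issue, the `Promise` created inside `resizeImage` never calls `resolve` or `reject`, so `resizePromise` never settles. A minimal sketch of the usual pattern: a small hypothetical `callbackToPromise` wrapper that settles a promise from a Node-style `(err, result)` callback such as the one gm's `identify` takes.

```javascript
// Hypothetical helper: wrap a Node-style (err, result) callback API in a
// Promise that rejects on error and resolves with the result otherwise.
function callbackToPromise(invoke) {
  return new Promise((resolve, reject) => {
    invoke((err, result) => (err ? reject(err) : resolve(result)));
  });
}
```

`resizeImage` could then end with `.then(() => callbackToPromise(cb => gm(tempLocalPath).identify(cb)))`, and returning that promise from `processListingImage` lets the Cloud Functions runtime wait for the work to finish instead of ending the invocation early.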
My function is triggered by a Cloud Storage event and loads files into a BigQuery table. My issue is that we received some .zip files with the same names, and the function is attempting to load these files as well, which is causing problems with the table. I need the code to process only .csv files. Below is the code I have so far:
exports.ToBigQuery = (event, callback) => {
  const file = event.data;
  const context = event.context;
  const BigQuery = require('@google-cloud/bigquery');
  const Storage = require('@google-cloud/storage');
  const projectId = "gas-ddr";
  const datasetId = "gas_ddr";
  const bucketName = file.bucket;
  const filename = file.name;
  const dashOffset = filename.indexOf('-');
  const tableId = filename.substring(0, dashOffset);
  console.log(`Load ${filename} into ${tableId}.`);
  // Instantiates clients
  const bigquery = new BigQuery({
    projectId: projectId,
  });
  const storage = Storage({
    projectId: projectId,
  });
  const metadata = {
    allowJaggedRows: true,
    skipLeadingRows: 1
  };
  let job;
  // Loads data from a Google Cloud Storage file into the table
  bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename),metadata)
    .then(results => {
      job = results[0];
      console.log(`Job ${job.id} started.`);
      // Wait for the job to finish
      return job;
    })
    .then(metadata => {
      // Check the job's status for errors
      const errors = metadata.status.errors;
      if (errors && errors.length > 0) {
        throw errors;
      }
    })
    .then(() => {
      console.log(`Job ${job.id} completed.`);
    })
    .catch(err => {
      console.error('ERROR:', err);
    });
  callback();
};
This is purely a JavaScript question. You can extract the extension part of the filename and process files accordingly:
function getExtension(filename) {
  const parts = filename.split('.');
  return parts[parts.length - 1];
}
if (getExtension(filename) === "csv") {
  // Loads data from a Google Cloud Storage file into the table
  bigquery
    .dataset(datasetId)
    ...
}
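One caveat with `split('.')`: a name without a dot is returned whole, and an uppercase `.CSV` does not match `"csv"`. A slightly more defensive sketch using Node's `path.extname`, which returns an empty string when there is no extension; `isCsv` is a hypothetical name.

```javascript
const path = require('path');

// Returns true only when the last extension is ".csv" (case-insensitive).
// path.extname("csv") is "", so bare names are rejected, and
// "archive.csv.zip" has extension ".zip", so it is rejected too.
function isCsv(filename) {
  return path.extname(filename).toLowerCase() === '.csv';
}

console.log(isCsv('GAS-2018.csv'), isCsv('GAS-2018.zip'));
```

In the function it could guard the load with `if (!isCsv(filename)) { console.log('Skipping ' + filename); callback(); return; }`, so skipped files still signal completion.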
I have a Cloud Function that takes a .csv file landing in Cloud Storage and loads it into a BigQuery table. The issue is that it appends to the table, and I need it to overwrite. I found a way to do this on the command line with --replace, but I'm not sure how to do it in the job metadata (JSON) in a Cloud Function. Below is my current code:
exports.ToBigQuery_Stage = (event, callback) => {
  const file = event.data;
  const context = event.context;
  const BigQuery = require('@google-cloud/bigquery');
  const Storage = require('@google-cloud/storage');
  const projectId = "gas-ddr";
  const datasetId = "gas_ddr_qc_stage";
  const bucketName = file.bucket;
  const filename = file.name;
  // Do not use the ftp_files bucket, to ensure that the bucket does not get crowded.
  // Change bucket to gas_ddr_files_staging
  // Set the table name (tableId) to the full file name including date;
  // this will give each table a new distinct name, and we can keep a record of all of the files received.
  // This may not be the best way to do this... at some point we will need to archive and delete prior records.
  const dashOffset = filename.indexOf('-');
  const tableId = filename.substring(0, dashOffset) + "_STAGE";
  console.log(`Load ${filename} into ${tableId}.`);
  // Instantiates clients
  const bigquery = new BigQuery({
    projectId: projectId,
  });
  const storage = Storage({
    projectId: projectId,
  });
  const metadata = {
    allowJaggedRows: true,
    skipLeadingRows: 1
  };
  let job;
  // Loads data from a Google Cloud Storage file into the table
  bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename),metadata)
    .then(results => {
      job = results[0];
      console.log(`Job ${job.id} started.`);
      // Wait for the job to finish
      return job;
    })
    .then(metadata => {
      // Check the job's status for errors
      const errors = metadata.status.errors;
      if (errors && errors.length > 0) {
        throw errors;
      }
    })
    .then(() => {
      console.log(`Job ${job.id} completed.`);
    })
    .catch(err => {
      console.error('ERROR:', err);
    });
  callback();
};
You can add writeDisposition: 'WRITE_TRUNCATE' to the metadata:
const metadata = {
  allowJaggedRows: true,
  skipLeadingRows: 1,
  writeDisposition: 'WRITE_TRUNCATE'
};
You can find more about writeDisposition in the BigQuery load-job documentation.
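For reference, `writeDisposition` accepts `WRITE_TRUNCATE`, `WRITE_APPEND` and `WRITE_EMPTY`; load jobs default to `WRITE_APPEND`, which is why the table was being appended to, and `bq load --replace` corresponds to `WRITE_TRUNCATE`. A small sketch that builds the question's metadata with the choice made explicit; `buildLoadMetadata` and its `replace` option are hypothetical names.

```javascript
// Hypothetical helper: the question's load metadata with an explicit
// writeDisposition instead of relying on the default.
function buildLoadMetadata({ replace = false } = {}) {
  return {
    allowJaggedRows: true,
    skipLeadingRows: 1,
    // WRITE_TRUNCATE overwrites the table (like `bq load --replace`);
    // WRITE_APPEND is BigQuery's default for load jobs.
    writeDisposition: replace ? 'WRITE_TRUNCATE' : 'WRITE_APPEND',
  };
}

console.log(buildLoadMetadata({ replace: true }).writeDisposition);
```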