I have a few Lambda functions that together perform a multipart upload to an Amazon S3 bucket: one creates the multipart upload, another uploads each part, and the last one completes the upload.
The first two seem to work fine (they respond with statusCode 200), but the last one fails. In CloudWatch, I can see an error saying 'Your proposed upload is smaller than the minimum allowed size'.
This is not true of the file as a whole, since I'm uploading files bigger than the 5 MB minimum specified in the docs. However, I think the issue applies to every single part upload.
Why? Because each part only has 2 MB of data. According to the docs, every part except the last needs to be at least 5 MB. However, when I try to upload parts bigger than 2 MB, I get a CORS error, most probably because I have exceeded the 6 MB Lambda payload limit.
Can anyone help me with this? Below is my client-side code, in case you can spot any error in it.
setLoading(true);
const file = files[0];
const size = 2000000;
const extension = file.name.substring(file.name.lastIndexOf('.'));
try {
const multiStartResponse = await startMultiPartUpload({ fileType: extension });
console.log(multiStartResponse);
let part = 1;
let parts = [];
/* eslint-disable no-await-in-loop */
for (let start = 0; start < file.size; start += size) {
// Blob.slice treats the end index as exclusive, so no "+ 1" is needed
const chunk = file.slice(start, start + size);
const textChunk = await chunk.text();
const partResponse = await uploadPart({
file: textChunk,
fileKey: multiStartResponse.data.Key,
partNumber: part,
uploadId: multiStartResponse.data.UploadId,
});
console.log(partResponse);
parts.push({ ETag: partResponse.data.ETag, PartNumber: part });
part++;
}
/* eslint-enable no-await-in-loop */
const completeResponse = await completeMultiPartUpload({
fileKey: multiStartResponse.data.Key,
uploadId: multiStartResponse.data.UploadId,
parts,
});
console.log(completeResponse);
} catch (e) {
console.log(e);
} finally {
setLoading(false);
}
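For reference, here is a minimal sketch of how the part boundaries could be computed so that every part except the last meets the 5 MB minimum. `partRanges` and `MIN_PART_SIZE` are hypothetical names, not part of the code above, and the sketch only computes byte ranges; it does not upload anything.

```javascript
// Hypothetical helper: computes the byte range of each part so that every
// part except the last is at least 5 MiB.
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for all parts but the last

function partRanges(fileSize, partSize = MIN_PART_SIZE) {
  if (partSize < MIN_PART_SIZE) {
    throw new RangeError('partSize must be at least 5 MiB');
  }
  const ranges = [];
  for (let start = 0, partNumber = 1; start < fileSize; start += partSize, partNumber++) {
    // `end` is exclusive, matching Blob.slice(start, end), so ranges never overlap.
    ranges.push({ partNumber, start, end: Math.min(start + partSize, fileSize) });
  }
  return ranges;
}

// Usage with a browser File/Blob:
// for (const { partNumber, start, end } of partRanges(file.size)) {
//   const chunk = file.slice(start, end);
//   // upload `chunk` as part `partNumber`
// }
```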
It seems that uploading parts through Lambda is simply not workable given the payload limit, so we need to use a different approach.
Now, our startMultiPartUpload Lambda returns not only an upload ID but also a set of signed URLs, generated with the aws-sdk S3 class using the getSignedUrlPromise method with 'uploadPart' as the operation, as shown below:
const getSignedPartURL = (bucket, fileKey, uploadId, partNumber) =>
  s3.getSignedUrlPromise('uploadPart', {
    Bucket: bucket,
    Key: fileKey,
    UploadId: uploadId,
    PartNumber: partNumber,
  });
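As a rough sketch of how the Lambda might produce one URL per part, something like the following could work. It assumes `s3` is an aws-sdk v2 S3 client; `getSignedPartURLs` and the `expiresSeconds` option are hypothetical names. In aws-sdk v2 the `Expires` parameter is the URL lifetime in seconds and defaults to 900 (15 minutes), which may be too short for long uploads.

```javascript
// Sketch: build one signed URL per part. `s3` is assumed to be an aws-sdk v2
// S3 client; `expiresSeconds` is a hypothetical option name.
async function getSignedPartURLs(s3, { bucket, fileKey, uploadId, partCount, expiresSeconds = 3600 }) {
  const requests = [];
  for (let partNumber = 1; partNumber <= partCount; partNumber++) {
    requests.push(
      s3.getSignedUrlPromise('uploadPart', {
        Bucket: bucket,
        Key: fileKey,
        UploadId: uploadId,
        PartNumber: partNumber, // S3 part numbers start at 1
        Expires: expiresSeconds, // URL lifetime in seconds
      })
    );
  }
  return Promise.all(requests); // index i holds the URL for part i + 1
}
```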
Also, since uploading a part this way does not seem to return an ETag (or maybe it does, but I just couldn't read it), we need to call the listParts method on the S3 class once the parts are uploaded in order to get those ETags. My React code is below:
const uploadPart = async (url, data) => {
try {
// return await uploadPartToS3(url, data);
return fetch(url, {
method: 'PUT',
body: data,
}).then((e) => e.body);
} catch (e) {
console.error(e);
throw new Error('Unknown error');
}
};
// If file is bigger than 50Mb then perform a multi part upload
const uploadMultiPart = async ({ name, size, originFileObj },
updateUploadingMedia) => {
// chunk size determines each part size. This needs to be > 5Mb
const chunkSize = 60000000;
let chunkStart = 0;
const extension = name.substring(name.lastIndexOf('.'));
const partsQuan = Math.ceil(size / chunkSize);
// Start multi part upload. This returns both uploadId and signed urls for each part.
const startResponse = await startMultiPartUpload({
fileType: extension,
chunksQuan: partsQuan,
});
console.log('start response: ', startResponse);
const {
signedURLs,
startUploadResponse: { Key, UploadId },
} = startResponse.data;
try {
let promises = [];
/* eslint-disable no-await-in-loop */
for (let i = 0; i < partsQuan; i++) {
// Split file into parts and upload each one to its signed url
const chunk = await originFileObj.slice(chunkStart, chunkStart + chunkSize).arrayBuffer();
chunkStart += chunkSize;
promises.push(uploadPart(signedURLs[i], chunk));
if (promises.length === 5) {
// Wait for the current batch of 5 parts before reading more of the file
const uploadResponse = await Promise.all(promises);
console.log('UPLOAD PART RESPONSE', uploadResponse);
promises = [];
}
}
/* eslint-enable no-await-in-loop */
// wait until every part is uploaded
await allProgress({ promises, name }, (media) => {
updateUploadingMedia(media);
});
// Get parts list to build complete request (each upload does not retrieve ETag)
const partsList = await listParts({
fileKey: Key,
uploadId: UploadId,
});
// build parts object for complete upload
const completeParts = partsList.data.Parts.map(({ PartNumber, ETag }) => ({
ETag,
PartNumber,
}));
// Complete multi part upload (awaited so that a failure reaches the catch block)
await completeMultiPartUpload({
fileKey: Key,
uploadId: UploadId,
parts: completeParts,
});
return Key;
} catch (e) {
console.error('ERROR', e);
const abortResponse = await abortUpload({
fileKey: Key,
uploadId: UploadId,
});
console.error(abortResponse);
}
};
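On the ETag point: S3 does send an ETag header in the response to each part PUT, but a browser only lets script read it if the bucket's CORS rules list ETag under ExposeHeaders. Assuming that is configured, a sketch like the one below could collect the ETags directly and skip listParts. `uploadPartWithETag` and the injectable `fetchImpl` parameter are hypothetical and only there to keep the sketch testable.

```javascript
// Sketch: upload one part and read its ETag from the response headers.
// `fetchImpl` defaults to the global fetch; it is injectable for testing only.
async function uploadPartWithETag(url, data, partNumber, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, { method: 'PUT', body: data });
  if (!response.ok) {
    throw new Error(`Part ${partNumber} failed with status ${response.status}`);
  }
  // A null value here usually means the bucket CORS rules do not expose ETag.
  const etag = response.headers.get('ETag');
  if (!etag) {
    throw new Error('ETag header not readable; check ExposeHeaders in the bucket CORS rules');
  }
  return { ETag: etag, PartNumber: partNumber };
}
```

The returned objects already have the shape that the parts array of the complete request expects.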
Sorry for the indentation; I corrected it line by line as best I could :).
Some considerations:
- We use 60 MB chunks because our backend took too long to generate all those signed URLs for big files.
- Also, this solution is meant to upload really big files; that's why we await every 5 parts.
However, we are still facing issues uploading huge files (about 35 GB): after uploading 100 to 120 parts, fetch requests suddenly start to fail and no more parts are uploaded. If someone knows what's going on, it would be amazing to hear. I'm publishing this as an answer because I think most people will find it very useful.
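I can't confirm the cause of those failures, but two things may be worth checking. First, signed URLs from aws-sdk v2 expire after 15 minutes unless Expires is set, so URLs generated up front may have expired by the time the later parts are sent. Second, transient network errors could be retried instead of aborting the whole upload. A minimal retry sketch follows; `withRetry`, `retries` and `baseDelayMs` are hypothetical names.

```javascript
// Sketch: retry an async operation with exponential backoff.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(operation, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Delay doubles after each failed attempt: base, 2x base, 4x base, ...
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // every attempt failed
}

// Usage: promises.push(withRetry(() => uploadPart(signedURLs[i], chunk)));
```

Note that fetch only rejects on network errors; it resolves normally on HTTP statuses such as 403, so the wrapped operation would need to throw when response.ok is false for the retry to trigger. A retry will not help with an expired URL, which needs a fresh signature.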
Related
I upload files to Google Cloud Storage using "@ffmpeg-installer/ffmpeg" and @google-cloud/storage in my Node.js app.
Step 1. The file is written to the file system in child processes, one process for each resolution (six in total).
Step 2. Encryption (converting to a stream).
Step 3. Upload to Google Cloud Storage.
I use the "Upload a directory to a bucket" sample to send the video from the client to the Google Cloud Storage bucket.
This works fine only with small videos.
For example, when I upload a one-hour video, it is split into chunks and I end up with more than three thousand files. The problem occurs when there are more than 1500 files.
So I am actually uploading a folder with a large number of files, but not all of these files reach the cloud.
Maybe someone has had a similar problem and can help fix it.
const uploadFolder = async (bucketName, directoryPath, socketInstance) => {
try {
let dirCtr = 1;
let itemCtr = 0;
const fileList = [];
const onComplete = async () => {
const folderName = nanoid(46);
await Promise.all(
fileList.map(filePath => {
const fileName = path.relative(directoryPath, filePath);
const destination = `${ folderName }/${ fileName }`;
return storage
.bucket(bucketName)
.upload(filePath, { destination })
.then(
uploadResp => ({ fileName: destination, status: uploadResp[0] }),
err => ({ fileName: destination, response: err })
);
})
);
if (socketInstance) socketInstance.emit('uploadProgress', {
message: `Added files to Google bucket`,
last: false,
part: false
});
return folderName;
};
const getFiles = async directory => {
const items = await fs.readdir(directory);
dirCtr--;
itemCtr += items.length;
for(const item of items) {
const fullPath = path.join(directory, item);
const stat = await fs.stat(fullPath);
itemCtr--;
if (stat.isFile()) {
fileList.push(fullPath);
} else if (stat.isDirectory()) {
dirCtr++;
await getFiles(fullPath);
}
}
}
await getFiles(directoryPath);
return onComplete();
} catch (e) {
log.error(e.message);
throw new Error('Can\'t store folder.');
}
};
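The code above starts every upload at once through Promise.all, and it turns a failed upload into a plain object with a response field rather than a rejection, so failures stay silent unless the results are inspected. I can't confirm that this is the cause, but limiting the number of uploads in flight is a common thing to try with thousands of files. A minimal sketch, where `mapWithConcurrency` and the limit of 20 are assumptions:

```javascript
// Sketch: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function runLane() {
    while (nextIndex < items.length) {
      // Claiming an index is synchronous, so two lanes never take the same item.
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results; // same order as `items`
}

// Usage inside onComplete (names taken from the code above):
// const results = await mapWithConcurrency(fileList, 20, (filePath) => {
//   const destination = `${folderName}/${path.relative(directoryPath, filePath)}`;
//   return storage.bucket(bucketName).upload(filePath, { destination });
// });
```

Because the worker here is allowed to reject, the first failed upload surfaces as an error instead of being dropped.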
I use React Native, and the backend is built with Prisma and GraphQL (Apollo Server).
I don't store image data in Prisma but in AWS S3.
The problem is that I want to upload several images at once to my app, so I made the image column in Prisma an array ([]) rather than a String.
But with S3 I can upload only one image at a time, so even with the image column as an array, I can't upload several images at once.
When I searched, people suggested 3 options for uploading multiple files to S3:
- multi-thread
- multi-processing
- zip upload (amazon-lambda)
In my case (uploading files as an array), which option is most advisable?
And can you show me how to do that?
My backend code:
export const uploadToS3 = async (file, userId, folderName) => {
const { filename, createReadStream } = await file;
const readStream = createReadStream();
const objectName = `${folderName}/${userId}-${Date.now()}-${filename}`;
const { Location } = await new AWS.S3()
.upload({
Bucket: "chungchunonuploads",
Key: objectName,
ACL: "public-read",
Body: readStream,
})
.promise();
return Location;
};
We need to resolve multiple file upload promises with Promise.all. Let us refactor our code and split it into 2 functions.
// Assume that we have list of all files to upload
const filesToUpload = [file1, file2, file3, fileN];
export const uploadSingleFileToS3 = async (file, userId, folderName) => {
const { filename, createReadStream } = await file;
const readStream = createReadStream();
const objectName = `${folderName}/${userId}-${Date.now()}-${filename}`;
const response = await new AWS.S3()
.upload({
Bucket: "chungchunonuploads",
Key: objectName,
ACL: "public-read",
Body: readStream,
})
.promise(); // upload() returns a ManagedUpload; .promise() resolves with the result
return response;
};
const uploadMultipleFilesToS3 = async (filesToUpload, userId, folderName) => {
const uploadPromises = filesToUpload.map((file) => {
return uploadSingleFileToS3(file, userId, folderName);
});
// Array containing all uploaded files data
const uploadResult = await Promise.all(uploadPromises);
// Add logic here to update the database with Prisma ORM
return uploadResult;
};
// Call uploadMultipleFilesToS3 with all required parameters
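One design note: Promise.all rejects as soon as any single upload fails. If partial success matters, Promise.allSettled can separate the files that made it from the ones that did not. A minimal sketch, where `uploadAllSettled` is a hypothetical name and `uploadOne` stands in for a call to uploadSingleFileToS3:

```javascript
// Sketch: upload every file and separate successes from failures.
// `uploadOne` is passed in so the sketch does not depend on the AWS SDK.
async function uploadAllSettled(files, uploadOne) {
  const settled = await Promise.allSettled(files.map((file) => uploadOne(file)));
  const uploaded = [];
  const failed = [];
  settled.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      uploaded.push(result.value);
    } else {
      failed.push({ file: files[index], reason: result.reason });
    }
  });
  return { uploaded, failed };
}

// Usage:
// const { uploaded, failed } = await uploadAllSettled(filesToUpload, (file) =>
//   uploadSingleFileToS3(file, userId, folderName));
// const locations = uploaded.map((r) => r.Location); // values for the array column
```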
I'm trying to create a REST API with ExpressJS that accepts an image and passes it to another service (with a POST request), which is in charge of performing some operations (resize, etc.) and storing the result in AWS S3. I know the same thing can easily be done directly with a Lambda function, but I have a K8s cluster and I want to make use of it.
All components already work, with the exception of the service that forwards the image to the second service.
The idea I found on the internet is to use a stream, but I get the exception Error: Expected a stream at Object.getStream [as default]
How can I solve that? Is this the right practice, or is there a better solution to achieve the same result?
const headers = req.headers;
const files: any = req.files
const filename = files[0].originalname;
const buffer = await getStream(files[0].stream)
const formFile = new FormData();
formFile.append('image', buffer, filename);
headers['Content-Type'] = 'multipart/form-data';
axios.post("http://localhost:1401/content/image/test/upload/", formFile, {
headers: headers,
})
.catch((error) => {
const { status, data } = error.response;
res.status(status).send(data);
})
I've found a solution, which I'm posting here for anyone who runs into the same problem.
Install form-data on node:
yarn add form-data
Then in your controller:
const headers = req.headers;
const files: any = req.files
const formFile = new FormData();
files.forEach((file: any) => {
const filename = file.originalname;
const buffer = file.buffer
formFile.append('image', buffer, filename);
})
// set the correct header otherwise it won't work
headers["content-type"] = `multipart/form-data; boundary=${formFile.getBoundary()}`
// now you can send the image to the second service
axios.post("http://localhost:1401/content/image/test/upload/", formFile, {
headers: headers,
})
.then((r : any) => {
// sendStatus already sends and ends the response
res.sendStatus(r.status)
})
.catch((error) => {
const { status, data } = error.response;
res.status(status).send(data);
})
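One caveat with passing req.headers through unchanged: headers such as content-length and host describe the original request, not the re-encoded form, which may confuse the second service. A possible refinement is to drop those and let the form supply its own content type via the form-data package's getHeaders() method. In the sketch below, `forwardableHeaders` and the exact list of dropped headers are assumptions to adjust for your setup.

```javascript
// Sketch: copy incoming request headers but drop the ones that describe the
// original body or connection. The list below is an assumption.
const DROPPED_HEADERS = new Set([
  'host',
  'content-length',
  'content-type',
  'connection',
  'transfer-encoding',
]);

function forwardableHeaders(incoming, formHeaders) {
  const result = {};
  for (const [name, value] of Object.entries(incoming)) {
    if (!DROPPED_HEADERS.has(name.toLowerCase())) {
      result[name] = value;
    }
  }
  // formHeaders would come from formFile.getHeaders(), which includes the
  // multipart content type with the correct boundary.
  return { ...result, ...formHeaders };
}

// Usage:
// axios.post(url, formFile, {
//   headers: forwardableHeaders(req.headers, formFile.getHeaders()),
// });
```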
I want a user to be able to record and upload a .wav to an S3 bucket. Using this, I am able to achieve this, working correctly, as a .webm file. I am now trying to adapt this to use a MediaRecorder add-on that supports .wav files, so I am trying to integrate that code with RecordRTC, which adds .wav support to MediaRecorder.
This functionality essentially works, in that I end up with a .wav file in my Amazon S3 bucket, but the file is corrupted. I think the main place of concern is in the callback function for ondataavailable (a lot of the code afterwards probably can be ignored, but is there just in case). The line console.log(blob); in the following code shows that the blob type is audio/webm.
Any ideas how this can be fixed?
Edit: The resulting file is actually a .webm file, according to link, not a .wav after all! So why not? (However, my computer still shows it as a .wav in the File Inspector)
function isConstructor(obj) {
return !!obj.prototype && !!obj.prototype.constructor.name;
}
class AudioStream {
constructor(region, IdentityPoolId, audioStoreWithBucket) {
this.region = region; //s3 region
this.IdentityPoolId = IdentityPoolId; //identity pool id
this.bucketName = audioStoreWithBucket; //audio file store
this.s3; //variable definition for s3
this.dateinfo = new Date();
this.timestampData = this.dateinfo.getTime(); //timestamp used for file uniqueness
this.etag = []; // etag is used to save the parts of the single upload file
this.recordedChunks = []; //empty Array
this.booleanStop = false; // this is for final multipart complete
this.incr = 0; // multipart requires incremental part numbers so that all parts can be merged in ascending order
this.filename = this.timestampData.toString() + ".wav"; //unique filename
this.uploadId = ""; // upload id is required in multipart
this.recorder; //initializing recorder variable
this.audioConstraints = {
audio: true
};
}
audioStreamInitialize() {
var self = this;
AWS.config.region = self.region;
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
IdentityPoolId: self.IdentityPoolId,
});
self.s3 = new AWS.S3();
navigator.mediaDevices.getUserMedia(self.audioConstraints)
.then(function(stream) {
self.recorder = RecordRTC(stream, {
type: 'audio',
mimeType: 'audio/wav',
recorderType: MediaStreamRecorder,
disableLogs: true,
// get intervals based blobs
// value in milliseconds
timeSlice: 1800000,
// requires timeSlice above
// returns blob via callback function
ondataavailable: function(blob) {
console.log("ondata!")
var normalArr = [];
/*
Here we push the stream data to an array for future use.
*/
console.log(blob);
self.recordedChunks.push(blob);
normalArr.push(blob);
/*
here we create a blob from the stream data that we have received.
*/
var bigBlob = new Blob(normalArr, {
type: 'audio/wav'
});
/*
if the length of recordedChunks is 1 then it means its the 1st part of our data.
So we createMultipartUpload which will return an upload id.
Upload id is used to upload the other parts of the stream
else.
It Uploads a part in a multipart upload.
*/
if (self.recordedChunks.length == 1) {
self.startMultiUpload(bigBlob, self.filename)
} else {
/*
self.incr is basically a part number.
Part number of part being uploaded. This is a positive integer between 1 and 10,000.
*/
self.incr = self.incr + 1
self.continueMultiUpload(bigBlob, self.incr, self.uploadId, self.filename, self.bucketName);
}
} // end ondataavailable
});
/*
Called to handle the dataavailable event, which is periodically triggered each time timeslice milliseconds of media have been recorded
(or when the entire media has been recorded, if timeslice wasn't specified).
The event, of type BlobEvent, contains the recorded media in its data property.
You can then collect and act upon that recorded media data using this event handler.
*/
});
}
disableAllButton() {
//$("#formdata button[type=button]").attr("disabled", "disabled");
}
enableAllButton() {
//$("#formdata button[type=button]").removeAttr("disabled");
}
/*
The MediaRecorder method start(), which is part of the MediaStream Recording API,
begins recording media into one or more Blob objects.
You can record the entire duration of the media into a single Blob (or until you call requestData()),
or you can specify the number of milliseconds to record at a time.
Then, each time that amount of media has been recorded, an event will be delivered to let you act upon the recorded media,
while a new Blob is created to record the next slice of the media
*/
startRecording(id) {
var self = this;
//self.enableAllButton();
//$("#record_q1").attr("disabled", "disabled");
/*
1800000 is the number of milliseconds to record into each Blob.
If this parameter isn't included, the entire media duration is recorded into a single Blob unless the requestData()
method is called to obtain the Blob and trigger the creation of a new Blob into which the media continues to be recorded.
*/
/*
PLEASE NOTE: YOU CAN CHANGE THIS PARAM OF 1800000, but each resulting part should be greater than or equal to 5MB.
For multipart upload, the minimum size of every part except the last is 5MB
*/
//this.recorder.start(1800000);
this.recorder.startRecording();
Shiny.setInputValue("timecode", self.filename);
}
stopRecording(id) {
var self = this;
self.recorder.stopRecording();
}
pauseRecording(id) {
var self = this;
self.recorder.pauseRecording();
//$("#pause_q1").addClass("hide");
//$("#resume_q1").removeClass("hide");
}
resumeRecording(id) {
var self = this;
self.recorder.resumeRecording();
//$("#resume_q1").addClass("hide");
//$("#pause_q1").removeClass("hide");
}
/*
Initiates a multipart upload and returns an upload ID.
Upload id is used to upload the other parts of the stream
*/
startMultiUpload(blob, filename) {
var self = this;
var audioBlob = blob;
var params = {
Bucket: self.bucketName,
Key: filename,
ContentType: 'audio/wav',
ACL: 'private',
};
self.s3.createMultipartUpload(params, function(err, data) {
if (err) {
console.log(err, err.stack); // an error occurred
} else {
self.uploadId = data.UploadId
self.incr = 1;
self.continueMultiUpload(audioBlob, self.incr, self.uploadId, self.filename, self.bucketName);
}
});
}
continueMultiUpload(audioBlob, PartNumber, uploadId, key, bucketName) {
var self = this;
var params = {
Body: audioBlob,
Bucket: bucketName,
Key: key,
PartNumber: PartNumber,
UploadId: uploadId
};
console.log(params);
self.s3.uploadPart(params, function(err, data) {
if (err) {
console.log(err, err.stack)
} // an error occurred
else {
/*
Once the part of data is uploaded we get an Entity tag for the uploaded object(ETag).
which is used later when we complete our multipart upload.
*/
self.etag.push(data.ETag);
if (self.booleanStop === true) {
self.completeMultiUpload();
}
}
});
}
/*
Completes a multipart upload by assembling previously uploaded parts.
*/
completeMultiUpload() {
var self = this;
var outputTag = [];
/*
here we are constructing the Etag data in the required format.
*/
self.etag.forEach((data, index) => {
const obj = {
ETag: data,
PartNumber: ++index
};
outputTag.push(obj);
});
var params = {
Bucket: self.bucketName, // required
Key: self.filename, // required
UploadId: self.uploadId, // required
MultipartUpload: {
Parts: outputTag
}
};
self.s3.completeMultipartUpload(params, function(err, data) {
if (err) {
console.log(err, err.stack);
} // an error occurred
else {
// initialize variable back to normal
self.etag = [];
self.recordedChunks = [];
self.uploadId = "";
self.booleanStop = false;
//self.disableAllButton();
self.removeLoader();
console.log("sent!");
}
});
}
/*
set loader
*/
setLoader() {
//$("#kc-container").addClass("overlay");
//$(".preloader-wrapper.big.active.loader").removeClass("hide");
}
/*
remove loader
*/
removeLoader() {
// $("#kc-container").removeClass("overlay");
//$(".preloader-wrapper.big.active.loader").addClass("hide");
}
getFilename() {
return this.filename;
}
}
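A note on the corruption: wrapping the recorded data in a new Blob with type 'audio/wav' and naming the key .wav only changes labels; the bytes stay in whatever container the recorder produced. As far as I know, RecordRTC's StereoAudioRecorder is the recorder type that emits PCM WAV, while MediaStreamRecorder wraps the browser's native MediaRecorder, but please treat that as something to verify. To check what a blob really contains, its first bytes can be inspected, as in this sketch (`sniffAudioContainer` is a hypothetical helper):

```javascript
// Sketch: identify the container of recorded audio from its first bytes.
function sniffAudioContainer(bytes) {
  const ascii = (from, to) => String.fromCharCode(...bytes.slice(from, to));
  // WAV files start with "RIFF", a 4-byte size, then "WAVE".
  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') {
    return 'wav';
  }
  // WebM (Matroska) files start with the EBML magic number 1A 45 DF A3.
  if (
    bytes.length >= 4 &&
    bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3
  ) {
    return 'webm';
  }
  return 'unknown';
}

// Usage inside ondataavailable (browser):
// const head = new Uint8Array(await blob.slice(0, 12).arrayBuffer());
// console.log(sniffAudioContainer(head));
```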
Using S3, I can generate a pre-signed URL to upload a file, but I can't find a way to determine from another resource whether:
- the upload completed successfully or not.
- another upload is in progress.
I am asking in the context of Expo's FileSystem.uploadAsync which according to the documentation uploads in the background.
Here's the snippet of my upload code.
const startResult = await this.webClient.post("/v3/upload/start", {
category: meta.category,
contentType: meta.contentType,
contentMd5Hash: fileInfo.md5,
contentLength: fileInfo.size,
});
if (startResult.status !== 200) {
throw new Error(
`start result got ${startResult.status} expecting 200. ${startResult.data}`
);
}
const {
presignedUploadUrl,
contentMd5,
completionToken,
} = startResult.data;
const result = await FileSystem.uploadAsync(
presignedUploadUrl,
fileInfo.uri,
{
headers: {
"Content-MD5": contentMd5 as string,
"Content-Type": meta.contentType,
},
httpMethod: "PUT",
}
);
// I need to do this to indicate that the upload was completed
// All this tells the backend that it is done, but there's no other
// proof that it was done.
const completeResult = await this.webClient.post("/v3/upload/complete", {
completionToken,
presignedUploadUrl,
});
const { artifactId } = completeResult.data;
console.log(artifactId);
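For the first point, one option is for the /v3/upload/complete handler to ask S3 directly instead of trusting the client. Below is a minimal sketch, assuming the backend uses an aws-sdk v2 S3 client and knows the bucket and key behind the pre-signed URL. `verifyUpload` and its parameter names are hypothetical. The MD5 comparison only holds for a single PUT that is not encrypted with SSE-KMS or SSE-C, where the ETag is the hex MD5 of the object.

```javascript
// Sketch: backend-side check that the object actually exists in S3.
// `s3` is assumed to be an aws-sdk v2 S3 client.
async function verifyUpload(s3, { bucket, key, expectedSize, expectedMd5Base64 }) {
  let head;
  try {
    head = await s3.headObject({ Bucket: bucket, Key: key }).promise();
  } catch (error) {
    if (error.statusCode === 404) return { exists: false };
    throw error; // permissions or network problems are not "missing object"
  }
  const sizeMatches = head.ContentLength === expectedSize;
  // The ETag arrives wrapped in double quotes; Content-MD5 is base64.
  const etagHex = head.ETag.replace(/"/g, '');
  const md5Hex = Buffer.from(expectedMd5Base64, 'base64').toString('hex');
  return { exists: true, sizeMatches, md5Matches: etagHex === md5Hex };
}
```

S3 event notifications for object creation are another way to learn that an upload finished without involving the client. I'm not aware of a way to observe a single PUT that is still in progress.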