Given the rather steep cost of Cloudinary as a multimedia hosting service (images and videos), our client decided to switch to AWS S3 for file hosting.
The problem is that there are already a lot of files (thousands of images and videos) in the app, so merely switching the provider is not enough: we also need to migrate all the files and make it look like nothing changed for the end user.
This topic is partially covered on the Strapi forum: https://forum.strapi.io/t/switch-from-cloudinary-to-s3/15285, but there is no solution posted there beyond a vaguely described procedure.
Is there a way to reliably perform the migration without losing any data and without the need to change anything on the client side (apps that communicate with Strapi via the REST/GraphQL API)?
There are three steps to perform the migration:
switch provider from Cloudinary to S3 in Strapi
migrate files from Cloudinary to S3
update the database so that Strapi's file links point to S3 instead of Cloudinary
Switching provider
This is the only step that is actually well documented, so I will be brief here.
First, you need to uninstall the Cloudinary Strapi plugin by running yarn remove @strapi/provider-upload-cloudinary and install the S3 provider by running yarn add @strapi/provider-upload-aws-s3.
After you do that, you need to create your AWS infrastructure (an S3 bucket and an IAM user with sufficient permissions). Please follow the official Strapi S3 provider documentation https://market.strapi.io/providers/#strapi-provider-upload-aws-s3 and this guide https://dev.to/kevinadhiguna/how-to-setup-amazon-s3-upload-provider-in-your-strapi-app-1opc for the steps to follow.
Check that you've done everything correctly by logging in to your Strapi Admin Panel and opening the Media Library. If everything went well, all images should be missing (you will see all the metadata like sizes and extensions, but not the actual images). Try to upload a new image by clicking the 'Add new assets' button. This image should upload successfully and also appear in your S3 bucket.
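If you prefer to double-check outside the Admin Panel, a small script can confirm that the test upload landed in the bucket. A minimal sketch using boto3; the bucket name is a placeholder and it assumes your AWS credentials are already configured locally:

import boto3

# List the newest objects in the bucket to confirm the test upload arrived.
# "my-strapi-bucket" is a placeholder - use the bucket you created for Strapi.
s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="my-strapi-bucket")
newest = sorted(response.get("Contents", []), key=lambda o: o["LastModified"], reverse=True)
for obj in newest[:10]:
    print(obj["LastModified"], obj["Key"], obj["Size"])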
Once everything works as described above, proceed to the actual data migration.
Files migration
The simplest (and most error-resistant) way to migrate files from Cloudinary to S3 is to download them locally and then use the AWS Console to upload them. If you have only hundreds (or low thousands) of files to migrate, you can actually use the Cloudinary Web UI to download them all (there is a limit of 1,000 files per download in the Cloudinary web app).
If this is not suitable for you, there is a CLI available that can easily download all files using your terminal:
pip3 install cloudinary-cli (installs the CLI)
cld config -url {CLOUDINARY_API_ENV} (the API environment variable can be found on the first page you see after logging into Cloudinary)
cld -C {CLOUD_NAME} sync --pull . / (This step begins the download. Depending on how many files you have, it might take a while. Run this command from the directory you want to download the files into. {CLOUD_NAME} can be found just above {CLOUDINARY_API_ENV} on the Cloudinary dashboard; you should also see it in your terminal after running the second command. For me, this command failed several times in the middle of the download, but you can just run it again and it will continue without any problem.)
After you download the files to your computer, simply use the S3 console's drag-and-drop feature to upload them into your S3 bucket.
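If drag and drop is impractical for your number of files, the upload can also be scripted. A minimal sketch using boto3; the local directory and bucket name are placeholders, and it assumes your AWS credentials are configured locally:

import boto3
from pathlib import Path

# Upload every file from the local download directory to the S3 bucket,
# keeping the relative path as the object key.
# "downloads" and "my-strapi-bucket" are placeholders.
s3 = boto3.client("s3")
root = Path("downloads")
for path in root.rglob("*"):
    if path.is_file():
        key = path.relative_to(root).as_posix()
        s3.upload_file(str(path), "my-strapi-bucket", key)
        print("uploaded", key)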
Update database
Strapi saves links to all files in the database. This means that even though you switched your provider to S3 and copied all the files, Strapi still doesn't know where to find them, as the links in the database point to the Cloudinary servers.
You need to update three columns in the Strapi database (this approach was tested on a Postgres database; minor changes might be needed for other databases). Look into the 'files' table; there should be url, formats and provider columns.
The provider column is trivial: just replace cloudinary with aws-s3.
The url and formats columns are harder, as you need to replace only part of the string. To be more precise, Cloudinary stores URLs in the {CLOUDINARY_LINK}/{VERSION}/{FILE} format, while S3 uses the {S3_BUCKET_LINK}/{FILE} format.
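To make the rewrite concrete, here is what it does to a single URL. An illustrative Python snippet only; the project name, bucket and region are made-up placeholders:

import re

# A Cloudinary URL in the {CLOUDINARY_LINK}/{VERSION}/{FILE} format.
old_url = "https://res.cloudinary.com/my-project/image/upload/v1234567890/photo_abc123.jpg"
# Drop everything up to and including the version segment, keep only the file name,
# and prepend the S3 bucket link: {S3_BUCKET_LINK}/{FILE}.
new_url = re.sub(
    r"https://res\.cloudinary\.com/my-project/(?:image|video)/upload/v\d{10}/([\w.]+)",
    r"https://my-bucket.s3.my-region/\1",
    old_url,
)
print(new_url)  # https://my-bucket.s3.my-region/photo_abc123.jpg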
My friend and colleague came up with the following SQL query to perform the update:
UPDATE files SET
formats = REGEXP_REPLACE(formats::TEXT, '\"https:\/\/res\.cloudinary\.com\/{CLOUDINARY_PROJECT}\/((image)|(video))\/upload\/v\d{10}\/([\w\.]+)\"', '"https://{BUCKET_NAME}.s3.{REGION}/\4"', 'g')::JSONB,
url = REGEXP_REPLACE(url, 'https:\/\/res\.cloudinary\.com\/{CLOUDINARY_PROJECT}\/((image)|(video))\/upload\/v\d{10}\/([\w\.]+)', 'https://{BUCKET_NAME}.s3.{REGION}/\4', 'g')
Just don't forget to replace {CLOUDINARY_PROJECT}, {BUCKET_NAME} and {REGION} with the correct strings (the easiest way to find those values is to open the files table in the database and compare one of the old URLs with the URL of the file you uploaded at the end of the 'Switching provider' step).
Also, before running the query, don't forget to back up your database! Even better, make a copy of the production database and run the query on the copy before you touch production.
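Once the query has run (on the copy first), it's worth checking that no Cloudinary links remain. A minimal sketch using psycopg2; the connection string is a placeholder for your own database:

import psycopg2

# Count rows in the files table that still reference Cloudinary - it should be 0.
conn = psycopg2.connect("postgresql://strapi:strapi@localhost:5432/strapi")
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT count(*) FROM files "
        "WHERE url LIKE '%res.cloudinary.com%' "
        "OR formats::TEXT LIKE '%res.cloudinary.com%'"
    )
    print("rows still pointing to Cloudinary:", cur.fetchone()[0])
conn.close()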
And that's all! Strapi now uploads files to the S3 bucket, and you still have access to all the data you previously had on Cloudinary.
We have a vendor that provides us data files (4 or 5 files, ~10 GB each) on a monthly basis. They provide these files on their FTP site, which we connect to using the username and password they give us.
We download the zip files, unzip them, extract the relevant files, gzip them, upload them to our S3 bucket, and from there we push the data to Redshift.
Currently I have a Python script running on an EC2 instance that does all of this, but I am sure there's a better "serverless" solution out there (ideally in the AWS environment) that can do this for me, since this doesn't seem to be a very unique use case.
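Roughly, the current script does something along these lines (a minimal sketch, assuming ftplib, zipfile, gzip and boto3; the host, credentials, bucket and file names are placeholders):

import ftplib
import gzip
import shutil
import zipfile
import boto3

# Placeholders - the real values come from the vendor and our AWS account.
FTP_HOST, FTP_USER, FTP_PASS = "ftp.vendor.example", "user", "password"
BUCKET = "my-data-bucket"
ZIP_NAME = "monthly_extract.zip"

# 1. Download the zip file from the vendor's FTP site.
with ftplib.FTP(FTP_HOST, FTP_USER, FTP_PASS) as ftp, open(ZIP_NAME, "wb") as f:
    ftp.retrbinary(f"RETR {ZIP_NAME}", f.write)

# 2. Unzip and keep only the relevant files (here: anything ending in .csv).
with zipfile.ZipFile(ZIP_NAME) as zf:
    relevant = [n for n in zf.namelist() if n.endswith(".csv")]
    zf.extractall(members=relevant)

# 3. Gzip each extracted file and upload it to S3, where Redshift COPY picks it up.
s3 = boto3.client("s3")
for name in relevant:
    gz_name = name + ".gz"
    with open(name, "rb") as src, gzip.open(gz_name, "wb") as dst:
        shutil.copyfileobj(src, dst)
    s3.upload_file(gz_name, BUCKET, gz_name)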
I am looking for recommendations / alternate solutions for processing these files.
Thank you.
Hi everyone, I want to download a file from IEEE DataPort through AWS S3.
The IEEE DataPort instructions say:
The dataset is available two ways: as a large all-in-one 0.3 TB tarball available for download via the web interface; and as a set of individual files that are available for browsing or downloading via Amazon S3.
The path of the file is:
https://ieee-dataport.org/open-access/tracking-neurons-moving-and-deforming-brain-dataset
The IEEE DataPort instructions are here:
https://ieee-dataport.org/help/accessing-ieee-dataport-dataset-files-through-amazon-web-services-aws
I have an IEEE account, and I am an IEEE DataPort subscriber.
I have installed the AWS CLI and configured it, but when I try to list the files in the DataPort bucket, it always shows:
How can I download this file from IEEE DataPort?
Thank you very much.
Is it possible to send files from a mobile application to EC2, where a Python script processes the file and the final product is saved into S3?
Deploy a simple webapp on EC2 to receive the data from your mobile app, run the Python script you mentioned on the data, then use the S3 API to save the result there. As for how you deploy that webapp, there are tons of ways/languages/technologies, which is a topic for another question.
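Something along these lines, for example (a minimal Flask sketch, assuming boto3; the bucket name is a placeholder and process_file stands in for your existing Python script):

import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "my-output-bucket"  # placeholder for your bucket

def process_file(data: bytes) -> bytes:
    # Stand-in for whatever processing your existing script does.
    return data

@app.route("/upload", methods=["POST"])
def upload():
    # The mobile app POSTs the file as multipart/form-data under the "file" field.
    incoming = request.files["file"]
    result = process_file(incoming.read())
    # Save the processed result to S3 under the original file name.
    s3.put_object(Bucket=BUCKET, Key=incoming.filename, Body=result)
    return {"status": "ok", "key": incoming.filename}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)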
I'm writing a program that saves images on Amazon S3 servers. My test suite is taking close to a minute to run due to having to run multiple uploads straight to S3 in order to test various features of the photos.
What is the issue here and how can I fix this?
You could:
1- Use multipart upload to speed it up, see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
2- Use a mock "no-op" class to fake the upload instantly (a sketch of this idea follows the list), see https://code.google.com/p/mockito/
3- Use a local emulation for testing, see https://github.com/jubos/fake-s3
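To illustrate option 2 in Python terms (a minimal sketch using unittest.mock; save_image and the boto3-style client are assumptions about how your code might call S3, not your actual API):

import unittest
from unittest import mock

# Hypothetical code under test: saves an image to S3 via a boto3-style client.
def save_image(s3_client, bucket, key, data):
    s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    return f"s3://{bucket}/{key}"

class SaveImageTest(unittest.TestCase):
    def test_save_image_does_not_hit_the_network(self):
        # A mock client turns put_object into a no-op, so the test runs instantly.
        fake_s3 = mock.Mock()
        url = save_image(fake_s3, "test-bucket", "photo.jpg", b"fake image bytes")
        fake_s3.put_object.assert_called_once_with(
            Bucket="test-bucket", Key="photo.jpg", Body=b"fake image bytes"
        )
        self.assertEqual(url, "s3://test-bucket/photo.jpg")

if __name__ == "__main__":
    unittest.main()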