AEM/Adobe Experience Manager: upload only some assets to AWS S3

My company is using AEM 6.5 and we are looking to get better performance out of our systems.
The idea we had is to upload only some assets (for example videos) to an S3 bucket and keep the other assets locally; we do not want to move the whole datastore to S3. I know I can switch the datastore to S3, but that would mean all the assets go to S3, and we don't want this.
Restriction: we want the video upload to happen seamlessly from within the AEM author instance; the editor should upload the video normally and, behind the scenes, it should be moved to S3.
I checked as much documentation as I could find, and there is no mention of this kind of partial asset upload to S3; you either go full S3 or not at all (we already tested a full S3 datastore, it works, but we do not want it).
So, my question is: did someone manage to do something like this?
Thanks

Have you looked into writing an Adobe Experience Manager workflow that reads a list of assets and uploads only those specified assets? You could control which assets get uploaded to an Amazon S3 bucket before running the AEM workflow.
You can create a custom workflow step as discussed in the link below. For your use case, the step would use the S3 Java API to push the binaries to your bucket. This is one way you can control which assets are uploaded to an Amazon S3 bucket from AEM.
https://helpx.adobe.com/experience-manager/using/message_service_gateway_api_64.html
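A rough sketch of the S3 side of such a step, assuming the AWS SDK for Java (v1) is available in your OSGi bundle; the class, bucket and key names here are made up for illustration, and the custom workflow step would resolve the asset from the workflow payload and pass its original rendition stream to something like this:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import java.io.InputStream;

// Hypothetical helper called from a custom workflow step; only the assets the
// workflow selects (e.g. videos) reach this code, everything else stays in the local datastore.
public class S3VideoUploader {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    public void upload(String bucket, String key, InputStream data, long length, String mimeType) {
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(length);
        meta.setContentType(mimeType);
        // Streams the rendition straight to S3 without writing a temporary file.
        s3.putObject(bucket, key, data, meta);
    }
}

Note that this only covers getting the binary into S3; how those videos are previewed and delivered afterwards is a separate problem (see the next answer).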

Technically, it is possible to push assets to S3 when they are uploaded to AEM, instead of storing them in the JCR. Nevertheless, this probably won't work the way you expect and would require a lot of refactoring of AEM itself to make it work properly.
Just because the binary is stored in S3 does not mean that AEM's internals are aware of that and can deal with it.
Take asset preview on the author instance, for example: that part of AEM expects the binary to be stored in the JCR. You would have to rewrite this whole part of AEM to go look for those assets in S3. That would be a massive headache; overlaying those parts of AEM is already deprecated, and so on. And this is just one example of hundreds that you would need to find a solution for.
It is not worth the effort.
You probably need to go "all-in" with S3 or leave it as is. I'm not sure what the reasoning is behind this drive to use S3 only "partially" for videos instead of all assets. Videos are probably already the largest assets you have, so it can't be cost. We run pure asset installations with an S3 datastore holding 20-60 TB of data, and that works totally fine.

Related

Strapi - how to switch and migrate from Cloudinary to S3 in production

Given the quite steep cost of Cloudinary as a multimedia hosting service (images and videos), our client decided that they want to switch to AWS S3 for file hosting.
The problem is that there are a lot of files (thousands of images and videos) already in the app, so merely switching the provider is not enough - we also need to migrate all the files and make it look like nothing changed for the end user.
This topic is somewhat covered on the Strapi forum: https://forum.strapi.io/t/switch-from-cloudinary-to-s3/15285, but no solution is posted there beyond a vaguely described procedure.
Is there a way to reliably perform the migration, without losing any data and without the need to change anything on client (apps that communicate with Strapi by REST/GraphQL API) side?
There are three steps to perform the migration:
switch provider from Cloudinary to S3 in Strapi
migrate files from Cloudinary to S3
perform database update to reroute Strapi from Cloudinary to S3
Switching provider
This is the only step that is actually well documented, so I will be brief here.
First, you need to uninstall the Cloudinary Strapi plugin by running yarn remove @strapi/provider-upload-cloudinary and install the S3 provider by running yarn add @strapi/provider-upload-aws-s3.
After you do that, you need to create your AWS infrastructure (an S3 bucket and an IAM user with sufficient permissions). Please follow the official Strapi S3 provider documentation https://market.strapi.io/providers/#strapi-provider-upload-aws-s3 and this guide https://dev.to/kevinadhiguna/how-to-setup-amazon-s3-upload-provider-in-your-strapi-app-1opc for the steps to follow.
Check that you've done everything correctly by logging in to your Strapi Admin Panel and opening the Media Library. If everything went well, all images should be missing (you will see all the metadata like sizes and extensions, but not the actual images). Try to upload a new image by clicking the 'Add new assets' button. This image should upload successfully and also appear in your S3 bucket.
After everything works as described above, proceed to actual data migration.
Files migration
The simplest (and most error-resistant) way to migrate files from Cloudinary to S3 is to download them locally and then use the AWS Console to upload them. If you only have hundreds (or low thousands) of files to migrate, you might actually use the Cloudinary Web UI to download them all (there is a limit of 1000 files per download in the Cloudinary Web App).
If this is not suitable for you, there is a CLI available that can easily download all files using your terminal:
pip3 install cloudinary-cli (installs the CLI)
cld config -url {CLOUDINARY_API_ENV} (the API environment value can be found on the first page you see when you log into Cloudinary)
cld -C {CLOUD_NAME} sync --pull . / (This step begins the download. Depending on how many files you have, it might take a while. Run this command from the directory you want to download the files into. {CLOUD_NAME} can be found just above {CLOUDINARY_API_ENV} on the Cloudinary dashboard; you should also see it after running the second command in your terminal. For me, this command failed several times in the middle of the download, but you can just run it again and it will continue without any problem.)
After you download the files to your computer, simply use the S3 drag-and-drop feature in the AWS Console to upload them into your S3 bucket.
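Alternatively, if you already have the AWS CLI configured, you can upload the whole directory from the terminal (assuming your bucket is called {BUCKET_NAME}):
aws s3 sync . s3://{BUCKET_NAME} (run it from the directory you downloaded the files into; it uploads everything recursively and can safely be re-run if it gets interrupted)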
Update database
Strapi saves links to all files in its database. This means that even though you switched your provider to S3 and copied all the files, Strapi still doesn't know where to find these files, as the links in the database point to the Cloudinary server.
You need to update three columns in the Strapi database (this approach was tested on a Postgres database; there might be minor changes when using other databases). Look into the 'files' table; there should be url, formats and provider columns.
The provider column is trivial: just replace cloudinary with aws-s3.
Url and formats are harder, as you need to replace only part of the string - to be more precise, Cloudinary stores URLs in {CLOUDINARY_LINK}/{VERSION}/{FILE} format, while S3 uses {S3_BUCKET_LINK}/{FILE} format.
My friend and colleague came up with the following SQL query to perform the update:
UPDATE files SET
formats = REGEXP_REPLACE(formats::TEXT, '\"https:\/\/res\.cloudinary\.com\/{CLOUDINARY_PROJECT}\/((image)|(video))\/upload\/v\d{10}\/([\w\.]+)\"', '"https://{BUCKET_NAME}.s3.{REGION}/\4"', 'g')::JSONB,
url = REGEXP_REPLACE(url, 'https:\/\/res\.cloudinary\.com\/{CLOUDINARY_PROJECT}\/((image)|(video))\/upload\/v\d{10}\/([\w\.]+)', 'https://{BUCKET_NAME}.s3.{REGION}/\4', 'g')
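For example, a stored URL like https://res.cloudinary.com/{CLOUDINARY_PROJECT}/image/upload/v1234567890/photo.png would be rewritten to https://{BUCKET_NAME}.s3.{REGION}/photo.png (the \4 back-reference keeps the file name, while the Cloudinary host, resource type and 10-digit version segment are dropped).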
Just don't forget to replace {CLOUDINARY_PROJECT}, {BUCKET_NAME} and {REGION} with the correct strings (the easiest way to find those values is to access the database, go to the files table and compare one of the old URLs with the URL of the file you uploaded at the end of the Switching provider step).
Also, before running the query, don't forget to back up your database! Even better, make a copy of the production database and run the query on that copy before you touch production.
And that's all! Strapi is now uploading files to S3 bucket and you also have access to all the data you previously had on Cloudinary.

Continuous Delivery issues with S3 and AWS CloudFront

I'm building out a series of content websites, and I've built a working CodePipeline that lets me push edits to HTML files on GitHub that are instantly reflected in the S3 bucket, and consequently on the live website.
I created a CloudFront distribution to get HTTPS for my website. The certificate and distribution work fine, and it serves the index.html from my S3 bucket, but the changes my GitHub pipeline pushes to the bucket show up in S3 and not in the CloudFront distribution.
From what I've read, the edge locations used by CloudFront don't refresh their caches very often, and when they do, they might not pick up the edited index.html file because it has the same name as the old version.
I don't want to manually rename my index.html file in S3 every time one of my writers needs to post a top 10 Tractor Brands article or implement an experimental, low-effort clickbait idea, so that's pretty much off the table.
My overall objective is to build something where teams can quickly add an article with a few images to the website that goes live in minutes, and I've been able to do it so far but not with HTTPS.
If any of you know a good way of instantly updating CloudFront distributions without changing file names, that would be great. Otherwise I'll probably have to start over, because I need my sites secured and the ability to update them instantly.
You people are awesome. Thanks a million for any help.
You need to invalidate files from the edge caches. It's a simple and quick process.
You can automate the process yourself in your pipeline, or you could potentially use a third-party tool such as aws-cloudfront-auto-invalidator.
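If you automate it in your pipeline, a single CLI call at the end of the deploy is usually enough (replace the distribution ID placeholder with your own):
aws cloudfront create-invalidation --distribution-id {DISTRIBUTION_ID} --paths "/*" (invalidating /* flushes every cached object; the first 1,000 invalidation paths per month are free, so for a small content site this normally costs nothing)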

Best way to publicly download a full folder with Amazon?

I'm building a launcher (in C#) that downloads a full game or app. The app can be very large (e.g. 5 GB) and I need to download it with the correct folder hierarchy, so the same launcher can check whether the user has the correct app or whether it needs to be repaired or updated.
I'm trying to do that with Amazon S3 and CloudFront, but it seems that I can only get individual objects and not the full folder of the app.
I have also stored the folder on an EC2 instance, and that works, but it seems EC2 is not designed for this, so downloads are extremely slow.
Is there any amazon service to do that?
Have you considered zipping the files first? It solves a lot of issues (e.g. folder structure and compression) and works great from S3 and CloudFront. It's a common solution for this use case.
You can do this in your application with the DownloadDirectory method of the TransferUtility class in the .NET SDK.
You can read more about the DownloadDirectory method here. By default I believe it only downloads objects in the root path, so don’t forget to do it recursively for sub-folders if necessary.

Issue with update of objects in AWS S3 bucket

While building an AWS website for one of my clients, I am having issues with the eventual consistency of an S3 bucket when updating an object.
In one of the features we have developed, the user can update his profile picture; we save the profile picture in the S3 bucket and store its public URL in the DB for later retrieval.
For new objects it works fine, but for updates it takes time (~5-10 minutes) for the change to show up. I have searched the internet and could not find a solution to this. Some people suggested using versioned paths like v1/filename and v2/filename and, on update, reading the data from the latest version directory, but this is too impractical.
Can anyone please suggest what to do?
Enable versioning on the bucket and use the versioning features to get the latest object, rather than altering the path; S3 will handle the number of copies. See
https://forums.aws.amazon.com/thread.jspa?threadID=263531 for a discussion of this feature and consistency.
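A minimal sketch of what that looks like with the AWS SDK for Java (the bucket and key names below are made up): enable versioning once, keep writing to the same key, and a plain GET with no versionId always resolves to the newest version.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class ProfilePictureStore {
    private static final String BUCKET = "my-profile-pictures"; // hypothetical bucket name
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // One-time setup: turn on versioning so overwrites keep history under the same key.
    public void enableVersioning() {
        s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
                BUCKET, new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));
    }

    // A GET without a versionId returns the latest version of the object.
    public S3Object latestPicture(String userId) {
        return s3.getObject(BUCKET, "users/" + userId + "/avatar.png");
    }
}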

Use AWS Elastic Transcoder and S3 to stream HLSv4 without making everything public?

I am trying to stream a video with HLSv4. I am using AWS Elastic Transcoder and S3 to convert the original file (e.g. *.avi or *.mp4) to HLSv4.
Transcoding is successful, producing several *.ts and *.aac files (with an accompanying *.m3u8 playlist for each media file) and a master *.m3u8 playlist linking to the media-file-specific playlists. I feel fairly comfortable everything is in order here.
Now the trouble: this is a membership site and I would like to avoid making every video file public. The typical way to do this with S3 is to generate temporary keys server-side which you append to the URL. Trouble is, that changes the URLs to the media files and their playlists, so the existing *.m3u8 playlists (which reference the other playlists and media) do not contain these keys.
One option which occurred to me would be to generate these playlists on the fly as they are just text files. The obvious trouble is overhead, it seems hacky, and these posts were discouraging: https://forums.aws.amazon.com/message.jspa?messageID=529189, https://forums.aws.amazon.com/message.jspa?messageID=508365
After spending some time on this, I feel like I'm going around in circles and there doesn't seem to be a super clear explanation anywhere for how to do this.
So as of September 2015, what is the best way to use AWS Elastic Transcoder and S3 to stream HLSv4 without making your content public? Any help is greatly appreciated!
EDIT: Reposting my comment below with formatting...
Thank you for your reply, it's very helpful
The plan that's forming in my head is to keep the converted ts and aac files on S3, but generate the 6-8 m3u8 files plus the master playlist and serve them directly from the app server. So the user hits the "Play" page and jwplayer gets the master playlist from the app server (e.g. "/play/12/"). Server side, this loads the m3u8 files from S3 into memory and search-and-replaces the media-specific m3u8 links so they point to S3 with a freshly generated URL token.
So user-->jwplayer-->local master m3u8 (verify auth server side)-->local media m3u8s (verify auth server side)-->s3 media files (accessed with signed URLs and temporary tokens)
Do you see any issues with this approach? Such as "you can't reference external media from a playlist" or something similarly catch 22-ish?
Dynamically generated playlists are one way to go. I actually implemented something like this as an Nginx module and it works very fast, though it's written in C and compiled, not PHP.
The person in your first link is more likely to have issues because of their 1-second chunk duration. This adds a lot of requests and overhead; the value recommended by Apple is 10 seconds.
There are solutions like HLS encrypted with AES-128 (supported by Elastic Transcoder), which also adds overhead if you do it on the fly, and HLS with DRM such as PHLS/Primetime, which will most likely get you into a lot of trouble on the client side.
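If you go with the playlist-rewriting approach described in the question, the short-lived links for the segments can be produced with the SDK's presigned URL support; a quick sketch with the AWS SDK for Java (the bucket and key are whatever your transcoder output uses):

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.net.URL;
import java.util.Date;

public class SegmentUrlSigner {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Returns a temporary GET URL for one .ts/.aac segment or media playlist,
    // valid just long enough for the player to fetch it.
    public URL sign(String bucket, String key, int validSeconds) {
        Date expiration = new Date(System.currentTimeMillis() + validSeconds * 1000L);
        return s3.generatePresignedUrl(bucket, key, expiration, HttpMethod.GET);
    }
}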
There seems to be a way to do it with Amazon CloudFront. Please note that I haven't tried it personally and you need to check if it works on Android/iOS.
The idea is to use Signed Cookies instead of Signed URLs. They were apparently introduced in March 2015. The linked blog entry even uses HLS as an example.
Instead of dynamic URLs you send a Set-Cookie header after you authenticate the user. The cookie (hopefully) gets passed along with every request (playlists and segments) and CloudFront decides whether to allow access to your S3 bucket or not.
You can find the documentation here:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
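For the signed-cookie route, the AWS SDK for Java has a helper that produces the CloudFront cookies; a minimal sketch, assuming you already have a CloudFront key pair registered with the distribution (the domain, key file path and key pair ID below are placeholders):

import com.amazonaws.services.cloudfront.CloudFrontCookieSigner;
import com.amazonaws.services.cloudfront.CloudFrontCookieSigner.CookiesForCannedPolicy;
import com.amazonaws.services.cloudfront.util.SignerUtils.Protocol;
import java.io.File;
import java.util.Date;

public class HlsCookieIssuer {
    // Returns the CloudFront-Expires / CloudFront-Signature / CloudFront-Key-Pair-Id cookie values
    // for one resource; copy them into Set-Cookie headers after authenticating the member.
    public CookiesForCannedPolicy cookiesFor(String resourceKey) throws Exception {
        Date expires = new Date(System.currentTimeMillis() + 3600 * 1000L); // valid for one hour
        // A canned policy covers a single resource; to cover a whole HLS folder with one cookie set,
        // use getCookiesForCustomPolicy, which accepts a wildcard resource path instead.
        return CloudFrontCookieSigner.getCookiesForCannedPolicy(
                Protocol.https,
                "d1234example.cloudfront.net",           // placeholder: your distribution domain
                new File("/etc/keys/cloudfront-pk.pem"), // placeholder: the CloudFront private key file
                resourceKey,                             // e.g. "videos/12/master.m3u8"
                "KEYPAIRID123",                          // placeholder: your key pair ID
                expires);
    }
}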