Amazon S3 server multiple upload - amazon-web-services

I'm writing a program that saves images on Amazon S3 servers. My test suite is taking close to a minute to run due to having to run multiple uploads straight to S3 in order to test various features of the photos.
What is the issue here and how can I fix this?

You could:
1- Use multipart upload to speed it up, see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html
2- Use a mock "no-op" class to fake the upload instantly, see https://code.google.com/p/mockito/
3- Use a local emulation for testing, see https://github.com/jubos/fake-s3

Related

AEM/Adobe Experience Manager upload only some assets to AWS S3

My company is using AEM 6.5 and we were thinking to get some better performance out of our systems.
The idea we had is to upload only some assets (for example videos) to an S3 bucket and keep the other assets locally, we do not want to upload all the assets/datastore to S3. I know I can switch the datastore to S3, but that would mean all the assets go to S3, and we don't want this.
Restriction: we want the video upload to be done seamlessly from within the AEM Author, the editor should upload the video normally and somehow, behind the scenes, this transition to S3 to happen.
I checked as much documentation as I could find, and there is no mention of this partial asset upload to S3, you either go full S3 or nothing at all (we already tested full S3 datastore, it's working, but we do not want it).
So, my question is: did someone manage to do something like this?
Thanks
Have you looked into writing an Adobe Experience Manager workflow that would then read a list of assets to upload and then only update those specified assets. You could control which assets are uploaded to an Amazon S3 bucket before running the AEM workflow.
You can create a custom workflow step as discussed here. However in your use case - you would use the S3 Java API to create a custom workflow step. This is one way you can control which assets are uploaded to an Amazon S3 bucket from AEM.
https://helpx.adobe.com/experience-manager/using/message_service_gateway_api_64.html
Technically, it is possible to upload assets to S3, when they are uploaded to AEM instead of storing them in JCR. Nevertheless, this probably won't work as you expect and would require a lot of refactoring of AEM itself to make it work properly.
Just because the binary is stored in S3, does not mean that AEMs internals are aware of that and can deal with it.
Take asset preview on the author for example: this part of AEM would expect the binary to be stored in JCR. Now you have to rewrite this whole part of AEM to go look for those assets in S3. This would be a massive headache, overlaying those parts of AEM are already deprecated etc. And this is just one example of hundreds, that you would need to find a solution for.
It is not worth the effort.
You probably need to go "all-in" with S3 or leave it as is. Not sure what the reasoning is behind this drive to only use S3 "partially" for videos instead of all assets. Videos are probably already the largest assets you have, so it can't be cost. We run pure asset installations with S3 datastore that have 20TB-60TB of data which is totally fine.

Heroku doesn't update github file system when an image is uploaded from website

I ran into the problem where Heroku doesn't update my GitHub repository (or say static filesystem) when a blog post (including pictures) is created from the website.
Other images survive, whilst the ones saved in my filesystem with the server running on heroku, disapper.
I found this on their documentation.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted.
I'm still confused why not all the pictures disappear and only those added later do.
Is AWS S3 a solution for this? If it is, how can I represent my filesystem using buckets?
Say, for the Blog Post 1 I have 2 picture resolutions, which means storing the files in different folders corresponding to those resolutions.
---1920x1920
-----picture.jpg
---800x800
-----picture.jpg
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Is AWS S3 a solution for this?
S3 is the recommended solution for this, and the configuration is documented in Heroku DevCentre with specfic instructions for uploading from Python.
Note these Python instructions use the Direct Upload approch: Have the flask app generate a pre-signed URL, which is then passed back to the client Javascript code, so that the user's browser can make the upload to S3 directly. The resulting S3 URL of the image, is then put into a hidden element in the form, which is then received by your app on form submit.
The fact that you have separate image sizes suggests your app does some processing (maybe with PIL) to get these thumbnails. In which case it may be easier to use the Pass-Through approach where your app implements its own upload mechanism, does the processing and then uploads the thumbnails to S3 (The upload to S3 part is well document, such as in this SO thread).
The Pass-Through method carries the warning that this may cause blocking of a single threaded worker. If your site gets a volume of requests that causes this to be an issue, you may need to increase the number of gunicorn workers, or change to a worker type that supports concurrency (This github post has some useful commands/info on concurrent worker types).
The best way to implement this whole thing (although the requirement for a redisgo dyno and worker dyno may push you into the paid teir) may be with Background Tasks using rq. You use the Direct-Upload approach above to upload the original image, then have a background job download that, do the resizing, and put the resulting thumbnails back onto S3.
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Have one Bucket for the entire app, and just include forward slashes in the object's key to mimic a subdirectory structure.

Media files on Heroku

If I host a small Django website on Heroku and I am using just one dyno, is it save to upload media files on that server, or should I necessarily use AWS S3 storage to store media files? What are other alternatives for media storage?
No, it is never safe to store things on the Heroku filesystem. Even though you only have one dyno, it is still ephemeral, and can be killed at any time; for example when you push new code.
Using S3 is the way to go (alternatives are the Azure and Google offerings). There are several other advantages for using S3, mostly ability to service files without stressing your small server.
While your site is small, a dyno is very small as well, so a major advantage of S3, if used correctly, is that you can have the backing of the AWS S3 infrastructure to service the files. By "used correctly", I mean that you want to upload and service files directly to/from S3 so your server is only used for signing the S3 urls, but the actual files never go through your server.
Check https://devcenter.heroku.com/articles/s3-upload-python and http://docs.fineuploader.com/quickstart/01-getting-started.html (I strongly recommend Fine-Uploader if you can use the free version or afford the small license fee.).
Obviously, you can also just implement S3 media files in django using django-storage-redux, but that that means your server will be busy uploading files. If that's ok for your small server, then it is ok too.

Update wowza StreamPublisher schedule via REST API (or alternative)

Just getting started with Wowza Streaming Engine.
Objective:
Set up a streaming server which live streams existing video (from S3) at a pre-defined schedule (think of a tv channel that linearly streams - you're unable to seek through).
Create a separate admin app that manages that schedule and updates the streaming app accordingly.
Accomplish this with as a little custom Java as possible.
Questions:
Is it possible to fetch / update streamingschedule.smil with the Wowza Streaming Engine REST API?
There are methods to retrieve and update specific SMIL files via the REST API, but they only seem to be applicable to those created through the manager. After all, streamingschedule.smil needs to be created manually by hand
Alternatively, is it possible to reference a streamingschedule.smil that exists on an S3 bucket? (In a similar way footage can be linked from S3 buckets with the use of the MediaCache module)
A comment here (search for '3a') seems to indicate it's possible, but there's a lot of noise in that thread.
What I've done:
Set up Wowza Streaming Engine 4.4.1 on EC2
Enabled REST API documentation
Created a separate S3 bucket and filled it with pre-recorded footage
Enabled MediaCache on the server which points to the above S3 bucket
Created a customised VOD edge application, with AppType set to Live and StreamType set to live in order to be able to point to the above (as suggested here)
Created a StreamPublisher module with a streamingschedule.smil file
The above all works and I have a working schedule with linearly streaming content pulled from an S3 bucket. Just need to be able to easily manipulate that schedule without having to manually edit the file via SSH.
So close! TIA
To answer your questions:
No. However, you can update it by creating an http provider and having it handle the modifications to that schedule. Should you want more flexibility here you can even extend the scheduler module to not require that file at all.
Yes. You would have to modify the ServerListenerStreamPublisher solution to accomplish it. Currently it solely looks a the local filesystem to read teh streamingschedule.smil file.
Thanks,
Matt

AWS S3 file uploads: PHP SDK vs REST API

I need to upload a file to AWS Simple Storage Service from a PHP script. The script gets called from an external program and for some unknown reason the script bombs out as soon as I load the AWS PHP SDK. I've tried everything to get it to work without any success. I'm therefore thinking of rather using the AWS S3 REST API to upload the file.
My question is, what is the major drawback of using the REST API compared to the PHP SDK? I know it will be a bit harder to use the REST APIs, but if I only need to upload files to S3, would it take significantly more time? Or would it be worth spending another half a day (hopefully) trying to get the script to run while using the SDK?