Image files as inputs with AWS Elastic Transcoder? - amazon-web-services

Here's my situation. I've been working on building a service at work that takes dynamically generated images and outputs animations as mp4 or gif. The user has the option of setting dimensions, time per frame, etc.
I have this working currently with ffmpeg. It works OK, but it is difficult (and potentially expensive) to scale, due largely to ffmpeg's CPU/memory requirements.
I just spent some time experimenting with AWS's Elastic Transcoder. It doesn't seem to like static image files (jpg, png) as source material in jobs. The file types aren't listed under the available Preset options either.
I'm sure that I could adapt the existing architecture to save the static images as video files (sound isn't needed) and upload those. That would still require ffmpeg in the pipeline, though.
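For reference, the images-to-video step itself is a single ffmpeg invocation; here is a minimal sketch wrapped in Python (the frame pattern, frame rate, and paths are illustrative assumptions, not part of my actual pipeline):

```python
import subprocess

def frames_to_mp4(frame_pattern: str, output_path: str, fps: int = 10) -> None:
    """Stitch numbered image frames (e.g. frame_0001.png) into a silent mp4."""
    subprocess.run(
        [
            "ffmpeg",
            "-framerate", str(fps),   # input rate: each frame shows for 1/fps seconds
            "-i", frame_pattern,      # e.g. "frames/frame_%04d.png"
            "-c:v", "libx264",        # H.264, widely accepted as a source codec
            "-pix_fmt", "yuv420p",    # broad player compatibility
            "-an",                    # no audio track needed
            output_path,
        ],
        check=True,
    )

# Example: frames_to_mp4("frames/frame_%04d.png", "animation.mp4", fps=5)
```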
Are there any other AWS services that might meet my needs and allow the use of Elastic Transcoder?

Related

Django / Docker, manage a million images and many large files

The project I am working on relies on many static files. I am looking for guidance on how to deal with the situation. First I will explain the situation, then I will ask my questions.
The Situation
The files that need management:
Roughly 1.5 million .bmp images, about 100 GB
Roughly 100 .h5 files, 250 MB each, about 25 GB
bmp files
The images are part of an image library; the user can filter through them based on multiple kinds of metadata. The metadata is spread out over multiple models, such as Printer, PaperType, and Source.
In development the images sit in the static folder of the Django project; this works fine for now.
h5 files
Each app has its own set of .h5 files. They are used to inspect user-generated images. Results of this inspection are stored in the database; the image itself is stored on disk.
Moving to production
Now that you know a bit about the problem it is time to ask my questions.
Please note that I have never pushed a Django project to production before. I am also new to Docker.
Docker
The project needs to be deployed on multiple machines, so to make this easier I decided to use Docker. I managed to build the image and run the container without the .bmp and .h5 files. So far so good!
How do I deal with the .h5 files? It does not seem like a good idea to build an image that is 25 GB in size. Is there a way to download the .h5 files at a later point in time? As in, building a Docker image that only contains the code and downloads the .h5 files later.
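One pattern that would fit this is keeping the .h5 files in object storage and pulling any missing ones at container startup, e.g. from an entrypoint script, so the image itself ships only code. A minimal sketch, assuming an S3-compatible bucket (the bucket name and cache directory are hypothetical):

```python
import os
import boto3  # assumes the .h5 files live in an S3-compatible bucket

MODEL_BUCKET = "my-h5-models"  # hypothetical bucket name
MODEL_DIR = "/srv/models"      # hypothetical local cache directory

def fetch_h5_files() -> None:
    """Download any .h5 files not already present in the local cache.

    Run this from the container's entrypoint so the Docker image itself
    stays small instead of baking in ~25 GB of model files.
    """
    s3 = boto3.client("s3")
    os.makedirs(MODEL_DIR, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=MODEL_BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".h5"):
                continue
            local_path = os.path.join(MODEL_DIR, os.path.basename(key))
            if not os.path.exists(local_path):  # skip files already cached
                s3.download_file(MODEL_BUCKET, key, local_path)

if __name__ == "__main__":
    fetch_h5_files()
```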
Image files
I'm pretty sure that Django's collectstatic command is not meant for moving the number of images this project uses. I'm thinking along the lines of directly uploading the images to some kind of image server.
If there are specialized image servers I would love to hear your suggestions.
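To illustrate the "upload directly" idea above: a one-off bulk upload to object storage, after which Django stores only each image's key/URL alongside its metadata. This is a sketch assuming S3 (or an S3-compatible service) and hypothetical bucket/path names; libraries like django-storages build on the same idea:

```python
import pathlib
import boto3  # assumes S3 or an S3-compatible service as the image store

IMAGE_BUCKET = "my-image-library"           # hypothetical bucket name
LOCAL_ROOT = pathlib.Path("static/images")  # hypothetical local image root

def upload_images() -> None:
    """Bulk-upload the .bmp library, preserving relative paths as object keys."""
    s3 = boto3.client("s3")
    for path in LOCAL_ROOT.rglob("*.bmp"):
        key = str(path.relative_to(LOCAL_ROOT))
        s3.upload_file(str(path), IMAGE_BUCKET, key)

if __name__ == "__main__":
    upload_images()
```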

Google Built CentOS Image - Anyone have a download for this?

I've looked for this across the web a few times, and I feel like this hasn't been asked exactly, or I may just be getting bogged down with the wrong syntax. Hoping to get an easy answer here (and yes, "you can't get this" is an acceptable answer).
The variations from the base CentOS image are listed here: Link to GCP
However, they don't actually provide a download for this image. I'm trying to get a local VM running in VMware with this image.
I feel as though they'd provide this to their clients to make it easier to prepare for use of their product, but I'm not finding it anywhere.
If anyone could toss me a link to a pre-configured CentOS ISO with the minor changes, I'd definitely take that as an alternative. I'm just not confident in my skills with Linux enough to configure the firewall properly :)
GCP doesn't support exporting Google-provided images. However, it does support exporting custom images.
I don't have much experience with image exporting, but I think this will work.
Create custom images
You can create custom images based on your GCE VM instance.
Go to Navigation menu -> Compute Engine -> Images page.
You can create a custom image from a disk or a snapshot on this page.
Select one and create a custom image.
Export your image
After creating the custom image successfully, go to the custom image's page and click "Export" at the top.
Select the export format and a GCS destination, then click Export.
Now you have an image file in Google Cloud Storage.
Download the image file and import it into your local VM.
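If you prefer to script the first step, the "create a custom image from a disk" action can also be done with the google-cloud-compute client library; a minimal sketch (project, zone, and resource names are placeholder assumptions; the export step itself is still done from the console or with gcloud compute images export):

```python
from google.cloud import compute_v1  # pip install google-cloud-compute

PROJECT = "my-project"      # hypothetical project ID
ZONE = "us-central1-a"      # hypothetical zone
SOURCE_DISK = "my-vm-disk"  # hypothetical source disk name

def create_custom_image(image_name: str) -> None:
    """Create a custom image from an existing persistent disk."""
    client = compute_v1.ImagesClient()
    image = compute_v1.Image()
    image.name = image_name
    image.source_disk = f"projects/{PROJECT}/zones/{ZONE}/disks/{SOURCE_DISK}"
    operation = client.insert(project=PROJECT, image_resource=image)
    operation.result()  # block until the image is created

# Example: create_custom_image("my-custom-image")
```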

What service should I use to process my files in a Cloud Storage bucket and upload the result?

I have some software that processes files. What I need is:
start a default image on Google Cloud (I think Docker should be a good solution) using an API or a run command
download files from Google Storage
process them: run my software using those downloaded files
upload the result to Google Storage
shut the image down, expecting not to be billed anymore
What I do know is how to create my image hehe. But I can't find any info telling me which Google Cloud service I should use, or even whether I can do it the way I'm thinking. I think I'm not using the right keywords to find what I need.
I was looking at Kubernetes, but I couldn't figure out how to use those instances to execute a one-time processing job.
[EDIT]
To explain the process better: I have an app that receives images and sends them to Google Storage. After that, I need to process those images: apply filters, georeference them, split them, etc. So I want to start a Docker image to process them and upload the results to Google Cloud again.
If you are using any of the runtimes supported by Google Cloud Functions, they are the easiest way to do those kinds of operations (i.e. fetch something from Google Cloud Storage, perform some actions on those files, and upload them again). A Cloud Function will be triggered by an event of your choice, and after the job is done, it will die.
Next option in terms of complexity would be to deploy a Google App Engine application in standard environment. It allows you to deploy your own application written in any of the supported languages for this environment. While there is traffic in your application, you will have instances serving, but the number of instances running can go down to 0 when they are not serving, which would mean less cost.
Another option would be Google App Engine in the flexible environment. This product allows you to deploy your application in any custom runtime. This option always has at least one instance running, so it would never shut down completely.
Lastly, you can use Google Compute Engine to "create and run virtual machines on Google infrastructure". Unlike GAE, this is not as managed by Google, which means that most of the configuration is up to you. In this case, you would need to programmatically tell your VM to shut down after you have finished your operations.
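For that last point, one approach is for the worker to stop its own instance through the Compute Engine API once processing finishes; a minimal sketch (project, zone, and instance name are assumptions):

```python
from google.cloud import compute_v1  # pip install google-cloud-compute

def stop_self(project: str, zone: str, instance: str) -> None:
    """Stop a VM via the Compute Engine API after processing finishes.

    A stopped instance no longer incurs compute charges (attached disks
    still bill until deleted).
    """
    client = compute_v1.InstancesClient()
    operation = client.stop(project=project, zone=zone, instance=instance)
    operation.result()  # wait for the stop operation to complete

# Example: stop_self("my-project", "us-central1-a", "worker-1")
```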
Based on your edit where you stated that you already have an app that is inserting images into Google Cloud Storage, your easiest option would be to use Cloud Functions that are triggered by additions, changes, or deletions to objects in Cloud Storage buckets.
You can follow the Cloud Functions tutorial for Cloud Storage to get an idea of the generic process and then implement your own code that handles your specific tasks. There are other tutorials like the Imagemagick tutorial for Cloud Functions that might also be relevant to the type of processing you intend to do.
Cloud Functions is probably your lightest-weight approach. You could of course build more full-scale applications, but that is likely overkill, more expensive, and more complex. You can write your processing code in Node.js, Python, or Go. A sketch of such a function follows.
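Here is a minimal sketch in Python using the 1st-gen background-function signature for a google.storage.object.finalize trigger; the output bucket name and the process_image stand-in are assumptions you would replace with your own logic:

```python
from google.cloud import storage  # add google-cloud-storage to requirements.txt

OUTPUT_BUCKET = "my-processed-images"  # hypothetical destination bucket

def process_image(src: str, dst: str) -> None:
    # Stand-in for your real processing (filters, georeferencing, splitting, ...)
    with open(src, "rb") as f_in, open(dst, "wb") as f_out:
        f_out.write(f_in.read())

def on_image_uploaded(event, context):
    """Triggered when a new object is finalized in the upload bucket.

    Downloads the object to /tmp, processes it, and uploads the result.
    """
    client = storage.Client()
    src_blob = client.bucket(event["bucket"]).blob(event["name"])
    local_in = "/tmp/" + event["name"].replace("/", "_")
    local_out = local_in + ".processed"
    src_blob.download_to_filename(local_in)

    process_image(local_in, local_out)

    client.bucket(OUTPUT_BUCKET).blob(event["name"]).upload_from_filename(local_out)
```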

GCP image creation from a compressed RAW image

When an image is created from a compressed RAW image stored in a gcs bucket, is an instance spun up in the background to validate the image? I would like to understand how the image creation process works and if Google adds some software on top of what's in the RAW image.
According to our documentation on importing boot disk images to Compute Engine, the overview section explains all the steps needed to understand how the image creation process works. This answers your question, "I would like to understand how the image creation process works."
Reviewing those steps in detail will address the remaining questions: we don't spin up an instance in the background to validate the image, and Google doesn't add any software on top of what's in the RAW image.
Customers are responsible for:
1. Planning for the import path
2. Preparing the boot disk so it can boot in the Compute Engine environment
3. Creating and compressing the boot disk
4. Uploading the image
5. Using the imported image to create a VM
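Steps 3 and 4, plus the image-registration half of step 5, can be scripted. A rough sketch (bucket, file, and image names are assumptions; the tarball must contain a file literally named disk.raw):

```python
import subprocess
from google.cloud import compute_v1, storage  # pip install google-cloud-compute google-cloud-storage

PROJECT = "my-project"       # hypothetical project ID
BUCKET = "my-import-bucket"  # hypothetical GCS bucket

def upload_and_register(image_name: str) -> None:
    # Step 3: compress the raw boot disk; the file inside must be named disk.raw
    subprocess.run(["tar", "-czf", "disk.tar.gz", "disk.raw"], check=True)

    # Step 4: upload the compressed image to Cloud Storage
    storage.Client().bucket(BUCKET).blob("disk.tar.gz").upload_from_filename("disk.tar.gz")

    # Step 5 (first half): register it as a Compute Engine image
    image = compute_v1.Image(
        name=image_name,
        raw_disk=compute_v1.RawDisk(
            source=f"https://storage.googleapis.com/{BUCKET}/disk.tar.gz"
        ),
    )
    compute_v1.ImagesClient().insert(project=PROJECT, image_resource=image).result()

# Example: upload_and_register("my-imported-image")
```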

AWS Rekognition use

I have an Android app which uploads images taken by the camera to AWS S3. I would like to keep an image only if it contains the face of the user, and only the face of the user (i.e. a selfie; unfortunately, Android does not record which camera was used in the EXIF data).
I have found code to do this on Android, but that seems like an unnecessary number of network calls. Seeing as I am using S3, it seems like there should be a way to have S3 do it for me automatically. I.e., every image uploaded to a folder is automatically run through Rekognition, stored if it matches the reference image, and deleted otherwise.
The service is so new, however, and the documentation rather sparse, that I cannot find any docs describing whether this is possible. Does anyone know?
You can do the following:
S3 upload event -> trigger Lambda -> call the Rekognition CompareFaces API -> based on a confidence-score threshold -> decide to delete or retain (see the sketch after the notes below).
Points to note:
You need to have a reference image stored in S3
If there are too many images being uploaded, you can see if AWS Batch is better suited; if you are OK with not doing it in real time, then spot instances should be preferable.
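A minimal sketch of that Lambda in Python/boto3 (the reference bucket/key and the 90% similarity threshold are assumptions to tune for your case):

```python
import urllib.parse
import boto3

rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")

REF_BUCKET = "my-reference-images"  # hypothetical bucket holding the reference selfie
REF_KEY = "reference/user123.jpg"   # hypothetical reference image key

def handler(event, context):
    """Triggered by an S3 ObjectCreated event; deletes non-matching uploads."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    response = rekognition.compare_faces(
        SourceImage={"S3Object": {"Bucket": REF_BUCKET, "Name": REF_KEY}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": key}},
        SimilarityThreshold=90,  # assumed threshold
    )

    if not response["FaceMatches"]:  # no face above the threshold -> not the user
        s3.delete_object(Bucket=bucket, Key=key)
```

Note that, as I understand it, CompareFaces returns an InvalidParameterException when no face is detected in either image, so a production handler should catch that and treat it as a non-match.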
I'm working with Rekognition as well. As best I can tell from your question, CompareFaces or SearchFaces could be used to determine whether to store or delete the image. As for getting Rekognition to run automatically on a specific folder, I guess it could start with S3 invoking Lambda, but I'm not sure what additional AWS services would be required beyond...