I am working on a project where a photographer uploads a set of high-resolution pictures (20 to 40 pictures). I am going to store each picture twice: one original and one with a watermark. On the platform, only the watermarked pictures are displayed. The user will be able to buy pictures, and the selected ones (the originals) are sent by email.
bucket-name: photoshoot-fr
main-folder(s): YYYY-MM-DD-HH_MODEL-NAME example: 2020-01-03_Melania-Doul
I am not sure whether I should have two different folders inside the previous folder, original and protected. Both folders would contain the exact same pictures with the same IDs, but one would store the original pictures and the other the protected (watermarked) pictures. Is there a better bucket design?
N.B.: it's a personal project, but there are multiple photographers and each photographer is going to upload 2-3 sets of photos every day. Each main folder is going to be deleted after 2 months.
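The two-prefix layout described above can be sketched as a small key-building helper; the prefix and photo ID below come from the question's own example, and the ".jpg" extension is an assumption:

```python
# Sketch of the proposed layout: one shoot prefix, two sub-prefixes per photo.
def photo_keys(shoot_prefix, photo_id):
    """Return the (original, protected) S3 keys for one photo."""
    return (f"{shoot_prefix}/original/{photo_id}.jpg",
            f"{shoot_prefix}/protected/{photo_id}.jpg")

print(photo_keys("2020-01-03_Melania-Doul", "img-001"))
# -> ('2020-01-03_Melania-Doul/original/img-001.jpg',
#     '2020-01-03_Melania-Doul/protected/img-001.jpg')
```

Keeping a shared photo ID with two fixed sub-prefixes also makes the two-month cleanup simple: deleting everything under one shoot prefix removes both variants at once.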
Related
So as a photo company we present photos to customers in a gallery view. Right now we show all the photos from a game. This is done by getting a list of the objects and getting a presigned URL to display them on our site. And it works very well.
The photos belong to an "event" and each photo object is stored in an easy to maintain/search folder structure. And every photo has a unique name.
But we are going to build a sorting system so customers can filter their view. Our staff would upload the images to S3 first, and then the images would be sorted. And as of right now, we do not store any reference to the image in the database until the customer adds it to their cart.
So I believe we have three ways we can tag these images:
Either store a reference to the image in our database with tags.
Apply metadata to the s3 object
Apply tags to the s3 object
My issue with the database method is, we shoot hundreds of thousands of images a month, I feel that would overly bloat the database. Maybe we create a separate DB just for this (dynamo, etc?)?
Metadata could work, and for the most part each image will only be tagged or untagged about once. But I understand that every time the metadata is changed, a new copy of that image is created. We don't use versioning, so there would still only exist one copy, but there would be a cost associated with copying the image, right? The pro would be that the metadata comes down with the GET object, so a second request wouldn't be needed. But is it available in the presigned URL response headers?
Tags on the other hand can be set/unset as needed with no/minimal additional cost. Although getting objects and the tags would be two separate calls... but on the other hand I can get the list of objects by tag and therefore only getting the presigned urls for the objects that need to be displayed vs all of them. So that could be helpful.
That is at least how I understand it. If I am wrong please tell me.
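One detail worth noting about the tagging option: S3's PutObjectTagging call replaces the entire tag set, so adding one tag safely means read, merge, write. A minimal sketch with boto3 (the bucket, key, and tag names here are made up, and the network calls are shown but not executed):

```python
# Merge new tags into an existing tag dict, producing the TagSet
# list-of-dicts format that S3's tagging APIs expect.
def merged_tag_set(existing, updates):
    tags = dict(existing)
    tags.update(updates)
    return [{"Key": k, "Value": v} for k, v in sorted(tags.items())]

# Against S3 this would look like (hypothetical bucket/key names):
# import boto3
# s3 = boto3.client("s3")
# current = s3.get_object_tagging(Bucket="photos", Key="event-123/img-001.jpg")
# tag_set = merged_tag_set({t["Key"]: t["Value"] for t in current["TagSet"]},
#                          {"sorted": "true", "category": "action"})
# s3.put_object_tagging(Bucket="photos", Key="event-123/img-001.jpg",
#                       Tagging={"TagSet": tag_set})

print(merged_tag_set({"sorted": "false"},
                     {"sorted": "true", "category": "action"}))
```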
I have got 2 Amazon accounts, US and UK (different merchantIds), and under the UK account I have DE, ES, IT and FR accounts with the same merchantId as the UK one.
The problem is that last week, because of a backend issue, images got duplicated, and when I published the catalog, all the images got duplicated in the Amazon accounts as well. I have spoken to the catalog team and they said they cannot do anything unless I provide them the ASINs, and only 7 ASINs per support case. I have almost 9000 ASINs. I need to remove the duplicated images from my catalog and I don't know what to do.
I published the image feed for 10 ASINs with operation type 'DELETE' to delete all the images for those products, and the response says successful, BUT the images are still there; I don't know why. I was reading somewhere that images are linked between accounts, so if I delete the images from my UK account, I still need to do the same for my US account. I did that too with a successful response, BUT the images are still not being deleted from Amazon.
Can anyone tell me how to remove all the images, duplicated or not? Then I can upload the images again in the proper order.
appreciate your help and time.
If you wish to delete single images you have two options:
Delete them manually in your inventory.
Create a file with the full list of ASINs and forward this list in a new case to Amazon Seller Support, asking them to delete the images on these ASINs. As this request will delete every image on these ASINs, you'll have to re-upload the remaining images.
Info: Option 2 is just goodwill, and I cannot confirm that every Seller Support associate will do it.
Thanks
Raz
I'm building a Django web application which allows users to select a photo from their computer and keep populating it onto the user's timeline. The timeline will show 10 photos initially and then have a pull-to-refresh to fetch the next 10 photos on the timeline.
So my first question is: I'm able to upload images, which get stored on the file system, but how do I show only the first 10 and then pull to refresh to fetch the next 10, and so on?
Next, I want the user experience of the app to be fast, so I'm considering caching. I was wondering what to cache, since there are 3 types of cache in Django: database cache, Memcached, or file-system caching.
So my second question is: should I cache the first 10 photos of each user, or something else?
Kindly answer with your suggestions.
So my first question is: I'm able to upload images, which get stored on the file system, but how do I show only the first 10 and then pull to refresh to fetch the next 10, and so on?
Fetch the first 10 with your initial logic, then fetch the next photos in chronological order. You must have some timestamp on each photo post; fetch images according to that. You can use Django's Paginator for this.
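As a pure-Python sketch of that logic (Django's Paginator does the equivalent slicing over an ordered queryset; the created_at field name here is an assumption):

```python
# "First 10, then next 10" paging, newest first. Each photo is a dict with
# an assumed "created_at" timestamp. With Django models, the equivalent is
# Paginator(Photo.objects.order_by("-created_at"), 10).get_page(n).
def photo_page(photos, page_number, per_page=10):
    """Return one page of photos (pages are 1-indexed)."""
    ordered = sorted(photos, key=lambda p: p["created_at"], reverse=True)
    start = (page_number - 1) * per_page
    return ordered[start:start + per_page]
```

The pull-to-refresh handler would then simply request page_number + 1.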
what do I cache
Whatever static data you want to show to the user frequently and that won't change right away. You can cache per user or for all users; based on that, you choose what to cache.
should I cache the first 10 photos of each user or something else
Depends on you. Are those first pictures common to all the users? Then you can cache. If not and the pictures are user dependent, there is no point caching them. The user will anyway have to fetch the first images. And I highly doubt the user will keep asking for the same first 10 photos frequently. Again, it's your logic. If you think caching will help, you can go ahead and cache.
The DiskCache project was first created for a similar problem (caching images). It includes a couple of features that will help you to cache and serve images efficiently. DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django.
diskcache.DjangoCache provides a Django-compatible cache interface with a few extra features. In particular, the get and set methods permit reading and writing files. An example:
from django.core.cache import cache

with open('filename.jpg', 'rb') as reader:
    cache.set('filename.jpg', reader, read=True)
Later you can get a reference to the file:
reader = cache.get('filename.jpg', read=True)
If you simply wanted the name of the file on disk (in the cache):
try:
    with cache.get('filename.jpg', read=True) as reader:
        filename = reader.name
except AttributeError:
    filename = None
The code above requests a file from the cache. If there is no such value, it will return None. None will cause an exception to be raised by the with statement because it lacks an __exit__ method. In that case, the exception is caught and filename is set to None.
With the filename, you can use something like X-Accel-Redirect to tell Nginx to serve the file directly from disk.
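As a sketch, the headers such a view would set could be built like this; the "/protected/" internal location and the image/jpeg content type are assumptions that must match your Nginx configuration:

```python
# Hypothetical helper: build the headers that tell Nginx to serve a cached
# file itself instead of streaming it through Python. "/protected/" must
# correspond to a location marked `internal;` in nginx.conf.
def accel_redirect_headers(cache_filename, content_type="image/jpeg"):
    return {
        "X-Accel-Redirect": "/protected/" + cache_filename.lstrip("/"),
        "Content-Type": content_type,
    }
```

In a Django view you would copy these headers onto an otherwise empty HttpResponse; Nginx intercepts X-Accel-Redirect and serves the file from disk directly.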
We have lots of images stored in AWS S3. We plan to provide thumbnails for users to preview. If I store them in S3, I can only retrieve them one by one, which is not efficient. Should I store them in a database instead? (I need to query my database to decide which set of thumbnails to show the user.)
The best answer depends on the usage pattern of the images.
For many applications, S3 would be the best choice for the simple reason that you can easily use S3 as an origin for CloudFront, Amazon's CDN. By using CloudFront (or indeed any CDN), the images are hosted physically around the world and served from the fastest location for a given user.
With S3, you would not retrieve the images at all. You would simply use S3's URL in your final HTML page (or the CloudFront URL, if you go that route).
If you serve images from the database, that increases resource consumption on the DB (more IO, more CPU, and some RAM used to cache image queries that is not available to cache other queries).
No matter which route you go, pre-create the thumbnail rather than producing it on the fly. Storage space is cheap, and the delay to fetch (from S3 or the DB), process, then re-serve the thumbnail will degrade the user experience. Additionally, if you create the thumbnail on the fly, you cannot benefit from a CDN.
If I store them in S3, I can only retrieve them one by one, which is not efficient.
No, it only looks inefficient because of the way you are using it.
S3 is massively parallel. It can serve your image to tens of thousands of simultaneous users without breaking a sweat. It can serve 100's of images to the same user in parallel -- so you can load 100 images in the same time it takes to load 1 image. So why is your page slow?
Your browser is trying to be a good citizen and only pulls 2-4 images from a site at a time. This "serialization" is what is slowing you down and causing the bottleneck.
You can trick the browser by hosting assets on multiple domains. This is called "domain sharding". You can do it with multiple buckets (put images into 4 different buckets, depending on the last digit of their ID). Alternatively, you can do it with CloudFront: http://abhishek-tiwari.com/post/CloudFront-design-patterns-and-best-practices/
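The bucket-sharding idea can be sketched as follows; the four bucket hostnames are placeholders, and the scheme assumes a numeric image ID:

```python
# Hypothetical domain sharding: spread images across 4 bucket hostnames so
# the browser's per-host connection limit applies to each host separately.
BUCKET_HOSTS = [
    "photos-0.s3.amazonaws.com",
    "photos-1.s3.amazonaws.com",
    "photos-2.s3.amazonaws.com",
    "photos-3.s3.amazonaws.com",
]

def shard_url(image_id, filename):
    """Pick a bucket from the last digit of the image ID, then build the URL."""
    last_digit = image_id % 10
    host = BUCKET_HOSTS[last_digit % len(BUCKET_HOSTS)]
    return f"https://{host}/{filename}"
```

Because the host is a pure function of the ID, every page render picks the same bucket for a given image, which keeps browser and CDN caches effective.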
As a best practice, you should store your static data in S3 and save a reference to it in your DB.
In your particular case, you can save the filename / hyperlink of the image file in a database that you can query depending on your business logic.
This gives you a reference to all the images, which you can then fetch from S3 and display to your users.
This also lets you swap a thumbnail reference when needed. For example, if you are running an e-commerce site, you can point the thumbnail reference at a new product image without much effort.
I hope this helps.
In my application, I have to run a periodic job where I need data about all the photos in my client's Flickr account. Currently, I make several calls to flickr.photos.search to retrieve metadata about all the photos each time.
What I am asking is: is there a way to be informed by Flickr when a photo is modified or deleted, so that I don't need to retrieve metadata for every photo, but can instead store it once and only download what has changed since the last time the job ran?
Thx in advance for your help.
There is no such notification possible from the Flickr API to your code.
What you can do instead (recommended only if the volume of photo-metadata changes is high):
Set up a cron job that scans through the photos and records which photo IDs have been deleted, so that information can be used later.
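The core step of that cron job could be sketched like this; the two ID lists would come from the latest flickr.photos.search run and from your own store of the previous run:

```python
# Compare the photo IDs returned by the latest flickr.photos.search run
# against the IDs stored from the previous run, so only new and deleted
# photos need further processing (the IDs below are made up).
def diff_photo_ids(stored_ids, fetched_ids):
    stored, fetched = set(stored_ids), set(fetched_ids)
    return {
        "added": fetched - stored,    # new photos to fetch metadata for
        "deleted": stored - fetched,  # photos removed since the last run
    }
```

Photos that appear in neither set changed state, so their stored metadata can be reused; note this diff alone does not detect edits to an existing photo's metadata.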