How to design scalable video streaming architecture using GCP? - google-cloud-platform

I have a video streaming application that streams video from a Google Cloud Storage bucket. None of the files in the bucket are public. Every time a user clicks a video on the front-end, I generate a signed URL through an API and load it into the HTML5 video player.
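For illustration, the signed-URL step presumably looks something like the following sketch using the Python client library (the bucket and object names are placeholders, not taken from the question):

```python
from datetime import timedelta
from google.cloud import storage

def get_video_url(bucket_name: str, object_name: str) -> str:
    """Return a short-lived V4 signed URL for a private video object."""
    client = storage.Client()  # needs credentials that can sign (e.g. a service account)
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=15),  # URL stays valid for 15 minutes
        method="GET",
    )

# The front-end then sets this URL as the <video> element's src.
url = get_video_url("my-private-videos", "lectures/intro.mp4")
```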
Problem
If the file size is more than 100 MB, it takes around 30-40 seconds for the video to start playing on the front-end.
When I googled this problem, several articles suggested putting Cloud CDN in front of the storage bucket and caching the files. As far as I know, a file has to be publicly available to be cached, and I can't make these files public.
So my question is: is there any way to make this scalable and reduce the initial load time?

Cloud CDN will certainly help with latency. With that much startup delay, it would also be worth inspecting the actual requests being sent to Cloud Storage, to make sure the player is requesting byte-range chunks rather than loading the whole video file before playback starts.
Caching the file does not require that the file be public. You can keep the file private and grant the Cloud CDN service access in your Cloud Storage ACLs (https://cloud.google.com/cdn/docs/using-signed-urls#configuring_permissions). Also, as Kolban noted above, signed cookies might be a better fit for your application and would streamline the requests.
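For reference, Cloud CDN signed URLs are built by HMAC-SHA1-signing the URL plus an expiry with a key you attach to the backend bucket; a minimal sketch is below (the key name and Base64 key value are placeholders, and the exact procedure is described in the documentation linked above):

```python
import base64
import hashlib
import hmac
import time

def sign_cdn_url(url: str, key_name: str, base64_key: str, ttl_seconds: int) -> str:
    """Append the Expires, KeyName and Signature query params Cloud CDN checks."""
    expires = int(time.time()) + ttl_seconds
    separator = "&" if "?" in url else "?"
    url_to_sign = f"{url}{separator}Expires={expires}&KeyName={key_name}"
    key = base64.urlsafe_b64decode(base64_key)  # the key as registered on the backend
    digest = hmac.new(key, url_to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.urlsafe_b64encode(digest).decode("utf-8")
    return f"{url_to_sign}&Signature={signature}"

# Placeholder key name, key value and URL.
signed = sign_cdn_url("https://cdn.example.com/videos/intro.mp4",
                      "my-cdn-key", "nZtRohdNF9m3cKM24IcK4w==", ttl_seconds=900)
```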

Not an exact answer, but this site is useful for designing solutions on GCP:
https://gcp.solutions/diagram/media-transcoding
As mentioned earlier, a CDN is the right way to go for low-latency video streaming.

Related

What is the difference between S3 video storage and streaming?

I'm hosting videos on AWS S3 at the moment. I can place the S3 URL into the src attribute of my video tags and everything works correctly, playing as though the video is being streamed to my site. These are not small videos either; some are 1 GB in size.
I can also jump straight to the end of a video, as though only the part I need is downloaded rather than the entire file.
Whenever I google how to stream on-demand video from AWS, the answers say I need a service in front of S3 to do something like this. Is AWS automatically doing this for me?
S3 supports partial GET requests. This allows clients to request only a specific part of the file. Most modern players (including HTML5 video) are able to use this feature to provide the experience you describe to your users.
Quoting from here:
HTTP range requests allow to send only a portion of an HTTP message from a server to a client. Partial requests are useful for large media or downloading files with pause and resume functions, for example.
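To see this behaviour for yourself, you can issue a range request manually; here is a small sketch using the requests library (the object URL is a placeholder):

```python
import requests

url = "https://my-bucket.s3.amazonaws.com/videos/big-video.mp4"  # placeholder URL

# Ask the server for only the first megabyte of the object.
resp = requests.get(url, headers={"Range": "bytes=0-1048575"})

print(resp.status_code)                    # 206 Partial Content if ranges are supported
print(resp.headers.get("Content-Range"))   # e.g. "bytes 0-1048575/1073741824"
print(len(resp.content))                   # roughly 1 MB, not the whole file
```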

Zip images on web server and return the url

I am currently looking for a way to improve the traffic flow of an app.
Currently the user uploads his data via the app, using Google Cloud Platform as storage provider. Other users can then download this data again.
This works well so far, but since download (egress) traffic from GCP is relatively expensive, I had the idea of outsourcing it to a cheap web server.
The idea is that the user requests the file(s) from GCP, which checks whether the file(s) are already on the web server. If not, the file(s) are uploaded to the server.
On the server the files are zipped, and the link is sent back to GCP, which emails it to the user.
TL;DR: My question is, how can I zip a specific selection of files on a web server without Node.js etc. and send the link of the generated file back to GCP?
I'm open to other ideas as well.
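For the zipping step itself, if you do end up running your own server, something like Python's standard zipfile module would be enough. A minimal sketch, where the serving directory and download domain are placeholders:

```python
import os
import uuid
import zipfile

SERVE_DIR = "/var/www/downloads"  # assumed directory served by the web server

def zip_selection(paths: list[str]) -> str:
    """Zip the given files and return a public link to the archive."""
    archive_name = f"{uuid.uuid4()}.zip"
    archive_path = os.path.join(SERVE_DIR, archive_name)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))
    return f"https://files.example.com/downloads/{archive_name}"  # placeholder domain
```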
This is precisely the kind of case covered by the Google Cloud CDN (Content Delivery Network) service.
As you can read here, there is already a way to connect the CDN to a Storage bucket, and it will do exactly what you planned to do with your own web server. The only difference is that it is already production-ready: it handles cache misses, cache hits and so on.
You can compare the prices: here you can find CDN prices, and here you can find Storage prices. The important difference is that Storage is billed per TB of egress, whereas Cloud CDN is billed per 10 TB of egress, and the price is still lower.
Of course, you can still stick to your idea. I would implement it by developing a REST API (a rough sketch follows at the end of this answer). The API, with just one endpoint, will serve the file if it is present on the web server. If it is not present, it will:
perform a redirect to the direct link of the file hosted in Storage;
start to fetch the file from Storage and put it in the cache.
You would still need to handle cache invalidation: what happens when somebody changes a file? That depends on the way you work with those files, so it is strictly tied to your app's functional domain; in any case, Cloud CDN would solve it without any further development.
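If you do stick with your own server, a minimal sketch of that single-endpoint API might look like this (Flask is an arbitrary choice here, and the bucket name, cache directory and background-fetch behaviour are assumptions, not a production design):

```python
import os
import threading

from flask import Flask, redirect, send_from_directory
from google.cloud import storage

app = Flask(__name__)
CACHE_DIR = "/var/cache/files"   # assumed local cache location
BUCKET = "my-app-uploads"        # placeholder bucket name

def fetch_to_cache(name: str) -> None:
    """Download the object from Cloud Storage into the local cache."""
    client = storage.Client()
    blob = client.bucket(BUCKET).blob(name)
    os.makedirs(CACHE_DIR, exist_ok=True)
    blob.download_to_filename(os.path.join(CACHE_DIR, name))

@app.route("/files/<name>")
def serve(name: str):
    cached = os.path.join(CACHE_DIR, name)
    if os.path.exists(cached):
        # Cache hit: serve from the cheap web server.
        return send_from_directory(CACHE_DIR, name)
    # Cache miss: redirect to Storage now and warm the cache in the background.
    # The direct link assumes the object is readable; otherwise redirect to a signed URL.
    threading.Thread(target=fetch_to_cache, args=(name,), daemon=True).start()
    return redirect(f"https://storage.googleapis.com/{BUCKET}/{name}", code=302)
```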

Cloud File Storage with Bandwidth Limits

I want to develop an app for a friend's small business that will store/serve media files. However, I'm afraid of a piece of media going viral, or of getting DDoS'd. The bill could go up quite easily with a service like S3, and I really want to avoid surprise expenses like that. Ideally I'd like some kind of max-bandwidth limit.
Now, a solution for S3 has been posted here, but it requires quite a few steps. So I'm wondering if there is a cloud storage service that makes this simpler, i.e. where I don't need to create a custom microservice. I've talked to Digital Ocean support and they don't support this either.
So in the interest of saving time, and perhaps for anyone else who finds themselves in a similar dilemma, I want to ask this question here; I hope that's okay.
Thanks!
Not an out-of-the-box solution, but you could:
Keep the content private
When rendering a web page that contains the file or links to the file, have your back-end generate an Amazon S3 pre-signed URL to grant time-limited access to the object
The back-end could keep track of the "popularity" of the file and, if it exceeds a certain rate (e.g. 1000 requests over 15 minutes), it could instead point to a small file with a "please try later" message (a rough sketch follows below)
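A rough sketch of that idea with boto3 (the in-memory counter, the 15-minute window, the 1000-request threshold and the bucket/key names are all assumptions):

```python
import time
from collections import defaultdict, deque

import boto3

s3 = boto3.client("s3")
WINDOW_SECONDS = 15 * 60
MAX_REQUESTS = 1000                   # threshold suggested in the answer above
recent_requests = defaultdict(deque)  # object key -> timestamps of recent requests

def media_url(bucket: str, key: str) -> str:
    """Return a pre-signed URL, or a fallback object when the file is 'too popular'."""
    now = time.time()
    hits = recent_requests[key]
    hits.append(now)
    while hits and hits[0] < now - WINDOW_SECONDS:
        hits.popleft()                # drop timestamps outside the window
    target_key = key if len(hits) <= MAX_REQUESTS else "please-try-later.html"
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": target_key},
        ExpiresIn=300,                # URL valid for 5 minutes
    )
```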

How do I transfer images from public database to Google Cloud Bucket without downloading locally

I have a CSV file with over 10,000 URLs pointing to images on the internet. I want to perform some machine learning tasks on them. I am using Google Cloud Platform infrastructure for this task. My first step is to transfer all these images from their URLs to a GCP bucket, so that I can access them later via Docker containers.
I do not want to download them locally first and then upload them, as that is just too much work; I would rather transfer them directly to the bucket. I have looked at Storage Transfer Service, and for my specific case I think I will be using a URL list. Can anyone help me figure out how to proceed next? Is this even a viable option?
If yes, how do I generate the MD5 hash mentioned here for each URL in my list, and also get the number of bytes of the image at each URL?
As you noted, Storage Transfer Service requires that you provide it with the MD5 of each file. Fortunately, many HTTP servers can provide the MD5 of an object without requiring that you download it: issuing an HTTP HEAD request may result in the server returning a Content-MD5 header in its response. It may not be in exactly the form Storage Transfer Service requires, but it can be converted into that form.
The downside here is that web servers are not necessarily going to provide you with that information. There's no way of knowing without checking.
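Here is a sketch of how you could probe each URL and build the TsvHttpData-1.0 URL list that Storage Transfer Service expects (each line is URL, size in bytes and Base64 MD5; whether a given server actually returns Content-MD5, and whether it is Base64 or hex, varies, so treat the conversion below as a guess to verify):

```python
import base64
import csv

import requests

def probe(url: str):
    """HEAD the URL and return (size_in_bytes, base64_md5), or None if unavailable."""
    resp = requests.head(url, allow_redirects=True, timeout=30)
    size = resp.headers.get("Content-Length")
    md5 = resp.headers.get("Content-MD5")
    if not size or not md5:
        return None
    if len(md5) == 32 and "=" not in md5:  # looks like hex: convert to Base64
        md5 = base64.b64encode(bytes.fromhex(md5)).decode("ascii")
    return int(size), md5

with open("urls.csv") as f, open("transfer-list.tsv", "w") as out:
    out.write("TsvHttpData-1.0\n")
    for (url,) in csv.reader(f):           # assumes one URL per row
        info = probe(url)
        if info:
            out.write(f"{url}\t{info[0]}\t{info[1]}\n")
```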
Another option worth considering is to set up one or more GCE instances and run a script there that downloads the objects to the instance and uploads them into GCS. This still involves downloading them "locally", but "locally" no longer means somewhere outside Google Cloud, which should speed things up substantially. You can also divide up the work by splitting your CSV file into, say, 10 files of 1,000 URLs each and setting up 10 GCE instances to process them.
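If you go the GCE route, the script does not even need to write the images to disk: it can stream each response body straight into the bucket. A minimal sketch (bucket name and CSV layout are assumptions):

```python
import csv

import requests
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-image-bucket")   # placeholder bucket name

with open("urls.csv") as f:
    for (url,) in csv.reader(f):            # assumes one URL per row
        name = url.rsplit("/", 1)[-1]       # naive object name derived from the URL
        resp = requests.get(url, stream=True)
        resp.raise_for_status()
        # Stream the HTTP response body directly into the GCS object.
        bucket.blob(name).upload_from_file(
            resp.raw, content_type=resp.headers.get("Content-Type"))
```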

Load big file from Google Cloud Storage into Google Cloud Functions?

Is there a way to load big files (>100MB) from Google Cloud Storage into Google Cloud Functions? I read in their quotas that the "Max event size for background functions" is limited to 10MB. Can I read it chunk-wise or something like that?
Many thanks.
Cloud Functions for Storage are triggered with the metadata of the file, which is relatively small and won't hit the max-event-size limit.
To access the actual contents of the file, you'll use the Node.js package for Cloud Storage, which is not affected by the 10 MB limit.
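The answer above refers to the Node.js client; purely as an illustration, the same idea in Python would be to stream the object in chunks rather than loading it all at once (the chunk size and the per-chunk processing are placeholders):

```python
from google.cloud import storage

client = storage.Client()

def process_upload(event, context):
    """Background Cloud Function triggered by a Storage finalize event."""
    bucket = client.bucket(event["bucket"])   # the event carries only metadata
    blob = bucket.blob(event["name"])
    total = 0
    # Read the object in 10 MB chunks instead of pulling it into memory at once.
    with blob.open("rb", chunk_size=10 * 1024 * 1024) as f:
        while True:
            chunk = f.read(10 * 1024 * 1024)
            if not chunk:
                break
            total += len(chunk)               # replace with real per-chunk processing
    print(f"Processed {total} bytes from gs://{event['bucket']}/{event['name']}")
```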
Unfortunately, to my knowledge this isn't possible.
It is, however, possible to upload larger files from Google Cloud Functions to Cloud Storage by setting resumable=true. The way this works is that the function uploads 10 MB of the file to your bucket, the request eventually times out and retries, which then re-downloads, re-processes and re-uploads the file, resuming where it left off with the next 10 MB, and so on.
Obviously this means all the processing is done repeatedly and each cycle relies on the request timing out, making the entire process extremely inefficient; it is not recommended.