Google Cloud - file pipe

I have a system that is able to produce a custom video (based on input text) faster than real-time.
I would like to create an HTTP endpoint, /create_video?description=dog riding a horse, that returns, as part of the response, the URL of the produced video.
Videos can be quite long, and generation can take some time. Rather than waiting for it to complete, I would like to return the response as soon as the first frames are available, so that the user can start watching instantly via the provided URL (we generate faster than real-time, so there will be no buffering). The URL must keep pointing to the generated video indefinitely (even months after generation).
I am using Google Cloud. What would be the recommended way to do that?
I could build a custom endpoint that serves the videos and has the described properties, but maybe something as simple as Cloud Storage could work (though I was not able to get it to read an object while the write was not yet finalized)?

As per Piotr Dabkowski's comment, after doing some extra research it seems it is not that easy. My best idea is to implement a custom endpoint that streams the result while the file is being generated, tracked by a temporary entry in a DB. Once the file is fully generated (the temporary DB entry is cleared and replaced with the Cloud Storage location), the endpoint redirects to Cloud Storage. A sketch of this idea follows below.
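A minimal sketch of that endpoint, assuming Flask, a hypothetical get_video_state() DB lookup, and a local spool file the generator appends to (none of these names come from any Google Cloud API):

import time
from flask import Flask, Response, redirect

app = Flask(__name__)

# Hypothetical in-memory stand-in for the temporary DB entry described above.
VIDEOS = {"demo": {"done": False, "spool_path": "/tmp/demo.mp4", "gcs_url": None}}

def get_video_state(video_id):
    return VIDEOS[video_id]

@app.route("/video/<video_id>")
def serve_video(video_id):
    state = get_video_state(video_id)
    if state["done"]:
        # Finished videos live in Cloud Storage; just redirect there.
        return redirect(state["gcs_url"])

    def stream():
        # Tail the growing spool file, sending bytes as the generator writes them.
        with open(state["spool_path"], "rb") as f:
            while True:
                chunk = f.read(64 * 1024)
                if chunk:
                    yield chunk
                elif get_video_state(video_id)["done"]:
                    break  # generation finished and no bytes remain
                else:
                    time.sleep(0.1)  # wait for more frames

    return Response(stream(), mimetype="video/mp4")

The spool-then-redirect split is what makes the "months later" requirement cheap: the custom endpoint only has to exist while a video is in flight, and Cloud Storage serves everything afterwards.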

Streaming media to files in AWS S3

My problem:
I want to stream media I record on the client (TypeScript code) to my AWS storage (services like YouTube / Twitch / Zoom / Google Meet can record live and save the recording to their cloud; some of them are even host-failure tolerant and still produce a file if the host disconnects).
I want each stream to have a different file name so that future triggers can act on it.
I tried to save the stream into S3, but maybe there are more suitable storage solutions for my problem.
The services I tried:
S3: I tried to stream directly into S3, but it doesn't really support appending to existing objects.
I tried multipart uploads, but they are not host-failure tolerant.
I tried to upload each part separately and have a Lambda merge them (yes, it is very dirty and resource-consuming), but I sometimes had ordering problems; see the multipart sketch after this list.
Kinesis Video Streams: I tried to use Kinesis Video but couldn't enable the saving feature with the SDK.
Testing by hand, I saw it saved a new file after a period of time or after a size threshold was reached, so maybe it is not the solution I want.
Amazon IVS: I tried it because Twitch recommends it, although it is way beyond my requirements.
I couldn't find a code example of what I want to do with the SDK (only console examples).
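A minimal sketch of the multipart route with explicit part numbers, which is what avoids the ordering problem (S3 assembles parts by number, not by upload order); the bucket and key names are made up:

import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-recordings", "streams/session-123.webm"  # hypothetical names

upload = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
parts = []

def push_chunk(part_number, chunk):
    # Every part carries its own number, so S3 reassembles them in order
    # no matter when each upload finishes. All parts except the last must
    # be at least 5 MB.
    resp = s3.upload_part(
        Bucket=BUCKET, Key=KEY, UploadId=upload["UploadId"],
        PartNumber=part_number, Body=chunk,
    )
    parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})

push_chunk(1, b"x" * (5 * 1024 * 1024))  # stand-in media chunks
push_chunk(2, b"final chunk")

s3.complete_multipart_upload(
    Bucket=BUCKET, Key=KEY, UploadId=upload["UploadId"],
    MultipartUpload={"Parts": sorted(parts, key=lambda p: p["PartNumber"])},
)

Note this still is not host-failure tolerant by itself: if the client dies before complete_multipart_upload, the parts sit unassembled, so a server-side job (or a bucket lifecycle rule that aborts stale uploads) has to finish or clean up abandoned uploads.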
Questions
Am I looking at the right services?
What can I do with the AWS-SDK to make it work?
Is there a good place with code examples for future problems? Or maybe a way to search for solutions?
Thank you for your help.

Google Cloud Vision - How to check logs that include the original uploaded image or the results of a processed request

We are trying to look into the details of Google Cloud Vision transactions. We are interested in the Cloud Vision requests where the returned processing is below satisfactory (e.g. an empty JSON response). In general: we are interested in what input was received and what GCV did with it.
I had assumed this would be auto-logged?
It seems that the default logging solution does not provide much information about the content of a transaction other than the timestamp or error type. (Is there a way to dig deeper into the log?)
Is there a way to log (or somehow view the uploaded url of) the original image that the service received and/or the results of the processed request?
Could you provide an example of how to retrieve the detected results and/or the input image, say, for "DOCUMENT_TEXT_DETECTION"?
Can you be a bit more specific? Which specific Google Cloud Vision feature are you trying to use (image classification, object detection)? Are you using the GCP console (i.e. the UI), the API, ...? Which kind of information do you want to get?
In any case, you can use advanced logs to have a look at your Google Cloud Vision logs. For instance, you can use the following filter to see the error logs:
protoPayload.serviceName="vision.googleapis.com"
severity>=ERROR
Or remove the second line to get all the logs related to Cloud Vision. You can then click on "Expand" to get all the information about the job.
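Note that Cloud Vision does not log the image content or the full response for you, so if you need those preserved you have to record them yourself at call time. A minimal sketch, assuming the google-cloud-vision Python client, DOCUMENT_TEXT_DETECTION, and a made-up Cloud Storage URI:

import logging

from google.cloud import vision

logging.basicConfig(level=logging.INFO)
client = vision.ImageAnnotatorClient()

def annotate(gcs_uri):
    # Record the exact input, since Vision's own logs won't keep it.
    logging.info("vision input: %s", gcs_uri)
    image = vision.Image(source=vision.ImageSource(image_uri=gcs_uri))
    response = client.document_text_detection(image=image)
    # Record the full response so "empty JSON" cases can be investigated later.
    logging.info("vision output: %s", vision.AnnotateImageResponse.to_json(response))
    return response.full_text_annotation.text

print(annotate("gs://my-bucket/scan-001.png"))  # hypothetical object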

Hosting a Chatbot Program in AWS Lambda

I am a developer and new to the system engineering part, so I am still getting my concepts clear.
I need to deploy my chatbot in Lambda and host it using API Gateway, but the following conceptual problem arises.
I have a chatbot built using simple AIML. I created it in Python and it is working properly.
For those who don't know AIML: I create an instance ("image") of the AIML kernel, k = aiml.Kernel(), and as the conversation flows this kernel instance carries the state of the conversation.
In my system there is just one instance of the kernel at a time, and things are good. But when I host this Python program in Lambda and deploy it using API Gateway, each request will get a new instance of the kernel, and my program will not function properly.
In a chatbot the conversation happens at runtime and past conversation data is important, but if I use API Gateway to trigger the Lambda function each time the user writes a new line, a new instance of the kernel will be created every time.
One option I found was storing the user's session and conversation in a database. But retrieving the past conversation and replaying it into a fresh kernel instance on every chat message doesn't sound like a good way to go.
Even if we store the past conversation and send it to the Lambda function in a JSON payload, a new kernel instance will still be created for the invocation, so I would have to replay the whole past conversation first and only then get the response to the new line of the chat.
IN SHORT: How can I have one instance of the kernel live in the Lambda function and get output through API Gateway, with the API called multiple times against that same instance?
Or, even if you just know the general idea of how most online chatbots process and give responses, that would also be very helpful.
To answer your actual question: yes, you can create your kernel instance outside of the Lambda handler function. This means the instance is only created when a new Lambda container is spun up, and it won't be recreated at every invocation; see the sketch below.
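A minimal sketch of that layout, assuming the python-aiml package and a hypothetical std-startup.xml bootstrap file shipped with the deployment package:

import aiml

# Module level: runs once per Lambda container (cold start) and is then
# reused by every warm invocation that lands on the same container.
kernel = aiml.Kernel()
kernel.learn("std-startup.xml")  # hypothetical bootstrap file
kernel.respond("load aiml b")    # pattern assumed to be defined in the bootstrap

def handler(event, context):
    # Runs on every invocation; the kernel above keeps its state in between,
    # subject to the container-lifetime caveats listed below.
    message = event.get("message", "")
    return {"response": kernel.respond(message)}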
If prior conversation is important, then I feel I should warn you about some of the pitfalls of this approach though.
Lambda containers will die if no new requests are received (after roughly half an hour, but AWS doesn't specify this and can change it at any time).
Lambda containers will be recycled periodically, even if they are being used.
If you have multiple conversations, you can't pin a particular user to a specific Lambda container.
The best way could be to maintain such conversations in a persistent data store and rebuild the kernel state from it. Essentially you need to replay the whole thing, but an appropriate mapping that gets you to the desired output quickly may optimize the result.
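One way to make that concrete, sketched under the assumption that python-aiml's session predicates capture the state you care about (the DynamoDB table name is made up; getSessionData and setPredicate are the relevant python-aiml calls):

import boto3
import aiml

table = boto3.resource("dynamodb").Table("chat-sessions")  # hypothetical table
kernel = aiml.Kernel()
kernel.learn("std-startup.xml")  # hypothetical bootstrap file
kernel.respond("load aiml b")

def chat(session_id, message):
    # Restore this user's predicates (name, topic, ...) into the kernel,
    # so it doesn't matter which Lambda container the request lands on.
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    if item:
        for name, value in item["predicates"].items():
            kernel.setPredicate(name, value, session_id)

    reply = kernel.respond(message, session_id)

    # Persist the (possibly updated) string predicates for the next invocation;
    # getSessionData also holds history lists, which are skipped here.
    predicates = {k: v for k, v in kernel.getSessionData(session_id).items()
                  if isinstance(v, str)}
    table.put_item(Item={"session_id": session_id, "predicates": predicates})
    return reply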

Google Tag Manager clickstream to Amazon

So the question has more to do with which services I should be using to get efficient performance.
Context and goal:
What I am trying to do, exactly, is use a Tag Manager custom HTML tag so that after each Universal Analytics tag (event or pageview) fires, an HTTP request is sent to my own EC2 server with a payload similar to what is sent to Google Analytics.
What I have thought, planned and researched so far:
At this moment I have two big options:
Use AWS Kinesis, which seems like a great idea, but the problem is that it only drops the information into one Redshift table and I would like to have at least 4 or 5, so I can differentiate pageviews from events etc. My solution to this would be to split each request server-side into a separate stream; see the sketch after these options.
The other option is to use Spark + Kafka. (Here is a detailed explanation.)
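A minimal sketch of that server-side split, assuming boto3 and hypothetical Kinesis stream names, one per hit type:

import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream names; each stream can then load its own Redshift table.
STREAMS = {"pageview": "ga-pageviews", "event": "ga-events"}

def route_hit(hit):
    # `hit` is the parsed payload forwarded by the custom HTML tag; in
    # Universal Analytics the "t" parameter is the hit type.
    stream = STREAMS.get(hit.get("t"), "ga-other")  # fallback stream
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps(hit).encode("utf-8"),
        PartitionKey=hit.get("cid", "anonymous"),  # GA client id
    )

route_hit({"t": "pageview", "cid": "123.456", "dp": "/home"})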
I know that at some point this means I am building a parallel Google Analytics, with everything that implies. I still need to decide which information I should send (I am referring to which parameters, for example source and medium), how to format it correctly, and how to process it correctly.
Questions and debate points:
Which option is more efficient and easier to set up?
Should I send this information directly from the server of the page/app, or from the user side, making the browser do the requests as I explained before?
Has anyone done something like this in the past? Any personal recommendations?
You'd definitely benefit from the Google Analytics customTask feature instead of custom HTML. More on this from Simo Ahava. Also, Google BigQuery is quite a popular destination for streaming hit data, since it allows many on-the-fly computations such as sessionization, and there are many ready-to-use cases for BQ.
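If BigQuery ends up as the destination, the receiving endpoint can stream hits straight in. A minimal sketch, assuming the google-cloud-bigquery client and a made-up project.dataset.table:

from google.cloud import bigquery

client = bigquery.Client()
TABLE = "my-project.analytics.hits"  # hypothetical project.dataset.table

def store_hit(hit):
    # Streaming insert: the row becomes queryable within seconds, which is
    # what makes on-the-fly work like sessionization practical.
    errors = client.insert_rows_json(TABLE, [hit])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")

store_hit({"t": "pageview", "cid": "123.456", "dp": "/home"})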

Fetch data from website real time

OK, basically I'm fetching data from a website using curl and parsing the contents using CkHtmlToText.
My issue is how to fetch the new data the website writes over time.
For example, the website contents are as follows:
-test1
-test2
After 1 second the contents are:
-test1
-test2
-test3
How can I fetch only the next line the website wrote that I haven't got yet, which is "test3"?
Any ideas? Thank you.
The language I'm using is Visual C++.
HTTP requests are stateless. You make a request, you get a result, then you make another completely independent request, you get another result, and so on. If the resource you are trying to access is changing over time, you need to make multiple requests, where each time you will get the full updated resource.
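To make the polling concrete, here is the fetch-and-diff loop sketched in Python for brevity (the URL is made up); the same logic ports directly to C++ with curl. It assumes the page only ever appends lines:

import time
import urllib.request

URL = "https://example.com/feed.txt"  # hypothetical resource
seen = 0  # number of lines already processed

while True:
    # Each request returns the full, updated resource (HTTP is stateless).
    body = urllib.request.urlopen(URL).read().decode("utf-8")
    lines = body.splitlines()
    for line in lines[seen:]:
        print("new line:", line)  # e.g. "-test3" on the second poll
    seen = len(lines)
    time.sleep(1)  # poll interval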
I imagine you may be describing a web page that automatically updates while you are looking at it (like a Twitter feed, for example). In that case, the response contains a script that allows the browser to fetch new data and inject it into the DOM. Unless you also plan to build the DOM and use a JavaScript engine (basically implementing a web browser) to run the script, this is probably not useful to you. Instead, you are better off finding an API that gives you data in a format that is easy to parse and get updates for. If this API is a REST API (built on HTTP), then you will still need to make independent requests to get updates.