In Dialogflow, how to upload a file containing questions and answers programmatically? - google-cloud-platform

Can we upload training data (in .txt) using Python code in Dialogflow or Google Cloud Platform, using the Detect Intent and Agent APIs? If so, please share your insights.

You can look at using a PUT request to add additional training data to your intents; however, there is no direct option to upload a text file. Generally Dialogflow does a really good job of interpreting the user's intent with just a handful of training samples, making it feasible to type each one in manually or copy and paste. As it uses machine learning to match similar phrases, it shouldn't be necessary to upload a large text file.

Yes, for training phrases you can upload a single .txt file (one phrase per line) or a zipped archive of multiple .txt files (there's a limit of 10).
There's more on this here in the docs.
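If you do want to load a .txt file into an intent programmatically rather than through the console, a minimal sketch with the Dialogflow ES Python client is below. It assumes one phrase per line and an existing intent; the project ID, intent ID and file name are placeholders:

```python
from google.cloud import dialogflow_v2 as dialogflow

# Placeholder identifiers -- replace with your own project, intent and file.
PROJECT_ID = "my-project"
INTENT_ID = "my-intent-id"
PHRASES_FILE = "training_phrases.txt"

client = dialogflow.IntentsClient()
intent_name = client.intent_path(PROJECT_ID, INTENT_ID)

# Fetch the full intent so the existing training phrases are preserved.
intent = client.get_intent(
    request={"name": intent_name, "intent_view": dialogflow.IntentView.INTENT_VIEW_FULL}
)

# One training phrase per non-empty line in the text file.
with open(PHRASES_FILE) as f:
    for line in f:
        text = line.strip()
        if text:
            part = dialogflow.Intent.TrainingPhrase.Part(text=text)
            intent.training_phrases.append(dialogflow.Intent.TrainingPhrase(parts=[part]))

# Write the updated intent back; the agent retrains automatically.
client.update_intent(request={"intent": intent, "language_code": "en"})
```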

Streaming media to files in AWS S3

My problem:
I want to stream media that I record on the client (TypeScript code) to my AWS storage (services like YouTube / Twitch / Zoom / Google Meet can record live and save the recording to their cloud; some of them are even host-failure tolerant and still create a file if the host disconnects).
I want each stream to have a different file name so that future triggers can work from it.
I tried to save the stream into S3, but maybe there are storage solutions better suited to my problem.
What services I tried:
S3: I tried to stream directly into S3, but it doesn't really support appending to or updating existing objects.
I tried multipart uploads, but they are not host-failure tolerant.
I tried to upload each part separately and have a Lambda merge them (yes, it is very dirty and resource-consuming), but I sometimes had ordering problems (the standard multipart flow is sketched after this list).
Kinesis Video Streams: I tried to use Kinesis Video but couldn't enable the saving feature through the SDK.
Configuring it by hand, I saw that it saves a new file after a period of time or once a size threshold is reached, so it may not be the solution I want.
Amazon IVS: I tried it because Twitch recommends it, although it is far beyond my requirements.
I couldn't find a code example of what I want to do with the SDK (only console walkthroughs).
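For reference, the multipart flow I was trying to reproduce with Lambda looks roughly like this in Python/boto3 (my client code is TypeScript, but the API shape is the same); the bucket, key and chunk source are placeholders, and S3 itself assembles the parts in part-number order when the upload is completed:

```python
import boto3

# Placeholder bucket and key -- replace with your own.
BUCKET = "my-recordings-bucket"
KEY = "streams/session-1234.webm"

def media_chunks():
    """Hypothetical stand-in for chunks arriving from the client.
    Every part except the last must be at least 5 MB."""
    yield b"\x00" * (5 * 1024 * 1024)
    yield b"\x00" * 1024

s3 = boto3.client("s3")

# Start the multipart upload and remember its id.
upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]

parts = []
for part_number, chunk in enumerate(media_chunks(), start=1):
    resp = s3.upload_part(
        Bucket=BUCKET, Key=KEY, PartNumber=part_number, UploadId=upload_id, Body=chunk
    )
    parts.append({"ETag": resp["ETag"], "PartNumber": part_number})

# S3 orders the parts by PartNumber, so no separate merge step is needed.
s3.complete_multipart_upload(
    Bucket=BUCKET, Key=KEY, UploadId=upload_id, MultipartUpload={"Parts": parts}
)
```

The catch, as noted above, is that an upload that is never completed does not produce an object, which is why this is not host-failure tolerant on its own.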
Questions
Am I looking at the right services?
What can I do with the AWS SDK to make it work?
Is there a good place with code examples for future problems? Or maybe a good way to search for solutions?
Thank you for your help.

What is the efficient way of pulling data from S3 among boto3, Athena and the AWS command line utils

Can someone please let me know what is the efficient way of pulling data from S3? Basically I want to pull out data for a given time range, apply some filters over the data (JSON), and store it in a DB. I am new to AWS and after a little research found that I can do it via the boto3 API, Athena queries, or the AWS CLI, but I need some advice on which one to go with.
If you are looking for the simplest and most straightforward solution, I would recommend the AWS CLI. It's perfect for running commands to download a file, list a bucket, etc. from the command line or a shell script.
If you are looking for a solution that is a little more robust and integrates with your application, then any of the various AWS SDKs will do fine. The SDKs are a little more feature rich IMO and much cleaner than running shell commands in your application.
If your application that is pulling the data is written in python, then I definitely recommend boto3. Make sure to read the difference between a boto3 client vs resource.
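As an illustration of the boto3 route, here is a minimal sketch that lists the objects under a date-based prefix, parses each JSON file, and keeps only the records matching a filter before you write them to your DB. The bucket name, key layout and the status filter are assumptions for the example:

```python
import json
import boto3

# Assumed bucket and date-based key layout, e.g. logs/2023/06/01/part-0001.json
BUCKET = "my-data-bucket"
PREFIX = "logs/2023/06/01/"

s3 = boto3.client("s3")

matching_records = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        records = json.loads(body)  # assumes each object is a JSON array of records
        matching_records.extend(
            r for r in records if r.get("status") == "ERROR"  # example filter
        )

# matching_records can now be inserted into your database of choice.
print(f"Found {len(matching_records)} matching records")
```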
Some options:
Download and process: Launch a temporary EC2 instance, have a script download the files of interest (e.g. one day's files?), and use a Python program to process the data. This gives you full control over what is happening.
Amazon S3 Select: This is a simple way to extract data from CSV files, but it only operates on a single file at a time.
Amazon Athena: Provides an SQL interface to query across multiple files using Presto. Serverless, fast. Charged based on the amount of data read from disk (so it is cheaper on compressed data).
Amazon EMR: Hadoop service that provides very efficient processing of large quantities of data. Highly configurable, but quite complex for new users.
Based on your description (10 files, 300MB, 200k records) I would recommend starting with Amazon Athena since it provides a friendly SQL interface across many data files. Start by running queries across one file (this makes it faster for testing) and once you have the desired results, run it across all the data files.
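If you go the Athena route, you can drive it from Python through boto3 as well rather than through the console. A rough sketch, assuming you have already created a database and table over your JSON files and have an S3 location for query results (all names below are placeholders):

```python
import time
import boto3

athena = boto3.client("athena")

# Placeholder database, table, time range and results location.
QUERY = """
    SELECT *
    FROM my_database.my_json_table
    WHERE event_time BETWEEN timestamp '2023-06-01 00:00:00'
                         AND timestamp '2023-06-02 00:00:00'
"""
RESULTS_S3 = "s3://my-athena-results-bucket/"

query_id = athena.start_query_execution(
    QueryString=QUERY,
    ResultConfiguration={"OutputLocation": RESULTS_S3},
)["QueryExecutionId"]

# Poll until the query finishes (fine for small, interactive jobs).
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(f"Fetched {len(rows)} rows (first page only)")
```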

Question about the high-level architecture required to process and visualize fitness app data (from Apple Health, for example) using Google Cloud services?

I'm working on a project where I am tasked with using Google Cloud services to process and visualize fitness data. For example, I have exported some Apple Health data from my watch, and it is in .xml format.
From a high level, I envision this .xml file starting off in object storage and being converted to .csv by a Cloud Function (triggered by the creation of the .xml object in storage), then stored again in object storage (a different bucket).
I then see these .csv files being processed by a Dataflow pipeline, which will reformat the data to the template schema I would like the data to be organized with. This pipeline will output the resultant .csv to BigQuery, which will then be designated as a data source for Data Studio.
I will then configure Data Studio to produce some simple reports that compare the health data to recommended values. I would also like this report to be accessible as a .pdf in object storage. Am I on the right track, or am I missing some key services to accomplish this?
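As a rough sketch of just the first hop I have in mind (a Cloud Storage-triggered function converting the uploaded .xml into a .csv in a second bucket), assuming a 1st-gen background Cloud Function, placeholder bucket names, and the Record attributes found in a typical Apple Health export.xml:

```python
import csv
import io
import xml.etree.ElementTree as ET

from google.cloud import storage

OUTPUT_BUCKET = "fitness-csv-bucket"  # placeholder name

def convert_health_export(event, context):
    """Background Cloud Function triggered by a new .xml object in the upload bucket."""
    client = storage.Client()
    source = client.bucket(event["bucket"]).blob(event["name"])
    xml_bytes = source.download_as_bytes()

    # Apple Health exports store samples as <Record> elements with attributes.
    root = ET.fromstring(xml_bytes)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["type", "startDate", "endDate", "value", "unit"])
    for record in root.iter("Record"):
        writer.writerow([record.get("type"), record.get("startDate"),
                         record.get("endDate"), record.get("value"), record.get("unit")])

    out_name = event["name"].rsplit(".", 1)[0] + ".csv"
    client.bucket(OUTPUT_BUCKET).blob(out_name).upload_from_string(
        buffer.getvalue(), content_type="text/csv"
    )
```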
Also, I'm new to posting on StackOverflow, so if this question is against the rules or not welcome, please let me know.
Any feedback is greatly appreciated, as I have not been able to bounce these ideas off of other experienced cloud architects/developers.
This question is currently off-topic by the rules of Stack Overflow, as it does not contain a specific problem to resolve. See points 4-5.
As high-level advice, I do not see why it should not be possible with the services you mentioned, but you would need to implement it, try it on your side, and evaluate the features of each service in your workflow.
In terms of solution or architecture advice, those are generally paid services and you would most likely find little help here for them unless you have a specific problem to solve with said services. You might find some help on the internet as well, e.g. Cloud Solutions, Built it on GCP, etc.
You might find this interesting to review as well as it mimics your solution. Hope this helps.

Does google store the requests that are sent via Google DLP API

I am trying to understand whether Google stores the text or data that is sent to the DLP API. For example, I have some data (text files) locally and I am planning to use Google DLP to help identify sensitive information and maybe transform it.
Would Google store the text file data that I am sending? In other words, would it retain a copy of the files that I send? I have tried to read through the security and compliance page, but I could not find anything that clearly explains this.
Could anyone please advise?
Here is what I was looking at https://cloud.google.com/dlp/data-security
The Google DLP API only classifies and identifies the kind of (mostly sensitive) data we want to analyse; Google doesn't store the data we send.
We certainly don't store the data being scanned with the *Content api methods beyond what is needed to process it and return a response to you.
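For context, the content methods referred to there are the synchronous inspect/de-identify calls, where the text travels in the request itself and the findings come back in the response. A minimal sketch with the Python client (the project ID and sample text are placeholders):

```python
from google.cloud import dlp_v2

PROJECT_ID = "my-project"  # placeholder

dlp = dlp_v2.DlpServiceClient()

response = dlp.inspect_content(
    request={
        "parent": f"projects/{PROJECT_ID}",
        "inspect_config": {
            "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}],
            "include_quote": True,
        },
        "item": {"value": "Contact me at jane.doe@example.com or 555-0100."},
    }
)

# Each finding names the detected infoType and (optionally) quotes the matched text.
for finding in response.result.findings:
    print(finding.info_type.name, finding.quote)
```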

Printing multiple, customized user docs from a website

I have a database driven website built with Django running on a Linux server. It manages many separate groups each with hundreds of users. It needs to be able to print customized docs (i.e. access credentials) on demand for one, some or all users. Each group has its own logo and each credential is customized with the user's name, photo and some number of additional graphic stamps. All the custom information is based on stored data for the user.
I'm trying to determine the best method for formatting the credentials and printing. Here are the options I've come up with so far:
Straight HTML formatting, using table tags to break the credential into cells that contain the custom text or graphics. This seems straightforward, except it doesn't seem to lend itself to printing a couple hundred credentials at once.
Starting with a doc template in the form of a PDF file and using available PDF command-line toolkits to stamp in the custom information and append the multiple PDFs into a single file for printing. This also seems reasonable, except that the cost of a server license for these toolkits on Linux is prohibitively expensive (>$500).
A stand-alone program running on the client that retrieves user data via a web service and does all the formatting and printing locally.
Are there other options? Any advice? Thanks for your help.
I once did something similar using SVG. This allows for great flexibility, as you can design your "credential" in Inkscape, use placeholder names and logos, and then, once completed, open the output SVG in a text editor and replace the placeholders with context variables.
One tip: put all Django template code (if any) inside XML comments, e.g. <!--{% load xyz_tags %}-->; otherwise a lot of things get screwed up if you open the file in Inkscape.
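A minimal sketch of that approach, run inside the Django project so the template engine is already configured; the file name and context fields are made up for the example:

```python
from django.template import Context, Template

# credential_template.svg was designed in Inkscape and contains placeholders
# such as {{ user.get_full_name }} and {{ group.logo_url }}.
with open("credential_template.svg") as f:
    svg_template = Template(f.read())

def render_credential(user, group):
    """Return the SVG markup for one user's credential."""
    return svg_template.render(Context({"user": user, "group": group}))
```

The rendered SVGs can then be converted to PDF or sent to the printer with whatever tool fits your pipeline.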
The solution was to use the open-source ReportLab library to build up the PDF pages from scratch.
I could not find an inexpensive way to stamp the custom components into an existing PDF; ReportLab can do this, but only through their commercial product.
Building the pages from scratch works great, though.
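For anyone following the same route, a minimal sketch of building one credential page per user with the open-source ReportLab canvas API; the field names, image paths and positions are placeholders:

```python
from reportlab.lib.pagesizes import A6
from reportlab.lib.units import mm
from reportlab.pdfgen import canvas

def build_credentials_pdf(users, output_path="credentials.pdf"):
    """One page per user; `users` is any iterable of objects with the attributes used below."""
    pdf = canvas.Canvas(output_path, pagesize=A6)
    _, page_height = A6
    for user in users:
        # Group logo and user photo are assumed to be image files on disk.
        pdf.drawImage(user.group_logo_path, 10 * mm, page_height - 30 * mm,
                      width=30 * mm, height=20 * mm, preserveAspectRatio=True)
        pdf.drawImage(user.photo_path, 10 * mm, 20 * mm,
                      width=25 * mm, height=30 * mm, preserveAspectRatio=True)
        pdf.setFont("Helvetica-Bold", 12)
        pdf.drawString(10 * mm, 12 * mm, user.full_name)
        pdf.showPage()  # finish this credential and start the next one
    pdf.save()
```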