QuickSight using ML [closed] - amazon-web-services

I would like to use the ML model I created in AWS in my QuickSight reports.
Is there a way to consume the ML endpoint in order to run batch predictions in QuickSight?
Can I define a 'calculated field' in order to do that?

At this time there is no direct integration between Amazon SageMaker and QuickSight; however, you can use SageMaker's batch transform jobs to score your data outside of QuickSight and then import the predictions into QuickSight for visualization. Batch transform jobs write their output to S3, which is a supported data source for QuickSight.
https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-sagemaker-supports-high-throughput-batch-transform-jobs-for-non-real-time-inferencing/
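For example, something along these lines (an untested sketch; the job name, model name, bucket paths, and instance type are all placeholders you would replace with your own) kicks off a batch transform job with boto3 and drops the predictions into S3, where QuickSight can pick them up:

```python
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_transform_job(
    TransformJobName="quicksight-batch-scoring-2019-01-01",  # placeholder name
    ModelName="my-trained-model",                            # model from your training job
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/input/",            # CSV rows to score
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    TransformOutput={
        # Point a QuickSight S3 data source at this prefix once the job finishes.
        "S3OutputPath": "s3://my-bucket/predictions/"
    },
    TransformResources={
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
    },
)
```

Once the job completes, create a QuickSight S3 data set (via a manifest file) pointing at the output prefix and build your visuals on the predictions.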
Depending on how fancy you want to be, you can also integrate calls to AWS services such as AWS Lambda or AWS SageMaker as a user-defined function (UDF) within your datastore. Here are a few resources that may help:
https://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html
https://aws.amazon.com/blogs/big-data/from-sql-to-microservices-integrating-aws-lambda-with-relational-databases/
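As a rough illustration of the Lambda side of that pattern, here is a minimal handler that forwards a batch of rows to a real-time SageMaker endpoint. The endpoint name and the event shape are assumptions; the exact request/response format depends on how your datastore's UDF mechanism invokes Lambda, so treat this as a sketch rather than a drop-in function:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

def handler(event, context):
    # Hypothetical event shape: {"rows": [[f1, f2, ...], ...]}
    # Adjust to match however your datastore's UDF mechanism calls Lambda.
    rows = event["rows"]
    payload = "\n".join(",".join(str(v) for v in row) for row in rows)

    response = runtime.invoke_endpoint(
        EndpointName="my-endpoint",   # assumed SageMaker endpoint name
        ContentType="text/csv",
        Body=payload,
    )
    predictions = response["Body"].read().decode("utf-8").splitlines()
    return {"predictions": predictions}
```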
Calculated fields will probably not help you in this regard: they are restricted to a relatively small set of operations, and none of those operations support calls to external sources.
https://docs.aws.amazon.com/quicksight/latest/user/calculated-field-reference.html

Related

Change glue jobs to lambda function [closed]

I just got my first job in BI support working with AWS. The company runs several Glue jobs, which is turning out to be a very expensive game, so I want to try replacing the Glue jobs with Lambda functions. The question is: how do I change a Glue job into a Lambda function? Can anybody help? Thanks.
In general: you don't.
A Glue job can (a) run far longer, (b) consume far more resources, and (c) carry code and dependencies that far exceed Lambda's limits. You can't replace a Glue job with a Lambda unless you never needed a Glue job in the first place, i.e. you operate on little data, for a short time, with little code. If that is the case, you would need to be a lot more specific about how the current job is integrated: triggers will no longer work, network connectivity might no longer work, and so on.
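If the job really is that small, the "replacement" is just ordinary ETL code inside a Lambda handler. A rough sketch (bucket names, keys, and columns are all hypothetical; anything bigger than a few hundred MB or longer than Lambda's 15-minute limit should stay in Glue or move to EMR):

```python
import csv
import io

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Hypothetical tiny CSV clean-up that fits comfortably within Lambda's limits.
    obj = s3.get_object(Bucket="my-raw-bucket", Key="input/orders.csv")
    rows = csv.DictReader(io.StringIO(obj["Body"].read().decode("utf-8")))

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["order_id", "amount"])
    writer.writeheader()
    for row in rows:
        # Keep only the columns we care about.
        writer.writerow({"order_id": row["order_id"], "amount": row["amount"]})

    s3.put_object(
        Bucket="my-clean-bucket",
        Key="output/orders.csv",
        Body=out.getvalue().encode("utf-8"),
    )
```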

What services would best be used to collect, transform, and store media player logs? [closed]

A video player sends the server log data about what the user has been doing (start, pause, play, playing, etc.)
Sending the logs to the server and storing them in the DB, then running queued jobs to calculate stats on these has worked... okay, so far.
It's clear there should be some sort of optimization here. What services provide the best custom log storage?
What would be the best manual option? I'm considering running some Lambda functions and storing the data in AWS (RDS?) manually, but I'm wondering whether maintaining such a service is warranted.
I would store the logs in Amazon S3 (storage) and then use AWS Glue (transform) and Amazon Athena for ad-hoc querying of different stats. This will still work out cheaper than a traditional database approach, and it has a lot of other advantages.
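As a rough sketch of the ingest side (the bucket name, event shape, and partitioning scheme are all assumptions), a Lambda can write each batch of player events as newline-delimited JSON into a date-partitioned S3 prefix that a Glue crawler can catalogue and Athena can query:

```python
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "my-player-logs"  # hypothetical bucket

def handler(event, context):
    # Hypothetical event shape: {"events": [{"user_id": ..., "action": "play", "ts": ...}, ...]}
    events = event["events"]
    now = datetime.now(timezone.utc)

    # Newline-delimited JSON under a dt= prefix keeps Glue crawling and
    # Athena partition pruning simple.
    key = f"logs/dt={now:%Y-%m-%d}/{uuid.uuid4()}.json"
    body = "\n".join(json.dumps(e) for e in events)

    s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
```

From there, an Athena query such as `SELECT action, count(*) FROM player_logs WHERE dt = '2023-01-01' GROUP BY action` (assuming the crawler created a `player_logs` table) covers most of the stats you are currently computing in queued jobs.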

Is there tooling to publish the ENTIRE content FROM an AWS QLDB table to Cloudwatch using CloudWatch putMetric API? [closed]

I have a use case where a QLDB table contains this:
customer-id | customer-name | customer-item-count
I need to publish metrics per customer-id to CloudWatch every 5 minutes, and this data is available in the QLDB table.
Is there a way to do this?
QLDB has export jobs to export the content to S3; is there tooling to dump the contents to CloudWatch?
Many customers use periodic S3 exports (or Kinesis integration, if you signed up for the preview) to keep some sort of analytics DB up to date. For example, you might copy data into Redshift or ElasticSearch every minute. I don't have code examples to share with you right now. The tricky part is getting the data into the right shape for the destination. For example, QLDB supports nested content while Redshift does not.
Once the data is available and aggregated in the way you wish to query it, it should be a simple matter to run a report and write the results into CloudWatch.
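For that last step, writing the aggregated per-customer counts to CloudWatch is a PutMetricData call per batch. A minimal sketch (the namespace, metric name, and the shape of `counts` are assumptions on my part; PutMetricData caps the number of datums per request, so chunk conservatively):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_counts(counts):
    """counts: list of (customer_id, item_count) tuples aggregated elsewhere."""
    # Chunk conservatively to stay under the PutMetricData per-request limit.
    for i in range(0, len(counts), 20):
        chunk = counts[i:i + 20]
        cloudwatch.put_metric_data(
            Namespace="QLDB/Customers",  # assumed namespace
            MetricData=[
                {
                    "MetricName": "CustomerItemCount",
                    "Dimensions": [{"Name": "CustomerId", "Value": str(customer_id)}],
                    "Value": float(item_count),
                    "Unit": "Count",
                }
                for customer_id, item_count in chunk
            ],
        )
```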

How to use Google DLP API to delete sensitive content from data stored in Google Big Query? [closed]

I have a table in Google BigQuery that has some sensitive fields. I have read about and understand data inspection, but I cannot find a way to redact the data using the DLP API directly in the BigQuery database.
Two questions:
Is it possible to do it just using DLP API?
If not, what is the best way to fix data in a table which runs into Terabytes?
The API does not yet support de-identifying BigQuery tables directly.
You can, however, write a Dataflow pipeline that leverages content.deidentify. If you batch your rows using Table objects (https://cloud.google.com/dlp/docs/reference/rest/v2/ContentItem#Table), this can work pretty efficiently.
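Here is a minimal standalone sketch of that content.deidentify call using the google-cloud-dlp Python client, outside of Dataflow. The project ID, column names, info types, and the replace-with-info-type transformation are placeholders I chose for illustration; in a real pipeline this call would sit inside the step that processes each batch of BigQuery rows:

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
project = "my-gcp-project"  # assumed project ID

# One batch of BigQuery rows, reshaped into a DLP Table.
item = {
    "table": {
        "headers": [{"name": "email"}, {"name": "comment"}],
        "rows": [
            {"values": [{"string_value": "jane@example.com"},
                        {"string_value": "call me at 555-0100"}]},
        ],
    }
}

# Replace each finding with its info type name (e.g. [EMAIL_ADDRESS]).
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

inspect_config = {"info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]}

response = dlp.deidentify_content(
    request={
        "parent": f"projects/{project}",
        "deidentify_config": deidentify_config,
        "inspect_config": inspect_config,
        "item": item,
    }
)
print(response.item.table)  # de-identified rows, ready to write back to BigQuery
```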

What is the right cloud service(s) for real-time streaming and backtesting applications? [closed]

I wish to record time-series data at a very high frequency. I am wondering if there is an elegant serverless solution that allows me to store, and react to, real-time data.
I want to use stored data to create statistical models, and then I want to process new data in real-time based on those models.
Amazon Kinesis streams seem to fit the bill; however, I am unsure whether they are only for reacting in real time, or whether they also collect historical data that I could use offline to build models.
Google Dataflow and Pub/Sub also seem relevant, but I am not sure whether they would be appropriate for the above.
If you go with AWS, you might use Kinesis and EMR to achieve your goal. First, you can create a delivery stream in fully managed Kinesis Data Firehose and route it to S3 or Redshift to collect historical data.
Once your data is in S3, you can do the statistical analysis by pointing an EMR job at the S3 bucket to process fresh data as it arrives. Read this article for more information.
On EMR's managed Hadoop framework, you can set up open-source R and RStudio for statistical analysis if you wish. Here is a guide on that.
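The producer side of the Firehose approach is very small. A minimal sketch, assuming a delivery stream named `timeseries-to-s3` has already been created with an S3 destination:

```python
import json

import boto3

firehose = boto3.client("firehose")

def send_sample(sample):
    """Send one time-series sample to a Firehose delivery stream that lands in S3."""
    firehose.put_record(
        DeliveryStreamName="timeseries-to-s3",  # assumed, pre-created stream
        Record={"Data": (json.dumps(sample) + "\n").encode("utf-8")},
    )

# Example usage
send_sample({"sensor_id": "a1", "ts": 1546300800, "value": 42.7})
```

Firehose buffers and batches the records into S3 objects on its own, so the historical data accumulates without any servers to manage.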
We accomplished this using Kinesis with Flink (from Apache). Flink is a really scalable solution.