AWS DynamoDB: High read latency using Query on a GSI

I have a DynamoDB table which contains Date, City, and other attributes as the columns. I have configured a GSI with Date as the hash key. The table contains 27 attributes for 350 cities, recorded daily.
| Date       | City    | Attribute1 | ... | Attribute27 |
+------------+---------+------------+-----+-------------+
| 25-06-2020 | Boston  | someValue  | ... | someValue   |
| 25-06-2020 | NY      | someValue  | ... | someValue   |
| 25-06-2020 | Chicago | someValue  | ... | someValue   |
+------------+---------+------------+-----+-------------+
I have a Lambda proxy integration set up in API Gateway. The Lambda function receives a 7-day date range in the request. Each date in this range is used to query DynamoDB (one Query per day) to get all the items for that day. The results for each day are consolidated for the week and sent back as a JSON response.
The latency seen in Postman is around 1.5 s, even after increasing the Lambda memory to 1024 MB (only 76 MB is actually being consumed).
Is there any way to improve the performance? The DynamoDB table is already running in on-demand capacity mode.

You don't say whether you are running the per-day queries in parallel. If not, do so; a sketch follows below.
You also don't say what CloudWatch is showing for Query latency; as mentioned by Marcin, DAX can help reduce that.
You also don't mention what CloudWatch is showing for Lambda execution. There are various articles about optimizing Lambda.
Whatever's left is networking, and there's not much you can do about that. One piece to consider is reusing DB connections in your Lambda by creating the client outside the handler.
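A minimal sketch of the parallel-query idea, assuming boto3; the table name "CityAttributes", the GSI name "Date-index", and the date format are placeholders, not the asker's actual setup:

from concurrent.futures import ThreadPoolExecutor

import boto3
from boto3.dynamodb.conditions import Key

# Created once per container, outside the handler, so the underlying
# HTTP connections are reused across invocations.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("CityAttributes")  # hypothetical table name

def query_day(date):
    # One Query against the GSI keyed on Date.
    # (Pagination via LastEvaluatedKey is omitted for brevity.)
    resp = table.query(
        IndexName="Date-index",  # hypothetical GSI name
        KeyConditionExpression=Key("Date").eq(date),
    )
    return resp["Items"]

def query_week(dates):
    # Issue all per-day queries concurrently; total latency then approaches
    # the slowest single query instead of the sum of all seven.
    with ThreadPoolExecutor(max_workers=len(dates)) as pool:
        per_day = pool.map(query_day, dates)
    return [item for day in per_day for item in day]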

Related

How can I find the missed utterance rate per day from Lex using CloudWatch?

We want to find the missed utterance rate per day from Lex logs.
For example:
Day 1 - 10 total utterances, 1 missed utterance
Day 2 - 20 total utterances, 4 missed utterances
...
We want to be able to plot (missed utterances / total utterances × 100) per day (essentially, a %) for one week; however, we also need to include Lambda exceptions as part of our "missed utterances" count.
How do we calculate the total and missed utterance counts and then obtain a %?
Is this possible in CloudWatch Logs Insights?
Expected output is a graph for 7 days showing the percentage of missed utterances + exceptions to total utterances for each day.
<date 1> 1%
<date 2> 4%
...
One query we tried is:
fields @message
| filter @message like /Exception/ or missedUtterance = 1
| stats count(*) as exceptionCount, count(@message) as messageCount by bin(1d)
| display exceptionCount, (exceptionCount / messageCount) * 100
| sort @timestamp desc
This is unfortunately not possible within CloudWatch Logs Insights, as you would need two filter and two stats commands.
One filter would be used for getting the total count and another for getting the exception + missed-utterance count.
While you can chain one filter after another, you can't get the counts of the result of each filter, as two stats commands are not supported within Logs Insights (yet).
The most you can do within CloudWatch is to create a dashboard (or two Logs Insights queries) with the queries below and calculate the percentage yourself:
fields @message
| stats count(*) as totalUtteranceCount by bin(1d)
fields @message
| filter @message like /Exception/ or missedUtterance = 1
| stats count(*) as exceptionAndMissedUtteranceCount by bin(1d)
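A minimal sketch of that combination, assuming boto3; the log group name, the time range arguments, and the "as day" bin alias are assumptions:

import time

import boto3

logs = boto3.client("logs")

TOTAL_Q = "fields @message | stats count(*) as total by bin(1d) as day"
MISSED_Q = (
    "fields @message"
    " | filter @message like /Exception/ or missedUtterance = 1"
    " | stats count(*) as missed by bin(1d) as day"
)

def run_query(query, log_group, start, end):
    # start_query is asynchronous; poll until the query completes
    qid = logs.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=query
    )["queryId"]
    while True:
        resp = logs.get_query_results(queryId=qid)
        if resp["status"] == "Complete":
            # each result row is a list of {"field": ..., "value": ...} pairs
            return [{c["field"]: c["value"] for c in row} for row in resp["results"]]
        time.sleep(1)

def missed_percentage(log_group, start, end):
    total = {r["day"]: float(r["total"]) for r in run_query(TOTAL_Q, log_group, start, end)}
    missed = {r["day"]: float(r["missed"]) for r in run_query(MISSED_Q, log_group, start, end)}
    # join the two result sets on the day bin and compute the percentage yourself
    return {day: 100 * missed.get(day, 0.0) / n for day, n in total.items()}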
In an enterprise chatbot project that I was an engineer on, I configured logs to be exported to Elasticsearch (OpenSearch in the AWS Console), which opened up a whole new world of data analysis and gave me the ability to run statistics like the above.
If this is a must, I would look at implementing a similar solution until AWS improves CloudWatch Logs Insights or provides this statistic within Amazon Lex itself.
In the long run, I would go with the first option anyway: Logs Insights is not meant to be a full-blown data-analysis tool, and you'll need to carry out much more analysis on your data (missed utterances, intents, etc.) anyway in order to improve your bot.
Hopefully, something like this query works in the future!
fields @message
| stats count(*) as totalUtteranceCount by bin(1d)
| filter @message like /Exception/ or missedUtterance = 1
| stats count(*) as exceptionAndMissedUtteranceCount by bin(1d)
| display (exceptionAndMissedUtteranceCount / totalUtteranceCount) * 100
| sort @timestamp desc
We could get it working using the query below:
fields strcontains(@message, 'Error') as errorMessage
| fields strcontains(@message, '"missedUtterance":true') as missedUtterance
| stats sum(errorMessage) as errorMessageCount,
        sum(missedUtterance) as missedCount,
        count(@message) as messageCount,
        ((errorMessageCount + missedCount) / messageCount * 100) by bin(1d)
Here, we are using strcontains instead of parse because, with parse, if there were no missed utterances on a particular day, the calculation (errorMessageCount + missedCount) / messageCount * 100 came out empty.
The output is then the per-day percentage graph described above.

How to get the total uptime and percentage of a GCP Compute VM instance through MQL?

I am trying to get the total uptime of a single GCP Compute Engine VM instance, inclusive of restarts. I've seen multiple posts on this, but none using MQL.
E.g.: in the past 24 hours, if the instance is not running for 1 hour, I expect the MQL query to return 23 hours.
In the snippet below, the graph represents the max uptime but doesn't account for restarts. I've tried using a secondary aggregator with max, but the query still doesn't report the exact value.
If you have any idea how to get the total uptime over the past day through MQL, that would be very helpful. Any pointers are much appreciated. Thank you.
fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime_total'
| group_by 1d, [value_uptime_total_max: max(value.uptime_total)]
| every 1d
You can try the uptime metric instead:
fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime'
| filter (metric.instance_name == 'instance-1')
| align delta(1d)
| every 1d
| group_by [], [value_uptime_mean: mean(value.uptime)]
so you get a graph of the daily mean uptime.
Use sliding in the group_by and the sum aggregator for the calculation:
fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime_total'
| filter (metric.instance_name = "the instance name you need")
| group_by [], sliding(1d), [value_uptime_total_sum: sum(value.uptime_total)]
The GCE VM metrics instance/uptime and instance/uptime_total are not reliable. Tracking uptime through an uptime check and using the following MQL query instead gives exact values for historical uptime.
Replace 30d with an appropriate window (1d, 1h, etc.):
fetch uptime_url
| metric 'monitoring.googleapis.com/uptime_check/check_passed'
| filter (metric.check_id == 'dev-uptime-test')
| group_by 30d,
[value_check_passed_fraction_true: fraction_true(value.check_passed)]
| every 30d | mean
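If you need these numbers programmatically rather than as a chart, here is a minimal sketch of running that query through the Cloud Monitoring client library; the project ID and check ID are placeholders:

from google.cloud import monitoring_v3

client = monitoring_v3.QueryServiceClient()

MQL = """
fetch uptime_url
| metric 'monitoring.googleapis.com/uptime_check/check_passed'
| filter (metric.check_id == 'dev-uptime-test')
| group_by 1d, [value_check_passed_fraction_true: fraction_true(value.check_passed)]
| every 1d
"""

results = client.query_time_series(
    request={"name": "projects/my-project", "query": MQL}
)
for series in results:
    for point in series.point_data:
        # fraction_true yields the fraction of passing checks in each window,
        # i.e. the uptime ratio; multiply by 24 for hours of uptime per day
        print(point.time_interval.end_time, point.values[0].double_value * 24)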

How does S3 Select pricing work? What do "data returned" and "data scanned" mean in S3 Select?

I have 1M rows of CSV data. If I select 10 rows, will I be billed for 10 rows? What do "data returned" and "data scanned" mean in S3 Select?
There is little documentation on these S3 Select terms.
To keep things simple, let's forget for a moment that S3 Select can read in a columnar way. Suppose you have the following data:
| City      | Last Updated Date |
|-----------|-------------------|
| London    | 1st Jan           |
| London    | 2nd Jan           |
| New Delhi | 2nd Jan           |
A query fetching the latest update date forces S3 to scan all 3 records, but only 2 records are returned (those whose last updated date is 2nd Jan).
A query of select City where the last updated date is 1st Jan will also scan all 3 rows but return only 1 string: "London".
Hence, depending on your query, it might scan more data (3 rows) but return less data (2 rows, or just 1).
You are billed for both sides separately: a per-GB rate for data scanned and a (different) per-GB rate for data returned, so selecting 10 rows out of 1M still incurs the scan cost for everything S3 had to read to answer the query. I hope the difference between Data Scanned and Data Returned is clear now.
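For completeness, a minimal sketch with boto3 (bucket and key are placeholders) showing that S3 Select itself reports both quantities: the Stats event of a select_object_content call carries BytesScanned and BytesReturned, which are exactly what the two price components are billed on:

import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="my-bucket",
    Key="cities.csv",
    ExpressionType="SQL",
    Expression="SELECT s.City FROM s3object s WHERE s.\"Last Updated Date\" = '1st Jan'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
    elif "Stats" in event:
        d = event["Stats"]["Details"]
        # billed separately: a per-GB rate for scanned, another for returned
        print(f"scanned={d['BytesScanned']} bytes, returned={d['BytesReturned']} bytes")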

Power BI: incremental data load using an OData feed

Is there any way to save the previous data before it is overwritten by a refresh?
Steps I have done:
I created a table and appended it to table A.
I created a column called DateTime with the function
DateTime.LocalNow()
Now the problem is how to save the previous data before the refresh phase. I need to preserve both the previous data and its timestamp.
Example:
Before refreshing:

Table A:
| Columnname x | DateTime         | ...
| value        | 23.03.2016 23:00 |

New Table:
| Columnname x | DateTime         | ...
| value        | 23.03.2016 23:00 |

After refreshing:

Table A:
| Columnname x | DateTime         | ...
| value        | 23.03.2016 23:00 |
| value 2      | 23.03.2016 23:01 |

New Table:
| Columnname x | DateTime         | ...
| value        | 23.03.2016 23:00 |
| value 2      | 23.03.2016 23:01 |
Kind regards
Incremental refreshes in the Power BI Service or Power BI Desktop aren't currently supported, but please vote for this feature. (Update: see that link for info on a preview feature that does this.)
If you need this behavior, you need to load these rows into a database and then incrementally load the database. The load into Power BI will still be a full load of the table(s).
This is now available in Power BI Premium.
From the docs:
Incremental refresh enables very large datasets in the Power BI Premium service with the following benefits:
Refreshes are faster. Only data that has changed needs to be refreshed. For example, refresh only the last 5 days of a 10-year dataset.
Refreshes are more reliable. For example, it is not necessary to maintain long-running connections to volatile source systems.
Resource consumption is reduced. Less data to refresh reduces overall consumption of memory and other resources.

Sync Framework: Map a single table into multiple tables

I have two tables like the following:
On server:

| Orders Table | OrderDetails Table |
|--------------|--------------------|
| Id           | Id                 |
| OrderDate    | OrderId            |
| ServerName   | Product            |
|              | Quantity           |

On client:

| Orders Table | OrderDetails Table |
|--------------|--------------------|
| Id           | Id                 |
| OrderDate    | OrderId            |
|              | Product            |
|              | Quantity           |
|              | ClientName         |
I need to sync [Server].[Orders Table].[ServerName] to [Client].[OrderDetails Table].[ClientName].
The Question:
What is the correct and efficient way of doing this?
I know deprovisioning and provisioning with a different config is one way of doing it.
I just want to know the correct way.
Thanks.
EDIT:
The other columns of each table should sync normally ([Server].[Orders Table].[Id] to [Client].[Orders Table].[Id], etc.).
And the mapping strategy sometimes changes based on the row of data (i.e., which side is sending/receiving).
Sync Framework is not an ETL tool; simply put, its DB sync is per table.
If you really want to force it to do what you want, you can intercept the ChangesSelected event for the OrderDetails table, look up the extra column from the other table, and then dynamically add the column to the dataset before it gets applied on the other side.
See this link on how to manipulate the change dataset.