How to sum values in a Lambda function in DynamoDB in AWS - amazon-web-services

I have some tables in DynamoDB and I simply want to take a cost variable of a service and create a function that adds up (like sum(column)) all the values from one id and returns the result. How can I do it?

Summing up values from a DynamoDB table requires a full table scan by design, because you need to gather every value of the attribute you are trying to sum.
Your question is similar to Find Average and Total sum in DynamoDB?

You can use a query with a projection expression to read all values for the attribute you wish to sum into an array, then sum the values in the array client side.
A query avoids the need to do a full table scan.
For this to work, the "id" you reference in "all from one id" must be a partition key.
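
A minimal boto3 sketch of that approach (the table name 'services', attribute 'cost', and key value 'service-123' are all hypothetical):

import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table: partition key 'id', numeric attribute 'cost'.
table = boto3.resource('dynamodb').Table('services')

total = 0
kwargs = {
    'KeyConditionExpression': Key('id').eq('service-123'),
    'ProjectionExpression': '#c',                  # alias, in case the attribute name is reserved
    'ExpressionAttributeNames': {'#c': 'cost'},
}
while True:
    page = table.query(**kwargs)
    total += sum(item['cost'] for item in page['Items'])  # numbers arrive as Decimal
    if 'LastEvaluatedKey' not in page:
        break                                      # each call returns at most 1 MB, so paginate
    kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']

print(total)

If the id is not the partition key, the same pagination loop works with table.scan and a FilterExpression on the id, at the full-table-scan cost the earlier answer describes.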

Related

BigQuery: clustering column with millions of cardinality

I have a BigQuery table, partitioned by date (one partition for every day).
I would like to add various columns that are sometimes populated and sometimes missing, plus a column for a unique id.
The data needs to be searchable through the unique id. The other use case is to aggregate per column.
This unique id will have a cardinality of millions per day.
I would like to use the unique id for clustering.
Is there any limitation on this? Has anyone tried it?
It's a valid use case to enable clustering on an id column; the number of distinct values shouldn't cause any limitations.
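
As a sketch of the setup described above, using the BigQuery Python client (the dataset, table, and column names here are made up):

from google.cloud import bigquery

client = bigquery.Client()

# One partition per day, clustered on the high-cardinality unique id.
# BigQuery limits a table to four clustering columns, but puts no cap on
# how many distinct values a clustering column may contain.
ddl = """
CREATE TABLE my_dataset.events (
  event_date DATE,
  unique_id  STRING,
  extra_col  STRING  -- sometimes populated, sometimes NULL
)
PARTITION BY event_date
CLUSTER BY unique_id
"""
client.query(ddl).result()

Queries that filter on unique_id will then prune storage blocks inside each date partition, which is what makes the point lookups cheap.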

Quicksight - how to show aggregated value as Fraction

I am creating a pivot table in QuickSight, and for one of the rows I want to calculate the average, so I have applied the average option to it. It aggregates up to the root level: 500/6 = 83.333, so far so good.
Now I want to tweak this a bit: instead of showing 83.333, I want to show it as a fraction like 5/6, which basically denotes that 5 passed out of 6. How can I achieve the same?
Unfortunately, as far as I know, QuickSight doesn't provide any such functionality out of the box. What you can do instead is get a total count of all the objects in the table and another count of the items that have passed and then display them one after another.
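If the concat, toString, and countIf functions are available in your region, a string calculated field along these lines could display the two counts as a fraction (the {id} and {status} field names are hypothetical):
concat(toString(countIf({id}, {status} = 'passed')), '/', toString(count({id})))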

Having one column dependent on the order of another column in Amazon QuickSight

Here is the simple table data I have:
What I want:
For each business, I am only interested in its LATEST Profit% in QuickSight table format (or honestly any other visual type, not just table)
What I have tried:
I put this table into QuickSight's Table visual type, with "Business Name" in "Group by", and "Timestamp" and "Profit%" in "Value". Fortunately, I can select only the max value of the Timestamp column (which is exactly what I want!) as you can see in the image below:
The first two columns are exactly what I want. However, if you look at the "Profit%" column, you can see that I am getting a "Sum" value, but I just want the value associated with that respective "Max" Timestamp. For example, for the "Business-1" row, the expected value in the Profit% column is 25.
Possible solutions I have tried:
i. Using a calculated field with the lastValue function. Unfortunately, this function is not supported in the region where I am using QuickSight; if it were supported, my issue would have been resolved.
ii. Using a calculated field called rank, built with the rank function, and then filtering on this column. This has not worked out yet. Suggestions are welcome if it is actually possible with this logic.
I think you will be able to get the desired result by using the maxover function together with an ifelse:
ifelse(Timestamp=maxover(Timestamp,[{Business Name}],PRE_FILTER),{Profit%},0)
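Since every non-latest row then contributes 0, aggregating this calculated field with max (rather than sum) in the Value well should leave just the latest Profit% per business, assuming Profit% is never negative.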

How to Hash an Entire Redshift Table?

I want to hash entire Redshift tables in order to check for consistency after upgrades, backups, and other modifications which shouldn't affect table data.
I've found Hashing Tables to Ensure Consistency in Postgres, Redshift and MySQL, but the solution still requires spelling out each column name and type, so it can't be applied to new tables in a generic manner; I'd have to manually change column names and types.
Is there some other function or method by which I could hash / checksum entire tables in order to confirm they are identical? Ideally without spelling out the specific column and column types of that table.
There is certainly no built-in capability in Redshift to hash whole tables.
Also, I'd be a little careful of the method suggested in that article because, from what I can see, it is calculating a hash of all the values in a column but isn't associating the hashed value with a row identifier. Therefore if Row 1 and Row 2 swapped values in a column, the hash wouldn't change. So, it's not strictly calculating an adequate hash (but I could be wrong!).
You could investigate using the new Stored Procedures in Redshift to see whether you can create a generic function that would work for any table.
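
As a rough Python sketch of the generic idea, assuming a psycopg2 connection and that every column can be cast to VARCHAR: pull the column list from the catalog, hash each whole row (so swapped values across rows change the result, addressing the caveat above), then sum the row hashes into an order-independent checksum.

import psycopg2

SCHEMA, TABLE = 'public', 'my_table'   # hypothetical names

# Fill in your own cluster's connection details.
conn = psycopg2.connect(host='my-cluster.example.com', port=5439,
                        dbname='dev', user='me', password='secret')
cur = conn.cursor()

# Fetch the column names generically, so nothing is spelled out by hand.
cur.execute("""
    SELECT column_name
    FROM information_schema.columns
    WHERE table_schema = %s AND table_name = %s
    ORDER BY ordinal_position
""", (SCHEMA, TABLE))
cols = [r[0] for r in cur.fetchall()]

# '|' delimiters keep ('ab','c') and ('a','bc') from hashing identically.
row_expr = " || '|' || ".join(f"COALESCE(CAST({c} AS VARCHAR), '')" for c in cols)
cur.execute(f"SELECT SUM(STRTOL(SUBSTRING(MD5({row_expr}), 1, 8), 16)) "
            f"FROM {SCHEMA}.{TABLE}")
print(cur.fetchone()[0])

Two tables with the same rows (in any order) should produce the same checksum; note it only uses the first 8 hex digits of each MD5, which is plenty for a consistency check but not cryptographically strong.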

Countif comparing dates in Tableau

I am trying to create a table that only counts the attendees of one type of training (rows) if they attended another particular training (column) AFTER the first one. I think I need to recreate a countif function that compares the dates of the trainings, but I'm not sure how to set this up so that it compares the dates of the row trainings and column trainings. Any ideas?
Edit 3/23
Alex, your solution would work if I had different variables for the dates of each type of training. Is there a way to construct this without having to create new variables for each type of training that I want to compare? Put another way, is there a way to refer to the rows and columns of the table in the formula that would compare the dates? So, something like "count if the start date of this column exceeds the start date of this row." (basically, is there something like the Excel index function in Tableau?)
It may help to see how my data is structured -- here is a scrubbed version: https://docs.google.com/spreadsheets/d/1YR1Wz-pfGHhBxDQDGYgmemLGoCK0cSvKOeE8w33ZI3s/edit?usp=sharing
The "table" tab shows the table that I'm trying to create in Tableau.
Define a calculated field for your condition, called say, trained_after, as:
training_b_date > training_a_date
trained_after will be true or false for each data row, depending on whether the B training was dated later than the A training.
If you want more precise control over the gap between the dates, use the DATEDIFF function, say DATEDIFF('hour', training_a_date, training_b_date) > 24 to require at least a day between the trainings.
That field may be all you need. You can put trained_after on the filter shelf to filter only to see data rows meeting the condition. Or put it on another shelf to partition the data according to that condition. Or use your field to create other calculated fields.
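For instance, a countif-style measure can be built directly on the same condition (same hypothetical field names as above):
COUNT(IF training_b_date > training_a_date THEN 1 END)
The IF returns 1 only when training B follows training A and nothing otherwise, so COUNT tallies exactly the rows meeting the condition, which reproduces Excel's COUNTIF.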
Realize that if either of your date fields is null, then your calculated field will evaluate to null for that row. Aggregate functions like SUM() and COUNT() ignore null values.