I have a CloudWatch query that creates a table of output that looks something like:
id | name | age
1313 | Sam | 24
1313 | Sam | 24
1313 | Sam | 24
1481 | David | 62
1481 | David | 62
3748 | Sarah | 37
3748 | Sarah | 37
3748 | Sarah | 37
1481 | David | 62
(All example values)
Is there a way to have CloudWatch automatically deduplicate its output, so I just see:
id | name | age
1313 | Sam | 24
1481 | David | 62
3748 | Sarah | 37
You can compute an aggregated value across these three fields and then drop it, keeping just the three fields. For example:
YOUR CURRENT QUERY | stats count(*) by id, name, age | display id, name, age
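If you run such a query programmatically rather than in the console, a minimal boto3 sketch looks like the following (the log group name and time range are placeholders, and the leading fields command stands in for YOUR CURRENT QUERY):
import time
import boto3

logs = boto3.client("logs")

# Start the deduplicating Logs Insights query; "/my/app/logs" is a placeholder.
query = logs.start_query(
    logGroupName="/my/app/logs",
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString="fields id, name, age"
                " | stats count(*) by id, name, age"
                " | display id, name, age",
)

# Poll until the query finishes, then print the deduplicated rows.
results = logs.get_query_results(queryId=query["queryId"])
while results["status"] in ("Scheduled", "Running"):
    time.sleep(1)
    results = logs.get_query_results(queryId=query["queryId"])
for row in results["results"]:
    print({f["field"]: f["value"] for f in row})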
I have data showing me the dates grouped like this:
I had to remove the Customer Description detail for confidentiality reasons.
How do I repeat the date column the same way you repeat the Row Labels in an Excel Pivot?
I've looked but couldn't find a solution to this; the option should be available.
EDIT
When you have the following source data in Excel:
Date | Customer | Item Description | Qty Out | Unit Price | Sales
--------------------------------------------------------------------------------------------------------------------------------------------
14/08/2020 | Customer 1 | Item 11 | 4.00 | 65.00 | 260.00
14/08/2020 | Customer 2 | Item 12 | 56.00 | 12.00 | 672.00
14/08/2020 | Customer 3 | Item 13 | 64.00 | 35.00 | 2,240.00
14/08/2020 | Customer 4 | Item 14 | 29.00 | 65.00 | 1,885.00
15/08/2020 | Customer 2 | Item 15 | 746.00 | 12.00 | 8,952.00
15/08/2020 | Customer 3 | Item 16 | 14.00 | 75.00 | 1,050.00
15/08/2020 | Customer 4 | Item 17 | 45.00 | 741.00 | 33,345.00
15/08/2020 | Customer 5 | Item 18 | 456.00 | 125.00 | 57,000.00
15/08/2020 | Customer 6 | Item 19 | 925.00 | 17.00 | 15,725.00
16/08/2020 | Customer 4 | Item 20 | 6.00 | 532.00 | 3,192.00
16/08/2020 | Customer 5 | Item 21 | 56.00 | 94.00 | 5,264.00
16/08/2020 | Customer 6 | Item 22 | 546.00 | 37.00 | 20,202.00
You then pivot this data using Microsoft Excel, where you get the following:
You then choose the option to Repeat Item Labels as can be seen below:
After selecting this, you get the results I require in Power BI:
Is there not a function available like this in Power BI?
Just adding this for your reference as a workaround: create a custom column in the Power Query Editor, as in the image below -
date_customer = Date.ToText([Date]) & " : " & [Customer]
Then add both Date and date_customer at the Matrix row level. The output (using your sample data) is as below -
ANOTHER OPTION: add both Date and Customer to the Matrix rows; the output (using your sample data) will be as below -
This is also a meaningful output, since the dates show as group headers. But if the requirement is to show the repeated dates, you can consider the first option.
I am working with a Stata dataset that tracks a company's contract year.
However, some observations are systematically missing the year:
Is there code I could quickly run to replace the missing year with the year from the previous observation?
The following works for me:
clear
input var year
564 2029
597 2029
653 .
342 2041
456 2041
end
replace year = year[_n-1] if missing(year)
list
+------------+
| var year |
|------------|
1. | 564 2029 |
2. | 597 2029 |
3. | 653 2029 |
4. | 342 2041 |
5. | 456 2041 |
+------------+
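As an aside, if you ever need the same fill-down outside Stata, pandas has a direct equivalent; a minimal sketch using the example data above (missing values become NaN in pandas):
import pandas as pd

# Same example data; the missing year is represented as None/NaN.
df = pd.DataFrame({"var": [564, 597, 653, 342, 456],
                   "year": [2029, 2029, None, 2041, 2041]})

# Forward-fill carries the previous observation's year into the gap,
# and also handles runs of consecutive missing values.
df["year"] = df["year"].ffill().astype(int)
print(df)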
My application creates a log file every 10 minutes, which I want to store in DynamoDB in an aggregated way, e.g. 144 log files per day, 1008 log files per week, or ~4400 log files per month.
I have different partition keys, but for sake of simplicity I have used only a single partition key in the following examples.
The straightforward solution would be to have different tables, e.g.
Table "TenMinLogsDay":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-04-30 | 144 | some serialized aggregated data
1 | 2017-05-01 | 144 | some serialized aggregated data
1 | 2017-05-02 | 144 | some serialized aggregated data
1 | 2017-05-03 | 144 | some serialized aggregated data
Table "TenMinLogsWeek":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 1008 | some serialized aggregated data
1 | 2017-05-08 | 1008 | some serialized aggregated data
1 | 2017-05-15 | 1008 | some serialized aggregated data
Table "TenMinLogsMonth":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 4464 | some serialized aggregated data
1 | 2017-06-01 | 4320 | some serialized aggregated data
1 | 2017-07-01 | 4464 | some serialized aggregated data
However, I would prefer a single combined table, which DynamoDB does not seem to support out of the box.
Also, I want to query either the daily OR the weekly OR the monthly aggregated items, thus I don't want to use the filter feature for this.
The following solution would be possible, but seems like a poor hack:
Table "TenMinLogsCombined":
id (=part.key) | date (=sort key) | week (=LSI sort key) | month (=LSI sort key) | cntTenMinLogs | data
-------------- | ---------------- | -------------------- | --------------------- | ------------- | -----
1 | 2017-04-30 | (empty) | (empty) | 144 | ...
1 | 2017-05-01 | (empty) | (empty) | 144 | ...
1 | 0017-05-01 | 2017-05-01 | (empty) | 1008 | ...
1 | 1017-05-01 | (empty) | 2017-05-01 | 4464 | ...
1 | 2017-05-02 | (empty) | (empty) | 144 | ...
1 | 2017-05-03 | (empty) | (empty) | 144 | ...
Explanation:
By using the years "0017" and "1017" instead of "2017", I can query a date range such as 2017-05-01 to 2017-05-04, and DynamoDB won't read the weekly and monthly items, since their sort keys start with 0017 or 1017.
For week or month range queries, such a hack is not required, as empty LSI sort keys are possible.
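To make the layout concrete, here is a minimal boto3 sketch of the daily and weekly query paths against the combined table (the table and index names are assumptions):
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("TenMinLogsCombined")

# Daily items: the 0017-/1017-prefixed weekly and monthly rows fall
# outside this range, so they are never read.
days = table.query(
    KeyConditionExpression=Key("id").eq(1)
    & Key("date").between("2017-05-01", "2017-05-04"))

# Weekly items via the LSI on "week"; items without that attribute
# simply never appear in the index. "week-index" is a placeholder name.
weeks = table.query(
    IndexName="week-index",
    KeyConditionExpression=Key("id").eq(1)
    & Key("week").between("2017-05-01", "2017-05-31"))

print(days["Items"], weeks["Items"])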
Does anybody know of a better way to achieve this?
Say I have a data set of country GDPs formatted like this:
---------------------------------
| Year | Country A | Country B |
| 1990 | 128 | 243 |
| 1991 | 130 | 212 |
| 1992 | 187 | 207 |
How would I use Stata's reshape command to change this into a long table with country-year rows, like the following?
----------------------
| Country| Year | GDP |
| A | 1990 | 128 |
| A | 1991 | 130 |
| A | 1992 | 187 |
| B | 1990 | 243 |
| B | 1991 | 212 |
| B | 1992 | 207 |
It is recommended that you try to solve the problem on your own first. Although you might have tried, you show no sign that you did. For future questions, please post the code you attempted, and why it didn't work for you.
The following gives what you ask for:
clear all
set more off
input ///
Year CountryA CountryB
1990 128 243
1991 130 212
1992 187 207
end
list
reshape long Country, i(Year) j(country) string
rename Country GDP
order country Year GDP
sort country Year
list, sep(0)
Note: you need the string option here because your stub suffixes are strings (i.e. "A" and "B"). See help reshape for the details.
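For comparison, the same wide-to-long reshape in pandas, with column names taken from the example (a sketch, not part of the Stata answer):
import pandas as pd

df = pd.DataFrame({"Year": [1990, 1991, 1992],
                   "CountryA": [128, 130, 187],
                   "CountryB": [243, 212, 207]})

# melt stacks the CountryA/CountryB columns into country-year rows.
long = df.melt(id_vars="Year", var_name="country", value_name="GDP")
long["country"] = long["country"].str.replace("Country", "", regex=False)
long = long.sort_values(["country", "Year"]).reset_index(drop=True)
print(long)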
I'm dynamically building a Django query and am receiving unexpected results depending on the slice parameters. For example, if I request queryset[0:10] and queryset[10:20], I receive some of the same items in query2 that I found in query1.
Searching around, the issue I'm facing appears similar to:
Simple Djanqo Query generating confusing Queryset results
except I am defining an order_by for my query, so it doesn't appear to be an exact match.
Viewing queryset.query for my two queries....
queryset[0:10] generates:
SELECT "intercache_localinventorycountsummary"."id",
"intercache_localinventorycountsummary"."part",
"intercache_localinventorycountsummary"."site",
"intercache_localinventorycountsummary"."location",
"intercache_localinventorycountsummary"."hadTransactionsDuring"
FROM "intercache_localinventorycountsummary"
ORDER BY "intercache_localinventorycountsummary"."hadTransactionsDuring" DESC
LIMIT 10
queryset[10:20] generates:
SELECT "intercache_localinventorycountsummary"."id",
"intercache_localinventorycountsummary"."part",
"intercache_localinventorycountsummary"."site",
"intercache_localinventorycountsummary"."location",
"intercache_localinventorycountsummary"."hadTransactionsDuring"
FROM "intercache_localinventorycountsummary"
ORDER BY "intercache_localinventorycountsummary"."hadTransactionsDuring" DESC
LIMIT 10 OFFSET 10
Per request, I've listed the literal SQL generated by Django and run it manually against the DB.
Results for Query1:
id | part | site | location | hadTransactionsDuring
------+---------+------+----------+-----------------------
2787 | 2217-1 | 01 | Bluebird | t
2839 | 2215 | 01 | 2600 FG | t
2558 | R4367 | 01 | 2600 Raw | t
2637 | 4453 | 01 | 2600 FG | t
2810 | 1000 | 01 | 2600 FG | t
2531 | 3475 | 01 | 2600 FG | t
2526 | 4596Z | 01 | 2550 FG | t
2590 | 3237-12 | 01 | 2600 Raw | t
3077 | 4841Y | 01 | 2600 FG | t
2919 | 3407 | 01 | 2600 FG | t
Results for Query2:
id | part | site | location | hadTransactionsDuring
------+--------------+------+----------+-----------------------
2598 | 2217-2 | 01 | 2600 Raw | t
2578 | 2216-5 | 01 | 2600 Raw | t
2531 | 3475 | 01 | 2600 FG | t
3010 | 3919 | 01 | 2600 FG | t
2558 | R4367 | 01 | 2600 Raw | t
2637 | 4453 | 01 | 2600 FG | t
2526 | 4596Z | 01 | 2550 FG | t
2590 | 3237-12 | 01 | 2600 Raw | t
2570 | R3760-BRN-GS | 01 | 2600 Raw | f
2569 | 4098 | 01 | 2600 FG | f
(You can see id's 2558, 2637, 2526, 2590 are returned for both queries)
Any guesses what I'm doing wrong here? It seems I must be fundamentally misunderstanding something about how QuerySet slicing works.
Update:
The DB schema is as follows... are result orderings unreliable when ordering by non-indexed fields, perhaps?
\d intercache_localinventorycountsummary
Table "public.intercache_localinventorycountsummary"
Column | Type | Modifiers
-----------------------+--------------------------+------------------------------------------------------------------------------------
id | integer | not null default nextval('intercache_localinventorycountsummary_id_seq'::regclass)
_domain_id | integer |
_created | timestamp with time zone | not null
_synced | timestamp with time zone |
_active | boolean | not null default true
dirty | boolean | not null default true
lastRefresh | timestamp with time zone |
part | character varying(18) | not null
site | character varying(8) | not null
location | character varying(8) | not null
quantity | numeric(16,9) |
startCount | timestamp with time zone |
endCount | timestamp with time zone |
erpCountQOH | numeric(16,9) |
hadTransactionsDuring | boolean | not null default false
quantityChangeSince | numeric(16,9) |
hadManualDating | boolean | not null
variance | numeric(16,9) |
unitCost | numeric(16,9) |
countCost | numeric(16,9) |
varianceCost | numeric(16,9) |
Indexes:
"intercache_localinventorycountsummary_pkey" PRIMARY KEY, btree (id)
"intercache_localinventorycount__domain_id_5691b6f8cca017dc_uniq" UNIQUE CONSTRAINT, btree (_domain_id, part, site, location)
"intercache_localinventorycountsummary__active" btree (_active)
"intercache_localinventorycountsummary__domain_id" btree (_domain_id)
"intercache_localinventorycountsummary__synced" btree (_synced)
Foreign-key constraints:
"_domain_id_refs_id_163d40e6b21ac0f9" FOREIGN KEY (_domain_id) REFERENCES intercache_domain(id) DEFERRABLE INITIALLY DEFERRED
The problem lies with this:
ORDER BY "intercache_localinventorycountsummary"."hadTransactionsDuring" DESC
Apparently you've overridden the ordering, either explicitly in the query or in the model's Meta options (see Model Meta options: ordering).
If you want to order by hadTransactionsDuring but have predictable ordering, you should add a second ordering key that resolves ties where the first one has the same value. Because hadTransactionsDuring is a boolean, most rows tie on it, and the database is free to return tied rows in a different order for each query, which is why your two pages overlap. For example:
queryset.order_by("-hadTransactionsDuring", "id")
Keep in mind that RDBMSes, be it PostgreSQL or MySQL, never guarantee any order at all unless it is explicitly specified with ORDER BY. Most queries happen to return rows in primary-key order, but that is a happy coincidence of the internal table storage implementation rather than something you can rely on. In other words, you cannot assume a Django queryset is ordered on any field besides the ones you've specified in order_by.
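If the ordering came from the model's Meta options rather than the query, the tie-breaker belongs there instead; a sketch with the model name and fields reconstructed from the schema shown above:
from django.db import models

class LocalInventoryCountSummary(models.Model):
    part = models.CharField(max_length=18)
    site = models.CharField(max_length=8)
    location = models.CharField(max_length=8)
    hadTransactionsDuring = models.BooleanField(default=False)

    class Meta:
        # "id" breaks ties among the many rows sharing the same boolean
        # value, so LIMIT/OFFSET pages no longer overlap.
        ordering = ["-hadTransactionsDuring", "id"]

With the tie-breaker in place, queryset[0:10] and queryset[10:20] return disjoint pages, because every row now has a unique position in the sort.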