I am using a window function to get the difference in the values of a column (downloads) between two dates. I'd also like to get the product of that difference multiplied by the size of the file to get the bytes downloaded for the period.
With the help of this community, I am able to get the number of downloads but cannot find the correct syntax to get the product of downloads * size.
Table 'files'
+---------------+------------------------+------+-----------+------------+
| site | full_path | size | downloads | date_stamp |
+---------------+------------------------+------+-----------+------------+
| Lawrenceville | lr1/dir1/subdir1/file1 | 1000 | 7 | 2019-08-08 |
| Lawrenceville | lr1/dir1/subdir1/file1 | 1010 | 9 | 2019-08-15 |
| Lawrenceville | lr1/dir1/subdir1/file2 | 1213 | 5 | 2019-08-08 |
| Lawrenceville | lr1/dir1/subdir1/file2 | 2000 | 5 | 2019-08-15 |
| Lawrenceville | lr1/dir2/subdir1/file1 | 2213 | 5 | 2019-08-15 |
| Rennes | rr1/dir1/subdir1/file3 | 200 | 3 | 2019-08-08 |
| Rennes | rr1/dir1/subdir1/file3 | 201 | 4 | 2019-08-15 |
+---------------+------------------------+------+-----------+------------+
SELECT site, sum(diff)
FROM (
    SELECT
        site,
        downloads - lag(downloads, 1) OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff
    FROM files
    WHERE date_stamp IN ('2019-08-15', '2019-08-08')
)
group by site
produces this:
+---------------+-----------+
| site | downloads |
+---------------+-----------+
| Lawrenceville | 2 |
| Rennes | 1 |
+---------------+-----------+
I have tried:
SELECT site, sum(diff), sum(sum(diff)*bytes) FROM (SELECT site, downloads - lag(downloads, 1), size OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff, bytes FROM files WHERE date_stamp IN ('2019-08-15', '2019-08-08')) group by site
sqlite3.OperationalError: near "(": syntax error
Ideally I want this output:
+---------------+-----------+----------+
| site | downloads | bytes |
+---------------+-----------+----------+
| Lawrenceville | 2 | 2020 |
| Rennes | 1 | 201 |
+---------------+-----------+----------+
Lawrenceville had 2 downloads of file lr1/dir1/subdir1/file1 which is 1010 bytes (on 2019-08-15). File lr1/dir1/subdir1/file2 had no downloads for that period. It would be nice to include files lr1/dir1/subdir1/file2 and lr1/dir2/subdir1/file1 but they get excluded by the window function. I can get them with a separate query.
Rennes has 1 download of file rr1/dir1/subdir1/file3
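If your current query works, you only need to add a max() window function for size in the subquery and multiply diff by size on each row before summing, so that every file contributes its own size:
SELECT site, sum(diff) downloads, sum(diff * size) bytes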
FROM (
SELECT
site,
downloads - lag(downloads, 1) OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff,
max(size) OVER (PARTITION BY site, full_path) AS size
FROM files
WHERE date_stamp IN ('2019-08-15', '2019-08-08')
)
group by site
Results:
| site | downloads | bytes |
| ------------- | --------- | ----- |
| Lawrenceville | 2 | 2020 |
| Rennes | 1 | 201 |
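The query can be checked against the sample data with Python's built-in sqlite3 module. This is a minimal sketch, assuming the bundled SQLite is version 3.25 or newer (required for window functions). It multiplies diff by the file size on each row before summing, so every file contributes its own size. The names `file_size`, `QUERY` and `rows` are illustrative and do not come from the question.

```python
import sqlite3

# In-memory copy of the 'files' table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (site TEXT, full_path TEXT, size INTEGER, "
    "downloads INTEGER, date_stamp TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("Lawrenceville", "lr1/dir1/subdir1/file1", 1000, 7, "2019-08-08"),
        ("Lawrenceville", "lr1/dir1/subdir1/file1", 1010, 9, "2019-08-15"),
        ("Lawrenceville", "lr1/dir1/subdir1/file2", 1213, 5, "2019-08-08"),
        ("Lawrenceville", "lr1/dir1/subdir1/file2", 2000, 5, "2019-08-15"),
        ("Lawrenceville", "lr1/dir2/subdir1/file1", 2213, 5, "2019-08-15"),
        ("Rennes", "rr1/dir1/subdir1/file3", 200, 3, "2019-08-08"),
        ("Rennes", "rr1/dir1/subdir1/file3", 201, 4, "2019-08-15"),
    ],
)

# diff is NULL for the first row of each file, and sum() skips NULLs,
# so files with a single row in the period contribute nothing.
QUERY = """
SELECT site, sum(diff) AS downloads, sum(diff * file_size) AS bytes
FROM (
    SELECT
        site,
        downloads - lag(downloads, 1)
            OVER (PARTITION BY site, full_path ORDER BY date_stamp) AS diff,
        max(size) OVER (PARTITION BY site, full_path) AS file_size
    FROM files
    WHERE date_stamp IN (?, ?)
)
GROUP BY site
ORDER BY site
"""

rows = conn.execute(QUERY, ("2019-08-08", "2019-08-15")).fetchall()
for site, downloads, total_bytes in rows:
    print(site, downloads, total_bytes)
```

Passing the two dates as parameters keeps the statement reusable for other periods.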
I am trying to build a Power BI report for data from a SQL database where I have to show detail pages using Drillthrough. The only viable way to connect the datasets is using the database row ids.
From a user's perspective the row ids would not add any value but a lot of noise.
Is there a way to drillthrough using the row ids without showing them in a visual?
Yes, this is possible in the current release of Power BI Desktop using a workaround that involves hiding the row id column on the parent (or summary) page.
Take the following tables as example:
ALBUM
+---------+------------------------+
| AlbumId | AlbumName |
+---------+------------------------+
| 1 | Hoist |
+---------+------------------------+
| 2 | The Story Of the Ghost |
+---------+------------------------+
TRACK
+---------+---------+--------------------------+
| TrackId | AlbumId | TrackName |
+---------+---------+--------------------------+
| 1 | 1 | Julius |
+---------+---------+--------------------------+
| 2 | 1 | Down With Disease |
+---------+---------+--------------------------+
| 3 | 1 | If I Could |
+---------+---------+--------------------------+
| 4 | 1 | Riker's Mailbox |
+---------+---------+--------------------------+
| 5 | 1 | Axilla, Part II |
+---------+---------+--------------------------+
| 6 | 1 | Lifeboy |
+---------+---------+--------------------------+
| 7 | 1 | Sample In a Jar |
+---------+---------+--------------------------+
| 8 | 1 | Wolfmans Brother |
+---------+---------+--------------------------+
| 9 | 1 | Scent of a Mule |
+---------+---------+--------------------------+
| 10 | 1 | Dog Faced Boy |
+---------+---------+--------------------------+
| 11 | 1 | Demand |
+---------+---------+--------------------------+
| 12 | 2 | Ghost |
+---------+---------+--------------------------+
| 13 | 2 | Birds of a Feather |
+---------+---------+--------------------------+
| 14 | 2 | Meat |
+---------+---------+--------------------------+
| 15 | 2 | Guyute |
+---------+---------+--------------------------+
| 16 | 2 | Fikus |
+---------+---------+--------------------------+
| 17 | 2 | Shafty |
+---------+---------+--------------------------+
| 18 | 2 | Limb by Limb |
+---------+---------+--------------------------+
| 19 | 2 | Frankie Says |
+---------+---------+--------------------------+
| 20 | 2 | Brian and Robert |
+---------+---------+--------------------------+
| 21 | 2 | Water in the Sky |
+---------+---------+--------------------------+
| 22 | 2 | Roggae |
+---------+---------+--------------------------+
| 23 | 2 | Wading in the Velvet Sea |
+---------+---------+--------------------------+
| 24 | 2 | The Moma Dance |
+---------+---------+--------------------------+
| 25 | 2 | End of Session |
+---------+---------+--------------------------+
Add both tables as data sources and create the one-to-many relationship on AlbumId (one ALBUM row to many TRACK rows). Create a parent page with a table containing AlbumId and AlbumName, then create the details page with a table containing only the TrackName column. On the details page, drag AlbumId from the ALBUM table into the Drillthrough filter field.
Now go back to the parent page: when you right-click an album, you get the drillthrough menu to the details page. This works, but you now have a messy AlbumId column on your parent page.
The workaround is to hide the AlbumId column on the parent page. First open the Format (paint roller) menu of the table on the parent page and turn off the column header -> word wrap setting. Then drag the column separator of the table until the AlbumId column is hidden. See the before and after images below.
BEFORE HIDE
AFTER HIDE
I have the Power BI file posted here if you want to see it in action.
My application creates a log file every 10 minutes, which I want to store in DynamoDB in aggregated form, e.g. 144 log files per day, 1008 log files per week or ~4400 log files per month.
I have different partition keys, but for the sake of simplicity I have used only a single partition key in the following examples.
The straightforward solution would be to have different tables, e.g.
Table "TenMinLogsDay":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-04-30 | 144 | some serialized aggregated data
1 | 2017-05-01 | 144 | some serialized aggregated data
1 | 2017-05-02 | 144 | some serialized aggregated data
1 | 2017-05-03 | 144 | some serialized aggregated data
Table "TenMinLogsWeek":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 1008 | some serialized aggregated data
1 | 2017-05-08 | 1008 | some serialized aggregated data
1 | 2017-05-15 | 1008 | some serialized aggregated data
Table "TenMinLogsMonth":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 4464 | some serialized aggregated data
1 | 2017-06-01 | 4320 | some serialized aggregated data
1 | 2017-07-01 | 4464 | some serialized aggregated data
I would prefer however a combined table. Out of the box DynamoDB does not seem to support this.
Also, I want to query either the daily OR the weekly OR the monthly aggregated items, thus I don't want to use the filter feature for this.
The following solution would be possible, but seems like a poor hack:
Table "TenMinLogsCombined":
id (=part.key) | date (=sort key) | week (=LSI sort key) | month (=LSI sort key) | cntTenMinLogs | data
-------------- | ---------------- | -------------------- | --------------------- | ------------- | -----
1 | 2017-04-30 | (empty) | (empty) | 144 | ...
1 | 2017-05-01 | (empty) | (empty) | 144 | ...
1 | 0017-05-01 | 2017-05-01 | (empty) | 1008 | ...
1 | 1017-05-01 | (empty) | 2017-05-01 | 4464 | ...
1 | 2017-05-02 | (empty) | (empty) | 144 | ...
1 | 2017-05-03 | (empty) | (empty) | 144 | ...
Explanation:
By using the years "0017" and "1017" instead of "2017", I can query a date range such as 2017-05-01 to 2017-05-04 and DynamoDB won't read the items whose sort key starts with 0017 or 1017, because those keys sort outside the requested range.
For week or month range queries, such a hack is not required, as empty LSI sort keys are possible.
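Why the year prefix keeps the weekly and monthly items out of a daily range can be illustrated in plain Python, without DynamoDB. This is only a sketch: it assumes string sort keys compare lexicographically, which for these ASCII date strings matches Python's string comparison. The `between` helper is hypothetical and merely imitates an inclusive range condition on the sort key.

```python
# Sort keys from the "TenMinLogsCombined" example, in table order.
sort_keys = [
    "2017-04-30",
    "2017-05-01",
    "0017-05-01",  # weekly item, stored under the fake year 0017
    "1017-05-01",  # monthly item, stored under the fake year 1017
    "2017-05-02",
    "2017-05-03",
]


def between(keys, low, high):
    """Return the keys with low <= key <= high, in sorted order."""
    return [key for key in sorted(keys) if low <= key <= high]


# A daily range query never touches the 0017/1017 keys, because they
# sort before every key that starts with "2017".
daily = between(sort_keys, "2017-05-01", "2017-05-04")
print(daily)
```

The same comparison shows the limit of the hack: any range that starts below "1017" would pick up the aggregated items again.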
Does anybody know of a better way to achieve this?
I'm new to Django, and I'm trying to create a simple application that keeps track of different server configurations in a SQLite database. I've created 2 database models:
from django.db import models

class Server(models.Model):
    name = models.CharField(max_length=250)

class Config(models.Model):
    server = models.ForeignKey(Server, on_delete=models.CASCADE)
    configuration = models.CharField(max_length=250)
    config_version = models.IntegerField()
Here is sample data for the two models:
Server:
| id | name |
| ------ | ------ |
| 1 | Server1 |
| 2 | Server2 |
| 3 | Server3 |
Config:
| id | configuration | config_version | server |
| ------ | ------------- | -------------- | ------ |
| 1 | srv1_cfg1 | 1 | 1 |
| 2 | srv2_cfg1 | 1 | 2 |
| 3 | srv2_cfg2 | 2 | 2 |
| 4 | srv2_cfg3 | 3 | 2 |
| 5 | srv3_cfg1 | 1 | 3 |
| 6 | srv1_cfg2 | 2 | 1 |
| 7 | srv1_cfg3 | 3 | 1 |
I would like to query the Config table, and get only rows with the maximum value of "config_version" field for each server id, like:
Desired result:
| id | configuration | config_version | serverid | servername |
| ------ | ------------- | -------------- | -------- | ---------- |
| 4 | srv2_cfg3 | 3 | 2 | Server2 |
| 5 | srv3_cfg1 | 1 | 3 | Server3 |
| 7 | srv1_cfg3 | 3 | 1 | Server1 |
I've tried many different options to construct the correct query, but so far I cannot get what I want. My best result is to query the Server table:
Server.objects.annotate(maxver=Max('config__config_version'))
But it seems I cannot get access to the Config table objects, so I guess I need to query the Config table with some filtering?
I can do this with a raw SQL query, but I would strongly prefer to do it the "Django" way. Any help will be much appreciated.
After struggling with this some more, I came up with a solution that works for me. I'm sure it is not optimal, but at least it seems to work:
from django.db.models import Max, F
s1 = Config.objects.annotate(maxver=Max('server__config__config_version'))
config_list = s1.filter(config_version=F('maxver'))
If there is a better way to do this, I would love to know it.
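As a cross-check on the desired result, the same "latest config per server" selection can be written in plain SQL and run with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names `server` and `config` and the column `server_id` are simplified stand-ins for whatever names Django generates, and the statement is not claimed to be the SQL that the ORM query above produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE server (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE config (
    id INTEGER PRIMARY KEY,
    configuration TEXT,
    config_version INTEGER,
    server_id INTEGER REFERENCES server(id)
);
INSERT INTO server VALUES (1, 'Server1'), (2, 'Server2'), (3, 'Server3');
INSERT INTO config VALUES
    (1, 'srv1_cfg1', 1, 1), (2, 'srv2_cfg1', 1, 2), (3, 'srv2_cfg2', 2, 2),
    (4, 'srv2_cfg3', 3, 2), (5, 'srv3_cfg1', 1, 3), (6, 'srv1_cfg2', 2, 1),
    (7, 'srv1_cfg3', 3, 1);
""")

# Keep a config row only if its version equals the highest version
# recorded for the same server (correlated subquery).
LATEST_PER_SERVER = """
SELECT c.id, c.configuration, c.config_version, s.id, s.name
FROM config AS c
JOIN server AS s ON s.id = c.server_id
WHERE c.config_version = (
    SELECT max(c2.config_version)
    FROM config AS c2
    WHERE c2.server_id = c.server_id
)
ORDER BY c.id
"""

latest = conn.execute(LATEST_PER_SERVER).fetchall()
for row in latest:
    print(row)
```

If two rows of one server shared the same highest version, both would be returned; the sample data has no such tie.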
I have a complex result that requires writing raw sql queries.
See https://stackoverflow.com/a/38548462/80353
The expected result is a table showing several columns.
The first column header is simply Product and the other column headers are store names.
The values are simply the product names and the aggregated sales values of the product in these stores.
Which stores will be shown is entirely dynamic. Maximum should be 9 stores.
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
For more details of the schema, check the question in How to get back aggregate values across 2 dimensions using Python Cubes?
My question
The schema is not super important to my question which is:
Since I am going to write a complex raw query, is there a way to map the query result to a model where the fields are dynamic?
I found documentation about how to execute raw queries in Django and how to execute raw queries to existing models with fixed fields and matching table.
Is it possible to do that for a model that has no matching table and has dynamic fields?
If so, how?
Or, if I choose to use a materialised view in PostgreSQL, how do I match it with a model class?
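One model-free option for dynamic columns is to read the column names from the cursor and return each row as a dict. The sketch below uses Python's built-in sqlite3 module and a cut-down copy of the sample tables so that it runs on its own; the function name `sales_by_store` is hypothetical. The `cursor.description` attribute it relies on is part of the Python DB-API, so the same mapping step should carry over to a cursor obtained from Django's `django.db.connection`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (id INTEGER PRIMARY KEY, code TEXT, address TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
CREATE TABLE sales (id INTEGER PRIMARY KEY, product_id INTEGER,
                    store_id INTEGER, amount REAL);
INSERT INTO store VALUES (1, 'S1', 'Kings Row'), (2, 'S2', 'Queens Street'),
                         (3, 'S3', 'Jacks Place');
INSERT INTO product VALUES (1, 'P1', 'Saucer 12'), (2, 'P2', 'Plate 15');
INSERT INTO sales VALUES (1, 1, 1, 7.05), (2, 1, 2, 9.00),
                         (3, 2, 3, 1.00), (4, 2, 3, 1.00);
""")


def sales_by_store(conn, store_codes):
    """Return one dict per product, with one key per requested store code."""
    # Resolve the codes first, so that only values read back from the
    # database are used as column aliases.
    marks = ", ".join("?" for _ in store_codes)
    stores = conn.execute(
        f"SELECT id, code FROM store WHERE code IN ({marks}) ORDER BY code",
        list(store_codes),
    ).fetchall()

    # One conditional sum per store; store ids are bound as parameters.
    columns = ["p.name AS product"]
    for _, code in stores:
        alias = code.replace('"', '""')  # escape quotes inside the identifier
        columns.append(
            "COALESCE(SUM(CASE WHEN s.store_id = ? THEN s.amount END), 0) "
            f'AS "{alias}"'
        )

    sql = (
        f"SELECT {', '.join(columns)} "
        "FROM product AS p LEFT JOIN sales AS s ON s.product_id = p.id "
        "GROUP BY p.id, p.name ORDER BY p.id"
    )
    cursor = conn.execute(sql, [store_id for store_id, _ in stores])
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


rows = sales_by_store(conn, ["S1", "S2", "S3"])
for row in rows:
    print(row)
```

Because each row is a plain dict keyed by store code, a template or paginator can iterate over whichever stores were requested without a model class.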
I have two models, Version and Description.
class Version(models.Model):
    version_name = models.CharField(max_length=100)
    version_value = models.IntegerField()
    url = models.CharField(max_length=240)

class Description(models.Model):
    version = models.ForeignKey(Version)
    lang = models.CharField(max_length=8)
    content = models.TextField()
And a DescriptionSerializer.
class DescriptionSerializer(serializers.ModelSerializer):
    version_name = serializers.RelatedField(source='version')

    class Meta:
        model = Description
        fields = ('version_name', 'content')
They store the descriptions of different versions in different languages.
E.g.
Version
+----+--------------+---------------+---------------------+
| id | version_name | version_value | url |
+----+--------------+---------------+---------------------+
| 1 | 1.0.0 | 1 | http://abc.net.tw/ |
| 2 | 1.0.1 | 2 | http://abc.net.tw/2 |
| 3 | 1.0.2 | 3 | http://abc.net.tw/3 |
| 4 | 1.0.3 | 4 | http://abc.net.tw/4 |
| 7 | 1.1.0 | 5 | http://abc.net.tw/5 |
| 8 | 1.1.1 | 6 | http://abc.net.tw/6 |
+----+--------------+---------------+---------------------+
Description
+------------+-------+---------+
| version_id | lang | content |
+------------+-------+---------+
| 1 | en_US | English |
| 1 | zh_TW | Chinese |
| 1 | es_ES | Spanish |
| 2 | en_US | English |
| 2 | zh_TW | Chinese |
| 2 | es_ES | Spanish |
| 3 | en_US | English |
| 3 | zh_TW | Chinese |
| 3 | es_ES | Spanish |
| 4 | en_US | English |
| 7 | en_US | English |
| 8 | en_US | English |
| 4 | es_ES | Spanish |
| 7 | es_ES | Spanish |
+------------+-------+---------+
I'm using Django REST framework to implement a web API that returns the description of each version in a given language. If no description exists in the requested language, the English version should be used instead.
I can use the following SQL to retrieve the desired result. I've read DRF's docs on RelatedField and reverse relations, but I still can't figure out how to do the same thing with Django's ORM and use it with a Django REST framework serializer.
select
coalesce(d.id, d2.id), coalesce(d.version_id, d2.version_id), coalesce(d.lang, d2.lang), coalesce(d.content, d2.content)
from
version v
left outer join description d on v.id = d.version_id and d.lang='zh_TW'
left outer join description d2 on v.id = d2.version_id and d2.lang='en_US'
Please advise how to do it in django.
You can't use the Django ORM for everything; there are numerous things it can't do. For those cases you either use straight SQL (from django.db import connection, transaction etc.) or, if the query results can be mapped onto the models you have described, you can use raw queries.
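To see what the fallback query returns for the sample data, it can be run with Python's built-in sqlite3 module. This is a sketch under assumptions: the `id` column on `description` is added because the question's query selects it, the function name `descriptions_for` is hypothetical, and the requested language is bound as a parameter instead of being hard-coded. Inside Django the same statement could be sent through a `django.db.connection` cursor, where the placeholder is written `%s` instead of `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE version (id INTEGER PRIMARY KEY, version_name TEXT,
                      version_value INTEGER, url TEXT);
CREATE TABLE description (id INTEGER PRIMARY KEY, version_id INTEGER,
                          lang TEXT, content TEXT);
INSERT INTO version VALUES
    (1, '1.0.0', 1, 'http://abc.net.tw/'),  (2, '1.0.1', 2, 'http://abc.net.tw/2'),
    (3, '1.0.2', 3, 'http://abc.net.tw/3'), (4, '1.0.3', 4, 'http://abc.net.tw/4'),
    (7, '1.1.0', 5, 'http://abc.net.tw/5'), (8, '1.1.1', 6, 'http://abc.net.tw/6');
INSERT INTO description (version_id, lang, content) VALUES
    (1, 'en_US', 'English'), (1, 'zh_TW', 'Chinese'), (1, 'es_ES', 'Spanish'),
    (2, 'en_US', 'English'), (2, 'zh_TW', 'Chinese'), (2, 'es_ES', 'Spanish'),
    (3, 'en_US', 'English'), (3, 'zh_TW', 'Chinese'), (3, 'es_ES', 'Spanish'),
    (4, 'en_US', 'English'), (7, 'en_US', 'English'), (8, 'en_US', 'English'),
    (4, 'es_ES', 'Spanish'), (7, 'es_ES', 'Spanish');
""")

# d holds the description in the requested language (NULL if missing),
# d2 holds the English one; coalesce picks d when it exists.
FALLBACK_QUERY = """
SELECT v.version_name,
       coalesce(d.lang, d2.lang) AS lang,
       coalesce(d.content, d2.content) AS content
FROM version AS v
LEFT OUTER JOIN description AS d
    ON v.id = d.version_id AND d.lang = ?
LEFT OUTER JOIN description AS d2
    ON v.id = d2.version_id AND d2.lang = 'en_US'
ORDER BY v.id
"""


def descriptions_for(conn, lang):
    """Return (version_name, lang, content) per version, English as fallback."""
    return conn.execute(FALLBACK_QUERY, (lang,)).fetchall()


for row in descriptions_for(conn, "zh_TW"):
    print(row)
```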