Postgres - How to Join Tables without duplicates - django

I'm working on a project that used SQLite locally. When I moved it to Postgres (on Heroku), my query reported the error "r.social must appear in the GROUP BY clause or be used in an aggregate function".
The original query is:
SELECT DISTINCT c.name, r.social, c.description, p.price
FROM cryptomodels_coin c
LEFT JOIN cryptomodels_coinprice p
  ON p.coin_id = c.name
LEFT JOIN cryptomodels_coinrating r
  ON r.coin_id = c.name
GROUP BY c.name
This works fine locally, returning one unique row for each coin.
When I ran it in the Postgres environment, it threw the aggregate-function error mentioned above. I managed to resolve this by adding all the selected columns to the GROUP BY clause, as seen below:
SELECT DISTINCT c.name, r.social, c.description, p.price
FROM cryptomodels_coin c
LEFT JOIN cryptomodels_coinprice p
  ON p.coin_id = c.name
LEFT JOIN cryptomodels_coinrating r
  ON r.coin_id = c.name
GROUP BY c.name, r.social, c.description, p.price
The issue is that I now have duplicate rows for each coin.
I've done a fair bit of reading and tried numerous solutions; some throw errors and others still produce duplicate rows. I'm really not sure how to proceed. Thank you for any assistance.
EDIT for additional information:
Each coin has numerous prices and numerous ratings; the cryptomodels_coin table is referenced by the other tables, which use its name as "coin_id". Three coins, for example:
Coin table:
| Name |
|------|
| 0X   |
| XSV  |
| BTC  |
Price table:
| Coin_id | Price |
|---------|-------|
| 0X      | 43.2  |
| XSV     | 20.0  |
| BTC     | 99999 |
Rating table:
| Coin_id | Social |
|---------|--------|
| 0X      | 20,000 |
| XSV     | 12,000 |
| BTC     | 5,0000 |
EDIT 2:
CREATE TABLE "cryptomodels_coin" (
"name" varchar(200) NOT NULL PRIMARY KEY,
"description" text NOT NULL);
CREATE TABLE "cryptomodels_coinprice" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"price" real NULL,
"coin_id" varchar(200) NOT NULL REFERENCES "cryptomodels_coin" ("name") );
CREATE TABLE "cryptomodels_coinrating" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"social" text NULL, "coin_id" varchar(200) NOT NULL REFERENCES "cryptomodels_coin" ("name"));
Added SQLFiddle:
http://sqlfiddle.com/#!15/9fcff/1
Thanks!

I guess something like this would eliminate duplicates as you wish:
SELECT c.name AS name,
       r.social AS social,
       c.description AS description,
       SUM(p.price) AS price
FROM cryptomodels_coin c
LEFT JOIN cryptomodels_coinprice p ON p.coin_id = c.name
LEFT JOIN cryptomodels_coinrating r ON r.coin_id = c.name
GROUP BY c.name, r.social, c.description
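If a coin can also have several ratings, grouping by r.social still fans the result out to one row per rating. A sketch that pre-aggregates each child table before joining, assuming you want the summed price and (arbitrarily) the highest social rating per coin:
-- Aggregate prices and ratings per coin first, then join: each derived
-- table is already one row per coin_id, so no outer GROUP BY is needed.
-- SUM for price and MAX for social are assumptions; swap in whatever
-- aggregates (AVG, MIN, ...) match your reporting needs.
SELECT c.name,
       r.social,
       c.description,
       p.price
FROM cryptomodels_coin c
LEFT JOIN (
    SELECT coin_id, SUM(price) AS price
    FROM cryptomodels_coinprice
    GROUP BY coin_id
) p ON p.coin_id = c.name
LEFT JOIN (
    SELECT coin_id, MAX(social) AS social
    FROM cryptomodels_coinrating
    GROUP BY coin_id
) r ON r.coin_id = c.name;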

Related

PowerBI Additive Slicers (Applied to whole Report)

I have quite a niche problem regarding Power BI slicer operations.
I wish to filter the data based on two different slicers.
For example, I have two slicers:
A list of categories, say Genre of Films; and
A list of all film directors.
I wish to filter the data shown in the Power BI report based on an OR condition between the two slicers.
For instance, I wish to filter on all Horror films OR films directed by Quentin Tarantino, so the list would show all Horror films in my database plus all films directed by Quentin Tarantino (which are not necessarily Horror films).
I presume I will need to write some DAX for this, but despite extensive searching I have not come across this particular problem.
Let me know if you need any further info.
Regards,
Josh
Let's suppose you have a table called IMDB like the following:
tbl-IMDB:
| Name                     | Genre   | Director             |
|--------------------------|---------|----------------------|
| The Shawshank Redemption | Drama   | Frank Darabont       |
| The Godfather            | Crime   | Francis Ford Coppola |
| The Dark Knight          | Action  | Christopher Nolan    |
| 12 Angry Men             | Drama   | Sidney Lumet         |
| Schindler's List         | History | Steven Spielberg     |
| Pulp Fiction             | Crime   | Quentin Tarantino    |
and two more disconnected tables for the slicers, as follows:
tbl-Director:
| Director             |
|----------------------|
| Frank Darabont       |
| Francis Ford Coppola |
| Christopher Nolan    |
| Sidney Lumet         |
| Steven Spielberg     |
| Quentin Tarantino    |
tbl-Genre:
| Genre   |
|---------|
| Drama   |
| Crime   |
| Action  |
| History |
The data model looks like this: tbl-Director and tbl-Genre have no relationships to tbl-IMDB (relationship diagram omitted).
Now, if I understood your question correctly, when you select {Crime, Action} from Genre and {Sidney Lumet, Steven Spielberg} from Director, it should return 5 instances, i.e. every row except The Shawshank Redemption.
To get there, first take all the slicer combinations into account and decide what each should do to the table:
a. All slicer values are selected in both the Director and Genre slicers - the DAX should return the full table.
b. No slicer values are selected in either the Director or Genre slicer - the DAX should return the full table, because the default behavior of DAX is to return the full table when nothing is selected in a slicer.
c. Partial values are selected in Genre while nothing/everything is selected in Director, or vice versa - the DAX should return the full table; for example, Drama selected (2 instances) + nothing/everything selected in Director (6 instances) -> 6 instances in total.
d. Partial values are selected in both Genre and Director - the DAX should return the union of the instances matching either slicer selection; e.g. Crime = 2 + Sidney Lumet = 1 => 3 instances in total.
The above logic can be incorporated in a DAX measure like the following:
Measure =
VAR _genre = -- the genres selected in the Genre slicer, matched against the IMDB tbl
MAXX (
FILTER ( IMDB, IMDB[Genre] IN ALLSELECTED ( Genre[Genre] ) ),
IMDB[Genre]
)
VAR _director = -- the directors selected in the Director slicer, matched against the IMDB tbl
MAXX (
FILTER ( IMDB, IMDB[Director] IN ALLSELECTED ( Director[Director] ) ),
IMDB[Director]
)
VAR _genreCountALL = -- total count of genres in the Genre tbl, regardless of the slicer selection
CALCULATE ( COUNT ( Genre[Genre] ), ALL ( Genre ) )
VAR _directorCountALL = -- total count of directors in the Director tbl, regardless of the slicer selection
CALCULATE ( COUNT ( Director[Director] ), ALL ( Director ) )
VAR _genreCountSELECT = -- count of genres in the Genre tbl according to the slicer selection
COUNT ( Genre[Genre] )
VAR _directorCountSELECT = -- count of directors in the Director tbl according to the slicer selection
COUNT ( Director[Director] )
VAR _slice = -- if both the Genre and Director slicers have partial selections, return the filtered max; in every other case, the full tbl
SWITCH (
TRUE (),
_genreCountALL <> _genreCountSELECT
&& _directorCountALL <> _directorCountSELECT,
CALCULATE (
MAX ( IMDB[Name] ),
FILTER ( IMDB, IMDB[Director] IN { _director } || IMDB[Genre] IN { _genre } )
),
CALCULATE ( MAX ( IMDB[Name] ) )
)
RETURN
_slice
and put a visual-level filter on the measure (screenshot omitted).
The measure produces the following:
nothing is selected - returns the full table
partial values are selected in only one slicer - returns the full table
partial values are selected in all available slicers - returns the sliced table

Oracle 18c - Alternative to REGEXP_REPLACE

After migrating to Oracle 18c Enterprise Edition, a function-based index fails to create.
Here is my index DDL:
CREATE INDEX my_index ON my_table
(UPPER( REGEXP_REPLACE ("DEPT_NUM",'[^[:alnum:]]',NULL,1,0)))
TABLESPACE my_tbspace
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 64K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
);
I get the following error:
ORA-01743: only pure functions can be indexed
01743. 00000 - "only pure functions can be indexed"
*Cause: The indexed function uses SYSDATE or the user environment.
*Action: PL/SQL functions must be pure (RNDS, RNPS, WNDS, WNPS). SQL
expressions must not use SYSDATE, USER, USERENV(), or anything
else dependent on the session state. NLS-dependent functions
are OK.
Is this a known bug in 18c? If this function-based index is no longer supported, what is another way to write this function?
The issue is that regexp_replace is not deterministic. The problem shows up when you change NLS settings:
alter session set nls_language = english;
with rws as (
select 'STÜFF' v
from dual
)
select regexp_replace ( v, '[A-Z]+', '#' )
from rws;
REGEXP_REPLACE(V,'[A-Z]+','#')
#Ü#
alter session set nls_language = german;
with rws as (
select 'STÜFF' v
from dual
)
select regexp_replace ( v, '[A-Z]+', '#' )
from rws;
REGEXP_REPLACE(V,'[A-Z]+','#')
#
Ü sorts at the end of the alphabet in English, but right after U in German, so the first statement doesn't replace it while the second does.
In Oracle Database 12.1 and earlier, regexp_replace was incorrectly marked as deterministic. 12.2 fixed this by making it non-deterministic.
Consider carefully whether any workarounds manage diacritics correctly.
MOS note 2592779.1 discusses this further.
Most likely the REGEXP_REPLACE causes the problem; see Find out if a string contains only ASCII characters. You can bypass the limitation with a user-defined function (thanks to Bob Jarvis):
CREATE OR REPLACE FUNCTION KEEP_ALNUM(strIn IN VARCHAR2)
RETURN VARCHAR2
DETERMINISTIC
AS
BEGIN
RETURN UPPER(REGEXP_REPLACE(strIn, '[^[:alnum:]]', NULL, 1, 0));
END KEEP_ALNUM;
/
CREATE INDEX DEPTS_1 ON DEPTS(KEEP_ALNUM(DEPT_NUM));
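A query then has to call the same function for the optimizer to consider the index. A minimal usage sketch, assuming the DEPTS table above and a hypothetical department number 'A100':
-- The predicate matches the indexed expression KEEP_ALNUM(DEPT_NUM),
-- so the optimizer can use DEPTS_1 instead of a full table scan
SELECT *
FROM DEPTS
WHERE KEEP_ALNUM(DEPT_NUM) = 'A100';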
Just be aware that the DETERMINISTIC keyword is taken on trust: you can define even a useless function like the one below and still create a function-based index on it
CREATE OR REPLACE FUNCTION SillyValue RETURN VARCHAR2 DETERMINISTIC
AS
BEGIN
RETURN DBMS_RANDOM.STRING('p', 20);
END;
/
There are a couple of workarounds.
The first one is a hack.
As you may know, when you create an FBI (function-based index), Oracle creates a hidden column and builds the index on it.
Moreover, you can even reference that hidden column by name instead of the FBI expression, and Oracle will use the index.
set lines 70 pages 70
column column_name format a15
column data_type format a15
drop table my_table;
create table my_table(dept_num, dept_descr) as select rownum||'*', 'dummy' from dual connect by level <= 1e6;
create index my_index
on my_table(upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)));
select column_name, data_type from user_tab_cols where table_name = 'MY_TABLE';
explain plan for
select * from my_table where upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)) = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
explain plan for
select * from my_table where SYS_NC00003$ = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
Output
Table dropped.
Table created.
Index created.
COLUMN_NAME DATA_TYPE
--------------- ---------------
DEPT_NUM VARCHAR2
DEPT_DESCR CHAR
SYS_NC00003$ VARCHAR2
3 rows selected.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
So to mimic an FBI, you can create a hidden column and an index on top of it.
That can be done in Oracle 11g using dbms_stats.create_extended_stats.
drop index my_index;
begin
for i in (select dbms_stats.create_extended_stats
(user, 'my_table', '(upper(regexp_replace("DEPT_NUM", ''[^[:alnum:]]'', null, 1, 0)))') as col_name
from dual)
loop
execute immediate(utl_lms.format_message('alter table %s rename column "%s" to my_hidden_col','my_table', i.col_name));
end loop;
end;
/
select column_name, data_type from user_tab_cols where table_name = 'MY_TABLE';
create index my_index on my_table(my_hidden_col);
explain plan for
select * from my_table where upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)) = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
explain plan for
select * from my_table where MY_HIDDEN_COL = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
Output
Index dropped.
PL/SQL procedure successfully completed.
COLUMN_NAME DATA_TYPE
--------------- ---------------
DEPT_NUM VARCHAR2
DEPT_DESCR CHAR
MY_HIDDEN_COL VARCHAR2
3 rows selected.
Index created.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
Starting with Oracle 12c, invisible columns are documented, so it becomes even more straightforward:
alter table my_table add (my_hidden_col invisible as
(upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0))) virtual);
create index my_index on my_table(my_hidden_col);
Another approach is to implement the same logic without a regex:
create index my_index on my_table(
  upper(translate(dept_num, '_' || translate(dept_num,
      '_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
      '_'), '_')));
But in this case you have to make sure that every predicate that used the regex expression is rewritten to use the new one.
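Before rewriting predicates, it is worth verifying that the two expressions agree on your data, especially for values with diacritics (see the NLS discussion above). A sketch of such a check against my_table:
-- Any row returned is a dept_num where the regex and translate()
-- versions disagree; expect differences for non-ASCII letters
SELECT dept_num,
       upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)) AS via_regex,
       upper(translate(dept_num, '_' || translate(dept_num,
           '_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
           '_'), '_')) AS via_translate
FROM my_table
WHERE upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0))
   <> upper(translate(dept_num, '_' || translate(dept_num,
           '_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
           '_'), '_'));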
The work-around I found easiest was to create the index using NLS_UPPER instead of UPPER:
CREATE INDEX my_index ON my_table
(REGEXP_REPLACE (NLS_UPPER("DEPT_NUM"), '[^[:alnum:]]', NULL, 1, 0))
TABLESPACE my_tbspace
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 64K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
);

Choose DAX measure based on slicer value

Is it possible to dynamically pick an appropriate DAX measure, defined in a table, based on a slicer value?
Source table:
+----------------+------------+
| col1 | col2 |
+----------------+------------+
| selectedvalue1 | [measure1] |
| selectedvalue2 | [measure2] |
| selectedvalue3 | [measure3] |
+----------------+------------+
I put the values of col1 into a slicer. I can retrieve the selected value with:
SlicerValue = SELECTEDVALUE(tab[col1])
I could hard code:
MyVariable = SWITCH(TRUE(),
SlicerValue = "selectedvalue1" , [measure1],
SlicerValue = "selectedvalue2" , [measure2],
SlicerValue = "selectedvalue3" , [measure3],
BLANK()
)
But I do not want to hard-code the mapping between selected value and measure inside the DAX measure; I want it defined in the source table.
I need something like this:
MyMeasure = GETMEASURE(tab[col2])
Of course assuming that such a function exists and that only one value of col2 has been filtered.
@NickKrasnov mentioned calculation groups elsewhere. To automate the generation of your hard-coded lookup, you could run DMV queries against your pbix.
You might do something like the query below to get output formatted so that it can be pasted into a large SWITCH:
SELECT
'"' + [Name] + '", [' + [Name] + '],'
FROM $SYSTEM.TMSCHEMA_MEASURES
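For the three measures from the question, the query would emit one row per measure, ready to paste as the branches of the SWITCH (measure1..measure3 are the hypothetical names from the question):
"measure1", [measure1],
"measure2", [measure2],
"measure3", [measure3],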

Django ORM: need list of values with max ID for each value

My table looks roughly like this:
| ID | DISPLAY | CATEGORY | NAME            |
|----|---------|----------|-----------------|
| 1  | true    | 1        | some name       |
| 2  | true    | 1        | some other name |
| 3  | true    | 1        | some name       |
| 4  | true    | 2        | something else  |
I want a result set that would give me the name and max ID for a given category and display = true, so in SQL form:
select name, max(id) as recent
from TABLE
where category = 1 and display = true
group by name
And I have done this as:
rs = TABLE.objects.filter(category=1, display=True).values('name').annotate(recent=Max('id'))
But I'm getting a random id, not the maximum ID.
Why is that? What do I need to do?
Your Django query returns a list grouped by the "name" column, with the max id for each group.
In your example, the result is:
<QuerySet [{'name': 'some name', 'recent': 3}, {'name': 'some other name', 'recent': 2}]>
If you instead need the single max id under these filters, you can use the aggregate function:
rs = TABLE.objects.filter(category=1, display=True).aggregate(recent=Max('id'))
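For reference, a minimal sketch of the SQL this aggregate() call corresponds to (my_table standing in for TABLE):
-- aggregate() collapses the whole filtered set to a single row,
-- so there is no GROUP BY at all
SELECT MAX(id) AS recent
FROM my_table
WHERE category = 1
  AND display = true;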

Compare Tables in BigQuery

How would I compare two tables (Table1 and Table2) and find all the new entries or changes in Table2?
Using SQL Server, I can use:
Select * from Table2
Except
Select * from Table1
Here is a sample of what I want.
Table1:
| A | 1 |
| B | 2 |
| C | 3 |
Table2:
| A | 1 |
| B | 2 |
| C | 2 |
| D | 4 |
So, comparing the two tables, I want my results to show me the following:
| C | 2 |
| D | 4 |
I tried a few statements with no luck.
Now that I have your actual sample dataset, I can write a query that finds every domain in one table that is not on the other table:
https://bigquery.cloud.google.com/table/inbound-acolyte-377:demo.1024 has 24,729,816 rows. https://bigquery.cloud.google.com/table/inbound-acolyte-377:demo.1025 has 24,732,640 rows.
Let's look at everything in 1025 that is not in 1024:
SELECT a.domain
FROM [inbound-acolyte-377:demo.1025] a
LEFT OUTER JOIN EACH [inbound-acolyte-377:demo.1024] b
ON a.domain = b.domain
WHERE b.domain IS NULL
Result: 39,629 rows.
(8.1s elapsed, 2.04 GB processed)
To get the differences (given that tkey is your unique row identifier):
SELECT a.tkey, a.name, b.name
FROM [your.tableold] a
JOIN EACH [your.tablenew] b
ON a.tkey = b.tkey
WHERE a.name != b.name
LIMIT 100
For the new rows, one way is the one you proposed:
SELECT col1, col2
FROM table2
WHERE col1 NOT IN
(SELECT col1 FROM Table1)
(you'll have to switch to a JOIN EACH when Table1 gets too large)
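As a side note, BigQuery's standard SQL (as opposed to the legacy SQL used above) supports EXCEPT DISTINCT directly, which mirrors the SQL Server pattern from the question. A sketch, with mydataset as a placeholder:
-- Rows in Table2 that do not appear identically in Table1,
-- i.e. both the changed row (C, 2) and the new row (D, 4)
SELECT col1, col2 FROM `mydataset.Table2`
EXCEPT DISTINCT
SELECT col1, col2 FROM `mydataset.Table1`;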