PostgreSQL - splitting rows using regex

I have a table with the columns ID, name and description. The table looks like this:
ID | Name | Model
1 | Ford | Focus 3
2.1-3 | Opel | 1. Astra 2
| | 2. Vectra 2
| | 3. Vectra 3
3.1-2 | Toyota| Avensis 2; Micra
4.1-2 | Opel | (various versions) 1. Astra
| | (various versions) 2. Vectra
5.1-3 | Mazda | MX5; GTR; MX4
I would like to split it into new rows using "regexp_split_to_table" to get a result like this:
ID | Name | Description
1 | Ford | Focus 3
2.1 | Opel | Astra 2
2.2 | Opel | Vectra 2
2.3 | Opel | Vectra 3
3.1 | Toyota| Avensis
3.2 | Toyota| Micra
4.1 | Opel | Astra
4.2 | Opel | Vectra
5.1 | Mazda | MX5
5.2 | Mazda | GTR
5.3 | Mazda | MX4
How can I do this in PostgreSQL and update the main table afterwards?
Big thanks for your help!

In general this is impossible, for several reasons.
First, your data is not ordered. A table has no inherent row order, so for your example it is possible to get something like
ID | Name | Description
| | (year) 2. 2004
1 | Ford | Some text
5.1-3 | Mazda | Petrol; 1.9; 3-doors
| | 2. Diesel
2.1-3 | Opel | 1. Astra
| | 3. 2005
3 | Toyota| 2001; Petrol; 1.8 TDI
4.1-2 | Opel | (model) 1. Vectra
Just try something like
drop table if exists my_ugly_table;
create table my_ugly_table as select generate_series(1,3) as x, generate_series(1,3) as y;
select * from my_ugly_table;
update my_ugly_table set y = 4 where x = 2;
select * from my_ugly_table;
and you will have
first result:
x | y
---+---
1 | 1
2 | 2
3 | 3
(3 rows)
second result:
x | y
---+---
1 | 1
3 | 3
2 | 4
(3 rows)
As you can see, the row order changed.
Next, your goal is flawed in that you want to keep meaningful data such as 1.2, 1.3 and so on in the ID. An ID should be just a unique identifier of the row and nothing else. Ideally you know nothing about the values of the IDs; they simply exist.
However, you can try to do something with your original data using plpgsql or something like the following.
First of all, create a table in which we will do bad things to our data:
create table models_t as select * from models m where 1=2;
This creates an empty table models_t with the same structure as the table models.
Then let's give it a real primary key:
alter table models_t add mt_id serial not null primary key;
Next, let's fill it with data:
do language plpgsql $$
declare
  p_rec models;
  c_rec models;
begin
  p_rec := null;
  for c_rec in (select * from models) loop
    p_rec.id := coalesce(c_rec.id, p_rec.id);
    p_rec.name := coalesce(c_rec.name, p_rec.name);
    p_rec.description := c_rec.description;
    insert into models_t values (p_rec.id, p_rec.name, p_rec.description);
    raise notice '% %', c_rec.id, c_rec.name;
  end loop;
end; $$;
The only result of this is a table without gaps, like:
postgres=# select * from models_t;
id | name | description | mt_id
-------+--------+----------------------+-------
1 | Ford | Some text | 1
2.1-3 | Opel | 1. Astra | 2
2.1-3 | Opel | 2. Diesel | 3
2.1-3 | Opel | 3. 2005 | 4
3 | Toyota | 2001; Ptrol; 1.8 TDI | 5
4.1-2 | Opel | (model) 1. Vectra | 6
4.1-2 | Opel | (year) 2. 2004 | 7
5.1-3 | Mazda | Petrol; 1.9; 3-doors | 8
(8 rows)
Actually, that is enough. However, let's parse the resulting data:
select
  *,
  substring(id from '(\d*)\.?.*') as main_id, -- First number before the dot
  row_number() over (partition by substring(id from '(\d*)\.?.*') order by mt_id) as secondary_id -- Position within the main_id group, in insertion order
from models_t;
Result:
id | name | description | mt_id | main_id | secondary_id
-------+--------+----------------------+-------+---------+--------------
1 | Ford | Some text | 1 | 1 | 1
2.1-3 | Opel | 1. Astra | 2 | 2 | 1
2.1-3 | Opel | 2. Diesel | 3 | 2 | 2
2.1-3 | Opel | 3. 2005 | 4 | 2 | 3
3 | Toyota | 2001; Ptrol; 1.8 TDI | 5 | 3 | 1
4.1-2 | Opel | (model) 1. Vectra | 6 | 4 | 1
4.1-2 | Opel | (year) 2. 2004 | 7 | 4 | 2
5.1-3 | Mazda | Petrol; 1.9; 3-doors | 8 | 5 | 1
(8 rows)
At that point we can use columns main_id and secondary_id to build wanted IDs like 1.1 or 2.3.
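As a rough illustration of the whole idea outside the database, the following Python sketch forward-fills the blank ID and name cells, strips the "(...)" and "N." prefixes, splits on ";" and builds each ID from the main number plus a running counter. It assumes the rows arrive in the intended order (for example, read from the original file), which a plain table scan does not guarantee, as noted above. The function name split_rows and the tuple-based input format are made up for this example.

```python
import re

def split_rows(rows):
    """Forward-fill blank id/name cells, split descriptions, renumber ids.

    rows: iterable of (id, name, description); id and name are None or ""
    on continuation lines. Assumes the first row has an id.
    """
    groups = []  # one [main_id, name, items] entry per logical record
    for raw_id, name, description in rows:
        if raw_id:  # a filled id starts a new logical record
            main_id = re.match(r"\d+", raw_id).group()
            groups.append([main_id, name, []])
        for part in description.split(";"):
            # Drop an optional "(note)" prefix and an optional "N. " counter.
            item = re.sub(r"^\s*(?:\([^)]*\)\s*)?(?:\d+\.\s+)?", "", part, count=1).strip()
            if item:
                groups[-1][2].append(item)

    result = []
    for main_id, name, items in groups:
        if len(items) == 1:
            # A single item keeps the bare main id, e.g. "1".
            result.append((main_id, name, items[0]))
        else:
            for k, item in enumerate(items, start=1):
                result.append((f"{main_id}.{k}", name, item))
    return result
```

Note that this keeps "Avensis 2" as written in the input, because only the leading note and counter are stripped.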
Everything else is up to you.
Good luck and have fun.

Related

How do I repeat row labels in a matrix?

I have data showing me the dates grouped like this:
I had to remove the Customer Description detail for confidentiality reasons.
How do I repeat the date column the same way you repeat the Row Labels in an Excel Pivot?
I've looked, but couldn't find a solution to this - this option should be available.
EDIT
When you have the following source data in Excel:
Date | Customer | Item Description | Qty Out | Unit Price | Sales
--------------------------------------------------------------------------------------------------------------------------------------------
14/08/2020 | Customer 1 | Item 11 | 4.00 | 65.00 | 260.00
14/08/2020 | Customer 2 | Item 12 | 56.00 | 12.00 | 672.00
14/08/2020 | Customer 3 | Item 13 | 64.00 | 35.00 | 2,240.00
14/08/2020 | Customer 4 | Item 14 | 29.00 | 65.00 | 1,885.00
15/08/2020 | Customer 2 | Item 15 | 746.00 | 12.00 | 8,952.00
15/08/2020 | Customer 3 | Item 16 | 14.00 | 75.00 | 1,050.00
15/08/2020 | Customer 4 | Item 17 | 45.00 | 741.00 | 33,345.00
15/08/2020 | Customer 5 | Item 18 | 456.00 | 125.00 | 57,000.00
15/08/2020 | Customer 6 | Item 19 | 925.00 | 17.00 | 15,725.00
16/08/2020 | Customer 4 | Item 20 | 6.00 | 532.00 | 3,192.00
16/08/2020 | Customer 5 | Item 21 | 56.00 | 94.00 | 5,264.00
16/08/2020 | Customer 6 | Item 22 | 546.00 | 37.00 | 20,202.00
You then pivot this data using Microsoft Excel, where you get the following:
You then choose the option to Repeat Item Labels as can be seen below:
After selecting this, you get the result I expect to see in Power BI:
Is there not a function available like this in Power BI?
I am adding this for your reference as a workaround. Create a custom column in the Power Query Editor:
date_customer = Date.ToText([Date]) &" : "& [Customer]
Then add both Date and date_customer to the Matrix rows. The output (using your sample data) is as below:
ANOTHER OPTION
Another option is to add Date and Customer to the Matrix rows. The output (using your sample data) will be as below:
This is also a meaningful output, since the dates show as group headers. But if you are required to show the date redundantly on every row, consider the first option.

Checking for a Range of Values

To check for a range of values in SQL, I can use the BETWEEN operator.
MySQL [distributor]> select prod_name, prod_price from products where prod_price between 3.49 and 11.99;
+---------------------+------------+
| prod_name | prod_price |
+---------------------+------------+
| Fish bean bag toy | 3.49 |
| Bird bean bag toy | 3.49 |
| Rabbit bean bag toy | 3.49 |
| 8 inch teddy bear | 5.99 |
| 12 inch teddy bear | 8.99 |
| 18 inch teddy bear | 11.99 |
| Raggedy Ann | 4.99 |
| King doll | 9.49 |
| Queen doll | 9.49 |
+---------------------+------------+
9 rows in set (0.005 sec)
I referred to the Django docs and found gte, gt, lt and lte, but no between.
How can I achieve the BETWEEN functionality?
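Use the range lookup in the Django ORM: products.objects.filter(prod_price__range=(3.49, 11.99)). It translates to SQL BETWEEN, so both bounds are inclusive. See the field lookup reference in the Django docs for more info.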
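To make the inclusive bounds concrete without a Django project, here is a small sketch using Python's standard sqlite3 module. It runs the SQL that a range lookup stands for (BETWEEN) next to the SQL that combining gte and lte stands for, and compares the results. The two rows marked as hypothetical are not in the question's data and exist only to show what falls outside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (prod_name TEXT, prod_price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Fish bean bag toy", 3.49),
        ("8 inch teddy bear", 5.99),
        ("18 inch teddy bear", 11.99),
        ("Budget toy", 2.49),   # hypothetical row just below the range
        ("Deluxe toy", 12.49),  # hypothetical row just above the range
    ],
)

def names(where_clause, params):
    """Return the product names matching a WHERE clause, ordered by price."""
    sql = f"SELECT prod_name FROM products WHERE {where_clause} ORDER BY prod_price"
    return [row[0] for row in conn.execute(sql, params)]

# The SQL behind prod_price__range=(3.49, 11.99).
between = names("prod_price BETWEEN ? AND ?", (3.49, 11.99))
# The SQL behind prod_price__gte=3.49 combined with prod_price__lte=11.99.
bounds = names("prod_price >= ? AND prod_price <= ?", (3.49, 11.99))

print(between)
print(between == bounds)
```

Both endpoints (3.49 and 11.99) are included, so filtering with gte and lte together is an equivalent spelling if you prefer it.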

PowerBI: Use non-shown values for Drillthrough

I am trying to build a Power BI report for data from a SQL database where I have to show detail pages using Drillthrough. The only viable way to connect the datasets is using the database row ids.
From a user's perspective the row ids would not add any value but a lot of noise.
Is there a way to drillthrough using the row ids without showing them in a visual?
Yes, this is possible in the current release of Power BI Desktop using a workaround that involves hiding the row id column on the parent (or summary) page.
Take the following tables as example:
ALBUM
+---------+------------------------+
| AlbumId | AlbumName |
+---------+------------------------+
| 1 | Hoist |
+---------+------------------------+
| 2 | The Story Of the Ghost |
+---------+------------------------+
TRACK
+---------+---------+--------------------------+
| TrackId | AlbumId | TrackName |
+---------+---------+--------------------------+
| 1 | 1 | Julius |
+---------+---------+--------------------------+
| 2 | 1 | Down With Disease |
+---------+---------+--------------------------+
| 3 | 1 | If I Could |
+---------+---------+--------------------------+
| 4 | 1 | Riker's Mailbox |
+---------+---------+--------------------------+
| 5 | 1 | Axilla, Part II |
+---------+---------+--------------------------+
| 6 | 1 | Lifeboy |
+---------+---------+--------------------------+
| 7 | 1 | Sample In a Jar |
+---------+---------+--------------------------+
| 8 | 1 | Wolfmans Brother |
+---------+---------+--------------------------+
| 9 | 1 | Scent of a Mule |
+---------+---------+--------------------------+
| 10 | 1 | Dog Faced Boy |
+---------+---------+--------------------------+
| 11 | 1 | Demand |
+---------+---------+--------------------------+
| 12 | 2 | Ghost |
+---------+---------+--------------------------+
| 13 | 2 | Birds of a Feather |
+---------+---------+--------------------------+
| 14 | 2 | Meat |
+---------+---------+--------------------------+
| 15 | 2 | Guyute |
+---------+---------+--------------------------+
| 16 | 2 | Fikus |
+---------+---------+--------------------------+
| 17 | 2 | Shafty |
+---------+---------+--------------------------+
| 18 | 2 | Limb by Limb |
+---------+---------+--------------------------+
| 19 | 2 | Frankie Says |
+---------+---------+--------------------------+
| 20 | 2 | Brian and Robert |
+---------+---------+--------------------------+
| 21 | 2 | Water in the Sky |
+---------+---------+--------------------------+
| 22 | 2 | Roggae |
+---------+---------+--------------------------+
| 23 | 2 | Wading in the Velvet Sea |
+---------+---------+--------------------------+
| 24 | 2 | The Moma Dance |
+---------+---------+--------------------------+
| 25 | 2 | End of Session |
+---------+---------+--------------------------+
Add them as data sources. A 1:many relationship on AlbumId should be created between the two tables. Create a parent page with a table containing AlbumId and AlbumName. Then create the details page with a table containing only the TrackName column. Drag Album table -> AlbumId into the Drillthrough filter field of the details page.
Now go back to the parent page and notice that when you right click on an album, you get the drillthrough menu to the details page. This works, but now you have a messy AlbumId column on your parent page.
The workaround is to hide the AlbumId column on the parent report. First go to the Format (paint roller) menu of the table on the parent report and, under Column header, turn Word wrap off. Then drag the column separator of the table to hide the AlbumId column. See the before and after images below.
BEFORE HIDE
AFTER HIDE
I have the powerbi file posted here if you want to see it in action.

Pattern matching with regular expression in spark dataframes using spark-shell

Suppose we are given dataset ("DATA") like :
YEAR | FIRST NAME | LAST NAME | VARIABLES
2008 | JOY | ANDERSON | spark|python|scala; 45;w/o sports;w datascience
2008 | STEVEN | JOHNSON | Spark|R; 90|56
2006 | NIHA | DIVA | w/o sports
and we have another dataset ("RESULT") like :
YEAR | FIRST NAME | LAST NAME
1992 | EMMA | CENA
2008 | JOY | ANDERSON
2008 | STEVEN | ANDERSON
2006 | NIHA | DIVA
and so on.
The output should be ("RESULT") :
YEAR | FIRST NAME | LAST NAME | SUBJECT | SCORE | SPORTS | DATASCIENCE
1992 | EMMA | CENA | | | |
2008 | JOY | ANDERSON | SPARK | 45 | FALSE | TRUE
2008 | JOY | ANDERSON | PYTHON | 45 | FALSE | TRUE
2008 | JOY | ANDERSON | SCALA | 45 | FALSE | TRUE
2008 | STEVEN | ANDERSON | | | |
2006 | NIHA | DIVA | | | FALSE |
2008 | STEVEN | JOHNSON | SPARK | 90 | |
2008 | STEVEN | JOHNSON | SPARK | 56 | |
2008 | STEVEN | JOHNSON | R | 90 | |
2008 | STEVEN | JOHNSON | R | 56 | |
and so on.
Please note that some rows in DATA are not present in RESULT and vice versa. For example, "2008,STEVEN,JOHNSON" is present in DATA but not in RESULT, and such entries should be added to the RESULT dataset. The columns {SUBJECT, SCORE, SPORTS, DATASCIENCE} come from my intuition that "spark" refers to the SUBJECT and so on.
Hope you understand my query. And I am using spark-shell with spark dataframes.
Note that "Spark" and "spark" should be considered as same.
As explained in the comments, you can implement some of the tricky logic as in the answers to "splitting row in multiple row in spark-shell".
data:
val df = List(
("2008","JOY ","ANDERSON ","spark|python|scala;45;w/o sports;w datascience"),
("2008","STEVEN ","JOHNSON ","Spark|R;90|56"),
("2006","NIHA ","DIVA ","w/o sports")
).toDF("YEAR","FIRST NAME","LAST NAME","VARIABLE")
I only highlight the relatively tricky parts; you can figure out the details yourself. I suggest handling the "w" and "w/o" tags separately. Furthermore, you have to explode the languages in a separate statement. The code is:
val sep = "#" // assumed delimiter: any string that does not occur in VARIABLE and is not a regex metacharacter
val step1 = df.withColumn("backrefReplace",
    split(regexp_replace('VARIABLE, "^([A-Za-z|]+)?;?([\\d\\|]+)?;?(w.*)?$", "$1" + sep + "$2" + sep + "$3"), sep))
  .withColumn("letter", explode(split('backrefReplace(0), "\\|")))
  .select('YEAR, $"FIRST NAME", $"LAST NAME", 'VARIABLE, 'letter,
    explode(split('backrefReplace(1), "\\|")).as("digits"),
    'backrefReplace(2).as("tags")
  )
which gives
scala> step1.show(false)
+----+----------+---------+----------------------------------------------+------+------+------------------------+
|YEAR|FIRST NAME|LAST NAME|VARIABLE |letter|digits|tags |
+----+----------+---------+----------------------------------------------+------+------+------------------------+
|2008|JOY |ANDERSON |spark|python|scala;45;w/o sports;w datascience|spark |45 |w/o sports;w datascience|
|2008|JOY |ANDERSON |spark|python|scala;45;w/o sports;w datascience|python|45 |w/o sports;w datascience|
|2008|JOY |ANDERSON |spark|python|scala;45;w/o sports;w datascience|scala |45 |w/o sports;w datascience|
|2008|STEVEN |JOHNSON |Spark|R;90|56 |Spark |90 | |
|2008|STEVEN |JOHNSON |Spark|R;90|56 |Spark |56 | |
|2008|STEVEN |JOHNSON |Spark|R;90|56 |R |90 | |
|2008|STEVEN |JOHNSON |Spark|R;90|56 |R |56 | |
|2006|NIHA |DIVA |w/o sports | | |w/o sports |
+----+----------+---------+----------------------------------------------+------+------+------------------------+
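To see what the regular expression does on its own, here is a plain-Python sketch of the same three-group split followed by the cross product that the two explode calls produce. It is only an illustration of the logic, not Spark code. The character class is written as A-Za-z, and the subject is upper-cased so that "Spark" and "spark" end up the same; explode_variable is a name made up for this example.

```python
import re
from itertools import product

# Group 1: subjects separated by "|", group 2: scores separated by "|",
# group 3: everything from the first "w"/"w/o" tag onwards.
PATTERN = re.compile(r"^([A-Za-z|]+)?;?([\d|]+)?;?(w.*)?$")

def explode_variable(variable):
    """Return one (subject, score, tags) tuple per subject/score combination."""
    match = PATTERN.match(variable)
    if match is None:
        return []
    # A group that did not take part in the match is None; treat it as empty.
    subjects, scores, tags = (group or "" for group in match.groups())
    return [
        (subject.upper(), score, tags)
        for subject, score in product(subjects.split("|"), scores.split("|"))
    ]

for row in explode_variable("Spark|R;90|56"):
    print(row)
```

As in the step1 output above, an input with tags only, such as "w/o sports", yields a single row with an empty subject and an empty score.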
Then you have to handle capitalisation, and the tags. For the tags, you can have a relatively generic code using explode and pivot, but you have to do some cleaning to match your exact result. Here is an example:
List(("a;b;c")).toDF("str")
.withColumn("char",explode(split('str,";")))
.groupBy('str)
.pivot("char")
.count
.show()
+-----+---+---+---+
| str| a| b| c|
+-----+---+---+---+
|a;b;c| 1| 1| 1|
+-----+---+---+---+
Read more about pivot here
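The cleaning step for the tags could look roughly like the following plain-Python sketch, which turns a tags string into TRUE/FALSE flags keyed by the upper-cased tag name. It assumes every tag has the form "w name" or "w/o name" and ignores anything else; parse_tags is a name made up for this example.

```python
def parse_tags(tags):
    """Map 'w/o sports;w datascience' to {'SPORTS': False, 'DATASCIENCE': True}."""
    flags = {}
    for raw in tags.split(";"):
        tag = raw.strip()
        if tag.startswith("w/o "):
            # "without": the flag is False
            flags[tag[len("w/o "):].strip().upper()] = False
        elif tag.startswith("w "):
            # "with": the flag is True
            flags[tag[len("w "):].strip().upper()] = True
    return flags

print(parse_tags("w/o sports;w datascience"))
```

A tag that is absent stays absent from the dict, which corresponds to the empty cells in the expected RESULT table.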
The final step is simply to do a left join on the second dataset (first "RESULT").

How to run raw query with a model with dynamic fields in Django 1.9?

I have a complex result that requires writing raw sql queries.
See https://stackoverflow.com/a/38548462/80353
The expected result is a table showing several columns.
The first column header is simply Product and the other column headers are store names.
The values are simply the product names and the aggregated sales values of the product in these stores.
Which stores will be shown is entirely dynamic. Maximum should be 9 stores.
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
For more details of the schema, check the question in How to get back aggregate values across 2 dimensions using Python Cubes?
My question
The schema is not super important to my question which is:
Since I am going to write a complex raw query, is there a way to map the query result to a model where the fields are dynamic?
I found documentation about how to execute raw queries in Django and how to execute raw queries to existing models with fixed fields and matching table.
My question is: is it possible to do that for a model that has no matching table and has dynamic fields?
If so, how?
Or, if I choose to use a materialised view in PostgreSQL, how do I match it with a model class?
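One possible direction, sketched here with Python's standard sqlite3 module rather than Django so that it runs on its own: build one conditional SUM per requested store and turn each result row into a dict keyed by the cursor's column names, so that no model with fixed fields is needed. A cursor obtained from Django's connection.cursor() exposes the same DB-API description attribute, so the dict-mapping step should carry over, although Django expects %s placeholders instead of ?. The table and column names follow the schema above; sales_by_store is a name made up for this sketch.

```python
import sqlite3

def sales_by_store(conn, store_codes):
    """Return one dict per product, with one key per requested store code."""
    marks = ",".join("?" * len(store_codes))
    stores = conn.execute(
        f"SELECT id, code FROM store WHERE code IN ({marks}) ORDER BY code",
        list(store_codes),
    ).fetchall()
    if not stores:
        raise ValueError("none of the requested store codes exist")

    # One conditional SUM per store. The alias is an identifier, not a value,
    # so it cannot be a bound parameter; it is quoted and escaped instead.
    columns = ", ".join(
        'SUM(CASE WHEN s.store_id = ? THEN s.amount ELSE 0 END) AS "{}"'.format(
            code.replace('"', '""')
        )
        for _, code in stores
    )
    sql = (
        f"SELECT p.name AS product, {columns} "
        "FROM product p LEFT JOIN sales s ON s.product_id = p.id "
        "GROUP BY p.id, p.name ORDER BY p.id"
    )
    cursor = conn.execute(sql, [store_id for store_id, _ in stores])

    # Column names come from the query itself, so they follow the store list.
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor]
```

Calling sales_by_store(conn, ["S1", "S2", "S3"]) on the sample data gives rows shaped like the table in "What I want to achieve"; pagination could then be added with LIMIT and OFFSET on the outer query.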