Combining data from 2 rows into 1

Combining data from 2 rows into 1 - row

I have an extremely complex query. I want the 2 row data combine in to 1 row.
It gives me the following Output
PNAME RN LVN HA MSW SC
AA AG-1W SS-1M LO-2W PA-1W SK-1M
AA JL-1W TD -1M NULL NULL NULL
IS there any way I could have the results in 1 Row or Combine the 2 Rows in to 1. Like As Follows.
PNAME RN LVN HA MSW SC
AA AG-1W SS-1M LO-2W PA-1W SK-1M
JL-1W TD -1M NULL NULL NULL

It is not exactly clear what you are trying to achieve, but you can implement usage of row_number() to prevent the pname from showing in additional rows:
select case when rownum = 1 then pname else '' end pname,
[RN], [LVN], [HA], [MSW], [SC]
from
(
select pname, disc, value,
ROW_NUMBER() over(partition by disc order by disc) rownum
from temp
) src
pivot
(
max(value)
for disc in ([RN], [LVN], [HA], [MSW], [SC])
) piv
See SQL Fiddle with demo
results:
| PNAME | RN | LVN | HA | MSW | SC |
----------------------------------------------------
| AA | AG-1W | SS-1M | LO-2W | PA-1W | SK-1M |
| | JL-1W | TD-1M | (null) | (null) | (null) |
This uses to the value of the row_number() to decide if the pname should be displayed. It will only show the value when the rownum=1, otherwise it will be blank.
If you want the data in a single row, you can use something similar to the following:
;with cte as
(
select pname, disc, value,
ROW_NUMBER() over(partition by disc order by disc) rownum
from temp
),
piv as
(
select *
from cte
pivot
(
max(value)
for disc in ([RN], [LVN], [HA], [MSW], [SC])
) piv
)
select pname,
STUFF((SELECT distinct ', ' + [RN]
from piv p2
where p1.pname = p2.pname
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'') RN,
STUFF((SELECT distinct ', ' + [LVN]
from piv p2
where p1.pname = p2.pname
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'') LVN,
STUFF((SELECT distinct ', ' + [HA]
from piv p2
where p1.pname = p2.pname
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'') HA,
STUFF((SELECT distinct ', ' + [MSW]
from piv p2
where p1.pname = p2.pname
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'') MSW,
STUFF((SELECT distinct ', ' + [SC]
from piv p2
where p1.pname = p2.pname
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'') SC
from piv p1
group by pname
See SQL Fiddle with Demo
The result is:
| PNAME | RN | LVN | HA | MSW | SC |
--------------------------------------------------------------------
| AA | AG-1W, JL-1W | SS-1M, TD-1M | LO-2W | PA-1W | SK-1M |

Related

Usage of TSQL string_split in DAX

I have 2 tables in SSAS / Power BI:
Table1:
| ValueName| ValueKey |
|:---- |:------: |
| abc | 1,2,3 |
Table2:
| ID | ValueKey | Value |
|:---- |:------: |:------: |
| ID1 | 1 | 87,8 |
| ID2 | 85 | 14 |
| ID3 | 90 | 95,8 |
| ID4 | 3 | 13,4 |
I need to retrieve (in temp table, later make calculations over this temp table) ID, Value and only those rows, which have ValueKey 1 or 2 or 3.
I need to do it with DAX. In SQL we have for such situation STING_SPLIT function. Is there some way how can I achive this with DAX? My ValueKey column (table1) is comma separated text and ValueKey (table2) column is INT.
Thanks in advance

Like #Jeroen Mostert suggests, you can do this by abusing the PATHCONTAINS function like this:
FilteredTable2 =
VAR CurrKey = SELECTEDVALUE ( Table1[ValueKey] )
VAR PathFromKey = SUBSTITUTE ( CurrKey, ",", "|" ) /* Paths use | as separator. */
RETURN
FILTER ( Table2, PATHCONTAINS ( PathFromKey, Table2[ValueKey] ) )
However, this is not best practice for relating tables. In general, you don't want multiple keys in a single fields.

Get latest value for each ID in Power BI / DAX measure

In Power BI I would like to create a DAX measure that will retrieve the latest string value for specific IDs. Example source table:
Name_ID | Name | DateTime | Value
----------------------------------------------------------
1 | Child_1 | 18.8.2021 12:33:24 | F
32 | Parent_32 | 18.8.2021 11:41:09 | F
13 | Child_1 | 18.8.2021 11:30:58 | E
48 | Parent_48 | 18.8.2021 09:13:11 | F
2 | Child_2 | 17.8.2021 00:09:42 | S
1 | Child_1 | 17.8.2021 23:03:34 | F
48 | Parent_48 | 17.8.2021 21:46:27 | S
6 | Parent_6 | 16.8.2021 17:31:26 | S
.
.
.
specific parents IDs for example here are 6, 32 and 48, so the result should be something like this:
Name_ID | Name | DateTime (of last execution) | Value
------------------------------------------------------------------------------
32 | Parent_32 | 18.8.2021 11:41:09 | F
48 | Parent_48 | 18.8.2021 09:13:11 | F
6 | Parent_6 | 16.8.2021 17:31:26 | S
The result table I'm trying to get is only parents latest appearance and retrieving the whole row or just Value from last column.
This seems so easy in theory and on paper but I just can't seem to get it in DAX I have tried with various calculate formulas but without any result worth mentioning .
I'm beginner in Power Bi and any help would be very appreciated!

You can use a measure like this one, where we check Max Date per Name:
Flag =
var MaxDatePerName = CALCULATE(max(Sheet3[DateTime]), FILTER(ALL(Sheet3), SELECTEDVALUE(Sheet3[Name]) = Sheet3[Name]))
return
if( MaxDatePerName = SELECTEDVALUE(Sheet3[DateTime]) && LEFT(SELECTEDVALUE(Sheet3[Name]),6) = "Parent", 1, BLANK())

With RANKX
Measure2 =
VAR _0 =
MAX ( 'Table 1'[DateTime] )
VAR _00 =
MAX ( 'Table 1'[Name] )
VAR _1 =
CALCULATE (
RANKX (
FILTER ( ALL ( 'Table 1' ), 'Table 1'[Name] = _00 ),
CALCULATE ( MAX ( 'Table 1'[DateTime] ) ),
,
DESC
)
)
VAR _2 =
IF ( _1 = 1 && CONTAINSSTRING ( _00, "Parent" ) = TRUE (), _0, BLANK () )
RETURN
_2

Row number partition by to POWER BI DAX query

Can someone help me to convert the sql string to Dax?
row_number() p over (partition by date, customer, type order by day)
The row number is my desired output.

Assuming that your data looks like this table:
Sample
+------------+----------+---------+--------+
| Date | Customer | Product | Gender |
+------------+----------+---------+--------+
| 01/01/2018 | 1234 | P2 | F |
| 01/01/2018 | 1234 | P2 | M |
| 03/01/2018 | 1235 | P1 | F |
| 03/01/2018 | 1235 | P2 | F |
+------------+----------+---------+--------+
I have created a calculated column called Rank, using the RANKX and FILTER function.
The first part of the calculation is to create variables outside the scope of the FILTER function. The second part uses RANKX that takes an expression value - in this case Gender - to order the values.
Rank =
VAR _currentdate = 'Sample'[Date]
VAR _customer = 'Sample'[Customer]
var _product = 'Sample'[Product]
return
RANKX(FILTER('Sample',
[Date]=_currentdate &&
[Customer] = _customer &&
[Product] = _product),[Gender],,ASC)
The output is
I contrasted the output to the SQL equivalent.
select
*,
row_number() over(partition by Date,Customer,Product order by Gender)
from (
select '2018-01-01' as Date,1234 as CUSTOMER,'P2' AS PRODUCT, 'M' Gender union
select '2018-01-01' as Date,1234,'P2','F' UNION
select '2018-01-03' as Date,1235,'P1','F' UNION
select '2018-01-03' as Date,1235,'P2','F'
)t1

regex for specific delimiter string in Hive serde

I use serde to read data with specific format with delimiter |
One line of my data may looks like: key1=value2|key2=value2|key3="va , lues", and I create the hive table as below:
CREATE EXTERNAL TABLE(
field1 STRING,
field2 STRING,
field3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)",
"output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE;
I need to extract all values, ignore all quotas if they exist.
Result looks like a
value2 value2 va , lues
How can I change my current regexp for extractig values ?

I can currently offer 2 options, none of them is perfect.
BTW, "output.format.string" is obsolete and has no effect.
1
create external table mytable
(
q1 string
,field1 string
,q2 string
,field2 string
,q3 string
,field3 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '.*?=(?<q1>"?)(.*?)(?:\\k<q1>)\\|.*?=(?<q2>"?)(.*?)(?:\\k<q2>)\\|.*?=(?<q3>"?)(.*?)(?:\\k<q3>)')
stored as textfile
;
select * from mytable
;
+----+--------+----+--------+----+-----------+
| q1 | field1 | q2 | field2 | q3 | field3 |
+----+--------+----+--------+----+-----------+
| | value2 | | value2 | " | va , lues |
+----+--------+----+--------+----+-----------+
2
create external table mytable
(
field1 string
,field2 string
,field3 string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ('input.regex' = '.*?=(".*?"|.*?)\\|.*?=(".*?"|.*?)\\|.*?=(".*?"|.*?)')
stored as textfile
;
select * from mytable
;
+--------+--------+-------------+
| field1 | field2 | field3 |
+--------+--------+-------------+
| value2 | value2 | "va , lues" |
+--------+--------+-------------+

How to remove duplicates in hive string?

I have a comma-separated column(string) with duplicate values. I want to remove duplicates:
e.g.
column_name
-----------------
gun,gun,man,gun,man
shuttle,enemy,enemy,run
hit,chase
I want result like:
column_name
----------------
gun,man
shuttle,enemy,run
hit,chase
I am using hive database.

Option 1: keep last occurrence
This will keep the last occurrence of every word.
E.g. 'hello,world,hello,world,hello' will result in 'world,hello'
select regexp_replace
(
column_name
,'(?<=^|,)(?<word>.*?),(?=.*(?<=,)\\k<word>(?=,|$))'
,''
)
from mytable
;
+-------------------+
| gun,man |
| shuttle,enemy,run |
| hit,chase |
+-------------------+
Option 2: keep first occurrence
This will keep the first occurrence of every word.
E.g. 'hello,world,hello,world,hello' will result in 'hello,world'
select reverse
(
regexp_replace
(
reverse(column_name)
,'(?<=^|,)(?<word>.*?),(?=.*(?<=,)\\k<word>(?=,|$))'
,''
)
)
from mytable
;
Option 3: sorted
E.g. 'Cherry,Apple,Cherry,Cherry,Cherry,Banana,Apple' will result in 'Apple,Banana,Cherry'
select regexp_replace
(
concat_ws(',',sort_array(split(column_name,',')))
,'(?<=^|,)(?<word>.*?)(,\\k<word>(?=,|$))+'
,'${word}'
)
from mytable
;

If value sort is not a concern:
with mytable as (
select 'gun,gun,man,gun,man' as column_name union
select 'shuttle,enemy,enemy,run' as column_name union
select 'hit,chase' as column_name
) -- test data
SELECT column_name, concat_ws(',',collect_set(item)) from (
select distinct column_name, s.item from mytable
lateral view explode(split(column_name,',')) s as item
) t
group by column_name
;
+--------------------------+--------------------+--+
| column_name | _c1 |
+--------------------------+--------------------+--+
| gun,gun,man,gun,man | gun,man |
| hit,chase | chase,hit |
| shuttle,enemy,enemy,run | enemy,run,shuttle |
+--------------------------+--------------------+--+
If want to keep the value sorted:
with mytable as (
select 'gun,gun,man,gun,man' as column_name union
select 'shuttle,enemy,enemy,run' as column_name union
select 'hit,chase' as column_name
) -- test data
select column_name,concat_ws(',',collect_set(item)) as column_name_distincted
from (
select column_name,item, min(pos) as pos
from (
select column_name,pos,item
from mytable
lateral view posexplode(split(column_name,',')) s as pos,item
) t
group by column_name,item
order by column_name,pos
) t
group by column_name
;
+--------------------------+-------------------------+--+
| column_name | column_name_distincted |
+--------------------------+-------------------------+--+
| gun,gun,man,gun,man | gun,man |
| hit,chase | hit,chase |
| shuttle,enemy,enemy,run | shuttle,enemy,run |
+--------------------------+-------------------------+--+

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Combining data from 2 rows into 1 - row

Related

Usage of TSQL string_split in DAX

Get latest value for each ID in Power BI / DAX measure

Row number partition by to POWER BI DAX query

regex for specific delimiter string in Hive serde

How to remove duplicates in hive string?

Categories

Resources