How to Retrieve Part of Text Using Regex Substring

Hi all, I have query text from Query History such as "Create OR Replace Procedure PROCEDURENAME()". I want to extract the procedure name ("PROCEDURENAME" in this case) using the regex substring function. A developer can also create a procedure with the syntax "CREATE PROCEDURE PROCEDURENAME()", so the regular expression should find the procedure name in that case too.

If you want to get the procedure name and parameters, you can use the following:
select regexp_substr( qtext, 'CREATE.*PROCEDURE\\s+([^)]*\\))', 1, 1, 'ei', 1 ) as res from
values('Create OR Replace Procedure PROCEDURENAME1() as xyz'),
('Create procedure PROCEDURENAME2( a varchar)'),
('Create procedure PROCEDURENAME3( a varchar, b number)')
tmp(qtext) ;
+--------------------------------------+
| RES |
+--------------------------------------+
| PROCEDURENAME1() |
| PROCEDURENAME2( a varchar) |
| PROCEDURENAME3( a varchar, b number) |
+--------------------------------------+
If you want to parse only the procedure name, you can use this one:
select regexp_substr( qtext, 'CREATE.*PROCEDURE\\s+([^()]*)\\(.*\\)', 1, 1, 'ei', 1 ) as res from
values('Create OR Replace Procedure PROCEDURENAME1() as xyz'),
('Create procedure PROCEDURENAME2( a varchar)'),
('Create procedure PROCEDURENAME3( a varchar, b number)')
tmp(qtext) ;
+----------------+
| RES |
+----------------+
| PROCEDURENAME1 |
| PROCEDURENAME2 |
| PROCEDURENAME3 |
+----------------+
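
To try the pattern outside the database, here is a minimal Python sketch of the same idea. It assumes Python's `re` module rather than the SQL `regexp_substr` function, so single backslashes are enough inside a raw string; the helper name `procedure_name` is hypothetical and not part of the original answer.

```python
import re

# Same idea as the second query: skip "CREATE ... PROCEDURE", then capture
# everything up to the opening parenthesis of the parameter list.
# re.IGNORECASE plays the role of the 'i' flag in the SQL version.
PROC_NAME = re.compile(r"CREATE.*PROCEDURE\s+([^()]*)\(.*\)", re.IGNORECASE)


def procedure_name(qtext):
    """Return the procedure name from a CREATE PROCEDURE statement, or None."""
    match = PROC_NAME.search(qtext)
    return match.group(1).strip() if match else None


queries = [
    "Create OR Replace Procedure PROCEDURENAME1() as xyz",
    "Create procedure PROCEDURENAME2( a varchar)",
    "Create procedure PROCEDURENAME3( a varchar, b number)",
]

names = [procedure_name(q) for q in queries]
print(names)
```

Because `[^()]*` cannot cross a parenthesis, the capture stops at the start of the parameter list regardless of how many parameters follow.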

Related

Usage of TSQL string_split in DAX

I have 2 tables in SSAS / Power BI:
Table1:
| ValueName| ValueKey |
|:---- |:------: |
| abc | 1,2,3 |
Table2:
| ID | ValueKey | Value |
|:---- |:------: |:------: |
| ID1 | 1 | 87,8 |
| ID2 | 85 | 14 |
| ID3 | 90 | 95,8 |
| ID4 | 3 | 13,4 |
I need to retrieve ID and Value (into a temp table, so I can run calculations over it later), keeping only those rows whose ValueKey is 1, 2, or 3.
I need to do it with DAX. In SQL we have the STRING_SPLIT function for this situation. Is there some way I can achieve this with DAX? My ValueKey column in Table1 is comma-separated text, and the ValueKey column in Table2 is INT.
Thanks in advance.

As @Jeroen Mostert suggests, you can do this by abusing the PATHCONTAINS function like this:
FilteredTable2 =
VAR CurrKey = SELECTEDVALUE ( Table1[ValueKey] )
VAR PathFromKey = SUBSTITUTE ( CurrKey, ",", "|" ) /* Paths use | as separator. */
RETURN
FILTER ( Table2, PATHCONTAINS ( PathFromKey, Table2[ValueKey] ) )
However, this is not best practice for relating tables. In general, you don't want multiple keys in a single field.
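
For readers who want to see the filtering logic spelled out, here is a minimal Python sketch (not DAX) of what the PATHCONTAINS trick achieves: split the comma-separated key text into individual keys and keep only the rows whose key is in that set. The function name `filter_by_keys` and the dictionary layout are assumptions made for illustration.

```python
# Sample data copied from the question; Value is kept as text because the
# source uses a decimal comma ("87,8").
table1_value_key = "1,2,3"

table2 = [
    {"ID": "ID1", "ValueKey": 1, "Value": "87,8"},
    {"ID": "ID2", "ValueKey": 85, "Value": "14"},
    {"ID": "ID3", "ValueKey": 90, "Value": "95,8"},
    {"ID": "ID4", "ValueKey": 3, "Value": "13,4"},
]


def filter_by_keys(rows, key_text):
    """Keep rows whose integer ValueKey appears in the comma-separated key_text."""
    wanted = {int(part) for part in key_text.split(",") if part.strip()}
    return [row for row in rows if row["ValueKey"] in wanted]


filtered = filter_by_keys(table2, table1_value_key)
for row in filtered:
    print(row["ID"], row["Value"])
```

Converting the text keys to integers up front mirrors the type mismatch in the question, where Table1 holds text and Table2 holds INT.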

SAS IFN function gets stuck

Before we get to my question, please note that I purposefully did not include example data in this post, as my problem occurs on my full dataset and subsets of it. I have two datasets with client data in the following format.
Have_1
+------------+------------+------+
| dt | dt_next | id |
+------------+------------+------+
| 30.09.2010 | 31.10.2010 | 0001 |
+------------+------------+------+
| 31.10.2010 | 30.11.2010 | 0001 |
+------------+------------+------+
| 30.11.2010 | 31.12.2010 | 0001 |
+------------+------------+------+
| 31.12.2010 | 31.01.2011 | 0001 |
+------------+------------+------+
Have_2
+------+-------+------------+------------+
| id | event | start_date | end_date |
+------+-------+------------+------------+
| 0001 | 1 | 31.10.2010 | 30.11.2010 |
+------+-------+------------+------------+
| 0001 | 2 | 31.10.2010 | 31.12.2010 |
+------+-------+------------+------------+
I am trying to use the IFN function to put 1-0 flags in my dataset by using the following logic:
Proc SQL;
Create table want as
Select a.*
,ifn(a.id in (select id from have_2 where a.dt <= end_date and start_date <= a.dt_next), 1, 0) as flg_1
,ifn(a.id in (select id from have_2 where a.dt <= end_date and start_date <= a.dt), 1, 0) as flg_2
From have_1 as a;
Quit;
The code works fine if I take only one client. However, if I take the full dataset (or even a small subset of it, such as only 10 clients), the code gets stuck: the process begins without error but simply never finishes. I tried adding indexes to both input datasets, without success.
Are there any peculiarities of the IFN function that could make it behave this way?

So why not just join and take the max over all events, checking whether any event's dates fall into those periods? That should eliminate the need to run two subqueries for every observation in HAVE_1.
proc sql;
create table want2 as
select a.id
, a.dt
, a.dt_next
, max(a.dt <= b.end_date and b.start_date <= a.dt_next) as flg1
, max(a.dt <= b.end_date and b.start_date <= a.dt) as flg2
from have_1 a
left join have_2 b
on a.id = b.id
group by 1,2,3
;
quit;
Note that the issue is with the subqueries, not the IFN() function call. There is also no need for the IFN() function here: SAS evaluates boolean expressions to 1 or 0, so the expression a=b returns the same result as IFN(a=b,1,0).
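
To make the join-and-max logic concrete, here is a minimal Python sketch (not SAS) that computes the same two flags on the sample rows from the question. The function name `flag_rows` and the dictionary layout are assumptions for illustration; `any(...)` over the matching events stands in for `max(...)` over the joined rows, and `int(...)` turns the boolean into 1 or 0.

```python
from datetime import date

have_1 = [
    {"id": "0001", "dt": date(2010, 9, 30), "dt_next": date(2010, 10, 31)},
    {"id": "0001", "dt": date(2010, 10, 31), "dt_next": date(2010, 11, 30)},
    {"id": "0001", "dt": date(2010, 11, 30), "dt_next": date(2010, 12, 31)},
    {"id": "0001", "dt": date(2010, 12, 31), "dt_next": date(2011, 1, 31)},
]

have_2 = [
    {"id": "0001", "event": 1, "start_date": date(2010, 10, 31), "end_date": date(2010, 11, 30)},
    {"id": "0001", "event": 2, "start_date": date(2010, 10, 31), "end_date": date(2010, 12, 31)},
]


def flag_rows(periods, events):
    """Attach flg1/flg2 to each period, looking only at events with the same id."""
    events_by_id = {}
    for event in events:
        events_by_id.setdefault(event["id"], []).append(event)

    flagged = []
    for p in periods:
        matches = events_by_id.get(p["id"], [])  # empty list acts like the left join
        flg1 = int(any(p["dt"] <= e["end_date"] and e["start_date"] <= p["dt_next"] for e in matches))
        flg2 = int(any(p["dt"] <= e["end_date"] and e["start_date"] <= p["dt"] for e in matches))
        flagged.append({**p, "flg1": flg1, "flg2": flg2})
    return flagged


want = flag_rows(have_1, have_2)
for row in want:
    print(row["dt"], row["flg1"], row["flg2"])
```

Grouping the events by id once, instead of scanning all events for every row, is the same idea as replacing the two correlated subqueries with a single join.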

PowerQuery - Fill missing data according to specific pattern

I am trying to clean data received from an Excel file and transform it using PowerQuery (in PowerBI) into a useable format.
Below a sample table, and what I am trying to do:
| Country | Type of location |
|--------- |------------------ |
| A | 1 |
| | 2 |
| | 3 |
| B | 1 |
| | 2 |
| | 3 |
| C | 1 |
| | 2 |
| | 3 |
As you can see, I have a list of location types for each country (always constant, always the same number per country, i.e. each country has 3 rows for 3 location types).
I would like to know whether there is a way to fill the empty cells in the "Country" column with the appropriate country name, which would give something like this:
| Country | Type of location |
|--------- |------------------ |
| A | 1 |
| A | 2 |
| A | 3 |
| B | 1 |
| B | 2 |
| B | 3 |
| C | 1 |
| C | 2 |
| C | 3 |
For now I thought about using a series of if/else if conditions, but as there are 100+ countries this doesn't seem like the right solution.
Is there any way to do this more efficiently?

As Murray mentions, the Table.FillDown function works great and is built into the GUI under the Transform tab in the query editor.
Note that it only fills down to replace nulls, so if you have empty strings instead of nulls in those rows, you'll need to do a replacement first. The button for that is just above the Fill button in the GUI.
Alternatively, skip the GUI and use the M code that the replacement dialog generates:
= Table.ReplaceValue(#"Previous Step","",null,Replacer.ReplaceValue,{"Country"})
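
The two steps described above (turn empty strings into nulls, then fill down) can be sketched in plain Python to show what happens to each row. This is an illustration of the logic only, not Power Query code; the function name `fill_down` and the list-of-dictionaries layout are assumptions.

```python
def fill_down(rows, column):
    """Return a copy of rows where blank cells in column take the last non-blank value."""
    filled = []
    last_seen = None
    for row in rows:
        value = row[column]
        if value == "":
            value = None  # step 1: treat empty strings as nulls
        if value is None:
            value = last_seen  # step 2: fill down from the previous non-null value
        else:
            last_seen = value
        filled.append({**row, column: value})
    return filled


sample = [
    {"Country": "A", "Type of location": 1},
    {"Country": "", "Type of location": 2},
    {"Country": None, "Type of location": 3},
    {"Country": "B", "Type of location": 1},
    {"Country": "", "Type of location": 2},
    {"Country": "", "Type of location": 3},
]

result = fill_down(sample, "Country")
print([row["Country"] for row in result])
```

As with the built-in fill, the result depends on row order, so the rows must already be in the intended sequence.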

Yes, just as in Excel, you can fill down; see the documentation for Table.FillDown.
I believe you will need to sort the data correctly first.
The example from the docs:
Table.FillDown(
Table.FromRecords({
[Place = 1, Name = "Bob"],
[Place = null, Name = "John"],
[Place = 2, Name = "Brad"],
[Place = 3, Name = "Mark"],
[Place = null, Name = "Tom"],
[Place = null, Name = "Adam"]
}),
{"Place"}
)

How to remove duplicates in hive string?

I have a comma-separated column(string) with duplicate values. I want to remove duplicates:
e.g.
column_name
-----------------
gun,gun,man,gun,man
shuttle,enemy,enemy,run
hit,chase
I want result like:
column_name
----------------
gun,man
shuttle,enemy,run
hit,chase
I am using hive database.

Option 1: keep last occurrence
This will keep the last occurrence of every word.
E.g. 'hello,world,hello,world,hello' will result in 'world,hello'
select regexp_replace
(
column_name
,'(?<=^|,)(?<word>.*?),(?=.*(?<=,)\\k<word>(?=,|$))'
,''
)
from mytable
;
+-------------------+
| gun,man |
| shuttle,enemy,run |
| hit,chase |
+-------------------+
Option 2: keep first occurrence
This will keep the first occurrence of every word.
E.g. 'hello,world,hello,world,hello' will result in 'hello,world'
select reverse
(
regexp_replace
(
reverse(column_name)
,'(?<=^|,)(?<word>.*?),(?=.*(?<=,)\\k<word>(?=,|$))'
,''
)
)
from mytable
;
Option 3: sorted
E.g. 'Cherry,Apple,Cherry,Cherry,Cherry,Banana,Apple' will result in 'Apple,Banana,Cherry'
select regexp_replace
(
concat_ws(',',sort_array(split(column_name,',')))
,'(?<=^|,)(?<word>.*?)(,\\k<word>(?=,|$))+'
,'${word}'
)
from mytable
;
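
To compare the three behaviours side by side, here is a minimal Python sketch (not Hive) of what each option is meant to return. The function names are hypothetical, and the sketch relies on Python dictionaries preserving insertion order rather than on regular expressions.

```python
def keep_last(csv_text):
    """Remove duplicates, keeping the last occurrence of every word (Option 1)."""
    items = csv_text.split(",")
    unique_reversed = list(dict.fromkeys(reversed(items)))
    return ",".join(reversed(unique_reversed))


def keep_first(csv_text):
    """Remove duplicates, keeping the first occurrence of every word (Option 2)."""
    return ",".join(dict.fromkeys(csv_text.split(",")))


def sorted_unique(csv_text):
    """Remove duplicates and return the words in sorted order (Option 3)."""
    return ",".join(sorted(set(csv_text.split(","))))


print(keep_last("hello,world,hello,world,hello"))
print(keep_first("hello,world,hello,world,hello"))
print(sorted_unique("Cherry,Apple,Cherry,Cherry,Cherry,Banana,Apple"))
```

Note that `keep_last` reverses, deduplicates, and reverses again, which is the mirror image of how Option 2 wraps the Option 1 expression in `reverse`.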

If value sort is not a concern:
with mytable as (
select 'gun,gun,man,gun,man' as column_name union
select 'shuttle,enemy,enemy,run' as column_name union
select 'hit,chase' as column_name
) -- test data
SELECT column_name, concat_ws(',',collect_set(item)) from (
select distinct column_name, s.item from mytable
lateral view explode(split(column_name,',')) s as item
) t
group by column_name
;
+--------------------------+--------------------+--+
| column_name | _c1 |
+--------------------------+--------------------+--+
| gun,gun,man,gun,man | gun,man |
| hit,chase | chase,hit |
| shuttle,enemy,enemy,run | enemy,run,shuttle |
+--------------------------+--------------------+--+
If you want to keep the values in their original order:
with mytable as (
select 'gun,gun,man,gun,man' as column_name union
select 'shuttle,enemy,enemy,run' as column_name union
select 'hit,chase' as column_name
) -- test data
select column_name,concat_ws(',',collect_set(item)) as column_name_distincted
from (
select column_name,item, min(pos) as pos
from (
select column_name,pos,item
from mytable
lateral view posexplode(split(column_name,',')) s as pos,item
) t
group by column_name,item
order by column_name,pos
) t
group by column_name
;
+--------------------------+-------------------------+--+
| column_name | column_name_distincted |
+--------------------------+-------------------------+--+
| gun,gun,man,gun,man | gun,man |
| hit,chase | hit,chase |
| shuttle,enemy,enemy,run | shuttle,enemy,run |
+--------------------------+-------------------------+--+

Show count of columns distinct values

Hello my fellow colleagues from StackOverflow!
I will be brief, and cut to the point:
I have a table in MS Access, it contains 2 columns of interest- County, and TGTE (Type Of Geothermal Energy ). Column TGTE is of type VARCHAR and it can have 1 of two values, to make it easier let's say it is either L or H.
I need to create an SQL query that shows the result described below.
Below is part of the table:
County | TGTE | ... |
First | L |
First | L |
First | H |
Second | H |
Third | L |
__________________
I need a resulting query that shows the count of distinct TGTE in every County like this:
County | TGTE = L | TGTE = H |
First | 2 | 1 |
Second | 0 | 1 |
Third | 1 | 0 |
__________________________________
How can I create a query that displays the desired result described above?
NOTE:
I have browsed through archive, and found similar things, but nothing to help me.
To be honest, I do not know how to formulate the question properly, so I guess that is why Google couldn't be of much help...
I have tried with this:
SELECT County, COUNT(TGTE) as [Something]
FROM MyTable
WHERE TGTE = "L"
GROUP BY County;
but this is the result I get:
County | TGTE = L |
First | 2 |
Second | 0 |
Third | 1 |
__________________________________
If I change L to H, in the query above, I get this:
County | TGTE = H |
First | 1 |
Second | 1 |
Third | 0 |
__________________________________
I work on Windows XP, in C++, using ADO to access an MS Access 2007 database.
If there is anything else that I can do to help, ask and I will gladly do it.
EDIT #1:
After trying Declan's solution this is what I get:
Values in main table:
| County | TGTE |
| Стари Град | H |
| Сурчин | L |
| Стари Град | H |
| Савски Венац | H |
| Раковица | H |
Output :
| County | TGTE = L | TGTE = H |
| Раковица | 1 | 1 |
| Савски Венац | 1 | 0 |
| Сурчин | 1 | 0 |
| Стари Град | 1 | 0 |
It should output this:
| County | TGTE = L | TGTE = H |
| Раковица | 1 | 0 |
| Савски Венац | 1 | 0 |
| Сурчин | 0 | 1 |
| Стари Град | 2 | 0 |
EDIT #2:
On Declan's request, here is the original query I use:
wchar_t *query = L"select Општина, \
sum( iif( Тип_геотермалне_енергије = \
'Хидрогеотермална енергија', 1, 0 ) ) as [HGTE], \
sum( iif( Тип_геотермалне_енергије = \
'Литогеотермална енергија', 1, 0 ) ) as [LGTE] \
from Објекат \
group by Општина; ";
Translated to our example, it looks like this:
wchar_t *query = L"select County, \
sum( iif( TGTE = 'H', 1, 0 ) ) as [HGTE], \
sum( iif( TGTE = 'L', 1, 0 ) ) as [LGTE] \
from MyTable \
group by County; ";
EDIT #3:
After I copy the above query into Access and run it, everything works fine, so I believe that the problem lies in the usage of ADO.
EDIT #4:
After browsing through Internet, I am sure that problem is ADO.
How can I use IIF() in ADO so my query can work?
If it can't be done, then how can I modify my query to do what I have described above?

You need to use the iif function within the two additional columns. Here is some pseudo code to get you started.
SELECT County
,sum(iif(TGTE = "L",1,0)) as [L_Count]
,sum(iif(TGTE = "H",1,0)) as [H_Count]
FROM MyTable
GROUP BY
COUNTY;
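
To show what the `sum(iif(...))` pattern computes, here is a minimal Python sketch (not Access SQL) that performs the same conditional counting on the sample rows from the question. The function name `count_by_county` and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict

# Sample rows copied from the question: (County, TGTE)
rows = [
    ("First", "L"),
    ("First", "L"),
    ("First", "H"),
    ("Second", "H"),
    ("Third", "L"),
]


def count_by_county(records):
    """Count L and H values per county, like sum(iif(TGTE = ..., 1, 0)) with GROUP BY."""
    counts = defaultdict(lambda: {"L_Count": 0, "H_Count": 0})
    for county, tgte in records:
        counts[county]["L_Count"] += 1 if tgte == "L" else 0
        counts[county]["H_Count"] += 1 if tgte == "H" else 0
    return dict(counts)


result = count_by_county(rows)
for county, c in result.items():
    print(county, c["L_Count"], c["H_Count"])
```

Each row contributes 1 to exactly one of the two columns, which is why summing the conditional values gives a per-county count for each type.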

I have reworked Declan's query as shown below, and it works:
SELECT County
,sum( switch( TGTE = 'L', 1, TGTE = 'H', 0 ) ) as [L_Count]
,sum( switch( TGTE = 'H', 1, TGTE = 'L', 0 ) ) as [H_Count]
FROM MyTable
GROUP BY
County;
Everything works fine when I run it through ADO and MS Access 2007.
I do not understand why IIF() isn't working in ADO; maybe it is not supported.
Thank you anyway, Declan, for your solution. You have +1 from me.
