SQL: Multiple IFs examining the same variable - if-statement

Let's say we have two variables in a SQL statement.
@myVar1 VARCHAR(100) = 'I ate an apple for lunch'
@myVar2 INT
Now, I want the following logic:
IF @myVar1 like '%apple%' THEN @myVar2 = 1
IF @myVar1 like '%orange%' THEN @myVar2 = 2
IF @myVar1 like '%banana%' THEN @myVar2 = 3
IF @myVar1 like '%kiwi%' THEN @myVar2 = 4
IF @myVar1 like '%pineapple%' THEN @myVar2 = 5
This is outside of any select statement. I just need to examine a passed in string, and set an integer according to an expected sub-string. There would be a finite list of possible values for #myVar1, and they are all known.
I just need help with the T-SQL syntax. Thanks!

Use CASE as follows:
DECLARE @myVar1 VARCHAR(100) = 'I ate an apple for lunch'
DECLARE @myVar2 INT
SET @myVar2 =
CASE
-- CASE returns the first matching branch, so test 'pineapple' before 'apple'
WHEN @myVar1 LIKE '%pineapple%' THEN 5
WHEN @myVar1 LIKE '%apple%' THEN 1
WHEN @myVar1 LIKE '%orange%' THEN 2
WHEN @myVar1 LIKE '%banana%' THEN 3
WHEN @myVar1 LIKE '%kiwi%' THEN 4
END
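The first-match behaviour of CASE can be checked outside SQL Server with a small Python sketch built on the standard-library sqlite3 module. It assumes SQLite's CASE and LIKE behave like T-SQL's for this purpose, and the helper name first_match is made up for illustration. It shows why branch order matters: 'pineapple' also contains 'apple'.

```python
import sqlite3

def first_match(text, ordered_patterns):
    """Evaluate CASE WHEN text LIKE pattern THEN value ... END in SQLite.

    ordered_patterns is a non-empty list of (token, value) pairs; the first
    matching branch wins, and None is returned when nothing matches.
    """
    whens = " ".join("WHEN ? LIKE ? THEN ?" for _ in ordered_patterns)
    params = []
    for token, value in ordered_patterns:
        params.extend([text, f"%{token}%", value])
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(f"SELECT CASE {whens} END", params).fetchone()[0]
    finally:
        conn.close()

# Two orderings of the same branches (illustrative data).
apple_first = [("apple", 1), ("pineapple", 5)]
pineapple_first = [("pineapple", 5), ("apple", 1)]
```

With apple_first, a sentence about a pineapple is classified as apple; with pineapple_first it gets the pineapple value.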

Create a table with the pattern as the key and the integer value as another column. Then you can write
SELECT @MyVar2 = IntCol
FROM TheTable
WHERE @MyVar1 LIKE PatternCol
Now you can manage the table contents and not have to edit code.

It is recommended to store data into tables, not within SQL code. In your case that means storing mappings between patterns and values similar to this:
CREATE TABLE FruitValue (
FruitValueId INT IDENTITY(1, 1),
Token VARCHAR(32),
Value INT
)
Inserted values should be like the following:
('apple', 1), ('orange', 2) and so on
From your example, it is not clear what happens if multiple fruits appear in one sentence, but for the simple example of one fruit per sentence, the following query could be used (adapt to the actual used SQL flavor):
SELECT TOP 1 @myVar2 = Value
FROM FruitValue
WHERE @myVar1 LIKE '%' + Token + '%'
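As a rough illustration of the lookup-table idea, here is a Python sketch using the standard-library sqlite3 module rather than SQL Server, so LIMIT 1 stands in for TOP 1 and || for +. The ORDER BY LENGTH(Token) DESC tie-break is an assumption added here, not part of the answer; it is one possible way to decide what happens when several tokens match (for example 'apple' inside 'pineapple'). The function names are hypothetical.

```python
import sqlite3

def build_lookup(conn):
    """Create and fill the pattern-to-value table (names follow the answer)."""
    conn.execute("CREATE TABLE FruitValue (Token TEXT PRIMARY KEY, Value INTEGER)")
    conn.executemany(
        "INSERT INTO FruitValue VALUES (?, ?)",
        [("apple", 1), ("orange", 2), ("banana", 3), ("kiwi", 4), ("pineapple", 5)],
    )

def fruit_value(conn, sentence):
    """Return the value of the longest token found in sentence, or None."""
    row = conn.execute(
        "SELECT Value FROM FruitValue "
        "WHERE ? LIKE '%' || Token || '%' "
        "ORDER BY LENGTH(Token) DESC LIMIT 1",  # assumed tie-break: longest token wins
        (sentence,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
build_lookup(conn)
```

Adding a new fruit then means inserting a row, not editing the query.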

What about a catalog-table?
CREATE TABLE Fruits(ID INT IDENTITY, FruitName VARCHAR(100));
INSERT INTO Fruits VALUES
('Apple'),('Orange'),('Banana'),('Kiwi'),('Pineapple');
DECLARE @myVar1 VARCHAR(100)='I ate an apple for lunch';
DECLARE @myVar2 INT=(SELECT ID FROM Fruits WHERE @myVar1 LIKE '%' + FruitName + '%');
SELECT @myVar2;
--If there might be more than one:
SELECT *
FROM Fruits
WHERE @myVar1 LIKE '%' + FruitName + '%';
DROP TABLE Fruits

You could also try this:
declare
@myVar1 VARCHAR(100) = 'I ate an apple for lunch',
@myVar2 INT
IF @myVar1 like '%apple%' set @myVar2 = 1 else
IF @myVar1 like '%orange%' set @myVar2 = 2 else
IF @myVar1 like '%banana%' set @myVar2 = 3 else
IF @myVar1 like '%kiwi%' set @myVar2 = 4 else
IF @myVar1 like '%pineapple%' set @myVar2 = 5
select @myVar2

Related

how to get all the words that start with a certain character in bigquery

I have a text column in a bigquery table. Sample record of that column looks like -
with temp as
(
select 1 as id,"as we go forward into unchartered waters it's important to remember we are all in this together. #united #community" as input
union all
select 2 , "US cities close bars, restaurants and cinemas #Coronavirus"
)
select *
from temp
I want to extract all the words in this column that start with a #. Later on I would like to get the frequency of these terms. How do I do this in BigQuery?
My output would look like -
id, word
1, united
1, community
2, coronavirus
Below is for BigQuery Standard SQL
I want to extract all the words in this column that start with a #
#standardSQL
WITH temp AS (
SELECT 1 AS id,"as we go forward into unchartered waters it's important to remember we are all in this together. #united #community" AS input UNION ALL
SELECT 2 , "US cities close bars, restaurants and cinemas #Coronavirus"
)
SELECT id, word
FROM temp, UNNEST(REGEXP_EXTRACT_ALL(input, r'(?:^|\s)#([^#\s]*)')) word
with output
Row id word
1 1 united
2 1 community
3 2 Coronavirus
later on I would like to get the frequency of these terms
#standardSQL
SELECT word, COUNT(1) frequency
FROM temp, UNNEST(REGEXP_EXTRACT_ALL(input, r'(?:^|\s)#([^#\s]*)')) word
GROUP BY word
You can do this without regexes, by splitting words and then selecting ones that start the way you want. For example:
SELECT
id,
ARRAY(SELECT TRIM(x, "#") FROM UNNEST(SPLIT(input, ' ')) as x WHERE STARTS_WITH(x,'#')) str
FROM
temp
If you prefer the hashtags to be separate rows, you can be a bit tidier:
SELECT
id,
TRIM(x, "#") str
FROM
temp,
UNNEST(SPLIT(input, ' ')) x
WHERE
STARTS_WITH(x,'#')
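For readers who want to try the pattern outside BigQuery, here is a minimal Python sketch using the same regular expression with the standard re module. Python's re is not the RE2 engine BigQuery uses, but this pattern behaves the same way in both as far as I can tell. Lower-casing before counting is an assumption, chosen because the question's expected output shows "coronavirus" in lower case; the function names are made up.

```python
import re
from collections import Counter

# Same pattern as in the REGEXP_EXTRACT_ALL answer above.
HASHTAG = re.compile(r"(?:^|\s)#([^#\s]*)")

rows = [
    (1, "as we go forward into unchartered waters it's important to remember "
        "we are all in this together. #united #community"),
    (2, "US cities close bars, restaurants and cinemas #Coronavirus"),
]

def extract_hashtags(rows):
    """Return one (id, word) pair per hashtag, like the UNNEST query."""
    return [(row_id, word) for row_id, text in rows for word in HASHTAG.findall(text)]

def hashtag_frequency(rows):
    """Count hashtags case-insensitively (assumption: case should not matter)."""
    return Counter(word.lower() for _, word in extract_hashtags(rows))
```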

Create new Column in Power Bi with RegEx

I'm relatively new to Power BI and want to generate a new column based on an existing one. The contents of the new column should be determined by the first digit of another column. For example:
ColumnA NewColumn
1123 Argentina
5644 Brazil
5555 Brazil
3334 Denmark
1124 Argentina
As you can see, the first value of the number decides which country will be added to the new column.
In SQL I know that I can use something like this:
select * from table where column LIKE '%[2]%'
and so on, but is this possible with Power BI? Thanks a lot.
Edit:
My additional list looks like this:
ID Country
1 Argentina
2 Swiss
3 Denmark
4 Norway
5 Brazil
and so on...
I thought I could use something like this:
NewColumn = IF('table'[ColumnA] = "%[1]%"
THEN "Argentina"
ELSE IF('table'[ColumnA] = "%[2]%"
THEN "Swiss"
ELSE "No Country")
Add your Number / Country List to a new table. Let's assume you call it Countries.
Now you can add a column to your original table (let's assume you've called that one Fact Table), using something like:
Country =
LOOKUPVALUE (
Countries[Country],
Countries[ID],
VALUE ( LEFT ( 'Fact Table'[ColumnA], 1 ) )
)
See https://pwrbi.com/so_56391689/ for worked example.
Okay, I've now also found a solution:
NewColumn = SWITCH(TRUE();
LEFT(table[ColumnA]; 1) in {"1"}; "Argentina";
LEFT(table[ColumnA]; 1) in {"2"}; "Swiss";
LEFT(table[ColumnA]; 1) in {"3"}; "Denmark";
LEFT(table[ColumnA]; 1) in {"4"}; "Norway";
LEFT(table[ColumnA]; 1) in {"5"}; "Brazil"
)
Works very well :)
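The logic shared by both answers (take the first digit, look it up in the ID/Country list) can be sketched in plain Python for anyone who wants to check the mapping before writing the DAX. This is only an illustration: the function name country_for is made up, and the "No Country" fallback is taken from the question's own attempt.

```python
# ID -> Country list from the question.
COUNTRIES = {1: "Argentina", 2: "Swiss", 3: "Denmark", 4: "Norway", 5: "Brazil"}

def country_for(value, default="No Country"):
    """Map a value to a country using its first digit, like LEFT(ColumnA, 1)."""
    text = str(value).strip()
    if not text or text[0] not in "0123456789":
        return default
    return COUNTRIES.get(int(text[0]), default)
```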

Why can't BigQuery cast this number as an integer?

In my query, I have a value formatted as a dollar amount, like this:
Coverage_Amount
$10,000
$15,000
null
$2,000
So I remove the extra characters and map the null to 0. I get a column back like this:
Coverage_Amount
10000
15000
0
2000
However, these values are stored as strings, and when I try something like this:
CASE
WHEN Coverage_Amount IS NOT NULL THEN INTEGER(REGEXP_REPLACE(query.Coverage_Amount, r'\$|,', ''))
ELSE 0
END AS Coverage_Amount
I get back
Coverage_Amount
null
null
0
null
The documentation for the INTEGER() function says
Casts expr to a 64-bit integer. Returns NULL if is a string that doesn't correspond to an integer value.
Is there anything I can do to make BigQuery recognize that these are in fact integers?
Both versions below (Legacy SQL and Standard SQL, respectively) work in BigQuery and return the following result:
Coverage_Amount val
10000 10000
15000 15000
2000 2000
Legacy SQL
#legacySQL
SELECT
Coverage_Amount,
IFNULL(INTEGER(REGEXP_REPLACE(Coverage_Amount, r'\$|,', '')), 0) AS val
FROM
(SELECT '10000' Coverage_Amount),
(SELECT '15000' Coverage_Amount),
(SELECT '2000' Coverage_Amount)
Standard SQL
#standardSQL
WITH `project.dataset.table` AS (
SELECT '10000' Coverage_Amount UNION ALL
SELECT '15000' UNION ALL
SELECT '2000'
)
SELECT
Coverage_Amount,
IFNULL(CAST(REGEXP_REPLACE(Coverage_Amount, r'\$|,', '') AS INT64), 0) AS val
FROM `project.dataset.table`
Obviously, same works for '$15,000' and '$10,000' and '$2,000' etc.
It could be because you have spaces at the end of the string, for example '$10,000 '. In that case you can try RTRIM(value, ' '):
SELECT
Coverage_Amount,
IFNULL(INTEGER(REGEXP_REPLACE(RTRIM(Coverage_Amount, ' '), r'\$|,', '')),0) AS val
FROM
(SELECT '$10,000 ' Coverage_Amount)
This deletes all spaces from the end of the string.
The output will then be:
Row Coverage_Amount val
1 $10,000 10000
Are you using Standard SQL? This worked for me (notice I use the CAST operator):
WITH data as(
select "$10,000" d UNION ALL
select "$15,000" UNION ALL
select "$2,000")
SELECT
d,
CAST(REGEXP_REPLACE(d, r'\$|,', '') AS INT64) AS Coverage_Amount
FROM data
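The cleaning steps discussed in these answers (strip "$" and ",", trim whitespace, map null to 0, then cast) can be sketched in Python to check what each input should produce. This is a stand-in for the BigQuery expressions, not BigQuery code; the function name parse_amount is hypothetical, and returning None for a non-numeric string mirrors the documented NULL behaviour of INTEGER().

```python
import re

def parse_amount(raw):
    """Convert a string such as '$10,000' to an int.

    None maps to 0 (like IFNULL(..., 0)); a string that is not a plain
    integer after cleaning maps to None (like INTEGER() returning NULL).
    """
    if raw is None:
        return 0
    cleaned = re.sub(r"[$,]", "", raw).strip()
    return int(cleaned) if re.fullmatch(r"[0-9]+", cleaned) else None
```

If a value still comes back as None after cleaning, something other than "$", "," or surrounding whitespace is present in the string.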

Converting a specific column data in .csv to text using Python pandas

I have a .csv file like below where all the contents are text
col1 Col2
My name Arghya
The Big Apple Fruit
I am able to read this csv using pd.read_csv(index_col=False, header=None).
How do I combine all three rows in Col1 into a list separated by a full stop?
If you need to convert the column values to a list:
print (df.Col1.tolist())
#alternative solution
#print (list(df.Col1))
['This is Apple', 'I am in Mumbai', 'I like rainy day']
Then join the values in the list; the output is a string:
a = '.'.join(df.Col1.tolist())
print (a)
This is Apple.I am in Mumbai.I like rainy day
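If the file was read with header=None, the header row becomes the first data row and the columns are numbered, so select the column by position:
print (df)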
0 1
0 Col1 Col2
1 This is Apple Fruit
2 I am in Mumbai Great
3 I like rainy day Flood
print (list(df.loc[:, 0]))
#alternative
#print (list(df[0]))
['Col1', 'This is Apple', 'I am in Mumbai', 'I like rainy day']
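For comparison, the same join can be sketched without pandas, using only the standard-library csv module. The sketch assumes the file is comma-delimited and has a header row with Col1 and Col2; the sample text and the function name join_column are made up for illustration.

```python
import csv
import io

# Assumed file contents, mirroring the DataFrame printed above.
SAMPLE = (
    "Col1,Col2\n"
    "This is Apple,Fruit\n"
    "I am in Mumbai,Great\n"
    "I like rainy day,Flood\n"
)

def join_column(csv_text, column="Col1", sep="."):
    """Read CSV text with a header row and join one column's values with sep."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sep.join(row[column] for row in reader)
```

Because DictReader consumes the header row, the header does not end up in the joined string.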

Redshift. Convert comma delimited values into rows

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table in which one of the columns holds comma-separated values. For example:
I have:
user_id|user_name|user_action
-----------------------------
1 | Shone | start,stop,cancell...
I would like to see
user_id|user_name|parsed_action
-------------------------------
1 | Shone | start
1 | Shone | stop
1 | Shone | cancell
....
A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths and then use a cross join to make the query more compact.
Redshift does not have a straightforward method for creating a numbers table that I am aware of, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.
Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:
select
(row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;
If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:
select
n::int
into numbers
from
(select
row_number() over (order by true) as n
from cmd_logs)
cross join
(select
max(regexp_count(user_action, '[,]')) as max_num
from cmd_logs)
where
n <= max_num + 1;
Once there is a numbers table, we can do:
select
user_id,
user_name,
split_part(user_action,',',n) as parsed_action
from
cmd_logs
cross join
numbers
where
split_part(user_action,',',n) is not null
and split_part(user_action,',',n) != '';
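To see what the cross join with the numbers table does, here is a small Python sketch that emulates it. The split_part helper imitates Redshift's function of the same name, returning an empty string when the requested part does not exist. The sample row and the 1 to 100 range are illustrative assumptions.

```python
def split_part(text, delimiter, n):
    """Emulate SPLIT_PART: 1-based part n, or '' when there is no such part."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def explode(rows, numbers):
    """Cross join rows with numbers and keep only the non-empty parts."""
    out = []
    for user_id, user_name, user_action in rows:
        for n in numbers:
            part = split_part(user_action, ",", n)
            if part != "":
                out.append((user_id, user_name, part))
    return out

cmd_logs = [(1, "Shone", "start,stop,cancel")]
numbers = range(1, 101)  # plays the role of the numbers table
```

Every row is paired with every n, and the filter on the empty string discards the pairs where n exceeds the number of actions.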
Another idea is to transform your CSV string into JSON first, followed by JSON extract, along the following lines:
... '["' || replace( user_action, ',', '", "' ) || '"]' AS replaced
... JSON_EXTRACT_ARRAY_ELEMENT_TEXT(replaced, numbers.i) AS parsed_action
Where "numbers" is the table from the first answer. The advantage of this approach is the ability to use built-in JSON functionality.
If you know that there are not many actions in your user_action column, you can use a recursive sub-query with union all and thereby avoid the auxiliary numbers table.
However, it requires you to know the number of actions for each user, so either adjust the initial table or create a view or a temporary table for it.
Data preparation
Assuming you have something like this as a table:
create temporary table actions
(
user_id varchar,
user_name varchar,
user_action varchar
);
I'll insert some values in it:
insert into actions
values (1, 'Shone', 'start,stop,cancel'),
(2, 'Gregory', 'find,diagnose,taunt'),
(3, 'Robot', 'kill,destroy');
Here's an additional temporary table that holds the action counts:
create temporary table actions_with_counts
(
id varchar,
name varchar,
num_actions integer,
actions varchar
);
insert into actions_with_counts (
select user_id,
user_name,
regexp_count(user_action, ',') + 1 as num_actions,
user_action
from actions
);
This would be our "input table" and it looks just as you expected
select * from actions_with_counts;
id | name    | num_actions | actions
---------------------------------------------
2  | Gregory | 3           | find,diagnose,taunt
3  | Robot   | 2           | kill,destroy
1  | Shone   | 3           | start,stop,cancel
Again, you can adjust the initial table and thereby skip adding the counts as a separate table.
Sub-query to flatten the actions
Here's the unnesting query:
with recursive tmp (user_id, user_name, idx, user_action) as
(
select id,
name,
1 as idx,
split_part(actions, ',', 1) as user_action
from actions_with_counts
union all
select user_id,
user_name,
idx + 1 as idx,
split_part(actions, ',', idx + 1)
from actions_with_counts
join tmp on actions_with_counts.id = tmp.user_id
where idx < num_actions
)
select user_id, user_name, user_action as parsed_action
from tmp
order by user_id;
This will create a new row for each action, and the output would look like this:
user_id | user_name | parsed_action
-----------------------------------
1       | Shone     | start
1       | Shone     | stop
1       | Shone     | cancel
2       | Gregory   | find
2       | Gregory   | diagnose
2       | Gregory   | taunt
3       | Robot     | kill
3       | Robot     | destroy
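The recursive idea can be tried locally with Python's standard-library sqlite3 module. This is a sketch of a related technique, not a port of the Redshift query: SQLite has no split_part, so the recursion peels one item at a time off the front of the string with instr and substr, and it does not need the precomputed num_actions column. The table and column names follow the data-preparation step above; the function name is made up.

```python
import sqlite3

SPLIT_SQL = """
WITH RECURSIVE split(user_id, user_name, idx, part, rest) AS (
    -- Seed: nothing extracted yet; append a comma so every item ends with one.
    SELECT user_id, user_name, 0, '', user_action || ','
    FROM actions
    UNION ALL
    -- Step: move the text before the first comma into part.
    SELECT user_id, user_name, idx + 1,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT user_id, user_name, part AS parsed_action
FROM split
WHERE idx > 0 AND part <> ''
ORDER BY user_id, idx
"""

def flatten_actions(rows):
    """Load (user_id, user_name, user_action) rows and return one row per action."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE actions (user_id INTEGER, user_name TEXT, user_action TEXT)"
        )
        conn.executemany("INSERT INTO actions VALUES (?, ?, ?)", rows)
        return conn.execute(SPLIT_SQL).fetchall()
    finally:
        conn.close()
```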
Here are two ways to achieve this.
In my example, I'm assuming that I am accepting a comma separated list of values. My values look like schema.table.column.
The first involves using a recursive CTE.
drop table if exists #dep_tbl;
create table #dep_tbl as
select 'schema.foobar.insert_ts,schema.baz.load_ts' as dep
;
with recursive tmp (level, dep_split, to_split) as
(
select 1 as level
, split_part(dep, ',', 1) as dep_split
, regexp_count(dep, ',') as to_split
from #dep_tbl
union all
select tmp.level + 1 as level
, split_part(a.dep, ',', tmp.level + 1) as dep_split_u
, tmp.to_split
from #dep_tbl a
inner join tmp on tmp.dep_split is not null
and tmp.level <= tmp.to_split
)
select dep_split from tmp;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
The second involves a stored procedure.
CREATE OR REPLACE PROCEDURE so_test(dependencies_csv varchar(max))
LANGUAGE plpgsql
AS $$
DECLARE
dependencies_csv_vals varchar(max);
BEGIN
drop table if exists #dep_holder;
create table #dep_holder
(
avoid varchar(60000)
);
IF dependencies_csv is not null THEN
dependencies_csv_vals:='('||replace(quote_literal(regexp_replace(dependencies_csv,'\\s','')),',', '\'),(\'') ||')';
execute 'insert into #dep_holder values '||dependencies_csv_vals||';';
END IF;
END;
$$
;
call so_test('schema.foobar.insert_ts,schema.baz.load_ts');
select
*
from
#dep_holder;
the above yields:
|dep_split|
|schema.foobar.insert_ts|
|schema.baz.load_ts|
In conclusion
If you only care about one single column in your input (the X delimited values), then I think the stored procedure is easier/faster.
However, if you have other columns you care about and want to keep them alongside the comma-separated column now transformed to rows, or if you want to keep the argument (the original list of delimited values), I think the recursive CTE is the way to go. In that case, you can just add those other columns to the columns selected in the recursive query.
You can get the expected result with the following query. I'm using "UNION ALL" to convert a column to row.
select user_id, user_name, split_part(user_action,',',1) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',2) as parsed_action from cmd_logs
union all
select user_id, user_name, split_part(user_action,',',3) as parsed_action from cmd_logs
Here's my equally-terrible answer.
I have a users table, and then an events table with a column that is just a comma-delimited string of users at said event. eg
event_id | user_ids
1 | 5,18,25,99,105
In this case, I used the LIKE and wildcard functions to build a new table that represents each event-user edge.
SELECT e.event_id, u.id as user_id
FROM events e
LEFT JOIN users u ON e.user_ids like '%' || u.id || '%'
It's not pretty, but I throw it in a WITH clause so that I don't have to run it more than once per query. I'll likely just build an ETL to create that table every night anyway.
Also, this only works if you have a second table that does have one row per unique possibility. If not, you could do LISTAGG to get a single cell with all your values, export that to a CSV and reupload that as a table to help.
Like I said: a terrible, no-good solution.
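One reason this approach is fragile is that a bare '%' || id || '%' pattern also matches ids that merely appear inside other ids (5 inside 25, 9 inside 99). The Python sketch below, using the standard-library sqlite3 module, compares the plain pattern with a variant that wraps both sides in commas. The comma-wrapping variant is a suggestion added here, not part of the answer, and the user ids are made-up sample data.

```python
import sqlite3

# Pattern as in the answer: substring match on the raw id.
NAIVE = (
    "SELECT u.id FROM events e JOIN users u "
    "ON e.user_ids LIKE '%' || u.id || '%' ORDER BY u.id"
)
# Assumed refinement: delimit both the list and the id with commas.
WRAPPED = (
    "SELECT u.id FROM events e JOIN users u "
    "ON ',' || e.user_ids || ',' LIKE '%,' || u.id || ',%' ORDER BY u.id"
)

def matched_user_ids(sql):
    """Run one of the join queries against a tiny in-memory data set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (event_id INTEGER, user_ids TEXT)")
        conn.execute("CREATE TABLE users (id INTEGER)")
        conn.execute("INSERT INTO events VALUES (1, '5,18,25,99,105')")
        conn.executemany(
            "INSERT INTO users VALUES (?)",
            [(i,) for i in (1, 5, 9, 18, 25, 99, 105)],
        )
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()
```

Users 1 and 9 did not attend the event, yet the plain pattern matches them because "1" and "9" occur inside other ids.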
Late to the party, but I got something working (albeit very slowly):
with nums as (select n::int n
from
(select
row_number() over (order by true) as n
from table_with_enough_rows_to_cover_range)
cross join
(select
max(json_array_length(json_column)) as max_num
from table_with_json_column )
where
n <= max_num + 1)
select *, json_extract_array_element_text(json_column,nums.n-1) parsed_json
from nums, table_with_json_column
where json_extract_array_element_text(json_column,nums.n-1) != ''
and nums.n <= json_array_length(json_column)
Thanks to answer by Bob Baxley for inspiration
This is just an improvement on the answer above (https://stackoverflow.com/a/31998832/1265306): it generates the numbers table using the following SQL, taken from
https://discourse.looker.com/t/generating-a-numbers-table-in-mysql-and-redshift/482
SELECT
p0.n
+ p1.n*2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as number
INTO numbers
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
ORDER BY 1
LIMIT 100
The ORDER BY is there only in case you want to paste it without the INTO clause and see the results.
Create a stored procedure that parses the string dynamically and populates a temp table, then select from the temp table.
Here is the code:
CREATE OR REPLACE PROCEDURE public.sp_string_split( "string" character varying )
AS $$
DECLARE
cnt INTEGER := 1;
no_of_parts INTEGER := (select REGEXP_COUNT ( string , ',' ));
sql VARCHAR(MAX) := '';
item character varying := '';
BEGIN
-- Create table
sql := 'CREATE TEMPORARY TABLE IF NOT EXISTS split_table (part VARCHAR(255)) ';
RAISE NOTICE 'executing sql %', sql ;
EXECUTE sql;
<<simple_loop_exit_continue>>
LOOP
item = (select split_part("string",',',cnt));
RAISE NOTICE 'item %', item ;
sql := 'INSERT INTO split_table SELECT '''||item||''' ';
EXECUTE sql;
cnt = cnt + 1;
EXIT simple_loop_exit_continue WHEN (cnt >= no_of_parts + 2);
END LOOP;
END ;
$$ LANGUAGE plpgsql;
Usage example:-
call public.sp_string_split('john,smith,jones');
select *
from split_table
You can try the COPY command to copy your file into Redshift tables:
copy table_name from 's3://mybucket/myfolder/my.csv' CREDENTIALS 'aws_access_key_id=my_aws_acc_key;aws_secret_access_key=my_aws_sec_key' delimiter ','
You can use the delimiter ',' option.
For more details on the COPY command options, see this page:
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html