Substitute for the STUFF function (SQL Server) in AWS Redshift - amazon-web-services

I have to replace the first 3 digits of a column with a fixed 3 digits (123).
This is working SQL Server code (it does not work on AWS Redshift).
Code:
Select
Stuff (ColName,1,3,'123')as NewColName
From DataBase.dbo.TableName
e.g. 1: input 8010001802000000000209092396, output 1230001802000000000209092396
e.g. 2: input 555209092396, output 123209092396
It should replace the first 3 digits with 123 irrespective of the value's length.
Please advise on anything that is supported in AWS Redshift.
I am still trying with SUBSTRING and REPLACE.

I see that AWS Redshift is based on an old version of Postgres, and I looked up the SUBSTRING function for you (https://docs.aws.amazon.com/redshift/latest/dg/r_SUBSTRING.html), which is pretty forgiving of its argument values.
In the Transact-SQL sample below, and as documented for Redshift, the third argument of SUBSTRING can be much larger than the actual string length without causing an error. In Transact-SQL, even the second argument is "forgiving" if it points past the end of the actual string:
;
WITH RawData AS
(SELECT * FROM (VALUES ('8010001802000000000209092396'),
('555209092396'),
('AB')
) AS X(InputString)
)
SELECT InputString, '123' + SUBSTRING(InputString, 4, 1000) AS OutputString
FROM RawData
InputString                   OutputString
8010001802000000000209092396  1230001802000000000209092396
555209092396                  123209092396
AB                            123
As it appears that the concatenation operator in Redshift is ||, I think your expression will be very close to:
'123' || SUBSTRING(InputString, 4, 1000)
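The logic of that expression (keep everything from the 4th character onward and prepend the fixed prefix) can be checked outside the database. Below is a minimal Python sketch of the same idea, not Redshift code; the function name `stuff_prefix` and its parameters are made up for illustration.

```python
def stuff_prefix(value: str, prefix: str = "123", width: int = 3) -> str:
    """Replace the first `width` characters of `value` with `prefix`.

    Mirrors '123' || SUBSTRING(value, 4, 1000): SQL position 4 (1-based)
    corresponds to Python slice start 3 (0-based). Unlike the SQL version,
    the slice has no 1000-character cap.
    """
    return prefix + value[width:]


if __name__ == "__main__":
    for sample in ("8010001802000000000209092396", "555209092396", "AB"):
        print(sample, "->", stuff_prefix(sample))
```

As with the SUBSTRING call, a value shorter than three characters (such as 'AB') simply yields the bare prefix.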

I got this, and it worked:
--Using Substring and concat
Select
cast('123'+substring(ColName,4,LEN(ColName)) as numeric (28)) as NewColName
From DataBase.dbo.TableName

Related

BigQuery - substr() cannot be negative

I moved from Hive to BigQuery.
But when I run the query, I get this message: Third argument in SUBSTR() cannot be negative.
Substr(variable, instr(variable, 'a')+2, instr(variable, 'f') - instr(variable, 'a') - 3)
Don't forget the very powerful regex string functions (extract and substring). Here is an example:
with input as (select "14526a utfsd f azd" as data)
select REGEXP_SUBSTR(data,"[a-z]{5}") from input
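The error comes from the length expression in the question: whenever 'f' is missing, or does not occur far enough after 'a', instr(f) - instr(a) - 3 drops below zero. The Python sketch below reproduces only that arithmetic. The helper names are hypothetical, and `instr` here merely imitates the 1-based, 0-when-absent behaviour of the SQL function; clamping with `max(..., 0)` corresponds to wrapping the length in GREATEST(..., 0), which is one possible guard rather than the only fix.

```python
def instr(text: str, needle: str) -> int:
    """1-based position of the first occurrence of needle, or 0 if absent."""
    return text.find(needle) + 1


def substr_length(text: str) -> int:
    """The third SUBSTR argument from the query in the question."""
    return instr(text, "f") - instr(text, "a") - 3


def safe_length(text: str) -> int:
    """Same length, clamped at zero (like GREATEST(..., 0) in SQL)."""
    return max(substr_length(text), 0)


if __name__ == "__main__":
    for sample in ("14526a utfsd f azd", "abc", "fa"):
        print(repr(sample), substr_length(sample), safe_length(sample))
```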

Regexp expression from Oracle SQL to Big Query

I previously had help here with a regexp expression in Oracle SQL, which worked great. However, our company is converting to BigQuery and the regexp no longer seems to work.
In my tables, I have the following data:
WC 12/10 change FC from 24 to 32
W/C 12/10 change fc from 401 to 340
W/C12/10 18-26
This Oracle SQL split each comment to give me the before number (24), the after number (32), and the date (12/10).
cast(REGEXP_SUBSTR(Line_Comment, '((\d+ |\d+)(change )?(- |-|to |to|too|too )(\d+))', 1, 1, 'i',2) as Int) as Before,
cast(REGEXP_SUBSTR(Line_Comment, '((\d+ |\d+)(change )?(- |-|to |to|too|too )(\d+))', 1, 1, 'i', 5) as Int) as After,
REGEXP_SUBSTR(Line_Comment, '((\d+)(\/|-|.| )(\d+)(\/|-|.| )(\d+))|(\d+)(\/|-|.| )(\d+)', 1, 1, 'i') as WC_Date,
I totally understand that the comments are not consistent and the expressions may not always work, but if they work more than 80% of the time, which they have, then we are fine with this.
Since moving to BigQuery, I'm getting the error message below. In Oracle, the columns were VARCHAR, but when they were migrated to BigQuery they became STRING. Could this be the reason it's broken? Is there anyone who can help with this? This is way over my head.
No matching signature for function REGEXP_SUBSTR for argument types:
STRING, STRING, INT64, INT64, STRING, INT64. Supported signatures:
REGEXP_SUBSTR(STRING, STRING, [INT64], [INT64]); REGEXP_SUBSTR(BYTES,
BYTES, [INT64], [INT64]) at [69:12]
Since Google BigQuery's REGEXP_SUBSTR doesn't support the subexpr parameter of Oracle's REGEXP_SUBSTR, you need to modify your regexes to take advantage of the fact that:
If the regular expression contains a capturing group, the function returns the substring that is matched by that capturing group.
So for each value you are trying to extract, you need to make it the only capturing group in the regex:
cast(REGEXP_SUBSTR(Line_Comment, '(?:(\d+ |\d+)(?:change )?(?:- |-|to |to|too|too )(?:\d+))', 1, 1) as Int) as Before,
cast(REGEXP_SUBSTR(Line_Comment, '(?:(?:\d+ |\d+)(?:change )?(?:- |-|to |to|too|too )(\d+))', 1, 1) as Int) as After,
REGEXP_SUBSTR(Line_Comment, '((?:\d+)(?:\/|-|.| )(?:\d+)(?:\/|-|.| )(?:\d+))|((?:\d+)(?:\/|-|.| )(?:\d+))', 1, 1) as WC_Date,
Note you can substantially simplify your regexes as below:
(\d+) ?(?:change )?(?:-|too?) ?(?:\d+)
(?:\d+) ?(?:change )?(?:-|too?) ?(\d+)
(?:\d+)(?:[\/.-](?:\d+)){1,2}
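The three simplified patterns can be tried against the sample comments from the question. The sketch below uses Python's `re` module rather than BigQuery, on the assumption that these simple constructs behave the same in both engines; `parse_comment` is a hypothetical helper, and the `re.IGNORECASE` flag mirrors the 'i' flag of the original Oracle calls.

```python
import re

# Patterns copied from the simplified versions above.
BEFORE = re.compile(r"(\d+) ?(?:change )?(?:-|too?) ?(?:\d+)", re.IGNORECASE)
AFTER = re.compile(r"(?:\d+) ?(?:change )?(?:-|too?) ?(\d+)", re.IGNORECASE)
WC_DATE = re.compile(r"(?:\d+)(?:[\/.-](?:\d+)){1,2}")


def parse_comment(comment: str):
    """Return (wc_date, before, after); any part may be None if not found."""
    date = WC_DATE.search(comment)
    before = BEFORE.search(comment)
    after = AFTER.search(comment)
    return (
        date.group(0) if date else None,
        int(before.group(1)) if before else None,
        int(after.group(1)) if after else None,
    )


if __name__ == "__main__":
    samples = [
        "WC 12/10 change FC from 24 to 32",
        "W/C 12/10 change fc from 401 to 340",
        "W/C12/10 18-26",
    ]
    for s in samples:
        print(s, "->", parse_comment(s))
```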
Based on the sample data you provided in the comment section, you can try the query below:
with t1 as (
select 'WC 12/10 change FC from 24 to 32' as Comment
union all select 'W/C 12/10 change fc from 401 to 340' as Comment
union all select 'W/C12/10 18-26' as Comment
)
select Comment,
regexp_extract(t1.Comment, r'(\d+\/\d+)') as WC,
regexp_extract(t1.Comment, r'.+\s(\d{1,3})[\s|\-]') as Before,
regexp_extract(t1.Comment, r'.+[\sto\s|\-](\d{1,3})$') as After
from t1
Consider the very simple approach below:
select Comment,
format('%s/%s', arr[offset(0)], arr[safe_offset(1)]) as wc,
arr[safe_offset(2)] as before,
arr[safe_offset(3)] as after
from your_table, unnest([struct(regexp_extract_all(Comment, r'\d+') as arr)])
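The positional idea behind that query (collect every run of digits, then pick them by index) can be sketched in Python as follows. This is a loose mirror, not BigQuery code: `split_numbers` and `safe_offset` are hypothetical names, and the handling of comments with fewer than two numbers is an assumption of this sketch.

```python
import re


def split_numbers(comment: str) -> dict:
    """Pick wc, before and after by position from all digit runs."""
    arr = re.findall(r"\d+", comment)  # every run of digits, in order

    def safe_offset(i: int):
        # Like arr[safe_offset(i)]: None instead of an error when out of range.
        return arr[i] if i < len(arr) else None

    # Assumption: wc is None unless both of its parts are present.
    wc = f"{arr[0]}/{arr[1]}" if len(arr) >= 2 else None
    return {"wc": wc, "before": safe_offset(2), "after": safe_offset(3)}


if __name__ == "__main__":
    print(split_numbers("WC 12/10 change FC from 24 to 32"))
    print(split_numbers("W/C12/10 18-26"))
```

Because the values are chosen purely by position, a comment containing an extra number before the "before" value would shift every later field.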

Parsing a name from a complex string in Tableau

I have a series of values in Tableau that are long strings intermixed with letters and numbers. I am unable to control the data output, but would like to parse the names from these strings. They follow the following format:
Potato 1TByte 4.5 NFA
Board 256GByte 553 NCA
Launch 4 512GByte 4.5 NFA
Launch 4S 512GByte 4.5 NCA
From each of these, I am attempting to capture the following:
"Potato"
"Board"
"Launch 4"
"Launch 4S"
Each string follows the same format: the name, followed by size, followed by some extra information we don't really care about.
I've tried to put together some text parsing strings, but am coming up short, and am still trying to learn regular expressions.
The Tableau calculated field I was trying to work with was something like the following:
LEFT([String], FIND([String], "Byte") - 2)
The issue is that the text and numbers preceding Byte can be anywhere from 2 to 4 characters long, and I need a way to identify that length.
Any help would be greatly appreciated!
One option which uses a regex replacement:
REGEXP_REPLACE('Launch 4 512GByte 4.5 NFA', ' \d+[A-Z]Byte .*$', '')
This strips off everything from the Byte term to the right, leaving us with only the product name.
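To see that the pattern handles all four sample strings from the question, here is a small Python check using `re.sub` with the same regular expression. It is only a sketch of the pattern's behaviour, not Tableau code, and `strip_size_suffix` is a made-up name.

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
SIZE_AND_REST = re.compile(r" \d+[A-Z]Byte .*$")


def strip_size_suffix(text: str) -> str:
    """Remove the size token (e.g. ' 512GByte') and everything after it."""
    return SIZE_AND_REST.sub("", text)


if __name__ == "__main__":
    for s in (
        "Potato 1TByte 4.5 NFA",
        "Board 256GByte 553 NCA",
        "Launch 4 512GByte 4.5 NFA",
        "Launch 4S 512GByte 4.5 NCA",
    ):
        print(repr(strip_size_suffix(s)))
```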
You could try the following, which seems to work. The formulas for the derived columns are below (your source column is called [Name]):
Step1 = LEFT([Name],FIND([Name],"Byte")-1)
Step2 = LEN([Step1])-LEN(REPLACE([Step1]," ",""))
Step3 = FINDNTH([Step1]," ",[Step2])
Step4 = LEFT([Step1],[Step3]-1)
And of course you can nest all of these in a single calculated field; I kept them as separate columns for easier understanding.
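The four steps can be traced in Python to see what each intermediate value holds. This is a sketch that mirrors the formulas, not Tableau syntax; `product_name` is a hypothetical name, and the two early returns are assumptions added for inputs the formulas above do not cover.

```python
def product_name(name: str) -> str:
    """Mirror Step1..Step4: cut at 'Byte', then drop the last space-separated token."""
    byte_at = name.find("Byte")
    if byte_at == -1:
        return name  # assumption: leave strings without "Byte" untouched

    step1 = name[:byte_at]       # LEFT([Name], FIND([Name], "Byte") - 1)
    step2 = step1.count(" ")     # LEN(Step1) - LEN(REPLACE(Step1, " ", ""))
    if step2 == 0:
        return step1             # assumption: no space, nothing to strip

    # FINDNTH(Step1, " ", Step2): locate the last space (0-based index here).
    step3 = -1
    for _ in range(step2):
        step3 = step1.find(" ", step3 + 1)

    return step1[:step3]         # LEFT(Step1, Step3 - 1)


if __name__ == "__main__":
    for s in ("Potato 1TByte 4.5 NFA", "Launch 4S 512GByte 4.5 NCA"):
        print(repr(product_name(s)))
```

For "Launch 4 512GByte 4.5 NFA", Step1 is "Launch 4 512G", Step2 is 2, and cutting at the second space leaves "Launch 4".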

HiveQL: Parse strings and count

I am using HiveQL to work with millions of rows of domain name text data stored in HDFS. The following is a hand-selected subset to illustrate lexical diversity. There are duplicate entries.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
mgmtsubnet.mgmtvcn.oraclevcn.com.
asdf.mgmtvcn.oraclevcn.com.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
localhost.
a.localhost.
img.pulsemgr.com.
36.136.154.156.in-addr.arpa.
accounts.spotify.com.
_dmarc.ixia-devops.com.
&eventtype=close&reason=4&duration=35.
&eventtype=close&reason=3&duration=10336.
I am trying to get a count of rows based on the last two levels of the domain, where sometimes the 2nd level is absent (e.g. localhost.). For example:
domain_root       count
oraclevcn.com.    4
localhost.        1
a.localhost.      1
pulsemgr.com.     1
in-addr.arpa.     1
spotify.com.      1
ixia-devops.com.  1
It would be nice to also see how to filter out domains where the 2nd level is absent.
I am not sure where to start. I have seen use of the SPLIT() function, but that may not be robust since there could be many levels to a domain name, for example: a.b.c.d.e.f.g.h.i etc.
Any ideas or implementations are appreciated.
Below would be the query with regexp_extract.
select domain_root, count(*)
from (
  select regexp_extract('dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.', '[A-Za-z0-9-]+\.[A-Za-z0-9-]+\.$', 0) as domain_root -- replace first argument with column name
  from table
) A
group by A.domain_root
The regex extracts a domain root made up of alphanumeric characters and the special character '-'.
Hope this helps.
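To see which of the sample rows that pattern actually matches, the extract-then-group logic can be sketched in Python with the same regular expression. This is not HiveQL; `count_domain_roots` is a hypothetical helper, and skipping rows that do not match (rather than grouping them under an empty value) is a choice made in this sketch.

```python
import re
from collections import Counter

# Same pattern as in the regexp_extract call above: the last two labels.
DOMAIN_ROOT = re.compile(r"[A-Za-z0-9-]+\.[A-Za-z0-9-]+\.$")


def count_domain_roots(rows):
    """Count rows per two-level domain root, skipping rows with no match."""
    counts = Counter()
    for row in rows:
        match = DOMAIN_ROOT.search(row)
        if match:
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.",
        "mgmtsubnet.mgmtvcn.oraclevcn.com.",
        "asdf.mgmtvcn.oraclevcn.com.",
        "dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.",
        "localhost.",
        "a.localhost.",
        "img.pulsemgr.com.",
        "36.136.154.156.in-addr.arpa.",
        "accounts.spotify.com.",
        "_dmarc.ixia-devops.com.",
        "&eventtype=close&reason=4&duration=35.",
    ]
    for root, n in count_domain_roots(sample).items():
        print(root, n)
```

Since the pattern requires two labels, single-level names such as "localhost." and the "&eventtype=..." rows do not match at all, which also covers the request to filter out domains with no 2nd level.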

Using a regular expression to match everything after a colon

I've got a string like this:
192.168.114.182:SomethingFun-1083:EOL/Nothing Here : MySQL Database 4.12
192.168.15.82:SomethingElse-1325083:All Types : PHP Version Info : 23
I'm trying to select this item in an Oracle database (using REGEXP_SUBSTR) and get all remaining text after the second colon (:).
So in the end, I'd like to get these values:
EOL/Nothing Here : MySQL Database 4.12
All Types : PHP Version Info : 23
I've tried these, but haven't found something that works.
REGEXP_SUBSTR(title,'[^:]+',1,3) as Title -- doesn't work if last field has ":"
REGEXP_SUBSTR(title,'(?:[^:]*:)+([^:]*)') as Title
How about REGEXP_REPLACE?
REGEXP_REPLACE(title,'^[^:]+:[^:]+:(.*)$', '\1') as Title
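The pattern anchors on the first two colon-delimited fields and captures whatever follows, so later colons stay in the result. A quick Python sketch of the same replacement (using `re.sub` rather than Oracle; `after_two_fields` is a made-up name) shows this on the two sample strings:

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
TWO_FIELDS_THEN_REST = re.compile(r"^[^:]+:[^:]+:(.*)$")


def after_two_fields(title: str) -> str:
    """Keep only the text after the second colon; later colons are preserved."""
    return TWO_FIELDS_THEN_REST.sub(r"\1", title)


if __name__ == "__main__":
    print(after_two_fields(
        "192.168.114.182:SomethingFun-1083:EOL/Nothing Here : MySQL Database 4.12"))
    print(after_two_fields(
        "192.168.15.82:SomethingElse-1325083:All Types : PHP Version Info : 23"))
```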
Oracle's regular expression functions tend to be CPU intensive versus alternatives. Tom Kyte's advice is "however, if you can do it without regular expressions - do it without them. regular expressions consume CPU hugely."
An alternative not using regular expressions would be substr(test_values, instr(test_values, ':', 1, 2) + 1)
SQL> create table t (test_values varchar2(100));
Table created.
SQL> insert into t values ('192.168.114.182:SomethingFun-1083:EOL/Nothing Here : MySQL Database 4.12');
1 row created.
SQL> insert into t values ('192.168.15.82:SomethingElse-1325083:All Types : PHP Version Info : 23');
1 row created.
SQL> commit;
Commit complete.
SQL> select substr(test_values, instr(test_values, ':', 1, 2) + 1)
2 from t;
SUBSTR(TEST_VALUES,INSTR(TEST_VALUES,':',1,2)+1)
-----------------------------------------------------
EOL/Nothing Here : MySQL Database 4.12
All Types : PHP Version Info : 23
Benchmarking is left as an exercise for the reader.
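The substr/instr approach boils down to "find the second colon, then take everything after it". A minimal Python sketch of that idea follows; it is not Oracle code, `after_second_colon` is a hypothetical name, and returning the whole string when there are fewer than two colons is meant to mirror instr returning 0 in that case.

```python
def after_second_colon(value: str) -> str:
    """Return the text after the second ':' without using regular expressions."""
    first = value.find(":")
    second = value.find(":", first + 1) if first != -1 else -1
    if second == -1:
        # Fewer than two colons: like substr(value, 0 + 1), keep everything.
        return value
    return value[second + 1:]


if __name__ == "__main__":
    print(after_second_colon(
        "192.168.114.182:SomethingFun-1083:EOL/Nothing Here : MySQL Database 4.12"))
    print(after_second_colon(
        "192.168.15.82:SomethingElse-1325083:All Types : PHP Version Info : 23"))
```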
This should work:
^[^:]+:[^:]+:(.*)$