Select group > 10 in REGEXP_REPLACE in teradata

I have a regex with around 19 groups, and I need to select each one individually. However, when I try to select a group above 10, it selects group 1 and appends the remaining digit to it.
My regex looks like this:
pattern ='GPU = (\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s\sHDD = (\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)\s(\d*)'
I want to replace the whole match with a single group, using this code:
regexp_replace(text,pattern,'\19') as GPU_19 -- output: AB9
I am using Teradata.
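The symptom described above suggests that the replacement string reads \19 as group 1 followed by a literal 9. One possible workaround is to capture only the field that is needed, so that it is always group 1. The sketch below uses Python's re module rather than Teradata (Python's own backreference rules differ, so it only illustrates the restructured pattern). The helper name pattern_capturing_only and the sample text are hypothetical.

```python
import re

def pattern_capturing_only(n, gpu_fields=8, hdd_fields=12):
    """Rebuild the question's pattern with only field n (1-based) captured.

    Hypothetical helper: every other field is written as a bare \\d* with
    no parentheses, so the wanted field is always group 1.
    """
    def fields(start, count):
        return r"\s".join(
            r"(\d*)" if i == n else r"\d*"
            for i in range(start, start + count)
        )
    return ("GPU = " + fields(1, gpu_fields)
            + r"\s\sHDD = " + fields(gpu_fields + 1, hdd_fields))

# Made-up sample row: 8 GPU values, two whitespace characters, 12 HDD values.
text = "GPU = 1 2 3 4 5 6 7 8  HDD = 9 10 11 12 13 14 15 16 17 18 19 20"

# The replacement only ever needs \1, whichever field is requested.
field_19 = re.sub(pattern_capturing_only(19), r"\1", text)
field_3 = re.sub(pattern_capturing_only(3), r"\1", text)
```

The same idea would mean one pattern per output column in SQL, each with a single pair of parentheses and '\1' as the replacement.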

Related

REGEXP_EXTRACT with String Value in Bigquery

I want to extract words from a column whose values look like 'p-fr-youtube-car'. Each word should be extracted into its own column.
INPUT:
p-fr-youtube-car
DESIRED OUTPUT:
Country = fr
Channel = youtube
Item = car
I've tried the expression below to extract the first word, but can't figure out the rest. What regex will produce my desired output from this input? And how can I make it case-insensitive, so that fr and FR are treated the same?
REGEXP_EXTRACT_ALL(CampaignName, r"^p-([a-z]*)") AS Country

You can use [^-]+ to match parts between hyphens and only capture what you need to fetch.
To get strings like youtube, you can use
REGEXP_EXTRACT_ALL(CampaignName, r'^p-[^-]+-([^-]+)')
To get strings like car, you can use
REGEXP_EXTRACT_ALL(CampaignName, r'^p-[^-]+-[^-]+-([^-]+)')
So, [^-]+ matches one or more chars other than - and ([^-]+) is the same pattern wrapped with a capturing group whose contents REGEXP_EXTRACT actually returns as a result.
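As a quick check of the patterns above, here is a minimal sketch in Python's re module (not BigQuery), assuming the sample value from the question. Each pattern skips the earlier hyphen-separated parts and captures only the one it needs.

```python
import re

campaign = "p-fr-youtube-car"  # sample value from the question

# One capturing group per pattern; the uncaptured [^-]+ parts are skipped over.
country = re.search(r"^p-([^-]+)", campaign).group(1)
channel = re.search(r"^p-[^-]+-([^-]+)", campaign).group(1)
item = re.search(r"^p-[^-]+-[^-]+-([^-]+)", campaign).group(1)
```

Because [^-]+ accepts any character other than a hyphen, these patterns match upper and lower case alike.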

You can use named groups.
Example Regex:
p-(?P<Country>[a-z]*)\-(?P<Channel>[a-z]*)\-(?P<Item>[a-z]*)$
https://regex101.com/r/fKoBIn/3
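The named-group pattern can be tried out in Python, whose re module uses the same (?P<name>...) syntax. This is only an illustration of the pattern: BigQuery's REGEXP_EXTRACT may not accept a pattern with several capturing groups, so it should not be read as a drop-in query. The function name parse_campaign is hypothetical, and the lowercasing is one assumed way to treat fr and FR as the same.

```python
import re

# Same pattern as above, compiled case-insensitively.
NAMED = re.compile(
    r"p-(?P<Country>[a-z]*)-(?P<Channel>[a-z]*)-(?P<Item>[a-z]*)$",
    re.IGNORECASE,
)

def parse_campaign(name):
    """Return the named parts in lower case, or None when there is no match."""
    m = NAMED.match(name)
    if m is None:
        return None
    return {key: value.lower() for key, value in m.groupdict().items()}

parts = parse_campaign("p-FR-youtube-car")
```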

The following is for BigQuery Standard SQL.
I would recommend using SPLIT in cases like yours:
#standardSQL
SELECT CampaignName,
parts[SAFE_OFFSET(1)] AS Country,
parts[SAFE_OFFSET(2)] AS Channel,
parts[SAFE_OFFSET(3)] AS Item
FROM `project.dataset.table`,
UNNEST([STRUCT(SPLIT(CampaignName, '-') AS parts)])
Applied to the sample data from your question, the output is:
Row  CampaignName      Country  Channel  Item
1    p-fr-youtube-car  fr       youtube  car
If for some reason you are required to use a regular expression, you can use the query below:
#standardSQL
SELECT CampaignName,
parts[SAFE_OFFSET(1)] AS Country,
parts[SAFE_OFFSET(2)] AS Channel,
parts[SAFE_OFFSET(3)] AS Item
FROM `project.dataset.table`,
UNNEST([STRUCT(REGEXP_EXTRACT_ALL(CampaignName, r'(?:^|-)([^-]*)') AS parts)])
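To see that both queries produce the same parts, here is a rough Python equivalent (not BigQuery). The helper safe_offset is a hypothetical stand-in for SAFE_OFFSET, returning None where BigQuery would return NULL.

```python
import re

def safe_offset(parts, index):
    """Hypothetical stand-in for SAFE_OFFSET: None instead of an error."""
    return parts[index] if 0 <= index < len(parts) else None

campaign = "p-fr-youtube-car"

# Counterpart of SPLIT(CampaignName, '-').
split_parts = campaign.split("-")

# Counterpart of REGEXP_EXTRACT_ALL with the pattern from the query above.
regex_parts = re.findall(r"(?:^|-)([^-]*)", campaign)

country = safe_offset(split_parts, 1)
channel = safe_offset(split_parts, 2)
item = safe_offset(split_parts, 3)
```

Index 0 holds the leading p, which is why the columns start at offset 1.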

Wildcard expression in SQL Server

I know that the following query returns the rows that contain exactly 5 characters between the A and the G:
select *
from
(select 'prefixABBBBBGsuffix' code /*this will be returned. */
union
select 'prefixABBBBGsuffix') rex
where
code like '%A_____G%'
But I want 17 characters between A and G, so the LIKE condition would need 17 underscores. After a little searching on Google, I found that [] can be used in LIKE. This is what I have tried so far:
select *
from
(select 'AprefixABBBBBGsuffixG' code
union
select 'AprefixABBBBGsuffixG') rex
where
code like '%A[_]^17G%' /*As per my understanding, '[]' makes a set. And
'^17' would be power of the set (like Mathematics).*/
It returns an empty result set. How can I search for rows that have a certain number of characters from the set []?
Note:
I'm using SQL Server 2012.

I would use REPLICATE to generate the desired number of '_':
select * from (
select 'prefixABBBBBGsuffix' code
union
select 'prefixABBBBGsuffix'
) rex
where code like '%A' + REPLICATE('_',17) + 'G%';
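The same idea can be tried without a SQL Server instance. The sketch below is an approximation that uses Python's built-in sqlite3 module: the pattern is assembled in Python because SQLite has no REPLICATE, and SQLite's LIKE is case-insensitive for ASCII by default, which may differ from your SQL Server collation. The function name rows_with_gap is hypothetical.

```python
import sqlite3

def rows_with_gap(codes, gap):
    """Return the codes that contain A, then exactly `gap` characters, then G."""
    # Python stand-in for '%A' + REPLICATE('_', gap) + 'G%'.
    pattern = "%A" + "_" * gap + "G%"
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rex (code TEXT)")
    conn.executemany("INSERT INTO rex VALUES (?)", [(c,) for c in codes])
    rows = conn.execute(
        "SELECT code FROM rex WHERE code LIKE ? ORDER BY code", (pattern,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

codes = ["prefixABBBBBGsuffix", "prefixABBBBGsuffix"]
five_between = rows_with_gap(codes, 5)
four_between = rows_with_gap(codes, 4)
```

Passing the gap as a number is what makes the pattern easy to parameterize.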

Same answer as the previous one, but corrected. The count was not 17: it is 19 for the first string and 18 for the second. I also added len() of the text between the outer A and G to show this.
select rex.*
from (
select len('prefixABBBBBGsuffix') leng, 'AprefixABBBBBGsuffixG' code
union
select len('prefixABBBBGsuffix'), 'AprefixABBBBGsuffixG'
union
select 0, 'A___________________G'
) rex
where
rex.code like '%A' + replicate('_',19) + 'G%'
-- With [], the set would be [A-Za-z]. Note that this set does not match the A___________________G string.
select rex.*
from (
select len('prefixABBBBBGsuffix') leng, 'AprefixABBBBBGsuffixG' code
union
select len('prefixABBBBGsuffix'), 'AprefixABBBBGsuffixG'
union
select 0, 'A___________________G'
) rex
where
rex.code like '%A' + replicate('[A-Za-z]',19) + 'G%'
[A-Za-z0-9] matches a single character that is a letter (either case) or a digit 0 through 9.
I can't find any documented alternative for matching a fixed number of characters like this; REPLICATE simply makes the pattern easier to parameterize and type.
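The difference between the two queries can be illustrated with regular expressions in Python, where a repeated '_' in LIKE corresponds roughly to .{n} and a repeated [A-Za-z] to [A-Za-z]{n}. This is a sketch of the idea rather than SQL Server behaviour; the helper name gap_regex is hypothetical.

```python
import re

def gap_regex(char_class, count):
    """Build A + (char_class repeated count times) + G as a regex."""
    return re.compile("A" + char_class + "{" + str(count) + "}G")

any_char = gap_regex(".", 19)          # rough analogue of replicate('_', 19)
letters = gap_regex("[A-Za-z]", 19)    # rough analogue of replicate('[A-Za-z]', 19)

codes = [
    "AprefixABBBBBGsuffixG",   # 19 letters between the outer A and G
    "AprefixABBBBGsuffixG",    # 18 letters between the outer A and G
    "A" + "_" * 19 + "G",      # 19 underscores between A and G
]

matched_any = [c for c in codes if any_char.search(c)]
matched_letters = [c for c in codes if letters.search(c)]
```

The wildcard version accepts the underscore string, while the letter set rejects it.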

SQL Regex Pattern, How to match only a specific variable between two characters? (see Sample Output)

I have these inputs:
John/Bean/4000-M100
John/4000-M100
John/4000
How can I get just the 4000? Note that the 4000 changes from time to time; it can be 3000 or 2000. How can I handle that with a regex pattern?
Here is my attempt so far. It handles John/4000-M100 and John/4000, but the input with two slashes does not match what I need:
REGEXP_REPLACE(REGEXP_SUBSTR(a.demand,'/(.*)-|/(.*)',1,1),'-|/','')

You can use this query to get the results you want:
select regexp_replace(data, '^.*/(\d{4})[^/]*$', '\1')
from test
The regex looks for a set of 4 digits following a / and then not followed by another / before the end of the line and replaces the entire content of the string with those 4 digits.
Demo on dbfiddle
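The pattern from this answer can be checked outside the database as well. A minimal sketch with Python's re module, assuming the three inputs from the question (the function name extract_code is hypothetical):

```python
import re

# Greedy .* runs to the last slash that is followed by four digits
# and by no further slash before the end of the string.
PATTERN = re.compile(r"^.*/(\d{4})[^/]*$")

def extract_code(value):
    """Replace the whole string with the captured four digits."""
    return PATTERN.sub(r"\1", value)

inputs = ["John/Bean/4000-M100", "John/4000-M100", "John/4000"]
codes = [extract_code(v) for v in inputs]
```

A value that does not match the pattern is returned unchanged.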

This would also work, unless you specifically need one digit followed by three zeros (it matches any four digits). See it in action here, for as long as the link lives: http://sqlfiddle.com/#!4/23656/5
create table test_table
( data varchar2(200));
insert into test_table values('John/Bean/4000-M100');
insert into test_table values('John/4000-M100');
insert into test_table values('John/4000');
-- take the first slash followed by four digits, then drop the slash
select a.*,
replace(REGEXP_SUBSTR(a.data,'/\d{4}'), '/', '') as code
from test_table a;

The following will match any multiple of 1000 less than 10000 when it is preceded by a slash:
\/[1-9]0{3}
To match any four-digit number that is preceded by a slash and not followed by another digit, such as 4031 in
Sal_AS_180763852/4200009751_S5_154552/4031
try:
\/\d{3}(?:(?:\d[^\d])|(?:\d$))
https://regex101.com/r/Am34WO/1
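Both patterns can be tried in Python's re module, as a sketch rather than in the SQL engine. Note that each match includes the leading slash, and the second pattern also consumes the non-digit character that follows the number, so some trimming is still needed afterwards.

```python
import re

THOUSANDS = re.compile(r"\/[1-9]0{3}")
FOUR_DIGITS = re.compile(r"\/\d{3}(?:(?:\d[^\d])|(?:\d$))")

sample = "Sal_AS_180763852/4200009751_S5_154552/4031"

# The ten-digit run after the first slash is skipped; only /4031 qualifies.
at_end = FOUR_DIGITS.search(sample).group()

# Here the match also includes the hyphen that follows the four digits.
with_tail = FOUR_DIGITS.search("John/4000-M100").group()

thousand = THOUSANDS.search("John/Bean/4000-M100").group()
```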

Trim Results After Certain Character

I have a table that lists all of the available product ids.
For example, 1020, 1020A, 1020B.
I am looking to group these product ids together.
Is it possible to do this via SQL?

To group rows with 1020, 1020A, 1020B into a group called 1020, you just need to use a substring expression in the GROUP BY clause:
select substring(your_column from 1 for 4), ...
from ...
group by substring(your_column from 1 for 4)
If you have ids of a different length, such as 102A and 102B turning into 102, you'll need a regular expression for that. The general idea is that you can use any expression, not just a column name, in the GROUP BY clause.
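For the variable-length case, the grouping key could be the leading run of digits. Here is a minimal sketch of that idea in Python, assuming every id starts with its numeric part (the function name group_by_numeric_prefix and the sample ids beyond those in the question are hypothetical):

```python
import re
from collections import defaultdict

def group_by_numeric_prefix(product_ids):
    """Group ids by their leading digits, e.g. 1020A -> 1020."""
    groups = defaultdict(list)
    for pid in product_ids:
        m = re.match(r"\d+", pid)
        # Ids without a numeric prefix form their own group.
        key = m.group() if m else pid
        groups[key].append(pid)
    return dict(groups)

grouped = group_by_numeric_prefix(["1020", "1020A", "1020B", "102A", "102B"])
```

In SQL the equivalent would be to place the database's regex substring function, with a pattern like ^\d+, in both the select list and the GROUP BY clause.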

How to limit characters when using regexp_extract in hive?

I have a fixed length string in which I need to extract portions as fields. First 5 characters to ACCOUNT1, next 2 characters to ACCOUNT2 and so on.
I would like to use regexp_extract (not substring), but I am missing something: my attempts return nothing.
select regexp_extract('47t7916A2088M040323','(.*){0,5}',1) as ACCOUNT1,
regexp_extract('47t7916A2088M040323','(.*){6,8}',1) as ACCOUNT2 --and so on

If you want to use a regexp, do it as in this example. For Account1, the expression '^(.{5})' means: ^ is the beginning of the string, followed by capturing group 1, which consists of any 5 characters (a group is enclosed in round brackets). {5} is a quantifier meaning exactly 5 times. For Account2, capturing group 2 comes after group 1; (.{2}) means any two characters.
In this example, the second regexp contains two groups (for the first and second columns) and we extract the second group.
hive> select regexp_extract('47t7916A2088M040323','^(.{5})',1) as Account1,
> regexp_extract('47t7916A2088M040323','^(.{5})(.{2})',2) as Account2;
OK
47t79 16
Time taken: 0.064 seconds, Fetched: 1 row(s)
Actually you can use the same regexp containing groups for all columns, extracting different capturing groups.
Example using the same regexp and extracting different groups:
hive> select regexp_extract('47t7916A2088M040323','^(.{5})(.{2})',1) as Account1,
> regexp_extract('47t7916A2088M040323','^(.{5})(.{2})',2) as Account2
> ;
OK
47t79 16
Time taken: 1.043 seconds, Fetched: 1 row(s)
Add more groups for each column. This approach works only for fixed-length columns. If you want to parse a delimited string, put the delimiter characters between the groups, change each group to match everything except the delimiter, and remove or adjust the quantifiers. For a delimited string, substring or split looks much simpler and cleaner, while a regexp lets you parse very complex patterns. Hope you get the idea.
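The fixed-length pattern behaves the same way in Python's re module, which makes it easy to experiment with before adding more groups. A small sketch using the string from the question (the function name extract_accounts is hypothetical):

```python
import re

# Group 1: first 5 characters; group 2: the next 2 characters.
FIXED = re.compile(r"^(.{5})(.{2})")

def extract_accounts(record):
    """Return (Account1, Account2), or None if the record is too short."""
    m = FIXED.match(record)
    return m.groups() if m else None

account1, account2 = extract_accounts("47t7916A2088M040323")
```

Each further column would add one more (.{n}) group to the pattern, read back by its group number.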
