Create a new column by executing regular expression on existing column - regex

I have a column with data as follows:
p=Chicago, IL|q=rental houses
My goal is to obtain
Chicago IL rental houses as the result by running a regular expression on the column in a SELECT query.

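Use the regex below on the string:
/p=(.*)\|q=(.*)/
Note that the | must be escaped. Unescaped, it means alternation, so the pattern would match either p=... or q=... on its own and never fill both groups at once.
Then join the 2 captured substrings with a space (and strip the comma if you want Chicago IL rather than Chicago, IL).
If you want to get the result directly from a SELECT query, you can combine the captures with the concat or concat_ws function instead.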
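A minimal sketch of those two steps, written in Python only to illustrate the regex logic rather than any particular SQL dialect. The function name join_params is hypothetical, and dropping the comma is an assumption based on the desired output shown in the question.

```python
import re

# Escaped pipe: match the literal "|" that separates the p= and q= parts.
PATTERN = re.compile(r"p=(.*)\|q=(.*)")

def join_params(value):
    """Return the p= and q= values joined by a space, or None if no match."""
    m = PATTERN.search(value)
    if m is None:
        return None
    place, query = m.groups()
    # Assumption: the comma is removed to match the desired output.
    return f"{place.replace(',', '')} {query}"

print(join_params("p=Chicago, IL|q=rental houses"))
```

In SQL the same idea would be two regex extractions fed into concat_ws, with the exact function names depending on the database.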

Related

REGEXP_EXTRACT with String Value in Bigquery

I want to extract words from a column whose values look like this: 'p-fr-youtube-car'. Each word should be extracted into its own column.
INPUT:
p-fr-youtube-car
DESIRED OUTPUT:
Country = fr
Channel = youtube
Item = car
I've tried the below to extract the first word, but can't figure out the rest. What RegEx will achieve my desired output from this input? And how can I make it not case sensitive, so that fr and FR will be the same?
REGEXP_EXTRACT_ALL(CampaignName, r"^p-([a-z]*)") AS Country
You can use [^-]+ to match parts between hyphens and only capture what you need to fetch.
To get strings like youtube, you can use
REGEXP_EXTRACT_ALL(CampaignName, r'^p-[^-]+-([^-]+)')
To get strings like car, you can use
REGEXP_EXTRACT_ALL(CampaignName, r'^p-[^-]+-[^-]+-([^-]+)')
So, [^-]+ matches one or more chars other than -, and ([^-]+) is the same pattern wrapped in a capturing group, whose contents are what REGEXP_EXTRACT_ALL actually returns (as an array; REGEXP_EXTRACT returns just the first match as a single string).
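To see how the anchored [^-]+ patterns pick out each position, here is a small Python sketch. Python's re module stands in for BigQuery's regex engine here. The helper name extract_part and the use of re.IGNORECASE plus lower() to treat fr and FR alike are assumptions added for illustration, not part of the answer above.

```python
import re

# One pattern per column; only the captured group is returned.
PATTERNS = {
    "Country": r"^p-([^-]+)",
    "Channel": r"^p-[^-]+-([^-]+)",
    "Item": r"^p-[^-]+-[^-]+-([^-]+)",
}

def extract_part(campaign_name, column):
    """Return the lower-cased part for the given column, or None."""
    m = re.search(PATTERNS[column], campaign_name, flags=re.IGNORECASE)
    return m.group(1).lower() if m else None

for col in PATTERNS:
    print(col, "=", extract_part("P-FR-youtube-car", col))
```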
You can use named groups.
Example Regex:
p-(?P<Country>[a-z]*)\-(?P<Channel>[a-z]*)\-(?P<Item>[a-z]*)$
https://regex101.com/r/fKoBIn/3
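As a sketch of how the named groups behave, the same pattern can be tried in Python, whose re module also accepts the (?P<name>...) syntax. Whether a given SQL function can return several named groups at once is a separate question that this sketch does not settle. The helper name parse_campaign and the case-insensitive flag are assumptions.

```python
import re

# Same pattern as above; the unnecessary "\-" escapes are written as plain "-".
CAMPAIGN = re.compile(
    r"p-(?P<Country>[a-z]*)-(?P<Channel>[a-z]*)-(?P<Item>[a-z]*)$",
    re.IGNORECASE,  # assumption: treat fr and FR the same
)

def parse_campaign(name):
    """Return a dict of the named groups, or None if the name does not match."""
    m = CAMPAIGN.search(name)
    return m.groupdict() if m else None

print(parse_campaign("p-fr-youtube-car"))
```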
Below is for BigQuery Standard SQL.
I would recommend using SPLIT in cases like yours:
#standardSQL
SELECT CampaignName,
parts[SAFE_OFFSET(1)] AS Country,
parts[SAFE_OFFSET(2)] AS Channel,
parts[SAFE_OFFSET(3)] AS Item
FROM `project.dataset.table`,
UNNEST([STRUCT(SPLIT(CampaignName, '-') AS parts)])
Applied to the sample data from your question, the output is:
Row  CampaignName      Country  Channel  Item
1    p-fr-youtube-car  fr       youtube  car
Meanwhile, if for some reason you are required to use a regexp, you can use the query below:
#standardSQL
SELECT CampaignName,
parts[SAFE_OFFSET(1)] AS Country,
parts[SAFE_OFFSET(2)] AS Channel,
parts[SAFE_OFFSET(3)] AS Item
FROM `project.dataset.table`,
UNNEST([STRUCT(REGEXP_EXTRACT_ALL(CampaignName, r'(?:^|-)([^-]*)') AS parts)])
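The two queries above rely on the same idea: build an array of parts, then index into it safely. A rough Python sketch of that logic follows. The helper names are hypothetical, and safe_offset only imitates the "NULL instead of an error" behaviour of SAFE_OFFSET.

```python
import re

def safe_offset(parts, index):
    """Imitate SAFE_OFFSET: return None instead of raising when out of range."""
    return parts[index] if 0 <= index < len(parts) else None

def split_campaign(name):
    """SPLIT-style approach: cut on '-' and pick parts by position."""
    parts = name.split("-")
    return {
        "Country": safe_offset(parts, 1),
        "Channel": safe_offset(parts, 2),
        "Item": safe_offset(parts, 3),
    }

def regex_parts(name):
    """Regex approach: a part starts at the string start or right after '-'."""
    return re.findall(r"(?:^|-)([^-]*)", name)

print(split_campaign("p-fr-youtube-car"))
print(regex_parts("p-fr-youtube-car"))
```

For this input both approaches yield the same list of parts, which is why the simpler split is usually the easier one to read.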

Extract text between parenthesis from a postgres table without creating additional column

I am trying to extract text between parentheses from a column in a postgres table. I am using the following command. It is creating an additional blank column.
SELECT *, SUBSTRING (col2, '\[(.+)\]') FROM table
My table looks like this:
col1 col2
1 mut(MI_0118)
2 mut(MI_0119)
3 mut(MI_0120)
My desired output is:
col1 col2
1 MI_0118
2 MI_0119
3 MI_0120
How can I extract the text without creating an additional column?
Thanks
Your regex is wrong; that's why you get an empty column. You want to match parentheses, not square brackets, around the search string:
select col1, substring(col2, '\((.+)\)')
from input
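The * in the SELECT statement includes all columns, and then you are adding another, unnamed column. If you list the columns explicitly and alias the expression (with the pattern changed to match parentheses, since the square-bracket version never matches this data):
SELECT col1, SUBSTRING (col2, '\((.+)\)') AS col2 FROM table
you will get col2 replaced rather than duplicated.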
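To check the pattern itself outside the database, here is a short Python sketch using the sample rows from the question. The function name inside_parens is hypothetical. Rebuilding each row with the extracted value mirrors selecting the expression AS col2 instead of appending a column.

```python
import re

def inside_parens(value):
    """Return the text between the first '(' and the last ')', or None."""
    m = re.search(r"\((.+)\)", value)
    return m.group(1) if m else None

rows = [(1, "mut(MI_0118)"), (2, "mut(MI_0119)"), (3, "mut(MI_0120)")]

# Replace col2 in place rather than adding a third column.
result = [(col1, inside_parens(col2)) for col1, col2 in rows]
print(result)

# The square-bracket pattern from the question finds nothing in this data.
print(re.search(r"\[(.+)\]", "mut(MI_0118)"))
```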

Trim Results After Certain Character

I have a table that lists all of the available product ids.
For example, 1020, 1020A, 1020B.
I am looking to group these product ids together.
Is it possible to do this via SQL?
To group rows with 1020, 1020A, 1020B into a group called 1020, you just need to use a substring expression in the GROUP BY clause:
select substring(your_column from 1 for 4), ...
from ...
group by substring(your_column from 1 for 4)
If the ids can have different lengths, like 102A and 102B turning into 102, you'll need a regular expression for that. The general idea is that you can use any expression, not just a column name, in the GROUP BY clause.
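For the variable-length case, one possible grouping key is the leading run of digits. The Python sketch below shows that idea. It assumes every id starts with its numeric part, which the question does not state, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def group_key(product_id):
    """Leading run of digits, e.g. '1020A' -> '1020'; the id itself if none."""
    m = re.match(r"\d+", product_id)
    return m.group(0) if m else product_id

def group_products(product_ids):
    """Collect product ids under their numeric prefix."""
    groups = defaultdict(list)
    for pid in product_ids:
        groups[group_key(pid)].append(pid)
    return dict(groups)

print(group_products(["1020", "1020A", "1020B", "102A", "102B"]))
```

In SQL the equivalent would be a regex-based substring expression used both in the SELECT list and in GROUP BY.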

PostgreSQL - finding string using regular expression

What I am looking to do is to, within Postgres, search a column for a string (an account number). I have a log table, which has a parameters column that takes in parameters from the application. It is a paragraph of text and one of the parameters stored in the column is the account number.
The position of the account number is not consistent in the text and some rows in this table have nothing in the column (since no parameters are passed on certain screens). The account number has the following format: L1234567899. So for the account number, the first character is a letter and then it is followed by ten digits.
I am looking for a way to extract the account number alone from this column so I can use it in a view for a report.
So far what I have tried is getting it into an array, but since the position changes, I cannot count on it being in the same place.
select foo from regexp_split_to_array(
(select param from log_table where id = 9088), E'\\s+') as foo
You can use regexp_match() to achieve that result.
(regexp_match(foo,'[A-Z][0-9]{10}'))[1]
Use substring to pull out the match group.
select substring ('column text' from '[A-Z]\d{10}')
Reference: PostgreSQL regular expression capture group in select
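Both answers search for the pattern anywhere in the text, so the position of the account number does not matter. A small Python sketch of that behaviour follows. The sample parameter strings are made up for illustration, and the function name is hypothetical.

```python
import re

# One uppercase letter followed by exactly ten digits.
ACCOUNT = re.compile(r"[A-Z]\d{10}")

def find_account(params):
    """Return the first account number found in the text, or None."""
    if not params:  # rows with an empty or missing parameters column
        return None
    m = ACCOUNT.search(params)
    return m.group(0) if m else None

# Hypothetical log values: the number appears at different positions.
print(find_account("screen=summary acct=L1234567899 mode=view"))
print(find_account("L1234567899 opened from dashboard"))
print(find_account(""))
```

Note that without word boundaries the pattern would also match inside a longer run of letters and digits, so tighten it if such values can occur.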

OpenRefine : split a cell based on a string of 5 numbers (postal code)

I am new to OpenRefine and GREL.
In an address row, I am trying to extract the city and the postal code.
The row typically contains: 12 rue du Paradis 75012 Paris
I'd like to split this row starting from the 5 digit number (75012). After that, I could easily extract the city.
In the command "Split into several columns", what Regular expression would you put (or is it another command)?
Thanks!
The 'split into several columns' takes a regular expression as an argument to specify the separator to be used when doing the split. This is probably not what you need in this case - since there isn't a common expression for the separator.
Instead you would probably be better using the "Add column based on this column" option and then using a 'match' function to create the new column. The 'match' takes a regular expression as an argument, but allows you to capture the output - so you can use this to do pattern matching in a string. In this case for example you could use something like:
value.match(/.*\s+(\d{5})\s+(.*)/)
This would capture the 5 digit number and the city in an array:
["75012","Paris"]
You could then use this to create the values you want in the new column, or in two new columns. E.g.:
value.match(/.*\s+(\d{5})\s+(.*)/)[0]
will get the number
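If you want to try the pattern outside OpenRefine first, this Python sketch applies the same regular expression to the sample address. It uses re.fullmatch on the assumption that GREL's match needs the pattern to cover the whole value, which is why the pattern starts with .* in the first place. The function name split_address is hypothetical.

```python
import re

def split_address(address):
    """Return (postal_code, city), or None when the pattern does not match."""
    # fullmatch requires the pattern to cover the entire string.
    m = re.fullmatch(r".*\s+(\d{5})\s+(.*)", address)
    return m.groups() if m else None

parts = split_address("12 rue du Paradis 75012 Paris")
print(parts)
print(parts[0])  # first captured group: the postal code
print(parts[1])  # second captured group: the city
```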