Postgres regexp_replace query usage - regex

We have a column with a particular prefix value followed by dynamic digits, for example,
AAAA0000
AAAA0001
AAAA0002
AAAA0003
...
...
Now we want to change the prefix from AAAA to BBBB in every row where it occurs. I have tried regexp_replace, replace and other functions, without success.
Could you please help me do this?

update table_name set the_column = 'BBBB'||substr(the_column, 5) where the_column like 'AAAA%';
Here, 5 is the starting position of the digits (substr() counts from 1, and AAAA occupies positions 1 to 4). The optional third argument of substr() is a length, not an end position, so leaving it out takes everything up to the end of the string.
So 'BBBB' replaces the first four characters and is concatenated with the substring extracted above.

I don't see the need for regex here:
update the_table
set the_column = 'BBBB'||substr(the_column, 5)
where the_column like 'AAAA%';

Here's how you would use regexp_replace:
update the_table
set the_column = regexp_replace(the_column, '^AAAA', 'BBBB');
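To make the position arithmetic concrete, here is a minimal Python sketch that mirrors both statements on a few sample strings. Plain strings stand in for column values, and the function names are hypothetical. Note that SQL's substr() counts from 1, so substr(value, 5) corresponds to the Python slice value[4:].

```python
import re

def prefix_via_substr(value: str) -> str:
    """Mirror of: 'BBBB' || substr(value, 5) ... WHERE value LIKE 'AAAA%'."""
    # SQL positions are 1-based, so substr(value, 5) is value[4:] in Python.
    if value.startswith("AAAA"):  # plays the role of the WHERE clause
        return "BBBB" + value[4:]
    return value  # rows not matched by the WHERE clause stay untouched

def prefix_via_regex(value: str) -> str:
    """Mirror of: regexp_replace(value, '^AAAA', 'BBBB')."""
    # '^' anchors the match at the start, so 'AAAA' elsewhere is not touched.
    return re.sub(r"^AAAA", "BBBB", value)

samples = ["AAAA0000", "AAAA0001", "AAAA0002", "XAAAA003"]
print([prefix_via_substr(s) for s in samples])
print([prefix_via_regex(s) for s in samples])
```

Both helpers give the same result on these samples; the regex version simply folds the condition and the replacement into one anchored pattern.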

Related

How to split string in PostgreSQL

We have ranges like this:
"0,5, 0,5"
"0,112, 0,118"
and want to split by the second comma.
Any idea?
You can change the regex you split by to a comma followed by a space:
select regexp_split_to_array('0,112, 0,118', ', ')
Supposing there is always at least one space after the second comma and none after the others, you could use this split regex:
SELECT
regexp_split_to_array(ranges, ',\s+')
FROM
t
This returns an array like {"0,5","0,5"}.
You can split both ranges into columns using a subquery:
SELECT
r[1],
r[2]
FROM (
SELECT
regexp_split_to_array(ranges, ',\s+') as r
FROM
t
) s
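The split pattern can be tried outside the database with Python's re.split, which behaves the same way for this simple pattern. This is a sketch on the two sample values from the question; split_range is a hypothetical helper name.

```python
import re

def split_range(value: str) -> list:
    # ',\s+' only matches a comma followed by whitespace, so the decimal
    # commas inside each number ("0,5", "0,112") are not split points.
    return re.split(r",\s+", value)

for value in ["0,5, 0,5", "0,112, 0,118"]:
    r = split_range(value)
    # Python lists are 0-based, whereas SQL's r[1] and r[2] are 1-based.
    print(r[0], r[1])
```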
Edit:
The OP wants everything after the second comma, so you need a splitting regex that finds the nth (here n = 2) occurrence of a comma:
(?:(^.*?,.*)),
This can be used to query the required data:
SELECT
(regexp_split_to_array(ranges, '(?:(^.*?,.*)),'))[2]
FROM
t
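The "everything after the nth comma" idea can also be written with an explicit repetition count. This Python sketch deliberately uses a different pattern from the one above: how a pattern that mixes lazy and greedy quantifiers resolves depends on the regex engine, and Python's re does not follow PostgreSQL's rules for that, so the explicit {n} form is easier to reason about. after_nth_comma is a hypothetical helper name.

```python
import re
from typing import Optional

def after_nth_comma(value: str, n: int = 2) -> Optional[str]:
    # (?:[^,]*,){n} consumes the first n comma-terminated chunks,
    # \s* skips any space after the nth comma, and (.*) keeps the rest.
    pattern = r"^(?:[^,]*,){" + str(n) + r"}\s*(.*)$"
    match = re.match(pattern, value)
    # None when there are fewer than n commas, similar to indexing past
    # the end of a SQL array, which yields NULL.
    return match.group(1) if match else None

print(after_nth_comma("0,112, 0,118"))  # 0,118
```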
Use regexp_replace:
select regexp_replace('0,112, 0,118', '.*,\s+', '') as foo;
Output:
foo
-------
0,118
(1 row)
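Python's re gives the same result for this replacement, so it is a convenient place to check what the greedy pattern removes. A sketch, with last_range as a hypothetical name:

```python
import re

def last_range(value: str) -> str:
    # Greedy '.*' first runs to the end of the string, then backs off until
    # ',\s+' can match, so everything up to the last comma-plus-whitespace
    # is removed.
    return re.sub(r".*,\s+", "", value)

print(last_range("0,112, 0,118"))  # 0,118
```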
Thank you all for the quick answers. It finally worked using this:
(regexp_matches(your_string_value, '\d+[,.]\d+|\d+', 'g'))[1]
This helped me get rid of all the unnecessary characters in the values and returned the second value of the range.
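regexp_matches(..., 'g') returns one match per occurrence; Python's re.findall is a rough analogue that returns all matches as a list. This sketch applies the pattern \d+[,.]\d+|\d+ (a number with a decimal comma or point, otherwise a plain integer) to a quoted sample value; range_values is a hypothetical name.

```python
import re

# A number with a decimal comma or point, or else a plain integer.
NUMBER = re.compile(r"\d+[,.]\d+|\d+")

def range_values(value: str) -> list:
    # Every non-overlapping match, in order; quotes and ", " are skipped.
    return NUMBER.findall(value)

print(range_values('"0,112, 0,118"'))  # ['0,112', '0,118']
```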

How do I remove the last digit and character from column values in a Python dataframe?

I have a column with values like 34343434,4 and 223232,5.
I want to remove the comma and the last digit, so that I get 34343434 and 223232.
I tried using this code:
dataframe['column name'] = pd.to_numeric(dataframe['column name']).astype.float().round(0, 2)
But it didn't work. Could somebody help me get the output I desire?
You can do it with a regular expression:
dataframe['colname'] = dataframe['colname'].str.replace(',.','',regex=True)
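Series.str.replace with regex=True applies the pattern to each element much like re.sub, so this standard-library sketch shows the substitution on the sample values without needing pandas. The stricter ,\d$ variant is an assumption, not part of the answer; it may help if a comma could appear elsewhere in a value.

```python
import re

values = ["34343434,4", "223232,5"]

# ',.' is what the answer passes to str.replace: a comma plus any one character,
# removed wherever it occurs.
loose = [re.sub(r",.", "", v) for v in values]

# Stricter variant (assumption): only strip a comma and a single digit
# when they are the last two characters.
strict = [re.sub(r",\d$", "", v) for v in values]

print(loose)   # ['34343434', '223232']
print(strict)  # ['34343434', '223232']
```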

regexp_replace() - matches but does not replace at end of line

I'm trying to use REGEXP_REPLACE() to add "/" to the end of every value in a column that doesn't already end with "/".
I can get the correct values by using this statement (the pattern was tested with a PCRE checker):
SELECT * FROM `table` WHERE `column` REGEXP("(?<=[^\/])$");
And the non-matching ones with:
SELECT * FROM `table` WHERE `column` REGEXP("(?<![^\/])$");
But when the statement is:
UPDATE `table` SET `column` = REGEXP_REPLACE(`column`, "(?<=[^\/])$", "/");
Then, there is no change, whatever value I put into the third parameter:
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1031 Changed: 0 Warnings: 0
You could do this easily without a regex (in MySQL, + is numeric addition, so use CONCAT() to append the slash):
UPDATE `table` SET `column` = CONCAT(`column`, '/')
WHERE RIGHT(`column`, 1) <> '/'
Trying to understand why it does not work:
As I rationalize the problem, you are asking REGEXP_REPLACE to do two things:
Discover that something is missing, and
Point to a location in the string.
Your regexp says that it is missing, but I question whether it points to a specific substring (even an empty one) for replacing. It's easy to point to a found substring (or substrings). It is hard to point to a missing substring. And such a 'pointer' is needed to do the replacement.
Hence, Michal's approach (even if some regexp were needed) is the "right" way to solve the problem.
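For comparison, Python's re (a different engine from MySQL's) accepts the zero-width replacement, which suggests the pattern itself is sound and the problem lies in how REGEXP_REPLACE handles an empty match at the end; that is an inference, not something confirmed here. The capture-group variant consumes the last character instead of pointing at an empty position, which is the kind of "pointer" described above. The function names are hypothetical, and \Z is used because Python's $ also matches before a trailing newline.

```python
import re

def add_slash_lookbehind(value: str) -> str:
    # Zero-width match: the empty position at the very end, preceded by a non-'/'.
    return re.sub(r"(?<=[^/])\Z", "/", value)

def add_slash_capture(value: str) -> str:
    # Consuming match: capture the last character and write it back plus '/'.
    return re.sub(r"([^/])\Z", r"\1/", value)

def add_slash_plain(value: str) -> str:
    # No regex, like WHERE RIGHT(col, 1) <> '/'.
    return value if value.endswith("/") else value + "/"

for v in ["abc", "abc/"]:
    print(add_slash_lookbehind(v), add_slash_capture(v), add_slash_plain(v))
```

One difference worth knowing: on an empty string both regex versions leave the value unchanged, while the plain check appends "/".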

How can I separate a string by underscore (_) in google spreadsheets using regex?

I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), without the "_" and excluding the final number, as follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
help!
Thanks in advance (and sorry for my English)
=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("&REGEXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
What this does is replace every _ with parentheses to create capture groups, while also excluding the digit string at the end, and then surround the whole string with parentheses.
We then use REGEXEXTRACT to actually pull the pieces out; the groups automatically push them into their own cells/columns.
To solve this you can use the SPLIT and REGEXREPLACE functions
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3 = =SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 we use SPLIT(text, delimiter, [split_by_each]). The text is first cleaned with REGEXREPLACE(A1,"_+\d*$","") to remove the trailing _123421, which gives you one column for each word delimited by "_".
A1 = "campaign1_attribute1_whatever_yes_123421"
A2 = =REGEXREPLACE(A1,"_+\d*$","") // This gives you: campaign1_attribute1_whatever_yes
A3 = =SPLIT(A2, "_", TRUE) // This gives you: campaign1 attribute1 whatever yes, each in a separate column.
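The same two steps can be sketched in Python to check the regex. split_fields is a hypothetical name, and filtering out empty pieces stands in for SPLIT's default of removing empty text.

```python
import re

def split_fields(value: str) -> list:
    # Step 1, like REGEXREPLACE(A1, "_+\d*$", ""): drop the trailing "_123421".
    trimmed = re.sub(r"_+\d*$", "", value)
    # Step 2, like SPLIT(A2, "_", TRUE): split on "_" and drop empty pieces.
    return [part for part in trimmed.split("_") if part]

print(split_fields("campaign1_attribute1_whatever_yes_123421"))
# ['campaign1', 'attribute1', 'whatever', 'yes']
```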
I finally figured it out yesterday on Stack Overflow in Spanish: https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked for a regex-only solution for Google Sheets is that I need to use it in Google Data Studio (which has the same regex functions as Sheets).
To get each column, just use this REGEXP_EXTRACT function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing you need to change in the formula to switch columns is the number inside {}, which is (column number - 1).
If the value doesn't end with a number, just leave out the last "_".
Lastly, remember to recreate all the calculated fields, because otherwise you get errors with, for example, CPC, CTR and other AdWords metrics that are calculated automatically.
Hope it helps!
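Here is a Python sketch of the same per-column pattern, built from the column number as described above. It uses only basic regex features, so Python's re should agree with Data Studio here, but that is an assumption; nth_field is a hypothetical name.

```python
import re
from typing import Optional

def nth_field(value: str, column: int) -> Optional[str]:
    # Build '^(?:[^_]*_){k}([^_]*)_' with k = column - 1: skip k fields,
    # capture the next one, and require a "_" after it.
    pattern = r"^(?:[^_]*_){" + str(column - 1) + r"}([^_]*)_"
    match = re.search(pattern, value)
    return match.group(1) if match else None

value = "campaign1_attribute1_whatever_yes_123421"
print([nth_field(value, c) for c in range(1, 5)])
# ['campaign1', 'attribute1', 'whatever', 'yes']
```

Because every captured field must be followed by "_", asking for the fifth column (the trailing number) finds no match.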

PostgreSQL UPDATE substring replacement

I have a few rows in a test database where the values are prefixed with a dollar sign. I want to UPDATE the values in the name column of the test1 table, but when I put the following query together it emptied the six rows of data in the name column:
UPDATE test1 SET name=overlay('$' placing '' from 1 for 1);
So "$user" became "" when I intended for that column/row value to become "user".
How do I combine UPDATE and a substr replacement without deleting any of the other data?
If there isn't a dollar sign I want the row to remain untouched.
The dollar sign only occurs as the first character when it does occur.
If you want to remove all dollar signs, use this:
update test1
set name = replace(name, '$', '');
If you only want to remove the $ at the beginning of the value, you can use substr() with a where clause so that only rows where the column actually starts with a $ are changed:
update test1
set name = substr(name, 2)
where name like '$%';
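A plain-Python sketch of the difference between the two statements, with strings standing in for the name column and hypothetical helper names:

```python
def strip_all_dollars(name: str) -> str:
    # Like replace(name, '$', ''): removes every '$', wherever it appears.
    return name.replace("$", "")

def strip_leading_dollar(name: str) -> str:
    # Like substr(name, 2) ... WHERE name LIKE '$%'.
    # substr(name, 2) is 1-based, i.e. name[1:] in Python;
    # values that don't start with '$' are returned untouched.
    return name[1:] if name.startswith("$") else name

for name in ["$user", "user", "a$b"]:
    print(strip_all_dollars(name), strip_leading_dollar(name))
```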
To answer the question using the approach the OP had in mind:
UPDATE test1 SET name=overlay(name placing '' from 1 for 1)
WHERE name like '$%';