PostgreSQL UPDATE substring replacement

PostgreSQL UPDATE substring replacement - regex

I have a few rows in a test database where the value is prefixed with a dollar sign. I want to UPDATE the values in the name column of the test1 table; however, when I threw the following query together, it emptied the six rows of data in the name column...
UPDATE test1 SET name=overlay('$' placing '' from 1 for 1);
So "$user" became "" when I intended for that column/row value to become "user".
How do I combine UPDATE and a substr replacement without deleting any of the other data?
If there isn't a dollar sign I want the row to remain untouched.
The dollar sign only occurs as the first character when it does occur.

If you want to replace all dollar signs, use this:
update test1
set name = replace(name, '$', '');
If you want to remove the $ only at the beginning of the value, you can use substr() and a WHERE clause to change only those rows where the column actually starts with a $:
update test1
set name = substr(name, 2)
where name like '$%';

To answer the question using the pattern the OP had in mind, pass the column itself (not the literal '$') as the source string:
UPDATE test1 SET name=overlay(name placing '' from 1 for 1)
WHERE name like '$%';
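
Why the original query emptied the column can be seen by modelling overlay() outside the database. The following is a minimal Python sketch, not PostgreSQL's implementation; the helper name overlay and its parameter names are hypothetical, and it only assumes that overlay(s placing r from start for count) replaces count characters of s beginning at the 1-based position start.

```python
def overlay(source: str, replacement: str, start: int, count: int) -> str:
    """Rough model of SQL overlay(): replace `count` characters of
    `source`, beginning at 1-based position `start`, with `replacement`."""
    begin = start - 1  # SQL positions are 1-based, Python's are 0-based
    return source[:begin] + replacement + source[begin + count:]


# The original query used the literal '$' as the source string, so every
# row received the result of overlaying on '$' itself: an empty string.
print(repr(overlay("$", "", 1, 1)))

# Passing the column value as the source removes only its first character.
print(repr(overlay("$user", "", 1, 1)))
```

In the SQL version the WHERE name like '$%' clause is still needed, because overlay() removes the first character whatever it happens to be.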

Related

Athena - check if column a string contains column b string

I have a table with two columns: column A is a URL string and column B is a tracking id string. I need to check whether the tracking id string is included in the URL string, and then remove it from the URL string if so. I'm guessing it's quite straightforward, but I just can't think of how to do it. Thanks.

You can check with:
url like '%' || tracking || '%'
You could remove the substring with:
replace(url, tracking)
Frankly, there is no need to check for the tracking ID first, since if it isn't present, it simply won't be replaced, so you can just use the replace(url, tracking) command.
See: 6.9. String Functions and Operators — Presto Documentation
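
The point that no separate containment check is needed can be illustrated outside the database. This is a small Python sketch of the same idea, not Athena code; the function name remove_tracking and the example URL are made up for illustration.

```python
def remove_tracking(url: str, tracking: str) -> str:
    """Remove every occurrence of `tracking` from `url`.

    If `tracking` does not occur, the URL comes back unchanged, so no
    separate "contains" check is required first.
    """
    return url.replace(tracking, "")


# Hypothetical sample values, not taken from the question.
print(remove_tracking("https://example.com/page?id=abc123", "abc123"))
print(remove_tracking("https://example.com/page", "abc123"))
```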

string replace method to be replaced by regular expression

I am using string replace method to clean-up column names.
df.columns=df.columns.str.replace("#$%./- ","").str.replace(' ', '_').str.replace('.', '_').str.replace('(','').str.replace(')','').str.replace('.','').str.lower()
Though it works, certainly does not look pythonic. Any suggestion?
I need only A-Za-z and underscore _ if required as column names.
Update:
I tried using Regular expression in the first replace method, but I still need to chain the string like this...
terms.columns=terms.columns.str.replace(r"^[^a-zA-Z1-9]*", '').str.replace(' ', '_').str.replace('(','').str.replace(')','').str.replace('.', '').str.replace(',', '')
Update showing test data:
Original string (Tab separated):
[Sr.No. Course Terms Besic of Education Degree Course Course Approving Authority (i.e Medical Council, etc.) Full form of Course 1 year Duration 2nd year 3rd year Duration 4 th year Duration]
Change column names:
terms.columns=terms.columns.str.replace(r"^[^a-zA-Z1-9]*", '').str.replace(' ', '_').str.replace('(','').str.replace(')','').str.replace('.', '').str.replace(',', '').str.lower()
Output:
['srno', 'course', 'terms', 'besic_of_education', 'degree_course',
'course_approving_authority_ie_medical_council_etc',
'full_form_of_course', '1_year_duration', '2nd_year_',
'3rd_year_duration', '4_th_year_duration']
The above output is correct. The question: is there any way to achieve the same result other than the way I have used?

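You can use a smaller number of .replace operations by first replacing everything that is neither a word character nor whitespace with an empty string, and then replacing each run of whitespace with an underscore. Recent pandas versions treat the pattern as a literal string by default, so regex=True is passed explicitly.
df.columns = df.columns.str.replace(r"[^\w\s]+", "", regex=True).str.replace(r"\s+", "_", regex=True).str.lower()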
I hope this helps.
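
The two-step cleanup can be checked without pandas. Below is a minimal sketch using only Python's re module; the helper name clean_column is hypothetical, and the sample names are a subset of the column names shown in the question.

```python
import re


def clean_column(name: str) -> str:
    """Drop punctuation, turn whitespace runs into underscores, lowercase."""
    no_punct = re.sub(r"[^\w\s]+", "", name)     # keep only word chars and whitespace
    underscored = re.sub(r"\s+", "_", no_punct)  # one underscore per whitespace run
    return underscored.lower()


columns = [
    "Sr.No.",
    "Besic of Education",
    "Course Approving Authority (i.e Medical Council, etc.)",
]
print([clean_column(c) for c in columns])
```

Note that \w also matches digits and the underscore, so names such as "1 year Duration" keep their leading digit, as in the output shown in the question.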

How can I separate a string by underscore (_) in google spreadsheets using regex?

I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), with no "_" and excluding the final number, as it follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
help!
Thanks in advance (and sorry for my english)

=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("&REGEXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
What this does is replace every _ with parentheses to create capture groups, while also excluding the digit string at the end, and then surround the whole string with parentheses.
We then use REGEXEXTRACT to actually pull the pieces out; the capture groups automatically push them into their own cells/columns.

To solve this you can use the SPLIT and REGEXREPLACE functions
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3 = SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 we use SPLIT(text, delimiter, [split_by_each]). The text is first cleaned with REGEXREPLACE(A1,"_+\d*$","") to remove the trailing 123421; SPLIT then gives you one column for each word delimited by "_".
A1 = "campaign1_attribute1_whatever_yes_123421"
A2 = REGEXREPLACE(A1,"_+\d*$","") //This gives you: campaign1_attribute1_whatever_yes
A3 = SPLIT(A2, "_", TRUE) //This gives you: campaign1 attribute1 whatever yes, each in a separate column.
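
The same strip-then-split logic can be tried outside the spreadsheet. This is a rough Python equivalent, offered only as a sketch: the function name split_fields is hypothetical, and Python's re module stands in for the regex engine used by Google Sheets.

```python
import re


def split_fields(text: str) -> list[str]:
    """Remove a trailing '_<digits>' suffix, then split on underscores."""
    without_suffix = re.sub(r"_+\d*$", "", text)  # same pattern as the formula
    return without_suffix.split("_")


print(split_fields("campaign1_attribute1_whatever_yes_123421"))
```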

I finally figured it out yesterday on Stack Overflow (Spanish): https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked for a regex-only solution for Google Sheets was that I need to use it in Google Data Studio (which has the same regex functions as spreadsheets).
To get each column just use this regex extract function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing that has to be changed in the formula to switch columns is the number inside {}, which is (column number - 1).
If you do not have the final number, just don't put the last "_".
Lastly, remember to do all the calculated fields again, because (for example) it gets an error with CPC, CTR and other Adwords metrics that are calculated automatically.
Hope it helps!
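
To see how the number inside {} selects a field, the pattern can be exercised in Python. This is only a sketch: nth_field is a hypothetical helper, n is the zero-based field index (column number - 1), and Python's re module is assumed to treat this pattern the same way Data Studio does.

```python
import re
from typing import Optional


def nth_field(text: str, n: int) -> Optional[str]:
    """Return the n-th (zero-based) underscore-separated field that is
    followed by another underscore, or None when there is no such field."""
    # Skip n complete "field_" groups, then capture the next field.
    pattern = r"^(?:[^_]*_){%d}([^_]*)_" % n
    match = re.search(pattern, text)
    return match.group(1) if match else None


sample = "campaign1_attribute1_whatever_yes_123421"
for index in range(5):
    print(index, nth_field(sample, index))
```

Because the pattern ends with "_", the final number is never returned: it is not followed by another underscore, so index 4 yields no match.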

PostgreSQL - How do I extract the first occurrence of a substring in a string using a regular expression pattern?

I am trying to extract a substring from a text column using a regular expression, but in some cases, there are multiple instances of that substring in the string.
In those cases, I am finding that the query does not return the first occurrence of the substring. Does anyone know what I am doing wrong?
For example:
If I have this data:
create table data1
(full_text text, name text);
insert into data1 (full_text)
values ('I 56, donkey, moon, I 92')
I am using
UPDATE data1
SET name = substring(full_text from '%#"I ([0-9]{1,3})#"%' for '#')
and I want to get 'I 56' not 'I 92'

You can use regexp_matches() instead (the target column is name, as in the question):
update data1
set name = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1];
As no additional flag is passed, regexp_matches() only returns the first match, but it returns an array, so you need to pick the first (and only) element from the result (that's the [1] part).
It is probably a good idea to limit the update to only those rows that would match the regex in the first place:
update data1
set name = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1]
where full_text ~ 'I [0-9]{1,3}';

Try the following expression. It will return the first occurrence:
SUBSTRING(full_text, 'I [0-9]{1,3}')

You can use regexp_match() in PostgreSQL 10+:
select regexp_match('I 56, donkey, moon, I 92', 'I [0-9]{1,3}');
Quote from documentation:
In most cases regexp_matches() should be used with the g flag, since
if you only want the first match, it's easier and more efficient to
use regexp_match(). However, regexp_match() only exists in PostgreSQL
version 10 and up. When working in older versions, a common trick is
to place a regexp_matches() call in a sub-select...
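
The difference between searching for the first match and matching a pattern that must span the whole string can be shown with a small Python sketch. It is an analogy only: it assumes the leading % in the question's pattern behaves like a greedy .*, which is an assumption about the cause rather than a description of how PostgreSQL evaluates it. The function names are hypothetical.

```python
import re
from typing import Optional

PATTERN = r"I [0-9]{1,3}"


def first_match(text: str) -> Optional[str]:
    """Leftmost match, comparable to substring(text, pattern)."""
    match = re.search(PATTERN, text)
    return match.group(0) if match else None


def greedy_wrapped(text: str) -> Optional[str]:
    """Whole-string match with a greedy prefix in front of the group."""
    match = re.fullmatch(r".*(" + PATTERN + r").*", text)
    return match.group(1) if match else None


text = "I 56, donkey, moon, I 92"
print(first_match(text))     # leftmost occurrence
print(greedy_wrapped(text))  # the greedy prefix pushes the group to the last occurrence
```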

Postgres regexp_replace query usage

We have a column with a particular prefix value followed by dynamic digits, for example,
AAAA0000
AAAA0001
AAAA0002
AAAA0003
...
...
Now we want to update the prefix value from AAAA to BBBB in all the rows it exists. I have tried using regexp_replace, replace and also other possible function but without success.
Could you please help me to do this?

update table_name set the_column = 'BBBB'||substr(the_column, 5) where the_column like 'AAAA%';
Here 5 is the (1-based) starting position of the digits; with no length argument, substr() returns everything from that position to the end of the string.
So the value 'BBBB' replaces the first four characters, and the remaining substring extracted above is concatenated after it.

I don't see the need for regex here:
update the_table
set the_column = 'BBBB'||substr(the_column, 5)
where the_column like 'AAAA%';

Here's how you use regexp_replace here:
update the_table
set the_column = regexp_replace(the_column, '^AAAA', 'BBBB');
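
Both approaches, the anchored regex and the plain substring concatenation, can be compared in a short Python sketch. The function names are hypothetical, and the slice value[len(old):] plays the role of substr(the_column, 5) for a four-character prefix.

```python
import re


def swap_prefix_regex(value: str, old: str = "AAAA", new: str = "BBBB") -> str:
    """Replace `old` only when it occurs at the start of `value`."""
    # The lambda keeps `new` literal even if it contained backslashes.
    return re.sub("^" + re.escape(old), lambda _: new, value)


def swap_prefix_slice(value: str, old: str = "AAAA", new: str = "BBBB") -> str:
    """Same result without regex: check the prefix, then concatenate."""
    if value.startswith(old):
        return new + value[len(old):]
    return value


for sample in ["AAAA0000", "AAAA0001", "XAAAA0002"]:
    print(sample, swap_prefix_regex(sample), swap_prefix_slice(sample))
```

Values that do not start with the prefix are returned unchanged by both helpers, which mirrors the effect of the where the_column like 'AAAA%' filter in the SQL versions.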
