I have a field of type character varying. On insert I'd like to strip out special characters. In this particular case I'd like to strip out hyphens from a column of hyphenated strings, hyphen_field"123-456-789" from table_two and insert as "123456789" into non_hyphen_field in table_one. I'm starting with a statement of the following form:
INSERT INTO schema.table_one(var_one,var_two,non_hyphen_field)
SELECT var_one, var_two, hyphen_field
FROM schema.table_two;
What is the cleanest way to accomplish this?
On Postgres you can use replace function.
select replace('123-456-789', '-','');
| replace |
| :-------- |
| 123456789 |
dbfiddle here
Related
I want the user to only be able to enter the values in the following regex:
^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$
My problem is that words like : PP AA QQ are accepted.
I am not sure how i can prevent that ? Thank you.
Site i use to verify the expression : https://regex101.com/
In most RegExp flavors, square brackets [] denotate character classes; that is, a set of individual tokens that can be matched in a specific position.
Because P is included in this character class (along with a quantifier of {2}) PP is matched.
Instead, you seem to want a group with alternatives; for that, you'd use parenthesis () (while also eliminating the whitespace, something it doesn't appear was intentional on your part):
^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT){2}$
RegEx101
This matches things like ABBC, ABAB, NLBC, etc.
I need to match and replace string like VA123 - so two letters and 3 numbers, but this expression is not working as intended. Any idea where I am going off?
SELECT REGEXP_REPLACE ('test VA123', '^\[A-Z]{2}[0-9]{3}$', 'test')
FROM dual;
I want the output in this case to say test test
It you want white space (or start of a string) before the matched string then you can use:
SELECT REGEXP_REPLACE ('test VA123', '(^|\s)[A-Z]{2}[0-9]{3}$', '\1test')
AS replaced_value
FROM dual;
| REPLACED_VALUE |
| :------------- |
| test test |
db<>fiddle here
For example:
df.select('category').show()
+---------------------------+
| category|
+---------------------------+
| money,insurance|
| life, housework|
| game,FPS,network|
| game,fight,jump|
| hotel|
| trip,hotel|
| null|
I want to use RLIKE to write a regex expression to fuzzy match one of substrings list, ['money', 'life'].
-- This is an exact match
SELECT *
FROM tb_name
WHERE col_name RLIKE '(money|life)'
-- This is a fuzzy match
SELECT *
FROM tb_name
WHERE col_name RLIKE '*.(money|life)'
BUT there is error in ast tree in the fuzzy match code snippet.
06-11 16:59:17-fatal filter ast tree
(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TAB tb_name))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR "hdfs://XXXX/XX")) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (RLIKE (TOK_TABLE_OR_COL col_name ) '*.(money|life)')) (TOK_LIMIT 2000)))
06-11 16:59:17-fatal Filter feature: .TOK_TAB \S tdw_inter_db.*|.TOK_(CUBE|ROLLUP) .
So I can't see anything wrong with the fuzzy match code snippet.
So could anyone help me?
Thanks in advances.
'(?i)money|life' regexp will match strings containing any of money, life, case insensitive - (?i)
I have a table with a single column name_string, which contains backslash character. I wanted to remove the backslash character using regexp_replace, but it does not work.
Table:
create table t (name_string varchar(100));
insert into table t values ('\\"aaa\\"'), ('\\"bbb\\"');
Query:
select
name_string, regexp_replace(name_string, '\\"', '"')
from t;
returning
+--------------+----------+
| name_string | _c1 |
+--------------+----------+
| \"aaa\" | \"aaa\" |
| \"bbb\" | \"bbb\" |
+--------------+----------+
However, select regexp_replace('\"aaa\"', '\\"', '"') returns the correct result.
I am confused about why this may be the case. Could someone please shed light on this? Appreciate it!
Use 4 backslashes:
select regexp_replace(name_string,'\\\\"','"') from t;
Only backslash needs escaping. In Java and in regex the backslash has special meaning and needs escaping.
Maybe try:
select
name_string, regexp_replace(name_string, '\\\"', '"')
from t;
I think it's about escaping - you escape 2 characters- backslash and double quote
I have a csv file that contains 10,000s of rows. Each row has 8 columns. One of those columns contains text similar to this:
this is a row: http://somedomain.com | some_text | http://someanotherdomain.com | some_more_text
this is a row: http://yetanotherdomain.net
this is a row: https://hereisadomain.org | some_text
I'm currently accessing the data in this column this way:
for row in csv_reader:
the_url = row[3]
# this regex is used to find the hrefs
href_regex = re.findall('(?:http|ftp)s?://.*', the_url)
for link in href_regex:
print (link)
Output from the print statement:
http://somedomain.com | some_text | http://someanotherdomain.com | some_more_text
http://yetanotherdomain.net
https://hereisadomain.org | some_text
How do I obtain only the URLs?
http://somedomain.com
http://someanotherdomain.com
http://yetanotherdomain.net
https://hereisadomain.org
Just change your pattern to:
\b(?:http|ftp)s?://\S+
Instead of matching anything with .*, match any non-whitespace characters instead with \S+. You might want to add a word boundary before your non capturing group, too.
Check it live here.
Instead of repeating any character at the end
'(?:http|ftp)s?://.*'
^
repeat any character except a space, to ensure that the pattern will stop matching at the end of a URL:
'(?:http|ftp)s?://[^ ]*'
^^^^