Hive regexp_replace failed to replace backslash

I have a table with a single column, name_string, whose values contain backslash characters. I want to remove the backslashes using regexp_replace, but it does not work.
Table:
create table t (name_string varchar(100));
insert into table t values ('\\"aaa\\"'), ('\\"bbb\\"');
Query:
select
name_string, regexp_replace(name_string, '\\"', '"')
from t;
returning
+--------------+--------------+
| name_string  | _c1          |
+--------------+--------------+
| \"aaa\"      | \"aaa\"      |
| \"bbb\"      | \"bbb\"      |
+--------------+--------------+
However, select regexp_replace('\"aaa\"', '\\"', '"') returns the correct result.
I am confused about why this may be the case. Could someone please shed light on this? Appreciate it!

Use 4 backslashes:
select regexp_replace(name_string,'\\\\"','"') from t;
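Only the backslash needs escaping, but it has to be escaped twice: once for the Hive string literal and once for the Java regex engine. The four backslashes in the literal become two after the literal is parsed, and the regex engine then reads those two as one literal backslash.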
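The two escaping layers can be sketched in Python. This is only a model, not Hive itself: parse_literal is a hypothetical helper that mimics a string-literal parser in which a backslash escapes the next character, and Python's re module stands in for the Java regex engine (both treat \" as a plain double quote).

```python
import re

def parse_literal(body: str) -> str:
    """Simplified model of string-literal parsing: a backslash escapes the next character."""
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

# Raw strings hold exactly what is typed between the single quotes in the SQL.
stored = parse_literal(r'\\"aaa\\"')      # value that ends up in the table
pattern_two = parse_literal(r'\\"')       # regex text \"  : matches a bare "
pattern_four = parse_literal(r'\\\\"')    # regex text \\" : matches \ followed by "

with_two = re.sub(pattern_two, '"', stored)
with_four = re.sub(pattern_four, '"', stored)

print(stored)     # \"aaa\"
print(with_two)   # \"aaa\"  (each " is replaced by ", so nothing changes)
print(with_four)  # "aaa"
```

Under this simplified model, the three-backslash literal '\\\"' parses to the same regex text as the two-backslash one, so it would also leave the backslash in place.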

Maybe try:
select
name_string, regexp_replace(name_string, '\\\"', '"')
from t;
I think it's about escaping: you escape two characters, the backslash and the double quote.

Related

Redshift Translate command to replace characters

I need to translate commas in a column to pipes with a space on each side in Redshift ('a,b,c' becomes 'a | b | c') using TRANSLATE. Something in this statement is not giving me the desired result and I can't figure out why:
select 'a,b,c' as comma_string, translate(comma_string, ',', ' | ' ) as pipe_string
yields 'a b c' with no pipes. I'm having trouble getting the space before and after the pipe, since
select 'a,b,c' as comma_string, translate(comma_string, ',', '|' ) as pipe_string
gives me 'a|b|c'

The REPLACE function works for this. Not sure why TRANSLATE doesn't.
select 'a,b,c' as comma_string, REPLACE(comma_string, ',' ,' | ') as pipe_string
yields the desired result 'a | b | c'

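You would need to use REPLACE since TRANSLATE only maps single characters. Here the comma is paired with the first character of ' | ', a space, and the remaining characters are ignored, which is why the result is 'a b c'. From the documentation: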
TRANSLATE is similar to the REPLACE function and the REGEXP_REPLACE function, except that REPLACE substitutes one entire string with another string and REGEXP_REPLACE lets you search a string for a regular expression pattern, while TRANSLATE makes multiple single-character substitutions.
https://docs.aws.amazon.com/redshift/latest/dg/r_TRANSLATE.html
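The difference can be sketched in Python. translate_chars below is a hypothetical helper that models positional, single-character mapping in the style of TRANSLATE; it is an approximation for illustration, not Redshift's implementation (in particular, letting the first occurrence of a repeated source character win is an assumption).

```python
def translate_chars(text: str, from_chars: str, to_chars: str) -> str:
    """Positional single-character mapping, in the style of SQL TRANSLATE."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        # A source character with no counterpart in to_chars is dropped.
        mapping.setdefault(ch, to_chars[i] if i < len(to_chars) else "")
    return "".join(mapping.get(ch, ch) for ch in text)

with_translate = translate_chars("a,b,c", ",", " | ")  # ',' pairs only with ' '
with_replace = "a,b,c".replace(",", " | ")             # whole-string substitution

print(repr(with_translate))  # 'a b c'
print(repr(with_replace))    # 'a | b | c'
```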

Match a word in a list of words regex

I want the user to only be able to enter the values in the following regex:
^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$
My problem is that words like : PP AA QQ are accepted.
I am not sure how I can prevent that. Thank you.
The site I use to verify the expression: https://regex101.com/

In most regex flavors, square brackets [] denote character classes; that is, a set of individual characters, any one of which can match at a given position.
Because P is included in this character class (along with a quantifier of {2}) PP is matched.
Instead, you seem to want a group with alternatives; for that, you'd use parentheses () (while also removing the whitespace, which doesn't appear to have been intentional on your part):
^(AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT){2}$
Because of the {2} quantifier, this matches two codes in a row, such as ABBC, ABAB, or NLBC. To accept exactly one code, drop the {2}.
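A quick way to compare the three variants is a short Python check. Python's re module is used here as a stand-in for whichever regex flavor the form validation runs in, and the names CODES and accepts are made up for this sketch.

```python
import re

CODES = "AB|BC|MB|NB|NL|NS|NT|NU|ON|QC|PE|SK|YT"

# Original pattern: a character class, so any two characters from the set pass.
char_class = re.compile(r"^[AB | BC | MB | NB | NL | NS | NT | NU | ON |QC | PE | SK | YT]{2}$")
# Group with alternatives, repeated twice (the pattern from the answer).
two_codes = re.compile(rf"^({CODES}){{2}}$")
# Group with alternatives, exactly one code.
one_code = re.compile(rf"^({CODES})$")

def accepts(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None

for text in ["PP", "AB", "ABBC", "QC"]:
    print(text, accepts(char_class, text), accepts(two_codes, text), accepts(one_code, text))
```

Only the last pattern accepts AB and QC while rejecting both PP and ABBC.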

Remove special characters from string on insert?

I have a field of type character varying. On insert I'd like to strip out special characters. In this particular case I'd like to strip the hyphens from a column of hyphenated strings: take a hyphen_field value such as "123-456-789" from table_two and insert it as "123456789" into non_hyphen_field in table_one. I'm starting with a statement of the following form:
INSERT INTO schema.table_one(var_one,var_two,non_hyphen_field)
SELECT var_one, var_two, hyphen_field
FROM schema.table_two;
What is the cleanest way to accomplish this?

In Postgres you can use the replace function.
select replace('123-456-789', '-','');
| replace |
| :-------- |
| 123456789 |
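Applied to the INSERT from the question, the call goes in the SELECT list. The sketch below runs the statement through Python's sqlite3 module so that it is self-contained; SQLite is only a stand-in for Postgres here (its replace() takes the same three arguments), the schema prefix is dropped, and the sample row values 'x' and 'y' are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_two (var_one TEXT, var_two TEXT, hyphen_field TEXT);
    CREATE TABLE table_one (var_one TEXT, var_two TEXT, non_hyphen_field TEXT);
    INSERT INTO table_two VALUES ('x', 'y', '123-456-789');
""")

# Apply replace() inside the SELECT that feeds the INSERT.
conn.execute("""
    INSERT INTO table_one (var_one, var_two, non_hyphen_field)
    SELECT var_one, var_two, replace(hyphen_field, '-', '')
    FROM table_two
""")

rows = conn.execute(
    "SELECT var_one, var_two, non_hyphen_field FROM table_one"
).fetchall()
print(rows)  # [('x', 'y', '123456789')]
```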

How to use regex with cson

I want to capture the logical operators of ooRexx with a regex in a .cson file, because I want to support ooRexx syntax highlighting in the Atom editor. These are the operators I am trying to cover:
>= <= \> \< \= >< <> == \== // && || ** ¬> ¬< ¬= ¬== >> << >>= \<< ¬<< \>> ¬>> <<=
And this is the regex part in the cson file:
'match': '\\+ | - | [\\\\] | \\/ | % | \\* | \\| | & |=|¬|>|<|
>= | <= | ([\\\\]>) | ([\\\\]<) | ([\\\\]=) | >< | <> | == | ([\\\\]==) |
\\/\\/ | && | \\|\\| | \\*\\* | ¬> | ¬< | ¬= | ¬== | >> | << | >>= | ([\\\\]<<) | ¬<< |
([\\\\]>>) | ¬>> | <<='
I'm struggling with the slashes (forward and backward) and also with the double **. My knowledge of regex is very basic, to put it nicely. Is there somebody who can help me with that?

You have spaces around the pipe bars: these spaces are counted in the regular expression. So when you write something like | \*\* |, the double asterisks get caught, but only if they are surrounded by a space on each side, and not if they're affixed to a word or at the beginning/end of a line. Same issue with the slashes — I have tested it, and it does seem to catch them for me, but only as long as your slashes (or asterisks) are between two spaces.
A few other things to keep in mind:
You shouldn't need the square brackets around backslashes; they're useful to provide classes of possible characters to match. For instance, [<>]= will catch both >= and <=. Writing [\\] is equivalent to writing \\ directly because \\ counts as a single character, due to the first escaping backslash. Similarly, your parentheses here are not being used; see grouping.
Also think of using repetition operators like + and *. So \\>+ will catch both \> and \>>.
Finally, the question mark will help you avoid repetition, by marking the previous character (or group of characters, in square brackets) as optional. ==? will match both = and ==.
You can group together a LOT of your statements with these three tricks combined… I'll leave that exercise to you!
Just another hint when developing long regular expressions — use a tester like Regex101 or similar with a test file to see your changes in real time, and debuggers like Regexper will help you understand how your regular expression is parsed.
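The points above can be checked with a few lines of Python. This is only a sketch: Python's re module stands in for the regex engine Atom uses, the patterns are written at the regex level (in the single-quoted cson string each backslash is doubled, as in the question), and the names spaced, tight and checks are made up.

```python
import re

# Spaces inside a pattern are literal characters, so each alternative
# in `spaced` only matches together with its neighbouring spaces.
spaced = re.compile(r"\+ | \*\* | //")
tight = re.compile(r"\+|\*\*|//")

print(spaced.search("a**b"))                   # None
print(tight.search("a**b").group())            # **
print(repr(spaced.search("a ** b").group()))   # ' ** '

# The three tricks: a character class, a repetition operator, an optional character.
checks = {
    r"[<>]=": [">=", "<="],
    r"\\>+": ["\\>", "\\>>"],
    r"==?": ["=", "=="],
}
results = {
    pattern: [re.fullmatch(pattern, op) is not None for op in ops]
    for pattern, ops in checks.items()
}
print(results)
```

When operators are later combined into one alternation, longer ones such as >>= should be listed before their prefixes >> and >, since alternatives are tried from left to right.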

Oracle SQL Regex not returning expected results

I am using a regex that works perfectly in Java/PHP/regex testers.
\d(?:[()\s#-]*\d){3,}
Examples: https://regex101.com/r/oH6jV0/1
However, trying to use the same regex in Oracle SQL is returning no results. Take for example:
select *
from
(select column_value str from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3')))
where regexp_like(str, '\d(?:[()\s#-]*\d){3,}');
This returns no rows. Why does this act so differently? I even used a regex tester that does POSIX ERE, but that still works.

Oracle does not support non-capturing groups (?:). You will need to use a capturing group instead.
It also doesn't accept the Perl-style whitespace metacharacter \s inside a character class [] (there it matches the characters \ and s instead of whitespace). You will need to use the POSIX class [:space:] inside the brackets instead.
Query (run on SQL Fiddle, Oracle 11g R2):
select *
from (
select column_value str
from table(sys.dbms_debug_vc2coll('123','1234','12345','12 135', '1', '12 3'))
)
where regexp_like(str, '\d([()[:space:]#-]*\d){3,}')
Results:
| STR |
|--------|
| 1234 |
| 12345 |
| 12 135 |
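As a sanity check on the expected rows, the same sample values can be run through Python. This is a rough cross-check rather than a test of Oracle's engine: Python's re has no [:space:] class, so \s is used in its place, and search semantics are assumed to correspond to regexp_like.

```python
import re

samples = ["123", "1234", "12345", "12 135", "1", "12 3"]

# Capturing group instead of (?:...); \s stands in for [:space:].
pattern = re.compile(r"\d([()\s#-]*\d){3,}")

# Like regexp_like, search() succeeds if the pattern matches anywhere in the string.
matched = [s for s in samples if pattern.search(s)]
print(matched)  # ['1234', '12345', '12 135']
```

Only values with at least four digits qualify, which agrees with the three rows returned above.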
