Regular expression exclusion in PostgreSQL - regex

I have to split some string in PostgreSQL on ',' but not on '\,' (backslash is escape character).
For example, regexp_split_to_array('123,45,67\,89', ???) must split the string to array {123, 45, "67\,89"}.
What I have done already: E'(?<!3),' works with '3' as the escape character. But how can I use the backslash instead of 3?
Does not work:
E'(?<!\),' does not split the string at all
E'(?<!\\),' throws error "parentheses () not balanced"
E'(?<!\ ),' (with space) splits on all ',' including '\,'
E'(?<!\\ ),' (with space) splits on all ',' too.

The letter E in front of the literal marks it as a C-style escape string, so you must escape twice: once for the string literal and once for the regexp.
Try with and without E:
regexp_split_to_array('123,45,67\,89', '(?<!\\),')
regexp_split_to_array('123,45,67\,89', E'(?<!\\\\),')
Here is a running example, http://rextester.com/VEE84838 (unnest() is just for row-by-row display of the results):
select unnest(regexp_split_to_array('123,45,67\,89', '(?<!\\),'));
select unnest(regexp_split_to_array('123,45,67\,89', E'(?<!\\\\),'));
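The same two levels of escaping can be illustrated outside PostgreSQL. The following is a minimal sketch in Python, not PostgreSQL code: a raw string plays the role of the plain '...' literal and an ordinary string plays the role of E'...'. The variable names are made up for the illustration.

```python
import re

# The data contains a literal backslash before the last comma.
text = r"123,45,67\,89"

# Raw string: no string-level escaping, so the regex engine
# receives (?<!\\), and reads \\ as one literal backslash.
parts_raw = re.split(r"(?<!\\),", text)

# Ordinary string: each backslash must be doubled once more,
# because the string parser consumes one level of escaping first.
parts_escaped = re.split("(?<!\\\\),", text)

print(parts_raw)
print(parts_escaped == parts_raw)
```

In both calls the regex engine ends up with the same pattern; only the number of backslashes typed in the source differs.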

You can also split it into groups first:
(\d+),(\d+\,\d+)?
(and later concatenate them with a comma)

Related

REGEX_TOO_COMPLEX error when parsing regex expression

I need to split a CSV file at commas, but the problem is that the file can contain commas inside fields. For example:
one,two,tree,"four,five","six,seven".
It uses double quotes to escape, but I could not solve it.
I tried to use something like this with this regex, but I got an error: REGEX_TOO_COMPLEX.
data: lv_sep type string,
lv_rep_pat type string.
data(lv_row) = iv_row.
"Define a separator to replace commas in double quotes
lv_sep = cl_abap_conv_in_ce=>uccpi( uccp = 10 ).
concatenate '$1$2' lv_sep into lv_rep_pat.
"replace all commas that are separator with the new separator
replace all occurrences of regex '(?:"((?:""|[^"]+)+)"|([^,]*))(?:,|$)' in lv_row with lv_rep_pat.
split lv_row at lv_sep into table rt_cells.
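You must use this regex instead: ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
It matches a comma only when an even number of double quotes follows it up to the end of the line, that is, only when the comma is outside a quoted field.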
DATA: lv_sep TYPE string,
lv_rep_pat TYPE string.
DATA(lv_row) = 'one,two,tree,"four,five","six,seven"'.
"Define a separator to replace commas in double quotes
lv_sep = cl_abap_conv_in_ce=>uccpi( uccp = 10 ).
CONCATENATE '$1$2' lv_sep INTO lv_rep_pat.
"replace all commas that are separator with the new separator
REPLACE ALL OCCURRENCES OF REGEX ',(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)' IN lv_row WITH lv_rep_pat.
SPLIT lv_row AT lv_sep INTO TABLE data(rt_cells).
LOOP AT rt_cells into data(cells).
WRITE cells.
SKIP.
ENDLOOP.
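To see what the lookahead does independently of ABAP, here is a small sketch in Python using the same pattern (without the backslashes before the quotes, which Python does not need). It is only an illustration of the regex, not a replacement for the ABAP code; the names QUOTE_AWARE_COMMA, row and cells are chosen for the example.

```python
import re

# A comma qualifies as a separator only if the rest of the line
# contains an even number of double quotes (i.e. we are outside quotes).
QUOTE_AWARE_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

row = 'one,two,tree,"four,five","six,seven"'
cells = QUOTE_AWARE_COMMA.split(row)

for cell in cells:
    print(cell)
```

Note that the lookahead rescans the remainder of the line for every comma, so on very long lines this approach can become slow.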
I have never touched ABAP, so please treat this as pseudocode.
I'd recommend using a non-regex solution here:
data: checkedOffsetComma type i,
checkedOffsetQuotes type i,
baseOffset type i,
testString type string value 'val1, "val2, val21", val3'.
LOOP AT SomeFancyConditionYouDefine.
checkedOffsetComma = baseOffset.
checkedOffsetQuotes = baseOffset.
find FIRST OCCURRENCE OF ','(or end of line here) in testString match OFFSET checkedOffsetComma.
write checkedOffsetComma.
find FIRST OCCURRENCE OF '"' in testString match OFFSET checkedOffsetQuotes.
write checkedOffsetQuotes.
*if the next comma is closer than the next quotes
IF checkedOffsetComma < checkedOffsetQuotes.
REPLACE SECTION checkedOffsetComma 1 OF ',' WITH lv_rep_pat.
baseOffset = checkedOffsetComma.
ELSE.
*if we found quotes, we go to the next quotes afterwards and then continue as before after that position
find FIRST OCCURRENCE OF '"' in testString match OFFSET checkedOffsetQuotes.
write baseOffset.
ENDIF.
ENDLOOP.
This assumes that there are no quotes in quotes thingies. Didn't test, didn't validate in any way. I'd be happy if this at least partly compiles :)
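The idea behind the pseudocode (walk through the string and only treat a comma as a separator while outside quotes) can be sketched in Python as follows. This is an illustration of the approach, not a translation of the ABAP above; the function name split_outside_quotes is hypothetical, and like the pseudocode it assumes there are no escaped quotes inside quoted fields.

```python
def split_outside_quotes(row, sep=",", quote='"'):
    """Split row at sep, ignoring separators inside quoted sections."""
    cells = []
    current = []
    in_quotes = False
    for ch in row:
        if ch == quote:
            # Entering or leaving a quoted section; keep the quote itself.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A real separator: close the current cell.
            cells.append("".join(current))
            current = []
        else:
            current.append(ch)
    cells.append("".join(current))  # last cell has no trailing separator
    return cells


cells = split_outside_quotes('val1, "val2, val21", val3')
print(cells)
```

The scan touches each character once, so it avoids the repeated rescanning that a lookahead-based regex needs.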

Hive regexp_replace

My use case is the following:
String text_string: "text1:message1,text3:message3,text2:message2,..."
select regexp_replace(text_string, '[^:]*:([^,]*(,|$))', '$1')
Correct output: message1,message3,message2,...
The pattern works, but the problem is that if there is a ":" or "," character in the message, the replace doesn't work.
So I tried to use "::" and ",," characters as a separators in the string
String text_string: "text1::message1,,text3::message3,,text2::message2,..."
select regexp_replace(text_string, '[^::]*::([^,,]*(,,|$))', '$1')
Correct output: message1,,message3,,message2,,...
but also in this case, if there is one ":" or "," character in the string (in the text or in the message) the replace command doesn't work.
How should the regular expression be modified to work?
Delimiters cannot be characters that are likely to be in the data. Since you have control over it, use pipes '|' or tildes '~' maybe. Only you can come up with the right characters by analyzing the data.
If you can't do that, then you'll need to put quotes around the data that contains the delimiter character and come up with a way to deal with that.
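As a rough illustration of the first suggestion, the sketch below uses Python rather than Hive and assumes '~' separates the text from the message and '|' separates the pairs. Both choices, and the sample string, are made up for the example. It also helps to know that [^::] is a character class and means exactly the same as [^:], which is why doubling the delimiter did not help.

```python
import re

# Hypothetical data: '~' between text and message, '|' between pairs.
# ':' and ',' may now appear freely inside the messages.
text = "text1~mess:age1|text3~message,3|text2~message2"

# Drop everything up to and including '~', keep the message and
# the pair delimiter (or the end of the string) that follows it.
messages = re.sub(r"[^~|]*~([^|]*(?:\||$))", r"\1", text)

print(messages)
```

Python writes the backreference as \1 where Hive's regexp_replace uses $1; the pattern itself follows the same shape as the one in the question.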

Replace a sequence of characters with a sequence of different characters of same length using regular expressions

I have a string which starts with spaces. I want to replace the leading spaces with equal number of dashes -. I don't want to replace any other spaces which may occur elsewhere in the string.
If I use /^\s*/-/, it replaces all the leading spaces with a single dash. If I use /^\s/-/, it replaces only the first space with a dash. If I remove the anchor and use /\s/-/, it replaces every occurrence of a space in the string, which is not acceptable.
My string looks like this in general:
<n-leading-spaces><a-non-space-character><remaining-characters>
Example (pipes added to show the boundary):
|   ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn |
After substitution (pipes added to show the boundary):
|---ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn |
NOTE: I cannot use any code snippet. I just want to know whether this can be done using just regex patterns. (Forgive my formatting as I'm new to markdown. I welcome formatting corrections)
You can use the following solution to replace a sequence of characters with a sequence of different characters of same length using regular expressions:
my $string = '    ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn ';
$string =~ s/^(\s+)/"-" x length($1)/eg;
print $string;
Returns '----ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn '
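The same technique (compute the replacement from the length of the match) can be sketched in Python, where the replacement argument of re.sub may be a function. Like the Perl version, this relies on code in the replacement rather than on a pure pattern; the function name dash_leading_spaces is made up for the example.

```python
import re


def dash_leading_spaces(s):
    """Replace each leading whitespace character with a dash."""
    # The lambda receives the match object for the leading run
    # and returns as many dashes as the run has characters.
    return re.sub(r"^\s+", lambda m: "-" * len(m.group(0)), s)


result = dash_leading_spaces("    ajfn ssfdjn ng jnv sjfj%nv sjfj n s ;sn ")
print(result)
```

Spaces elsewhere in the string are untouched because the pattern is anchored at the start.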

Regex Find/Replace char on a line before a specific word

I hope this is the right place to ask this question.
I am preparing a script to import to a database using notepad++.
I have a huge file that has rows like that:
(10496, '69055-230', 'Rua', '5', 'Manaus', 'Parque 10 de Novembro',
'AM'),
INSERT INTO dne id, cep, tp_logradouro, logradouro, cidade,
bairro, uf VALUES
Is there a way, using FIND/REPLACE, to replace the ',' with ';' at the end of every line that comes before an INSERT statement?
I am not sure how to match the end of the line before a specific word.
The result would be
(10496, '69055-230', 'Rua', '5', 'Manaus', 'Parque 10 de Novembro',
'AM');
INSERT INTO dne id, cep, tp_logradouro, logradouro, cidade,
bairro, uf VALUES
Find what: ,(?=\s*INSERT)
Replace with: ;
Description
, matches a literal comma
(?=\s*INSERT) is a lookahead that will assert for (but won't consume)
\s* any number of white spaces (including newlines)
INSERT as literal
If you also want to replace any commas before the end of the file, use
,(?=\h*\R\h*INSERT|\s*\z)
Note both expressions would fail if you have another instance of a comma followed by INSERT that shouldn't be replaced, but in that case you should specify it in the question.
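To check the first expression outside the editor, here is a small sketch in Python with a shortened, made-up version of the sample data. Only the first pattern is used, since Python's re module does not support \h, \R or \z.

```python
import re

# Shortened sample: a VALUES row ending in ',' followed by the next INSERT.
sql = (
    "(10496, '69055-230', 'Rua', '5', 'Manaus', 'Parque 10 de Novembro',\n"
    "'AM'),\n"
    "INSERT INTO dne id, cep, tp_logradouro VALUES\n"
)

# Replace only the comma that is followed by optional whitespace and INSERT.
fixed = re.sub(r",(?=\s*INSERT)", ";", sql)

print(fixed)
```

Commas inside the row and inside the column list are left alone because none of them is followed directly by whitespace and the word INSERT.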
You don't even need a regular expression for that.
Select Extended in Search Mode
Replace ,\nINSERT INTO with ;\nINSERT INTO
This matches , at the end of a line just before INSERT INTO at the beginning of the next line. Keep in mind that \n will match only in a Linux/Unix/Mac OS X file. For Windows use \r\n, for Mac OS Classic \r (reference).
Using Sublime Text or Notepad++, press Ctrl+H and replace all ")INSERT," by ");INSERT"
I expect that the INSERT statements will all have the form:
INSERT INTO table col1, col2, col3, ...
VALUES (val1, val2, val3, ...),
^^ what you want to replace
Assuming that the only place where ), appears is the end of the VALUES line, you can just do the following replacement:
Find: \),$
Replace: );
You can do this replacement with the regex option enabled.

Detect \ using regex in R [duplicate]

I'm writing strings which contain backslashes (\) to a file:
x1 = "\\str"
x2 = "\\\str"
# Error: '\s' is an unrecognized escape in character string starting "\\\s"
x2="\\\\str"
write(file = 'test', c(x1, x2))
When I open the file named test, I see this:
\str
\\str
If I want to get a string containing 5 backslashes, should I write 10 backslashes, like this?
x = "\\\\\\\\\\str"
[...] If I want to get a string containing 5 \ ,should i write 10 \ [...]
Yes, you should. To write a single \ in a string, you write it as "\\".
This is because the \ is a special character, reserved to escape the character that follows it. (Perhaps you recognize \n as newline.) It's also useful if you want to write a string containing a single ". You write it as "\"".
The reason why \\\str is invalid, is because it's interpreted as \\ (which corresponds to a single \) followed by \s, which is not valid, since "escaped s" has no meaning.
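The doubling rule is not specific to R. As a hedged illustration in Python, whose string literals follow a similar convention, the sketch below counts the backslashes that actually end up in the string. One difference to keep in mind: Python only warns about an unrecognized escape such as \s, whereas R rejects it as shown above.

```python
# Ten backslashes typed in the source give five in the actual string.
x = "\\\\\\\\\\str"

print(x)              # what would be written to a file
print(len(x))         # five backslashes plus the three letters
print(x.count("\\"))  # number of real backslash characters

# A raw string skips the escape processing, so no doubling is needed.
raw = r"\\\\\str"
print(raw == x)
```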
Have a read of this section about character vectors.
In essence, it says that when you enter character string literals you enclose them in a pair of quotes (" or '). Inside those quotes, you can create special characters using \ as an escape character.
For example, \n denotes a new line, and \" can be used to enter a " without R thinking it's the end of the string. Since \ is an escape character, you need a way to enter an actual \. This is done by using \\. Escaping the escape!
Note that the doubling of backslashes is because you are entering the string at the command line and the string is first parsed by the R parser. You can enter strings in different ways, some of which don't need the doubling. For example:
> tmp <- scan(what='')
1: \\\\\str
2:
Read 1 item
> print(tmp)
[1] "\\\\\\\\\\str"
> cat(tmp, '\n')
\\\\\str
>