Regular expression to match strings in "" - regex

I have this regex: \"([^"]*)\"
and I run it on this data: """Storno ISP""- ""Nesprávne nastavená modulácia KZ (G.DMT/G.992.1B), potrebné nastaviť adsl2+ (G.992.5B)""" "Fast" "Battery" "JNAKA".
I would like to match only "Fast" "Battery" "JNAKA".
Where am I wrong?

You can require that there is no double quote immediately on either side of the match:
(?<!")"([^"]+)"(?!")
Details
(?<!") - no " immediately on the left is allowed
" - a " char
([^"]+) - Group 1: one or more chars other than "
" - a " char
(?!") - no " immediately on the right is allowed.
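As a quick check, the pattern can be run against the sample data from the question. This is a minimal sketch in Python (an assumption, since the question does not name a language); the variable names are illustrative only.

```python
import re

# Sample data from the question, split over two literals for readability.
data = ('"""Storno ISP""- ""Nesprávne nastavená modulácia KZ (G.DMT/G.992.1B), '
        'potrebné nastaviť adsl2+ (G.992.5B)""" "Fast" "Battery" "JNAKA".')

# A quoted string is accepted only if no other quote touches it on the outside.
pattern = re.compile(r'(?<!")"([^"]+)"(?!")')

# findall returns the contents of Group 1 for every match.
matches = pattern.findall(data)
print(matches)  # ['Fast', 'Battery', 'JNAKA']
```

The doubled and tripled quotes are rejected either by the lookbehind (a quote on the left) or because [^"]+ needs at least one non-quote character.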

Related

Regex expression to remove only two first quotes in every line of text

I need to remove only the first two quotes in every line:
"key0": "some very long text",
"key1": "some very long text",
into
key0: "some very long text",
key1: "some very long text",
I tried [^"]{2} "{2}
You can use
Find What: ^([^"]*)"([^"]*)"
Or, if some lines may contain no quotes, also exclude line breaks so that the match cannot spill over onto the following line:
Find What: ^([^"\r\n]*)"([^"\r\n]*)"
Replace With: $1$2
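The same find-and-replace can be sketched in Python, assuming the text is held in a single string. Note two differences from the editor syntax above: Python writes backreferences as \1 and \2 rather than $1 and $2, and ^ only matches at each line start when re.MULTILINE is set.

```python
import re

text = '"key0": "some very long text",\n"key1": "some very long text",'

# Group 1: text before the first quote; Group 2: text between the first two quotes.
# Excluding \r and \n keeps each match on a single line.
pattern = re.compile(r'^([^"\r\n]*)"([^"\r\n]*)"', re.MULTILINE)

# Keep both groups, drop the two quotes that surrounded Group 2.
result = pattern.sub(r'\1\2', text)
print(result)
```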
Details:
^ - start of a line
([^"]*) - Group 1 ($1): zero or more chars other than a " char ([^"\r\n]* matches any zero or more chars other than ", CR and LF)
" - a " char
([^"]*) - Group 2 ($2): zero or more chars other than a " char
" - a " char

How to ignore a semicolon if it's between quotation marks in regex?

I want to write a regex for splitting some protocol. I have (up to) 4 sections, divided by ;. My problem is that if a ; is between quotation marks, I want to ignore it. How do I do that?
I don't want my groups to terminate when the ; is between quotation marks.
This is what I've got so far -
((?:.)+?;)
example input -
_-₪* #,##0.00_-;-₪* #,##0.00_-;" ; asd"_-₪* "-"??_-;_-#_-
should return
group1 - _-₪* #,##0.00_-;
group2 - -₪* #,##0.00_-;
group3 - " ; asd"_-₪* "-"??_-;
group4 - _-#_-
Thanks
You can use
(?:"[^"]*"|[^;])+
Details
(?: - start of a non-capturing group:
" - a " char
[^"]* - any zero or more (*) chars other than a " char ([^...] is a negated character class)
" - a " char
| - or
[^;] - any char other than a ; char
)+ - end of the non-capturing group, repeat one or more times (+).
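To see the pattern at work on the example input, here is a minimal Python sketch (the language is an assumption; the question does not name one). Note that each match excludes the trailing ; separator, unlike the expected output listed in the question.

```python
import re

# Example input from the question.
fmt = '_-₪* #,##0.00_-;-₪* #,##0.00_-;" ; asd"_-₪* "-"??_-;_-#_-'

# Each match is a run of quoted strings and non-semicolon characters,
# so a ; inside "..." does not end a section.
sections = re.findall(r'(?:"[^"]*"|[^;])+', fmt)

for number, section in enumerate(sections, start=1):
    print(f"group{number} - {section}")
```

The quoted alternative is listed first so that an opening " is always tried as the start of a quoted string before it can be matched as an ordinary character.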

Match everything between "****" or [****]

I have a regex that looks like this:
(?="(test)"\s*:\s*(".*?"|\[.*?]))
to match the value between "..." or [...]
Input
"test":"value0"
"test":["value1", "value2"]
Output
Group1 Group2
test value0
test "value1", "value2" // or - value1", "value2
Is there any trick to ignore "" and [] and stick with two groups, group1 and group2?
I tried (?="(test)"\s*:\s*(?="(.*?)"|\[(.*?)])) but this gives me 4 groups, which is not good for me.
You may use this regex in PHP with a branch reset group:
"(test)"\h*:\h*(?|"([^"]*)"|\[([^]]*)])
This gives you 2 capture groups for both inputs, whether the value is enclosed in "..." or [...].
RegEx Details:
(?|..) is a branch reset group. Capture groups declared within each alternative of this construct start over from the same index.
(?|"([^"]*)"|\[([^]]*)]) is an alternation inside a branch reset group, not an if-then-else conditional: either "([^"]*)" or \[([^]]*)] matches, and in both cases the captured value lands in Group 2.
You can use a pattern like
"(test)"\s*:\s*\K(?|"\K([^"]*)|\[\K([^]]*))
Details:
" - a " char
(test) - Group 1: test word
" - a " char
\s*:\s* - a colon enclosed with zero or more whitespaces
\K - match reset operator that discards the text matched so far from the overall match (captured group values are kept intact)
(?|"\K([^"]*)|\[\K([^]]*)) - a branch reset group:
"\K([^"]*) - matches a ", then discards it, and then captures into Group 2 zero or more chars other than "
| - or
\[\K([^]]*) - matches a [, then discards it, and then captures into Group 2 zero or more chars other than ]
In Java, \K and (?|...) are not supported, so use separate capturing groups and check which one matched:
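import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String s = "\"test\":[\"value1\", \"value2\"]";
        // Group 2 captures a "..." value, Group 3 captures a [...] value
        Pattern pattern = Pattern.compile("\"(test)\"\\s*:\\s*(?:\"([^\"]*)|\\[([^\\]]*))");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println("Key: " + matcher.group(1));
            // Only one of the two value groups participates in a match
            if (matcher.group(2) != null) {
                System.out.println("Value: " + matcher.group(2));
            } else {
                System.out.println("Value: " + matcher.group(3));
            }
        }
    }
}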
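Python's built-in re module likewise has neither \K nor branch reset groups, so the same fallback applies there. The following is a minimal sketch, not part of the original answers; the helper name extract is hypothetical, and the pattern also matches the closing " or ] so that the delimiters are checked on both sides.

```python
import re

# Group 2 holds a "..." value, group 3 holds a [...] value; only one participates.
PATTERN = re.compile(r'"(test)"\s*:\s*(?:"([^"]*)"|\[([^\]]*)\])')


def extract(line):
    """Return (key, value) for a matching line, or None if there is no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Pick whichever value group took part in the match.
    value = m.group(2) if m.group(2) is not None else m.group(3)
    return m.group(1), value


pairs = [extract('"test":"value0"'), extract('"test":["value1", "value2"]')]
print(pairs)
```

Collapsing the two value groups in one helper gives callers the two-field result the question asks for, even though the regex itself has three groups.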

R Regex number followed by punctuation followed by space

Suppose I had a string like so:
x <- "i2: 32390. 2093.32: "
How would I return a vector that would give me the positions of where a number is followed by a : or a . followed by a space?
So for this string it would be
"2: ","0. ","2: "
The regex you need is just '\\d[\\.:]\\s'. Using stringr's str_extract_all to quickly extract matches:
library(stringr)
str_extract_all("i2: 32390. 2093.32: ", '\\d[\\.:]\\s')
produces
[[1]]
[1] "2: " "0. " "2: "
The same pattern also works with R's built-in functions, for example regmatches(x, gregexpr('\\d[\\.:]\\s', x)).
What it matches:
\\d matches a digit
[ ... ] defines a character class, which matches any one of the listed characters
\\. matches a period
: matches a colon
\\s matches a whitespace character.
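Since the question also asks for the positions of the matches, here is a small sketch of how both the matched text and the start offsets can be collected. It is written in Python rather than R (a swapped-in language for illustration), so the pattern uses single backslashes in a raw string and the offsets are 0-based, whereas R positions are 1-based.

```python
import re

x = "i2: 32390. 2093.32: "

# A digit, then a period or colon, then one whitespace character.
pattern = re.compile(r'\d[.:]\s')

matches = pattern.findall(x)
# 0-based start offset of each match within x.
starts = [m.start() for m in pattern.finditer(x)]

print(matches)  # ['2: ', '0. ', '2: ']
print(starts)   # [1, 8, 17]
```

The "3.3" inside 2093.32 is skipped because the period there is followed by a digit, not by whitespace.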

Regex on numbers and spaces

I'm trying to match numbers surrounded by spaces, like this string:
" 1 2 3 "
I'm puzzled why the regex \s[0-9]\s matches 1 and 3 but not 2. Why does this happen?
Because the space has already been consumed:
\s[0-9]\s
This matches "spacedigitspace", so let's go through the process:
" 1 2 3 "
^
|
No match
" 1 2 3 "
 ^
 |
 No match
" 1 2 3 "
  ^
  |
  Matches, consume " 1 "
"2 3 "
^
|
No match
"2 3 "
 ^
 |
 No match
"2 3 "
  ^
  |
  No match
"2 3 "
   ^
   |
   Matches, consume " 3 "
You want a lookaround:
(?<=\s)\d(?=\s)
This is very different, as it looks for \d and then asserts that it is preceded by, and followed by, a space. These assertions are "zero width", which means that the spaces aren't consumed by the engine.
More precisely, the regex \s[0-9]\s does not match 2 only when you go through all matches in the string " 1 2 3 " one by one. If you were to try to start matching at positions 1 or 2, " 2 " would be matched.
The reason for this is that \s consumes part of the input, namely the spaces around the digit. When you match " 1 ", the space between 1 and 2 is already taken; the regex engine is now looking at the tail of the string, which is "2 3 ". At this point, there is no space in front of 2 that the engine could consume, so it goes straight to finding " 3 ".
To fix this, put spaces into zero-length look-arounds, like this:
(?<=\s)[0-9](?=\s)
Now the engine ensures that there are spaces in front and behind the digit without consuming these spaces as part of the match. This lets the engine treat the space between 1 and 2 as a space behind 1 and also as a space in front of 2, thus returning both matches.
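The difference between the consuming pattern and the look-around version can be checked directly. This is a minimal Python sketch (the language is an assumption, as the question does not name one); it also shows that starting the search at offset 2 lets the original pattern find " 2 ".

```python
import re

s = " 1 2 3 "

# Consuming version: the space after 1 is used up, so 2 is skipped.
consuming = re.findall(r'\s[0-9]\s', s)

# Look-around version: spaces are only asserted, never consumed.
lookaround = re.findall(r'(?<=\s)[0-9](?=\s)', s)

# Starting the search at offset 2 (the space before 2) does find " 2 ".
from_pos_2 = re.compile(r'\s[0-9]\s').search(s, 2).group()

print(consuming)   # [' 1 ', ' 3 ']
print(lookaround)  # ['1', '2', '3']
print(repr(from_pos_2))
```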
The trailing space is consumed by the first match, so the next match cannot reuse it. You can use a lookahead to fix this:
\s+\d+(?=\s+)
The expression \s[0-9]\s matches " 1 " and " 3 ". As the space after the 1 is already part of the first match, it can't also be used to match " 2 ".
You can use a positive lookbehind and a positive lookahead to match digits that are surrounded by spaces:
(?<= )(\d+)(?= )
Demo: https://regex101.com/r/hT1dT6/1