I have a string of text:
\n new"test \n aaaa" \n ta \n `this is a \n newline that should be kept`
My goal is to match all \n's outside of backticks (`), double quotes ("), or single quotes ('). Based on another question (https://stackoverflow.com/a/48953880/14465957), I switched the positive lookahead to a negative one, which now matches all newlines outside of double quotes ("). However, it stops working when I also try to ignore single quotes and backticks.
What am I doing wrong?
Working quotes:
https://regex101.com/r/ooqz5d/1/
If you're using PCRE, you can use a control verb to skip everything inside of a quote closure:
(['"`]).*?\1(*SKIP)(*F)|\\n
(['"`]) any type of quote, put it in group 1
.*? any characters, non greedy
\1 the same quote character that was captured in group 1
(*SKIP)(*F) skip the current match, which is a quote closure
|\\n match a \n
See the test cases
Also, if you need to ignore escaped quotes (\", \', etc.), you may try
(['"`])(?:(?<!\\)\\(?:\\\\)*\1|(?!\1).)*\1(*SKIP)(*F)|\\n
Check the test cases
Using JavaScript
JavaScript does not support control verbs, but you can capture each quoted span in a group and put it back in the replacement, so that only the \n outside the quotes is removed.
Regex
((['"`])[\s\S]*?\2)|\\n
Substitution
$1
const regex = /((['"`])[\s\S]*?\2)|\\n/g;
const text = String.raw`\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n', text);
const result = text.replace(regex, '$1');
console.log('after\n', result);
Real line breaks
const regex = /((['"`])[\s\S]*?\2)|\n/g;
const text = `\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n----\n', text);
const result = text.replace(regex, '$1');
console.log('after\n----\n', result);
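If the goal is to locate the unquoted \n sequences rather than remove them, the same alternation can be reused with matchAll: a match whose group 1 is undefined came from the \\n branch. A minimal sketch, assuming literal backslash-n sequences as in the first snippet; the helper name unquotedNewlineIndexes is made up for illustration:

```javascript
// Hypothetical helper: returns the start index of every literal "\n"
// (backslash followed by n) that is not inside a quoted span.
function unquotedNewlineIndexes(text) {
  const regex = /((['"`])[\s\S]*?\2)|\\n/g;
  const indexes = [];
  for (const match of text.matchAll(regex)) {
    // Group 1 is set only when the quoted alternative matched.
    if (match[1] === undefined) {
      indexes.push(match.index);
    }
  }
  return indexes;
}

const sample = String.raw`\nnew"test\naaaa"\nta`;
console.log(unquotedNewlineIndexes(sample));
```

The quoted spans are still consumed by the first alternative, which is what keeps the \n inside them from being reported.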
Use
text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1')
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
JavaScript code:
const text = String.raw`\nnew"test\naaaa\\\n"\nta\n\`this is a \nnewline that should be kept\`\n'this is a \nnew test'\n`
console.log(text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1'))
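To see why the escape-aware pattern matters, the sketch below runs it next to the simpler back-reference pattern on an input that contains an escaped quote. The input string and variable names are made up for illustration:

```javascript
const simple = /((['"`])[\s\S]*?\2)|\\n/g;
const escapeAware = /("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g;

// The quoted part contains an escaped quote (\") before the inner \n.
const input = String.raw`a\n"x \" \n y"\nb`;

// The simple pattern treats the escaped quote as the closing quote,
// so the \n inside the quoted span is removed as well.
const simpleResult = input.replace(simple, '$1');

// The escape-aware pattern steps over \" and keeps the inner \n.
const awareResult = input.replace(escapeAware, '$1');

console.log('simple      :', simpleResult);
console.log('escape-aware:', awareResult);
```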
I am trying to capture words on the right side of this regex expression that are not captured on the left.
In the code below, the left side captures "17 inch" in this string: "this 235/45R17 is a 17 inch tyre"
(?<=([-.0-9]+(\s)(inches|inch)))|???????
However, anything I put on the right side, such as a simple +w, interferes with the left side.
How can I tell the regex to capture any word, unless it is a digit followed by inch, in which case it should capture both 17 and inch?
Description
((?:(?![0-9.-]+\s*inch(?:es)?).)+)|([0-9.-]+\s*inch(?:es)?)
Example
Live Demo
https://regex101.com/r/fY9jU5/2
Sample text
this 235/45R17 is a 17 inch tyre
Sample Matches
Capture group 1 holds the text that is not part of an inch measurement
Capture group 2 holds the inch measurement (the number together with its unit)
MATCH 1
1. [0-20] `this 235/45R17 is a `
MATCH 2
2. [20-27] `17 inch`
MATCH 3
1. [27-32] ` tyre`
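The same matches can be reproduced in JavaScript. This is only a sketch: the kind labels and variable names are made up, and it checks which capture group is defined to tell the two alternatives apart.

```javascript
const inchRegex = /((?:(?![0-9.-]+\s*inch(?:es)?).)+)|([0-9.-]+\s*inch(?:es)?)/g;
const sampleText = 'this 235/45R17 is a 17 inch tyre';

// Label each match by the capture group that produced it.
const parts = [...sampleText.matchAll(inchRegex)].map((m) =>
  m[2] !== undefined
    ? { kind: 'inches', text: m[2] }
    : { kind: 'other', text: m[1] }
);

console.log(parts);
```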
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture (1 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
[0-9.-]+ any character of: '0' to '9', '.',
'-' (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ")
(0 or more times (matching the most
amount possible))
----------------------------------------------------------------------
inch 'inch'
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount
possible)):
----------------------------------------------------------------------
es 'es'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)+ end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[0-9.-]+ any character of: '0' to '9', '.', '-'
(1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
inch 'inch'
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
es 'es'
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
I have a string that looks like this:
5 Secs ( 14.2725% ) 60 Secs ( 12.630% ) 300 Secs ( 15.5993% )
Using (\d{2}[.]\d{3}), I can match the values I want, but I need only value 1 in one query, value 2 in another, and value 3 in a third. This is part of a monitoring system, so it has to be done with a single regex; I don't have access to other shell tools that would make this easy.
Description
^(?:[^(]*\([^)]*\)){2}[^(]*\(\s*\K[0-9]{2}[.][0-9]{3}
                   ^^^
The number in {2} can be changed on demand: with {N}, the expression selects the (N+1)th percent value in the string. With {0} it matches the first percent value, with {1} the second, and so on.
This expression assumes the regex engine supports \K, which makes the engine drop everything it has matched up to that point from the reported match.
Example
Live Demo
https://regex101.com/r/oX0mJ4/1
Sample text
5 Secs ( 14.2725% ) 60 Secs ( 12.630% ) 300 Secs ( 15.5993% )
Sample Matches
As written, the expression returns the third entry.
15.599
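For engines without \K (JavaScript, for example), a capture group can stand in for it. The sketch below is not the expression above but a variant under that assumption; the helper name nthPercent and its zero-based n parameter are hypothetical:

```javascript
// Hypothetical helper: n is zero-based (0 = first percent value).
// A capture group replaces \K, which JavaScript does not support.
function nthPercent(text, n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError('n must be a non-negative integer');
  }
  // Skip n parenthesised groups, then capture the value in the next one.
  const pattern = String.raw`^(?:[^(]*\([^)]*\)){${n}}[^(]*\(\s*([0-9]{2}[.][0-9]{3})`;
  const match = new RegExp(pattern).exec(text);
  return match ? match[1] : null;
}

const line = '5 Secs ( 14.2725% ) 60 Secs ( 12.630% ) 300 Secs ( 15.5993% )';
console.log(nthPercent(line, 0), nthPercent(line, 1), nthPercent(line, 2));
```

The value is read from group 1 instead of the whole match, so this only helps where the monitoring tool lets you pick a capture group.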
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
(?: group, but do not capture (2 times):
----------------------------------------------------------------------
[^(]* any character except: '(' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
[^)]* any character except: ')' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
){2} end of grouping
----------------------------------------------------------------------
[^(]* any character except: '(' (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
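\K reset the start of the reported match (everything matched before this point is dropped from the result)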
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
[.] any character of: '.'
----------------------------------------------------------------------
[0-9]{3} any character of: '0' to '9' (3 times)
----------------------------------------------------------------------
I have the string below (one long string).
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1111","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1112","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2223"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=22223&Locale=en_US","selector":"ExecutionEnd"},
Please note:
The string is on one line, and you may notice that the content of each {} pair is very similar.
I can only use a regex and cannot split the string with any function.
I want to use a regular expression to extract each entry that contains CustomerID, with the minimum complete length. For example, I want to extract the entries below.
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"}
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2223"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=22223&Locale=en_US","selector":"ExecutionEnd"},
But I'm not sure how to do this. I tried many times with zero-width assertions but still cannot figure it out. Could you please enlighten me? Thanks!
Foreword
I don't recommend using Regex to parse JSON because of all the possible edge cases. But it appears you have some control over the data and can therefore limit the edge cases.
Description
Based on your source text, this regex will do the following:
Finds all the JSON entries that have a CustomerID field nested inside the params object and embedded inside the href string
Validates that the CustomerID values in params and href are identical
Works with both compressed and expanded JSON
Avoids some obvious edge cases that the regex police complain about
Note: I ran this regex with the case-insensitive flag.
\{(?=(?:"[^"]*"|[^{}"]*|\{[^{}]*})*?"params":\{(?:"[^"]*"|[^{}"]*|\{[^{}]*})*?"CustomerID":"([^"]*)")(?=(?:"[^"]*"|[^{}"]*|\{[^{}]*})*?"href":"[^"]*&CustomerID=\1)(?:"[^"]*"|[^{}"]*|\{[^{}]*})*}
Examples
Source Text
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1111","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"}
,{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01"},"id":"1112","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&Locale=en_US","selector":"ExecutionEnd"},{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2223"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=22223&Locale=en_US","selector":"ExecutionEnd"},
{"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"44444"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=44444&Locale=en_US","selector":"ExecutionEnd"},
Matches
[0][0] = {"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"2222"},"id":"1234","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=2222&Locale=en_US","selector":"ExecutionEnd"}
[0][1] = 2222
[1][0] = {"type":"Execution","typeValue":"Custom","targetValue":"_self","params":{"_report":"reportname","hyperlinkInput":"2","Organization":"orgid","As_Of_Date":"2016-04-01","CustomerID":"44444"},"id":"1235","href":"href?hyperlinkInput=2&Organization=orgid&As_Of_Date=2016-04-01&CustomerID=44444&Locale=en_US","selector":"ExecutionEnd"}
[1][1] = 44444
Explained
Capture Groups
group 0 gets the entire matching JSON block
group 1 gets the value associated with CustomerID
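As a rough illustration of how the capture groups can be read, here is a JavaScript sketch that runs the expression over a shortened, made-up version of the source text (fewer fields per entry). The closing braces are escaped as \} for portability; otherwise the pattern is unchanged.

```javascript
const entryRegex = /\{(?=(?:"[^"]*"|[^{}"]*|\{[^{}]*\})*?"params":\{(?:"[^"]*"|[^{}"]*|\{[^{}]*\})*?"CustomerID":"([^"]*)")(?=(?:"[^"]*"|[^{}"]*|\{[^{}]*\})*?"href":"[^"]*&CustomerID=\1)(?:"[^"]*"|[^{}"]*|\{[^{}]*\})*\}/gi;

// Shortened sample entries (assumption: same shape as the source text).
const entries = [
  '{"id":"1111","params":{"Organization":"orgid"},"href":"href?Organization=orgid"}',
  '{"id":"1234","params":{"Organization":"orgid","CustomerID":"2222"},"href":"href?Organization=orgid&CustomerID=2222&Locale=en_US"}',
  '{"id":"1235","params":{"Organization":"orgid","CustomerID":"2223"},"href":"href?Organization=orgid&CustomerID=22223&Locale=en_US"}',
];
const source = entries.join(',');

const matches = [...source.matchAll(entryRegex)];
for (const m of matches) {
  // m[0] is the whole JSON block, m[1] the CustomerID taken from params.
  console.log(m[1], '=>', m[0]);
}
```

Only the second entry is expected to match: the first has no CustomerID, and in the third the params value 2223 does not agree with the href value 22223.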
Expanded
NODE EXPLANATION
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
"params": '"params":'
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
"CustomerID":" '"CustomerID":"'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most
amount possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)*? end of grouping
----------------------------------------------------------------------
"href":" '"href":"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
&CustomerID= '&CustomerID='
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
[^{}"]* any character except: '{', '}', '"' (0
or more times (matching the most amount
possible))
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
\{ '{'
----------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
} '}'
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
} '}'
I am trying to write a regular expression for a valid SharePoint folder name, which must satisfy these conditions:
Cannot begin or end with a dot,
Cannot contain consecutive dots and
Cannot contain any of the following characters: ~ " # % & * : < > ? / \ { | }.
I wrote a regex for the 1st and 3rd points:
[^\.]([^~ " # % & * : < > ? / \ { | }]+) [^\.]$
and for the second, (?!.*\.\.).*)$, but they are not working properly, and I need to integrate them into one expression.
Please help.
What about just
^\w(?:\w+\.?)*\w+$
I made a small test here
EDIT
This also works
^\w(?:\w\.?)*\w+$
How about:
/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/
explanation:
The regular expression:
(?-imsx:^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[~"#%&*:<>?/\\{|}]+ any character of: '~', '"', '#', '%',
'&', '*', ':', '<', '>', '?', '/', '\\',
'{', '|', '}' (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In action (perl script):
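use strict;
use warnings;
use feature 'say';

# Accept names that do not start or end with a dot, contain no ".."
# and contain none of the forbidden characters.
my $re = qr/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/;
while (<DATA>) {
    chomp;
    say /$re/ ? "OK : $_" : "KO : $_";
}
__DATA__
.abc
abc.
a..b
abc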
output:
KO : .abc
KO : abc.
KO : a..b
OK : abc
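The same check can be sketched in JavaScript with the identical expression. The function name isValidFolderName and the two extra sample names (a#b, my.folder) are made up for illustration:

```javascript
const folderNameRegex = /^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/;

// Hypothetical helper: true when the name passes all three rules.
function isValidFolderName(name) {
  return folderNameRegex.test(name);
}

for (const name of ['.abc', 'abc.', 'a..b', 'abc', 'a#b', 'my.folder']) {
  console.log(isValidFolderName(name) ? 'OK' : 'KO', ':', name);
}
```

The regex has no g flag, so test() keeps no state between calls and the helper can be reused safely.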