Regex to remove all instance of letter outside of quotes - regex

I have a string of text:
\n new"test \n aaaa" \n ta \n `this is a \n newline that should be kept`
My goal is to match all \n's outside of backticks (`), quotes ("), or single quotes ('). Based off another question (https://stackoverflow.com/a/48953880/14465957), I switched the positive lookahead used to a negative one, which now matches all newlines outside of quotes ("). However, it doesn't work when I attempted to ignore single and back ticks.
What am I doing wrong?
Working quotes:
https://regex101.com/r/ooqz5d/1/

If you're using PCRE, you can use a control verb to skip everything inside of a quote closure:
(['"`]).*?\1(*SKIP)(*F)|\\n
(['"`]) any type of quote, put it in group 1
.*? any characters, non greedy
\1 the quote that captured in group 1
(*SKIP)(*F) skip the current match, which is a quote closure
|\\n match a \n
See the test cases
Also, if you need to ignore escaped quotes(\", \' etc), you may try
(['"`])(?:(?<!\\)\\(?:\\\\)*\1|(?!\1).)*\1(*SKIP)(*F)|\\n
Check the test cases
Using JavaScript
For JavaScript, you can't use control verbs. But you can use group capture to replace outbound \n
Regex
((['"`])[\s\S]*?\2)|\\n
Substitution
$1
const regex = /((['"`])[\s\S]*?\2)|\\n/g;
const text = String.raw`\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n', text);
const result = text.replace(regex, '$1');
console.log('after\n', result);
Real line breaks
const regex = /((['"`])[\s\S]*?\2)|\n/g;
const text = `\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n----\n', text);
const result = text.replace(regex, '$1');
console.log('after\n----\n', result);

Use
text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1')
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
JavaScript code:
const text = String.raw`\nnew"test\naaaa\\\n"\nta\n\`this is a \nnewline that should be kept\`\n'this is a \nnew test'\n`
console.log(text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1'))

Related

VSCode regex replace using regex group

I want to replace map["someKey"] as double by (map["someKey"] as num).toDouble(), but I just can't get the regex working correctly. Can someone who knows more about regex tell me what to put in Find and Replace in VSCode?
I've tried ([a-zA-Z0-9]*) as double and replaced this by ($1 as num).toDouble(), but this doesn't work.
Use
(\w+\["[^"]+"\]) as double
See regex proof. Replace with ($1 as num).toDouble().
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"]+ any character except: '"' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
"\] '"]'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
as double ' as double'

Regex function to match a specific sequence of characters

I'm looking for a regex that would be True only in case that:
it starts with ${ (1 time, $ can only be there one time)
after that any characters or nothing until } is found
This would match:
${
${test
test${test
${$fdsf$
${test}
This would not match:
$${
$${test
${test}test
Hopefully it's clear :)
Is that possible ?
Does that work for you?
(?<!\$)\$\{[^}]*}?$
https://regex101.com/r/g0vhJo/2
Also use
([^$\n]|^)\$\{[^}\n]*}?$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^$\n] any character except: '$', '\n'
(newline)
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\$ '$'
--------------------------------------------------------------------------------
\{ '{'
--------------------------------------------------------------------------------
[^}\n]* any character except: '}', '\n' (newline)
(0 or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
}? '}' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Regex help - match one string but not another

I have been using this:
~^\/student-accommodation\/(?:[^\/]+?)\/([^\/]+)\/$
to match for URLs like
/student-accommodation/manchester/ropemaker-court-manchester/
But now I need to edit this regex so it also matches for URLs like the below. All these new URLs will follow the same pattern and add a string that starts with #utm-source. Importantly they won't have another / in them.
/student-accommodation/manchester/ropemaker-court-manchester/#utm_source=afs&utm_medium=email&utm_campaign=ropemakercourt_afs_dec20
But then I don't want the regex to match for URLs like the below:
/student-accommodation/manchester/ropemaker-court-manchester/en-suite/
Can anyone help? I am a novice at regex! Thanks
Use
^\/student-accommodation\/[^\/]+\/([^\/]+)\/(?:#utm_source.*)?$
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
student- 'student-accommodation'
accommodation
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
#utm_source '#utm_source'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Sort Email:Pass List by first occurrence of each email

I want to sort a email:password by first occurrence of each email.
Example list:
email#example.com:passsword1
email#example.com:passsword2
email#example.com:passsword3
email1#example.com:passsword1
email1#example.com:passsword2
email1#example.com:passsword2
So only
email#example.com:passsword1
email1#example.com:passsword1
should be kept as result.
With my limited Regex skills I worked out this one but I guess I misunderstand something:
^(.*)(\r?\n\1)+(?=:)
Use
^((.*:).*)(?:\r?\n\2.*)+
See proof, use g and m flags.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\r? '\r' (carriage return) (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
\2 what was matched by capture \2
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)+ end of grouping

Split the string using tags using RegEx

I need help in splitting the below multi tagged string with the tags like <eyn> and <un> and <an>
Your colleague <eyn id='test#test.com'>user</eyn> is now communicating with <un id='test#test.com'>user</un> from <an id='4442729'>test, Inc.</an>
Doing it with Regex
It's not advisable to use a regex to parse HTML due to all the possible obscure edge cases that can crop up, but it seems that you have some control over the HTML so you should able to avoid many of the edge cases the regex police cry about.
Proposed Solution
I'd probably want to collect the entire tag, the ID value, and the raw text between the open and close tags all in one action.
This Regex
<(eyn|un|an)\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\bid=('[^']*'|"[^"]*"|[^'"\s>]*))(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?\/?>(.*?)<\/\w+>
** To see the image better, simply right click the image and select view in new window
Will do the following
find all the eyn, un, an tags
requires the tag to have an ID attribute
allow the ID attribute value to be unquoted or surrounded by ' or "
avoids difficult edge cases that makes pattern matching in HTML difficult
Creates the following capture groups
group 0 the entire tag from open to close
group 1 the tag name
group 2 the ID value
group 3 the raw text inside between the open and close tags
Examples
See also Live demo
Sample Text
Note the difficult edge cases nested inside the second block of text.
Your colleague <eyn id='test#test.com'>user</eyn> is now communicating with <un id='test#test.com'>user</un> from <an id='4442729'>test, Inc.</an>
Your colleague <eyn onmouseover=' if ( 3 > a ) { var
string=" <eyn id=NotTheDroidYouAreLookingFor>R2D2</eyn>; "; } '
id='DesiredDroids'>This is the droid I'm looking for</eyn> is now communicating with <un id="test#test.com">user</un> from <an id=4442729>test, Inc.</an>
Sample Matches
Match 1
Full match 15-49 `<eyn id='test#test.com'>user</eyn>`
Group 1. 16-19 `eyn`
Group 2. 23-38 `'test#test.com'`
Group 3. 39-43 `user`
Match 2
Full match 76-108 `<un id='test#test.com'>user</un>`
Group 1. 77-79 `un`
Group 2. 83-98 `'test#test.com'`
Group 3. 99-103 `user`
Match 3
Full match 114-146 `<an id='4442729'>test, Inc.</an>`
Group 1. 115-117 `an`
Group 2. 121-130 `'4442729'`
Group 3. 131-141 `test, Inc.`
Match 4
Full match 163-326 `<eyn onmouseover=' if ( 3 > a ) { var
string=" <eyn id=NotTheDroidYouAreLookingFor>R2D2</eyn>; "; } '
id='DesiredDroids'>This is the droid I'm looking for</eyn>`
Group 1. 164-167 `eyn`
Group 2. 271-286 `'DesiredDroids'`
Group 3. 287-320 `This is the droid I'm looking for`
Match 5
Full match 353-385 `<un id="test#test.com">user</un>`
Group 1. 354-356 `un`
Group 2. 360-375 `"test#test.com"`
Group 3. 376-380 `user`
Match 6
Full match 391-421 `<an id=4442729>test, Inc.</an>`
Group 1. 392-394 `an`
Group 2. 398-411 `4442729`
Group 3. 406-416 `test, Inc.`
Explained
NODE EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
eyn 'eyn'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
un 'un'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
an 'an'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the least amount
possible)):
--------------------------------------------------------------------------------
[^>=] any character except: '>', '='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=' '=\''
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=" '="'
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
[^'"] any character except: ''', '"'
--------------------------------------------------------------------------------
[^\s>]* any character except: whitespace (\n,
\r, \t, \f, and " "), '>' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)*? end of grouping
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
id= 'id='
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[^'"\s>]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and " "),
'>' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^>=] any character except: '>', '='
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=' '=\''
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
=" '="'
--------------------------------------------------------------------------------
[^"]* any character except: '"' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
[^'"\s]* any character except: ''', '"',
whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
\s? whitespace (\n, \r, \t, \f, and " ")
(optional (matching the most amount
possible))
--------------------------------------------------------------------------------
\/? '/' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
> '>'