Regex split by space except strings inside single quotation - regex

I have the following string
This is some testing dbo.GetPersonData(this_.PersonId,'Date contract recieved from the client') cs.dbo.Person 'this is test'
I want a generic solution that produces the following result:
This
is
some
testing
dbo.GetPersonData(this_.PersonId,
'Date contract recieved from the client'
)
cs.dbo.Person
'this is test'
What I am trying to achieve here is to split on spaces, except for anything between single quotes.
I have tried using
'(.*?)'|\S+
However, it is ignoring: <'Date contract recieved from the client'>

I have tried using
'(.*?)'|\S+
Your attempt was not so bad; we just have to exclude the quote from the second alternative:
'.*?'|[^\s']+
regex101

\S matches any non-whitespace character, and that includes '.
Use
'([^']*)'|[^\s']+
Or, if you have PCRE:
(?|'([^']*)'|([^\s']+))
See proof.
Explanation
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[^\s']+ any character except: whitespace (\n, \r,
\t, \f, and " "), ''' (1 or more times
(matching the most amount possible))
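As a rough illustration of the pattern above, here is a minimal JavaScript sketch that collects the tokens with match() instead of splitting. The helper name splitKeepingQuotes is made up for this example; the capture group is dropped because only the whole match is used.

```javascript
// Tokenize on whitespace, but keep single-quoted segments intact.
// splitKeepingQuotes is a hypothetical helper name used for illustration.
function splitKeepingQuotes(input) {
  // Alternative 1: a complete single-quoted segment.
  // Alternative 2: a run of characters that are neither whitespace nor a quote.
  const tokenPattern = /'[^']*'|[^\s']+/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return input.match(tokenPattern) || [];
}

const sample =
  "This is some testing dbo.GetPersonData(this_.PersonId,'Date contract recieved from the client') cs.dbo.Person 'this is test'";

for (const token of splitKeepingQuotes(sample)) {
  console.log(token);
}
```

Because the quote is excluded from the second alternative, the token before a quoted segment ends right at the opening quote, which is what produces the separate ")" token in the desired output.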

You should use the following template:
(\n.*?)*)\n\s*\n

Related

Regex for matching predefined rules for italic text formatting

I'm trying to write a regex for matching user input that will be turned into italic format using markdown.
In the string I need to find the following pattern: an asterisk followed by any kind of non-whitespace character and ending with any kind of non-whitespace character followed by an asterisk.
So basically: substring *substring substring substring* substring should spit out *substring substring substring*.
So far I have only come up with /\*(?:(?!\*).)+\*/, which matches everything between two asterisks, but it doesn't take into consideration whether the substring between the asterisks starts or ends with whitespace, which it shouldn't.
Thank you for your input! :)
Use
\*(?![*\s])(?:[^*]*[^*\s])?\*
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
\* '*'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
[*\s] any character of: '*', whitespace (\n,
\r, \t, \f, and " ")
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^*]* any character except: '*' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[^*\s] any character except: '*', whitespace
(\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\* '*'
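A small JavaScript sketch of how this pattern might be used, assuming the goal is to list every italic span in a string. The function name findItalicSpans is invented for the example.

```javascript
// Collect every *...* span whose content neither starts nor ends with whitespace.
// findItalicSpans is a hypothetical helper name used for illustration.
function findItalicSpans(text) {
  const italicPattern = /\*(?![*\s])(?:[^*]*[^*\s])?\*/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(italicPattern) || [];
}

console.log(findItalicSpans("substring *substring substring substring* substring"));
console.log(findItalicSpans("* leading space*")); // rejected by the lookahead
console.log(findItalicSpans("*trailing space *")); // rejected by the final [^*\s]
```

The lookahead guards the first character after the opening asterisk, and the [^*\s] before the closing asterisk guards the last one.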

Regex to remove all instances of a letter outside of quotes

I have a string of text:
\n new"test \n aaaa" \n ta \n `this is a \n newline that should be kept`
My goal is to match all \n's outside of backticks (`), quotes ("), or single quotes ('). Based on another question (https://stackoverflow.com/a/48953880/14465957), I switched the positive lookahead to a negative one, which now matches all newlines outside of double quotes ("). However, it doesn't work when I try to also ignore single quotes and backticks.
What am I doing wrong?
Working quotes:
https://regex101.com/r/ooqz5d/1/
If you're using PCRE, you can use a control verb to skip everything inside of a quote closure:
(['"`]).*?\1(*SKIP)(*F)|\\n
(['"`]) any type of quote, put it in group 1
.*? any characters, non greedy
\1 the quote that captured in group 1
(*SKIP)(*F) skip the current match, which is a quote closure
|\\n match a \n
See the test cases
Also, if you need to ignore escaped quotes (\", \', etc.), you may try
(['"`])(?:(?<!\\)\\(?:\\\\)*\1|(?!\1).)*\1(*SKIP)(*F)|\\n
Check the test cases
Using JavaScript
JavaScript does not support control verbs, but you can capture the quoted sections in a group and put them back in the replacement, so that only the \n outside quotes is removed.
Regex
((['"`])[\s\S]*?\2)|\\n
Substitution
$1
const regex = /((['"`])[\s\S]*?\2)|\\n/g;
const text = String.raw`\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n', text);
const result = text.replace(regex, '$1');
console.log('after\n', result);
Real line breaks
const regex = /((['"`])[\s\S]*?\2)|\n/g;
const text = `\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n----\n', text);
const result = text.replace(regex, '$1');
console.log('after\n----\n', result);
Use
text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1')
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
JavaScript code:
const text = String.raw`\nnew"test\naaaa\\\n"\nta\n\`this is a \nnewline that should be kept\`\n'this is a \nnew test'\n`
console.log(text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1'))

Using Regex match for sentences that do not contain a specific word

I'm trying to develop a regular expression to extract sentences that don't contain specific words. Here is a simple example:
Input:
Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.
Desired Output:
Fracture present.
Attempt #1
Regex:
[^.]*(?!cervi(c|x))[^.]*\.
Actual Output:
Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.
Attempt #2:
Regex:
[^.]*[^(cervi(c|x))][^.]*\.
Actual Output:
Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.
You can verify these results at https://regexr.com/
Use
(?<![^.])\s*((?:(?!cervi[cx])[^.])*\.)
See proof
Explanation
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
[^.] any character except: '.'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
cervi 'cervi'
--------------------------------------------------------------------------------
[cx] any character of: 'c', 'x'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[^.] any character except: '.'
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
) end of \1
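For reference, a minimal JavaScript sketch of the same pattern, assuming an engine with lookbehind support (recent Node.js and most current browsers). The function name sentencesWithoutCervix is made up for this example; the sentence itself is read from capture group 1 so that the leading whitespace is left out.

```javascript
// Return the sentences that do not contain "cervic" or "cervix".
// sentencesWithoutCervix is a hypothetical helper name used for illustration.
function sentencesWithoutCervix(text) {
  // (?<![^.]) only allows a match to start at the beginning of the text or right after a period.
  // (?:(?!cervi[cx])[^.])* consumes the sentence one character at a time and
  // stops as soon as the forbidden word would begin.
  const pattern = /(?<![^.])\s*((?:(?!cervi[cx])[^.])*\.)/g;
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const report =
  "Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.";

console.log(sentencesWithoutCervix(report));
```

Note that the match is case-sensitive as written, so "Cervix" at the start of a sentence would not be excluded unless the i flag is added.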

Need help stopping at parenthesis

I'm new to regex and really need some help here. I'm trying to create a regex that finds everything before the first space and open parenthesis in the example below. Basically, I'm trying to keep just the country name or names and exclude everything after.
Falkland Islands (Malvinas)
I tried this but it isn't working:
(\w+)(?=[\s+(\w+\s+])
Use
^.*?(?=\s*\()
See proof here.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\( '('
--------------------------------------------------------------------------------
) end of look-ahead
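A short JavaScript sketch of this pattern in use. The function name countryName is invented for the example, and returning the input unchanged when there is no parenthesis is an assumption, not something stated in the question.

```javascript
// Keep only the part of a label that precedes " (".
// countryName is a hypothetical helper name used for illustration.
function countryName(label) {
  // Lazy .*? grows one character at a time until the lookahead sees
  // optional whitespace followed by an opening parenthesis.
  const match = label.match(/^.*?(?=\s*\()/);
  // Assumption: labels without a parenthesis are returned unchanged.
  return match ? match[0] : label;
}

console.log(countryName("Falkland Islands (Malvinas)"));
console.log(countryName("France"));
```

Since the whitespace sits inside the lookahead, it is checked but not included in the match, so no trimming is needed afterwards.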

a regex for cleaning quotes between quotes

I'm trying to write a regex that removes double quotes inside the double-quoted value of a shortcode attribute.
I wrote this regex
\="(.*?)\"
and it matches the string between quotes http://regex101.com/r/jW0uC4
But when I have attribute value that also contains double quotes it fails http://regex101.com/r/pL9bI0
So, how can I improve the regex so that it captures only the string between =" and the last "?
Thanks in advance
This regex matches the sample text you provided:
/="(.*?)"(?=\s*(?:[a-z]+=|]))/
Explanation:
=" '="'
( group and capture to \1:
.*? any character except \n (0 or more times
(matching the least amount possible))
) end of \1
" '"'
(?= look ahead to see if there is:
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
(?: group, but do not capture:
[a-z]+ any character of: 'a' to 'z' (1 or
more times (matching the most amount
possible))
= '='
| OR
] ']'
) end of grouping
) end of look-ahead
But user errors are hard to fix and this regex may not work in all cases (for example if text contains an = character). You should make sure user input is escaped properly.
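To make the idea concrete, here is a minimal JavaScript sketch that uses this regex to strip the inner quotes. The original sample text is only available behind the regex101 links, so the shortcode below and the function name cleanInnerQuotes are invented for illustration.

```javascript
// Remove double quotes that appear inside a shortcode attribute value.
// cleanInnerQuotes is a hypothetical helper name used for illustration.
function cleanInnerQuotes(shortcode) {
  // A closing quote only counts when it is followed by the next
  // attribute (name=) or by the closing bracket of the shortcode.
  const attrPattern = /="(.*?)"(?=\s*(?:[a-z]+=|]))/g;
  return shortcode.replace(
    attrPattern,
    (_, value) => '="' + value.replace(/"/g, "") + '"'
  );
}

// Hypothetical input, not the sample from the question.
const shortcode = '[quote author="John "Johnny" Doe" title="Hello"]';

console.log(cleanInnerQuotes(shortcode));
```

The same caveat applies here: a value that itself contains something like word= right after a quote will still end the match too early.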