A regex for cleaning quotes between quotes

I'm trying to write a regex that removes double quotes nested inside the double-quoted value of a shortcode attribute.
I wrote this regex:
\="(.*?)\"
It matches the string between the quotes: http://regex101.com/r/jW0uC4
But it fails when the attribute value itself contains double quotes: http://regex101.com/r/pL9bI0
So how can I improve the regex so that it captures everything between =" and the last " of the value?
Thanks in advance.

This regex matches the sample text you provided:
/="(.*?)"(?=\s*(?:[a-z]+=|]))/
Explanation:
="          '="'
(           group and capture to \1:
  .*?       any character except \n (0 or more times (matching the least amount possible))
)           end of \1
"           '"'
(?=         look ahead to see if there is:
  \s*       whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
  (?:       group, but do not capture:
    [a-z]+  any character of: 'a' to 'z' (1 or more times (matching the most amount possible))
    =       '='
   |        OR
    ]       ']'
  )         end of grouping
)           end of look-ahead
But user errors are hard to fix, and this regex may not work in all cases (for example, if the value contains an inner quote followed by a word and an = character, or by a ]). You should make sure user input is escaped properly.
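As a rough illustration, the pattern above can be used to strip the inner quotes in Python. This is only a sketch: the sample shortcode and the helper name strip_inner_quotes are made up for the example and are not from the question.

```python
import re

# Pattern from the answer above: the closing quote must be followed by
# another attribute (name=) or by the closing bracket of the shortcode.
ATTR_VALUE = re.compile(r'="(.*?)"(?=\s*(?:[a-z]+=|]))')

def strip_inner_quotes(shortcode: str) -> str:
    """Remove double quotes found inside each attribute value (hypothetical helper)."""
    return ATTR_VALUE.sub(
        lambda m: '="' + m.group(1).replace('"', '') + '"',
        shortcode,
    )

# Hypothetical input: the value of "text" contains stray double quotes.
sample = '[button text="Click "here" now" url="http://example.com"]'
print(strip_inner_quotes(sample))
# [button text="Click here now" url="http://example.com"]
```

The outer quotes are re-added by the replacement function, so only the quotes captured inside group 1 are dropped.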

Related

Regex for matching predefined rules for italic text formatting

I'm trying to write a regex for matching user input that will be turned into italic format using markdown.
In the string I need to find the following pattern: an asterisk followed by a non-whitespace character, and ending with a non-whitespace character followed by an asterisk.
So basically: substring *substring substring substring* substring should spit out *substring substring substring*.
So far I have only come up with /\*(?:(?!\*).)+\*/, which matches everything between two asterisks but doesn't take into account whether the substring between the asterisks starts or ends with whitespace, which it shouldn't.
Thank you for your input! :)
Use
\*(?![*\s])(?:[^*]*[^*\s])?\*
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
\* '*'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
[*\s] any character of: '*', whitespace (\n,
\r, \t, \f, and " ")
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^*]* any character except: '*' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
[^*\s] any character except: '*', whitespace
(\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
\* '*'
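A quick way to check the pattern is to run it against the string from the question plus one input that should be rejected. The sketch below uses Python's re module; the second test string is an assumption added for illustration.

```python
import re

# Pattern from the answer above.
ITALIC = re.compile(r'\*(?![*\s])(?:[^*]*[^*\s])?\*')

text = "substring *substring substring substring* substring"
print(ITALIC.findall(text))
# ['*substring substring substring*']

# Whitespace right after the opening asterisk: the look-ahead rejects it.
print(ITALIC.findall("* not italic *"))
# []
```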

How do I use regex to add text in between a set of quotes?

I have a regular expression to match a string in my configuration file.
/\s+apiEndpoint:\n\s+''/gm
This regex matches the following field in my JavaScript file.
apiEndpoint:
'';
How do I extend this regex so that it inserts the text https://localhost:6000 between the pair of single quotes?
apiEndpoint:
'https://localhost:6000';
Use this to add:
(\s+apiEndpoint:\n\s+)''
Use this to update or add:
(\s+apiEndpoint:\n\s+)'[^']*'
Replace with $1'https://localhost:6000'.
See proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
apiEndpoint: 'apiEndpoint:'
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
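'' '\'\'' (first pattern: an empty pair of single quotes)
--------------------------------------------------------------------------------
[^']* any character except: ''' (0 or more times
(matching the most amount possible)); in the second pattern this sits between the two quotes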
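For comparison, here is a minimal sketch of the same replacement in Python. The config string is a made-up stand-in for the file in the question, and Python writes the group reference as \g<1> where the answer uses $1.

```python
import re

# "Update or add" pattern from the answer above.
PATTERN = re.compile(r"(\s+apiEndpoint:\n\s+)'[^']*'")

# Hypothetical file content, shaped like the snippet in the question.
config = "  apiEndpoint:\n    '';\n"

# \g<1> puts back the captured "apiEndpoint:" prefix and its whitespace.
updated = PATTERN.sub(r"\g<1>'https://localhost:6000'", config)
print(updated)
```

Because the pattern accepts any content between the quotes, running the same substitution on the updated text overwrites the existing URL instead of failing to match.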
You can match the content between the single quotes using a negated character class, which starts with [^:
\s+apiEndpoint:\n\s+'[^\s']*'
The pattern matches:
\s+apiEndpoint:\n\s+ Match 1+ whitespace chars, apiEndpoint:, a newline and 1+ whitespace chars
' Match '
[^\s']* Match 0+ times any char except a whitespace char or '
' Match closing '
Or, if you want to allow whitespace chars and escaped \' in between:
\s+apiEndpoint:\n\s+'[^'\\]*(?:\\.[^'\\]*)*'
This pattern matches:
\s+apiEndpoint:\n\s+ Match 1+ whitespace chars, apiEndpoint:, a newline and 1+ whitespace chars
' Match '
[^'\\]* Match 0+ times any char except ' or \
(?:\\.[^'\\]*)* Optionally repeat matching \ followed by any char, and again 0+ times any char except ' or \
' Match closing '
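To see how the quoted-string part of that last pattern treats an escaped quote, it can be tried on its own. This is a small sketch in Python; the sample line is invented for the example.

```python
import re

# Quoted-string part of the pattern above: a single-quoted value that may
# contain backslash escapes such as \'.
QUOTED = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'")

# Hypothetical input: the first value contains an escaped single quote.
line = r"'it\'s here' and 'other'"
print(QUOTED.findall(line))
```

The escaped quote stays inside the first match because \\. consumes the backslash together with the character after it.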
As a regex you can use:
/apiEndpoint:'https?:\/\/\w+(:[0-9]*)?(\.\w+)?'/gm
To insert the text you can use a template literal, e.g. with baseUrl = 'https://localhost:6000':
`${baseUrl}`
Note:
You can check your regex here: https://regex101.com/

Insert carriage return at first instance of character in Notepad++

I have a text file with hundreds of lines. Each line contains the information below:
software.cisco.com , Added by IT, ZZ 6584
What I am trying to do is insert a carriage return where the first comma is. I'm able to do this with search/replace using the \n expression. The problem is that it inserts a carriage return twice, leaving me with 3 lines. I am trying to insert a carriage return at the first comma only and keep the rest of the line.
Before:
software.cisco.com , Added by IT, ZZ 6584
After:
software.cisco.com
#Added by IT, ZZ 6584
Use
^(.*?),\s*
Replacement: $1\n#.
See proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
, ','
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
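The same search and replace can be sketched outside Notepad++, for instance in Python. The second input line is made up, and re.MULTILINE is used on the assumption that ^ should match at the start of every line, as it does when replacing line by line in an editor.

```python
import re

# Pattern from the answer above; MULTILINE lets ^ match at each line start.
FIRST_COMMA = re.compile(r'^(.*?),\s*', re.MULTILINE)

text = (
    "software.cisco.com , Added by IT, ZZ 6584\n"
    "example.org , Added by HR, AB 1234"  # hypothetical second line
)

# \1 keeps the text before the first comma, then a newline and '#'.
result = FIRST_COMMA.sub(r'\1\n#', text)
print(result)
```

Only the first comma of each line is replaced, because the match is anchored with ^. Note that the space before the comma is part of group 1, so it stays at the end of the first output line; a variant such as ^(.*?)\s*,\s* would drop it.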

How can I only match a leading space followed by a non numeric in Regex

I need some assistance, I have been at this for hours now. I am not winning.
I need to match a space only if it's followed by a non-numeric character (which I will replace with an empty string to remove it from the string).
I have tried ^[^\s+]+\D and it works to some extent.
If I have the string " JLABCD-1 836397-BTD56517", it correctly returns the string without the leading space, which is what I want: "JLABCD-1 836397-BTD56517"
If I have " BefhMS JLZARL-1 836397-BTD56517", it returns "JLZARL-1 836397-BTD56517".
But if there is no space before the first word, I want it to ignore all other spaces.
If I have "_JLABCD-1 836397-BTD56517", I want it to return "JLABCD-1 836397-BTD56517" or the original string as it is, not "836397-BTD56517", which is what I am getting at the moment.
Is this possible with Regex?
Use a look ahead:
"^ +(?=\D)"
but it seems you just want to match any leading spaces. If so, just use:
"^ +"
The negated (due to its first character being ^) character class [^\s+] in your regex matches anything not whitespace or a +.
Use
^\s+(\D)
Replace with $1, which is a backreference to the capturing group (\D), or with \1 if $1 does not work.
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\D non-digits (all but 0-9)
--------------------------------------------------------------------------------
) end of \1
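The two patterns suggested in the answers above can be compared side by side. The sketch below uses Python, where the backreference is written \1; the function names and the third sample string are assumptions made for the example.

```python
import re

# Both patterns come from the answers above.
LOOKAHEAD = re.compile(r'^ +(?=\D)')  # matches only the leading spaces
CAPTURE = re.compile(r'^\s+(\D)')     # also consumes the non-digit, restored via \1

def strip_with_lookahead(s: str) -> str:
    return LOOKAHEAD.sub('', s)

def strip_with_capture(s: str) -> str:
    return CAPTURE.sub(r'\1', s)

samples = [
    " JLABCD-1 836397-BTD56517",  # leading space before a letter: removed
    "_JLABCD-1 836397-BTD56517",  # no leading space: left unchanged
    " 836397-BTD56517",           # leading space before a digit: left unchanged
]
for s in samples:
    print(repr(strip_with_lookahead(s)), repr(strip_with_capture(s)))
```

One thing to keep in mind: a space is itself a non-digit, so with two or more leading spaces before a digit both patterns backtrack and still remove all but the last space.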

Match only the first occurrence of a phrase

I have the following JSON:
{"field1": "someText",
"field2": "Text Again",
"field3": "Text Again"}
I need to match the first occurrence of any phrase starting with a capital letter (such as "Text Again", for example).
I have written the following:
("[A-Za-z]+\s[A-Za-z]+")
It works fine when tested with https://regex101.com/, for instance. However, it does not seem to work correctly when used with ReplaceTextWithMapping (Apache NiFi). Is the regex incorrect?
Thank you for your help.
Description
:\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"
This regular expression does the following:
finds the first title-case string on the value side of what appears to be a JSON-encoded string
ensures each word is capitalized
returns the value inside the quotes as capture group 1
Example
Live Demo
https://regex101.com/r/eO0xW6/1
Source String
{"field1": "someText",
"field2": "Text again",
"field3": "Text Again"}
First Match
Text Again
Explanation
Summary
:\s*" validates that we're only checking the value side of the JSON
\s* matches any spaces after the opening quote if they exist
(?=[A-Z]) ensure the first character in the string is uppercase
(?![^"]*?\s[a-z]) looks for any spaces that are followed by a lower case character. If found then this isn't a match
([A-Za-z\s]+) captures all the characters inside the quote
" matches the quote
Detailed
NODE EXPLANATION
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
[^"]*? any character except: '"' (0 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[A-Za-z\s]+ any character of: 'A' to 'Z', 'a' to
'z', whitespace (\n, \r, \t, \f, and "
") (1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
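The behaviour described above can be reproduced with the source string from the example. This is a sketch using Python's re module rather than NiFi, so it only checks the regex itself, not how ReplaceTextWithMapping applies it.

```python
import re

# Pattern from the answer above.
TITLE_VALUE = re.compile(r':\s*"\s*(?=[A-Z])(?![^"]*?\s[a-z])([A-Za-z\s]+)"')

source = (
    '{"field1": "someText",\n'
    '"field2": "Text again",\n'
    '"field3": "Text Again"}'
)

match = TITLE_VALUE.search(source)
print(match.group(1))
# Text Again
```

"someText" is skipped because it does not start with an uppercase letter, and "Text again" is skipped because the negative look-ahead finds a space followed by a lowercase letter.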
I have posted my findings on the issue to the Apache NiFi mailing list:
http://apache-nifi-developer-list.39713.n7.nabble.com/Issues-with-Regex-used-with-ReplaceTextWithMapping-where-am-I-going-wrong-tc10592.html
I have not received any confirmation from the community, but it seems to me that, although the regex [A-Z][A-Za-z]*\s[A-Z][A-Za-z]* is correct in this case, the processor (ReplaceTextWithMapping) does not deal well with whitespace (\s), and the string contains a space between the two words.