RegEx Match split by space escaping double quotes / single quotes? - regex

I have the following regex:
/(?:[^\s"]+|"[^"]*")+/g
This works great for double quotes.
How can I make it also match paired single quotes?
`a string 'keep together' or "keep together"`
becomes
`a`, `string`, `'keep together'`, `or`, `"keep together"`

You may use
/(?:[^\s"']+|"[^"]*"|'[^']*')+/g
         ^          ^^^^^^^^
The '[^']*' part will match a ', then any 0 or more occurrences of chars other than ', and then a '. A single quote must be added to the first negated character class, too.
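As a quick check outside a regex tester, the pattern can be tried in Python. This is only a sketch: `re.findall` stands in for the `/g` flag, and the names `TOKEN`, `text` and `tokens` are illustrative, not from the original post.

```python
import re

# Same pattern as above; re.findall plays the role of the /g flag.
TOKEN = re.compile(r"""(?:[^\s"']+|"[^"]*"|'[^']*')+""")

# Sample input from the question.
text = 'a string \'keep together\' or "keep together"'

# The pattern has no capturing groups, so findall returns the full matches.
tokens = TOKEN.findall(text)
print(tokens)
```

Each quoted run stays in one token because the quoted alternatives consume the spaces that the first alternative refuses to match.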

Related

How to replace text without changing quoted string with regex

I want to replace
$this->input->post("product_name");
with
$post_data["product_name"];
I want to use Notepad++ regex, but I couldn't find a proper solution.
In find --> $this->input->post("[\*w\]");
In replace --> $post_data["$1"];
but it's not working.
The $this->input->post("[\*w\]"); pattern does not work because:
$ is a special char matching the end of a line, you need to use \$ to match it as a literal char
[\*w\] is a malformed pattern as there is no matching unescaped ] for the [ that opens a character class. Also, w just matches w, not any letter, digit or underscore; \w does that.
You may use
Find What: \$this->input->post\("(\w*)"\);
Replace With: $post_data["$1"];
If there can be any char inside double quotes use .*? instead of \w*:
Find What: \$this->input->post\("(.*?)"\);
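The same substitution can be sketched in Python to see the effect of escaping `$` and `(`. This is an illustration under assumptions: Python's `re` module is used instead of Notepad++, so the group reference is written `\1`, and the variable names are made up for the example.

```python
import re

# Input line taken from the question.
line = '$this->input->post("product_name");'

# \$ matches a literal dollar sign; \( and \) match literal parentheses;
# (\w*) captures the key between the double quotes.
pattern = re.compile(r'\$this->input->post\("(\w*)"\);')

# In a Python replacement string, $ is literal and \1 inserts group 1.
result = pattern.sub(r'$post_data["\1"];', line)
print(result)
```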
Use this pattern to match the desired text: \$this->input->post\(("[^"]+")\);
And replace it with the pattern \$post_data\[\1\];
Explanation:
\$this->input->post - match $this->input->post literally
\(("[^"]+")\); - match ( literally, then match the double quotes and everything between them with "[^"]+" and store that in the first capturing group, then match ); literally
The replacement ends with ; because the pattern consumes the original ); as part of the match.
To replace
$this->input->post("product_name");
by
$post_data["product_name"];
run a replace with regular expression mode enabled, replacing
this->input->post\("(.*)"\);
by
post_data\["\1"\];
\x, where x is a number, refers to the x-th group captured with parentheses. Here we capture any characters inside this->input->post("XXXX");
Don't forget to escape special characters with \.
Your special characters were []()

Regex to allow only # in the string and block the special characters

I need to write a regex that allows the contents of a string but blocks the special characters ' and --. I am working on a product that uses regexes to allow or block content passing through it, and I managed to find the pattern below:
^('|--|#|\\x27|\\x23)$
This is supposed to match --, ' and # in the string, but when I tested it in an online regex tester, it did not highlight the string when it contained --, ' or #.
See Start of String and End of String Anchors at regular-expressions.info:
The caret ^ matches the position before the first character in the string. Applying ^a to abc matches a. ^b does not match abc at all, because the b cannot be matched right after the start of the string, matched by ^.
Similarly, $ matches right after the last character in the string. c$ matches c in abc, while a$ does not match at all.
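Because your pattern is anchored on both sides, it only matches when the whole string is exactly one of the alternatives.
Also, \x27 matches a ' and \x23 matches a #, so there is no need to duplicate them as literals.
So, you just need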
(--|\x27|\x23)
Or (using a non-capturing group):
(?:--|\x27|\x23)
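The effect of the anchors can be seen in a small Python sketch. It assumes the doubled backslashes in the question came from string escaping, so the hex escapes are written with a single backslash here; the sample strings and variable names are made up for illustration.

```python
import re

# The question's pattern (single backslashes assumed) and the unanchored fix.
anchored = re.compile(r"^('|--|#|\x27|\x23)$")
unanchored = re.compile(r"(?:--|\x27|\x23)")

samples = ["--", "abc--def", "it's", "plain text"]

# Record, per sample, whether each pattern finds a match anywhere.
results = {s: (bool(anchored.search(s)), bool(unanchored.search(s))) for s in samples}
for s, (a, u) in results.items():
    print(f"{s!r}: anchored={a}, unanchored={u}")
```

Only the string that consists of nothing but `--` satisfies the anchored pattern, while the unanchored one also flags strings that merely contain a blocked sequence.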

regex for first instance of a specific character that DOESN'T come immediately after another specific character

I have a function, translate(), that takes multiple parameters. The first parameter is the only required one and is a string that I always wrap in single quotes, like this:
translate('hello world');
The other params are optional, but could be included like this:
translate('hello world', true, 1, 'foobar', 'etc');
And the string itself could contain escaped single quotes, like this:
translate('hello\'s world');
To the point, I now want to search through all code files for all instances of this function call, and extract just the string. To do so I've come up with the following grep, which returns everything between translate(' and either ') or ',. Almost perfect:
grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" .
The problem with this though, is that if the call is something like this:
translate('hello \'world\', you\'re great!');
My grep would only return this:
hello \'world\
So I'm looking to modify this so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that hasn't been escaped, i.e. doesn't immediately follow a \
Hopefully I'm making sense. Any suggestions please?
You can use this grep with PCRE regex:
grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .
RegEx Breakup:
\b # word boundary
translate # match literal translate
\( # match a (
\s* # match 0 or more whitespace
\K # reset the matched information
' # match starting single quote
(?: # start non-capturing group
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
) # end non-capturing group
(?: # start non-capturing group
\\\\. # match a backslash followed by char that is "escaped"
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
)* # end non-capturing group
' # match ending single quote
Here is a version without \K using look-arounds:
grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .
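For experimenting with the same "unrolled" pattern away from the shell, here is a rough Python equivalent. It is a sketch, not a drop-in replacement: Python's `re` has no `\K`, so a capturing group isolates the string body instead, which means the surrounding quotes are not part of the result. The names `CALL`, `source` and `strings` are illustrative.

```python
import re

# Python's re has no \K, so group 1 captures the string body instead.
# [^'\\]* consumes ordinary characters; \\. consumes a backslash plus the
# character it escapes, so an escaped quote never ends the string.
CALL = re.compile(r"\btranslate\(\s*'((?:[^'\\]*)(?:\\.[^'\\]*)*)'")

# Raw string, so the backslashes reach the regex exactly as they appear in code.
source = r"""
translate('hello world');
translate('hello world', true, 1, 'foobar', 'etc');
translate('hello \'world\', you\'re great!');
"""

# With one capturing group, findall returns the captured bodies.
strings = CALL.findall(source)
for s in strings:
    print(s)
```

No shell is involved here, so each literal backslash is written as `\\` in the regex rather than the `\\\\` needed inside grep's double-quoted argument.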
I think the problem is the .*? part: the ? makes the quantifier non-greedy, so it takes the shortest string that lets the rest of the pattern match. In effect, you're saying, "give me the shortest string that's followed by quote+close-paren or quote+comma". In your example, hello \'world\ is followed by a single quote and a comma, so the match stops there.
In these cases, I like to use something like the following reasoning:
A string is a quote, zero or more characters, and a quote: '.*'
A character is anything that isn't a quote (because a quote terminates the string): '[^']*'
Except that you can put a quote in a string by escaping it with a backslash, so a character is either "backslash followed by a quote" or, failing that, "not a quote": '(\\'|[^'])*'
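Put it all together and you get the following (inside a double-quoted shell string, the regex's \\ has to be written as \\\\, otherwise grep receives \' which matches a bare quote):
grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" .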
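The difference between the lazy pattern and the "escaped quote or non-quote" alternation can be compared side by side in Python. This is a sketch under assumptions: Python's `re` replaces grep's PCRE, the group is made non-capturing, and the variable names are invented for the example.

```python
import re

# The problematic call from the question, as a raw string.
call = r"translate('hello \'world\', you\'re great!');"

# Original approach: shortest run followed by ') or ',
lazy = re.compile(r"(?<=translate\(').*?(?='\)|',)")

# Reasoned approach: each character is either \' or anything that is not a quote.
escape_aware = re.compile(r"(?<=translate\(')(?:\\'|[^'])*(?='\)|',)")

lazy_result = lazy.search(call).group(0)
aware_result = escape_aware.search(call).group(0)
print(lazy_result)
print(aware_result)
```

The alternation tries `\\'` before `[^']`, so a backslash directly in front of a quote is always consumed together with that quote.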

Delete String Within Quotation Marks

I have an XML file with this data:
PONumber="HC01/1501/000001"
PONumber="HC01/1501/000002"
PONumber="HC01/1501/000003"
PONumber="HC01/1501/000004"
...
PONumber="HC01/1501/000100"
What I want is to delete everything from 'HC01/1501/000001' to 'HC01/1501/000100'.
How can I use a regular expression to replace them with an empty string?
Thanks in advance.
The regex below matches a double-quoted block, including the quotes, so that it can be replaced with an empty pair of quotes.
Regex:
"[^"]*"
" - matches the opening double quote.
[^"]* - negated character class that matches any character except a double quote, zero or more times.
" - matches the closing double quote.
So this matches a complete double-quoted block. Replacing each match with "" gives the expected output.
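A minimal Python sketch of this match-and-replace, assuming the data is available as a list of lines (the list and the variable names are illustrative):

```python
import re

# Two of the sample lines from the question.
lines = [
    'PONumber="HC01/1501/000001"',
    'PONumber="HC01/1501/000100"',
]

# Replace every complete double-quoted block with an empty pair of quotes.
cleaned = [re.sub(r'"[^"]*"', '""', line) for line in lines]
print(cleaned)
```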
Replacement string:
""
(?<=").*?(?=")
You can use lookarounds here. See the demo. Replace with an empty string.
https://regex101.com/r/sJ9gM7/85
For each line, you can replace the match of the following regex
(?<==).*
With ''.
(?<=) is a positive look-behind, and (?<==).* will match everything after the =.
If that's the only data you have, use the RegEx /".*"/ with "" as the replacement.
Else, use this RegEx: /"HC01\/1501\/000(0[0-9][0-9]|100)"/g
and the replacement string as "".
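To check which values the range-limited pattern actually covers, it can be sketched in Python. The helper `blank_po` and the sample values are hypothetical; the forward slashes need no escaping here because Python patterns have no `/` delimiters.

```python
import re

# 0[0-9][0-9] covers 000000-000099 and the second alternative covers 000100.
PO = re.compile(r'"HC01/1501/000(0[0-9][0-9]|100)"')


def blank_po(line: str) -> str:
    """Replace an in-range PONumber value with an empty pair of quotes."""
    return PO.sub('""', line)


print(blank_po('PONumber="HC01/1501/000042"'))
print(blank_po('PONumber="HC01/1501/000101"'))
```

Note that the first alternative also accepts 000000, which is outside the range asked for in the question.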

Remove all characters after a certain match

I am using Notepad++ to remove some unwanted strings from the end of a pattern, and for the life of me I can't work it out.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to leave just myApp.[variable] and get rid of, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality, I need to negate that regex to match everything at the end, so I can replace it with a blank.
You don't need negation. Just put your regex inside a capturing group and append .*$ to it. $ matches the end of a line. The whole matched line is then replaced by the characters captured in the first group. Note that an unescaped . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
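The capture-and-replace idea can be sketched in Python as follows. This assumes the lines sit in one multi-line string and uses `re.MULTILINE` so that `^` and `$` work per line, which is roughly how Notepad++ treats them; the variable names are illustrative.

```python
import re

# The four sample lines from the question.
text = """myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';"""

# Group 1 keeps myApp.<name>; the trailing .*$ swallows the rest of the line.
trimmed = re.sub(
    r"^(myApp\.[a-zA-Z0-9].*?\b).*$",
    r"\1",
    text,
    flags=re.MULTILINE,
)
print(trimmed)
```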
OR
Match only the following characters and then replace it with an empty string.
\b[,); +]+.*$
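The alternative of matching only the unwanted tail can be sketched the same way in Python. The list of lines and the names `TAIL` and `stripped` are assumptions made for the example.

```python
import re

lines = [
    "myApp.ComboPlaceHolderLabel,",
    "myApp.GridTitleLabel);",
    "myApp.SummaryLabel + '</b></div>');",
    "myApp.NoneLabel + ')') + '</label></div>';",
]

# \b anchors the match right after the identifier; the character class must
# see at least one of , ) ; space or + before .*$ takes the rest of the line.
TAIL = re.compile(r"\b[,); +]+.*$")

stripped = [TAIL.sub("", line, count=1) for line in lines]
print(stripped)
```

The dot after `myApp` is not in the character class, so the match cannot start there even though a word boundary exists at that position.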
I think this works equally well (with the dot escaped so that it only matches a literal dot):
^(myApp\.\w+).*$
Replacement string:
\1
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
(^.*?\.[a-zA-Z]+)(.*)$
Use this. Replace with
$1
See demo.
http://regex101.com/r/lU7jH1/5