PhpStorm search and replace multiple times between two strings - regex

In PhpStorm IDE, using the search and replace feature, I'm trying to add .jpg to all strings between quotes that come after $colorsfiles = [ and before the closing ].
$colorsfiles = ["Blue", "Red", "Orange", "Black", "White", "Golden", "Green", "Purple", "Yellow", "cyan", "Gray", "Pink", "Brown", "Sky Blue", "Silver"];
If a quoted string such as "abc" is not between $colorsfiles = [ and ], it should not be replaced.
The regex that I'm using is
$colorsfiles = \[("(\w*?)", )*
and replace string is
$colorsfiles = ["$2.jpg"]
The current result is
$colorsfiles = ["Brown.jpg"]"Sky Blue", "Silver"];
While the expected output is
$colorsfiles = ["Blue.jpg", "Red.jpg", "Orange.jpg", "Black.jpg", "White.jpg", "Golden.jpg", "Green.jpg", "Purple.jpg", "Yellow.jpg", "cyan.jpg", "Gray.jpg", "Pink.jpg", "Brown.jpg", "Sky Blue.jpg", "Silver.jpg"];
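The current result is consistent with how a repeated capturing group behaves: each iteration overwrites the previous capture, so $2 holds only the last value matched, and the match stops before "Sky Blue" because \w does not match a space. A minimal Python sketch of that behaviour, assuming Python's re module treats this pattern the same way as the IDE's engine (the leading $ is escaped here, and the array is shortened for illustration):

```python
import re

# Shortened version of the array from the question (illustrative only).
line = '$colorsfiles = ["Pink", "Brown", "Sky Blue", "Silver"];'

# The pattern from the question, with the leading $ escaped.
pattern = re.compile(r'\$colorsfiles = \[("(\w*?)", )*')

m = pattern.match(line)
last_capture = m.group(2)        # only the final iteration of the group survives
unmatched_tail = line[m.end():]  # text the pattern never reached

# Same replacement as in the question, written with Python's \2 group syntax.
result = pattern.sub(r'$colorsfiles = ["\2.jpg"]', line, count=1)
print(last_capture)
print(result)
```

This reproduces the shape of the current result: one surviving capture followed by the untouched tail of the array.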

You should have mentioned that you are doing this in an IDE.
I don't use PhpStorm, so I'm posting a solution that I tested in NetBeans.
Find : "([\w ]+)"([\,\]]{1})
Replace : "$1\.jpg"$2

Why do you need a regex for this? A simple array_map() will do the trick.
<?php
function addExtension($color)
{
    return $color . ".jpg";
}
$colorsfiles = ["Blue", "Red", "Orange", "Black", "White", "Golden", "Green", "Purple", "Yellow", "cyan", "Gray", "Pink", "Brown", "Sky Blue", "Silver"];
$colorsfiles_with_extension = array_map("addExtension", $colorsfiles);
print_r($colorsfiles_with_extension);
?>
Edit: I've now tested this in PhpStorm. Use the following:
search:
"([a-zA-Z\s]+)"
replace_all:
"$1.jpg"

You may use
(\G(?!^)",\s*"|\$colorsfiles\s*=\s*\[")([^"]+)
and replace with $1$2.jpg. See this regex demo.
The regex matches either $colorsfiles = [" or the end of the previous match followed by ", " and captures that text into Group 1 (referenced as $1 in the replacement). It then captures one or more chars other than a double quotation mark into Group 2 (referenced as $2).
Details
(\G(?!^)",\s*"|\$colorsfiles\s*=\s*\[") - Capturing group 1, which matches either:
\G(?!^)",\s*" - the end of the previous match (\G(?!^)), then a ", substring, 0+ whitespaces (\s*) and a " char
| - or
\$colorsfiles\s*=\s*\[" - $colorsfiles, 0+ whitespaces (\s*), =, 0+ whitespaces, [" (note that $ and [ must be escaped to match literal chars)
([^"]+) - Capturing group 2: one or more (+) chars other than " (the negated character class, [^"])
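Python's built-in re module does not support \G, so the same scoped replacement has to be done differently there. Below is a minimal sketch that assumes a two-pass approach: first isolate the $colorsfiles array, then rewrite the quoted strings inside it. The function name add_extension and the $other array are hypothetical, introduced only for illustration.

```python
import re

# Matches the array: opening part, body (anything up to the first ]), closing bracket.
ARRAY_RE = re.compile(r'(\$colorsfiles\s*=\s*\[)([^\]]*)(\])')
# Matches one double-quoted string inside the array body.
ITEM_RE = re.compile(r'"([^"]+)"')

def add_extension(source, ext=".jpg"):
    """Append ext to every quoted string inside the $colorsfiles array only."""
    def rewrite_array(m):
        body = ITEM_RE.sub(lambda q: '"' + q.group(1) + ext + '"', m.group(2))
        return m.group(1) + body + m.group(3)
    return ARRAY_RE.sub(rewrite_array, source)

# $other is a hypothetical second array that must stay untouched.
code = '$other = ["abc"]; $colorsfiles = ["Blue", "Sky Blue", "Silver"];'
converted = add_extension(code)
print(converted)
```

The sketch assumes that no array element contains a ] character, since the body is taken to end at the first closing bracket.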

Related

match everything but a given string and do not match single characters from that string

Let's start with the following input.
Input = 'blue, blueblue, b l u e'
I want to match everything that is not the string 'blue'. Note that blueblue should not match, but single characters should, even if they also occur in the string 'blue'.
If I then replace the matches with an empty string, the result should be:
Result = 'blueblueblue'
I have tried with [^\bblue\b]+
but this matches the last four single characters 'b', 'l','u','e'
Another solution:
(?<=blue)(?:(?!blue).)+(?=blue|$)|^(?:(?!blue).)+(?=blue|$)
If your regex engine supports \K, we can try:
/blue\K|.*?(?=blue|$)/gm
This pattern says to match:
blue match "blue"
\K but then forget that match
| OR
.*? match anything else until reaching
(?=blue|$) the next "blue" or the end of the string
Edit:
In JavaScript, we can try the following replacement:
var input = "blue, blueblue, b l u e";
var output = input.replace(/blue|.*?(?=blue|$)/g, (x) => x != "blue" ? "" : "blue");
console.log(output);
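The same callback idea carries over to engines without \K. As a rough sketch, here is a Python version (Python's built-in re module does not support \K); the variable names are chosen for illustration only.

```python
import re

text = "blue, blueblue, b l u e"

# Match either the keyword itself or the shortest run of text
# up to the next keyword or the end of the string.
pattern = re.compile(r"blue|.*?(?=blue|$)")

# Keep the keyword, replace everything else with an empty string.
output = pattern.sub(lambda m: m.group(0) if m.group(0) == "blue" else "", text)
print(output)
```

Because "blue" is the first alternative, it is tried first at every position, so the lazy second alternative only consumes the text between keywords.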

Break string into words using scan method + regexp, if word has `'` character, drop this character and everything after it

sample_string = "let's could've they'll you're won't"
sample_string.scan(/\w+/)
Above gives me:
["let", "s", "could", "ve", "they", "ll", "you", "re", "won", "t"]
What I want:
["let", "could", "they", "you", "won"]
I've been playing around on https://rubular.com/ and trying assertions like \w+(?<=') but with no luck.
Given:
> sample_string = "let's could've they'll you're won't"
You can do split and map:
> sample_string.split.map{|w| w.split(/'/)[0]}
=> ["let", "could", "they", "you", "won"]
You can use
sample_string.scan(/(?<![\w'])\w+/)
sample_string.scan(/\b(?<!')\w+/)
See the Rubular demo. The two patterns are equivalent. They match:
(?<![\w']) - a location in the string that is not immediately preceded by a word or ' char
\b(?<!') - a word boundary position that is not immediately preceded by a ' char
\w+ - one or more word chars.
See the Ruby demo:
sample_string = "let's could've they'll you're won't"
p sample_string.scan(/(?<![\w'])\w+/)
# => ["let", "could", "they", "you", "won"]
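For comparison, the first pattern should also work outside Ruby. A small sketch with Python's re.findall, assuming the lookbehind behaves the same there (it is fixed-width, which Python requires):

```python
import re

sample_string = "let's could've they'll you're won't"

# \w+ only where the previous char is neither a word char nor an apostrophe,
# so the fragments after ' ("s", "ve", "ll", ...) are never picked up.
words = re.findall(r"(?<![\w'])\w+", sample_string)
print(words)
```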

split string with negative regex pattern

I want to split a string on non-alphanumeric characters, except where they are part of a particular pattern.
Example:
string_1 = "section (ab) 5(a)"
string_2 = "section -bd, 6(1b)(2)"
string_3 = "section - ac - 12(c)"
string_4 = "Section (ab) 5(1a)(cf) (ad)"
string_5 = "section (ab) 5(a) test (ab) 5 6(ad)"
I want to split these strings so that I get the output below:
["section", "ab", "5(a)"]
["section", "bd", "6(1b)(2)"]
["section", "ac", "12(c)"]
["section", "ab", "5(1a)(cf)", "ad"]
["section", "ab", "5(a)", "test", "ab", "5", "6(ad)"]
To be more precise, I want to split on every non-alphanumeric character except where it is part of this pattern: \d+([\w\(\)]+)
This can be achieved with findall and the following regex:
\b\w+(?:\([^)]*\))*
Code:
>>> import re
>>> reg = re.compile(r'\b\w+(?:\([^)]*\))*')
>>> arr = ['section (ab) 5(a)', 'section -bd, 6(1b)(2)', 'section - ac - 12(c)', 'Section (ab) 5(1a)(cf) (ad)', 'section (ab) 5(a) test (ab) 5 6(ad)']
>>> for el in arr:
...     print(reg.findall(el))
...
['section', 'ab', '5(a)']
['section', 'bd', '6(1b)(2)']
['section', 'ac', '12(c)']
['Section', 'ab', '5(1a)(cf)', 'ad']
['section', 'ab', '5(a)', 'test', 'ab', '5', '6(ad)']
You can use
\d+[\w()]+|\w+
See the regex demo.
Details
\d+[\w()]+ - 1+ digits and then 1+ word or ( or ) chars
| - or
\w+ - 1+ word chars.
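To see how this alternation behaves on the sample strings from the question, here is a short Python sketch using re.findall (the names TOKEN_RE, samples and tokens are chosen for illustration):

```python
import re

# Digits followed by word/paren chars, or else a plain run of word chars.
TOKEN_RE = re.compile(r"\d+[\w()]+|\w+")

samples = [
    "section (ab) 5(a)",
    "section -bd, 6(1b)(2)",
    "section - ac - 12(c)",
    "Section (ab) 5(1a)(cf) (ad)",
    "section (ab) 5(a) test (ab) 5 6(ad)",
]

tokens = [TOKEN_RE.findall(s) for s in samples]
for t in tokens:
    print(t)
```

A lone digit such as the 5 in the last sample cannot satisfy the first alternative, which needs at least two characters, so it falls through to \w+.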
In Elasticsearch, use
"tokenizer": {
  "my_tokenizer": {
    "type": "pattern",
    "pattern": "\\d+[\\w()]+|\\w+",
    "group": 0
  }
}

Using REGEX to find keywords

Hey all, I have the following list of bad words, and I want to check whether any of them appear in a string that I pass in:
Private Function injectionCheck(queryString As String) As Integer
    Dim badWords() As String = {"EXEC", "EXECUTE", ";", "-", "*", "--", "#",
                                "UNION", "DROP", "DELETE", "UPDATE", "INSERT", "MASTER",
                                "TABLE", "XP_CMDSHELL", "CREATE", "XP_FIXEDDRIVES",
                                "SYSCOLUMNS", "SYSOBJECTS"}
    Dim pattern As String = "\b(" + Regex.Escape(badWords(0))
    For Each key In badWords.Skip(1)
        pattern += "|" + Regex.Escape(key)
    Next
    pattern += ")\b"
    Return Regex.Matches(queryString, pattern, RegexOptions.IgnoreCase).Count
End Function
For the pattern I get the following:
\b(EXEC|EXECUTE|;|-|\*|--|#|UNION|DROP|DELETE|UPDATE|INSERT|MASTER|TABLE|XP_CMDSHELL|
CREATE|XP_FIXEDDRIVES|SYSCOLUMNS|SYSOBJECTS)\b
Which looks correct to me. But every time I call it I get 0 as the response to this:
Dim blah As Integer = injectionCheck("select * from bob where something = 'you'")
So what am I leaving out? The call above should not return 0. It should return 2, since both * and ' are used and neither should be allowed.
If you plan to match keywords as whole words, but the keywords may start or end with non-word characters, you will run into this kind of trouble. What \b matches depends on the context, because it requires a word character on exactly one side of the position: \b--\b will match in X--X but not in , --,.
You need unambiguous boundaries. Use the lookarounds (?<!\w) as the leading and (?!\w) as the trailing word boundary.
Implement the changes as shown below:
Dim pattern As String = "(?<!\w)(" + Regex.Escape(badWords(0)) ' <== HERE
For Each key In badWords.Skip(1)
    pattern += "|" + Regex.Escape(key)
Next
pattern += ")(?!\w)" ' <== AND HERE
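To compare the two kinds of boundaries side by side, here is a rough Python sketch rather than VB.NET. The helper build_pattern is hypothetical, and sorting the keywords longest-first is an addition of this sketch, not part of the change above. Note that with the keyword list as given, only * can be found in the sample query, because ' is not in the list.

```python
import re

BAD_WORDS = ["EXEC", "EXECUTE", ";", "-", "*", "--", "#",
             "UNION", "DROP", "DELETE", "UPDATE", "INSERT", "MASTER",
             "TABLE", "XP_CMDSHELL", "CREATE", "XP_FIXEDDRIVES",
             "SYSCOLUMNS", "SYSOBJECTS"]

def build_pattern(words, leading, trailing):
    # Longest keywords first, so "--" is tried before "-"
    # (a choice made for this sketch only).
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    return re.compile(leading + "(" + alternatives + ")" + trailing, re.IGNORECASE)

word_boundary = build_pattern(BAD_WORDS, r"\b", r"\b")
lookaround = build_pattern(BAD_WORDS, r"(?<!\w)", r"(?!\w)")

query = "select * from bob where something = 'you'"
count_with_b = len(word_boundary.findall(query))
count_with_lookarounds = len(lookaround.findall(query))
print(count_with_b, count_with_lookarounds)
```

The \b version finds nothing because the * sits between two spaces, where no word boundary exists, while the lookaround version only requires that no word character is adjacent.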

c++11 - regex matching

I am extracting info from a string using regex.
auto version = { // comments show the expected output
// version // output : (year, month, sp#, patch#)
"2012.12", // "2012", "12", "", ""
"2012.12-1", // "2012", "12", "", "1"
"2012.12-SP1", // "2012", "12", "SP1", ""
"2012.12-SP2-1", // "2012", "12", "SP2", "1"
"I-2013.12-2", // "2013", "12", "", "2"
"J-2014.09", // "2014", "09", "", ""
"J-2014.09-SP2-1", // "2014", "09", "SP2", "1"
};
The regex I have is the following:
// J - 2014 . 09 - SP2 - 1
std::regex regexExpr("[A-Z]?-?([0-9]{4})\\.([0-9]{2})-?(SP[1-9])?-?([1-9])?.*");
This seems to work well, but I am not very confident about it since I don't have much expertise in regex. Is the regex right, and can it be improved?
You can just use \w{2,}|\d as your regex. It matches any run of two or more word characters (\w{2,}), which avoids matching the single letter (such as J) at the beginning of some strings, or a single digit (\d), which matches the 1 at the end of some strings.
You can use the sub_match class template for this purpose:
The class template sub_match is used by the regular expression engine to denote sequences of characters matched by marked sub-expressions. A match is a [begin, end) pair within the target range matched by the regular expression, but with additional observer functions to enhance code clarity.
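Each sub_match corresponds to one parenthesized group in the pattern. As a quick way to check the regex from the question against the listed versions, here is a sketch in Python rather than C++. It assumes that the whole string must match (as std::regex_match would require) and that this pattern behaves the same in Python's re as in std::regex; parse_version is a hypothetical helper.

```python
import re

# Same pattern as in the question, written as a Python raw string.
VERSION_RE = re.compile(r"[A-Z]?-?([0-9]{4})\.([0-9]{2})-?(SP[1-9])?-?([1-9])?.*")

# Expected (year, month, sp#, patch#) taken from the comments in the question.
EXPECTED = {
    "2012.12": ("2012", "12", "", ""),
    "2012.12-1": ("2012", "12", "", "1"),
    "2012.12-SP1": ("2012", "12", "SP1", ""),
    "2012.12-SP2-1": ("2012", "12", "SP2", "1"),
    "I-2013.12-2": ("2013", "12", "", "2"),
    "J-2014.09": ("2014", "09", "", ""),
    "J-2014.09-SP2-1": ("2014", "09", "SP2", "1"),
}

def parse_version(version):
    """Return the four captured fields, with unmatched groups as empty strings."""
    m = VERSION_RE.fullmatch(version)
    if m is None:
        return None
    return tuple(g or "" for g in m.groups())

for version, fields in EXPECTED.items():
    print(version, parse_version(version), parse_version(version) == fields)
```

One thing this makes visible is that the trailing .* lets the pattern accept arbitrary text after the version, which may or may not be intended.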