I'm trying to batch a bunch of conversions with a regex find and replace, and I'm not sure it's possible, let alone how to achieve it.
I'm using Sublime Text as an editor, but I'm open to other tools if they can accomplish this.
Two sample lines :
Session::flash( 'error', 'Only users with permission may view the directory user.' );
Session::flash( 'error', 'System user ID does not exist.' );
** Desired outcome: **
flash('Only users with permission may view the directory user.')->error();
flash('System user ID does not exist.')->error();
Current Regex that matches:
Session::flash(\s*'error',.* )
Is it possible for the matched text to be saved and reused in the replacement? I'm hoping for a solution along the lines of $variable, so that I can replace the strings with something like
** Wishful line: **
flash('$variable')->error();
Thanks folks!
You could use 2 capturing groups and refer to those groups in the replacement.
\bSession::flash\(\s*'([^']+)',\s*('[^']+')\s*\);
In the replacement use:
flash($2)->$1();
Explanation
\bSession::flash\(\s* Match a word boundary to prevent Session from being part of a longer word, then match Session::flash( followed by 0+ whitespace chars
'([^']+)' Match ', then capture in group 1 one or more chars other than ' using a negated character class, then match ' again
,\s* Match a comma followed by 0+ whitespace chars
('[^']+') Capture in group 2 a ', then 1+ chars other than ', and a closing '
\s*\); Match 0+ whitespace chars followed by );
Regex demo
Result:
flash('Only users with permission may view the directory user.')->error();
flash('System user ID does not exist.')->error();
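If you would rather run the conversion as a script than in the editor, here is a minimal sketch in Python (Python and its standard re module are an assumption here, not something the question uses; convert is a hypothetical helper name). Note that Python writes backreferences in the replacement as \1 and \2 rather than $1 and $2, and that the replacement appends () so the output matches the desired outcome in the question.

```python
import re

# Same two-group pattern as above; a raw string keeps the backslashes intact.
PATTERN = re.compile(r"\bSession::flash\(\s*'([^']+)',\s*('[^']+')\s*\);")

def convert(source: str) -> str:
    # \2 is the quoted message, \1 is the flash level (e.g. error).
    return PATTERN.sub(r"flash(\2)->\1();", source)

sample = "Session::flash( 'error', 'System user ID does not exist.' );"
converted = convert(sample)
print(converted)
```

Because the level is captured rather than hard-coded, the same call would also handle lines that use a level other than 'error'.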
What you're looking for here is a capture group and a backreference.
In a regular expression anything wrapped in ( and ) is captured for later use by whatever performed the regular expression match, which in this case is Sublime Text. The number of capture groups that are supported varies depending on the regular expression library in use, but you generally get at least 10.
In use, every incidence of () creates a capture, with the first capture being numbered as 1, the second one 2 and so on (generally also the entire match is capture 0). Using the sequence \1 or $1 means "use the contents of the first capture group".
As an example, consider the regular expression ^([a-z]).\1. Breaking it down:
^ - match starting at the start of a line
( - start a capture
[a-z] - match a single lower case letter
) - end a capture
. - match any character
\1 - match whatever the contents of the first capture was
Given this input:
abc
aba
bab
This regular expression matches aba and bab because the first character in both cases is captured as \1 and needs to match later. However abc doesn't match because in that case \1 is a, but the third character is c.
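The same check can be reproduced outside the editor. This is a small sketch using Python's re module (an assumption; any engine with backreferences should behave the same way for this pattern):

```python
import re

# ^([a-z]).\1 : first letter captured, any character, then the same letter again.
pattern = re.compile(r"^([a-z]).\1")

words = ["abc", "aba", "bab"]
# Keep only the inputs where the backreference \1 is satisfied.
matched = [word for word in words if pattern.match(word)]
print(matched)
```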
The result of the capture can also be used in the replacement text in the same way. If you modify your regular expression, you can capture the text you want to keep and use it in the replacement.
As a note, your regex as outlined in your question above doesn't match in Sublime because ( starts a capture group, and thus does not match the ( that's actually in the text. If you're using Sublime and you turn on the Highlight Matches option in the Find/Replace panel, you'll see that your regex is not considered a match.
Find:
Session::flash\(\s*'error'\s*,(.*)\);
Replace:
flash(\1)->error();
Result:
flash( 'Only users with permission may view the directory user.' )->error();
flash( 'System user ID does not exist.' )->error();
This is more or less the regex outlined in your question, except:
The ( and ) in your regex have been replaced with \( and \), which is to tell the regex that this should match a literal ( and not be considered to start a capture.
The .* is changed to (.*) which means "whatever text appears here, capture it for later use".
The replacement text refers to the text captured as \1 and puts it back in the replacement.
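If the spaces left inside the parentheses are unwanted, one option is to keep the whitespace outside the capture. The sketch below is a variation on the find pattern above, not the pattern itself, and it uses Python's re module as an assumed stand-in for the editor; tidy_flash is a hypothetical helper name.

```python
import re

# Variation on the find pattern: \s* sits outside the capture and the capture
# is lazy, so the surrounding spaces are not carried into the replacement.
FIND = re.compile(r"Session::flash\(\s*'error'\s*,\s*(.*?)\s*\);")

def tidy_flash(line: str) -> str:
    # \1 is the captured message, quotes included.
    return FIND.sub(r"flash(\1)->error();", line)

tidied = tidy_flash("Session::flash( 'error', 'System user ID does not exist.' );")
print(tidied)
```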
tl;dr:
I am searching for a way to match the closing character sequence based upon the style of the opening sequence syntax in PHP with PCRE-style regular expressions.
The task
I am writing a module to capture all translatable strings from written PHP code. One responsibility of this module will be to also capture any translation context stated within the code. This context is provided as part of an options array.
In PHP (afair starting with version 5.4), there are two different styles possible to define an array:
a) array(...)
b) [...]
I now want to write a regular expression that is able to recognize both styles. The pattern should be able to correctly match the ending character sequence depending on the style chosen to start the array.
Unfortunately, I was not able to find any documentation on how to apply the IF-statement to a given capturing group.
In theory it should look something like this:
/ ... (array\(|\[) ... (?(?=\1==\[)\]|\)) ... /
(Note: "..." in the line above should indicate that the regex pattern is longer than stated here. This should only serve as an example for what I am trying to achieve)
The (?(?=\1==\[)\]|\)) part translated to "normal language": If the contents of the first capturing group is an opening square bracket, then the pattern should match a closing square bracket, otherwise a closing round bracket is required.
Is it possible to achieve something like this? Any help is greatly appreciated!
Thanks in advance
Chris
The regex answer is
(?:array(\()|\[).*?(?(1)\)|])
See the regex demo
Details
(?:array(\()|\[) - a non-capturing group matching either array( while capturing ( into Group 1, or [ char
.*? - any 0 or more chars other than line break chars as few as possible
(?(1)\)|]) - a conditional construct: if Group 1 is matched (the ( char is in the group memory buffer) the ) must match at the current position, else ].
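To see the conditional in action outside PHP, here is a minimal sketch in Python, whose re module also supports the (?(1)yes|no) construct (using Python is an assumption for illustration; first_array is a hypothetical helper). The closing ] is escaped here, which is equivalent to the unescaped form above.

```python
import re

# Group 1 is only set when the array( branch matched, so the conditional
# requires ) after array( and ] after [.
ARRAY_LITERAL = re.compile(r"(?:array(\()|\[).*?(?(1)\)|\])")

def first_array(text):
    match = ARRAY_LITERAL.search(text)
    return match.group(0) if match else None

print(first_array("$a = array(1, 2);"))
print(first_array("$a = [1, 2];"))
print(first_array("array(1, 2]"))
```

Mismatched delimiters such as array(1, 2] are rejected. Nested arrays are not handled by this pattern, since .*? stops at the first closing delimiter of the right kind.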
If you want to capture the values using the same capturing group, you could also use a branch reset group (?| to refer to group 1 for the value.
To get the values between the opening and closing parenthesis or square brackets, you could use a negated character class [^ to match any char except the listed in the character class.
(?|array(\([^()]*\))|(\[[^][]*]))
Explanation
(?| Branch reset group
array Match literally
( Capture group 1
\([^()]*\) Match (...)
) Close group 1
| Or
( Capture group 1 again (the branch reset restarts numbering in each alternative)
\[[^][]*] Match [...]
) Close group 1
) Close branch reset group
Regex demo
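Not every engine has branch reset groups. Python's standard re module, for example, does not support (?|...), so a rough equivalent there is to use two ordinary groups and take whichever one participated. This is a sketch under that assumption; array_body is a hypothetical helper and the brackets inside the character class are escaped for readability.

```python
import re

# Two ordinary groups instead of a branch reset: group 1 for array(...),
# group 2 for [...]. Only one of them is set for any given match.
ARRAY_BODY = re.compile(r"array(\([^()]*\))|(\[[^\[\]]*\])")

def array_body(text):
    match = ARRAY_BODY.search(text)
    if match is None:
        return None
    # Whichever group matched holds the delimiters plus their contents.
    return match.group(1) or match.group(2)

print(array_body("array('a', 'b')"))
print(array_body("['a', 'b']"))
```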
I have a dashboard in Google Data Studio.
I'm trying to create a custom field that removes everything after the # and ? signs (including the signs themselves). But this formula does not work, and I don't know why.
I was trying this one
REGEXP_REPLACE(Landing Page,'(#|\?)(.*)','')
Could you please help?
The pattern you tried (#|\?)(.*) captures either # or ? using a capturing group with an alternation | followed by capturing 0+ times any char in another capturing group.
But in the replacement there is an empty string specified, removing all that is matched.
You could make use of a character class ([#?]) in a capturing group to capture one of the listed.
To only do the replacement where there is something after the match, you could match 1+ times any character except a newline using .+
To remove what comes after the matched character, you could refer to the capturing group using \\1 so that you keep the # or ? and remove what is matched afterwards.
The pattern could look like:
([#?]).+
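To check what this pattern keeps and removes before putting it into a formula, here is a quick sketch in Python (an assumption for testing only; Python writes the backreference as \1 in a raw string, and trim_after_marker is a hypothetical helper name):

```python
import re

def trim_after_marker(url: str) -> str:
    # Keep the captured # or ? (group 1) and drop everything after it.
    return re.sub(r"([#?]).+", r"\1", url)

print(trim_after_marker("/pricing?utm_source=x"))
print(trim_after_marker("/docs#intro"))
print(trim_after_marker("/home"))
```

A path with neither character is returned unchanged, because the pattern needs a # or ? followed by at least one more character in order to match.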
I am tasked to refactor namespaces in vs2015 Solution, removing duplicate/repeating words.
I need a FIND regex that returns these namespaces and everywhere that may have been used or referenced.
I need replace regex to remove the second occurrence of the word from namespace.
EXAMPLE
TestApp.SA.TestApp => TestApp.SA
TestApp.TestApp.SA => TestApp.SA
Here is my regex to find (which I know can be better): TestApp.*?(TestApp)
Could somebody please help with an expression for the replace? I think it needs to set the second occurrence of TestApp to whitespace.
The patterns I will suggest are not a 100% safe solution, but will show you a way to use regex for search and for search-and-replace in your files.
The basic expressions you may use for the task are
(\w+)\.(\w+\.)*\1
and
Find: (\w+)((?:\.\w+)*)\.\1
Replace: $1$2
See the regex demo
The patterns mean:
(\w+) - match and capture 1+ alphanumeric/underscore chars into Group 1
\. - matches a literal dot
(\w+\.)* - zero or more sequences ((...)*) of 1+ word chars followed with a dot (each subsequent submatch will erase the Group 2 buffer, but it is not important when just searching)
\1 - a backreference to the contents captured in Group 1
The second pattern is almost the same, just the capturing groups are a bit adjusted for the replacement numbered backreferences to replace text correctly.
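As a quick way to try the find/replace pair on sample namespaces before running it across a solution, here is a sketch in Python (an assumption; Visual Studio uses $1$2 in the replacement, while Python writes \1\2, and dedupe_namespace is a hypothetical helper name):

```python
import re

# Group 1: the first segment. Group 2: any segments between it and its repeat.
DUPLICATE = re.compile(r"(\w+)((?:\.\w+)*)\.\1")

def dedupe_namespace(name: str) -> str:
    # Drop the dot and the repeated segment, keep everything captured before it.
    return DUPLICATE.sub(r"\1\2", name)

print(dedupe_namespace("TestApp.SA.TestApp"))
print(dedupe_namespace("TestApp.TestApp.SA"))
```

As noted above, this is not fully safe: the pattern has no word boundaries, so a segment that merely starts with the first segment's text (for example TestApp.SA.TestAppCore) would also be altered.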
In Notepad++, I use the expression (?<=").*(?=") to find all strings in between quotes. It would then seem rather trivial to keep only those results. However, I cannot find an easy solution for this.
I think the problem is that Notepad++ is not able to make multiple selections. But there must be some kind of workaround, right? Perhaps I must invert the regex and then find/replace those results to end up with the strings I want.
For example:
blablabla "Important" blabla
blabla "Again important" blablabla
I want to keep:
Important
Again important
There is no great solution for this and depending on your use case I would recommend writing a quick script that actually uses your first expression and creates a new file with all of the matches (or something like this). However, if you just want something quick and dirty, this expression should get you started:
[^"]*(?:"([^"]*)")?
\1\n
Explanation:
[^"]* # 0+ non-" characters
(?: # Start non-capturing group
" # " literally
( # Start capturing group
[^"]* # 0+ non-" characters
) # End capturing group
" # " literally
)? # End non-capturing group AND make it optional
The reason the optional non-capturing group is used is because the end of your file may very well not have a string in quotes, so this isn't a necessary match (we're more interested in the first [^"]* that we want to remove).
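For the "quick script" route mentioned above, a minimal sketch could look like this in Python (the language choice is an assumption, as is the sample text, which is taken from the question). It collects the quoted strings instead of deleting everything around them.

```python
import re

text = 'blablabla "Important" blabla\nblabla "Again important" blablabla'

# With a single capture group, findall returns just the captured text,
# i.e. what sits between each pair of double quotes.
quoted = re.findall(r'"([^"]*)"', text)

# One extracted string per line, ready to be written to a new file.
output = "\n".join(quoted)
print(output)
```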
Try something like this:
[^"\r\n]+"([^"]+)"[^"\r\n]+
And replace with $1. The above regex assumes there will be only 2 double quotes in each line.
[^"]+ matches non-quote characters.
[^"\r\n]+ matches non-quote, non newline characters.
regex101 demo
Hard to be certain from your post, but I think you may want (see the update below):
(?<=")(.*)(?=")
The part you keep is captured by (.*). The lookbehind (?<=") and lookahead (?=") are zero-width assertions and do not create numbered groups, so the captured text is \1.
Your original regex string uses parentheses to group characters for evaluation. Parentheses ALSO group characters for capturing. That is what I added.
Update:
The regex pattern you provided doesn't seem to work correctly. Won't this work?
\"(.*)\"
\1 now captures the content.
I have the following regex which is supposed to match email addresses:
[a-z0-9!#$%&'*+\\-/=?^_`{|}~][a-z0-9!#$%&'*+\\-/=?^_`{|}~.]{0,63}#[a-z0-9][a-z0-9\\-]*[a-z0-9](\\.[a-z0-9][a-z0-9\\-]*[a-z0-9])+$.
I have the following code in AS3:
var mails:Array = str.toLowerCase().match(pattern);
(pattern is RegExp with the mentioned regular expression).
I retrieve two results, when str is gaga#example.com:
gaga#example.com
.com
Why?
.com was captured by the last part of the regex (\\.[a-z0-9][a-z0-9\\-]*[a-z0-9]).
Regular expressions capture substrings matched by portions of the pattern that are enclosed in () for later use.
For example, the regex 0x([0-9a-fA-F]+) will match a hexadecimal number of the form 0x9F34 and capture the hex portion in a separate group.
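Here is that hexadecimal example as a small sketch in Python (an assumption, since the question is about AS3; the pattern uses a + quantifier so that all of the hex digits are captured):

```python
import re

match = re.search(r"0x([0-9a-fA-F]+)", "value = 0x9F34;")

whole = match.group(0)   # the entire match, prefix included
digits = match.group(1)  # only what the parentheses captured
print(whole, digits)
```

Group 0 is the whole match and group 1 is the captured part, which is the same split that produces the two entries in the question's result array.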
I'm not sure about your regex, there is a good tutorial about email validation here.
To me this reads:
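[a-z0-9!#$%&'*+\\-/=?^_`{|}~] # a single character from the chosen set
[a-z0-9!#$%&'*+\\-/=?^_`{|}~.]{0,63} # 0 to 63 characters from the same set, with the addition of .
# # a literal #
[a-z0-9] # a single alphanumeric
[a-z0-9\\-]* # zero or more alphanumerics or -
[a-z0-9] # a single alphanumeric
(\\.[a-z0-9][a-z0-9\\-]*[a-z0-9])+ # one or more groups of a dot followed by a label (this is the capturing group)
$ # end of line
. # any character (most likely just the full stop ending the sentence in the question, since nothing can match after $ here)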
As to why you get two substrings in your array, it's because the array holds the whole match plus the text captured by the group - see docs
gaga#example.com is the match of the whole regular expression and .com is the last match of the first group ((\\.[a-z0-9][a-z0-9\\-]*[a-z0-9])).
([a-z0-9!#$%&'*+\\-/=?^_`{|}~][a-z0-9!#$%&'*+\\-/=?^_`{|}~.]{0,63}#[a-z0-9\\-]*[a-z0-9]+\\.([a-z0-9\\-]*[a-z0-9]))+$
This seems to work as expected (tested in Regex Tester). Last capturing group removed.
To add to what others have said:
There are two results because it matches both the whole email address, and the last group surrounded by parentheses.
If you don't want a group to be captured you can add ?: to the beginning of the group. Look in the AS documentation for non-capturing groups:
http://www.adobe.com/livedocs/flash/9.0/main/wwhelp/wwhimpl/js/html/wwhelp.htm?href=00000118.html#wp129703
"A noncapturing group is one that is used for grouping only; it is not "collected," and it does not match numbered backreferences. Use (?: and ) to define noncapturing groups, as follows:
var pattern = /(?:com|org|net)/;"
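The effect of ?: is easy to see side by side. This sketch uses Python rather than AS3 and a deliberately simplified pattern (both are assumptions for illustration, not the regex from the question):

```python
import re

address = "gaga#example.com"

# Capturing version: the last repetition of the group is reported.
capturing = re.search(r"[a-z0-9]+#[a-z0-9]+(\.[a-z0-9]+)+$", address)
# Non-capturing version: same match, but no group is collected.
non_capturing = re.search(r"[a-z0-9]+#[a-z0-9]+(?:\.[a-z0-9]+)+$", address)

print(capturing.group(0), capturing.groups())
print(non_capturing.group(0), non_capturing.groups())
```

Both versions match the whole address; only the first one also reports .com as a captured group.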