On a recreational coding website I came across a piece of code that prints the following string:
If a problem can't be solved with *regex*, it's a bad problem.
I was surprised when I saw the code, because it seems to use only a regular expression to accomplish this:
(all credits for this piece of art go to the original author from the website I mentioned above: Quantum)
I really want to know how this works exactly but I couldn't find anything on Google, can someone explain this to me? Oh, and it's written in Perl.

The code uses an eval-group inside the regex to execute arbitrary code. You have to add use re 'eval' to enable this behavior.
Eval-groups look like (?{...}) with the part inside the curly braces being evaluated.
The rest of the regex is OR'ing and XOR'ing characters. For instance, '['^'+' is equivalent to 'p'. The . operator simply concatenates all those characters.
You can paste the part after the =~ matching operator into your perl shell and see the final regex that is being matched/executed.
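To see why XOR'ing two characters yields a third, here is a minimal Python sketch of the same byte arithmetic. Perl applies ^ and | to strings directly, while Python needs ord and chr. The helper names xor_chars and or_chars are made up for illustration, and the OR example is my own, not taken from the original code.

```python
def xor_chars(a: str, b: str) -> str:
    """XOR the code points of two single characters."""
    return chr(ord(a) ^ ord(b))


def or_chars(a: str, b: str) -> str:
    """OR the code points of two single characters."""
    return chr(ord(a) | ord(b))


# 0x5B ^ 0x2B == 0x70, which is 'p' (the example from the answer above).
print(xor_chars("[", "+"))

# 0x60 | 0x25 == 0x65, which is 'e' (hypothetical example, not from the original code).
print(or_chars("`", "%"))
```

Because the operands are all punctuation, the source contains no letters, yet the evaluated result is ordinary code.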


VBA regex vs other regexes - I don't understand why it's doing that?

So, I have been working with js previously and I have some regexes that I have tested with regexr and regex101 and they work great. However, when I decided to use that expression for VBA, it stopped working properly.
The original js regex:
(?: AT FOLLOWING LOCATION[ \w]+\n)?(?: +FX -[\s\S]*?)((?:MP[ ]+[\d.\w]+[ -]+?[\w\b .\/]+)(?:[ -]+[\w\b .\/]+$)?)[\s\S]*?(?:\d+\.+\d\d[\) ]+(?:FT.? *\n|IN.? *\n)|USE X AT MP +\d{1,3}\.\d{1,3}|(?=\n {4}\w)|(?=\n {2}\()|$)
For the sake of our discussion, let's use the following as a match sample:
The VBA regex I initially tested was the same, and it did not work. So I tested the chunks one at a time and noticed that the problem seems to lie in \n. For the example above, (?:[ -]+[\w\b .\/]+$)? was the part that failed: [ -]+[\w\b .\/]+ worked, while [ -]+[\w\b .\/]+\n did not (the g, i and m flags were enabled). What I don't understand is why it worked with JS and the other engines, since it looks like \n is legal in VBA regex.
And, more importantly, other than understanding why it behaves how it did, what would be the best ways to make it work?
Edit 1:
Based on the comments, I have changed \n to [\r\n]+. With that, it worked with my test strings, which use vbCrLf as line breaks. However, when applied to the actual document, it no longer worked. The line break in the text read from the document shows up as an up arrow when displayed in the Immediate window via Debug.Print. When I highlight it, it changes from an up arrow to a blank square (like one of those unreadable characters). I copied the document text over to Notepad++ so that I could read the symbols better, and there they appear as CrLf, but I don't know whether the clipboard changed anything. Word itself shows the symbol as a soft return instead of a hard return. What am I still missing?
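One thing that may be worth checking, stated here as an assumption rather than a diagnosis: Word stores a soft return (manual line break) as a vertical tab, character code 11, which [\r\n]+ does not cover. The sketch below is in Python rather than VBA and uses a made-up sample string. It only shows how to inspect the code points and how a class that also includes \x0b behaves.

```python
import re

# Hypothetical sample: a soft return (\x0b) followed later by a CRLF pair.
sample = "MP 12.3 - MAIN ST\x0bUSE X AT MP 1.5\r\nEND"

# Listing the control characters shows what the separators really are.
control_codes = [ord(ch) for ch in sample if ord(ch) < 32]
print(control_codes)

# A class covering CR, LF and vertical tab splits on either kind of break.
line_break = re.compile(r"[\r\n\x0b]+")
print(line_break.split(sample))
```

Dumping the character codes of the real document text in the same way would confirm or rule out this guess.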

Matching within matches by extending an existing Regex

I'm trying to see if it's possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
I want to add to this regex and output only matches that start with 'ca',
I specifically don't want to interpret the whole regex, which could be quite a long operation, and then change its internal content to produce the output, as in:
or run the original regex and then the second one over the results. I'm taking the original regex as an argument in Python but want to 'prefilter' the matches by adding the additional code.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the initial query might be [a-z]{4}, but if I create (?<=^ca([a-z]{4})) it matches against 6-letter strings starting with ca, not 4-letter ones.
Thanks for any solutions and/or opinions on it.
EDIT: See solution including @Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried; you just need a lookahead assertion rather than a lookbehind, and a parenthesis was misplaced. The right thing is to put the original pattern in parentheses and prepend (?=ca):
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
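As a minimal sketch of this lookahead approach in Python (the question mentions taking the regex as an argument in Python): the function name prefilter and the sample strings are made up for illustration. A non-capturing group is used instead of plain parentheses so that group numbers in the original pattern are not shifted.

```python
import re


def prefilter(original: str, prefix: str) -> "re.Pattern[str]":
    """Wrap `original` so it only matches where the text starts with `prefix`."""
    # (?=...) checks the prefix without consuming it, so the original
    # pattern still matches from the same position and keeps its length.
    return re.compile(f"(?={re.escape(prefix)})(?:{original})")


def all_matches(pattern: "re.Pattern[str]", text: str) -> list:
    """Return the full text of every match, regardless of inner groups."""
    return [m.group(0) for m in pattern.finditer(text)]


print(all_matches(prefilter("cat|car|bat", "ca"), "cat car bat"))
print(all_matches(prefilter("[a-z]{4}", "ca"), "cart bats cake"))
```

Since the lookahead is zero-width, the [a-z]{4} case still yields 4-letter matches, not 6-letter ones.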
OK, thanks to @Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works. However, I'm trying this with the great exrex tool to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than of the regex, which seems to work in other cases.
See @Nick's comment.
I've also raised an issue on the exrex GitHub for this.

UNIX: How would I grep in a script using a variable as a search parameter for a file?

Before I start: this isn't exactly how it seems, and I did search the web for a while before coming here. Basically, I have a script where the user passes in a string, which is stored in a variable. I then have to take that word and search a dictionary file for all the subwords that could be made from it. The problem I am having is that I need to make sure the words are at least 4 characters long. I do not have the best grasp of regular expressions. I'm aware of the techniques you can use; I just can't always piece them together logically. I will show you the line of code and explain why I think it should be this way. Then, could someone correct my logic? I am not looking for someone to send me the working line of code, but rather to correct my logic so I can understand better and derive the answer on my own.
words=$(grep -iE '(["$text"]{4,})' /usr/dict/words)
echo "$words"
For example if I pass in string college I should get output like
I am storing the command's output in another variable to echo. I am not sure exactly why; from what I saw online, most people seem rather fond of this. I am using grep with -i to ignore case and -E for extended regular expressions (or egrep). I believe the expression needs to be enclosed in single quotes and parentheses. $text is the variable I stored the user's input in. I know $ usually signifies the end of a line, [] is a range, and "" makes it read the variable rather than print what is literally there. Then {4,} means four or more characters, and the last part is the path to the file. Any input would be appreciated. Again, I do not like being spoon-fed answers; it's an easy way to learn nothing. I would just like corrections to my logic if at all possible. Thanks everyone!!
If by "subwords" you mean permutations of its letters, then your command is fine except for the quotes. Unfortunately you have to do it like this:
words=$(grep -iE '(['"$text"']{4,})' /usr/dict/words)
This way you pass to grep the single quoted string so that the shell doesn't interpret its special symbols. But at the same time you have to expand your $text var, thus you have to make a gap inside your single-quoted string, and in that gap place your variable in double quotes.
Hope I didn't spoil it for you.
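For anyone who wants to check what the resulting pattern accepts, here is a rough Python sketch of the same character-class idea. The function names and the word list are made up. Note two things it makes visible: a class like [college]{4,} allows letters to repeat, and an unanchored search (which is how grep treats a line) also accepts longer words that merely contain such a run.

```python
import re


def lines_containing_run(text: str, words: list) -> list:
    """Mimic grep -iE '[letters]{4,}': keep words containing a run of 4+ allowed letters."""
    pattern = re.compile(f"[{re.escape(text)}]{{4,}}", re.IGNORECASE)
    return [w for w in words if pattern.search(w)]


def whole_word_only(text: str, words: list) -> list:
    """Stricter variant: the entire word must consist of allowed letters."""
    pattern = re.compile(f"[{re.escape(text)}]{{4,}}", re.IGNORECASE)
    return [w for w in words if pattern.fullmatch(w)]


sample_words = ["cell", "legs", "glee", "colleges"]  # hypothetical dictionary lines
print(lines_containing_run("college", sample_words))
print(whole_word_only("college", sample_words))
```

In grep, the stricter behavior would correspond to anchoring the pattern with ^ and $ (or using -x); counting how often each letter may be used is beyond what a character class can express.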

Regular expression to extract argument in a LaTeX command

I have a large LaTeX document where I have defined a macro like this:
I want to get rid of it by replacing \abs{...} with \left|...\right| throughout the document, so I am thinking of using a regular expression. I am familiar with the basics, but I do not know how to find the bracket that closes the expression, given that the following situations are possible:
What I have been able to do so far is find \\abs\{([^\}]*)\} and replace it with \left|\1\right|, but this only handles code of the first kind.
By the way, I am using the TeXstudio regular expression engine.
Well, I did a little more research and managed to solve it. According to this response to a similar question, it suffices to use recursive regular expressions and a text editor that supports them, for example Sublime Text 2 (I could not do it with TeXstudio). This does the trick:
Find: \\abs\{(([^\{\}]|(?R))*)\}
Replace: \\left|\1\\right|
EDIT 1: Actually this solves only the first two cases but fails with the third, so any idea on how to improve the regular expression would be appreciated.
EDIT 2: See the comment from @CasimiretHippolyte for a full answer using \\abs\{((?>[^{}]+|\{(?1)\})*)\}
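If no editor with recursive regex support is at hand, the same replacement can be scripted. Python's built-in re module does not support recursion, so the sketch below swaps in a plain brace-counting scan instead of the recursive pattern. The function name replace_abs is made up, and the sketch assumes braces are balanced and not escaped as \{ or \}.

```python
def replace_abs(text: str) -> str:
    r"""Replace every \abs{...} with \left|...\right|, honoring nested braces."""
    marker = r"\abs{"
    out = []
    i = 0
    while i < len(text):
        if text.startswith(marker, i):
            # Walk forward, tracking brace depth, until the opening brace closes.
            depth = 1
            j = i + len(marker)
            while j < len(text) and depth > 0:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth == 0:
                inner = text[i + len(marker): j - 1]
                # Recurse so that \abs nested inside \abs is converted too.
                out.append(r"\left|" + replace_abs(inner) + r"\right|")
                i = j
                continue
        # Not a (balanced) \abs{...}: copy the character unchanged.
        out.append(text[i])
        i += 1
    return "".join(out)


print(replace_abs(r"\abs{\frac{a}{b}} + \abs{\abs{x}+y}"))
```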

Regular expression to remove comment

I am trying to write a regular expression which finds all the comments in text.
For example all between /* */.
/* Hello */
When I do this: /\*.*\*/, it behaves oddly and nothing is shown. What is wrong with it?
EDIT: The comments can be spread across multiple lines
Unlike the example posted above, you were trying to match comments that spanned multiple lines. By default, . does not match a line break. Thus you have to enable the mode in which . also matches line breaks (usually called single-line or dot-all mode; multi-line mode only changes how ^ and $ behave) to match multi-line comments.
Also, you probably need to use .*? instead of .*. Otherwise it will make the largest match possible, which will be everything between the first open comment and the last close comment.
I don't know how to enable that mode in Sublime Text 2. I'm not sure it is available as a mode. However, you can match the line break explicitly in the pattern. So, I would suggest this alternative:
If Sublime Text 2 doesn't recognize the \n, you could alternatively use CTRL + Enter to insert a line break in the pattern, in place of \n.
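Both points can be tried out quickly outside the editor. The following Python sketch uses a made-up sample string; re.DOTALL is Python's flag for letting . match line breaks, and it may be spelled differently in other engines.

```python
import re

# Hypothetical input: one single-line comment and one spanning two lines.
src = "/* one */ keep /* two\nlines */ end"

# Greedy .* runs from the first /* to the last */ and swallows "keep".
greedy = re.sub(r"/\*.*\*/", "", src, flags=re.DOTALL)

# Lazy .*? stops at the nearest */, so each comment is removed separately.
lazy = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)

# Without DOTALL, . stops at the line break and the second comment survives.
lazy_no_dotall = re.sub(r"/\*.*?\*/", "", src)

print(repr(greedy))
print(repr(lazy))
print(repr(lazy_no_dotall))
```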
I encountered this problem several years ago and wrote an entire article about it.
If you don't have access to non-greedy matching (not all regex libraries support non-greedy) then you should use this regex:
If you do have access to non-greedy matching then you can use:
Also, keep in mind that regular expressions are just a heuristic for this problem. Regular expressions don't support cases in which something appears to be a comment to the regular expression but actually isn't:
someString = "An example comment: /* example */";
// The comment around this code has been commented out.
// /*
// */
Just want to add that for HTML comments it is this:
Just an additional note about using regex to remove comments inside a programming language file.
When doing this, you must not forget the case where the string /* or */ appears inside a string literal in the code, like var string = "/*"; (you never know, especially if you parse a large codebase that is not yours)!
So the best approach is to parse the document with a programming language and keep a boolean that tracks whether a string is currently open (and ignore any match inside an open string).
Also, a string delimited by " can contain \", so be careful with the regex!
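A minimal sketch of that boolean-state idea, written in Python: the function name strip_block_comments is made up, and the sketch only knows about double-quoted strings and /* */ comments. Single-quoted strings, // comments and regex literals are deliberately left out, so treat it as an illustration rather than a complete parser.

```python
def strip_block_comments(src: str) -> str:
    """Remove /* ... */ comments, ignoring comment markers inside "..." strings."""
    out = []
    i = 0
    n = len(src)
    in_string = False
    while i < n:
        ch = src[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not close the string.
                out.append(src[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif src.startswith("/*", i):
            # Skip to just past the closing */ (or to the end if it is missing).
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_block_comments('var s = "/*"; /* gone */ var t = 1;'))
```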
You cannot write a regular expression that would be able to correctly find all comments, or even one type of comment, single-line or multi-line.
Regular expressions can only provide a partial match, one that would cover perhaps 90% of all cases, but that's it.
The syntax of regular expression literals is so complex that they can only be identified correctly in 100% of cases by doing a full expression evaluation, which in turn is based on tokenizing the code. The latter is a huge task, which is implemented by all AST parsers today. See AST Explorer.
Only a properly written AST parser can tell you precisely where all regular expressions are located in your code. You would then have to write a parser based on that.
Or, you could use one of the existing libraries that already do all that, like decomment.
RegEx examples where any head-on approach is going to stumble, being unable to tell a regular expression from a comment block:
/\// - it will think this reg-ex is a single-line comment
/\/*/ - it will think this reg-ex opens a multi-line comment
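The second case is easy to reproduce. In this Python sketch the JavaScript line is a made-up sample, and the naive pattern is the common non-greedy block-comment regex; it is only meant to show the failure, not to suggest a fix.

```python
import re

# Hypothetical JavaScript source: a regex literal /\/*/ followed by a real comment.
js_source = r"var re = /\/*/; var x = 1; /* real comment */ var y = 2;"

# The naive approach: treat everything from /* to the nearest */ as a comment.
naive = re.sub(r"/\*.*?\*/", "", js_source, flags=re.DOTALL)

# The "/*" inside the regex literal is taken as a comment opener, so the
# removal runs up to the real comment's */ and deletes "var x = 1;" with it.
print(naive)
```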
The answer which user1919238 wrote works. Just corroborating that here, although the many upvotes probably do give you a clue.
It got rid of all these annoying block comments, put here just to show the usefulness/thank user1919238 for saving time:
/*# sourceMappingURL=data:application/json;base64,eyJ2ZXJzaW9uIjozLCJzb3VyY2VzIjpbIndlYnBhY2s6Ly9zdHlsZXMvZ2xvYmFscy5jc3MiXSwibmFtZXMiOltdLCJtYXBwaW5ncyI6IkFBQUE7O0VBRUUsVUFBVTtFQUNWLFNBQVM7RUFDVDt3RUFDc0U7QUFDeEU7O0FBRUE7RUFDRSxjQUFjO0VBQ2QscUJBQXFCO0FBQ3ZCOztBQUVBO0VBQ0Usc0JBQXNCO0FBQ3hCIiwic291cmNlc0NvbnRlbnQiOlsiaHRtbCxcbmJvZHkge1xuICBwYWRkaW5nOiAwO1xuICBtYXJnaW46IDA7XG4gIGZvbnQtZmFtaWx5OiAtYXBwbGUtc3lzdGVtLCBCbGlua01hY1N5c3RlbUZvbnQsIFNlZ29lIFVJLCBSb2JvdG8sIE94eWdlbixcbiAgICBVYnVudHUsIENhbnRhcmVsbCwgRmlyYSBTYW5zLCBEcm9pZCBTYW5zLCBIZWx2ZXRpY2EgTmV1ZSwgc2Fucy1zZXJpZjtcbn1cblxuYSB7XG4gIGNvbG9yOiBpbmhlcml0O1xuICB0ZXh0LWRlY29yYXRpb246IG5vbmU7XG59XG5cbioge1xuICBib3gtc2l6aW5nOiBib3JkZXItYm94O1xufVxuIl0sInNvdXJjZVJvb3QiOiIifQ== */
If you want to remove the obnoxious comments from a Flutter main.dart:
Press Cmd+R on Mac or Ctrl+R on Windows.
Type //.* into the upper box and leave the lower box empty.
Click .* in the replace dialog to activate regex mode.
Then click Replace All. This removes all line comments, and you can do the same in any file of a Flutter project.
Additionally, to reformat main.dart:
Press Cmd+A on Mac or Ctrl+A on Windows,
then press Cmd+Alt(Option)+L or Ctrl+Alt+L. This reformats the code.
I will attach a picture of main.dart; the green .* at the top of the page is what you press to activate regex mode.
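What that find-and-replace does can be sketched in Python with a made-up Dart snippet. One caveat the sketch makes visible: //.* also matches the // inside URLs in string literals, so it is worth reviewing the result before saving.

```python
import re

# Hypothetical Dart source with a line comment and a URL inside a string.
dart_source = 'void main() { // entry point\n  var url = "https://example.com";\n}\n'

# Same pattern as in the replace dialog: // and everything up to the end of the line.
stripped = re.sub(r"//.*", "", dart_source)

print(stripped)
```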