sublimetext: using regular expression for replacing string

I'm trying to clean up a long CSV file using Sublime Text instead of Excel.
I've created a regular expression which uses some greedy expressions like
The search pattern works fine, but when it comes to replacing everything with a blank string, Sublime returns an error in the bottom bar that I can barely read, as it disappears immediately without anything happening.
I do suspect it's an error, and I think I read something about a "too generic" rule.
OK, Sublime Text uses a particular syntax for regular expressions that differs slightly from the one used in coding.
In my specific circumstance, to find a domain in a string using a greedy expression that includes the carriage return (useful for cleaning a huge amount of rubbish out of an SEO backlinks spreadsheet), I ended up using the following.
Dots don't require escaping ... no need to add the start and end of string anchors ^$ ... it simply works and I didn't spend time investigating the reasons.
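For comparison outside the editor, here is a minimal Python sketch of the same kind of cleanup: deleting every line that mentions a domain, together with its line break. The domain and the function name are hypothetical, since the pattern actually used in Sublime Text is not shown above. Unlike the editor experience described, the dot is escaped here (via re.escape) so that it only matches a literal dot.

```python
import re

# Hypothetical domain; the real pattern from the post is not shown.
DOMAIN = "spam-example.com"


def drop_lines_with_domain(text: str, domain: str = DOMAIN) -> str:
    """Delete every line that mentions `domain`, including its line break."""
    pattern = re.compile(
        # [^\r\n]* stays on one line; the tail consumes CRLF, LF, or end of text.
        r"^[^\r\n]*" + re.escape(domain) + r"[^\r\n]*(?:\r?\n|$)",
        flags=re.MULTILINE,
    )
    return pattern.sub("", text)


sample = "a,good.org\r\nb,spam-example.com,x\r\nc,ok.net\r\n"
cleaned = drop_lines_with_domain(sample)
```

Consuming the line break as part of the match is what avoids leaving empty rows behind in the CSV.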


VBA regex vs other regexes - I don't understand why it's doing that?

So, I have been working with JS previously and I have some regexes that I have tested with regexr and regex101, and they work great. However, when I decided to use one of those expressions in VBA, it stopped working properly.
The original JS regex:
(?: AT FOLLOWING LOCATION[ \w]+\n)?(?: +FX -[\s\S]*?)((?:MP[ ]+[\d.\w]+[ -]+?[\w\b .\/]+)(?:[ -]+[\w\b .\/]+$)?)[\s\S]*?(?:\d+\.+\d\d[\) ]+(?:FT.? *\n|IN.? *\n)|USE X AT MP +\d{1,3}\.\d{1,3}|(?=\n {4}\w)|(?=\n {2}\()|$)
For the sake of our discussion, let's use the following as a match sample:
The VBA regex I initially tested was the same, and it did not work. So I started looking further, tested the chunks one at a time, and noticed that the problem seems to lie in \n. For the example above, it was (?:[ -]+[\w\b .\/]+$)? that failed; I tested [ -]+[\w\b .\/]+ and it worked, while [ -]+[\w\b .\/]+\n did not (the g, i and m flags were enabled). What I don't understand is why it worked with JS and other engines, since it looks like \n is legal in VBA regex?
And, more importantly, beyond understanding why it behaves the way it does, what would be the best way to make it work?
Edit 1:
Based on comments, I have changed \n to [\r\n]+ instead. With that, it worked with my test strings, which use vbCrLf as line breaks. However, when applied to the actual document, it no longer worked. The line breaks in the text read from the document show up as an up arrow when displayed in the Immediate window via Debug.Print. I tried highlighting one, but when I do so it changes from an up arrow to a blank square (like one of those unreadable characters). I tried copying the document text over to Notepad++ so that I could read the symbols better, and there they appear as CrLf, but I don't know whether the clipboard changed anything. Word itself shows the symbol as a soft return instead of a hard return. What am I still missing?
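One thing worth checking, sketched here in Python rather than VBA: Word is commonly reported to store a manual line break (soft return) as a vertical tab, character code 11, which [\r\n]+ does not cover. Assuming that is what the document text contains (the post does not confirm it), normalising every kind of break to \n before matching lets the original \n-based pattern stay unchanged. The function name is illustrative only.

```python
import re

# CRLF, lone CR, and vertical tab (code 11, assumed to be Word's soft return)
# are each rewritten to a single LF. CRLF is listed first so it is not split.
ANY_BREAK = re.compile(r"\r\n|[\r\x0b]")


def normalize_breaks(text: str) -> str:
    """Return `text` with every line-break variant replaced by '\\n'."""
    return ANY_BREAK.sub("\n", text)


raw = "MP 1.2 - FOO\x0bFX - bar\r\nMP 3.4\rend\n"
normalized = normalize_breaks(raw)
```

The same idea should carry over to VBA by replacing Chr(11) and vbCr in the string before running the regex, or by widening the class in the pattern to include \x0B.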

Why isn't Atom recognizing my regular expressions?

I'm using Atom to format some text data for analysis (I know there are probably better ways of doing it than this so I'm all ears) but it doesn't seem to be recognizing my regular expression.
The text is POS tagged tokens with sentences being delineated with newlines, formatted as such:
I was able to replace all of the tabs (\t) with a forward slash (/) no problem, but I'm now trying to replace all newlines that DON'T delineate sentences with just a space. I tried \S\n and it "wasn't found". I also tried to highlight all delineating newlines with ^\n$ but there were only two matches, both at the end of the document.
Am I doing this wrong? My only usage of regex is with Python, so maybe there's just a different way to do it in Atom.
EDIT: I'm just giving up and am going to use Python to process it. Nothing suggested worked. The search function seemed to be bugging out in general (e.g. one search would not work, but if I closed the search panel and reopened it, the same search would work), presumably because it's a long file (700,000+ lines) even though it is not a large file data-wise (6,235 KB). If anyone can recommend a text editor for large files, though, it would be appreciated.
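Since the plan ended up being Python anyway, here is a minimal sketch of that step. It assumes sentences are separated by a blank line (the sample format is not shown above, so that is a guess) and that the file may use Windows line endings, which is one possible reason \S\n found nothing. The function name is made up for illustration.

```python
import re


def join_token_lines(text: str) -> str:
    """Turn newlines inside a sentence into spaces; keep blank-line separators."""
    # Normalise CRLF and lone CR to LF so the lookarounds see one break style.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A newline with no newline directly before or after it sits between
    # two tokens of the same sentence.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


tagged = "The/DT\r\ncat/NN\r\n\r\nIt/PRP\r\nsat/VBD"
joined = join_token_lines(tagged)
```

A file of a few megabytes fits comfortably in memory, so reading it whole and applying one substitution should be fine here.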

Find and replace with regular expression in Notepad++

At the moment, I have a PHP function that gets the contents of a CSV file and puts it into a multi-dimensional array, which contains text that I print out in various places, using the indexes.
An example of use would be:
The first index, [index], would be the name of the page. The second index [pageText] would indicate what it is (text for the page). The third index, [conceptQualityText] indicates what the actual text is. The last index, [$lang] gets the text in the desired language.
->page location
->what is it
->the content
->what language it should be displayed in.
This all worked fine in previous PHP versions. However, after upgrading to 7.2, PHP seems to be a bit stricter. I was a bit more green ~2 years ago when I first made this solution, and I now know that since these indexes aren't written as strings, e.g. enclosed in single quotes like so: ['index'], they fit the notation of a constant (as created with define()). I didn't give it much thought back then, but PHP now interprets them as constants, and so I get the error that x word is an undefined constant.
My initial thought is to make a search and replace on my example string:
using the regular expression functionality in Notepad++.
However, the example is just one of many, the notation of the array indexing is basically:
So my question is:
How can I make use of the Notepad++ search and replace, using a regular expression, so that my index pointers become strings instead of being treated as constants?
e.g. make:
I will need some sort of logic that checks for whatever is inside the brackets and encapsulates them with single quotes, except for the last index, [$lang].
I tried to give as much information as possible, let me know if anything needs to be elaborated.
I tried to refer to these docs without much luck.
I found a solution using
find: \b(localText\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)
replace: $1'$2'$3'$4'$5'$6'
and it works like a charm. Thanks to everyone who took the time to help.
You can use the following regex to match:
The regex matches a word between square brackets unless it is quoted.
Replace with:
The regex will not match the last brackets because it contains a '$' sign.
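The idea described here can be sketched in Python as follows. This is not the pattern from the answer (which is not shown above), just one pattern with the same properties: it matches a bare word in square brackets, skips indexes that are already quoted, and skips [$lang] because $ is not in the character class. The names BARE_INDEX and quote_bare_indexes are illustrative.

```python
import re

# A bare identifier directly inside [ ]: no quotes, no $, not starting with a digit.
BARE_INDEX = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def quote_bare_indexes(php_source: str) -> str:
    """Wrap every bare-word array index in single quotes."""
    return BARE_INDEX.sub(r"['\1']", php_source)


before = "$localText[index][pageText][conceptQualityText][$lang]"
after = quote_bare_indexes(before)
```

In Notepad++ the equivalent would likely be the same find pattern with ['$1'] as the replacement. Note that a pattern this general also touches bracketed words inside strings or other regexes in the file, so it is worth reviewing the matches before replacing all.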

Regular expression to remove comment

I am trying to write a regular expression which finds all the comments in text.
For example, everything between /* and */.
/* Hello */
When I do this: /\*.*\*/, it behaves oddly and nothing is shown. What is wrong with it?
EDIT: The comments can be spread across multiple lines
Unlike the example posted above, you were trying to match comments that spanned multiple lines. By default, . does not match a line break. Thus you have to enable the mode in which . also matches line breaks (usually called "single-line" or "dot-all" mode, not to be confused with "multi-line" mode, which only changes ^ and $) to match multi-line comments.
Also, you probably need to use .*? instead of .*. Otherwise it will make the largest match possible, which will be everything between the first open comment and the last close comment.
I don't know how to enable that mode in Sublime Text 2. I'm not sure it is available as a mode. However, you can insert a line break into the actual pattern by using CTRL + Enter. So, I would suggest this alternative:
If Sublime Text 2 doesn't recognize the \n, you could alternatively use CTRL + Enter to insert a line break in the pattern, in place of \n.
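To make the two points above concrete (dot not matching line breaks, and greedy versus non-greedy), here is a small Python sketch. It uses Python's re.DOTALL flag, which is Python's name for the mode in which . matches newlines. It says nothing about how Sublime Text exposes that mode.

```python
import re

text = "a /* one\ntwo */ b /* three */ c"

# Non-greedy: each comment is matched on its own.
lazy = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

# Greedy: one match from the first /* to the last */, eating " b " as well.
greedy = re.sub(r"/\*.*\*/", "", text, flags=re.DOTALL)

# Without DOTALL the first comment is not matched at all, because . stops
# at the line break; only the single-line comment is removed.
no_flag = re.sub(r"/\*.*?\*/", "", text)
```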
I encountered this problem several years ago and wrote an entire article about it.
If you don't have access to non-greedy matching (not all regex libraries support non-greedy) then you should use this regex:
If you do have access to non-greedy matching then you can use:
Also, keep in mind that regular expressions are just a heuristic for this problem. Regular expressions don't support cases in which something appears to be a comment to the regular expression but actually isn't:
someString = "An example comment: /* example */";
// The comment around this code has been commented out.
// /*
// */
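As an illustration of both points, here is a Python sketch. The pattern below is one commonly cited way to match block comments using only greedy quantifiers and negated character classes. It is not necessarily the pattern the answer refers to. It is then applied to the string example above to show the false positive.

```python
import re

# /* , then non-stars, then stars, repeated until a star run is followed by /.
# Negated classes match newlines, so no dot-all flag or lazy matching is needed.
UNROLLED_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

multi_line = UNROLLED_COMMENT.sub("", "x /* a\n * b\n */ y")

# The heuristic cannot tell that this "comment" lives inside a string literal.
code = 'someString = "An example comment: /* example */";'
damaged = UNROLLED_COMMENT.sub("", code)
```

The second result has lost part of the string's contents, which is exactly the kind of case a plain regex cannot rule out.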
Just want to add that for HTML comments it is this:
Just an additional note about using regex to remove comments from a programming language file.
When doing this, you must not forget the case where the text /* or */ appears inside a string in the code, like var string = "/*"; (you never know, if you are parsing a large codebase that is not yours)!
So the best approach is to parse the document with a programming language and keep a boolean that records whether a string is currently open (and ignore any match inside an open string).
Also, a string delimited by " can contain a \", so be careful with the regex!
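A minimal sketch of that approach in Python, assuming only double-quoted strings and /* */ comments matter. Single-quoted strings, template literals, // comments and regex literals are deliberately not handled, so this is a starting point rather than a full solution. The function name is made up.

```python
def strip_comments_outside_strings(source: str) -> str:
    """Remove /* */ comments, leaving anything inside "..." strings alone."""
    out = []
    i = 0
    n = len(source)
    in_string = False  # the boolean that records an open "..." literal
    while i < n:
        ch = source[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character too, so \" does not close the string.
                out.append(source[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif source.startswith("/*", i):
            end = source.find("*/", i + 2)
            # An unterminated comment swallows the rest of the input.
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


js = 'var s = "/*"; /* gone */ var t = "a\\"*/";'
result = strip_comments_outside_strings(js)
```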
You cannot write a regular expression that would be able to correctly find all comments, or even one type of comments - single-line or multiline.
Regular expressions can only provide a partial match, one that would cover perhaps 90% of all cases, but that's it.
The syntax for regular expression literals is so complex that it is only possible to identify them correctly in 100% of cases by doing a full expression evaluation, which in turn is based on tokenizing the code. The latter is a huge task, which is implemented by all AST parsers today. See AST Explorer
Only a properly written AST parser can tell you precisely where all regular expressions are located in your code. You would then have to write a parser based on that.
Or, you could use one of the existing libraries that already do all that, like decomment.
RegEx examples where any head-on approach is going to stumble, being unable to tell a regular expression from a comment block:
/\// - it will think this reg-ex is a single-line comment
/\/*/ - it will think this reg-ex opens a multi-line comment
The answer which user1919238 wrote works. Just corroborating that here, although the many upvotes probably do give you a clue.
It got rid of all these annoying block comments, put here just to show the usefulness/thank user1919238 for saving time:
/*# sourceMappingURL=data:application/json;base64,eyJ2ZXJzaW9uIjozLCJzb3VyY2VzIjpbIndlYnBhY2s6Ly9zdHlsZXMvZ2xvYmFscy5jc3MiXSwibmFtZXMiOltdLCJtYXBwaW5ncyI6IkFBQUE7O0VBRUUsVUFBVTtFQUNWLFNBQVM7RUFDVDt3RUFDc0U7QUFDeEU7O0FBRUE7RUFDRSxjQUFjO0VBQ2QscUJBQXFCO0FBQ3ZCOztBQUVBO0VBQ0Usc0JBQXNCO0FBQ3hCIiwic291cmNlc0NvbnRlbnQiOlsiaHRtbCxcbmJvZHkge1xuICBwYWRkaW5nOiAwO1xuICBtYXJnaW46IDA7XG4gIGZvbnQtZmFtaWx5OiAtYXBwbGUtc3lzdGVtLCBCbGlua01hY1N5c3RlbUZvbnQsIFNlZ29lIFVJLCBSb2JvdG8sIE94eWdlbixcbiAgICBVYnVudHUsIENhbnRhcmVsbCwgRmlyYSBTYW5zLCBEcm9pZCBTYW5zLCBIZWx2ZXRpY2EgTmV1ZSwgc2Fucy1zZXJpZjtcbn1cblxuYSB7XG4gIGNvbG9yOiBpbmhlcml0O1xuICB0ZXh0LWRlY29yYXRpb246IG5vbmU7XG59XG5cbioge1xuICBib3gtc2l6aW5nOiBib3JkZXItYm94O1xufVxuIl0sInNvdXJjZVJvb3QiOiIifQ== */
If you want to remove the obnoxious comments from a Flutter main.dart:
Press Cmd+R on Mac or Ctrl+R on Windows,
type //.* into the box above and leave the box below empty,
click .* in the replace dialog to activate regex,
then click Replace All. This will remove all your comments; you can do this to remove all comments in any file in a Flutter project.
Additionally, to reformat main.dart:
press Cmd+A on Mac or Ctrl+A on Windows,
then press Cmd+Alt(Option)+L or Ctrl+Alt+L; this will reformat the code.
I will attach a picture of main.dart; the green .* at the top of the page is what you press to activate the regex.
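One caution about the //.* pattern, shown as a small Python sketch: it removes everything after any // on a line, including the tail of a URL inside a string literal. The sample lines are made up for illustration.

```python
import re

LINE_COMMENT = re.compile(r"//.*")


def strip_line_comments_naively(source: str) -> str:
    """Delete from every // to the end of its line, with no string awareness."""
    return LINE_COMMENT.sub("", source)


ok = strip_line_comments_naively("int a = 1; // counter\n")
broken = strip_line_comments_naively('final url = "https://example.com";\n')
```

The first line is cleaned as intended, while the second is left with an unterminated string, so it is worth checking the matches before pressing Replace All on a file that contains URLs.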

negative look ahead to exclude html tags

I'm trying to come up with a validation expression to prevent users from entering html or javascript tags into a comment box on a web page.
The following works fine for a single line of text:
...but it won't allow any newline characters because of the dot (.). If I go with something like this:
it will allow multiple lines but the expression only matches '<' and '>' on the first line. I need it to match any line.
This works fine:
but it's ugly and I'm concerned that it's going to break for some users because it's a multi-lingual application.
Any ideas? Thanks!
Note that your RE prevents users from entering < and >, in any context. "2 > 1", for example. This is very undesirable.
Rather than trying to use regular expressions to match HTML (which they aren't well suited to do), simply escape < and > by transforming them to &lt; and &gt;. Alternatively, find a package for your language-of-choice that implements whitelisting to allow a limited subset of HTML, or that supports its own markup language (I hear markdown is nice).
As for "." not matching newline characters, some regexp implementations support a flag (usually "m" for "multi-line" and "s" for "single line"; the latter causes "." to match newlines) to control this behavior.
The first two are basically equivalent to /^[^<>]*$/, except this one works on multiline strings. Any reason why you didn't write the RE that way?
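The difference between the dot-based pattern and the negated character class can be checked with a short Python sketch. Python's re.DOTALL plays the role of the "s" flag mentioned above. The lookahead pattern is a guess at the shape of the original expressions, which are not shown in the question, and both function names are illustrative.

```python
import re


def has_no_angle_brackets(comment: str) -> bool:
    # A negated class matches newlines too, so no flag is needed.
    return re.fullmatch(r"[^<>]*", comment) is not None


def passes_lookahead_check(comment: str, dotall: bool) -> bool:
    # Rejects anything that looks like <tag>; needs DOTALL to see past line 1.
    flags = re.DOTALL if dotall else 0
    return re.fullmatch(r"(?!.*<\w+>).*", comment, flags) is not None


two_lines = "line one\nline two"
```

Without the flag, the dot-based version rejects any multi-line input outright. With it, the pattern accepts clean multi-line text and still allows "2 > 1", which the negated class does not.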
So, I looked into it and there is a .NET 'Singleline' option for regular expressions that causes "." to also match the newline character. Unfortunately, this isn't available in the ASP.NET RegularExpressionValidator. As far as I can see, there's no way to make something like ^(?!.*(<\w+>)).*$ work on a multi-line textbox without doing server-side validation.
I took your advice and went the route of escaping the tags on the server side. This requires setting the validation page directive to 'false' but in this particular instance that isn't a big deal because the comment box is really the only thing to worry about.
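For readers who want to see what the escaping route looks like, here is a minimal Python sketch using the standard library's html.escape. The thread itself is about ASP.NET, so this only illustrates the idea, and the function name is made up.

```python
import html


def render_comment(raw_comment: str) -> str:
    """Escape &, <, > and quotes so the comment is displayed as plain text."""
    return html.escape(raw_comment, quote=True)


shown = render_comment('2 > 1 and <script>alert("x")</script>')
```

The user can still type "2 > 1", and any markup is rendered as text instead of being interpreted by the browser.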