Visual Studio Find and Replace Regular Expressions help - regex

I'd like to replace some assignment statements like:
int someNum = txtSomeNum.Text;
int anotherNum = txtAnotherNum.Text;
with
int someNum = Int32.Parse(txtSomeNum.Text);
int anotherNum = Int32.Parse(txtAnotherNum.Text);
Is there a good way to do this with Visual Studio's Find and Replace, using Regular Expressions? I'm not sure what the Regular expression would be.

I think in Visual Studio, you can mark expressions with curly braces {txtSomeNum.Text}. Then in the replacement, you can refer to it with \1. So the replacement line would be something like Int32.Parse(\1).
Update (via @Timothy003):
VS 11 does away with the {} \1 syntax and uses () $1 instead.
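For anyone who wants to try the capture-and-reference idea outside the IDE, here is a minimal sketch in Python. It is only an illustration: Python's re module uses () for groups like newer Visual Studio versions do, but writes the back-reference as \1 rather than $1, and the sample input is simply the two statements from the question.

```python
import re

# Sample input: the two assignment statements from the question.
source = """int someNum = txtSomeNum.Text;
int anotherNum = txtAnotherNum.Text;"""

# (\w+\.Text) captures the right-hand side; \1 in the replacement refers
# back to it (newer Visual Studio versions would write $1 instead).
pattern = re.compile(r"= (\w+\.Text)")
converted = pattern.sub(r"= Int32.Parse(\1)", source)

print(converted)
```

The trailing semicolon is not part of the match, so it is left in place after the closing parenthesis.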

Comprehensive guide
http://blog.goyello.com/2009/08/22/do-it-like-a-pro-%E2%80%93-visual-studio-find-and-replace/

This is what I was looking for:
Find: = {.*\.Text}
Replace: = Int32.Parse(\1)

Better regex for the original problem would be
find expr.: {:i\.Text}
replace expr.: Int32.Parse(\1)
Check out:
http://msdn.microsoft.com/en-us/library/2k3te2cs%28v=vs.100%29.aspx
for the definitive guide to regex in VS.
I recently completed reformatting another programmer's C++ project from hell. He had completely and arbitrarily entered, or left out at random, spaces and tabs, indentation (or not), and an insane level of parentheses nesting, such that none of us used to coding standards of any type could even begin to read the code before I started. Used regex extensively to find and correct abnormal constructs. In a couple of hours, I was able to correct major problems in approximately 125,000 lines of code without actually looking at most of them. In one particular single find/replace I changed more than 22,000 lines of code in 125 files, total time under 10 seconds.
Particularly useful constructs in the regex:
:b+ == one or more blanks and/or tabs.
:i == matches a C-style variable name or keyword (e.g. while, if, pick3, bNotImportant)
:Wh == any whitespace character, not just blank or tab
:Sm == any of the arithmetic symbols (+, -, >, =, etc.)
:Pu == any punctuation mark
\n == line break (useful for finding where he had inserted 8 or 10 blank lines)
^ == matches start of line ($ to match end)
While it would have been nice to match some other regex standard (duh), I did find a number of the MS extensions extremely useful for searching a code base, such as not having to spell out 'identifier' hundreds of times as something like "[A-Za-z_][A-Za-z0-9_]*", instead just using ":i".
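If you need the same kind of clean-up outside the old Visual Studio dialog, the shorthands above can be approximated in a standard regex flavour. The sketch below uses Python; the two constants are my own rough stand-ins for :i and :b+ (assumptions, not official definitions), and both helper functions are hypothetical examples of the sort of fix described above.

```python
import re

# Assumed stand-ins for two of the legacy Visual Studio shorthands.
VS_IDENTIFIER = r"[A-Za-z_$][A-Za-z0-9_$]*"  # roughly :i
VS_BLANKS = r"[ \t]+"                        # roughly :b+


def collapse_blank_runs(text: str) -> str:
    """Squeeze runs of three or more line breaks down to one blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)


def normalize_assignment_spacing(line: str) -> str:
    """Rewrite 'name<blanks>=<blanks>' as 'name = '."""
    return re.sub(rf"({VS_IDENTIFIER}){VS_BLANKS}={VS_BLANKS}", r"\1 = ", line)


print(normalize_assignment_spacing("count \t=    5;"))
```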

Related

RegEx Expression for Eclipse that searches for all items that have not been dealt with

To help stop SQL Injection attacks, I am going through about 2000 parameter requests in my code to validate them. I validate them by determining what type of value (e.g. integer, double) they should return and then applying a function to them to sanitize the value.
Any requests I have dealt with look like this
*SecurityIssues.*(request.getParameter
where * signifies any number of characters on the same line.
What RegExp expression can I use in the Eclipse search (CTRL+H) which will help me search for all the ones I have not yet dealt with, i.e. all the times that the text request.getParameter appears when it is not preceded by the word SecurityIssues?
Examples for matches
The regular expression should match each of the following e.g.
int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))
double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))
int c = request.getParameter("DUMMY")
But should not match:
int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))
With inspiration and the links provided by @michaeak (thank you), as well as testing in https://regex101.com/, I appear to have found the answer:
^((?!SecurityIssues).)*(request\.getParameter)
The advantage of this answer is that I can blacklist the word SecurityIssues, as opposed to having to whitelist the formats that I do want.
Note, that it is relatively slow, and also slowed down my computer a lot when performing the search.
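A quick way to sanity-check the pattern away from Eclipse is to run it over the example lines from the question. This is only a sketch in Python, whose re module accepts the same lookahead syntax for this expression; the list of lines is copied from the examples above.

```python
import re

# At every position before request.getParameter, the negative lookahead
# (?!SecurityIssues) must succeed, so the word cannot appear earlier on the line.
UNHANDLED = re.compile(r"^((?!SecurityIssues).)*(request\.getParameter)")

lines = [
    'int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))',
    'double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))',
    'int c = request.getParameter("DUMMY")',
    'int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))',
]

unhandled = [line for line in lines if UNHANDLED.search(line)]
for line in unhandled:
    print(line)
```

The lookahead is evaluated once per character, which is consistent with the search feeling slow on a large code base.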
Try e.g.
=\s*?((?!SecurityIssues).)*?(request\.getParameter)\(
Notes
Parentheses ( and ) are special characters used for grouping. To match them literally, they need to be escaped with \.
.* is greedy: it matches as much as it can, including characters that you don't want it to match. .*? is reluctant: it matches as little as possible. This can be helpful if other items need to match after the wildcard.
There is a tutorial at https://docs.oracle.com/javase/tutorial/essential/regex/index.html , I think all of these should be available in eclipse. You can then deal with generic replacement also.
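The greedy versus reluctant note is easier to see on a line with two calls. A small sketch in Python (the sample line is made up for illustration; the quantifier behaviour is the same in Java's regex engine):

```python
import re

# Hypothetical line containing two request.getParameter calls.
text = "a = f(request.getParameter(x)) + g(request.getParameter(y))"

# Greedy: .* runs to the end of the line and backtracks to the LAST call.
greedy = re.search(r"=.*request\.getParameter", text).group(0)

# Reluctant: .*? expands one character at a time and stops at the FIRST call.
reluctant = re.search(r"=.*?request\.getParameter", text).group(0)

print(greedy)
print(reluctant)
```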
Problem
From reading Regular expression that doesn't contain certain string and Regular expression to match a line that doesn't contain a word? it seems quite difficult to create a regex matching anything but not to contain a certain word.

Visual Studio: Find / Replace using regular expression to replace

I'm using the Find / Replace tool of Visual Studio to find something using regular expressions and replace it. I have this in the find: Assert.IsTrue\(([^,;]*)\) *; and this in the replace: Assert.IsTrue($1, "$1");. This looks for every Assert.IsTrue(); with anything in the parentheses except commas , and semicolons ;, and then adds whatever was in the parentheses again, inside quotes and after a comma. So, if I have Assert.IsTrue(wtv); it will be replaced with Assert.IsTrue(wtv, "wtv");.
The problem is when the wtv has quotes or line breaks, so if I have
Assert.IsTrue("wtv" == "wtv") it will be replaced to
Assert.IsTrue("wtv" == "wtv", ""wtv" == "wtv"") and
Assert.IsTrue(wtv ||
wtv2)
will be replaced to
Assert.IsTrue(wtv ||
wtv2, "wtv ||
wtv2")
What I want to do is remove the line breaks (\r\n) and the quotes from the quoted copy in the replacement, so that the results after the replacement are
Assert.IsTrue("wtv" == "wtv", "wtv == wtv") and
Assert.IsTrue(wtv ||
wtv2, "wtv ||wtv2")
First I'll clarify that this doesn't really solve the problem; it is just a nasty workaround, not a real solution. I post it just in case someone needs a workaround as I do (I doubt it, but well). Still, as this is not the real answer I'll not mark it as such (unless someone explains to me that a real answer is not possible), and new answers are always welcome.
What I did, in the part that needs a regex, was add several groups: ([^,;"\r\n]*) first looks for anything that is not a comma, semicolon, quote or new-line, then (["\r\n]*) looks for quotes or new-lines, and then this pattern is repeated several times.
Because every group uses *, each one matches zero or more times, and the pattern is repeated in case there is more than one quote or more than one new-line (note that if there are none, that's not a problem since I'm using *). The replace then looks like
Assert.IsTrue($1$2$3..., "$1$3$5...");
where in the first argument I put all the numbers, and in quotes I put only the odd numbers, since the even ones are either empty or quotes / new-lines.
I used 31 of these, so if there are more than 15 runs of quotes / new-lines, it will not be found and replaced.
The find
Assert.IsTrue\(([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)(["\r\n]*)([^,;"\r\n]*)\) *;
The replace
Assert.IsTrue($1$2$3$4$5$6$7$8$9$10$11$12$13$14$15$16$17$18$19$20$21$22$23$24$25$26$27$28$29$30$31, "$1$3$5$7$9$11$13$15$17$19$21$23$25$27$29$31");
This works for the examples I provided and for any example with no more than 15 runs of quotes / new-lines. If I can come up with something better (since this is a really crappy solution), I'll add it here.
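If running a small script over the files is an option, a replacement callback avoids the repeated groups entirely. This is not something the Visual Studio dialog can do; it is a sketch in Python, reusing the find pattern from the question, with function names made up for the example.

```python
import re

# Same find pattern as in the question: anything but commas and semicolons
# inside the parentheses (line breaks are allowed by the negated class).
ASSERT_CALL = re.compile(r"Assert\.IsTrue\(([^,;]*)\) *;")


def add_message(match: re.Match) -> str:
    condition = match.group(1)
    # Strip quotes and line breaks only from the copy used as the message.
    message = re.sub(r'["\r\n]', "", condition)
    return f'Assert.IsTrue({condition}, "{message}");'


def add_messages(source: str) -> str:
    return ASSERT_CALL.sub(add_message, source)


print(add_messages('Assert.IsTrue("wtv" == "wtv");'))
```

Calls that already contain a comma do not match the pattern, so they are left unchanged.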

visual Studio 2010 regular expressions for 'Find In Files'

I have looked at the many stackoverflow posts concerning VS regular expressions and read the Microsoft page concerning regular expressions, but I still cannot determine where I am going wrong.
Microsoft VS regex
I want to find all lines which include the word, attribute, but which are not comment lines (do not contain the // symbol).
I have tried using the regular expression
~(^ *//).*attribute.*
meaning:
~(^ *//) --> exclude lines which begin with '//' preceded by zero or more spaces
.* --> match any character zero or more times
attribute --> match the word attribute
.* --> match any character that comes after the word attribute
I have tried several other regular expressions with about the same amount of failure. I am wondering if anyone can spot something obvious that I am not doing.
I also gave the below a try:
~( *//).*attribute.* (thinking maybe the caret was being taken as a literal instead of special)
~(//).*attribute.* (thinking maybe the * was being taken as a literal instead of special)
~(//)attribute (imminent failure but will try anything)
\s*~(//).*attributes.*
I saw quite a few posts suggesting to use the find command in batch. This can be done, but I would prefer to have the ability to double click on the findings so that the file will be opened and already scrolled to the correct location.
How about this one.
^(?=.*attribute.*\n)(?!.*//).*
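To see what the negative lookahead does, here is a per-line variant of the same idea, sketched in Python (whose re module uses the same (?!...) syntax). The sample lines are made up, and the pattern is slightly simplified from the one above because each line is tested on its own.

```python
import re

# (?!.*//) rejects any line that contains "//" anywhere, then the rest of
# the pattern requires the word "attribute" somewhere on the line.
NON_COMMENT_ATTRIBUTE = re.compile(r"^(?!.*//).*attribute")

sample = [
    "int attribute = 1;",
    "    // attribute is unused",
    "setValue(x); // attribute changed",
    "int other = 2;",
]

hits = [line for line in sample if NON_COMMENT_ATTRIBUTE.search(line)]
print(hits)
```

Note that this also skips code lines with a trailing // comment, which is what "do not contain the // symbol" in the question asks for.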

Notepad++ masschange using regular expressions

I am having trouble performing a mass change in a huge logfile.
Apart from the file size, which causes issues for Notepad++, I have a problem using more than 10 parameters for replacement; with up to 9 it works fine.
I need to change numerical values in a file where these values are located within quotation marks and have a leading and a trailing comma: ,"123,456,789,012.999",
I used this exp to find and replace the format to:
,123456789012.999, (so that there are no quotation marks and no comma within the num.value)
The exp used to find is:
([,])(["])([0-9]+)([,])([0-9]+)([,])([0-9]+)([,])([0-9]+)([\.])([0-9]+)(["])([,])
and the exp to replace is:
\1\3\5\7\9\10\11\13
The problem is parameters \11 \13 are not working (the chars eg .999 as in the example will not appear in the changed values).
So now the question is - is there any limit for parameters?
It seems to me that it's not working above 10. For shorter numerical values, where I only need up to 9 parameters, the search and replacement strings work fine; for the example above the search works but not the replacement, and the end of the changed value gets corrupted.
Also, it came to my mind that instead of using Notepad++ I could maybe change the logfile on the Unix server directly; however, I had trouble building the correct Perl syntax. Could anyone help with that, maybe?
After having a little play myself, it looks like back-references \11-\99 are invalid in Notepad++ (which is not that surprising, since this is commonly omitted from regex languages). However, there are several things you can do to improve that regular expression, in order to make this work.
Firstly, you should consider using fewer groups, or alternatively non-capture groups. Did you really need to store 13 variables in that regex in order to do the replacement? Clearly not, since you're not even using half of them!
To put it simply, you could just remove some brackets from the regex:
[,]["]([0-9]+)[,]([0-9]+)[,]([0-9]+)[,]([0-9]+)[.]([0-9]+)["][,]
And replace with:
,\1\2\3\4.\5,
...But that's not all! Why are you using square brackets to say "match anything inside", if there's only one thing inside?? We can get rid of these, too:
,"([0-9]+),([0-9]+),([0-9]+),([0-9]+)\.([0-9]+)",
(Note I added a "\" before the ".", so that it matches a literal "." rather than "anything".)
Also, although this isn't a big deal, you can use "\d" instead of "[0-9]".
This makes your final, optimised regex:
,"(\d+),(\d+),(\d+),(\d+)\.(\d+)",
And replace with:
,\1\2\3\4.\5,
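Since the question also asked about doing this on the server instead of in Notepad++, here is the same find/replace pair as a minimal Python sketch (Python rather than Perl, purely as an illustration; the function name is made up). It only needs five groups, so the back-reference limit never comes into play.

```python
import re

# Final regex from above: four comma-separated digit groups plus decimals,
# wrapped in quotes and surrounded by commas.
QUOTED_NUMBER = re.compile(r',"(\d+),(\d+),(\d+),(\d+)\.(\d+)",')


def unquote_number(line: str) -> str:
    """Drop the quotes and the inner commas, keep the outer commas."""
    return QUOTED_NUMBER.sub(r",\1\2\3\4.\5,", line)


print(unquote_number('a,"123,456,789,012.999",b'))
```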
Not sure if regex groups have limitations, but you could use lookarounds to save 2 groups, and you could also merge some groups in your example. But first, let's get rid of some useless character classes:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
We could merge those groups:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
                                        ^^^^^^^^^^^^^^^^^^^^
We get:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(,)
Let's add lookarounds:
(?<=\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(?=,)
The replacement would be \2\4\6\8.
If you have a fixed number of digit groups at all times, it's fairly simple to do what you have done. Even though your expression is poorly written, it does the job. If this is the case, look at Tom Lord's answer.
I played around with it a little bit myself, and I would probably use two expressions - makes it much easier. If you have to do it in one, this would work, but be pretty unsafe:
(?:"|(\d+),)|(\.\d+)"(?=,) replace by \1\2
Live demo: http://regex101.com/r/zL3fY5
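If the number of digit groups varies, a script with a replacement callback handles any length in one pass. This is a sketch in Python, not something the Notepad++ dialog offers; the pattern assumes thousands separators in groups of three, and the function name is made up.

```python
import re

# A quoted number with optional thousands separators and optional decimals.
QUOTED = re.compile(r'"(\d{1,3}(?:,\d{3})*(?:\.\d+)?)"')


def strip_grouping(line: str) -> str:
    """Remove the quotes and the commas inside each quoted number."""
    return QUOTED.sub(lambda m: m.group(1).replace(",", ""), line)


print(strip_grouping(',"123,456,789,012.999",'))
```

Quoted values that are not numbers do not match and are left as they are.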

Bug in Mathematica: regular expression applied to very long string

In the following code, if the string s is extended to something like 10 or 20 thousand characters, the Mathematica kernel segfaults.
s = "This is the first line.
MAGIC_STRING
Everything after this line should get removed.
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
...";
s = StringReplace[s, RegularExpression@"(^|\\n)[^\\n]*MAGIC_STRING(.|\\n)*"->""]
I think this is primarily Mathematica's fault and I've submitted a bug report and will follow up here if I get a response. But I'm also wondering if I'm doing this in a stupid/inefficient way. And even if not, ideas for working around Mathematica's bug would be appreciated.
Mathematica uses PCRE syntax, so it does have the /s (aka DOTALL, aka Singleline) modifier: you just put (?s) before the part of the expression to which you want it to apply.
See the RegularExpression documentation here: (expand the section labeled "More Information")
http://reference.wolfram.com/mathematica/ref/RegularExpression.html
The following set options for all regular expression elements that follow them:
(?i) treat uppercase and lowercase as equivalent (ignore case)
(?m) make ^ and $ match start and end of lines (multiline mode)
(?s) allow . to match newline
(?-c) unset options
This modified input doesn't crash Mathematica 7.0.1 for me (the original did), using a string that is 15,000 characters long, producing the same output as your expression:
s = StringReplace[s,RegularExpression@".*MAGIC_STRING(?s).*"->""]
It should also be a bit faster, for the reasons @AlanMoore explained.
The best way to optimize the regex depends on the internals of Mathematica's regex engine, but I would definitely get rid of the (.|\\n)*, as @Simon mentioned. It's not just the alternation--although it's almost always a mistake to have an alternation in which every alternative matches exactly one character; that's what character classes are for. But you're also capturing each character when you match it (because of the parentheses), only to throw it away when you match the next character.
A quick scan of the Mathematica regex docs doesn't turn up anything like the /s (Singleline or DOTALL) modifier, so I recommend the old JavaScript standby, [\\s\\S]* -- match anything that is whitespace or anything that isn't whitespace. Also, it might help to add the $ anchor to the end of the regex:
"(^|\\n)[^\\n]*MAGIC_STRING[\\s\\S]*$"
But your best option would probably be not to use regexes at all. I don't see anything here that requires them, and it would probably be much easier as well as more efficient to use Mathematica's normal string-manipulation functions.
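To illustrate the "no regex at all" route, here is the logic sketched in Python rather than Mathematica (I am not going to guess at the exact Mathematica functions): find the marker, back up to the line break before its line, and cut there. The function name and the default marker argument are made up for the example.

```python
def truncate_before_marker(text: str, marker: str = "MAGIC_STRING") -> str:
    """Drop the line containing the marker and everything after it."""
    marker_pos = text.find(marker)
    if marker_pos == -1:
        return text  # no marker, nothing to remove
    # The line break that precedes the marker's line is removed as well,
    # mirroring the (^|\n) at the start of the original regex.
    line_break = text.rfind("\n", 0, marker_pos)
    return "" if line_break == -1 else text[:line_break]


s = (
    "This is the first line.\nMAGIC_STRING\n"
    "Everything after this line should get removed.\n" + "1234567890" * 2000
)
print(truncate_before_marker(s))
```

Both find and rfind are plain linear scans, so the length of the string should not matter here.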
Mathematica is a great executive toy but I'd advise against trying to do anything serious with it like regexes over long strings or any kind of computation over significant amounts of data (or where correctness is important). Use something tried and tested. Visual F# 2010 takes 5 milliseconds and one line of code to get the correct answer without crashing:
> let str =
"This is the first line.\nMAGIC_STRING\nEverything after this line should get removed." +
String.replicate 2000 "0123456789";;
val str : string =
"This is the first line.
MAGIC_STRING
Everything after this li"+[20022 chars]
> open System.Text.RegularExpressions;;
> #time;;
--> Timing now on
> (Regex "(^|\\n)[^\\n]*MAGIC_STRING(.|\\n)*").Replace(str, "");;
Real: 00:00:00.005, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
val it : string = "This is the first line."