SQL Server Management Studio / Visual Studio regular expression bug? - regex

I quite often use regular expression search and replace in the SQL Server Management Studio 10.5 editor to clean up auto-generated SQL before use. The behaviour described below also occurs in the Visual Studio 2010 editor.
I have the following SQL INSERT statement that I'd like to clean up:
INSERT INTO [lttadev].[dbo].[GameInst]
([GameInstId]
,[GameSetId]
,[UserInfoId]
,[GameLevelId]
,[CreatedOn]
,[CreatedBy]
,[ModifiedOn]
,[ModifiedBy])
VALUES
(<GameInstId, uniqueidentifier,>
,<GameSetId, uniqueidentifier,>
,<UserInfoId, uniqueidentifier,>
,<GameLevelId, uniqueidentifier,>
,<CreatedOn, datetime,>
,<CreatedBy, uniqueidentifier,>
,<ModifiedOn, datetime,>
,<ModifiedBy, uniqueidentifier,>)
To alter the VALUES clause I have the following two regular expressions:
,[^,]*,\>
\<
Both are replaced by an empty string to delete the unwanted text. The first one strips out the comma, type, second comma and final angle bracket. The second one strips out the initial angle bracket. Both work as expected.
However if I join the regexes up into a single expression to speed the text processing, they select different text:
(,[^,]*,\>|\<)
The first alternative still selects the expected text, but the second now selects the preceding comma as well as the initial angle bracket. Is this a defect in the regular expression engine, or am I misunderstanding something?

Try wrapping the first alternative in parentheses:
(,[^,]*,\>)|\<
The documentation doesn't indicate what precedence | has, and it gives only this example:
(sponge|mud) bath
but elsewhere on that page are various examples where | is used with parentheses that would have been unnecessary (though harmless) in another regex engine:
(("[^"]*")|('[^']*'))
(- [a-z][1-3])|(- [0-9][a-z])
(([0-9]+.[0-9]*)|([0-9]*.[0-9]+)|([0-9]+))
so I'm guessing that | might have a fairly high precedence — lower than the trivial within-string concatenation of sponge or mud, but higher than more complex concatenations like in "[^"]*". (Of course, that last expression was written by someone who didn't notice that . has a special meaning, so it may warrant a grain of salt.)
Since I don't have Visual Studio, I can't test this out further.
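One way to picture the precedence guess above is to emulate both groupings in an engine whose alternation rules are well documented. The sketch below uses Python's re module, not the Visual Studio engine, and writes the angle brackets unescaped because Python treats them as literals. The second pattern is only a hypothetical reading in which | binds just the neighbouring ,> and <.

```python
import re

line = ",<GameSetId, uniqueidentifier,>"

# Conventional reading: | has the lowest precedence, so the two
# alternatives are ",[^,]*,>" and "<".
conventional = re.compile(r",[^,]*,>|<")

# Hypothetical reading (an assumption, not confirmed Visual Studio
# behaviour): | binds tighter than the surrounding concatenation,
# as if the pattern had been written ",[^,]*(,>|<)".
tight_binding = re.compile(r",[^,]*(?:,>|<)")

print(conventional.findall(line))   # ['<', ', uniqueidentifier,>']
print(tight_binding.findall(line))  # [',<', ', uniqueidentifier,>']

print(conventional.sub("", line))   # ,GameSetId
print(tight_binding.sub("", line))  # GameSetId
```

The second grouping swallows the leading comma together with the angle bracket, which is the symptom described in the question. Parenthesising the first alternative, as in (,[^,]*,\>)|\<, removes the ambiguity under either reading.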

Related

OpenModelica SimulationOptions 'variableFilter' not working with '^' exceptions

To reduce the size of my simulation output files, I want to pass variable-name exclusions, rather than a long list of specific variables, to the variableFilter simulation option (cf. OpenModelica Users Guide / Output) of my model. The regexp operator "^" looked like it would fulfil my needs, but it didn't work as expected, so I suspect something is wrong with how connected character strings are interpreted when negated.
Example:
When my model contains derivatives der(...) and I use variableFilter=der.*, the output file contains all the derivatives matched by the filter. Since no other variables begin with the character d, the same happens with variableFilter=d.*. As a test I also tried variableFilter=rde.* to confirm that every variable is then filtered out.
When I now try to exclude them with variableFilter=^der.*, =^rde.* or =^d.*, I get exactly the same result as without ^, so the operator seems to be ignored in this notation.
If I instead use variableFilter=[^der].*, =[^rde].* or even =[^d].*, all the derivative variables are removed from the output as wanted, but the three expressions behave identically. It seems to me that every character is interpreted on its own and not as part of a connected string.
Am I using the regexp syntax correctly, or could this be a bug?
Side/follow-up question: Where can I officially report this for software revision?
OpenModelica v.1.19.2 (64-bit)
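The two notations in the question mean different things in most regex dialects, which may explain the observations. Here is a minimal sketch using Python's re module. It assumes that the filter has to match the whole variable name, that OpenModelica's own engine behaves similarly (it may differ in detail), and the variable names are made up. At the start of a pattern ^ is an anchor, while inside brackets it negates a set of single characters.

```python
import re

# Hypothetical variable names, for illustration only.
names = ["der(x)", "der(y)", "x", "y", "energy", "rate"]

def keep(pattern):
    """Return the names that the pattern matches in full."""
    return [n for n in names if re.fullmatch(pattern, n)]

# ^ outside brackets anchors the match to the start; it does not negate.
print(keep(r"der.*"))      # ['der(x)', 'der(y)']
print(keep(r"^der.*"))     # ['der(x)', 'der(y)']

# [^der] is ONE character that is not d, e or r, so these two patterns
# are equivalent and also drop names that merely start with e or r.
print(keep(r"[^der].*"))   # ['x', 'y']
print(keep(r"[^rde].*"))   # ['x', 'y']

# Python can express "does not start with der" with a negative lookahead.
# Whether OpenModelica accepts this syntax is not established here.
print(keep(r"(?!der).*"))  # ['x', 'y', 'energy', 'rate']
```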

Visual studio find all Assert.IsTrue that have no message

I'm trying to use Visual Studio regular expressions (https://msdn.microsoft.com/en-us/library/2k3te2cs.aspx) to find all calls to Assert.IsTrue that pass only the Boolean argument. For example, Assert.IsTrue(parameter) should match and Assert.IsTrue(parameter, "message") shouldn't.
For simple cases, Assert.IsTrue\(([a-zA-Z ]+)\) does the trick. It works for the example above but not when the argument is an expression, for example Assert.IsTrue(2 > 3). For that I tried Assert.IsTrue\((.+[^,])\) to match everything that doesn't contain ",", but this only rejects a comma at the very end; I'm not sure how to exclude commas inside.
Finally, what I really want (and I'm not sure it's possible with regular expressions alone) is to find Assert.IsTrue calls that have only one parameter, where that parameter may itself be a method call and so may contain commas, something like Assert.IsTrue(isTrue(p1,p2)).
I don't know why you want the solution to be programmatic, but if a Visual Studio based solution is acceptable, you could find one example of Assert.IsTrue(parameter);, right-click the method and select "Find All References".
Remove the .+ from your second example and add a *. You should also escape the period directly after Assert:
Assert\.IsTrue\(([^,])*\)
As for your expanded expression, something like this might work.
Assert\.IsTrue\(([a-zA-Z\.])*\(.*\)\)
This should let you find what you're looking for.
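For the nested case, where the single argument is itself a call containing commas, a regex alone gets awkward. As a rough alternative sketch in Python (outside Visual Studio; the function name and sample text are made up for illustration), one can locate each Assert.IsTrue( with a regex and then scan forward, counting only the commas at the outermost parenthesis depth. It assumes ordinary double-quoted C# strings and does not handle verbatim strings, character literals or comments.

```python
import re

def single_arg_istrue_calls(source):
    """Return the Assert.IsTrue(...) calls that have no top-level comma."""
    hits = []
    for m in re.finditer(r"Assert\.IsTrue\(", source):
        depth, commas, in_string = 1, 0, False
        i = m.end()
        while i < len(source) and depth > 0:
            ch = source[i]
            if in_string:
                if ch == "\\":
                    i += 1            # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 1:
                commas += 1           # comma separating IsTrue arguments
            i += 1
        if depth == 0 and commas == 0:
            hits.append(source[m.start():i])
    return hits

sample = "\n".join([
    'Assert.IsTrue(parameter);',
    'Assert.IsTrue(parameter, "message");',
    'Assert.IsTrue(2 > 3);',
    'Assert.IsTrue(isTrue(p1,p2));',
    'Assert.IsTrue(isTrue(p1,p2), "a, b");',
])
for call in single_arg_istrue_calls(sample):
    print(call)
```

On the sample this prints the first, third and fourth calls, that is, the ones without a message argument.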

ctags and R regex

I'm trying to use ctags with R. Using this answer I've added
--langdef=R
--langmap=r:.R.r
--regex-R=/^[ \t]*"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]function/\1/f,Functions/
--regex-R=/^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^f][^u][^n][^c][^t][^i][^o][^n]/\1/g,GlobalVars/
--regex-R=/[ \t]"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^f][^u][^n][^c][^t][^i][^o][^n]/\1/v,FunctionVariables
to my .ctags file. However when trying to generate tags with the following R file
x <- 1
foo <- function () {
y <- 2
return(y)
}
Only the function foo is recognized. I want to generate tags also for variables (i.e for x and y in my code above). Should I change the regex in the ctags file? Thanks in advance.
Those patterns don't appear to be correct: a variable is only identified if the assigned value is at least as long as "function" and differs from it at every one of its eight character positions. E.g.:
x <- aaaaaaaa
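To see the effect, the GlobalVars pattern from the question can be tried against a few assignments. This is only a sketch using Python's re module as a stand-in for the ctags regex engine (an assumption; the two agree on the constructs used here), and the sample lines are made up.

```python
import re

# GlobalVars pattern from the question, copied unchanged.
original = re.compile(
    r'^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]'
    r'[^f][^u][^n][^c][^t][^i][^o][^n]'
)

samples = [
    "x <- 1",                # value shorter than 8 characters
    "x <- aaaaaaaa",         # 8 characters, none clashing with "function"
    "x <- noitcnuf",         # same letters as "function", other positions
    "x <- fooooooo",         # rejected only because it starts with "f"
    "foo <- function () {",  # the case the pattern is meant to exclude
]
results = {line: bool(original.match(line)) for line in samples}
for line, matched in results.items():
    print(matched, line)
```

Only the second and third lines match, so ordinary short assignments such as x <- 1 never produce a tag.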
The following ctags configuration should work properly:
--langdef=R
--langmap=R:.R.r
--regex-R=/^[ \t]*"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]function[ \t]*\(/\1/f,Functions/
--regex-R=/^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^\(]+$/\1/g,GlobalVars/
--regex-R=/[ \t]"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t][^\(]+$/\1/v,FunctionVariables/
The idea here is that a function must be followed by a parenthesis while the variable declarations do not. Given the limitations of a regex parser this parenthesis must appear in the same line as the keyword "function" for this configuration to work.
Update: R support will be included in Universal Ctags, so give it a try and report bugs and missing features.
The problem with the original regex is that it does not capture variable declarations / assignments that are shorter than 8 characters (since the regex requires 8 characters that are NOT f-u-n-c-t-i-o-n in that specific order).
The problem with the regex updated by Vitor is that it does not capture variable declarations / assignments whose right-hand side contains parentheses, a situation which is quite common in R.
A further, overlooked issue is that neither of them captures objects superassigned with <<- (only local assignment with <- is covered).
I checked the Universal Ctags repo: PCRE support is planned (it was also raised in Issue 519, and a commented-out pcre flag exists in the configuration file), but unfortunately ctags does not yet support PCRE-style positive/negative lookahead or lookbehind expressions. Once that support arrives, things will be much easier.
My solution, first of all, takes into account superassignment by "<{1,2}-" and also the fact that the right side of assignment can include:
either a sequence of 8 characters which are not f-u-n-c-t-i-o-n followed by any or no characters (.*)
OR a sequence of at most 7 any characters (.{1,7})
My proposed regex patterns are as such:
--langdef=R
--langmap=R:.R.r
--regex-R=/^[ \t]*"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<-[ \t]function[ \t]*\(/\1/f,Functions/
--regex-R=/^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<{1,2}-[ \t]([^f][^u][^n][^c][^t][^i][^o][^n].*|.{1,7}$)/\1/g,GlobalVars/
--regex-R=/[ \t]"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*<{1,2}-[ \t]([^f][^u][^n][^c][^t][^i][^o][^n].*|.{1,7}$)/\1/v,FunctionVariables/
And my tests show that it captures many more objects than the previous ones.
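The difference between the parenthesis-based GlobalVars pattern and the one proposed here can be checked on a few lines. This is a sketch in Python's re module, assumed to behave like the ctags engine for these constructs; the sample assignments are made up.

```python
import re

# Shared prefix: optional quote, the captured name, optional quote, blanks.
NAME = r'^"?([.A-Za-z][.A-Za-z0-9_]*)"?[ \t]*'

# Earlier suggestion: the right-hand side must not contain a parenthesis.
no_paren = re.compile(NAME + r'<-[ \t][^\(]+$')

# Proposed pattern: either 8+ characters that differ from "function"
# position by position, or a value of at most 7 characters.
# <{1,2}- accepts both <- and <<-.
proposed = re.compile(
    NAME + r'<{1,2}-[ \t]([^f][^u][^n][^c][^t][^i][^o][^n].*|.{1,7}$)'
)

samples = ["x <- 1", "y <- c(1, 2, 3)", "z <<- 42", "foo <- function () {"]
table = {s: (bool(no_paren.match(s)), bool(proposed.match(s))) for s in samples}
for line, (old, new) in table.items():
    print(f"no_paren={old!s:5} proposed={new!s:5} {line}")
```

Here the proposed pattern additionally tags y and z while still skipping the function definition. A value of eight or more characters that happens to start with f, such as fooooooo, is still missed by both alternatives.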
Edit:
Even these do not capture variables that are declared as arguments to functions. Since ctags regex makes a non-greedy search and non-capturing groups do not work, the result is still limited. But we can at least capture the first argument of each function and define it as an additional kind, a (Arguments):
--regex-R=/.+function[ ]*\([ \t]*(,*([^,= \(\)]+)[,= ]*)/\2/a,Arguments/

EditPad: Need a regex that handles multiple possible data formats

First, I'm using EditPadPro for my regex cleaning, so any answers given should work within that environment.
I get a large spreadsheet full of data that I have to clean every day. I've managed to get it down to a couple of different regexes that I run, and this works... but I'm curious to see if it's possible to reduce down to a single regex.
Here is some sample data:
3-CPC_114851_70095_70095_CAN-bre
3-CPC_114851_70095_70095_CAN
b11-ao1-113775-bre
b7-ao-114441
b7-ao-114441-bre
b7-ao1-114441
b7-ao1-114441-bre
http://go.nlvid.com/results1/?http://bo
go.nlv/results1/?click
b4-sm-1359
b6-sm-1356-bre
1359_195_1453814569-bre
1356_104_1456856729
b15-rad-8905
b15-rad-8905-bre
Here is how the above data needs to end up:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
So, there are numerous rules, such as:
In cases of more than 2 underscores, the result needs to contain only the value immediately after the first underscore, and everything from the dash onwards.
In cases where the string contains "-ao-", "-ao1-", everything prior to the final numeric string should be removed.
If a question mark is present, everything from the mark onwards should be removed.
If the string contains "-sm-" or "-rad-", everything prior to those alpha strings should be removed.
If the string contains 2 underscores, everything after the first numeric string up to a dash (if present) should be removed, and the string "sm-" should be prepended.
Additionally there is other data that must be left untouched, including but not limited to:
113535|24905|24905
as well as many variations on this pattern of xxxxxx|yyyyy|zzzzz (and not always those string lengths)
This may be asking way too much of regex, I'm not sure as I'm not great with it. But I've seen some pretty impressive things done with it, so I thought I'd put this out to the community and see what you come back with.
Jonathan, I can wrap all of those into one regex, except the last one (where you prepend sm- to a string that does not contain sm). It is not possible in this context, because we cannot capture "sm" to reuse in the replacement, and because there is no "conditional replacement" syntax in EPP.
That being said, you can achieve what you want in EPP with two regexes and one macro to chain the two.
Here is how.
The solution below is tested in EPP.
Regex 1
Press Ctrl + Shift + F to enter Search / Replace mode
Enter the following Search and Replace in the appropriate boxes
At the top right of the Search bar, click the Favorite Searches pull-down, select "Add", give it a name, e.g. Regex 1
Search:
(?mx)^
(?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
|
[^\r\n]*?-ao1?-\D*([^\r\n]+)
|
([^\r\n?]*)(?=\?)[^\r\n]+
|
[^\r\n]*?-((?:sm|rad)-[^\r\n]+)
Replace:
\1\2\3\4\5
Regex 2
Same 1-2-3 steps as above.
Search
^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)
Replace
sm-\1\2
Chaining Regex 1 and Regex 2
Top menu: Macros, Record Macro, give it a name.
Click the Favorite searches pulldown, select Regex 1
Hit Replace All.
Click the Favorite searches pulldown, select Regex 2
Hit Replace All.
Macros, Stop recording.
Whenever you want to do your sequence of replacements, pull it by name under the Macros menu.
Testing This
I have tested my "Jonathan macro" on your input. Here is the result:
114851-bre
114851
113775-bre
114441
114441-bre
114441
114441-bre
http://go.nlvid.com/results1/
go.nlv/results1/
sm-1359
sm-1356-bre
sm-1359-bre
sm-1356
rad-8905
rad-8905-bre
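For anyone who wants to reproduce the two-pass idea outside EditPad Pro, here is a rough port to Python's re module. It is a sketch, not the macro itself: the flags are passed as re.MULTILINE | re.VERBOSE instead of the inline (?mx), the function name clean is made up, and it relies on Python 3.5+ replacing unmatched groups with an empty string.

```python
import re

# Regex 1 from above; free-spacing layout kept, flags passed separately.
REGEX_1 = re.compile(r"""
    ^(?=(?:[^_\r\n]*?_){3})[^_\r\n]+?_([^_\r\n]+)[^-\r\n]+(-[^\r\n]+)?
    |
    [^\r\n]*?-ao1?-\D*([^\r\n]+)
    |
    ([^\r\n?]*)(?=\?)[^\r\n]+
    |
    [^\r\n]*?-((?:sm|rad)-[^\r\n]+)
""", re.MULTILINE | re.VERBOSE)

# Regex 2 from above: exactly two underscores, prepend "sm-".
REGEX_2 = re.compile(
    r"^(?!(?:[^_\r\n]*?_){3})(?=(?:[^_\r\n]*?_){2})(\d+)(?:[^-\r\n]+(-[^\r\n]+)?)",
    re.MULTILINE,
)

def clean(text):
    """Apply Regex 1, then Regex 2, like the chained macro."""
    step1 = REGEX_1.sub(r"\1\2\3\4\5", text)
    return REGEX_2.sub(r"sm-\1\2", step1)

sample = "\n".join([
    "3-CPC_114851_70095_70095_CAN-bre",
    "b11-ao1-113775-bre",
    "http://go.nlvid.com/results1/?http://bo",
    "b4-sm-1359",
    "1359_195_1453814569-bre",
    "1356_104_1456856729",
    "b15-rad-8905",
    "113535|24905|24905",
])
print(clean(sample))
```

The last sample line is one of the pipe-delimited values from the question and should pass through unchanged.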
Try this:
Toggle the Search Panel : SHIFT+CTRL+F
SEARCH: .*?((?:sm-|rad-)?(?:(?:\d+|[\w\.]+\/.*?))(?:-\w+)?$)
REPLACE: $1
Check REGEX and WORDS
Click Replace All or Hit CTRL+ALT+F3

RegEx to match C# Interface file names only

In the Visual Studio 2010 "Productivity Power Tools" plugin (which is great), you can configure file tabs to be color coded based on regular expressions.
I have a RegEx to differentiate the tab color of Interface files (IMyInterface.cs) from regular .cs files:
[I]{1}[A-Z]{1}.*\.cs$
Unfortunately this also color codes any file that starts with a capital "I" (Information.cs, for example).
How could this RegEx be modified to only include files where the first letter is "I" and the second letter is not lowercase?
Your regexp should work as it is. It is possible that it is executed in ignore case mode. Try to disable that mode inside your regexp with (?-i):
(?-i)[I]{1}[A-Z]{1}.*\.cs$
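The effect of an ignore-case mode on [A-Z] can be illustrated in Python's re module. This is only an analogy for what Power Tools might be doing, which is an assumption. Python has no bare (?-i) switch, so the sketch uses its scoped form (?-i:...) (Python 3.6+) instead.

```python
import re

pattern_plain = r"[I]{1}[A-Z]{1}.*\.cs$"
# Same pattern with case-insensitivity switched off inside the group.
pattern_scoped = r"(?-i:[I]{1}[A-Z]{1}.*\.cs$)"

def check(name):
    """Return matches for: case-sensitive, ignore-case, ignore-case + (?-i:...)."""
    return (
        bool(re.search(pattern_plain, name)),
        bool(re.search(pattern_plain, name, re.IGNORECASE)),
        bool(re.search(pattern_scoped, name, re.IGNORECASE)),
    )

print(check("IMyInterface.cs"))  # (True, True, True)
print(check("Information.cs"))   # (False, True, False)
```

Information.cs only matches when the engine ignores case, which is consistent with the behaviour described in the question.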
Try this
"(?-i)^I[A-Z].*\.cs$"
This sets case-insensitive matching off first.
Regular Expression Options
Filenames in Windows are not case-sensitive, so obviously Power Tools will be using case-insensitive matching.
How about this:
^I([A-Z][A-Za-z0-9]*){1}\.cs$
so
IMyInterface.cs // matches, MyInterface
IB.cs // B
IBa.cs // Ba
IC1.cs // C1
I.cs // don't
Information.cs // don't
I based mine off the default patterns placed in there and used ^I[A-Z].*\.cs[ ]*(\[read only\])?$ - I think that there is a precedence question, though, so that if you leave the default .cs pattern matcher in there and add yours to the end, you might have yours hidden, because it matched the general one first.
And you can't re-order or delete them, so it's a little fiddly to get the ordering working well ...
FWIW, I don't think the case-sensitivity question ((?-i)) makes any difference.