Replacing all instances of a name in all strings in a solution - regex

We have a large solution with many projects in it, and throughout the project in forms, messages, etc we have a reference to a company name. For years this company name has been the same, so it wasn't planned for it to change, but now it has.
The application is specific to one state in the US, so localizations/string resource files were never considered or used.
A quick Find All instances of the word pulled up 1309 lines, but we only need to change lines that actually end up being displayed to the user (button text, message text, etc).
Code can be refactored later to make it more readable when we have time to ensure nothing breaks, but for the time being we're attempting to find all visible instances and replace them.
Is there any way to easily find these "instances"? Perhaps a type of Regex that can be used in the Find All functionality in Visual Studio to only pull out the word when it's wrapped inside quotes?
Before I go down the rabbit hole of trying to make my job easier and spending far more time than it would have taken to just go line by line, figured I would see if anyone has done something like this before and has a solution.

You can give this a try. (I hope your code is under source control!)
Foobar{[^"]*"([^"]*"[^"]*")*[^"]*}$
And replace with
NewFoobar\1
Explanation
Foobar the name you are searching for
[^"]*" a workaround for the missing non-greedy modifier. [^"] means match anything but ", so this matches everything up to the first ".
([^"]*"[^"]*")* To ensure that you are matching only inside quotes. This ensures that there are only complete sets of quotes following.
[^"]* ensures that there is no quote anymore till the end of the line $
{} the curly braces put everything following your company's name into a capturing group, which you can refer to using \1
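
For anyone doing this outside Visual Studio, here is a rough Python sketch of the same quote-counting idea. It is not the VS syntax above: it uses a lookahead instead of a capture group, Foobar and NewFoobar are the same placeholders as in the answer, the function name is made up for illustration, and it assumes string literals never contain escaped quotes or span several lines.

```python
import re

# "Foobar" is the placeholder company name from the answer above.
# The lookahead succeeds only when an odd number of double quotes sits
# between the match and the end of the line, which is taken to mean
# "we are currently inside a string literal".
INSIDE_QUOTES = re.compile(
    r'Foobar(?=[^"\n]*"(?:[^"\n]*"[^"\n]*")*[^"\n]*$)',
    re.MULTILINE,
)


def rename_in_strings(source, new_name="NewFoobar"):
    """Replace Foobar only where it appears inside a quoted string."""
    return INSIDE_QUOTES.sub(new_name, source)


sample = 'label.Text = "Welcome to Foobar";\nvar x = Foobar.Create();'
print(rename_in_strings(sample))
```

Because the lookahead does not consume anything, several occurrences inside one string are all replaced in a single pass. Review the diff afterwards; verbatim strings and escaped quotes will confuse it.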

The VS regex capability is quite stripped down. It perhaps represents 20% of what can be done with full-powered regular expressions. It won't be sufficient for your needs. For example, one way to solve this quote-delimited problem is to use non-greedy matching, which VS regex does not support.
If I were in your shoes, I would write a Perl script or a C# assembly that runs outside of Visual Studio, and simply races through all files (having a particular file extension) and fixes everything. Then reload into Visual Studio, and you are done. Well, if all went well with the regex anyway.
Ultimately what you really must watch out for is code like this:
Log.WriteLine("Hello " + m_CompanyName + " There");
In this case, regex will think that "m_CompanyName" appears between two quotes - but it is not what you meant. In this case you need even more sophistication, and I think you'll find the answer with a special .net regular expression extension.
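
A minimal sketch of such an external script, written in Python rather than Perl or C#. The function names, the .cs suffix and the UTF-8 encoding are assumptions for illustration. It matches whole string literals from left to right, so an identifier sitting between two separate literals (as in the Log.WriteLine line above) is left alone. It does not handle escaped quotes or verbatim strings.

```python
import re
from pathlib import Path

# Naive string literal: an opening quote, anything but a quote or newline,
# and a closing quote. Escaped quotes are NOT handled (assumption).
STRING_LITERAL = re.compile(r'"[^"\n]*"')


def replace_in_literals(text, old, new):
    """Replace `old` with `new`, but only inside quoted string literals."""
    return STRING_LITERAL.sub(lambda m: m.group(0).replace(old, new), text)


def rewrite_tree(root, old, new, suffix=".cs"):
    """Rewrite every file under `root` with the given suffix; return changed paths."""
    changed = []
    for path in Path(root).rglob("*" + suffix):
        original = path.read_text(encoding="utf-8")
        updated = replace_in_literals(original, old, new)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Run it on a clean checkout, for example rewrite_tree("path/to/solution", "Foobar", "NewFoobar"), so the source control diff shows exactly what changed.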


Notepad++ / RegEx - Find & replace multiple parts of a line while ignoring any words between

I appreciate what I'm asking may be very simple to more experienced folks. I've spent several hours trying to get my head around RegEx and have gotten close to what I need, but as this is something I'm trying to achieve for a hobby project (RegEx is not something I require in my day job) I'm hoping some of you may be able to help me out.
In short, I have a very large file with tens of thousands of lines of code that I am converting to be readable by another program. All I need to accomplish this is to change some formatting.
I need to find every instance where the tag "{#graphic examplename}" is used, and change it so that only "examplename" remains in square [[ ]] brackets.
Examples of how the tags currently appear (example names can be either single words or multiple):
"{#graphic example1}",
"{#graphic example2}",
"{#graphic example3 with multiple words}"
What I want them to look like when done, replacing the { with [[, removing #graphic, and replacing } with ]].
"[[example1]]",
"[[example2]]",
"[[example3 with multiple words]]"
It's easy enough to do a simple find-and-replace to replace "{#graphic " with "[[", as the #graphic tag is something I want to remove universally. However, the issue I'm running into is that I can't replicate that with the "}" at the end, because I can't find a way to specify that I only want to replace examples of "}" that come after an instance of "{#graphic " while leaving any other words (the examplename) intact.
Any assistance gratefully received - if the above needs any elaboration please don't hesitate to ask, I understand I may be putting this in amateurish terms.
Regards,
K
Often programs have a way of capturing groups and referencing them later, often with $
so find {#graphic ([^}]+)}
replace [[$1]]
Captures what is inside the () and makes it available in the replace as $1, i.e. the first "capture group".
Regex 101 is an excellent resource for trying these things out:
https://regex101.com/r/r5OX8I/1
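
If you ever need the same replacement in a script instead of Notepad++, here is a small Python sketch of the capture-group idea. The function name is made up for the example, and Python writes the group reference as \1 rather than $1.

```python
import re

# ([^}]+) captures everything between "{#graphic " and the closing brace.
GRAPHIC_TAG = re.compile(r"\{#graphic ([^}]+)\}")


def convert_tags(text):
    """Turn {#graphic name} into [[name]], keeping the name as captured."""
    return GRAPHIC_TAG.sub(r"[[\1]]", text)


print(convert_tags('"{#graphic example3 with multiple words}"'))
```

Braces that do not follow "{#graphic " are never part of a match, so they stay untouched.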

How to handle a tilde / swung dash (~) in a regular expression in order to exclude temporary MS Office files?

I have a batch job in xml that gets scheduled by a job scheduling engine. This engine provides the possibility of observing directories for changes of their content. My task is to monitor directories on a file exchange server running Windows, where customers and clients upload files we need to process.
We need to know about the arrival of new files as soon as possible.
I have to put a regular expression into that xml-job in order to not match subdirectories and temporary files.
In most cases, customers and clients upload files formatted as text/csv/pdf, which don't cause any problems. Some upload MS Office files, which, on the other hand, become a problem if someone opens them in the directory. Then an invisible temporary file is created beginning with ~$.
According to the documentation of the scheduling engine, the regex follows the POSIX 1003.2 standard. However, I am not able to prevent notifications being sent when someone opens an MS Office file in a monitored directory.
My regular expressions, that I have tried so far are:
First try before even noticing temporary office files:
^[a-zA-Z0-9_\-]+\.+[a-zA-Z0-9_\-][^~][^.part]*$
Second try, intention was excluding a leading ~:
^[^~][a-zA-Z0-9_\-]+\.+[a-zA-Z0-9_\-][^~][^.part]*$
Third try, intention was excluding a leading ~ by its character code:
^[^\x7e][a-zA-Z0-9_\-]+\.+[a-zA-Z0-9_\-][^~][^.part]*$
Fourth try, intention was excluding a leading ~ by its character code with a capital E:
^[^\x7E][a-zA-Z0-9_\-]+\.+[a-zA-Z0-9_\-][^~][^.part]*$
All of those don't stop sending notifications on file openings…
Does anyone have any idea what to do?
All suggestions and alternatives are welcome.
I even checked them at regex101, regexplanet.com, regexr.com and regextester.com where the second try was matching exactly as desired. I did not even forget to configure POSIX compilation if it was possible on those sites (not all).
How can I exclude the ~ character from matching the regular expression (at the beginning of a file name)?
Short version:
How can I create a regular expression that matches any file with any extension apart from .part and does neither match the file thumbs.db, nor any file whose name begins with a ~?
Requirements:
What should not be matched:
Subfolders (my approach was files without a .),
Thumbs.db (Windows thumbnails db),
*.part (filezilla partial uploads),
~$. (temporary files starting with ~ or ~$, MS Office tmp files)
The following list provides some files and folders that must be matched or not matched by the regex:
Ablage (subfolder, should not be matched)
Abrechnungen (subfolder, should not be matched)
eine_testdatei.csv
TEST-WORKBOOK.xlsx
TEST-WORKBOOK_äöüß.xlsx
Test-2018-08-08.txt
~$TEST-WORKBOOK.xlsx (temporary file, should not be matched)
TEST-WORKBOOK.xlsx.part (partial upload, should not be matched)
TEST-WORKBOOK.part (partial upload, should not be matched)
New Problems occurred while trying to find the regex
A few problems came up after the creation of this question when I tried to apply the actually correct regex stated in the answer given by #Bohemian. I wasn't aware of those problems, so I just add them here for completeness.
The first one occurred when certain characters in the regex were not allowed in xml. The xml file is parsed by a java class that throws an exception trying to parse < and >, they are forbidden in xml documents if not related to xml nodes directly (valid: <xml-node>...</xml-node>, invalid: attribute="<ome_on, why isn't this VALI|>").
This can be avoided by using the HTML entity names &lt; instead of < and &gt; instead of >.
The second (and currently unresolved) issue is an operand criticized for the actually correct regular expression ^(?=.*\.)(?!thumbs.db$)[^~].*(?<!\.part)$. The engine says:
Error: 2018-08-17T06:05:46Z REGEX-13
[repetition-operator operand invalid, ^(?=.*\.)(?!thumbs.db$)[^~].*(?<!\.part)$]
The corresponding line in the xml file looks like this:
<start_when_directory_changed directory="F:\someDirectory" regex="^(?=.*\.)(?!thumbs.db$)[^~].*(?<!\.part)$" />
Now I am stuck again, because my knowledge of regular expressions is pretty low. It is so low, that I don't even have any idea what character could be that criticized operand in the regex.
Research has brought me to this question whose accepted answer states "POSIX regexes don't support using the question mark ? as a non-greedy (lazy) modifier to the star and plus quantifiers (…)", which gives me an idea about what is wrong with the great regex. Still, I am not able to provide a working regex, more research will have to follow…
POSIX ERE doesn't allow for a simple way to exclude a particular string from matching. You can disallow a particular character -- like in [^.part] you are matching a single character which is not (newline or) dot or p or a or r or t -- and you can specify alternations, but those are very cumbersome to combine into an expression which excludes some particular patterns.
Here's how to do it, but as you can see, it's not very readable.
^([^~t.]|t($|[^h])|th($|[^u])|thu($|[^m])|thum($|[^b])|thumb($|[^s])|thumbs($|[^.])|thumbs\.($|[^d])|thumbs\.d($|[^b])|\.($|[^p])|\.p($|[^a])|\.pa($|[^r])|\.par($|[^t]))+$
... and it still probably doesn't do exactly what you want.
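
To see why [^.part] does not mean "not the string .part", here is a quick Python check. It only illustrates the character-class semantics, it is not a POSIX ERE engine, and the helper name is made up.

```python
import re

# A negated character class: any ONE character except ".", "p", "a", "r" or "t".
NOT_PART_CHARS = re.compile(r"[^.part]*")


def passes(extension):
    """True if every character of `extension` is outside the set . p a r t."""
    return NOT_PART_CHARS.fullmatch(extension) is not None


for ext in ["xlsx", "csv", "txt", "part"]:
    print(ext, passes(ext))
```

Note that "txt" is rejected just like "part", simply because it contains a t.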
Try this:
^(?=.*\.)(?!thumbs.db$)[^~].*(?<!\.part)$
See live demo.
There is nothing special about the tilde character in regex.
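
A quick way to check this pattern against the file list from the question, sketched in Python. Python's re module supports lookarounds, which a POSIX 1003.2 engine does not, so this only verifies the logic of the pattern and does not make it work in the scheduler. The IGNORECASE flag is my addition so that Thumbs.db with a capital T is excluded as well.

```python
import re

# Pattern taken verbatim from the answer above; re.IGNORECASE is an assumption.
FILE_FILTER = re.compile(
    r"^(?=.*\.)(?!thumbs.db$)[^~].*(?<!\.part)$",
    re.IGNORECASE,
)


def is_wanted(name):
    """True if the scheduler should be notified about this file name."""
    return FILE_FILTER.match(name) is not None


names = [
    "Ablage",
    "Abrechnungen",
    "eine_testdatei.csv",
    "TEST-WORKBOOK.xlsx",
    "TEST-WORKBOOK_äöüß.xlsx",
    "Test-2018-08-08.txt",
    "~$TEST-WORKBOOK.xlsx",
    "TEST-WORKBOOK.xlsx.part",
    "TEST-WORKBOOK.part",
    "Thumbs.db",
]
wanted = [n for n in names if is_wanted(n)]
print(wanted)
```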
I am very late on this but above comments were helpful for me. It may not work for you but my solution is:
file_list <- file_list[!grepl("~", file_list)]

Preserve case during visual studio regex find and replace

I'm trying to find and replace strings using the Visual Studio regex find and replace in some code which includes a lot of inline documentation.
e.g. replace "east" with "north", and "East" with "North".
Since the files contain grammatically correct English right now, I want to be careful not to alter the case of text that may get replaced in the comments.
I know you can turn on the match case, or have one regex for lowercase and one for capitalized words, but I'm wondering if I actually have to do it twice or not (obviously I don't want to).
I've seen other answers for perl and javascript which give language-specific answers to this question (requiring callbacks), but I'm wondering if it's possible to do just within the visual studio dialog.
If you study Using Regular Expressions in Visual Studio, you will see that there is no operator that would keep the case of any specified letter matched/captured with a regex.
In some regex flavors, like in Perl and R (g)sub, you could turn your captures/matches lower/uppercase with a specific operator, but again, it would be a hardcoded action, not keeping the original case intact.
Thus, the only option you have with regex is to run individual search and replace operations (like east --> north and East --> North, maybe with word boundaries around \beast\b to match a whole word).
Else, you need to process the text with some custom code written in some full fledged language.
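
As an illustration of the "custom code" route, here is a rough Python sketch that uses a replacement callback. The helper names are made up, and the case rules (all caps, capitalized, otherwise lowercase) are a simplifying assumption that covers east/East/EAST but not mixed case such as eAsT.

```python
import re


def match_case(template, word):
    """Return `word` with the capitalization style of `template`."""
    if template.isupper():
        return word.upper()
    if template[0].isupper():
        return word.capitalize()
    return word.lower()


def replace_preserving_case(text, old, new):
    """Replace whole-word `old` with `new`, copying the case of each match."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: match_case(m.group(0), new), text)


print(replace_preserving_case("Go east. East is best.", "east", "north"))
```

The \b word boundaries keep words like "eastern" from being touched.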

Regexp-replace: Multiple replacements within a match

I'm converting our MVC3 project to use T4MVC. And I would like to replace java-script includes to work with T4MVC as well. So I need to replace
"~/Scripts/DataTables/TableTools/TableTools.min.js"
"~/Scripts/jquery-ui-1.8.24.min.js"
Into
Scripts.DataTables.TableTools.TableTools_min_js
Scripts.jquery_ui_1_8_24_min_js
I'm using Notepad++ as a regexp tool at the moment, and it is using POSIX regexps.
I can find script name and replace it with these regexps:
Find: \("~/Scripts/(.*)"\)
Replace with \(Scripts.\1\)
But I can't figure out how do I replace dots and dashes in the file names into underscores and replace forward slashes into dots.
I can check that js-filename have dot or dash in a name with this
\("~/Scripts/(?=\.*)(?=\-*).*"\)
But how do I replace groups within a group?
Need to have non-greedy replacement within group, and have these replacements going in an order, so forward slashes converted into a dot will not be converted to underscore afterwards.
This is a non-critical problem, I've already done all the replacements manually, but I thought I'm good with regexp, so this problem bugs me!!
p.s. preferred tool is Notepad++, but any POSIX regexp solution would do -)
p.p.s. Here you can get a sample of stuff to be replaced
And here is the target text
I would just use a site like RegexHero
You can paste the code into the target string box, then place (?<=(~/Script).*)[.-](?=(.*"[)]")) into the Regular Expression box, with _ in the Replacement String box.
Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.
From there, Paste (?<=(<script).*)("~/)(?=(.*[)]" ))|(?<=(Url.).*)(")(?=(.*(\)" ))) into the Regular Expression box and leave the Replacement String box empty.
Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.
From there paste (?<=(Script).*)[/](?=(.*[)]")) into the Regular Expression box and . into the Replacement String box.
After that, the Final String box will have what you are looking for. I'm not sure the upper limits of how much text you can parse, but it could be broken up if that's an issue. I'm sure there might be better ways to do it, but this tends to be the way I go about things like this. One reason I like this site, is because I don't have to install anything, so I can do it anywhere quickly.
Edit 1: Per the comments, I have moved step 3 to Step 5 and added new steps 3 and 4. I had to do it this way, because new Step 5 would have replaced the / in "~/Scripts with a ., breaking the removal of "~/. I also had to change Step 5's code to account for the changed beginning of Script
Here is a vanilla Notepad++ solution, but it's certainly not the most elegant one. I managed to do the transformation with several passes over the file.
First pass
Replace . and - with _.
Find: ("~/Scripts[^"]*?)[.-]
Replace With: \1_
Unfortunately, I could not find a way to match only the . or -, because it would require a lookbehind, which is apparently not supported by Notepad++. Due to this, every time you execute the replacement only the first . or - in a script name will be replaced (because matches cannot overlap). Hence, you have to run this replacement multiple times until no more replacements are done (in your example input, that would be 8 times).
Second pass
Replace / with ..
Find: ("~/Scripts[^"]*?)/
Replace with: \1.
This is basically the same thing as the first pass, just with different characters (you will have to do this 3 times for the example file). Doing the passes in this order ensures that no slashes will end up as underscores.
Third pass
Remove the surrounding characters.
Find: "~/(Scripts[^"]*?)"
Replace with: \1
This will now match all the script names that are still surrounded by "~/ and ", capturing what is in between and just outputting that.
Note that by including those surrounding characters in the find patterns of the first two passes, you can avoid converting the . in strings that are already of the new format.
As I said this is not the most convenient way to do it. Especially, since passes one and two have to be executed manually multiple times. But it would still save a lot of time for large files, and I cannot think of a way to get all of them - only in the correct strings - in one pass, without lookbehind capabilities. Of course, I would very much welcome suggestions to improve this solution :). I hope I could at least give you (and anyone with a similar problem) a starting point.
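
For comparison, outside Notepad++ the whole thing can be done in one pass with a replacement callback. This is a rough standalone Python sketch, not a Notepad++ feature. The function names are made up, and it assumes every path is written as a double-quoted "~/Scripts/..." string as in the question.

```python
import re

# Capture everything after "~/Scripts/ up to the closing quote.
SCRIPT_PATH = re.compile(r'"~/Scripts/([^"]+)"')


def to_t4mvc(match):
    """Build the T4MVC-style name for one matched script path."""
    parts = match.group(1).split("/")
    # Inside each path segment, dots and dashes become underscores;
    # the segments themselves are then joined with dots.
    return "Scripts." + ".".join(re.sub(r"[.-]", "_", part) for part in parts)


def convert(text):
    return SCRIPT_PATH.sub(to_t4mvc, text)


print(convert('"~/Scripts/jquery-ui-1.8.24.min.js"'))
```

Splitting on / before touching dots and dashes is what keeps the order problem from the question from arising.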
If, as your question indicates, you'd like to use N++ then use N++ Python Script. Setup the script and assign a shortcut key, then you have a single pass solution requiring only to open, modify, and save... can't get much simpler than that.
I think part of the problem is that N++ is not a regex tool, and the use of a dedicated regex tool, or even a search/replace solution, is sometimes warranted. You may be better off, both in speed and in time value, using a tool made for text processing rather than editing.
[Script Edit]:: Altered to match the modified in/out expectations.
# Substitute & Replace within matched group.
from Npp import *
import re
def repl(m):
    return "(Scripts." + re.sub( "[-.]", "_", m.group(1) ).replace( "/", "." ) + ")"
editor.pyreplace( '(?:[(].*?Scripts.)(.*?)(?:"?[)])', repl )
Install:: Plugins -> Plugin Manager -> Python Script
New Script:: Plugins -> Python Script -> script-name.py
Select target tab.
Run:: Plugins -> Python Script -> Scripts -> script-name
[Edit: An extended one-liner PythonScript command]
Having need for the new regex module for Python (that I hope replaces re) I played around and compiled it for use with the N++ PythonScript plugin and decided to test it on your sample set.
Two commands on the console ended up with the correct results in the editor.
import regex as re
editor.setText( (re.compile( r'(?<=.*Content[(].*)((?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+))+(?=.*[)]".*)' ) ).sub(lambda m: {'omit':'','toDot':'.','toUnderscore':'_'}[[ key for key, value in m.groupdict().items() if value != None ][0]], editor.getText() ) )
Very sweet!
What else is really cool about using regex instead of re was that I was able to build the expression in Expresso and use it as is! Which allows for a verbose explanation of it, just by copy-paste of the r'' string portion into Expresso.
The abbreviated text of which is::
Match a prefix but exclude it from the capture. [.*Content[(].*]
[1]: A numbered capture group. [(?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+)], one or more repetitions
Select from 3 alternatives
[omit]: A named capture group. [["~]+?([~])[/]|["]]
Select from 2 alternatives
["~]+?([~])[/]
Any character in this class: ["]
[toUnderscore]: A named capture group. [[-.]+]
[toDot]: A named capture group. [[/]+]
Match a suffix but exclude it from the capture. [.*[)]".*]
The command breakdown is fairly nifty, we are telling Scintilla to set the full buffer contents to the results of a compiled regex substitution command by essentially using a 'switch' off of the name of the group that isn't empty.
Hopefully Dave (the PythonScript Author) will add the regex module to the ExtraPythonLibs part of the project.
Alternatively you could use a script that would do it and avoid copy pasting and the rest of the manual labor altogether. Consider using the following script:
$_.gsub!(%r{(?:"~/)?Scripts/([a-z0-9./-]+)"?}i) do |i|
  'Scripts.' + $1.split('/').map { |i| i.gsub(/[.-]/, '_') }.join('.')
end
And run it like this:
$ ruby -pi.bak script.rb *.ext
All the files with extension .ext will be edited in-place and the original files will be saved with .ext.bak extension. If you use revision control (and you should) then you can easily review changes with some visual diff tool, correct them if necessary and commit them afterwards.

hgignore: help ignoring all files but certain ones

I need an .hgdontignore file :-) to include certain files and exclude everything else in a directory. Basically I want to include only the .jar files in a particular directory and nothing else. How can I do this? I'm not that skilled in regular expression syntax. Or can I do it with glob syntax? (I prefer that for readability)
Just as an example location, let's say I want to exclude all files under foo/bar/ except for foo/bar/*.jar.
The answer from Michael is a fine one, but another option is to just exclude:
foo/bar/**
and then manually add the .jar files. You can always add files that are excluded by an ignore rule and it overrides the ignore. You just have to remember to add any jars you create in the future.
To do this, you'll need to use this regular expression:
foo/bar/.+?\.(?!jar).+
Explanation
You are telling it what to ignore, so this expression is searching for things you don't want.
You look for any file whose name (including relative directory) includes (foo/bar/)
You then look for any characters that precede a period ( .+?\. == match one or more characters of any type until you reach the period character)
You then make sure it doesn't have the "jar" ending (?!jar) (This is called a negative lookahead)
Finally you grab the ending it does have (.+)
Regular expressions are easy to mess up, so I strongly suggest that you get a tool like Regex Buddy to help you build them. It will break down a regex into plain English which really helps.
EDIT
Hey Jason S, you caught me, it does miss those files.
This corrected regex will work for every example you listed:
foo/bar/(?!.*\.jar$).+
It finds:
foo/bar/baz.txt
foo/bar/baz
foo/bar/jar
foo/bar/baz.jar.txt
foo/bar/baz.jar.
foo/bar/baz.
foo/bar/baz.txt.
But does not find
foo/bar/baz.jar
New Explanation
This says look for files in "foo/bar/" , then do not match if there are zero or more characters followed by ".jar" and then no more characters ($ means end of the line), then, if that isn't the case, match any following characters.
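
Here is a small Python sketch to check the corrected pattern against the paths listed above. It uses re.search, on the assumption that the ignore pattern may match anywhere in the path, and the helper name is made up for the example.

```python
import re

# Corrected pattern from the answer above.
IGNORE = re.compile(r"foo/bar/(?!.*\.jar$).+")


def is_ignored(path):
    """True if the pattern matches, i.e. the file would be ignored."""
    return IGNORE.search(path) is not None


paths = [
    "foo/bar/baz.txt",
    "foo/bar/baz",
    "foo/bar/jar",
    "foo/bar/baz.jar.txt",
    "foo/bar/baz.jar.",
    "foo/bar/baz.",
    "foo/bar/baz.txt.",
    "foo/bar/baz.jar",
]
kept = [p for p in paths if not is_ignored(p)]
print(kept)
```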
Anyone who wants to use negative lookaheads (?! in regex syntax) or any kind of back-referencing mechanism should be aware that Mercurial will fall back from Google's RE2 to Python's re module for matching.
RE2 is a non-backtracking engine that guarantees a run time linear in the size of the input. If performance is important to you, that is, if you have a big repository, you should consider sticking to simpler patterns that RE2 supports, which is why I think that the solution offered by Ryan.