Regular Expression Searching Matching One Word but Not Another - regex

<ReportExport ID="export1" runat="server" AlertNoTests="false" PDFPageOrientation="Portrait"
HideExcel="true" OnPDFClicked="CreatePDF" AllowPDFOptions="true" HideBulkPDFOptions="false"
HideOrientation="true" HidePaperSize="true" MaxReportsAtOnce="250" HideTextExport="true" />
I'm trying to use Visual Studio's find feature using regular expressions to find ReportExport in my entire solution where the HideTextExport property is not being set. This is only ever defined in the markup once on a given page.
Any ideas on how I would find where ReportExport exists... but HideTextExport does not exist in the text?
Thanks in advance!

This works for me:
\<ReportExport(:Wh+~(HideTextExport):w=:q)+:Wh*/\>
:Wh+ matches the whitespace preceding the attribute name and :w matches the name, but only after ~(HideTextExport) confirms that the name is not "HideTextExport". :q matches the attribute's value (assuming values are always quoted). < and > have to be escaped or VS Find will treat them as word boundaries.
This is effectively the same as the .NET regex,
<ReportExport(?:\s+(?!HideTextExport)[A-Za-z]+="[^"]+")+\s*/>
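For anyone who wants to try the negative-lookahead idea outside the Find dialog, here is a minimal Python sketch. It reuses the .NET-style pattern above unchanged. The two sample tags and the helper name lacks_hide_text_export are made up for illustration and are not from the original solution.

```python
import re

# Same pattern as the .NET regex above: every attribute name is checked
# with (?!HideTextExport) before it is consumed.
PATTERN = re.compile(
    r'<ReportExport(?:\s+(?!HideTextExport)[A-Za-z]+="[^"]+")+\s*/>'
)

# Hypothetical sample markup, one tag with the attribute and one without.
with_attr = '<ReportExport ID="export1" runat="server" HideTextExport="true" />'
without_attr = '<ReportExport ID="export2" runat="server"\n    HideExcel="true" />'


def lacks_hide_text_export(markup: str) -> bool:
    """Return True if a ReportExport tag without HideTextExport is found."""
    return PATTERN.search(markup) is not None
```

The pattern only matches a tag whose attributes all pass the lookahead, so a tag that sets HideTextExport anywhere is skipped. \s also matches newlines, which is why the multi-line tag is still found.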

First, install the Productivity Power Tools extension for Visual Studio (via Tools->Extension Manager) and use .NET regex instead of the antiquated regex syntax that Visual Studio's Find provides out of the box.
With that, you could use this regex pattern (provided Productivity Power Tools has single-line mode turned on, so that the match can span the lines of the element):
(ReportExport.+?HideTextExport="false")
That will return every ReportExport where the attribute is false, and the regex could be tweaked into a replacement that changes false to true.
But if HideTextExport is missing, this approach is a poor fit: because the location of the attribute is uncertain, .* or .+ becomes too greedy and ends up reporting false positives when the goal is to find text that is missing from a match.
Put more generally: finding patterns is what regex does, but finding a missing pattern requires lexical analysis, which regex simply cannot do.

Related

REGEX in MS Word 2016: Exclude a simple String from Search

So I read a lot about Negation in Regex but can't solve my problem in MS Word 2016.
How do I exclude a String, Word, Number(s) from being found?
Example:
<[A-Z]{2}[A-Z0-9]{9;11}> to search a String like XY123BBT22223
But how do I exclude a specific one, for example SEDWS12WW04?
Well it depends on what you need to achieve or is this a matter of curiosity... RegEx is not the same as the built-in Advanced Find with Wildcards; for that you need VBA.
Depending on your need, without using VBA, you could make use of space and return characters - something like this will work for the strings provided: [ ^13][A-Z]{2}[0-9]{1,}[A-Z]{1,}[0-9]{1,}[ ^13] (assuming you use normal carriage returns and spaces in your document)
Anyway, this is a good article on wildcard searches in MS Word: https://wordmvp.com/FAQs/General/UsingWildcards.htm
EDIT:
In light of your further comments you will probably want to look at section 8 of the linked article which explains grouping. For my proposed search you can use this to your advantage by creating 3 groups in your 'find' and only modifying the middle group, if indeed you do intend to modify. Using groups the search would look something like:
([ ^13])([A-Z]{2}[0-9]{1,}[A-Z]{1,}[0-9]{1,})([ ^13])
and the replace might look like this:
\1 SOMETHING \3
Note also: compared to a RegEx solution my suggestion is kinda lame, mainly because, compared to RegEx, MS Word's find and replace (good as it is, and really it is) is kinda lame... it's hacky but it might work for you (although you might need to do a few searches).
BUT... if it really is REGEX that you want, well you can get access to this via VBA: How to Use/Enable (RegExp object) Regular Expression using VBA (MACRO) in word
And... then you will be able to use proper RegEx for find and replace, well almost - I'm under the impression that the VBA RegEx still has some quirks...
As already noted by others, this is not possible in Microsoft Word's flavor of regular expressions.
Instead, you should use standard regular expressions. It is actually possible to use standard regular expressions in MS Word if you use a special tool that integrates into Microsoft Word called Multiple Find & Replace (see http://www.translatortools.net/products/transtoolsplus/word-multiplefindreplace). This tool opens as a pane to the right of the document window and works just like the Advanced Find & Replace dialog. However, in addition to Word's existing search functionality, it can use the standard regular expressions syntax to search and replace any text within a Word document.
In your particular case, I would use this:
\b[A-Z]{2}[A-Z0-9]{9,11}\b(?<!\bSEDWS12WW04)
To explain, this searches for a word boundary + ID + word boundary, and then it looks back to make sure that the preceding string does not match [word boundary + excluded ID]. In a similar vein, you can do something like
(?<!\bSEDWS12WW04|\bSEDWS12WW05)
to exclude several IDs.
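To see the lookbehind exclusion in action without Word, here is a small Python sketch. The pattern is the one given above. The sample text (including the ID AB12CD34EF5) is invented for illustration, and Python's re module is only a stand-in for whatever engine the tool uses.

```python
import re

# ID pattern followed by a negative lookbehind that rejects one specific ID.
# Python requires fixed-width lookbehinds; \b is zero-width, so this is fine.
ID_PATTERN = re.compile(r'\b[A-Z]{2}[A-Z0-9]{9,11}\b(?<!\bSEDWS12WW04)')

# Hypothetical sample text with two wanted IDs and the excluded one.
text = "XY123BBT22223 SEDWS12WW04 AB12CD34EF5"

matches = ID_PATTERN.findall(text)
```

The lookbehind runs after the ID has been consumed, so it simply asks whether the text just matched is the excluded ID.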
Multiple Find & Replace is quite powerful: you can add any number of expressions (either using regular expressions or using Word's standard search syntax) to a list and then search the document for all of them, replace everything, display all matches in a list and replace only specific matches, and a few more things.
I created this tool for translators and editors, but it is great for any advanced search/replace operations in Word, and I am sure you will find it very useful.
Best regards, Stanislav

Preserve case during visual studio regex find and replace

I'm trying to find and replace strings using the Visual Studio regex find and replace in some code which includes a lot of inline documentation.
e.g. replace "east" with "north", and "East" with "North".
Since the files contain grammatically correct English right now, I want to be careful not to alter the case of text that may get replaced in the comments.
I know you can turn on the match case, or have one regex for lowercase and one for capitalized words, but I'm wondering if I actually have to do it twice or not (obviously I don't want to).
I've seen other answers for Perl and JavaScript which give language-specific answers to this question (requiring callbacks), but I'm wondering if it's possible to do just within the Visual Studio dialog.
If you study Using Regular Expressions in Visual Studio, you will see that there is no operator that would keep the case of any specified letter matched/captured with a regex.
In some regex flavors, like in Perl and R (g)sub, you could turn your captures/matches lower/uppercase with a specific operator, but again, it would be a hardcoded action, not keeping the original case intact.
Thus, the only option you have with regex is to run individual search and replace operations (like east --> north and East --> North, maybe with word boundaries around \beast\b to match a whole word).
Otherwise, you need to process the text with custom code written in a full-fledged language.
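As a rough idea of what such custom code could look like, here is a minimal Python sketch that uses a replacement callback. The function name replace_keep_case and the three case rules (all caps, capitalized, lowercase) are assumptions chosen for illustration. Mixed-case words such as "eAst" would simply fall through to lowercase.

```python
import re


def replace_keep_case(text: str, old: str, new: str) -> str:
    """Replace whole-word `old` with `new`, copying the case style of each match."""

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        if word.isupper():          # EAST -> NORTH
            return new.upper()
        if word[0].isupper():       # East -> North
            return new.capitalize()
        return new.lower()          # east -> north

    # \b keeps the match to whole words; IGNORECASE finds every case variant.
    pattern = rf'\b{re.escape(old)}\b'
    return re.sub(pattern, _swap, text, flags=re.IGNORECASE)
```

For example, replace_keep_case("East of the east gate", "east", "north") gives "North of the north gate" in a single pass.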

Visual Studio Find and Replace with Regex

I want to replace C# attributes with VB.NET, which means, [Serializable] should become <Serializable>.
The pattern (\[)(.+)(\]) does find the results, but I don't know how to replace the first and the last groups with the appropriate angle brackets.
I read this page, but I didn't understand how to use the curly braces for F&R, I tried to wrap the groups with it but it didn't work.
If you are using Microsoft's Productivity Power Tools extension, which supports normal .NET regexes, this is what you would put in the replacement textbox for the regular expression above:
<$2>
where $2 refers to the second capture group in your regex, i.e. the text between the brackets.
Note that this only works with the Quick Find from Productivity Power Tools, though. The normal find/replace in Visual Studio uses another syntax altogether.
Find what: \[{Serializable}\]
Replace with: <\1>
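The same group-reference idea can be tried outside the IDE. The Python sketch below uses the question's original three-group pattern and keeps only the second group. The sample source text and the function name to_vb_attribute are made up for illustration. Note that this naive pattern would also rewrite other bracketed text such as array indexers, so it is a demonstration rather than a safe converter.

```python
import re


def to_vb_attribute(source: str) -> str:
    """Rewrite [Something] as <Something> using the second capture group."""
    # Group 1 is '[', group 2 is the attribute text, group 3 is ']'.
    return re.sub(r'(\[)(.+)(\])', r'<\2>', source)


# Hypothetical C# snippet.
csharp = "[Serializable]\npublic class Foo { }"
converted = to_vb_attribute(csharp)
```

Python writes the group reference as \2 where .NET uses $2, but the numbering of the groups is the same.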

Looking for a regex to match more than one reference string in TortoiseSVN

We used two different methods to reference external documents and Bugzilla bug numbers.
I'm now looking for a regular expression that matches these two kinds of reference strings for convenient display and linking in the TortoiseSVN 1.6.16 log screen. The first is a Bugzilla entry of the form [BZ#123]; the second is [some text and numbers], which must not be converted into a URL.
This can be matched with
\[BZ#\d+\]
and
\[.*?\]
My problem now is to concatenate those two match strings together. Usually this would be done by the regex (first|second), and I've done it this way:
(\[.*?\]|\[BZ#\d+\])
Unfortunately in this case TortoiseSVN seems to catch it all as the bug number because of the round braces. Even if I add a second expression which (according to the documentation) is meant to be used to extract the issue number itself, this second expression is supposed to be ignored:
(\[.*?\]|\[BZ#\d+\])
\[BZ#(\d+)\]
In this case TortoiseSVN displays the bug and document references correctly in the separate column, but uses them completely for the bugtracker url, which is of course not working:
https://mybugzillaserver/show_bug.cgi?id=[BZ#949]
BTW, Mercurial uses a better way by using {1}, {2}, ... as the placeholder in URLs.
Has anybody an idea how to solve this problem?
EDIT
In short: We have used [BZ#123] as bug number references and [anytext] as references to other (partly non-electronic) documents. We would like to have both patterns listed in TortoiseSVN's extra column, but only the bug number from the first part should be used as %BUGID% in the URL string.
EDIT 2
Supposedly TortoiseSVN cannot handle nested regex groups (round braces), so this question doesn't have any satisfactory answer at the moment.
I'm not familiar with TortoiseSVN's regex support, but it looks like the problem is that the first piece of the regex, \[.*?\], always matches, so the engine never even gets to evaluating the second part, \[BZ#(\d+)\]
Try this one instead:
((?<=\[BZ#)\d+(?=\])|\[.*?\])
Explanation:
( #Opening group.
(?<=\[BZ#) #Look behind for a bugzilla placeholder.
\d+ #Capture just the digits.
(?=\]) #Look ahead for the closing bracket (probably not necessary.)
| #Or, if that fails,
\[.*?\] #Find all other placeholders.
) #Closing the group.
Edit: I've just looked at TortoiseSVN docs. You could also try to keep the Message part expression the same, but change the Bug-ID expression to:
(?<=\[BZ#)(\d+)(?=\])
Edit: ?<= represents a zero-width lookbehind. See http://www.regular-expressions.info/lookaround.html. It is possible that TortoiseSVN doesn't support lookbehinds.
What happens if you just use (\d+) for your Bug-ID expression?
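To reason about the two-expression setup independently of TortoiseSVN, here is a small Python simulation. It assumes, as an illustration only, that the first expression selects the bracketed references from the log message and the second expression is then applied to each selected reference to pull out the bare number. Whether TortoiseSVN 1.6.16 behaves exactly like this is not confirmed here. The function name extract_references and the sample log message are hypothetical.

```python
import re

# First stage: every bracketed reference, bug or document.
MESSAGE_RE = re.compile(r'\[.*?\]')
# Second stage: only Bugzilla references, capturing the digits.
BUG_ID_RE = re.compile(r'\[BZ#(\d+)\]')


def extract_references(log_message: str):
    """Return (all bracketed references, bare bug IDs) for one log message."""
    references = MESSAGE_RE.findall(log_message)
    bug_ids = []
    for ref in references:
        match = BUG_ID_RE.fullmatch(ref)
        if match:                      # document references are skipped
            bug_ids.append(match.group(1))
    return references, bug_ids


refs, ids = extract_references("Fix crash [BZ#949], see [Spec 12 rev 3]")
```

In this model both references show up in the list, but only 949 would be substituted into the bug tracker URL.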

Removing everything between a tag (including the tag itself) using Regex / Eclipse

I'm fairly new to figuring out how Regex works, but this one is just frustrating.
I have a massive XML document with a lot of <description>blahblahblah</description> tags. I want to basically remove any and all instances of <description></description>.
I'm using Eclipse and have tried a few examples of Regex I've found online, but nothing works.
<description>(.*?)</description>
Shouldn't that work?
EDIT:
Here is the actual code.
<description><![CDATA[<center><table><tr><th colspan='2' align='center'><em>Attributes</em></th></tr><tr bgcolor="#E3E3F3"><th>ID</th><td>308</td></tr></table></center>]]></description>
I'm not familiar with Eclipse, but I would expect its regex search facility to use Java's built-in regex flavor. You probably just need to check a box labeled "DOTALL" or "single-line" or something similar, or you can add the corresponding inline modifier to the regex:
(?s)<description>(.*?)</description>
That will allow the . to match newlines, which it doesn't by default.
EDIT: This is assuming there are newlines within the <description> element, which is the only reason I can think of why your regex wouldn't work. I'm also assuming you really are doing a regex search; is that automatic in Eclipse, or do you have to choose between regex and literal searching?
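The effect of the (?s) modifier is easy to check in isolation. The Python sketch below runs the same pattern with and without it on a made-up XML fragment whose description spans two lines. Python's re module is used as a stand-in here, on the assumption that Eclipse's Java-based engine treats (?s) the same way.

```python
import re

# Hypothetical XML fragment; the description contains a newline.
xml = "<item><description>line one\nline two</description><name>A</name></item>"

# Without (?s), '.' stops at the newline, so nothing is removed.
without_flag = re.sub(r'<description>(.*?)</description>', '', xml)

# With (?s), '.' also matches newlines and the whole element is removed.
with_flag = re.sub(r'(?s)<description>(.*?)</description>', '', xml)
```

If the descriptions in the real file never contain line breaks, the original pattern should already match and the problem lies elsewhere, for example in the search mode.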