Wildcard in Word 2013 to match zero or more whitespaces - regex

What is the Word 2013 wildcard equivalent of the regular expression * quantifier (zero or more)?
In the Word 2013 Find tool with wildcards enabled, 0 apparently is not a valid repeat count. For example, if you type the following in the search box:
fe{1,2}d
it will match fed and feed. However,
fe{0,2}d
will just produce an error message. What is the correct expression to match fd, fed, feed, feeed, etc.?
My goal is to match a specific text when it forms a paragraph on its own (i.e., it is surrounded by paragraph marks ^13), possibly followed by whitespace:
^13hello world {0,}^13
This just produces an error message. I found no way to do this without enabling wildcards, and even with wildcards enabled I can't get it to work.
Similarly,
^13hello world #^13
matches one or more spaces, but I need zero or more.

I don't believe Word has ever had an equivalent for the zero-or-more operator, so while I haven't checked in Word 2013, I wouldn't expect to see it there either. (This page is old, but as far as I know it's still pretty authoritative on wildcard searching in Word: http://word.mvps.org/faqs/general/usingwildcards.htm)
In general, I would suggest doing two searches, one without the character and one using the 1-or-more operator.
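To illustrate the two-search workaround, the sketch below uses Python's re module as a stand-in for Word's wildcard engine (an assumption: Word's own syntax differs, and Python's + only plays the role of Word's 1-or-more operator here). It checks that the union of the two searches accepts exactly the words that a single zero-or-more pattern would.

```python
import re

words = ["fd", "fed", "feed", "feeed", "fad"]

# Search 1: the character is absent entirely (zero occurrences).
zero = re.compile(r"fd")
# Search 2: one or more occurrences (stand-in for Word's 1-or-more wildcard).
one_or_more = re.compile(r"fe+d")
# What a regex engine with a * quantifier does in a single pass.
zero_or_more = re.compile(r"fe*d")

two_pass = [w for w in words if zero.fullmatch(w) or one_or_more.fullmatch(w)]
one_pass = [w for w in words if zero_or_more.fullmatch(w)]
print(two_pass)              # ['fd', 'fed', 'feed', 'feeed']
print(two_pass == one_pass)  # True
```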

Related

Regex: how to get the nth word as the full match (without using capturing groups)

I am trying to use a regex to return the nth word in a string. Other answers to similar questions would make this simple enough; however, I do not have access to any of the code. I can only access a regex input field, and the server returns only the 'full match'; it cannot be made to return captured groups such as 'group 1'.
EDIT:
From the developers explaining the version of regex used:
"...its javascript regex so should mostly be compatible with perl i believe but not as advanced, its fairly low level so wasn't really intended for use by end users when originally implemented - i added the dropdown with the intention of having some presets going forwards."
/EDIT
Sample String:
One Two Three Four Five
Attempted solution (which is meant to get just the 2nd word):
^(?:\w+ ){1}(\S+)$
The result is:
One Two
I have also tried other variations of the regex:
(?:\w+ ){1}(\S+)$
^(?:\w+ ){1}(\S+)
But these just return the entire string.
I have tried reproducing the behaviour I see using regex101, but the results seem to differ, particularly when I move the ^ and $ around.
For example, I get the same output on regex101 if I use the altered regex:
^(?:\w+ ){1}(\S+)
In any case, none of this comparison has actually helped me achieve my aim.
I am hoping that I have just missed something basic!
===EDIT===
Thanks to all of you who have contributed so far; however, I am still running into issues. I am afraid I do not know the language or restrictions of the regex beyond what I can work out by trial and error, so here is a list of attempts and results, all of which try to return "Two" from this sample:
One Two Three Four Five
\w+(?=( \w+){1}$)
returns all words
^(\w+ ){1}\K(\w+)
returns no words at all (so I assume that \K does not work)
(\w+? ){1}\K(\w+?)(?= )
returns no words at all
\w+(?=\s\w+\s\w+\s\w+$)
returns all words
^(?:\w+\s){1}\K\w+
returns all words
====
Since none of the above worked, I tested a few other patterns to probe the limitations of the system.
Attempting to return the last word:
\w+$
returns all words
This leads me to believe that something strange is going on with the start ^ and end $ anchors; perhaps the server adds them automatically if they are omitted? Any further ideas are greatly appreciated.
I don't know whether your language supports positive lookbehind, so using your example,
One Two Three Four Five
here is a solution that should work in any language that supports lookahead:
\w+ match the first word
\w+$ match the last word
\w+(?=\s\w+$) match the 4th word
\w+(?=\s\w+\s\w+$) match the 3rd word
\w+(?=\s\w+\s\w+\s\w+$) match the 2nd word
So, if a string contains 10 words:
The first and last words are easy to find. To find the word at any other position, use this rule:
\w+(?= followed by \s\w+ (10 - position) times followed by $)
Example
In this string:
One Two Three Four Five Six Seven Eight Nine Ten
I want to find the 6th word.
10 - 6 = 4
\w+(?= followed by \s\w+ 4 times followed by $)
Our final regex is
\w+(?=\s\w+\s\w+\s\w+\s\w+$)
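The counting rule can be sketched in Python as a minimal illustration. The helper name nth_word_pattern is hypothetical, and Python's re is used only as a stand-in engine with the same lookahead syntax. The helper repeats \s\w+ (total - position) times inside the lookahead.

```python
import re


def nth_word_pattern(position: int, total: int) -> str:
    """Build the lookahead pattern for the word at `position` (1-based)
    in a string of `total` whitespace-separated words."""
    remaining = total - position  # number of words that must follow the target
    return r"\w+(?=" + r"\s\w+" * remaining + r"$)"


text = "One Two Three Four Five Six Seven Eight Nine Ten"
pattern = nth_word_pattern(6, 10)
print(pattern)                           # \w+(?=\s\w+\s\w+\s\w+\s\w+$)
print(re.search(pattern, text).group())  # Six
```

Because the lookahead consumes nothing, the full match is only the target word.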
It's possible to use the match reset (\K) to reset the start of the match and obtain the third word of a string as follows:
(\w+? ){2}\K(\w+?)(?= )
I'm not sure what language you're working in, so you may or may not have access to this feature.
I'm not sure whether your language supports \K, but I'm sharing this anyway in case it does:
^(?:\w+\s){3}\K\w+
to get the 4th word.
^ is the start-of-string anchor
(?:\w+\s){3} is a non-capturing group that matches three words, each followed by whitespace
\K is a match reset: it resets the start of the match, so the previously matched characters aren't included
\w+ consumes the nth word
And similarly,
^(?:\w+\s){1}\K\w+ for the 2nd word
^(?:\w+\s){2}\K\w+ for the 3rd word
^(?:\w+\s){3}\K\w+ for the 4th word
and so on...
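A quick way to find out whether a given engine accepts \K is to compile a pattern that uses it. The sketch below does this with Python's built-in re module, chosen only as an example engine (it is not the asker's server). That module has no match reset and rejects the escape at compile time.

```python
import re

# Pattern from the answer above: skip one word, then reset the match start.
pattern = r"^(?:\w+\s){1}\K\w+"

try:
    re.compile(pattern)
    accepted = True
except re.error as err:
    # Since Python 3.7, an unknown escape such as \K is a compile-time error
    # in the built-in re module.
    accepted = False
    print("rejected:", err)

print(accepted)  # False
```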
So, on the downside, you can't use lookbehind, because in many engines it has to be a fixed-width pattern. However, the "full match" is just whatever the pattern ends up matching, so you just need a pattern whose match is your word.
With a positive lookahead, you can get the nth word from the right:
\w+(?=( \w+){n}$)
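A small Python sketch of this lookahead with n filled in follows. The function nth_from_right is hypothetical, and Python's re stands in for the asker's engine. Because the lookahead consumes nothing, group 0 (the full match) is just the word, even though the pattern contains a capturing group. That is what a server reporting only the full match would show.

```python
import re


def nth_from_right(text: str, n: int):
    """Return the word followed by exactly n words (n = 0 is the last word)."""
    # The answer's pattern with n substituted into the {n} quantifier.
    match = re.search(r"\w+(?=( \w+){%d}$)" % n, text)
    return match.group(0) if match else None


sentence = "One Two Three Four Five"
print(nth_from_right(sentence, 3))  # Two
print(nth_from_right(sentence, 0))  # Five
```

To get the word at a position counted from the left, use n = total words - position (here 5 - 2 = 3 for "Two").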
If your server's engine supports extended features, \K can "clear" the items matched so far, but most regex engines don't support it.
^(\w+ ){n}\K(\w+)
Unfortunately, regex has no standard "match only the nth occurrence" feature, so counting from the right is the best you can do. (Also, Regex101 has a searchable quick reference in the bottom-right corner for looking up special characters; just remember that most of those are not supported by every regex engine.)

RegEx Expression for Eclipse that searches for all items that have not been dealt with

To help prevent SQL injection attacks, I am going through about 2000 parameter requests in my code to validate them. I validate each one by determining what type of value (e.g., integer, double) it should return and then applying a function to it that sanitizes the value.
Requests I have already dealt with look like this:
*SecurityIssues.*(request.getParameter
where * signifies any number of characters on the same line.
What regular expression can I use in the Eclipse search (Ctrl+H) to find all the ones I have not yet dealt with, i.e., every occurrence of the text request.getParameter that is not preceded by the word SecurityIssues?
Examples of matches
The regular expression should match each of the following:
int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))
double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))
int c = request.getParameter("DUMMY")
But should not match:
int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))
With inspiration and the links provided by @michaeak (thank you), as well as testing at https://regex101.com/, I appear to have found the answer:
^((?!SecurityIssues).)*(request\.getParameter)
The advantage of this answer is that I can blacklist the word SecurityIssues, as opposed to having to whitelist the formats that I do want.
Note that it is relatively slow, and it also slowed my computer down a lot while the search was running.
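The pattern above can be checked against the sample lines from the question with a short Python sketch. Python's re module stands in for Eclipse's regex engine (presumably Java's java.util.regex). The constructs used, a negative lookahead inside a repeated group and an escaped dot, behave the same way in Java.

```python
import re

# From the start of the line, consume one character at a time as long as
# "SecurityIssues" does not begin at that position, then require
# request.getParameter.
UNHANDLED = re.compile(r"^((?!SecurityIssues).)*(request\.getParameter)")

lines = [
    'int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))',
    'double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))',
    'int c = request.getParameter("DUMMY")',
    'int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))',
]

for line in lines:
    # True means the line still needs attention.
    print(bool(UNHANDLED.search(line)), line)
```

The lookahead runs at every character position, which is one reason this kind of search is slow over thousands of files.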
Try, for example:
=\s*?((?!SecurityIssues).)*?(request\.getParameter)\(
Notes
Parentheses ( and ) are special characters used for grouping. To match them literally, escape them with \.
.* is greedy: it matches as much as possible, including characters you don't want it to match. .*? is reluctant (lazy): it matches as little as possible and only expands when the rest of the pattern requires it. This can be helpful when other items need to match after the wildcard.
There is a tutorial at https://docs.oracle.com/javase/tutorial/essential/regex/index.html; I think all of these features should be available in Eclipse. You can then also handle generic replacements.
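The notes on escaping and on greedy versus reluctant matching can be seen in a small Python sketch. Python's re is a stand-in for Java's engine, and the sample line with two calls is made up for illustration.

```python
import re

line = ('int c = StringToInt(request.getParameter("A")) + '
        'StringToInt(request.getParameter("B"))')

# Greedy: .* first runs to the end of the line, then backs off only as far
# as needed, so the match ends at the LAST request.getParameter( on the line.
greedy = re.search(r"=.*(request\.getParameter)\(", line)
# Reluctant: .*? grows one character at a time, so it stops at the FIRST one.
lazy = re.search(r"=.*?(request\.getParameter)\(", line)
print(greedy.group())
print(lazy.group())  # = StringToInt(request.getParameter(

# An unescaped "(" opens a group, so this pattern does not even compile.
try:
    re.compile(r"getParameter(")
    unescaped_ok = True
except re.error:
    unescaped_ok = False
print(unescaped_ok)  # False
```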
Problem
From reading "Regular expression that doesn't contain certain string" and "Regular expression to match a line that doesn't contain a word?", it seems quite difficult to create a regex that matches anything except text containing a certain word.

Regex alphanumeric with hyphen, single quotes, and single spacing is timing out (crashing)

The following regular expression crashes my browsers (it does nothing and then likely times out).
I am trying to accept alphanumeric characters, as well as dashes and single quotes. I'm also trying to restrict spacing to single spaces (no more than one space in a row).
<constant>
<constant-name>expressionFormat</constant-name>
<constant-value>^([a-zA-Z0-9'-]+\s?)*$</constant-value>
</constant>
A sample example string that crashes with this is:
"ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 ‘ –"
I'm using Struts. Any tips on what I'm doing wrong? Thanks in advance!
I've found a solution.
My OLD expression:
^([a-zA-Z0-9'-]+\s?)*$
First off, I got rid of the \s, since it includes other things like tabs, newlines, etc., which I do not want.
The ? is "greedy", and because \s? is optional, the + inside the group and the * outside it can split the same run of characters in many different ways. When the regex fails, the engine keeps trying those alternatives until it is sure the match has failed. In essence, the + and ? together made it backtrack over and over, which is very resource-intensive for longer strings.
The following expression works much better for my case:
^([a-zA-Z0-9' -])*$
I believe that the browser is just taking a really long time to process the regex search and may even be timing out.
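One trade-off of the simpler pattern is that a space becomes just another allowed character, so runs of spaces are no longer rejected. The Python sketch below (Python's re is a stand-in for the Struts validator's engine) compares it with an alternative that is not from the original thread. The alternative enforces single spaces without nested ambiguity: each repetition of the group must start with a space, which the character class never matches, so there is essentially only one way to divide the input. Unlike the original, it rejects empty strings and trailing spaces.

```python
import re

# The replacement pattern from the answer: fast, but allows "  ".
simple = re.compile(r"^([a-zA-Z0-9' -])*$")

# Alternative sketch: words of allowed characters separated by exactly one space.
single_spaced = re.compile(r"^[a-zA-Z0-9'-]+(?: [a-zA-Z0-9'-]+)*$")

candidates = ["O'Neil-Smith 42", "two  spaces", "ABC 123 \u2018 \u2013"]
results = {c: (bool(simple.match(c)), bool(single_spaced.match(c))) for c in candidates}
for c, (a, b) in results.items():
    print(repr(c), "simple:", a, "single-spaced:", b)
```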
Your sample string
ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 ‘ –
will not be matched by your regular expression:
^([a-zA-Z0-9'-]+\s?)*$
Add the special characters ‘ ’ — – to the character class if you want to accept them:
^([a-zA-Z0-9'‘’—–-]+\s?)*$
This regex matches your sample string.
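A quick check of this claim in Python (used here as a stand-in engine) follows. The extra characters are written as \u escapes to avoid encoding mix-ups: U+2018 and U+2019 are the curly single quotes, U+2014 is the em dash, and U+2013 is the en dash. The trailing - in the class is a literal hyphen. The nested quantifiers are unchanged, so an input that still fails can backtrack just as heavily as before.

```python
import re

extended = re.compile(r"^([a-zA-Z0-9'\u2018\u2019\u2014\u2013-]+\s?)*$")

sample = "ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 \u2018 \u2013"
print(bool(extended.match(sample)))  # True
```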
UPDATE:
Try this regex that uses atomic grouping to avoid catastrophic backtracking:
^(?>[a-zA-Z0-9'-]+\s?)*$

Visual Studio 2010 regular expressions for 'Find In Files'

I have looked at the many Stack Overflow posts concerning VS regular expressions and read the Microsoft page on regular expressions, but I still cannot determine where I am going wrong.
Microsoft VS regex
I want to find all lines that contain the word attribute but are not comment lines (i.e., do not contain the // symbol).
I have tried using the regular expression
~(^ *//).*attribute.*
meaning:
~(^ *//) --> exclude lines which begin with '//' preceded by zero or more spaces
.* --> match any character zero or more times
attribute --> match the word attribute
.* --> match any character that comes after the word attribute
I have tried several other regular expressions with much the same lack of success. Can anyone spot something obvious that I am missing?
I also tried the following:
~( *//).*attribute.* (thinking maybe the caret was being taken as a literal instead of as special)
~(//).*attribute.* (thinking maybe the * was being taken as a literal instead of special)
~(//)attribute (certain to fail, but I will try anything)
\s*~(//).*attributes.*
I saw quite a few posts suggesting using the find command in a batch file. That can be done, but I would prefer to be able to double-click a result so that the file opens already scrolled to the correct location.
How about this one?
^(?=.*attribute.*\n)(?!.*//).*
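To see what this pattern selects, here is a Python sketch. Python's re is a stand-in that shares the lookahead syntax, and whether the VS 2010 Find dialog accepts it is not checked here. It uses re.MULTILINE so that ^ matches at every line start, which is an assumption about how the editor applies the pattern. Note that it rejects any line containing //, including code lines with a trailing comment. Because the first lookahead requires a \n, a final line without a newline is never matched.

```python
import re

# The answer's pattern, applied line by line.
PATTERN = re.compile(r"^(?=.*attribute.*\n)(?!.*//).*", re.MULTILINE)

source = (
    "int attribute = 1;\n"
    "// attribute in a comment\n"
    "    // indented attribute comment\n"
    "x.attribute(); // trailing comment\n"
    "no match here\n"
)
print(PATTERN.findall(source))  # ['int attribute = 1;']
```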

Regular Expression Using the Dot-Matches-All Mode

Normally . doesn't match a newline unless I tell the engine to with the (?s) flag. I tried this regex in my editor's regex engine (UltraEdit v14.10) using Perl-style regex mode:
(?s).*i
The search text contains multiple lines and each line contains many 'i' characters.
I expect the regex above to mean: match as many characters as possible (because * is greedy), including newlines (because with (?s) the . now matches anything), up to a final 'i'.
This should mean "from the first character to the last 'i' on the last line" (greediness should reach the last line, right?).
But in UltraEdit's test, it turns out to be "from the first character to the last 'i' on the first line that contains an i". Is this result correct? Have I misinterpreted my regular expression?
For example, given this text:
aaa
bbb
aiaiaiaiaa
bbbicicid
This is what gets matched:
aaa
bbb
aiaiaiai
But I expect:
aaa
bbb
aiaiaiaiaa
bbbicici
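For comparison, here is how Python's re module treats the same pattern and text. It is used only as a reference engine, not as a model of UltraEdit. With (?s), the greedy .* runs to the end of the text and backs off to the last 'i', exactly as expected above. Without (?s), the first match found is the one on the first line containing an 'i'.

```python
import re

text = "aaa\nbbb\naiaiaiaiaa\nbbbicicid"

# (?s) makes . match newlines too, so .* can span all four lines.
dotall = re.match(r"(?s).*i", text)
print(repr(dotall.group()))  # 'aaa\nbbb\naiaiaiaiaa\nbbbicici'

# Without (?s), . stops at each newline.
plain = re.search(r".*i", text)
print(repr(plain.group()))  # 'aiaiaiai'
```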
Your regex is correct, and so are your expectations of its performance.
This is a long-known bug in UltraEdit's regex implementation, which I have reported to support repeatedly. As far as I know, it still hasn't been fixed. The problem appears to be that UE's regex implementation is essentially line-based, and additional lines are pulled into the match only when necessary. So .* will match greedily on the current line, but it will not cross a newline boundary unless it has to in order to achieve a match.
There are some other subtle bugs with line endings. For example, lookbehind doesn't work across newlines, either.
Write to IDM support, or change to an editor with decent regex support. I did both.
Yes, you are right; this looks like a bug.
Your interpretation is correct if you are in Perl mode rather than POSIX mode, though it should apply to POSIX as well.
Defining the modifiers inline, as you do, is very rare, though.
Usually you provide the pattern with delimiters and put the modifier afterwards, like /.*i/s.
But this doesn't matter, because your way is correct too. And if it weren't supported, the match wouldn't cross the first newline either.
So yes, this is definitely a bug in your program.
You're right that that regex should match the entire string (all 4 lines). My guess is that UltraEdit is attempting to do some sort of optimization by working line by line, and only accumulating new lines "when necessary".