Regex alphanumeric with hyphen, single quotes, and single spacing is timing out (crashing)

I use the following regular expression, but it hangs in my browsers (nothing happens, and it then presumably times out).
I am trying to accept alphanumeric characters as well as dashes and single quotes. I am also trying to restrict spacing so that only single spaces are allowed (no two consecutive spaces).
<constant>
<constant-name>expressionFormat</constant-name>
<constant-value>^([a-zA-Z0-9'-]+\s?)*$</constant-value>
</constant>
A sample example string that crashes with this is:
"ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 ‘ –"
I'm using Struts. Any tips on what I'm doing wrong? Thanks in advance!

I've found a solution.
My OLD expression:
^([a-zA-Z0-9'-]+\s?)*$
First off, I got rid of the \s, since it also matches tabs, newlines, and so on, which I do not want.
The real problem is the nested quantifiers. Because \s? is optional, the group ([a-zA-Z0-9'-]+\s?)* can split a single run of letters in a huge number of ways (one piece of 20 characters, or 19 plus 1, and so on). When the string cannot match, as with the ‘ and – in my sample, the engine backtracks through every one of those splits before it reports failure, which makes it extremely resource intensive for longer strings.
The following expression works much better for my case (note that it no longer rejects consecutive spaces):
^([a-zA-Z0-9' -])*$
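For anyone who still needs the single-space rule, one possible way to keep it without the nested optional quantifier is to make the space mandatory between words. The Python sketch below is only an illustration under assumptions, not the Struts configuration from the question: the names WORD, SINGLE_SPACED and is_valid are hypothetical, Python's re module stands in for the engine actually used, and the two special characters in the sample are assumed to be U+2018 and U+2013.

```python
import re

# Words separated by exactly one space; an optional single trailing space
# and the empty string are allowed, as in the original pattern.
# Every repetition of the inner group must start with a space, so a run of
# letters can only be split one way and a failing input is rejected quickly.
WORD = r"[a-zA-Z0-9'-]+"
SINGLE_SPACED = re.compile(rf"(?:{WORD}(?: {WORD})* ?)?")


def is_valid(text):
    """Return True if text uses only allowed characters and single spaces."""
    return SINGLE_SPACED.fullmatch(text) is not None


if __name__ == "__main__":
    # Sample from the question, with the curly quote and en dash as escapes.
    sample = "ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 \u2018 \u2013"
    print(is_valid("John O'Neil-Smith"))
    print(is_valid("two  spaces"))
    print(is_valid(sample))
```

The design choice here is to remove the ambiguity rather than to limit backtracking after the fact, so the same idea should carry over to engines that lack atomic groups.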

I believe that the browser is just taking a really long time to process the regex search and may even be timing out.
Your sample string
ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 ‘ –
will not be matched by your regular expression:
^([a-zA-Z0-9'-]+\s?)*$
Add the special characters ‘ ’ — – to the character class if you want to accept them:
^([a-zA-Z0-9'‘’—–-]+\s?)*$
This regex matches your sample string.
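To see which characters in an input fall outside the original character class, a small check like the following can help. It is a sketch in Python rather than the poster's environment; ALLOWED and rejected_chars are hypothetical names, and the sample's last two symbols are assumed to be U+2018 (left single quotation mark) and U+2013 (en dash).

```python
import re

# Character class from the question, plus the plain space.
ALLOWED = re.compile(r"[a-zA-Z0-9' -]")


def rejected_chars(text):
    """Return the characters of text that the class does not accept."""
    return [ch for ch in text if not ALLOWED.fullmatch(ch)]


if __name__ == "__main__":
    sample = "ABCDEFGHIJKLMNOPQ43 5343443RSTUVWXYZ0123456789 \u2018 \u2013"
    # Prints the two typographic characters that make the match fail.
    print(rejected_chars(sample))
```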
UPDATE:
Try this regex that uses atomic grouping to avoid catastrophic backtracking:
^(?>[a-zA-Z0-9'-]+\s?)*$

Related

Regex to match one of any terms, some terms with spaces

I'm trying to write a RegEx that matches one of several terms, as part of a spam filter. The problem is, some of these terms contain spaces, and I'm having trouble writing a valid expression.
What I originally had (before multiple-word terms) was this:
(?i)(alzheimers|baldness|obese)
Now, I want to add, for example "blood pressure", but the following expression is chucking a barny:
(?i)(alzheimers|baldness|blood pressure|obese)
You can have whitespace characters in an alternation group; your expression works. Check it out for yourself:
https://regex101.com/r/56tz6B/1
Your expression should also match "blood pressure" without any problems.
Could you try to use \s+ instead of the space character and see if it works? Please note that this would also match any whitespace (tabs, new lines etc.).
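As a quick way to check both variants outside regex101, here is a minimal Python sketch. The names TERMS, TERMS_ANY_WS and first_term are hypothetical, and Python's re module is assumed to behave like the spam filter's engine for this simple pattern.

```python
import re

# The alternation from the question, unchanged: a literal space inside a
# branch is matched like any other character.
TERMS = re.compile(r"(?i)(alzheimers|baldness|blood pressure|obese)")

# Variant with \s+, which also accepts tabs, newlines and repeated spaces.
TERMS_ANY_WS = re.compile(r"(?i)(alzheimers|baldness|blood\s+pressure|obese)")


def first_term(pattern, text):
    """Return the first matching term in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(first_term(TERMS, "Lower your Blood Pressure today"))
    print(first_term(TERMS_ANY_WS, "blood \t pressure"))
```

If the first pattern fails in your filter, the cause is more likely how the expression is read in (for example, whitespace being trimmed or ignored by a free-spacing mode) than the alternation itself.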

RegEx Expression for Eclipse that searches for all items that have not been dealt with

To help stop SQL Injection attacks, I am going through about 2000 parameter requests in my code to validate them. I validate them by determining what type of value (e.g. integer, double) they should return and then applying a function to them to sanitize the value.
Any requests I have dealt with look like this
*SecurityIssues.*(request.getParameter
where * signifies any number of characters on the same line.
What RegExp expression can I use in the Eclipse search (CTRL+H) which will help me search for all the ones I have not yet dealt with, i.e. all the times that the text request.getParameter appears when it is not preceded by the word SecurityIssues?
Examples for matches
The regular expression should match each of the following:
int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))
double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))
int c = request.getParameter("DUMMY")
But should not match:
int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))
With inspiration and the links provided by @michaeak (thank you), and after testing at https://regex101.com/, I appear to have found the answer:
^((?!SecurityIssues).)*(request\.getParameter)
The advantage of this answer is that I can blacklist the word SecurityIssues, as opposed to having to whitelist the formats that I do want.
Note, that it is relatively slow, and also slowed down my computer a lot when performing the search.
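The expression can be tried against the example lines from the question with a short script. This is a sketch in Python rather than the Eclipse search dialog; UNHANDLED, LINES and unhandled are hypothetical names, and the behaviour of the lookahead is assumed to be the same in both engines.

```python
import re

# Tempered negative lookahead: consume one character at a time, but only
# while "SecurityIssues" does not start at the current position.
UNHANDLED = re.compile(r"^((?!SecurityIssues).)*(request\.getParameter)")

LINES = [
    'int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))',
    'double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))',
    'int c = request.getParameter("DUMMY")',
    'int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))',
]


def unhandled(lines):
    """Return the lines where request.getParameter is not preceded by SecurityIssues."""
    return [line for line in lines if UNHANDLED.search(line)]


if __name__ == "__main__":
    for line in unhandled(LINES):
        print(line)
```

The lookahead is evaluated once per character, which is one plausible reason the search feels slow on a large code base.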
Try e.g.
=\s*?((?!SecurityIssues).)*?(request\.getParameter)\(
Notes
Parentheses ( and ) are special characters used for group matching. To match them literally, escape them with \.
.* is greedy: it matches as much as it can, including characters that you don't want it to match. The reluctant form .*? matches as little as possible, which helps when other items need to match after the wildcard.
There is a tutorial at https://docs.oracle.com/javase/tutorial/essential/regex/index.html , I think all of these should be available in eclipse. You can then deal with generic replacement also.
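The difference between the greedy and the reluctant wildcard, and the escaping of literal parentheses, can be seen on a tiny input. This is a Python sketch with a made-up string; the variable names are illustrative only.

```python
import re

text = "foo(a) + bar(b)"

# Greedy: .* runs to the end of the line, then backs off to the LAST ")".
greedy = re.search(r"foo\(.*\)", text).group()

# Reluctant: .*? grows one character at a time and stops at the FIRST ")".
reluctant = re.search(r"foo\(.*?\)", text).group()

if __name__ == "__main__":
    print(greedy)     # foo(a) + bar(b)
    print(reluctant)  # foo(a)
```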
Problem
From reading Regular expression that doesn't contain certain string and Regular expression to match a line that doesn't contain a word? it seems quite difficult to create a regex that matches anything except text containing a certain word.

Ant regex expression

Quite a simple one in theory but can't quite get it!
I want a regex in ant which matches anything as long as it has a slash on the end.
Below is what I expect to work
<regexp id="slash.end.pattern" pattern="*/"/>
However this throws back
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/
^
I have also tried escaping this to \*, but that matches a literal *.
Any help appreciated!
Your original regex pattern didn't work because * is a special character in regex that is only used to quantify other characters.
The pattern (.)*/$, which you mentioned in your comment, will match any string of characters that ends in a slash and contains no newlines; however, it uses a possibly unnecessary capturing group. .*/$ should work just as well.
If you need to match newline characters, the dot . won't be enough. You could try something like [\s\S]*/$
On that note, it should be mentioned that you might not want to use $ in this pattern. Suppose you have the following string:
abc/def/
Should this be evaluated as two matches, abc/ and def/? Or is it a single match containing the whole thing? Your current approach creates a single match. If instead you would like to search for strings of characters and then stop the match as soon as a / is found, you could use something like this: [\s\S]*?/.
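The two behaviours described above, and the reason the bare */ fails, can be checked with a few lines. This sketch uses Python's re module instead of Ant's Java engine, so the error type differs from the PatternSyntaxException in the question, although the cause (a quantifier with nothing to repeat) is the same. The names whole, pieces and compiles are hypothetical.

```python
import re

text = "abc/def/"

# Anchored at the end: a single match covering the whole string.
whole = re.findall(r".*/$", text)

# Lazy and unanchored: each match stops at the first slash it reaches.
pieces = re.findall(r"[\s\S]*?/", text)


def compiles(pattern):
    """Report whether pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(whole)
    print(pieces)
    print(compiles("*/"), compiles(".*/"))
```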

Going from regex to word vba (.Find)

I have this regex
<#([^\s]+).*?>\s?<a href=""(.*?)"".*?>(.*?)</a>(\s?\((Pending|Prepared)\))?
I really need a version of it for Word's .Find method in VBA (I don't need the matching groups). Here is what I have so far:
\<\#*\>*\<a href=*\>*\<\/a\>
But I can't get the last part to work, namely:
(\s?\((Pending|Prepared)\))?
I really hope someone can help me, as regex is not an option in this case (although I know I can use regex in VBA!)
Cheers
I don't see an OR | in the documentation (Wildcard character reference) or the examples (Putting regular expressions to work in Word), so instead I suggest splitting it into two separate searches. The Word MVPs site has a good reference on the Word Regex as well if you want more information.
[^\s] can be written in Word-style regex as [! ] (note the space), and + becomes @. It appears that neither the {n,} nor the {n,m} syntax in Word supports an n value of 0, which makes ? and * hard to implement. One option that the MS guys seem to use is *, which in Word means "any string of characters". By my testing, * is lazy, meaning the pattern \<#*\> run against the string <#sometag> asdfsadfasdf > will only match <#sometag>. In addition, it can match 0 characters; for example, \<\#*\> will match <#>.
So assuming that the first part is working as you expect, you could try the following two regex:
\<\#*\>*\<a href=*\>*\<\/a\>*\(Pending\)
and
\<\#*\>*\<a href=*\>*\<\/a\>*\(Prepared\)
The trouble here is that the * will match up until it hits the P of Pending or Prepared, so there could be other text in between, but it's the only way I can see of matching an optional space. If you can guarantee that the space will or will not be there, that would go a long way towards making the regex safer.
Give that a try and see if it works for you!

Floating Point - Regular expression

I am struggling to understand this simple regular expression. I have the following attempt:
[0-9]*\.?[0-9]*
I understand this as zero-to-many numeric digits, followed by one-to-zero periods and finally ending in zero-to-many numeric digits.
I do not want to match anything other than exactly the above. I do not want positive/negative support or any other special support types. However, for some reason, the above also matches what appear to be random characters. All of the following match, for whatever reason:
f32
32a
32-
=33
In an answer, I am looking for:
An explanation of why my regular expression does not work.
A working version with an explanation of why it does work.
Edit: Because it seems to be causing trouble, I have added the "Qt" tag; that is the environment I am working with.
Edit: Due to continued confusion, I am going to add a bit of code. I am starting to think I am either misusing Qt, or Qt has a problem:
void subclassedQDialog::setupTxtFilters()
{
    QRegExp numbers("^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$");
    txtToFilter->setValidator(new QRegExpValidator(numbers,this));
}
This is from within a subclassed QDialog. txtToFilter is a QLineEdit. I can provide more code if someone can suggest what may be relevant. While the expression above is not the original, it is one of the ones from comments below and also fails in the same way.
Your problem is that you haven't escaped the \ properly; you need to write \\. Otherwise the C++ compiler strips the \ (gcc, at least, does this with a warning) and the regex engine treats the . as "any character".
Put ^ at the start and $ at the end. This anchors your regex to the start and end of the string.
Your expression finds a match in the middle of the string. If you add anchors to the beginning and to the end of your expression, the strings from your list will be ignored. Your expression would still match empty strings, but that's the price you pay for being able to match strings such as .99 and 99.
^[0-9]*\.?[0-9]*$
A better choice would be
^[0-9]*(\.[0-9]+)?$
because it would match the decimal point only if at least one digit is present after it.
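The effect of the anchors can be checked against the strings from the question. This is a sketch in Python, not Qt; UNANCHORED, ANCHORED and the two helper functions are hypothetical names, and the inputs are assumed to contain no trailing newline.

```python
import re

# Original attempt: every part is optional, so it can match the empty
# string at the start of ANY input when searched without anchors.
UNANCHORED = re.compile(r"[0-9]*\.?[0-9]*")

# Anchored version from the answer: digits, then an optional fraction
# that needs at least one digit after the point.
ANCHORED = re.compile(r"^[0-9]*(\.[0-9]+)?$")


def found_somewhere(text):
    """True if the unanchored pattern matches anywhere in text."""
    return UNANCHORED.search(text) is not None


def whole_string(text):
    """True if the anchored pattern accepts the entire text."""
    return ANCHORED.search(text) is not None


if __name__ == "__main__":
    for s in ["f32", "32a", "32-", "=33", "32", ".99", "99."]:
        print(repr(s), found_somewhere(s), whole_string(s))
```

Note that the anchored pattern still accepts the empty string; requiring at least one digit somewhere would need a further change along the lines of the comment that follows.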
One of them needs to be a + instead of *. Do you want to allow ".9" to be valid, or will you require the leading 0?