RegEx Expression for Eclipse that searches for all items that have not been dealt with - regex

To help stop SQL Injection attacks, I am going through about 2000 parameter requests in my code to validate them. I validate them by determining what type of value (e.g. integer, double) they should return and then applying a function to them to sanitize the value.
Any requests I have dealt with look like this
*SecurityIssues.*(request.getParameter
where * signifies any number of characters on the same line.
What RegExp expression can I use in the Eclipse search (CTRL+H) which will help me search for all the ones I have not yet dealt with, i.e. all the times that the text request.getParameter appears when it is not preceded by the word SecurityIssues?
Examples for matches
The regular expression should match each of the following:
int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))
double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))
int c = request.getParameter("DUMMY")
But should not match:
int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))

With inspiration and the links provided by @michaeak (thank you), as well as testing on https://regex101.com/, I appear to have found the answer:
^((?!SecurityIssues).)*(request\.getParameter)
The advantage of this answer is that I can blacklist the word SecurityIssues, as opposed to having to whitelist the formats that I do want.
Note that it is relatively slow; it also slowed my computer down a lot while the search was running.
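For anyone who wants to check the pattern outside Eclipse, here is a minimal sketch in Python that runs it against the example lines from the question. It assumes Python's re module treats the negative lookahead the same way the Eclipse (Java) search does; the helper name is_unhandled is made up for illustration.

```python
import re

# Pattern from the answer above: each character consumed before
# request.getParameter must not be the start of "SecurityIssues".
UNHANDLED = re.compile(r'^((?!SecurityIssues).)*(request\.getParameter)')

def is_unhandled(line):
    """Return True if the line uses request.getParameter without SecurityIssues before it."""
    return UNHANDLED.search(line) is not None

lines = [
    'int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))',
    'double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))',
    'int c = request.getParameter("DUMMY")',
    'int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))',
]

results = [is_unhandled(line) for line in lines]
print(results)  # [True, True, True, False]
```

The first three lines are reported as still to be dealt with; the last one is skipped because SecurityIssues appears before request.getParameter on the same line.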

Try e.g.
=\s*?((?!SecurityIssues).)*?(request\.getParameter)\(
Notes
Parentheses ( and ) are special characters used for grouping. To match them literally, escape them with \.
.* will match anything, including characters that you don't want it to match. The reluctant form .*? matches as few characters as possible instead, which is helpful when other items need to match after the wildcard.
There is a tutorial at https://docs.oracle.com/javase/tutorial/essential/regex/index.html ; I think all of these features should be available in Eclipse. You can then handle generic replacements as well.
Problem
From reading "Regular expression that doesn't contain certain string" and "Regular expression to match a line that doesn't contain a word?", it seems quite difficult to create a regex that matches anything except text containing a certain word.

Related

RegEx: Searching for numbers (int, float) that are NOT part of a word

I'm hoping we have some regular expression gurus here who might be able to help me - a regex newbie - solve a problem.
I know some people will want to know some background info on this issue:
Regex Flavor: Basic Regex, being used in a Vertica Database using the REGEXP_REPLACE function.
The regex I am using is working great with one exception.
I have a rule that I'm trying to implement, related to stripping the numbers from text, where any number that is part of a word, e.g. table5, go2market, 33monroe, room222, etc. is ignored and NOT filtered.
Here is what I started with for detecting numbers:
[-+]?[0-9]*\.?[0-9]
That seems to work pretty well, including handling directly adjacent commas and parentheses for example.
But all cases where a number is part of alphabetic text are also being detected, which fails the rule that it cannot be part of a word, and by word, I mean any alphabetic text.
So, in searching for solutions, I happened upon this regex that seems to work well detecting those specific cases where numbers appear next to, or in, any string of characters:
((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]*)
My thought was that maybe I could add this as an INVERTED match to my original regex, to allow it to still select standalone numbers while ignoring those that were a part of a word, like so:
[-+]?[0-9]^((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]*)*\.?[0-9]^((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]*)
Unfortunately however, it breaks the original detection of standalone numbers.
:(
I'm hoping there is someone here that can spot what I'm doing wrong, and help me identify the right solution?
Thanks in advance!
According to the Vertica documentation, the regex flavour seems to follow the Perl syntax. In this case you can use negative lookarounds and in particular a negative lookbehind: (?<!\w) (not preceded by a word character).
Lookarounds are only tests and don't consume characters.
You can also use a negative lookahead to test the right part, (?!\w) (not followed by a word character), but it's simpler to use a word boundary since the pattern ends with a digit (which is also a word character):
(?<!\w)[-+]?\d*\.?\d+\b
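As a quick sanity check, here is a minimal sketch of the pattern in Python. It assumes Python's re module handles the lookbehind and the word boundary the same way the Perl-style flavour does; it does not exercise Vertica's REGEXP_REPLACE itself, and the sample sentence is made up.

```python
import re

# Pattern from the answer above: a number that is not preceded by a
# word character and that ends at a word boundary.
STANDALONE_NUMBER = re.compile(r'(?<!\w)[-+]?\d*\.?\d+\b')

text = "table5 has 42 items, go2market costs 3.14 in room222 and -7 at 33monroe"

found = STANDALONE_NUMBER.findall(text)
print(found)  # ['42', '3.14', '-7']

# Stripping the standalone numbers while leaving the words untouched.
stripped = STANDALONE_NUMBER.sub("", text)
print(stripped)
```

The digits inside table5, go2market, room222 and 33monroe are left alone, because each is either preceded or followed by a word character.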
In the worst case, if you have something like v1.0 in your string and you want to avoid it, you can try the backtracking control verbs (*SKIP) and (*FAIL). (*FAIL) forces the pattern to fail and (*SKIP) skips all the already matched positions before it. I hope Vertica supports these Perl regex features.
Something like:
\p{L}+[-+]?\d*\.?\d+(*SKIP)(*FAIL)|[-+]?\d*\.?\d+(*SKIP)(?!\p{L})

Regex for Google Analytics Goals

I've searched all the other Regex on Google Analytics questions but I can't use the answers as this is pretty specific to my problem.
I want to set a goal but use Regex to flag it as a goal IF string includes
/client-thank-you/ AND anything EXCEPT hire
so in other words
/client-thank-you/hire is not correct
/client-thank-you/anything/else is correct
Each of the following regexes will match any string that contains /client-thank-you/ and does not contain hire, depending on what assumption(s) you make about where "hire" is in the string.
Solution
Where can "hire" be located in the string?
Anywhere:
((?!hire).)*?/client-thank-you/((?!hire).)*
Only following the "/client-thank-you/":
.*?/client-thank-you/((?!hire).)*
Only immediately following the "/client-thank-you/":
.*?/client-thank-you/(?!hire).*
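To see how the three variants differ, here is a minimal sketch in Python. It only checks the semantics of the patterns with Python's re module and says nothing about what Google Analytics itself accepts; the constant names and the sample URLs are made up for illustration.

```python
import re

# The three patterns from the answer above, one per assumption about "hire".
ANYWHERE = re.compile(r'((?!hire).)*?/client-thank-you/((?!hire).)*')
FOLLOWING = re.compile(r'.*?/client-thank-you/((?!hire).)*')
IMMEDIATELY = re.compile(r'.*?/client-thank-you/(?!hire).*')

def matches(pattern, url):
    """Return True if the pattern matches the entire URL."""
    return pattern.fullmatch(url) is not None

urls = [
    '/client-thank-you/hire',           # rejected by all three
    '/client-thank-you/anything/else',  # accepted by all three
    '/client-thank-you/anything/hire',  # accepted only by IMMEDIATELY
    '/hire/client-thank-you/anything',  # rejected only by ANYWHERE
]

table = {
    url: [matches(p, url) for p in (ANYWHERE, FOLLOWING, IMMEDIATELY)]
    for url in urls
}
for url, row in table.items():
    print(url, row)
```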
Notes
Optimization:
Each of these regexes will match the entire string. If your tool lets you determine if a string contains a substring match (rather than naively attempting to match the entire string), then you could optimize the second and third regexes by removing the leading .*?. Likewise, the third regex could be further optimized by removing the trailing .* as well.
Positively require "anything":
Note that all of these regexes assume that a string that ends with "/client-thank-you/" (with nothing after it) is valid. If this assumption is incorrect (i.e. the string .*/client-thank-you/$ is not a match), then change the trailing * on every regex to +. This would also mean that you have to keep the last .* on the third regex as a .+ (i.e. don't optimize that away).
EDIT:
The above will not work since GA uses a very limited version of regex (that does not include lookaround). If there is no other GA tool (other than a single regex) that you can use that meets your needs, then you could use the following as a last-ditch effort:
([-._~!$&'()*+,;=:#/0-9A-Za-gi-z]|h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]|hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]|hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]|.{1,3}$)
And in expanded form for illustration purposes only:
(
  [-._~!$&'()*+,;=:#/0-9A-Za-gi-z]
  | h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]
  | hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]
  | hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]
  | .{1,3}$
)
This regex will match 1-4 characters that do not form "hire". It does so by matching the minimum number of characters necessary to verify that the match is neither "hire" nor can serve as a prefix of "hire". It takes into account end-of-line (e.g. "hir" is valid if there is nothing else after it). The characters that it matches are all valid characters that can occur in the path component of a URL as specified in RFC 3986.
You use this regex by substituting it for every ((?!hire).) in any of the solutions given above. For example:
.*?/client-thank-you/([-._~!$&'()*+,;=:#/0-9A-Za-gi-z]|h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]|hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]|hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]|.{1,3}$).*
This matches any url that contains "/client-thank-you/" but not "/client-thank-you/hire".
Do be careful, though. Doubled "h"s will make this workaround fail (e.g. "hhire"). However, if "hire" will only ever follow a path delimiter (i.e. /hire/), then that shouldn't be a problem.
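Here is a minimal sketch that exercises the lookaround-free example above in Python. It assumes GA performs a substring search (re.search here) and that Python's re module reads the character classes the same way; the sample URLs and the helper name is_goal are made up.

```python
import re

# The lookaround-free example from the answer, copied verbatim.
NO_LOOKAROUND = re.compile(
    r".*?/client-thank-you/("
    r"[-._~!$&'()*+,;=:#/0-9A-Za-gi-z]"
    r"|h[-._~!$&'()*+,;=:#/0-9A-Za-hj-z]"
    r"|hi[-._~!$&'()*+,;=:#/0-9A-Za-qs-z]"
    r"|hir[-._~!$&'()*+,;=:#/0-9A-Za-df-z]"
    r"|.{1,3}$).*"
)

def is_goal(url):
    """Return True if the URL would count as a goal under this pattern."""
    return NO_LOOKAROUND.search(url) is not None

print(is_goal('/client-thank-you/hire'))           # False
print(is_goal('/client-thank-you/anything/else'))  # True
print(is_goal('/client-thank-you/high'))           # True
print(is_goal('/client-thank-you/hir'))            # True
```

The last case is accepted through the .{1,3}$ alternative, since "hir" at the end of the string can no longer turn into "hire".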
If you can't use a lookahead like Travis suggested, then I suggest setting the goal to fire on an event instead of a pageview.
If you're using Google Tag Manager, you'll have the ability to write a more advanced regex, or at least set a blocking rule for the event that prevents it from firing when 'hire' is in the page URL.

Using Flags of Regex within Google Forms

I'm trying to use flags within Google Forms, and I've been googling hoping to find an answer in the last couple of hours, but didn't find any. Google Forms say that the regular expression is not valid. Even when I use a simple regex such as: (?i)t. I'm trying to use the regex inside a paragraph question.
How can I make it work?
Edit:
What I really need is to match [a-zA-Z" ]+( *),( *)[1-9]([0-9]??)\n repeatedly, so each line will look something like: Sam "The Man" McAdams , 9\n. Of course, the number of lines is unknown. Using the repetition modifiers * or + at the end of the regex does not satisfy my needs, because once the first line is accepted as valid, the other lines can be composed of anything at all and the input is still considered valid when it is not.
You can use the following expression to validate an entire string that only consists of lines meeting your pattern:
^([a-zA-Z" ]+ *, *[1-9][0-9]?(\n|$))+$
See the regex demo.
The main point is to add an alternation group to match either a newline or the end of string ((\n|$)) and wrap the whole pattern into a +-quantified group ((...)+) anchored at both start (^) and end ($).
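Here is a minimal sketch of the same idea in Python, in case you want to try inputs outside Google Forms. It assumes Python's re module is close enough to the flavour Forms uses for this pattern; the sample names and the helper name is_valid are made up.

```python
import re

# Pattern from the answer above: one or more lines of the form
# "name , number", each ending in a newline or the end of the string.
ALL_LINES = re.compile(r'^([a-zA-Z" ]+ *, *[1-9][0-9]?(\n|$))+$')

def is_valid(text):
    """Return True only if every line of text has the expected shape."""
    return ALL_LINES.match(text) is not None

good = 'Sam "The Man" McAdams , 9\nJane Doe, 42'
bad_second_line = 'Sam "The Man" McAdams , 9\nnot a valid line'
bad_number = 'Jane Doe, 100'

print(is_valid(good))             # True
print(is_valid(bad_second_line))  # False
print(is_valid(bad_number))       # False
```

Because the whole group is repeated and anchored at both ends, a valid first line no longer lets arbitrary text through on the lines after it.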

Going from regex to word vba (.Find)

I have this regex
<#([^\s]+).*?>\s?<a href=""(.*?)"".*?>(.*?)</a>(\s?\((Pending|Prepared)\))?
And I really need a VBA version of it for Word's .Find method (I don't need the matching groups). Here is what I have so far:
\<\#*\>*\<a href=*\>*\<\/a\>
But I can't get the last part to work. Here I'm talking about
(\s?\((Pending|Prepared)\))?
I really hope someone can help me, as regex in this case is not an option (although I know I can use regex in VBA!)
Cheers
I don't see an OR | in the documentation (Wildcard character reference) or the examples (Putting regular expressions to work in Word), so instead I suggest splitting it into two separate searches. The Word MVPs site has a good reference on the Word Regex as well if you want more information.
[^\s] can be written in the Word style regex as [! ] (note the space), and + becomes @. It appears that neither the {n,} nor the {n,m} syntax of VBA supports an n value of 0, making ? and * hard to implement in Word. One option that the MS guys seem to use is *, which in Word is "Any string of characters". By my testing, * is lazy, meaning the pattern \<#*\> run against the string <#sometag> asdfsadfasdf > will only match <#sometag>. In addition, it can match 0 characters, for example \<\#*\> will match <#>.
So assuming that the first part is working as you expect, you could try the following two regex:
\<\#*\>*\<a href=*\>*\<\/a\>*\(Pending\)
and
\<\#*\>*\<a href=*\>*\<\/a\>*\(Prepared\)
The trouble here is that the * will match up until it hits the P of Pending or Prepared, so there could be other text in between, but it's the only way I can see of matching an optional space. If you can guarantee that the space will or will not be there, that would go a long way towards making the regex safer.
Give that a try and see if it works for you!

What is wrong with my simple regex that accepts empty strings and apartment numbers?

So I wanted to limit a textbox which contains an apartment number which is optional.
Here is the regex in question:
([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)
Simple enough eh?
I'm using these tools to test my regex:
Regex Analyzer
Regex Validator
Here are the expected results:
Valid
"1234A"
"Z"
"(Empty string)"
Invalid
"A1234"
"fhfdsahds527523832dvhsfdg"
Obviously, if I'm here, the invalid ones are accepted by the regex. The goal of this regex is to accept either 1 to 4 digits with an optional letter, a single letter, or an empty string.
I just can't seem to figure out what's not working, I mean it is a simple enough regex we have here. I'm probably missing something as I'm not very good with regexes, but this syntax seems ok to my eyes. Hopefully someone here can point to my error.
Thanks for all help, it is greatly appreciated.
You need to use the ^ and $ anchors for your first two options as well. Also you can include the second option into the first one (which immediately matches the third variant as well):
^[0-9]{0,4}[A-Z]?$
Without the anchors your regular expression matches because it will just pick a single letter from anywhere within your string.
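Here is a minimal sketch in Python comparing the original pattern with the anchored one on the test values from the question. It assumes the textbox validator performs a substring search, which re.search imitates here; the helper name accepted is made up.

```python
import re

ORIGINAL = re.compile(r'([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)')  # from the question
ANCHORED = re.compile(r'^[0-9]{0,4}[A-Z]?$')                # from the answer above

def accepted(pattern, value):
    """Return True if the pattern finds a match somewhere in value."""
    return pattern.search(value) is not None

valid = ["1234A", "Z", ""]
invalid = ["A1234", "fhfdsahds527523832dvhsfdg"]

# Without anchors, a match anywhere in the string is enough,
# so even the invalid inputs are accepted.
original_on_invalid = [accepted(ORIGINAL, v) for v in invalid]
print(original_on_invalid)  # [True, True]

# With anchors, the whole string has to fit the pattern.
anchored_on_valid = [accepted(ANCHORED, v) for v in valid]
anchored_on_invalid = [accepted(ANCHORED, v) for v in invalid]
print(anchored_on_valid)    # [True, True, True]
print(anchored_on_invalid)  # [False, False]
```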
Depending on the language, you can also use a negative lookahead.
^[0-9]{0,4}[A-Za-z](?!.*[0-9])
Breakdown:
^[0-9]{0,4} = Matches zero to four digits at the beginning of the string
[A-Za-z] = Matches a single letter (either case)
(?!.*[0-9]) = Only allows the letter if no digits appear anywhere after it
I haven't quite figured out how to validate against an empty string, but that might be easier to do with tools from whatever language you are using. Something along this logic:
if String doesn't equal $null then check the regex
Something along those lines, just adjusted for however you would do it in your language.
I used RegEx Skinner to validate the answers.
Edit: Fixed error from comments