Creating a regular expression to match words of varying lengths - regex

I'm writing a regular expression to parse a logfile and I'm having trouble figuring out how to establish a range(?) of sorts for a particular expression. In this case specifically, my logfile contains various severities:
(['EMERG','ALERT','CRIT','ERR','WARNING','NOTICE','INFO','DEBUG'])
I'm basically wondering how I'd write a regular expression to match all of those. I understand most of the digit matching, but characters are giving me trouble.

This regex will match all of these entries: [A-Za-z]{1,}
Basically, it says: match any run of at least one character that consists only of the letters A to Z or a to z. Note that it matches any word made of letters, not only the severities listed.
For more information, see a regex cheat sheet
and try your regex here: http://gskinner.com/RegExr/
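If the goal is to match only the listed severities rather than any word, an alternation of the literal names is one option. The following is a minimal Python sketch; the sample log line and the names SEVERITY_RE, generic and specific are made up for illustration and are not from the question.

```python
import re

SEVERITIES = ['EMERG', 'ALERT', 'CRIT', 'ERR', 'WARNING', 'NOTICE', 'INFO', 'DEBUG']

# Alternation of the literal severity names, anchored on word boundaries
# so that ERR does not match inside a longer word such as ERROR.
SEVERITY_RE = re.compile(r'\b(?:' + '|'.join(map(re.escape, SEVERITIES)) + r')\b')

# Hypothetical log line, invented for this example.
line = "2024-01-01 12:00:00 host app[123]: WARNING disk usage high"

generic = re.findall(r'[A-Za-z]+', line)   # every run of letters
specific = SEVERITY_RE.findall(line)       # only the known severities

print(generic)
print(specific)
```

The generic pattern also picks up words like host and disk, while the alternation returns only the severity.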

Related

Regular expression dilemma

I've been trying for a few hours to write a pattern for a matching algorithm, and I can't find a solution to the following problem: given the example "my_name_is", I need to extract all the words individually, as well as the whole expression. Consider that there may be a list of n examples, some of which can be matched and some of which cannot.
"my_name_is" => ["my", "name", "is", "my_name_is"]
How can I do this? What should the regexp look like? Looking forward to your answers, thank you!
Regular Expressions are patterns used to match a string of characters. We usually use them to validate a string of characters, or to find and replace a specific pattern within text.
Here, it seems the outcome you're looking for is an array of strings that have been split using an underscore. Regex isn't what you're looking for.
Implementation would change based on language, but consider the following code:
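function stringToArray($myStr)
{
    // explode() splits on a delimiter; str_split() would split by length instead
    $words = explode('_', $myStr);
    // append the whole expression after the individual words
    return array_merge($words, [$myStr]);
}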
Use Python's re.findall with the following as your regex:
([^_]+)+?
This should match every run of consecutive characters that doesn't contain an underscore.
As for the whole thing? You already have it, so there's no reason to run a regex over the whole string.
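As a minimal Python sketch of the re.findall approach: the helper name words_and_whole is made up here, and the pattern is simplified to [^_]+, which finds the same runs of non-underscore characters without the nested quantifier.

```python
import re

def words_and_whole(text):
    """Return the underscore-separated words followed by the whole string."""
    words = re.findall(r'[^_]+', text)  # runs of characters that are not '_'
    return words + [text]

print(words_and_whole("my_name_is"))
```

For this input, "my_name_is".split('_') produces the same list of words, which is why a regex is optional here.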

RegEx Expression for Eclipse that searches for all items that have not been dealt with

To help stop SQL Injection attacks, I am going through about 2000 parameter requests in my code to validate them. I validate them by determining what type of value (e.g. integer, double) they should return and then applying a function to them to sanitize the value.
Any requests I have dealt with look like this
*SecurityIssues.*(request.getParameter
where * signifies any number of characters on the same line.
What RegExp expression can I use in the Eclipse search (CTRL+H) which will help me search for all the ones I have not yet dealt with, i.e. all the times that the text request.getParameter appears when it is not preceded by the word SecurityIssues?
Examples for matches
The regular expression should match each of the following e.g.
int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))
double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))
int c = request.getParameter("DUMMY")
But should not match:
int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))
With inspiration and the links provided by #michaeak (thank you), as well as testing in https://regex101.com/ I appear to have found the answer:
^((?!SecurityIssues).)*(request\.getParameter)
The advantage of this answer is that I can blacklist the word SecurityIssues, as opposed to having to whitelist the formats that I do want.
Note that it is relatively slow, and it also slowed down my computer a lot when performing the search.
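The pattern can be checked outside Eclipse against the example lines from the question. This is a sketch in Python rather than Eclipse's Java regex engine; it assumes the two engines treat this particular lookahead syntax the same way, and the names UNHANDLED and unhandled are invented for the example.

```python
import re

# "Tempered" negative lookahead: each character consumed before
# request.getParameter must not be the start of the word SecurityIssues.
UNHANDLED = re.compile(r'^((?!SecurityIssues).)*request\.getParameter')

lines = [
    'int companyNo = StringFunctions.StringToInt(request.getParameter("COMPANY_NO"))',
    'double percentage = StringFunctions.StringToDouble(request.getParameter("MARKETSHARE"))',
    'int c = request.getParameter("DUMMY")',
    'int companyNo = SecurityIssues.StringToIntCompany(request.getParameter("COMPANY_NO"))',
]

# Keep only the lines that still need to be dealt with.
unhandled = [line for line in lines if UNHANDLED.search(line)]
for line in unhandled:
    print(line)
```

The lookahead is evaluated at every character position, which is consistent with the slowness reported above.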
Try e.g.
=\s*?((?!SecurityIssues).)*?(request\.getParameter)\(
Notes
Parentheses ( and ) are special characters used for grouping. To match a literal parenthesis, escape it with \.
.* will match anything, including characters that you don't want it to match. Using .*? makes the quantifier reluctant, so it matches as few characters as possible. This can be helpful if other items need to match after the wildcard.
There is a tutorial at https://docs.oracle.com/javase/tutorial/essential/regex/index.html , I think all of these should be available in Eclipse. You can then deal with generic replacement also.
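The two notes about escaped parentheses and reluctant quantifiers can be illustrated with a small Python sketch. The sample string is made up, and Python's re module is assumed to behave like Java's engine for these two constructs.

```python
import re

text = 'f(a) + g(b)'  # made-up sample string

# \( and \) match literal parentheses; .* is greedy, .*? is reluctant.
greedy = re.search(r'\(.*\)', text).group()
reluctant = re.search(r'\(.*?\)', text).group()

print(greedy)     # (a) + g(b)
print(reluctant)  # (a)
```

The greedy version runs to the last closing parenthesis on the line, while the reluctant one stops at the first.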
Problem
From reading Regular expression that doesn't contain certain string and Regular expression to match a line that doesn't contain a word? it seems quite difficult to create a regex that matches anything except text containing a certain word.

Yet another password validating regular expression

I've gone through multiple examples to validate passwords via regular expression, but none of them quite fit what I am looking for. I've been using trial and error to build my own, but without complete success.
Here is the regular expression that so far is the closest match for what I am looking for:
(?=.*?[a-z]{3,})(?=.*?[A-Z]{3,})(?=.*?[0-9]{2,})[a-zA-Z0-9]{8,24}
The password should have three lowercase letters, three uppercase letters and two digits. Password length should be between 8 and 24 characters. Special characters are not required; they can be used as long as the other requirements are met.
The regular expression above matches ABCdef12 but does not match Ad1Be1Cf. How should I modify the regular expression so it also matches the latter example?
Use look aheads for the content assertion, and a simple regex for the length:
^(?=(.*[a-z]){3})(?=(.*[A-Z]){3})(?=(.*\d){2}).{8,24}$
See demo
I'm reasonably confident this is the shortest regex that will work for you.
(?=.{8,24}$)(?=.*?[a-z].*?[a-z].*?[a-z])(?=.*?[A-Z].*?[A-Z].*?[A-Z])(?=.*?\d.*?\d)(^.*$)
You can use this. It uses lookaheads to test all conditions.
See Demo.
http://regex101.com/r/yX3eB5/9
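To see why the question's attempt rejects Ad1Be1Cf, it helps to run it side by side with the first suggested pattern. This is a Python sketch; the names CONSECUTIVE and SCATTERED are made up, and the third test password is an invented negative case.

```python
import re

# Question's attempt: [a-z]{3,} etc. require the characters to be adjacent.
CONSECUTIVE = re.compile(
    r'(?=.*?[a-z]{3,})(?=.*?[A-Z]{3,})(?=.*?[0-9]{2,})[a-zA-Z0-9]{8,24}')

# Suggested fix: (.*[a-z]){3} allows other characters between the three hits.
SCATTERED = re.compile(
    r'^(?=(.*[a-z]){3})(?=(.*[A-Z]){3})(?=(.*\d){2}).{8,24}$')

results = {
    pw: (bool(CONSECUTIVE.search(pw)), bool(SCATTERED.search(pw)))
    for pw in ("ABCdef12", "Ad1Be1Cf", "abcdefgh")
}
for pw, (old, new) in results.items():
    print(pw, old, new)
```

Only the second pattern accepts Ad1Be1Cf, because its lowercase letters, uppercase letters and digits are interleaved rather than adjacent.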

Extracting String Parts with Regular Expressions

This is a string:
http://news.ycombinator.com/page?vasya=pupkin&b=b news.ycombinator.com/page news.ycombinator.com/page.php news.ycombinator.com/page
I am extracting a host with page. So I wrote the following regular expression:
([a-zA-Z0-9\.]*[a-zA-Z0-9]+[^\/][\.][a-zA-Z0-9\/\.]+)
It returns me these (in bold):
http://news.ycombinator.com/page?vasya=pupkin&b=b news.ycombinator.com/page news.ycombinator.com/page.php news.ycombinator.com/page
This is not exactly what I need. Regexp should not see a host with page in case of this string: http://news.ycombinator.com/page?vasya=pupkin&b=b, because it is a link, which should be treated differently.
Should be rejected:
"http://news.ycombinator.com/page?vasya=pupkin&b=b", "http://news.ycombinator.com/page", "http://news.ycombinator.com/","http://news.ycombinator.com".
Should not be rejected:
"news.ycombinator.com/page","news.ycombinator.com/page.php", "news.ycombinator.com/page/index", "news.ycombinator.com/page/index.php"
How to improve this regexp so it could select only those string parts, which have no word characters nearby?
I'm not sure exactly what you are using to do your regex, but you've actually solved your own problem: you just need the regex to match whole words. This will depend on the program you are using, but this is a guideline (POSIX-style regex; note that the [:space:] class has to appear inside a bracket expression, i.e. [[:space:]]):
([[:space:]][a-zA-Z0-9\.]*[a-zA-Z0-9]+[^\/][\.][a-zA-Z0-9\/\.]+[[:space:]])
or maybe ([[:space:]]([a-zA-Z0-9]*[\.\/])+[a-zA-Z0-9]+[[:space:]])
In the second one, you will have to make sure the inner groups are non-capturing groups.
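In engines that support lookarounds, the same "whole word" idea can be written without consuming the surrounding whitespace. The following Python sketch is an assumption-laden variant and not the answer's exact pattern: HOST_PAGE is a made-up name, and the pattern requires a dotted host followed by a slash and a page.

```python
import re

# (?<!\S) and (?!\S) require whitespace or the string boundary on each side,
# so a host embedded in an http:// link is skipped. All groups are non-capturing.
HOST_PAGE = re.compile(
    r'(?<!\S)(?:[a-zA-Z0-9-]+\.)+[a-zA-Z0-9]+/[a-zA-Z0-9./]+(?!\S)')

text = ("http://news.ycombinator.com/page?vasya=pupkin&b=b "
        "news.ycombinator.com/page news.ycombinator.com/page.php "
        "news.ycombinator.com/page")

matches = HOST_PAGE.findall(text)
print(matches)
```

Because the lookarounds are zero-width, a single space between two adjacent hosts does not get used up by the first match.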

Is it the right regular expression

I have the following regular expression, but it's not working properly: it accepts only three characters after the # sign, but I want it to allow any length.
"/^[a-zA-Z0-9_\.\-]+\#([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,4}$/"
this#thi This is validated
this#this It is not validating this expression
Can you please tell me what's the problem with the expression...
Thanks
If you want your regex to match "any number length" then why are you using {2,4}?
I think a better example of the strings you're trying to match might give others a better idea of what you want, because based on your regex it is a bit confusing what you're looking for.
Try this:
^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,4}$
The main problem is that you didn't escape the dot: \.. In regular expressions, an unescaped dot matches (almost) any character, which makes your regex quite liberal.
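The effect of the {2,4} quantifier and of the required literal dot can be checked with a short Python sketch. It keeps the # separator exactly as written in the thread; the names STRICT and RELAXED and the sample inputs with a dot are made up for illustration.

```python
import re

# Pattern from the answer: the last label must be 2 to 4 characters long.
STRICT = re.compile(r'^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,4}$')

# Hypothetical variant with no upper bound on the last label.
RELAXED = re.compile(r'^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,}$')

samples = ["this#this.com", "this#this.museum", "this#this"]
results = {s: (bool(STRICT.match(s)), bool(RELAXED.match(s))) for s in samples}
for s, (strict, relaxed) in results.items():
    print(s, strict, relaxed)
```

Both patterns reject this#this because the escaped \. demands a literal dot after the # part, and only the relaxed one accepts a final label longer than four characters.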