Regex[Python] Extract from url path parameters

I have URLs from an access log. Examples:
/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w
/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen
I cannot make any assumptions about the service name or the function name.
I'm trying to find a regex that matches only the following in the first log line:
67814
alloy%20nudge%20w
and in the second:
asdNmasdf423-asd342e
FS443GH
front%20parking%20sen
As a heuristic, I tried [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,} to match only long strings, but it also catches the function names (getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething).
I was thinking of a regex that does the same as [a-zA-Z0-9_%-]{15,} but with the condition that the match must contain at least one digit, so that the function names are skipped.
Thank you

Your heuristic is fine; use
\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}
See proof.
Explanation
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[a-zA-Z_%-]* any character of: 'a' to 'z', 'A' to
'Z', '_', '%', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[0-9] any character of: '0' to '9'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9_%-]{5,} any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '%', '-' (at least 5
times (matching the most amount possible))
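
As a quick check, the pattern can be run against the two log lines from the question with Python's re.findall. This is only a sketch; the names PATTERN and extract_params are made up for illustration.

```python
import re

# Pattern from the answer above: a run of 5+ allowed characters that
# contains at least one digit (enforced by the lookahead).
PATTERN = re.compile(r"\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}")

def extract_params(path):
    """Return every path segment that looks like a parameter value."""
    return PATTERN.findall(path)

url1 = "/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w"
url2 = "/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen"

print(extract_params(url1))  # ['67814', 'alloy%20nudge%20w']
print(extract_params(url2))  # ['asdNmasdf423-asd342e', 'FS443GH', 'front%20parking%20sen']
```

Note that a parameter value containing no digit at all is skipped by design, since the digit is the only thing that separates values from function names here.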

Related

String must be alphanumeric and contain a certain substring

I'm working on adding a regex that determines whether a given input is valid. The input should be alpha numeric (underscores, dashes, periods also allowed) and between 1 and 60 characters. It should also contain a certain substring inside it (let's just say "foo.bar"). This is my attempt:
^.[a-zA-Z0-9_.-]{1,60}$
That does what I need, aside from the substring part. I'm not sure how to add the "the string must contain the substring foo.bar" requirement. FWIW I'm doing this in Ruby so I understand this means PCRE is being used.
As an example, this string should be valid:
aGreatStringWithfoo.barInIt1111
this shouldn't
aBadStringWithoutTheSubstringInIt

Use
^(?=.{1,60}$)[a-zA-Z0-9_.-]*foo\.bar[a-zA-Z0-9_.-]*$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.{1,60} any character except \n (between 1 and
60 times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
foo 'foo'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
bar 'bar'
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
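
A minimal sketch of the same check in Python (the question uses Ruby, so this only illustrates how the lookahead and the required substring combine). The names VALID and is_valid are hypothetical.

```python
import re

# ^(?=.{1,60}$) caps the total length; the rest requires "foo.bar"
# surrounded only by allowed characters.
VALID = re.compile(r"^(?=.{1,60}$)[a-zA-Z0-9_.-]*foo\.bar[a-zA-Z0-9_.-]*$")

def is_valid(value):
    return VALID.match(value) is not None

print(is_valid("aGreatStringWithfoo.barInIt1111"))    # True
print(is_valid("aBadStringWithoutTheSubstringInIt"))  # False
print(is_valid("a" * 54 + "foo.bar"))                 # False: 61 characters
```

In Python, $ also matches just before a trailing newline, so \Z is the stricter choice. In Ruby, ^ and $ are line anchors, so \A and \z are the usual way to validate a whole string.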

Match regex expression Regex sketchengine (doesn't match substring)

I am learning about regular expressions and I'm trying to solve this question: https://regex.sketchengine.co.uk/cgi/ex1.cgi
So far, I've come up with:
^[psr][^ta|?!ea].*$
But instead of checking if it doesn't match 'ea' as a substring, it tries to not match 'e' and 'a' as a second character. What is my error in this?

Your regex is wrong, see its description:
NODE EXPLANATION
^ the beginning of the string
[psr] any character of: 'p', 's', 'r'
[^ta|?!ea] any character except: 't', 'a', '|', '?', '!', 'e', 'a'
.* any character except \n (0 or more times (matching the most amount possible))
$ before an optional \n, and the end of the string
Use
.*p[ioa ]t.*
See proof
EXPLANATION
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
p 'p'
--------------------------------------------------------------------------------
[ioa ] any character of: 'i', 'o', 'a', ' '
--------------------------------------------------------------------------------
t 't'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))

Your pattern does not allow pe, se, or re at the start, which rules out respite, but you only want to disallow pe.
You could use a negative lookahead to rule out a p directly followed by one of your characters in the character class.
^(?!p[tea])[psr].*
The pattern matches:
^ Start of the string
(?!p[tea]) Negative lookahead, assert not pt or pe or pa directly to the right
[psr].* Match p, s, or r, followed by any character 0 or more times
Regex demo
Note that there are no | ? or ! in the example data.
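
The effect of the negative lookahead can be sketched in Python. The sample words below are hypothetical and are not the exercise's actual data.

```python
import re

# Start with p, s, or r, but never with pt, pe, or pa.
STARTS_OK = re.compile(r"^(?!p[tea])[psr].*")

# Hypothetical sample words, not the exercise's actual data.
samples = ["respite", "pit", "spot", "pear", "pat", "table"]
matched = [w for w in samples if STARTS_OK.match(w)]
print(matched)  # ['respite', 'pit', 'spot']
```

Because the lookahead only inspects the text after a leading p, words starting with re or se are still accepted.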

Regex starts with alphabet or underscore (_)

I'm trying to check that a string starts with an underscore (_) or a letter and contains only letters, digits, hyphens, underscores, or periods.
The string can also be of length 1.
Expected valid strings:
_Name
_name.First
name-first
Name
name.First
name-first
A
b
I tried the regex below, but it does not work for a single letter.
^[a-zA-Z0-9_][a-zA-Z0-9_|/.|/-]{1,20}[a-zA-Z0-9]$

Use
^[a-zA-Z0-9_](?:[a-zA-Z0-9_.-]{0,20}[a-zA-Z0-9])?$
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[a-zA-Z0-9_] any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]{0,20} any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (between 0 and 20 times
(matching the most amount possible))
--------------------------------------------------------------------------------
[a-zA-Z0-9] any character of: 'a' to 'z', 'A' to
'Z', '0' to '9'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
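
A short Python sketch that runs the pattern over the expected valid strings from the question. The invalid list is a set of hypothetical counterexamples added for illustration.

```python
import re

# One leading character, then optionally up to 20 more characters
# followed by a final letter or digit.
NAME = re.compile(r"^[a-zA-Z0-9_](?:[a-zA-Z0-9_.-]{0,20}[a-zA-Z0-9])?$")

valid = ["_Name", "_name.First", "name-first", "Name", "name.First", "A", "b"]
invalid = ["name-", ".name", "-name", ""]  # hypothetical counterexamples

print(all(NAME.match(s) for s in valid))    # True
print(any(NAME.match(s) for s in invalid))  # False
```

Making the whole tail group optional is what lets a single character such as A or b pass. The first character class as written also accepts a digit; if the string must start with a letter or underscore only, narrowing it to [a-zA-Z_] would be one possible variation.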

How can I write a regular expression that matches case insensitive emails?

I have the following code to find a user by email.
User.findOne({
  email: { $regex: new RegExp('^' + req.body.email.toLowerCase() + '$','i') }
})
It lowercases the given email and finds the user with a case-insensitive search.
The problem is that we have some emails like john+doe@johndoe.com, and this regular expression doesn't match them.
What should I add to the regular expression to find that kind of email?

The issue is that you're using the e-mail address, as req.body.email, unescaped in a regular expression.
As you noticed, characters that have a special meaning in regexes, like +, will cause problems. Even worse, when a user enters .* as their e-mail address, your query will match any user, which is a security concern.
What you want is to escape the e-mail address input so any special characters will be searched for as-is (have their "special meaning" stripped from them).
The easiest way is to use a module like regex-escape that will do that for you:
var escape = require('regex-escape');
...
User.findOne({
  email: { $regex: new RegExp('^' + escape(req.body.email) + '$','i') }
})
Since the regex is already set to match case-insensitively, there's no need to lowercase the string.
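
The escaping principle can be illustrated in Python with the standard re.escape function. This is not the Mongoose code from the answer, only a sketch of why unescaped input fails and why it is risky; the example address is written with @.

```python
import re

email = "john+doe@johndoe.com"

# Unescaped: "+" is read as a quantifier on "n", so the literal "+" never matches.
unescaped = re.compile("^" + email + "$", re.IGNORECASE)
# Escaped: every special character is matched literally.
escaped = re.compile("^" + re.escape(email) + "$", re.IGNORECASE)

print(bool(unescaped.match(email)))                  # False
print(bool(escaped.match("John+Doe@JohnDoe.com")))   # True

# A hostile input such as ".*" matches any address unless it is escaped.
wildcard = ".*"
print(bool(re.match("^" + wildcard + "$", "anyone@example.com")))             # True
print(bool(re.match("^" + re.escape(wildcard) + "$", "anyone@example.com")))  # False
```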

Description
I use this expression. It's not perfect, as some edge cases will slip by, but those are easy enough to catch by simply sending a test email:
^[_a-z0-9-+]+(?:\.[_a-z0-9-+]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*(?:\.[a-z]{2,4})$
By adding A-Z to each of the character classes, I've made the same expression case-insensitive:
^[_a-zA-Z0-9-+]+(?:\.[_a-zA-Z0-9-+]+)*@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*(?:\.[a-zA-Z]{2,4})$
Example
Live Demo
https://regex101.com/r/uC5oG4/1
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
[_a-z0-9-+]+ any character of: '_', 'a' to 'z', '0' to
'9', '-', '+' (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[_a-z0-9-+]+ any character of: '_', 'a' to 'z', '0'
to '9', '-', '+' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
@ '@'
----------------------------------------------------------------------
[a-z0-9-]+ any character of: 'a' to 'z', '0' to '9',
'-' (1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[a-z0-9-]+ any character of: 'a' to 'z', '0' to
'9', '-' (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[a-z]{2,4} any character of: 'a' to 'z' (between 2
and 4 times (matching the most amount
possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------
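
A minimal Python sketch of this expression, written with @ as the separator and using the re.IGNORECASE flag as an alternative to adding A-Z to every character class. The name EMAIL and the sample addresses are illustrative assumptions.

```python
import re

EMAIL = re.compile(
    r"^[_a-z0-9-+]+(?:\.[_a-z0-9-+]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*(?:\.[a-z]{2,4})$",
    re.IGNORECASE,  # alternative to adding A-Z to every character class
)

print(bool(EMAIL.match("john+doe@johndoe.com")))  # True
print(bool(EMAIL.match("John.Doe@Example.COM")))  # True
print(bool(EMAIL.match("not-an-email")))          # False
print(bool(EMAIL.match("user@example.museum")))   # False: TLD longer than 4 letters
```

The last case is one of the edge cases mentioned above: the {2,4} limit on the final label rejects longer top-level domains.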

REGEX - How to match set of string with at least one special character anywhere?

I have a problem matching passwords with the following regex.
^[A-Za-z\d[\!\@\#\$\%\^\&\*\(\)\_\+]{1,}]{6,}$
With the above expression I want the user to enter at least one special character anywhere, with the remaining characters alphanumeric. The password length can't be less than six.
But the above expression also accepts passwords with no special character. Could anyone please tell me how I can require at least one special character?

How about:
^(?=[\w!@#$%^&*()+]{6,})(?:.*[!@#$%^&*()+]+.*)$
explanation:
The regular expression:
(?-imsx:^(?=[\w!@#$%^&*()+]{6,})(?:.*[!@#$%^&*()+]+.*)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
[\w!@#$%^&*()+]{6,} any character of: word characters (a-z,
A-Z, 0-9, _), '!', '@', '#', '$', '%', '^',
'&', '*', '(', ')', '+' (at least 6
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[!@#$%^&*()+]+ any character of: '!', '@', '#', '$', '%',
'^', '&', '*', '(', ')', '+' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
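
A quick Python sketch that exercises this pattern. The sample passwords are hypothetical, and PASSWORD is an illustrative name.

```python
import re

# Lookahead: at least six allowed characters from the start.
# Body: at least one special character somewhere in the string.
PASSWORD = re.compile(r"^(?=[\w!@#$%^&*()+]{6,})(?:.*[!@#$%^&*()+]+.*)$")

print(bool(PASSWORD.match("abc!def")))  # True
print(bool(PASSWORD.match("abcdef")))   # False: no special character
print(bool(PASSWORD.match("ab!c")))     # False: shorter than six
```

One thing to be aware of: the lookahead is not anchored at the end, so it only verifies the first six characters. A string such as "abc!ef gh" (with a space later on) still matches; anchoring the lookahead with $ would be one way to tighten it.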

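Instead of complicating your regex, how about iterating over the characters and counting the special ones:
SPECIALS = set("!@#$%^&*()_+")  # the specials listed in the question

def has_special(password):
    count = 0
    for char in password:
        if char in SPECIALS:
            count += 1
    # Accept only when at least one special character is present.
    return count >= 1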
