I want to validate hostnames (i.e. x.y.z format). Currently I'm using the regular expression below, but it is not working.
It accepts x.y.z.a etc. I want to restrict it to only accept x.y.z. Does anyone know how I can fix it?
/^([a-z0-9]+(-[a-z0-9]+)*\.)+([a-z]{2,12})$/i
Just replace the + quantifier with {1,2}, which accepts two or three parts (x.y or x.y.z):
/^([a-z0-9]+(-[a-z0-9]+)*\.){1,2}([a-z]{2,12})$/i
And, if you don't need the capture groups:
/^(?:[a-z0-9]+(?:-[a-z0-9]+)*\.){1,2}[a-z]{2,12}$/i
If you want exactly 3 parts (x.y.z), use {2} instead of {1,2}:
/^(?:[a-z0-9]+(?:-[a-z0-9]+)*\.){2}[a-z]{2,12}$/i
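A quick way to check the {2} variant is to run it against a few hostnames. This is only a sketch in Python; the function name is_three_part_hostname and the sample hostnames are made up for illustration, and the /i flag is expressed as re.IGNORECASE.

```python
import re

# Exactly two "label." groups followed by a 2-12 letter final part, i.e. x.y.z
HOSTNAME_RE = re.compile(
    r"^(?:[a-z0-9]+(?:-[a-z0-9]+)*\.){2}[a-z]{2,12}$",
    re.IGNORECASE,
)

def is_three_part_hostname(name: str) -> bool:
    """Return True only for hostnames of the form x.y.z."""
    return HOSTNAME_RE.match(name) is not None

print(is_three_part_hostname("www.example.com"))      # True
print(is_three_part_hostname("my-host.example.org"))  # True
print(is_three_part_hostname("example.com"))          # False (only two parts)
print(is_three_part_hostname("a.b.example.com"))      # False (four parts)
```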
This will do the job. The regex below matches only the x.y.z format:
^([a-z0-9]+\.){2}[a-z0-9]+$
Two repetitions of the "x." part, followed by an x at the end.
I want to parse a timestamp from logs to be used by Loki as the timestamp.
I'm a total noob when it comes to regex.
The log file is from "endlessh", which is essentially a tarpit/honeypot for SSH attackers.
It looks like this:
2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122 2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096
What I want to match, using regex, is the second timestamp present there, since it's a UTC timestamp and should be parseable by promtail.
I've tried different approaches, but just couldn't get it right at all.
So first of all I need a regex that matches the timestamp I want.
But secondly, I somehow need to turn it into a regex that exposes the value in some way?
The docs offer this example:
.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)
AFAIK, those are named groups. Is that all it takes to expose the value so that I can use it in the config?
Would be nice if someone can provide a solution for the regex, and an explanation of what it does :)
You could for example create a specific pattern to match the first part, and capture the second part:
^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b
Or use a very broad pattern if the format is always the same: skip an exact number of whitespace-separated fields and capture the part that you want to keep.
^(?:\S+\s+){2}(?P<timestamp>\S+)
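To see what the named group captures, here is a small sketch in Python using the first log line from the question. Python is used only for illustration (promtail itself evaluates the regex with Go's engine), and the strptime format string is my assumption about how the captured value would be parsed.

```python
import re
from datetime import datetime, timezone

LINE = (
    "2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z "
    "CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26"
)

# Skip the first two whitespace-separated fields, then capture the third one.
PATTERN = re.compile(r"^(?:\S+\s+){2}(?P<timestamp>\S+)")

match = PATTERN.match(LINE)
timestamp = match.group("timestamp") if match else None
print(timestamp)  # 2022-04-03T12:37:25.101Z

# Assumed format: fractional seconds followed by a literal Z (UTC).
parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
print(parsed.isoformat())  # 2022-04-03T12:37:25.101000+00:00
```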
I have this regex for email validation (assume only x#y.com, abc#defghi.org, something#anotherhting.edu are valid)
/^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i
But #abc.edu and abc#xyz.eduorg are both accepted by the regex above. Can anyone explain why that is?
My approach:
there should be at least one character or number before #
then there comes #
there should be at least one character or number after # and before .
the string should end with either edu, com, or org.
Try this
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
and it should become clear - you need to group those alternatives, otherwise you can match any string that has 'edu' in it, or any string that ends with org. To put it another way, your version matches any of these patterns
^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)
(edu)
(org)$
It's worth pointing out that the original poster is using this as a regex learning exercise. This would be a terrible regex for actual production use! It's a thorny problem - see Using a regular expression to validate an email address for a lot more depth.
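The precedence issue described above can be checked side by side. A minimal sketch in Python, assuming the regex is applied with a search-style function (as JavaScript's test() does); the names BROKEN and FIXED are just labels for the two patterns from this thread.

```python
import re

# Original: the | operators split the whole pattern into three top-level branches.
BROKEN = re.compile(r"^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$", re.IGNORECASE)
# Fixed: the alternatives are grouped and the domain label may be longer than one character.
FIXED = re.compile(r"^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$", re.IGNORECASE)

results = {}
for address in ["#abc.edu", "abc#xyz.eduorg", "abc#defghi.org", "x#y.com"]:
    results[address] = (bool(BROKEN.search(address)), bool(FIXED.search(address)))
    print(address, results[address])

# #abc.edu        (True, False)  the bare (edu) branch matches anywhere
# abc#xyz.eduorg  (True, False)
# abc#defghi.org  (True, True)
# x#y.com         (True, True)
```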
Your grouping parentheses are incorrect:
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
Can also just use one case as you're using the i modifier:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
N.B. you were also missing a + from the second set, I assume this was just a typo...
What you have written is the equivalent of matching something that:
Begins with [a-zA-Z0-9]+#[a-zA-Z0-9].com
contains edu
or ends with org
What you were looking for was:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
Your regex looks ok.
I guess you are using a find (search) function instead of a match function.
Without knowing what you use it is a bit difficult to say, but in Python you would write:
import re
pattern = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$')
pattern.match('#abc.edu')   # returns None; use match to validate an input
pattern.search('#abc.edu')  # returns a match object: it finds the 'edu'
Try this:
[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)+$
You forgot the + quantifier, which you need if you want to match any combination of (com|edu|org).
Update: as far as I can see, you also missed the + after the second [a-zA-Z0-9].
I have this string, and I need to get the datetime out of it by using regex. I have little to no experience with regex and am stuck.
As an example, take this string: Vic-nc_20150406_0100
I want to get the following result: 201504060100
How do I accomplish this? So far I've come up with this expression: ([0-9]{8})_([0-9]{4}), although the result is two groups (20150406 and 0100).
Another expression I've come up with is ([0-9]{8}_[0-9]{4}), now the result is 20150406_0100.
I either need to combine the groups or filter out the [_] somehow. Can anybody help me out?
Thanks in advance!
If you want to do a replace, just combine the values of the two groups.
Find: (\d{8})_(\d{4})
Replace: \1\2 or $1$2, depending on your programming language.
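As a rough illustration of both routes in Python (joining the captured groups, or doing the find/replace), using the example string from the question. The extra ^.*? and .*$ in the replace pattern are my addition so that the surrounding text is dropped as well.

```python
import re

text = "Vic-nc_20150406_0100"

# Option 1: search, then join the two captured groups.
match = re.search(r"(\d{8})_(\d{4})", text)
joined = "".join(match.groups()) if match else None
print(joined)  # 201504060100

# Option 2: find/replace, keeping only the two groups.
replaced = re.sub(r"^.*?(\d{8})_(\d{4}).*$", r"\1\2", text)
print(replaced)  # 201504060100
```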
I am still figuring my way around regex and have come across a problem that I am trying to solve. How do I validate for multiple specific email addresses?
For example, I want to only allow testdomain.com, realdomain.com, gooddomain.com to be validated. All other email addresses are not allowed.
annie#testdomain.com OK
aaron1#realdomain.com OK
amber#gooddomain.com OK
annie#otherdomain.com NOT OK
But I'm still unclear on how to add multiple specific email domains to the regex.
Any and all help would be appreciated.
Thank you,
Do you mean to include various legitimate domains in one regex?
\b[A-Z0-9._%-]+#(testdomain|gooddomain|realdomain)\.com\b
You didn't specify which language you're using, but most regex implementations have a notion of logical operators, so the domain part of your pattern would have something like:
(domain1|domain2|domain3)
\b[A-Z0-9._%-]+#(testdomain|realdomain|gooddomain)\.com\b
Assuming the above works for testdomain:
\b[A-Z0-9._%-]+#(?:testdomain|realdomain|gooddomain)\.com\b
Also, please note that you will have to add a case insensitive i modifier for this to work with your test cases, or use [A-Za-z0-9._%-] instead of [A-Z0-9._%-]
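The effect of the case-insensitive flag can be seen with the test cases from the question. A small sketch in Python; the helper name domain_is_allowed is hypothetical and re.fullmatch is my choice for validating a whole input string.

```python
import re

PATTERN = r"\b[A-Z0-9._%-]+#(?:testdomain|realdomain|gooddomain)\.com\b"

def domain_is_allowed(address: str, flags: int = re.IGNORECASE) -> bool:
    """Return True if the whole address matches one of the allowed domains."""
    return re.fullmatch(PATTERN, address, flags) is not None

for address in ["annie#testdomain.com", "aaron1#realdomain.com",
                "amber#gooddomain.com", "annie#otherdomain.com"]:
    print(address, "OK" if domain_is_allowed(address) else "NOT OK")

# Without the flag, [A-Z0-9._%-] rejects the lowercase local part.
print(domain_is_allowed("annie#testdomain.com", flags=0))  # False
```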
To make this expandable to many domains, I would probably capture the domain name and then compare that captured domain name with your whitelist in code.
.+#(.+)
First, ".+" will match one or more of any characters up until the last "#" symbol in the string.
Second, "#" will match the "#" symbol.
Third, "(.+)" will match and capture (capture because of the parenthesis) any character string after the "#" symbol.
Then, depending on the language you are using, you can get the captured string. Then you can see if that captured string is in your domain whitelist. Note, you'll want to do a case insensitive comparison in this last step.
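The capture-then-compare steps above could look roughly like this in Python. The set ALLOWED_DOMAINS and the function name is_whitelisted are hypothetical names for this sketch; the domains are the ones from the question.

```python
import re

# Whitelist kept in code, so adding a domain does not touch the regex.
ALLOWED_DOMAINS = {"testdomain.com", "realdomain.com", "gooddomain.com"}

# Greedy .+ runs up to the last "#"; the group captures everything after it.
ADDRESS_RE = re.compile(r".+#(.+)")

def is_whitelisted(address: str) -> bool:
    match = ADDRESS_RE.fullmatch(address)
    if match is None:
        return False
    # Case-insensitive comparison against the whitelist.
    return match.group(1).lower() in ALLOWED_DOMAINS

print(is_whitelisted("annie#testdomain.com"))   # True
print(is_whitelisted("Amber#GoodDomain.COM"))   # True
print(is_whitelisted("annie#otherdomain.com"))  # False
```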
The official standard is known as RFC 2822 (since superseded by RFC 5322).
Use OR operator | for all domain names you want to allow. Do not forget to escape . in the domain.
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:testdomain\.com|realdomain\.com|gooddomain\.com)
Also use case-insensitivity modifier/flag to allow capital letters in the address.
Hi I want to be able to set the regular expression to allow for dates to be entered like this
01/01/1900 or 01/01/70. I have the following, but I'm not sure how to make it accept either 4 or 2 digits at the end.
^([1-9]|0[1-9]|1[012])[- /.]([1-9]|0[1-9]|[12][0-9]|3[01])[- /.][0-9]{4}$
The other one I would like to know about is for URLs.
For this one I have no idea: how do I make it match correct URLs?
Thank you
This should match two or four digit numbers:
\d{2}(\d{2})?
Your full regex would be something like this:
^([1-9]|0[1-9]|1[012])[- /.]([1-9]|0[1-9]|[12][0-9]|3[01])[- /.]\d{2}(\d{2})?$
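A quick check of the full regex in Python, as a sketch; the sample inputs other than the two dates from the question are made up.

```python
import re

# Month, separator, day, separator, then 2 digits plus 2 optional digits for the year.
DATE_RE = re.compile(
    r"^([1-9]|0[1-9]|1[012])[- /.]([1-9]|0[1-9]|[12][0-9]|3[01])[- /.]\d{2}(\d{2})?$"
)

for text in ["01/01/1900", "01/01/70", "01/01/190", "13/01/1900"]:
    print(text, bool(DATE_RE.match(text)))

# 01/01/1900 True
# 01/01/70   True
# 01/01/190  False (three-digit year)
# 13/01/1900 False (month out of range)
```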
URLs are hard to test. http://localhost is a valid URL and so is https://test.example.co.uk:443/index.ece?foo=bar. I would look for something in your language to test this for you, or do a very simple test like this (you may have to escape some special characters depending on the regex engine you use):
^https?://
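As an example of "something in your language", Python's standard library can split a URL into parts, which allows a slightly stricter check than the prefix test. This is only a sketch: the function name looks_like_http_url is hypothetical, and requiring a non-empty host is my own choice of rule.

```python
from urllib.parse import urlparse

def looks_like_http_url(text: str) -> bool:
    """Very loose check: http(s) scheme and a non-empty host part."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

print(looks_like_http_url("http://localhost"))                                   # True
print(looks_like_http_url("https://test.example.co.uk:443/index.ece?foo=bar"))  # True
print(looks_like_http_url("ftp://example.com"))                                  # False
print(looks_like_http_url("not a url"))                                          # False
```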
To modify your regex so that it takes either 2 or 4 digits at the end, you can try this:
^([1-9]|0[1-9]|1[012])[- /.]([1-9]|0[1-9]|[12][0-9]|3[01])[- /.]([0-9]{4}|[0-9]{2})$
For URLs, you can try (from here):
(http|https)://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
or have a look at this S.O. question.
^([1-9]|0[1-9]|1[012])[- /.]([1-9]|0[1-9]|[12][0-9]|3[01])[- /.]([0-9]{4}|[0-9]{2})$
Well, is ([0-9]{4}|[0-9]{2}) not good enough for you? You could add a check that the first two digits of the four-digit group are 19 or 20, but that depends on your needs.
As for URL matching, look here. There are many of them, with tests.
You can use another alternation at the end to accept 2 or 4 digits (the same way you do the "or" options for the other date parts). Alternatively, you can require 2 digits in the last position, and then have 2 optional digits after that.
Unless you need to capture the individual parts (day, month, year), you should use non-capturing parentheses, like this (?:) (that's the .NET syntax).
Finally, you should consider the type of validation that you are trying to achieve with this. It is probably better to enforce the format, and not worry about bad forms like 91/73/9004 because even with what you have you can still get invalid dates, like 02/31/2011. Since you probably have to perform further validation, why not simplify the regex to something like ^(?:\d{1,2}[-/.]){2}\d{2}(?:\d{2})?$
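The "enforce the format, then validate further" idea could be sketched like this in Python. The function name parse_date is hypothetical, and the month/day/year order is an assumption taken from the question's regex; the actual date check is left to datetime.strptime.

```python
import re
from datetime import date, datetime
from typing import Optional

# Format only: one or two digits, separator, twice, then a 2- or 4-digit year.
FORMAT_RE = re.compile(r"^(?:\d{1,2}[-/.]){2}\d{2}(?:\d{2})?$")

def parse_date(text: str) -> Optional[date]:
    """Return a date for month/day/year input, or None if malformed or impossible."""
    if not FORMAT_RE.fullmatch(text):
        return None
    month, day, year = re.split(r"[-/.]", text)
    fmt = "%m/%d/%Y" if len(year) == 4 else "%m/%d/%y"
    try:
        return datetime.strptime(f"{month}/{day}/{year}", fmt).date()
    except ValueError:
        # Well-formed but not a real date, e.g. 02/31/2011.
        return None

print(parse_date("01/01/1900"))  # 1900-01-01
print(parse_date("02/31/2011"))  # None
print(parse_date("91/73/9004"))  # None
```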
As for URLs, stackoverflow is littered with duplicate questions about this.