Given an input like this, 56.1.2.3 56.1.2.4 255.255.255.254 56.1.2.7-9 56.5.1.1 to 56.5.1.7, I need a regex that can pick out what I have in brackets: [56.1.2.3] [56.1.2.4 255.255.255.254] [56.1.2.7-9] [56.5.1.1 to 56.5.1.7].
Here is what I have:
private static final String IP_Address = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\d";//56.1.2.3
private static final String IP_WithMask = "(\\d{1,3}.){3}(\\d{1,3})(?:\\s+[255])(\\d{1,3}.){3}(\\d{1,3})"; //56.1.2.3 255.255.255.254
private static final String IP_CIDR = "(\\d{1,3}.){3}(\\d{1,3})(?:\\s*/)(\\d{1,3})"; //56.1.2.3/24
private static final String IP_ADDRESS_Dash_Numeric_RANGE = "((\\d{1,3}.){3}(\\d{1,3})(?:\\s*-)(\\d{1,3}))";// 56.1.2.3-4
private static final String IP_ADDRESS_Dash_ADDRESS_RANGE = "((\\d{1,3}.){3}(\\d{1,3})(?:\\s*-\\s*)(\\d{1,3}.){3}(\\d{1,3}))";//56.1.2.3-56.1.2.5
private static final String IP_ADDRESS_To_Numeric_RANGE = "(\\d{1,3}.){3}(\\d{1,3})(?:\\s*[T|t][O|o]\\s*)(\\d{1,3})";//56.1.2.3 to 255
private static final String IP_ADDRESS_To_ADDRESS_RANGE = "((\\d{1,3}.){3}(\\d{1,3})(?:\\s*[T|t][O|o]\\s*)(\\d{1,3}.){3}(\\d{1,3}))";//56.1.2.3 to 56.1.3.5
The problem is that my regexes can't tell a single IP apart from an IP followed by a mask (56.1.2.3 255.x.x.x). The same problem exists for the other types too.
I tested the regex below; it works on all your cases.
See the comments in the formatted regex.
To decipher the results, test which of groups 3-6 took part in the match; that tells you the type of entry.
All the components are captured, including the starting segment (group 2).
Regex:
(\d{1,3}(?:\.\d{1,3}){2}\.(\d{1,3}))(?:(?:-|\s+to\s+)(\d{1,3}(?![\d\.]))|(?:-|\s*to\s+)(\d{1,3}(?:\.\d{1,3}){3})|\s+(25\d(?:\.\d{1,3}){3})|\s*/(\d{1,3}))?
Formatted:
( # (1), IP
\d{1,3}
(?: \. \d{1,3} ){2}
\.
( \d{1,3} ) # (2), From segment
)
(?:
(?: - | \s+ to \s+ )
( # (3), Dash/To segment
\d{1,3}
(?! [\d\.] )
)
|
(?: - | \s* to \s+ )
( # (4), Dash/To range
\d{1,3}
(?: \. \d{1,3} ){3}
)
|
\s+
( # (5), Mask
25 \d
(?: \. \d{1,3} ){3}
)
|
\s* /
( # (6), Port
\d{1,3}
)
)?
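As a rough illustration of testing groups 3-6, the sketch below runs the regex above over the sample input from the question with repeated find() calls. The class name, method names, and type labels are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IpEntryClassifier {
    // The regex from the answer, escaped for a Java string literal.
    static final Pattern ENTRY = Pattern.compile(
        "(\\d{1,3}(?:\\.\\d{1,3}){2}\\.(\\d{1,3}))"
        + "(?:(?:-|\\s+to\\s+)(\\d{1,3}(?![\\d\\.]))"
        + "|(?:-|\\s*to\\s+)(\\d{1,3}(?:\\.\\d{1,3}){3})"
        + "|\\s+(25\\d(?:\\.\\d{1,3}){3})"
        + "|\\s*/(\\d{1,3}))?");

    // Hypothetical labels: whichever of groups 3-6 is non-null decides the type.
    static String kind(Matcher m) {
        if (m.group(3) != null) return "segment range";
        if (m.group(4) != null) return "address range";
        if (m.group(5) != null) return "mask";
        if (m.group(6) != null) return "slash suffix";
        return "single";
    }

    // Collects every match in the input together with its detected type.
    static List<String> classify(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {
            out.add("[" + m.group() + "] " + kind(m));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "56.1.2.3 56.1.2.4 255.255.255.254 56.1.2.7-9 56.5.1.1 to 56.5.1.7";
        classify(input).forEach(System.out::println);
    }
}
```

A group that did not take part in the match returns null from group(n), which is what makes the type check possible.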
The following regex captures IP addresses as well as DNS hostnames.
What I'd like is to add some IPs to ignore, such as 1.0.0.0 and 0.0.0.0 for example. I tried some negative lookahead without success.
[\w-]+(\.[\w-]+)+
For example:
www.google.com 255.255.255.255 1.0.0.0 stackoverflow.com 0.0.0.0
should match 3 out of the 5 items in that line.
Any tips would be great.
Edit: I tried this, which somewhat works but also filters out other values, such as 1.1.1.1:
(?![1\.0\.0\.0]|[0\.0\.0\.0])[\w-]+(\.[\w-]+)+
To find IPs and domains while ignoring the IPs 1.0.0.0 and 0.0.0.0,
with validation of the IPv4 octets and a requirement that a domain contain at least one letter,
all wrapped inside whitespace boundaries, use this:
(?<!\S)(?!0{0,2}[01](?:\.0{1,3}){3})(?:(?:0{0,2}\d|0?[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])(?:\.(?:0{0,2}\d|0?[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])){3}|(?=\S*[a-zA-Z])[\w-]+(?:\.[\w-]+)+)(?!\S)
https://regex101.com/r/ZPQS5K/1
Expanded
(?<! \S )
(?! # Not 0.0.0.0 or 1.0.0.0
0{0,2} [01]
(?: \. 0{1,3} ){3}
)
(?:
(?: # IP address
0{0,2} \d
| 0? [1-9] \d
| 1 \d{2}
| 2 [0-4] \d
| 25 [0-5]
)
(?:
\.
(?:
0{0,2} \d
| 0? [1-9] \d
| 1 \d{2}
| 2 [0-4] \d
| 25 [0-5]
)
){3}
| # or
(?= \S* [a-zA-Z] ) # At least a letter
[\w-]+ # Domain
(?: \. [\w-]+ )+
)
(?! \S )
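To try the pattern from Java, a minimal sketch along these lines should do; it applies the regex above to the sample line from the question. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostFilter {
    // The answer's regex, escaped for a Java string literal.
    static final Pattern HOST = Pattern.compile(
        "(?<!\\S)(?!0{0,2}[01](?:\\.0{1,3}){3})"
        + "(?:(?:0{0,2}\\d|0?[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5])"
        + "(?:\\.(?:0{0,2}\\d|0?[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5])){3}"
        + "|(?=\\S*[a-zA-Z])[\\w-]+(?:\\.[\\w-]+)+)(?!\\S)");

    // Returns every IP or hostname the pattern accepts, in input order.
    static List<String> hosts(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = HOST.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "www.google.com 255.255.255.255 1.0.0.0 stackoverflow.com 0.0.0.0";
        System.out.println(hosts(line));
    }
}
```

The negative lookahead only rejects the two listed addresses, so a value such as 1.1.1.1 still matches.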
I am trying to get this regex dialed-in to validate whether a URL begins with https and if a port is supplied the only valid values are 443 or 5443. This regex is pretty close but not quite there.
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5{0,1}443)?(.)*
How do I solve this problem?
This is a mainstream URL validator that tests whether the URL sits between whitespace boundaries.
It allows only the https scheme and the port numbers 5443 or 443.
(?<!\S)https://(?:\S+(?::\S*)?@)?(?:(?:(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))|localhost)(?::5?443)?(?:/[^\s]*)?(?!\S)
Readable version
(?<! \S )
https ://
(?:
\S+
(?: : \S* )?
@
)?
(?:
(?:
(?:
[1-9] \d?
| 1 \d\d
| 2 [01] \d
| 22 [0-3]
)
(?:
\.
(?: 1? \d{1,2} | 2 [0-4] \d | 25 [0-5] )
){2}
(?:
\.
(?:
[1-9] \d?
| 1 \d\d
| 2 [0-4] \d
| 25 [0-4]
)
)
| (?:
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)
(?:
\.
(?: [a-z\u00a1-\uffff0-9]+ -? )*
[a-z\u00a1-\uffff0-9]+
)*
(?:
\.
(?: [a-z\u00a1-\uffff]{2,} )
)
)
| localhost
)
(?: : 5? 443 )?
(?: / [^\s]* )?
(?! \S )
You should append a / after this optional port group so it doesn't allow any digits before a /. Try using this regex,
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?\/\S*
Notice that I've also changed (:5{0,1}443)? to (:5?443)? and changed the last .* to \S* so the match doesn't capture spaces, since spaces are not valid in a URL. Besides that, you can also get rid of many of the groups in your regex, unless you need them.
Edit:
As you said in the comments, you want to match the following URLs too,
https://example.com
https:example.com
https:example.com:443
so you need to make the \/\S* part optional by wrapping it in a group and placing a ? after it. The modified regex becomes this, which will match the URLs above.
^https:\/\/([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:5?443)?(\/\S*)?
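A quick way to check this regex from Java is sketched below, assuming whole-string validation is wanted. The class name and sample URLs are invented for the example. Matcher.matches() requires the entire input to match, so a URL with some other port is rejected instead of matching only up to the host name, as it could with find().

```java
import java.util.regex.Pattern;

public class HttpsPortCheck {
    // The modified regex from above; '/' needs no escaping in Java.
    static final Pattern URL = Pattern.compile(
        "^https://([a-zA-Z\\d.]{2,})\\.([a-zA-Z]{2,})(:5?443)?(/\\S*)?");

    // True only when the whole string is an https URL with no port, 443, or 5443.
    static boolean isAllowed(String url) {
        return URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://example.com",
            "https://example.com:443",
            "https://example.com:5443/path",
            "https://example.com:8443",
            "http://example.com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAllowed(s));
        }
    }
}
```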
Your regex seems to work okay. If you wish, you can add an end boundary, just for safety:
^(https:\/\/)([a-zA-Z\d\.]{2,})\.([a-zA-Z]{2,})(:(5443|443))?$
I added a $ end anchor to bound your original expression on the right, and kept the colon and the port together in one optional group so that a URL without a port still matches. If you have more port numbers to allow, simply add them to this capturing group:
(5443|443)
You can also remove unnecessary boundaries, if you wish.
java.util.regex.Pattern ips
= java.util.regex.Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){2}\\.(\\d{1,3}))(?:(?:-|\\s+to\\s+)(\\d{1,3}(?![\\d\\.]))|(?:-|\\s*to\\s+)(\\d{1,3}(?:\\.\\d{1,3}){3})|\\s+(25\\d(?:\\.\\d{1,3}){3})|\\s*\\/(\\d{1,3}))?");
Currently my Regex will accept the following types of IP address input, but only one input type at a time:
ip: "47.1.2.3"
range: "47.1.2.3-4"
ip range: "47.1.2.3-47.1.2.4"
ip to range: "47.1.2.3 to 4"
ip to ip range: "47.1.2.3 to 47.1.2.4"
ip CIDR: "47.1.2.4/32"
ip Mask: "47.1.2.4 255.255.255.255"
I would like to modify my regex to accept combinations of these separated by a comma or space. Ideally the regex would have named capture groups as listed above to make handling easier.
I want the following to also be a valid input, but I want to be able to pull out the matches described above with named groups.
"47.1.2.3 to 4, 47.1.2.7, 47.1.3.9-47.1.3.19"
I'm attempting to use the regex to verify input into a text field. The following code is the textfield:
public class HostCollectionTextField extends JFormattedTextField implements CellEditor, MouseListener {
    ArrayList listeners = new ArrayList();
    HostCollection hc;
    java.util.regex.Pattern ips
        = java.util.regex.Pattern.compile("(\\d{1,3}(?:\\.\\d{1,3}){2}\\.(\\d{1,3}))(?:(?:-|\\s+to\\s+)(\\d{1,3}(?![\\d\\.]))|(?:-|\\s*to\\s+)(\\d{1,3}(?:\\.\\d{1,3}){3})|\\s+(25\\d(?:\\.\\d{1,3}){3})|\\s*\\/(\\d{1,3}))?");

    public HostCollectionTextField() {
        this.addMouseListener(this);
        this.hc = new HostCollection();
        this.setFormatterFactory(new AbstractFormatterFactory() {
            @Override
            public JFormattedTextField.AbstractFormatter getFormatter(JFormattedTextField tf) {
                RegexFormatter f = new RegexFormatter(ips);
                return f;
            }
        });
        this.getDocument().addDocumentListener(new DocListener(this));
        addActionListener(new ActionListener() {
            @Override
            public void actionPerformed(ActionEvent ae) {
                if (stopCellEditing()) {
                    fireEditingStopped();
                }
            }
        });
    }
    //class methods....
}
This is the RegexFormatter Class:
public class RegexFormatter extends DefaultFormatter {
    protected java.util.regex.Matcher matcher;

    public RegexFormatter(java.util.regex.Pattern regex) {
        setOverwriteMode(false);
        matcher = regex.matcher(""); // create a Matcher for the regular expression
    }

    public Object stringToValue(String string) throws java.text.ParseException {
        if (string == null) {
            return null;
        }
        matcher.reset(string); // set 'string' as the matcher's input
        if (!matcher.matches()) // Does 'string' match the regular expression?
        {
            throw new java.text.ParseException("does not match regex", 0);
        }
        // If we get this far, then it did match.
        return super.stringToValue(string); // will honor the 'valueClass' property
    }
}
The IP parts are distinctive enough that there should be no problem with
overlapping parts during a match when whitespace and/or a comma is the separator.
You probably need two versions of the same regex:
one to validate, one to extract.
The extraction regex is just your original regex used in a global match (repeated find calls).
It is used after validation.
The validation regex is below. It matches multiple IP parts at once by anchoring
with ^ and $ and repeating the original regex, with the required separator
[\s,]+ between parts.
I'm not sure this will work with your validation code, but if entering
a single IP part works now, then this should too.
Validation regex:
"^(?:\\d{1,3}(?:\\.\\d{1,3}){2}\\.\\d{1,3}(?:(?:-|\\s+to\\s+)\\d{1,3}(?![\\d\\.])|(?:-|\\s*to\\s+)\\d{1,3}(?:\\.\\d{1,3}){3}|\\s+25\\d(?:\\.\\d{1,3}){3}|\\s*\\/\\d{1,3})?(?:[\\s,]*$|[\\s,]+))+$"
Formatted:
^
(?:
\d{1,3}
(?: \. \d{1,3} ){2}
\.
\d{1,3}
(?:
(?: - | \s+ to \s+ )
\d{1,3}
(?! [\d\.] )
|
(?: - | \s* to \s+ )
\d{1,3}
(?: \. \d{1,3} ){3}
|
\s+
25 \d
(?: \. \d{1,3} ){3}
|
\s* \/
\d{1,3}
)?
(?:
[\s,]* $
|
[\s,]+
)
)+
$
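The validate-then-extract idea can be sketched in Java roughly as follows. The class and method names are hypothetical, and the extraction pattern is written without capture groups to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostListParser {
    // One entry: an IP with an optional range, mask, or slash suffix.
    static final String ENTRY =
        "\\d{1,3}(?:\\.\\d{1,3}){2}\\.\\d{1,3}"
        + "(?:(?:-|\\s+to\\s+)\\d{1,3}(?![\\d\\.])"
        + "|(?:-|\\s*to\\s+)\\d{1,3}(?:\\.\\d{1,3}){3}"
        + "|\\s+25\\d(?:\\.\\d{1,3}){3}"
        + "|\\s*/\\d{1,3})?";

    // Validation: one or more entries separated by whitespace and/or commas.
    static final Pattern VALID =
        Pattern.compile("^(?:" + ENTRY + "(?:[\\s,]*$|[\\s,]+))+$");

    // Extraction: the same entry pattern, applied repeatedly with find().
    static final Pattern EXTRACT = Pattern.compile(ENTRY);

    static List<String> parse(String input) {
        if (!VALID.matcher(input).matches()) {
            throw new IllegalArgumentException("not a valid host list: " + input);
        }
        List<String> entries = new ArrayList<>();
        Matcher m = EXTRACT.matcher(input);
        while (m.find()) {
            entries.add(m.group());
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(parse("47.1.2.3 to 4, 47.1.2.7, 47.1.3.9-47.1.3.19"));
    }
}
```

Validating first means the extraction loop only ever sees input in which every token is a well-formed entry.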
edit: add group names to extraction regex.
# "(?<IP>\\d{1,3}(?:\\.\\d{1,3}){2}\\.(?<From_Seg>\\d{1,3}))(?:(?:-|\\s+to\\s+)(?<To_Seg>\\d{1,3}(?![\\d\\.]))|(?:-|\\s*to\\s+)(?<To_Range>\\d{1,3}(?:\\.\\d{1,3}){3})|\\s+(?<Mask>25\\d(?:\\.\\d{1,3}){3})|\\s*/(?<Port>\\d{1,3}))?"
(?<IP> # (1), IP
\d{1,3}
(?: \. \d{1,3} ){2}
\.
(?<From_Seg> \d{1,3} ) # (2), From segment
)
(?:
(?: - | \s+ to \s+ )
(?<To_Seg> # (3), Dash/To segment
\d{1,3}
(?! [\d\.] )
)
|
(?: - | \s* to \s+ )
(?<To_Range> # (4), Dash/To range
\d{1,3}
(?: \. \d{1,3} ){3}
)
|
\s+
(?<Mask> # (5), Mask
25 \d
(?: \. \d{1,3} ){3}
)
|
\s* /
(?<Port> # (6), Port
\d{1,3}
)
)?
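One caveat for Java specifically: java.util.regex accepts only ASCII letters and digits in group names, so names containing an underscore are rejected at compile time. The sketch below therefore drops the underscores (FromSeg, ToSeg, ToRange); those renamed groups, the class, and its methods are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NamedIpGroups {
    // Same regex as above, with underscore-free group names for Java.
    static final Pattern ENTRY = Pattern.compile(
        "(?<IP>\\d{1,3}(?:\\.\\d{1,3}){2}\\.(?<FromSeg>\\d{1,3}))"
        + "(?:(?:-|\\s+to\\s+)(?<ToSeg>\\d{1,3}(?![\\d\\.]))"
        + "|(?:-|\\s*to\\s+)(?<ToRange>\\d{1,3}(?:\\.\\d{1,3}){3})"
        + "|\\s+(?<Mask>25\\d(?:\\.\\d{1,3}){3})"
        + "|\\s*/(?<Port>\\d{1,3}))?");

    // Helper to show whether a pattern compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Lists the IP plus whichever optional named group took part in the match.
    static String describe(String entry) {
        Matcher m = ENTRY.matcher(entry);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder("IP=" + m.group("IP"));
        for (String name : new String[] {"ToSeg", "ToRange", "Mask", "Port"}) {
            if (m.group(name) != null) {
                sb.append(' ').append(name).append('=').append(m.group(name));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("47.1.2.3 to 4"));
        System.out.println(describe("47.1.2.4/32"));
        System.out.println(compiles("(?<From_Seg>\\d)"));
    }
}
```

The group called Port here holds the number after the slash, exactly as in the regex above.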
I have an input string ("My Email id is abc @ gmail.com"). From the input string I need to find the email ID using a regex and replace it with (xxxxxxx).
I am using the pattern below, but it doesn't work if the email ID contains whitespace.
\\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*
Thanks.
If all you want to do is add whitespaces to word characters and maintain the original
regex integrity, it starts to get ugly:
// (?=\\s*\\w)[\\w\\s]+(?:[-+.'](?=\\s*\\w)[\\w\\s]+)*@(?=\\s*\\w)[\\w\\s]+(?:[-.](?=\\s*\\w)[\\w\\s]+)*\\.(?=\\s*\\w)[\\w\\s]+(?:[-.](?=\\s*\\w)[\\w\\s]+)*
(?= \s* \w )
[\w\s]+
(?:
[-+.']
(?= \s* \w )
[\w\s]+
)*
@
(?= \s* \w )
[\w\s]+
(?:
[-.]
(?= \s* \w )
[\w\s]+
)*
\.
(?= \s* \w )
[\w\s]+
(?:
[-.]
(?= \s* \w )
[\w\s]+
)*
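Because [\w\s]+ also accepts ordinary words and the spaces between them, the pattern above can extend the match leftwards over the text preceding the address. If whitespace only needs to be tolerated next to the @ sign, a narrower variant may be enough. The sketch below is that variant, not the regex above; the class name and pattern are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class EmailMasker {
    // Assumption: whitespace is allowed only directly before and after '@'.
    static final Pattern SPACED_EMAIL = Pattern.compile(
        "\\w+(?:[-+.']\\w+)*\\s*@\\s*\\w+(?:[-.]\\w+)*\\.\\w+(?:[-.]\\w+)*");

    // Replaces every address-like match with a fixed mask.
    static String mask(String text) {
        return SPACED_EMAIL.matcher(text).replaceAll("xxxxxxx");
    }

    public static void main(String[] args) {
        System.out.println(mask("My Email id is abc @ gmail.com"));
        // prints: My Email id is xxxxxxx
    }
}
```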
I would like to extract the value of individual variables, paying attention to the different ways they have been defined. For example, for dtime we want to extract 0.004. It also has to handle exponential notation; for example, for the variable vis it should extract 10e-6.
The problem is that each variable has its own number of whitespace characters between the variable name and the equal sign (I don't have control over how they have been coded).
Text to test:
dtime = 0.004D0
case = 0
newrun = 1
periodic = 0
iscalar = 1
ieddy = 1
mg_level = 5
nstep = 20000
vis = 10e-6
ak = 10e-6
g = 9.81D0
To extract dtime's value this REGEX works:
(?<=dtime =\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
To extract vis's value this REGEX works:
(?<=vis =\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
The problem is that I need to know the exact number of spaces between the variable name and the equal sign. I tried using \s+ but it does not work, why?
(?<=dtime\s+=\s)[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
If you are using PHP or Perl, or more generally PCRE, then you can use the \K escape to solve this problem like this:
dtime\s+=\s\K[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
           ^^
Notice the \K: it tells the engine to drop everything matched before it
from the reported match, as if it had never been matched.
Edit: If you can't use lookbehinds or \K to discard what was matched before the number, I think you need to capture the number in a capturing group instead:
dtime\s*=\s*([-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)
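In Java, the capture-group approach might look like the sketch below. The class name, the valueOf helper, and the leading \b word boundary (added so that a short name such as g does not match inside mg_level) are assumptions for this example; the sample text uses made-up spacing.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableValue {
    // Number part taken from the regex above (inner group made non-capturing).
    static final String NUMBER = "[-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?";

    // Returns the numeric text assigned to 'name', whatever the spacing around '='.
    static Optional<String> valueOf(String text, String name) {
        Pattern p = Pattern.compile(
            "\\b" + Pattern.quote(name) + "\\s*=\\s*(" + NUMBER + ")");
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "dtime   = 0.004D0\n"
                    + "mg_level     = 5\n"
                    + "nstep= 20000\n"
                    + "vis  = 10e-6\n"
                    + "g = 9.81D0\n";
        for (String name : new String[] {"dtime", "vis", "g", "nstep"}) {
            System.out.println(name + " -> " + valueOf(text, name).orElse("(not found)"));
        }
    }
}
```

The value is read from group 1, so nothing before the number needs to be excluded from the match itself.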
(?<=dtime\s+=\s) is a variable-length lookbehind because of \s+. Most (not all) engines support only fixed-length lookbehind.
Also, your regex requires a digit before the exponential form, so if there is no digit, it won't match. Something like this might work:
# dtime\s*=\s*([-+]?[0-9]*\.?[0-9]*(?:[eE][-+]?[0-9]+)?)
dtime \s* = \s*
( # (1)
[-+]? [0-9]* \.? [0-9]*
(?: [eE] [-+]? [0-9]+ )?
)
Edit: After review, I see you're trying to fold multiple optional forms into one regex.
I think this is not really that straightforward. Just as a point of interest, this is probably a baseline:
# dtime\s*=\s*([-+]?(?(?=[\d.]+)(\d*\.\d+|\d+\.\d*|\d+|(?!))|)(?(?=[eE][-+]?\d+)([eE][-+]?\d+)|))(?(2)|(?(3)|(?!)))
dtime \s* = \s*
( # (1 start)
[-+]? # optional -+
(?(?= # conditional check for \d*\.\d*
[\d.]+
)
( # (2 start), yes, force a match on one of these
\d* \. \d+ # \. \d+
| \d+ \. \d* # \d+ \.
| \d+ # \d+
| (?!) # or, Fail the match, the '.' dot is there without a number
) # (2 end)
| # no, match nothing
)
(?(?= # conditional check for [eE] [-+]? \d+
[eE] [-+]? \d+
)
( [eE] [-+]? \d+ ) # (3), yes, force a match on it
| # no, match nothing
)
) # (1 end)
(?(2) # Conditional check - did we match something? One of grp2 or grp3 or both
| (?(3)
| (?!) # Did not match a number, Fail the match
)
)