I need a regular expression that finds numbers that are not inside parentheses.
Example: abcd 1 (35) (df)
It should only see the 1.
Is this very complex? I've tried and had no luck.
Thanks for any help
An easy solution is to first remove the unwanted values:
my $string = "abcd 12 (35) (df) 2311,22";
$string =~ s/\(\d+\)//g; # remove numbers within parens
my @numbers = $string =~ /\d+/g; # extract the numbers
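The same two-step idea can be sketched in Python rather than Perl. The function name `numbers_outside_parens` is only illustrative and not part of the answer above: it strips the parenthesized numbers first, then collects whatever digit runs remain.

```python
import re

def numbers_outside_parens(text):
    """Drop "(digits)" groups, then return the remaining runs of digits."""
    stripped = re.sub(r"\(\d+\)", "", text)   # remove numbers within parens
    return re.findall(r"\d+", stripped)       # extract the numbers that are left

print(numbers_outside_parens("abcd 12 (35) (df) 2311,22"))  # ['12', '2311', '22']
```

One caveat of deleting the groups outright: in an input like "1(35)2" the two neighbours are joined into "12". Substituting a space instead of the empty string would avoid that.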
This is quite hard but something like this will probably do:
^(?:\()(\d+)(?:[^)])|(?:[^(0-9]|^)(\d+)(?:[^)0-9]|$)|(?:[^(])(\d+)(?:\))$
The difficulty is to still match the numbers in "(123" and "123)", which have a parenthesis on one side only, while not matching the 2 in the string 123 as a number sitting between the non-parenthesis characters 1 and 3. There are probably also some edge cases at the start and end of the string.
My suggestion is not to use a single regex for this. Instead, use a regex that matches numbers, and then use the match positions to check whether the surrounding characters are parentheses.
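A minimal Python sketch of that suggestion: match every run of digits, then look at the characters on either side of each match. It assumes that "inside parentheses" means the number is immediately wrapped as "(digits)". The name `unparenthesized_numbers` is hypothetical.

```python
import re

def unparenthesized_numbers(text):
    """Return digit runs that are not wrapped as "(digits)"."""
    found = []
    for m in re.finditer(r"\d+", text):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        # Skip only when the number is enclosed on both sides.
        if before == "(" and after == ")":
            continue
        found.append(m.group())
    return found

print(unparenthesized_numbers("abcd 1 (35) (df)"))  # ['1']
```

Because `\d+` is greedy, the 2 in 123 is never reported on its own, and the one-sided cases "(123" and "123)" are kept.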
The regular expression would be:
^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$
The result is the first (and only) matching group of the regex.
You may want to remove the ^ and $ if the regex should also match when the text is only part of a line. You can also use [a-zA-Z] or [[:alpha:]] instead of [a-z]. This depends on the regular expression engine you use and, of course, on the content you want to match.
Example perl code:
if (m/^[a-z]+ ([0-9]+) \([0-9]+\) \([a-z]+\)$/) {
print("$1\n");
}
Please note that your question does not contain enough information to make a good answer possible: you did not say anything about the general format of your expression, for example whether you want to match integers or floating-point numbers.
How about
/(?:^|[^\d(])(\d+)(?:[^\d)]|$)/
? This matches a string of digits (\d+) that is
preceded by the beginning of the string, or by a character that is not a digit or an open parenthesis ((?:^|[^\d(]))
followed by the end of the string, or by a character that is not a digit or a close parenthesis ((?:[^\d)]|$))
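To see how this pattern behaves when it is applied repeatedly, here is a small Python sketch. The second pattern is an assumed variant, not from the answer: it expresses the same two conditions as zero-width lookarounds, so the neighbouring characters are not consumed.

```python
import re

text = "abcd 12 (35) (df) 2311,22"

# The pattern from the answer: the neighbouring characters are consumed.
consuming = re.findall(r"(?:^|[^\d(])(\d+)(?:[^\d)]|$)", text)

# Assumed variant: the same two conditions written as zero-width lookarounds.
lookaround = re.findall(r"(?<![\d(])\d+(?![\d)])", text)

print(consuming)   # ['12', '2311']
print(lookaround)  # ['12', '2311', '22']
```

The consuming version misses the final 22 because the comma in front of it was already used up as the trailing character of the previous match. For a single match per string, both forms behave the same.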
I'm new to regex and I have a scenario where regex will be useful.
My requirement is quite simple: I want to detect whether the word NET is present in a string, and extract the digits that follow it, without including the word NET or the spaces that follow it.
In my particular case following the word NET are several white space characters, and the number of these can vary as they're used as padding.
My Input string is as follows
NET 4.800 g
The reg ex I have concocted is as follows
(?<=NET)\s*(\d{0,4}\.\d{1,3})
This produces a result close to what I'm attempting to do.
It performs a positive lookbehind on the characters NET and then matches as many whitespace characters as follow. Finally I select up to four digits, a period and up to three more digits.
The problem is that I'm also grabbing the indeterminate number of padding spaces before the number. All I actually want is the number itself.
I did attempt putting \s* into the lookbehind, but this failed. Does anyone have any suggestions as to where I'm going wrong here?
I suspect that you are using $& to capture your string, and not $1. The variable $& contains the entire matching string, which includes your spaces but not your lookbehind assertion. This fits your problem description: you need to exclude a variable number of spaces, but when you move them into the lookbehind you get the error about "variable length lookbehind assertions are not supported".
This would be quite an easy question to answer if you had included your code. You should always do that: always show your code.
So... I assume you have something like:
if (/your_regex/) {
$match = $&;
}
Then you should change it to
if (/your_regex/) {
$match = $1;
}
This way, only the string inside the parentheses will be captured, and the \s* outside them will be discarded.
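The difference between the whole match and the first capture group can be illustrated in Python, where `m.group(0)` plays the role of Perl's $& and `m.group(1)` the role of $1. The input line with four padding spaces is an assumption made for the illustration.

```python
import re

line = "NET    4.800 g"   # assumed input with padding after NET
m = re.search(r"(?<=NET)\s*(\d{0,4}\.\d{1,3})", line)

whole = m.group(0)    # entire match, like Perl's $&
number = m.group(1)   # first capture group, like Perl's $1

print(repr(whole))    # '    4.800'
print(repr(number))   # '4.800'
```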
With this proper way of matching (which can also be written more simply, as a list assignment), you can simplify your regex. Here are a strict and a flexible version:
use strict;
use warnings;
use Data::Dumper;
my $str = "NET 4.800 g";
my ($number) = $str =~ /^NET\s*(\d{0,4}\.\d{0,3})\sg$/; # strict match
print Dumper $number; # $VAR1 = '4.800';
my ($simple) = $str =~ /NET\s*([\d.]+)/; # flexible match
print Dumper $simple; # $VAR1 = '4.800';
In the strict match, we use anchors at beginning ^ and end $. We make sure that the string starts with NET and ends with g, and account for the exact numbers and spaces we expect to find between.
The flexible match simply looks for NET and captures the number that comes after it. This can take place anywhere in the string, and even match partially.
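For comparison, the two patterns can be tried against a few inputs in Python. This is only a sketch: the helper `extract` and the extra test strings are assumptions, and the patterns are copied unchanged from the Perl code above.

```python
import re

STRICT = re.compile(r"^NET\s*(\d{0,4}\.\d{0,3})\sg$")   # whole line must fit
FLEXIBLE = re.compile(r"NET\s*([\d.]+)")                # NET anywhere, then a number

def extract(pattern, text):
    """Return the first capture group, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(extract(STRICT, "NET 4.800 g"))                  # 4.800
print(extract(STRICT, "gross NET 4.800 g"))            # None
print(extract(FLEXIBLE, "gross NET 4.800 g total"))    # 4.800
```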
I have a string, say
my $str = "FILLER-1-1,EQPT:MN,EQPT_MISSING,NSA,04-30,15-07-13,NEND,NA";
I want to match a pattern say
my $pattern = "FILLER-1-1";
I am using the below regexp
$reg = $str =~ /$pattern/;
This works fine.
The problem is that it also matches if the string contains
FILLER-1-10/FILLER-1-11/FILLER-1-12 and so on ...
I don't want to match these. I also don't want my regexp to look like
$reg = $str =~ /$pattern\W+/;
This one avoids the issue above, but a \W character may or may not follow: in some strings it is present, in others it is not. So I need the regexp to match only FILLER-1-1, without using \W+, and not to match FILLER-1-10.
Note: if somebody downvotes my question, please let me know what's wrong with the code. I would appreciate it if the person also left a comment.
As \w matches [a-zA-Z0-9_], you can use the zero-width assertion \b, which denotes a change in \w state (called a "word boundary", hence the "b"):
/FILLER-1-1\b/
This means that there needs to be a character that differs from the previous word state - a word state change.
It will match
FILLER-1-1.
FILLER-1-1&
FILLER-1-1,
It will not match
FILLER-1-1a
FILLER-1-16
Read more about it here.
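A short Python sketch of the word-boundary approach, assuming the pattern arrives as a plain string that should be matched literally. The helper name `contains_id` is hypothetical.

```python
import re

def contains_id(pattern, text):
    """True if `pattern` occurs in `text` and is not followed by a word character."""
    return re.search(re.escape(pattern) + r"\b", text) is not None

print(contains_id("FILLER-1-1", "FILLER-1-1,EQPT:MN,EQPT_MISSING"))   # True
print(contains_id("FILLER-1-1", "FILLER-1-10,EQPT:MN,EQPT_MISSING"))  # False
```

The trailing \b only helps when the pattern itself ends in a word character, as FILLER-1-1 does. It also does not guard the left-hand side, so a leading \b may be needed too, depending on the data.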
If you want to match FILLER at the start of the input (line) followed by two numbers, this simple regex should work:
/^FILLER-\d+-\d+/
^ matches the beginning of the input
\d matches any digit ([0-9])
+ matches one or more of the preceding item
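Use the ? quantifier, like so:
/FILLER-\d-\d\W?/
The \W? means a non-word character, zero or one time. Note that because \W? is optional, this pattern still matches the beginning of FILLER-1-10.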
I am trying to figure out a way to determine whether a matched comma (,) lies inside a regex. Basically, I do not want to match the character if it lies inside a regex.
The regex I have come up with is ,(?<!.+\/)(?!.+\/) but it's not quite working.
Any ideas?
I want to skip /some,regex/ but match any other commas.
Edit:
Live example: http://rubular.com/r/WjrwSnmzyP
Here is the regex that will work for you:
,(?!\s)(?=(?:(?:[^/]*\/){2})*[^/]*$)
Live Demo: http://rubular.com/r/37buDdg1tW
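Explanation: it matches a comma that is followed by an EVEN number of forward slashes (/) up to the end of the string. Hence a comma (,) between two slash (/) characters will NOT be matched, and commas outside will be matched (since those are followed by an even number of / characters). The (?!\s) part additionally skips any comma that is directly followed by whitespace.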
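The even-number-of-slashes lookahead can be tried out in Python as follows. The function name and the ";" marker are assumptions for the demonstration, and the sketch assumes that slashes in the input always come in delimiter pairs, with no escaped slashes.

```python
import re

# A comma, not followed by whitespace, with an even number of "/" after it.
OUTSIDE_COMMA = re.compile(r",(?!\s)(?=(?:(?:[^/]*/){2})*[^/]*$)")

def mark_outside_commas(text, marker=";"):
    """Replace every comma that lies outside /.../ with `marker`."""
    return OUTSIDE_COMMA.sub(marker, text)

print(mark_outside_commas("/foo,bar/,baz"))   # /foo,bar/;baz
print(mark_outside_commas("a,b,/x,y/,c"))     # a;b;/x,y/;c
```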
A curious thing about regular expressions is that if you want to use them to ignore "something" that is within "something else", you need to match that "something else", prefer matches of it, and then either silently discard or reproduce those matches.
For example, in order to remove all commas from a string unless they are in a regular expression literal:
In Perl:
my $s = "/foo,bar/,baz";
$s =~ s{(/(?:[^/\\]|\\.)+/)|,}{\1}g;
In ECMAScript:
var s = "/foo,bar/,baz";
s = s.replace(/(\/([^\/\\]|\\.)+\/)|,/g, "$1");
or
s = s.replace(new RegExp("(/([^/\\\\]|\\\\.)+/)|,", "g"), "$1");
Note that I am capturing the match for the regular expression literal in the string value, and reproducing it (\1 or $1) if it matched. (If the other part of the alternation – the standalone comma – matched, the empty string is captured, so this simple approach suffices here.)
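The same match-and-reproduce technique, sketched in Python. A replacement function is used here as an assumption of convenience, so that a comma match, where group 1 did not participate, is replaced by the empty string explicitly. The function name is hypothetical.

```python
import re

# Alternative 1 captures a whole /.../ literal (allowing escaped characters);
# alternative 2 is a standalone comma.
LITERAL_OR_COMMA = re.compile(r"(/(?:[^/\\]|\\.)+/)|,")

def strip_commas_outside_literals(text):
    """Remove commas unless they are inside a /.../ literal."""
    return LITERAL_OR_COMMA.sub(lambda m: m.group(1) or "", text)

print(strip_commas_outside_literals("/foo,bar/,baz"))  # /foo,bar/baz
```

Because the literal alternative is tried first at each position, a comma inside /.../ is swallowed by that alternative and written back unchanged. It never reaches the second alternative.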
For further reading I recommend “Mastering Regular Expressions” by Jeffrey E. F. Friedl. Two rather enlightening example chapters, each from a different edition, are available for free online.
I am trying to find a pattern match as below
abc(xxxx):efg(xxxx):xyz(xxxx), where each x is a digit [0-9]
I used
set string "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"
regexp abc(....):efg(....):xyz(....) $string result_str
it returns 0. Can anyone help?
The problem you've got is that ( and ) have special meaning to regular expressions in Tcl (and many other RE engines besides) in that they denote a capturing sub-RE. To make the characters “normal”, they have to be escaped with a backslash, and that means that it's best to put the regular expression in braces (because backslashes are general Tcl metacharacters).
Thus:
% set string "my string is abc(xxxx):efg(xxxx):xyz(xxxx)"
% regexp {abc\(....\):efg\(....\):xyz\(....\)} $string
1
If you want to also capture the contents of those parentheses, you need a slightly more complex RE:
regexp {abc\((....)\):efg\((....)\):xyz\((....)\)} $string \
all abc_bit efg_bit xyz_bit
Note that those .... sequences always match exactly four characters, but it's better to be more specific. To match any number of digits in each case:
regexp {abc\((\d+)\):efg\((\d+)\):xyz\((\d+)\)} $string -> abc efg xyz
When using regexp to extract bits of a string, it's pretty common to use -> as a (rather strange) variable name for the whole string match; it looks mnemonically like it's saying “send the pieces extracted to these variables”.
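The effect of escaping the parentheses is not specific to Tcl. A Python sketch shows the same behaviour, assuming an input in which real digits stand in place of xxxx:

```python
import re

text = "my string is abc(1234):efg(5678):xyz(9012)"   # assumed digits in place of xxxx

# Unescaped: ( and ) only group, so the literal parentheses are never matched.
unescaped = re.search(r"abc(....):efg(....):xyz(....)", text)

# Escaped: \( and \) match the literal characters; the inner (...) capture.
escaped = re.search(r"abc\((\d+)\):efg\((\d+)\):xyz\((\d+)\)", text)

print(unescaped)          # None
print(escaped.groups())   # ('1234', '5678', '9012')
```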
I haven't worked with Tcl, but it seems you need to escape the ( and ). Also, if you are sure that the x's will be digits, use \d{4} instead of the four dots (....). Based on this, the updated regex you could try is:
abc\(\d{4}\):efg\(\d{4}\):xyz\(\d{4}\)
Given a string of identifiers separated by :, is it possible to construct a regular expression to extract the unique identifiers into another string, also separated by :?
How is it possible to achieve this using a regular expression? I have tried s/(:[^:])(.*)\1/$1$2/g with no luck, because the (.*) is greedy and skips to the last match of $1.
Example: a:b:c:d:c:c:x:c:c:e:e:f should give a:b:c:d:x:e:f
Note: I am coding in perl, but I would very much appreciate using a regex for this.
In .NET, which supports infinite repetition inside lookbehind, you could search for
(?<=\b\1:.*)\b(\w+):?
and replace all matches with the empty string.
Perl (at least Perl 5) only supports fixed-length lookbehinds, so you can try the following (using lookahead, with a subtly different result):
\b(\w+):(?=.*\b\1:?)
If you replace that with the empty string, all previous repetitions of a duplicate entry will be removed; the last one will remain. So instead of
a:b:c:d:x:e:f
you would get
a:b:d:x:c:e:f
If that is OK, you can use
$subject =~ s/\b(\w+):(?=.*\b\1:?)//g;
Explanation:
First regex:
(?<=\b\1:.*): Check if you can match the contents of backreference no. 1, followed by a colon, somewhere before in the string.
\b(\w+):?: Match an identifier (from a word boundary to the next :), optionally followed by a colon.
Second regex:
\b(\w+):: Match an identifier and a colon.
(?=.*\b\1:?): Then check whether you can match the same identifier, optionally followed by a colon, somewhere ahead in the string.
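The second (lookahead) regex also works unchanged in Python's `re` module, so its keep-the-last-occurrence behaviour can be checked with a short sketch. The function name is hypothetical.

```python
import re

def keep_last_occurrences(ids):
    """Delete every identifier that appears again later in the string."""
    return re.sub(r"\b(\w+):(?=.*\b\1:?)", "", ids)

print(keep_last_occurrences("a:b:c:d:c:c:x:c:c:e:e:f"))  # a:b:d:x:c:e:f
```

One caveat, as far as I can tell: the lookahead has no boundary after \1, so an identifier that is a prefix of a later one (for example ab before abc) is treated as a duplicate. Adding \b after \1 should tighten that.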
Check out: http://www.regular-expressions.info/duplicatelines.html
Always a useful site when thinking about any regular expression.
use feature 'say';
my $str = q!a:b:c:d:c:c:x:c:c:e:e:f!;
# repeat until no identifier is followed later by a copy of itself
1 while ($str =~ s/(:[^:]+)(.*?)\1/$1$2/g);
say $str;
output :
a:b:c:d:x:e:f
Here's an awk version; no regex needed.
$ echo "a:b:c:d:c:c:x:c:c:e:e:f" | awk -F":" '{for(i=1;i<=NF;i++)if($i in a){continue}else{a[$i];printf $i}}'
abcdxef
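Split the fields on ":", go through the fields, and store each element in an array. Check for existence: if the element already exists, skip it; otherwise print it. (As written, printf emits the identifiers without the ":" separators, hence abcdxef.) You can easily translate this into Perl code.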
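The split-and-remember approach can be sketched in Python as well. This version joins the result with ":" again, and the function name `unique_ids` is an assumption.

```python
def unique_ids(ids, sep=":"):
    """Keep the first occurrence of each identifier, preserving order."""
    # dict keys keep insertion order (Python 3.7+), so first occurrences win
    return sep.join(dict.fromkeys(ids.split(sep)))

print(unique_ids("a:b:c:d:c:c:x:c:c:e:e:f"))  # a:b:c:d:x:e:f
```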
If the identifiers are sorted, you may be able to do it using lookahead/lookbehind. If they aren't, then this is beyond the computational power of a regex. Now, just because it's impossible with a formal regex doesn't mean it's impossible with some Perl-specific regex feature, but if you want to keep your regexes portable, you need to describe this string in a language that supports variables.