I cannot get this regex to work:
"4. 182 ex" (number, period, 2 blank spaces, 3 digits, blank space, 2 characters)
The regex syntax should return "4182" and remove period, blank spaces, and characters.
Can you help me please?
EDIT!!!
Thanks everyone but I missed the key question:
a) the regex shall only find the value (4182) when the same line contains a specific text for example "magic", so for example:
"Magic 4. 182 ex"
b) the regex shall "only" find the value (4182) when the table contains a specific text for example "Magic":
"Magic 4. 182 ex
Lisefeo 2. 123 fg
Nioos 3. 124 df"
specific text = an exact match or a string containing those characters
This is the regex I've tried so far, but does it work for a whole table (not just a single line)?
(Magic.*?(\d).\s\s(\d{3})\s\w\w)
Just remove all characters that are not digit:
Perl:
$string =~ s/\D+//g;
or
php:
$string = preg_replace('/\D+/', '', $string);
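The same idea can be sketched in Python as a rough illustration (Python is used here only as a neutral language; `digits_only` is a hypothetical helper name, not something from the answers above):

```python
import re

def digits_only(text: str) -> str:
    """Remove every run of non-digit characters from text."""
    return re.sub(r"\D+", "", text)

print(digits_only("4.  182 ex"))  # prints 4182
```

Note that this keeps every digit in the string, so it only suits input where the wanted number is the only numeric content.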
According to your updated question, you could do:
$string =~ s/^Magic\s+(\d+)\.\s+(\d{3})\b.*$/$1$2/;
or, with php:
$string = preg_replace('/^Magic\s+(\d+)\.\s+(\d{3})\b.*$/', '$1$2', $string);
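For part b) of the question (a whole table rather than one line), one possible approach is to scan the table line by line and extract the number only from a line that contains the keyword. The following is a minimal Python sketch under that assumption; `ROW`, `value_for_keyword` and the case-insensitive "contains" check are illustrative choices, not part of the answers above.

```python
import re

# Digit(s), a period, whitespace, then exactly three digits.
ROW = re.compile(r"(\d+)\.\s+(\d{3})\b")

def value_for_keyword(table: str, keyword: str = "Magic"):
    """Return the joined digits from the first line containing keyword, else None."""
    for line in table.splitlines():
        # "Contains" check, ignoring case, as an assumed reading of the question.
        if keyword.lower() in line.lower():
            m = ROW.search(line)
            if m:
                return m.group(1) + m.group(2)
    return None

table = "Magic 4.  182 ex\nLisefeo 2.  123 fg\nNioos 3.  124 df"
print(value_for_keyword(table))  # prints 4182
```

Splitting the keyword test from the number pattern keeps the regex short and makes it easy to switch between "exact match" and "contains" semantics.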
For it to match exactly what you said, use:
(\d)\.\s\s(\d{3})\s\w\w
You'll get it in two groups: the single digit in group 1 and the three-digit number in group 2.
RegEx101 example
Regards.
^([\d]+)\.[\s]+([\d]+)[\s]..
Tested with perl:
> echo "4. 182 ex" | perl -lne 'print $1,$2 if(/^([\d]+)\.[\s]+([\d]+)[\s]../)'
4182
Related
I'm a regex newbie and I've got a valid regex for SSNs:
/^(\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}]*$/
But I now need to expand it to accept either an SSN or another alphanumeric ID of 7 characters, like this:
/^[a-zA-Z0-9]{7}$/
I thought it'd be as simple as grouping the SSN and adding an OR | but my tests are still failing. This is what I've got now:
/^((\d{3}(\s|-)?\d{2}(\s|-)?\d{4})|[\d{9}])|[a-zA-Z0-9]{7}$/
What am I doing wrong? And is there a more elegant way to say either SSN or my other ID?
Thanks for any helpful tips.
Valid SSNs:
123-45-6789
123456789
123 45 6789
Valid ID: aCe8999
I have modified your first regex also a bit, below is demo program. This is as per my understanding of the problem. Let me know if any modification is needed.
use feature 'say';

my @ids = (
    '123-45-6789',
    '123456789',
    '123 45 6789',
    '1234567893434', # invalid
    '123456789wwsd', # invalid
    'aCe8999',
    'aCe8999asa'     # invalid
);
for (@ids) {
    say "match = $&" if $_ =~ /^ (?:\d{3} ([ \-])? \d{2} \1? \d{4})$ | ^[a-zA-Z0-9]{7}$/x;
}
Output:
match = 123-45-6789
match = 123456789
match = 123 45 6789
match = aCe8999
Your first regex has some problems. Most importantly, it accepts {{{{}}}}}, which means you have built the wrong character class: [\d{9}] matches a single character that is a digit, {, 9 or }. It also matches 123-45 6789 (notice the mixture of space and dash).
To express OR in regular expressions you use the pipe |, but remember that each anchor belongs only to the alternative it sits in. For example, ^1|2$ matches any string beginning with 1 or ending with 2, not just the two exact strings 1 and 2.
To apply the exact match you need to do ^1$|^2$ or ^(1|2)$.
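The difference between the two forms is easy to check. A small Python sketch (the variable names `loose` and `strict` are just labels for this illustration):

```python
import re

loose = re.compile(r"^1|2$")     # "starts with 1" OR "ends with 2"
strict = re.compile(r"^(1|2)$")  # the whole string is exactly 1 or 2

for s in ["1", "2", "1abc", "abc2"]:
    # Print whether each pattern finds a match in s.
    print(s, bool(loose.search(s)), bool(strict.search(s)))
```

Only the grouped version rejects `1abc` and `abc2`.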
With the second regex ^[a-zA-Z0-9]{7}$ you are not requiring a mixed alphanumeric ID of 7 characters; you are allowing all-numeric, all-alphabetic or mixed IDs. So it matches 1234567 too. If this is not a problem, the following regex solves the issues described above:
^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$
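As a quick check of that pattern against the sample values from the question, here is a minimal Python sketch (`PATTERN` and `is_valid` are hypothetical names; the regex itself is unchanged):

```python
import re

# Group 1 captures the first separator (space, dash or nothing);
# \1 forces the second separator to be the same.
PATTERN = re.compile(r"^\d{3}([ -]?)\d\d\1\d{4}$|^[a-zA-Z0-9]{7}$")

def is_valid(value: str) -> bool:
    """True if value is an SSN (consistent separators) or a 7-character ID."""
    return PATTERN.fullmatch(value) is not None

for candidate in ["123-45-6789", "123456789", "123 45 6789", "aCe8999",
                  "123-45 6789", "1234567893434", "aCe8999asa"]:
    print(candidate, is_valid(candidate))
```

The first four values should be accepted and the last three rejected.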
I need a Perl regex to pull a number of between six and ten digits out of a string. The number will always follow a particular word followed by a space (case unknown).
For example, if the word I was looking for is 'string':
some random text blah blah blahSTRING 1234567890some more random text
Desired output:
1234567890
Another example:
yet more random textra ra rastring 654321hey hey my my
Desired output:
654321
I want to load the result into a variable.
/string ([0-9]{6,10})/i
string matches STRING and string, because the expression ends with i (case-insensitive matching)
the literal space matches the space after the word
( starts a capture group to capture the number you are trying to get
[0-9]{6,10} matches a number with 6 to 10 digits
https://regex101.com/r/mB1zF4/1
Group 1 should contain your number with
/^.*string (\d+).*$/i
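To illustrate loading the captured number into a variable, here is a small sketch in Python rather than Perl (`number_after` is a hypothetical helper; the pattern is the 6-to-10-digit one from the first answer):

```python
import re

def number_after(word: str, text: str):
    """Return the 6-10 digit number that follows word and a space, or None."""
    pattern = re.compile(re.escape(word) + r" (\d{6,10})", re.IGNORECASE)
    m = pattern.search(text)
    return m.group(1) if m else None

print(number_after("string", "some random text blah blah blahSTRING 1234567890some more random text"))
print(number_after("string", "yet more random textra ra rastring 654321hey hey my my"))
```

With this pattern a run of more than ten digits would yield only its first ten, so an extra check is needed if longer runs should be rejected.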
Thanks everyone, between all the responses and a bit of googling I ended up with
#!/usr/local/bin/perl -w
use strict;
my $string = 'sgtusadl;fdsas;adlhstring 12345678daf;slkdfja;dflk';
my ( $number ) = $string =~ m/string\s\d{6,10}/gi;
$number =~ s/[^0-9]//g;
print "number is $number\n";
exit 0;
I'm using the Regex interpreter found in XYplorer file browser. I want to match any string (in this case a filename) that has repeated groups of 'several' characters. More specifically, I want a match on the string:
jack johnny - mary joe ken johnny bill
because it has 'johnny' at least twice. Note that it has spaces and a dash too.
It would be nice to be able to specify the length of the group to match, but in general 4, 5 or 6 will do.
I have looked at several previous questions here, but either they are for specific patterns or involve some language as well. The one that almost worked is:
RegEx: words with two letters repeated twice (eg. ABpoiuyAB, xnvXYlsdjsdXYmsd)
where the answer was:
\b\w*(\w{2})\w*\1
However, this fails when there are spaces in the strings.
I'd also like to limit my searches to .jpg files, but XYplorer has a built-in filter to only look at image files so that isn't so important to me here.
Any help will be appreciated, thanks.
EDIT -
The regex by OnlineCop below answered my original question, thanks very much:
(\b\w+.*\b).*(\1)
I see that it matches words, not arbitrary string chunks, but that works for my present need. And I am not interested in capturing anything, just in detecting a match.
As a refinement, I wonder if it can be changed or extended to allow me to specify the length of words (or string chunks) that must be the same in order to declare a match. So, if I specified a match length of 5 and my filenames are:
1) jack john peter paul mary johnnie.jpg
2) jack johnnie peter paul mary johnnie.jpg
the first one would not match since no substring of five characters or more is repeated. The second one would match since 'johnnie' is repeated and is more than 5 chars long.
Do you wish to capture the word 'johnny' or the stuff between them (or both)?
This example shows that it selects everything from the first 'johnny' to the last, but it does not capture the stuff between:
Re: (\b\w+\b).*(\1)
Result: jack bill
This example allows some whitespace between names/words:
Re: (\b\w+.*\b).*(\1)
String: Jackie Chan fought The Dragon who was fighting Jackie Chan
Result: Jackie Chan Jackie Chan
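For the refinement asked about in the question (requiring the repeated word to have a minimum length), one option is to put a quantifier on the captured word. The sketch below is in Python and only an assumption about what would suit; `has_repeated_word` and `min_len` are illustrative names, and XYplorer's regex flavour may differ in details.

```python
import re

def has_repeated_word(name: str, min_len: int = 5) -> bool:
    """True if some whole word of at least min_len characters occurs twice."""
    # \b(\w{N,})\b captures a whole word of N or more characters;
    # \b\1\b requires the same word again later in the string.
    pattern = re.compile(r"\b(\w{%d,})\b.*\b\1\b" % min_len)
    return pattern.search(name) is not None

print(has_repeated_word("jack john peter paul mary johnnie.jpg"))     # prints False
print(has_repeated_word("jack johnnie peter paul mary johnnie.jpg"))  # prints True
```

This matches whole words only, so it will not detect a repeated chunk that spans a space or sits inside a longer word.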
Use perl:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $line = <STDIN> ) {
chomp $line;
my @words = split ( /\s+/, $line );
my %seen;
foreach my $word ( @words ) {
if ( $seen{$word} ) { print "Match: $line\n"; last }
$seen{$word}++;
}
}
And yes, it's not as neat as a one line regexp, but it's also hopefully a bit clearer what's going on.
How can I access capture buffers in brackets with quantifiers?
#!/usr/local/bin/perl
use warnings;
use 5.014;
my $string = '12 34 56 78 90';
say $string =~ s/(?:(\S+)\s){2}/$1,$2,/r;
# Use of uninitialized value $2 in concatenation (.) or string at ./so.pl line 7.
# 34,,56 78 90
With @LAST_MATCH_START and @LAST_MATCH_END it works*, but the line gets too long.
Doesn't work, look at TLP's answer.
*The proof of the pudding is in the eating isn't always right.
say $string =~ s/(?:(\S+)\s){2}/substr( $string, $-[0], length($-[0]-$+[0]) ) . ',' . substr( $string, $-[1], length($-[1]-$+[1]) ) . ','/re;
# 12,34,56 78 90
You can't access all the previous values of the first capturing group; only the value from its last iteration (the one current when the match ends) is saved in $1, unless you want to use a (?{ code }) hack.
For your example you could use something like:
s/(\S+)\s+(\S+)\s+/$1,$2,/
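The "only the last iteration is kept" behaviour is not specific to Perl. As a rough illustration, Python's `re` module behaves the same way in this respect (the variable names below are just for this sketch):

```python
import re

s = "12 34 56 78 90"

# A quantified group keeps only what it captured on its final iteration.
m = re.match(r"(?:(\S+)\s){2}", s)
print(repr(m.group(0)))  # the whole match: '12 34 '
print(m.group(1))        # prints 34

# Writing the two captures out explicitly makes both values available.
fixed = re.sub(r"(\S+)\s+(\S+)\s+", r"\1,\2,", s, count=1)
print(fixed)  # prints 12,34,56 78 90
```

`count=1` mirrors a Perl substitution without the /g flag, which replaces only the first match.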
The statement that you say "works" has a bug in it.
length($-[0]-$+[0])
This returns the string length of a negative number, not the length of your regex match. $-[0] and $+[0] are the offsets of the start and end of the whole match in the string, so start minus end is the negative of the match length. In this case the whole match is six characters long and the last capture is two, so the differences are -6 and -2, and length() of either is 2.
So, what you are doing is taking the first two characters of the match 12 34, and the first two characters of the match 34 and concatenating them with a comma in the middle. It works by coincidence, not because of capture groups.
It sounds as though you are asking us to solve the problems you have with your solution, rather than asking us about the main problem.
My first question here.
I have a string of digits like 55111233.
As you can see, 5 occurs twice in a row, 1 three times, 2 once and 3 twice.
I want it to be replaced with 52132132,
in general number1<count>number2<count>...numbern<count>
Please guide me.
$digits = "55111233";
$digits =~ s/((\d)\2*)/$2 . length($1)/ge;
print $digits;
You can do:
$str =~s/(\d)(\1*)/$1.(length($2)+1)/eg;
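For comparison, the same substitution can be sketched in Python, where a function takes the place of Perl's /e flag (`run_lengths` is a hypothetical name):

```python
import re

def run_lengths(digits: str) -> str:
    """Replace each run of a repeated digit with the digit followed by the run length."""
    # (\d)\1* matches one digit plus any immediate repeats of it;
    # the whole match is the run, so its length is the count.
    return re.sub(r"(\d)\1*", lambda m: m.group(1) + str(len(m.group(0))), digits)

print(run_lengths("55111233"))  # prints 52132132
```

A run of ten or more identical digits produces a multi-digit count, which makes the output ambiguous to decode; that is a property of the requested format, not of the regex.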