Perl: difficulties with regular expression

I'm having trouble writing a regular expression in Perl; maybe someone can help me.
Input strings:
bss/216476/29/52/9___\000243477___agt-1319.jpg
bss/216476/29/52/9___\000243477___agt-1319_1.jpg
bss/216476/29/52/9___\000243477___agt-1319_2.jpg
What I expect to get:
29 52 9 1319 or 29 52 9 1319 0
29 52 9 1319 1
29 52 9 1319 2
My regex works only with the last two strings:
/\/(\d{2})\/(\d{2})\/(\d+).*-(\d+)_(\d{1})/
As you can see, the first line has no picture number such as _0.jpg, and that is the problem.
I tried a regex like
/\/(\d{2})\/(\d{2})\/(\d+).*-((\d+)_(\d{1}))|(\d+)/
but it looks like I'm wrong.
Thank you for your help.

Use a non-capturing group (?:...) followed by a ? to make the picture number optional:
/\/(\d{2})\/(\d{2})\/(\d+).*-(\d+)(?:_(\d{1}))?/
You can also tidy up the regex by using a different delimiter when the pattern itself contains slashes. Additionally, the /x modifier lets you add whitespace inside the pattern for readability:
use strict;
use warnings;
while (<DATA>) {
    if (m{ / (\d{2}) / (\d{2}) / (\d+) .*- (\d+) (?:_(\d{1}))? }x) {
        # $5 is undef when there is no _N suffix, so default it to ''
        print join(" ", map {$_//''} ($1, $2, $3, $4, $5)), "\n";
    }
}
__DATA__
bss/216476/29/52/9___\000243477___agt-1319.jpg
bss/216476/29/52/9___\000243477___agt-1319_1.jpg
bss/216476/29/52/9___\000243477___agt-1319_2.jpg
Outputs:
29 52 9 1319
29 52 9 1319 1
29 52 9 1319 2
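
For comparison, here is a minimal Python sketch of the same optional-group idea. The function name parse_path and the choice to fill a missing picture number with "0" are assumptions made for illustration (the question accepts either an empty value or 0).

```python
import re

# Same pattern as the Perl answer: the trailing (?:_(\d))? group is optional.
PATTERN = re.compile(r"/(\d{2})/(\d{2})/(\d+).*-(\d+)(?:_(\d))?")

def parse_path(path, default="0"):
    """Return the five fields as strings, or None if the path does not match."""
    m = PATTERN.search(path)
    if m is None:
        return None
    # Group 5 is None when there is no _N suffix, so substitute the default.
    return tuple(g if g is not None else default for g in m.groups())

print(parse_path(r"bss/216476/29/52/9___\000243477___agt-1319.jpg"))
print(parse_path(r"bss/216476/29/52/9___\000243477___agt-1319_2.jpg"))
```

As in Perl, a group that did not take part in the match comes back undefined (None here), so the caller decides what the default should be.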

Related

How to preg_replace all digits except specific numbers?

I need to strip everything from a string except English letters, spaces and specific numbers (e.g. 18, 19 and 20 should be kept in the string).
Please help me extend the regex /([^a-zA-Z\s])/ so that it keeps the specified numbers.

You can list the numbers that you want to keep between word boundaries, and then use (*SKIP)(*F) so that they are skipped rather than matched:
\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+
$pattern = "/\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+/";
$s="test 18 119 19 50 20 ##$##%";
echo preg_replace($pattern, "", $s);
Output
test 18 19 20

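Alternatively, use preg_split with the PREG_SPLIT_DELIM_CAPTURE option, which returns the captured numbers along with the pieces between matches (as written, [^a-z\s] also removes uppercase letters; add the i modifier to keep them):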
$s="test 18 119 19 50 20 ##$##%";
echo implode('', preg_split('~\b(1[89]|20)\b|[^a-z\s]+~', $s, -1, PREG_SPLIT_DELIM_CAPTURE));
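
The same keep-or-drop idea can be sketched in Python, whose standard re module has no (*SKIP)(*F) verbs. This sketch swaps in a different technique: the numbers to keep are captured in a group, and a replacement function puts them back while everything else that matches is dropped. The function name keep_numbers is hypothetical.

```python
import re

# Alternative 1 captures a number to keep; alternative 2 matches junk to drop.
PATTERN = re.compile(r"\b(1[89]|20)\b|[^a-zA-Z\s]+")

def keep_numbers(s):
    # Put back group 1 when it matched; otherwise replace the junk with nothing.
    return PATTERN.sub(lambda m: m.group(1) or "", s)

print(repr(keep_numbers("test 18 119 19 50 20 ##$##%")))
```

Note that the spaces around removed items stay in place, so the result contains runs of spaces where 119 and 50 used to be.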

Regex match multiple numbers stop at string (word) despite more matches exist

Goal:
Match all variations of phone numbers with 8 digits + (optional) country code.
Stop matching when "keyword" is found, even if more matches exist after the "keyword".
I need this as a one-liner. I have tried a plethora of variations with lookahead/lookbehind and a negated class [^keyword], but I can't work out how to achieve this.
Example text:
abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
I Want It To Stop Matching Here Or Right Before The "keyword"
more nice text with some matches
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Example regexes:
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})[^keyword]
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?!keyword)
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?=keyword)
-> This matches nothing
((\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?:(?!keyword))*)
-> This matches all numbers also below the keyword
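
One possible workaround, sketched in Python and not a single-regex one-liner: cut the text at the first "keyword" and run the asker's first pattern only on the part before it. The function name numbers_before and its stop_word parameter are hypothetical.

```python
import re

# The asker's first pattern, unchanged.
PHONE = re.compile(
    r"(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})"
)

def numbers_before(text, stop_word="keyword"):
    """Return phone-like matches that occur before the first stop_word."""
    # Everything from the first stop_word onward is discarded before matching.
    head = text.split(stop_word, 1)[0]
    # The pattern may swallow one leading whitespace character, so strip it.
    return [m.group(0).strip() for m in PHONE.finditer(head)]
```

This also shows why [^keyword] and (?!keyword) did not help: both only inspect the characters immediately after each individual match, not whether "keyword" appeared earlier in the text.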

Regex to match different formats of phone numbers

I have a bunch of numbers which I want to parse.
+79261234567
89261234567
79261234567
9261234567
+7 926 123 45 67
8(926)123-45-67
123-45-67
79261234567
(495)1234567
(495) 123 45 67
89261234567
8-926-123-45-67
8 927 1234 234
8 927 12 12 888
8 927 12 555 12
8 927 123 8 123
My first attempt was to enumerate all the variants like this:
(\+[\d]{11}|[\d]{10,11}|\+\d\ [\d]{3}\ [\d]{3}\ [\d]{2}\ [\d]{2}|\d\([\d]{3}\)[\d\-]{9}|[\d\ ]{14,15}|[\d\-]{14,15}|[\d\-]{9}|\(\d\d\d\)[\d\-]{9,10}|\(\d\d\d\)[\d\ ]{9,10}|\(\d\d\d\)[\d\-]{7})
Is there a more elegant way to match these numbers?

This regex will match all of the examples and not much extra:
[+]?(\b\d{1,2}[ -]?)?([(]?\d{3}[)]?)((?:[ -]?\d){4,7})(?![ -]?\d)
It accepts between 7 and 12 digits.
It would still match something like this:
+12 (345) 6-7-8 9-0-1
But that should be within acceptable limits.
However, it could still match part of a longer number.
Avoiding that requires negative lookbehinds
(note that older JavaScript engines lack lookbehind support, which was added in ES2018):
[+]?(?<!\d)(?<!\d[ -])(?:((\d{1,2}[ -]?)?[(]?\d{3}[)]?[ -]?)(\d(?:[ -]?\d){3,6}))(?![ -]?\d)

For a more elegant solution, you have to relax the pattern. One option is to match 7, 10, or 11 digits, each optionally preceded by delimiters:
\+?(?:[ ()-]*\d){10,11}|(?:[ ()-]*\d){7}
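
A quick way to try the relaxed pattern against the sample numbers is a small Python check. This is a sketch; the function name looks_like_phone is hypothetical, and fullmatch is used here so the whole string has to be a number.

```python
import re

# The relaxed pattern from the answer above, unchanged.
PHONE = re.compile(r"\+?(?:[ ()-]*\d){10,11}|(?:[ ()-]*\d){7}")

def looks_like_phone(s):
    # fullmatch requires the entire string to fit one of the two alternatives.
    return PHONE.fullmatch(s) is not None

for sample in ["+7 926 123 45 67", "8(926)123-45-67", "123-45-67", "12345"]:
    print(sample, looks_like_phone(sample))
```

Because delimiters are allowed anywhere between digits, the pattern counts digits only and does not check how they are grouped.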
Regex101 Tested

Regex, replacing for newline with group replace

675185538end432 204 9/9 4709 908 2
343269172end430 3 43 9335 975 7
590144128end89 7 29 3-5-4 420 2
337460105end8Y5 7A 78 2 23
292484648end70 A53 03 9235 93
These are the strings that I am working with. I want to find a regex to replace the above strings as follows
675185538
432 204 9/9 4709 908 2
343269172
430 3 43 9335 975 7
590144128
89 7 29 3-5-4 420 2
337460105
8Y5 7A 78 2 23
292484648
70 A53 03 9235 93
Wherever "end" occurs, \r\n should be inserted in its place.
The string before "end" is numeric, and the string after "end" is alphanumeric with whitespace characters.
I am using Notepad++.

To make the match strict, try this:
Find: ^(\d+)end(\w)
Replace: \1\r\n\2
This captures, then puts back via back references, the preceding number between start of line and "end" and the following digit/letter. This won't match "end" elsewhere.
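
Outside Notepad++, the same find/replace can be sketched with Python's re.sub. The function name split_at_end is hypothetical, and re.MULTILINE is assumed so that ^ anchors at the start of every line, as it does in the editor.

```python
import re

def split_at_end(text):
    # \1 is the leading number, \2 the first character after "end";
    # the literal "end" between them is replaced by CR LF.
    return re.sub(r"^(\d+)end(\w)", r"\1\r\n\2", text, flags=re.MULTILINE)

print(repr(split_at_end("675185538end432 204 9/9 4709 908 2")))
```

A line such as "the end is near" is left alone, because "end" there is not preceded by digits at the start of the line.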

Kludgery:
Find: (\d\d\d\d\d\d\d\d\d)end(\d)
Replace: \1\r\n\2
Find creates two capture groups:
each group is bounded by an ( and a )
one capture group matches exactly nine numerals
the other capture group matches exactly one numeral.
In the replace:
the first capture group is referenced with \1
and the second group with \2.

REGEX: How to split string with space and double quote

I have an input string with spaces and double quotes, as below:
Input :
18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21
Expected Output:
18
17
16
Arc 10 12 11 13
Segment 10 23 33 32 12
23
76
21
How can I do this using a regex? Thank you in advance.

You can use the following regexp:
("[^"]+")|\S+
("[^"]+") - a quoted sequence.
\S+ - a sequence of non-whitespace characters.
The order in which the alternatives are tried may depend on the regexp implementation; in the demo engine, they are tried from left to right. Also, do not forget to escape special characters with a double backslash where your language's string literals require it.

"(.+?)"|(\w+(?=\s|$))
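
As a sketch of how the alternation behaves in practice, here is a Python version that captures the text inside the quotes so the output matches the expected list. The names TOKEN and split_tokens are hypothetical; the pattern is a small variation on the first answer, with the capture group moved inside the quotes.

```python
import re

# Quoted alternative first: otherwise \S+ would grab '"Arc' as a bare token.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')

def split_tokens(s):
    # findall yields (quoted, bare) pairs; exactly one of the two is non-empty.
    return [quoted or bare for quoted, bare in TOKEN.findall(s)]

print(split_tokens('18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'))
```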
