I am trying to write a regex, but it fails to match

Random Text
qty 2 MBC102 Rs. 1,890
required 2unit MBC 103
mbc 104 2pcs #5000
MBC 1011 #4000 4 pc Price 5000
3pcs MBC1012 100rolls
MBC1013 500 pc
MBC1014 2pcs mbc 1015
qty 2 # 20000 unit 2
#900 MbC-1016 rolls 150
5000Rs mbc909 mbC 890
56 qty # 5000
mbC 820 qty 90 #25000
I want to match the quantities:
2
2
2
4
3
100
500
2
2
150
56
90
I tried this, but it doesn't work properly:
(?i)^(?!qty|unit|pc|pcs|roll|rolls).[0-9]+

The units are after the number, not before. So use two alternatives in the regexp. One alternative has qty as a lookbehind, the other has all the units as a lookahead.
(?i)(?:(?<=^qty\s)\d+|(?:\d+(?=\s?(?:qty|unit|pc|pcs|roll|rolls))))
A single space is required after qty when it comes before the number, because lookbehinds have to be fixed-length in many regex flavors. The space is optional when the unit comes after the number, so it will match both 2pcs and 500 pc.
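To check the pattern against the sample, here is a minimal Python sketch (the question doesn't name a language, so Python's re module is an assumption) that applies it line by line, so that ^ inside the lookbehind means "start of this line". Run over this sample, the pattern agrees with the wanted list on most lines, but it also returns 20000 on the qty 2 # 20000 unit 2 line, and it returns 1016 and 820 (code numbers that sit just before a unit word) instead of 150 and 90 on the two lines where the unit comes before the quantity.

```python
import re

# The answer's pattern, copied as-is. Python's re accepts the lookbehind
# (?<=^qty\s) because it always spans exactly four characters ("qty" + one space).
QTY = re.compile(r"(?i)(?:(?<=^qty\s)\d+|(?:\d+(?=\s?(?:qty|unit|pc|pcs|roll|rolls))))")

SAMPLE = """\
qty 2 MBC102 Rs. 1,890
required 2unit MBC 103
mbc 104 2pcs #5000
MBC 1011 #4000 4 pc Price 5000
3pcs MBC1012 100rolls
MBC1013 500 pc
MBC1014 2pcs mbc 1015
qty 2 # 20000 unit 2
#900 MbC-1016 rolls 150
5000Rs mbc909 mbC 890
56 qty # 5000
mbC 820 qty 90 #25000"""


def quantities_per_line(text):
    # Match each line on its own so that ^ in the lookbehind anchors to that line.
    return [QTY.findall(line) for line in text.splitlines()]


for line, found in zip(SAMPLE.splitlines(), quantities_per_line(SAMPLE)):
    print(f"{line:32} -> {found}")
```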

Related

Regex for converting spaces to tabs but leaving word items in the middle alone?

I have a problem that my Googling tells me can be solved with regex, but I'm completely unfamiliar with it; I tried following some tutorials, but I'm entirely lost. I have this sample data set:
59 65 21366 CLEMENTINES 4.89 2.00 9.78
59 61 22384 PORK BACK RIBS 6.50 2.40 15.59
59 65 30669 BANANAS 1.89 1.00 1.89
59 13 391314 KODIAK POWER CAKES 14.69 1.00 14.69
59 65 392373 BAJA CHOPPED SALAD KIT 2.99 1.00 2.99
59 39 429227 FILA MENS ANKLE SOCK 6PK 9.99 1.00 9.99
59 65 1056187 ASIAN CASHEW SALAD KIT 2.99 1.00 2.99
59 28 1159696 SHOPKINS GG/TWOZIES ASST 5.97 1.00 5.97
59 13 1221327 KODIAK POWER CAKES -3.00 -3.00 COUPON
59 14 1270070 KLEENEX ULTRA SOFT 12 PCK 16.49 1.00 16.49
59 21 5221111 10 DRAWER STORAGE CART 29.99 1.00 29.99
59 17 1019 HALF + HALF 1 L 1.99 1.00 1.99
I want to import it into a spreadsheet. Visually I can see what I want: 3 numeric columns at the beginning, then a description that may or may not contain spaces, then usually 3 numeric columns, but sometimes 2 plus a word (see the line that ends in "COUPON").
But because of the spaces and lack of quotes, my Excel skills (which are also marginal) don't allow me to import this in a sensible way.
I thought of doing multiple processes: pull off the 3 columns at the left and then 3 columns at the right... but in Excel I see no way to operate "from the right".
Any help appreciated. Thanks.
[edit] I realize from the comments that my ignorance has resulted in a poor question.
I didn't realize regex syntax varies by language, etc. I am trying to import a CSV into Excel, but I was using Notepad++ to perform the regex operations. I don't know what "flavor" that uses, but the answer below helped greatly.
You can match this with:
^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$
^ matches the start of a line
\S* matches zero or more non-whitespace characters
.* matches anything, including spaces
the parentheses capture the matches into capture groups
$ matches the end of a line.
You haven't said what tool you intend to use to do this.
One way is with a Perl one-liner:
perl -pe 's/^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$/"$1","$2","$3","$4","$5","$6","$7"/' input.txt
Returning:
"59","65","21366","CLEMENTINES","4.89","2.00","9.78"
...
"59","13","1221327","KODIAK POWER CAKES","-3.00","-3.00","COUPON"
... etc.
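If Perl isn't available, the same pattern can be applied with Python's re and csv modules. This is a sketch, assuming every row uses single spaces between fields as in the sample; the csv module also takes care of quoting, including doubling any double quote that appears inside a description, which the Perl substitution does not do.

```python
import csv
import io
import re

# Same pattern as the Perl one-liner: three fields, a free-text description
# (which may contain spaces), then three more fields.
ROW = re.compile(r"^(\S*) (\S*) (\S*) (.*) (\S*) (\S*) (\S*)$")


def to_csv(lines):
    out = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, like the Perl output.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in lines:
        m = ROW.match(line)
        if m:  # rows that don't fit the seven-field layout are skipped
            writer.writerow(m.groups())
    return out.getvalue()


sample = [
    "59 65 21366 CLEMENTINES 4.89 2.00 9.78",
    "59 13 1221327 KODIAK POWER CAKES -3.00 -3.00 COUPON",
    "59 17 1019 HALF + HALF 1 L 1.99 1.00 1.99",
]
print(to_csv(sample), end="")
```

To convert a whole file, the lines of input.txt can be passed in place of the list; a trailing newline on each line is fine, since $ also matches just before it.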

Regex for 10 digit phone number with variable spacing

I need to validate that a string follows these rules:
contains numerals
may optionally contain any number of space characters in any position
may not contain any other kind of character
the first two numerals must be one of the set: 02; 03; 07; 08; 13; 18
and the number of numerals must be exactly 10 unless the first two numerals are 1 and 3, in which case the number of numerals may be 10 or 6.
Essentially these are Australian landline (with area code), free-call and 13 numbers.
Ideally the regex should be as implementation-agnostic as possible.
Examples of valid input:
0299998888
02 99998888
02 9999 8888
02 99 998 888
0299 998 888
0299 998888
131999
131 999
13 19 99
1300123456
1300 123456
1300 123 456
1300 12 34 56
PS. I've checked at least 5 other answers and searched for multiple variations of this question, to no avail.
The nearest I have is:
^(?=\d{10}$)(02|03|04|07|08|13|18)\d+
... however this does not account for spacing and won't accept 6 digit numbers beginning with 13.
Note, in theory, the following is acceptable:
1 3 1999
1 3 1 9 9 9
By this I mean that the first pair of numerals may have a space between them (as bad as that looks).
Following are examples of random numbers that should fail:
13145 (not enough numerals)
1300-123-456 (hyphens not permitted)
9999 8888 (not enough numerals)
(02) 9999 8888 (parentheses not permitted)
You can handle the 6-digit 13 numbers with a separate branch in an alternation:
^(?:(?=(?:\s*\d\s*){10}$)(?:0\s*[2378]|1\s*[38])|(?=(?:\s*\d\s*){6}$)1\s*3).*
Demo: https://regex101.com/r/Hkjus2/2
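A small Python harness (Python is an assumption; the question asks for a flavor-agnostic pattern) that runs the answer's pattern against the question's valid and invalid examples. The lookahead at the start counts the digits while allowing whitespace around each one, and the prefix branch then allows spaces between the first two digits, which is what makes 1 3 1999 acceptable.

```python
import re

# The answer's pattern, split over two raw strings: a 10-digit branch
# (02/03/07/08/13/18) or a 6-digit branch (13), with \s* allowing spaces anywhere.
AU_NUMBER = re.compile(
    r"^(?:(?=(?:\s*\d\s*){10}$)(?:0\s*[2378]|1\s*[38])"
    r"|(?=(?:\s*\d\s*){6}$)1\s*3).*"
)

VALID = [
    "0299998888", "02 99998888", "02 9999 8888", "02 99 998 888",
    "0299 998 888", "0299 998888", "131999", "131 999", "13 19 99",
    "1300123456", "1300 123456", "1300 123 456", "1300 12 34 56",
    "1 3 1999", "1 3 1 9 9 9",
]
INVALID = ["13145", "1300-123-456", "9999 8888", "(02) 9999 8888"]


def is_valid(s):
    return AU_NUMBER.match(s) is not None


for s in VALID + INVALID:
    print(f"{s!r:18} {'valid' if is_valid(s) else 'invalid'}")
```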

PCRE Regex Matching Patterns until next pattern is found

I'm struggling to find a solution to this regex, which appears to be fairly straightforward. I need to match a pattern that follows another pattern.
I need to capture the "Mean:" value that follows "Kerberos-wsfed" in the following:
Kerberos:
Historical:
Between 26 and 50 milliseconds: 10262
Between 50 and 100 milliseconds: 658
Between 101 and 200 milliseconds: 9406
Between 201 and 500 milliseconds: 6046
Between 501 milliseconds and 1 second: 1646
Between 1 and 5 seconds: 1399
Between 6 and 10 seconds: 13
Between 11 and 30 seconds: 34
Between 31 seconds and 1 minute: 7
Between 1 minute and 2 minutes: 1
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Between 50 and 100 milliseconds: 129
Between 101 and 200 milliseconds: 650
Between 201 and 500 milliseconds: 411
Between 501 milliseconds and 1 second: 171
Between 1 and 5 seconds: 119
Between 6 and 10 seconds: 4
Between 11 and 30 seconds: 6
Between 1 minute and 2 minutes: 1
Mean: 176, Mode: 33, Median: 37
Total: 4642
I can match (?:Kerberos-wsfed:) and I can match Mean:, but I need the value of the Mean that comes after Kerberos-wsfed, and that's where I'm having difficulty. Thanks for the assistance.
Use the regex
Kerberos-wsfed[\s\S]*?Mean: *(\d+)
The mean value is contained in the capturing group 1, that is $1 or \1 depending on your programming language.
Try this regular expression: #Kerberos-wsfed:.+?Mean:\s+(\d+)#s
You can use a single space or \s instead of \s+ if you're sure of the file format.
The value 176 will be in group 1 of the match.
Demo: https://regex101.com/r/gwkUPJ/1
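Both of the answers above can be checked with a short Python sketch. Python has no delimiter syntax, so the #...# delimiters are dropped and the s modifier becomes the re.DOTALL flag; the [\s\S] version needs no flag because that class already matches newlines. The text is a trimmed copy of the question's data.

```python
import re

# Trimmed copy of the question's data: most "Between ..." lines are omitted.
STATS = """\
Kerberos:
Historical:
Between 26 and 50 milliseconds: 10262
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Mean: 176, Mode: 33, Median: 37
Total: 4642
"""

# [\s\S]*? crosses line breaks lazily without any flag.
LAZY_CLASS = re.compile(r"Kerberos-wsfed[\s\S]*?Mean: *(\d+)")
# With re.DOTALL (the counterpart of the "s" modifier), "." also matches newlines.
LAZY_DOTALL = re.compile(r"Kerberos-wsfed:.+?Mean:\s+(\d+)", re.DOTALL)

for pattern in (LAZY_CLASS, LAZY_DOTALL):
    m = pattern.search(STATS)
    print(m.group(1) if m else None)
```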
Using capturing group:
Kerberos-wsfed:[\s\S]*?Mean:\s(\d+)
Kerberos-wsfed: matches the literal text as-is
[\s\S]*? lazily matches any number of characters in between (including line breaks), stopping at the first Mean: that follows
Mean:\s matches the literal Mean: followed by one whitespace character (\s)
Finally, (\d+), wrapped in the first capturing group, captures the value you are looking for: one or more digits
The value you are looking for (176) will be in the first capturing group, which is $1 or \1 (or simply group 1), depending on your language. For instance, in PHP:
$re = '/Kerberos-wsfed:[\s\S]*?Mean:\s(\d+)/';
// $str holds the statistics text from the question
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo $matches[0][1];
// Output: 176
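Whether [\s\S] is quantified greedily (*) or lazily (*?) matters as soon as another Mean: follows the Kerberos-wsfed block: the greedy form runs to the last Mean: in the input, while the lazy form stops at the first one. On the question's data both return 176, because Kerberos-wsfed is the last block. This Python sketch shows the difference; the Kerberos-other block and its Mean: 999 are hypothetical, added only to trigger the problem.

```python
import re

# Trimmed question data plus a HYPOTHETICAL "Kerberos-other" block,
# added only to put a second "Mean:" after Kerberos-wsfed.
TEXT = """\
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Mean: 176, Mode: 33, Median: 37
Total: 4642
Kerberos-other:
Historical:
Mean: 999, Mode: 1, Median: 1
Total: 1
"""

GREEDY = re.compile(r"Kerberos-wsfed:[\s\S]*Mean:\s(\d+)")  # backtracks from the end: LAST "Mean:"
LAZY = re.compile(r"Kerberos-wsfed:[\s\S]*?Mean:\s(\d+)")   # expands step by step: FIRST "Mean:"

print("greedy:", GREEDY.search(TEXT).group(1))
print("lazy:  ", LAZY.search(TEXT).group(1))
```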

Regex to match different formats of phone numbers

I have a bunch of numbers which I want to parse.
+79261234567
89261234567
79261234567
9261234567
+7 926 123 45 67
8(926)123-45-67
123-45-67
79261234567
(495)1234567
(495) 123 45 67
89261234567
8-926-123-45-67
8 927 1234 234
8 927 12 12 888
8 927 12 555 12
8 927 123 8 123
My first attempt was to cycle through all the variants, like this:
(\+[\d]{11}|[\d]{10,11}|\+\d\ [\d]{3}\ [\d]{3}\ [\d]{2}\ [\d]{2}|\d\([\d]{3}\)[\d\-]{9}|[\d\ ]{14,15}|[\d\-]{14,15}|[\d\-]{9}|\(\d\d\d\)[\d\-]{9,10}|\(\d\d\d\)[\d\ ]{9,10}|\(\d\d\d\)[\d\-]{7})
Is there a more elegant way to match these numbers?
This regex will match all of the examples and not much extra:
[+]?(\b\d{1,2}[ -]?)?([(]?\d{3}[)]?)((?:[ -]?\d){4,7})(?![ -]?\d)
It accepts between 7 and 12 digits.
However, it would also match something like this:
+12 (345) 6-7-8 9-0-1
But that should be within acceptable limits.
It could also still match part of a longer number.
Avoiding that requires some negative lookbehinds (note that JavaScript only supports lookbehinds from ES2018 onward, so older JavaScript engines will reject this pattern):
[+]?(?<!\d)(?<!\d[ -])(?:((\d{1,2}[ -]?)?[(]?\d{3}[)]?[ -]?)(\d(?:[ -]?\d){3,6}))(?![ -]?\d)
Here's a regex101 test for that last one.
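A Python sketch that runs the lookbehind version over each distinct sample from the question and checks that the match covers the whole string. Python's re supports the fixed-width lookbehinds (?<!\d) and (?<!\d[ -]) used here; Python itself is an assumption, since the question doesn't name a language.

```python
import re

# The answer's lookbehind version, unchanged.
PHONE = re.compile(
    r"[+]?(?<!\d)(?<!\d[ -])(?:((\d{1,2}[ -]?)?[(]?\d{3}[)]?[ -]?)(\d(?:[ -]?\d){3,6}))(?![ -]?\d)"
)

SAMPLES = [
    "+79261234567", "89261234567", "79261234567", "9261234567",
    "+7 926 123 45 67", "8(926)123-45-67", "123-45-67", "(495)1234567",
    "(495) 123 45 67", "8-926-123-45-67", "8 927 1234 234",
    "8 927 12 12 888", "8 927 12 555 12", "8 927 123 8 123",
]


def whole_match(s):
    # True when the first match found in s covers the entire string.
    m = PHONE.search(s)
    return m is not None and m.group(0) == s


for s in SAMPLES:
    print(f"{s:18} {whole_match(s)}")
```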
To get a more elegant solution, you will have to make the pattern more relaxed. One option is to match 7, 10, or 11 digits separated by zero or more delimiters:
\+?(?:[ ()-]*\d){10,11}|(?:[ ()-]*\d){7}
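A short Python sketch of the relaxed pattern on a few of the question's samples. Because the pattern accepts any mix of spaces, parentheses and hyphens, the sketch also strips the delimiters with re.sub to count the digits in each match; that normalization step is an illustration, not part of the answer.

```python
import re

# Relaxed pattern: 10 or 11 digits (optionally after "+"), or exactly 7,
# where each digit may be preceded by any run of spaces, parentheses or hyphens.
RELAXED = re.compile(r"\+?(?:[ ()-]*\d){10,11}|(?:[ ()-]*\d){7}")

SAMPLES = ["+7 926 123 45 67", "8(926)123-45-67", "123-45-67", "(495) 123 45 67"]

for s in SAMPLES:
    m = RELAXED.search(s)
    matched = m.group(0) if m else ""
    # Illustrative normalization (not part of the answer): keep only the digits.
    digits = re.sub(r"\D", "", matched)
    print(f"{s:18} -> {matched!r} ({len(digits)} digits)")
```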

RegEx: Reject sub-portion of complicated expression

In the sample text below, I want to match groups of text (newlines and all) starting with a line defined by \nI.*' and including the subsequent lines starting with \nA, but only if none of those lines contains "BOM=". That is, in the example I want to match the first "device" and its following attributes, but not the second device, as shown in my comments (after the #s).
I 657 device:THAT 2 1290 400 0 1 ' # Start matching here because no lines have "BOM="
A 1335 425 12 0 5 0 some text
A 1335 455 12 0 5 0 some text
A 1300 440 12 0 9 3 some text
A 1370 375 12 0 3 0 some text # Finish matching here
C 655 1 3 0
A 1370 450 12 0 3 3 #=2
C 740 2 4 0
A 1305 450 12 0 9 3 #=1
C 740 2 4 0
A 1305 450 12 0 9 3 #=1
I 318 device:THIS 2 300 1840 0 1 ' # Do not match again here because there's a line with "BOM="
A 320 1880 12 0 7 3 some text
A 320 1880 12 0 9 3 some text
A 380 1880 12 0 1 1 BOM=1,2
A 345 1865 12 0 5 0 some text
A 380 1830 12 0 3 0 some text
C 666 1 3 0
In the sample text, "some text" is various descriptors for electrical devices, e.g. "RATING=63MW", "REFDES=R123". It may contain whitespace but not newlines.
The furthest I've gotten yet is the expression
((\n|^)I((?!misc).)*?'\n)((A.*\n)*(A.*BOM=.*\n)(A.*\n)*)
which matches the opposite of what I want, i.e. it finds the text blocks that DO contain BOM=. I thought I could invert this by changing (A.*BOM=.*\n) to (?!(A.*BOM=.*\n)), but that did not work.
I'm hoping to use this in Notepad++ when I'm done.
You can perhaps try this regex:
^I(?:(?!misc).)*'\n(?!(?:A.*\n)*?A.*BOM=)(?:A.*\n)*
regex101 demo
In the demo, I added a third block where BOM= appears instead on a line starting with C; that device is matched, because BOM= is not on one of the consecutive lines beginning with A.
In Notepad++, ^ matches at the start of every line by default, so (^|\n) is usually unnecessary, but you can add it back if you need it.
I also kept (?:(?!misc).)* because it was in your expression, although it has no effect on your sample data.
(?!(?:A.*\n)*?A.*BOM=) is what makes the match fail when there's a BOM= in the lines. It's a negative lookahead that prevents a match if A.*BOM= matches after any number of lines matched by (?:A.*\n)*? (i.e. lines beginning with A).
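To try the pattern outside Notepad++, here is a Python sketch using re.MULTILINE, which makes ^ match at every line start, matching the Notepad++ default described above. The sample is the question's data with the # annotations removed, since the pattern expects each I line to end with ' followed by a newline.

```python
import re

# The question's sample with the "# ..." annotations removed.
SAMPLE = "\n".join([
    "I 657 device:THAT 2 1290 400 0 1 '",
    "A 1335 425 12 0 5 0 some text",
    "A 1335 455 12 0 5 0 some text",
    "A 1300 440 12 0 9 3 some text",
    "A 1370 375 12 0 3 0 some text",
    "C 655 1 3 0",
    "A 1370 450 12 0 3 3 #=2",
    "C 740 2 4 0",
    "A 1305 450 12 0 9 3 #=1",
    "C 740 2 4 0",
    "A 1305 450 12 0 9 3 #=1",
    "I 318 device:THIS 2 300 1840 0 1 '",
    "A 320 1880 12 0 7 3 some text",
    "A 320 1880 12 0 9 3 some text",
    "A 380 1880 12 0 1 1 BOM=1,2",
    "A 345 1865 12 0 5 0 some text",
    "A 380 1830 12 0 3 0 some text",
    "C 666 1 3 0",
]) + "\n"

# The answer's pattern; re.MULTILINE lets ^ match at the start of every line.
BLOCK = re.compile(r"^I(?:(?!misc).)*'\n(?!(?:A.*\n)*?A.*BOM=)(?:A.*\n)*", re.MULTILINE)

for m in BLOCK.finditer(SAMPLE):
    print(m.group(0), end="")
```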