Find a String from a varying number block to the end - regex

I have nearly 8000 lines of the following text:
DIL 2 M 006 SC SCHÜTZ 083 1 Stck
25215-1 BIN-SORT 2152310251724-1 BIN-SORT getestet 048 133 Stck
RBBE60-T3dsg 21S003 SEALING 6X8.9X2.4 MM 082 3 Stck
I am only interested in the 3 digit block at the end and the number behind.
So this should be the output:
083 1
048 133
082 3
It could be, that the same number e.g. 048 appears at the beginning of the line. this shouldn't be a hit.
Unfortunatelly i have no idea how to extract this strings with the help of notpad++.

This expression,
.*(\d{3}\s+\d+).*
with a replacement of $1 is likely to work here.
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.

You may try the following find and replace, in regex mode:
Find: ^.*?(\d+ \d+) \S*$
Replace: $1
The logic here is to use .* to consume everything up until the last two consecutive digits in the line. Then, we replace with only the captured two digits.
Demo

Related

Regex to match blocks of text with key phrases in the middle

VB2010: I have text that consists of blocks of text that start with day and time DD HHMM and end only at the next day/time.
Here is my sample text:
18 2131 Z50000 ZZ-AAA
PR
PR
AGM TPS P773QQ 1500 DCA 22FEB
21,77,23,M10,F,26,3100,2
OK
18 2134 Z50000 ZZ-AAA
PR
QU HMKKDBB
.DDVZAZC 182134
ARR
FI US1500/AN P773QQ/DA KDCA/AD KMIA/IN 2026/FB 152/LA /LR
DT DDL DCAV 182134 M33A
- OS KMIA /GNO6541/R200RR
18 2134 Z50000 ZZ-AAA
PR
PR
ARR OPN P773QQ 1500 DCA 22FEB
0757
OK
18 2135 Z50000 ZZ-AAA
PR
PR
ARR M58 P773QQ 1500 DCA 22FEB
212
UNKNOWN POL/SPOL
QU HMKKDBB
.DDVZAZC 182134
ARR
FI US1500/AN P773QQ/DA KDCA/AD KMIA/IN 2026/FB 152/LA /LR
DT DDL DCAV 182134 M33A
- OS KMIA /GNO6541/R200RR
18 2136 Z50000 ZZ-AAA
PRF 1500/18 MIA IN 0152 333
18 2137 Z50000 ZZ-AAA
PR
PRZ 1500/18 MIA IN 2026 N/A 333
My goal is to get only the blocks of text that have key phrases ^FI and ^DT in the middle. The matching groups should contain only two blocks. The one from 18 2134 and end at M33A and then from 18 2135 to M33A.
I have tried:
This works for the most part except it starts the match at the prior block.
RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase
^\d\d \d{4}(.*?)^FI US(.*?)^DT DDL(.*?)\r
This one I took from another post but cant seem to wrap my head around. It matches only the first part of every block.
RegexOptions.Multiline Or RegexOptions.IgnoreCase
^\d\d \d{4}.*\r[\s\S]*?(?=(?:^\d\d \d{4}|$))
Haven't used regex in a while so any help appreciated.
You may use
(?ms)^\d\d +\d{4}\b(?:(?!^(?:\d\d +\d{4}\b|FI|DT)).)*?^(?:FI|DT).*?(?=^\d\d +\d{4}\b|\Z)
See the regex demo (Though it is a PCRE regex test, it will work the same in .NET).
Pattern details
(?ms) - multiline and singleline options
^ - start of a line
\d\d +\d{4}\b - 2 digits, 1 or more spaces and 4 digits as a whole word
(?:(?!^(?:\d\d +\d{4}\b|FI|DT)).)*? - any char, 0+ repetitions, as few as possible, that does not start the sequence: start of a line, 2 digits, 1 or more spaces and 4 digits as a whole word, or FI or DT
^(?:FI|DT) - FI or DT at the start of a line
.*? - any 0+ chars, as few as possible
(?=^\d\d +\d{4}\b|\Z) - a positive lookahead that requires ^\d\d +\d{4}\b (start of a line, 2 digits, 1 or more spaces and 4 digits as a whole word) or \Z (end of string) to match immediately to the right of the current location.
This regex should find what you need, if single line enabled
[0-3]\d\s+[0-2]\d[0-5]\d.*?(FI.*?)\n(DT.*?)\n
Explanation:
[0-3]\d\s+[0-2]\d[0-5]\d day hour and minute check
.*? ungreedy capturing, . includes newline
(FI.*?)\n first group, FI line, until line break
(DT.*?)\n second group, same deal

regex working with long lines

I got a lot of these strings in one txt-file:
X00NAP-0111-OG02Flur-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 0111-HGS DE 10.100.100.100 8
X006NAP-0500-EG00Grossrau-A 2 AIR-CAP2702I-E-K9 50:0f:80:94:82:c0 HGS 0500 DE 10.100.100.100 1
Y008NAP-8399-OG04OE3020-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 HGS Erfurter Hof DE 10.100.100.100 1
A1234NAP-4101-OG02Raum237-A 2 AIR-CAP2602I-E-K9 00:b8:b8:b8:7d:b8 AP 2 Anmeldung V DE 10.100.100.100 0
I am only interested in the first string and the number on the end of the lines. The number can be max. 99
So in the end I would like to have a output like this:
X00NAP-0111-OG02Flur-A 8
X006NAP-0500-EG00Grossrau-A 1
Y008NAP-8399-OG04OE3020-A 1
A1234NAP-4101-OG02Raum237-A 0
I tried a lot of things with regex, but nothing worked really.
Here is a general regex solution:
Find:
^([^\s]*).*(\d+)$
Replace:
$1 $2
The idea here is to match the first string and final number as capture groups, which are indicated by the two terms in the pattern surrounded by parentheses. These capture groups are made available in the replacement as $1 and $2 (sometimes \1 and \2, depending on the regex tool/engine). We can replace each line with these capture groups to leave you with the output you expect.
Note that this may "trash" the original file, but if you are using a tool like Notepad++, you can simply copy this result out, then undo the replacement, or just close the original file without saving.
Demo
The simplest way I can think of is:
Find: " .* "
Replace: " "
This replaces everything from the first space to the last space with a single space, achieving your goal.
Note: Quotes are only there to help show where spaces are in the regex.

Regular Expression for parsing a sports score

I'm trying to validate that a form field contains a valid score for a volleyball match. Here's what I have, and I think it works, but I'm not an expert on regular expressions, by any means:
r'^ *([0-9]{1,2} *- *[0-9]{1,2})((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$'
I'm using python/django, not that it really matters for the regex match. I'm also trying to learn regular expressions, so a more optimal regex would be useful/helpful.
Here are rules for the score:
1. There can be one or more valid set (set=game) results included
2. Each result must be of the form dd-dd, where 0 <= dd <= 99
3. Each additional result must be separated by any of [ ,;]
4. Allow any number of sets >=1 to be included
5. Spaces should be allowed anywhere except in the middle of a number
So, the following are all valid:
25-10 or 25 -0 or 25- 9 or 23 - 25 (could be one or more spaces)
25-10,25-15 or 25-10 ; 25-15 or 25-10 25-15 (again, spaces allowed)
25-1 2 -25, 25- 3 ;4 - 25 15-10
Also, I need each result as a separate unit for parsing. So in the last example above, I need to be able to separately work on:
25-1
2 -25
25- 3
4 - 25
15-10
It'd be great if I could strip the spaces from within each result. I can't just strip all spaces, because a space is a valid separator between result sets.
I think this is solution for your problem.
str.replace(r"(\d{1,2})\s*-\s*(\d{1,2})", "$1-$2")
How it works:
(\d{1,2}) capture group of 1 or 2 numbers.
\s* find 0 or more whitespace.
- find -.
$1 replace content with content of capture group 1
$2 replace content with content of capture group 2
you can also look at this.

Keep only a specific portion that is between brackets and remove all the other ones

I want to remove everything between [] and ()
Samples:
ST Action Issue 08 (1988-12)(GollnerPublishing Ltd)
ST Magazine 070 (1993-03)(La Terre du Milieu)(fr)
20000 Leagues Under The Sea (1988)(Coktel Vision)[cr Big 4]
A Day at the Races (1989)(Team)(Disk 1 of 2)[cr MCA]
My regex :[^a-zA-Z\s\d][\d-()A-Za-z\] ]+
It works, the only catch is that I need to keep this one: (Disk 1 of 2). So in the last example, that would be something like this: A Day at the Races (Disk 1 of 2)
I can't find a way to exclude (Disk 1 of 2) (ie 'Don't match') and integrate it in the whole expression.
You can leverage a negative lookahead (?![([]Disk 1 of 2[\])]) that will avoid matching (Disk 1 of 2) or [Disk 1 of 2]:
(?![([]Disk 1 of 2[\])])(?:\([^()]*\)|\[[^\][]*])
See the regex demo
The (?:\([^()]*\)|\[[^\][]*]) part just matches either a string between round brackets (parentheses) containing no parentheses inside (\([^()]*\)) or (|) a string inside square brackets containing no square brackets inside (\[[^\][]*]).
If 1 and 2 are dynamic and stand for integer numbers, use \d+: (?![([]Disk \d+ of \d+[\])])

I'm trying to do a search/replace using regex for mass replacing on Notepad++

I need to add a parameter for each code and name, i tried using (.+) or (.*) for each number, but it didnt work. Each space means that is a different number and not every space has the same width. Example from this:
Abanda CDP 192 129 58 0 0 0 2 3 3
2.998 0.013 33.091627 -85.527029 2582661
To this:
Abanda CDP |code1=192 |code2=129 |code3=58 |code4=0 |code5=0 |code6=0 |code7=2 |code8=3 |code9=3
|code9=2.998 |code10=0.013 |code11=33.091627 |code12=-85.527029 |code13=2582661
Try ([0-9.-]+). The reason .+ doesn't work is because . matches whitespace as well. The reason you can't just use \S+ (non-spaces) is because you only want to match the numbers.