regex working with long lines

I have a lot of strings like these in one text file:
X00NAP-0111-OG02Flur-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 0111-HGS DE 10.100.100.100 8
X006NAP-0500-EG00Grossrau-A 2 AIR-CAP2702I-E-K9 50:0f:80:94:82:c0 HGS 0500 DE 10.100.100.100 1
Y008NAP-8399-OG04OE3020-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 HGS Erfurter Hof DE 10.100.100.100 1
A1234NAP-4101-OG02Raum237-A 2 AIR-CAP2602I-E-K9 00:b8:b8:b8:7d:b8 AP 2 Anmeldung V DE 10.100.100.100 0
I am only interested in the first string and the number at the end of each line. The number can be at most 99.
So in the end I would like to have an output like this:
X00NAP-0111-OG02Flur-A 8
X006NAP-0500-EG00Grossrau-A 1
Y008NAP-8399-OG04OE3020-A 1
A1234NAP-4101-OG02Raum237-A 0
I tried a lot of things with regex, but nothing really worked.

Here is a general regex solution:
Find:
^(\S+).*\s(\d+)$
Replace:
$1 $2
The idea here is to match the first string and the final number as capture groups, which are indicated by the two terms in the pattern surrounded by parentheses. The \s before the final group matters: without it, the greedy .* would swallow every digit but the last, so a line ending in 10 would yield 0. These capture groups are made available in the replacement as $1 and $2 (sometimes \1 and \2, depending on the regex tool/engine). Replacing each line with these capture groups leaves you with the output you expect.
Note that this may "trash" the original file, but if you are using a tool like Notepad++, you can simply copy this result out, then undo the replacement, or just close the original file without saving.
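For anyone doing this in a script rather than an editor, here is a minimal Python sketch of the same capture-group idea. The pattern requires whitespace before the final number so that two-digit values are captured whole; the names LINE_RE and first_and_last are made up for illustration.

```python
import re

# First token, anything in between, then whitespace and the trailing number.
LINE_RE = re.compile(r"^(\S+).*\s(\d+)$")

def first_and_last(line):
    """Return 'first_token number', or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return f"{m.group(1)} {m.group(2)}" if m else None

samples = [
    "X00NAP-0111-OG02Flur-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 0111-HGS DE 10.100.100.100 8",
    "Y008NAP-8399-OG04OE3020-A 2 AIR-CAP2702I-E-K9 00:b8:b8:b8:7d:b8 HGS Erfurter Hof DE 10.100.100.100 1",
]
for s in samples:
    print(first_and_last(s))
```

Lines that do not contain at least one whitespace character followed by a trailing number come back as None, so they can be skipped or logged instead of being mangled.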

The simplest way I can think of is:
Find: " .* "
Replace: " "
This replaces everything from the first space to the last space with a single space, achieving your goal.
Note: Quotes are only there to help show where spaces are in the regex.
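The same first-space-to-last-space replacement can be tried outside an editor. This is only a sketch in Python; collapse_middle is a hypothetical helper name.

```python
import re

def collapse_middle(line):
    # Greedy " .* " spans from the first space to the last space on the line,
    # so everything between the first token and the last token is dropped.
    return re.sub(r" .* ", " ", line, count=1)

line = "X006NAP-0500-EG00Grossrau-A 2 AIR-CAP2702I-E-K9 50:0f:80:94:82:c0 HGS 0500 DE 10.100.100.100 1"
print(collapse_middle(line))
```

A line with fewer than two spaces has nothing for the pattern to match and is returned unchanged.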

Related

Notepad++ add new line above changing syntax with replace

I have a constant syntax of "Se " but there is a number in front of it that changes. I want to add a newline \n before the number. I've tried using \c to address any character (for the changing number) during replace, but I don't know how to get the number part to copy over.
this is what it currently looks like
1 hinge 2pk
1 Se wall cabinet
4 door 15x40"
I want the new line to be above any item that includes "Se", so that it looks like this
1 hinge 2pk

1 Se wall cabinet
4 door 15x40"
this is what I've tried so far (not including the square brackets)
REPLACE TOOL
Find what: [\C Se ]
Replace with: [\n\C Se ]
✓ = Regular expression
but this is what I get
1 hinge 2pk
C Se wall cabinet
4 door 15x40
How do I get the number to the left of "Se" to copy down (as this number is always changing)?

You can use:
^\d+\h+Se\b
^ Start of a line
\d+ Match 1+ digits
\h+ Match 1+ horizontal whitespace characters (spaces or tabs)
Se\b Match Se followed by a word boundary
In the replacement, use a newline followed by the full match: \n$0
Find what:
^\d+\h+Se\b
Replace with
\n$0
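If the list ever needs the same treatment in a script, the idea carries over to Python with two adjustments: Python's re module has no \h, so [ \t] stands in for it, and the whole match is written \g<0> instead of $0. A minimal sketch, with a made-up function name:

```python
import re

def blank_line_before_se(text):
    # re.MULTILINE makes ^ match at the start of every line.
    # \g<0> re-inserts the whole match after the added newline.
    return re.sub(r"^\d+[ \t]+Se\b", r"\n\g<0>", text, flags=re.MULTILINE)

items = '1 hinge 2pk\n1 Se wall cabinet\n4 door 15x40"'
print(blank_line_before_se(items))
```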

Well, try this simple code, hope it will help...
Find:^(\d.*? Se .*\n)
Replace with:\n$1 or \n\1

How to use a selective regex to perform replace in a pandas series?

I would like to use a regex when applying pandas.Series.str.replace. I am aware that it takes in regex, but my output is not as intended. Here is a simple example. Suppose I have
ser = pd.Series(['asd3', 'qwe3', 'asd4', 'zxc'])
I would like to turn the 'asd3' and 'asd4' into 'asd'. That is, simply removing any integer at the end. I am using the code:
ser.str.replace('asd([0-9])','')
Note that I am using the ([0-9]) notation, which I interpret as saying: for any element of the series, if it looks like 'asd([0-9])', then replace the [0-9] with `` (that is, remove it). But what I get is
0
1 qwe3
2
3 zxc
whereas what I would like to get is:
0 asd
1 qwe3
2 asd
3 zxc
this is a simple example, and my regex string is uglier than that, but I hope this conveys the idea of what I intend to do.

In your case, .replace('asd([0-9])','') removes asd together with the digit after it, because the whole match is replaced, not just the group.
Use
ser.str.replace('asd[0-9]+','asd', regex=True)
or
ser.str.replace('(asd)[0-9]+',r'\1', regex=True)
(regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.)
The first call replaces asd and the 1+ digits after it with asd. In the second, the asd substring is captured into Group 1 (due to the capturing parentheses), 1+ digits are matched, and the whole match is replaced with the \1 placeholder that holds the value of Group 1 (that is, asd).
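Both variants can be checked against the series from the question. In this sketch regex=True is passed explicitly, since pandas 2.0 and later treat the pattern as a literal string otherwise; the variable names plain and grouped are only for illustration.

```python
import pandas as pd

ser = pd.Series(["asd3", "qwe3", "asd4", "zxc"])

# Variant 1: match "asd" plus its digits and write "asd" back.
plain = ser.str.replace(r"asd[0-9]+", "asd", regex=True)

# Variant 2: capture "asd" in group 1 and keep only that group.
grouped = ser.str.replace(r"(asd)[0-9]+", r"\1", regex=True)

print(plain.tolist())
print(grouped.tolist())
```

The capture-group form is the one that scales to uglier patterns, because the part to keep does not have to be retyped in the replacement.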

Regular Expression for parsing a sports score

I'm trying to validate that a form field contains a valid score for a volleyball match. Here's what I have, and I think it works, but I'm not an expert on regular expressions, by any means:
r'^ *([0-9]{1,2} *- *[0-9]{1,2})((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$'
I'm using python/django, not that it really matters for the regex match. I'm also trying to learn regular expressions, so a more optimal regex would be useful/helpful.
Here are rules for the score:
1. There can be one or more valid set (set=game) results included
2. Each result must be of the form dd-dd, where 0 <= dd <= 99
3. Each additional result must be separated by any of [ ,;]
4. Allow any number of sets >=1 to be included
5. Spaces should be allowed anywhere except in the middle of a number
So, the following are all valid:
25-10 or 25 -0 or 25- 9 or 23 - 25 (could be one or more spaces)
25-10,25-15 or 25-10 ; 25-15 or 25-10 25-15 (again, spaces allowed)
25-1 2 -25, 25- 3 ;4 - 25 15-10
Also, I need each result as a separate unit for parsing. So in the last example above, I need to be able to separately work on:
25-1
2 -25
25- 3
4 - 25
15-10
It'd be great if I could strip the spaces from within each result. I can't just strip all spaces, because a space is a valid separator between result sets.

I think this is a solution to your problem (score is the string from the form field):
re.sub(r"(\d{1,2})\s*-\s*(\d{1,2})", r"\1-\2", score)
How it works:
(\d{1,2}) captures a group of 1 or 2 digits.
\s* matches 0 or more whitespace characters.
- matches a literal -.
\1 inserts the content of capture group 1.
\2 inserts the content of capture group 2.
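To get both the validation and the separate results the question asks for, one possible approach is to check the whole field with one pattern and then pull out each digits-dash-digits pair with another. This is a sketch under the rules listed in the question; parse_scores and the pattern names are invented here.

```python
import re

RESULT = r"\d{1,2}\s*-\s*\d{1,2}"      # one set result, e.g. "25 - 9"
SEPARATOR = r"(?:\s*[,;]\s*|\s+)"      # comma, semicolon, or plain whitespace
FIELD_RE = re.compile(r"\s*" + RESULT + "(?:" + SEPARATOR + RESULT + r")*\s*")
PAIR_RE = re.compile(r"(\d{1,2})\s*-\s*(\d{1,2})")

def parse_scores(field):
    """Return normalized results like ['25-1', '2-25'], or None if invalid."""
    if not FIELD_RE.fullmatch(field):
        return None
    # Rebuild each result without the inner spaces.
    return [f"{a}-{b}" for a, b in PAIR_RE.findall(field)]

print(parse_scores("25-1 2 -25, 25- 3 ;4 - 25 15-10"))
```

Splitting the work in two keeps each pattern short: the first only answers "is the field well formed?", the second only extracts pairs from a field already known to be valid.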

best approach for my pattern match

So, I've built a regex which follows this:
4!a2!a2!c[3!c]
which is translated to
4 alpha character followed by
2 alpha characters followed by
2 characters followed by
3 optional character
this is a standard format for SWIFT BIC code HSBCGB2LXXX
my regex to pull this out of string is:
(?<=:32[^:]:)(([a-zA-Z]{4}[a-zA-Z]{2})[0-9][a-zA-Z]{1}[X]{3})
Now this is targeting a specific tag (32) and works, however, I'm not sure if it's the cleanest, plus if there are any characters before H then it fails.
the string being matched against is:
:32B:HsBfGB4LXXXHELLO
the following returns HSBCGB4LXXX, but this:
:32B:2HsBfGB4LXXXHELLO
returns nothing.
EDIT
For clarity: I have a string which contains multiple lines, each starting with a colon, a two-digit number, an optional letter and another colon (e.g. :58A:). I want to specify which line to start matching in and return a BIC from anywhere in that line.
EDIT
Some more example data to help:
:20:ABCDERF Z
:23B:CRED
:32A:140310AUD2120,
:33B:AUD2120,
:50K:/111222333
Mr Bank of Dad
Dads house
England
:52D:/DBEL02010987654321
address 1
address 2
:53B:/HSBCGB2LXXX
:57A://AU124040
AREFERENCE
:59:/44556677
A line which HSBCGB2LXXX contains a BIC
:70:Another line of data
:71A:Even more
Ok, so I need to pass in as a variable the tag 53 or 59 and return the BIC HSBCGB2LXXX only!

Your regex can be simplified, and corrected to allow a character before the H, to:
:32[^:]:.?([a-zA-Z]{6}\d[a-zA-Z]XXX)
The changes made were:
Dropped the lookbehind; it is simply part of the match now
Inserted .? meaning "one optional character"
([a-zA-Z]{4}[a-zA-Z]{2}) ==> [a-zA-Z]{6} (4+2=6)
[0-9] ==> \d (\d means "any digit")
[X]{3} ==> XXX (easier to read and fewer characters)
Group 1 of the match contains your target
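The simplified pattern can be tried against the two strings from the question with a few lines of Python. The helper name find_bic is hypothetical; the regex is the one given above.

```python
import re

BIC_AFTER_TAG = re.compile(r":32[^:]:.?([a-zA-Z]{6}\d[a-zA-Z]XXX)")

def find_bic(line):
    """Return group 1 (the BIC-shaped part) or None when nothing matches."""
    m = BIC_AFTER_TAG.search(line)
    return m.group(1) if m else None

print(find_bic(":32B:HsBfGB4LXXXHELLO"))
print(find_bic(":32B:2HsBfGB4LXXXHELLO"))
```

Because .? allows only a single extra character, two or more characters before the code would still fail to match.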

I'm not quite sure if I understand your question completely, as your regular expression does not completely match what you have described above it. For example, you mentioned 3 optional characters, but in the regexp you use 3 mandatory X-es.
However, the actual regular expression can be further cleaned:
instead of [a-zA-Z]{4}[a-zA-Z]{2}, you can simply use [a-zA-Z]{6}, and the grouping parentheses around this might be unnecessary;
the {1} can be left out without any change in the result;
the X does not need surrounding brackets.
All in all
(?<=:32[^:]:)([a-zA-Z]{6}[0-9][a-zA-Z]X{3})
is shorter and matches in the very same cases.
If you give a better description of the domain, probably further improvements are also possible.

Regex for single space

I'm trying to match a file which is delimited by multiple spaces. The problem I have is that the first field can contain a single space. How can I match this with a regex?
Eg:
Name           Other Data    Other Data 2
Bob Smith      XX1           0101010101
John Doe       XX2           0101010101
Bob Doe        XX3           0101010101
John Smith     XX4           0101010101
Can I split these lines into three fields with a regex, splitting by a space but allowing for the single space in the first field?

Hi the following regex should work
(\w*\s\w*)\s+\w{2}\d\s+\d*

This would work:
Pattern:
(.*?)[ ]{2,}(.*?)[ ]{2,}(.*)
Replacement:
+$1+ -$2- *$3*
$1 contains the first column, $2 the second and $3 the third one.
Example:
http://regexr.com?32tbt

You could split at two or more spaces:
[ ]{2,}
But you are probably better off determining the lengths of the captures of this regular expression, applied to the header line:
(Name[ ]+)(Other Data[ ]+)
and then using a simple substring method that slices your lines into portions of the same lengths.
So in your case the first capture would be 15 characters long, the second 14, and the third column would have 13 (but the last one doesn't really matter, which is why it isn't actually captured). Then you take the first 15, the next 14 and the remaining characters of every line and trim each one (remove trailing whitespace).
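The slicing step might look like the following in Python. The widths 15 and 14 are taken from the description above and are an assumption about the real file; split_fixed is an illustrative name, and the sample row is built with ljust so the column widths are explicit.

```python
def split_fixed(line, widths=(15, 14)):
    """Slice a line into fixed-width columns and trim each one."""
    fields, start = [], 0
    for w in widths:
        fields.append(line[start:start + w].strip())
        start += w
    # Whatever is left after the fixed columns is the last field.
    fields.append(line[start:].strip())
    return fields

row = "Bob Smith".ljust(15) + "XX1".ljust(14) + "0101010101"
print(split_fixed(row))
```

Unlike splitting on runs of spaces, this still works when a value fills its column completely and touches the next one.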

I think the simplest is to use a regex that matches two or more spaces.
/  +/
Which breaks down as... delimiter (/) followed by a space ( ) followed by another space one or more times ( +) followed by the end delimiter (/ in my example, but is language specific).
So simply put, use regex to match space, then one or more spaces as a means to split your string.
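As a rough illustration in Python, where no delimiters are needed around the pattern, the split on two or more spaces could be written like this (split_columns is a made-up name, and the row is built with ljust so the padding is visible):

```python
import re

def split_columns(line):
    # Two or more consecutive spaces separate fields;
    # a single space stays inside a field.
    return re.split(r" {2,}", line.strip())

row = "John Doe".ljust(15) + "XX2".ljust(14) + "0101010101"
print(split_columns(row))
```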

Usually, with this kind of file, the best approach is to take a substring based on where the required information sits and then trim it. I see your file contains 16 chars before the second field, so you can take a substring of length 16 from the beginning, which will contain your desired text. Trim it to get only the text you need without the spaces.
If the spacing pattern you posted is consistent (if it won't change among different files of this kind), you also have another problem: what happens to longer names?
Name           Other Data
Johnny AppleseeXX1
TutankamonfirstXX2
If you really want to use a regex, be sure to handle those corner cases.
