Regex, replacing for newline with group replace - regex

675185538end432 204 9/9 4709 908 2
343269172end430 3 43 9335 975 7
590144128end89 7 29 3-5-4 420 2
337460105end8Y5 7A 78 2 23
292484648end70 A53 03 9235 93
These are the strings that I am working with. I want to find a regex to replace the above strings as follows
675185538
432 204 9/9 4709 908 2
343269172
430 3 43 9335 975 7
590144128
89 7 29 3-5-4 420 2
337460105
8Y5 7A 78 2 23
292484648
70 A53 03 9235 93
Wherever end comes, \r\n should be introduced.
The string before end is numeric and after end is alphanumeric with whiteline characters.
I am using notepad++.

To make the match strict, try this:
Find: ^(\d+)end(\w)
Replace: \1\r\n\2
This captures, then puts back via back references, the preceding number between start of line and "end" and the following digit/letter. This won't match "end" elsewhere.

Kludgery:
Find (\d\d\d\d\d\d\d\d\d)end(\d)
Replace \1\r\n\2
Find creates two capture groups:
each group is bounded by an ( and a )
one capture group matches exactly nine numerals
the other capture group matches exactly one numeral.
In the replace:
the first capture group is referenced with \1
and the second group with \2.

Related

Regex, How to find a repeating group of random numbers?

I'm currently parsing data from PDFs and I'd like to get the name and amount in a simple format: [NAME] [AMOUNT]
NAME LAST
7 494 25 7 494 25 199 44
NAME LAST
4 488 00 4 488 00 109 07
NAME MIDDLE LAST
7 854 00 7 854 00 298 25
NAME LAST
494 23 494 23 12 01
NAME MIDDLE LAST
4 301 56 4 301 56 112 61
NAME M LAST
13 359 25 13 359 25 130 54
This data means the following:
[NAME] [M?] [LAST]
[TOTAL WAGES] [PIT WAGES] [PIT WITHHELD]
NAME LAST $7,494.25 $7,494.25 $199.44
NAME LAST $4,488.00 $4,488.00 $109.07
NAME MIDDLE LAST $7,854.00 $7,854.00 $298.25
NAME LAST $494.23 $494.23 $12.01
NAME MIDDLE LAST $4,301.56 $4,301.56 $112.61
NAME M LAST $13,359.25 $13,359.25 $130.54
I'd like a regex to detect the duplicate group of numbers so that it parses to this:
NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25
Hopefully, that makes sense. Thanks
Assuming that no-one in your organisation is making more than $1M or less than $1, this regex will do what you want:
*([a-z][a-z ]+)\R+((\d+)(?: (\d+))? (\d+)) (?=\2).*
It looks for
some number of spaces
names (simplistically) with [a-z][a-z ]+ (captured in group 1)
newline characters (\R+)
2 or 3 sets of digits separated by spaces ((\d+)(?: (\d+))? (\d+)) (captured overall in group 2, with individual groups of digits captured in groups 3, 4 and 5)
a space, followed by an assertion that group 2 is repeated (?=\2)
characters to match the rest of the string to end of line (may not be required, dependent on your application) (.*)
You can replace that with
$1 \$$3$4.$5
to get the following output for your sample data:
NAME LAST $7494.25
NAME LAST $4488.00
NAME MIDDLE LAST $7854.00
NAME LAST $494.23
NAME MIDDLE LAST $4301.56
NAME M LAST $13359.25
Demo on regex101
If you're using JavaScript, you need a couple of minor changes. In the regex, replace \R with [\r\n] as JavaScript doesn't recognise \R. In the substitution, replace \$ with $$.
Demo on regex 101
If your regex flavour supports conditional replacements, you can add a , between the thousands and hundreds by checking if group 4 was part of the match:
$1 \$$3${4:+,}$4.$5
In this case the output is:
NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25
Demo on regex101

Regex match multiple numbers stop at string (word) despite more matches exist

Goal;
Match all variations of phone numbers with 8 digits + (optional) country code.
Stop match when "keyword" is found, even if more matches exist after the "keyword".
Need this in a one-liner and have tried a plethora of variations with lookahead/behind and negate [^keyword] but I am unable to understand how to achieve this.
Example of text;
abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
I Want It To Stop Matching Here Or Right Before The "keyword"
more nice text with some matches
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Example of regex;
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})[^keyword]
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?!keyword)
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?=keyword)
-> This matches nothing
((\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?:(?!keyword))*)
-> This matches all numbers also below the keyword

Regex only searching part of a string

I'm trying to do some validation on a string.
Y0 40 98 86 A
I would like to be able to replace the 0's which occur in the first 2 characters ie Y0 with O's.
I know how to do the replace part but am struggling to just select that first 0.
It should match all 0's within that first two characters. Ie 00 0Z etc
To clarify, I don't mind what language I just need helping making the Regex Selector
One-step replacement
Thanks to #Rawing for the comment:
"00 40 98 86 A".gsub(/^0|(?<=^.)0/, 'O')
# "OO 40 98 86 A"
The regex means :
start of the string, followed by:
zero, or
a character, followed by a zero.
Another variant by #SebastianProske:
"A0 40 98 86 A".gsub(/(?<!..)0/, 'O')
# "AO 40 98 86 A"
It means : a 0, but only when not preceded by two characters.
Here's a test.
Two steps replacement
It might be easier to do it in two steps. Replace the first character by O if it's a 0, then replace the second character if it's a 0.
Here's a ruby example with a matching group:
"Y0 40 98 86 A".sub(/^0/,'O').sub(/^(.)0/,'\1O')
# "YO 40 98 86 A"
You could also use a lookbehind:
"Y0 40 98 86 A".sub(/^0/,'O').sub(/(?<=^.)0/,'O')
=> "YO 40 98 86 A"

Matching across multiple lines regular expression

I have several lists in a single text file that look like below. It always starts with 0 and it always ends with the word Unique at the start of a newline. I would like to get rid of all of it apart from the line with Unique on it. I looked through stackoverflow and tried the following but it returns the whole text file (there are other strings in the file that I haven't put in this example). Basically the problem is how to account for the newlines in the regex selection
^0(.|\n)*
Input:
0 145
1 139
2 175
3 171
4 259
5 262
6 293
7 401
8 430
9 417
10 614
11 833
12 1423
13 3062
14 10510
15 57587
16 5057575
17 10071
18 375
19 152
20 70
21 55
22 46
23 31
24 25
25 22
26 25
27 14
28 16
29 16
30 8
31 10
32 8
33 21
34 8
35 51
36 65
37 605
38 32
39 2
40 1
41 2
44 1
48 2
51 1
52 1
57 1
63 2
68 1
82 1
94 1
95 1
101 3
102 7
103 1
110 1
111 1
119 1
123 1
129 2
130 3
131 2
132 1
135 1
136 2
137 7
138 4
Unique: 252851
Expected output:
Unique: 252851
You need to use something like
^0[\s\S]*?[\n\r]Unique:
and replace with Unique:.
^ - start of a line
0 - a literal 0
[\s\S]*? - zero or more characters incl. a newline as few as possible
[\n\r] - a linebreak symbol
Unique: - a whole word Unique:
Another possible regex is:
^0[^\r]*(?:\r(?!Unique:)[^\r]*)*
where \r is the line endings in the current file. Replace with an empty string.
Note that you could also use (?m)^0.*?[\r\n]Unique: regex (to replace with Unique:) with the (?m) option:
m: multi-line (dot(.) match newline)
Your method of matching newlines should work, although it's not optimal (alternation is rather slow); the next problem is to make sure the match stops before Unique:
(?s)^0.*(?=Unique:)
should work if there is only one Unique: in your file.
Explanation:
(?s) # Start "dot matches all (including newlines) mode
^0 # Match "0" at the start of the file
.* # Match as many characters as possible
(?=Unique:) # but then backtrack until you're right before "Unique:"

REGEX: How to split string with space and double quote

I have a input of string with spaces and double quotes as below:
Input :
18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21
Expected Output:
18
17
16
Arc 10 12 11 13
Segment 10 23 33 32 12
23
76
21
How can I do this using Regex? Thank you in advance
You can use next regexp(see example):
("[^"]+")|\S+
("[^"]+") - quoted sequence.
\S+ - non whitespace sequence.
Probably order of groups is depend from regexp implementation. In the demo engine matching stared from left to right. Also do not forget escape special characters with double slash.
"(.+?)"|(\w+(?=\s|$))
check here