find specific number combination within from and to text - regex

I want to match only a 3-digit number under 600 (in the example below, "598") when it appears between a start word and an end word. With the regular expression below I get a match of everything. Can anyone help?
Regular expression: (?<=Start)(.*)(?=End).
Test string:
Start 440 3 956 4 603 5 - 6 603 7 440 8 - 9 440 10 956 11 440 12 603 13 2005
14 440 15 598 16 1156 17 946 18 761 19 761 20 946 21 598 22 598
23 1156 24 2057 25 946 26 1194 27 946 28 946 - - - Zurich 2019 M T W T F S S - - - - 1 - 2 1058 3 542 4 852 5 - 6 1517 7 1058 8 - 9 1058 10 848 11 542 12 705 13 1306 14 1058 15 1258 16 2159 17 1617 18 700 19 863 20 700 21 1258 22 1911 23 1911 24 1617 25 1258 26 2759 27 1258 28 1258 - - - End

With \b[0-5]\d{2}\b you find all 3-digit numbers under 600.
Demo: https://regex101.com/r/0ZSbbY/2
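The pattern above finds every 3-digit number under 600 but does not limit the search to the text between Start and End. A minimal Python sketch of one way to combine both, assuming a two-step approach (isolate the section first, then filter numerically); the function name and the short sample string are made up for illustration:

```python
import re

def three_digit_numbers_below(text, limit=600, start="Start", end="End"):
    """Return 3-digit numbers below `limit` found between `start` and `end`."""
    # Step 1: isolate the text between the first start marker and the next end marker.
    section = re.search(re.escape(start) + r"(.*?)" + re.escape(end), text, flags=re.DOTALL)
    if section is None:
        return []
    # Step 2: take standalone 3-digit numbers; the lookarounds reject parts of longer numbers.
    candidates = re.findall(r"(?<!\d)\d{3}(?!\d)", section.group(1))
    # Step 3: compare numerically instead of encoding the limit in the pattern.
    return [int(n) for n in candidates if int(n) < limit]

# Hypothetical, shortened sample in the shape of the test string.
sample = "ignored 123 Start 440 3 956 4 603 15 598 16 1156 End 222"
print(three_digit_numbers_below(sample))  # [440, 598]
```

Doing the comparison in code rather than in the regex means the limit can change without rewriting the pattern.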

Try this pattern:
(?<=^|\D)[1-5]?\d{2}(?!.+Start)(?=\D.+End)
(?<=^|\D)[1-5]?\d{1,2} matches all 1- or 2-digit numbers, since they are always less than 600. It also finds the 3-digit numbers 1**, 2**, 3**, 4**, 5**. (As written above with \d{2}, the pattern skips 1-digit numbers.)
(?!.+Start)(?=\D.+End): these lookaheads ensure that we are before the End word and not before the Start word, i.e. between them. It couldn't be done with a positive lookbehind, as @TimBiegeleisen stated, because it would have variable length.

#!/usr/bin/perl
use Modern::Perl;
use Data::Dumper;
my $str = 'Start 440 3 956 4 603 5 - 6 603 7 440 8 - 9 440 10 956 11 440 12 603 13 2005 14 440 15 598 16 1156 17 946 18 761 19 761 20 946 21 598 22 598 23 1156 24 2057 25 946 26 1194 27 946 28 946 - - - Zurich 2019 M T W T F S S - - - - 1 - 2 1058 3 542 4 852 5 - 6 1517 7 1058 8 - 9 1058 10 848 11 542 12 705 13 1306 14 1058 15 1258 16 2159 17 1617 18 700 19 863 20 700 21 1258 22 1911 23 1911 24 1617 25 1258 26 2759 27 1258 28 1258 - - - End';
my $threshold = 600;
my $re = qr/
    (?:                 # start non-capture group
        Start           # literally
      |                 # OR
        \G              # continue from the last match position
    )                   # end group
    (?:(?!End).)*?      # make sure there is no "End" before the number we are looking for
    (?<!\d)             # negative lookbehind, make sure there is no digit before
    (\d{3})             # 3-digit number
    (?!\d)              # negative lookahead, make sure there is no digit after
/x;
# Retrieve all 3-digit numbers between Start and End
my @numbers = $str =~ /$re/g;
# Select numbers that are less than $threshold (600 in this case)
@numbers = grep { $_ < $threshold } @numbers;
say Dumper \@numbers;
Output:
$VAR1 = [
440,
440,
440,
440,
440,
598,
598,
598,
542,
542
];

If you're searching for a specific number, such as one that is close to 600, I would suggest using a regexp to collect all the numbers and then using an algorithm to find the matching one.
This regexp checks that your string matches the expected pattern and collects all numbers in the group "number":
^Start (([^\d]+ )*((?<number>\d+) )*)*End$
This simpler regexp collects the numbers without validating the whole string:
\d+
Iterate through your collection of numbers and pick the one you need.
Sorry, I didn't notice which language you are using, so I haven't included a code snippet.
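As a rough illustration of "collect, then select", here is a short Python sketch. It assumes that "close to 600" means the largest number that is still below the limit; that selection rule and the function name are assumptions, not something stated in the question.

```python
import re

def closest_below(text, limit):
    """Return the largest number in `text` that is still below `limit`, or None."""
    # Collect every run of digits, as the simple \d+ pattern does.
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    # Keep only the candidates under the limit, then take the largest one.
    below = [n for n in numbers if n < limit]
    return max(below) if below else None

# Hypothetical, shortened sample in the shape of the test string.
sample = "Start 440 3 956 4 603 15 598 16 1156 End"
print(closest_below(sample, 600))  # 598
```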

Related

I need to fetch all the numbers between 2 spaces after my expression

I would like to extract data from the sample data below using regex.
I have tried \d{2}/\d{4} and it matches, for example, 39/2021. I need to get the 23 that sits between the 2 spaces after it, i.e. any number between those 2 spaces after my expression.
Sample Data
Backlog 25 567 07/2022 120 2510
39/2021 23 590 08/2022 120 2630
40/2021 120 710 09/2022 120 2750
41/2021 120 830 10/2022 120 2870
42/2021 120 950 11/2022 120 2990
45/2021 120 1070 12/2022 120 3110
47/2021 120 1190 13/2022 120 3230
48/2021 120 1310 14/2022 240 3470
49/2021 120 1430 15/2022 120 3590
50/2021 120 1550 16/2022 120 3710
51/2021 120 1670 17/2022 240 3950
52/2021 120 1790 18/2022 120 4070
02/2022 120 1910 19/2022 120 4190
03/2022 120 2030 20/2022 120 4310
04/2022 120 2150 21/2022 240 4550
05/2022 120 2270 22/2022 120 4670
06/2022 120 2390 23/2022 120 4790
I have added a picture reference for the output.
You can use a capture group: match a space before the digits, and then either assert a whitespace boundary after them or match the following space.
\b\d{2}/\d{4} (\d+)(?!\S)
The pattern matches:
\b A word boundary
\d{2}/\d{4} Match 2 digits / 4 digits
(\d+) Capture 1+ digits in group 1
(?!\S) Negative lookahead, assert a whitespace boundary to the right
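If there should be a space at the left and at the right, match the trailing space instead of using the lookahead (written as [ ] here so that it stays visible):
\b\d{2}/\d{4} (\d+)[ ]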
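A small Python sketch of the capture-group approach, run against the first three rows of the sample data. Note that every row contains two week/year columns, so the pattern captures the number after each of them.

```python
import re

# Capture the digits that follow "NN/NNNN " and are themselves followed by whitespace or end of text.
PATTERN = re.compile(r"\b\d{2}/\d{4} (\d+)(?!\S)")

sample = """Backlog 25 567 07/2022 120 2510
39/2021 23 590 08/2022 120 2630
40/2021 120 710 09/2022 120 2750"""

values = PATTERN.findall(sample)
print(values)  # ['120', '23', '120', '120', '120']
```

If only the first date column of each row is of interest, one option is to apply `PATTERN.search` to each line and read `group(1)` from the result.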

add number as prefix

I have a list of numbers:
19
20
21
22
23
24
25
26
many more numbers...
I want to add one number to all of them as a prefix so that they all become three-digit numbers:
219
220
221
222
223
224
225
226
It should go like this in the find section: \S{2,}. Then what should I put in the replace section? 2$1, or something else? I am not an expert.
Find all two digits and capture them (with parentheses).
\b(\d\d)\b
Replace captured groups with an additional 2 in front.
2$1
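For comparison, the same find and replace can be sketched in Python. The helper name `add_prefix` is made up for this example; a replacement function is used instead of a template string so that the prefix is never interpreted as part of a group reference.

```python
import re

def add_prefix(text, prefix="2"):
    """Put `prefix` in front of every standalone two-digit number in `text`."""
    # \b(\d\d)\b matches exactly two digits with a word boundary on each side.
    return re.sub(r"\b(\d\d)\b", lambda m: prefix + m.group(1), text)

print(add_prefix("19\n20\n21"))
```

One-digit and three-digit numbers are left alone, because the word boundaries require the two digits to stand on their own.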

Matching across multiple lines regular expression

I have several lists in a single text file that look like the one below. Each list always starts with 0 and always ends with the word Unique at the start of a new line. I would like to get rid of all of it apart from the line with Unique on it. I looked through Stack Overflow and tried the following, but it returns the whole text file (there are other strings in the file that I haven't put in this example). Basically, the problem is how to account for the newlines in the regex selection.
^0(.|\n)*
Input:
0 145
1 139
2 175
3 171
4 259
5 262
6 293
7 401
8 430
9 417
10 614
11 833
12 1423
13 3062
14 10510
15 57587
16 5057575
17 10071
18 375
19 152
20 70
21 55
22 46
23 31
24 25
25 22
26 25
27 14
28 16
29 16
30 8
31 10
32 8
33 21
34 8
35 51
36 65
37 605
38 32
39 2
40 1
41 2
44 1
48 2
51 1
52 1
57 1
63 2
68 1
82 1
94 1
95 1
101 3
102 7
103 1
110 1
111 1
119 1
123 1
129 2
130 3
131 2
132 1
135 1
136 2
137 7
138 4
Unique: 252851
Expected output:
Unique: 252851
You need to use something like
^0[\s\S]*?[\n\r]Unique:
and replace with Unique:.
^ - start of a line
0 - a literal 0
[\s\S]*? - zero or more characters, including newlines, as few as possible
[\n\r] - a linebreak symbol
Unique: - a whole word Unique:
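The question does not say which editor is used, so here is a hedged Python sketch of the same replacement. The `re.MULTILINE` flag is needed so that ^ matches at the start of each line; the sample text and the function name are invented for illustration.

```python
import re

def collapse_to_unique(text):
    """Replace a block running from a line starting with 0 up to 'Unique:' by just 'Unique:'."""
    # [\s\S]*? crosses line breaks lazily, so the match stops at the first "Unique:" line.
    return re.sub(r"^0[\s\S]*?[\n\r]Unique:", "Unique:", text, flags=re.MULTILINE)

sample = "some header\n0 145\n1 139\n2 175\nUnique: 252851\nsome footer\n"
print(collapse_to_unique(sample))
```

Because the quantifier is lazy, several lists in the same file are handled one by one, each stopping at its own Unique: line.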
Another possible regex is:
^0[^\r]*(?:\r(?!Unique:)[^\r]*)*
where \r is the line endings in the current file. Replace with an empty string.
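Note that you could also use the (?m)^0.*?[\r\n]Unique: regex (to replace with Unique:) in an engine where the (?m) option makes the dot match newlines, as in Ruby:
m: multi-line (dot (.) matches newline)
In most other flavors, (?m) only changes the meaning of ^ and $, and (?s) is the option that lets the dot match newlines.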
Your method of matching newlines should work, although it's not optimal (alternation is rather slow); the next problem is to make sure the match stops before Unique:
(?s)^0.*(?=Unique:)
should work if there is only one Unique: in your file.
Explanation:
(?s) # Start "dot matches all" (including newlines) mode
^0 # Match "0" at the start of the file
.* # Match as many characters as possible
(?=Unique:) # but then backtrack until you're right before "Unique:"

How to find and replace in a text editor?

I am very new to text editing, so I'm sorry if this question is unclear, let me know if there's anything I can specify to make my question more understandable.
My file has 27 tab-separated columns and thousands of rows. I want to replace tabs with an underscore (basically merging the first 3 columns together), but only after my first two columns. How do I do this?
Here's what I currently have for my find:
([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([
^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^
\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\r
and then here's my replace:
\1_\2_\3\t\4\t\5\t\6\t\7\t\8\t\9\t\10\t\11\t\12\t\13\t\14\t\15\t\16\t\17\t\18\t\19\t\20\t\21\t\22\t\23\t\24\t\25\t\26\t\27\r
Also, any references to a good regex guide would be welcomed!
Below are representative data. Each number is separated by a tab in my editor, not by a space.
chr1 28404 29751 25 14 57 42 44 44 56 34 16 24 18 24 24 23 24 163 57 30 28 31 36 23 28 17
chr1 235561 236222 5 13 4 24 4 8 7 6 5 14 20 7 10 3 6 11 9 9 16 8 16 6 11 9
chr1 540455 541272 20 11 6 7 5 7 12 24 7 9 9 6 22 3 10 32 18 22 11 13 10 27 9 10
chr1 713112 715467 96 105 332 159 131 277 225 199 61 164 128 116 156 107 143 687 204 186 97 125 174 193 213 118
chr1 761657 764380 106 153 334 182 161 326 215 343 85 174 160 135 176 151 141 724 308 223 120 141 200 198 247 151
Try this
Find :
(.+?)\t(.+?)\t(.+?)\n
Replace with
\1_\2_\3\n
Moreover, you'll have to disable the ". matches newline" option in your text editor.
The search and replace below keeps the first two columns as they are and joins the third, fourth, and fifth columns with underscores:
Regex:
^(\s*(?:[^\t]+\t){2})([^\t]+)\t([^\t]+)\t
Replacement: $1$2_$3_
If you can have empty columns, replace the + quantifier with *:
^(\s*(?:[^\t]*\t){2})([^\t]*)\t([^\t]*)\t
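To see what the first of these two patterns does, here is a short Python sketch applied to one row. The row is a shortened, tab-separated version of the sample data, and `merge_columns` is a hypothetical helper name; Python writes the replacement groups as \1 rather than $1.

```python
import re

# Group 1 keeps the first two columns (with their tabs); groups 2 and 3 are the next two columns.
PATTERN = re.compile(r"^(\s*(?:[^\t]+\t){2})([^\t]+)\t([^\t]+)\t")

def merge_columns(line):
    """Replace the tabs after the third and fourth columns with underscores."""
    return PATTERN.sub(r"\1\2_\3_", line, count=1)

row = "chr1\t28404\t29751\t25\t14\t57"
print(merge_columns(row))  # chr1<TAB>28404<TAB>29751_25_14<TAB>57
```

The sketch processes one line at a time, because [^\t]+ can also match line breaks and could otherwise run across rows.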

REGEX: How to split string with space and double quote

I have an input string with spaces and double quotes, as below:
Input :
18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21
Expected Output:
18
17
16
Arc 10 12 11 13
Segment 10 23 33 32 12
23
76
21
How can I do this using Regex? Thank you in advance
You can use the following regexp:
("[^"]+")|\S+
("[^"]+") - quoted sequence.
\S+ - non whitespace sequence.
The order of the groups probably depends on the regexp implementation. In the demo engine, matching proceeds from left to right. Also, do not forget to escape special characters with a double backslash.
"(.+?)"|(\w+(?=\s|$))
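A minimal Python sketch of the quoted-or-unquoted idea. It is a variation on the patterns above, not a copy of either: the quotes are moved outside the capture group so that they are dropped from the result, which is what the expected output shows. The names `TOKEN` and `split_tokens` are made up for this example.

```python
import re

# Alternative 1: a quoted sequence (quotes excluded from the group).
# Alternative 2: any run of non-whitespace characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')

def split_tokens(text):
    """Split on spaces, but keep double-quoted sequences together and strip their quotes."""
    return [m.group(1) or m.group(2) for m in TOKEN.finditer(text)]

line = '18 17 16 "Arc 10 12 11 13" "Segment 10 23 33 32 12" 23 76 21'
for token in split_tokens(line):
    print(token)
```

The quoted alternative comes first so that an opening quote is never consumed by \S+.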