I want to find all separate words (that is, runs of characters delimited by whitespace) that are decimal numbers, optionally with a leading plus or minus sign, in a Linux terminal using egrep.
My solution:
(?<= |\n|\t)[\+\-]?[0-9]+(?= |\n|\t)
Explanation:
(?<= |\n|\t) checks that there is a space, newline, or tab before the decimal number.
(?= |\n|\t) checks that there is a space, newline, or tab after the decimal number.
This regex works in Kiki 0.5.6, where I test it, but when I copy it to the terminal it doesn't work. I think the terminal doesn't recognize the special parenthesized constructs (?= and (?<=. Am I right? How can I make it work in the terminal?
For example: my text:
1.fasfa
123asfavdsvdas156
1safsavdsvsd1sdva5s31as35d1va
595s6dva2sdvas9
asd9as5dv92s
sd559vs fs5s94 4dfs dfa4s44 459 9dasf 8sdfa 5sfa
napr. uNIveRziTA
sfaf 2262 2226 56565 adss
uNiVerZita
uNIVERZITa
123
123 sadasf 123456 sfafs 134
-1234- -25- -5- 5- --55
-
-55
123 100 999 124 6262 62 6 2 62 62 65 26565 22 62 62652 +665 +0649 ---662 265 959 595 099 199 -059 -0245 -444
--1245 -555-5-55 --555- 555-
+25
-55
+++55 +5 ++5 ++55+665+
samo samo samo samo otec otec skola skola samo lamo samo lamo
re20. (?<=(\t|\n| ))([+-])?[1-9][0-9]*(?= |$|\n)
--- ---
doma doma doma doma doma doma doma doma doma
meno.priezvisko#tuke.sk meno.priezvisko.1#tuke.sk meno.priezvisko#student.tuke.sk meno.priezvisko.2#student.tuke.sk
23:56:59.555
00:00:00.000
23:59:59.999
31/12/2099
00/12/2054
01/01/2000
matches:
459
2262
2226
56565
123
123
123456
134
-55
123
100
999
124
6262
62
6
2
62
65
26565
22
62
62652
+665
+0649
egrep does not support lookaround assertions. However, GNU grep supports Perl-compatible regular expressions (PCRE) through the -P switch:
grep -oP '(?<=\s|^)[+-]?[0-9]+(?=\s|$)' input
Note that you can simplify the alternation of space, \n and \t to \s, which stands for any whitespace character. To also match numbers that start at the beginning of a line or end at the end of a line, I've added ^ and $ as alternatives to \s.
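For readers without GNU grep, the same idea can be sketched in Python. This is only an illustration, not part of the answer above. Python's re module does not accept lookbehind alternatives of different widths such as (?<=\s|^), so the sketch swaps in the negative forms (?<!\S) and (?!\S), meaning "not preceded by" and "not followed by" a non-whitespace character, which also cover the start and end of the text. The function name find_signed_integers is made up for this example.

```python
import re

# Assumption: a "number" is an optional sign followed by digits, and it must be
# delimited by whitespace or by the start/end of the text.
# (?<!\S) = not preceded by a non-whitespace character
# (?!\S)  = not followed by a non-whitespace character
SIGNED_INTEGER = re.compile(r"(?<!\S)[+-]?[0-9]+(?!\S)")


def find_signed_integers(text):
    """Return every whitespace-delimited signed integer found in text."""
    return SIGNED_INTEGER.findall(text)


if __name__ == "__main__":
    sample = "sd559vs fs5s94 4dfs dfa4s44 459 9dasf\n-1234- -25- --55\n+++55 +5 ++5\n-55"
    print(find_signed_integers(sample))
```

On the sample string in the sketch, only 459, +5 and -55 qualify; tokens such as -25- or ++5 are rejected because they touch a non-whitespace character.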
Related
Goal:
Match all variations of phone numbers with 8 digits + (optional) country code.
Stop match when "keyword" is found, even if more matches exist after the "keyword".
I need this as a one-liner. I have tried many variations with lookahead/lookbehind and a negated class [^keyword], but I can't work out how to achieve it.
Example of text:
abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
I Want It To Stop Matching Here Or Right Before The "keyword"
more nice text with some matches
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Example of regex:
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})[^keyword]
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?!keyword)
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?=keyword)
-> This matches nothing
((\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?:(?!keyword))*)
-> This matches all numbers also below the keyword
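A likely reason the attempts above behave this way is that a lookahead such as (?!keyword) or (?=keyword) only inspects the characters immediately after each individual match; it does not know whether the keyword occurred somewhere earlier in the text. One possible way around this, sketched in Python rather than as a single regex, is to cut the text at the first occurrence of the keyword and search only the part before it. The function name phone_numbers_before and its stop_word parameter are assumptions for illustration; the phone pattern is the one from the question, unchanged.

```python
import re

# Phone pattern copied from the question.
PHONE = re.compile(
    r"(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})"
)


def phone_numbers_before(text, stop_word="keyword"):
    """Return phone-like matches that occur before the first stop_word."""
    # str.partition returns the whole text as `head` when stop_word is absent.
    head, _, _ = text.partition(stop_word)
    # The pattern may swallow one leading whitespace character, so strip it.
    return [m.group(0).strip() for m in PHONE.finditer(head)]
```

Anything after the first "keyword" is never seen by the regex, so later numbers cannot match.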
I have this regex for phone numbers in New Zealand:
^\+?[\d\s\(\)]{1,14}$|^((\+?64\s*[\(]?2\d{1}[\)]?|\(?02\d{1}\)?)\s*\d{3}\s*\d{3,5})$
I want to allow an empty string as well so I do what the internet says to do (add ^$| to the front):
As you can see it makes most of the passing numbers fail. It does the same thing when I add brackets to the front.
How do I allow empty strings and also phone numbers using the expression at the top of this question?
Please copy paste this into regexr.com to experiment with possible solutions:
expression:
^\+?[\d\s\(\)]{1,14}$|^((\+?64\s*[\(]?2\d{1}[\)]?|\(?02\d{1}\)?)\s*\d{3}\s*\d{3,5})$
Text:
Positive:
021 755 2375
+79261234567
9261234567
+1234567
89261234567
4035555678
23423
3454
021 2343234
926 3 4
1 416 555 9292
926 1234567
495 1234567
+7 555 1234567
+7(926)1234567
(926) 1234567
469 123 45 67
0800 345345786
09 419 7555
0800 475 4669
202 555 4567
Negative:
027 .343 -454
8 800 600-APPLE
+42 555.123.4567
926.123.4567
64 25 .435 -34323
025 .435 -343
123-4567
123-89-01
+1-(800)-123-4567
8 (926) 1234567
415-555-1234
650-555-2345
(416)555-3456
09-419 7555
364563456345645643565346768
Use grouping:
(?:^$)|(?:^\+?[\d\s\(\)]{1,14}$|^((\+?64\s*[\(]?2\d{1}[\)]?|\(?02\d{1}\)?)\s*\d{3}\s*\d{3,5})$)
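The grouped pattern can also be checked with a short Python sketch. The helper name is_valid_phone_or_empty is hypothetical, and the sketch assumes each value is validated on its own (one string per call) rather than as a multi-line block of text.

```python
import re

# Original expression from the question, split over two lines for readability.
NZ_PHONE = (
    r"^\+?[\d\s\(\)]{1,14}$"
    r"|^((\+?64\s*[\(]?2\d{1}[\)]?|\(?02\d{1}\)?)\s*\d{3}\s*\d{3,5})$"
)

# Grouped as in the answer: either the empty string or the phone expression.
PHONE_OR_EMPTY = re.compile(r"(?:^$)|(?:" + NZ_PHONE + r")")


def is_valid_phone_or_empty(value):
    """Return True when value is empty or matches the phone expression."""
    return PHONE_OR_EMPTY.match(value) is not None
```

With this grouping, the empty string and entries from the "Positive" list such as 021 755 2375 are accepted, while entries from the "Negative" list such as 123-4567 are still rejected.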
Given the following list of phone numbers
8144658695
812 673 5748
812 453 6783
812-348-7584
(617) 536 6584
834-674-8595
Write a single regular expression (use vim on loki) to reformat the numbers so they look like this
814 465 8695
812 673 5748
812 453 6783
812 348 7584
617 536 6584
834 674 8595
I am using the search and replace command. My regular expression using back referencing:
:%s/\(\d\d\d\)\(\d\d\d\)\(\d\d\d\d\)/\1 \2 \3\g
only formats the first line.
Any ideas?
Try this:
:%s,.*\(\d\d\d\).*\(\d\d\d\).*\(\d\d\d\d\).*,\1 \2 \3,
First, use a count to match a pattern multiple times; repeating the pattern by hand is a bad habit:
\d\{3} "instead of \d\d\d
Then you also have to match the whitespace and other separators:
:%s/.*\(\d\{3}\).*\(\d\{3}\).*\(\d\{4}\).*/\1 \2 \3/g
Or, even better, switch the whole regex to "very magic" mode with \v so the groups and counts need no backslashes:
:%s/\v.*(\d{3}).*(\d{3}).*(\d{4}).*/\1 \2 \3/g
This greatly increases readability.
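The same substitution can be tried outside vim. The sketch below is a Python illustration, not a vim command. It assumes each line holds exactly ten digits, and it uses \D* (a run of non-digits) between the groups instead of .* so that the separators are skipped explicitly. The function name normalize_phone is made up for this example.

```python
import re

# Three digit groups (3-3-4) separated by any run of non-digit characters.
PHONE_LINE = re.compile(r"^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$")


def normalize_phone(line):
    """Rewrite a ten-digit phone number as 'ddd ddd dddd'."""
    return PHONE_LINE.sub(r"\1 \2 \3", line)


if __name__ == "__main__":
    for raw in ["8144658695", "812-348-7584", "(617) 536 6584"]:
        print(normalize_phone(raw))
```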
675185538end432 204 9/9 4709 908 2
343269172end430 3 43 9335 975 7
590144128end89 7 29 3-5-4 420 2
337460105end8Y5 7A 78 2 23
292484648end70 A53 03 9235 93
These are the strings that I am working with. I want to find a regex to replace the above strings as follows
675185538
432 204 9/9 4709 908 2
343269172
430 3 43 9335 975 7
590144128
89 7 29 3-5-4 420 2
337460105
8Y5 7A 78 2 23
292484648
70 A53 03 9235 93
Wherever "end" occurs, it should be replaced by \r\n.
The string before "end" is numeric; the string after "end" is alphanumeric with whitespace characters.
I am using Notepad++.
To make the match strict, try this:
Find: ^(\d+)end(\w)
Replace: \1\r\n\2
This captures, then puts back via back references, the preceding number between start of line and "end" and the following digit/letter. This won't match "end" elsewhere.
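For comparison, here is a minimal Python sketch of the same find/replace pair. It is not Notepad++ syntax; the function name split_at_end_marker is hypothetical, and re.MULTILINE is assumed so that ^ matches at the start of every line.

```python
import re

# Same pattern as the Notepad++ search above.
END_MARKER = re.compile(r"^(\d+)end(\w)", re.MULTILINE)


def split_at_end_marker(text):
    """Replace 'end' between a leading number and the next word character with CRLF."""
    return END_MARKER.sub(r"\1\r\n\2", text)
```

Because the pattern is anchored to a leading number, an "end" inside an ordinary word later in the line is left alone.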
Kludgery:
Find (\d\d\d\d\d\d\d\d\d)end(\d)
Replace \1\r\n\2
Find creates two capture groups:
each group is bounded by an ( and a )
one capture group matches exactly nine numerals
the other capture group matches exactly one numeral.
In the replace:
the first capture group is referenced with \1
and the second group with \2.
It's perhaps quite simple, but I can't figure it out:
I have a random number (can be 1,2,3 or 4 digits)
It's repeating on a second line:
2131
2131
How can I remove the first number?
EDIT: Sorry I didn't explain it better. These lines are in a plain text file, and I'm using BBEdit as my editor. The actual file looks like this (only with approx. 10,000 lines):
336
336
rinde
337
337
diving
338
338
graffiti
339
339
forest
340
340
mountain
If possible the result should look like this:
336 - rinde
337 - diving
338 - graffiti
339 - forest
340 - mountain
Search:
^(\d{1,4})\n(?:\1\n)+([a-z]+$)
Replace:
\1 - \2
I don't have access to BBEdit, but apparently you have to check the "Grep" option to enable regex search-n-replace. (I don't know why they call it that, since it seems to be powered by the PCRE library, which is much more powerful than grep.)
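Since I can't test in BBEdit, here is a small Python sketch that exercises the same search and replace pair. It assumes Unix line endings (\n) and lowercase labels, as in the sample; the function name collapse_numbered_labels is made up for illustration.

```python
import re

# Same search pattern as above; re.MULTILINE lets ^ and $ match at line boundaries.
REPEATED_NUMBER = re.compile(r"^(\d{1,4})\n(?:\1\n)+([a-z]+$)", re.MULTILINE)


def collapse_numbered_labels(text):
    """Turn 'N / N / label' line groups into a single 'N - label' line."""
    return REPEATED_NUMBER.sub(r"\1 - \2", text)
```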
Since you didn't mention a programming language or tool, I assume the numbers are in a file, one per line, and that repeated numbers are on adjacent lines. The uniq command can solve your problem:
kent$ echo "1234
dquote> 1234
dquote> 431
dquote> 431
dquote> 222
dquote> 222
dquote> 234"|uniq
1234
431
222
234
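If uniq is not available, the same behaviour (collapsing runs of identical adjacent lines) can be sketched in Python with itertools.groupby. The function name drop_adjacent_duplicates is hypothetical.

```python
from itertools import groupby


def drop_adjacent_duplicates(lines):
    """Keep one copy of each run of identical neighbouring lines, like uniq."""
    # groupby yields one (key, group) pair per run of equal consecutive items.
    return [key for key, _group in groupby(lines)]


if __name__ == "__main__":
    print(drop_adjacent_duplicates(["1234", "1234", "431", "431", "222", "222", "234"]))
```

Like uniq, this only removes duplicates that are next to each other; a value that reappears later is kept.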
Another way: find /^(\d{1,4})\n(?=\1$)/ and replace it with "" (the empty string),
using the modifiers mg (multi-line and global).
$str =
'1234
1234
431
431
222
222
222
234
234';
$str =~ s/^(\d{1,4})\n(?=\1$)//mg;
print $str;
Output:
1234
431
222
234
Added: on the revised sample, you could do something like this:
Find: /(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/
Replace: $1 - $2
Mods: /mg (multi-line, global)
Test:
$str =
'
336
336
rinde
337
337
337
diving
338
338
graffiti
339
337
339
forest
340
340
mountain
';
$str =~ s/(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/$1 - $2/mg;
print $str;
Output:
336 - rinde
337 - diving
338 - graffiti
339
337
339 - forest
340 - mountain
Added 2: I was more impressed with the OP's later desired output format than with the original question. It has many elements to it, so, unable to control myself, I generated a way too complicated regex.
Search: /^(\d{1,4})\n+(?:\1\n+)*\s*(?:((?:(?:\w|[^\S\n])*[a-zA-Z](?:\w|[^\S\n])*))\s*(?:\n|$)|)/
Replace: $1 - $2\n
Modifiers: mg (multi-line, global)
Expanded-
# Find:
s{ # Find a single unique digit pattern on a line (group 1)
^(\d{1,4})\n+ # Grp 1, capture a digit sequence
(?:\1\n+)* # Optionally consume the sequence many times,
\s* # and whitespaces (cleanup)
# Get the next word (group 2)
(?:
# Either find a valid word
( # Grp2
(?:
(?:\w|[^\S\n])* # Optional \w or non-newline whitespaces
[a-zA-Z] # with at least one alpha character
(?:\w|[^\S\n])*
)
)
\s* # Consume whitespaces (cleanup),
(?:\n|$) # a newline
# or, end of string
|
# OR, dont find anything (clears group 2)
)
}
# Replace (rewrite the new block)
{$1 - $2\n}xmg; # modifiers expanded, multi-line, global
find:
((\d{1,4})\r(\D{1,10}))|(\d{1,6})
replace:
\2 - \3
You should be able to clean it up from there quite easily!
Detecting such a pattern is not possible using regexp.
You can split the string on "\n" and then compare adjacent lines.
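A minimal sketch of that split-and-compare approach, in Python, assuming the layout from the question (a number repeated on adjacent lines, followed by a label line). The function name merge_numbered_labels is made up for illustration.

```python
def merge_numbered_labels(text):
    """Split on newlines, drop adjacent duplicates, then join 'number - label'."""
    lines = [line for line in text.split("\n") if line.strip()]

    # Compare each line with the previously kept one and skip repeats.
    deduped = []
    for line in lines:
        if not deduped or deduped[-1] != line:
            deduped.append(line)

    # Pair each number with the label that directly follows it, if any.
    merged = []
    i = 0
    while i < len(deduped):
        current = deduped[i]
        following = deduped[i + 1] if i + 1 < len(deduped) else None
        if current.isdigit() and following is not None and not following.isdigit():
            merged.append(f"{current} - {following}")
            i += 2
        else:
            merged.append(current)
            i += 1
    return merged
```

Numbers that are not followed by a label are passed through unchanged.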