Regex split on white space except ones with a colon - regex

I have the following string
s = "hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0"
I would like to use a regex to split that line up so it becomes
s = ['hiack: 18', 'seqno: 37', 'cwnd: 20.000', 'ssthresh: 200', 'dupacks: 0']
What regex pattern should I use to achieve this?
Edit: I am using Python, in case that makes a difference.

% nodejs
> s = "hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0"
'hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0'
> s.split(/\b\s+\b/)
[ 'hiack: 18',
'seqno: 37',
'cwnd: 20.000',
'ssthresh: 200',
'dupacks: 0' ]
>
Even if you are not using JavaScript, you can reuse the regex.
It relies on \b (word boundaries): the whitespace after a colon has no word boundary on its left, so it is never a split point.
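Since the question mentions Python, here is a minimal sketch of the same pattern with the standard re module. It assumes, as in the sample, that every value ends in a word character and every key starts with one.

```python
import re

s = "hiack: 18 seqno: 37 cwnd: 20.000 ssthresh: 200 dupacks: 0"

# Split on whitespace that has a word character on both sides.
# The space after ":" is skipped because ":" is not a word character.
parts = re.split(r"\b\s+\b", s)
print(parts)
```

If a value could end in punctuation, a negative lookbehind such as (?<!:)\s+ may be a safer split pattern.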

Related

How to preg_replace all digits except specific numbers?

It is necessary to clean a string from everything but English letters, spaces and specific numbers (eg 18,19,20 should be kept in the string).
Please help me with regex /([^a-zA-Z\s])/ to keep the specified numbers.
You can list the numbers that you want to keep between word boundaries, for example, and then make use of (*SKIP)(*F) so that those matches are skipped rather than replaced:
\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+
Regex demo
$pattern = "/\b(?:1[89]|20)\b(*SKIP)(*F)|[^a-zA-Z\s]+/";
$s="test 18 119 19 50 20 ##$##%";
echo preg_replace($pattern, "", $s);
Output
test 18 19 20
Using the PREG_SPLIT_DELIM_CAPTURE option with preg_split:
$s="test 18 119 19 50 20 ##$##%";
echo implode('', preg_split('~\b(1[89]|20)\b|[^a-z\s]+~', $s, -1, PREG_SPLIT_DELIM_CAPTURE));
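Python's built-in re module does not support (*SKIP)(*F), so the following is only a sketch of how the capture-and-keep idea from the preg_split variant might be emulated there. The names KEEP_OR_STRIP and clean are made up for illustration.

```python
import re

# Alternative 1 captures a protected number; alternative 2 matches junk.
KEEP_OR_STRIP = re.compile(r"\b(1[89]|20)\b|[^a-zA-Z\s]+")

def clean(s):
    # If group 1 matched, it is a protected number: put it back.
    # Otherwise the match is unwanted characters: drop them.
    return KEEP_OR_STRIP.sub(lambda m: m.group(1) or "", s)

cleaned = clean("test 18 119 19 50 20 ##$##%")
# Removed tokens leave their surrounding spaces behind, so collapse them.
print(" ".join(cleaned.split()))
```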

Regex match multiple numbers stop at string (word) despite more matches exist

Goal:
Match all variations of phone numbers with 8 digits plus an optional country code.
Stop matching when "keyword" is found, even if more matches exist after the "keyword".
I need this as a one-liner. I have tried a plethora of variations with lookahead/lookbehind and the negated class [^keyword], but I cannot work out how to achieve this.
Example of text:
abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
I Want It To Stop Matching Here Or Right Before The "keyword"
more nice text with some matches
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Example of regex:
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})[^keyword]
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?!keyword)
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?=keyword)
-> This matches nothing
((\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?:(?!keyword))*)
-> This matches all numbers also below the keyword
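One possible way around this, sketched in Python under the assumption that pre-processing the input is acceptable: cut the text at the first "keyword" and run the unchanged pattern on what is left. The variable names and the shortened sample text are illustrative only.

```python
import re

text = """abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
more nice text with some matches
cat 132 23 564
+17 98 56 32 56
"""

# The phone pattern from the question, unchanged.
PHONE = re.compile(
    r"(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})"
)

# Keep only the text before the first "keyword", then match as usual.
head = text.split("keyword", 1)[0]
numbers = [m.group(0).strip() for m in PHONE.finditer(head)]
print(numbers)
```

The last two statements can be folded into a single expression if a one-liner is required.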

Match number ending in 1 except when ending in 11

I need to match any number ending in 1 except numbers ending in 11. I use awk. To illustrate, the correctly working lines are:
if ( max ~ /1$/ && max !~ /11$/ ) { print max }
or using regex:
if ( max ~ /[^1]1$|^1$/ ) { print max }
or a much slower variant of the same regex:
([^1]|^)1$
I actually suspect just this one part (with a modification) should work somehow. It is nice, short and readable, does the job in far fewer steps than the combinations above, and works for all numbers with 2 or more digits, but it fails for 1 itself. I fixed that above, but would prefer a better one (if there is one). I actually need it to work for 1- to 3-digit numbers, but would prefer not to limit it.
[^1]1$
As soon as I try quantifiers to fix it, it stops working correctly. It either starts picking up leading 1s (e.g. 1211 is matched and it should not be) or loses the single-digit number 1 as a match. Obviously, my problem lies in the fact that I must match the end of the number. How can I make a better regex?
Test cases:
Matching numbers are:
1
21
31
121
131
1021
skip (not match) numbers ending in 11 like:
11
111
211
1011
1211
Can't you just use arithmetic? I believe it is quicker than regex parsing.
If you know max is a number:
if ( max%10 == 1 && max%100 != 11 ) { print max }
If you do not know max is a number:
if ( max+0==max && max%10 == 1 && max%100 != 11 ) { print max }
If you want a regex, you can use ^[0-9]*[02-9]1$|^1$ but this is just an extension of RavinderSingh13's answer to make sure it is a number.
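For comparison, here is a small Python sketch of both checks (the modulo test and the regex), run against the test cases from the question. The function and constant names are hypothetical.

```python
import re

def ends_in_1_not_11(n):
    """Arithmetic check: last digit is 1, last two digits are not 11."""
    return n % 10 == 1 and n % 100 != 11

# Same idea as ^[0-9]*[02-9]1$|^1$, applied with fullmatch to digit strings.
ENDS_IN_1_NOT_11 = re.compile(r"[0-9]*[02-9]1|1")

should_match = [1, 21, 31, 121, 131, 1021]
should_skip = [11, 111, 211, 1011, 1211]

for n in should_match + should_skip:
    by_math = ends_in_1_not_11(n)
    by_regex = ENDS_IN_1_NOT_11.fullmatch(str(n)) is not None
    print(n, by_math, by_regex)
```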
If your Input_file is the same as the sample shown, then the following awk may help you here.
awk '/[02-9]1$/||/^1$/' Input_file
Let's say the following is the sample Input_file.
cat Input_file
1
2001
21
31
121
131
1021
11
111
211
1011
1211
Then the following will be the output after running the code.
awk '/[02-9]1$/||/^1$/' Input_file
1
2001
21
31
121
131
1021

PCRE Regex Matching Patterns until next pattern is found

I'm struggling to find a solution to this regex problem, which appears to be fairly straightforward. I need to match a pattern that follows another matching pattern.
I need to capture the "Mean:" value that follows "Kerberos-wsfed" in the following:
Kerberos:
Historical:
Between 26 and 50 milliseconds: 10262
Between 50 and 100 milliseconds: 658
Between 101 and 200 milliseconds: 9406
Between 201 and 500 milliseconds: 6046
Between 501 milliseconds and 1 second: 1646
Between 1 and 5 seconds: 1399
Between 6 and 10 seconds: 13
Between 11 and 30 seconds: 34
Between 31 seconds and 1 minute: 7
Between 1 minute and 2 minutes: 1
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Between 50 and 100 milliseconds: 129
Between 101 and 200 milliseconds: 650
Between 201 and 500 milliseconds: 411
Between 501 milliseconds and 1 second: 171
Between 1 and 5 seconds: 119
Between 6 and 10 seconds: 4
Between 11 and 30 seconds: 6
Between 1 minute and 2 minutes: 1
Mean: 176, Mode: 33, Median: 37
Total: 4642
I can match (?:Kerberos-wsfed:) and I can match Mean:, but I need the value of Mean that comes after Kerberos-wsfed and am having difficulty. Thanks for the assistance.
Use the regex
Kerberos-wsfed[\s\S]*?Mean: *(\d+)
The mean value is contained in the capturing group 1, that is $1 or \1 depending on your programming language.
See demo.
Try this regular expression: #Kerberos-wsfed:.+?Mean:\s+(\d+)#s
You can use just a space or \s instead of \s+ if you're sure of the file format.
The value 176 will be in group 1 of the match.
Demo: https://regex101.com/r/gwkUPJ/1
Using capturing group:
Kerberos-wsfed:[\s\S]*Mean:\s(\d+)
Kerberos-wsfed: matches the literal as-is
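[\s\S]* allows any number of characters in between (including line delimiters). Because it is greedy, it runs to the last Mean: in the input; use [\s\S]*? if other sections follow Kerberos-wsfed.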
Mean:\s matches the literal Mean followed by a space \s
Finally (\d+) which is wrapped in the first capturing group captures the value you are looking for. It essentially allows any number of digits
Regex 101 Demo
The value that you are looking for (176) will be in the first capturing group which is $1 or the first one based on your language. For instance, in PHP:
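// $str is assumed to hold the log text from the question.
$re = '/Kerberos-wsfed:[\s\S]*Mean:\s(\d+)/';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo $matches[0][1];
// Output: 176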
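The lazy variant from the first answer can be tried out in Python as follows. This is a minimal sketch on a shortened copy of the log; the variable names are illustrative.

```python
import re

log = """Kerberos:
Historical:
Between 26 and 50 milliseconds: 10262
Mean: 268, Mode: 36, Median: 123
Total: 29472
Kerberos-wsfed:
Historical:
Between 26 and 50 milliseconds: 3151
Mean: 176, Mode: 33, Median: 37
Total: 4642
"""

# Lazy [\s\S]*? stops at the first "Mean:" after "Kerberos-wsfed".
m = re.search(r"Kerberos-wsfed[\s\S]*?Mean: *(\d+)", log)
mean = m.group(1) if m else None
print(mean)
```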

Regex for removing repeating numbers on different lines [duplicate]

It's perhaps quite simple, but I can't figure it out:
I have a random number (can be 1,2,3 or 4 digits)
It's repeating on a second line:
2131
2131
How can I remove the first number?
EDIT: Sorry I didn't explain it better. These lines are in a plain text file. I'm using BBEdit as my editor. The actual file looks like this (only with approx. 10,000 lines):
336
336
rinde
337
337
diving
338
338
graffiti
339
339
forest
340
340
mountain
If possible the result should look like this:
336 - rinde
337 - diving
338 - graffiti
339 - forest
340 - mountain
Search:
^(\d{1,4})\n(?:\1\n)+([a-z]+$)
Replace:
\1 - \2
I don't have access to BBEdit, but apparently you have to check the "Grep" option to enable regex search-n-replace. (I don't know why they call it that, since it seems to be powered by the PCRE library, which is much more powerful than grep.)
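Outside BBEdit, the same search and replace can be tried in Python. This is a sketch on a shortened sample, assuming Unix line endings and lowercase words as in the question.

```python
import re

text = "336\n336\nrinde\n337\n337\ndiving\n338\n338\ngraffiti\n"

# A number line, one or more repeats of it, then a lowercase word.
# re.MULTILINE lets ^ and $ match at every line start and end.
pattern = re.compile(r"^(\d{1,4})\n(?:\1\n)+([a-z]+)$", re.MULTILINE)
result = pattern.sub(r"\1 - \2", text)
print(result)
```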
Since you didn't mention any programming language or tools, I assume those numbers are in a file, one per line, and that any repeated numbers are on neighbouring lines. The uniq command can solve your problem:
kent$ echo "1234
dquote> 1234
dquote> 431
dquote> 431
dquote> 222
dquote> 222
dquote> 234"|uniq
1234
431
222
234
Another way find: /^(\d{1,4})\n(?=\1$)/ replace: ""
modifiers mg (multi-line and global)
$str =
'1234
1234
431
431
222
222
222
234
234';
$str =~ s/^(\d{1,4})\n(?=\1$)//mg;
print $str;
Output:
1234
431
222
234
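The lookahead pattern is not Perl-specific. A rough Python equivalent, using the same test data, might look like this:

```python
import re

data = "1234\n1234\n431\n431\n222\n222\n222\n234\n234"

# Remove a number line when the very next line is the same number.
# The lookahead (?=\1$) checks the next line without consuming it,
# so runs of three or more repeats collapse one line at a time.
deduped = re.sub(r"^(\d{1,4})\n(?=\1$)", "", data, flags=re.MULTILINE)
print(deduped)
```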
Added: On the revised sample, you could do something like this:
Find: /(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/
Replace: $1 - $2
Mods: /mg (multi-line, global)
Test:
$str =
'
336
336
rinde
337
337
337
diving
338
338
graffiti
339
337
339
forest
340
340
mountain
';
$str =~ s/(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/$1 - $2/mg;
print $str;
Output:
336 - rinde
337 - diving
338 - graffiti
339
337
339 - forest
340 - mountain
Added2 - I was more impressed with the OP's later desired output format than with the original question. It has many elements to it, so, unable to control myself, I generated a way too complicated regex.
Search: /^(\d{1,4})\n+(?:\1\n+)*\s*(?:((?:(?:\w|[^\S\n])*[a-zA-Z](?:\w|[^\S\n])*))\s*(?:\n|$)|)/
Replace: $1 - $2\n
Modifiers: mg (multi-line, global)
Expanded-
# Find:
s{ # Find a single unique digit pattern on a line (group 1)
^(\d{1,4})\n+ # Grp 1, capture a digit sequence
(?:\1\n+)* # Optionally consume the sequence many times,
\s* # and whitespaces (cleanup)
# Get the next word (group 2)
(?:
# Either find a valid word
( # Grp2
(?:
(?:\w|[^\S\n])* # Optional \w or non-newline whitespaces
[a-zA-Z] # with at least one alpha character
(?:\w|[^\S\n])*
)
)
\s* # Consume whitespaces (cleanup),
(?:\n|$) # a newline
# or, end of string
|
# OR, dont find anything (clears group 2)
)
}
# Replace (rewrite the new block)
{$1 - $2\n}xmg; # modifiers expanded, multi-line, global
find:
((\d{1,4})\r(\D{1,10}))|(\d{1,6})
replace:
\2 - \3
You should be able to clean it up from there quite easily!
Detecting such a pattern is not possible with a strictly regular expression; it needs backreferences such as \1.
Alternatively, you can split the string on "\n" and then compare adjacent lines.
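The split-and-compare idea can be sketched in Python without any regex. The function name collapse_repeats is hypothetical, and the sketch only drops a line that repeats the one directly before it.

```python
def collapse_repeats(text):
    """Drop any line that is identical to the line right before it."""
    kept = []
    for line in text.split("\n"):
        if not kept or line != kept[-1]:
            kept.append(line)
    return kept

sample = "336\n336\nrinde\n337\n337\ndiving"
print(collapse_repeats(sample))
```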