Delete numerals at the end but keep dates and text - regex

I'm a beta tester for a hockey game and sometimes the schedules I get are fouled up. Can anyone help this Notepad-challenged newbie?
Turn this:
19;10;2012;Oklahoma City Barons;San Antonio Rampage323
19;10;2012;Milwaukee Admirals;Charlotte Checkers572
19;10;2012;Manchester Monarchs;Providence Bruins002
19;10;2012;Albany Devils;Syracuse Crunch579
Into this:
19;10;2012;Oklahoma City Barons;San Antonio Rampage
19;10;2012;Milwaukee Admirals;Charlotte Checkers
19;10;2012;Manchester Monarchs;Providence Bruins
19;10;2012;Albany Devils;Syracuse Crunch
Thanks!

To teach you some regex...
First you can match digits with \d
Secondly, you can "anchor" the match, the $ means "the end of the string"
Finally, you want to specify 1 or more digits, so you add the + quantifier to the \d token I mentioned earlier to create \d+
If the numbers are not ALWAYS at the end, you can make the digits optional with * ('0 or more'): \d*
Full regex: \d+$ or \d*$

Assuming Perl:
perl -pe 's/\d+$//' file > newfile
Where file is the file with the numbers and newfile is the corrected output. Note the -p switch: unlike -n, it prints each line after the substitution has run.
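For comparison, here is a minimal Python sketch of the same substitution. The two sample lines come from the question; the variable names are only illustrative.

```python
import re

lines = [
    "19;10;2012;Oklahoma City Barons;San Antonio Rampage323",
    "19;10;2012;Albany Devils;Syracuse Crunch579",
]

# \d+$ means one or more digits anchored at the end of the string.
# The date fields at the start are untouched because they are not at the end.
cleaned = [re.sub(r"\d+$", "", line) for line in lines]

for line in cleaned:
    print(line)
```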

How to split text into "steps" using regex in perl?

I am trying to split texts into "steps"
Let's say my text is
my $steps = "1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!";
I'd like the output to be:
"1.Do this."
"2.Then do that."
"3.And then maybe that."
"4.Complete!"
I'm not really that good with regex so help would be great!
I've tried many combinations like:
split /(\s\d.)/
But it splits the numbering away from the text.
I would indeed use split. But you need to exclude the digit from the match by using a lookahead.
my @steps = split /\s+(?=\d+\.)/, $steps;
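The same lookahead idea can be sketched in Python with re.split. This is only an illustration using the sample string from the question, not a drop-in replacement for the Perl line.

```python
import re

steps = "1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!"

# Split on whitespace only where it is followed by "<digits>."
# The lookahead (?=...) checks for the number without consuming it,
# so each number stays attached to its own step.
parts = re.split(r"\s+(?=\d+\.)", steps)

for part in parts:
    print(part)
```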
All step-descriptions start with a number followed by a period and then have non-numbers, until the next number. So capture all such patterns
my @s = $steps =~ / [0-9]+\. [^0-9]+ /xg;
say for @s;
This works only if there are surely no numbers in the steps' description, like any approach relying on matching a number (even if followed by a period, for decimal numbers)†
If there may be numbers in there, we'd need to know more about the structure of the text.
Another delimiting pattern to consider is punctuation that ends a sentence (. and ! in these examples), if there are no such characters in steps' description and there are no multiple sentences
my @s = $steps =~ / [0-9]+\. .*? [.!] /xg;
Augment the list of patterns that end an item's description as needed, say with a ?, and/or ." sequence as punctuation often goes inside quotes.‡
If an item can have multiple sentences, or use end-of-sentence punctuation mid-sentence (as a part of a quotation perhaps) then tighten the condition for an item's end by combining footnotes -- end-of-sentence punctuation and followed by number+period
my @s = $steps =~ /[0-9]+\. .*? (?: \."|\!"|[.\!]) (?=\s+[0-9]+\. | \z)/xg;
If this isn't good enough either then we'd really need a more precise description of that text.
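As a rough Python sketch of that last, tightened pattern: the test string below is made up for illustration (it is not from the question) and deliberately contains a decimal number, a multi-sentence step, and punctuation inside quotes. Python spells the end-of-string anchor \Z rather than \z.

```python
import re

# Item = number + period, then lazily anything, then an end-of-sentence
# mark (optionally closing a quote), followed by the next step or the end.
ITEM = re.compile(
    r"""
    [0-9]+\.                  # step number and its period
    .*?                       # description, as short as possible
    (?: \." | !" | [.!] )     # sentence end, possibly inside a quote
    (?= \s+[0-9]+\. | \Z )    # next step follows, or end of string
    """,
    re.VERBOSE,
)

# Hypothetical input, chosen to exercise the awkward cases.
text = '1.Pay $2.50 now. Then wait. 2.Say "done!" 3.Complete!'

items = ITEM.findall(text)
for item in items:
    print(item)
```

The period in $2.50 and the one after "now" are both rejected because neither is followed by whitespace plus a "number." sequence, which is what lets a step span several sentences.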
† An approach using a "numbers-period" pattern to delimit item's description, like
/ [0-9]+\. .*? (?=\s+[0-9]+\. | \z) /xg;
(or in a lookahead in split) fails with text like
1. Only $2.50   or   1. Version 2.4.1   ...
‡ To include text like 1. Do "this." and 2. Or "that!" we'd want
/ [0-9]+\. .*? (?: \." | !" | [.!?]) /xg;
The following sample code shows how a regex can fill the %steps hash in one line of code.
Once you have the data, you can slice and dice it any way you like.
Check the sample against your own data: the pattern requires every step to end with a period, so step 4 (which ends with !) is not captured, as the output shows.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my($str,%steps,$re);
$str = '1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!';
$re = qr/(\d+)\.(\D+)\./;
%steps = $str =~ /$re/g;
say Dumper(\%steps);
say "$_. $steps{$_}" for sort keys %steps;
Output
$VAR1 = {
'1' => 'Do this',
'2' => 'Then do that',
'3' => 'And then maybe that'
};
1. Do this
2. Then do that
3. And then maybe that
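A comparable sketch in Python, using a dict instead of a hash. It differs from the Perl sample on purpose in two ways, both my own assumptions about what is wanted: the closing character may be . or !, so step 4 is kept, and the keys are converted to integers so they sort numerically.

```python
import re

text = "1.Do this. 2.Then do that. 3.And then maybe that. 4.Complete!"

# (\d+)   step number
# (\D+?)  description: non-digits, as few as possible
# [.!]    closing period or exclamation mark (not captured)
pairs = re.findall(r"(\d+)\.(\D+?)[.!]", text)

# Integer keys, so that a step 10 would sort after step 9.
steps = {int(num): desc for num, desc in pairs}

for num in sorted(steps):
    print(f"{num}. {steps[num]}")
```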

Regex match until third occurrence of a char is found, counting occurrence of said char starting from the end of string

Let's dive in. Input:
p9_rec_tonly_.cr_called.seg
p9_tonly_.cr_called.seg
p10_nor_nor_.cr_called.seg
p10_rec_tn_.cr_called.seg
p10_tn_.cr_called.seg
p26_rec_nor_nor_.cr_called.seg
p26_rec_tn_.cr_called.seg
p26_tn_.cr_called.seg
Desired output :
p9_rec
p9
p10_nor
p10_rec
p10
p26_rec_nor
p26_rec
p26
Starting from the beginning of my string, I need to match up to the third occurrence of "_" (underscore), where the occurrences are counted from the end of the string.
Any tips are appreciated,
Best regards
I believe this regex should do the trick!
^.*?(?=_[^_]*_[^_]*_[^_]*$)
Explanation:
^ the start of the line
.*? matches as few characters as possible (lazy), expanding only until the lookahead succeeds
(?=...) asserts that its contents follow our match
_[^_]*_[^_]*_[^_]* Looks for exactly three underscores after our match.
$ the end of the line
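To try the pattern out, here is a short Python sketch on three of the sample names from the question. The fallback of returning the whole name when there are fewer than three underscores is my own assumption.

```python
import re

names = [
    "p9_rec_tonly_.cr_called.seg",
    "p9_tonly_.cr_called.seg",
    "p26_rec_nor_nor_.cr_called.seg",
]

# Lazy prefix, then a lookahead requiring exactly three more
# underscore-led chunks before the end of the string.
PREFIX = re.compile(r"^.*?(?=_[^_]*_[^_]*_[^_]*$)")

prefixes = []
for name in names:
    m = PREFIX.match(name)
    # Assumption: keep the name unchanged if it has fewer than 3 underscores.
    prefixes.append(m.group(0) if m else name)

for prefix in prefixes:
    print(prefix)
```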
You should think beyond regex to solve this problem. For example, if you are using Python just use rsplit with a limit of 3 and get the first resulting string:
>>> data = [
...     'p9_rec_tonly_.cr_called.seg',
...     'p9_tonly_.cr_called.seg',
...     'p10_nor_nor_.cr_called.seg',
...     'p10_rec_tn_.cr_called.seg',
...     'p10_tn_.cr_called.seg',
...     'p26_rec_nor_nor_.cr_called.seg',
...     'p26_rec_tn_.cr_called.seg',
...     'p26_tn_.cr_called.seg',
... ]
>>> for d in data:
...     print(d.rsplit('_', 3)[0])
...
p9_rec
p9
p10_nor
p10_rec
p10
p26_rec_nor
p26_rec
p26
bash, you say? Well, it's not a regular expression, but bash can strip a suffix with pattern substitution:
while read -r var ; do echo "${var%_*_*_*}" ; done <<EOT
p9_rec_tonly_.cr_called.seg
p9_tonly_.cr_called.seg
p10_nor_nor_.cr_called.seg
p10_rec_tn_.cr_called.seg
p10_tn_.cr_called.seg
p26_rec_nor_nor_.cr_called.seg
p26_rec_tn_.cr_called.seg
p26_tn_.cr_called.seg
EOT
${var%_*_*_*} expands variable var, stripping the shortest suffix that matches _*_*_*.
Otherwise, to perform regex operations in the shell, you would normally ask a utility like sed for help and feed your lines through, for instance, this:
sed -e 's#_[^_]*_[^_]*_[^_]*$##'
or for short:
sed -e 's#\(_[^_]*\)\{3\}$##'
This finds three groups of _ followed by zero or more characters other than _ at the end of the line ($) and replaces them with nothing ('').

Looking for single occurrence between '{' and ':' in a large text

I'm new to the Regex world, so please be kind on the tantrums :-)
I would like to print only the first occurrence of a string between { and :.
Example in the following string:
({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75)
or
({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)
I want it to output only Zabbix windows
how is that possible?
I tried {([a-zA-Z0-9 ]*): it is printing : and doing it twice.
Thanks for reading!
Srini
You may use grep with a PCRE regex (-P) and the -o option (extracting the matches rather than returning the whole lines) to grab the text you need, and use head -n 1 to keep only the first match:
s='({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75) or ({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)'
echo "$s" | grep -oP '(?<={)[\w\s]+(?=:)' | head -n 1
Pattern details:
(?<={) - there must be a { immediately to the left of the current location
[\w\s]+ - 1+ word and/or whitespace chars
(?=:) - there must be a : immediately to the right of the current location.
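If grep -P is not available, the same pattern can be tried in Python. This is just a sketch: re.search stops at the first match, which plays the role of head here, and the only change to the pattern is that { is escaped.

```python
import re

s = (
    "({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75) "
    "or ({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)"
)

# (?<=\{)  a { immediately to the left (not part of the match)
# [\w\s]+  one or more word or whitespace characters
# (?=:)    a : immediately to the right (not part of the match)
m = re.search(r"(?<=\{)[\w\s]+(?=:)", s)
first = m.group(0) if m else None

print(first)
```

{TRIGGER.VALUE} is skipped because the . is neither a word nor a whitespace character, so no : can follow the run that starts after that brace.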

Regex to match numeric pattern

I am trying to match specific numeric pattern from the below list.
My requirement is to match only report.20150325 to report.20150331. Please help.
report.20150319
report.20150320
report.20150321
report.20150322
report.20150323
report.20150324
report.20150325
report.20150326
report.20150327
report.20150328
report.20150329
report.20150330
report.20150331
Matching 25 to 31 is simple: use the regex 2[5-9]|3[01]
Here is the complete regex:
(report\.201503(2[5-9]|3[01]))
Explanation of 2[5-9]|3[01]
2 followed by a single character in the range between 5 and 9
OR
3 followed by 0 or 1
You could use something like this: /^report\.201503(2[5-9]|3[01])$/gm
With the m flag, the ^ and $ anchors match at line boundaries, so only whole lines such as the reports you are after will match.
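A quick way to check the alternation is a Python sketch like the one below. The list of names is generated to mirror the sample in the question, and fullmatch stands in for the ^...$ anchors.

```python
import re

# report.20150319 through report.20150331, as in the question.
names = [f"report.201503{day:02d}" for day in range(19, 32)]

# 2[5-9] covers 25-29, 3[01] covers 30-31.
PATTERN = re.compile(r"report\.201503(?:2[5-9]|3[01])")

matched = [name for name in names if PATTERN.fullmatch(name)]

for name in matched:
    print(name)
```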
A regexp match isn't always the right approach. Here you are asking to match a string followed by a number so use a string and numeric comparisons:
$ awk -F'.' '$1=="report" && ($2>=20150325) && ($2<=20150331)' file
report.20150325
report.20150326
report.20150327
report.20150328
report.20150329
report.20150330
report.20150331
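The same string-plus-number comparison can be sketched in Python. The helper name in_range and its default bounds are hypothetical, chosen only to mirror the awk condition above.

```python
def in_range(name, low=20150325, high=20150331):
    """Return True if name is 'report.<number>' with low <= number <= high."""
    prefix, _, suffix = name.partition(".")
    # Guard against non-numeric suffixes before converting to int.
    if prefix != "report" or not (suffix.isascii() and suffix.isdigit()):
        return False
    return low <= int(suffix) <= high


names = [f"report.201503{day:02d}" for day in range(19, 32)]
selected = [name for name in names if in_range(name)]

for name in selected:
    print(name)
```

Unlike the regex, changing the range here only means changing two numbers.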
It seems you want to print the lines that fall between two lines matching separate patterns (including the matching lines themselves). Note that this relies on the order of the lines in the file, not on their values.
$ sed -n '/^report\.20150325$/,/^report\.20150331$/p' file
report.20150325
report.20150326
report.20150327
report.20150328
report.20150329
report.20150330
report.20150331

Regex - searching between strings with new lines

Preconditions:
[numbers]
[vip]111,222[vip]
[standard]333[standard]
[numbers]
What I want:
Find everything between [numbers]
Problem:
When this text is in one line the solution is simple
(?<=\[numbers\])(.*?)(?=\[numbers\])
But is it possible to search when there are newlines, as in the preconditions?
In most regex flavors, the dot (.) matches any character except a newline, so by default it cannot cross line boundaries.
You can scan across end-of-lines by using an expression for "anything", for example:
(?<=\[numbers\])([\d\D]*?)(?=\[numbers\])
as [\d\D] stands for 'anything that is a digit, or anything that is NOT a digit'
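A small Python sketch of this, using the text from the preconditions. It also shows an alternative the answer does not mention: Python's re.DOTALL flag, which makes the plain dot match newlines too.

```python
import re

text = "[numbers]\n[vip]111,222[vip]\n[standard]333[standard]\n[numbers]"

# [\d\D] matches any character, including newlines.
with_class = re.search(r"(?<=\[numbers\])([\d\D]*?)(?=\[numbers\])", text)

# Alternative: keep the dot and let re.DOTALL extend it to newlines.
with_dotall = re.search(r"(?<=\[numbers\])(.*?)(?=\[numbers\])", text, re.DOTALL)

inner = with_class.group(1)
print(inner.strip())
```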
You don't really need a regex for the presented problem, with awk it's enough to set the record separator:
awk 1 RS='\\[numbers\\]\n' ORS=''
Output:
[vip]111,222[vip]
[standard]333[standard]