Extracting Number from Log File

I'm trying to extract a number from a log file that outputs lines of text like this:
1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A
The value I want from that line is 45.6 A.
However, my regex returns the 12 from 3:26:12 AM. I need it to ignore the time completely and output only the 45.6 A.
Here's my regex:
$regex = '\d+(?:\.\d+)?(?=\s+A)'

You just forgot to anchor the lookahead at the end of the string (note the $ after the A):
\d+(?:\.\d+)?(?=\s+A$)
See the regex demo
The \d+(?:\.\d+)? part matches one or more digits, optionally followed by a . and one or more digits (a float value), and the (?=\s+A$) lookahead requires the float value to be followed by one or more whitespace characters and an A at the very end of the string.
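$s = '1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A'
$rx = '\d+(?:\.\d+)?(?=\s+A$)'
# RightToLeft makes the engine scan from the end of the string
$result = [regex]::Match($s, $rx, 'RightToLeft')
# Match() never returns $null on failure, so test .Success rather than the object
if ($result.Success) { $result.Value }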
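For readers working in Python rather than PowerShell, a rough equivalent of the snippet above is sketched here. Python's re module has no RightToLeft option, but with the $ anchor it isn't needed; re.search returns None when nothing matches, so the check is explicit:

```python
import re

s = "1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A"
# The lookahead only succeeds when whitespace + "A" sits at the very end of the string.
rx = r"\d+(?:\.\d+)?(?=\s+A$)"

result = re.search(rx, s)
if result is not None:
    print(result.group())  # prints 45.6
```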

You can use a word boundary (\b) to match only A, not AM:
\d+(?:\.\d+)?(?=\s+A\b)
DEMO: https://regex101.com/r/pA7jK2/1
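A quick Python check with re.findall lists every match each pattern finds on the sample line. The original lookahead also accepts the A of AM (which is where the 12 comes from), while the \b version rejects it because a word character (M) follows the A:

```python
import re

line = "1/11/2016 3:26:12 AM 1/11/2016 3:27:00 AM 45.6 A"

original = r"\d+(?:\.\d+)?(?=\s+A)"    # also accepts the "A" at the start of "AM"
bounded = r"\d+(?:\.\d+)?(?=\s+A\b)"   # \b fails inside "AM", since "M" is a word character

print(re.findall(original, line))  # ['12', '00', '45.6']
print(re.findall(bounded, line))   # ['45.6']
```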

If you just need to find the decimal number followed by an A, try this (note that it expects exactly one digit after the decimal point):
(\d+\.\d\sA)
Demo here

Related

Python Regex - How to extract the third portion?

My input has this format: (xxx)yyyy(zz)(eee)fff, where {x,y,z,e,f} are all digits. fff is optional, though.
Input: x = (123)4567(89)(660)
Expected output: only the eee part, i.e. the number inside the third "()", which is 660 in my example.
I am able to achieve this so far:
re.search("\((\d*)\)", x).group()
Output: (123)
Expected: (660)
I am surely missing something fundamental. Please advise.
Edit 1: Just added fff to the input data format.

You could find all the matches enclosed in round brackets () with findall and print the third one:
import re
n = "(123)4567(89)(660)999"
r = re.findall(r"\(\d*\)", n)  # raw string, so the backslashes reach the regex engine
print(r[2])
Output:
(660)

The (eee) part is identical to the (xxx) part in your regex. If you don't provide an anchor, or some sequencing requirement, then an unanchored search will match the first thing it finds, which is (xxx) in your case.
If you know the (eee) always appears at the end of the string, you could append an "at-end" anchor ($) to force the match at the end. Or perhaps you could append a following character, like a space or comma or something.
Otherwise, you might do well to match the other parts of the pattern and not capture them:
pattern = r'[0-9()]{13}\((\d{3})\)'
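Since fff is optional, a plain $ right after (eee) only works when fff is absent. One way to tolerate it, sketched here under the assumption that fff (when present) is digits only as the question states, is to allow \d* before the end anchor; third_group is a hypothetical helper name:

```python
import re

def third_group(s):
    # (eee) is the last parenthesised number; only the optional fff digits may follow it.
    m = re.search(r"\((\d+)\)\d*$", s)
    return m.group(1) if m else None

print(third_group("(123)4567(89)(660)"))     # 660
print(third_group("(123)4567(89)(660)999"))  # 660
```

Unlike the fixed-width pattern above, this does not depend on how many digits each part has.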

If you want to get the third group of numbers in brackets, you need to skip the first two groups. You can do that with a repeating non-capturing group that looks for a set of digits enclosed in () followed by any number of non-( characters:
x = '(123)4567(89)(660)'
print(re.search(r"(?:\(\d+\)[^(]*){2}(\(\d+\))", x).group(1))
Output:
(660)
Demo on rextester

Regex to match everything from nth occurrence of character onwards [duplicate]

I am trying to build a regex for the sample text below, in which I need to replace the bold text. So far I have managed this much:
((\|)).*(\|) which selects the whole string between the first and last pipe character. I am restricted to Apache or Java regex.
Sample string (the text length between pipes may vary):
1.1|ProvCM|111111111111|**10.15.194.25**|10.100.10.3|10.100.10.1|docsis3.0

To match the part after the nth occurrence of a pipe, you can use this regex:
/^(?:[^|]*\|){3}([^|]*)/
Here n=3.
It will capture 10.15.194.25 in group #1.
RegEx Demo
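The same extraction can be sketched in Python (the variable names are illustrative). re.match already anchors at the start of the string, so the leading ^ is not needed:

```python
import re

line = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0"
n = 3  # number of "field|" pairs to skip

# Skip n fields (each followed by a pipe), then capture the next field.
m = re.match(r"(?:[^|]*\|){" + str(n) + r"}([^|]*)", line)
print(m.group(1))  # 10.15.194.25
```

If only extraction is needed, line.split("|")[n] returns the same field without a regex.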

^((?:[^|]*\|){3})[^|]+
You can use this and replace with $1<anything>. See the demo:
https://regex101.com/r/tP7qE7/4
This captures from the start of the string up to a |, repeats that 3 times, and stores the result in $1. The next part of the string, up to the following |, is what you want. You can then replace it with anything using $1<textyouwant>. (Inside a Java string literal, the \| must be written as \\|.)
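The same replace-the-nth-field idea can be sketched in Python with a hypothetical helper, replace_field (not part of any answer here). Passing a function to re.sub instead of a template string means new_value is inserted literally, so backslashes in it are never treated as group references; the field pattern uses [^|]* rather than [^|]+ so that an empty target field can also be replaced:

```python
import re

def replace_field(line, n, new_value):
    # Group 1 keeps the first n "field|" pairs; the field after them is replaced.
    pattern = r"^((?:[^|]*\|){" + str(n) + r"})[^|]*"
    # A function replacement returns the text verbatim (no \1 or \g<...> parsing).
    return re.sub(pattern, lambda m: m.group(1) + new_value, line, count=1)

line = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0"
print(replace_field(line, 3, "new value"))
# 1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0
```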

Here's how you can do the replacement:
String input = "1.1|ProvCM|111111111111|10.15.194.25|10.100.10.3|10.100.10.1|docsis3.0";
int n = 3;
String newValue = "new value";
String output = input.replaceFirst("^((?:[^|]+\\|){"+n+"})[^|]+", "$1"+newValue);
This builds:
"1.1|ProvCM|111111111111|new value|10.100.10.3|10.100.10.1|docsis3.0"

Regex Lookahead/Lookbehind if more than one occurrence

I have string formulas like this:
?{a,b,c,d}
It can be embedded like this:
?{a,b,c,?{x,y,z}}
or, equivalently:
?{a,b,c,
?{x,y,z}
}
So I have to find the commas that are inside second-level (or deeper) brackets.
In the example below I marked the "levels" where I have to find all commas:
?{a,b,c,
?{x,y, <--Those
?{1,2,3} <--Those
}
}
I've tried with lookahead and lookbehind, but I'm totally confused now :/
Here is my latest attempt, but it doesn't work well at all:
OnlineRegex
Update:
To avoid misunderstanding: I don't want to count the commas.
I'd like to get groups of commas so that I can replace them.
The condition is: find the commas preceded by more than one open tag (?{)
that has not yet been closed by a closing tag (}).
Example:
In this case I must not replace any commas:
?{1,2,3} ?{a,b,c}
But in this case I have to replace the commas between a, b and c:
?{1,2,3,?{a,b,c}}

For the examples you have provided, the following regex works (it gives the output you described):
(?<!^\?{[^{}]*),(?=[\s\S]*(?:\s*}){2,})
For String ?{a,b,c,d}, see Demo1 No Match
For String, ?{a,b,c,?{x,y,z}}, see Demo2 Match successful
For String,
?{a,b,c,
?{x,y,z}
}
see Demo3 Match Successful
For String,
?{a,b,c,
?{x,y,
?{1,2,3}
}
}
see Demo4 Match Successful
For String ?{1,2,3} ?{a,b,c} ?{1,2,3} ?{a,b,c}, see Demo5 No Match
Explanation:
(?<!^\?{[^{}]*), - a negative lookbehind that discards the first-level commas: it rejects any comma preceded by the start of the string, then ?{, then zero or more characters other than { or }
(?=[\s\S]*(?:\s*}){2,}) - the comma matched above must be followed by at least two } characters (consecutive, or separated only by whitespace)
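Two caveats about that regex: the lookbehind (?<!^\?{[^{}]*) is variable-length, which .NET accepts but Python's built-in re module rejects ("look-behind requires fixed-width pattern"); and because the lookahead needs two closing braces separated only by whitespace, a nested group followed by more first-level items, such as the e,f in ?{a,b,c,?{e,f},1,2,3}, is not matched. As a regex-free alternative (a sketch, not taken from any answer here), the nesting depth can be tracked explicitly and only commas at depth 2 or more replaced. The function name and the ";" replacement are placeholders:

```python
def replace_nested_commas(formula, replacement=";", min_depth=2):
    """Replace commas nested min_depth or more levels deep in ?{...} groups."""
    out = []
    depth = 0
    i = 0
    while i < len(formula):
        if formula.startswith("?{", i):
            # An opening tag raises the nesting level.
            depth += 1
            out.append("?{")
            i += 2
            continue
        ch = formula[i]
        if ch == "}":
            # A closing brace lowers it (clamped at zero for malformed input).
            depth = max(depth - 1, 0)
            out.append(ch)
        elif ch == "," and depth >= min_depth:
            out.append(replacement)
        else:
            out.append(ch)
        i += 1
    return "".join(out)

print(replace_nested_commas("?{1,2,3,?{a,b,c}}"))  # ?{1,2,3,?{a;b;c}}
```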

Your question is rather unclear, @norbre, but I presume you'd like to extract (i.e. "count") the number of commas.
You can't do this with a regex alone, since a regex can't count occurrences. However, you can use this one to extract the "internal part" and then use a spreadsheet formula to count the commas:
^(?:\?{[a-zA-Z0-9,]+?,\n??\s*?\?{)([a-zA-Z0-9,?{}\n\s]+?(?:\n*?\s*?|})+)(?:[a-zA-Z0-9,\n\s]*})$
Try: https://regex101.com/r/Rr0eFo/5
Examples
1.
Input:
?{a,b,c,?{e,f},1,2,3}
Output:
e,f}
2.
Input:
?{a,b,c,
?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
,d,e,f}
Output:
x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
3.
Input:
?{a,b,c,?{e},1,2,3}
Output:
e}
(note that there are no commas here!)
One caveat, however: as I said, a regex can't count occurrences.
Hence the following sample (I don't know whether it is valid for your case) would return the wrong match:
?{a,b,c,?{e,f}
,1,2,3,?{a,b}
}
Output:
e,f}
,1,2,3,?{a,b}

OK, replacing commas is another story, so I'll add another answer.
Your regexp engine would need to support recursion.
Still, I don't see a way to do it with one regex: a single match would contain either the first comma or everything between the braces!
What I suggest is to use one regexp to get "what is inside the inner braces", run a replace (, => "") and assemble the whole line again using submatches from the regexp.
Here it is: (\?{[^?{}]*)((?>[^?{}]|(?R))+?)([^?{}]*?\})
Try: https://regex101.com/r/IzTeY0/3
Example 1:
Input:
?{a,b,c,
?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
,d,e,f}
Submatches:
1. ?{a,b,c,
2. ?{x,y,z,e,
?{1,2,3,?{f,g,3},4,5,6}
}
3.
,d,e,f}
Replace all commas in submatch 2 with anything you want, then reassemble the whole string using submatches 1 and 3.
Again, this would break the regexp:
?{a,b,c,?{e,f}
,1,2,3,?{a,b}
}
Submatch 2 would look like this:
?{e,f}
,1,2,3,?{a,b}

Convert string with preg_replace in PHP

I have this string
$string = "some words and then #1.7 1.7 1_7 and 1-7";
and I would like #1.7, 1.7, 1_7 and 1-7 to be replaced by S1E07.
Of course, "1.7" is just an example; it could also be "3.15".
I managed to create the regular expression that would match the above 4 variants
/\#\d{1,2}\.\d{1,2}|\d{1,2}_\d{1,2}|\d{1,2}-\d{1,2}|\d{1,2}\.\d{1,2}/
but I cannot figure out how to use preg_replace (or something similar?) to actually replace the matches so they end up like S1E07

You need to use preg_replace_callback if you want to pad the episode number with a leading 0 when it is less than 10:
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$string = preg_replace_callback('/#?(\d+)[._-](\d+)/', function($matches) {
return 'S'.$matches[1].'E'.($matches[2] < 10 ? '0'.$matches[2] : $matches[2]);
}, $string);
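Python's counterpart to preg_replace_callback is passing a function to re.sub. A minimal sketch of the same padding logic (to_episode is an illustrative name), formatting the episode with :02d instead of a manual < 10 check:

```python
import re

s = "some words and then #1.7 1.7 1_7 and 1-7"

def to_episode(m):
    # int() also drops any leading zeros already present in the input.
    season, episode = int(m.group(1)), int(m.group(2))
    # Season as-is, episode zero-padded to two digits.
    return f"S{season}E{episode:02d}"

print(re.sub(r"#?(\d+)[._-](\d+)", to_episode, s))
# some words and then S1E07 S1E07 S1E07 and S1E07
```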

You could use this simple string replace:
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/', 'S${1}E${2}', $string);
But it would not yield zero-padded numbers for the episode number:
// some words and then S1E7 S1E7 S1E7 and S1E7
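On old PHP versions you could use the evaluation modifier (/e was deprecated in PHP 5.5 and removed in PHP 7.0; on current versions use preg_replace_callback instead):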
preg_replace('/#?\b(\d{1,2})[-._](\d{1,2})\b/e', '"S".str_pad($1, 2, "0", STR_PAD_LEFT)."E".str_pad($2, 2, "0", STR_PAD_LEFT)', $string);
...and use str_pad to add the zeroes.
// some words and then S01E07 S01E07 S01E07 and S01E07
If you don't want the season number to be padded you can just take out the first str_pad call.

I believe this will do what you want it to...
/\#?([0-9]+)[._-]([0-9]+)/
In other words...
\#? - optionally starts with #
([0-9]+) - capture at least one digit
[._-] - look for one ., _ or -
([0-9]+) - capture at least one digit
And then you can use this to replace...
S$1E$2
Which will put out S then the first captured group, then E then the second captured group

You need to put parentheses around the parts you want to reuse, i.e. capture them. You can then access those values in the replacement string with $1 for the first group, $2 for the second one, and so on; write ${1} instead when the reference is immediately followed by a literal digit.
The problem with your pattern is that each of its four alternatives would need its own pair of groups, so you would end up with $1 to $8. I would rewrite the expression into something like this:
/#?(\d{1,2})[._-](\d{1,2})/
and replace with
S${1}E${2}
I tested it on writecodeonline.com:
$string = "some words and then #1.7 1.7 1_7 and 1-7";
$result = preg_replace('/#?(\d{1,2})[._-](\d{1,2})/', 'S${1}E${2}', $string);
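For comparison, Python's re.sub templates use \1 and \g<1> where PHP uses $1 and ${1}. A small sketch of the same replacement, plus a toy case showing why the braced form matters when a literal digit follows the reference:

```python
import re

s = "some words and then #1.7 1.7 1_7 and 1-7"
# \g<1> / \g<2> play the role of PHP's ${1} / ${2} in the replacement template.
result = re.sub(r"#?(\d{1,2})[._-](\d{1,2})", r"S\g<1>E\g<2>", s)
print(result)  # some words and then S1E7 S1E7 S1E7 and S1E7

# r"\g<1>0" means group 1 followed by a literal "0"; r"\10" would refer to group 10.
print(re.sub(r"(\d)", r"\g<1>0", "a1b2"))  # a10b20
```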

2-step regular expression matching with a variable in Perl

I am looking to do a 2-step regular expression look-up in Perl, I have text that looks like this:
here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words
I also have a hash with the following values:
%my_hash = ( 9337 => 'AA', 2214 => 'BB', 8790 => 'CC' );
Here's what I need to do:
Find a number
Look up the text code for the number using my_hash
Check if the text code appears within 50 characters of the identified number, and if true print the result
So the output I'm looking for is:
Found 9337, matches 'AA'
Found 2214, matches 'BB'
Found 1190, no matches
Found 8790, no matches
Here's what I have so far:
while ( $text =~ /(\d+)(.{1,50})/g ) {
    $num = $1;
    $text_after_num = $2;
    $search_for = $my_hash{$num};
    if ( $text_after_num =~ /($search_for)/ ) {
        print "Found $num, matches $search_for\n";
    }
    else {
        print "Found $num, no matches\n";
    }
}
This sort of works, except that the only correct match is 9337; the code doesn't match 2214. I think the reason is that the regular expression match on 9337 is including 50 characters after the number for the second-step match, and then when the regex engine starts again it is starting from a point after the 2214. Is there an easy way to fix this? I think the \G modifier can help me here, but I don't quite see how.
Any suggestions or help would be great.

You have a problem with greediness: the {1,50} quantifier consumes as much as it can. Your regex should be /(\d+)(.+?)(?=($|\d))/
To explain, the question mark will make the multiple match non-greedy (it will stop as soon as the next pattern is matched - the next pattern gets precedence). The ?= is a lookahead operator to say "check if the next element is a digit. If so, match but do not consume." This allows the first digit to get picked up by the beginning of the regex and be put into the next matched pattern.
[EDIT]
I added an end-of-string alternative to the lookahead so that it doesn't fail on the last match.

Just use:
/\b\d+\b/g
Why match everything if you don't need to? You should use other functions to determine where the number is:
/(?=9337.{1,50}AA)/
This will fail if AA is more than 50 characters away from the end of 9337. Of course, you will have to interpolate your variables to match your hash's keys and values; this was just an example for your first key/value pair.
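Another common fix, not taken from the answers here, is to capture the 50-character window inside a lookahead so that it is not consumed and the next search resumes right after the number (in Perl this would read /(\d+)(?=(.{1,50}))/g). Stopping the window at the next digit instead would miss 'BB' for 2214, since 1190 sits between them. A Python sketch of the idea, with report as a hypothetical helper and a substring test standing in for the second regex:

```python
import re

text = "here is some text 9337 more text AA 2214 and some 1190 more BB stuff 8790 words"
my_hash = {"9337": "AA", "2214": "BB", "8790": "CC"}

def report(text, codes, window=50):
    lines = []
    # The trailing window is captured inside a lookahead, so it is not consumed.
    for m in re.finditer(r"(\d+)(?=(.{0," + str(window) + r"}))", text):
        num, after = m.group(1), m.group(2)
        code = codes.get(num)
        if code is not None and code in after:
            lines.append(f"Found {num}, matches '{code}'")
        else:
            lines.append(f"Found {num}, no matches")
    return lines

for line in report(text, my_hash):
    print(line)
```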
