Find numbers after specific text in a string with RegEx

I have a multiline string like the following:
2012-15-08 07:04 Bla bla bla blup
2012-15-08 07:05 *** Error importing row no. 5: The import of this line failed because bla bla
2012-15-08 07:05 Another text that I don't want to search...
2012-15-08 07:06 Another text that I don't want to search...
2012-15-08 07:06 *** Error importing row no. 5: The import of this line failed because bla bla
2012-15-08 07:07 Import has finished bla bla
What I want is to extract all row numbers that have errors, with the help of a regular expression (in PowerShell). So I need to find the number between "*** Error importing row no. " and the following ":", as this will always give me the row number.
I looked at various other RegEx questions, but to be honest the answers are like Chinese to me.
I tried to build a RegEx with the help of http://regexr.com/ but haven't been successful so far, for example with the following pattern:
"Error importing row no. "(.?)":"
Any hints?

Try this expression:
"Error importing row no\. (\d+):"
Here you need to understand the quantifiers and escape sequences:
. any character; as you want only numbers, use \d; if you meant the period character, you must escape it with a backslash (\.)
? zero or one occurrence of the preceding token; this isn't what you want, as an error on row 10 would capture only the "1"
+ one or more; this is what we need here
* zero or more; take care when combining this with ., as .* can consume your entire input
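If you want to sanity-check the pattern outside of regexr before wiring it into PowerShell, here is a quick sketch from a Unix shell, assuming the log shown above is saved in a file (import.log is a made-up name) and that GNU grep with PCRE support (-P) is available:
grep -oP 'Error importing row no\. \K\d+(?=:)' import.log
\K drops the literal prefix from the reported match and the lookahead (?=:) checks for the colon without consuming it, so only the row numbers (here: 5 and 5) are printed.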

Pretty straightforward. Right now, the quoting in the regex you wrote is going to cause an error. Try this instead:
$LogText = ""#Your logging stuff
[regex]$Regex = "Error importing row no\. ([0-9]*):"
$Matches = $Regex.Matches($LogText)
$Matches | ForEach-Object {
$RowNum = $_.Groups[1].Value #(Waves hand) These are the rows you are looking for
}

There could be multiple ways; a few simple ones shown below might help:
I put your log in a file called temp.txt.
cat temp.txt | grep " Error importing row no." | awk -F":" '{print $2}' | awk -F"." '{print $2}'
OR
cat temp.txt | grep " Error importing row no." | sed 's/\(.*\)no.\(.*\):\(.*\)/\2/'

grep regex how to get only results with one preceding word?

My string is :
www.abc.texas.com
mail.texas.com
subdomain.xyz.cc.texas.com
www2.texas.com
I am trying to get results only with "one" word before texas.com. Expected output when I do a regex grep:
mail.texas.com
www2.texas.com
So mail & www2 are the "one" word that I'm talking about. I tried :
grep "*.texas.com", but I get all of them in results. Can someone please help ?
You can use
grep '^[^.]*\.texas\.com'
Details:
^ - start of string
[^.]* - zero or more chars other than a . char
\.texas\.com - .texas.com string (literal . char must be escaped in the regex pattern).
See the online demo:
#!/bin/bash
s='www.abc.texas.com
mail.texas.com
subdomain.xyz.cc.texas.com
www2.texas.com'
grep '^[^.]*\.texas\.com' <<< "$s"
Output:
mail.texas.com
www2.texas.com
With awk:
awk 'BEGIN{FS=OFS="."} /texas.com$/ && NF==3' file
Output:
mail.texas.com
www2.texas.com
Set one dot as input and output field separator, check for texas.com at the end ($) of your line and check for three fields.
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
With your shown samples, please try the following awk code.
awk -F'.' 'NF==3 && $2=="texas" && $3=="com"' Input_file
Explanation: Simply set the field separator to . for all the lines in the awk program. Then, in the main program, check whether NF (the number of fields in the current line) is 3 AND the 2nd field is texas AND the 3rd field is com; if all 3 conditions are met, print the line.

How to output multiple regex matches, comma-separated, on the same line

I want to use grep/awk/sed to extract matched strings from each line of a log file, then place them into a CSV file.
Highlighted strings (1432,53,http://www.espn.com/)
If the input is:
2018-10-31
18:48:01.717,INFO,15592.15627,PfbProxy::handlePfbFetchDone(0x1d69850,
pfbId=561, pid=15912, state=4, fd=78, timer=61), FETCH DONE: len=45,
PFBId=561, pid=0, loadTime=1434 ms, objects=53, fetchReqEpoch=0.0,
fetchDoneEpoch:0.0, fetchId=26, URL=http://www.espn.com/
2018-10-31
18:48:01.806,DEBUG,15592.15621,FETCH DONE: len=45, PFBId=82, pid=0,
loadTime=1301 ms, objects=54, fetchReqEpoch=0.0, fetchDoneEpoch:0.0,
fetchId=28, URL=http://www.diply.com/
Expected output for the above log lines:
URL,LoadTime,Objects
http://www.espn.com/,1434,53
http://www.diply.com/,1301,54
This is an example, and the actual Log File will have much more data.
My solution so far:
For now I used grep to get all lines containing the keyword 'FETCH DONE' (these lines contain the strings I am looking for).
I did come up with a regular expression that matches the data I need, but when I grep it and write it to the file, it prints each string on a new line, which is not quite what I am looking for.
The grep and regular expression I use (online regex tool: https://regexr.com/42cah):
echo -en 'url,loadtime,object\n'>test1.csv #add header
grep -Po '(?<=loadTime=).{1,5}(?= )|((?<=URL=).*|\/(?=.))|((?<=objects=).{1,5}(?=\,))'>>test1.csv #get matching strings
Actual output:
URL,LoadTime,Objects
http://www.espn.com
1434
53
http://www.diply.com
1301
54
Expected output:
URL,LoadTime,Objects
http://www.espn.com/,1434,53
http://www.diply.com/,1301,54
I was trying to use awk to match multiple regexes and print a comma in between. I couldn't get it to work at all for some reason, even though my regex matches the correct strings.
Another idea I have is to use sed to replace some of the '\n' characters with ',':
for(i=1;i<=n;i++)
if(i % 3 != 0){
sed REPLACE "\n" with "," on i-th line
}
I'm pretty sure there is a more efficient way of doing it.
Using sed:
sed -n 's/.*loadTime=\([0-9]*\)[^,]*, objects=\([0-9]*\).* URL=\(.*\)/\3,\1,\2/p' input | \
sed 1i'URL,LoadTime,Objects'
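Since the question mentions trying awk, here is a rough awk sketch of the same idea. It assumes GNU awk (for the three-argument match()) and, like the sed answer above, that each FETCH DONE record sits on a single log line in the file named input:
awk 'BEGIN{print "URL,LoadTime,Objects"} /FETCH DONE/ && match($0, /loadTime=([0-9]+) ms, objects=([0-9]+).*URL=(.*)/, m){print m[3] "," m[1] "," m[2]}' input
The match() call captures the load time, object count and URL into the array m, and the print reorders them into the URL,LoadTime,Objects column order expected in the CSV.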

Why GREP can't tolerate multiple \n characters [duplicate]

This question already has answers here:
How do I match any character across multiple lines in a regular expression?
(26 answers)
Closed 5 years ago.
I am trying to use GREP to select multiple-line records from a file.
The records look something like that
########## Ligand Number : 1
blab bla bla
bla blab bla
########## Ligand Number : 2
blab bla bla
bla blab bla
########## Ligand Number : 3
bla bla bla
<EOF>
I am using Perl RegEx (-P).
To bypass the multiple-line limitation in GREP, I use grep -zo. This way, the parser can consume multiple lines and output exactly what I want. Generally, it works fine.
However, the problem is that the delimiter here is two empty lines after the end of the last record line (three consecutive '\n' characters: one ending that line and two for the two empty lines).
When I try to use an expression like
grep -Pzo '^########## Ligand Number :\s+\d+.+?\n\n\n' inputFile
it returns nothing. It seems that grep can't tolerate consecutive '\n' characters.
Can anybody give an explanation?
P.S. I already bypassed it by translating the '\n' characters to '\a' first, then translating them back, like in the following example:
cat inputFile | tr '\n' '\a' | grep -Po '########## Ligand Number :\s+\d+\a.+?\a\a\a' | tr '\a' '\n'
But I need to understand why GREP couldn't understand the '\n\n\n' pattern.
In a PCRE regex, . does not match line break characters by default; the s modifier enables the POSIX-like dot behavior.
Thus, add (?s) at the start, or replace . with [\s\S].
(?s)^########## Ligand Number :\s+\d+.+?\n\n\n
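Plugged back into the original command, that would look like the sketch below; depending on your grep build you may also need the m flag, i.e. (?sm), so that ^ can match at the start of every record rather than only at the very start of the -z input:
grep -Pzo '(?s)^########## Ligand Number :\s+\d+.+?\n\n\n' inputFile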

Is there a way to grep this multiline part from a log?

I have errors in a log file like this:
[11:16:16 31/10] 2428 ERROR: Wide character in subroutine entry at /home/site/site/app/lib/SC/Contro
ller/Client/Sites.pm line 1584.
Stack:
[/xxx:1584]
[/xxx:70]
[/xxx:133]
I want to put these errors into some file, like:
cat apache.error.log | grep "query NS" > apache.error.log-NS
But how can I do that for a multiline log message?
I have found this solution:
cat apache.error.log | grep -Pzo '^.*?Wide character.*?\nStack.*?(\n(?=\s).*?)*$'
Where (\n(?=\s).*?)* means:
(...)* match the group multiple times
\n a following line
(?=\s) which starts with a whitespace character
.*? up to the end of that line (notice the $ character at the end of the whole regex)
No offense, but I think #Eugen's solution is too generic, and since this is a log file, the regex might match unwanted lines too.
So, to make sure we fetch the exact line, this is what I think the regex should be. Feel free to comment!
^\[\d{2}\:\d{2}\:\d{2}\s\d{2}\/\d{2}\]\s\d+\sERROR\:\sWide\scharacter\sin\ssubroutine\sentry\sat\s[a-zA-Z\/\.]+\s.*?\nStack.*?(\n(?=\s).*?)*$
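Dropped into the same grep -Pzo invocation the question already uses (a sketch, reusing the apache.error.log name from above):
grep -Pzo '^\[\d{2}\:\d{2}\:\d{2}\s\d{2}\/\d{2}\]\s\d+\sERROR\:\sWide\scharacter\sin\ssubroutine\sentry\sat\s[a-zA-Z\/\.]+\s.*?\nStack.*?(\n(?=\s).*?)*$' apache.error.log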

Parse a file without a common delimiter in shell

I would like to ask you for help with parsing a file in shell.
Here is my data:
ID:1 g-t="Demo one" rfid="af7e 25" t-link="http://demo.site.com/api2",User af73 25 http://example.com/useraf73
ID:2 g-t="Demo one" rfid="77 63" t-link="http://demo.site.com/api",User 77 http://example.com/user77
There is no common delimiter; basically, I need these fields:
ID=1 | g-t="Demo one" | rfid="af7e 25" | t-link="http://demo.site.com/api2" | User af73 25 | http://example.com/useraf73
Here is where I am stuck:
awk '{match($0,"g-t=([^\" ]+)",a)}END{print a[1]}'
I am trying to match a double quote with a space, but I have no idea why it is not printing the result. All the chars work fine except double quotes.
What am I doing wrong? Awk is not a must here; I am open to suggestions.
Thanks.
It has been quite a while since I regularly used awk, but if I remember correctly match() takes only 2 args, and END{} happens only once, not for every line like I think you want. Something like:
awk '{match($0,/g-t="([^\"]+")/); print substr($0, RSTART, RLENGTH)}' dataFile
may be closer to what you had in mind?
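For what it's worth, GNU awk (gawk) does accept a third array argument to match(); in that case the original attempt fails mainly because the print sits in an END block (which runs once, after all input) rather than once per line. A minimal sketch, assuming gawk and the same dataFile name as above:
awk 'match($0, /g-t="([^"]+)"/, a){print a[1]}' dataFile
This should print Demo one for each of the two sample lines.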
A brute force Perl one-liner could look something like this:
perl -lne 'if (m/ID:(\S+) g-t="([^"]+)" rfid="([^"]+)" t-link="([^"]+)",User (.*) (http:.*)/){print "$1|$2|$3|$4|$5|$6"}' dataFile
and demonstrates getting all of the fields' data separated by OR bars (|). You can move the () groups around to get more or less of the text you want for each resultant $1, $2, etc. See perldoc perl for more information.