How to match a group of lines that match a pattern - regex

I am trying to filter out a group of lines that match a pattern using a regexp but am having trouble getting the correct regexp to use.
The text file contains lines like this:
transaction 390134; promote; 2016/12/20 01:17:07 ; user: build
to: DEVELOPMENT ; from: DEVELOPMENT_BUILD
# some commit comment
/./som/file/path 11745/409 (22269/257)
# merged
version 22269/257 (22269/257)
ancestor: (22133/182)
transaction 390136; promote; 2016/12/20 01:17:08 ; user: najmi
to: DEVELOPMENT ; from: DEVELOPMENT_BUILD
/./some/other/file/path 11745/1 (22269/1)
version 22269/1 (22269/1)
ancestor: (none - initial version)
type: dir
I would like to filter out each group of lines that starts with a "transaction" line containing "user: build", all the way up to the next line that starts with "transaction".
The idea is to end up with transaction lines where user is not "build".
Thanks for any help.

If you want only the transaction lines for all users except build:
grep '^transaction ' test_data | grep -v 'user: build$'
If you want the whole transaction record for such users, toggle a flag at each transaction header (on unless the header ends in user: build) and print while the flag is set:
awk '/^transaction /{ p = !/user: build$/};p' test_data
or
perl -lne 'if(/^transaction /){$p = !/user: build$/}; print if $p' test_data
The -A and -v options of grep would have done the trick if all transaction records had the same number of lines.
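For the sample input above, the awk (and perl) version prints the whole record of the one transaction whose user is not build:
transaction 390136; promote; 2016/12/20 01:17:08 ; user: najmi
to: DEVELOPMENT ; from: DEVELOPMENT_BUILD
/./some/other/file/path 11745/1 (22269/1)
version 22269/1 (22269/1)
ancestor: (none - initial version)
type: dir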


Splunk search Regex: to filter timestamp and userId

From the log line below, I want to extract the timestamp together with the UserID and group them:
2020-10-12 12:30:22.540 INFO 1 --- [enerContainer-4] c.t.t.o.s.s.UserPrepaidService : Validating the user with UserID:1111 systemID:sys111
It appears in the following full log:
2020-10-12 12:30:22.538 INFO 1 --- [ener-4] c.t.t.o.s.service.UserService : AccountDetails":[{"snumber":"2222","sdetails":[{"sId":"0474889018","sType":"Java","plan":[{"snumber":"sdds22"}]}]}]}
2020-10-12 12:30:22.538 INFO 1 --- [ener-4] c.t.t.o.s.service.ReceiverService : Received userType is:Normal
2020-10-12 12:30:22.540 INFO 1 --- [enerContainer-4] c.t.t.o.s.s.UserPrepaidService : Validating the user with UserID:1111 systemID:sys111
2020-10-12 12:30:22.540 INFO 1 --- [enerContainer-4] c.t.t.o.s.util.CommonUtil : The Code is valid for userId: 1111 systemId: sys111
2020-10-12 12:30:22.577 INFO 1 --- [enerContainer-4] c.t.t.o.s.r.Dao : Saving user into dB ..... with User-ID:1111
....
(the same line repeats)
Below is my SPL search; it returns only the userId, grouped, from that specific line.
But I want the timestamp from that line as well, so I can group by it in a timechart:
index="tis" logGroup="/ecs/logsmy" "logEvents{}.message"="*Validating the user with UserID*" | spath output=myfield path=logEvents{}.message | rex field=myfield "(?<=Validating the user with UserID:)(?<userId>[0-9]+)(?= systemID:)" | table userId | dedup userId | stats count values(userId) by userId
Basically I tried the below:
(^(?<dtime>\d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\.\d+) )(?<=Validating the user with UserID:)(?<userId>[0-9]+)(?= systemID:)
but it captured the timestamp on every line, not just the specific line I mentioned above.
You placed the lookaround right after matching the timestamp pattern, but you first have to move to the position where the lookbehind is true.
If you want both values, you can match Validating the user with UserID: and systemID: literally instead of using lookarounds.
If there are leading whitespace characters, you could match them with \s* or [^\S\r\n]*:
^\s*(?<dtime>\d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\.\d+).*\bValidating the user with UserID:(?<userId>[0-9]+) systemID:
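As a quick sanity check outside Splunk, the same pattern can be exercised with a perl one-liner (app.log is just a hypothetical file holding the log lines above):
perl -ne 'print "$+{dtime} $+{userId}\n" if /^\s*(?<dtime>\d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\.\d+).*\bValidating the user with UserID:(?<userId>[0-9]+) systemID:/' app.log
This prints 2020-10-12 12:30:22.540 1111 for the Validating line only.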

Extracting part of lines with specific pattern and sum the digits using bash

I am just learning bash scripting and commands and I need some help with this assignment.
I have txt file that contains the following text and i need to:
Extract the guest name (1.1.1, ...)
Sum each guest's results and output the guest name with the total.
I used sed with a simple regex to extract the name and the digits, but I have no idea how to sum the numbers, because each guest has multiple record lines in the txt file, as you can see. Note: I can't use awk for processing.
Here is my code:
cat file.txt | sed -E 's/.*([0-9]{1}.[0-9]{1}.[0-9]{1}).*([0-9]{1})/\1 \2/'
And result is:
1.1.1 4
2.2.2 2
1.1.1 1
3.3.3 1
2.2.2 1
Here is the .txt file:
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
and the output should be:
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
Thank you in advance
I know your teacher won't let you use awk but, since beyond this one exercise you're trying to learn how to write shell scripts, FYI here's how you'd really do this job in a shell script:
$ awk -F'[ "]' -v OFS=' = ' '{sum[$2]+=$NF} END{for (id in sum) print id, sum[id]}' file
3.3.3 = 1
2.2.2 = 3
1.1.1 = 5
and here's a bash builtins equivalent which may or may not be what you've covered in class and so may or may not be what your teacher is expecting:
$ cat tst.sh
#!/usr/bin/env bash
declare -A sum
while read -r _ id _ cnt; do
    (( sum[$id] += ${cnt#\"} ))            # strip the leading quote, add to this guest's total
done < "$1"
for id in "${!sum[@]}"; do
    printf '%s = %d\n' "$id" "${sum[$id]}"
done
$ ./tst.sh file
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
See https://www.artificialworlds.net/blog/2012/10/17/bash-associative-array-examples/ for how I'm using the associative array. It'll be orders of magnitude slower than the awk script and I'm not 100% sure it's bullet-proof (since shell isn't designed to process text there are a LOT of caveats and pitfalls) but it'll work for the input you provided.
OK, since this is a class assignment, I will tell you how I did it and let you write the code.
First, I sorted the file. Then, I read the file one line at a time. If the name changed, I printed out the previous name and count, and set the count to be the value on that line. If the name did not change, I added the value to the count.
The second solution used an associative array to hold the counts, using the guest name as the index: you just add each new value to the count in the array element indexed by the guest name.
At the end, loop through the array and print out the indexes and values.
It's a lot shorter.
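For reference, here is a minimal sketch of that first approach (sort, then compare each guest name to the previous one); treat it as one possible shape of the solution, not the definitive one:
#!/usr/bin/env bash
# Sort so each guest's lines are adjacent, then flush the running
# total every time the guest name changes.
prev= total=0
while read -r _ id _ cnt; do
    cnt=${cnt#\"}                            # strip the leading quote from "4
    if [[ $id == "$prev" ]]; then
        (( total += cnt ))
    else
        [[ -n $prev ]] && printf '%s = %d\n' "$prev" "$total"
        prev=$id
        total=$cnt
    fi
done < <(sort file.txt)
[[ -n $prev ]] && printf '%s = %d\n' "$prev" "$total"    # flush the last guest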

Print remaining lines in file after regular expression that includes variable

I have the following data:
====> START LOG for Background Process: HRBkg Hello on 2013/09/27 23:20:20 Log Level 3 09/27 23:20:20 I Background process is using
processing model #: 3 09/27 23:20:23 I 09/27 23:20:23 I --
Started Import for External Key
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3 09/30 07:31:07 I Background process is using
processing model #: 3 09/30 07:31:09 I 09/30 07:31:09 I --
Started Import for External Key
I need to extract the remaining file contents after the LAST match of ====> START LOG.....
I have tried numerous times to use sed/awk; however, I cannot seem to get awk to use a variable in my regular expression. The variable I was trying to include was the date (2013/09/30), since that is what makes the line unique.
I am on an HP-UX machine and cannot use grep -A.
Any advice?
There's no need to test for a specific time just to find the last entry in the file:
awk '
BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }        # queue the same file a second time
NR == FNR { if (/START LOG/) lastMatch=NR; next }  # pass 1: remember the last match
FNR == lastMatch { found=1 }                       # pass 2: set the flag at that line
found                                              # print once the flag is set
' file
This might work for you (GNU sed). Whenever a line matches START LOG followed by the given date, the hold space is overwritten with that line; every other line is appended to it; at the end of the file the hold space is swapped in and printed:
a=2013/09/30
sed '\|START LOG.*'"$a"'|{h;d};H;$!d;x' file
This will return your desired output. It uses the same idea: a matching line overwrites the hold space, all other lines are appended to it, and the accumulated hold space is printed at the end of the file:
sed -n '/START LOG/h;/START LOG/!H;$!b;x;p' file
If you have tac available, you could easily do:
tac <file> | sed '/START LOG/q' | tac
This reverses the file, keeps everything up to and including the first START LOG line, and reverses it back.
Here is one in Python:
#!/usr/bin/python
import sys, re
for fn in sys.argv[1:]:
    with open(fn) as f:
        m = re.search(r'.*(^====> START LOG.*)', f.read(), re.S | re.M)
        if m:
            print m.group(1)
Then run:
$ ./re.py /tmp/log.txt
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
If you want to exclude the ====> START LOGS.. bit, change the regex to:
r'.*(?:^====> START LOG.*?$\n)(.*)'
For the record, you can easily match a variable against a regular expression in Awk, or vice versa.
awk -v date='2013/09/30' '$0 ~ date {p=1} p' file
This sets p to 1 if the input line matches the date, and prints if p is non-zero.
(Recall that the general form in Awk is condition { actions } where the block of actions is optional; if omitted, the default action is to print the current input line.)
This prints the last START LOG block: the first pass over the file records where the last block starts, and the second pass prints from that line on.
awk 'FNR==NR { if ($0~/^====> START LOG/) f=NR;next} FNR>=f' file file
You can use a variable, but if you have another file with another date, you need to know the date in advance.
var="2013/09/30"
awk '$0~v && /^====> START LOG/ {f=1}f' v="$var" file
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
With GNU awk (gawk) or Mike's awk (mawk) you can set the record separator (RS) so that each record contains a whole log message. Then all you need to do is print the last one in the END block:
awk 'END { printf "%s", RS $0 }' RS='====> START LOG' infile
Output:
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
Answer in perl, assuming your logs are in filelog.txt:
my @line;
open(LOG, "<filelog.txt") or die "could not open filelog.txt";
while (<LOG>) {
    push @line, $_;                     # slurp the file into an array
}
close(LOG);
# Walk backwards from the last line until the START LOG marker,
# collecting lines as we go (unshift keeps them in file order).
my @newarray;
for (my $i = $#line; $i >= 0; $i--) {
    unshift @newarray, $line[$i];
    last if $line[$i] =~ m/^====> START LOG/;
}
print @newarray;

AWStats multiple columns in extra section

I have AWStats running and the reports are built from IIS logfiles.
I have an extra section to view all the actions of the executed perlscripts on the site.
The config looks like this:
ExtraSectionName1="Actions"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1="URL,\/cgi\-bin\/.+\.pl"
ExtraSectionFirstColumnTitle1="Action"
ExtraSectionFirstColumnValues1="QUERY_STRING,action=([a-zA-Z0-9]+)"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=HPB
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=20
MinHitExtra1=1
The output looks like this:
Action Pages Hits
foo 1234 1234
bar 5678 5678
But there are some actions with the same name in different perl scripts.
I would need this:
Script Action Pages Hits
foo.pl foo 1234 1234
bar.pl foo 1234 1234
foo.pl bar 5678 5678
bar.pl bar 5678 5678
Does anyone know how to create such a report?
EDIT:
I did some more research, and all forum posts I've found say that it is not possible to have two columns in an extra section without hacking awstats.pl.
Now I am trying to put it into one column using URLWITHQUERY to output something like this:
Action Pages Hits
foo.pl?action=foo 1234 1234
foo.pl?action=bar 1234 1234
bar.pl?action=foo 5678 5678
...
The new problem is that the query string has more parameters than just action, and they appear in no fixed order.
I tried this
ExtraSectionFirstColumnValues1="URLWITHQUERY,([a-zA-Z0-9]+\.pl\?).*(action=[a-zA-Z0-9]+)"
but AWStats only gets the value from the first bracket pair and ignores the rest. I think it internally works with $1 provided by the perl regex 'magic'.
Any ideas?
maybe?
ExtraSectionFirstColumnTitle1="Script"
ExtraSectionFirstColumnValues1="URL,\/cgi\-bin\/(.+\.pl)`enter code here`"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionFirstColumnTitle2="Action"
ExtraSectionFirstColumnValues2="QUERY_STRING,action=([a-zA-Z0-9]+)"
ExtraSectionFirstColumnFormat2="%s"
I've found a solution.
awstats.pl fetches the data for the specified extra sections in lines 19664-19750.
This is my modification:
# Lines 19693-19701 in awstats.pl (AWStats version 7, revision 1.971)
elsif ( $rowkeytype eq 'URLWITHQUERY' ) {
    if ( "$urlwithnoquery$tokenquery$standalonequery" =~
        /$rowkeytypeval/ )
    {
        $rowkeyval = "$1$2";    # I simply added a $2 for the second capture group
        $rowkeyok  = 1;
        last;
    }
}
This will get the first and the second capture group specified in the ExtraSectionFirstColumnValuesX regex.
Example:
ExtraSectionFirstColumnValues1="URLWITHQUERY,([a-zA-Z0-9]+\.pl\?).*(action=[a-zA-Z0-9]+)"
Needless to say, you need to add $3, $4, $5, ... if you need more capture groups.

Mercurial template: separator for tags and bookmarks

Can I put space characters as separators only if I have any Tag or Bookmark?
Example:
hg log --template "{rev} {author} {tags} {bookmarks} {desc|firstline}\n"
Output:
3: Author1 TIP BKMRK_NAME Another commit
2: Author1 Third commit
1: Author1 TAG1 Second commit
0: Author1 Initial commit
The changesets that don't have tags or bookmarks still print the separator spaces. I'd like to suppress those extra spaces:
3: Author1 TAG_NAME BKMRK_NAME Another commit
2: Author1 Third commit
1: Author1 TAG1 Second commit
0: Author1 Initial commit
With a recent Mercurial (2.5 or later), you can use the if template expression:
hg log --template '{rev} {author}{if(tags, " {tags}")}{if(bookmarks," {bookmarks}")} {desc|firstline}\n'
A new style can be created that defines what the start of a collection (e.g. bookmarks), each element of the collection, and the last element of the collection look like.
Save the following style definition in a file e.g. called "my_style":
changeset = '{rev} {author}{bookmarks} {desc|firstline}\n'
start_bookmarks = ' ['
bookmark = '{bookmark}, '
last_bookmark = '{bookmark}]'
You then can call hg log with the just newly created style:
> hg log --style /path/to/my_style
6 james [bar, foo, master] b 3
5 james b 2
4 james a 3
This would only insert a space and bracket if there are bookmarks at all (note that there's no space between {author} and {bookmarks}).
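Presumably tags could be handled the same way with the analogous entries; a sketch following the same start_/last_ naming convention (an untested assumption on my part, after adding {tags} to the changeset line):
changeset = '{rev} {author}{tags}{bookmarks} {desc|firstline}\n'
start_tags = ' '
tag = '{tag} '
last_tag = '{tag}'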
I don’t know much about templating in Mercurial, but you can always filter out the extra spaces with sed:
hg log --template "{rev} {author} {tags} {bookmarks} {desc|firstline}\n" | sed "s/  */ /g"
(Note the two spaces before the *: the pattern then matches one or more spaces and squeezes each run into a single one; with a single space before the * it would also match empty strings and insert stray spaces between characters.)
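For example, on a line where empty {tags} and {bookmarks} left a run of spaces:
$ echo '2: Author1   Third commit' | sed 's/  */ /g'
2: Author1 Third commit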