Parsing ns2 trace file

I'm using NS 2.35 and am trying to determine the end-to-end delay of my routing algorithm.
I think anyone with good scripting experience should be able to answer this question; sadly, that person is not me.
I have a trace file that looks something like this:
- -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 6 ------- null}
h -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 -1 ------- null}
+ -t 0.55 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1056 -a 0 -x {2.0 17.0 10 ------- null}
+ -t 0.555 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1057 -a 0 -x {2.0 17.0 11 ------- null}
r -t 0.556 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
+ -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
- -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
Here is what I need to do.
A line that starts with + is when a new packet is added to the network.
A line starting with r is when a packet has been received by the destination. The floating-point number after -t is the time at which that event happened. Finally, the number after -i is the ID of the packet.
To calculate the average end-to-end delay, I need to find every line that has a given ID after -i, and from those lines compute the timestamp of the r minus the timestamp of the +.
So I figure there could be a regular expression that splits each line on spaces. I could put each of the segments into its own variable, then check the 15th one (the packet ID).
But I'm not sure where to go from there, or how to put it all together.
I know there are some AWK scripts on the web for doing this, but they are all outdated and don't fit the current format (and I'm not sure how to change them).
Any help would be greatly appreciated.
EDIT:
Here is an example of a full packet route that I'm looking to find.
I've taken out a lot of the lines in between these, so that you can see a single packet's events.
# a packet is enqueued from node 2 going to node 7. Its ID is 1636. This was at roughly 1.75sec
+ -t 1.74499999999998 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.1s, it left node 2.
- -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.134 it hopped from 2 to 7 (not important)
h -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# at 2.182 it was received by node 7
r -t 2.182 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it was then enqueued by node 7 to be sent to node 12
+ -t 2.182 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# slightly later it left node 7 on its way to node 12
- -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it hopped from 7 to 12 (not important)
h -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 12
r -t 2.2312 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# added to queue, heading to node 17
+ -t 2.2312 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# left for node 17
- -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# hopped to 17 (not important)
h -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 17 notice the time delay
r -t 2.28 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
The ideal output of the script would recognize 2.134 as the start time, and 2.28 as the end, and then give me the delay of 0.146sec. It would do this for all packet IDs and only report the average.
It was requested that I expand a bit on how the file works, and what I am expecting.
The file is listing descriptions of about 10,000 packets. Each packet can be in a different state. The important states are + which means a packet has been enqueued at a router, and r which means the packet has been received by its destination.
It is possible that a packet that is enqueued (so a + entry) is not actually received and is instead dropped. This means we cannot assume that for every + entry there will be an r entry.
What I'm trying to measure is the average end to end delay. What this means, is that if you look at a single packet, it will have a time it was enqueued, and a time it was received. I need to make this calculation to find its end-to-end delay. But I also need to do it for 9,999 other packets to get an average.
I've thought about it more, and here's roughly how I think the algorithm needs to work:
1. Remove all lines that don't begin with a + or an r, because they are unimportant.
2. Go through all of the packet IDs (the numbers after -i, such as 1052 in the example) and put the lines into groups, one per ID (multiple arrays, perhaps). Each group now contains all of the information about a particular packet.
3. Inside a group, look for a + line; ideally we want the very first one. Record its time.
4. Look for any more + lines and check their times. The log may be slightly jumbled, so a + line that appears later in the file may actually be earlier in the simulation. If such a + line has an earlier time, update the recorded time with it.
5. Once there are no more + lines, look for an r line. If there is none, the packet was dropped, so ignore it.
6. Among all the r lines, find the one with the latest timestamp. That is where the packet was finally received.
7. Subtract the + time from the r time; this gives the time it took the packet to travel.
8. Add this value to an array so that it can be averaged later.
9. Repeat this process for every packet ID group, and finally average the array of delays.
That's a lot of typing, but I think it's as clear as I can be about what I want. I wish I were a regex master, but I just don't have time to learn it well enough to pull this off.
Thanks for all your help, and let me know if you have any questions.

There's not much to work with here, as Iain said in the comments to your question, but if I understand what you want to do correctly, something like this should work:
awk '/^[+r]/{$1~/r/?r[$15]=$3:r[$15]?d[$15]=r[$15]-$3:1} END {for(p in d){sum+=d[p];num++}print sum/num}' trace.file
It skips all lines not starting with '+' or 'r'. If the line starts with 'r' it adds time to the r array. Otherwise, it calculates the delay and adds it to the d array if the element is found in the r array. Finally it loops over the elements in the d array, adds up the total delay and number of elements and calculates the average from this. In your case the average is 0.
The :1 at the end of the main block is just in there so I can get away with a ternary expression instead of the significantly more verbose if statement.
EDIT: New expression to work with the added conditions:
awk '/^[+r]/{$1~/r/?$3>r[$15]?r[$15]=$3:1:!a[$15]||$3<a[$15]?a[$15]=$3:1} END {for(i in r){sum+=r[i]-a[i];num++}print "Average delay", sum/num}' trace.file
or as an awk-file
/^[+r]/ {
    if ($1 ~ /r/) {
        if ($3 > received[$15])
            received[$15] = $3;
    } else {
        if (!added[$15] || $3 < added[$15])
            added[$15] = $3;
    }
} END {
    for (packet in received) {
        sum += received[packet] - added[packet];
        num++
    }
    print "Average delay", sum/num
}
According to your algorithm it seems like 1.745 would be the start time, while you write that 2.134 is.
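The scripts above rely on the time being field 3 and the packet ID being field 15. As a rough sketch only (the function name avg_delay is made up here and is not part of the answer), the same earliest-+ / latest-r bookkeeping can look up the values by their -t and -i flags instead, which may be more forgiving if the column layout of the trace differs. It assumes the flag/value pairs are space-separated as in the sample, and it skips packets that have no + or no r line:

```shell
# Hypothetical helper: average end-to-end delay from an ns2 trace,
# locating the -t and -i values by flag name instead of column number.
avg_delay() {
  awk '
    $1 == "+" || $1 == "r" {
      t = ""; id = ""
      for (i = 2; i < NF; i++) {
        if ($i == "-t") t = $(i + 1) + 0      # event time, forced numeric
        else if ($i == "-i") id = $(i + 1)    # packet ID
      }
      if (t == "" || id == "") next           # malformed line, ignore it
      if ($1 == "+") {
        # keep the earliest enqueue time seen for this packet
        if (!(id in first) || t < first[id]) first[id] = t
      } else {
        # keep the latest receive time seen for this packet
        if (!(id in last) || t > last[id]) last[id] = t
      }
    }
    END {
      for (id in last)
        if (id in first) { sum += last[id] - first[id]; n++ }
      if (n) printf "%.4f\n", sum / n
      else print "no delivered packets"
    }
  ' "$@"
}
```

It can be called as avg_delay trace.file, or fed the trace on standard input.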


Clear old resources in k8s

I want to make a command which can clear all old deployments. For example, I have these deployments in a namespace:
kubectl -n web get deploy --sort-by=.metadata.creationTimestamp
myproject-static-staging-master 1/1 1 1 54d
myproject-static-staging-task-13373 1/1 1 1 20d
myproject-static-staging-task-13274 1/1 1 1 19d
myproject-static-staging-task-13230 1/1 1 1 19d
myproject-static-staging-task-13323 1/1 1 1 19d
myproject-static-staging-task-13264 1/1 1 1 18d
myproject-static-staging-task-13319 1/1 1 1 13d
myproject-static-staging-task-13470 1/1 1 1 6d20h
myproject-static-staging-task-13179 1/1 1 1 6d20h
myproject-static-staging-task-13453 1/1 1 1 6d4h
myproject-static-staging-moving-to-old 1/1 1 1 6d
myproject-static-staging-moving-test 1/1 1 1 5d20h
I want to keep only these (the 5 newest):
myproject-static-staging-task-13470 1/1 1 1 6d20h
myproject-static-staging-task-13179 1/1 1 1 6d20h
myproject-static-staging-task-13453 1/1 1 1 6d4h
myproject-static-staging-moving-to-old 1/1 1 1 6d
myproject-static-staging-moving-test 1/1 1 1 5d20h
I tried this command:
kubectl get deployment -n web --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' --sort-by=.metadata.creationTimestamp | grep -v master | grep myproject-static-staging | head -n 5 | xargs -r kubectl -n web delete deployment
but it is not correct.
You can use xargs command like this:
command1 | xargs -I{} command2 {}
xargs replaces each {} with one line of output from command1. For example, if command1 prints 1, 2 and 3 on separate lines, xargs will invoke 'command2 1', 'command2 2', and 'command2 3'.
So in your case, you can use
kubectl get deployment -n web --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' --sort-by=.metadata.creationTimestamp | grep -v master | grep myproject-static-staging | tail -r | tail -n +6 | xargs -I{} kubectl -n web delete deployment {}
'tail -r' will reverse the order (it is a BSD/macOS option; GNU tail has no -r, so on Linux use 'tac' instead), and 'tail -n +6' will select all rows except the first 5.
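Since the list is sorted oldest first, the same selection can be made without reversing it, by dropping the last 5 lines. The following is only a sketch; all_but_last is a made-up helper name, and it uses awk so that it does not depend on either 'tail -r' or GNU-specific options:

```shell
# Hypothetical helper: print every input line except the last N (default 5).
all_but_last() {
  awk -v n="${1:-5}" '
    { buf[NR] = $0 }                                  # remember every line
    END { for (i = 1; i <= NR - n; i++) print buf[i] } # skip the final n
  '
}
```

In the pipeline above it would take the place of 'tail -r | tail -n +6', for example '... | grep myproject-static-staging | all_but_last 5 | xargs -I{} kubectl -n web delete deployment {}'. Running it first with 'echo' in front of kubectl shows what would be deleted.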

Bash, need to change word order in multiple directories

I have a considerable classical FLAC collection where each album is a directory. I've realized that I have used a sub-optimal structure and need to rename all the directories.
My current naming convention is:
COMPOSER (CONDUCTOR) - NAME OF PIECE
E.g.
"Bach (Celibidache) - Mass in F minor"
I want to change the naming to
COMPOSER - NAME OF PIECE (CONDUCTOR)
I.e.
"Bach - Mass in F minor (Celibidache)"
There are some possible exceptions: the (CONDUCTOR) part may be (CONDUCTOR, SOLOIST), and some directories do not have the (CONDUCTOR) part at all and should be left as is. The NAME OF PIECE can contain all legal letters and symbols.
All albums are located in the same parent directory, so no sub-directories.
What is the easy way to do this?
Use Perl rename. Some distributions ship it as rename (Ubuntu and related), some as prename (Fedora and Red Hat, AFAIK), so check which one you have first.
prename -n -- '-d && s/(\(.*\)) - (.*)/- \2 \1/' *
-n: don't rename, just print the results. Remove it once you are happy with the results.
--: end of the options, start of the perlexpr and files
-d: check that the file is a directory
s/.../.../: substitution
Example:
[test01@localhost composers]$ ls -la
total 12
drwxrwxr-x 3 test01 test01 4096 Feb 14 12:37 .
drwxrwxr-x. 7 test01 test01 4096 Feb 14 12:23 ..
drwxrwxr-x 2 test01 test01 4096 Feb 14 12:37 'Bach (Celibidache) - Mass in F minor'
-rw-rw-r-- 1 test01 test01 0 Feb 14 12:27 'Bach (Celibidache) - Mass in F minor.flac'
[test01@localhost composers]$ prename -n -- '-d && s/(\(.*\)) - (.*)/- \2 \1/' *
Bach (Celibidache) - Mass in F minor -> Bach - Mass in F minor (Celibidache)
[test01@localhost composers]$ prename -- '-d && s/(\(.*\)) - (.*)/- \2 \1/' *
[test01@localhost composers]$ ls -la
total 12
drwxrwxr-x 3 test01 test01 4096 Feb 14 12:38 .
drwxrwxr-x. 7 test01 test01 4096 Feb 14 12:23 ..
-rw-rw-r-- 1 test01 test01 0 Feb 14 12:27 'Bach (Celibidache) - Mass in F minor.flac'
drwxrwxr-x 2 test01 test01 4096 Feb 14 12:37 'Bach - Mass in F minor (Celibidache)'
Note that without -d both the file and the directory would have been renamed.
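If Perl rename is not available, roughly the same thing can be done in plain bash with sed. This is a sketch under that assumption, not a tested replacement: new_name and preview_renames are made-up names, the substitution is the same one as in the prename expression, and the loop only prints the mv commands so they can be checked before anything is renamed.

```shell
# Hypothetical helper: compute the new directory name using the same
# substitution as the prename expression, written for sed -E.
new_name() {
  printf '%s\n' "$1" | sed -E 's/(\(.*\)) - (.*)/- \2 \1/'
}

# Dry run: print the mv command for each directory in the current
# directory whose name would change. Nothing is actually renamed.
preview_renames() {
  local dir new
  for dir in */; do
    dir=${dir%/}                    # strip the trailing slash from the glob
    new=$(new_name "$dir")
    [ "$dir" != "$new" ] && printf 'mv -- %q %q\n' "$dir" "$new"
  done
  return 0
}
```

The glob '*/' only matches directories, which plays the role of the -d test. Names without a "(CONDUCTOR) - " part do not match the pattern and are left alone.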

How to extract full paths from grep output using regular expression

I need to automatically detect any USB drives plugged in, mounted or not, mount the ones not mounted already in folders that have the name of the given name of the device (like it happens in a Windows machine by default) and get the routes of the mount points of all the devices. The devices should be mounted in folders in /media/pi (using a Raspberry Pi, so pi is my username). This is what I'm doing:
To get the paths of all mounted devices:
1) Run lsblk, outputs:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk
├─mmcblk0p1 179:1 0 41.8M 0 part /boot
└─mmcblk0p2 179:2 0 14.8G 0 part /
2) With a particularly crafted line, I can filter out some unnecessary info:
I run lsblk | grep 'sd' | grep 'media' which outputs:
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
I need to get /media/pi/D0B46928B4691270 and /media/pi/MI PENDRIVE, preferably in an array. Currently I'm doing this:
lsblk | grep 'sd' | grep 'media' | cut -d '/' -f 4
But it only works with paths that have no spaces and the output of grep is not an array of course. What would be a clean way of doing this with regular expressions?
Thanks.
lsblk supports JSON output with the -J flag. I would recommend using that if you want to parse the output:
lsblk -J | jq '..|.?|select(.name|startswith("sd")).mountpoint // empty'
Something like this?
$ echo "$f"
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk
├─mmcblk0p1 179:1 0 41.8M 0 part /boot
└─mmcblk0p2 179:2 0 14.8G 0 part /
$ grep -o '/media/.*$' <<<"$f"
/media/pi/D0B46928B4691270
/media/pi/MI PENDRIVE
$ IFS=$'\n' drives=( $(grep -o '/media/.*$' <<<"$f") )
$ printf '%s\n' "${drives[@]}"
/media/pi/D0B46928B4691270
/media/pi/MI PENDRIVE
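One caveat with the assignment above is that IFS stays changed for the rest of the shell session. A possible alternative, assuming bash 4 or newer, is mapfile, which reads one array element per line and leaves IFS alone. The sample text below is abbreviated from the question's lsblk output (tree characters omitted); in real use the grep would read from lsblk directly.

```shell
# Sample lsblk output, abbreviated from the question.
lsblk_output='NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk'

# mapfile -t stores each input line as one array element (without the
# trailing newline), so spaces inside a path are preserved.
mapfile -t drives < <(grep -o '/media/.*$' <<<"$lsblk_output")

printf '%s\n' "${drives[@]}"
```

With live data the last command substitution would become '< <(lsblk | grep -o '/media/.*$')'.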

opencv_traincascade: Unspecified error (No element name has been given)

I'm trying to train my own cascade, but get the following error:
Unspecified error (No element name has been given) in cv::operator <<, file C:\builds\2_4_PackSlave-win64-vc11-shared\opencv\modules\core\include\opencv2/core/operations.hpp, line 2910
I took these steps:
1. Cropped 20 photos of the object so that only the desired object remained.
2. Resized them to 30x18.
3. Made an objectSamples.dat file like this:
object(1).jpg 1 0 0 30 18
object(10).jpg 1 0 0 30 18
object(11).jpg 1 0 0 30 18
And a negatives.dat like this:
negatives\1.jpeg
negatives\10.jpg
negatives\11.jpg
Each negative picture is roughly 500x500.
4. Make the vec file:
opencv_createsamples -info objectSamples.dat -vec objectSamples.vec -w 30 -h 18 -num 20
5. Show the samples (my pictures are shown in full): opencv_createsamples -vec objectSamples.vec -w 30 -h 18
6. Try to train: opencv_traincascade -data Cascade -vec objectSamples.vec -bg negatives.dat -numPos 10 -numNeg 10 -numStages 2 -featureType HAAR -w 30 -h 18
But I get the error shown above.
What am I doing wrong?
I read these articles and answers, but I still don't understand what the problem is:
trouble-when-use-opencv_traincascadeexe
haartraining tutorial
docs.opencv traincascade
Increased the number of images to 1000 positive and 2000 negatives
opencv_traincascade -data Cascade -vec boobsSamples.vec -bg negativesBig/negatives.txt -numPos 400 -numNeg 1000 -numStages 2 -featureType HAAR -w 30 -h 18 -mode ALL
Getting the same error.
Problem solved!
I had copied opencv_traincascade.exe into the folder with the images. When I instead called opencv_traincascade.exe by its full path inside the OpenCV installation, the problem disappeared.
F:\OpenCV\opencv\build\x64\vc11\bin\opencv_traincascade -data Cascade -vec positives.vec -bg negativesBig/negatives.txt -numPos 400 -numNeg 1000 -numStages 2 -featureType HAAR -w 30 -h 18 -mode ALL

parse Log File, check for date, report results

I need to take the time stamp printed in After FTP connection and check whether it happened today.
I have a log file which contains the following:
---------------------------------------------------------------------
Opening connection for file1.dat
---------------------------------------------------------------------
---------------------------------------------------------------------
Before ftp connection -- time is -- Mon Oct 21 04:01:52 CEST 2013
---------------------------------------------------------------------
---------------------------------------------------------------------
After ftp connection -- time is Mon Oct 21 04:02:03 CEST 2013 .
---------------------------------------------------------------------
---------------------------------------------------------------------
Opening connection for file2.dat
---------------------------------------------------------------------
---------------------------------------------------------------------
Before ftp connection -- time is -- Wed Oct 23 04:02:03 CEST 2013
---------------------------------------------------------------------
---------------------------------------------------------------------
After ftp connection -- time is Wed Oct 23 04:02:04 CEST 2013 .
---------------------------------------------------------------------
Desired Output:
INPUT:file1.dat --> FAIL # since it is Oct 21st considering today is Oct 23.
INPUT:file2.dat --> PASS # since it is Oct 23rd.
INPUT:file3.dat --> FAIL # File information does not exist
What I tried so far:
grep "file1.dat\\|Before ftp connection\\|After ftp connection" logfilename
But this returns all the info that matches either file1.dat OR Before ftp connection OR After ftp connection. Considering the above sample, I get 5 lines out of which last 2 lines are from file2.dat:
Opening connection for file1.dat
Before ftp connection -- time is -- Mon Oct 21 04:01:52 CEST 2013
After ftp connection -- time is Mon Oct 21 04:02:03 CEST 2013 .
Before ftp connection -- time is -- Wed Oct 23 04:02:03 CEST 2013
After ftp connection -- time is Wed Oct 23 01:02:04 CEST 2013 .
I am stuck here. Ideally I need to take Mon Oct 21 04:02:03 CEST 2013, compare it with today's date, and print the result FAIL.
Defining the records correctly makes things a lot easier:
$ awk '{print $5,($0~"After.*"d?"PASS":"FAIL")}' d="$(date +'%a %b %d')" RS= file
file1.dat FAIL
file2.dat PASS
Use awk:
# read dates in shell variables
read x m d x x y < <(date)
awk -v f='file2.dat' -v m=$m -v d=$d -v y=$y '$0 ~ f {s=1; next}
s && /After ftp connection/ {
res = ($8==m && $9==d && $12==y) ? "PASS" : "FAIL";
print f, res; exit
}' file.log
file2.dat PASS
FOLLOW UP by OP:
I achieved the intended results by this:
check_success ()
{
    CHK_DIR=/Archive
    if [[ ! -d ${CHK_DIR} ]]; then
        exit 1
    elif [[ ! -d ${LOG_FOLDER} ]]; then
        exit 1
    fi
    # count the .dat files whose modification date is today
    count_of_files=$(ls -al --time-style=+%D $CHK_DIR/*.dat | grep $(date +%D) | cut -f1 | awk '{ print $7}' | wc -l)
    if [[ $count_of_files -lt 1 ]]; then
        exit 2
    fi
    # full paths of today's files; the directory part is stripped inside the loop,
    # because basename called with several arguments does not handle a list of files
    list_of_files=$(ls -al --time-style=+%D $CHK_DIR/*.dat | grep $(date +%D) | cut -f1 | awk '{ print $7}')
    for filename in $list_of_files
    do
        filename=$(basename "$filename")
        lg_name=$(grep -El "Opening.*$filename" $LOG_FOLDER/* | head -1 )
        m=$(date +%b)
        d=$(date +%d)
        y=$(date +%Y)
        output=$(awk -v f=$filename -v m=$m -v d=$d -v y=$y '$0 ~ f {s=1; next} s && /After ftp connection/ { res = ($8==m && $9==d && $12==y) ? "0" : "1"; print res; exit }' $lg_name)
        if [[ ${output} != 0 ]]; then
            exit 2
        fi
    done
    exit 0
}
I used Anubhava's snippet; nevertheless, thanks to all three champs.
It was tricky!
$ awk -vtoday=$(date "+%Y%m%d") '
/^Opening/ {file=$4}
/^After ftp connection/ {
    $1=$2=$3=$4=$5=$6=$NF="";
    r="date -d \"" $0 "\" \"+%Y%m%d\""; r | getline dat;
    if (today==dat) {print file, "PASS"}
    else {print file, "FAIL"}
}' file
For file1.dat FAIL
For file2.dat PASS
Explanation
-vtoday=$(date "+%Y%m%d") gives today's date with "20131023" format
/^Opening/ {file=$4} gets lines starting with Opening and store the filename, that happens to be in the 4th field.
/^After ftp connection/ on lines starting with "After ftp connection...", do:
{$1=$2=$3=$4=$5=$6=$NF=""; delete up to 6th field and last one so the rest is the date info.
r="date -d \"" $0 "\" \"+%Y%m%d\""; r | getline dat; calculate the date on YYYYMMDD format of that line.
if (today==dat) {print file, "PASS"} make comparison of dates.
else {print file, "FAIL"} idem.
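For comparison, here is a sketch that reports every file in the log in one pass without calling date for each record. It is an illustration only: check_log is a made-up name, the month, day and year to compare against are passed in as arguments, and it assumes the "After ftp connection" lines always have the layout shown in the question (month in field 8, day in field 9, year in field 12).

```shell
# Hypothetical helper: print "<file> PASS" or "<file> FAIL" for each
# "After ftp connection" line, comparing its date with the one given.
# Usage: check_log MONTH_ABBREV DAY YEAR < logfile
check_log() {
  awk -v m="$1" -v d="$2" -v y="$3" '
    /^Opening connection for/ { file = $4 }   # remember the current file
    /^After ftp connection/ {
      # fields: After ftp connection -- time is Mon Oct 21 04:02:03 CEST 2013 .
      # the day is compared numerically so that "03" and "3" are equal
      res = ($8 == m && $9 + 0 == d + 0 && $12 == y) ? "PASS" : "FAIL"
      print file, res
    }
  '
}
```

For today's date it could be called as check_log $(date '+%b %d %Y') < logfile. Note that %b depends on the locale, so the log and the shell must agree on month names. A file that never appears in the log produces no line at all, so the "file3.dat --> FAIL" case from the question would still need a separate check.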