Clear old resources in k8s - templates

I want to make a command that can clear all old deployments. For example, I have these deployments in a namespace:
kubectl -n web get deploy --sort-by=.metadata.creationTimestamp
myproject-static-staging-master 1/1 1 1 54d
myproject-static-staging-task-13373 1/1 1 1 20d
myproject-static-staging-task-13274 1/1 1 1 19d
myproject-static-staging-task-13230 1/1 1 1 19d
myproject-static-staging-task-13323 1/1 1 1 19d
myproject-static-staging-task-13264 1/1 1 1 18d
myproject-static-staging-task-13319 1/1 1 1 13d
myproject-static-staging-task-13470 1/1 1 1 6d20h
myproject-static-staging-task-13179 1/1 1 1 6d20h
myproject-static-staging-task-13453 1/1 1 1 6d4h
myproject-static-staging-moving-to-old 1/1 1 1 6d
myproject-static-staging-moving-test 1/1 1 1 5d20h
I want to keep only these (the 5 newest):
myproject-static-staging-task-13470 1/1 1 1 6d20h
myproject-static-staging-task-13179 1/1 1 1 6d20h
myproject-static-staging-task-13453 1/1 1 1 6d4h
myproject-static-staging-moving-to-old 1/1 1 1 6d
myproject-static-staging-moving-test 1/1 1 1 5d20h
I tried this command:
kubectl get deployment -n web --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' --sort-by=.metadata.creationTimestamp | grep -v master | grep myproject-static-staging | head -n 5 | xargs -r kubectl -n web delete deployment
but it is not correct: the list is sorted oldest first, so head -n 5 selects the five oldest deployments rather than everything except the five newest.

You can use the xargs command like this:
command1 | xargs -I{} command2 {}
xargs substitutes each input line from command1 for the {} placeholder. For example, if command1 prints 1, 2 and 3 on separate lines, xargs invokes 'command2 1', 'command2 2' and 'command2 3'.
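A minimal sketch to see that substitution in action (using echo as a stand-in for command2):
printf '%s\n' 1 2 3 | xargs -I{} echo command2 {}
# prints: command2 1, then command2 2, then command2 3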
So in your case, you can use
kubectl get deployment -n web --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' --sort-by=.metadata.creationTimestamp | grep -v master | grep myproject-static-staging | tail -r | tail -n +6 | xargs -I{} kubectl -n web delete deployment {}
'tail -r' reverses the order (it is a BSD option; on GNU/Linux use tac instead), and 'tail -n +6' then selects every row except the first 5, i.e. everything but the 5 newest deployments.
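On GNU/Linux, where tail has no -r option, a sketch of the same pipeline using tac:
kubectl get deployment -n web --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' --sort-by=.metadata.creationTimestamp | grep -v master | grep myproject-static-staging | tac | tail -n +6 | xargs -I{} kubectl -n web delete deployment {}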

Related

How do I use grep -Po regex options to match across multiple lines without being greedy?

I would like to capture only successful nmap scan results and exclude results that did not return useful information. I've listed my desired grep output below.
I tried using (?s) to enable DOTALL so that . includes line breaks and I can match/capture across multiple lines, but the problem is that it appears to disable the use of \n, which I want to use as part of my pattern.
I'm trying to use a lookahead, but I know .* is greedy, and I think it's matching the longest string, which is basically the entire file. I want it to match the shortest string instead.
How can I dynamically capture successful nmap scan results in the following text file using grep's -Po regex options?
desired output:
Nmap scan report for 10.11.1.72
Host is up (0.028s latency).
PORT STATE SERVICE
111/tcp open rpcbind
| nfs-ls: Volume /home
| access: Read Lookup NoModify NoExtend NoDelete NoExecute
| PERMISSION UID GID SIZE TIME FILENAME
| drwxr-xr-x 0 0 4096 2015-09-17T13:21:59 .
| drwxr-xr-x 0 0 4096 2015-01-07T10:56:34 ..
| drwxr-xr-x 1013 1013 4096 2015-09-17T13:21:47 jenny
| drwxr-xr-x 1012 1012 4096 2015-09-17T13:21:40 joe45
| drwxr-xr-x 1011 1011 4096 2015-09-17T13:21:52 john
| drwxr-xr-x 1014 1014 4096 2019-10-27T23:48:51 marcus
| drwxr-x--- 0 1010 4096 2015-01-08T16:01:31 ryuu
|_
| nfs-showmount:
|_ /home 10.11.0.0/255.255.0.0
| nfs-statfs:
| Filesystem 1K-blocks Used Available Use% Maxfilesize Maxlink
|_ /home 7223800.0 2059608.0 4797244.0 31% 8.0T 32000
Here is my current command that I'm starting with:
grep -Poz '(?s)\d+\.\d+\.\d+\.\d+.*Nmap' test2
test2 file:
### SCAN RESULTS ###
Nmap scan report for 10.11.1.39
Host is up (0.041s latency).
PORT STATE SERVICE
111/tcp filtered rpcbind
Nmap scan report for 10.11.1.44
Host is up (0.043s latency).
PORT STATE SERVICE
111/tcp closed rpcbind
Nmap scan report for 10.11.1.50
Host is up (0.043s latency).
PORT STATE SERVICE
111/tcp filtered rpcbind
Nmap scan report for 10.11.1.71
Host is up (0.040s latency).
PORT STATE SERVICE
111/tcp closed rpcbind
Nmap scan report for 10.11.1.72
Host is up (0.040s latency).
PORT STATE SERVICE
111/tcp open rpcbind
| nfs-ls: Volume /home
| access: Read Lookup NoModify NoExtend NoDelete NoExecute
| PERMISSION UID GID SIZE TIME FILENAME
| drwxr-xr-x 0 0 4096 2015-09-17T13:21:59 .
| drwxr-xr-x 0 0 4096 2015-01-07T10:56:34 ..
| drwxr-xr-x 1013 1013 4096 2015-09-17T13:21:47 jenny
| drwxr-xr-x 1012 1012 4096 2015-09-17T13:21:40 joe45
| drwxr-xr-x 1011 1011 4096 2015-09-17T13:21:52 john
| drwxr-xr-x 1014 1014 4096 2019-10-27T23:48:51 marcus
| drwxr-x--- 0 1010 4096 2015-01-08T16:01:31 ryuu
|_
| nfs-showmount:
|_ /home 10.11.0.0/255.255.0.0
| nfs-statfs:
| Filesystem 1K-blocks Used Available Use% Maxfilesize Maxlink
|_ /home 7223800.0 2068516.0 4788336.0 31% 8.0T 32000
Nmap scan report for 10.11.1.73
Host is up (0.041s latency).
PORT STATE SERVICE
111/tcp filtered rpcbind
Nmap scan report for 10.11.1.75
Host is up (0.041s latency).
PORT STATE SERVICE
111/tcp filtered rpcbind
Nmap scan report for 10.11.1.79
Host is up (0.041s latency).
PORT STATE SERVICE
111/tcp filtered rpcbind
Nmap scan report for 10.11.1.101
Host is up (0.041s latency).
PORT STATE SERVICE
111/tcp closed rpcbind
Use a non-greedy quantifier followed by a lookahead.
grep -Poz '(?s)\d+\.\d+\.\d+\.\d+.*?(?=Nmap)' test2
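To see what the non-greedy quantifier changes, here is a minimal sketch on a toy string:
echo 'aXbXc' | grep -Po 'a.*X'     # greedy: matches aXbX
echo 'aXbXc' | grep -Po 'a.*?X'    # non-greedy: matches aX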
Finally figured out how to do this, probably not the prettiest way of doing it but it works...
command:
grep -Poz 'Nmap scan report.+\nHost is up.+\n\nPORT.+\n\d+.+\n\|(.|\n)+?(?=\n\n)' test2
output:
Nmap scan report for 10.11.1.72
Host is up (0.041s latency).
PORT STATE SERVICE
111/tcp open rpcbind
| nfs-ls: Volume /home
| access: Read Lookup NoModify NoExtend NoDelete NoExecute
| PERMISSION UID GID SIZE TIME FILENAME
| drwxr-xr-x 0 0 4096 2015-09-17T13:21:59 .
| drwxr-xr-x 0 0 4096 2015-01-07T10:56:34 ..
| drwxr-xr-x 1013 1013 4096 2015-09-17T13:21:47 jenny
| drwxr-xr-x 1012 1012 4096 2015-09-17T13:21:40 joe45
| drwxr-xr-x 1011 1011 4096 2015-09-17T13:21:52 john
| drwxr-xr-x 1014 1014 4096 2019-10-27T23:48:51 marcus
| drwxr-x--- 0 1010 4096 2015-01-08T16:01:31 ryuu
|_
| nfs-showmount:
|_ /home 10.11.0.0/255.255.0.0
| nfs-statfs:
| Filesystem 1K-blocks Used Available Use% Maxfilesize Maxlink
|_ /home 7223800.0 2059600.0 4797252.0 31% 8.0T 32000
notes:
had to specify a unique filter for the first 5 lines to exclude unsuccessful scans
it's important to use -z, as this allows matching across \n
it was necessary to use the alternation (.|\n)+? to match text across multiple lines
used a lookahead (?=\n\n) to mark the end of the match
make sure to use +? to make the quantifier non-greedy so that it matches the shortest string instead of the longest

How to find the regex to perform an exact match in shell scripting

I have the output from this Kubernetes command:
kubectl get pods | grep eam-ui
eam-ui-hk8rk 1/1 Running 0 43m
eam-ui-jn9jj 1/1 Running 0 43m
eam-ui-v02-2vdlh 1/1 Running 0 2d6h
eam-ui-v02-4gkxx 1/1 Running 0 2d6h
eam-ui-v03-2hqjq 1/1 Running 0 2d22h
eam-ui-v03-jv4w7 1/1 Running 0 2d22h
I need to match the exact string from the first column, like (eam-ui, eam-ui-v02, eam-ui-v03). The last 5 alphanumeric characters change on each execution.
I tried the -w and even the -F option. It worked for v02 and v03, but for eam-ui it matches everything:
$ kubectl get pods | grep -w eam-ui-v02
eam-ui-v02-2vdlh 1/1 Running 0 2d6h
eam-ui-v02-4gkxx 1/1 Running 0 2d6h
$ kubectl get pods | grep -w eam-ui-v03
eam-ui-v03-2hqjq 1/1 Running 0 2d22h
eam-ui-v03-jv4w7 1/1 Running 0 2d22h
$ kubectl get pods | grep -w eam-ui
eam-ui-hk8rk 1/1 Running 0 48m
eam-ui-jn9jj 1/1 Running 0 48m
eam-ui-v02-2vdlh 1/1 Running 0 2d6h
eam-ui-v02-4gkxx 1/1 Running 0 2d6h
eam-ui-v03-2hqjq 1/1 Running 0 2d22h
eam-ui-v03-jv4w7 1/1 Running 0 2d22h
From the above I wanted only:
eam-ui-hk8rk 1/1 Running 0 48m
eam-ui-jn9jj 1/1 Running 0 48m
I suggest using awk since you only need to check the first field values:
# To check eam-ui
kubectl get pods | awk '$1 ~ /^eam-ui-[[:alnum:]]{5}$/'
# To check eam-ui-v02
kubectl get pods | awk '$1 ~ /^eam-ui-v02-[[:alnum:]]{5}$/'
# To check eam-ui-v03
kubectl get pods | awk '$1 ~ /^eam-ui-v03-[[:alnum:]]{5}$/'
Details
^ - start of string
eam-ui- - literal text
[[:alnum:]]{5} - five alphanumeric chars
$ - end of string.
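If you prefer plain grep, an equivalent sketch anchored to the start of the line should also work, assuming the pod name is followed by whitespace:
kubectl get pods | grep -E '^eam-ui-[[:alnum:]]{5}[[:space:]]'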
This will exclude lines containing v02 or v03 (note that inside a bracket expression | is a literal character, so the class is [23], not [2|3]):
grep -v -e 'v0[23]' test.txt

Google Composer - Airflow: My tasks are not scheduled

I am using Airflow on Cloud Composer, and I have some issues with the Airflow scheduler (it is slow, or stops).
Version: composer-1.10.4-airflow-1.10.6
I launched a "huge" collection run (because I will sometimes need it) with Airflow to test the scalability of my pipelines.
The result is that my scheduler apparently only schedules the DAGs with few tasks; the tasks of the big DAGs are not scheduled. Do you have any insight or advice about that?
Here is some information about my current configuration:
Cluster config:
10 cluster nodes, 20 vCPUs, 160 GB memory
airflow config:
core
store_serialized_dags: True
dag_concurrency: 160
store_dag_code: True
min_file_process_interval: 30
max_active_runs_per_dag: 1
dagbag_import_timeout: 900
min_serialized_dag_update_interval: 30
parallelism: 160
scheduler
processor_poll_interval: 1
max_threads: 8
dag_dir_list_interval: 30
celery
worker_concurrency: 16
webserver
default_dag_run_display_number: 5
workers: 2
worker_refresh_interval: 120
airflow scheduler DagBag parsing (airflow list_dags -r):
DagBag loading stats for /home/airflow/gcs/dags
Number of DAGs: 27
Total task number: 32229
DagBag parsing time: 22.468404
---------------+--------------------+---------+----------+-----------------------
file | duration | dag_num | task_num | dags
---------------+--------------------+---------+----------+-----------------------
/folder__dags/dag1 | 1.83547 | 1 | 1554 | dag1
/folder__dags/dag2 | 1.717692 | 1 | 3872 | dag2
/folder__dags/dag3 | 1.53 | 1 | 3872 | dag3
/folder__dags/dag4 | 1.391314 | 1 | 210 | dag4
/folder__dags/dag5 | 1.267788 | 1 | 3872 | dag5
/folder__dags/dag6 | 1.250022 | 1 | 1554 | dag6
/folder__dags/dag7 | 1.0973419999999998 | 1 | 2904 | dag7
/folder__dags/dag8 | 1.081566 | 1 | 3146 | dag8
/folder__dags/dag9 | 1.019032 | 1 | 3872 | dag9
/folder__dags/dag10 | 0.98541 | 1 | 1554 | dag10
/folder__dags/dag11 | 0.959722 | 1 | 160 | dag11
/folder__dags/dag12 | 0.868756 | 1 | 2904 | dag12
/folder__dags/dag13 | 0.81513 | 1 | 160 | dag13
/folder__dags/dag14 | 0.69578 | 1 | 14 | dag14
/folder__dags/dag15 | 0.617646 | 1 | 294 | dag15
/folder__dags/dag16 | 0.588876 | 1 | 210 | dag16
/folder__dags/dag17 | 0.563712 | 1 | 160 | dag17
/folder__dags/dag18 | 0.55615 | 1 | 726 | dag18
/folder__dags/dag19 | 0.553248 | 1 | 140 | dag19
/folder__dags/dag20 | 0.55149 | 1 | 168 | dag20
/folder__dags/dag21 | 0.543682 | 1 | 168 | dag21
/folder__dags/dag22 | 0.530684 | 1 | 168 | dag22
/folder__dags/dag23 | 0.498442 | 1 | 484 | dag23
/folder__dags/dag24 | 0.46574 | 1 | 14 | dag24
/folder__dags/dag25 | 0.454656 | 1 | 28 | dag25
/create_conf | 0.022272 | 1 | 20 | create_conf
/airflow_monitoring | 0.006782 | 1 | 1 | airflow_monitoring
---------------+--------------------+---------+----------+------------------------
Thank you for your help
The Airflow scheduler processes files in the DAGs directory using a round-robin algorithm, and this can cause long delays between tasks, because the scheduler will not be able to enqueue a task whose dependencies recently completed until its round robin returns to the enclosing DAG's module. Multiple DAG objects can be defined in the same Python module; although this is generally discouraged from a fault-isolation perspective, it may be necessary here to reduce the number of files the scheduler cycles through.
Sometimes the best approach is to restart the scheduler:
Get cluster credentials as described in official documentation
Run the following command to restart the scheduler:
kubectl get deployment airflow-scheduler -o yaml | kubectl replace --force -f -
Additionally, please restart the Airflow web server. Sometimes broken, invalid or resource-intensive DAGs can cause web server crashes, restarts or complete downtime. One way to do so is to remove or upgrade one of the PyPI packages installed in your environment.
Exceeding API usage limits/quotas
To avoid exceeding API usage limits/quotas or avoid running too many simultaneous processes, you can define Airflow pools in the Airflow web UI and associate tasks with existing pools in your DAGs. Refer to the Airflow documentation.
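For example, with the Airflow 1.10 CLI (the pool name and slot count below are hypothetical; pick values that match your quotas), you can create a pool and then reference it from your tasks via the operator's pool argument:
# Create a pool with 8 slots to cap concurrent API-bound tasks.
airflow pool -s api_pool 8 "Limit concurrent API calls"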
Check the logs in Logging section -> Cloud Composer Environment and look for any errors or warnings like: cannot import module, DagNotFound in DagModel.
Please have a look at my earlier answer regarding memory. Referring to the official documentation:
DAG execution is RAM limited. Each task execution starts with two Airflow processes: task execution and monitoring. Currently, each node can take up to 6 concurrent tasks. More memory can be consumed, depending on the size of the DAG.
Moreover, I would like to share an interesting article on Medium regarding calculations for resource requests.
I hope you find the above information useful.

How to extract full paths from grep output using regular expression

I need to automatically detect any USB drives plugged in, mounted or not, mount the ones not already mounted in folders named after the device (as happens on a Windows machine by default), and get the paths of the mount points of all the devices. The devices should be mounted in folders under /media/pi (I am using a Raspberry Pi, so pi is my username). This is what I'm doing:
To get the paths of all mounted devices:
1) Run lsblk, outputs:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk
├─mmcblk0p1 179:1 0 41.8M 0 part /boot
└─mmcblk0p2 179:2 0 14.8G 0 part /
2) With a carefully crafted pipeline, I can filter out some unnecessary info.
I run lsblk | grep 'sd' | grep 'media', which outputs:
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
I need to get /media/pi/D0B46928B4691270 and /media/pi/MI PENDRIVE, preferably in an array. Currently I'm doing this:
lsblk | grep 'sd' | grep 'media' | cut -d '/' -f 4
But it only works with paths that have no spaces, and the output of grep is not an array, of course. What would be a clean way of doing this with regular expressions?
Thanks.
lsblk supports JSON output with the -J flag. I would recommend that if you want to parse the output:
lsblk -J | jq -r '.. | objects | select(.name? // "" | startswith("sd")) | .mountpoint // empty'
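Alternatively, a sketch pinned to lsblk's documented JSON layout (a blockdevices array with nested children):
lsblk -J | jq -r '.blockdevices[] | select(.name|startswith("sd")) | .children[]? | .mountpoint // empty'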
Something like this?
$ echo "$f"
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk
├─mmcblk0p1 179:1 0 41.8M 0 part /boot
└─mmcblk0p2 179:2 0 14.8G 0 part /
$ grep -o '/media/.*$' <<<"$f"
/media/pi/D0B46928B4691270
/media/pi/MI PENDRIVE
$ IFS=$'\n' drives=( $(grep -o '/media/.*$' <<<"$f") )
$ printf '%s\n' "${drives[@]}"
/media/pi/D0B46928B4691270
/media/pi/MI PENDRIVE
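As a side note, on bash 4+ a mapfile-based assignment avoids the IFS juggling and is glob-safe:
$ mapfile -t drives < <(grep -o '/media/.*$' <<<"$f")
$ printf '%s\n' "${drives[@]}"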

parsing ns2 trace file [closed]

I'm using NS 2.35 and am trying to determine the end-to-end delay of my routing algorithm.
I think anyone with some good scripting experience should be able to answer this question, sadly that person is not me.
I have a trace file, that looks something like this:
- -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 6 ------- null}
h -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 -1 ------- null}
+ -t 0.55 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1056 -a 0 -x {2.0 17.0 10 ------- null}
+ -t 0.555 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1057 -a 0 -x {2.0 17.0 11 ------- null}
r -t 0.556 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
+ -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
- -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
But here is what I need to do.
A line that starts with + is when a new packet is added to the network.
A line starting with r is when a packet has been received by the destination. The floating-point number after -t is the time at which that event happened, and the number after -i is the identity of the packet.
For me to calculate the average end-to-end delay, I need to find every line that has a certain id after the -i. From there I need to calculate the timestamp of the r minus the timestamp of the +.
So I figure there could be a regular expression separated by spaces. I could put each of the segments into its own variable, then check the 15th (the packet ID).
But I'm not sure where to go from there, or how to put it all together.
I know there are some AWK scripts on the web for doing this, but they are all outdated and don't fit the current format (and I'm not sure how to change them).
Any help would be greatly appreciated.
EDIT:
Here is an example of a full packet route that I'm looking to find.
I've taken out a lot of lines in between these ones, so that you can see a single packets events.
# a packet is enqueued from node 2 going to node 7. Its ID is 1636. This was at roughly 1.75 s
+ -t 1.74499999999998 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.134 s, it left node 2
- -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.134 it hopped from 2 to 7 (not important)
h -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# at 2.182 it was received by node 7
r -t 2.182 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it was the enqueued by node 7 to be sent to node 12
+ -t 2.182 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# slightly later it left node 7 on its way to node 12
- -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it hopped from 7 to 12 (not important)
h -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 12
r -t 2.2312 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# added to queue, heading to node 17
+ -t 2.2312 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# left for node 17
- -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# hopped to 17 (not important)
h -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 17 notice the time delay
r -t 2.28 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
The ideal output of the script would recognize 2.134 as the start time, and 2.28 as the end, and then give me the delay of 0.146sec. It would do this for all packet IDs and only report the average.
It was requested that I expand a bit on how the file works, and what I am expecting.
The file is listing descriptions of about 10,000 packets. Each packet can be in a different state. The important states are + which means a packet has been enqueued at a router, and r which means the packet has been received by its destination.
It is possible that a packet that is enqueued (so a + entry) is not actually received and is instead dropped. This means we cannot assume that for every + entry there will be a r entry.
What I'm trying to measure is the average end to end delay. What this means, is that if you look at a single packet, it will have a time it was enqueued, and a time it was received. I need to make this calculation to find its end-to-end delay. But I also need to do it for 9,999 other packets to get an average.
I've thought about it more, and here's generally how I think the algorithm needs to work:
remove all lines that don't begin with a + or an r because they are unimportant.
go through all of the packet IDs (that is the numbers after -i, such as 1052 in the example), and put them into some sort of groups (multiple arrays perhaps).
each group should now contain all of the information about a particular packet.
inside the group, check if there is a +, ideally we want the very first +. Record its time.
look for any more + lines and look at their times. It's possible the log is slightly jumbled, so a + line later in the file may actually be earlier in the simulation.
If this new + line has an earlier time, then update the time variable with that.
assuming there are no more + lines, look for an r line.
if there is no r line, the packet was dropped so don't worry about it.
for every r line you find, all we need to do is find the one that has the latest timestamp
The r line with the latest timestamp is where the packet was finally received.
subtract the + time from the r time; this gives us the time it took for the packet to travel.
Add this value to an array so that later it can be averaged.
repeat this process on every packet ID group, and then finally average the created array of delays.
That's a lot of typing, but I think it's as clear as I can be about what I want. I wish I were a regex master, but I just don't have time to learn it well enough to pull this off.
Thanks for all your help, and let me know if you have any questions.
There's not much to work with here, as Iain said in the comments to your question, but if I understand what you want to do correctly, something like this should work:
awk '/^[+r]/{$1~/r/?r[$15]=$3:r[$15]?d[$15]=r[$15]-$3:1} END {for(p in d){sum+=d[p];num++}print sum/num}' trace.file
It skips all lines not starting with '+' or 'r'. If the line starts with 'r', it stores the timestamp (field 3) in the r array, keyed by the packet ID (field 15). Otherwise, if that packet ID is already in the r array, it calculates the delay and adds it to the d array. Finally it loops over the elements in the d array, adds up the total delay and the number of elements, and calculates the average from these. In your case the average is 0.
The :1 at the end of the main block is just in there so I can get away with a ternary expression instead of the significantly more verbose if statement.
EDIT: New expression to work with the added conditions:
awk '/^[+r]/{$1~/r/?$3>r[$15]?r[$15]=$3:1:!a[$15]||$3<a[$15]?a[$15]=$3:1} END {for(i in r){sum+=r[i]-a[i];num++}print "Average delay", sum/num}'
or as an awk-file
/^[+r]/ {
    if ($1 ~ /r/) {
        if ($3 > received[$15])
            received[$15] = $3;
    } else {
        if (!added[$15] || $3 < added[$15])
            added[$15] = $3;
    }
}
END {
    for (packet in received) {
        sum += received[packet] - added[packet];
        num++
    }
    print "Average delay", sum/num
}
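To run the file version (assuming it is saved as, say, delay.awk):
awk -f delay.awk trace.file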
According to your algorithm it seems like 1.745 would be the start time, while you write that 2.134 is.