How to extract full paths from grep output using regular expression

I need to automatically detect every USB drive that is plugged in, whether mounted or not, mount the ones that are not yet mounted in folders named after the device (as Windows does by default), and get the paths of the mount points of all the devices. The devices should be mounted in folders under /media/pi (I'm using a Raspberry Pi, so pi is my username). This is what I'm doing:
To get the paths of all mounted devices:
1) Run lsblk, outputs:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk
├─mmcblk0p1 179:1 0 41.8M 0 part /boot
└─mmcblk0p2 179:2 0 14.8G 0 part /
2) With a particularly crafted line, I can filter out some unnecessary info:
I run lsblk | grep 'sd' | grep 'media' which outputs:
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
I need to get /media/pi/D0B46928B4691270 and /media/pi/MI PENDRIVE, preferably in an array. Currently I'm doing this:
lsblk | grep 'sd' | grep 'media' | cut -d '/' -f 4
But it only works with paths that have no spaces and the output of grep is not an array of course. What would be a clean way of doing this with regular expressions?
Thanks.

lsblk supports JSON output with the -J flag. I would recommend using it if you want to parse the output:
lsblk -J | jq '..|.?|select(.name|startswith("sd")).mountpoint // empty'
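If jq is not available, the same idea can be sketched in Python. This is only a sketch under assumptions: the function name usb_mountpoints is made up for illustration, and it assumes the JSON layout that lsblk -J commonly prints (a top-level "blockdevices" list, partitions nested under "children", and a single "mountpoint" key per device).

```python
import json


def usb_mountpoints(lsblk_json, prefix="sd"):
    """Return the mount points of all devices whose name starts with `prefix`.

    `lsblk_json` is the text printed by `lsblk -J` (assumed layout, see above).
    """
    found = []

    def walk(devices):
        for dev in devices:
            mountpoint = dev.get("mountpoint")
            if dev.get("name", "").startswith(prefix) and mountpoint:
                found.append(mountpoint)
            # Partitions are nested under their parent disk in "children".
            walk(dev.get("children", []))

    walk(json.loads(lsblk_json).get("blockdevices", []))
    return found
```

To run it against live data, pass in the output of lsblk -J, for example via subprocess.run(["lsblk", "-J"], capture_output=True, text=True).stdout. Because the paths come back as a list, names with spaces such as "MI PENDRIVE" stay intact.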

Something like this?
$ echo "$f"
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 14.4G 0 disk
└─sda1 8:1 1 14.4G 0 part /media/pi/D0B46928B4691270
sdb 8:16 1 14.3G 0 disk
└─sdb1 8:17 1 14.3G 0 part /media/pi/MI PENDRIVE
mmcblk0 179:0 0 14.9G 0 disk
├─mmcblk0p1 179:1 0 41.8M 0 part /boot
└─mmcblk0p2 179:2 0 14.8G 0 part /
$ grep -o '/media/.*$' <<<"$f"
/media/pi/D0B46928B4691270
/media/pi/MI PENDRIVE
$ IFS=$'\n' drives=( $(grep -o '/media/.*$' <<<"$f") )
$ printf '%s\n' "${drives[@]}"
/media/pi/D0B46928B4691270
/media/pi/MI PENDRIVE
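For comparison, here is a rough Python equivalent of the grep -o approach. The function name extract_media_paths is hypothetical, and it assumes, like the grep pattern above, that the mount point is the last column and starts with /media/.

```python
import re

# Same pattern as the grep call: everything from "/media/" to the end of the line.
MEDIA_PATH = re.compile(r"/media/.*$")


def extract_media_paths(lsblk_text):
    """Return every /media/... mount point found in plain `lsblk` output."""
    paths = []
    for line in lsblk_text.splitlines():
        match = MEDIA_PATH.search(line)
        if match:
            # Keep embedded spaces, drop only trailing whitespace.
            paths.append(match.group(0).rstrip())
    return paths
```

Since the match runs to the end of the line, a path containing spaces is returned as a single list element.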

Related

How to increase 1st partition size via terminal only when there are second and third adjacent partitions for NVME

This is on an AWS EC2 M5a with EBS (Ubuntu 16.04)
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 50G 0 disk
├─nvme0n1p1 259:2 0 20G 0 part /
├─nvme0n1p2 259:3 0 2G 0 part [SWAP]
└─nvme0n1p3 259:4 0 28G 0 part
├─vg_abcdef-logs 251:1 0 8G 0 lvm /var/log
└─vg_abcdef-app 251:2 0 19G 0 lvm /home/abcdef
nvme1n1 259:0 0 50G 0 disk
└─vg_backups-backups 251:0 0 49G 0 lvm /home/abcdef/Backups-Disk
I added 50GB to the disk (nvme0n1)/EBS volume for a total of 100GB and need to expand the first partition (root /). I have the following:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 100G 0 disk
├─nvme0n1p1 259:2 0 20G 0 part /
├─nvme0n1p2 259:3 0 2G 0 part [SWAP]
└─nvme0n1p3 259:4 0 28G 0 part
├─vg_abcdef-logs 251:1 0 8G 0 lvm /var/log
└─vg_abcdef-app 251:2 0 19G 0 lvm /home/abcdef
nvme1n1 259:0 0 50G 0 disk
└─vg_backups-backups 251:0 0 49G 0 lvm /home/abcdef/Backups-Disk
The resize2fs command won't work on the first partition because there are second and third partitions right after it. As you can see below, resize2fs works on the third (nvme0n1p3):
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:1 0 100G 0 disk
├─nvme0n1p1 259:2 0 20G 0 part /
├─nvme0n1p2 259:3 0 2G 0 part [SWAP]
└─nvme0n1p3 259:4 0 78G 0 part
├─vg_abcdef-logs 251:1 0 8G 0 lvm /var/log
└─vg_abcdef-app 251:2 0 19G 0 lvm /home/abcdef
nvme1n1 259:0 0 50G 0 disk
└─vg_backups-backups 251:0 0 49G 0 lvm /home/abcdef/Backups-Disk
How do I move the second and third partitions (via terminal/CLI only) so that there is enough space to expand the first partition and extend its file system? I would prefer a solution that does not require stopping and restarting the EC2 instance.

error when resizing partition using `growpart` on AWS EBS instance

I have an EC2 instance where I'm attempting to resize the disk on the fly. I've followed the instructions in this SO post but when I run sudo growpart /dev/nvme0n1p1 1, I get the following error:
FAILED: failed to get start and end for /dev/nvme0n1p11 in /dev/nvme0n1p1
What does this mean and how can I resolve it?
More info:
Output from lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 300G 0 disk
└─nvme0n1p1 259:1 0 300G 0 part /
I can see that EBS volume is in the in-use (optimizing) state.
Thanks in advance!

But for me the solution didn’t work
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdf 202:80 0 280G 0 disk
├─xvdf1 202:81 0 4G 0 part [SWAP]
├─xvdf2 202:82 0 10G 0 part /data1
├─xvdf3 202:83 0 10G 0 part /data2
├─xvdf4 202:84 0 1K 0 part
├─xvdf5 202:85 0 10G 0 part /applications1
├─xvdf6 202:86 0 4G 0 part /applications2
├─xvdf7 202:87 0 8G 0 part /logsOld
├─xvdf8 202:88 0 50G 0 part /extra
├─xvdf9 202:89 0 20G 0 part /logs
└─xvdf10 202:90 0 64G 0 part /extra/tmp
growpart /dev/xvdf 10
FAILED: failed to get start and end for /dev/xvdf10 in /dev/xvdf

I think the name of the command growpart is a bit misleading: following the AWS instructions, the first argument is the disk and the second is the partition number, separated by a space:
sudo growpart /dev/nvme0n1 1
The first argument should not be the partition /dev/nvme0n1p1.
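To make the disk/partition split concrete, here is a small Python sketch that turns a partition device name into the two arguments shown above. The helper name growpart_args is invented for illustration, and the naming rule it encodes (a "p" separator when the disk name ends in a digit, otherwise the number appended directly) is an assumption based on the device names appearing in this thread.

```python
import re


def growpart_args(partition_device):
    """Split a partition device such as '/dev/nvme0n1p1' into (disk, number).

    The input must be a partition, not a whole disk.
    """
    # Disks whose name ends in a digit (nvme0n1, mmcblk0) use a 'p' separator.
    match = re.fullmatch(r"(.*\d)p(\d+)", partition_device)
    if match is None:
        # Otherwise the partition number follows the disk name directly (sda1, xvdf10).
        match = re.fullmatch(r"(.*\D)(\d+)", partition_device)
    if match is None:
        raise ValueError(f"not a partition device: {partition_device!r}")
    return match.group(1), int(match.group(2))
```

For /dev/nvme0n1p1 this yields the disk /dev/nvme0n1 and the number 1, which are the two separate arguments in the command above.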

How to find the regex to perform exact match in shell scripting

I have output from the kubernetes command
kubectl get pods | grep eam-ui
eam-ui-hk8rk 1/1 Running 0 43m
eam-ui-jn9jj 1/1 Running 0 43m
eam-ui-v02-2vdlh 1/1 Running 0 2d6h
eam-ui-v02-4gkxx 1/1 Running 0 2d6h
eam-ui-v03-2hqjq 1/1 Running 0 2d22h
eam-ui-v03-jv4w7 1/1 Running 0 2d22h
I need to match the exact string in the first column (eam-ui, eam-ui-v02, eam-ui-v03). The last 5 alphanumeric characters change on each execution.
I tried the -w option and even -F. It works for v02 and v03, but for eam-ui it matches everything:
$ kubectl get pods | grep -w eam-ui-v02
eam-ui-v02-2vdlh 1/1 Running 0 2d6h
eam-ui-v02-4gkxx 1/1 Running 0 2d6h
kubectl get pods | grep -w eam-ui-v03
eam-ui-v03-2hqjq 1/1 Running 0 2d22h
eam-ui-v03-jv4w7 1/1 Running 0 2d22h
kubectl get pods | grep -w eam-ui
eam-ui-hk8rk 1/1 Running 0 48m
eam-ui-jn9jj 1/1 Running 0 48m
eam-ui-v02-2vdlh 1/1 Running 0 2d6h
eam-ui-v02-4gkxx 1/1 Running 0 2d6h
eam-ui-v03-2hqjq 1/1 Running 0 2d22h
eam-ui-v03-jv4w7 1/1 Running 0 2d22h
From the above, I want only:
eam-ui-hk8rk 1/1 Running 0 48m
eam-ui-jn9jj 1/1 Running 0 48m

I suggest using awk since you only need to check the first field values:
# To check eam-ui
kubectl get pods | awk '$1 ~ /^eam-ui-[[:alnum:]]{5}$/'
# To check eam-ui-v02
kubectl get pods | awk '$1 ~ /^eam-ui-v02-[[:alnum:]]{5}$/'
# To check eam-ui-v03
kubectl get pods | awk '$1 ~ /^eam-ui-v03-[[:alnum:]]{5}$/'
Details
^ - start of string
eam-ui- - literal text
[[:alnum:]]{5} - five alphanumeric chars
$ - end of string.
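The same first-column check can be sketched in Python. The function name filter_pods is hypothetical; it assumes, as the awk pattern does, that the generated suffix is always exactly five alphanumeric characters.

```python
import re


def filter_pods(lines, prefix):
    """Keep lines whose first column is `prefix` plus '-' and a 5-character suffix."""
    pattern = re.compile(re.escape(prefix) + r"-[A-Za-z0-9]{5}")
    kept = []
    for line in lines:
        fields = line.split()
        # fullmatch anchors the pattern at both ends, like ^...$ in awk.
        if fields and pattern.fullmatch(fields[0]):
            kept.append(line)
    return kept
```

With the prefix eam-ui, a name such as eam-ui-v02-2vdlh is rejected because what follows "eam-ui-" is longer than five characters.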

This will exclude lines containing v02 or v03:
grep -v -e 'v0[23]' test.txt

Way to get SCSI disk names in Linux C++ application

In my Linux C++ application I want to get the names of all SCSI disks present on the system, e.g. /dev/sda, /dev/sdb, and so on.
Currently I derive them from the contents of /proc/scsi/sg/devices (shown first), using the code below:
host chan SCSI id lun type opens qdepth busy online
0 0 0 0 0 1 128 0 1
1 0 0 0 0 1 128 0 1
1 0 0 1 0 1 128 0 1
1 0 0 2 0 1 128 0 1
// If the SCSI device index is >= 26, the device name has two letters, e.g. /dev/sdaa or /dev/sdab.
if (MAX_ENG_ALPHABETS <= scsiId)
{
    // Device name order is: aa, ab, ..., az, ba, bb, ..., bz, ..., zy, zz.
    deviceName.append(1, 'a' + (char)(index / MAX_ENG_ALPHABETS) - 1);
    deviceName.append(1, 'a' + (char)(index % MAX_ENG_ALPHABETS));
}
// If the SCSI device index is < 26, the device name has one letter, e.g. /dev/sda or /dev/sdb.
else
{
    deviceName.append(1, 'a' + index);
}
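The letter scheme in the snippet above (a..z, then aa..zz) can be written as one loop that also covers three-letter names. The Python sketch below uses a hypothetical function sd_name and assumes index is the zero-based position of the disk; nothing here guarantees that a row number in /proc/scsi/sg/devices actually corresponds to that position.

```python
def sd_name(index):
    """Map a zero-based disk index to a name: 0 -> /dev/sda, 26 -> /dev/sdaa."""
    if index < 0:
        raise ValueError("index must be non-negative")
    suffix = ""
    n = index
    while True:
        n, rem = divmod(n, 26)
        suffix = chr(ord("a") + rem) + suffix
        if n == 0:
            break
        # There is no "zero" letter, so shift down before the next digit.
        n -= 1
    return "/dev/sd" + suffix
```

The n -= 1 step plays the same role as the "- 1" in the first append call of the C++ snippet.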
But the file /proc/scsi/sg/devices also contains information about disks that were previously present on the system. For example, if I detach the disk (LUN) /dev/sdc from the system,
the file /proc/scsi/sg/devices still contains the entry for /dev/sdc, which is no longer valid.
Is there a different way to get the SCSI disk names, such as a system call?
Thanks

You can simply list all files matching /dev/sd* (in C, you would use opendir/readdir/closedir) and keep the names of the form sdX, where X is one or two letters.
Alternatively, you can get a list of all partitions by reading the single file /proc/partitions and filtering the 4th field by sdX:
$ cat /proc/partitions
major minor #blocks name
8 0 52428799 sda
8 1 265041 sda1
8 2 1 sda2
8 5 2096451 sda5
8 6 50066541 sda6
This gives you a list of all physical disks together with their capacity (3rd field, in 1 KiB blocks).
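As an illustration of the /proc/partitions filter, here is a Python sketch (the question asks about C++, so treat it as pseudocode for the parsing logic). The function name scsi_disks is made up, and the sketch assumes whole disks are named sd followed only by letters, while partitions carry a trailing number.

```python
import re

# Whole disks: "sd" followed by letters only (sda, sdab); partitions end in digits.
WHOLE_DISK = re.compile(r"sd[a-z]+")


def scsi_disks(partitions_text):
    """Map each whole sd disk in /proc/partitions text to its size in bytes."""
    disks = {}
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields starting with the numeric major number;
        # this skips the header row and blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        _major, _minor, blocks, name = fields
        if WHOLE_DISK.fullmatch(name):
            disks["/dev/" + name] = int(blocks) * 1024  # blocks are 1 KiB each
    return disks
```

Reading the file is then a matter of scsi_disks(open("/proc/partitions").read()).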

After getting the disk name list from /proc/scsi/sg/devices, you can verify in code that each disk still exists. For example, install sg3-utils and use sg_inq to query whether the disk is active.

parsing ns2 trace file [closed]

I'm using NS 2.35 and am trying to determine the end-to-end delay of my routing algorithm.
I think anyone with some good scripting experience should be able to answer this question, sadly that person is not me.
I have a trace file, that looks something like this:
- -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 6 ------- null}
h -t 0.548 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1052 -a 0 -x {2.0 17.0 -1 ------- null}
+ -t 0.55 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1056 -a 0 -x {2.0 17.0 10 ------- null}
+ -t 0.555 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1057 -a 0 -x {2.0 17.0 11 ------- null}
r -t 0.556 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
+ -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
- -t 0.556 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1047 -a 0 -x {2.0 17.0 1 ------- null}
But here is what I need to do.
A line that starts with + is when a new packet is added to the network.
A line starting with r is when a packet has been received by the destination. The floating-point number after -t is the time at which that event happened. Finally, the number after -i is the identity of the packet.
For me to calculate average end-to-end delay, I need to find every line that has a certain id after the -i. from there I need to calculate the timestamp of the r minus the timestamp of the +
So I figure a regular expression could split each line on spaces, and I could put each of the segments into its own variable. Then I would check the 15th field (the packet ID).
But I'm not sure where to go from there, or how to put it all together.
I know there are some AWK scripts on the web for doing this, but they are all outdated and don't fit the current format (and I'm not sure how to change them).
Any help would be greatly appreciated.
EDIT:
Here is an example of a full packet route that I'm looking to find.
I've taken out a lot of lines in between these ones, so that you can see a single packets events.
# a packet is enqueued from node 2 going to node 7. Its ID is 1636. This was at roughly 1.75sec
+ -t 1.74499999999998 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.1s, it left node 2.
- -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# at 2.134 it hopped from 2 to 7 (not important)
h -t 2.134 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# at 2.182 it was received by node 7
r -t 2.182 -s 2 -d 7 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it was then enqueued by node 7 to be sent to node 12
+ -t 2.182 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# slightly later it left node 7 on its way to node 12
- -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# it hopped from 7 to 12 (not important)
h -t 2.1832 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 12
r -t 2.2312 -s 7 -d 12 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# added to queue, heading to node 17
+ -t 2.2312 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# left for node 17
- -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
# hopped to 17 (not important)
h -t 2.232 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 -1 ------- null}
# received by 17 notice the time delay
r -t 2.28 -s 12 -d 17 -p cbr -e 500 -c 0 -i 1636 -a 0 -x {2.0 17.0 249 ------- null}
The ideal output of the script would recognize 2.134 as the start time, and 2.28 as the end, and then give me the delay of 0.146sec. It would do this for all packet IDs and only report the average.
It was requested that I expand a bit on how the file works, and what I am expecting.
The file is listing descriptions of about 10,000 packets. Each packet can be in a different state. The important states are + which means a packet has been enqueued at a router, and r which means the packet has been received by its destination.
It is possible that a packet that is enqueued (so a + entry) is not actually received and is instead dropped. This means we cannot assume that for every + entry there will be a r entry.
What I'm trying to measure is the average end to end delay. What this means, is that if you look at a single packet, it will have a time it was enqueued, and a time it was received. I need to make this calculation to find its end-to-end delay. But I also need to do it for 9,999 other packets to get an average.
I've thought about it more, and here's roughly how I think the algorithm needs to work:
1) Remove all lines that don't begin with a + or an r, because they are unimportant.
2) Go through all of the packet IDs (the numbers after -i, such as 1052 in the example) and put them into groups (multiple arrays, perhaps).
3) Each group should now contain all of the information about a particular packet.
4) Inside the group, check whether there is a + line; ideally we want the very first one. Record its time.
5) Look for any more + lines and check their times. The log may be slightly jumbled, so a + line that appears later in the file may actually be earlier in the simulation.
6) If this new + line has an earlier time, update the time variable with it.
7) Once there are no more + lines, look for an r line.
8) If there is no r line, the packet was dropped, so ignore it.
9) Among all the r lines found, keep the one with the latest timestamp.
10) The r line with the latest timestamp is where the packet was finally received.
11) Subtract the + time from the r time; this gives the time it took for the packet to travel.
12) Add this value to an array so that it can be averaged later.
13) Repeat this process for every packet ID group, and finally average the array of delays.
That's a lot of typing, but I think it's as clear as I can be about what I want. I wish I were a regex master, but I just don't have time to learn it well enough to pull this off.
Thanks for all your help, and let me know if you have any questions.

There's not much to work with here, as Iain said in the comments to your question, but if I understand what you want to do correctly, something like this should work:
awk '/^[+r]/{$1~/r/?r[$15]=$3:r[$15]?d[$15]=r[$15]-$3:1} END {for(p in d){sum+=d[p];num++}print sum/num}' trace.file
It skips all lines not starting with '+' or 'r'. If the line starts with 'r' it adds time to the r array. Otherwise, it calculates the delay and adds it to the d array if the element is found in the r array. Finally it loops over the elements in the d array, adds up the total delay and number of elements and calculates the average from this. In your case the average is 0.
The :1 at the end of the main block is just in there so I can get away with a ternary expression instead of the significantly more verbose if statement.
EDIT: New expression to work with the added conditions:
awk '/^[+r]/{$1~/r/?$3>r[$15]?r[$15]=$3:1:!a[$15]||$3<a[$15]?a[$15]=$3:1} END {for(i in r){sum+=r[i]-a[i];num++}print "Average delay", sum/num}'
or as an awk-file
/^[+r]/ {
    if ($1 ~ /r/) {
        if ($3 > received[$15])
            received[$15] = $3;
    } else {
        if (!added[$15] || $3 < added[$15])
            added[$15] = $3;
    }
} END {
    for (packet in received) {
        sum += received[packet] - added[packet];
        num++
    }
    print "Average delay", sum/num
}
According to your algorithm it seems like 1.745 would be the start time, while you write that 2.134 is.
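For readers who prefer Python to awk, the algorithm from the question (earliest + time, latest r time, averaged over all received packets) might look like the sketch below. The function name average_delay is hypothetical. Unlike the awk version it looks up the values after the -t and -i flags instead of relying on fixed field positions, and it skips packets that have an r line but no + line; both are my assumptions about what is wanted.

```python
def average_delay(lines):
    """Average (latest 'r' time - earliest '+' time) over all received packets.

    Returns None when no packet has both a '+' and an 'r' line.
    """
    first_added = {}
    last_received = {}
    for line in lines:
        tokens = line.split()
        # Only '+' (enqueued) and 'r' (received) events matter.
        if not tokens or tokens[0] not in ("+", "r"):
            continue
        try:
            time = float(tokens[tokens.index("-t") + 1])
            packet = tokens[tokens.index("-i") + 1]
        except (ValueError, IndexError):
            continue  # malformed line: missing flag or non-numeric time
        if tokens[0] == "+":
            if packet not in first_added or time < first_added[packet]:
                first_added[packet] = time
        elif packet not in last_received or time > last_received[packet]:
            last_received[packet] = time
    # Dropped packets have no 'r' entry and are left out of the average.
    delays = [last_received[p] - first_added[p]
              for p in last_received if p in first_added]
    return sum(delays) / len(delays) if delays else None
```

It can be called with an open file, for example average_delay(open("trace.file")). For packet 1636 in the question this uses the first + line as the start time, following the written algorithm rather than the 2.134 mentioned in the expected output.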
