GREP values from a column in txt file - regex

I have a txt file with 1200 entries like this (it is iPerf output, by the way):
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
How can I get ONLY the values expressed in Mbits/sec (the second value on each line) using grep?
Output example:
89.1
113
79.7
75.5

awk '{print $9}' your-file.txt
will do it for you. For example:
$ cat ~/test.txt
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
$ awk '{print $9}' ~/test.txt
89.1
113
79.7
75.5
Another way to tackle this is:
awk -F 'MBytes' '{print $2}' test.txt | awk -F 'Mbits' '{print $1}' | tr -d " "
In the above method we are:
Splitting each line by MBytes.
That gives us 2 parts: $1 is everything before MBytes. $2 is everything after MBytes
We choose everything after MBytes and split it further by Mbits
That gives us two parts again and we choose everything before Mbits
If there is white space before and after the numbers, we use tr to remove white space
So we get
$ cat test.txt
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
$ awk -F 'MBytes' '{print $2}' test.txt | awk -F 'Mbits' '{print $1}' | tr -d " "
Result:
89.1
113
79.7
75.5
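For anyone post-processing the same file in Python rather than in the shell, the two approaches above can be sketched roughly as follows. This is only a minimal sketch and not part of the original answer: the function names are made up for illustration, and it assumes every data line has the `... MBytes <value> Mbits/sec` shape shown in the question.

```python
import re

# The four sample lines from the question.
SAMPLE = """\
1 [ 4] 0.0- 1.0 sec 10.6 MBytes 89.1 Mbits/sec
2 [ 4] 1.0- 2.0 sec 13.5 MBytes 113 Mbits/sec
3 [ 4] 2.0- 3.0 sec 9.50 MBytes 79.7 Mbits/sec
4 [ 4] 3.0- 4.0 sec 9.00 MBytes 75.5 Mbits/sec
"""


def bandwidth_by_split(text):
    """Mimic the awk | awk | tr pipeline: keep what sits between 'MBytes' and 'Mbits'."""
    values = []
    for line in text.splitlines():
        if "MBytes" in line and "Mbits" in line:
            after_mbytes = line.split("MBytes", 1)[1]
            values.append(after_mbytes.split("Mbits", 1)[0].strip())
    return values


def bandwidth_by_regex(text):
    """Capture the number that directly precedes 'Mbits/sec'."""
    return re.findall(r"(\d+(?:\.\d+)?)\s+Mbits/sec", text)


print("\n".join(bandwidth_by_split(SAMPLE)))
```

The regex variant does not depend on field positions, so it should keep working if the number of leading columns changes.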

If your data is in a fixed-length format, you can always use cut:
cut -c38-41 data
This works if you know that the values are 4 characters wide.

The throughput of ip_reassembly is low

The throughput of ip_reassembly is low. I used the ip_reassembly example to test it. Another machine, PC2, runs iperf -c to send fragmented IP packets; the maximum throughput is about 900Mbps.
Test setup:
PC1(dpdk) --- PC2(iperf -c)
PC1: 10Gb/s NIC
PC2: 1Gb/s NIC, MTU: 1500
#./build/ip_reassembly -l 2 -- -p 0x1 &
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:1f.6 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:15b7 net_e1000_em
EAL: PCI device 0000:04:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:04:00.1 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10fb net_ixgbe
IP_RSMBL: Creating LPM table on socket 0
IP_RSMBL: Creating LPM6 table on socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
Initializing port 0 ... Port 0 modified RSS hash function based on hardware support,requested:0xa38c configured:0x8104
Address:00:1B:21:C1:E9:C6
txq=2,0,0
IP_RSMBL: Socket 0: adding route 100.10.0.0/16 (port 0)
IP_RSMBL: Socket 0: adding route 100.20.0.0/16 (port 1)
IP_RSMBL: Socket 0: adding route 100.30.0.0/16 (port 2)
IP_RSMBL: Socket 0: adding route 100.40.0.0/16 (port 3)
IP_RSMBL: Socket 0: adding route 100.50.0.0/16 (port 4)
IP_RSMBL: Socket 0: adding route 100.60.0.0/16 (port 5)
IP_RSMBL: Socket 0: adding route 100.70.0.0/16 (port 6)
IP_RSMBL: Socket 0: adding route 100.80.0.0/16 (port 7)
IP_RSMBL: Socket 0: adding route 0101:0101:0101:0101:0101:0101:0101:0101/48 (port 0)
IP_RSMBL: Socket 0: adding route 0201:0101:0101:0101:0101:0101:0101:0101/48 (port 1)
IP_RSMBL: Socket 0: adding route 0301:0101:0101:0101:0101:0101:0101:0101/48 (port 2)
IP_RSMBL: Socket 0: adding route 0401:0101:0101:0101:0101:0101:0101:0101/48 (port 3)
IP_RSMBL: Socket 0: adding route 0501:0101:0101:0101:0101:0101:0101:0101/48 (port 4)
IP_RSMBL: Socket 0: adding route 0601:0101:0101:0101:0101:0101:0101:0101/48 (port 5)
IP_RSMBL: Socket 0: adding route 0701:0101:0101:0101:0101:0101:0101:0101/48 (port 6)
IP_RSMBL: Socket 0: adding route 0801:0101:0101:0101:0101:0101:0101:0101/48 (port 7)
Checking link status
done
Port0 Link Up. Speed 10000 Mbps - full-duplex
IP_RSMBL: entering main loop on lcore 2
IP_RSMBL: -- lcoreid=2 portid=0
Run iperf -c on PC2:
# iperf -c 192.168.10.157 -i 1 -u -t 30 -p 2152 -b 900M -l 1600
------------------------------------------------------------
Client connecting to 192.168.10.157, UDP port 2152
Sending 1600 byte datagrams, IPG target: 13.56 us (kalman adjust)
UDP buffer size: 958 MByte (default)
------------------------------------------------------------
[ 3] local 192.168.10.100 port 37771 connected with 192.168.10.157 port 2152
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 113 MBytes 944 Mbits/sec
[ 3] 1.0- 2.0 sec 112 MBytes 944 Mbits/sec
[ 3] 2.0- 3.0 sec 112 MBytes 944 Mbits/sec
[ 3] 3.0- 4.0 sec 113 MBytes 944 Mbits/sec
[ 3] 4.0- 5.0 sec 112 MBytes 944 Mbits/sec
[ 3] 5.0- 6.0 sec 112 MBytes 944 Mbits/sec
[ 3] 6.0- 7.0 sec 113 MBytes 944 Mbits/sec
[ 3] 7.0- 8.0 sec 112 MBytes 944 Mbits/sec
[ 3] 8.0- 9.0 sec 113 MBytes 944 Mbits/sec
[ 3] 9.0-10.0 sec 112 MBytes 944 Mbits/sec
[ 3] 10.0-11.0 sec 112 MBytes 944 Mbits/sec
[ 3] 11.0-12.0 sec 112 MBytes 944 Mbits/sec
[ 3] 12.0-13.0 sec 112 MBytes 944 Mbits/sec
[ 3] 13.0-14.0 sec 112 MBytes 944 Mbits/sec
[ 3] 14.0-15.0 sec 113 MBytes 944 Mbits/sec
[ 3] 15.0-16.0 sec 112 MBytes 944 Mbits/sec
[ 3] 16.0-17.0 sec 113 MBytes 944 Mbits/sec
[ 3] 17.0-18.0 sec 112 MBytes 944 Mbits/sec
[ 3] 18.0-19.0 sec 112 MBytes 944 Mbits/sec
[ 3] 19.0-20.0 sec 112 MBytes 944 Mbits/sec
[ 3] 20.0-21.0 sec 113 MBytes 944 Mbits/sec
[ 3] 21.0-22.0 sec 112 MBytes 944 Mbits/sec
[ 3] 22.0-23.0 sec 112 MBytes 944 Mbits/sec
[ 3] 23.0-24.0 sec 112 MBytes 944 Mbits/sec
[ 3] 24.0-25.0 sec 112 MBytes 944 Mbits/sec
[ 3] 25.0-26.0 sec 113 MBytes 944 Mbits/sec
[ 3] 26.0-27.0 sec 112 MBytes 944 Mbits/sec
[ 3] 27.0-28.0 sec 112 MBytes 944 Mbits/sec
[ 3] 28.0-29.0 sec 113 MBytes 944 Mbits/sec
[ 3] 29.0-30.0 sec 113 MBytes 944 Mbits/sec
[ 3] WARNING: did not receive ack of last datagram after 10 tries.
[ 3] 0.0-30.0 sec 3.30 GBytes 944 Mbits/sec
[ 3] Sent 2211842 datagrams
The result of reassembling the IP fragments:
# ps
PID TTY TIME CMD
335 pts/9 00:02:35 ip_reassembly
535 pts/9 00:00:00 ps
25304 pts/9 00:00:00 su
25306 pts/9 00:00:00 zsh
# kill -SIGUSR1 335
-- lcoreid=2 portid=0 frag tbl stat:
max entries: 4096;
entries in use: 4088;
finds/inserts: 4344078;
entries added: 883521;
entries deleted by timeout: 837;
entries reused by timeout: 0;
total add failures: 2581961;
add no-space failures: 2581961;
add hash-collisions failures: 0;
TX bursts: 0
TX packets _queued: 0
TX packets dropped: 0
TX packets send: 0
RX gtpu packets: 872727
I added a counter for UDP port 2152 to the ip_reassembly example to show how many packets were successfully reassembled. According to the result, PC2 sent 2211842 datagrams, but only 872727 packets were reassembled by ip_reassembly. When I lower the iperf sending rate to 800Mbps, no drops are reported.
I can't find any description of the expected throughput in the DPDK guide: https://doc.dpdk.org/guides-22.07/prog_guide/ip_fragment_reassembly_lib.html
Has anyone run into the same problem?
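To put the counters above in proportion, here is a small back-of-the-envelope calculation in Python. It uses only numbers quoted in the post. Two things are assumptions: that "RX gtpu packets" counts successfully reassembled datagrams (as described above), and that "entries added" plus "total add failures" approximates the number of add attempts. The variable names are made up for illustration.

```python
# Numbers copied from the iperf and SIGUSR1 output above.
datagrams_sent = 2_211_842   # iperf: "Sent 2211842 datagrams"
reassembled = 872_727        # custom counter: "RX gtpu packets"
entries_added = 883_521      # frag table: "entries added"
add_failures = 2_581_961     # frag table: "total add failures"
max_entries = 4096           # frag table: "max entries"
entries_in_use = 4088        # frag table: "entries in use"

# Share of sent datagrams that came out of reassembly.
reassembled_ratio = reassembled / datagrams_sent

# Assumption: add attempts = successful adds + failed adds.
add_failure_ratio = add_failures / (entries_added + add_failures)

# How full the fragment table was when the stats were dumped.
table_fill = entries_in_use / max_entries

print(f"reassembled:        {reassembled_ratio:.1%} of sent datagrams")
print(f"failed table adds:  {add_failure_ratio:.1%} of add attempts")
print(f"fragment table use: {table_fill:.1%}")
```

In the dump, every add failure is a no-space failure while the table is almost full, so the table size and entry timeout look like the first things to examine. That is only a reading of the counters, not a confirmed diagnosis.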

Match and $Matches in Powershell via RegEx

I have a small issue with a PowerShell project that I've been working on for some time now.
The basic idea is that six iPerf speed measurements will be executed.
A log file is created so that some data can be displayed to the user.
But there's an issue with using -match and the $Matches variable in PowerShell to display multiple values.
Here's the code I've been working on:
$1 = (Get-Content -Path 'iperf3.txt' -TotalCount 398)[-1] # Fetch details about speed measurements for download
$2 = (Get-Content -Path 'iperf3.txt' -TotalCount 796)[-1] # Fetch details about speed measurements for upload
$3 = (Get-Content -Path 'iperf3.txt' -TotalCount 1195)[-1] # Same as above
$4 = (Get-Content -Path 'iperf3.txt' -TotalCount 1593)[-1] # Same as above
$5 = (Get-Content -Path 'iperf3.txt' -TotalCount 1992)[-1] # Same as above
$6 = (Get-Content -Path 'iperf3.txt' -TotalCount 2390)[-1] # Same as above
Output via Get-Content and TotalCount
[SUM] 0.00-30.00 sec 1.09 GBytes 313 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 311 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
Afterwards, I use a regex with -match to extract the number before Mbits/sec, including Mbits/sec itself, with this line of code:
$1 -match '\d+\sMbits[/]sec'
$2 -match '\d+\sMbits[/]sec'
etc.
I inspect the $Matches variable after -match returns True, and I get 312 Mbits/sec, but nothing more.
This is where I cannot see the fault in the code. The match succeeds and returns True, but $Matches only holds a single value, 313 Mbits/sec.
I expected to see both 313 Mbits/sec and 312 Mbits/sec in the output.
Did I do something wrong while using -match and $Matches?
Any feedback and/or suggestions will be appreciated.
The automatic $Matches variable only ever reflects the results of the most recent -match operation, and only if (a) the match succeeded and (b) the LHS was a single string. If the LHS is a collection (array), -match acts as a filter: it returns the subarray of matching elements and does not populate $Matches.
However, your command can be greatly streamlined:
Use a single Get-Content call
Use a Select-Object call with the -Index parameter to extract the lines of interest (indices are 0-based).
Use the -replace operator instead of -match in order to directly extract the substrings of interest:
(
Get-Content 'iperf3.txt' | Select-Object -Index 397,795,1194,1592,1991,2389
) -replace '.+\b(\d+\sMbits/sec).+', '$1'
Taking a step back:
Instead of selecting the lines of interest by fixed indices (line numbers), select them by regexes too, which allows you to use a single Select-String call:
Select-String -LiteralPath 'iperf3.txt' -Pattern '\s*\[SUM].+\b(\d+\sMbits/sec).+' |
ForEach-Object {
$_.Matches.Groups[1].Value
}
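The same idea (select the lines by pattern rather than by line number, and pull the value out with a capture group) can be illustrated outside PowerShell. The following is a Python analogue, not PowerShell code: the `LOG` text is a shortened stand-in for iperf3.txt, and its two non-[SUM] lines are invented filler so that there is something to skip.

```python
import re

# Shortened stand-in for iperf3.txt. The [SUM] lines come from the question;
# the "[  5]" lines are made-up filler to show that they are ignored.
LOG = """\
[  5] 0.00-30.00 sec 560 MBytes 157 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 313 Mbits/sec receiver
[  5] 0.00-30.00 sec 558 MBytes 156 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
"""

# Same shape as the Select-String pattern: anchor on [SUM], capture "<n> Mbits/sec".
SUM_RATE = re.compile(r"^\s*\[SUM\].+\b(\d+\sMbits/sec)", re.MULTILINE)

# With one capture group, findall returns the captured text of every match.
rates = SUM_RATE.findall(LOG)
print(rates)
```

Because the lines are picked by content, the result no longer depends on each measurement producing exactly the same number of log lines.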
You didn't do anything wrong; this is just the default behavior of the -match operator and of how the $Matches automatic variable is populated.
Here is an extract from Matching operators that explains very well how it works:
It is important to note that the $Matches hashtable contains only the first occurrence of any matching pattern.
There are two workarounds. The first is to use the Regex.Matches method to find all occurrences of the matched pattern:
$string = @'
[SUM] 0.00-30.00 sec 1.09 GBytes 313 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 311 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
'@
[regex]::Matches($string, '\d+\sMbits[/]sec').Value
The second workaround is to loop over the lines and use -match on each one. Note that in the example above $string is a single multi-line string, whereas in the example below it is an array of lines, since this approach requires a loop.
$string = @'
[SUM] 0.00-30.00 sec 1.09 GBytes 313 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 311 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver
[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver
'@ -split '\r?\n'
foreach($line in $string) {
if($line -match '\d+\sMbits[/]sec') {
$Matches[0]
}
}
GitHub issue #7867 proposes adding a -matchall operator to PowerShell. If you believe it would be helpful, consider up-voting it.
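The difference between "first match only" and "all matches" is not specific to PowerShell. As a rough analogue in Python (not PowerShell code, and the variable names are only illustrative), `re.search` plays the role of a single -match, `re.findall` plays the role of [regex]::Matches, and a loop over lines corresponds to the foreach workaround:

```python
import re

# The six summary lines from the question.
lines = [
    "[SUM] 0.00-30.00 sec 1.09 GBytes 313 Mbits/sec receiver",
    "[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver",
    "[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver",
    "[SUM] 0.00-30.00 sec 1.09 GBytes 311 Mbits/sec receiver",
    "[SUM] 0.00-30.00 sec 1.11 GBytes 317 Mbits/sec receiver",
    "[SUM] 0.00-30.00 sec 1.09 GBytes 312 Mbits/sec receiver",
]
text = "\n".join(lines)
pattern = re.compile(r"\d+\sMbits/sec")

# One search over the whole text stops at the first occurrence.
first_only = pattern.search(text).group(0)

# findall walks the whole text and returns every occurrence.
all_hits = pattern.findall(text)

# Per-line loop: one search per line, keeping each line's first hit.
per_line = []
for line in lines:
    match = pattern.search(line)
    if match:
        per_line.append(match.group(0))

print(first_only)
print(all_hits)
```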

How to match a regular expression to a text file?

I want to read a file as text with data=fileread('channelresult') and then use the data for regular expression matching. The content of the channelresult file is:
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-0.10 sec 9.24 MBytes 774 Mbits/sec
[ 4] 0.10-0.20 sec 14.8 MBytes 1.24 Gbits/sec
[ 4] 0.20-0.30 sec 15.0 MBytes 1.27 Gbits/sec
[ 4] 0.30-0.40 sec 17.6 MBytes 1.48 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.74 GBytes 1.49 Gbits/sec 1005 sender
[ 4] 0.00-10.00 sec 1.74 GBytes 1.49 Gbits/sec receiver
The regular expression I use is
pattern=number_str+'\s+sec\s+'+number_str+'\s+\w+\s+'+number_str+'\s+(\w)\w+/\w+\s+(\d+)\s+'+number_str+'\s(\w)'
where number_str='(\d*\.\d+|\d+)'. When I call out = regexp(data,pattern,'match'), the variable out does not contain anything; it is a 0-by-0 cell array.

Reshape from wide to long without Identifier

I am having trouble reshaping data from wide to long format:
I have no identifier variable for the wide variables.
My dataset is quite wide: I have about 7000 variables.
The number of variables per ID is not constant, so for some IDs I have 5 and for others I have 10 variables.
I was hoping that this Stata FAQ could help me, but unfortunately it does not work properly (see the code snippets below).
My data look like the following example:
clear
input str45 Year
"2010"
"2011"
"2012"
"2014"
end
input str45 A101Meas0010
"1.50"
"1.70"
"1.71"
"1.71"
input str45 A101Meas0020
"50"
"60"
"65"
"64"
input str45 A101Meas0020A
"51"
"62"
"64"
"68"
input str45 FE123Meas0010
"1.60"
"1.75"
"1.92"
"1.94"
input str45 FE123Meas0020
"60"
"72"
"88"
"92"
list
+-------------------------------------------------------------+
| Year A10~0010 A10~0020 A1~0020A FE1~0010 FE1~0020 |
|-------------------------------------------------------------|
1. | 2010 1.50 50 51 1.60 60 |
2. | 2011 1.70 60 62 1.75 72 |
3. | 2012 1.71 65 64 1.92 88 |
4. | 2014 1.71 64 68 1.94 92 |
+-------------------------------------------------------------+
The final table I want to achieve would look something like this:
+--------------------------------------------------+
| Year ID Meas0010 Meas0020 Meas0020A |
|--------------------------------------------------|
1. | 2010 A101 1.50 50 . |
2. | 2010 FE123 1.60 51 60 |
3. | 2011 A101 1.70 60 . |
4. | 2011 FE123 1.75 62 72 |
5. | 2012 A101 1.71 65 . |
6. | 2012 FE123 1.92 64 88 |
7. | 2014 A101 1.71 64 . |
8. | 2014 FE123 1.94 68 92 |
+--------------------------------------------------+
I tried the following code snippet, close to the example from the Stata FAQ, but it throws an error:
unab vars : *Meas*
local stubs : subinstr local vars "Meas0010" "", all
local stubs : subinstr local stubs "Meas0020" "", all
local stubs : subinstr local stubs "Meas0020A" "", all
reshape long "`stubs'", i(Year) j(Measurement) string
(note: j = Meas0010 Meas0020 Meas0020A)
(note: A101AMeas0010 not found)
variable A101Meas0010 not found
r(111);
Any ideas how to reshape this? I never had to reshape such an odd structure before.
Additional Question: In the example above I did have to specify the Measurement-Names Meas0010, Meas0020 and Meas0020A. Is it possible to automate this as well? All measurement names start with the keyword Meas, so the variable names are always of the structure _ID+MeasName, e.g. A101Meas0020A stands for ID A101 and Measurement Meas0020A.
The annoying thing is: I do know how to do this in MATLAB, but I am forced to use Stata here.
Your variable name structure is a little awkward, but there is a syntax to match. It's better covered in the help for reshape, and is only barely mentioned in the FAQ you cite (which I wrote, so I can be emphatic that it's intended as a supplement to the help, not the first line of documentation).
Your example yields to
clear
input str4 (Year A101Meas0010 A101Meas0020 A101Meas0020A FE123Meas0010 FE123Meas0020)
"2010" "1.50" "50" "51" "1.60" "50"
"2011" "1.70" "60" "62" "1.75" "60"
"2012" "1.71" "65" "64" "1.92" "65"
"2014" "1.71" "64" "68" "1.94" "64"
end
reshape long @Meas0010 @Meas0020 @Meas0020A, i(Year) j(ID) string
destring, replace
sort Year ID
list, sepby(Year)
+-----------------------------------------------+
| Year ID Meas0010 Meas0020 Me~0020A |
|-----------------------------------------------|
1. | 2010 A101 1.5 50 51 |
2. | 2010 FE123 1.6 50 . |
|-----------------------------------------------|
3. | 2011 A101 1.7 60 62 |
4. | 2011 FE123 1.75 60 . |
|-----------------------------------------------|
5. | 2012 A101 1.71 65 64 |
6. | 2012 FE123 1.92 65 . |
|-----------------------------------------------|
7. | 2014 A101 1.71 64 68 |
8. | 2014 FE123 1.94 64 . |
+-----------------------------------------------+
It seems bizarre that your example enters everything as string: note the destring in my code.
Without access to your dataset, I'd say that you should be able to find the more general syntax without automation. You know that there are at most about 10 measurements in the fullest case. In any event you are already showing the syntax tricks needed to remove strings you don't need.
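On the follow-up question about automating the measurement names: the naming rule described in the question (an ID followed by a name starting with Meas) is regular enough to split mechanically. The Python sketch below only illustrates that splitting logic on the example data from this answer. It is not Stata code, the helper names are invented, and it assumes that the text "Meas" never occurs inside an ID.

```python
import re

# Example data from the answer above, one dict per wide row.
wide = [
    {"Year": 2010, "A101Meas0010": 1.50, "A101Meas0020": 50, "A101Meas0020A": 51,
     "FE123Meas0010": 1.60, "FE123Meas0020": 50},
    {"Year": 2011, "A101Meas0010": 1.70, "A101Meas0020": 60, "A101Meas0020A": 62,
     "FE123Meas0010": 1.75, "FE123Meas0020": 60},
    {"Year": 2012, "A101Meas0010": 1.71, "A101Meas0020": 65, "A101Meas0020A": 64,
     "FE123Meas0010": 1.92, "FE123Meas0020": 65},
    {"Year": 2014, "A101Meas0010": 1.71, "A101Meas0020": 64, "A101Meas0020A": 68,
     "FE123Meas0010": 1.94, "FE123Meas0020": 64},
]

# Shortest possible ID, followed by a measurement name that starts with "Meas".
NAME = re.compile(r"^(?P<id>.+?)(?P<meas>Meas.+)$")


def wide_to_long(rows, key="Year"):
    """Return one record per (Year, ID) with the measurement names as fields."""
    long_rows = {}
    for row in rows:
        for name, value in row.items():
            m = NAME.match(name)
            if m is None:  # columns without "Meas", e.g. Year itself
                continue
            rec = long_rows.setdefault(
                (row[key], m["id"]), {key: row[key], "ID": m["id"]}
            )
            rec[m["meas"]] = value
    return [long_rows[k] for k in sorted(long_rows)]


for rec in wide_to_long(wide):
    print(rec)
```

Measurements that an ID does not have are simply absent from its record, which corresponds to the missing values (.) in the Stata listing.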

Extracting specific values from a text file with long lines

I'm trying to get all "CP" values from a log file like below:
2013-06-27 17:00:00,017 INFO - [AlertSchedulerThread18] [2013-06-27 16:59:59, 813] -- SN: 989333333333 ||DN: 989333333333 ||CategoryId: 4687 ||CGID: null||Processing started ||Billing started||Billing Process: 97 msec ||Response code: 2001 ||Package id: 4387 ||TransactionId: 66651372336199820989389553437483742||CDR:26 msec||CDR insertion: 135 msec||Successfully inserted in CDR Table||CP:53 msec||PROC - 9 msec||Successfully executed procedure call.||Billing Ended||197 msec ||Processing ended
2013-06-27 17:00:00,018 INFO - [AlertSchedulerThread62] [2013-06-27 16:59:59, 824] -- SN: 989333333333 ||DN: 989333333333 ||CategoryId: 3241 ||CGID: null||Processing started ||Billing started||Billing Process: 61 msec ||Response code: 2001 ||Package id: 2861 ||TransactionId: 666513723361998319893580191324005184||CDR:25 msec||CDR insertion: 103 msec||Successfully inserted in CDR Table||CP:59 msec||PROC - 24 msec||Successfully executed procedure call.||Billing Ended||187 msec ||Processing ended
2013-06-27 17:00:00,028 INFO - [AlertSchedulerThread29] [2013-06-27 16:59:59, 903] -- SN: 989333333333 ||DN: 989333333333 ||CategoryId: 4527 ||CGID: null||Processing started ||Billing started||Billing Process: 47 msec ||Response code: 2001 ||Package id: 4227 ||TransactionId: 666513723361999169893616006323701572||CDR:22 msec||CDR insertion: 83 msec||Successfully inserted in CDR Table||CP:21 msec||PROC - 7 msec||Successfully executed procedure call.||Billing Ended||112 msec ||Processing ended
...getting output like this:
CP:53 msec
CP:59 msec
CP:21 msec
How can I do this using awk?
cut is always good and fast for these things (this assumes the CP field is wrapped in asterisks in the input, as in **CP:53 msec**):
$ cut -d"*" -f3 file
CP:53 msec
CP:59 msec
CP:21 msec
Anyway, these awk ways can make it:
$ awk -F"|" '{print $27}' file | sed 's/*//g'
CP:53 msec
CP:59 msec
CP:21 msec
or
$ awk -F"\|\|" '{print $14}' file | sed 's/*//g'
CP:53 msec
CP:59 msec
CP:21 msec
Or also
$ awk -F"*" '{print $3}' file
CP:53 msec
CP:59 msec
CP:21 msec
In all of these, we set the field delimiter to a specific character (| or *) and then print the relevant field of the split text.
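The field arithmetic behind $14 and $27 can be checked with a short Python sketch. This is an illustration only, not part of the answer: the function name is made up, and the first log line is reproduced with the CP field wrapped in asterisks, which is the assumption the cut and sed commands above rely on.

```python
# First log line from the question, with the CP field wrapped in ** (assumption).
LINE = (
    "2013-06-27 17:00:00,017 INFO - [AlertSchedulerThread18] "
    "[2013-06-27 16:59:59, 813] -- SN: 989333333333 ||DN: 989333333333 "
    "||CategoryId: 4687 ||CGID: null||Processing started ||Billing started"
    "||Billing Process: 97 msec ||Response code: 2001 ||Package id: 4387 "
    "||TransactionId: 66651372336199820989389553437483742||CDR:26 msec"
    "||CDR insertion: 135 msec||Successfully inserted in CDR Table"
    "||**CP:53 msec**||PROC - 9 msec||Successfully executed procedure call."
    "||Billing Ended||197 msec ||Processing ended"
)


def cp_field(line):
    """Split on '||' and return the 14th field (awk's $14) without asterisks."""
    fields = line.split("||")
    return fields[13].strip("*")  # index 13 because Python counts from 0


# Splitting on a single '|' leaves an empty field between each pair of pipes,
# which is why the same value is field 27 (index 26) in that variant.
single_pipe_field = LINE.split("|")[26].strip("*")

print(cp_field(LINE))
```

Stripping the asterisks after splitting works whether or not they are present in the real log.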
How about a hilarious sed command?
sed -n 's/.*\*\*\(.*\)\*\*.*/\1/p'
$ awk -F'[|][|]' '{print $14}' file
**CP:53 msec**
**CP:59 msec**
**CP:21 msec**
If you REALLY have '*'s in the input, just tweak to remove them:
$ awk -F'[|][|]' '{gsub(/\*/,""); print $14}' file
CP:53 msec
CP:59 msec
CP:21 msec
There's always grep:
grep -o 'CP:[[:digit:]]* msec' log.txt
If it's not necessarily going to be msec every time, you can just take everything up to the pipe:
grep -o 'CP:[^|]*' log.txt
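For comparison, the two grep -o patterns translate almost directly to Python's re.findall. This is a sketch under the assumption that the CP field appears without surrounding asterisks, as in the log lines shown in the question; the `LOG` text is a shortened excerpt of those lines, not the full log.

```python
import re

# Shortened excerpt of the three log lines from the question.
LOG = """\
||CDR insertion: 135 msec||Successfully inserted in CDR Table||CP:53 msec||PROC - 9 msec||
||CDR insertion: 103 msec||Successfully inserted in CDR Table||CP:59 msec||PROC - 24 msec||
||CDR insertion: 83 msec||Successfully inserted in CDR Table||CP:21 msec||PROC - 7 msec||
"""

# Equivalent of: grep -o 'CP:[[:digit:]]* msec'
fixed_unit = re.findall(r"CP:\d* msec", LOG)

# Equivalent of: grep -o 'CP:[^|]*' (everything up to the next pipe)
up_to_pipe = re.findall(r"CP:[^|]*", LOG)

print("\n".join(fixed_unit))
```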
With awk:
awk -F"[|*]+" '{ print $14 }' file
Code for GNU sed
$ sed -r 's/.*(CP:[0-9]+\smsec).*/\1/' file
CP:53 msec
CP:59 msec
CP:21 msec