removing the dots and colons in the time field - regex

I have the following contents from data.log file. I wish to extract the time value and part of the payload (after deadbeef in the payload, third row, starting second to last byte. Please refer to expected output).
data.log
print 1: file offset 0x0
ts=0x584819041ff529e0 2016-12-07 14:13:24.124834649 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 50 e6 61 c3 85 21 01 00 de ad be ef 85 d7 ..P.a..!........
91 21 6f 9a 32 94 fd 07 01 00 de ad be ef 85 d7 .!o.2...........
print 2: file offset 0x60
ts=0x584819041ff52b00 2016-12-07 14:13:24.124834716 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 68 e7 61 c3 85 21 01 00 de ad be ef 86 d7 ..h.a..!........
91 21 c5 34 77 bd fd 07 01 00 de ad be ef 86 d7 .!.4w...........
Expected output
I just want to replace the dots and colons in the time field (before UTC) and get the entire value.
141324124834649,85d79121
141324124834716,86d79121
What I have done so far
I have extracted the fields after "." but not sure how to replace the colons and get the entire time value.
awk -F '[= ]' '$NF == "UTC"{split($4,b,".");s=b[2]",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' data.log
124834649,85d79121
124834716,86d79121
Any help is much appreciated.

awk '$NF == "UTC"{gsub("[.:]","",$3);s=$3",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' data.log
Result:
141324124834649,85d79121
141324124834716,86d79121
PS: it can be simplified with getline :
awk '$NF == "UTC"{gsub("[.:]","",$3);s=$3","} /de ad be ef/{s=s $15 $16;getline;print(s $1 $2)}' data.log
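For readers who prefer a scripted version, the same logic can be sketched in Python. This is only an illustration of the awk approach above, not part of the original answer: the function name extract and the SAMPLE list are made up for the example, and it assumes each record has exactly two consecutive "de ad be ef" rows laid out as in data.log.

```python
import re

def extract(lines):
    """Return a list of 'time,payload' strings built from the log lines."""
    results = []
    time_value = None
    payload = ""
    for line in lines:
        fields = line.split()
        if fields and fields[-1] == "UTC":
            # Third whitespace-separated field is the time, e.g. 14:13:24.124834649
            time_value = re.sub(r"[.:]", "", fields[2])
            payload = ""
        elif time_value is not None and "de ad be ef" in line:
            if not payload:
                # First deadbeef row: take hex bytes 15 and 16.
                payload = fields[14] + fields[15]
            else:
                # Second deadbeef row: take hex bytes 1 and 2, then emit.
                results.append(f"{time_value},{payload}{fields[0]}{fields[1]}")
                time_value = None
    return results

# Hypothetical excerpt of data.log (one record, shortened).
SAMPLE = [
    "ts=0x584819041ff529e0 2016-12-07 14:13:24.124834649 UTC",
    "etype=0x0800",
    "08 08 50 e6 61 c3 85 21 01 00 de ad be ef 85 d7 ..P.a..!........",
    "91 21 6f 9a 32 94 fd 07 01 00 de ad be ef 85 d7 .!o.2...........",
]

print(extract(SAMPLE))  # ['141324124834649,85d79121']
```

To run it on the real file, pass the open file object instead of SAMPLE, for example extract(open("data.log")).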

You can extract the time part like this:
$ awk '/UTC/ {split($0,a); gsub(/[\.:]/,"",a[3]); print a[3]}' file
141324124834649
141324124834716

For the UTC part (the rest of the code stays the same):
awk '/UTC$/{gsub(/[\.:]/,"");print $3}' YourFile
This just removes the ":" and "." characters and takes the field value; the other parts of the line do not contain those two characters, so they are not modified.
The $NF test is replaced by /UTC$/, which is a bit faster and simpler (IMHO).
The full code:
awk -F '[= ]' '/UTC$/{gsub(/[\.:]/,"");s=$4",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' YourFile

Related

Regex to match part of a hex

I need to use a regex to match part of a hexadecimal string, but that part is random. Let me try to explain:
I have this hex data:
70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78
I need to match only the f2 in that case. But that is not always the case. Each data will be different. The only thing that is always the same is the '00 00 00' part and the '78' at the end. All the rest is random.
I managed to make the following regex:
/(?=00 00 00).+?(?=78)/
The output is:
00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0
But I don't know how to build a regex that takes only the 'f2' (reminder: it is not always going to be f2).
Any thoughts?
Given the explanation in this comment, the regex that you need is:
(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}
Providing the first input string from the question, this regex matches f2 (no spaces around it).
How it works:
(?<= # start of a positive lookbehind
00 00 00 # match the exact string ("00 00 00 ")
[0-9a-f] # match one hex digit (lowercase only)
{2} # match the previous twice (i.e. two hex digits)
# there is a space after ")"
) # end of the lookbehind
[0-9a-f]{2} # match two hex digits
The positive lookbehind works like a non-capturing group but it is not part of the match. Basically it says that the matching part ([0-9a-f]{2}) matches only if it is preceded by a match of the lookbehind expression.
The matching part of the expression is [0-9a-f]{2} (i.e. two hex digits).
You need to add i, or whatever flag your regex engine uses to denote "ignore case" (so that the a-f part of the regex also matches A-F). If you cannot (or do not want to) provide this flag, you can put [0-9A-Fa-f] everywhere and it works.
If your regex engine does not support lookbehind you can get the same result using capturing groups:
00 00 00 [0-9a-f]{2} ([0-9a-f]{2})
Applied on the same input, this regex matches 00 00 00 20 f2 and its first (and only) capturing group matches f2.
Update
If it is important that the input string contains 78 somewhere after the matching part, add (?=(?: [0-9a-f]{2})* 78) to the first regex:
(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}(?=(?: [0-9a-f]{2})* 78)
(?= introduces a positive lookahead. It behaves similarly to a lookbehind, but it must come after the matching part of the regex and it is verified against the part of the string located after the match.
(?: starts a non-capturing group.
The [0-9a-f]{2} followed or preceded by a space in the lookahead and lookbehind ensures that the entire matching string is composed only of two-digit hex numbers separated by spaces. You can use .* instead, but that will match anything, even text that does not follow the format of two-digit hex numbers.
For the version without lookaheads or lookbehinds, add (?: [0-9a-f]{2})* 78 at the end of the regex:
00 00 00 [0-9a-f]{2} ([0-9a-f]{2})(?: [0-9a-f]{2})* 78
The regex matches the entire string starting with 00 00 00 and ending with 78 and the first capturing group matches the second number after 00 00 00 (your target).
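As a quick check, both variants can be tried with Python's re module. This is a minimal sketch, not part of the original answer: the helper names are made up, and DATA is a shortened version of the input from the question. Python only accepts fixed-width lookbehinds, which this pattern satisfies.

```python
import re

# Shortened, hypothetical version of the input from the question.
DATA = "70 77 25 00 00 00 20 f2 90 c2 91 c0 78"

# Variant 1: lookbehind, the match itself is the target byte.
LOOKBEHIND_RE = re.compile(r"(?<=00 00 00 [0-9a-f]{2} )[0-9a-f]{2}", re.IGNORECASE)

# Variant 2: no lookbehind, the target byte is in capturing group 1.
GROUP_RE = re.compile(r"00 00 00 [0-9a-f]{2} ([0-9a-f]{2})", re.IGNORECASE)

def target_with_lookbehind(text):
    match = LOOKBEHIND_RE.search(text)
    return match.group(0) if match else None

def target_with_group(text):
    match = GROUP_RE.search(text)
    return match.group(1) if match else None

print(target_with_lookbehind(DATA))  # f2
print(target_with_group(DATA))       # f2
```

Both helpers return None when the input contains no "00 00 00" run followed by two more bytes.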
Is the f2 surrounded by asterisks?
Without asterisks:
00 00 00 [a-f0-9]+ (?<hexits>[a-f0-9]+).+78
With asterisks:
\*(?<hexits>[a-f0-9]+)\*
You can use the following regex to match the second hexadecimal value after "00 00 00": /00 00 00 [0-9A-Fa-f]{2} ([0-9A-Fa-f]{2})/. The value you want is in the first capturing group.
Here is a demo:
import re
s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'
match = re.search(r'00 00 00 [0-9A-Fa-f]{2} ([0-9A-Fa-f]{2})', s)
if match:
    print(match.group(1))
The output will be:
f2
You don't really need a regex for that. Find the offset of three zero bytes in a row and take the byte four positions after that offset (that is, the second byte after the zeros):
s = '70 75 62 71 00 7e 00 01 4c 00 06 72 61 6e 64 6f 6d 74 00 1c 4c 6a 2f 73 2f 6e 64 6f 6d 3b 78 70 77 25 00 00 00 20 f2 90 c2 91 c4 c4 ca 91 c0 c0 ca 91 94 cb c5 97 90 c5 90 c2 90 96 c7 ca 91 91 93 94 c6 c5 c6 cb c0 78'
s2 = '01 02 03 00 00 00 05 06 07'
def locate(s):
    data = bytes.fromhex(s)
    offset = data.find(bytes([0,0,0]))
    return data[offset + 4]
print(f'{locate(s):02X}')
print(f'{locate(s2):02X}')
Output:
F2
06
You could also extract the "f2" string directly from the string:
offset = s.index('00 00 00')
print(s[offset + 12 : offset + 14]) # 'f2'

Splitting of frame in different parts in bigquery

I've got a string frame looking like this in Google BigQuery:
S,0,2B3,8, C2 B3 00 00 00 00 03 DE
S,0,3FA,6, 00 E0 04 A5 00 0B
S,0,440,8, 80 40 4E A5 00 47 00 64
S,0,450,8, 89 50 01 12 01 19 01 B3
S,0,4B0,8, 80 B0 4E A5 00 43 00 64
My aim is to extract the 8 bytes at the end (e.g. 80 40 4E A5 00 47 00 64), if possible only the ones beginning with 83 and 84.
I didn't get it to work with split, trim, contains, or regexp_extract.
I'd be quite happy if anyone could help me.
Regards
/edit
Thank you both very much for your solutions! this helped quite a lot.
#standardSQL
SELECT
*
FROM (
SELECT
timestamp,
REGEXP_EXTRACT(CAN_Frame, r', ([^,]+)$') AS bytes_string,
FROM_HEX(REPLACE(REGEXP_EXTRACT(CAN_Frame, r', ([^,]+)$'), ' ', '')) AS bytes
FROM `data.source`
)
WHERE SUBSTR(bytes, 1, 1) IN (b'\x83', b'\x84')
ORDER BY timestamp DESC
LIMIT 8000
gives me
Row timestamp bytes_string bytes
1 2017-09-29 14:31:02 UTC 84 10 00 25 00 21 00 4F hBAAJQAhAE8=
2 2017-09-29 14:30:42 UTC 83 80 00 01 00 03 00 0D g4AAAQADAA0=
3 2017-09-29 14:30:40 UTC 84 B2 00 27 00 08 00 03 hLIAJwAIAAM=
#standardSQL
SELECT
timestamp,
TRIM(SPLIT(CAN_Frame)[OFFSET(4)]) AS bytes
FROM
`data.source`
WHERE
LENGTH(CAN_Frame) > 1 and
SUBSTR(TRIM(SPLIT(CAN_Frame)[OFFSET(4)]),1,2) IN ('83', '84')
ORDER BY
timestamp DESC
LIMIT
8000
gives me
Row timestamp bytes
1 2017-09-29 14:31:02 UTC 84 10 00 25 00 21 00 4F
2 2017-09-29 14:30:42 UTC 83 80 00 01 00 03 00 0D
3 2017-09-29 14:30:40 UTC 84 B2 00 27 00 08 00 03
Is it possible to get only the 6th and 7th bytes from a bytes_string beginning with 83, the 4th and 5th bytes from a bytes_string beginning with 84, and also the 8th byte from an 83 string and the 3rd byte from an 84 string, for further calculations?
Best regards
Below is for BigQuery StandardSQL
#standardSQL
WITH `yourTable` AS (
SELECT 'S,0,2B3,8, C2 B3 00 00 00 00 03 DE' AS frame UNION ALL
SELECT 'S,0,3FA,6, 00 E0 04 A5 00 0B' UNION ALL
SELECT 'S,0,440,8, 80 40 4E A5 00 47 00 64' UNION ALL
SELECT 'S,0,450,8, 89 50 01 12 01 19 01 B3' UNION ALL
SELECT 'S,0,4B0,8, 80 B0 4E A5 00 43 00 64'
)
SELECT frame, TRIM(SPLIT(frame)[OFFSET(4)]) AS bytes
FROM `yourTable`
WHERE SUBSTR(TRIM(SPLIT(frame)[OFFSET(4)]), 1, 2) IN ('80', 'C2')
Here is an example that should help. It produces two columns with different interpretations of the bytes: one (bytes_string) is just the end of the strings that you showed, whereas the other (bytes) is the bytes string converted to an actual BYTES type. In the BigQuery UI, make sure to deselect "Use Legacy SQL" under "Show Options" or include the #standardSQL directive:
#standardSQL
WITH Frames AS (
SELECT 'S,0,2B3,8, C2 B3 00 00 00 00 03 DE' AS frame UNION ALL
SELECT 'S,0,3FA,6, 00 E0 04 A5 00 0B' UNION ALL
SELECT 'S,0,440,8, 80 40 4E A5 00 47 00 64' UNION ALL
SELECT 'S,0,450,8, 89 50 01 12 01 19 01 B3' UNION ALL
SELECT 'S,0,4B0,8, 80 B0 4E A5 00 43 00 64'
)
SELECT
frame,
REGEXP_EXTRACT(frame, r', ([^,]+)$') AS bytes_string,
FROM_HEX(REPLACE(REGEXP_EXTRACT(frame, r', ([^,]+)$'), ' ', '')) AS bytes
FROM Frames;
Here is another example that demonstrates filtering on the bytes column to include only values starting with \x83 or \x84 (this will return an empty result set for the sample data you provided):
#standardSQL
WITH Frames AS (
SELECT 'S,0,2B3,8, C2 B3 00 00 00 00 03 DE' AS frame UNION ALL
SELECT 'S,0,3FA,6, 00 E0 04 A5 00 0B' UNION ALL
SELECT 'S,0,440,8, 80 40 4E A5 00 47 00 64' UNION ALL
SELECT 'S,0,450,8, 89 50 01 12 01 19 01 B3' UNION ALL
SELECT 'S,0,4B0,8, 80 B0 4E A5 00 43 00 64'
)
SELECT
*
FROM (
SELECT
frame,
REGEXP_EXTRACT(frame, r', ([^,]+)$') AS bytes_string,
FROM_HEX(REPLACE(REGEXP_EXTRACT(frame, r', ([^,]+)$'), ' ', '')) AS bytes
FROM Frames
)
WHERE SUBSTR(bytes, 1, 1) IN (b'\x83', b'\x84');
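The follow-up about picking individual bytes comes down to indexing into the decoded payload. Outside BigQuery, the idea can be sketched in Python as below. This is an illustration only: the function names are made up, byte positions are assumed to be 1-based as in the question, and the frame IDs in the usage lines are placeholders because the source rows show only the payloads.

```python
def payload_bytes(frame):
    """Decode the hex payload of a frame such as 'S,0,440,8, 80 40 4E ...'."""
    hex_part = frame.split(",")[4]
    # Drop all whitespace before decoding the hex digits.
    return bytes.fromhex("".join(hex_part.split()))

def select_fields(frame):
    """Return (two_byte_field, single_byte) for 83/84 frames, else None."""
    data = payload_bytes(frame)
    if data[:1] == b"\x83":
        # 6th and 7th bytes, plus the 8th byte (1-based positions).
        return data[5:7], data[7]
    if data[:1] == b"\x84":
        # 4th and 5th bytes, plus the 3rd byte (1-based positions).
        return data[3:5], data[2]
    return None

# Frame IDs below are placeholders; payloads are taken from the result rows.
print(select_fields("S,0,000,8, 83 80 00 01 00 03 00 0D"))  # (b'\x03\x00', 13)
print(select_fields("S,0,000,8, 84 10 00 25 00 21 00 4F"))  # (b'%\x00', 0)
print(select_fields("S,0,440,8, 80 40 4E A5 00 47 00 64"))  # None
```

In BigQuery itself the same selection would presumably be done with SUBSTR on the BYTES column, which is 1-based like the positions in the question.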

Splunk Data preview - Timestamp in milliseconds, Regex problems

I'm trying to parse out the timestamp in milliseconds with this regex:
\d{7}/
Any idea why it's not working?
9281736 : COUNT IN 1003
Tx: 01 04 00 71 00 02 21 d0 ...q..!.
Rx: 01 04 04 00 08 0a 28 7c f8 ......(|.
9282136 : COUNT IN 1003
Tx: 01 04 00 c9 00 02 a1 f5 ........
Rx: 01 04 04 00 08 00 00 7a 46 .......zF
9282536 : COUNT IN 1003
Tx: 01 04 01 2d 00 02 e0 3e ...-...>
Rx: 01 04 04 00 00 ff ff fa 34 ........4
9282936 : COUNT IN 1003
Tx: 01 04 01 f5 00 02 60 05 ......`.
Rx: 01 04 04 00 23 00 00 0a 4e
I preview with "Unsorted data" and get timestamp error message - "Failed to parse timestamp. Defaulting to file modtime."
I think the Regex should be \d{7}, not /\d{7}. Note the slashes!
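The effect of a stray slash is easy to reproduce outside Splunk. The sketch below uses Python's re module rather than Splunk's own regex engine, so it only illustrates the general point, on the assumption that both engines treat this simple pattern the same way: a slash inside the pattern is a literal character that the sample lines do not contain.

```python
import re

line = "9281736 : COUNT IN 1003"

# The slash is part of the pattern, so a literal "/" must follow the digits.
with_slash = re.search(r"\d{7}/", line)

# Without the slash, the seven leading digits match.
plain = re.search(r"^\d{7}", line)

print(with_slash)      # None
print(plain.group(0))  # 9281736
```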

How to read the model of monitor from the EDID?

In the registry there is one (or more) key depending how many monitors you have HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\DISPLAY\DEL404C{Some Unique ID}\Device Parameters\EDID which is a REG_BINARY key. In my case this is :
00 ff ff ff ff ff ff 00 4c 2d 6f 03 39 31 59 4d
07 12 01 03 0e 29 1a 78 2a 80 c5 a6 57 49 9b 23
12 50 54 bf ef 80 95 00 95 0f 81 80 81 40 71 4f
01 01 01 01 01 01 9a 29 a0 d0 51 84 22 30 50 98
36 00 ac ff 10 00 00 1c 00 00 00 fd 00 38 4b 1e
51 0e 00 0a 20 20 20 20 20 20 00 00 00 fc 00 53
79 6e 63 4d 61 73 74 65 72 0a 20 20 00 00 00 ff
00 48 56 44 51 32 30 36 37 37 37 0a 20 20 00 ef
My question is how can I read only model of monitor ("SyncMaster" for example) and not all of the information using C or C++?
The format of EDID is described here: http://en.wikipedia.org/wiki/Extended_display_identification_data
What you're interested in here is the descriptor blocks of the EDID, which are found in the byte ranges 54-71, 72-89, 90-107, and 108-125. Here's those four blocks in your EDID:
#1: 9a29 a0d0 5184 2230 5098 3600 acff 1000 001c
#2: 0000 00fd 0038 4b1e 510e 000a 2020 2020 2020
#3: 0000 00fc 0053 796e 634d 6173 7465 720a 2020
#4: 0000 00ff 0048 5644 5132 3036 3737 370a 2020
You can identify the descriptor containing the monitor name because its first three bytes are all zero (so it isn't a detailed timing descriptor) and its fourth byte is FC (indicating the type). The fifth byte is a reserved 00; the name occupies the remaining 13 bytes, terminated by 0A and padded with 20 if it is shorter:
5379 6e63 4d61 7374 6572 0a20 20 SyncMaster
So, in short: check at offsets 54, 72, 90, and 108 for the sequence 00 00 00 FC 00; if you find a match, the monitor name is in the next 13 bytes, up to the 0A terminator.
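The lookup described above can be sketched as follows. The question asks for C or C++, so this Python version is only meant to show the logic, which carries over directly; the function name monitor_name is made up, and reading the REG_BINARY value from the registry is left out (the EDID bytes are pasted in from the question instead).

```python
# EDID bytes copied from the registry dump in the question.
EDID_HEX = """
00 ff ff ff ff ff ff 00 4c 2d 6f 03 39 31 59 4d
07 12 01 03 0e 29 1a 78 2a 80 c5 a6 57 49 9b 23
12 50 54 bf ef 80 95 00 95 0f 81 80 81 40 71 4f
01 01 01 01 01 01 9a 29 a0 d0 51 84 22 30 50 98
36 00 ac ff 10 00 00 1c 00 00 00 fd 00 38 4b 1e
51 0e 00 0a 20 20 20 20 20 20 00 00 00 fc 00 53
79 6e 63 4d 61 73 74 65 72 0a 20 20 00 00 00 ff
00 48 56 44 51 32 30 36 37 37 37 0a 20 20 00 ef
"""

def monitor_name(edid):
    """Return the monitor name from a 128-byte EDID block, or None."""
    for offset in (54, 72, 90, 108):
        block = edid[offset:offset + 18]
        # 00 00 00 FC marks the monitor-name descriptor.
        if block[:4] == b"\x00\x00\x00\xfc":
            # Skip the reserved byte, then cut the text at the 0A terminator.
            text = block[5:18].split(b"\n", 1)[0]
            return text.decode("ascii", errors="replace").rstrip()
    return None

edid = bytes.fromhex("".join(EDID_HEX.split()))
print(monitor_name(edid))  # SyncMaster
```

The same loop over the four fixed offsets, with a memcmp against 00 00 00 FC, would be the natural shape in C.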

vbscript match within a match

Good day all.
I am running some Cisco show commands on a router and capturing the output to an array. I want to use a regex to find certain information in the output. The regex works in the sense that it finds the line containing the information; however, there is not enough unique text to build the regex around, so I end up with more than I want. Here is the output:
ROUTERNAME#sh diag
Slot 0:
C2821 Motherboard with 2GE and integrated VPN Port adapter, 2 ports
Port adapter is analyzed
Port adapter insertion time 18w4d ago
Onboard VPN : v2.3.3
EEPROM contents at hardware discovery:
PCB Serial Number : FOC1XXXXXXXXX
Hardware Revision : 1.0
Top Assy. Part Number : 800-26921-04
Board Revision : E0
Deviation Number : 0
Fab Version : 03
RMA Test History : 00
RMA Number : 0-0-0-0
RMA History : 00
Processor type : 87
Hardware date code : 20090816
Chassis Serial Number : FTXXXXXXXXXX
Chassis MAC Address : 0023.ebf4.5480
MAC Address block size : 32
CLEI Code : COMV410ARA
Product (FRU) Number : CISCO2821
Part Number : 73-8853-05
Version Identifier : V05
EEPROM format version 4
EEPROM contents (hex):
0x00: 04 FF C1 8B 46 4F 43 31 33 33 33 31 4E 36 34 40
0x10: 03 E8 41 01 00 C0 46 03 20 00 69 29 04 42 45 30
0x20: 88 00 00 00 00 02 03 03 00 81 00 00 00 00 04 00
0x30: 09 87 83 01 32 8F C0 C2 8B 46 54 58 31 33 33 36
0x40: 41 30 4C 41 C3 06 00 23 EB F4 54 80 43 00 20 C6
0x50: 8A 43 4F 4D 56 34 31 30 41 52 41 CB 8F 43 49 53
0x60: 43 4F 32 38 32 31 20 20 20 20 20 20 82 49 22 95
0x70: 05 89 56 30 35 20 D9 02 40 C1 FF FF FF FF FF FF
AIM Module in slot: 0
Hardware Revision : 1.0
Top Assy. Part Number : 800-27059-01
Board Revision : A0
Deviation Number : 0-0
Fab Version : 02
PCB Serial Number : FOXXXXXXXXX
RMA Test History : 00
RMA Number : 0-0-0-0
RMA History : 00
Product (FRU) Number : AIM-VPN/SSL-2
Version Identifier : V01
EEPROM format version 4
EEPROM contents (hex):
0x00: 04 FF 40 04 F4 41 01 00 C0 46 03 20 00 69 B3 01
0x10: 42 41 30 80 00 00 00 00 02 02 C1 8B 46 4F 43 31
0x20: 33 33 31 36 39 59 55 03 00 81 00 00 00 00 04 00
0x30: CB 8D 41 49 4D 2D 56 50 4E 2F 53 53 4C 2D 32 89
0x40: 56 30 31 00 D9 02 40 C1 FF FF FF FF FF FF FF FF
0x50: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
0x60: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
0x70: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
What I want to capture is the Model number that is contained in the 'Product (FRU) Number:' section. In this example 'CISCO2821'. I want to output or MsgBox just the CISCO2821 although other possibilities could be 'CISCO2911/K9' or something similar.
This is the regex pattern I am using:
Product\s\(FRU\)\sNumber\s*:\s*CIS.*
Using a regex testing tool I was able to match the entire line containing what I want but I want to write only the model number.
I looked at 'ltrim' and 'rtrim' but did not think that could do it.
Any help would be greatly appreciated.
Regards.
Ok, this is in VB.NET not vbscript, but this may help get you on your way:
Dim RegexObj As New Regex("Product\s\(FRU\)\sNumber[\s\t]+:\s(CIS.+?)$", RegexOptions.IgnoreCase Or RegexOptions.Multiline)
ResultString = RegexObj.Match(SubjectString).Groups(1).Value
Invest in 2 little helper functions:
Function qq(sT) : qq = """" & sT & """" : End Function
Function newRE(sP, sF)
Set newRE = New RegExp
newRE.Pattern = sP
newRE.Global = "G" = Mid(sF, 1, 1)
newRE.IgnoreCase = "I" = Mid(sF, 2, 1)
newRE.MultiLine = "M" = Mid(sF, 3, 1)
End Function
and use
' 3 ways to skin this cat
Dim sInp : sInp = Join(Array( _
"CLEI Code: COMV410ARA" _
, "Product (FRU) Number : CISCO2821" _
, "Part Number:73-8853-05" _
), vbCrLf) ' or vbLf, vbCr
WScript.Echo sInp
' (1) just search for CIS + sequence of non-spaces - risky if e.g. CLEI Code starts with CIS
WScript.Echo 0, "=>", qq(newRE("CIS\S+", "gim").Execute(sInp)(0).Value)
' (2) use a capture/group (idea stolen from skyburner; just 'ported' to VBScript)
WScript.Echo 1, "=>", qq(newRE("\(FRU\)[^:]+:\s(\S+)", "gim").Execute(sInp)(0).Value)
WScript.Echo 2, "=>", qq(newRE("\(FRU\)[^:]+:\s(\S+)", "gim").Execute(sInp)(0).SubMatches(0))
' (3) generalize & use a Dictionary
Dim dicProps : Set dicProps = CreateObject("Scripting.Dictionary")
Dim oMT
For Each oMT in newRe("^\s*(.+?)\s*:\s*(.+?)\s*$", "GiM").Execute(sInp)
Dim oSM : Set oSM = oMT.SubMatches
dicProps(oSM(0)) = oSM(1)
Next
Dim sName
For Each sName In dicProps.Keys
WScript.Echo qq(sName), "=>", qq(dicProps(sName))
Next
to get this output:
CLEI Code: COMV410ARA
Product (FRU) Number : CISCO2821
Part Number:73-8853-05
0 => "CISCO2821"
1 => "(FRU) Number : CISCO2821"
2 => "CISCO2821"
"CLEI Code" => "COMV410ARA"
"Product (FRU) Number" => "CISCO2821"
"Part Number" => "73-8853-05"
and - I hope - some food for thought.
Important
a (plain) pattern matches/finds some part of the input
captures/groups/submatches/parentheses cut parts from this match
sometimes dealing with a generalized version of the problem gives
you more gain for less work
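The capture-group idea from approach (2) is not specific to VBScript. As a cross-check, here is a minimal Python sketch of the same technique; the names FRU_RE, fru_numbers and SAMPLE are made up for the example, and the sample text is a shortened stand-in for the "sh diag" output.

```python
import re

# The pattern matches the whole label; group 1 cuts out just the model number.
FRU_RE = re.compile(r"^\s*Product \(FRU\) Number\s*:\s*(\S+)", re.MULTILINE)

def fru_numbers(output):
    """Return every Product (FRU) Number value found in the output."""
    return FRU_RE.findall(output)

# Shortened, hypothetical excerpt of the router output.
SAMPLE = "\n".join([
    "CLEI Code : COMV410ARA",
    "Product (FRU) Number : CISCO2821",
    "Part Number : 73-8853-05",
    "Product (FRU) Number : AIM-VPN/SSL-2",
])

print(fru_numbers(SAMPLE))  # ['CISCO2821', 'AIM-VPN/SSL-2']
```

Anchoring on the label rather than on "CIS" also picks up module entries such as AIM-VPN/SSL-2; take the first element if only the chassis model is wanted.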