Splitting of frame in different parts in bigquery

Splitting of frame in different parts in bigquery - regex

i've got a string frame looking like this in google bigquery:
S,0,2B3,8, C2 B3 00 00 00 00 03 DE
S,0,3FA,6, 00 E0 04 A5 00 0B
S,0,440,8, 80 40 4E A5 00 47 00 64
S,0,450,8, 89 50 01 12 01 19 01 B3
S,0,4B0,8, 80 B0 4E A5 00 43 00 64
my aim is to extract the 8 bytes at the end (eg 80 40 4E A5 00 47 00 64
). possible only the ones beginning with 83 and 84.
i didnt get it to work with neither split, trim, contains nor regexp_extract.
i'd be quite happy if anyone could help me.
regards
/edit
Thank you both very much for your solutions! this helped quite a lot.
#standardSQL
SELECT
*
FROM (
SELECT
timestamp,
REGEXP_EXTRACT(CAN_Frame, r', ([^,]+)$') AS bytes_string,
FROM_HEX(REPLACE(REGEXP_EXTRACT(CAN_Frame, r', ([^,]+)$'), ' ', '')) AS bytes
FROM `data.source`
)
WHERE SUBSTR(bytes, 1, 1) IN (b'\x83', b'\x84')
ORDER BY timestamp DESC
LIMIT 8000
gives me
Row timestamp bytes_string bytes
1 2017-09-29 14:31:02 UTC 84 10 00 25 00 21 00 4F hBAAJQAhAE8=
2 2017-09-29 14:30:42 UTC 83 80 00 01 00 03 00 0D g4AAAQADAA0=
3 2017-09-29 14:30:40 UTC 84 B2 00 27 00 08 00 03 hLIAJwAIAAM=
#standardSQL
SELECT
timestamp,
TRIM(SPLIT(CAN_Frame)[OFFSET(4)]) AS bytes
FROM
`data.source`
WHERE
LENGTH(CAN_Frame) > 1 and
SUBSTR(TRIM(SPLIT(CAN_Frame)[OFFSET(4)]),1,2) IN ('83', '84')
ORDER BY
timestamp DESC
LIMIT
8000
gives me
Row timestamp bytes
1 2017-09-29 14:31:02 UTC 84 10 00 25 00 21 00 4F
2 2017-09-29 14:30:42 UTC 83 80 00 01 00 03 00 0D
3 2017-09-29 14:30:40 UTC 84 B2 00 27 00 08 00 03
is there a possibility to get only the sixth and seventh byte from the bytes_string beginning with 83, to get 4th and 5th byte from the bytes_string beginning with 84 and to geht the 8th byte from string 83 and the 3rd byte from string 84 for further calculations?
best regards

Below is for BigQuery StandardSQL
#standardSQL
WITH `yourTable` AS (
SELECT 'S,0,2B3,8, C2 B3 00 00 00 00 03 DE' AS frame UNION ALL
SELECT 'S,0,3FA,6, 00 E0 04 A5 00 0B' UNION ALL
SELECT 'S,0,440,8, 80 40 4E A5 00 47 00 64' UNION ALL
SELECT 'S,0,450,8, 89 50 01 12 01 19 01 B3' UNION ALL
SELECT 'S,0,4B0,8, 80 B0 4E A5 00 43 00 64'
)
SELECT frame, TRIM(SPLIT(frame)[OFFSET(4)]) AS bytes
FROM `yourTable`
WHERE SUBSTR(TRIM(SPLIT(frame)[OFFSET(4)]), 1, 2) IN ('80', 'C2')

Here is an example that should help. It produces two columns with different interpretations of the bytes: one (bytes_string) is just the end of the strings that you showed, whereas the other (bytes) is the bytes string converted to an actual BYTES type. In the BigQuery UI, make sure to deselect "Use Legacy SQL" under "Show Options" or include the #standardSQL directive:
#standardSQL
WITH Frames AS (
SELECT 'S,0,2B3,8, C2 B3 00 00 00 00 03 DE' AS frame UNION ALL
SELECT 'S,0,3FA,6, 00 E0 04 A5 00 0B' UNION ALL
SELECT 'S,0,440,8, 80 40 4E A5 00 47 00 64' UNION ALL
SELECT 'S,0,450,8, 89 50 01 12 01 19 01 B3' UNION ALL
SELECT 'S,0,4B0,8, 80 B0 4E A5 00 43 00 64'
)
SELECT
frame,
REGEXP_EXTRACT(frame, r', ([^,]+)$') AS bytes_string,
FROM_HEX(REPLACE(REGEXP_EXTRACT(frame, r', ([^,]+)$'), ' ', '')) AS bytes
FROM Frames;
Here is another example that demonstrates filtering on the bytes column to include only values starting with \x83 or \x84 (this will return an empty result set for the sample data you provided):
#standardSQL
WITH Frames AS (
SELECT 'S,0,2B3,8, C2 B3 00 00 00 00 03 DE' AS frame UNION ALL
SELECT 'S,0,3FA,6, 00 E0 04 A5 00 0B' UNION ALL
SELECT 'S,0,440,8, 80 40 4E A5 00 47 00 64' UNION ALL
SELECT 'S,0,450,8, 89 50 01 12 01 19 01 B3' UNION ALL
SELECT 'S,0,4B0,8, 80 B0 4E A5 00 43 00 64'
)
SELECT
*
FROM (
SELECT
frame,
REGEXP_EXTRACT(frame, r', ([^,]+)$') AS bytes_string,
FROM_HEX(REPLACE(REGEXP_EXTRACT(frame, r', ([^,]+)$'), ' ', '')) AS bytes
FROM Frames
)
WHERE SUBSTR(bytes, 1, 1) IN (b'\x83', b'\x84');

Related

Formatting a file using regex

Spent some time trying to format a file with roughly 5,000 hex values, however with no luck. For example
1b 00 10 50 a3 bb 0e b7 ff ff 00 00 00 00 09 00
01 02 00 01 00 85 03 0e 00 00 00 55 0e 04 66 03
2a 38 32 80 00 0e 00 2f c2
1b 00 10 50 a3 bb 0e b7 ff ff 00 00 00 00 09 00
01 02 00 01 00 85 03 2b 00 00 00 55 2b 04 58 28
2a 39 32 80 00 01 00 12 57 4d 32 34 30 20 41 43
20 56 65 72 2e 41 00 00 23 06 00 0a 23 06 00 0a
01 00 00 c0 14 56
1b 00 30 a6 59 b8 0e b7 ff ff 00 00 00 00 09 00
00 02 00 01 00 04 03 0d 00 00 00 55 0d 04 33 2a
03 3a 32 40 00 0e be 40
1b 00 f0 01 f1 b6 0e b7 ff ff 00 00 00 00 09 00
00 02 00 01 00 04 03 0e 00 00 00 55 0e 04 66 2a
00 3b 32 40 00 01 05 c9 b1
and so on..
I need to format in the following matter:
1b 00 10 50 a3 bb 0e b7 ff ff 00 00 00 00 09 00 01 02 00 01 00 85 03 0e 00 00 00 55 0e 04 66 03 2a 38 32 80 00 0e 00 2f c2
1b 00 10 50 a3 bb 0e b7 ff ff 00 00 00 00 09 00 01 02 00 01 00 85 03 2b 00 00 00 55 2b 04 58 28 2a 39 32 80 00 01 00 12 57 4d 32 34 30 20 41 43 20 56 65 72 2e 41 00 00 23 06 00 0a 23 06 00 0a 01 00 00 c0 14 56
1b 00 30 a6 59 b8 0e b7 ff ff 00 00 00 00 09 00 00 02 00 01 00 04 03 0d 00 00 00 55 0d 04 33 2a 03 3a 32 40 00 0e be 40
1b 00 f0 01 f1 b6 0e b7 ff ff 00 00 00 00 09 00 00 02 00 01 00 04 03 0e 00 00 00 55 0e 04 66 2a 00 3b 32 40 00 01 05 c9 b1
Basically put each hex block into one line still separated by space. For the life of me, i cannot figure out the regular expression to format this for me. I have tried different expressions but everything i tried either removes the line that separates hex blocks or grabs a last character of the line instead of the actual \n.
Maybe there is a better way of formatting files other than using regex

Please try regex: (?!\n\n)\n
Demo

crtdbg dumps a memory leak when sf::Text::setOutlineThickness is used

Working with SFML 2.4.2 on Windows 7 64-bit version, I've noticed an issue with sf::Text::setOutlineThickness(float). Once it is used in the program, except for default value 0, crtdbg dumps a memory leak of various sizes of bytes but always the same amount. I believe this is related to the size of the string, if the text gets drawn, and if the parameter of setOutlineThickness is accepted, demonstrated here:
/// Initial set-up
sf::Text test;
test.setString("A");
// ... Set charactersize, font, fillcolor, etc ...
test.setOutlineThickness(1);
test.setOutlineColor(sf::Color::Black);
/// Make a drawcall for test later in the program
void Game::draw(sf::RenderTarget & target, sf::RenderStates states) const
{
target.draw(test, states);
}
This produces a leak:
{8601} normal block at 0x0000000005CA5C90, 60 bytes long.
Data: < > 03 00 07 00 0B 00 0F 00 13 00 17 00 1B 00 1F 00
{8600} normal block at 0x0000000005E03A20, 120 bytes long.
Data: < > 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
{8599} normal block at 0x0000000005E2A680, 960 bytes long.
Data: < > 00 00 00 00 80 07 00 00 0D 01 00 00 80 07 00 00
{8598} normal block at 0x0000000005CA36B0, 72 bytes long.
Data: < h > F0 1A 9D 05 00 00 00 00 68 AE 83 DB FE 07 00 00
If test.setString("B");, there are still four blocks but the byte size differs, since the string uses another character:
68 bytes, 136 bytes, 1088 bytes, 72 bytes.
Finally if test.setString("AB");, there are 8 blocks with the expected sizes:
{8667} normal block at 0x0000000005C35D10, 68 bytes long.
Data: < > 03 00 07 00 0B 00 0F 00 13 00 17 00 1B 00 1F 00
{8666} normal block at 0x0000000005C61310, 136 bytes long.
Data: < > 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
{8665} normal block at 0x000000000325CDE0, 1088 bytes long.
Data: < > 00 00 00 00 80 07 00 00 0D 01 00 00 80 07 00 00
{8664} normal block at 0x0000000005C340D0, 72 bytes long.
Data: < h > F0 1A 96 05 00 00 00 00 68 AE B5 DB FE 07 00 00
{8601} normal block at 0x0000000005C35C90, 60 bytes long.
Data: < > 03 00 07 00 0B 00 0F 00 13 00 17 00 1B 00 1F 00
{8600} normal block at 0x0000000005D93A20, 120 bytes long.
Data: < > 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
{8599} normal block at 0x0000000005DBA680, 960 bytes long.
Data: < > 00 00 00 00 80 07 00 00 0D 01 00 00 80 07 00 00
{8598} normal block at 0x0000000005C336B0, 72 bytes long.
Data: < h > F0 1A 96 05 00 00 00 00 68 AE B5 DB FE 07 00 00
I use sf::Text as a private member of a class which should be destroyed with the class but that doesn't seem to be the case. What am I missing?
I use _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);, is this a false positive?
Glancing at the function, sf::Text::setOutlineThickness, I don't see an issue here. A brief documentation.
Different leaks depending on the size of the string is more of a symptom really, it's the drawcall and non-default value on outline thickness that I'm clueless on.

Looks like there is a real leak in SFML, at Font.cpp#L561
It looks like this:
FT_Glyph_Stroke(&glyphDesc, stroker, false);
But according to the docu of FT_Glyph_Stroke, it should actually be this, so the source glyph is destroyed:
FT_Glyph_Stroke(&glyphDesc, stroker, true);

removing the dots and colons in the time field

I have the following contents from data.log file. I wish to extract the time value and part of the payload (after deadbeef in the payload, third row, starting second to last byte. Please refer to expected output).
data.log
print 1: file offset 0x0
ts=0x584819041ff529e0 2016-12-07 14:13:24.124834649 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 50 e6 61 c3 85 21 01 00 de ad be ef 85 d7 ..P.a..!........
91 21 6f 9a 32 94 fd 07 01 00 de ad be ef 85 d7 .!o.2...........
print 2: file offset 0x60
ts=0x584819041ff52b00 2016-12-07 14:13:24.124834716 UTC
type: ERF Ethernet
dserror=0 rxerror=0 trunc=0 vlen=0 iface=1 rlen=96 lctr=0 wlen=68
pad=0x00 offset=0x00
dst=aa:bb:cc:dd:ee:ff src=ca:fe:ba:be:ca:fe
etype=0x0800
45 00 00 32 00 00 40 00 40 11 50 ff c0 a8 34 35 E..2..#.#.P...45
c0 a8 34 36 80 01 00 00 00 1e 00 00 08 08 08 08 ..46............
08 08 68 e7 61 c3 85 21 01 00 de ad be ef 86 d7 ..h.a..!........
91 21 c5 34 77 bd fd 07 01 00 de ad be ef 86 d7 .!.4w...........
Expected output
I just want to replace the dots and colons in the time field (before UTC) and get the entire value.
141324124834649,85d79121
141324124834716,86d79121
What I have done so far
I have extracted the fields after "." but not sure how to replace the colons and get the entire time value.
awk -F '[= ]' '$NF == "UTC"{split($4,b,".");s=b[2]",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' data.log
124834649,85d79121
124834716,86d79121
Any help is much appreciated.

awk '$NF == "UTC"{gsub("[.:]","",$3);s=$3",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' data.log
Result:
141324124834649,85d79121
141324124834716,86d79121
PS: it can be simplified with getline :
awk '$NF == "UTC"{gsub("[.:]","",$3);s=$3","} /de ad be ef/{s=s $15 $16;getline;print(s $1 $2)}' data.log

You can extract the time part like this:
$ awk '/UTC/ {split($0,a); gsub(/[\.:]/,"",a[3]); print a[3]}' file
141324124834649
141324124834716

for the UTC part, (rest of the code is the same
awk '/UTC$/{gsub(/[\.:]/,"");print $3}' YourFile
just remove the ":" and "." and take the field value, other part of the line don't have those 2 character, so is not modified
$NF test is replaced by /UTC$/, a bit faster and simpler (OMHO)
the full code
awk -F '[= ]' '/UTC$/{gsub(/[\.:]/,"");s=$3",";a=15} /de ad be ef/{s=s $a $(a+1);if(a==1)print s;a=1}' YourFile

Memory leaks in boost asio

I have client/server app. Interaction implemented via Boost.Asio.
I created unit test to check long running transmission of data.
During the test memory leak detected.
Task Manager shows me that memory usage constantly grows - up to 35MB per 10min. Report produced at the end of test contains this:
Result StandardError: Detected memory leaks!
Dumping objects ->
{14522} normal block at 0x00E8ADC0, 16 bytes long.
Data: < _M} Y > B0 5F 4D 7D F9 59 F2 02 F4 E9 E6 00 CC CC CC CC
{14012} normal block at 0x00E8B280, 16 bytes long.
Data: < v > C0 76 A4 00 94 01 00 00 98 01 00 00 F0 D2 E3 00
{14011} normal block at 0x00E74B38, 12 bytes long.
Data: < > 00 00 00 00 9C 01 00 00 98 01 00 00
{14007} normal block at 0x00E745F8, 8 bytes long.
Data: < L > E0 4C E5 00 00 00 00 00
{14006} normal block at 0x00E54CB8, 60 bytes long.
Data: < v 4 > E4 76 A4 00 D0 D3 B0 00 00 00 00 00 34 80 E3 00
{13724} normal block at 0x00E710F8, 385 bytes long.
Data: < > 03 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
{13722} normal block at 0x00E85C58, 28 bytes long.
Data: < F _ _ > F2 B6 46 00 B4 5F E3 00 A0 5F E3 00 BC 96 E7 00
{13720} normal block at 0x00E6F9B8, 80 bytes long.
Data: <wxF > 77 78 46 00 FC FF FF FF 00 00 00 00 CC CC CC CC
{13700} normal block at 0x00E6DFD0, 88 bytes long.
Data: < > C8 A4 A4 00 01 00 00 00 01 00 00 00 00 00 00 00
…
Data: <` X L > 60 8E E0 00 58 17 E2 00 CD 4C F7 EA
{153} normal block at 0x00DF0070, 12 bytes long.
Data: <` kf > 60 8D E0 00 98 00 E2 00 15 6B 66 0E
{151} normal block at 0x00DF0038, 12 bytes long.
Data: < .g> 20 86 E0 00 E0 FC E1 00 9D B7 2E 67
{149} normal block at 0x00DF0658, 12 bytes long.
Data: < G > A0 89 E0 00 00 00 00 00 47 01 D5 11
{147} normal block at 0x00DF0268, 12 bytes long.
Data: <` > 60 84 E0 00 A8 F5 E1 00 ED 8C AA BA
{145} normal block at 0x00DF0230, 12 bytes long.
Data: < ' " > 20 84 E0 00 00 11 E2 00 27 B0 22 00
{143} normal block at 0x00DF0690, 12 bytes long.
Data: <` P KnOQ> 60 88 E0 00 50 04 E2 00 4B 6E 4F 51
{141} normal block at 0x00DF0540, 12 bytes long.
Data: <` > 7> 60 82 E0 00 00 0A E2 00 3E 0D 9E 37
{139} normal block at 0x00DF0620, 12 bytes long.
Data: <Pq 1 > 50 71 DF 00 00 00 00 00 E5 DD 31 B5
{137} normal block at 0x00DF0700, 12 bytes long.
Data: < q # #> 10 71 DF 00 40 FA E1 00 14 8B 0D 23
{134} normal block at 0x00DF5CE0, 96 bytes long.
Data: <h BV BV > 68 19 E0 00 D0 42 56 00 E0 42 56 00 88 00 00 00
{133} normal block at 0x00DF0188, 8 bytes long.
Data: < \ > A0 5C DF 00 00 00 00 00
{132} normal block at 0x00DF5CA0, 16 bytes long.
Data: < > 88 01 DF 00 D8 AA DF 00 20 AC DF 00 20 AC DF 00
Object dump complete.
I tried to put breakpoint to mentioned memory allocations via boost's --detect_memory_leaks="allocation number" and setting in Watch window at Debug mode _crtBreakAlloc = 1000. It does not work. Maybe because leaks occur not in my code, but in boost/OpenSSL code?
I can't figure out where leaks occur. What can I do?
Windows 8, Visual Studio 2015, boost 1.60, OpenSSL 1.0.2g

Have a look at this post to see some suggested tips for dealing with memory leaks under windows. Have a scroll down, don't just look at the first answer. In particular it may be worth considering the DEBUG_NEW macro-based solution discussed by the second answer. Given that boost asio is largely header-only, this should help you even if the offending allocations come from the boost library.

Part 1: Report from Visual Studio about memory leaks
I'm using Boost.Asio to communicate with the server over TLS, i.e. Boost.Asio uses OpenSSL.
Seems that OpenSSL initializes itself and do not cleans memory before the end of the app (because app closes and memory will be released anyway).
This is not big chunk of memory (I do not know how to measure it).
As result Visual Studio treated that memory as leak. But it is not.
(This is my assumption, maybe real reason for such report is smth else. But I do not see any other possible reasons. )
Part 2:
In the question above I asked about memory leak for tens of Mb. This is my bad code that leads to huge memory buffer )).
Huge memory consumption and report from VisualStudio about memory leak made me believe that smth is very wrong ))
Buffer easily reduced to much smaller size.

How to read the model of monitor from the EDID?

In the registry there is one (or more) key depending how many monitors you have HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\DISPLAY\DEL404C{Some Unique ID}\Device Parameters\EDID which is a REG_BINARY key. In my case this is :
00 ff ff ff ff ff ff 00 4c 2d 6f 03 39 31 59 4d
07 12 01 03 0e 29 1a 78 2a 80 c5 a6 57 49 9b 23
12 50 54 bf ef 80 95 00 95 0f 81 80 81 40 71 4f
01 01 01 01 01 01 9a 29 a0 d0 51 84 22 30 50 98
36 00 ac ff 10 00 00 1c 00 00 00 fd 00 38 4b 1e
51 0e 00 0a 20 20 20 20 20 20 00 00 00 fc 00 53
79 6e 63 4d 61 73 74 65 72 0a 20 20 00 00 00 ff
00 48 56 44 51 32 30 36 37 37 37 0a 20 20 00 ef
My question is how can I read only model of monitor ("SyncMaster" for example) and not all of the information using C or C++?
The format of EDID is described here: http://en.wikipedia.org/wiki/Extended_display_identification_data

What you're interested in here is the descriptor blocks of the EDID, which are found in the byte ranges 54-71, 72-89, 90-107, and 108-125. Here's those four blocks in your EDID:
#1: 9a29 a0d0 5184 2230 5098 3600 acff 1000 00
#2: 0000 00fd 0038 4b1e 510e 000a 2020 2020 20
#3: 0000 00fc 0053 796e 634d 6173 7465 720a 20
#4: 0000 00ff 0048 5644 5132 3036 3737 370a 00
You can identify the descriptor containing the monitor name because the first three bytes are all zero (so it isn't a detailed timing descriptor), and the fourth one byte FC (indicating the type). The fifth byte and beyond contain the name, which is here:
5379 6e63 4d61 7374 6572 0a20 SyncMaster..
So, in short: Check at offsets 54, 72, 90, and 108 for the sequence 00 00 00 FC; if you find a match, the monitor name is the next 12 bytes.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Splitting of frame in different parts in bigquery - regex

Related

Formatting a file using regex

crtdbg dumps a memory leak when sf::Text::setOutlineThickness is used

removing the dots and colons in the time field

Memory leaks in boost asio

How to read the model of monitor from the EDID?

Categories

Resources