Reverse Engineering an 8-bit CRC

I have a device that spits out data with a checksum of some kind (8-bit) that I would like to reverse engineer (a Balboa spa WiFi module, for those interested).
The messages look like this:
7E1DFFAF130000640B2B00000100000400000000000000000064000000A57E
Where "7E" is the header/footer of the message, "1D" is the length, and "A5", in this case, is the checksum byte.
I've tried feeding these into reveng, but it just spits out "no models found" no matter how I set the parameters. What am I doing wrong?
Some example data with the header/footer stripped off, checksums at the end:
1DFFAF130000640B2800000100000400000000000000000064000000D1
1DFFAF130000640B2900000100000400000000000000000064000000FD
1DFFAF130000640B2A0000010000040000000000000000006400000089
1DFFAF130000640B2B00000100000400000000000000000064000000A5
Thanks

reveng -w 8 -s followed by those four strings gives me a result. However, you don't have enough data to resolve the parameters uniquely. You need more such messages with CRCs, including messages of differing lengths.
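Once reveng does resolve a model, you can verify messages yourself with a generic bit-serial CRC-8 routine. A minimal sketch (it does not handle reflected models, and the poly/init/xorout defaults below are the plain CRC-8/SMBus parameters, not necessarily this device's):

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00, xorout: int = 0x00) -> int:
    """Generic bit-serial, non-reflected CRC-8; plug in whatever parameters reveng reports."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ xorout

# One of the sample messages with the trailing checksum byte (0xA5) stripped off.
# Whether the length byte belongs in the CRC input is itself unknown until reveng
# resolves the model, so treat this as a starting point, not an answer.
msg = bytes.fromhex("1DFFAF130000640B2B00000100000400000000000000000064000000")
candidate = crc8(msg)  # compare against 0xA5 for each parameter set you try
```

Looping that over candidate poly/init/xorout values against all four sample messages is a crude but effective cross-check on whatever model reveng eventually reports.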

Most efficient way to use AWS SQS (with Golang)

When using AWS SQS (Simple Queue Service) you pay for each request you make to the service (push, pull, ...), and each message you send to a queue can be at most 256 KB.
To save money I'd like to buffer the messages my Go application produces before I send them out to SQS, until I have enough data to use the 256 KB limit efficiently.
Since my Go application is a webserver, my current idea is to use a mutex-protected string buffer and append messages until appending another one would exceed the 256 KB limit, then issue the SQS push. To save even more space I could gzip every single message before appending it to the buffer.
I wonder if there is some kind of gzip stream that I could use for this. My assumption is that gzipping all concatenated messages together will result in a smaller size than gzipping each message before appending it. One way would be to gzip the buffer after every append to validate its size, but that might be very slow.
Is there a better way? Or is there an altogether better approach involving channels? I'm still new to Go, I have to admit.
I'd take the following approach:
Use a channel to feed incoming "internal" messages to a goroutine
In that goroutine, keep the messages in "raw" format, so 10 messages is 10 raw, uncompressed items
Each time a new raw item arrives, compress all the raw messages into one. If the size including the new message exceeds 256 KB, then compress all messages EXCEPT the last one and push that batch to SQS
This is computationally expensive -- each individual message causes a full compression of all pending messages -- but it makes efficient use of SQS
You could estimate the size of the gzipped messages and calculate whether you've reached the maximum size threshold: keep a running size counter and, for every new message, increment the counter by its expected compressed size. Do the actual compression and send to SQS only when the counter would exceed 256 KB. That way you avoid compressing every time a new message comes in.
For a use-case like this, running a few tests on a sample set of messages should give the rough percentage of compression expected.
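The compress-on-arrival flow above can be sketched language-agnostically; here it is in Python with gzip, where the send callback stands in for the real SQS push. The newline separator is an assumption (it only works if messages cannot contain newlines), and the class name is just illustrative:

```python
import gzip

class SQSBatcher:
    """Buffers raw messages; flushes a gzipped batch just before it would exceed the limit.

    `send` is a placeholder for the real SQS push; `limit` defaults to SQS's 256 KB cap.
    """
    def __init__(self, send, limit=256 * 1024):
        self.raw = []      # pending uncompressed messages
        self.send = send
        self.limit = limit

    def add(self, msg: bytes):
        self.raw.append(msg)
        blob = gzip.compress(b"\n".join(self.raw))
        if len(blob) > self.limit and len(self.raw) > 1:
            # Compress everything except the newest message and push it; that
            # batch was under the limit when it was checked on the previous call.
            # Note: a single message whose compressed form alone exceeds the
            # limit cannot be batched at all, hence the len(self.raw) > 1 guard.
            self.send(gzip.compress(b"\n".join(self.raw[:-1])))
            self.raw = self.raw[-1:]
```

In Go the same logic would sit inside the goroutine receiving from the channel, with compress/gzip from the standard library.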
Before you get focused on compression, eliminate redundant data that is known on both sides. This is what encodings like msgpack, protobuf, AVRO, and so on do.
Let's say all of your messages are a struct like this:
type Foo struct {
    bar string
    qux int
}
and you were thinking of encoding it as JSON. Then the most compact you could do is:
{"bar":"whatever","qux":123}
If you wanted to just append all of those together in memory, you might get something like this:
{"bar":"whatever","qux":123}{"bar":"hello world","qux":0}{"bar":"answer to life, the universe, and everything","qux":42}{"bar":"nice","qux":69}
A really good compression algorithm might look at hundreds of those messages and identify the repetitiveness of {"bar":" and ","qux":.
But compression has to do work to figure that out from your data each time.
If the receiving code already knows what "schema" (the {"bar": some_string, "qux": some_int} "shape" of your data) each message has, then you can just serialize the messages like this:
"whatever"123"hello world"0"answer to life, the universe, and everything"42"nice"69
Note that in this example encoding, you can't just start in the middle of the data and unambiguously find your place. If you have a bunch of messages such as {"bar":"1","qux":2}, {"bar":"2","qux":3}, {"bar":"3","qux":4}, then the encoding will produce: "1"2"2"3"3"4, and you can't just start in the middle and know for sure if you're looking at a number or a string - you have to count from the ends. Whether or not this matters will depend on your use case.
You can come up with other simple schemes that are more unambiguous or make the code for writing or reading messages easier or simpler, like using a field separator or message separator character which is escaped in your encoding of the other data (just like how \ and " would be escaped in quoted JSON strings).
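As an illustration of the schema-known idea (a sketch, not any particular wire format): a 2-byte length prefix on the string removes the ambiguity described above, and the field names are never sent. Python's struct stands in here for whatever language you actually use, and the function names are made up for the example:

```python
import struct

def pack_foo(bar: str, qux: int) -> bytes:
    # The schema is known on both sides, so only the values are sent:
    # a 2-byte length prefix plus the UTF-8 bytes of bar, then a 4-byte signed qux.
    b = bar.encode("utf-8")
    return struct.pack(">H", len(b)) + b + struct.pack(">i", qux)

def unpack_foos(data: bytes):
    """Decode a concatenation of packed Foo records back into (bar, qux) tuples."""
    out, i = [], 0
    while i < len(data):
        (n,) = struct.unpack_from(">H", data, i)
        bar = data[i + 2 : i + 2 + n].decode("utf-8")
        (qux,) = struct.unpack_from(">i", data, i + 2 + n)
        out.append((bar, qux))
        i += 2 + n + 4
    return out
```

The length prefix is what makes the stream self-delimiting, at a cost of two bytes per record instead of repeating "bar" and "qux" in every message.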
If you can't have the receiver just know/hardcode the expected message schema - if you need the full flexibility of something like JSON and you always unmarshal into something like a map[string]interface{} or whatever - then you should consider using something like BSON.
Of course, you can't use msgpack, protobuf, AVRO, or BSON directly - they produce arbitrary bytes (like 0x00), which need a medium that can carry them. And according to the AWS SQS FAQ:
Q: What kind of data can I include in a message?
Amazon SQS messages can contain up to 256 KB of text data, including XML, JSON and unformatted text. The following Unicode characters are accepted:
#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]
So if you want to aim for maximum space efficiency for your exact use case, you'd have to write your own code which uses the techniques from those encoding schemes, but emits only the bytes that are allowed in SQS messages.
Relatedly, if you have a lot of integers, and you know most of them are small (or cluster around a certain spot on the number line, so that adding a constant offset to all of them makes most of them small), you can use one of the variable-length-quantity techniques to encode all of those integers. In fact, several of the common encoding schemes mentioned above use variable-length quantities in their encoding of integers. If you use a "piece size" of six bits (instead of the implicitly assumed piece size of eight bits, to match a byte) then you can use base64 - not full base64 encoding, because the padding would completely defeat the purpose, but a simple mapping from the 64 possible values that fit in six bits to the 64 distinct ASCII characters that base64 uses.
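Here is one possible (non-standard) sketch of that six-bit scheme: five payload bits plus one continuation bit per character, each character drawn from base64's alphabet, so a small non-negative integer costs a single SQS-safe character:

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def vlq6_encode(n: int) -> str:
    """Encode a non-negative int, least-significant 5-bit piece first.

    Bit 5 of each 6-bit value is the continuation flag; the payload is bits 0-4.
    """
    chars = []
    while True:
        piece = n & 0x1F
        n >>= 5
        if n:
            chars.append(B64[piece | 0x20])  # more pieces follow
        else:
            chars.append(B64[piece])         # final piece
            break
    return "".join(chars)

def vlq6_decode(s: str) -> int:
    n, shift = 0, 0
    for ch in s:
        v = B64.index(ch)
        n |= (v & 0x1F) << shift
        shift += 5
        if not v & 0x20:  # continuation flag clear: last piece
            break
    return n
```

Values up to 31 fit in one character, up to 1023 in two, and so on; whether that beats a plain decimal string depends on how your integers are distributed.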
Anyway, unless you know your data has a lot of repetition (but not the kind that you can simply not send, like the same field names in every message), I would start with all of that, and only then would I look at compression.
Even so, if you want minimal size, I would aim for LZMA, and if you want minimal computing overhead, I would use LZ4. Gzip is not bad per se - if it's much easier to use gzip, then just use it - but if you're optimizing for either size or speed, there are better options. I don't know if gzip is even a good "middle ground" of speed, output size, and working-memory use - it's pretty old, and there may well be compression algorithms that are strictly superior on all three counts by now. Gzip, depending on the implementation, also includes headers and framing information (version metadata, size, checksums, and so on), which you probably don't want if you really need to minimize size, and probably don't need in the context of SQS messages.
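The framing overhead is easy to see with Python's stdlib, where a negative wbits value gives the raw DEFLATE stream that gzip wraps (a sketch; exact sizes vary with the data and compression level):

```python
import gzip
import zlib

data = b'{"bar":"whatever","qux":123}' * 100

# gzip output: DEFLATE stream wrapped in a header and trailer
# (magic bytes, flags, mtime, CRC-32, original length).
gz = gzip.compress(data)

# Raw DEFLATE: the same compressed stream with no framing at all.
co = zlib.compressobj(level=9, wbits=-15)  # negative wbits = no zlib/gzip framing
raw = co.compress(data) + co.flush()

overhead = len(gz) - len(raw)  # the bytes gzip's framing costs you per message
```

For a single large batched message the ~18 bytes of gzip framing are negligible; for many tiny messages compressed individually, they add up.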

Not getting expected results from complex view

This is a somewhat involved question as the data I am working with is a little large.
I have the following document structure: https://gist.github.com/gaigepr/5b28a7c67ced0cd71e4e
and the following map function: https://gist.github.com/gaigepr/a721bcc8ef6f681f3807
A little description: this function goes through the example document to collect a list of all combinations of 1 to 5 characters, and pairs each combination with a 1 or 0 to indicate a win or a loss for that particular combo of characters. This is accomplished by taking the powerset of the team and ignoring the empty set when emitting the array key and the integer that indicates a win or loss.
The problem I am having is with reducing the data. My goal is to get the win rate of a particular group of characters in the game this data is from. The view takes a key formatted like [1, 18] and should output the win rate and games played for that pair of characters.
so my reduce function should be something like this:
However, when I do this, I do not actually get all the games played by that pair in the reduction. In my test database I have 96 games played by the above pair [1, 18], but when I run map and reduce with that key, I get that there were only 2 games played, and null for the win rate.
A note: this only seems to happen inconsistently. With my data, when I query with the key [1, 18] I get accurate results.
I am a little bit at a loss for what to do to debug this and would appreciate some help. I am happy to add more details, gists, even pictures of the futon output if that would be helpful.
I do not have much to back this up yet, or any confirmation, but it seems that the data passed to the reduce function is not formatted the way I expect it to be, and I am not sure why that is.

Check if data arrived OK

I am using boost::asio (TCP) to send and receive fixed-size (100-byte) data from one PC to another. What's the best way to check that everything arrived OK, without impacting performance?
One idea is to save the first and last characters and put them first, so "hello my...battle in the end" would become "hd hello my...battle in the end". The final string will be 102 characters, and the receiver can perform a size check as well.
Another idea is to use a hash, but I guess this will be very CPU-intensive.
Do you guys have any good ideas?
NOTE: Please keep in mind: I will use this millions of times; every microsecond counts.
The data are words separated by spaces.
TCP is designed to be a reliable transmission protocol. Since you say you're using TCP, you can simply assume that if the data arrived and is of the full length, it arrived correctly.
If you're worried about data being corrupted in transmission beyond what TCP's 16-bit checksum can detect, you might add a 32-bit CRC to the end of your data.
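Sketched in Python for brevity (in C++, Boost.CRC's boost::crc_32_type computes the same CRC-32 polynomial that zlib uses), the framing might look like this; the frame/unframe names are just illustrative:

```python
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)

def unframe(data: bytes) -> bytes:
    """Strip and verify the trailing CRC-32; raise on mismatch."""
    payload, (crc,) = data[:-4], struct.unpack(">I", data[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch: data corrupted in transit")
    return payload
```

CRC-32 over 100 bytes is a handful of nanoseconds on modern hardware (zlib's implementation is table-driven, and many CPUs have CRC instructions), so it comfortably fits a "every microsecond counts" budget.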

struct.unpack for network byte order binary encoded numbers

I am totally new to Python. I have to parse a .txt file that contains network-byte-order, binary-encoded numbers (see here for the details on the data). I know that I have to use struct.unpack from Python's struct module. My questions are the following:
(1) Since I don't really understand how struct.unpack works, is it straightforward to parse the data? Looking at the data structure, it seems that I have to write code for each type of message, but the online documentation for struct.unpack makes it look more straightforward, and I am not sure how to write the code. A short sample would be appreciated.
(2) What's the best practice once I have parsed the data? I would like to save the result so I can avoid parsing the file each time I need to make a query. In what format should I keep the parsed data to be most efficient?
This should be relatively straightforward. I can't comment on how you're actually supposed to get the byte-encoded packets of information, but I can help you parse them.
First, here's a list of some of the packet types you'll be dealing with that I gathered from section 4 of the documentation:
TimeStamp
System Event Message
Stock Related Messages
Stock Directory
Stock Trading Action
Reg SHO Short Sale Price Test Restricted Indicator
Market Participant Position
Add Order Message
This continues on. But as an example, let's see how to decode one or two of these:
System Event Message
A System Event Message packet has 3 fields and is 6 bytes long:
A Message Type, which starts at byte 0, is 1 byte long, with a value of "S" (a single character)
A TimeStamp, which starts at byte 1, is 4 bytes long, and should be interpreted as an Integer.
An Event Code, which starts at byte 5, is 1 byte long and is a String (Alpha).
Looking up each type in the struct format table, we'll need to build a string to represent this sequence. First we have a character, then a 4-byte unsigned integer, then another character. Because the data is in network byte order, we prefix the format with "!", which selects big-endian order and standard sizes (giving the 6-byte layout above rather than a padded native one). This corresponds to the encoding and decoding string "!cIc".
*NOTE: The unsigned portion of the Integer is documented in Section 3: Data Types of their documentation
Construct a fake packet
This could probably be done better, but it's functional:
>>> import struct
>>> ts = 1385000614  # a fixed example timestamp (seconds since the epoch)
>>> data = struct.pack('!cIc', b'S', ts, b'O')
>>> data  # What does the bytestring look like?
b'SR\x8dn\xa6O'
Unpack the data
In this example, we'll use the fake packet above, but in the real world we'd use a real data response:
>>> struct.unpack('!cIc', data)
(b'S', 1385000614, b'O')
In this case, the 3rd item in the tuple (the b'O') is a key, to be looked up in the tables called System Event Codes - Daily and System Event Codes - As Needed.
If you need additional examples, feel free to ask, but that's the gist of it.
As for how to store this data: that depends on what you'd like to do with it long term. Probably a database makes sense here, but without further information I cannot say.
Hope that helps!

Reading GPS data from a gadget

I have a GPS gadget, and I would like to know how I can read information (coordinates) from it in C++.
This is probably easier to answer than the comments indicate. While most GPS units do support proprietary (usually) binary protocols, nearly all that are even close to current also support NMEA-formatted data, and will supply it by default. Whether connected by serial port, bluetooth or USB, most also look to user programs like a serial port.
In other words, you can probably support something like 90% of reasonably current GPS units simply by opening a serial port and reading data from it. The only ugly part is that the official NMEA 0183 standard document is fairly expensive (though it's not as bad as it used to be -- you can now buy just the parts of it you care about, which is to say only appendix B, IIRC).
Fortunately, unless you're planning to get your software certified for marine or aircraft navigation, you probably don't need that -- there are a fair number of web sites that cover the (few) message types you probably care about. For that matter, if you just open up a terminal program on the right COM port and look at the data, it's not too hard to find the latitude/longitude all by yourself (at least if you have at least a vague notion of where you are).
It's basically CSV format -- one record per line, with fields separated by commas. It's all normal ASCII text, so getting coordinates is mostly a matter of checking that the first field says it's the right kind of record ("$GPGGA"), then getting fields 3, 4, 5, and 6 (3 and 4 are latitude, and 5 and 6 longitude -- 3 and 5 give the number, while 4 and 6 hold "N" or "S", and "E" or "W", to indicate north/south latitude and east/west longitude). You'll probably also want to check that the field after that contains "1" or "2", indicating a normal or Differential GPS fix respectively (a "0" would indicate no GPS fix, so invalid data).
Not that it matters a lot, but you should probably also realize that there are several other field types you'll probably receive, and some of them contain fix data as well. $GPGGA just happens to be the one I've used -- there are others that are probably just as good, but I've never had much reason to look at others, because that one's been adequate for what I needed/did.
Edit: Here's a sample dump of data exactly as it's received from the GPS, in this case over a Bluetooth virtual serial port:
$GPGGA,090809.103,3901.4345,N,10448.2482,W,0,00,50.0,2078.2,M,-21.4,M,0.0,0000*43
$GPRMC,090809.103,V,3901.4345,N,10448.2482,W,0.00,0.00,220311,,*07
$GPVTG,0.00,T,,M,0.00,N,0.0,K*60
$GPGGA,090810.103,3901.4345,N,10448.2482,W,0,00,50.0,2078.2,M,-21.4,M,0.0,0000*4B
In the second $GPGGA record:
$GPGGA,090810.103,3901.4345,N,10448.2482,W,0,00,50.0,2078.2,M,-21.4,M,0.0,0000*4B
"$GPGGA" is the record ("Sentence") type identifier.
"090810.103" is the current UTC time.
"3901.4345,N" is the latitude.
"10448.2482,W" is the longitude.
The next: "0" is telling you that the preceding latitude/longitude are not valid.
The latitude and longitude are both formatted (IIRC) with the last two digits before the decimal point being whole minutes and everything after the decimal point being fractions of a minute, so "3901.4345" really means 39° 1.4345' north latitude.
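Putting the recipe together (the question asks about C++, but the logic is identical in any language, so here is a sketch in Python using the sample sentence above; function names are illustrative): verify the *XX checksum, which is the XOR of every character between "$" and "*", split on commas, and convert ddmm.mmmm to signed decimal degrees.

```python
from functools import reduce

def nmea_checksum_ok(sentence: str) -> bool:
    """The checksum is the XOR of every character between '$' and '*', as two hex digits."""
    body, _, given = sentence.lstrip("$").partition("*")
    return format(reduce(lambda a, c: a ^ ord(c), body, 0), "02X") == given.strip().upper()

def to_degrees(value: str, hemi: str) -> float:
    """Convert NMEA ddmm.mmmm (or dddmm.mmmm for longitude) plus hemisphere to decimal degrees."""
    dot = value.index(".")
    degrees = int(value[: dot - 2])      # everything before the last two pre-decimal digits
    minutes = float(value[dot - 2 :])    # whole minutes plus the fractional part
    deg = degrees + minutes / 60.0
    return -deg if hemi in ("S", "W") else deg

def parse_gpgga(sentence: str):
    """Return (lat, lon) in decimal degrees, or None if there is no GPS fix."""
    f = sentence.split(",")
    if f[0] != "$GPGGA" or f[6] == "0":  # field 7 is the fix quality; "0" means no fix
        return None
    return to_degrees(f[2], f[3]), to_degrees(f[4], f[5])
```

Note that parse_gpgga returns None for the sample sentences above, since their fix-quality field is "0" -- exactly the "not valid" case described earlier.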