Find the algorithm that generates the checksum (CRC)

I have a sensing device that transmits a 6-byte message along with a 1-byte counter and, supposedly, a checksum.
The data looks something like this:
------ DATA ------    Counter    Checksum?
55 FF 00 00 EC FF       60          1F
The last four bits of the counter are always 0, i.e. those bits are probably unused. The last byte is assumed to be the checksum, since it has a quite peculiar nature: it changes unpredictably whenever the data changes.
Now what I need is to find the algorithm that computes this checksum from the DATA bytes.
What I have tried: all possible CRC-8 polynomials, and for each polynomial I have tried reflecting the data, toggling it, initializing with non-zero values, and so on. I've come to the conclusion that I am not dealing with a normal CRC algorithm. I have also tried some Fletcher and Adler variants without success, and XORed things back and forth, but I still have no clue how the checksum is generated. (A sketch of the kind of search I have been running appears after the samples below.)
My biggest concern is: how is the counter used? The same data with a different counter value generates a different checksum.
I have tried to include the counter in my computations, but without any luck.
Here are some other data samples:
55 FF 00 00 F0 FF A0 38
66 0B EA FF BF FF C0 CA
5E 18 EA FF B7 FF 60 BD
F6 30 16 00 FC FE 10 81
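For reference, here is roughly the kind of parameterized CRC-8 search I have been running over these samples; whether the counter byte belongs in the checksummed region is exactly the part I am unsure about, so the sketch tries both ways:

# Parameterized CRC-8 search over polynomial, init value, input/output
# reflection and final XOR. Sample pairs are the messages above.

def reflect8(b):
    # Reverse the bit order within one byte, e.g. 0x60 -> 0x06.
    out = 0
    for _ in range(8):
        out = (out << 1) | (b & 1)
        b >>= 1
    return out

def crc8(data, poly, init, refin, refout, xorout):
    crc = init
    for byte in data:
        crc ^= reflect8(byte) if refin else byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    if refout:
        crc = reflect8(crc)
    return crc ^ xorout

samples = [  # (6 data bytes + counter, observed checksum)
    (bytes.fromhex("55FF0000F0FFA0"), 0x38),
    (bytes.fromhex("660BEAFFBFFFC0"), 0xCA),
    (bytes.fromhex("5E18EAFFB7FF60"), 0xBD),
    (bytes.fromhex("F6301600FCFE10"), 0x81),
]

for with_counter in (True, False):
    for poly in range(1, 256):
        for init in (0x00, 0xFF):
            for refin in (False, True):
                for refout in (False, True):
                    for xorout in (0x00, 0xFF):
                        if all(crc8(m if with_counter else m[:-1],
                                    poly, init, refin, refout, xorout) == c
                               for m, c in samples):
                            print(with_counter, hex(poly), hex(init),
                                  refin, refout, hex(xorout))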
One more thing that might be worth mentioning: the last byte of the data only ever takes the values FF or FE.
Please, if you have any tips or tricks that I could try, post them here; I am truly desperate.
Thanks

Some random ideas:
Bit ordering: you are currently representing the data as octets, but this is not how the CRC algorithm sees it. CRCs operate on polynomials represented as arrays of bits, not arrays of octets. Because of this, it is possible that the device performs the CRC using a different bit-ordering scheme than the one you are using.
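If you want to rule this out quickly, reflecting every byte before re-running your search is cheap; a minimal sketch:

def reflect_byte(b):
    # Reverse the bit order within one byte, e.g. 0x60 -> 0x06.
    out = 0
    for _ in range(8):
        out = (out << 1) | (b & 1)
        b >>= 1
    return out

msg = bytes.fromhex("55FF0000ECFF60")
print(bytes(reflect_byte(b) for b in msg).hex())  # feed this variant into the search too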
Depending on the device, I would say it is rather likely that the counter is included in the CRC calculation.
If this is an embedded device, it might use some other code, such as a BCH code.
Is there any other information that can be given about the device in question?
This might give some indication as to how strong a code has been used. As an example, a certain CRC-12 polynomial (0x8F8) provides a Hamming distance of 5 up to data word lengths of 53 bits (in your data the data word could be 52 bits, assuming a CRC size of 12 bits).
Edit:
See the answer in How could I guess a checksum algorithm? for some additional ideas.

Related

How to access the data bytes of a JPEG image in order to encrypt it?

For a project I need to encrypt the data bytes of a JPEG image in order to show both the encrypted and the decrypted image. I used to do this easily with the BMP format by skipping the first 1078 bytes and encrypting the rest. But for JPEG it's a lot harder. I found that JPEG has 20-byte markers, but when I skip 20 bytes instead of 1078 I can't see the encrypted image as I did with BMP.
So can you tell me how I can access the data bytes of the JPEG format in order to encrypt them?
Note: the code is a lot larger, so I can't post the whole thing. Information on how to access the data bytes of the JPEG format would be sufficient.
The JPEG format is much more complex than a simple bitmap. In addition to image data, a JPEG can contain comments, metadata, and probably some other info that I'm not aware of. (Images from a camera almost certainly include metadata, such as where and when the image was taken.) There is structure beyond the header bytes. If you want to change the data within a JPEG and have the result be a valid JPEG, you would need to be able to read and write the JPEG structure.
As I understand it – and my understanding is far from complete – a JPEG file consists of a sequence of segments. Each segment starts with the marker byte FF followed by a byte that indicates what kind of segment it is. For some kinds of segments, this is followed by two bytes indicating how long the data of the segment is (the length includes the length bytes, but does not include the marker and kind bytes). For example, the file starts with the segment FF D8, a 2-byte segment (no length bytes) that indicates the start of the image. This is followed by another segment. The page you linked to gives example data where the next segment is an application segment: FF E0 followed by 16 bytes of data. How do you know there are 16 bytes of data? The first two of those bytes are 00 10, which is 16 in decimal. After that comes another FF marker, signalling another segment.
FF D8
FF E0 00 10 4A 46 49 46 00 01 01 01 00 48 00 48 00 00
FF DB 00 43 [65 bytes of data]
...
To retain the JPEG format, you should process the file in terms of segments. Drop the idea of having a (20-byte) file header; there is simply a sequence of segments. Some of the segments play the role of header data and should not be modified without a deeper understanding of the format. Other segments contain the data you want to modify.
I think the segments you are interested in are the frames, which are kinds C0 through CF, minus C4, C8, and CC. These segments have variable-length data, hence their data begins with a two-byte length. If you're lucky, modifying just the data in these segments will get you the results you want. If you're not lucky, there is critical additional structure within this data that I'm not aware of, and your modifications will corrupt that structure.
What you might want to do is scan the file looking for a marker byte, FF. Look at the byte after that. Is it a kind you want to modify? If not, just copy bytes until you reach the next marker. If it is a kind to modify, read the next two bytes to get the length of the data. Then read (length-2) bytes, remembering that you already read 2 bytes. This is the data to process. After processing, re-calculate the length then write the segment (FF followed by the kind, followed by the new length, followed by the modified data). Keep going until you run out of file to process.
There is a complication to keep in mind, hinted at by "re-calculate the length". If the byte FF appears somewhere in the modified data, you'll need to flag it as not a marker by inserting the null byte, 00, after it. This is one way the length could change.
If you're still following, you might be able to pull this off. I can point you to a copy of the JPEG standard, which is a rough read. Still, it has a list of kinds of segments in its Table B.1 (the 36th page, but numbered 32). Remember that your encryption is of the file, not of the image. To decrypt the data, you will need the encrypted file, not merely the image it produces.
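To make the segment walk concrete, here is a minimal sketch in Python; it only lists segments, the entropy-coded scan handling is simplified, and "photo.jpg" is a placeholder name, so treat it as a starting point rather than a complete JPEG parser:

import struct

NO_LENGTH = {0xD8, 0xD9} | set(range(0xD0, 0xD8))   # SOI, EOI, RST0..RST7

def list_segments(path):
    # Walk the file segment by segment, printing each marker kind
    # and its payload length.
    with open(path, "rb") as f:
        data = f.read()
    i = 0
    while i < len(data) - 1:
        assert data[i] == 0xFF, "expected a marker at offset %d" % i
        kind = data[i + 1]
        if kind in NO_LENGTH:
            print("FF %02X (standalone)" % kind)
            i += 2
            if kind == 0xD9:                         # EOI: end of image
                break
        else:
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            print("FF %02X, %d payload bytes" % (kind, length - 2))
            i += 2 + length
        if kind == 0xDA:
            # Entropy-coded scan data follows SOS with no length field;
            # skip to the next real marker (FF not followed by 00 or RSTn).
            while i < len(data) - 1 and not (
                    data[i] == 0xFF and data[i + 1] != 0x00
                    and not 0xD0 <= data[i + 1] <= 0xD7):
                i += 1

list_segments("photo.jpg")

To modify rather than just list, you would copy the non-target segments verbatim and rewrite the target ones with a fresh length, as described above.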

What method can create a 16-byte checksum?

I have a customer's broken application that I am trying to fix, but the original code has been lost, and I am trying to re-engineer it. It involves downloading 140 blocks of data, 768 bytes per block, to a device. In the data there is a "checksum" consisting of 16 bytes that presumably covers all 140 blocks, or possibly some small subset (no way to know).
If I change as little as 1 bit in the data, the entire 16 bytes of the "checksum" change. What I'm looking for are ideas about how this "checksum" might be calculated.
Example:
In block 24 at offset 0x116 I change two bytes from 0xe001 to 0xe101, and the "checksum" data changes from:
53 20 5a 20 3e f5 38 72 eb d7 f4 3c d9 0a 3f c5
to this:
7f fe ad 1f cc c3 1e 3c 22 0a bf 6a 6d 03 ad 97
I can experiment if I have some clue how they might be calculating this "checksum".
Looking for any idea to get me started.
Partial answer:
Presumably covers all 140 blocks or possibly some small subset (no way to know)
If I change as little as 1 bit in the data, the entire 16 bytes of "checksum" changes.
Iterate through each bit of the data, modifying each in turn, and see if the checksum changes. Then you can figure out which subset (if any) is being hashed.
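A sketch of that probing loop, assuming a hypothetical get_checksum(blocks) helper that pushes the candidate data to the device and reads back the 16 checksum bytes; in practice you would probe at a coarser granularity first (say, one flip per block) before drilling down:

def probe_coverage(blocks, get_checksum):
    # blocks: 140 mutable bytearrays of 768 bytes each.
    # get_checksum: hypothetical callable returning the 16-byte value.
    baseline = get_checksum(blocks)
    covered = []
    for bi, block in enumerate(blocks):
        for offset in range(len(block)):
            block[offset] ^= 0x01                  # flip the low bit
            if get_checksum(blocks) != baseline:
                covered.append((bi, offset))       # this byte is hashed
            block[offset] ^= 0x01                  # restore the original
    return covered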
Once you know that, then you will be able to come up with known input / output pairs. You can plug a known input into a tool that will generate outputs using well known hash algorithms (for example, this one: http://www.toolsvoid.com/multi-hash-generator - just the first hit I got from Google, not necessarily the most comprehensive).
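Since the value is 16 bytes, MD5 is the obvious first candidate; a quick local check against the common digests (truncating those longer than 16 bytes) might look like this:

import hashlib

def check_known_hashes(data, expected):
    # MD5 yields exactly 16 bytes; the others are checked truncated.
    for name in ("md5", "sha1", "sha256", "sha512"):
        if hashlib.new(name, data).digest()[:16] == expected:
            return name
    return None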
EDIT: As per comments on the question, this will not work if there is a salt. But at least it will quickly rule out the simplest cases.

Parsing a compressed blob using Apache Tika

I've been trying to detect and parse, using Apache Tika, a large byte array that I retrieved using Spring JDBC from rows of compressed blob records. These records originally came from IBM DB2 on a mainframe and were then migrated to Oracle 11g on Unix. The first few hex values are:
8c d0 9e 45 00 17 2a 96 86 70 00 12 2a 5c c8 10 80 0f 00 03 00 f8 fb 97 00 80 81 86
I've tried the various Compressor and Package parsers of Tika, but it apparently cannot detect what format it is. Ultimately, when decompressed, the byte arrays should be in the Word.Document.8 format. The hex signature of MS Word (the OLE2 container) is D0 CF 11 E0 A1 B1 1A E1, and Tika can detect this, of course.
Is anyone familiar with the hex signature above? I was wondering if I can just reverse engineer the hex signature to determine the compression algorithm used - how should one go about it?
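Before reverse engineering the signature by hand, it may be worth brute-forcing the general-purpose codecs that Python ships with; blob below stands in for one of your retrieved byte arrays, and a hit would print output starting with d0cf11e0:

import bz2, gzip, lzma, zlib

def try_decompress(blob):
    # Try the common general-purpose algorithms; zlib is attempted with
    # several window settings (zlib-wrapped, raw deflate, gzip-wrapped).
    attempts = [("bz2", bz2.decompress),
                ("lzma", lzma.decompress),
                ("gzip", gzip.decompress)]
    attempts += [("zlib wbits=%d" % w, lambda b, w=w: zlib.decompress(b, w))
                 for w in (15, -15, 31)]
    for name, fn in attempts:
        try:
            out = fn(blob)
            print(name, "worked; first bytes:", out[:8].hex())
            return out
        except Exception:
            pass
    print("no standard-library codec matched")
    return None

If none of these match, the data may use a mainframe-side or proprietary scheme, in which case the signature bytes themselves are the best lead.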

Fast way to convert strings to numbers on a large dataset

I have a data set with tens of millions of rows. Several columns in this data represent categorical features. Each level of these features is represented by an alphanumeric string like "b009d929".
C1 C2 C3 C4 C5 C6 C7
68fd1e64 80e26c9b fb936136 7b4723c4 25c83c98 7e0ccccf de7995b8 ...
68fd1e64 f0cf0024 6f67f7e5 41274cd7 25c83c98 fe6b92e5 922afcc0
I'd like to be able to use Python to map each distinct level to a number to save memory, so that feature C1's levels would be replaced by numbers from 1 to C1_n, C2's levels by numbers from 1 to C2_n, and so on.
Each feature has a different number of levels, ranging from under 10 to 10k+.
I tried dictionaries with Pandas' .replace(), but it gets extremely slow.
What is a fast way to approach this problem?
I figured out that the categorical feature values were hashed onto 32 bits. So I ended up reading the file in chunks and applying this simple function:
int(categorical_feature_value, 16)
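For completeness, a sketch of that chunked conversion; the file name, separator, and column names are assumptions:

import pandas as pd

HEX_COLS = ["C1", "C2", "C3", "C4", "C5", "C6", "C7"]   # assumed names

parts = []
for chunk in pd.read_csv("data.txt", sep=r"\s+", chunksize=1_000_000):
    for col in HEX_COLS:
        # Each level is a 32-bit hash printed in hex, so int64 is plenty.
        chunk[col] = chunk[col].map(lambda v: int(v, 16))
    parts.append(chunk)
df = pd.concat(parts, ignore_index=True)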

Streaming multiple MP3 files to Icecast

I have a few thousand mp3 files on a web server that I need to stream to an Icecast 2.3.3 server instance running on the same server.
Each file is assigned to one or more categories. There are 7 categories in total. I'd like to have one mount per category.
I need to be able to add files to and remove them from categories. When a file is added or removed, I need to somehow merge the file into the category or reshuffle the files in the category, after which I assume I'll need to restart the mount.
My question is: Is there a source application I could use that runs as a service on Windows OS that can automate this kind of thing?
Alternatively I could write a program to shuffle and merge these files together as one big "category" mp3 file, but would like to know if there's another way.
Any advice is very much appreciated.
Since you're just dealing with MP3 files, SHOUTcast sc_trans might be a good option for you.
http://wiki.winamp.com/wiki/SHOUTcast_DNAS_Transcoder_2
You can configure it to use a playlist (which you can generate programmatically), or have it read the directory and just run with the files as-is. Note that sc_trans doesn't support mount points, so you will have to configure Icecast to accept a SHOUTcast-style connection. This works, but will require you to run multiple instances of Icecast. If you'd like to stream everything on a single port later on, you can set up a master Icecast instance which relays all of the streams from the others.
There are plenty of other choices out there depending on your needs. Tools like SAM DJ allow full control over playlists and advertisements but can be overkill depending on what you need to do.
I typically find myself working with a diverse set of inputs, so I use VLC to playback and then some custom software to get this encoded and off to the streaming server. This isn't difficult to do, and you can even use VLC to do the encoding for you if you're crafty in configuring it.
I know it's old, and you most likely already found your solution. However, there may be more folks with this issue, so I'll throw in a few considerations in case you decide to write your own "shuffler" for MP3 files.
I would not use pure randomness for the task at hand: there is a real chance of titles being played multiple times in a row, and you don't want that.
Also, you most likely have your titles sorted in some way, say
Artist A - Title 1
Artist A - Title 2
...
Artist B - Title 1
...
You most likely aim for diversity when shuffling, so you don't want to play the same artist twice consecutively.
I would read all filenames into an array with indices 0 ... n-1.
Find the artist with the largest number of files; let that number be m.
Then find the next prime p that is co-prime to n but larger than m.
Then generate a pseudo-random number s in [0, n) just ONCE to find a starting song; this avoids playing the same starting sequence each time.
In a loop, play song s, then set
s := (s + p) mod n
This is guaranteed to play all songs, each exactly once, while avoiding multiple consecutive songs by the same artist.
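A minimal sketch of that stepping scheme (the function and variable names are mine; the prime search is plain trial division):

import random
from math import gcd

def is_prime(k):
    return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))

def step_size(n, m):
    # Smallest prime p with p > m and gcd(p, n) == 1.
    p = m + 1
    while not (is_prime(p) and gcd(p, n) == 1):
        p += 1
    return p

def play_order(songs, max_per_artist):
    n = len(songs)
    p = step_size(n, max_per_artist)
    s = random.randrange(n)          # random starting song, chosen once
    order = []
    for _ in range(n):
        order.append(songs[s])
        s = (s + p) % n              # gcd(p, n) == 1 => every song once
    return order

With 16 songs and m = 4 this yields p = 5, exactly as in the walk-through below.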
Here's a little example for just 16 songs, capital letters are Artist, small letters song titles.
Aa Ab Ac Ba Bb Bc Bd Ca Cb Da Db Dc Dd Ea Fa Fb
n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The artists B and D have the most songs, hence
m := 4
You search for a prime number co-prime to 16 = 2 * 2 * 2 * 2, but larger than 4, and you find:
p := 5
You invoke the PRNG function once and obtain, say, 11, so s = 11 is the first song to be played (play position 0). Then you loop:
Aa Ab Ac Ba Bb Bc Bd Ca Cb Da Db Dc Dd Ea Fa Fb
n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
s 1 14 11 8 5 2 15 12 9 6 3 0 13 10 7 4
Reading off the songs in order of their s values gives the played sequence:
Dc Aa Bc Db Fb Bb Da Fa Ba Cb Ea Ac Ca Dd Ab Bd
No artist repetition, no two neighbouring titles from the sorted list played back to back, and plenty of diversity.