Why aren't EXEs in binary? - c++

Why is it that if you open up an EXE in a hex editor, you see all sorts of symbols? If computers only understand binary, then shouldn't there only be 2 possible symbols in the file? Thanks

You're confusing content with representation. Every single file on your computer can be represented with binary (1s and 0s), and indeed that's how it's generally stored on disk (alignment of magnetic particles) or RAM (charge).
You're viewing your exe with a "hex editor", which represents the content using hexadecimal numbers. It does this because it's easier to understand and navigate hex than binary (compare "FA" to "11111010").
So the hexadecimal symbol "C0" represents the same value as the binary "11000000", "C1" == "11000001", "C2" == "11000010", and so on.

The hexadecimal values are just an interpretation of the binary values in memory. The software only makes it a bit more readable to human beings.
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = 10 (A)
1011 = 11 (B)
1100 = 12 (C)
1101 = 13 (D)
1110 = 14 (E)
1111 = 15 (F)
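To see the relationship, a tiny C++ snippet (illustrative only) that prints one byte value in decimal, hex and binary:

#include <bitset>
#include <iostream>

int main() {
    unsigned char b = 0xC0;  // one byte as you'd see it in the hex editor
    std::cout << "decimal: " << static_cast<int>(b) << '\n';
    std::cout << "hex:     " << std::hex << std::uppercase << static_cast<int>(b) << '\n';  // C0
    std::cout << "binary:  " << std::bitset<8>(b) << '\n';                                  // 11000000
}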

Computers don't only understand binary, that's a misconception. At the very lowest, lowest, lowest level, yes, data in digital computers is a series of 1s and 0s. But computer CPUs group those bits together into bytes, words, dwords, qwords, etc. The basic unit dealt with by a modern CPU is a dword or a qword, not a bit. That's why they're called 32-bit or 64-bit processors. If you want to get them to work with a single bit, you pretty much end up including 31 or 63 extraneous bits with it. (It gets a bit blurry when you start dealing with flag registers.)
Digital computers really came into their own as of 8-bit processors, so hexadecimal became a very useful display format as it succinctly represents a byte (8 bits) in two characters. You're using a hex editor, so it's showing you hex, and because of this early byte-orientation, it's showing you two characters for every 8 bits. It's mostly a display thing, though; there's little reason it couldn't show you one character for every 4 bits or four characters for every 16 bits, although file systems generally work on byte granularity for actual data (and much, much larger chunks for storage allocation granularity -- almost always 4k or more).

This character A you see here on the screen is just a pattern made of ones and zeros. It's only because we all cooperate on standards that those ones and zeros end up on the screen as understandable patterns.
The character A has the value 65 (in ASCII). In binary this is 0100 0001, but on the screen it might be the pattern
##
# #
####
# #
# #
In an exe file a lot of stuff is stored in various formats: floats, integers and strings. These formats are used because they can be read directly by the computer without further conversion. In a hex editor you will often be able to read strings that happen to be stored in the exe file.
In a computer everything's binary

There are only two possible states. What you're seeing is larger patterns of combinations of them, much in the same way that the only things sentences are made of are letters and punctuation.

Each character (byte) in the file represents 8 bits (8 ones or zeroes). You don't see bits, you see bytes (and larger types).

So I am going to give a layman's answer here. What others suggested above is correct: you can read binary through its hex representation. Most data is saved as a whole number of bytes anyway. It is possible that, e.g., a compression algorithm computes a compressed representation in some odd number of bits, but it would still be padded to a full byte when saved. And each byte can be represented as 8 bits or 2 hex digits.
But this may not be what you asked. Quite likely you found some ASCII data inside the supposedly binary data. Why? Well, sometimes code is not just for running. Sometimes compilers include bits of human-readable data that can help debugging if the code were to crash and you needed to read the stack trace: things like variable names, line numbers, etc. Not that I ever had to do that. I don't have bugs in my code. That's right.

Don't forget about the operating system and the disk file system: they may only accept files in their own formats. For example, executable files on Win32 must begin with a PE header. The operating system loads the executable into memory, transfers control to it, resolves the API calls inside it, and so on. The low-level instructions are executed by the CPU, and at that level the instructions are already just sequences of bytes.

Related

Bitstream parsing and Endianness

I am trying to parse a bitstream, and I am having trouble getting my head around endianness. I have a byte buffer, and I need to be able to read bitfields out which are of varying lengths, anywhere from 1 bit to 8 bits mostly.
My problem comes with the endianness of the bytes. When I step through with a debugger, the bottom 4 bits appear to be in the top portion of the byte. That is, I am expecting the first two bits to be 10 (they must be 10); however, the first byte in the bitstream is 0xA3, or 1010 0011, when checking with the debugger. So, assuming that the bits are in the "correct" order, the first two bits are in fact 11 (reading right to left).
It would seem, however, that if the bits were not in the right order, and should be 0x3A, or 0011 1010, I then have 10 as my expected first two bits.
This confuses me, because it doesn't seem to be a matter of bit order, MSb to LSb/LSb to MSb, but rather nibble order. How does this happen? That seems to just be the way it came out of the file. There is a possibility this is an invalid bitstream, but I have seen this kind of thing before when reading files in Hex Editors, nibbles seemingly in the "wrong" order.
I am just confused and would like some help understanding what's going on. I don't often deal with things at this level.
You don't need to worry about the bit order, because in C/C++ there is no way for you to iterate through the bits using pointer arithmetic. You can only manipulate the bits using bitwise operators, which are independent of the bit order of the local machine. What you mention in the OP is just a matter of visualization: different debuggers may choose different ways to visualize the bits in a byte. There is no right or wrong here, just preference. What really matters is the byte order.
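For the parsing itself, a minimal sketch of an MSB-first bit reader over a byte buffer (names and structure are my own, not from the OP's code):

#include <cstddef>
#include <cstdint>
#include <vector>

// Reads bitfields of 1-32 bits from a byte buffer, most significant bit first.
struct BitReader {
    const std::vector<uint8_t>& buf;
    std::size_t bitPos = 0;                    // absolute bit index into buf

    explicit BitReader(const std::vector<uint8_t>& b) : buf(b) {}

    uint32_t read(unsigned nbits) {
        uint32_t value = 0;
        for (unsigned i = 0; i < nbits; ++i, ++bitPos) {
            std::size_t byte = bitPos / 8;
            unsigned bit = 7u - (bitPos % 8);  // MSB-first within each byte
            value = (value << 1) | ((buf[byte] >> bit) & 1u);
        }
        return value;
    }
};

// With buf[0] == 0xA3 (1010 0011), BitReader(buf).read(2) yields 0b10.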

Saving binary data into a file in c++

My algorithm produces a stream of 9-bit and 17-bit values, and I need a way to store this data in a file. But I can't just store the 9-bit values as int and the 17-bit values as int32_t.
For example, if my algorithm produces 10 x 9-bit and 5 x 17-bit values, the output file size needs to be 22 bytes.
Also, one of the big problems to solve is that the output file can be very big and its size is unknown in advance.
The only idea I have now is to use bool *vector;
If you have to save a dynamic set of bits, then you should probably save two values: the first being either the number of bits (if the bits are consecutive from 0 to x) or a bitmask saying which bits are valid; the second being the 32-bit integer representing your bits.
Taking your example literally: if you want to store 175 bits and they consist of an unknown number of entities of two different lengths, then the file absolutely cannot be only 22 bytes. You need to know what is ahead of you in the file; you need the lengths. If you have only two possible sizes, the length marker can be a single bit: 0 means 9 bits, 1 means 17 bits.
|0|9bit|0|9bit|1|17bit|0|9bit|1|17bit|1|17bit|...
So for your example, you would need 10*(1+9)+5*(1+17) = 190 bits ~ 24 bytes. The outstanding 2 bits need to be padded with 0's so that you align at byte boundary. The fact that you will go on reading the file as if there was another entity (because you said you don't know how long the file is) shouldn't be a problem because last such padding will be always less than 9 bits. Upon reaching end of file, you can throw away the last incomplete reading.
This approach does require implementing bit-level manipulation on top of a byte-level stream, which means careful masking and logical operations. BASE64 is exactly that, only simpler than your case, consisting only of fixed 6-bit entities stored in a text file.
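A rough sketch of such a writer (one possible layout, not the only one): a 1-bit flag followed by the 9- or 17-bit payload, written MSB-first and zero-padded to a byte boundary at the end.

#include <cstdint>
#include <vector>

struct BitWriter {
    std::vector<uint8_t> bytes;   // finished output
    uint8_t cur = 0;              // partially filled byte
    int used = 0;                 // bits already in 'cur'

    void putBits(uint32_t value, int nbits) {        // MSB of 'value' first
        for (int i = nbits - 1; i >= 0; --i) {
            cur = static_cast<uint8_t>((cur << 1) | ((value >> i) & 1u));
            if (++used == 8) { bytes.push_back(cur); cur = 0; used = 0; }
        }
    }
    void writeEntity(uint32_t v, bool is17bit) {
        putBits(is17bit ? 1u : 0u, 1);               // length flag: 0 = 9 bits, 1 = 17 bits
        putBits(v, is17bit ? 17 : 9);                // payload
    }
    void finish() {                                  // zero-pad the last byte
        if (used > 0) { bytes.push_back(static_cast<uint8_t>(cur << (8 - used))); cur = 0; used = 0; }
    }
};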

Is there a name for this compression algorithm?

Say you have a four byte integer and you want to compress it to fewer bytes. You are able to compress it because smaller values are more probable than larger values (i.e., the probability of a value decreases with its magnitude). You apply the following scheme, to produce a 1, 2, 3 or 4 byte result:
Note that in the description below, the bits are one-based and go from most significant to least significant; i.e., the first bit refers to the most significant bit, the second bit to the next most significant bit, and so on.
If n < 128, you encode it as a single byte with the first bit set to zero.
If n >= 128 and n < 16,384, you use a two-byte integer. You set the first bit to one and the second bit to zero, then use the remaining 14 bits to encode the number n.
If n >= 16,384 and n < 2,097,152, you use a three-byte integer. You set the first bit to one, the second bit to one, and the third bit to zero, and use the remaining 21 bits to encode n.
If n >= 2,097,152 and n < 268,435,456, you use a four-byte integer. You set the first three bits to one and the fourth bit to zero, and use the remaining 28 bits to encode n.
If n >= 268,435,456 and n < 4,294,967,296, you use a five-byte integer. You set the first four bits to one and use the following 32 bits to store the exact value of n as a four-byte integer. The remaining bits of the first byte are unused.
Is there a name for this algorithm?
This is quite close to variable-length quantity encoding or base-128. The latter name stems from the fact that each 7-bit unit in your encoding can be considered a base-128 digit.
It sounds very similar to Dlugosz' Variable-Length Integer Encoding.
Huffman coding refers to using fewer bits to store more common data in exchange for using more bits to store less common data.
Your scheme is similar to UTF-8, which is an encoding scheme used for Unicode text data.
The chief difference is that every byte in a UTF-8 stream indicates whether it is a lead or trailing byte, so a sequence can be read starting from the middle. With your scheme, a missing lead byte makes the rest of the file completely unreadable if a series of such values is stored, and reading a sequence must start at the beginning rather than at an arbitrary location.
Varint
Using the high bit of each byte to indicate "continue" or "stop", and interpreting the remaining bits (7 bits of each byte in the sequence) as plain binary that encodes the actual value: this sounds like the "Base 128 Varint" used in Google Protocol Buffers.
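For comparison, a little-endian base-128 varint (the Protocol Buffers flavour) looks roughly like this; unlike the scheme in the question, every byte carries a continuation flag in its high bit:

#include <cstdint>
#include <vector>

// Encode: 7 payload bits per byte, least significant group first;
// the high bit of each byte means "more bytes follow".
std::vector<uint8_t> encodeVarint(uint32_t n) {
    std::vector<uint8_t> out;
    while (n >= 0x80) {
        out.push_back(static_cast<uint8_t>(n) | 0x80);
        n >>= 7;
    }
    out.push_back(static_cast<uint8_t>(n));
    return out;
}

// Decode from a byte pointer, advancing it past the varint.
uint32_t decodeVarint(const uint8_t*& p) {
    uint32_t result = 0;
    int shift = 0;
    while (*p & 0x80) {
        result |= static_cast<uint32_t>(*p++ & 0x7F) << shift;
        shift += 7;
    }
    return result | (static_cast<uint32_t>(*p++) << shift);
}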
related ways of compressing integers
In summary: this code represents an integer in 2 parts: a first part in a unary code that indicates how many bits will be needed to read in the rest of the value, and a second part (of the indicated width in bits) in more-or-less plain binary that encodes the actual value. This particular code "threads" the unary code through the binary code, but other, similar codes pack the complete unary code first and the binary code afterwards, such as Elias gamma coding.
I suspect this code is one of the family of "start/stop codes" as described in: Steven Pigeon, "Start/Stop Codes", Proc. Data Compression Conference 2001, IEEE Computer Society Press, 2001.

What encryption scheme meets requirement of decimal plaintext & ciphertext and preserves length?

I need an encryption scheme where the plaintext and ciphertext are composed entirely of decimal digits.
In addition, the plaintext and ciphertext must be the same length.
Also the underlying encryption algorithm should be an industry-standard.
I don't mind if it's symmetric (e.g. AES) or asymmetric (e.g. RSA), but it must be a recognized algorithm for which I can get a FIPS 140 approved library (otherwise it won't get past the security review stage).
Using AES OFB is fine for preserving the length of hex-based input (i.e. where each byte has 256 possible values: 0x00 --> 0xFF). However, this will not work for my means as plaintext and ciphertext must be entirely decimal.
NB: "Entirely decimal" may be interpreted two ways - both of which are acceptable for my requirements:
Input & output bytes are characters '0' --> '9' (i.e. byte values: 0x30 -> 0x39)
Input & output bytes have the 100 (decimal) values: 0x00 --> 0x99 (i.e. BCD)
Some more info:
The max plaintext & ciphertext length is likely to be 10 decimal digits.
(I.e. 10 bytes if using '0'-->'9' or 5 bytes if using BCD)
Consider the following sample to see why AES fails:
The input string is an 8-digit number.
Max 8-digit number is: 99999999
In hex this is: 0x5f5e0ff
This could be treated as 4 bytes: <0x05><0xf5><0xe0><0xff>
If I use AES OFB, I will get 4 byte output.
Highest possible 4-byte ciphertext output is <0xFF><0xFF><0xFF><0xFF>
Converting this back to an integer gives: 4294967295
I.e. a 10-digit number.
==> Two digits too long.
One last thing - there is no limit on the length any keys / IVs required.
Use AES/OFB, or any other stream cipher. It will generate a keystream of pseudorandom bits. Normally, you would XOR these bits with the plaintext. Instead:
For every decimal digit in the plaintext
Repeat
Take 4 bits from the keystream
Until the bits form a number less than 10
Add this number to the plaintext digit, modulo 10
To decrypt, do the same but subtract instead in the last step.
I believe this should be as secure as using the stream cipher normally. If a sequence of numbers 0-15 is indistinguishable from random, then the subsequence of only those numbers that are smaller than 10 should still be random. Using add/subtract instead of XOR should still produce random output if one of the inputs is random.
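A minimal sketch of that digit-wise scheme. The keystream source below is a stand-in: a real implementation would take the 4-bit groups from AES-OFB (or another stream cipher), and both sides would have to derive the identical stream from the key/IV.

#include <cstdint>
#include <random>
#include <string>

// Stand-in keystream -- NOT cryptographically secure, only here so the sketch runs.
// The decrypting side must regenerate the exact same stream from the start.
std::mt19937 g_keystream(42);
uint8_t next4KeystreamBits() { return static_cast<uint8_t>(g_keystream() & 0x0F); }

// Rejection sampling: keep drawing 4-bit groups until one is below 10.
uint8_t nextKeystreamDigit() {
    uint8_t v;
    do { v = next4KeystreamBits(); } while (v >= 10);
    return v;
}

std::string encryptDigits(const std::string& plain) {     // '0'..'9' in, '0'..'9' out
    std::string out = plain;
    for (char& c : out)
        c = static_cast<char>('0' + ((c - '0') + nextKeystreamDigit()) % 10);
    return out;
}

std::string decryptDigits(const std::string& cipher) {    // same keystream, subtract mod 10
    std::string out = cipher;
    for (char& c : out)
        c = static_cast<char>('0' + ((c - '0') + 10 - nextKeystreamDigit()) % 10);
    return out;
}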
One potential candidate is the FFX encryption mode, which has recently been submitted to NIST.
Stream ciphers require a nonce for security; the same key stream state must never be re-used for different messages. That nonce adds to the effective ciphertext length.
A block cipher used in a streaming mode has essentially the same issue: a unique initialization vector must be included with the cipher text.
Many stream ciphers are also vulnerable to ciphertext manipulation, where flipping a bit in the ciphertext undetectably flips the corresponding bit in the plaintext.
If the numbers are randomly chosen, and each number is encrypted only once, and the numbers are shorter than the block size, ECB offers good security. Under those conditions, I'd recommend AES in ECB mode as the solution that minimizes ciphertext length while providing strong privacy and integrity protection.
If there is some other information in the context of the ciphertext that could be used as an initialization vector (or nonce), then this could work. This could be something explicit, like a transaction identifier during a purchase, or something implicit like the sequence number of a message (which could be used as the counter in CTR mode). I guess that VeriShield is doing something like this.
I am not a cipher guru, but an obvious question comes to mind: would you be allowed to use One Time Pad encryption? Then you can just include a large block of truly random bits in your decoding system, and use the random data to transform your decimal digits in a reversible way.
If this would be acceptable, we just need to figure out how the decoder knows where in the block of randomness to look to get the key to decode any particular message. If you can send a plaintext timestamp with the ciphertext, then it's easy: convert the timestamp into a number, say the number of seconds since an epoch date, take that modulo the length of the randomness block, and you have an offset within the block.
With a large enough block of randomness, this should be uncrackable. You could have the random bits be themselves encrypted with strong encryption, such that the user must type in a long password to unlock the decoder; in this way, even if the decryption software was captured, it would still not be easy to break the system.
If you have any interest in this and would like me to expand further, let me know. I don't want to spend a lot of time on an answer that doesn't meet your needs at all.
EDIT: Okay, with the tiny shred of encouragement ("you might be on to something") I'm expanding my answer.
The idea is that you get a block of randomness. One easy way to do this is to just pull data out of the Linux /dev/random device. Now, I'm going to assume that we have some way to find an index into this block of randomness for each message.
Index into the block of randomness and pull out ten bytes of data. Each byte is a number from 0 to 255. Add each of these numbers to the respective digit from the plaintext, modulo by 10, and you have the digits of the ciphertext. You can easily reverse this as long as you have the block of random data and the index: you get the random bits and subtract them from the cipher digits, modulo 10.
You can think of this as arranging the digits from 0 to 9 in a ring. Adding is counting clockwise around the ring, and subtracting is counting counter-clockwise. You can add or subtract any number and it will work. (My original version of this answer suggested using only 3 bits per digit. Not enough, as pointed out below by @Baffe Boyois. Thank you for this correction.)
If the plain text digit is 6, and the random number is 117, then: 6 + 117 == 123, modulo 10 == 3. 3 - 117 == -114, modulo 10 == 6.
As I said, the problem of finding the index is easy if you can use external plaintext information such as a timestamp. Even if your opponent knows you are using the timestamp to help decode messages, it does no good without the block of randomness.
The problem of finding the index is also easy if the message is always delivered; you can have an agreed-upon system of generating a series of indices and say "This is the fourth message I have received, so I use the fourth index in the series." As a trivial example, if this is the fourth message received, you could agree to use an index value of 40 (fourth message times the 10 bytes used per one-time pad). But you could also use numbers from an approved pseudorandom number generator, initialized with an agreed constant value as a seed, and then you would get a somewhat unpredictable series of indexes within the block of randomness.
Depending on your needs, you could have a truly large chunk of random data (hundreds of megabytes or even more). If you use 10 bytes as a one-time pad, and you never use overlapping pads or reuse pads, then 1 megabyte of random data would yield over 100,000 one-time pads.
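One C++ detail worth noting if you implement the ring arithmetic above: the % operator keeps the sign of the dividend, so the subtraction side needs to be normalised. A tiny sketch:

// Add a pad value to a digit, wrapping around the 0-9 ring.
int encryptDigit(int digit, int pad) {
    return (digit + pad) % 10;
}

// (digit - pad) can be negative, and C++ '%' would then return a negative
// remainder, so renormalise into 0-9 before returning.
int decryptDigit(int digit, int pad) {
    return ((digit - pad) % 10 + 10) % 10;
}

// encryptDigit(6, 117) == 3 and decryptDigit(3, 117) == 6, matching the worked example above.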
You could use the octal format, which uses digits 0-7, and three digits make up a byte. This isn't the most space-efficient solution, but it's quick and easy.
Example:
Text: Hello world!
Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21
Octal: 110 145 154 154 157 040 167 157 162 154 144 041
(spaces added for clarity to separate bytes)
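A quick illustration of that transform in C++ (iostream's octal formatting does the work):

#include <iomanip>
#include <iostream>
#include <string>

int main() {
    std::string text = "Hello world!";
    for (unsigned char c : text)
        std::cout << std::oct << std::setw(3) << std::setfill('0')
                  << static_cast<int>(c) << ' ';
    std::cout << '\n';   // 110 145 154 154 157 040 167 157 162 154 144 041
}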
I don't believe your requirement can be met (at all easily, anyway), though it's possible to get pretty close.
AES (like most encryption algorithms) is written to work with octets (i.e. 8-bit bytes), and it's going to produce 8-bit bytes. Once it's done its thing, converting the result to use only decimal digits or BCD values won't be possible. Therefore, your only choice is to convert the input from decimal or BCD digits to something that fills an octet as completely as possible. You can then encrypt that, and finally re-encode the output to use only decimal or BCD digits.
When you convert the ASCII digits to fill the octets, it'll "compress" the input somewhat. The encryption will then produce the same size of output as the input. You'll then encode that to use only decimal digits, which will expand it back to roughly the original size.
The problem is that neither 10 nor 100 is a number that you're going to easily fit exactly into a byte. Numbers from 1 to 100 can be encoded in 7 bits. So, you'll basically treat those as a bit-stream, putting them in 7 bits at a time, but taking them out 8 bits at a time to get bytes to encrypt.
That uses the space somewhat better, but it's still not perfect. 7 bits can encode values from 0 to 127, not just 0 to 99, so even though you'll use all 8 bits, you won't use every possible combination of those 8 bits. Likewise, in the result, one byte will turn into three decimal digits (0 to 255), which clearly wastes some space. As a result, your output will be slightly larger than your input.
To get closer than that, you could compress your input with something like Huffman or an LZ* compression (or both) before encrypting it. Then you'd do roughly the same thing: encrypt the bytes, and encode the bytes using values from 0 to 9 or 0 to 99. This will give better usage of the bits in the bytes you encrypt, so you'd waste very little space in that transformation, but does nothing to improve the encoding on the output side.
For those doubting FFX mode AES, please feel free to contact me for further details. Our implementation is a mode of AES that effectively sits on top of existing ciphers. The specification, with proof/validation, is up on the NIST modes website. FFSEM mode AES is included under FFX mode.
http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffx/ffx-spec.pdf
If it's meaningful, you can also have a conversation with NIST directly about the status of the modes submission and AES modes acceptance to address your FIPS question. FFX has security proofs and independent cryptographic review, and is not a "new cipher". It is, however, based on methods that go back 20+ years, proven techniques. In implementation we have the ability to encrypt data whilst preserving length, structure, integrity, type and format; for example, you can specify an explicit format policy that the output will be NNN-NN-NNNN.
So, as a mode of AES, the implementation can, for example, simply use the native AES processor on a z10 in a mainframe environment. Similarly, on open systems with HSM devices, we can sit on top of an existing AES implementation.
Format Preserving Encryption (as it's often referred to) in this way is already being used in industry, is available in off-the-shelf products, and is rather quick to deploy: it is already used in POS devices, payment systems, enterprise deployments, etc.
Mark Bower
VP Product Management
Voltage Security
Drop a note to info#voltage.com for more info or take a look at our website for more info.
Something like a Feistel cipher should fit your requirements. Split your input number into two parts (say 8 digits each), pass one part through a not-necessarily-reversible-or-bijective function, and subtract the other part from the result of that function (modulo e.g. 100,000,000). Then rearrange the digits somehow and repeat a bunch of times. Ideally, one should slightly vary the function which is used each time. Decryption is similar to encryption except that one starts by undoing the last rearrangement step, then subtracts the second part of the message from the result of using the last function one used on the first part of the message (again, modulo 100,000,000), then undoes the previous rearrangement step, etc.
The biggest difficulties with a Feistel cipher are finding a function which achieves good encryption with a reasonable number of rounds, and figuring out how many rounds are required to achieve good encryption. If speed is not important, one could probably use something like AES to perform the scrambling function (since it doesn't have to be bijective, you could arbitrarily pad the data before each AES step and interpret the result as a big binary number modulo 100,000,000). As for the number of rounds, 10 is probably too few and 1000 is probably excessive; I don't know what value in between would be best.
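To make the structure concrete, here is a bare-bones sketch of such a Feistel pass over a 16-digit number split into two 8-digit halves. The round function and round count are placeholders of my own, not a vetted design; the sketch only shows how the subtract-modulo-10^8 rounds invert cleanly.

#include <cstdint>

const uint64_t MOD = 100000000ULL;   // 10^8: each half is 8 decimal digits
const int ROUNDS = 16;               // placeholder; see the caveat about round counts above

// Placeholder round function -- NOT secure. A real design would use a strong
// keyed function (e.g. built from AES) reduced modulo 10^8.
uint64_t roundF(uint64_t half, uint64_t roundKey) {
    return (half * 2654435761ULL + roundKey) % MOD;
}

uint64_t encrypt16Digits(uint64_t n, const uint64_t key[ROUNDS]) {
    uint64_t left = n / MOD, right = n % MOD;
    for (int r = 0; r < ROUNDS; ++r) {
        uint64_t next = (roundF(left, key[r]) + MOD - right) % MOD;  // F(left) - right, mod 10^8
        right = left;                                                // swap halves between rounds
        left = next;
    }
    return left * MOD + right;
}

uint64_t decrypt16Digits(uint64_t n, const uint64_t key[ROUNDS]) {
    uint64_t left = n / MOD, right = n % MOD;
    for (int r = ROUNDS - 1; r >= 0; --r) {
        uint64_t prev = left;
        left = right;                                                // undo the swap
        right = (roundF(left, key[r]) + MOD - prev) % MOD;           // recover the old right half
    }
    return left * MOD + right;
}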
Using only 10 digits as input/output is completely insecure. It is so insecure that it would very likely be cracked in a real application, so consider using at least 39 digits (the equivalent of 128 bits). If you are going to use only 10 digits, there is no point in using AES; in that case you may as well invent your own (insecure) algorithm.
The only way you might get out of this is to use a STREAM cipher. Use a 256-bit key "SecureKey" and an initialisation vector IV which should be different at the beginning of each session.
Translate this output into a 77-digit (decimal) number and use addition with wrap-around, modulo 10, on each digit.
For instance
AES(IV,KEY) = 4534670 //and lot more
secret_message = 01235
+ and mod 10
---------------------------------------------
ciphertext = 46571 // and you still have 70 for next message
When you run out of digits from the stream cipher output AES(IV, KEY), increase the IV (IV = IV + 1) and repeat.
Keep in mind that you should absolutely never use the same IV twice, so you should have some scheme on top of this to prevent that.
Another concern is generating the streams: if you generate a number that is higher than 10^77, you should discard it, increase the IV, and try again with the new IV. Otherwise there is a high probability that you will end up with biased numbers and a vulnerability.
It is also very likely that there is a flaw in this scheme, or that there will be one in your implementation.

Compression for a unique stream of data

I've got a large number of integer arrays. Each one has a few thousand integers in it, and each integer is generally the same as the one before it or is different by only a single bit or two. I'd like to shrink each array down as small as possible to reduce my disk IO.
Zlib shrinks it to about 25% of its original size. That's nice, but I don't think its algorithm is particularly well suited for the problem. Does anyone know a compression library or simple algorithm that might perform better for this type of information?
Update: after converting the data to an array of XOR deltas, zlib shrinks it to about 20% of the original size.
If most of the integers really are the same as the previous, and the inter-symbol difference can usually be expressed as a single bit flip, this sounds like a job for XOR.
Take an input stream like:
1101
1101
1110
1110
0110
and output:
1101
0000
0010
0000
1000
A bit of code (C++, assuming n elements):
compressed[0] = uncompressed[0];
for (size_t i = 1; i < n; ++i)
    compressed[i] = uncompressed[i - 1] ^ uncompressed[i];
We've now reduced most of the output to 0, even when a high bit is changed. The RLE compression in any other tool you use will have a field day with this. It'll work even better on 32-bit integers, and it can still encode a radically different integer popping up in the stream. You're saved the bother of dealing with bit-packing yourself, as everything remains an int-sized quantity.
When you want to decompress:
uncompressed[0] = compressed[0];
for (size_t i = 1; i < n; ++i)
    uncompressed[i] = uncompressed[i - 1] ^ compressed[i];
This also has the advantage of being a simple algorithm that is going to run really, really fast, since it is just XOR.
Have you considered Run-length encoding?
Or try this: instead of storing the numbers themselves, you store the differences between the numbers. 1 1 2 2 2 3 5 becomes 1 0 1 0 0 1 2. Now most of the numbers you have to encode are very small. To store a small integer, use an 8-bit integer instead of the 32-bit one you would otherwise write out. That's a factor of 4 right there. If you do need to be prepared for bigger gaps than that, designate the high bit of the 8-bit integer to say "this number requires the next 8 bits as well".
You can combine that with run-length encoding for even better compression ratios, depending on your data.
Neither of these options is particularly hard to implement, and both run very fast and with very little memory (as opposed to, say, bzip).
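One way to flesh out the delta idea (the zig-zag step is my addition, since the description above doesn't say how to handle negative differences):

#include <cstdint>
#include <vector>

// Zig-zag maps signed deltas to small unsigned values: 0,-1,1,-2,... -> 0,1,2,3,...
uint32_t zigZag(int32_t d) {
    return (static_cast<uint32_t>(d) << 1) ^ (d < 0 ? 0xFFFFFFFFu : 0u);
}

// Each delta goes out in one byte when it fits in 7 bits; otherwise the high
// bit is set and the remaining bits spill into the following byte(s).
void encodeDeltas(const std::vector<int32_t>& values, std::vector<uint8_t>& out) {
    int32_t prev = 0;
    for (int32_t v : values) {
        uint32_t d = zigZag(v - prev);
        prev = v;
        while (d >= 0x80) { out.push_back(static_cast<uint8_t>(d) | 0x80); d >>= 7; }
        out.push_back(static_cast<uint8_t>(d));
    }
}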
You want to preprocess your data -- reversibly transform it to some form that is better-suited to your back-end data compression method, first. The details will depend on both the back-end compression method, and (more critically) on the properties you expect from the data you're compressing.
In your case, zlib is a byte-wise compression method, but your data comes in (32-bit?) integers. You don't need to reimplement zlib yourself, but you do need to read up on how it works, so you can figure out how to present it with easily compressible data, or if it's appropriate for your purposes at all.
Zlib implements a form of Lempel-Ziv coding. JPG and many others use Huffman coding for their backend. Run-length encoding is popular for many ad hoc uses. Etc., etc. ...
Perhaps the answer is to pre-filter the arrays in a way analogous to the Filtering used to create small PNG images. Here are some ideas right off the top of my head. I've not tried these approaches, but if you feel like playing, they could be interesting.
Break each of your ints up into 4 bytes, so i0, i1, i2, ..., in becomes b0,0, b0,1, b0,2, b0,3, b1,0, b1,1, b1,2, b1,3, ..., bn,0, bn,1, bn,2, bn,3. Then write out all the bi,0s, followed by the bi,1s, bi,2s, and bi,3s. If most of the time your numbers differ only by a bit or two, you should get nice long runs of repeated bytes, which should compress really nicely using something like run-length encoding or zlib. This is my favourite of the methods I present.
If the integers in each array are closely-related to the one before, you could maybe store the original integer, followed by diffs against the previous entry - this should give a smaller set of values to draw from, which typically results in a more compressed form.
If you have various bits differing, you may still have largish numeric differences; but if large numeric differences usually correspond to only one or two differing bits, you may be better off with a scheme where you create a byte array: use the first 4 bytes to encode the first integer, and then for each subsequent entry use 0 or more bytes to indicate which bits should be flipped, storing 0, 1, 2, ..., or 31 in each byte, with a sentinel (say 32) to indicate when you're done. This could bring the raw number of bytes needed to represent an integer down to something close to 2 on average, with most bytes coming from a limited set (0-32). Run that stream through zlib, and maybe you'll be pleasantly surprised.
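A sketch of the first idea (byte planes), since it is the simplest to try: split each 32-bit integer into its four bytes, write all the plane-0 bytes, then plane 1, and so on, then hand the result to zlib or an RLE pass.

#include <cstdint>
#include <vector>

std::vector<uint8_t> splitIntoBytePlanes(const std::vector<uint32_t>& ints) {
    std::vector<uint8_t> planes;
    planes.reserve(ints.size() * 4);
    for (int plane = 0; plane < 4; ++plane)          // byte 0..3 of every integer
        for (uint32_t v : ints)
            planes.push_back(static_cast<uint8_t>(v >> (8 * plane)));
    return planes;                                   // mostly-constant high bytes form long runs
}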
Did you try bzip2 for this?
http://bzip.org/
It's always worked better than zlib for me.
Since your concern is to reduce disk IO, you'll want to compress each integer array independently, without making reference to other integer arrays.
A common technique for your scenario is to store the differences, since a small number of differences can be encoded with short codewords. It sounds like you need to come up with your own coding scheme for differences, since they are multi-bit differences; perhaps use an 8-bit byte, with something like this as a starting point:
1 bit to indicate that a complete new integer follows, or that this byte encodes a difference from the last integer,
1 bit to indicate that there are more bytes following, recording more single bit differences for the same integer.
6 bits to record the bit number to switch from your previous integer.
If there are more than 4 bits different, then store the integer.
This scheme might not be appropriate if you also have a lot of completely different codes, since they'll take 5 bytes each now instead of 4.
"Zlib shrinks it by a factor of about 4x." means that a file of 100K now takes up negative 300K; that's pretty impressive by any definition :-). I assume you mean it shrinks it by 75%, i.e., to 1/4 its original size.
One possibility for an optimized compression is as follows (it assumes a 32-bit integer and at most 3 bits changing from element to element).
Output the first integer (32 bits).
Output the number of bit changes (n=0-3, 2 bits).
Output n bit specifiers (0-31, 5 bits each).
Worst case for this compression is 3 bit changes in every integer (2+5+5+5 bits) which will tend towards 17/32 of original size (46.875% compression).
I say "tends towards" since the first integer is always 32 bits but, for any decent-sized array, that first integer would be negligible.
Best case is a file of identical integers (no bit changes for every integer, just the 2 zero bits) - this will tend towards 2/32 of original size (93.75% compression).
Where you average 2 bits different per consecutive integer (as you say is your common case), you'll get 2+5+5 bits per integer, which will tend towards 12/32 of original size, or 62.5% compression.
Your break-even point (if zlib gives 75% compression) is 8 bits per integer which would be
single-bit changes (2+5 = 7 bits) : 80% of the transitions.
double-bit changes (2+5+5 = 12 bits) : 20% of the transitions.
This means your average would have to be 1.2 bit changes per integer to make this worthwhile.
One thing I would suggest looking at is 7zip - this has a very liberal licence and you can link it with your code (I think the source is available as well).
I notice (for my stuff anyway) it performs much better than WinZip on a Windows platform so it may also outperform zlib.