Cannot write bitsets fast enough - c++

I need to be able to write down 12-bit bitsets on the order of speed of around 1 millisecond per 10,000 bitsets. Basically I'm provided with data in 12-bit packages (bitsets, in this case) and I need to be able to store them (I've chosen to write them to a file, open to suggestions if other methods exist) within an incredibly small timespan.
Right now I've set up an example bitset array of size 10,000 (to simulate what I would actually get) and write all of them to a file:
int main()
{
std::bitset<12> map[10000];
std::ofstream os("myfile.txt", std::ofstream::binary);
//From here
for (int i = 0; i < 10000; ++i)
{
os << map[i];
}
//to here takes slightly under 7 ms -- too slow
}
As the comments say, it takes 7 ms. I'm open to any and all speed improvements, and I hope to get (optimally) 1 ms for that loop.
Edit Info: This is for a Serial Peripheral Interface (SPI), and the data will be all available, as it is in the example, then dumped all at once, not as a stream of bitsets. For more technical specs, I'm using an Arduino Atmega328p, ADS7816, and an SD card reader

Two recommendations:
Minimize trips to the OS. Write multiple bytes in one go.
Pack the bits before writing. Your current solution writes each bit as a character ('0' or '1'), i.e. one byte for every bit. Writing the packed bytes instead makes the output 8 times more compact (and also faster).
#include <bitset>
#include <cstdint> // uint8_t
#include <fstream>
#include <vector>
int main()
{
std::bitset<12> map[10000];
// Initialize with demo values
for (int i = 0; i < 10000; ++i) {
map[i] = i + 1;
}
// Pack bits into a binary buffer
std::vector<uint8_t> buffer((10000 * 12 + 7) / 8);
for (int i = 0, j = 0, rem = 0; i < 10000; ++i) {
unsigned long b = map[i].to_ulong();
buffer[j++] |= static_cast<uint8_t>(b >> (4 + rem));
buffer[j] |= static_cast<uint8_t>(b << (4 - rem));
rem += 12 % 8;
if (rem >= 8) {
rem -= 8;
j++;
}
}
// Write the buffer in 1 go
std::ofstream os("myfile.bin", std::ofstream::binary);
os.write(reinterpret_cast<const char*>(buffer.data()), buffer.size());
os.close(); // don't forget to close() to flush the file
}
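The packing loop above can also be sketched without std::bitset. This is only an illustration under assumptions: `pack12` and `unpack12` are hypothetical helper names, and the samples are assumed to arrive as `uint16_t` values whose low 12 bits are meaningful. The byte layout is the same as in the loop above: two samples share three bytes, most significant bits first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack 12-bit samples MSB-first: every pair of samples fills exactly 3 bytes.
std::vector<std::uint8_t> pack12(const std::vector<std::uint16_t>& samples)
{
    std::vector<std::uint8_t> out;
    out.reserve((samples.size() * 12 + 7) / 8);
    std::size_t i = 0;
    for (; i + 1 < samples.size(); i += 2) {
        const std::uint16_t a = samples[i] & 0x0FFF;
        const std::uint16_t b = samples[i + 1] & 0x0FFF;
        out.push_back(static_cast<std::uint8_t>(a >> 4));                      // top 8 bits of a
        out.push_back(static_cast<std::uint8_t>(((a & 0x0F) << 4) | (b >> 8))); // low 4 of a, top 4 of b
        out.push_back(static_cast<std::uint8_t>(b & 0xFF));                    // low 8 bits of b
    }
    if (i < samples.size()) { // odd count: the last nibble is zero padding
        const std::uint16_t a = samples[i] & 0x0FFF;
        out.push_back(static_cast<std::uint8_t>(a >> 4));
        out.push_back(static_cast<std::uint8_t>((a & 0x0F) << 4));
    }
    return out;
}

// Inverse of pack12; `count` is the number of samples that were packed.
std::vector<std::uint16_t> unpack12(const std::vector<std::uint8_t>& bytes, std::size_t count)
{
    std::vector<std::uint16_t> out;
    out.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        const std::size_t j = (k / 2) * 3; // first byte of the 3-byte group
        if (k % 2 == 0)
            out.push_back(static_cast<std::uint16_t>((bytes[j] << 4) | (bytes[j + 1] >> 4)));
        else
            out.push_back(static_cast<std::uint16_t>(((bytes[j + 1] & 0x0F) << 8) | bytes[j + 2]));
    }
    return out;
}
```

The packed vector can then be written with a single os.write() call, exactly as in the answer's code. The sample count has to be stored or known separately so that a trailing padding nibble is not mistaken for data.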
If you prefer to keep your text file format, at least enable buffering:
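#include <bitset>
#include <fstream>
#include <vector>
int main()
{
std::bitset<12> map[10000];
// Enable buffering. Set the buffer before opening the file: some
// implementations (libstdc++, for one) ignore pubsetbuf() on an open stream.
std::vector<char> buf(256 * 1024);
std::ofstream os;
os.rdbuf()->pubsetbuf(buf.data(), buf.size());
os.open("myfile.txt", std::ofstream::binary);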
for (int i = 0; i < 10000; ++i)
{
os << map[i];
}
os.close(); // to flush the buffer
}

Related

How do I count how often a string of letters occurs in a .txt file? (in C++)

I searched for ways to count how often a string of letters appears in a .txt file and found (among others) this thread: Count the number of times each word occurs in a file
which deals with the problem by counting words (which are separated by spaces).
However, I need to do something slightly different:
I have a .txt file containing billions of letters without any formatting (no spaces, no punctuation, no line breaks, no hard returns, etc.), just a loooooong line of the letters a, g, t and c (i.e. a DNA sequence ;)).
Now I want to write a program that goes through the entire sequence and count how often each possible continuous sequence of 9 letters appears in that file.
Yes, there are 4^9 possible combinations of 9-letter 'words' made up of the characters A, G, T and C, but I only want to output the top 1000.
Since there are no spaces or anything, I would have to go through the file letter-by-letter and examine all the 9-letter 'words' that appear, i.e.:
ATAGAGCTAGATCCCTAGCTAGTGACTA
contains the sequences:
ATAGAGCTA, TAGAGCTAG, AGAGCTAGA, etc.
I hope you know what I mean, it's hard for me to describe the same in English, since it is not my native language.
Best regards and thank you all in advance!
Compared to billions, 2^18, or 256k seems suddenly small. The good news is that it means your histogram can be stored in about 1 MB of data. A simple approach would be to convert each letter to a 2-bit representation, assuming your file only contains AGCT, and none of the RYMK... shorthands and wildcards.
This is what this 'esquisse' does. It packs the 9 bytes of text into an 18-bit value and increments the corresponding histogram bin. To speed up conversion a bit, it reads 4 bytes and uses a lookup table to convert 4 glyphs at a time.
I don't know how fast this will run, but it should be reasonable. I haven't tested it, but I know it compiles, at least under gcc. There is no printout, but there is a helper function to unpack sequence packed binary format back to text.
It should give you at least a good starting point
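#include <vector>
#include <array>
#include <algorithm>
#include <cstdint>   // uint8_t, uint32_t
#include <iostream>
#include <fstream>
#include <exception>
#include <stdexcept> // std::runtime_error
#include <string>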
namespace dna {
// helpers to convert nucleotides to packed binary form
enum Nucleotide : uint8_t { A, G, C, T };
uint8_t simple_lut[4][256] = {};
void init_simple_lut()
{
for (size_t i = 0 ; i < 4; ++i)
{
simple_lut[i]['A'] = A << (i * 2);
simple_lut[i]['C'] = C << (i * 2);
simple_lut[i]['G'] = G << (i * 2);
simple_lut[i]['T'] = T << (i * 2);
}
}
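uint32_t pack4(const char(&seq)[4])
{
// cast to unsigned char so that bytes >= 0x80 cannot produce a negative index
return simple_lut[0][static_cast<unsigned char>(seq[0])]
+ simple_lut[1][static_cast<unsigned char>(seq[1])]
+ simple_lut[2][static_cast<unsigned char>(seq[2])]
+ simple_lut[3][static_cast<unsigned char>(seq[3])];
}
// you can use this to convert the histogram
// index back to text.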
std::string hist_index2string(uint32_t n)
{
std::string result;
result.reserve(9);
for (size_t i = 0; i < 9; ++i, n >>= 2)
{
switch (n & 0x03)
{
case A: result.insert(result.begin(), 'A'); break;
case C: result.insert(result.begin(), 'C'); break;
case G: result.insert(result.begin(), 'G'); break;
case T: result.insert(result.begin(), 'T'); break;
default:
throw std::runtime_error{ "totally unexpected error while unpacking index !!" };
}
}
return result;
}
}
int main(int argc, char** argv)
{
if (argc < 2)
{
std::cerr << "Usage: prog_name <input_file> <output_file>\n";
return 3;
}
using dna::pack4;
dna::init_simple_lut();
std::vector<uint32_t> result;
try
{
result.resize(1 << 18);
std::ifstream ifs(argv[1]);
// read 4 bytes at a time, convert to packed bits representation
// then rotate in bits 2 by 2 in our 18 bits buffer.
// increment corresponding bin by 1
const uint32_t MASK{ (1 << 18) - 1 }; // keep 18 bits (9 letters) so accu always indexes inside result
const std::streamsize RB{ 4 };
uint32_t accu{};
uint32_t packed{};
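// 'prime' the engine with the first 8 letters, shifting them in
// in the same order as the main loop below; the 9th letter then
// completes the first word.
char buffer[4];
for (int k = 0; k < 2; ++k)
{
ifs.read(buffer, RB);
if (ifs.gcount() != RB)
{
throw std::runtime_error{ " input file is too short "};
}
packed = pack4(buffer);
for (size_t i = 0; i < 4; ++i, packed >>= 2)
{
accu = (accu << 2) | (packed & 0x03);
}
}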
ifs.read(buffer, RB);
while (ifs.gcount() != 0)
{
packed = pack4(buffer);
for (size_t i = 0; i < (size_t)ifs.gcount(); ++i)
{
accu <<= 2;
accu |= packed & 0x03;
packed >>= 2;
accu &= MASK;
++result[accu];
}
ifs.read(buffer, RB);
}
ifs.close();
// histogram is compiled. store data to another file...
// you can create a secondary table of { index and count }
// it's only 5MB long and use partial_sort to extract the first
// 1000.
}
catch(std::exception& e)
{
std::cerr << "Error \'" << e.what() << "\' while reading file.\n";
return 3;
}
return 0;
}
This algorithm can be adapted to run on multiple threads, by opening the file in multiple streams with the proper share configuration, and running the loop on bits of the file. Care must be taken for the 16 bytes seams at the end of the process.
If running in parallel, the inner loop is so short that it may be a good idea to provide each thread with its own histogram and merge the results at the end, otherwise, the locking overhead would slow things quite a bit.
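The per-thread histogram idea, including the seam handling, can be sketched as follows. This is a single-threaded illustration under assumptions, not the answer's code: the sequence is held in a std::string rather than read from a file, and `code_of`, `count_range` and `count_in_chunks` are hypothetical names. Each call to `count_range` is the unit of work that could be handed to its own thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr unsigned K = 9;                            // letters per word
constexpr std::uint32_t MASK = (1u << (2 * K)) - 1;  // 18 bits

// Same 2-bit codes as the enum above: A=0, G=1, C=2, T=3.
inline std::uint32_t code_of(char c)
{
    switch (c) {
        case 'A': return 0;
        case 'G': return 1;
        case 'C': return 2;
        default:  return 3; // 'T'; other input is not validated in this sketch
    }
}

// Count every K-letter word that STARTS in [begin, end). The loop reads up to
// K-1 letters past `end`, so a word straddling a seam is counted exactly once.
void count_range(const std::string& seq, std::size_t begin, std::size_t end,
                 std::vector<std::uint32_t>& hist)
{
    const std::size_t last = std::min(end + K - 1, seq.size());
    std::uint32_t accu = 0;
    for (std::size_t i = begin; i < last; ++i) {
        accu = ((accu << 2) | code_of(seq[i])) & MASK;
        if (i + 1 >= begin + K) ++hist[accu]; // a full word has been shifted in
    }
}

// Split seq into `chunks` ranges (chunks >= 1), count each range into a
// private histogram, then merge. No locking is needed on the hot path.
std::vector<std::uint32_t> count_in_chunks(const std::string& seq, std::size_t chunks)
{
    std::vector<std::uint32_t> total(std::size_t{1} << (2 * K), 0);
    const std::size_t step = (seq.size() + chunks - 1) / chunks;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = std::min(c * step, seq.size());
        const std::size_t end = std::min(begin + step, seq.size());
        std::vector<std::uint32_t> local(total.size(), 0); // one per worker
        count_range(seq, begin, end, local);
        for (std::size_t i = 0; i < total.size(); ++i) total[i] += local[i];
    }
    return total;
}
```

Because each range owns the words that start inside it, the merged result does not depend on how many ranges the input is split into.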
[EDIT] Silly me I had the packed binary lookup wrong.
[EDIT2] replaced the packed lut with a faster version.
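The comment at the end of main suggests building a secondary table of { index, count } and using partial_sort to extract the top 1000. A minimal sketch of that step could look like this; `top_k` is a hypothetical helper name, and breaking ties by the lower index is an arbitrary choice of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns up to k (index, count) pairs, highest count first.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
top_k(const std::vector<std::uint32_t>& hist, std::size_t k)
{
    std::vector<std::pair<std::uint32_t, std::uint32_t>> table;
    table.reserve(hist.size());
    for (std::size_t i = 0; i < hist.size(); ++i)
        table.emplace_back(static_cast<std::uint32_t>(i), hist[i]);

    k = std::min(k, table.size());
    // Only the first k entries end up sorted; the rest stay in unspecified order.
    std::partial_sort(table.begin(), table.begin() + static_cast<std::ptrdiff_t>(k), table.end(),
        [](const auto& a, const auto& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
    table.resize(k);
    return table;
}
```

Calling top_k(result, 1000) on the histogram and passing each returned index to hist_index2string would then give the 1000 most frequent words as text.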
Here is a simple program that reads a file line by line and counts its characters, ignoring spaces:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
string line;
int sum=0;
ifstream inData ;
inData.open("countletters.txt");
while(getline(inData,line)) // loop on the read itself instead of testing eof() beforehand
{
int numofChars= line.length();
for (unsigned int n = 0; n<line.length();n++)
{
if (line.at(n) == ' ')
{
numofChars--;
}
}
sum=numofChars+sum;
}
cout << "Number of characters: "<< sum << endl;
return 0 ;
}

Reading Binary file into std::vector<bool>

Hello, I am trying to write 8 bits from a std::vector<bool> to a binary file and read them back. Writing works fine (I have checked with a binary editor and all values are correct), but when I try to read them back I get bad data.
Data that I am writing:
11000111 //bits
Data that I get from reading:
11111111 //bits
Read function:
std::vector<bool> Read()
{
std::vector<bool> map;
std::ifstream fin("test.bin", std::ios::binary);
int size = 8 / 8.0f;
char * buffer = new char[size];
fin.read(buffer, size);
fin.close();
for (int i = 0; i < size; i++)
{
for (int id = 0; id < 8; id++)
{
map.emplace_back(buffer[i] << id);
}
}
delete[] buffer;
return map;
}
Write function (just so you know more about what's going on):
void Write(std::vector<bool>& map)
{
std::ofstream fout("test.bin", std::ios::binary);
char byte = 0;
int byte_index = 0;
for (size_t i = 0; i < map.size(); i++)
{
if (map[i])
{
byte |= (1 << byte_index);
}
byte_index++;
if (byte_index > 7)
{
byte_index = 0;
fout.write(&byte, sizeof(byte));
}
}
fout.close();
}
Your code spreads out one byte (the value of buffer[i], where i is always 0) over 8 bools. The expression buffer[i] << id is computed as an int, so it is non-zero for every id as long as the byte itself is non-zero. Since any non-zero integer converts to true, you end up with 8 trues.
Instead of spreading one value out, you probably want to split one value into its constituent bits:
for (int id = 0; id < 8; id++)
{
map.emplace_back((static_cast<unsigned char>(buffer[i]) & (1U << id)) >> id);
}
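For reference, here is a sketch of a matching pack/unpack pair that works in memory, using the same LSB-first bit order as the Write function in the question. The names `pack_bits` and `unpack_bits` are hypothetical. Unlike that Write function, this sketch starts every byte from zero and keeps a trailing partial byte, which matters as soon as more than 8 bits are stored.

```cpp
#include <cstddef>
#include <vector>

// Pack bits LSB-first: bit i of the input lands in bit (i % 8) of byte (i / 8).
std::vector<unsigned char> pack_bits(const std::vector<bool>& bits)
{
    std::vector<unsigned char> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    return bytes;
}

// Inverse of pack_bits; `count` is the number of bits originally stored.
std::vector<bool> unpack_bits(const std::vector<unsigned char>& bytes, std::size_t count)
{
    std::vector<bool> bits;
    bits.reserve(count);
    for (std::size_t i = 0; i < count && i / 8 < bytes.size(); ++i)
        bits.push_back(((bytes[i / 8] >> (i % 8)) & 1u) != 0); // isolate one bit
    return bits;
}
```

The bytes returned by pack_bits can be written with a single write() call, and the buffer filled by read() can be handed to unpack_bits.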

Reading then adding large number of integers from binary file fast in C/C++

I was writing code to read unsigned integers from a binary file using C/C++ on a 32 bit Linux OS intended to run on an 8-core x86 system. The application takes an input file which contains unsigned integers in little-endian format one after another. So the input file size in bytes is a multiple of 4. The file could have a billion integers in it. What is the fastest way to read and add all the integers and return the sum with 64 bit precision?
Below is my implementation. Error checking for corrupt data is not the major concern here, and the input file is assumed to be free of issues in this case.
#include <iostream>
#include <fstream>
#include <pthread.h>
#include <string>
#include <string.h>
using namespace std;
string filepath;
unsigned int READBLOCKSIZE = 1024*1024;
unsigned long long nFileLength = 0;
unsigned long long accumulator = 0; // assuming 32 bit OS running on X86-64
unsigned int seekIndex[8] = {};
unsigned int threadBlockSize = 0;
unsigned long long acc[8] = {};
pthread_t thread[8];
void* threadFunc(void* pThreadNum);
//time_t seconds1;
//time_t seconds2;
int main(int argc, char *argv[])
{
if (argc < 2)
{
cout << "Please enter a file path\n";
return -1;
}
//seconds1 = time (NULL);
//cout << "Start Time in seconds since January 1, 1970 -> " << seconds1 << "\n";
string path(argv[1]);
filepath = path;
ifstream ifsReadFile(filepath.c_str(), ifstream::binary); // Create FileStream for the file to be read
if(0 == ifsReadFile.is_open())
{
cout << "Could not find/open input file\n";
return -1;
}
ifsReadFile.seekg (0, ios::end);
nFileLength = ifsReadFile.tellg(); // get file size
ifsReadFile.seekg (0, ios::beg);
if(nFileLength < 16*READBLOCKSIZE)
{
//cout << "Using One Thread\n"; //**
char* readBuf = new char[READBLOCKSIZE];
if(0 == readBuf) return -1;
unsigned int startOffset = 0;
if(nFileLength > READBLOCKSIZE)
{
while(startOffset + READBLOCKSIZE < nFileLength)
{
//ifsReadFile.flush();
ifsReadFile.read(readBuf, READBLOCKSIZE); // At this point ifsReadFile is open
unsigned int* num = reinterpret_cast<unsigned int*>(readBuf); // the file holds unsigned values
for(unsigned int i = 0 ; i < (READBLOCKSIZE/4) ; i++)
{
accumulator += *(num + i);
}
startOffset += READBLOCKSIZE;
}
}
if(nFileLength - (startOffset) > 0)
{
ifsReadFile.read(readBuf, nFileLength - (startOffset));
unsigned int* num = reinterpret_cast<unsigned int*>(readBuf); // the file holds unsigned values
for(unsigned int i = 0 ; i < ((nFileLength - startOffset)/4) ; ++i)
{
accumulator += *(num + i);
}
}
delete[] readBuf; readBuf = 0;
}
else
{
//cout << "Using 8 Threads\n"; //**
unsigned int currthreadnum[8] = {0,1,2,3,4,5,6,7};
if(nFileLength > 200000000) READBLOCKSIZE *= 16; // read larger blocks
//cout << "Read Block Size -> " << READBLOCKSIZE << "\n";
if(nFileLength % 28)
{
threadBlockSize = (nFileLength / 28);
threadBlockSize *= 4;
}
else
{
threadBlockSize = (nFileLength / 7);
}
for(int i = 0; i < 8 ; ++i)
{
seekIndex[i] = i*threadBlockSize;
//cout << seekIndex[i] << "\n";
}
pthread_create(&thread[0], NULL, threadFunc, (void*)(currthreadnum + 0));
pthread_create(&thread[1], NULL, threadFunc, (void*)(currthreadnum + 1));
pthread_create(&thread[2], NULL, threadFunc, (void*)(currthreadnum + 2));
pthread_create(&thread[3], NULL, threadFunc, (void*)(currthreadnum + 3));
pthread_create(&thread[4], NULL, threadFunc, (void*)(currthreadnum + 4));
pthread_create(&thread[5], NULL, threadFunc, (void*)(currthreadnum + 5));
pthread_create(&thread[6], NULL, threadFunc, (void*)(currthreadnum + 6));
pthread_create(&thread[7], NULL, threadFunc, (void*)(currthreadnum + 7));
pthread_join(thread[0], NULL);
pthread_join(thread[1], NULL);
pthread_join(thread[2], NULL);
pthread_join(thread[3], NULL);
pthread_join(thread[4], NULL);
pthread_join(thread[5], NULL);
pthread_join(thread[6], NULL);
pthread_join(thread[7], NULL);
for(int i = 0; i < 8; ++i)
{
accumulator += acc[i];
}
}
//seconds2 = time (NULL);
//cout << "End Time in seconds since January 1, 1970 -> " << seconds2 << "\n";
//cout << "Total time to add " << nFileLength/4 << " integers -> " << seconds2 - seconds1 << " seconds\n";
cout << accumulator << "\n";
return 0;
}
void* threadFunc(void* pThreadNum)
{
unsigned int threadNum = *reinterpret_cast<int*>(pThreadNum);
char* localReadBuf = new char[READBLOCKSIZE];
unsigned int startOffset = seekIndex[threadNum];
ifstream ifs(filepath.c_str(), ifstream::binary); // Create FileStream for the file to be read
if(0 == ifs.is_open())
{
cout << "Could not find/open input file\n";
return 0;
}
ifs.seekg (startOffset, ios::beg); // Seek to the correct offset for this thread
acc[threadNum] = 0;
unsigned int endOffset = startOffset + threadBlockSize;
if(endOffset > nFileLength) endOffset = nFileLength; // for last thread
//cout << threadNum << "-" << startOffset << "-" << endOffset << "\n";
if((endOffset - startOffset) > READBLOCKSIZE)
{
while(startOffset + READBLOCKSIZE < endOffset)
{
ifs.read(localReadBuf, READBLOCKSIZE); // At this point ifs is open
unsigned int* num = reinterpret_cast<unsigned int*>(localReadBuf); // the file holds unsigned values
for(unsigned int i = 0 ; i < (READBLOCKSIZE/4) ; i++)
{
acc[threadNum] += *(num + i);
}
startOffset += READBLOCKSIZE;
}
}
if(endOffset - startOffset > 0)
{
ifs.read(localReadBuf, endOffset - startOffset);
unsigned int* num = reinterpret_cast<unsigned int*>(localReadBuf); // the file holds unsigned values
for(unsigned int i = 0 ; i < ((endOffset - startOffset)/4) ; ++i)
{
acc[threadNum] += *(num + i);
}
}
//cout << "Thread " << threadNum + 1 << " subsum = " << acc[threadNum] << "\n"; //**
delete[] localReadBuf; localReadBuf = 0;
return 0;
}
I wrote a small C# program to generate the input binary file for testing.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace BinaryNumWriter
{
class Program
{
static UInt64 total = 0;
static void Main(string[] args)
{
BinaryWriter bw = new BinaryWriter(File.Open("test.txt", FileMode.Create));
Random rn = new Random();
for (UInt32 i = 1; i <= 500000000; ++i)
{
UInt32 num = (UInt32)rn.Next(0, 0xffff);
bw.Write(num);
total += num;
}
bw.Flush();
bw.Close();
}
}
}
Running the program on a Core i5 machine @ 3.33 GHz (it's quad-core, but it's what I have at the moment) with 2 GB RAM and Ubuntu 9.10 32-bit gave the following performance numbers:
100 integers ~ 0 seconds (I would really have to suck otherwise)
100000 integers < 0 seconds
100000000 integers ~ 7 seconds
500000000 integers ~ 29 seconds (1.86 GB input file)
I am not sure if the HDD is 5400RPM or 7200RPM. I tried different buffer sizes for reading and found reading 16 MB at a time for big input files was kinda the sweet spot.
Are there any better ways to read faster from the file to increase overall performance? Is there a smarter way to add large arrays of integers faster and folding repeatedly? Are there any major roadblocks to performance in the way I have written the code? Am I doing something obviously wrong that's costing a lot of time?
What can I do to make this process of reading and adding data faster?
Thanks.
Chinmay
Accessing a mechanical HDD from multiple threads the way you do causes extra head movement (read: it slows things down). You're almost surely I/O bound (about 65 MB/s for the 1.86 GB file).
Try to change your strategy by:
starting the 8 threads - we'll call them CONSUMERS
the 8 threads will wait for data to be made available
in the main thread start to read chunks (say 256KB) of the file thus being a PROVIDER for the CONSUMERS
main thread hits the EOF and signals the workers that there is no more avail data
main thread waits for the 8 workers to join.
You'll need quite a bit of synchronization to get it working flawlessly, and I think it would totally max out your HDD / filesystem I/O capabilities by doing sequential file access. YMMV on smallish files, which can be cached and served from the cache at lightning speed.
Another thing you can try is to start only 7 threads, leave one free CPU for the main thread & the rest of the system.
.. or get an SSD :)
Edit:
For simplicity see how fast you can simply read the file (discarding the buffers) with no processing, single-threaded. That plus epsilon is your theoretical limit to how fast you can get this done.
If you want to read (or write) a lot of data fast, and you don't want to do much processing with that data, you need to avoid extra copies of the data between buffers. That means you want to avoid fstream or FILE abstractions (as they introduce an extra buffer that needs to be copied through), and avoid read/write type calls that copy stuff between kernel and user buffers.
Instead, on linux, you want to use mmap(2). On a 64-bit OS, just mmap the entire file into memory, use madvise(MADV_SEQUENTIAL) to tell the kernel you're going to be accessing it mostly sequentially, and have at it. For a 32-bit OS, you'll need to mmap in chunks, unmapping the previous chunk each time. Something much like your current structure, with each thread mmapping one fixed-size chunk at a time should work well.

How does one store a vector<bool> or a bitset into a file, but bit-wise?

How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it ? I really need it to save a lot of true/false values.
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That would save a lot of space.
In the beginning of file, you can write the number of boolean values you want to write to the file; that number will help while reading the bytes from file, and converting them back into boolean values!
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now you have the bytes in their native format (Block) you still have the issue of writing it to a stream and reading it back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (eg ntohl but you will ideally use a macro that is no-op for your most common platform so if that is little-endian you probably want to store that way and convert only for big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say).
To write numbers binary to a stream use os.write( &data[0], sizeof(Block) * nBlocks ) and to read use is.read( &data[0], sizeof(Block) * nBlocks ) where data is assumed to be vector<Block> and before read you must do data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator but resize() is probably better).
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
template<std::size_t I> // std::bitset's size parameter is a std::size_t
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
// export a bitset consisting of I bits to an output stream.
// Eight bits are stored to a single stream byte.
unsigned int i = 0; // the current bit index
unsigned char c = 0; // the current byte
short bits = 0; // to process next byte
while(i < in.size())
{
c = c << 1; //
if(in.test(i)) ++c; // adding 1 if bit is true (std::bitset has test(), not at())
++bits;
if(bits == 8)
{
out.put((char)c);
c = 0;
bits = 0;
}
++i;
}
// dump remaining
if(bits != 0) {
// pad the byte so that first bits are in the most significant positions.
while(bits != 8)
{
c = c << 1;
++bits;
}
out.put((char)c);
}
return;
}
template<std::size_t I> // std::bitset's size parameter is a std::size_t
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
// read bytes from the input stream to a bitset of size I.
/* for debug */ //for(int n = 0; n < I; ++n) out.at(n) = false;
unsigned int i = 0; // current bit index
unsigned char mask = 0x80; // current byte mask
unsigned char c = 0; // current byte in stream
while(in.good() && (i < I))
{
if((i%8) == 0) // retrieve next character
{ c = in.get();
mask = 0x80;
}
else mask = mask >> 1; // shift mask
out.set(i, (c & mask) != 0); // std::bitset has set(), not at()
++i;
}
}
Note that using a reinterpret_cast of the portion of memory used by the bitset as an array of chars could probably also work, but it may not be portable across systems because you don't know what the representation of the bitset is (endianness?)
How about this
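#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>
// NOTE: std::_S_word_bit, std::_Bit_type and the iterator member _M_p used below
// are libstdc++ implementation details, so this only builds with GCC's standard library.
...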
{
std::srand(std::time(nullptr));
std::vector<bool> vct1, vct2;
vct1.resize(20000000, false);
vct2.resize(20000000, false);
// insert some data
for (size_t i = 0; i < 1000000; i++) {
vct1[std::rand() % 20000000] = true;
}
// serialize to file
std::ofstream ofs("bitset", std::ios::out | std::ios::trunc);
for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
auto vct1_iter = vct1.begin();
vct1_iter += i;
uint32_t block_num = i / std::_S_word_bit;
std::_Bit_type block_val = *(vct1_iter._M_p);
if (block_val != 0) {
// only write not-zero block
ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
}
}
ofs.close();
// deserialize
std::ifstream ifs("bitset", std::ios::in);
ifs.seekg(0, std::ios::end);
uint64_t file_size = ifs.tellg();
ifs.seekg(0);
uint64_t load_size = 0;
while (load_size < file_size) {
uint32_t block_num;
ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
std::_Bit_type block_value;
ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
auto offset = block_num * std::_S_word_bit;
if (offset >= vct2.size()) {
std::cout << "error! already touch end" << std::endl;
break;
}
auto iter = vct2.begin();
iter += offset;
*(iter._M_p) = block_value;
}
ifs.close();
// check result
int count_true1 = std::count(vct1.begin(), vct1.end(), true);
int count_true2 = std::count(vct2.begin(), vct2.end(), true);
std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
std::vector<bool> data = /* obtain bits somehow */
// Reserve an appropriate number of byte-sized buckets.
std::vector<char> bytes((int)std::ceil((float)data.size() / CHAR_BIT)); // CHAR_BIT comes from <climits>
for(std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
for(int bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
std::size_t i = byteIndex * CHAR_BIT + bitIndex;
if(i >= data.size()) break; // the last byte may be only partly filled
int bit = data[i];
bytes[byteIndex] |= bit << bitIndex;
}
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize out the number of bits that were actually stored (to cover cases where you have a bit count that isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector as you had originally.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
#include <cstdio>
#include <bitset>
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
bitset<size> bitbuffer;
...
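// NOTE: this dumps the object's raw memory; the internal layout of std::bitset
// is implementation-defined, so the file is only reliably readable by the same build.
fwrite (&bitbuffer, 1, size/8, pFile);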
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.

Bit packing of array of integers

I have an array of integers; let's assume they are of type int64_t. I know that only the first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).
What is the most efficient way to convert the array so that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits, and so on)?
I would like it to be as general as possible, because n will vary from time to time, though I guess there might be smart optimizations for specific n, like powers of 2 or something.
Of course I know that I can just iterate value over value, I just want to ask you StackOverflowers if you can think of some more clever way.
Edit:
This question is not about compressing the array to take as little space as possible. I just need to "cut" n bits from every integer, and given the array I know the exact number of bits n I can safely cut.
Today I released: PackedArray: Packing Unsigned Integers Tightly (github project).
It implements a random access container where items are packed at the bit-level. In other words, it acts as if you were able to manipulate a e.g. uint9_t or uint17_t array:
PackedArray principle:
. compact storage of <= 32 bits items
. items are tightly packed into a buffer of uint32_t integers
PackedArray requirements:
. you must know in advance how many bits are needed to hold a single item
. you must know in advance how many items you want to store
. when packing, behavior is undefined if items have more than bitsPerItem bits
PackedArray general in memory representation:
|-------------------------------------------------- - - -
|       b0       |       b1       |       b2       |
|-------------------------------------------------- - - -
| i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
|-------------------------------------------------- - - -
. items are tightly packed together
. several items end up inside the same buffer cell, e.g. i0, i1, i2
. some items span two buffer cells, e.g. i3, i6
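To make the layout concrete, here is a small sketch of the same principle: fixed-width items packed into uint32_t cells, with some items spanning two cells. It is not PackedArray's actual code or API; the class name `PackedBits` and its `set`/`get` members are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class PackedBits {
public:
    // bitsPerItem must be in 1..32; count is the number of items to hold.
    PackedBits(unsigned bitsPerItem, std::size_t count)
        : bits_(bitsPerItem),
          mask_(bitsPerItem == 32 ? 0xFFFFFFFFu : (1u << bitsPerItem) - 1u),
          cells_((count * bitsPerItem + 31) / 32, 0) {}

    void set(std::size_t index, std::uint32_t value)
    {
        const std::size_t bitpos = index * bits_;
        const std::size_t cell = bitpos / 32;
        const unsigned off = static_cast<unsigned>(bitpos % 32);
        value &= mask_; // drop any bits above bitsPerItem
        cells_[cell] = (cells_[cell] & ~(mask_ << off)) | (value << off);
        if (off + bits_ > 32) {             // item spills into the next cell
            const unsigned done = 32 - off; // bits already stored in this cell
            cells_[cell + 1] = (cells_[cell + 1] & ~(mask_ >> done)) | (value >> done);
        }
    }

    std::uint32_t get(std::size_t index) const
    {
        const std::size_t bitpos = index * bits_;
        const std::size_t cell = bitpos / 32;
        const unsigned off = static_cast<unsigned>(bitpos % 32);
        std::uint32_t v = cells_[cell] >> off;
        if (off + bits_ > 32)
            v |= cells_[cell + 1] << (32 - off); // fetch the spilled high bits
        return v & mask_;
    }

private:
    unsigned bits_;
    std::uint32_t mask_;
    std::vector<std::uint32_t> cells_;
};
```

With 9-bit items, for example, item 3 occupies bits 27 to 35 and therefore spans the first two cells; set() and get() handle that case through the spill branch.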
I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:
Pick a constant n such that the largest integer can be represented.
Allow n to vary from value to value.
The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.
The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.
The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.
I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:
#include <stdint.h> // int64_t
#include <stdio.h>
int pack(int64_t* input, int nin, void* output, int n)
{
int64_t inmask = 0;
unsigned char* pout = (unsigned char*)output;
int obit = 0;
int nout = 0;
*pout = 0;
for(int i=0; i<nin; i++)
{
inmask = (int64_t)1 << (n-1);
for(int k=0; k<n; k++)
{
if(obit>7)
{
obit = 0;
pout++;
*pout = 0;
}
*pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
inmask >>= 1;
obit++;
nout++;
}
}
return nout;
}
int unpack(void* input, int nbitsin, int64_t* output, int n)
{
unsigned char* pin = (unsigned char*)input;
int64_t* pout = output;
int nbits = nbitsin;
unsigned char inmask = 0x80;
int inbit = 0;
int nout = 0;
while(nbits > 0)
{
*pout = 0;
for(int i=0; i<n; i++)
{
if(inbit > 7)
{
pin++;
inbit = 0;
}
*pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
inbit++;
}
pout++;
nbits -= n;
nout++;
}
return nout;
}
int main()
{
int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
int64_t output[21];
unsigned char compressed[21*8];
int n = 5;
int nbits = pack(input, 21, compressed, n);
int nout = unpack(compressed, nbits, output, n);
for(int i=0; i<=20; i++)
printf("input: %lld output: %lld\n", (long long)input[i], (long long)output[i]); // int64_t is not always long long
}
This is very inefficient because it steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianness. I have also not tested this with a wide range of values, just the ones in the test. Also, there is no bounds checking and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.
Most any compression algorithm will get close to the minimum entropy needed to encode the integers, for example, Huffman coding, but accessing it like an array will be non-trivial.
Starting from Jason B's implementation, I eventually wrote my own version which processes bit-blocks instead of single bits. One difference is that it is LSB-first: it starts from the lowest output bits and goes to the highest. This only makes it harder to read with a binary dump, like Linux xxd -b. As a detail, int* can be trivially changed to int64_t*, and it would be even better to make it unsigned. I have already tested this version with a few million arrays and it seems solid, so I will share it with the rest:
#include <algorithm> // std::min
int pack2(int *input, int nin, unsigned char* output, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
if(nin>0) output[0] = 0;
for(int i=0; i<nin; i++)
{
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
output[nout] |= (input[i] & (((1 << ibite)-1) ^ ((1 << ibit)-1))) >> ibit << obit;
obit += ibite - ibit;
nout += obit >> 3;
if(obit & 8) output[nout] = 0;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
int unpack2(int *oinput, int nin, unsigned char* ioutput, int n)
{
int obit = 0;
int ibit = 0;
int ibite = 0;
int nout = 0;
for(int i=0; i<nin; i++)
{
oinput[i] = 0;
ibit = 0;
while(ibit < n) {
ibite = std::min(n, ibit + 8 - obit);
oinput[i] |= (ioutput[nout] & (((1 << (ibite-ibit+obit))-1) ^ ((1 << obit)-1))) >> obit << ibit;
obit += ibite - ibit;
nout += obit >> 3;
obit &= 7;
ibit = ibite;
}
}
return nout;
}
I know this might seem like the obvious thing to say as I'm sure there's actually a solution, but why not use a smaller type, like uint8_t (max 255)? or uint16_t (max 65535)?. I'm sure you could bit-manipulate on an int64_t using defined values and or operations and the like, but, aside from an academic exercise, why?
And on the note of academic exercises, Bit Twiddling Hacks is a good read.
If you have fixed sizes, e.g. you know your number is 38 bits rather than 64, you can build structures using bit-field specifications, assuming you also have smaller elements to fit in the remaining space.
struct example {
/* 64bit number cut into 3 different sized sections */
uint64_t big_num:38;
uint64_t small_num:16;
uint64_t itty_num:10;
/* 8 bit number cut in two */
uint8_t nibble_A:4;
uint8_t nibble_B:4;
};
This isn't big/little endian safe without some hoop-jumping, so can only be used within a program rather than in a exported data format. It's quite often used to store boolean values in single bits without defining shifts and masks.
I don't think you can avoid iterating across the elements.
AFAIK Huffman encoding requires the frequencies of the "symbols", which unless you know the statistics of the "process" generating the integers, you will have to compute (by iterating across every element).