Reading 32 bit hex data from file - c++

What is the best way to go about reading signed multi-byte words from a buffer of bytes?
Is there a standard way to do this that I am not aware of, or am I on the right track reading in 4 chars, weighting each by its respective power of 256, and summing them together?
int ReadBuffer(int BuffPosition, int SequenceLength)
{
    int val = 0;
    int limit = BuffPosition + SequenceLength;
    int place = 0;
    for (; BuffPosition < limit; BuffPosition++) {
        int current = Buff[BuffPosition];
        current *= pow(16, 2 * place); // 16^(2*place) == 256^place
        val += current;
        place++;
    }
    return val;
}
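(As an aside: the per-byte weighting above can be done with integer shifts instead of floating-point pow, since multiplying by 256^place is the same as shifting left by 8*place bits. A minimal sketch under the same assumptions, with Buff taken to be an unsigned char array so the bytes don't sign-extend:)
int ReadBufferShift(int BuffPosition, int SequenceLength)
{
    int val = 0;
    for (int place = 0; place < SequenceLength; ++place)
        val |= Buff[BuffPosition + place] << (8 * place); // byte k contributes Buff[k] * 256^k
    return val;
}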

Assuming you read/write your file on the same machine (same endianness), you can use a 32-bit type like int32_t (#include <cstdint>) and read directly. Small example below:
#include <iostream>
#include <fstream>
#include <cstdint>

int main()
{
    // trunc is needed so the file is created if it doesn't exist yet
    std::fstream file("file.bin", std::ios::in | std::ios::out | std::ios::trunc | std::ios::binary);
    const std::size_t N = 256;  // length of the buffer
    int32_t buf[N];             // our buffer
    for (std::size_t i = 0; i < N; ++i) // fill the buffer
        buf[i] = i;
    // write to file
    file.write((char*)buf, N * sizeof(int32_t));
    for (std::size_t i = 0; i < N; ++i) // zero the buffer
        buf[i] = 0;                     // to show we're not cheating
    // read from file
    file.seekg(0); // rewind to the beginning
    file.read((char*)buf, N * sizeof(int32_t));
    // display the buffer
    for (std::size_t i = 0; i < N; ++i)
        std::cout << buf[i] << " ";
}

I now realize that I can take a char* buffer and cast it to a data type with the correct size.
char ByteBuffer[4000];
int* WordBuffer;
if (sizeof(int) == 4) {
    WordBuffer = (int*)ByteBuffer;
}
dostuffwith(WordBuffer[index]);
I am trying to process a wav file, so in an attempt to maximize efficiency I was trying to avoid reading from the file 44100 times a second. Whether that is actually slower than reading from an array, I am not sure.
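One way to avoid reading from the file 44100 times a second is to load the whole data block into memory with a single read and then index it as an array. A minimal sketch, assuming a raw file of native-endian 32-bit samples (a real WAV file also has a header, typically 44 bytes, that would have to be parsed or skipped first; ReadAllSamples is a hypothetical helper name):
#include <cstdint>
#include <fstream>
#include <vector>

std::vector<int32_t> ReadAllSamples(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    in.seekg(0, std::ios::end); // find the file size
    std::size_t bytes = static_cast<std::size_t>(in.tellg());
    in.seekg(0, std::ios::beg);

    std::vector<int32_t> samples(bytes / sizeof(int32_t));
    in.read(reinterpret_cast<char*>(samples.data()), samples.size() * sizeof(int32_t));
    return samples; // index samples[i] instead of hitting the file per sample
}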

Faster way of loading (big) std::vector<std::vector<float>> from file

I have implemented a way to save a std::vector of vectors to file and read them using this code (found here on stackoverflow):
Saving:
void saveData(std::string path)
{
    std::ofstream FILE(path, std::ios::out | std::ofstream::binary);
    // Store size of the outer vector
    int s1 = RecordData.size();
    FILE.write(reinterpret_cast<const char*>(&s1), sizeof(s1));
    // Now write each vector one by one
    for (auto& v : RecordData) {
        // Store its size
        int size = v.size();
        FILE.write(reinterpret_cast<const char*>(&size), sizeof(size));
        // Store its contents
        FILE.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(float));
    }
    FILE.close();
}
Reading:
void loadData(std::string path)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);
    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }
    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (int n = 0; n < size; ++n) {
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        float f;
        //RecordData[n].resize(size2); // This doesn't make a difference in speed
        for (int k = 0; k < size2; ++k) {
            FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
            RecordData[n].push_back(f);
        }
    }
}
This works perfectly, but loading a big dataset (980 MB, inner vectors of size 32000 and 1600 of them) takes ~7-8 seconds (in contrast to saving, which is done in under 1 sec). Since I can see memory usage in Visual Studio going up slowly during loading, my guess would be a lot of memory allocations. The commented-out line RecordData[n].resize(size2); doesn't make a difference, though.
Can anybody give me a faster way of loading this kind of data? My first try was putting all the data in one big std::vector<float>, but that for some reason seemed to give some kind of overflow (which shouldn't happen, because sizeof(int) = 4, so ~2 billion for a signed index should be enough (does std::vector use something else internally?)). Also, it would be really nice to keep a data structure of std::vector<std::vector<float>>. In the future I will have to handle much bigger datasets (although I will probably use short for that to save memory and handle the values as fixed-point numbers), so loading speeds will become more significant...
Edit:
I should point out that 32000 for the inner vector and 1600 for the outer vector is just an example. Both can vary. I think I would have to save an "index-vector" as the first inner vector to declare the number of items for the rest (like I said in a comment: I'm a first-time file-reader/-writer and haven't used std::vector for more than a week or two, so I'm not sure about that). I will look into block-reading and post the result in a later edit...
Edit2:
So, here is the version of perivesta (thank you for that). The only change I made is discarding RV& RecordData, because this is a global variable for me.
Curiously, this brings my loading time down only from ~7000 ms to ~1500 ms for my 980 MB file, whereas perivesta reports going from 7429 ms to 644 ms for a 2 GB file (strange how speeds differ on different systems ;-) )
void loadData2(std::string path)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);
    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }
    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (auto& v : RecordData) {
        // load its size
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        v.resize(size2);
        // load its contents
        FILE.read(reinterpret_cast<char*>(&v[0]), v.size() * sizeof(float));
    }
}
This is an implementation of Alan Birtles' comment: When reading, read an inner vector with one single FILE.read call instead of many individual ones. This reduces the time dramatically on my system:
These are the results for a 2GB file:
Writing took 2283 ms
Reading v1 took 7429 ms
Reading v2 took 644 ms
Here is the code that produces this output:
#include <vector>
#include <iostream>
#include <string>
#include <chrono>
#include <random>
#include <fstream>

using RV = std::vector<std::vector<float>>;

void saveData(std::string path, const RV& RecordData)
{
    std::ofstream FILE(path, std::ios::out | std::ofstream::binary);
    // Store size of the outer vector
    int s1 = RecordData.size();
    FILE.write(reinterpret_cast<const char*>(&s1), sizeof(s1));
    // Now write each vector one by one
    for (auto& v : RecordData) {
        // Store its size
        int size = v.size();
        FILE.write(reinterpret_cast<const char*>(&size), sizeof(size));
        // Store its contents
        FILE.write(reinterpret_cast<const char*>(&v[0]), v.size() * sizeof(float));
    }
    FILE.close();
}

// original version for comparison
void loadData1(std::string path, RV& RecordData)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);
    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }
    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (int n = 0; n < size; ++n) {
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        float f;
        //RecordData[n].resize(size2); // This doesn't make a difference in speed
        for (int k = 0; k < size2; ++k) {
            FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
            RecordData[n].push_back(f);
        }
    }
}

// my version
void loadData2(std::string path, RV& RecordData)
{
    std::ifstream FILE(path, std::ios::in | std::ifstream::binary);
    if (RecordData.size() > 0) // Clear data
    {
        for (int n = 0; n < RecordData.size(); n++)
            RecordData[n].clear();
        RecordData.clear();
    }
    int size = 0;
    FILE.read(reinterpret_cast<char*>(&size), sizeof(size));
    RecordData.resize(size);
    for (auto& v : RecordData) {
        // load its size
        int size2 = 0;
        FILE.read(reinterpret_cast<char*>(&size2), sizeof(size2));
        v.resize(size2);
        // load its contents
        FILE.read(reinterpret_cast<char*>(&v[0]), v.size() * sizeof(float));
    }
}

int main()
{
    using namespace std::chrono;
    const std::string filepath = "./vecdata";
    const std::size_t sizeOuter = 16000;
    const std::size_t sizeInner = 32000;
    RV vecSource;
    RV vecLoad1;
    RV vecLoad2;
    const auto tGen1 = steady_clock::now();
    std::cout << "generating random numbers..." << std::flush;
    std::random_device dev;
    std::mt19937 rng(dev());
    std::uniform_real_distribution<float> dis;
    for (std::size_t i = 0; i < sizeOuter; ++i)
    {
        RV::value_type inner;
        for (std::size_t k = 0; k < sizeInner; ++k)
        {
            inner.push_back(dis(rng));
        }
        vecSource.push_back(inner);
    }
    const auto tGen2 = steady_clock::now();
    std::cout << "done\nSaving..." << std::flush;
    const auto tSave1 = steady_clock::now();
    saveData(filepath, vecSource);
    const auto tSave2 = steady_clock::now();
    std::cout << "done\nReading v1..." << std::flush;
    const auto tLoadA1 = steady_clock::now();
    loadData1(filepath, vecLoad1);
    const auto tLoadA2 = steady_clock::now();
    std::cout << "verifying..." << std::flush;
    if (vecSource != vecLoad1) std::cout << "FAILED! ...";
    std::cout << "done\nReading v2..." << std::flush;
    const auto tLoadB1 = steady_clock::now();
    loadData2(filepath, vecLoad2);
    const auto tLoadB2 = steady_clock::now();
    std::cout << "verifying..." << std::flush;
    if (vecSource != vecLoad2) std::cout << "FAILED! ...";
    std::cout << "done\nResults:\n" <<
        "Generating took " << duration_cast<milliseconds>(tGen2 - tGen1).count() << " ms\n" <<
        "Writing took " << duration_cast<milliseconds>(tSave2 - tSave1).count() << " ms\n" <<
        "Reading v1 took " << duration_cast<milliseconds>(tLoadA2 - tLoadA1).count() << " ms\n" <<
        "Reading v2 took " << duration_cast<milliseconds>(tLoadB2 - tLoadB1).count() << " ms\n" <<
        std::flush;
}
First of all, since you know the number of elements up front, you should reserve space in your vectors to prevent unnecessary reallocations as they grow. Secondly, all those push_backs are probably costing you; that function does have some overhead. And thirdly, as Alan says, reading each inner vector in a single call can't possibly hurt, which you can do if you resize (as opposed to reserve) the vector first.
So, with all that said, I would do this (once you have read the size of the data into size2):
RecordData[n].resize(size2); // both reserves and allocates space for size2 items
FILE.read(reinterpret_cast<char*>(RecordData[n].data()), size2 * sizeof(float));
I would think this is optimal.
It's unfortunate in cases like this, IMO, that std::vector insists on zero-initialising all size2 elements when you call resize, since you're immediately going to overwrite them, but I don't know of an easy way to prevent this. You'd need to get into custom allocators, and it's probably not worth the effort.
//RecordData[n].resize(size2); // This doesn't make a difference in speed
If you use this line (while not changing the rest of the code), you should expect the code to be slower, not faster!
resize changes the size of the vector and then you push more elements to it, resulting in a vector of double the size you actually need.
I suppose you wanted reserve instead. reserve only allocates capacity without changing the size of the vector and then pushing elements can be expected to be faster, because memory is only allocated once.
Alternatively use resize and then assign to already existing elements.
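To make the resize/reserve distinction concrete, here is a minimal sketch of the three variants for filling one inner vector (fillInner is an illustrative name; the rest follows the code above):
#include <fstream>
#include <vector>

void fillInner(std::ifstream& FILE, std::vector<float>& v, int size2)
{
    float f;
    // 1) resize + push_back: wrong here. v would already hold size2
    //    value-initialised elements, and push_back would append size2
    //    more, giving a vector of twice the intended size.

    // 2) reserve + push_back: allocates once, then appends into capacity.
    v.reserve(size2);
    for (int k = 0; k < size2; ++k) {
        FILE.read(reinterpret_cast<char*>(&f), sizeof(f));
        v.push_back(f);
    }

    // 3) resize + bulk read: allocates once and overwrites the elements:
    // v.resize(size2);
    // FILE.read(reinterpret_cast<char*>(v.data()), size2 * sizeof(float));
}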

Read int through char * binary data from a file with std::ifstream::read()

Background: This question is a follow up of this one.
The given answer, suggesting to access the data through unsigned char* instead of char*, worked successfully.
Main question: but what can we do if we have no choice, i.e. if char* is imposed by a function prototype?
Context:
Let's assume that we have written an int array in binary format into a file.
It may look like this (without error checking):
const std::string bin_file("binary_file.bin");
const std::size_t len(10);
int test_data[len] {-4000, -3000, -2000, -1000, 0, 1000, 2000, 3000, 4000, 5000};

std::ofstream ofs(bin_file, std::ios::trunc | std::ios::binary);
for (std::size_t i = 0; i < len; ++i)
{
    ofs.write(reinterpret_cast<char*>(&test_data[i]), sizeof test_data[i]);
}
ofs.close();
Now I want to open the file, read it and print the previously written data one by one.
The opening is performed as follows (without error checking):
std::ifstream ifs(bin_file, std::ios::binary); // open in binary mode
// get the length
ifs.seekg(0, ifs.end);
std::size_t byte_size = static_cast<std::size_t>(ifs.tellg());
ifs.seekg(0, ifs.beg);
At this point, byte_size == len*sizeof(int).
Possible solutions:
I know that I can do it either by:
int val;
for (std::size_t i = 0; i < len; ++i)
{
    ifs.read(reinterpret_cast<char*>(&val), sizeof val);
    std::cout << val << '\n';
}
or by:
int vals[len];
ifs.read(reinterpret_cast<char*>(vals), static_cast<std::streamsize>(byte_size));
for (std::size_t i = 0; i < len; ++i)
    std::cout << vals[i] << '\n';
Both of these solutions work fine, but neither of them is the point of this question.
Problem description:
I consider here the case where I want to get the full binary file contents into a char* and handle it afterwards.
I cannot use an unsigned char* since std::ifstream::read() is expecting a char*.
I tried:
char* buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for (std::size_t i = 0; i < len; ++i)
{
    // Getting the value via std::memcpy works fine:
    //std::memcpy(&val, &buff[i*sizeof val], sizeof val);

    // Getting the value via bit-wise shifts fails (guess: signedness issues):
    for (std::size_t j = 0; j < sizeof val; ++j)
        val |= reinterpret_cast<unsigned char*>(buff)[i*sizeof val + j] << CHAR_BIT*j; // For little-endian
    std::cout << val << '\n';
}
delete[] buff;
ifs.close();
With std::memcpy copying the 4 bytes into the int, I get the expected results (the printed vals have the same values as the original ones).
With bit-wise shifting, even after reinterpret_cast<unsigned char*>ing the buffer, I get trash values and fail to recover the original int values (the printed vals are "garbage": not the same values as the original ones).
My question is: what does std::memcpy do to be able to get the right values back from a char* instead of an unsigned char*, while it is not possible with my bit-wise shifting?
And how could I solve it without using std::memcpy (for general interest)? I could not figure it out.
Ok, this was a really stupid error, shame on me.
Actually, I forgot to reset val to zero before each next iteration...
The problem was not related to the bit-wise shifting, and the reinterpret_cast<unsigned char *> worked successfully.
The corrected version should be:
char* buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for (std::size_t i = 0; i < len; ++i)
{
    for (std::size_t j = 0; j < sizeof val; ++j)
        val |= reinterpret_cast<unsigned char*>(buff)[i*sizeof val + j] << CHAR_BIT*j; // For little-endian
    std::cout << val << '\n';
    val = 0; // Reset val for the next iteration
}
delete[] buff;
ifs.close();
For those who don't like casting, we can replace it with a mask as follows:
char* buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for (std::size_t i = 0; i < len; ++i)
{
    int mask = 0x000000FF;
    for (std::size_t j = 0; j < sizeof val; ++j)
    {
        val |= (buff[i*sizeof val + j] << CHAR_BIT*j) & mask; // For little-endian
        mask = mask << CHAR_BIT;
    }
    std::cout << val << '\n';
    val = 0; // Reset val for the next iteration
}
delete[] buff;
ifs.close();
Perfect example when the issue comes from between the keyboard and the chair :)

Write byte to ofstream in C++

I have a char array, and I want to write it to a txt file, but in bytes.
ofstream sw("C:\\Test.txt");
for (int i = 0; i < 256; i++)
{
    sw << (byte)myArray[i];
}
This will write chars into the file, but I want to write bytes: if there is a char 'a' I want to write '97'. Thank you.
To write a byte or a group of bytes using std::fstream or std::ofstream, you use the write() function: std::ostream::write()
const int ArraySize = 100;
char byteArray[ArraySize] = ...;

std::ofstream file("myFile.data", std::ios::binary);
if (file)
{
    file.write(&byteArray[0], ArraySize);
    // further buffers can be written the same way:
    // file.write(&moreData, otherSize);
}
ofstream sw("C:\\Test.txt");
for (int i = 0; i < 256; i++)
{
    sw << (int)myArray[i];
}
This will convert a char 'a' to the int (or byte) value 97.
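Note that written back to back the numbers run together ("979899..."), so in practice you will want a separator between them, e.g.:
ofstream sw("C:\\Test.txt");
for (int i = 0; i < 256; i++)
{
    sw << (int)myArray[i] << ' '; // writes "97 98 99 ..." instead of "979899..."
}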

fwrite, fread - problems with fread

I have the following code:
int main()
{
    char* pedal[20];
    char* pedal2[20];
    for (int i = 0; i < 20; i++)
    {
        pedal[i] = "Pedal";
    }
    FILE* plik;
    plik = fopen("teraz.txt", "wb");
    for (int i = 0; i < 20; i++)
    {
        fwrite(pedal[i], strlen(pedal[i]), 1, plik);
    }
    system("pause");
    fclose(plik);
    plik = fopen("teraz.txt", "rb");
    for (int i = 0; i < 20; i++)
    {
        fread(pedal2[i], 5, 1, plik); // I know for now that every element has 5 bytes
    }
    for (int i = 0; i < 20; i++)
    {
        std::cout << pedal2[i] << std::endl;
    }
    fclose(plik);
    system("pause");
    return 0;
}
It crashes at reading. And a second question: let's assume I have a structure holding integers, floats and also a char* array. How can I easily write the whole structure to a file? A plain fwrite with sizeof of the structure is not working.
Your problem is that you didn't allocate a buffer for reading. In fact the line
fread(pedal2[i], 5, 1, plik)
reads to an unknown place. You need to allocate memory first (in your case 5 + 1 bytes, for a zero-terminated string):
pedal2[i] = (char*)malloc(5 + 1);
fread(pedal2[i], 5, 1, plik);
pedal2[i][5] = '\0'; // terminate so it can be printed safely
Don't forget to release it after usage.
You can't read into pedal2 without first having allocated space for it.
You need something like this:
for (int i = 0; i < 20; ++i) {
    pedal2[i] = (char*)malloc(100); // allocate some space
}
Your first question seems to have already been answered by Simone and Dewfy.
For your second question, about how to write structure values into a file: you will have to write member by member.
Please check fprintf. You can probably use it for writing different data types.
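A minimal sketch of member-by-member writing (the struct and its field names are made up for illustration, not the asker's actual type). The char* member is the reason a single fwrite of the struct fails: that would store the pointer value, not the characters it points to, so the string has to be written explicitly:
#include <cstdio>
#include <cstring>

struct Record {
    int   id;
    float value;
    char* name; // points elsewhere; must be serialised explicitly
};

void writeRecord(FILE* f, const Record* r)
{
    fwrite(&r->id, sizeof r->id, 1, f);
    fwrite(&r->value, sizeof r->value, 1, f);
    size_t len = strlen(r->name);
    fwrite(&len, sizeof len, 1, f); // store the string length first...
    fwrite(r->name, 1, len, f);     // ...then the characters themselves
}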

Easiest way to repeat a sequence of bytes into a larger buffer in C++

Given (in C++)
char * byte_sequence;
size_t byte_sequence_length;
char * buffer;
size_t N;
Assuming byte_sequence and byte_sequence_length are initialized to some arbitrary length sequence of bytes (and its length), and buffer is initialized to point to N * byte_sequence_length bytes, what would be the easiest way to replicate the byte_sequence into buffer N times? Is there anything in STL/BOOST that already does something like this?
For example, if the sequence were "abcd", and N was 3, then buffer would end up containing "abcdabcdabcd".
I would probably just go with this:
for (size_t i = 0; i < N; ++i)
    memcpy(buffer + i * byte_sequence_length, byte_sequence, byte_sequence_length);
This assumes you are dealing with binary data and are keeping track of the length, not using '\0' termination.
If you want these to be C-strings you'll have to allocate an extra byte and add the '\0' at the end. Given a C-string and an integer, you'd want to do it like this:
char *RepeatN(const char *source, size_t n)
{
    assert(source != NULL);
    size_t length = strlen(source); // length without the terminator
    char *buffer = new char[length * n + 1];
    for (size_t i = 0; i < n; ++i)
        memcpy(buffer + i * length, source, length);
    buffer[n * length] = '\0';
    return buffer;
}
Repeating the buffer while avoiding pointer arithmetic:
You can use std::vector<char> or std::string to make things easier for you. Both of these containers can hold binary data too.
This solution has the nice properties that:
You don't need to worry about memory access violations
You don't need to worry about getting the size of your buffer correct
You can append sequences at any time to your buffer without manual re-allocations
// Note this works even for binary data.
void appendSequenceToMyBuffer(std::string& sBuffer,
                              const char* byte_sequence,
                              int byte_sequence_length,
                              int N)
{
    for (int i = 0; i < N; ++i)
        sBuffer.append(byte_sequence, byte_sequence_length);
}
// Note: buffer == sBuffer.c_str()
Alternate: For binary data using memcpy:
buffer = new char[byte_sequence_length * N];
for (int i = 0; i < N; ++i)
    memcpy(buffer + i * byte_sequence_length, byte_sequence, byte_sequence_length);
//...
delete[] buffer;
Alternate: For null-terminated string data using strcpy:
int byte_sequence_length = strlen(byte_sequence);
buffer = new char[byte_sequence_length * N + 1];
for (int i = 0; i < N; ++i)
    strcpy(buffer + i * byte_sequence_length, byte_sequence); // each copy's '\0' is overwritten by the next; the last one terminates the result
//...
delete[] buffer;
Alternate: If you are filling the buffer with a single value:
buffer = new char[N];
memset(buffer, byte_value, N);
//...
delete[] buffer;
If N is known to be a power of 2, you can copy from the first part of your buffer to subsequent parts, increasing the amount copied each time:
assert((N > 0) && ((N & (N-1)) == 0));
memcpy(buffer, byte_sequence, byte_sequence_length);
for (size_t i = 1; i < N; i *= 2)
    memcpy(buffer + i * byte_sequence_length, buffer, i * byte_sequence_length);
Edit: It is trivial to extend this to work when N is not a power of 2. Here's an improved version, which removes all constraints on N and also replaces the odd for statement with a while.
if (N > 0)
    memcpy(buffer, byte_sequence, byte_sequence_length);
size_t copied = 1;
while (copied < N)
{
    size_t tocopy = std::min(copied, N - copied);
    memcpy(buffer + copied * byte_sequence_length, buffer, tocopy * byte_sequence_length);
    copied += tocopy;
}
You can use the STL algorithm std::generate:
MSDN: Generate
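A minimal sketch of that idea, assuming buffer already points to N * byte_sequence_length writable bytes (repeatWithGenerate is an illustrative name); the generator lambda cycles through the sequence one byte at a time:
#include <algorithm>
#include <cstddef>

void repeatWithGenerate(char* buffer,
                        const char* byte_sequence,
                        std::size_t byte_sequence_length,
                        std::size_t N)
{
    std::size_t pos = 0;
    std::generate(buffer, buffer + N * byte_sequence_length,
                  [&] { return byte_sequence[pos++ % byte_sequence_length]; });
}
This is concise, but it works byte-at-a-time, so expect it to be slower than the memcpy-based variants above.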