I have a string whose length is 1600, and I know it contains 200 doubles. When I print the string I get the following: Y���Vz'#��y'#��!U�}'#�-...
I would like to convert this string to a vector containing the 200 doubles.
Here is the code I tried (blobString is a string 1600 characters long):
string first_eight = blobString.substr(0, sizeof(double)); // I get the first 8 values of the string which should represent the first double
double double_value1;
memcpy(&double_value1, &first_eight, sizeof(double)); // First thing I tried
double* double_value2 = (double*)first_eight.c_str(); // Second thing I tried
cout << double_value1 << endl;
cout << double_value2 << endl;
This outputs the following:
6.95285e-310
0x7ffd9b93e320
--- Edit solution---
The second method works; all I had to do was dereference the pointer double_value2:
cout << *double_value2 << endl;
Here's an example that might get you closer to what you need. Bear in mind that unless the bytes in your blob are in the exact format your particular C++ compiler uses for a double, this isn't going to work like you expect. In my example I'm building up the buffer of doubles myself.
Let's start with our array of doubles.
double doubles[] = { 0.1, 5.0, 0.7, 8.6 };
Now I'll build an std::string that should look like your blob. Notice that I can't simply initialize a string with a (char *) that points to the base of my list of doubles, as it will stop when it hits the first zero byte!
std::string double_buf_str;
double_buf_str.append((char *)doubles, 4 * sizeof(double));
// A quick sanity check, should be 32
std::cout << "Length of double_buf_str "
<< double_buf_str.length()
<< std::endl;
Now I'll reinterpret the c_str() pointer as a (double *) and iterate through the four doubles.
for (auto i = 0; i < 4; i++) {
    std::cout << ((double*)double_buf_str.c_str())[i] << std::endl;
}
Depending on your circumstances you might consider using a std::vector<uint8_t> for your blob, instead of an std::string. C++11 gives you a data() function that would be the equivalent of c_str() here. Turning your blob directly into a vector of doubles would give you something even easier to work with--but to get there you'd potentially have to get dirty with a resize followed by a memcpy directly into the internal array.
I'll give you an example for completeness. Note that this is of course not how you would normally initialize a vector of doubles...I'm imagining that my double_blob is just a pointer to a blob containing a known number of doubles in the correct format.
const int count = 200; // 200 doubles incoming
std::vector<double> double_vec;
double_vec.resize(count);
memcpy(double_vec.data(), double_blob, sizeof(double) * count);
for (double& d : double_vec) {
std::cout << d << std::endl;
}
@Mooning Duck brought up the great point that the result of c_str() is not necessarily aligned to an appropriate boundary, which is another good reason not to use std::string as a general-purpose blob (or at least not to interpret its internals until they are copied somewhere that guarantees a valid alignment for the type you are interested in). The impact of trying to read a double from a misaligned location in memory varies by architecture, giving you a portability concern. On x86-based machines there will only be a performance impact, AFAIK, as the hardware will read across the alignment boundary and assemble the double correctly (you can test this on an x86 machine by writing and then reading back a double at successive 1-byte offsets into a buffer: it'll just work). On other architectures you'll get a fault.
The std::vector<double> solution will not suffer from this issue due to guarantees about the alignment of newed memory built into the standard.
Related
For the same program:
const char* s = "abcd";
auto x1 = reinterpret_cast<const int64_t*>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << *x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcd"
In gcc5(link): 139639660962401
In gcc8(link): 1684234849
Why does the value vary according to different compiler versions?
What is then a compiler-safe way to convert from const char* to int64_t and back (as in this problem: not for strings of digits, but for strings containing arbitrary characters)?
Why does the value vary according to different compiler versions?
Behaviour is undefined.
What is then a compiler safe way to move from const char* to int64_t and backward
It is somewhat unclear what you mean by "move from const char* to int64_t". Based on the example, I assume you mean to create a mapping from a character sequence (of no greater length than fits) into a 64 bit integer in a way that can be converted back using another process - possibly compiled by another (version of) compiler.
First, create an int64_t object, initialised to zero:
int64_t i = 0;
Get length of the string
auto len = strlen(s);
Check that it fits
assert(len < sizeof i);
Copy the bytes of the character sequence onto the integer
memcpy(&i, s, len);
As long as the integer type has no trap representations, the behaviour is well defined, and the generated integer will be the same across compiler versions as long as the CPU endianness (and negative-number representation) remains the same.
Reading the character string back doesn't require copying because char is exceptionally allowed to alias all other types:
auto back = reinterpret_cast<char*>(&i);
Note the qualification in the last section. This method does not work if the integer is passed (across the network, for example) to a process running on a CPU with different byte order. That case can be handled as well: copy each octet into a chosen position of significance using bit shifts and masks.
When you dereference the int64_t pointer, you read past the end of the memory allocated for the string you cast from. If you make the string at least 8 bytes long (including the null terminator), the integer value becomes stable.
const char* s = "abcdefg"; // plus null terminator
auto x1 = reinterpret_cast<const int64_t*>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << *x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcdefg"
If you wanted to store the pointer in an integer instead, you should use intptr_t and leave out the * like:
const char* s = "abcd";
auto x1 = reinterpret_cast<intptr_t>(s);
auto x2 = reinterpret_cast<const char*>(x1);
std::cout << x1 << std::endl;
std::cout << x2 << std::endl; // Always "abcd"
Based on what RemyLebeau pointed out in the comments of your post,
uint64_t five_byte_mask = 0xFFFFFFFFFF;
std::cout << (*x1 & five_byte_mask) << std::endl;
should be a reasonable way to get the same value on a little-endian machine with whatever compiler. It may be UB by one specification or another, but from a compiler's perspective you are dereferencing eight bytes at a valid address, five of which you have initialized, and masking off the remaining uninitialized/junk bytes.
For example, when printing a single character like a newline, which is faster with cout in C++: passing it as a string or as a character?
cout << "\n";
Or
cout << '\n';
This video motivated me to write efficient code.
How would you go about testing such things? Maybe I might want to test other things to see which is faster so it would be helpful to know how I can test these things myself.
In practice, yes: using '\n' instead of "\n" came out quite a bit faster when I measured the elapsed time of printing 1000 occurrences of the same good-ol' newline:
Remember: a single char can hardly be slower than a string, since a string is passed as a pointer to its characters, which must be followed and iterated until the terminating null byte, while a char is a single 1-byte value passed directly.
// Works with C++17 and above...
#include <chrono>
#include <functional>  // for std::invoke
#include <iostream>
template<typename T, typename Duration = std::chrono::milliseconds, typename ...Args>
constexpr static auto TimeElapsedOnOperation(T&& functor, Args&&... arguments)
{
    auto const start = std::chrono::steady_clock::now();
    std::invoke(std::forward<decltype(functor)>(functor),
                std::forward<Args>(arguments)...);
    return std::chrono::duration_cast<Duration>(
        std::chrono::steady_clock::now() - start);
}
int main()
{
    std::cout << TimeElapsedOnOperation([]
    {
        for (auto i = 0; i < 1000; i++)
            std::cout << "\n";
    }).count() << std::endl;
    std::cin.get();
    std::cout << TimeElapsedOnOperation([]
    {
        for (auto i = 0; i < 1000; i++)
            std::cout << '\n';
    }).count() << std::endl;
    std::cin.get();
    return 0;
}
It gave the following output: (Can occur differently...)
<1000> newlines follow...
2195 milliseconds For the string "\n"
More <1000> newlines follow...
852 milliseconds For the character '\n'
2195 - 852 = 1343 milliseconds
The string version took 1343 milliseconds (1.343 seconds) longer, so in this run '\n' was roughly 61% faster (1343 / 2195 * 100) than "\n".
This is just an approximation since the performance can differ in other machines...
As to why this happens:
A single character constant (1 byte) is smaller than a string holding a single character, because a string literal is a char array (here "\n" is two bytes: the newline plus the terminating '\0') that is passed around as a pointer to its first character (i.e., const char*).
There is also a difference in how a character and a string are printed: a character is written directly, while a string must be accessed through its pointer and iterated, writing each character until the terminating null is found.
A string is always a char array, while a char is safely considered an integer holding the character's numerical value (in the execution character set); a string of 1 character is really an array of 2 chars including the terminator, which is not the same thing as a single char.
So maybe (just maybe) you are on the better side of using '\n' instead...
However, a "tricky" compiler may optimize your "\n" into the equivalent of '\n' anyway, so we can never be sure without measuring; still, it is considered good practice to write a single character as a char...
Theoretical considerations only:
A single character can be just printed out as is, a string needs to be iterated over to find the terminating null character.
A single character can be passed and used directly as value; a string is passed by pointer, so the address must be resolved before the character(s) can be used.
So at least, the single character cannot be slower. However, a sufficiently clever compiler might spot the constant one-character string and optimise any difference away (especially if operator<< is inline).
How to test: At very first, you'd be interested in a system that might disturb the test as little as possible (context switches between threads are expensive), so best close any open applications.
A very simple test program might repeatedly use both operators sufficiently often, something like:
for(uint32_t loop = 0; loop < SomeLimit; ++loop)
{
    // take timestamp in highest precision possible
    for(uint32_t i = 0; i < Iterations; ++i)
    {
        // single character
    }
    // calculate difference to timestamp, add to sum for character

    // take timestamp in highest precision possible
    for(uint32_t i = 0; i < Iterations; ++i)
    {
        // string
    }
    // calculate difference to timestamp, add to sum for string
}
Interleaving character and string output might help to get a better average over runtime if OS activities vary during the test, the inner loops should run sufficiently long to get a reasonable time interval for measurement.
The longer the program runs, the more precise the output will be. To prevent overflow, use uint64_t to collect the sums (your program would have to run more than 200000 days even with ns precision to overflow...).
I am trying to store a very big number in a uint64_t like:
int main(int argc, char** argv) {
uint64_t ml = sizeof(void*)*(1<<63);
cout << "ml=" << ml << "\n";
const char* r;
const char* mR=r+ml;
return 0;
}
But I do not know why I am getting the output as 0, despite storing it in a uint64_t.
EDIT: char* mR is my memory buffer and I can increase my memory buffer to at most ml. I want to make use of 64GB RAM machine. So, can you suggest how much should I increment mR to..as I want to use all the available RAM. That is to what value should I set ml to?
Try
uint64_t ml = ((uint64_t)1)<<63;
or just
uint64_t ml = 0x8000000000000000;
Plain 1 << 63 uses int arithmetic, and if the shift amount is too big for the type, the behaviour is undefined. In your case, it happened to result in 0.
Please note that if you multiply 0x8000000000000000 by sizeof(void*), you'll likely get overflow too.
If you want to allocate 64G of memory, that would be:
char* buffer = new char[64ULL * 1024 * 1024 * 1024];
or simply:
char* buffer = new char[1ULL << 36];
Note that 64G is 2^36 bytes, which is far, far less than the 2^63 number that you're trying to use. Although, typically when you use that much memory, it's because your program organically uses it through various operations... not by just allocating it in one large chunk.
Just use:
uint64_t ml = sizeof(void*) * (1ULL << 63);
Because, as AlexD already said, 1 << 63 uses plain int, so 1 << 63 is undefined behaviour (and in practice came out as 0 here).
Even after you correct the shift to (uint64_t)1 << 63, if sizeof(void*) is any even number (and it assuredly is), then your product will be divisible by 2^64, and thus be zero when stored in a uint64_t.
I'm writing a code in which I read and image and process it and get a Mat of double/float. I'm saving it to a file and later on, I'm reading it from that file.
When I use double, the space it requires is 8MB for 1Kx1K image, when I use float it is 4MB. So I want to use float.
Here is my code and output:
Mat data = readFloatFile("file_location");
cout << data.at<float>(0,0) << " " << data.at<double>(0,0);
When I run this code in DEBUG mode, the printout for float is -0 and the double access throws an exception, namely an assertion failure. But when I use RELEASE mode, the printout for float is -0 and the double access gives 0.832, which is the true value.
My question is: why can't I get the correct output when I use data.at<float>(0,0), and why don't I get the exception when I use data.at<double>(0,0) in RELEASE mode, which is supposed to be the case?
EDIT: Here is my code which writes and reads
void writeNoiseFloat(string imageName, string fingerprintname) throw(){
    Mat noise = getNoise(imageName);
    FILE* fp = fopen(fingerprintname.c_str(),"wb");
    if (!fp){
        cout << "not found ";
        perror("fopen");
    }
    float *buffer = new float[noise.cols];
    for(int i=0;i<noise.rows;++i){
        for(int j=0;j<noise.cols;++j)
            buffer[j]=noise.at<float>(i,j);
        fwrite(buffer,sizeof(float),noise.cols,fp);
    }
    fclose(fp);
    free(buffer);
}
void readNoiseFloat(string fpath, Mat& data){
    clock_t start = clock();
    cout << fpath << endl;
    FILE* fp = fopen(fpath.c_str(),"rb");
    if (!fp) perror("fopen");
    int size = 1024;
    data.create(size,size,CV_32F);
    float* buffer = new float[size];
    for(int i=0;i<size;++i) {
        fread(buffer,sizeof(float),size,fp);
        for(int j=0;j<size;++j){
            data.at<float>(i,j)=buffer[j];
            cout << data.at<float>(i,j) << " " ;
            cout << data.at<double>(i,j);
        }
    }
    fclose(fp);
}
Thanks in advance,
First of all, you cannot mix float and double in one cv::Mat, as the storage itself is just an array of bytes. The size of this array differs between a matrix of float and a matrix of double.
So you have to decide which one you are using.
Essentially, data.at<type>(x,y) is equivalent to (type*)data_ptr[x][y] (note this is not exact code, its purpose is to show what is happening)
EDIT:
On the basis of code you added you are creating matrix of CV_32F this means that you must use float to write and read and element. Using of double causes reinterpretation of value and will definitely give you an incorrect result.
Regarding the assertion, I am sure that inside cv::Mat::at<T> there is something along the lines of the following code:
assert(sizeof(T) == this->elemSize());
Asserts are usually compiled only in DEBUG mode, which is why you do not get this error in RELEASE.
EDIT2:
Not related to the issue, but never use free() with new, or delete with malloc(). The result can be a hard-to-debug problem.
So please use delete[] for buffer.
Difference between debug & release:
There's a bug in your code; it just doesn't show up in Release mode. That is what the Debug build is for: it compiles in extra checks (like the assertion here) that tell you when something is wrong, while Release just runs through it...
The compiler also optimizes your code in Release so it runs faster; the Debug binary takes more space on your HD because it carries the symbols you need to actually DEBUG it.
Debug runtimes often fill uninitialized variables with marker values, while Release leaves them as whatever happens to be in memory, so uninitialized-variable bugs can behave differently between the two. This may vary on different compilers.
The problem is that I want to save a 2D vector of float in a file. since I am not familiar with C++ which is becoming troublesome, possible ways to solve are:
Serializing them in a string and write to a file.
Serializing them to binary data and write to a file.
Which one of the two methods could be more efficient in terms of speed?
I am doing something like:
std::string serialized;
for (int s = 0; s < (int) mfcc_features_a.size(); s++)
{
    for (int t = 0; t < (int) mfcc_features_a[s].size(); t++){
        serialized = serialized + "|" + boost::lexical_cast<std::string>(mfcc_features_a[s][t]);
    }
}
std::cout << "serialized string is: " << serialized << std::endl;
Storing binary data is liable to be somewhat faster, since the data will almost certainly be smaller. The difference may or may not be significant to the overall performance of your program: you'd have to measure in order to find out.
In C++03 there is a major inefficiency in your code. serialized = serialized + "|" + ... creates progressively longer copies of the full data, three copies per float value. Either use +=, or write the data directly to a stream. In C++11 you could solve it by writing serialized = std::move(serialized) + "|" + ...
While binary is certain to be faster to execute, it can be troublesome to code and debug, as floating point formats are poorly understood by most programmers. In this respect, the overall time programming plus execution could be slower.
Also, if portability of the data to any other machine is desired, almost certainly it is worthwhile to convert to a universally readable format.
Although not very C++'ish, I like to use the sprintf routine to format the floating point values up to a fixed width string (24 characters).
char *pData = new char[vec.size() * 24 + 1];
char *p = pData;
for (size_t i = 0; i < vec.size(); ++i, p += 24)
    sprintf(p, "%+.14E\r\n", vec[i]);
// ... write pData to file ...
delete[] pData;
Good luck!