We have a file in a data store (S3) that contains data as a byte[] (uploaded from Java).
When I download the file, the data arrives as a std::basic_streambuf (which should hold the same bytes). I now want to pass this data to another API that takes a uint8_t* as input.
How can I do that? Does it even make sense to do so?
I tried this:
// Assume streambuf is:
std::streambuf *buf;
std::stringstream ss;
ss << buf;
// Solution1
const std::string output1 = ss.str();
cout<<output1;
// This prints the whole data with some weird characters (i think weird characters are valid because data is in byte form). Upon converting output1 to uint8_t*, the final data contains only 20 characters/bytes.
// Solution2
uint8_t* finalString;
ss >> finalString;
cout<<finalString;
// This prints only initial 20 characters/bytes and the remaining part is skipped.
So with both Solution1 and Solution2, ultimate goal of getting uint8_t* of full data could not be achieved. What is the suggested way to do so?
You have to read your data out of the buffer (since the buffer itself can be streaming the data in as it's available). One possible implementation is something like this:
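vector<uint8_t> bytes;
// sbumpc() returns the next character and advances; it returns EOF when the buffer is exhausted,
// so an empty buffer yields an empty vector instead of a stray EOF value
for (int c = buf->sbumpc(); c != EOF; c = buf->sbumpc()) {
    bytes.push_back(static_cast<uint8_t>(c));
}
// your data is in bytes.data() of type uint8_t*, and its length is bytes.size()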
Of course if you know the number of bytes from the beginning instead of having to read the buffer to find out, simply pre-allocate the vector beforehand.
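The truncation to about 20 bytes described in the question is consistent with the data containing a zero byte: printing a char* or uint8_t* with operator<< stops at the first '\0', and ss >> into a pointer also stops at whitespace. That is an assumption about the data, not something the question confirms. A minimal sketch that drains the buffer without either problem, where read_all_bytes is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical helper: drain a stream buffer into a byte vector.
std::vector<std::uint8_t> read_all_bytes(std::streambuf* buf)
{
    assert(buf != nullptr);
    std::vector<std::uint8_t> bytes;
    // istreambuf_iterator reads raw characters: no whitespace skipping,
    // and no special treatment of '\0'.
    for (std::istreambuf_iterator<char> it(buf), end; it != end; ++it) {
        bytes.push_back(static_cast<std::uint8_t>(*it));
    }
    return bytes;
}
```

Pass bytes.data() together with bytes.size() to the target API. To inspect the content, print bytes.size() or the individual values rather than streaming the pointer to cout.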
Related
I have the following function (so far):
void read_binary_file(std::istream is,
ByteArray arr)
{
int length = is.tellg();
char *buffer = new char[length];
is.read(buffer, length);
// What to do next?
// The goal is to place istream buffer in my `ByteArray` class `values`class,
// ByteArray - an array of `float`, each item should be 4 bytes from the buffer
}
My goal is to place each 4 bytes from the buffer inside my ByteArray->values class. Each item should contain 4 bytes from the buffer.
ByteArray definition:
class ByteArray
{
....
float *values;
}
Limitations: I don't want to use stl/ vector classes.
I couldn't find an example with my current limitations.
Any idea how I can do that?
If I understand correctly, you want to create a ByteArray object and copy bytes from buffer to ByteArray::values[] as floats. Assuming that the file is opened in binary mode and contains floats dumped in the correct format and endianness, and that the total data in the file is a multiple of sizeof(float):
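#include <cstring> // std::memcpy

class ByteArray
{
private:
    float* values;
public:
    void set(char* buffer, int len)
    {
        values = new float[len/4];
        for(int itr = 0; itr < len/4; itr++)
        {
            // memcpy instead of *(float*)(buffer+itr*4): the cast can read
            // through a misaligned pointer and breaks the aliasing rules
            std::memcpy(&values[itr], buffer + itr*4, sizeof(float));
        }
    }
};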
...
arr.set(buffer, length);
Note that (i) smarter code is possible, but I kept it as simple as possible for clarity, and (ii) Ulrich is right: you should pass the istream by reference (and, for most practical purposes, the ByteArray as well):
void read_binary_file(std::istream& is,
ByteArray& arr)
...
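The function in the question calls tellg() without first seeking to the end, so length does not hold the file size. A minimal sketch of the whole read, without STL containers, might look like this. FloatArray and read_binary_floats are hypothetical names standing in for the question's ByteArray and read_binary_file, and the sketch assumes the stream is seekable and holds floats in the host's own format:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the ByteArray class in the question.
struct FloatArray {
    float* values = nullptr;
    std::size_t count = 0;
};

// Reads the whole stream as raw floats. Returns false on I/O failure.
bool read_binary_floats(std::istream& is, FloatArray& arr)
{
    // Seek to the end to learn the size, then rewind.
    is.seekg(0, std::ios::end);
    const std::streamoff length = is.tellg();
    is.seekg(0, std::ios::beg);
    if (!is || length < 0) return false;

    const std::size_t count = static_cast<std::size_t>(length) / sizeof(float);
    float* values = new float[count];
    // Reading straight into the float storage avoids a second buffer and a cast
    // from char* to float*.
    if (!is.read(reinterpret_cast<char*>(values),
                 static_cast<std::streamsize>(count * sizeof(float)))) {
        delete[] values;
        return false;
    }
    delete[] arr.values;   // release any previous contents
    arr.values = values;
    arr.count = count;
    return true;
}
```

The caller owns arr.values and must delete[] it; trailing bytes that do not fill a whole float are ignored.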
If you want to use istream to send bytes byte by byte you can say
arr.values=(float*)buffer;
or
arr.values=new float[length/4];
memcpy(arr.values,buffer,length);
delete[] buffer;
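This works as long as the bytes pass through the stream unmodified. A stream opened in text mode can treat some byte values specially on some platforms (on Windows, for example, 0x1A is read as end of file and line endings are translated), so a float whose bytes happen to contain such a value can cut the read short. If you cannot guarantee binary mode, I recommend not sending floats byte by byte through stringstreams. Send them another way, e.g. as hexadecimal text (that way you don't lose precision).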
What generated the file you want to read?
I've filled a stringstream with data in a loop, about 2 MB in total. Now I need to read the data back.
First, I used this loop:
while (ss >> str)
{}
At the end of the loop, str contains the last chunk of data. But this way I can't read the \r\n characters.
So I tried this instead:
int length = 20000000;
char * buffer = new char[length];
ss.read(buffer, length);
The buffer ends up containing a little more than half of the data, even after flushing.
Why is this?
I am taking input from a file in binary mode using C++; I read the data into unsigned ints, process them, and write them to another file. The problem is that sometimes, at the end of the file, there might be a little bit of data left that isn't large enough to fit into an int; in this case, I want to pad the end of the file with 0s and record how much padding was needed, until the data is large enough to fill an unsigned int.
Here is how I am reading from the file:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//TODO: read the remaining data and pad it here prior to processing
} else {
//output to error stream and exit with failure condition
}
The TODO in the code is where I'm having trouble. After the file input finishes and the loop exits, I need to read in the remaining data at the end of the file that was too small to fill an unsigned int. I need to then pad the end of that data with 0's in binary, recording enough about how much padding was done to be able to un-pad the data in the future.
How is this done, and is this already done automatically by C++?
NOTE: I cannot read the data into anything but an unsigned int, as I am processing the data as if it were an unsigned integer for encryption purposes.
EDIT: It was suggested that I simply read what remains into an array of chars. Am I correct in assuming that this will read in ALL remaining data from the file? It is important to note that I want this to work on any file that C++ can open for input and/or output in binary mode. Thanks for pointing out that I failed to include the detail of opening the file in binary mode.
EDIT: The files my code operates on are not created by anything I have written; they could be audio, video, or text. My goal is to make my code format-agnostic, so I can make no assumptions about the amount of data within a file.
EDIT: ok, so based on constructive comments, this is something of the approach I am seeing, documented in comments where the operations would take place:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//1: declare Char array
//2: fill it with what remains in the file
//3: fill the rest of it until it's the same size as an unsigned int
} else {
//output to error stream and exit with failure condition
}
The question, at this point, is this: is this truly format-agnostic? In other words, are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size? I should know this, I know, but I've got to ask it anyway.
Are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size?
File sizes are always a whole number of bytes. No C++ object can be smaller than a byte (a char), and a file is read as a sequence of char, so a file can never be, say, 11.25 bytes long.
Here is step one, two, and three as per your post:
while (fin >> m)
{
// ...
}
std::ostringstream buffer;
buffer << fin.rdbuf();
std::string contents = buffer.str();
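// pad with zero bytes up to a multiple of sizeof(unsigned int), remembering how many were added
std::size_t padding = (sizeof(unsigned int) - contents.size() % sizeof(unsigned int)) % sizeof(unsigned int);
contents.append(padding, '\0');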
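Note that fin >> m is formatted extraction: it parses decimal text rather than reading raw bytes, so on arbitrary binary files the loop usually stops at the first non-digit. A minimal sketch of the unformatted alternative uses read() together with gcount() to detect a short final block. PaddedBlocks and read_padded_blocks are hypothetical names, and the blocks are interpreted in the host's byte order:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the blocks read, plus the number of zero bytes
// appended to the last block (0 if the input size was an exact multiple).
struct PaddedBlocks {
    std::vector<unsigned int> blocks;
    std::size_t padding = 0;
};

PaddedBlocks read_padded_blocks(std::istream& in)
{
    PaddedBlocks result;
    char raw[sizeof(unsigned int)];
    for (;;) {
        in.read(raw, sizeof raw);
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;                      // nothing left to read
        if (got < sizeof raw) {                   // short final block: pad with zeros
            std::memset(raw + got, 0, sizeof raw - got);
            result.padding = sizeof raw - got;
        }
        unsigned int m;
        std::memcpy(&m, raw, sizeof m);           // reinterpret the bytes as one unsigned int
        result.blocks.push_back(m);
        if (got < sizeof raw) break;
    }
    return result;
}
```

Storing result.padding alongside the output (for example in a trailer or header of your own design) is what lets you strip the padding again later.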
The reader and writer
#include<string>
#include<fstream>
#include<memory>
class BinarySearchFile{
BinarySearchFile::BinarySearchFile(std::string file_name){
// concatenate extension to fileName
file_name += ".dat";
// form complete table data filename
data_file_name = file_name;
// create or reopen table data file for reading and writing
binary_search_file.open(data_file_name, std::ios::binary); // create file
if(!binary_search_file.is_open()){
binary_search_file.clear();
binary_search_file.open(data_file_name, std::ios::out | std::ios::binary);
binary_search_file.close();
binary_search_file.open(data_file_name), std::ios::out | std::ios::in | std::ios::binary | std::ios::ate;
}
std::fstream binary_search_file;
void BinarySearchFile::writeT(std::string attribute){
if(binary_search_file){
binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
}
}
std::string BinarySearchFile::readT(long filePointerLocation, long sizeOfData)
{
if(binary_search_file){
std::string data;
data.resize(sizeOfData);
binary_search_file.seekp(filePointerLocation);
binary_search_file.seekg(filePointerLocation);
binary_search_file.read(&data[0], sizeOfData);
return data;
}
};
The reader call
while (true){
std::unique_ptr<BinarySearchFile> data_file(new BinarySearchFile("classroom.dat"));
std::string attribute_value = data_file->read_data(0, 20);
}
The writer call
data_file->write_data("packard ");
The writer writes a total of 50 bytes
"packard 101 500 "
The reader is to read the first 20 bytes and the result is "X packard X" where X represents some malformed bytes of data. Why is the data read back in x-number of bytes corrupt?
You can't simply write data out by casting its address to a char* and hope to get anything useful. You have to define the binary format you want to use and implement it. In the case of std::string, this may mean outputting the length in some format, then the actual data. Or, where fixed-length fields are needed, forcing the string (or a copy of the string) to that length using std::string::resize, then outputting that, using std::string::data() to get your char const*.
Reading will, of course, be similar. You'll read the data into a std::vector<char> (or, for fixed-length fields, a char[]), and parse it.
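As an illustration of the length-then-data format, here is a minimal sketch. write_string and read_string are hypothetical helpers, and the 4-byte length in the host's byte order is an assumption of this sketch; a portable format would fix the byte order explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes a 4-byte length (host byte order, an assumption of this sketch)
// followed by the characters of the string.
void write_string(std::ostream& out, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Reads back one string; returns false if the stream ends early.
bool read_string(std::istream& in, std::string& s)
{
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    std::string tmp(len, '\0');
    if (len != 0 && !in.read(&tmp[0], static_cast<std::streamsize>(len))) return false;
    s.swap(tmp);
    return true;
}
```

Casting &len to char* is fine because len is a plain integer whose bytes are the value; the same cast on a std::string object is not, because the object holds internal bookkeeping rather than the characters. A production reader would also want to sanity-check len before allocating.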
binary_search_file.write(reinterpret_cast<char *>(&attribute), attribute.length() * 2);
It is incorrect to cast the address of a std::string to char*. If you need a char*, you must use attribute.c_str().
Apart from the pointer to the characters, a std::string contains other data members (for example, an allocator), and your code writes all of that data to the file. Also, I don't see any reason to multiply the string length by 2; adding 1 makes sense if you want to output the terminating zero.
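For the fixed-length-field variant, a minimal sketch could look like the following. write_field and read_field are hypothetical helpers, and the choice of space as the padding character is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes s as a field of exactly `width` bytes, truncating it or padding it with spaces.
void write_field(std::ostream& out, std::string s, std::size_t width)
{
    s.resize(width, ' ');
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Reads `width` bytes starting at byte offset `pos`; returns an empty string on failure.
std::string read_field(std::istream& in, std::streamoff pos, std::size_t width)
{
    std::string data(width, '\0');
    in.seekg(pos, std::ios::beg);
    if (!in.read(&data[0], static_cast<std::streamsize>(width))) return std::string();
    return data;
}
```

The write goes through s.data() and s.size(), so only the characters themselves reach the file, and every record has a predictable offset.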
I am doing a synchronous read/write using boost-asio. The data is coming in binary format, without boundary, the length information is encoded in the packet format. So it is important to read in with specified size. Can ip::tcp::iostream do that? Can someone provide an example? Thanks.
Simple:
boost::asio::read(socket, buffers, boost::asio::transfer_exactly(your_fixed_size));
I work on a program which sends data of varying sizes. I use a fixed 8-byte header to encode the size, then append the data:
enum { header_length = 8 }; //const header length
First I get the size (m_outbound_data is a std::string holding a serialized object):
//give header length
std::ostringstream header_stream;
header_stream << std::setw(header_length) //set a field padding for header
<< std::hex //set next val to hexadecimal
<< m_data_out.m_outbound_data.size(); //write size in hexa
m_data_out.m_outbound_header = header_stream.str(); //m_outbound_head == size in hexa in a std::string
//m_outbound_header = [ 8 byte size ]
//m_outbound_data = [ serialized data ]
//write all data in the std::vector and send it
std::vector<boost::asio::const_buffer> buffer;
buffer.push_back(boost::asio::buffer(m_data_out.m_outbound_header));
buffer.push_back(boost::asio::buffer(m_data_out.m_outbound_data));
Reading takes two steps: first read 8 bytes to get the size, then read the data into a vector and deserialize it into an object:
struct network_data_in {
char m_inbound_header[header_length]; //size of data to read
std::vector<char> m_inbound_data; // read data
};
I use this struct to receive data: call read on m_inbound_header first to fill it with the size, then, in the handler:
//get size of data
std::istringstream is(std::string(m_data_in.m_inbound_header, header_length));
std::size_t m_inbound_datasize = 0;
is >> std::hex >> m_inbound_datasize;
m_data_in.m_inbound_data.resize(m_inbound_datasize); //resize the vector
Then call read again with m_inbound_data as the buffer; this reads exactly the data that was sent.
In the second handle_read you just have to deserialize the data:
//extract data
std::string archive_data (&(m_data_in.m_inbound_data[0]),m_data_in.m_inbound_data.size());
std::istringstream archive_stream(archive_data);
boost::archive::text_iarchive archive(archive_stream);
archive >> t; //deserialize
Hope that helps!
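The header encoding and decoding steps can be tried out on their own, without any networking. A minimal sketch, assuming the same 8-character hexadecimal header as above; make_header and parse_header are hypothetical helper names, not part of Boost:

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

constexpr std::size_t header_length = 8;

// Encodes the payload size as 8 hex characters, left-padded with spaces.
// Returns an empty string if the size does not fit in the header.
std::string make_header(std::size_t payload_size)
{
    std::ostringstream header_stream;
    header_stream << std::setw(static_cast<int>(header_length))
                  << std::hex << payload_size;
    const std::string header = header_stream.str();
    if (header.size() != header_length) return std::string();
    return header;
}

// Parses a header produced by make_header; returns false if it is not a hex number.
bool parse_header(const std::string& header, std::size_t& payload_size)
{
    if (header.size() != header_length) return false;
    std::istringstream is(header);
    return static_cast<bool>(is >> std::hex >> payload_size);
}
```

Checking the header size in make_header matters because setw only sets a minimum width: a payload larger than 0xffffffff bytes would otherwise silently produce a longer header and desynchronize the reader.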
TCP is a stream-based protocol. This means that whatever you read is just a stream of bytes.
Let's consider an example: you have a message of a fixed size and you send it over TCP. How can the program at the other end read the entire message? There are two ways. One is to surround your message with control characters (e.g. STX at the start and ETX at the end). At the start, the program would discard any chars before STX, then read any other chars into the message buffer until ETX is encountered.
Another way is to encode the message length in a fixed-size header (which apparently is your case). So the best thing you can do is figure out a way to read the message length, parse it and read the remaining bytes accordingly.
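For comparison, the delimiter-based approach can be sketched as follows. extract_messages is a hypothetical helper that is fed whatever bytes have been received so far; it assumes the payload never contains the STX or ETX bytes themselves, which is why a length header is usually the better fit for binary data.

```cpp
#include <cassert>
#include <string>
#include <vector>

const char STX = '\x02';
const char ETX = '\x03';

// Extracts every complete STX...ETX message from the accumulated bytes.
// Consumed bytes are erased; an incomplete trailing message stays in `buffer`.
std::vector<std::string> extract_messages(std::string& buffer)
{
    std::vector<std::string> messages;
    for (;;) {
        const std::string::size_type start = buffer.find(STX);
        if (start == std::string::npos) {
            buffer.clear();            // only noise before any STX: discard it
            break;
        }
        const std::string::size_type end = buffer.find(ETX, start + 1);
        if (end == std::string::npos) {
            buffer.erase(0, start);    // keep the partial message for the next read
            break;
        }
        messages.push_back(buffer.substr(start + 1, end - start - 1));
        buffer.erase(0, end + 1);
    }
    return messages;
}
```

Because TCP delivers a byte stream, a single read may contain half a message or several messages; appending each read to the buffer and calling the function again handles both cases.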