How to set maximum read length for a stream in C++?

I'm reading data from a stream into a char array of a given length, and I'd like to cap the length of each read so that the result always fits in that char array.
The reason I use a char array is that part of my specification is that the length of any individual token cannot exceed a certain value, so I'm saving myself some constructor calls.
I thought width() did what I wanted, but I was apparently wrong...
EDIT: I'm using the stream extraction operators to perform the extraction, since these are flat text files with values separated by whitespace.

If you're processing text, you're looking for the get function: http://cppreference.com/wiki/io/get
const int size = 200;
char myArray[size] = {};
cin.get(myArray, size);
Note: at most size - 1 characters are read (get also stops at a newline), which leaves room for the null terminator in myArray.
If it's raw data, you'd probably prefer read: http://cppreference.com/wiki/io/read
const int size = 200;
char myArray[size] = {};
cin.read(myArray, size);
Exactly size bytes are read unless the stream ends first, and no terminator is added.

char x[4];
cin.width(4);
cin >> x;
cout << x;
Input: "abcdef"
Output: "abc"
(x[3] is null terminating char)
Width works fine in this case.
Note: Empirical testing indicates that the cin.width call only lasts for one stream operation. It may be more convenient to use cin >> setw(4) >> x; instead, though this requires iomanip.
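Since the width has to be set again before every extraction, a loop over whitespace-separated tokens might look like the sketch below. The names read_tokens and kMaxToken are made up for illustration, and the limit of 4 just mirrors the example above.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical limit: size of the token buffer, including the terminator.
constexpr std::size_t kMaxToken = 4;

// Reads whitespace-separated tokens of at most kMaxToken - 1 characters.
// A longer word is split across several reads instead of overflowing buf.
std::vector<std::string> read_tokens(std::istream& in) {
    std::vector<std::string> tokens;
    char buf[kMaxToken];
    // The width resets to 0 after each extraction, so setw is applied
    // again on every pass through the loop.
    while (in >> std::setw(static_cast<int>(kMaxToken)) >> buf) {
        tokens.emplace_back(buf);
    }
    return tokens;
}
```

Feeding it a std::istringstream holding "abcdef gh" should give the tokens "abc", "def" and "gh". If splitting an over-long token is not what your specification wants, you would have to detect that case yourself.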

Related

ifstream not reading the same characters as they are written in the file

Simple explanation: ifstream's get() is reading the wrong chars (console is different from file) and I need to know why.
I am recording registers into a file as a char array. The write succeeds: when I open the file I find the chars I intended, except that Notepad apparently shows the Unicode character U+0000 (NULL) as a space.
For instance, the entries
id = 1000; //an 8-byte long long
name = "stack"; //variable size
surname = "overflow"; //variable size
degree = "internet"; //variable size
sex = 'c'; //1-byte char
birthdate = 256; //4-byte int
become this on the file:
& èstackoverflowinternetc
or, putting the number of unicode characters that disappear when posted here between brackets:
&[3]| [1]è|stack|overflow|internet|c| [1] | //separating each section with a | for easier reading. Some unicode characters disappear when I post them here, but I assure you they are the correct ones
SIZE| ID | name| surname| degree |g| birth
(writing is working fine and puts the expected characters)
Trouble is, when the console in the code below prints what the buffer is reading from the file, it gives me the following record (extra spaces included)
Þstackoverflowinternetc
This is bad because it gives me the wrong ID and birthdate: either "-21" and "4747968", or "Ù" and "-1066252288". The other fields are unaffected. It's weird, because the size bytes show up as empty space in the console, so it shouldn't be able to split name, surname, degree and sex.
ifstream infile("alumni.freire", ios::binary);
if(infile.is_open()){
infile.seekg(pos, ios::beg);
int size;
size = infile.get();
char charreg[size];
charreg[0] = size;
//testing what buffer gives me
for(int i = 1; i < size; i++){
charreg[i] = infile.get();
cout << charreg[i];
}
}
What am I doing wrong?
EDIT: to explain better what I did:
I get the entries in the first "code" section from user input and use them as parameters when creating a "reg" class I implemented. The reg class then does the conversion to strings (adequately, I've already tested it) and calculates a hidden four-element char array containing the instance size, name size, surname size and degree size. When the program writes the class to the file, it is written perfectly, as I showed in the second "code" section. (If you do the calculations you'll see '&' equals the size of the entire thing, for example.) When I read it from the file, it appears differently on the console for some reason: different characters. But it reads the right number of characters, because "name", "surname" and "degree" appear correctly.
EDIT n2: I made "charreg[]" into an int array and printed it and the values are correct. I have no idea what's happening anymore.
EDIT n3: Apparently the reason I was getting the wrong chars is that I should have used unsigned chars...
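To illustrate the signed/unsigned point: on platforms where plain char is signed, a byte such as 0xE8 comes back as -24, and that negative value then corrupts any integer rebuilt from it. The sketch below is not the asker's code. decode_le is a hypothetical helper, and it assumes the integer was written least significant byte first, as the file dump above suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rebuilds a little-endian integer from raw bytes. Each byte goes through
// unsigned char first, so 0xE8 contributes 232 rather than -24.
std::uint64_t decode_le(const char* bytes, std::size_t count) {
    assert(count <= 8);  // a 64-bit result holds at most 8 bytes
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const auto b = static_cast<unsigned char>(bytes[i]);
        value |= static_cast<std::uint64_t>(b) << (8 * i);
    }
    return value;
}
```

With the eight ID bytes from the example (0xE8, 0x03, then six zero bytes), decode_le returns 1000.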
Writing your structure to the file as-is is a good idea, but your approach to reading it back is wrong.
You need some way to tell where each field ends.
For example, you know that your ID is 8 bytes long, so you can read exactly 8 bytes:
long long id;
read(fd, &id, 8);
In your example you got -24 because you read only the first byte of the full ID (1000 is 0x03E8, and 0xE8 read as a signed char is -24).
But for the rest of the file, how can you know the length of the first name and the last name?
You could read byte by byte until you find a null byte.
But I suggest using a more structured file.
For example, you can define a structure like this:
long long id; // 8 bytes
char firstname[256]; // 256 bytes
char lastname[256]; // 256 bytes
char sex; // 1 byte
int birthdate; // 4 bytes
With this structure you can read and write very easily:
struct my_struct s;
read(fd, &s, sizeof(struct my_struct)); // read the whole record: 8+256+256+1+4 bytes, plus any padding
s.birthdate = 128;
write(fd, &s, sizeof(struct my_struct)); // write the structure
Of course, you lose the variable length of the first name and last name. Do you really need more than 100 chars for a name?
If you really need it, you could put a header in front of each variable-length value, but then you lose the ability to read everything at once.
long long id;
int foo_size;
char *foo;
And then to read it:
struct my_struct s;
read(fd, &s, 12); // read the header, 8 + 4 bytes
char foo[s.foo_size];
read(fd, foo, s.foo_size); // read the value itself into foo, not into s
You should define exactly what you need to save. Define a precise data structure whose layout you can easily deduce when reading, and avoid things like "oh, let's read until a null byte".
I used C functions for the explanation because they make it more explicit: you know what you read and what you write.
Start by playing with this, and then try the same with C++ streams and functions.
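For the "same with C++ streams" step, one possible shape for the header-plus-value layout is sketched below. Record, write_record and read_record are hypothetical names, the size header is assumed to be a 32-bit integer, and the integers are written in the host's byte order, so the file is only readable on a similar machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record mirroring the fields above: a fixed-size id
// followed by one variable-length value.
struct Record {
    long long id = 0;
    std::string foo;
};

// Writes id, then a 4-byte size header, then the raw characters.
// Fields are written one by one, so struct padding never reaches the file.
void write_record(std::ostream& out, const Record& r) {
    const auto size = static_cast<std::int32_t>(r.foo.size());
    out.write(reinterpret_cast<const char*>(&r.id), sizeof r.id);
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(r.foo.data(), size);
}

// Reads the fixed part first, then exactly `size` characters.
// Returns false if the stream ends early or the header is negative.
bool read_record(std::istream& in, Record& r) {
    std::int32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&r.id), sizeof r.id)) return false;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size) || size < 0) return false;
    r.foo.assign(static_cast<std::size_t>(size), '\0');
    return size == 0 || static_cast<bool>(in.read(&r.foo[0], size));
}
```

Round-tripping a record through a std::stringstream (or an fstream opened with ios::binary) should give back the same id and string, with no terminator needed in the file.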
I don't know how you are writing back information to the file but here is how I would do that, I'm hoping this is a fairly simple way of doing it. Keep in mind I have no idea what kind of file you are actually working with.
long long id = 1000;
std::string name = "name";
std::string surname = "overflow";
std::string degree = "internet";
unsigned char sex = 'c';
int birthdate = 256;
ofstream outfile("test.txt", ios::binary);
if (outfile.is_open())
{
const char* idBytes = static_cast<char*>(static_cast<void*>(&id));
const char* nameBytes = name.c_str();
const char* surnameBytes = surname.c_str();
const char* degreeBytes = degree.c_str();
const char* birthdateBytes = static_cast<char*>(static_cast<void*>(&birthdate));
outfile.write(idBytes, sizeof(id));
outfile.write(nameBytes, name.length());
outfile.write(surnameBytes, surname.length());
outfile.write(degreeBytes, degree.length());
outfile.put(sex);
outfile.write(birthdateBytes, sizeof(birthdate));
outfile.flush();
outfile.close();
}
and here is how I am going to output it, which to me seems to be coming out as expected.
ifstream infile("test.txt", std::ifstream::ate | ios::binary);
if (infile.is_open())
{
std::size_t fileSize = infile.tellg();
infile.seekg(0);
for (std::size_t i = 0; i < fileSize; i++)
{
char c = infile.get();
std::cout << c;
}
std::cout << std::endl;
}

Convert a 16-bit integer to an array of char? (C++)

I need to write 16-bit integers to a file. fstream only writes characters. Thus I need to convert the integers to char - the actual integer, not the character representing the integer (i.e. 0 should be 0x00, not 0x30) I tried the following:
char * chararray = (char*)(&the_int);
However this creates a backwards array of two characters. The individual characters are not flipped, but the order of the characters is. Thus I created this function:
char * inttochar(uint16_t input)
{
int input_size = sizeof(input);
char * chararray = (char*)(&input);
char * output;
output[0]='\0';
for (int i=0; i<input_size; i++)
{
output[i]=chararray[input_size-(i+1)];
}
return output;
}
This seems slow. Surely there is a more efficient, less hacky way to convert it?
It's a bit hard to understand what you're asking here (perhaps it's just me, although I gather the commentators thought so too).
You write
fstream only writes characters
That's true, but doesn't necessarily mean you need to create a character array explicitly.
E.g., if you have an fstream object f (opened in binary mode), you can use the write method:
uint16_t s;
...
f.write(reinterpret_cast<const char *>(&s), sizeof(uint16_t)); // static_cast cannot convert uint16_t* to const char*
As others have noted, when you serialize numbers, it often pays to use a commonly-accepted ordering. Hence, use htons (refer to the documentation for your OS's library):
uint16_t s;
...
const uint16_t ns = htons(s);
f.write(reinterpret_cast<const char *>(&ns), sizeof(uint16_t)); // static_cast cannot convert uint16_t* to const char*
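If you would rather not depend on htons, the same big-endian ordering can be produced with shifts, which behaves the same whatever the host's byte order is. This is only a sketch, and to_big_endian and write_u16_be are illustrative names rather than library functions.

```cpp
#include <array>
#include <cstdint>
#include <ostream>

// Splits a 16-bit value into two bytes, most significant byte first.
std::array<unsigned char, 2> to_big_endian(std::uint16_t value) {
    return {{static_cast<unsigned char>(value >> 8),
             static_cast<unsigned char>(value & 0xFF)}};
}

// Writes the two bytes to a stream that was opened in binary mode.
void write_u16_be(std::ostream& out, std::uint16_t value) {
    const std::array<unsigned char, 2> bytes = to_big_endian(value);
    out.write(reinterpret_cast<const char*>(bytes.data()), bytes.size());
}
```

For example, write_u16_be(f, 0x1234) emits the byte 0x12 followed by 0x34.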

Serialize/deserialize unsigned char

I'm working on an API for an embedded device, and need to display an image generated (by the API). The screen attached to the device allows me to render bitmaps, with data stored as unsigned char image[] = { 0B00000000, 0B00001111, 0B11111110... }.
What is the easiest way to deserialize a string in whatever format needed?
My approach was to create a stringstream, separate by comma and push to vector<char>. However, the function to render bitmaps will only accept char, and from what I can find online it seems to be quite difficult to convert it. Ideally, I'd rather not use a vector at all, as including it adds several kbs to the project, which is limited in size by both the download speed of the embedded device (firmware is transferred by EDGE) and the onboard storage.
From the comments, it sounds like you want to convert a string composed of a series of "0b00000000" style literals, comma separated, into an array of their actual values. The way I would do this is to:
1. Get the number of bytes in the image (I assume this is known from the string length?).
2. Create a buffer of unsigned char to hold the results (a std::vector, or a plain array if you want to avoid vector).
3. For each byte in the input, construct a std::bitset from the string value, and then get its actual value.
Here's a code example. Since you have said you'd rather not use vector I have used C-style arrays and strings:
#include <bitset>
#include <cstring>
#include <iostream>
#include <memory>
int main() {
auto input = "0b00000000,0b00001111,0b11111111";
auto length = strlen(input);
// Get the number of bytes from the string length. Each byte takes 10 chars
// plus a comma separator.
int size = (length + 1) / 11;
// Allocate memory to hold the result.
std::unique_ptr<unsigned char[]> bytes(new unsigned char[size]);
// Populate each byte individually.
for (int i = 0; i < size; ++i) {
// Create the bitset. The stride is 11, and skip the first 2 characters
// to skip the 0b prefix.
std::bitset<8> bitset(input + 2 + i * 11, 8);
// Store the resulting byte.
bytes[i] = bitset.to_ulong();
}
// Now loop back over each byte, and output it to confirm the result.
for (int i = 0; i < size; ++i) {
std::cout << "0b" << std::bitset<8>(bytes[i]) << std::endl;
}
}
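If even bitset and unique_ptr are more than the firmware budget allows, a lighter variant could parse into a caller-supplied buffer with strtoul in base 2. This is a sketch under assumptions: parse_bitmap is a hypothetical name, and the input is assumed to contain no whitespace and to use a "0b" or "0B" prefix on every value.

```cpp
#include <cstddef>
#include <cstdlib>

// Parses "0b00000000,0b00001111,..." into out, storing at most capacity
// bytes. Returns the number of bytes stored; stops at the first bad value.
std::size_t parse_bitmap(const char* text, unsigned char* out, std::size_t capacity) {
    std::size_t count = 0;
    while (*text != '\0' && count < capacity) {
        // Skip the prefix by hand; older strtoul versions do not accept it.
        if (text[0] == '0' && (text[1] == 'b' || text[1] == 'B')) text += 2;
        char* end = nullptr;
        const unsigned long value = std::strtoul(text, &end, 2);
        if (end == text || value > 0xFF) break;  // no digits, or wider than a byte
        out[count++] = static_cast<unsigned char>(value);
        text = end;
        if (*text == ',') ++text;  // step over the separator
    }
    return count;
}
```

The caller owns the buffer, so it can be a static array sized for the largest image the device has to display.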

How to capture length of sscanf'd string?

I'm parsing a string that follows a predictable pattern:
1. 1 character
2. an integer (one or more digits)
3. 1 colon
4. a string, whose length came from #2
For example:
s5:stuff
I can see easily how to parse this with PCRE or the like, but I'd rather stick to plain string ops for the sake of speed.
I know I'll need to do it in 2 steps because I can't allocate the destination string until I know its length. My problem is gracefully getting the offset for the start of said string. Some code:
unsigned start = 0;
char type = serialized[start++]; // get the type tag
int len = 0;
char* dest = NULL;
char format[20];
//...
switch (type) {
//...
case 's':
// Figure out the length of the target string...
sscanf(serialized + start, "%d", &len);
// <code type='graceful'>
// increment start by the STRING LENGTH of whatever %d was
// </code>
// Don't forget to skip over the colon...
++start;
// Build a format string which accounts for length...
sprintf(format, "%%%ds", len);
// Finally, grab the target string...
sscanf(serialized + start, format, string);
break;
//...
}
That code is roughly taken from what I have (which isn't complete because of the issue at hand) but it should get the point across. Maybe I'm taking the wrong approach entirely. What's the most graceful way to do this? The solution can be in either C or C++ (and I'd actually like to see the competing methods if there are enough responses).
You can use the %n conversion specifier, which doesn't consume any input - instead, it expects an int * parameter, and writes the number of characters consumed from the input into it:
int consumed;
sscanf(serialized + start, "%d%n", &len, &consumed);
start += consumed;
(But don't forget to check that sscanf() returned > 0!)
Use the %n format specifier to write the number of characters read so far to an integer argument.
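Putting %n together with the rest of the format, a whole "s5:stuff" style token could be parsed in one sscanf call plus a copy. This is a sketch rather than a drop-in answer: parse_tagged is a made-up name, and the payload is copied into a std::string so no manual allocation is needed.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Parses "<type><len>:<payload>", for example "s5:stuff".
// Returns false if the input is malformed or shorter than len promises.
bool parse_tagged(const char* serialized, char& type, std::string& payload) {
    int len = 0;
    int consumed = 0;
    // %c reads the tag, %d the length, ':' must match literally, and %n
    // stores the number of characters consumed so far. %n is not counted
    // in the return value, and it is never reached if ':' fails to match.
    if (std::sscanf(serialized, "%c%d:%n", &type, &len, &consumed) != 2 ||
        consumed == 0 || len < 0) {
        return false;
    }
    const char* start = serialized + consumed;
    if (std::strlen(start) < static_cast<std::size_t>(len)) return false;
    payload.assign(start, static_cast<std::size_t>(len));
    return true;
}
```

For the input "s5:stuffXYZ" this yields type 's' and payload "stuff". Leaving consumed at 0 beforehand is what lets the code detect a missing colon.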
Here's a C++ solution. It could be better, and it is hard-coded to deal with your example input, but it shouldn't require much modification to get working.
std::stringstream ss;
char type;
unsigned length;
char dummy;
std::string value;
ss << "s5:Helloxxxxxxxxxxx";
ss >> type;
ss >> length;
ss >> dummy;
ss.width(length);
ss >> value;
std::cout << value << std::endl;
Disclaimer:
I'm a noob at C++.
You can probably just use atoi which will ignore the colon.
e.g. len = atoi(serialized + start);
The only thing with atoi is that if it returns zero it could mean either the conversion failed, or that the length was truly zero. So it's not always the most appropriate function.
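If that ambiguity matters, strtol is the usual alternative: it reports where parsing stopped, so "no digits at all" can be told apart from a genuine zero, and the end pointer doubles as the offset of the colon. A minimal sketch, with parse_length as a hypothetical helper name:

```cpp
#include <cerrno>
#include <cstdlib>

// Parses a non-negative decimal length at text. On success, stores the value
// and a pointer just past the digits. Returns false if no digits were found,
// the number is negative, or it does not fit in a long.
bool parse_length(const char* text, long& value, const char*& rest) {
    char* end = nullptr;
    errno = 0;
    const long parsed = std::strtol(text, &end, 10);
    if (end == text || errno == ERANGE || parsed < 0) return false;
    value = parsed;
    rest = end;
    return true;
}
```

Called on "5:stuff" it stores 5 and leaves rest pointing at the colon, while ":oops" is rejected instead of being reported as length 0.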
If you replace your colon with a space, scanf will stop on it and you can get the size, malloc that size, and then run another scanf to get the rest of the string:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, const char * argv[]) {
char foo[20];
char *test;
scanf("%19s", foo); // input "hello world"; the width keeps foo from overflowing
printf("foo = %s\n", foo); // prints hello
// get size
test = malloc(sizeof(char) * 10); // replace 10 with your string size + 1
if (test == NULL) return 1;
scanf("%9s", test);
printf("test = %s\n", test); // prints world
free(test);
return 0;
}
Seems like the format is overspecified... (using a variable length field to specify the length of a variable length field).
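If you're using GCC with glibc, I'd suggest the following. (The a flag in %as is a GNU extension that allocates the buffer for you; POSIX.1-2008 standardizes the same feature as %ms.)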
if (sscanf(serialized,"%c%d:%as",&type,&len,&dest)<3) return -1;
/* use type, dest; ignore len */
free(dest);
return 0;

Very strange char array behaviour

unsigned int fname_length = 0;
//fname length equals 30
file.read((char*)&fname_length,sizeof(unsigned int));
//fname contains random data as you would expect
char *fname = new char[fname_length];
//fname contains all the data 30 bytes long as you would expect, plus 18 bytes of random data on the end (intellisense display)
file.read((char*)fname,fname_length);
//m_material_file (std:string) contains all 48 characters
m_material_file = fname;
// count = 48
int count = m_material_file.length();
Now, when I try it this way, IntelliSense still shows the 18 bytes of data after I set the char array to all ' ', and I get exactly the same results, even without the file read:
char name[30];
for(int i = 0; i < 30; ++i)
{
name[i] = ' ';
}
file.read((char*)fname,30);
m_material_file = name;
int count = m_material_file.length();
Any idea what's going wrong here? It's probably something completely obvious, but I'm stumped!
Thanks
Sounds like the string in the file isn't null-terminated, and intellisense is assuming that it is. Or perhaps when you wrote the length of the string (30) into the file, you didn't include the null character in that count. Try adding:
fname[fname_length] = '\0';
after the file.read(). Oh yeah, you'll need to allocate an extra character too:
char * fname = new char[fname_length + 1];
I guess that intellisense is trying to interpret char* as C string and is looking for a '\0' byte.
fname is a char* so both the debugger display and m_material_file = fname will be expecting it to be terminated with a '\0'. You're never explicitly doing that, but it just happens that whatever data follows that memory buffer has a zero byte at some point, so instead of crashing (which is a likely scenario at some point), you get a string that's longer than you expect.
Use
m_material_file.assign(fname, fname + fname_length);
which removes the need for the zero terminator. Also, prefer std::vector to raw arrays.
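As a rough illustration of both suggestions, the helper below reads into a std::vector and builds the string from an explicit range, so no terminator is ever required. read_chars is a made-up name, and using gcount() to cope with short reads is an addition that was not in the original code.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads up to count characters and returns exactly what was read.
std::string read_chars(std::istream& in, std::size_t count) {
    std::vector<char> buffer(count);
    in.read(buffer.data(), static_cast<std::streamsize>(count));
    // gcount() reports how many characters the last read delivered,
    // which can be fewer than requested if the stream ended early.
    const auto got = static_cast<std::ptrdiff_t>(in.gcount());
    return std::string(buffer.begin(), buffer.begin() + got);
}
```

With this, the question's code would become something like m_material_file = read_chars(file, fname_length); and the new[] disappears.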
std::string::operator=(char const*) is expecting a sequence of bytes terminated by a '\0'. You can solve this with any of the following:
extend fname by a character and add the '\0' explicitly as others have suggested or
use m_material_file.assign(&fname[0], &fname[fname_length]); instead or
use repeated calls to file.get(ch) and m_material_file.push_back(ch)
Personally, I would use the last option since it eliminates the explicitly allocated buffer altogether. One fewer explicit new is one fewer chance of leaking memory. The following snippet should do the job:
std::string read_name(std::istream& is) {
unsigned int name_length;
std::string file_name;
if (is.read((char*)&name_length, sizeof(name_length))) {
for (unsigned int i=0; i<name_length; ++i) {
char ch;
if (is.get(ch)) {
file_name.push_back(ch);
} else {
break;
}
}
}
return file_name;
}
Note:
You probably don't want to use sizeof(unsigned int) to determine how many bytes to write to a binary file. The number of bytes read or written then depends on the compiler and platform. If you have a maximum length, use it to choose a specific byte size to write out. If the length is guaranteed to be at most 255 bytes, write only a single byte for the length. Then your code will not depend on the byte size of intrinsic types.
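A one-byte length header along those lines might be handled as sketched below. The names write_short_name and read_short_name are hypothetical, and the 255-character cap is the assumption taken from the note above.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes a one-byte length (0..255) followed by the characters.
// Returns false without writing anything if the name is too long.
bool write_short_name(std::ostream& out, const std::string& name) {
    if (name.size() > 255) return false;
    out.put(static_cast<char>(static_cast<unsigned char>(name.size())));
    out.write(name.data(), static_cast<std::streamsize>(name.size()));
    return static_cast<bool>(out);
}

// Reads the one-byte length, then exactly that many characters.
bool read_short_name(std::istream& in, std::string& name) {
    const int first = in.get();  // kept as int so EOF stays distinguishable
    if (first == std::char_traits<char>::eof()) return false;
    name.assign(static_cast<std::size_t>(first), '\0');
    if (first == 0) return true;
    return static_cast<bool>(in.read(&name[0], first));
}
```

Because the header is a single byte, the file layout no longer depends on sizeof(unsigned int) or on byte order.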