I have a struct representing ASCII data:
struct LineData
{
char _asciiData[256];
uint8_t _asciiDataLength;
}
created using:
std::string s = "some data here";
memcpy(obj._asciiData, s.length());
obj._asciiDataLength = s.length();
How do I write the char array to file as ASCII, in the lowest latency? I want to avoid the intermediary stage of creating a temporary std::string.
I tried:
file.write((char *)obj._asciiData, sizeof(obj._asciiDataLength));
file << std::endl;
but my file just contains '0' each line.
That's because sizeof(obj._asciiDataLength) is probably 1 on your system so only one character is written. You want the actual length, not the size of the uint8_t:
file.write(obj._asciiData, obj._asciiDataLength);
Related
I want to store multiple arrays which all entries consist of either 0 or 1.
This file would be quite large if i do it the way i do it.
I made a minimalist version of what i currently do.
#include <iostream>
#include <fstream>
using namespace std;
int main(){
ofstream File;
File.open("test.csv");
int array[4]={1,0,0,1};
for(int i = 0; i < 4; ++i){
File << array[i] << endl;
}
File.close();
return 0;
}
So basically is there a way of storing this in a binary file or something, since my data is 0 or 1 in the first place anyways?
If yes, how to do this? Can i also still have line-breaks and maybe even commas in that file? If either of the latter does not work, that's also fine. Just more importantly, how to store this as a binary file which has only 0 and 1 so my file is smaller.
Thank you very much!
So basically is there a way of storing this in a binary file or something, since my data is 0 or 1 in the first place anyways? If yes, how to do this? Can i also still have line-breaks and maybe even commas in that file? If either of the latter does not work, that's also fine. Just more importantly, how to store this as a binary file which has only 0 and 1 so my file is smaller.
The obvious solution is to take 64 characters, say A-Z, a-z, 0-9, and + and /, and have each character code for six entries in your table. There is, in fact, a standard for this called Base64. In Base64, A encodes 0,0,0,0,0,0 while / encodes 1,1,1,1,1,1. Each combination of six zeroes or ones has a corresponding character.
This still leaves commas, spaces, and newlines free for your use as separators.
If you want to store the data as compactly as possible, I'd recommend storing it as binary data, where each bit in the binary file represents one boolean value. This will allow you to store 8 boolean values for each byte of disk space you use up.
If you want to store arrays whose lengths are not multiples of 8, it gets a little bit more complicated since you can't store a partial byte, but you can solve that problem by storing an extra byte of meta-data at the end of the file that specifies how many bits of the final data-byte are valid and how many are just padding.
Something like this:
#include <iostream>
#include <fstream>
#include <cstdint>
#include <vector>
using namespace std;
// Given an array of ints that are either 1 or 0, returns a packed-array
// of uint8_t's containing those bits as compactly as possible.
vector<uint8_t> packBits(const int * array, size_t arraySize)
{
const size_t vectorSize = ((arraySize+7)/8)+1; // round up, then +1 for the metadata byte
vector<uint8_t> packedBits;
packedBits.resize(vectorSize, 0);
// Store 8 boolean-bits into each byte of (packedBits)
for (size_t i=0; i<arraySize; i++)
{
if (array[i] != 0) packedBits[i/8] |= (1<<(i%8));
}
// The last byte in the array is special; it holds the number of
// valid bits that we stored to the byte just before it.
// That way if the number of bits we saved isn't an even multiple of 8,
// we can use this value later on to calculate exactly how many bits we should restore
packedBits[vectorSize-1] = arraySize%8;
return packedBits;
}
// Given a packed-bits vector (i.e. as previously returned by packBits()),
// returns the vector-of-integers that was passed to the packBits() call.
vector<int> unpackBits(const vector<uint8_t> & packedBits)
{
vector<int> ret;
if (packedBits.size() < 2) return ret;
const size_t validBitsInLastByte = packedBits[packedBits.size()-1]%8;
const size_t numValidBits = 8*(packedBits.size()-((validBitsInLastByte>0)?2:1)) + validBitsInLastByte;
ret.resize(numValidBits);
for (size_t i=0; i<numValidBits; i++)
{
ret[i] = (packedBits[i/8] & (1<<(i%8))) ? 1 : 0;
}
return ret;
}
// Returns the size of the specified file in bytes, or -1 on failure
static ssize_t getFileSize(ifstream & inFile)
{
if (inFile.is_open() == false) return -1;
const streampos origPos = inFile.tellg(); // record current seek-position
inFile.seekg(0, ios::end); // seek to the end of the file
const ssize_t fileSize = inFile.tellg(); // record current seek-position
inFile.seekg(origPos); // so we won't change the file's read-position as a side effect
return fileSize;
}
int main(){
// Example of packing an array-of-ints into packed-bits form and saving it
// to a binary file
{
const int array[]={0,0,1,1,1,1,1,0,1,0};
// Pack the int-array into packed-bits format
const vector<uint8_t> packedBits = packBits(array, sizeof(array)/sizeof(array[0]));
// Write the packed-bits to a binary file
ofstream outFile;
outFile.open("test.bin", ios::binary);
outFile.write(reinterpret_cast<const char *>(&packedBits[0]), packedBits.size());
outFile.close();
}
// Now we'll read the binary file back in, unpack the bits to a vector<int>,
// and print out the contents of the vector.
{
// open the file for reading
ifstream inFile;
inFile.open("test.bin", ios::binary);
const ssize_t fileSizeBytes = getFileSize(inFile);
if (fileSizeBytes < 0)
{
cerr << "Couldn't read test.bin, aborting" << endl;
return 10;
}
// Read in the packed-binary data
vector<uint8_t> packedBits;
packedBits.resize(fileSizeBytes);
inFile.read(reinterpret_cast<char *>(&packedBits[0]), fileSizeBytes);
// Expand the packed-binary data back out to one-int-per-boolean
vector<int> unpackedInts = unpackBits(packedBits);
// Print out the int-array's contents
cout << "Loaded-from-disk unpackedInts vector is " << unpackedInts.size() << " items long:" << endl;
for (size_t i=0; i<unpackedInts.size(); i++) cout << unpackedInts[i] << " ";
cout << endl;
}
return 0;
}
(You could probably make the file even more compact than that by running zip or gzip on the file after you write it out :) )
You can indeed write and read binary data. However having line breaks and commas would be difficult. Imagine you save your data as boolean data, so only ones and zeros. Then having a comma would mean you need an special character, but you have only ones and zeros!. The next best thing would be to make an object of two booleans, one meaning the usual data you need (c++ would then read the data in pairs of bits), and the other meaning whether you have a comma or not, but I doubt this is what you need. If you want to do something like a csv, then it would be easy to just fix the size of each column (int would be 4 bytes, a string of no more than 32 char for example), and then just read and write accordingly. Suppose you have your binary
To initially save your array of the an object say pets, then you would use
FILE *apFile;
apFile = fopen(FILENAME,"w+");
fwrite(ARRAY_OF_PETS, sizeof(Pet),SIZE_OF_ARRAY, apFile);
fclose(apFile);
To access your idx pet, you would use
Pet m;
ifstream input_file (FILENAME, ios::in|ios::binary|ios::ate);
input_file.seekg (sizeof(Pet) * idx, ios::beg);
input_file.read((char*) &m,sizeof(Pet));
input_file.close();
You can also add data add the end, change data in the middle and so on.
console
file
Simple explanation: ifstream's get() is reading the wrong chars (console is different from file) and I need to know why.
I am recording registers into a file as a char array. When I write it to the file, it writes successfully. I open the file and find the chars I intended, except notepad apparently shows unicode character 0000 ( NULL) as a space.
For instance, the entries
id = 1000; //an 8-byte long long
name = "stack"; //variable size
surname = "overflow"; //variable size
degree = "internet"; //variable size
sex = 'c'; //1-byte char
birthdate = 256; //4-byte int
become this on the file:
& èstackoverflowinternetc
or, putting the number of unicode characters that disappear when posted here between brackets:
&[3]| [1]è|stack|overflow|internet|c| [1] | //separating each section with a | for easier reading. Some unicode characters disappear when I post them here, but I assure you they are the correct ones
SIZE| ID | name| surname| degree |g| birth
(writing is working fine and puts the expected characters)
Trouble is, when the console in the code below prints what the buffer is reading from the file, it gives me the following record (extra spaces included)
Þstackoverflowinternetc
Which is bad because it returns me the wrong ID and birthdate. Either "-21" and "4747968" or "Ù" and "-1066252288". Other fields are unnaffected. Weird because size bytes show up as empty space in the console, so it shouldn't be able to split name, surname, degree and sex.
ifstream infile("alumni.freire", ios::binary);
if(infile.is_open()){
infile.seekg(pos, ios::beg);
int size;
size = infile.get();
char charreg[size];
charreg[0] = size;
//testing what buffer gives me
for(int i = 1; i < size; i++){
charreg[i] = infile.get();
cout << charreg[i];
}
}
What am I doing wrong?
EDIT: to explain better what I did:
I get the entries on the first "code" from user input and use them as parameters when creating a "reg" class I implemented. The reg class then does (adequatly, I've already tested it) the conversion to strings, and calculates a hidden four-element char array containing instance size, name size, surname size and degree size. When the program writes the class on-file, it is written perfectly, as I showed in the second "code" section. (If you do the calculations you'll see '&' equals the size of the entire thing, for example). When I read it from the file, it appears differently on console for some reason. Different characters. But it reads the right amount of characters because "name", "surname" and "degree" appear correctly.
EDIT n2: I made "charreg[]" into an int array and printed it and the values are correct. I have no idea what's happening anymore.
EDIT n3: Apparently the reason I was getting the wrong chars is that I should have used unsigned chars...
The idea to write, as is, your structure is good. But your approach is wrong.
You must have something to separate your fields.
For example you know that your ID is 8 byte long, great ! You can read 8 bytes :
long long id;
read(fd, &id, 8);
In your example you got -24 because you read the first byte of the full id number.
But for the rest of the file, how can you know the length of the first name and the last name ?
You could read byte by byte until you find an null byte.
But I suggest you to use a more structured file.
For example, you can define a structure like this :
long long id; // 8 bytes
char firstname[256]; // 256 bytes
char lastname[256]; // 256 bytes
char sex; // 1 byte
int birthdate; // 4 bytes
With this structure you can read and write super easily :
struct my_struct s;
read(fd, &s, sizeof(struct my_struct)); // read 8+256+256+1+4 bytes
s.birthdate = 128;
write(fd, &s, sizeof(struct my_struct));// write the structure
Of course you loose the "variable length" of the first name and last name. Do you really need more than 100 chars for a name ?
In a case you really need, you could introduce an header over each variable length value. But you loose the ability to read everything at once.
long long id;
int foo_size;
char *foo;
And then to read it :
struct my_struct s;
read(fd, &s, 12); // read the header, 8 + 4 bytes
char foo[s.foo_size];
read(fd, &s, s.foo_size);
You should define what exactly you need to save. Define a precise data structure that you can easily deduce at read, avoid things like "oh, let's read until null-byte".
I used C function to explain you because it's much more representative. You know what you read and what you write.
Start to play with this, and then try the same with c++ streams/function
I don't know how you are writing back information to the file but here is how I would do that, I'm hoping this is a fairly simple way of doing it. Keep in mind I have no idea what kind of file you are actually working with.
long long id = 1000;
std::string name = "name";
std::string surname = "overflow";
std::string degree = "internet";
unsigned char sex = 'c';
int birthdate = 256;
ofstream outfile("test.txt", ios::binary);
if (outfile.is_open())
{
const char* idBytes = static_cast<char*>(static_cast<void*>(&id));
const char* nameBytes = name.c_str();
const char* surnameBytes = surname.c_str();
const char* degreeBytes = degree.c_str();
const char* birthdateBytes = static_cast<char*>(static_cast<void*>(&birthdate));
outfile.write(idBytes, sizeof(id));
outfile.write(nameBytes, name.length());
outfile.write(surnameBytes, surname.length());
outfile.write(degreeBytes, degree.length());
outfile.put(sex);
outfile.write(birthdateBytes, sizeof(birthdate));
outfile.flush();
outfile.close();
}
and here is how I am going to output it, which to me seems to be coming out as expected.
ifstream infile("test.txt", std::ifstream::ate | ios::binary);
if (infile.is_open())
{
std::size_t fileSize = infile.tellg();
infile.seekg(0);
for (int i = 0; i < fileSize; i++)
{
char c = infile.get();
std::cout << c;
}
std::cout << std::endl;
}
I've recently been getting in to IO with C++. I am trying to read a string from a binary file stream.
The custom type is saved like this:
The string is prefixed with the length of the string. So hello, would be stored like this: 6Hello\0.
I am basically reading text from a table (in this case a name table) in a binary file. The file header tells me the offset of this table (112 bytes in this case) and the number of names (318).
Using this information I can read the first byte at this offset. This tells me the length of the string (e.g. 6). So I'll start at the next byte and read 5 more to get the full string "Hello". This seems to work fine with the first name at the offset. trying to recursively read the rest provides a lot of garbage really. I've tried using loops and recursive functions but its not working out so well. Not sure what the problem is, so reverted to the original one name retrieval method. Here's the code:
int printName(fstream& fileObj, __int8 buff, DWORD offset, int& iteration){
fileObj.seekg(offset);
fileObj.read((char*)&buff, sizeof(char));
int nameSize = (int)buff;
char* szName = new char[nameSize];
for(int i=1; i <= nameSize; i++){
fileObj.seekg(offset+i);
fileObj.read((char*)&szName[i-1], sizeof(char));
}
cout << szName << endl;
return 0;
}
Any idea how to iterate through all 318 names without creating dodgy output?
Thanks for taking the time to look through this, your help is greatly appreciated.
You're overcomplicating a bit - there's no need to seek to the next sequential read.
Removing unused and pointless parameters, I would write this function something like this:
void printName(fstream& fileObj, DWORD offset) {
char size = 0;
if (fileObj.seekg(offset) && fileObj.read(&size, sizeof(char)))
{
char* name = new char[size];
if (fileObj.read(name, size))
{
cout << name << endl;
}
delete [] name;
}
}
i have a buffer
char buffer[size];
which i am using to store the file contents of a stream(suppose pStream here)
HRESULT hr = pStream->Read(buffer, size, &cbRead );
now i have all the contents of this stream in buffer which is of size(suppose size here). now i know that i have two strings
"<!doctortype html" and ".html>"
which are present somewhere (we don't their loctions) inside the stored contents of this buffer and i want to store just the contents of the buffer from the location
"<!doctortype html" to another string ".html>"
in to another buffer2[SizeWeDontKnow] yet.
How to do that ??? (actually contents from these two location are the contents of a html file and i want to store the contents of only html file present in this buffer). any ideas how to do that ??
You can use strnstr function to find the right position in your buffer. After you've found the starting and ending tag, you can extract the text inbetween using strncpy, or use it in place if the performance is an issue.
You can calculate needed size from the positions of the tags and the length of the first tag nLength = nPosEnd - nPosStart - nStartTagLength
Look for HTML parsers for C/C++.
Another way is to have a char pointer from the start of the buffer and then check each char there after. See if it follows your requirement.
If that's the only operation which operates on HTML code in your app, then you could use the solution I provided below (you can also test it online - here). However, if you are going to do some more complicated parsing, then I suggest using some external library.
#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
int main()
{
const char* beforePrefix = "asdfasdfasdfasdf";
const char* prefix = "<!doctortype html";
const char* suffix = ".html>";
const char* postSuffix = "asdasdasd";
unsigned size = 1024;
char buf[size];
sprintf(buf, "%s%sTHE STRING YOU WANT TO GET%s%s", beforePrefix, prefix, suffix, postSuffix);
cout << "Before: " << buf << endl;
const char* firstOccurenceOfPrefixPtr = strstr(buf, prefix);
const char* firstOccurenceOfSuffixPtr = strstr(buf, suffix);
if (firstOccurenceOfPrefixPtr && firstOccurenceOfSuffixPtr)
{
unsigned textLen = (unsigned)(firstOccurenceOfSuffixPtr - firstOccurenceOfPrefixPtr - strlen(prefix));
char newBuf[size];
strncpy(newBuf, firstOccurenceOfPrefixPtr + strlen(prefix), textLen);
newBuf[textLen] = 0;
cout << "After: " << newBuf << endl;
}
return 0;
}
EDIT
I get it now :). You should use strstr to find the first occurence of the prefix then. I edited the code above, and updated the link.
Are you limited to C, or can you use C++?
In the C library reference there are plenty of useful ways of tokenising strings and comparing for matches (string.h):
http://www.cplusplus.com/reference/cstring/
Using C++ I would do the following (using buffer and size variables from your code):
// copy char array to std::string
std::string text(buffer, buffer + size);
// define what we're looking for
std::string begin_text("<!doctortype html");
std::string end_text(".html>");
// find the start and end of the text we need to extract
size_t begin_pos = text.find(begin_text) + begin_text.length();
size_t end_pos = text.find(end_text);
// create a substring from the positions
std::string extract = text.substr(begin_pos,end_pos);
// test that we got the extract
std::cout << extract << std::endl;
If you need C string compatibility you can use:
char* tmp = extract.c_str();
I have a program that I need to read binary text into. I read the binary text via a redirection:
readData will be an executable made by my Makefile.
Example: readData < binaryText.txt
What I want to do is read the binary text, and store each character in the binary text file as a character inside a char array. The binary text is made up of 32 This is my attempt at doing so...
unsigned char * buffer;
char d;
cin.seekg(0, ios::end);
int length = cin.tellg();
cin.seekg(0, ios::beg);
buffer = new unsigned char [length];
while(cin.get(d))
{
cin.read((char*)&buffer, length);
cout << buffer[(int)d] << endl;
}
However, I keep getting a segmentation fault on this. Might anyone have any ideas on how to read binary text into a char array? Thanks!
I'm more a C programmer rather than a C++, but I think that you should have started your while loop
while(cin.get(&d)){
The easiest would be like this:
std::istringstream iss;
iss << std::cin.rdbuf();
// now use iss.str()
Or, all in one line:
std::string data(static_cast<std::istringstream&>(std::istringstream() << std::cin.rdbuf()).str());
Something like this should do the trick.
You retrieve the filename from the arguments and then read the whole file in one shot.
const char *filename = argv[0];
vector<char> buffer;
// open the stream
std::ifstream is(filename);
// determine the file length
is.seekg(0, ios_base::end);
std::size_t size = is.tellg();
is.seekg(0, std::ios_base::beg);
// make sure we have enough memory space
buffer.reserve(size);
buffer.resize(size, 0);
// load the data
is.read((char *) &buffer[0], size);
// close the file
is.close();
You then just need to iterate over the vector to read characters.
The reason why you are getting segmentation fault is because you are trying to access an array variable using a character value.
Problem:
buffer[(int)d] //d is a ASCII character value, and if the value exceeds the array's range, there comes the segfault.
If what you want is an character array, you already have that from cin.read()
Solution:
cin.read(reinterpret_cast<char*>(buffer), length);
If you want to print out, just use printf
printf("%s", buffer);
I used reinterpret_cast because it thought it is safe to convert to signed character pointer since most characters that are used would range from 0 ~ 127. You should know that character values from 128 to 255 would be converted wrongly.