C++ stream get unget not a nop?

I have the following C++ program and ran it using Visual Studio 2008 on Windows 7. I get and then unget a character. After doing so, the file position is different. Why? How do I get around this problem?
test.txt (download link below if you want)
/* Comment 1 */
/* Comment 2 */
#include <fstream>
int main (int argc, char ** argv) {
char const * file = "test.txt";
std::fstream fs(file, std::ios::in);
std::streampos const before = fs.tellg();
// replacing the following two lines with
// char c = fs.peek(); results in the same problem
char const c = fs.get();
fs.unget();
std::streampos const after = fs.tellg();
fs.seekg(after);
char const c2 = fs.get();
fs.close();
return 0;
}
c: 47 '/' char
c2: -1 'ÿ' char
before: {_Myoff=0 _Fpos=0 _Mystate=0 } std::fpos<int>
after: {_Myoff=0 _Fpos=-3 _Mystate=0 } std::fpos<int>
Adding | std::fstream::binary to the constructor seems to solve the problem. Perhaps it has to do with newlines in the file? If so, why does it affect code that doesn't even get close to reading a newline?
Update: I added a seek to the after position and a read of another character.
It seems that saving via Notepad vs. Vim makes a difference. Saving via Notepad makes the stream work okay.
I have uploaded the file to google docs if you want to dl it:
https://docs.google.com/leaf?id=0B8Ufd7Rk6dvHZmYyZjgwYmItMTI3MC00MDljLWJjYTctMWMxYWM0ODk1MTE2&hl=en_US

Ok using your input file I see the same behavior you do. After some experimentation, it looks like the file was in Unix format, then had the ^M characters edited out (at least that's how I was able to reproduce it).
To fix it, I edited the file in Vim, executed ":set ff=dos", then added and deleted a character to dirty the file, then saved it.

The file position behaves as expected:
// unget.cpp
#include <fstream>
#include <iostream>
int main ()
{
char const * file = "test.txt";
std::fstream fs(file, std::fstream::in);
std::cout << fs.tellg() << std::endl; // 0
char c = fs.get();
std::cout << fs.tellg() << std::endl; // 1
fs.unget();
std::cout << fs.tellg() << std::endl; // 0
fs.close();
return 0;
}
Build and run:
$ clang++ unget.cpp
$ ./a.out
0
1
0
Or perhaps I don't understand where the problem is.

Related

Proper way to convert HEX to ASCII read from a file C++

In my code below, CODE 1 reads hex from a file and stores it in a string, but it is not converted to ASCII when printed out.
#include <iostream>
#include <sstream>
#include <fstream>
int main(int argc, char** argv)
{
// CODE 1
std::ifstream input("C:\\test.txt"); // The test.txt contains \x48\x83\xEC\x28\x48\x83
std::stringstream sstr;
input >> sstr.rdbuf();
std::string test = sstr.str();
std::cout << "\nString from file: " << test;
//char* lol = new char[test.size()];
//memcpy(lol, test.data(), test.size());
////////////////////////////////////////////////////////
// CODE 2
std::string test_2 = "\x48\x83\xEC\x28\x48\x83";
std::cout << "\n\nHardcoded string: " << test_2 << "\n";
// Prints as ASCII "H(H" , which I want my CODE 1 to do.
}
In my CODE 2 sample, same HEX is used and it prints it as ASCII. Why is it not the same for CODE 1?

Okay, it looks like there is some confusion. First, I have to ask if you're SURE you know what is in your file.
That is, does it contain, oh, it looks like about 20 characters:
\
x
4
8
et cetera?
Or does it contain a hex 48 (one byte), a hex 83 (one byte), for a total of 5-ish characters?
I bet it's the first. I bet your file is about 20 characters long and literally contains the string that's getting printed.
And if so, then the code is doing what you expect. It's reading a line of text and writing it back out. If you want it to actually interpret it like the compiler does, then you're going to have to do the steps yourself.
Now, if it actually contains the hex characters (but I bet it doesn't), then that's a little different problem, and we'll have to look at that. But I think you just have a string of characters that includes \x in it. And reading / writing that isn't going to automatically do some magic for you.

When you read from file, the backslash characters are not escaped. Your test string from file is literally an array of chars: {'\\', 'x', '4', '8', ... }
Whereas your hardcoded literal string, "\x48\x83\xEC\x28\x48\x83"; is fully hex escaped by the compiler.
If you really want to store your data as a text file as a series of "backslash x NN" sequences, you'll need to convert after you read from file. Here's a hacked up loop that would do it for you.
std::string test = sstr.str();
char temp[3] = {};
size_t t = 0;
std::string corrected;
for (char c : test)
{
if (isxdigit(c))
{
temp[t] = c;
t++;
if (t == 2)
{
t = 0;
unsigned char uc = (unsigned char)strtoul(temp, nullptr, 16); // temp holds the two hex digits
corrected += (char)uc;
}
}
}
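The loop above collects every hex digit it sees, so stray digits outside a \xNN sequence would be decoded too. A slightly stricter variant, sketched here under the assumption that each escape is exactly a backslash, an x, and two hex digits, decodes only those sequences and copies everything else through. decode_hex_escapes is a hypothetical helper name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Turns each literal "\xNN" (backslash, 'x', two hex digits) into the single
// byte 0xNN. Any other character is copied to the output unchanged.
std::string decode_hex_escapes(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        const bool is_escape =
            i + 3 < text.size() &&
            text[i] == '\\' && text[i + 1] == 'x' &&
            std::isxdigit(static_cast<unsigned char>(text[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(text[i + 3]));
        if (is_escape) {
            // Parse the two digits after "\x" as a base-16 number.
            const int value = std::stoi(text.substr(i + 2, 2), nullptr, 16);
            out += static_cast<char>(value);
            i += 4;
        } else {
            out += text[i];
            ++i;
        }
    }
    return out;
}
```

With this, the string read from the file in CODE 1 could be passed through decode_hex_escapes before printing.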

You can split the string read from the file on \x, convert each piece from a string to an int, and finally cast it to char.
These resources will be helpful: strtok and convert.

Get file string in compile time [duplicate]

Is there a way to include an entire text file as a string in a C program at compile-time?
something like:
file.txt:
This is
a little
text file
main.c:
#include <stdio.h>
int main(void) {
#blackmagicinclude("file.txt", content)
/*
equiv: char content[] = "This is\na little\ntext file";
*/
printf("%s", content);
}
obtaining a little program that prints on stdout "This is
a little
text file"
At the moment I use a hackish Python script, but it's butt-ugly and limited to only one variable name. Can you tell me another way to do it?

I'd suggest using (unix util)xxd for this.
you can use it like so
$ echo hello world > a
$ xxd -i a
outputs:
unsigned char a[] = {
0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;
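The generated array is not null-terminated, so the length variable matters when using it. A minimal sketch of consuming it from C++ (the page mixes C and C++; embedded_text is a made-up name, and the array is pasted here rather than included from a generated header):

```cpp
#include <string>

// Output of `xxd -i a`, copied from the answer above.
unsigned char a[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int a_len = 12;

// Builds a string from the raw bytes. The explicit length is required
// because xxd does not append a terminating zero.
std::string embedded_text() {
    return std::string(reinterpret_cast<const char*>(a), a_len);
}
```

In a real build the array would normally live in a header produced by xxd and be pulled in with #include.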

The question was about C but in case someone tries to do it with C++11 then it can be done with only little changes to the included text file thanks to the new raw string literals:
In C++ do this:
const char *s =
#include "test.txt"
;
In the text file do this:
R"(Line 1
Line 2
Line 3
Line 4
Line 5
Line 6)"
So there must only be a prefix at the top of the file and a suffix at the end of it. Between it you can do what you want, no special escaping is necessary as long as you don't need the character sequence )". But even this can work if you specify your own custom delimiter:
R"=====(Line 1
Line 2
Line 3
Now you can use "( and )" in the text file, too.
Line 5
Line 6)====="
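To see the custom delimiter at work without a separate file, the literal can be written inline. This is a sketch only: kEmbeddedText and needs_custom_delimiter are hypothetical names, and the helper simply looks for the )" sequence that would end a plain R"( ... )" literal early.

```cpp
#include <string>

// The "=====" delimiter lets the text contain )" without ending the literal.
const char* const kEmbeddedText = R"=====(Line 1
Now you can use "( and )" in the text, too.
Line 3)=====";

// Returns true if the text contains )" and therefore cannot be wrapped in a
// raw string literal that uses the default (empty) delimiter.
bool needs_custom_delimiter(const std::string& text) {
    return text.find(")\"") != std::string::npos;
}
```

A build script could run a check like needs_custom_delimiter over the input before deciding which prefix and suffix to add.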

I like kayahr's answer. If you don't want to touch the input files, however, and if you are using CMake, you can have CMake add the delimiter character sequences to the file. The following CMake code, for instance, copies the input files and wraps their content accordingly:
function(make_includable input_file output_file)
file(READ ${input_file} content)
set(delim "for_c++_include")
set(content "R\"${delim}(\n${content})${delim}\"")
file(WRITE ${output_file} "${content}")
endfunction(make_includable)
# Use like
make_includable(external/shaders/cool.frag generated/cool.frag)
Then include in c++ like this:
constexpr const char *test =
#include "generated/cool.frag"
;

You have two possibilities:
Make use of compiler/linker extensions to convert a file into a binary file, with proper symbols pointing to the begin and end of the binary data. See this answer: Include binary file with GNU ld linker script.
Convert your file into a sequence of character constants that can initialize an array. Note you can't just do "" and span multiple lines. You would need a line continuation character (\), escape " characters and others to make that work. Easier to just write a little program to convert the bytes into a sequence like '\xFF', '\xAB', ...., '\0' (or use the unix tool xxd described by another answer, if you have it available!):
Code:
#include <stdio.h>
int main() {
int c;
while((c = fgetc(stdin)) != EOF) {
printf("'\\x%X',", (unsigned)c);
}
printf("'\\0'"); // put terminating zero
}
(not tested). Then do:
char my_file[] = {
#include "data.h"
};
Where data.h is generated by
cat file.bin | ./bin2c > data.h

You can do this using objcopy:
objcopy --input binary --output elf64-x86-64 myfile.txt myfile.o
Now you have an object file you can link into your executable which contains symbols for the beginning, end, and size of the content from myfile.txt.

OK, inspired by Daemin's post, I tested the following simple example:
a.data:
"this is test\n file\n"
test.c:
int main(void)
{
char *test =
#include "a.data"
;
return 0;
}
gcc -E test.c output:
# 1 "test.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "test.c"
int main(void)
{
char *test =
# 1 "a.data" 1
"this is test\n file\n"
# 6 "test.c" 2
;
return 0;
}
So it works, but it requires the data to be surrounded with quotation marks.

If you're willing to resort to some dirty tricks you can get creative with raw string literals and #include for certain types of files.
For example, say I want to include some SQL scripts for SQLite in my project and I want to get syntax highlighting but don't want any special build infrastructure. I can have this file test.sql which is valid SQL for SQLite where -- starts a comment:
--x, R"(--
SELECT * from TestTable
WHERE field = 5
--)"
And then in my C++ code I can have:
int main()
{
auto x = 0;
const char* mysql = (
#include "test.sql"
);
cout << mysql << endl;
}
The output is:
--
SELECT * from TestTable
WHERE field = 5
--
Or to include some Python code from a file test.py which is a valid Python script (because # starts a comment in Python and pass is a no-op):
#define pass R"(
pass
def myfunc():
print("Some Python code")
myfunc()
#undef pass
#define pass )"
pass
And then in the C++ code:
int main()
{
const char* mypython = (
#include "test.py"
);
cout << mypython << endl;
}
Which will output:
pass
def myfunc():
print("Some Python code")
myfunc()
#undef pass
#define pass
It should be possible to play similar tricks for various other types of code you might want to include as a string. Whether or not it is a good idea I'm not sure. It's kind of a neat hack but probably not something you'd want in real production code. Might be ok for a weekend hack project though.

You need my xtr utility but you can do it with a bash script. This is a script I call bin2inc. The first parameter is the name of the resulting char[] variable. The second parameter is the name of the file. The output is C include file with the file content encoded (in lowercase hex) as the variable name given. The char array is zero terminated, and the length of the data is stored in $variableName_length
#!/bin/bash
fileSize ()
{
[ -e "$1" ] && {
set -- `ls -l "$1"`;
echo $5;
}
}
echo unsigned char $1'[] = {'
./xtr -fhex -p 0x -s ', ' < "$2";
echo '0x00'
echo '};';
echo '';
echo unsigned long int ${1}_length = $(fileSize "$2")';'
You can get xtr here: xtr (character eXTRapolator) is GPLv3.

Why not link the text into the program and use it as a global variable! Here is an example. I'm considering using this to include Open GL shader files within an executable since GL shaders need to be compiled for the GPU at runtime.

I reimplemented xxd in python3, fixing all of xxd's annoyances:
Const correctness
string length datatype: int → size_t
Null termination (in case you might want that)
C string compatible: Drop unsigned on the array.
Smaller, readable output, as you would have written it: Printable ascii is output as-is; other bytes are hex-encoded.
Here is the script, filtered by itself, so you can see what it does:
pyxxd.c
#include <stddef.h>
extern const char pyxxd[];
extern const size_t pyxxd_len;
const char pyxxd[] =
"#!/usr/bin/env python3\n"
"\n"
"import sys\n"
"import re\n"
"\n"
"def is_printable_ascii(byte):\n"
" return byte >= ord(' ') and byte <= ord('~')\n"
"\n"
"def needs_escaping(byte):\n"
" return byte == ord('\\\"') or byte == ord('\\\\')\n"
"\n"
"def stringify_nibble(nibble):\n"
" if nibble < 10:\n"
" return chr(nibble + ord('0'))\n"
" return chr(nibble - 10 + ord('a'))\n"
"\n"
"def write_byte(of, byte):\n"
" if is_printable_ascii(byte):\n"
" if needs_escaping(byte):\n"
" of.write('\\\\')\n"
" of.write(chr(byte))\n"
" elif byte == ord('\\n'):\n"
" of.write('\\\\n\"\\n\"')\n"
" else:\n"
" of.write('\\\\x')\n"
" of.write(stringify_nibble(byte >> 4))\n"
" of.write(stringify_nibble(byte & 0xf))\n"
"\n"
"def mk_valid_identifier(s):\n"
" s = re.sub('^[^_a-z]', '_', s)\n"
" s = re.sub('[^_a-z0-9]', '_', s)\n"
" return s\n"
"\n"
"def main():\n"
" # `xxd -i` compatibility\n"
" if len(sys.argv) != 4 or sys.argv[1] != \"-i\":\n"
" print(\"Usage: xxd -i infile outfile\")\n"
" exit(2)\n"
"\n"
" with open(sys.argv[2], \"rb\") as infile:\n"
" with open(sys.argv[3], \"w\") as outfile:\n"
"\n"
" identifier = mk_valid_identifier(sys.argv[2]);\n"
" outfile.write('#include <stddef.h>\\n\\n');\n"
" outfile.write('extern const char {}[];\\n'.format(identifier));\n"
" outfile.write('extern const size_t {}_len;\\n\\n'.format(identifier));\n"
" outfile.write('const char {}[] =\\n\"'.format(identifier));\n"
"\n"
" while True:\n"
" byte = infile.read(1)\n"
" if byte == b\"\":\n"
" break\n"
" write_byte(outfile, ord(byte))\n"
"\n"
" outfile.write('\";\\n\\n');\n"
" outfile.write('const size_t {}_len = sizeof({}) - 1;\\n'.format(identifier, identifier));\n"
"\n"
"if __name__ == '__main__':\n"
" main()\n"
"";
const size_t pyxxd_len = sizeof(pyxxd) - 1;
Usage (this extracts the script):
#include <stdio.h>
extern const char pyxxd[];
extern const size_t pyxxd_len;
int main()
{
fwrite(pyxxd, 1, pyxxd_len, stdout);
}

Here's a hack I use for Visual C++. I add the following Pre-Build Event (where file.txt is the input and file_txt.h is the output):
@(
echo const char text[] = R"***(
type file.txt
echo ^^^)***";
) > file_txt.h
I then include file_txt.h where I need it.
This isn't perfect, as it adds \n at the start and \n^ at the end, but that's not a problem to handle and I like the simplicity of this solution. If anyone can refine this to get rid of the extra chars, that would be nice.

You can use assembly for this:
asm("fileData: .incbin \"filename.ext\"");
asm("fileDataEnd: .byte 0x00");
extern char fileData[];
extern char fileDataEnd[];
const int fileDataSize = fileDataEnd - fileData + 1;

Even if it can be done at compile time (I don't think it can in general), the text would likely be the preprocessed header rather than the files contents verbatim. I expect you'll have to load the text from the file at runtime or do a nasty cut-n-paste job.

Hasturkun's answer using the xxd -i option is excellent. If you want to incorporate the conversion process (text -> hex include file) directly into your build the hexdump.c tool/library recently added a capability similar to xxd's -i option (it doesn't give you the full header - you need to provide the char array definition - but that has the advantage of letting you pick the name of the char array):
http://25thandclement.com/~william/projects/hexdump.c.html
Its license is a lot more "standard" than xxd's and is very liberal. An example of using it to embed an init file in a program can be seen in the CMakeLists.txt and scheme.c files here:
https://github.com/starseeker/tinyscheme-cmake
There are pros and cons both to including generated files in source trees and bundling utilities - how to handle it will depend on the specific goals and needs of your project. hexdump.c opens up the bundling option for this application.

I think it is not possible with the compiler and preprocessor alone. gcc allows this:
#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)
printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
STRGF(
# define hostname my_dear_hostname
hostname
)
"\n" );
But unfortunately not this:
#define _STRGF(x) # x
#define STRGF(x) _STRGF(x)
printk ( MODULE_NAME " built " __DATE__ " at " __TIME__ " on host "
STRGF(
# include "/etc/hostname"
)
"\n" );
The error is:
/etc/hostname: In function ‘init_module’:
/etc/hostname:1:0: error: unterminated argument list invoking macro "STRGF"

I had similar issues, and for small files the aforementioned solution of Johannes Schaub worked like a charm for me.
However, for files that are a bit larger, it ran into issues with the character array limit of the compiler. Therefore, I wrote a small encoder application that converts file content into a 2D character array of equally sized chunks (and possibly padding zeros). It produces output textfiles with 2D array data like this:
const char main_js_file_data[8][4]= {
{'\x69','\x73','\x20','\0'},
{'\x69','\x73','\x20','\0'},
{'\x61','\x20','\x74','\0'},
{'\x65','\x73','\x74','\0'},
{'\x20','\x66','\x6f','\0'},
{'\x72','\x20','\x79','\0'},
{'\x6f','\x75','\xd','\0'},
{'\xa','\0','\0','\0'}};
where 4 is actually a variable MAX_CHARS_PER_ARRAY in the encoder. The file with the resulting C code, called, for example "main_js_file_data.h" can then easily be inlined into the C++ application, for example like this:
#include "main_js_file_data.h"
Here is the source code of the encoder:
#include <fstream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <cmath> // std::ceil
#define MAX_CHARS_PER_ARRAY 2048
int main(int argc, char * argv[])
{
// three parameters: input filename, output filename, variable name
if (argc < 4)
{
return 1;
}
// buffer data, packaged into chunks
std::vector<char> bufferedData;
// open input file, in binary mode
{
std::ifstream fStr(argv[1], std::ios::binary);
if (!fStr.is_open())
{
return 1;
}
bufferedData.assign(std::istreambuf_iterator<char>(fStr),
std::istreambuf_iterator<char>() );
}
// write output text file, containing a variable declaration,
// which will be a fixed-size two-dimensional plain array
{
std::ofstream fStr(argv[2]);
if (!fStr.is_open())
{
return 1;
}
const std::size_t numChunks = std::size_t(std::ceil(double(bufferedData.size()) / (MAX_CHARS_PER_ARRAY - 1)));
fStr << "const char " << argv[3] << "[" << numChunks << "]" <<
"[" << MAX_CHARS_PER_ARRAY << "]= {" << std::endl;
std::size_t count = 0;
fStr << std::hex;
while (count < bufferedData.size())
{
std::size_t n = 0;
fStr << "{";
for (; n < MAX_CHARS_PER_ARRAY - 1 && count < bufferedData.size(); ++n)
{
fStr << "'\\x" << int(static_cast<unsigned char>(bufferedData[count++])) << "',";
}
// fill missing part to reach fixed chunk size with zero entries
for (std::size_t j = 0; j < (MAX_CHARS_PER_ARRAY - 1) - n; ++j)
{
fStr << "'\\0',";
}
fStr << "'\\0'}";
if (count < bufferedData.size())
{
fStr << ",\n";
}
}
fStr << "};\n";
}
return 0;
}
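On the consuming side the chunks have to be stitched back together. A minimal sketch, assuming the original file contains no zero bytes (they would be indistinguishable from the padding); join_chunks is a hypothetical helper, and the array is the small sample shown earlier with a chunk size of 4:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Sample encoder output copied from above (normally included from a header).
const char main_js_file_data[8][4] = {
    {'\x69','\x73','\x20','\0'},
    {'\x69','\x73','\x20','\0'},
    {'\x61','\x20','\x74','\0'},
    {'\x65','\x73','\x74','\0'},
    {'\x20','\x66','\x6f','\0'},
    {'\x72','\x20','\x79','\0'},
    {'\x6f','\x75','\xd','\0'},
    {'\xa','\0','\0','\0'}};

// Appends every chunk up to its first '\0' (terminator or padding).
template <std::size_t Rows, std::size_t Cols>
std::string join_chunks(const char (&chunks)[Rows][Cols]) {
    std::string out;
    for (std::size_t r = 0; r < Rows; ++r) {
        const char* begin = chunks[r];
        const char* end = std::find(begin, begin + Cols, '\0');
        out.append(begin, end);
    }
    return out;
}
```

Because the array dimensions are deduced by the template, the same helper should work unchanged for the real chunk size of 2048.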

This problem was irritating me and xxd doesn't work for my use case because it made the variable called something like __home_myname_build_prog_cmakelists_src_autogen when I tried to script it in, so I made a utility to solve this exact problem:
https://github.com/Exaeta/brcc
It generates a source and header file and allows you to explicitly set the name of each variable so then you can use them via std::begin(arrayname) and std::end(arrayname).
I incorporated it into my cmake project like so:
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/binary_resources.hpp ${CMAKE_CURRENT_BINARY_DIR}/binary_resources.cpp
COMMAND brcc ${CMAKE_CURRENT_BINARY_DIR}/binary_resources RGAME_BINARY_RESOURCES_HH txt_vertex_shader ${CMAKE_CURRENT_BINARY_DIR}/src/vertex_shader1.glsl
DEPENDS src/vertex_shader1.glsl)
With small tweaks I suppose it could be made to work for C as well.

If you are using CMake, you may be interested in writing a CMake preprocessing script like the following:
cmake/ConvertLayout.cmake
function(convert_layout file include_dir)
get_filename_component(name ${file} NAME_WE)
get_filename_component(directory ${file} DIRECTORY)
get_filename_component(directory ${directory} NAME)
string(TOUPPER ${name} NAME)
string(TOUPPER ${directory} DIRECTORY)
set(new_file ${include_dir}/${directory}/${name}.h)
if (${file} IS_NEWER_THAN ${new_file})
file(READ ${file} content)
string(REGEX REPLACE "\"" "\\\\\"" content "${content}")
string(REGEX REPLACE "[\r\n]" "\\\\n\"\\\\\n\"" content "${content}")
set(content "\"${content}\"")
set(content "#ifndef ${DIRECTORY}_${NAME}\n#define ${DIRECTORY}_${NAME} ${content} \n#endif")
message(STATUS "${content}")
file(WRITE ${new_file} "${content}")
message(STATUS "Generated layout include file ${new_file} from ${file}")
endif()
endfunction()
function(convert_layout_directory layout_dir include_dir)
file(GLOB layouts ${layout_dir}/*)
foreach(layout ${layouts})
convert_layout(${layout} ${include_dir})
endforeach()
endfunction()
your CMakeLists.txt
include(cmake/ConvertLayout.cmake)
convert_layout_directory(layout ${CMAKE_BINARY_DIR}/include)
include_directories(${CMAKE_BINARY_DIR}/include)
somewhere in c++
#include "layout/menu.h"
Glib::ustring ui_info = LAYOUT_MENU;

I like @Martin R.'s answer because, as it says, it doesn't touch the input file and automates the process. To improve on this, I added the capability to automatically split up large files that exceed compiler limits. The output file is written as an array of smaller strings which can then be reassembled in code. The resulting script, based on @Martin R.'s version, and an example are included here:
https://github.com/skillcheck/cmaketools.git
The relevant CMake setup is:
make_includable( LargeFile.h
${CMAKE_CURRENT_BINARY_DIR}/generated/LargeFile.h
"c++-include" "L" LINE_COUNT FILE_SIZE
)
The source code is then:
static std::vector<std::wstring> const chunks = {
#include "generated/LargeFile.h"
};
std::wstring contents =
std::accumulate( chunks.begin(), chunks.end(), std::wstring() );
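The reassembly step can be tried in isolation with a hand-written chunk list. This is a sketch under the assumption that the chunks are wide strings, as the "L" prefix in the CMake call suggests; join_wide_chunks is a made-up name.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Concatenates the chunks in order. The result type must match the chunk
// type (std::wstring here), since accumulate returns the type of its seed.
std::wstring join_wide_chunks(const std::vector<std::wstring>& chunks) {
    return std::accumulate(chunks.begin(), chunks.end(), std::wstring());
}
```

For very large inputs, reserving the total size and appending in a loop may be cheaper than accumulate, which can copy the growing string on every step.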

in x.h
"this is a "
"buncha text"
in main.c
#include <stdio.h>
int main(void)
{
char *textFileContents =
#include "x.h"
;
printf("%s\n", textFileContents);
return 0;
}
ought to do the job.

What might work is if you do something like:
int main()
{
const char* text = "
#include "file.txt"
";
printf("%s", text);
return 0;
}
Of course you'll have to be careful with what is actually in the file, making sure there are no double quotes, that all appropriate characters are escaped, etc.
Therefore it might be easier if you just load the text from a file at runtime, or embed the text directly into the code.
If you still wanted the text in another file you could have it in there, but it would have to be represented there as a string. You would use the code as above but without the double quotes in it. For example:
file.txt
"Something evil\n"\
"this way comes!"
main.cpp
int main()
{
const char* text =
#include "file.txt"
;
printf("%s", text);
return 0;
}
So basically having a C or C++ style string in a text file that you include. It would make the code neater because there isn't this huge lot of text at the start of the file.

Using seekg() in text mode

While trying to read in a simple ANSI-encoded text file in text mode (Windows), I came across some strange behaviour with seekg() and tellg(); Any time I tried to use tellg(), saved its value (as pos_type), and then seek to it later, I would always wind up further ahead in the stream than where I left off.
Eventually I did a sanity check; even if I just do this...
int main()
{
std::ifstream dataFile("myfile.txt",
std::ifstream::in);
if (dataFile.is_open() && !dataFile.fail())
{
while (dataFile.good())
{
std::string line;
dataFile.seekg(dataFile.tellg());
std::getline(dataFile, line);
}
}
}
...then eventually, further into the file, lines are half cut-off. Why exactly is this happening?

This issue is caused by libstdc++ using the difference between the current remaining buffer with lseek64 to determine the current offset.
The buffer is set using the return value of read, which for a text mode file on windows returns the number of bytes that have been put into the buffer after endline conversion (i.e. the 2 byte \r\n endline is converted to \n, windows also seems to append a spurious newline to the end of the file).
lseek64 however (which with mingw results in a call to _lseeki64) returns the current absolute file position, and once the two values are subtracted you end up with an offset that is off by 1 for each remaining newline in the text file (+1 for the extra newline).
The following code should display the issue, you can even use a file with a single character and no newlines due to the extra newline inserted by windows.
#include <iostream>
#include <fstream>
int main()
{
std::ifstream f("myfile.txt");
for (char c; f.get(c);)
std::cout << f.tellg() << ' ';
}
For a file with a single a character I get the following output
2 3
Clearly off by 1 for the first call to tellg. After the second call the file position is correct as the end has been reached after taking the extra newline into account.
Aside from opening the file in binary mode, you can circumvent the issue by disabling buffering
#include <iostream>
#include <fstream>
int main()
{
std::ifstream f;
f.rdbuf()->pubsetbuf(nullptr, 0);
f.open("myfile.txt");
for (char c; f.get(c);)
std::cout << f.tellg() << ' ';
}
but this is far from ideal.
Hopefully mingw / mingw-w64 or gcc can fix this, but first we'll need to determine who would be responsible for fixing it. I suppose the base issue is with MSs implementation of lseek which should return appropriate values according to how the file has been opened.

Thanks for this, though it's a very old post. I was stuck on this problem for more than a week. Here are some code examples from my site (the menu versions 1 and 2). Version 1 uses the solution presented here, in case anyone wants to see it.
:)
void customerOrder::deleteOrder(char* argv[]){
std::fstream newinFile,newoutFile;
newinFile.rdbuf()->pubsetbuf(nullptr, 0);
newinFile.open(argv[1],std::ios_base::in);
if(!(newinFile.is_open())){
throw "Could not open file to read customer order. ";
}
newoutFile.open("outfile.txt",std::ios_base::out);
if(!(newoutFile.is_open())){
throw "Could not open file to write customer order. ";
}
newoutFile.seekp(0,std::ios::beg);
std::string line;
int skiplinesCount = 2;
if(beginOffset != 0){
//write file from zero to beginoffset and from endoffset to eof If to delete is non-zero
//or write file from zero to beginoffset if to delete is non-zero and last record
newinFile.seekg (0,std::ios::beg);
// if primarykey < largestkey , it's a middle record
customerOrder order;
long tempOffset(0);
int largestKey = order.largestKey(argv);
if(primaryKey < largestKey) {
//stops right before "current..." next record.
while(tempOffset < beginOffset){
std::getline(newinFile,line);
newoutFile << line << std::endl;
tempOffset = newinFile.tellg();
}
newinFile.seekg(endOffset);
//skip two lines between records.
for(int i=0; i<skiplinesCount;++i) {
std::getline(newinFile,line);
}
while( std::getline(newinFile,line) ) {
newoutFile << line << std::endl;
}
} else if (primaryKey == largestKey){
//its the last record.
//write from zero to beginoffset.
while((tempOffset < beginOffset) && (std::getline(newinFile,line)) ) {
newoutFile << line << std::endl;
tempOffset = newinFile.tellg();
}
} else {
throw "Error in delete key";
}
} else {
//its the first record.
//write file from endoffset to eof
//works with endOffset - 4 (but why??)
newinFile.seekg (endOffset);
//skip two lines between records.
for(int i=0; i<skiplinesCount;++i) {
std::getline(newinFile,line);
}
while(std::getline(newinFile,line)) {
newoutFile << line << std::endl;
}
}
newoutFile.close();
newinFile.close();
}
beginOffset is a specific point in the file (beginning of each record) , and endOffset is the end of the record, calculated in another function with tellg (findFoodOrder) I did not add this as it may become very lengthy, but you can find it on my site (under: menu version 1 link) :
http://www.buildincode.com

Copying contents of one file to another leaves the other file empty

I'm trying to copy the contents of document1.txt to document2.txt using this simple program:
int main() {
ifstream in("document1.txt");
ofstream out("document2.txt");
string str;
while(getline(in,str))
out<<str;
}
But, when I run the program, I find that document2.txt is still empty.
What could be wrong?

ifstream in("document1.txt");
ofstream out("document2.txt");
string str;
while(getline(in,str))
{
out<<str;
}
in.close(); // <---
out.close(); // <---
There is a function to close it. Please check this: ifstream and ofstream.

"I have manually created document1.txt and document2.txt"
In case you are running your program directly from Visual Studio, note that the working directory is set to $(ProjectDir) by default, i.e. these files must be placed at the same directory where your project file (.vcproj / .vcxproj) is.
You could either place them to the appropriate directory or specify the full path, for example place them directly to C: and in code do:
ifstream in("C:\\document1.txt");
ofstream out("C:\\document2.txt");
This is something you would notice if you outputted something in case of an error while opening:
if (!out.is_open())
{
std::cout << "ERROR: Can not open document2.txt" << std::endl;
return -1;
}
Note that in this case you don't actually need to flush any buffer or close the streams explicitly. All of this happens automatically when these objects are destroyed, i.e. when execution leaves their scope.

ifstream document1("document1.txt");
ofstream document2("document2.txt");
string str;
getline(document1, str);
document2<< str<< endl;
document1.close();
document2.close();
Try this code. It works for me.

test.cpp
#include <iostream>
#include <fstream>
using namespace std;
int main() {
ifstream in("document1.txt");
ofstream out("document2.txt");
string str;
while(getline(in,str))
out<<str<<endl;
}
document1.txt
aaaaaaaa
bbb bbb
c c c c
d
under command line you can do
$ g++ test.cpp -o test.out
$ ./test.out
And then you will get document2.txt
aaaaaaaa
bbb bbb
c c c c
d
P.S. All the code tested under Mac and Linux
P.S. If you change the 9th line of test.cpp from out<<str<<endl; to out<<str;
document2.txt becomes
aaaaaaaabbb bbbc c c cd
because getline(in,str) extracts characters from in and stores them in str until '\n' or end-of-file is found.
This means str does not contain '\n'. You need to add '\n' yourself.
P.S. In C++, as LihO says, you don't need to close in and out, since they are closed at line 10 }
Also, you don't need to flush when you execute line 9, since endl has flush function.

C++ reading a file in binary mode. Problems with END OF FILE

I am learning C++ and I have to read a file in binary mode. Here's how I do it (following the C++ reference):
unsigned values[255];
unsigned total;
ifstream in ("test.txt", ifstream::binary);
while(in.good()){
unsigned val = in.get();
if(in.good()){
values[val]++;
total++;
cout << val <<endl;
}
}
in.close();
So, I am reading the file byte per byte till in.good() is true. I put some cout at the end of the while in order to understand what's happening, and here is the output:
marco@iceland:~/workspace/huffman$ ./main
97
97
97
97
10
98
98
10
99
99
99
99
10
100
100
10
101
101
10
221497852
marco@iceland:~/workspace/huffman$
Now, the input file "test.txt" is just:
aaaa
bb
cccc
dd
ee
So everything works perfectly till the end, where there's that 221497852. I guess it's something about the end of file, but I can't figure the problem out.
I am using gedit & g++ on a debian machine(64bit).
Any help will be appreciated.
Many thanks,
Marco

fstream::get returns an int-value. This is one of the problems.
Secondly, you are reading in binary, so you shouldn't use formatted streams. You should use fstream::read:
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
int main () {
std::ifstream is ("test.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
char * buffer = new char [length];
std::cout << "Reading " << length << " characters... ";
// read data as a block:
is.read (buffer,length);
if (is)
std::cout << "all characters read successfully.";
else
std::cout << "error: only " << is.gcount() << " could be read";
is.close();
// ...buffer contains the entire file...
delete[] buffer;
}
return 0;
}

This isn't the way istream::get() was designed to be used.
The classical idiom for using this function would be:
for ( int val = in.get(); val != EOF; val = in.get() ) {
// ...
}
or even more idiomatic:
char ch;
while ( in.get( ch ) ) {
// ...
}
The first loop is really inherited from C, where in.get() is
the equivalent of fgetc().
Still, as far as I can tell, the code you give should work.
It's not idiomatic, and it's not
The C++ standard is unclear what it should return if the
character value read is negative. fgetc() requires a value in
the range [0...UCHAR_MAX], and I think it safe to assume that
this is the intent here. It is, at least, what every
implementation I've used does. But this doesn't affect your
input. Depending on how the implementation interprets the
standard, the return value of in.get() must be in the range
[0...UCHAR_MAX] or [CHAR_MIN...CHAR_MAX], or it must be EOF
(typically -1). (The reason I'm fairly sure that the intent is
to require [0...UCHAR_MAX] is because otherwise, you may not
be able to distinguish end of file from a valid character.)
And if the return value is EOF (almost always
-1), failbit should be set, so in.good() would return
false. There is no case where in.get() would be allowed
to return 221497852. The only explanation I can possibly think
of for your results is that your file has some character with
bit 7 set at the end of the file, that the implementation is
returning a negative number for this (but not end of file,
because it is a character), which results in an out of bounds
index in values[val], and that this out of bounds index
somehow ends up modifying val. Or that your implementation is
broken, and is not setting failbit when it returns end of
file.
To be certain, I'd be interested in knowing what you get from
the following:
std::ifstream in( "test.txt", std::ios_base::binary );
int ch = in.get();
while ( ch != std::istream::traits_type::eof() ) {
std::cout << ch << std::endl;
ch = in.get();
}
This avoids any issues of a possibly invalid index, and any type
conversions (although the conversion int to unsigned is well
defined). Also, out of curiosity (since I can only access VC++
here), you might try replacing in as follows:
std::istringstream in( "\n\xE5" );
I would expect to get:
10
233
(Assuming 8 bit bytes and an ASCII based code set. Both of
which are almost, but not quite universal today.)

I've eventually figured this out.
Apparently the problem wasn't due to any code. The problem was gedit. It always appends a newline character at the end of the file. This also happens with other editors, such as vim. Some editors can be configured not to append anything, but in gedit this is apparently not possible. https://askubuntu.com/questions/13317/how-to-stop-gedit-gvim-vim-nano-from-adding-end-of-file-newline-char
Cheers to everyone who asked me,
Marco
