basic_ifstream<...>::read() doesn't read anything - c++

The program built from this code:
#include <fstream>
using std::basic_ifstream;
#include <ios>
using std::streamsize;
#include <ZenLib/Conf.h>
using ZenLib::int8u;
int main() {
#define charT int8u
#define T basic_ifstream<charT>
    T ifs ("/proc/cpuinfo", T::in | T::binary);
#undef T
    streamsize const bufsize (4096);
    charT buf[bufsize];
#undef charT
    return !ifs.read(buf, bufsize).gcount();
}
... returns 1.
So std::basic_ifstream<ZenLib::int8u>::read() could not extract any byte from /proc/cpuinfo.
Am I doing anything wrong?

Instantiating std::char_traits for anything but the character types the
library specializes it for (char and wchar_t, plus char16_t and char32_t
since C++11) is undefined behavior. (And I suspect that your charT is
unsigned char, not char.) If you want to use a different type for
characters, you'll have to define a new traits class; for
std::basic_istream and std::basic_ostream, you'll also have to define a
number of facets for the type as well.
The question is what you want to do. In your example, you only call
std::istream::read. If this is the case, the simplest solution is
probably to just drop down to the system-level functions. These
probably want a char* for their buffer as well, but a
reinterpret_cast from unsigned char* will work. You can do this for
std::istream::read as well, but if you have an std::istream,
there's a definite possibility that some formatted input will creep in,
and that will interpret the characters before you can get your
reinterpret_cast in.
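The reinterpret_cast route described above might look like the following minimal sketch. It assumes C++11; read_bytes and its parameter names are hypothetical, chosen only for illustration. The stream stays a plain std::ifstream, and only the buffer is typed as unsigned char.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to max_bytes raw bytes from path.
// The stream is std::ifstream (charT = char); only the buffer is unsigned char.
std::vector<unsigned char> read_bytes(const std::string& path, std::size_t max_bytes) {
    std::vector<unsigned char> buf(max_bytes);
    std::ifstream ifs(path, std::ios::in | std::ios::binary);
    // read() wants a char*, so reinterpret the unsigned char buffer.
    ifs.read(reinterpret_cast<char*>(buf.data()),
             static_cast<std::streamsize>(buf.size()));
    // gcount() reports how many bytes were actually extracted (0 on failure).
    buf.resize(static_cast<std::size_t>(ifs.gcount()));
    return buf;
}
```

Called as read_bytes("/proc/cpuinfo", 4096), this should give back a non-empty vector on a Linux system, since the stream is instantiated for char.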

The stream libraries are designed to be used with the character types such as char and wchar_t, not integers:
C++11 standard: 27.2.2
In the classes of Clause 27, a template formal parameter with name
charT represents a member of the set of types containing char,
wchar_t, and any other implementation-defined character types that
satisfy the requirements for a character on which any of the iostream
components can be instantiated.
Maybe start from this:
#include <fstream>
#include <iostream>

int main()
{
    std::ifstream ifs("/proc/cpuinfo", std::ios::binary);
    std::cout << ifs.rdbuf();  // copy the whole file to standard output
}

Related

How to make char16_t acceptable as a template parameter to basic_ifstream? [duplicate]

The following code works as expected. The source code, file "file.txt" and "out.txt" are all encoded with utf8. But it does not work when I change wchar_t to char16_t at the first line in main(). I've tried both gcc5.4 and clang8.0 with -std=c++11. My goal is to replace wchar_t with char16_t, as wchar_t takes twice space in RAM. I thought these 2 types are equally well supported in c++11 and later standards. What do I miss here?
#include<iostream>
#include<fstream>
#include<locale>
#include<codecvt>
#include<string>
int main(){
    typedef wchar_t my_char;
    std::locale::global(std::locale("en_US.UTF-8"));
    std::ofstream out("file.txt");
    out << "123正则表达式abc" << std::endl;
    out.close();
    std::basic_ifstream<my_char> win("file.txt");
    std::basic_string<my_char> wstr;
    win >> wstr;
    win.close();
    std::ifstream in("file.txt");
    std::string str;
    in >> str;
    in.close();
    std::wstring_convert<std::codecvt_utf8<my_char>, my_char> my_char_conv;
    std::basic_string<my_char> conv = my_char_conv.from_bytes(str);
    std::cout << (wstr == conv ? "true" : "false") << std::endl;
    std::basic_ofstream<my_char> wout("out.txt");
    wout << wstr << std::endl << conv << std::endl;
    wout.close();
    return 0;
}
EDIT
The modified code does not compile with clang8.0. It compiles with gcc5.4 but crashes at run-time as shown by @Brian.
The various stream classes need a set of definitions to be operational. The standard library requires the relevant definitions and objects only for char and wchar_t but not for char16_t or char32_t. Off the top of my head the following is needed to use std::basic_ifstream<cT> or std::basic_ofstream<cT>:
std::char_traits<cT> to specify how the character type behaves. I think this template is specialized for char16_t and char32_t.
The used std::locale needs to contain an instance of the std::num_put<cT> facet to format numeric types. This facet can just be instantiated and a new std::locale containing it can be created but the standard doesn't mandate that it is present in a std::locale object.
The used std::locale needs to contain an instance of the facet std::num_get<cT> to read numeric types. Again, this facet can be instantiated but isn't required to be present by default.
The facet std::numpunct<cT> needs to be specialized and put into the used std::locale to deal with decimal points, thousand separators, and textual boolean values. Even if it isn't really used it will be referenced from the numeric formatting and parsing functions. There is no ready specialization for char16_t or char32_t.
The facet std::ctype<cT> needs to be specialized and put into the used std::locale to support widening, narrowing, and classification of the character type. There is no ready specialization for char16_t or char32_t.
The facet std::codecvt<cT, char, std::mbstate_t> needs to be specialized and put into the used std::locale to convert between external byte sequences and internal "character" sequences. There is no ready specialization for char16_t or char32_t.
Most of the facets are reasonably easy to do: they just need to forward a simple conversion or do table look-ups. However, the std::codecvt facet tends to be rather tricky, especially because std::mbstate_t is an opaque type from the point of view of the standard C++ library.
All of that can be done. It has been a while since I last did a proof-of-concept implementation for a character type. It took me about a day's worth of work. Of course, I knew what I needed to do when I embarked on the work, having implemented the locales and IOStreams library before. To add a reasonable amount of tests rather than merely having a simple demo would probably take me a week or so (assuming I can actually concentrate on this work).
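If the goal is only to end up with UTF-16 text in memory, one way to sidestep the missing facets is to read the file as bytes through std::ifstream and convert afterwards. The following is a rough sketch of that idea, not the facet-based approach listed above. Both function names are hypothetical, and the decoder assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Assumes well-formed input; malformed sequences are not detected.
std::u16string utf8_to_utf16(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (std::size_t k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: read a whole UTF-8 file through the char-based stream.
std::u16string read_utf8_file_as_utf16(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return utf8_to_utf16(bytes);
}
```

The stream is still instantiated for char, so all the required facets are already present; only the in-memory string uses char16_t.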

Converting Const char * to Unsigned long int - strtoul

I am using the following code to convert a const char * to an unsigned long int, but the output is always 0. What am I doing wrong? Please let me know.
Here is my code:
#include <iostream>
#include <vector>
#include <stdlib.h>
using namespace std;
int main()
{
    vector<string> tok;
    tok.push_back("2");
    const char *n = tok[0].c_str();
    unsigned long int nc;
    char *pEnd;
    nc=strtoul(n,&pEnd,1);
    //cout<<n<<endl;
    cout<<nc<<endl; // it must output 2 !?
    return 0;
}
Use base-10:
nc=strtoul(n,&pEnd,10);
or allow the base to be auto-detected:
nc=strtoul(n,&pEnd,0);
The third argument to strtoul is the base to be used and you had it as base-1.
You need to use:
nc=strtoul(n,&pEnd,10);
You used base=1, which strtoul does not support: the base must be 0 or between 2 and 36.
The C standard library function strtoul takes as its third argument the base/radix of the number system to be used in interpreting the char array pointed to by the first argument.
Where am I doing wrong?
nc=strtoul(n,&pEnd,1);
You're passing the base as 1, which is not a base strtoul supports: the base must be 0 or between 2 and 36. (A unary system could not represent the digit 2 anyway.) With an unsupported base you should not expect a conversion; here you get 0 as the output. If you need decimal system interpretation, pass 10 instead of 1.
Alternatively, passing 0 lets the function auto-detect the system based on the prefix: if the string starts with 0x or 0X it is taken as hexadecimal, if it starts with 0 it is interpreted as octal, and otherwise it is assumed to be decimal.
Aside:
If you don't need to know the character up to which the conversion was considered, then passing a dummy second parameter is not required; you can pass NULL instead.
When you're using a C standard library function in a C++ program, it's recommended that you include the C++ version of the header: with the prefix c and without the suffix .h, e.g. in your case it'd be #include <cstdlib>.
using namespace std; is considered bad practice.
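Putting the advice from these answers together, a conversion with basic error checking could be sketched as follows. parse_ulong is a hypothetical helper name, not a standard function; it uses the end pointer and errno to tell a real 0 apart from a failed conversion.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: convert s to unsigned long using the given base
// (10 by default; 0 means auto-detect from the prefix).
// Returns false if no digits were found, if trailing characters remain,
// or if the value does not fit in unsigned long.
bool parse_ulong(const char* s, unsigned long& out, int base = 10) {
    char* end = nullptr;
    errno = 0;
    const unsigned long value = std::strtoul(s, &end, base);
    if (end == s) return false;         // nothing was converted
    if (*end != '\0') return false;     // unparsed characters after the number
    if (errno == ERANGE) return false;  // out of range for unsigned long
    out = value;
    return true;
}
```

With this helper, parse_ulong("2", nc) stores 2 in nc, while input such as "abc" is reported as a failure instead of silently yielding 0.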

Why doesn't uint8_t and int8_t work with file and console streams? [duplicate]

$ file testfile.txt
testfile.txt: ASCII text
$ cat testfile.txt
aaaabbbbccddef
#include <iostream>
#include <fstream>
#include <string>
#include <cstdint>
typedef uint8_t byte; // <-------- interesting
typedef std::basic_ifstream<byte> FileStreamT;
static const std::string FILENAME = "testfile.txt";
int main(){
    FileStreamT file(FILENAME, std::ifstream::in | std::ios::binary);
    if(!file.is_open())
        std::cout << "COULD NOT OPEN FILE" << std::endl;
    else{
        FileStreamT::char_type buff;
        file.read(&buff,1);
        std::cout << (SOMECAST)buff; // <------- interesting
    }
    std::cout << "done" << std::endl;
}
Depending on what is in the typedef and what it is cast to (or not cast), it does all sorts of stupid things.
It happens to work with 'typedef char' and no cast. (97 when cast to int, as expected)
Both uint8_t and int8_t will print
nothing without a cast
nothing when cast to char or unsigned char
8 when cast to int or unsigned (although ASCII 'a' should be 97)
I somehow managed to print a "�" character, but forgot which case it was.
Why do I get these strange results?
notes for the future reader:
takeaway from the answer given: only instantiate streams with char (or one of the wide characters also mentioned by the standard), otherwise you get no compiler warning and silent failure
it is very sad that the standard warrants these things
moral of the story: avoid C++
The declaration of template std::basic_ifstream is:
template<
    class CharT,
    class Traits = std::char_traits<CharT>
> class basic_ifstream;
The C++03 Standard (21.1/1) requires the library to define specializations
of std::char_traits<CharT> for CharT = char, wchar_t.
The C++11 Standard (C++11 21.2/1) requires the library to define specializations
of std::char_traits<CharT> for CharT = char,char16_t,char32_t,wchar_t.
If you instantiate std::basic_ifstream<Other> with Other not one of
the two (C++03) or four (C++11) types nominated by the Standard to which
you are compiling, then the behaviour will be undefined, unless you yourself
define my_char_traits<Other> as you require and then instantiate
std::basic_ifstream<Other,my_char_traits<Other>>.
CONTINUED in response to OP's comments.
Requesting an std::char_traits<Other> will not provoke template instantiation
errors: the template is defined so that you may specialize it, but the
default (unspecialized) instantiation is very likely to be wrong for Other
or indeed for any given CharT, where wrong means does not satisfy
the Standard's requirements for a character traits class per C++03 § 21.1.1/C++11 § 21.2.1.
You suspect that a typedef might thwart the choice of a template specialization
for the typedef-ed type, i.e. that the fact that uint8_t and int8_t
are typedefs for fundamental character types might result in std::basic_ifstream<byte>
not being the same as std::basic_ifstream<FCT>, where FCT
is the aliased fundamental character type.
Forget that suspicion. A typedef is transparent. It seems you believe one of
the typedefs int8_t and uint8_t must be char, in which case, unless
the typedef was somehow interfering with template resolution,
one of the misbehaving basic_ifstream instantiations you have tested would
have to be std::basic_ifstream<char>.
But what about the fact that typedef char byte is harmless? That belief that
either int8_t or uint8_t = char is false. You will find that int8_t
is an alias for signed char while uint8_t is an alias for unsigned char.
But neither signed char nor unsigned char is the same type as char:
C++03/11 § 3.9.1/1
Plain char, signed char, and unsigned char are three distinct types
So both char_traits<int8_t> and char_traits<uint8_t> are default,
unspecialized instantiations of template char_traits, and you have
no right to expect that they fulfill the Standard's requirements of
character traits.
The one test case in which you found no misbehaviour was for byte = char.
That is because char_traits<char> is a Standard specialization provided
by the library.
The connection between all the misbehaviour you have observed and the
types that you have substituted for SOMECAST in:
std::cout << (SOMECAST)buff; // <------- interesting
is none. Since your testfile contains ASCII text, basic_ifstream<char>
is the one and only instantiation of basic_ifstream that the Standard warrants
for reading it. If you read the file using typedef char byte in your program
then none of the casts that you say you substituted will have an unexpected
result: SOMECAST = char or unsigned char will output a, and
SOMECAST = int or unsigned int will output 97.
All the misbehaviour arises from instantiating basic_ifstream<CharT> with CharT
some type that the Standard does not warrant.
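A minimal sketch of the approach this answer points to: keep the stream instantiated for char and convert each byte to uint8_t only after it has been read. first_byte is a hypothetical helper name; the static_asserts simply restate the "three distinct types" rule quoted above.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>

// char, signed char and unsigned char are three distinct types, so only
// char matches the library-provided std::char_traits<char> specialization.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Hypothetical helper: return the first byte of the file as a number,
// or -1 if nothing could be read.
int first_byte(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    char c;
    if (!file.read(&c, 1)) return -1;
    // Convert after reading: char -> uint8_t -> int, so 'a' becomes 97.
    return static_cast<int>(static_cast<std::uint8_t>(c));
}
```

For the testfile.txt shown in the question, this should yield 97, the ASCII code of 'a'.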

Equivalent of a python generator in C++ for buffered reads

Guido van Rossum demonstrates the simplicity of Python in this article and makes use of this function for buffered reads of a file of unknown length:
def intsfromfile(f):
    while True:
        a = array.array('i')
        a.fromstring(f.read(4000))
        if not a:
            break
        for x in a:
            yield x
I need to do the same thing in C++ for speed reasons! I have many files containing sorted lists of unsigned 64 bit integers that I need to merge. I have found this nice piece of code for merging vectors.
I am stuck on how to make an ifstream for a file of unknown length present itself as a vector which can be happily iterated over until the end of the file is reached. Any suggestions? Am I barking up the correct tree with an istreambuf_iterator?
In order to disguise an ifstream (or really, any input stream) in a form that acts like an iterator, you want to use the istream_iterator or the istreambuf_iterator template class. The former is useful for files where the formatting is of concern. For example, a file full of whitespace-delimited integers can be read into the vector's iterator range constructor as follows:
#include <fstream>
#include <vector>
#include <iterator> // needed for istream_iterator
using namespace std;
int main(int argc, char** argv)
{
    ifstream infile("my-file.txt");
    // It isn't customary to declare these as standalone variables,
    // but see below for why it's necessary when working with
    // initializing containers.
    istream_iterator<int> infile_begin(infile);
    istream_iterator<int> infile_end;
    vector<int> my_ints(infile_begin, infile_end);
    // You can also do stuff with the istream_iterator objects directly:
    // Careful! If you run this program as is, this won't work because we
    // used up the input stream already with the vector.
    int total = 0;
    while (infile_begin != infile_end) {
        total += *infile_begin;
        ++infile_begin;
    }
    return 0;
}
istreambuf_iterator is used to read through files a single character at a time, disregarding the formatting of the input. That is, it will return you all characters, including spaces, newline characters, and so on. Depending on your application, that may be more appropriate.
Note: Scott Meyers explains in Effective STL why the separate variable declarations for istream_iterator are needed above. Normally, you would do something like this:
ifstream infile("my-file.txt");
vector<int> my_ints(istream_iterator<int>(infile), istream_iterator<int>());
However, C++ actually parses the second line in an incredibly bizarre way. It sees it as the declaration of a function named my_ints that takes in two parameters and returns a vector<int>. The first parameter is of type istream_iterator<int> and is named infile (the parentheses are ignored). The second parameter is a function pointer with no name that takes zero arguments (because of the parentheses) and returns an object of type istream_iterator<int>.
Pretty cool, but also pretty aggravating if you're not watching out for it.
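Besides declaring the iterators as separate variables, a common way around this parse is to wrap the first argument in an extra pair of parentheses, which cannot be read as a parameter declaration. A small sketch, using a hypothetical parse_ints helper and a string stream in place of the file:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse whitespace-delimited ints from text.
std::vector<int> parse_ints(const std::string& text) {
    std::istringstream in(text);
    // The extra parentheses around the first argument keep the compiler
    // from reading this line as a function declaration.
    std::vector<int> values((std::istream_iterator<int>(in)),
                            std::istream_iterator<int>());
    return values;
}
```

In C++11 and later, brace initialization of the vector is another option that avoids the ambiguity.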
EDIT
Here's an example using the istreambuf_iterator to read in a file of 64-bit numbers laid out end-to-end:
#include <cstdint>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
int main(int argc, char** argv)
{
    // Open in binary mode so no newline translation alters the raw bytes.
    ifstream input("my-file.txt", ios::binary);
    istreambuf_iterator<char> input_begin(input);
    istreambuf_iterator<char> input_end;
    // Fill a char vector with input file's contents:
    vector<char> char_input(input_begin, input_end);
    input.close();
    // Convert it to an array of 64-bit values with a cast
    // (uint64_t is exactly 64 bits; unsigned long is not on every platform):
    uint64_t* converted = reinterpret_cast<uint64_t*>(char_input.data());
    size_t num_long_elements = char_input.size() * sizeof(char) / sizeof(uint64_t);
    // Put that information into a vector:
    vector<uint64_t> long_input(converted, converted + num_long_elements);
    return 0;
}
Now, I personally rather dislike this solution (using reinterpret_cast, exposing char_input's array), but I'm not familiar enough with istreambuf_iterator to comfortably use one templatized over 64-bit characters, which would make this much easier.
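For the buffered, generator-like reading the question asks about, one alternative is a small reader class that refills a fixed-size buffer on demand, so the whole file is never held in memory. This is only a sketch: U64FileReader and its members are hypothetical names, and it assumes the file holds raw 64-bit values in the machine's native byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical generator-like reader: hands out one value per call to next()
// and refills its buffer from the file in chunks.
class U64FileReader {
public:
    explicit U64FileReader(const std::string& path, std::size_t chunk_values = 500)
        : in_(path, std::ios::binary), buf_(chunk_values), pos_(0), count_(0) {}

    // Stores the next value in out; returns false once the file is exhausted.
    bool next(std::uint64_t& out) {
        if (pos_ == count_ && !refill()) return false;
        out = buf_[pos_++];
        return true;
    }

private:
    // Read up to one chunk of raw bytes directly into the uint64_t buffer.
    bool refill() {
        in_.read(reinterpret_cast<char*>(buf_.data()),
                 static_cast<std::streamsize>(buf_.size() * sizeof(std::uint64_t)));
        // gcount() covers the short read at end of file; a trailing partial
        // value (fewer than 8 bytes) is dropped.
        count_ = static_cast<std::size_t>(in_.gcount()) / sizeof(std::uint64_t);
        pos_ = 0;
        return count_ > 0;
    }

    std::ifstream in_;
    std::vector<std::uint64_t> buf_;
    std::size_t pos_;
    std::size_t count_;
};
```

A merge loop can hold one such reader per input file and call next() on whichever reader currently has the smallest value, much like pulling from the Python generator.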