Unicode File Writing and Reading in C++? - c++

Can anyone provide a simple example of reading and writing a Unicode character in a Unicode file?

Try http://utfcpp.sourceforge.net/. The site has an introductory example that reads a UTF-8 file line by line.

On Linux I use the iconv library, which is very standard. An overly simple program is:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> /* read, write */
#include <iconv.h>
#define BUF_SZ 1024
int main( int argc, char* argv[] )
{
char bin[BUF_SZ];
char bout[BUF_SZ];
char* inp;
char* outp;
ssize_t bytes_read;
size_t bytes_in;
size_t bytes_out;
size_t conv_res;
if( argc != 3 )
{
fprintf( stderr, "usage: convert from to\n" );
return 1;
}
/* iconv_open takes the target encoding first, then the source encoding */
iconv_t conv = iconv_open( argv[2], argv[1] );
if( conv == (iconv_t)(-1) )
{
fprintf( stderr, "Cannot convert from %s to %s\n", argv[1], argv[2] );
return 1;
}
bytes_read = read( 0, bin, BUF_SZ );
if( bytes_read > 0 )
{
bytes_in = (size_t)bytes_read;
bytes_out = BUF_SZ;
inp = bin;
outp = bout;
conv_res = iconv( conv, &inp, &bytes_in, &outp, &bytes_out );
/* iconv returns (size_t)(-1) on error; conv_res is unsigned, so ">= 0" would always be true */
if( conv_res != (size_t)(-1) )
{
write( 1, bout, (size_t)(BUF_SZ) - bytes_out );
}
}
iconv_close( conv );
return 0;
}
This is overly simple, just to demonstrate the conversion. In the real world you would normally have two nested loops:
One reading input, to handle input that is larger than BUF_SZ.
One converting input to output. Remember that if you're converting from ASCII to UTF-32LE, each input byte becomes 4 bytes of output, so the output buffer can fill up before the input is consumed. The inner loop handles this by checking whether iconv returned (size_t)(-1) and then examining errno (E2BIG means the output buffer is full).
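The two loops described above could look roughly like the sketch below. The helper name convert_stream and the buffer sizes are my own choices, not part of iconv. It assumes blocking file descriptors and a stateless target encoding; a stateful encoding would also need a final flush call to iconv with a null input pointer, which is omitted here.

```cpp
#include <cerrno>
#include <cstring>
#include <iconv.h>
#include <unistd.h>

// Hypothetical helper: converts everything readable from in_fd with the
// given conversion descriptor and writes the result to out_fd.
bool convert_stream(iconv_t cd, int in_fd, int out_fd)
{
    char in_buf[1024];
    char out_buf[1024];
    size_t leftover = 0;  // bytes of an incomplete sequence kept from the previous chunk

    for (;;) {  // outer loop: read the input chunk by chunk
        ssize_t n = read(in_fd, in_buf + leftover, sizeof in_buf - leftover);
        if (n < 0) return false;
        if (n == 0) return leftover == 0;  // EOF; leftover bytes mean truncated input

        char*  in_ptr  = in_buf;
        size_t in_left = leftover + static_cast<size_t>(n);

        while (in_left > 0) {  // inner loop: convert until this chunk is consumed
            char*  out_ptr  = out_buf;
            size_t out_left = sizeof out_buf;
            size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
            int err = (rc == static_cast<size_t>(-1)) ? errno : 0;  // save errno before write()

            size_t produced = sizeof out_buf - out_left;
            if (produced > 0 &&
                write(out_fd, out_buf, produced) != static_cast<ssize_t>(produced))
                return false;

            if (err == E2BIG && produced == 0) return false;  // no progress possible
            if (err == 0 || err == E2BIG) continue;  // E2BIG: output was full, convert the rest
            if (err == EINVAL) break;                // incomplete sequence: read more input
            return false;                            // EILSEQ or another error
        }
        std::memmove(in_buf, in_ptr, in_left);  // keep the unconverted tail for the next read
        leftover = in_left;
    }
}
```

The caller would open the descriptor with iconv_open( to, from ), call convert_stream( cd, 0, 1 ) for stdin to stdout, and finish with iconv_close( cd ).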

If you're using Windows:
Use fgetws http://msdn.microsoft.com/en-us/library/c37dh6kf(VS.71).aspx to read
and fputws http://msdn.microsoft.com/en-us/library/t33ya8ky(VS.71).aspx to write.
Example code is available at the provided links.
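For readers who cannot follow the links, here is a minimal sketch of the fgetws/fputws pairing. Both functions are standard C (declared in <wchar.h> / <cwchar>), so this is not Windows-specific. The helper name copy_wide_lines is hypothetical, and how bytes in the file map to wide characters depends on the stream's encoding setup (the locale, or the mode string passed to fopen on Windows), which this sketch leaves to the caller.

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper: copies wide-character text line by line from `in`
// to `out`. Both streams must already be open and wide-oriented (or not
// yet oriented); returns false on a read or write error.
bool copy_wide_lines(std::FILE* in, std::FILE* out)
{
    wchar_t line[512];
    // fgetws reads at most 511 wide characters plus the terminator per call;
    // longer lines simply arrive in several pieces.
    while (std::fgetws(line, 512, in) != nullptr) {
        if (std::fputws(line, out) < 0)  // fputws returns EOF on failure
            return false;
    }
    return !std::ferror(in);  // distinguish end of file from a read error
}
```

Do not mix byte-oriented calls such as fgets or fputs on the same stream; a stream's orientation is fixed by the first I/O call made on it.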

Related

Convert the Linux open, read, write, close functions to work on Windows

The code below was written for Linux and uses open, read, write and close. I am working on a Windows computer where I normally use fopen, fgets, fputs, fclose. Right now I get a no prototype error for open, read, write and close. Is there a header file I can include to make this work on a Windows computer or do I need to convert the code? Can you show how to convert it so it works the same on Windows or at least point me to an online document which shows how to convert it?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#ifdef unix
#include <unistd.h>
#endif
#ifndef O_BINARY
#define O_BINARY 0
#endif
#define NB 8192
char buff[NB];
int main( int argc, char **argv )
{
int fdi, fdo, i, n, m;
char *p, *q;
char c;
if( argc > 0 )
printf( "%s: Reverse bytes in 8-byte values \n", argv[0] );
if( argc > 1 )
strcpy( buff, argv[1] );
else
{
printf( "Input file name ? " );
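fgets( buff, NB, stdin ); /* gets() has no length limit and was removed from the C standard */
buff[strcspn( buff, "\n" )] = '\0'; /* drop the trailing newline */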
}
fdi = open( buff, O_BINARY | O_RDONLY, S_IREAD );
if( fdi < 0 )
{
printf( "Can't open <%s>\n", buff );
exit(2);
}
if( argc > 2 )
strcpy( buff, argv[2] );
else
{
printf( "Output file name ? " );
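fgets( buff, NB, stdin ); /* gets() has no length limit and was removed from the C standard */
buff[strcspn( buff, "\n" )] = '\0'; /* drop the trailing newline */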
}
fdo = open( buff, O_BINARY | O_RDWR | O_CREAT | O_TRUNC,
S_IREAD | S_IWRITE );
if( fdo < 0 )
{
printf( "Can't open <%s>\n", buff );
exit(2);
}
while( (n = read( fdi, buff, NB )) > 0 )
{
m = n / 8;
p = buff;
q = buff+7;
for( i=0; i<m; i++ )
{
c = *p;
*p++ = *q;
*q-- = c;
c = *p;
*p++ = *q;
*q-- = c;
c = *p;
*p++ = *q;
*q-- = c;
c = *p;
*p++ = *q;
*q-- = c;
p += 4;
q += 12;
}
write( fdo, buff, n );
}
close( fdo );
close( fdi );
exit(0);
}
Microsoft directly supports POSIX-style low-level I/O calls such as open(), read(), write(), and close(), although the documentation gives the POSIX names what appears to be a misleading "deprecated" characterization.
The required header is <io.h>.
The calls correspond to functions named with a preceding underscore, so open() maps to _open().
The full list of "low-level" I/O functions Microsoft supports is:
Low-Level I/O Functions
Function                                Use
_close                                  Close file
_commit                                 Flush file to disk
_creat, _wcreat                         Create file
_dup                                    Return next available file descriptor for given file
_dup2                                   Create second descriptor for given file
_eof                                    Test for end of file
_lseek, _lseeki64                       Reposition file pointer to given location
_open, _wopen                           Open file
_read                                   Read data from file
_sopen, _wsopen, _sopen_s, _wsopen_s    Open file for file sharing
_tell, _telli64                         Get current file-pointer position
_umask, _umask_s                        Set file-permission mask
_write                                  Write data to file
Some of the low-level functions may not have a non-underscore, POSIX-style equivalent name.
The corresponding functions in Windows use the same name but with an underscore (_) prepended to the name.
open -> _open
close -> _close
etc.
They are declared in the header io.h. See https://msdn.microsoft.com/en-us/library/z0kc8e3z.aspx for the list of all supported functions.
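One way to keep a single source file building on both systems is a thin wrapper layer, sketched below. The fd_open, fd_read, fd_write and fd_close names are hypothetical; only the functions they forward to (_open, _read, _write, _close on Windows, and open, read, write, close elsewhere) are real. This is a sketch under those assumptions, not a complete compatibility layer.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifdef _WIN32
#include <io.h>      // _open, _read, _write, _close
#else
#include <unistd.h>  // read, write, close (open is declared in <fcntl.h>)
#endif

#ifndef O_BINARY
#define O_BINARY 0   // POSIX systems make no text/binary distinction
#endif

// Hypothetical wrappers: the same calls compile on both platforms.
inline int fd_open(const char* path, int flags, int mode)
{
#ifdef _WIN32
    return _open(path, flags, mode);
#else
    return open(path, flags, static_cast<mode_t>(mode));
#endif
}

inline int fd_read(int fd, void* buf, unsigned count)
{
#ifdef _WIN32
    return _read(fd, buf, count);
#else
    return static_cast<int>(read(fd, buf, count));
#endif
}

inline int fd_write(int fd, const void* buf, unsigned count)
{
#ifdef _WIN32
    return _write(fd, buf, count);
#else
    return static_cast<int>(write(fd, buf, count));
#endif
}

inline int fd_close(int fd)
{
#ifdef _WIN32
    return _close(fd);
#else
    return close(fd);
#endif
}
```

With these in place, the byte-reversing program from the question would call fd_open( buff, O_BINARY | O_RDONLY, 0 ) and so on, and test for a negative return value to detect failure.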
Borland C++ Builder encapsulated binary file access functions into:
FileOpen
FileCreate
FileRead
FileWrite
FileSeek
FileClose
Here is a simple example of loading a text file:
BYTE *txt=NULL; int hnd=-1,siz=0;
hnd = FileOpen("textfile.txt",fmOpenRead);
if (hnd!=-1)
{
siz=FileSeek(hnd,0,2); // position to end of file (0 bytes from end) and store the offset to siz which means size of file
FileSeek(hnd,0,0); // position to start of file (0 bytes from start)
txt = new BYTE[siz];
FileRead(hnd,txt,siz); // read siz bytes to txt buffer
FileClose(hnd);
}
if (txt!=NULL)
{
// here do your stuff with txt[siz]; I save it to another file
hnd = FileCreate("output.txt");
if (hnd!=-1)
{
FileWrite(hnd,txt,siz); // write siz bytes from the txt buffer
FileClose(hnd);
}
delete[] txt;
}
IIRC all of these are part of the VCL, so if you are building a console application you need to check the VCL include option during project creation or include it manually.

different results with printf and fprintf

I need a function that prints "word=n" (where n is in [0..10]) to a stream using the Linux function ssize_t write(int fd, const void *buf, size_t count);. I am also calling fprintf, and I get strange results: in about 1% of calls the program prints "woword=n", while the reported length for, e.g., "woword=7" is still 7. printf prints everything correctly. Am I doing something wrong, or is this a bug?
if ((id_result = open( out , O_WRONLY)) <= 0) {
fprintf(stderr, "%s : %s\n", currentDateTime().c_str(), "could not open output\0");
ret = P_STREAMS_LOAD_ERROR;
}
void printProbability( int probability ){
char buf[50];
memset( buf, '\0', 50 );
int length = sprintf( buf, "word=%i\n\0", probability );
fprintf(stderr, "debug : word=%i len = %i\n\0", probability, length );
int result = write( id_result, buf, length );
if( result == -1){
fprintf(stderr, "%s : %s\n", currentDateTime().c_str(), "error \n");
}
}
EDITED:
As I understand it, we have two theories:
1) mixing printf and write
2) using '\0' and '\n' in fprintf
int length = sprintf( buf, "word=%i", probability );
int result = write( id_result, buf, length );
write( id_result, "\n", 1 );
With this code I still get the same errors.
Please help me :)
If you are interspersing calls to printf (or write) and fprintf(stderr, ...) the output won't necessarily come out in order. There is buffering going on, and the actual output probably won't switch at the end-of-line character.
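The ordering effect can be reproduced in isolation. The sketch below (the function name interleave is hypothetical) sends one message through a buffered stdio stream and one through write() on the same pipe, then reads back what actually arrived. Note that stderr itself is normally unbuffered, so this illustrates the general buffering effect the answer describes rather than diagnosing the exact cause of the "woword" output.

```cpp
#include <stdio.h>   // fdopen is POSIX and declared in <stdio.h>
#include <string>
#include <unistd.h>

// Hypothetical demo: returns the bytes in the order they reached the pipe.
std::string interleave(bool flush_before_write)
{
    int fds[2];
    if (pipe(fds) != 0) return std::string();

    FILE* out = fdopen(fds[1], "w");  // fully buffered, because a pipe is not a terminal
    if (out == nullptr) { close(fds[0]); close(fds[1]); return std::string(); }

    fprintf(out, "first\n");              // stays in the stdio buffer for now
    if (flush_before_write) fflush(out);  // push it into the pipe before bypassing stdio
    ssize_t written = write(fds[1], "second\n", 7);  // goes to the pipe immediately
    fclose(out);                          // flushes anything still buffered, closes fds[1]

    char buf[64];
    ssize_t n = (written == 7) ? read(fds[0], buf, sizeof buf) : -1;
    close(fds[0]);
    return n > 0 ? std::string(buf, static_cast<size_t>(n)) : std::string();
}
```

Without the flush the write() output overtakes the fprintf output. Calling fflush before every switch from stdio to write(), or using only one of the two interfaces per descriptor, keeps the order predictable.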

C++ fwrite() in hex

Writing code with Winsock.
I currently have this:
fwrite(buff, 1, len, stdout);
How can I print it as hex instead, like this:
for ( int i = 0; i < len; i++ ) {
printf( "%02x ", (unsigned char) buff[i] );
}
printf( "\n" );
Or should I just remove the fwrite and use printf instead?
I want to write it to stdout because I have the option to write either to stdout or to a file.
fprintf is like printf, but writes to an arbitrary FILE* stream:
fprintf(stdout, "%02x ", (unsigned char) buff[i]);
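Wrapped into a function, that might look like the sketch below. The name dump_hex is hypothetical; the stream parameter is what lets the same code serve both stdout and a file opened with fopen.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes len bytes of buff to `out` as two-digit
// lowercase hex values separated by spaces, followed by a newline.
void dump_hex(std::FILE* out, const char* buff, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i) {
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        unsigned value = static_cast<unsigned char>(buff[i]);
        std::fprintf(out, "%02x ", value);
    }
    std::fprintf(out, "\n");
}
```

Calling dump_hex( stdout, buff, len ) replaces the fwrite call, and passing a FILE* from fopen sends the same text to a file.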

Is it possible to use a C++ stream class to buffer reads from a pipe?

In short, is it possible to do buffered reads from a pipe from a stream class, along the lines of what this pseudo-example describes.
Please ignore any pedantic problems you see (like not checking errors & the like); I'm doing all that in my real code, this is just a pseudo-example to get across my question.
#include <iostream> // or istream, ifstream, strstream, etc; whatever stream could pull this off
#include <unistd.h>
#include <stdlib.h>
#include <sstream>
void myFunc() {
int pipefd[2][2] = {{0,0},{0,0}};
pipe2( pipefd[0], O_NONBLOCK );
pipe2( pipefd[1], O_NONBLOCK );
if( 0 == fork() ) {
close( pipefd[0][1] );
close( pipefd[1][1] );
dup2( pipefd[0][0], stdout );
dup2( pipefd[1][0], stderr );
execv( /* some arbitrary program */ );
} else {
close( pipefd[0][0] );
close( pipefd[1][0] );
/* cloudy bubble here for the 'right thing to do'.
* Obviously this is faulty code; look at the intent,
* not the implementation.
*/
#ifdef RIGHT_THING_TO_DO
for( int ii = 0; ii < 2; ++ii ) {
cin.tie( pipefd[ii][1] );
do {
cin.readline( /* ... */ );
} while( /* ... */ );
}
#else
// This is what I'm doing now; it works, but I'm
// curious whether it can be done more concisely
do {
do {
select( /* ... */ );
for( int ii = 0; ii < 2; ++ii ) {
if( FD_SET( fd[ii][1], &rfds ) ) {
read( fd[ii][1], buff, 4096 );
if( /* read returned a value > 0 */ ) {
myStringStream << buff;
} else {
FD_CLR( fd[ii][1], &rfds );
}
}
}
} while( /* select returned a value > 0 */ );
} while( 0 == waitpid( -1, 0, WNOHANG ) );
#endif
}
}
Edit
Here's a simple example of how to use boost::iostreams::file_descriptor_source to work with a pipe; it should work with sockets too, though I didn't test that.
This is how I compiled it:
g++ -m32 -DBOOST_IOSTREAMS_NO_LIB -isystem ${BOOST_PATH}/include \
${BOOST_SRC_PATH}/libs/iostreams/src/file_descriptor.cpp blah.cc -o blah
Here's the example:
#include <fcntl.h>
#include <stdio.h>
#include <string.h> // memset
#include <unistd.h> // pipe2, fork, dup2, close
#include <boost/iostreams/device/file_descriptor.hpp>
#include <boost/iostreams/stream.hpp>
int main( int argc, char* argv[] ) {
// if you just do 'using namespace...', there's a
// namespace collision with the global 'write'
// function used in the child
namespace io = boost::iostreams;
int pipefd[] = {0,0};
pipe2( pipefd, 0 ); // If you use O_NONBLOCK, you'll have to
// add some extra checks to the loop so
// it will wait until the child is finished.
if( 0 == fork() ) {
// child
close( pipefd[0] ); // read handle
dup2( pipefd[1], STDOUT_FILENO );
printf( "This\nis\na\ntest\nto\nmake sure that\nit\nis\nworking as expected.\n" );
return 0; // ya ya, shoot me ;p
}
// parent
close( pipefd[1] ); // write handle
char *buff = new char[1024];
memset( buff, 0, 1024 );
io::stream<io::file_descriptor_source> fds(
io::file_descriptor_source( pipefd[0], io::never_close_handle ) );
// this should work with std::getline as well
while( fds.getline( buff, 1024 )
&& fds.gcount() > 0 // this condition is not enough if you use
// O_NONBLOCK; it should only bail if this
// is false AND the child has exited
) {
printf( "%s,", buff );
}
printf( "\n" );
}
There sure is. There's an example from the book "The C++ Standard Library: a Tutorial and Reference" for how to make a std::streambuf that wraps file descriptors (like those you get from pipe()). From that creating a stream on top of it is trivial.
Edit: here's the book: http://www.josuttis.com/libbook/
And here's an example output buffer using file descriptors: http://www.josuttis.com/libbook/io/outbuf2.hpp.html
Also, here's an example input buffer: http://www.josuttis.com/libbook/io/inbuf1.hpp.html
You'd want a stream that can be created with an existing file descriptor, or a stream that creates a pipe itself. Unfortunately there's no such standard stream type.
You could write your own or use, for example, boost::iostreams::file_descriptor.
Writing your own entails creating a subclass of basic_streambuf, and then creating a very simple subclass of basic_istream or basic_ostream that does little more than hold your streambuf class and provide convenient constructors.
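A minimal sketch of that approach for the input side follows. The class names FdInBuf and FdIStream and the helper read_lines are my own, and the code assumes a blocking descriptor; a non-blocking pipe would need extra handling for EAGAIN that is not shown.

```cpp
#include <istream>
#include <streambuf>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical stream buffer that refills itself from a file descriptor.
class FdInBuf : public std::streambuf {
public:
    explicit FdInBuf(int fd) : fd_(fd) { setg(buf_, buf_, buf_); }  // start with an empty get area
protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        ssize_t n = ::read(fd_, buf_, sizeof buf_);  // blocks until data or EOF
        if (n <= 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }
private:
    int fd_;
    char buf_[4096];
};

// Very small istream subclass that only owns the buffer.
class FdIStream : public std::istream {
public:
    explicit FdIStream(int fd) : std::istream(nullptr), buf_(fd) { rdbuf(&buf_); }
private:
    FdInBuf buf_;
};

// Usage sketch: collect every line available on fd until EOF.
std::vector<std::string> read_lines(int fd)
{
    FdIStream in(fd);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return lines;
}
```

The stream does not close the descriptor, so the caller remains responsible for calling close() on the read end of the pipe.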

How can I read keyboard input to character strings? (C++)

getc (stdin) reads keyboard input to integers, but what if I want to read keyboard input to character strings?
#include "stdafx.h"
#include "string.h"
#include "stdio.h"
void CharReadWrite(FILE *fin);
FILE *fptr2;
int _tmain(int argc, _TCHAR* argv[])
{
char alpha= getc(stdin);
char filename=alpha;
if (fopen_s( &fptr2, filename, "r" ) != 0 )
printf( "File stream %s was not opened\n", filename );
else
printf( "The file %s was opened\n", filename );
CharReadWrite(fptr2);
fclose(fptr2);
return 0;
}
void CharReadWrite(FILE *fin){
int c;
while ((c=fgetc(fin)) !=EOF) {
putchar(c);}
}
Continuing with the theme of getc you can use fgets to read a line of input into a character buffer.
E.g.
char buffer[1024];
char *line = fgets(buffer, sizeof(buffer), stdin);
if( !line ) {
if( feof(stdin) ) {
printf("end of file\n");
} else if( ferror(stdin) ) {
printf("An error occurred\n");
exit(1);
}
} else {
printf("You entered: %s", line);
}
Note that ryansstack's answer is a much better, easier and safer solution given you are using C++.
http://www.cplusplus.com/reference/iostream/istream/getline/
Ta da!
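A character is just a small integer value. A char is typically 8 bits wide, and ASCII assigns characters to the values 0-127 (whether plain char is signed or unsigned is implementation-defined). If you have a look at an ASCII table you can see how the integer values map to characters. getc returns an int so that EOF can be told apart from real characters, but in general you can just jump between the types, i.e.: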
int chInt = getc(stdin);
char ch = chInt;
// more simple
char ch = getc(stdin);
// to be explicit
char ch = static_cast<char>(getc(stdin));
Edit: If you are set on using getc to read in the file name, you could do the following:
char buf[255];
int c;
int i=0;
while (1)
{
c = getc(stdin);
if ( c=='\n' || c==EOF || i == (int)sizeof(buf) - 1 ) // also stop before overflowing buf
break;
buf[i++] = c;
}
buf[i] = 0;
This is a pretty low level way of reading character inputs, the other responses give higher level/safer methods, but again if you're set on getc...
Since you are already mixing C code with C++ by using printf, why not continue and use scanf, e.g. scanf("%254s", mystring); with char mystring[255], to read and format it all nicely? Note that %s stops at the first whitespace character.
Or, of course, what was already said: getline.