How to put UTF-8 text into std::string through linux sockets - c++

I made a simple C++ server program, which works just fine as long as I use it with simple tools like telnet. However, when I use, for example, a .NET (C#) client that connects to it and sends it some strings, the text is somewhat corrupted. I tried multiple encodings on the C# side, and the only result was that it was corrupted in a different way.
I believe the main problem is in this function, which is meant to read a line of text from the socket:
std::string Client::ReadLine()
{
    std::string line;
    while (true)
    {
        char buffer[10];
        read(this->Socket, buffer, 9);
        int i = 0;
        while (i < 10)
        {
            if (buffer[i] == '\r')
            {
                i++;
                continue;
            }
            if (buffer[i] == '\0')
            {
                // end of string reached
                break;
            }
            if (buffer[i] == '\n')
            {
                return line;
            }
            line += buffer[i];
            i++;
        }
    }
    return line;
}
This is the program's output in the terminal. When I send it the string "en.wikipedia.org" using telnet, I see:
Subscribed to en.wikipedia.org
When I use C# and open a stream writer using this code
streamWriter = new StreamWriter(networkStream, Encoding.UTF8);
I see:
Subscribed to en.wiki,pedia.org,
When I use it without UTF-8 (so that the default .NET encoding is used; I don't know what it is)
streamWriter = new StreamWriter(networkStream);
I see:
Subscribed to en.wiki�pedia.org�
However, in both cases it's wrong. What's the simplest way to achieve this using just standard C++ and Linux libraries? (No Boost etc. I could do this using some framework like Qt or Boost, but I would like to understand it.) Full code: http://github.com/huggle/XMLRCS

A UTF-8 string is just a series of single bytes, which is basically what std::string is supposed to handle. You have two other problems:
The first is that you don't check how many characters were actually read; you always loop over ten characters. Since you don't loop over the actual number of characters read (and don't check for an error or the end of the connection), you might read data in the buffer beyond what was written by read, and that is undefined behavior.
The second problem is related to the first: you have a buffer of ten characters, you read up to nine characters into it, and then you loop over all ten. Since you only read up to nine characters, the tenth is always uninitialized, so its value is indeterminate and reading it again leads to undefined behavior.
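As a rough illustration of both points, here is a minimal sketch of a line reader that only looks at the bytes read() reports. The names read_line and pending are hypothetical and not part of the original program; fd stands in for this->Socket. It assumes lines end in '\n' (optionally preceded by '\r') and keeps any bytes after the newline for the next call instead of discarding them.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical helper (not from the original code): extracts one line from fd.
// `pending` carries bytes that were read but not yet consumed between calls.
// Returns false on end of connection or error before a full line arrived.
bool read_line(int fd, std::string& pending, std::string& line) {
    assert(fd >= 0);
    for (;;) {
        // First see whether a complete line is already buffered.
        std::string::size_type nl = pending.find('\n');
        if (nl != std::string::npos) {
            line.assign(pending, 0, nl);
            pending.erase(0, nl + 1);
            if (!line.empty() && line.back() == '\r') line.pop_back();
            return true;
        }
        char buffer[256];
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n < 0 && errno == EINTR) continue;  // interrupted, try again
        if (n <= 0) return false;               // 0: peer closed, <0: error
        // Append exactly the n bytes that read() wrote, nothing more.
        pending.append(buffer, static_cast<std::size_t>(n));
    }
}
```

Because UTF-8 never uses the byte '\n' inside a multi-byte sequence, splitting on it is safe, and the bytes of each line land in the std::string unchanged.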

Related

Calling putback() on istream multiple times

Many sites describe the istream::putback() function that lets you "put back" a character into the input stream so you can read it again in a subsequent reading operation.
What's to stop me, however, from calling putback() multiple times in sequence over the same stream? Of course, you're supposed to check for errors after every operation in order to find out if it succeeded; and yet, I wonder: is there any guarantee that a particular type of stream supports putting back more than one character at a time?
I'm only guessing here, but I can imagine istringstream is able to put back as many characters as the length of the string within the stream; but I'm not so sure that it is the same for ifstream.
Is any of this true? How do I find out how many characters I can putback() into an istream?
If you want to read multiple characters from a stream you may unget them using unget():
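std::vector<char>& read_top5(std::istream& stream, std::vector<char>& container) {
    std::ios_base::sync_with_stdio(false);
    char c;
    int i = 4;
    container.clear();
    // read up to five characters, stopping early after a newline
    while (stream && stream.get(c)) {
        container.push_back(c);
        if (--i < 0) break;
        if (c == '\n') break;
    }
    // a get() that hit the end of input leaves failbit set; clear it so unget() can run
    stream.clear();
    for (int j = 0; j < (int)container.size(); j++) {
        //stream.putback(container[j]); // not working (it would need the characters in reverse order)
        stream.unget(); // working properly
    }
    return container;
}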
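This function reads up to the first 5 characters from the stream (stopping early after a newline) and then ungets them, so they are still in the stream after the function exits. Whether more than one unget() in a row succeeds depends on the underlying stream buffer, so check the stream state after each call.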
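One way to find out how far a particular stream lets you step back is simply to try and count. The sketch below does that; unget_n is a hypothetical helper name, and the result it reports applies only to the stream it was called on.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>

// Hypothetical helper: calls unget() up to `count` times and returns how many
// calls succeeded. A failed unget() leaves the stream in a failed state, so
// the caller should clear() it before reading again.
std::size_t unget_n(std::istream& in, std::size_t count) {
    std::size_t done = 0;
    while (done < count && in.unget()) {
        ++done;
    }
    return done;
}
```

With a std::istringstream this typically lets you step back all the way to the start of the string and then fails; for file streams and std::cin the limit depends on the implementation's buffering.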

String parsing to extract int in C++ for Arduino

I'm trying to write a sketch that allows a user to access data in EEPROM using the serial monitor. In the serial monitor the user should be able to type one of two commands: "read" and "write". "Read" should take one argument, an EEPROM address. "Write" should take two arguments, an EEPROM address and a value. For example, if the user types "read 7" then the contents of EEPROM address 7 should be printed to the serial monitor. If the user types "write 7 12" then the value 12 should be written into address 7 of the EEPROM. Any help is much appreciated. I'm not an expert in Arduino, still learning ;). In the code below I defined inByte to be the result of Serial.read(). Now how do I extract numbers from the string "inByte" to assign to "val" and "addr"?
void loop() {
    String inByte;
    if (Serial.available() > 0) {
        // get incoming byte:
        inByte = Serial.read();
    }
    if (inByte.startsWith("Write")) {
        EEPROM.write(addr, val);
    }
    if (inByte.startsWith("Read")) {
        val= EEPROM.read(addr);
    }
    delay(500);
}
Serial.read() only reads a single character. You should either loop, filling your buffer until there is no more input, or use a blocking function like Serial.readStringUntil() or Serial.readBytes() to fill a buffer for you.
https://www.arduino.cc/en/Serial/ReadStringUntil
https://www.arduino.cc/en/Serial/ReadBytes
Or you can use Serial.parseInt() twice to grab the two values directly into a pair of integers. This function skips the non-numeric text and grabs the values. This method is also blocking.
https://www.arduino.cc/en/Reference/StreamParseInt
A patch I wrote to improve this function is available in the latest hourly build, but the old versions still work fine for simple numbers with previous IDEs.
The blocking methods can be tweaked using Serial.setTimeout() to change how long they wait for input (1000 ms by default).
https://www.arduino.cc/en/Serial/SetTimeout
[missed the other answer, there's half my answer gone]
I was going to say use Serial.readStringUntil('\n') in order to read a line at a time.
To address the part:
how do I extract numbers from the string "inByte" to assign to "val" and "addr"
This is less trivial than it might seem and a lot of things can go wrong. For simplicity, let's assume the input string is always in the format /^(Read|Write) (\d+)( \d+)?$/.
A simple way to parse it would be to find the spaces, isolate the number strings and call .toInt().
...
int val, addr;
int addrStart = 0;
// check the bound first, then index into the string
while (addrStart < inByte.length() && inByte[addrStart] != ' ')
    addrStart++;
addrStart++; //skip the space
int addrEnd = addrStart + 1;
while (addrEnd < inByte.length() && inByte[addrEnd] != ' ')
    addrEnd++;
String addrStr = inByte.substring(addrStart, addrEnd); //excludes addrEnd
addr = addrStr.toInt();
if (inByte.startsWith("Write")) {
    int valEnd = addrEnd + 1;
    while (valEnd < inByte.length() && inByte[valEnd] != ' ')
        valEnd++;
    String valStr = inByte.substring(addrEnd + 1, valEnd);
    val = valStr.toInt();
    EEPROM.write(addr, val);
}
else if (inByte.startsWith("Read")) {
    val = EEPROM.read(addr);
}
This can fail in all sorts of horrible ways if the input string has a double space, if the numbers are malformed, or if it has any other subtle error.
If you're concerned with correctness, I suggest you look into a regex library, or even a standard format such as JSON - see ArduinoJson.
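As a middle ground between hand-written index arithmetic and a full parsing library, the format can also be checked with sscanf. The following is only a sketch in portable C++: Command and parse_command are hypothetical names, it assumes the whole line is already available as a C string (for an Arduino String, via c_str()), and on an AVR board the headers may need to be the C ones (stdio.h, string.h, assert.h).

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical result type, for illustration only.
struct Command {
    bool is_write;  // true for "Write", false for "Read"
    int addr;
    int val;        // only meaningful when is_write is true
};

// Accepts "Read <addr>" or "Write <addr> <val>"; returns false otherwise.
bool parse_command(const char* line, Command& out) {
    assert(line != nullptr);
    char verb[8] = "";
    int addr = 0;
    int val = 0;
    // %7s caps the verb at the buffer size; sscanf returns the number of
    // fields it managed to convert.
    const int fields = std::sscanf(line, "%7s %d %d", verb, &addr, &val);
    if (fields == 3 && std::strcmp(verb, "Write") == 0) {
        out.is_write = true;
        out.addr = addr;
        out.val = val;
        return true;
    }
    if (fields == 2 && std::strcmp(verb, "Read") == 0) {
        out.is_write = false;
        out.addr = addr;
        out.val = 0;
        return true;
    }
    return false;
}
```

This tolerates extra spaces between fields and rejects a wrong argument count, but it does not detect trailing junk after the last number or out-of-range values, so those checks would still have to be added.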

No methods of reading a file seem to work, all return nothing - C++

EDIT: Problem solved! It turns out Windows 7 won't let me read or write files without explicitly running as administrator. If I run as admin it works fine; if I don't, I get the weird results I explain below.
I've been trying to get a part of a larger program of mine to read a file.
Despite trying multiple methods (istream::getline, std::getline, the >> operator, etc.), all of them return either \0, blank, or a random number/whatever I initialised the variable with.
My first thought was that the file didn't exist or couldn't be opened. However, the state flags .good, .bad and .eof all indicate no problems, and the file I'm trying to read is certainly in the same directory as the debug .exe and contains data.
I'd most like to use istream::getline to read lines into a char array; however, reading lines into a string array is possible too.
My current code looks like this:
void startup::load_settings(char filename[]) //master function for opening a file.
{
    int i = 0; //count variable
    int num = 0; //var containing all the lines we read.
    char line[5];
    ifstream settings_file (settings.inf);
    if (settings_file.is_open());
    {
        while (settings_file.good())
        {
            settings_file.getline(line, 5);
            cout << line;
        }
    }
    return;
}
As said above, it compiles but just puts \0 into every element of the char array, much like all the other methods I've tried.
Thanks for any help.
Firstly, your code is not complete: what is settings.inf?
Secondly, most probably you're reading everything fine, but the way you are printing it is problematic.
cout << line; with char line[5]; only works if the array is null-terminated, so be sure that the last element of the array is \0.
You can do something like this:
line[4] = '\0'; or you can manually print the value of each element of the array in a loop.
You can also try printing the character codes in hex, because the values in the array might be outside the visible range of ASCII characters. For example:
cout << hex << (int)line[i]
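To make these suggestions concrete, here is a small sketch that reads a file line by line into std::string (so no fixed-size buffer or manual terminator is needed) and formats bytes as hex for inspection. The names read_lines and to_hex are hypothetical, and the file path is passed in by the caller.

```cpp
#include <cassert>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: appends every line of `path` to `lines`.
// Returns false if the file could not be opened.
bool read_lines(const std::string& path, std::vector<std::string>& lines) {
    assert(!path.empty());
    std::ifstream file(path);
    if (!file.is_open()) return false;
    std::string line;
    // Loop on the read itself, so a failed read never produces a line.
    while (std::getline(file, line)) {
        lines.push_back(line);
    }
    return true;
}

// Formats each byte as two hex digits separated by spaces.
std::string to_hex(const std::string& text) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i != 0) out << ' ';
        // Go through unsigned char so bytes above 0x7f do not print as negative.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(text[i]));
    }
    return out.str();
}
```

Printing to_hex(line) for each line shows whether the data contains unexpected bytes, and the boolean result of read_lines separates "could not open" from "opened but empty".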

Reading file byte by byte with ifstream::get

I wrote this binary reader following a tutorial on the internet. (I'm trying to find the link...)
The code reads the file byte by byte and the first 4 bytes are together the magic word. (Let's say MAGI!) My code looks like this:
std::ifstream in(fileName, std::ios::in | std::ios::binary);
char *magic = new char[4];
while( !in.eof() ){
    // read the first 4 bytes
    for (int i=0; i<4; i++){
        in.get(magic[i]);
    }
    // compare it with the magic word "MAGI"
    if (strcmp(magic, "MAGI") != 0){
        std::cerr << "Something is wrong with the magic word: "
                  << magic << ", couldn't read the file further! "
                  << std::endl;
        exit(1);
    }
    // read the rest ...
}
Now here comes the problem, when I open my file, I get this error output:
Something is wrong with the magic word: MAGI?, couldn't read the file further!
So there is always one (mostly random) character after the word MAGI, like in this example the character ?!
I do think that it has something to do with how a string in C++ is stored and compared with each other. Am I right and how can I avoid this?
PS: this implementation is included in another program and works totally fine ... weird.
strcmp assumes that both strings are nul-terminated (end with a nul-character). When you want to compare strings which are not terminated, like in this case, you need to use strncmp and tell it how many characters to compare (4 in this case).
if (strncmp(magic, "MAGI", 4) != 0){
When you try to use strcmp to compare char arrays that are not null-terminated, it can't tell how long the arrays are (you can't tell the length of an array in C/C++ just by looking at the array itself - you need to know the length it was allocated with. The standard library is not exempt from this limitation). So it reads any data which happens to be stored in memory after the char array until it hits a 0-byte.
By the way: Note the comment to your question by Lightness Races in Orbit, which is unrelated to the issue you are having now, but which hints at a different bug that might cause you some problems later on.
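A variation on the same idea, sketched under the assumption that the magic word is exactly the 4 bytes "MAGI": read the bytes in one call, confirm that all four arrived, and compare them with a length-bounded function. has_magic is a hypothetical name; std::memcmp is used here in place of strncmp because the data is binary.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>

// Hypothetical helper: returns true if the next 4 bytes of `in` are "MAGI".
bool has_magic(std::istream& in) {
    char magic[4];  // raw bytes, deliberately not null-terminated
    in.read(magic, sizeof magic);
    // gcount() tells how many bytes the last read() really delivered.
    if (in.gcount() != static_cast<std::streamsize>(sizeof magic)) {
        return false;  // file shorter than the magic word
    }
    // Compare exactly 4 bytes; nothing past the array is touched.
    return std::memcmp(magic, "MAGI", sizeof magic) == 0;
}
```

Using a local array also avoids the new char[4] allocation that would otherwise need a matching delete[].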

C++ Escape Phrase Substring

I'm trying to parse web data coming from a server, and I'm trying to find a more STL-style version of what I had.
My old code consisted of a for() loop that checked each character of the string against a set of escape characters and used a stringstream to collect the rest. As I'm sure you can imagine, this sort of loop is a likely point of failure when reading web data, as I need strict syntax checking.
I'm trying to instead start using the string::find and string::substr functions, but I'm unsure of the best way to implement it.
Basically, I want to read a string of data from a server, made up of different fields separated by commas (e.g., first,lastname,email@email.com), split it at the commas, and read the data in between.
Can anyone offer any advice?
I'm not sure what kind of data you are parsing, but it's always a good idea to use a multi-layer architecture. Each layer should implement an abstract function, and each layer should do only one job (like escaping characters).
The number of layers you use depends on the actual steps needed to decode the stream.
For your problem I suggest the following layers:
1st: tokenize by ',' and '\n': convert the input into some kind of vector of strings
2nd: resolve escapes: decode escape characters
You should use std::stringstream and process the characters with a loop. Unless your format is REALLY simple (like only a single separator character, without escapes), you can't really use any standard function.
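For the really simple case mentioned above (a single separator character and no escapes), the first layer can be sketched with string::find and string::substr. The function name split is hypothetical; as soon as escapes are involved, a character-by-character loop is still needed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` at every `sep`, keeping empty fields.
std::vector<std::string> split(const std::string& input, char sep) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = input.find(sep, start);
        if (pos == std::string::npos) {
            // No more separators: the rest of the input is the last field.
            fields.push_back(input.substr(start));
            return fields;
        }
        fields.push_back(input.substr(start, pos - start));
        start = pos + 1;  // continue after the separator
    }
}
```

Keeping empty fields means the caller can count the fields to check that the expected number arrived, which helps with strict syntax checking.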
For the learning experience, this is the code I ended up using to parse data into a map. You can use web_parse_return.err to see if an error was hit, or use it for specific error codes.
struct web_parse_return {
    map<int,string> parsedata;
    int err;
};
// the caller owns the returned pointer and must delete it
web_parse_return* parsewebstring(char* escapechar, char* input, int tokenminimum) {
    int err = 0;
    map<int,string> datamap;
    // strcmp (from <cstring>) compares the contents; == on char* only compares addresses
    if(strcmp(input, "MISSING_INFO") == 0) { //a server-side string for data left out in the call
        err++;
    }
    else {
        char* nTOKEN;
        char* TOKEN = strtok_s(input, escapechar, &nTOKEN);
        if(TOKEN != 0) { //if the escape character is found
            int tokencount = 0;
            while(TOKEN != 0) { //since it finds the next occurrence, keep going
                datamap.insert(pair<int,string>(tokencount, TOKEN));
                TOKEN = strtok_s(NULL, escapechar, &nTOKEN);
                tokencount++;
            }
            if(tokencount < tokenminimum) //check that the right number was hit
                err++; //otherwise, up the error count
        }
        else {
            err++;
        }
    }
    web_parse_return* p = new web_parse_return; //initializing a new struct
    p->err = err;
    p->parsedata = datamap;
    return p;
}