UTF-8 problems in writing a UART-Console on a microcontroller - c++

I am currently writing a UART console on an ATMega1284p. It is supposed to echo characters back, so that the computer-side console actually shows what is being typed; that is all it does for now.
Here is the problem: with ASCII it works perfectly fine, but if I send anything beyond ASCII, e.g. a '§', my minicom shows "�§". '�' is what appears for an invalid character, and '§' is what appears when everything works fine. Getting the combination of both throws me off, and I currently have no idea where the problem is!
Here is part of my code:
char c;
while (m_uart->recv(c) > 0) {
    m_lineBuff[m_lineIndex++] = c;
    if (c == '\r') {
        c = '\n';
        m_lineBuff[m_lineIndex++] = c;
        m_sendCount = 2;
    } else {
        m_sendCount = 1;
    }
    this->send();
    if (c == '\n') {
        m_lineBuff[m_lineIndex++] = '\0';
        // invoke some callbacks that handle the line at some point
        m_lineIndex = 0;
    }
}
m_lineBuff is a self-written (and tested) vector of chars. m_uart is a self-written (and also tested) UART driver for the microcontroller's internal hardware UART. this->send sends m_sendCount bytes using m_uart.
What I tried so far:
I verified that the baud rates of minicom and my micro match (115200). I verified that the clock frequency error is within the 2% range (the micro is running at 20 MHz). Both minicom and the micro are set up for 8N1.
I verified that minicom works by hooking it up to a little Linux board I had lying around. On that board, any UTF-8 character works just fine.
Does anyone see my mistake, or does anyone have a clue about what I haven't considered?
I'll be happy to supply all of my code if you are interested in it.
EDIT/Elaboration:
Observation 1 (prior to starting this project)
The PC-side program (minicom) can send characters to, and receive characters from, the microcontroller. It does not show the sent characters, though.
Conclusion 1 (prior to starting this project)
The microcontroller side needs to send the characters back to the PC, so that you have the behaviour of a console.
Thus I immediately send back any character I get.
Observation 2 (after implementing it)
When I press '§' (or any other character consisting of more than one byte) in minicom, I see "�§".
Conclusion 2 (after implementing it)
Something is going on that I can't explain with my knowledge. Maybe a small delay between the two bytes making up the character leads minicom to print a '�' first, because the first byte on its own is indeed an invalid character; when the second byte comes in, minicom realizes it's actually '§', but it doesn't remove/overwrite the '�'.
If that is the problem, then how do I solve it? Does my microcontroller need to react faster, with less delay between characters?
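If the delay between echoed bytes really is the trigger, one possible workaround (a sketch of my own, not from the original post and not tested on the actual hardware) is to buffer a multi-byte UTF-8 sequence and only echo it once it is complete, so both bytes of '§' go out back-to-back. The expected length of a sequence can be read from the lead byte:

```cpp
#include <cstdint>
#include <cstddef>

// Number of bytes in the UTF-8 sequence introduced by this lead byte,
// or 1 for plain ASCII; continuation/invalid bytes are passed through as-is.
std::size_t utf8SequenceLength(std::uint8_t leadByte) {
    if ((leadByte & 0x80) == 0x00) return 1; // 0xxxxxxx: ASCII
    if ((leadByte & 0xE0) == 0xC0) return 2; // 110xxxxx: 2-byte sequence
    if ((leadByte & 0xF0) == 0xE0) return 3; // 1110xxxx: 3-byte sequence
    if ((leadByte & 0xF8) == 0xF0) return 4; // 11110xxx: 4-byte sequence
    return 1; // continuation or invalid byte: just echo it through
}
```

The echo loop could then collect utf8SequenceLength(first) bytes into m_lineBuff before calling this->send() once, with m_sendCount set to that length.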
EDIT2:
I replaced the '?' with the actual character '�' using the power of copy and paste.
More tests I did
I tried the character '😹' and, as I expected (it backs my conclusion 2), I got "���😹". '😹', by the way, is a 4-byte character.
I set the baud rate of micro and minicom to 9600: exact same behaviour.
I managed to put minicom into hex mode: it still sends normally but shows the output as hex. When I send '😹' I get "f0 9f 98 b9", which (at least according to this site) is correct. Does that back my conclusion 2? And more importantly: how do I get rid of this behaviour? It works with my little Linux board instead of my micro.

EDIT: the OP discovered on his own that the odd behaviour is (probably) a bug in minicom itself. This post of mine clearly loses its value; unless the community thinks it should be removed, I would leave it here as a witness of possible workarounds when experiencing similar problems.
tl;dr: your PC application might not be interpreting UTF-8 as correctly as it appears to.
If we look at the Extended ASCII Code defined by ISO 8859-1,
A7 10100111 § § => Section sign
and according to this page, the UTF-8 encoding of § is
U+00A7 § c2 a7 => SECTION SIGN
So my educated guess is that the symbol is still printed correctly because it belongs to the extended ASCII code with the same value, a7.
Either your end application fails to correctly interpret the UTF-8 lead byte (c2), and that's why you get a '�' printed out, or a component in the middle fails to pass the correct value forward. I am inclined to believe your output is an instance of the first case.
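The two encodings can be checked with a few lines of code. This is a minimal sketch (my own, not part of the original drivers) that encodes a code point below U+0800 to UTF-8; it shows that U+00A7 becomes the byte pair c2 a7, whose second byte equals the single ISO 8859-1 byte a7:

```cpp
#include <cstdint>
#include <vector>

// Encode a Unicode code point (restricted here to U+0000..U+07FF)
// to its UTF-8 byte sequence.
std::vector<std::uint8_t> utf8Encode(std::uint16_t codePoint) {
    if (codePoint < 0x80)                      // one byte: plain ASCII
        return { static_cast<std::uint8_t>(codePoint) };
    return {                                   // two bytes: 110xxxxx 10xxxxxx
        static_cast<std::uint8_t>(0xC0 | (codePoint >> 6)),
        static_cast<std::uint8_t>(0x80 | (codePoint & 0x3F))
    };
}
```

utf8Encode(0x00A7) yields {0xC2, 0xA7}; a terminal falling back to Latin-1 would render the second byte as '§' on its own, which is consistent with the "�§" the OP sees.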
You claim that minicom works; I cannot refute this claim, but I would suggest you try the following things first:
try sending a symbol that belongs to UTF-8 but not to the ISO 8859-1 standard: if it doesn't work, this should rule out your conclusion #2 pretty quickly;
try reducing the speed to the lowest possible, a 9600 baud rate;
verify that minicom is correctly configured to interpret UTF-8 characters by checking the documentation;
try using some other application to fetch data from your microcontroller and see whether the results are consistent;
verify that the Unicode symbol you are sending out is correct.
NB: this is kind of an incomplete answer, but I couldn't fit everything in the comments. If you're patient enough, please update your question with your findings and comment on this answer to notify me. I'll come back and update my answer accordingly.

Related

Weird output for RGB

I am trying to manage some LED strips from my mobile device using Bluetooth + Adafruit NeoPixel. I almost had the sketch finished, but I have found that the RGB numbers are not appearing as I expected, and I cannot find what I am doing wrong.
Imagine that from my mobile I have sent the RGB code "0,19,255"; when I check the console I see the following result:
As you can see, the first two lines are OK, but on the third one we can see 2550. The 0 should not be there, and I cannot figure out the problem.
So I decided to isolate the code and keep the minimum needed to identify the root cause; this is the code:
#include <SoftwareSerial.h>
#include <Adafruit_NeoPixel.h>

SoftwareSerial BT(10, 11);

#define PIN 2
#define NUMPIXELS 144

Adafruit_NeoPixel pixels(NUMPIXELS, PIN, NEO_GRBW + NEO_KHZ800);

int red = 0; // was `int red = "";`, which is not a valid int initializer

void setup() {
    Serial.begin(9600);
    Serial.println("Ready");
    BT.begin(38400);
    pixels.begin();
    pixels.show();
    pixels.setBrightness(20);
}

void loop() {
    while (BT.available() > 0) {
        red = BT.parseInt();
        Serial.println(red);
    }
}
You describe that for the shown output you sent "0,19,255" via mobile.
Yet the shown output is obviously only the second part of a longer sequence of numbers, which starts with "0,21".
Assuming that what you send is always of the format you described, i.e. three numbers separated by two ",", the shown output is most likely the result of you sending first "0,21,255" and then another triplet, "0,19,255".
These two messages together end up in the input buffer as "0,21,2550,19,255".
Now I have to do some speculation. Most parsers, when told to look for numbers within a buffer, will look for digits followed by non-digits. They would end up yielding "0,21,2550".
Without knowing the details of the parser's workings, it is hard to say how to fix your problem.
I would, however, definitely experiment with sending triplets which end in a non-digit.
For example:
"0,21,255,"
or
"0,21,255 "
or
"0,21,255;"
If none of them works, you might need to explicitly expect the non-digit, i.e. between triplets of numbers read a character and either ignore it or compare it to ",", " " or ";" for additional self-checking.
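The speculation above can be reproduced off-device. This is a minimal sketch (plain C++, my own stand-in for what BT.parseInt() does) of a greedy integer parser run over the merged buffer:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Greedily parse runs of digits, the way Arduino's parseInt() does:
// a number ends only when a non-digit (or end of input) is seen.
std::vector<int> parseInts(const std::string& buffer) {
    std::vector<int> numbers;
    std::size_t i = 0;
    while (i < buffer.size()) {
        if (std::isdigit(static_cast<unsigned char>(buffer[i]))) {
            int value = 0;
            while (i < buffer.size() &&
                   std::isdigit(static_cast<unsigned char>(buffer[i]))) {
                value = value * 10 + (buffer[i] - '0');
                ++i;
            }
            numbers.push_back(value);
        } else {
            ++i; // skip the separator
        }
    }
    return numbers;
}
```

Two triplets without a trailing delimiter, "0,21,2550,19,255", parse as 0, 21, 2550, 19, 255: the trailing 255 and the leading 0 of the next triplet fuse, exactly as in the question. With a trailing "," after each triplet, "0,21,255,0,19,255,", the same parser yields the six intended numbers.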
(Writing this, I rely on user zdf not intending to write an answer, because while I did spot the "2550" as "255" + "0", zdf spotted that the sample input in the question body contains only two ",", which I missed. I will of course adapt my answer to one created by zdf, so as not to use their contribution without their consent.)

Arduino substring doesn't work

I have a static method that searches String msg for (and returns) the value between a TAG.
This is the function's code:
static String genericCutterMessage(String TAG, String msg) {
    Serial.print("a-----");
    Serial.println(msg);
    Serial.print("b-----");
    Serial.println(TAG);
    if (msg.indexOf(TAG) >= 0) {
        Serial.print("msg ");
        Serial.println(msg);
        int startTx = msg.indexOf(TAG) + 3;
        int endTx = msg.indexOf(TAG, startTx) - 2;
        Serial.print("startTx ");
        Serial.println(startTx);
        Serial.print("endTx ");
        Serial.println(endTx);
        String newMsg = msg.substring(startTx, endTx);
        Serial.print("d-----");
        Serial.println(newMsg);
        Serial.println("END");
        Serial.println(newMsg.length());
        return newMsg;
    } else {
        Serial.println("d-----TAG NOT FOUND");
        return "";
    }
}
and this is output
a-----[HS][TS]5132[/TS][TO]5000[/TO][/HS]
b-----HS
msg [HS][TS]5132[/TS][TO]5000[/TO][/HS]
startTx 4
endTx 30
d-----
END
0
fake -_-'....go on! <-- print out of genericCutterMessage
in that case I want return the string between HS tag, so my expected output is
[TS]5132[/TS][TO]5000[/TO]
but I don't know why I receive an empty string.
To understand how substring works, I just followed the tutorial on the official Arduino site:
http://www.arduino.cc/en/Tutorial/StringSubstring
I'm not an expert in C++ or Arduino, but this looks like a flushing or buffering problem, doesn't it?
Any idea?
Your code is correct; this should not happen. Which forces you to consider the unexpected ways this could possibly fail. There is really only one candidate mishap I can think of: your Arduino is running out of RAM. It has very little; the Uno only has 2 kilobytes, for example. It doesn't take a lot of string munching to fill that up.
This problem is not reported in a graceful way. All I can do is point you to the relevant company page. Quoting:
If you run out of SRAM, your program may fail in unexpected ways; it will appear to upload successfully, but not run, or run strangely. To check if this is happening, you can try commenting out or shortening the strings or other data structures in your sketch (without changing the code). If it then runs successfully, you're probably running out of SRAM. There are a few things you can do to address this problem:
If your sketch talks to a program running on a (desktop/laptop) computer, you can try shifting data or calculations to the computer, reducing the load on the Arduino.
If you have lookup tables or other large arrays, use the smallest data type necessary to store the values you need; for example, an int takes up two bytes, while a byte uses only one (but can store a smaller range of values).
If you don't need to modify the strings or data while your sketch is running, you can store them in flash (program) memory instead of SRAM; to do this, use the PROGMEM keyword.
That's not very helpful in your specific case; you'll have to look at the rest of the program for candidates. Or upgrade your hardware. Stack Exchange has a dedicated site for Arduino enthusiasts, surely the best place to get advice.
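The "smallest data type" suggestion from the quoted list can be made concrete. This is a small host-side illustration (my own, not Arduino code; it uses int16_t to mirror the AVR's 2-byte int) showing how the element type alone halves a lookup table's RAM footprint:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A 256-entry lookup table whose values all fit in 0..255.
constexpr std::size_t kEntries = 256;

using WideTable = std::array<std::int16_t, kEntries>; // 2 bytes per entry
using ByteTable = std::array<std::uint8_t, kEntries>; // 1 byte per entry

static_assert(sizeof(WideTable) == 2 * kEntries, "int16_t entries use 2 bytes each");
static_assert(sizeof(ByteTable) == 1 * kEntries, "uint8_t entries use 1 byte each");
```

On a 2 KB device like the Uno, the 256 bytes saved here is an eighth of all available SRAM.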

Bad status register?

We are working on a project. The professor gave us a .zip file with some tests, so we can see whether our project is working correctly. We are building a small kernel in C++.
Anyhow, there is a thread that waits for a keyboard interrupt (event9.wait()); after that it should put characters in a buffer, or end the program if you press "Esc".
while (!theEnd) {
    event9.wait();
    status = inportb(0x64); // reading status reg. from 64h
    while (status & 0x01) { // while status indicates that keys are pressed
        ....
I checked, and I am certain that it waits for the interrupt regularly. The problem occurs because status & 0x01 is 0.
Then I got to the part of the code that gets the characters from 0x60, and it worked just fine.
Is there something wrong with the code of the test files? If yes, what? If the code is correct, what could cause the problem?
I could change the test files, but I need a good reason to do so. So far the only reason I have is that it doesn't work.
*Note: the comments are translated from Serbian, but I am almost certain they are translated correctly.
I think status & 0x01 is perfectly fine. However, you need to read the status port again after reading port 0x60. It may well be that you do that later on in the code, but I personally would just rewrite the loop as:
while ((status = inportb(0x64)) & 0x01) { // while status indicates that keys are pressed
    ....
Note that you shouldn't read port 0x64 again inside the loop in this case.
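The difference can be simulated off-target. This is a sketch (my own, with a queue standing in for inportb(0x64) and inportb(0x60)) of the loop shape suggested above, which re-reads the status register before every iteration:

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for the keyboard controller: bit 0 of the status byte
// says whether a scancode is waiting in the data port.
struct FakeKeyboardController {
    std::deque<std::uint8_t> pending; // scancodes not yet read

    std::uint8_t readStatus() const {   // like inportb(0x64)
        return pending.empty() ? 0x00 : 0x01;
    }
    std::uint8_t readData() {           // like inportb(0x60)
        std::uint8_t scancode = pending.front();
        pending.pop_front();
        return scancode;
    }
};

// Drain all pending scancodes, re-reading status before each iteration.
std::vector<std::uint8_t> drainScancodes(FakeKeyboardController& kbd) {
    std::vector<std::uint8_t> scancodes;
    std::uint8_t status;
    while ((status = kbd.readStatus()) & 0x01) {
        scancodes.push_back(kbd.readData());
    }
    return scancodes;
}
```

Reading the status only once before the inner loop would either never enter it (bit 0 momentarily 0) or spin forever on a stale value; re-reading it each pass tracks the controller's actual state.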

Printing Code 128 C barcode through C++ code interacting with OPOS Common Controls 1.8

I'm trying to print a Code 128 C type barcode (as type A/B would be too wide for my requirements) through an Epson TM-H6000III receipt printer using OPOS Common Controls 1.8. My code is written in C++.
Normally, I print the barcode using the following code snippet:
const LONG PTR_BCS_Code128 = 110;
lOposBarcodeType = PTR_BCS_Code128;
lReturn = m_PosPrinter.PrintBarCode(2,*lpszTextline,lOposBarcodeType,120,5,PTR_BC_CENTER,PTR_BC_TEXT_BELOW);
Here, *lpszTextline represents the data to be printed as barcode.
From suggestions found online, I tried to make the following changes to print the barcode in Code 128 C format:
const LONG PTR_BCS_Code128_Parsed = 123;
lOposBarcodeType = PTR_BCS_Code128_Parsed;
lReturn = m_PosPrinter.PrintBarCode(2,*lpszTextline,lOposBarcodeType,120,5,PTR_BC_CENTER,PTR_BC_TEXT_BELOW);
and tried to format the barcode data in various ways:
Leading "{C"
Leading "{C", trailing "H"
Making the number of characters in the data even
But none of these worked. It always resulted in an OPOS_E_ILLEGAL error with ResultCodeExtended = 300003. I cannot find more information about that extended code on the Internet either.
Any help in this regard will be highly appreciated.
Thanks in advance.
Prosu
The mode is often determined by the printer firmware, based on the data you are trying to print. The best behavior is when it tries to print as compactly as possible: mode C is used if the data is all numeric, mode A if it is alphabetic, etc., and it switches from mode to mode as needed: a 17-digit number might print as mode C for the first 16 digits, then switch to mode A for the 17th digit.
If your printer firmware handles this directly, you may not even be able to choose the mode yourself. Alternatively, some thermal printers cannot print anything but mode C and will return an error if you try to print alphabetic characters. (We had some old IBM SureMark printers that could only print mode C.)
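As a host-side illustration (my own helper, not part of OPOS or the Epson driver), data qualifies for a pure mode C barcode only if it is all digits and of even length, since mode C encodes digits in pairs; a pre-flight check like this can catch the data-formatting cases listed above before calling PrintBarCode:

```cpp
#include <cctype>
#include <string>

// Code 128 mode C packs two digits per symbol, so data is eligible
// for a pure mode C barcode only if it is all digits with an even count.
bool qualifiesForModeC(const std::string& data) {
    if (data.empty() || data.size() % 2 != 0)
        return false;
    for (char ch : data)
        if (!std::isdigit(static_cast<unsigned char>(ch)))
            return false;
    return true;
}
```

An odd digit count or any letter forces the printer (or encoder) to fall back to mode A/B, or to reject the data outright, depending on the firmware.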
You should check with Epson.

Error message "error: stray '\302' in program"

I'm using Code::Blocks on Ubuntu 10.10 (Maverick Meerkat). I have connected a Mac keyboard and set the keyboard settings to "Swiss German Mac". Now whenever I write an equals sign, followed by a space (something like width = 100) I get the error message: stray '\302' in program.
I know this error means that there is a non-standard character in the text file.
When I delete the space character, the program compiles just fine. So that means Code::Blocks adds some sort of special character. But I can't see why this happens. What is the reason?
What character does '\302' stand for?
[UPDATE]
I got a little further investigating the problem. I get this stray when I use the combo Shift + Space. Now that I know that, it doesn't happen as often any more. But it's still rather annoying, especially when writing code... Is there a way to turn off this combo in X11?
[SOLVED]
Thanks to Useless's answer, I was able to solve the "issue". It's more of a feature, actually: Shift + Space produced a nobreakspace by default. So by changing the xmodmap with
xmodmap -e "keycode 65 = space space space space space space"
this behavior was overridden, and everything works fine now.
Since you're sure it's caused by hitting Shift + Space, you can check what X itself is doing. First, run xev from the command line, hit Shift + Space and check the output. For example, I see:
$ xev
KeyPress event, serial 29, synthetic NO, window 0x2000001,
root 0x3a, subw 0x0, time 4114211795, (-576,-249), root:(414,593),
state 0x0, keycode 50 (keysym 0xffe1, Shift_L), same_screen YES,
XLookupString gives 0 bytes:
XmbLookupString gives 0 bytes:
XFilterEvent returns: False
KeyPress event, serial 29, synthetic NO, window 0x2000001,
root 0x3a, subw 0x0, time 4114213059, (-576,-249), root:(414,593),
state 0x1, keycode 65 (keysym 0x20, space), same_screen YES,
XLookupString gives 1 bytes: (20) " "
XmbLookupString gives 1 bytes: (20) " "
XFilterEvent returns: False
...
Then, run xmodmap -pk and look up the keycode (space should be 65 as above, but check your xev output).
If you see something like
65 0x0020 (space)
Then X isn't doing this. On the other hand, if I pick a character key which is modified by shift, I see something like this:
58 0x006d (m) 0x004d (M)
If you have two or more keysyms for your keycode, X is the culprit. In that case, something like xmodmap -e 'keycode 65 = space' should work.
\302 is the octal representation of the byte value the compiler encountered. It translates to 11000010 in binary, which makes me think it's the lead byte of a two-byte UTF-8 sequence. The full sequence must then be:
11000010 10??????
which encodes a code point anywhere from U+80 to U+BF.
Several characters starting from U+80 are special spaces and breaks which usually are not shown inside a text editor.
Probably it's not your editor, but Xorg, that emits these characters due to your keyboard settings. Try switching to a generic US keyboard and test if the problems persists.
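To make the byte arithmetic above concrete, a minimal two-byte UTF-8 decoder (my own sketch) shows that the pair \302\240 (0xC2 0xA0) decodes to U+00A0, the no-break space that Shift + Space was producing:

```cpp
#include <cstdint>

// Decode a two-byte UTF-8 sequence (lead byte 110xxxxx followed by
// continuation byte 10xxxxxx) to its Unicode code point.
std::uint16_t decodeUtf8Pair(std::uint8_t lead, std::uint8_t cont) {
    return static_cast<std::uint16_t>(((lead & 0x1F) << 6) | (cont & 0x3F));
}
```

Any continuation byte after a 0xC2 lead lands in the U+0080..U+00BF range described above.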
'\302' is C notation for the octal number 302, which equals C2 in hexadecimal and 194 in decimal. So it's not ASCII.
Which character it maps to depends on the encoding. In Latin-1, it's the 'Â' character, for instance.
I have seen this type of issue when copying and pasting from web pages or other electronic documents. The common culprits would be invalid quotes like ` instead of ', or something similar. Try to use the compiler error to guide you to where in the file the problem might be.
That sounds like some kind of encoding issue. I haven't used Code::Blocks for years, so I am not sure whether it allows you to pick different encodings.
How about opening your code file with gedit and saving it as UTF-8, and then trying again? It sounds rather strange, though, that you get such an issue from space characters.
I've seen this problem in my Linux box with a Finnish keyboard.
It also happens with Emacs, etc. I don't have a good solution for it, but I guess reports about the fact that it happens elsewhere are also useful...
If you open your file in Emacs and set-buffer-file-coding-system to something like "unix" or some ASCII variety, then when you try to save, it'll warn you that the buffer contains unrepresentable characters and points you to them so you can fix them.
I had the same issue after modifying a US-ASCII example file.
So I converted it; here is the GNU/Linux command:
iconv -c -t us-ascii -f utf-8 source_file -o dest_file
And then added my modifications… no more errors!
To verify the initial encoding, use
file -i source_file
I had to add a non-ASCII character to get iconv to do the job!?
I decided to move the file from a MacBook to a Linux box via email, using my icloud.com address. When I opened the transferred file, the errors were gone and the file now compiles!