Printing the Euro symbol to a PCL6 printer - UniVerse

I tried to print CHAR(128), U+20AC, etc., but none of them print. Does anyone know how to print the Euro symbol to a Sharp PCL6 printer in UniVerse/U2? Thanks in advance.

The Euro symbol at code point 128 only works when the correct 8-bit table is assigned, because there is no consistency with ASCII beyond the 7-bit table. In other words, if you output character 128 and the wrong character set is assigned, it will not print what you are expecting. A little research shows that the Windows-1252 table needs to be assigned: that is the table that places the Euro at 0x80 (decimal 128). Note that ISO 8859-1 does not contain the Euro at all; its successor ISO 8859-15 does, but at 0xA4 rather than 128.
You will need to consult the printer manual to determine how to change the assigned table.
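For illustration, here is a minimal C++ sketch that emits the raw PCL bytes; the same escape sequence can be built in UniVerse BASIC with CHAR(27). The symbol-set ID "19U" (Windows-1252) is an assumption based on standard PCL practice, so verify it against the Sharp manual:

#include <cstdio>

int main() {
    // ESC ( 19U : select Windows-1252 ("Windows 3.1 Latin 1") as the
    // primary symbol set. Byte 0x80 is then the Euro sign.
    std::fputs("\x1b(19U", stdout);
    std::fputc(0x80, stdout);
    std::fputc('\n', stdout);
    return 0;
}

Send the output to the raw print queue (e.g. lp -o raw) so nothing re-encodes the bytes on the way to the printer.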

Related

I am building a program for Urdu language analysis; how can I make my program accept Urdu text files in C++?

I am building a language analysis program: it counts the words in a text and gives the ratio of every word in the text as output, but it cannot work on files containing Urdu text. How can I make it work?
Encoding
Urdu may be presented in two¹ forms: Unicode and Code Page 868. This is convenient for you because the two ranges do not overlap. It is inconvenient because the Unicode code range is U+0600 – U+06FF, which means encoding is an issue:
CP-868 will encode each one as a single-byte value in the range 128–252
UTF-8 will encode each one as a two-byte sequence with bits 110x xxxx and 10xx xxxx
UTF-16 encodes every character as two-byte entities
UTF-32 encodes every character as four-byte entities
This means that you should be aware of encoding issues, and for an easy life, use UTF-16 internally (std::u16string), and accept files as (default) UTF-8 / CP-868, or as UTF-16/32 if there is a BOM indicating such.
Your other option is to simply require all input to be UTF-8 / CP-868.
¹ AFAIK. There may be other ways of storing Urdu text. (Update: three forms; see the comments below.)
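As a sketch of the BOM sniffing suggested above (the enum names are mine; the actual decoding is left to whatever converter you use):

#include <cstddef>

enum class Enc { Utf8OrCp868, Utf8Bom, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of the file and pick a decoder. The UTF-32
// checks must come before UTF-16, since FF FE 00 00 begins with FF FE.
Enc DetectEncoding(const unsigned char* b, std::size_t n) {
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return Enc::Utf32LE;
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) return Enc::Utf32BE;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return Enc::Utf8Bom;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return Enc::Utf16LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return Enc::Utf16BE;
    return Enc::Utf8OrCp868;  // no BOM: assume the default UTF-8 / CP-868
}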
Word separation
As you know, the end of a word is generally marked with a special letter form.
So, all you need is a table of end-of-word letters listing letters in both the CP-868 range and the Unicode Arabic text range.
Then, every time you find a space or a letter in that table you know you have found the end of a word.
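A sketch of that check (the contents of the end-of-word table are placeholders; fill in the real final letter forms from both the CP-868 and Unicode ranges):

#include <set>

// True when `c` closes a word: either a space or one of the letters in
// the caller-supplied end-of-word table.
bool IsWordEnd(char16_t c, const std::set<char16_t>& wordEnders) {
    return c == u' ' || wordEnders.count(c) != 0;
}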
Histogram
As you read words, store them in a histogram. For C++, a std::map<std::u16string, std::size_t> will do. The actual content of each word does not matter.
After that you have all the information necessary to print stats about the text.
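A minimal sketch of that step, assuming `words` holds the output of the segmentation above (printing the UTF-16 words themselves would need a display conversion, which is omitted here):

#include <cstdio>
#include <map>
#include <string>
#include <vector>

void PrintStats(const std::vector<std::u16string>& words) {
    std::map<std::u16string, std::size_t> hist;
    for (const auto& w : words)
        ++hist[w];                          // count each distinct word
    for (const auto& entry : hist)
        std::printf("count %zu, ratio %.4f\n",
                    entry.second,
                    double(entry.second) / double(words.size()));
}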
Edit
The approach presented above is designed to be simple at the cost of some correctness. If you are doing something for the workplace, for example, and assuming it matters, you should also consider:
Normalizing word forms
For example, the same word may be presented in standard Arabic text codes or using the Urdu-specific codes. If you do not convert to the Urdu equivalent characters then you will have two words that should compare equal but do not.
Use something internally consistent. I recommend UZT, as it is the most complete Urdu text representation. You will also need an additional lookup to recover the original text representation from the UZT representation.
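If you stay within Unicode rather than converting to UZT, the folding can be a simple character map. The two pairs below (Yeh and Kaf) are standard Urdu practice; verify any further pairs, such as the Heh forms, against your own data:

#include <string>
#include <unordered_map>

std::u16string NormalizeUrdu(std::u16string s) {
    static const std::unordered_map<char16_t, char16_t> fold = {
        {u'\u064A', u'\u06CC'},  // ARABIC LETTER YEH -> FARSI YEH
        {u'\u0643', u'\u06A9'},  // ARABIC LETTER KAF -> KEHEH
    };
    for (auto& c : s) {
        auto it = fold.find(c);
        if (it != fold.end()) c = it->second;
    }
    return s;
}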
Dictionaries
Get as complete a dictionary of Urdu words (as a std::unordered_set<std::u16string>) as you can.
This is how it is done with languages like Japanese, for example, to find breaks between words.
Then use the dictionary to find all the words you can, and fall back on letterform recognition and/or spaces for what remains.
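A sketch of that strategy as greedy longest-match (maximal munch) with a one-letter fallback; `maxWordLen` is an assumed bound on dictionary word length, not something from the original answer:

#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

std::vector<std::u16string> Segment(const std::u16string& text,
                                    const std::unordered_set<std::u16string>& dict,
                                    std::size_t maxWordLen = 8) {
    std::vector<std::u16string> out;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t best = 1;  // fallback: a single letter
        for (std::size_t len = std::min(maxWordLen, text.size() - i); len > 1; --len) {
            if (dict.count(text.substr(i, len)) != 0) { best = len; break; }
        }
        out.push_back(text.substr(i, best));
        i += best;
    }
    return out;
}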

simple sum of Unicode symbol codes

I want to do this:
1) Click event of Convert ! button
The user must type a value into each of the two writable edit controls. After pressing Convert!, the program must set the sum of these characters' Unicode values into the first read-only edit control (the one next to the " = " symbol). For example, if I set the first edit control's value to є (whose UTF-16 (hex) value is 0x0404; it is also known as CYRILLIC CAPITAL LETTER UKRAINIAN IE) and the second edit control's value to @ (whose UTF-16 (hex) value is 0x0040; it is also known as COMMERCIAL AT), then the result must be the symbol ф (whose UTF-16 value is 0x0444), because its value equals the sum of the other edit controls' UTF-16 values. How can I do this?
2) Click event of Undo button
By clicking the Undo button, the program must set the value of the edit control below this button. This value should be the є symbol (as you can see, its Unicode value is the sum minus the second edit control's value). How can I do this?
I've searched for this problem for 2 weeks on Google, MSDN and various forums, but I couldn't find any helpful topic. I could only find the MultiByteCharacterSet, _mbclen, mblen and _mblen_l functions. If these functions are useful to me, how can I use them in my program? Please give me advice; I'm new to VC++.
Edit
The user must enter a single character, which may be a digit or a letter; not a word, a sequence of characters, or a number.
Thanks for any attention.
P.S.: Sorry if there are many mistakes in my grammar, or if the question is a duplicate.
Best regards,
Mirjalal.
The input value is already its UTF-16 value; no conversion is needed. On Windows, a wchar_t is a UTF-16 code unit, so you can add the code units directly:
CString in1(L'1');                      // first character (e.g. from the first edit control)
CString in2(L'2');                      // second character
CString sum(wchar_t(in1[0] + in2[0]));  // add the UTF-16 code units: 0x31 + 0x32 = 0x63, i.e. 'c'
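Wired into the dialog, the two handlers might look like the sketch below. CMyDlg and the control IDs (IDC_EDIT1, IDC_EDIT2, IDC_RESULT, IDC_UNDO_RESULT) are made up for illustration; substitute your own class and resource IDs:

void CMyDlg::OnBnClickedConvert()
{
    CString in1, in2;
    GetDlgItemText(IDC_EDIT1, in1);
    GetDlgItemText(IDC_EDIT2, in2);
    if (in1.IsEmpty() || in2.IsEmpty())
        return;                                 // need one character in each box
    wchar_t sum = wchar_t(in1[0] + in2[0]);     // 0x0404 + 0x0040 = 0x0444 -> ф
    SetDlgItemText(IDC_RESULT, CString(sum));
}

void CMyDlg::OnBnClickedUndo()
{
    CString total, in2;
    GetDlgItemText(IDC_RESULT, total);
    GetDlgItemText(IDC_EDIT2, in2);
    if (total.IsEmpty() || in2.IsEmpty())
        return;
    wchar_t diff = wchar_t(total[0] - in2[0]);  // 0x0444 - 0x0040 = 0x0404 -> є
    SetDlgItemText(IDC_UNDO_RESULT, CString(diff));
}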

Latin-based Japanese input

I'm working on a Japanese input system for a device with a 10-digit numeric keypad.
That's why I'm looking for a reasonably simple input method that can be implemented in C++.
I've implemented the Pinyin method for Chinese input: each Chinese character can be reached by some combination of Latin letters and then chosen from a list. For example, if the user types "ca", I show the Chinese character list "擦拆礤嚓".
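The lookup behind that can be modeled as a map from the typed Latin key to a candidate string; this sketch holds just the example entry from the question:

#include <map>
#include <string>

std::map<std::string, std::u16string> candidates = {
    {"ca", u"擦拆礤嚓"},   // typing "ca" offers these characters
};

std::u16string Lookup(const std::string& typed) {
    auto it = candidates.find(typed);
    return it == candidates.end() ? std::u16string() : it->second;
}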
Is there something similar for Japanese?
I have found a table of Japanese transliterations: あa でde がga じji まm のno たta わwa 阿ae ... 芭bac 八bap 捌bat 覇bax 冊cek 測ces 策cez 癌aib. I suggest basing the Japanese input on this information: for example, the user would type "bac" and the device would show the possible replacement "芭".
Is there a commonly used input method that is based on Latin symbol input?

Encode gives wrong value of Japanese kanji

As part of a scraper, I need to encode kanji into URLs, but I can't seem to get the correct output even for a single character, and at this point I'm blinded by everything I've already tried from various Stack Overflow posts.
The document is set to UTF-8.
# -*- coding: utf-8 -*-
import urllib2

sampleText = u'ル'
print sampleText                                 # the character itself
print sampleText.encode('utf-8')                 # its UTF-8 bytes
print urllib2.quote(sampleText.encode('utf-8'))  # percent-encoded UTF-8
It gives me the values:
ル
ル
%E3%83%AB
But as far as I understand, it should give me:
ル
XX
%83%8B
What am I doing wrong? Are there some settings I don't have correct? As far as I understand it, my output from encode() should not be ル.
The code you show works correctly. The character ル is KATAKANA LETTER RU, Unicode codepoint U+30EB. When it is encoded to UTF-8, you get the Python byte string '\xe3\x83\xab', which prints out as ル if your console encoding is UTF-8. When you URL-escape those three bytes, you get %E3%83%AB.
The value you seem to be expecting, %83%8B, is the Shift-JIS encoding of ル, rather than the UTF-8 encoding. For a long time there was no standard for how to encode non-ASCII text in a URL, and as this Wikipedia section notes, many programs simply assumed a particular encoding (often without specifying it). The newer standard of Internationalized Resource Identifiers (IRIs), however, says that you should always convert Unicode text to UTF-8 bytes before percent-encoding.
So, if you're generating your encoded string for a new program that wants to meet the current standards, stick with the UTF-8 value you're getting now. I would only use the Shift-JIS version if you need it for backwards compatibility with specific old websites or other software that expects that the data you send will have that encoding. If you have any influence over the server (or other program), see if you can update it to use IRIs too!
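For reference, the byte-wise mechanics are the same in any language. Here is a small C++ sketch that escapes every byte unconditionally (urllib2.quote additionally leaves unreserved ASCII alone, which makes no difference for the three bytes at issue here):

#include <cstdio>
#include <string>

std::string PercentEncode(const std::string& bytes) {
    static const char* hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out += '%';
        out += hex[b >> 4];     // high nibble
        out += hex[b & 0x0F];   // low nibble
    }
    return out;
}

int main() {
    // UTF-8 bytes of KATAKANA LETTER RU (U+30EB)
    std::printf("%s\n", PercentEncode("\xE3\x83\xAB").c_str());  // prints %E3%83%AB
}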

Unicode woes! MS Access 97 migration to MS Access 2007

The problem is categorized in two steps:
Problem Step 1. The Access 97 DB contains XML strings that are encoded in UTF-8.
The problem boils down to this: the Access 97 DB contains XML strings encoded in UTF-8, so I created a patch tool to convert the XML strings from UTF-8 to Unicode separately. To convert a UTF-8 string to Unicode, I have used the function
MultiByteToWideChar(CP_UTF8, 0, PChar(OriginalName), -1, @newName[0], Size); (where newName is an array declared as "newName : Array[0..2048] of WideChar;").
This function works well in most cases; I have checked it with Spanish and Arabic characters. But when I work with Greek and Chinese characters, it chokes.
For some Greek characters like "Ευγ. ΚαÏαβιά" (as stored in Access 97), the resulting string contains null characters in between, and when it is stored to a wide string the characters get clipped.
For some Chinese characters like "?¢»?µ?" (as stored in Access 97), the result is totally absurd, like "?¢»?µ?".
Problem Step 2. Access 97 DB text strings: the application GUI takes Unicode input, which is saved to Access 97.
First I checked with Arabic and Spanish characters, and it seemed that no explicit character encoding was required. But again the problem comes with Greek and Chinese characters.
I tried the same function mentioned above for the text conversion (is that correct?); the result was again disappointing. The Spanish characters, which are fine without conversion, get their Unicode characters either lost or converted to plain ASCII letters.
The Greek and Chinese characters show behaviour similar to that described in step 1.
Please guide me. Am I taking the right approach? Is there some other way around this?
Right now I am confused and full of questions :)
There is no special requirement for working with Greek characters. The real problem is that the characters were stored in an encoding that Access doesn't recognize in the first place. When the application stored the UTF-8 values in the database, it tried to convert every single byte to the equivalent byte in the database's codepage. Every character that had no correspondence in that encoding was replaced with '?'. That may mean that the Greek text is OK, while the Chinese text may be gone.
To convert the data to something readable, you have to know the codepage it is stored in. Using that, you can get the actual bytes and then convert them to Unicode.
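One way to read that advice, assuming the text was pulled through a known ANSI codepage and survived without '?' replacements, is two Win32 calls: encode the garbled wide string back to bytes with that codepage, then decode those bytes as UTF-8. A C++ sketch (error handling trimmed; anything already replaced with '?' is unrecoverable):

#include <windows.h>
#include <string>

std::wstring RecoverUtf8(const std::wstring& garbled, UINT ansiCodePage)
{
    // Step 1: recover the raw bytes the database actually holds, using
    // the codepage the text was (mis)read with.
    int n = WideCharToMultiByte(ansiCodePage, 0, garbled.c_str(), -1,
                                nullptr, 0, nullptr, nullptr);
    std::string bytes(n, '\0');
    WideCharToMultiByte(ansiCodePage, 0, garbled.c_str(), -1,
                        &bytes[0], n, nullptr, nullptr);

    // Step 2: decode those bytes as the UTF-8 they really are.
    int m = MultiByteToWideChar(CP_UTF8, 0, bytes.c_str(), -1, nullptr, 0);
    std::wstring fixed(m, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, bytes.c_str(), -1, &fixed[0], m);
    if (m > 0) fixed.resize(m - 1);  // drop the embedded terminator
    return fixed;
}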