Convert Extended characters to int - c++

I have the following string in notepad "ùŒÚÿwž+2»ó66H".
I used the fstream library to read the file in C++ and print each character and its equivalent decimal number in the console window, but the symbols are different from those in Notepad, and the numbers for the extended characters are negative. I realize it may be impossible for my console window to print those symbols, since they vary across character sets, but how do I get the numbers displayed as 255 and not -1?

Simple version: read the file as unsigned char instead of char and use printf("%c", a) to see what character you get. This will give you values between 0 and 255 instead of -128 to 127.

Related

Fortran formatted IO and the Null character

I wonder how Fortran's I/O is expected to behave in case of a NULL character ACHAR(0).
The actual task is to fill an ASCII file by blocks of precisely eight characters. The strings are read from a binary and may contain non-printing characters.
I tried with gfortran 4.8, 8.1 and f2c. If there is a NULL character in the string the format specifier FORMAT(A8) does not write eight characters.
Give the following F77 code a try:
c Print a string of eight characters surrounded by dashes
100 FORMAT('-',A8,'-')
c Works fine if empty or any other combination of printing chars
write(*,100) ''
c In case of a short string, blanks are padded
write(*,100) '345678'
c A NULL character does something I did not expect
write(*,100) '123'//ACHAR(0)//'4567'
c Not even position editing helps
101 FORMAT('-',A8,T10,'x')
write(*,101) '123'//ACHAR(0)//'4567'
end
My output is:
-        -
-  345678-
-1234567-
-1234567x
Is this expected behavior? Any idea how to get the output eight characters wide in any case?
When using an edit descriptor A8 the field width is eight. For output, eight characters will be written.
In the case of the example, it isn't the writing of the characters that is contrary to your expectations, but how they are displayed by your terminal.
You can examine the output further with tools like hexdump or you can write to an internal file and look at arbitrary substrings.
Yes, that is expected: if there is a null character, the printing of the string on the screen can stop there. The characters are still sent, but the string does not all have to be rendered on the screen.
Note that C uses NUL to delimit strings, and the OS may interpret the strings it receives with the same convention. This allows non-printable characters to be interpreted in processor-specific ways, where the "processor" includes the whole complex of the compiler, the executing environment (the OS and the programs running on it), and the hardware.

C++ - A few quetions about my textbook's ASCII table

In my c++ textbook, there is an "ASCII Table of Printable Characters."
I noticed a few odd things that I would appreciate some clarification on:
Why do the values start at 32? I tested a simple program with the following lines of code: char ch = 1; std::cout << ch << "\n"; and nothing printed out. So I am curious why the values start at 32.
I noticed the last value, 127, was "Delete." What is this for, and what does it do?
I thought char can store 256 values, why is there only 127? (Please let me know if I have this wrong.)
Thanks in advance!
The printable characters start at 32. Below 32 there are non-printable characters (or control characters), such as BELL, TAB, NEWLINE etc.
DEL is a non-printable control character. On paper tape it punched out all seven holes, which marked a character as deleted.
char can indeed store 256 values, but whether it is signed is implementation-defined. If you need to store values from 0 to 255, you must explicitly specify unsigned char; similarly, for -128 to 127 you must specify signed char.
EDIT
The so-called extended ASCII characters with codes above 127 are not part of the ASCII standard. Their representation depends on the so-called "code page" chosen by the operating system. For example, MS-DOS used such extended characters for drawing directory trees, window borders, etc. By changing the code page, you could also display non-English characters.
It's a mapping between integers and characters, plus other "control characters" like space, line feed and carriage return, interpreted by display devices (possibly virtual). As such it is arbitrary, but it is organized by binary values.
32 is a power of 2, and the printable characters start there.
Delete is the signal from your keyboard's Delete key.
At the time the code was designed, only 7 bits were standard; not all bytes (parts of words) were 8 bits.

Data not saved in binary form

I made a program as below
#include<iostream.h>
#include<string.h>
#include<stdio.h>
#include<fstream.h>
void main() {
char name[24];
cout << "enter string :";
gets(name);
ofstream fout;
fout.open("bin_data",ios::out|ios::binary);
fout.write((char*)&name,10);
fout.close();
}
But when I open the file bin_data in Notepad, I find that the string is saved in text format, not in binary form. Please help...
This code can save a word of up to 10 chars.
But when I compile this code with Turbo C++ v4.5, I find that when I input a 1- or 2-letter word it saves in text format (ignoring the garbage values), but when I input a word 3 to 7 letters long it saves in binary format, and with 9- and 10-letter words again in text format. Can anyone tell me the reason?
Please compile and run the program as mentioned above before answering.
Your data only contains text. It is represented by the very same bits in both text format and binary format.
Binary format means that your data is written to the file unchanged. If you were to use text format, some non-text characters would be modified. For example, byte 10 (which represents newline) could be changed to the operating-system-specific newline (two bytes, 13 and 10, on Windows).
For binary values of text characters, see http://www.asciitable.com/
Your second example has a buffer overflow.
char name[24];
fout.write((char*)&name,10);
You reserve 24 bytes, which initially contain whatever random bytes happened to be at that point in memory. When you save a 2-character string into the buffer, it only overwrites the first three bytes; the third byte is set to 0, which marks where the text ends. If you called strlen(), it would tell you the number of characters before the first 0 byte.
If your input is a 2-character text and you choose to write 10 bytes from your buffer, the last 7 bytes are whatever garbage was already in the buffer. Note that this does not cause an access violation, because you have reserved 24 bytes.
See also: https://en.wikipedia.org/wiki/Null-terminated_string

C++ string/char and accents

I'm writing a text writer in C++: I have a string containing a phrase and display the appropriate bitmap-font glyph for each char value.
For now it works for the regular characters, but I'm getting weird values for accented and other characters such as À, Á, Â, Ã, etc.
I'm doing this:
int charToPrint = 'a';
//use this value to figure which bitmap font to display
The bitmap font does have these characters, but on this line I'm not getting the values I'm supposed to get, such as: 195 for Ã, 199 for Ç, etc...
I tried changing my project's character set from Multi Byte to Unicode, but I don't think that does anything for the char->int conversion...
How can I get this conversion with chars?
Edit: I'm using Visual Studio 2012, Windows 7, and it's an OpenGL application with a bitmap font.
I've mapped the positions/width/height of each character according to its char value, so the character a is at position 97 of my bitmap font (with widths accounted for).
To draw, I just need to figure the position based on the char code.
I have a string of a phrase I want to display, and I loop through each character, figure the charCode, and call my draw function.
For these characters with accents, I'm getting negative values, so my draw function doesn't do anything (there's no position -30 for Ç for example).
I need to figure how to get these values properly and send to the draw function.
Use Unicode; it is the year 2013 already :) See The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.
You would use wchar_t as the type and UTF-16 / UTF-32 encoding. That will make your code support not only these "irregular" characters but many more (there is no such thing as regular characters).
Example
wchar_t c = L'Á';
printf("char: %lc encoding: %d\n", c, c);
c = 0xc1;
printf("char: %lc encoding: %d\n", c, c);
Output
char: Á encoding: 193
char: Á encoding: 193

UDF decimal to binary

I wrote a decimal-to-binary converter function in order to practice my manipulation of number systems and arrays. I took the int, converted it to binary, and stored each character (or so I believe) in an array, then displayed it on screen. However, it is displaying characters I looked up in the ASCII table and do not recognize, so I would like to ask for your assistance. Here is a picture of the code and the console app.
Thanks in advance.
You likely want to insert digit characters (such as '1') in your result, but you assign the raw value instead. Try adding the value of '0' to get a readable result (remainder + '0').
If you interpret the result array as a string (that's what I suggested), you should also set the last char to the value 0 (not '0'!) to mark the end of the C string.
Your output function does not print your binary text correctly because:
1) cout outputs characters until '\0', so your function will only print correctly up to the first 0 in the binary representation of the int (for example, for 5 = 101 it will output only a single character with code 0x01, shown as a smiley in the old DOS code page).
2) your last character in the array is not '\0', so cout will output garbage until it hits a '\0' or a memory access violation.