Works in Python 2, but doesn't work in Python 3 - python-2.7

#!/usr/bin/env python3
f = open('dv.bmp', mode='rb')
slika = f.read()
f.close()
pic = slika[:28]
slika = slika[54:]
# dimensions of the original bitmap
pic_w = ord(pic[18]) + ord(pic[19])*256
pic_h = ord(pic[22]) + ord(pic[23])*256
print(pic_w, pic_h)
Why doesn't this code work in Python 3 (in Python 2 it works fine)? Or: how do I read a binary file into a string type in Python 3?

In Python 2.x, binary mode (e.g. 'rb') only affects how Python interprets end-of-line characters:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
However in Python 3.x, binary mode also changes the type of the resulting data:
Normally, files are opened in text mode, that means, you read and
write strings from and to the file, which are encoded in a specific
encoding. If encoding is not specified, the default is platform
dependent (see open()). 'b' appended to the mode opens the file in
binary mode: now the data is read and written in the form of bytes
objects. This mode should be used for all files that don’t contain
text.
Since the read results in a bytes object, indexing it results in an integer, not a one-character string as in Python 2. Passing that integer to the ord() function raises the error mentioned in your comment.
The solution is just to omit the ord() call in Python 3, since the integer you get from indexing the bytes object is the same as what you'd get from calling ord() on the string equivalent.
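For example, the question's code works in Python 3 once the ord() calls are dropped (a minimal sketch, assuming the same dv.bmp layout as above):

#!/usr/bin/env python3
with open('dv.bmp', mode='rb') as f:
    slika = f.read()          # bytes, not str, in Python 3

pic = slika[:28]              # header bytes
slika = slika[54:]            # pixel data

# indexing a bytes object already yields integers, so no ord() is needed
pic_w = pic[18] + pic[19]*256
pic_h = pic[22] + pic[23]*256
print(pic_w, pic_h)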

Related

C++: Problem of Korean alphabet encoding in text file write process with std::ofstream

I have code that saves a log as a text file.
It usually works well, but I found a case where it doesn't work:
{Id": "testman", "ip": "192.168.1.1", "target": "?뚯뒪??exe", "desc": "?덈뀞諛⑷??뚯슂"}
My code is simple logic that saves the log string as a text file.
My code works well when the log is in English, but there is a problem when the log is in Korean.
After checking through various experiments, I confirmed that Korean text is not a problem if the file can be saved in UTF-8 format.
I think that if Korean text is included in the log string, C++ saves the file in ANSI format by default.
This is my C++ code:
string logfilePath = {path};
log = "{\Id\": \"testman\", \"ip\": \"192.168.1.1\", \"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";
ofstream output(logfilePath, ios::app);
output << log << endl;
output.close();
Is there a way to save log files as UTF-8, or some other good way to handle this?
Please give me some advice.
You could set UTF-8 in Visual Studio via File->Advanced Save Options.
If you do not find it there, you can add Advanced Save Options via Tools->Customize->Commands->Add Command..->File.
TL;DR: write 0xEF 0xBB 0xBF (the 3-byte UTF-8 BOM) at the beginning of the file before writing out your string.
One of the hints that text-viewer software uses to determine whether a file should be shown in a Unicode format is something called the Byte Order Mark (or BOM for short). It is a series of bytes at the beginning of a stream of text that specifies the encoding and endianness of the text. For UTF-8 it is these three bytes: 0xEF 0xBB 0xBF.
You can experiment with this by opening Notepad, writing a single character and saving the file in the ANSI format. Then look at the size of the file in bytes: it will be 1 byte. Now open the file and save it in UTF-8 and look at the size again: it will be 4 bytes, that is, three bytes for the BOM and one byte for the single character you put in there. You can confirm this by viewing both files in some hex editor.
That being said, you may need to insert these bytes into your files before writing your string to them. So why UTF-8, you may ask? Well, it depends on the encoding of the original string (your std::string log), which in this case is a string literal written in a source file whose encoding is (most likely) UTF-8. Therefore the bytes that make up the string are laid out according to this encoding and are put into your executable.
Note that std::string can contain a Unicode string; it just can't make sense of it. For example, it reports its length in bytes rather than in characters. But it can be used to carry a Unicode string around just fine.
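A minimal sketch of this approach, assuming the string and path from the question (log.txt stands in for the real path) and that the source file itself is saved as UTF-8:

#include <fstream>
#include <string>
using namespace std;

int main() {
    string logFilePath = "log.txt";  // hypothetical path for this sketch
    string log = "{\"Id\": \"testman\", \"ip\": \"192.168.1.1\", "
                 "\"target\": \"테스트.exe\", \"desc\": \"안녕방가워요\"}";

    // write the BOM only when the file does not exist yet,
    // so repeated appends don't insert it again
    bool isNewFile = !ifstream(logFilePath).good();

    ofstream output(logFilePath, ios::app);
    if (isNewFile)
        output << "\xEF\xBB\xBF";  // UTF-8 BOM: 0xEF 0xBB 0xBF
    output << log << endl;         // the literal's bytes are already UTF-8
    return 0;
}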

Writing a mix of ASCII and binary data in Fortran

I'm trying to write a mix of ASCII and binary data, as given below, for the VTK file format.
I understand that the binary or ASCII distinction must be made in the file OPEN statement (FORM='BINARY', or preferably ACCESS='STREAM'). I don't understand how to write the file in the format I require.
What I'm trying to output:
ascii keyword
ascii keyword
ascii keyword
ascii keyword
ascii keywords "variable value in ascii" ascii keywords
.....SOME BINARY DATA ....
.....................
What I'm using:
write(fl) "# vtk DataFile Version 3.0"//CHAR(13)//CHAR(10)
write(fl)"Flow Field"//CHAR(13)//CHAR(10)
write(fl)"BINARY"//CHAR(13)//CHAR(10)
write(fl)"DATASET UNSTRUCTURED_GRID"//CHAR(13)//CHAR(10)
write(fl)"POINTS",npoints,"float" -------------> writes the value of npoints (example: 8) in binary form, not as text
What the output should be:
# vtk DataFile Version 3.0
Flow Field
BINARY
DATASET UNSTRUCTURED_GRID
POINTS 8 Float
.....SOME BINARY DATA ....
.....................
What the output is:
# vtk DataFile Version 3.0
Flow Field
BINARY
DATASET UNSTRUCTURED_GRID
POINTSÒ^O^#^#float
.....SOME BINARY DATA ....
...................
Firstly, you will find examples of writing VTK files on the internet, e.g. in the questions binary vtk for Rectilinear_grid from fortran code can not worked by paraview and Binary VTK for RECTILINEAR_GRID from fortran code, in various open-source research codes, like https://bitbucket.org/LadaF/elmm/src/866794b5f95ec93351b0edea47e52af8eadeceb5/src/simplevtk.f90?at=master&fileviewer=file-view-default (this one is my simplified example; there are many more), or in dedicated libraries, like http://people.sc.fsu.edu/~jburkardt/f_src/vtk_io/vtk_io.html (there is also the VTKFortran library for the XML VTK files).
Secondly, even though you are on Windows, you should not use the Windows line-ending conventions in VTK binary files. End your lines just with achar(10) (or with the intrinsic new_line function). And don't forget that the binary data must be big-endian. There are examples of how to deal with that in the links above.
Thirdly, for putting an integer number into a string we have a huge number of duplicate questions. I mean really huge. Start with Convert integers to strings to create output filenames at run time, and I will shamelessly recommend my itoa function there, because it will simplify your code a lot:
write(fl)"POINTS ",itoa(npoints)," float"
I would replace
write(fl)"POINTS",npoints,"float"
with
BLOCK
  integer, parameter :: big_enough = 132 ! or whatever
  character(big_enough) :: line

  write(line,'(*(g0))') "POINTS ", npoints, " float"
  write(fl) trim(line)//achar(10)
END BLOCK
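Putting the pieces together, a minimal sketch of the header-writing part (the file name out.vtk is made up here, and convert='big_endian' is a non-standard but widely supported extension in gfortran and ifort for producing the big-endian payload VTK expects):

program vtk_header_sketch
  implicit none
  integer :: fl, npoints
  character(132) :: line
  real :: points(3,8)

  npoints = 8
  points = 0.0   ! dummy coordinates, just for the sketch
  open(newunit=fl, file='out.vtk', access='stream', form='unformatted', &
       status='replace', convert='big_endian')

  write(fl) "# vtk DataFile Version 3.0"//achar(10)
  write(fl) "Flow Field"//achar(10)
  write(fl) "BINARY"//achar(10)
  write(fl) "DATASET UNSTRUCTURED_GRID"//achar(10)

  write(line,'(*(g0))') "POINTS ", npoints, " float"
  write(fl) trim(line)//achar(10)

  write(fl) points       ! the binary block; convert= makes it big-endian
  write(fl) achar(10)
  close(fl)
end program vtk_header_sketch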

How to set the UTF-8 as default encoding in Odoo Build?

Can anyone tell me how to set UTF-8 as the default encoding option in Odoo Build?
Note: I have put "# -*- coding: utf-8 -*-" in all the files, which has no effect on my expected encoding.
If you put # coding: utf-8 at the top of a Python module, this affects how Python interprets the source code. This is important if you have string literals with non-ASCII characters in your code, in order to have them represent the correct characters.
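For example (Python 2; a minimal, hypothetical module):

# -*- coding: utf-8 -*-
# without the coding line above, Python 2 refuses to parse this
# source file at all ("Non-ASCII character" SyntaxError)
greeting = u"héllo"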
However, since you talk about "default encoding", I assume you care about the encoding of text files opened for reading or writing. In Python 2.x, the default for reading and writing files is not to decode/encode at all. I don't think you can change this default (because the built-in function open simply doesn't support encoding), but you can use io.open() or codecs.open() to open files with an explicit encoding.
Thus, to read from a file encoded with UTF-8, open it as follows:
import io

with io.open(filename, encoding='utf-8') as f:
    for line in f:
        ...
In Python 3, built-in open() is the same as io.open(), and the default encoding is platform-dependent.
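So in Python 3 you can pass the encoding to the built-in directly (a small sketch; data.txt is a made-up file name):

# Python 3: built-in open() takes an encoding argument
with open('data.txt', 'w', encoding='utf-8') as f:
    f.write('héllo\n')

with open('data.txt', encoding='utf-8') as f:
    print(f.read())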

wrong text encoding on linux

I downloaded a source code .rar file from the internet to my Linux server and extracted all the source files into a local directory. When I use the "cat" command to view the content of a file, the text is shown with the wrong encoding on my terminal (there are some Chinese characters in the source files).
I use
file -bi testapi.cpp
then shows:
text/plain; charset=iso-8859-1
I tried to convert the file to UTF-8 encoding with the following command:
iconv -f ISO88591 -t UTF8 testapi.cpp > new.cpp
But it doesn't work.
I set my .vimrc file with following two lines:
set encoding=utf-8
set fileencoding=utf-8
After this, when I open testapi.cpp in Vim, the Chinese characters are displayed normally. But cat testapi.cpp still doesn't work.
When I compile and run the program, the printf statements with Chinese characters print wrong characters like ????
What should I do to display the correct Chinese characters when I run the program?
TL;DR, quickest solution: copy/paste the visible text into a brand-new, confirmed-UTF-8 file.
Your file is marked as latin1, but the data is stored as utf8.
When you set enc=utf8 or fileencoding=utf-8 in Vim, you're not changing the data or converting it. You're looking at the exact same data, but interpreting it as if it were the utf8 charset. So, good news: your data is good. No conversion or changing necessary.
You just need to put the exact same data into a file already marked as UTF-8 encoded. That can be done easily by making a brand-new file in Vim, using set enc=utf8, and then copy-pasting your old data into the new file. You can test this out by making a test file containing only the text "汉语" ("Chinese language"), setting enc, saving, closing, reopening, and seeing that the text didn't get corrupted. You can also test with file -bi $pathtofile, though that is not super reliable.
Anyway, TL;DR: make a brand-new UTF-8 file, confirm that it is UTF-8, make your data visible, and then copy/paste or otherwise transfer it to the new UTF-8 file, without doing any conversion.
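Concretely, the steps might look like this (a sketch; new.cpp is a made-up name):

In a shell:
vim new.cpp

Inside Vim:
:set enc=utf8            " interpret buffer contents as UTF-8
:set fileencoding=utf-8  " write the file out as UTF-8
(paste the visible text from the old file, then :wq)

Back in the shell:
file -bi new.cpp         # should now report charset=utf-8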
Also, theoretically, I considered that iconv -f utf8 -t utf8 would work, since all I wanted to do was mark UTF-8-encoded data as UTF-8-encoded without changing it. But this gave me an error indicating that it was still trying to do a data conversion.

Read text-file in C++ with fopen without linefeed conversion

I'm working with text-files (UTF-8) on Windows and want to read them using C++.
To open the file correctly, I use fopen. As described here, there are two options for opening the file:
Text mode "rt" (carriage return + linefeed will automatically be converted into linefeed; in short, "\r\n" becomes "\n").
Binary mode "rb" (The file will be read byte by byte).
Now it becomes tricky. I don't want to open the file in binary mode, since I would lose the correct handling of my UTF-8 characters (and there are special characters in my text files, which are corrupted when interpreted as ANSI characters). But I also don't want fopen to convert all my CR+LF into LF.
Is there a way to combine the two modes, to read a text-file into a string without tampering with the linefeeds, while still being able to read UTF-8 correctly?
I am aware that the reverse conversion would happen if I wrote the string back out the same way, but it is sent to another application that expects Windows-style line endings.
The difference between opening files in text and binary mode is exactly the handling of line-end sequences: they are translated in text mode and left untouched in binary mode. Nothing more, nothing less. ASCII characters use the same code points in Unicode, and UTF-8 retains the encoding of ASCII characters (i.e., every ASCII file happens to be a UTF-8-encoded Unicode file); moreover, every byte of a multi-byte UTF-8 sequence is in the range 0x80-0xFF and so can never be mistaken for '\r' or '\n'. Whether you use binary or text mode therefore won't affect the other bytes.
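For example, reading the whole file in binary mode keeps both the UTF-8 bytes and the CR+LF pairs intact (a minimal sketch; input.txt is a made-up file name):

#include <cstdio>
#include <string>

// read the whole file byte for byte; CR+LF pairs and UTF-8
// sequences are preserved exactly as they appear on disk
std::string read_file_raw(const char* path) {
    std::string data;
    if (FILE* f = std::fopen(path, "rb")) {
        char buf[4096];
        std::size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
            data.append(buf, n);
        std::fclose(f);
    }
    return data;
}

int main() {
    std::string text = read_file_raw("input.txt");
    std::printf("%zu bytes read\n", text.size());
    return 0;
}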
It may be worth having a look at James McNellis's "Unicode in C++" presentation from C++Now 2014.