How to decode a USB device label - c++

I am trying to get a USB device label from the udev library, but I have a problem when the label is not in UTF-8 encoding.
The USB device was previously formatted on Windows and has the FAT32 file system. The USB label is “РФПАЦУ” (I used Cyrillic for test purposes; it is stored in the CP866 code page). To get the USB device properties, I run the following command:
sudo /sbin/blkid -o udev -p /dev/sdd1
The answer is as follows:
ID_FS_LABEL=______
ID_FS_LABEL_ENC=\x90\x94\x8f\x80\x96\x93
According to https://bbs.archlinux.org/viewtopic.php?id=197582
ID_FS_LABEL contains plain ASCII and any valid UTF-8 characters, with all whitespace replaced by '_', while in ID_FS_LABEL_ENC all potentially unsafe characters are replaced by the corresponding hex value prefixed with '\x'.
I cannot just unhex ID_FS_LABEL_ENC, since the number of bytes to read is unknown.
Is there a way to find out the encoding of ID_FS_LABEL_ENC? Or a way to get the correct label of a USB device?
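The bytes in ID_FS_LABEL_ENC cannot tell you their own encoding: a FAT label is stored in whatever OEM code page the machine that formatted the stick was using, so in this test it happens to be CP866, but that is an assumption you have to supply yourself. A minimal sketch, assuming CP866, would unescape the \xNN sequences and hand the raw bytes to iconv:

#include <iconv.h>
#include <cstdlib>
#include <iostream>
#include <string>

// Turn "\x90\x94..." escapes back into raw bytes; other characters are copied as-is.
static std::string unescape(const std::string& enc)
{
    std::string raw;
    for (std::size_t i = 0; i < enc.size(); )
    {
        if (enc.compare(i, 2, "\\x") == 0 && i + 4 <= enc.size())
        {
            raw += static_cast<char>(std::strtoul(enc.substr(i + 2, 2).c_str(), nullptr, 16));
            i += 4;
        }
        else
        {
            raw += enc[i++];
        }
    }
    return raw;
}

// Convert raw bytes from the given (assumed) code page to UTF-8 with iconv.
static std::string to_utf8(const std::string& raw, const char* codepage)
{
    iconv_t cd = iconv_open("UTF-8", codepage);
    if (cd == (iconv_t)-1)
        return raw;                                   // conversion not available

    std::string out(raw.size() * 4, '\0');
    char* in_ptr = const_cast<char*>(raw.data());
    std::size_t in_left = raw.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();
    iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    out.resize(out.size() - out_left);
    return out;
}

int main()
{
    std::string enc = "\\x90\\x94\\x8f\\x80\\x96\\x93";    // ID_FS_LABEL_ENC value from blkid
    std::cout << to_utf8(unescape(enc), "CP866") << '\n';  // prints "РФПАЦУ"
}

If the stick could have been formatted on machines with other locales, the code page has to come from somewhere else (user input, a heuristic, or a guess).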

Related

Write strings received over a socket to the input of a process

I have an application on the Windows platform that receives remote commands from applications running on the Linux platform.
The Linux applications are having difficulty accessing directories or files that contain accented characters: they send the command to access such files/directories, and the response is always "directory/file not found".
I think the two applications are using different code pages. I say this because I previously had problems in the Linux applications: directory and file names with accented words showed up with strange symbols in std::cout, and after I added SetConsoleOutputCP(CP_UTF8) in the Windows application the problem was solved and the paths containing accents finally became readable. Does this mean that the Linux application uses code page 65001? Anyway, the problem persists when sending strings containing the path to the directories/files: whenever the Linux application tries to access a path containing accented words, it fails.
I'll try to show how the two applications communicate.
Windows Side:
In short, this is the part where the client receives the message from the Linux application and then writes what was received to the process's input. When the paths written here contain accented characters, the process reports in its output that it cannot find them.
BYTE buffer[4096];
DWORD BytesWritten;
int ret = SSL_read(stI->ssl, (char*)buffer, sizeof(buffer));
if (ret <= 0)
    break;
if (!WriteFile(stI->hStdIn, buffer, ret, &BytesWritten, NULL))
    break;
And then it reads the output of the process and sends the content to the Linux application.
BYTE buffer[4096];
DWORD BytesAvailable, BytesRead;
// BytesAvailable is presumably filled in earlier (e.g. by PeekNamedPipe);
// after ReadFile only BytesRead bytes of the buffer are valid, so send those.
if (!ReadFile(stI->hStdOut, buffer, min(sizeof(buffer), BytesAvailable), &BytesRead, NULL))
    break;
ret = SSL_write(stI->ssl, (char*)buffer, BytesRead);
if (ret <= 0)
    break;
Linux Side:
This part is very basic, the application reads a user input and then sends it to the windows application.
std::string inputBuffer;
ZH->console_input(inputBuffer, 33); // This function only controls the input and output of data with termios.
inputBuffer += '\n'; // To simulate an Enter in the Windows application
// Sends the typed path to the Windows application
SSL_write(session_data.ssl, inputBuffer.c_str(), inputBuffer.size());
The receiving part is basically the same as in the Windows application: it receives the data into a char buffer and then prints it to the screen with std::cout.
The only difference is that the socket is set to NONBLOCK and I use the select function.
Any suggestions on how to solve this problem?
Your best bet is to use proper Unicode encodings. Windows tends to use UTF-16 (two bytes per code unit, with surrogate pairs for characters outside the Basic Multilingual Plane); Linux, on the other hand, uses UTF-8, which uses a single byte per character for ASCII and multi-byte sequences for everything else. If you do a proper conversion between Windows UTF-16 and UTF-8, things should work correctly.
C++11 and Boost do provide some Unicode support, but for gold standard support, take a look at ICU.
Sockets however just transmit bytes so they have nothing to do with Unicode conversions.
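For example, on the Windows side the conversions can be done with the Win32 calls MultiByteToWideChar and WideCharToMultiByte. This is only a sketch of the idea: treat everything that crosses the socket as UTF-8 and switch to UTF-16 only at the Windows API boundary.

#include <windows.h>
#include <string>

// UTF-8 bytes received from the Linux peer -> UTF-16 for wide Win32 APIs.
std::wstring utf8_to_utf16(const std::string& s)
{
    if (s.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), NULL, 0);
    std::wstring w(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(), (int)s.size(), &w[0], len);
    return w;
}

// UTF-16 text produced on Windows -> UTF-8 bytes to hand to SSL_write.
std::string utf16_to_utf8(const std::wstring& w)
{
    if (w.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0, w.data(), (int)w.size(), NULL, 0, NULL, NULL);
    std::string s(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, w.data(), (int)w.size(), &s[0], len, NULL, NULL);
    return s;
}

Anything read back from the child process in the local ANSI code page would first be widened with MultiByteToWideChar(CP_ACP, ...) and then narrowed to UTF-8 before being sent.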

How to get USB Drive Label in Linux?

I am trying to get a USB drive's label in my C/C++ application. I am using libudev to get the USB details, but it doesn't provide the drive's label. Does anyone have an idea how to get the drive label? I am working on an embedded platform that doesn't have a /dev/disk directory.
Please help.
Kernel Version : 3.3.8
Normally, a USB stick has a vfat partition on it to make it compatible between MS-DOS, Windows, Linux and Mac systems.
The label is a property of the vfat filesystem. It normally appears as the first directory entry in the root directory, marked as a filesystem label. Recent implementations of MS-DOS filesystems (namely vfat, exfat and fat32) also write it in a fixed part of the boot record for that partition, so you can read it from there.
You have the volume serial number at offset 0x43 (4 bytes) in the first sector of the partition.
You also have a copy of the volume label at offset 0x47 in that same first sector (11 bytes long).
The trick is: as a USB stick is normally partitioned (with only one partition), you have to:
look in the first sector of the USB stick for the partition table and locate the first partition.
then look in the first sector of that partition, locate byte offset 0x43 and use those four bytes as the volume serial number (it matches UUID="..." in the Linux /etc/fstab file) and the eleven bytes that follow as the volume label.
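A minimal sketch of those two steps in C++, assuming an MBR-partitioned stick with a single FAT32 partition and 512-byte sectors (the device path is only an example, and it only reads, as the notes below insist):

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    const char* device = "/dev/sda";                  // the whole stick, not /dev/sda1
    int fd = open(device, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    // Step 1: read the MBR and take the starting LBA of the first partition
    // (partition entry at offset 0x1BE, little-endian 32-bit LBA at entry offset 8).
    unsigned char mbr[512];
    if (pread(fd, mbr, sizeof mbr, 0) != (ssize_t)sizeof mbr) { close(fd); return 1; }
    uint32_t lba = mbr[0x1BE + 8] | (mbr[0x1BE + 9] << 8)
                 | (mbr[0x1BE + 10] << 16) | ((uint32_t)mbr[0x1BE + 11] << 24);

    // Step 2: read the first sector of that partition (the FAT32 boot record).
    unsigned char boot[512];
    if (pread(fd, boot, sizeof boot, (off_t)lba * 512) != (ssize_t)sizeof boot) { close(fd); return 1; }
    close(fd);

    uint32_t serial;
    std::memcpy(&serial, boot + 0x43, 4);             // volume serial number (the UUID)
    char label[12] = {0};
    std::memcpy(label, boot + 0x47, 11);              // volume label, space-padded

    std::printf("serial: %04X-%04X\nlabel: %s\n",
                (unsigned)((serial >> 16) & 0xFFFF), (unsigned)(serial & 0xFFFF), label);
    return 0;
}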
Note
Be careful: NTFS doesn't use that location for this purpose, and you can damage an NTFS partition by writing there. Only ever read from that location.
Note 2
Also, don't try to write to that place even in vfat filesystems, as they also maintain a copy of the volume label in the root directory of the filesystem.
Note 3
The easiest way to get the label of a DOS filesystem (and ext[234], NTFS, etc.) in Linux is with the command blkid(8); it gives the following output:
/dev/sda1: UUID="0b2741c0-90f5-48d7-93ce-6a03d2e8e9aa" TYPE="ext4"
/dev/sda5: UUID="62e2cbf2-d847-4048-856a-a90b91116285" TYPE="crypto_LUKS"
/dev/mapper/sda5_crypt: UUID="vnBDh3-bcaR-Cu7E-ok5D-oeFp-5SyP-MmAEsb" TYPE="LVM2_member"
/dev/mapper/my_vg-root: UUID="1b9f158b-35b5-490e-b914-bdc70e7f5c28" TYPE="ext4"
/dev/mapper/my_vg-swap_1: UUID="36b8ac81-7043-42ae-9f2a-908d53e2a2b3" TYPE="swap"
/dev/sdb1: LABEL="K003_1G" UUID="641B-80BF" TYPE="vfat"
As you can see, the last entry is for a vfat USB pendrive, but you have to parse this output (I think it is not difficult to do).
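For instance, a small C++ sketch of that parsing, running blkid through popen() and pulling out the LABEL="..." field (the device path is only an example):

#include <cstdio>
#include <string>

std::string usb_label(const std::string& device)         // e.g. "/dev/sdb1"
{
    std::string cmd = "blkid " + device;
    FILE* p = popen(cmd.c_str(), "r");
    if (!p) return "";

    char line[1024] = {0};
    std::string out;
    if (fgets(line, sizeof line, p))
        out = line;
    pclose(p);

    // blkid prints a single line for the device; look for LABEL="..." in it.
    std::string::size_type pos = out.find("LABEL=\"");
    if (pos == std::string::npos) return "";
    pos += 7;                                             // skip LABEL="
    std::string::size_type end = out.find('"', pos);
    return out.substr(pos, end - pos);
}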
I believe the "label" of a disk is a property maintained by the file system it's using, i.e. it's not at the USB level.
You're going to need the proper file system implementation, i.e. "mount" the disk.
You can use blkid to read the USB device label:
blkid USB_PATH | grep -o 'LABEL.*' | cut -d'"' -f2

how to detect character encoding between two operating systems?

I'm writing an application for BlackBerry 10 on a Z10, using plain Qt, not Cascades. When I read a string from the virtual keyboard in the keyPressEvent() function, I convert the first character of QKeyEvent::text() to a std char. On the device, if I do qDebug() << (int)converted_char, characters like 'r' are represented by 114. But on my laptop it is 40 (I believe).
Hence, if I send a char from the phone to the laptop using TCP, the correct character is never printed. How can I fix this? What is going on?

Windows usage of char * functions with UTF-16

I am porting an application from Linux to Windows.
On Linux I use the libmagic library, which I would not be glad to get rid of on Windows.
The problem is that I need to pass a file name, which is held in UTF-16 encoding, to this function:
int magic_load(magic_t cookie, const char *filename);
Unfortunately it accepts only a const char *filename. My first idea was to convert the UTF-16 string to the local encoding, but there are problems: the string can contain e.g. Chinese symbols while the local encoding is Russian.
As a result we would get garbage in the output, and the program would not achieve its aim.
Converting to UTF-8 doesn't help either, because this is Windows, and Windows holds file names in UTF-16.
But I somehow need to make that function able to open a file with a Unicode name.
I have come up with only one very, very bad solution:
1. I have a filename
2. I can copy the file with the Unicode name to a file with an ASCII name like "1.mp3"
3. open it with the libmagic functions and get what I want
4. remove the temporary file
But I understand how bad this solution is and how much it could slow my application down, so I wonder: perhaps there are better ways to do it?
Thanks in advance for any tips, because I'm really confused by this.
Use 8.3 file names to access the files.
In addition to long file names up to 255 characters in length, Windows also generates an MS-DOS-compatible (short) file name in 8.3 format.
http://support.microsoft.com/kb/142982
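A sketch of that approach, assuming the volume actually has 8.3 short names enabled (they can be switched off, e.g. on NTFS): ask for the short name with GetShortPathNameW(), narrow it to the ANSI code page, and hand the result to the libmagic char* functions.

#include <windows.h>
#include <string>

// Returns a char*-friendly version of a UTF-16 path, or "" on failure.
std::string short_ansi_path(const std::wstring& longPath)
{
    wchar_t shortPath[MAX_PATH];
    DWORD n = GetShortPathNameW(longPath.c_str(), shortPath, MAX_PATH);
    if (n == 0 || n >= MAX_PATH)
        return "";

    // The generated 8.3 name normally contains only characters that survive
    // the conversion to the ANSI code page, even if the long name is Chinese.
    int len = WideCharToMultiByte(CP_ACP, 0, shortPath, (int)n, NULL, 0, NULL, NULL);
    std::string ansi(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, shortPath, (int)n, &ansi[0], len, NULL, NULL);
    return ansi;
}

// Hypothetical usage: magic_file(cookie, short_ansi_path(path).c_str());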

Reading file with cyrillic

I have to open a file with Cyrillic symbols. I've encoded the file in UTF-8. Here is an example:
en: Couldn't your family afford a
costume for you
ru: Не ваша семья
позволить себе костюм для вас
This is how I open the file:
ifstream readFile(fileData.c_str());
std::string buffer;
// Loop on getline() itself; testing eof() before reading mishandles the last line.
while (std::getline(readFile, buffer))
{
    ...
}
The first problem: there is some symbol before the text 'en' (I saw this in the debugger):
"en: least"
And the other problem is the Cyrillic symbols:
" ru: наименьший"
What's wrong?
there is some symbol before text 'en'
That's a faux-BOM, the result of encoding a U+FEFF BYTE ORDER MARK character into UTF-8.
Since UTF-8 is an encoding that does not have a byte order, the faux-BOM shouldn't ever be used, but unfortunately quite a bit of existing software (especially in the MS world) emits it nonetheless. Load the messages file into a text editor and save it back out again as UTF-8, choosing a "UTF-8 without BOM" encoding if one is explicitly listed.
ru: наименьший
That's what you get when you've got a UTF-8 byte string (representing наименьший) and you print it as if it were a Code Page 1252 (Windows Western European) byte string. It's not an input problem; you have read in the string OK and have a UTF-8 byte string. But then, in code you haven't quoted, it gets output as cp1252.
If you're just printing it to the console, this is to be expected, as the console always uses the system default code page (1252 on a Western Windows install), and not UTF-8. If you need to send Unicode to the console you'll have to convert the bytes to native-Unicode wchar_t strings and write them from there. I don't know what the final destination for your strings is though... if you're just going to write them to another file or something you could just keep them as bytes and not care about what encoding they're in.
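If the console really is the destination, a minimal sketch of that last step (assuming Windows and a UTF-8 std::string) is to widen the bytes with MultiByteToWideChar and print them with WriteConsoleW, which bypasses the code-page conversion that std::cout would otherwise go through:

#include <windows.h>
#include <string>

void print_utf8_line(const std::string& utf8)
{
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    if (!utf8.empty())
    {
        // UTF-8 -> UTF-16, then write the wide characters directly to the console.
        int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), NULL, 0);
        std::wstring wide(len, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &wide[0], len);
        WriteConsoleW(out, wide.data(), (DWORD)wide.size(), &written, NULL);
    }
    WriteConsoleW(out, L"\r\n", 2, &written, NULL);
}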
I suppose that your OS is Windows. There are several simple ways:
Use wchar_t, wstring, wifstream, etc.
Use the ICU library
Use some other library (there are really many of them)
Note: for console printing you must use WinAPI functions to convert UTF-8 to cp866 (my default Cyrillic Windows ANSI encoding is cp1251), because the Windows console supports only DOS encodings.
Note: for file printing you need to know what encoding your file uses.
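A small sketch of the first option: imbue a wifstream with a UTF-8 conversion facet so that getline() yields wide strings. std::codecvt_utf8_utf16 comes from <codecvt> (deprecated since C++17 but still widely available); the file name here is only a placeholder.

#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

int main()
{
    std::wifstream readFile;
    // Imbue before opening so the facet is in place before any characters are read.
    readFile.imbue(std::locale(readFile.getloc(),
                               new std::codecvt_utf8_utf16<wchar_t>));
    readFile.open("messages.txt");                     // placeholder file name

    std::wstring line;
    while (std::getline(readFile, line))
    {
        // 'line' now holds decoded wide characters; convert them to the console
        // (OEM) encoding, e.g. with WideCharToMultiByte, before printing.
    }
    return 0;
}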
Use libiconv to convert the text to a usable encoding after reading.
Use icu to convert the text.