Using std::string in HDF5 creates unreadable output - C++

I'm currently using hdf5 1.8.15 on Windows 7 64bit.
The sourcecode of my software is saved in files using utf8 encoding.
As soon as I call any HDF5 function taking a std::string, the output becomes cryptic.
But if I use const char* instead of std::string, everything works fine. This also applies to the filename.
Here is a short sample:
std::string filename_ = "test.h5";
H5::H5File file(filename_.c_str(), H5F_ACC_TRUNC); // works
H5::H5File file(filename_, H5F_ACC_TRUNC);         // filename is not readable,
                                                   // or HDF5 throws an exception
I guess that this problem is caused by different encodings used in my source files and in HDF5, but I'm not sure about this and have found no solution that allows the use of std::string. I would appreciate any ideas that help with this issue.

I also had the same problem, and fixed it by changing all my std::string or H5std_string arguments to literals:
H5::H5File file("myFile.h5", H5F_ACC_TRUNC);
Or use string.c_str() to convert the string to a const char*.

I had exactly the same problem. The solution was that I was building in Debug mode in Visual Studio, whereas the libraries I linked against were built in Release mode. When I switched Visual Studio to Release mode, the above error disappeared. (Passing a std::string across the boundary between Debug and Release binaries is unsafe because the two runtimes use different internal string layouts, which is also why the const char* overloads still worked.)

Related

How to properly navigate directory paths in C++

I'm working on a solution within Visual Studio. It currently has two projects.
I will represent directories or folders with capital letters, and filenames will be all lower case. My solution structure is as follows:
SolutionDir
    ProjectLib
        source files
        Shaders
            shader files
    ProjectApp
        source files
    x64
        Debug
            app.exe // debug build
        Release
            app.exe // release build
Within ProjectLib I have a function to open and read my Shader files. Here is what my function looks like:
std::vector<char> VRXShader::readFile(std::string_view shadername) {
    std::string filename = std::string("Shaders/");
    filename.append(shadername);
    std::ifstream file(filename.data(), std::ios::ate | std::ios::binary);
    if (!file.is_open()) {
        throw std::runtime_error("failed to open file!");
    }
    size_t fileSize = static_cast<size_t>(file.tellg());
    std::vector<char> buffer(fileSize);
    file.seekg(0);
    file.read(buffer.data(), fileSize);
    file.close();
    return buffer;
}
This function is being called within my VRXDevices::createPipeline function and here is the relevant code:
void VRXDevices::createPipeline(
    VkDevice device, VkExtent2D swapChainExtent, VkRenderPass renderPass,
    const std::vector<std::string_view>& shaderNames,
    VkPipelineLayout& pipelineLayout, VkPipeline& pipeline
) {
    std::vector<std::vector<char>> shaderCodes;
    shaderCodes.resize(shaderNames.size());
    for (auto& name : shaderNames) {
        auto shaderCode = VRXShader::readFile(name.data());
    }
    // .... more code
}
The names are being created and passed to this function from my VRXEngine::initVulkan function which can be seen here:
void VRXEngine::initVulkan(
    std::string_view app_name, std::string_view engine_name,
    glm::ivec3 app_version, glm::ivec3 engine_version
) {
    //... code
    std::vector<std::string_view> shaderFilenames{ "vert.spv", "frag.spv" };
    VRXDevices::createPipeline(device_, swapChainExtent_, renderPass_, shaderFilenames, pipelineLayout_, graphicsPipeline_);
}
I'm using just the name of the shader files such as vert.spv, frag.spv, geom.spv etc. I'm not including the paths here because these will be used as the key to a std::map<string_view, object>. So I'm passing a vector of these names from my ::initVulkan function into ::createPipeline().
Within ::createPipeline() is where ::readFile() is being called passing in the string_view.
Now as for my question... within ::readFile() I'm creating a local string and trying to initialize it with the appropriate path... then append to it the string_view for the shader's filename as can be seen from these two lines...
std::string filename = std::string("Shaders/");
filename.append(shadername);
I'm trying to figure out the appropriate string to initialize filename with... Shaders/ will be a part of the name, but it's not finding the file and I'm not sure what the appropriate prefix should be...
My working directories within both projects are as follows:
ProjectApp -> $(SolutionDir)x64/Release AND $(SolutionDir)x64/Debug
ProjectLib -> $(SolutionDir)x64/Release AND $(SolutionDir)x64/Debug
So I need to go back two directories, then into VRX Engine/Shaders...
What is the correct string value for navigating back directories?
Would I initialize filename with "../../VRX Engine/Shaders/", or is it "././"? Also, should I have quotes around VRX Engine since there is a space in the folder name? What do I need to initialize filename with before I append the shader name to it?
It depends on which C++ standard your implementation claims to be compliant with, or else on which additional libraries you can use.
C++ is useful even on computers without directories (e.g. inside some operating system kernel coded in C++ and compiled with GCC; see OSDev for examples).
Look on en.cppreference.com for details.
Licensing constraints could matter when using extra open-source libraries.
If your implementation is C++17-compliant (in a "hosted", not "freestanding", way), use the std::filesystem part of the standard library; a short sketch follows at the end of this answer.
If your operating system supports the Qt or POCO frameworks and you are allowed to use them (e.g. on C++11), you could use their appropriate APIs: QDir and related classes with Qt, Poco::Path and related classes with POCO.
Perhaps you want to code just for the WinAPI. Then read its documentation (I have never coded on Windows myself, only on POSIX/Unix, e.g. Linux, and MS-DOS).
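As a minimal sketch of the std::filesystem route (assuming C++17 and the directory layout from the question; the shaderPath helper is made up for illustration):
#include <filesystem>
#include <string_view>

namespace fs = std::filesystem;

// Build the path to a shader relative to the working directory.
// ".." climbs one directory level, operator/ inserts the platform's
// separator, and the space in "VRX Engine" needs no extra quoting
// inside a path object.
fs::path shaderPath(std::string_view shadername) {
    return fs::path("..") / ".." / "VRX Engine" / "Shaders" / shadername;
}
An ifstream can then be constructed directly from the resulting path, e.g. std::ifstream file(shaderPath("vert.spv"), std::ios::ate | std::ios::binary);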
I was originally initializing my local temp string with "../../VRX Engine/Shaders/" before appending the string_view to it in order to open the file. This was actually correct, but because it didn't work at first, I assumed it was wrong.
The correct string value for going back one directory is "../", at least on Windows; I'm not sure about Linux, Mac, Android, etc.
My problem wasn't with the string at all; it pertained to settings within my projects. In the project that builds into an executable, I had its working directory set to $(SolutionDir)x64/Debug and $(SolutionDir)x64/Release respectively, which is correct for my solution's structure.
The issue was within my Engine project, which is built as a static library. In its settings, I had forgotten to modify the working directory for both the Debug and Release configurations; these were still set to Visual Studio's default, which I believe is $(ProjectDir). Once I changed them to $(SolutionDir)x64/Debug and $(SolutionDir)x64/Release to match my application project, I was able to open and read the contents of the files.

Working with UTF-8 std::string objects in C++

I'm using Visual Studio and C++ on Windows to work with small-caps text like ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ generated with e.g. this website. Whenever I read this text from a file or put it directly into my source code using std::string, the text visualizer in Visual Studio shows it in the wrong encoding; presumably the visualizer uses the Windows (ANSI) code page. How can I force Visual Studio to let me work with UTF-8 strings properly?
std::string message_or_file_path = "...";
auto message = message_or_file_path;
// If the file path is valid, read from that file
if (GetFileAttributes(message_or_file_path.c_str()) != INVALID_FILE_ATTRIBUTES
    && GetLastError() != ERROR_FILE_NOT_FOUND)
{
    std::ifstream file_stream(message_or_file_path);
    std::string text_file_contents((std::istreambuf_iterator<char>(file_stream)),
                                   std::istreambuf_iterator<char>());
    message = text_file_contents; // Displayed in wrong encoding
    message = "ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in wrong encoding
    std::wstring wide_message = L"ʜᴇʟʟᴏ ꜱᴛᴀᴄᴋᴏᴠᴇʀꜰʟᴏᴡ"; // Displayed in correct encoding
}
I tried the compiler command-line option /utf-8 as well as setting the locale:
std::locale::global(std::locale(""));
std::cout.imbue(std::locale());
Neither of those fixed the encoding issue.
From What’s Wrong with My UTF-8 Strings in Visual Studio?, there are a couple of ways to see the contents of a std::string with UTF-8 encoding.
Let's say you have a variable with the following initialization:
std::string s2 = "\x7a\xc3\x9f\xe6\xb0\xb4\xf0\x9f\x8d\x8c";
Use a Watch window.
Add the variable to Watch.
In the Watch window, add ,s8 to the variable name to display its contents as UTF-8.
In Visual Studio 2015, the Watch window then shows the decoded string: zß水🍌.
Use the Command Window.
In the Command Window, use ? &s2[0],s8 to display the text as UTF-8.
Again, Visual Studio 2015 displays the decoded string: zß水🍌.
A working solution was simply rewriting all std::strings as std::wstrings and adjusting the code logic to work with std::wstrings, as already hinted at in the question. Now everything works as expected.
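For reference, here is a minimal sketch of one such adjustment on Windows (the utf8_to_wide helper is made up for illustration; MultiByteToWideChar is the Win32 conversion routine). It turns UTF-8 bytes, e.g. file contents read into a std::string, into a std::wstring that displays correctly:
#include <string>
#include <windows.h>

// Convert a UTF-8 encoded std::string to a UTF-16 std::wstring.
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    // First call computes the required length in wide characters.
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  static_cast<int>(utf8.size()), nullptr, 0);
    std::wstring wide(static_cast<size_t>(len), L'\0');
    // Second call performs the conversion into the prepared buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &wide[0], len);
    return wide;
}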

How to initialize or assign 中文 to wstring?

I tried to use L"string", but it doesn't work.
#include <iostream>
using namespace std;

int main() {
    wstring wstr = L"你好"; // [Error] converting to execution character set: Illegal byte sequence
    wcout << wstr << endl;
}
Using wcin to input 中文 works fine.
#include <iostream>
using namespace std;

int main() {
    wstring wstr;
    wcin >> wstr; // Inputting Chinese is OK
    wcout << wstr << endl;
}
How to initialize or assign 中文 to wstring?
Edit: I tried some online compilers (e.g. cpp.sh, jdoodle, onlinegdb, repl.it). They all compile, but they all output "??".
Edit 2: I installed the i686 MinGW-W64 8.1.0 g++. I used Visual Studio to save the cpp file in UTF-8 format, then compiled it from the command line. It still outputs nothing.
Your compiler clearly doesn't like Unicode characters in its source files. Try initializing your string with Unicode escapes instead:
wstring wstr = L"\u4E2D\u6587"; // These MAY be the correct codes.
Where 4E2D and 6587 are replaced with the actual hexadecimal values for the characters you want. (Sorry, but I don't have access to a full Unicode table for Chinese characters: I tried pasting them into my compiler, and these are the values it gave me on translating.)
The Unicode values given are for the character string in your question (中文); for the different one (你好) in your posted code, use L"\u4F60\u597D".
Also see the answer by @MarekR.
This must be a configuration issue!
Apparently your compiler uses a different encoding than the one your file is written in.
Since you are using Windows, most probably the file on your machine is not UTF-8 encoded (and you have copied this file to Linux) but something else. Since gcc is more Linux-friendly, it may expect UTF-8, and you have a conflict.
This is a common problem, since Windows long maintained some backward compatibility with DOS (where only single-byte characters were allowed and the system used code pages for the respective languages).
As you can see here, most compilers with default settings have no problem with code that uses Chinese characters. I do not see the TCM-GCC 4.9.2 compiler on godbolt, but it is not a very old gcc after all.
I recommend ensuring that the code is written in UTF-8 and that the compiler treats the sources as UTF-8 encoded.
Edit: Adding std::locale::global(std::locale("")); made your code display this string properly on godbolt.
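Combining the two answers, a minimal sketch (assuming the source file is saved as UTF-8 and the compiler treats it as such, e.g. via gcc's -finput-charset=UTF-8 or MSVC's /utf-8; the escapes sidestep the source encoding entirely):
#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    // Use the user's locale so wcout can convert wide characters on output.
    locale::global(locale(""));
    wcout.imbue(locale());
    wstring wstr = L"\u4E2D\u6587"; // "中文" written as universal character names
    wcout << wstr << endl;
}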
I tried in Visual Studio. It works (outputs "你好") if I save my C++ file in either Unicode or UTF-8 format. Can you try saving your cpp file as either Unicode or UTF-8?

Move a text file from one location to another in Turbo C++

I have been trying to use the following code snippet to move a text file from one location to another (to a folder on the Desktop). However, both the REN command of DOSBox and the rename function of C++ have failed.
char billfile[] = "Text.txt";
char path[67] = "ren C:\\TURBOC3\\Projects\\";
strcat(path, billfile);
strcat(path, " C:\\Users\\Admini~1\\Desktop\\Bills");
system(path);
Are there any other alternatives to this?
P.S.: This is for a school project, where Turbo C++ has to be used
According to this website, the Turbo C run-time library's stdio.h supports the rename function.
So even if you are obliged to use a totally outdated tool like Turbo C++, it's not necessary to spawn a new process with the system function just to rename the file.
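For instance, a minimal sketch using rename (the destination filename inside Bills is an assumption, since rename needs a full target path; note that it generally cannot move a file across drives):
#include <stdio.h>

int main(void) {
    /* rename() both renames and moves a file; it returns 0 on success. */
    if (rename("C:\\TURBOC3\\Projects\\Text.txt",
               "C:\\Users\\Admini~1\\Desktop\\Bills\\Text.txt") != 0) {
        perror("rename failed");
        return 1;
    }
    return 0;
}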
If you are using the Win32 API then consider looking into the functions CopyFile or CopyFileEx.
You can use the first in a way similar to the following:
CopyFile( szFilePath.c_str(), szCopyPath.c_str(), FALSE );
This copies the file at the path in szFilePath to the path in szCopyPath, and returns FALSE if the copy was unsuccessful. To find out more about why the function failed, call GetLastError() and then look up the error code in the Microsoft documentation.
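For example, a minimal sketch of that error handling (assuming a non-Unicode build, matching the call above; the copyWithCheck wrapper is made up for illustration):
#include <iostream>
#include <string>
#include <windows.h>

void copyWithCheck(const std::string& szFilePath, const std::string& szCopyPath) {
    // FALSE here means an existing destination file will be overwritten.
    if (!CopyFile(szFilePath.c_str(), szCopyPath.c_str(), FALSE)) {
        DWORD err = GetLastError(); // e.g. 2 == ERROR_FILE_NOT_FOUND
        std::cerr << "CopyFile failed, error code " << err << '\n';
    }
}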

UCS-2LE text file parsing

I have a text file which was created using some Microsoft reporting tool. The text file includes the BOM bytes 0xFF 0xFE at the beginning, followed by ASCII characters with nulls between them (i.e. "F.i.e.l.d.1."). I can use iconv to convert this to UTF-8, using UCS-2LE as the input format and UTF-8 as the output format; it works great.
My problem is that I want to read lines from the UCS-2LE file into strings, parse out the field values, and then write them out to an ASCII text file (i.e. Field1 Field2). I have tried both the string- and wstring-based versions of getline; while they read the string from the file, functions like substr(start, length) interpret the string as 8-bit values, so the start and length values are off.
How do I read the UCS-2LE data into a C++ string and extract the data values? I have looked at Boost and ICU and done numerous Google searches, but have not found anything that works. What am I missing here? Please help!
My example code looks like this:
wifstream srcFile;
srcFile.open(argv[1], ios_base::in | ios_base::binary);
..
..
wstring srcBuf;
..
..
while (getline(srcFile, srcBuf))
{
    wstring field1;
    field1 = srcBuf.substr(12, 12);
    ...
    ...
}
So, if, for example, srcBuf contains "W.e. t.h.i.n.k. i.n. g.e.n.e.r.a.l.i.t.i.e.s." then the substr() above returns ".k. i.n. g.e" instead of "g.e.n.e.r.a.l.i.t.i.e.s.".
What I want is to read in the string and process it without having to worry about the multi-byte representation. Does anybody have an example of using boost (or something else) to read these strings from the file and convert them to a fixed width representation for internal use?
BTW, I am on a Mac using Eclipse and gcc. Is it possible my STL does not understand wide character strings?
Thanks!
Having spent some good hours tackling this question, here are my conclusions:
Reading a UTF-16 (or UCS-2LE) file is apparently manageable in C++11; see How do I write a UTF-8 encoded string to a file in Windows, in C++.
In C++11 one can just use codecvt_utf16 from <codecvt>; a short sketch follows at the end of this answer.
However, in older compilers (e.g. MSVC 2008), you can use locale and a custom codecvt facet/"recipe", as very nicely exemplified in this answer to Writing UTF16 to file in binary mode.
Alternatively, one can also try this method of reading, though it did not work in my case: the output was missing lines, which were replaced by garbage characters.
I wasn't able to get this done with my pre-C++11 compiler and had to resort to scripting the task in Ruby and spawning a process (it's just for a test, so I think that kind of complication is OK there).
Hope this spares others some time, happy to help.
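Here is a minimal sketch of the C++11 route mentioned above (assuming <codecvt>, which works but was deprecated in C++17; "report.txt" stands in for the real file):
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

int main() {
    std::wifstream src("report.txt", std::ios::binary);
    // Decode UCS-2/UTF-16 little-endian, consuming the 0xFF 0xFE BOM if present.
    src.imbue(std::locale(src.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF,
            std::codecvt_mode(std::little_endian | std::consume_header)>));
    std::wstring srcBuf;
    while (std::getline(src, srcBuf)) {
        // substr now counts characters, not bytes.
        std::wstring field1 = srcBuf.substr(12, 12);
        // ... parse the remaining fields and write them out
    }
}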
substr works fine for me on Linux with g++ 4.3.3. The program
#include <string>
#include <iostream>
using namespace std;

int main()
{
    wstring s1 = L"Hello, world";
    wstring s2 = s1.substr(3, 5);
    wcout << s2 << endl;
}
prints "lo, w" as it should.
However, the file reading probably does something different from what you expect. It converts the file from the locale encoding to wchar_t, which causes each byte to become its own wchar_t. I don't think the standard library supports reading UTF-16 into wchar_t.