How to set UTF-8 encoding without BOM for CStdioFileEx?

I am trying to read a file with MFC:
CString string;
CStdioFileEx gameFile;
bool have_file = false;
if (PathFileExists(filePathsAndNames[i].first + L"\\main.lua"))
{
gameFile.Open(filePathsAndNames[i].first + L"\\main.lua", CFile::modeRead);
have_file = true;
}
else if (PathFileExists(filePathsAndNames[i].first + L"\\main3.lua"))
{
gameFile.Open(filePathsAndNames[i].first + L"\\main3.lua", CFile::modeRead);
have_file = true;
}
if (have_file)
{
gameFile.SetCodePage(CP_UTF8);
CString game_name;
CString game_name_en;
CString game_author;
CString game_version;
const int MAX_STR_CNT = 5; // read no more than this many lines from the start
int curr_str = 0;
while (gameFile.ReadString(string)) {
...
}
}
When the file is UTF-8 encoded without a BOM, the ReadString() method skips about two or three characters in the first line. When the file is UTF-8 encoded with a BOM, everything works.
How can I fix it?
Is it an issue which I should report to Microsoft? If yes, how can I do it?
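
One way to sidestep the difference between the two cases, sketched under assumptions: read the file as raw bytes and drop the three-byte UTF-8 BOM (EF BB BF) only when it is actually there, so files with and without a BOM yield the same text. Skipping about three characters would be consistent with a reader that steps over a BOM unconditionally, but that is a guess and is not confirmed anywhere in this thread. The helper names `strip_utf8_bom` and `read_utf8_file` are hypothetical; they are not part of CStdioFileEx or MFC, and converting the resulting UTF-8 bytes to a CString is left out.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
// Returns true when a BOM was found and removed.
bool strip_utf8_bom(std::string& bytes) {
    static const char bom[] = "\xEF\xBB\xBF";
    if (bytes.compare(0, 3, bom) == 0) {
        bytes.erase(0, 3);
        return true;
    }
    return false;
}

// Hypothetical helper: reads a whole file as raw bytes (binary mode, so no
// newline translation) and strips the BOM only when it is actually there.
std::string read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    strip_utf8_bom(bytes);
    return bytes;
}
```

With this approach the first line is identical whether or not the editor wrote a BOM, because nothing is skipped unless the three marker bytes match.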

Related

Move a file or folder to the RecycleBin/Trash (C++17)

I am trying to write a function that moves files to the Recycle Bin.
When the file path contains Unicode characters and whitespace, as in the example below, I cannot send the file to the Recycle Bin.
...\Yönü Değiştir\Yönü Değiştir Sil.txt
I found many examples on the forum, but I couldn't get any of them to run correctly.
Where did I go wrong? Can you help me write the function correctly?
My function and code look like this:
. includes...
.
.
bool recycle_file_folder(std::string path) {
std::wstring widestr = std::wstring(path.begin(), path.end());
const wchar_t* widecstr = widestr.c_str();
SHFILEOPSTRUCT fileOp; //#include <Windows.h>;
fileOp.hwnd = NULL;
fileOp.wFunc = FO_DELETE;
fileOp.pFrom = widecstr; /// L"C:\\Users\\USER000\\Documents\\Yönü Değiştir\\Yönü Değiştir Sil.txt";
fileOp.pTo = NULL;
fileOp.fFlags = FOF_ALLOWUNDO | FOF_NOERRORUI | FOF_NOCONFIRMATION | FOF_SILENT;
int result = SHFileOperation(&fileOp);
if (result != 0) {
return false;
}
else {
return true;
}
}
int main()
{
std::filesystem::path p("C:\\Users\\USER000\\Documents\\Yönü Değiştir\\Yönü Değiştir Sil.txt");
recycle_file_folder(p.string());
return 0;
}
It works when I specify the file directly, like this:
fileOp.pFrom = L"C:\\Users\\USER000\\Documents\\Yönü Değiştir\\Yönü Değiştir Sil.txt";
How do I adapt the function so that it works for all files?

I think the conversion between string and wstring is the problem. Note that std::filesystem::path can convert to both string and wstring, so let's rewrite your code a bit:
bool recycle_file_folder(const std::wstring& path) {
    // pFrom must be double-null-terminated: append one explicit L'\0',
    // c_str() supplies the second.
    std::wstring widestr = path + std::wstring(1, L'\0');
    SHFILEOPSTRUCTW fileOp = {}; // zero-initialize the fields that are not set below
    fileOp.hwnd = NULL;
    fileOp.wFunc = FO_DELETE;
    fileOp.pFrom = widestr.c_str();
    fileOp.pTo = NULL;
    fileOp.fFlags = FOF_ALLOWUNDO | FOF_NOERRORUI | FOF_NOCONFIRMATION | FOF_SILENT;
    int result = SHFileOperationW(&fileOp);
    return result == 0; // zero means success
}
int main()
{
    // Wide literal, so the non-ASCII characters do not depend on the narrow character set.
    std::filesystem::path p(L"C:\\Users\\USER000\\Documents\\Yönü Değiştir\\Yönü Değiştir Sil.txt");
    recycle_file_folder(p.wstring());
    return 0;
}

a file path with unicode and whitespace
The problem is not the whitespace; it is the non-ASCII characters.
std::wstring widestr = std::wstring(path.begin(), path.end());
This is not a correct way to convert characters of some code page to UTF-16.
You'll have to use a method suggested in this Q&A: C++ Convert string (or char*) to wstring (or wchar_t*) (Ignore the answer by Pietro M, look into other answers)
Alternatively, use SHFileOperationA and SHFILEOPSTRUCTA, but that is a worse solution.
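
To see why the element-by-element copy fails, here is a small sketch. It assumes, purely for illustration, that the narrow string holds UTF-8 (on Windows the narrow string usually uses the active ANSI code page instead, but the failure mode is the same): every byte becomes its own wide character, so a multi-byte character is never reassembled. The name `naive_widen` is made up for this example.

```cpp
#include <string>

// Widens each char independently, exactly as
// std::wstring(path.begin(), path.end()) does in the question.
std::wstring naive_widen(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// "ö" (U+00F6) encoded as UTF-8 is the two bytes C3 B6.
const std::string utf8_o_umlaut = "\xC3\xB6";

// The correct wide form is the single code unit U+00F6.
const std::wstring wide_o_umlaut = L"\u00F6";

// naive_widen(utf8_o_umlaut) has two elements and differs from wide_o_umlaut,
// while naive_widen("abc") happens to equal L"abc".
```

For pure ASCII paths the naive copy gives the right answer by accident, which is why the problem only shows up with characters such as ö, ü, ğ, or ş. Keeping the path as a std::filesystem::path and calling wstring() on it, or converting with MultiByteToWideChar and the correct code page, avoids the issue.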

String includes just valid chars?

I'd like to validate a string in C++, checking whether it contains only valid characters.
The valid characters should be passed to the function as a character set, for example "abc123".
A string that contains only characters from the character set should return true, while a string that also contains other characters should return false. Obviously an easy task :)
--> using charset abc123:
string myString_1 = "bbbac1"; // should give true
string myString_2 = "bbbac132aacc"; // should give true
string myString_3 = "xxxxxx"; // should give false
string myString_4 = "bbbac12533cc"; // should give false
How can I implement a call like this in C++?
Note: I thought about using something like the code below, but I'm pretty sure there's a much better solution.
string charset = "abc123";
string myString = "bbbac1";
for (size_t i = 0; i < charset.length(); i++) {
    // erase-remove idiom: std::replace cannot delete characters, and '' is not a valid char literal
    myString.erase(std::remove(myString.begin(), myString.end(), charset[i]), myString.end());
}
bool isValid = (myString.length() == 0);

As igor-tandetnik pointed out in the comments, this is a job for std::string::find_first_not_of:
auto validate(const std::string& str, const std::string& charset) -> bool
{
return str.find_first_not_of(charset) == std::string::npos;
}
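
As a quick check, the function above can be applied to the four strings from the question. The sketch below repeats `validate` so that it compiles on its own, and adds a hypothetical companion, `first_invalid`, that reports where the first offending character sits.

```cpp
#include <string>

// True when every character of str occurs in charset.
bool validate(const std::string& str, const std::string& charset) {
    return str.find_first_not_of(charset) == std::string::npos;
}

// Hypothetical companion: index of the first character that is not in
// charset, or std::string::npos when the whole string is valid.
std::string::size_type first_invalid(const std::string& str,
                                     const std::string& charset) {
    return str.find_first_not_of(charset);
}

// The four cases from the question, using the charset "abc123".
const bool r1 = validate("bbbac1", "abc123");        // true
const bool r2 = validate("bbbac132aacc", "abc123");  // true
const bool r3 = validate("xxxxxx", "abc123");        // false
const bool r4 = validate("bbbac12533cc", "abc123");  // false, '5' at index 7
```

Note that an empty string is reported as valid, because there is no character in it that falls outside the set.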

You can write your own check function:
bool checkstring(const std::string &text, const std::string &legalchars) {
    for (char c : text) {
        // assume the character is illegal until a match is found
        bool isLegal = false;
        for (char d : legalchars) {
            // comparing the chars; stop at the first match
            if (c == d) { isLegal = true; break; }
        }
        // if a non-legal char was found, return false
        if (!isLegal) { return false; }
    }
    // if no non-legal character was found, return true
    return true;
}
Although there might be a better alternative using the standard libraries, especially if you need to compare very long strings with a large set of legal characters.
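
For long strings and a large set of legal characters, one option is to build a lookup table once, so each character costs a single array access instead of a scan over the whole set. This is a minimal sketch assuming single-byte `char` data; the name `only_contains` is made up for the example.

```cpp
#include <array>
#include <string>

// Marks every legal character in a 256-entry table, then tests each
// character of text with one array lookup.
bool only_contains(const std::string& text, const std::string& legal) {
    std::array<bool, 256> allowed{};  // value-initialized: all false
    for (unsigned char c : legal) {
        allowed[c] = true;
    }
    for (unsigned char c : text) {
        if (!allowed[c]) {
            return false;  // first character outside the legal set
        }
    }
    return true;
}
```

The work is proportional to the length of the text plus the length of the character set, rather than their product as in the nested loops.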

`fgetpos` Not Returning the Correct Position

Update: To get around the problem below, I have done
if (ftell(m_pFile) != m_strLine.size())
fseek(m_pFile, m_strLine.size(), SEEK_SET);
fpos_t position;
fgetpos(m_pFile, &position);
This then returns the correct position for my file. However, I would still like to understand why this is occurring.
I want to get the position in a text file. For most files I have been reading the first line, storing the position, doing some other stuff and returning to the position afterwards...
m_pFile = Utils::OpenFile(m_strBaseDir + "\\" + Source + "\\" + m_strFile, "r");
m_strLine = Utils::ReadLine(m_pFile);
bEOF = feof(m_pFile) != 0;
if (bEOF)
{
Utils::CompilerError(m_ErrorCallback,
(boost::format("File '%1%' is empty.") % m_strFile).str());
return false;
}
// Open.
pFileCode = Utils::OpenFile(strGenCode + "\\" + m_strFile, options.c_str());
m_strLine = Utils::Trim(m_strLine);
Utils::WriteLine(pFileCode, m_strLine);
// Store location and start passes.
unsigned int nLineCount = 1;
fpos_t position;
fgetpos(m_pFile, &position);
m_strLine = Utils::ReadLine(m_pFile);
...
fsetpos(m_pFile, &position);
m_strLine = Utils::ReadLine(m_pFile);
With all the files provided to me, storing the position with fgetpos and restoring it with fsetpos works correctly. The problem is with a file that I have created which looks like
which is almost identical to the supplied files. The problem is that for the file above fgetpos(m_pFile, &position); is not returning the correct position (I am aware that the fpos_t position is implementation specific). After the first ReadLine I get a position of 58 (edited from 60) so that when I attempt to read the second line with
fsetpos(m_pFile, &position);
m_strLine = Utils::ReadLine(m_pFile);
I get
on 700
instead of
Selection: Function ADJEXCL
Why is fgetpos not returning the position of the end of the first line?
Note. The Utils::ReadLine method is:
std::string Utils::ReadLine(FILE* file)
{
if (file == NULL)
return NULL;
char buffer[MAX_READLINE];
if (fgets(buffer, MAX_READLINE, file) != NULL)
{
if (buffer != NULL)
{
std::string str(buffer);
Utils::TrimNewLineChar(str);
return str;
}
}
std::string str(buffer);
str.clear();
return str;
}
with
void Utils::TrimNewLineChar(std::string& s)
{
if (!s.empty() && s[s.length() - 1] == '\n')
s.erase(s.length() - 1);
}
Edit. Following the debugging suggestions in the comments I have added the following code
m_pFile = Utils::OpenFile(m_strBaseDir + "\\" + Source + "\\" + m_strFile, "r");
m_strLine = Utils::ReadLine(m_pFile);
// Here m_strLine = " Logic Definition Report Chart Version: New Version 700" (64 chars).
long vv = ftell(m_pFile); // Here vv = 58!?
fpos_t pos;
vv = ftell(m_pFile);
fgetpos(m_pFile, &pos); // pos = 58.
fsetpos(m_pFile, &pos);
m_strLine = Utils::ReadLine(m_pFile);

Sorry, but your Utils functions have clearly been written by an incompetent. Some issues are just a matter of style. For trimming:
void Utils::TrimNewLineChar(std::string& s)
{
if (!s.empty() && *s.rbegin() == '\n')
s.resize(s.size() - 1); // resize, not erase
}
or in C++11
void Utils::TrimNewLineChar(std::string& s)
{
if (!s.empty() && s.back() == '\n')
s.pop_back();
}
ReadLine is even worse, replace it with:
std::string Utils::ReadLine(FILE* file)
{
std::string str;
char buffer[MAX_READLINE];
if (file != NULL && fgets(buffer, MAX_READLINE, file) != NULL)
{
// it is guaranteed that buffer != NULL, since it is an automatic array
str.assign(buffer);
Utils::TrimNewLineChar(str);
}
// copying buffer into str is useless here
return str;
}
That last str(buffer) in the original worries me especially. If fgets reaches a newline, fills the buffer, or reaches end of file, you're guaranteed to get a properly terminated string in your buffer. If some other I/O error occurs? Who knows? It might be undefined behavior.
Best not to rely on the value of buffer when fgets fails.
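
The thread does not settle why the reported position is off. The sketch below only shows the save-and-restore pattern from the question in a setting where positions are unambiguous: it assumes the stream is opened in binary mode, so no newline translation is involved and offsets are plain byte counts. Whether text-mode translation is behind the discrepancy in the question is an assumption, not something established here. The names `read_line` and `second_line_after_restore` are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Reads one line with fgets and strips a trailing '\n' (and '\r', because
// binary mode does not translate "\r\n").
std::string read_line(std::FILE* file) {
    std::string line;
    char buffer[256];
    if (file != nullptr && std::fgets(buffer, sizeof buffer, file) != nullptr) {
        line.assign(buffer);
        while (!line.empty() && (line.back() == '\n' || line.back() == '\r')) {
            line.pop_back();
        }
    }
    return line;
}

// Writes three lines to a temporary binary stream, remembers the position
// after the first line, reads ahead, then returns to the remembered position.
// Returns the line read after restoring the position.
std::string second_line_after_restore() {
    std::FILE* f = std::tmpfile();  // temporary file opened for binary update
    if (f == nullptr) {
        return std::string();
    }
    std::fputs("header line\nSelection: Function ADJEXCL\nthird\n", f);
    std::rewind(f);

    read_line(f);                   // consume the first line
    std::fpos_t after_first;
    std::fgetpos(f, &after_first);  // remember where the second line starts

    read_line(f);                   // read ahead
    read_line(f);

    std::fsetpos(f, &after_first);  // jump back
    std::string line = read_line(f);
    std::fclose(f);
    return line;
}
```

If the same pattern misbehaves only when the file is opened with "r", comparing the result against an "rb" run is a cheap way to test the text-mode hypothesis.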

Why is this encrypted message damaged?

I use the following code to encrypt a string with a key, using the 3-DES algorithm:
private bool Encode(string input, out string output, byte[] k, bool isDOS7)
{
try
{
if (k.Length != 16)
{
throw new Exception("Wrong key size exception");
}
int length = input.Length % 8;
if (length != 0)
{
length = 8 - length;
for (int i = 0; i < length; i++)
{
input += " ";
}
}
TripleDESCryptoServiceProvider des = new TripleDESCryptoServiceProvider();
des.Mode = CipherMode.ECB;
des.Padding = PaddingMode.Zeros;
des.Key = k;
ICryptoTransform ic = des.CreateEncryptor();
byte[] bytePlainText = Encoding.Default.GetBytes(input);
MemoryStream ms = new MemoryStream();
CryptoStream cStream = new CryptoStream(ms,
ic,
CryptoStreamMode.Write);
cStream.Write(bytePlainText, 0, bytePlainText.Length);
cStream.FlushFinalBlock();
byte[] cipherTextBytes = ms.ToArray();
cStream.Close();
ms.Close();
output = Encoding.Default.GetString(cipherTextBytes);
}
catch (ArgumentException e)
{
output = e.Message;
//Log.Instance.WriteToEvent("Problem encoding, terminalID= "+objTerminalSecurity.TerminalID+" ,Error" + output, "Security", EventLogEntryType.Error);
return false;
}
return true;
}
I send the output parameter as-is to a WCF HTTP-binding web service, and I noticed that the encoded string that arrives looks different: there seem to be some \t and \n characters in it, although the characters are otherwise about the same.
What is going on? Why does the server get a different encoded string?

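Usually ciphertext is Base64 encoded so that it stays binary safe during transmission. Ciphertext is arbitrary bytes, and passing it through a text encoding such as Encoding.Default and then through a text-based transport can alter bytes that do not map to printable characters.
Also, I would not use 3DES with ECB. That is awful; you must have copy-pasted this from somewhere. Use AES in CBC mode and think about adding a CMAC or HMAC.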
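
To illustrate what Base64 encoding does with raw ciphertext bytes, here is a minimal encoder sketch. It is written in C++ to match the other examples on this page; in the C# code from the question, the built-in Convert.ToBase64String and Convert.FromBase64String would be the natural choice. The function name `base64_encode` is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes arbitrary bytes as standard Base64 (RFC 4648 alphabet, '=' padding).
std::string base64_encode(const std::vector<std::uint8_t>& data) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Each full group of 3 bytes becomes 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    if (i < data.size()) {
        const bool two_bytes = (i + 1 < data.size());
        std::uint32_t n = data[i] << 16;
        if (two_bytes) {
            n |= data[i + 1] << 8;
        }
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += two_bytes ? alphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}
```

The output uses only letters, digits, '+', '/', and '=', so bytes such as 0x09 (tab) or 0x0A (newline) in the ciphertext can no longer be reinterpreted on the way to the server.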

C++ equivalent of MATLAB's "fileparts" function

In MATLAB there's a nice function called fileparts that takes a full file path and parses it into path, filename (without extension), and extension as in the following example from the documentation:
file = 'H:\user4\matlab\classpath.txt';
[pathstr, name, ext] = fileparts(file)
>> pathstr = H:\user4\matlab
>> name = classpath
>> ext = .txt
So I was wondering if there's an equivalent function in any standard C++ or C libraries that I could use? Or would I have to implement this myself? I realize it's fairly simple, but I figured if there's already something pre-made that would be preferable.
Thanks.

The Boost library has a filesystem component, "basic_path", that allows you to use iterators to discover each component of the filename. Such a component would be OS specific, and I believe you need to compile Boost separately for Windows, Linux, etc.

I just wrote this simple function. It behaves similarly to MATLAB's fileparts and does not rely on any platform-specific API, although it assumes / as the path separator.
struct FileParts
{
string path;
string name;
string ext;
};
FileParts fileparts(string filename)
{
int idx0 = filename.rfind("/");
int idx1 = filename.rfind(".");
FileParts fp;
fp.path = filename.substr(0,idx0+1);
fp.name = filename.substr(idx0+1,idx1-idx0-1);
fp.ext = filename.substr(idx1);
return fp;
}

A platform-independent way with C++11/14.
#include <experimental/filesystem>
#include <string>
namespace fs = std::experimental::filesystem;
using std::string;
void fileparts(string full, string& fpath, string& fname, string& fext)
{
auto source = fs::path(full);
fpath = source.parent_path().string();
fname = source.stem().string();
fext = source.extension().string();
}
...
string fpath, fname, fext;
fileparts(full_file_path,fpath,fname,fext);
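
Since C++17 the same interface is available in the standard header `<filesystem>`, without the experimental namespace. A minimal sketch of that variant follows; the `FileParts` struct mirrors the one used in the other answers, and returning it by value instead of using output parameters is a choice made for this example.

```cpp
#include <filesystem>
#include <string>

struct FileParts {
    std::string path;  // parent directory, without a trailing separator
    std::string name;  // file name without extension
    std::string ext;   // extension, including the leading '.'
};

// Splits a full path using std::filesystem::path (C++17).
FileParts fileparts(const std::string& full) {
    const std::filesystem::path source(full);
    return {source.parent_path().string(),
            source.stem().string(),
            source.extension().string()};
}
```

For "/home/user4/matlab/classpath.txt" this yields the path "/home/user4/matlab", the name "classpath", and the extension ".txt". Only the last extension is split off, so "archive.tar.gz" gives the name "archive.tar" and the extension ".gz". The backslash counts as a separator only on Windows; on POSIX systems a string such as the MATLAB example path is treated as a single file name.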

Some possible solutions, depending on your OS:
Visual C++ _splitpath function
Win32 Shell Path Handling Functions such as PathFindExtension, PathFindFileName, PathStripPath, PathRemoveExtension, PathRemoveFileSpec

Ekalic's text-only approach is useful, but it doesn't check whether a separator or an extension was actually found. Here's one that does, and that also works with both / and \.
struct FileParts
{
std::string path; //!< containing folder, if provided, including trailing slash
std::string name; //!< base file name, without extension
std::string ext; //!< extension, including '.'
};
//! Using only text manipulation, splits a full path into component file parts
FileParts fileparts(const std::string &fullpath)
{
    using namespace std;
    // Last separator of either kind, so mixed paths such as "C:\\dir/file.txt" also work.
    size_t idxSlash = fullpath.find_last_of("/\\");
    size_t idxDot = fullpath.rfind(".");
    // A dot before the last separator belongs to a folder name, not to the extension.
    if (idxSlash != string::npos && idxDot != string::npos && idxDot < idxSlash) {
        idxDot = string::npos;
    }
    FileParts fp;
    if (idxSlash != string::npos && idxDot != string::npos) {
        fp.path = fullpath.substr(0, idxSlash + 1);
        fp.name = fullpath.substr(idxSlash + 1, idxDot - idxSlash - 1);
        fp.ext = fullpath.substr(idxDot);
    } else if (idxSlash == string::npos && idxDot == string::npos) {
        fp.name = fullpath;
    } else if (/* only */ idxSlash == string::npos) {
        fp.name = fullpath.substr(0, idxDot);
        fp.ext = fullpath.substr(idxDot);
    } else { // only idxDot == string::npos
        fp.path = fullpath.substr(0, idxSlash + 1);
        fp.name = fullpath.substr(idxSlash + 1);
    }
    return fp;
}
