Stripping characters of a wxString - c++

I am working on an application written in C++ that primarily uses wxWidgets for its objects. Now, suppose I have the following wxString variable:
wxString path = "C:\\Program Files\\Some\\Path\\To\\A\\Directory\\";
Is there a way to remove the trailing slashes? While wxString provides a Trim() method, it only applies to whitespace characters. I could convert the string to another string type, do the stripping there, and convert back to wxString (it is essential that I use the wxString type), but if there is a less convoluted way of doing things, I'd prefer that.

The others have mentioned how this can be achieved using wxString methods, however I would strongly advise using an appropriate class, i.e. either wxFileName or, maybe, std::filesystem::path, for working with paths instead of raw strings. E.g. to get a canonical representation of the path in your case I would use wxFileName::DirName(path).GetFullPath().

This is what I would use, if I had no proper path-parsing alternative:
wxString& remove_trailing_backslashes(wxString& path)
{
auto inb = path.find_last_not_of(L'\\');
if(inb != wxString::npos)
path.erase(inb + 1); //inb + 1 <= size(), valid for erase()
else //empty string or only backslashes
path.clear();
return path; //to allow chaining
}
Notes:
Unless you're doing something unusual, wxString stores wchar_ts internally, so it makes sense to use wide string and character literals (prefixed with L) to avoid unnecessary conversions.
Even in the unusual case when you'd have strings encoded in UTF-8, the code above still works, as \ is ASCII, so it cannot appear in the encoding of another code point (the L prefix wouldn't apply anymore in this case, of course).
Even if you're forced to use wxString, I suggest you try to use its std::basic_string-like interface whenever possible, instead of the wx-specific functions. The code above works fine if you replace wxString with std::wstring.
In support of what VZ. said in his answer, note that all these simplistic string-based solutions will strip C:\ to C:, and \ to the empty string, which may not be what you want. To avoid such issues, I would go for the Boost.Filesystem library, which is, as far as I know, the closest to the proposed standard library filesystem functionality (which is not formally part of the standard yet, but very close).
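The edge cases from the last note are easy to reproduce. Here is a minimal sketch, using std::wstring in place of wxString (the notes above say the function works unchanged with it). The wrapper name stripped is only there for illustration:

```cpp
#include <string>

// Same logic as remove_trailing_backslashes above, on std::wstring.
std::wstring& remove_trailing_backslashes(std::wstring& path)
{
    const auto inb = path.find_last_not_of(L'\\');
    if (inb != std::wstring::npos)
        path.erase(inb + 1);   // drop everything after the last non-backslash
    else
        path.clear();          // empty string or only backslashes
    return path;
}

// Hypothetical by-value wrapper so results can be compared directly.
std::wstring stripped(std::wstring path)
{
    return remove_trailing_backslashes(path);
}
```

With this, stripped(L"C:\\Dir\\\\") gives C:\Dir as intended, but stripped(L"C:\\") gives C: and stripped(L"\\") gives an empty string, and neither names the same location as its input.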
For completeness, here's what it would look like using Boost.Filesystem:
wxString remove_trailing_backslashes(const wxString& arg)
{
using boost::filesystem::path;
static const path dotp = L".";
path p = arg.wc_str();
if(p.filename() == dotp)
p.remove_filename();
return p.native();
}
It's not as efficient as the ad-hoc solution above, mainly because the string is not modified in-place, but more resilient to problems caused by special path formats.
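Boost.Filesystem was later standardized as std::filesystem in C++17, so the same idea can be sketched with the standard library alone. The helper name strip_trailing_separator is illustrative, and the examples use forward slashes because backslashes are only treated as separators on Windows:

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Removes a trailing directory separator, but leaves a bare root such as
// "/" untouched. Illustrative helper, not part of any library.
fs::path strip_trailing_separator(fs::path p)
{
    // A path that ends in a separator has an empty filename component.
    // has_relative_path() is false for a bare root, which must be kept.
    if (!p.has_filename() && p.has_relative_path())
        p = p.parent_path();
    return p;
}
```

Note that std::filesystem reports an empty filename for a path ending in a separator, where the Boost version above checks for ".", so the test differs slightly from the Boost code.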

path.erase(path.end() - 1);
or
path.RemoveLast();

My use case actually also considers scenarios without trailing slashes.
I came up with two solutions. The first makes use of regular expression:
wxRegEx StripRegex("(.+?)\\\\*$", wxRE_ADVANCED);
if (StripRegex.Matches(path))
{
path = StripRegex.GetMatch(path,1);
}
The second, as @catalin suggested, uses RemoveLast:
while (path.EndsWith("\\"))
{
path.RemoveLast();
}
Edit: Using @VZ's suggestion, I came up with the following:
// for some reason, the 'Program Files' part gets taken out in the resulting string
// so I have to first replace the double slashes
path.Replace("\\\\","\\");
path = wxFileName::DirName(path).GetPath();

Related

Removing certain characters at the beginning of string

Here's the usecase.
I have a string with a relative path to a folder. Its format may vary a little depending on where it came from (I am dealing with files exported from different software).
For example: ./path/to/folder, /path/to/folder, path/to/folder.
What I need to do is to delete all the characters '.', '/' from the beginning of the string. Of course I can just do this manually in a for loop, but I thought maybe there's some kind of stl function exactly for such use-cases.
I thought maybe there's some kind of stl function exactly for such use-cases
#include <regex>
const std::string src("./path/to/folder");
static const std::regex re("^\\.?\\/?");
const std::string result = std::regex_replace(src, re, "");
If you need more efficiency than what <regex> provides, do it manually.
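A minimal sketch of the manual route, assuming the goal stated in the question (drop every leading '.' and '/'). The function name is made up for this example:

```cpp
#include <string>

// Removes the whole leading run of '.' and '/' characters.
std::string strip_leading_dots_and_slashes(std::string s)
{
    // find_first_not_of returns npos when every character is '.' or '/';
    // erase(0, npos) then clears the whole string.
    s.erase(0, s.find_first_not_of("./"));
    return s;
}
```

Unlike the regex above, which removes at most one '.' followed by at most one '/', this removes the entire run, so "../../x" becomes "x" and ".hidden" becomes "hidden". Whether that is desirable depends on the inputs you expect.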

How to convert std::string_view to double?

I'm writing a c++ parser for a custom option file for an application. I have a loop that reads lines in the form of option=value from a text file where value must be converted to double. In pseudocode it does the following:
while(not EOF)
statement <- read_from_file
useful_statement <- remove whitespaces, comments, etc from statement
equal_position <- find '=' in useful_statement
option_str <- useful_statement[0:equal_position)
value_str <- useful_statement[equal_position+1:end)
find_option(option_str) <- double(value_str)
To handle the string splitting and passing around to functions, I use std::string_view because it avoids excessive copying and clearly states the intent of viewing segments of a pre-existing std::string. I've done everything to the point where std::string_view value_str points to the exact part of useful_statement that contains the value I want to extract, but I can't figure out the way to read a double from an std::string_view.
I know of std::stod which doesn't work with std::string_view. It allows me to write
double value = std::stod(std::string(value_str));
However, this is ugly because it converts to a string which is not actually needed, and even though it will presumably not make a noticeable difference in my case, it could be too slow if one had to read a huge amount of numbers from a text file.
On the other hand, atof won't work because I can't guarantee a null terminator. I could hack it by adding \0 to useful_statement when constructing it, but that will make the code confusing to a reader and make it too easy to break if the code is altered/refactored.
So, what would be a clean, intuitive and reasonably efficient way to do this?
Since you marked your question with C++1z, then that (theoretically) means you have access to std::from_chars, declared in <charconv>. It can handle your string-to-number conversion without needing anything more than a pair of const char*s:
#include <charconv>
double dbl;
auto result = std::from_chars(value_str.data(), value_str.data() + value_str.size(), dbl);
Of course, this requires that your standard library provide an implementation of from_chars.
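The result of std::from_chars should be checked before the value is used. Here is a minimal sketch with error handling; the wrapper name to_double and the choice to reject trailing characters are assumptions for this example. It needs a standard library with the floating-point overloads (libstdc++ gained them in GCC 11):

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses the whole view as a double; returns std::nullopt on any failure.
std::optional<double> to_double(std::string_view sv)
{
    double value = 0.0;
    const char* first = sv.data();
    const char* last  = sv.data() + sv.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    // ec reports parse or range errors; ptr != last means trailing garbage.
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;
    return value;
}
```

Note that std::from_chars does not skip leading whitespace and does not accept a leading '+', so the view should already be trimmed, as value_str in the question is.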
Headers:
#include <boost/convert.hpp>
#include <boost/convert/strtol.hpp>
Then:
std::string x { "aa123.4"};
const std::string_view y(x.c_str()+2, 5); // Window that views the characters "123.4".
auto value = boost::convert<double>(y, boost::cnv::strtol());
if (value.has_value())
{
cout << value.get() << "\n"; // Prints: 123.4
}
Tested Compilers:
MSVC 2017
p.s. Can easily install Boost using vcpkg (defaults to 32-bit, second command is for 64-bit):
vcpkg install boost-convert
vcpkg install boost-convert:x64-windows
Update: Apparently, many Boost functions use string streams internally, which take a lock on the global OS locale. So they have terrible multi-threaded performance**.
I would now recommend something like stoi() with substr instead. See: Safely convert std::string_view to int (like stoi or atoi)
** This strange quirk of Boost renders most of Boost string processing absolutely useless in a multi-threaded environment, which is a strange paradox indeed. This is the voice of hard-won experience talking: measure it for yourself if you have any doubts. A 48-core machine runs no faster with many Boost calls compared to a 2-core machine. So now I avoid certain parts of Boost like the proverbial plague, as anything can have a dependency on that damn global OS locale lock.
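For a double rather than an int, the equivalent fallback is std::stod on a temporary std::string. Here is a minimal sketch; the function name and the decision to reject trailing characters are assumptions made for this example:

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Copies the view into a std::string (std::stod needs a null-terminated
// buffer) and converts it; returns std::nullopt on any failure.
std::optional<double> to_double_via_stod(std::string_view sv)
{
    const std::string tmp(sv);
    try {
        std::size_t consumed = 0;
        const double value = std::stod(tmp, &consumed);
        if (consumed != tmp.size())
            return std::nullopt;   // trailing characters after the number
        return value;
    } catch (const std::invalid_argument&) {
        return std::nullopt;       // no number at all
    } catch (const std::out_of_range&) {
        return std::nullopt;       // value does not fit in a double
    }
}
```

This pays for one copy per call and follows the current C locale for the decimal separator, which is the trade-off against a locale-independent, allocation-free parser.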

How to disable the escape sequence in C++

I use C++ to process many files, and I have to write the file name in source code like this:
"F:\\somepath\\subpath\\myfile",
I wonder whether there's any way to avoid typing "\\" to get a '\' character in a string literal, i.e., I'd like to write just "F:\somepath\subpath\myfile" instead of the tedious escaped form.
Solutions:
use C++11 raw string literals: R"(F:\somepath\subpath\myfile)"
Use boost::filesystem::path with forward slashes:
They will validate your path and raise exceptions for problems.
boost::filesystem::path p = "f:/somepath/subpath";
p /= "myfile";
just use forward slashes; Windows should understand them.
If you have C++11, you can use raw string literals:
std::string s = R"(F:\somepath\subpath\myfile)";
On the other hand, you can just use forward slashes for filesystem paths:
std::string s = "F:/somepath/subpath/myfile";
Two obvious options:
Windows understands forward slashes (or rather, it translates them to backslashes); use those instead.
C++11 has raw string literals. Stuff inside them doesn't need to be escaped.
R"(F:\somepath\subpath\myfile)"
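A short sketch showing that the escaped and raw spellings produce the same string, and how a custom delimiter helps when the text itself contains a closing parenthesis followed by a quote. The variable names and the delimiter x are arbitrary choices for this example:

```cpp
#include <string>

// Ordinary literal: every backslash must be doubled.
const std::string escaped = "F:\\somepath\\subpath\\myfile";

// Raw literal: the text between R"( and )" is taken verbatim.
const std::string raw = R"(F:\somepath\subpath\myfile)";

// Raw literal with the custom delimiter x, so the text may itself contain
// a closing parenthesis followed by a double quote.
const std::string tricky = R"x(a)"b)x";
```

Here escaped and raw compare equal, and tricky holds the four characters a, ), ", b.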

Polish chars in std::string

I have a problem. I'm writing an app in Polish (with, of course, polish chars) for Linux and I receive 80 warnings when compiling. These are just "warning: multi-character character constant" and "warning: case label value exceeds maximum value for type". I'm using std::string.
How do I replace std::string class?
Please help.
Thanks in advance.
Regards.
std::string does not define a particular encoding. You can thus store any sequence of bytes in it. There are subtleties to be aware of:
(1) .c_str() will return a null-terminated buffer. If your character set allows null bytes, don't pass this string to functions that take a const char* parameter without a length, or your data will be truncated.
(2) A char does not represent a character, but a byte. IMHO, this is the most problematic nomenclature in computing history. Note that wchar_t does not necessarily hold a full character either, because of UTF-16 surrogate pairs and combining sequences.
(3) .size() and .length() will return the number of bytes, not the number of characters.
[edit] The warnings about case labels are related to issue (2). You are using a switch statement on multi-byte characters with type char, which cannot hold more than one byte.[/edit]
Therefore, you can use std::string in your application, provided that you respect these three rules. There are subtleties involving the STL, including std::find() that are consequences of this. You need to use some more clever string matching algorithms to properly support Unicode because of normalization forms.
However, when writing applications in any language that uses non-ASCII characters (if you're paranoid, consider this anything outside [0, 128)), you need to be aware of encodings in different sources of textual data.
(1) The source-file encoding might not be specified, and might be subject to change using compiler options. Any string literal will be subject to this rule. I guess this is why you are getting warnings.
(2) You will get a variety of character encodings from external sources (files, user input, etc.). When that source specifies the encoding or you can get it from some external source (e.g. asking the user who imports the data), then this is easier. A lot of (newer) internet protocols impose ASCII or UTF-8 unless otherwise specified.
These two issues are not addressed by any particular string class. You just need to convert every external source to your internal encoding. I suggest UTF-8 all the time, but especially so on Linux because of native support. I strongly recommend placing your string literals in a message file to forget about issue (1) and only deal with issue (2).
I don't suggest using std::wstring on Linux because 100% of native APIs use function signatures with const char* and have direct support for UTF-8. If you use any string class based on wchar_t, you will need to convert to/from std::wstring non-stop and eventually get something wrong, on top of making everything slow(er).
If you were writing an application for Windows, I'd suggest exactly the opposite because all native APIs use const wchar_t* signatures. The ANSI versions of such functions perform an internal conversion to/from const wchar_t*.
Some "portable" libraries/languages use different representations based on the platform. They use UTF-8 with char on Linux and UTF-16 with wchar_t on Windows. I recall reading about that trick in the Python reference implementation, but the article was quite old. I'm not sure if that is true anymore.
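The point that .size() counts bytes rather than characters can be made concrete. Here is a minimal sketch, assuming the std::string holds valid UTF-8; the function name is made up for this example:

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes, which always
// have the bit pattern 10xxxxxx. Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (const unsigned char c : s) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 bytes of "zażółć", written as "za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87" to stay independent of the source-file encoding, .size() is 10 while utf8_code_points returns 6. Code points are still not the same as user-perceived characters when combining sequences are involved.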
On Linux you should use a multibyte string class provided by the framework you use.
I'd recommend Glib::ustring, from glibmm framework, which stores strings in UTF-8 encoding.
If your source files are in UTF-8, then using multibyte string literal in code is as easy as:
ustring alphabet("aąbcćdeęfghijklłmnńoóprsśtuwyzźż");
But you can not build a switch/case statement on multibyte characters using char. I'd recommend using a series of ifs. You can use Glibmm's gunichar, but it's not very readable (You can get correct unicode values for characters using a table from article on Polish alphabet in Wikipedia):
#include <glibmm.h>
#include <iostream>
using namespace std;
int main()
{
    Glib::ustring alphabet("aąbcćdeęfghijklłmnńoóprsśtuwyzźż");
    int small_polish_vowels_with_diacritics_count = 0;
    // size() returns an unsigned type, so use it for the index as well
    for ( Glib::ustring::size_type i=0; i<alphabet.size(); i++ ) {
        switch (alphabet[i]) {
        case 0x0105: // ą
        case 0x0119: // ę
        case 0x00f3: // ó
            small_polish_vowels_with_diacritics_count++;
            break;
        default:
            break;
        }
    }
    cout << "There are " << small_polish_vowels_with_diacritics_count
         << " small Polish vowels with diacritics in this string.\n";
    return 0;
}
You can compile this using:
g++ `pkg-config --cflags --libs glibmm-2.4` progname.cc -o progname
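The same switch works without glibmm once the text is held as UTF-32, where each element is one whole code point. Here is a minimal sketch, assuming the text has already been decoded into a std::u32string; the function name is illustrative:

```cpp
#include <cstddef>
#include <string>

// Counts the code points for the lowercase Polish vowels with diacritics.
std::size_t count_polish_vowels_with_diacritics(const std::u32string& text)
{
    std::size_t count = 0;
    for (const char32_t c : text) {
        switch (c) {
        case 0x0105: // ą
        case 0x0119: // ę
        case 0x00F3: // ó
            ++count;
            break;
        default:
            break;
        }
    }
    return count;
}
```

Called with U"a\u0105e\u0119o\u00F3z" it returns 3. Writing the test string with \u escapes keeps the result independent of the source-file encoding.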
std::string is for ASCII strings. Since your Polish strings don't fit in, you should use std::wstring.

Parse URLs using C-Strings in C++

I'm learning C++ for one of my CS classes, and for our first project I need to parse some URLs using c-strings (i.e. I can't use the C++ String class).
The only way I can think of approaching this is just iterating through (since it's a char[]) and using some switch statements. From someone who is more experienced in C++ - is there a better approach? Could you maybe point me to a good online resource? I haven't found one yet.
Weird that you're not allowed to use C++ language features i.e. C++ strings!
There are some C string functions available in the standard C library.
e.g.
strdup - duplicate a string (a POSIX function; it only became part of ISO C in C23)
strtok - breaking a string into tokens. Beware - this modifies the original string.
strcpy - copying string
strstr - find string in string
strncpy - copy up to n bytes of string
etc
There is a good online reference with a full list of the available C string functions for searching and finding things:
http://www.cplusplus.com/reference/clibrary/cstring/
You can walk through strings by accessing them like an array if you need to.
e.g.
const char* url = "http://stackoverflow.com/questions/1370870/c-strings-in-c";
const std::size_t len = std::strlen(url); // needs <cstring>
for (std::size_t i = 0; i < len; ++i) {
    std::cout << url[i];
}
std::cout << std::endl;
As for actually how to do the parsing, you'll have to work that out on your own. It is an assignment after all.
There are a number of C standard library functions that can help you.
First, look at the C standard library function strtok. This allows you to retrieve parts of a C string separated by certain delimiters. For example, you could tokenize with the delimiter / to get the protocol, domain, and then the file path. You could tokenize the domain with delimiter . to get the subdomain(s), second level domain, and top level domain. Etc.
It's not nearly as powerful as a regular expression parser, which is what you would really want for parsing URLs, but it works on C strings, is part of the C standard library and is probably OK to use in your assignment.
Other C standard library functions that may help:
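strstr() Finds a substring within a string, like std::string::find(). It returns a pointer to the match and does not extract anything, so copy the characters out yourself if you need them.
strspn(), strchr() and strpbrk() Measure a run of characters or find a character or characters in a string, similar to std::string::find_first_not_of(), find() and find_first_of().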
Edit: A reminder that the proper way to use these functions in C++ is to include <cstring> and use them in the std:: namespace, e.g. std::strtok().
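As a small illustration of these functions (not a full URL parser), here is a sketch that separates the scheme from the rest of a URL without modifying the input, which strtok would do. The function names and the fixed "://" separator are assumptions made for this example:

```cpp
#include <cstddef>
#include <cstring>

// Copies the scheme (the part before "://") into scheme_out and returns a
// pointer to the first character after "://". Returns nullptr if there is
// no "://" or the buffer is too small.
const char* split_scheme(const char* url, char* scheme_out, std::size_t out_size)
{
    const char* sep = std::strstr(url, "://");
    if (sep == nullptr)
        return nullptr;
    const std::size_t len = static_cast<std::size_t>(sep - url);
    if (len + 1 > out_size)
        return nullptr;
    std::memcpy(scheme_out, url, len);
    scheme_out[len] = '\0';
    return sep + 3;
}

// Length of the host part: everything up to the first '/' (or the end).
std::size_t host_length(const char* after_scheme)
{
    return std::strcspn(after_scheme, "/");
}
```

For "http://stackoverflow.com/questions/1370870" this yields the scheme "http" and a host length of 17, covering "stackoverflow.com". Ports, user info and query strings are deliberately left out.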
You might want to refer to an open source library that can parse URLs (as a reference for how others have done it -- obviously don't copy and paste it!), such as curl or wget.
I don't know what the requirements are for parsing the URLs,
but if this is CS level it would be appropriate to use (very
simple) BNF and a (very simple) recursive descent parser.
This would make for a more robust solution than direct
iteration, e.g. for malformed URLs.
Very few string functions from the standard C library would
be needed.
You can use C functions like strtok, strchr, strstr etc.
Many of the runtime library functions that have been mentioned work quite well, either in conjunction with or apart from the approach of iterating through the string that you mentioned (which I think is time honored).