Regular Expression for removing suffix - c++

What is the regular expression for removing the suffix of file names? For example, if I have a file name in a string such as "vnb.txt", what is the regular expression to remove ".txt"?
Thanks.

Do you really need a regular expression to do this? Why not just look for the last period in the string, and trim the string up to that point? Frankly, there's a lot of overhead for a regular expression, and I don't think you need it in this case.
As suggested by tstenner, you can try one of the following, depending on what kinds of strings you're using:
std::strrchr
std::string::find_last_of
First example:
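const char* str = "Directory/file.txt"; // string literals are const in C++
size_t index = 0;
const char* pStr = std::strrchr(str, '.'); // declared in <cstring>
if (nullptr != pStr)
{
    index = pStr - str; // position of the last '.'
}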
Second example:
std::string::size_type index = std::string("Directory/file.txt").find_last_of('.'); // std::string::npos if there is no '.'

If you are already using Qt, you could use QFileInfo: baseName() returns the file name without the path and without everything from the first '.' onward, completeBaseName() strips only the last suffix, and suffix() returns the extension (if one exists).
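Outside Qt, C++17's std::filesystem offers something similar. The sketch below is an alternative to the Qt answer rather than part of it; the helper name removeSuffixFs is hypothetical. path::replace_extension() with no argument drops only the extension of the last path component, and path::stem() gives the file name without directory or extension.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: removes the last extension of the final path component,
// e.g. "Directory/file.txt" -> "Directory/file".
// A '.' inside a directory name ("directory.foo/bar") is left alone, because
// the extension is taken from the file name part only.
std::string removeSuffixFs(const std::string& s) {
    std::filesystem::path p(s);
    p.replace_extension(); // no argument: just remove the current extension
    return p.generic_string();
}
```

For example, removeSuffixFs("archive.tar.gz") yields "archive.tar", and std::filesystem::path("Directory/file.txt").stem() yields "file".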

If you're looking for a solution that gives you everything except the suffix, you should use string::find_last_of.
Your code could look like this:
#include <string>

std::string removesuffix(const std::string& s) {
    size_t suffixbegin = s.find_last_of('.');
    // This handles cases like "directory.foo/bar", where the last '.' belongs to a directory name
    size_t dir = s.find_last_of('/');
    if (dir != std::string::npos && dir > suffixbegin) return s;
    if (suffixbegin == std::string::npos) return s; // no '.' at all
    return s.substr(0, suffixbegin);
}
If you're looking for a regular expression, use \.[^.]+$.
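You have to escape the first ., otherwise it will match any character, and put a $ at the end so it will only match at the end of the string. In a C++ string literal the backslash itself must be escaped too: "\\.[^.]+$" (or use the raw string R"(\.[^.]+$)").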
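Here is a minimal sketch of that regex answer with std::regex_replace; the function name removeSuffixRegex is hypothetical. Replacing the match with an empty string removes the suffix.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes the last ".xxx" suffix using the pattern \.[^.]+$
// The raw string literal R"(...)" avoids having to double the backslash.
std::string removeSuffixRegex(const std::string& filename) {
    static const std::regex suffix(R"(\.[^.]+$)");
    return std::regex_replace(filename, suffix, "");
}
```

Note that this pattern also treats ".foo/bar" in "directory.foo/bar" as a suffix (yielding "directory"); excluding the path separator, as in \.[^./]+$, leaves such paths untouched.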

Different operating systems allow different characters in file names, so the simplest regex is to accept any character: (.+)\.txt$. The first capture group then holds the file name without the extension.
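A sketch of reading that capture group with std::regex_search and std::smatch follows; the function name stemOfTxt and the choice to return an empty string when there is no match are assumptions for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the part of the name before ".txt",
// or an empty string if the name does not end in ".txt".
std::string stemOfTxt(const std::string& filename) {
    static const std::regex txt(R"((.+)\.txt$)");
    std::smatch m;
    if (std::regex_search(filename, m, txt))
        return m[1].str(); // first capture group: everything before the final ".txt"
    return "";
}
```

Because (.+) is greedy, "a.txt.txt" gives "a.txt": only the final ".txt" is stripped.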

Related

convert std::string to std::regex and escape special regex symbols

I'm trying to create a std::regex(__FILE__) as part of a unit test that checks some exception output which prints the file name.
On Windows it fails with:
regex_error(error_escape): The expression contained an invalid escaped character, or a trailing escape.
because the __FILE__ macro expansion contains un-escaped backslashes.
Is there a more elegant way to escape the backslashes than looping through the resulting string (e.g. with a std algorithm or some std::string function)?
File paths can contain many characters that have special meaning in regular expression patterns. Escaping just the backslashes is not enough for robust checking in the general case.
Even a simple path, like C:\Program Files (x86)\Vendor\Product\app.exe, contains several special characters. If you want to turn that into a regular expression (or part of a regular expression), you would need to escape not only the backslashes but also the parentheses and the period (dot).
Fortunately, we can solve our regular expression problem with more regular expressions:
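#include <regex>
#include <string>
std::string EscapeForRegularExpression(const std::string &s) {
    // Bracket expression matching any single metacharacter, including the backslash itself
    static const std::regex metacharacters(R"([\\\.\^\$\-\+\(\)\[\]\{\}\|\?\*])");
    // "$&" inserts the whole match, so each metacharacter gets a backslash in front of it
    return std::regex_replace(s, metacharacters, "\\$&");
}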
(Windows file paths can't contain * or ?, but I've included them to keep the function general.)
If you don't abide by the "no raw loops" guideline, an implementation that avoids regular expressions is probably faster:
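#include <cstring>
#include <string>

std::string EscapeForRegularExpression(const std::string &s) {
    static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
    std::string out;
    out.reserve(s.size());
    for (auto ch : s) {
        // strchr also "finds" the terminating '\0', so NUL characters are skipped explicitly
        if (ch != '\0' && std::strchr(metacharacters, ch))
            out.push_back('\\');
        out.push_back(ch);
    }
    return out;
}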
Although the loop adds some clutter, this approach allows us to drop a level of escaping on the definition of metacharacters, which is a readability win over the regex version.
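To connect this back to the unit-test scenario in the question, here is a sketch that escapes the example Windows path and then searches for it literally in some output text. It repeats the loop-based escaping (with the NUL guard) so it compiles on its own; containsLiteralPath is a hypothetical helper name.

```cpp
#include <cstring>
#include <regex>
#include <string>

// Loop-based escaping: prefix every regex metacharacter with '\'.
std::string EscapeForRegularExpression(const std::string& s) {
    static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
    std::string out;
    out.reserve(s.size());
    for (char ch : s) {
        // strchr would also match the terminating '\0', so skip NUL explicitly
        if (ch != '\0' && std::strchr(metacharacters, ch))
            out.push_back('\\');
        out.push_back(ch);
    }
    return out;
}

// Hypothetical helper: does `text` contain `path` verbatim?
bool containsLiteralPath(const std::string& text, const std::string& path) {
    const std::regex pattern(EscapeForRegularExpression(path));
    return std::regex_search(text, pattern);
}
```

With the path C:\Program Files (x86)\Vendor\Product\app.exe, the escaped pattern is C:\\Program Files \(x86\)\\Vendor\\Product\\app\.exe, so the dot no longer matches arbitrary characters and the parentheses no longer form a group.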
Here is polymapper.
It takes an operation that takes an element and returns a range (the "map operation").
It produces a function object that takes a container, and applies the "map operation" to each element. It returns the same type as the container, where each element has been expanded/contracted by the "map operation".
template<class Op>
auto polymapper( Op&& op ) {
return [op=std::forward<Op>(op)](auto&& r) {
using std::begin;
using R=std::decay_t<decltype(r)>;
using iterator = decltype( begin(r) );
using T = typename std::iterator_traits<iterator>::value_type;
std::vector<T> data;
for (auto&& e:decltype(r)(r)) {
for (auto&& out:op(e)) {
data.push_back(out);
}
}
return R{ data.begin(), data.end() };
};
}
Here is escape_stuff:
auto escape_stuff = polymapper([](char c)->std::vector<char> {
if (c != '\\') return {c};
else return {c,c};
});
int main() {
std::cout << escape_stuff(std::string(__FILE__)) << "\n";
}
The advantage of this approach is that the action of messing with the guts of the container is factored out. You write code that messes with the characters or elements, and the overall logic is not your problem.
The disadvantage is that polymapper is a bit strange, and it performs needless memory allocations. (Those could be optimized out, but that makes the code more convoluted.)
EDIT
In the end, I switched to Adrian McCarthy's more robust approach.
Here's the inelegant way I solved the problem, in case someone stumbles on this looking for a workaround:
std::string escapeBackslashes(const std::string& s)
{
std::string out;
for (auto c : s)
{
out += c;
if (c == '\\')
out += c;
}
return out;
}
and then
std::regex(escapeBackslashes(__FILE__));
It's O(N), which is probably as good as you can do here, but it involves a lot of string copying, which I'd like to think isn't strictly necessary.

How to remove the first two characters of a QString

How would I remove the first two characters of a QString? Or, to put it in Stack Overflow layman's terms:
QString str = "##Name"; //output: ##Name
to
output: Name
So far I have used this small piece of code:
if(str.contains("##"))
{
str.replace("##","");
}
...but it doesn't work, because some other strings also need to contain "##", just not at the beginning.
The first two characters may also be "%$" instead of "##", and that is mostly why I need to delete the first two characters regardless of what they are.
Any ideas?
This is the syntax to remove the first two characters:
str.remove(0, 2);
You can use the QString::mid function for this:
QString trimmed = str.mid(2);
But if you wish to modify the string in place, you would be better off using QString::remove as others have suggested.
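You can use remove(const QRegExp &rx), which "removes every occurrence of the regular expression rx in the string, and returns a reference to the string." Anchor the pattern with ^ so that only the first two characters are removed, whatever they are:
QString str = "##Name"; //output: ##Name
str.remove(QRegExp("^.."));
//str == "Name"
In newer Qt code, QString::remove(const QRegularExpression &re) works the same way with QRegularExpression.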

How to match "{" using regex in c++

There may be similar questions here on Stack Overflow, but my question is this:
First I tried to match all x characters in the string, so I wrote the following code, and it works well:
string str = line;
regex rx("x");
vector<int> index_matches; // results saved here
for (auto it = std::sregex_iterator(str.begin(), str.end(), rx);
it != std::sregex_iterator();
++it)
{
index_matches.push_back(it->position());
}
Now, to match all { characters, I tried replacing regex rx("x"); with regex rx("{"); and with regex rx("\{");.
I got an exception, and I think it should throw, because { is used in regular expressions to express repetition counts, so the parser expects a matching } later in the pattern and throws when there isn't one.
So, first: is my explanation correct?
Second: I need to match all { using the same code above. Is it possible to change regex rx("{"); to something else?
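You need to escape characters with special meaning in regular expressions, i.e. use the \{ regular expression. But \ also has special meaning in C++ string literals: "\{" is an unknown escape sequence, which compilers typically warn about and treat as a plain "{", so the regex engine still sees an unescaped {. So you also need to escape the backslash for the C++ string literal, i.e. write: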
regex rx("\\{");
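Here is the asker's loop rewritten as a small function that collects every { position; the name bracePositions is hypothetical. It uses a raw string literal, R"(\{)", which passes the two characters \{ to the regex engine without a second backslash; "\\{" is equivalent.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the position of every '{' in str.
std::vector<std::ptrdiff_t> bracePositions(const std::string& str) {
    static const std::regex rx(R"(\{)"); // same pattern as "\\{"
    std::vector<std::ptrdiff_t> index_matches;
    for (auto it = std::sregex_iterator(str.begin(), str.end(), rx);
         it != std::sregex_iterator(); ++it) {
        index_matches.push_back(it->position());
    }
    return index_matches;
}
```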

How to detect ESC in string using Boost regex

I need to determine if a file is PCL encoded, so I am looking at the first line to see if it begins with an ESC character. If you know a better way, feel free to suggest it. Here is my code:
bool pclFlag = false;
if (containStr(jobLine, "^\\e")) {
pclFlag=true;
}
bool containStr(const string& s, const string& re)
{
static const boost::regex e(re);
return regex_match(s, e);
}
pclFlag does not get set to true.
You've declared boost::regex e to be static, which means it will only get initialized the very first time your function is called. If this search is not the first call, it will still be searching for whatever pattern was passed in the first call.
regex_match must match the entire string. Try adding ".*" (dot star) to the end of your regex.
Important
Note that the result is true only if the expression matches the whole of the input sequence. If you want to search for an expression somewhere within the sequence then use regex_search. If you want to match a prefix of the character string then use regex_search with the flag match_continuous set.
http://www.boost.org/doc/libs/1_51_0/libs/regex/doc/html/boost_regex/ref/regex_match.html
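Putting both answers together, here is a sketch of containStr with the two fixes applied: the regex is no longer static, and regex_search is used instead of regex_match. It is written with std::regex as a stand-in for Boost.Regex so it compiles without Boost (boost::regex_search has the same shape). Because std::regex's ECMAScript syntax has no \e escape, the test patterns spell ESC as \x1B.

```cpp
#include <regex>
#include <string>

// Fixed containStr (std::regex stand-in for boost::regex):
//  - the regex is built on every call, so each call uses the pattern it was given;
//  - regex_search accepts a match anywhere, so "^..." checks just the start of s.
bool containStr(const std::string& s, const std::string& re) {
    const std::regex e(re);
    return std::regex_search(s, e);
}
```

A call such as containStr(jobLine, "^\\x1B") then reports whether jobLine starts with ESC.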
Joachim Pileborg is right: if (!jobLine.empty() && jobLine[0] == 0x1B) {} is much easier.
Boost.Regex seems like overkill if all you want to do is see if a string starts with a certain character.
bool pclFlag = jobLine.length() > 0 && jobLine[0] == '\033';
You could also use the Boost string algorithms (or, in C++20, the member function jobLine.starts_with('\033')):
#include <boost/algorithm/string.hpp>
bool pclFlag = boost::algorithm::starts_with(jobLine, "\033");
If you want to see whether a string contains an escape anywhere in it:
bool pclFlag = jobLine.find('\033') != std::string::npos;
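Since the question also asks for a better way to detect PCL, one option is to skip reading a whole line and look at the first byte of the file directly. The sketch below assumes that "begins with ESC (0x1B)" is an adequate test for the asker's files; the function name fileStartsWithEsc is hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` begins with ESC (0x1B),
// the byte that introduces PCL escape sequences. The file is opened in binary mode
// and only the first byte is inspected with peek(); no line splitting or regex is
// needed. Returns false if the file cannot be opened or is empty (peek() yields EOF).
bool fileStartsWithEsc(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return in && in.peek() == 0x1B;
}
```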

string contains valid characters

I am writing a method whose signature is
bool isValidString(std::string value)
Inside this method I want to check whether all the characters in value belong to a set of characters, which is a constant string:
const std::string ValidCharacters("abcd");
To perform this check, I take each character from value and search for it in ValidCharacters; if any search fails, the string is invalid. Is there an alternative method in the STL to do this check?
Use find_first_not_of():
bool isValidString(const std::string& s) {
return std::string::npos == s.find_first_not_of("abcd");
}
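If you prefer to keep the question's "look up each character" structure, std::all_of expresses it without a hand-written loop. This is an alternative sketch, not part of the answer above; the name isValidStringAllOf is hypothetical. Like find_first_not_of, it treats the empty string as valid.

```cpp
#include <algorithm>
#include <string>

// Hypothetical alternative: every character of value must occur in ValidCharacters.
bool isValidStringAllOf(const std::string& value) {
    static const std::string ValidCharacters("abcd");
    // A static local needs no lambda capture.
    return std::all_of(value.begin(), value.end(), [](char c) {
        return ValidCharacters.find(c) != std::string::npos;
    });
}
```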
You can use regular expressions to pattern match. With the Digital Mars runtime library, include regexp.h, which is documented here:
http://www.digitalmars.com/rtl/regexp.html
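regexp.h is specific to the Digital Mars toolchain. As a portable substitute (swapped in here, not what the answer used), C++11's <regex> can express the same check; the name isValidStringRegex is hypothetical.

```cpp
#include <regex>
#include <string>

// [abcd]* matches only strings made entirely of the allowed characters;
// regex_match requires the whole string to match, not just a part of it.
bool isValidStringRegex(const std::string& value) {
    static const std::regex allowed("[abcd]*");
    return std::regex_match(value, allowed);
}
```

For a fixed character set, find_first_not_of is simpler and avoids the cost of building and running a regex.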