Efficient way to truncate string to length N - c++

For example, suppose I have a std::string containing a UNIX-style path to some file:
string path("/first/second/blah/myfile");
Suppose now I want to throw away the file-related information and get the path to the 'blah' folder from this string. Is there an efficient (by 'efficient' I mean 'without any copies') way of truncating this string so that it contains "/first/second/blah" only?
Thanks in advance.

If N is known, you can use
path.erase(N, std::string::npos);
If N is not known and you want to find it, you can use any of the search functions. In this case you'll want to find the last slash, so you can use rfind or find_last_of:
path.erase(path.rfind('/'), std::string::npos);
path.erase(path.find_last_of('/'), std::string::npos);
There's even a variation of this based on iterators:
path.erase (path.begin() + path.rfind('/'), path.end());
That said, if you are going to be manipulating paths for a living it's better to use a library designed for this task, such as Boost Filesystem.
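One caveat with the one-liners above: if the string contains no slash, rfind returns std::string::npos, and erase(npos) throws std::out_of_range. A minimal sketch that guards against this (the helper name strip_last_component is hypothetical, not part of any library):

```cpp
#include <string>

// Hypothetical helper: removes everything from the last '/' onward, in place.
// Leaves the string untouched when it contains no '/' at all.
void strip_last_component(std::string& path)
{
    const std::size_t slash = path.rfind('/');
    if (slash != std::string::npos) {
        path.erase(slash);  // erases from slash to the end, no copy made
    }
}
```

Calling it on "/first/second/blah/myfile" leaves "/first/second/blah" in the same string object.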

While the accepted answer certainly works, the most direct way to throw away the end of a string is to call the resize method (provided N is not larger than the current length, since resize would otherwise grow the string). In your case just:
path.resize(N);

Replace all occurrences based on map

This question might have already been answered but I haven't found it yet.
Let's say I have a std::map<string,string> which contains string pairs of <replace_all_this, to_this>.
I checked Boost's format library, which is close but not perfect:
std::map<std::string, std::string> m;
m["$search1"] = "replace1";
m["$search2"] = "replace2";
boost::format fmter("let's try to %1% and %2%.");
// feed the map values to the formatter in map (key) order
for (const auto& r : m) {
    fmter % r.second;
}
// would print "let's try to replace1 and replace2."
This would work, but I lose control over which value ends up in which placeholder.
Actually I'd like to have this as result:
format fmter1("let's try to $search2 and $search1 even if their order is different in the map.");
...
//print: "let's try to replace2 and replace1 even if their order is different in the map".
Please note: map can contain more items, and items can occur multiple times in the formatter.
What is the way to go for this in 2020? I'd like it to be efficient and fast, so I'd like to avoid iterating over the map multiple times.
There may be new libraries but there is no new algorithm to do that faster than what we have so far.
Assuming your format uses $<name> for your variables, you can search for the next '$', read the <name>, look it up in the map, and then do the replacement. This lets you either skip over the replaced text or process it as well (i.e. make it recursive, so that a variable can reference another).
I don't think that doing it the other way around would be any faster: going through the map and searching the string for each name means scanning the string many times, and if you have many variables, that is a huge waste when most of them are unlikely to be in your string. It also makes preventing some level of recursion very complicated.
Where you can optimize is in calculating the size of the resulting string up front and allocating that buffer once, instead of relying on +=, which is likely to be slower.
I have such an implementation in my snaplogger. The variable has to be between brackets and it can include multiple parameters to further tweak the data. There is documentation here about what is supported by that function and, as written, you can easily extend the class to add more features. It's probably not a one-to-one match for what you're looking for, but it shows that there are not twenty ways of implementing that function.
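The single-pass approach described in this answer could look roughly like the sketch below. It is not the snaplogger implementation. The function name expand_vars is hypothetical, and it assumes that variable names consist of letters, digits and underscores and that the map keys include the leading '$', as in the question.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: scans the input once and replaces every $name that
// appears as a key in vars. Unknown names and lone '$' are copied unchanged.
std::string expand_vars(const std::string& input,
                        const std::map<std::string, std::string>& vars)
{
    std::string out;
    out.reserve(input.size());  // rough guess to limit reallocations

    std::size_t i = 0;
    while (i < input.size()) {
        if (input[i] != '$') {
            out += input[i++];
            continue;
        }
        // Find the end of the name that follows the '$'.
        std::size_t j = i + 1;
        while (j < input.size() &&
               (std::isalnum(static_cast<unsigned char>(input[j])) || input[j] == '_')) {
            ++j;
        }
        // The key includes the '$' (assumption taken from the question).
        const auto it = vars.find(input.substr(i, j - i));
        if (it != vars.end()) {
            out += it->second;            // replacement is not rescanned
        } else {
            out.append(input, i, j - i);  // keep unknown text as-is
        }
        i = j;
    }
    return out;
}
```

Each '$' costs one map lookup, so the string is walked once and the map is never iterated. Because the replacement text is appended without being rescanned, this version is non-recursive.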

Efficient way to check if string contains value of vector<string>?

I'm pretty new to C++ programming but for certain reasons I need to develop a small tool in C++. I've written the same tool in C# already. Right now I'm trying to check if my string contains a value that is stored in a std::vector. In C# this is pretty straightforward, simply using something like this:
if(string.Contains(myarray)) { // do sth. }
In C++ this seems way harder to achieve. I googled quite a bit but so far I found only solutions to check if the WHOLE string exists in an array and not just a certain part of it.
Unfortunately, std::string does not have a method that checks whether an element of a vector is in a string, as C# does. What it does have, though, is std::string::find, which determines whether a given string is contained within the string you call find on. You could use that like this:
std::vector<std::string> words;
// fill words
std::string search_me = "some text";
for (const auto & e : words)
{
    if (search_me.find(e) != std::string::npos)
    {
        // e is contained in search_me. do something here
        // call break here if you only want to check for one existence
    }
}
This is O(N * complexity_of_find), where N is the number of strings in the vector.
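If only a yes/no answer is needed, the same loop can be written with std::any_of. This is a small sketch; the function name contains_any is made up for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if at least one element of words occurs
// somewhere inside text. Note that an empty word always counts as found,
// because std::string::find("") returns 0.
bool contains_any(const std::string& text, const std::vector<std::string>& words)
{
    return std::any_of(words.begin(), words.end(),
                       [&text](const std::string& w) {
                           return text.find(w) != std::string::npos;
                       });
}
```

std::any_of stops at the first hit, which corresponds to calling break in the explicit loop.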
Use a for loop and the find method.
I would suggest std::find_first_of
Not sure if I understood your exact problem, though. Could you give a small example of what you are trying to find in what?
If you need a more efficient way to find several substrings in a string than a straightforward string-by-string find, you can use the Aho-Corasick algorithm.
It uses a trie to hold the substrings. First google link to c++ implementation

Parse std::string for a selection of characters

Is there an easy way to parse a std::string in search of a list of certain characters? For example, let's say the user enters this<\is a.>te!st string. I'd like to be able to spot that those non-letter characters are there and do something about it. I'm looking for a general-purpose solution that allows me to simply specify a list of chars so I can reuse the function in different situations. I'm guessing regular expressions will play a key role in any solution, and obviously the more compact and efficient, the better.
You could use std::string::find_first_not_of() for this. It'll find the characters except those in the set that you give it. Its counterpart, find_first_of(), will search for characters that are in the set.
Both functions allow you to specify the starting index. This enables you to continue the search from where you left off.
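A short sketch of that "continue from where you left off" loop, using find_first_of with a caller-supplied character list (the function name find_positions is hypothetical):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every character in text
// that is a member of the set given in chars.
std::vector<std::size_t> find_positions(const std::string& text,
                                        const std::string& chars)
{
    std::vector<std::size_t> positions;
    std::size_t pos = text.find_first_of(chars);
    while (pos != std::string::npos) {
        positions.push_back(pos);
        // Resume the search one past the previous hit.
        pos = text.find_first_of(chars, pos + 1);
    }
    return positions;
}
```

Swapping find_first_of for find_first_not_of turns the list into a whitelist, for example passing only the letters and the space to locate everything else.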
How about using a regex library like boost::regex?
This should exactly do what you are looking for.
If your compiler supports C++11 you can use std::regex.
Regex seems like overkill. You can use std::string's methods: find_first_of() and/or find_last_of(). Here you can find documentation and examples.

C++ - Splitting Filename and File Extension

Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split() function:
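#include <sstream>
#include <string>
#include <vector>

// Splits input on delim and appends each piece to tokens.
void split(const std::string &input, std::vector<std::string> &tokens, char delim) {
    std::string ea;
    std::stringstream stream(input);
    while (std::getline(stream, ea, delim))
        tokens.push_back(ea);
}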
I do this on filenames. But there's a problem. There are files that have extensions like tar.gz, tar.bz2, etc. There are also some filenames that have extra dots, such as Some.file.name.tar.gz. I wish to separate Some.file.name and tar.gz. Note: the number of dots in a filename isn't constant.
I also tried PathFindExtension but no luck. Is this possible? If so, please enlighten me. Thank you.
Edit: I'm very sorry about not specifying the OS. It's Windows.
I think you could use std::string find_last_of to get the index of the last ., and substr to cut the string (although the "complex extensions" involving multiple dots will require additional work).
There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
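A rough sketch of that "known extensions first, last dot as fallback" idea follows. The function name split_extension and the default list of compound extensions are illustrative assumptions, not a complete database.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns {base name, extension without the leading dot}.
// Compound extensions listed in known win over the plain "last dot" rule.
std::pair<std::string, std::string> split_extension(
    const std::string& filename,
    const std::vector<std::string>& known = {".tar.gz", ".tar.bz2", ".tar.xz"})
{
    for (const auto& ext : known) {
        if (filename.size() > ext.size() &&
            filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0) {
            return {filename.substr(0, filename.size() - ext.size()), ext.substr(1)};
        }
    }
    // Fallback: split at the last dot. A name without a dot, or one that only
    // starts with a dot, is treated as having no extension.
    const std::size_t dot = filename.rfind('.');
    if (dot == std::string::npos || dot == 0) {
        return {filename, ""};
    }
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```

With the default list, Some.file.name.tar.gz splits into Some.file.name and tar.gz, while a name such as photo.jpeg falls through to the last-dot rule.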
There's nothing for this in the C++ standard library before C++17 (since C++17, std::filesystem::path provides stem() and extension(), which split at the last dot only), but every operating system I know of provides this functionality in a variety of ways.
In Windows you can use _splitpath(), and in Linux you can use dirname() & basename()
The problem is indeed filenames like *.tar.gz, which can not be split consistently, due to the fact that (at least in Windows) the .tar part isn't part of the extension. You'll either have to keep a list for these special cases and use a one-dot string::rfind for the rest or find some pre-implemented way. Note that the .tar.* extensions aren't infinite, and very much standardized (there's about ten of them I think).
You could create a look-up table of file extensions that you think you might encounter, and also add a command line option to add a new one to the look-up table if you encounter anything new. Then parse through the file name to see if any entry in the look-up table is a sub-string of the file name.
EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?

C++, Boost regex, replace value function of matched value?

Specifically, I have an array of strings called val, and want to replace all instances of "%{n}%" in the input with val[n]. More generally, I want the replace value to be a function of the match value. This is in C++, so I went with Boost, but if another common regex library matches my needs better let me know.
I found some .NET (C#, VB.NET) solutions, but I don't know if I can use the same approach here (or, if I can, how to do so).
I know there is this ugly solution: have an expression of the form "(%{0}%)|(%{1}%)..." and then have a replace pattern like "(?1" + val[0] + ")(?2" + val[1] ... + ")".
But I'd like to know if what I'm trying to do can be done more elegantly.
Thanks!
I don't believe boost::regex has an easy way to do this. The most straightforward way that I can think of would be to do a regex_search using a pattern along the lines of "(%{[0-9]+}%)" and then iterate over the sub-matches in the returned match_results object. Since regex_search reports one match at a time, you would repeat the search from the end of the previous match (or use a regex_iterator) to reach every occurrence. You'll need to build a new string by concatenating the text from between each match (the match_results::position method will be your friend here) with the result of converting sub-matches to the values from your val array.
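Here is a sketch of that build-a-new-string loop. To keep it self-contained it uses std::regex from C++11 rather than boost::regex. The function name substitute is hypothetical, the braces in the pattern are escaped, and the digits are captured in group 1. The same structure should carry over to boost::sregex_iterator, though that is untested here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: replaces every %{n}% in input with val[n].
// Placeholders whose index is out of range are copied through unchanged.
// Assumes the index fits in unsigned long (std::stoul throws otherwise).
std::string substitute(const std::string& input, const std::vector<std::string>& val)
{
    static const std::regex pattern(R"(%\{([0-9]+)\}%)");

    std::string out;
    std::size_t last = 0;  // end of the previous match
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(input, last, pos - last);  // text between matches
        const std::size_t n = std::stoul(m[1].str());
        if (n < val.size()) {
            out += val[n];
        } else {
            out += m.str(0);
        }
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(input, last, std::string::npos);  // tail after the last match
    return out;
}
```

Because the replacement is computed inside the loop body, any function of the match can be used there, not just an array lookup.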