Xml Parser - string::find - c++

I am trying to parse a string which contains a line of my XML file.
std::string temp = "<Album>Underclass Hero</Album>";
int f = temp.find(">");
int l = temp.find("</");
std::string _line = temp.substr(f + 1, l-2);
This is part of a function that should return the parsed string. I expected it to return Underclass Hero. Instead I got Underclass Hero</Alb.
I looked up std::string::find several times, and it always says that it returns the position of the first character of the first match, if one exists. Here it seems to give me the last character of the string, but only in my variable l.
f does fine.
So can anyone tell me what I'm doing wrong?

The second argument of substr is the length of the substring you want to extract, not the end position. You can fix your code this way:
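#include <string>
#include <iostream>
int main()
{
    std::string temp = "<Album>Underclass Hero</Album>";
    // find returns an unsigned size_type, not an int.
    std::string::size_type f = temp.find(">");
    std::string::size_type l = temp.find("</");
    std::string line = temp.substr(f + 1, l - f - 1);
    //                                    ^^^^^^^^^
    std::cout << line << '\n'; // prints: Underclass Hero
}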
Also, be careful with names such as _line. Per paragraph 17.6.4.3.2/1 of the C++11 Standard:
[...] Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.
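The same fix can be wrapped in a small function that also checks for std::string::npos. This is only a sketch: the helper name text_between_tags is made up for illustration and is not from the question, and it still does plain string slicing rather than XML parsing.

```cpp
#include <string>

// Hypothetical helper (not from the question): returns the text between the
// first '>' and the first "</" that follows it, or an empty string if either
// marker is missing.
std::string text_between_tags(const std::string& line)
{
    const std::string::size_type open_end = line.find('>');
    if (open_end == std::string::npos)
        return "";

    const std::string::size_type close_begin = line.find("</", open_end + 1);
    if (close_begin == std::string::npos)
        return "";

    // substr takes (start position, length), not (start, end).
    return line.substr(open_end + 1, close_begin - open_end - 1);
}
```

With the line from the question, text_between_tags("<Album>Underclass Hero</Album>") returns Underclass Hero, and a line without the expected markers yields an empty string instead of an out-of-range length.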

substr takes the length as the second parameter, not the end position. Try:
temp.substr(f + 1, l-f-1);
Also, please consider using a real XML parser rather than parsing the file by hand or by other unsuitable means.

Don't do it this way!
"Parsing" XML files line by line like this will fail sooner or later. For example, the following is valid XML, but your code will fail on it:
<Album>Underclass Hero<!-- What about </ this --></Album>
P.S.: Please use const where possible:
std::string const temp = ...
// ...
std::string const line = ...
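To see the failure on the commented line above, here is a rough sketch that applies the same find/substr logic to it. The function name naive_extract is an assumption made for this illustration.

```cpp
#include <string>

// Hypothetical wrapper around the find/substr approach from the question.
// It looks for the first '>' and the first "</" anywhere in the line.
std::string naive_extract(const std::string& line)
{
    const std::string::size_type f = line.find(">");
    const std::string::size_type l = line.find("</");
    if (f == std::string::npos || l == std::string::npos || l < f)
        return "";
    return line.substr(f + 1, l - f - 1);
}
```

For the plain <Album> line this returns Underclass Hero, but on the line with the comment the first "</" sits inside the comment, so the result begins with Underclass Hero<!-- and is cut off in the middle of the comment.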


Why does std::views::split() compile but not split with an unnamed string literal as a pattern?

When std::views::split() gets an unnamed string literal as a pattern, it does not split the string, but it works just fine with an unnamed character literal.
#include <iomanip>
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>
int main(void)
{
    using namespace std::literals;
    // returns the original string (not splitted)
    auto splittedWords1 = std::views::split("one:.:two:.:three", ":.:");
    for (const auto word : splittedWords1)
        std::cout << std::quoted(std::string_view(word));
    std::cout << std::endl;
    // returns the splitted string
    auto splittedWords2 = std::views::split("one:.:two:.:three", ":.:"sv);
    for (const auto word : splittedWords2)
        std::cout << std::quoted(std::string_view(word));
    std::cout << std::endl;
    // returns the splitted string
    auto splittedWords3 = std::views::split("one:two:three", ':');
    for (const auto word : splittedWords3)
        std::cout << std::quoted(std::string_view(word));
    std::cout << std::endl;
    // returns the original string (not splitted)
    auto splittedWords4 = std::views::split("one:two:three", ":");
    for (const auto word : splittedWords4)
        std::cout << std::quoted(std::string_view(word));
    std::cout << std::endl;
    return 0;
}
See it live at godbolt.org.
I understand that string literals are always lvalues, but I am still missing some important piece of information that connects everything together. Why can I pass the string that I want to split as an unnamed string literal, whereas it fails (as in: returns a range of ranges containing the original string) when I do the same with the pattern?
String literals always end with a null terminator, so ":.:" is actually a range whose last element is \0 and whose size is 4.
Since the original string does not contain such a pattern, it is not split.
When dealing with C++20 ranges, I strongly recommend using string_view instead of raw string literals, which works well with <ranges> and can avoid the error-prone null-terminator issue.
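The size difference described above can be checked at compile time. A small sketch, assuming C++17 or later for std::size and the sv literal suffix; the names literal_size and view_size are made up for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>

using namespace std::literals;

// A string literal is an array of char that includes the terminating '\0',
// so as a range it has four elements: ':', '.', ':', '\0'.
constexpr std::size_t literal_size = std::size(":.:");

// A string_view made with the sv suffix covers only the three visible characters.
constexpr std::size_t view_size = ":.:"sv.size();

static_assert(literal_size == 4, "the literal includes the null terminator");
static_assert(view_size == 3, "the string_view does not");
```

This is why the four-element pattern never matches inside the text, while the three-element string_view pattern does.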
This answer is completely correct, I'd just like to add a couple additional notes that might be interesting.
First, if you use {fmt} for printing, it's a lot easier to see what's going on, since you also don't have to write your own loop. You can just write this:
fmt::print("{}\n", std::views::split("one:.:two:.:three", ":.:"));
Which will output (this is the default output for a range of range of char):
[[o, n, e, :, ., :, t, w, o, :, ., :, t, h, r, e, e, ]]
In C++23, there will be a way to directly specify that this print as a range of strings, but that hasn't been added to {fmt} yet. In the meantime, because split preserves the initial range category, you can add:
auto to_string_views = std::views::transform([](auto sr){
return std::string_view(sr.data(), sr.size());
});
And then:
fmt::print("{}\n", std::views::split("one:.:two:.:three", ":.:") | to_string_views);
prints:
["one:.:two:.:three\x00"]
Note the visibly trailing zero. Likewise, the next three attempts format as:
["one", "two", "three\x00"]
["one", "two", "three\x00"]
["one:two:three\x00"]
The fact that we can clearly see the \x00 helps track down the issue.
Next, consider the difference between:
std::views::split("one:.:two:.:three", ":.:")
and
"one:.:two:.:three" | std::views::split(":.:")
We typically consider these to be equivalent, but they're... not entirely. In the latter case, the library has to capture and stash these values - which involves decaying them. In this case, because ":.:" decays into char const*, that's no longer a valid pattern for the incoming string literal. So the above doesn't actually compile.
Now, it'd be great if it both compiled and also worked correctly. Unfortunately, it's impossible to tell in the language between a string literal (where you don't want to include the null terminator) and an array of char (where you want to include the whole array). So at least, with this latter formulation, you can get the wrong thing to not compile. And at least - "doesn't compile" is better than "compiles and does something wildly different from what I expected"?

Replace single backslash with double in a string c++

I am trying to replace one backslash with two. To do that I tried using the following code
str = "d:\test\text.txt"
str.replace("\\","\\\\");
The code does not work. The whole idea is to pass str to the deletefile function, which requires double backslashes.
Since C++11, you may try using regex:
#include <regex>
#include <string>
#include <iostream>
int main() {
    auto s = std::string(R"(\tmp\)");
    // The pattern matches one backslash; the replacement text is two backslashes.
    s = std::regex_replace(s, std::regex(R"(\\)"), R"(\\)");
    std::cout << s << std::endl;
}
A bit overkill, but it does the trick if you want a "quick" solution.
There are two errors in your code.
First line: you forgot to double the \ in the literal string.
It happens that \t is a valid escape representing the tab character, so you get no compiler error, but your string doesn't contain what you expect.
Second line: according to the reference of string::replace,
you can replace a substring by another substring based on the substring position.
However, there is no version that performs a substitution, i.e. replaces all occurrences of a given substring with another one.
This doesn't exist in the standard library. It exists for example in the boost library, see boost string algorithms. The algorithm you are looking for is called replace_all.
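If pulling in Boost is not an option, the same effect can be had with a short loop over std::string::find and std::string::replace. This is a minimal sketch, not Boost's implementation; the function name replace_all is borrowed from the Boost algorithm only for familiarity.

```cpp
#include <string>

// Hypothetical helper: returns a copy of s in which every occurrence of
// `from` has been replaced by `to`.
std::string replace_all(std::string s, const std::string& from, const std::string& to)
{
    if (from.empty())
        return s;  // nothing sensible to search for

    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // skip the inserted text so it is not scanned again
    }
    return s;
}
```

For the path in the question, written as a raw string so the backslashes survive, replace_all(R"(d:\test\text.txt)", "\\", R"(\\)") gives d:\\test\\text.txt. Advancing pos past the inserted text matters here, because the replacement itself contains the search string.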

Need to access an element from part of a string array

Suppose I have this code:
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string disectedString[5];
    disectedString[0] = "011001";
    string temp = disectedString[0];
    string print = temp[0];
    return 0;
}
So I'm selecting an element out of my array of strings, and then assigning it to a temp variable. From there, I want to select the first element out of the temp variable (the first character). How would I go about doing this?
Your intuition is mostly valid: You use the square brackets operator, [], to access the element at an indexed position within a collection or sequence. Thus
disectedString[0] means "the first element of disectedString";
temp[0] means "the first element of temp";
What you've gotten mixed up are the types, as commenters and @demogorgon.net's answer have explained.
Now, with modern C++ you can "play dumb" and not declare what you know the types to be:
std::string disectedString[5];
disectedString[0] = "011001";
auto temp = disectedString[0];
auto print = temp[0];
Note the use of auto instead of a specific type name. This will work as you would like it to. You can then use print and do, for example:
std::cout << print;
and this will output 0.
By the way, I believe you should reconsider your choice of names:
Intuitively, print should refer to a function, or a method, which prints things; I'd suggest first_character or char_to_print or just c if you want to be brief.
temp is no more a temporary variable than, say, print.
It's better to avoid variable names which contain the type name, although we sometimes sort of have to resort to that. Specifically, you are using the word 'string' in variable names, which is probably not a good idea.
Your disectedString variable is not a string, it's an array of strings, which is confusing.
A string behaves in many ways like an array of chars (*). You need to declare print as char instead of string, since you are extracting a single element from the string. So your declaration of print should look like this:
char print = temp[0];
(*) but it's really more complicated than that.
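If a one-character std::string is actually what is wanted rather than a char, it can be built explicitly with the (count, character) constructor. A short sketch of both options; the function names are made up for illustration.

```cpp
#include <string>

// Returns the first character of s as a char (assumes s is not empty).
char first_char(const std::string& s)
{
    return s[0];
}

// Returns the first character of s as a one-character std::string,
// or an empty string if s is empty.
std::string first_char_as_string(const std::string& s)
{
    return s.empty() ? std::string() : std::string(1, s[0]);
}
```

With the string from the question, first_char("011001") is the char '0' and first_char_as_string("011001") is the string "0".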

How can I match the \0 character in a regex in C++?

I need to match the text '\0' with the same regex that I would match 'a' or 'b'. (a regex for a character constant in C++). I've tried a bunch of different regexes, but haven't gotten a successful one yet. My latest attempt:
^['].|\\0[']
Most of the other things I've tried have given seg faults, so this is really the closest I've gotten.
This works pretty nicely with what I've tested ('a','b','\0').
If you don't have std::regex or boost::regex I guess what you can get out of it is the fact that the regex I used is ('.'|'\\0').
#include <boost/regex.hpp>
#include <string>
#include <iostream>
#include <vector>
int main() {
    std::vector<std::string> strings;
    strings.push_back(R"('a')");
    strings.push_back(R"('b')");
    strings.push_back(R"('\0')");
    boost::regex rgx(R"(('.'|'\\0'))");
    boost::smatch match;
    for(auto& i : strings) {
        if(boost::regex_match(i,match, rgx)) {
            boost::ssub_match submatch = match[1];
            std::cout << submatch.str() << '\n';
        }
    }
}
There's nothing magic about '\0'; it's just a character, like any other character, and there's (almost) nothing special you have to do to use it in a regular expression. The only problem you might run into is if you use it in the middle of a string literal that you pass to a function that treats it as the end of the string. To avoid that, force it into a std::string:
const char s[] = "a\0b";
std::string not_my_str(s); // not_my_str holds "a"
std::string str(s, 3); // str holds "a\0b"
Once you've constructed the string object, the embedded '\0' gets no special treatment. Except, of course, if you copy the contents with a function that treats it specially.
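One way to avoid counting characters by hand is to let the compiler deduce the array size of the literal. This is a sketch under that assumption; the helper name from_literal is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string from a string literal, keeping any
// embedded '\0' characters. N is the array size including the final
// terminator, so N - 1 characters are copied.
template <std::size_t N>
std::string from_literal(const char (&literal)[N])
{
    return std::string(literal, N - 1);
}
```

Here from_literal("a\0b") has size 3 and find('\0') reports position 1, whereas std::string("a\0b") stops at the embedded terminator and has size 1.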
The regex that works (in this instance, using the C header ) is:
^('(.|([\\]0))')
Thanks to @WhozCraig for the help!
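For comparison, a similar check can be written with std::regex from the standard library. This is only a sketch: the pattern below is a simplified variant chosen for illustration, not the exact expression from the posts above, and the function name is_char_constant is made up.

```cpp
#include <regex>
#include <string>

// Returns true if text is a one-character constant such as 'a', or the
// four-character text '\0' (quote, backslash, zero, quote).
bool is_char_constant(const std::string& text)
{
    // \\0 in the pattern means a literal backslash followed by the digit 0.
    static const std::regex pattern(R"('(\\0|.)')");
    return std::regex_match(text, pattern);
}
```

Because regex_match requires the whole string to match, 'a' and '\0' are accepted while something like 'ab' is rejected. Other escape sequences such as '\n' are not covered by this pattern.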

Regular Expression for removing suffix

What is the regular expression for removing the suffix of file names? For example, if I have a file name in a string such as "vnb.txt", what is the regular expression to remove ".txt"?
Thanks.
Do you really need a regular expression to do this? Why not just look for the last period in the string, and trim the string up to that point? Frankly, there's a lot of overhead for a regular expression, and I don't think you need it in this case.
As suggested by tstenner, you can try one of the following, depending on what kinds of strings you're using:
std::strrchr
std::string::find_last_of
First example:
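// A string literal must be bound to const char* in C++11 and later.
const char* str = "Directory/file.txt";
size_t index = 0;
const char* pStr = strrchr(str,'.');
if(nullptr != pStr)
{
    index = pStr - str; // offset of the last '.'
}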
Second example:
size_t index = string("Directory/file.txt").find_last_of('.');
If you are using Qt already, you could use QFileInfo, and use the baseName() function to get just the name (if one exists), or the suffix() function to get the extension (if one exists).
If you're looking for a solution that will give you anything except for the suffix, you should use string::find_last_of.
Your code could look like this:
const std::string removesuffix(const std::string& s) {
    size_t suffixbegin = s.find_last_of('.');
    //This will handle cases like "directory.foo/bar"
    size_t dir = s.find_last_of('/');
    if(dir != std::string::npos && dir > suffixbegin) return s;
    if(suffixbegin == std::string::npos) return s;
    else return s.substr(0,suffixbegin);
}
If you're looking for a regular expression, use \.[^.]+$.
You have to escape the first ., otherwise it will match any character, and put a $ at the end, so it will only match at the end of a string.
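Applied with std::regex, that pattern could be used roughly as follows. A minimal sketch; the function name strip_suffix is an assumption made for this example.

```cpp
#include <regex>
#include <string>

// Removes the last ".something" at the end of name, if there is one.
std::string strip_suffix(const std::string& name)
{
    static const std::regex suffix(R"(\.[^.]+$)");
    return std::regex_replace(name, suffix, "");
}
```

So strip_suffix("vnb.txt") gives vnb, and a name without a dot is returned unchanged. Note that the pattern knows nothing about directories, so a path like directory.foo/bar would lose everything after the dot.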
Different operating systems may allow different characters in filenames, so the simplest regex might be (.+)\.txt$. Use the first capture group to get the filename without the extension.