Why is string.find_first_of behaving this way? - c++

I am trying to write an (assembly) parser that uses a guide string to decide where to cut the text into the tokens I want.
string s = "$t4,";
string guide = "$!,$!,$!";
int i = 1;
string test =s.substr(0, s.find_first_of(" ,.\t"+to_string(guide[i+1]) ));
cout << test << "\n";
With s = "$t4," I get test = "$t".
What I expect is for test to be "$t4". This works for every other $tX; it fails only for the digit 4, even though I never put a 4 into the (" ,.\t"+to_string(guide[i+1])) string.

s.find_first_of(" ,.\t" + std::to_string(guide[i + 1]))
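std::to_string has no overload for char, so guide[i + 1] is promoted to int and formatted as a number. Assuming ASCII, the delimiter string will be:
" ,.\t44"
44 is the ASCII value of the ',' in guide[i + 1], so the delimiter set now contains the digit 4.
The first delimiter that find_first_of finds in "$t4," is therefore the 4 at position 2, and substr(0, 2) returns "$t".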
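A minimal sketch of one possible fix, assuming the intent was to add the guide character itself to the delimiter set. The helper name first_token is hypothetical and not part of the question's code:

```cpp
#include <string>

// Hypothetical helper: cut `s` at the first delimiter, where the delimiter set
// is " ,.\t" plus one extra character taken from the guide string.
std::string first_token(const std::string& s, char extra) {
    // std::string(1, extra) builds a one-character string from the char itself,
    // whereas std::to_string(extra) would format its integer code ("44" for ',').
    const std::string delims = " ,.\t" + std::string(1, extra);
    // If no delimiter is found, find_first_of returns npos and substr then
    // returns the whole string.
    return s.substr(0, s.find_first_of(delims));
}
```

Called as first_token(s, guide[i + 1]) with s = "$t4,", this cuts at the comma rather than at the digit 4.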

Related

How to name regex group matches in C++ the way python does (?P<name_of_regex>(.*))

I have a string in my program that contains certain values for parameters. I need to extract the values from the parameters using regex.
The regex looks like this:
std::smatch param;
std::string str = "--name=AName --age=AnAge --gender=AGender";
if (std::regex_match(str, param, std::regex(".*--name=(\\w+) .*--age=(\\d+) .*--gender=(\\w+) .*")))
{
    // If the parameters appear in exactly this order, execution reaches this
    // point and the values are stored in param[1] to param[3].
}
The problem is the order of the params can come in different orders, for example:
std::string str = "--gender=AGender --name=AName --age=AnAge";
std::string str = "--age=AnAge --gender=AGender --name=AName";
std::string str = "--name=AName --gender=AGender --age=AnAge ";
Is there a way to write a single regular expression that captures the values regardless of their order, instead of using one regex per parameter? If so, how can I access each value? In Python it is possible to put a name (?P<id>...) on a group and later access the group by that identifier. In my example code I use the smatch variable instead, but the index I have to use depends on the order of the parameters in the string, and I cannot rely on that order.
Use this regex:
"^(?=.*--name=(\\w+))(?=.*--age=(\\d+))(?=.*--gender=(\\w+)).+"
The one problem you'll run into is the fact that params won't be able to determine which item belongs to which parameter.
The way I would solve this problem would be to use std::string::find.
For example:
std::string str = "--name=AName --age=AnAge --gender=AGender";
size_t namePos = str.find("--name=");
size_t agePos = str.find("--age=");
size_t genderPos = str.find("--gender=");
std::string name = "";
std::string gender = "";
std::string age = "";
if(namePos != std::string::npos)
{
    // Add 7 to namePos since the size of "--name=" is 7.
    // Assuming that the delimiter of the name is whitespace, find the first
    // whitespace after --name=
    name = str.substr(namePos + 7, str.find_first_of(" \n\r", namePos + 7) - (namePos + 7));
}
if(agePos != std::string::npos)
{
    // Add 6 to agePos since the size of "--age=" is 6.
    // Assuming that the delimiter of the age is whitespace, find the first
    // whitespace after --age=
    age = str.substr(agePos + 6, str.find_first_of(" \n\r", agePos + 6) - (agePos + 6));
}
if(genderPos != std::string::npos)
{
    // Add 9 to genderPos since the size of "--gender=" is 9.
    // Assuming that the delimiter of the gender is whitespace, find the first
    // whitespace after --gender=
    gender = str.substr(genderPos + 9, str.find_first_of(" \n\r", genderPos + 9) - (genderPos + 9));
}
// Print after all three lookups so the output does not depend on --gender= being present.
std::cout << name << " " << gender << " " << age << std::endl;
Output:
AName AGender AnAge
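The three near-identical blocks above could be folded into one function. This is only a sketch under the same assumption that values end at the first whitespace; value_of is a hypothetical name, and it uses key.size() instead of the hard-coded 7, 6 and 9:

```cpp
#include <string>

// Hypothetical helper: return the text that follows `key` up to the next
// whitespace character, or an empty string if `key` does not occur in `str`.
std::string value_of(const std::string& str, const std::string& key) {
    const std::size_t keyPos = str.find(key);
    if (keyPos == std::string::npos) {
        return "";
    }
    const std::size_t start = keyPos + key.size();
    const std::size_t end = str.find_first_of(" \n\r", start);
    // When no whitespace follows, take everything up to the end of the string.
    return str.substr(start, end == std::string::npos ? std::string::npos : end - start);
}
```

With this, value_of(str, "--name=") yields the name no matter where the parameter sits in the string.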
There are better tools for parsing command lines, but if you really want to use regex, you will find that Boost.Regex makes this much easier than std::regex.
In particular, it supports named groups (see e.g. Boost Regular Expression: Getting the Named Group), which is the feature you request in your question title.
You can combine that with BOOST_REGEX_MATCH_EXTRA to keep all matches for all named groups (by default, only the last match for each capture group is accessible after the search).
Then you can just make a big disjunction ((?<group1>...)|(?<group2>...)|...) in your regex for all the groups you may encounter, and you will be able to get all values out regardless of their order.
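The disjunction idea can also be approximated with plain std::regex, which has no named groups. The sketch below is not the Boost approach described above: it swaps named groups for a captured key and looks values up by that key in a std::map. The function name parse_params and the choice of \S+ for the value are assumptions:

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collect every --key=value pair for the known keys.
// Group 1 captures the key name and group 2 the value, so the order of the
// parameters in the input does not matter.
std::map<std::string, std::string> parse_params(const std::string& str) {
    static const std::regex re("--(name|age|gender)=(\\S+)");
    std::map<std::string, std::string> values;
    for (std::sregex_iterator it(str.begin(), str.end(), re), end; it != end; ++it) {
        values[(*it)[1].str()] = (*it)[2].str();
    }
    return values;
}
```

After auto values = parse_params(str), the name is available as values["name"], which plays the role that a named group would play in Python.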

regex repeated capturing group captures the last iteration but I need all

Example code:
var reStr = `"(?:\\"|[^"])*"`
var reStrSum = regexp.MustCompile(`(?m)(` + reStr + `)\s*\+\s*(` + reStr + `)\s*\+\s*(` + reStr + `)`)
var str = `"This\nis\ta\\string" +
"Another\"string" +
"Third string"
`
for i, match := range reStrSum.FindAllStringSubmatch(str, -1) {
fmt.Println(match, "found at index", i)
for i, str := range match {
fmt.Println(i, str)
}
}
Output:
["This\nis\ta\\string" +
"Another\"string" +
"Third string" "This\nis\ta\\string" "Another\"string" "Third string"] found at index 0
0 "This\nis\ta\\string" +
"Another\"string" +
"Third string"
1 "This\nis\ta\\string"
2 "Another\"string"
3 "Third string"
That is, it matches the "sum of strings" and it captures all three strings correctly.
My problem is that I do not want to match the sum of exactly three strings. I want to match every "sum of strings", where a sum can consist of one or more string literals. I have tried to express this with {0,}:
var reStr = `"(?:\\"|[^"])*"`
var reStrSum = regexp.MustCompile(`(?m)(` + reStr + `)` + `(?:\s*\+\s*(` + reStr + `)){0,}`)
var str = `
test1("This\nis\ta\\string" +
"Another\"string" +
"Third string summed");
test2("Second string " + "sum");
`
for i, match := range reStrSum.FindAllStringSubmatch(str, -1) {
fmt.Println(match, "found at index", i)
for i, str := range match {
fmt.Println(i, str)
}
}
Then I get this result:
["This\nis\ta\\string" +
"Another\"string" +
"Third string summed" "This\nis\ta\\string" "Third string summed"] found at index 0
0 "This\nis\ta\\string" +
"Another\"string" +
"Third string summed"
1 "This\nis\ta\\string"
2 "Third string summed"
["Second string " + "sum" "Second string " "sum"] found at index 1
0 "Second string " + "sum"
1 "Second string "
2 "sum"
Group 0 of the first match contains all three strings (the regexp matches correctly), but there are only two capturing groups in the expression, and the second group only contains the last iteration of the repetition. In other words, "Another\"string" is lost in the process and cannot be accessed.
Would it be possible to get all iterations of the repetition inside group 2 somehow?
I would also accept any workaround that uses nested loops. But please be aware that I cannot simply replace the {0,} repetition with an outer FindAllStringSubmatch call, because the FindAllStringSubmatch call is already used for iterating over "sums of strings". In other words, I must find the first string sum and also the "Second string sum".
I just found a workaround that will work. I can do two passes. In the first pass, I just match all string literals, and replace them with unique placeholders in the original text. Then the transformed text won't contain any strings, and it becomes much easier to do further processing on it in a second pass.
Something like this:
type javaString struct {
value string
lineno int
}
// First we find all string literals
var placeholder = "JSTR"
var reJavaStringLiteral = regexp.MustCompile(`(?m)("(?:\\"|[^"])*")`)
javaStringLiterals := make([]javaString, 0)
// FindAllStringSubmatchIndex returns byte offsets, so the line number is computed
// from the real position of each match even when the same literal occurs twice.
for _, loc := range reJavaStringLiteral.FindAllStringSubmatchIndex(strContent, -1) {
    head := strContent[:loc[0]]
    lineno := strings.Count(head, "\n") + 1
    javaStringLiterals = append(javaStringLiterals, javaString{value: strContent[loc[2]:loc[3]], lineno: lineno})
}
// Next, we replace all string literals with placeholders.
for i, jstr := range javaStringLiterals {
strContent = strings.Replace(strContent, jstr.value, fmt.Sprintf("%v(%v)", placeholder, i), 1)
}
// Now the transformed text does not contain any string literals.
After the first pass, the original text becomes:
test1(JSTR(0) +
JSTR(1) +
JSTR(2));
test2(JSTR(3) + JSTR(4));
After this step, I can easily look for "JSTR(\d+) + JSTR(\d+) + JSTR(\d+)..." expressions. Now they are easy to find, because the text does not contain any strings (that could otherwise contain practically anything and interfere with regular expressions). These "sum of string" matches can then be re-matched with another FindAllStringSubmatch (in an inner loop) and then I'll get all information that I needed.
This is not a real solution, because it requires writing a lot of code, it is specific to my concrete use case, and it does not really answer the original question: how to access all iterations inside a repeated capturing group.
But the general idea of the workaround might be beneficial for somebody who is facing a similar problem.

How to get a substring from a found string to a character in C++?

For example I have a string:
int random_int = 123; // let's pretend this integer could have various values!!
std::string str = "part1 Hello : part2 " + std::to_string(random_int) + " : part3 World ";
The parts are separated by the character :
Let's say I want to find the substring from "part2" to the next :, which would return part2 123 in this case.
I know how to find the position of "part2" with str.find("part2"), but I don't know how to determine the length up to the next : after that "part2", because that length can vary.
For example, I know that part3 substring could be extracted with str.substr(str.find("part3"));, but only because it's at the end...
So, is there a subtle way to get the substring part2 123 from that string?
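One possible approach, sketched here under the assumption that trailing spaces before the separator should be dropped: std::string::find accepts a start position, so the search for the separator can begin at the marker. The function name part_until is hypothetical:

```cpp
#include <string>

// Hypothetical helper: return the text from `marker` up to (not including)
// the next `delim`, with trailing spaces removed. Returns an empty string
// if `marker` is not found.
std::string part_until(const std::string& str, const std::string& marker, char delim) {
    const std::size_t start = str.find(marker);
    if (start == std::string::npos) {
        return "";
    }
    // The second argument makes find begin searching at `start`.
    std::size_t end = str.find(delim, start);
    if (end == std::string::npos) {
        end = str.size();  // no delimiter left: take the rest of the string
    }
    while (end > start && str[end - 1] == ' ') {
        --end;  // drop the spaces that precede the delimiter
    }
    return str.substr(start, end - start);
}
```

For the string in the question, part_until(str, "part2", ':') gives part2 123, and the same call with "part3" also works because a missing separator falls back to the end of the string.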

Stars and string combination pattern in Python

I want a pattern like:
Input: Python is Interactive (any string separated by spaces)
Expected output:
*************
*Python     *
*is         *
*Interactive*
*************
I tried using Python's "re" module, but I am not able to create the stars in the pattern:
inp = "Python is interactive"
import re
split = re.split(' ', inp)
length = []
for item in range(len(split)):
    length.append(len(split[item]))
Max = (max(length))
for i in range(len(split)):
    print(split[i])
You don't need the re module. Your approach is not that bad, but needs some rework:
text = "Python is interactive"  # renamed from `input` to avoid shadowing the built-in
parts = text.split(" ")
maxlen = max(map(lambda part: len(part), parts))
# or this, if you want to go even more elegant:
maxlen = max(map(len, parts))
print('*' * (maxlen + 4))
for part in parts:
    spaces = maxlen - len(part)
    print("* " + part + (" " * spaces) + " *")
print('*' * (maxlen + 4))
For splitting you can use the str.split method. Then I calculate the maximum length (like you did, but a little more elegantly).
Then I print a line of stars that is 4 characters longer than the longest word, because each word is wrapped in "* " at the start and " *" at the end.
Then I print each word, padded with as many spaces as needed to reach the maximum length.
Finally, I print the last line of stars.

Replace 8 characters after finding 3 characters from a string

Is it possible in C++ to replace 8 characters after finding 3 characters in a string?
Input:
txtvar = "This is for Testing Purpose line";
Expected output:
txtvar = "This is for Testing XXXXXX line";
I tried the following:
std::string::size_type pos;
while ((pos = txtvar.find("Testing")) != std::string::npos) {
    txtvar.replace(pos, 9, "XXXX");
}
After finding the Testing keyword, the characters that follow it should be replaced with "XXXXXXX".
Please help me with this.
Yes, you can. The string class has methods for that; just look in the documentation for it:
// https://en.cppreference.com/w/cpp/string/basic_string
std::string txtvar {"This is for Testing Purpose line"};
// https://en.cppreference.com/w/cpp/string/basic_string/find
auto index {txtvar.find("Purpose")};
std::string t{"XXXXXXX"};
// find returns npos when the word is missing, so replace only on success.
if (index != std::string::npos) {
    txtvar.replace(index, t.size(), t);
}
std::cout << txtvar << std::endl;
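If the word to mask is not known in advance and only the keyword in front of it is, a variant along these lines might be closer to what the question asks. It is a sketch that assumes a single space separates the keyword from the text to mask; mask_after is a hypothetical name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: find `keyword`, skip it and one following space,
// then overwrite up to `count` characters with 'X'.
std::string mask_after(std::string text, const std::string& keyword, std::size_t count) {
    const std::size_t pos = text.find(keyword);
    if (pos == std::string::npos) {
        return text;  // keyword not present: leave the text unchanged
    }
    std::size_t start = pos + keyword.size();
    if (start < text.size() && text[start] == ' ') {
        ++start;
    }
    // Clamp so the replacement never runs past the end of the string.
    const std::size_t n = std::min(count, text.size() - start);
    text.replace(start, n, n, 'X');
    return text;
}
```

For example, mask_after(txtvar, "Testing", 7) masks the seven characters of "Purpose" and leaves " line" untouched.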