Finding a string, in a string, with regex, regex_search - c++

I have a string:
string str = "C:/Riot Games/League of Legends/LeagueClientUx.exe" "--riotclient-auth-token=yvjM3_sRqdaFoETdKSt1bQ" "--riotclient-app-port=53201" "--no-rads" "--disable-self-update" "--region=EUW" "--locale=en_GB" "--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA" "--respawn-command=LeagueClient.exe" "--respawn-display-name=League of Legends" "--app-port=53230" "--install-directory=C:\Riot Games\League of Legends" "--app-name=LeagueClient" "--ux-name=LeagueClientUx" "--ux-helper-name=LeagueClientUxHelper" "--log-dir=LeagueClient Logs" "--crash-reporting=crashpad" "--crash-environment=EUW1" "--crash-pipe=\\.\pipe\crashpad_12076_CFZRMYHTBJGPBIUH" "--app-log-file-path=C:/Riot Games/League of Legends/Logs/LeagueClient Logs/2020-07-13T13-33-41_12076_LeagueClient.log" "--app-pid=12076" "--output-base-dir=C:\Riot Games\League of Legends" "--no-proxy-server";
I wanna grab the port number and remote auth token, and I do that with the following code:
#include <regex>
#include <iostream>
#include <string>
#include <Windows.h>
using namespace std;
string PrintMatch(std::string str, std::regex reg) {
smatch matches;
while (regex_search(str,matches,reg))
{
cout << matches.str(1) << endl;
break;
}
return matches.str(1);
}
int main() {
string str = "C:/Riot Games/League of Legends/LeagueClientUx.exe" "--riotclient-auth-token=yvjM3_sRqdaFoETdKSt1bQ" "--riotclient-app-port=53201" "--no-rads" "--disable-self-update" "--region=EUW" "--locale=en_GB" "--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA" "--respawn-command=LeagueClient.exe" "--respawn-display-name=League of Legends" "--app-port=53230" "--install-directory=C:\Riot Games\League of Legends" "--app-name=LeagueClient" "--ux-name=LeagueClientUx" "--ux-helper-name=LeagueClientUxHelper" "--log-dir=LeagueClient Logs" "--crash-reporting=crashpad" "--crash-environment=EUW1" "--crash-pipe=\\.\pipe\crashpad_12076_CFZRMYHTBJGPBIUH" "--app-log-file-path=C:/Riot Games/League of Legends/Logs/LeagueClient Logs/2020-07-13T13-33-41_12076_LeagueClient.log" "--app-pid=12076" "--output-base-dir=C:\Riot Games\League of Legends" "--no-proxy-server";
regex reg("([0-9][0-9][0-9][0-9][0-9])");
string port = PrintMatch(str, reg);
regex reg1("(remoting-auth-token=[^\d]*)");
string output = PrintMatch(str, reg1);
}
Gives me the following output:
53201
remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA--respawn-comman
The number of characters in the port number (53201) doesn't change, so I get it successfully.
However, the length of the remoting-auth-token changes, so I don't know how to get it reliably when its length varies.
I want to grab this part of the remoting auth token: "13bHJUl7M_u_CtoR7v8XeA", so I can store it in a variable for use in my app, just like I've done with the port number.
Looking forward to hearing from you! :)

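Build each pattern from the text that surrounds the value you want, not from the value's length. (Your second pattern stops early for a separate reason: in an ordinary string literal "\d" is not a valid escape sequence, so the compiler turns it into a plain d, and [^d]* stops at the first d, which is in "--respawn-command".)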
To get the port number value, I'd use
regex reg("--riotclient-app-port=(\\d+)");
This way, you do not even need to care about the number of digits you match since it will capture a number after a known string.
If the auth token can only contain letters, digits, _ or -, you may use
regex reg1("remoting-auth-token=([\\w-]+)");
where \w matches a letter, digit or _, the - matches a hyphen, and + matches one or more occurrences of the class. The token is then in capture group 1.
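Putting the two patterns together: a minimal sketch that pulls both values out of a command line in which every argument is wrapped in quotes. The helper names first_group, app_port and remoting_token are hypothetical; the patterns are the ones above.

```cpp
#include <regex>
#include <string>

// Returns capture group 1 of the first match of `pattern` in `text`,
// or an empty string if there is no match.
std::string first_group(const std::string& text, const std::regex& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, pattern)) {
        return m.str(1);
    }
    return {};
}

// Hypothetical helper: digits following the known prefix, whatever their count.
std::string app_port(const std::string& cmdline) {
    static const std::regex port_re("--riotclient-app-port=(\\d+)");
    return first_group(cmdline, port_re);
}

// Hypothetical helper: letters, digits, _ and - following the known prefix.
std::string remoting_token(const std::string& cmdline) {
    static const std::regex token_re("remoting-auth-token=([\\w-]+)");
    return first_group(cmdline, token_re);
}
```

The token pattern relies on the quotes: with the question's original str, where the adjacent literals are concatenated with no separator, [\w-]+ would keep going into --respawn-command, because - is part of the class.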

First, the quotes have to actually be in the string: in the question, adjacent literals such as "a" "b" are simply concatenated into "ab". Every double quote (") inside the literal must be escaped as \", and every backslash as \\ (otherwise sequences such as \R and \p are invalid escapes):
string str = "C:/Riot Games/League of Legends/LeagueClientUx.exe\" \"--riotclient-auth-token=yvjM3_sRqdaFoETdKSt1bQ\" \"--riotclient-app-port=53201\" \"--no-rads\" \"--disable-self-update\" \"--region=EUW\" \"--locale=en_GB\" \"--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA\" \"--respawn-command=LeagueClient.exe\" \"--respawn-display-name=League of Legends\" \"--app-port=53230\" \"--install-directory=C:\\Riot Games\\League of Legends\" \"--app-name=LeagueClient\" \"--ux-name=LeagueClientUx\" \"--ux-helper-name=LeagueClientUxHelper\" \"--log-dir=LeagueClient Logs\" \"--crash-reporting=crashpad\" \"--crash-environment=EUW1\" \"--crash-pipe=\\\\.\\pipe\\crashpad_12076_CFZRMYHTBJGPBIUH\" \"--app-log-file-path=C:/Riot Games/League of Legends/Logs/LeagueClient Logs/2020-07-13T13-33-41_12076_LeagueClient.log\" \"--app-pid=12076\" \"--output-base-dir=C:\\Riot Games\\League of Legends\" \"--no-proxy-server";
Second, use this pattern:
(?:--remoting-auth-token=)([^"]*)
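The token is in capture group 1 (matches[1]). In an ordinary C++ string literal the " inside the pattern must be escaped as well: "(?:--remoting-auth-token=)([^\"]*)".
You can test the pattern here: https://regexr.com/58bpb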
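Escaping every quote and backslash by hand is easy to get wrong. A C++11 raw string literal keeps both exactly as written, for the pattern as well as for test input. A minimal sketch using the pattern above; the function name remoting_auth_token is hypothetical, and it assumes each argument is wrapped in double quotes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the value of --remoting-auth-token, or an
// empty string if it is absent. Assumes the arguments are wrapped in double
// quotes, so [^"]* stops at the quote that closes the value.
std::string remoting_auth_token(const std::string& cmdline) {
    // Pattern from the answer; the custom delimiter re(...)re lets the
    // pattern contain a " without any escaping.
    static const std::regex re(R"re((?:--remoting-auth-token=)([^"]*))re");
    std::smatch m;
    if (std::regex_search(cmdline, m, re)) {
        return m.str(1);  // capture group 1 holds the token
    }
    return {};
}
```

In a raw literal such as R"("--install-directory=C:\Riot Games\League of Legends" "--remoting-auth-token=...")" the backslashes stay literal, so the command line can be pasted almost verbatim.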

Related

How can I replace all words in a string except one

I would like to change all the words in a string except one, which stays in the middle.
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main()
{
    std::string test = "You want to join player group";
    std::string find = "You want to join group";
    std::string replace = "This is a test about group";
    boost::replace_all(test, find, replace);
    std::cout << test << std::endl;
}
The output was expected to be:
This is a test about player group
But it doesn't work, the output is:
You want to join player group
The problem is finding the words: find is treated as one whole string, not as separate words.
Is there a function that reads all the words, regardless of their position, and changes only the ones I want?
EDIT2:
This is the best example of what I want to happen:
const char* a = "This is MYYYYYYYYY line in the void Translate"; // This is the main line
const char* b = "This is line in the void Translate"; // This is what needs to be found in the main line
const char* c = "Testing - is line twatawtn thdwae voiwd Transwlate"; // This needs to replace ALL the words of b, preserving the MYYYYYYYYY
// The output is expected to be:
// Testing - is MYYYYYYYY is line twatawtn thdwae voiwd Transwlate
You need to invert your thinking here. Instead of matching "All words but one", you need to try to match that one word so you can extract it and insert it elsewhere.
We can do this with Regular Expressions, which became standardized in C++11:
#include <boost/format.hpp>
#include <iostream>
#include <regex>
#include <string>
int main()
{
    std::string test = "You want to join player group";
    static const std::regex find{R"(You want to join (\S+) group)"};
    std::smatch search_result;
    if (!std::regex_search(test, search_result, find))
    {
        std::cerr << "Could not match the string\n";
        return 1;
    }
    else
    {
        // Group 1 holds the word matched by (\S+), here "player"
        std::string found_group_name = search_result[1];
        auto replace = boost::format("This is a test about %1% group") % found_group_name;
        std::cout << replace << '\n';
    }
}
To match the word "player" I used a pretty simple regular expression, (\S+), which means "match one or more non-whitespace characters (greedily) and put them into a group".
"Groups" in regular expressions are enclosed by parentheses. The 0th group is always the entire match, and since we only have one set of parentheses, your word is therefore in group 1, hence the resulting access of the match result at search_result[1].
To create the regular expression, you'll notice I used the perhaps-unfamiliar string literal syntax R"(...)". This is called a raw string literal and was also standardized in C++11. It is especially handy for regular expressions, because backslashes don't need to be escaped. If you've used Python, it's similar to r'...'. If you've used C#, it's similar to @"...".
I threw in some boost::format to print the result because you were using Boost in the question and I thought you'd like to have some fun with it :-)
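If the goal is only to produce the new string, std::regex_replace can do the capture and the substitution in one call, with $1 in the replacement text standing for group 1. This swaps boost::format for the standard library; the function name rewrite_keeping_word is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: keeps the word captured by (\S+) and rewrites the
// text around it. "$1" in the replacement stands for capture group 1.
std::string rewrite_keeping_word(const std::string& text) {
    static const std::regex find{R"(You want to join (\S+) group)"};
    return std::regex_replace(text, find, "This is a test about $1 group");
}
```

Text that does not match is returned unchanged, so unlike the regex_search version there is no error branch; run regex_search first if you need to detect a non-matching input.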
In your example, find is not a substring of test, so boost::replace_all(test, find, replace); has no effect.
Removing "group" from find and replace solves it:
#include <boost/algorithm/string/replace.hpp>
#include <iostream>
#include <string>
int main()
{
std::string test = "You want to join player group";
std::string find = "You want to join";
std::string replace = "This is a test about";
boost::replace_all(test, find, replace);
std::cout << test << std::endl;
}
Output: This is a test about player group.
Here only the beginning of the string is replaced, because the end of the string is already correct. You could call replace_all a second time to change the end if needed.
Some other options:
One is in the other answer (a regular expression).
Another is to split the strings into vectors (or arrays) of words, insert the desired word (player) at the right position in the replacement vector, then build the output string from it.
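The word-splitting option could be sketched as follows. This is one possible reading, not the answerer's code: replace_around and split_words are hypothetical names, and it assumes find is the original text with a single run of words removed. That run is located via the common prefix and suffix of the two word lists, then inserted into the replacement in front of the same number of trailing words.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits `s` into whitespace-separated words.
std::vector<std::string> split_words(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> words;
    for (std::string w; in >> w;) {
        words.push_back(w);
    }
    return words;
}

// Hypothetical helper (see assumptions above).
std::string replace_around(const std::string& text, const std::string& find,
                           const std::string& replacement) {
    const std::vector<std::string> t = split_words(text);
    const std::vector<std::string> f = split_words(find);
    std::vector<std::string> r = split_words(replacement);

    // p = number of leading words shared by text and find.
    std::size_t p = 0;
    while (p < t.size() && p < f.size() && t[p] == f[p]) {
        ++p;
    }
    // s = number of trailing words shared, not overlapping the prefix.
    std::size_t s = 0;
    while (s < t.size() - p && s < f.size() - p &&
           t[t.size() - 1 - s] == f[f.size() - 1 - s]) {
        ++s;
    }
    // The words of text that are missing from find, e.g. "player".
    const std::vector<std::string> kept(t.begin() + p, t.end() - s);

    // Insert them before the last s words of the replacement.
    const std::size_t at = r.size() >= s ? r.size() - s : 0;
    r.insert(r.begin() + at, kept.begin(), kept.end());

    // Join the words back together with single spaces.
    std::string out;
    for (const std::string& w : r) {
        if (!out.empty()) out += ' ';
        out += w;
    }
    return out;
}
```

Any run of whitespace in the input collapses to a single space in the output, which is usually fine for sentences like these.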

How can I extract the name from a line

Assume that I have a line from a file that I want to read:
>NZ_FNBK01000055.1 Halorientalis regularis
How can I extract the name from a line that begins with a greater-than sign? Everything following the > (excluding the newline at the end of the line) is the name.
The name should be:
NZ_FNBK01000055.1 Halorientalis regularis
Here is my code so far:
bool file::load(istream& file)
{
string line;
while(getline(genomeSource, line)){
if(line.find(">") != string::npos)
{
m_name =
}
}
return true;
}
You could easily handle both conditions (the line starts with > and you want the name) using regular expressions; C++11 introduced <regex>. Use it with a regex like:
>.*? (.*?) .*$
> matches the literal character.
.*? non-greedy match of anything, stopping at a space.
(.*?) non-greedy match of anything, stopping at the next space, capturing those characters in group 1.
.*$ greedy match until the end of the string.
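With this you can check whether a line meets your criteria and get the name at the same time. Note that on the sample line, group 1 is only the text between the first and second spaces (Halorientalis); if the name is everything after the >, as the question describes, the simpler >(.*) does that. With the C++11 regex library the code is short: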
#include <iostream>
#include <regex>
#include <string>
int main() {
    std::string s = ">NZ_FNBK01000055.1 Halorientalis regularis ";
    std::regex rgx(">.*? (.*?) .*$");  // Make the regex
    std::smatch matches;
    if (std::regex_search(s, matches, rgx)) {  // Do a search
        if (matches.size() > 1) {  // If there are matches, print them.
            // Prints: The name is Halorientalis
            std::cout << "The name is " << matches[1].str() << "\n";
        }
    }
}
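If the name is everything after the >, a whole-line match with >(.*) is enough. A minimal sketch; header_name is a hypothetical helper, and inside the question's getline loop it could be called as if (header_name(line, m_name)) { ... }.

```cpp
#include <regex>
#include <string>

// If `line` is a header line (starts with '>'), stores everything after the
// '>' in `name` and returns true; otherwise leaves `name` alone and returns false.
bool header_name(const std::string& line, std::string& name) {
    // regex_match must cover the whole line: '>' followed by anything.
    static const std::regex header(">(.*)");
    std::smatch m;
    if (!std::regex_match(line, m, header)) {
        return false;
    }
    name = m[1].str();
    return true;
}
```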

Regex in C++ how to search for valid Linux Device Node?

Given a device node in Linux such as "/dev/sda1" or "/dev/sdb", I'd like to match all valid choices to know if I have a valid device node.
Here's what I have so far:
static bool isUSBNameValid(const std::string &node) {
std::regex device("/dev/sd[a-z]*");
if (std::regex_match(node, device)) {
return true;
}
return false;
}
This does not work. Why is this?
How to make this work with any valid Linux device node?
Your /dev/sd[a-z]* pattern matches the literal substring /dev/sd followed by zero or more lowercase ASCII letters. With regex_match, the pattern must match the whole string. Since /dev/sda1 ends with a digit, regex_match fails for it, but it succeeds for /dev/sdb.
So, if you only plan to match SATA devices, use the /dev/sd[a-z][0-9]* pattern; otherwise, to match any number of alphanumeric characters after /dev/, you may use /dev/[[:alnum:]]+.
std::regex device_sata("/dev/sd[a-z][0-9]*");
std::regex device_any("/dev/[[:alnum:]]+");
A complete example:
#include<regex>
#include <iostream>
using namespace std;
bool isUSBNameValid(const std::string &node, std::regex device) {
if (std::regex_match(node, device)) {
return true;
}
return false;
}
int main() {
std::regex device_sata("/dev/sd[a-z][0-9]*");
std::regex device_any("/dev/[[:alnum:]]+");
cout<< ( isUSBNameValid("/dev/sda1", device_sata) ? "Found" : "Not found")<<endl;
cout<< ( isUSBNameValid("/dev/sdb", device_sata) ? "Found" : "Not found")<<endl;
cout<< ( isUSBNameValid("/dev/ttyS0", device_any) ? "Found" : "Not found")<<endl;
return 0;
}
I would suggest the following pattern instead:
std::regex device("/dev/sd[a-z][0-9]*");
Add capturing groups around [a-z] and [0-9]* if you need the drive letter or the partition number separately.
If you truly want to match any device it would be:
std::regex device("/dev/[[:alnum:]]+");
with an additional check that what you have matched is not a directory. It would probably be good to add such a check (using stat) anyway.
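The stat check could look like this on Linux, using POSIX stat() and the S_ISBLK macro. It is a sketch: the function name is hypothetical, and it tests for a block device, which is stricter than "not a directory".

```cpp
#include <regex>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: true if `node` has the /dev/sdX[N] shape and, on this
// system, exists as a block device.
bool is_block_device_node(const std::string& node) {
    static const std::regex device("/dev/sd[a-z][0-9]*");
    if (!std::regex_match(node, device)) {
        return false;  // wrong shape, no need to touch the filesystem
    }
    struct stat st;
    if (stat(node.c_str(), &st) != 0) {
        return false;  // does not exist or is not accessible
    }
    return S_ISBLK(st.st_mode);  // rejects directories and regular files
}
```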

Unhandled exception while running regex: get file name without extension from file path

I have this simple program
string str = "D:\Praxisphase 1 project\test\Brainstorming.docx";
regex ex("[^\\]+(?=\.docx$)");
if (regex_match(str, ex)){
cout << "match found"<< endl;
}
I expect the result to be true. My regex works when I try it online, but when I run it in C++, the app throws an unhandled exception.
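First of all, use raw string literals when defining a regex, to avoid issues with backslashes. In an ordinary literal, "[^\\]" becomes the regex [^\], where \] is an escaped bracket, so the character class is never closed and the std::regex constructor throws std::regex_error; \. is not a valid escape sequence at all (you need "\\." or R"(\.)"). The path literal has the same problem: \t in it is a tab character. Second, regex_match requires the whole string to match, so use regex_search.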
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string str = R"(D:\Praxisphase 1 project\test\Brainstorming.docx)";
// OR
// string str = "D:\\Praxisphase 1 project\\test\\Brainstorming.docx";
regex ex(R"([^\\]+(?=\.docx$))");
if (regex_search(str, ex)){
cout << "match found"<< endl;
}
return 0;
}
Note that R"([^\\]+(?=\.docx$))" and "[^\\\\]+(?=\\.docx$)" are the same pattern. In the raw literal each \ is a literal backslash (a regex needs two backslashes to match one \ character); in the ordinary literal, four backslashes are needed to produce those two.
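To actually get the name rather than just test for it, pass a std::smatch to regex_search; since the pattern has no capture group, the name is the whole match, m[0]. A minimal sketch with a hypothetical function name:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: file name without ".docx" from a Windows-style path,
// or an empty string if the path does not end in ".docx".
std::string docx_stem(const std::string& path) {
    // Same pattern as the answer: a run of non-backslash characters
    // followed (lookahead, not consumed) by ".docx" at the end.
    static const std::regex ex(R"([^\\]+(?=\.docx$))");
    std::smatch m;
    if (std::regex_search(path, m, ex)) {
        return m.str(0);  // no capture group: the whole match is the name
    }
    return {};
}
```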

What is the regular expression to get a token of a URL?

Say I have strings like these:
bunch of other html<a href="http://domain.com/133742/The_Token_I_Want.zip" more html and stuff
bunch of other html<a href="http://domain.com/12345/another_token.zip" more html and stuff
bunch of other html<a href="http://domain.com/0981723/YET_ANOTHER_TOKEN.zip" more html and stuff
What is the regular expression to match The_Token_I_Want, another_token, YET_ANOTHER_TOKEN?
Appendix B of RFC 2396 gives a doozy of a regular expression for splitting a URI into its components, and we can adapt it for your case
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?
This leaves The_Token_I_Want in $6, the ([^.]+) subexpression that follows the last / of the path. For example, in Perl:
#! /usr/bin/perl
$_ = "http://domain.com/133742/The_Token_I_Want.zip";
if (m!^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*/([^.]+)[^?#]*)(\?([^#]*))?(#(.*))?!) {
print "$6\n";
}
else {
print "no match\n";
}
Output:
$ ./prog.pl
The_Token_I_Want
UPDATE: I see in a comment that you're using boost::regex, so remember to escape the backslash in your C++ program.
#include <boost/foreach.hpp>
#include <boost/regex.hpp>
#include <iostream>
#include <string>
int main()
{
boost::regex token("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*"
"/([^.]+)"
// ^ group 6: the token
"[^?#]*)(\\?([^#]*))?(#(.*))?");
const char * const urls[] = {
"http://domain.com/133742/The_Token_I_Want.zip",
"http://domain.com/12345/another_token.zip",
"http://domain.com/0981723/YET_ANOTHER_TOKEN.zip",
};
BOOST_FOREACH(const char *url, urls) {
std::cout << url << ":\n";
std::string t;
boost::cmatch m;
if (boost::regex_match(url, m, token))
t = m[6];
else
t = "<no match>";
std::cout << " - " << t << '\n';
}
return 0;
}
Output:
http://domain.com/133742/The_Token_I_Want.zip:
- The_Token_I_Want
http://domain.com/12345/another_token.zip:
- another_token
http://domain.com/0981723/YET_ANOTHER_TOKEN.zip:
- YET_ANOTHER_TOKEN
/a href="http://domain\.com/[0-9]+/([a-zA-Z_]+)\.zip"/
You might want to add more characters (digits, -, and so on) to [a-zA-Z_]+. The dots are escaped so they only match a literal period.
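The same idea with std::regex (C++11, no Boost needed), run on lines like the ones in the question. A sketch: the function name is hypothetical, and it assumes links of the form http://domain.com/<digits>/<token>.zip, as in the question.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the token linked from <a href="..."> in `html`,
// or an empty string if there is no such link.
std::string zip_token(const std::string& html) {
    static const std::regex re(R"(a href="http://domain\.com/[0-9]+/([a-zA-Z_]+)\.zip")");
    std::smatch m;
    return std::regex_search(html, m, re) ? m.str(1) : std::string();
}
```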
You can use:
(http|ftp)://[[:alnum:]./_]+/([[:alnum:]._-]+)\.[[:alnum:]_-]+
([[:alnum:]._-]+) is a group for the matched pattern, and in your example its value will be The_Token_I_Want. The dot before the extension must be escaped: an unescaped . matches any character, and because the group is greedy it would then capture The_Token_I_Want.z. To access this group, use \2 or $2 (m[2] in C++), because (http|ftp) is the first group and ([[:alnum:]._-]+) is the second.
Try this:
/(?:f|ht)tps?:/{2}(?:www.)?domain[^/]+.([^/]+).([^/]+)/i
or
/\w{3,5}:/{2}(?:w{3}.)?domain[^/]+.([^/]+).([^/]+)/i
First, use an HTML parser and get a DOM. Then get the anchor elements and loop over them looking for the hrefs. Don't try to grab the token straight out of a string.
Then:
The glib answer would be:
/(The_Token_I_Want.zip)/
You might want to be a little more precise than a single example.
I'm guessing you are actually looking for:
/([^/]+)$/
m/The_Token_I_Want/
You'll have to be more specific about what kind of token it is. A number? A string? Does it repeat? Does it have a form or pattern to it?
It's probably best to use something smarter than a RegEx. For example, if you're using C# you could use the System.Uri class to parse it for you.