Regex in C++ how to search for valid Linux Device Node? - c++

Given a device node in Linux such as "/dev/sda1" or "/dev/sdb", I'd like to match all valid choices to know if I have a valid device node.
Here's what I have so far:
static bool isUSBNameValid(const std::string &node) {
std::regex device("/dev/sd[a-z]*");
if (std::regex_match(node, device)) {
return true;
}
return false;
}
This does not work. Why is this?
How to make this work with any valid Linux device node?

Your /dev/sd[a-z]* pattern matches the literal text /dev/sd followed by zero or more lowercase ASCII letters. When used with regex_match, the pattern must match the whole string. Since /dev/sda1 ends with a digit, regex_match fails for it, but it succeeds for /dev/sdb.
So, if you plan to match only SATA-style sd* devices, use the /dev/sd[a-z][0-9]* pattern; otherwise, to match one or more alphanumeric characters after /dev/, you may use /dev/[[:alnum:]]+.
std::regex device_sata("/dev/sd[a-z][0-9]*");
std::regex device_any("/dev/[[:alnum:]]+");
See the C++ demo:
#include <regex>
#include <iostream>
using namespace std;
bool isUSBNameValid(const std::string &node, const std::regex &device) {
if (std::regex_match(node, device)) {
return true;
}
return false;
}
int main() {
std::regex device_sata("/dev/sd[a-z][0-9]*");
std::regex device_any("/dev/[[:alnum:]]+");
cout<< ( isUSBNameValid("/dev/sda1", device_sata) ? "Found" : "Not found")<<endl;
cout<< ( isUSBNameValid("/dev/sdb", device_sata) ? "Found" : "Not found")<<endl;
cout<< ( isUSBNameValid("/dev/ttyS0", device_any) ? "Found" : "Not found")<<endl;
return 0;
}
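To see why the original pattern failed, it may help to contrast regex_match with regex_search on the same input. The following is a minimal sketch; the helper names matches_whole and matches_somewhere are made up for illustration and are not part of the question's code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the WHOLE string matches the pattern.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if ANY substring of the text matches the pattern.
bool matches_somewhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```

With the original pattern /dev/sd[a-z]*, matches_whole rejects /dev/sda1 because the trailing 1 is left over, while matches_somewhere accepts it because the prefix /dev/sda matches. For validation, the whole-string behaviour of regex_match is usually what you want, so the fix belongs in the pattern.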

I would suggest the following pattern instead:
std::regex device("/dev/sd[a-z][0-9]*");
Add capturing groups around the [a-z] and [0-9]* if that becomes important.
If you truly want to match any device it would be:
std::regex device("/dev/[[:alnum:]]+");
with an additional check that what you have matched is not a directory. It would probably be good to add such a check (using stat) anyway.
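The stat check mentioned above could look roughly like this. It is a sketch that assumes a POSIX system; the function name is_device_node is hypothetical, and it goes slightly further than "not a directory" by accepting only block or character devices.

```cpp
#include <string>
#include <sys/stat.h>  // POSIX: stat(), S_ISBLK, S_ISCHR

// Hypothetical helper: true if `path` exists and is a block or character device.
bool is_device_node(const std::string& path) {
    struct stat info;
    if (stat(path.c_str(), &info) != 0) {
        return false;  // path does not exist or cannot be accessed
    }
    return S_ISBLK(info.st_mode) || S_ISCHR(info.st_mode);
}
```

A regex can only tell you that a name looks like a device node; a check of this kind tells you whether such a node is actually present on the running system.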

Related

Finding a string, in a string, with regex, regex_search

I have a string:
string str = "C:/Riot Games/League of Legends/LeagueClientUx.exe" "--riotclient-auth-token=yvjM3_sRqdaFoETdKSt1bQ" "--riotclient-app-port=53201" "--no-rads" "--disable-self-update" "--region=EUW" "--locale=en_GB" "--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA" "--respawn-command=LeagueClient.exe" "--respawn-display-name=League of Legends" "--app-port=53230" "--install-directory=C:\Riot Games\League of Legends" "--app-name=LeagueClient" "--ux-name=LeagueClientUx" "--ux-helper-name=LeagueClientUxHelper" "--log-dir=LeagueClient Logs" "--crash-reporting=crashpad" "--crash-environment=EUW1" "--crash-pipe=\\.\pipe\crashpad_12076_CFZRMYHTBJGPBIUH" "--app-log-file-path=C:/Riot Games/League of Legends/Logs/LeagueClient Logs/2020-07-13T13-33-41_12076_LeagueClient.log" "--app-pid=12076" "--output-base-dir=C:\Riot Games\League of Legends" "--no-proxy-server";
I wanna grab the port number and remote auth token, and I do that with the following code:
#include <regex>
#include <iostream>
#include <string>
#include <Windows.h>
using namespace std;
string PrintMatch(std::string str, std::regex reg) {
smatch matches;
while (regex_search(str,matches,reg))
{
cout << matches.str(1) << endl;
break;
}
return matches.str(1);
}
int main() {
string str = "C:/Riot Games/League of Legends/LeagueClientUx.exe" "--riotclient-auth-token=yvjM3_sRqdaFoETdKSt1bQ" "--riotclient-app-port=53201" "--no-rads" "--disable-self-update" "--region=EUW" "--locale=en_GB" "--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA" "--respawn-command=LeagueClient.exe" "--respawn-display-name=League of Legends" "--app-port=53230" "--install-directory=C:\Riot Games\League of Legends" "--app-name=LeagueClient" "--ux-name=LeagueClientUx" "--ux-helper-name=LeagueClientUxHelper" "--log-dir=LeagueClient Logs" "--crash-reporting=crashpad" "--crash-environment=EUW1" "--crash-pipe=\\.\pipe\crashpad_12076_CFZRMYHTBJGPBIUH" "--app-log-file-path=C:/Riot Games/League of Legends/Logs/LeagueClient Logs/2020-07-13T13-33-41_12076_LeagueClient.log" "--app-pid=12076" "--output-base-dir=C:\Riot Games\League of Legends" "--no-proxy-server";
regex reg("([0-9][0-9][0-9][0-9][0-9])");
string port = PrintMatch(str, reg);
regex reg1("(remoting-auth-token=[^\d]*)");
string output = PrintMatch(str, reg1);
}
Gives me the following output:
53201
remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA--respawn-comman
The number of characters in the port number (53201) doesn't change, so I get that successfully.
However, the length of the remoting-auth-token changes, so I don't know how to extract it reliably.
I want to grab this part of the remoting auth token: "13bHJUl7M_u_CtoR7v8XeA", so I can store it in a variable for use in my app, just like I've done with the port number.
Looking forward to hearing from you! :)
You should study the syntax of your expected matches to extract them correctly.
To get the port number value, I'd use
regex reg("--riotclient-app-port=(\\d+)");
This way, you do not even need to care about the number of digits you match since it will capture a number after a known string.
If the auth token can only contain letters, digits, _ or - you may use
regex reg1("remoting-auth-token=([\\w-]+)");
where \w matches a letter/digit/_ and - matches a hyphen, + will match one or more occurrences.
See the C++ demo.
First, you need to escape your str value. Every double-quote (") character inside the literal must be escaped as \", and every backslash as \\:
string str = "C:/Riot Games/League of Legends/LeagueClientUx.exe\" \"--riotclient-auth-token=yvjM3_sRqdaFoETdKSt1bQ\" \"--riotclient-app-port=53201\" \"--no-rads\" \"--disable-self-update\" \"--region=EUW\" \"--locale=en_GB\" \"--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA\" \"--respawn-command=LeagueClient.exe\" \"--respawn-display-name=League of Legends\" \"--app-port=53230\" \"--install-directory=C:\\Riot Games\\League of Legends\" \"--app-name=LeagueClient\" \"--ux-name=LeagueClientUx\" \"--ux-helper-name=LeagueClientUxHelper\" \"--log-dir=LeagueClient Logs\" \"--crash-reporting=crashpad\" \"--crash-environment=EUW1\" \"--crash-pipe=\\\\.\\pipe\\crashpad_12076_CFZRMYHTBJGPBIUH\" \"--app-log-file-path=C:/Riot Games/League of Legends/Logs/LeagueClient Logs/2020-07-13T13-33-41_12076_LeagueClient.log\" \"--app-pid=12076\" \"--output-base-dir=C:\\Riot Games\\League of Legends\" \"--no-proxy-server";
Second, use this pattern:
(?:--remoting-auth-token=)([^"]*)
You can access match group with index 1.
To test regexp you can use this link: https://regexr.com/58bpb
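Putting the suggestions from both answers together, a minimal sketch might look like the following. The helper first_capture and the shortened sample string kSample are assumptions made for illustration; the patterns are the ones proposed above. A raw string literal is used so that quotes and backslashes need no escaping.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first capture group of `pattern` found in
// `text`, or an empty string when the pattern does not match.
std::string first_capture(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return "";
}

// Shortened sample of the command line (assumption: only three arguments kept).
// The raw string literal R"(...)" keeps the embedded double quotes as-is.
const std::string kSample =
    R"("--riotclient-app-port=53201" "--remoting-auth-token=13bHJUl7M_u_CtoR7v8XeA" "--app-port=53230")";
```

Calling first_capture(kSample, "--riotclient-app-port=(\\d+)") should give the port, and first_capture(kSample, "remoting-auth-token=([\\w-]+)") the token, regardless of how long the token happens to be.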

how can extract the name from a line

Assume that I have a line from a file that I want to read:
>NZ_FNBK01000055.1 Halorientalis regularis
How can I extract the name from a line that begins with a greater-than sign? Everything following the greater-than sign (excluding the newline at the end of the line) is the name.
The name should be:
NZ_FNBK01000055.1 Halorientalis regularis
Here is my code so far:
bool file::load(istream& file)
{
string line;
while(getline(genomeSource, line)){
if(line.find(">") != string::npos)
{
m_name =
}
}
return true;
}
You could easily handle both conditions using regular expressions. C++11 introduced the <regex> header. Using it with a regex like:
>.*? (.*?) .*$
> Matches the literal > character.
.*? Non-greedy match of anything, stopping at the first space.
(.*?) Non-greedy match of anything up to the next space, captured as group 1 (for the sample line this is Halorientalis).
.*$ Greedy match to the end of the string.
With this you can easily check whether the line meets your criteria and get the captured field at the same time. Here is a test showing it working. For the code, the C++11 regex library is very simple:
std::string s = ">NZ_FNBK01000055.1 Halorientalis regularis ";
std::regex rgx(">.*? (.*?) .*$"); // Make the regex
std::smatch matches;
if(std::regex_search(s, matches, rgx)) { // Do a search
if (matches.size() > 1) { // If there are matches, print them.
std::cout << "The name is " << matches[1].str() << "\n";
}
}
Here is a live example.
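Note that the pattern above captures only the second space-separated field. If the name is meant to be everything after the greater-than sign, as the question states, a simpler pattern may be enough. This is a sketch under that assumption; the function name extract_name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: if `line` starts with '>', stores the rest of the line
// in `name` and returns true; otherwise returns false and leaves `name` as-is.
bool extract_name(const std::string& line, std::string& name) {
    static const std::regex header("^>(.*)$");
    std::smatch m;
    if (!std::regex_match(line, m, header)) {
        return false;
    }
    name = m[1].str();  // group 1 holds everything after the '>'
    return true;
}
```

Since getline already strips the trailing newline, the same result can be had without a regex by checking that the line is non-empty and line[0] == '>' and then taking line.substr(1).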

detect new line using C++ boost regex_match [duplicate]

I just started using Boost::regex today and am quite a novice in regular expressions too. I have been using "The Regulator" and Expresso to test my regex and am satisfied with what I see there, but the same regex does not seem to do what I want in Boost. Any pointers toward a solution would be most welcome. As a side question, are there any tools that would help me test my regex against boost.regex?
using namespace boost;
using namespace std;
vector<string> tokenizer::to_vector_int(const string s)
{
regex re("\\d*");
vector<string> vs;
cmatch matches;
if( regex_match(s.c_str(), matches, re) ) {
MessageBox(NULL, L"Hmmm", L"", MB_OK); // it never gets here
for( unsigned int i = 1 ; i < matches.size() ; ++i ) {
string match(matches[i].first, matches[i].second);
vs.push_back(match);
}
}
return vs;
}
void _uttokenizer::test_to_vector_int()
{
vector<string> __vi = tokenizer::to_vector_int("0<br/>1");
for( int i = 0 ; i < __vi.size() ; ++i ) INFO(__vi[i]);
CPPUNIT_ASSERT_EQUAL(2, (int)__vi.size());//always fails
}
Update (Thanks to Dav for helping me clarify my question):
I was hoping to get a vector with 2 strings in them => "0" and "1". I instead never get a successful regex_match() (regex_match() always returns false) so the vector is always empty.
Thanks '1800 INFORMATION' for your suggestions. The to_vector_int() method now looks like this, but it goes into a never-ending loop (I took the code you gave and modified it to make it compilable) and finds "0", "", "", "" and so on. It never finds the "1".
vector<string> tokenizer::to_vector_int(const string s)
{
regex re("(\\d*)");
vector<string> vs;
cmatch matches;
char * loc = const_cast<char *>(s.c_str());
while( regex_search(loc, matches, re) ) {
vs.push_back(string(matches[0].first, matches[0].second));
loc = const_cast<char *>(matches.suffix().str().c_str());
}
return vs;
}
In all honesty, I don't think I have understood the basics of searching for a pattern and getting the matches yet. Are there any tutorials with examples that explain this?
The basic problem is that you are using regex_match when you should be using regex_search:
"The algorithms regex_search and regex_match make use of match_results to report what matched; the difference between these algorithms is that regex_match will only find matches that consume all of the input text, where as regex_search will search for a match anywhere within the text being matched."
(From the Boost documentation.) Change it to use regex_search and it will work.
Also, it looks like you are not capturing the matches. Try changing the regex to this:
regex re("(\\d*)");
Or, maybe you need to be calling regex_search repeatedly:
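const char *where = s.c_str();
// The pattern must not match the empty string (use \\d+), or this loop never advances.
while (regex_search(where, matches, re))
{
vs.push_back(matches[1].str()); // text of the first capture group
where = matches.suffix().first; // continue right after the current match
}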
This is since you only have one capture in your regex.
Alternatively, change your regex, if you know the basic structure of the data:
regex re("(\\d+).*?(\\d+)");
This would match two numbers within the search string.
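Note that the regular expression \d* matches zero or more digits, which includes the empty string "" since that is exactly zero digits. An empty match does not advance the search position, which explains why the loop in the updated question keeps finding "" and never reaches the "1". I would change the expression to \d+, which matches one or more digits.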
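As an alternative to advancing a pointer by hand, a regex iterator can walk over all matches. The sketch below uses std::regex rather than Boost so that it is self-contained; Boost.Regex provides a boost::sregex_iterator with the same shape, but treat that mapping as an assumption to verify against your Boost version. The function name digit_runs is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of one or more digits in `s`.
std::vector<std::string> digit_runs(const std::string& s) {
    static const std::regex re("\\d+");  // \d+ so that empty matches cannot occur
    std::vector<std::string> out;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For the input "0<br/>1" this should produce a vector holding "0" and "1", which is what the unit test in the question expects.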

C++: Regex: returns full string and not matched group

For those asking: the {0} selects one block of the sResult string, where blocks are separated by |; 0 is the first block.
It needs to be dynamic for future expansion, as that number will be configurable by users.
I am working on a regex to extract one portion of a string. It matches, but the results returned are not what I expected.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
for( int i = 0; i < regMatch.size(); i++)
{
//SUBMATCH 0 = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE"
//SUBMATCH 1 = "BUT|NOT|ANYTHNG|ELSE"
std::ssub_match sm = regMatch[i];
bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
For some reason I cannot figure out the code that gets me just MATCH_ME back, so that I can compare it to the expected results list on the C++ side.
Does anyone have any ideas on where I went wrong here?
It seems you're using regular expressions for what they haven't been designed for. You should first split your string at the delimiter | and apply regular expressions on the resulting tokens if you want to check them for validity.
By the way: the std::regex implementation in libstdc++ seems to be buggy. I just did some tests and found that even simple patterns containing escaped pipe characters like \\| failed to compile, throwing a std::regex_error with no further information in the error message (GCC 4.8.1; libstdc++ did not ship a complete <regex> implementation until GCC 4.9).
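The split-first approach suggested above could be sketched as follows. The helper names split and block_at are hypothetical, and the 0-based index is assumed to play the role of the number inside {0} in the question's pattern.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits `text` at every occurrence of `delim`.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream in(text);
    std::string token;
    while (std::getline(in, token, delim)) {
        tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical helper: returns block `index` (0-based), or "" if out of range.
std::string block_at(const std::string& text, std::size_t index) {
    const std::vector<std::string> tokens = split(text, '|');
    return index < tokens.size() ? tokens[index] : std::string();
}
```

This avoids building a pattern from user input at run time, and any per-block validation can still be done with a regex on the returned token.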
The following code example shows how to do what you are after - you compile this, then call it with a single numerical argument to extract that element of the input:
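#include <cstdio>
#include <iostream>
#include <regex>
#include <string>
int main(int argc, char *argv[]) {
char pat[100];
if (argc > 1) {
// Build the pattern; argv[1] is the 0-based index of the block to extract.
std::snprintf(pat, sizeof pat, "^(?:[^|]+[|]){%s}([^|;]+)", argv[1]);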
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern(pat);
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::ssub_match sm = regMatch[1];
std::cout << "The match is " << sm << std::endl;
//bValid = strcmp(regMatch[i].str().c_str(), pzPoint->_ptrTarget->_pzTag->szOPCItem);
}
}
return 0;
}
Creating an executable called match, you can then do
>> match 2
The match is NOT
which is what you wanted.
The regex, it turns out, works just fine - although as a matter of preference I would use \| instead of [|] for the first part.
It turns out the problem was on the C++ side, in how I extracted the match; it had to be done more directly. Below is the code that gets exactly what I wanted out of the string so I can use it later.
std::string sResult = "MATCH_ME|BUT|NOT|ANYTHNG|ELSE";
std::regex pattern("^(?:[^|]+[|]){0}([^|;]+)");
std::smatch regMatch;
std::regex_search(sResult, regMatch, pattern);
if(regMatch[1].matched)
{
std::string theMatchedPortion = regMatch[1];
//The issue was not with the regex but with how I was retrieving the results.
//theMatchedPortion now equals "MATCH_ME", and by changing the number associated
//with it I can navigate through the string.
}
