Easy way to parse a url in C++ cross platform?


I need to parse a URL to get the protocol, host, path, and query in an application I am writing in C++. The application is intended to be cross-platform. I'm surprised I can't find anything that does this in the Boost or POCO libraries. Is it somewhere obvious I'm not looking? Any suggestions on appropriate open source libraries? Or is this something I just have to do myself? It's not super complicated, but it seems like such a common task that I am surprised there isn't a common solution.

There is a library that has been proposed for Boost inclusion and lets you parse HTTP URIs easily. It uses Boost.Spirit and is also released under the Boost Software License. The library is cpp-netlib. You can download the latest release from http://github.com/cpp-netlib/cpp-netlib/downloads (the documentation is at http://cpp-netlib.github.com/).
The relevant type you'll want to use is boost::network::http::uri, which is described in the cpp-netlib documentation.

Here is a std::wstring version of the above, with the other fields I needed added. It could definitely be refined, but it is good enough for my purposes.
#include <string>
#include <algorithm> // find
struct Uri
{
public:
std::wstring QueryString, Path, Protocol, Host, Port;
static Uri Parse(const std::wstring &uri)
{
Uri result;
typedef std::wstring::const_iterator iterator_t;
if (uri.length() == 0)
return result;
iterator_t uriEnd = uri.end();
// get query start
iterator_t queryStart = std::find(uri.begin(), uriEnd, L'?');
// protocol: only a ':' before the query counts, and it must be followed by "//"
iterator_t protocolStart = uri.begin();
iterator_t protocolEnd = std::find(protocolStart, queryStart, L':');
if ((protocolEnd != queryStart) && (std::wstring(protocolEnd, queryStart).compare(0, 3, L"://") == 0))
{
result.Protocol = std::wstring(protocolStart, protocolEnd);
protocolEnd += 3; // skip "://"
}
else
protocolEnd = uri.begin(); // no protocol
// host: look for the path only before the query, so a '/' inside the query is not taken as the path
iterator_t hostStart = protocolEnd;
iterator_t pathStart = std::find(hostStart, queryStart, L'/');
iterator_t hostEnd = std::find(hostStart, pathStart, L':'); // check for port
result.Host = std::wstring(hostStart, hostEnd);
// port
if (hostEnd != pathStart) // stopped at ':', so we have a port
{
++hostEnd;
result.Port = std::wstring(hostEnd, pathStart);
}
// path
if (pathStart != queryStart)
result.Path = std::wstring(pathStart, queryStart);
// query
if (queryStart != uriEnd)
result.QueryString = std::wstring(queryStart, uri.end());
return result;
} // Parse
}; // uri
Tests/Usage
Uri u0 = Uri::Parse(L"http://localhost:80/foo.html?&q=1:2:3");
Uri u1 = Uri::Parse(L"https://localhost:80/foo.html?&q=1");
Uri u2 = Uri::Parse(L"localhost/foo");
Uri u3 = Uri::Parse(L"https://localhost/foo");
Uri u4 = Uri::Parse(L"localhost:8080");
Uri u5 = Uri::Parse(L"localhost?&foo=1");
Uri u6 = Uri::Parse(L"localhost?&foo=1:2:3");
// then read u0.QueryString, u0.Path, u0.Protocol, u0.Host, u0.Port, ...

Terribly sorry, couldn't help it. :s
url.hh
#ifndef URL_HH_
#define URL_HH_
#include <string>
struct url {
url(const std::string& url_s); // omitted copy, ==, accessors, ...
private:
void parse(const std::string& url_s);
private:
std::string protocol_, host_, path_, query_;
};
#endif /* URL_HH_ */
url.cc
#include "url.hh"
#include <string>
#include <algorithm>
#include <cctype>
#include <iterator> // back_inserter
using namespace std;
// ctors, copy, equality, ...
// std::ptr_fun was removed in C++17. This helper also converts to unsigned char
// first, because passing a negative char to tolower is undefined behavior.
static char to_lower_char(unsigned char c)
{
return static_cast<char>(tolower(c));
}
void url::parse(const string& url_s)
{
const string prot_end("://");
string::const_iterator prot_i = search(url_s.begin(), url_s.end(),
prot_end.begin(), prot_end.end());
protocol_.reserve(distance(url_s.begin(), prot_i));
transform(url_s.begin(), prot_i,
back_inserter(protocol_),
to_lower_char); // protocol is icase
if( prot_i == url_s.end() )
return; // note: without "://", the whole input has been copied into protocol_
advance(prot_i, prot_end.length());
string::const_iterator path_i = find(prot_i, url_s.end(), '/');
host_.reserve(distance(prot_i, path_i));
transform(prot_i, path_i,
back_inserter(host_),
to_lower_char); // host is icase
string::const_iterator query_i = find(path_i, url_s.end(), '?');
path_.assign(path_i, query_i);
if( query_i != url_s.end() )
++query_i;
query_.assign(query_i, url_s.end());
}
main.cc
// ...
url u("HTTP://stackoverflow.com/questions/2616011/parse-a.py?url=1");
cout << u.protocol() << '\t' << u.host() << ...
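To try the answer above as-is, here is a self-contained sketch that fills in the omitted pieces. The constructor and the accessor names protocol(), host(), path() and query() are assumptions taken from main.cc. The parsing follows url::parse, with one deliberate change: a URL without "://" leaves every field empty instead of copying the whole string into the protocol. Like the original, host() still includes any port, and a query with no path ends up in host().

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Sketch: the url class above with its omitted constructor/accessors filled in
// (hypothetical; accessor names taken from the usage in main.cc).
class url {
public:
    explicit url(const std::string& url_s) { parse(url_s); }
    const std::string& protocol() const { return protocol_; }
    const std::string& host() const { return host_; }
    const std::string& path() const { return path_; }
    const std::string& query() const { return query_; }

private:
    // tolower on unsigned char avoids undefined behavior for negative chars
    static char lower(unsigned char c) { return static_cast<char>(std::tolower(c)); }

    void parse(const std::string& url_s)
    {
        const std::string prot_end("://");
        auto prot_i = std::search(url_s.begin(), url_s.end(), prot_end.begin(), prot_end.end());
        if (prot_i == url_s.end())
            return; // no "://": leave all fields empty
        std::transform(url_s.begin(), prot_i, std::back_inserter(protocol_), lower); // scheme is case-insensitive
        std::advance(prot_i, prot_end.length());
        auto path_i = std::find(prot_i, url_s.end(), '/');
        std::transform(prot_i, path_i, std::back_inserter(host_), lower); // host is case-insensitive
        auto query_i = std::find(path_i, url_s.end(), '?');
        path_.assign(path_i, query_i);
        if (query_i != url_s.end())
            ++query_i; // skip the '?'
        query_.assign(query_i, url_s.end());
    }

    std::string protocol_, host_, path_, query_;
};
```

With the URL from main.cc, protocol() is "http", host() is "stackoverflow.com", path() is "/questions/2616011/parse-a.py" and query() is "url=1".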

POCO's URI class can parse URLs for you. The following example is a shortened version of the one in the POCO URI and UUID slides:
#include "Poco/URI.h"
#include <iostream>
int main(int argc, char** argv)
{
Poco::URI uri1("http://www.appinf.com:88/sample?example-query#frag");
std::string scheme(uri1.getScheme()); // "http"
std::string auth(uri1.getAuthority()); // "www.appinf.com:88"
std::string host(uri1.getHost()); // "www.appinf.com"
unsigned short port = uri1.getPort(); // 88
std::string path(uri1.getPath()); // "/sample"
std::string query(uri1.getQuery()); // "example-query"
std::string frag(uri1.getFragment()); // "frag"
std::string pathEtc(uri1.getPathEtc()); // "/sample?example-query#frag"
return 0;
}

For completeness, there is one written in C that you could use (with a little wrapping, no doubt): https://uriparser.github.io/
[RFC-compliant and supports Unicode]
Here's a very basic wrapper I've been using for simply grabbing the results of a parse.
#include <string>
#include <uriparser/Uri.h>
namespace uriparser
{
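class Uri // non-copyable: a copy would free the uriparser members twice
{
public:
explicit Uri(const std::string& uri)
: uri_(uri) // uriparser's text ranges point into this copy, so keep it alive
{
UriParserStateA state_;
state_.uri = &uriParse_;
isValid_ = uriParseUriA(&state_, uri_.c_str()) == URI_SUCCESS;
}
Uri(const Uri&) = delete;
Uri& operator=(const Uri&) = delete;
~Uri() { uriFreeUriMembersA(&uriParse_); }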
bool isValid() const { return isValid_; }
std::string scheme() const { return fromRange(uriParse_.scheme); }
std::string host() const { return fromRange(uriParse_.hostText); }
std::string port() const { return fromRange(uriParse_.portText); }
std::string path() const { return fromList(uriParse_.pathHead, "/"); }
std::string query() const { return fromRange(uriParse_.query); }
std::string fragment() const { return fromRange(uriParse_.fragment); }
private:
std::string uri_;
UriUriA uriParse_;
bool isValid_;
std::string fromRange(const UriTextRangeA & rng) const
{
return std::string(rng.first, rng.afterLast);
}
std::string fromList(UriPathSegmentA * xs, const std::string & delim) const
{
UriPathSegmentStructA * head(xs);
std::string accum;
while (head)
{
accum += delim + fromRange(head->text);
head = head->next;
}
return accum;
}
};
}

//sudo apt-get install libboost-all-dev; #install boost
//g++ urlregex.cpp -lboost_regex; #compile
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char* argv[])
{
string url="https://www.google.com:443/webhp?gws_rd=ssl#q=cpp";
boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
boost::cmatch what;
if(regex_match(url.c_str(), what, ex))
{
cout << "protocol: " << string(what[1].first, what[1].second) << endl;
cout << "domain: " << string(what[2].first, what[2].second) << endl;
cout << "port: " << string(what[3].first, what[3].second) << endl;
cout << "path: " << string(what[4].first, what[4].second) << endl;
cout << "query: " << string(what[5].first, what[5].second) << endl;
cout << "fragment: " << string(what[6].first, what[6].second) << endl;
}
return 0;
}
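If you would rather not depend on Boost, the same regex approach works with std::regex from the standard library. The sketch below needs C++17 because it returns std::optional. The UrlParts struct and parse_url name are made up for illustration. The pattern is adapted rather than copied: it also excludes '?' and '#' from the domain and port, so a URL such as http://example.com?x=1 does not end up with the query inside the domain. Like the original, it only accepts http and https and does no real validation.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct UrlParts {
    std::string protocol, domain, port, path, query, fragment;
};

// Groups: 1 protocol, 2 domain, 3 port, 4 path, 5 query, 6 fragment.
// Groups that did not participate in the match yield empty strings.
std::optional<UrlParts> parse_url(const std::string& url)
{
    static const std::regex ex(
        "(http|https)://([^/ :?#]+)(?::([^/ ?#]*))?(/[^ #?]*)?(?:\\?([^ #]*))?(?:#([^ ]*))?");
    std::smatch what;
    if (!std::regex_match(url, what, ex))
        return std::nullopt;
    return UrlParts{what[1].str(), what[2].str(), what[3].str(),
                    what[4].str(), what[5].str(), what[6].str()};
}
```

For the URL used above, this gives protocol "https", domain "www.google.com", port "443", path "/webhp", query "gws_rd=ssl" and fragment "q=cpp".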

The Poco library now has a class for dissecting URIs and feeding back the host, path segments, query string, etc.
https://pocoproject.org/pro/docs/Poco.URI.html

Qt has QUrl for this. GNOME has SoupURI in libsoup, which you'll probably find a little more lightweight.

Facebook's Folly library can do the job for you easily. Simply use the Uri class:
#include <folly/Uri.h>
int main() {
folly::Uri uri("https://code.facebook.com/posts/177011135812493/");
uri.scheme(); // https
uri.host(); // code.facebook.com
uri.path(); // /posts/177011135812493/
}

I know this is a very old question, but I've found the following useful:
http://www.zedwood.com/article/cpp-boost-url-regex
It gives 3 examples:
(With Boost)
//sudo apt-get install libboost-all-dev;
//g++ urlregex.cpp -lboost_regex
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using std::string;
using std::cout;
using std::endl;
using std::stringstream;
void parse_url(const string& url) //with boost
{
boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
boost::cmatch what;
if(regex_match(url.c_str(), what, ex))
{
string protocol = string(what[1].first, what[1].second);
string domain = string(what[2].first, what[2].second);
string port = string(what[3].first, what[3].second);
string path = string(what[4].first, what[4].second);
string query = string(what[5].first, what[5].second);
cout << "[" << url << "]" << endl;
cout << protocol << endl;
cout << domain << endl;
cout << port << endl;
cout << path << endl;
cout << query << endl;
cout << "-------------------------------" << endl;
}
}
int main(int argc, char* argv[])
{
parse_url("http://www.google.com");
parse_url("https://mail.google.com/mail/");
parse_url("https://www.google.com:443/webhp?gws_rd=ssl");
return 0;
}
(Without Boost)
#include <string>
#include <iostream>
using std::string;
using std::cout;
using std::endl;
using std::stringstream;
string _trim(const string& str)
{
size_t start = str.find_first_not_of(" \n\r\t");
size_t until = str.find_last_not_of(" \n\r\t");
string::const_iterator i = start==string::npos ? str.begin() : str.begin() + start;
string::const_iterator x = until==string::npos ? str.end() : str.begin() + until+1;
return string(i,x);
}
void parse_url(const string& raw_url) //no boost
{
string path,domain,x,protocol,port,query;
int offset = 0;
size_t pos1,pos2,pos3,pos4;
x = _trim(raw_url);
offset = offset==0 && x.compare(0, 8, "https://")==0 ? 8 : offset;
offset = offset==0 && x.compare(0, 7, "http://" )==0 ? 7 : offset;
pos1 = x.find_first_of('/', offset+1 );
path = pos1==string::npos ? "" : x.substr(pos1);
domain = string( x.begin()+offset, pos1 != string::npos ? x.begin()+pos1 : x.end() );
path = (pos2 = path.find("#"))!=string::npos ? path.substr(0,pos2) : path;
port = (pos3 = domain.find(":"))!=string::npos ? domain.substr(pos3+1) : "";
domain = domain.substr(0, pos3!=string::npos ? pos3 : domain.length());
protocol = offset > 0 ? x.substr(0,offset-3) : "";
query = (pos4 = path.find("?"))!=string::npos ? path.substr(pos4+1) : "";
path = pos4!=string::npos ? path.substr(0,pos4) : path;
cout << "[" << raw_url << "]" << endl;
cout << "protocol: " << protocol << endl;
cout << "domain: " << domain << endl;
cout << "port: " << port << endl;
cout << "path: " << path << endl;
cout << "query: " << query << endl;
}
int main(int argc, char* argv[])
{
parse_url("http://www.google.com");
parse_url("https://mail.google.com/mail/");
parse_url("https://www.google.com:443/webhp?gws_rd=ssl");
return 0;
}
(Different way without Boost)
#include <string>
#include <stdint.h>
#include <cstring>
#include <sstream>
#include <algorithm>
#include <iostream>
using std::cerr; using std::cout; using std::endl;
using std::string;
class HTTPURL
{
private:
string _protocol;// http vs https
string _domain; // mail.google.com
uint16_t _port; // 80,443
string _path; // /mail/
string _query; // [after ?] a=b&c=b
public:
const string &protocol;
const string &domain;
const uint16_t &port;
const string &path;
const string &query;
HTTPURL(const string& url): protocol(_protocol),domain(_domain),port(_port),path(_path),query(_query)
{
string u = _trim(url);
size_t offset=0, slash_pos, hash_pos, colon_pos, qmark_pos;
string urlpath,urldomain,urlport;
uint16_t default_port;
static const char* allowed[] = { "https://", "http://", "ftp://", NULL};
for(int i=0; allowed[i]!=NULL && this->_protocol.length()==0; i++)
{
const char* c=allowed[i];
if (u.compare(0,strlen(c), c)==0) {
offset = strlen(c);
this->_protocol=string(c,0,offset-3);
}
}
default_port = this->_protocol=="https" ? 443 : 80;
slash_pos = u.find_first_of('/', offset+1 );
urlpath = slash_pos==string::npos ? "/" : u.substr(slash_pos);
urldomain = string( u.begin()+offset, slash_pos != string::npos ? u.begin()+slash_pos : u.end() );
urlpath = (hash_pos = urlpath.find("#"))!=string::npos ? urlpath.substr(0,hash_pos) : urlpath;
urlport = (colon_pos = urldomain.find(":"))!=string::npos ? urldomain.substr(colon_pos+1) : "";
urldomain = urldomain.substr(0, colon_pos!=string::npos ? colon_pos : urldomain.length());
this->_domain = _tolower(urldomain);
this->_query = (qmark_pos = urlpath.find("?"))!=string::npos ? urlpath.substr(qmark_pos+1) : "";
this->_path = qmark_pos!=string::npos ? urlpath.substr(0,qmark_pos) : urlpath;
this->_port = urlport.length()==0 ? default_port : _atoi(urlport) ;
};
private:
static inline string _trim(const string& input)
{
string str = input;
size_t endpos = str.find_last_not_of(" \t\n\r");
if( string::npos != endpos )
{
str = str.substr( 0, endpos+1 );
}
size_t startpos = str.find_first_not_of(" \t\n\r");
if( string::npos != startpos )
{
str = str.substr( startpos );
}
return str;
};
static inline string _tolower(const string& input)
{
string str = input;
std::transform(str.begin(), str.end(), str.begin(), ::tolower);
return str;
};
static inline int _atoi(const string& input)
{
int r;
std::stringstream(input) >> r;
return r;
};
};
int main(int argc, char **argv)
{
HTTPURL u("https://Mail.google.com:80/mail/?action=send#action=send");
cout << "protocol: " << u.protocol << endl;
cout << "domain: " << u.domain << endl;
cout << "port: " << u.port << endl;
cout << "path: " << u.path << endl;
cout << "query: " << u.query << endl;
return 0;
}

This library is very tiny and lightweight: https://github.com/corporateshark/LUrlParser
However, it only parses; it does no URL normalization or validation.

Also of interest could be http://code.google.com/p/uri-grammar/, which, like Dean Michael's cpp-netlib, uses Boost.Spirit to parse a URI. I came across it in the question "Simple expression parser example using Boost::Spirit?".

There is the newly released google-url lib:
http://code.google.com/p/google-url/
The library provides a low-level url parsing API as well as a higher-level abstraction called GURL. Here's an example using that:
#include <googleurl/src/gurl.h> // use '/' in include paths; '\' is not portable
#include <cassert>
wchar_t url[] = L"http://www.facebook.com";
GURL parsedUrl (url);
assert(parsedUrl.DomainIs("facebook.com"));
Two small complaints I have with it: (1) it wants to use ICU by default to deal with different string encodings and (2) it makes some assumptions about logging (but I think they can be disabled). In other words, the library is not completely stand-alone as it exists, but I think it's still a good basis to start with, especially if you are already using ICU.

May I offer another self-contained solution based on std::regex:
#include <regex>
#include <string>
const char* SCHEME_REGEX = "((http[s]?)://)?"; // match http or https before the ://
const char* USER_REGEX = "(([^@/:\\s]+)@)?"; // match anything other than @ / : or whitespace before the ending @
const char* HOST_REGEX = "([^@/:\\s]+)"; // mandatory. match anything other than @ / : or whitespace
const char* PORT_REGEX = "(:([0-9]{1,5}))?"; // after the : match 1 to 5 digits
const char* PATH_REGEX = "(/[^:#?\\s]*)?"; // after the / match anything other than : # ? or whitespace
const char* QUERY_REGEX = "(\\?(([^?;&#=]+=[^?;&#=]+)([;&]([^?;&#=]+=[^?;&#=]+))*))?"; // after the ? match any number of x=y pairs, separated by & or ;
const char* FRAGMENT_REGEX = "(#([^#\\s]*))?"; // after the # match anything other than # or whitespace
bool parseUri(const std::string &i_uri) // member function; m_scheme ... m_fragment are std::string members of the class
{
static const std::regex regExpr(std::string("^")
+ SCHEME_REGEX + USER_REGEX
+ HOST_REGEX + PORT_REGEX
+ PATH_REGEX + QUERY_REGEX
+ FRAGMENT_REGEX + "$");
std::smatch matchResults;
if (std::regex_match(i_uri.cbegin(), i_uri.cend(), matchResults, regExpr))
{
m_scheme.assign(matchResults[2].first, matchResults[2].second);
m_user.assign(matchResults[4].first, matchResults[4].second);
m_host.assign(matchResults[5].first, matchResults[5].second);
m_port.assign(matchResults[7].first, matchResults[7].second);
m_path.assign(matchResults[8].first, matchResults[8].second);
m_query.assign(matchResults[10].first, matchResults[10].second);
m_fragment.assign(matchResults[15].first, matchResults[15].second);
return true;
}
return false;
}
I added explanations for each part of the regular expression. This way allows you to choose exactly the relevant parts to parse for the URL that you're expecting to get. Just remember to change the desired regular expression group indices accordingly.
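parseUri above assigns to members (m_scheme, m_user, ...) that are not shown. Here is one way to make it self-contained. The ParsedUri struct and its member names are hypothetical. The pattern is the one from the answer, written with '@' as the user-info separator and "[;&]" as the query-pair separator. The group indices are unchanged: 2, 4, 5, 7, 8, 10 and 15.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the regex from the answer above.
struct ParsedUri {
    std::string scheme, user, host, port, path, query, fragment;

    bool parse(const std::string& uri)
    {
        static const std::regex re(
            "^((http[s]?)://)?"                                          // groups 1, 2: scheme
            "(([^@/:\\s]+)@)?"                                           // groups 3, 4: user
            "([^@/:\\s]+)"                                               // group 5: host
            "(:([0-9]{1,5}))?"                                           // groups 6, 7: port
            "(/[^:#?\\s]*)?"                                             // group 8: path
            "(\\?(([^?;&#=]+=[^?;&#=]+)([;&]([^?;&#=]+=[^?;&#=]+))*))?"  // groups 9-13: query
            "(#([^#\\s]*))?$");                                          // groups 14, 15: fragment
        std::smatch m;
        if (!std::regex_match(uri, m, re))
            return false;
        scheme = m[2].str();   // unmatched optional groups give ""
        user = m[4].str();
        host = m[5].str();
        port = m[7].str();
        path = m[8].str();
        query = m[10].str();
        fragment = m[15].str();
        return true;
    }
};
```

For example, parsing "https://alice@example.com:8080/a/b?x=1&y=2#top" gives user "alice", port "8080", path "/a/b", query "x=1&y=2" and fragment "top".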

A small dependency you can use is uriparser, which recently moved to GitHub.
You can find a minimal example in their code: https://github.com/uriparser/uriparser/blob/63384be4fb8197264c55ff53a135110ecd5bd8c4/tool/uriparse.c
This will be more lightweight than Boost or Poco. The only catch is that it is C.
There is also a Buckaroo package:
buckaroo add github.com/buckaroo-pm/uriparser

I tried a couple of the solutions here, but then decided to write my own that could just be dropped into a project without any external dependencies (other than C++17).
Right now, it passes all tests. But, if you find any cases that don't succeed, please feel free to create a Pull Request or an Issue.
I'll keep it up to date and improve its quality. Suggestions welcome! I'm also trying out this design to only have a single, high-quality class per repository so that the header and source can just be dropped into a project (as opposed to building a library or header-only). It appears to be working out well (I'm using git submodules and symlinks in my own projects).
https://github.com/homer6/url

You could try the open-source library called C++ REST SDK (created by Microsoft, distributed under the Apache License 2.0). It can be built for several platforms, including Windows, Linux, OSX, iOS, and Android. There is a class called web::uri that takes a string and lets you retrieve the individual URL components. Here is a code sample (tested on Windows):
#include <cpprest/base_uri.h>
#include <iostream>
#include <ostream>
web::uri sample_uri( L"http://dummyuser@localhost:7777/dummypath?dummyquery#dummyfragment" );
std::wcout << L"scheme: " << sample_uri.scheme() << std::endl;
std::wcout << L"user: " << sample_uri.user_info() << std::endl;
std::wcout << L"host: " << sample_uri.host() << std::endl;
std::wcout << L"port: " << sample_uri.port() << std::endl;
std::wcout << L"path: " << sample_uri.path() << std::endl;
std::wcout << L"query: " << sample_uri.query() << std::endl;
std::wcout << L"fragment: " << sample_uri.fragment() << std::endl;
The output will be:
scheme: http
user: dummyuser
host: localhost
port: 7777
path: /dummypath
query: dummyquery
fragment: dummyfragment
There are also other easy-to-use methods, e.g. to access individual attribute/value pairs from the query, split the path into components, etc.

If you use oatpp for web request handling, you can find its built-in URL parsing useful:
std::string url = /* ... */;
oatpp::String oatUrl(url.c_str(), url.size(), false);
oatpp::String oatHost = oatpp::network::Url::Parser::parseUrl(oatUrl).authority.host->toLowerCase();
std::string host(oatHost->c_str(), oatHost->getSize());
The above snippet retrieves the hostname. In a similar way:
oatpp::network::Url parsedUrl = oatpp::network::Url::Parser::parseUrl(oatUrl);
// parsedUrl.authority.port
// parsedUrl.path
// parsedUrl.scheme
// parsedUrl.queryParams

There is yet another library, https://snapwebsites.org/project/libtld, which handles all possible top-level domains and URI schemes.

I have developed an "object oriented" solution: one C++ class that works with one regex, like the solutions from Mr.Jones and velcrow. My Url class performs url/uri 'parsing'.
I think I improved velcrow's regex, making it more robust and adding the username part.
What follows is the first version of my idea; I have released the same code, improved, in my GPL3-licensed open source project Cpp URL Parser.
Url.h follows (include guards omitted):
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
class Url {
public:
boost::regex ex;
string rawUrl;
string username;
string protocol;
string domain;
string port;
string path;
string query;
string fragment;
Url();
Url(string &rawUrl);
Url &update(string &rawUrl);
};
This is the code of the Url.cpp implementation file:
#include "Url.h"
Url::Url() {
this -> ex = boost::regex("(ssh|sftp|ftp|smb|http|https):\\/\\/(?:([^@ ]*)@)?([^:?# ]+)(?::(\\d+))?([^?# ]*)(?:\\?([^# ]*))?(?:#([^ ]*))?");
}
Url::Url(string &rawUrl) : Url() {
this->rawUrl = rawUrl;
this->update(this->rawUrl);
}
Url &Url::update(string &rawUrl) {
this->rawUrl = rawUrl;
boost::cmatch what;
if (regex_match(rawUrl.c_str(), what, ex)) {
this -> protocol = string(what[1].first, what[1].second);
this -> username = string(what[2].first, what[2].second);
this -> domain = string(what[3].first, what[3].second);
this -> port = string(what[4].first, what[4].second);
this -> path = string(what[5].first, what[5].second);
this -> query = string(what[6].first, what[6].second);
this -> fragment = string(what[7].first, what[7].second);
}
return *this;
}
Usage example:
string urlString = "http://gino@ciao.it:67/ciao?roba=ciao#34";
Url url(urlString); // no need for new (which would also leak here)
std::cout << " username: " << url.username << " URL domain: " << url.domain;
std::cout << " port: " << url.port << " protocol: " << url.protocol;
You can also update the Url object to represent (and parse) another URL:
string otherUrl = "http://gino@nuovociao.it:68/nuovociao?roba=ciaoooo#";
url.update(otherUrl); // update() takes a non-const string&, so pass a named string
I'm not a full-time C++ developer, so I'm not sure I followed C++ best practices 100%.
Any tips are appreciated.
P.S.: take a look at Cpp URL Parser; there are refinements there.
Have fun

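A simple solution to get the protocol, host, and path:
#include <string>
// returns 0 and fills the output parameters on success, or -1 if there is no "://"
int url_get(const std::string& uri, std::string& protocol, std::string& host, std::string& path)
{
//parse URI
std::size_t start = uri.find("://", 0);
if (start == std::string::npos)
{
return -1;
}
start += 3; //"://"
std::size_t end = uri.find("/", start + 1);
protocol = uri.substr(0, start - 3);
host = uri.substr(start, end - start); // if there is no '/', end is npos and substr takes the rest
path = (end == std::string::npos) ? std::string() : uri.substr(end); // substr(npos) would throw
return 0;
}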

Related

How do I normalize a filepath in C++ using std::filesystem::path?

I am trying to convert a path string to a normalized (neat) format where any number of directory separators "\\" or "/" is converted to one default directory separator:
R"(C:\\temp\\Recordings/test)" -> R"(C:\temp\Recordings\test)"
Code:
#include <string>
#include <vector>
#include <iostream>
#include <filesystem>
std::string normalizePath(const std::string& messyPath) {
std::filesystem::path path(messyPath);
std::string npath = path.make_preferred().string();
return npath;
}
int main()
{
std::vector<std::string> messyPaths = { R"(C:\\temp\\Recordings/test)", R"(C://temp\\Recordings////test)" };
std::string desiredPath = R"(C:\temp\Recordings\test)";
for (auto messyPath : messyPaths) {
std::string normalizedPath = normalizePath(messyPath);
if (normalizedPath != desiredPath) {
std::cout << "normalizedPath: " << normalizedPath << " != " << desiredPath << std::endl;
}
}
std::cout << "Press any key to continue.\n";
int k;
std::cin >> k;
}
Output on Windows VS2019 x64:
normalizedPath: C:\\temp\\Recordings\test != C:\temp\Recordings\test
normalizedPath: C:\\temp\\Recordings\\\\test != C:\temp\Recordings\test
Reading the std::filesystem::path documentation:
A path can be normalized by following this algorithm:
1. If the path is empty, stop (normal form of an empty path is an empty path)
2. Replace each directory-separator (which may consist of multiple slashes) with a single path::preferred_separator.
...
Great, but which library function does this? I do not want to code this myself.

As answered by bolov:
std::string normalizePath(const std::string& messyPath) {
std::filesystem::path path(messyPath);
std::filesystem::path canonicalPath = std::filesystem::weakly_canonical(path);
std::string npath = canonicalPath.make_preferred().string();
return npath;
}
weakly_canonical does not throw an exception if path does not exist.
canonical does.
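The path may not need to exist, and you may not want the filesystem consulted at all. In that case, std::filesystem::path::lexically_normal() applies the same normalization algorithm purely as text processing: it collapses repeated separators and removes "." and "dir/.." pairs. Unlike weakly_canonical, it does not resolve symbolic links. Below is a minimal sketch (the function name normalizePathLexically is made up). On Windows, where both '/' and '\' are separators, it should turn the inputs from the question into C:\temp\Recordings\test. On POSIX, a backslash is an ordinary filename character and is left alone.

```cpp
#include <filesystem>
#include <string>

// Normalizes a path without touching the filesystem (the path need not exist).
// lexically_normal() collapses repeated separators and resolves "." and "..";
// make_preferred() then converts separators to path::preferred_separator
// (a no-op on POSIX, where that is already '/').
std::string normalizePathLexically(const std::string& messyPath)
{
    std::filesystem::path path(messyPath);
    return path.lexically_normal().make_preferred().string();
}
```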

Why is my string extraction function using back referencing in regex not working as intended?

Extraction Function
string extractStr(string str, string regExpStr) {
regex regexp(regExpStr);
smatch m;
regex_search(str, m, regexp);
string result = "";
for (string x : m)
result = result + x;
return result;
}
The Main Code
#include <iostream>
#include <regex>
using namespace std;
string extractStr(string, string);
int main(void) {
string test = "(1+1)*(n+n)";
cout << extractStr(test, "n\\+n") << endl;
cout << extractStr(test, "(\\d)\\+\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]([a-zA-Z])") << endl;
return 0;
}
The Output
String = (1+1)*(n+n)
n\+n = n+n
(\d)\+\1 = 1+11
([a-zA-Z])[+-/*]\1 = n+nn
([a-zA-Z])[+-/*]([a-zA-Z]) = n+nnn
If anyone could kindly point the error I've done or point me to a similar question in SO that I've missed while searching, it would be greatly appreciated.

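The back-reference itself is fine; the problem is the loop over m. A std::smatch holds every sub-match: m[0] is the whole match, followed by one entry per capture group. "for (string x : m)" concatenates all of them, which is why the captured groups show up appended to the match ("1+1" plus group 1 "1" gives "1+11"). Use m[0] for the whole match, and loop with regex_search if you want to find multiple matches. I also have some C++ tips in here (constness and references).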
#include <cassert>
#include <iostream>
#include <sstream>
#include <regex>
#include <string>
// using namespace std; don't do this!
// https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice
// pass strings by const reference
// 1. const, you promise not to change them in this function
// 2. by reference, you avoid making copies
std::string extractStr(const std::string& str, const std::string& regExpStr)
{
std::regex regexp(regExpStr);
std::smatch m;
std::ostringstream os; // collects the matches into one string
auto begin = str.cbegin();
bool comma = false;
// regex_search finds one match at a time, so loop to collect every match
while (std::regex_search(begin, str.end(), m, regexp))
{
if (comma) os << ", ";
os << m[0];
comma = true;
begin = m.suffix().first;
}
return os.str();
}
// small helper function to produce nicer output for your tests.
void test(const std::string& input, const std::string& regex, const std::string& expected)
{
auto output = extractStr(input, regex);
if (output == expected)
{
std::cout << "test succeeded : output = " << output << "\n";
}
else
{
std::cout << "test failed : output = " << output << ", expected : " << expected << "\n";
}
}
int main(void)
{
std::string input = "(1+1)*(n+n)";
test(input, "n\\+n", "n+n");
test(input, "(\\d)\\+\\1", "1+1");
test(input, "([a-zA-Z])[+-/*]\\1", "n+n");
return 0;
}
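To see where the extra characters in the question's output come from, this small sketch compares two things. concatAllSubmatches concatenates every element of the match results, as the question's loop does. wholeMatch returns only m[0]. Both function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// Same as the question's "for (string x : m)" loop: appends m[0] (the whole
// match) and then every capture group m[1], m[2], ... in order.
std::string concatAllSubmatches(const std::string& str, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    std::string result;
    if (std::regex_search(str, m, re))
        for (const auto& sub : m)
            result += sub.str();
    return result;
}

// Returns only the whole match m[0], or an empty string if nothing matched.
std::string wholeMatch(const std::string& str, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(str, m, re) ? m.str(0) : std::string();
}
```

For "(1+1)*(n+n)" and the pattern "(\\d)\\+\\1", the first function reproduces the question's "1+11", while the second returns "1+1".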

Search partial filenames in C++ using boost filesystem

The question is simple: I want to find a file path inside a directory, but I only have part of the filename. Here are the functions for this task:
void getfiles(const fs::path& root, const string& ext, vector<fs::path>& ret)
{
if(!fs::exists(root) || !fs::is_directory(root)) return;
fs::recursive_directory_iterator it(root);
fs::recursive_directory_iterator endit;
while(it != endit)
{
if(fs::is_regular_file(*it)&&it->path().extension()==ext) ret.push_back(it->path());//
++it;
}
}
bool find_file(const filesystem::path& dir_path, const filesystem::path file_name, filesystem::path& path_found) {
const fs::recursive_directory_iterator end;
const auto it = find_if(fs::recursive_directory_iterator(dir_path), end,
[file_name](fs::path e) {
cerr<<boost::algorithm::icontains(e.filename().native() ,file_name.native())<<endl;
return boost::algorithm::icontains(e.filename().native() ,file_name.native());//
});
if (it == end) {
return false;
} else {
path_found = it->path();
return true;
}
}
int main (int argc, char* argv[])
{
vector<fs::path> inputClass ;
fs::path textFiles,datasetPath,imgpath;
textFiles=argv[1];
datasetPath=argv[2];
getfiles(textFiles,".txt",inputClass);
for (int i=0;i<inputClass.size();i++)
{
ifstream lblFile(inputClass[i].string().c_str());
string line;
fs::path classname=inputClass[i].parent_path()/inputClass[i].stem().string();
cerr<<classname.stem()<<endl;
while (getline(lblFile,line))
{
bool find=find_file(datasetPath,line,imgpath);
if (find)
{
while(!fs::exists(classname))
fs::create_directories (classname);
fs::copy(imgpath,classname/imgpath.filename());
cerr<<"Found\n";
}
else
cerr<<"Not Found \n";
}
lblFile.close();
}
}
Console output:
"490"
vfv343434.jpeg||E9408000EC0
0
fsdfdsfdfsf.jpeg||E9408000EC0
0
1200E9408000EC0.jpeg||E9408000EC0
0
Not Found
But when I set the search string manually, it works fine! I tried other ways of searching for the string, like std::find, but all of them fail to find the substring. It seems there is a problem with the input string (line), but I printed all the characters and there are no special characters or anything.
If I set the search string manually, it works as desired:
string search="E9408000EC0";
cerr<<e.filename().native()<<"||"<<search<<endl;
cerr<<boost::algorithm::icontains(e.filename().native() ,search)<<endl;
With that change, the results are:
"490"
vfv343434.jpeg||E9408000EC0
0
fsdfdsfdfsf.jpeg||E9408000EC0
0
1200E9408000EC0.jpeg||E9408000EC0
1
Found

I cannot reproduce this.
The only hunch I have is that on your platform, perhaps the string() accessor is not returning the plain string, but e.g. the quoted path. That would break the search. Consider using the native() accessor instead.
(In fact, since file_name is NOT a path but a string pattern, I suggest passing the argument as std::string_view or similar instead.)
#include <boost/filesystem.hpp>
#include <boost/algorithm/string.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
namespace fs = boost::filesystem;
template <typename Out>
void find_file(const fs::path& dir_path, const fs::path file_name, Out out) {
fs::recursive_directory_iterator it(dir_path), end;
std::copy_if(it, end, out, [file_name](fs::path e) {
return boost::algorithm::icontains(e.filename().native(),
file_name.native());
});
}
int main() {
fs::path d = "a/b/c/e";
fs::create_directories(d);
{
std::ofstream ofs(d / "1200E9408000EC0.jpeg");
}
std::cout << fs::path("000EC0").native() << "\n";
std::vector<fs::path> found;
find_file(".", "000EC0", back_inserter(found));
for (auto &f : found)
{
std::cout << "Found: " << f << "\n";
}
}
Prints
000EC0
Found: "./a/b/c/e/1200E9408000EC0.jpeg"
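Following the std::string_view suggestion above, here is a standard-library-only stand-in for boost::algorithm::icontains. It is a sketch: the name icontains_ascii is made up, and case folding uses std::tolower, so it is ASCII-oriented rather than a full Unicode comparison. It also illustrates a likely cause of the symptom in the question. A pattern read from a file can carry an invisible trailing '\r' (Windows line endings) or a space, and then no filename matches, which is why trimming the input line matters.

```cpp
#include <algorithm>
#include <cctype>
#include <string_view>

// Case-insensitive "does haystack contain needle?" (ASCII case folding only).
bool icontains_ascii(std::string_view haystack, std::string_view needle)
{
    if (needle.empty())
        return true; // an empty pattern is contained in every string
    auto eq = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(), eq) != haystack.end();
}
```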
UPDATE: Code Review
For the updated question, I came up with a somewhat improved tester that works the same with boost::filesystem and with std::filesystem.
There are many small improvements (removing repetition, explicit conversions, using optional to return optional matches, etc.).
I also added a whitespace trim to avoid choking on extraneous whitespace in the input lines:
#include <boost/algorithm/string.hpp>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <optional>
#include <string>
#include <vector>
using boost::algorithm::icontains;
using boost::algorithm::trim;
#if defined(USE_BOOST_FS)
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
using boost::system::error_code;
#else
#include <filesystem>
namespace fs = std::filesystem;
using std::error_code;
#endif
void getfiles(
const fs::path& root, const std::string& ext, std::vector<fs::path>& ret)
{
if (!exists(root) || !is_directory(root))
return;
for (fs::recursive_directory_iterator it(root), endit; it != endit; ++it) {
if (is_regular_file(*it) && it->path().extension() == ext)
ret.push_back(it->path()); //
}
}
std::optional<fs::path> find_file(const fs::path& dir_path, fs::path partial)
{
fs::recursive_directory_iterator end,
it = fs::recursive_directory_iterator(dir_path);
it = std::find_if(it, end, [partial](fs::path e) {
auto search = partial.native();
//std::cerr << e.filename().native() << "||" << search << std::endl;
auto matches = icontains(e.filename().native(), search);
std::cerr << e << " Matches: " << std::boolalpha << matches
<< std::endl;
return matches;
});
return (it != end)
? std::make_optional(it->path())
: std::nullopt;
}
auto readInputClass(fs::path const& textFiles)
{
std::vector<fs::path> found;
getfiles(textFiles, ".txt", found);
return found;
}
int main(int argc, char** argv)
{
std::vector<std::string> const args(argv, argv + argc);
auto const textFiles = readInputClass(args.at(1));
std::string const datasetPath = args.at(2);
for (fs::path classname : textFiles) {
// open the text file
std::ifstream lblFile(classname);
// use base without extension as output directory
classname.replace_extension();
if (!fs::exists(classname)) {
if (fs::create_directories(classname))
std::cerr << classname << " created" << std::endl;
}
for (std::string line; getline(lblFile, line);) {
trim(line);
if (auto found = find_file(datasetPath, line)) {
auto dest = classname / found->filename();
error_code ec;
copy(*found, dest, ec);
std::cerr << dest << " (" << ec.message() << ")\n";
} else {
std::cerr << "Not Found \n";
}
}
}
}
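If you would rather not depend on Boost just for icontains, a rough std-only stand-in can be sketched with std::search and a case-folding comparison. This is a hypothetical helper (icontains_ascii is not part of any library), and it assumes single-byte file names: std::tolower does not fold multi-byte UTF-8 characters.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, a rough stand-in for boost::algorithm::icontains:
// does `haystack` contain `needle`, ignoring case?
// Assumes single-byte characters; multi-byte UTF-8 is not case-folded.
bool icontains_ascii(const std::string& haystack, const std::string& needle)
{
    if (needle.empty())
        return true;  // an empty pattern is contained in every string
    auto same_ignoring_case = [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    };
    // std::search returns haystack.end() when needle does not occur
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end(),
                       same_ignoring_case) != haystack.end();
}
```

As in the tester, the input line should still be trimmed first, since a trailing space in the pattern would otherwise prevent a match.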
Testing from scratch with
mkdir -pv textfiles dataset
touch dataset/{vfv343434,fsdfdsfdfsf,1200E9408000EC0}.jpeg
echo 'E9408000EC0 ' > textfiles/490.txt
Running
./a.out textfiles/ dataset/
Prints
"textfiles/490" created
"dataset/1200E9408000EC0.jpeg" Matches: true
"textfiles/490/1200E9408000EC0.jpeg" (Success)
Or, on a subsequent run:
"dataset/fsdfdsfdfsf.jpeg" Matches: false
"dataset/1200E9408000EC0.jpeg" Matches: true
"textfiles/490/1200E9408000EC0.jpeg" (File exists)
BONUS
Adding some more diagnostics and avoiding a repeated traversal of the filesystem for each pattern, the main program is now:
int main(int argc, char** argv)
{
std::vector<std::string> const args(argv, argv + argc);
Paths const classes = getfiles(args.at(1), ".txt");
Mappings map = readClassMappings(classes);
std::cout << "Procesing " << map.size() << " patterns from "
<< classes.size() << " classes" << std::endl;
processDatasetDir(args.at(2), map);
}
And the remaining functions are implemented as:
// be smart about case-insensitive patterns
// (in addition to the includes above, this needs <compare>, <iomanip> for std::quoted, and <set>)
struct Pattern : std::string {
using std::string::string;
using std::string::operator=;
#ifdef __cpp_lib_three_way_comparison
std::weak_ordering operator<=>(Pattern const& other) const {
if (boost::ilexicographical_compare(*this, other)) {
return std::weak_ordering::less;
} else if (boost::ilexicographical_compare(other, *this)) {
return std::weak_ordering::greater;
}
return std::weak_ordering::equivalent;
}
#else
bool operator<(Pattern const& other) const {
return boost::ilexicographical_compare(*this, other);
}
#endif
};
using Paths = std::vector<fs::path>;
using Mapping = std::pair<Pattern, fs::path>;
using Patterns = std::set<Pattern>;
using Mappings = std::set<Mapping>;
Mappings readClassMappings(Paths const& classes)
{
Mappings mappings;
for (fs::path classname : classes) {
std::ifstream lblFile(classname);
classname.replace_extension();
for (Pattern pattern; getline(lblFile, pattern);) {
trim(pattern);
if (auto [it, ok] = mappings.emplace(pattern, classname); !ok) {
std::cerr << "WARNING: " << std::quoted(pattern)
<< " duplicates " << std::quoted(it->first)
<< std::endl;
}
}
}
return mappings;
}
size_t processDatasetDir(const fs::path& datasetPath, Mappings const& patterns)
{
size_t copied = 0, failed = 0;
Patterns found;
using It = fs::recursive_directory_iterator;
for (It it = It(datasetPath), end; it != end; ++it) {
if (!it->is_regular_file())
continue;
fs::path const& entry = *it;
for (auto& [pattern, location]: patterns) {
if (icontains(it->path().filename().native(), pattern)) {
found.emplace(pattern);
if (!exists(location) && fs::create_directories(location))
std::cerr << location << " created" << std::endl;
auto dest = location / entry.filename();
error_code ec;
copy(entry, dest, ec);
std::cerr << dest << " (" << ec.message() << ") from "
<< std::quoted(pattern) << "\n";
(ec? failed : copied) += 1;
}
}
}
std::cout << "Copied:" << copied
<< ", missing:" << patterns.size() - found.size()
<< ", failed: " << failed << std::endl;
return copied;
}
With some more "random" test data:
mkdir -pv textfiles dataset
touch dataset/{vfv343434,fsdfdsfdfsf,1200E9408000EC0}.jpeg
echo .jPeg > textfiles/all_of_them.txt
echo $'E9408000EC0 \n e9408000ec0\nE9408\nbOgUs' > textfiles/490.txt
Running as
./a.out textfiles/ dataset/
Prints:
WARNING: "e9408000ec0" duplicates "E9408000EC0"
Procesing 4 patterns from 2 classes
"textfiles/all_of_them" created
"textfiles/all_of_them/1200E9408000EC0.jpeg" (Success) from ".jPeg"
"textfiles/490" created
"textfiles/490/1200E9408000EC0.jpeg" (Success) from "E9408"
"textfiles/490/1200E9408000EC0.jpeg" (File exists) from "E9408000EC0"
"textfiles/all_of_them/vfv343434.jpeg" (Success) from ".jPeg"
"textfiles/all_of_them/fsdfdsfdfsf.jpeg" (Success) from ".jPeg"
Copied:4, missing:1, failed: 1
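The WARNING line comes from std::set treating two keys as equivalent when neither compares less than the other. Here is a std-only sketch of the same effect, using a hypothetical comparator ILess (ASCII-only case folding) in place of boost::ilexicographical_compare:

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparator: case-insensitive strict weak ordering (ASCII only).
struct ILess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](char x, char y) {
                return std::tolower(static_cast<unsigned char>(x)) <
                       std::tolower(static_cast<unsigned char>(y));
            });
    }
};

// A set keyed this way keeps only the first spelling of each pattern:
// inserting "e9408000ec0" after "E9408000EC0" fails, analogous to the
// mappings.emplace(...) call that prints the WARNING above.
using PatternSet = std::set<std::string, ILess>;
```

The comparator must be a consistent ordering in both directions; a comparison that answers "less" for both a<b and b<a would break the set's invariants.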

Passing variable (array type) from function to "main" scope Type: std::tr1::match_results<std::string::const_iterator>

I would like to pass a variable from a function back to main, which calls it. I'm trying to do it the way I used to in C, but it returns nothing.
I want to be able to output it and work with it after the function returns.
#include "StdAfx.h"
#include <regex>
#include <iostream>
#include <string>
#include <conio.h>
using namespace std;
std::tr1::match_results<std::string::const_iterator> match(std::string& regex, const std::string& ip,std::tr1::match_results<std::string::const_iterator> res)
{
const std::tr1::regex pattern(regex.c_str());
bool valid = std::tr1::regex_match(ip, res, pattern);
std::cout << ip << " \t: " << (valid ? "valid" : "invalid") << std::endl;
cout << "FIRST RES FOUND: " << res[1] << endl;
return res;
}
int main()
{
string regex = "(\\d{1,3}):(\\d{1,3}):(\\d{1,3}):(\\d{1,3})";
string ip = "49:22:33:444";
std::tr1::match_results<std::string::const_iterator> res;
match(regex,ip.c_str(), res);
cout << "Result >" << res[1] << "< " << endl;
_getch(); return 0;
}
When I compile and run, the output is:
FIRST RES FOUND: 49
Result ><
It's probably a really simple fix, but what do I have to do so that main can read it correctly, i.e. print "Result >49<"?
Thanks in advance. :)

Option 1: Use references:
void match(string& regex, const string& ip, tr1::match_results<string::const_iterator> & res)
{
const tr1::regex pattern(regex.c_str());
bool valid = tr1::regex_match(ip, res, pattern);
cout << ip << " \t: " << (valid ? "valid" : "invalid") << endl;
cout << "FIRST RES FOUND: " << res[1] << endl;
}
Option 2: Return the result by value and store it:
tr1::match_results<string::const_iterator> match(string& regex, const string& ip)
{
tr1::match_results<string::const_iterator> res;
// ...
return res;
}
int main()
{
// ...
tr1::match_results<string::const_iterator> res = match(regex, ip);
}
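Putting Option 2 together with C++11 <regex> (rather than std::tr1), a minimal sketch might look like this; the function name match_ip is hypothetical. Note that the caller's ip string must outlive the returned std::smatch, because the match stores iterators into it.

```cpp
#include <regex>
#include <string>

// Hypothetical rewrite of the question's match() using Option 2:
// return the match results by value instead of filling a by-value parameter.
// The returned smatch refers into `ip`, so `ip` must stay alive while it is used.
std::smatch match_ip(const std::string& pattern_text, const std::string& ip)
{
    const std::regex pattern(pattern_text);
    std::smatch res;
    std::regex_match(ip, res, pattern);  // res is empty if there is no match
    return res;
}
```

With that, main can do std::smatch res = match_ip(regex, ip); followed by cout << "Result >" << res[1] << "<", which prints Result >49< for the question's input.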
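On a separate note, there is no need for all the c_str() calls, as <regex> has a perfectly functional std::string interface; check the documentation for details, you just have to get a couple of typenames right. In the question's main, match(regex, ip.c_str(), res) is actually harmful: it constructs a temporary std::string for the ip parameter, and the match results hold iterators into that temporary, which dangle as soon as the call returns. Pass ip itself.
Edit: Here are some basic examples of using std::string. There are equivalent constructions for std::wstring, char* and wchar_t*, but std::string should be the most useful one.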
Since <regex> support is still patchy, you should consider the TR1 and Boost alternatives, too; I provide all three and you can pick one:
#include <iostream>
#include <iterator> // std::distance
#include <regex> // or <tr1/regex>, or <boost/regex.hpp>
#include <string>
// Pick exactly one alias, matching the header above:
namespace ns = std; // for <regex>
// namespace ns = std::tr1; // for <tr1/regex>
// namespace ns = boost; // for <boost/regex.hpp>
int main(int argc, char* argv[])
{
if (argc < 2)
return 1;
ns::regex r(""); // put your pattern here
ns::smatch rxres; // 's' for 'string'
std::string data = argv[1]; // the data to be matched
// Fun #1: Search once
if (!ns::regex_search(data, rxres, r))
{
std::cout << "No match." << std::endl;
return 0;
}
// Fun #2: Iterate over all matches
ns::sregex_iterator rt(data.begin(), data.end(), r), rend;
for ( ; rt != rend; ++rt)
{
// *rt is the entire match object
for (auto it = rt->begin(), end = rt->end(); it != end; ++it)
{
// *it is the current capture group; the first one is the entire match
std::cout << " Match[" << std::distance(rt->begin(), it) << "]: " << *it << ", length " << it->length() << std::endl;
}
}
}
Don't forget to handle exceptions of type ns::regex_error.
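A sketch of what that handling might look like with std::regex, for a pattern that comes from user input. The helper name try_compile is hypothetical; boost::regex_error and std::tr1::regex_error play the same role in the other two flavors.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compile a user-supplied pattern, reporting a bad
// pattern through `error` instead of letting std::regex_error escape.
bool try_compile(const std::string& pattern, std::regex& out, std::string& error)
{
    try {
        out = std::regex(pattern);
        return true;
    } catch (const std::regex_error& e) {
        error = e.what();  // e.code() gives the std::regex_constants::error_type
        return false;
    }
}
```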

Pass in res by reference instead of by value. In other words, declare the parameter res as a reference instead of a value, i.e., type &res, not type res.

How do I check if a C++ std::string starts with a certain string, and convert a substring to an int?

How do I implement the following (Python pseudocode) in C++?
if argv[1].startswith('--foo='):
foo_value = int(argv[1][len('--foo='):])
(For example, if argv[1] is --foo=98, then foo_value is 98.)
Update: I'm hesitant to look into Boost, since I'm just looking at making a very small change to a simple little command-line tool (I'd rather not have to learn how to link in and use Boost for a minor change).

Use the rfind overload that takes a search position pos, and pass zero for it:
std::string s = "tititoto";
if (s.rfind("titi", 0) == 0) { // pos=0 limits the search to the prefix
// s starts with prefix
}
Who needs anything else? Pure STL!
Many have misread this to mean "search backwards through the whole string looking for the prefix". That would give the wrong result (e.g. string("tititito").rfind("titi") returns 2 so when compared against == 0 would return false) and it would be inefficient (looking through the whole string instead of just the start). But it does not do that because it passes the pos parameter as 0, which limits the search to only match at that position or earlier. For example:
std::string test = "0123123";
size_t match1 = test.rfind("123"); // returns 4 (rightmost match)
size_t match2 = test.rfind("123", 2); // returns 1 (skipped over later match)
size_t match3 = test.rfind("123", 0); // returns std::string::npos (i.e. not found)
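Applied to the original question, the rfind check combines naturally with std::stoi. A sketch, assuming C++17 for std::optional; parse_prefixed_int is a hypothetical name, and rejecting trailing characters via stoi's pos argument is a design choice that mirrors Python's int(), which also refuses "98abc":

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper mirroring the question's Python:
//   if arg.startswith(prefix): value = int(arg[len(prefix):])
// Returns std::nullopt when the prefix is absent or the rest is not a valid int.
std::optional<int> parse_prefixed_int(const std::string& arg, const std::string& prefix)
{
    if (arg.rfind(prefix, 0) != 0)  // pos = 0: only a match at the very start counts
        return std::nullopt;
    const std::string rest = arg.substr(prefix.size());
    try {
        std::size_t used = 0;
        const int value = std::stoi(rest, &used);
        if (used != rest.size())  // reject trailing garbage such as "98abc"
            return std::nullopt;
        return value;
    } catch (const std::invalid_argument&) {  // no digits at all
        return std::nullopt;
    } catch (const std::out_of_range&) {      // does not fit in an int
        return std::nullopt;
    }
}
```

Typical use: if (auto v = parse_prefixed_int(argv[1], "--foo=")) foo_value = *v;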

You would do it like this:
std::string prefix("--foo=");
if (!arg.compare(0, prefix.size(), prefix))
foo_value = std::stoi(arg.substr(prefix.size()));
Looking for a library such as Boost.Program_options that does this for you is also a good idea.

Just for completeness, I will mention the C way to do it:
If str is your original string and substr is the substring you want to check, then
strncmp(str, substr, strlen(substr))
will return 0 if str starts with substr. The functions strncmp and strlen are in the C header file <string.h>.
(originally posted by Yaseen Rauf here, markup added)
For a case-insensitive comparison, use strnicmp/_strnicmp (Windows) or strncasecmp (POSIX, declared in <strings.h>) instead of strncmp.
That is the C way to do it; for C++ strings you can use the same function like this:
strncmp(str.c_str(), substr.c_str(), substr.size())
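To finish the job in the same C style, strtol is a better fit than atoi, because its end pointer reveals a missing number or trailing characters and errno reports overflow. A sketch using only C library functions called from C++; parse_foo_c is a hypothetical name:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: strncmp for the prefix test, then strtol instead of
// atoi so that a missing number, trailing characters and overflow are all
// rejected. On success stores the number in *out and returns true.
bool parse_foo_c(const char* arg, const char* prefix, int* out)
{
    const std::size_t n = std::strlen(prefix);
    if (std::strncmp(arg, prefix, n) != 0)
        return false;                   // arg does not start with prefix
    const char* digits = arg + n;
    char* end = nullptr;
    errno = 0;
    const long v = std::strtol(digits, &end, 10);
    if (end == digits || *end != '\0')  // no digits, or trailing garbage
        return false;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;                   // does not fit in an int
    *out = static_cast<int>(v);
    return true;
}
```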

If you're already using Boost, you can do it with boost string algorithms + boost lexical cast:
#include <boost/algorithm/string/predicate.hpp>
#include <boost/lexical_cast.hpp>
try {
if (boost::starts_with(argv[1], "--foo="))
foo_value = boost::lexical_cast<int>(argv[1]+6);
} catch (const boost::bad_lexical_cast&) {
// bad parameter
}
This kind of approach, like many of the other answers here, is OK for very simple tasks, but in the long run you are usually better off using a command-line parsing library. Boost has one (Boost.Program_options), which may make sense if you happen to be using Boost already.
Otherwise, a search for "c++ command line parser" will yield a number of options.

Code I use myself:
std::string prefix = "-param=";
std::string argument = argv[1];
if(argument.substr(0, prefix.size()) == prefix) {
std::string argumentValue = argument.substr(prefix.size());
}

Nobody used the STL algorithm/mismatch function yet. If this returns true, prefix is a prefix of 'toCheck':
std::mismatch(prefix.begin(), prefix.end(), toCheck.begin()).first == prefix.end()
Full example prog:
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, char** argv) {
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "Will print true if 'prefix' is a prefix of string" << std::endl;
return -1;
}
std::string prefix(argv[1]);
std::string toCheck(argv[2]);
if (prefix.length() > toCheck.length()) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "'prefix' is longer than 'string'" << std::endl;
return 2;
}
if (std::mismatch(prefix.begin(), prefix.end(), toCheck.begin()).first == prefix.end()) {
std::cout << '"' << prefix << '"' << " is a prefix of " << '"' << toCheck << '"' << std::endl;
return 0;
} else {
std::cout << '"' << prefix << '"' << " is NOT a prefix of " << '"' << toCheck << '"' << std::endl;
return 1;
}
}
Edit:
As @James T. Huggett suggests, std::equal is a better fit for the question "Is A a prefix of B?" and is slightly shorter code:
std::equal(prefix.begin(), prefix.end(), toCheck.begin())
Full example prog:
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, char **argv) {
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "Will print true if 'prefix' is a prefix of string"
<< std::endl;
return -1;
}
std::string prefix(argv[1]);
std::string toCheck(argv[2]);
if (prefix.length() > toCheck.length()) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "'prefix' is longer than 'string'" << std::endl;
return 2;
}
if (std::equal(prefix.begin(), prefix.end(), toCheck.begin())) {
std::cout << '"' << prefix << '"' << " is a prefix of " << '"' << toCheck
<< '"' << std::endl;
return 0;
} else {
std::cout << '"' << prefix << '"' << " is NOT a prefix of " << '"'
<< toCheck << '"' << std::endl;
return 1;
}
}
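The three-iterator std::mismatch and std::equal calls above are only safe because of the length check in front of them: without it they would read past the end of toCheck whenever prefix is longer. Since C++14, the four-iterator overload of std::mismatch stops at the end of the shorter range, so the check can be folded in. A sketch (has_prefix is a hypothetical name):

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: prefix test without a separate length check.
// The four-iterator std::mismatch (C++14) never reads past the end of `s`.
bool has_prefix(const std::string& s, const std::string& prefix)
{
    return std::mismatch(prefix.begin(), prefix.end(),
                         s.begin(), s.end()).first == prefix.end();
}
```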

With C++17 you can use std::basic_string_view, and with C++20 std::basic_string::starts_with or std::basic_string_view::starts_with.
The benefit of std::string_view compared to std::string, regarding memory management, is that it only holds a pointer to a "string" (a contiguous sequence of char-like objects) and knows its size. Here is an example that avoids moving or copying the source strings just to get the integer value (std::stoi itself still builds a temporary std::string from the const char* it is given):
#include <exception>
#include <iostream>
#include <string>
#include <string_view>
int main()
{
constexpr auto argument = "--foo=42"; // Emulating command argument.
constexpr auto prefix = "--foo=";
auto inputValue = 0;
constexpr auto argumentView = std::string_view(argument);
if (argumentView.starts_with(prefix))
{
constexpr auto prefixSize = std::string_view(prefix).size();
try
{
// The underlying data of argumentView is nul-terminated, therefore we can use data().
inputValue = std::stoi(argumentView.substr(prefixSize).data());
}
catch (std::exception & e)
{
std::cerr << e.what();
}
}
std::cout << inputValue; // 42
}
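To parse without constructing any std::string at all, std::from_chars (C++17, <charconv>) can read the number straight out of the string_view. A sketch; parse_foo_sv is a hypothetical name, and the prefix test is spelled with substr so that it also compiles as C++17 (in C++20, arg.starts_with(prefix) does the same). Unlike std::stoi, from_chars accepts neither leading whitespace nor a leading '+':

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: prefix check plus allocation-free int parsing.
std::optional<int> parse_foo_sv(std::string_view arg, std::string_view prefix)
{
    // C++17 spelling of arg.starts_with(prefix): substr clamps the length
    if (arg.substr(0, prefix.size()) != prefix)
        return std::nullopt;
    const std::string_view rest = arg.substr(prefix.size());
    int value = 0;
    const char* const last = rest.data() + rest.size();
    const auto [ptr, ec] = std::from_chars(rest.data(), last, value);
    if (ec != std::errc{} || ptr != last)
        return std::nullopt;  // no digits, out of range, or trailing characters
    return value;
}
```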

Given that both strings (argv[1] and "--foo=") are C strings, @FelixDombek's answer is hands-down the best solution.
Seeing the other answers, however, I thought it worth noting that, if your text is already available as a std::string, then a simple, zero-copy prefix check exists that hasn't been mentioned so far:
const char * foo = "--foo=";
if (text.rfind(foo, 0) == 0)
foo_value = std::stoi(text.substr(strlen(foo)));
And if foo is already a string:
std::string foo("--foo=");
if (text.rfind(foo, 0) == 0)
foo_value = std::stoi(text.substr(foo.length()));

Starting with C++20, you can use the starts_with method.
std::string s = "abcd";
if (s.starts_with("abc")) {
...
}

text.substr(0, start.length()) == start

Using STL this could look like:
std::string prefix = "--foo=";
std::string arg = argv[1];
if (prefix.size()<=arg.size() && std::equal(prefix.begin(), prefix.end(), arg.begin())) {
std::istringstream iss(arg.substr(prefix.size()));
iss >> foo_value;
}

At the risk of being flamed for using C constructs, I do think this sscanf example is more elegant than most Boost solutions. And you don't have to worry about linkage if you're running anywhere that has a Python interpreter!
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
for (int i = 1; i != argc; ++i) {
int number = 0;
int size = 0;
sscanf(argv[i], "--foo=%d%n", &number, &size);
if (size > 0 && (size_t)size == strlen(argv[i])) {
printf("number: %d\n", number);
}
else {
printf("not-a-number\n");
}
}
return 0;
}
Here's some example output that demonstrates the solution handles leading/trailing garbage as correctly as the equivalent Python code, and more correctly than anything using atoi (which will erroneously ignore a non-numeric suffix).
$ ./scan --foo=2 --foo=2d --foo='2 ' ' --foo=2'
number: 2
not-a-number
not-a-number
not-a-number

I use std::string::compare wrapped in utility method like below:
static bool startsWith(const string& s, const string& prefix) {
return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

C++20 update:
Use std::string::starts_with
https://en.cppreference.com/w/cpp/string/basic_string/starts_with
std::string str_value = /* smthg */;
const auto starts_with_foo = str_value.starts_with(std::string_view{"foo"});

In C++20 now there is starts_with available as a member function of std::string defined as:
constexpr bool starts_with(string_view sv) const noexcept;
constexpr bool starts_with(CharT c) const noexcept;
constexpr bool starts_with(const CharT* s) const;
So your code could be something like this:
std::string s{argv[1]};
if (s.starts_with("--foo="))

In case you need C++11 compatibility and cannot use boost, here is a boost-compatible drop-in with an example of usage:
#include <algorithm> // std::equal
#include <exception>
#include <iostream>
#include <string>
static bool starts_with(const std::string& str, const std::string& prefix)
{
return ((prefix.size() <= str.size()) && std::equal(prefix.begin(), prefix.end(), str.begin()));
}
int main(int argc, char* argv[])
{
bool usage = false;
unsigned int foos = 0; // default number of foos if no parameter was supplied
if (argc > 1)
{
const std::string fParamPrefix = "-f="; // shorthand for foo
const std::string fooParamPrefix = "--foo=";
for (int i = 1; i < argc; ++i)
{
const std::string arg = argv[i];
try
{
if ((arg == "-h") || (arg == "--help"))
{
usage = true;
} else if (starts_with(arg, fParamPrefix)) {
foos = std::stoul(arg.substr(fParamPrefix.size()));
} else if (starts_with(arg, fooParamPrefix)) {
foos = std::stoul(arg.substr(fooParamPrefix.size()));
}
} catch (std::exception& e) {
std::cerr << "Invalid parameter: " << argv[i] << std::endl << std::endl;
usage = true;
}
}
}
if (usage)
{
std::cerr << "Usage: " << argv[0] << " [OPTION]..." << std::endl;
std::cerr << "Example program for parameter parsing." << std::endl << std::endl;
std::cerr << " -f, --foo=N use N foos (optional)" << std::endl;
return 1;
}
std::cerr << "number of foos given: " << foos << std::endl;
}

Why not use GNU getopt_long? Here's a basic example (without safety checks):
#include <getopt.h>
#include <stdio.h>
int main(int argc, char** argv)
{
option long_options[] = {
{"foo", required_argument, 0, 0},
{0,0,0,0}
};
getopt_long(argc, argv, "f:", long_options, 0);
printf("%s\n", optarg);
}
For the following command:
$ ./a.out --foo=33
You will get
33

OK, why all the complicated libraries and stuff? C++ string objects overload the [] operator, so you can just compare chars. That's what I did here, because I wanted to list all files in a directory while ignoring hidden files and the . and .. pseudo-entries:
while ((ep = readdir(dp)))
{
string s(ep->d_name);
if (!(s[0] == '.')) // Omit invisible files and .. or .
files.push_back(s);
}
It's that simple.

You can also use strstr:
if (strstr(str, substr) == str) {
// 'str' starts with 'substr'
}
but I think it's only good for short strings, because it has to scan the whole string when the string doesn't actually start with 'substr'.

You can also use find() and find_first_of() (both have been members of std::string since C++98).
Example using find to find a single char:
#include <string>
std::string name = "Aaah";
size_t found_index = name.find('a');
if (found_index != std::string::npos) {
// Found string containing 'a'
}
Example using find to search for a single char, starting from position 3:
std::string name = "Aaah";
size_t found_index = name.find('h', 3);
if (found_index != std::string::npos) {
// Found string containing 'h'
}
Example using find_first_of() and checking for index 0, i.e. whether the string starts with '.':
std::string name = ".hidden._di.r";
size_t found_index = name.find_first_of('.');
if (found_index == 0) {
// Found '.' at first position in string
}
Good luck!

std::string text = "--foo=98";
std::string start = "--foo=";
if (text.find(start) == 0)
{
int n = stoi(text.substr(start.length()));
std::cout << n << std::endl;
}

Since C++11, std::regex_search can also be used for more complex matching. The following example also handles floating-point numbers, via std::stof and an implicit conversion of the result to int.
However, the parseInt function below throws std::invalid_argument if the prefix does not match (match[1] is then empty, and std::stof rejects an empty string); this can easily be adapted to the given application. Note also that prefix is inserted into the pattern as-is, so any regex metacharacters in it are not escaped:
#include <iostream>
#include <regex>
int parseInt(const std::string &str, const std::string &prefix) {
std::smatch match;
std::regex_search(str, match, std::regex("^" + prefix + "([+-]?(?=\\.?\\d)\\d*(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)$"));
return std::stof(match[1]);
}
int main() {
std::cout << parseInt("foo=13.3", "foo=") << std::endl;
std::cout << parseInt("foo=-.9", "foo=") << std::endl;
std::cout << parseInt("foo=+13.3", "foo=") << std::endl;
std::cout << parseInt("foo=-0.133", "foo=") << std::endl;
std::cout << parseInt("foo=+00123456", "foo=") << std::endl;
std::cout << parseInt("foo=-06.12e+3", "foo=") << std::endl;
// throw std::invalid_argument
// std::cout << parseInt("foo=1", "bar=") << std::endl;
return 0;
}
The kind of magic of the regex pattern is well detailed in the following answer.
EDIT: the previous version of this answer did not perform the conversion to integer.

if(boost::starts_with(string_to_search, string_to_look_for))
intval = boost::lexical_cast<int>(string_to_search.substr(string_to_look_for.length()));
This is completely untested. The principle is the same as the Python one. Requires Boost.StringAlgo and Boost.LexicalCast.
Check if the string starts with the other string, and then get the substring ('slice') of the first string and convert it using lexical cast.


Easy way to parse a url in C++ cross platform? - c++

The Poco library now has a class for dissecting URIs and feeding back the host, path segments, query string, etc.: https://pocoproject.org/pro/docs/Poco.URI.html

Qt has QUrl for this. GNOME has SoupURI in libsoup, which you'll probably find a little more lightweight.

Facebook's Folly library can do the job for you easily. Simply use the Uri class:
#include <folly/Uri.h>
int main() {
folly::Uri folly("https://code.facebook.com/posts/177011135812493/");
folly.scheme(); // https
folly.host(); // code.facebook.com
folly.path(); // posts/177011135812493/
}

This tiny, lightweight library only parses URLs; it does no URL normalization or validation: https://github.com/corporateshark/LUrlParser

Also of interest could be uri-grammar (http://code.google.com/p/uri-grammar/), which, like Dean Michael's cpp-netlib, uses Boost.Spirit to parse a URI. I came across it at "Simple expression parser example using Boost::Spirit?".

There is yet another library, libtld (https://snapwebsites.org/project/libtld), which handles all possible top-level domains and URI schemes.
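If pulling in one of these libraries is too heavy, a naive std-only split of an absolute URL into protocol, host, port, path and query can be sketched with std::string::find. This is an assumption-laden sketch (UrlParts and split_url are hypothetical names): it ignores user-info, IPv6 literals and percent-decoding, drops any #fragment, and does no validation, which is exactly the work the libraries above take care of.

```cpp
#include <string>

// Hypothetical result type for a naive URL split.
struct UrlParts {
    std::string protocol, host, port, path, query;
};

// Naive split of "protocol://host[:port][/path][?query][#fragment]".
// No validation, no percent-decoding, no user-info or IPv6 handling.
UrlParts split_url(const std::string& url)
{
    UrlParts p;
    // Drop a "#fragment", if any (find returns npos -> keep everything)
    std::string rest = url.substr(0, url.find('#'));

    // Query: everything after the first '?' (taken first, so a "://"
    // inside the query cannot be mistaken for the protocol separator)
    if (const auto q = rest.find('?'); q != std::string::npos) {
        p.query = rest.substr(q + 1);
        rest.erase(q);
    }
    // Protocol: everything before "://"
    if (const auto s = rest.find("://"); s != std::string::npos) {
        p.protocol = rest.substr(0, s);
        rest.erase(0, s + 3);
    }
    // Authority (host[:port]) runs up to the first '/', where the path starts
    const auto slash = rest.find('/');
    std::string authority = rest.substr(0, slash);
    if (slash != std::string::npos)
        p.path = rest.substr(slash);
    // Port: whatever follows the last ':' in the authority
    if (const auto c = authority.rfind(':'); c != std::string::npos) {
        p.port = authority.substr(c + 1);
        authority.erase(c);
    }
    p.host = authority;
    return p;
}
```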

Related

How do I normalize a filepath in C++ using std::filesystem::path?

Why is my string extraction function using back referencing in regex not working as intended?

Search partial filenames in C++ using boost filesystem

Passing variable (array type) from function to "main" scope Type: std::tr1::match_results<std::string::const_iterator>

How do I check if a C++ std::string starts with a certain string, and convert a substring to an int?
