Search partial filenames in C++ using boost filesystem

Search partial filenames in C++ using boost filesystem - c++

the question is simple , I want to find a file path inside a directory but I have only part of the filename, so here is a functions for this task
void getfiles(const fs::path& root, const string& ext, vector<fs::path>& ret)
{
if(!fs::exists(root) || !fs::is_directory(root)) return;
fs::recursive_directory_iterator it(root);
fs::recursive_directory_iterator endit;
while(it != endit)
{
if(fs::is_regular_file(*it)&&it->path().extension()==ext) ret.push_back(it->path());//
++it;
}
}
bool find_file(const filesystem::path& dir_path, const filesystem::path file_name, filesystem::path& path_found) {
const fs::recursive_directory_iterator end;
const auto it = find_if(fs::recursive_directory_iterator(dir_path), end,
[file_name](fs::path e) {
cerr<<boost::algorithm::icontains(e.filename().native() ,file_name.native())<<endl;
return boost::algorithm::icontains(e.filename().native() ,file_name.native());//
});
if (it == end) {
return false;
} else {
path_found = it->path();
return true;
}
}
int main (int argc, char* argv[])
{
vector<fs::path> inputClass ;
fs::path textFiles,datasetPath,imgpath;
textFiles=argv[1];
datasetPath=argv[2];
getfiles(textFiles,".txt",inputClass);
for (int i=0;i<inputClass.size();i++)
{
ifstream lblFile(inputClass[i].string().c_str());
string line;
fs::path classname=inputClass[i].parent_path()/inputClass[i].stem().string();
cerr<<classname.stem()<<endl;
while (getline(lblFile,line))
{
bool find=find_file(datasetPath,line,imgpath);
if (find)
{
while(!fs::exists(classname))
fs::create_directories (classname);
fs::copy(imgpath,classname/imgpath.filename());
cerr<<"Found\n";
}
else
cerr<<"Not Found \n";
}
lblFile.close();
}
}
Console out:
"490"
vfv343434.jpeg||E9408000EC0
0
fsdfdsfdfsf.jpeg||E9408000EC0
0
1200E9408000EC0.jpeg||E9408000EC0
0
Not Found
but when I set the search string manually it works fine ! I tried other methods for searching string like std::find but all the methods fail to find the substring, it seems there is problem with input string (line) I printed all the chars but no especial characters or anything.
if I set the search string manually it works as desired
string search="E9408000EC0";
cerr<<e.filename().native()<<"||"<<search<<endl;
cerr<<boost::algorithm::icontains(e.filename().native() ,search)<<endl;
the results for above change is like
"490"
vfv343434.jpeg||E9408000EC0
0
fsdfdsfdfsf.jpeg||E9408000EC0
0
1200E9408000EC0.jpeg||E9408000EC0
1
Found

I cannot reproduce this.
The only hunch I have is that on your platform, perhaps the string() accessor is not returning the plain string, but e.g. the quoted path. That would break the search. Consider using the native() accessor instead.
(In fact, since file_name is NOT a path, but a string pattern, suggest passing the argument as std::string__view or similar instead.)
Live On Coliru
#include <boost/filesystem.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
namespace fs = boost::filesystem;
template <typename Out>
void find_file(const fs::path& dir_path, const fs::path file_name, Out out) {
fs::recursive_directory_iterator it(dir_path), end;
std::copy_if(it, end, out, [file_name](fs::path e) {
return boost::algorithm::icontains(e.filename().native(),
file_name.native());
});
}
int main() {
fs::path d = "a/b/c/e";
fs::create_directories(d);
{
std::ofstream ofs(d / "1200E9408000EC0.jpeg");
}
std::cout << fs::path("000EC0").native() << "\n";
std::vector<fs::path> found;
find_file(".", "000EC0", back_inserter(found));
for (auto &f : found)
{
std::cout << "Found: " << f << "\n";
}
}
Prints
000EC0
Found: "./a/b/c/e/1200E9408000EC0.jpeg"
UPDATE: Code Review
To the updated question, came up with an somewhat improved tester that works with boost::filesystem and with std::filesystem just the same.
There are many small improvements (removing repetition, explicit conversions, using optional to return optional matches, etc.
Also added a whitespace trim to avoid choking on extraneous whitespace on the input lines:
Live On Coliru (-DUSE_BOOST_FS)
Live On Coliru (std library)
#include <boost/algorithm/string.hpp>
#include <fstream>
#include <iostream>
using boost::algorithm::icontains;
using boost::algorithm::trim;
#if defined(USE_BOOST_FS)
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
using boost::system::error_code;
#else
#include <filesystem>
namespace fs = std::filesystem;
using std::error_code;
#endif
void getfiles(
const fs::path& root, const std::string& ext, std::vector<fs::path>& ret)
{
if (!exists(root) || !is_directory(root))
return;
for (fs::recursive_directory_iterator it(root), endit; it != endit; ++it) {
if (is_regular_file(*it) && it->path().extension() == ext)
ret.push_back(it->path()); //
}
}
std::optional<fs::path> find_file(const fs::path& dir_path, fs::path partial)
{
fs::recursive_directory_iterator end,
it = fs::recursive_directory_iterator(dir_path);
it = std::find_if(it, end, [partial](fs::path e) {
auto search = partial.native();
//std::cerr << e.filename().native() << "||" << search << std::endl;
auto matches = icontains(e.filename().native(), search);
std::cerr << e << " Matches: " << std::boolalpha << matches
<< std::endl;
return matches;
});
return (it != end)
? std::make_optional(it->path())
: std::nullopt;
}
auto readInputClass(fs::path const& textFiles)
{
std::vector<fs::path> found;
getfiles(textFiles, ".txt", found);
return found;
}
int main(int argc, char** argv)
{
std::vector<std::string> const args(argv, argv + argc);
auto const textFiles = readInputClass(args.at(1));
std::string const datasetPath = args.at(2);
for (fs::path classname : textFiles) {
// open the text file
std::ifstream lblFile(classname);
// use base without extension as output directory
classname.replace_extension();
if (!fs::exists(classname)) {
if (fs::create_directories(classname))
std::cerr << classname << " created" << std::endl;
}
for (std::string line; getline(lblFile, line);) {
trim(line);
if (auto found = find_file(datasetPath, line)) {
auto dest = classname / found->filename();
error_code ec;
copy(*found, dest, ec);
std::cerr << dest << " (" << ec.message() << ")\n";
} else {
std::cerr << "Not Found \n";
}
}
}
}
Testing from scratch with
mkdir -pv textfiles dataset
touch dataset/{vfv343434,fsdfdsfdfsf,1200E9408000EC0}.jpeg
echo 'E9408000EC0 ' > textfiles/490.txt
Running
./a.out textfiles/ dataset/
Prints
"textfiles/490" created
"dataset/1200E9408000EC0.jpeg" Matches: true
"textfiles/490/1200E9408000EC0.jpeg" (Success)
Or on subsequent run
"dataset/fsdfdsfdfsf.jpeg" Matches: false
"dataset/1200E9408000EC0.jpeg" Matches: true
"textfiles/490/1200E9408000EC0.jpeg" (File exists)
BONUS
Doing some more diagnostics and avoiding repeatedly traversing the filesystem for each pattern. The main program is now:
Live On Coliru
int main(int argc, char** argv)
{
std::vector<std::string> const args(argv, argv + argc);
Paths const classes = getfiles(args.at(1), ".txt");
Mappings map = readClassMappings(classes);
std::cout << "Procesing " << map.size() << " patterns from "
<< classes.size() << " classes" << std::endl;
processDatasetDir(args.at(2), map);
}
And the remaining functions are implemented as:
// be smart about case insenstiive patterns
struct Pattern : std::string {
using std::string::string;
using std::string::operator=;
#ifdef __cpp_lib_three_way_comparison
std::weak_ordering operator<=>(Pattern const& other) const {
if (boost::ilexicographical_compare(*this, other)) {
return std::weak_ordering::less;
} else if (boost::ilexicographical_compare(other, *this)) {
return std::weak_ordering::less;
}
return std::weak_ordering::equivalent;
}
#else
bool operator<(Pattern const& other) const {
return boost::ilexicographical_compare(*this, other);
}
#endif
};
using Paths = std::vector<fs::path>;
using Mapping = std::pair<Pattern, fs::path>;
using Patterns = std::set<Pattern>;
using Mappings = std::set<Mapping>;
Mappings readClassMappings(Paths const& classes)
{
Mappings mappings;
for (fs::path classname : classes) {
std::ifstream lblFile(classname);
classname.replace_extension();
for (Pattern pattern; getline(lblFile, pattern);) {
trim(pattern);
if (auto [it, ok] = mappings.emplace(pattern, classname); !ok) {
std::cerr << "WARNING: " << std::quoted(pattern)
<< " duplicates " << std::quoted(it->first)
<< std::endl;
}
}
}
return mappings;
}
size_t processDatasetDir(const fs::path& datasetPath, Mappings const& patterns)
{
size_t copied = 0, failed = 0;
Patterns found;
using It = fs::recursive_directory_iterator;
for (It it = It(datasetPath), end; it != end; ++it) {
if (!it->is_regular_file())
continue;
fs::path const& entry = *it;
for (auto& [pattern, location]: patterns) {
if (icontains(it->path().filename().native(), pattern)) {
found.emplace(pattern);
if (!exists(location) && fs::create_directories(location))
std::cerr << location << " created" << std::endl;
auto dest = location / entry.filename();
error_code ec;
copy(entry, dest, ec);
std::cerr << dest << " (" << ec.message() << ") from "
<< std::quoted(pattern) << "\n";
(ec? failed : copied) += 1;
}
}
}
std::cout << "Copied:" << copied
<< ", missing:" << patterns.size() - found.size()
<< ", failed: " << failed << std::endl;
return copied;
}
With some more "random" test data:
mkdir -pv textfiles dataset
touch dataset/{vfv343434,fsdfdsfdfsf,1200E9408000EC0}.jpeg
echo .jPeg > textfiles/all_of_them.txt
echo $'E9408000EC0 \n e9408000ec0\nE9408\nbOgUs' > textfiles/490.txt
Running as
./a.out textfiles/ dataset/
Prints:
WARNING: "e9408000ec0" duplicates "E9408000EC0"
Procesing 4 patterns from 2 classes
"textfiles/all_of_them" created
"textfiles/all_of_them/1200E9408000EC0.jpeg" (Success) from ".jPeg"
"textfiles/490" created
"textfiles/490/1200E9408000EC0.jpeg" (Success) from "E9408"
"textfiles/490/1200E9408000EC0.jpeg" (File exists) from "E9408000EC0"
"textfiles/all_of_them/vfv343434.jpeg" (Success) from ".jPeg"
"textfiles/all_of_them/fsdfdsfdfsf.jpeg" (Success) from ".jPeg"
Copied:4, missing:1, failed: 1

Related

Why is my string extraction function using back referencing in regex not working as intended?

Extraction Function
string extractStr(string str, string regExpStr) {
regex regexp(regExpStr);
smatch m;
regex_search(str, m, regexp);
string result = "";
for (string x : m)
result = result + x;
return result;
}
The Main Code
#include <iostream>
#include <regex>
using namespace std;
string extractStr(string, string);
int main(void) {
string test = "(1+1)*(n+n)";
cout << extractStr(test, "n\\+n") << endl;
cout << extractStr(test, "(\\d)\\+\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]\\1") << endl;
cout << extractStr(test, "([a-zA-Z])[+-/*]([a-zA-Z])") << endl;
return 0;
}
The Output
String = (1+1)*(n+n)
n\+n = n+n
(\d)\+\1 = 1+11
([a-zA-Z])[+-/*]\1 = n+nn
([a-zA-Z])[+-/*]([a-zA-Z]) = n+nnn
If anyone could kindly point the error I've done or point me to a similar question in SO that I've missed while searching, it would be greatly appreciated.

Regexes in C++ don't work quite like "normal" regexes. Specialy when you are looking for multiple groups later. I also have some C++ tips in here (constness and references).
#include <cassert>
#include <iostream>
#include <sstream>
#include <regex>
#include <string>
// using namespace std; don't do this!
// https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice
// pass strings by const reference
// 1. const, you promise not to change them in this function
// 2. by reference, you avoid making copies
std::string extractStr(const std::string& str, const std::string& regExpStr)
{
std::regex regexp(regExpStr);
std::smatch m;
std::ostringstream os; // streams are more efficient for building up strings
auto begin = str.cbegin();
bool comma = false;
// C++ matches regexes in parts so work you need to loop
while (std::regex_search(begin, str.end(), m, regexp))
{
if (comma) os << ", ";
os << m[0];
comma = true;
begin = m.suffix().first;
}
return os.str();
}
// small helper function to produce nicer output for your tests.
void test(const std::string& input, const std::string& regex, const std::string& expected)
{
auto output = extractStr(input, regex);
if (output == expected)
{
std::cout << "test succeeded : output = " << output << "\n";
}
else
{
std::cout << "test failed : output = " << output << ", expected : " << expected << "\n";
}
}
int main(void)
{
std::string input = "(1+1)*(n+n)";
test(input, "n\\+n", "n+n");
test(input, "(\\d)\\+\\1", "1+1");
test(input, "([a-zA-Z])[+-/*]\\1", "n+n");
return 0;
}

How to put file in boost::interprocess::managed_shared_memory?

How to put a file with an arbitrary name and arbitrary size into a boost::interprocess::managed_shared_memory?
Note, I donot mean boost::interprocess::managed_mapped_file or
boost::interprocess::file_mapping.
I chose managed_shared_memory because other options require a fixed file name
to be specified but I need to transfer files with different names.
I need to use boost, not Win32 API.
I rummaged through a huge amount of information on the Internet, but did not
find anything suitable.
Therefore, I am asking you for help. I would be very grateful to you.

UPDATE
Added bonus versions at the end. Now this answer presents three complete versions of the code:
Using managed_shared_memory as requested
Using message_queue as a more natural appraoch for upload/transfter
Using TCP sockets (Asio) as to demonstrate the flexibilities of that
All of these are using Boost only
Shared memory managed segments contain arbitrary objects. So you define an object like
struct MyFile {
std::string _filename;
std::vector<char> _contents;
};
And store it there. But, wait, not so quick, because these can only be stored safely with interprocess allocators, so adding some magic sauce (a.k.a lots of interesting typedefs to get the allocators declared, and some constructors):
namespace Shared {
using Mem = bip::managed_shared_memory;
using Mgr = Mem::segment_manager;
template <typename T>
using Alloc = bc::scoped_allocator_adaptor<bip::allocator<T, Mgr>>;
template <typename T> using Vector = bc::vector<T, Alloc<T>>;
using String =
bc::basic_string<char, std::char_traits<char>, Alloc<char>>;
struct MyFile {
using allocator_type = Alloc<char>;
template <typename It>
explicit MyFile(std::string_view name, It b, It e, allocator_type alloc)
String _filename;
Vector<char> _contents;
};
}
Now you can store your files like:
Shared::Mem shm(bip::open_or_create, "shared_mem", 10ull << 30);
std::ifstream ifs("file_name.txt", std::ios::binary);
std::istreambuf_iterator<char> data_begin{ifs}, data_end{};
auto loaded = shm.find_or_construct<Shared::MyFile>("file1")(
file.native(), data_begin, data_end,
shm.get_segment_manager());
Note that the shared memory won't actually take 30GiB right away, even though
that's what 10ull << 30 specifies. On most operating systems this will be
sparesely allocated and only the pages that contain data will be commited.
Improving
You might have wondered what the scoped_allocator_adaptor was for. It doesn't seem we use it?
Well, the idea was to not use find_or_construct directly per file, but to
store a Vector<MyFile so you can harness the full power of BIP allocators.
The following full demo can be invoked
with filename arguments, which will all be loaded (if they exist as
regular files)
without arguments, which will list previously loaded files
Live On Coliru
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/managed_mapped_file.hpp> // for COLIRU
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <fstream>
#include <filesystem>
#include <iostream>
#include <iomanip>
namespace bip = boost::interprocess;
namespace bc = boost::container;
namespace fs = std::filesystem;
namespace Shared {
#ifdef COLIRU
using Mem = bip::managed_mapped_file; // managed_shared_memory not allows
#else
using Mem = bip::managed_shared_memory;
#endif
using Mgr = Mem::segment_manager;
template <typename T>
using Alloc = bc::scoped_allocator_adaptor<bip::allocator<T, Mgr>>;
template <typename T> using Vector = bc::vector<T, Alloc<T>>;
using String = bc::basic_string<char, std::char_traits<char>, Alloc<char>>;
struct MyFile {
using allocator_type = Alloc<char>;
MyFile(MyFile&&) = default;
MyFile(MyFile const& rhs, allocator_type alloc)
: _filename(rhs._filename.begin(), rhs._filename.end(), alloc),
_contents(rhs._contents.begin(), rhs._contents.end(), alloc) {}
MyFile& operator=(MyFile const& rhs) {
_filename.assign(rhs._filename.begin(), rhs._filename.end());
_contents.assign(rhs._contents.begin(), rhs._contents.end());
return *this;
}
template <typename It>
explicit MyFile(std::string_view name, It b, It e, allocator_type alloc)
: _filename(name.data(), name.size(), alloc),
_contents(b, e, alloc) {}
String _filename;
Vector<char> _contents;
friend std::ostream& operator<<(std::ostream& os, MyFile const& mf) {
return os << "Name: " << std::quoted(mf._filename.c_str())
<< " content size: " << mf._contents.size();
}
};
} // namespace Shared
int main(int argc, char** argv) {
Shared::Mem shm(bip::open_or_create, "shared_mem", 512ull << 10);
using FileList = Shared::Vector<Shared::MyFile>;
auto& shared_files =
*shm.find_or_construct<FileList>("FileList")(shm.get_segment_manager());
if (1==argc) {
std::cout << "Displaying previously loaded files: \n";
for (auto& entry : shared_files)
std::cout << entry << std::endl;
} else {
std::cout << "Loading files: \n";
for (auto file : std::vector<fs::path>{argv + 1, argv + argc}) {
if (is_regular_file(file)) {
try {
std::ifstream ifs(file, std::ios::binary);
std::istreambuf_iterator<char> data_begin{ifs}, data_end{};
auto& loaded = shared_files.emplace_back(
file.native(), data_begin, data_end);
std::cout << loaded << std::endl;
} catch (std::system_error const& se) {
std::cerr << "Error: " << se.code().message() << std::endl;
} catch (std::exception const& se) {
std::cerr << "Other: " << se.what() << std::endl;
}
}
}
}
}
When run with
g++ -std=c++17 -O2 -Wall -pedantic -pthread main.cpp -lrt -DCOLIRU
./a.out main.cpp a.out
./a.out
Prints
Loading files:
Name: "main.cpp" content size: 3239
Name: "a.out" content size: 175176
Displaying previously loaded files:
Name: "main.cpp" content size: 3239
Name: "a.out" content size: 175176
BONUS
In response to the comments, I think it's worth actually comparing
Message Queue version
For comparison, here's a message queue implementation
Live On Coliru
#include <boost/interprocess/ipc/message_queue.hpp>
#include <boost/endian/arithmetic.hpp>
#include <fstream>
#include <filesystem>
#include <iostream>
#include <iomanip>
namespace bip = boost::interprocess;
namespace fs = std::filesystem;
using bip::message_queue;
static constexpr auto MAX_FILENAME_LENGH = 512; // 512 bytes max filename length
static constexpr auto MAX_CONTENT_SIZE = 512ull << 10; // 512 KiB max payload size
struct Message {
std::vector<char> _buffer;
using Uint32 = boost::endian::big_uint32_t;
struct header_t {
Uint32 filename_length;
Uint32 content_size;
};
static_assert(std::is_standard_layout_v<header_t> and
std::is_trivial_v<header_t>);
Message() = default;
Message(fs::path file) {
std::string const name = file.native();
std::ifstream ifs(file, std::ios::binary);
std::istreambuf_iterator<char> data_begin{ifs}, data_end{};
_buffer.resize(header_len + name.length());
std::copy(begin(name), end(name), _buffer.data() + header_len);
_buffer.insert(_buffer.end(), data_begin, data_end);
header().filename_length = name.length();
header().content_size = size() - header_len - name.length();
}
Message(char const* buf, size_t size)
: _buffer(buf, buf+size) {}
static constexpr auto header_len = sizeof(header_t);
static constexpr auto max_size =
header_len + MAX_FILENAME_LENGH + MAX_CONTENT_SIZE;
char const* data() const { return _buffer.data(); }
size_t size() const { return _buffer.size(); }
header_t& header() {
assert(_buffer.size() >= header_len);
return *reinterpret_cast<header_t*>(_buffer.data());
}
header_t const& header() const {
assert(_buffer.size() >= header_len);
return *reinterpret_cast<header_t const*>(_buffer.data());
}
std::string_view filename() const {
assert(_buffer.size() >= header_len + header().filename_length);
return { _buffer.data() + header_len, header().filename_length };
}
std::string_view contents() const {
assert(_buffer.size() >=
header_len + header().filename_length + header().content_size);
return {_buffer.data() + header_len + header().filename_length,
header().content_size};
}
friend std::ostream& operator<<(std::ostream& os, Message const& mf) {
return os << "Name: " << std::quoted(mf.filename())
<< " content size: " << mf.contents().size();
}
};
int main(int argc, char** argv) {
message_queue mq(bip::open_or_create, "file_transport", 10, Message::max_size);
if (1==argc) {
std::cout << "Receiving uploaded files: \n";
char rawbuf [Message::max_size];
while (true) {
size_t n;
unsigned prio;
mq.receive(rawbuf, sizeof(rawbuf), n, prio);
Message decoded(rawbuf, n);
std::cout << "Received: " << decoded << std::endl;
}
} else {
std::cout << "Loading files: \n";
for (auto file : std::vector<fs::path>{argv + 1, argv + argc}) {
if (is_regular_file(file)) {
try {
Message encoded(file);
std::cout << "Sending: " << encoded << std::endl;
mq.send(encoded.data(), encoded.size(), 0);
} catch (std::system_error const& se) {
std::cerr << "Error: " << se.code().message() << std::endl;
} catch (std::exception const& se) {
std::cerr << "Other: " << se.what() << std::endl;
}
}
}
}
}
A demo:
Note that there is a filesize limit in this approach because messages have a maximum length
TCP Socket Version
Here's a TCP socket implementation.
Live On Coliru
#include <boost/asio.hpp>
#include <boost/endian/arithmetic.hpp>
#include <vector>
#include <fstream>
#include <filesystem>
#include <iostream>
#include <iomanip>
namespace fs = std::filesystem;
using boost::asio::ip::tcp;
using boost::system::error_code;
static constexpr auto MAX_FILENAME_LENGH = 512; // 512 bytes max filename length
static constexpr auto MAX_CONTENT_SIZE = 512ull << 10; // 512 KiB max payload size
struct Message {
std::vector<char> _buffer;
using Uint32 = boost::endian::big_uint32_t;
struct header_t {
Uint32 filename_length;
Uint32 content_size;
};
static_assert(std::is_standard_layout_v<header_t> and
std::is_trivial_v<header_t>);
Message() = default;
Message(fs::path file) {
std::string const name = file.native();
std::ifstream ifs(file, std::ios::binary);
std::istreambuf_iterator<char> data_begin{ifs}, data_end{};
_buffer.resize(header_len + name.length());
std::copy(begin(name), end(name), _buffer.data() + header_len);
_buffer.insert(_buffer.end(), data_begin, data_end);
header().filename_length = name.length();
header().content_size = actual_size() - header_len - name.length();
}
Message(char const* buf, size_t size)
: _buffer(buf, buf+size) {}
static constexpr auto header_len = sizeof(header_t);
static constexpr auto max_size =
header_len + MAX_FILENAME_LENGH + MAX_CONTENT_SIZE;
char const* data() const { return _buffer.data(); }
size_t actual_size() const { return _buffer.size(); }
size_t decoded_size() const {
return header().filename_length + header().content_size;
}
bool is_complete() const {
return actual_size() >= header_len && actual_size() >= decoded_size();
}
header_t& header() {
assert(actual_size() >= header_len);
return *reinterpret_cast<header_t*>(_buffer.data());
}
header_t const& header() const {
assert(actual_size() >= header_len);
return *reinterpret_cast<header_t const*>(_buffer.data());
}
std::string_view filename() const {
assert(actual_size() >= header_len + header().filename_length);
return std::string_view(_buffer.data() + header_len,
header().filename_length);
}
std::string_view contents() const {
assert(actual_size() >= decoded_size());
return std::string_view(_buffer.data() + header_len +
header().filename_length,
header().content_size);
}
friend std::ostream& operator<<(std::ostream& os, Message const& mf) {
return os << "Name: " << std::quoted(mf.filename())
<< " content size: " << mf.contents().size();
}
};
int main(int argc, char** argv) {
boost::asio::io_context ctx;
u_int16_t port = 8989;
if (1==argc) {
std::cout << "Receiving uploaded files: " << std::endl;
tcp::acceptor acc(ctx, tcp::endpoint{{}, port});
while (true) {
auto s = acc.accept();
std::cout << "Connection accepted from " << s.remote_endpoint() << std::endl;
Message msg;
auto buf = boost::asio::dynamic_buffer(msg._buffer);
error_code ec;
while (auto n = read(s, buf, ec)) {
std::cout << "(read " << n << " bytes, " << ec.message() << ")" << std::endl;
while (msg.is_complete()) {
std::cout << "Received: " << msg << std::endl;
buf.consume(msg.decoded_size() + Message::header_len);
}
}
std::cout << "Connection closed" << std::endl;
}
} else {
std::cout << "Loading files: " << std::endl;
tcp::socket s(ctx);
s.connect(tcp::endpoint{{}, port});
for (auto file : std::vector<fs::path>{argv + 1, argv + argc}) {
if (is_regular_file(file)) {
try {
Message encoded(file);
std::cout << "Sending: " << encoded << std::endl;
write(s, boost::asio::buffer(encoded._buffer));
} catch (std::system_error const& se) {
std::cerr << "Error: " << se.code().message() << std::endl;
} catch (std::exception const& se) {
std::cerr << "Other: " << se.what() << std::endl;
}
}
}
}
}
Demo:
Note how this easily scales to larger files, multiple files in a single connection and even multiple connections simultaneously if you need. It also doesn't do double buffering, which improves performance.
This is why this kind of approach is much more usual than any of your other approaches.

How to get the largest files in the current working directory in unix?

I know how to get the list of files in Unix. The c++ program that I tried is below. Now how do I print the largest files in descending order?
int main() {
DIR* drstrm = opendir(".");
if (drstrm == NULL) {
perror("error opening directory");
return 1;
}
struct dirent* directoryentry = readdir(drstrm);
while (directoryentry != NULL) {
cout << (*directoryentry).d_name << endl;
directoryentry = readdir(drstrm);
}
return 0;
}

Since you said you can use C++17, the filesystem library it introduces makes this really easy (And portable to systems that don't have opendir()/readdir()):
#include <iostream>
#include <vector>
#include <filesystem>
#include <algorithm>
#include <string>
int main(int argc, char **argv) {
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " DIRECTORY\n";
return 1;
}
std::vector<std::filesystem::directory_entry> files;
for (const auto &dirent : std::filesystem::directory_iterator(argv[1])) {
if (dirent.is_regular_file()) {
files.push_back(dirent);
}
}
std::sort(files.begin(), files.end(), [](const auto &a, const auto &b){
return a.file_size() > b.file_size(); });
for (const auto &dirent : files) {
// Quotes the filenames
// std::cout << dirent.path() << '\n';
// Doesn't quote
std::cout << static_cast<std::string>(dirent.path()) << '\n';
}
return 0;
}
Usage:
$ g++-8 -std=c++17 -O -Wall -Wextra test.cpp -lstdc++fs
$ ./a.out .
a.out
bigfile.txt
test.cpp
smallfile.txt
etc.
If you can't use C++17, the same approach still holds: Put the file names and their sizes in a vector, and sort based on the sizes using > instead of the normal < (Which would sort from smallest to largest). On POSIX systems, you can get the file size with stat(2).

To do this you are going to have to read the file info into a data structure (like a std::vector) and then sort the file info according to their size.
The old fashioned way could go something like this:
DIR* drstrm = opendir(".");
if(drstrm == NULL)
throw std::runtime_error(std::strerror(errno));
struct stat st; // this is to use decltype
// keep info from dirent & stat in one place
struct file_info
{
std::string name;
decltype(st.st_size) size;
};
// store list of files here to be sorted
std::vector<file_info> files;
while(dirent* entry = readdir(drstrm))
{
// get file info
if(::stat(entry->d_name, &st) == -1)
throw std::runtime_error(std::strerror(errno));
// is it a regular file?
if(!S_ISREG(st.st_mode))
continue;
// store it ready for sorting
files.push_back({entry->d_name, st.st_size});
}
// sort the file_info objects according to size
std::sort(std::begin(files), std::end(files), [](file_info const& a, file_info const& b){
return a.size < b.size;
});
// print them out
for(auto const& file: files)
std::cout << file.name << ": " << file.size << '\n';
Fortunately in newer versions of C++ (C++17) you can use the new <filesystem> standard library:
namespace fs = std::filesystem; // for brevity
std::vector<fs::path> files;
for(auto const& ent: fs::directory_iterator("."))
{
if(!fs::is_regular_file(ent))
continue;
files.push_back(ent);
}
std::sort(std::begin(files), std::end(files), [](fs::path const& a, fs::path const& b){
return fs::file_size(a) < fs::file_size(b);
});
for(auto const& file: files)
std::cout << file << ": " << fs::file_size(file) << '\n';

Easy way to parse a url in C++ cross platform?

I need to parse a URL to get the protocol, host, path, and query in an application I am writing in C++. The application is intended to be cross-platform. I'm surprised I can't find anything that does this in the boost or POCO libraries. Is it somewhere obvious I'm not looking? Any suggestions on appropriate open source libs? Or is this something I just have to do my self? It's not super complicated but it seems like such a common task I am surprised there isn't a common solution.

There is a library that's proposed for Boost inclusion and allows you to parse HTTP URI's easily. It uses Boost.Spirit and is also released under the Boost Software License. The library is cpp-netlib which you can find the documentation for at http://cpp-netlib.github.com/ -- you can download the latest release from http://github.com/cpp-netlib/cpp-netlib/downloads .
The relevant type you'll want to use is boost::network::http::uri and is documented here.

Wstring version of above, added other fields I needed. Could definitely be refined, but good enough for my purposes.
#include <string>
#include <algorithm> // find
struct Uri
{
public:
std::wstring QueryString, Path, Protocol, Host, Port;
static Uri Parse(const std::wstring &uri)
{
Uri result;
typedef std::wstring::const_iterator iterator_t;
if (uri.length() == 0)
return result;
iterator_t uriEnd = uri.end();
// get query start
iterator_t queryStart = std::find(uri.begin(), uriEnd, L'?');
// protocol
iterator_t protocolStart = uri.begin();
iterator_t protocolEnd = std::find(protocolStart, uriEnd, L':'); //"://");
if (protocolEnd != uriEnd)
{
std::wstring prot = &*(protocolEnd);
if ((prot.length() > 3) && (prot.substr(0, 3) == L"://"))
{
result.Protocol = std::wstring(protocolStart, protocolEnd);
protocolEnd += 3; // ://
}
else
protocolEnd = uri.begin(); // no protocol
}
else
protocolEnd = uri.begin(); // no protocol
// host
iterator_t hostStart = protocolEnd;
iterator_t pathStart = std::find(hostStart, uriEnd, L'/'); // get pathStart
iterator_t hostEnd = std::find(protocolEnd,
(pathStart != uriEnd) ? pathStart : queryStart,
L':'); // check for port
result.Host = std::wstring(hostStart, hostEnd);
// port
if ((hostEnd != uriEnd) && ((&*(hostEnd))[0] == L':')) // we have a port
{
hostEnd++;
iterator_t portEnd = (pathStart != uriEnd) ? pathStart : queryStart;
result.Port = std::wstring(hostEnd, portEnd);
}
// path
if (pathStart != uriEnd)
result.Path = std::wstring(pathStart, queryStart);
// query
if (queryStart != uriEnd)
result.QueryString = std::wstring(queryStart, uri.end());
return result;
} // Parse
}; // uri
Tests/Usage
Uri u0 = Uri::Parse(L"http://localhost:80/foo.html?&q=1:2:3");
Uri u1 = Uri::Parse(L"https://localhost:80/foo.html?&q=1");
Uri u2 = Uri::Parse(L"localhost/foo");
Uri u3 = Uri::Parse(L"https://localhost/foo");
Uri u4 = Uri::Parse(L"localhost:8080");
Uri u5 = Uri::Parse(L"localhost?&foo=1");
Uri u6 = Uri::Parse(L"localhost?&foo=1:2:3");
u0.QueryString, u0.Path, u0.Protocol, u0.Host, u0.Port....

Terribly sorry, couldn't help it. :s
url.hh
#ifndef URL_HH_
#define URL_HH_
#include <string>
struct url {
url(const std::string& url_s); // omitted copy, ==, accessors, ...
private:
void parse(const std::string& url_s);
private:
std::string protocol_, host_, path_, query_;
};
#endif /* URL_HH_ */
url.cc
#include "url.hh"
#include <string>
#include <algorithm>
#include <cctype>
#include <functional>
using namespace std;
// ctors, copy, equality, ...
void url::parse(const string& url_s)
{
const string prot_end("://");
string::const_iterator prot_i = search(url_s.begin(), url_s.end(),
prot_end.begin(), prot_end.end());
protocol_.reserve(distance(url_s.begin(), prot_i));
transform(url_s.begin(), prot_i,
back_inserter(protocol_),
ptr_fun<int,int>(tolower)); // protocol is icase
if( prot_i == url_s.end() )
return;
advance(prot_i, prot_end.length());
string::const_iterator path_i = find(prot_i, url_s.end(), '/');
host_.reserve(distance(prot_i, path_i));
transform(prot_i, path_i,
back_inserter(host_),
ptr_fun<int,int>(tolower)); // host is icase
string::const_iterator query_i = find(path_i, url_s.end(), '?');
path_.assign(path_i, query_i);
if( query_i != url_s.end() )
++query_i;
query_.assign(query_i, url_s.end());
}
main.cc
// ...
url u("HTTP://stackoverflow.com/questions/2616011/parse-a.py?url=1");
cout << u.protocol() << '\t' << u.host() << ...

POCO's URI class can parse URLs for you. The following example is shortened version of the one in POCO URI and UUID slides:
#include "Poco/URI.h"
#include <iostream>
int main(int argc, char** argv)
{
Poco::URI uri1("http://www.appinf.com:88/sample?example-query#frag");
std::string scheme(uri1.getScheme()); // "http"
std::string auth(uri1.getAuthority()); // "www.appinf.com:88"
std::string host(uri1.getHost()); // "www.appinf.com"
unsigned short port = uri1.getPort(); // 88
std::string path(uri1.getPath()); // "/sample"
std::string query(uri1.getQuery()); // "example-query"
std::string frag(uri1.getFragment()); // "frag"
std::string pathEtc(uri1.getPathEtc()); // "/sample?example-query#frag"
return 0;
}

For completeness, there is one written in C that you could use (with a little wrapping, no doubt): https://uriparser.github.io/
[RFC-compliant and supports Unicode]
Here's a very basic wrapper I've been using for simply grabbing the results of a parse.
#include <string>
#include <uriparser/Uri.h>
namespace uriparser
{
class Uri //: boost::noncopyable
{
public:
Uri(std::string uri)
: uri_(uri)
{
UriParserStateA state_;
state_.uri = &uriParse_;
isValid_ = uriParseUriA(&state_, uri_.c_str()) == URI_SUCCESS;
}
~Uri() { uriFreeUriMembersA(&uriParse_); }
bool isValid() const { return isValid_; }
std::string scheme() const { return fromRange(uriParse_.scheme); }
std::string host() const { return fromRange(uriParse_.hostText); }
std::string port() const { return fromRange(uriParse_.portText); }
std::string path() const { return fromList(uriParse_.pathHead, "/"); }
std::string query() const { return fromRange(uriParse_.query); }
std::string fragment() const { return fromRange(uriParse_.fragment); }
private:
std::string uri_;
UriUriA uriParse_;
bool isValid_;
std::string fromRange(const UriTextRangeA & rng) const
{
return std::string(rng.first, rng.afterLast);
}
std::string fromList(UriPathSegmentA * xs, const std::string & delim) const
{
UriPathSegmentStructA * head(xs);
std::string accum;
while (head)
{
accum += delim + fromRange(head->text);
head = head->next;
}
return accum;
}
};
}

//sudo apt-get install libboost-all-dev; #install boost
//g++ urlregex.cpp -lboost_regex; #compile
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
int main(int argc, char* argv[])
{
string url="https://www.google.com:443/webhp?gws_rd=ssl#q=cpp";
boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
boost::cmatch what;
if(regex_match(url.c_str(), what, ex))
{
cout << "protocol: " << string(what[1].first, what[1].second) << endl;
cout << "domain: " << string(what[2].first, what[2].second) << endl;
cout << "port: " << string(what[3].first, what[3].second) << endl;
cout << "path: " << string(what[4].first, what[4].second) << endl;
cout << "query: " << string(what[5].first, what[5].second) << endl;
cout << "fragment: " << string(what[6].first, what[6].second) << endl;
}
return 0;
}

The Poco library now has a class for dissecting URI's and feeding back the host, path segments and query string etc.
https://pocoproject.org/pro/docs/Poco.URI.html

QT has QUrl for this. GNOME has SoupURI in libsoup, which you'll probably find a little more light-weight.

Facebook's Folly library can do the job for you easily. Simply use the Uri class:
#include <folly/Uri.h>
int main() {
folly::Uri folly("https://code.facebook.com/posts/177011135812493/");
folly.scheme(); // https
folly.host(); // code.facebook.com
folly.path(); // posts/177011135812493/
}

I know this is a very old question, but I've found the following useful:
http://www.zedwood.com/article/cpp-boost-url-regex
It gives 3 examples:
(With Boost)
//sudo apt-get install libboost-all-dev;
//g++ urlregex.cpp -lboost_regex
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using std::string;
using std::cout;
using std::endl;
using std::stringstream;
void parse_url(const string& url) //with boost
{
boost::regex ex("(http|https)://([^/ :]+):?([^/ ]*)(/?[^ #?]*)\\x3f?([^ #]*)#?([^ ]*)");
boost::cmatch what;
if(regex_match(url.c_str(), what, ex))
{
string protocol = string(what[1].first, what[1].second);
string domain = string(what[2].first, what[2].second);
string port = string(what[3].first, what[3].second);
string path = string(what[4].first, what[4].second);
string query = string(what[5].first, what[5].second);
cout << "[" << url << "]" << endl;
cout << protocol << endl;
cout << domain << endl;
cout << port << endl;
cout << path << endl;
cout << query << endl;
cout << "-------------------------------" << endl;
}
}
int main(int argc, char* argv[])
{
parse_url("http://www.google.com");
parse_url("https://mail.google.com/mail/");
parse_url("https://www.google.com:443/webhp?gws_rd=ssl");
return 0;
}
(Without Boost)
#include <string>
#include <iostream>
using std::string;
using std::cout;
using std::endl;
using std::stringstream;
string _trim(const string& str)
{
size_t start = str.find_first_not_of(" \n\r\t");
size_t until = str.find_last_not_of(" \n\r\t");
string::const_iterator i = start==string::npos ? str.begin() : str.begin() + start;
string::const_iterator x = until==string::npos ? str.end() : str.begin() + until+1;
return string(i,x);
}
void parse_url(const string& raw_url) //no boost
{
string path,domain,x,protocol,port,query;
int offset = 0;
size_t pos1,pos2,pos3,pos4;
x = _trim(raw_url);
offset = offset==0 && x.compare(0, 8, "https://")==0 ? 8 : offset;
offset = offset==0 && x.compare(0, 7, "http://" )==0 ? 7 : offset;
pos1 = x.find_first_of('/', offset+1 );
path = pos1==string::npos ? "" : x.substr(pos1);
domain = string( x.begin()+offset, pos1 != string::npos ? x.begin()+pos1 : x.end() );
path = (pos2 = path.find("#"))!=string::npos ? path.substr(0,pos2) : path;
port = (pos3 = domain.find(":"))!=string::npos ? domain.substr(pos3+1) : "";
domain = domain.substr(0, pos3!=string::npos ? pos3 : domain.length());
protocol = offset > 0 ? x.substr(0,offset-3) : "";
query = (pos4 = path.find("?"))!=string::npos ? path.substr(pos4+1) : "";
path = pos4!=string::npos ? path.substr(0,pos4) : path;
cout << "[" << raw_url << "]" << endl;
cout << "protocol: " << protocol << endl;
cout << "domain: " << domain << endl;
cout << "port: " << port << endl;
cout << "path: " << path << endl;
cout << "query: " << query << endl;
}
int main(int argc, char* argv[])
{
parse_url("http://www.google.com");
parse_url("https://mail.google.com/mail/");
parse_url("https://www.google.com:443/webhp?gws_rd=ssl");
return 0;
}
(Different way without Boost)
#include <string>
#include <stdint.h>
#include <cstring>
#include <sstream>
#include <algorithm>
#include <iostream>
using std::cerr; using std::cout; using std::endl;
using std::string;
class HTTPURL
{
private:
string _protocol;// http vs https
string _domain; // mail.google.com
uint16_t _port; // 80,443
string _path; // /mail/
string _query; // [after ?] a=b&c=b
public:
const string &protocol;
const string &domain;
const uint16_t &port;
const string &path;
const string &query;
HTTPURL(const string& url): protocol(_protocol),domain(_domain),port(_port),path(_path),query(_query)
{
string u = _trim(url);
size_t offset=0, slash_pos, hash_pos, colon_pos, qmark_pos;
string urlpath,urldomain,urlport;
uint16_t default_port;
static const char* allowed[] = { "https://", "http://", "ftp://", NULL};
for(int i=0; allowed[i]!=NULL && this->_protocol.length()==0; i++)
{
const char* c=allowed[i];
if (u.compare(0,strlen(c), c)==0) {
offset = strlen(c);
this->_protocol=string(c,0,offset-3);
}
}
default_port = this->_protocol=="https" ? 443 : 80;
slash_pos = u.find_first_of('/', offset+1 );
urlpath = slash_pos==string::npos ? "/" : u.substr(slash_pos);
urldomain = string( u.begin()+offset, slash_pos != string::npos ? u.begin()+slash_pos : u.end() );
urlpath = (hash_pos = urlpath.find("#"))!=string::npos ? urlpath.substr(0,hash_pos) : urlpath;
urlport = (colon_pos = urldomain.find(":"))!=string::npos ? urldomain.substr(colon_pos+1) : "";
urldomain = urldomain.substr(0, colon_pos!=string::npos ? colon_pos : urldomain.length());
this->_domain = _tolower(urldomain);
this->_query = (qmark_pos = urlpath.find("?"))!=string::npos ? urlpath.substr(qmark_pos+1) : "";
this->_path = qmark_pos!=string::npos ? urlpath.substr(0,qmark_pos) : urlpath;
this->_port = urlport.length()==0 ? default_port : _atoi(urlport) ;
};
private:
static inline string _trim(const string& input)
{
string str = input;
size_t endpos = str.find_last_not_of(" \t\n\r");
if( string::npos != endpos )
{
str = str.substr( 0, endpos+1 );
}
size_t startpos = str.find_first_not_of(" \t\n\r");
if( string::npos != startpos )
{
str = str.substr( startpos );
}
return str;
};
static inline string _tolower(const string& input)
{
string str = input;
std::transform(str.begin(), str.end(), str.begin(), ::tolower);
return str;
};
static inline int _atoi(const string& input)
{
int r;
std::stringstream(input) >> r;
return r;
};
};
int main(int argc, char **argv)
{
HTTPURL u("https://Mail.google.com:80/mail/?action=send#action=send");
cout << "protocol: " << u.protocol << endl;
cout << "domain: " << u.domain << endl;
cout << "port: " << u.port << endl;
cout << "path: " << u.path << endl;
cout << "query: " << u.query << endl;
return 0;
}

This library is very tiny and lightweight: https://github.com/corporateshark/LUrlParser
However, it is parsing only, no URL normalization/validation.

Also of interest could be http://code.google.com/p/uri-grammar/ which like Dean Michael's netlib uses boost spirit to parse a URI. Came across it at Simple expression parser example using Boost::Spirit?

There is the newly released google-url lib:
http://code.google.com/p/google-url/
The library provides a low-level url parsing API as well as a higher-level abstraction called GURL. Here's an example using that:
#include <googleurl\src\gurl.h>
wchar_t url[] = L"http://www.facebook.com";
GURL parsedUrl (url);
assert(parsedUrl.DomainIs("facebook.com"));
Two small complaints I have with it: (1) it wants to use ICU by default to deal with different string encodings and (2) it makes some assumptions about logging (but I think they can be disabled). In other words, the library is not completely stand-alone as it exists, but I think it's still a good basis to start with, especially if you are already using ICU.

May I offer another self-contained solution based on std::regex :
const char* SCHEME_REGEX = "((http[s]?)://)?"; // match http or https before the ://
const char* USER_REGEX = "(([^#/:\\s]+)#)?"; // match anything other than # / : or whitespace before the ending #
const char* HOST_REGEX = "([^#/:\\s]+)"; // mandatory. match anything other than # / : or whitespace
const char* PORT_REGEX = "(:([0-9]{1,5}))?"; // after the : match 1 to 5 digits
const char* PATH_REGEX = "(/[^:#?\\s]*)?"; // after the / match anything other than : # ? or whitespace
const char* QUERY_REGEX = "(\\?(([^?;&#=]+=[^?;&#=]+)([;|&]([^?;&#=]+=[^?;&#=]+))*))?"; // after the ? match any number of x=y pairs, seperated by & or ;
const char* FRAGMENT_REGEX = "(#([^#\\s]*))?"; // after the # match anything other than # or whitespace
bool parseUri(const std::string &i_uri)
{
static const std::regex regExpr(std::string("^")
+ SCHEME_REGEX + USER_REGEX
+ HOST_REGEX + PORT_REGEX
+ PATH_REGEX + QUERY_REGEX
+ FRAGMENT_REGEX + "$");
std::smatch matchResults;
if (std::regex_match(i_uri.cbegin(), i_uri.cend(), matchResults, regExpr))
{
m_scheme.assign(matchResults[2].first, matchResults[2].second);
m_user.assign(matchResults[4].first, matchResults[4].second);
m_host.assign(matchResults[5].first, matchResults[5].second);
m_port.assign(matchResults[7].first, matchResults[7].second);
m_path.assign(matchResults[8].first, matchResults[8].second);
m_query.assign(matchResults[10].first, matchResults[10].second);
m_fragment.assign(matchResults[15].first, matchResults[15].second);
return true;
}
return false;
}
I added explanations for each part of the regular expression. This way allows you to choose exactly the relevant parts to parse for the URL that you're expecting to get. Just remember to change the desired regular expression group indices accordingly.

A small dependency you can use is uriparser, which recently moved to GitHub.
You can find a minimal example in their code: https://github.com/uriparser/uriparser/blob/63384be4fb8197264c55ff53a135110ecd5bd8c4/tool/uriparse.c
This will be more lightweight than Boost or Poco. The only catch is that it is C.
There is also a Buckaroo package:
buckaroo add github.com/buckaroo-pm/uriparser

I tried a couple of the solutions here, but then decided to write my own that could just be dropped into a project without any external dependencies (except c++17).
Right now, it passes all tests. But, if you find any cases that don't succeed, please feel free to create a Pull Request or an Issue.
I'll keep it up to date and improve its quality. Suggestions welcome! I'm also trying out this design to only have a single, high-quality class per repository so that the header and source can just be dropped into a project (as opposed to building a library or header-only). It appears to be working out well (I'm using git submodules and symlinks in my own projects).
https://github.com/homer6/url

You could try the open-source library called C++ REST SDK (created by Microsoft, distributed under the Apache License 2.0). It can be built for several platforms including Windows, Linux, OSX, iOS, Android). There is a class called web::uri where you put in a string and can retrieve individual URL components. Here is a code sample (tested on Windows):
#include <cpprest/base_uri.h>
#include <iostream>
#include <ostream>
web::uri sample_uri( L"http://dummyuser#localhost:7777/dummypath?dummyquery#dummyfragment" );
std::wcout << L"scheme: " << sample_uri.scheme() << std::endl;
std::wcout << L"user: " << sample_uri.user_info() << std::endl;
std::wcout << L"host: " << sample_uri.host() << std::endl;
std::wcout << L"port: " << sample_uri.port() << std::endl;
std::wcout << L"path: " << sample_uri.path() << std::endl;
std::wcout << L"query: " << sample_uri.query() << std::endl;
std::wcout << L"fragment: " << sample_uri.fragment() << std::endl;
The output will be:
scheme: http
user: dummyuser
host: localhost
port: 7777
path: /dummypath
query: dummyquery
fragment: dummyfragment
There are also other easy-to-use methods, e.g. to access individual attribute/value pairs from the query, split the path into components, etc.

If you use oatpp for web request handling, you can find its built-in URL parsing useful:
std::string url = /* ... */;
oatpp::String oatUrl(url.c_str(), url.size(), false);
oatpp::String oatHost = oatpp::network::Url::Parser::parseUrl(oatUrl).authority.host->toLowerCase();
std::string host(oatHost->c_str(), oatHost->getSize());
The above snippet retrieves the hostname. In a similar way:
oatpp::network::Url parsedUrl = oatpp::network::Url::Parser::parseUrl(oatUrl);
// parsedUrl.authority.port
// parsedUrl.path
// parsedUrl.scheme
// parsedUrl.queryParams

There is yet another library https://snapwebsites.org/project/libtld which handles all possible top level domains and URI shema

I have developed an "object oriented" solution, one C++ class, that works with one regex like #Mr.Jones and #velcrow solutions. My Url class performs url/uri 'parsing'.
I think I improved velcrow regex to be more robust and includes also the username part.
Follows the first version of my idea, I have released the same code, improved, in my GPL3 licensed open source project Cpp URL Parser.
Omitted #ifdef/ndef bloat part, follows Url.h
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
class Url {
public:
boost::regex ex;
string rawUrl;
string username;
string protocol;
string domain;
string port;
string path;
string query;
string fragment;
Url();
Url(string &rawUrl);
Url &update(string &rawUrl);
};
This is the code of the Url.cpp implementation file:
#include "Url.h"
Url::Url() {
this -> ex = boost::regex("(ssh|sftp|ftp|smb|http|https):\\/\\/(?:([^# ]*)#)?([^:?# ]+)(?::(\\d+))?([^?# ]*)(?:\\?([^# ]*))?(?:#([^ ]*))?");
}
Url::Url(string &rawUrl) : Url() {
this->rawUrl = rawUrl;
this->update(this->rawUrl);
}
Url &Url::update(string &rawUrl) {
this->rawUrl = rawUrl;
boost::cmatch what;
if (regex_match(rawUrl.c_str(), what, ex)) {
this -> protocol = string(what[1].first, what[1].second);
this -> username = string(what[2].first, what[2].second);
this -> domain = string(what[3].first, what[3].second);
this -> port = string(what[4].first, what[4].second);
this -> path = string(what[5].first, what[5].second);
this -> query = string(what[6].first, what[6].second);
this -> fragment = string(what[7].first, what[7].second);
}
return *this;
}
Usage example:
string urlString = "http://gino#ciao.it:67/ciao?roba=ciao#34";
Url *url = new Url(urlString);
std::cout << " username: " << url->username << " URL domain: " << url->domain;
std::cout << " port: " << url->port << " protocol: " << url->protocol;
You can also update the Url object to represent (and parse) another URL:
url.update("http://gino#nuovociao.it:68/nuovociao?roba=ciaoooo#")
I'm not a full-time C++ developer, so, I'm not sure I followed 100% C++ best-practises.
Any tip is appreciated.
P.s: let's look at Cpp URL Parser, there are refinements there.
Have fun

simple solution to get the protocol, host, path
int url_get(const std::string& uri)
{
//parse URI
std::size_t start = uri.find("://", 0);
if (start == std::string::npos)
{
return -1;
}
start += 3; //"://"
std::size_t end = uri.find("/", start + 1);
std::string protocol = uri.substr(0, start - 3);
std::string host = uri.substr(start, end - start);
std::string path = uri.substr(end);
return 0;
}

How do I check if a C++ std::string starts with a certain string, and convert a substring to an int?

How do I implement the following (Python pseudocode) in C++?
if argv[1].startswith('--foo='):
foo_value = int(argv[1][len('--foo='):])
(For example, if argv[1] is --foo=98, then foo_value is 98.)
Update: I'm hesitant to look into Boost, since I'm just looking at making a very small change to a simple little command-line tool (I'd rather not have to learn how to link in and use Boost for a minor change).

Use rfind overload that takes the search position pos parameter, and pass zero for it:
std::string s = "tititoto";
if (s.rfind("titi", 0) == 0) { // pos=0 limits the search to the prefix
// s starts with prefix
}
Who needs anything else? Pure STL!
Many have misread this to mean "search backwards through the whole string looking for the prefix". That would give the wrong result (e.g. string("tititito").rfind("titi") returns 2 so when compared against == 0 would return false) and it would be inefficient (looking through the whole string instead of just the start). But it does not do that because it passes the pos parameter as 0, which limits the search to only match at that position or earlier. For example:
std::string test = "0123123";
size_t match1 = test.rfind("123"); // returns 4 (rightmost match)
size_t match2 = test.rfind("123", 2); // returns 1 (skipped over later match)
size_t match3 = test.rfind("123", 0); // returns std::string::npos (i.e. not found)

You would do it like this:
std::string prefix("--foo=");
if (!arg.compare(0, prefix.size(), prefix))
foo_value = std::stoi(arg.substr(prefix.size()));
Looking for a lib such as Boost.ProgramOptions that does this for you is also a good idea.

Just for completeness, I will mention the C way to do it:
If str is your original string, substr is the substring you want to
check, then
strncmp(str, substr, strlen(substr))
will return 0 if str
starts with substr. The functions strncmp and strlen are in the C
header file <string.h>
(originally posted by Yaseen Rauf here, markup added)
For a case-insensitive comparison, use strnicmp instead of strncmp.
This is the C way to do it, for C++ strings you can use the same function like this:
strncmp(str.c_str(), substr.c_str(), substr.size())

If you're already using Boost, you can do it with boost string algorithms + boost lexical cast:
#include <boost/algorithm/string/predicate.hpp>
#include <boost/lexical_cast.hpp>
try {
if (boost::starts_with(argv[1], "--foo="))
foo_value = boost::lexical_cast<int>(argv[1]+6);
} catch (boost::bad_lexical_cast) {
// bad parameter
}
This kind of approach, like many of the other answers provided here is ok for very simple tasks, but in the long run you are usually better off using a command line parsing library. Boost has one (Boost.Program_options), which may make sense if you happen to be using Boost already.
Otherwise a search for "c++ command line parser" will yield a number of options.

Code I use myself:
std::string prefix = "-param=";
std::string argument = argv[1];
if(argument.substr(0, prefix.size()) == prefix) {
std::string argumentValue = argument.substr(prefix.size());
}

Nobody used the STL algorithm/mismatch function yet. If this returns true, prefix is a prefix of 'toCheck':
std::mismatch(prefix.begin(), prefix.end(), toCheck.begin()).first == prefix.end()
Full example prog:
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, char** argv) {
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "Will print true if 'prefix' is a prefix of string" << std::endl;
return -1;
}
std::string prefix(argv[1]);
std::string toCheck(argv[2]);
if (prefix.length() > toCheck.length()) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "'prefix' is longer than 'string'" << std::endl;
return 2;
}
if (std::mismatch(prefix.begin(), prefix.end(), toCheck.begin()).first == prefix.end()) {
std::cout << '"' << prefix << '"' << " is a prefix of " << '"' << toCheck << '"' << std::endl;
return 0;
} else {
std::cout << '"' << prefix << '"' << " is NOT a prefix of " << '"' << toCheck << '"' << std::endl;
return 1;
}
}
Edit:
As #James T. Huggett suggests, std::equal is a better fit for the question: Is A a prefix of B? and is slight shorter code:
std::equal(prefix.begin(), prefix.end(), toCheck.begin())
Full example prog:
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, char **argv) {
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "Will print true if 'prefix' is a prefix of string"
<< std::endl;
return -1;
}
std::string prefix(argv[1]);
std::string toCheck(argv[2]);
if (prefix.length() > toCheck.length()) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "'prefix' is longer than 'string'" << std::endl;
return 2;
}
if (std::equal(prefix.begin(), prefix.end(), toCheck.begin())) {
std::cout << '"' << prefix << '"' << " is a prefix of " << '"' << toCheck
<< '"' << std::endl;
return 0;
} else {
std::cout << '"' << prefix << '"' << " is NOT a prefix of " << '"'
<< toCheck << '"' << std::endl;
return 1;
}
}

With C++17 you can use std::basic_string_view & with C++20 std::basic_string::starts_with or std::basic_string_view::starts_with.
The benefit of std::string_view in comparison to std::string - regarding memory management - is that it only holds a pointer to a "string" (contiguous sequence of char-like objects) and knows its size. Example without moving/copying the source strings just to get the integer value:
#include <exception>
#include <iostream>
#include <string>
#include <string_view>
int main()
{
constexpr auto argument = "--foo=42"; // Emulating command argument.
constexpr auto prefix = "--foo=";
auto inputValue = 0;
constexpr auto argumentView = std::string_view(argument);
if (argumentView.starts_with(prefix))
{
constexpr auto prefixSize = std::string_view(prefix).size();
try
{
// The underlying data of argumentView is nul-terminated, therefore we can use data().
inputValue = std::stoi(argumentView.substr(prefixSize).data());
}
catch (std::exception & e)
{
std::cerr << e.what();
}
}
std::cout << inputValue; // 42
}

Given that both strings — argv[1] and "--foo" — are C strings, #FelixDombek's answer is hands-down the best solution.
Seeing the other answers, however, I thought it worth noting that, if your text is already available as a std::string, then a simple, zero-copy, maximally efficient solution exists that hasn't been mentioned so far:
const char * foo = "--foo";
if (text.rfind(foo, 0) == 0)
foo_value = text.substr(strlen(foo));
And if foo is already a string:
std::string foo("--foo");
if (text.rfind(foo, 0) == 0)
foo_value = text.substr(foo.length());

Starting with C++20, you can use the starts_with method.
std::string s = "abcd";
if (s.starts_with("abc")) {
...
}

text.substr(0, start.length()) == start

Using STL this could look like:
std::string prefix = "--foo=";
std::string arg = argv[1];
if (prefix.size()<=arg.size() && std::equal(prefix.begin(), prefix.end(), arg.begin())) {
std::istringstream iss(arg.substr(prefix.size()));
iss >> foo_value;
}

At the risk of being flamed for using C constructs, I do think this sscanf example is more elegant than most Boost solutions. And you don't have to worry about linkage if you're running anywhere that has a Python interpreter!
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
for (int i = 1; i != argc; ++i) {
int number = 0;
int size = 0;
sscanf(argv[i], "--foo=%d%n", &number, &size);
if (size == strlen(argv[i])) {
printf("number: %d\n", number);
}
else {
printf("not-a-number\n");
}
}
return 0;
}
Here's some example output that demonstrates the solution handles leading/trailing garbage as correctly as the equivalent Python code, and more correctly than anything using atoi (which will erroneously ignore a non-numeric suffix).
$ ./scan --foo=2 --foo=2d --foo='2 ' ' --foo=2'
number: 2
not-a-number
not-a-number
not-a-number

I use std::string::compare wrapped in utility method like below:
static bool startsWith(const string& s, const string& prefix) {
return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

C++20 update :
Use std::string::starts_with
https://en.cppreference.com/w/cpp/string/basic_string/starts_with
std::string str_value = /* smthg */;
const auto starts_with_foo = str_value.starts_with(std::string_view{"foo"});

In C++20 now there is starts_with available as a member function of std::string defined as:
constexpr bool starts_with(string_view sv) const noexcept;
constexpr bool starts_with(CharT c) const noexcept;
constexpr bool starts_with(const CharT* s) const;
So your code could be something like this:
std::string s{argv[1]};
if (s.starts_with("--foo="))

In case you need C++11 compatibility and cannot use boost, here is a boost-compatible drop-in with an example of usage:
#include <iostream>
#include <string>
static bool starts_with(const std::string str, const std::string prefix)
{
return ((prefix.size() <= str.size()) && std::equal(prefix.begin(), prefix.end(), str.begin()));
}
int main(int argc, char* argv[])
{
bool usage = false;
unsigned int foos = 0; // default number of foos if no parameter was supplied
if (argc > 1)
{
const std::string fParamPrefix = "-f="; // shorthand for foo
const std::string fooParamPrefix = "--foo=";
for (unsigned int i = 1; i < argc; ++i)
{
const std::string arg = argv[i];
try
{
if ((arg == "-h") || (arg == "--help"))
{
usage = true;
} else if (starts_with(arg, fParamPrefix)) {
foos = std::stoul(arg.substr(fParamPrefix.size()));
} else if (starts_with(arg, fooParamPrefix)) {
foos = std::stoul(arg.substr(fooParamPrefix.size()));
}
} catch (std::exception& e) {
std::cerr << "Invalid parameter: " << argv[i] << std::endl << std::endl;
usage = true;
}
}
}
if (usage)
{
std::cerr << "Usage: " << argv[0] << " [OPTION]..." << std::endl;
std::cerr << "Example program for parameter parsing." << std::endl << std::endl;
std::cerr << " -f, --foo=N use N foos (optional)" << std::endl;
return 1;
}
std::cerr << "number of foos given: " << foos << std::endl;
}

Why not use gnu getopts? Here's a basic example (without safety checks):
#include <getopt.h>
#include <stdio.h>
int main(int argc, char** argv)
{
option long_options[] = {
{"foo", required_argument, 0, 0},
{0,0,0,0}
};
getopt_long(argc, argv, "f:", long_options, 0);
printf("%s\n", optarg);
}
For the following command:
$ ./a.out --foo=33
You will get
33

Ok why the complicated use of libraries and stuff? C++ String objects overload the [] operator, so you can just compare chars.. Like what I just did, because I want to list all files in a directory and ignore invisible files and the .. and . pseudofiles.
while ((ep = readdir(dp)))
{
string s(ep->d_name);
if (!(s[0] == '.')) // Omit invisible files and .. or .
files.push_back(s);
}
It's that simple..

You can also use strstr:
if (strstr(str, substr) == substr) {
// 'str' starts with 'substr'
}
but I think it's good only for short strings because it has to loop through the whole string when the string doesn't actually start with 'substr'.

With C++11 or higher you can use find() and find_first_of()
Example using find to find a single char:
#include <string>
std::string name = "Aaah";
size_t found_index = name.find('a');
if (found_index != std::string::npos) {
// Found string containing 'a'
}
Example using find to find a full string & starting from position 5:
std::string name = "Aaah";
size_t found_index = name.find('h', 3);
if (found_index != std::string::npos) {
// Found string containing 'h'
}
Example using the find_first_of() and only the first char, to search at the start only:
std::string name = ".hidden._di.r";
size_t found_index = name.find_first_of('.');
if (found_index == 0) {
// Found '.' at first position in string
}
More about find
More about find_first_of
Good luck!

std::string text = "--foo=98";
std::string start = "--foo=";
if (text.find(start) == 0)
{
int n = stoi(text.substr(start.length()));
std::cout << n << std::endl;
}

Since C++11 std::regex_search can also be used to provide even more complex expressions matching. The following example handles also floating numbers thorugh std::stof and a subsequent cast to int.
However the parseInt method shown below could throw a std::invalid_argument exception if the prefix is not matched; this can be easily adapted depending on the given application:
#include <iostream>
#include <regex>
int parseInt(const std::string &str, const std::string &prefix) {
std::smatch match;
std::regex_search(str, match, std::regex("^" + prefix + "([+-]?(?=\\.?\\d)\\d*(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)$"));
return std::stof(match[1]);
}
int main() {
std::cout << parseInt("foo=13.3", "foo=") << std::endl;
std::cout << parseInt("foo=-.9", "foo=") << std::endl;
std::cout << parseInt("foo=+13.3", "foo=") << std::endl;
std::cout << parseInt("foo=-0.133", "foo=") << std::endl;
std::cout << parseInt("foo=+00123456", "foo=") << std::endl;
std::cout << parseInt("foo=-06.12e+3", "foo=") << std::endl;
// throw std::invalid_argument
// std::cout << parseInt("foo=1", "bar=") << std::endl;
return 0;
}
The kind of magic of the regex pattern is well detailed in the following answer.
EDIT: the previous answer did not performed the conversion to integer.

if(boost::starts_with(string_to_search, string_to_look_for))
intval = boost::lexical_cast<int>(string_to_search.substr(string_to_look_for.length()));
This is completely untested. The principle is the same as the Python one. Requires Boost.StringAlgo and Boost.LexicalCast.
Check if the string starts with the other string, and then get the substring ('slice') of the first string and convert it using lexical cast.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Search partial filenames in C++ using boost filesystem - c++

Related

Why is my string extraction function using back referencing in regex not working as intended?

How to put file in boost::interprocess::managed_shared_memory?

How to get the largest files in the current working directory in unix?

Easy way to parse a url in C++ cross platform?

How do I check if a C++ std::string starts with a certain string, and convert a substring to an int?

Categories

Resources