Find substring in string using locale - c++

I need to find if a string contains a substring, but according to the current locale's rules.
So, if I'm searching for the string "aba", with the Spanish locale, "cabalgar", "rábano" and "gabán" would all three contain it.
I know I can compare strings with locale information (collate), but is there any built-in or straightforward way to do the same with find, or do I have to write my own?
I'm fine using std::string (up to TR1) or MFC's CString

For reference, here is an implementation using boost locale compiled with ICU backend:
#include <iostream>
#include <boost/locale.hpp>
namespace bl = boost::locale;
std::locale usedLocale;
std::string normalize(const std::string& input)
{
const bl::collator<char>& collator = std::use_facet<bl::collator<char> >(usedLocale);
return collator.transform(bl::collator_base::primary, input);
}
bool contain(const std::string& op1, const std::string& op2){
std::string normOp2 = normalize(op2);
//Gotcha!! collator.transform() returns an accessible null byte (\0) at
//the end of the string. That's why we search only up to 'normOp2.length()-1'
return normalize(op1).find( normOp2.c_str(), 0, normOp2.length()-1 ) != std::string::npos;
}
int main()
{
bl::generator generator;
usedLocale = generator(""); //use default system locale
std::cout << std::boolalpha
<< contain("cabalgar", "aba") << "\n"
<< contain("rábano", "aba") << "\n"
<< contain("gabán", "aba") << "\n"
<< contain("gabán", "Âbã") << "\n"
<< contain("gabán", "aba.") << "\n";
}
Output:
true
true
true
true
false

You could loop over the string indices, and compare a substring with the string you want to find with std::strcoll.
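A minimal sketch of that loop, assuming a hypothetical helper named contains_coll (not part of any library). It compares byte-sized windows, so with a multi-byte encoding such as UTF-8 a window can split a character. In many locales std::strcoll also still distinguishes accented from unaccented letters, so this may not give the accent-insensitive match the question asks for.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true if some window of `haystack` with the same byte
// length as `needle` compares equal to `needle` under std::strcoll, which
// uses the LC_COLLATE category of the current C locale.
bool contains_coll(const std::string& haystack, const std::string& needle)
{
    if (needle.size() > haystack.size()) return false;
    for (std::size_t i = 0; i + needle.size() <= haystack.size(); ++i) {
        const std::string window = haystack.substr(i, needle.size());
        if (std::strcoll(window.c_str(), needle.c_str()) == 0) return true;
    }
    return false;
}
```

Call std::setlocale(LC_COLLATE, "") beforehand to pick up the user's locale; in the default "C" locale this behaves like a plain byte-wise search.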

I haven't used this before, but std::strxfrm looks to be what you could use:
http://en.cppreference.com/w/cpp/locale/collate/transform
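#include <clocale>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <string>
std::string xfrm(std::string const& input)
{
    // The first call only reports the required length (without the terminator).
    const std::size_t len = std::strxfrm(nullptr, input.c_str(), 0);
    std::string result(len + 1, '\0');
    std::strxfrm(&result[0], input.c_str(), result.size());
    result.resize(len); // drop the terminator so it cannot interfere with find()
    return result;
}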
int main()
{
using namespace std;
setlocale(LC_ALL, "es_ES.UTF-8");
const string aba = "aba";
const string rabano = "rábano";
cout << "Without xfrm: " << aba << " in " << rabano << " == " <<
boolalpha << (string::npos != rabano.find(aba)) << "\n";
cout << "Using xfrm: " << aba << " in " << rabano << " == " <<
boolalpha << (string::npos != xfrm(rabano).find(xfrm(aba))) << "\n";
}
However, as you can see... This doesn't do what you want. See the comment at your question.

Related

How would I make an underline exactly the length of any text inputted as well as capitalizing every letter

Sorry, I'm really new to programming and need some assistance. How would I make this happen? This is the function I currently have.
void DisplayTitle(string aTitle) {
cout << "\t" << aTitle << endl;
cout << "\t--------------\n\n";
}
How would I go about making sure that no matter which title is inputted, every character will be capitalized and the underscores will be the same amount of characters as the displayed title above.
You can use std::setfill combined with std::setw from <iomanip> as follows:
std::cout << std::setfill('-') << std::setw(title.size()) << "";
Here, you're telling the stream to use a padding character of '-', then a padded output size that's the length of your title, and then output an empty string. Because the string is empty, it will pad that entire area.
#include <iostream>
#include <iomanip>
#include <string>
void DisplayTitle(const std::string& title, const char* prefix = "\t")
{
std::cout << prefix << title << "\n";
std::cout << prefix << std::setfill('-') << std::setw(title.size()) << "" << "\n\n";
}
int main()
{
for (std::string title; std::getline(std::cin, title); )
{
DisplayTitle(title);
}
}
Example input:
One flew over the cuckoo's nest
The birds and the bees
Example output:
One flew over the cuckoo's nest
-------------------------------
The birds and the bees
----------------------
Oh, it seems I missed the fact your question was asking two things. You also want to capitalize the title. You can do that with std::transform, and in fact it can even be done without modifying the string:
// Also requires <algorithm>, <cctype> and <iterator>.
void DisplayTitle(const std::string& title, const char* prefix = "\t")
{
    // Write title in all-caps
    std::cout << prefix;
    std::transform(title.begin(), title.end(),
                   std::ostream_iterator<char>(std::cout),
                   // Go through unsigned char: std::toupper is undefined for negative values
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    std::cout << "\n";
    // Underline title
    std::cout << prefix << std::setfill('-') << std::setw(title.size()) << "" << "\n\n";
}
You can use std::transform and toupper to capitalize the string.
You can use std::string's two-parameter constructor, which takes a length and a character, to generate a sequence of - of the same length as the title.
Together we get:
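#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
void DisplayTitle(std::string aTitle) {
    // Uppercase in place; go through unsigned char because std::toupper is undefined for negative values
    std::transform(aTitle.begin(), aTitle.end(), aTitle.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    std::cout << "\t" << aTitle << "\n";
    std::cout << "\t" << std::string(aTitle.length(), '-') << "\n\n";
}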
int main()
{
for (std::string title; std::getline(std::cin, title); )
{
DisplayTitle(title);
}
}

C++ regular expression sscanf

I want to use sscanf to extract the first two integers (5 and 10) in a string
rssi = 5
ber = 10
like this :
#include <stdio.h>
#include <iostream>
#include <string>
using namespace std;
int main() {
std::string str = "\r\n+CSQ: 5,10\r\n\r\nOK\r\n7556\r\n";
unsigned char lBufRX[100];
char *rssi, *ber;
if((sscanf(str.c_str(), "%*[^:]: %s,%s[^\n]", rssi, ber)) != 2) {
std::cout <<"[" << rssi << "]" << "[" << ber << "]" << std::endl;
}
return 0;
}
The result is wrong. Can anyone help me?
My output is "[5,10][". My intent was that "%*[^:]: " skips up to the first integer, so the first %s reads "5", and ",%s[^\n]" reads the second integer, "10", up to the \r\n.
Thanks
You have several errors. You're using char* although you want to read two ints. But then you don't allocate memory for them. Also, you're expecting two successful parses but check with != 2. Here is some code that works:
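#include <cstdio>
#include <iostream>
#include <string>
int main() {
    std::string str = "\r\n+CSQ: 5,10\r\n\r\nOK\r\n7556\r\n";
    int rssi, ber;
    // %*[^:] skips everything up to the colon; each %d then reads one integer
    if (std::sscanf(str.c_str(), "%*[^:]: %d,%d", &rssi, &ber) == 2) {
        std::cout << "[" << rssi << "]" << "[" << ber << "]" << std::endl;
    }
    return 0;
}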

What am I doing wrong here with find and string?

I am asking the user to enter a date in a format with slashes. Then I try to find the slashes in the string using find. I get an error saying I cannot compare a pointer with an integer in my if statement. Here is the code.
// test inputing string date formats
#include <iostream>
#include <string>
#include <algorithm>
int main() {
std::string dateString;
int month,day,year;
std::cout << "Enter a date in format of 5/14/1999: ";
std::getline(std::cin,dateString);
std::cout << "You entered " << dateString << std::endl;
if (std::find(dateString.begin(),dateString.end(),"/") != dateString.end()) {
std::cout << "Found slash in date.\n";
}
else {
std::cout << "screwed it up.\n";
}
}
Any help is appreciated.
if (std::find(dateString.begin(),dateString.end(),"/") != dateString.end()) {
"/" is a string literal, or a const char * (actually a const char[2] in this case, to be pedantic, but this is not germane). The third parameter to std::find, in this case, should be a char, a single character.
You probably meant
if (std::find(dateString.begin(),dateString.end(),'/') != dateString.end()) {
I think you can use
if (dateString.find("/") != std::string::npos) {
std::cout << "Found slash in date.\n";
} else {
std::cout << "screwed it up.\n";
}
to find substring/char in a string. Note that std::string::find() works for char, const char * and std::string.
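Since the variables month, day and year in the question suggest the date is meant to be split up afterwards, here is a minimal sketch of how the positions returned by find could be used for that. The Date struct and the parse_date helper are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <exception>
#include <string>

struct Date { int month; int day; int year; };

// Hypothetical helper: returns false when the input does not contain two
// slashes or when a field does not start with a number.
bool parse_date(const std::string& s, Date& out)
{
    const std::size_t first = s.find('/');
    if (first == std::string::npos) return false;
    const std::size_t second = s.find('/', first + 1);
    if (second == std::string::npos) return false;
    try {
        out.month = std::stoi(s.substr(0, first));
        out.day   = std::stoi(s.substr(first + 1, second - first - 1));
        out.year  = std::stoi(s.substr(second + 1));
    } catch (const std::exception&) {
        // std::stoi throws std::invalid_argument or std::out_of_range
        return false;
    }
    return true;
}
```

Note that std::stoi stops at the first non-digit, so a field such as "14abc" is still accepted as 14; a stricter check would need the optional position argument of std::stoi.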

Passing variable (array type) from function to "main" scope Type: std::tr1::match_results<std::string::const_iterator>

I would like to pass the variable from a function to the main scope from which I'm calling it. I'm trying to do it like I used to do in C, but it returns nothing.
I want to be able to output and deal with it after the return of the function
#include "StdAfx.h"
#include <regex>
#include <iostream>
#include <string>
#include <conio.h>
using namespace std;
std::tr1::match_results<std::string::const_iterator> match(std::string& regex, const std::string& ip,std::tr1::match_results<std::string::const_iterator> res)
{
const std::tr1::regex pattern(regex.c_str());
bool valid = std::tr1::regex_match(ip, res, pattern);
std::cout << ip << " \t: " << (valid ? "valid" : "invalid") << std::endl;
cout << "FIRST RES FOUND: " << res[1] << endl;
return res;
}
int main()
{
string regex = "(\\d{1,3}):(\\d{1,3}):(\\d{1,3}):(\\d{1,3})";
string ip = "49:22:33:444";
std::tr1::match_results<std::string::const_iterator> res;
match(regex,ip.c_str(), res);
cout << "Result >" << res[1] << "< " << endl;
_getch(); return 0;
}
When I compile and run, the output is: "FIRST RES FOUND: 49
Result ><"
It's probably a really simple fix, but what do I have to change so that main can read it correctly, as in "Result >49<"?
Thanks in advance. :)
Option 1: Use references:
void match(string& regex, const string& ip, tr1::match_results<string::const_iterator> & res)
{
const tr1::regex pattern(regex.c_str());
bool valid = tr1::regex_match(ip, res, pattern);
cout << ip << " \t: " << (valid ? "valid" : "invalid") << endl;
cout << "FIRST RES FOUND: " << res[1] << endl;
}
Option 2: Return the result by value and store it:
tr1::match_results<string::const_iterator> match(string& regex, const string& ip)
{
tr1::match_results<string::const_iterator> res;
// ...
return res;
}
int main()
{
// ...
tr1::match_results<string::const_iterator> res = match(regex, ip);
}
On a separate note, there should be absolutely no need for all the c_str() calls, as <regex> has a perfectly functional std::string interface. Check the documentation for details, you just have to get a couple of typenames right.
Edit: Here are some basic examples on using std::string. There are equivalent constructions for std::wstring, char* and wchar_t*, but std::strings should be the most useful one.
Since <regex> support is still patchy, you should consider the TR1 and Boost alternatives, too; I provide all three and you can pick one:
namespace ns = std; // for <regex>
namespace ns = std::tr1; // for <tr1/regex>
namespace ns = boost; // for <boost/regex.hpp>
ns::regex r("");
ns::smatch rxres; // 's' for 'string'
std::string data = argv[1]; // the data to be matched
// Fun #1: Search once
if (!ns::regex_search(data, rxres, r))
{
std::cout << "No match." << std::endl;
return 0;
}
// Fun #2: Iterate over all matches
ns::sregex_iterator rt(data.begin(), data.end(), r), rend;
for ( ; rt != rend; ++rt)
{
// *rt is the entire match object
for (auto it = rt->begin(), end = rt->end(); it != end; ++it)
{
// *it is the current capture group; the first one is the entire match
std::cout << " Match[" << std::distance(rt->begin(), it) << "]: " << *it << ", length " << it->length() << std::endl;
}
}
Don't forget to handle exceptions of type ns::regex_error.
Pass in res by reference instead of by value. In other words, declare the parameter res as a reference instead of a value, i.e., type &res, not type res.
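For readers on a compiler with full C++11 <regex> support, the by-reference version might be sketched as below. The helper name match_ip is hypothetical. One assumption to keep in mind: a std::smatch stores iterators into the searched string, so that string has to outlive the result. Passing ip.c_str() to a const std::string& parameter, as the question does, creates a temporary string that is destroyed before the result is read.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: fills `res` (passed by reference) and reports whether
// the whole of `ip` matched `pattern`. `ip` must stay alive while `res` is used.
bool match_ip(const std::string& pattern, const std::string& ip, std::smatch& res)
{
    const std::regex re(pattern);
    return std::regex_match(ip, res, re);
}
```

With the pattern from the question and a named std::string holding "49:22:33:444", res[1] then holds the first capture group after the call returns.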

How do I check if a C++ std::string starts with a certain string, and convert a substring to an int?

How do I implement the following (Python pseudocode) in C++?
if argv[1].startswith('--foo='):
foo_value = int(argv[1][len('--foo='):])
(For example, if argv[1] is --foo=98, then foo_value is 98.)
Update: I'm hesitant to look into Boost, since I'm just looking at making a very small change to a simple little command-line tool (I'd rather not have to learn how to link in and use Boost for a minor change).
Use the rfind overload that takes the search position pos parameter, and pass zero for it:
std::string s = "tititoto";
if (s.rfind("titi", 0) == 0) { // pos=0 limits the search to the prefix
// s starts with prefix
}
Who needs anything else? Pure STL!
Many have misread this to mean "search backwards through the whole string looking for the prefix". That would give the wrong result (e.g. string("tititito").rfind("titi") returns 2 so when compared against == 0 would return false) and it would be inefficient (looking through the whole string instead of just the start). But it does not do that because it passes the pos parameter as 0, which limits the search to only match at that position or earlier. For example:
std::string test = "0123123";
size_t match1 = test.rfind("123"); // returns 4 (rightmost match)
size_t match2 = test.rfind("123", 2); // returns 1 (skipped over later match)
size_t match3 = test.rfind("123", 0); // returns std::string::npos (i.e. not found)
You would do it like this:
std::string prefix("--foo=");
if (!arg.compare(0, prefix.size(), prefix))
foo_value = std::stoi(arg.substr(prefix.size()));
Looking for a lib such as Boost.ProgramOptions that does this for you is also a good idea.
Just for completeness, I will mention the C way to do it:
If str is your original string and substr is the substring you want to check, then
strncmp(str, substr, strlen(substr))
will return 0 if str starts with substr. The functions strncmp and strlen are in the C header file <string.h>
(originally posted by Yaseen Rauf here, markup added)
For a case-insensitive comparison, use strnicmp instead of strncmp (strnicmp is non-standard; POSIX systems provide strncasecmp instead).
This is the C way to do it, for C++ strings you can use the same function like this:
strncmp(str.c_str(), substr.c_str(), substr.size())
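To cover the second half of the question (converting the rest to an int) in the same C style, here is a minimal sketch that combines strncmp with strtol. The helper name parse_foo and the hard-coded "--foo=" prefix are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns true and stores the number in `value` only if
// `arg` is exactly "--foo=" followed by a decimal integer.
bool parse_foo(const char* arg, long& value)
{
    static const char prefix[] = "--foo=";
    const std::size_t n = sizeof prefix - 1;   // length without the terminator
    if (std::strncmp(arg, prefix, n) != 0) return false;
    char* end = nullptr;
    value = std::strtol(arg + n, &end, 10);
    // Reject an empty number and any trailing garbage.
    return end != arg + n && *end == '\0';
}
```

Unlike atoi, strtol reports where parsing stopped, which is what makes the trailing-garbage check possible.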
If you're already using Boost, you can do it with boost string algorithms + boost lexical cast:
#include <boost/algorithm/string/predicate.hpp>
#include <boost/lexical_cast.hpp>
try {
if (boost::starts_with(argv[1], "--foo="))
foo_value = boost::lexical_cast<int>(argv[1]+6);
} catch (const boost::bad_lexical_cast&) {
// bad parameter
}
This kind of approach, like many of the other answers provided here, is OK for very simple tasks, but in the long run you are usually better off using a command line parsing library. Boost has one (Boost.Program_options), which may make sense if you happen to be using Boost already.
Otherwise a search for "c++ command line parser" will yield a number of options.
Code I use myself:
std::string prefix = "-param=";
std::string argument = argv[1];
if(argument.substr(0, prefix.size()) == prefix) {
std::string argumentValue = argument.substr(prefix.size());
}
Nobody used the STL algorithm/mismatch function yet. If this returns true, prefix is a prefix of 'toCheck':
std::mismatch(prefix.begin(), prefix.end(), toCheck.begin()).first == prefix.end()
Full example prog:
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, char** argv) {
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "Will print true if 'prefix' is a prefix of string" << std::endl;
return -1;
}
std::string prefix(argv[1]);
std::string toCheck(argv[2]);
if (prefix.length() > toCheck.length()) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "'prefix' is longer than 'string'" << std::endl;
return 2;
}
if (std::mismatch(prefix.begin(), prefix.end(), toCheck.begin()).first == prefix.end()) {
std::cout << '"' << prefix << '"' << " is a prefix of " << '"' << toCheck << '"' << std::endl;
return 0;
} else {
std::cout << '"' << prefix << '"' << " is NOT a prefix of " << '"' << toCheck << '"' << std::endl;
return 1;
}
}
Edit:
As @James T. Huggett suggests, std::equal is a better fit for the question "Is A a prefix of B?" and makes for slightly shorter code:
std::equal(prefix.begin(), prefix.end(), toCheck.begin())
Full example prog:
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, char **argv) {
if (argc != 3) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "Will print true if 'prefix' is a prefix of string"
<< std::endl;
return -1;
}
std::string prefix(argv[1]);
std::string toCheck(argv[2]);
if (prefix.length() > toCheck.length()) {
std::cerr << "Usage: " << argv[0] << " prefix string" << std::endl
<< "'prefix' is longer than 'string'" << std::endl;
return 2;
}
if (std::equal(prefix.begin(), prefix.end(), toCheck.begin())) {
std::cout << '"' << prefix << '"' << " is a prefix of " << '"' << toCheck
<< '"' << std::endl;
return 0;
} else {
std::cout << '"' << prefix << '"' << " is NOT a prefix of " << '"'
<< toCheck << '"' << std::endl;
return 1;
}
}
With C++17 you can use std::basic_string_view & with C++20 std::basic_string::starts_with or std::basic_string_view::starts_with.
The benefit of std::string_view in comparison to std::string - regarding memory management - is that it only holds a pointer to a "string" (contiguous sequence of char-like objects) and knows its size. Example without moving/copying the source strings just to get the integer value:
#include <exception>
#include <iostream>
#include <string>
#include <string_view>
int main()
{
constexpr auto argument = "--foo=42"; // Emulating command argument.
constexpr auto prefix = "--foo=";
auto inputValue = 0;
constexpr auto argumentView = std::string_view(argument);
if (argumentView.starts_with(prefix))
{
constexpr auto prefixSize = std::string_view(prefix).size();
try
{
// The underlying data of argumentView is nul-terminated, therefore we can use data().
// Note that std::stoi takes a std::string, so a temporary string is still built here.
inputValue = std::stoi(argumentView.substr(prefixSize).data());
}
catch (const std::exception& e)
{
std::cerr << e.what();
}
}
std::cout << inputValue; // 42
}
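If avoiding every copy matters, one option is std::from_chars from <charconv> (C++17), which parses directly from a character range and needs neither a std::string nor a nul terminator. The sketch below is an illustration under that assumption; the helper name parse_prefixed_int is hypothetical, and it uses a substr comparison for the prefix test so it does not depend on C++20.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: returns the integer following `prefix`, or nullopt if
// the prefix is missing or the remainder is not entirely a decimal integer.
std::optional<int> parse_prefixed_int(std::string_view arg, std::string_view prefix)
{
    if (arg.substr(0, prefix.size()) != prefix) return std::nullopt;
    arg.remove_prefix(prefix.size());
    int value = 0;
    const char* first = arg.data();
    const char* last = arg.data() + arg.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return value;
}
```

std::from_chars accepts a leading minus sign but not a leading plus sign or whitespace, so it is stricter than std::stoi.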
Given that both strings (argv[1] and "--foo") are C strings, @FelixDombek's answer is hands-down the best solution.
Seeing the other answers, however, I thought it worth noting that, if your text is already available as a std::string, then a simple, zero-copy, maximally efficient solution exists that hasn't been mentioned so far:
const char * foo = "--foo";
if (text.rfind(foo, 0) == 0)
foo_value = text.substr(strlen(foo));
And if foo is already a string:
std::string foo("--foo");
if (text.rfind(foo, 0) == 0)
foo_value = text.substr(foo.length());
Starting with C++20, you can use the starts_with method.
std::string s = "abcd";
if (s.starts_with("abc")) {
...
}
text.substr(0, start.length()) == start
Using STL this could look like:
std::string prefix = "--foo=";
std::string arg = argv[1];
if (prefix.size()<=arg.size() && std::equal(prefix.begin(), prefix.end(), arg.begin())) {
std::istringstream iss(arg.substr(prefix.size()));
iss >> foo_value;
}
At the risk of being flamed for using C constructs, I do think this sscanf example is more elegant than most Boost solutions. And you don't have to worry about linkage if you're running anywhere that has a Python interpreter!
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
for (int i = 1; i != argc; ++i) {
int number = 0;
int size = 0;
sscanf(argv[i], "--foo=%d%n", &number, &size);
if (size == strlen(argv[i])) {
printf("number: %d\n", number);
}
else {
printf("not-a-number\n");
}
}
return 0;
}
Here's some example output that demonstrates the solution handles leading/trailing garbage as correctly as the equivalent Python code, and more correctly than anything using atoi (which will erroneously ignore a non-numeric suffix).
$ ./scan --foo=2 --foo=2d --foo='2 ' ' --foo=2'
number: 2
not-a-number
not-a-number
not-a-number
I use std::string::compare wrapped in a utility method like below:
static bool startsWith(const string& s, const string& prefix) {
return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}
C++20 update :
Use std::string::starts_with
https://en.cppreference.com/w/cpp/string/basic_string/starts_with
std::string str_value = /* smthg */;
const auto starts_with_foo = str_value.starts_with(std::string_view{"foo"});
In C++20 now there is starts_with available as a member function of std::string defined as:
constexpr bool starts_with(string_view sv) const noexcept;
constexpr bool starts_with(CharT c) const noexcept;
constexpr bool starts_with(const CharT* s) const;
So your code could be something like this:
std::string s{argv[1]};
if (s.starts_with("--foo="))
In case you need C++11 compatibility and cannot use boost, here is a boost-compatible drop-in with an example of usage:
#include <algorithm>
#include <exception>
#include <iostream>
#include <string>
static bool starts_with(const std::string& str, const std::string& prefix)
{
    return ((prefix.size() <= str.size()) && std::equal(prefix.begin(), prefix.end(), str.begin()));
}
int main(int argc, char* argv[])
{
bool usage = false;
unsigned int foos = 0; // default number of foos if no parameter was supplied
if (argc > 1)
{
const std::string fParamPrefix = "-f="; // shorthand for foo
const std::string fooParamPrefix = "--foo=";
for (int i = 1; i < argc; ++i)
{
const std::string arg = argv[i];
try
{
if ((arg == "-h") || (arg == "--help"))
{
usage = true;
} else if (starts_with(arg, fParamPrefix)) {
foos = std::stoul(arg.substr(fParamPrefix.size()));
} else if (starts_with(arg, fooParamPrefix)) {
foos = std::stoul(arg.substr(fooParamPrefix.size()));
}
} catch (std::exception& e) {
std::cerr << "Invalid parameter: " << argv[i] << std::endl << std::endl;
usage = true;
}
}
}
if (usage)
{
std::cerr << "Usage: " << argv[0] << " [OPTION]..." << std::endl;
std::cerr << "Example program for parameter parsing." << std::endl << std::endl;
std::cerr << " -f, --foo=N use N foos (optional)" << std::endl;
return 1;
}
std::cerr << "number of foos given: " << foos << std::endl;
}
Why not use GNU getopt_long? Here's a basic example (without safety checks):
#include <getopt.h>
#include <stdio.h>
int main(int argc, char** argv)
{
option long_options[] = {
{"foo", required_argument, 0, 0},
{0,0,0,0}
};
getopt_long(argc, argv, "f:", long_options, 0);
printf("%s\n", optarg);
}
For the following command:
$ ./a.out --foo=33
You will get
33
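A sketch of the same idea with the safety checks added, assuming a platform that provides getopt_long in <getopt.h> (glibc and the BSDs do). The helper name parse_foo_option is hypothetical. Unlike the example above, the long option is mapped to 'f' so that -f and --foo are handled by one branch.

```cpp
#include <getopt.h>
#include <cstdlib>

// Hypothetical helper: returns true and stores N in `value` when the
// arguments contain --foo=N or -f N with a valid decimal integer N.
bool parse_foo_option(int argc, char** argv, long& value)
{
    static const option long_options[] = {
        {"foo", required_argument, nullptr, 'f'},
        {nullptr, 0, nullptr, 0}
    };
    optind = 1;  // restart scanning in case getopt was used before
    bool found = false;
    int c;
    while ((c = getopt_long(argc, argv, "f:", long_options, nullptr)) != -1) {
        if (c != 'f') return false;  // unknown option or missing argument
        char* end = nullptr;
        value = std::strtol(optarg, &end, 10);
        if (end == optarg || *end != '\0') return false;  // not a clean integer
        found = true;
    }
    return found;
}
```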
Ok why the complicated use of libraries and stuff? C++ String objects overload the [] operator, so you can just compare chars.. Like what I just did, because I want to list all files in a directory and ignore invisible files and the .. and . pseudofiles.
while ((ep = readdir(dp)))
{
string s(ep->d_name);
if (!(s[0] == '.')) // Omit invisible files and .. or .
files.push_back(s);
}
It's that simple..
You can also use strstr:
if (strstr(str, substr) == substr) {
// 'str' starts with 'substr'
}
but I think it's good only for short strings because it has to loop through the whole string when the string doesn't actually start with 'substr'.
You can use find() and find_first_of(); both are std::string members in every C++ standard, so C++11 is not required.
Example using find to find a single char:
#include <string>
std::string name = "Aaah";
size_t found_index = name.find('a');
if (found_index != std::string::npos) {
// Found string containing 'a'
}
Example using find to search for a char, starting from position 3:
std::string name = "Aaah";
size_t found_index = name.find('h', 3);
if (found_index != std::string::npos) {
// Found string containing 'h'
}
Example using the find_first_of() and only the first char, to search at the start only:
std::string name = ".hidden._di.r";
size_t found_index = name.find_first_of('.');
if (found_index == 0) {
// Found '.' at first position in string
}
Good luck!
std::string text = "--foo=98";
std::string start = "--foo=";
if (text.find(start) == 0)
{
int n = std::stoi(text.substr(start.length()));
std::cout << n << std::endl;
}
Since C++11, std::regex_search can also be used to match more complex expressions. The following example also handles floating-point numbers through std::stof and a subsequent cast to int.
However the parseInt method shown below could throw a std::invalid_argument exception if the prefix is not matched; this can be easily adapted depending on the given application:
#include <iostream>
#include <regex>
int parseInt(const std::string &str, const std::string &prefix) {
std::smatch match;
std::regex_search(str, match, std::regex("^" + prefix + "([+-]?(?=\\.?\\d)\\d*(?:\\.\\d*)?(?:[Ee][+-]?\\d+)?)$"));
return std::stof(match[1]);
}
int main() {
std::cout << parseInt("foo=13.3", "foo=") << std::endl;
std::cout << parseInt("foo=-.9", "foo=") << std::endl;
std::cout << parseInt("foo=+13.3", "foo=") << std::endl;
std::cout << parseInt("foo=-0.133", "foo=") << std::endl;
std::cout << parseInt("foo=+00123456", "foo=") << std::endl;
std::cout << parseInt("foo=-06.12e+3", "foo=") << std::endl;
// throw std::invalid_argument
// std::cout << parseInt("foo=1", "bar=") << std::endl;
return 0;
}
The kind of magic of the regex pattern is well detailed in the following answer.
EDIT: the previous answer did not perform the conversion to integer.
if(boost::starts_with(string_to_search, string_to_look_for))
intval = boost::lexical_cast<int>(string_to_search.substr(string_to_look_for.length()));
This is completely untested. The principle is the same as the Python one. Requires Boost.StringAlgo and Boost.LexicalCast.
Check if the string starts with the other string, and then get the substring ('slice') of the first string and convert it using lexical cast.