What's an easy way to compare case insensitively of strings? [duplicate] - c++

This question already has answers here:
Case-insensitive string comparison in C++ [closed]
(30 answers)
C++11 case insensitive comparison of beginning of a string (unicode)
(3 answers)
Closed 3 years ago.
Trying to compare strings using:
!(stringvector[i]).compare(vector[j][k])
only works for some entries of
vector[j][k]
-- namely ones that are a case sensitive string match.
How do I get case-insensitive matching from this functionality?
Here is a bit of code I was working on
#include <iostream>
#include <vector>
#include <string>
using namespace std; //poor form
vector<string> stringvector = {"Yo", "YO", "babbybabby"};
vector<string> vec1 = {"yo", "Yo" , "these"};
vector<string> vec2 = {"these", "checked" , "too" , "Yo", "babbybabby"};
vector<vector<string>> vvs = {vec1, vec2};
for (int v = 0; v < vvs.size(); v++) //first index of vector
{
for(int s = 0; s < vvs[v].size(); s++) //second index of vector
{
for(int w = 0; w < stringvector.size(); w++)
{
if (stringvector[w] == vvs[v][s])
{cout << "******FOUND******";}
}
}
}
This doesn't print out FOUND for the case-insensitive matches.
Stringvector[w] == vvs[v][s] does not make case-insensitive comparison, is there a way to add this functionality easily?
--Prof D

tl;dr
Use the ICU library.
"The easy way", when it comes to natural language strings, is usually fraught with problems.
As I pointed out in my answer to that "lowercase conversion" answer #Armando linked to, if you want to actually do it right, you're currently best off using the ICU library, because nothing in the standard gives you actual Unicode support at this point.
If you look at the docs to std::tolower as used by #NutCracker, you will find that...
Only 1:1 character mapping can be performed by this function, e.g. the Greek uppercase letter 'Σ' has two lowercase forms, depending on the position in a word: 'σ' and 'ς'. A call to std::tolower cannot be used to obtain the correct lowercase form in this case.
If you want to do this correctly, you need full Unicode support, and that means the ICU library until some later revision of the C++ standard actually introduces that to the standard library.
Using icu::UnicodeString -- clunky as it might be at first -- for storing your language strings gives you access to caseCompare(), which does a proper case-insensitive comparison.

You can implement a function for this purpose, example:
bool areEqualsCI(const string &x1, const string &x2){
if(x1.size() != x2.size()) return false;
for(unsigned int i=0; i<x2.size(); ++i)
if(tolower((unsigned char)x1[i]) != tolower((unsigned char)x2[i])) return false;
return true;
}
I recommendy see this post How to convert std::string to lower case?

First, I gave myself some freedom to pretty up your code a bit. For that purpose I replaced ordinary for loops with range-based for loops. Furthermore, I have changed your names of the variables. They are not perfect yet though since I don't know what's the purpose of the code. However, here is a refactored code:
#include <iostream>
#include <vector>
#include <string>
int main() {
std::vector<std::string> vec1 = { "Yo", "YO", "babbybabby" };
std::vector<std::string> vec2 = { "yo", "Yo" , "these" };
std::vector<std::string> vec3 = { "these", "checked", "too", "Yo", "babbybabby" };
std::vector<std::vector<std::string>> vec2_vec3 = { vec2, vec3 };
for (auto const& i : vec2_vec3) {
for (auto const& j : i) {
for (auto const& k : vec1) {
if (k == j) {
std::cout << k << " == " << j << std::endl;
}
}
}
}
return 0;
}
Now, if you want to compare strings case-insensitively and if you have access to Boost library, you could use boost::iequals in the following manner:
#include <boost/algorithm/string.hpp>
std::string str1 = "yo";
std::string str2 = "YO";
if (boost::iequals(str1, str2)) {
// identical strings
}
On the other hand, if you don't have access to Boost library, you can make your own iequals function by using STL algorithms (C++14 required):
bool iequals(const string& a, const string& b) {
return std::equal(str1.begin(), str1.end(),
str2.begin(), str2.end(),
[](char a, char b) {
return std::tolower(a, std::locale()) == std::tolower(b, std::locale());
});
}
std::string str1 = "yo";
std::string str2 = "YO";
if (iequals(str1, str2)) {
// identical strings
}
Note that this would only work for Single-Byte Character Sets (SBCS).

Related

Reverse string and Palindrome

input : "A man, a plan, a canal: Panama"
Why does output capital char become small not capital?
output :
x: amanaPlanacanalpanam
y: AmanaplanacanalPanama
#include <iostream>
#include <stdbool.h>
#include <stdio.h>
#include<string.h>
using namespace std;
int isPalindrome(char* x) {
int i, j = 0;
char y[50];
int n = strlen(x);
for (i = 0; i < n; i++)
if ((x[i] >= 'a' && x[i] <= 'z') || (x[i] >= 'A' && x[i] <= 'Z'))
y[j++] = x[i];
y[j] = '\0';
n = strlen(y);
int c = n - 1;
for (i = 0; i < n - 1; i++) {
x[i] = y[c];
c--;
}
x[i] = '\0';
when i print x and y :
cout << "x: " << x << "\ny: " << y;
why char are small
output : x: amanaPlanacanalpanam
y: AmanaplanacanalPanama
if (strcmp(x, y) == 0)
return 1;
return 0;
}
int main() {
char x[50];
cout << "Enter : ";
scanf_s("%[^\n]", x, sizeof(x));
if (isPalindrome(x))
printf("\n true\n");
else
printf("\nfalse\n");
}
screen for output :
output screen
If you are using C++, you should use std::string instead and that will make your life easier and code much cleaner. So let's first create a function, that returns a string reversed:
std::string reverse( const std::string &s )
{
return std::string( s.rbegin(), s.rend() );
}
very simple, right? Now we need a function, that removes all non letters from a string, here it is:
std::string removeNonLetters( std::string s )
{
auto it = std::remove_if( s.begin(), s.end(), []( char c ) { return not std::isalpha( c ); } );
return std::string( s.begin(), it );
}
It is little bit more complicated, as we use remove-erase idiom from `std::remove_if() but it is a 2 line function. And last task - we need to make our string all lowercase:
std::string lowerCase( std::string s )
{
std::transform( s.begin(), s.end(), s.begin(), []( char c ) { return std::tolower( c ); } );
return s;
}
which uses another algo from standard library - std::transform(), here we can test them all live example, results:
amanaP :lanac a ,nalp a ,nam A
AmanaplanacanalPanama
a man, a plan, a canal: panama
Now combining these 3 functions you can easily achieve your task - to check if a string is a palindrome.
Note: if using standard algorithms is too complicated for you or it is too early for your class, you can easily rewrite them using loops. Point is you should write simple functions, test that they do work and then combine them into more complex task.
Your code is not really C++. It's C with cin and cout mixed in. C++ lets you do this in a much nicer way! You don't need stdbool header in C++, by the way.
First, let's see how it could be done in C++98, using built-in algorithms. Try it on online!: https://godbolt.org/z/j5s4r91s1
// C++98, v1
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
// We adapt the C-compatible signatures to C++
char my_tolower(char c) { return std::tolower(c); }
// We pass the string by const reference, thus avoiding the copy.
// Passing by value would look like is_palindrome(const std::string str)
// and would be wasteful, since that copy is not needed at all.
bool is_palindrome(const std::string &input)
{
std::string filtered;
// For each character of the input string (within the {begin(), end()} range),
// insert at the back of the filtered string only if isalpha returns true
// for that character. This removes non-letters - space and punctuation.
for (std::string::const_iterator it = input.begin(); it != input.end(); ++it)
if (std::isalpha(*it))
filtered.push_back(*it);
if (input == "A man, a plan, a canal: Panama")
assert (filtered == "AmanaplanacanalPanama");
// Transform the filtered string to lower case: for each character
// in the {begin(), end()} range, apply the my_tolower function to it, and
// write the result back to the same range {begin(), ...}
std::transform (filtered.begin(), filtered.end(), filtered.begin(), my_tolower);
if (input == "A man, a plan, a canal: Panama")
assert (filtered == "amanaplanacanalpanama");
// Is the string in forward direction {begin(), end()}
// equal to the string in backward direction {rbegin(), ...}?
return std::equal (filtered.begin(), filtered.end(), filtered.rbegin());
}
int main()
{
// For testing, input can be provided directly in the code.
const char input[] = "A man, a plan, a canal: Panama";
// We now can plainly state that `input` contains a palindrome.
assert (is_palindrome(input));
}
Note how assert is used as a crude form of automated testing: we don't have to be entering inputs manually before we get the code working. Up till then, an assertion will suffice. Assertions are a means of conveying to both humans and machines that some condition is true. If the condition is not satisfied, the program terminates, usually with an error message.
Instead of the for loop, we should have used copy_if, but that's only available in C++11 and later:
std::copy_if (input.begin(), input.end(), std::back_inserter(filtered), my_isalpha);
But one shouldn't use the algorithms blindly. In our particular case, they don't improve performance nor make the code more generic. We can simplify things dramatically by explicitly iterating over the input string.
https://godbolt.org/z/s8YYMK3a4
// C++98, v2
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
std::string filter_for_palindrome_check(const std::string &input)
{
std::string result;
for (std::string::const_iterator it = input.begin(); it != input.end(); ++it)
if (std::isalpha(*it))
result.push_back(std::tolower(*it));
return result;
}
bool is_palindrome(const std::string &input)
{
const std::string filtered = filter_for_palindrome_check(input);
return std::equal (filtered.begin(), filtered.end(), filtered.rbegin());
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (filter_for_palindrome_check(input) == "amanaplanacanalpanama");
assert (is_palindrome(input));
}
In C++11, we could have used the range-for.
https://godbolt.org/z/WGaezj3oq
// C++11, v1
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
std::string filter_for_palindrome_check(const std::string &input)
{
std::string result;
for (char c : input)
if (std::isalpha(c))
result.push_back(std::tolower(c));
return result;
}
bool is_palindrome(const std::string &input)
{
const std::string filtered = filter_for_palindrome_check(input);
return std::equal (filtered.begin(), filtered.end(), filtered.rbegin());
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (filter_for_palindrome_check(input) == "amanaplanacanalpanama");
assert (is_palindrome(input));
}
Now we can begin to experiment with C++20. Note how the algorithm becomes less verbose even after this initial transformation. The pipe syntax is reminiscent of Unix.
https://godbolt.org/z/cM4ova8a5
// C++20, v1
#include <algorithm>
#include <cassert>
#include <cctype>
#include <ranges>
#include <string>
#include <string_view>
// Saves us from typing std:: every time we use these namespaces.
namespace views = std::ranges::views;
// We adapt the C-compatible signatures to C++
bool my_isalpha(char c) { return std::isalpha(c); }
char my_tolower(char c) { return std::tolower(c); }
// It's more general to use a string_view - always passed by value!
// - in place of const string &.
bool is_palindrome(std::string_view input)
{
// C++17 lets us do it all in a single `using` statement :)
using views::filter, views::transform, views::reverse;
std::string filtered;
std::ranges::copy(input | filter(my_isalpha) | transform(my_tolower),
std::back_inserter(filtered));
return std::ranges::equal(filtered, filtered | reverse);
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (is_palindrome(input));
}
The C++ standard that's coming after C++20 will likely make it easier to convert a range to a concrete value. In the meantime, Eric Niebler's range-v3 library does the trick.
https://godbolt.org/z/ojo9zWPr9
// C++20, v2 (using Eric Niebler's range-v3)
#include <algorithm>
#include <cassert>
#include <cctype>
#include <ranges>
#include <range/v3/to_container.hpp>
#include <string>
#include <string_view>
namespace views = std::ranges::views;
bool my_isalpha(char c) { return std::isalpha(c); }
char my_tolower(char c) { return std::tolower(c); }
bool is_palindrome(std::string_view input)
{
using views::filter, views::transform, views::reverse, ranges::to;
auto const filtered = input
| filter(my_isalpha) | transform(my_tolower) | to<std::string>();
return std::ranges::equal(filtered, filtered | reverse);
}
int main()
{
const char input[] = "A man, a plan, a canal: Panama";
assert (is_palindrome(input));
}
Note how the C++ code is now approximating a plain-language description of the algorithm: take the input, only alphanumeric part of it, in lower case, put it into a string. Call the input a palindrome if the string is equal to its reverse.
As you may have guessed, the type of filtered is const std::string.
You can of course ask: why not avoid the intermediate string? Say:
bool is_palindrome(std::string_view input)
{
using views::filter, views::transform, views::reverse;
auto const filtered = input | filter(my_isalpha) | transform(my_tolower);
return std::ranges::equal(filtered, filtered | reverse);
}
This won't compile until we make filtered non-const. That's a bit curious: are the compiler and library designers perhaps trying to tell us something? Yes!
Without said const, this approach works, but filtered here is a lazily evaluated range. Lazy evaluation means that the filtered result is computed only as needed. And here, it's needed twice: for the 1st and 2nd argument to std::ranges::equal. So this will, in general, make things worse even if it appears "simpler". Let's not write such things.
Hopefully this gives you some taste for how much C++ is a different language than C. There're some nifty things in C++, but not all of them are in the standard library. There is IMHO at least one somewhat better alternative to C++20 ranges, e.g. ast-al/rangeless. It's like C# LINQ, but in C++ :)
C++20 ranges work "well" when things are trivial, but become hard to use when you're expecting more complex expressions to "just work" - and they don't. So - the above C++20 examples must be read with this in mind. C++20 ranges aren't an answer to the ultimate question for sure.
The piece of code in which you fill x with the contents of y is not correct, since it does not consider the first element of y. If you start at index c = n - 1, but in the for loop you stop at i < n - 1, the last index you get from y is 2, not 1.
If you want to keep the original upper and lower cases, a possible solution would be to update both i and c in the loop conditions, and to assure that all the indices are correctly swapped:
for (i = 0, c = n - 1; i < n; i++, c--) {
x[i] = y[c];
}

Count unique words in a string in C++

I want to count how many unique words are in string 's' where punctuations and newline character (\n) separates each word. So far I've used the logical or operator to check how many wordSeparators are in the string, and added 1 to the result to get the number of words in string s.
My current code returns 12 as the number of word. Since 'ab', 'AB', 'aB', 'Ab' (and same for 'zzzz') are all same and not unique, how can I ignore the variants of a word? I followed the link: http://www.cplusplus.com/reference/algorithm/unique/, but the reference counts unique item in a vector. But, I am using string and not vector.
Here is my code:
#include <iostream>
#include <string>
using namespace std;
bool isWordSeparator(char & c) {
return c == ' ' || c == '-' || c == '\n' || c == '?' || c == '.' || c == ','
|| c == '?' || c == '!' || c == ':' || c == ';';
}
int countWords(string s) {
int wordCount = 0;
if (s.empty()) {
return 0;
}
for (int x = 0; x < s.length(); x++) {
if (isWordSeparator(s.at(x))) {
wordCount++;
return wordCount+1;
int main() {
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
int number_of_words = countWords(s);
cout << "Number of Words: " << number_of_words << endl;
return 0;
}
What you need to make your code case-insensitive is tolower().
You can apply it to your original string using std::transform:
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
I should add however that your current code is much closer to C than to C++, perhaps you should check out what standard library has to offer.
I suggest istringstream + istream_iterator for tokenizing and either unique_copy or set for getting rid of the duplicates, like this: https://ideone.com/nb4BEH
You could create a set of strings, save the position of the last separator (starting with 0) and use substring to extract the word, then insert it into the set. When done just return the set's size.
You could make the whole operation easier by using string::split - it tokenizes the string for you. All you have to do is insert all of the elements in the returned array to the set and again return it's size.
Edit: as per comments, you need a custom comparator to ignore case for comparisons.
First of all I'd suggest rewriting isWordSeparator like this:
bool isWordSeparator(char c) {
return std::isspace(c) || std::ispunct(c);
}
since your current implementation doesn't handle all the punctuation and space, like \t or +.
Also, incrementing wordCount when isWordSeparator is true is incorrect for example if you have something like ?!.
So, a less error-prone approach would be to substitute all separators by space and then iterate words inserting them into an (unordered) set:
#include <iterator>
#include <unordered_set>
#include <algorithm>
#include <cctype>
#include <sstream>
int countWords(std::string s) {
std::transform(s.begin(), s.end(), s.begin(), [](char c) {
if (isWordSeparator(c)) {
return ' ';
}
return std::tolower(c);
});
std::unordered_set<std::string> uniqWords;
std::stringstream ss(s);
std::copy(std::istream_iterator<std::string>(ss), std::istream_iterator<std::string(), std::inserter(uniqWords));
return uniqWords.size();
}
While splitting the string into words, insert all words into a std::set. This will get rid of the duplicates. Then it's just a matter of calling set::size() to get the number of unique words.
I'm using the boost::split() function from the boost string algorithm library in my solution, because is almost standard nowadays.
Explanations in the comments in code...
#include <iostream>
#include <string>
#include <set>
#include <boost/algorithm/string.hpp>
using namespace std;
// Function suggested by user 'mshrbkv':
bool isWordSeparator(char c) {
return std::isspace(c) || std::ispunct(c);
}
// This is used to make the set case-insensitive.
// Alternatively you could call boost::to_lower() to make the
// string all lowercase before calling boost::split().
struct IgnoreCaseCompare {
bool operator()( const std::string& a, const std::string& b ) const {
return boost::ilexicographical_compare( a, b );
}
};
int main()
{
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
// Define a set that will contain only unique strings, ignoring case.
set< string, IgnoreCaseCompare > words;
// Split the string by using your isWordSeparator function
// to define the delimiters. token_compress_on collapses multiple
// consecutive delimiters into only one.
boost::split( words, s, isWordSeparator, boost::token_compress_on );
// Now the set contains only the unique words.
cout << "Number of Words: " << words.size() << endl;
for( auto& w : words )
cout << w << endl;
return 0;
}
Demo: http://coliru.stacked-crooked.com/a/a3b51a6c6a3b4ee8
You can consider SQLite c++ wrapper

Make *it in lowercase [duplicate]

I want to convert a std::string to lowercase. I am aware of the function tolower(). However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string would require iterating over each character.
Is there an alternative which works 100% of the time?
Adapted from Not So Frequently Asked Questions:
#include <algorithm>
#include <cctype>
#include <string>
std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
[](unsigned char c){ return std::tolower(c); });
You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.
If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:
char asciitolower(char in) {
if (in <= 'Z' && in >= 'A')
return in - ('Z' - 'z');
return in;
}
std::transform(data.begin(), data.end(), data.begin(), asciitolower);
Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
Boost provides a string algorithm for this:
#include <boost/algorithm/string.hpp>
std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str
Or, for non-in-place:
#include <boost/algorithm/string.hpp>
const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);
tl;dr
Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.
First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)
If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).
So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.
Then there is the point that the standard library, for what it is capable of doing, is depending on which locales are supported on the machine your software is running on... and what do you do if your target locale is among the not supported on your client's machine?
So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.
(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)
While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.
And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)
So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:
#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>
#include <iostream>
int main()
{
/* "Odysseus" */
char const * someString = u8"ΟΔΥΣΣΕΥΣ";
icu::UnicodeString someUString( someString, "UTF-8" );
// Setting the locale explicitly here for completeness.
// Usually you would use the user-specified system locale,
// which *does* make a difference (see ı vs. i above).
std::cout << someUString.toLower( "el_GR" ) << "\n";
std::cout << someUString.toUpper( "el_GR" ) << "\n";
return 0;
}
Compile (with G++ in this example):
g++ -Wall example.cpp -licuuc -licuio
This gives:
ὀδυσσεύς
Note that the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.
Using range-based for loop of C++11 a simpler code would be :
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for(auto elem : str)
std::cout << std::tolower(elem,loc);
}
If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html
Another approach using range based for loop with reference variable
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
cout<<test<<endl;
This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.
#include <string>
#include <algorithm>
#include <iostream>
int main (int argc, char* argv[])
{
std::string sourceString = "Abc";
std::string destinationString;
// Allocate the destination space
destinationString.resize(sourceString.size());
// Convert the source string to lower case
// storing the result in destination string
std::transform(sourceString.begin(),
sourceString.end(),
destinationString.begin(),
::tolower);
// Output the result of the conversion
std::cout << sourceString
<< " -> "
<< destinationString
<< std::endl;
}
Simplest way to convert string into loweercase without bothering about std namespace is as follows
1:string with/without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
getline(cin,str);
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
2:string without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
cin>>str;
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
My own template functions which performs upper / lower case.
#include <string>
#include <algorithm>
//
// Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), tolower);
return s2;
}
//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), toupper);
return s2;
}
I wrote this simple helper function:
#include <locale> // tolower
string to_lower(string s) {
for(char &c : s)
c = tolower(c);
return s;
}
Usage:
string s = "TEST";
cout << to_lower("HELLO WORLD"); // output: "hello word"
cout << to_lower(s); // won't change the original variable.
An alternative to Boost is POCO (pocoproject.org).
POCO provides two variants:
The first variant makes a copy without altering the original string.
The second variant changes the original string in place.
"In Place" versions always have "InPlace" in the name.
Both versions are demonstrated below:
#include "Poco/String.h"
using namespace Poco;
std::string hello("Stack Overflow!");
// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));
// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);
std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page
#include <locale>
#include <iostream>
int main () {
std::locale::global(std::locale("en_US.utf8"));
std::wcout.imbue(std::locale());
std::wcout << "In US English UTF-8 locale:\n";
auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring str = L"HELLo, wORLD!";
std::wcout << "Lowercase form of the string '" << str << "' is ";
f.tolower(&str[0], &str[0] + str.size());
std::wcout << "'" << str << "'\n";
}
Since none of the answers mentioned the upcoming Ranges library, which is available in the standard library since C++20, and currently separately available on GitHub as range-v3, I would like to add a way to perform this conversion using it.
To modify the string in-place:
str |= action::transform([](unsigned char c){ return std::tolower(c); });
To generate a new string:
auto new_string = original_string
| view::transform([](unsigned char c){ return std::tolower(c); });
(Don't forget to #include <cctype> and the required Ranges headers.)
Note: the use of unsigned char as the argument to the lambda is inspired by cppreference, which states:
Like all other functions from <cctype>, the behavior of std::tolower is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char:
char my_tolower(char ch)
{
return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}
Similarly, they should not be directly used with standard algorithms when the iterator's value type is char or signed char. Instead, convert the value to unsigned char first:
std::string str_tolower(std::string s) {
std::transform(s.begin(), s.end(), s.begin(),
// static_cast<int(*)(int)>(std::tolower) // wrong
// [](int c){ return std::tolower(c); } // wrong
// [](char c){ return std::tolower(c); } // wrong
[](unsigned char c){ return std::tolower(c); } // correct
);
return s;
}
On microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx
// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>
int main( void )
{
char string[100] = "The String to End All Strings!";
char * copy1 = _strdup( string ); // make two copies
char * copy2 = _strdup( string );
_strlwr( copy1 ); // C4996
_strupr( copy2 ); // C4996
printf( "Mixed: %s\n", string );
printf( "Lower: %s\n", copy1 );
printf( "Upper: %s\n", copy2 );
free( copy1 );
free( copy2 );
}
There is a way to convert upper case to lower WITHOUT doing if tests, and it's pretty straight-forward. The isupper() function/macro's use of clocale.h should take care of problems relating to your location, but if not, you can always tweak the UtoL[] to your heart's content.
Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.
Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.
The code looks like this...
#include <clocale>
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap() {
for (int i = 0; i < sizeof(UtoL); i++) {
if (isupper(i)) {
UtoL[i] = (char)(i + 32);
} else {
UtoL[i] = i;
}
}
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
char *p = szMyStr;
// do conversion in-place so as not to require a destination buffer
while (*p) { // szMyStr must be null-terminated
*p = UtoL[*p];
p++;
}
return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
time_t start;
char *Lowered, Upper[128];
InitUtoLMap();
strcpy(Upper, "Every GOOD boy does FINE!");
Lowered = LowerStr(Upper);
return 0;
}
This approach will, at the same time, allow you to remap any other characters you wish to change.
This approach has one huge advantage when running on modern processors, there is no need to do branch prediction as there are no if tests comprising branching. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.
Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.
Here's a macro technique if you want something simple:
#define STRTOLOWER(x) std::transform (x.begin(), x.end(), x.begin(), ::tolower)
#define STRTOUPPER(x) std::transform (x.begin(), x.end(), x.begin(), ::toupper)
#define STRTOUCFIRST(x) std::transform (x.begin(), x.begin()+1, x.begin(), ::toupper); std::transform (x.begin()+1, x.end(), x.begin()+1,::tolower)
However, note that #AndreasSpindler's comment on this answer still is an important consideration, however, if you're working on something that isn't just ASCII characters.
Is there an alternative which works 100% of the time?
No
There are several questions you need to ask yourself before choosing a lowercasing method.
How is the string encoded? plain ASCII? UTF-8? some form of extended ASCII legacy encoding?
What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the users locale? do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else?
What libraries are available?
Once you have answers to those questions you can start looking for a soloution that fits your needs. There is no one size fits all that works for everyone everywhere!
C++ doesn't have tolower or toupper methods implemented for std::string, but it is available for char. One can easily read each char of string, convert it into required case and put it back into string.
A sample code without using any third party library:
#include<iostream>
int main(){
std::string str = std::string("How ARe You");
for(char &ch : str){
ch = std::tolower(ch);
}
std::cout<<str<<std::endl;
return 0;
}
For character based operation on string : For every character in string
// tolower example (C++)
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for (std::string::size_type i=0; i<str.length(); ++i)
std::cout << std::tolower(str[i],loc);
return 0;
}
For more information: http://www.cplusplus.com/reference/locale/tolower/
Copy because it was disallowed to improve answer. Thanks SO
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
Explanation:
for(auto& c : test) is a range-based for loop of the kind for (range_declaration:range_expression)loop_statement:
range_declaration: auto& c
Here the auto specifier is used for for automatic type deduction. So the type gets deducted from the variables initializer.
range_expression: test
The range in this case are the characters of string test.
The characters of the string test are available as a reference inside the for loop through identifier c.
Try this function :)
string toLowerCase(string str) {
int str_len = str.length();
string final_str = "";
for(int i=0; i<str_len; i++) {
char character = str[i];
if(character>=65 && character<=92) {
final_str += (character+32);
} else {
final_str += character;
}
}
return final_str;
}
Use fplus::to_lower_case() from fplus library.
Search to_lower_case in fplus API Search
Example:
fplus::to_lower_case(std::string("ABC")) == std::string("abc");
Have a look at the excellent c++17 cpp-unicodelib (GitHub). It's single-file and header-only.
#include <exception>
#include <iostream>
#include <codecvt>
// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"
using namespace std;
using namespace unicode;
// converter that allows displaying a Unicode32 string
wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;
std::u32string in = U"Je suis là!";
cout << converter.to_bytes(in) << endl;
std::u32string lc = to_lowercase(in);
cout << converter.to_bytes(lc) << endl;
Output
Je suis là!
je suis là!
Google's absl library has absl::AsciiStrToLower / absl::AsciiStrToUpper
Since you are using std::string, you are using c++. If using c++11 or higher, this doesn't need anything fancy. If words is vector<string>, then:
for (auto & str : words) {
for(auto & ch : str)
ch = tolower(ch);
}
Doesn't have strange exceptions. Might want to use w_char's but otherwise this should do it all in place.
Code Snippet
#include<bits/stdc++.h>
using namespace std;
int main ()
{
ios::sync_with_stdio(false);
string str="String Convert\n";
for(int i=0; i<str.size(); i++)
{
str[i] = tolower(str[i]);
}
cout<<str<<endl;
return 0;
}
Add some optional libraries for ASCII string to_lower, both of which are production level and with micro-optimizations, which is expected to be faster than the existed answers here(TODO: add benchmark result).
Facebook's Folly:
void toLowerAscii(char* str, size_t length)
Google's Abseil:
void AsciiStrToLower(std::string* s);
I wrote a templated version that works with any string :
#include <type_traits> // std::decay
#include <ctype.h> // std::toupper & std::tolower
template <class T = void> struct farg_t { using type = T; };
template <template<typename ...> class T1,
class T2> struct farg_t <T1<T2>> { using type = T2*; };
//---------------
template<class T, class T2 =
typename std::decay< typename farg_t<T>::type >::type>
void ToUpper(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::toupper(*t); }
template<class T, class T2 = typename std::decay< typename
farg_t<T>::type >::type>
void Tolower(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::tolower(*t); }
Tested with gcc compiler:
#include <iostream>
#include "upove_code.h"
int main()
{
std::string str1 = "hEllo ";
char str2 [] = "wOrld";
ToUpper(str1);
ToUpper(str2);
std::cout << str1 << str2 << '\n';
Tolower(str1);
Tolower(str2);
std::cout << str1 << str2 << '\n';
return 0;
}
output:
>HELLO WORLD
>
>hello world
use this code to change case of string in c++.
#include<bits/stdc++.h>
using namespace std;
int main(){
string a = "sssAAAAAAaaaaDas";
transform(a.begin(),a.end(),a.begin(),::tolower);
cout<<a;
}
This could be another simple version to convert uppercase to lowercase and vice versa. I used VS2017 community version to compile this source code.
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string _input = "lowercasetouppercase";
#if 0
// My idea is to use the ascii value to convert
char upperA = 'A';
char lowerA = 'a';
cout << (int)upperA << endl; // ASCII value of 'A' -> 65
cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
// 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0
cout << "Input String = " << _input.c_str() << endl;
for (int i = 0; i < _input.length(); ++i)
{
_input[i] -= 32; // To convert lower to upper
#if 0
_input[i] += 32; // To convert upper to lower
#endif // 0
}
cout << "Output String = " << _input.c_str() << endl;
return 0;
}
Note: if there are special characters then need to be handled using condition check.

Ignoring several different words.. c++?

I'm reading in several documents, and indexing the words I read in. However, I want to ignore common case words (a, an, the, and, is, or, are, etc).
Is there a shortcut to doing this? Moreso than doing just...
if(word=="and" || word=="is" || etc etc....) ignore word;
For example, can I put them into a const string somehow, and have it just check against the string? Not sure... thank you!
Create a set<string> with the words that you would like to exclude, and use mySet.count(word) to determine if the word is in the set. If it is, the count will be 1; it will be 0 otherwise.
#include <iostream>
#include <set>
#include <string>
using namespace std;
int main() {
const char *words[] = {"a", "an", "the"};
set<string> wordSet(words, words+3);
cerr << wordSet.count("the") << endl;
cerr << wordSet.count("quick") << endl;
return 0;
}
You can use an array of strings, looping through and matching against each, or use a more optimal data structure such as a set, or trie.
Here's an example of how to do it with a normal array:
const char *commonWords[] = {"and", "is" ...};
int commonWordsLength = 2; // number of words in the array
for (int i = 0; i < commonWordsLength; ++i)
{
if (!strcmp(word, commonWords[i]))
{
//ignore word;
break;
}
}
Note that this example doesn't use the C++ STL, but you should.
If you want to maximize performance you should create a trie....
http://en.wikipedia.org/wiki/Trie
...of stopwords....
http://en.wikipedia.org/wiki/Stop_words
There is no standard C++ trie datastructure, however see this question for third party implementations...
Trie implementation
If you can't be bothered with that and want to use a standard container, the best one to use is unordered_set<string> which will put the stopwords in a hash table.
bool filter(const string& word)
{
static unordered_set<string> stopwords({"a", "an", "the"});
return !stopwords.count(word);
}

How to replace all occurrences of a character in string?

What is the effective way to replace all occurrences of a character with another character in std::string?
std::string doesn't contain such function but you could use stand-alone replace function from algorithm header.
#include <algorithm>
#include <string>
void some_func() {
std::string s = "example string";
std::replace( s.begin(), s.end(), 'x', 'y'); // replace all 'x' to 'y'
}
The question is centered on character replacement, but, as I found this page very useful (especially Konrad's remark), I'd like to share this more generalized implementation, which allows to deal with substrings as well:
std::string ReplaceAll(std::string str, const std::string& from, const std::string& to) {
size_t start_pos = 0;
while((start_pos = str.find(from, start_pos)) != std::string::npos) {
str.replace(start_pos, from.length(), to);
start_pos += to.length(); // Handles case where 'to' is a substring of 'from'
}
return str;
}
Usage:
std::cout << ReplaceAll(string("Number Of Beans"), std::string(" "), std::string("_")) << std::endl;
std::cout << ReplaceAll(string("ghghjghugtghty"), std::string("gh"), std::string("X")) << std::endl;
std::cout << ReplaceAll(string("ghghjghugtghty"), std::string("gh"), std::string("h")) << std::endl;
Outputs:
Number_Of_Beans
XXjXugtXty
hhjhugthty
EDIT:
The above can be implemented in a more suitable way, in case performance is of your concern, by returning nothing (void) and performing the changes "in-place"; that is, by directly modifying the string argument str, passed by reference instead of by value. This would avoid an extra costly copy of the original string by overwriting it.
Code :
static inline void ReplaceAll2(std::string &str, const std::string& from, const std::string& to)
{
// Same inner code...
// No return statement
}
Hope this will be helpful for some others...
I thought I'd toss in the boost solution as well:
#include <boost/algorithm/string/replace.hpp>
// in place
std::string in_place = "blah#blah";
boost::replace_all(in_place, "#", "#");
// copy
const std::string input = "blah#blah";
std::string output = boost::replace_all_copy(input, "#", "#");
Imagine a large binary blob where all 0x00 bytes shall be replaced by "\1\x30" and all 0x01 bytes by "\1\x31" because the transport protocol allows no \0-bytes.
In cases where:
the replacing and the to-replaced string have different lengths,
there are many occurences of the to-replaced string within the source string and
the source string is large,
the provided solutions cannot be applied (because they replace only single characters) or have a performance problem, because they would call string::replace several times which generates copies of the size of the blob over and over.
(I do not know the boost solution, maybe it is OK from that perspective)
This one walks along all occurrences in the source string and builds the new string piece by piece once:
void replaceAll(std::string& source, const std::string& from, const std::string& to)
{
std::string newString;
newString.reserve(source.length()); // avoids a few memory allocations
std::string::size_type lastPos = 0;
std::string::size_type findPos;
while(std::string::npos != (findPos = source.find(from, lastPos)))
{
newString.append(source, lastPos, findPos - lastPos);
newString += to;
lastPos = findPos + from.length();
}
// Care for the rest after last occurrence
newString += source.substr(lastPos);
source.swap(newString);
}
A simple find and replace for a single character would go something like:
s.replace(s.find("x"), 1, "y")
To do this for the whole string, the easy thing to do would be to loop until your s.find starts returning npos. I suppose you could also catch range_error to exit the loop, but that's kinda ugly.
For completeness, here's how to do it with std::regex.
#include <regex>
#include <string>
int main()
{
const std::string s = "example string";
const std::string r = std::regex_replace(s, std::regex("x"), "y");
}
If you're looking to replace more than a single character, and are dealing only with std::string, then this snippet would work, replacing sNeedle in sHaystack with sReplace, and sNeedle and sReplace do not need to be the same size. This routine uses the while loop to replace all occurrences, rather than just the first one found from left to right.
while(sHaystack.find(sNeedle) != std::string::npos) {
sHaystack.replace(sHaystack.find(sNeedle),sNeedle.size(),sReplace);
}
As Kirill suggested, either use the replace method or iterate along the string replacing each char independently.
Alternatively you can use the find method or find_first_of depending on what you need to do. None of these solutions will do the job in one go, but with a few extra lines of code you ought to make them work for you. :-)
What about Abseil StrReplaceAll? From the header file:
// This file defines `absl::StrReplaceAll()`, a general-purpose string
// replacement function designed for large, arbitrary text substitutions,
// especially on strings which you are receiving from some other system for
// further processing (e.g. processing regular expressions, escaping HTML
// entities, etc.). `StrReplaceAll` is designed to be efficient even when only
// one substitution is being performed, or when substitution is rare.
//
// If the string being modified is known at compile-time, and the substitutions
// vary, `absl::Substitute()` may be a better choice.
//
// Example:
//
// std::string html_escaped = absl::StrReplaceAll(user_input, {
// {"&", "&"},
// {"<", "<"},
// {">", ">"},
// {"\"", """},
// {"'", "'"}});
#include <iostream>
#include <string>
using namespace std;
// Replace function..
string replace(string word, string target, string replacement){
int len, loop=0;
string nword="", let;
len=word.length();
len--;
while(loop<=len){
let=word.substr(loop, 1);
if(let==target){
nword=nword+replacement;
}else{
nword=nword+let;
}
loop++;
}
return nword;
}
//Main..
int main() {
string word;
cout<<"Enter Word: ";
cin>>word;
cout<<replace(word, "x", "y")<<endl;
return 0;
}
Old School :-)
std::string str = "H:/recursos/audio/youtube/libre/falta/";
for (int i = 0; i < str.size(); i++) {
if (str[i] == '/') {
str[i] = '\\';
}
}
std::cout << str;
Result:
H:\recursos\audio\youtube\libre\falta\
For simple situations this works pretty well without using any other library then std::string (which is already in use).
Replace all occurences of character a with character b in some_string:
for (size_t i = 0; i < some_string.size(); ++i) {
if (some_string[i] == 'a') {
some_string.replace(i, 1, "b");
}
}
If the string is large or multiple calls to replace is an issue, you can apply the technique mentioned in this answer: https://stackoverflow.com/a/29752943/3622300
here's a solution i rolled, in a maximal DRI spirit.
it will search sNeedle in sHaystack and replace it by sReplace,
nTimes if non 0, else all the sNeedle occurences.
it will not search again in the replaced text.
std::string str_replace(
std::string sHaystack, std::string sNeedle, std::string sReplace,
size_t nTimes=0)
{
size_t found = 0, pos = 0, c = 0;
size_t len = sNeedle.size();
size_t replen = sReplace.size();
std::string input(sHaystack);
do {
found = input.find(sNeedle, pos);
if (found == std::string::npos) {
break;
}
input.replace(found, len, sReplace);
pos = found + replen;
++c;
} while(!nTimes || c < nTimes);
return input;
}
I think I'd use std::replace_if()
A simple character-replacer (requested by OP) can be written by using standard library functions.
For an in-place version:
#include <string>
#include <algorithm>
void replace_char(std::string& in,
std::string::value_type srch,
std::string::value_type repl)
{
std::replace_if(std::begin(in), std::end(in),
[&srch](std::string::value_type v) { return v==srch; },
repl);
return;
}
and an overload that returns a copy if the input is a const string:
std::string replace_char(std::string const& in,
std::string::value_type srch,
std::string::value_type repl)
{
std::string result{ in };
replace_char(result, srch, repl);
return result;
}
This works! I used something similar to this for a bookstore app, where the inventory was stored in a CSV (like a .dat file). But in the case of a single char, meaning the replacer is only a single char, e.g.'|', it must be in double quotes "|" in order not to throw an invalid conversion const char.
#include <iostream>
#include <string>
using namespace std;
int main()
{
int count = 0; // for the number of occurences.
// final hold variable of corrected word up to the npos=j
string holdWord = "";
// a temp var in order to replace 0 to new npos
string holdTemp = "";
// a csv for a an entry in a book store
string holdLetter = "Big Java 7th Ed,Horstman,978-1118431115,99.85";
// j = npos
for (int j = 0; j < holdLetter.length(); j++) {
if (holdLetter[j] == ',') {
if ( count == 0 )
{
holdWord = holdLetter.replace(j, 1, " | ");
}
else {
string holdTemp1 = holdLetter.replace(j, 1, " | ");
// since replacement is three positions in length,
// must replace new replacement's 0 to npos-3, with
// the 0 to npos - 3 of the old replacement
holdTemp = holdTemp1.replace(0, j-3, holdWord, 0, j-3);
holdWord = "";
holdWord = holdTemp;
}
holdTemp = "";
count++;
}
}
cout << holdWord << endl;
return 0;
}
// result:
Big Java 7th Ed | Horstman | 978-1118431115 | 99.85
Uncustomarily I am using CentOS currently, so my compiler version is below . The C++ version (g++), C++98 default:
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This is not the only method missing from the standard library, it was intended be low level.
This use case and many other are covered by general libraries such as:
POCO
Abseil
Boost
QtCore
QtCore & QString has my preference: it supports UTF8 and uses less templates, which means understandable errors and faster compilation. It uses the "q" prefix which makes namespaces unnecessary and simplifies headers.
Boost often generates hideous error messages and slow compile time.
POCO seems to be a reasonable compromise.
How about replace any character string with any character string using only good-old C string functions?
char original[256]="First Line\nNext Line\n", dest[256]="";
char* replace_this = "\n"; // this is now a single character but could be any string
char* with_this = "\r\n"; // this is 2 characters but could be of any length
/* get the first token */
char* token = strtok(original, replace_this);
/* walk through other tokens */
while (token != NULL) {
strcat(dest, token);
strcat(dest, with_this);
token = strtok(NULL, replace_this);
}
dest should now have what we are looking for.