I want to convert a std::string to lowercase. I am aware of the function tolower(). However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string would require iterating over each character.
Is there an alternative which works 100% of the time?
Adapted from Not So Frequently Asked Questions:
#include <algorithm>
#include <cctype>
#include <string>
std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
[](unsigned char c){ return std::tolower(c); });
You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.
If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:
char asciitolower(char in) {
if (in <= 'Z' && in >= 'A')
return in - ('Z' - 'z');
return in;
}
std::transform(data.begin(), data.end(), data.begin(), asciitolower);
Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
Boost provides a string algorithm for this:
#include <boost/algorithm/string.hpp>
std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str
Or, for non-in-place:
#include <boost/algorithm/string.hpp>
const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);
tl;dr
Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.
First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)
If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).
So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.
Then there is the point that the standard library, for what it is capable of doing, is depending on which locales are supported on the machine your software is running on... and what do you do if your target locale is among the not supported on your client's machine?
So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.
(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)
While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.
And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)
So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:
#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>
#include <iostream>
int main()
{
/* "Odysseus" */
char const * someString = u8"ΟΔΥΣΣΕΥΣ";
icu::UnicodeString someUString( someString, "UTF-8" );
// Setting the locale explicitly here for completeness.
// Usually you would use the user-specified system locale,
// which *does* make a difference (see ı vs. i above).
std::cout << someUString.toLower( "el_GR" ) << "\n";
std::cout << someUString.toUpper( "el_GR" ) << "\n";
return 0;
}
Compile (with G++ in this example):
g++ -Wall example.cpp -licuuc -licuio
This gives:
ὀδυσσεύς
Note that the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.
Using range-based for loop of C++11 a simpler code would be :
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for(auto elem : str)
std::cout << std::tolower(elem,loc);
}
If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html
Another approach using range based for loop with reference variable
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
cout<<test<<endl;
This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.
#include <string>
#include <algorithm>
#include <iostream>
int main (int argc, char* argv[])
{
std::string sourceString = "Abc";
std::string destinationString;
// Allocate the destination space
destinationString.resize(sourceString.size());
// Convert the source string to lower case
// storing the result in destination string
std::transform(sourceString.begin(),
sourceString.end(),
destinationString.begin(),
::tolower);
// Output the result of the conversion
std::cout << sourceString
<< " -> "
<< destinationString
<< std::endl;
}
Simplest way to convert string into loweercase without bothering about std namespace is as follows
1:string with/without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
getline(cin,str);
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
2:string without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
cin>>str;
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
My own template functions which performs upper / lower case.
#include <string>
#include <algorithm>
//
// Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), tolower);
return s2;
}
//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), toupper);
return s2;
}
I wrote this simple helper function:
#include <locale> // tolower
string to_lower(string s) {
for(char &c : s)
c = tolower(c);
return s;
}
Usage:
string s = "TEST";
cout << to_lower("HELLO WORLD"); // output: "hello word"
cout << to_lower(s); // won't change the original variable.
An alternative to Boost is POCO (pocoproject.org).
POCO provides two variants:
The first variant makes a copy without altering the original string.
The second variant changes the original string in place.
"In Place" versions always have "InPlace" in the name.
Both versions are demonstrated below:
#include "Poco/String.h"
using namespace Poco;
std::string hello("Stack Overflow!");
// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));
// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);
std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page
#include <locale>
#include <iostream>
int main () {
std::locale::global(std::locale("en_US.utf8"));
std::wcout.imbue(std::locale());
std::wcout << "In US English UTF-8 locale:\n";
auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring str = L"HELLo, wORLD!";
std::wcout << "Lowercase form of the string '" << str << "' is ";
f.tolower(&str[0], &str[0] + str.size());
std::wcout << "'" << str << "'\n";
}
Since none of the answers mentioned the upcoming Ranges library, which is available in the standard library since C++20, and currently separately available on GitHub as range-v3, I would like to add a way to perform this conversion using it.
To modify the string in-place:
str |= action::transform([](unsigned char c){ return std::tolower(c); });
To generate a new string:
auto new_string = original_string
| view::transform([](unsigned char c){ return std::tolower(c); });
(Don't forget to #include <cctype> and the required Ranges headers.)
Note: the use of unsigned char as the argument to the lambda is inspired by cppreference, which states:
Like all other functions from <cctype>, the behavior of std::tolower is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char:
char my_tolower(char ch)
{
return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}
Similarly, they should not be directly used with standard algorithms when the iterator's value type is char or signed char. Instead, convert the value to unsigned char first:
std::string str_tolower(std::string s) {
std::transform(s.begin(), s.end(), s.begin(),
// static_cast<int(*)(int)>(std::tolower) // wrong
// [](int c){ return std::tolower(c); } // wrong
// [](char c){ return std::tolower(c); } // wrong
[](unsigned char c){ return std::tolower(c); } // correct
);
return s;
}
On microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx
// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>
int main( void )
{
char string[100] = "The String to End All Strings!";
char * copy1 = _strdup( string ); // make two copies
char * copy2 = _strdup( string );
_strlwr( copy1 ); // C4996
_strupr( copy2 ); // C4996
printf( "Mixed: %s\n", string );
printf( "Lower: %s\n", copy1 );
printf( "Upper: %s\n", copy2 );
free( copy1 );
free( copy2 );
}
There is a way to convert upper case to lower WITHOUT doing if tests, and it's pretty straight-forward. The isupper() function/macro's use of clocale.h should take care of problems relating to your location, but if not, you can always tweak the UtoL[] to your heart's content.
Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.
Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.
The code looks like this...
#include <clocale>
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap() {
for (int i = 0; i < sizeof(UtoL); i++) {
if (isupper(i)) {
UtoL[i] = (char)(i + 32);
} else {
UtoL[i] = i;
}
}
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
char *p = szMyStr;
// do conversion in-place so as not to require a destination buffer
while (*p) { // szMyStr must be null-terminated
*p = UtoL[*p];
p++;
}
return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
time_t start;
char *Lowered, Upper[128];
InitUtoLMap();
strcpy(Upper, "Every GOOD boy does FINE!");
Lowered = LowerStr(Upper);
return 0;
}
This approach will, at the same time, allow you to remap any other characters you wish to change.
This approach has one huge advantage when running on modern processors, there is no need to do branch prediction as there are no if tests comprising branching. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.
Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.
Here's a macro technique if you want something simple:
#define STRTOLOWER(x) std::transform (x.begin(), x.end(), x.begin(), ::tolower)
#define STRTOUPPER(x) std::transform (x.begin(), x.end(), x.begin(), ::toupper)
#define STRTOUCFIRST(x) std::transform (x.begin(), x.begin()+1, x.begin(), ::toupper); std::transform (x.begin()+1, x.end(), x.begin()+1,::tolower)
However, note that #AndreasSpindler's comment on this answer still is an important consideration, however, if you're working on something that isn't just ASCII characters.
Is there an alternative which works 100% of the time?
No
There are several questions you need to ask yourself before choosing a lowercasing method.
How is the string encoded? plain ASCII? UTF-8? some form of extended ASCII legacy encoding?
What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the users locale? do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else?
What libraries are available?
Once you have answers to those questions you can start looking for a soloution that fits your needs. There is no one size fits all that works for everyone everywhere!
C++ doesn't have tolower or toupper methods implemented for std::string, but it is available for char. One can easily read each char of string, convert it into required case and put it back into string.
A sample code without using any third party library:
#include<iostream>
int main(){
std::string str = std::string("How ARe You");
for(char &ch : str){
ch = std::tolower(ch);
}
std::cout<<str<<std::endl;
return 0;
}
For character based operation on string : For every character in string
// tolower example (C++)
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for (std::string::size_type i=0; i<str.length(); ++i)
std::cout << std::tolower(str[i],loc);
return 0;
}
For more information: http://www.cplusplus.com/reference/locale/tolower/
Copy because it was disallowed to improve answer. Thanks SO
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
Explanation:
for(auto& c : test) is a range-based for loop of the kind for (range_declaration:range_expression)loop_statement:
range_declaration: auto& c
Here the auto specifier is used for for automatic type deduction. So the type gets deducted from the variables initializer.
range_expression: test
The range in this case are the characters of string test.
The characters of the string test are available as a reference inside the for loop through identifier c.
Try this function :)
string toLowerCase(string str) {
int str_len = str.length();
string final_str = "";
for(int i=0; i<str_len; i++) {
char character = str[i];
if(character>=65 && character<=92) {
final_str += (character+32);
} else {
final_str += character;
}
}
return final_str;
}
Use fplus::to_lower_case() from fplus library.
Search to_lower_case in fplus API Search
Example:
fplus::to_lower_case(std::string("ABC")) == std::string("abc");
Have a look at the excellent c++17 cpp-unicodelib (GitHub). It's single-file and header-only.
#include <exception>
#include <iostream>
#include <codecvt>
// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"
using namespace std;
using namespace unicode;
// converter that allows displaying a Unicode32 string
wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;
std::u32string in = U"Je suis là!";
cout << converter.to_bytes(in) << endl;
std::u32string lc = to_lowercase(in);
cout << converter.to_bytes(lc) << endl;
Output
Je suis là!
je suis là!
Google's absl library has absl::AsciiStrToLower / absl::AsciiStrToUpper
Since you are using std::string, you are using c++. If using c++11 or higher, this doesn't need anything fancy. If words is vector<string>, then:
for (auto & str : words) {
for(auto & ch : str)
ch = tolower(ch);
}
Doesn't have strange exceptions. Might want to use w_char's but otherwise this should do it all in place.
Code Snippet
#include<bits/stdc++.h>
using namespace std;
int main ()
{
ios::sync_with_stdio(false);
string str="String Convert\n";
for(int i=0; i<str.size(); i++)
{
str[i] = tolower(str[i]);
}
cout<<str<<endl;
return 0;
}
Add some optional libraries for ASCII string to_lower, both of which are production level and with micro-optimizations, which is expected to be faster than the existed answers here(TODO: add benchmark result).
Facebook's Folly:
void toLowerAscii(char* str, size_t length)
Google's Abseil:
void AsciiStrToLower(std::string* s);
I wrote a templated version that works with any string :
#include <type_traits> // std::decay
#include <ctype.h> // std::toupper & std::tolower
template <class T = void> struct farg_t { using type = T; };
template <template<typename ...> class T1,
class T2> struct farg_t <T1<T2>> { using type = T2*; };
//---------------
template<class T, class T2 =
typename std::decay< typename farg_t<T>::type >::type>
void ToUpper(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::toupper(*t); }
template<class T, class T2 = typename std::decay< typename
farg_t<T>::type >::type>
void Tolower(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::tolower(*t); }
Tested with gcc compiler:
#include <iostream>
#include "upove_code.h"
int main()
{
std::string str1 = "hEllo ";
char str2 [] = "wOrld";
ToUpper(str1);
ToUpper(str2);
std::cout << str1 << str2 << '\n';
Tolower(str1);
Tolower(str2);
std::cout << str1 << str2 << '\n';
return 0;
}
output:
>HELLO WORLD
>
>hello world
use this code to change case of string in c++.
#include<bits/stdc++.h>
using namespace std;
int main(){
string a = "sssAAAAAAaaaaDas";
transform(a.begin(),a.end(),a.begin(),::tolower);
cout<<a;
}
This could be another simple version to convert uppercase to lowercase and vice versa. I used VS2017 community version to compile this source code.
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string _input = "lowercasetouppercase";
#if 0
// My idea is to use the ascii value to convert
char upperA = 'A';
char lowerA = 'a';
cout << (int)upperA << endl; // ASCII value of 'A' -> 65
cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
// 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0
cout << "Input String = " << _input.c_str() << endl;
for (int i = 0; i < _input.length(); ++i)
{
_input[i] -= 32; // To convert lower to upper
#if 0
_input[i] += 32; // To convert upper to lower
#endif // 0
}
cout << "Output String = " << _input.c_str() << endl;
return 0;
}
Note: if there are special characters then need to be handled using condition check.
I would like to generate consecutive C++ strings like e.g. in cameras: IMG001, IMG002 etc. being able to indicate the prefix and the string length.
I have found a solution where I can generate random strings from concrete character set: link
But I cannot find the thing I want to achieve.
A possible solution:
#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
std::string make_string(const std::string& a_prefix,
size_t a_suffix,
size_t a_max_length)
{
std::ostringstream result;
result << a_prefix <<
std::setfill('0') <<
std::setw(a_max_length - a_prefix.length()) <<
a_suffix;
return result.str();
}
int main()
{
for (size_t i = 0; i < 100; i++)
{
std::cout << make_string("IMG", i, 6) << "\n";
}
return 0;
}
See online demo at http://ideone.com/HZWmtI.
Something like this would work
#include <string>
#include <iomanip>
#include <sstream>
std::string GetNextNumber( int &lastNum )
{
std::stringstream ss;
ss << "IMG";
ss << std::setfill('0') << std::setw(3) << lastNum++;
return ss.str();
}
int main()
{
int x = 1;
std::string s = GetNextNumber( x );
s = GetNextNumber( x );
return 0;
}
You can call GetNextNumber repeatedly with an int reference to generate new image numbers. You can always use sprintf but it won't be the c++ way :)
const int max_size = 7 + 1; // maximum size of the name plus one
char buf[max_size];
for (int i = 0 ; i < 1000; ++i) {
sprintf(buf, "IMG%.04d", i);
printf("The next name is %s\n", buf);
}
char * seq_gen(char * prefix) {
static int counter;
char * result;
sprintf(result, "%s%03d", prefix, counter++);
return result;
}
This would print your prefix with 3 digit padding string. If you want a lengthy string, all you have to do is provide the prefix as much as needed and change the %03d in the above code to whatever length of digit padding you want.
Well, the idea is rather simple. Just store the current number and increment it each time new string is generated. You can implement it to model an iterator to reduce the fluff in using it (you can then use standard algorithms with it). Using Boost.Iterator (it should work with any string type, too):
#include <boost/iterator/iterator_facade.hpp>
#include <sstream>
#include <iomanip>
// can't come up with a better name
template <typename StringT, typename OrdT>
struct ordinal_id_generator : boost::iterator_facade<
ordinal_id_generator<StringT, OrdT>, StringT,
boost::forward_traversal_tag, StringT
> {
ordinal_id_generator(
const StringT& prefix = StringT(),
typename StringT::size_type suffix_length = 5, OrdT initial = 0
) : prefix(prefix), suffix_length(suffix_length), ordinal(initial)
{}
private:
StringT prefix;
typename StringT::size_type suffix_length;
OrdT ordinal;
friend class boost::iterator_core_access;
void increment() {
++ordinal;
}
bool equal(const ordinal_id_generator& other) const {
return (
ordinal == other.ordinal
&& prefix == other.prefix
&& suffix_length == other.suffix_length
);
}
StringT dereference() const {
std::basic_ostringstream<typename StringT::value_type> ss;
ss << prefix << std::setfill('0')
<< std::setw(suffix_length) << ordinal;
return ss.str();
}
};
And example code:
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>
typedef ordinal_id_generator<std::string, unsigned> generator;
int main() {
std::ostream_iterator<std::string> out(std::cout, "\n");
std::copy_n(generator("IMG"), 5, out);
// can even behave as a range
std::copy(generator("foo", 1, 2), generator("foo", 1, 4), out);
return 0;
}
Take a look at the standard library's string streams. Have an integer that you increment, and insert into the string stream after every increment. To control the string length, there's the concept of fill characters, and the width() member function.
You have many ways of doing that.
The generic one would be to, like the link that you showed, have an array of possible characters. Then after each iteration, you start from right-most character, increment it (that is, change it to the next one in the possible characters list) and if it overflowed, set it to the first one (index 0) and go the one on the left. This is exactly like incrementing a number in base, say 62.
In your specific example, you are better off with creating the string from another string and a number.
If you like *printf, you can write a string with "IMG%04d" and have the parameter go from 0 to whatever.
If you like stringstream, you can similarly do so.
What exactly do you mean by consecutive strings ?
Since you've mentioned that you're using C++ strings, try using the .string::append method.
string str, str2;
str.append("A");
str.append(str2);
Lookup http://www.cplusplus.com/reference/string/string/append/ for more overloaded calls of the append function.
it's pseudo code. you'll understand what i mean :D
int counter = 0, retval;
do
{
char filename[MAX_PATH];
sprintf(filename, "IMG00%d", counter++);
if(retval = CreateFile(...))
//ok, return
}while(!retval);
You have to keep a counter that is increased everytime you get a new name. This counter has to be saved when your application is ends, and loaded when you application starts.
Could be something like this:
class NameGenerator
{
public:
NameGenerator()
: m_counter(0)
{
// Code to load the counter from a file
}
~NameGenerator()
{
// Code to save the counter to a file
}
std::string get_next_name()
{
// Combine your preferred prefix with your counter
// Increase the counter
// Return the string
}
private:
int m_counter;
}
NameGenerator my_name_generator;
Then use it like this:
std::string my_name = my_name_generator.get_next_name();