How to efficiently replace elements of char string with different integer elements?

I am looking to replace the elements of character string with integer elements. I want to replace A with 1, B with 2, C with 3 and D with 4.
How can I do it efficiently?
#include <iostream>
#include <string>
#include <algorithm>
int main()
{
std::string str = "ABCDDCBA";
std::replace(str.begin(), str.end(), 'A', '1'); // Replacing
std::replace(str.begin(), str.end(), 'B', '2');
std::replace(str.begin(), str.end(), 'C', '3');
std::replace(str.begin(), str.end(), 'D', '4');
// ...
std::cout << str << std::endl; // displaying
return 0;
}

Your method is quite efficient for replacing an arbitrary character with another arbitrary character.
However, you are currently replacing consecutive Latin letters with consecutive digits, so you can take advantage of the fact that their ASCII codes are also contiguous:
for(char& c : str)
c += '1' - 'A';
Of course, this depends on the native character encoding representing the Latin alphabet with contiguous values, as ASCII does. It also depends on the digits being represented contiguously, but that is mandated by the standard.
Furthermore, this method currently has no check on the character being replaced, and will change every character it encounters. This is not an issue with your input string, but if you intend to replace only some characters and leave others untouched, then you'll need to add a condition check.
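A minimal sketch of the condition check mentioned above, assuming an ASCII-compatible encoding in which 'A' through 'D' are contiguous. The function name replace_a_to_d is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: maps 'A'..'D' to '1'..'4' in place and leaves every
// other character untouched. Assumes 'A'..'D' have contiguous codes, as
// they do in ASCII.
void replace_a_to_d(std::string& str)
{
    for (char& c : str) {
        if (c >= 'A' && c <= 'D') {
            c = static_cast<char>(c - 'A' + '1');
        }
    }
}
```

Called on "ABCDDCBA xyz", this would only touch the first eight characters.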

Or, if the input string only contains uppercase ASCII letters, you can try:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main()
{
std::string str = "ABCDDCBA";
for(std::size_t i=0;i<str.size();i++)
{
str[i]=str[i]-'A'+'1'; // 'A' -> '1', 'B' -> '2', ...
}
std::cout << str << std::endl; // displaying
return 0;
}

Presumably you mean do the replacement in one traversal? There are flashy C++ Standard Library ways of doing this, or you could even solve it with std::regex. But exploiting the overloaded [] operator on the std::string class will be hard to beat in terms of performance and is clear.
for (std::size_t i = 0; i < str.size(); ++i){
switch (str[i]){
case 'A':
str[i] = '1';
break;
case 'B':
/*etc*/
}
}
If you are transforming more than a handful of characters (I took your question literally that you only wanted to map A, B, C, and D), then consider defining a const char[] array that describes the transformation; relying on the values of particular characters is not strictly portable C++.
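One way the table-driven idea could look is sketched below. It pairs each source character with its replacement by position, so it does not depend on the numeric values of the characters. The function name replace_by_table and the two table names are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces each character found in `from` with the
// character at the same index in `to`. Characters not listed in `from`
// are left unchanged. No assumption is made about character codes.
void replace_by_table(std::string& str)
{
    static const char from[] = "ABCD";
    static const char to[]   = "1234";
    for (char& c : str) {
        for (std::size_t i = 0; from[i] != '\0'; ++i) {
            if (c == from[i]) {
                c = to[i];
                break;
            }
        }
    }
}
```

For a large mapping, a 256-entry array indexed by the character (converted to unsigned char) would avoid the inner search, at the cost of building the table first.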

A fun way to do that is to use the ASCII value of each char.
For example, int('A') == 65 and int('B') == 66, so you just need a loop with
val = int(c) - 64;
(This only works for uppercase characters.)

One proposal in O(n)
void MyReplaceFunction(char& c)
{
static const int delta='A'-'1';
if(c>='A' && c<='Z')
{
c-=delta;
}
}
std::for_each(str.begin(), str.end(), MyReplaceFunction);

Related

How to loop through certain Ascii characters in c++

I only want to loop through certain Ascii characters and not all are directly next to each other . For example I only want to loop from char '1 to 7' and then from char '? to F'. I dont want to loop through '8 to >' . I have this for loop but this will include the char I don't want.
for (char i = '1'; i < 'H'; i++)
How should I modify it to only loop through what I want?

Looping from 1 to 7 is straightforward, since the Arabic numerals ('0' to '9') are required to have contiguous and increasing values by all C and C++ standards.
for (char c = '1'; c <= '7'; ++c)
or (a more common style)
for (char c = '1'; c < '8'; ++c)
The problem with trying to loop through your second set of ASCII characters ('?' to 'F') is that there are character sets other than ASCII in which the order of characters is different. For example, in ASCII, '?' is one less than '@', but that is not guaranteed for other character sets. Instead, create a string with the characters you want to loop over, and iterate over the string. For example:
const std::string characterset = "?@ABCDEF";
for (char c : characterset) // option 1, C++11 and later
{
// do something with c
}
for (auto c : characterset) // option 2, C++11 and later (type deduction)
{
// do something with c
}
// Option 3 (all C++ standards)
for (std::string::const_iterator it = characterset.begin(), end = characterset.end();
it != end; ++it)
{
char c = *it;
// do something with *it or c (it is an iterator that references a character)
}
will loop over your second set of characters.
If you want to do it all in a single loop, then change the character set. For example, a modified version of Option 1 above might be;
const std::string characterset = "1234567?@ABCDEF";
This is a more general approach that doesn't rely on your implementation (host system, compiler, library) supporting the ASCII character set (or compatible).

Each character has a fixed ASCII value associated with it, and you can refer to a character by that value. You can skip the characters you do not want with an if condition; any ASCII table lists the values. Referring to your example, if you want to skip the characters from '8' to '>', the code might look something like this:
#include <iostream>
using namespace std;
int main()
{
for (char i = '1'; i < 'H'; i++)
{
if(i>=56 && i<=62)
// 56 is the ASCII value for '8'
// 62 is the ASCII value for '>'
{
// skipping the ASCII values we do not need
continue;
}
cout << i << "";
}
return 0;
}

Create a set containing the characters you want to loop over and loop over that set.
For example :
#include <iostream>
#include <stdexcept>
#include <string>
#include <set>
// character_set.h
//-------------------------------------------------------------------------
// To be able to easily input a character range we need a helper struct
struct character_range_t
{
// have this constructor so a character range can be used in brace initialization.
character_range_t(const char f, const char t) :
from(f),
to(t)
{
if (to < from) throw std::invalid_argument("to must be larger or equal to from");
}
char from;
char to;
};
//-------------------------------------------------------------------------
// helper function to combine multiple character ranges into one set
// input is a compile time array of ranges
template<std::size_t N>
auto make_character_set(const character_range_t(&ranges)[N])
{
// I chose a set because all elements must be unique and set does that.
std::set<char> set;
// loop over all input ranges
for (std::size_t n = 0; n < N; ++n)
{
// and for each range add the characters in the range to the set
for (char c = ranges[n].from; c <= ranges[n].to; ++c) set.insert(c);
}
return set;
}
// main.cpp
//-------------------------------------------------------------------------
// #include "character_set.h"
int main()
{
auto set = make_character_set({{'1','7'},{'?','F'}});
// use range based for loop to loop over all characters in the set
for (const char c : set)
{
std::cout << c << " ";
}
}

Count unique words in a string in C++

I want to count how many unique words are in the string 's', where punctuation and the newline character (\n) separate the words. So far I've used the logical OR operator to check how many word separators are in the string, and added 1 to the result to get the number of words in s.
My current code returns 12 as the number of words. Since 'ab', 'AB', 'aB', 'Ab' (and likewise the 'zzzz' variants) are all the same word and not unique, how can I ignore the variants of a word? I followed the link http://www.cplusplus.com/reference/algorithm/unique/, but the reference counts unique items in a vector, and I am using a string, not a vector.
Here is my code:
#include <iostream>
#include <string>
using namespace std;
bool isWordSeparator(char & c) {
return c == ' ' || c == '-' || c == '\n' || c == '?' || c == '.' || c == ','
|| c == '?' || c == '!' || c == ':' || c == ';';
}
int countWords(string s) {
int wordCount = 0;
if (s.empty()) {
return 0;
}
for (int x = 0; x < s.length(); x++) {
if (isWordSeparator(s.at(x))) {
wordCount++;
}
}
return wordCount+1;
}
int main() {
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
int number_of_words = countWords(s);
cout << "Number of Words: " << number_of_words << endl;
return 0;
}

What you need to make your code case-insensitive is tolower().
You can apply it to your original string using std::transform:
std::transform(s.begin(), s.end(), s.begin(), ::tolower);
I should add however that your current code is much closer to C than to C++, perhaps you should check out what standard library has to offer.
I suggest istringstream + istream_iterator for tokenizing and either unique_copy or set for getting rid of the duplicates, like this: https://ideone.com/nb4BEH

You could create a set of strings, save the position of the last separator (starting with 0) and use substring to extract the word, then insert it into the set. When done just return the set's size.
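You could make the whole operation easier by using a split function that tokenizes the string for you. Note that std::string has no split member, so this would have to come from a library (for example boost::split) or from your own helper. All you have to do then is insert all of the returned tokens into the set and again return its size.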
Edit: as per comments, you need a custom comparator to ignore case for comparisons.
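The set-based approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the answerer's code: the names is_separator and count_unique_words are made up, any whitespace or punctuation character is treated as a separator, and each word is lowercased before insertion instead of using a custom case-insensitive comparator.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Assumption: any whitespace or punctuation character separates words.
bool is_separator(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isspace(u) != 0 || std::ispunct(u) != 0;
}

// Hypothetical helper: counts distinct words, ignoring case.
std::size_t count_unique_words(const std::string& s)
{
    std::set<std::string> words;
    std::size_t start = 0; // index just past the last separator seen
    for (std::size_t i = 0; i <= s.size(); ++i) {
        // The end of the string closes the last word, like a separator.
        if (i == s.size() || is_separator(s[i])) {
            if (i > start) { // skip empty tokens between adjacent separators
                std::string word = s.substr(start, i - start);
                for (char& ch : word) {
                    ch = static_cast<char>(
                        std::tolower(static_cast<unsigned char>(ch)));
                }
                words.insert(word);
            }
            start = i + 1;
        }
    }
    return words.size();
}
```

Lowercasing before insertion is the simpler choice when the original spelling of each word does not need to be kept; a comparator would preserve it.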

First of all I'd suggest rewriting isWordSeparator like this:
bool isWordSeparator(char c) {
return std::isspace(c) || std::ispunct(c);
}
since your current implementation doesn't handle all the punctuation and space, like \t or +.
Also, incrementing wordCount whenever isWordSeparator is true is incorrect when separators are adjacent, for example if you have something like ?!.
So, a less error-prone approach would be to substitute all separators by space and then iterate words inserting them into an (unordered) set:
#include <iterator>
#include <unordered_set>
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
int countWords(std::string s) {
// explicit return type: ' ' is a char, but std::tolower returns an int
std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) -> char {
if (isWordSeparator(c)) {
return ' ';
}
return static_cast<char>(std::tolower(c));
});
std::unordered_set<std::string> uniqWords;
std::stringstream ss(s);
// std::inserter needs both the container and an insertion hint
std::copy(std::istream_iterator<std::string>(ss), std::istream_iterator<std::string>(), std::inserter(uniqWords, uniqWords.end()));
return uniqWords.size();
}

While splitting the string into words, insert all words into a std::set. This will get rid of the duplicates. Then it's just a matter of calling set::size() to get the number of unique words.
I'm using the boost::split() function from the Boost string algorithm library in my solution, because it is almost standard nowadays.
Explanations in the comments in code...
#include <iostream>
#include <string>
#include <set>
#include <boost/algorithm/string.hpp>
using namespace std;
// Function suggested by user 'mshrbkv':
bool isWordSeparator(char c) {
return std::isspace(c) || std::ispunct(c);
}
// This is used to make the set case-insensitive.
// Alternatively you could call boost::to_lower() to make the
// string all lowercase before calling boost::split().
struct IgnoreCaseCompare {
bool operator()( const std::string& a, const std::string& b ) const {
return boost::ilexicographical_compare( a, b );
}
};
int main()
{
string s = "ab\nAb!aB?AB:ab.AB;ab\nAB\nZZZZ zzzz Zzzz\nzzzz";
// Define a set that will contain only unique strings, ignoring case.
set< string, IgnoreCaseCompare > words;
// Split the string by using your isWordSeparator function
// to define the delimiters. token_compress_on collapses multiple
// consecutive delimiters into only one.
boost::split( words, s, isWordSeparator, boost::token_compress_on );
// Now the set contains only the unique words.
cout << "Number of Words: " << words.size() << endl;
for( auto& w : words )
cout << w << endl;
return 0;
}
Demo: http://coliru.stacked-crooked.com/a/a3b51a6c6a3b4ee8

You can consider SQLite c++ wrapper

Make *it in lowercase [duplicate]

I want to convert a std::string to lowercase. I am aware of the function tolower(). However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string would require iterating over each character.
Is there an alternative which works 100% of the time?

Adapted from Not So Frequently Asked Questions:
#include <algorithm>
#include <cctype>
#include <string>
std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
[](unsigned char c){ return std::tolower(c); });
You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.
If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:
char asciitolower(char in) {
if (in <= 'Z' && in >= 'A')
return in - ('Z' - 'z');
return in;
}
std::transform(data.begin(), data.end(), data.begin(), asciitolower);
Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.

Boost provides a string algorithm for this:
#include <boost/algorithm/string.hpp>
std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str
Or, for non-in-place:
#include <boost/algorithm/string.hpp>
const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);

tl;dr
Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.
First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)
If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).
So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.
Then there is the point that the standard library, for what it is capable of doing, depends on which locales are supported on the machine your software is running on... and what do you do if your target locale is among those not supported on your client's machine?
So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.
(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)
While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.
And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)
So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:
#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>
#include <iostream>
int main()
{
/* "Odysseus" */
char const * someString = u8"ΟΔΥΣΣΕΥΣ";
icu::UnicodeString someUString( someString, "UTF-8" );
// Setting the locale explicitly here for completeness.
// Usually you would use the user-specified system locale,
// which *does* make a difference (see ı vs. i above).
std::cout << someUString.toLower( "el_GR" ) << "\n";
std::cout << someUString.toUpper( "el_GR" ) << "\n";
return 0;
}
Compile (with G++ in this example):
g++ -Wall example.cpp -licuuc -licuio
This gives:
ὀδυσσεύς
Note the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.

Using range-based for loop of C++11 a simpler code would be :
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for(auto elem : str)
std::cout << std::tolower(elem,loc);
}

If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html

Another approach using range based for loop with reference variable
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
cout<<test<<endl;

This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.
#include <string>
#include <algorithm>
#include <iostream>
int main (int argc, char* argv[])
{
std::string sourceString = "Abc";
std::string destinationString;
// Allocate the destination space
destinationString.resize(sourceString.size());
// Convert the source string to lower case
// storing the result in destination string
std::transform(sourceString.begin(),
sourceString.end(),
destinationString.begin(),
::tolower);
// Output the result of the conversion
std::cout << sourceString
<< " -> "
<< destinationString
<< std::endl;
}
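As an alternative to resizing the destination up front, std::back_inserter lets the destination string grow as std::transform writes to it. The sketch below assumes a made-up function name, lowercase_copy, and uses the unsigned char lambda form rather than ::tolower.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns a lowercased copy of `source`.
std::string lowercase_copy(const std::string& source)
{
    std::string destination;
    destination.reserve(source.size()); // optional: avoids reallocations
    // back_inserter calls push_back, so the string grows as needed.
    std::transform(source.begin(), source.end(),
                   std::back_inserter(destination),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return destination;
}
```

With this form there is no destination size to get wrong, which removes the risk of writing past the end of the destination.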

The simplest way to convert a string to lowercase without bothering about the std namespace is as follows:
1:string with/without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
getline(cin,str);
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}
2:string without spaces
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
string str;
cin>>str;
//------------function to convert string into lowercase---------------
transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
cout<<str;
return 0;
}

My own template functions which performs upper / lower case.
#include <string>
#include <algorithm>
//
// Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), tolower);
return s2;
}
//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
std::basic_string<T> s2 = s;
std::transform(s2.begin(), s2.end(), s2.begin(), toupper);
return s2;
}

I wrote this simple helper function:
#include <cctype> // tolower
string to_lower(string s) {
for(char &c : s)
c = tolower(c);
return s;
}
Usage:
string s = "TEST";
cout << to_lower("HELLO WORLD"); // output: "hello world"
cout << to_lower(s); // won't change the original variable.

An alternative to Boost is POCO (pocoproject.org).
POCO provides two variants:
The first variant makes a copy without altering the original string.
The second variant changes the original string in place.
"In Place" versions always have "InPlace" in the name.
Both versions are demonstrated below:
#include "Poco/String.h"
using namespace Poco;
std::string hello("Stack Overflow!");
// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));
// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);

std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page
#include <locale>
#include <iostream>
int main () {
std::locale::global(std::locale("en_US.utf8"));
std::wcout.imbue(std::locale());
std::wcout << "In US English UTF-8 locale:\n";
auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
std::wstring str = L"HELLo, wORLD!";
std::wcout << "Lowercase form of the string '" << str << "' is ";
f.tolower(&str[0], &str[0] + str.size());
std::wcout << "'" << str << "'\n";
}

Since none of the answers mentioned the upcoming Ranges library, which is available in the standard library since C++20, and currently separately available on GitHub as range-v3, I would like to add a way to perform this conversion using it.
To modify the string in-place:
str |= action::transform([](unsigned char c){ return std::tolower(c); });
To generate a new string:
auto new_string = original_string
| view::transform([](unsigned char c){ return std::tolower(c); });
(Don't forget to #include <cctype> and the required Ranges headers.)
Note: the use of unsigned char as the argument to the lambda is inspired by cppreference, which states:
Like all other functions from <cctype>, the behavior of std::tolower is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char:
char my_tolower(char ch)
{
return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}
Similarly, they should not be directly used with standard algorithms when the iterator's value type is char or signed char. Instead, convert the value to unsigned char first:
std::string str_tolower(std::string s) {
std::transform(s.begin(), s.end(), s.begin(),
// static_cast<int(*)(int)>(std::tolower) // wrong
// [](int c){ return std::tolower(c); } // wrong
// [](char c){ return std::tolower(c); } // wrong
[](unsigned char c){ return std::tolower(c); } // correct
);
return s;
}

On microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx
// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>
#include <stdlib.h> // free
int main( void )
{
char string[100] = "The String to End All Strings!";
char * copy1 = _strdup( string ); // make two copies
char * copy2 = _strdup( string );
_strlwr( copy1 ); // C4996
_strupr( copy2 ); // C4996
printf( "Mixed: %s\n", string );
printf( "Lower: %s\n", copy1 );
printf( "Upper: %s\n", copy2 );
free( copy1 );
free( copy2 );
}

There is a way to convert upper case to lower case WITHOUT doing if tests, and it's pretty straightforward. The isupper() function/macro's use of clocale.h should take care of problems relating to your locale, but if not, you can always tweak the UtoL[] table to your heart's content.
Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.
Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.
The code looks like this...
#include <clocale>
#include <cctype> // isupper
#include <cstring> // strcpy
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap() {
for (int i = 0; i < sizeof(UtoL); i++) {
if (isupper(i)) {
UtoL[i] = (char)(i + 32);
} else {
UtoL[i] = i;
}
}
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
char *p = szMyStr;
// do conversion in-place so as not to require a destination buffer
while (*p) { // szMyStr must be null-terminated
*p = UtoL[static_cast<unsigned char>(*p)]; // avoid a negative index when char is signed
p++;
}
return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
char *Lowered, Upper[128];
InitUtoLMap();
strcpy(Upper, "Every GOOD boy does FINE!");
Lowered = LowerStr(Upper);
return 0;
}
This approach will, at the same time, allow you to remap any other characters you wish to change.
This approach has one huge advantage when running on modern processors: the conversion itself contains no if tests, so it needs no branch prediction. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.
Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.

Here's a macro technique if you want something simple:
#define STRTOLOWER(x) std::transform (x.begin(), x.end(), x.begin(), ::tolower)
#define STRTOUPPER(x) std::transform (x.begin(), x.end(), x.begin(), ::toupper)
#define STRTOUCFIRST(x) std::transform (x.begin(), x.begin()+1, x.begin(), ::toupper); std::transform (x.begin()+1, x.end(), x.begin()+1,::tolower)
However, note that @AndreasSpindler's comment on this answer is still an important consideration if you're working on something that isn't just ASCII characters.

Is there an alternative which works 100% of the time?
No
There are several questions you need to ask yourself before choosing a lowercasing method.
How is the string encoded? Plain ASCII? UTF-8? Some form of extended ASCII legacy encoding?
What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the user's locale? Do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else?
What libraries are available?
Once you have answers to those questions you can start looking for a solution that fits your needs. There is no one size fits all that works for everyone everywhere!
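For the last of those cases (lowercase ASCII letters only, pass everything else through), a locale-independent sketch might look like this. The function name ascii_to_lower is an assumption for this example, and the code assumes an ASCII-compatible encoding. That includes UTF-8, where the bytes of multi-byte sequences never fall in the ASCII range, so they are left alone.

```cpp
#include <string>

// Hypothetical helper: lowercases only 'A'..'Z' and passes every other
// byte through unchanged, regardless of the current locale. Assumes an
// ASCII-compatible encoding.
std::string ascii_to_lower(std::string s)
{
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

This behaves the same on every system, but by design it does nothing for non-ASCII letters such as É.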

C++ doesn't have tolower or toupper methods implemented for std::string, but it is available for char. One can easily read each char of string, convert it into required case and put it back into string.
A sample code without using any third party library:
#include <cctype>
#include <iostream>
#include <string>
int main(){
std::string str = std::string("How ARe You");
for(char &ch : str){
ch = std::tolower(ch);
}
std::cout<<str<<std::endl;
return 0;
}
For character based operation on string : For every character in string

// tolower example (C++)
#include <iostream> // std::cout
#include <string> // std::string
#include <locale> // std::locale, std::tolower
int main ()
{
std::locale loc;
std::string str="Test String.\n";
for (std::string::size_type i=0; i<str.length(); ++i)
std::cout << std::tolower(str[i],loc);
return 0;
}
For more information: http://www.cplusplus.com/reference/locale/tolower/

Copy because it was disallowed to improve answer. Thanks SO
string test = "Hello World";
for(auto& c : test)
{
c = tolower(c);
}
Explanation:
for(auto& c : test) is a range-based for loop of the kind for (range_declaration:range_expression)loop_statement:
range_declaration: auto& c
Here the auto specifier is used for automatic type deduction, so the type is deduced from the variable's initializer.
range_expression: test
The range in this case are the characters of string test.
The characters of the string test are available as a reference inside the for loop through identifier c.

Try this function :)
string toLowerCase(string str) {
int str_len = str.length();
string final_str = "";
for(int i=0; i<str_len; i++) {
char character = str[i];
if(character>=65 && character<=90) { // 65 is 'A', 90 is 'Z' in ASCII
final_str += (character+32);
} else {
final_str += character;
}
}
return final_str;
}

Use fplus::to_lower_case() from fplus library.
Search to_lower_case in fplus API Search
Example:
fplus::to_lower_case(std::string("ABC")) == std::string("abc");

Have a look at the excellent c++17 cpp-unicodelib (GitHub). It's single-file and header-only.
#include <exception>
#include <iostream>
#include <codecvt>
// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"
using namespace std;
using namespace unicode;
// converter that allows displaying a Unicode32 string
wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;
std::u32string in = U"Je suis là!";
cout << converter.to_bytes(in) << endl;
std::u32string lc = to_lowercase(in);
cout << converter.to_bytes(lc) << endl;
Output
Je suis là!
je suis là!

Google's absl library has absl::AsciiStrToLower / absl::AsciiStrToUpper

Since you are using std::string, you are using c++. If using c++11 or higher, this doesn't need anything fancy. If words is vector<string>, then:
for (auto & str : words) {
for(auto & ch : str)
ch = tolower(ch);
}
This doesn't have strange exceptions. You might want to use wchar_t instead, but otherwise this should do it all in place.

Code Snippet
#include<bits/stdc++.h>
using namespace std;
int main ()
{
ios::sync_with_stdio(false);
string str="String Convert\n";
for(int i=0; i<str.size(); i++)
{
str[i] = tolower(str[i]);
}
cout<<str<<endl;
return 0;
}

Here are some additional libraries for ASCII string to_lower. Both are production-level and micro-optimized, and are expected to be faster than the existing answers here (TODO: add benchmark result).
Facebook's Folly:
void toLowerAscii(char* str, size_t length)
Google's Abseil:
void AsciiStrToLower(std::string* s);

I wrote a templated version that works with any string :
#include <type_traits> // std::decay
#include <cctype> // std::toupper & std::tolower
template <class T = void> struct farg_t { using type = T; };
template <template<typename ...> class T1,
class T2> struct farg_t <T1<T2>> { using type = T2*; };
//---------------
template<class T, class T2 =
typename std::decay< typename farg_t<T>::type >::type>
void ToUpper(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::toupper(*t); }
template<class T, class T2 = typename std::decay< typename
farg_t<T>::type >::type>
void Tolower(T& str) { T2 t = &str[0];
for (; *t; ++t) *t = std::tolower(*t); }
Tested with gcc compiler:
#include <iostream>
#include "upove_code.h"
int main()
{
std::string str1 = "hEllo ";
char str2 [] = "wOrld";
ToUpper(str1);
ToUpper(str2);
std::cout << str1 << str2 << '\n';
Tolower(str1);
Tolower(str2);
std::cout << str1 << str2 << '\n';
return 0;
}
Output:
HELLO WORLD
hello world

use this code to change case of string in c++.
#include<bits/stdc++.h>
using namespace std;
int main(){
string a = "sssAAAAAAaaaaDas";
transform(a.begin(),a.end(),a.begin(),::tolower);
cout<<a;
}

This could be another simple version to convert uppercase to lowercase and vice versa. I used VS2017 community version to compile this source code.
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string _input = "lowercasetouppercase";
#if 0
// My idea is to use the ascii value to convert
char upperA = 'A';
char lowerA = 'a';
cout << (int)upperA << endl; // ASCII value of 'A' -> 65
cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
// 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0
cout << "Input String = " << _input.c_str() << endl;
for (int i = 0; i < _input.length(); ++i)
{
_input[i] -= 32; // To convert lower to upper
#if 0
_input[i] += 32; // To convert upper to lower
#endif // 0
}
cout << "Output String = " << _input.c_str() << endl;
return 0;
}
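Note: if the string contains anything other than lowercase letters (digits, punctuation, letters that are already uppercase), those characters need to be handled with a condition check.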
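The condition check mentioned in the note could look like the sketch below. It assumes an ASCII-compatible encoding, and the names to_upper_ascii and to_lower_ascii are hypothetical helpers that are not part of the answer above.

```cpp
#include <string>

// Hypothetical helper: convert only 'a'..'z' to uppercase and leave every
// other character (digits, spaces, punctuation) untouched.
// Assumes an ASCII-compatible encoding, where 'a' - 'A' == 32.
std::string to_upper_ascii(std::string s)
{
    for (char& c : s)
    {
        if (c >= 'a' && c <= 'z')
            c -= 'a' - 'A';
    }
    return s;
}

// Hypothetical helper: the reverse direction, touching only 'A'..'Z'.
std::string to_lower_ascii(std::string s)
{
    for (char& c : s)
    {
        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
    }
    return s;
}
```

Because of the range check, calling either function on text that is already in the target case leaves it unchanged, which the unconditional `-= 32` loop does not guarantee.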

Remove character from array where spaces and punctuation marks are found [duplicate]

This question already has answers here:
C++ Remove punctuation from String
(12 answers)
Closed 9 years ago.
In my program, I am checking a whole C string. If any spaces or punctuation marks are found, I want to put an empty character at that location, but the compiler gives me an error: empty character constant.
Please help me out. In my loop I am checking like this:
if(ispunct(str1[start])) {
str1[start]=''; // << empty character constant.
}
if(isspace(str1[start])) {
str1[start]=''; // << empty character constant.
}
This is where my errors are; please correct me.
For example, if the word is str,, ing, the output should be string.

There is no such thing as an empty character.
If you mean a space then change '' to ' ' (with a space in it).
If you mean NUL then change it to '\0'.
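To illustrate the difference between those two replacements, here is a small sketch (the helper names blank_out and cut_at are hypothetical). Writing ' ' keeps the length of the C string, while writing '\0' ends the string at that position. Neither one actually removes a character.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: overwrite position pos with a space and report the
// resulting length. The length is unchanged; the character is only blanked.
std::size_t blank_out(char* s, std::size_t pos)
{
    s[pos] = ' ';
    return std::strlen(s);
}

// Hypothetical helper: overwrite position pos with NUL and report the
// resulting length. Everything from pos onwards is now invisible to
// C string functions such as strlen or printf("%s").
std::size_t cut_at(char* s, std::size_t pos)
{
    s[pos] = '\0';
    return std::strlen(s);
}
```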

Edit: the answer is no longer relevant now that the OP has edited the question. Leaving up for posterity's sake.
If you want to add a null character, use '\0'. If you want to use a different character, use the appropriate character for that. You can't assign it nothing. That's meaningless. That's like saying
int myHexInt = 0x;
or
long long myInteger = L;
The compiler will report an error. Put in the value you wanted. For a char, that is typically a value from 0 to 255 (or -128 to 127 on platforms where char is signed).

UPDATE:
From the edit to OP's question, it's apparent that he/she wanted to trim a string of punctuation and space characters.
As detailed in the flagged possible duplicate, one way is to use remove_copy_if:
string test = "THisisa test;;';';';";
string temp, finalresult;
remove_copy_if(test.begin(), test.end(), std::back_inserter(temp), ptr_fun<int, int>(&ispunct));
remove_copy_if(temp.begin(), temp.end(), std::back_inserter(finalresult), ptr_fun<int, int>(&isspace));
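Note that std::ptr_fun was deprecated in C++11 and removed in C++17, so the snippet above may not compile with a recent standard. A minimal sketch of the same idea with a lambda and the erase-remove idiom, assuming C++11 or later (strip_punct_and_space is a hypothetical name):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return a copy of s without punctuation and whitespace.
std::string strip_punct_and_space(std::string s)
{
    // remove_if shifts the kept characters to the front and returns the new
    // logical end; erase then drops the leftover tail.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) {
                               // unsigned char avoids undefined behaviour
                               // for negative char values
                               return std::ispunct(c) || std::isspace(c);
                           }),
            s.end());
    return s;
}
```

This does both removals in a single pass and needs no temporary string.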
ORIGINAL
Examining your question, replacing spaces with spaces is redundant, so you really need to figure out how to replace punctuation characters with spaces. You can do so using a comparison function (by wrapping std::ispunct) in tandem with std::replace_if from the STL:
#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>
using namespace std;
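bool is_punct(const char& c) {
// Cast first: passing a negative char to ispunct is undefined behaviour
return ispunct(static_cast<unsigned char>(c)) != 0;
}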
int main() {
string test = "THisisa test;;';';';";
char test2[] = "THisisa test;;';';'; another";
size_t size = sizeof(test2)/sizeof(test2[0]);
replace_if(test.begin(), test.end(), is_punct, ' ');//for C++ strings
replace_if(&test2[0], &test2[size-1], is_punct, ' ');//for c-strings
cout << test << endl;
cout << test2 << endl;
}
This outputs:
THisisa test
THisisa test another

Try this (as you asked for cstring explicitly):
char str1[100] = "str,, ing";
if(ispunct(str1[start]) || isspace(str1[start])) {
// memmove rather than strncpy, because source and destination overlap
memmove(str1 + start, str1 + start + 1, strlen(str1) - start);
}
Well, if you do this in pure C, there are more efficient solutions (have a look at @MichaelPlotke's answer for details).
But as you also explicitly ask for C++, I'd recommend a solution as follows:
Note that you can use the standard C++ algorithms for 'plain' C-style character arrays as well. You just have to place your predicate conditions for removal into a small helper functor and use it with the std::remove_if() algorithm:
struct is_char_category_in_question {
bool operator()(const char& c) const;
};
And later use it like:
#include <string>
#include <algorithm>
#include <iostream>
#include <cctype>
#include <cstring>
// Writing the functor like this gives the compiler the best chance to
// inline the predicate:
struct is_char_category_in_question {
bool operator()(const char& c) const {
const unsigned char uc = static_cast<unsigned char>(c);
return std::ispunct(uc) || std::isspace(uc);
}
};
int main() {
static char str1[100] = "str,, ing";
size_t size = strlen(str1);
// Using std::remove_if() is likely to provide the best balance between
// performance and code size that you can expect from your compiler
// implementation.
std::remove_if(&str1[0], &str1[size + 1], is_char_category_in_question());
// Regarding the end of the range in the statement above: we have to add 1
// to the size calculated by strlen() so that the closing `\0` character of
// the C-style string is moved along with the kept characters and
// terminates the result as well!
std::cout << str1 << std::endl; // Prints: string
}
See this compilable and working sample also here.

As I don't like the accepted answer, here's mine:
#include <stdio.h>
#include <string.h>
#include <cctype>
int main() {
char str[100] = "str,, ing";
int bad = 0;
int cur = 0;
while (str[cur] != '\0') {
if (bad < cur && !ispunct(str[cur]) && !isspace(str[cur])) {
str[bad] = str[cur];
}
if (ispunct(str[cur]) || isspace(str[cur])) {
cur++;
}
else {
cur++;
bad++;
}
}
str[bad] = '\0';
fprintf(stdout, "cur = %d; bad = %d; str = %s\n", cur, bad, str);
return 0;
}
Which outputs cur = 9; bad = 6; str = string
This has the advantage of being more efficient and more readable, hm, well, in a style I happen to like better (see comments for a lengthy debate / explanation).

How to check if a string is all lowercase and alphanumerics?

Is there a method that checks for these cases? Or do I need to parse each letter in the string, and check if it's lower case (letter) and is a number/letter?

You can use islower(), isalnum() to check for those conditions for each character. There is no string-level function to do this, so you'll have to write your own.
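A minimal sketch of such a hand-written check, assuming "all lowercase and alphanumeric" means that every character is either a lowercase letter or a digit in the current C locale (the name is_lower_alnum is hypothetical):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if every character of s is a lowercase letter
// or a digit according to the current C locale.
bool is_lower_alnum(const std::string& s)
{
    // Iterating as unsigned char avoids undefined behaviour when a char
    // with a negative value is passed to the <cctype> functions.
    for (unsigned char c : s)
    {
        if (!std::islower(c) && !std::isdigit(c))
            return false;
    }
    return true;
}
```

An empty string passes this check; reject it separately if that is not what you want.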

Assuming that the "C" locale is acceptable (or swap in a different set of characters for criteria), use find_first_not_of()
#include <string>
bool testString(const std::string& str)
{
std::string criteria("abcdefghijklmnopqrstuvwxyz0123456789");
return std::string::npos == str.find_first_not_of(criteria);
}

It's not very well known, but a locale actually does have functions to determine characteristics of entire strings at a time. Specifically, the ctype facet of a locale has a scan_is and a scan_not that scan for the first character that fits a specified mask (alpha, numeric, alphanumeric, lower, upper, punctuation, space, hex digit, etc.), or the first that doesn't fit it, respectively. Other than that, they work a bit like std::find_if, returning whatever you passed as the "end" to signal failure, otherwise returning a pointer to the first item in the string that doesn't fit what you asked for.
Here's a quick sample:
#include <locale>
#include <string>
#include <iostream>
#include <iomanip>
int main() {
std::string inputs[] = {
"alllower",
"1234",
"lower132",
"including a space"
};
// We'll use the "classic" (C) locale, but this works with any
std::locale loc(std::locale::classic());
// A mask specifying the characters to search for:
std::ctype_base::mask m = std::ctype_base::lower | std::ctype_base::digit;
for (int i=0; i<4; i++) {
char const *pos;
char const *b = inputs[i].data();
char const *e = b + inputs[i].size(); // avoids dereferencing end()
std::cout << "Input: " << std::setw(20) << inputs[i] << ":\t";
// finally, call the actual function:
if ((pos=std::use_facet<std::ctype<char> >(loc).scan_not(m, b, e)) == e)
std::cout << "All characters match mask\n";
else
std::cout << "First non-matching character = \"" << *pos << "\"\n";
}
return 0;
}
I suspect most people will prefer to use std::find_if though. Using it is nearly the same, but it can be generalized to many more situations quite easily. Even though scan_not has much narrower applicability, it's not really a lot easier to use (though I suppose if you're scanning large chunks of text, it might well be at least a little faster).

You could use tolower and strcmp to check whether the original string equals its lowercased copy, and check for digits separately, character by character.
(OR) Do both per character as below.
#include <algorithm>
#include <cctype>
#include <string>
static inline bool is_not_alphanum_lower(char c)
{
const unsigned char uc = static_cast<unsigned char>(c);
// Reject anything that is neither a digit nor a lowercase letter
return !(std::isdigit(uc) || std::islower(uc));
}
bool string_is_valid(const std::string &str)
{
return std::find_if(str.begin(), str.end(), is_not_alphanum_lower) == str.end();
}
I used some info from:
Determine if a string contains only alphanumeric characters (or a space)

Just use std::all_of
bool lowerAlnum = std::all_of(str.cbegin(), str.cend(), [](const char c){
return isdigit(c) || islower(c);
});
If you don't care about locale (i.e. the input is pure 7-bit ASCII) then the condition can be optimized into
[](const char c){ return ('0' <= c && c <= '9') || ('a' <= c && c <= 'z'); }

If your strings contain ASCII-encoded text and you like to write your own functions (like I do) then you can use this:
bool is_lower_alphanumeric(const string& txt)
{
for(char c : txt)
{
if (!((c >= '0' and c <= '9') or (c >= 'a' and c <= 'z'))) return false;
}
return true;
}

How to efficiently replace elements of char string with different integer elements? - c++

A fun way to do that would be to use the ASCII value of each char. For example, int('A') = 65 and int('B') = 66, so you just have to loop over the string with val = int(c) - 64; (this works only for uppercase characters).

One proposal in O(n):
void MyReplaceFunction(char& c)
{
static const int delta='A'-'1';
if(c>='A' && c<='Z')
{
c-=delta;
}
}
std::for_each(str.begin(), str.end(), MyReplaceFunction);
