I'm writing a piece of my code that checks whether what the user has entered is actually one of the valid inputs (1-9 in this case), and will give an error message if it isn't.
This is what I have:
if (input != '1', '2' , '3' , '4' , '5' , '6' , '7' , '8' , '9' , '0' )
{
cout << "Error";
}
But it doesn't seem to work. I thought I could use commas to separate them, but maybe I'm imagining that.
Is the only option to just do:
input != '1' && input != '2' && input != '3' etc etc
I know that method would work, but it seems a bit long winded. Is there a simpler way?
You can store the values in a container and utilize the std::find_if, std::none_of or std::any_of functions:
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
std::vector<char> v = { '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' };
char input = '1';
if (std::none_of(v.cbegin(), v.cend(), [&input](char p){ return p == input; })) {
std::cout << "None of the elements are equal to input.\n";
}
else {
std::cout << "Some of the elements are equal to input.\n";
}
}
How do I check if a variable is not equal to multiple things
Is the only option to just do:
input != '1' && input != '2' && input != '3' etc etc
In the general case, for an arbitrary set of values: No, that is not the only option, but it is the simplest. And simplest is often best, or at least good enough.
If you dislike the redundant repetition of input !=, a variadic template can be used to generate the expression. I've written an example of this in another question: https://stackoverflow.com/a/51497146/2079303
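As a rough sketch of that idea (not the code from the linked answer), a C++17 fold expression can expand the repeated comparison for you. The name is_none_of is made up for this illustration:

```cpp
// Hypothetical helper, assuming a C++17 compiler (fold expressions).
// The fold expands to: value != c1 && value != c2 && ...
template <typename T, typename... Candidates>
constexpr bool is_none_of(const T& value, const Candidates&... candidates)
{
    return ((value != candidates) && ...);
}
```

With this in place the check from the question could be written as if (is_none_of(input, '1', '2', '3')) { ... }.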
In specific cases, there may be better alternatives. For the digit check in your example code, std::isdigit does exactly that.
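A minimal sketch of the std::isdigit route, assuming the valid inputs are exactly the characters '0' to '9'. The function name is_valid_input is only illustrative:

```cpp
#include <cctype>

// Returns true when c is one of '0'..'9'.
// The cast matters: passing a negative char value to std::isdigit
// is undefined behaviour, so convert to unsigned char first.
bool is_valid_input(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

The error branch then becomes if (!is_valid_input(input)) { std::cout << "Error"; }.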
In order to check if a variable is (not) equal to multiple things which are not known until runtime, the typical solution is to use a set data structure, such as std::unordered_set.
If you are looking for a more general and human-readable construct, you can create something like this:
template <typename T, int TSize>
struct AnyOfThis {
template <typename TFirst, typename... TOthers>
explicit AnyOfThis(TFirst&& first, TOthers&&... others)
: values({ std::forward<TFirst>(first), std::forward<TOthers>(others)... }) {}
std::array<T, TSize> values;
};
template <typename TFirst, typename... TOthers>
auto anyOf(TFirst&& first, TOthers&&... others) {
constexpr std::size_t size = 1 + sizeof...(others);
return AnyOfThis<typename std::decay<TFirst>::type, size>(std::forward<TFirst>(first),
std::forward<TOthers>(others)...);
}
template <typename T, int TSize>
bool operator==(const T value, const AnyOfThis<typename std::decay<T>::type, TSize>& anyOfThis) {
return std::find(anyOfThis.values.begin(), anyOfThis.values.end(), value) != anyOfThis.values.end();
}
Basically, it creates a static array from a variadic function. Then there is another function which serves as a comparator, which takes the value you want to compare and looks for this value in the array.
The use-case reads fairly well, too:
if (1 == anyOf(1, 2, 3)) {
// do stuff
}
A simple and efficient way would be:
std::unordered_set<char> allowedValues = {'1','2','3','4','5','6','7','8','9','0'};
std::unordered_set<char>::const_iterator index = allowedValues.find(input);
if(index == allowedValues.end())
std::cout << "Error";
else
std::cout << "Valid";
Using an unordered set gives expected O(1) lookup, which pays off when there are many allowed values. find returns the set's end iterator when the value is absent, so index == allowedValues.end() means the input is invalid; any other result means it is valid.
If you are looking for "if a string is not equal to multiple strings in C" you may use the following (Not everyone would consider it elegant, but if you are fond of good old c-str then you may find it nice. Surely, it is simple and fast):
int GetIdxOfStringInOptionList (const char *Xi_pStr)
{
char l_P2[205];
sprintf(l_P2, "<#%s^>", Xi_pStr); // TODO: if (strlen>=200) return -1. Note that 200 is above length of options string below
_strlwr(l_P2); // iff you want comparison to be case insensitive
const char *l_pCO = strstr("01<#gps^>02<#gps2^>03<#log^>04<#img^>05<#nogps^>06<#nogps2^>07<#gps3^>08<#pillars0^>09<#pillars1^>10<#pillars2^>11<#pillars3^>", l_P2);
return l_pCO? atoi(l_pCO-2) : -1;
}
Just instead of:
if ( ch == 'A' || ch == 'B' || ch == 'C' || .....
For example, to do it like:
if ( ch == 'A', 'B', 'C', ...
Is there a shorter way to combine these conditions?
strchr() can be used to see if the character is in a list.
const char* list = "ABCXZ";
if (strchr(list, ch)) {
// 'ch' is 'A', 'B', 'C', 'X', or 'Z'
}
In this case you could use a switch:
switch (ch) {
case 'A':
case 'B':
case 'C':
// do something
break;
case 'D':
case 'E':
case 'F':
// do something else
break;
...
}
While this is slightly more verbose than using strchr, it doesn't involve any function calls. It also works for both C and C++.
Note that the alternate syntax you suggested won't work as you might expect because of the use of the comma operator:
if ( ch == 'A', 'B', 'C', 'D', 'E', 'F' )
This first compares ch to 'A' and then discards the result. Then 'B' is evaluated and discarded, then 'C', and so forth until 'F' is evaluated. Then 'F' becomes the value of the conditional. Since any non-zero value evaluates to true in a boolean context (and 'F' is non-zero), the above expression will always be true.
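To see this for yourself, here is a small sketch that wraps the comma expression in a function (comma_condition is a made-up name). Compilers typically warn that the left operands have no effect, which is exactly the point:

```cpp
// Illustration only: the comma operator evaluates each operand left to
// right, discards every result except the last, and yields 'F'.
// 'F' is non-zero, so the conversion to bool is always true.
bool comma_condition(char ch)
{
    return (ch == 'A', 'B', 'C', 'D', 'E', 'F');
}
```

Whatever character you pass in, including ones that are not in the list, the function returns true.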
Templates allow us to express ourselves in this way:
if (range("A-F").contains(ch)) { ... }
It requires a little plumbing, which you can put in a library.
This actually compiles out to be incredibly efficient (at least on gcc and clang).
#include <cstdint>
#include <tuple>
#include <utility>
#include <iostream>
namespace detail {
template<class T>
struct range
{
constexpr range(T first, T last)
: _begin(first), _end(last)
{}
constexpr T begin() const { return _begin; }
constexpr T end() const { return _end; }
template<class U>
constexpr bool contains(const U& u) const
{
return _begin <= u and u <= _end;
}
private:
T _begin;
T _end;
};
template<class...Ranges>
struct ranges
{
constexpr ranges(Ranges...ranges) : _ranges(std::make_tuple(ranges...)) {}
template<class U>
struct range_check
{
template<std::size_t I>
bool contains_impl(std::integral_constant<std::size_t, I>,
const U& u,
const std::tuple<Ranges...>& ranges) const
{
return std::get<I>(ranges).contains(u)
or contains_impl(std::integral_constant<std::size_t, I+1>(),u, ranges);
}
bool contains_impl(std::integral_constant<std::size_t, sizeof...(Ranges)>,
const U& u,
const std::tuple<Ranges...>& ranges) const
{
return false;
}
constexpr bool operator()(const U& u, std::tuple<Ranges...> const& ranges) const
{
return contains_impl(std::integral_constant<std::size_t, 0>(), u, ranges);
}
};
template<class U>
constexpr bool contains(const U& u) const
{
range_check<U> check {};
return check(u, _ranges);
}
std::tuple<Ranges...> _ranges;
};
}
template<class T>
constexpr auto range(T t) { return detail::range<T>(t, t); }
template<class T>
constexpr auto range(T from, T to) { return detail::range<T>(from, to); }
// this is the little trick which turns an ascii string into
// a range of characters at compile time. It's probably a bit naughty
// as I am not checking syntax. You could write "ApZ" and it would be
// interpreted as "A-Z".
constexpr auto range(const char (&s)[4])
{
return range(s[0], s[2]);
}
template<class...Rs>
constexpr auto ranges(Rs...rs)
{
return detail::ranges<Rs...>(rs...);
}
int main()
{
std::cout << range(1,7).contains(5) << std::endl;
std::cout << range("a-f").contains('b') << std::endl;
auto az = ranges(range('a'), range('z'));
std::cout << az.contains('a') << std::endl;
std::cout << az.contains('z') << std::endl;
std::cout << az.contains('p') << std::endl;
auto rs = ranges(range("a-f"), range("p-z"));
for (char ch = 'a' ; ch <= 'z' ; ++ch)
{
std::cout << ch << rs.contains(ch) << " ";
}
std::cout << std::endl;
return 0;
}
expected output:
1
1
1
1
0
a1 b1 c1 d1 e1 f1 g0 h0 i0 j0 k0 l0 m0 n0 o0 p1 q1 r1 s1 t1 u1 v1 w1 x1 y1 z1
For reference, here was my original answer:
template<class X, class Y>
bool in(X const& x, Y const& y)
{
return x == y;
}
template<class X, class Y, class...Rest>
bool in(X const& x, Y const& y, Rest const&...rest)
{
return in(x, y) or in(x, rest...);
}
int main()
{
int ch = 6;
std::cout << in(ch, 1,2,3,4,5,6,7) << std::endl;
std::string foo = "foo";
std::cout << in(foo, "bar", "foo", "baz") << std::endl;
std::cout << in(foo, "bar", "baz") << std::endl;
}
If you need to check a character against an arbitrary set of characters, you could try writing this:
std::set<char> allowed_chars = {'A', 'B', 'C', 'D', 'E', 'G', 'Q', '7', 'z'};
if(allowed_chars.find(ch) != allowed_chars.end()) {
/*...*/
}
Yet another answer on this overly-answered question, which I'm just including for completeness. Between all of the answers here you should find something that works in your application.
So another option is a lookup table:
// On initialization:
bool isAcceptable[256] = { false };
isAcceptable[(unsigned char)'A'] = true;
isAcceptable[(unsigned char)'B'] = true;
isAcceptable[(unsigned char)'C'] = true;
// When you want to check:
char c = ...;
if (isAcceptable[(unsigned char)c]) {
// it's 'A', 'B', or 'C'.
}
Scoff at the C-style static casts if you must, but they do get the job done. I suppose you could use an std::vector<bool> if arrays keep you up at night. You can also use types besides bool. But you get the idea.
Obviously this becomes cumbersome with e.g. wchar_t, and virtually unusable with multibyte encodings. But for your char example, or for anything that lends itself to a lookup table, it'll do. YMMV.
Similarly to the C strchr answer, in C++ you can construct a string and check the character against its contents:
#include <string>
...
std::string("ABCDEFGIKZ").find(c) != std::string::npos;
The above will return true for 'F' and 'Z' but false for 'z' or 'O'. This code does not assume contiguous representation of characters.
This works because std::string::find returns std::string::npos when it can't find the character in the string.
Edit:
There's another C++ method which doesn't involve dynamic allocation, but does involve an even longer piece of code:
#include <algorithm> // std::find
#include <iterator> // std::begin and std::end
...
char const chars[] = "ABCDEFGIKZ";
return std::find(std::begin(chars), std::end(chars), c) != std::end(chars);
This works similarly to the first code snippet: std::find searches the given range and returns a specific value if the item isn't found. Here, said specific value is the range's end.
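One detail worth keeping in mind: std::end on a char array initialised from a string literal also covers the terminating '\0', so a search over the full array would report a match for c == '\0'. A possible complete version that stops one element early is sketched below; the function name is_listed is an assumption for the example:

```cpp
#include <algorithm> // std::find
#include <iterator>  // std::begin and std::end

// Hypothetical wrapper around the std::find approach.
bool is_listed(char c)
{
    static const char chars[] = "ABCDEFGIKZ";
    // Stop before the terminating '\0' so that it is not treated as a member.
    const char* const last = std::end(chars) - 1;
    return std::find(std::begin(chars), last, c) != last;
}
```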
One option is the unordered_set. Put the characters of interest into the set. Then just check the count of the character in question:
#include <iostream>
#include <unordered_set>
using namespace std;
int main() {
unordered_set<char> characters;
characters.insert('A');
characters.insert('B');
characters.insert('C');
// ...
if (characters.count('A')) {
cout << "found" << endl;
} else {
cout << "not found" << endl;
}
return 0;
}
There is a solution to your problem, not in the language but in coding practice: refactoring.
Readers may find this answer unorthodox, but refactoring can be, and often is, used to hide a messy piece of code behind a method call. That method can be cleaned up later or left as it is.
You can create the following method:
bool characterIsValid(char ch) {
return (ch == 'A' || ch == 'B' || ch == 'C' || ..... );
}
and then this method can be called in a short form as:
if (characterIsValid(ch)) ...
Since the method only returns a boolean, you can reuse it anywhere you need this check.
For a simple and effective solution, you can use memchr():
#include <string.h>
const char list[] = "ABCXZ";
if (memchr(list, ch, sizeof(list) - 1)) {
// 'ch' is 'A', 'B', 'C', 'X', or 'Z'
}
Note that memchr() is better suited than strchr() for this task as strchr() would find the null character '\0' at the end of the string, which is incorrect for most cases.
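The difference is easy to demonstrate. The sketch below is written as C++ using the same library functions through <cstring>; the two function names are invented for the comparison:

```cpp
#include <cstring>

// strchr treats the terminating '\0' as part of the string it searches.
bool in_list_strchr(char ch)
{
    return std::strchr("ABCXZ", ch) != nullptr;
}

// memchr only looks at the bytes we tell it about: sizeof(list) - 1
// excludes the terminator.
bool in_list_memchr(char ch)
{
    static const char list[] = "ABCXZ";
    return std::memchr(list, ch, sizeof(list) - 1) != nullptr;
}
```

Both agree on ordinary characters, but only the strchr version reports '\0' as being in the list.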
If the list is dynamic or external and its length is not provided, the strchr() approach is better, but you should check if ch is different from 0 as strchr() would find it at the end of the string:
#include <string.h>
extern char list[];
if (ch && strchr(list, ch)) {
// 'ch' is one of the characters in the list
}
Another more efficient but less terse C99 specific solution uses an array:
#include <limits.h>
const char list[UCHAR_MAX + 1] = { ['A'] = 1, ['B'] = 1, ['C'] = 1, ['X'] = 1, ['Z'] = 1 };
if (list[(unsigned char)ch]) {
/* ch is one of the matching characters */
}
Note however that all of the above solutions assume ch to have char type. If ch has a different type, they would produce false positives. Here is how to fix this:
#include <string.h>
extern char list[];
if (ch == (char)ch && ch && strchr(list, ch)) {
// 'ch' is one of the characters in the list
}
Furthermore, beware of pitfalls if you are comparing unsigned char values:
unsigned char ch = 0xFF;
if (ch == '\xFF') {
/* test fails if `char` is signed by default */
}
if (memchr("\xFF", ch, 1)) {
/* test succeeds in all cases, is this OK? */
}
For this specific case you can use the fact that char is an integer and test for a range:
if(ch >= 'A' && ch <= 'C')
{
...
}
But in general this is unfortunately not possible. If you want to compress your code, just use a boolean function:
if(compare_char(ch))
{
...
}
The X-Y answer on the vast majority of modern systems is don't bother.
You can take advantage of the fact that practically every character encoding used today stores the alphabet in one sequentially-ordered contiguous block. A is followed by B, B is followed by C, etc... on to Z. This allows you to do simple math tricks on letters to convert the letter to a number. For example the letter C minus the letter A , 'C' - 'A', is 2, the distance between c and a.
Some character sets, EBCDIC was discussed in the comments above, are not sequential or contiguous for reasons that are out of scope for discussion here. They are rare, but occasionally you will find one. When you do... Well, most of the other answers here provide suitable solutions.
We can use this to make a mapping of letter values to letters with a simple array:
// a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p, q,r,s,t,u,v,w,x,y, z
int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};
So 'c' - 'a' is 2 and lettervalues[2] will result in 3, the letter value of C.
No if statements or conditional logic required whatsoever. All the debugging you need to do is proofreading lettervalues to make sure you entered the correct values.
As you study more in C++, you will learn that lettervalues should be static (current translation unit-only access) and const (cannot be changed), possibly constexpr (cannot be changed and fixed at compile time). If you don't know what I'm talking about, don't worry. You'll cover all three later. If not, google them. All are very useful tools.
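For the curious, this is roughly what that could look like. It is only a sketch: the helper letterValue is a name made up here, and it assumes a lowercase letter in a character set where 'a' to 'z' are contiguous:

```cpp
// static: visible in this translation unit only.
// constexpr: fixed at compile time, and therefore also const.
static constexpr int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};

// Hypothetical helper: maps a lowercase letter to its value.
constexpr int letterValue(char lowercaseLetter)
{
    return lettervalues[lowercaseLetter - 'a'];
}

// Because everything is constexpr, the table can be checked at compile time.
static_assert(letterValue('c') == 3, "value of c");
static_assert(letterValue('z') == 10, "value of z");
```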
Using this array could be as simple as
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
score += lettervalues[ch - 'a'];
}
return score;
}
But this has two fatal blind spots:
The first is capital letters. Sorry Ayn Rand, but 'A' is not 'a', and 'A'-'a' is not zero. This can be solved by using std::tolower or std::toupper to convert all input to a known case.
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
score += lettervalues[std::tolower(ch) - 'a'];
}
return score;
}
The other is input of characters that aren't letters. For example, '1'. '1' - 'a' is negative in ASCII, so it results in an array index that is not in the array. This is bad. If you're lucky your program will crash, but anything could happen, including looking as though your program works. Read up on Undefined Behaviour for more.
Fortunately this also has a simple fix: Only compute the score for good input. You can test for valid alphabet characters with std::isalpha.
int ComputeWordScore(std::string in)
{
int score = 0;
for (unsigned char ch: in) // for all characters in string; unsigned char keeps std::isalpha and std::tolower well-defined
{
if (std::isalpha(ch))
{
score += lettervalues[std::tolower(ch) - 'a'];
}
else
{
// do something that makes sense here.
}
}
return score;
}
My something else would be return -1;. -1 is an impossible word score, so anyone who calls ComputeWordScore can test for -1 and reject the user's input. What they do with it is not ComputeWordScore's problem. Generally the stupider you can make a function, the better, and errors should be handled by the closest piece of code that has all the information needed to make a decision. In this case, whatever read in the string would likely be tasked with deciding what to do with bad strings and ComputeWordScore can keep on computing word scores.
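Put together, the return -1 variant might look like the sketch below. It assumes the default "C" locale, where std::isalpha accepts only A-Z and a-z, and a character set with contiguous letters:

```cpp
#include <cctype>
#include <string>

static constexpr int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};

// Returns the word score, or -1 if the word contains a non-letter.
int ComputeWordScore(const std::string& in)
{
    int score = 0;
    for (unsigned char ch : in) // unsigned char keeps the <cctype> calls well-defined
    {
        if (!std::isalpha(ch))
        {
            return -1; // impossible score: the caller decides what to do
        }
        score += lettervalues[std::tolower(ch) - 'a'];
    }
    return score;
}
```

A caller can then test the result, for example if (ComputeWordScore(word) < 0) and ask the user for new input.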
Most of the terse versions have been covered, so I will cover the optimized cases with some helper macros to make them a little more terse.
If the span between your lowest and highest value fits within the number of bits in a long, you can combine all of your constants into a single bitmask. The check is then that the value falls within the range and that its bit, bitwise-anded with the constant bitmask, is non-zero.
/* This macro assumes the bits will fit in a long integer type,
* if it needs to be larger (64 bits on x32 etc...),
* you can change the shifted 1ULs to 1ULL or if range is > 64 bits,
* split it into multiple ranges or use SIMD
* It also assumes that a0 is the lowest and a9 is the highest,
* You may want to add compile time assert that:
* a9 (the highest value) - a0 (the lowest value) < max_bits
* and that a1-a8 fall within a0 to a9
*/
#define RANGE_TO_BITMASK_10(a0,a1,a2,a3,a4,a5,a6,a7,a8,a9) \
(1 | (1UL<<((a1)-(a0))) | (1UL<<((a2)-(a0))) | (1UL<<((a3)-(a0))) | \
(1UL<<((a4)-(a0))) | (1UL<<((a5)-(a0))) | (1UL<<((a6)-(a0))) | \
(1UL<<((a7)-(a0))) | (1UL<<((a8)-(a0))) | (1UL<<((a9)-(a0))) )
/*static inline*/ bool checkx(int x){
const unsigned long bitmask = /* assume 64 bits */
RANGE_TO_BITMASK_10('A','B','C','F','G','H','c','f','y','z');
unsigned temp = (unsigned)x-'A';
return ( ( temp <= ('z'-'A') ) && !!( (1ULL<<temp) & bitmask ) );
}
Since all of a# values are constants, they will be combined into 1 bitmask at compile time. That leaves 1 subtraction and 1 compare for the range, 1 shift and 1 bitwise and ... unless the compiler can optimize further, it turns out clang can (it uses the bit test instruction BTQ):
checkx: # #checkx
addl $-65, %edi
cmpl $57, %edi
ja .LBB0_1
movabsq $216172936732606695, %rax # imm = 0x3000024000000E7
btq %rdi, %rax
setb %al
retq
.LBB0_1:
xorl %eax, %eax
retq
It may look like more code on the C side, but if you are looking to optimize, this looks like it may be worth it on the assembly side. I'm sure someone could get creative with the macro to make it more useful in real programming situations than this "proof of concept".
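If C++14 is available, one way to get the same mask without a fixed-arity macro is a constexpr function. This is a sketch, not a drop-in replacement for the macro above: make_mask is a hypothetical helper, and it assumes every character lies within 63 positions of the base character.

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: one bit per accepted character, relative to base.
constexpr std::uint64_t make_mask(char base, std::initializer_list<char> chars)
{
    std::uint64_t mask = 0;
    for (char c : chars)
        mask |= std::uint64_t{1} << (c - base);
    return mask;
}

constexpr bool checkx(int x)
{
    // Built at compile time; same character set as the macro example.
    constexpr std::uint64_t mask =
        make_mask('A', {'A','B','C','F','G','H','c','f','y','z'});
    // Values below 'A' wrap around to a huge unsigned number and fail the range test.
    const unsigned temp = static_cast<unsigned>(x) - 'A';
    return temp <= static_cast<unsigned>('z' - 'A') && ((mask >> temp) & 1u) != 0;
}
```

The number of characters is no longer fixed at ten, and the range test still guards the shift.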
That will get a little complex as a macro, so here is an alternative set of macros to setup a C99 lookup table.
#include <limits.h>
#define INIT_1(v,a) [ a ] = v
#define INIT_2(v,a,...) [ a ] = v, INIT_1(v, __VA_ARGS__)
#define INIT_3(v,a,...) [ a ] = v, INIT_2(v, __VA_ARGS__)
#define INIT_4(v,a,...) [ a ] = v, INIT_3(v, __VA_ARGS__)
#define INIT_5(v,a,...) [ a ] = v, INIT_4(v, __VA_ARGS__)
#define INIT_6(v,a,...) [ a ] = v, INIT_5(v, __VA_ARGS__)
#define INIT_7(v,a,...) [ a ] = v, INIT_6(v, __VA_ARGS__)
#define INIT_8(v,a,...) [ a ] = v, INIT_7(v, __VA_ARGS__)
#define INIT_9(v,a,...) [ a ] = v, INIT_8(v, __VA_ARGS__)
#define INIT_10(v,a,...) [ a ] = v, INIT_9(v, __VA_ARGS__)
#define ISANY10(x,...) ((const unsigned char[UCHAR_MAX+1]){ \
INIT_10(-1, __VA_ARGS__) \
})[x]
bool checkX(int x){
return ISANY10(x,'A','B','C','F','G','H','c','f','y','z');
}
This method will use a (typically) 256 byte table and a lookup that reduces to something like the following in gcc:
checkX:
movslq %edi, %rdi # x, x
cmpb $0, C.2.1300(%rdi) #, C.2
setne %al #, tmp93
ret
NOTE: Clang doesn't fare as well on the lookup table in this method because it sets up const tables that occur inside functions on the stack on each function call, so you would want to use INIT_10 to initialize a static const unsigned char [UCHAR_MAX+1] outside of the function to achieve similar optimization to gcc.
Basically, I have to use selection sort to sort a string[]. I have done this part but this is what I am having difficulty with.
The sort, however, should be case-insensitive, so that "antenna" would come before "Jupiter". ASCII sorts from uppercase to lowercase, so would there not be a way to just swap the order of the sorted string? Or is there a simpler solution?
void stringSort(string array[], int size) {
int startScan, minIndex;
string minValue;
for(startScan = 0 ; startScan < (size - 1); startScan++) {
minIndex = startScan;
minValue = array[startScan];
for (int index = startScan + 1; index < size; index++) {
if (array[index] < minValue) {
minValue = array[index];
minIndex = index;
}
}
array[minIndex] = array[startScan];
array[startScan] = minValue;
}
}
C++ provides you with sort which takes a comparison function. In your case with a vector<string> you'll be comparing two strings. The comparison function should return true if the first argument is smaller.
For our comparison function we'll want to find the first mismatched character between the strings after tolower has been applied. To do this we can use mismatch which takes a comparator between two characters returning true as long as they are equal:
const auto result = mismatch(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend(), [](const unsigned char lhs, const unsigned char rhs){return tolower(lhs) == tolower(rhs);});
To decide if the lhs is smaller than the rhs fed to mismatch we need to test 3 things:
Were the strings of unequal length
Was string lhs shorter
Or was the first mismatched char from lhs smaller than the first mismatched char from rhs
This evaluation can be performed by:
result.second != rhs.cend() && (result.first == lhs.cend() || tolower(*result.first) < tolower(*result.second));
Ultimately, we'll want to wrap this up in a lambda and plug it back into sort as our comparator:
sort(foo.begin(), foo.end(), [](const string& lhs, const string& rhs){
const auto result = mismatch(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend(), [](const unsigned char lhs, const unsigned char rhs){return tolower(lhs) == tolower(rhs);});
return result.second != rhs.cend() && (result.first == lhs.cend() || tolower(*result.first) < tolower(*result.second));
});
This will correctly sort vector<string> foo. You can see a live example here: http://ideone.com/BVgyD2
EDIT:
Just saw your question update. You can use sort with string array[] as well. You'll just need to call it like this: sort(array, std::next(array, size), ...
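Spelled out for the array case, that could look like the following sketch. It assumes C++14 (for the four-iterator std::mismatch), and the comparator name caseInsensitiveLess is invented here:

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// True when lhs sorts before rhs, ignoring case.
bool caseInsensitiveLess(const std::string& lhs, const std::string& rhs)
{
    const auto sameLetter = [](unsigned char a, unsigned char b) {
        return std::tolower(a) == std::tolower(b);
    };
    const auto result = std::mismatch(lhs.cbegin(), lhs.cend(),
                                      rhs.cbegin(), rhs.cend(), sameLetter);
    // rhs ran out first (or the strings are equal): lhs is not smaller.
    // lhs ran out first: lhs is a proper prefix, so it is smaller.
    // Otherwise compare the first mismatched characters.
    return result.second != rhs.cend() &&
           (result.first == lhs.cend() ||
            std::tolower(static_cast<unsigned char>(*result.first)) <
            std::tolower(static_cast<unsigned char>(*result.second)));
}

// Same signature as the function in the question.
void stringSort(std::string array[], int size)
{
    std::sort(array, std::next(array, size), caseInsensitiveLess);
}
```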
#include <algorithm>
#include <vector>
#include <string>
using namespace std;
void CaseInsensitiveSort(vector<string>& strs)
{
sort(
begin(strs),
end(strs),
[](const string& str1, const string& str2){
return lexicographical_compare(
begin(str1), end(str1),
begin(str2), end(str2),
[](const char& char1, const char& char2) {
return tolower(char1) < tolower(char2);
}
);
}
);
}
I use this lambda function to sort a vector of strings:
std::sort(entries.begin(), entries.end(), [](const std::string& a, const std::string& b) -> bool {
for (size_t c = 0; c < a.size() and c < b.size(); c++) {
if (std::tolower(a[c]) != std::tolower(b[c]))
return (std::tolower(a[c]) < std::tolower(b[c]));
}
return a.size() < b.size();
});
Instead of the < operator, use a case-insensitive string comparison function.
C89/C99 provide strcoll (string collate), which does a locale-aware string comparison. It's available in C++ as std::strcoll. In some (most?) locales, like en_CA.UTF-8, A and a (and all accented forms of either) are in the same equivalence class. I think strcoll only compares within an equivalence class as a tiebreak if the whole string is otherwise equal, which gives a very similar sort order to a case-insensitive compare. Collation (at least in English locales on GNU/Linux) ignores some characters (like [). So ls /usr/share | sort gives output like
acpi-support
adduser
ADM_scripts
aglfn
aisleriot
I pipe through sort because ls does its own sorting, which isn't quite the same as sort's locale-based sorting.
If you want to sort some user-input arbitrary strings into an order that the user will see directly, locale-aware string comparison is usually what you want. Strings that differ only in case or accents won't compare equal, so this won't work if you were using a stable sort and depending on case-differing strings to compare equal, but otherwise you get nice results. Depending on the use-case, nicer than plain case-insensitive comparison.
FreeBSD's strcoll was and maybe still is case sensitive for locales other than POSIX (ASCII). That forum post suggests that on most other systems it is not case sensitive.
MSVC provides a _stricoll for case-insensitive collation, implying that its normal strcoll is case sensitive. However, this might just mean that the fallback to comparing within an equivalence class doesn't happen. Maybe someone can test the following example with MSVC.
// strcoll.c: show that these strings sort in a different order, depending on locale
#include <stdio.h>
#include <string.h> // strcoll
#include <locale.h>
int main()
{
// TODO: try some strings containing characters like '[' that strcoll ignores completely.
const char * s[] = { "FooBar - abc", "Foobar - bcd", "FooBar - cde" };
#ifdef USE_LOCALE
setlocale(LC_ALL, ""); // empty string means look at env vars
#endif
strcoll(s[0], s[1]);
strcoll(s[0], s[2]);
strcoll(s[1], s[2]);
return 0;
}
output of gcc -DUSE_LOCALE -Og strcoll.c && ltrace ./a.out (or run LANG=C ltrace a.out):
__libc_start_main(0x400586, 1, ...
setlocale(LC_ALL, "") = "en_CA.UTF-8" # my env contains LANG=en_CA.UTF-8
strcoll("FooBar - abc", "Foobar - bcd") = -1
strcoll("FooBar - abc", "FooBar - cde") = -2
strcoll("Foobar - bcd", "FooBar - cde") = -1
# the three strings are in order
+++ exited (status 0) +++
with gcc -Og -UUSE_LOCALE strcoll.c && ltrace ./a.out:
__libc_start_main(0x400536, ...
# no setlocale, so current locale is C
strcoll("FooBar - abc", "Foobar - bcd") = -32
strcoll("FooBar - abc", "FooBar - cde") = -2
strcoll("Foobar - bcd", "FooBar - cde") = 32 # s[1] should sort after s[2], so it's out of order
+++ exited (status 0) +++
POSIX.1-2001 provides strcasecmp. The POSIX spec says the results are "unspecified" for locales other than plain-ASCII, though, so I'm not sure whether common implementations handle utf-8 correctly or not.
See this post for portability issues with strcasecmp, e.g. to Windows. See other answers on that question for other C++ ways of doing case-insensitive string compares.
Once you have a case-insensitive comparison function, you can use it with other sort algorithms, like C standard lib qsort, or c++ std::sort, instead of writing your own O(n^2) selection-sort.
As b.buchhold's answer points out, doing a case-insensitive comparison on the fly might be slower than converting everything to lowercase once, and sorting an array of indices. The lowercase version of each string is needed many times. std::strxfrm will transform a string so that strcmp on the result will give the same result as strcoll on the original string.
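A possible way to precompute such a key with std::strxfrm is sketched below. The function name collation_key is an assumption; the key depends on whatever locale is current when it is built, so all keys that get compared should be built under the same locale:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: returns a key such that strcmp on two keys
// orders them the same way strcoll orders the original strings.
std::string collation_key(const std::string& s)
{
    // With a count of 0, strxfrm only reports the length it needs
    // (excluding the terminating null); the destination may be null.
    const std::size_t needed = std::strxfrm(nullptr, s.c_str(), 0);
    std::string key(needed + 1, '\0');
    std::strxfrm(&key[0], s.c_str(), needed + 1);
    key.resize(needed); // drop the terminator that strxfrm wrote
    return key;
}
```

Compute the key once per string, then sort on the keys with strcmp instead of calling strcoll for every comparison.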
You could call tolower on every character you compare. This is probably the easiest, yet not a great solution, because:
You look at every char multiple times so you'd call the method more often than necessary
You need extra care to handle wide-characters w.r.t to their encoding (UTF8 etc)
You could also replace the comparator with your own function. I.e. there will be some place where you compare something like stringone[i] < stringtwo[j] or charA < charB. Change it to my_less_than(stringone[i], stringtwo[j]) and implement the exact ordering you want.
Another way would be to transform every string to lowercase once and create an array of pairs. Then you base your comparisons on the lowercase value only, but you swap whole pairs so that your final strings will be in the right order as well.
Finally, you can create an array with lowercase versions and sort this one. Whenever you swap two elements in this one, you also swap in the original array.
Note that all those proposals would still need proper handling of wide characters (if you need that at all).
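The array-of-pairs idea could be sketched like this. It uses std::vector and std::stable_sort rather than a hand-written sort, handles plain char strings only, and both function names are invented for the example:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Lowercases a copy of s, one char at a time (no wide-character handling).
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

void sort_by_lowercase_key(std::vector<std::string>& strs)
{
    typedef std::pair<std::string, std::string> KeyedString; // (lowercase key, original)
    std::vector<KeyedString> keyed;
    keyed.reserve(strs.size());
    for (std::string& s : strs) {
        std::string key = to_lower_copy(s); // computed once per string
        keyed.emplace_back(std::move(key), std::move(s));
    }
    // Compare keys only; whole pairs move, so the originals follow along.
    std::stable_sort(keyed.begin(), keyed.end(),
                     [](const KeyedString& a, const KeyedString& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < strs.size(); ++i)
        strs[i] = std::move(keyed[i].second);
}
```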
This solution is much simpler to understand than Jonathan Mee's and pretty inefficient, but for educational purpose could be fine:
std::string lowercase( std::string s )
{
std::transform( s.begin(), s.end(), s.begin(), ::tolower );
return s;
}
std::sort( array, array + length,
[]( const std::string &s1, const std::string &s2 ) {
return lowercase( s1 ) < lowercase( s2 );
} );
if you have to use your sort function, you can use the same approach:
....
minValue = lowercase( array[startScan] );
for (int index = startScan + 1; index < size; index++) {
const std::string &tstr = lowercase( array[index] );
if (tstr < minValue) {
minValue = tstr;
minIndex = index;
}
}
...
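Since the surrounding lines are elided above, here is one way the complete function might look. It is a sketch under the assumption that the original strings must keep their casing, so the lowercase copies are used only for comparing and the elements themselves are exchanged with std::swap:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>

std::string lowercase(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

void stringSort(std::string array[], int size)
{
    for (int startScan = 0; startScan < size - 1; ++startScan) {
        int minIndex = startScan;
        std::string minKey = lowercase(array[startScan]); // used for comparing only
        for (int index = startScan + 1; index < size; ++index) {
            std::string key = lowercase(array[index]);
            if (key < minKey) {
                minKey = std::move(key);
                minIndex = index;
            }
        }
        // Swap the original strings so their casing is preserved.
        std::swap(array[startScan], array[minIndex]);
    }
}
```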