C++ Integer [?] - c++

In Java, strings have a charAt() function.
In C++, that function is simply stringname[INDEX]
However, what if I wanted to use a particular number at a certain index of an integer?
E.g.
int value = 9123;
Let's say I wanted to work with the index 0, which is just the 9.
Is there a way to index into an integer like that?

#include <sstream>
int value = 9123;
std::stringstream tmp;
tmp << value;
char digit = (tmp.str())[0]; // the character '9', not the number 9

No, there is no standard function to extract decimal digits from an integer.
In C++11, there is a function to convert to a string:
std::string string = std::to_string(value);
If you can't use C++11, then you could use a string stream:
std::ostringstream stream;
stream << value;
std::string string = stream.str();
or old-school C formatting:
char buffer[32]; // Make sure it's large enough
snprintf(buffer, sizeof buffer, "%d", value);
std::string string = buffer;
or if you just want one digit, you could extract it arithmetically (assuming value is positive and index counts from the leftmost digit, starting at 0):
int digits = 0;
for (int temp = value; temp != 0; temp /= 10) {
++digits;
}
// This could be replaced by "value /= std::pow(10, digits-index-1)"
// if you don't mind using floating-point arithmetic.
for (int i = digits-index-1; i > 0; --i) {
value /= 10;
}
int digit = value % 10;
Handling negative numbers in a sensible way is left as an exercise for the reader.
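Wrapped into a function, the arithmetic approach might look like the sketch below. The name `digit_from_left` is hypothetical. `index` counts from the most significant digit, starting at 0. Negative values are handled by working on their magnitude, which is one possible answer to the exercise above; the magnitude is widened to `long long` so that `INT_MIN` does not overflow. Throwing on an out-of-range index is also a design choice of this sketch.

```cpp
#include <stdexcept>

// Hypothetical helper: returns the decimal digit at position `index`
// (0 = most significant digit) of `value`, ignoring any minus sign.
int digit_from_left(int value, int index)
{
    long long magnitude = value < 0 ? -static_cast<long long>(value) : value;

    // Count the decimal digits; 0 is treated as having one digit.
    int digits = 1;
    for (long long temp = magnitude; temp >= 10; temp /= 10) {
        ++digits;
    }
    if (index < 0 || index >= digits) {
        throw std::out_of_range("digit index out of range");
    }

    // Drop the digits to the right of the requested one, then keep the last digit.
    for (int i = digits - index - 1; i > 0; --i) {
        magnitude /= 10;
    }
    return static_cast<int>(magnitude % 10);
}
```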

You can use the following formula (pseudo-code), where index 0 is the rightmost digit:
currDigit = (absolute(value) / 10^index) modulo 10; // (where ^ is power-of)
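A minimal C++ version of this formula might look like the sketch below (the name `digit_from_right` is hypothetical). It replaces the power with repeated integer division, so no floating-point rounding can creep in, and widens to `long long` so that the absolute value of `INT_MIN` does not overflow. Because it counts from the right, index 0 of 9123 is 3, not 9.

```cpp
#include <cstdlib>

// Hypothetical helper implementing (|value| / 10^index) mod 10.
// index 0 is the least significant (rightmost) digit.
int digit_from_right(int value, int index)
{
    long long magnitude = std::llabs(static_cast<long long>(value));  // |value| without INT_MIN overflow
    for (int i = 0; i < index; ++i) {
        magnitude /= 10;  // integer equivalent of dividing by 10^index
    }
    return static_cast<int>(magnitude % 10);
}
```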

Just to make things complete, you can also use boost::lexical_cast; for more info, check out its documentation.
Basically it's just a nice wrapper around the code in Andreas Brinck's answer.

Another solution, which does use 0 for the leftmost digit. digits is used to break down value into individual digits in written order (i.e. "9347" becomes 9,3,4,7). We then discard the first index values, i.e. to get the 3rd digit, we discard the first two and take the new front.
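#include <list>
#include <stdexcept>
// Returns the digit (0-9) at position index, counting from the leftmost digit.
// (The function name is illustrative.)
int digit_at(int value, int index)
{
if (value==0 && index ==0) return 0; // Special case.
if (value <0) throw std::invalid_argument("Negative value"); // Unclear what to do with this, so reject it.
std::list<char> digits;
while (value) {
digits.push_front(value % 10);
value /= 10;
}
for(; index > 0 && !digits.empty(); index--) {
digits.pop_front();
}
if (!digits.empty()) {
return digits.front();
} else
{
throw std::invalid_argument("Index too large");
}
}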

An integer is not a string, and therefore you cannot do that. What you need is indeed to convert the integer to a string. You can use itoa (non-standard, so not available with every compiler) or have a look here.

Try sprintf to write the integer out to a string:
http://www.cplusplus.com/reference/clibrary/cstdio/sprintf/
Then you can index into the char array that you've just printed into.

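I've implemented a variant of giorashc's solution, with all the suggested fixes and issues resolved. It's a bit long, but it should be fast if everything is inlined. Most of the code is tests, which I've left in for completeness.
#include <cassert>
#include <cstdint>
#include <iostream>
// Returns the k-th digit counting from the right (index 0 = last digit).
char get_kth_digit( int v, int index)
{
assert(v>0);
int mask = 1;
for (int i = 0; i < index; ++i) mask *= 10; // 10^index in integer arithmetic (pow() returns a double that may round down)
return '0'+(v / mask) % 10;
}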
int count_digits( int v )
{
assert(v>0);
int c=0;
while(v>0)
{
++c;
v/=10;
}
return c;
}
char get_int_index(int v, int index)
{
if( v==0 ) return '0';
if( v < 0 )
{
if(index==0) { return '-'; }
return get_int_index(-v,index-1);
}
// get_kth_digit counts the wrong way, so we need to reverse the count
int digits = count_digits(v);
return get_kth_digit( v, digits-index-1);
}
template<typename X, typename Y>
void compare(const X & v1, const Y & v2, const char * v1t, const char * v2t, uint32_t line, const char * fname )
{
if(v1!=v2)
{
std::cerr<<fname<<":"<<line<<": Equality test failed "<< v1t << "("<<v1<<") <> " << v2t <<" ("<<v2<<")"<<std::endl;
}
}
#define test_eq(X,Y) compare(X,Y,#X,#Y,__LINE__,__FILE__)
int main()
{
test_eq( 1, count_digits(1) );
test_eq( 1, count_digits(9) );
test_eq( 2, count_digits(10) );
test_eq( 2, count_digits(99) );
test_eq( 3, count_digits(100) );
test_eq( 3, count_digits(999) );
test_eq( '1', get_kth_digit(123,2) );
test_eq( '2', get_kth_digit(123,1) );
test_eq( '3', get_kth_digit(123,0) );
test_eq( '0', get_kth_digit(10,0) );
test_eq( '1', get_kth_digit(10,1) );
test_eq( '1', get_int_index(123,0) );
test_eq( '2', get_int_index(123,1) );
test_eq( '3', get_int_index(123,2) );
test_eq( '-', get_int_index(-123,0) );
test_eq( '1', get_int_index(-123,1) );
test_eq( '2', get_int_index(-123,2) );
test_eq( '3', get_int_index(-123,3) );
}

A longer explanation of Andreas Brinck's answer.
The C++ library is designed so that between "sequences" and "values" there is a "mediator" named "stream", which acts as a translator from a value to its respective sequence.
"Sequences" is an abstract concept whose concrete implementations are "strings" and "files".
"Stream" is another abstract concept whose corresponding concrete implementations are "stringstream" and "fstream", which are implemented in terms of the helper classes "stringbuf" and "filebuf" (both derived from the abstract "streambuf") and of a helper object of the "locale" class, containing some "facets".
The cited answer's code works this way:
The tmp object of class stringstream is default-constructed: this also constructs, internally, a stringbuf and a string, plus a locale referencing the facets of the global locale (by default, the "classic" or "C" locale).
The operator<< between stream and int is called: there is one such overload for each of the basic types.
The int version gets the num_put facet from the locale and a "buffer iterator" from the buffer, and calls the put function, passing the format flags of the given stream.
The put function actually converts the number into the character sequence, thus filling the buffer.
When the buffer is full, or when a particular character is inserted, or when the str function is called, the buffer content is "sent" (copied, in this case) to the string, and the string content is returned.
This very convoluted process looks complex at first, but:
It can be completely hidden (resulting in two lines of code)
It can be extended to virtually anything, but...
It is often kept as a (sort of) mystery in its details in most C++ courses and tutorials
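To make these steps concrete, the sketch below performs the conversion by hand: it fetches the `num_put` facet from the stream's locale and calls its `put` member with an output iterator on the stream's buffer. This only approximates what `operator<<` does internally (the real inserter also constructs a sentry, handles errors and resets the field width), and the function name is hypothetical. Note that `num_put::put` has no `int` overload, so the value is widened to `long`.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: converts an int to text through the stream machinery explicitly.
std::string to_string_via_num_put(int value)
{
    std::ostringstream stream;  // owns a stringbuf and a copy of the global locale

    // Fetch the numeric formatting facet from the stream's locale.
    const auto& facet = std::use_facet<std::num_put<char>>(stream.getloc());

    // Format the number, writing characters straight into the stream's buffer;
    // the stream supplies the format flags (base, showpos, ...).
    facet.put(std::ostreambuf_iterator<char>(stream), stream, stream.fill(),
              static_cast<long>(value));

    return stream.str();  // copy the buffer contents into a std::string
}
```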

I would convert it to a string, then index it. C++ also has the
str.at(i)
function, similar to Java's charAt().
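For example, a sketch combining `std::to_string` (C++11) with `at()`, which, unlike `operator[]`, checks the index and throws `std::out_of_range` when it is too large. The helper name `char_at` is hypothetical; for negative numbers, index 0 is the minus sign.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the character at `index` of the decimal text of `value`.
char char_at(int value, std::size_t index)
{
    const std::string text = std::to_string(value);  // e.g. 9123 -> "9123"
    return text.at(index);                           // throws std::out_of_range if index >= size()
}
```

Subtract `'0'` from the result to get the digit as a number, e.g. `char_at(9123, 1) - '0'` is 1.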
Another, simpler loop in C++11 would be a range-based loop over the string:
std::string int_or_str = std::to_string(value);
int i = 0;
for (auto s : int_or_str) {
if (i == idx)
cout << s;
i++;
}
I guess this isn't easier than the standard for loop; I thought auto might help, but not really. I know this is answered, but I prefer simple and familiar answers.
Zach

Related

Generate string lexicographically larger than input

Given an input string A, is there a concise way to generate a string B that is lexicographically larger than A, i.e. A < B == true?
My raw solution would be to say:
B = A;
++B.back();
but in general this won't work because:
A might be empty
The last character of A may be close to wraparound, in which case the resulting character will have a smaller value i.e. B < A.
Adding an extra character every time is wasteful and will quickly result in unreasonably large strings.
So I was wondering whether there's a standard library function that can help me here, or if there's a strategy that scales nicely when I want to start from an arbitrary string.
You can duplicate A into B, then look at the final character. If the final character isn't the last character in your range, you can simply increment it by one.
Otherwise, look at last-1, last-2, last-3, and so on, and increment the first one that can be incremented. If you get to the front of the string without finding one, append a character to increase the length.
Here is my dummy solution:
std::string make_greater_string(std::string const &input)
{
std::string ret{std::numeric_limits<
std::string::value_type>::min()};
if (!input.empty())
{
if (std::numeric_limits<std::string::value_type>::max()
== input.back())
{
ret = input + ret;
}
else
{
ret = input;
++ret.back();
}
}
return ret;
}
Ideally I'd hope to avoid the explicit handling of all special cases, and use some facility that can handle them more naturally. Looking at the answer by @JosephLarson, I see that I could increment more than just the last character, which would improve the range achievable without adding more characters.
And here's the refinement after the suggestions in this post:
std::string make_greater_string(std::string const &input)
{
constexpr char minC = ' ', maxC = '~';
// Working with limits was a pain,
// using ASCII typical limit values instead.
std::string ret{minC};
auto rit = input.rbegin();
while (rit != input.rend())
{
if (maxC == *rit)
{
++rit;
if (rit == input.rend())
{
ret = input + ret;
break;
}
}
else
{
ret = input;
++(*(ret.rbegin() + std::distance(input.rbegin(), rit)));
break;
}
}
return ret;
}
You can copy the string and append some letters; this will always produce a lexicographically larger result, because a string compares less than any longer string that starts with it.
B = A + "a"
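The increment-from-the-right strategy described above can also use the full `char` range instead of printable ASCII. One subtlety: `std::string`'s `operator<` compares through `std::char_traits<char>`, which orders characters as `unsigned char` even where plain `char` is signed, so the sketch below treats `UCHAR_MAX` as the largest character. The function name is hypothetical, and appending `'\0'` when nothing can be incremented is a choice of this sketch (any string compares less than itself with a character appended).

```cpp
#include <climits>
#include <string>

// Hypothetical helper: returns a string that compares greater than `input`.
std::string make_greater_string_any_char(const std::string& input)
{
    std::string result = input;
    // Walk from the last character towards the front.
    for (auto it = result.rbegin(); it != result.rend(); ++it) {
        const unsigned char uc = static_cast<unsigned char>(*it);
        if (uc != UCHAR_MAX) {
            // Bump the rightmost non-maximal character; it becomes the first
            // position where result and input differ, and it is larger there.
            *it = static_cast<char>(uc + 1);
            return result;
        }
    }
    // Empty input, or every character is already at its maximum: extend instead.
    result.push_back('\0');
    return result;
}
```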

How do I check if a variable is not equal to multiple things in C++?

I'm writing a piece of my code that checks whether what the user has entered is actually one of the valid inputs (0-9 in this case), and will give an error message if it isn't.
This is what I have:
if (input != '1', '2' , '3' , '4' , '5' , '6' , '7' , '8' , '9' , '0' )
{
cout << "Error";
}
But it doesn't seem to work. I thought I could use commas to separate them, but maybe I'm imagining that.
Is the only option to just do:
input != '1' && input != '2' && input != '3' etc etc
I know that method would work, but it seems a bit long-winded. Is there a simpler way?
You can store the values in a container and utilize the std::find_if, std::none_of or std::any_of functions:
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
std::vector<char> v = { '1', '2', '3', '4', '5', '6', '7', '8', '9', '0' };
char input = '1';
if (std::none_of(v.cbegin(), v.cend(), [&input](char p){ return p == input; })) {
std::cout << "None of the elements are equal to input.\n";
}
else {
std::cout << "Some of the elements are equal to input.\n";
}
}
How do I check if a variable is not equal to multiple things
Is the only option to just do:
input != '1' && input != '2' && input != '3' etc etc
In the general case, for an arbitrary set of values: No, that is not the only option, but it is the simplest. And simplest is often best, or at least good enough.
If you dislike the redundant repetition of input !=, a variadic template can be used to generate the expression. I've written an example of this in another question: https://stackoverflow.com/a/51497146/2079303
In specific cases, there may be better alternatives. There exists std::isdigit for example for exactly the particular case in your example code.
In order to check whether a variable is (not) equal to multiple things which are not known until runtime, the typical solution is to use a set data structure, such as std::unordered_set.
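A sketch of the first two ideas. `std::isdigit` covers the digit case directly (its argument should be converted to `unsigned char` first, since a negative `char` is undefined behaviour). For an arbitrary list of values, a C++17 fold expression can generate the `input != a && input != b && ...` chain for you; `is_none_of` is a hypothetical name, and this is not the code from the linked answer.

```cpp
#include <cctype>

// True if `input` is a decimal digit '0'-'9'.
bool is_digit_char(char input)
{
    return std::isdigit(static_cast<unsigned char>(input)) != 0;
}

// Hypothetical helper (C++17): expands to (input != v1) && (input != v2) && ...
template <typename T, typename... Values>
bool is_none_of(const T& input, const Values&... values)
{
    return ((input != values) && ...);
}
```

With it, the check from the question reads `if (is_none_of(input, '1', '2', '3', '4', '5', '6', '7', '8', '9', '0')) { cout << "Error"; }`.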
If you are looking for a more general and human-readable construct, you can create something like this:
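#include <algorithm>
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>
// Requires C++14 (auto return type of anyOf).
template <typename T, int TSize>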
struct AnyOfThis {
template <typename TFirst, typename... TOthers>
explicit AnyOfThis(TFirst&& first, TOthers&&... others)
: values({ std::forward<TFirst>(first), std::forward<TOthers>(others)... }) {}
std::array<T, TSize> values;
};
template <typename TFirst, typename... TOthers>
auto anyOf(TFirst&& first, TOthers&&... others) {
constexpr std::size_t size = 1 + sizeof...(others);
return AnyOfThis<typename std::decay<TFirst>::type, size>(std::forward<TFirst>(first),
std::forward<TOthers>(others)...);
}
template <typename T, int TSize>
bool operator==(const T value, const AnyOfThis<typename std::decay<T>::type, TSize>& anyOfThis) {
return std::find(anyOfThis.values.begin(), anyOfThis.values.end(), value) != anyOfThis.values.end();
}
Basically, it creates a fixed-size std::array from a variadic function. Then there is another function, operator==, which serves as a comparator: it takes the value you want to compare and looks for it in the array.
The use-case reads fairly well, too:
if (1 == anyOf(1, 2, 3)) {
// do stuff
}
A simple and efficient way would be:
std::unordered_set<char> allowedValues = {'1','2','3','4','5','6','7','8','9','0'};
std::unordered_set<char>::const_iterator index = allowedValues.find(input);
if(index == allowedValues.end())
std::cout << "Error";
else
std::cout << "Valid";
With std::unordered_set, lookup is O(1) on average, which pays off when the number of allowed values is large. If index equals allowedValues.end(), the input is not in the set and is invalid; otherwise it is valid.
If you are looking for "if a string is not equal to multiple strings" in C, you may use the following. Not everyone would consider it elegant, but if you are fond of good old C strings you may find it nice; it is certainly simple and fast:
int GetIdxOfStringInOptionList (const char *Xi_pStr)
{
char l_P2[205];
sprintf(l_P2, "<#%s^>", Xi_pStr); // TODO: if (strlen>=200) return -1. Note that 200 is above length of options string below
_strlwr(l_P2); // only if you want the comparison to be case-insensitive (_strlwr is Microsoft-specific)
const char *l_pCO = strstr("01<#gps^>02<#gps2^>03<#log^>04<#img^>05<#nogps^>06<#nogps2^>07<#gps3^>08<#pillars0^>09<#pillars1^>10<#pillars2^>11<#pillars3^>", l_P2);
return l_pCO? atoi(l_pCO-2) : -1;
}

Is there a shorter way to write compound 'if' conditions? [duplicate]

This question already has answers here:
Shorthand for checking for equality to multiple possibilities [duplicate]
Instead of:
if ( ch == 'A' || ch == 'B' || ch == 'C' || .....
is there a way to write it, for example, like:
if ( ch == 'A', 'B', 'C', ...
or some other shorter way to write such conditions?
strchr() can be used to see if the character is in a list.
#include <string.h>
const char* list = "ABCXZ";
if (strchr(list, ch)) {
// 'ch' is 'A', 'B', 'C', 'X', or 'Z'
}
In this case you could use a switch:
switch (ch) {
case 'A':
case 'B':
case 'C':
// do something
break;
case 'D':
case 'E':
case 'F':
// do something else
break;
...
}
While this is slightly more verbose than using strchr, it doesn't involve any function calls. It also works for both C and C++.
Note that the alternate syntax you suggested won't work as you might expect because of the use of the comma operator:
if ( ch == 'A', 'B', 'C', 'D', 'E', 'F' )
This first compares ch to 'A' and then discards the result. Then 'B' is evaluated and discarded, then 'C', and so forth until 'F' is evaluated. Then 'F' becomes the value of the conditional. Since any non-zero value evaluates to true in a boolean context (and 'F' is non-zero), the above expression will always be true.
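To see this in action, the hypothetical function below returns `true` for every input, because the whole condition evaluates to `'F'`. Most compilers warn that the left-hand operands of the comma operator have no effect, which is a useful hint that the code does not do what it looks like.

```cpp
// Hypothetical demonstration of the comma-operator pitfall; do not use this pattern.
bool looks_like_a_membership_test(char ch)
{
    // Parsed as ((((((ch == 'A'), 'B'), 'C'), 'D'), 'E'), 'F'):
    // each left operand is evaluated and discarded, and the result is 'F'.
    return (ch == 'A', 'B', 'C', 'D', 'E', 'F');
}
```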
Templates allow us to express ourselves in this way:
if (range("A-F").contains(ch)) { ... }
It requires a little plumbing, which you can put in a library.
This actually compiles out to be incredibly efficient (at least on gcc and clang).
#include <cstdint>
#include <tuple>
#include <utility>
#include <iostream>
namespace detail {
template<class T>
struct range
{
constexpr range(T first, T last)
: _begin(first), _end(last)
{}
constexpr T begin() const { return _begin; }
constexpr T end() const { return _end; }
template<class U>
constexpr bool contains(const U& u) const
{
return _begin <= u and u <= _end;
}
private:
T _begin;
T _end;
};
template<class...Ranges>
struct ranges
{
constexpr ranges(Ranges...ranges) : _ranges(std::make_tuple(ranges...)) {}
template<class U>
struct range_check
{
template<std::size_t I>
bool contains_impl(std::integral_constant<std::size_t, I>,
const U& u,
const std::tuple<Ranges...>& ranges) const
{
return std::get<I>(ranges).contains(u)
or contains_impl(std::integral_constant<std::size_t, I+1>(),u, ranges);
}
bool contains_impl(std::integral_constant<std::size_t, sizeof...(Ranges)>,
const U& u,
const std::tuple<Ranges...>& ranges) const
{
return false;
}
constexpr bool operator()(const U& u, std::tuple<Ranges...> const& ranges) const
{
return contains_impl(std::integral_constant<std::size_t, 0>(), u, ranges);
}
};
template<class U>
constexpr bool contains(const U& u) const
{
range_check<U> check {};
return check(u, _ranges);
}
std::tuple<Ranges...> _ranges;
};
}
template<class T>
constexpr auto range(T t) { return detail::range<T>(t, t); }
template<class T>
constexpr auto range(T from, T to) { return detail::range<T>(from, to); }
// this is the little trick which turns an ascii string into
// a range of characters at compile time. It's probably a bit naughty
// as I am not checking syntax. You could write "ApZ" and it would be
// interpreted as "A-Z".
constexpr auto range(const char (&s)[4])
{
return range(s[0], s[2]);
}
template<class...Rs>
constexpr auto ranges(Rs...rs)
{
return detail::ranges<Rs...>(rs...);
}
int main()
{
std::cout << range(1,7).contains(5) << std::endl;
std::cout << range("a-f").contains('b') << std::endl;
auto az = ranges(range('a'), range('z'));
std::cout << az.contains('a') << std::endl;
std::cout << az.contains('z') << std::endl;
std::cout << az.contains('p') << std::endl;
auto rs = ranges(range("a-f"), range("p-z"));
for (char ch = 'a' ; ch <= 'z' ; ++ch)
{
std::cout << ch << rs.contains(ch) << " ";
}
std::cout << std::endl;
return 0;
}
expected output:
1
1
1
1
0
a1 b1 c1 d1 e1 f1 g0 h0 i0 j0 k0 l0 m0 n0 o0 p1 q1 r1 s1 t1 u1 v1 w1 x1 y1 z1
For reference, here was my original answer:
#include <iostream>
#include <string>
template<class X, class Y>
bool in(X const& x, Y const& y)
{
return x == y;
}
template<class X, class Y, class...Rest>
bool in(X const& x, Y const& y, Rest const&...rest)
{
return in(x, y) or in(x, rest...);
}
int main()
{
int ch = 6;
std::cout << in(ch, 1,2,3,4,5,6,7) << std::endl;
std::string foo = "foo";
std::cout << in(foo, "bar", "foo", "baz") << std::endl;
std::cout << in(foo, "bar", "baz") << std::endl;
}
If you need to check a character against an arbitrary set of characters, you could try writing this:
std::set<char> allowed_chars = {'A', 'B', 'C', 'D', 'E', 'G', 'Q', '7', 'z'};
if(allowed_chars.find(ch) != allowed_chars.end()) {
/*...*/
}
Yet another answer on this overly-answered question, which I'm just including for completeness. Between all of the answers here you should find something that works in your application.
So another option is a lookup table:
// On initialization:
bool isAcceptable[256] = { false };
isAcceptable[(unsigned char)'A'] = true;
isAcceptable[(unsigned char)'B'] = true;
isAcceptable[(unsigned char)'C'] = true;
// When you want to check:
char c = ...;
if (isAcceptable[(unsigned char)c]) {
// it's 'A', 'B', or 'C'.
}
Scoff at the C-style static casts if you must, but they do get the job done. I suppose you could use an std::vector<bool> if arrays keep you up at night. You can also use types besides bool. But you get the idea.
Obviously this becomes cumbersome with e.g. wchar_t, and virtually unusable with multibyte encodings. But for your char example, or for anything that lends itself to a lookup table, it'll do. YMMV.
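In C++17 the same table can be filled at compile time instead of at start-up. A sketch, with `make_lookup_table`, `kAcceptable` and `is_acceptable` as hypothetical names; it keeps the `unsigned char` casts for the same reason as above (a plain `char` may be negative).

```cpp
#include <array>
#include <climits>
#include <string_view>

// Builds, at compile time, a table with one bool per unsigned char value.
constexpr std::array<bool, UCHAR_MAX + 1> make_lookup_table(std::string_view accepted)
{
    std::array<bool, UCHAR_MAX + 1> table{};  // all false
    for (char c : accepted) {
        table[static_cast<unsigned char>(c)] = true;
    }
    return table;
}

// Hypothetical example set: 'A', 'B' and 'C'.
constexpr auto kAcceptable = make_lookup_table("ABC");

bool is_acceptable(char c)
{
    return kAcceptable[static_cast<unsigned char>(c)];
}
```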
Similarly to the C strchr answer, in C++ you can construct a string and check the character against its contents:
#include <string>
...
std::string("ABCDEFGIKZ").find(c) != std::string::npos;
The above will return true for 'F' and 'Z' but false for 'z' or 'O'. This code does not assume contiguous representation of characters.
This works because std::string::find returns std::string::npos when it can't find the character in the string.
Edit:
There's another C++ method which doesn't involve dynamic allocation, but does involve an even longer piece of code:
#include <algorithm> // std::find
#include <iterator> // std::begin and std::end
...
char const chars[] = "ABCDEFGIKZ";
auto const last = std::end(chars) - 1; // exclude the terminating '\0', so that c == '\0' is not a match
return std::find(std::begin(chars), last, c) != last;
This works similarly to the first code snippet: std::find searches the given range and returns a specific value if the item isn't found. Here, said specific value is the range's end.
One option is the unordered_set. Put the characters of interest into the set. Then just check the count of the character in question:
#include <iostream>
#include <unordered_set>
using namespace std;
int main() {
unordered_set<char> characters;
characters.insert('A');
characters.insert('B');
characters.insert('C');
// ...
if (characters.count('A')) {
cout << "found" << endl;
} else {
cout << "not found" << endl;
}
return 0;
}
There is a solution to your problem, not in the language but in coding practices: refactoring.
I'm quite sure that readers will find this answer very unorthodox, but refactoring can be, and often is, used to hide a messy piece of code behind a function call. That function can be cleaned up later, or it can be left as it is.
You can create the following function (as a member function, declare it in a private: section):
bool characterIsValid(char ch) {
return (ch == 'A' || ch == 'B' || ch == 'C' || ..... );
}
and then call it in a short form:
if (characterIsValid(ch)) ...
You can reuse this function, with all its checks and its simple boolean result, anywhere.
For a simple and effective solution, you can use memchr():
#include <string.h>
const char list[] = "ABCXZ";
if (memchr(list, ch, sizeof(list) - 1)) {
// 'ch' is 'A', 'B', 'C', 'X', or 'Z'
}
Note that memchr() is better suited than strchr() for this task as strchr() would find the null character '\0' at the end of the string, which is incorrect for most cases.
If the list is dynamic or external and its length is not provided, the strchr() approach is better, but you should check if ch is different from 0 as strchr() would find it at the end of the string:
#include <string.h>
extern char list[];
if (ch && strchr(list, ch)) {
// 'ch' is one of the characters in the list
}
Another more efficient but less terse C99 specific solution uses an array:
#include <limits.h>
const char list[UCHAR_MAX + 1] = { ['A'] = 1, ['B'] = 1, ['C'] = 1, ['X'] = 1, ['Z'] = 1 };
if (list[(unsigned char)ch]) {
/* ch is one of the matching characters */
}
Note however that all of the above solutions assume ch to have char type. If ch has a different type, they would accept false positives. Here is how to fix this:
#include <string.h>
extern char list[];
if (ch == (char)ch && ch && strchr(list, ch)) {
// 'ch' is one of the characters in the list
}
Furthermore, beware of pitfalls if you are comparing unsigned char values:
unsigned char ch = 0xFF;
if (ch == '\xFF') {
/* test fails if `char` is signed by default */
}
if (memchr("\xFF", ch, 1)) {
/* test succeeds in all cases, is this OK? */
}
For this specific case you can use the fact that char is an integer and test for a range:
if(ch >= 'A' && ch <= 'C')
{
...
}
But in general this is not possible, unfortunately. If you want to compress your code, just use a boolean function:
if(compare_char(ch))
{
...
}
The X-Y answer on the vast majority of modern systems is don't bother.
You can take advantage of the fact that practically every character encoding used today stores the alphabet in one sequentially ordered, contiguous block: A is followed by B, B is followed by C, and so on up to Z. This allows you to do simple math tricks on letters to convert a letter to a number. For example, the letter C minus the letter A, 'C' - 'A', is 2, the distance between C and A.
Some character sets, notably EBCDIC, are not sequential or contiguous, for reasons that are out of scope here. They are rare, but occasionally you will find one. When you do, most of the other answers here provide suitable solutions.
We can use this to make a mapping of letter values to letters with a simple array:
// a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p, q,r,s,t,u,v,w,x,y, z
int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};
So 'c' - 'a' is 2 and lettervalues[2] will result in 3, the letter value of C.
No if statements or conditional logic required whatsoever. All the debugging you need to do is proofreading lettervalues to make sure you entered the correct values.
As you study more in C++, you will learn that lettervalues should be static (current translation unit-only access) and const (cannot be changed), possibly constexpr (cannot be changed and fixed at compile time). If you don't know what I'm talking about, don't worry. You'll cover all three later. If not, google them. All are very useful tools.
Using this array could be as simple as
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
score += lettervalues[ch - 'a'];
}
return score;
}
But this has two fatal blind spots:
The first is capital letters. Sorry Ayn Rand, but 'A' is not 'a', and 'A'-'a' is not zero. This can be solved by using std::tolower or std::toupper to convert all input to a known case.
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
score += lettervalues[std::tolower(static_cast<unsigned char>(ch)) - 'a'];
}
return score;
}
The other is input of characters that aren't letters. For example, '1'. '1' - 'a' will result in a negative array index, which is outside the array. This is bad. If you're lucky your program will crash, but anything could happen, including looking as though your program works. Read up on Undefined Behaviour for more.
Fortunately this also has a simple fix: Only compute the score for good input. You can test for valid alphabet characters with std::isalpha.
int ComputeWordScore(std::string in)
{
int score = 0;
for (char ch: in) // for all characters in string
{
if (std::isalpha(static_cast<unsigned char>(ch)))
{
score += lettervalues[std::tolower(static_cast<unsigned char>(ch)) - 'a'];
}
else
{
// do something that makes sense here.
}
}
return score;
}
My something else would be return -1;. -1 is an impossible word score, so anyone who calls ComputeWordScore can test for -1 and reject the user's input. What they do with it is not ComputeWordScore's problem. Generally the stupider you can make a function, the better, and errors should be handled by the closest piece of code that has all the information needed to make a decision. In this case, whatever read in the string would likely be tasked with deciding what to do with bad strings and ComputeWordScore can keep on computing word scores.
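Putting the pieces together, a complete version using the `return -1;` policy described above might look like this. It is a sketch: it takes the string by `const` reference, converts each character to `unsigned char` before calling `std::isalpha` and `std::tolower` (passing them a negative `char` is undefined), and, as an extra guard not in the answer above, rejects letters outside a-z that some non-"C" locales classify as alphabetic. It still assumes contiguous lowercase letters.

```cpp
#include <cctype>
#include <string>

// Scrabble-style letter values, indexed by (letter - 'a').
// Assumes 'a'..'z' are contiguous, as in ASCII (not EBCDIC).
constexpr int lettervalues[] = {1,3,3,2,1,4,2,4,1,8,5,1,3,1,1,3,10,1,1,1,1,4,4,8,4,10};

// Returns the word score, or -1 if the word contains anything other than a-z/A-Z.
int ComputeWordScore(const std::string& in)
{
    int score = 0;
    for (char ch : in) {
        const unsigned char uc = static_cast<unsigned char>(ch);  // <cctype> needs a non-negative value
        if (!std::isalpha(uc)) {
            return -1;  // impossible score: let the caller decide what to do
        }
        const int index = std::tolower(uc) - 'a';
        if (index < 0 || index >= 26) {
            return -1;  // a letter outside a-z (possible in non-"C" locales)
        }
        score += lettervalues[index];
    }
    return score;
}
```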
Most of the terse versions have been covered, so I will cover the optimized cases with some helper macros to make them a little more terse.
If your range fits within the number of bits in a long, you can combine all of your constants into a single bitmask: check that your value falls within the range, then check that the bitmask ANDed with the bit for your value is non-zero.
/* This macro assumes the bits will fit in a long integer type,
* if it needs to be larger (64 bits on x32 etc...),
* you can change the shifted 1ULs to 1ULL or if range is > 64 bits,
* split it into multiple ranges or use SIMD
* It also assumes that a0 is the lowest and a9 is the highest,
* You may want to add compile time assert that:
* a9 (the highest value) - a0 (the lowest value) < max_bits
* and that a1-a8 fall within a0 to a9
*/
#define RANGE_TO_BITMASK_10(a0,a1,a2,a3,a4,a5,a6,a7,a8,a9) \
(1 | (1UL<<((a1)-(a0))) | (1UL<<((a2)-(a0))) | (1UL<<((a3)-(a0))) | \
(1UL<<((a4)-(a0))) | (1UL<<((a5)-(a0))) | (1UL<<((a6)-(a0))) | \
(1UL<<((a7)-(a0))) | (1UL<<((a8)-(a0))) | (1UL<<((a9)-(a0))) )
/*static inline*/ bool checkx(int x){
const unsigned long bitmask = /* assume 64 bits */
RANGE_TO_BITMASK_10('A','B','C','F','G','H','c','f','y','z');
unsigned temp = (unsigned)x-'A';
return ( ( temp <= ('z'-'A') ) && !!( (1ULL<<temp) & bitmask ) );
}
Since all of a# values are constants, they will be combined into 1 bitmask at compile time. That leaves 1 subtraction and 1 compare for the range, 1 shift and 1 bitwise and ... unless the compiler can optimize further, it turns out clang can (it uses the bit test instruction BTQ):
checkx: # #checkx
addl $-65, %edi
cmpl $57, %edi
ja .LBB0_1
movabsq $216172936732606695, %rax # imm = 0x3000024000000E7
btq %rdi, %rax
setb %al
retq
.LBB0_1:
xorl %eax, %eax
retq
It may look like more code on the C side, but if you are looking to optimize, this looks like it may be worth it on the assembly side. I'm sure someone could get creative with the macro to make it more useful in a real programming situations than this "proof of concept".
That will get a little complex as a macro, so here is an alternative set of macros to setup a C99 lookup table.
#include <limits.h>
#define INIT_1(v,a) [ a ] = v
#define INIT_2(v,a,...) [ a ] = v, INIT_1(v, __VA_ARGS__)
#define INIT_3(v,a,...) [ a ] = v, INIT_2(v, __VA_ARGS__)
#define INIT_4(v,a,...) [ a ] = v, INIT_3(v, __VA_ARGS__)
#define INIT_5(v,a,...) [ a ] = v, INIT_4(v, __VA_ARGS__)
#define INIT_6(v,a,...) [ a ] = v, INIT_5(v, __VA_ARGS__)
#define INIT_7(v,a,...) [ a ] = v, INIT_6(v, __VA_ARGS__)
#define INIT_8(v,a,...) [ a ] = v, INIT_7(v, __VA_ARGS__)
#define INIT_9(v,a,...) [ a ] = v, INIT_8(v, __VA_ARGS__)
#define INIT_10(v,a,...) [ a ] = v, INIT_9(v, __VA_ARGS__)
#define ISANY10(x,...) ((const unsigned char[UCHAR_MAX+1]){ \
INIT_10(-1, __VA_ARGS__) \
})[x]
bool checkX(int x){
return ISANY10(x,'A','B','C','F','G','H','c','f','y','z');
}
This method will use a (typically) 256 byte table and a lookup that reduces to something like the following in gcc:
checkX:
movslq %edi, %rdi # x, x
cmpb $0, C.2.1300(%rdi) #, C.2
setne %al #, tmp93
ret
NOTE: Clang doesn't fare as well on the lookup table in this method because it sets up const tables that occur inside functions on the stack on each function call, so you would want to use INIT_10 to initialize a static const unsigned char [UCHAR_MAX+1] outside of the function to achieve similar optimization to gcc.
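In C++, the bitmask macro from this answer can be replaced by a `constexpr` function that builds the same mask at compile time from a list of characters. A sketch, assuming C++14, an ASCII character set, and that every character lies less than 64 positions above the base `'A'`; `make_mask`, `kMask` and `checkx_cpp` are hypothetical names. The `static_assert` checks that the mask equals the constant clang produced for `checkx` above.

```cpp
#include <cstdint>
#include <initializer_list>

// Builds a bitmask with bit (c - base) set for every character c in `chars`.
// Assumes every c satisfies base <= c < base + 64.
constexpr std::uint64_t make_mask(char base, std::initializer_list<char> chars)
{
    std::uint64_t mask = 0;
    for (char c : chars) {
        mask |= std::uint64_t{1} << (c - base);
    }
    return mask;
}

constexpr std::uint64_t kMask = make_mask('A', {'A','B','C','F','G','H','c','f','y','z'});
static_assert(kMask == 0x3000024000000E7ULL, "same constant as the clang output above");

bool checkx_cpp(int x)
{
    // Unsigned wrap-around turns x < 'A' into a huge value, so one compare checks both ends.
    const unsigned temp = static_cast<unsigned>(x) - 'A';
    return temp <= static_cast<unsigned>('z' - 'A') && ((kMask >> temp) & 1u) != 0;
}
```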

Case insensitive sorting of an array of strings

Basically, I have to use selection sort to sort a string[]. I have done this part but this is what I am having difficulty with.
The sort, however, should be case-insensitive, so that "antenna" would come before "Jupiter". ASCII sorts from uppercase to lowercase, so would there not be a way to just swap the order of the sorted string? Or is there a simpler solution?
void stringSort(string array[], int size) {
int startScan, minIndex;
string minValue;
for(startScan = 0 ; startScan < (size - 1); startScan++) {
minIndex = startScan;
minValue = array[startScan];
for (int index = startScan + 1; index < size; index++) {
if (array[index] < minValue) {
minValue = array[index];
minIndex = index;
}
}
array[minIndex] = array[startScan];
array[startScan] = minValue;
}
}
C++ provides you with sort which takes a comparison function. In your case with a vector<string> you'll be comparing two strings. The comparison function should return true if the first argument is smaller.
For our comparison function we'll want to find the first mismatched character between the strings after tolower has been applied. To do this we can use mismatch which takes a comparator between two characters returning true as long as they are equal:
const auto result = mismatch(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend(), [](const unsigned char lhs, const unsigned char rhs){return tolower(lhs) == tolower(rhs);});
To decide if the lhs is smaller than the rhs fed to mismatch we need to test 3 things:
Did rhs still have characters left at the mismatch point (if not, lhs cannot be smaller)
Was string lhs exhausted first, i.e. is it a prefix of rhs
Or was the first mismatched char from lhs smaller than the first mismatched char from rhs
This evaluation can be performed by:
result.second != rhs.cend() && (result.first == lhs.cend() || tolower(static_cast<unsigned char>(*result.first)) < tolower(static_cast<unsigned char>(*result.second)));
Ultimately, we'll want to wrap this up in a lambda and plug it back into sort as our comparator:
sort(foo.begin(), foo.end(), [](const string& lhs, const string& rhs){
const auto result = mismatch(lhs.cbegin(), lhs.cend(), rhs.cbegin(), rhs.cend(), [](const unsigned char lhs, const unsigned char rhs){return tolower(lhs) == tolower(rhs);});
return result.second != rhs.cend() && (result.first == lhs.cend() || tolower(static_cast<unsigned char>(*result.first)) < tolower(static_cast<unsigned char>(*result.second)));
});
This will correctly sort vector<string> foo. You can see a live example here: http://ideone.com/BVgyD2
EDIT:
Just saw your question update. You can use sort with string array[] as well. You'll just need to call it like this: sort(array, std::next(array, size), ...
#include <algorithm>
#include <cctype>
#include <vector>
#include <string>
using namespace std;
void CaseInsensitiveSort(vector<string>& strs)
{
sort(
begin(strs),
end(strs),
[](const string& str1, const string& str2){
return lexicographical_compare(
begin(str1), end(str1),
begin(str2), end(str2),
[](unsigned char char1, unsigned char char2) {
return tolower(char1) < tolower(char2);
}
);
}
);
}
I use this lambda function to sort a vector of strings:
std::sort(entries.begin(), entries.end(), [](const std::string& a, const std::string& b) -> bool {
for (size_t c = 0; c < a.size() and c < b.size(); c++) {
const int ca = std::tolower(static_cast<unsigned char>(a[c]));
const int cb = std::tolower(static_cast<unsigned char>(b[c]));
if (ca != cb)
return ca < cb;
}
return a.size() < b.size();
});
Instead of the < operator, use a case-insensitive string comparison function.
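Applied to the selection sort from the question, that could look like the sketch below. The outer loop header isn't visible in the excerpt, so its shape here is an assumption, and less_ignore_case is a hypothetical helper that only folds ASCII case.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if a sorts before b, ignoring ASCII case.
bool less_ignore_case(const std::string& a, const std::string& b)
{
    const std::string::size_type n = a.size() < b.size() ? a.size() : b.size();
    for (std::string::size_type i = 0; i < n; ++i) {
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb)
            return ca < cb;
    }
    return a.size() < b.size();  // a shorter prefix sorts first
}

// Selection sort in the shape of the question's code, with `<` replaced.
void selectionSort(std::string array[], int size)
{
    for (int startScan = 0; startScan < size - 1; startScan++) {  // assumed outer loop
        int minIndex = startScan;
        std::string minValue = array[startScan];
        for (int index = startScan + 1; index < size; index++) {
            if (less_ignore_case(array[index], minValue)) {  // was: array[index] < minValue
                minValue = array[index];
                minIndex = index;
            }
        }
        array[minIndex] = array[startScan];
        array[startScan] = minValue;
    }
}
```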
C89/C99 provide strcoll (string collate), which does a locale-aware string comparison. It's available in C++ as std::strcoll. In some (most?) locales, like en_CA.UTF-8, A and a (and all accented forms of either) are in the same equivalence class. I think strcoll only compares within an equivalence class as a tiebreak if the whole string is otherwise equal, which gives a very similar sort order to a case-insensitive compare. Collation (at least in English locales on GNU/Linux) ignores some characters (like [). So ls /usr/share | sort gives output like
acpi-support
adduser
ADM_scripts
aglfn
aisleriot
I pipe through sort because ls does its own sorting, which isn't quite the same as sort's locale-based sorting.
If you want to sort some user-input arbitrary strings into an order that the user will see directly, locale-aware string comparison is usually what you want. Strings that differ only in case or accents won't compare equal, so this won't work if you were using a stable sort and depending on case-differing strings to compare equal, but otherwise you get nice results. Depending on the use case, the results can be nicer than plain case-insensitive comparison.
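A minimal sketch of using std::strcoll as a std::sort comparator. The function name is hypothetical; the ordering depends entirely on the C locale in effect, so a program would normally call std::setlocale(LC_ALL, "") first to pick up the environment's settings. In the default "C" locale, strcoll behaves like strcmp.

```cpp
#include <algorithm>
#include <clocale>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical function: orders strings by the current locale's collation.
// Call std::setlocale(LC_ALL, "") beforehand to use LANG / LC_* settings.
void sort_for_display(std::vector<std::string>& v)
{
    std::sort(v.begin(), v.end(), [](const std::string& a, const std::string& b) {
        return std::strcoll(a.c_str(), b.c_str()) < 0;
    });
}
```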
FreeBSD's strcoll was and maybe still is case sensitive for locales other than POSIX (ASCII). That forum post suggests that on most other systems it is not case sensitive.
MSVC provides a _stricoll for case-insensitive collation, implying that its normal strcoll is case sensitive. However, this might just mean that the fallback to comparing within an equivalence class doesn't happen. Maybe someone can test the following example with MSVC.
// strcoll.c: show that these strings sort in a different order, depending on locale
#include <stdio.h>
#include <locale.h>
#include <string.h> // strcoll
int main()
{
// TODO: try some strings containing characters like '[' that strcoll ignores completely.
const char * s[] = { "FooBar - abc", "Foobar - bcd", "FooBar - cde" };
#ifdef USE_LOCALE
setlocale(LC_ALL, ""); // empty string means look at env vars
#endif
strcoll(s[0], s[1]);
strcoll(s[0], s[2]);
strcoll(s[1], s[2]);
return 0;
}
output of gcc -DUSE_LOCALE -Og strcoll.c && ltrace ./a.out (or run LANG=C ltrace ./a.out):
__libc_start_main(0x400586, 1, ...
setlocale(LC_ALL, "") = "en_CA.UTF-8" # my env contains LANG=en_CA.UTF-8
strcoll("FooBar - abc", "Foobar - bcd") = -1
strcoll("FooBar - abc", "FooBar - cde") = -2
strcoll("Foobar - bcd", "FooBar - cde") = -1
# the three strings are in order
+++ exited (status 0) +++
with gcc -Og -UUSE_LOCALE strcoll.c && ltrace ./a.out:
__libc_start_main(0x400536, ...
# no setlocale, so current locale is C
strcoll("FooBar - abc", "Foobar - bcd") = -32
strcoll("FooBar - abc", "FooBar - cde") = -2
strcoll("Foobar - bcd", "FooBar - cde") = 32 # s[1] should sort after s[2], so it's out of order
+++ exited (status 0) +++
POSIX.1-2001 provides strcasecmp. The POSIX spec says the results are "unspecified" for locales other than plain-ASCII, though, so I'm not sure whether common implementations handle UTF-8 correctly or not.
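On a POSIX system, strcasecmp can serve as the comparator directly. This is a sketch under that assumption: <strings.h> is not standard C++ and is not available with MSVC, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <strings.h>  // POSIX strcasecmp; not part of standard C++
#include <vector>

// Hypothetical function: ASCII case-insensitive sort on a POSIX system.
void sort_posix_ignore_case(std::vector<std::string>& v)
{
    std::sort(v.begin(), v.end(), [](const std::string& a, const std::string& b) {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    });
}
```

Strings that differ only in case compare equal here, so std::sort may put them in either order; std::stable_sort keeps their original relative order.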
See this post for portability issues with strcasecmp, e.g. to Windows. See other answers on that question for other C++ ways of doing case-insensitive string compares.
Once you have a case-insensitive comparison function, you can use it with other sort algorithms, like the C standard library's qsort or C++'s std::sort, instead of writing your own O(n^2) selection sort.
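For qsort the comparison function has a different shape: it receives const void* pointers to two array elements and returns a negative, zero or positive int. A sketch with hypothetical names, using a hand-written ASCII case-insensitive comparison so it doesn't depend on strcasecmp. Because qsort moves elements bytewise, it is applied to an array of const char*, not std::string.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical three-way comparator for qsort over an array of const char*.
int compare_ignore_case(const void* pa, const void* pb)
{
    const char* a = *static_cast<const char* const*>(pa);
    const char* b = *static_cast<const char* const*>(pb);
    for (;; ++a, ++b) {
        const int ca = std::tolower(static_cast<unsigned char>(*a));
        const int cb = std::tolower(static_cast<unsigned char>(*b));
        if (ca != cb || ca == '\0')  // first difference, or both strings ended
            return ca - cb;
    }
}

// Hypothetical wrapper: sorts `count` C strings in place.
void sort_c_strings(const char* words[], std::size_t count)
{
    std::qsort(words, count, sizeof words[0], compare_ignore_case);
}
```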
As b.buchhold's answer points out, doing a case-insensitive comparison on the fly might be slower than converting everything to lowercase once and sorting an array of indices, because the lowercase version of each string is needed many times. std::strxfrm will transform a string so that strcmp on the result will give the same result as strcoll on the original string.
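A sketch of the "transform once" idea with std::strxfrm: each string's key is computed a single time, and sorting then compares keys byte-wise (std::string's operator< compares characters as unsigned char, like strcmp). The function names are hypothetical. Note this reproduces strcoll's locale-aware order, which in the default "C" locale is plain byte order, not case-insensitive.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the strxfrm key of s under the current locale.
std::string collation_key(const std::string& s)
{
    // With a count of 0 (and a null destination), strxfrm only reports the length needed.
    const std::size_t len = std::strxfrm(nullptr, s.c_str(), 0);
    std::vector<char> buf(len + 1);
    std::strxfrm(buf.data(), s.c_str(), buf.size());
    return std::string(buf.data(), len);
}

// Hypothetical function: compute every key once, then sort by key.
void sort_by_collation_key(std::vector<std::string>& v)
{
    std::vector<std::pair<std::string, std::string>> keyed;  // (key, original)
    keyed.reserve(v.size());
    for (const auto& s : v)
        keyed.emplace_back(collation_key(s), s);
    std::sort(keyed.begin(), keyed.end());  // compares keys first
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::move(keyed[i].second);
}
```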
You could call tolower on every character you compare. This is probably the easiest solution, yet not a great one, because:
You look at every char multiple times, so you'd call the function more often than necessary
You need extra care to handle wide characters with respect to their encoding (UTF-8 etc.)
You could also replace the comparator by your own function. I.e. there will be some place where you compare something like stringone[i] < stringtwo[j] or charA < charB. Change it to my_less_than(stringone[i], stringtwo[j]) and implement exactly the ordering you want.
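One possible my_less_than, as a sketch: letters are ordered alphabetically ignoring ASCII case, and characters that differ only in case are tie-broken so the uppercase form comes first ('A' < 'a' < 'B' < 'b'). That tie-break is an arbitrary choice for illustration; change it to whatever ordering you want. A string overload built on the character version is included for convenience.

```cpp
#include <cctype>
#include <string>

// Character comparator: case-insensitive, with uppercase first on ties.
bool my_less_than(char x, char y)
{
    const int lx = std::tolower(static_cast<unsigned char>(x));
    const int ly = std::tolower(static_cast<unsigned char>(y));
    if (lx != ly)
        return lx < ly;
    return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
}

// String comparator built from the character comparator.
bool my_less_than(const std::string& a, const std::string& b)
{
    for (std::string::size_type i = 0; i < a.size() && i < b.size(); ++i) {
        if (my_less_than(a[i], b[i])) return true;
        if (my_less_than(b[i], a[i])) return false;
    }
    return a.size() < b.size();
}
```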
Another way would be to transform every string to lowercase once and create an array of pairs. Then you base your comparisons on the lowercase value only, but you swap whole pairs so that your final strings will be in the right order as well.
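A sketch of the pairs approach using std::vector and std::stable_sort (the function name is hypothetical, and lowercasing is ASCII-only via std::tolower). Comparing only .first keeps strings that differ only in case in their original relative order.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical function: lowercase once, sort (lowercase, original) pairs
// by the lowercase value only, then copy the originals back.
void sort_via_lowercase_pairs(std::vector<std::string>& v)
{
    std::vector<std::pair<std::string, std::string>> pairs;  // (lowercase, original)
    pairs.reserve(v.size());
    for (const auto& s : v) {
        std::string lower = s;
        for (auto& c : lower)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        pairs.emplace_back(std::move(lower), s);
    }
    std::stable_sort(pairs.begin(), pairs.end(),
        [](const std::pair<std::string, std::string>& a,
           const std::pair<std::string, std::string>& b) { return a.first < b.first; });
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = std::move(pairs[i].second);
}
```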
Finally, you can create an array with lowercase versions and sort this one. Whenever you swap two elements in this one, you also swap them in the original array.
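A sketch of the shadow-array idea applied to a selection sort (function name hypothetical, ASCII-only lowercasing): the lowercase copy decides the order, and every swap is mirrored in the original array so lower[i] always describes array[i].

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical function: selection sort driven by a lowercase shadow array.
void sort_with_shadow_array(std::string array[], int size)
{
    std::vector<std::string> lower(array, array + size);
    for (auto& s : lower)
        for (auto& c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    for (int startScan = 0; startScan < size - 1; ++startScan) {
        int minIndex = startScan;
        for (int index = startScan + 1; index < size; ++index)
            if (lower[index] < lower[minIndex])
                minIndex = index;
        // Swap in both arrays to keep them aligned.
        std::swap(lower[startScan], lower[minIndex]);
        std::swap(array[startScan], array[minIndex]);
    }
}
```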
Note that all those proposals would still need proper handling of wide characters (if you need that at all).
This solution is much simpler to understand than Jonathan Mee's, though rather inefficient, but for educational purposes it could be fine:
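#include <algorithm>
#include <cctype>
#include <string>
std::string lowercase( std::string s )
{
// cast through unsigned char: tolower's argument must not be negative
std::transform( s.begin(), s.end(), s.begin(), []( unsigned char c ) { return static_cast<char>( std::tolower( c ) ); } );
return s;
}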
std::sort( array, array + length,
[]( const std::string &s1, const std::string &s2 ) {
return lowercase( s1 ) < lowercase( s2 );
} );
If you have to use your own sort function, you can use the same approach:
....
minValue = lowercase( array[startScan] );
for (int index = startScan + 1; index < size; index++) {
const std::string &tstr = lowercase( array[index] );
if (tstr < minValue) {
minValue = tstr;
minIndex = index;
}
}
...

Function isalnum(): unexpected results

For an assignment, I am using std::isalnum to determine if the input is a letter or a number. The point of the assignment is to create a "dictionary." It works well on small paragraphs, but performs horribly on pages of text. Here is the code snippet I am using.
custom::String string;
std::cin >> string;
custom::String original = string;
size_t size = string.Size();
char j;
size_t i = 0;
size_t beg = 0;
while( i < size)
{
j = string[i];
if(!!std::isalnum(static_cast<unsigned char>(j)))
{
--size;
}
if( std::isalnum( j ) )
{
string[i-beg] = tolower(j);
}
++i;
}//end while
string.SetSize(size - beg, '\0');
The code as presented at the time of writing does not make sense as a whole.
However, the calls to isalnum, as shown, would only work for plain ASCII, because
the C character classification functions require a non-negative argument (or else EOF), and
in order to work for international characters,
the encoding must be single-byte per character, and
setlocale must have been called before using the functions.
Regarding the first of these three points, you can wrap std::isalnum like this:
using Byte = unsigned char;
auto is_alphanumeric( char const ch )
-> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }
where the !! is just to silence a silly warning from Visual C++ (warning about "performance", of all things).
Disclaimer: code untouched by compiler's hands.
Addendum: if you don't have a C++11 compiler, but only C++03,
typedef unsigned char Byte;
bool is_alphanumeric( char const ch )
{
return !!std::isalnum( static_cast<Byte>( ch ) );
}
As Bjarne remarked, C++11 feels like a whole new language! ;-)
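As a sketch of how the wrapper could be used for the question's cleanup, the function below keeps only alphanumeric characters, lowercased, in a single pass. It uses std::string as a stand-in for the question's custom::String, whose interface isn't shown, and clean_word is a hypothetical name. The read index i visits every character exactly once while the write index w only advances when a character is kept, so no character is skipped and one pass should be enough.

```cpp
#include <cctype>
#include <string>

using Byte = unsigned char;

// Wrapper from the answer above: cast to unsigned char before calling isalnum.
auto is_alphanumeric( char const ch )
    -> bool
{ return !!std::isalnum( static_cast<Byte>( ch ) ); }

// Hypothetical single-pass cleanup on std::string.
void clean_word( std::string& s )
{
    std::string::size_type w = 0;  // next write position
    for( std::string::size_type i = 0; i < s.size(); ++i )
    {
        char const ch = s[i];
        if( is_alphanumeric( ch ) )
        {
            s[w++] = static_cast<char>( std::tolower( static_cast<Byte>( ch ) ) );
        }
    }
    s.resize( w );  // drop the leftover tail
}
```

In the default "C" locale, a byte such as 0xE9 (é in Latin-1) is not classified as alphanumeric; whether it is after a setlocale call depends on that locale being single-byte.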
I was able to create a solution to the problem. I noticed that isalnum did take care of some non-alphanumerics, but not all the time. Since the code above is part of a function, I called it multiple times, with refined results each time. I then came up with a do-while loop that stores the string's size, calls the function, stores the new size, and compares the two. If they are not the same, the string may need another pass. If they are the same, the string has been fully cleaned. I am guessing that the reason isalnum was not working well was that I was reading several chapters of a book into the string. Here is my code:
custom::String abc;
std::cin >> abc;
size_t first = 0;
size_t second = 0;
//clean the word
do{
first = abc.Size();
Cleanup(abc);
second = abc.Size();
}while(first != second);