How to loop through all ASCII characters? - c++

I am attempting to write a program which loops through all ASCII characters, selects those which are letters, appends them to a string and then outputs this string. I have written the following code:
#include <iostream>
#include <string>
#include "ctype.h"
using namespace std;
int main()
{
    string str;
    for (char k = 0; k <= 255; k++)
        if (isalpha(k) != 0) {
            str += k;
        }
    cout << str << '\n';
}
However, when I run this code I get a 'bad_alloc' error, which, to my understanding, means I ran out of memory. So I must have done something terribly wrong in my code. I am very new to C++ in general; could someone point out my (possibly many) mistakes?

You should enable all warnings when you compile your code (-Wall -Wextra). Doing so would produce the following messages from the compiler:
clang:
result of comparison of constant 255 with expression of type 'char' is always true [-Wtautological-constant-out-of-range-compare]
gcc:
warning: comparison is always true due to limited range of data type [-Wtype-limits]
Depending on the compiler and the target platform, the signedness of char can vary. The defaults for ARM and PowerPC are typically unsigned; the defaults for x86 and x64 are typically signed.
The range of char, if signed, is -128 to 127. To be platform-independent, you need to use unsigned char, but even then k <= 255 would produce the same bad_alloc error, because the maximum value of an unsigned char is 255 and the condition is therefore always true. So you have to use k < 255 (which skips 255, but that value is not a letter in the default "C" locale anyway):
for (unsigned char k = 0; k < 255; k++)

On many platforms, char is signed. That means it goes to 127 then "overflows".
Strictly speaking, converting the incremented value back into a signed char is implementation-defined rather than undefined (and since C++20 it is defined to wrap), but in practice it wraps around to a negative value. The result is that k will always be less than 255, so your loop never ends, and you keep trying to add characters to the string forever, or until you run out of memory (boom).
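Indeed, even if you use an unsigned char, you're still never going to get above 255.
Use an int instead.
Your call to isalpha also has undefined behaviour, for similar reasons: isalpha requires an argument that is representable as an unsigned char (or equal to EOF), so passing a negative char is undefined (refer to the documentation for the functions you use).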

Change for (char k = 0; k <= 255; k++) to for (int k = 0; k <= 255; k++) and it will be fine.
Reason:
Plain char on your system is a signed character type, hence its range is -2^(n-1) to 2^(n-1) - 1, where n is 8 (you can check n by printing the macro CHAR_BIT).
So when k reaches 127 and the loop goes to the next value, it becomes -128 as the value wraps around, and the loop becomes infinite.
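You can verify this with:
#include <cstdio>
int main() {
    char c = 127;
    char d = c + 1;  // 128 does not fit in a signed char; on common compilers it wraps
    std::printf("%d\n", d);
}
OUTPUT (where char is signed): -128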

Related

A 'for' loop that appears to be practically infinite

I'm debugging some code at the moment, and I've come across this line:
for (std::size_t j = M; j <= M; --j)
(Written by my boss, who's on holiday.)
It looks really odd to me.
What does it do? To me, it looks like an infinite loop.
std::size_t is guaranteed by the C++ standard to be an unsigned type. And if you decrement an unsigned value from 0, the standard guarantees that the result is the largest value for that type.
That wrapped-around value is always greater than M¹, so the loop terminates.
So j <= M when applied to an unsigned type is a convenient way of saying "run the loop to zero then stop".
Alternatives exist, such as starting j one higher than you want, or even using the "slide operator", for (std::size_t j = M + 1; j --> 0; ) { ... }, which are arguably clearer although they require more typing. One disadvantage of your boss's form (other than the bewildering effect it produces on first inspection) is that it doesn't port well to languages with no unsigned types, such as Java.
Note also that the scheme that your boss has picked "borrows" a possible value from the unsigned set: it so happens in this case that M set to std::numeric_limits<std::size_t>::max() will not have the correct behaviour. In fact, in that case, the loop is infinite. (Is that what you're observing?) You ought to insert a comment to that effect in the code, and possibly even assert on that particular condition.
¹ Subject to M not being std::numeric_limits<std::size_t>::max().
What your boss was probably trying to do was to count down from M to zero inclusive, performing some action on each number.
Unfortunately, there's an edge case where that will indeed give you an infinite loop: the one where M is the maximum size_t value you can have. And, although it's well defined what an unsigned value will do when you decrement it from zero, I maintain that the code itself is an example of sloppy thinking, especially as there's a perfectly viable solution without the shortcomings of your boss's attempt.
That safer variant (and more readable, in my opinion, while still maintaining a tight scope limit) would be:
{
    std::size_t j = M;
    do {
        doSomethingWith(j);
    } while (j-- != 0);
}
By way of example, see the following code:
#include <iostream>
#include <cstdint>
#include <climits>
int main (void) {
    uint32_t quant = 0;
    unsigned short us = USHRT_MAX;
    std::cout << "Starting at " << us;
    do {
        quant++;
    } while (us-- != 0);
    std::cout << ", we would loop " << quant << " times.\n";
    return 0;
}
This does basically the same thing with an unsigned short, and you can see that it processes every single value:
Starting at 65535, we would loop 65536 times.
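Replacing the do..while loop in the above code with what your boss basically did will result in an infinite loop. Try it and see (the loop variable has to be an unsigned short too; with a wider type such as unsigned int, the wrapped value would exceed us and the loop would stop):
for (unsigned short us2 = us; us2 <= us; --us2) {
    quant++;
}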

C function call produces different result than C++ function call

I have this C code:
#include <stdio.h>
#include <windows.h>
int main() {
    int i;
    while (1) {
        for (i = 8; i <= 190; i++) {
            if (GetAsyncKeyState(i) == -32767) // That number doesn't actually work; only -32768 works (which checks whether the key is held down)
                printf("%c\n", i); // When using -32768, it prints the key about 20 times if it is held down
        }
    }
    return 0;
}
So -32767 doesn't work in that C code, but I have this C++ code:
#include <iostream>
using namespace std;
#include <windows.h>
int main() {
    char i;
    while (1) {
        for (i = 8; i <= 190; i++) {
            if (GetAsyncKeyState(i) == -32767) // works as intended: prints the pressed key's letter (doesn't have to be held down)
                cout << i << endl;
        }
    }
    return 0;
}
which works with -32767. This is very confusing, as both of these are being run on the same computer with the same command: clang++ test.c
Output of C code with -32768 (pressing A):
A
A
A
A
A
A
A
A
A
A
A
Output of C++ code with -32767:
A
B
C
D
E
F
G
H
Output of C code with -32767:
(Nothing)
According to the documentation of GetAsyncKeyState:
If the function succeeds, the return value specifies whether the key was pressed since the last call to GetAsyncKeyState, and whether the key is currently up or down. If the most significant bit is set, the key is down, and if the least significant bit is set, the key was pressed after the previous call to GetAsyncKeyState. However, you should not rely on this last behavior; for more information, see the Remarks.
This says that it is not reliable to treat -32768 differently from -32767. It also says nothing about all of the bits in between, but your code assumes they are all 0 bits without justification.
To be reliable, your code should only do the following tests on the return value:
>= 0 - key currently up, or info unavailable
< 0 - key currently down
Your C++ code relies on implementation-defined behavior. In the C version, i is an int, so the loop runs from 8 to 190 and stops. In the C++ version, i is a char, which your compiler makes signed, so i <= 190 is always true: after 127, i wraps around to -128, and the nested loop never reaches 190.
You can fix this by making the type of i an unsigned char in both implementations. You could also make i an int, and add a cast to char in the call to the GetAsyncKeyState function.
See this topic:
http://www.cplusplus.com/forum/general/141404/
-32767 is 1000 0000 0000 0001 in binary. When you compare the value returned from GetAsyncKeyState(i) with it, you're asking whether only the leftmost and rightmost bits are set, but as the link above says, bits 1-14 may not always be zero.
A more appropriate expression would be:
if (GetAsyncKeyState(i) & -32767)
or maybe use a hex literal instead:
if (GetAsyncKeyState(i) & 0x8001)
Actually, the reason the C implementation didn't work was that I was manually executing ./a in the mingw32 terminal, and for some reason it didn't print anything until after a lot of keys had been pressed. So I just executed a.exe by double-clicking it, and it worked. This has been a very weird experience :/

Mistake in C++ function that returns the most common character in a string. Multibyte characters?

Pursuing a job, I was asked to solve a problem on HackerRank.com: write a function that accepts a string, counts the characters in it and returns the most common character found. I wrote my solution, got the typos fixed, and it works with my test cases and theirs, except that it fails "Test 7". Because it's an interview task, HackerRank doesn't tell me the failure details, just that it failed.
I spent far too much time trying to figure out why. I've triple-checked for off-by-one errors, and I wrote the code for 8-bit chars but tried accepting 16-bit values without changing the result. Here's my code. I cannot give the error, just that there is one.
Could it be multi-byte characters?
How can I create a testcase with a 2 byte or 3 byte character?
I put in some display dump code, and what comes out is exactly what you'd expect. I have the Xcode IDE on my Mac desktop; any suggestions are welcome!
/*
 * Complete the function below.
 */
char func(string theString) {
    // I wonder what I'm doing wrong. 256 doesn't work any better here.
    const int CHARS_RECOGED = 65536; // ie 0...65535 - even this isn't good enough to fix test case 7.
    unsigned long alphaHisto[CHARS_RECOGED];
    for (int count = 0; count < CHARS_RECOGED; count++ ) {
        alphaHisto[ count ] = 0;
    } // for int count...
    cout << "size: " << theString.size() << endl;
    for (int count = 0; count < theString.size(); count++) {
        // unsigned char uChar = theString.at(count); // .at() better protected than [] - and this works no differently...
        unsigned int uChar = std::char_traits<char>::to_int_type(theString.at(count)); // .at() better protected than []
        alphaHisto[ uChar ]++;
    } // for count...
    unsigned char mostCommon = -1;
    unsigned long totalMostCommon = 0;
    for (int count = 0; count < CHARS_RECOGED; count++ ) {
        if (alphaHisto[ count ] > totalMostCommon){
            mostCommon = count;
            totalMostCommon = alphaHisto[ count ];
        } // if alphahisto
    } // for int count...
    for (int count = 0; count < CHARS_RECOGED; count++ ) {
        if (alphaHisto[ count ] > 0){
            cout << (char)count << " " << count << " " << alphaHisto[ count ] << endl;
        } // if alphaHisto...
    } // for int count...
    return (char) mostCommon;
}
// Please provide additional test cases:
// Input Return
// thequickbrownfoxjumpsoverthelazydog e
// the quick brown fox jumps over the lazy dog " "
// theQuickbrownFoxjumpsoverthelazydog e
// the Quick BroWn Fox JuMpS OVER THe lazy dog " "
// the_Quick_BroWn_Fox.JuMpS.OVER..THe.LAZY.DOG "."
If the test is anything to take seriously, the charset should be specified. Without that, it's probably safe to assume that one byte is one char. As a side note, supporting charsets with multibyte chars takes far more than exchanging 256 with 65536, but even without multibyte chars, you could exchange 256 with 1 << CHAR_BIT, because a "byte" may have more than 8 bits.
I'd also look closely at this line:
unsigned int uChar = std::char_traits<char>::to_int_type(theString.at(count));
It's more complex than necessary, but note that for char, to_int_type converts through unsigned char, so it always yields a value in 0..255. The tempting simplification
unsigned int uChar = theString.at(count);
does not. Remember that std::string::at returns a char, and whether plain char is signed or unsigned depends on the compiler. Char values between 0 and 127 are stored unchanged in the target variable, but that's only half of the value range: if char is unsigned, 128-255 work fine too, but signed chars between -128 and -1 don't map to 128-255 when the target variable is a wider unsigned type. With a 4-byte unsigned int, you get huge values that aren't valid indices for your array => problem. Solution: use unsigned char, not unsigned int:
unsigned char uChar = theString.at(count);
Another thing:
for (int count = 0; count < theString.size(); count++)
theString.size() returns a size_t, which may differ in size and/or signedness from int; with huge string lengths, that could cause problems. Accordingly, the character counters could be size_t too, instead of unsigned long...
And the least likely source of problems: if this runs on machines without two's complement, it'll probably fail spectacularly (although I didn't think it through in detail).

Why is memset() incorrectly initializing int?

Why is the output of the following program 84215045?
#include <stdio.h>
#include <string.h>
int grid[110];
int main()
{
    memset(grid, 5, 100 * sizeof(int));
    printf("%d", grid[0]);
    return 0;
}
memset sets each byte of the destination buffer to the specified value. On your system, an int is four bytes, each of which is 5 after the call to memset. Thus, grid[0] has the value 0x05050505 (hexadecimal), which is 84215045 in decimal.
Some platforms provide alternative APIs to memset that write wider patterns to the destination buffer; for example, on OS X or iOS, you could use:
int pattern = 5;
memset_pattern4(grid, &pattern, sizeof grid);
to get the behavior that you seem to expect. What platform are you targeting?
In C++, you should just use std::fill_n:
std::fill_n(grid, 100, 5);
memset(grid, 5, 100 * sizeof(int));
You are setting 400 bytes (assuming a 4-byte int), starting at (char*)grid and ending at (char*)grid + (100 * sizeof(int)), to the value 5 (the casts are necessary here because memset deals in bytes, whereas pointer arithmetic deals in objects).
84215045 in hex is 0x05050505; since int (on your platform/compiler/etc.) is represented by four bytes, when you print it, you get "four fives."
memset is about setting bytes, not values. One of the many ways to set array values in C++ is std::fill_n:
std::fill_n(grid, 100, 5);
Don't use memset.
You set each byte of the memory to the value 5. Each int is 4 bytes long, [5][5][5][5], which is interpreted as 5*256*256*256 + 5*256*256 + 5*256 + 5 = 84215045. Instead, use a for loop, which also doesn't require sizeof(). In general, sizeof() means you're doing something the hard way.
for (int i = 0; i < 110; ++i)
    grid[i] = 5;
Well, memset writes bytes with the selected value. Therefore an int will look something like this:
00000101 00000101 00000101 00000101
Which is then interpreted as 84215045.
You haven't actually said what you want your program to do.
Assuming that you want to set each of the first 100 elements of grid to 5 (and ignoring the 100 vs. 110 discrepancy), just do this:
for (int i = 0; i < 100; i ++) {
grid[i] = 5;
}
I understand that you're concerned about speed, but your concern is probably misplaced. On the one hand, memset() is likely to be optimized and therefore faster than a simple loop. On the other hand, the optimization is likely to consist of writing more than one byte at a time, which is what this loop does. On the other other hand, memset() is a loop anyway; writing the loop explicitly rather than burying it in a function call doesn't change that. On the other other other hand, even if the loop is slow, it's not likely to matter; concentrate on writing clear code, and think about optimizing it if actual measurements indicate that there's a significant performance issue.
You've spent many orders of magnitude more time writing the question than your computer will spend setting grid.
Finally, before I run out of hands (too late!), it doesn't matter how fast memset() is if it doesn't do what you want. (Not setting grid at all is even faster!)
If you type man memset in your shell, it tells you that
void *memset(void *b, int c, size_t len)
In plain English: it fills the first len bytes of the memory pointed to by b with the value c (converted to unsigned char).
For your case,
memset(grid, 5, 100 * sizeof(int));
Since sizeof(int) == 4 on your system, the above call is equivalent to:
for (int i = 0; i < 100; i++)
    grid[i] = 0x05050505;
OR
char *grid2 = (char*)grid;
for (size_t i = 0; i < 100 * sizeof(int); i++)
    grid2[i] = 0x05;
Either way, printing grid[0] gives 84215045.
But in most C code, we want to initialize a block of memory to zero:
char type --> '\0' (NUL)
int type --> 0
float type --> 0.0f
double type --> 0.0
pointer type --> NULL (nullptr in C++)
With modern compilers such as gcc or clang on common platforms, memset with 0 gives you all of these:
#include <stdio.h>
#include <string.h>

int main(void)
{
    // variable-length arrays (VLAs) were introduced in C99
    int len = 20;
    char carr[len];
    int iarr[len];
    float farr[len];
    double darr[len];
    memset(carr, 0, sizeof(char)*len);
    memset(iarr, 0, sizeof(int)*len);
    memset(farr, 0, sizeof(float)*len);
    memset(darr, 0, sizeof(double)*len);
    for (int i = 0; i < len; i++)
    {
        printf("%2d: %d\n", i, carr[i]);   // print the char's numeric value rather than a raw NUL byte
        printf("%2d: %i\n", i, iarr[i]);
        printf("%2d: %f\n", i, farr[i]);
        printf("%2d: %lf\n", i, darr[i]);
    }
    return 0;
}
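But be aware that the C standard only guarantees that all-bits-zero means 0 for integer types (including char). For float, double and pointer types it is platform-specific, although it holds for IEEE 754 floating point and for null pointers on mainstream platforms.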
Since memset writes bytes, I usually use it to set an int array to zero, like:
int a[100];
memset(a, 0, sizeof(a));
or you can use it to set a char array, since a char is exactly one byte:
char a[100];
memset(a, '*', sizeof(a));
What's more, an int array can also be set to -1 with memset:
memset(a, -1, sizeof(a)); // with a being the int array from the first example
This is because -1 is 0xffffffff as a 32-bit int and 0xff as a char (a byte), assuming two's complement.
This code has been tested. Here is a way to memset an "integer" array so that every byte holds a value between 0 and 255:
MinColCost=new unsigned char[(Len+1) * sizeof(int)];
memset(MinColCost,0x5,(Len+1)*sizeof(int));
memset(MinColCost,0xff,(Len+1)*sizeof(int));

Replacing multiple chars at the same time

So in my code I have a series of chars which I want to replace with random data. Since rand returns ints, I figured I could save some time by replacing four chars at once instead of one at a time. So basically, instead of this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i++) // generating the data to send.
TXT[i] = rand() % 255;
I'd like to do something like:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4) // generating the data to send.
TXT[i] = rand() % 4294967295;
Something to that effect, but I'm not sure how to do the latter part. Any help you can give me is greatly appreciated, thanks!
That won't work. The compiler will take the result from rand() % big_number and chop off the extra data to fit it in an unsigned char.
Speed-wise, your initial approach was fine. The optimization you contemplated is valid, but most likely unneeded. It probably wouldn't make a noticeable difference.
What you wanted to do is possible, of course, but given your mistake, I'd say the effort to understand how to do it right now far outweighs the benefits. Keep learning, and the next time you run across code like this, you'll know what to do (and be able to judge whether it's necessary), and you can look back on this moment and smile :).
You'll have to access memory directly, and do some transformations on your data. You probably want something like this:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i += sizeof(int)) // generating the data to send; i is a byte index
{
    int *temp = (int*)&TXT[i]; // very ugly
    *temp = rand() % 4294967295;
}
It can be problematic though because of alignment issues, so be careful. Alignment issues can cause your program to crash unexpectedly, and are hard to debug. I wouldn't do this if I were you, your initial code is just fine.
TXT[i] = rand() % 4294967295;
Will not work the way you expect it to. Perhaps you are expecting rand() % 4294967295 to generate a 4-byte integer (which you may be interpreting as 4 different characters). The value that rand() % 4294967295 produces is converted to a single char and assigned to only one element, TXT[i].
Though it's not quite clear why you need to make 4 assignments at the same time, one approach would be to use bit operators to extract the 4 bytes of the generated number, which can then be assigned to four different indices.
There are valid answers already; I just want to add that C does not care very much about what type it stores at which address. So you can get away with something like:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
char *arr;
int *iArr;
int main (void){
    int i;
    arr = malloc(100);
    /* Error handling omitted, yes that's evil */
    iArr = (int*) arr;
    for (i = 0; i < 25; i++) {
        iArr[i] = rand() % INT_MAX;
    }
    for (i = 0; i < 25; i++) {
        printf("iArr[%d] = %d\n", i, iArr[i]);
    }
    for (i = 0; i < 100; i++) {
        printf("arr[%d] = %c\n", i, arr[i]);
    }
    free(arr);
    return 0;
}
In the end, an array is just a contiguous block of memory, and you can interpret it as you like (if you want). If you know that sizeof(int) == 4 * sizeof(char), then the above code will work.
I'm not saying I recommend it. And as the others have pointed out, the original loop through all the chars in TXT yields the same result anyway. One could think of, for example, unrolling the loop, but really I wouldn't bother.
The (int*) alone is warning enough. It tells the compiler: don't think about what you believe the type is, just "believe" the programmer that he knows better.
Well, this "knowing better" is probably the root of all evil in C programming...
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i < flenght; i+4)
// generating the data to send.
TXT[i] = rand() % 4294967295;
This has a few issues:
TXT is not guaranteed to be memory-aligned as needed for the CPU to write int data (whether it works - perhaps relatively slowly - or not - e.g. SIGBUS on Solaris - is hardware specific)
the last 1-3 characters may be missed (even if you change i + 4 to i += 4 ;-P)
rand() returns an int anyway - you don't need to mod it with anything
you need to write your random data via an int* so you're accessing 4 bytes at a time and not simply slicing a byte off the end of the random data and overwriting every fourth single character
for stuff like this, where you're dependent on the size of int, you should really write it in terms of sizeof(int) so it'll work even if int isn't 32 bits, or use a fixed-width typedef such as int32_t (standard in C99, and in C++ since C++11 via <cstdint>; before that, C++ code got it from non-Standard headers, as __int32 on Windows, from boost or other library headers, or by writing your own typedef).
It's actually pretty tricky to align your text data: your code suggests you want int-sized slices from the 35th character... even if the overall character array is aligned properly for ints, the 35th character will not be.
If it really is always the 35th, then you can pad the data with two leading characters so the write starts at byte offset 36 (a multiple of the presumably 32-bit int size), then align the text to a 32-bit address (with a compiler-specific #pragma or using a union with int32_t). If the real code varies the character you start overwriting from, such that you can't simply align the data once, then you're stuck with:
your original character-at-a-time overwrites
non-portable unaligned overwrites (if that's possible and better on your system), OR
implementing code that overwrites up to three leading unaligned characters, then switches to 32-bit integer overwrite mode for aligned addresses, then back to character-by-character overwrites for up to three trailing characters.
That does not work because the generated value is converted to the type of the array element, unsigned char in this particular case. But you are free to interpret the allocated memory in any manner you like. For example, you could treat it as an array of int:
unsigned char TXT[] = { data1,data2,data3,data4,data4,data5....
for (i = 34; i + sizeof(int) <= flenght; i += sizeof(int)) // generating the data to send
    *(int*)(TXT+i) = rand(); // no need for the modulo operator
for (; i < flenght; ++i) // the remaining tail bytes
    TXT[i] = rand(); // no need for the modulo operator either
I just want to complete the solution with remarks about the modulo operator and about handling arrays whose length is not a multiple of sizeof(int).
1) % means "the remainder when divided by", so you want rand() % 256 for a char, or else you will never get chars with a value of 255. Similarly for the int case, although here there is no point in doing a modulus operation anyway, since you want the entire range of output values.
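2) rand may give you fewer random bits than an int holds: RAND_MAX is only guaranteed to be at least 32767 (15 bits), and some implementations, such as Microsoft's, use exactly that. Check the value of RAND_MAX.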
3) 34 isn't divisible by 4 anyway, so you will have to handle the end case specially.
4) You will want to cast the pointer, and it won't work if it isn't already aligned. Once you have the cast, though, there is no need to account for the sizeof(int) in your iteration: pointer arithmetic automatically takes care of the element size.
5) Chances are very good that it won't make a noticeable difference. If scribbling random data into an array is really the bottleneck in your program, then it isn't really doing anything significant anyway.