I'm working on a lab that requires password authentication as both an older c-string and a string class. I have the string class version working. I've gotten the password entered as an array using cin.getline(password, 20), and strlen(password) also works correctly.
I've been searching for how to determine whether the older c-string version contains an uppercase letter in any of its values. Everything says to use isupper, which is from the newer string class (as far as I can tell).
Is there a way to do this? I'm considering just verifying using the string class version then inputting it into the char array.
There is a function called isupper in the C standard library, which takes a single character as an argument. (It doesn't matter where the character comes from, a C string or somewhere else.) This is probably what you are meant to use.
There is an isupper() function in the C standard library as well, declared in <cctype> (<ctype.h> in C).
It takes a char parameter, so you would need to iterate over the character array and call it for each character.
There's some good information about it here.
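As a minimal sketch of the loop described above (the name contains_upper is made up for illustration, and the password buffer is assumed to be NUL-terminated, as cin.getline leaves it):

```cpp
#include <cctype>   // std::isupper
#include <cstddef>  // std::size_t

// Hypothetical helper: returns true if the NUL-terminated C string
// contains at least one uppercase letter (according to the current C locale).
bool contains_upper(const char* s)
{
    for (std::size_t i = 0; s[i] != '\0'; ++i) {
        // Cast to unsigned char first: passing a negative char value
        // to std::isupper is undefined behavior.
        if (std::isupper(static_cast<unsigned char>(s[i]))) {
            return true;
        }
    }
    return false;
}
```

With the array from the question this would be called as contains_upper(password).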
If you can assume the character set is ASCII (the C and C++ standards do not guarantee it), you could create your own function:
bool upper(char chr)
{
    return chr >= 'A' && chr <= 'Z'; // in ASCII, same as return chr >= 65 && chr <= 90
}
Related
string str='中test'
first_char = str[0]
How can I compare first_char with an int 128? I want to test whether the first char is an ascii or not.
Something like this:
if char(first_char) < 128:
return true
In C++ (and C), the signedness of a char is implementation-defined. Hence, a simple less-than operator will not suffice. You need some bitwise action:
bool is_ascii( char c )
{
return !(c & 0x80);
}
As soon as you begin messing with UTF-8 text (or any other non-ASCII text) the usual assumptions about what a character is go out the window. You should use a library, such as ICU, to help you. (Most modern operating systems ship ICU or something equivalent, so this should not be a difficult requirement.)
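The same bit test extends to a whole string. A rough sketch, assuming 8-bit bytes and a hypothetical helper name all_ascii (it only answers "is every byte below 128?", it does not validate UTF-8):

```cpp
#include <string>

// Hypothetical helper: true if every byte in s has its high bit clear,
// i.e. the string is pure 7-bit ASCII.
bool all_ascii(const std::string& s)
{
    for (char c : s) {
        // Convert to unsigned char so the test does not depend on
        // whether plain char is signed on this platform.
        if (static_cast<unsigned char>(c) & 0x80) {
            return false;
        }
    }
    return true;
}
```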
A comment on an earlier version of this answer of mine alerted me to the fact that I can't assume that 'A', 'B', 'C' etc. have successive numeric values. I had sort of assumed the C or C++ language standards guarantee that this is the case.
So, how should I determine whether consecutive letter characters' values are themselves consecutive? Or rather, how can I determine whether the character constants I can express within single quotes have their ASCII codes for a numeric value?
I'm asking how to do this both in C and in C++. Obviously the C way would work in C++ also, but if there's a C++ish facility for doing this I'm interested in that as well. Also, I'm asking about the newest relevant standards (C11, C++17).
You can use the preprocessor to check if a particular character maps to the charset:
#include <iostream>
using namespace std;
int main() {
#if ('A' == 65 && 'Z' - 'A' == 25)
std::cout << "ASCII" << std::endl;
#else
std::cout << "Other charset" << std::endl;
#endif
return 0;
}
The drawback is, you need to know the mapped values in advance.
The numeric chars '0' - '9' are guaranteed to appear in consecutive order BTW.
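For a more C++-flavoured check, a constexpr function can inspect the execution character set at compile time. This is only a sketch: it needs C++14 or later for the loop, and the names letters_contiguous, looks_like_ascii and digit_value are invented here.

```cpp
// Hypothetical compile-time check: are 'A'..'Z' consecutive values
// in the execution character set? (True for ASCII, false for EBCDIC.)
constexpr bool letters_contiguous()
{
    const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (int i = 0; i + 1 < 26; ++i) {
        if (upper[i] + 1 != upper[i + 1]) {
            return false;
        }
    }
    return true;
}

// Spot check of a few well-known ASCII codes; not a full proof of ASCII.
constexpr bool looks_like_ascii = ('A' == 65) && ('a' == 97) && ('0' == 48);

// Digits are the one range the standards do guarantee to be consecutive,
// so this is portable for c in '0'..'9'.
constexpr int digit_value(char c)
{
    return c - '0';
}
```

Either constant can be fed to static_assert if the program should refuse to build on an unexpected character set.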
... (2) I expect to be able to obtain the distance in number-of-letters between two letters ...
This comment specifying your goal makes much more sense than your actual question! Why didn't you ask about that? You can use strchr on an array of characters, and strchr doesn't care what the native character set is, meaning your code won't care what the native character set is... For example:
char alphabet[] = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";
ptrdiff_t fubar = strchr(alphabet, 'y') - strchr(alphabet, 'X');
printf("'X' and 'y' have a distance of %td and a case difference of %td\n", fubar / 2, fubar % 2); /* %td is the conversion for ptrdiff_t */
... how should I determine whether consecutive letter characters' values are themselves consecutive?
Consecutive letter characters' values are consecutive, by definition, because they're consecutive letter characters. I know this isn't what you meant, but your actual question illustrates a lack of planning and thought, and... a stupid question warrants a stupid answer.
You're much better off programming in such a way that you don't care what values they have. Nonetheless, create an array containing the characters you care about, loop through the elements and test for inconsistencies. For example:
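int is_consecutive(char const *alphabet) {
    size_t x = 0; /* declared outside the loop so it is still in scope below */
    if (!alphabet[0]) return 1; /* an empty string has no gaps */
    while (alphabet[x + 1] && alphabet[x] + 1 == alphabet[x + 1]) x++;
    return !alphabet[x + 1]; /* true only if every adjacent pair matched */
}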
... how can I determine whether the character constants I can express within single quotes have their ASCII codes for a numeric value?
Again with the lack of sense, and again with the caring about values... Alternatively, build two translation tables, native_to_ascii and ascii_to_native, and work it out from there. I won't help you with this, as it's a silly exercise involving the use of magic numbers that most likely aren't necessary for your actual goal.
I'm trying to convert a string to lowercase, and am treating it as a char* and iterating through each index. The problem is that the tolower function I read about online is not actually converting a char to lowercase: it's taking char as input and returning an integer.
cout << tolower('T') << endl;
prints 116 to the console when it should be printing t.
Is there a better way for me to convert a string to lowercase?
I've looked around online, and most sources say to "use tolower and iterate through the char array", which doesn't seem to be working for me.
So my two questions are:
What am I doing wrong with the tolower function that's making it return 116 instead of 't' when I call tolower('T')
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
That's because there are two different tolower functions. The one that you're using is this one, which returns an int. That's why it's printing 116. That's the ASCII value of 't'. If you want to print a char, you can just cast it back to a char.
Alternatively, you could use this one, which actually returns the type you would expect it to return:
std::cout << std::tolower('T', std::locale()); // prints t
In response to your second question:
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
Nope.
116 is indeed the correct value; this is simply an issue of how std::cout handles integers. Use char(tolower(c)) to get the output you want:
std::cout << char(tolower('T')); // print it like this
It's even weirder than that - it takes an int and returns an int. See http://en.cppreference.com/w/cpp/string/byte/tolower.
You need to ensure the value you pass it is representable as an unsigned char - no negative values allowed, even if char is signed.
So you might end up with something like this:
char c = static_cast<char>(tolower(static_cast<unsigned char>('T')));
Ugly isn't it? But in any case converting one character at a time is very limiting. Try converting 'ß' to upper case, for example.
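To apply that cast to a whole string, one common pattern is std::transform with a small lambda. A sketch only, with a made-up name to_lower_copy, and with the same per-character limitation just mentioned:

```cpp
#include <algorithm>  // std::transform
#include <cctype>     // std::tolower
#include <string>

// Hypothetical helper: returns a lowercase copy of s, converting one
// char at a time with the C-locale std::tolower.
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       // c arrives as unsigned char, so the call is well defined;
                       // the int result is cast back to char for storage.
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```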
tolower takes an int and returns an int. If you look in <cctype> you will see that the declaration is int tolower(int c); You can loop through the string and convert each char to lowercase. For example:
while (str[i]) // going through the string
{
    c = str[i]; // give c the value of the current char in the string
    putchar(tolower(static_cast<unsigned char>(c))); // print it in lowercase
    i++; // move on to the next char
}
The documentation of int tolower(int ch) mandates that ch must either be representable as an unsigned char or must be equal to EOF (which is usually -1, but don't rely on that).
It's not uncommon for character manipulation functions that have been inherited from the C standard library to work in terms of ints. There are two reasons for this:
In the early days of C, all arguments were promoted to int (function prototypes did not exist).
For consistency these functions need to handle the EOF case, which for obvious reasons cannot be a value representable by a char, since that would mean we'd have to lose one of the legitimate encodings for a character.
http://en.cppreference.com/w/cpp/string/byte/tolower
The answer is to cast the result to a char before printing.
e.g.:
std::cout << static_cast<char>(std::tolower('A'));
Generally speaking, to convert an uppercase character to lowercase in ASCII, you only need to add 32 to the uppercase character, as this number is the ASCII code difference between lowercase and uppercase characters, e.g., 'a'-'A'=97-65=32.
char c = 'B';
c += 32; // c is now 'b'
printf("c=%c\n", c);
Another easy way would be to first map the uppercase character to an offset within the range of the English alphabet, 0-25 (i.e. 'A' is index 0 and 'Z' is index 25, inclusive), and then remap it to a lowercase character.
char c = 'B';
c = c - 'A' + 'a'; // c is now 'b'
printf("c=%c\n", c);
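Both snippets above change the character unconditionally, so a digit or an already-lowercase letter would be corrupted. A guarded sketch, assuming an ASCII-compatible character set (ascii_to_lower is a made-up name):

```cpp
// Hypothetical helper, assuming 'A'..'Z' and 'a'..'z' are each
// contiguous ranges, as they are in ASCII.
char ascii_to_lower(char c)
{
    if (c >= 'A' && c <= 'Z') {
        return static_cast<char>(c - 'A' + 'a');
    }
    return c;  // digits, punctuation and lowercase letters pass through
}
```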
So I was playing around with some code and wanted to see which method of converting a std::string to upper case was most efficient. I figured that the two would be somewhat similar performance-wise, but I was terribly wrong. Now I'd like to find out why.
The first method of converting the string works as follows: for each character in the string (save the length, iterate from 0 to length), if it's between 'a' and 'z', then shift it so that it's between 'A' and 'Z' instead.
The second method works as follows: for each character in the string (start from 0, keep going till we hit a null terminator), apply the built-in toupper() function.
Here's the code:
#include <cctype> // toupper
#include <iostream>
#include <string>
inline std::string ToUpper_Reg(std::string str)
{
for (int pos = 0, sz = str.length(); pos < sz; ++pos)
{
if (str[pos] >= 'a' && str[pos] <= 'z') { str[pos] += ('A' - 'a'); }
}
return str;
}
inline std::string ToUpper_Alt(std::string str)
{
for (int pos = 0; str[pos] != '\0'; ++pos) { str[pos] = toupper(str[pos]); }
return str;
}
int main()
{
std::string test = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!##$%^&*()_+=-`'{}[]\\|\";:<>,./?";
for (size_t i = 0; i < 100000000; ++i) { ToUpper_Reg(test); /* ToUpper_Alt(test); */ }
return 0;
}
The first method ToUpper_Reg took about 169 seconds per 100 million iterations.
The second method ToUpper_Alt took about 379 seconds per 100 million iterations.
What gives?
Edit: I changed the second method so that it iterates the string how the first one does (set the length aside, loop while less than length) and it's a bit faster, but still about twice as slow.
Edit 2: Thanks everybody for your submissions! The data I'll be using it on is guaranteed to be ascii, so I think I'll be sticking with the first method for the time being. I'll keep in mind that toupper is locale specific for when/if I need it.
std::toupper uses the current locale to do case conversions, which involves a function call and other abstractions. So naturally, it will be slower. But it will also work on non-ASCII text.
toupper() does more than just shift characters in the range [a-z]. For one thing it's locale dependent and can handle more than just ASCII.
toupper() takes the locale into account so it can handle (some) international characters and is much more complex than just handling the character range 'a'-'z'.
Well, ToUpper_Reg() doesn't work. For example, it doesn't turn my name into all uppercase characters. That said, ToUpper_Alt() doesn't work either, because toupper() gets passed a negative value on some platforms, i.e. it invokes undefined behavior (typically a crash) when used with my name. This is easily fixed, though, by calling it correctly, something like this:
toupper(static_cast<unsigned char>(str[pos]))
That said, the two versions of the code are not equivalent: the version not using toupper() isn't writing the characters all the time, while the other version is. Once everything is converted to uppercase, the first version always takes the same branch after a test and then does nothing. You might want to change ToUpper_Alt() to look like this and retest:
inline std::string ToUpper_Alt(std::string str)
{
for (int pos = 0; str[pos] != '\0'; ++pos) {
if (islower(static_cast<unsigned char>(str[pos]))) {
str[pos] = toupper(static_cast<unsigned char>(str[pos]));
}
}
return str;
}
I would guess the difference is the writing: toupper() trades the comparison for an array look-up. The locale is quickly accessed and all toupper() does is get the current pointer and access the location at a given offset. With data in the cache this is probably as fast as the branch.
The second one involves a function call. A function call is an expensive operation in an inner loop. toupper also uses locales to determine how the character should be changed.
The advantage of the call is that it is standard and will work regardless of the character encoding on the host machine.
That said, I would highly recommend using the boost function:
boost::algorithm::to_upper
It is a template so is more than likely to be inlined, however it does involve locales. I would still use it.
http://www.boost.org/doc/libs/1_40_0/doc/html/boost/algorithm/to_upper.html
I guess it's because the second one calls a C standard library function, that on the one hand isn't inlined, so you got the overhead of a function call. But even more important, this function probably does a lot more than just two comparisons, two jumps and two integer additions. It performs additional checks on the character and takes the current locale into account and all that stuff.
std::toupper uses the current locale and the reason why this is slower than the C function is that the current locale is shared and mutable from different threads, so it's necessary to lock the locale object when it's accessed to ensure it's not switched during the call. This happens once per call to toupper and introduces quite a large overhead (obtaining the lock might require a syscall depending on implementation). One workaround if you want to get the performance and respect the locale is to get the locale object first (creating a local copy) and then call the toupper facet on your copy, thus avoiding the need to lock for each toupper call. See the link below for an example.
http://www.cplusplus.com/reference/std/locale/ctype/toupper/
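A sketch of that workaround: fetch the std::ctype<char> facet once and let it convert the whole range in a single call. The name to_upper_with_locale is hypothetical, and whether this actually avoids any locking depends on the standard library implementation.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: uppercases str using the ctype facet of loc,
// looked up once instead of once per character.
std::string to_upper_with_locale(std::string str,
                                 const std::locale& loc = std::locale())
{
    const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);
    if (!str.empty()) {
        // The range overload converts [begin, end) in place.
        facet.toupper(&str[0], &str[0] + str.size());
    }
    return str;
}
```

Called without a second argument, it uses a copy of the current global locale.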
The question has already been answered, but as an aside, replacing the guts of your loop in the first method with:
std::string::value_type &c = str[pos];
if ('a' <= c && c <= 'z') { c += ('A' - 'a'); }
makes it even faster. Maybe my compiler just sucks.
What are various ways in C/C++ to define a string with no null terminating char(\0) at the end?
EDIT: I am interested in character arrays only and not in STL string.
Typically as another poster wrote:
char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};
or if your current C charset is ASCII, which is usually true (not much EBCDIC around today)
char s[6] = {115, 116, 114, 105, 110, 103};
There is also a largely ignored way that works only in C (not C++)
char s[6] = "string";
If the array size is too small to hold the final 0 (but large enough to hold all the other characters of the constant string), the final zero won't be copied, but it's still valid C (but invalid C++).
Obviously you can also do it at run time:
char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';
or (same remark on ASCII charset as above)
char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;
Or using memcpy (or memmove, or bcopy, but in this case there is no benefit in doing that).
memcpy(s, "string", 6);
or strncpy
strncpy(s, "string", 6);
What should be understood is that there is no such thing as a string in C (in C++ there are string objects, but that's another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not a character but just a kind of numerical type. We could probably have called it byte instead, but in the old days there was strange hardware around using 9-bit registers or the like, and byte implies 8 bits.
As char will very often be used to store a character code, C's designers thought of a simpler way than storing a number in a char. You can put a letter between single quotes and the compiler understands that it must store this character code in the char.
What I mean is (for example) that you don't have to do
char c = '\0';
To store a code 0 in a char, just do:
char c = 0;
As we very often have to work with a bunch of chars of variable length, C's designers also chose a convention for "strings". Just put a code 0 where the text should end. By the way, there is a name for this kind of string representation, "zero terminated string", and if you see the two letters sz at the beginning of a variable name it usually means that its content is a zero terminated string.
"C sz strings" is not a type at all, just an array of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf, and many many others) understand and use the 0 ending convention. That also means that if you have a char array that is not zero terminated, you shouldn't call any of these functions, as they will likely do something wrong (or you must be extra careful and use functions with an n in their name, like strncpy).
The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a 0 terminated string. If you had kept the size you could just jump to the end of the string; with the sz convention, you have to check it char by char. Other kinds of problems occur when dealing with encoded Unicode and the like. But at the time C was created this convention was very simple and did the job perfectly.
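When a char array has no terminating 0, the length has to travel with it. A small C++ sketch of that idea (to_std_string and same_bytes are names invented for illustration):

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcmp
#include <string>

// Copies exactly len chars; never looks for a terminating 0.
std::string to_std_string(const char* buf, std::size_t len)
{
    return std::string(buf, len);
}

// Compares two length-delimited buffers without relying on a terminator.
bool same_bytes(const char* a, std::size_t a_len,
                const char* b, std::size_t b_len)
{
    return a_len == b_len && std::memcmp(a, b, a_len) == 0;
}
```

For an array declared as char s[6], the length would be passed as sizeof s.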
Nowadays, the letters between double quotes like "string" are not plain modifiable char arrays as in the past. In C++ a string literal is an array of const char (which decays to const char *); in C its type is still an array of char, but modifying it is undefined behavior. That means the characters are constants that should not be modified (if you want to modify them you must first copy them), and in C++ that is a good thing because it helps to detect many programming errors at compile time.
The terminating null is there to terminate the string. Without it, you need some other method to determine its length.
You can use a predefined length:
char s[6] = {'s','t','r','i','n','g'};
You can emulate pascal-style strings:
unsigned char s[7] = {6, 's','t','r','i','n','g'};
You can use std::string (in C++), although you've said you're not interested in it.
Preferably you would use some pre-existing technology that handles Unicode, or at least understands string encoding (e.g., wchar.h).
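As an illustration of the Pascal-style layout, here is a sketch that reads such a buffer back in C++. The helper name from_pascal is made up, and the one-byte prefix limits the text to 255 characters:

```cpp
#include <cstddef>  // std::size_t
#include <string>

// Hypothetical helper: reads a length-prefixed ("Pascal-style") buffer
// whose first byte holds the number of characters that follow.
std::string from_pascal(const unsigned char* p)
{
    const std::size_t len = p[0];
    // The characters start right after the length byte.
    return std::string(reinterpret_cast<const char*>(p + 1), len);
}
```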
And a comment: If you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string". This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.
typedef struct {
char characters[10];
} ThisIsNotACString;
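C++ std::strings do not depend on a NUL terminator to know their length (although since C++11, c_str() and data() do return a NUL-terminated buffer).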
P.S.: NULL is a macro [1]. NUL is \0. Don't mix them up.
[1] C.2.2.3 Macro NULL: The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>, <ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International Standard (18.1).
In C++ you can use the string class and not deal with the null char at all.
Just for the sake of completeness, and to nail this down completely:
vector<char>
Use std::string.
There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).
In C there generally won't be an easier solution. You could possibly do what pascal did and put the length of the string in the first character, but this is a bit of a pain and will limit your string length to the size of the integer that can fit in the space of the first char.
In C++ I'd definitely use the std::string class that can be accessed by
#include <string>
Being a commonly used library this will almost certainly be more reliable than rolling your own string class.
The reason for the NUL termination is so that the handler of the string can determine its length. If you don't use a NUL terminator, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, so long as it isn't used within the string itself.
To be honest, I don't quite understand your question, or if it actually is a question.
Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.
I can't personally think of any realistic scenario for why you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your variable is (likely 4 bytes), and gained faster access to the length of said string.