Is this good enough to check an ascii string?


bool is_ascii(const string &word) {
if((unsigned char)(*word.c_str()) < 128){
return true;
}
return false;
}
I want to check whether a string is an ASCII string. I also saw this function for detecting whether a string consists of ASCII characters:
bool is_ascii(const string &str){
std::locale loc;
for(size_t i = 0; i < str.size(); i++)
if( !isalpha(str[i], loc) && !isspace(str[i], loc))
return false;
return true;
}
Which one is better or more reliable?

Other answers get the is-char-ASCII part already. I’m assuming it’s right. Putting it together I’d recommend:
#include <algorithm>
#include <string_view>
bool is_ascii_char(unsigned char c) {
return (c & 0x80) == 0;
}
bool is_ascii(std::string_view s) {
return std::ranges::all_of(s, is_ascii_char);
}
https://godbolt.org/z/nKb673vaM
Or before C++20, that could be return std::all_of(s.begin(), s.end(), is_ascii_char);.

ASCII is a lot more than just alpha characters and spaces. If you want to accept all ASCII, just use your second example and change the if:
if(str[i] < 0 || str[i] > 0x7f)
return false;
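Put together as a complete function, that suggestion might look like the sketch below. The name is_ascii_loop is made up here so it does not clash with the versions above. The test is meant to work whether plain char is signed or unsigned on the platform, although some compilers may warn that one half of the comparison is always false.

```cpp
#include <cstddef>
#include <string>

// Hypothetical name; the loop is the second example with the suggested test.
bool is_ascii_loop(const std::string &str) {
    for (std::size_t i = 0; i < str.size(); i++) {
        // Signed char: bytes >= 0x80 show up as negative values.
        // Unsigned char: the same bytes show up as values above 0x7f.
        if (str[i] < 0 || str[i] > 0x7f)
            return false;
    }
    return true;
}
```

Unlike the first version in the question, this looks at every character rather than only the first one.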

Related

C++ std::string capitalize in non-latin language (without third-party libraries)

Considering the method:
void Capitalize(std::string &s)
{
bool shouldCapitalize = true;
for(size_t i = 0; i < s.size(); i++)
{
if (iswalpha(s[i]) && shouldCapitalize == true)
{
s[i] = (char)towupper(s[i]);
shouldCapitalize = false;
}
else if (iswspace(s[i]))
{
shouldCapitalize = true;
}
}
}
It works perfectly for ASCII characters, e.g.
"steve" -> "Steve"
However, once I use non-Latin characters, e.g. from the Cyrillic alphabet, I don't get that result:
"стив" -> "стив"
What is the reason why that method fails for non-latin alphabets? I've tried using methods such as isalpha as well as iswalpha but I'm getting exactly the same result.
What would be a way to modify this method to capitalize non-latin alphabets?
Note: Unfortunately, I'd prefer to solve this issue without using a third party library such as icu4c, otherwise it would have been a very simple problem to solve.
Update:
This solution doesn't work (for some reason):
void Capitalize(std::string &s)
{
bool shouldCapitalize = true;
std::locale loc("ru_RU"); // Creating a locale that supports cyrillic alphabet
for(size_t i = 0; i < s.size(); i++)
{
if (isalpha(s[i], loc) && shouldCapitalize == true)
{
s[i] = (char)toupper(s[i], loc);
shouldCapitalize = false;
}
else if (isspace(s[i], loc))
{
shouldCapitalize = true;
}
}
}

std::locale works, at least where the locale is installed on the system, but you are using it incorrectly.
This code works as expected on Ubuntu with Russian locale installed:
#include <iostream>
#include <locale>
#include <string>
#include <codecvt>
void Capitalize(std::wstring &s)
{
bool shouldCapitalize = true;
std::locale loc("ru_RU.UTF-8"); // Creating a locale that supports cyrillic alphabet
for(size_t i = 0; i < s.size(); i++)
{
if (isalpha(s[i], loc) && shouldCapitalize == true)
{
s[i] = toupper(s[i], loc);
shouldCapitalize = false;
}
else if (isspace(s[i], loc))
{
shouldCapitalize = true;
}
}
}
int main()
{
std::wstring in = L"это пример текста";
Capitalize(in);
std::wstring_convert<std::codecvt_utf8<wchar_t>> conv1;
std::string out = conv1.to_bytes(in);
std::cout << out << "\n";
return 0;
}
It's possible that on Windows you need to use a different locale name; I'm not sure.
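One likely reason the std::string version in the question has no effect, assuming the text is UTF-8 encoded: each Cyrillic letter occupies two bytes, so a loop over s[i] hands isolated bytes to isalpha and toupper rather than whole letters. The sketch below only illustrates that mismatch; the names count_code_points and stiv are made up for the example.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string &utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// The word from the question, spelled out as UTF-8 bytes so the example
// does not depend on the encoding of the source file.
const std::string stiv = "\xD1\x81\xD1\x82\xD0\xB8\xD0\xB2";
```

Here stiv.size() is 8 while count_code_points(stiv) is 4, which is why the working code above switches to std::wstring.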

Well, an external library would be the only practical choice IMHO. The standard functions work well with Latin, but any other locale is a pain, and I wouldn't bother. Still, if you want support for Latin and Cyrillic without an external library, you can just write it yourself:
wchar_t to_upper(wchar_t c) {
// Latin
if (c >= L'a' && c <= L'z') return c - L'a' + L'A';
// Cyrillic
if (c >= L'а' && c <= L'я') return c - L'а' + L'А';
return towupper(c);
}
Still, it's important to note that you would need to painstakingly implement support for every alphabet, and not even all Latin characters are supported, so an external library is the best solution. Consider the given solution only if you're sure that English and Russian are the only languages going to be used.
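As a rough sketch of how such a hand-written conversion could be combined with the Capitalize loop from the question, assuming the text is held in a std::wstring: all three function names below are hypothetical, the Cyrillic letters are written as \u escapes to avoid depending on the source file encoding, and only basic Latin and U+0410..U+044F are treated as letters (so ё, for instance, is not covered).

```cpp
#include <cwctype>
#include <string>

// Hypothetical helper: upper-cases basic Latin and the Cyrillic range
// U+0430..U+044F, and defers everything else to towupper.
wchar_t to_upper_latin_cyrillic(wchar_t c) {
    if (c >= L'a' && c <= L'z')
        return static_cast<wchar_t>(c - L'a' + L'A');
    if (c >= L'\u0430' && c <= L'\u044F')
        return static_cast<wchar_t>(c - L'\u0430' + L'\u0410');
    return static_cast<wchar_t>(std::towupper(c));
}

// Hypothetical helper: true for basic Latin and U+0410..U+044F only.
bool is_latin_or_cyrillic_letter(wchar_t c) {
    return (c >= L'a' && c <= L'z') || (c >= L'A' && c <= L'Z') ||
           (c >= L'\u0410' && c <= L'\u044F');
}

// Upper-cases the first letter of each whitespace-separated word.
void capitalize_words(std::wstring &s) {
    bool should_capitalize = true;
    for (wchar_t &c : s) {
        if (should_capitalize && is_latin_or_cyrillic_letter(c)) {
            c = to_upper_latin_cyrillic(c);
            should_capitalize = false;
        } else if (std::iswspace(c)) {
            should_capitalize = true;
        }
    }
}
```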

Code checking the result of std::unordered_set::find won't compile

I am writing a program to determine whether all characters in a string are unique or not. I am trying to do this using an unordered_set. Here is my code:
#include <iostream>
#include <unordered_set>
#include <string>
using namespace std;
bool uniqueChars(string word) {
unordered_set<char> set;
for (int i = 0; i < word.length(); i++) {
auto character = set.find(word[i]);
// if word[i] is found in set then not all chars are unique
if (character == word[i]) {
return false;
}
//else add word[i] to set
else {
set.insert(word[i]);
}
}
return true;
}
int main() {
string word;
getline(cin, word);
bool result = uniqueChars(word);
return 0;
}
It is giving me this error:
|15|error: no match for 'operator==' (operand types are 'std::__detail::_Node_iterator' and 'char')|
I believe that means that character is not comparable to word[i], but I'm not sure.
How do I make this work?

Note that std::unordered_set::find returns an iterator, not the element. It can't be compared to the element directly.
You could check whether the element was found or not by comparing the iterator with std::unordered_set::end. e.g.
auto character = set.find(word[i]);
// if word[i] is found in set then not all chars are unique
if (character != set.end()) {
return false;
}
//else add word[i] to set
else {
set.insert(word[i]);
}
BTW: It is better not to use set as a variable name, since it is also the name of another STL container.
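For reference, here is a minimal self-contained sketch of the whole function with the iterator compared against end(). The function name all_chars_unique and the variable name seen are chosen here for illustration and are not from the question.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical name, chosen to avoid clashing with uniqueChars above.
bool all_chars_unique(const std::string &word) {
    std::unordered_set<char> seen;  // "seen" rather than "set"
    for (char c : word) {
        // find returns an iterator; end() means "not found".
        if (seen.find(c) != seen.end())
            return false;  // c was already inserted earlier
        seen.insert(c);
    }
    return true;
}
```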

Take advantage of the return value of insert. It tells you whether a duplicate was found during insertion (in which case nothing is inserted).
bool uniqueChars(string word) {
unordered_set<char> set;
for ( char c : word ) {
if ( ! set.insert( c ).second ) {
return false; // set didn't insert c because of a duplicate.
}
}
return true; // No duplicates.
}
However, this isn't as efficient as it might look. unordered_set is a heap-based hash table and its implementation is fairly heavyweight. A lightweight bit-vector works well for classifying characters.
#include <bitset>
#include <limits>
constexpr int char_values = numeric_limits< char >::max()
- numeric_limits< char >::min() + 1;
bool uniqueChars(string word) {
bitset< char_values > set;
for ( char c : word ) {
int value_index = c - numeric_limits< char >::min();
if ( set[ value_index ] ) {
return false;
} else {
set[ value_index ] = true;
}
}
return true; // No duplicates.
}

You have to dereference the iterator to get at the character:
*character == word[i]
Dereferencing is not actually needed here, though, and it would only be valid after checking that the iterator does not point past the last element. *character simply refers to the character that was already inserted. Comparing the iterator against end() is enough:
if(character != set1.end() )
return false; // the character is already in the set, so the string is not unique
By the way, there is a really simple way to do what you are trying to do:
bool uniqueChars(string word) {
unordered_set<char> set1;
for (int i = 0; i < word.length(); i++)
set1.insert(word[i]);
return set1.size()==word.length();
}
Note that set is not a keyword in C++, but it is the name of a standard container (std::set), so it is better not to use it as a variable name.

Determine If String Has All Same Character

Is there a function like find_first_not_of that returns true or false as opposed to a position? I do not need the position, but rather whether or not the string contains all of the same char.

You could write your own function:
bool all_chars_same(string testStr) {
char letter = testStr[0];
for (int i = 1; i < testStr.length(); i++) {
if (testStr[i] != letter)
return false;
}
return true;
}
Or use the built in find_first_not_of:
bool all_chars_same(string testStr) {
return testStr.find_first_not_of(testStr[0]) == string::npos;
}

Just check the value returned by find_first_not_of for string::npos:
// needs to check if str.size() > 0
bool all_same = str.find_first_not_of(str[0]) == string::npos;
Alternatively, since you're looking for a single character, there's also std::all_of.
bool all_same = std::all_of(str.cbegin(), str.cend(), [&](char c){ return str[0] == c; });
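If the size check is a concern, one alternative that never reads str[0] is std::adjacent_find with std::not_equal_to, which looks for the first pair of neighbouring characters that differ. This is only a sketch; the function name all_same_char is invented here.

```cpp
#include <algorithm>
#include <functional>
#include <string>

// True when no two neighbouring characters differ, which also covers
// the empty string and single-character strings.
bool all_same_char(const std::string &str) {
    return std::adjacent_find(str.begin(), str.end(),
                              std::not_equal_to<char>()) == str.end();
}
```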

Use yourstring.find(keyword);
You can find the details here:
http://www.cplusplus.com/reference/string/string/find/

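I would recommend a define; I consider it the fastest way. Note that this macro evaluates to true when the string contains at least one character that differs from the first.
#define find_not_of(a) ((a).find_first_not_of((a)[0]) != std::string::npos)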

The best and quickest way I can think of is to create a map and put the first character of the string in it as a key. Then iterate through the string; as soon as you find a character that is not in the map, you are done.
bool allSameCharacters ( string s){
unordered_map < char , int> m;
// m.reserve(s.size());
m[s[0]]++;
for (char c : s ){
if (m.find(c) == m.end()) return false;
}
return true;
}

booleans with constraints

How do I write a boolean that checks if a string has only letters, numbers and an underscore?

Assuming String supports iterators, use all_of:
using std::begin;
using std::end;
return std::all_of(begin(String), end(String),
[](char c) { return isalnum(c) || c == '_'; });
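Wrapped in a complete function, this could look like the sketch below. The function name is_word_chars_only is made up for the example. The lambda takes unsigned char because passing a negative char value to std::isalnum is undefined behaviour.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical name: true if every character is a letter, digit or '_'.
bool is_word_chars_only(const std::string &s) {
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isalnum(c) || c == '_';
    });
}
```

Note that std::all_of returns true for an empty range, so an empty string passes this check.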

A simpler way is to run a loop and check that every character has the property you mentioned; if one does not, just return false.
Code:
bool stringHasOnlyLettersNumbsandUndrscore(std::string const& str)
{
for(int i = 0; i < str.length(); ++i)
{
//Your character in the string does not fulfill the property.
if (!isalnum(str[i]) && str[i] != '_')
{
return false;
}
}
//The whole string fulfills the condition.
return true;
}

bool stringHasOnlyLettersNumbsandUndrscore(std::string const& str)
{
return ( std::all_of(str.begin(), str.end(),
[](char c) { return isalnum(c) || c == '_'; }) &&
(std::count_if(str.begin(), str.end(),
[](char c) { return (c == '_'); }) < 2));
}

Check whether each character is a letter, a number or an underscore.
For C and C++, this should do:
if(!isalnum(a[i]) && a[i]!='_')
cout<<"No";
You will have to include <cctype> (<ctype.h> in C) for this code to work.
This is just the quickest way that comes to mind; there might be other, more complex and faster ways.

c++ char comparison to see if a string is fits our needs

I want to do my work only if the string variable tablolar contains nothing but lowercase letters a-z and ','. What do you suggest?
If the string tablolar is:
"tablo"->it is ok
"tablo,tablobir,tabloiki,tablouc"->it is ok
"ta"->it is ok
but if it is:
"tablo2"->not ok
"ta546465"->not ok
"Tablo"->not ok
"tablo,234,tablobir"->not ok
"tablo^%&!)=(,tablouc"-> not ok
What I tried was wrong:
for(int z=0;z<tablolar.size();z++){
if ((tablolar[z] == ',') || (tablolar[z] >= 'a' && tablolar[z] <= 'z'))
{
//do your work here
}
}

tablolar.find_first_not_of("abcdefghijklmnopqrstuvwxyz,") will return the position of the first invalid character, or std::string::npos if the string is OK.
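A small sketch of that check as a function returning bool; the name fits_our_needs and the allowed array are introduced here for illustration:

```cpp
#include <string>

// Hypothetical name: true if the string contains only 'a'..'z' and ','.
bool fits_our_needs(const std::string &tablolar) {
    static const char allowed[] = "abcdefghijklmnopqrstuvwxyz,";
    // npos means no character outside the allowed set was found.
    return tablolar.find_first_not_of(allowed) == std::string::npos;
}
```

With the examples from the question, "tablo" and "tablo,tablobir,tabloiki,tablouc" are accepted, while "tablo2" and "Tablo" are rejected.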

bool fitsOurNeeds(const std::string &tablolar) {
for (int z=0; z < tablolar.size(); z++)
if (!((tablolar[z] == ',') || (tablolar[z] >= 'a' && tablolar[z] <= 'z')))
return false;
return true;
}

The C function islower tests for lowercase. So you probably want something along these lines:
#include <algorithm>
#include <cctype> // for islower
bool fitsOurNeeds(std::string const& tabular)
{
return std::all_of(tabular.begin(), tabular.end(),
[](char ch)
{
return islower(ch) || ch == ',';
});
}
