I am writing a function that determines whether a string contains only alphanumeric characters and spaces. I am effectively testing whether it matches the regular expression ^[[:alnum:] ]+$ but without using regular expressions. This is what I have so far:
#include <algorithm>
#include <cctype>
#include <string>
static inline bool is_not_alnum_space(char c)
{
return !(isalpha(c) || isdigit(c) || (c == ' '));
}
bool string_is_valid(const std::string &str)
{
return find_if(str.begin(), str.end(), is_not_alnum_space) == str.end();
}
Is there a better solution, or a “more C++” way to do this?
Looks good to me, but you can use isalnum(c) instead of isalpha and isdigit.
And looking forward to C++0x, you'll be able to use lambda functions (you can try this out with gcc 4.5 or VS2010):
bool string_is_valid(const std::string &str)
{
return find_if(str.begin(), str.end(),
[](char c) { return !(isalnum(c) || (c == ' ')); }) == str.end();
}
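A minimal sketch of the isalnum suggestion, written with std::all_of (C++11) rather than a negated find_if. Two things here are additions that do not appear in the code above and should be read as assumptions: the cast to unsigned char (the <cctype> functions have undefined behaviour for negative char values other than EOF), and the empty() check, which is there only because the + in ^[[:alnum:] ]+$ requires at least one character. The helper name is_alnum_or_space is made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true if c is a letter, a digit, or a space.
// The unsigned char cast avoids undefined behaviour for negative char values.
static bool is_alnum_or_space(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == ' ';
}

// Approximates ^[[:alnum:] ]+$ in the default "C" locale.
// The empty() check is an assumption: '+' needs at least one character.
bool string_is_valid(const std::string &str)
{
    return !str.empty() &&
           std::all_of(str.begin(), str.end(), is_alnum_or_space);
}
```

Dropping the empty() check gives the same behaviour as the find_if version, which accepts an empty string.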
You can also do this with binders so you can drop the helper function. I'd recommend Boost binders, as they are much easier to use than the standard library binders:
bool string_is_valid(const std::string &str)
{
return find_if(str.begin(), str.end(),
!boost::bind(::isalnum, _1) && boost::bind(std::not_equal_to<char>(), _1, ' ')) == str.end();
}
Minor points, but if you want is_not_alnum_space() to be a helper function that is only visible in that particular compilation unit, you should put it in an anonymous namespace instead of making it static:
namespace {
bool is_not_alnum_space(char c)
{
return !(isalpha(c) || isdigit(c) || (c == ' '));
}
}
...etc
If you don't want to use an STL function, this can be used:
// function for checking whether a char is alphanumeric
bool checkAlphaNumeric(char s){
if(((s - 'a' >= 0) && (s - 'a' < 26)) ||((s - 'A' >= 0) && (s - 'A' < 26)) || ((s- '0' >= 0) &&(s - '0' < 10)))
return true;
return false;
}
//main
std::string s = "ab cd : 456";
for(int i = 0; i < s.length(); i++){
if(!checkAlphaNumeric(s[i])) return false;
}
Edit: I'm looking for a solution that doesn't use regex, since it seems buggy and untrustworthy.
I had the following function, which extracts the tokens of a string whenever one of the following symbols is found: +,-,^,*,!
bool extract_tokens(string expression, std::vector<string> &tokens) {
static const std::regex reg(R"(\+|\^|-|\*|!|\(|\)|([\w|\s]+))");
std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
std::sregex_token_iterator(),
std::back_inserter(tokens));
return true;
}
I thought it worked perfectly until today, when I found an edge case.
The following input: !aaa + ! a is supposed to return !,aaa ,+,!, a but it returns !,aaa ,+,"",!, a. Notice the extra empty string between + and !.
How can I prevent this behaviour? I think this can be done within the regular expression.
In an attempt to salvage the regular expression-based solution, I came up with this:
[-+^*!()]|\s*[^-+^*!()\s][^-+^*!()]*
This reports delimiters, and anything between delimiters including leading and trailing whitespace, but drops tokens consisting of whitespace alone.
A similar expression that also strips leading and trailing whitespace:
[-+^*!()]|[^-+^*!()\s]+(\s+[^-+^*!()\s]+)*
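As a rough sketch of how the whitespace-stripping expression could be plugged into the token-iterator approach from the question: the function name extract_tokens_trimmed and its return-by-value signature are made up for illustration, and the dash inside each character class is escaped here (an assumption made to keep it unambiguous for std::regex; it is meant to match the same characters as the expression above).

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every match of the pattern as one token.
std::vector<std::string> extract_tokens_trimmed(const std::string &expression)
{
    // Either a single delimiter, or a run of non-delimiter words that may
    // contain inner whitespace but no leading or trailing whitespace.
    static const std::regex reg(
        R"([\-+^*!()]|[^\-+^*!()\s]+(\s+[^\-+^*!()\s]+)*)");

    std::vector<std::string> tokens;
    // Submatch 0 selects whole matches; text between matches is skipped,
    // so whitespace-only gaps never produce an empty token.
    std::copy(std::sregex_token_iterator(expression.begin(), expression.end(), reg, 0),
              std::sregex_token_iterator(),
              std::back_inserter(tokens));
    return tokens;
}
```

For the input !aaa + ! a this should yield the five tokens !, aaa, +, ! and a, with no empty string between + and !.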
Inspired by https://stackoverflow.com/a/9436872/4645334 you could solve the problem with:
bool extract_tokens(std::string expression, std::vector<std::string> &tokens) {
std::string token;
for (const auto& c: expression) {
if (c == '^' || c == '-' || c == '*' || c == '+' || c == '!') {
if (token.length() && !std::all_of(token.cbegin(), token.cend(), [](auto c) { return c == ' '; })) tokens.push_back(token);
token.clear();
tokens.emplace_back(1, c);
} else {
token += c;
}
}
if (token.length() && !std::all_of(token.cbegin(), token.cend(), [](auto c) { return c == ' '; })) tokens.push_back(token);
return true;
}
Input:
"!aaa + ! a"
Output:
"!","aaa ","+","!"," a"
I'm trying to create a lexer for a functional language. One of its methods should return the next token of a file on each call.
For example :
func main() {
var MyVar : integer = 3+2;
}
So every time the next method is called, I would like it to return the next token in that sequence. In this case, the tokens would look like this:
func
main
(
)
{
var
MyVar
:
integer
=
3
+
2
;
}
Except that the result I get is not what I expected:
func
main(
)
{
var
MyVar
:
integer
=
3+
2
}
Here is my method:
token_t Lexer::next() {
token_t ret;
std::string token_tmp;
bool IsSimpleQuote = false; // check char --> '...'
bool IsDoubleQuote = false; // check string --> "..."
bool IsComment = false; // check comments --> `...`
bool IterWhile = true;
while (IterWhile) {
bool IsInStc = (IsDoubleQuote || IsSimpleQuote || IsComment);
std::ifstream file_tmp(this->CurrentFilename);
if (this->eof) break;
char chr = this->File.get();
char next = file_tmp.seekg(this->CurrentCharIndex + 1).get();
++this->CurrentCharInCurrentLineIndex;
++this->CurrentCharIndex;
{
if (!IsInStc && !IsComment && chr == '`') IsComment = true; else if (!IsInStc && IsComment && chr == '`') { IsComment = false; continue; }
if (IsComment) continue;
if (!IsInStc && chr == '"') IsDoubleQuote = true;
else if (!IsInStc && chr == '\'') IsSimpleQuote = true;
else if (IsDoubleQuote && chr == '"') IsDoubleQuote = false;
else if (IsSimpleQuote && chr == '\'') IsSimpleQuote = false;
}
if (chr == '\n') {
++this->CurrentLineIndex;
this->CurrentCharInCurrentLineIndex = -1;
}
token_tmp += chr;
if (!IsInStc && IsLangDelim(chr)) IterWhile = false;
}
if (token_tmp.size() > 1 && System::Text::EndsWith(token_tmp, ";") || System::Text::EndsWith(token_tmp, " ")) token_tmp.pop_back();
++this->NbrOfTokens;
location_t pos;
pos.char_pos = this->CurrentCharInCurrentLineIndex;
pos.filename = this->CurrentFilename;
pos.line = this->CurrentLineIndex;
SetToken_t(&ret, token_tmp, TokenList::ToToken(token_tmp), pos);
return ret;
}
Here is the function IsLangDelim :
bool IsLangDelim(char chr) {
return (chr == ' ' || chr == '\t' || TokenList::IsSymbol(CharToString(chr)));
}
TokenList is a namespace that contains the list of tokens, as well as some functions (like IsSymbol in this case).
I have already tried other versions of this method, but the result is almost always the same.
Do you have any idea how to improve this method?
One solution for your problem is to use std::regex. The syntax is a little difficult to understand at first, but once you are used to it you will probably keep using it.
It is also well suited to finding tokens.
The specific criteria can be expressed in the regex string.
For your case I will use: std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
This means:
Look for one or more word characters (that is a word)
Or look for one or more digits (that is an integer number)
Or look for all kind of meaningful operators (Like '+', '-', '{' and so on)
You can extend the regex to cover everything else you are searching for. You can also apply a regex to the result of another regex.
Please see the example below. It produces the output you showed from the input you provided.
And the task you described is then a single statement in main.
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <regex>
// Our test data (raw string) .
std::string testData(
R"#(func main() {
var MyVar : integer = 3+2;
}
)#");
std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
int main(void)
{
std::copy(
std::sregex_token_iterator(testData.begin(), testData.end(), re, 1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n")
);
return 0;
}
You are trying to parse everything in a single loop, which makes the code very complicated. Instead I suggest something like this:
struct token { ... };
struct lexer {
vector<token> tokens;
string source;
unsigned int pos;
bool parse_ident() {
if (!is_alpha(source[pos])) return false;
auto start = pos;
while(pos < source.size() && is_alnum(source[pos])) ++pos;
tokens.push_back({ token_type::ident, source.substr(start, pos - start) });
return true;
}
bool parse_num() { ... }
bool parse_comment() { ... }
...
bool parse_whitespace() { ... }
void parse() {
while(pos < source.size()) {
if (!parse_comment() && !parse_ident() && !parse_num() && ... && !parse_whitespace()) {
throw error{ "unexpected character at position " + std::to_string(pos) };
}
}
}
};
This is the standard structure I use when lexing files in any scripting language I've written. Lexing is usually greedy, so you don't need to bother with regex (which is effective, but slower, unless you use some crazy template-based implementation). Just define your parse_* functions, make sure they return false if they didn't parse a token, and make sure they are called in the correct order.
The order itself usually doesn't matter, but:
operators need to be checked from longest to shortest
a number in the style .123 might be incorrectly recognized as the . operator (so you need to make sure that no digit follows the .)
numbers and identifiers look very much alike, except that identifiers start with a non-digit
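The structure and the ordering rules above can be sketched as one small, runnable function. Everything specific in it is an assumption made for illustration: the name tokenize, the operator table kOperators (which lists "==" only to show the longest-first rule; the real language's operators are not given), and the choice to return plain strings instead of token structs. Comments and quoted literals are left out to keep it short.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical operator table, ordered longest first so that "==" is tried
// before "=".
static const char *const kOperators[] = {
    "==", "(", ")", "{", "}", ":", ";", "=", "+", "-", "*", "/"};

std::vector<std::string> tokenize(const std::string &source)
{
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < source.size()) {
        const unsigned char ch = static_cast<unsigned char>(source[pos]);
        const std::size_t start = pos;
        if (std::isspace(ch)) {
            ++pos;  // whitespace separates tokens but is not a token itself
        } else if (std::isalpha(ch) || ch == '_') {
            // identifier or keyword: starts with a non-digit, then greedy
            while (pos < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[pos])) ||
                    source[pos] == '_'))
                ++pos;
            tokens.push_back(source.substr(start, pos - start));
        } else if (std::isdigit(ch)) {
            // integer literal: greedy run of digits
            while (pos < source.size() &&
                   std::isdigit(static_cast<unsigned char>(source[pos])))
                ++pos;
            tokens.push_back(source.substr(start, pos - start));
        } else {
            // operators and punctuation, longest candidates first
            bool matched = false;
            for (const char *op : kOperators) {
                const std::string candidate(op);
                if (source.compare(pos, candidate.size(), candidate) == 0) {
                    tokens.push_back(candidate);
                    pos += candidate.size();
                    matched = true;
                    break;
                }
            }
            if (!matched)
                throw std::runtime_error("unexpected character at position " +
                                         std::to_string(pos));
        }
    }
    return tokens;
}
```

Because each branch consumes as many characters as it can before the next token starts, main( and 3+ are split at the point where the character class changes, which is where the single-loop version in the question goes wrong.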
How do I write a boolean that checks if a string has only letters, numbers and an underscore?
Assuming String supports iterators, use all_of:
using std::begin;
using std::end;
return std::all_of(begin(String), end(String),
[](char c) { return isalnum(c) || c == '_'; });
A simpler way is to loop over the characters and return false as soon as one of them does not have the property you mentioned.
Code:
bool stringHasOnlyLettersNumbsandUndrscore(std::string const& str)
{
for(int i = 0; i < str.length(); ++i)
{
//Your character in the string does not fulfill the property.
if (!isalnum(str[i]) && str[i] != '_')
{
return false;
}
}
//The whole string fulfills the condition.
return true;
}
bool stringHasOnlyLettersNumbsandUndrscore(std::string const& str)
{
return ( std::all_of(str.begin(), str.end(),
[](char c) { return isalnum(c) || c == '_'; }) &&
(std::count_if(str.begin(), str.end(),
[](char c) { return (c == '_'); }) < 2));
}
Check if each character is a letter, number or underscore.
For C and C++, this should do:
if(!isalnum(a[i]) && a[i]!='_')
cout<<"No";
You will have to include <cctype> (or <ctype.h> in C) for this code to work.
This is just the quickest way that comes to mind; there might be other, more complex and faster ways.
I want to do my work only if the string variable tablolar contains nothing but lowercase letters a-z and ','. What do you suggest?
If the string tablolar is:
"tablo" -> it is ok
"tablo,tablobir,tabloiki,tablouc" -> it is ok
"ta" -> it is ok
But if it is:
"tablo2" -> not ok
"ta546465" -> not ok
"Tablo" -> not ok
"tablo,234,tablobir" -> not ok
"tablo^%&!)=(,tablouc" -> not ok
What I tried was wrong:
for (int z = 0; z < tablolar.size(); z++) {
    if ((tablolar[z] == ',') || (tablolar[z] >= 'a' && tablolar[z] <= 'z')) {
        // do your work here
    }
}
tablolar.find_first_not_of("abcdefghijklmnopqrstuvwxyz,") will return the position of the first invalid character, or std::string::npos if the string is OK.
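Wrapped into a function, the find_first_not_of check might look like the sketch below. The function name fitsOurNeeds is borrowed from the other answers in this thread. Treating an empty string as valid is an assumption, since the question does not say what should happen in that case.

```cpp
#include <string>

// True when every character is a lowercase ASCII letter or a comma.
// An empty string also passes; whether that is acceptable is an assumption.
bool fitsOurNeeds(const std::string &tablolar)
{
    return tablolar.find_first_not_of("abcdefghijklmnopqrstuvwxyz,") ==
           std::string::npos;
}
```

With the examples from the question, "tablo" and "tablo,tablobir,tabloiki,tablouc" pass, while "tablo2" and "Tablo" are rejected.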
bool fitsOurNeeds(const std::string &tablolar) {
for (int z=0; z < tablolar.size(); z++)
if (!((tablolar[z] == ',') || (tablolar[z] >= 'a' && tablolar[z] <= 'z')))
return false;
return true;
}
The C function islower tests for lowercase. So you probably want something along these lines:
#include <algorithm>
#include <cctype> // for islower
bool fitsOurNeeds(std::string const& tabular)
{
return std::all_of(tabular.begin(), tabular.end(),
[](char ch)
{
return islower(ch) || ch == ',';
});
}
I wrote a function in C++ to remove parentheses from a string, but it doesn't always catch them all, for some reason that I'm sure is really simple.
string sanitize(string word)
{
int i = 0;
while(i < word.size())
{
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
}
i++;
}
return word;
}
Sample result:
Input: ((3)8)8)8)8))7
Output: (38888)7
Why is this? I can get around the problem by calling the function on the output (so running the string through twice), but that is clearly not "good" programming. Thanks!
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
}
i++;
If you erase a parenthesis, the next character moves to the index previously occupied by the parenthesis, so it is not checked. Use an else.
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
} else {
i++;
}
while(i < word.size())
{
if(word[i] == '(' || word[i] == ')')
{
word.erase(i,1);
}
i++;
}
When you remove an element the next element is moved to that location. If you want to test it, you will have to avoid incrementing the counter:
while (i < word.size()) {
if (word[i] == '(' || word[i] == ')' ) {
word.erase(i,1);
} else {
++i;
}
}
That can also be done with iterators, but either option is bad. For each parenthesis in the string, all elements that are after it will be copied, which means that your function has quadratic complexity: O(N^2). A much better solution is to use the erase-remove idiom:
s.erase( std::remove_if(s.begin(), s.end(),
[](char ch){ return ch=='(' || ch ==')'; }),
s.end() );
If your compiler does not have support for lambdas you can implement the check as a function object (functor). This algorithm has linear complexity O(N) as the elements that are not removed are copied only once to the final location.
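A sketch of the functor variant mentioned above, for compilers without lambda support. The struct name IsParenthesis is made up for illustration, and the function keeps the sanitize signature from the question.

```cpp
#include <algorithm>
#include <string>

// Function object standing in for the lambda on pre-C++11 compilers.
struct IsParenthesis {
    bool operator()(char ch) const { return ch == '(' || ch == ')'; }
};

std::string sanitize(std::string word)
{
    // remove_if moves every kept character forward exactly once and returns
    // the new logical end; erase then trims the leftover tail.
    word.erase(std::remove_if(word.begin(), word.end(), IsParenthesis()),
               word.end());
    return word;
}
```

Called on the sample input ((3)8)8)8)8))7, this returns 388887 in a single pass.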
It's failing because you're incrementing the index in all cases. You should only do that if you're not deleting the character, since the deletion shifts all the characters beyond that point back by one.
In other words, you'll have this problem wherever you have two or more consecutive characters to delete. Rather than deleting them both, it "collapses" the two into one.
Running it through your function twice will work on that particular input string but you'll still get into trouble with something like "((((pax))))" since the first call will collapse it to "((pax))" and the second will give you "(pax)".
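To make the collapsing effect concrete, here is the loop from the question again, kept deliberately buggy and renamed to sanitize_buggy (a made-up name, used so it is not mistaken for a fix):

```cpp
#include <cstddef>
#include <string>

// Reproduces the loop from the question: the index advances even after an
// erase, so the character that slides into position i is never examined.
std::string sanitize_buggy(std::string word)
{
    std::size_t i = 0;
    while (i < word.size()) {
        if (word[i] == '(' || word[i] == ')')
            word.erase(i, 1);
        i++;
    }
    return word;
}
```

Tracing it by hand, one call on "((((pax))))" gives "((pax))" and a second call gives "(pax)", so every pass removes only about half of each run of parentheses.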
One solution is to not advance the index when deleting a character:
std::string sanitize (std::string word) {
int i = 0;
while (i < word.size()) {
if(word[i] == '(' || word[i] == ')') {
word.erase(i,1);
continue;
}
i++;
}
return word;
}
However, I'd be using the facilities of the language a little more intelligently. C++ strings already have the capability to search for a selection of characters, one that's possibly far more optimised than a user loop. So you can use a much simpler approach:
std::string sanitize (std::string word) {
std::string::size_type spos = 0;
while ((spos = word.find_first_of ("()", spos)) != std::string::npos)
word.erase (spos, 1);
return word;
}
You can see this in action in the following complete program:
#include <iostream>
#include <string>
std::string sanitize (std::string word) {
std::string::size_type i = 0;
while ((i = word.find_first_of ("()", i)) != std::string::npos)
word.erase (i, 1);
return word;
}
int main (void) {
std::string s = "((3)8)8)8)8))7 ((((pax))))";
s = sanitize (s);
std::cout << s << '\n';
return 0;
}
which outputs:
388887 pax
Why not just use strtok and a temporary string?
#include <cstring>
#include <string>
std::string sanitize(std::string word)
{
    std::string rVal;
    // word is a by-value copy, so strtok may safely modify its buffer.
    // strtok skips leading delimiters, so the first token must be kept too.
    for (char *temp = std::strtok(&word[0], "()"); temp != 0; temp = std::strtok(0, "()"))
    {
        rVal += temp;
    }
    return rVal;
}