Regular expression: Numeric + alphanum + special characters Only - c++

I am trying to build a regular expression that matches patterns that MUST contain both numeric and alphabetic characters alongside special characters.
I found an answer that deals with this type of regular expression, but without the special characters.
How can I include the special characters ^$=()_"'[\# in the regular expression?
^([0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+)[0-9a-zA-Z]*$
Can you explain it a little, please?
Regex tester : http://regexlib.com/RETester.aspx
Thank you.
As a solution, I found this regular expression: ^(?=.*\d)(?=.*[a-zA-Z]).{4,8}$
Maybe it can help you.

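Why so complicated?!
#include <cctype>
#include <string>

enum { numeric = 1, alpha = 2, special = 4 };

// The special characters listed in the question.
bool is_special(char c) {
    return std::string("^$=()_\"'[\\#").find(c) != std::string::npos;
}

bool check(const std::string& s) {
    int result = 0;  // bit mask of the character classes seen so far
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isdigit(c)) result |= numeric;
        if (std::isalpha(c)) result |= alpha;
        if (is_special(s[i])) result |= special;
        if (result == (numeric | alpha | special))
            return true;
    }
    return false;
}
A little more typing, but less brain damage.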

Your regex is formed of two parts. Together they must match a complete line, as they sit between start-of-line (^) and end-of-line ($):
([0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+)
This first part is formed of two regexes or'd (|) together. The first regex is one or more digits ([0-9]+) followed by one or more letters ([a-zA-Z]+). It is or'd with the opposite case: one or more letters followed by one or more digits.
The second part says that the above is followed by zero or more letters or digits ([0-9a-zA-Z]*).
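None of the answers above shows the special characters being added. One possible sketch, using the look-ahead idea from the question's own follow-up and std::regex with its default ECMAScript grammar, is given here. The function name matches_with_specials is hypothetical, and the sketch assumes that "only" means no characters outside the three classes are allowed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if s consists only of digits, letters and the
// special characters ^$=()_"'[\# and contains at least one of each class.
bool matches_with_specials(const std::string& s) {
    // Three look-aheads require one digit, one letter and one special
    // character; the final class restricts every character of the string.
    static const std::regex re(
        R"re(^(?=.*[0-9])(?=.*[a-zA-Z])(?=.*[$^=()_"'\[\\#])[0-9a-zA-Z$^=()_"'\[\\#]+$)re");
    return std::regex_match(s, re);
}
```

Inside the character class only [ and \ are escaped; ^ is literal because it is not the first character of the class.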

Related

Regex: Find a word that consists of certain characters

I have a list of dictionary words. I would like to find any word that consists of some or all of the characters of a source word, in any order:
For Example:
Characters (source word) to look for : stainless
Found Words : stainless, stain, net, ten, less, sail, sale, tale, tales, ants, etc.
Also if a letter is found once in the source word it can't be repeated in the found word
Unacceptable words to find : tent (t is repeated), tall (l is repeated) , etc.
Acceptable words to find : less (s is already repeated in the source word), etc.
You could take this approach:
Match any sequence of characters that are in the search word, requiring that the match is a word (word-boundaries)
Prohibit that a certain character occurs more often than it is present in the search word, using a negative look-ahead. Do this for every character that is in the search word.
For the given example the regular expression would be:
(?!(\S*s){4}|(\S*t){2}|(\S*a){2}|(\S*i){2}|(\S*n){2}|(\S*l){2}|(\S*e){2})\b[stainless]+\b
The biggest part of the pattern is the negative look-ahead. For example:
(\S*s){4} matches when a single word contains the letter 's' four times.
(?! | ) combines these patterns as alternatives inside a negative look-ahead, so that none of them may match.
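To see how the pattern behaves on individual words, here is a small sketch with std::regex (default ECMAScript grammar, which supports look-ahead, \S and \b). The function name fits_stainless is hypothetical, and the pattern is hard-coded for the search word "stainless".

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `word` can be spelled with the letters of
// "stainless", using each letter no more often than it occurs there.
bool fits_stainless(const std::string& word) {
    static const std::regex re(
        R"re((?!(\S*s){4}|(\S*t){2}|(\S*a){2}|(\S*i){2}|(\S*n){2}|(\S*l){2}|(\S*e){2})\b[stainless]+\b)re");
    // regex_match requires the whole word to match, not just a part of it.
    return std::regex_match(word, re);
}
```

With this, "stain" and "less" are accepted, while "tent" and "tall" are rejected by the look-ahead because 't' and 'l' occur only once in the search word.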
Automation
It is clear that making such a regular expression for a given word needs some work, so that is where you could use some automation. Notepad++ cannot help with that, but in a programming environment it is possible. Here is a little snippet in JavaScript that will give you the regular expression that corresponds to a given search word:
function regClassEscape(s) {
    // Escape "]" and "^" and "-", which are special inside a character class:
    return s.replace(/[\]^-]/g, "\\$&");
}
function buildRegex(searchWord) {
// get frequency of each letter:
let freq = {};
for (let ch of searchWord) {
ch = regClassEscape(ch);
freq[ch] = (freq[ch] ?? 0) + 1;
}
// Produce negative options (too many occurrences)
const forbidden = Object.entries(freq).map(([ch, count]) =>
"(\\S*[" + ch + "]){" + (count + 1) + "}"
).join("|");
// Produce character set
const allowed = Object.keys(freq).join("");
return "(?!" + forbidden + ")\\b[" + allowed + "]+\\b";
}
// I/O management
const [input, output] = document.querySelectorAll("input,div");
input.addEventListener("input", refresh);
function refresh() {
if (/\s/.test(input.value)) {
output.textContent = "Input should have no white space!";
} else {
output.textContent = buildRegex(input.value);
}
}
refresh();
input { width: 100% }
Search word:<br>
<input value="stainless">
Regular expression:
<div></div>

C++11 regex to tokenize Mathematical Expression

I have the following code to tokenize a string of the format: (1+2)/((8))-(100*34):
I'd like to throw an error to the user if they use an operator or character that isn't part of my regex.
e.g. if the user enters 3^4 or x-6
Is there a way to negate my regex, search for it and if it is true throw the error?
Can the regex expression be improved?
// Using C++11 regex to tokenize the input string
// [0-9]+ = one or more digits
// or [\\-\\+\\(\\)\\/\\*] = "-" or "+" or "/" or "*" or "(" or ")"
std::regex e("[0-9]+|[\\-\\+\\(\\)\\/\\*]");
std::sregex_iterator rend;
std::sregex_iterator a( infixExpression.begin(), infixExpression.end(), e );
queue<string> infixQueue;
while (a!=rend) {
infixQueue.push(a->str());
++a;
}
return infixQueue;
-Thanks
You can search the string with the expression [^0-9()+\-*/], written as the C++ string "[^0-9()+\\-*/]". It finds any character that is NOT a digit, a round bracket, a plus sign, a minus sign (really a hyphen), an asterisk or a slash.
A search with this expression should find nothing; if it does find something, the string contains an unsupported character such as ^ or x.
[...] is a positive character class, which means: find a character that is one of the characters in the square brackets.
[^...] is a negative character class, which means: find a character that is NOT one of the characters in the square brackets.
The only characters that must be escaped within square brackets to be interpreted literally are ], \, ^ (when it is the first character) and -, although - need not be escaped if it is the first or last character in the list. It is nevertheless better to always escape - within square brackets, because this makes it unambiguous that the hyphen is a literal character and does not mean "from x to z".
Of course, this expression does not check for missing closing round brackets. Formula parsers, unlike compilers or script interpreters, often do not require a closing parenthesis for every opening one, because it is not needed to calculate the value of the entered formula.
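A minimal sketch of that check in C++11 could look like this. The function names is_supported and validate_expression are hypothetical, and throwing std::invalid_argument is just one way to report the error.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if expr contains only digits, round brackets
// and the operators + - * /.
bool is_supported(const std::string& expr) {
    static const std::regex unsupported("[^0-9()+\\-*/]");
    return !std::regex_search(expr, unsupported);
}

// Hypothetical helper: throws if expr contains an unsupported character.
void validate_expression(const std::string& expr) {
    static const std::regex unsupported("[^0-9()+\\-*/]");
    std::smatch m;
    if (std::regex_search(expr, m, unsupported)) {
        throw std::invalid_argument(
            "unsupported character '" + m.str() + "' in expression");
    }
}
```

Calling validate_expression before tokenizing rejects inputs such as 3^4 or x-6 and lets (1+2)/((8))-(100*34) through.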
An answer has been given already, but perhaps someone might need this:
[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\(\\)\\/\\*]
This regex separates floats, integers and arithmetic operators.
Here's the trick:
[0-9]?([0-9]*[.])?[0-9]+ -> if the number contains a decimal point, grab the digits before the point, the point itself and the digits that follow it; otherwise just grab the digits.
Sorry if my answer isn't clear. I just learned regex and found this solution on my own by trial and error.
Here's the code (it takes a mathematical expression and splits all numbers and operators into a vector).
NOTE: I don't know whether it handles whitespace; the expressions I worked with had none. For example, 4+2*(3+1) is separated nicely, but I haven't tried input with whitespace.
/* Separate every int, float or operator into its own string using a regular expression and store it in the untokenize vector */
string infix; // The string to be parsed (the arithmetic expression, if you will)
vector<string> untokenize;
std::regex words_regex("[0-9]?([0-9]*[.])?[0-9]+|[\\-\\+\\(\\)\\/\\*]");
auto words_begin = std::sregex_iterator(infix.begin(), infix.end(), words_regex);
auto words_end = std::sregex_iterator();
for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
    cout << (*i).str() << endl;
    untokenize.push_back((*i).str());
}
Output:
(
1
+
2
)
/
(
(
8
)
)
-
(
100
*
34
)

Reg expression validate / \ # & characters

I've been learning how regular expressions work, which is very tricky for me. I would like to validate an input field against the characters below. Basically, if the string contains any of these characters, alert('bad chars').
/
\
#
&
I found this code, but when I change it around doesn't seem to work. How can I alter this code to meet my needs?
var str = $(this).val();
if(/^[a-zA-Z0-9- ]*$/.test(str) == false) {
alert('bad');
return false;
} else {
alert('good');
}
/^[a-zA-Z0-9- ]*$/ means the following:
^ the string MUST start here
[a-zA-Z0-9- ] a letter between a and z upper or lower case, a number between 0 and 9, dashes (-) and spaces.
* repeated 0 or more times
$ the string must end here.
In the case of "any character but", you can use ^ like so: /^[^\/\\#&]*$/. If this matches, then the string doesn't contain any of those characters. A ^ right after a [ means: match anything that isn't one of the following characters.
You could just try the following:
if (/[\\\/#&]/.test(str)) {
    alert('bad');
    return false;
} else {
    alert('good');
}
NOTE: I'm not 100% sure which characters need to be escaped in JavaScript vs. .NET regular expressions, but basically I'm saying: if your string contains any of the characters \, /, # or &, then alert 'bad'.

use regular expression to find and replace but only every 3 characters for DNA sequence

Is it possible to do a find/replace using regular expressions on a string of DNA such that it only considers 3 characters (a codon of DNA) at a time?
For example, I would like the regular expression to see this:
dna="AAACCCTTTGGG"
as this:
AAA CCC TTT GGG
If I use the regular expressions right now and the expression was
Regex.Replace(dna,"ACC","AAA") it would find a match, but in this case of looking at 3 characters at a time there would be no match.
Is this possible?
Why use a regex? Try this instead, which is probably more efficient to boot:
// Requires: using System; using System.Text;
public string DnaReplaceCodon(string input, string match, string replace) {
    if (match.Length != 3 || replace.Length != 3)
        throw new ArgumentOutOfRangeException();
    var output = new StringBuilder(input.Length);
    int i = 0;
    // Walk the sequence one codon (three letters) at a time.
    while (i + 2 < input.Length) {
        if (input[i] == match[0] && input[i+1] == match[1] && input[i+2] == match[2]) {
            output.Append(replace);
        } else {
            output.Append(input[i]);
            output.Append(input[i+1]);
            output.Append(input[i+2]);
        }
        i += 3;
    }
    // Pick up trailing letters.
    while (i < input.Length) output.Append(input[i++]);
    return output.ToString();
}
Solution
It is possible to do this with regex. Assuming the input is valid (contains only A, T, G, C):
Regex.Replace(input, @"\G((?:.{3})*?)" + codon, "$1" + replacement);
If the input is not guaranteed to be valid, you can just do a check with the regex ^[ATCG]*$ (allow non-multiple of 3) or ^([ATCG]{3})*$ (sequence must be multiple of 3). It doesn't make sense to operate on invalid input anyway.
Explanation
The construction above works for any codon. For the sake of explanation, let the codon be AAA. The regex will be \G((?:.{3})*?)AAA.
The whole regex actually matches the shortest substring that ends with the codon to be replaced.
\G # Must be at beginning of the string, or where last match left off
((?:.{3})*?) # Match any number of codons, lazily. The text is also captured.
AAA # The codon we want to replace
We make sure the matches only start at positions whose index is a multiple of 3 with:
\G, which asserts that the match starts where the previous match left off (or at the beginning of the string)
And the fact that the pattern ((?:.{3})*?)AAA can only match a sequence whose length is a multiple of 3.
Due to the lazy quantifier, we can be sure that in each match, the part before the codon to be replaced (matched by the ((?:.{3})*?) part) does not contain the codon.
In the replacement, we put back the part before the codon (which is captured in capturing group 1 and can be referred to with $1), followed by the replacement codon.
NOTE
As explained in the comment, the following is not a good solution! I leave it in so that others will not fall for the same mistake
You can usually find out where a match starts and ends via m.start() and m.end(). If m.start() % 3 == 0 you found a relevant match.

Detecting text like "#smth" with RegExp (with some more terms)

I'm really bad at regular expressions, so please help me.
I need to find any pieces like #text in a string.
text mustn't contain any space characters (\\s). Its length must be at least 2 characters ({2,}), and it must contain at least 1 letter (QChar::isLetter()).
Examples:
#c, #1, #123456, #123 456, #123_456 are incorrect
#cc, #text, #text123, #123text are correct
I use QRegExp.
QRegExp rx("#(\\S+[A-Za-z]\\S*|\\S*[A-Za-z]\\S+)$");
bool result = (rx.indexIn(str) == 0);
rx either finds one or more non-whitespace characters followed by a letter and any number of non-whitespace characters, or any number of non-whitespace characters followed by a letter and at least one non-whitespace character.
Styne666 gave the right regex.
Here is a little Perl script which is trying to match its first argument with this regex:
#!/usr/bin/env perl
use strict;
use warnings;
my $arg = shift;
if ($arg =~ m/(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,})/) {
print "$1 MATCHES THE PATTERN!\n";
} else {
print "NO MATCH\n";
}
Perl is always great to quickly test your regular expressions.
Now, your question is a bit different. You want to find all the matching substrings in your text string, and you want to do it in C++/Qt. Here is what I could come up with in a couple of minutes:
#include <QtCore/QCoreApplication>
#include <QRegExp>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
QString str = argv[1];
QRegExp rx("[\\s]?(\\#(?=\\d*[a-zA-Z])[a-zA-Z\\d]{2,})\\b");
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1)
{
QString token = rx.cap(1);
cout << token.toStdString().c_str() << endl;
pos += rx.matchedLength();
}
return 0;
}
To make my test I feed it an input like this (making a long string just one command line argument):
peter@ubuntu01$ qt-regexp "#hjhj 4324 fdsafdsa #33e #22"
And it matches only two words: #hjhj and #33e.
Hope it helps.
The shortest I could come up with (which should work, but I haven't tested extensively) is:
QRegExp("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");
Which matches:
^ the start of the string
# a literal hash character
(?= then look ahead (but don't match)
[0-9]* zero or more latin numbers
[A-Za-z] a single upper- or lower-case latin letter
)
[A-Za-z0-9]{2,} then match at least two characters which may be upper- or lower-case latin letters or latin numbers
$ then find and consume the end of the line
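To check this pattern against the examples from the question, here is a small sketch. It uses std::regex (default ECMAScript grammar) instead of QRegExp, purely as an assumption to keep the example free of Qt, and the function name is_valid_tag is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the QRegExp pattern above with std::regex.
bool is_valid_tag(const std::string& s) {
    // '#', then a look-ahead requiring a letter after optional digits,
    // then at least two Latin letters or digits up to the end of the string.
    static const std::regex re("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");
    return std::regex_match(s, re);
}
```

It accepts #cc, #text, #text123 and #123text, and rejects #c, #1, #123456, #123 456 and #123_456, in line with the question's lists.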
Technically speaking though this is still wrong. It only matches latin letters and numbers. Replacing a few bits gives you:
QRegExp("^#(?=\\d*[^\\d\\s])\\w{2,}$");
This should work for non-latin letters and numbers but this is totally untested. Have a quick read of the QRegExp class reference for an explanation of each escaped group.
And then to match within larger strings of text (again, untested):
QRegExp("\\b#(?=\\d*[^\\d\\s])\\w{2,}\\b");
A useful tool is the Regular Expressions Example which comes with the SDK.
Use this regular expression. Hopefully it will solve your problem.
^([#(a-zA-Z)]+[(a-zA-Z0-9)]+)*(#[0-9]+[(a-zA-Z)]+[(a-zA-Z0-9)]*)*$