Java regex: match words with numbers and special characters

I have this regex:
"([ ]?[a-zA-Z]{3,})"
but when I try to match these words:
"sol perro idea \ncaballo\ndo7\ntres\n tr_es\n8cuatro\ncinco.\n3pesos\n$dollar$\nccc\ncoton\nH7T\n chien#\na-z\n"
I get these matches:
sol
perro
idea
caballo
tres
cuatro
cinco
pesos
dollar
ccc
coton
chien
How should I change my regex so that 8cuatro\n, 3pesos\n, $dollar$\n and chien#\n are not matched? Thanks a lot.

If you don't want to match the leading space, omit it from the pattern; if you only need the matches, you can omit the capture group as well.
You can assert a whitespace boundary to the left, and on the right a word boundary followed by a negative lookahead asserting that the next character is not #.
(?<!\S)[a-zA-Z]{3,}\b(?!#)
In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String regex = "(?<!\\S)[a-zA-Z]{3,}\\b(?!#)";
        String string = "sol perro idea \n"
                + "caballo\n"
                + "do7\n"
                + "tres\n"
                + " tr_es\n"
                + "8cuatro\n"
                + "cinco.\n"
                + "3pesos\n"
                + "$dollar$\n"
                + "ccc\n"
                + "coton\n"
                + "H7T\n"
                + " chien#\n"
                + "a-z\n";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        // Print each full match on its own line.
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
Output
sol
perro
idea
caballo
tres
cinco
ccc
coton
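To see why the lookbehind matters, the original pattern and the suggested one can be run side by side. This is only a minimal sketch: the class name WordMatchDemo, the helper findAll and the shortened sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatchDemo {
    // Hypothetical helper: collects every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "8cuatro\n$dollar$\n chien#\ntres\n";
        // The original pattern may start anywhere, so it also matches
        // the letters inside "8cuatro" and "$dollar$".
        System.out.println(findAll("[ ]?[a-zA-Z]{3,}", input));
        // The lookbehind (?<!\S) only allows a match after whitespace
        // or at the start of the input; (?!#) rejects "chien#".
        System.out.println(findAll("(?<!\\S)[a-zA-Z]{3,}\\b(?!#)", input));
    }
}
```

With the second pattern only tres should survive from this sample, because every other candidate is either preceded by a non-whitespace character or followed by #.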

Related

Remove only non-leading and non-trailing spaces from a string in Ruby?

I'm trying to write a Ruby method that will return true only if the input is a valid phone number, which means, among other rules, it can have spaces and/or dashes between the digits, but not before or after the digits.
In a sense, I need a method that does the opposite of String#strip! (remove all spaces except leading and trailing spaces), plus the same for dashes.
I've tried using String#gsub!, but when I try to match a space or a dash between digits, then it replaces the digits as well as the space/dash.
Here's an example of the code I'm using to remove spaces. I figure once I know how to do that, it will be the same story with the dashes.
def valid_phone_number?(number)
  phone_number_pattern = /^0[^0]\d{8}$/
  # remove spaces
  number.gsub!(/\d\s+\d/, "")
  return number.match?(phone_number_pattern)
end
What happens is if I call the method with the following input:
valid_phone_number?(" 09 777 55 888 ")
I get false because line 5 transforms the number into " 0788 ", i.e. it gets rid of the digits around the spaces as well as the spaces. What I want it to do is just to get rid of the inner spaces, so as to produce " 0977755888 ".
I've tried
number.gsub!(/\d(\s+)\d/, "") and number.gsub!(/\d(\s+)\d/) { |match| "" } to no avail.
Thank you!!
If you want to return a boolean, you might for example use a pattern that accepts leading and trailing spaces, and matches 10 digits (as in your example data) where there can be optional spaces or hyphens in between.
^ *\d(?:[ -]*\d){9} *$
For example
def valid_phone_number?(number)
  phone_number_pattern = /^ *\d(?:[ -]*\d){9} *$/
  return number.match?(phone_number_pattern)
end
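The same boolean check can be sketched in Java for comparison. The class name PhoneNumberCheck and the method isValidPhoneNumber are hypothetical; the pattern is the one from the Ruby method, minus the anchors, because Matcher.matches() already requires the whole input to match.

```java
import java.util.regex.Pattern;

class PhoneNumberCheck {
    // 10 digits, optional spaces or hyphens between them,
    // optional spaces before the first and after the last digit.
    private static final Pattern PHONE =
            Pattern.compile(" *\\d(?:[ -]*\\d){9} *");

    static boolean isValidPhoneNumber(String number) {
        // matches() succeeds only if the entire string fits the pattern.
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhoneNumber(" 09 777 55 888 "));
        System.out.println(isValidPhoneNumber("-0977755888"));
    }
}
```

Note that this sketch, like the Ruby version above, does not enforce the 0[^0] prefix from the question's original pattern; that rule would have to be added separately if it is required.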
To remove spaces and hyphens between digits, try:
(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)
(?: - Open non-capture group;
\d+ - Match 1+ digits;
| - Or;
\G(?!^)\d+ - Assert the position at the end of the previous match, but not at the start of a line, then match 1+ digits;
)\K - Close non-capture group and reset the starting point of the reported match;
[- ]+ - Match 1+ spaces/hyphens;
(?=\d) - Assert that the position is followed by a digit.
p " 09 777 55 888 ".gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, '')
Prints: " 0977755888 "
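Java's java.util.regex has no \K, so a direct port of this pattern is not possible there. A minimal sketch of the same effect, assuming a lookbehind is an acceptable substitute (this is a swapped-in technique, not the pattern above); the class and method names are hypothetical.

```java
class InnerSeparatorRemover {
    static String removeInnerSeparators(String s) {
        // Remove a run of spaces/hyphens only when a digit sits
        // directly before it (lookbehind) and directly after it (lookahead).
        return s.replaceAll("(?<=\\d)[- ]+(?=\\d)", "");
    }

    public static void main(String[] args) {
        // Leading and trailing spaces are kept, inner ones are removed.
        System.out.println("[" + removeInnerSeparators(" 09 777 55 888 ") + "]");
    }
}
```

Because both lookarounds are zero-width, the digits themselves are never part of the match, which is exactly what went wrong with /\d\s+\d/ in the question.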
Using a very simple regex (/\d/ tests for a digit):
str = " 09 777 55 888 "
r = str.index(/\d/)..str.rindex(/\d/)
str[r] = str[r].delete(" -")
p str # => " 0977755888 "
Passing a block to gsub is another option; the capture groups are available as globals:
>> str = " 09 777 55 888 "
# simple, easy to understand
>> str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/){ "#$1#{$2.delete('- ')}#$3" }
=> " 0977755888 "
# a different take on @steenslag's answer, to avoid using range.
>> s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s
=> " 0977755888 "
Benchmark, not that it matters that much:
require 'benchmark'
n = 1_000_000
puts(Benchmark.bmbm do |x|
  # just a match
  x.report("match") { n.times { str.match(/^ *\d(?:[ -]*\d){9} *$/) } }
  # use regex in []=
  x.report("[//]=") { n.times { s = str.dup; s[/^\s+([\d\s-]+?)\s+$/, 1] = s.delete("- "); s } }
  # use range in []=
  x.report("[..]=") { n.times { s = str.dup; r = s.index(/\d/)..s.rindex(/\d/); s[r] = s[r].delete(" -"); s } }
  # block in gsub
  x.report("block") { n.times { str.gsub(/(^\s+)([\d\s-]+?)(\s+$)/) { "#$1#{$2.delete('- ')}#$3" } } }
  # long regex
  x.report("regex") { n.times { str.gsub(/(?:\d+|\G(?!^)\d+)\K[- ]+(?=\d)/, "") } }
end)
Rehearsal -----------------------------------------
match   0.997458   0.000004   0.997462 (  0.998003)
[//]=   1.822698   0.003983   1.826681 (  1.827574)
[..]=   3.095630   0.007955   3.103585 (  3.105489)
block   3.515401   0.003982   3.519383 (  3.521392)
regex   4.761748   0.007967   4.769715 (  4.772972)
------------------------------- total: 14.216826sec
            user     system      total        real
match   1.031670   0.000000   1.031670 (  1.032347)
[//]=   1.859028   0.000000   1.859028 (  1.860013)
[..]=   3.074159   0.003978   3.078137 (  3.079825)
block   3.751532   0.011982   3.763514 (  3.765673)
regex   4.634857   0.003972   4.638829 (  4.641259)

RegEx to match with single occurrence of dash anywhere in [A-Z0-9]+ with total occurrence of 20 chars

I couldn't figure out a regex that matches a single occurrence of a dash anywhere in [A-Z0-9]+ with a maximum length of 20 characters, i.e. the - and the [A-Z0-9] characters together are at most 20 chars.
This is the closest pattern I could get, but it didn't work:
([A-Z0-9]{1,19}|\-{1})
Why use a regex, especially a single regex? These conditions are much easier to check separately.
For example, using Perl:
if (length($str) <= 20 && $str =~ /\A[A-Z0-9]*-[A-Z0-9]*\z/)
Another option is to use a positive lookahead and assert the length to 1 - 20 chars:
^(?=.{1,20}$)[A-Z0-9]*-[A-Z0-9]*$
Depending on the tool or language, you may prefer anchors other than ^ and $ (for example \A and \z, where supported) so that the pattern matches the start and end of the whole string rather than of a line.
For example:
let pattern = /^(?=.{1,20}$)[A-Z0-9]*-[A-Z0-9]*$/;
[
  "AAAAAAAAAA-AAAAAAAAA",
  "-",
  "A-A",
  "-A",
  "A-",
  "A",
  "AAAAAAAAAAA-AAAAAAAAA",
  "AAAAAAAAAAAAAAAAAAAA",
].forEach(s => {
  if (pattern.test(s)) {
    console.log("Match: '" + s + "' (Nr of chars: " + s.length + ")");
  } else {
    console.log("No match: '" + s + "' (Nr of chars: " + s.length + ")");
  }
});
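Both suggestions (separate checks, and a single regex with a lookahead) can be compared in a short Java sketch. The class name SingleDashCheck and its two methods are hypothetical, and the sample strings are chosen for illustration only.

```java
import java.util.regex.Pattern;

class SingleDashCheck {
    // Exactly one dash, with any number of [A-Z0-9] on either side.
    private static final Pattern ONE_DASH =
            Pattern.compile("[A-Z0-9]*-[A-Z0-9]*");
    // Same shape, with the length limit folded in as a lookahead.
    private static final Pattern WITH_LOOKAHEAD =
            Pattern.compile("(?=.{1,20}$)[A-Z0-9]*-[A-Z0-9]*");

    // Approach 1: check the length and the shape separately.
    static boolean checkSeparately(String s) {
        return s.length() <= 20 && ONE_DASH.matcher(s).matches();
    }

    // Approach 2: let the lookahead enforce the length.
    static boolean checkWithLookahead(String s) {
        return WITH_LOOKAHEAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ABC-123", "ABC123", "A-B-C", "-",
                "ABCDEFGHIJKLMNOPQRS-", "ABCDEFGHIJKLMNOPQRST-"};
        for (String s : samples) {
            System.out.println(s + " (" + s.length() + " chars): "
                    + checkSeparately(s) + " / " + checkWithLookahead(s));
        }
    }
}
```

The two methods should agree on every input; the separate check is arguably easier to read, while the lookahead version is useful when only a single pattern can be supplied.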

Qt C++ QRegExp parse string

I have the string str. I want to get two strings ('+' and '-'):
QString str = "+asdf+zxcv-tyupo+qwerty-yyuu oo+llad dd ff";
// I need these two strings:
// 1. For '+': asdf,zxcv,qwerty,llad dd ff
// 2. For '-': tyupo,yyuu oo
QRegExp rx("[\\+\\-](\\w+)");
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
    qDebug() << rx.cap(0);
    pos += rx.matchedLength();
}
Output I need:
"+asdf"
"+zxcv"
"-tyupo"
"+qwerty"
"-yyuu oo"
"+llad dd ff"
Output I get:
"+asdf"
"+zxcv"
"-tyupo"
"+qwerty"
"-yyuu"
"+llad"
If I replace \\w by .* the output is:
"+asdf+zxcv-tyupo+qwerty-yyuu oo+llad dd ff"
You can use the following regex:
[+-]([^-+]+)
The regex breakdown:
[+-] - either a + or -
([^-+]+) - a capturing group matching 1 or more symbols other than - and +.
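As an illustration of how this pattern yields the two comma-separated strings the question asks for, here is a minimal Java sketch (not Qt code). It assumes the sign is captured in an extra group so tokens can be sorted by it; the class name SignedTokenSplitter and the method tokensFor are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedTokenSplitter {
    // Group 1 is the sign, group 2 is everything up to the next sign.
    private static final Pattern TOKEN = Pattern.compile("([+-])([^+-]+)");

    // Returns the tokens that follow the given sign, joined with commas.
    static String tokensFor(char sign, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1).charAt(0) == sign) {
                tokens.add(m.group(2));
            }
        }
        return String.join(",", tokens);
    }

    public static void main(String[] args) {
        String str = "+asdf+zxcv-tyupo+qwerty-yyuu oo+llad dd ff";
        System.out.println("For '+': " + tokensFor('+', str));
        System.out.println("For '-': " + tokensFor('-', str));
    }
}
```

Because [^+-]+ consumes everything that is not a sign, tokens containing spaces such as "llad dd ff" stay in one piece.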
Your regexp is excessive:
[\\+\\-](\\w+)
\______/\____/
   ^      ^--- one or more word characters (letters, digits, underscore)
   ^--- '+' or '-' sign
So what you are matching is the +/- sign and the word that follows it directly; \w does not match spaces, so the match stops at the first space. If you want to match only the +/- signs, use [+-] as the regular expression.
EDIT:
To get the strings including the spaces, you need
QRegExp rx("[+-](\\w|\\s)+");

String Replacing in Regex

I am trying to replace text in a string using a regex. I accomplished it in C# using the same pattern, but in Swift it's not working as needed.
Here is my code:
var pattern = "\\d(\\()*[x]"
let oldString = "2x + 3 + x2 +2(x)"
let newString = oldString.stringByReplacingOccurrencesOfString(pattern, withString:"*" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
print(newString)
What I want after replacement is :
"2*x + 3 +x2 + 2*(x)"
What I am getting is :
"* + 3 + x2 +*)"
Try this:
(?<=\d)(?=x)|(?<=\d)(?=\()
This pattern does not match any characters in the given string; it matches zero-width positions between characters.
For example, (?<=\d)(?=x) matches the position between a digit and 'x'.
(?<= starts a lookbehind assertion and (?= starts a lookahead assertion.
(?<=\d)(?=\() matches the position between a digit and '('.
So the pattern before escaping:
(?<=\d)(?=x)|(?<=\d)(?=\()
The pattern as a Swift string literal, where only the backslashes are escaped (the parentheses belong to the lookarounds and must stay unescaped):
(?<=\\d)(?=x)|(?<=\\d)(?=\\()
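The zero-width idea can be tried out quickly in Java, whose regex engine also supports these lookarounds. This is a minimal sketch rather than Swift code; the class name ImplicitMultiplication and the method insertStars are hypothetical.

```java
class ImplicitMultiplication {
    static String insertStars(String expr) {
        // Each match is an empty position between a digit and 'x' or '(',
        // so "*" is inserted there and no existing character is replaced.
        return expr.replaceAll("(?<=\\d)(?=x)|(?<=\\d)(?=\\()", "*");
    }

    public static void main(String[] args) {
        System.out.println(insertStars("2x + 3 + x2 +2(x)"));
    }
}
```

In the Java string literal only the backslashes are doubled; the parentheses of the lookarounds stay unescaped, and only the literal '(' gets a regex escape.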

Regular expression to limit number of digits

I am trying to write a regular expression that matches only qtr1, qtr2, qtr3 and qtr4, using the regex [q|qtr|qtrs|quarter]+[1-4]. The problem is that if I ask something like "Ficoscore for Q21 2005", a space is added between Q and 21, i.e. "Ficoscore for Q 21 2005", which is not valid.
String regEx = "([q|qtr|qtrs|quarter]+[1-4])";
Pattern pattern = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(userQuerySentence);
System.out.println(matcher.matches());
while (matcher.find()) {
    String quarterString = matcher.group();
    userQuerySentence = userQuerySentence.replaceAll(quarterString,
            quarterString.substring(0, quarterString.length() - 1) + " "
                    + quarterString.substring(quarterString.length() - 1));
}
[q|qtr|qtrs|quarter] is a character class, which matches any single one of the listed characters. I guess you want the alternation (q|qtr|qtrs|quarter):
String regEx = "(?i)\\b((?:q(?:trs?|uarter)?)[1-4])\\b";
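A minimal sketch of how this pattern behaves on a few inputs, including the "Q21" case from the question. The class name QuarterFinder, the method findQuarters and the sample sentences are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuarterFinder {
    private static final Pattern QUARTER =
            Pattern.compile("(?i)\\b((?:q(?:trs?|uarter)?)[1-4])\\b");

    // Collects every quarter token (q1, qtr2, qtrs3, quarter4, ...) in the sentence.
    static List<String> findQuarters(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = QUARTER.matcher(sentence);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The trailing \b rejects "Q21": after "Q2" comes another digit,
        // so there is no word boundary at that position.
        System.out.println(findQuarters("Ficoscore for Q21 2005"));
        System.out.println(findQuarters("Sales for qtr1, QTRS2 and Quarter4"));
    }
}
```

The word boundaries on both sides are what keep the pattern from matching a prefix of a longer token.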