Regex - Match Words which are not Strings - regex

I am trying to distinguish between words and strings. I managed to get strings working, but I can't quite figure out how to only match words which are not surrounded by double quotes:
So I want this to match:
test
But this shouldn't match:
"test"
This is what I have so far:
[^\"][a-zA-Z]*[^\"]
It still gets the test although it is surrounded by double quotes.
Input: "\"this is a string\" word"
Expected Output: word
Any suggestions?

How about this?
assert("\"<quoted>\" word".words == listOf("word"))
assert("head \"<quoted>\" word".words == listOf("head", "word"))
assert("head\"<quoted>\"word".words == listOf("head", "word"))
assert("\"<escaped\\\"quoted>\"".words == emptyList())
assert("; punctuations , ".words == listOf("punctuations"))
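inline val String.words get() = dropStrings().split("[^\\p{Alpha}]+".toRegex())
    .filter { it.isNotBlank() }
// Matches one double-quoted string, treating backslash-escaped characters (such as \") as part of it
@Suppress("NOTHING_TO_INLINE")
inline fun String.dropStrings() = replace("\"(\\\\.|[^\"\\\\])*\"".toRegex(), " ")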
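The same two-step idea (blank out quoted strings first, then collect the remaining words) can be sketched in JavaScript. This is only an illustration under assumptions: the names QUOTED, dropStrings and words are hypothetical, and "word" is taken to mean a run of ASCII letters.

```javascript
// Hypothetical helpers for illustration; not part of the original answer.
// Matches one double-quoted string, allowing backslash-escaped characters inside it.
const QUOTED = /"(?:\\.|[^"\\])*"/g;

function dropStrings(text) {
  // Replace every quoted string with a space so neighbouring words stay separated.
  return text.replace(QUOTED, " ");
}

function words(text) {
  // Split on runs of non-letters and discard the empty pieces.
  return dropStrings(text)
    .split(/[^A-Za-z]+/)
    .filter((w) => w.length > 0);
}

console.log(words('"this is a string" word')); // [ 'word' ]
```

Because the quoted-string pattern stops at the first unescaped closing quote, a word sitting between two separate strings is kept rather than swallowed.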

Related

How to do a camel case to sentence case in dart

Something is wrong with my attempt:
String camelToSentence(String text) {
  var result = text.replaceAll(RegExp(r'/([A-Z])/g'), r" $1");
  var finalResult = result[0].toUpperCase() + result.substring(1);
  return finalResult;
}

void main() {
  print(camelToSentence("camelToSentence"));
}
It just prints "CamelToSentence" instead of "Camel To Sentence".
Looks like the problem is here r" $1"; but I don't know why.
You can use
String camelToSentence(String text) {
  return text.replaceAllMapped(RegExp(r'^([a-z])|[A-Z]'),
      (Match m) => m[1] == null ? " ${m[0]}" : m[1].toUpperCase());
}
Here,
^([a-z])|[A-Z] - matches and captures into Group 1 a lowercase ASCII letter at the start of string, or just matches an uppercase letter anywhere in the string
(Match m) => m[1] == null ? " ${m[0]}" : m[1].toUpperCase() - returns, as the replacement, the uppercased Group 1 value (if it was matched), or a space plus the matched value otherwise.
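You should not use the / and /g delimiters in the pattern: Dart's RegExp takes the bare pattern, so they are treated as literal characters to match.
About the replaceAll method, the documentation says: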
Notice that the replace string is not interpreted. If the replacement
depends on the match (for example on a RegExp's capture groups), use
the replaceAllMapped method instead.
As it does not match, result is unchanged: result[0] returns c and result.substring(1) contains amelToSentence, so you are concatenating an uppercased c with amelToSentence, giving CamelToSentence.
You can also use lookarounds
(?<!^)(?=[A-Z])
(?<!^) Assert not the start of the string
(?=[A-Z]) Assert an uppercase char A-Z to the right
For example
String camelToSentence(String text) {
  var result = text.replaceAll(RegExp(r'(?<!^)(?=[A-Z])'), r" ");
  var finalResult = result[0].toUpperCase() + result.substring(1);
  return finalResult;
}

void main() {
  print(camelToSentence("camelToSentence"));
}
Output
Camel To Sentence
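For comparison, the lookaround pattern carries over unchanged to other regex engines that support lookbehind. A minimal JavaScript sketch, assuming a runtime with lookbehind support (recent Node.js or browsers); the function name simply mirrors the Dart one:

```javascript
function camelToSentence(text) {
  // Insert a space at every position that is not the start of the string
  // and is immediately followed by an uppercase ASCII letter.
  const spaced = text.replace(/(?<!^)(?=[A-Z])/g, " ");
  // Uppercase the first character; charAt(0) is "" for an empty string.
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

console.log(camelToSentence("camelToSentence")); // Camel To Sentence
```

Note that the pattern is written as a regex literal here, so the slashes and the g flag are syntax rather than part of the pattern.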

How to only replace the vowels of words that match the words in a given array with a "*"?

I need to create a Ruby method that accepts a string and an array. If any of the words in the string match words in the given array, all the vowels of the matched words in the string should be replaced with "*". I have tried to do this using a regex and an if condition, but I don't know why it does not work. I'd really appreciate it if somebody could explain where I have gone wrong and how I can get this code right.
def censor(sentence, arr)
  if arr.include? sentence.downcase
    sentence.downcase.gsub(/[aeiou]/, "*")
  end
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#expected_output = "G*sh, it's s* h*t"
arr.include? sentence.downcase reads, “If one of the elements of arr equals sentence.downcase ...”, which is not what you want.
baddies = ["gosh", "it's", "hot", "shoot", "so"]
sentence = "Gosh, it's so very hot"
r = /\b(?:#{baddies.join('|')})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "G*sh, *t's s* very h*t"
In the regular expression, \b is a word break and (?:#{baddies.join('|')}) requires a match of one of the baddies. The non-capturing group is needed because alternation binds more loosely than \b; without it, the word breaks would apply only to the first and last alternatives. The word breaks are to avoid, for example, "so" matching "solo" or "also". One could alternatively write:
/\b(?:#{Regexp.union(baddies).source})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
See Regexp::union and Regexp#source. source is needed because interpolating the Regexp returned by Regexp.union(baddies) directly would embed its own flags, leaving it unaffected by the case-indifference modifier (i).
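One detail worth spelling out when building a pattern from a word list: alternation binds more loosely than \b, so the alternatives should sit inside a group for the word breaks to apply to each of them. A minimal JavaScript sketch of that construction; censor and escapeRegExp are hypothetical names, and escaping the words is an added precaution, not something the answer above does with join.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function censor(sentence, baddies) {
  const alternation = baddies.map(escapeRegExp).join("|");
  // (?:...) keeps both \b anchors attached to every alternative.
  const pattern = new RegExp(`\\b(?:${alternation})\\b`, "gi");
  // Replace vowels only inside the words that matched.
  return sentence.replace(pattern, (w) => w.replace(/[aeiou]/gi, "*"));
}

console.log(censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"]));
// G*sh, it's s* h*t
```

Without the group, a pattern such as \bso|hot\b would leave "also" alone but still match the "hot" at the end of "shot".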
Another approach is to split the sentence into words, manipulate each word, then rejoin all the pieces to form a new sentence. One difficulty with this approach concerns the character "'", which serves double-duty as a single quote and an apostrophe. Consider
sentence = "She liked the song, 'don't box me in'"
baddies = ["don't"]
The approach I've given here yields the correct result:
r = /\b#{baddies.join('|')}\b/i
#=> /\bdon't\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "She liked the song, 'd*n't box me in'"
If we instead divide up the sentence into parts we might try the following:
sentence.split(/([\p{Punct}' ])/)
#=> ["She", " ", "liked", " ", "the", " ", "song", ",", "", " ", "",
# "'", "don", "'", "t", " ", "box", " ", "me", " ", "in", "'"]
As seen, the regex split "don't" into "don", "'" and "t", which is not what we want. Clearly, distinguishing between single quotes and apostrophes is a non-trivial task. This is made difficult by the fact that words can begin or end with apostrophes ("'twas") and most nouns in the possessive form that end with "s" are followed by an apostrophe ("Chris' car").
Your code does not return any value (the method returns nil) when the condition is false.
One option is to split words by spaces and punctuation, manipulate, then rejoin:
def censor(sentence, arr)
  words = sentence.scan(/[\w'-]+|[.,!?]+/) # this splits the sentence into an array of words and punctuation
  res = []
  words.each do |word|
    word = word.gsub(/[aeiou]/, "*") if arr.include? word.downcase
    res << word
  end
  res.join(' ') # also adds spaces before punctuation
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#=> G*sh , it's s* h*t
Note that res.join(' ') also adds spaces before punctuation. I'm not so good with regexp, but this could fix it:
res.join(' ').gsub(/ [.,!?]/) { |punct| "#{punct}".strip }
#=> G*sh, it's s* h*t
This part words = sentence.scan(/[\w'-]+|[.,!?]+/) returns ["Gosh", ",", "it's", "so", "hot"]
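A variation on the split-and-rejoin idea avoids the spacing fix-up by keeping the separators themselves. The JavaScript sketch below is only an illustration: censorByTokens is a hypothetical name, and a "word" is assumed to be a run of word characters, apostrophes and hyphens, as in the scan pattern above.

```javascript
function censorByTokens(sentence, baddies) {
  const banned = new Set(baddies.map((w) => w.toLowerCase()));
  return sentence
    // The capturing group makes split keep the separators (spaces, punctuation)
    // in the result, so joining with "" restores the original spacing.
    .split(/([^\w'-]+)/)
    .map((tok) => (banned.has(tok.toLowerCase()) ? tok.replace(/[aeiou]/gi, "*") : tok))
    .join("");
}

console.log(censorByTokens("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"]));
// G*sh, it's s* h*t
```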

validator.addMethod for checking before and end whitespaces

I want to validate that a field has no whitespace before or after the text. Spaces in the middle of the string are allowed.
Here is my code
$.validator.addMethod("trimLookup", function(value, element) {
    regex = "^[^\s]+(\s+[^\s]+)*$";
    regex = new RegExp( regex );
    return this.optional( element ) || regex.test( value );
}, $.validator.format("Cannot contains any spaces at beginning or end"));
I tested the regex on https://regex101.com/ and it works fine. I also tested this code with other regexes and it works. But if I enter " " or " abc ", it doesn't work.
Any suggestions?
Thank you for your time!
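One likely cause, sketched below under the assumption that the pattern itself is what was intended: inside a JavaScript string literal, \s is not a recognised escape sequence, so the backslash is dropped and RegExp receives ^[^s]+(s+[^s]+)*$, which tests for the letter "s" rather than whitespace. A regex tester is given the raw pattern, which would explain why it behaves as expected there. The variable names are illustrative only.

```javascript
// In a string literal "\s" is just "s": the backslash is silently dropped.
const fromString = new RegExp("^[^\s]+(\s+[^\s]+)*$");
console.log(fromString.source); // ^[^s]+(s+[^s]+)*$

// Doubling the backslashes keeps \s intact inside a string...
const fromEscapedString = new RegExp("^[^\\s]+(\\s+[^\\s]+)*$");
// ...and a regex literal needs no extra escaping at all.
const fromLiteral = /^[^\s]+(\s+[^\s]+)*$/;

console.log(fromString.test(" abc "));        // true  (wrongly accepted)
console.log(fromEscapedString.test(" abc ")); // false
console.log(fromLiteral.test("abc def"));     // true
```

Inside the validator method, swapping the string for the regex literal (or doubling the backslashes) should be enough to make the pattern behave as it does in the tester.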

Scala regex find and replace

I'm having problems finding and replacing portions of a string using regex in scala.
Given the following string: q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20
I want to replace all the occurrences of "and" and "or" with "&&" and "||".
The regex I have come up with is: .+\s((and|or)+)\s.+. However, this seems to only find the last "and".
When using https://regex101.com/#pcre I tried to solve this by adding the modifiers gU, which seems to work. But I'm not sure how to use those modifiers in Scala code.
Any help is much appreciated
Why not use a solution like:
str.replaceAll("\\sand\\s", " && ").replaceAll("\\sor\\s", " || ")
You can check the captured/matched substrings with a lambda and use an if/else syntax to replace with the appropriate replacement:
val str = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
val pattern = """\b(and|or)\b""".r
val replacedStr = pattern replaceAllIn (str, m => if (m.group(1) == "or") "||" else "&&")
println(replacedStr)
Result: q[k6.q3]>=0 && q[dist.report][0] || q[dist.report][1] && q[10]>20
Regex breakdown:
\b - word boundary
(and|or) - either and or or letter sequences
\b - the closing word boundary.
If you require whitespaces on both ends, use
val pattern = """ (and|or) """.r
val replacedStr = pattern replaceAllIn (str, m => if (m.group(1) == "or") " || " else " && ")
You need to add "?" in the right places to make your patterns reluctant:
val line = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
val regex = ".+\\s((and|or)+)\\s.+".r
regex.findAllIn(line).toList
//Produces list with one item:
//res0: List[String] = List(q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20)
Compared with:
val line = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20"
val regex = ".+?\\s((and|or)+)\\s.+?".r
regex.findAllIn(line).toList
//List with 3 items:
//res0: List[String] = List(q[k6.q3]>=0 and q, [dist.report][0] or q, [dist.report][1] and q)
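The greedy versus reluctant difference is not specific to Scala. A small JavaScript sketch of the same comparison, using the input string from the question (the variable names are illustrative):

```javascript
const line = "q[k6.q3]>=0 and q[dist.report][0] or q[dist.report][1] and q[10]>20";

// Greedy: the leading .+ grabs as much as it can, so the whole line is a single match
// and only the last "and" ends up in the capture group.
const greedy = line.match(/.+\s((and|or)+)\s.+/g);

// Reluctant: .+? stops as early as possible, so each operator gets its own match.
const lazy = line.match(/.+?\s((and|or)+)\s.+?/g);

console.log(greedy.length); // 1
console.log(lazy);
// [ 'q[k6.q3]>=0 and q', '[dist.report][0] or q', '[dist.report][1] and q' ]
```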

In DOORS DXL, how do I use a regular expression to determine whether a string starts with a number?

I need to determine whether a string begins with a number - I've tried the following to no avail:
if (matches("^[0-9].*)", upper(text))) str = "Title"""
I'm new to DXL and Regex - what am I doing wrong?
You need the caret character to indicate a match only at the start of a string. I added the plus character to match all the leading digits, although you might not need it for your situation. If you're only checking for a number at the start and don't care what follows, you don't need anything more.
string str1 = "123abc"
string str2 = "abc123"
string strgx = "^[0-9]+"
Regexp rgx = regexp2(strgx)
if(rgx(str1)) { print str1[match 0] "\n" } else { print "no match\n" }
if(rgx(str2)) { print str2[match 0] "\n" } else { print "no match\n" }
The code block above will print:
123
no match
@mrhobo is correct, you want something like this:
Regexp numReg = "^[0-9]"
if(numReg text) str = "Title"
You don't need upper since you are just looking for numbers. Also matches is more for finding the part of the string that matches the expression. If you just want to check that the string as a whole matches the expression then the code above would be more efficient.
Good luck!
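The pattern logic itself is engine-independent and can be checked outside DOORS. A hedged JavaScript sketch (not DXL; startsWithDigit and isValidPattern are hypothetical names) that also shows the pattern from the question contains an unbalanced closing parenthesis, which JavaScript's engine rejects:

```javascript
function startsWithDigit(text) {
  // ^ anchors at the start; a single [0-9] is enough for a yes/no check.
  return /^[0-9]/.test(text);
}

function isValidPattern(source) {
  // new RegExp throws a SyntaxError for malformed patterns.
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(startsWithDigit("123abc"));   // true
console.log(startsWithDigit("abc123"));   // false
console.log(isValidPattern("^[0-9].*)")); // false: unmatched ")"
```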
Based on an example I found, this should work:
Regexp plural = regexp "^([0-9].*)$"
if plural "15systems" then print "yes"
Resource:
http://www.scenarioplus.org.uk/papers/dxl_regexp/dxl_regexp.htm