Getting a substring from a list of strings - regex

So, as part of learning the language, I wanted to check three strings for a certain pattern and return the first match of that pattern only.
My attempt was to use a combination of find and regular expressions to traverse the list:
def date = [
"some string",
"some other string 11.11.2000",
"another one 20.10.1990"
].find { title ->
title =~ /\d{2}\.\d{2}\.\d{4}/
}
This kind of works, leaving the whole string in date.
My goal, however, would be to end up with "11.11.2000" in date; I assume somehow I should be able to access the capture group, but how?

If you want to return a specific value when finding a matching element in a collection (which as in your case might be part of that element), you need to use findResult.
Your code might then look like this:
def date = [
"some string",
"some other string 11.11.2000",
"another one 20.10.1990"
].findResult { title ->
def res = title =~ /\d{2}\.\d{2}\.\d{4}/
if (res) {
return res[0]
}
}

Extending UnholySheep's answer, you can also do this:
assert [
"some string",
"some other string 11.11.2000",
"another one 20.10.1990"
].findResult { title ->
def matcher = title =~ /\d{2}\.\d{2}\.\d{4}/
matcher.find() ? matcher.group() : null
} == '11.11.2000'
For all matches, just use findResults instead of findResult, like this:
assert [
"some string",
"some other string 11.11.2000",
"another one 20.10.1990"
].findResults { title ->
def matcher = title =~ /\d{2}\.\d{2}\.\d{4}/
matcher.find() ? matcher.group() : null
} == ['11.11.2000', '20.10.1990']
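For comparison, the same "first match" versus "all matches" distinction can be sketched in JavaScript. This is only an illustration of the idea, not Groovy API: the helper names firstDate and allDates are made up for this example.

```javascript
// Hypothetical helpers mirroring the findResult / findResults idea for one pattern.
const DATE = /\d{2}\.\d{2}\.\d{4}/;

// Returns the first matched date across all strings, or null when none matches.
function firstDate(strings) {
  for (const s of strings) {
    const m = s.match(DATE);
    if (m) return m[0]; // m[0] is the matched substring, not the whole input
  }
  return null;
}

// Returns the first date found in each string, skipping strings without one.
function allDates(strings) {
  return strings
    .map((s) => s.match(DATE))
    .filter((m) => m !== null)
    .map((m) => m[0]);
}

const titles = [
  "some string",
  "some other string 11.11.2000",
  "another one 20.10.1990",
];
console.log(firstDate(titles)); // 11.11.2000
console.log(allDates(titles)); // [ '11.11.2000', '20.10.1990' ]
```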

Related

return first instance of unmatched regex scala

Is there a way to return the first instance of an unmatched string between 2 strings with Scala's Regex library?
For example:
val a = "some text abc123 some more text"
val b = "some text xyz some more text"
a.firstUnmatched(b) = "abc123"
Regex is good for matching & replacing in strings based on patterns.
But to look for the differences between strings? Not exactly.
However, diff can be used to find differences.
object Main extends App {
val a = "some text abc123 some more text 321abc"
val b = "some text xyz some more text zyx"
val firstdiff = (a.split(" ") diff b.split(" "))(0)
println(firstdiff)
}
prints "abc123"
Is regex desired after all? Then realize that the splits could be replaced by regex matching.
The regex pattern in this example looks for words:
val reg = "\\w+".r
val firstdiff = (reg.findAllIn(a).toList diff reg.findAllIn(b).toList)(0)
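Scala's diff on sequences is a multiset difference: each element of the right-hand side cancels at most one equal element on the left. A minimal JavaScript sketch of that behaviour, assuming words are matched with \w+ as above (the function names diff and firstUnmatched are chosen for this example only):

```javascript
// Multiset difference: each word in `right` cancels one equal word in `left`.
function diff(left, right) {
  const counts = new Map();
  for (const w of right) counts.set(w, (counts.get(w) || 0) + 1);
  const out = [];
  for (const w of left) {
    const n = counts.get(w) || 0;
    if (n > 0) counts.set(w, n - 1); // cancelled by an occurrence in `right`
    else out.push(w); // no counterpart left, so it is a difference
  }
  return out;
}

// Returns the first word of `a` that has no counterpart in `b`, or undefined.
function firstUnmatched(a, b) {
  const words = (s) => s.match(/\w+/g) || [];
  return diff(words(a), words(b))[0];
}

const a = "some text abc123 some more text 321abc";
const b = "some text xyz some more text zyx";
console.log(firstUnmatched(a, b)); // abc123
```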

Convert UPPERCASE to Title Case

I want to convert an uppercase string (UPPERCASE) into a title case string (Title Case) in swift. I am not strong in regular expressions, but have found this answer with a regular expression that I have attempted to use.
The search expression is:
"([A-Z])([A-Z]+)\b"
and the template expression is:
"$1\L$2"
In order to use it in swift I have escaped the backslashes as seen below:
var uppercase = "UPPER CASE STRING"
var titlecase = uppercase.stringByReplacingOccurrencesOfString("([A-Z])([A-Z]+)\\b", withString: "$1\\L$2", options: NSStringCompareOptions.RegularExpressionSearch, range: Range<String.Index>(start: uppercase.startIndex, end: uppercase.endIndex))
The code above gives the following result:
"ULPPER CLASE SLTRING"
From that you can see that the search expression successfully finds the two parts $1 and $2, but it looks like escaping the backslash interferes with the replacement.
How can I get the expected result of:
"Upper Case String"
Many of the useful existing NSString methods are available from Swift. This includes capitalizedString, which may just do exactly what you want, depending on your requirements.
As far as I know, a title-cased string is one in which the first letter of each word is capitalised (except for prepositions, articles and conjunctions). So the code could look like this:
import Foundation
public extension String {
subscript(range: NSRange) -> Substring {
get {
if range.location == NSNotFound {
return ""
} else {
let swiftRange = Range(range, in: self)!
return self[swiftRange]
}
}
}
/// Title-cased string is a string that has the first letter of each word capitalised (except for prepositions, articles and conjunctions)
var localizedTitleCasedString: String {
var newStr: String = ""
// create linguistic tagger
let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
let range = NSRange(location: 0, length: self.utf16.count)
tagger.string = self
// enumerate linguistic tags in string
tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: []) { tag, tokenRange, _ in
let word = self[tokenRange]
guard let tag = tag else {
newStr.append(contentsOf: word)
return
}
// conjunctions, prepositions and articles should remain lowercased
if tag == .conjunction || tag == .preposition || tag == .determiner {
newStr.append(contentsOf: word.localizedLowercase)
} else {
// any other words should be capitalized
newStr.append(contentsOf: word.localizedCapitalized)
}
}
return newStr
}
}
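The output "ULPPER CLASE SLTRING" in the question suggests that the \L in the template was inserted as a literal "L" rather than acting as a lowercase operator. In engines without such an operator, the usual workaround is to lowercase the second capture group in code. A sketch of that idea in JavaScript, reusing the search expression from the question (toTitleCase is a name made up for this example):

```javascript
// Lowercases everything after the first letter of each all-uppercase word.
// Uses the question's pattern: one capital letter followed by one or more capitals.
function toTitleCase(upper) {
  return upper.replace(
    /\b([A-Z])([A-Z]+)\b/g,
    (_match, first, rest) => first + rest.toLowerCase()
  );
}

console.log(toTitleCase("UPPER CASE STRING")); // Upper Case String
```

Single-letter words such as "A" do not match the pattern and are left unchanged, which already gives the desired capital.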

find str in another str with regex

I defined:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
I need to split s1 into words and check whether any of them occur in s2.
If so, return the posts in which they occur.
The best way to help me would be a regex, because I need this check for MongoDB.
Please let me know the proper regex.
Thanks.
This could probably be answered with just the regular expression (and it is, below), but consider the data:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
{ "phrase" : "something about androi" }
{ "phrase" : "johnathan was here" }
You match with MongoDB like this:
db.collection.find({ "phrase": /\broi\b|\bjohn\b/ })
And that gets the two documents that match:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
The regex works by placing word boundaries \b around each word, so that it does not partially match something else, and combining the words with an "or" | condition.
Play with the regexer for this.
Open-ended $regex queries like this can often be bad for performance in MongoDB. I'm not sure of your actual use case, but a "full text search" solution may be better suited to your needs. MongoDB has full text indexing and search, or you can use an external solution.
Anyhow, this is how you match your words using a $regex condition.
To actually process your string as input you will need some code before doing the search:
var string = "roi john";
var splits = string.split(" ");
for ( var i = 0; i < splits.length; i++ ) {
splits[i] = "\\b" + splits[i] + "\\b";
}
var exp = splits.join("|");
db.collection.find({ "phrase": { "$regex": exp } })
You could also combine that with the case-insensitive "$options" setting if that is what you want. The second form, with the literal $regex operator, is actually the safer form of usage in languages other than JavaScript.
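If the input words can contain characters that are special in a regex (such as "." or "+"), it may be worth escaping them before building the pattern. The sketch below is one way to do that and checks the result locally against the sample phrases. The helper names escapeRegex and buildWordPattern are hypothetical, and the escaping step is an addition that the original snippet does not perform.

```javascript
// Escapes characters that have special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a pattern such as \broi\b|\bjohn\b from a space-separated word list.
function buildWordPattern(input) {
  return input
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => "\\b" + escapeRegex(w) + "\\b")
    .join("|");
}

const pattern = buildWordPattern("roi john");

// Local check against the sample documents; "i" makes the match case-insensitive.
const phrases = [
  "hello guys my name is roi levi or maybe roy",
  "and another sentence from john",
  "something about androi",
  "johnathan was here",
];
const re = new RegExp(pattern, "i");
const matched = phrases.filter((p) => re.test(p));
console.log(matched);

// The same pattern string could then be used in the query, for example:
// db.collection.find({ "phrase": { "$regex": pattern, "$options": "i" } })
```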
Using a loop to iterate over the words of s1 and checking each one against s2 will give the expected result (note that indexOf also matches partial words):
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
var arr1 = s1.split(" ");
for (var i = 0; i < arr1.length; i++) {
if (s2.indexOf(arr1[i]) != -1){
console.log("The string contains "+arr1[i]);
}
}

In DOORS DXL, how do I use a regular expression to determine whether a string starts with a number?

I need to determine whether a string begins with a number - I've tried the following to no avail:
if (matches("^[0-9].*)", upper(text))) str = "Title"""
I'm new to DXL and Regex - what am I doing wrong?
You need the caret character to anchor the match at the start of the string. I added the plus character to match all the leading digits, although you might not need it in your situation. If you're only checking for a number at the start and don't care whether anything follows, you don't need anything more.
string str1 = "123abc"
string str2 = "abc123"
string strgx = "^[0-9]+"
Regexp rgx = regexp2(strgx)
if(rgx(str1)) { print str1[match 0] "\n" } else { print "no match\n" }
if(rgx(str2)) { print str2[match 0] "\n" } else { print "no match\n" }
The code block above will print:
123
no match
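The anchoring logic is not specific to DXL. As a rough cross-check, the same pattern behaves the same way in JavaScript (leadingDigits is a name invented for this sketch):

```javascript
// Returns the run of digits at the start of the string, or null if there is none.
function leadingDigits(text) {
  const m = /^[0-9]+/.exec(text); // ^ anchors the match to the start of the string
  return m ? m[0] : null;
}

console.log(leadingDigits("123abc")); // 123
console.log(leadingDigits("abc123")); // null
```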
@mrhobo is correct, you want something like this:
Regexp numReg = "^[0-9]"
if(numReg text) str = "Title"
You don't need upper since you are just looking for numbers. Also matches is more for finding the part of the string that matches the expression. If you just want to check that the string as a whole matches the expression then the code above would be more efficient.
Good luck!
At least judging from the example I found, this should work:
Regexp plural = regexp "^([0-9].*)$"
if plural "15systems" then print "yes"
Resource:
http://www.scenarioplus.org.uk/papers/dxl_regexp/dxl_regexp.htm

QRegexp idiosyncrasies (compared to perl): How can I write this regexp without the lazy quantifier?

I have the following regular expression that works fine in perl:
Classification:\s([^\n]+?)(?:\sRange:\s([^\n]+?))*(?:\sStructural Integrity:\s([^\n]+))*\n
The type of data format this string is supposed to match against is:
Classification: Class Name Range: xxxx Structural Integrity: value
Classification: Class Name Structural Integrity: value
Classification: Class Name
That is: the "Range" and "Structural Integrity" fields are optional. So the desired result is:
{
$& [Classification: Class Name Range: xxxx Structural Integrity: value ]
$1 [Class Name ]
$2 [xxxx ]
$3 [value ]
$& [Classification: Class Name Structural Integrity: value ]
$1 [Class Name ]
$2 [value ]
$& [Classification: Class Name ]
$1 [Class Name ]
}
The expression uses the ? lazy quantifier in two places. This operator is not supported by QRegExp; instead, Qt uses a "minimal" property which, when set to true, makes all quantifiers in an expression non-greedy.
Armed with this information, I wrote my code:
QRegExp rx("Classification:\\s([^\\n]+)(?:\\sRange:\\s([^\\n]+))*(?:\\sStructural Integrity:\\s([^\\n]+))*\\n");
rx.setMinimal(true);
But the results are incorrect, and after much tweaking I haven't been able to get the correct captures. Is it possible to split this up into more code and less regex? Or to rewrite it without the lazy operator?
Something like this:
QRegExp rx("(Classification|Range|Structural\\s+Integrity):|(\\S+)");
QStringList classification;
QStringList range;
QStringList integrity;
// points at the list that receives the words following the most recent key
QStringList *current = nullptr;
int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
if (rx.cap(1).isEmpty()) {
// plain word: append it to the list of the most recent key, if any
if (current != nullptr) {
*current << rx.cap(2);
}
}
else if (rx.cap(1) == "Classification") {
current = &classification;
}
else if (rx.cap(1) == "Range") {
current = &range;
}
else {
// only remaining alternative: "Structural Integrity"
current = &integrity;
}
pos += rx.matchedLength();
}
It matches either valid keys followed by a colon or words. If it is a key, change the current list to the corresponding one. Otherwise add the word to the current list.
In the end, you will have the lists classification, range and integrity, containing the words after the corresponding keys. You could join them after the full match is done:
QString classificationString = classification.join(" ");
It does not care about the order of the keys though.
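To make the tokenising approach easier to try out, here is the same logic as a small JavaScript sketch. It is an illustration under the assumption that one line of input is parsed at a time, and parseFields is a name made up for this example, not part of Qt.

```javascript
// Splits a line into key tokens ("Classification:", "Range:", "Structural Integrity:")
// and plain words, collecting each word under the most recently seen key.
function parseFields(line) {
  const rx = /(Classification|Range|Structural\s+Integrity):|(\S+)/g;
  const lists = { classification: [], range: [], integrity: [] };
  let current = null; // list that receives the words after the last key
  for (const m of line.matchAll(rx)) {
    if (m[1] === undefined) {
      // plain word: keep it only if a key has been seen already
      if (current !== null) current.push(m[2]);
    } else if (m[1] === "Classification") {
      current = lists.classification;
    } else if (m[1] === "Range") {
      current = lists.range;
    } else {
      current = lists.integrity; // remaining alternative: Structural Integrity
    }
  }
  return {
    classification: lists.classification.join(" "),
    range: lists.range.join(" "),
    integrity: lists.integrity.join(" "),
  };
}

console.log(parseFields("Classification: Class Name Range: xxxx Structural Integrity: value"));
console.log(parseFields("Classification: Class Name"));
```

Missing optional fields simply come back as empty strings.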
Also, there is QRegExp::RegExp2, which supports greedy quantifiers (since Qt 4.2).