Swift 5.7 RegexBuilder convert Array of strings to Regex - ChoiceOf options programmatically - regex

I would love to take an array of strings
let array = ["one", "two", "three", "four"]
and convert it to the regex builder equivalent of:
Regex {
    ChoiceOf {
        "one"
        "two"
        "three"
        "four"
    }
}
or basically the equivalent of:
/one|two|three|four/
So far I have tried:
let joinedArray = array.joined(separator: "|")
let choicePattern = Regex(joinedArray)
I know that using Regex() throws and I need to handle that somehow but even when I do, I don't seem to get it to work.
Does anyone know how to do this?

Related

Regex: How to Display 4 digits after the "L" (same for after the "C") ? like : L100_C1_"1"_KO,L100_C2_"3260"_KO,etc

Hi Every Regex Expert,
I have one Array List al1 like this (line by line):
al1 : L1_C1_0, L1_C2_"11229", L1_C2_"CHK_CASHING"_OK, etc... L1_C100_"FR45248624892", L2_C1_0, L2_C2_"11229", L2_C2_"CHK_CASHING"_OK etc... L2_C100_"FR45248624892"_KO, L3_C1_0, L3_C2_"11229", L3_C2_"CHK_CASHING"_OK etc... L3_C100_"FR45248624892"_KO, L4_C1_0, L3_C2_"11229", L4_C2_"CHK_CASHING"_OK etc... L4_C100_"FR45248624892"_OK
I wrote this regex, but it doesn't work the way I want:
String spattern = "(L(([1-9]?[0-9])|100)_C\\d_\\W.*?L\\2_C\\d{3}_\".*?\"(?:,?\$?))";
I want to display like this :
L1_C1_0, L1_C2_"11229", L1_C2_"CHK_CASHING"_OK, etc...L1_C100_"FR45248624892"
L2_C1_0, L2_C2_"11229", L2_C2_"CHK_CASHING"_OK etc...L2_C100_"FR45248624892"_KO
L3_C1_0, L3_C2_"11229", L3_C2_"CHK_CASHING"_OK etc...L3_C100_"FR45248624892"_KO
L4_C1_0, L3_C2_"11229", L4_C2_"CHK_CASHING"_OK etc...L4_C100_"FR45248624892"_OK
L5_C1_1 etc...
Can someone help me display this?
Thank you very much for your help.

Remove empty string in list in Julia

I am looking for an efficient way to remove empty strings from a list in Julia.
Here is my list :
li = ["one", "two", "three", " ", "four", "five"]
I can remove the empty string with a for loop, as follows:
new_li = []
for i in li
    if i == " "
    else
        push!(new_li, i)
    end
end
But I believe there is a more efficient way to remove the empty string.
new_li = filter((i) -> i != " ", li)
or
new_li = [i for i in li if i != " "]
I have a couple of comments a bit too long for the comments field:
Firstly, when building an array, never start it like this:
new_li = []
It creates a Vector{Any}, which can harm performance. If you want to initialize a vector of strings, it is better to write
new_li = String[]
Secondly, " " is not an empty string! Look here:
jl> isempty(" ")
false
It is a non-empty string that contains a space. An empty string would be "", no space. If you're actually trying to remove empty strings you could do
filter(!isempty, li)
or, for in-place operation, you can use filter!:
filter!(!isempty, li)
But you're not actually removing empty strings, but strings consisting of one (or more?) spaces, and maybe also actually empty strings? In that case you could use isspace along with all. This will remove all strings that are only spaces, including empty strings:
jl> li = ["one", "", "two", "three", " ", "four", " ", "five"];
jl> filter(s->!all(isspace, s), li)
5-element Vector{String}:
"one"
"two"
"three"
"four"
"five"

Case and diacritic insensitive matching of regex with metacharacter in Swift

I am trying to match rude words in user inputs, for example "I Hate You!" or "i.håté.Yoù" will match with "hate you" in an array of words parsed from JSON.
So I need it to be case- and diacritic-insensitive, and to treat whitespace in the rude words as any non-letter character:
the regex metacharacter \P{L} should work for that, or at least \W.
Now I know [cd] works with NSPredicate, like this:
func matches(text: String) -> [String]? {
    if let rudeWords = JSON?["words"] as? [String] {
        return rudeWords.filter {
            let pattern = $0.stringByReplacingOccurrencesOfString(" ", withString: "\\P{L}", options: .CaseInsensitiveSearch)
            return NSPredicate(format: "SELF MATCHES[cd] %@", pattern).evaluateWithObject(text)
        }
    } else {
        log.debug("error fetching rude words")
        return nil
    }
}
That doesn't work with either metacharacter (I guess they are not parsed by NSPredicate), so I tried using NSRegularExpression like this:
func matches(text: String) -> [String]? {
    if let rudeWords = JSON?["words"] as? [String] {
        return rudeWords.filter {
            do {
                let pattern = $0.stringByReplacingOccurrencesOfString(" ", withString: "\\P{L}", options: .CaseInsensitiveSearch)
                let regex = try NSRegularExpression(pattern: pattern, options: .CaseInsensitive)
                // NSRange counts UTF-16 code units, so use utf16.count rather than characters.count
                return regex.matchesInString(text, options: [], range: NSMakeRange(0, text.utf16.count)).count > 0
            }
            catch _ {
                log.debug("error parsing rude word regex")
                return false
            }
        }
    } else {
        log.debug("error fetching rude words")
        return nil
    }
}
This seems to work OK; however, there is no way that I know of to make the regex diacritic-insensitive, so I tried this (and other solutions like re-encoding):
let text = text.stringByFoldingWithOptions(.DiacriticInsensitiveSearch, locale: NSLocale.currentLocale())
However, this does not work for me: I check user input every time a character is typed, so all the solutions I tried for stripping accents made the app extremely slow.
Does anyone know of any other solutions, or whether I am using this the wrong way?
Thanks
EDIT
I was actually mistaken: what was making the app slow was trying to match with \P{L}. I tried the second solution with \W and with the accent-stripping line, and now it works OK, even if it matches fewer strings than I initially wanted.
Links
These might help some people dealing with regex and predicates:
http://www.regular-expressions.info/unicode.html
http://juehualu.blogspot.fr/2013/08/ios-notes-for-predicates-programming.html
https://regex101.com
It might be worthwhile to go in a different direction. Instead of flattening the input, what if you changed the regex?
Instead of matching against hate.you, you could match against [h][åæaàâä][t][ëèêeé].[y][o0][ùu], for example (it's not a comprehensive list, in any case). It would make most sense to do this transformation on the fly (not storing it), because that makes it easier to change what the characters expand to later.
This will give you some more control over what characters will match. If you look, I have 0 as a character matching o. No amount of Unicode coercion could let you do that.
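To make the "on the fly" transformation concrete, here is a minimal sketch in JavaScript (chosen only for brevity; the same idea should carry over to NSRegularExpression). The EXPANSIONS table and the toLooseRegex name are hypothetical, and the table is deliberately incomplete. Spaces in the stored phrase become \P{L}, as in the question.

```javascript
// Hypothetical, deliberately incomplete expansion table; extend as needed.
const EXPANSIONS = {
  a: "aàáâäæãåā",
  e: "eèéêëēėę",
  o: "o0ôöòóœøōõ", // note the digit 0 standing in for "o"
  u: "uûüùúū",
};

// Build a regex from a plain phrase: known letters become character classes,
// spaces become "any non-letter", everything else is escaped and kept literal.
function toLooseRegex(phrase) {
  const source = Array.from(phrase.toLowerCase(), (ch) => {
    if (ch === " ") return "\\P{L}";
    if (EXPANSIONS[ch]) return "[" + EXPANSIONS[ch] + "]";
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }).join("");
  // "i" for case-insensitivity, "u" so that \P{L} is understood.
  return new RegExp(source, "iu");
}

console.log(toLooseRegex("hate you").test("i.håté.Yoù")); // true
console.log(toLooseRegex("hate you").test("hate y0u"));   // true
```

Because the pattern is rebuilt from the plain phrase each time, changing what a letter expands to only means editing the table.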
I ended up using the solution suggested by Laurel. It works well for me.
I post it here for anybody who might need it.
extension String {
    func getCaseAndDiacriticInsensitiveRegex() throws -> NSRegularExpression {
        var pattern = self.folding(options: [.caseInsensitive, .diacriticInsensitive], locale: .current)
        pattern = pattern.replacingOccurrences(of: "a", with: "[aàáâäæãåā]")
        pattern = pattern.replacingOccurrences(of: "c", with: "[cçćč]")
        pattern = pattern.replacingOccurrences(of: "e", with: "[eèéêëēėę]")
        pattern = pattern.replacingOccurrences(of: "l", with: "[lł]")
        pattern = pattern.replacingOccurrences(of: "i", with: "[iîïíīįì]")
        pattern = pattern.replacingOccurrences(of: "n", with: "[nñń]")
        pattern = pattern.replacingOccurrences(of: "o", with: "[oôöòóœøōõ]")
        pattern = pattern.replacingOccurrences(of: "s", with: "[sßśš]")
        pattern = pattern.replacingOccurrences(of: "u", with: "[uûüùúū]")
        pattern = pattern.replacingOccurrences(of: "y", with: "[yýÿ]")
        pattern = pattern.replacingOccurrences(of: "z", with: "[zžźż]")
        return try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    }
}

find str in another str with regex

I defined:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
I need to split s1 into words and check whether any of them occur in s2;
if so, give me the specific posts that match.
The most helpful answer would be a regex, because I need this check for MongoDB.
Please let me know the proper regex I need.
Thx.
This could possibly be answered with just the regular expression (and actually is), but consider this data:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
{ "phrase" : "something about androi" }
{ "phrase" : "johnathan was here" }
You match with MongoDB like this:
db.collection.find({ "phrase": /\broi\b|\bjohn\b/ })
And that gets the two documents that match:
{ "phrase" : "hello guys my name is roi levi or maybe roy" }
{ "phrase" : "and another sentence from john" }
So the regex works by keeping the word boundaries \b around the words to match so they do not partially match something else and are combined with an "or" | condition.
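As a rough illustration of what the \b anchors buy you, here is a small sketch in plain JavaScript (run outside MongoDB; the variable names are only for illustration) that applies the pattern, and a version without the anchors, to the four sample phrases:

```javascript
const phrases = [
  "hello guys my name is roi levi or maybe roy",
  "and another sentence from john",
  "something about androi",
  "johnathan was here",
];

// With word boundaries: only whole words "roi" or "john" match.
const anchored = /\broi\b|\bjohn\b/;
// Without them: "androi" and "johnathan" match as well.
const unanchored = /roi|john/;

const anchoredHits = phrases.filter((p) => anchored.test(p));
const unanchoredHits = phrases.filter((p) => unanchored.test(p));

console.log(anchoredHits.length);   // 2
console.log(unanchoredHits.length); // 4
```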
Play with the regexer for this.
Doing open-ended $regex queries like this in MongoDB can often be bad for performance. I'm not sure of your actual use case, but it is possible that a "full text search" solution would be better suited to your needs. MongoDB has full text indexing and search, or you can use an external solution.
Anyhow, this is how you match your words using a $regex condition.
To actually process your string as input you will need some code before doing the search:
var string = "roi john";
var splits = string.split(" ");
for ( var i = 0; i < splits.length; i++ ) {
    splits[i] = "\\b" + splits[i] + "\\b";
}
var exp = splits.join("|");
db.collection.find({ "phrase": { "$regex": exp } })
You could also combine that with the case-insensitive "$options" setting if that is what you want. That second form, with the literal $regex operator, is actually the safer one to use in languages other than JavaScript.
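One caveat with the loop above: each word goes into the pattern verbatim, so input containing regex metacharacters such as . or ( would change the pattern's meaning. Here is a minimal sketch of one way to handle that, assuming the words should be matched literally. The escapeRegex and wordsToPattern helpers are hypothetical names, not part of MongoDB.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split on whitespace, drop empty pieces, wrap each word in \b...\b,
// and join the alternatives with "|".
function wordsToPattern(input) {
  return input
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => "\\b" + escapeRegex(word) + "\\b")
    .join("|");
}

const pattern = wordsToPattern("roi john");
console.log(pattern); // \broi\b|\bjohn\b

// In the mongo shell the result could then be passed along, for example:
// db.collection.find({ "phrase": { "$regex": pattern, "$options": "i" } })
```

Splitting on /\s+/ and filtering out empty strings also avoids an empty alternative (which would match every document) when the input has repeated or trailing spaces.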
Using a loop to iterate over the words of s1 and checking each one against s2 will give the expected result:
var s1="roi john";
var s2="hello guys my name is roi levi or maybe roy";
var arr1 = s1.split(" ");
for(var i=0;i<arr1.length;i++){
    if (s2.indexOf(arr1[i]) != -1){
        console.log("The string contains "+arr1[i]);
    }
}

display a list differently Haskell?

Hey, I was wondering if it is possible to show a list:
["one", "two", "three"]
to be shown as
"one", "two", "three"
I need it done for a file.
thanks
You can do this with intercalate from Data.List:
import Data.List (intercalate)
-- Named showItems because the Prelude already exports a showList (a Show class method).
showItems :: Show a => [a] -> String
showItems = intercalate ", " . map show
The map show converts each element to its string representation with quotes (and any internal quotes properly escaped), while intercalate ", " inserts commas and spaces between the pieces and glues them together.