How to Avoid OR condition in lambda expression - list

I am new to lambda expressions. I have a list of strings that I need to query on the basis of certain keywords. I could chain multiple OR conditions, but that is not how I want to do it.
Here is the snippet:
List<string> messageList = new List<string>();
//add some data to this list
.
.
//now query
var message = messageList.Where(x => x.Contains("SomeValue") && (x.Contains(value_1) || x.Contains(value_2))).ToList();
In the above code the OR list might go on...
If I have all these values (to be used in the OR) in a List, is there a generic way to avoid these OR conditions and query against that list instead?
Any help in this regard would be appreciated.

You could use Any on the list of words to check:
var words = new [] {value_1, value_2, ...};
var message = messageList.Where(x => x.Contains("SomeValue")
        && words.Any(w => x.Contains(w)))
    .ToList();

Related

Linq get element from string list and a position of a char in this list

I want to get an element from a list of strings and find the position of a character in it using LINQ.
Example:
List<string> lines = new List<string> { "TOTO=1", "TATA=2", "TUTU=3" };
I want to extract the value 1 from TOTO in the list.
Here is the beginning of my code:
var value= lines.ToList().Single(x =>x.Contains("TOTO=")).ToString().Trim();
How do I continue this code to extract 1?
Add this:
value = value[(value.LastIndexOf('=') + 1)..];
Using LINQ you can do this:
List<string> lines = new List<string> { "TOTO=1", "TATA=2", "TUTU=3" };
int value = lines
.Select(line => line.Split('='))
.Where(parts => parts[0] == "TOTO")
.Select(parts => int.Parse(parts[1]))
.Single();
If you always expect each item in that list to be in the proper format then this should work, otherwise you'd need to add some validation.
Similar to what @jtate proposed; some minor enhancements can help.
int value = lines
.Select(line => line.Split(new []{ '=' }, StringSplitOptions.RemoveEmptyEntries))
.Where(parts => string.Equals(parts[0], "TOTO", StringComparison.InvariantCultureIgnoreCase))
.Select(parts => int.Parse(parts[1]))
.SingleOrDefault();
SingleOrDefault - If no element matches your constraints, Single() would throw an exception. Here, SingleOrDefault would return 0 instead (it still throws if more than one element matches).
String.Equals - with StringComparison.InvariantCultureIgnoreCase, takes care of upper/lower case and culture-related differences.
StringSplitOptions.RemoveEmptyEntries - drops empty segments from the split result so they are not carried further down the pipeline.
Also consider whether you need int.TryParse instead of int.Parse. All these checks help cover edge cases in production.

How can I convert csv to List with Java 8?

I want to get either an empty list or a list of strings from a CSV value. I tried the approach below:
String valueStr = "US,UK";
List<String> countryCodes = StringUtils.isBlank(valueStr)
? Collections.emptyList()
: Arrays.stream(valueStr.split(DELIMITER))
.map(String::trim)
.collect(Collectors.toList());
How can I make it more concise, without the ternary operator, while keeping it simple? This works fine; I am just checking for other approaches.
static Pattern p = Pattern.compile(DELIMITER);
public static List<String> getIt(String valueStr) {
return Optional.ofNullable(valueStr)
.map(p::splitAsStream)
.map(x -> x.map(String::trim).collect(Collectors.toList()))
.orElse(Collections.emptyList());
}
You can filter:
List<String> countryCodes = Arrays.stream(
StringUtils.trimToEmpty(valueStr).split(DELIMITER))
.filter(v -> !v.trim().isEmpty())
.collect(Collectors.toList());
The above returns an empty list when tested with a blank string. It also excludes blank values (such as the last value from "UK,").
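As a rough sketch of the same filtering idea using only the JDK (no StringUtils), the snippet below assumes a comma delimiter, since the original DELIMITER constant is not shown. The class name CsvToList and the method name parse are hypothetical, chosen only for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class CsvToList {
    // Assumption: the delimiter is a comma; the original DELIMITER is not shown.
    static final String DELIMITER = ",";

    // Hypothetical helper: returns an empty list for null or blank input,
    // otherwise the trimmed, non-empty values.
    static List<String> parse(String valueStr) {
        if (valueStr == null) {
            return Collections.emptyList();
        }
        return Arrays.stream(valueStr.split(DELIMITER))
                .map(String::trim)          // normalize whitespace around each value
                .filter(v -> !v.isEmpty())  // drop blank values
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("US, UK")); // [US, UK]
        System.out.println(parse("UK,"));    // [UK]
        System.out.println(parse("   "));    // []
        System.out.println(parse(null));     // []
    }
}
```

Trimming before filtering means each value is trimmed only once, and a whitespace-only input falls out of the filter rather than needing a separate blank check.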

Spark: Add Regex column into Row

I am writing a spark job which iterates through dataset and finds matches, here's what the pseudo code looks like:
def map(data: Dataset[Row], queries: Array[Row]): Dataset[Row] = {
import spark.implicits._
val val1 = data
.flatMap(r => {
val text = r.getAs[String]("text");
queries.filter(t => t.getAs[String]("query").r.findFirstIn(text).isDefined)
.map(..//mapping)
}).toDF(..columns);
}
So, it iterates through the data and performs regex matching. The issue is that it converts the string into a regex (t.getAs[String]("query").r) on every iteration, and I am trying to move that conversion outside the loop, since repeating it is not needed.
So, I tried this (where queries array is generated):
val convertToRegex = udf[Regex, String]((arg:String) => if(arg != null) arg.r else null)
queries.withColumn("queryR", convertToRegex(col("query"))) //queries is DataFrame here
However, as expected, it threw an error saying (Schema for type scala.util.matching.Regex is not supported).
Is there any way I can add a Regex column into an array, or create a temp column before starting the iteration?

Scala Spark count regex matches in a file

I am learning Spark+Scala and I am stuck with this problem. I have one file that contains many sentences, and another file with a large number of regular expressions. Both files have one element per line.
What I want is to count how many times each regex has a match in the whole sentences file. For example if the sentences file (after becoming an array or list) was represented by ["hello world and hello life", "hello i m fine", "what is your name"], and the regex files by ["hello \\w+", "what \\w+ your", ...] then I would like the output to be something like: [("hello \\w+", 3),("what \\w+ your",1), ...]
My code is like this:
object PatternCount_v2 {
def main(args: Array[String]) {
// The text where we will find the patterns
val inputFile = args(0);
// The list of patterns
val inputPatterns = args(1)
val outputPath = args(2);
val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)
// Load the text file
val textFile = sc.textFile(inputFile).cache()
// Load the patterns
val patterns = Source.fromFile(inputPatterns).getLines.map(line => line.r).toList
val patternCounts = textFile.flatMap(line => {
println(line)
patterns.foreach(
pattern => {
println(pattern)
(pattern,pattern.findAllIn(line).length )
}
)
}
)
patternCounts.saveAsTextFile(outputPath)
}}
But the compiler complains:
If I change the flatMap to just map the code runs but returns a bunch of empty tuples () () () ()
Please help! This is driving me crazy.
Thanks,
As far as I can see, there are two issues here:
1. You should use map instead of foreach: foreach returns Unit; it performs an action with a potential side effect on each element of a collection and does not return a new collection. map, on the other hand, transforms a collection into a new one by applying the supplied function to each element.
2. You're missing the part where you aggregate the results of flatMap to get the actual count per "key" (pattern). This can be done easily with reduceByKey.
Altogether - this does what you need:
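val patternCounts = textFile
  // Key by the pattern's source string: Regex does not define equals/hashCode,
  // so Regex instances used as keys would not merge reliably in reduceByKey.
  .flatMap(line => patterns.map(pattern => (pattern.regex, pattern.findAllIn(line).length)))
  .reduceByKey(_ + _)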

Distinct by part of the string in linq

Given this collection:
var list = new [] {
"1.one",
"2. two",
"no number",
"2.duplicate",
"300. three hundred",
"4-ignore this"};
How can I get the subset of items that start with a number followed by a dot (regex @"^\d+(?=\.)") with distinct numbers? That is:
{"1.one", "2. two", "300. three hundred"}
UPDATE:
My attempt at this was to use an IEqualityComparer to pass to the Distinct method. I borrowed this GenericCompare class and tried the following code to no avail:
var pattern = @"^\d+(?=\.)";
var comparer = new GenericCompare<string>(s => Regex.Match(s, pattern).Value);
list.Where(f => Regex.IsMatch(f, pattern)).Distinct(comparer);
If you fancy an approach with LINQ, you can add a named capture group to the regex, filter the items that match it, group by the captured number, and finally take only the first string for each number. I like the readability of the solution, but I wouldn't be surprised if there is a more efficient way of eliminating the duplicates; let's see if somebody else comes up with a different approach.
Something like this:
list.Where(s => regex.IsMatch(s))
.GroupBy(s => regex.Match(s).Groups["num"].Value)
.Select(g => g.First())
You can give it a try with this sample:
public class Program
{
private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);
public static void Main()
{
var list = new [] {
"1.one",
"2. two",
"no number",
"2.duplicate",
"300. three hundred",
"4-ignore this"
};
var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
.GroupBy(s => regex.Match(s).Groups["num"].Value)
.Select(g => g.First());
distinctWithNumbers.ToList().ForEach(Console.WriteLine);
Console.ReadKey();
}
}
You can try this approach in this fiddle
As pointed out by @orad in the comments, there is a LINQ extension DistinctBy() in MoreLinq that could be used instead of grouping and then taking the first item in each group to eliminate the duplicates:
var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
.DistinctBy(s => regex.Match(s).Groups["num"].Value);
Try it in this fiddle
EDIT
If you want to use your comparer, you need to implement the GetHashCode so it uses the expression as well:
public int GetHashCode(T obj)
{
return _expr.Invoke(obj).GetHashCode();
}
Then you can use the comparer with a lambda function that takes a string and gets the number using the regex:
var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);
var distinctWithNumbers = list.Where(s => regex.IsMatch(s)).Distinct(comparer);
I have created another fiddle with this approach.
Using lookahead regex
You can use either of these two approaches with the regex @"^\d+(?=\.)".
Just replace the lambda expression that gets the "num" group, s => regex.Match(s).Groups["num"].Value, with an expression that gets the whole regex match, s => regex.Match(s).Value.
Updated fiddle here.
(I could mark this as answer too)
This solution works without duplicate regex runs:
var regex = new Regex(@"^\d+(?=\.)", RegexOptions.Compiled);
list.Select(i => {
var m = regex.Match(i);
return new KeyValuePair<int, string>( m.Success ? Int32.Parse(m.Value) : -1, i );
})
.Where(i => i.Key > -1)
.GroupBy(i => i.Key)
.Select(g => g.First().Value);
Run it in this fiddle.
Your solution is good enough.
You can also use LINQ query syntax to avoid regex re-runs with the help of let keyword as follows:
var result =
from kvp in
(
from s in source
let m = regex.Match(s)
where m.Success
select new KeyValuePair<int, string>(int.Parse(m.Value), s)
)
group kvp by kvp.Key into gr
select gr.First().Value;
Something like this should work:
List<string> c = new List<string>()
{
"1.one",
"2. two",
"no number",
"2.duplicate",
"300. three hundred",
"4-ignore this"
};
c.Where(i =>
{
var match = Regex.Match(i, @"^\d+(?=\.)");
return match.Success;
});