How to Compare strings by Regex inside a LINQ Query?

I have two DataTables, dtRosterList and falsefields.
I want to list all rows of dtRosterList that match falsefields, where the ListName column value is a sentence such as "Hello i am a list of field" and the falsefields FieldName column value is a single string such as "field". I am matching with Regex.IsMatch:
Regex.IsMatch(r.Field<string>("ListName"),@"\bFacebook Link\b")
returns true in the Visual Studio Immediate window, but
Regex.IsMatch(r.Field<string>("ListName"),@"\b"+fn+"\b") returns false inside the LINQ query itself, and I get no rows. The query is below:
var listTobeDeleted = dtRosterList.AsEnumerable()
.Where(r => falsefields.AsEnumerable()
.Select(f => f.Field<string>("FieldName")).Any(fn => Regex.Match(r.Field<string>("ListName"),@"\b"+fn+"\b",RegexOptions.IgnoreCase))).CopyToDataTable();

Two problems with your code:
You've used Regex.Match instead of Regex.IsMatch;
You've missed the @ verbatim string literal prefix on the second "\b" string;
With the prefix, the string contains two characters: \ (ASCII code 92) and b (ASCII code 98). The Regex engine interprets this as a word boundary: "the match must occur on a boundary between a word character (letter, digit or underscore) and a non-word character or the start or end of the string".
Without the prefix, the string contains a single character: backspace (ASCII code 8). The Regex engine interprets this as a literal character, and so will only match a string containing the backspace character.
I'd also be inclined to add a Regex.Escape call around the word to find, in case it contains any special characters.
var listTobeDeleted = dtRosterList.AsEnumerable()
.Where(r => falsefields.AsEnumerable()
.Select(f => f.Field<string>("FieldName"))
.Any(fn => Regex.IsMatch(r.Field<string>("ListName"), @"\b" + Regex.Escape(fn) + @"\b", RegexOptions.IgnoreCase)))
.CopyToDataTable();
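JavaScript has the same trap, since an ordinary string literal also turns "\b" into a backspace character, and it has no long-standing built-in equivalent of Regex.Escape. A sketch of the same whole-word filter in JavaScript, assuming the two tables are plain arrays of objects with ListName and FieldName properties (the sample rows and the escapeRegExp helper are illustrative, not part of the original code):

```javascript
// "\b" in a normal string literal is one backspace character (code 8);
// "\\b" is the two characters \ and b, which a regex reads as a word boundary.
console.log("\b".length, "\b".charCodeAt(0));   // 1 8
console.log("\\b".length, "\\b".charCodeAt(0)); // 2 92

// Built with "\b" (backspace) instead of "\\b", this pattern never matches here.
const broken = new RegExp("\b" + "field" + "\b", "i");
console.log(broken.test("Hello i am a list of field")); // false

// Hypothetical sample rows standing in for the two DataTables.
const rosterList = [
  { ListName: "Hello i am a list of field" },
  { ListName: "fields and more" },
  { ListName: "Facebook Link list" },
];
const falseFields = [{ FieldName: "field" }, { FieldName: "c++" }];

// Escape regex metacharacters, playing the role of .NET's Regex.Escape.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Keep rows whose ListName contains any false field as a whole word, ignoring case.
function rowsToDelete(rows, fields) {
  const patterns = fields.map(
    f => new RegExp("\\b" + escapeRegExp(f.FieldName) + "\\b", "i")
  );
  return rows.filter(r => patterns.some(p => p.test(r.ListName)));
}

console.log(rowsToDelete(rosterList, falseFields).map(r => r.ListName));
// [ 'Hello i am a list of field' ]
```

Without escapeRegExp, a field such as "c++" would make new RegExp throw a SyntaxError. Note also that \b only acts as a whole-word check when the term starts and ends with a word character, so a term like "c++" still will not match as a standalone word.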

Related

Regex to match certain characters anywhere between two characters

I want to detect (and return) any punctuation within brackets. The line of text I'm looking at will have multiple sets of brackets (which I can assume are properly formatted). So given something like this:
[abc.] [!bc]. [.e.g] [hi]
I'd want to detect all those cases and return something like [[.], [!], [..]].
I tried to do /{.*?([.,!?]+).*?}/g but then it returns true for [hello], [hi] which I don't want to match!
I'm using JS!
You can match substrings between square brackets and then remove all chars that are not punctuation:
const text = '[abc.] [!bc]. [.e.g]';
const matches = (text.match(/\[([^\][]*)]/g) || []).map(x => `[${x.replace(/[^.,?!]/g, '')}]`);
console.log(matches);
If you need the regex to be fully Unicode-aware, you can use an ECMAScript 2018+ compliant solution like
const text = '[abc.] [!bc、]. [.e.g]';
const matches = (text.match(/\[([^\][]*)]/g) || []).map(x => `[${x.slice(1, -1).replace(/[^\p{P}\p{S}]/gu, '')}]`);
console.log(matches);
So,
\[([^\][]*)] matches a string between [ and ] with no other [ and ] inside
.replace(/[^.,?!]/g, '') removes every character except the period, comma, question mark and exclamation mark
.replace(/[^\p{P}\p{S}]/gu, '') removes every character except Unicode punctuation (\p{P}) and symbols (\p{S}); note that [ and ] themselves belong to \p{P}.
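The code above keeps an empty [] for groups such as [hi], while the question's expected output omits them. A sketch that also drops those groups, using matchAll (ES2020+) to read the captured contents directly; bracketPunctuation is a hypothetical name:

```javascript
// Return the punctuation found inside each [...] group, skipping groups with none.
function bracketPunctuation(text) {
  const result = [];
  // m[1] is the text between [ and ] (no nested brackets allowed).
  for (const m of text.matchAll(/\[([^\][]*)\]/g)) {
    const punct = m[1].replace(/[^.,?!]/g, "");
    if (punct !== "") result.push(`[${punct}]`);
  }
  return result;
}

console.log(bracketPunctuation("[abc.] [!bc]. [.e.g] [hi]"));
// [ '[.]', '[!]', '[..]' ]
```

Because only the captured group is filtered, the brackets never need to be stripped, and an input with no brackets simply yields an empty array.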

Regexp to match the whole string if there is/isn't a specific number in it

I am trying to find a way to have a regexp match a whole delimited string if it fulfils one of these conditions:
The string should not contain number 1 (as a single digit, not 11 or 12)
The string contains number 1 (as a single digit, not 11 or 12)
The strings can look like the following:
1,2,wo,9,5
1
wo,1,11
I have tried the following regexp:
/^.*\b(1)\b.*$/
/^((?!1).)*$/
I am trying to match the whole string and I would like to substitute the whole string if one of the conditions is met.
This regex will find all strings which have an occurrence of 1 as a single digit:
/^.*\b1\b.*$/
When you find a match, you can replace the whole string with the word 'true' using String.replace:
const strings = ['1,2,wo,9,5','1','wo,1,11'];
strings.forEach(s => console.log(s.replace(/^.*\b1\b.*$/, 'true')));
If you just want to replace the 1 itself, you can use a much simpler regex, /\b1\b/. To replace all occurrences, use the g flag:
const strings = ['1,2,wo,1,5','1','wo,1,11'];
strings.forEach(s => console.log(s.replace(/\b1\b/g, 'true')));
If you want to find strings that don't include 1 as a single digit, you can use a negative lookahead, such as
^(?!.*\b1\b.*$).*$
and again use String.replace to replace the whole string with something e.g.
const strings = ['1,2,wo,9,5','1','wo,1,11','45,x,z,23'];
strings.forEach(s => console.log(s.replace(/^(?!.*\b1\b.*$).*$/, 'false')));
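Since the two conditions are complementary, testing once and choosing the replacement can be simpler than running two separate patterns. A sketch of that idea (classify and hasOneItem are hypothetical names; the 'true'/'false' labels follow the examples above):

```javascript
// Matches 1 as a standalone digit, as in the answer.
const hasSingleOne = /\b1\b/;
// Stricter alternative for comma-delimited input: 1 must be a whole item
// (between commas or at either end), so '1.5' or 'a-1' would not count.
const hasOneItem = /(^|,)1(,|$)/;

// Replace the whole string with 'true' or 'false' depending on the test.
function classify(s, re = hasSingleOne) {
  return re.test(s) ? "true" : "false";
}

const strings = ["1,2,wo,9,5", "1", "wo,1,11", "45,x,z,23"];
console.log(strings.map(s => classify(s)));
// [ 'true', 'true', 'true', 'false' ]
```

The arrow in strings.map(s => classify(s)) matters: passing classify directly would hand the array index to the re parameter.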

Excel VBA Regex Check For Repeated Strings

I have some user input that I want to validate for correctness. The user should input 1 or more sets of characters, separated by commas.
So these are valid inputs:
COM1
COM1,COM2,1234
These are invalid:
COM -- only 3 characters
COM1,123 -- one set is only 3 characters
COM1.1234,abcd -- a dot separator not comma
I googled for a regex pattern for this and found one that tested for a recurring instance of any 3 characters, which I modified like so:
/^(.{4,}).*\1$/
but this is not finding matches.
I can manage the last comma that may or may not be there before passing to the test so that it is always there.
Preferably, I would like to test for letters (any case) and numbers only, but I can live with any characters.
I know I could easily do this in straight VBA by splitting the input on a comma delimiter and looping through each character of each array element, but regex seems more efficient, and I will have more cases that have slightly different patterns, so parameterising the regex for those would be a better design.
TIA
I believe this does what you want:
^([A-Za-z0-9]{4},)*[A-Za-z0-9]{4}$
It's a line beginning, followed by zero or more groups of four letters or digits each ending with a comma, followed by one group of four letters or digits and the end of the line.
You can play around with it here: https://regex101.com/r/Hdv65h/1
The regular expression
"^[\w]{4}(,[\w]{4})*$"
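should work. (Note that \w also matches the underscore, so use [A-Za-z0-9] instead if only letters and digits are allowed.)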
You can check whether it works for all your cases with the following macro, assuming your test strings are in cells A1 through A5 on the spreadsheet:
' Early-bound RegExp needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Sub findPattern()
Dim regEx As New RegExp
regEx.IgnoreCase = True
regEx.Pattern = "^[\w]{4}(,[\w]{4})*$"
Dim i As Integer
Dim val As String
For i = 1 To 5
val = Trim(Cells(i, 1).Value)
' Test returns True if the pattern matches; the ^...$ anchors make it a whole-string check
If regEx.Test(val) Then
MsgBox "Match found for " & val
Else
MsgBox "No match found for " & val
End If
Next i
End Sub
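The question mentions parameterising the pattern for other, slightly different cases. A sketch of that idea in JavaScript; makeSetPattern and its parameters are hypothetical. It returns a plain pattern string such as ^[A-Za-z0-9]{4,4}(?:,[A-Za-z0-9]{4,4})*$, which uses only basic regex features, so assigning it to regEx.Pattern in VBA should work too, though that is an assumption here. Whether each set is exactly four characters or at least four is left to the caller, since the question's own attempt used {4,}:

```javascript
// Build an anchored pattern for one or more comma-separated sets of ASCII
// letters/digits, each between minLen and maxLen characters long.
// Leave maxLen empty for "at least minLen".
function makeSetPattern(minLen, maxLen = "") {
  const item = `[A-Za-z0-9]{${minLen},${maxLen}}`;
  return `^${item}(?:,${item})*$`;
}

const exactlyFour = new RegExp(makeSetPattern(4, 4));
const fourOrMore = new RegExp(makeSetPattern(4));

const inputs = ["COM1", "COM1,COM2,1234", "COM", "COM1,123", "COM1.1234,abcd"];
console.log(inputs.map(s => exactlyFour.test(s)));
// [ true, true, false, false, false ]
console.log(fourOrMore.test("COM10,ABCD"));  // true
console.log(exactlyFour.test("COM10,ABCD")); // false
```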

Regex to create url friendly string

I want to create a URL-friendly string (one that contains only letters, numbers and hyphens) from user input. It should:
remove all characters which are not a-z, 0-9, space or hyphens
replace all spaces with hyphens
replace multiple hyphens with a single hyphen
Expected outputs:
my project -> my-project
test project -> test-project
this is # long str!ng with spaces and symbo!s -> this-is-long-strng-with-spaces-and-symbos
Currently I'm doing this in 3 steps:
$identifier = preg_replace('/[^a-zA-Z0-9\-\s]+/','',strtolower($project_name)); // remove all characters which are not a-z, 0-9, space or hyphens
$identifier = preg_replace('/(\s)+/','-',strtolower($identifier)); // replace all spaces with hyphens
$identifier = preg_replace('/(\-)+/','-',strtolower($identifier)); // replace all hyphens with single hyphen
Is there a way to do this with one single regex ?
Yeah, @Jerry is correct in saying that you can't do this in one plain replacement, because each match has to be replaced with one of two different things (nothing, or a dash), depending on context. I think Jerry's answer is the best way to go about this, but something else you can do is use preg_replace_callback. This lets you evaluate each match and choose the replacement according to what was matched.
$string = 'my project
test project
this is # long str!ng with spaces and symbo!s';
$string = preg_replace_callback('/([^A-Z0-9]+|\s+|-+)/i', function($m){$a = '';if(preg_match('/(\s+|-+)/i', $m[1])){$a = '-';}return $a;}, $string);
print $string;
Here is what this means:
/([^A-Z0-9]+|\s+|-+)/i This matches any run of one or more characters that are not letters or digits and passes each match to the function for evaluation. (Spaces and hyphens are neither letters nor digits, so the first alternative already covers them and the \s+ and -+ alternatives never get a chance to match.)
function($m){ ... } This is the function that will evaluate the matches. $m will hold the matches that it found.
$a = ''; Set a default of an empty string for the replacement
if(preg_match('/(\s+|-+)/i', $m[1])){$a = '-';} If our match (the value stored in $m[1]) contains any whitespace or hyphen, then set $a to a dash instead of an empty string.
return $a; Since this is a function, we will return the value and that value will be plopped into the string wherever it found a match.
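The callback approach carries over to any engine whose replace accepts a function. A JavaScript sketch (slugify is a hypothetical name) that also folds in the lowercasing from the question:

```javascript
// Lowercase first, then replace each maximal run of characters that are not
// a-z or 0-9: with "-" if the run contains whitespace or a hyphen, else with "".
function slugify(input) {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, run => (/[\s-]/.test(run) ? "-" : ""));
}

console.log(slugify("this is # long str!ng with spaces and symbo!s"));
// this-is-long-strng-with-spaces-and-symbos
```

Because each run is matched as a whole, "a # b" becomes "a-b" in one pass, the same result the three-step version produces.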
I don't think there's a way to do it in a single replacement, but you could reduce the number of replacements and, in an extreme case, use a one-liner like this:
$text=preg_replace("/[\s-]+/",'-',preg_replace("/[^a-zA-Z0-9\s-]+/",'',$text));
It first removes every character that is not alphanumeric, whitespace or a dash, then replaces each run of whitespace and dashes with a single dash.
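The same two-pass idea in JavaScript, as a sketch (slugifyChained and slugifyWrongOrder are hypothetical names). The order of the two replacements matters: collapsing first turns "a # b" into "a-#-b", and removing the # afterwards leaves "a--b":

```javascript
function slugifyChained(input) {
  return input
    .toLowerCase()
    .replace(/[^a-z0-9\s-]+/g, "") // 1) drop everything except a-z, 0-9, whitespace and "-"
    .replace(/[\s-]+/g, "-");      // 2) collapse each run of whitespace/hyphens to one "-"
}

// Same two steps in reverse order, to show why the order matters.
function slugifyWrongOrder(input) {
  return input
    .toLowerCase()
    .replace(/[\s-]+/g, "-")
    .replace(/[^a-z0-9\s-]+/g, "");
}

console.log(slugifyChained("a # b"));    // a-b
console.log(slugifyWrongOrder("a # b")); // a--b
```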
Since you want to replace each thing with something different, you will have to do this in multiple iterations.
Sorry D:

Regular Expression to split by comma + ignores comma within double quotes. VB.NET

I'm trying to parse a CSV file with VB.NET.
The CSV file contains values like 0,"1,2,3",4, which get split into 5 fields instead of 3. There are many examples in other languages on Stack Overflow, but I can't get them working in VB.NET.
Here is my code so far but it doesn't work...
Dim t As String() = Regex.Split(str(i), ",(?=([^\""]*\""[^\""]*\"")*[^\""]*$)")
Assuming your CSV is well-formed (i.e. no " other than those delimiting string fields or escaped like \"), you can split on any comma that is followed by an even number of non-escaped quote marks. (If the comma is inside a pair of quotes, an odd number of quotes remains on the line.)
The regex you've tried is almost there. One catch: Regex.Split adds the text of any capturing groups to the returned array, so in .NET the groups should be non-capturing (?:...) (or use RegexOptions.ExplicitCapture).
The following looks for a comma followed by an even number of any sort of quote marks:
,(?=(?:[^"]*"[^"]*")*[^"]*$)
To modify it to look for an even number of non-escaped quote marks (assuming quote marks are escaped with a backslash, like \"), replace each [^"] with (?:[^"\\]|\\.). This means "match a character that is neither a " nor a backslash, OR match a backslash and the character immediately following it". (The backslash is doubled because it has to match a literal backslash.)
,(?=(?:(?:[^"\\]|\\.)*"(?:[^"\\]|\\.)*")*(?:[^"\\]|\\.)*$)
Now to get it into VB.NET you just need to double all your quote marks:
splitRegex = ",(?=(?:(?:[^""\\]|\\.)*""(?:[^""\\]|\\.)*"")*(?:[^""\\]|\\.)*$)"
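JavaScript's String.prototype.split, like .NET's Regex.Split, inserts the text of capturing groups into its result, so it is a convenient place to check why non-capturing groups matter for the escaped-quote pattern. A sketch, assuming backslash-escaped quotes as in the answer:

```javascript
// Split on commas followed by an even number of unescaped quotes.
const splitOnUnquotedComma =
  /,(?=(?:(?:[^"\\]|\\.)*"(?:[^"\\]|\\.)*")*(?:[^"\\]|\\.)*$)/;

const line = '0,"1,2,3",4';
console.log(line.split(splitOnUnquotedComma)); // [ '0', '"1,2,3"', '4' ]

// An escaped quote inside a field does not end the field.
console.log('a,"b\\",c",d'.split(splitOnUnquotedComma)); // three fields: a, "b\",c" and d

// The same pattern with capturing groups: each split also inserts the
// four captured groups (some undefined), so the result has extra elements.
const capturing = /,(?=(([^"\\]|\\.)*"([^"\\]|\\.)*")*([^"\\]|\\.)*$)/;
console.log(line.split(capturing).length); // more than 3
```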
Instead of a regular expression, try using the TextFieldParser class for reading .csv files. It handles your situation exactly.
Especially look at the HasFieldsEnclosedInQuotes property.
Example:
Note: I used a string instead of a file, but the result would be the same.
Dim theString As String = "1,""2,3,4"",5"
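' IO. and FileIO. resolve through the default System and Microsoft.VisualBasic project imports
Using rdr As New IO.StringReader(theString)
Using parser As New FileIO.TextFieldParser(rdr)
parser.TextFieldType = FileIO.FieldType.Delimited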
parser.Delimiters = New String() {","}
parser.HasFieldsEnclosedInQuotes = True
Dim fields() As String = parser.ReadFields()
For i As Integer = 0 To fields.Length - 1
Console.WriteLine("Field {0}: {1}", i, fields(i))
Next
End Using
End Using
Output:
Field 0: 1
Field 1: 2,3,4
Field 2: 5
This worked great for parsing a Shipping Notice .csv file we receive. Thanks for keeping this solution here.
This is my version of the code:
Try
Using rdr As New IO.StringReader(Row.FlatFile)
Using parser As New FileIO.TextFieldParser(rdr)
parser.TextFieldType = FileIO.FieldType.Delimited
parser.Delimiters = New String() {","}
parser.HasFieldsEnclosedInQuotes = True
Dim fields() As String = parser.ReadFields()
Row.Account = fields(0).Trim()
Row.AccountName = fields(1).Trim()
Row.Status = fields(2).Trim()
Row.PONumber = fields(3).Trim()
Row.ErrorMessage = ""
End Using
End Using
Catch ex As Exception
Row.ErrorMessage = ex.Message
End Try
It's possible to do this with a regex in VB.NET as follows:
,(?=(?:[^"]*"[^"]*")*[^"]*$)
The positive lookahead ((?= ... )) ensures that there is an even number of quotes ahead of the comma to split on (i.e. either they occur in pairs, or there are none).
[^"]* matches non-quote characters.
Given below is a VB.NET example to apply the regex.
Imports System
Imports System.Text.RegularExpressions
Public Class Test
Public Shared Sub Main()
Dim theString As String = "1,""2,3,4"",5"
Dim theStringArray As String() = Regex.Split(theString, ",(?=(?:[^""]*""[^""]*"")*[^""]*$)")
For i As Integer = 0 To theStringArray.Length - 1
Console.WriteLine("theStringArray {0}: {1}", i, theStringArray(i))
Next
End Sub
End Class
'Output:
'theStringArray 0: 1
'theStringArray 1: "2,3,4"
'theStringArray 2: 5
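To see what the lookahead is checking, here is a non-regex sketch in JavaScript (splitCsvLine is a hypothetical name) that splits on a comma only when the number of quote characters after it is even. Like the regex, it assumes well-formed input without escaped quotes:

```javascript
// Split on a comma only if an even number of " characters follows it on the line,
// mirroring ,(?=(?:[^"]*"[^"]*")*[^"]*$). Quotes are kept in the fields.
function splitCsvLine(line) {
  const fields = [];
  let start = 0;
  for (let i = 0; i < line.length; i++) {
    if (line[i] !== ",") continue;
    let quotesAhead = 0;
    for (let j = i + 1; j < line.length; j++) {
      if (line[j] === '"') quotesAhead++;
    }
    if (quotesAhead % 2 === 0) {
      fields.push(line.slice(start, i));
      start = i + 1;
    }
  }
  fields.push(line.slice(start));
  return fields;
}

console.log(splitCsvLine('1,"2,3,4",5')); // [ '1', '"2,3,4"', '5' ]
```

Like the lookahead, this rescans the rest of the line at every comma, so it is quadratic in the worst case; a single left-to-right pass that toggles an "inside quotes" flag at each quote gives the same result on well-formed input in linear time.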