I need to remove some emoji characters from a string using Classic ASP and VBScript. Here is what I have:
👪 Repeat / Other
📅 Scheduled
💲 Lead
And what I need:
Repeat / Other
Scheduled
Lead
I have been able to remove the emojis using the function below, but it also removes special characters that I want to keep, such as the forward slash (/), spaces, &, :, etc.
Any help is appreciated.
Function strClean(strtoclean)
    Dim objRegExp, outputStr
    Set objRegExp = New RegExp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    ' Replace each run of characters that are not ASCII letters or digits with "-"
    objRegExp.Pattern = "((?![a-zA-Z0-9]).)+"
    outputStr = objRegExp.Replace(strtoclean, "-")
    ' Then remove all the hyphen runs
    objRegExp.Pattern = "\-+"
    outputStr = objRegExp.Replace(outputStr, "")
    strClean = outputStr
End Function
Your current regex matches any character other than a line break or an ASCII letter or digit. It does not match emojis because the VBScript regex engine, which is based on ECMA-262 3rd edition, cannot match astral-plane characters with a bare . pattern.
If you just want to add emoji matching support to your current pattern, you can replace the . with the (?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]) pattern and use:
objRegExp.Pattern = "(?:(?![a-zA-Z0-9])(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]))+"
See the regex demo
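For a quick feel of what the surrogate-aware pattern does, here is a rough TypeScript sketch. It assumes that a JavaScript regex without the u flag, which also works on UTF-16 code units and shares the ECMAScript syntax, behaves closely enough to the VBScript engine for this pattern; it is not a test of VBScript itself. The helper name removeNonAlnum is made up for the example.

```typescript
// Sketch only: the VBScript pattern above written as a JavaScript regex literal.
// No "u" flag, so matching works on UTF-16 code units (emoji = surrogate pair).
const nonAlnumRun =
  /(?:(?![a-zA-Z0-9])(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]))+/g;

// Hypothetical helper: removes every run of characters that are not
// ASCII letters or digits, including emoji encoded as surrogate pairs.
function removeNonAlnum(input: string): string {
  return input.replace(nonAlnumRun, "");
}

for (const label of ["👪 Repeat / Other", "📅 Scheduled", "💲 Lead"]) {
  console.log(removeNonAlnum(label));
}
```

As written, this pattern still treats spaces and the slash as characters to remove (it only extends your original pattern), so the first label comes out as RepeatOther rather than Repeat / Other.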
If you just want to remove everything except printable ASCII characters, you can use:
objRegExp.Pattern = "(?:(?![ -~])[\s\S])+"
The pattern matches one or more (+) characters ([\s\S] matches any whitespace or non-whitespace character) that are not printable ASCII characters, i.e. not in the range from space to ~.
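A minimal TypeScript sketch of the ASCII-only approach, again assuming JavaScript's ECMAScript regex (without the u flag) as a stand-in for the VBScript engine. The helper stripNonAscii is hypothetical, and the final trim() is an addition not in the answer above: it drops the space that separated the emoji from the label. In VBScript you could wrap the Replace call in Trim(...) for the same effect.

```typescript
// Matches one or more characters that are NOT printable ASCII (space through ~).
const nonPrintableAscii = /(?:(?![ -~])[\s\S])+/g;

// Hypothetical helper: remove non-ASCII characters (emoji included),
// then trim the leftover leading/trailing whitespace.
function stripNonAscii(input: string): string {
  return input.replace(nonPrintableAscii, "").trim();
}

for (const label of ["👪 Repeat / Other", "📅 Scheduled", "💲 Lead"]) {
  console.log(stripNonAscii(label));
}
```

Keep in mind that [ -~] does not include tabs or line breaks, so those are removed as well.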
Related
I need to replace the date and time in an XML file using a regex pattern.
The XML text contains:
w:date="2022-12-01T01:17:00Z"
w:date="2022-12-01T02:17:00Z"
w:date="2022-12-02T03:17:00Z"
A possible regex pattern for the above would be:
w:date="[\d\W]\w[\d\W]\w"
but it does not replace anything, and the resulting string remains unchanged in the following VBA code:
Sub ChangeDateTime()
    Dim sWOOXML As String
    Dim objRegEx As Object
    Set objRegEx = CreateObject("vbscript.regexp")
    objRegEx.Global = True
    objRegEx.IgnoreCase = True
    objRegEx.MultiLine = True
    objRegEx.Pattern = "w:date=" & Chr(34) & "[\d\W]\w[\d\W]\w" & Chr(34)
    ' Get the document content as WordOpenXML, run the replacement, insert the result back
    sWOOXML = ActiveDocument.Content.WordOpenXML
    sWOOXML = objRegEx.Replace(sWOOXML, "")
    ActiveDocument.Content.InsertXML sWOOXML
    Beep
End Sub
Your [\d\W]\w[\d\W]\w regex fails to match because it only allows exactly two repetitions of a digit-or-non-word character followed by a word character between the two double quotes, while your values contain many more characters.
You can use:
objRegEx.Pattern = "w:date=""\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"""
See the regex demo. Note that you can put a double quote inside a VBA string literal by doubling it (""), so there is no need to use Chr(34).
This is a verbose pattern: \d{4} matches four digits, \d{1,2} matches one or two digits, and the rest of the characters are matched literally.
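Here is a small TypeScript sketch that runs the same pattern against the three sample values from the question, with the i flag standing in for IgnoreCase = True. The replacement timestamp is a made-up example value; deleting each match, as the VBA code does, would be replace(wDate, "").

```typescript
// Same pattern as the VBA answer; "g" = Global, "i" = IgnoreCase.
const wDate = /w:date="\d{4}-\d{1,2}-\d{1,2}T\d{1,2}:\d{1,2}:\d{1,2}Z"/gi;

// The three sample values from the question, one per line.
const sampleXml = [
  'w:date="2022-12-01T01:17:00Z"',
  'w:date="2022-12-01T02:17:00Z"',
  'w:date="2022-12-02T03:17:00Z"',
].join("\n");

// How many attributes does the pattern find?
const found = sampleXml.match(wDate) ?? [];
console.log(found.length);

// Replace every attribute with a fixed (example) timestamp.
const updated = sampleXml.replace(wDate, 'w:date="2023-01-01T00:00:00Z"');
console.log(updated);
```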
I use the code below to check whether a string matches a pattern:
Sub chkPattern(str As String, pattern As String)
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    objRegex.Pattern = pattern
    MsgBox objRegex.Test(str)
End Sub
Specifically, I want to check whether the string matches the whole string "abc", "cde", or "xy".
For example, if the input is "abccde", "abcxy", or "abccdexyz", I expect it to return False.
Some patterns I have already tried, such as "abc|cde|xyz" and "\b(abc|cde|xyz)\b)", do not work.
Can this be done in VBA by using Regex?
Yes, it is possible. As I read your question, you want to apply an OR using the pipe character.
Sub Test()
    Dim arr As Variant: arr = Array("abc", "cde", "xy")
    With CreateObject("VBScript.RegExp")
        .Pattern = "^(" & Join(arr, "|") & ")$"
        Debug.Print .Test("abcd") 'Will return False
        Debug.Print .Test("abc")  'Will return True
    End With
End Sub
The key to matching the whole string here is the start-of-string anchor ^ and the end-of-string anchor $. If you meant to test for a partial match, you have simply reversed the slashes: use backslashes instead of forward slashes, i.e. \b(abc|cde|xyz)\b as the pattern.
Remember, if you want a case-insensitive comparison, set .IgnoreCase = True.
Alternatively, use the built-in Like operator.
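To try the anchored alternation outside VBA, here is a TypeScript sketch that builds ^(abc|cde|xy)$ from a list the same way Join(arr, "|") does, plus a \b variant for whole-word matches inside a longer string. It assumes the list items contain no regex metacharacters; otherwise they would need escaping first.

```typescript
// Build the patterns from a list, like Join(arr, "|") in the VBA answer.
// Assumption: the items contain no regex metacharacters.
const allowed = ["abc", "cde", "xy"];

// Whole-string match: the entire input must be one of the alternatives.
const wholeString = new RegExp("^(" + allowed.join("|") + ")$");

// Whole-word match: one of the alternatives as a separate word somewhere in the input.
const wholeWord = new RegExp("\\b(" + allowed.join("|") + ")\\b");

for (const s of ["abc", "abcd", "abccde", "abcxy", "abccdexyz"]) {
  console.log(s, wholeString.test(s));
}
console.log(wholeWord.test("abc def")); // true: "abc" is a separate word
```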
To match a whole word, use:
(\w+)
https://regex101.com/r/sve6Tp/1
(\w+) - a capturing group
\w - matches any word character (equivalent to [a-zA-Z0-9_])
+ - a quantifier that matches between one and unlimited times, as many times as possible
\babc\b|\bcde\b|\bxy\b should work for "abc" or "cde" or "xy" but not other variants.
Hello, I'm trying to find all matching expressions in a file using a regex in VB.NET.
I have this code:
Dim written As MatchCollection = Regex.Matches(ToTreat, "\bGlobalIndexImage = \'(?![0-9])([A-Za-z])\w+\'")
For Each writ As Match In written
    For Each w As Capture In writ.Captures
        MsgBox(w.Value.ToString)
    Next
Next
I have this Regex now:
\bGlobalIndexImage = \'(?![0-9])([A-Za-z])\w+\'
I'm trying to match all occurrences of this form:
GlobalIndexImage = 'images'
GlobalIndexImage = 'Search'
But I also get values like this one, which I don't want to match:
GlobalIndexImage = 'Z0003_S16G2'
So I want my regex to simply exclude a match if it contains digits.
The \w shorthand character class matches letters, digits, and _. If you need only letters, just use [a-zA-Z]:
"\bGlobalIndexImage = '([A-Za-z]+)'"
See the regex demo.
Details:
\b - a leading word boundary
GlobalIndexImage = ' - a string of literal chars
([A-Za-z]+) - Group 1 capturing one or more (due to + quantifier) ASCII letters
' - a single quote.
If you need to match any Unicode letters, replace [a-zA-Z] with \p{L}.
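VB.NET:
' Requires at the top of the file: Imports System.Text.RegularExpressions (and System.Linq)
Dim text = "GlobalIndexImage = 'images' GlobalIndexImage = 'Search'"
Dim pattern As String = "\bGlobalIndexImage = '([A-Za-z]+)'"
' Collect the Group 1 value (the letters between the quotes) of every match
Dim matches As List(Of String) = Regex.Matches(text, pattern) _
    .Cast(Of Match)() _
    .Select(Function(m) m.Groups(1).Value) _
    .ToList()
Console.WriteLine(String.Join(vbLf, matches))
Output:
images
Search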
To match everything that's not a digit, use \D.
So your regex will be something like
\bGlobalIndexImage = \'\D+\'
But this will also match values that contain whitespace. To get only letters, use [a-zA-Z]:
\bGlobalIndexImage = \'[a-zA-Z]+\'
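As a sanity check of the letters-only pattern against the three sample values from the question, here is a TypeScript sketch (the helper extractImageNames is hypothetical). JavaScript's [A-Za-z] and \b behave the same as .NET's for these ASCII inputs, but .NET features such as \p{L} would need the u flag in JavaScript.

```typescript
// Letters-only pattern from the answers above; group 1 holds the quoted value.
function extractImageNames(source: string): string[] {
  const pattern = /\bGlobalIndexImage = '([A-Za-z]+)'/g;
  const names: string[] = [];
  let m: RegExpExecArray | null;
  // exec with the "g" flag walks through the matches one by one.
  while ((m = pattern.exec(source)) !== null) {
    names.push(m[1]);
  }
  return names;
}

const sampleSource = [
  "GlobalIndexImage = 'images'",
  "GlobalIndexImage = 'Search'",
  "GlobalIndexImage = 'Z0003_S16G2'", // contains digits and _, so it is skipped
].join("\n");

console.log(extractImageNames(sampleSource).join("\n"));
```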
I have struggled with this expression for two days now, so I thought I'd ask for some proper help from the world of knowledge. I hope someone can help.
This is the RegEx I built to get me what I want.
\S*\d*?-[A-Z]*[0-9]*
I only want the uppercase letters and numbers with dashes, so it does get GC-113, AO-1-GC-113, and AO-2-GC-113, which is great!
"I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113"
BUT if I come across one where there is no space between the codes, just another character like a comma or a period, then it returns a single match for the entire section "GC-113,AO-1-GC-113,AO-2-GC-113":
"I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113"
I'm using RegexBuddy to try and figure this out.
This is the VBA code I'm using to get the matches:
' Early-bound RegExp: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Public Function GetRIs(ByVal vstrInString As String) As Collection
    Dim myRegExp As RegExp
    Dim myMatches As Variant
    Dim myMatch As Variant
    Set GetRIs = New Collection
    Set myRegExp = New RegExp
    myRegExp.Global = True
    myRegExp.Pattern = "\S*\d*?-[A-Z]*[0-9]*"
    Set myMatches = myRegExp.Execute(vstrInString)
    ' Add every non-empty match to the result collection
    For Each myMatch In myMatches
        If myMatch.Value <> "" Then
            GetRIs.Add myMatch.Value
        End If
    Next
End Function
Thanks!
Dave
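Your \S*\d*?-[A-Z]*[0-9]* pattern can even match a lone hyphen, because only - is obligatory and the rest of the subpatterns can match zero times (i.e. they can be absent from the string). Also, \S* matches any non-whitespace character, including commas and periods, which is why a run such as GC-113,AO-1-GC-113,AO-2-GC-113 comes back as one match.
You can use: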
myRegExp.Pattern = "\b[A-Z0-9]+(?:-[A-Z0-9]+)+"
The pattern matches:
\b - a word boundary (before the next letter or digit there must be a non-word character or the start of the string)
[A-Z0-9]+ - one or more letters or digits
(?:-[A-Z0-9]+)+ - 1 or more sequences of:
- - a hyphen
[A-Z0-9]+ - one or more letters or digits
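A short TypeScript sketch that runs the answer's pattern over both sample strings from the question; the helper getCodes is made up and simply returns every match, like the GetRIs function does. As in the VBA code, the pattern is case-sensitive (no i flag), which is what keeps lowercase words out.

```typescript
// Pattern from the answer: a run of uppercase letters/digits followed by
// one or more "-run" groups. Case-sensitive, like the VBA code (no IgnoreCase).
const codePattern = /\b[A-Z0-9]+(?:-[A-Z0-9]+)+/g;

// Hypothetical helper: all matches, or an empty array when there are none.
function getCodes(input: string): string[] {
  return input.match(codePattern) ?? [];
}

const spaced = "I don't want this ------, but this is good GC-113, AO-1-GC-113, AO-2-GC-113";
const packed = "I don't want this ------, but this is good GC-113,AO-1-GC-113,AO-2-GC-113";

console.log(getCodes(spaced));
console.log(getCodes(packed)); // commas no longer glue the codes together
```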
How do I check whether a particular value starts with a letter or a digit? Here is my code. I am getting an error like "identifier expected".
code
----
Dim i As String
dim ReturnValue as boolean
i = 400087
Dim s_str As String = i.Substring(0, 1)
Dim regex As Regex = New Regex([(a - z)(A-Z)])
ReturnValue = Regex.IsMatch(s_str, Regex)
error
'Regex' is a type and cannot be used as an expression
Your variable is regex; Regex is the type of the variable.
So it is:
ReturnValue = regex.IsMatch(s_str)
But your regex is also flawed. [(a - z)(A-Z)] creates a character class that matches exactly the characters (, ), a, z, a space (the " - " is parsed as a range from space to space), and the range A-Z, and nothing else.
It looks to me as if you want to match letters. For that, just use \p{L}, a Unicode property that matches any character that is a letter in any language.
Dim regex As Regex = New Regex("[\p{L}\d]")
Maybe you mean:
Dim _regex As Regex = New Regex("[(a-z)(A-Z)]")
ReturnValue = _regex.IsMatch(s_str)
Note the case difference: call IsMatch on your Regex variable, not on the Regex type. You also need to quote the regex string: "[(a - z)(A-Z)]".
Finally, that regex doesn't make sense: it matches any letter or opening/closing parenthesis anywhere in the string.
To match at the start of the string, you need to include the start anchor ^; for example, ^[a-zA-Z] matches any ASCII letter at the start of the string.
Check if a string starts with a letter or digit:
ReturnValue = Regex.IsMatch(s_str, "^[a-zA-Z0-9]+")
Regex explanation:
^             # Matches the start of the string
[a-zA-Z0-9]   # Followed by any ASCII letter or digit
+             # One or more letters or digits
See it in action here.
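A minimal TypeScript sketch of the same "starts with a letter or digit" check, assuming ASCII letters are enough; the helper name startsWithLetterOrDigit is hypothetical. Only the first character needs to be checked, so the trailing + from the pattern above is dropped here.

```typescript
// Hypothetical helper: true if the first character is an ASCII letter or digit.
function startsWithLetterOrDigit(value: string): boolean {
  return /^[a-zA-Z0-9]/.test(value);
}

console.log(startsWithLetterOrDigit("400087")); // true: starts with a digit
console.log(startsWithLetterOrDigit("abc"));    // true: starts with a letter
console.log(startsWithLetterOrDigit("(x)"));    // false: starts with "("
```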