I am using Microsoft Project VBA to translate my activity names from English to Chinese.
My problem is I have some Chinese translations embedded in some of the English activity names. I want to strip out the Chinese characters before passing the string to Microsoft Translator.
Any ideas as to how I can do that?
You can use a RegExp to strip the Chinese Unicode characters.
Wikipedia lists the relevant character ranges.
Sub Test()
    Dim myString As String
    myString = "This is my string with a " & ChrW$(&H6C49) & " in it."
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        ' U+4E00-U+9FFF is the CJK Unified Ideographs block
        .Pattern = "[\u4E00-\u9FFF]+"
        MsgBox .Replace(myString, vbNullString)
    End With
End Sub
So this regexp will strip out these ranges. I have used aldo.roman.nurena's string example
You have to use ChrW$(), like this:
MyString = "This is my string with a " & ChrW$(&H6C49) & " in it."
The code point &H6C49 is, thankfully, part of the Unicode CJK (Chinese, Japanese and Korean) ranges. See this for the character ranges.
So you have to check each character's Unicode code point and test whether it falls within the CJK range to decide whether to translate it or not.
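The per-character check described above could be sketched as follows. This is a minimal illustration, not the answerer's code: the function name StripCjk is hypothetical, and it assumes that only the CJK Unified Ideographs block (U+4E00 to U+9FFF) needs to be removed. AscW returns a signed Integer, so the value is masked to an unsigned code point before comparing.

```vba
' Hypothetical helper: returns source with every character in
' U+4E00-U+9FFF (CJK Unified Ideographs) removed.
Function StripCjk(ByVal source As String) As String
    Dim i As Long
    Dim code As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(source)
        ch = Mid$(source, i, 1)
        ' AscW returns a negative Integer for code points above &H7FFF,
        ' so mask it to get the unsigned value as a Long
        code = AscW(ch) And &HFFFF&
        ' Note the trailing & on the literals: &H9FFF alone would be
        ' read as a negative Integer
        If code < &H4E00& Or code > &H9FFF& Then
            result = result & ch
        End If
    Next i

    StripCjk = result
End Function
```

Calling StripCjk on the example string removes the single Chinese character and leaves the surrounding spaces in place, so you may want to collapse double spaces afterwards.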
There is also a good explanation, and even a program to translate strings, here.
I'm new to VBA and would like some help with using RegEx; I hope someone can enlighten me on what I'm doing wrong. I'm currently trying to split a date into its individual day, month and year, and possible delimiters include ",", "-" and "/".
Function formattedDate(inputDate As String) As String
Dim dateString As String
Dim dateStringArray() As String
Dim day As Integer
Dim month As String
Dim year As Integer
Dim assembledDate As String
Dim monthNum As Integer
Dim tempArray() As String
Dim pattern As String()
Dim RegEx As Object
dateString = inputDate
Set RegEx = CreateObject("VBScript.RegExp")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
' .... code continues
This is what I am currently doing. However, something seems to go wrong in the RegEx.Split call: my code hangs there and does not proceed any further.
To confirm, I did something simple:
MsgBox("Hi")
pattern = "(/)|(,)|(-)"
dateStringArray() = RegEx.Split(dateString, pattern)
MsgBox("Bye")
The "Hi" message box pops up, but the "Bye" message box never appears, and the code further down doesn't seem to get executed at all, which led me to suspect that RegEx.Split is where it gets stuck.
Can I check if I'm actually using RegEx.Split the right way? According to MSDN here, Split(String, String) returns an array of strings as well.
Thank you!
Edit: I'm trying not to explore the CDate() function as I am trying not to depend on the locale settings of the user's computer.
To split a string with a regular expression in VBA:
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
    Static re As Object
    If re Is Nothing Then
        Set re = CreateObject("VBScript.RegExp")
        re.Global = True
        re.MultiLine = True
    End If
    re.IgnoreCase = IgnoreCase
    re.Pattern = Pattern
    SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
End Function
Usage example:
Dim v
v = SplitRe("a,b/c;d", "[,;/]")
Splitting by a regex is definitely nontrivial to implement compared to other regex operations, so I don't blame you for being stumped!
If you wanted to implement it yourself, it helps to know that the Match objects returned by RegExp.Execute (Microsoft VBScript Regular Expressions 5.5) have a FirstIndex property and a Length property, so you can loop through the matches and pick out all the substrings between the end of one match (or the start of the string) and the start of the next match (or the end of the string).
If you don't want to implement it yourself, I've also implemented a RegexSplit UDF using those same RegExp objects on my GitHub.
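As a rough sketch of that loop (not the GitHub UDF mentioned above, whose code is not shown here), something like the following should work. The name RegexSplitSketch is hypothetical, the object is late-bound so no library reference is needed, and the sketch assumes the pattern never produces zero-length matches.

```vba
' Hypothetical sketch: split source on every match of pattern,
' using Match.FirstIndex (zero-based) and Match.Length.
Function RegexSplitSketch(ByVal source As String, ByVal pattern As String) As String()
    Dim re As Object
    Dim matches As Object
    Dim m As Object
    Dim parts() As String
    Dim i As Long
    Dim startPos As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = pattern

    Set matches = re.Execute(source)
    ' n delimiters produce n + 1 pieces
    ReDim parts(0 To matches.Count)

    startPos = 1    ' Mid$ is one-based
    For Each m In matches
        ' Text between the previous delimiter and this one
        parts(i) = Mid$(source, startPos, m.FirstIndex + 1 - startPos)
        ' Continue just past the end of this delimiter
        startPos = m.FirstIndex + m.Length + 1
        i = i + 1
    Next m
    ' Whatever follows the last delimiter
    parts(i) = Mid$(source, startPos)

    RegexSplitSketch = parts
End Function
```

For the date case in the question, a call such as RegexSplitSketch("12/05-2014", "[/,-]") would give a three-element array with the pieces between the delimiters.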
Quoting an example from the documentation of VbScript Regexp:
https://msdn.microsoft.com/en-us/library/y27d2s18%28v=vs.84%29.aspx
Function SubMatchTest(inpStr)
    Dim retStr
    Dim oRe, oMatch, oMatches
    Set oRe = New RegExp
    ' Look for an e-mail address (not a perfect RegExp)
    oRe.Pattern = "(\w+)@(\w+)\.(\w+)"
    ' Get the Matches collection
    Set oMatches = oRe.Execute(inpStr)
    ' Get the first item in the Matches collection
    Set oMatch = oMatches(0)
    ' Create the results string.
    ' The Match object is the entire match - dragon@xyzzy.com
    retStr = "Email address is: " & oMatch & vbNewLine
    ' Get the sub-matched parts of the address.
    retStr = retStr & "Email alias is: " & oMatch.SubMatches(0) ' dragon
    retStr = retStr & vbNewLine
    retStr = retStr & "Organization is: " & oMatch.SubMatches(1) ' xyzzy
    SubMatchTest = retStr
End Function
To test, call:
MsgBox(SubMatchTest("Please send mail to dragon@xyzzy.com. Thanks!"))
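In short, your Pattern needs to match the various parts you want to extract, with the separators in between, maybe something like:
"(\d+)[/,-](\d+)[/,-](\d+)"
(The hyphen goes last in the character class so that it is read as a literal rather than as a range operator; "[/-,]" would be an invalid range.)
The whole thing will be in oMatch, while the numbers (\d+) will end up in oMatch.SubMatches(0) to oMatch.SubMatches(2).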
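Applied to the date question, that idea might look like the sketch below. The function name SplitDateParts is hypothetical, and the sketch assumes all three parts are numeric (a month written as a name would need a different pattern). It returns the three parts as strings and leaves their interpretation as day, month or year to the caller.

```vba
' Hypothetical sketch: returns a 3-element Variant array of the
' numeric parts of inputDate, or Empty if the string does not match.
Function SplitDateParts(ByVal inputDate As String) As Variant
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    ' Three runs of digits separated by "/", "," or "-"
    re.Pattern = "^\s*(\d+)[/,-](\d+)[/,-](\d+)\s*$"

    Set matches = re.Execute(inputDate)
    If matches.Count = 0 Then
        SplitDateParts = Empty
    Else
        With matches(0)
            SplitDateParts = Array(.SubMatches(0), .SubMatches(1), .SubMatches(2))
        End With
    End If
End Function
```

The caller can check the result with IsEmpty before indexing into it, which avoids relying on CDate and the user's locale settings.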
I have been trying to find a working solution for a couple of hours now. I hope you can help me.
My problem:
I need to find and select in Word a whole sentence after providing the starting and ending strings of particular sentence.
For example, when my starting string is "People" and ending string is "apples." I expect Word to select the whole "People like red apples." sentence in my document. (If such a sentence exists)
For this purpose I prepared a macro which works almost like I want. The only problem is that it doesn't select the smallest possible set of characters (which I want it to do). To make it clear let's assume I have this text in my document: People like smoking. People like red apples.
Now, when I provide the starting and ending strings to the macro respectively as "People" and "apples.", it selects all the text, which contains 2 sentences mentioned above. That is my problem: I wanted it to select only the second sentence (People like red apples.), not both of them, even though they start with the same word. So, basically, I always want to select the shortest possible set of characters (which in this case is only the last sentence).
Here is a part of my macro in VBA:
text_str = startStr & "*" & endStr
With Application.Selection.Find
    .ClearFormatting
    .Forward = True
    .Wrap = wdFindContinue
    .Text = text_str
    .MatchWildcards = True
    .MatchCase = True
    .Execute
End With
I know the problem is with the Wildcards (or very limited set of regular expressions), so I also tried something like this as the search string:
text_str = "(" & startStr & "*){1}" & endStr
It also didn't help. I'm stuck here. :/
Thanks for any suggestions!
Selection.Find has something similar to regular expressions, but in this case you must use real regular expressions.
The pattern (in this particular case) should be:
People[^.]+apples\.
I wrote an example macro, which:
1. Selects the whole text in the document and assigns it to the src variable (the text searched by the regex).
2. Sets the cursor at the beginning of the document.
3. Checks whether the pattern can be matched (regEx.Test).
4. Executes the regex.
5. Assigns the matched string to the ret variable.
6. Displays it in a message box.
The complete macro is below. You will probably want to change it to select (find) the matched text instead of showing a message box.
Sub Re()
    Dim startStr As String: startStr = "People"
    Dim endStr As String: endStr = "apples"
    Dim pattern As String: pattern = startStr & "[^.]+" & endStr & "\."
    Dim regEx As New RegExp
    Dim src As String
    Dim ret As String
    Dim colMatches As MatchCollection
    ActiveDocument.Range.Select
    src = ActiveDocument.Range.Text
    Selection.StartOf
    regEx.pattern = pattern
    If (regEx.Test(src)) Then
        Set colMatches = regEx.Execute(src)
        ret = "Match: " & colMatches(0).Value
    Else
        ret = "Matching Failed"
    End If
    MsgBox ret, vbOKOnly, "Result"
End Sub
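To select the matched text rather than display it, one possible variation is sketched below. The macro name SelectMatchedSentence is hypothetical. It assumes a plain document in which character offsets in ActiveDocument.Range.Text line up with document positions; fields, tables or other special content can shift those offsets, so treat it as a starting point only.

```vba
' Hypothetical sketch: select the first match of the pattern in the
' active Word document, using the match position instead of a MsgBox.
Sub SelectMatchedSentence()
    Dim re As Object
    Dim matches As Object
    Dim startPos As Long

    Set re = CreateObject("VBScript.RegExp")
    ' [^.]+ cannot cross a full stop, so the match stays in one sentence
    re.Pattern = "People[^.]+apples\."

    Set matches = re.Execute(ActiveDocument.Range.Text)
    If matches.Count > 0 Then
        ' FirstIndex is zero-based, as are Word range positions
        startPos = matches(0).FirstIndex
        ActiveDocument.Range(startPos, startPos + matches(0).Length).Select
    Else
        MsgBox "No match found", vbOKOnly, "Result"
    End If
End Sub
```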
I have an Excel sheet filled with tweets. Several entries contain #blah-type strings among other text. I need to keep the rest of the text and remove the #blah part. For example, "#villos hey dude" needs to be transformed into "hey dude". This is what I've done so far.
Sub Macro1()
'
' Macro1 Macro
'
Dim counter As Integer
Dim strIN As String
Dim newstring As String
For counter = 1 To 46
Cells(counter, "E").Select
ActiveCell.FormulaR1C1 = strIN
StripChars (strIN)
newstring = StripChars(strIN)
ActiveCell.FormulaR1C1 = StripChars(strIN)
Next counter
End Sub
Function StripChars(strIN As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Pattern = "^#?(\w){1,15}$"
.ignorecase = True
StripChars = .Replace(strIN, vbNullString)
End With
End Function
Moreover there are also entries like this one: Ÿ³é‡ï¼Ÿã€€åˆã‚ã¦çŸ¥ã‚Šã¾ã—ãŸã€‚ shiftã—ãªãŒã‚‰ã‚¨ã‚¯ã‚¹ãƒ
I need them gone too! Ideas?
For every line in the spreadsheet run the following regex on it: ^(#.+?)\s+?(.*)$
If the line matches the regex, the information you will be interested in will be in the second capturing group. (Usually zero indexed but position 0 will contain the entire match). The first capturing group will contain the twitter handle if you need that too.
Regex demo here.
However, this will not match tweets that are not replies (starting with #). In this situation the only way to distinguish between regular tweets and the junk you are not interested in is to restrict the tweet to alphanumerics - but this may mean some tweets are missed if they contain any non-alphanumerical characters. The following regex will work if that is not an issue for you:
^(?:(#.+?)\s+?)?([\w\t ]+)$
Demo 2.
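In VBA, the first regex could be applied with a small function along these lines. This is only a sketch: the name StripHandle is hypothetical, and it assumes each tweet is a single line, since "." does not match line breaks in the VBScript engine.

```vba
' Hypothetical sketch: drop a leading "#handle" and the whitespace
' after it, keeping the rest of the tweet (capture group 2).
Function StripHandle(ByVal tweet As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(#.+?)\s+?(.*)$"

    If re.Test(tweet) Then
        ' $2 refers to the second capturing group
        StripHandle = re.Replace(tweet, "$2")
    Else
        ' No leading handle: return the text unchanged
        StripHandle = tweet
    End If
End Function
```

For the example in the question, StripHandle("#villos hey dude") gives "hey dude". In the macro, the cell's value would be read into a string first and the result written back to the cell.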
I have a full name as input and want to split it word by word, with two rules:
Do not split a word that contains a hyphen, e.g. REES-MOGG
Split a word that contains an underscore, e.g. REES_MOGG
HYPHEN
Example:
MRS C REES-MOGG
Result:
MRS
C
REES-MOGG
UNDERSCORE
Example:
MRS C REES_MOGG
Result :
MRS
C
REES
MOGG
I am currently using the code below but in vain:
Dim str As String() = Regex.Split(names, "\s+")
Just split on "\s+|_", that will split on whitespace, and also on underscores. Your code would be:
Dim str As String() = Regex.Split(names, "\s+|_")
Demo.
For it to split on ampersands too, just add |\& to the string:
Dim str As String() = Regex.Split(names, "\s+|_|\&")
Demo.
Use this:
Dim str As String() = Regex.Split(names, "[\s_]+")
Dim str As String() = names.Split({" ", "_", "&", vbTab}, StringSplitOptions.RemoveEmptyEntries)
To make your script split on whitespace and underscores, you simply need to put a character class [ ] around the whitespace character \s in your regex and then add any other symbols you want to split on into that class.
Dim str As String() = Regex.Split(names, "[\s_]+")
I don't know much about VB .NET, but you should change your RegEx for sure.
Here is an example, though I only tested the pattern in JavaScript.
Dim matchForHyphen As MatchCollection = Regex.Matches("MRS C REES-MOGG","[\w]*[^_]*")
Dim matchForUnderscore As MatchCollection = Regex.Matches("MRS C REES_MOGG","[\w]*[^_]*")
Then you should loop through the Match objects to get the results,
e.g. matchForHyphen(i) in a For loop, or with a For Each statement.
Hope it helps
I am doing this task as part of a larger sub in order to massively reduce the workload for a different team.
I am trying to read in a string and use regular expressions to replace one-to-many spaces with a single space (or another character). At the moment I am using a local string; in the main sub, however, this data will come from an external .txt file. The number of spaces between elements in this .txt file can vary depending on the row.
I am using the code below, replacing the spaces with a dash. I have tried different variations and different logic, but I always get "Run-time error '91': Object variable or With block variable not set" on the line "c = re.Replace(s, replacement)".
Using breakpoints, I found that my RegExp object (re) is empty, but I can't quite figure out how to progress from here. How do I replace my spaces with dashes? I have been at this problem for hours and spent most of that time on Google to see if someone has had a similar issue.
Sub testWC()
Dim s As String
Dim c As String
Dim re As RegExp
s = "hello World"
Dim pattern As String
pattern = "\s+"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Debug.Print (c)
End Sub
Extra information: Using Excel 2010. I have successfully added the reference ("Microsoft VBScript Regular Expressions 5.5"). I was able to replace the spaces using the vanilla Replace function; however, since the number of spaces between elements varies, I can't use that to solve my issue.
Ed: My .txt file is not fixed either, there are a number of rows that are different lengths so I am unable to use the MID function in excel to dissect the string either
Please help
Thanks,
J.H.
You're not setting up the RegExp object correctly.
Dim pattern As String
pattern = "\s+" ' pattern is just a local string, not bound to the RegExp object!
You need to do this:
Dim re As RegExp
Set re = New RegExp
re.Pattern = "\s+" ' Now the pattern is bound to the RegExp object
re.Global = True ' Assuming you want to replace *all* matches
s = "hello World"
Dim replacement As String
replacement = "-"
c = re.Replace(s, replacement)
Run-time error 91 means re is still Nothing: Dim re As RegExp only declares the variable, it does not create an object. Create it with Set re = New RegExp before using it.
You also initialized the pattern string but never hooked it into the RegExp, so even once the object exists it would not know what to look for. Add re.Pattern = pattern after you initialize your pattern string.
With both changes in place:
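Sub testWC()
    Dim s As String
    Dim c As String
    Dim re As RegExp
    Set re = New RegExp          ' create the object, otherwise re is Nothing
    s = "hello World"
    Dim pattern As String
    pattern = "\s+"
    re.Pattern = pattern         ' bind the pattern to the RegExp object
    re.Global = True             ' replace every run of spaces, not just the first
    Dim replacement As String
    replacement = "-"
    c = re.Replace(s, replacement)
    Debug.Print c
End Sub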