Regular expression substring replacement in Microsoft Excel [closed] - regex

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
I would like to bulk replace the "07" part of a list of strings (mobile telephone numbers) with the international version "447".
The list of strings currently forms a column in an Excel spreadsheet.
I have the regular expression to match strings requiring modification:
^07[0-9]{9}$
...but I don't know how to do the replacement that I need.
The data is in an Excel spreadsheet. My preferred solution would be to keep it in Microsoft Excel, but it can of course be exported and then re-imported. I know TextMate has a regular expression replace feature. Can this help me?

I was about to go off looking for elegant VBA solutions or whatever, then I thought: 'Hang on. We just want to manipulate some data in a spreadsheet we own. Why complicate things?'
How does this idea sound to you:
1. Insert a new column just after the column with the existing data (let's assume that is column C, so the new column is D).
2. In D1 enter this formula and fill it down the column: ="447" & RIGHT(C1, 9)
3. Select column D (which now contains the new values) and use Paste Values (in the Paste Special dialog) onto column C, replacing the existing values.
4. Delete the 'working' column D.
It's not programming but if you only have to do it once you don't need a program, right?

Use Excel VBA. In the VBA editor, add a reference (Tools > References) to "Microsoft VBScript Regular Expressions 5.5".
Then put this in a new standard VBA module:
Sub ReplaceMobileNumbers()
Dim re As New RegExp
re.Pattern = "^0(?=7[0-9]{9}$)" ' look-ahead: match only the leading 0 of "07" + 9 digits
Dim cell As Range
For Each cell In ActiveSheet.Range("Your Range Address in A1:B1 notation")
cell.Value = re.Replace(cell.value, "44")
Next cell
End Sub
and call this sub in the Immediate Window. The above is throw-away code, not designed with re-usability in mind. I know that, so don't tell me. ;-)
Though you can probably get away with a worksheet formula (semicolons are the argument separator in some locales; in English-locale Excel, use commas):
=IF(AND(LEN(A1) = 11;LEFT(A1; 2) = "07"); "44" & RIGHT(A1; 10); A1)
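The look-ahead replacement can be checked outside Excel. Below is a JavaScript sketch (JavaScript treats this pattern the same way as the VBScript engine); `toInternational` is a hypothetical name. Because the look-ahead consumes nothing, only the leading 0 is replaced with "44", turning 07XXXXXXXXX into 447XXXXXXXXX, while any value that is not exactly 07 followed by nine digits passes through unchanged. That is the same rule the IF formula implements.

```javascript
// Hypothetical helper mirroring the VBA macro: replace the leading "0" of a
// UK mobile number ("07" followed by 9 more digits) with the "44" country code.
function toInternational(number) {
  // The look-ahead (?=...) checks the rest of the string without consuming it,
  // so only the "0" itself is replaced.
  const re = /^0(?=7[0-9]{9}$)/;
  return String(number).replace(re, "44");
}

// Example: convert a whole column of values at once.
const column = ["07123456789", "447123456789", "0712345"];
console.log(column.map(toInternational));
// [ '447123456789', '447123456789', '0712345' ]
```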

You'll have to add a reference to "Microsoft VBScript Regular Expressions 5.5" (Tools > References in the VBA editor).
Then write a quick user-defined function, like the following, in a standard module:
Dim reg As New RegExp
Public Function RegMatch(Source As Range, Pattern As String, Optional IgnoreCase As Boolean = True, Optional MultiLine As Boolean = True) As Long
Dim rng As Range, i As Long, j As Long
reg.IgnoreCase = IgnoreCase
reg.MultiLine = MultiLine
reg.Pattern = Pattern
i = 0: j = 0
For Each rng In Source
i = i + 1
If reg.test(rng.Value) Then
j = i
Exit For
End If
Next
RegMatch = j
End Function
Then simply call it from a formula in your sheet, for example:
=INDEX(B6:B15, RegMatch($A$6:$A$15, $A$3))
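where the first argument is the range to search and the second is the pattern (here taken from cell A3). RegMatch returns the 1-based position of the first matching cell, or 0 if nothing matches, and INDEX then returns the corresponding value from B6:B15.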
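The logic of RegMatch, sketched in JavaScript for clarity (the name `regMatch` and the sample data are hypothetical): build a regex from the pattern and flags, scan the values in order, and return the 1-based position of the first match, or 0 if there is none. As with the VBA version, callers should check for 0 before using the result as an index.

```javascript
// Hypothetical JavaScript counterpart of the RegMatch UDF: return the 1-based
// position of the first value matching the pattern, or 0 if none match.
function regMatch(values, pattern, ignoreCase = true, multiLine = true) {
  const flags = (ignoreCase ? "i" : "") + (multiLine ? "m" : "");
  // No "g" flag, so test() keeps no lastIndex state between calls.
  const re = new RegExp(pattern, flags);
  for (let i = 0; i < values.length; i++) {
    if (re.test(String(values[i]))) {
      return i + 1; // 1-based, like the row offset INDEX expects
    }
  }
  return 0;
}

// Roughly what =INDEX(B6:B15, RegMatch($A$6:$A$15, $A$3)) does:
const keys = ["apple", "Banana", "cherry"];
const prices = [1.2, 0.5, 3.0];
const pos = regMatch(keys, "^ban");
console.log(pos, prices[pos - 1]); // 2 0.5
```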

Related

how to parse the key value pair with regex in C++ [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have some string with such format:
aaaaaaaaaaaa //first line
[key = [metadata = 1 metadata = 2 metadata =3] KEY(1) = 100
KEY(2) = 16:30:00 KEY(3) = 2020-12-12 08:30:30 KEY(4) = 0]
I want to get the key value pairs in Json format like
{"KEY(1)":"100", "KEY(2)":"16:30:00", "KEY(3)":"2020-12-12 08:30:30", "KEY(4)":"0"}
I am struggling with the last part, because a value can also contain spaces, as in 2020-12-12 08:30:30. The only approach I can think of is to find each "=": the text between the first and second spaces to its left is the current key, and everything before that, back to the previous "=", is the value of the previous key. That is tricky, and I am new to regex. How should I do it? Thanks!
I would not try to use a regex to do this.
There are difficulties you haven't considered yet. For example, the quoted string can contain an =, or (worse) it can contain a quote mark, so something like this is unusual, but seems to be legitimate:
{ "\"key\"=\"value\"" = "This is the value"}
When you're done parsing it, the key in this case will be "key"="value" (with the quote marks and equal sign included in the string).
So not only do you need to recognize the beginning and end of each part of what you're dealing with, but in some cases you need to do some transformations on it to get the correct string.
Now, I'm not going to say this can't be done using a regex, but I think (at best) developing a regex that works correctly will be more trouble than it's worth.
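For the specific format in the question, which has no quoted strings, the asker's own idea (find each key, then take everything up to the next key as its value) can be sketched as below. It is written in JavaScript; the pattern should carry over to C++'s std::regex, whose default grammar is ECMAScript, but that port is not shown. Assumptions: keys always look like KEY(<digits>), no value contains that token, and the record ends with "]". It does not address the quoting problems the answer describes.

```javascript
// Sketch: locate every "KEY(n) =" and treat the text up to the next key
// (or the end of the input) as that key's value, so values may contain spaces.
// Assumes keys look like KEY(<digits>) and that no value contains that pattern.
function parseKeyValues(text) {
  const keyRe = /(KEY\(\d+\))\s*=\s*/g;
  const found = [];
  let m;
  while ((m = keyRe.exec(text)) !== null) {
    // Record the key name, where the key token starts, and where its value starts.
    found.push({ name: m[1], start: m.index, valueStart: keyRe.lastIndex });
  }
  const result = {};
  found.forEach((k, i) => {
    const end = i + 1 < found.length ? found[i + 1].start : text.length;
    // Trim whitespace and drop the closing "]" that ends the record.
    result[k.name] = text.slice(k.valueStart, end).trim().replace(/\]$/, "").trim();
  });
  return result;
}

const input =
  "aaaaaaaaaaaa\n" +
  "[key = [metadata = 1 metadata = 2 metadata =3] KEY(1) = 100\n" +
  "KEY(2) = 16:30:00 KEY(3) = 2020-12-12 08:30:30 KEY(4) = 0]";
console.log(JSON.stringify(parseKeyValues(input)));
// {"KEY(1)":"100","KEY(2)":"16:30:00","KEY(3)":"2020-12-12 08:30:30","KEY(4)":"0"}
```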

How to use Regex in DataSet.Tables.Select() in VB.net

I have a dataset that contains multiple values. From the DataTable BLABLA, I want to take the rows whose Data column contains an "S" followed by a digit from zero to six, and then display them in a MessageBox.
My Regex is S[0-6].
Dim answer As String = ""
Dim myregex As Regex = New Regex("S[0-6]")
Dim SearchRows() As DataRow = datasetB.Tables("BLABLA").Select("Data LIKE '%myregex%'")
For k As Integer = 0 To SearchRows.Length - 1
If answer = "" Then
answer = SearchRows(k).Item("Data")
Else
answer = answer & vbNewLine & SearchRows(k).Item("Data")
End If
Next
MsgBox(answer)
Unfortunately SearchRows is empty. I couldn't find the reason by debugging.
What am I doing wrong?
The DataTable.Select method does not support regex. As the documentation states, it does allow you to pass it a filterExpression string as an argument, but just because it takes a filter expression doesn't mean that it supports regex. On the contrary, it's designed to support mostly the same kinds of expressions as the WHERE clause in T-SQL. T-SQL's LIKE operator does not support regex patterns, and neither does DataTable.Select. See this documentation to learn the rules for the pattern expressions that are supported by the DataTable.Select method's LIKE operator.
The filter expressions supported by the LIKE operator are not as advanced as regex, so it's almost certainly impossible to construct a filter expression which is that specific. If there is a way to filter to digits between 0 and 6, I am unaware of it and the documentation doesn't mention it. So, if you really need to filter rows by regex, you can still do it, but you need to select all the rows and then filter them yourself:
Dim SearchRows() As DataRow = datasetB.Tables("BLABLA").Select().
Where(Function(r) myregex.IsMatch(r.Item("Data").ToString())).
ToArray()
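The select-then-filter idea does not depend on the language; here it is as a JavaScript sketch for comparison, assuming rows are plain objects with a Data property (in VB.NET they would be the DataRow objects returned by Select()). The regex S[0-6] matches anywhere in the string, which is what the '%...%' in the question intended.

```javascript
// Select everything, then filter with a regex yourself.
// Rows are modeled as plain objects with a Data property (an assumption).
const sPattern = /S[0-6]/; // "S" followed by a digit 0-6, anywhere in the text

function matchingData(rows) {
  return rows
    .map((r) => String(r.Data))
    .filter((data) => sPattern.test(data));
}

const rows = [{ Data: "S3 valve" }, { Data: "S7 pump" }, { Data: "xS0" }];
// Join with newlines, like the answer string built for MsgBox.
console.log(matchingData(rows).join("\n"));
// S3 valve
// xS0
```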

VBA code to check if a value from an array is present in a cell?

I need to loop through a cell range in which each cell contains one or several locale ISO codes, comma-separated, e.g. esES, frFR, itIT, etc.
If ANY of these values is contained within a cell, I select it and paste it to another workbook. The latter part I have covered, but I can't figure out how to make the former part work. This is the code I'm working with at the moment:
OTHERS_V = "*arAR*|*bgBG*|*csCZ*|*daDK*"
For Each cell In Intersect(Sheets("Requests").Range("G:G"), Sheets("Requests").UsedRange)
If cell.Value Like OTHERS_V Then [...]
I'm pretty new to VBA and don't know much about regex in this language, but from my experience this should read something like:
(anything + "arAR" + anything) OR (anything + "bgBG" + anything) OR [...]
etc.
It doesn't seem to work though. How would you go about accomplishing what I'm after in this context?
As per my comments: Like has no | (OR) operator, so put the OTHERS_V patterns in an array and loop, testing each one:
Sub fooo()
Dim OTHERS_V()
Dim cell As Range
Dim i As Long
OTHERS_V = Array("*arAR*", "*bgBG*", "*csCZ*", "*daDK*")
For Each cell In Intersect(Sheets("Requests").Range("G:G"), Sheets("Requests").UsedRange)
For i = LBound(OTHERS_V) To UBound(OTHERS_V)
If cell.Value Like OTHERS_V(i) Then
'do your stuff
Exit For
End If
Next i
Next cell
End Sub
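For comparison, here are both options as a JavaScript sketch: the loop the answer uses, and a single regex with alternation, which is what the | in the question was reaching for. In VBA the second option would mean using the VBScript RegExp object with a pattern such as arAR|bgBG|csCZ|daDK instead of Like. The sketch assumes the codes contain only letters, so no regex escaping is needed.

```javascript
// Two ways to ask "does the cell contain any of these locale codes?"
const codes = ["arAR", "bgBG", "csCZ", "daDK"];

// 1) Loop over the codes, as the VBA answer does with Like "*code*".
function containsAnyLoop(cell) {
  for (const code of codes) {
    if (cell.includes(code)) return true; // case-sensitive, like Like by default
  }
  return false;
}

// 2) One regex with alternation: "arAR|bgBG|csCZ|daDK".
const anyCode = new RegExp(codes.join("|"));
function containsAnyRegex(cell) {
  return anyCode.test(cell);
}

console.log(containsAnyLoop("esES, bgBG"), containsAnyRegex("esES, frFR")); // true false
```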

How can I normalize / asciify Unicode characters in Google Sheets?

I'm trying to write a formula for Google Sheets which will convert Unicode characters with diacritics to their plain ASCII equivalents.
I see that Google uses RE2 in its "REGEXREPLACE" function. And I see that RE2 offers Unicode character classes.
I tried to write a formula (similar to this one):
REGEXREPLACE("público","(\pL)\pM*","$1")
But Sheets produces the following error:
Function REGEXREPLACE parameter 2 value "\pL" is not a valid regular expression.
I suppose I could write a formula consisting of a long set of nested SUBSTITUTE functions (like this one), but that seems pretty awful.
Can anyone offer a suggestion for a better way to normalize Unicode letters with diacritical/accent marks in a Google Sheets formula?
[[:^alpha:]] (a negated ASCII character class) works fine in a REGEXEXTRACT formula.
But =REGEXREPLACE("público","([[:alpha:]])[[:^alpha:]]","$1") gives "pblic" as a result. So, I guess, the formula doesn't know which ASCII character should replace "ú".
Workaround
Let's take the word públicē; we need to replace two symbols in it. Put this word in cell A1, and this formula in cell B1:
=JOIN("",ArrayFormula(IFERROR(VLOOKUP(SPLIT(REGEXREPLACE(A1,"(.)","$1-"),"-"),D:E,2,0),SPLIT(REGEXREPLACE(A1,"(.)","$1-"),"-"))))
Then build a table of replacements in range D:E:
D E
1 ú u
2 ē e
3 ... ...
This formula is still ugly, but more useful because you can control your directory by adding more characters to the table.
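The same split, look up, and join idea can be sketched in JavaScript (for example in Apps Script). The `replacements` map plays the role of the D:E range and its entries are only examples; unlike VLOOKUP, Map lookups are case-sensitive. Characters missing from the table are kept, which mirrors the IFERROR fallback.

```javascript
// Character-by-character lookup, like the SPLIT/VLOOKUP/JOIN formula.
const replacements = new Map([
  ["ú", "u"],
  ["ē", "e"],
  // ...add more rows, as with the D:E range
]);

function transliterate(text) {
  return Array.from(text) // split into characters (code points)
    .map((ch) => (replacements.has(ch) ? replacements.get(ch) : ch)) // keep unknown characters
    .join(""); // like JOIN("", ...)
}

console.log(transliterate("públicē")); // publice
```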
Or use JavaScript.
I also found a good solution, which works in Google Sheets.
This did it for me in Google Sheets with Google Apps Script (GAS):
function normalizetext(text) {
var weird = 'öüóőúéáàűíÖÜÓŐÚÉÁÀŰÍçÇ!#£$%^&*()_+?/*."';
var normalized = 'ouooueaauiOUOOUEAAUIcC ';
var idoff = -1, new_text = '';
var str = text.toString();
for (var i = 0; i < str.length; i++) {
// indexOf compares literally; search() would treat "*", "(", "." etc. as regex syntax
idoff = weird.indexOf(str.charAt(i));
if (idoff == -1) {
new_text = new_text + str.charAt(i);
} else {
// '!' maps to a space; symbols past the end of normalized map to '' (removed)
new_text = new_text + normalized.charAt(idoff);
}
}
return new_text;
}
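Another option, assuming a JavaScript runtime that supports String.prototype.normalize and Unicode property escapes (such as Apps Script's V8 runtime or a recent Node.js), is Unicode normalization instead of a hand-made table. Decomposing to NFD turns "ú" into "u" followed by a combining accent, which the \p{M} class (the JavaScript spelling of the \pM the question tried) then removes. The function name is hypothetical, and letters with no decomposition, such as ø or ß, are left unchanged.

```javascript
// Strip diacritics by Unicode normalization: NFD splits "ú" into "u" plus a
// combining acute accent, and \p{M} (needs the "u" flag) removes all combining marks.
function stripDiacritics(text) {
  return String(text).normalize("NFD").replace(/\p{M}/gu, "");
}

console.log(stripDiacritics("público")); // publico
console.log(stripDiacritics("ÖÜÓŐÚÉÁÀŰÍçÇ")); // OUOOUEAAUIcC
```

In Sheets it could then be used as a custom function, e.g. =stripDiacritics(A1).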
This answer doesn't require Google Apps Script, and it's still fast and relatively simple. It builds on Max's answer by providing a full lookup table, and it also allows for case-sensitive transliteration (normally VLOOKUP is NOT case-sensitive).
Here is a link to the Google Spreadsheet if you want to jump right into it. If you want to use your own sheet, you'll need to copy the TRANS_TABLE sheet into your Spreadsheet.
In the formula below, the source cell is A2, so you'd place it in any column on row 2. REGEXREPLACE and SPLIT break the string in A2 into an array of characters, and ARRAYFORMULA then does the following to EACH character: CODE converts it to its decimal code, VLOOKUP matches that number against the table on the TRANS_TABLE sheet and returns the character from the column given by the index value (in this case, the 3rd column of the range). When all characters have been transliterated, JOIN puts the array back together into a single string. I provided examples with named ranges as well.
=iferror(
join(
"",
ARRAYFORMULA(
vlookup(
code(split(REGEXREPLACE($A2,"(.)", "$1;"),";",TRUE)),
TRANS_TABLE!$A$5:$F,3
)
)
)
,)
You'll note on the TRANS_TABLE sheet I made, I created 4 different transliteration columns, which makes it easy to have a column for each of your transliteration needs. To reference the column, just use a different index number in the VLOOKUP. Each column is simply a replacement character column. In some cases, you don't want any conversion made (A -> A or 3 -> 3), so you just copy the same character from the source Glyph column. Where you DO want to convert characters, you type in whatever character you want replaced (ñ -> n etc). If you want a character removed altogether, you leave the cell blank (? -> ''). You can see examples of the transliteration output on the data sheet in which I created 4 different transliteration columns (A-D) referencing each of the Transliteration tables from the TRANS_TABLE sheet for different use case scenarios.
I hope this finally answers your question in a fashion that isn't so "ugly." Cheers.

Why does Find/Replace zRngResult.Find work fine, but RegEx myRegExp.Execute(zRngResult) mess up the range.Start?

I wish to select and add comments after certain words, e.g. “not”, “never”, “don’t”, in sentences in a Word document with VBA. Find/Replace with wildcards works fine, but “Use wildcards” cannot be combined with the “Match case” setting. RegEx can use IgnoreCase = True, but the selection of the word is not reliable when there is more than one comment in a sentence. Range.Start seems to be getting modified in a way that I cannot understand.
A similar question was asked in June 2010. https://social.msdn.microsoft.com/Forums/office/en-US/f73ca32d-0af9-47cf-81fe-ce93b13ebc4d/regex-selecting-a-match-within-the-document?forum=worddev
Is there a new/different way of solving this problem?
Any suggestion will be appreciated.
The code using RegEx follows:
Function zRegExCommentor(zPhrase As String, tComment As String) As Long
Dim sTheseSentences As Sentences
Dim rThisSentenceToSearch As Word.Range, rThisSentenceResult As Word.Range
Dim myRegExp As RegExp
Dim myMatches As MatchCollection
Options.CommentsColor = wdByAuthor
Set myRegExp = New RegExp
With myRegExp
.IgnoreCase = True
.Global = False
.Pattern = zPhrase
End With
Set sTheseSentences = ActiveDocument.Sentences
For Each rThisSentenceToSearch In sTheseSentences
Set rThisSentenceResult = rThisSentenceToSearch.Duplicate
rThisSentenceResult.Select
Do
DoEvents
Set myMatches = myRegExp.Execute(rThisSentenceResult)
If myMatches.Count > 0 Then
rThisSentenceResult.Start = rThisSentenceResult.Start + myMatches(0).FirstIndex
rThisSentenceResult.End = rThisSentenceResult.Start + myMatches(0).Length
rThisSentenceResult.Select
Selection.Comments.Add Range:=Selection.Range
Selection.TypeText Text:=tComment & "{" & zPhrase & "}"
rThisSentenceResult.Start = rThisSentenceResult.Start + 1 'so as not to find the same phrase again and again
rThisSentenceResult.End = rThisSentenceToSearch.End
rThisSentenceResult.Select
End If 'If myMatches.Count > 0 Then
Loop While myMatches.Count > 0
Next 'For Each rThisSentenceToSearch In sTheseSentences
End Function
Relying on Range.Start or Range.End for position in a Word document is not reliable due to how Word stores non-printing information in the text flow. For some kinds of things you can work around it using Range.TextRetrievalMode, but the non-printing characters inserted by Comments aren't affected by these settings.
I must admit I don't understand why Word's built-in Find with wildcards won't work for you: the lack of case-insensitive matching shouldn't be a problem. For instance, based on the example "Never has there been, never, NEVER, a total drought.":
FindText:="[nN][eE][vV][eE][rR]"
will find all instances of n-e-v-e-r regardless of capitalization. The brackets define a set of characters, here the lower- and upper-case form of each letter in the search term. No separator is needed inside the brackets; a comma there would itself become a character to match.
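Typing a bracket set for every letter gets tedious, so a small helper can generate the FindText string. This JavaScript sketch is hypothetical and assumes the word contains only simple letters (whose upper and lower case are single characters) plus characters that are not Word wildcard operators; characters such as ( ) [ ] { } < > ? * @ ! and \ have special meaning and would need escaping. Since [nN] means the same thing in a JavaScript regex, the result can be sanity-checked against the example sentence.

```javascript
// Build a Word wildcard pattern that matches a word in any capitalization,
// e.g. "never" -> "[nN][eE][vV][eE][rR]". Non-letters pass through unchanged.
function anyCaseWildcard(word) {
  return Array.from(word)
    .map((ch) => {
      const lower = ch.toLowerCase();
      const upper = ch.toUpperCase();
      return lower === upper ? ch : "[" + lower + upper + "]";
    })
    .join("");
}

const wildcard = anyCaseWildcard("never");
console.log(wildcard); // [nN][eE][vV][eE][rR]

// Check the pattern with a JavaScript regex on the example sentence.
const sentence = "Never has there been, never, NEVER, a total drought.";
console.log(sentence.match(new RegExp(wildcard, "g")).length); // 3
```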
The workarounds described in my MSDN post you linked to are pretty much all you can do if you insist on RegEx:
Using the Office Open XML (or possibly Word 2003 XML) file format will let you use RegEx and standard XML processing tools to find the information, add comment "tags" into the Word XML, close it all up... And when the user sees the document it will all be there.
If you need to be doing this in the Word UI, a slightly different approach should work (assuming you're targeting Word 2003 or later): work through the document on a range-by-range basis (by paragraph, perhaps). Read the XML representation of the text into memory using the Range.WordOpenXML property, perform the RegEx search, add comments as WordOpenXML, then write the WordOpenXML back into the document using the InsertXML method, replacing the original range (paragraph). Since you'd be working with the Paragraph object, Range.Start won't be a factor.