Remove " and =+ from cells in Google Sheets - regex

I have cells like
"Apple"
=+Organe +is +good
"Mango"
I want to remove all the characters i.e. = and " and +
Tried =SUBSTITUTE(C3,"+" ,"",1) but didnt work
I am using Google Sheets and can't use Excel (in MAC)

If you know which and how many characters should be removed N nested substitutes would work. Just make sure that the input cell is not an error:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"="," "),"+"," "),"""","")
If you need to exclude anything but the alphabet characters A to Z and the numbers from 0 to 9, VBA + RegEx come into power:
Public Function RemoveNonAlphabetChars(inputString As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Global = True
.Pattern = "[^a-zA-Z0-9]"
.ignoreCase = True
Set inputMatches = .Execute(inputString)
If regEx.test(inputString) Then
RemoveNonAlphabetChars = .Replace(inputString, vbNullString)
Else
RemoveNonAlphabetChars = inputString
End If
End With
End Function

try:
=ARRAYFORMULA(REGEXREPLACE(A1:A, "[=\+""]", ))
or if you want to use SUBSTITUTE then:
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1:A, "=", ), "+", ), """", ))

Related

How to extract phone number and split to column in Google sheet?

I am trying to extract phone numbers in this format (123) 456-7890 and put each number in a separate column with format 1234567890 with no space, dash or parenthesis.
I am able to achieve this in Excel using VBA code below from another StackOverflow question, but not able to get it working on Google sheet
Sub ewqre()
Dim str As String, n As Long, rw As Long
Dim rgx As Object, cmat As Object, ws As Worksheet
Set rgx = CreateObject("VBScript.RegExp")
Set ws = Worksheets("Sheet4")
With rgx
.Global = True
.MultiLine = True
'phone number pattern is: ###-###-####
.Pattern = "/^(\d{3})(\d{3})(\d{4})$/"
For rw = 2 To ws.Cells(Rows.Count, "A").End(xlUp).Row
str = ws.Cells(rw, "A").Value2
If .Test(str) Then
Set cmat = .Execute(str)
'populate the worksheet with the matches
For n = 0 To cmat.Count - 1
ws.Cells(rw, Columns.Count).End(xlToLeft).Offset(0, 1) = cmat.Item(n)
Next n
End If
Next rw
End With
Set rgx = Nothing: Set ws = Nothing
End Sub
=ARRAYFORMULA(IFERROR(REGEXEXTRACT(TO_TEXT(SPLIT(
REGEXREPLACE(A2:A, "\(|\)|\-| ", ""), CHAR(10))), "\d+")))
or:
=ARRAYFORMULA(REGEXREPLACE(REGEXEXTRACT(SPLIT(A2, CHAR(10)),
"\((.*)\(.T"),"\)|\s|\-", ""))
true array formula would be:
=ARRAYFORMULA(IFERROR(REGEXREPLACE(REGEXEXTRACT(SPLIT(A2:A, CHAR(10)),
"\((.*)\(.T"),"\)|\s|\-", "")))

How to extract ad sizes from a string with excel regex

I am trying to extract ad sizes from string. The ad sizes are all set standard sizes. So while I'd prefer to have a regex that looks for a pattern, IE 3 numbers followed by 2 or 3 numbers, hard coding it will also work, since we know what the sizes will be. Here's an example of some of the ad sizes:
300x250
728x90
320x50
I was able to find some VBScript that I modified that almost works, but because my strings that I'm searching are inconsistent, it's pulling too much in some cases. For example:
You see how it's not matching correctly in every instance.
The VB code I found is actually matching everything EXCEPT that ad sizes. I don't know enough about VBScript to reverse it to just look for ad sizes and pull them. So instead it looks for all other text and removes it.
The code is below. Is there a way to fix the Regex so that it just returns the ad sizes?
Function getAdSize(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String
strPattern = "([^300x250|728x90])"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = True
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
getAdSize = regEx.Replace(strInput, strReplace)
Else
getAdSize = "Not matched"
End If
End If
End Function
NOTE, THE DATA IS NOT ALWAYS PRECEDED BY AN UNDERSCORE, SOMETIMES IT IS A DASH OR A SPACE BEFORE AND AFTER.
EDIT: Since it's not actually underscore delimited we can't use Split. We can however iterate over the string and extract the "#x#" manually. I have updated the code to reflect this and verified that it works successfully.
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
If IsNumeric(Temp) Then
Ad = Ad & Temp
Else
If Temp = "x" Then
Ad = Ad & Temp
Else
If Ad Like "*#x#*" Then
ExtractAdSize = Ad
Exit Function
Else
Ad = vbNullString
End If
End If
End If
Next i
End If
End Function
Alternate version of the same function using Select Case boolean logic instead of nested If statements:
Public Function ExtractAdSize(ByVal arg_Text As String) As String
Dim i As Long
Dim Temp As String
Dim Ad As String
If arg_Text Like "*#x#*" Then
For i = 1 To Len(arg_Text) + 1
Temp = Mid(arg_Text & " ", i, 1)
Select Case Abs(IsNumeric(Temp)) + Abs((Temp = "x")) * 2 + Abs((Ad Like "*#x#*")) * 4
Case 0: Ad = vbNullString 'Temp is not a number, not an "x", and Ad is not valid
Case 1, 2, 5: Ad = Ad & Temp 'Temp is a number or an "x"
Case 4, 6: ExtractAdSize = Ad 'Temp is not a number, Ad is valid
Exit Function
End Select
Next i
End If
End Function
I have managed to make about 95% of the required answer - the RegEx below will remove the DDDxDD size and would return the rest.
Option Explicit
Public Function regExSampler(s As String) As String
Dim regEx As Object
Dim inputMatches As Object
Dim regExString As String
Set regEx = CreateObject("VBScript.RegExp")
With regEx
.Pattern = "(([0-9]+)x([0-9]+))"
.IgnoreCase = True
.Global = True
Set inputMatches = .Execute(s)
If regEx.test(s) Then
regExSampler = .Replace(s, vbNullString)
Else
regExSampler = s
End If
End With
End Function
Public Sub TestMe()
Debug.Print regExSampler("uni3uios3_300x250_ASDF.html")
Debug.Print regExSampler("uni3uios3_34300x25_ASDF.html")
Debug.Print regExSampler("uni3uios3_8x4_ASDF.html")
End Sub
E.g. you would get:
uni3uios3__ASDF.html
uni3uios3__ASDF.html
uni3uios3__ASDF.html
From here you can continue trying to find a way to reverse the display.
Edit:
To go from the 95% to the 100%, I have asked a question here and it turns out that the conditional block should be changed to the following:
If regEx.test(s) Then
regExSampler = InputMatches(0)
Else
regExSampler = s
End If
This formula could work if it's always 3 characters, then x, and it's always between underscores - adjust accordingly.
=iferror(mid(A1,search("_???x*_",A1)+1,search("_",A1,search("_???x*_",A1)+1)-(search("_???x*_",A1)+1)),"No match")

How to replace Numbers in Parentheses with some calculations in MS Word

I have a problem to replace some serial number such as [30] [31] [32]... to [31] [32] [33]... in MS word when I insert a new references in the middle of article. I have not found a solution way in GUI so I try to use VBA to do that replacement. I find a similar problem in stack overflow:
MS Word Macro to increment all numbers in word document
However, this way is a bit inconvenient because it have to generate some replacement array in other place. Can I make that replacement with regex and some function in MS Word VBA like code below?
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
End With
'How to do some calculations with $1?
Selection.Text = regExp.Replace(Selection.Text, "[$1]")
End Sub
But I don't know how to do some calculations with $1 in regExp? I have try use "[$1+1]" but it return [31+1] [32+1] [33+1]. Can anyone help? Thanks!
It is impossible to pass a callback function to the RegExp.Replace, so you have the only option: use RegExp.execute and process matches in a loop.
Here is an example code for your case (I took a shortcut since you only have the value to modify inside known delimiters, [ and ].)
Sub replaceWithregExp()
Dim regExp As Object
Dim regx, S$, Strnew$
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})]"
.Global = True
End With
'How to do some calculations with $1?
' Removing regExp.Replace(Selection.Text, "[$1]")
For Each m In regExp.Execute(Selection.Text)
Selection.Text = Left(Selection.Text, m.FirstIndex+1) _
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) _
& Mid(Selection.Text, m.FirstIndex + Len(m.Value))
Next m
End Sub
Here,
Selection.Text = Left(Selection.Text, m.FirstIndex+1) - Get what is before
& Replace(m.Value, m.Value, CStr(CInt(m.Submatches(0)) + 10)) - Add 10 to the captured number
& Mid(Selection.Text, m.FirstIndex + Len(m.Value)) - Append what is after the capture
That should do the trick :
Sub IncrementWithRegex()
Dim Para As Paragraph
Set Para = ThisDocument.Paragraphs.First
Dim ParaNext As Paragraph
Dim oRange As Range
Set oRange = Para.Range
Dim regEx As New RegExp
Dim regMatch As Variant
Dim ACrO As String
With regEx
.Global = True
.MultiLine = False
.IgnoreCase = False
.Pattern = "[\[]([0-9]{2})[\]]"
End With
Do While Not Para Is Nothing
Set ParaNext = Para.Next
Set oRange = Para.Range
'Debug.Print oRange.Text
If regEx.test(oRange.Text) Then
For Each regMatch In regEx.Execute(oRange.Text)
oRange.Text = _
Left(oRange.Text, _
InStr(1, oRange.Text, CStr(regMatch))) & _
CDbl(regMatch) + 1 & _
Right(oRange.Text, _
Len(CStr(regMatch)) + InStr(1, oRange.Text, CStr(regMatch)))
Next regMatch
Else
End If
Set Para = ParaNext
Loop
End Sub
To use this, remember to add the reference :
Description: Microsoft VBScript Regular Expressions 5.5
FullPath: C:\windows\SysWOW64\vbscript.dll\3
Major.Minor: 5.5
Name: VBScript_RegExp_55
GUID: {3F4DACA7-160D-11D2-A8E9-00104B365C9F}
Here is a simple VBA macro you can use to achieve this :
Sub IncrementNumbers()
Dim regExp As Object
Dim i As Integer
Dim fullMatch As String
Dim subMatch As Integer
Dim replacement As String
Const TMP_PREFIX As String = "$$$"
Set regExp = CreateObject("vbscript.regexp")
With regExp
.Pattern = "\[([0-9]{2})\]"
.Global = True
.MultiLine = True
End With
'Ensure selected text match our regex
If regExp.test(Selection.Text) Then
'Find all matches
Set matches = regExp.Execute(Selection.Text)
' Start from last match
For i = 0 To (matches.Count - 1)
fullMatch = matches(i).Value
subMatch = CInt(matches(i).SubMatches(0))
'Increment by 1
subMatch = subMatch + 1
'Create replacement. Add a temporary prefix so we ensure [30] replaced with [31]
'will not be replaced with [32] when [31] will be replaced
replacement = "[" & TMP_PREFIX & subMatch & "]"
'Replace full match with [subMatch]
Selection.Text = Replace(Selection.Text, fullMatch, replacement)
Next
End If
'Now replacements are complete, we have to remove replacement prefix
Selection.Text = Replace(Selection.Text, TMP_PREFIX, "")
End Sub

Using regex, but it does not remove html tags in excel sheet

I trying to remove the html tags using the code below but it does not do anything. I have tried running it again but to no use. Could someone please advise what could be the issue. Attaching a snapshot of the file with html tags. For info, the data comes from the MS Access and the MS Access links to Sharepoint lists.
Sub Import_AccessData()
Dim strtKeyMsgRange As Range
Dim KeyMsgRange As Range
Dim KeyMsgRangeCell As Range
Dim endKeyMsgRangeCell As Range
Set strtKeyMsgRange = Range("B2")
Set endKeyMsgRange = Range("AC13")
Set KeyMsgRange = Range(strtKeyMsgRange, endKeyMsgRange)
For Each KeyMsgRangeCell In KeyMsgRange
a = StripHTML(KeyMsgRangeCell)
KeyMsgRangeCell.Value = a
Next KeyMsgRangeCell
End Sub
Public Function StripHTML(cell As Range) As String
Dim RegEx As Object
Set RegEx = CreateObject("vbscript.regexp")
Dim sInput As String
Dim sOut As String
sInput = cell.Value
With RegEx
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "<[^>]+>" 'Regular Expression for HTML Tags.
.Pattern = " "
.Pattern = "&"
End With
sOut = RegEx.Replace(sInput, "")
StripHTML = sOut
Set RegEx = Nothing
End Function
You are setting the Pattern property multiple times, it will only retain the last value assigned (&).
You need to use 3 regular expressions("<[^>]+>" => "", " " => " ", "&" => "&"), or one expression that matches all of your inputs ("(&)|( )|(<[^>]+>)" => "") to actually do this.

Using regexp in Excel can I perform some arithmetic on the matched pattern before replacing the matched string?

I am using `VBscript.RegExp`` to find and replace using a regular expression. I'm trying to do something like this:
Dim regEx
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = "ID_(\d{3})"
regEx.IgnoreCase = False
regEx.Global = True
regEx.Replace(a_cell.Value, "=HYPERLINK(A" & CStr(CInt("$1") + 2) )
I.e. I have cells which contain things like ID_006 and I want to replace the contents of such a cell with a hyperlink to cell A8. So I match the three digits, and then want to add 2 to those digits to get the correct row to hyperlink to.
But the CStr(CInt("$1") + 2) part doesn't work. Any suggestions on how I can make it work?
Ive posted given these points
you should test for a valid match before trying a replace
from your current code the Global is redundant as you can add 1 hyerplink (1 match) to a cell
your current code will accept a partial string match, if you wanted to avoid ID_9999 then you match the entire string using ^ and $. This version runs me, you can revert to your current pattern with .Pattern = "ID_(\d{3})"
Normally when adding a hyperlink a visible address is needed. The code beloe does this (with the row manipulation in one shot)
The code below runs at A1:A10 (sample shown dumping to B1:B10 for pre and post coede)
Sub ParseIt()
Dim rng1 As Range
Dim rng2 As Range
Dim regEx
Set rng1 = Range([a1], [a10])
Set regEx = CreateObject("VBScript.RegExp")
With regEx
'match entire string
.Pattern = "^ID_(\d{3})$"
'match anywhere
' .Pattern = "ID_(\d{3})"
.IgnoreCase = False
For Each rng2 In rng1
If .test(rng2.Value) Then
'use Anchor:=rng2.Offset(0, 1) to dump one column to the right)
ActiveSheet.Hyperlinks.Add Anchor:=rng2, Address:="", SubAddress:= _
Cells(.Replace(rng2.Value, "$1") + 2, rng2.Column).Address, TextToDisplay:=Cells(.Replace(rng2.Value, "$1") + 2, rng2.Column).Address
End If
Next
End With
End Sub
This is because: "=HYPERLINK(A" & CStr(CInt("$1") + 2) is evaluated once, when the code is executed, not once for every match.
You need to capture & process the match like this;
a_cell_Value = "*ID_006*"
Set matches = regEx.Execute(a_cell_Value)
Debug.Print "=HYPERLINK(A" & CLng(matches(0).SubMatches(0)) + 2 & ")"
>> =HYPERLINK(A8)
Or if they are all in ??_NUM format;
a_cell_Value = "ID_11"
?"=HYPERLINK(A" & (2 + val(mid$(a_cell_Value, instr(a_cell_Value,"_") +1))) & ")"
=HYPERLINK(A13)
The line -
regEx.Replace(a_cell.Value, "=HYPERLINK(A" & CStr(CInt("$1") + 2) )
won't work as VBA will try to do a CInt on the literal string "$1" rather than on the match from your RegEx.
It would work if you did your replace in 2 steps, something like this -
Dim a_cell
a_cell = Sheets(1).Cells(1, 1)
Dim regEx
Set regEx = CreateObject("VBScript.RegExp")
regEx.Pattern = "ID_(\d{3})"
regEx.IgnoreCase = False
regEx.Global = True
a_cell = regEx.Replace(a_cell, "$1")
Sheets(1).Cells(1, 1) = "=HYPERLINK(A" & CStr(CInt(a_cell) + 2) & ")"