Excel Regex: Add 2 spaces instead of one

I'm using the function below in Excel to split data at capital letters. How can I adapt it to add two spaces between words, e.g. "Mike  Jones", rather than just one as it does now? Simple answer, I'm sure, but regex baffles me at the best of times.
Function SplitCaps(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        .Pattern = "([a-z])([A-Z])"
        SplitCaps = .Replace(strIn, "$1 $2")
    End With
End Function

Very simple: add an extra space between the two group references $1 and $2 in the replacement string, so that it contains two spaces:
SplitCaps = .Replace(strIn, "$1  $2")
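
The same substitution can be checked outside Excel. The following is a minimal sketch in Python, not part of the original answer; the function name split_caps and its gap parameter are hypothetical and exist only for illustration.

```python
import re


def split_caps(text: str, gap: str = "  ") -> str:
    """Insert `gap` wherever a lowercase letter is directly followed by an uppercase one."""
    # m.group(1) and m.group(2) play the role of $1 and $2 in the VBA replacement string.
    return re.sub(r"([a-z])([A-Z])", lambda m: m.group(1) + gap + m.group(2), text)


print(split_caps("MikeJones"))
```

Passing gap=" " gives the single-space behaviour of the original function.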

I think all you need is
([a-zA-Z]*\s*[a-zA-Z]*)*

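Have you tried adding '\s'? This should be a comment, but I can't comment as of now.
Note that \s is only a character class inside the pattern. In the replacement string, VBScript's Replace inserts it as the literal characters \s, so the extra whitespace has to be typed as literal spaces:
SplitCaps = .Replace(strIn, "$1  $2")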

Related

Add a new line or space depending on pattern

I am trying to do the following.
Patterns:
aaaaa.BBBBB - add a new line after the period (.)
aaaaaBBBBB - add a new line before a capital letter
aaaaa12345 - add a space before a digit that follows a letter (Output: aaaaa 12345)
12345aaaaa - add a space before a letter that follows a digit (Output: 12345 aaaaa)
Values:
Client asked about the 21year planPlease follow up at1234567
The regex code needs to do the following:
Client asked about the 21 (space) year plan**(new line)** Please
follow up at (space) 1234567
Result:
Client asked about the 21 year plan
Please follow up at 1234567.
How do I recognize each pattern and apply the matching replacement, whether that is adding a space or a new line?
Here is the code I currently use:
Function SplitCaps(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        .Pattern = "([a-z])([A-Z0-9])"
        SplitCaps = .Replace(strIn, "$1 $2")
    End With
End Function
You can use two regex replacements: the first adds a space between a0 and 0a (a lowercase letter next to a digit, in either order), and the second creates the new line at aA and a.A.
([a-z])([0-9])|([0-9])([a-z]) and replace with $1$3 $2$4
([a-z])\.?([A-Z]) and replace with $1, a line feed, and $2 (VBScript's Replace does not expand \n in a replacement string, so concatenate vbLf in VBA)
If you want a period added at the end, match $ and replace it with a literal period.
Try this code:
Function SplitCaps(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    Dim result As String
    With objRegex
        .Global = True
        ' Pass 1: space between a lowercase letter and a digit, in either order
        .Pattern = "([a-z])([0-9])|([0-9])([a-z])"
        result = .Replace(strIn, "$1$3 $2$4")
        ' Pass 2: line feed between lowercase and uppercase, dropping an optional period
        .Pattern = "([a-z])\.?([A-Z])"
        ' \n is not expanded in a VBScript replacement string, so concatenate vbLf
        result = .Replace(result, "$1" & vbLf & "$2")
    End With
    SplitCaps = result
End Function
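
To try the two-pass idea on the sample sentence outside Excel, here is a minimal Python sketch that reuses the two patterns from this answer. The name split_text is hypothetical, and the sketch assumes Python 3.5 or later, where unmatched groups in a replacement expand to an empty string.

```python
import re


def split_text(text: str) -> str:
    """Two passes: spaces at letter/digit boundaries, then line feeds at aA and a.A."""
    # Pass 1: only one alternative matches at a time, so two of the four
    # groups are empty and "\1\3 \2\4" reduces to "<left> <right>".
    text = re.sub(r"([a-z])([0-9])|([0-9])([a-z])", r"\1\3 \2\4", text)
    # Pass 2: put a line feed between a lowercase letter and the capital
    # that follows it, dropping an optional period in between.
    return re.sub(
        r"([a-z])\.?([A-Z])",
        lambda m: m.group(1) + "\n" + m.group(2),
        text,
    )


print(split_text("Client asked about the 21year planPlease follow up at1234567"))
```

The order of the passes does not matter for this input, because the two patterns never match at the same position.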

Merge Three Regexes into One (or Two)

I would like to merge my three regexes, which clean text (empty lines, leading and trailing spaces, etc.), into one regex if possible, or into two if not.
My first regex is [ \t]+, which matches runs of spaces and tabs.
My second regex is ^(?:[\t ]*(?:\r?\n|\r))+, which matches empty lines. It won't catch anything unless the previous regex has already run.
The third regex is ^[\s\xA0]+|[\s\xA0]+$, which matches leading and trailing whitespace, including non-breaking spaces.
EDIT: I forgot to mention that in each case I replace the match with nothing ("").
EDIT 2: I use the following code in Word:
With selection
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.Global = True
    RegEx.MultiLine = True
    ' clean selection
    RegEx.Pattern = "[ \t]+"
    .Text = RegEx.Replace(.Text, " ")
    RegEx.Pattern = "^(?:[\t ]*(?:\r?\n|\r))+"
    .Text = RegEx.Replace(.Text, "")
    ' the following is from http://stackoverflow.com/a/24049145/2657875
    RegEx.Pattern = "^[\s\xA0]+|[\s\xA0]+$"
    .Text = RegEx.Replace(.Text, "")
End With
The last two regexes can be merged as
RegEx.Pattern = "^(?:[\t ]*(?:\r?\n|\r)?)*|[ \t]+$"
I do not think all three can be merged in VBA, since you are using two different replacement strings.
If I am not mistaken, you want all blank lines, spaces, and tabs to be matched and removed so that the input can be merged. That can be done with the following regex in your replace program, script, or command:
/([\s\t]{0,50}\r?\n)+|\s+/s
The regex should work on both Windows and Linux line endings.
I'm no pro, but I run multiple regexes one after another. If you are not familiar with the code below, give it a try.
Set regEx_ = New RegExp
With regEx_
    .Global = True
    .MultiLine = True
    .IgnoreCase = True
    .Pattern = "Pattern 1"
    TextLine = .Replace(TextLine, "")
    .Pattern = "Pattern 2"
    TextLine = .Replace(TextLine, "")
    'and so on
End With
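
The three-pass cleanup from the question can be sketched in Python to see what each pass contributes. This is an illustration rather than a drop-in for Word: the function name clean_text is hypothetical, and Python's multiline ^ only recognises \n as a line start, whereas Word text typically uses \r between paragraphs.

```python
import re


def clean_text(text: str) -> str:
    """Apply the question's three patterns in order, one substitution per pass."""
    # Pass 1: collapse each run of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Pass 2: drop empty lines (optional blanks followed by a line break).
    text = re.sub(r"^(?:[\t ]*(?:\r?\n|\r))+", "", text, flags=re.MULTILINE)
    # Pass 3: trim leading and trailing whitespace, including non-breaking spaces.
    return re.sub(r"^[\s\xA0]+|[\s\xA0]+$", "", text, flags=re.MULTILINE)


print(repr(clean_text("  hello   world  \n\n\n  second\tline \n")))
```

Pass 1 needs a different replacement (a single space) from the other two, which is why the passes cannot all share one substitution call.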

How to remove a string between certain slashes regex or excel

I'm looking for a way to remove string after a 3rd and a 4th forward slash
E.g. from http://www.website.com/content/remove-this/product
to http://www.website.com/content/product
I can use Notepad++, regex, or Excel.
I tried using
/.*?/(.*?)/
but that didn't work
Try using Notepad++'s Replace dialog in regular-expression mode with the expression
^(.*://)([^/]*/)([^/]*/)([^/]*/)(.*)$
and replace with
$1$2$3$5
For the answers using Excel:
Formula
=LEFT(A1,FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),4)))&MID(A1,1+FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),5)),99)
UDF (using regex)
Option Explicit
Function Remove4th(S As String) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .Pattern = "^((?:.*?/){4})[^/]*/"
        .MultiLine = True
        Remove4th = .Replace(S, "$1")
    End With
End Function
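I would do something like this:
<?php
$string = " http://www.website.com/content/remove-this/product";
// Capture the host, then the three path segments that follow it
preg_match_all('#http:\/\/([a-zA-Z0-9-.]*)\/([a-zA-Z0-9-]*)\/([a-zA-Z0-9-]*)\/([a-zA-Z0-9-]*)#ism',$string,$out);
// Keep the host, the first segment and the third segment; drop the second ("remove-this")
$new_string = 'http://'.$out[1][0].'/'.$out[2][0].'/'.$out[4][0];
echo $new_string;
// => http://www.website.com/content/product
?>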
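
For comparison, the pattern from the VBA UDF in this thread can be exercised in Python. This is a minimal sketch; the function name remove_segment is hypothetical.

```python
import re


def remove_segment(url: str) -> str:
    """Drop the path segment that sits between the 4th and 5th forward slash."""
    # Group 1 lazily captures everything up to and including the 4th slash;
    # the next slash-terminated segment is matched outside the group and discarded.
    return re.sub(r"^((?:.*?/){4})[^/]*/", r"\1", url)


print(remove_segment("http://www.website.com/content/remove-this/product"))
```

A URL with fewer than five slashes does not match the pattern and is returned unchanged.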

Remove tweet regular expressions from string of text

I have an Excel sheet filled with tweets. Several entries contain #blah-style strings, among others. I need to keep the rest of the text and remove the #blah part. For example, "#villos hey dude" needs to be transformed into "hey dude". This is what I've done so far.
Sub Macro1()
'
' Macro1 Macro
'
    Dim counter As Integer
    Dim strIN As String
    Dim newstring As String
    For counter = 1 To 46
        Cells(counter, "E").Select
        ActiveCell.FormulaR1C1 = strIN
        StripChars (strIN)
        newstring = StripChars(strIN)
        ActiveCell.FormulaR1C1 = StripChars(strIN)
    Next counter
End Sub
Function StripChars(strIN As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Pattern = "^#?(\w){1,15}$"
        .ignorecase = True
        StripChars = .Replace(strIN, vbNullString)
    End With
End Function
Moreover there are also entries like this one: Ÿ³é‡ï¼Ÿã€€åˆã‚ã¦çŸ¥ã‚Šã¾ã—ãŸã€‚ shiftã—ãªãŒã‚‰ã‚¨ã‚¯ã‚¹ãƒ
I need them gone too! Ideas?
For every line in the spreadsheet, run the following regex on it: ^(#.+?)\s+?(.*)$
If the line matches the regex, the text you are interested in will be in the second capturing group (position 0 holds the entire match). The first capturing group will contain the Twitter handle if you need that too.
However, this will not match tweets that are not replies (that is, tweets that do not start with #). In that situation the only way to distinguish regular tweets from the junk you are not interested in is to restrict the tweet to alphanumerics, but this may mean some tweets are missed if they contain any non-alphanumeric characters. The following regex will work if that is not an issue for you:
^(?:(#.+?)\s+?)?([\w\t ]+)$
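
As a rough illustration of how the first regex in this answer separates the leading tag from the rest of the tweet, here is a minimal Python sketch. The names LEADING_TAG and strip_leading_tag are hypothetical, and returning unmatched text unchanged is an assumption about the desired fallback.

```python
import re

# Group 1: the leading #tag; group 2: everything after the whitespace that follows it.
LEADING_TAG = re.compile(r"^(#.+?)\s+?(.*)$")


def strip_leading_tag(tweet: str) -> str:
    """Return the tweet without its leading #tag, or unchanged if there is none."""
    match = LEADING_TAG.match(tweet)
    return match.group(2) if match else tweet


print(strip_leading_tag("#villos hey dude"))
```

This only handles a tag at the very start of the text; tags in the middle of a tweet are left alone.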

Create clean URL from text in Excel

I want to create a clean URL from a text such as this one:
Alpha Tests' Purchase of Berta Global Associates (C)
The URL should look like this:
alpha-tests-purchase-of-berta-global-associates-c
Currently I use this formula in Excel:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A38;"--";"-");" / ";"-");" ";"-");": ";"-");" - ";"-");"_";"-");"?";"");",";"");".";"");"'";"");")";"");"(";"");":";"");" ";"-");"&";"and");"!";"");"/";"-");"""";""))
However, I don't seem to catch all special symbols etc. and as a consequence my URLs are not as clean as I want them to be.
Do you know an Excel formula or VBA code, which ensures that all special symbols are properly converted to a clean URL?
Thank you.
I can suggest the following function, which you can put into a VBA module and use like a normal formula:
Function NormalizeToUrl(cell As Range)
    Dim strPattern As String
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")
    strPattern = "[^\w-]+"
    With regEx
        .Global = True
        .Pattern = strPattern
    End With
    NormalizeToUrl = LCase(regEx.Replace(Replace(cell.Value, " ", "-"), ""))
End Function
The point is that we replace all spaces with hyphens at the beginning, then use a regex that matches any non-word and non-hyphen characters and remove them with RegExp.Replace.
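
The same two steps can be sketched in Python to check the result for the sample title. The function name normalize_to_url is hypothetical, and re.ASCII is an assumption made so that \w is limited to [A-Za-z0-9_], as it is in VBScript.

```python
import re


def normalize_to_url(text: str) -> str:
    """Replace spaces with hyphens, strip everything that is not a word character or hyphen."""
    hyphenated = text.replace(" ", "-")
    # re.ASCII keeps \w to ASCII letters, digits and underscore.
    return re.sub(r"[^\w-]+", "", hyphenated, flags=re.ASCII).lower()


print(normalize_to_url("Alpha Tests' Purchase of Berta Global Associates (C)"))
```

Without re.ASCII, Python would treat accented and other Unicode letters as word characters and keep them in the result.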
UPDATE:
After your comments, it is still unclear what you want to do with Unicode letters: delete them or replace them with a hyphen. Here is a function that I tried to rebuild from your formula, but the logic may be flawed. I would prefer the generic approach above.
Function NormalizeToUrl(cell As Range)
    Dim strPattern As String
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")
    strPattern = "[^\w -]"
    With regEx
        .Global = True
        .Pattern = "[?,.')(:!""]+" ' THESE ARE REMOVED
    End With
    NormalizeToUrl = regEx.Replace(cell.Value, "")
    NormalizeToUrl = Replace(NormalizeToUrl, "&", "and") ' & TURNS INTO "and"
    With regEx
        .Global = True
        .Pattern = strPattern ' WE REPLACE ALL NON-WORD CHARS WITH HYPHEN
    End With
    NormalizeToUrl = LCase(regEx.Replace(Replace(NormalizeToUrl, " ", "-"), "-"))
    With regEx
        .Global = True
        .Pattern = "--+" ' WE SHRINK ALL HYPHEN SEQUENCES TO SINGLE HYPHEN
    End With
    NormalizeToUrl = regEx.Replace(NormalizeToUrl, "-")
End Function
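
The four steps of this second function can be traced in Python as follows. This is a sketch only: the name normalize_to_url_stepwise is hypothetical, and re.ASCII is again an assumption to mirror VBScript's \w.

```python
import re


def normalize_to_url_stepwise(text: str) -> str:
    """Remove listed punctuation, spell out '&', hyphenate the rest, collapse hyphens."""
    # Step 1: these punctuation characters are removed outright.
    text = re.sub("[?,.')(:!\"]+", "", text)
    # Step 2: an ampersand turns into the word "and".
    text = text.replace("&", "and")
    # Step 3: spaces become hyphens, then any remaining non-word character does too.
    text = re.sub(r"[^\w -]", "-", text.replace(" ", "-"), flags=re.ASCII).lower()
    # Step 4: shrink every run of hyphens to a single hyphen.
    return re.sub(r"--+", "-", text)


print(normalize_to_url_stepwise("Tom & Jerry: The Movie!"))
```

A trailing special character such as "%" leaves a hyphen at the end of the result, since nothing in these steps trims leading or trailing hyphens.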