How to remove a string between certain slashes with regex or Excel

I'm looking for a way to remove the path segment between the 4th and 5th forward slashes (counting both slashes in "http://"),
e.g. turn http://www.website.com/content/remove-this/product
into http://www.website.com/content/product
I can use Notepad++, regex or Excel.
I tried using
/.*?/(.*?)/
but that didn't work.

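In Notepad++, open Replace (Ctrl+H), set Search Mode to "Regular expression", and search for
^(.*://)([^/]*/)([^/]*/)([^/]*/)(.*)$
Replace with
$1$2$3$5
Group 4 is the segment to drop, so leaving $4 out of the replacement removes it.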
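To check what that pattern does outside Notepad++, here is a minimal Python sketch using the same five groups. Python writes backreferences in the replacement as \1, \2, ... instead of $1, $2, ...; the function name `drop_second_path_segment` is hypothetical.

```python
import re

# Same pattern as the Notepad++ answer: scheme, host/, first segment/, segment to drop/, rest.
PATTERN = re.compile(r"^(.*://)([^/]*/)([^/]*/)([^/]*/)(.*)$")


def drop_second_path_segment(url: str) -> str:
    # Keep groups 1, 2, 3 and 5; omitting group 4 removes that segment and its slash.
    # URLs with too few slashes do not match and come back unchanged.
    return PATTERN.sub(r"\1\2\3\5", url)


print(drop_second_path_segment("http://www.website.com/content/remove-this/product"))
```

A URL such as http://a.com/x has no 5th slash, so the pattern fails to match and the input is returned as-is.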

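In Excel, one option is a formula. It keeps everything up to and including the 4th "/" and appends everything after the 5th (99 is simply the maximum number of characters taken from the remainder):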
=LEFT(A1,FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),4)))&MID(A1,1+FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),5)),99)
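The formula's logic (locate the n-th and (n+1)-th slash, keep the left part, resume after the second one) can be sketched in Python without regex. The helper name `remove_between_slashes` is hypothetical. One design difference: where Excel's FIND would produce #VALUE! if there is no 5th slash, this sketch returns the text unchanged, and it has no 99-character cap.

```python
def remove_between_slashes(text: str, n: int = 4) -> str:
    """Drop the text between the n-th and (n+1)-th '/', keeping the n-th slash."""
    positions = [i for i, ch in enumerate(text) if ch == "/"]
    if len(positions) <= n:
        return text  # fewer than n+1 slashes: nothing to remove
    keep_to = positions[n - 1]    # index of the n-th slash (like LEFT(..., FIND(...)))
    resume_at = positions[n] + 1  # first character after the (n+1)-th slash (like MID)
    return text[: keep_to + 1] + text[resume_at:]


print(remove_between_slashes("http://www.website.com/content/remove-this/product"))
```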
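Or a UDF (user-defined function) using regex:
Option Explicit

Function Remove4th(S As String) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        ' Keep everything up to and including the 4th slash; drop the next segment and its slash
        .Pattern = "^((?:.*?/){4})[^/]*/"
        .MultiLine = True
        Remove4th = .Replace(S, "$1")
    End With
End Function
Use it in a cell as =Remove4th(A1).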
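The UDF's pattern uses a quantified lazy group, (?:.*?/){4}, to consume exactly four slash-terminated chunks. Here is a minimal Python sketch of the same pattern, assuming input may hold several URLs on separate lines (which is what .MultiLine = True is for). The function name `remove_4th` is hypothetical.

```python
import re

# Group 1 keeps everything up to and including the 4th slash;
# [^/]*/ then consumes the next segment and its trailing slash.
# re.MULTILINE lets ^ match at the start of every line.
REMOVE_4TH = re.compile(r"^((?:.*?/){4})[^/]*/", re.MULTILINE)


def remove_4th(s: str) -> str:
    # re.sub replaces every match, like .Global = True in VBScript.
    return REMOVE_4TH.sub(r"\1", s)


urls = "http://www.website.com/content/remove-this/product\nhttp://example.com/a/b/c"
print(remove_4th(urls))
```

Because `.` does not match a newline, each line is handled independently; a line with no 5th slash is left alone.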

I would do something like this:
<?php
$string = "http://www.website.com/content/remove-this/product";
// Groups: 1 = host, 2 = first segment, 3 = segment to remove, 4 = last segment
preg_match_all('#http://([a-zA-Z0-9.-]*)/([a-zA-Z0-9-]*)/([a-zA-Z0-9-]*)/([a-zA-Z0-9-]*)#i', $string, $out);
$new_string = 'http://'.$out[1][0].'/'.$out[2][0].'/'.$out[4][0];
echo $new_string;
// => http://www.website.com/content/product
?>

Related

Excel Regex Add 2 spaces instead of one

I'm using the function below in Excel to insert a space before each capital letter that follows a lowercase letter. How can I adapt it to put 2 spaces between words, e.g. "Mike  Jones" rather than just one as it does now? Simple answer I'm sure, but RegEx baffles me at the best of times.
Function SplitCaps(strIn As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")
    With objRegex
        .Global = True
        .Pattern = "([a-z])([A-Z])"
        SplitCaps = .Replace(strIn, "$1 $2")
    End With
End Function
Very simple: add an extra space between the two regex groups $1 and $2:
SplitCaps = .Replace(strIn, "$1  $2")
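The same idea as a minimal Python sketch: the pattern finds each lowercase-to-uppercase boundary, and the replacement puts both captured letters back with literal spaces between them. The function name `split_caps` and its `spaces` parameter are hypothetical additions for illustration.

```python
import re


def split_caps(s: str, spaces: int = 2) -> str:
    # ([a-z])([A-Z]) captures the two letters at each camel-case boundary;
    # the replacement re-inserts them with the requested number of literal spaces.
    return re.sub(r"([a-z])([A-Z])", r"\1" + " " * spaces + r"\2", s)


print(repr(split_caps("MikeJones")))
```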
I think all you need is
([a-zA-Z]*\s*[a-zA-Z]*)*
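Have you tried adding '\s'? (This should be a comment, but I can't comment yet.)
Note that \s only means whitespace inside the pattern. In the replacement string VBScript does not expand it, so
SplitCaps = .Replace(strIn, "$1\s\s$2")
inserts the literal characters \s\s. Type real spaces in the replacement instead, as in "$1  $2".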

Find specific instance of a match in string using RegEx

I am very new to RegEx and I can't seem to find what I'm looking for. I have a string such as:
[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]
and I want to get everything within the first set of brackets as well as the second set of brackets. If there is a way to do this with one pattern so that I can just loop through the matches, that would be great. If not, that's fine; I just need to be able to get the different sections of text separately. So far, the following is all I have come up with, but it just returns the whole string minus the first opening bracket and the last closing bracket:
[\[-\]]
(Note: I'm using the replace function, so this might be the reverse of what you are expecting.)
In my research, I have discovered that there are different RegEx engines. I'm not sure of the name of the one I'm using, but I'm using it in MS Access.
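If you're using Access, you can use the VBScript Regular Expressions library to do this. For example:
' The Sub name is arbitrary; the loose statements just need to live inside a procedure
Sub PrintBracketedText()
    Const SOME_TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"
    Dim re As Object
    Dim m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' \[ and \] are literal brackets; ([^\]]+) captures the text between them
    re.Pattern = "\[([^\]]+)\]"
    For Each m In re.Execute(SOME_TEXT)
        Debug.Print m.SubMatches(0)
    Next
End Sub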
Output:
cmdSubmitToDatacenter_Click
Form_frm_bk_UnsubmittedWires
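For comparison, the same pattern in a short Python sketch: `finditer` loops over each bracketed match (like `For Each m In re.Execute(...)`), and `findall` returns just the captured group, so the two parts can be unpacked directly.

```python
import re

TEXT = "[cmdSubmitToDatacenter_Click] in module [Form_frm_bk_UnsubmittedWires]"

# \[ and \] are literal brackets; ([^\]]+) captures one or more non-']' characters.
BRACKETED = re.compile(r"\[([^\]]+)\]")

for m in BRACKETED.finditer(TEXT):
    print(m.group(1))  # equivalent of m.SubMatches(0)

# With a single capture group, findall returns the group text for every match.
procedure, module = BRACKETED.findall(TEXT)
```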
Here is what I ended up using, as it made it easier to get the individual values. I set a reference to Microsoft VBScript Regular Expressions 5.5 (Tools > References) so that I could get IntelliSense help.
Public Sub GetText(strInput As String)
    Dim regex As RegExp
    Dim colMatches As MatchCollection
    Dim strModule As String
    Dim strProcedure As String

    Set regex = New RegExp
    With regex
        .Global = True
        .Pattern = "\[([^\]]+)\]"
    End With

    Set colMatches = regex.Execute(strInput)
    ' Expect two bracketed parts: the procedure first, then the module
    If colMatches.Count >= 2 Then
        strProcedure = colMatches.Item(0).SubMatches.Item(0)
        strModule = colMatches.Item(1).SubMatches.Item(0)
        Debug.Print "Module: " & strModule
        Debug.Print "Procedure: " & strProcedure
    End If

    Set regex = Nothing
End Sub

Create clean URL from text in Excel

I want to create a clean URL from a text such as this one:
Alpha Tests' Purchase of Berta Global Associates (C)
The URL should look like this:
alpha-tests-purchase-of-berta-global-associates-c
Currently I use this formula in Excel:
=LOWER(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A38;"--";"-");" / ";"-");" ";"-");": ";"-");" - ";"-");"_";"-");"?";"");",";"");".";"");"'";"");")";"");"(";"");":";"");" ";"-");"&";"and");"!";"");"/";"-");"""";""))
However, I don't seem to catch all special symbols etc. and as a consequence my URLs are not as clean as I want them to be.
Do you know an Excel formula or VBA code, which ensures that all special symbols are properly converted to a clean URL?
Thank you.
I can suggest the following function, which you can put into a VBA module and call like a normal formula (e.g. =NormalizeToUrl(A38)):
Function NormalizeToUrl(cell As Range)
    Dim strPattern As String
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")
    strPattern = "[^\w-]+"
    With regEx
        .Global = True
        .Pattern = strPattern
    End With
    NormalizeToUrl = LCase(regEx.Replace(Replace(cell.Value, " ", "-"), ""))
End Function
The idea is to replace all spaces with hyphens first, then use a regex that matches any run of characters that are neither word characters nor hyphens, and remove those runs with RegExp.Replace.
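Here is a minimal Python sketch of those three steps (spaces to hyphens, delete `[^\w-]+`, lowercase). It passes `re.ASCII` on the assumption that you want `\w` to mean `[A-Za-z0-9_]` as in VBScript; Python's default `\w` would also keep accented letters. The function name `normalize_to_url` is hypothetical.

```python
import re


def normalize_to_url(text: str) -> str:
    # 1) spaces -> hyphens
    hyphenated = text.replace(" ", "-")
    # 2) delete every run of characters that is neither a word character nor a hyphen
    #    (re.ASCII restricts \w to [A-Za-z0-9_])
    cleaned = re.sub(r"[^\w-]+", "", hyphenated, flags=re.ASCII)
    # 3) lower-case the result
    return cleaned.lower()


print(normalize_to_url("Alpha Tests' Purchase of Berta Global Associates (C)"))
```

With `re.ASCII`, a non-ASCII letter such as the "é" in "Café" is simply deleted, which is exactly the open question about Unicode letters.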
UPDATE:
After your comments, it is still unclear what you want to do with Unicode letters: delete them or replace them with hyphens. Here is a function that I tried to rebuild from your formula, but the logic may be flawed. I would still prefer the generic approach above.
Function NormalizeToUrl(cell As Range)
    Dim strPattern As String
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")
    strPattern = "[^\w -]"
    With regEx
        .Global = True
        .Pattern = "[?,.')(:!""]+" ' THESE ARE REMOVED
    End With
    NormalizeToUrl = regEx.Replace(cell.Value, "")
    NormalizeToUrl = Replace(NormalizeToUrl, "&", "and") ' & TURNS INTO "and"
    With regEx
        .Global = True
        .Pattern = strPattern ' WE REPLACE ALL NON-WORD CHARS WITH HYPHEN
    End With
    NormalizeToUrl = LCase(regEx.Replace(Replace(NormalizeToUrl, " ", "-"), "-"))
    With regEx
        .Global = True
        .Pattern = "--+" ' WE SHRINK ALL HYPHEN SEQUENCES TO SINGLE HYPHEN
    End With
    NormalizeToUrl = regEx.Replace(NormalizeToUrl, "-")
End Function
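The same four-step pipeline as a Python sketch, useful for testing the rules before committing to them in VBA. It mirrors the function's order (delete punctuation, "&" to "and", other non-word characters to hyphens, collapse hyphen runs). `re.ASCII` is an assumption to match VBScript's ASCII-only `\w`, and the name `normalize_to_url_steps` is hypothetical.

```python
import re


def normalize_to_url_steps(text: str) -> str:
    # Step 1: delete the punctuation the original formula removed.
    s = re.sub('[?,.\')(:!"]+', "", text)
    # Step 2: "&" becomes "and".
    s = s.replace("&", "and")
    # Step 3: spaces -> hyphens, any other non-word character -> hyphen, then lower-case.
    s = re.sub(r"[^\w -]", "-", s.replace(" ", "-"), flags=re.ASCII).lower()
    # Step 4: collapse runs of two or more hyphens into one.
    return re.sub(r"--+", "-", s)


print(normalize_to_url_steps("Tom & Jerry: The Movie!"))
```

For example, "A / B" first becomes "A-/-B", then "a---b", and the last step shrinks it to "a-b".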

Extract a string from a text file using VBS

Okay, so I have this file, sample.txt:
("checkAssdMobileNo1".equals(ACTION)
("checkAssdMobileNo2".equals(ACTION)
("checkAssdMobileNo3".equals(ACTION)
("checkAssdMobileNo4".equals(ACTION)
("checkAssdMobileNo5".equals(ACTION)
("checkAssdMobileNo6".equals(ACTION)
How can I output only these:
checkAssdMobileNo1
checkAssdMobileNo2
checkAssdMobileNo3
checkAssdMobileNo4
checkAssdMobileNo5
checkAssdMobileNo6
I tried using the following code but it would not output anything and I couldn't figure out what I did wrong:
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set file = objFSO.OpenTextFile("sample.txt" , ForReading)
Const ForReading = 1
Dim re
Set re = new regexp
re.Pattern = """\w+?""[.]equals(ACTION)"
re.IgnoreCase = True
re.Global = True
Dim line
Do Until file.AtEndOfStream
    line = file.ReadLine
    For Each m In re.Execute(line)
        Wscript.Echo m.Submatches(0)
    Next
Loop
Your regular expression is close, but missing 2 things:
You need to escape the parentheses surrounding ACTION
You need to use unescaped parentheses to extract the group between the quotes
Something like this should work:
re.Pattern = """(\w+?)""[.]equals\(ACTION\)"
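The corrected pattern can be tried in a small Python sketch: the literal parentheses are escaped, and the unescaped (\w+?) is the group that VBScript exposes as SubMatches(0). An in-memory `io.StringIO` stands in for sample.txt here (an assumption for self-containment), and `extract_actions` is a hypothetical name.

```python
import io
import re

# Escaped \( \) match the literal parentheses; (\w+?) captures the name inside the quotes.
PATTERN = re.compile(r'"(\w+?)"[.]equals\(ACTION\)', re.IGNORECASE)


def extract_actions(lines):
    names = []
    for line in lines:
        for m in PATTERN.finditer(line):
            names.append(m.group(1))  # like m.SubMatches(0)
    return names


# Stand-in for the real file; with a file on disk use: with open("sample.txt") as f: ...
sample = io.StringIO('("checkAssdMobileNo1".equals(ACTION)\n("checkAssdMobileNo2".equals(ACTION)\n')
print("\n".join(extract_actions(sample)))
```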
The regex you need is
\("(\w+)"
(in VBScript, with the quotes doubled: re.Pattern = "\(""(\w+)""")
It uses a capture group: (\w+) captures the name between the quotes, which you then read with m.Submatches(0).

vbscript multiple replace regex

How do you match more than one pattern in vbscript?
Set regEx = New RegExp
regEx.Pattern = "[?&]cat=[\w-]+" & "[?&]subcat=[\w-]+" ' tried this
regEx.Pattern = "([?&]cat=[\w-]+)([?&]subcat=[\w-]+)" ' and this
param = regEx.Replace(param, "")
I want to replace any parameter called cat or subcat in a string called param with nothing.
For instance:
string?cat=meow&subcat=purr or string?cat=meow&dog=bark&subcat=purr
I would want to remove cat=meow and subcat=purr from each string.
regEx.Global = True ' replace every match, not just the first
regEx.Pattern = "([?&])(cat|subcat)=[\w-]+"
param = regEx.Replace(param, "$1") ' The $1 brings our ? or & back
Generally, OR in regex is a pipe:
[?&]cat=[\w-]+|[?&]subcat=[\w-]+
In this case, making sub optional also works:
[?&](sub)?cat=[\w-]+
Another option is to use alternation on just the parts that differ:
[?&](cat|dog|bird)=[\w-]+
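A minimal Python sketch of the optional-prefix version: (?:sub)? is a non-capturing equivalent of (sub)?, so one pattern removes both cat=... and subcat=..., and re.sub replaces every match (like Global = True). The name `strip_cat_params` is hypothetical.

```python
import re

# [?&] is the separator, (?:sub)? makes the "sub" prefix optional, [\w-]+ is the value.
CAT_PARAMS = re.compile(r"[?&](?:sub)?cat=[\w-]+")


def strip_cat_params(param: str) -> str:
    # Every match is removed together with its leading ? or &.
    return CAT_PARAMS.sub("", param)


print(strip_cat_params("string?cat=meow&dog=bark&subcat=purr"))
```

Because the separator is removed along with the parameter, dropping the first parameter can leave the remaining query starting with & instead of ?; the $1 variant above keeps the separator instead.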