Regex: split to the last occurrence of path - c++

I want to split a UNC path into hostname, shared folder, path, filename and extension. I almost have it, but the last part is somehow wrong because I don't get the filename correctly.
e.g.
//host/shared/path1/path2/path3/filename.pdf
should be split up to:
host
shared
path1/path2/path3
filename
pdf
But at the moment I get something like this:
host
shared
path1/path2/path3/filenam
e
pdf
using this regex:
std::regex rgx("\/\/(\\w+?){1,1}\/(\\w+?)\/([\\w\/]+)([^\\.])\\.(.+)$");
So what is wrong with it and how can I solve it?

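You want to remove the group "([^\\.])": it matches exactly one character, so the greedy "([\\w\/]+)" before it swallows everything up to the last letter of the file name, and the following "\\." then matches the period. Instead, add a "\/" separator and another word group that matches the file name itself, followed by the period, like so: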
std::regex rgx("\/\/(\\w+?){1,1}\/(\\w+?)\/([\\w\/]+)\/([\\w]+)\\.(.+)$");
https://regex101.com/r/yK4zH1/4
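As a quick sanity check outside C++, the corrected pattern can be tried in Python's re module, which should treat this particular pattern the same way for plain ASCII input such as the example. This is only a sketch: the name split_unc is made up for illustration, and the redundant {1,1} and the lazy quantifiers are dropped, which does not change what is matched here.

```python
import re

# Same structure as the corrected pattern:
# host, share, directories, file name, extension.
UNC_RE = re.compile(r"//(\w+)/(\w+)/([\w/]+)/(\w+)\.(.+)$")

def split_unc(path):
    """Return (host, share, directories, filename, extension), or None if no match."""
    m = UNC_RE.match(path)
    return m.groups() if m else None

print(split_unc("//host/shared/path1/path2/path3/filename.pdf"))
# ('host', 'shared', 'path1/path2/path3', 'filename', 'pdf')
```

Note that the directory group needs at least one character, so a file sitting directly in the share (for example //host/shared/file.pdf) does not match this pattern.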

Related

Regex to pull value from middle of file path

I am trying to figure out how to pull a string out of a folder path: I want to pull COMPANY_NAME from the folder path below. Is there a way to use a regex to pull the string between the 2nd and 3rd backslash?
Example:
\\10.20.3.23\S$\COMPANY_NAME\Main_5e08a942f39a430db0b081736a3f1881\C_VOL-b002.spf
Try this (demo):
(?(DEFINE)(?<urlPart>[^\\\s]+))\\\\(?&urlPart)\\(?&urlPart)\\\K(?&urlPart)
It will match the part of the path you are after. Things to note:
The path does not need to start at the beginning of the string (if you require this, add ^ after the define group)
It will match multiple paths in the same string
It will match even if there is no file name
Whitespace will invalidate the match
See the demo for details
In case you were wondering, it uses subroutine definitions to reuse parts of the regex, so it needs a PCRE-compatible engine.
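The pattern above relies on (?(DEFINE)...), (?&name) and \K, which Python's built-in re module does not support. A rough equivalent, assuming the path begins with two backslashes as the answer's regex requires, is to capture the third component in a group instead. The names PART and third_component are made up for this sketch:

```python
import re

# One path component: anything except backslashes and whitespace
# (the same character class as urlPart in the answer).
PART = r"[^\\\s]+"

# Two backslashes, host, backslash, share, backslash, then the wanted component (group 1).
THIRD_PART_RE = re.compile(r"\\\\" + PART + r"\\" + PART + r"\\(" + PART + r")")

def third_component(text):
    """Return the component that follows host and share, or None if there is no match."""
    m = THIRD_PART_RE.search(text)
    return m.group(1) if m else None

path = r"\\10.20.3.23\S$\COMPANY_NAME\Main_5e08a942f39a430db0b081736a3f1881\C_VOL-b002.spf"
print(third_component(path))  # COMPANY_NAME
```

Building the pattern from PART plays the same role as the subroutine definition: the component class is written once and reused three times.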

Need regex to strip away remaining part of a path

I am trying to write a regex which will strip away the rest of the path after a particular folder name.
If Input is:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
Output should be:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
Some constraints:
ChangePack- will be followed by the change pack id, which is a mix of digits and letters (a-z or A-Z only) in any order. There is no limit on the length of the change pack id.
ChangePack- is a constant. It will always be there.
The text before ChangePack can also change. For example, it can also be:
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
My regex-fu is bad. What I have come up with till now is:
^(.*?)\-6a7B6
I need to make this generic.
Any help will be much appreciated.
The regex below does the trick.
^(.*?ChangePack-[\w]+)
Input:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
Output:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6
Check out the live regex demo here.
^(.*?ChangePack-[a-zA-Z0-9]+)
Try this. Instead of replacing, grab the match $1 or \1. See demo.
https://regex101.com/r/iY3eK8/17
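To see the "grab group 1 instead of replacing" idea in action, here is a minimal Python sketch using the [a-zA-Z0-9]+ variant, which sticks to the stated constraint on the change pack id. The function name trim_to_changepack is hypothetical:

```python
import re

# Lazily take everything up to and including "ChangePack-" plus its alphanumeric id.
CHANGEPACK_RE = re.compile(r"^(.*?ChangePack-[a-zA-Z0-9]+)")

def trim_to_changepack(path):
    """Return the path up to the change pack folder, or None if there is none."""
    m = CHANGEPACK_RE.match(path)
    return m.group(1) if m else None

paths = [
    "/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs",
    "/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces",
]
for p in paths:
    print(trim_to_changepack(p))
# /Repository/Framework/PITA/branches/ChangePack-6a7B6
# /Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6
```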
Will you always have '/Repository/Framework/PITA/branches/' at the beginning? If so, this will do the trick:
/Repository/Framework/PITA/branches/\w+-\w*
Instead of a regex you can use split and join functions. Example in Python:
path = "/a/b/c/d/e"
folders = path.split("/")
newpath = "/".join(folders[:3]) # keeps the first two folders, drops the third and everything after it
print(newpath) #prints "/a/b"
If you really want regex, try something like ^.*\/folder\/ where folder is the name of the directory you want to match.

How to replace a string of digits with a padded version of that string using regular expression?

I have a program that searches through a directory loading files into DB, matching filenames with ID field in DB, using regex to search for patterns.
The DB contains an ID (##AAA######, e.g. 14ABC000123) while the filename usually contains ##AAA### (e.g. 14ABC123), and I need a regex that would match these two, returning the full ID from the filename. So far I have come up with
([0-9]{2})([A-Z]{3})([0-9]{1,6})
but when returning $1$2$3 to the matcher it misfires, saying that 14ABC123 != 14ABC000123. Please help.
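It seems you need to get rid of the leading zeroes. Try this (the last group starts at the first non-zero digit, so zeroes inside the number, as in 14ABC000120, are still captured):
([0-9]{2})([A-Z]{3})0*([1-9][0-9]{0,5})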
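If the goal is to rebuild the full DB-style ID from the short form in the file name, an alternative to dropping zeroes is to pad the numeric part to six digits. A minimal Python sketch, assuming the ID layout from the question (two digits, three capital letters, up to six digits); the helper name pad_id is hypothetical:

```python
import re

# Two digits, three capital letters, then one to six digits (the question's layout).
ID_RE = re.compile(r"([0-9]{2})([A-Z]{3})([0-9]{1,6})")

def pad_id(filename):
    """Return the first ID found in filename with its number zero-padded to 6 digits."""
    m = ID_RE.search(filename)
    if m is None:
        return None
    prefix, letters, number = m.groups()
    return prefix + letters + number.zfill(6)

print(pad_id("14ABC123.pdf"))     # 14ABC000123
print(pad_id("14ABC000123.pdf"))  # 14ABC000123
```

Both spellings normalize to the same string, so the result can be compared directly against the ID field.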

Regex remove last dot from string in Yahoo Pipes

I have a couple of strings that end with a dot (.) at the end of the sentence which I need to remove in Yahoo Pipes.
Example:
example.com.
companywebsite.co.uk.
anothersite.co.
I've tried the following from a couple of posts here on SO but none have worked yet
/\.$/
or
^(.*)\\.(.*)$","$1!$2
Neither of these options has worked.
I have tried a very simple find of
.com. and replace with .com
and
.co. to replace with .co
But the latter affects .com as well which is not ideal
EDIT: Here is a visual of what my pipe looks like.
If you can do something like this: ^(.*)\\.(.*)$","$1!$2, then doing this should work: "^(.+?)\.?$", $1. This should match the first part of the URL and leave out the period at the end, should it exist.
EDIT:
As per your image, you should place this: ^(.+?)\.?$ in your replace field and this: $1 in your with field. I do not know if you need to do any escaping, so you might have to use ^(.+?)\\.?$ instead of ^(.+?)\.?$.
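The pattern itself can be checked outside Yahoo Pipes. The sketch below uses Python's re module purely to show what ^(.+?)\.?$ with a $1 replacement does to the example strings; it says nothing about how Pipes handles escaping, and strip_trailing_dot is a made-up name:

```python
import re

def strip_trailing_dot(text):
    """Remove a single trailing period, if present."""
    # Group 1 lazily takes the text; the optional \. swallows a final period.
    return re.sub(r"^(.+?)\.?$", r"\1", text)

for s in ["example.com.", "companywebsite.co.uk.", "anothersite.co.", "nodot.org"]:
    print(strip_trailing_dot(s))
# example.com
# companywebsite.co.uk
# anothersite.co
# nodot.org
```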

Regex Assistance for a url filepath

Can someone assist in creating a Regex for the following situation:
I have about 2000 records in which I need to do a search/replace on a known item in each record that looks like this:
<li>View Product Information</li>
The FILEPATH and FILE are variable, but the surrounding HTML is always the same. Can someone assist with what kind of Regex I would substitute for the "FILEPATH/FILE" part of the search?
You can match the constant parts and use grouping to put them back:
(<li>View Product Information</li>)
Then replace the match with $1your_replacement$2, where $1 is the first matching group and $2 the second (in Python, for instance, you would use Match.group(1) and Match.group(2)).
You would have to escape \ chars if you're using Java instead.
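A minimal Python sketch of the two-group idea follows. The exact markup around FILEPATH/FILE is not visible in the question as shown here, so the <a href="..."> form used below is purely an assumption for illustration, as are the sample paths and the name replace_path:

```python
import re

# Group 1: constant text before the path; group 2: constant text after it.
# The <a href="..."> markup is an assumption; adjust it to the real records.
LINK_RE = re.compile(r'(<li><a href=")[^"]*(">View Product Information</a></li>)')

def replace_path(record, new_path):
    """Swap the variable href value for new_path, keeping the surrounding HTML."""
    # A function replacement avoids backslashes in new_path being read as escapes.
    return LINK_RE.sub(lambda m: m.group(1) + new_path + m.group(2), record)

record = '<li><a href="docs/old/manual.pdf">View Product Information</a></li>'
print(replace_path(record, "files/new/manual.pdf"))
# <li><a href="files/new/manual.pdf">View Product Information</a></li>
```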