Is it possible to remove the slash in this matching? - regex

I want to extend my regexp for matching file paths, and I don't know how to do it even though I can see the problem.
Input example
"C://species/dinosaurs/trex.json"
Output example
["C://species/dinosaurs" "trex" "json"]
so that I have the folder path, the filename and the extension.
I also want the folder path to be optional
My regexp
I tried
"^(.*[\\\/])?(.*)\.(.*)$"
It outputs
["C://species/dinosaurs/" "trex" "json"]
Almost, but I have the / at the end of the head.
So I tried
"^((.*)[\\\/])?(.*)\.(.*)$"
It outputs
["C://species/dinosaurs/" "C://species/dinosaurs" "trex" "json"]
Maybe better, because I just have to remove the first match, whereas in the first case I have to post-process the string.
I see the problem: several / can exist in the body, which makes it harder.
Is it possible to say that the first matching group can end with anything but /?
I tried
^(.*(?!\/))[\\\/]?(.*)\.(.*)$
It does not work. I just discovered negative assertions, but the output is
["C://species/dinosaurs/trex" "json"]
Any clue?

This one should suit your needs:
^(?:(.*)/)?([^/]+)\.([^.]+)$
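As a quick check of the pattern above, here is a minimal Python sketch. The names PATH_RE and split_path are made up for illustration, and Python's re module is only one possible engine, so behaviour elsewhere may differ slightly.

```python
import re

# Pattern from the answer: optional folder, then file name, then extension.
PATH_RE = re.compile(r"^(?:(.*)/)?([^/]+)\.([^.]+)$")

def split_path(path):
    """Return (folder, name, extension), or None if the pattern does not match."""
    match = PATH_RE.match(path)
    return match.groups() if match else None

print(split_path("C://species/dinosaurs/trex.json"))
print(split_path("trex.json"))
```

When the folder part is absent, the first group does not take part in the match, so Python reports it as None. Note that the pattern as written only treats forward slashes as separators.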

Related

Path Validation - My RegEx is matching leading spaces in directory names and I can't fix it

I'm back again with more RegEx shenanigans.
I had what I thought was a perfect Windows path validation expression.
Here it is at Regex101: https://regex101.com/r/BertHu/6
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\\/:*?"<>|\r\n]+\\?)(?:[^\\\/:*?"<>|\r\n]+\\)*[^\\\/:*?"<>|\r\n]*$
Breakdown:
# (?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\| # Drive
# \\?[^\\\/:*?"<>|\r\n]+\\?) # Relative path
# (?:[^\\\/:*?"<>|\r\n]+\\)* # Folder
# [^\\\/:*?"<>|\r\n]* # File
The issue I'm having now is that the expression matches paths with leading spaces in directory names.
Example: C:\ Leading Space\ Shouldnt Match is matching.
I tried adding [^\s] to the folder portion of the expression:
(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*
But that only invalidates a leading space in the first path segment:
C:\ LeadingSpace\ShouldntMatch Doesn't match (Good)
C:\LeadingSpace\ ShouldntMatch Matches incorrectly (Bad)
I think the problem lies here:
If anyone could help or point me in the right direction that would be great.
Sorry for all the RegEx questions!
Well, it depends on what the exact rules are. Taking your regex101 script as a basis, I would say:
File, folder and relative folder are more or less the same (if you ignore the non-capturing group and the backslashes):
\\?[^\\\/:*?"<>|\r\n]+\\?
(?:[^\\\/:*?"<>|\r\n]+\\)*
[^\\\/:*?"<>|\r\n]*
So there are three potential places where folders could start with a leading space.
You could add a [^\s] in front of all of them, like this:
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?([^\s][^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
I saved the modified regex101 https://regex101.com/r/Pd3lcR/1
Now it should work, at least for my limited test cases and the information I have about the restrictions.
Btw.: I don't know what your use case is, but this regex is pretty long for simple matching and filename capture; maybe there is a more readable way (for non-regex people).
Update:
To fix the bug I introduced, I have to prevent the share option from being matched as a relative path, by ruling out a double backslash with (?!\\):
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?((?!\\)[^\\\/:*?"<>|\r\n])+\\?)(?:[^\s][^\\\/:*?"<>|\r\n]+\\)*([^\s][^\\\/:*?"<>|\r\n])*$
Here is the updated regex: https://regex101.com/r/RMVkTC/3
Update (Version 2):
I rewrote the regex the way I would create it. It is not perfectly optimized (short), but this way it is easier to test and bugfix.
The RegExp consists of exactly 3 parts, piped together:
Drive + path + folder/file: (^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$)
relativepath + folder/file: (^(\.?\.?\\[^\\])*([^\\\/:*?"<>|\r\n]+\\?)*$)
Share + folder/file: (^\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\([^\s][^\\\/:*?"<>|\r\n]*\\?)*$)
This way, if you have to change something for one edge case, the change is more contained and easier to adapt.
Here is the updated regex: https://regex101.com/r/Qxj3Ni/1
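To illustrate how one of the three parts can be tested in isolation, here is a small Python sketch using only the "Drive + path + folder/file" part. The names DRIVE_PATH_RE and is_valid_drive_path are hypothetical, and the case-insensitive flag is an assumption (the pattern uses [a-z] while the sample paths start with C:).

```python
import re

# "Drive + path + folder/file" part from Version 2, copied as-is.
# re.IGNORECASE is an assumption so that an upper-case drive letter matches.
DRIVE_PATH_RE = re.compile(
    r'^[a-z]:\\([^\s][^\\\/:*?"<>|\r\n]+\\)*[^\s][^\\\/:*?"<>|\r\n]+$',
    re.IGNORECASE,
)

def is_valid_drive_path(path):
    """True if the path matches the drive-letter part of the expression."""
    return DRIVE_PATH_RE.match(path) is not None

print(is_valid_drive_path(r"C:\Folder\file.txt"))
print(is_valid_drive_path(r"C:\ Leading Space\ Shouldnt Match"))
print(is_valid_drive_path(r"C:\LeadingSpace\ ShouldntMatch"))
```

Because each segment must start with a non-whitespace character, both paths with a leading space in a directory name are rejected. The relative-path and share parts could be checked the same way before piping all three together.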

RegEx to match and select specific URLs

I'm on a website with these URLs:
https://flyheight.com/videos/ybb347
https://flyheight.com/videos/yb24os
https://flyheight.com/public/images/videos/793f77362f321e62c32659c3ab00952d.png
https://flyheight.com/videos/5o6t98/#disqus_thread
I need a RegEx that will only select these URLs instead:
https://flyheight.com/videos/yb24os
https://flyheight.com/videos/ybb347
This is what I got so far: ^(?!images$).*(flyheight.com/videos/).*
PCRE: ^https?:\/\/flyheight\.com\/videos\/[a-z0-9]{6}$
https://regex101.com/r/vM31MK/1
Maybe this will also work in your language:
^https?://flyheight\.com/videos/[a-z0-9]{6}$
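A minimal Python sketch of filtering the question's URLs with this expression; the names VIDEO_URL_RE, urls and video_urls are made up for illustration.

```python
import re

# Anchored pattern: scheme, host, /videos/, then exactly six [a-z0-9] characters.
VIDEO_URL_RE = re.compile(r"^https?://flyheight\.com/videos/[a-z0-9]{6}$")

urls = [
    "https://flyheight.com/videos/ybb347",
    "https://flyheight.com/videos/yb24os",
    "https://flyheight.com/public/images/videos/793f77362f321e62c32659c3ab00952d.png",
    "https://flyheight.com/videos/5o6t98/#disqus_thread",
]

# Keep only the URLs that match the whole pattern.
video_urls = [url for url in urls if VIDEO_URL_RE.match(url)]
print(video_urls)
```

The image URL fails because /videos/ does not directly follow the host, and the comment URL fails because of the trailing /#disqus_thread after the six characters.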
I'm not too sure if this is what you were looking for, but you could use the following:
^(?!images$).*(flyheight.com/videos/)([^/]+)$
The idea is that it matches the first part that you had, then matches one or more characters that are not a slash ([^/]+).
If you had strings that may or may not contain the / on the end (for example, you had https://flyheight.com/videos/yb24os or https://flyheight.com/videos/yb24os/), you can try the following:
^(?!images$).*(flyheight.com/videos/)([^/]+)/?$
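Here is a short Python sketch of the optional-trailing-slash variant; the helper name video_id is hypothetical. As far as I can tell, the leading (?!images$) only rejects a string that is exactly "images", so it has no effect on these URLs.

```python
import re

# Variant with an optional trailing slash, copied from the answer.
VIDEO_RE = re.compile(r"^(?!images$).*(flyheight.com/videos/)([^/]+)/?$")

def video_id(url):
    """Return the path segment after /videos/, or None if the URL is rejected."""
    match = VIDEO_RE.match(url)
    return match.group(2) if match else None

print(video_id("https://flyheight.com/videos/yb24os"))
print(video_id("https://flyheight.com/videos/yb24os/"))
print(video_id("https://flyheight.com/videos/5o6t98/#disqus_thread"))
```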
Here are my results on regexr.
This simple expression might do that, since all your desired outputs start with a y:
\/(y.*)
However, if you wish to add additional boundaries to it, you can do so. For instance, this would strengthen the left boundary:
flyheight.com\/videos\/(y.*)
Or you could add a character class, similar to this:
flyheight.com\/videos\/([a-z0-9]+)
You can also add a quantifier to the desired output, similar to this expression:
flyheight.com\/videos\/([a-z0-9]{6})
You can keep adding whatever boundaries you wish, so that the expression captures your desired URLs and fails on the others.
You might want to test and adjust your expression for your target engine, for example:
^(.*)(flyheight.com\/videos\/)([a-z0-9]{6})$
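To show why the extra boundaries matter, this Python sketch compares the unanchored expression with the anchored one on two of the question's URLs. The names UNANCHORED and ANCHORED are made up for illustration.

```python
import re

# Same capture with and without start/end anchors.
UNANCHORED = re.compile(r"flyheight.com\/videos\/([a-z0-9]{6})")
ANCHORED = re.compile(r"^(.*)(flyheight.com\/videos\/)([a-z0-9]{6})$")

comment_url = "https://flyheight.com/videos/5o6t98/#disqus_thread"
video_url = "https://flyheight.com/videos/ybb347"

# Without anchors, the unwanted comment URL still yields a capture.
print(UNANCHORED.search(comment_url).group(1))
# With anchors, the comment URL is rejected and the video URL is kept.
print(ANCHORED.search(comment_url))
print(ANCHORED.search(video_url).group(3))
```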

How to capture a group from an absolute file path without any slash in it using JavaScript

Here is a sample file path,
/Users/X/Q/Q-doc/src/templates/demos.js
The part I would love to capture is demos.
Here is another example,
/Users/X/Q/Q-doc/src/templates/demos1.js
The target I want is demos1.
I tried to use /\/(.*).js/ to capture the filename, but it seems to also capture everything in between.
([^\/]*?)\.js$
This will grab everything that is not a forward slash, so long as it's followed by .js, from the end of the string.
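A quick check of this pattern, sketched in Python rather than JavaScript (the syntax used here is the same in both engines as far as I know). The names BASENAME_RE and js_basename are hypothetical.

```python
import re

# Lazily capture non-slash characters that sit right before a final ".js".
BASENAME_RE = re.compile(r"([^\/]*?)\.js$")

def js_basename(path):
    """Return the file name without the .js suffix, or None if there is no match."""
    match = BASENAME_RE.search(path)
    return match.group(1) if match else None

print(js_basename("/Users/X/Q/Q-doc/src/templates/demos.js"))
print(js_basename("/Users/X/Q/Q-doc/src/templates/demos1.js"))
```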
Your pattern is doing what it should; however, your approach needs a fix. You can use this instead:
(\w+)\.js
Update: in case you need to match samples like the ones Kyle Fairns mentioned in his comment, you can use
.*\/(.*?)\.js
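The two patterns in this answer differ once the file name contains characters outside \w. The Python sketch below uses a hyphenated file name as one such case; that path is a made-up example and not necessarily the sample from the comment. WORD_RE and AFTER_LAST_SLASH_RE are hypothetical names.

```python
import re

WORD_RE = re.compile(r"(\w+)\.js")                 # first suggestion
AFTER_LAST_SLASH_RE = re.compile(r".*\/(.*?)\.js")  # updated suggestion

plain = "/Users/X/Q/Q-doc/src/templates/demos1.js"
hyphenated = "/Users/X/Q/Q-doc/src/templates/my-demo.js"  # hypothetical path

# Both agree on the plain name.
print(WORD_RE.search(plain).group(1), AFTER_LAST_SLASH_RE.search(plain).group(1))
# \w+ stops at the hyphen, while the updated pattern keeps the whole name.
print(WORD_RE.search(hyphenated).group(1), AFTER_LAST_SLASH_RE.search(hyphenated).group(1))
```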

regular expression in excel for numbers before a slash

In the example below, I need to replace everything up to and including the final slash with jreviews/,
so the first line would become
jreviews/159256_0907131531001639107_std.jpg
I am using the OpenOffice find and replace tool. I see there is an option for regex, but I don't know how to do this. How can I find the img.agoda URLs and every number-and-slash segment, and replace all of that with jreviews/,
while keeping the numbers after the final slash, because these are the filename?
http://img.agoda.net/hotelimages/159/159256/159256_0907131531001639107_std.jpg
http://img.agoda.net/hotelimages/161/161941/161941_1001051215002307125_std.jpg
http://img.agoda.net/hotelimages/288/288595/288595_111017161615_std.jpg
http://img.agoda.net/hotelimages/289/289890/289890_13081511070014319856_std.jpg
http://img.agoda.net/hotelimages/305/305075/305075_120427175058_std.jpg
http://img.agoda.net/hotelimages/305/305078/305078_120427175537_std.jpg
Regex seems like overkill, at least for your examples. Since they all have the same number of subfolders, a simple Find and Replace with wildcards works for me. Here's how I did it in Excel:
Just replace http://*/*/*/*/ with jreviews/.
Try this:
Replace whatever the pattern below matches with "CustomName/":
^.+[/$]
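The replacement can be tried outside the spreadsheet as well. The Python sketch below is only a stand-in for the OpenOffice find and replace dialog, whose regex dialect may differ; PREFIX_RE and to_jreviews are hypothetical names.

```python
import re

# Greedy ".+" runs to the last slash, so the whole prefix is replaced.
PREFIX_RE = re.compile(r"^.+[/$]")

def to_jreviews(url):
    """Replace everything up to and including the final slash with 'jreviews/'."""
    return PREFIX_RE.sub("jreviews/", url)

print(to_jreviews("http://img.agoda.net/hotelimages/159/159256/159256_0907131531001639107_std.jpg"))
```

Inside a character class, $ is a literal dollar sign in Python, so [/$] behaves like a plain / for these URLs.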

Regex: Get Filename Without Extension in One Shot?

I want to get just the filename using regex, so I've been trying simple things like
([^\.]*)
which of course works only if the filename has one extension. But if it is adfadsfads.blah.txt I just want adfadsfads.blah. How can I do this with regex?
In regards to David's question, 'why would you use regex' for this, the answer is, 'for fun.' In fact, the code I'm using is simple
length_of_ext = File.extname(filename).length
filename = filename[0,(filename.length-length_of_ext)]
but I like to learn regex whenever possible because it always comes up at Geek cocktail parties.
Try this:
(.+?)(\.[^.]*$|$)
This will:
Capture filenames that start with a dot (e.g. .logs is a file named .logs, not a file extension), which is common in Unix.
Get everything before the last dot: foo.bar.jpeg gets you foo.bar.
Handle files with no dot: secret-letter gets you secret-letter.
Note: as commenter j_random_hacker suggested, this performs as advertised, but you might want to precede things with an anchor for readability purposes.
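The three cases listed above can be checked with a few lines of Python. This is a sketch only, and the names STEM_RE and stem are made up for illustration.

```python
import re

# Lazy name, followed by either a final ".ext" or the end of the string.
STEM_RE = re.compile(r"(.+?)(\.[^.]*$|$)")

def stem(filename):
    """Return the file name without its last extension."""
    match = STEM_RE.match(filename)
    return match.group(1) if match else None

for name in (".logs", "foo.bar.jpeg", "secret-letter", "adfadsfads.blah.txt"):
    print(name, "->", stem(name))
```

re.match already anchors at the start of the string in Python, which plays the role of the leading anchor mentioned in the note.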
Everything followed by a dot followed by one or more characters that's not a dot, followed by the end-of-string:
(.+?)\.[^\.]+$
The everything-before-the-last-dot is grouped for easy retrieval.
If you aren't 100% sure every file will have an extension, try:
(.+?)(\.[^\.]+$|$)
How about 2 captures, one for the end and one for the filename?
e.g.
(.+?)(?:\.[^\.]*$|$)
^(.*)\\(.*)(\..*)$
Gets the path without the last \
The file without the extension
The extension with a .
Examples:
c:\1\2\3\Books.accdb
(c:\1\2\3)(Books)(.accdb)
Does not support multiple . in file name
Does support . in file path
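The example above can be reproduced in Python as follows; this is a sketch, and the names PARTS_RE and split_windows_path are hypothetical.

```python
import re

# Three groups: path without the last backslash, file name, extension with its dot.
PARTS_RE = re.compile(r"^(.*)\\(.*)(\..*)$")

def split_windows_path(path):
    """Return (folder, name, dotted_extension), or None if there is no match."""
    match = PARTS_RE.match(path)
    return match.groups() if match else None

print(split_windows_path(r"c:\1\2\3\Books.accdb"))
```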
I realize this question is a bit outdated; however, I had some trouble finding a good source and wound up making the regex myself. To save time for whoever finds this:
If you're looking for a ~standalone~ regex
This will match the extension without the dot
\w+(?![\.\w])
This will always match the file name if it has an extension
[\w\. ]+(?=[\.])
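Here is a small Python sketch of both standalone patterns on a bare file name. EXT_RE and NAME_RE are hypothetical names, and only simple names without a directory part are shown, since other inputs were not checked here.

```python
import re

EXT_RE = re.compile(r"\w+(?![\.\w])")    # extension without the dot
NAME_RE = re.compile(r"[\w\. ]+(?=[\.])")  # file name up to the last dot

filename = "file.name.txt"
print(EXT_RE.search(filename).group(0))
print(NAME_RE.search(filename).group(0))
```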
Ok, I am not sure why I would use a regular expression for this. If I know, for example, that the string is a full file path, then I would use another API to get the file name. Regular expressions are very powerful but at the same time quite complex (you have just proved that by asking how to create such a simple regex). Somebody said: you had a problem and decided to solve it using regular expressions. Now you have two problems.
Think again. If you are on .NET platform for example, then take a look at System.IO.Path class.
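For readers not on .NET, a rough analogue of the same idea in Python is the standard pathlib module. This is my own substitution for illustration, not the API the answer names, and the helper names stem_posix and stem_windows are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

def stem_posix(path):
    """File name without its last suffix, using forward-slash path rules."""
    return PurePosixPath(path).stem

def stem_windows(path):
    """File name without its last suffix, using Windows path rules."""
    return PureWindowsPath(path).stem

print(stem_posix("adfadsfads.blah.txt"))
print(stem_posix(".logs"))
print(stem_windows(r"C:\Log\test\bin\fee105d1-5008-410c-be39-883e5e40a33d.pdf"))
```

The pure path classes only parse the string and never touch the file system, so they work on paths that do not exist.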
I used this pattern for simple search:
^\s*[^\.\W]+$
for this text:
file.ext
fileext
file.ext.ext
file.ext
fileext
It finds fileext in the second and last lines.
I applied it in a text tree view of a folder (with spaces as indents).
Just the name of the file, without path and suffix.
^.*[\\|\/](.+?)\.[^\.]+$
Try
(?<=[\\\w\d-:]*\\)([\w\d-:]*)(?=\.[\.\w\d-:]*)
Captures just the filename, of any kind, within an entire file path. It purposefully excludes the file path and the file extension.
E.g.:
C:\Log\test\bin\fee105d1-5008-410c-be39-883e5e40a33d.pdf
Doesn't capture (C:\Log\test\bin)
Captures (fee105d1-5008-410c-be39-883e5e40a33d)
Doesn't capture (.pdf)
This RegExp works for me:
(.+(?=\..+$))|(.+[^\.])
Results (bold means match):
test.txt
test 234!.something123
.test
.test.txt
test.test2.txt
.