Regular Expression to select everything before and up to a particular text

I am trying to use a regular expression to find a part of a string and select everything up to that string. So for example if my string is this/is/just.some/test.txt/some/other, I want to search for .txt and when I find it select everything before and up to .txt.

After executing the below regex, your answer is in the first capture.
/^(.*?)\.txt/
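For anyone who wants to try this outside a regex tester, here is a minimal Python sketch of the lazy capture. The helper name before_marker is made up for illustration; the input string is the one from the question.

```python
import re

def before_marker(text, marker=".txt"):
    """Return everything before the first occurrence of marker, or None."""
    # re.escape keeps the dot in ".txt" literal; .*? is lazy, so it stops
    # at the first occurrence of the marker rather than the last.
    match = re.match(r"^(.*?)" + re.escape(marker), text)
    return match.group(1) if match else None

print(before_marker("this/is/just.some/test.txt/some/other"))
# prints: this/is/just.some/test
```

The result is in group 1, which is what "the first capture" refers to.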

This matches everything up to ".txt" (without including it):
^.*(?=(\.txt))
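One thing worth checking with the lookahead version is which ".txt" it stops at when the string contains more than one. A small Python sketch, using a made-up input with two occurrences:

```python
import re

path = "a.txt/b.txt"  # hypothetical input with two ".txt" occurrences

# Greedy: .* runs to the end and backtracks to the LAST ".txt".
greedy = re.match(r"^.*(?=\.txt)", path).group(0)
# Lazy: .*? stops in front of the FIRST ".txt".
lazy = re.match(r"^.*?(?=\.txt)", path).group(0)

print(greedy)  # a.txt/b
print(lazy)    # a
```

Because the lookahead consumes nothing, the whole match (group 0) already excludes ".txt"; no capture group is needed.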

You could just use:
(.*?)\.txt

^(.*)text
This worked for me, but I was actually trying to get everything after the string too. The first part of the expression should answer your question.
^(.*)text([\s\S]*)$
where ^(.*) captures everything before the text and ([\s\S]*)$ captures everything after it; the text itself ends up in neither group. Tested it at regexr.com/6cpqg.
References:
above answers among other online sources
https://regexland.com/regex-match-after-character/
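A rough Python sketch of the before/after split from this answer. The function name split_around and the sample string are invented for illustration; the pattern is the one shown above with the keyword substituted in.

```python
import re

def split_around(text, keyword):
    """Return (before, after) for keyword, or None if it is not on the first line."""
    # (.*) cannot cross a newline, so the keyword must sit on the first line;
    # ([\s\S]*) matches any character, newlines included, up to the end.
    pattern = r"^(.*)" + re.escape(keyword) + r"([\s\S]*)$"
    match = re.match(pattern, text)
    return match.groups() if match else None

print(split_around("before text after\nmore lines", "text"))
# prints: ('before ', ' after\nmore lines')
```

Neither group contains the keyword itself, and because (.*) is greedy the split happens at the last occurrence on the first line.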

To match up to and including .txt, move it inside the capture group like so:
^(.*?\.txt)
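A quick Python check of the "up to and including" variant. It also shows where a doubled backslash would come from: in an ordinary (non-raw) string literal the backslash has to be escaped, while a raw string takes the regex as written.

```python
import re

s = "this/is/just.some/test.txt/some/other"

# Raw string: a single backslash escapes the dot.
raw_pattern = r"^(.*?\.txt)"
# Ordinary string literal: the backslash itself has to be doubled.
plain_pattern = "^(.*?\\.txt)"

assert raw_pattern == plain_pattern  # both spell the same regex

result = re.match(raw_pattern, s).group(1)
print(result)  # this/is/just.some/test.txt
```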

((\n.*){0,3})(.*)\W*\.txt
This selects everything before ".txt", including up to 3 lines of context when the content spans several lines.

Related

regex how to find filename which doesn't contain any numbers?

I tried [^0-9].* and [^\d].*
but neither of them works.
I only want to get filenames that don't contain any numbers; in the above case,
I need to get BUILDING.txt.
I also tried with
But it only matches the characters instead of the whole filename.
Here is the online tool: https://www.regextester.com/
Tried with remove .
My guess is that you might be trying to design an expression similar to:
^(?!.*\d).*$
The expression is explained on the top right panel of this demo, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.
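As a sanity check, here is a small Python sketch that filters a list with the negative-lookahead pattern. Only BUILDING.txt comes from the question; the other names are made up.

```python
import re

# (?!.*\d) fails the match as soon as a digit appears anywhere in the name.
NO_DIGITS = re.compile(r"^(?!.*\d).*$")

# Hypothetical file list for illustration.
names = ["BUILDING.txt", "ROOM101.txt", "2020-report.log", "notes.md"]
digit_free = [n for n in names if NO_DIGITS.match(n)]

print(digit_free)  # ['BUILDING.txt', 'notes.md']
```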
try:
/^(\D*)$/gm
Hope it helps
François
You can simply use
^[^\d]+$
The following regular expression should do the job, since you asked to capture only the filename of a file not containing numbers (that is, only the name and not the extension).
^([a-zA-Z]+)(?=\.[a-zA-Z])
You can test the above regular expression here:
https://rubular.com/r/NMcMicEmLUKNTB
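A minimal Python sketch of the same pattern, for anyone not using Rubular. The helper name letters_only_stem and the second and third sample names are assumptions for illustration.

```python
import re

# Letters only, and the lookahead demands a dot plus a letter right after them.
STEM = re.compile(r"^([a-zA-Z]+)(?=\.[a-zA-Z])")

def letters_only_stem(name):
    """Return the name without its extension if the name is letters only, else None."""
    m = STEM.match(name)
    return m.group(1) if m else None

print(letters_only_stem("BUILDING.txt"))  # BUILDING
print(letters_only_stem("ROOM101.txt"))   # None
```

Note that this version is stricter than "no digits": a name with an underscore or a space before the dot is rejected as well.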

How to use a regular expression in notepad++ to change a url

I need some help with our migrated site's URLs. We moved our site from Joomla to WordPress, and in our posts we have over 20K internal links.
These links are structured like this:
www.mysite.nl/current-post-title/index.php?option=com_content&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48
What we need is this:
www.mysite.nl/related-post-title
So basically we need to remove everything after www.mysite.nl/ up to and including the colon :, i.e. remove this: current-post-title/index.php?option=com_content&view=article&id=5259:
And then remove everything from the ampersand that follows (including the ampersand itself) to the end of the string, i.e. remove &catid=35:universum&Itemid=48
Of course only url strings containing this index.php?option=com_content must be changed.
I have dumped the table in plain text and opened it in Notepad++ to do a search and replace with regular expression because the content that must be removed from these lines is different every time.
Can someone please help me with the right regular expression?
In the "Find what" box, enter:
(www.mysite.nl)\/.*index.php\?option=com[^:]+:([^&]+)&.*
In the "Replace with" box, enter:
\1/\2
Result
www.mysite.nl/related-post-title
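If you would rather test the replacement on a few lines before running it over the whole dump, the same find/replace pair can be tried in Python, which happens to use the same \1/\2 backreference syntax. This is only a sketch; the function name rewrite is made up, and the pattern is copied from the answer as-is.

```python
import re

PATTERN = re.compile(r"(www.mysite.nl)\/.*index.php\?option=com[^:]+:([^&]+)&.*")

def rewrite(url):
    """Collapse an old Joomla-style link to www.mysite.nl/<slug>."""
    # Group 1 is the host, group 2 is the slug between the colon and the next "&".
    return PATTERN.sub(r"\1/\2", url)

old = ("www.mysite.nl/current-post-title/index.php?option=com_content"
       "&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48")
print(rewrite(old))  # www.mysite.nl/related-post-title
```

Links without index.php?option=com are left untouched. Note that the trailing .* swallows the rest of the line, so this sketch assumes nothing you want to keep follows the URL on the same line.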
Go inside-out rather than outside-in: replace \/.+&id=\d+\:(.+?)&.+ with /$1. Also, paste a few URLs into http://www.regexr.com/ and play around, although JavaScript and Notepad++ may differ in which regex features they implement, e.g. negative lookbehinds.

regular expression in excel for numbers before a slash

In the example below, I need to change everything before the final slash to jreviews/
so in the example below the first line would become
jreviews/159256_0907131531001639107_std.jpg
I am using the OpenOffice find and replace tool. I see there is an option for regex, but I don't know how to do this. How can I find the img.agoda URLs and everything that's a number and slash, and replace that with jreviews/,
while keeping the numbers after the final slash, because these are the filename?
http://img.agoda.net/hotelimages/159/159256/159256_0907131531001639107_std.jpg
http://img.agoda.net/hotelimages/161/161941/161941_1001051215002307125_std.jpg
http://img.agoda.net/hotelimages/288/288595/288595_111017161615_std.jpg
http://img.agoda.net/hotelimages/289/289890/289890_13081511070014319856_std.jpg
http://img.agoda.net/hotelimages/305/305075/305075_120427175058_std.jpg
http://img.agoda.net/hotelimages/305/305078/305078_120427175537_std.jpg
Regex seems like overkill, at least for your examples. Since they all have the same number of subfolders, a simple Find and Replace with wildcards works for me. Here's how I did it in Excel:
Just replace http://*/*/*/*/ with jreviews/.
Try this:
Replace the below match with "CustomName/"
^.+/
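The idea behind this answer is a greedy match that runs up to the final slash. A short Python sketch of that idea, written with the pattern ^.+/ and the replacement jreviews/ from the question, on two of the sample URLs:

```python
import re

urls = [
    "http://img.agoda.net/hotelimages/159/159256/159256_0907131531001639107_std.jpg",
    "http://img.agoda.net/hotelimages/288/288595/288595_111017161615_std.jpg",
]

# .+ is greedy, so the match extends to the LAST slash on the line.
renamed = [re.sub(r"^.+/", "jreviews/", u) for u in urls]

for r in renamed:
    print(r)
# jreviews/159256_0907131531001639107_std.jpg
# jreviews/288595_111017161615_std.jpg
```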

Searching my code with regex

It happens all the time, I would need to scan my code for places where I have two or more of the same keywords.
For example $json["VALID"]
So, I would need to find json, and VALID.
Some places in the code may contain:
// a = $json['VALID']; // (note the apostrophes)
(I am using EditPlus which is a great text editor, letting me use regex in my searches)
What would be the string in the regex to find json and VALID (in this example) ?
Thanks in advance!
Use this regex:
\$json\[["']VALID['"]\]
This would find $json, then any 2 characters, then VALID:
\$json.{2}VALID
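To see what the stricter pattern accepts and rejects, here is a small Python sketch. The sample source lines are invented, apart from the commented-out line quoted in the question.

```python
import re

# Accepts either quote style around VALID; a raw triple-quoted string
# lets both quote characters appear in the pattern unescaped.
PATTERN = re.compile(r"""\$json\[["']VALID['"]\]""")

source_lines = [
    '$x = $json["VALID"];',
    "// a = $json['VALID'];",
    "$y = $json[VALID];",      # no quotes: not matched
    '$z = $other["VALID"];',   # different variable: not matched
]
hits = [line for line in source_lines if PATTERN.search(line)]

for line in hits:
    print(line)
```

The looser \$json.{2}VALID form would find the same two lines here, but it also accepts any other two characters between $json and VALID.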

Regex: Get Filename Without Extension in One Shot?

I want to get just the filename using regex, so I've been trying simple things like
([^\.]*)
which of course works only if the filename has one extension. But if it is adfadsfads.blah.txt I just want adfadsfads.blah. How can I do this with regex?
In regards to David's question, 'why would you use regex' for this, the answer is, 'for fun.' In fact, the code I'm using is simple
length_of_ext = File.extname(filename).length
filename = filename[0,(filename.length-length_of_ext)]
but I like to learn regex whenever possible because it always comes up at Geek cocktail parties.
Try this:
(.+?)(\.[^.]*$|$)
This will:
Captures filenames that start with a dot (e.g. .logs is a file named .logs, not a file extension), which is common in Unix.
Gets everything but the last dot: foo.bar.jpeg gets you foo.bar.
Handles files with no dot: secret-letter gets you secret-letter.
Note: as commenter j_random_hacker suggested, this performs as advertised, but you might want to precede things with an anchor for readability purposes.
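The three cases listed above can be checked with a short Python sketch. The helper name strip_extension is made up; the pattern is the one from this answer with the leading anchor the comment suggests.

```python
import re

# Lazy (.+?) grows until what follows is either ".ext" at the end or the end itself.
STEM = re.compile(r"^(.+?)(\.[^.]*$|$)")

def strip_extension(name):
    """Return name without its last extension; names with no extension come back unchanged."""
    m = STEM.match(name)
    return m.group(1) if m else name

for name in [".logs", "foo.bar.jpeg", "secret-letter", "adfadsfads.blah.txt"]:
    print(name, "->", strip_extension(name))
# .logs -> .logs
# foo.bar.jpeg -> foo.bar
# secret-letter -> secret-letter
# adfadsfads.blah.txt -> adfadsfads.blah
```

The leading dot in .logs survives because (.+?) has to consume at least one character before the extension alternative is tried.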
Everything followed by a dot followed by one or more characters that's not a dot, followed by the end-of-string:
(.+?)\.[^\.]+$
The everything-before-the-last-dot is grouped for easy retrieval.
If you aren't 100% sure every file will have an extension, try:
(.+?)(\.[^\.]+$|$)
How about two groups, one capturing the filename and one (non-capturing) for the extension?
E.g.
(.+?)(?:\.[^\.]*$|$)
^(.*)\\(.*)(\..*)$
Gets the path without the last \
The file name without the extension
The extension with a .
Examples:
c:\1\2\3\Books.accdb
(c:\1\2\3)(Books)(.accdb)
Does not support multiple . in file name
Does support . in file path
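The example above can be reproduced in Python. This is just a sketch using the path from the answer; a raw string keeps the backslashes in the Windows path literal.

```python
import re

# Three groups: directory, file name, extension (with its dot).
PARTS = re.compile(r"^(.*)\\(.*)(\..*)$")

path = r"c:\1\2\3\Books.accdb"
directory, name, ext = PARTS.match(path).groups()

print(directory)  # c:\1\2\3
print(name)       # Books
print(ext)        # .accdb
```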
I realize this question is a bit outdated; however, I had some trouble finding a good source and wound up making the regex myself. To save time for whoever finds this:
If you're looking for a ~standalone~ regex
This will match the extension without the dot
\w+(?![\.\w])
This will always match the file name if it has an extension
[\w\. ]+(?=[\.])
OK, I am not sure why I would use a regular expression for this. If I know, for example, that the string is a full file path, then I would use another API to get the file name. Regular expressions are very powerful but at the same time quite complex (you have just proved that by asking how to create such a simple regex). As the saying goes: you had a problem and decided to solve it using regular expressions; now you have two problems.
Think again. If you are on the .NET platform, for example, then take a look at the System.IO.Path class.
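The same "use a path API instead" advice applies in other languages. As an illustration only (this swaps .NET's System.IO.Path for Python's standard library, which the answer does not mention), os.path.splitext and pathlib do the job without a regex:

```python
import os.path
from pathlib import PurePosixPath

# splitext splits at the last dot and treats a leading dot as part of the name.
stem, ext = os.path.splitext("adfadsfads.blah.txt")
print(stem, ext)                  # adfadsfads.blah .txt
print(os.path.splitext(".logs"))  # ('.logs', '')

# .stem drops the directory part as well as the last suffix.
print(PurePosixPath("/tmp/foo.bar.jpeg").stem)  # foo.bar
```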
I used this pattern for simple search:
^\s*[^\.\W]+$
for this text:
file.ext
fileext
file.ext.ext
file.ext
fileext
It finds fileext in the second and last lines.
I applied it in a text tree view of a folder (with spaces as indents).
Just the name of the file, without path and suffix.
^.*[\\|\/](.+?)\.[^\.]+$
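A Python sketch of this one, with made-up paths and a made-up helper name bare_name. One small liberty: the character class is written [\\/] here, because inside brackets a | is a literal pipe rather than alternation, so it is not needed to accept both separators.

```python
import re

# Greedy ^.* runs to the last separator; lazy (.+?) then stops before the last ".ext".
NAME = re.compile(r"^.*[\\/](.+?)\.[^.]+$")

def bare_name(path):
    """Return the file name without directory or last extension, or None if no match."""
    m = NAME.match(path)
    return m.group(1) if m else None

print(bare_name(r"C:\reports\summary.final.pdf"))  # summary.final
print(bare_name("/var/log/app.log"))               # app
print(bare_name("README"))                         # None
```

As the last line shows, the pattern needs both a separator and an extension; a bare name does not match.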
Try
(?<=[\\\w\d-:]*\\)([\w\d-:]*)(?=\.[\.\w\d-:]*)
Captures just the filename of any kind within an entire filepath. Purposefully excludes the file path and the file extension
E.g.:
C:\Log\test\bin\fee105d1-5008-410c-be39-883e5e40a33d.pdf
Doesn't capture (C:\Log\test\bin)
Captures (fee105d1-5008-410c-be39-883e5e40a33d)
Doesn't capture (.pdf)
This RegExp works for me:
(.+(?=\..+$))|(.+[^\.])
Results (bold means match):
test.txt
test 234!.something123
.test
.test.txt
test.test2.txt
.