Regex: match 'customer' in the string 'styles/customer.1031.css'

I'm trying to extract the customer string out of a filepath in nodejs. So far I have come up with this:
var fileName = 'styles/customer.1031.css';
fileName = fileName.substring(7);
fileName = fileName.substring(0, fileName.length - 4);
fileName = fileName.match('[a-z]*')[0];
console.log(fileName); // <-- yields 'customer'
I'm cutting the styles/ from the beginning and the .css from the end, then matching only the lowercase characters. What would be a proper regex to match only the customer string, so that I don't need to cut the string first? For example, what would a regex look like that catches everything after styles/ up to the first .?

The regex to use could look like ^styles/([^.]+)\..*$ where
^styles/ translates to "starts with 'styles/'"
Then your match (at least one character, matching until first '.')
Then a literal '.'
Then anything until the end of the string (this is optional, depending on your needs)
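A minimal JavaScript sketch of the pattern described above. The helper name extractName is hypothetical, and the optional trailing .*$ is left out because it does not change what the group captures.

```javascript
// Hypothetical helper: returns the text between 'styles/' and the first '.',
// or null when the path does not have that shape.
function extractName(filePath) {
  const match = /^styles\/([^.]+)\./.exec(filePath);
  return match ? match[1] : null;
}

console.log(extractName('styles/customer.1031.css')); // customer
console.log(extractName('scripts/customer.js'));      // null
```

Returning null instead of indexing the result directly avoids a TypeError when the path does not match.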

How would the regex look like to catch everything after styles/ until
the .?
This is what it would look like:
styles\/(.*?)\.
The captured string is the first group: index 1 of the match result in JavaScript, or \1 when used as a backreference inside the pattern.

You can also put the prefix in a non-capturing group: (?:styles\/)(.*?)\.
var fileName = 'styles/customer.1031.css';
console.log(/(?:styles\/)(.*?)\./.exec(fileName)[1])

Related

Regexp to remove the last character in a URL string only if it equals /

The goal is here to basically get a match in a URL string to give us the full URL up till the last parent folder. So for example, if we had:
https://www.example.com/folder1/folder2/folder3/folder4, it would give us https://www.example.com/folder1/folder2/folder3 .
The same result should happen if we had a / at the end like this:
https://www.example.com/folder1/folder2/folder3/folder4/
I think this regexp (/*.*/|$) gives us the URL up to the last /, so it does not work if the URL ends with a /.
If we had: https://www.example.com/folder1/folder2/folder3/folder4/index.php .
It would return https://www.example.com/folder1/folder2/folder3/folder4 .
So basically, up till the last parent folder of another folder or file. Note that the solution should work on any URL and not the specific one in this question.
This one might solve it. It matches any string that ends with a "/" followed by a run of characters containing no "/":
^(.+\/)[^\/]+$
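As written, that pattern needs at least one non-slash character at the very end, so it does not match a URL that ends in "/". The sketch below is a variant, not the answer's own pattern: it makes a trailing slash optional and keeps the slash before the last segment out of the group. The function name parentUrl is hypothetical, and URLs without any path (such as https://www.example.com) are not handled.

```javascript
// Variant of ^(.+\/)[^\/]+$ (assumptions of this sketch): the slash before the
// last segment is outside the group, and a trailing slash is optional.
function parentUrl(url) {
  const match = /^(.+)\/[^\/]+\/?$/.exec(url);
  return match ? match[1] : null;
}

const base = 'https://www.example.com/folder1/folder2/folder3';
console.log(parentUrl(base + '/folder4'));           // .../folder3
console.log(parentUrl(base + '/folder4/'));          // .../folder3
console.log(parentUrl(base + '/folder4/index.php')); // .../folder3/folder4
```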
One could use
/.*[^\/]/
to extract the string
"https://www.example.com/folder1/folder2/folder3/folder4"
from
"https://www.example.com/folder1/folder2/folder3/folder4/"
By contrast, this regex extracts the entire string from the strings
"https://www.example.com/folder2/folder3/folder4/index.php"
and
"https://www.example.com/folder1/folder2/folder3/folder4"
Note that because quantifiers are greedy by default, there is no need for anchors. In this example .* grabs as much as it can and backtracks only until [^\/] can match, so the match runs from the start of the string to the last character that is not a forward slash.
Alternatively, one could use a method that converts the last character of the string to an empty string if it is "/", returning a new string, leaving the original string unchanged. The regex
/\/\z/
could be used to identify that character, if present. In Ruby one could write:
str = "https://www.example.com/folder1/folder2/folder3/folder4/"
str.sub(/\/\z/, '')
#=> "https://www.example.com/folder1/folder2/folder3/folder4"
We can confirm the original string was unaffected:
str
#=> "https://www.example.com/folder1/folder2/folder3/folder4/"
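For readers working in JavaScript, a rough equivalent of the Ruby snippet might look like the sketch below. JavaScript has no \z anchor, so $ (without the m flag) is used instead. The function name stripTrailingSlash is hypothetical.

```javascript
// Without the m flag, $ matches only at the very end of the input,
// so it plays the role of Ruby's \z here.
function stripTrailingSlash(str) {
  return str.replace(/\/$/, '');
}

const str = 'https://www.example.com/folder1/folder2/folder3/folder4/';
console.log(stripTrailingSlash(str)); // same URL without the final slash
console.log(str);                     // original string is unchanged
```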

Regex to extract second word from URL

I want to extract a second word from my url.
Examples:
/search/acid/all - extract acid
/filter/ion/all/sss - extract ion
I tried a few approaches, such as
/.*/(.*?)/
but no luck.
A couple things:
The forward slashes / have to be escaped like this \/
The (.*?) will match the least amount of any character, including zero characters. In this case it will always match with an empty string.
The .* will take as many characters as it can, including forward slashes
A simple solution will be:
/.+?\/(.*?)\//
Update:
Since you are using JavaScript, try the following code:
var url = "/search/acid/all";
var regex = /.+?\/(.*?)\//g;
var match = regex.exec(url);
console.log(match[1]);
The variable match is an array. Its first element is the full match (everything that was matched); you can ignore that, since you are interested in the specific group we wanted to capture (the part in parentheses in the regex), which is the second element.
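To make the shape of the result concrete, here is a small sketch that prints both elements. It drops the g flag, which is a choice of this sketch: a global regex keeps its position in lastIndex between exec() calls, which is easy to trip over when the same regex object is reused.

```javascript
const url = '/search/acid/all';
// No g flag, so each exec() call starts from the beginning of the string.
const match = /.+?\/(.*?)\//.exec(url);

console.log(match[0]); // full match: /search/acid/
console.log(match[1]); // first group: acid
```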
This regex will do the trick:
(?:[^\/]*.)\/([^\/]*)\/
I had difficulties with the above answers for URLs without a trailing forward slash:
/search/acid/all/ /* works */
/search/acid /* doesn't work */
To extract the second word from both urls, what worked for me is
var url = "/search/acid";
var regex = /(?:[^\/]*.)\/([^\/]*)/g;
var match = regex.exec(url);
console.log(match[1]);
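Another way to cover both cases, sketched here as an alternative rather than as any answer's method, is to anchor the pattern at the start of the path and exclude slashes from each segment. The function name secondSegment is hypothetical.

```javascript
// ^\/[^\/]+ consumes the first segment, then the group captures the second.
// Nothing is required after the second segment, so a trailing slash is optional.
function secondSegment(path) {
  const match = /^\/[^\/]+\/([^\/]+)/.exec(path);
  return match ? match[1] : null;
}

console.log(secondSegment('/search/acid/all'));    // acid
console.log(secondSegment('/search/acid'));        // acid
console.log(secondSegment('/filter/ion/all/sss')); // ion
console.log(secondSegment('/search'));             // null
```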

Regex to match alphanumerics, URL operators except forward slashes

I've been trying for the past couple of hours to get this regex right but unfortunately, I still can't get it. Tried searching through existing threads too but no dice. :(
I'd like a regex to match the following possible strings:
userprofile?id=123
profile
search?type=player&gender=male
someotherpage.htm
but not
userprofile/
helloworld/123
Basically, I'd like the regex to match alphanumerics, URL operators such as ?, = and & but not forward slashes. (i.e. As long as the string contains a forward slash, the regex should just return 0 matches.)
I've tried the following regexes but none seem to work:
([0-9a-z?=.]+)
(^[^\/]*$[0-9a-z?=.]+)
([0-9a-z?=.][^\/]+)
([0-9a-z?=.][\/$]+)
Any help will be greatly appreciated. Thank you so much!
The reason they all match is that your regexp matches part of the string and you've not told it that it needs to match the entire string. You need to make sure that it doesn't allow any other characters anywhere in the string, e.g.
^[0-9a-z&?=.]+$
Here's a small perl script to test it:
#!/usr/bin/perl
my @testlines = (
    "userprofile?id=123",
    "userprofile",
    "userprofile?type=player&gender=male",
    "userprofile.htm",
    "userprofile/",
    "userprofile/123",
);
foreach my $testline (@testlines) {
    if ($testline =~ /^[0-9a-z&?=.]+$/) {
        print "$testline matches\n";
    } else {
        print "$testline doesn't match - bad regexp, no cookie\n";
    }
}
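The same check can be sketched in JavaScript, using the sample strings from the question. The variable names are illustrative only.

```javascript
// Anchored at both ends, so every character must come from the allowed set.
const fullMatch = /^[0-9a-z&?=.]+$/;

const samples = [
  'userprofile?id=123',
  'profile',
  'search?type=player&gender=male',
  'someotherpage.htm',
  'userprofile/',
  'helloworld/123',
];

for (const s of samples) {
  console.log(`${s}: ${fullMatch.test(s) ? 'matches' : 'no match'}`);
}
```

The first four strings match and the two containing a forward slash do not.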
This should do the trick:
/^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i
To break this down:
^ // Anchor to start of string, so a string containing "/" cannot match by its tail alone
\w+ // Match [a-z0-9_] (1 or more), to specify resource
( // Alternation group (i.e., a OR b)
\.htm // Match ".htm"
| // OR
\? // Match "?"
\w+=\w* // Match first term of query string (e.g., something=foo)
(&\w+=\w*)* // Match remaining terms of query string (zero or more)
)
? // Make alternation group optional
$ // Anchor to end of string
The i flag is for case-insensitivity.
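The sketch below anchors the pattern at both ends with ^ and $, and compares it with a version anchored only at the end. The comparison is there because an end-only pattern can still match the tail of a string such as helloworld/123. The variable names are illustrative.

```javascript
const anchored = /^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i;
const endOnly = /\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i; // no ^, for comparison

const samples = [
  'userprofile?id=123',
  'profile',
  'someotherpage.htm',
  'userprofile/',
  'helloworld/123',
];

for (const s of samples) {
  console.log(s, '| anchored:', anchored.test(s), '| end only:', endOnly.test(s));
}
```

For helloworld/123 the end-only pattern reports a match (on the trailing 123), while the fully anchored one does not.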

Regex to match a URL and insert a directory

I would like to use regex to match the following:
http://www.test.com/example/sometext/
and then redirect to:
http://www.test.com/uk/example/sometext/
where 'example' is not in a list of reserved words, like _images, _lib, _css, etc.
Use a negative look ahead:
(http://www.test.com/)((?!(_images|_lib|_css))[^/]+/sometext/)
And replace with
$1uk/$2
Broken down, the juicy bits are:
(?!someregex) = a negative lookahead, i.e. assert that the following input does not match someregex
(_images|_lib|_css) = the syntax for regex OR logic, just using literals
[^/]+ = some characters that aren't a slash
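A minimal JavaScript sketch of the replacement. Three details are assumptions of this sketch rather than part of the answer: the dots in the host name are escaped, the inner alternation is non-capturing, and the lookahead requires a "/" right after the reserved word so that a folder such as _library is not treated as reserved. The function name addUkPrefix is hypothetical.

```javascript
// Inserts 'uk/' after the host unless the first folder is a reserved word.
function addUkPrefix(url) {
  return url.replace(
    /^(http:\/\/www\.test\.com\/)((?!(?:_images|_lib|_css)\/)[^\/]+\/sometext\/)/,
    '$1uk/$2'
  );
}

console.log(addUkPrefix('http://www.test.com/example/sometext/'));
// http://www.test.com/uk/example/sometext/
console.log(addUkPrefix('http://www.test.com/_lib/sometext/'));
// unchanged: http://www.test.com/_lib/sometext/
```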

Matching everything except a specified regex

I have a huge file, and I want to blow away everything in the file except for what matches my regex. I know I can get matches and just extract those, but I want to keep my file and get rid of everything else.
Here's my regex:
"Id":\d+
How do I say "Match everything except "Id":\d+". Something along the lines of
!("Id":\d+) (pseudo regex) ?
I want to use it with a Regex Replace function. In English, I want to say:
Get all text that isn't "Id":\d+ and replace it with an empty string.
Try this:
using System.IO;
using System.Text.RegularExpressions;

string path = @"c:\temp.txt"; // your file here
string pattern = @".*?(Id:\d+\s?).*?|.+";
Regex rx = new Regex(pattern);
var lines = File.ReadAllLines(path);
using (var writer = File.CreateText(path))
{
    foreach (string line in lines)
    {
        string result = rx.Replace(line, "$1");
        if (result == "")
            continue;
        writer.WriteLine(result);
    }
}
The pattern will preserve spaces between multiple Id:Number occurrences on the same line. If you only have one Id per line you can remove the \s? from the pattern. File.CreateText will open and overwrite your existing file. If a replacement results in an empty string it will be skipped over. Otherwise the result will be written to the file.
The first part of the pattern matches Id:Number occurrences. It includes an alternation for .+ to match lines where Id:Number does not appear. The replacement uses $1 to replace the match with the contents of the first group, which is the actual Id part: (Id:\d+\s?).
Well, the opposite of \d is \D in Perl-style regexes. Does .NET have something similar?
Sorry, but I totally don't get what your problem is. Shouldn't it be easy to grep the matches into a new file?
Yoo wrote:
Get all text that isn't "Id":\d+ and replace it with an empty string.
A logical equivalent would be:
Get all text that matches "Id":\d+ and place it in a new file. Replace the old file with the new one.
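That "keep only the matches" reading can be sketched in a few lines of JavaScript. The function name keepIds and the sample input are illustrative; writing the result back to a file is left out.

```javascript
// Collects every "Id":<digits> occurrence and joins them one per line.
// String.prototype.match with a g-flagged regex returns all matches, or null.
function keepIds(text) {
  const matches = text.match(/"Id":\d+/g);
  return matches ? matches.join('\n') : '';
}

const sample = '{"Id":12,"Name":"a"},{"Id":345,"Name":"b"}';
console.log(keepIds(sample));
// "Id":12
// "Id":345
```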
I haven't used .NET before, but the following works in Java:
System.out.println("abcd Id:12351abcdf".replaceAll(".*(Id:\\d+).*","$1"));
produces output
Id:12351
Strictly speaking, this doesn't match everything except Id:\d+, but it does the job.