How to remove digit at the start of a file name? - regex

I want to remove digits at the start of a filename. For example:
atoms/01-headings/01-heading-level-01.html
to
atoms/headings/heading-level-01.html
I've built this regex /(^\d+|(?=\/)\/\d+)[\-\.]/img but it seems that the positive lookahead (?=\/) consumes the / too.
How can I avoid consuming it?
Here's my tests: https://regex101.com/r/hK0rZ2/5

Search using this regex:
/(^|\/)\d+[.-]?/img
and replace by:
"$1"
Captured group #1, (^|\/), matches either the start position or a /. It is followed by one or more digits and an optional hyphen or dot. In the replacement, $1 is the backreference to captured group #1, so the / (if there was one) is kept.
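As a minimal sketch, the search-and-replace above could be wrapped in a JavaScript function like the one below. The name stripNumericPrefixes is hypothetical and not part of the original answer. Because the separator is optional in this pattern, a segment that starts with digits but has no separator (for example 2020 or 3d-model) would lose its leading digits too.

```javascript
// Hypothetical helper: removes a leading run of digits (plus an optional "." or "-")
// at the start of the string and at the start of every path segment.
function stripNumericPrefixes(path) {
  // Group 1 holds either the start position (empty) or the "/" so it can be restored.
  return path.replace(/(^|\/)\d+[.-]?/gm, '$1');
}

console.log(stripNumericPrefixes('atoms/01-headings/01-heading-level-01.html'));
// atoms/headings/heading-level-01.html
```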

It is not the lookahead that consumes the slash, the \/ in \/\d+ does.
You can use
/(?:^|(\/))\d+[-.]/igm
And replace with $1.
The regex matches:
(?:^|(\/)) - either the start of a line (^) or a / symbol and will capture the / into Group 1 (we'll later restore it with a $1 backreference)
\d+ - one or more digits
[-.] - either - or a . literally (since it is a character class, and the hyphen is at the beginning of it, no escaping is necessary).
var re = /(?:^|(\/))\d+[-.]/img;
var str = '01-heading-level-01.html\n02-heading-level-02.html\natoms/01-headings/01-heading-level-01.html\natoms/01-headings/01-heading-level-1.html\natoms/01-headings/02-heading-level-02.html\natoms/01-headings/02-heading-level-2.html\natoms/01-headings/01-heading-level-01/01-headings-level-01-red.html';
var result = str.replace(re, '$1');
document.body.innerHTML = result.replace(/\n/g, "<br/>");
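To illustrate the point that the lookahead itself consumes nothing, here is a small sketch. The lookbehind variant at the end is an alternative that is not in the original answer and assumes an engine with lookbehind support (ES2018 or later).

```javascript
const path = 'atoms/01-headings/01-heading-level-01.html';

// A lookahead on its own matches an empty string: it only asserts a position.
const lookaheadOnly = path.match(/(?=\/)/);
console.log(JSON.stringify(lookaheadOnly[0]), lookaheadOnly.index); // "" 5

// The \/ that follows the lookahead is what pulls the slash into the match.
const withSlash = path.match(/(?=\/)\/\d+/);
console.log(withSlash[0]); // /01

// Alternative sketch: assert the line start or the slash with a lookbehind,
// so nothing has to be restored in the replacement.
const cleaned = path.replace(/(?<=^|\/)\d+[-.]/gm, '');
console.log(cleaned); // atoms/headings/heading-level-01.html
```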

Related

How to replace certain pattern in a file while keeping some string (could be any string) within it unchanged?

We know that in Linux vim, we can use
:%s/character/symbol/g
to replace any occurrence of 'character' with 'symbol' in a file.
But what if I want to replace patterns like:
defined($opt_ws_parser)
defined($opt_client)
defined($opt_server)
...
with only the part in the parentheses? That is:
$opt_ws_parser
$opt_client
$opt_server
...
How can I achieve that?
I tried using "%s/defined($.*)/$.*/g". It turned out that every occurrence became the literal text $.* instead of retaining the original variable name.
:%s/defined(\(\$.*\))/\1
% - Repeat for every line
s - Substitute
/ - Start of pattern
defined( - Match the string "defined(" literally
\( - Beginning of capture group
\$ - Match "$"
.* - Match anything
\) - End of capture group
) - Literally match ")"
/ - End of pattern, start of substitution string
\1 - Reference to the first capture group (first expression between round brackets)
Sources:
:h :substitute
:h :sub-replace-special
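For readers working outside Vim, the same capture-and-backreference idea can be sketched in JavaScript. This is only an illustration: the sample line is made up, and \w+ is an assumption that is narrower than the .* used in the Vim command, so that two defined(...) calls on one line are handled separately.

```javascript
// Made-up sample input containing two defined(...) calls on one line.
const source = 'if (defined($opt_client) && defined($opt_server)) { run(); }';

// \( and \) match literal parentheses; the inner (...) is capture group 1.
const unwrapped = source.replace(/defined\((\$\w+)\)/g, '$1');

console.log(unwrapped);
// if ($opt_client && $opt_server) { run(); }
```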

replaceAll regex to remove last - from the output

I was able to achieve some of the output but not the right one. I am using replace all regex and below is the sample code.
final String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
System.out.println(label.replaceAll(
"([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)", "$3"));
I want this output:
abc-nyd-request-xyxpt
but I am getting:
abc-nyd-request-xyxpt-
Here is the code: https://ideone.com/UKnepg
You may use this .replaceFirst solution:
String label = "abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9";
label.replaceFirst("(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$", "$1");
//=> "abc-nyd-request-xyxpt"
RegEx Details:
(?:[^-]*-){2}: Match 2 repetitions of zero or more non-hyphen characters followed by a hyphen
(.+?): Lazily match 1+ of any characters and capture them in group #1
(?:--1)?: Match optional --1
-: Match a -
[^-]+: Match a non-hyphenated string
$: End
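The same pattern can be tried outside Java. The sketch below is a JavaScript port, not the original Java code; a non-global replace in JavaScript replaces only the first match, which plays the role of replaceFirst here. The second input is a made-up label without the --1 part.

```javascript
const label = 'abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9';

// Skip two segments, lazily capture the middle, allow an optional "--1",
// then require one final "-segment" at the end of the string.
const pattern = /(?:[^-]*-){2}(.+?)(?:--1)?-[^-]+$/;

console.log(label.replace(pattern, '$1'));           // abc-nyd-request-xyxpt
console.log('a-b-core-tail'.replace(pattern, '$1')); // core
```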
The following works for your example case
([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)
https://regex101.com/r/VNtryN/1
The third group must end in a non-hyphen character, so it cannot capture a trailing -, and -+ lets the separator after it be one or more hyphens, which is how the double -- gets matched.
With your shown samples and attempts, please try the following regex. It creates one capturing group, which can be used in the replacement: use $1 as the replacement in your function.
^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*
Explanation of the regex above:
^(?:.*?-){2} ## From the start of the value, a non-capturing group lazily matches up to the nearest - and is repeated 2 times, which skips the first two segments.
([^-]*(?:-[^-]*){3}) ## The first and only capturing group: a run of non-hyphen characters, followed by 3 repetitions of - and another run of non-hyphen characters. This is the required output.
--.* ## Match -- and everything after it to the end of the value.
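As a quick check of the last two patterns, here is a JavaScript sketch (a port for illustration, not the posters' Java code) that applies both to the sample label. The variable names are hypothetical.

```javascript
const label = 'abcs-xyzed-abc-nyd-request-xyxpt--1-cnaq9';

// Group 3 must end in a non-hyphen, and -+ absorbs the run of hyphens after it.
const viaGroup3 = label.replace(/([^-]+)-([^-]+)-(.+[^-])-+([^-]+)-([^-]+)/, '$3');

// Skip two segments, capture exactly four segments, then require "--".
const viaFourSegments = label.replace(/^(?:.*?-){2}([^-]*(?:-[^-]*){3})--.*/, '$1');

console.log(viaGroup3);       // abc-nyd-request-xyxpt
console.log(viaFourSegments); // abc-nyd-request-xyxpt
```

Note the design difference: the second pattern hard-codes four captured segments and a literal --, so it is tied more closely to the shape of this particular label.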

Regex to extract string if there is or not a specific word

Hi, I'm new to regex and I'd like to write one that extracts the penultimate path segment of a URL if the last segment is the word "xxxx", or the last segment if "xxxx" is not present.
For example, I could have 2 scenarios:
www.hello.com/aaaa/1adf0023efae456
www.hello.com/aaaa/1adf0023efae456/xxxx
In both cases I want to extract the string 1adf0023efae456.
I've tried something like (?=(\w*xxxx\w*)\/.*\/(.*?)\/|[^\/]+$) but it doesn't work properly.
You can match the digits and assert that what follows is either /xxxx or the end of the string.
\d+(?=/xxxx|$)
If there should be a / before matching the digits, you could use a capturing group and get the value from group 1
/(\d+)(?=/xxxx|$)
/ Match /
(\d+) Capture group 1, match 1+ digits
(?=/xxxx|$) Positive lookahead, assert that what is on the right is either /xxxx or the end of the string
Edit
If there could possibly also be alphanumeric characters instead of digits, you could use a character class [a-z0-9]+ with an optional non capturing group.
/([a-z0-9]+)(?:/xxxx)?$
To match any char except a whitespace char or a forward slash, use [^\s/]+
Using lookarounds, you could assert a / on the left, match 1+ alphanumerics and assert what is at the right is either /xxxx or the end of the string which did not end with /xxxx
(?<=/)[a-z0-9]+(?=/xxxx$|$(?<!/xxxx))
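The two alphanumeric variants can be compared side by side. The sketch below is in JavaScript, which is an assumption about the target language (the question does not name one), and the lookbehind form needs ES2018 or later. Inside a JavaScript regex literal the / has to be escaped as \/.

```javascript
const urls = [
  'www.hello.com/aaaa/1adf0023efae456',
  'www.hello.com/aaaa/1adf0023efae456/xxxx',
];

// Variant 1: capture the segment, with /xxxx as an optional tail before the end.
const optionalSuffix = /\/([a-z0-9]+)(?:\/xxxx)?$/;

// Variant 2: lookarounds only, so the full match is the segment itself.
const lookarounds = /(?<=\/)[a-z0-9]+(?=\/xxxx$|$(?<!\/xxxx))/;

const viaGroup = urls.map(url => url.match(optionalSuffix)[1]);
const viaLookarounds = urls.map(url => url.match(lookarounds)[0]);

console.log(viaGroup);       // [ '1adf0023efae456', '1adf0023efae456' ]
console.log(viaLookarounds); // [ '1adf0023efae456', '1adf0023efae456' ]
```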
You could avoid regex entirely (C#, requires using System.Linq;):
string[] strings =
{
    "www.hello.com/aaaa/1adf0023efae456",
    "www.hello.com/aaaa/1adf0023efae456/xxxx"
};
// Split each URL on '/' and take the last segment, or the one before it when the last is "xxxx".
var x = strings.Select(s => s.Split('/'))
    .Select(arr => new { upper = arr.GetUpperBound(0), arr })
    .Select(z => z.arr[z.upper] == "xxxx" ? z.arr[z.upper - 1] : z.arr[z.upper]);

Extract only the first occurence of the search string and ignore everything after /

I'm new to regex and want to display all the folders that contain the string name but ignore the characters or inner directories after "/"
Using regex only
(*spark?/)
Below are the set of directories:
/app-logs/spark/logs/application_15262_85484
/user/oozie/share/lib/lib_36456456/spark
/app-logs/spark/logs
/app-logs/spark
/apps/spark/warehouse
My result should be:
/app-logs/spark
/user/oozie/share/lib/lib_36456456/spark
/app-logs/spark
/apps/spark
The expression we might be looking for here is:
(spark)\/?.*
which we would replace with our first capturing group, $1.
Test
const regex = /(spark)\/?.*/gm;
const str = `/app-logs/spark/logs/application_15262_85484
/user/oozie/share/lib/lib_36456456/spark
/app-logs/spark/logs
/app-logs/spark
/apps/spark/warehouse`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
Your pattern (*spark?/) is not valid because the quantifier * comes directly after the opening parenthesis of the capturing group, so there is nothing for it to repeat. The question mark after the k means that the character k is optional.
You could use a repeating pattern that matches a forward slash followed by one or more characters other than a forward slash, and then match /spark
^(?:/[^/\n]+)+/spark
Explanation
^ Assert start of string
(?: Non capturing group
/[^/\n]+ Match /, then match 1+ times not / or a newline
)+ Close non capturing group and repeat 1+ times
/spark Match /spark
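A short JavaScript sketch of this pattern applied to the directories from the question follows. JavaScript is an assumption here, since the question does not name a language, and falling back to the unchanged path when nothing matches is a hypothetical choice. Each input line produces one output line.

```javascript
const dirs = [
  '/app-logs/spark/logs/application_15262_85484',
  '/user/oozie/share/lib/lib_36456456/spark',
  '/app-logs/spark/logs',
  '/app-logs/spark',
  '/apps/spark/warehouse',
];

// One or more "/segment" parts, then "/spark"; backtracking trims whatever follows.
const upToSpark = /^(?:\/[^\/\n]+)+\/spark/;

// Keep the matched prefix, or the original path if there is no match.
const trimmedDirs = dirs.map(dir => (dir.match(upToSpark) || [dir])[0]);

console.log(trimmedDirs.join('\n'));
// /app-logs/spark
// /user/oozie/share/lib/lib_36456456/spark
// /app-logs/spark
// /app-logs/spark
// /apps/spark
```

Since the repeated group is greedy, a path containing /spark more than once would be cut after the last occurrence, and a path such as /apps/sparkling would also match up to /apps/spark.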

A regular expression for matching a group followed by a specific character

So I need to match the following:
1.2.
3.4.5.
5.6.7.10
((\d+)\.(\d+)\.((\d+)\.)*) will do fine for the very first line, but the problem is that there could be one line or many.
\n will only appear if there is more than one line.
As a string, the input looks like this: "1.2.\n3.4.5.\n1.2."
So my issue is: if there is only one line, there must be no \n at the end, but if there is more than one line, each line except the very last must end with \n.
Here is the pattern I suggest:
^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$
Here is a brief explanation of the pattern:
^ from the start of the string
\d+ match a number
(?:\.\d+)* followed by dot, and another number, zero or more times
\.? followed by an optional trailing dot
(?:\n followed by a newline
\d+(?:\.\d+)*\.?)* and another path sequence, zero or more times
$ end of the string
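The breakdown above can be checked with a few inputs. This is a minimal JavaScript sketch, with the language being an assumption. The m flag is deliberately left off so that ^ and $ anchor the whole input rather than each line.

```javascript
// Whole-input validation: one dotted number sequence per line, no trailing newline.
const wholeInput = /^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$/;

console.log(wholeInput.test('1.2.'));                // true
console.log(wholeInput.test('1.2.\n3.4.5.\n1.2.')); // true
console.log(wholeInput.test('1.2.\n'));              // false (trailing newline)
console.log(wholeInput.test('1.2.\n\n3.4.5.'));      // false (empty line)
```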
You might check if there is a newline at the end using a positive lookahead (?=.*\n):
(?=.*\n)(\d+)\.(\d+)\.((\d+)\.)*
Edit
You could use an alternation to either match when on the next line there is the same pattern following, or match the pattern when not followed by a newline.
^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))
^ Start of string
(?: Non capturing group
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?=.*\n\d+\.\d+\.) Positive lookahead, assert that what follows is a newline and then a line starting with the pattern
| Or
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?!.*\n) Negative lookahead, assert that no newline follows
) Close non capturing group
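To see how the alternation behaves, the sketch below runs it over the three lines from the question. It assumes JavaScript with the g and m flags so that ^ matches at the start of every line; those flags are an assumption about how the pattern is meant to be applied.

```javascript
const text = '1.2.\n3.4.5.\n5.6.7.10';

// First branch: a line that is followed by another line starting with the pattern.
// Second branch: a line with no newline after it, i.e. the last line.
const perLine = /^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))/gm;

const matches = text.match(perLine);
console.log(matches); // [ '1.2.', '3.4.5.', '5.6.7.' ]
```

The last line yields 5.6.7. rather than 5.6.7.10 because every number in the pattern must be followed by a dot.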
(\d+\.*)+\n* will match the text you provided. If you need to make sure the final line also ends with a . then (\d+\.)+\n* will work.
Most programming languages offer the m flag, the multiline modifier. Enabling it lets $ match at the end of each line as well as at the end of the string.
The solution below only appends $ to your current regex and sets the m flag. The way to set the flag may vary depending on your programming language.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
regex = /((\d+)\.(\d+)\.((\d+)\.)*)$/gm,
match;
while (match = regex.exec(text)) {
console.log(match);
}
You could simplify the regex to /(\d+\.){2,}$/gm, then split the full match based on the dot character to get all the different numbers. I've given a JavaScript example below, but getting a substring and splitting a string are pretty basic operations in most languages.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
regex = /(\d+\.){2,}$/gm;
/* Slice is used to drop the dot at the end, otherwise resulting in
* an empty string on split.
*
* "1.2.3.".split(".") //=> ["1", "2", "3", ""]
* "1.2.3.".slice(0, -1) //=> "1.2.3"
* "1.2.3".split(".") //=> ["1", "2", "3"]
*/
console.log(
text.match(regex)
.map(match => match.slice(0, -1).split("."))
);
For more info about regex flags/modifiers have a look at: Regular Expression Reference: Mode Modifiers