Replace last occurrence of character in string [duplicate] - regex

I've got the following string :
01/01/2014 blbalbalbalba blabla/blabla
I would like to replace the last slash with a space, and keep the first 2 slashes in the date.
The closest thing I have come up with was this kind of thing :
PS E:\> [regex] $regex = '[a-z]'
PS E:\> $regex.Replace('abcde', 'X', 3)
XXXde
but I don't know how to start from the end of the line. Any help would be greatly appreciated.
Editing my question to clarify :
I just want to replace the last slash character with a space character, therefore :
01/01/2014 blbalbalbalba blabla/blabla
becomes
01/01/2014 blbalbalbalba blabla blabla
Knowing that the length of "blabla" varies from one line to the other and the slash character could be anywhere.
Thanks :)

Using the following string to match:
(.*)[/](.*)
and the following to replace:
$1 $2
Explanation:
(.*) matches anything, any number of times (including zero). By wrapping it in parentheses, we make it available to be used again during the replace sequence, using the placeholder $ followed by the element number (as an example, because this is the first element, $1 will be the placeholder). When we use the relevant placeholder in the replace string, it will put all of the characters matched by this section of the regex into the resulting string. In this situation, the matched text will be 01/01/2014 blbalbalbalba blabla
[/] is used to match the forward slash character.
(.*) again is used to match anything, any number of times, similar to the first instance. In this case, it will match blabla, making it available in the $2 placeholder.
Basically, the three elements work together to find a number of characters, followed by a forward slash, followed by another number of characters. Because the first "match everything" is greedy (that is, it will attempt to match as many characters as possible), it will include all of the other forward slashes as well, up until the last. The reason that it stops short of the last forward slash is that including it would make the regex fail, as the [/] wouldn't be able to match anything any more.
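The greedy behaviour described above can be checked with a minimal JavaScript sketch. It assumes the pattern and the $1/$2 placeholders carry over unchanged from the answer; the variable names are illustrative only.

```javascript
// The first (.*) is greedy, so the literal slash between the two groups
// can only line up with the LAST slash in the string.
const input = '01/01/2014 blbalbalbalba blabla/blabla';
const replaced = input.replace(/(.*)\/(.*)/, '$1 $2');
console.log(replaced); // 01/01/2014 blbalbalbalba blabla blabla
```

The two slashes in the date survive because they are swallowed by the first capture group and written back through $1.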

You can also use lookahead:
'01/01/2014 blbalbalbalba blabla/blabla' -replace '/(?=[^/]+$)',' '
01/01/2014 blbalbalbalba blabla blabla
'/(?=[^/]+$)' will match a '/' character that comes right before a series of 'not /' characters immediately before EOL, but this is probably less efficient than the direct matches.
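The same lookahead pattern also works in JavaScript. The sketch below is only an illustration (the names lastSlash, line, result and trailing are made up for it), and it includes one edge case the pattern implies: a slash at the very end of the string is not matched.

```javascript
// Match a slash only when nothing but non-slash characters follow it
// up to the end of the string, i.e. the last slash.
const lastSlash = /\/(?=[^\/]+$)/;
const line = '01/01/2014 blbalbalbalba blabla/blabla';
const result = line.replace(lastSlash, ' ');
console.log(result); // 01/01/2014 blbalbalbalba blabla blabla

// [^\/]+ needs at least one character, so a trailing slash is left alone,
// and the earlier slash is rejected because another slash follows it.
const trailing = 'a/b/'.replace(lastSlash, ' ');
console.log(trailing); // a/b/
```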

'01/01/2014 blbalbalbalba blabla/blabla' -replace '^(\d{2}/\d{2})/(\d{4} .*)','$1 $2'
# outputs this:
# 01/01 2014 blbalbalbalba blabla/blabla

Here's how you can do it without regular expressions:
$string = "01/01/2014 blbalbalbalba blabla/blabla"
$last_index = $string.LastIndexOf('/')
$chars = $string.ToCharArray()
$chars[$last_index] = ' '
$new_string = $chars -join ''
Another way:
$string = "01/01/2014 blbalbalbalba blabla/blabla"
$last_index = $string.LastIndexOf('/')
$new_string = $string.Remove($last_index, 1).Insert($last_index, ' ')
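For comparison, here is a sketch of the same index-based idea in JavaScript. replaceLast is a hypothetical helper name, not a built-in, and the guard for a missing character is an addition that the snippets above do not have.

```javascript
// Replace the last occurrence of `target` in `text` without a regex.
// replaceLast is a hypothetical helper, not part of any library.
function replaceLast(text, target, replacement) {
  const index = text.lastIndexOf(target);
  if (index === -1) return text; // target not present: return input unchanged
  return text.slice(0, index) + replacement + text.slice(index + target.length);
}

const fixed = replaceLast('01/01/2014 blbalbalbalba blabla/blabla', '/', ' ');
console.log(fixed); // 01/01/2014 blbalbalbalba blabla blabla
```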

$ is the anchor for end of line.
So
(.*?)([a-z])$
should match what you want, and the thing in () is what you want to replace.
Best regards

how to shell script regex perfect matching?

I have a Bash script file that matches a regex.
My regex script file:
if [[ "$image" =~ [0-9]+(\.[0-9]+){3}\-[0-9]+$ ]]; then
I need to pass cases that only match 0.0.0.0-0000
These are my inputs and results.
pass : 0.0.0.0-0000
pass : 0.0.0.0.0.0-0000 << Unwanted match
no : 0.0.0.0-word
no : 0.0.0.0
As I marked above 0.0.0.0.0.0-0000 gets a match with my regex.
My question is how can I modify my regex to only match the pattern 0.0.0.0-0000?
Assuming that you are trying to match some sort of IP-address-like string, I came up with this regex:
^(\d+\.?){4}-\d+
Regex Demo
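Note the \d+ in the first capturing group (\d+\.?), which will match any number before a dot. If the only starting pattern is 0.0.0.0, you can remove the + here to match a single digit only. Bash's =~ operator uses POSIX extended regular expressions, which do not define \d, so write [0-9] instead when using this pattern inside [[ ]].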
Explanation:
^ - Matches the start of the string
(\d+\.?){4} - Matches a number followed by an optional . character, four times in a row, matching 0.0.0.0
-\d+ - Matches a - character followed by a sequence of digits, matching -0000
This issue is solved.
Following the comment by @The fourth bird, I had missed the anchor (^).
To make the start and end points explicit, the pattern should sit between '^' and '$'.
You can refer to that comment:
if [[ "$image" =~ ^[0-9]+(\.[0-9]+){3}\-[0-9]+$ ]];
(@The fourth bird, Jul 11 at 8:43)
Thank you to everyone who replied XD
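The effect of the missing ^ anchor can be reproduced outside Bash. The following JavaScript sketch is an illustration only: it reuses the pattern from the question with plain [0-9] classes and runs both variants over the four inputs listed there.

```javascript
// Without ^ the pattern may start matching in the middle of the input;
// with ^ and $ the whole input has to fit the pattern.
const unanchored = /[0-9]+(\.[0-9]+){3}-[0-9]+$/;
const anchored = /^[0-9]+(\.[0-9]+){3}-[0-9]+$/;

const samples = ['0.0.0.0-0000', '0.0.0.0.0.0-0000', '0.0.0.0-word', '0.0.0.0'];
const report = samples.map((s) => [s, unanchored.test(s), anchored.test(s)]);
for (const [s, loose, strict] of report) {
  console.log(s, 'unanchored:', loose, 'anchored:', strict);
}
```

Only the second sample differs between the two: the unanchored pattern finds 0.0.0.0-0000 as a suffix of 0.0.0.0.0.0-0000, while the anchored one rejects it.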

RegEx for matching a string before a year

I have directory names that include year numbers. I want to extract into a variable whatever comes before the year number:
Input:
Holidays.uS.2019.bla.bla
Holidays.ca.old.2017.bla.bla
Holidays.2015.bla.bla.bla
Holidays.1.2.3.4.at.old.1999.bla.bla.bla.bla
The year is not always in the same place, but, it always has 4 digits.
I always need everything up to the year.
For an input:
Holidays.ca.old.2017.bla.bla
Output:
Holidays.ca.old
Attempt
set name Holidays.ca.old.2017.bla.bla
set numbers [regexp -all -inline {[0-9]+} $name]
Output from my code is the year number, and sometimes other wrong numbers.
This expression might help you to design one:
([\w\.]+)(\.[0-9]{4}.+)
Code:
set string "Holidays.1.2.3.4.at.old.1999.bla.bla.bla.bla"
set match [regsub {([\w\.]+)(\.[0-9]{4}.+)} $string "\\1"]
puts $match
Output
Holidays.1.2.3.4.at.old
You may use a regex to match a dot followed with 4 digits that are not followed with a word char, and then matching any other char 0 or more times, and remove the matched text using regsub like this:
regsub {\.[0-9]{4}\y.*} $name ""
See Tcl demo online:
set name "Holidays.ca.old.2017.bla.bla"
set res [regsub {\.[0-9]{4}\y.*} $name ""]
puts $res
# => Holidays.ca.old
Regex details
\. - a dot
[0-9]{4} - four digits
\y - a word boundary
.* - any 0 or more chars as many as possible.
If you want to see a demo of the regex at regex101.com, you need to replace \y with \b, see this demo here.
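As a cross-check of the \b variant mentioned above, here is a small JavaScript sketch that applies it to the four directory names from the question. beforeYear is a made-up helper name for this illustration.

```javascript
// JavaScript spells the word boundary \b where Tcl uses \y.
// Remove the first ".NNNN" (four digits ending at a word boundary) and everything after it.
const beforeYear = (name) => name.replace(/\.[0-9]{4}\b.*/, '');

const names = [
  'Holidays.uS.2019.bla.bla',
  'Holidays.ca.old.2017.bla.bla',
  'Holidays.2015.bla.bla.bla',
  'Holidays.1.2.3.4.at.old.1999.bla.bla.bla.bla',
];
const prefixes = names.map(beforeYear);
console.log(prefixes);
// [ 'Holidays.uS', 'Holidays.ca.old', 'Holidays', 'Holidays.1.2.3.4.at.old' ]
```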
(\w|\.)+(?=\.\d{4})
Breakdown:
(\w|\.)+ One or more word characters (which include digits) or literal periods.
(?=\.\d{4}) Positive lookahead for a literal period followed by exactly four digits.
Demo: https://regex101.com/r/vaofyC/6
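A short JavaScript sketch of this lookahead variant, for illustration only (the variable names are assumptions). Here the prefix is read from the match itself rather than produced by deleting the tail.

```javascript
// The lookahead checks for ".NNNN" without consuming it,
// so the match itself is exactly the part before the year.
const yearPrefix = /(\w|\.)+(?=\.\d{4})/;
const found = 'Holidays.1.2.3.4.at.old.1999.bla.bla.bla.bla'.match(yearPrefix);
const prefix = found ? found[0] : null;
console.log(prefix); // Holidays.1.2.3.4.at.old
```

Note that \d{4} here only requires four digits after the dot; it does not check what follows them.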
Thank you for your help, that's really nice.
I use this in Tcl and it works perfectly for me:
set name_split [regsub {\.[0-9]{4}\y.*} $name ""]
I still need it for a Bash script; how can I use it there?
This does not really work :(
name_split=$(echo $name | {\.[0-9]{4}\y.*}

PowerShell Regular Expression match Y or Z

I am trying to match some strings using a regular expression in PowerShell, but I'm running into difficulty because the format of the original strings that I'm extracting from differs. I admittedly am not very strong at creating regular expressions.
I need to extract the numbers from each of these strings. These can vary in length, but in both cases will be preceded by FOO:
PC1-FOO1234567
PC2-FOO1234567/FOO98765
This works for the second example:
'PC2-FOO1234567/FOO98765' -match 'FOO(.*?)\/FOO(.*?)\z'
It lets me access the matched strings using $matches[1] and $matches[2] which is great.
It obviously doesn't work for the first example. I suspect I need some way to match on either / or the end of the string but I'm not sure how to do this and end up with my desired match.
Suggestions?
You may use
'FOO(.*?)(?:/FOO(.*))?$'
It will match FOO, then capture any 0 or more chars as few as possible into Group 1 and then will attempt to optionally match a sequence of patterns: /FOO, any 0 or more chars as many as possible captured into Group 2 and then the end of string position should follow.
See the regex demo
Details
FOO - literal substring
(.*?) - Group 1: any zero or more chars other than newline, as few as possible
(?:/FOO(.*))? - an optional non-capturing group matching 1 or 0 repetitions of:
/FOO - a literal substring
(.*) - Group 2: any 0+ chars other than newline as many as possible (* is greedy)
$ - end of string.
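The optional-group pattern above can be tried in JavaScript as well. This is a sketch under the assumption that the input uses uppercase FOO; PowerShell's -match is case-insensitive by default, while a JavaScript regex would need the i flag for that. extract is a hypothetical helper name.

```javascript
// Group 1 is always present; group 2 only exists when "/FOO..." follows.
const fooPattern = /FOO(.*?)(?:\/FOO(.*))?$/;

// Hypothetical helper: return the captured numbers, dropping an unmatched group 2.
const extract = (s) => {
  const m = s.match(fooPattern);
  return m ? [m[1], m[2]].filter((g) => g !== undefined) : [];
};

const single = extract('PC1-FOO1234567');
const double = extract('PC2-FOO1234567/FOO98765');
console.log(single); // [ '1234567' ]
console.log(double); // [ '1234567', '98765' ]
```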
[edit - removed the unneeded pipe to Where-Object. thanks to mklement0 for that! [*grin*]]
this is a somewhat different approach. it splits on the foo, then replaces the unwanted / with nothing, and finally filters out any string that contains letters.
the pure regex solutions others offered will likely be faster, but this may be slightly easier to understand - and therefore to maintain. [grin]
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
PC1-FOO1234567
PC2-FOO1234567/FOO98765
'@ -split [environment]::NewLine
$InStuff -split 'foo' -replace '/' -notmatch '[a-z]'
output ...
1234567
1234567
98765
To offer a more concise alternative with the -split operator, which obviates the need to access $Matches afterwards to extract the numbers:
PS> 'PC1-FOO1234568', 'PC2-FOO1234567/FOO98765' -split '(?:^PC\d+-|/)FOO' -ne ''
1234568 # single match from 1st input string
1234567 # first of 2 matches from 2nd input string
98765
Note: -split always returns a [string[]] array, even if only 1 string is returned; result strings from multiple input strings are combined into a single, flat array.
^PC\d+-|/ matches PC followed by 1 or more (+) digits (\d) at the start of the string (^) or (|) a / char., which matches both PC2-FOO at the beginning and /FOO.
(?:...), a non-capturing subexpression, must be used to prevent -split from including what the subexpression matched in the results array.
-ne '' filters out the empty elements that result from the input strings starting with a separator.
To learn more about the regex-based -split operator and in what ways it is more powerful than the string literal-based .NET String.Split() method, see this answer.
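The split-based idea translates to JavaScript roughly as follows. This is a sketch, not a port of PowerShell's -split semantics: flatMap stands in for the flattening of results from several input strings, and filter stands in for -ne ''.

```javascript
// Split on the separator ("PCn-FOO" at the start, or "/FOO"), then drop the
// empty element produced by each input string starting with a separator.
const separator = /(?:^PC\d+-|\/)FOO/;
const inputs = ['PC1-FOO1234568', 'PC2-FOO1234567/FOO98765'];
const numbers = inputs
  .flatMap((s) => s.split(separator))
  .filter((part) => part !== '');
console.log(numbers); // [ '1234568', '1234567', '98765' ]
```

As in the PowerShell version, the group must be non-capturing; JavaScript's split would otherwise include the captured text in the result array.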

Regex in PHP: take all the words after the first one in string and truncate all of them to the first character

I'm quite terrible at regexes.
I have a string that may have 1 or more words in it (generally 2 or 3), usually a person name, for example:
$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';
I'd like to convert each as follows:
$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';
My guess is that I should first match the first word, like so:
preg_match( "^([\w\-]+)", $str1, $first_word )
then all the words after the first one... but how do I match those? should I use again preg_match and use offset = 1 in the arguments? but that offset is in characters or bytes right?
Anyway, after I have matched the words following the first, if they exist, should I do something like this for each of them:
$second_word = substr( $following_word, 0, 1 ) . '. ';
Or is my approach completely wrong?
Thanks
ps - it would be a boon if the regex could maintain the whole first two words when the string contain three or more words... (e.g. 'Kim Jong U.').
It can be done in single preg_replace using a regex.
You can search using this regex:
^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+
And replace by:
$1.
RegEx Demo
Code:
$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
Explanation:
(*FAIL) behaves like a failing negative assertion and is a synonym for (?!)
(*SKIP) defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later
(*SKIP)(*FAIL) together provide a nice workaround for the restriction that you cannot use a variable-length lookbehind in the regex above.
^\w+(?:$| +)(*SKIP)(*F) matches first word in a name and skips it (does nothing)
(\w)\w+ matches all other words and replaces it with first letter and a dot.
You could use a positive lookbehind assertion.
(?<=\h)([A-Z])\w+
OR
Use this regex if you want to turn Bob F to Bob F.
(?<=\h)([A-Z])\w*(?!\.)
Then replace the matched characters with \1.
DEMO
Code would be like,
preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);
DEMO
(?<=\h)([A-Z]) Captures every uppercase letter that is preceded by a horizontal whitespace character.
\w+ matches one or more word characters.
Replacing the matched characters with the contents of group 1 (\1) plus a dot gives you the desired output.
A simple solution with only look-ahead and word boundary check:
preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);
(\w)\w+ is a word in the name, with the first character captured
(?!^)\b performs a word boundary check \b, and makes sure the match is not at the start of the string (?!^).
Demo
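The word-boundary pattern from this last answer uses no PCRE-only syntax, so it can be tried in JavaScript as a rough cross-check against the five names from the question. abbreviate is a made-up helper name, and the g flag is needed because JavaScript's replace only handles the first match without it.

```javascript
// (?!^) rules out a match at the start of the string, \b requires a word start,
// and (\w)\w+ only matches words of two or more characters (so "X." is untouched).
const abbreviate = (name) => name.replace(/(?!^)\b(\w)\w+/g, '$1.');

const people = ['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'];
const shortened = people.map(abbreviate);
console.log(shortened);
// [ 'John S.', 'John D.', 'David X. C.', 'Kim J. U.', 'Bob' ]
```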

Putting space in camel case string using regular expression

I am deriving my question from add a space between two words.
Requirement: Split a camel case string and put a space just before each capital letter that is followed by a lowercase letter, or possibly by nothing. The space should not occur between capital letters.
e.g. CSVFilesAreCoolButTXT is a string that I want to turn into CSV Files Are Cool But TXT
I derived a regular expression this way:
"LightPurple".replace(/([a-z])([A-Z])/, '$1 $2')
If you have more than 2 words, then you'll need to use the g flag, to match them all.
"LightPurpleCar".replace(/([a-z])([A-Z])/g, '$1 $2')
If you are trying to split words like CSVFile then you might need to use this regexp instead:
"CSVFilesAreCool".replace(/([a-zA-Z])([A-Z])([a-z])/g, '$1 $2$3')
But still it does not serve the way I have put my requirements.
var rex = /([A-Z])([A-Z])([a-z])|([a-z])([A-Z])/g;
"CSVFilesAreCoolButTXT".replace( rex, '$1$4 $2$3$5' );
// "CSV Files Are Cool But TXT"
And also
"CSVFilesAreCoolButTXTRules".replace( rex, '$1$4 $2$3$5' );
// "CSV Files Are Cool But TXT Rules"
The text of the subject string that matches the regex pattern will be replaced by the replacement string '$1$4 $2$3$5', where the $1, $2 etc. refer to the substrings matched by the pattern's capture groups ().
$1 refers to the substring matched by the first ([A-Z]) sub-pattern, and $3 refers to the substring matched by the first ([a-z]) sub-pattern etc.
Because of the alternation character |, to make a match the regex will have to match either the ([A-Z])([A-Z])([a-z]) sub-pattern or the ([a-z])([A-Z]) sub-pattern, so if a match is made several of the capture groups will remain unmatched. These capture groups can be referenced in the replacement string but they have no effect upon it - effectively, they will reference an empty string.
The space in the replacement string ensures a space is inserted in the subject string every time a match is made (the trailing g flag means the regular expression engine will look for more than one match).
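To make the point about unmatched capture groups visible, the sketch below swaps the replacement string for a replacer function and records what each group held per match. This is an illustrative variation, not the answer's code: with a function, unmatched groups arrive as undefined and have to be turned into empty strings by hand.

```javascript
const camelRex = /([A-Z])([A-Z])([a-z])|([a-z])([A-Z])/g;
const seen = []; // one entry per match: the five group values

const spaced = 'CSVFilesAre'.replace(camelRex, (match, g1, g2, g3, g4, g5) => {
  seen.push([g1, g2, g3, g4, g5]);
  // Same layout as '$1$4 $2$3$5'; groups from the alternative that did not match are undefined.
  const s = (g) => g || '';
  return s(g1) + s(g4) + ' ' + s(g2) + s(g3) + s(g5);
});

console.log(spaced); // CSV Files Are
console.log(seen);
// [ [ 'V', 'F', 'i', undefined, undefined ],
//   [ undefined, undefined, undefined, 's', 'A' ] ]
```

The first match comes from the left alternative (groups 1 to 3), the second from the right one (groups 4 and 5), which is why the replacement string can safely mention all five.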
If the first character is always lowercase.
'camelCaseString'.replace(/([A-Z]+)/g, ' $1')
If the first character is uppercase.
'CamelCaseString'.replace(/([A-Z]+)/g, ' $1').replace(/^ /, '')
Splitting CamelCase with regex in .NET :
Regex.Replace(input, "((?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]))", " $1").Trim();
Example :
Regex.Replace("TheCapitalOfTheUAEIsAbuDhabi", "((?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]))", " $1").Trim();
Output :
The Capital Of The UAE Is Abu Dhabi
This worked for me
let camelCase = "CSVFilesAreCoolButTXTRules"
let re = /[A-Z-_\&](?=[a-z0-9]+)|[A-Z-_\&]+(?![a-z0-9])/g
let delimited = camelCase.replace(re,' $&').trim()
The above code works for almost all the use cases I had. I had a few peculiarities where '&' and '_' should be treated as equivalent to an uppercase character.
ThisIsASlug ---> This Is A Slug
loremIpsum ---> lorem Ipsum
PAGS_US ---> PAGS_US
TheCapitalOfTheUAEIsAbuDhabi ---> The Capital Of The UAE Is Abu Dhabi
eclipseRCPExt ---> eclipse RCP Ext
VALUE ---> VALUE
SG&A ---> SG&A
A brief explanation
[A-Z-_\&](?=[a-z0-9]+)
//Matches normal words i.e. one uppercase followed by one or more non-uppercase characters
[A-Z-_\&]+(?![a-z0-9])
//Matches acronyms & abbreviations i.e. a sequence of uppercase characters that are not followed by non-uppercase characters
Check out the regexr fiddle here
Camel-case replacement for Javascript using lookaheads / behinds:
"TheCapitalOfTheUAEIsAbuDhabi".replace(/([A-Z](?=[a-z]+)|[A-Z]+(?![a-z]))/g, ' $1').trim()
// "The Capital Of The UAE Is Abu Dhabi"