Regex in sed to convert ##XXX## to ${XXX}

I need to use sed to convert all occurrences of ##XXX## to ${XXX}. X could be any alphabetic character or '_'. I know that I need to use something like:
's/##/\${/g'
But of course that won't work properly, as it will convert ##FOO## to ${FOO${

Here's a shot at a better replacement regex:
's/##\([a-zA-Z_]\+\)##/${\1}/g'
Or if you assume exactly three characters:
's/##\([a-zA-Z_]\{3\}\)##/${\1}/g'

Enclose the letters and '_' in '\(' and '\)', then reference that group with '\1' on the replacement side.
'\+' matches one or more letters or underscores, so a bare #### is left alone. ('\+' is a GNU extension to basic regular expressions; with sed -E you would write plain '(', ')' and '+' instead.)
Add the 'g' flag at the end to replace every match on a line (which I'm guessing is what you want in this case).
's/##\([a-zA-Z_]\+\)##/${\1}/g'
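For comparison, here is a minimal sketch of the same capture-and-reuse substitution in Python's re module. The name to_shell_vars is made up for illustration. Python's + needs no backslash, and re.sub replaces every match by default, so no g flag is needed.

```python
import re

# Capture the run of letters/underscores between the ## markers (group 1)
# and reuse it in the replacement, like \1 in the sed command.
PLACEHOLDER = re.compile(r"##([A-Za-z_]+)##")

def to_shell_vars(text: str) -> str:
    """Rewrite every ##NAME## as ${NAME} (hypothetical helper)."""
    # $ and braces are literal in a Python replacement string; \1 is the group.
    return PLACEHOLDER.sub(r"${\1}", text)

print(to_shell_vars("path=##HOME##/##APP_DIR##"))  # path=${HOME}/${APP_DIR}
print(to_shell_vars("####"))  # #### (unchanged: + needs at least one character)
```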

Use this:
s/##\([^#]*\)##/${\1}/g
BTW, there is no need to escape $ on the right-hand (replacement) side of the s command.
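One difference from the [a-zA-Z_]\+ version is worth checking: [^#]* also accepts an empty name. The Python sketch below, with a made-up sample string, compares the two patterns. It uses Python's re module rather than sed.

```python
import re

sample = "##FOO## #### ##BAR##"

# [^#]* may match zero characters, so a bare #### becomes ${}
loose = re.sub(r"##([^#]*)##", r"${\1}", sample)

# [A-Za-z_]+ requires at least one allowed character, so #### is left alone
strict = re.sub(r"##([A-Za-z_]+)##", r"${\1}", sample)

print(loose)   # ${FOO} ${} ${BAR}
print(strict)  # ${FOO} #### ${BAR}
```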

sed 's/##\([a-zA-Z_][a-zA-Z_][a-zA-Z_]\)##/${\1}/g'
The \(...\) remembers the matched text, which is referenced as \1 in the replacement. Use single quotes to save your sanity (the shell leaves ${\1} alone inside them).
This can also be contracted to:
sed 's/##\([a-zA-Z_]\{3\}\)##/${\1}/g'
This answer assumes that the example wanted exactly three characters matched. There are multiple variations depending on what is in between the hash marks. The key part is remembering part of the matched string.

echo "##foo##" | sed 's/##/${/;s//}/'
Without the g flag, s replaces only the first occurrence.
An empty pattern, as in s//.../, reuses the last regex, so the second s also matches ##, and by then only the second ## is left.
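Python has no counterpart to sed's empty-pattern reuse, so this sketch repeats the pattern explicitly. It mimics the two single-occurrence substitutions with count=1, which shows why the trick handles only one ##NAME## per line. The function name is hypothetical.

```python
import re

def first_two_markers(line: str) -> str:
    # Like s/##/${/ : replace only the first ## on the line
    line = re.sub("##", "${", line, count=1)
    # Like s//}/ : the next remaining ## becomes }
    return re.sub("##", "}", line, count=1)

print(first_two_markers("##foo##"))      # ${foo}
print(first_two_markers("##a## ##b##"))  # ${a} ##b##  (second placeholder untouched)
```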

echo '##XXX##' | sed "s/##\([^#]*\)##/\${\1}/g"

sed 's/\([^a-z]*[^A-Z]*[^0-9]*\)/(&)/pg'

Related

Extracting string before pattern with sed (bash)

I need some help with sed to remove everything after a matching pattern and also remove the last "." if it exists.
Take this string as an example:
The.100.S02E05.720p.HDTV.x264-KILLERS.mkv
I want everything before the pattern "S[0-9][0-9]E[0-9[0-9]" except the last "."
What I want:
"The.100"
Does anyone have a great one-liner for this one?
It sounds like you can pretty much use exactly what you had in your question:
sed 's/\.*S[0-9][0-9]E[0-9][0-9].*//'
This matches any dots immediately before the pattern you suggested (\.* also allows none, so the dot is optional), plus everything after the pattern, and replaces it all with nothing. You were missing a ] in the question, which I have added.
Testing it out:
$ sed 's/\.*S[0-9][0-9]E[0-9][0-9].*//' <<<'The.100.S02E05.720p.HDTV.x264-KILLERS.mkv'
The.100
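The same substitution can be sketched in Python's re module. The function name show_prefix is made up for illustration. The sketch uses [0-9] rather than \d, because in Python 3 \d also matches non-ASCII digits.

```python
import re

def show_prefix(filename: str) -> str:
    # Remove any dots right before SxxExx, the episode tag itself,
    # and everything after it.
    return re.sub(r"\.*S[0-9][0-9]E[0-9][0-9].*", "", filename)

print(show_prefix("The.100.S02E05.720p.HDTV.x264-KILLERS.mkv"))  # The.100
```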

Regex to find text between second and third slashes

I would like to capture the text that occurs after the second slash and before the third slash in a string. Example:
/ipaddress/databasename/
I need to capture only the database name. The database name might have letters, numbers, and underscores. Thanks.
How you access it depends on your language, but you'll basically just want a capture group for whatever falls between your second and third "/". Assuming your string is always in the same form as your example, this will be:
/.*/(.*)/
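If more slashes can follow, but a slash can never appear in the database name, make both quantifiers lazy so the match stops at the first three slashes:
/.*?/(.*?)/
(With only the second quantifier lazy, as in /.*/(.*?)/, the greedy first .* still runs as far as it can, so a later segment is captured.)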
In the event that your lines always have / at the end of the line:
([^/]*)/$
Alternate split method:
split("/")[2]
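To see how the greedy and lazy variants differ from splitting, here is a Python sketch run on a made-up path with an extra trailing segment. The path is an assumption for illustration only.

```python
import re

path = "/ipaddress/databasename/extra/"

# Both quantifiers lazy: stops at the first three slashes -> second segment
lazy = re.search(r"/.*?/(.*?)/", path).group(1)

# Greedy first .* backtracks only as far as needed -> a later segment
greedy_first = re.search(r"/.*/(.*?)/", path).group(1)

# The leading slash yields an empty first field, so index 2 is the name
by_split = path.split("/")[2]

print(lazy, greedy_first, by_split)  # databasename extra databasename
```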
The regex would be:
/[^/]*/([^/]*)/
so in Perl, the regex capture statement would be something like:
($database) = $text =~ m!/[^/]*/([^/]*)/!;
Normally the / character delimits a regex, but since slashes are part of the match here, another delimiter (!) is used instead. Alternatively, each / can be escaped:
($database) = $text =~ /\/[^\/]*\/([^\/]*)\//;
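The negated class [^/]* cannot cross a slash, so this pattern needs no lazy quantifier. Here is a Python sketch of the same capture. It assumes the path starts with a slash, and database_name is a hypothetical helper.

```python
import re
from typing import Optional

# Each [^/]* stays within one path segment
DB_NAME = re.compile(r"/[^/]*/([^/]*)/")

def database_name(path: str) -> Optional[str]:
    m = DB_NAME.search(path)
    return m.group(1) if m else None

print(database_name("/ipaddress/databasename/"))        # databasename
print(database_name("/ipaddress/databasename/extra/"))  # databasename
```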
You can shorten the pattern even further:
[^/]+/(\w+)
Here \w matches A-Z, a-z, 0-9 and _.
I would suggest preferring a split function where possible; in my experience it performs better than regex functions whenever it can be used.
You can use explode in PHP, or split in other languages, for this kind of operation.
Anyway, here is a regex pattern:
/[\/]*[^\/]+[\/]([^\/]+)/
I know you specifically asked for regex, but you don't really need regex for this. You simply need to split the string on the delimiter (in this case a slash), then choose the part you need (in this case, the 3rd field; the first field is empty).
cut example:
cut -d '/' -f 3 <<< "$string"
awk example:
awk -F '/' '{print $3}' <<< "$string"
perl expression, using split function:
(split '/', $string)[2]
etc.

Matching regex at specific positions

Is it possible to match strings using a single regex where I could define constraints on their position within the text?
For example, given a hex-encoded file, I would like to match the hex pairs that correspond to characters whose value is larger than 0x40. The position constraint is that a match must start at an even position.
E.g. 034673911921 should match at 46, 73 and 91, but not at 92.
You can encode the position inside a regex. For your example of only starting at even positions, that could be something like
/^(?:..)*([4-9a-fA-F].)/
It can be done in two easy-to-understand steps: first split the input into two-character fields, then check each field's value. Here is an example with sed; I hope it helps:
echo 034673911921 | sed -nr 's/([0-9a-fA-F]{2})/\1 /gp' | sed -n 's/[0-3][0-9a-fA-F]//gp'
You can use something like this:
^(?:..)*([4-9A-Fa-f][\da-fA-F])
which will make sure that an even number of characters precedes your capturing group.
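Because the two anchored patterns above start at ^, each one returns only a single match per string. The greedy (?:..)* makes that match the last qualifying pair, which is 91 for the example. To collect every pair, you can let each match consume exactly two characters, so matches can only begin at even offsets. Here is a Python sketch of that variant. It assumes the input contains only hex digits, because finditer skips non-matching characters, and that would break the pairing. [4-9a-fA-F] as the first digit means a value of 0x40 or more. The function name is hypothetical.

```python
import re

# Either a "high" pair (captured in group 1) or any other hex pair;
# every match consumes exactly two characters, keeping matches aligned.
HEX_PAIR = re.compile(r"([4-9a-fA-F][0-9a-fA-F])|[0-9a-fA-F]{2}")

def high_bytes(hex_text: str) -> list[str]:
    return [m.group(1) for m in HEX_PAIR.finditer(hex_text) if m.group(1)]

# The anchored pattern finds only one (the last) qualifying pair
anchored_only = re.search(r"^(?:..)*([4-9a-fA-F].)", "034673911921").group(1)

print(high_bytes("034673911921"))  # ['46', '73', '91']
print(anchored_only)               # 91
```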

regex replace: '[A-Z]'' to [A-Z]' - I can't preserve the letter in the string

My Google-fu is failing me...
I have a file (well over 2 GB) that has a SQL format problem. So I need a regex that will update the following examples (remember, I don't know how many there are or what the letters are):
'N'' should be changed to N'
'L'' should be changed to L'
etc
I've tried (within VIM and sed):
s/'[A-Z]''/$1'/
but that just produces:
'N'' -> '$1'
A backreference in sed is \1, not $1. You also need to capture the letter using \(\) (and probably use the global flag g).
Your sed expression should be:
s/'\([A-Z]\)''/\1'/g
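Python's re.sub also uses \1 (not $1) for a backreference in the replacement. Here is a small sketch on a made-up SQL fragment; the helper name fix_quotes is hypothetical.

```python
import re

def fix_quotes(sql: str) -> str:
    # Capture the uppercase letter, drop the leading quote and one of the
    # two trailing quotes: 'N'' -> N'
    return re.sub(r"'([A-Z])''", r"\1'", sql)

print(fix_quotes("VALUES ('N'', 'L'')"))  # VALUES (N', L')
```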
Give this a shot:
sed "s/\([[:alpha:]]'\)'/\1/g" file
Example Output
$ sed "s/\([[:alpha:]]'\)'/\1/g" <<<"aBcD''eg''H'i"
aBcD'eg'H'i
Note: Since you said you didn't know what letters they would be I assumed they could be lower case. If you know for a fact they are always uppercase, then change [[:alpha:]] to [[:upper:]]. These character classes are preferred over [A-Za-z] and [A-Z], respectively, because they will always work as you expect no matter the locale.

Using regex to find any last occurrence of a word between two delimiters

Suppose I have the following test string:
Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop
where _ means any characters, e.g.: StartaGetbbGetcccGetddddStopeeeeeStart....
What I want to extract is the last occurrence of Get within each pair of Start and Stop delimiters. The result here would be three matches: the Get immediately before each Stop below.
Start__Get__Get__Get__Stop__Start__Get__Get__Stop__Start__Get__Stop
To be precise, I'd like to do this using only regex, and in a single pass if possible.
Any suggestions are welcome
Thanks
Get(?=(?:(?!Get|Start|Stop).)*Stop)
I'm assuming your Start and Stop delimiters will always be properly balanced and they can't be nested.
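Python's re module accepts this tempered-token pattern as written. The sketch below lists where each match starts. Each . in the lookahead may only advance where Get, Start or Stop does not begin, so a Get matches only if no other Get lies between it and the next Stop. The helper name is hypothetical.

```python
import re

LAST_GET = re.compile(r"Get(?=(?:(?!Get|Start|Stop).)*Stop)")

def last_get_offsets(text: str) -> list[int]:
    return [m.start() for m in LAST_GET.finditer(text)]

print(last_get_offsets("Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"))
# [14, 33, 48]
print(last_get_offsets("Start__Get__Get__Stop"))  # [12], any filler length works
```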
I would do it in two passes. The first pass finds the word "Get", and the second counts its occurrences.
$ echo "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get__Stop" | awk -vRS="Stop" -F"_*" '{print $(NF-1)}'
Get
Get
Get
Something like this, maybe:
(?<=Start(?:.Get)*.)Get(?=.Stop)
That requires variable-length lookbehind, which not all regex engines provide.
It could be given a maximum length, which a few more engines (but still not all) support, by changing the first * to {0,99} or similar.
Also, in both the lookbehind and the lookahead, the single . may need to be .+ or .{1,2}, depending on whether the double underscore in the question is a typo.
With Perl, I'd do:
my $test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop";
$test =~ s#(?<=Start_)((Get_)*)(Get)(?=_Stop)#$1<FOUND>$3</FOUND>#g;
print $test;
output:
Start_Get_Get_<FOUND>Get</FOUND>_Stop_Start_Get_<FOUND>Get</FOUND>_Stop_Start_<FOUND>Get</FOUND>_Stop
You should adapt it to your regex flavour.
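As one such adaptation, here is the Perl substitution ported to Python's re module. The fixed-width lookbehind (?<=Start_) is accepted there. \1 keeps the earlier Get_ repetitions, and \3 is the last Get before _Stop.

```python
import re

test = "Start_Get_Get_Get_Stop_Start_Get_Get_Stop_Start_Get_Stop"

marked = re.sub(
    r"(?<=Start_)((Get_)*)(Get)(?=_Stop)",  # same pattern as the Perl version
    r"\1<FOUND>\3</FOUND>",                 # $1 / $3 become \1 / \3
    test,
)
print(marked)
```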