Vim substitution with regex - regex

New to regex and I need to pattern match on some dates to change the format.
I'm going from mm/dd/yy to yyyy-mm-dd where there are no entries prior to 2000.
What I'm unfamiliar with is how to group things to use their respective references of \1, \2, etc.
Would I first want to match on mm/dd/yy with something like ( \d{2} ) ( \/\d{2} ) ( \/\d{2} ) or is it as easy as \d\d/\d\d/\d\d ?
Assuming my first grouping is partially the right idea, I'm looking to do something like:
:%s/old/new/g
:%s/ ( \d{2} ) ( \/\d{2} ) ( \/\d{2} ) / ( 20+\3) - (\3) - (\1) /g
EDIT: Sorry, the replace is going to a yyyy-mm-dd format with hyphens, not the slash.

I was going to comment on another answer but it got complicated.
Mind the magic setting. If you want unescaped parens to do grouping, you need to include \v somewhere in your pattern. (See :help magic).
You can avoid escaping the slashes if you use something other than slashes in the :s command.
You are close. :) You don't want all of those spaces though as they'll require spaces in the same places to match.
My solution uses \v so the parens don't need escaping, and exclamation points as the delimiter so the slashes in the pattern don't need escaping either:
:%s!\v(\d{2})/(\d{2})/(\d{2})!20\3-\2-\1!g
This will also match "inside" longer items that start or end with three or more digits, though. If you can give begin/end criteria, that would help. Assuming that simple "word boundary" conditions work, you can use < and >:
:%s!\v<(\d{2})/(\d{2})/(\d{2})>!20\3-\2-\1!g
To critique yours specifically (for learning!):
:%s/ ( \d{2} ) ( \/\d{2} ) ( \/\d{2} ) / ( 20+\3) - (\3) - (\1) /g
Get rid of the spaces since presumably you don't want them!
Your grouping needs either \( \) or \v to work
You also need \{2} unless you use \v
You are putting the slashes in groups two and three which means they'll show up in the replacement too
You don't want the parentheses in the output!
You're substituting text directly; you don't want the + after the 20 in the output

Try this:
:%s/\(\d\{2}\)\/\(\d\{2}\)\/\(\d\{2}\)/20\3-\2-\1/g
The bits you're interested in are: \(...\) - capture; \d - a digit; \{N} - N occurrences; and \/ - a literal forward slash.
So that's capturing two digits, skipping a slash, capturing two more, skipping another slash, and capturing two more, then replacing it with "20" + the third couplet + "-" + the second couplet + "-" + the first couplet. That turns "dd/mm/yy" into "20yy-mm-dd". If your input is mm/dd/yy, as the question states, swap the last two references and use 20\3-\1-\2 as the replacement instead.
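If you want to check the group numbering outside Vim, here is a minimal Python sketch of the same substitution. It is an illustration, not part of the Vim answer: the function name to_iso and the month_first flag are made up for the example, and \b stands in for Vim's word-boundary atoms.

```python
import re

# Same structure as the Vim pattern: three captured two-digit groups
# separated by slashes, with word boundaries on both sides.
DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{2})\b")


def to_iso(text: str, month_first: bool = True) -> str:
    """Rewrite xx/xx/yy dates as 20yy-mm-dd (month_first is a hypothetical switch)."""
    # mm/dd/yy input: month is group 1, day is group 2.
    # dd/mm/yy input: day is group 1, month is group 2.
    replacement = r"20\3-\1-\2" if month_first else r"20\3-\2-\1"
    return DATE.sub(replacement, text)


print(to_iso("Due 03/14/21, paid 12/01/09."))
print(to_iso("Due 14/03/21.", month_first=False))
```

The only thing that changes between the two input orders is which backreference lands in the month slot; the pattern itself stays the same.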

ok, try this one:
:0,$s#\(\d\{1,2\}\)/\(\d\{1,2\}\)/\(\d\{1,2\}\)#20\3-\2-\1#g
I've removed a lot of the spaces, both in the matching section and the replacement section, and most of the parens, because the format you were asking for didn't have them.
Some things of note:
With vi you can change the '/' to any other character, which helps when you're trying to match a string with slashes in it. I usually use '#' but it doesn't have to be.
You've got to escape the parens, and the curly braces
I use :0,$ instead of %, but it has the same meaning here: apply the following command to every line from the start of the file to the end. (% is shorthand for 1,$.)

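For the match (in Vim, start the pattern with \v so the parens and braces need no escaping): (\d{2})\/(\d{2})\/(\d{2})
For the replace (with hyphens, per the question's edit): 20\3-\1-\2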

Related

Regex syntax \(.*\) suppose to remove the ( ) and all characters between ( ). How it actually works?

I am new to Regex world. I would like to rename the files that have time stamp added on the end of the file name. Basically remove last 25 characters before the extension.
Examples of file names to rename:
IMG523314(2021-12-05-14-51-25_UTC).jpg > IMG523314.jpg
Test run1(2021-08-05-11-32-18_UTC).txt > Test run1.txt
To remove 25 characters before the .extension (2021-12-05-14-51-25_UTC)
or if you like, remove the brackets ( ) which are always there and everything inside the brackets.
After the right bracket there is always a dot '.'.
Will the regex syntax shown in the title select the above? If yes, I wonder how it actually works?
Many Thanks,
Dan
Yes, \(.*\) will select the parentheses and anything inside of them.
Assuming that by "how it works" you mean why the symbols behave the way they do, here's a breakdown:
\( & \): Parentheses are special characters in regex; they signify groups, so in order to match them literally you need to escape them with backslashes.
.: A period is a wildcard, meaning it matches any single character.
*: An asterisk is a quantifier, meaning match zero to an infinite number of the previous matcher.
So to put everything together you have:
Match exactly one opening parenthesis
Match an unlimited number of any character
Match exactly one closing parenthesis
The closing parenthesis bounds the otherwise unlimited match of the asterisk, so the pattern grabs only the parentheses and the characters between them. (If a name contained several closing parentheses, the greedy .* would run to the last one.)
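To see the breakdown in action, here is a small Python sketch applied to the two file names from the question. It assumes a regex flavor where \( and \) are literal parentheses (true for Python's re); the name STAMP is made up for the example.

```python
import re

# \( and \) match literal parentheses; .* matches whatever sits between them.
STAMP = re.compile(r"\(.*\)")

names = [
    "IMG523314(2021-12-05-14-51-25_UTC).jpg",
    "Test run1(2021-08-05-11-32-18_UTC).txt",
]

# Replace the matched "(...)" part with an empty string.
cleaned = [STAMP.sub("", name) for name in names]
print(cleaned)
```

Note that this only computes the new names; actually renaming files on disk is a separate step.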
Yes, it's possible:
a='IMG523314(2021-12-05-14-51-25_UTC).jpg'
echo "${a/\(*\)/}"
and
b='Test run1(2021-08-05-11-32-18_UTC).txt'
echo "${b/\(*\)/}"
Explanation:
the first item is the variable
the second is the pattern to be replaced, \(*\): a shell glob pattern (not a regex) in which * matches any run of characters, so it covers the parentheses and anything inside them
the third is the string we intend to replace the former with (it's empty string in this case)

Why are there extra parenthesis in this regex substitution?

I have code with lines that look like this:
self.request.sendall(some_string)
I want to replace them to look like this:
self.request.sendall(bytes(some_string, 'utf-8'))
This is my current sed command:
sed -i "s/\.sendall\((.*)\)/\.sendall\(bytes\(\1, 'utf-8'\)\)/g" some_file.py
I'm close, but this is the result I'm getting:
self.request.sendall(bytes((some_string), 'utf-8'))
I can't figure out where the extra open and close parentheses are coming from in the group substitution. Does anyone see it? The escaped parentheses are literal and need to be there. The ones around .* are to form a group for later replacement, but it's like they are becoming part of the matched text.
You escaped the wrong set of parentheses; you need to use
sed -i "s/\.sendall(\(.*\))/.sendall(bytes(\1, 'utf-8'))/g" some_file.py
Note: the regex flavor you are using is POSIX BRE, thus,
Capturing groups are set with \(...\)
Literal parentheses are defined with mere ( and ) chars with no escapes
Escaping the parentheses and the dot in the RHS (the replacement) is redundant.
Pattern details:
\.sendall( - a .sendall( string
\(.*\) - Group 1 (\1): any zero or more chars
) - a ) char
.sendall(bytes(\1, 'utf-8')) - RHS, where \1 refers to the Group 1 value.
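For comparison, here is a rough Python sketch of the same substitution. Be aware that Python's re uses the opposite convention from POSIX BRE: there, \( and \) are the literal parentheses and bare ( ) form the capturing group. The function name wrap_in_bytes is made up for the example.

```python
import re

# Python's re is the reverse of POSIX BRE:
# \( and \) are literal parentheses, bare ( ) is the capturing group.
CALL = re.compile(r"\.sendall\((.*)\)")


def wrap_in_bytes(line: str) -> str:
    """Wrap the argument of .sendall(...) in bytes(..., 'utf-8')."""
    # In the replacement, parentheses and the dot are plain text; \1 is group 1.
    return CALL.sub(r".sendall(bytes(\1, 'utf-8'))", line)


print(wrap_in_bytes("self.request.sendall(some_string)"))
```

Whichever flavor you use, the rule is the same: only one of the two paren styles captures, and the captured text never includes the literal parentheses around it.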

regex search and delete everything in brackets () using vim

I have a file text like below:
java-environment-common (3-1)
jdk8-openjdk (8.u172-2)
libart-lgpl (2.3.21-4)
expect (5.45.4-1)
dejagnu (1.6.1-1)
cython2 (0.28.4-1)
python2-pytz (2018.5-1)
python2-sip (4.19.8-1)
I want to remove all the text in brackets (brackets included).
I used %s/\(.*\)//g, but it empties every line.
Eventually I found a command that works: %s/\s\(.*\)//g. But why are the results so different, when the second pattern only adds a \s for the space?
Please tell me the reason, thanks!
You are accidentally creating capture groups by putting a backslash, \, before your brackets. To vim, this means that whatever the regex inside these brackets matches should be saved so the values can be used again in the replacement. It's actually not searching for the brackets at all! What you are actually doing in %s/\s\(.*\)//g is finding a whitespace character with \s followed by any number of any characters, saving these characters for later use and then replacing everything found with nothing (without using the values you saved earlier). This also just so happens to delete your brackets and their contents, but not for the reasons you think.
If you wanted to search for the brackets, delete them and delete everything inside them, the right way would be to not escape your brackets, like this:
:%s/\s(.*)$//g
Here I am telling vim to find a whitespace character, followed by an opening bracket, followed by any number of any characters, followed by a closing bracket at the end of the line ($), then I tell it to replace everything with nothing (indicated by //).
In the non-very magic mode, \(...\) creates a capturing group; \( and \) do not match literal ( and ).
You must be looking for
%s/\s*(.*)//g
After running this command, I get the following output:
java-environment-common
jdk8-openjdk
libart-lgpl
expect
dejagnu
cython2
python2-pytz
python2-sip
To only remove 0+ whitespaces and the following (...) substring at the end of the line you may use
%s/\s*([^()]*)$//g
Where
\s* - 0+ whitespaces
( - a literal ( (in a non-very magic mode)
[^()]* - 0+ chars other than ( and )
) - a literal ) (in a non-very magic mode)
$ - end of the line.
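The difference between the two behaviours can be reproduced outside Vim. The sketch below uses Python's re, where the escaping is reversed compared to Vim's default mode: a bare ( ) is the capturing group (like Vim's \( \)) and \( \) are the literal parentheses (like Vim's bare ( )). The variable names are made up for the example.

```python
import re

lines = [
    "java-environment-common (3-1)",
    "jdk8-openjdk (8.u172-2)",
    "expect (5.45.4-1)",
]

# Analogue of Vim's \(.*\): a capturing group around .* matches the whole line,
# so every line ends up empty.
whole_line_gone = [re.sub(r"(.*)", "", line) for line in lines]

# Analogue of Vim's \s*([^()]*)$: optional whitespace, a literal "(",
# any characters other than parentheses, a literal ")", then end of line.
TRAILING_PARENS = re.compile(r"\s*\([^()]*\)$")
names_only = [TRAILING_PARENS.sub("", line) for line in lines]

print(whole_line_gone)
print(names_only)
```

The first result shows why the original command wiped each line; the second keeps only the package names.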
You can just use:
:%s/\s(.*)//g
Another command
:%norm f(D
% ...... the whole file
f( ..... jump to the next (
D ...... erases till the end of line
Using t( instead of f( we can also delete the spaces
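You can also use the :normal command (more info here: http://vim.wikia.com/wiki/Using_normal_command_in_a_script_for_searching)
In your specific case, you can use the following:
:execute "normal /(\<CR>di("
Explanation of the command: it searches for the next (, then di( deletes everything inside that pair of parentheses, leaving the empty () behind. The search has to be finished with Enter, which is written as \<CR> inside :execute.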

Reg Exp For javascript

Hi everybody,
I need help with a regular expression pattern.
I need to search the text for this:
( anything ) -
Check every statement against this example.
I need to detect whether this pattern exists in the statement that I feed to my function, and get the matched string.
Mind the spaces, the parentheses and the dash. "anything" means any content, Arabic or English, no matter what it is. The pattern just starts with ( and ends with -, and if it appears at the start of the statement, the function should report that it exists.
Thanks, everyone.
The task can be easier if it is described in a way "guiding to"
the proper solution. Let's rewrite your task the following way:
The text to match:
Should start with ( and a space.
Then there is a non-empty sequence of chars other than )
(the text between parentheses, which you wrote as anything).
The last part is a space, ), another space and -.
Having such a description, it is obvious, that the pattern should be:
\( [^)]+ \) -
where each fragment: \(, [^)]+ and \) - expresses each of the
above conditions.
Note: If spaces after ( and before ) are optional, then you can express it
with ? after each such space, and then the whole regex will change to
\( ?[^)]+ ?\) -.
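As a quick way to try the pattern, here is a minimal sketch. It is written in Python rather than JavaScript, but the pattern text itself is the same in both; the function name find_marker is made up for the example, and it uses the variant with optional spaces.

```python
import re

# "(" + optional space + one or more non-")" characters + optional space + ") -"
PATTERN = re.compile(r"\( ?[^)]+ ?\) -")


def find_marker(statement: str):
    """Return the matched '( anything ) -' text, or None if it is absent."""
    match = PATTERN.search(statement)
    return match.group(0) if match else None


print(find_marker("( anything ) - rest of the statement"))
print(find_marker("(مرحبا) - hello"))
print(find_marker("no marker here"))
```

Because [^)]+ accepts any character except ), Arabic and English content are treated the same. If the pattern must sit at the very start of the statement, put ^ in front of it.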

Matching python statement/expression boundaries when searching in pycharm

A common thing I want to do, when doing a search-replace in an IDE (in this case: PyCharm), is to avoid cutting expressions or statements in half.
For example, suppose I want to fix the fact that my code is using python-2-style print statements. I might write:
Search: print (.+), replace: print($1)
But this will do the wrong thing for multi-line statements:
print 'one' \
'two'
In general, recognizing multi-line statements is complicated. You need to check for trailing \s and also do bracket-matching for multiple types of brackets. Is there built-in functionality for doing this? Some kind of end-of-statement / end-of-expression escape sequence?
You could probably do it this way.
Find print((?:.+?(?:\\\r?\n)?)+)
Replace print($1)
Expanded
print
(                        # (1 start)
  (?:
    .+?
    (?: \\ \r? \n )?     # Possible line-continuation
  )+
)                        # (1 end)
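To try the expanded pattern outside the IDE, here is a rough sketch using Python's re in verbose mode, which ignores the whitespace and comments in the pattern. It only covers backslash continuations, not bracket matching, and the sample source text is invented for the example. Python writes the backreference as \1 where the PyCharm replace field above uses $1.

```python
import re

# The expanded pattern from above, compiled in verbose mode.
PRINT_STMT = re.compile(
    r"""
    print
    (                        # (1 start)
      (?:
        .+?
        (?: \\ \r? \n )?     # possible line continuation
      )+
    )                        # (1 end)
    """,
    re.VERBOSE,
)

# A Python 2 style print statement split over two lines with a backslash.
source = "print 'one' \\\n      'two'\n"

converted = PRINT_STMT.sub(r"print(\1)", source)
print(converted)
```

Group 1 keeps the space that followed print, so the result starts with "print( 'one'"; it is still valid Python, but you may want to tidy that up afterwards.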