Search and convert to lower case on vim - regex

I have code containing object.attribute, where attribute can be an array element
(for example object.SIZE_OF_IMAGE[0]) or a simple string. I want to find every occurrence of "object.attribute" and replace it with self.lowercase(attribute), that is, with the attribute name converted to lower case. I want a regular expression in Vim to do that.
I can use :%s/object.*/self./gc and fix each match manually, but that is very slow.
Here are some examples:
object.SIZE to self.size
object.SIZE_OF_IMAGE[0] to self.size_of_image[0]

You basically just need two things:
Capture groups :help /\( let you store what's matched in between \(...\) and then reference it (via \1, \2, etc.) in the replacement (or even afterwards in the pattern itself).
The :help s/\L special replacement action that makes everything following lowercase.
This gives you the following command:
:%substitute/\<object\.\(\w\+\)/self.\L\1/g
Notes:
I've established a keyword start assertion (\<) at the beginning to avoid matching schlobject as well.
\w\+ matches letters, digits, and underscores (so it fulfills your example); various alternatives are possible here.
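For comparison outside Vim, the same capture-and-lowercase idea can be sketched in Python. This is only an illustration under assumptions: the helper name to_self is made up here, and since Python's re module has no \L, a replacement function does the lowercasing instead.

```python
import re

# \b stands in for Vim's \< here; (\w+) captures the attribute name.
PATTERN = re.compile(r"\bobject\.(\w+)")

def to_self(text: str) -> str:
    """Replace object.NAME with self.name (hypothetical helper)."""
    return PATTERN.sub(lambda m: "self." + m.group(1).lower(), text)

print(to_self("object.SIZE"))              # self.size
print(to_self("object.SIZE_OF_IMAGE[0]"))  # self.size_of_image[0]
```

As in the Vim command, the index [0] is left alone because \w+ stops at the bracket.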

sed -E 's/object\.([^ \(]*)(.*)/self.lowercase(\1)\2/g' file_name.txt
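The command above assumes that your attribute is followed by a space or "(". Note that it inserts the literal text lowercase(...) around the attribute rather than converting it; with GNU sed, \L\1 in the replacement does the actual lowercasing.
You can tweak this command based on your needs.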

Based on your comment above that the attribute part
"finishes by space or [ or (" you could match it with:
/object\.[^ [(]*
So, to replace it with self.attribute use a capturing
group and \L to make everything lowercase:
:%s/\vobject\.([^ [(]*)/self.\L\1/g

In the command mode try this
:1,$ s/object.attribute/self.lowercase(attribute)/g

Related

Multiple regex query (Notepad++)

I understand that I can do multiple search-and-replace queries in this way:
Search: (á)|(é)|(í)|(ó)|(ú)
Replace: (?1Á)(?2É)(?3Í)(?4Ó)(?5Ú)
Example:
Before: árbol ácido
After: Árbol ácido
But how can I use this method if I need to add a precondition, such as line start and some punctuation before it?
I’ve tried:
Search: (^[—¿¡«]*?)[(á)|(é)|(í)|(ó)|(ú)] => Works!
Replace: \1(?2Á)(?3É)(?4Í)(?5Ó)(?6Ú) => Doesn’t work…
Example:
Before: —¿árbol ácido?
After: —¿rbol ácido?
Any help?
Regards.
In your second regex, you use a character class (indicated by [...]) - for grouping purposes according to a comment. But that's not how those work. Use non-capturing groups instead, e.g.
(^[—¿¡«]*?)(?:(á)|(é)|(í)|(ó)|(ú))
Instead of creating a group for each character, you can also have all the characters in one group and replace them with their uppercase variant.
search string
(^[—¿¡«]*)([áéíóú])
replacement
$1\U$2
The following can be found in the Notepad++ Regular Expressions documentation:
\U Causes next characters to be output in uppercase, until a \E is found.
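To see the single-group approach outside Notepad++, here is a minimal Python sketch. It assumes the accented vowels are precomposed characters, and the function name capitalize_first_vowel is invented for this example; Python has no \U in replacements, so a function calls upper() on the second group.

```python
import re

# Group 1: optional opening punctuation at line start.
# Group 2: the accented vowel to capitalize.
PATTERN = re.compile(r"^([—¿¡«]*)([áéíóú])", re.MULTILINE)

def capitalize_first_vowel(text: str) -> str:
    """Uppercase an accented vowel that starts a line, keeping the punctuation."""
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).upper(), text)

print(capitalize_first_vowel("—¿árbol ácido?"))  # —¿Árbol ácido?
```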

Regex: Exact match string ending with specific character

I'm using Java. So I have a comma separated list of strings in this form:
aa,aab,aac
aab,aa,aac
aab,aac,aa
I want to use regex to remove aa and the trailing ',' if it is not the last string in the list. I need to end up with the following result in all 3 cases:
aab,aac
Currently I am using the following pattern:
"aa[,]?"
However it is returning:
b,c
If lookarounds are available, you can write:
,aa(?![^,])|(?<![^,])aa,
with an empty string as replacement.
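A quick way to check the lookaround pattern is a small Python sketch (Python rather than Java purely for illustration; remove_aa is a made-up helper name). It also reproduces the b,c result of the original pattern.

```python
import re

# ",aa" not followed by a non-comma, or "aa," not preceded by a non-comma.
LOOKAROUND = re.compile(r",aa(?![^,])|(?<![^,])aa,")

def remove_aa(line: str) -> str:
    """Delete the item aa and one adjacent comma (hypothetical helper)."""
    return LOOKAROUND.sub("", line)

for line in ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa"]:
    print(remove_aa(line))  # aab,aac each time

# The pattern from the question also eats the "aa" inside aab and aac:
print(re.sub(r"aa[,]?", "", "aa,aab,aac"))  # b,c
```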
Otherwise, with a POSIX ERE syntax you can do it with a capture:
^(aa(,|$))+|(,aa)+(,|$)
with the 4th group as replacement (so $4 or \4)
Without knowing your regex flavor, I propose this solution, which assumes the flavor supports \b.
I use perl as demo environment and do a replace with "_" for demonstration.
perl -pe "s/\baa,|,aa\b/_/"
\b is the "word boundary" anchor, i.e. it matches at the start or end of anything that looks like a word. This lets it handle line end, line start, blank, and comma.
Using it, two alternatives suffice to cover all the cases in your sample input.
Output (with interleaved input, for both lines ending in a newline and lines ending in a blank):
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
aa,aab,aac
_aab,aac
aab,aa,aac
aab_,aac
aab,aac,aa
aab,aac_
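The same two \b alternatives can be tried in Python, here deleting the match instead of substituting "_". This is a sketch only; remove_aa_wordboundary is a name invented for the example.

```python
import re

def remove_aa_wordboundary(line: str) -> str:
    """Delete 'aa,' at a word start or ',aa' at a word end."""
    return re.sub(r"\baa,|,aa\b", "", line)

for line in ["aa,aab,aac", "aab,aa,aac", "aab,aac,aa", "aab,aac,aa "]:
    print(repr(remove_aa_wordboundary(line)))
```

The last input ends in a blank, which \b handles the same way as the end of the line.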
If the \b is unknown in your regex engine, then please state which one you are using, i.e. which tool (e.g. perl, awk, notepad++, sed, ...). Also in that case it might be necessary to do replacing instead of deleting, i.e. to fine tune a "," or "" as replacement. For supporting that, please show the context of your regex, i.e. the replacing mechanism you are using. If you are deleting, then please switch to replacing beforehand.
(I picked up a suggestion from a comment by gisek that the capturing groups are not needed. I usually use () generously, including in other syntaxes. In my opinion, not having to think about or look up evaluation order is a net benefit in time spent and risk taken. But after testing, I use this terser, more elegant form.)
If your regex engine supports positive lookaheads and positive lookbehinds, this should work:
,aa(?=,)|(?<=,)aa,|(,|^)aa(,|$)
You could probably use the following and replace it with nothing:
(aa,|,aa$)
Either aa, when it's at the beginning or in the middle of the string,
or ,aa$ when it's at the end of the string.
As you want to delete aa followed by a comma or the end of the line, this should do the trick: ,aa(?=,|$)|^aa,

Notepad++ and delimiters: automatically replace ``string'' by \command{string}

Within Notepad++, I want to replace many instances of the type ``string'' by \command{string} where string can be any string of characters. I am fairly close to what I want to achieve with:
Find: (?<=``)(.*?)(?='')
Replace: \\command{\1}
There is still a problem. With the regex code above, instead of \command{string} I get ``\command{string}'' and I am not sure why the `` and '' are not removed?
It is because you are using lookaround assertions. Lookaround (zero-width) assertions only assert that a position can be matched and do not "consume" any characters on the string. You can use the below regular expression.
Find: ``([^']+)''
Replace: \\command{\1}
You need to match the delimiters themselves and put only the inner text into a capture group, then use that group in the replacement. You don't need lookahead/lookbehind for this specific case anyway:
``([^']+)'' -> \\command{\1}
Using [^']+ rather than .* makes sure that a single match does not span two commands (the longest match) in something like:
run ``ls -l'' or ``ls -a''
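The difference between consuming the delimiters and merely asserting them can be illustrated with a short Python sketch. It is not Notepad++ itself, and the function names are invented for the example; the patterns are the ones from the question and the answers.

```python
import re

def to_command(s: str) -> str:
    """Consume the `` and '' delimiters; keep only the captured text."""
    return re.sub(r"``([^']+)''", r"\\command{\1}", s)

def with_lookarounds(s: str) -> str:
    """Lookarounds only assert the delimiters, so they stay in the output."""
    return re.sub(r"(?<=``)(.*?)(?='')", r"\\command{\1}", s)

print(to_command("run ``ls -l'' or ``ls -a''"))
# run \command{ls -l} or \command{ls -a}
print(with_lookarounds("``string''"))
# ``\command{string}''
```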

Select last character of a substring in regexp

I'm trying to clean a huge geoJson datafile. I need to change the format of "text" field from
"text": "(2:Placename,Placename)"
to
"text": "Placename".
In Sublime text I managed to write a regular expression which enabled me to select and remove the first part leaving something like this:
"text": "Placename)"
With following regexp I can select the text above, but I need to narrow it down to the last character:
text\": \".*?\)
No matter what I can't figure out how to select the ")" character in the end of Placename string in the whole file and remove it. Note that the "Placename" here can be any place name, like New York, London etc.
I tried to build an expression where first part finds the text field, then ignores n-amount of characters until it finds the ")" character.
After experimenting and Googling I couldn't find a solution here.
You can capture the value of the second placemark field with the following regexp:
/"text": "+\(\d+:[^,]+,(.*?)\)/
Which will capture "Placename" in $1
More info on capturing parenthesis: http://www.regular-expressions.info/brackets.html
The trick is to use the inverted character classes and to escape any parentheses you want to match.
HTH
I do not know if you are using a Unix system, but probably sed can do much of the work for you. It can interpret regular expressions, capture groups, and substitute by other groups of characters. I have tried an example with sed and the following sed command worked for me:
echo "\"text\": \"(2:Placename,Placename)\"" | sed -r 's/(\"text\": )\"\([[:digit:]]:[^0-9]+,([^0-9]+)\)\"/\1\"\2\"/g'
-r tells GNU sed to use extended regular expressions, so that ( ), + and similar operators work without backslashes. I am using parentheses to capture groups that I will use later in the substitution (e.g., a group for "text", and a group for the second placename). In the substitution part of sed, you can refer to a group with \n, where n is the number of the group that you want to use. This expression should help you to achieve your desired result.
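If neither the editor nor sed is convenient, the same capture-and-substitute idea can be sketched in Python. This assumes every field looks like "(digits:Name,Name)" as in the question; clean_text_field is a hypothetical helper name.

```python
import re

# Group 1 keeps the '"text": ' prefix; group 2 is the second place name.
# The literal parentheses are escaped, as noted above.
PATTERN = re.compile(r'("text": )"\(\d+:[^,]+,(.*?)\)"')

def clean_text_field(line: str) -> str:
    """Rewrite "text": "(2:Name,Name)" as "text": "Name"."""
    return PATTERN.sub(r'\1"\2"', line)

print(clean_text_field('"text": "(2:New York,New York)"'))
# "text": "New York"
```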

How to distinguish between saved segment and alternative?

From the following text...
Acme Inc.<SPACE>12345<SPACE or TAB>bla bla<CRLF>
... I need to extract company name + zip code + rest of the line.
Since either a TAB or a SPACE character can separate the second from the third tokens, I tried using the following regex:
FIND:^(.+) (\d{5})(\t| )(.+)$
REPLACE:\1\t\2\t\3
However, the contents of the alternative part is put in the \3 part, so the result is this:
Acme Inc.<TAB>12345<TAB><TAB or SPACE here>$
How can I tell the (Perl) regex engine that (\t| ) is an alternative instead of a token to be saved in RAM?
Thank you.
You want:
^(.+?) (\d{5})[\t ](.+)$
Since you are matching one character or the other, you can use a character class instead. Also, I made your first quantifier non-greedy (+? instead of +) to reduce the amount of backtracking the engine has to do to find the match.
In general, if you want to make capture groups not capture anything, you can add ?: to it, like:
^(.+?) (\d{5})(?:\t| )(.+)$
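The effect on group numbering can be checked with a small Python sketch (the question is about Perl, but the group syntax used here is the same in Python's re module; retab is an invented helper name).

```python
import re

ORIGINAL = re.compile(r"^(.+) (\d{5})(\t| )(.+)$")             # separator is group 3
CHAR_CLASS = re.compile(r"^(.+?) (\d{5})[\t ](.+)$")           # separator not grouped
NON_CAPTURING = re.compile(r"^(.+?) (\d{5})(?:\t| )(.+)$")     # grouped, not captured

def retab(line: str, pattern=CHAR_CLASS) -> str:
    """Join company name, zip code and rest of line with tabs."""
    m = pattern.match(line)
    return "\t".join(m.groups()) if m else line

print(ORIGINAL.groups, CHAR_CLASS.groups, NON_CAPTURING.groups)  # 4 3 3
print(repr(retab("Acme Inc. 12345\tbla bla")))
# 'Acme Inc.\t12345\tbla bla'
```

With either fixed pattern the rest of the line is group 3, so a replacement of \1\t\2\t\3 behaves as intended.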
Use non-capturing parentheses:
^(.+) (\d{5})(?:\t| )(.+)$
Another way is to use \s instead of ( |\t); \s matches any whitespace character.
See Backslash-sequences for how Perl defines "whitespace".