I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g
Replace all spaces with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend your replacement with \= so that it is treated as a vim expression. From there you can extract just the parts between quotes and then manipulate the matched part separately. This requires two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
Here ".\{-}" matches anything in quotes (non-greedy), and substitute(submatch(0), ',', '', 'g') takes what was matched and removes all of the commas. Its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
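If you want to try the same idea outside of vim, here is a minimal sketch in Python. It assumes the quoted fields never contain escaped quotes, and the function name strip_commas_in_quotes is made up for illustration. Passing a function to re.sub plays roughly the role that \= and submatch(0) play in vim.

```python
import re

def strip_commas_in_quotes(line: str) -> str:
    """Remove commas inside double-quoted regions only (hypothetical helper)."""
    # ".*?" is the non-greedy counterpart of vim's ".\{-}"
    return re.sub(r'".*?"', lambda m: m.group(0).replace(",", ""), line)

print(strip_commas_in_quotes('57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200'))
# prints: 57,13,"Bob Bill and Susan",Student,Club,Funded,64,3200
```

The commas that separate the fields are untouched because the outer pattern only ever hands the quoted part to the inner replacement.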
As for the other parts of your question: Step 1 is essentially trying to remove all carriage returns, since the file is actually in the DOS file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. All you need to do is match the quotes and replace them with nothing:
:%s/"//g
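To see how the four steps fit together, here is a rough sketch of the whole cleanup for a single line, written in Python rather than vim. The name clean_line is hypothetical, and the sketch assumes there are no escaped quotes inside fields.

```python
import re

def clean_line(line: str) -> str:
    """Apply the four steps from the question to one line (illustrative only)."""
    # Step 1: drop the trailing carriage return (the ^M) and any newline
    line = line.rstrip("\r\n")
    # Step 3: remove commas, but only inside double-quoted regions
    line = re.sub(r'".*?"', lambda m: m.group(0).replace(",", ""), line)
    # Step 2: replace every space with a hyphen
    line = line.replace(" ", "-")
    # Step 4: remove the quotes themselves
    return line.replace('"', "")

print(clean_line('57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200\r'))
# prints: 57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
```

The only ordering that matters is that the commas inside quotes are removed before the quotes are, since the quotes are what mark the region.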
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
Example: "Bob, Bill and Susan"
\("\w*\) matches the opening quote and any word characters that follow it; this is group \1 for the back reference
\(,\) captures the comma as group \2
\(.*"\) matches every remaining character up to the last quote on the line; this is group \3
:\1\3: puts back only groups 1 and 3, which discards the comma held in group 2
Note that \w does not match spaces, so this only removes a comma that directly follows the first word after the opening quote, and it removes only one comma per line on each run.
In my script, I'm passing in a markdown file and, using sed, I'm trying to find lines that do not start with one or more # and are not empty, and then surround those lines with <p></p> tags.
My reasoning:
^[^#]+ At beginning of line, find lines that do not begin with 1 or more #
.\+ Then find lines that contain one or more character (aka not empty lines)
Then replace the matched line with <p>\1</p>, where \1 represents the matched line.
However, I'm getting "\1 not defined in the RE". Is my reasoning above correct and how do I fix this error?
BODY=$(sed -E 's/^[^#]+.\+/<p>\1</p>/g' "$1")
Backslash followed by a number is replaced with the match for the Nth capture group in the regexp, but your regexp has no capture groups.
If you want to replace the entire match, use &:
BODY=$(sed -E 's%^[^#].*%<p>&</p>%' "$1")
You don't need to use .+ to find non-empty lines -- the fact that it has a character at the beginning that doesn't match # means it's not empty. And you don't need + after [^#] -- all you care about is that the first character isn't #. You also don't need the g modifier when the regexp matches the entire line -- that's only needed to replace multiple matches per line.
And since your replacement string contains /, you need to either escape it or change the delimiter to some other character.
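As a cross-check of the same logic outside of sed, here is a small Python sketch. The function name wrap_paragraphs is made up, and \g<0> is Python's spelling of sed's &. One difference to keep in mind: sed works line by line, while this runs over the whole text, so the character class also has to exclude the newline.

```python
import re

def wrap_paragraphs(markdown: str) -> str:
    """Wrap non-empty lines that do not start with '#' in <p></p> (illustrative)."""
    # ^[^#\n] : the line has a first character, and it is not '#'
    # .*      : the rest of that line
    return re.sub(r'^[^#\n].*', r'<p>\g<0></p>', markdown, flags=re.MULTILINE)

print(wrap_paragraphs("# Title\n\nSome text\n## Sub\nMore"))
# prints:
# # Title
#
# <p>Some text</p>
# ## Sub
# <p>More</p>
```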
I need to run advanced find and replace using regex. I have a CSV similar to the following:
"Item 1a,,,,
,,Item 1b,,,,
,,Item 1c"
"Item 2a,,,,
,,Item 2b,,,,"
I need to remove the trailing commas for lines that start with a " quote.
I can match the correct lines like so:
(".*?),,,,$
The problem is, that selects the entire row, rather than just the trailing commas.
Anybody know how to match this correctly? so that only the commas are matched, on lines that start with " quote.
You are already capturing the content before the commas, just put it all back using a back reference in the replace:
Search: ^(".*?),+$
Replace: $1
Note: You need to anchor your regex to the start of the line with ^ to match a quote there (otherwise it will match a quote anywhere in the line).
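If it helps to test the search and replace pair before running it on the real file, here is a minimal Python sketch. The function name strip_trailing_commas is made up; \1 is Python's form of $1, and re.MULTILINE is assumed so that ^ and $ apply to each line.

```python
import re

def strip_trailing_commas(text: str) -> str:
    """Drop trailing commas on lines that start with a double quote (illustrative)."""
    # (".*?) keeps the quote and the content; ,+$ is the run of trailing commas
    return re.sub(r'^(".*?),+$', r'\1', text, flags=re.MULTILINE)

sample = '"Item 1a,,,,\n,,Item 1b,,,,\n,,Item 1c"'
print(strip_trailing_commas(sample))
# prints:
# "Item 1a
# ,,Item 1b,,,,
# ,,Item 1c"
```

Only the first line changes, because it is the only one that starts with a quote.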
I have an xml file that has a value like
JOBNAME="JBDSR14353_Some_other_Descriptor"
I am looking for an expression that will go through the file and change all of the characters in the quotes to uppercase letters. Is there a regex that will search for JOBNAME="Anything within the quotes" and change it to uppercase? Or a command that will find JOBNAME= and change everything on that line to uppercase letters? I know that I can just search for JOBNAME=, use a VU command in vim to uppercase the line, store that in a macro and run it, but I was wondering if there was a way to get this done with a regex?
Here's an alternative with :substitute, as you had originally intended. This works better than @Zach's solution with gU_ when there's other text in the line:
:%s/JOBNAME="[^"]\+"/\U&/g
"[^"]\+" matches the quoted text (non-greedily by matching only non-quotes inside, to handle multiple quotes in a line)
\U turns the remainder of the replacement uppercase
for simplicity, the entire match (&) is uppercased here, but one could have also used capture groups (\(...\)), or match limiting with \zs
You can use the :g command which executes a command on lines that match a pattern:
:g/JOBNAME/norm! gU_
This will execute gU_, which capitalizes all letters on a line, on every line that matches JOBNAME.
If there are other things on the same line that you don't want to capitalize, here is a solution for only the words in quotes:
:g/JOBNAME/norm! f"gU;
f" goes to the next quote. gU capitalizes with a motion. The motion used is ; which searches for the next " (repeats the last f command).
To do this with substitution you can use the \U atom which makes everything after it uppercase.
:%s/JOBNAME="\zs.*\ze"/\U&
\zs and \ze mark the start and end of the match and & is the whole match. This means that only the part between quotes is replaced.
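For comparison, the same "only uppercase what is between the quotes" idea can be sketched in Python. The function name upper_jobname is hypothetical. A lookbehind stands in for \zs here, and matching [^"]+ instead of .* means the match stops at the closing quote without needing \ze.

```python
import re

def upper_jobname(text: str) -> str:
    """Uppercase only the value inside JOBNAME="..." (illustrative sketch)."""
    # (?<=JOBNAME=") requires the prefix without making it part of the match
    return re.sub(r'(?<=JOBNAME=")[^"]+', lambda m: m.group(0).upper(), text)

print(upper_jobname('<JOB JOBNAME="JBDSR14353_Some_other_Descriptor" OWNER="bob"/>'))
# prints: <JOB JOBNAME="JBDSR14353_SOME_OTHER_DESCRIPTOR" OWNER="bob"/>
```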
I'm trying to use a regex search and replace to find any unescaped quotation marks and replace them with escaped quotation marks. This is not in any particular language - just using regex to search and replace in Sublime Text 2.
I can find them just fine with this regex:
([a-zA-Z0-9!@#$%^&*()_+=-\?><:;\/])\"
Trying to replace is giving me some headaches. I thought this would work:
$0\\\"
but it's adding an extra quote in (or leaving the previous one there somehow).
e.g.,
e"
becomes
e"\"
instead of just
e\"
What the hey? I can't seem to find a combination in the replacement that will work!
In the replacement, $0 is a reference to the entire match, including the quote. It looks like you should be using $1 instead, which is the first capturing group, so just the character immediately before the quote. So your replacement string would be $1\\\".
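The difference between the whole match and the first group is easy to reproduce in a few lines of Python. This is only a sketch: the character class is simplified to letters and digits, the function names are made up, and Python writes $0 and $1 as \g<0> and \1.

```python
import re

# Simplified pattern for illustration: one letter or digit, then a quote
PATTERN = r'([A-Za-z0-9])"'

def escape_with_whole_match(s: str) -> str:
    # \g<0> is the entire match, which already includes the quote
    return re.sub(PATTERN, r'\g<0>\\"', s)

def escape_with_group_one(s: str) -> str:
    # \1 is only the character before the quote
    return re.sub(PATTERN, r'\1\\"', s)

print(escape_with_whole_match('e"'))  # prints: e"\"
print(escape_with_group_one('e"'))    # prints: e\"
```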
I am trying to remove the following function from all my working files, while leaving the first argument intact. The second argument changes every time.
dotranslate( "Arg1", "Arg2" )
I am trying to do this using Notepad++, but I just can't seem to get it right.
If your strings can contain escaped quotes, this will be quite difficult. If not, you can go with this:
Find what: dotranslate\(\s*("[^"]*")\s*,\s*"[^"]*"\s*\)
Replace with: $1
So this will match dotranslate(, then optional spaces, then capture a string. The string is written as "[^"]*", that is: a quote, as many non-quotes as possible, and a quote again. After that we just match spaces, comma, spaces, string, spaces, closing parenthesis.
We then replace all of that with what we captured in the first (and only) set of unescaped parentheses, which is the first string.
If Arg1 should not be variable, simply write the specific value into the capturing group.
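To check the pattern against a few sample lines before running it across all files, here is a minimal Python sketch. The name unwrap_dotranslate is made up, \1 is Python's form of $1, and, as above, strings with escaped quotes are assumed not to occur.

```python
import re

# Same pattern as in "Find what", with one capturing group for the first argument
DOTRANSLATE = re.compile(r'dotranslate\(\s*("[^"]*")\s*,\s*"[^"]*"\s*\)')

def unwrap_dotranslate(source: str) -> str:
    """Replace each dotranslate("a", "b") call with its first argument (illustrative)."""
    return DOTRANSLATE.sub(r'\1', source)

print(unwrap_dotranslate('echo dotranslate( "Arg1", "Arg2" );'))
# prints: echo "Arg1";
```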