Vim regex for matching strings - regex

How can I create a regex for Vim that matches multiple double-quoted strings on one line without matching the text in between two double-quoted strings? A restriction on the pattern is that the double-quoted strings can't contain a single quote. So far I have come up with /"\([^']\{-}\)"/ to match the strings below. But as you can see, it will match the text in between the strings on the second line. I can't rely on white space surrounding the strings, as you can see in the third line. And of course it needs to work with the fourth line as well.
"cat" is called "foo"
"cat's" name is "foo"
x="cat's food"
x = "cat"

Basically, I want to retrieve the contents from within the double quotes so that I can replace them with single quotes. It goes without saying that I do not want to replace double quotes with single quotes when there is a single quote inside.
I didn't find a way to write a simple regex to match your need, but with vim's :s there is a way:
%s/\v"([^"]*)"/\=stridx(submatch(1),"'")>=0?submatch(0):"'".submatch(1)."'"/g
After you execute the above line, your example text will be changed into:
'cat' is called 'foo'
"cat's" name is 'foo'
x="cat's food"
x = 'cat'
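The same conditional replacement can be sketched outside Vim. The following is only an illustration in Python, not part of the original answer; the function name swap_quotes is made up for the example. As in the :s command, the replacement is computed per match: a double-quoted string is left alone if it contains a single quote, otherwise its quotes are swapped.

```python
import re

def swap_quotes(line: str) -> str:
    """Turn "text" into 'text' unless text itself contains a single quote."""
    def repl(match: re.Match) -> str:
        inner = match.group(1)
        # Keep the whole match untouched when a single quote is inside.
        if "'" in inner:
            return match.group(0)
        return "'" + inner + "'"
    return re.sub(r'"([^"]*)"', repl, line)

for line in ['"cat" is called "foo"', '"cat\'s" name is "foo"',
             'x="cat\'s food"', 'x = "cat"']:
    print(swap_quotes(line))
```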

I'm not exactly sure I understand what you need here, but
/\("\([^"]*'[^"]*\)\)\@<!\("\([^"^']*\)"\)
matched all the strings from your example that are in double quotes, but not those containing single quotes.
"cat" is called "foo" => "cat", "foo" highlighted
"cat's" name is "foo" => "foo" highlighted
x="cat's food" => nothing highlighted
x = "cat" => "cat" highlighted
[Highlighted here means: found by the regex search in Vim when the command above is entered.]
This uses the \@<! construct, which is Vim regex syntax for negative look-behind (see :help /\@<! in the Vim manual). It matches a double-quoted string only if it is not preceded by a double quote followed by text containing a single quote.
This still has trouble if a single quote exists outside of double quotes though. I don't know if that's a problem; let me know if it is.

Related

Regex to match text between single, double and triple quotes

I have a text file that I want to parse strings from. The thing is that there are strings enclosed in either single ('), double (") or 3x single (''') quotes within the exact same file. The best result I was able to get so far is to use this:
((?<=["])(.*?)(?=["]))|((?<=['])(.*?)(?=[']))
to match only single-line strings between single and double quotes. Please note that the strings enclosed in each type of quote can be either single-line or multi-line, and that each type of string repeats several times within the file.
Here's a sample string:
<thisisthefirststring
'''- This is the first line of text
- This is the second line of text
- This is the third line of text
'''
>
<thisisanotheroption
"Just a string between quotes"
>
<thisisalsopossible
'Single quotes
Multiple lines.
With blank lines in between
'
>
<lineBreaksDoubleQoutes
"This is the first sentence here
After the first sentence, comes the blank line, and then the second one."
>
Use this:
((?:'|"){1,3})([^'"]+)\1
Using the group reference \1, you can simplify your work
Also, to get only what is inside of the quotes, use the 2nd group of the match
This regex: ('{3}|["']{1})([^'"][\s\S]+?)\1
does what you want.
Using Notepad++, you can use: ('''|'|")((?:(?!\1).)+)\1
Explanation:
('''|'|") : group 1, all types of quote
( : group 2
(?:(?!\1).)+ : any thing that is not the quote in group 1
) : end group 2
\1 : back reference to group 1 (i.e. same quote as the beginning)
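The pattern explained above is not specific to Notepad++. As a rough sketch, here is how it might be applied in Python, whose re module supports the same backreference and negative lookahead. The names QUOTED and quoted_strings are invented for this example, and re.DOTALL is added (an assumption about the desired behaviour) so that . also matches newlines and multi-line strings are found.

```python
import re

# Group 1: the opening quote (''' is tried first so it wins over ').
# Group 2: any characters that do not start another copy of group 1.
QUOTED = re.compile(r"('''|'|\")((?:(?!\1).)+)\1", re.DOTALL)

def quoted_strings(text: str) -> list[str]:
    """Return the contents of every quoted string, without the quotes."""
    return [m.group(2) for m in QUOTED.finditer(text)]

sample = (
    "<first\n'''- line one\n- line two\n'''\n>\n"
    "<second\n\"Just a string between quotes\"\n>\n"
    "<third\n'Single quotes\n\nWith blank lines\n'\n>"
)
for content in quoted_strings(sample):
    print(repr(content))
```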
Here's something that may work for you.
^(\"([^\"\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"|'([^'\n\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*'|\"\"\"((?!\"\"\")[^\\]|\\[abfnrtv?\"'\\0-7]|\\x[0-9a-fA-F])*\"\"\")$
Replace the triple double quotes with triple single quotes. See it in action at regex101.com.
Named Group Version
Avoids problems when used in larger expressions by explicitly referring to the name of the group storing the last found quote.
Should work for most systems:
(?<Qt>'''|'|")(.*?)\k<Qt>
.NET version:
(?<Qt>'''|'|"")(.*?)\k<Qt>
Works as follows:
'''|'|": Check first for ''', then ', and finally ". Done in this order so ''' has priority over '.
(?<Qt>'''|'|""): When matched, place the match in <Qt> for later use.
(.*?): Capture the results of a lazy search for 0 or more of anything .*? - will return empty strings. To prevent empty strings from being returned, change to a lazy search for 1 or more of anything .+?.
\k<Qt>: Search for the value last stored in <Qt>.
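For comparison, Python spells named groups slightly differently: (?P<Qt>...) defines the group and (?P=Qt) refers back to it. The sketch below is an illustration only; the names NAMED and quoted_strings_named are made up, and re.DOTALL is an added assumption so that multi-line strings are matched too.

```python
import re

# (?P<Qt>...) stores the opening quote; (?P=Qt) requires the same quote to close.
NAMED = re.compile(r"(?P<Qt>'''|'|\")(.*?)(?P=Qt)", re.DOTALL)

def quoted_strings_named(text: str) -> list[str]:
    """Return the lazily matched contents between matching quotes."""
    return [m.group(2) for m in NAMED.finditer(text)]

print(quoted_strings_named("'''x''' and \"y\" and 'z'"))
```

Because the inner part is .*?, an empty pair of quotes yields an empty string; switching to .+? avoids that, as described above.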

Using regex to find a double quote within string encased in double quotes

I am using UltraEdit with regex. I would like to find (and replace) any embedded double quotes found within a string that starts and ends with a double quote. This is a text file with the pipe | as the delimiter.
How do I find the embedded double quotes:
"This string is ok."|"This is example with a "C" double quoted grade in middle."|"Next line"
I eventually need to replace the double quotes in "C" to just have C.
The big trade off in CSV is correct parsing in every case versus simplicity.
This is a reasonably moderate approach. If you have really wily strings with quotes next to pipes in them, you had better use something like Perl and Text::CSV.
There is a bother with a regex that requires a non-pipe character on each side of the quote (such as [^|]) in that the parser will absorb the C and then won't find the other quote next to the C.
This example will work pretty well as long as you don't have pipes and quotes next to each other in your actual CSV strings. The lookaheads and behinds are zero-width, so they do not remove any additional characters besides the quote.
(?<!^)(?<!\|)"(?!\|)(?!$)
1. (?<!^) Don't match quotes at the beginning of the line.
2. (?<!\|) Don't match quotes with a pipe in front.
3. (?!\|) Don't match quotes with a pipe afterwards.
4. (?!$) Don't match quotes at the end of a string.
Every quote thus matched can be removed. Don't forget to specify global replacement to get all of the quotes.
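As a quick way to check the lookaround pattern outside the editor, here is a small Python sketch. It is illustrative only: the function name remove_embedded_quotes is invented, and re.MULTILINE is added on the assumption that the input may contain several lines.

```python
import re

# A quote that is not at line start, not right after a pipe,
# not right before a pipe, and not at line end.
EMBEDDED = re.compile(r'(?<!^)(?<!\|)"(?!\|)(?!$)', re.MULTILINE)

def remove_embedded_quotes(text: str) -> str:
    """Delete every embedded double quote; field-delimiting quotes stay."""
    return EMBEDDED.sub("", text)

line = ('"This string is ok."|"This is example with a "C" double quoted '
        'grade in middle."|"Next line"')
print(remove_embedded_quotes(line))
```

re.sub replaces every match by default, which corresponds to the global replacement mentioned above.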
Try this find:
(["][^"]*)["]C["]([^"]*["])
and replace:
\1C\2
Turn on Regular Expressions in Perl mode.
Tested with UltraEdit Professional Text/HEX Editor, version 21.30.0.1005.
Trying it out.
Start with:
"This string is ok."|"This is example with a "C" double quoted grade in middle."|"Next line"
"This string is ok."|"This is example with a C double quoted grade in middle."|"Next line"
Ends with:
"This string is ok."|"This is example with a C double quoted grade in middle."|"Next line"
"This string is ok."|"This is example with a C double quoted grade in middle."|"Next line"
Breakdown of the regex FIND.
First part.
(["][^"]*)
from (["][^"]*)["]C["]([^"]*["])
This looks for a sequence of:
Double quote: ["].
Any number of characters that are not double quotes: [^"]*
The brackets that surround ["][^"]* indicate that the regex engine should store this sequence of characters so that the REPLACE part can refer back to it (as back references).
Note that this is repeated at the start and end - meaning that there are two sequences stored.
Second part.
["]C["]
from (["][^"]*)["]C["]([^"]*["])
This looks for a sequence of:
Double quote: ["].
The capital letter C (which may or may not stand for Cookies).
Double quote: ["].
Breakdown of the regex REPLACE.
\1C\2
\1 is a back reference that means replace this with the first sequence saved.
The capital letter C (which may or may not stand for Cookies).
\2 is a back reference that means replace this with the second sequence saved.
For the example you gave, just "\w" works as the regex to find "C".
The replacing mechanism is probably built into UltraEdit.
You really don't want to do this with regex. You should use a CSV parser that can understand pipe delimiters. If I were to do this with just regex, I would use multiple replacements like this:
Find and replace the good quotes with placeholder text. Start/end quote:
s/(^"|"$)/QUOTE/g
Quotes near pipe delimiters:
s/"\|"/DELIMITER/g
Now only embedded double quotes remain. To delete all of them:
s/"//g
Now put the good quotes back. The delimiter placeholder has to restore the pipe as well as the two quotes around it:
s/QUOTE/"/g
s/DELIMITER/"|"/g
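The placeholder steps can be sketched in Python as follows. This is a minimal illustration rather than a robust CSV fixer: the function name strip_embedded_quotes is invented, it assumes the words QUOTE and DELIMITER never occur in the data, and it restores DELIMITER to the full "|" sequence so that the pipe between fields survives.

```python
import re

def strip_embedded_quotes(line: str) -> str:
    """Remove double quotes that are neither at the line ends nor next to a pipe."""
    # 1. Protect the quote at the start and at the end of the line.
    s = re.sub(r'^"|"$', "QUOTE", line)
    # 2. Protect the quote-pipe-quote sequence between two fields.
    s = s.replace('"|"', "DELIMITER")
    # 3. Only embedded quotes remain; delete them.
    s = s.replace('"', "")
    # 4. Put the protected quotes (and the pipe) back.
    return s.replace("DELIMITER", '"|"').replace("QUOTE", '"')

line = ('"This string is ok."|"This is example with a "C" double quoted '
        'grade in middle."|"Next line"')
print(strip_embedded_quotes(line))
```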
nanny posted a good solution, but for a Perl script, not for usage in a text editor like UltraEdit.
In general it is possible to have double quotes within a field value. But each double quote must be escaped with one more double quote. This is explained for example in Wikipedia article about comma-separated values.
This very simple escaping algorithm makes reading in a CSV file character by character coded in a programming language very easy. But double quotes, separators and line breaks included in a double quoted value are a nightmare for a regular expression find and replace in a CSV file.
I have recorded several replaces into an UltraEdit macro:
InsertMode
ColumnModeOff
Top
PerlReOn
Find MatchCase RegExp "^"|"$"
Replace All "QuOtE"
Find MatchCase ""|"
Replace All "QuOtE|"
Find MatchCase "|""
Replace All "|QuOtE"
Find MatchCase """"
Replace All "QuOtEQuOtE"
Find MatchCase """
Replace All """"
Find MatchCase "QuOtE"
Replace All """
The first replace is a Perl regular expression replace. Each double quote at beginning or end of a line is replaced by the string QuOtE by this replace. I'm quite sure that QuOtE does not exist in the CSV file.
Each double quote before and after the pipe character is also replaced by QuOtE by the next 2 non regular expression replaces.
Escaped double quotes "" in the CSV file are replaced next by QuOtEQuOtE with a non regular expression replace.
Now the remaining single double quotes are replaced by two double quotes to make them valid in CSV file. You could of course also remove those single double quotes.
Finally, all QuOtE are replaced back to double quotes.
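For readers without UltraEdit, the sequence of replaces described above can be approximated in Python. This is a sketch under the same assumption as the macro (the string QuOtE does not occur in the file); the names PLACEHOLDER and escape_embedded_quotes are made up for the example.

```python
import re

PLACEHOLDER = "QuOtE"  # assumed never to occur in the data

def escape_embedded_quotes(line: str) -> str:
    """Double every embedded quote while leaving delimiting quotes alone."""
    s = re.sub(r'^"|"$', PLACEHOLDER, line)      # quotes at line start/end
    s = s.replace('"|', PLACEHOLDER + "|")       # quote before a pipe
    s = s.replace('|"', "|" + PLACEHOLDER)       # quote after a pipe
    s = s.replace('""', PLACEHOLDER * 2)         # already escaped quotes
    s = s.replace('"', '""')                     # escape what is left
    return s.replace(PLACEHOLDER, '"')           # restore protected quotes

line = ('"This string is ok."|"This is example with a "C" double quoted '
        'grade in middle."|"Next line"')
print(escape_embedded_quotes(line))
```

Like the macro, this version doubles the remaining quotes; replacing them with an empty string instead would remove them.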
Note: This is not the ultimate solution. These replaces can still produce a wrong result, for example for an already valid CSV line like this one
"first value with separator ""|"" included"|second value|"third value again with separator|"|fourth value contains ""Hello!"""|fifth value
as the result is
"first value with separator """|""" included"|second value|"third value again with separator|"|fourth value contains ""Hello!"""|fifth value
PS: The valid example line above should be displayed in a spreadsheet application as
first value with separator "|" included second value third value again with separator| fourth value contains "Hello!" fifth value

How can I strip double quotes and braces from my strings before insert in Rails4?

I am parsing values from XML and saving them to variables. I was able to strip all but the braces and double quotes from the string. The value displays like this on the page: ["MPEG Video"].
Here is an example of the parse saving it to a variable:
@video_format = REXML::XPath.each(media_parse_doc, "//track[@type='Video']/Format/text()") { |element| element }
I tried using .ts like this:
@video_format = (REXML::XPath.each(media_parse_doc, "//track[@type='Video']/Format/text()") { |element| element } ).ts('[]"','')
but it did not work. I saw some examples saying to use gsub, and I looked at the APIdock page for gsub, but I do not understand the logic in the examples well enough to apply it correctly to my own case. Here is one of the examples:
"foobar".gsub(/^./, "") # => "oobar"
I understand it is removing the first character, but I don't know how to set it up to remove " and [.
Why the /^? Is that ASCII for something? Can someone please show me the correct syntax to remove the double quotes and braces from my variables, and explain the logic so I can better understand how to use it on my own in the future?
Thank you for the help!
If you want to understand regular expressions, check out http://rubular.com/.
"foobar".gsub(/^./, "") # => "oobar" That particular example substitutes the first character of the string with "" (i.e., nothing). The reason is that the ^ says "pin the match to the beginning of the string", and the . says "match any character", so it will match any character at the beginning of the string. The enclosing / characters are just the standard delimiters for a regular expression, so it's only the ^. that you need to figure out.
To replace double quotes: 'fo"o"bar'.gsub(/"/, "") # => "foobar"
To replace left square brackets: 'fo[o[bar'.gsub(/\[/, "") # => "foobar" (because square brackets are special characters in regex, you have to prefix them with a \ when you want to use them as 'normal' characters).
To replace all quotes and square brackets in one go: 'fo[o"[b]"ar'.gsub(/("|\[|\])/, "") # => "foobar"
(The parentheses indicate a group, and the pipes | indicate 'or'. So, ("|\[|\]) means "match any of the things in this group: a quote, or a left square bracket, or a right square bracket".)
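A character class is a common alternative to a group with pipes when every alternative is a single character. The sketch below shows the idea in Python rather than Ruby, purely as an illustration; the function name strip_quotes_and_brackets is invented. The equivalent Ruby pattern would be the same character class inside / delimiters.

```python
import re

def strip_quotes_and_brackets(text: str) -> str:
    """Remove every double quote and square bracket from text."""
    # Inside [...], the listed characters are alternatives; [ and ] are escaped.
    return re.sub(r'["\[\]]', "", text)

print(strip_quotes_and_brackets('["MPEG Video"]'))
print(strip_quotes_and_brackets('fo[o"[b]"ar'))
```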
But really what you should do is do a good intro tutorial to regular expressions and start from the basics. Once you understand that, it shouldn't be too hard to start composing simple regular expressions of your own.
If you're on a mac, this app is very useful for writing your own regex's: http://krillapps.com/patterns/

RegEx Expression to find strings with quotation marks and a backslash

I am using a program that pastes what is in the clipboard in a modified format according to what I specify.
I would like for it to paste paths (i.e. "C:\folder\My File") without the pair of double quotes.
This, which isn't using RegEx, works: Find " (I simply enter that in one line) and replace with nothing. I enter nothing in the second field; I leave it blank.
Now, though that works, it will remove double quotes in this scenario: Bob said "What are you doing?"
I would like the program to remove the quotes only if the words enclosed in the double quotes contain a backslash.
So, once again, just to make sure I am clear, I need the following:
1) RegEx Expression to find strings that have both double quotes and a backslash within those set of quotes.
2) A RegEx Expression that says: replace the backslashes with backslashes (i.e. leave them there).
Thank you for the fast response. This program has two fields. One for what to find and the other for what to replace. So, what would go in the 2nd field?
The program came with the Remove HTML entry, which has
<[^>]*> in the match pattern
and nothing (it's blank) in the Replacement field.
You didn't say which language you use, here's an example in Javascript:
> s = 'say "hello" and replace "C:\\folder\\My File" thanks'
"say "hello" and replace "C:\folder\My File" thanks"
> s.replace(/"([^"\\]*\\[^"]*)"/g, "$1")
"say "hello" and replace C:\folder\My File thanks"
This should work in .NET:
^".*?\\.*?"$
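To make the difference between the two answers concrete, here is the JavaScript pattern transcribed into Python as a rough sketch. The function name unquote_paths is made up. The pattern is not anchored, so it finds quoted paths anywhere in the text, and the replacement is just the first capture group, which is what would go into a replacement field that supports group references (written $1 in JavaScript and \1 in Python).

```python
import re

# A quote, text without quotes or backslashes, a backslash, more text, a quote.
QUOTED_PATH = re.compile(r'"([^"\\]*\\[^"]*)"')

def unquote_paths(text: str) -> str:
    """Drop the surrounding quotes only when the quoted text has a backslash."""
    return QUOTED_PATH.sub(r"\1", text)

s = 'say "hello" and replace "C:\\folder\\My File" thanks'
print(unquote_paths(s))
print(unquote_paths('Bob said "What are you doing?"'))
```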

Regex for excluding characters

I'm trying to strip a string of all special characters except a few, plus remove everything between brackets (square, or any other, Including the brackets!). My current regex is:
^[a-zA-Z0-9äöüÄÖÜ;@.]*$
\\[.+\\]
\\<.+\\>
\\s+
All sequences that match one of the above are removed.
It works fine on e.g.:
Foo Bar[Foo.Bar@google.com]
reducing it to FooBar, but not on e.g.:
Foo
foo@bar.com
which are removed completely.
Update: Updating regex as per OP's edit.
You can use the following regex and replace the match with an empty string.
\[.*?\]|<.*?>|\s|[^a-zA-Z0-9äöüÄÖÜ;@.]
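A short Python sketch of this single-pass cleanup follows. It is an illustration only: the names CLEANUP and clean are invented, and it assumes the characters to keep are ASCII letters, digits, the German umlauts, and the punctuation ; @ and . as in the character class.

```python
import re

# Bracketed or angle-bracketed spans, whitespace, or any character
# outside the allowed set (assumed: letters, digits, umlauts, ; @ .).
CLEANUP = re.compile(r"\[.*?\]|<.*?>|\s|[^a-zA-Z0-9äöüÄÖÜ;@.]")

def clean(text: str) -> str:
    """Remove everything matched by CLEANUP from text."""
    return CLEANUP.sub("", text)

print(clean("Foo Bar[Foo.Bar@google.com]"))
print(clean("foo@bar.com"))
```

Because the last alternative is a negated character class rather than an anchored whole-string match, allowed strings such as a plain e-mail address are left intact instead of being removed completely.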
To remove anything between square brackets, including the brackets themselves, you could use the following regex and replace the match with an empty string:
/\[[^\]]*\]/
To remove special characters, you could use the one below. It matches every character except those listed inside the character class, so once again you can replace the match with the empty string.
/[^a-zA-Z0-9äöüÄÖÜ;@]/
You could use them in sequence or build a bigger one.
In Ruby, I have the following test:
irb(main):001:0> s = "Foo Bar[Foo.Bar@google.com]"
=> "Foo Bar[Foo.Bar@google.com]"
irb(main):005:0* s.gsub(/\[[^\]]*\]|[^a-zA-Z0-9äöüÄÖÜ;@]/, "")
=> "FooBar"
Note that the space has disappeared.