Regexp to cut all text inside the external quotation signs - regex

Please help me with adjusting regexp. I need to cut all text inside the external quotation signs.
I have text:
some text "have "some text" here "that should" be cut"
My regexp:
some text "(?<name>[^"]*)"
Need to get
have "some text" here "that should" be cut
But I've got
have

If you want to support the first level of nested double quotes, you can use
some text "(?<name>[^"]*(?:"[^"]*"[^"]*)*)"
See the regex demo.
Details:
[^"]* - zero or more chars other than double quotes
(?:"[^"]*"[^"]*)* - zero or more repetitions of
"[^"]*" - a substring between double quotes that contains no other double quotes
[^"]* - zero or more chars other than double quotes.
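To illustrate, the first-level pattern can be tried in Python. This is a minimal sketch, assuming Python's re module, which spells named groups as (?P<name>...) rather than (?<name>...); the variable names are made up for the example.

```python
import re

text = 'some text "have "some text" here "that should" be cut"'

# Same pattern as above, rewritten with Python's (?P<name>...) syntax.
pattern = re.compile(r'some text "(?P<name>[^"]*(?:"[^"]*"[^"]*)*)"')

match = pattern.search(text)
inner = match.group("name") if match else None
print(inner)
```

The pattern only copes with one level of nesting; deeper nesting needs recursion or a different approach.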
If your regex flavor supports recursion:
some text ("(?<name>(?:[^"]++|\g<1>)*)")
See this regex demo. Here, ("(?<name>(?:[^"]++|\g<1>)*)") is a capturing group #1 that matches
" - a " char
(?<name>(?:[^"]++|\g<1>)*) - Group "name": zero or more sequences of
[^"]++ - one or more chars other than "
| - or
\g<1> - Group 1 pattern recursed
" - a " char

Assuming you want to remove all text up to the first quotes then retain everything till the last quote, you can try this.
[[:alpha:]][^"]*\"(?<name>.*)"
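The greedy idea can be sketched in Python as well. Python's re module does not support the POSIX class [[:alpha:]], so this sketch assumes [A-Za-z] is an acceptable stand-in, and it uses the (?P<name>...) named-group syntax; both are adaptations, not part of the original pattern.

```python
import re

text = 'some text "have "some text" here "that should" be cut"'

# Skip up to the first double quote, then let the greedy .* run to the
# last double quote on the line.
greedy = re.compile(r'[A-Za-z][^"]*"(?P<name>.*)"')

m = greedy.search(text)
between = m.group("name") if m else None
print(between)
```

Because .* is greedy, this keeps everything between the first and the last quote on the line, whatever the nesting looks like in between.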

You can solve this problem with nested regexp operators:
SELECT regexp_replace(Regexp_substr(regexp_replace(word,'(^")|("$)'),'["].+'),'(^")') as Result
from(
SELECT '"some text "have "some text" here "that should" be cut"' as word from dual)

Related

Select all single quotes in regex field

I have this field in my JSON data:
"pinyin": "bei1 'ai1",
I just want to select any single quote ' like the one before ai1;
I tried this
(?<="pinyin": "\w*)\'+(?!")
but it didn't work
You can use
(?<="pinyin": "[\w\s]*)'(?!")
See this regex demo. Details:
(?<="pinyin": "[\w\s]*) - a positive lookbehind that matches a location that is immediately preceded with "pinyin": " and then any zero or more word or whitespace chars
' - a single quotation mark
(?!") - a negative lookahead that fails the match if there is a " char immediately to the right of the current location.
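Note that the variable-width lookbehind above is not accepted by every regex flavor; Python's re module, for one, requires fixed-width lookbehinds. A rough equivalent, assuming Python and swapping the lookbehind for a capture plus a small loop, could look like the sketch below. The helper name single_quote_positions is hypothetical, and the value pattern [^"]* is looser than [\w\s]*.

```python
import re

line = '"pinyin": "bei1 \'ai1",'

# Capture the whole value of the "pinyin" field first.
value_re = re.compile(r'"pinyin": "([^"]*)"')

def single_quote_positions(text):
    """Return indexes of single quotes inside pinyin values, skipping one
    that sits immediately before the closing double quote."""
    positions = []
    for m in value_re.finditer(text):
        value, offset = m.group(1), m.start(1)
        for i, ch in enumerate(value):
            if ch == "'" and i != len(value) - 1:
                positions.append(offset + i)
    return positions

positions = single_quote_positions(line)
print(positions)
```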

regex to replace specific characters while capturing the rest of the line

Using notepad++
I need to replace ", " with , on a line beginning exclusively with genre: and nowhere else in the document, while maintaining all of the other content in the line. I will be applying the search/replace to an entire folder, so I need to be as precise as I can.
Examples
genre: "drama", "thriller", "mystery", "espionage"
genre: "drama", "sci-fi"
should look like this:
genre: "drama, thriller, mystery, espionage"
genre: "drama, sci-fi"
I'm having a hell of a time figuring out how to do that without capturing an unlimited and unknown number of groups before and after each instance of ", ", while also keeping the first word and colon: genre: . I'm pretty sure I have to capture the entire group between the first and last ", and then replace ", " with just , within that group, but I can't figure out how to do that.
Obviously what I have here isn't going to do the trick.
find what: ^genre: "(.*)", "(.*)", "(.*)", "(.*)"
replace with: genre: "$1, $2, $3, $4"
You can use
Find: (?:\G(?!^)|^genre:\h*").*?\K",\h*"
Replace: ,<SPACE>
Details:
(?:\G(?!^)|^genre:\h*") - end of the previous match position or genre:, zero or more horizontal whitespaces and " at the start of string (here, line)
.*? - any zero or more chars other than line break chars, as few as possible
\K - omit the matched text
",\h*" - consume ",, then zero or more horizontal whitespaces, and then a " (this will be replaced with , + space)
See the regex demo:
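Outside Notepad++, the same result can be reached without \G, \K or \h, which Python's re module does not support. The sketch below assumes Python and uses a different technique: it matches each genre: line as a whole and rewrites the separators inside it with a callback. The function name merge_genres is made up for the example.

```python
import re

def merge_genres(text):
    """Join the quoted items on lines starting with genre: into one string."""
    def fix_line(m):
        # m.group(1) is 'genre: "' and m.group(2) is the rest of the line.
        return m.group(1) + re.sub(r'",[ \t]*"', ', ', m.group(2))
    return re.sub(r'^(genre:[ \t]*")(.*)$', fix_line, text, flags=re.MULTILINE)

sample = (
    'genre: "drama", "thriller", "mystery", "espionage"\n'
    'cast: "a", "b"\n'
    'genre: "drama", "sci-fi"'
)
print(merge_genres(sample))
```

Lines that do not start with genre: are left untouched, which matters when the replacement is run over many files.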
Alternatively, try this (updated answer):
Find: (?:^genre|\G)(?!^).*?\K", "
Replace All: ,<SPACE>

Notepad++: replace occurrences of characters before another character

I have a file with text like this:
"Title" = "Body"
And I would like to remove both " before the =, to leave it like this:
Title = "Body"
So far I managed to select the first block of text with:
.+(=)
That selects everything up to the =, but I can't find how to replace (or delete) both " .
Any suggestions?
You could use a capture group in the replacement, and match the double quotes to be removed while asserting an equals sign at the right.
Find what:
"([^"]+)"(?=\h*=)
" Match literally
([^"]+) Capture group 1, match 1+ times any char other than "
" Match literally
(?=\h*=) Positive lookahead, assert an = sign at the right
Regex demo
Replace with:
$1
To match the whole pattern from the start till the end of the string, you might also use 2 capture groups and use those in the replacement.
^"([^"]+)"(\h*=\h*"[^"]+")$
Regex demo
In the replacement use $1$2
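The lookahead version is easy to try outside Notepad++ too. A minimal sketch, assuming Python's re module, where \h is not available (so [ \t] stands in for it) and the replacement is written \1 instead of $1:

```python
import re

line = '"Title" = "Body"'

# Drop the quotes around a quoted string only when an = follows it.
result = re.sub(r'"([^"]+)"(?=[ \t]*=)', r'\1', line)
print(result)
```

The quoted string after the = is not followed by another =, so the lookahead fails there and "Body" keeps its quotes.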
You can use
(?:\G(?!^)|^(?=.*=))[^"=\v]*\K"
Replace with an empty string.
Details:
(?:\G(?!^)|^(?=.*=)) - end of the previous successful match (\G(?!^)) or (|) start of a line that contains = somewhere on it (^(?=.*=))
[^"=\v]* - any zero or more chars other than ", = and vertical whitespace
\K - omit the text matched
" - a " char (matched, consumed and removed)

Regex ignore multiple wrong placed quotes

From this input:
""" "01-01-2000""" " ",""" "Bank123""" "", "" ""Example text" " "",
I want to extract:
01-01-2000
Bank123
Example text
I managed this:
(["'])(?:(?=(\\?))\2.)*?\1
But it fails when it has to deal with many misplaced quotes. Any ideas?
As I see, you are interested in strings which:
start with either a digit or a letter,
followed by a (maybe empty) sequence of chars other than ".
So the intuitive solution is [a-z\d][^"]* with gi options
(global, case insensitive).
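As a quick check of that pattern, here is a minimal sketch assuming Python, where re.findall plays the role of the g option and re.IGNORECASE the role of the i option:

```python
import re

raw = '""" "01-01-2000""" " ",""" "Bank123""" "", "" ""Example text" " "",'

# Start at a letter or digit, then run up to the next double quote.
values = re.findall(r'[a-z\d][^"]*', raw, re.IGNORECASE)
print(values)
```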
For the given example, one option is to match zero or more spaces or double quotes with [ "]*, which consumes whatever precedes the value between the inner double quotes.
Then match the opening double quote and capture one or more chars other than a double quote or a newline, ([^"\r\n]+), using a negated character class.
Finally, match the closing double quote followed by zero or more spaces or double quotes; this consumes what comes after the value, so the group never matches a lone space between double quotes.
[ "]*"([^"\r\n]+)"[ "]*
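The capture-group pattern can be exercised the same way. A minimal sketch, assuming Python; with a single capturing group, re.findall returns just the captured values:

```python
import re

raw = '""" "01-01-2000""" " ",""" "Bank123""" "", "" ""Example text" " "",'

# Leading and trailing runs of spaces/quotes are consumed outside the group.
captured = re.findall(r'[ "]*"([^"\r\n]+)"[ "]*', raw)
print(captured)
```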
There are various options to do so:
1) ([\d-\w\s][\d-\w\s]+)
2) ([\d-\w\s]{2,})
3) "\b(.+?)\b"
4) \b([^"]{2,})\b
Demo : https://regex101.com/r/jPXqKv/1
Test:
""" "01-01-2000""" " ",""" "Bank123""" "", "" ""Example text" " ""
Match:
Match 1
Full match 5-15 `01-01-2000`
Group 1. 5-15 `01-01-2000`
Match 2
Full match 28-35 `Bank123`
Group 1. 28-35 `Bank123`
Match 3
Full match 48-60 `Example text`
Group 1. 48-60 `Example text`

regex - capture group

I am trying to write a regex to match the following at the beginning of a new line
- a number followed by a parenthesis e.g. 2) or 8)
- a number followed by a period e.g. 5.
- the character '-'
- the character '-'
- the character '*'
the following strings should match
"1. Sorting function. If you have a long checklist it's very difficult."
"5) This is another example"
"-this is yet another one"
"* last item in the list"
I have tried this but it doesn't quite get me what I am looking for.
re.findall(r'(?m)\s*^[-*(\d.)(\d\))]',item)
Try
re.findall(r'^\s*(\d+(\)|\.)|-|\*)', item, re.MULTILINE)
It will match all sequences of numbers followed by a closing parenthesis or period as well as dashes and stars at the beginning of the line.
Example: https://regex101.com/r/cR2lZ5/6
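One thing to watch for: the pattern above has two capturing groups, so re.findall returns a list of tuples rather than plain strings. A possible variation, sketched below under the assumption that only the marker itself is wanted, uses a single group with character classes; the sample text and variable names are made up for the example.

```python
import re

item = (
    "1. Sorting function. If you have a long checklist it's very difficult.\n"
    "5) This is another example\n"
    "-this is yet another one\n"
    "* last item in the list\n"
    "not a list item"
)

# One capturing group: digits followed by . or ), or a lone - or *.
marker_re = re.compile(r'^[ \t]*(\d+[.)]|[-*])', re.MULTILINE)

markers = marker_re.findall(item)
print(markers)
```

Using [ \t]* instead of \s* keeps the leading-whitespace match from running across line breaks.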
Assuming that your quote marks " are not included, and that each line is a separate string,
^\d\.|^\d\)|^\-|^\*
Would be the regular expression. | is OR, \d is a digit, and you escape the special characters ".", ")", "-", and "*" by putting a backslash in front of them.
You can test your regular expressions here. Good luck!