Regex find incorrect " - regex

I am trying to build a regex to find wrong " characters in a CSV file:
for example
,"Nori",,,,,896282962,23.07.2013,,,,"Lady Love "Karo","w",
The " before Karo is wrong, but there can be multiple " inside a ,"", column.
So every ," and ", is correct, but a " that has neither a , directly before it nor a , directly after it is incorrect.
Can anyone help me find the correct regex pattern?
Regards.

You can use the following to match:
(?<!,|^)"(?!,|$)
See RegEX DEMO
Explanation:
(?<!,|^) : Negative lookbehind; asserts that the quote is not preceded by , and is not at the start of the string
" : Match quote
(?!,|$) : Negative lookahead; asserts that the quote is not followed by , and is not at the end of the string
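As a rough illustration, the same check could be run in Python as sketched below. This is not part of the answer above: Python's re module rejects the variable-width lookbehind (?<!,|^), so it is split into two lookbehinds here, and the name find_stray_quotes is made up for the example.

```python
import re

# Python's re needs fixed-width lookbehinds, so (?<!,|^) is split in two.
STRAY_QUOTE = re.compile(r'(?<!,)(?<!^)"(?!,|$)')

def find_stray_quotes(line):
    """Return the index of every quote with no comma or string boundary
    directly before it and none directly after it."""
    return [m.start() for m in STRAY_QUOTE.finditer(line)]

line = ',"Nori",,,,,896282962,23.07.2013,,,,"Lady Love "Karo","w",'
for idx in find_stray_quotes(line):
    # Show the offending quote with a little surrounding context.
    print(idx, repr(line[idx - 1:idx + 5]))
```

On the sample line from the question this reports only the quote in front of Karo.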

Related

Google Sheets REGEXEXTRACT between two quotes

I'm trying to extract data between two quotes using the Google Sheets REGEXEXTRACT function.
The regex works perfectly:
(?<=actor_email":")(.*?)(?=")
Data in the cell is:
{"account_name":"Test","actor_email":"test#test.com","user_email":"anyone#test.com"}
However, placing it within the Google Sheet gives an error.
I have been trying a number of combinations with no luck.
Tried using: (?<=actor_email""":""")(.*?)(?=""")
The output should be: test#test.com
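Google Sheets regex functions use RE2, which does not support lookarounds such as (?<=...) and (?=...). Use a capturing group instead: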
=REGEXEXTRACT(A1, "actor_email"":""([^""]+)""")
The pattern is actor_email":"([^"]+)":
actor_email":" - a literal substring
([^"]+) - Capturing group 1 (the value extracted): any 1+ chars other than "
" - a " char (may be removed if this " can be missing)
Or eliminate the quotes first:
=REGEXEXTRACT(SUBSTITUTE(A1, """", ), "actor_email:(.+),user_")
=REGEXEXTRACT(SUBSTITUTE(A1, """", " "), "actor_email : ([^ ]+)")
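For readers who want to check the capturing-group pattern outside Sheets, here is a minimal Python sketch. It only illustrates the pattern actor_email":"([^"]+)"; the helper name extract_actor_email is hypothetical, and the cell text is the sample from the question.

```python
import re

cell = '{"account_name":"Test","actor_email":"test#test.com","user_email":"anyone#test.com"}'

def extract_actor_email(text):
    """Return the actor_email value, or None when the key is absent."""
    # Group 1 captures everything up to the next double quote.
    m = re.search(r'actor_email":"([^"]+)"', text)
    return m.group(1) if m else None

print(extract_actor_email(cell))
```

No lookaround is needed because the capturing group already isolates the value, which is the same idea REGEXEXTRACT relies on.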

match only HTML name regex

I have this example:
<button type="sasasasasa" abcd="dsqdsq" efgh="sasasa">
I only want to match "button" "type" "abcd" and "efgh".
I already tried [a-zA-Z:_][a-zA-Z:_.*]* but it also matches what's inside the quotes.
I thought about anchoring on "=", "<" or " " at the start of the match, but I don't want those characters in my results.
Use this:
(?<= |<)[a-zA-Z]*(?==| )
How it works:
[a-zA-Z]*: search for any sequence of letters which ...
(?<= |<): is preceded by either a < (as in the case of button) or by a space and ...
(?==| ): is followed by either a = sign or a space (as in the case of button)
See it on Regex101
You can try this
(?<=<|\s)[a-zA-Z:_][a-zA-Z:_.*][^=|\s]+
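This matches a name that comes after "<" or whitespace and stops before "=" or whitespace, without including any of those delimiter characters. Note that the | inside [^=|\s] is a literal character, and the pattern needs at least three characters to match.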
Check this regex online tester.
Hope this helps
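To see the lookaround idea from the first answer in action, here is a small Python sketch. It uses + instead of * so that empty matches cannot occur, which is a deliberate change from the pattern as posted; the function name tag_and_attribute_names is made up for the example.

```python
import re

# Letters preceded by "<" or a space and followed by "=" or a space.
NAME = re.compile(r'(?<=[ <])[a-zA-Z]+(?=[= ])')

def tag_and_attribute_names(html):
    """Return the tag name and attribute names found in one opening tag."""
    return NAME.findall(html)

sample = '<button type="sasasasasa" abcd="dsqdsq" efgh="sasasa">'
print(tag_and_attribute_names(sample))
```

This works for the sample tag, but attribute values that contain spaces or = could produce false matches, so an HTML parser is likely the safer choice for real markup.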

How to select the string html using Regex?

I tried to select only the " but could not get the correct output.
Input:
"V&M" Test
Regex:
( |"|&|[\w&#0-9;]+|.)
Output I need:
Match 1
1. "
Match 2
1. V&M
Match 3
1. "
Match 4
1. Test
Try this:
"|.*?(?="|$)
It matches either " or anything else until next " or end of string.
Demo: https://regex101.com/r/82TOog/1
Here is a suggestion:
( )|(")|(\w&\w)|[\w]+
Try it on Regex101
Here is also a more generic way that matches anything between & and ;:
(&[\w]+;)|[\w]+
Try it on Regex101

boost regex pattern for special characters

Does anyone know how to make a Boost pattern for special characters in C++?
That means not only ("[A-Za-z0-9 \\s]*")
but _ + - \ ) \\ ( . | ] etc. as well so for example string like this:
"hello world \\has (special.characters) + defined_with[boost]" is valid
but
"!hello world \\has (special.characters) + defined_with[boost]" is not valid
to be more specific, something like this:
string input;
getline(cin,input);
boost::regex pattern1 ("[a-zA-Z0-9 \\s \\_ \\+ \\- \\) \\\\ \\( \\. \\| ]*"); // dropped the stray "\\\ ", which is not a valid C++ escape
if (!regex_match (input, pattern1))
{
cout << "invalid input" << endl;
}
else
cout << input << " - is valid" << endl;
I will appreciate any help you can provide
I would recommend using a negated set that excludes the characters that must not pass, for example:
[^\\!\\?]+ , then test whether it matches.
I haven't used Boost.Regex so far, but I know a bit about regex.
I assume you want to match only if at least one "special character" is present.
Furthermore, I assume you want only alphanumeric characters to be valid and the rest to be invalid.
Like "Good String" and "\Bad.String)". Please try to be more specific about this.
First of all, a single class followed by "*" (zero or more) also matches the empty string, so a search with it always succeeds; with regex_match it only requires that the whole input consists of characters from the class. Use "+" instead if you need at least one character from your class.
Second, you may try a negated class like [^a-zA-Z0-9 ], which matches if there is any character that is NOT within your class.
For further reading I suggest the documentation (yeah, I know, regex syntax is quite the hell): http://www.boost.org/doc/libs/1_53_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
or my favorite for the regex syntax
http://en.wikipedia.org/wiki/Regular_expression
I hope this helps!
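As a rough sketch of the whitelist approach, here is the character class from the question written out and tested in Python rather than Boost. It assumes the allowed set is letters, digits, whitespace and _ + - \ ) ( . | [ ], and that \\has in the question's C++ literal means a single backslash; the name is_valid is hypothetical. The class itself should carry over to Boost's Perl syntax, but that has not been verified here.

```python
import re

# Allowed: letters, digits, whitespace and the characters _ + - \ ) ( . | [ ]
ALLOWED = re.compile(r'[a-zA-Z0-9\s_+\-\\)(.|\[\]]+')

def is_valid(text):
    """True when every character of text is in the allowed set."""
    # fullmatch plays the role of regex_match: the whole input must match.
    return ALLOWED.fullmatch(text) is not None

print(is_valid(r'hello world \has (special.characters) + defined_with[boost]'))
print(is_valid(r'!hello world \has (special.characters) + defined_with[boost]'))
```

Using + rather than * means the empty string is rejected; switch back to * if empty input should count as valid.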

Matching exactly one occurrence in a string with a regular expression

The other day I sat with a regular expression problem. Eventually I solved it a different way, without regular expressions, but I would still like to know how you do it :)
The problem I was having was running svn update via an automated script, and I wanted to detect conflicts. Doing this with or without regex is trivial, but it got me thinking about a more obscure problem: How do you match exactly ONE occurrence of a character inside a fixed length field of whitespace?
For instance, let's say we wanted to match "C" inside a six-byte wide field:
"C     " MATCH
"  C   " MATCH
"  C C " NO MATCH
"  M   " NO MATCH
"      " NO MATCH
"C      " NO MATCH (7 characters, not 6)
" C   " NO MATCH (5 characters, not 6)
I know it's not right to answer your own question, but I basically merged your answers ... please don't flame :)
^(?=.{6}$) *C *$
Edit:
Replacing . with [ C], as in Tomalak's response below, increases the speed by about 4-5%:
^(?=[ C]{6}$) *C *$
^(?=[ C]{6}$) *C(?! *C)
Explanation:
^              # start-of-string
(?=[ C]{6}$)   # followed by exactly 6 characters, each " " or "C", and then the end-of-string
 *C            # any number of spaces and a "C"
(?! *C)        # not followed by another C anywhere (negative lookahead)
Notes:
The ^(?=…{6}$) construct can be used anywhere you want to measure string length but not actually match anything yet.
Since the end of the string is already checked in the look-ahead, you do not need to put a $ at the end of the regex, but it does not hurt to do it.
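The length-check-plus-match idea can be tried out with a short Python sketch. It runs the pattern ^(?=[ C]{6}$) *C *$ against six-character fields like those in the question (plus one that is too long and one that is too short); the helper name has_single_c is made up for the example.

```python
import re

# The lookahead checks the length and alphabet; the rest allows exactly one C.
ONE_C = re.compile(r'^(?=[ C]{6}$) *C *$')

def has_single_c(field):
    """True when field is 6 characters of spaces with exactly one C."""
    return ONE_C.match(field) is not None

for field in ['C     ', '  C   ', '  C C ', '  M   ', '      ', 'C      ', ' C   ']:
    print(repr(field), has_single_c(field))
```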
If you only need exactly one C anywhere in the string, you can use:
^[^C]*C[^C]*$
but this will not verify the length of your string.