How to group a certain character in a regex?

This is my regex:
/(')([\w\ \,\"]+)(')/g
I want to group the \" individually so I can replace it with some other character later. How do I do that?
Sample https://regex101.com/r/mqO4yL/1

It's hard to know for sure without an example of the text you'd be using with the regex. But based on what it looks like it's trying to do, if you had something like:
/'"test"'/g
or
/'test"'/g
You could use:
/(')((\w|\ |\,)*(\")*)*(')/g
and the fourth group would get you the double quotes. There might be an easier way without so many groups, but without example text I'm not sure.
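A different route, sketched in Python rather than in the flavor used on regex101: match each single-quoted segment with the pattern from the question, then swap the double quotes inside it with a replacement callback. The name replace_inner_quotes and the * replacement character are made up for illustration.

```python
import re

# Pattern from the question: a single-quoted run of word characters,
# spaces, commas and double quotes.
QUOTED = re.compile(r"'([\w ,\"]+)'")

def replace_inner_quotes(text, replacement="*"):
    """Replace every double quote that sits inside a single-quoted segment."""
    def fix(match):
        inner = match.group(1).replace('"', replacement)
        return "'" + inner + "'"
    return QUOTED.sub(fix, text)

sample = "x = 'say \"hi\", ok' and \"untouched\""
print(replace_inner_quotes(sample))
# x = 'say *hi*, ok' and "untouched"
```

Doing the inner replacement in a callback avoids having to capture each double quote as its own group.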

My guess is that we want to capture the " separately from everything else and then replace it. Maybe this expression would be an option:
([\s\S]*?)(")?

Related

Regex for value.contains() in Google Refine

I have a column of strings, and I want to use a regex to find commas or pipes in every cell and then perform an action. I tried this, but it doesn't work (no syntax error, it just matches neither commas nor pipes).
if(value.contains(/(,|\|)/), ...
The funny thing is that the same regex works with the same data in SublimeText. (Yes, I can do the work there and then reimport, but I would like to understand what the difference is or what my mistake is.)
I'm using Google Refine 2.5.
Since value.match should return captured texts, you need to define a regex with a capture group and check if the result is not null.
Also, pay attention to the regex itself: the string should be matched in its entirety:
Attempts to match the string s in its entirety against the regex pattern p and returns an array of capture groups.
So, add .* before and after the pattern you are looking for inside a larger string:
if(value.match(/.*([,|]).*/) != null)
You can use a combination of if and isNonBlank like:
if(isNonBlank(value.match(/your regex/)), ...
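The whole-string versus substring distinction is not specific to Google Refine. As a rough analogy only (this is Python, not GREL, and the function names are invented for the example), re.fullmatch behaves like the "entire string" matching described above, while re.search looks anywhere in the string:

```python
import re

def contains_separator_fullmatch(cell):
    """Whole-string matching, padded with .* as in the answer above."""
    return re.fullmatch(r".*([,|]).*", cell) is not None

def contains_separator_search(cell):
    """Substring search: no padding needed."""
    return re.search(r"[,|]", cell) is not None

cell = "apples, pears|plums"
print(re.fullmatch(r"[,|]", cell))         # None: the cell is more than one separator
print(contains_separator_fullmatch(cell))  # True
print(contains_separator_search(cell))     # True
```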

Regex substitution with Notepad++

I have a text file with several lines like these ones:
cd_cod_bus
nm_number_ex
cd_goal
And I want to get rid of the _ and uppercase the following character using Notepad++ (I can also use other tool but if it doesn't get the problem more troublesome).
So I tried to match the characters with the regex (?<=_)\w and replace them using \U\1\E\2 for the uppercasing trick, but this is where my problems started. I think the regex is OK, but once I click Replace All I get this result:
cd_od_us
nm_umber_x
cd_oal
As you can see, it only deletes the match.
Do you know where the problem is?
Thanks.
The search regex has no capture groups, i.e. the \1 and \2 references in the replacement do not refer to anything.
Try this instead:
Search: _(\w)
Replace: \U\1\E
There you have a capture group in the search part (the parentheses around the \w), and the \1 in the replacement refers back to what was captured.
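For anyone doing the same conversion outside Notepad++, here is a minimal Python sketch of the same idea. Python's re has no \U escape, so the uppercasing is done in a replacement callback instead; the function name snake_to_camel is made up for the example.

```python
import re

def snake_to_camel(name):
    """Drop each underscore and uppercase the character captured after it."""
    return re.sub(r"_(\w)", lambda m: m.group(1).upper(), name)

for name in ["cd_cod_bus", "nm_number_ex", "cd_goal"]:
    print(snake_to_camel(name))
# cdCodBus
# nmNumberEx
# cdGoal
```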
replace
_(.)
with
\U$1
will give you:
cdCodBus
nmNumberEx
cdGoal
and for your
I can also use other tool but if it doesn't get the problem more troublesome
I suggest you try vim.
Try this,
_(\w)
and replace with
\U\1

Greedy and non-greedy regex

I currently have this regex: this\.(.*)?\s[=,]\s, however I have come across a pickle I cannot fix.
I tried the following Regex, which works, but it captures the space as well which I don't want: this\.(.*)?(?<=\s)=|(?<!\s),. What I'm trying to do is match identifier names. An example of what I want and the result is this:
this.""W = blah; which would match ""W. The second regex above does this almost perfectly, however it also captures the space before the = in the first group. Can someone point me in the correct direction to fix this?
EDIT: The reason for not simply using [^\s] in the wildcard group is that sometimes I can get lines like this: this. "$ = blah;
EDIT2: Now I have another issue. It's not matching lines like param1.readBytes(this.=!3,0,param1.readInt()); properly. Instead of matching =!3 it's matching =!3,0. Is there a way to fix this? Again, I cannot simply use a [^,] because there could be a name like param1.readBytes(this.,3$,0,param1.readInt()); which should match ,3$.
(.*) will match any character including whitespace.
To force it not to end in whitespace change it to (.*[^\s])
Eg:
this\.(.*[^\s])?\s?[=,]\s
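A quick way to check this pattern against the two sample lines from the question, sketched in Python on the assumption that its re module treats the pattern the same way as the asker's flavor (the helper name identifier is invented):

```python
import re

# The pattern from the answer above, unchanged.
IDENT = re.compile(r"this\.(.*[^\s])?\s?[=,]\s")

def identifier(line):
    """Return the captured identifier, or None when the line does not match."""
    m = IDENT.search(line)
    return m.group(1) if m else None

print(repr(identifier('this.""W = blah;')))  # '""W'
print(repr(identifier('this. "$ = blah;')))  # ' "$'
```

The capture no longer ends in whitespace, although a leading space (as in the second line) is still kept.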
For your second edit, it seems like you are writing a language parser. Even though regular expressions are powerful, they do have limits. You need a grammar parser for that.
Maybe you can tell your first group to capture only non-space characters instead of any character.
this\.(\S*)?(?<=\s)=|(?<!\s),

Regex to extract specific words

I am looking for a regex to extract specific words from text. For example, the word which comes after "Replaced" in each of these lines:
Replaced disk
Replaced floppy
Replaced memory
Please suggest the regex for it.
We can't really help you without more details (like which regex flavor you're using, for example), but you probably want to match it with something like this:
Replaced\s+(\w+)\b
...and then extract the desired portion from capturing group #1.
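Since the question does not name a flavor, here is what that looks like in Python as one possible illustration. The helper name replaced_words is made up; re.findall returns the contents of group 1 for every match.

```python
import re

text = """Replaced disk
Replaced floppy
Replaced memory"""

def replaced_words(text):
    """Collect the word that follows each 'Replaced'."""
    return re.findall(r"Replaced\s+(\w+)\b", text)

print(replaced_words(text))  # ['disk', 'floppy', 'memory']
```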
Use this regex (it needs a flavor that supports variable-length lookbehind, such as .NET): (?<=Replaced\s*)(.+)
I think:
/Replaced\s(\w+)/
Or:
/(?<=Replaced\s)(\w*)/
If you only want to select the word and not Replaced.
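Lookbehind support varies between flavors, so it is worth testing in the one you actually use. As an example, Python's built-in re module accepts a fixed-width lookbehind but rejects a variable-width one such as (?<=Replaced\s*). The sketch below uses \w+ rather than \w* so that an empty match is never returned; the helper names are invented.

```python
import re

def word_after_replaced(line):
    """Match only the word itself; 'Replaced ' is asserted, not consumed."""
    m = re.search(r"(?<=Replaced\s)\w+", line)
    return m.group(0) if m else None

def compiles(pattern):
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(word_after_replaced("Replaced floppy"))  # floppy
print(compiles(r"(?<=Replaced\s)\w+"))         # True
print(compiles(r"(?<=Replaced\s*)(.+)"))       # False
```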

Regex match everything after question mark?

I have a feed in Yahoo Pipes and want to match everything after a question mark.
So far I've figured out how to match the question mark using:
\?
Now I just need to match everything that follows the question mark.
\?(.*)
You want the content of the first capture group.
Try this:
\?(.*)
The parentheses are a capturing group that you can use to extract the part of the string you are interested in.
If the string can contain new lines you may have to use the "dot all" modifier to allow the dot to match the new line character. Whether or not you have to do this, and how to do this, depends on the language you are using. It appears that you forgot to mention the programming language you are using in your question.
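To make the "dot all" point concrete, here is a small Python sketch (Python is only an example, since the question does not name a language). The helper after_question_mark and its span_lines parameter are invented; re.DOTALL is Python's name for the modifier.

```python
import re

def after_question_mark(text, span_lines=False):
    """Return what follows the first '?', or None if there is none."""
    flags = re.DOTALL if span_lines else 0
    m = re.search(r"\?(.*)", text, flags)
    return m.group(1) if m else None

print(repr(after_question_mark("page?a=1\nb=2")))                   # 'a=1'
print(repr(after_question_mark("page?a=1\nb=2", span_lines=True)))  # 'a=1\nb=2'
```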
Another alternative that you can use if your language supports fixed width lookbehind assertions is:
(?<=\?).*
With the positive lookbehind technique:
(?<=\?).*
(We're searching for a text preceded by a question mark here)
Input: derpderp?mystring blahbeh
Output: mystring blahbeh
Basically, (?<=...) is a lookbehind group: it requires the escaped question mark to appear right before the match, without making it part of the match.
They perform really well, but not all implementations support them.
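The input and output above can be reproduced with a short Python sketch (Python is one of the implementations that supports this fixed-width lookbehind; the helper name is made up):

```python
import re

def tail_after_question_mark(text):
    """Return the text preceded by '?', without the '?' itself."""
    m = re.search(r"(?<=\?).*", text)
    return m.group(0) if m else None

print(tail_after_question_mark("derpderp?mystring blahbeh"))  # mystring blahbeh
```

Because the question mark is only asserted, the whole match is already the wanted text and no capture group is needed.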
\?(.*)$
If you want to match all chars after "?" you can use a group to match any char, and you'd better use the "$" sign to indicate the end of line.
\?(.*\n)+
With this you can get everything, even newlines.
Check out this site: http://rubular.com/ Basically the site allows you to enter some example text (what you would be looking for on your site) and then as you build the regular expression it will highlight what is being matched in real time.
str.replace(/^.+?\"|^.|\".+/, '');
This is sometimes a bad fit when you want to choose what else to remove between the "", and you cannot use it more than twice on one string. All it does is select whatever is not in between the "" and replace it with nothing.
Even for me it is a bit confusing, but I'll try to explain it. ^.+?\" lazily matches from the start of the string up to and including the first ". The | is alternation, i.e. "or". ^. matches a single character at the start of the string, and \".+ matches a " together with everything after it. Note that without the g flag, replace only removes the first match.
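To see what each alternative removes, the same pattern can be tried in Python. This is only an approximation of the JavaScript call: Python's re.sub replaces every match by default (like a JavaScript regex with the g flag), and count=1 mimics a call without it. The names below are invented for the sketch.

```python
import re

# Same three alternatives as the JavaScript pattern above.
OUTSIDE_QUOTES = re.compile(r'^.+?"|^.|".+')

def strip_first(text):
    """Remove only the first match, like replace() without the g flag."""
    return OUTSIDE_QUOTES.sub("", text, count=1)

def strip_all(text):
    """Remove every match, like replace() with the g flag."""
    return OUTSIDE_QUOTES.sub("", text)

sample = 'abc"def"ghi'
print(strip_first(sample))  # def"ghi
print(strip_all(sample))    # def
```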