I would like to remove constant text and brackets from a String using regex. e.g.
INPUT                     Expected OUTPUT
var x = CONST_STR(ABC)    var x = ABC
var y = CONST_STR(DEF)    var y = DEF
How to achieve this?
Try with regex:
(?<==\s)\w+\(([^)]+)\)
which means:
(?<==\s) - lookbehind for = followed by a whitespace character (\s); the match must come directly after these characters,
\w+ - one or more word characters (A-Za-z_0-9); this could be tightened to the identifier rules of the language whose code you are processing,
\( - opening bracket,
([^)]+) - capturing group (\1 or $1) for one or more characters other than a closing bracket,
\) - closing bracket.
Just remember that in a Java string literal you need to double each backslash (\\),
like in:
public class Test {
    public static void main(String[] args) {
        System.out.println("var y = CONST_STR(DEF)".replaceAll("(?<==\\s)\\w+\\(([^)]+)\\)", "$1"));
    }
}
with output:
var y = DEF
The $1 in replaceAll() refers to capturing group 1, that is, the text captured by this fragment of the regex: ([^)]+) - the characters inside the brackets.
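For comparison, the same replacement can be sketched in Python. This is only an illustration, not part of the original answer: the pattern goes in a raw string, so the backslashes do not need doubling, and the group reference is written \1 instead of $1. The names pattern, lines and cleaned are made up for the example.

```python
import re

# Same pattern as above; the raw string avoids Java-style double escaping.
pattern = re.compile(r"(?<==\s)\w+\(([^)]+)\)")

# The two sample inputs from the question.
lines = ["var x = CONST_STR(ABC)", "var y = CONST_STR(DEF)"]

# \1 puts back only the text captured inside the brackets.
cleaned = [pattern.sub(r"\1", line) for line in lines]

for line in cleaned:
    print(line)
```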
I would like to replace all the dots inside the HTML tag <code> with the word " dot ".
If I do it like this, it changes only the first occurrence inside each tag.
I would like to change them all.
const str = 'Some text <code class="foo">this.is.a.class</code> and <code>this.another.thing</code>';
const res = str.replaceAll(/(<code[^>]*>.*?)(\.)(.*?<\/code>)/g, "$1 dot $3");
console.log(res);
// Some text <code class="foo">this dot is.a.class</code> and <code>this dot another.thing</code>
Why is it changing only the first one?
You can pass a callback function to String#replaceAll (or to String#replace with a global regex):
const str = 'Some text <code class="foo">this.is.a.class</code> and <code>this.another.thing</code>';
const res = str.replaceAll(/(<code[^>]*>)([\s\S]*?<\/code>)/g, (_,x,y) => `${x}${y.replaceAll(".", " dot ")}`);
console.log(res);
The (<code[^>]*>)([\s\S]*?<\/code>) regex matches and captures the opening code tag into Group 1 (the x variable), then captures any zero or more chars, as few as possible, together with the closing code tag into Group 2 (the y variable). When replacing, all dots in y (the Group 2 value) are replaced with " dot " inside the arrow function.
Note that [\s\S] matches any chars including line break chars, . does not match line break chars (at least by default).
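The same two-step idea (match each code element, then replace every dot inside it with a callback) can be sketched in Python, where re.sub accepts a function as the replacement. This is an illustrative sketch; the names code_block and replace_dots are assumptions, not from the answer.

```python
import re

text = ('Some text <code class="foo">this.is.a.class</code> '
        'and <code>this.another.thing</code>')

# Group 1: the opening tag. Group 2: the content plus the closing tag.
code_block = re.compile(r"(<code[^>]*>)([\s\S]*?</code>)")

def replace_dots(match):
    # Keep the opening tag untouched and replace every dot in the rest.
    return match.group(1) + match.group(2).replace(".", " dot ")

result = code_block.sub(replace_dots, text)
print(result)
```

The callback runs once per code element, so every dot in the element is handled, which a single regex replacement with one captured dot cannot do.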
My regex matches a string between a tab or comma and a colon.
(?<=\t|,)([a-z].*?)(:)
This returns a string: app_open_icon:
I would like the regex to drop the trailing : and return app_open_icon only.
How should I change this regex to exclude the :?
Try (?<=\t|,)[a-z].*?(?=:)
const regex = /(?<=\t|,)[a-z].*?(?=:)/;
const text='Lorem ipsum dolor sit amet,app_open_icon:consectetur adipiscing...';
const result = regex.exec(text);
console.log(result);
You don't have to use lookarounds, you can use a capture group.
[\t,]([a-z][^:]*):
[\t,] Match either a tab or comma
( Capture group 1
[a-z][^:]* Match a char in the range a-z and 0+ times any char except :
) Close group 1
: Match literally
const regex = /[\t,]([a-z][^:]*):/;
const str = `,app_open_icon:`;
const m = str.match(regex);
if (m) {
console.log(m[1]);
}
To get the value as the whole match using only lookarounds, turn the surrounding parts of the pattern into lookarounds and drop capture group 1:
(?<=[\t,])[a-z][^:]*(?=:)
const regex = /(?<=[\t,])[a-z][^:]*(?=:)/;
const str = `,app_open_icon:`;
const m = str.match(regex);
if (m) {
console.log(m[0]);
}
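To see the two variants side by side, here is a small Python sketch (Python's re module accepts both patterns unchanged). It uses the sample sentence from the earlier answer; the variable names are made up for the example.

```python
import re

text = "Lorem ipsum dolor sit amet,app_open_icon:consectetur adipiscing..."

# Variant 1: consume the delimiters, read the value from capture group 1.
with_group = re.search(r"[\t,]([a-z][^:]*):", text)

# Variant 2: assert the delimiters with lookarounds, read the whole match.
with_lookarounds = re.search(r"(?<=[\t,])[a-z][^:]*(?=:)", text)

key_from_group = with_group.group(1) if with_group else None
key_from_lookarounds = with_lookarounds.group(0) if with_lookarounds else None

print(key_from_group, key_from_lookarounds)
```

Both variants return the same text; the difference is only whether the delimiters are part of the overall match.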
You have already achieved that with your regex.
Positive Lookbehind : (?<=\t|,) check for comma or tab before start of match
First capturing group : ([a-z].*?) captures everything before : and after , or \t
Second capturing group : (:) captures :
Just extract the first capturing group of your match.
I have the following javascript code
var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....
To find the closest string declaration to c, I was using something like this:
var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map
I am using a negative lookahead so that a declaration is not matched if another matching declaration comes after it. However, the result I am getting is still <some string 2>. Could someone please explain what causes that?
There are more ways to write a string in JavaScript, and even more ways to break the pattern, so this approach is very brittle.
For the example data in the question, you can use a negative lookahead to assert that each skipped line is neither a string declaration nor the .map line:
^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b
^ Start of a line (the regex is used with the m flag)
var\s+\w\s*=\s* Match a possible var format and the equals sign
(["'`]) Capture group 1, match one of " ' or `
(.*?) Capture group 2 capture the data that you want to find
\1;\n A back reference to group 1 to match the closing quote, then ; and a newline
(?: Non capture group, to repeat as a whole
(?! Negative lookahead, assert what is directly to the right is not
var\s+\w+\s*=\s* Match a var part with =
(?: Non capture group for the alternation |
\w+\.map\b Match the .map part
| Or
(["'`]).*?\3 Capture group 3 to match one of the quotes with a back reference to match it up
) Close non capture group
;\n Match ; and a newline
) Close lookahead
.*\n Match the whole line and a newline
)* Close non capture group and optionally repeat to match all lines
var\s+\w+\s*=\s*\w\.map\b Match the line with .map
const regex = /^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b/m;
const str = `var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....`;
const m = str.match(regex);
if (m) {
console.log(m[2]);
}
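Because a single pattern like this is brittle, a different technique may be easier to maintain: scan the source line by line and remember the most recent string declaration until the .map line is reached. The Python sketch below shows that alternative, not the regex above. The function name closest_string_before_map and both helper patterns are assumptions made for the example, and it only handles the simple one-declaration-per-line layout from the question.

```python
import re

# A "var name = '...';" line; group 2 is the string content.
DECL = re.compile(r"""^var\s+\w+\s*=\s*(["'`])(.*?)\1;\s*$""")
# A "var name = other.map..." line.
MAP_LINE = re.compile(r"^var\s+\w+\s*=\s*\w+\.map\b")

def closest_string_before_map(source):
    """Return the last string declared before the first .map line, or None."""
    last = None
    for line in source.splitlines():
        decl = DECL.match(line)
        if decl:
            last = decl.group(2)
        elif MAP_LINE.match(line):
            return last
    return None

source = """var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map....."""

closest = closest_string_before_map(source)
print(closest)
```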
I need to replace two characters {, } with {\n, \n}.
But they must be not surrounded in '' or "".
I tried this code to achieve that
text = 'hello(){imagine{myString("HELLO, {WORLD}!")}}'
replaced = re.sub(r'{', "{\n", text)
Ellipsis...
Naturally, this code also replaces curly brackets that are inside quotation marks.
What are the negative statements like ! or not that can be used in regular expressions?
And the following is what I wanted.
hello(){
imagine{
puts("{HELLO}")
}
}
In a nutshell - what I want to do is
Search for { and }.
If a match is not enclosed in '' or "",
replace { with {\n, or } with \n}.
In the opposite case, I can solve it with (?P<a>\".*){(?P<b>.*?\").
But I have no clue how I can solve it in my case.
First replace all { characters with {\n. This also turns "{ into "{\n. Afterwards, replace every "{\n back with "{.
text = 'hello(){imagine{puts("{HELLO}")}}'
replaced = text.replace('{', '{\n').replace('"{\n', '"{')
You may match single and double quoted (C-style) string literals (those that support escape entities with backslashes) and then match { and } in any other context that you may replace with your desired values.
See Python demo:
import re
text = 'hello(){imagine{puts("{HELLO}")}}'
dblq = r'(?<!\\)(?:\\{2})*"[^"\\]*(?:\\.[^"\\]*)*"'
snlq = r"(?<!\\)(?:\\{2})*'[^'\\]*(?:\\.[^'\\]*)*'"
rx = re.compile(r'({}|{})|[{{}}]'.format(dblq, snlq))
print(rx.pattern)
def repl(m):
    if m.group(1):
        return m.group(1)
    elif m.group() == '{':
        return '{\n'
    else:
        return '\n}'
# Examples
print(rx.sub(repl, text))
print(rx.sub(repl, r'hello(){imagine{puts("Nice, Mr. \"Know-all\"")}}'))
print(rx.sub(repl, "hello(){imagine{puts('MORE {HELLO} HERE ')}}"))
The pattern that is generated in the code above is
((?<!\\)(?:\\{2})*"[^"\\]*(?:\\.[^"\\]*)*"|(?<!\\)(?:\\{2})*'[^'\\]*(?:\\.[^'\\]*)*')|[{}]
It can actually be reduced to
(?<!\\)((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'))|[{}]
Details:
The pattern matches 2 main alternatives. The first one matches single- and double-quoted string literals.
(?<!\\) - no \ immediately to the left is allowed
((?:\\{2})*(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')) - Group 1:
(?:\\{2})* - 0+ repetitions of two consecutive backslashes
(?: - a non-capturing group:
"[^"\\]*(?:\\.[^"\\]*)*" - a double quoted string literal
| - or
'[^'\\]*(?:\\.[^'\\]*)*' - a single quoted string literal
) - end of the non-capturing group
| - or
[{}] - a { or }.
In the repl function, Group 1 is checked first. If it participated in the match, a single- or double-quoted string literal was matched, and it is put back unchanged. Otherwise, if the match value is {, it is replaced with {\n; else it is replaced with \n}.
Replace { with {\n:
text.replace('{', '{\n')
Replace } with \n}:
text.replace('}', '\n}')
Now to fix the braces that were quoted:
text.replace('"{\n','"{')
and
text.replace('\n}"', '}"')
Combined together:
replaced = text.replace('{', '{\n').replace('}', '\n}').replace('"{\n','"{').replace('\n}"', '}"')
Output
hello(){
imagine{
puts("{HELLO}")
}
}
You can check the similarities with the input and try to match them.
text = 'hello(){imagine{puts("{HELLO}")}}'
replaced = text.replace('){', '){\n').replace('{puts', '{\nputs').replace('}}', '\n}\n}')
print(replaced)
output:
hello(){
imagine{
puts("{HELLO}")
}
}
UPDATE
try this: https://regex101.com/r/DBgkrb/1
I am trying to replace a certain group with "" by using regex.
I was searching and doing my best, but it's over my head.
What I want to do is,
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
Inside the first pair of parentheses there must be 4 characters, which may include digits,
and '(/)' comes at the end to close the item.
Here's my code. (I was using matches function)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if (mc.Count > 0)
{
    foreach (Match str in mc)
    {
        print(str.Groups["value"].Value.ToString());
    }
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come directly after a ),
(?<=\))(\w+)
Your c# code would be,
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) Positive lookbehind is used here. It requires the match to start directly after a ) symbol.
() capturing groups.
\w+ Then it captures all the following word characters. It won't capture the following ( symbol because it isn't a word character.
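The lookbehind can be checked quickly outside C# as well. The Python sketch below is only an illustration: it runs the same pattern with re.findall on the string from the question, and the capturing group is dropped so that findall returns the whole matches. The variable names are made up for the example.

```python
import re

text = "(12je)apple(/)(jj92)banana(/)cat"

# Word characters that start directly after a closing parenthesis.
words = re.findall(r"(?<=\))\w+", text)

print(words)
```

The ) in (/) is followed by ( for the first two items, which is not a word character, so only the names are returned.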