Regex match only working with first catches (JavaScript) - regex

I have a file's contents in memory. Within the file, there are variables of the form:
{{ _("variable1") }}
{{ _("variable2") }}
{{ _("variable3") }}
I'm trying to catch them with /\{\{ _(.+) \}\}/i:
var result = /\{\{ _(.+) \}\}/i.exec(fileContents);
It seems to work at first, as the first two variables are pushed into the array, but then it pushes the whole file content.
What am I missing?
BONUS: It would be awesome if I could grab variable1 instead of {{ _("variable1") }} but I can live with it.

What you need is the g flag. This way you get an additional match every time you call exec (until there are no further matches, and you get null). For the bonus, just include the (" and ") in the pattern, so that they are not captured. Finally, you might want to make the .+ ungreedy, otherwise you'll get funny surprises if there are multiple occurrences of this pattern in a single line:
var r = /\{\{ _\("(.+?)"\) \}\}/ig;
var m;
while ((m = r.exec(fileContents)) !== null)
{
// m[0] will contain the entire match
// m[1] will contain the contents of the quotes
}
By the way, if "variable1" cannot contain escaped quotes (like "some\"oddvariable"), then this regex should be slightly more efficient:
r = /\{\{ _\("([^"]*)"\) \}\}/ig
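As a minimal sketch of the loop described above, the snippet below collects every captured name into an array. The sample fileContents string and the collectVariables helper are assumptions made for illustration; they do not come from the question.

```javascript
// Hypothetical sample input mirroring the question's template syntax.
const fileContents = [
  'Hello {{ _("variable1") }} and {{ _("variable2") }}',
  'Another line with {{ _("variable3") }}',
].join('\n');

// Hypothetical helper: returns the text between the quotes of every match.
function collectVariables(text) {
  // A fresh regex per call, so lastIndex always starts at 0.
  const pattern = /\{\{ _\("([^"]*)"\) \}\}/g;
  const names = [];
  let match;
  // With the g flag, each exec() call resumes after the previous match
  // and returns null once there are no further matches.
  while ((match = pattern.exec(text)) !== null) {
    names.push(match[1]); // capture group 1: the contents of the quotes
  }
  return names;
}

const names = collectVariables(fileContents);
console.log(names);
```

In engines that support it, String.prototype.matchAll with the same g-flagged pattern should give the same matches without managing the loop by hand.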

Related

Regex: Replace semi-colons with enter key "\n"

I have a string to replace semi-colons with \n. The requirement I have is to detect only those semi-colons that are outside HTML <> tags and replace them with \n.
I have come very close by using this regex by implementing multiple fixes.
/((?:^|>)[^<>]*);([^<>]*(?:<|$))/g, '$1\n$2'
The above regex works well if I input string like the below one -
Value1;<p style="color:red; font-weight:400;">Value2</p>;<p style="color:red; font-weight:400;">Value3</p>;Value4
The output it gives is this (which is expected and correct) -
Value1
<p style="color:red; font-weight:400;">Value2</p>
<p style="color:red; font-weight:400;">Value3</p>
Value4
But fails if I input string like - M1;M2;M3
The output this gives is -
M1;M2
M3
(semi-colon doesn't remove between M1 and M2).
whereas the expected output should be -
M1
M2
M3
Also the string can be like this too (both combined) - M1;M2;M3;Value1;<p style="color:red; font-weight:400;">Value2</p>;<p style="color:red; font-weight:400;">Value3</p>;Value4
The major goal is to replace all the semicolons outside HTML tags <> with '\n' (a newline).
You can use this regex together with JavaScript's .replace() function:
/(<[^<>]*>)|;/g
For substitution, you may use this function:
(_, tag) => tag || '\n'
If (<[^<>]*>) matches anything, which will be an HTML tag, it is passed in as the tag parameter; otherwise a ; outside any tag must have been matched.
So you can check whether tag is defined. If it is, replace the match with itself; otherwise replace it with \n.
const text = `Value1;<p style="color:red; font-weight:400;">Value2</p>;<p style="color:red; font-weight:400;">Value3</p>;Value4
M1;M2;M3`;
const regex = /(<[^<>]*>)|;/g;
const result = text.replace(regex, (_, tag) => tag || '\n');
console.log(result);

How to replace newline or soft linebreak (ctrl+enter) in Google doc app script?

I have a working template (Google Doc) containing variables with the following patterns, to be replaced with values:
{{BASIC SALARY_.Description}}
{{OT1.5.Description}}
{{MEL ALW.Description}}
{{OST ALW.Description}}
{{TRV ALW.Description}}
{{ADV SAL.Description}}
Note: I am using soft line breaks (Ctrl+Enter) in the Google Doc because I couldn't figure out how to detect the normal line-break patterns "\n", "\n", "\r\n". My results are always odd, though: some lines need to be replaced with proper descriptions, while others need to be removed entirely (the whole {{pattern}} together with its line break, to avoid an empty line).
I have tried multiple regex patterns and searched online forums:
https://github.com/google/re2/wiki/Syntax
Eliminate newlines in google app script using regex
Use RegEx in Google Doc Apps Script to Replace Text
and figured out that soft line breaks (matched by the pattern \v) are the only thing I can work with. Please check my sample code, as the pattern replacement doesn't work as expected.
// code block 1
var doc = DocumentApp.openById(flPayslip.getId());
var body = doc.getBody();
body.replaceText("{{BASIC SALARY_.Description}}", "Basic Salary");
body.replaceText("{{OST ALW.Description}}", "Outstation Allowance");
// code block 2
var doc = DocumentApp.openById(flPayslip.getId());
var body = doc.getBody();
body.replaceText("{{BASIC SALARY_.Description}}", "Basic Salary");
body.replaceText("{{OST ALW.Description}}", "Outstation Allowance");
body.replaceText("{{.*}}\\v+", ""); // to replace soft linebreak
Actual Result of code block 1
Basic Salary
{{OT1.5.Description}}
{{MEL ALW.Description}}
Outstation Allowance
{{TRV ALW.Description}}
{{ADV SAL.Description}}
Actual Result of code block 2:
Basic Salary
Issue: "Outstation Allowance" was removed by the regex replacement.
Expected result
Basic Salary
Outstation Allowance
What's the proper regex pattern I should use in my code?
Try
body.replaceText("{{[^\\v]*?}}\\v+", ""); // no \v allowed inside {{ }}, and *? is non-greedy
When you use {{.*}}, the greedy .* matches everything between the first {{ and the last }}:
Basic Salary
{{
OT1.5.Description}}
{{MEL ALW.Description}}
Outstation Allowance
{{TRV ALW.Description}}
{{ADV SAL.Description
}}
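The difference between the greedy and non-greedy patterns can be sketched in plain JavaScript. This is only an approximation: it uses JavaScript's regex engine as a stand-in for the RE2 engine behind replaceText, and it models the document as a single string with soft line breaks written as "\v". The template string itself is an assumption built from the question's sample.

```javascript
// Assumed model of the document: lines joined by soft line breaks (\v).
const template = [
  'Basic Salary',
  '{{OT1.5.Description}}',
  '{{MEL ALW.Description}}',
  'Outstation Allowance',
  '{{TRV ALW.Description}}',
  '{{ADV SAL.Description}}',
  '',
].join('\v');

// Greedy: .* runs from the first {{ to the last }}, swallowing
// "Outstation Allowance" along the way.
const greedyResult = template.replace(/\{\{.*\}\}\v+/g, '');

// Non-greedy and \v-free: each {{...}} is matched on its own,
// together with the soft line break that follows it.
const lazyResult = template.replace(/\{\{[^\v]*?\}\}\v+/g, '');

console.log(JSON.stringify(greedyResult));
console.log(JSON.stringify(lazyResult));
```

Behaviour inside Google Docs may differ in details, so treat this as a way to reason about the pattern rather than as a test of replaceText itself.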

Regex trim all <br>'s on a string while ignoring line breaks and spaces

var str = `
<br><br/>
<Br>
foobar
<span>yay</span>
<br><br>
catmouse
<br>
`;
//this doesn't work but what I have so far
str.replace(/^(<br\s*\/?>)*|(<br\s*\/?>)*$/ig, '');
var desiredOutput = `
foobar
<span>yay</span>
<br><br>
catmouse
`;
I want to remove all leading and trailing <br>'s, regardless of case or whether a closing slash is present, while keeping any <br>'s that sit in the middle of the text. There may be other HTML tags present.
Edit: I want to note that this will be happening server-side so DOMParser won't be available to me.
We may try using the following pattern:
^\s*(<br\/?>\s*)*|(<br\/?>\s*)*\s*$
This pattern targets <br> tags (and their variants) only if they occur at the start or end of the string, possibly preceded or followed by some whitespace.
var str = '<br><br/>\n<Br>\nfoobar\n<span>yay</span>\n<br><br>\ncatmouse\n<br>';
console.log(str + '\n');
str = str.replace(/^\s*(<br\/?>\s*)*|(<br\/?>\s*)*\s*$/ig, '');
console.log(str);
Note that in general parsing HTML with regex is not advisable. But in this case, since you just want to remove flat non-nested break tags from the start and end, regex might be viable.
Don't use a regular expression for this - regular expressions and HTML parsing don't work that well together. Even if it's possible with a regex, I'd recommend using DOMParser instead; transform the text into a document, and iterate through the first and last nodes, removing them while their tagName is BR (and removing empty text nodes too, if they exist):
var str = `
<br><br/>
<Br>
foobar
<span>yay</span>
<br><br>
catmouse
<br>
`;
const body = new DOMParser().parseFromString(str.trim(), 'text/html').body;
const nodes = [...body.childNodes];
let node;
while (node = nodes.shift(), node.tagName === 'BR') {
node.remove();
const next = nodes[0];
if (next.nodeType === 3 && next.textContent.trim() === '') nodes.shift().remove();
}
while (node = nodes.pop(), node.tagName === 'BR') {
node.remove();
const next = nodes[nodes.length - 1];
if (next.nodeType === 3 && next.textContent.trim() === '') nodes.pop().remove();
}
console.log(body.innerHTML);
Note that it gets a lot easier if you don't have to worry about empty text nodes, or if you don't care about whether there are empty text nodes or not in HTML output.
Try
/^(\s*<br\s*\/?>)*|(<br\s*\/?>\s*)*$/ig
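For a server-side check without DOMParser, here is a hedged sketch that applies one variant of the regex approach to the question's sample. The pattern and the trimBreaks name are my own assumptions, not taken from any of the answers: it requires at least one <br> on each side and also consumes the whitespace around the removed tags.

```javascript
// Hypothetical helper: strips leading and trailing <br> tags (any case,
// optional slash) plus the whitespace around them; inner <br>s are kept.
function trimBreaks(html) {
  const edgeBreaks = /^(?:\s*<br\s*\/?>)+\s*|\s*(?:<br\s*\/?>\s*)+$/gi;
  return html.replace(edgeBreaks, '');
}

const sample = `
<br><br/>
<Br>
foobar
<span>yay</span>
<br><br>
catmouse
<br>
`;

const trimmed = trimBreaks(sample);
console.log(trimmed);
```

Unlike the desired output in the question, this variant also drops the newline next to the removed tags; loosen the \s* parts if that whitespace should stay.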

Replace dots with underscores in right part of the line [duplicate]

Say I have this piece of text:
some.blah.key={{blah.woot.wiz}}
some.another.foo.key={{foo.bar.qix.name}}
+ many other lines with a variable number of words separated by dots within {{ and }}
I'd like the following outcome after replacing dots with underscores in the right part (between the {{ and }} delimiters):
some.blah.key={{blah_woot_wiz}}
some.another.foo.key={{foo_bar_qix_name}}
...
I'm looking for the appropriate regex to perform the replacement in a one-liner sed command.
I have a lead with this one: https://regex101.com/r/8wsLHo/1 but it captures all dots, including those in the left part, which I don't want.
I tried this variation to exclude those in the left part, but then it doesn't capture anything anymore: https://regex101.com/r/d7WAmX/1
You can use a loop:
sed ':a;s/\({{[^}]*\)\./\1_/;ta' file
:a defines a label "a"
ta jumps to "a" when something is replaced.
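To see why the loop is needed, the same idea can be sketched in JavaScript: each pass replaces only one dot inside {{ }}, so the substitution is repeated until nothing changes, which is what the :a label and ta do in sed. The underscoreDotsInBraces name is an assumption for this illustration.

```javascript
// Hypothetical helper mirroring sed ':a;s/\({{[^}]*\)\./\1_/;ta'.
function underscoreDotsInBraces(line) {
  // Matches "{{" plus any non-"}" characters up to a dot; no g flag,
  // so each pass replaces exactly one dot (like a single sed s command).
  const pattern = /(\{\{[^}]*)\./;
  let previous;
  do {
    previous = line;
    line = line.replace(pattern, '$1_');
  } while (line !== previous); // like "ta": loop while a replacement happened
  return line;
}

const first = underscoreDotsInBraces('some.blah.key={{blah.woot.wiz}}');
const second = underscoreDotsInBraces('some.another.foo.key={{foo.bar.qix.name}}');
console.log(first);
console.log(second);
```

Dots to the left of {{ are never touched, because the pattern can only match starting at the opening braces.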
I came up with this quite complex one-liner:
sed "h;s/=.*/=/;x;s/.*=//;s/\./_/g;H;x;s/\n//"
explanations:
h: put line in hold buffer
s/=.*/=/: clobber right part after =
x: swap to put line in main buffer again, first part in hold buffer
s/.*=//: clobber left part before =
s/\./_/g: perform replacement of dots now that there's only right part in main buffer
H: append main buffer to hold buffer
x: swap buffers again
s/\n//: remove linefeed or both parts appear on separate lines
That was quite fun, but sed may not be the best tool for this operation; it feels more like code golf.

Vim: Get content of syntax element under cursor

I'm on a highlighted complex syntax element and would like to get its content. Can you think of any way to do this?
Maybe there's some way to search for a regular expression so that it contains the cursor?
EDIT
Okay, example. The cursor is inside a string, and I want to get the text, the content of this syntactic element. Consider the following line:
String myString = "Foobar, [CURSOR]cool \"string\""; // And some other "rubbish"
I want to write a function that returns
"Foobar, cool \"string\""
If I understood the question correctly, this may help. I found this gem some time ago (I don't remember where) and used it to understand how syntax highlighting works in Vim:
" Show syntax highlighting groups for word under cursor
nmap <leader>z :call <SID>SynStack()<CR>
function! <SID>SynStack()
if !exists("*synstack")
return
endif
echo map(synstack(line('.'), col('.')), 'synIDattr(v:val, "name")')
endfunc
The textobj-syntax plugin might help. It creates a custom text object, so that you can run viy to visually select the current syntax highlighted element. The plugin depends on the textobj-user plugin, which is a framework for creating custom text objects.
This is a good use for text objects (:help text-objects). To get the content you're looking for (Foobar, cool \"string\"), you can just do:
yi"
y = yank
i" = the text object "inner quoted string"
The yank command by default uses the unnamed register ("", see :help registers), so you can access the yanked contents programmatically using the getreg() function or the shorthand @{register-name}:
:echo 'String last yanked was:' getreg('"')
:echo 'String last yanked was:' @"
Or you can yank the contents into a different register:
"qyi"
yanks the inner quoted string into the "q register, so it doesn't conflict with standard register usage (and can be accessed as the @q variable).
EDIT: Seeing that the plugin mentioned by nelstrom works similarly to my original approach, I settled on this slightly more elegant solution:
fu s:OnLink()
let stack = synstack(line("."), col("."))
return !empty(stack)
endf
normal mc
normal $
let lineLength = col(".")
normal `c
while col(".") > 1
normal h
if !s:OnLink()
normal l
break
endif
endwhile
normal ma`c
while col(".") < lineLength
normal l
if !s:OnLink()
normal h
break
endif
endwhile
normal mb`av`by