Replace first occurrence of text using replaceText(searchPattern, replacement) - regex

I am trying to replace the first occurrence of a paragraph in a Google Doc using the function replaceText(searchPattern, replacement), but I can't seem to find the right regular expression.
If someone could help me I would really appreciate it.
body.replaceText("^"+paragraph.getText()+"$"," ");

The body.replaceText() function replaces all instances of a pattern, not just the first instance.
A better option may be to loop through the paragraphs to find the first with matching text, like so:
function deleteParagraph(textToRemove) {
  var body = DocumentApp.getActiveDocument().getBody();
  // gets all paragraphs as an array
  var paragraphs = body.getParagraphs();
  for (var i = 0; i < paragraphs.length; i++) {
    if (paragraphs[i].getText() === textToRemove) {
      paragraphs[i].clear();
      Logger.log(textToRemove + " was removed");
      // stops it looping through any more paragraphs
      break;
    }
  }
}
If you want to practice with regular expressions then www.regexr.com is very handy.
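One caveat about the "^" + paragraph.getText() + "$" attempt in the question: any regex metacharacters in the paragraph text (dots, parentheses, question marks and so on) are read as pattern syntax, not as literal text. Below is a minimal sketch of an escaping step, assuming the engine treats a backslash-escaped metacharacter as a literal. The names escapeRegex and exactParagraphPattern are made up for illustration and are not part of the Apps Script API.

```javascript
// Hypothetical helper: escape regex metacharacters so the text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build an anchored pattern that matches the paragraph text exactly.
function exactParagraphPattern(paragraphText) {
  return '^' + escapeRegex(paragraphText) + '$';
}
```

The resulting string could then be passed as the search pattern in place of the raw concatenation.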


How to split 1 long paragraph to 2 shorter paragraphs? Google Document

I want paragraphs to be up to 3 sentences only.
For that, my strategy is to loop over all paragraphs, find the end of the 3rd sentence (see note), and then add a "\r" character after it.
This is the code I have:
for (var i = 1; i < paragraphs.length; i++) {
  ...
  sentEnds = paragraphs[i].getText().match(/[a-zA-Z0-9_\u0590-\u05fe][.?!](\s|$)|[.?!][.?!](\s|$)/g);
  // this array is used to count sentences in Hebrew/English/digits that end with 1 or more of either ".", "?" or "!"
  ...
  if ((sentEnds != null) && (sentEnds.length > 3)) {
    lineBreakAnchor = paragraphs[i].getText().match(/.{10}[.?!](\s)/g);
    paragraphs[i].replaceText(lineBreakAnchor[2], lineBreakAnchor[2] + "\r");
  }
}
This works fine on the first run. But if I run the code again, the text after the inserted "\r" character is not recognized as a new paragraph. Hence, more "\r" characters (new lines) are inserted each time the script runs.
How can I make the script "understand" that "\r" means new, separate paragraph?
OR
Is there another character/approach that will do the trick?
Thank you.
Note: I use the last 10 characters of the sentence assuming the match will be unique enough to make only 1 replacement.
You can achieve this without modifying your own regular expression.
Try this approach to split the paragraphs:
Grab the whole content of the document and create an array of sentences.
Insert paragraphs with up to 3 sentences after original paragraphs.
Remove original paragraphs from hell.
function sentenceMe() {
  var doc = DocumentApp.getActiveDocument();
  var paragraphs = doc.getBody().getParagraphs();
  var sentences = [];
  // Split paragraphs into sentences
  for (var i = 0; i < paragraphs.length; i++) {
    var parText = paragraphs[i].getText();
    // Count sentences in Hebrew/English/digits that end with 1 or more of either ".", "?" or "!"
    var sentEnds = parText.match(/[a-zA-Z0-9_\u0590-\u05fe][.?!](\s|$)|[.?!][.?!](\s|$)/g);
    if (sentEnds) {
      for (var j = 0; j < sentEnds.length; j++) {
        var initIdx = 0;
        var sentence = parText.substring(initIdx, parText.indexOf(sentEnds[j]) + 3);
        var parInitIdx = initIdx;
        initIdx = parText.indexOf(sentEnds[j]) + 3;
        parText = parText.substring(initIdx - parInitIdx);
        sentences.push(sentence);
      }
    }
    // console.log(sentences);
  }
  inThrees(doc, paragraphs, sentences);
}
function inThrees(doc, paragraphs, sentences) {
  // define offset
  var offset = paragraphs.length;
  // Create paragraphs with up to 3 sentences
  var k = 0;
  do {
    var parText = sentences.splice(0, 3).join(' ');
    doc.getBody().insertParagraph(k + offset, parText.concat('\n'));
    k++;
  }
  while (sentences.length > 0);
  // Remove paragraphs from hell
  for (var i = 0; i < offset; i++) {
    doc.getBody().removeChild(paragraphs[i]);
  }
}
In case you are wondering about the custom menu, here it is:
function onOpen() {
  var ui = DocumentApp.getUi();
  ui.createMenu('Custom Menu')
    .addItem("3's the magic number", 'sentenceMe')
    .addToUi();
}
References:
DocumentApp.Body.insertParagraph
Actually the detection of sentences is not an easy task.
A sentence does not always end with a dot, a question mark or an exclamation mark. If the sentence ends with a quote then punctuation rules in some countries force you to put the end of the sentence mark inside the quote:
John asked: "Who's there?"
Not every dot marks the end of a sentence. Usually a dot after an uppercase letter does not end the sentence, because it follows an initial. The sentence does not end after "J." here:
The latest Star Wars movie has been directed by J.J. Abrams.
However, sometimes the sentence does end after a capital letter followed by a dot:
This project has been sponsored by NASA.
And abbreviations can make it very hard:
For more information check the article in Phys. Rev. Letters 66, 2697, 2013.
With these difficulties in mind, let's still try to find an expression that works in the "usual" cases.
Make a global match and substitution. Match
((?:[^.?!]+[.?!] +){3})
and substitute it with
\1\r
This looks for 3 sentences (a sentence is a sequence of not-dot, not-?, not-! characters followed by a dot, a ? or a ! and some spaces) and puts a \r after them.
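As a rough illustration, the same match-and-substitute can be tried in plain JavaScript outside of Google Docs. This is only a sketch: breakEveryThreeSentences is a made-up name, and JavaScript writes the backreference in the replacement as $1 instead of \1.

```javascript
// Hypothetical helper: insert "\r" after every run of three sentences.
// A "sentence" here is the simplified definition from the answer:
// characters other than . ? ! followed by one of . ? ! and one or more spaces.
function breakEveryThreeSentences(text) {
  return text.replace(/((?:[^.?!]+[.?!] +){3})/g, '$1\r');
}
```

Note that a final sentence with no trailing space is never counted, so a trailing group of three sentences at the very end of the text is left without a break.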
UPDATED 2020-03-04
Try this:
var regex = new RegExp('((?:[a-zA-Z0-9_\\u0590-\\u05fe\\s]+[.?!]+\\s+){3})', 'gi');
for (var i = 1; i < paragraphs.length; i++) {
  paragraphs[i].replaceText(regex, '$1\\r');
}

RegExp JS regarding sequential pattern matching

P.S.: I know there is an easy solution to my needs, and I could do it that way, but I am looking for a different solution for the sake of learning and challenge. So this is just about solving an algorithm in a less traditional way.
I am working on solving an algorithm and thought I had everything working well, but one use case is failing. That is because I am building a regexp dynamically, and here is my issue.
I need to match letters sequentially up until one doesn't match, and then keep only what did match sequentially.
So, let's say I was matching this:
"zaazizz"
with this: /\bz[a]?[z]?/
"zizzi".match(/\bz[z]?[i]?/)
Currently, that matches [zi], but the match should only be [z].
The sequence zzi only matches "z" from the front of "zizzi", in that order. I know I am using [z]? etc., so each letter is optional, but what I really need is a sequential match: I'd only get "zi" if, from the front, the input matched zzi per my regex. So I need some sort of lookahead or similar. I tried ?= and != with no luck.
I still think a non-regex-approach is best here. Have a look at the following JS-Code:
var match = "abcdef";
var input = "abcxdef";
var mArray = match.split("");
var inArray = input.split("");
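// Compare up to the length of the shorter string (no "- 1", or the last character is never checked)
var max = Math.min(mArray.length, inArray.length);
for (var i = 0; i < max; i++) {
  if (mArray[i] != inArray[i]) { break; }
}
var result = input.substring(0, i);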
Where match is the string to be partially matched, input is the input and input.substring(0, i) is the result of the matching part. And you can change match as often as you like.
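The same idea can be wrapped in a small reusable function. This is a minimal sketch, with sequentialPrefix as a made-up name, applied to the zzi / zizzi case from the question:

```javascript
// Hypothetical helper: return the longest prefix of `input` that matches
// `pattern` letter by letter, stopping at the first mismatch.
function sequentialPrefix(pattern, input) {
  var max = Math.min(pattern.length, input.length);
  var i = 0;
  while (i < max && pattern[i] === input[i]) {
    i++;
  }
  return input.substring(0, i);
}
```

With this, sequentialPrefix("zzi", "zizzi") gives "z", because the second letters (z and i) already differ.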

Google Sheets: REGEXREPLACE match everything except a particular pattern

I am trying to replace everything inside this string:
[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs
1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567,
1410315029
except the following numbers:
1303222074
1403281851
1307239335
1403277567
1410315029
I have built a regex to match them:
1[0-9]{9}
But I have not figured out how to do the opposite, that is, match everything except those numbers.
Google Sheets uses the RE2 regex engine, which doesn't support many useful features that could help you do that. So a basic workaround can help you:
Match what you want to preserve first and capture it:
pattern: [0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)
replacement: $1 (with a space after)
So probably something like this:
=TRIM(REGEXREPLACE("[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029"; "[0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)"; "$1 "))
You can also do this with dynamic native functions:
=REGEXEXTRACT(A1,rept("(\d{10}).*",counta(split(regexreplace(A1,"\d{10}","#"),"#"))-1))
Basically, the string is first split on the desired pattern to figure out how many occurrences there are; the regex is then repeated to dynamically create that number of capture groups, leaving you in the end with only those values.
First of all, thank you Casimir for your help. It gave me an idea that would not be possible with built-in functions and strong regex, lol.
I found out that I can write a custom function for my own purposes (yes, I'm not very "up to date").
It's not very well coded and it returns duplicates. But rather than fixing it properly, I use the built-in UNIQUE() function on top of it to get rid of them. It's ugly and I'm lazy, but it does the job: it returns a list of all matches of one specific regex (which is 1[0-9]{9}). Here it is:
function ti_extract(input) {
  var tab_tis = new Array();
  var tab_strings = new Array();
  tab_tis.push(input.match(/1[0-9]{9}/)); // get the TI and insert in tab_tis
  var string_modif = input.replace(tab_tis[0], " "); // modify source string (remove everything except the TI)
  tab_strings.push(string_modif); // insert this new string in the table
  var v = 0;
  var patt = new RegExp(/1[0-9]{9}/);
  var fin = patt.test(tab_strings[v]);
  var first_string = tab_strings[v];
  do {
    first_string = tab_strings[v]; // string 0, or the string with the first removed TI
    tab_tis.push(first_string.match(/1[0-9]{9}/)); // analyze the string and get the new TI to put it in the table
    var string_modif2 = first_string.replace(tab_tis[v], " "); // modify the string again to remove the new TI from the old string
    tab_strings.push(string_modif2);
    v += 1;
  }
  while (v < 15);
  return tab_tis;
}
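For comparison, here is a shorter sketch of the same extraction that relies on the global (g) flag, so that match() returns every occurrence in one call. The name extractTis is made up for illustration, and this has not been tested as a Sheets custom function.

```javascript
// Hypothetical alternative to ti_extract: return every 10-digit number starting with 1.
function extractTis(input) {
  // With the g flag, match() returns an array of all matches, or null when there are none.
  var matches = input.match(/1[0-9]{9}/g);
  return matches === null ? [] : matches;
}
```

Each number is returned once per occurrence in the text, so UNIQUE() would only be needed if the source string itself repeats a number.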

Using regular expressions to add numbers using find and replace in Notepad++

I have a stored procedure (SPROC) that contains multiple instances of a string, say '#TRML_CLOSE'.
I want to append a sequence number to each of them.
E.g.:
Search and find string '#TRML_CLOSE'
And
Replace the 1st instance with '#TRML_CLOSE_1',
Replace the 2nd instance with '#TRML_CLOSE_2',
Replace the 3rd instance with '#TRML_CLOSE_3',
and so on.
How do I achieve this in Notepad++ using regular expressions?
I don't know the extent to which you can script Notepad++, but I do know you can throw together a quick JavaScript snippet to do what you want. http://jsfiddle.net/x4eSr/
Just go to the JS fiddle, and hit the button.
document.getElementById("btn").onclick = function() {
  var elm = document.getElementById("txt");
  var val = elm.value;
  var cnt = 1;
  // (?!_) skips occurrences that already have a "_" suffix
  val = val.replace(/#TRML_CLOSE(?!_)/g, function(m) {
    return m + "_" + cnt++;
  });
  elm.value = val;
};
This uses JavaScript's string.replace(regex, function(){}), which calls the function on each match, together with a "cnt" variable that is incremented on every call.
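For anyone who wants to try this outside the browser page, here is a standalone sketch of the same technique. The function name numberTrmlClose is made up, and the negative lookahead (?!_) is an assumption about the intent: it skips occurrences that already carry a "_" suffix.

```javascript
// Hypothetical standalone version: append _1, _2, ... to each '#TRML_CLOSE'
// that is not already followed by an underscore.
function numberTrmlClose(text) {
  var counter = 1;
  return text.replace(/#TRML_CLOSE(?!_)/g, function (match) {
    // The callback runs once per match, in order of appearance.
    return match + '_' + counter++;
  });
}
```

The counter lives inside the function, so each call starts numbering again from 1.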

I want to check a string against many different regular expressions at once

I have a string which the user has entered, and I have my regular expressions stored in my database. I can check the input string against those regular expressions fine.
But now I need to add another column to my database which will hold another regular expression. I want to use the same for loop to check the input string against my new regular expression as well, but only at the end of my first loop, and I want to use this new expression against the same string.
i.e.
\\D\\W\\D <-- first expression
\\d <-- second expression which I want to use after the first expression is over
Using regular expressions from the database against the input string: works.
Adding the new regular expression, incorporating it within the same loop and checking it against the same string: not working.
My code is as follows:
std::string errorMessages [2][2] = {
    {
        "Correct .R\n",
    },
    {
        "Free text characters out of bounds\n",
    }
};
for(int i = 0; i < el.size(); i++)
{
    if(el[i].substr(0,3) == ".R/")
    {
        DCS_LOG_DEBUG("--------------- Validating .R/ ---------------");
        output.push_back("\n--------------- Validating .R/ ---------------\n");
        str = el[i].substr(3);
        split(st,str,boost::is_any_of("/"));
        DCS_LOG_DEBUG("main loop done");
        for (int split_id = 0 ; split_id < splitMask.size() ; split_id++ )
        {
            boost::regex const string_matcher_id(splitMask[split_id]);
            if(boost::regex_match(st[split_id],string_matcher_id))
            {
                a = errorMessages[0][split_id];
                DCS_LOG_DEBUG("" << a );
            }
            else
            {
                a = errorMessages[1][split_id];
                DCS_LOG_DEBUG("" << a);
            }
            output.push_back(a);
        }
        DCS_LOG_DEBUG("Out of the loop 2");
    }
}
How can I retrieve my regular expression from the database and, after this loop has finished, use this new regex against the same string?
The string is: shamari
The regular expression I want to add: "\\d"
Ask me any questions if you do not understand.
I'm not sure I understand you entirely, but if you're asking "How do I combine two separate regexes into a single regex", then you need to do
combinedRegex = "(?:" + firstRegex + ")|(?:" + secondRegex + ")"
if you want an "or" comparison (either one of the parts must match).
For an "and" comparison it's a bit more complicated, depending on whether these regexes match the entire string or only a substring.
Be aware that if the second regex uses numbered backreferences, this won't work since the indexes will change: (\w+)\1 and (\d+)\1 would have to become (?:(\w+)\1)|(?:(\d+)\2), for example.
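To illustrate the "or" combination, here is a small sketch in JavaScript (not the C++/Boost code from the question). The helper names combineOr and matchesWhole are made up, and matchesWhole assumes regex_match-style semantics where the entire input has to match.

```javascript
// Hypothetical helper: OR-combine two regex source strings, each wrapped
// in a non-capturing group so the alternation applies to the whole of each part.
function combineOr(firstRegex, secondRegex) {
  return '(?:' + firstRegex + ')|(?:' + secondRegex + ')';
}

// Hypothetical helper: check that the entire input matches the pattern.
function matchesWhole(pattern, input) {
  return new RegExp('^(?:' + pattern + ')$').test(input);
}
```

For example, with the two expressions from the question, combineOr('\\D\\W\\D', '\\d') accepts "a-b" and "7" as whole strings, but not "shamari". The caveat about numbered backreferences still applies; this sketch does not renumber them.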