Regex - find duplicate sets, keep them and remove everything else

What regex removes everything except the second and later occurrences of duplicated entries?
The entries are separated by commas.
Example input:
#20131229PV1,#20140109PV5,#20140101PV1,#20140109PV5,#20140109PV5,#20131224PV5,
Expected output after the regex is applied:
#20140109PV5,#20140109PV5,

Unfortunately, there is no straightforward way to find duplicate string sets using regex alone. You need a good string-based algorithm, implemented in your favorite programming language, to achieve this.

Answered through a different approach (JavaScript):
function find_duplicates(arr) {
  var out = [],
      counts = {};
  // Count how many times each entry occurs.
  for (var i = 0; i < arr.length; i++) {
    var item = arr[i];
    counts[item] = counts[item] >= 1 ? counts[item] + 1 : 1;
  }
  // Collect every entry that occurs more than once.
  for (var key in counts) {
    if (counts[key] > 1) {
      out.push(key);
    }
  }
  alert(out);
}
find_duplicates(bookcaldatesci.innerHTML.split(","));
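The function above lists each duplicated entry once, while the question asked to keep the second and later occurrences. A minimal sketch of that variant follows; keepRepeatOccurrences is a hypothetical helper name, and the sketch assumes entries are compared as exact strings and that the output should keep the trailing comma shown in the question.

```javascript
// Hypothetical helper (not from the original answer): keeps the second and
// later occurrences of each comma-separated entry, in their original order.
function keepRepeatOccurrences(csv) {
  const seen = new Set();
  const kept = [];
  for (const entry of csv.split(",")) {
    if (entry === "") continue; // skip the empty piece after a trailing comma
    if (seen.has(entry)) {
      kept.push(entry); // already seen, so this is a repeat
    } else {
      seen.add(entry); // first occurrence: remember it, do not keep it
    }
  }
  // Re-add the trailing comma used in the question's sample data.
  return kept.map((entry) => entry + ",").join("");
}

const sample =
  "#20131229PV1,#20140109PV5,#20140101PV1,#20140109PV5,#20140109PV5,#20131224PV5,";
console.log(keepRepeatOccurrences(sample)); // "#20140109PV5,#20140109PV5,"
```

A Set keeps the lookup simple and avoids the edge cases of using a plain object as a map.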

Related

Regex for the first 3 digits in a 6 digit number

I have items in a CMS, each of which has a 6-digit number.
The user can filter these items via an input field
by typing a number.
const list = document.querySelector('#filter-wrap');
const searchBar = document.forms['search-kelim'].querySelector('input');
searchBar.addEventListener('keyup', function(e){
  const term = e.target.value.toLowerCase();
  const kelims = list.getElementsByClassName('filter-item');
  Array.from(kelims).forEach(function(kelim){
    let number = kelim.firstElementChild.textContent;
    if(number.toLowerCase().indexOf(term) != -1 ){
      console.log("Valid");
    } else {
      console.log("Invalid");
    }
  });
});
This works, but it matches no matter where the digits
occur within the 6-digit number.
The aim is to filter only on the first 3 digits, starting from the first digit.
That is, if the user types 2, only the items starting with 2 are shown;
if the user then types 1, only the items starting with 21 are shown.
(The same goes for the third digit: typing 214 matches only the items starting with 214.)
Instead of indexOf I tried a regex, but cannot get it to work:
var re = new RegExp("^[0-9]+$");
if (re.test(term)) {
  console.log("Valid");
} else {
  console.log("Invalid");
}
I also tried these regexes:
var re = new RegExp("^[0-9]");
var re = new RegExp("^\d{3}[0-9]");
var re = new RegExp("/[0-9]{1}[0-9]{1}[0-9]{1}/");
I also tried match, but had no luck there either (different syntax?).
UPDATE:
Here are two CodePens for better understanding:
Filter with indexOf, working, but not limited to the first 3 digits:
https://codepen.io/hhentschel/pen/LYNWKeK
Filter with regex, where I tried all the different answers that have come up so far:
https://codepen.io/hhentschel/pen/yLOMmbw

Your number variables all start with a line break. You may easily check that if you add console.log("'"+number+"': number") in the code.
To fix the regex approach, you just need to trim the incoming strings:
var re = new RegExp("^"+term);
if (re.test(number.trim())) { // <-- HERE!
  kelim.classList.add("block");
  kelim.classList.remove("hide");
} else {
  kelim.classList.add("hide");
  kelim.classList.remove("block");
}

Just check whether the index is 0:
if(number.toLowerCase().indexOf(term) == 0){
  console.log("Valid");
} else {
  console.log("Invalid");
}
That way you know that the term is at the beginning of the number.
But if you want to use a regex, you have to build a new pattern every time:
var re = new RegExp("^"+term);
if (re.test(number)) {
  console.log("Valid");
} else {
  console.log("Invalid");
}
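A prefix check can also be written without building a regex on each keystroke. The sketch below is only an illustration: matchesPrefix is a hypothetical helper name, and it assumes that the item text may carry surrounding whitespace (as the first answer points out) and that only the first three typed digits should count.

```javascript
// Hypothetical helper (not part of the original code).
// Assumption: only the first three typed characters are used for filtering.
function matchesPrefix(number, term) {
  const prefix = term.trim().slice(0, 3); // ignore anything typed past 3 digits
  return number.trim().startsWith(prefix); // trim removes the leading line break
}

console.log(matchesPrefix("\n214567", "21")); // true
console.log(matchesPrefix("\n124567", "2")); // false
```

Because no pattern is built from user input, characters such as "(" or "*" typed into the field cannot produce an invalid regex.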

RegExp JS regarding sequential pattern matching

P.S.: I know there is an easy solution for what I need, and I could do it that way, but I am looking for a different solution for the sake of learning and for the challenge. So this is just about solving an algorithm in a less traditional way.
I am working on solving an algorithm and thought I had everything working well, but one use case is failing. That is because I am building a regexp dynamically, and this is my issue.
I need to match letters sequentially up until one doesn't match, and then keep only what matched sequentially.
So, let's say I was matching this:
"zaazizz"
with this: /\bz[a]?[z]?/
"zizzi".match(/\bz[z]?[i]?/)
Currently, that matches [zi], but the match should only be [z].
The letters zzi, in that order, only match "z" from the front of "zizzi". I know I am using [z]? etc., so each letter is optional, but what I really need is a sequential match: I'd only get "zi" if, from the front, it matched zzi per my regex. So I need some sort of lookahead? I tried ?= and != with no luck.

I still think a non-regex approach is best here. Have a look at the following JS code:
var match = "abcdef";
var input = "abcxdef";
var mArray = match.split("");
var inArray = input.split("");
// Compare every position up to the length of the shorter string.
var max = Math.min(mArray.length, inArray.length);
for (var i = 0; i < max; i++) {
  if (mArray[i] != inArray[i]) { break; }
}
var result = input.substring(0, i); // "abc"
Here match is the string to be partially matched, input is the input, and result (input.substring(0, i)) is the matching part. You can change match as often as you like.
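If a regex is still wanted for the exercise, one way to force sequential matching is to nest the optional parts, so that a letter can only match when every letter before it has matched. The sketch below builds such a pattern dynamically; sequentialPrefixRegex and escapeRegExp are hypothetical helper names, and the sketch assumes the first letter is required, as in the question's own patterns.

```javascript
// Escape characters that have a special meaning inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: "zzi" becomes /^z(?:z(?:i)?)?/.
// Each later letter sits inside the optional group of the previous one,
// so it cannot match unless the previous letter matched first.
function sequentialPrefixRegex(pattern) {
  const chars = Array.from(pattern).map(escapeRegExp);
  if (chars.length === 0) return new RegExp("^");
  let body = "";
  for (let i = chars.length - 1; i >= 1; i--) {
    body = "(?:" + chars[i] + body + ")?";
  }
  return new RegExp("^" + chars[0] + body);
}

console.log("zizzi".match(sequentialPrefixRegex("zzi"))[0]); // "z"
console.log("zzizz".match(sequentialPrefixRegex("zzi"))[0]); // "zzi"
```

With flat optional parts such as [z]?[i]?, the engine may skip the z and still take the i; nesting removes that possibility.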

Replace first occurrence of text using replaceText(searchPattern, replacement)

I am trying to replace the first occurrence of a paragraph in a Google Doc using the function replaceText(searchPattern, replacement), but I can't seem to find the right regular expression.
If someone could help me, I would really appreciate it.
body.replaceText("^"+paragraph.getText()+"$"," ");

The body.replaceText() function replaces all instances of a pattern, not just the first instance.
A better option may be to loop through the paragraphs to find the first with matching text, like so:
function deleteParagraph(textToRemove) {
  var body = DocumentApp.getActiveDocument().getBody();
  // Get all paragraphs as an array.
  var paragraphs = body.getParagraphs();
  for (var i = 0; i < paragraphs.length; i++) {
    if (paragraphs[i].getText() === textToRemove) {
      paragraphs[i].clear();
      Logger.log(textToRemove + " was removed");
      // Stop looping through any more paragraphs.
      break;
    }
  }
}
If you want to practice with regular expressions then www.regexr.com is very handy.

Google Sheets: REGEXREPLACE match everything except a particular pattern

I am trying to replace everything inside this string:
[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs
1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567,
1410315029
except the following numbers:
1303222074
1403281851
1307239335
1403277567
1410315029
I have built a regex to match them:
1[0-9]{9}
But I have not figured out how to do the opposite, that is, match everything except these matches.

Google Sheets uses the RE2 regex engine, which doesn't support many useful features that would help here. A basic workaround is to match what you want to preserve first and capture it:
pattern: [0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)
replacement: $1 (with a space after)
So probably something like this:
=TRIM(REGEXREPLACE("[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs 1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029"; "[0-9]*(?:[0-9]{0,9}[^0-9]+)*(?:([0-9]{9,})|[0-9]*\z)"; "$1 "))

You can also do this with dynamic native functions:
=REGEXEXTRACT(A1,rept("(\d{10}).*",counta(split(regexreplace(A1,"\d{10}","#"),"#"))-1))
Basically, it first splits by the desired pattern to figure out how many occurrences there are, then repeats the regex to dynamically create that number of capture groups, leaving you in the end with only those values.

First of all, thank you Casimir for your help. It gave me an idea that would not be possible with built-in functions and heavy regex alone, lol.
I found out that I can write a custom function for my own purposes (yes, I'm not very "up to date").
It's not very well coded and it returns duplicates. Rather than fixing it properly, I use the built-in UNIQUE() function on top of it to get rid of them. It's ugly and I'm lazy, but it does the job, that is, it returns a list of all matches of one specific regex (which is 1[0-9]{9}). Here it is:
function ti_extract(input) {
  var tab_tis = new Array();
  var tab_strings = new Array();
  tab_tis.push(input.match(/1[0-9]{9}/)); // get the TI and insert it in tab_tis
  var string_modif = input.replace(tab_tis[0], " "); // modify the source string (remove the TI just found)
  tab_strings.push(string_modif); // insert this new string in the table
  var v = 0;
  var first_string = tab_strings[v];
  do {
    first_string = tab_strings[v]; // string 0, or the string with the first TI removed
    tab_tis.push(first_string.match(/1[0-9]{9}/)); // analyze the string and put the next TI in the table
    var string_modif2 = first_string.replace(tab_tis[v], " "); // modify the string again to remove a found TI
    tab_strings.push(string_modif2);
    v += 1;
  } while (v < 15);
  return tab_tis;
}
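The same list can probably be obtained more directly with a global match. The sketch below is plain JavaScript and only an illustration: extractTis is a hypothetical name, the pattern is the 1[0-9]{9} regex from the question, and how the returned array is laid out in a sheet when used as a custom function is not covered here.

```javascript
// Hypothetical helper: returns every distinct 10-digit number starting with 1.
function extractTis(input) {
  // The g flag makes match() return all matches instead of only the first;
  // match() returns null when nothing matches, hence the fallback to [].
  const matches = String(input).match(/1[0-9]{9}/g) || [];
  // A Set drops duplicates while keeping first-seen order.
  return Array.from(new Set(matches));
}

const text =
  "[JGMORGAN - BANK2] n° 10 NEWYORK, n° 222 CAEN, MONTELLIER, VANNES / TARARTA TIs " +
  "1303222074, 1403281851 & 1307239335 et Cloture TIs 1403277567, 1410315029";
console.log(extractTis(text));
```

Because duplicates are removed inside the function, no extra UNIQUE() call would be needed under these assumptions.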

Using regular expressions to add numbers using find and replace in Notepad++

I have a SPROC that contains multiple instances of a string, say '#TRML_CLOSE'.
I want each instance to be concatenated with a sequence number.
E.g.:
Search for and find the string '#TRML_CLOSE'
and
replace the 1st instance with '#TRML_CLOSE_1',
replace the 2nd instance with '#TRML_CLOSE_2',
replace the 3rd instance with '#TRML_CLOSE_3',
and so on.
How do I achieve this in Notepad++ using regular expressions?

I don't know to what extent you can script Notepad++, but I do know you can throw together a quick JavaScript snippet to do what you want. http://jsfiddle.net/x4eSr/
Just go to the JS fiddle, and hit the button.
document.getElementById("btn").onclick = function() {
  var elm = document.getElementById("txt");
  var val = elm.value;
  var cnt = 1;
  // (?!_) skips instances that are already followed by an underscore.
  val = val.replace(/#TRML_CLOSE(?!_)/g, function(m) {
    return m + "_" + cnt++;
  });
  elm.value = val;
};
This uses JavaScript's string.replace(regex, function(){}), which calls the function on each match, together with a "cnt" variable that is incremented across all matches.
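The same idea can be sketched as a standalone function that runs outside a browser. numberOccurrences is a hypothetical name, and the sketch assumes that instances already followed by an underscore should be left untouched.

```javascript
// Hypothetical helper: appends _1, _2, _3, ... to each occurrence of token.
function numberOccurrences(text, token) {
  // Escape the token so it is matched literally inside the regex.
  const escaped = token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?!_) skips occurrences that are already followed by an underscore.
  const pattern = new RegExp(escaped + "(?!_)", "g");
  let counter = 0;
  // The callback runs once per match, so the counter grows with each one.
  return text.replace(pattern, (match) => match + "_" + (++counter));
}

console.log(numberOccurrences("DROP #TRML_CLOSE; DROP #TRML_CLOSE;", "#TRML_CLOSE"));
// "DROP #TRML_CLOSE_1; DROP #TRML_CLOSE_2;"
```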
