How to remove and ID from a string - regex

I have a string that looks like this, they are ids in a table:
1,2,3,4,5,6,7,8,9
If someone deletes something from the database, I will need to update the string. I know that doing this it will remove the value, but not the commas. Any idea how can I check if the id has a comma before and after so my string doesn't break?
$new_values = $original_values[0];
$new_values =~ s/$car_id//;
Result: 1,2,,4,5,6,7,8,9 using the above sample (bad). It should be 1,2,4,5,6,7,8,9.

To remove the $car_id from the string:
my $car_id = 3;
my $new_values = q{1,2,3,4,5,6,7,8,9};
$new_values = join q{,}, grep { $_ != $car_id }
split /,/, $new_values;
say $new_values;
# Prints:
# 1,2,4,5,6,7,8,9
If you already removed the id(s), and you need to remove the extra commas, reformat the string like so:
my $new_values = q{,,1,2,,4,5,6,7,8,9,,,};
$new_values = join q{,}, grep { /\d/ } split /,/, $new_values;
say $new_values;
# Prints:
# 1,2,4,5,6,7,8,9

You can use
s/^$car_id,|,$car_id\b//
Details
^ - start of string
$car_id - variable value
, - comma
| - or
, - comma
$car_id - variable value
\b - word boundary.

s/^\Q$car_id\E,|,\Q$car_id\E\b//
Another approach is to store an extra leading and trailing comma (,1,2,3,4,5,6,7,8,9,)
The main benefit is that it makes it easier to search for the id using SQL (since you can search for ,$car_id,). Same goes for editing it.
On the Perl side, you'd use
s/,\K\Q$car_id\E,// # To remove
substr($_, 1, -1) # To get actual string

Ugly way: use regex to remove the value, then simplify
$new_values = $oringa_value[0];
$new_values =~ s/$car_id//;
$new_values =~ s/,+/,/;
Nice way: split and merge
$new_values = $oringa_value[0];
my #values = split(/,/, $new_values);
my $index = 0;
$index++ until $values[$index] eq $car_id;
splice(#values, $index, 1);
$new_values = join(',', #values);

Related

perl regular expression match scalar plus punctuation

I have scalars (columns in a table) that have one or two email addresses separated by a comma. such as 'Joek#xyznco.com, jrancher#candyco.us' or 'jsmith#wellingent.com,mjones#wellingent.com' for several of these records I need to remove a bad/old email address and the trailing comma (if one exists).
if jmsith#wellingent is no longer valid how do I remove that address and the trailing comma?
This only removes the address but leaves the comma.
my $general_email = 'jsmith#wellingent.com,mjones#wellingent.com';
my $bad_addr = 'jsmith#wellingent.com';
$general_email =~ s/$bad_addr//;
Thanks for any help.
You may be better off without a regex but with list splitting:
use strict;
use warnings;
sub remove_bad {
my ($full, $bad) = #_;
my #emails = split /\s*,\s*/, $full; # split at comma, allowing for spaces around the comma
my #filtered = grep { $_ ne $bad } #emails;
return join ",", #filtered;
}
print 'First: ' , remove_bad('me#example.org, you#example.org', 'me#example.org'), "\n";
print 'Last: ', remove_bad('me#example.org, you#example.org', 'you#example.org'), "\n";
print 'Middle: ', remove_bad('me#example.org, you#example.org, other#eample.org', 'you#example.org'), "\n";
First, split the bad email address list at the comma, creating an array. Filter that using grep to remove the bad address. join the remaining elements back into a string.
The above code prints:
First: you#example.org
Last: me#example.org
Middle: me#example.org,other#eample.org

RegEx and split camelCase

I want to get an array of all the words with capital letters that are included in the string. But only if the line begins with "set".
For example:
- string "setUserId", result array("User", "Id")
- string "getUserId", result false
Without limitation about "set" RegEx look like /([A-Z][a-z]+)/
$str ='setUserId';
$rep_str = preg_replace('/^set/','',$str);
if($str != $rep_str) {
$array = preg_split('/(?<=[a-z])(?=[A-Z])/',$rep_str);
var_dump($array);
}
See it
Also your regex will also work.:
$str = 'setUserId';
if(preg_match('/^set/',$str) && preg_match_all('/([A-Z][a-z]*)/',$str,$match)) {
var_dump($match[1]);
}
See it

In Perl, how many groups are in the matched regex?

I would like to tell the difference between a number 1 and string '1'.
The reason that I want to do this is because I want to determine the number of capturing parentheses in a regular expression after a successful match. According the perlop doc, a list (1) is returned when there are no capturing groups in the pattern. So if I get a successful match and a list (1) then I cannot tell if the pattern has no parens or it has one paren and it matched a '1'. I can resolve that ambiguity if there is a difference between number 1 and string '1'.
You can tell how many capturing groups are in the last successful match by using the special #+ array. $#+ is the number of capturing groups. If that's 0, then there were no capturing parentheses.
For example, bitwise operators behave differently for strings and integers:
~1 = 18446744073709551614
~'1' = Î ('1' = 0x31, ~'1' = ~0x31 = 0xce = 'Î')
#!/usr/bin/perl
($b) = ('1' =~ /(1)/);
print isstring($b) ? "string\n" : "int\n";
($b) = ('1' =~ /1/);
print isstring($b) ? "string\n" : "int\n";
sub isstring() {
return ($_[0] & ~$_[0]);
}
isstring returns either 0 (as a result of numeric bitwise op) which is false, or "\0" (as a result of bitwise string ops, set perldoc perlop) which is true as it is a non-empty string.
If you want to know the number of capture groups a regex matched, just count them. Don't look at the values they return, which appears to be your problem:
You can get the count by looking at the result of the list assignment, which returns the number of items on the right hand side of the list assignment:
my $count = my #array = $string =~ m/.../g;
If you don't need to keep the capture buffers, assign to an empty list:
my $count = () = $string =~ m/.../g;
Or do it in two steps:
my #array = $string =~ m/.../g;
my $count = #array;
You can also use the #+ or #- variables, using some of the tricks I show in the first pages of Mastering Perl. These arrays have the starting and ending positions of each of the capture buffers. The values in index 0 apply to the entire pattern, the values in index 1 are for $1, and so on. The last index, then, is the total number of capture buffers. See perlvar.
Perl converts between strings and numbers automatically as needed. Internally, it tracks the values separately. You can use Devel::Peek to see this in action:
use Devel::Peek;
$x = 1;
$y = '1';
Dump($x);
Dump($y);
The output is:
SV = IV(0x3073f40) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
SV = PV(0x30698cc) at 0x3073484
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x3079bb4 "1"\0
CUR = 1
LEN = 4
Note that the dump of $x has a value for the IV slot, while the dump of $y doesn't but does have a value in the PV slot. Also note that simply using the values in a different context can trigger stringification or nummification and populate the other slots. e.g. if you did $x . '' or $y + 0 before peeking at the value, you'd get this:
SV = PVIV(0x2b30b74) at 0x3073f44
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 1
PV = 0x3079c5c "1"\0
CUR = 1
LEN = 4
At which point 1 and '1' are no longer distinguishable at all.
Check for the definedness of $1 after a successful match. The logic goes like this:
If the list is empty then the pattern match failed
Else if $1 is defined then the list contains all the catpured substrings
Else the match was successful, but there were no captures
Your question doesn't make a lot of sense, but it appears you want to know the difference between:
$a = "foo";
#f = $a =~ /foo/;
and
$a = "foo1";
#f = $a =~ /foo(1)?/;
Since they both return the same thing regardless if a capture was made.
The answer is: Don't try and use the returned array. Check to see if $1 is not equal to ""

I want to replace ',' on the 150th location in a String with a <br>

My String is : PI Last Name equal to one of
('AARONSON','ABDEL MEGUID','ABDEL-LATIF','ABDOOL KARIM','ABELL','ABRAMS','ACKERMAN','ADAIR','ADAMS','ADAMS-CAMPBELL', 'ADASHI','ADEBAMOWO','ADHIKARI','ADIMORA','ADRIAN', 'ADZERIKHO','AGADJANYAN','AGARWAL','AGOT', 'AGUIRRE-CRUZ','AHMAD','AHMED','AIKEN', 'AINAMO', 'AISENBERG','AJAIYEOBA','AKA','AKHTAR','AKINGBEMI','AKINYINKA','AKKERMAN','AKSOY','AKYUREK', 'ALBEROLA-ILA','ALBERT','ALCANTARA' ,'ALCOCK','ALEMAN', 'ALEXANDER','ALEXANDRE','ALEXANDROV','ALEXANIAN','ALLAND','ALLEN','ALLISON','ALPER', 'ALTMAN','ALVAREZ','AMARYAN','AMBESI-IMPIOMBATO','AMEGBETO','AMOWITZ', 'ANAGNOSTARAS','ANAND','ANDERSEN','ANDERSON', 'ANDRADE','ANDREEFF','ANDROPHY','ANGER','ANHOLT','ANTHONY','ANTLE','ANTONELLI','ANTONY', 'ANZULOVICH', 'APODACA','APOSHIAN','APPEL','APPLEBY','APRIL','ARAUJO','ARBIB','ARBOLEDA', 'ARCHAKOV','ARCHER', 'ARECHAVALETA-VELASCO','ARENS','ARGON','ARGYROKASTRITIS', 'ARIAS','ARIZAGA','ARMSTRONG','ARNON', 'ARSHAVSKY','ARVIN','ASATRYAN','ASCOLI','ASKENASE','ASSI','ATALAY','ATANASOVA','ATKINSON','ATTYGALLE','ATWEH','AU','AVETISYAN','AWE','AYOUB','AZAD','BACSO','BAGASRA','BAKER','BALAS', 'BALCAZAR','BALK','BALKAY','BALLOU','BALRAJ','BALSTER','BANERJEE','BANKOLE','BANTA','BARAL','BARANOWSKA','BARBAS', 'BARBER','BARILLAS-MURY','BARKHOLT','BARNES','BARNETT','BARRETT','BARRIA','BARROW','BARROWS','BARTKE','BARTLETT','BASSINGTHWAIGHTE','BASSIOUNY','BASU','BATES','BATTAGLIA','BATTERMAN','BAUER','BAUERLE','BAUM','BAUME', 'BAUMLER','BAVISTER','BAWA','BAYNE','BEASLEY','BEATTY','BEATY','BEBENEK','BECK','BECKER','BECKMAN','BECKMAN-SUURKULA' ,'BEDFORD','BEDOLLA','BEEBE','BEEMON','BEHETS','BEHRMAN','BEIER','BEKKER','BELL','BELLIDO','BELMAIN', 'BENATAR','BENBENISHTY','BENBROOK','BENDER','BENEDETTI','BENNETT','BENNISH','BENZ','BERG','BERGER','BERGEY','BERGGREN','BERK','BERKOWITZ','BERLIN','BERLINER','BERMAN','BERTINO','BERTOZZI','BERTRAND','BERWICK','BETHONY','BEYERS','BEYRER' ,'BEZPROZVANNY','BHAGWAT','BHANDARI','BHARGAVA','BHARUCHA','BHUJWALLA','BIANCO','BIDLACK','BIELERT','BIER','BIESSMANN','BIGELOW' ,'BILLER','BILLINGS','BINDER','BINDMAN','BINUTU','BIRBECK','BIRGE','BIRNBAUM','BIRO','BIRT','BISHAI','BISHOP','BISSELL','BJORKEGREN','BJORNSTAD','BLACK','BLANCHARD','BLASS','BLATTNER','BLIGNAUT','BLOCH','BLOCK','BLOOM','BLOOM,','BLUM','BLUMBERG' ,'BLUMENTHAL','BLYUKHER','BODDULURI','BOFFETTA','BOGOLIUBOVA', 'BOLLINGER','BOLLS','BOMSZTYK','BONANNO','BONNER','BOOM','BOOTHROYD','BOPPANA','BORAWSKI','BORG','BORIS-LAWRIE','BORISY','BORLONGAN','BORNSTEIN','BORODOVSKY','BORST','BOS','BOTO','BOWDEN','BOWEN','BOYCE-JACINO','BRADEN','BRADY' ,'BRAITHWAITE','BRANN','BRASH','BRAUNSTEIN', 'BREMAN','BRENNAN','BRENNER','BRETSCHER','BREW','BREYSSE','BRIGGS','BRITES','BRITT','BRITTENHAM','BRODIE','BRODY','BROOK','BROOTEN','BROSCO','BROSNAN','BROWN','BROWNE','BRUCKNER','BRUNENGRABER','BRYL','BRYSON','BU','BUCHAN','BUDD','BUDNIK', 'BUEKENS','BUKRINSKY','BULLMORE','BULUN','BURBANO','BURGENER','BURGESS','BURKS','BURMEISTER','BURNETT','BURNHAM','BURNS','BURRIDGE','BURTON','BUSCIGLIO','BUSHEK','BUSIJA','BUZSAKI','BZYMEK','CABA')
I need to have a regex which will greedily looks for up to 150 characters with a last character being a ','. And then replace the last ',' of the 150 with a <br />
Any suggestions pls?
I used this ','(?=[^()]*\)) but this one replaces all the occurences. I want the 150th ones to be replaced.
Thanks everyone for your suggestions. I managed to do it with Java code instead of regex.
StringBuilder sb = new StringBuilder(html);
int i = 0;
while ((i = sb.indexOf("','", i + 150)) != -1) {
int j = sb.lastIndexOf("','", i + 150);
sb.insert(i+1, "<BR>");
}
return sb.toString();
However, this breaks at the first encounter of ',' in the 150 chars.
Can anyone help modify my code to incorporate the break at the last occurence of ',' withing the 150 chars.
You'll want something like this:
Look for every occurrence of \([^)]+*,[^)]+*\) (Find a parenthesis-wrapped string with a comma in it and then run the following regular expression on each of the matched elements:
(.{135,150}[^,]*?),
The first number is the minimum number of characters you want to match before you add a break tag -- the second is the maximum number of characters you would like to match before inserting a break tag. If there is no , between the characters in question then the regular expression will continue to consume characters until it finds a comma.
You could probably do it like this:
regex ~ /(^.{1,14}),/
replacement ~ '\1<replacement' or "$1<insert your text>"
In Perl:
$target = ','x 22;
$target =~ s/(^ .{1,14}) , /$1<15th comma>/x;
print $target;
Output
,,,,,,,,,,,,,,<15th comma>,,,,,,,
Edit: As an alternative, if you want to break the string up into succesive 150 or less
you could do it this way:
regex ~ /(.{1,150},)/sg
replacement ~ '\1<br/>' or "$1<br\/>"
// That is a regex of type global (/g) and include newlines (/s)
In Perl:
$target = "
('AARONSON','ABDEL MEGUID','ABDEL-LATIF','ABDOOL KARIM','ABELL','ABRAMS','ACKERMAN','ADAIR','ADAMS','ADAMS-CAMPBELL', 'ADASHI','ADEBAMOWO','ADHIKARI','ADIMORA','ADRIAN', 'ADZERIKHO','AGADJANYAN','AGARWAL','AGOT', 'AGUIRRE-CRUZ','AHMAD','AHMED','AIKEN', 'AINAMO', 'AISENBERG','AJAIYEOBA','AKA','AKHTAR','AKINGBEMI','AKINYINKA','AKKERMAN','AKSOY','AKYUREK', 'ALBEROLA-ILA','ALBERT','ALCANTARA' ,'ALCOCK','ALEMAN', 'ALEXANDER','ALEXANDRE','ALEXANDROV','ALEXANIAN','ALLAND','ALLEN','ALLISON','ALPER', 'ALTMAN', ... )
";
if ($target =~ s/( .{1,150} , )/$1<br\/>/sxg) {
print $target;
}
Output:
('AARONSON','ABDEL MEGUID','ABDEL-LATIF','ABDOOL KARIM','ABELL','ABRAMS','ACKERMAN','ADAIR','ADAMS','ADAMS-CAMPBELL', 'ADASHI','ADEBAMOWO','ADHIKARI',<br/>'ADIMORA','ADRIAN', 'ADZERIKHO','AGADJANYAN','AGARWAL','AGOT', 'AGUIRRE-CRUZ','AHMAD','AHMED','AIKEN', 'AINAMO', 'AISENBERG','AJAIYEOBA','AKA',<br/>'AKHTAR','AKINGBEMI','AKINYINKA','AKKERMAN','AKSOY','AKYUREK', 'ALBEROLA-ILA','ALBERT','ALCANTARA' ,'ALCOCK','ALEMAN', 'ALEXANDER','ALEXANDRE',<br/>'ALEXANDROV','ALEXANIAN','ALLAND','ALLEN','ALLISON','ALPER', 'ALTMAN',<br/> ... )

Remove a number from a comma separated string while properly removing commas

FOR EXAMPLE: Given a string... "1,2,3,4"
I need to be able to remove a given number and the comma after/before depending on if the match is at the end of the string or not.
remove(2) = "1,3,4"
remove(4) = "1,2,3"
Also, I'm using javascript.
As jtdubs shows, an easy way is is to use a split function to obtain an array of elements without the commas, remove the required element from the array, and then rebuild the string with a join function.
For javascript something like this might work:
function remove(array,to_remove)
{
var elements=array.split(",");
var remove_index=elements.indexOf(to_remove);
elements.splice(remove_index,1);
var result=elements.join(",");
return result;
}
var string="1,2,3,4,5";
var newstring = remove(string,"4"); // newstring will contain "1,2,3,5"
document.write(newstring+"<br>");
newstring = remove(string,"5");
document.write(newstring+"<br>"); // will contain "1,2,3,4"
You also need to consider the behavior you want if you have repeats, say the string is "1,2,2,4" and I say "remove(2)" should it remove both instances or just the first? this function will remove only the first instance.
Just use multiple substitutions.
s/^$removed,//;
s/,$removed$//;
s/,$removed,/,/;
This will be easier than trying to invent a single replacement that handles all those cases.
string input = "1,2,3,4";
List<string> parts = new List<string>(input.Split(new char[] { ',' }));
parts.RemoveAt(2);
string output = String.Join(",", parts);
Instead of using regex, I would do something like:
- split on comma
- delete the right element
- join with comma
Here is a perl script that does the job:
#!/usr/bin/perl
use 5.10.1;
use strict;
use warnings;
my $toremove = 5;
my $string = "1,2,3,4,5";
my #tmp = split/,/, $string;
#tmp = grep{ $_ != $toremove }#tmp;
$string =join',', #tmp;
say $string;
Output:
1,2,3,4
Javascript has improved since this question was posted.
I use the following regex to remove items from a csv string
let searchStr = "359";
let regex = new RegExp("^" + searchStr + ",?|," + searchStr);
csvStr = csvStr.replace(regex, "");
If the child_id is the start, middle or end, or only item it is replaced.
If the searchStr is at the start of the csvStr it and any trailing comma is replaced. Else if the searchStr is anywhere else in the csvStr it must be preceded with a comma so the searchStr and its preceding comma are replaced by an empty string.