How to replace the captured group in Ruby - regex

I would like to replace the captured group of a string with the elements of an array.
I am trying something like this:
part_number = 'R1L16SB#AA'
regex = (/\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\z/)
g = ["X","Y","Z"]
g.each do |i|
ren_m,ch_conf,bit_conf,package_type,packing_val,envo_vals = part_number.match(regex).captures
m = part_number.sub! packing_val,i
puts m
end
With the array g = ["X","Y","Z"], my code gives the desired output:
R1L16SB#XA
R1L16SB#YA
R1L16SB#ZA
The captured group packing_val is replaced with each element of g in turn.
But when the array contains characters that are already present in the string, it does not work:
g = ["A","B","C"]
outputs:
R1L16SB#AA
R1L16SB#BA
R1L16SC#BA
But my expected output is:
R1L16SB#AA
R1L16SB#BA
R1L16SB#CA
What is going wrong and what could be the possible solution?

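sub! modifies part_number in place, and part_number is defined outside the loop, so each iteration works on the string left behind by the previous one. It also replaces the first occurrence of packing_val anywhere in the string, not the captured group itself.
What happens is:
In the first iteration, the first A is replaced with A, which leaves the string unchanged:
R1L16SB#AA
        ^
In the second iteration, the first A is replaced by B, giving
R1L16SB#BA
        ^
In the third iteration, the captured group is now B, and the first B in the string (the one in SB) is replaced by C, giving
R1L16SC#BA
      ^
One way to get the desired output is to put part_number = 'R1L16SB#AA' inside the loop, so every iteration starts from the original string.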
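To make the three iterations above easier to follow, here is a small sketch that records, for each element, which character was captured, where its first occurrence sits in the string, and what the string looks like after sub!. The trace array and its layout are only for illustration and are not part of the original code.

```ruby
regex = /\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\z/
part_number = 'R1L16SB#AA'.dup # dup so the string is mutable even with frozen literals

# For every replacement, record [captured value, index sub! will hit, resulting string].
trace = %w[A B C].map do |i|
  packing_val = part_number.match(regex).captures[4] # fifth capture group
  first_index = part_number.index(packing_val)       # sub! replaces the first occurrence
  part_number.sub!(packing_val, i)                   # mutates part_number in place
  [packing_val, first_index, part_number.dup]
end

trace.each do |captured, index, result|
  puts "captured=#{captured} first_index=#{index} result=#{result}"
end
```

In the third row the captured value is B, but its first occurrence is at index 6 (inside SB), not at index 8 where the group actually matched, which is why the wrong character changes.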

You mutate part_number on every iteration; that is the reason.
Just switch to sub, without the bang:
m = part_number.sub(packing_val, i)
You can do it without regex:
part_number = 'R1L16SB#AA'
g = %w[X Y Z]
g.each do |i|
pn = part_number.dup
pn[-2] = i
puts pn
end
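If you do want to keep the regex and replace exactly the captured group, rather than the first matching character or a fixed position, one possible approach is to use the offsets that MatchData reports for the group (MatchData#begin and MatchData#end). The helper name replace_group below is made up for this sketch, and the approach assumes the group took part in the match.

```ruby
PART_REGEX = /\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\z/

# Hypothetical helper: returns a copy of str in which capture group number
# `group` is replaced by `replacement`. Returns nil when the regex does not
# match or the group did not participate in the match.
def replace_group(str, regex, group, replacement)
  m = regex.match(str)
  return nil unless m && m.begin(group)

  copy = str.dup                                    # leave the original untouched
  copy[m.begin(group)...m.end(group)] = replacement # overwrite only the group's span
  copy
end

part_number = 'R1L16SB#AA'
results = %w[A B C].map { |i| replace_group(part_number, PART_REGEX, 5, i) }
puts results
```

Because the original string is never modified and the replacement is done by position, it no longer matters whether the new character already occurs elsewhere in the part number.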

Related

How to get the value only in lua?

I want to get only the value from each line. How can I do that?
Is there any pattern matching for this? I am especially having trouble with the '='.
DeviceName=
ManufacturerId=ABC76
SerialNo=100
ModelId=BDF2015
You can use Lua's pattern matching for this task:
s=[[
DeviceName=
ManufacturerId=ABC76
SerialNo=100
ModelId=BDF2015
]]
for k,v in s:gmatch("(%S+)=(%S*)") do
print(k,'['..v..']')
end
Output
DeviceName []
ManufacturerId [ABC76]
SerialNo [100]
ModelId [BDF2015]
You can use the following regex:
^(\t| )*[a-zA-Z][a-zA-Z_0-9]*(\t| )*=(\t| )*(.*)$
and a backreference to the fourth group ($4 or \4).
It works for assignments like the following, including ones with spaces around the =:
DeviceName =
ManufacturerId= ABC76
SerialNo = 100
ModelId=BDF2015
However, beware that multiple assignments on the same line will not work with this regex!
You will also have to adapt it for local declarations.
The following assignments will not be extracted properly:
local d , f = 5 ,10
d , f = 5, 10;
d, f = 10
The regex for a single key would be:
SerialNo=(.*)

NaN returned when converting string to number within a for loop

for n=1:37
for m=2:71
rep1 = regexp(Cell1{n,m}, 'f[0-9]*', 'match')
rep2 = regexp(rep1, '[0-9]*', 'match')
rep2 = [rep2{:}]
cln = str2double(rep2)
Cell2{n,cln} = Cell1{n,m}
end
end
Cell1 is a 37x71 cell array and Cell2 is an empty 37x71 cell array.
For example:
Cell1{1,2} = -(f32.*x1.*x6)./v1
If I run each part of the loop above individually, the function works as intended. However, it returns cln as a NaN when the whole loop is executed.
You are getting NaN because your regex doesn't match one of the values in Cell1, so it returns an empty result, which str2double converts to NaN.
But let's take a step back for a second. regexp accepts cell arrays, so there is no need to loop over all of your elements. You can also use a lookbehind assertion to require the "f" that precedes the number, which removes the need to call regexp twice.
stringNumber = regexp(Cell1, '(?<=f)[0-9]*', 'match', 'once');
numbers = str2double(stringNumber);
You can then check for NaNs (isnan(numbers)) and look more closely at the corresponding elements of Cell1 to see why your regex isn't finding a number in a particular string.
Once you get that sorted out, you can assign to Cell2 as you were doing:
Cell2 = cell(37, 71);
for k = 1:numel(numbers)
    % linear indices run down the columns, so the row comes from the number of rows
    row = mod(k - 1, size(Cell1, 1)) + 1;
    Cell2(row, numbers(k)) = Cell1(k);
end

Matching specific lengths with regexp in Matlab

A string matching question in MATLAB.
If I have a character array
a = ['thehe'];
str = {'the','he'};
match = regexp(a,str);
the output is
match =
[1] [1x2 double]
because it found 'he' twice and 'the' once.
How can I make it search my string a from left to right and match 'the' only once and 'he' only once?
To answer the explicit question, from the documentation for regexp you can specify the once search option:
a = 'thehe';
str = {'the','he'};
match = regexp(a,str, 'once');
Which returns:
match =
[1] [2]
Where match is a 1x2 cell array whose cell value(s) correspond to the first index of the match in a for each cell of str.
As I understand the somewhat ambiguous question, you want the indices of the non-overlapping occurrences of 'the' and 'he', that is, 1 and 4.
a = ['thehe'];
str = {'the';'[^t]he'};
match = regexp(a,str)
After this, print the two results:
a(match{1}:match{1}+2)
ans =
the
and
a(match{2}+1:match{2}+2)
ans =
he
There is no third occurrence:
a(match{3})
??? Index exceeds matrix dimensions.

Regex: How to remove English words from sentences using Regex?

I have a number of rows in SQLite; each row has one column that contains data like this:
prosperکامیاب شدن ، موفق شدن ، رونق یافتن
As you can see, the sentence starts with English words. I want to remove the English words at the start of each sentence. Is there any way to do that with a T-SQL query (using regex)?
You may try this. I have written it as a function that you can call:
create function dbo.RemoveEngChars (@Unicode_string nvarchar(max))
returns nvarchar(max) as
begin
    declare @i int = 1; -- must start from 1, as SubString is 1-based
    -- nvarchar(max) so that long inputs are not truncated
    declare @OriginalString nvarchar(max) = @Unicode_string collate SQL_Latin1_General_Cp1256_CS_AS;
    declare @ModifiedString nvarchar(max) = N'';
    while @i <= Len(@OriginalString)
    begin
        -- keep every character that is not matched by the letter ranges
        if SubString(@OriginalString, @i, 1) not like '[a-zA-Z]'
        begin
            set @ModifiedString = @ModifiedString + SubString(@OriginalString, @i, 1);
        end
        set @i = @i + 1;
    end
    return @ModifiedString
end
-- To call the function, run the following and pass the Unicode string with the N prefix
select dbo.RemoveEngChars(N'prosperکامیاب شدن ، موفق شدن ، رونق یافتن')

Mysterious no-match in regular expression

Imagine I have a cell array with two filenames:
filenames{1,1} = 'SMCSx0noSat48VTFeLeakTrace.txt';
filenames{2,1} = 'SMCSx0NoSat48VTrace.txt';
I want to get the filename which starts with 'SMCSx0' and contains the filterword 'NoSat48VTrace':
%// case 1
expression = 'SMCSx0';
filterword = 'NoSat48VTrace';
regs = regexp(filenames, ['^' expression '.*\' filterword '.*\.txt$'])
mask = ~cellfun(@isempty,regs);
file = filenames(mask)
it works, I get:
file =
'SMCSx0NoSat48VTrace.txt'
But for whatever reason, changing the filterword to 'noSat48VTFeLeakTrace' doesn't get me the other file:
%// case 2
expression = 'SMCSx0';
filterword = 'noSat48VTFeLeakTrace';
regs = regexp(filenames, ['^' expression '.*\' filterword '.*\.txt$'])
mask = ~cellfun(@isempty,regs);
file = filenames(mask)
which is absolutely the same as before, but
file =
Empty cell array: 0-by-1
I've actually been using these lines in a function for months without problems. But now I have added some files to my folder that are not found, even though their names are similar to the earlier ones. Any hints?
It is actually supposed to work without including Trace in the filterword, which it does in the first case; that's why I put .*\ into the regex.
%// case 1
expression = 'SMCSx0';
filterword = 'NoSat48V';
... works
'^' expression '.*\'
The \ at the end combines with the n that starts the filterword, so \n is interpreted as a newline character:
SMCSx0.*\noSat48VTFeLeakTrace.*\.txt$
This worked fine with the other filterword because NoSat48VTrace has an upper case N and \N is interpreted as simply N.
Get rid of the \, you don't need it.
You have an extra backslash in there, right after the first .* :
regs = regexp(filenames, ['^' expression '.*\' filterword '.*\.txt$'])
                                            ^
                                            |
Remove that backslash and it should give the expected result.