How to get the value only in lua? - regex

I want to get only the value from each of the lines below. How can I do that?
Is there a pattern-matching approach for this? I am especially having trouble with the '='.
DeviceName=
ManufacturerId=ABC76
SerialNo=100
ModelId=BDF2015

You can use Lua's pattern matching for this task:
s=[[
DeviceName=
ManufacturerId=ABC76
SerialNo=100
ModelId=BDF2015
]]
for k,v in s:gmatch("(%S+)=(%S*)") do
print(k,'['..v..']')
end
Output
DeviceName []
ManufacturerId [ABC76]
SerialNo [100]
ModelId [BDF2015]
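For readers who want to try the same idea outside Lua, here is a minimal sketch in Python. It assumes the same sample text; the names `pairs` and `values` are illustrative and not part of the original answer. Python's `\S` plays the role of Lua's `%S`.

```python
import re

# Same sample text as in the Lua example above.
s = """
DeviceName=
ManufacturerId=ABC76
SerialNo=100
ModelId=BDF2015
"""

# (\S+) captures the key, (\S*) captures a possibly empty value.
pairs = re.findall(r"(\S+)=(\S*)", s)

# Keep only the values, which is what the question asks for.
values = [value for _key, value in pairs]

for key, value in pairs:
    print(key, "[" + value + "]")
```

As in the Lua version, a key with nothing after the '=' yields an empty string rather than being skipped.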

You can use the following regex:
^(\t| )*[a-zA-Z][a-zA-Z_0-9]*(\t| )*=(\t| )*(.*)$
and read the value from the fourth capture group ($4 or \4, depending on the tool).
It also works for the following assignments (with spaces around the '='):
DeviceName =
ManufacturerId= ABC76
SerialNo = 100
ModelId=BDF2015
However, beware that multiple assignments on the same line will not work with this regex.
You will also have to adapt it for local declarations.
The following assignments will not be extracted properly:
local d , f = 5 ,10
d , f = 5, 10;
d, f = 10
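A small sketch of how this regex behaves, written in Python as an assumption (the answer does not name a regex engine). The names `ASSIGNMENT`, `text` and `unsupported` are illustrative. `re.MULTILINE` is needed so that `^` and `$` apply to each line.

```python
import re

# The regex from the answer; re.MULTILINE makes ^ and $ match per line.
ASSIGNMENT = re.compile(
    r"^(\t| )*[a-zA-Z][a-zA-Z_0-9]*(\t| )*=(\t| )*(.*)$", re.MULTILINE
)

# Single assignments, with and without spaces around '='.
text = "DeviceName =\nManufacturerId= ABC76\nSerialNo = 100\nModelId=BDF2015\n"
values = [m.group(4) for m in ASSIGNMENT.finditer(text)]
print(values)

# Multiple or local assignments: the regex does not match these lines at all.
unsupported = "local d , f = 5 ,10\nd , f = 5, 10;\nd, f = 10\n"
unsupported_values = [m.group(4) for m in ASSIGNMENT.finditer(unsupported)]
print(unsupported_values)
```

The value is always group 4 because the three `(\t| )*` groups before it are capturing groups as well.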

The regex for a single key, for example SerialNo, would be:
SerialNo=(.*)

How to optimize the python code written using regex and for loops?

I have two lists and I need to perform a string match between them. I have used three for loops and a regex pattern to solve it. I am getting the expected output with the existing code (part 1), but I need to optimize it (part 2) because it takes a long time when I apply it to lengthy data.
part1
import re

texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45','fooz','bazzar','foo baz']
terms = ['foo','baz','apple']
output_list = []
for term in terms:
    pattern_term = r'\b(?:{})\b'.format(term)
    try:
        for i in range(len(texts)):
            line_text = texts[i]
            for match in re.finditer(pattern_term, line_text):
                start_index = match.start()
                output_list.append([i, start_index, line_text[start_index:], term])
    except:
        pass
output:
Explanation of the column names:
Index = index of texts when pattern matches
Start_index = start index where pattern matches inside text
Match_text = complete text of that matching
Match_term = term with it matches
pd.DataFrame(output_list, columns = ['Index', 'Start_index', 'Match_text', 'Match_term'])
   Index  Start_index Match_text Match_term
0      0            0    foo abc        foo
1      6            0    foo baz        foo
2      3            0     baz 45        baz
3      6            4        baz        baz
I have tried the following code (part2), but its output is partial:
part 2
df = pd.DataFrame({'Match_text': texts})
pat = r'\b(?:{})\b'.format('|'.join(terms))
df[df['Match_text'].str.contains(pat)]
output
  Match_text
0    foo abc
3     baz 45
6    foo baz
Your code is already good since you need to find occurrences of whole words inside longer strings, and you create the regex pattern before the loop where the texts are processed with the regex.
The regex is already fine. The only redundancy is the non-capturing group, which you can drop: you check term by term, so there is no alternation inside the group. You might also compile the regex:
pattern_term = re.compile(r'\b{}\b'.format(term))
Then, you may get rid of temporary variables in the for loop:
for i in range(len(texts)):
    for match in pattern_term.finditer(texts[i]):
        output_list.append([i, match.start(), texts[i][match.start():], term])
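Putting the suggestions together, a complete version might look like the sketch below. Two details are additions that are not in the answer: `enumerate` replaces `range(len(texts))`, and `re.escape` is applied on the assumption that the terms are literal words rather than patterns.

```python
import re

texts = ['foo abc', 'foobar xyz', 'xyz baz32', 'baz 45', 'fooz', 'bazzar', 'foo baz']
terms = ['foo', 'baz', 'apple']

output_list = []
for term in terms:
    # Compile once per term; re.escape assumes the term is a literal word.
    pattern_term = re.compile(r'\b{}\b'.format(re.escape(term)))
    for i, line_text in enumerate(texts):
        for match in pattern_term.finditer(line_text):
            # [text index, match start, text from the match onward, term]
            output_list.append([i, match.start(), line_text[match.start():], term])

for row in output_list:
    print(row)
```

The rows come out in the same order as the table in the question, so they can be passed to pd.DataFrame with the same column names.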

How to replace the captured group in Ruby

I would like to replace the captured group of a string with the elements of an array.
I am trying something like this:
part_number = 'R1L16SB#AA'
regex = (/\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\z/)
g = ["X","Y","Z"]
g.each do |i|
  ren_m,ch_conf,bit_conf,package_type,packing_val,envo_vals = part_number.match(regex).captures
  m = part_number.sub! packing_val,i
  puts m
end
With the array g = ["X","Y","Z"], my code gives the desired output:
R1L16SB#XA
R1L16SB#YA
R1L16SB#ZA
The captured group packing_val is replaced with each element of
g = ["X","Y","Z"]
But when the array has elements that are already present in the string, it does not work:
g = ["A","B","C"]
outputs:
R1L16SB#AA
R1L16SB#BA
R1L16SC#BA
But my expected output is:
R1L16SB#AA
R1L16SB#BA
R1L16SB#CA
What is going wrong and what could be the possible solution?
sub! replaces the first match in place, and part_number is defined outside the loop, so every iteration works on the string as modified by the previous one.
What happens is:
In the first iteration, the first A is replaced with A, giving the same string:
R1L16SB#AA
        ^
In the second iteration, the first A is replaced by B, giving:
R1L16SB#BA
        ^
In the third iteration, the first B is replaced by C, giving:
R1L16SC#BA
      ^
One way to get the desired output is to put part_number = 'R1L16SB#AA' inside the loop.
You mutated your part_number every iteration. That's the reason.
Just switch to sub without bang:
m = part_number.sub(packing_val, i)
You can do it without regex:
part_number = 'R1L16SB#AA'
g = %w[X Y Z]
g.each do |i|
  pn = part_number.dup
  pn[-2] = i
  puts pn
end
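Another way to avoid hitting the wrong character is to replace by the position of the captured group instead of by its text. The sketch below shows the idea in Python rather than Ruby. The helper `replace_group` is hypothetical, and `\Z` stands in for Ruby's `\z`.

```python
import re

# Same pattern as in the question, written for Python's re module.
PART_RE = re.compile(r'\A(RM|R1)([A-Z])(\d\d+)([A-Z]+)#?([A-Z])([A-Z])\Z')


def replace_group(text, match, group, replacement):
    """Return text with the span of one captured group swapped out."""
    start, end = match.span(group)
    return text[:start] + replacement + text[end:]


part_number = 'R1L16SB#AA'
match = PART_RE.match(part_number)

# Group 5 is packing_val; the original string is never modified.
results = [replace_group(part_number, match, 5, letter) for letter in ['A', 'B', 'C']]
for result in results:
    print(result)
```

Because the replacement uses the group's start and end offsets, it does not matter whether the same letter appears earlier in the part number.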

groovy regex, how to match array items in a string

The string looks like this: "[xx],[xx],[xx]"
where xx is a polygon like this: "(1.0,2.3),(2.0,3)..."
Basically, we are looking for a way to get the string between each pair of square brackets into an array.
E.g. String source = "[hello],[1,2],[(1,2),(2,4)]"
would result in an object a such that:
a[0] == 'hello'
a[1] == '1,2'
a[2] == '(1,2),(2,4)'
We have tried various strategies, including using groovy regex:
def p = "[12],[34]"
def points = p =~ /(\[([^\[]*)\])*/
println points[0][2] // yields 12
However, this yields the following two-dimensional array:
[[12], [12], 12]
[, null, null]
[[34], [34], 34]
So if we took the third item from every even row we would be OK, but this does not look very correct. We are not taking the ',' into account, and we are not sure why we are getting "[12]" twice, when it should be zero times.
Any regex experts out there?
I think that this is what you're looking for:
def p = "[hello],[1,2],[(1,2),(2,4)]"
def points = p.findAll(/\[(.*?)\]/){match, group -> group }
println points[0]
println points[1]
println points[2]
This scripts prints:
hello
1,2
(1,2),(2,4)
The key is the use of .*? to make the expression non-greedy, so that it matches the minimum number of characters between [ and ]. Otherwise the first [ would be paired with the last ], and the single match would be hello],[1,2],[(1,2),(2,4). With findAll and the closure you then return only the captured group for each match.
Hope it helps.
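The same non-greedy idea can be checked quickly in Python (a sketch for comparison, not Groovy; `source` and `items` are illustrative names). `re.findall` returns only the captured group when the pattern has exactly one.

```python
import re

source = "[hello],[1,2],[(1,2),(2,4)]"

# .*? stops at the first closing bracket instead of the last one.
items = re.findall(r"\[(.*?)\]", source)

# For contrast, the greedy version swallows everything up to the last ].
greedy = re.findall(r"\[(.*)\]", source)

print(items)
print(greedy)
```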

Error with regex, match numbers

I have a string 00000001001300000708303939313833313932E2
I want to match everything between 708 and E2.
So I wrote:
(?<=708)(.*\n?)(?=E2) - tested in RegExr (it's working)
Now, from that result, 303939313833313932, I want to match every second digit
to get:
099183192
How?
To match everything between 708 and E2, use:
708(\d+)
if you are sure that there will be only digits. Otherwise try with:
708(.*?)E2
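To match every second digit from 303939313833313932, use:
(?:\d(\d))+
Note that in most regex flavors a repeated capture group keeps only its last iteration, so to collect all of the digits it is more reliable to use a global replace: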
find: \d(\d)
replace: $1
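Both steps can be sketched in Python, which is an assumption because the question does not name a language. `re.sub` replaces every match by default, so it acts as the global replace, and `\1` is Python's spelling of `$1`.

```python
import re

raw = "00000001001300000708303939313833313932E2"

# Step 1: lazily capture everything between 708 and E2.
between = re.search(r"708(.*?)E2", raw).group(1)

# Step 2: for each pair of digits, keep only the second one.
every_second = re.sub(r"\d(\d)", r"\1", between)

print(between)
print(every_second)
```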
Are you expecting a regular expression answer to this?
You are perhaps better off doing this using string operations in whatever programming language you're using. If you have text = "abcdefghi..." then do output = text[0] + text[2] + text[4]... in a loop, until you run out of characters.
You haven't specified a programming language, but in Python I would do something like:
>>> text = "abcdefghjiklmnop"
>>> for n, char in enumerate(text):
...     if n % 2 == 0:  # every second char, starting with the first
...         print(char)
...
a
c
e
g
j
k
m
o
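If Python is an option, slicing with a step does the same thing without a loop. This is a sketch with illustrative variable names. Note that the loop above keeps the characters at even indices, while the digits wanted in the question sit at the odd indices, so the slice there starts at 1.

```python
text = "abcdefghjiklmnop"

# Indices 0, 2, 4, ...: the same characters the loop above prints.
even_positions = text[::2]

digits = "303939313833313932"

# Indices 1, 3, 5, ...: the second digit of each pair, as asked in the question.
every_second = digits[1::2]

print(even_positions)
print(every_second)
```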

In Perl, how many groups are in the matched regex?

I would like to tell the difference between a number 1 and string '1'.
The reason I want to do this is that I want to determine the number of capturing parentheses in a regular expression after a successful match. According to the perlop doc, a list (1) is returned when there are no capturing groups in the pattern. So if I get a successful match and a list (1), I cannot tell whether the pattern has no parens or has one paren that matched a '1'. I can resolve that ambiguity if there is a difference between the number 1 and the string '1'.
You can tell how many capturing groups are in the last successful match by using the special @+ array. $#+ is the number of capturing groups. If that's 0, then there were no capturing parentheses.
For example, bitwise operators behave differently for strings and integers:
~1 = 18446744073709551614
~'1' = Î ('1' = 0x31, ~'1' = ~0x31 = 0xce = 'Î')
#!/usr/bin/perl
($b) = ('1' =~ /(1)/);
print isstring($b) ? "string\n" : "int\n";
($b) = ('1' =~ /1/);
print isstring($b) ? "string\n" : "int\n";
sub isstring() {
    return ($_[0] & ~$_[0]);
}
isstring returns either 0 (the result of a numeric bitwise operation), which is false, or "\0" (the result of a bitwise string operation; see perldoc perlop), which is true because it is a non-empty string.
If you want to know the number of capture groups a regex matched, just count them. Don't look at the values they return, which appears to be your problem.
You can get the count by looking at the result of the list assignment, which in scalar context returns the number of items on the right-hand side:
my $count = my @array = $string =~ m/.../g;
If you don't need to keep the capture buffers, assign to an empty list:
my $count = () = $string =~ m/.../g;
Or do it in two steps:
my @array = $string =~ m/.../g;
my $count = @array;
You can also use the @+ or @- variables, using some of the tricks I show in the first pages of Mastering Perl. These arrays have the starting and ending positions of each of the capture buffers. The values in index 0 apply to the entire pattern, the values in index 1 are for $1, and so on. The last index, then, is the total number of capture buffers. See perlvar.
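For comparison only, and not as a Perl solution: Python's re module exposes the number of capture groups on the compiled pattern, so the ambiguity described in the question does not arise there. The sketch below uses the two patterns that appear later in this thread; the variable names are illustrative.

```python
import re

no_groups = re.compile(r"foo")
one_group = re.compile(r"foo(1)?")

# .groups is the number of capturing groups in the pattern itself.
print(no_groups.groups)
print(one_group.groups)

# A group that did not participate in the match is reported as None.
unmatched = one_group.match("foo").groups()
matched = one_group.match("foo1").groups()
print(unmatched, matched)
```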
Perl converts between strings and numbers automatically as needed. Internally, it tracks the values separately. You can use Devel::Peek to see this in action:
use Devel::Peek;
$x = 1;
$y = '1';
Dump($x);
Dump($y);
The output is:
SV = IV(0x3073f40) at 0x3073f44
  REFCNT = 1
  FLAGS = (IOK,pIOK)
  IV = 1
SV = PV(0x30698cc) at 0x3073484
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x3079bb4 "1"\0
  CUR = 1
  LEN = 4
Note that the dump of $x has a value for the IV slot, while the dump of $y doesn't but does have a value in the PV slot. Also note that simply using the values in a different context can trigger stringification or numification and populate the other slots. For example, if you did $x . '' or $y + 0 before peeking at the value, you'd get this:
SV = PVIV(0x2b30b74) at 0x3073f44
  REFCNT = 1
  FLAGS = (IOK,POK,pIOK,pPOK)
  IV = 1
  PV = 0x3079c5c "1"\0
  CUR = 1
  LEN = 4
At which point 1 and '1' are no longer distinguishable at all.
Check for the definedness of $1 after a successful match. The logic goes like this:
If the list is empty then the pattern match failed
Else if $1 is defined then the list contains all the captured substrings
Else the match was successful, but there were no captures
Your question doesn't make a lot of sense, but it appears you want to know the difference between:
$a = "foo";
@f = $a =~ /foo/;
and
$a = "foo1";
@f = $a =~ /foo(1)?/;
since they both return the same thing regardless of whether a capture was made.
The answer is: don't try to use the returned array. Check whether $1 is not equal to "".