Convert list of one number to int - regex

I have a regular expression that parses a line# string from a log. That string is then passed through a second regular expression to extract just the line#.
For example:
Part of this regex:
m = re.match(r"^(\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2}),?(\d{3}),?(?:\s+\[(?:[^\]]+)\])+(?<=])(\s+?[A-Z]+\s+?)+(\s?[a-zA-Z0-9\.])+\s?(\((?:\s?\w)+\))\s?(\s?.)+", line)
Will match this:
(line 206)
Then this regex:
re.findall(r'\b\d+\b', linestr)
Gives me
['206']
In order to further process my information I need to have the line number as an integer and am lost for a solution as to how to do that.

You may try:
line_int = int(re.findall(r'\b\d+\b', linestr)[0])
or if you have more than one element in the list:
lines_int = [int(i) for i in re.findall(r'\b\d+\b', linestr)]
or even (in Python 3, map returns a lazy iterator, so wrap it in list() if you need an actual list)
lines_int = list(map(int, re.findall(r'\b\d+\b', linestr)))
I hope it helps -^.^-
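A minimal sketch, assuming you only need the first number on the line: re.search stops at the first match and returns None when there is none, so a line without a number can be handled explicitly instead of raising IndexError on [0]. The helper names line_number and all_numbers are hypothetical.

```python
import re


def line_number(linestr):
    """Return the first standalone integer in linestr as an int, or None if absent."""
    match = re.search(r'\b\d+\b', linestr)
    return int(match.group()) if match else None


def all_numbers(linestr):
    """Return every standalone integer in linestr; list() forces Python 3's lazy map."""
    return list(map(int, re.findall(r'\b\d+\b', linestr)))
```

For the question's input, line_number('(line 206)') gives the integer 206.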

Use int() to convert your list of one "string number" to an int:
myl = ['206']
int(myl[0])
206
If you have a list of these, you can convert them all to ints using a list comprehension:
[int(i) for i in myl]
resulting in a list of ints.
You can hook this into your code as best fits, e.g.,
int(re.findall(r'\b\d+\b', linestr)[0])


Related

How to call characters from first list with second list

I want to input two comma separated strings: the first a set of strings, the second a set of ranges and return substrings based on ranges, for example:
x=input("Input string to search: ")
search=x.split(',')
y=input("Input numbers to locate: ")
numbers=y.split(',')
I would then like to use the second list of ranges to print out specified characters from the first list.
An example:
Input string to search: abcdefffg,aabcdefghi,bbcccdefghi
Input numbers to locate: 1:2,2:3,5:9
I would like the output to look like this:
bc
bcd
defghi
Any suggestions? Thanks in advance!
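split(':') splits a "range" into its two components, and map(int, ...) converts them to integers. The loop treats the first number as a start index and the second as a length: string[start:start+how_many] takes how_many characters beginning at index start (the end index of a Python slice is excluded). That reading is what reproduces the expected output, e.g. 1:2 on "abcdefffg" gives "bc".
zip is an easy way to read from two different lists in parallel.
Let me know if you have any other questions: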
x = "abcdefffg,aabcdefghi,bbcccdefghi"
search = x.split(',')
y = "1:2,2:3,5:9"
numbers = y.split(',')
results = []
for string, rng in zip(search, numbers):
    start, how_many = map(int, rng.split(':'))
    results.append(string[start:start+how_many])
print("\n".join(results))
# Output:
# bc
# bcd
# defghi
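To see why the second number has to be read as a length, here is a small sketch (the function names are hypothetical) comparing that reading with an ordinary Python slice s[a:b]. Only the first reproduces the asker's expected output.

```python
def slice_by_length(strings, ranges):
    """Treat each 'a:b' as start index a and length b (how the loop above reads it)."""
    out = []
    for s, rng in zip(strings, ranges):
        start, length = map(int, rng.split(':'))
        out.append(s[start:start + length])
    return out


def slice_by_end(strings, ranges):
    """Treat each 'a:b' as a plain Python slice s[a:b] (end index excluded)."""
    out = []
    for s, rng in zip(strings, ranges):
        start, end = map(int, rng.split(':'))
        out.append(s[start:end])
    return out


search = "abcdefffg,aabcdefghi,bbcccdefghi".split(',')
numbers = "1:2,2:3,5:9".split(',')
print(slice_by_length(search, numbers))  # ['bc', 'bcd', 'defghi']
print(slice_by_end(search, numbers))     # ['b', 'b', 'defg']
```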

String to numeric conversion using regex in Scala

I have an array of numbers as strings:
val original_array = Array("-0,1234567",......), which holds strings, and I want to convert it to a numeric Array.
val new_array = Array("1234567", ........)
How can I achieve this in Scala?
Using original_array.toDouble gives an error.
The simple answer is to apply toDouble to each element (toDouble is a method on String, not on Array) ...
val arrNums = Array("123", "432", "99").map(_.toDouble)
... but this is a little dangerous because it will throw a NumberFormatException if any of the strings is not a proper number.
This is safer...
val arrNums = Array("123", "432", "99").collect {
  case n if n matches """\d+""" => n.toDouble
}
... but you'll want to use a regex pattern that covers all your cases. This example won't recognize floating-point numbers ("1.1") or negatives ("-4"). Something like """-?\d*\.?\d+""" might fit your requirements.
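For comparison, the same filter-then-convert idea sketched in Python (not part of the original answer; safe_to_floats is a hypothetical name). Scala's String.matches requires the whole string to match, so the Python counterpart is re.fullmatch rather than re.search.

```python
import re

# The broader pattern suggested above: optional minus, optional decimal point.
NUMBER = re.compile(r'-?\d*\.?\d+')


def safe_to_floats(strings):
    """Keep only strings that are entirely a number, then convert them."""
    return [float(s) for s in strings if NUMBER.fullmatch(s)]
```

Strings such as "abc" or "1,5" are silently dropped, just as collect skips elements the partial function does not match.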

How can I extract a file name based on number string?

I have a list of filenames in a struct array, example:
4x1 struct array with fields:
name
date
bytes
isdir
datenum
where files.name
ans =
ts.01094000.crest.csv
ans =
ts.01100600.crest.csv
etc.
I have another list of numbers (say, 1094000), and I want to find the corresponding file name in the struct.
Please note that 1094000 doesn't have the leading 0. Other numbers may also appear. So I want to search for '1094000' and find that name.
I know I can do it using regex, but I have never used it before, and I'm finding it difficult to write a pattern for numbers instead of text using strfind. Any suggestion or another method is welcome.
What I have tried:
regexp(files.name,'ts.(\d*)1094000.crest.csv','match');
I think the regular expression you'd want is more like
filenames = {'ts.01100600.crest.csv','ts.01094000.crest.csv'};
matches = regexp(filenames, ['ts\.0*' num2str(1094000) '\.crest\.csv']);
matches = ~cellfun('isempty', matches);
filenames(matches)
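The same regular expression can be sketched in Python to show why matching the whole name matters (find_files_for_number is a hypothetical helper; the ts.NUMBER.crest.csv layout is taken from the question). fullmatch requires the entire name to match and 0* absorbs only leading zeros, so a search for 10 will not hit names such as ts.01000.crest.csv.

```python
import re


def find_files_for_number(filenames, number):
    """Return names of the form ts.<leading zeros><number>.crest.csv."""
    pattern = re.compile(r'ts\.0*' + re.escape(str(number)) + r'\.crest\.csv')
    return [name for name in filenames if pattern.fullmatch(name)]
```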
For a solution with strfind...
Pre-R2016b:
match = ~cellfun('isempty', strfind({files.name}, num2str(1094000)))
files(match)
R2016b+:
match = contains({files.name}, string(1094000))
files(match)
However, the strfind way might have issues if the number you are looking for exists in unexpected places such as looking for 10 in ["01000" "00101"].
If your filenames match the pattern ts.NUMBER.crest.csv, then in R2016b+ you could do:
str = {files.name};
str = extractBetween(str,'ts.','.crest');
str = strip(str,'left','0');
matches = str == string(1094000);
files(matches)
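A Python sketch of the same parse-then-compare idea, assuming every relevant name has exactly the form ts.DIGITS.crest.csv (find_by_parsed_number is hypothetical). Converting the middle field with int() drops the leading zeros, which plays the role of strip(str,'left','0').

```python
def find_by_parsed_number(filenames, number):
    """Return names of the form ts.DIGITS.crest.csv whose DIGITS equal number."""
    matches = []
    for name in filenames:
        parts = name.split('.')
        # Expect exactly ['ts', '<digits>', 'crest', 'csv']; skip anything else.
        if len(parts) != 4 or parts[0] != 'ts' or parts[2:] != ['crest', 'csv']:
            continue
        # int() ignores leading zeros, so '01094000' compares equal to 1094000.
        if parts[1].isdecimal() and int(parts[1]) == number:
            matches.append(name)
    return matches
```

Comparing whole numbers rather than substrings avoids the strfind pitfall mentioned above.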

Dynamic regexprep in MATLAB

I have the following strings in a long string:
a=b=c=d;
a=b;
a=b=c=d=e=f;
I want to first search for the pattern above (X=Y=...=Z) and then produce output like the following for each of those strings:
a=d;
b=d;
c=d;
a=b;
a=f;
b=f;
c=f;
d=f;
e=f;
In general, I want every variable to be set equal to the last variable on the extreme right of the string. Is there a way I can do it using regexprep in MATLAB? I am able to do it for a fixed-length string, but for variable length I have no idea how to achieve this. Any help is appreciated.
My attempt for the case of two equal signs is as follows:
funstr = regexprep(funstr, '([^;])+\s*=\s*+(\w+)+\s*=\s*([^;])+;', '$1 = $3; \n $2 = $3;\n');
Not a regexp, but if you stick to MATLAB you can use the cellfun function to avoid an explicit loop:
str = 'a=b=c=d=e=f;' ; % input string
list = strsplit(str,'=') ;
strout = cellfun( @(a) [a,'=',list{end}] , list(1:end-1), 'uni', 0).' % Horchler's simplification of the longer version below
% this does the same as above but is more convoluted
% strout = cellfun( @(a,b) cat(2,a,'=',b) , list(1:end-1) , repmat(list(end),1,length(list)-1) , 'uni',0 ).'
Will give you:
strout =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
Note: As Horchler rightly pointed out in a comment, although the cellfun instruction makes your code more compact, it is just a disguised loop. Moreover, since it operates on cell arrays, it is notoriously slow. You won't see the difference on such simple inputs, but reserve this approach for cases where raw performance is not a major concern.
Now, if you like regex you must like black-magic code. If all your strings are in a cell array from the start, there is a way to (over)abuse cellfun's capabilities to obscure your code and do it all in one line.
Consider:
strlist = {
'a=b=c=d;'
'a=b;'
'a=b=c=d=e=f;'
};
Then you can get all your substrings with:
strout = cellfun( @(s)cellfun(@(a,b)cat(2,a,'=',b),s(1:end-1),repmat(s(end),1,length(s)-1),'uni',0).' , cellfun(@(s) strsplit(s,'=') , strlist , 'uni',0 ) ,'uni',0)
>> strout{:}
ans =
'a=d;'
'b=d;'
'c=d;'
ans =
'a=b;'
ans =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
This gives you a 3x1 cell array, one cell for each group of substrings. If you want to concatenate them all, then simply: strall = cat(1,strout{:}); (each group is a column of a different height, so the groups must be stacked vertically).
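For readers who want to see the same split-and-pair logic outside MATLAB, a Python sketch (expand is a hypothetical name): split on '=', keep the last piece (which still carries the ';'), and pair every earlier name with it.

```python
def expand(statement):
    """'a=b=c=d;' -> ['a=d;', 'b=d;', 'c=d;'], mirroring strsplit + cellfun."""
    parts = statement.split('=')
    last = parts[-1]  # keeps the trailing ';', like list{end} in the MATLAB code
    return [name + '=' + last for name in parts[:-1]]


strlist = ['a=b=c=d;', 'a=b;', 'a=b=c=d=e=f;']
strout = [expand(s) for s in strlist]                  # one group per statement
strall = [line for group in strout for line in group]  # all groups concatenated
```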
I haven't had much experience with MATLAB, but your problem can be solved by a simple string-split function.
[parts, m] = strsplit( funstr, {' ', '='}, 'CollapseDelimiters', true )
Now, store the last element of parts, and iterate over parts up to it:
len = length( parts )
for i = 1:len-1
    fprintf('%s = %s\n', parts{i}, parts{len})
end
fprintf writes each pair to the command window (MATLAB's print function is for printing figures, not text).
There isn't a single regex you can write that will cover all the cases, as explained in this answer:
https://stackoverflow.com/a/5019658/3393095
However, you have a few alternatives to achieve your final result:
1. Get all the values in the line with regexp, pick the last value, then use a for loop over the other values to generate the output. The regex to get the values would be this:
matchStr = regexp(str,'[^=;\s]+','match')
2. If you want to use regexprep by any means, write a pattern generator and a replacement-expression generator based on the number of '=' in the input string, and pass these as the parameters of your regexprep call.
3. Forget about regex and split the input to generate the output, looping over the values (similarly to alternative #1).
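Alternative #2 can be sketched in Python with re.sub standing in for regexprep (all function names are hypothetical): count the '=' signs, build a pattern with one capture group per name, and build a replacement that pairs each group with the last one. \g<n> is used for the group references so that groups 10 and above stay unambiguous.

```python
import re


def chain_pattern(n_names):
    """Regex for n_names words joined by '=' and terminated by ';'."""
    return r'\s*=\s*'.join([r'(\w+)'] * n_names) + r'\s*;'


def chain_replacement(n_names):
    """Replacement template pairing groups 1..n-1 with group n, one per line."""
    return '\n'.join(f'\\g<{i}>=\\g<{n_names}>;' for i in range(1, n_names))


def expand_statement(statement):
    """Rewrite one 'x=y=...=z;' statement using a pattern sized to it."""
    statement = statement.strip()
    n_names = statement.count('=') + 1
    if n_names < 2:
        return statement  # no '=' at all, nothing to rewrite
    return re.sub(chain_pattern(n_names), chain_replacement(n_names), statement)
```

Because the pattern is generated per statement, a long input string would first be split on ';' and each piece passed to expand_statement separately.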

Why is max number ignoring two-digit numbers?

At the moment I am saving a set of variables to a text file. I am doing the following to check whether my code works, but whenever I use a two-digit number such as 10, it is not printed as the max number.
If my text file looked like this:
tom:5
tom:10
tom:1
It would output 5 as the max number.
name = input('name')
score = 4
if name == 'tom':
    fo= open('tom.txt','a')
    fo.write('Tom: ')
    fo.write(str(score ))
    fo.write("\n")
    fo.close()
if name == 'wood':
    fo= open('wood.txt','a')
    fo.write('Wood: ')
    fo.write(str(score ))
    fo.write("\n")
    fo.close()
tomL2 = []
woodL2 = []
fo = open('tom.txt','r')
tomL = fo.readlines()
tomLi = tomL2 + tomL
fo.close
tomLL=max(tomLi)
print(tomLL)
fo = open('wood.txt','r')
woodL = fo.readlines()
woodLi = woodL2 + woodL
fo.close
woodLL=max(woodLi)
print(woodLL)
You are comparing strings, not numbers. You need to convert them into numbers before using max. For example, you have:
tomL = fo.readlines()
This contains a list of strings:
['tom:5\n', 'tom:10\n', 'tom:1\n']
Strings are ordered lexicographically (much like how words would be ordered in an English dictionary). If you want to compare numbers, you need to turn them into numbers first:
tomL_scores = [int(s.split(':')[1]) for s in tomL]
The parsing is done in the following way:
….split(':') separates the string into parts using a colon as the delimiter:
'tom:5\n' becomes ['tom', '5\n']
…[1] chooses the second element from the list:
['tom', '5\n'] becomes '5\n'
int(…) converts a string into an integer:
'5\n' becomes 5
The list comprehension [… for s in tomL] applies this sequence of operations to every element of the list.
Note that int (and similarly float) is rather picky about what it accepts: the string must be in the form of a valid numeric literal, or it will be rejected with a ValueError (although leading and trailing whitespace is allowed). This is why you need ….split(':')[1] to massage the string into a form that it's willing to accept.
This will yield:
[5, 10, 1]
Now, you can apply max to obtain the largest score.
As a side-note, the statement
fo.close
will not close the file, since it doesn't actually call the method. To call it you must add parentheses, even if there are no arguments:
fo.close()
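Putting the answer's pieces together, a sketch that parses the scores and reads the file inside a with block, which closes the file automatically and so sidesteps the fo.close vs fo.close() mistake. max_score and max_score_from_file are hypothetical helpers, and they assume every non-blank line has the form name:number, as in the example file.

```python
def max_score(lines):
    """Return the largest integer after ':' in lines like 'tom:10', or None."""
    scores = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _, _, value = line.partition(':')
        scores.append(int(value))  # int() tolerates the space in 'Tom: 4'
    return max(scores) if scores else None


def max_score_from_file(path):
    """Read a scores file such as 'tom.txt' and return its maximum score."""
    with open(path) as fo:  # the file is closed when the block exits
        return max_score(fo)
```

With the question's file, max_score_from_file('tom.txt') compares 5, 10 and 1 as integers, so 10 comes out as the maximum.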