Removing Measurement Units from Cell Array - regex

I am trying to remove the units from a column of cell array data, e.g.:
cArray =
time temp
2022-05-10 20:19:43 '167 °F'
2022-05-10 20:19:53 '173 °F'
2022-05-10 20:20:03 '177 °F'
...
2022-06-09 20:18:10 '161 °F'
I have tried str2double but get all NaN.
I have found some info on regexp but don't follow exactly as the example is not the same.
Can anyone help me get the temp column to only read the value i.e.:
cArray =
time temp
2022-05-10 20:19:43 167
2022-05-10 20:19:53 173
2022-05-10 20:20:03 177
...
2022-06-09 20:18:10 161

For some cell array of data
cArray = { ...
1, '123 °F'
2, '234 °F'
3, '345 °F'
};
The easiest option applies if we can safely assume the temperature data always starts with digits, and you want all of those leading digits. Then we can use regexp to match only the digits:
temps = regexp( cArray(:,2), '\d+', 'match', 'once' );
The match option causes regexp to return the matching string rather than the index of the match, and once means "stop at the first match" so that we ignore everything after the first non-numeric character.
The pattern '\d+' means "one or more digits". You could expand it to match numbers with a decimal part using '\d+(\.\d+)?' instead if that's a requirement.
Then if you want to actually output numbers, you should use str2double. You could do this in a loop, or use cellfun which is a compact way of achieving the same thing.
temps = cellfun( @str2double, temps, 'uni', 0 ); % 'uni'=0 to retain cell array
Finally you can overwrite the column in cArray
cArray(:,2) = temps;
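For comparison, here is a minimal Python sketch of the same match-then-convert idea; Python's re module accepts the same '\d+(\.\d+)?' style of pattern. The helper name leading_number and the sample strings are illustrative assumptions, not part of the MATLAB answer above.

```python
import math
import re

# Hypothetical helper: returns the first number found in the text,
# or NaN when there is none (similar to what str2double gives for bad input).
def leading_number(text):
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else math.nan

temps = ["167 °F", "173 °F", "98.6 °F", "n/a"]
values = [leading_number(t) for t in temps]
print(values)  # [167.0, 173.0, 98.6, nan]
```

Because re.search stops at the first match, everything after the first non-numeric character is ignored, just like the 'once' option.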

Related

NaN returned when converting string to number within a for loop

for n=1:37
for m=2:71
rep1 = regexp(Cell1{n,m}, 'f[0-9]*', 'match')
rep2 = regexp(rep1, '[0-9]*', 'match')
rep2 = [rep2{:}]
cln = str2double(rep2)
Cell2{n,cln} = Cell1{n,m}
end
end
Cell 1 is a 37x71 Cell, Cell 2 is a 37x71 empty cell.
Ex
Cell1{1,2} = -(f32.*x1.*x6)./v1
If I run each part of the loop above individually, the function works as intended. However, it returns cln as a NaN when the whole loop is executed.
You are getting a NaN because your regex doesn't match one of the values of Cell1 and returns an empty string (which str2double converts to a NaN).
But let's take a step back for a second here. You can use regexp on cell arrays, so there is no need to loop through all of your elements. Also, you can use a lookbehind assertion to look for the "f" that precedes your number, which avoids calling regexp twice.
stringNumber = regexp(Cell1, '(?<=f)[0-9]*', 'match', 'once');
numbers = str2double(stringNumber);
You can then check for NaNs (isnan(numbers)) and look closer at the elements of Cell1 to see why your regex isn't finding a number in a particular string.
Once you get that sorted out, you can assign to Cell2 like you are doing
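Cell2 = cell(37, 71);
for k = 1:numel(numbers)
    if isnan(numbers(k)), continue; end   % skip entries where no number was found
    row = mod(k - 1, size(Cell1, 1)) + 1; % linear indices run down the columns, so wrap on the row count
    Cell2(row, numbers(k)) = Cell1(k);
end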
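The lookbehind idea is not specific to MATLAB. As a rough Python sketch (the function name number_after_f is an assumption made for illustration, and it uses [0-9]+ rather than [0-9]* so that a bare "f" counts as no match):

```python
import math
import re

# Hypothetical helper: returns the number that directly follows an "f",
# or NaN when the expression contains no such number.
def number_after_f(expr):
    match = re.search(r"(?<=f)[0-9]+", expr)
    return float(match.group()) if match else math.nan

print(number_after_f("-(f32.*x1.*x6)./v1"))  # 32.0
print(number_after_f("x1.*x6"))              # nan
```

The second call shows the failure mode from the question: when the pattern finds nothing, the conversion has nothing to work with and the result is NaN.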

Dynamic regexprep in MATLAB

I have the following strings in a long string:
a=b=c=d;
a=b;
a=b=c=d=e=f;
I want to first search for above mentioned pattern (X=Y=...=Z) and then output like the following for each of the above mentioned strings:
a=d;
b=d;
c=d;
a=b;
a=f;
b=f;
c=f;
d=f;
e=f;
In general, I want every variable to be set equal to the last variable on the extreme right of the string. Is there a way I can do this using regexprep in MATLAB? I am able to do it for a fixed-length string, but for variable length I have no idea how to achieve this. Any help is appreciated.
My attempt for the case of two equal signs is as follows:
funstr = regexprep(funstr, '([^;])+\s*=\s*+(\w+)+\s*=\s*([^;])+;', '$1 = $3; \n $2 = $3;\n');
This isn't a regexp, but if you stick to MATLAB you can use the cellfun function to avoid an explicit loop:
str = 'a=b=c=d=e=f;' ; %// input string
list = strsplit(str,'=') ;
strout = cellfun( @(a) [a,'=',list{end}] , list(1:end-1), 'uni', 0).' %'// Horchler simplification of the previous solution below
%// this does the same as above but is more convoluted
%// strout = cellfun( @(a,b) cat(2,a,'=',b) , list(1:end-1) , repmat(list(end),1,length(list)-1) , 'uni',0 ).'
Will give you:
strout =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
Note: As Horchler rightly pointed out in the comments, although cellfun lets you compact your code, it is just a loop in disguise. Moreover, since it operates on cells, it is notoriously slow. You won't see the difference on such simple inputs, but reserve this approach for cases where performance is not a major concern.
Now if you like regex you must like black magic code. If all your strings are in a cell array from the start, there is a way to (over)abuse the cellfun capabilities to obscure your code and do it all in one line.
Consider:
strlist = {
'a=b=c=d;'
'a=b;'
'a=b=c=d=e=f;'
};
Then you can get all your substrings with:
strout = cellfun( @(s)cellfun(@(a,b)cat(2,a,'=',b),s(1:end-1),repmat(s(end),1,length(s)-1),'uni',0).' , cellfun(@(s) strsplit(s,'=') , strlist , 'uni',0 ) ,'uni',0)
>> strout{:}
ans =
'a=d;'
'b=d;'
'c=d;'
ans =
'a=b;'
ans =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
This gives you a 3x1 cell array, with one cell for each group of substrings. Each cell holds a column of strings, so if you want to concatenate them all, simply use: strall = cat(1,strout{:});
I haven't had much experience w/ Matlab; but your problem can be solved by a simple string split function.
[parts, m] = strsplit( funstr, {' ', '='}, 'CollapseDelimiters', true )
Now take the last element of parts and iterate over the elements before it:
len = length( parts );
for i = 1:len-1
    disp( [parts{i}, ' = ', parts{len}] )
end
disp prints a value to the MATLAB command window; use fprintf instead if you need formatted output.
There isn't a single Regex that you can write that will cover all the cases. As posted on this answer:
https://stackoverflow.com/a/5019658/3393095
However, you have a few alternatives to achieve your final result:
1. You can get all the values in the line with regexp, pick the last value, then use a for loop over the other values to generate the output. The regex to get the values would be this:
matchStr = regexp(str,'([^=;\s]*)','match')
2. If you want to use regexprep at all costs, you should write a pattern generator and a replace-expression generator based on the number of '=' in the input string, and pass their results as the arguments to regexprep.
3. You can forget about regex and split the input, then loop over the values to generate the output (similarly to alternative #1).
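The split-and-loop alternative is language-independent. A minimal Python sketch, assuming each statement is a single chain terminated by ";" (the function name expand_chain is made up for illustration):

```python
# Hypothetical helper: turns "a=b=c=d;" into ["a=d;", "b=d;", "c=d;"].
def expand_chain(statement):
    # Drop the trailing semicolon, then split on "=" and trim whitespace.
    parts = [p.strip() for p in statement.strip().rstrip(";").split("=")]
    last = parts[-1]
    # Pair every variable except the last with the rightmost one.
    return [f"{name}={last};" for name in parts[:-1]]

for line in ["a=b=c=d;", "a=b;", "a=b=c=d=e=f;"]:
    print(expand_chain(line))
```

No pattern generator is needed here because the number of "=" signs never has to be known in advance.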

Why is max number ignoring two-digit numbers?

At the moment I am saving a set of variables to a text file. I am doing the following to check that my code works, but whenever the file contains a two-digit number such as 10, it is not printed as the max number.
If my text file looked like this.
tom:5
tom:10
tom:1
It would output 5 as the max number.
name = input('name')
score = 4
if name == 'tom':
    fo= open('tom.txt','a')
    fo.write('Tom: ')
    fo.write(str(score ))
    fo.write("\n")
    fo.close()
if name == 'wood':
    fo= open('wood.txt','a')
    fo.write('Wood: ')
    fo.write(str(score ))
    fo.write("\n")
    fo.close()
tomL2 = []
woodL2 = []
fo = open('tom.txt','r')
tomL = fo.readlines()
tomLi = tomL2 + tomL
fo.close
tomLL=max(tomLi)
print(tomLL)
fo = open('wood.txt','r')
woodL = fo.readlines()
woodLi = woodL2 + woodL
fo.close
woodLL=max(woodLi)
print(woodLL)
You are comparing strings, not numbers. You need to convert them into numbers before using max. For example, you have:
tomL = fo.readlines()
This contains a list of strings:
['tom:5\n', 'tom:10\n', 'tom:1\n']
Strings are ordered lexicographically (much like how words would be ordered in an English dictionary). If you want to compare numbers, you need to turn them into numbers first:
tomL_scores = [int(s.split(':')[1]) for s in tomL]
The parsing is done in the following way:
….split(':') separates the string into parts using a colon as the delimiter:
'tom:5\n' becomes ['tom', '5\n']
…[1] chooses the second element from the list:
['tom', '5\n'] becomes '5\n'
int(…) converts a string into an integer:
'5\n' becomes 5
The list comprehension [… for s in tomL] applies this sequence of operations to every element of the list.
Note that int (and similarly float) is rather picky about what it accepts: the string must be in the form of a valid numeric literal or it will be rejected with an error (although leading and trailing whitespace is allowed). This is why you need ….split(':')[1] to massage the string into a form that it's willing to accept.
This will yield:
[5, 10, 1]
Now, you can apply max to obtain the largest score.
As a side-note, the statement
fo.close
will not close a file, since it doesn't actually call the function. To call the function you must enclose the arguments in parentheses, even if there are none:
fo.close()
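Putting the pieces together, a minimal sketch might look like the following. The function name max_score is an assumption for illustration, and the lines are given inline instead of being read from tom.txt so that the example is self-contained.

```python
# Hypothetical helper: parse "name:score" lines and return the highest score.
def max_score(lines):
    # Skip blank lines; int() tolerates the trailing newline and leading spaces.
    scores = [int(line.split(":")[1]) for line in lines if line.strip()]
    return max(scores)

lines = ["tom:5\n", "tom:10\n", "tom:1\n"]
print(repr(max(lines)))  # 'tom:5\n' (string comparison, the original problem)
print(max_score(lines))  # 10
```

With a real file, opening it in a with statement (with open('tom.txt') as fo: print(max_score(fo))) closes the file automatically, which also sidesteps the missing parentheses on fo.close.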

decision on regular expression length

I want to accomplish the following requirements using regex only (no C# code can be used):
• If the BTN length is 12 and the BTN starts with 0[123456789], remove one digit from the left and one digit from the right.
WORKING CORRECTLY
• If the BTN length is 12 and the case above does not apply, always return the rightmost 10 digits by removing 2 from the start (e.g. 491234567891 should become 1234567891).
NOT WORKING CORRECTLY
• If the BTN length is 11, remove one digit from the left. WORKING CORRECTLY
For BTNs of length <= 10, nothing needs to be done; they may stay as they are, or the regex may fail on them, which is acceptable.
Using SQL this can be achieved like this:
case when len(BTN) = 12 and BTN like '0[123456789]%' then SUBSTRING(BTN,2,10) else RIGHT(BTN,10) end
But how can this be done using regex?
So far I have been able to get some results correct using this regex:
[0*|\d\d]*(.{10})
With it, however, I cannot correctly remove the first and last characters of a BTN such as 015732888810 to get 1573288881; it returns 5732888810, which is wrong.
The code is:
string s = "111112573288881,0573288881000,057328888105,005732888810,15732888815,344956345335,004171511326,01777203102,1772576210,015732888810,494956345335";
string[] arr = s.Split(',');
foreach (string ss in arr)
{
// Match mm = Regex.Match(ss, @"\b(?:00(\d{10})|0(\d{10})\d?|(\d{10}))\b");
// Match mm = Regex.Match(ss, "0*(.{10})");
// ([0*|\\d\\d]*(.{10}))|
Match mm = Regex.Match(ss, "[0*|\\d\\d]*(.{10})");
// Match mm = Regex.Match(ss, "(?(^\\d{12}$)(.^{12}$)|(.^{10}$))");
// Match mm = Regex.Match(ss, "(info)[0*|\\d\\d]*(.{10}) (?(1)[0*|\\d\\d]*(.{10})|[0*|\\d\\d]*(.{10}))");
string m = mm.Groups[1].Value;
Console.WriteLine("Original BTN :"+ ss + "\t\tModified::" + m);
}
This should work:
(0(\d{10})0|\d\d(\d{10}))
UPDATE:
(0(\d{10})0|\d{1,2}(\d{10}))
The 1st alternative matches 12 digits with a 0 on the left and a 0 on the right and gives you only the 10 in between.
The 2nd alternative matches 11 or 12 digits and gives you the rightmost 10.
EDIT:
The regex matches the spec, but your code doesn't read the results correctly. Try this:
Match mm = Regex.Match(ss, "(0(\\d{10})0|\\d{1,2}(\\d{10}))");
string m = mm.Groups[2].Value;
if (string.IsNullOrEmpty(m))
m = mm.Groups[3].Value;
Groups are as follows:
index 0: the full match
index 1: everything inside the outer parentheses
index 2: only what the capture group inside the first alternative matched
index 3: only what the capture group inside the second alternative matched
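The group numbering can be checked outside C# as well. Here is a small Python sketch with the same pattern; the function name ten_digits is an assumption, and note that Python reports a group that did not participate in the match as None, whereas .NET gives an empty Value.

```python
import re

PATTERN = re.compile(r"(0(\d{10})0|\d{1,2}(\d{10}))")

# Hypothetical helper: pick whichever inner group took part in the match.
def ten_digits(btn):
    match = PATTERN.search(btn)
    if match is None:
        return None
    # Group 2 belongs to the first alternative, group 3 to the second.
    return match.group(2) or match.group(3)

print(ten_digits("015732888810"))  # 1573288881
print(ten_digits("494956345335"))  # 4956345335
print(ten_digits("01777203102"))   # 1777203102
```

Reading group 1 instead would return the whole 11 or 12 digit match, which is the behaviour the original code ran into.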
NOTE: This does not deal with anything greater than 12 digits or less than 11. Those entries will either fail or return 10 digits from somewhere. If you want results for those use this:
"(0(\\d{10})0|\\d*(\\d{10}))"
You'll get rightmost 10 digits for more than 12 digits, 10 digits for 10 digits, nothing for less than 10 digits.
EDIT:
This one should cover your additional requirements from the comments:
"^(?:0|\\d*)(\\d{10})0?$"
The (?:) makes a grouping excluded from the Groups returned.
EDIT:
This one might work:
"^(?:0?|\\d*)(\\d{10})\\d?$"
(?(^\d{12}$)(?(^0[1-9])0?(?<digit>.{10})|\d*(?<digit>.{10}))|\d*(?<digit>.{10}))
This does exactly the same thing as the SQL query and always returns the result in Groups[1], so I didn't have to change the code at all :)
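One way to gain confidence in any candidate pattern is to cross-check it against a direct transcription of the SQL rule. The sketch below does that in Python with a simpler two-alternative pattern; that pattern and the names normalize and sql_rule are assumptions made for illustration and do not come from the thread.

```python
import re

# Assumed pattern: the first alternative handles the 12-digit "0[1-9]..." case,
# the second keeps the rightmost 10 digits of anything else.
BTN = re.compile(r"^(?:0([1-9]\d{9})\d|\d*(\d{10}))$")

def normalize(btn):
    match = BTN.match(btn)
    if match is None:
        return None  # fewer than 10 digits, or non-digit characters
    return match.group(1) or match.group(2)

def sql_rule(btn):
    # Direct transcription of the CASE expression, used only as a reference.
    if len(btn) == 12 and btn[0] == "0" and btn[1] in "123456789":
        return btn[1:11]
    return btn[-10:]

for btn in ["015732888810", "491234567891", "01777203102", "1772576210"]:
    print(btn, normalize(btn), normalize(btn) == sql_rule(btn))
```

The anchors matter here: because the first alternative must consume exactly 12 characters, 11-digit and 13-digit inputs fall through to the second alternative.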

How to separate a line of input into multiple variables?

I have a file that contains rows and columns of information like:
104857 Big Screen TV 567.95
573823 Blender 45.25
I need to parse this information into three separate items, a string containing the identification number on the left, a string containing the item name, and a double variable containing the price. The information is always found in the same columns, i.e. in the same order.
I am having trouble accomplishing this. Even when not reading from the file and just using a sample string, my attempt just outputs a jumbled mess:
string input = "104857 Big Screen TV 567.95";
string tempone = "";
string temptwo = input.substr(0,1);
tempone += temptwo;
for(int i=1 ; temptwo != " " && i < input.length() ; i++)
{
temptwo = input.substr(j,j);
tempone += temp2;
}
cout << tempone;
I've tried tweaking the above code for quite some time, but no luck, and I can't think of any other way to do it at the moment.
You can find the first space and the last space using std::string::find_first_of and std::string::find_last_of. Use these positions to split the string into three parts: the first space comes after the first field, the last space comes before the third field, and everything in between is the second field.
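The first-space/last-space idea can be sketched in a few lines of Python, where str.find and str.rfind play the role of the C++ search functions. The function name split_record is an assumption for illustration.

```python
# Hypothetical helper: split "id name with spaces price" at the first
# and last space, returning (id, name, price).
def split_record(line):
    line = line.strip()
    first = line.find(" ")   # end of the identification number
    last = line.rfind(" ")   # start of the price
    if first == -1 or first == last:
        raise ValueError(f"expected at least three fields: {line!r}")
    return line[:first], line[first + 1:last].strip(), float(line[last + 1:])

print(split_record("104857 Big Screen TV 567.95"))
# ('104857', 'Big Screen TV', 567.95)
```

This works no matter how many words the item name contains, since only the outermost spaces are used as boundaries.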
How about following pseudocode:
string input = "104857 Big Screen TV 567.95";
string[] parsed_output = input.split(" "); // split input string with 'space' as delimiter
// parsed_output[0] = 104857
// parsed_output[1] = Big
// parsed_output[2] = Screen
// parsed_output[3] = TV
// parsed_output[4] = 567.95
int id = stringToInt(parsed_output[0]);
string product = concat(parsed_output[1], parsed_output[2], ... ,parsed_output[length-2]);
double price = stringToDouble(parsed_output[length-1]);
I hope, that's clear.
Well, try breaking down the file's components:
You know a number always comes first, and a number has no whitespace.
The string following the number CAN contain whitespace, but won't contain any digits (I would assume).
After this title, you're going to have another number (with no whitespace).
From these components, you can deduce:
Grabbing the first number is as simple as reading it in with the file stream's >> operator.
Getting the string requires you to read one character at a time, appending each to a string, until you reach a digit. The last number is read just like the first, using the file stream's >> operator.
This seems like homework, so I'll let you put the rest together.
I would try a regular expression, something along these lines:
^([0-9]+)\s+(.+)\s+([0-9]+\.[0-9]+)$
I am not very good at regex syntax, but ([0-9]+) corresponds to a sequence of digits (this is the id), ([0-9]+\.[0-9]+) is the floating point number (the price), and (.+) is the string that is separated from the two numbers by sequences of "space" characters: \s+.
The next step would be to check if you need this to work with prices like ".50" or "10".
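As a rough Python sketch of that next step, the price group can be loosened to \d*\.?\d+ so that ".50" and "10" are accepted too. This variation and the name parse_record are assumptions, not part of the answer above.

```python
import re

# id, then a lazily matched name, then a price with an optional integer part
# and an optional decimal point.
RECORD = re.compile(r"^(\d+)\s+(.+?)\s+(\d*\.?\d+)$")

# Hypothetical helper: returns (id, name, price) or raises on a malformed line.
def parse_record(line):
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised record: {line!r}")
    item_id, name, price = match.groups()
    return item_id, name, float(price)

for line in ["104857 Big Screen TV 567.95", "573823 Blender 45.25",
             "1 Widget .50", "2 Gadget 10"]:
    print(parse_record(line))
```

The id stays a string and only the price is converted, matching the three items asked for in the question.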