Extract Digits From Matlab String - regex

In Matlab, let's say that I have the following string:
mystring = 'sdfkdsgoeskjgk elkr jtk34s ;3k54352642 643l j3kf p35j535';
And I want to extract all the digits in it into a vector such that each digit stands on its own, so the output should look like:
output = [3 4 3 5 4 3 5 2 6 4 2....]
I tried to do it using this code and regex:
mystring = 'sdfkdsgoeskjgk elkr jtk34s ;3k54352642 643l j3kf p35j535';
digits = regexp(mystring, '[0-9]');
disp(digits);
But it gives me some weird 4-combined digits instead of what I need.

By default, regexp returns the index of the first character of each match, which is why the numbers you get are positions rather than the digits in your string. Use that output to index into the original string to get the digits themselves:
digits = mystring(regexp(mystring, '[0-9]'));
These are still characters, so subtract '0' to convert them to numbers:
digits = mystring(regexp(mystring, '[0-9]')) - '0';
Alternatively, you could pass the 'match' option to regexp so that it returns the matching text itself. This returns a cell array of strings, which str2double then converts to an array of numbers:
digits = str2double(regexp(mystring, '[0-9]', 'match'))
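For readers who want to see the position-versus-match distinction outside MATLAB, here is a rough Python sketch of the same idea. It is only an illustration, not part of the MATLAB answer: the names positions and digits are made up for the example, and Python positions are 0-based whereas MATLAB's regexp reports 1-based indices.

```python
import re

mystring = 'sdfkdsgoeskjgk elkr jtk34s ;3k54352642 643l j3kf p35j535'

# Where each digit sits in the string (0-based in Python, 1-based in MATLAB)
positions = [m.start() for m in re.finditer(r'[0-9]', mystring)]

# The digit characters themselves, converted to integers
digits = [int(ch) for ch in re.findall(r'[0-9]', mystring)]

# Indexing the string with the positions recovers the same digits,
# which mirrors mystring(regexp(mystring, '[0-9]')) - '0' in MATLAB
recovered = [int(mystring[p]) for p in positions]

print(positions)
print(digits)
```

The positions list is what the original attempt was printing; the digits list is what was actually wanted.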

I use transposing instead of any other existing function to convert a string into an array.
mystring = 'sdfkdsgoeskjgk elkr jtk34s ;3k54352642 643l j3kf p35j535';
digits = regexp(mystring, '[0-9]');
array = double(mystring(digits)')'-48; % array of doubles (48 is the character code of '0')
disp(array);

Related

Extracting numbers using Regex in Matlab

I would like to extract integers from strings from a cell array in Matlab. Each string contains 1 or 2 integers formatted as shown below. Each number can be one or two digits. I would like to convert each string to a 1x2 array. If there is only one number in the string, the second column should be -1. If there are two numbers then the first entry should be the first number, and the second entry should be the second number.
'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'
Thank you very much!
I have tried a few different methods that did not work out. I think that I need to use regex and am having difficulty finding the proper expression.
You can use str2num to convert well formatted chars (which you appear to have) to the correct arrays/scalars. Then simply pad from the end+1 element to the 2nd element (note this is nothing in the case there's already two elements) with the value -1.
This is most clearly done in a small loop, see the comments for details:
% Set up the input
c = { ...
'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'
};
n = cell(size(c)); % Initialise output
for ii = 1:numel(n) % Loop over chars in 'c'
    n{ii} = str2num(c{ii}); % convert char to numeric array
    n{ii}(end+1:2) = -1; % Extend (if needed) to 2 elements = -1
end
% (Optional) Convert from a cell to an Nx2 array
n = cell2mat(n);
If you really wanted to use regex, you could replace the loop part with something similar:
n = regexp( c, '\d{1,2}', 'match' ); % Match between one and two digits
for ii = 1:numel(n)
    n{ii} = str2double(n{ii}); % Convert cellstr of chars to arrays
    n{ii}(end+1:2) = -1; % Pad to be at least 2 elements
end
But there are lots of ways to do this without touching regex, for example you could erase the square brackets, split on a comma, and pad with -1 according to whether or not there's a comma in each row. Wrap it all in a much harder to read (vs a loop) cellfun and ta-dah you get a one-liner:
n = cellfun( @(x) [str2double( strsplit( erase(x,{'[',']'}), ',' ) ), -1*ones(1,1-nnz(x==','))], c, 'uni', 0 );
I'd recommend one of the loops for ease of reading and debugging.
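As a cross-check of the match-then-pad logic, here is a minimal Python sketch of the same steps. The function name parse_pair and the fill parameter are invented for this example, and it assumes each string holds one or two non-negative integers of at most two digits, as in the question.

```python
import re

def parse_pair(text, fill=-1):
    """Return the integers found in text as a 2-element list, padded with fill."""
    # Match runs of one or two digits, like '\d{1,2}' in the MATLAB version
    numbers = [int(tok) for tok in re.findall(r'\d{1,2}', text)]
    # Append two fill values and keep the first two entries, so that
    # one number becomes [n, fill] and two numbers stay as they are
    return (numbers + [fill, fill])[:2]

rows = ['[1, 2]', '[3]', '[10, 3]', '[1, 12]', '[11, 12]']
pairs = [parse_pair(r) for r in rows]
print(pairs)
```

Padding by appending and slicing plays the role of n{ii}(end+1:2) = -1 in the loops above.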

How to sort non-numeric strings by converting them to integers? Is there a way to convert strings to unique integers while being ordered?

I am trying to convert strings to integers and sort them based on the integer value. These values should be unique to the string, no other string should be able to produce the same value. And if a string1 is bigger than string2, its integer value should be greater. Ex: since "orange" > "apple", "orange" should have a greater integer value. How can I do this?
I know there are an infinite number of possibilities between just 'a' and 'b' but I am not trying to fit every single possibility into a number. I am just trying to possibly sort, let say 1 million values, not an infinite amount.
I was able to get the values to be unique using the following:
long int order = 0;
for (auto letter : word)
order = order * 26 + letter - 'a' + 1;
return order;
but this obviously does not work since the value for "apple" will be greater than the value for "z".
This is not a homework assignment or a puzzle, this is something I thought of myself. Your help is appreciated, thank you!
You are almost there, just a few minor tweaks are needed:
Multiply by 27 instead of 26
You have the letters (a..z) plus the empty position, so the base should be 27 rather than 26.
Add zero padding
To make the starting letter the most significant digit, you should zero-pad/align the strings to a common length. If you are using 32-bit integers, the maximum string length is:
floor(log27(2^32)) = 6
floor(32/log2(27)) = 6
Here is a small example:
int lexhash(char *s)
{
    int i, h;
    for (h = 0, i = 0; i < 6; i++) // process string
    {
        if (s[i] == 0) break;
        h *= 27;
        h += s[i] - 'a' + 1;
    }
    for (; i < 6; i++) h *= 27; // zeropad missing letters
    return h;
}
returning these:
14348907 a
28697814 b
43046721 c
373071582 z
15470838 abc
358171551 xyz
23175774 apple
224829626 orange
ordered by hash:
14348907 a
15470838 abc
23175774 apple
28697814 b
43046721 c
224829626 orange
358171551 xyz
373071582 z
This will handle all lowercase a..z strings of up to 6 characters in length, which is:
26^6 + 26^5 + 26^4 + 26^3 + 26^2 + 26^1 = 321272406 possibilities
For longer strings, just use a wider integer type for the hash. Do not forget to use an unsigned type if you also use its highest bit (not the case for 32 bits).
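To check that the padded base-27 code really follows dictionary order, here is a small Python sketch of the same scheme. The width parameter is an addition for illustration (the C version hard-codes 6), and the sketch assumes lowercase a..z input only; like the C version, anything beyond width letters is ignored.

```python
def lexhash(word, width=6):
    """Base-27 code of a lowercase a..z word, zero-padded on the right to width letters."""
    h = 0
    for i in range(width):
        h *= 27
        if i < len(word):
            # 'a' -> 1 ... 'z' -> 26; 0 is reserved for "no letter here"
            h += ord(word[i]) - ord('a') + 1
    return h

words = ['a', 'b', 'c', 'z', 'abc', 'xyz', 'apple', 'orange']

# Sorting by the integer code should give the same order as sorting the strings
by_hash = sorted(words, key=lexhash)
print(by_hash)
print(by_hash == sorted(words))
```

Python integers do not overflow, so the sketch does not hit the 32-bit limit; in C or C++ the width still has to fit the integer type as described above.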
You can use position of char:
std::string s("apple");
int result = 0;
for (size_t i = 0; i < s.size(); ++i)
result += (s[i] - 'a') * static_cast<int>(i + 1);
return result;
By the way, what you are trying to get is very similar to a hash function.

Find Number of 0's at end of integer using POWER QUERY Power Bi

I wanted to find out the number of 0's at end of integer.
Eg for 2020 it should count 1
for 2000 it should count 3
for 3010000 it should count 4
I have no idea how to do it without counting all the zeros rather than just the trailing ones!
someone please help :)
Go to Power Query Editor and add a Custom Column with the code below:
if Number.Mod([number],100000) = 0 then 5
else if Number.Mod([number],10000) = 0 then 4
else if Number.Mod([number],1000) = 0 then 3
else if Number.Mod([number],100) = 0 then 2
else if Number.Mod([number],10) = 0 then 1
else 0
This assumes the highest possible number of trailing 0s is 5. You can add more if/else cases following the above logic if you expect longer runs of consecutive 0s at the end.
Take advantage of the fact that the text "00123", converted to a number and back to text, will be 2 characters shorter.
= let
TxtRev = Text.Reverse(Number.ToText([num]))&"1", /*convert to text and reverse, add 1 to handle num being 0*/
TxtNoZeroes = Number.ToText(Number.FromText(TxtRev)) /*convert to number to remove starting zeroes and then back to text*/
in
Text.Length(TxtRev)-Text.Length(TxtNoZeroes) /*compare length of original value with length without zeroes*/
This will work for any number of trailing zeroes (up to Int64 capacity of course, minus space for &"1"). Assuming that the column is of number type; if it's a text then just remove Number.ToText in TxtRev. If you have negative numbers or decimals, replace characters not being a digit after converting to text. For initial number being 0 it shows 1, but if it should show 0 just remove &"1".
You can do it as general string manipulation:
= Text.Length(Text.From([number])) - Text.Length(Text.TrimEnd(Text.From([number]), "0"))
We convert the column to a string, strip off the trailing zeroes, and subtract the resulting length from the total length, which gives the number of stripped zeroes.
Edit: I messed up my first answer, this one should in fact be correct
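The string-trimming idea is not specific to Power Query. As an illustration only, here is the same logic sketched in Python; the function name trailing_zeros is made up, and it assumes integer input (the sign is dropped with abs).

```python
def trailing_zeros(n):
    """Count the zeros at the end of the decimal form of the integer n."""
    text = str(abs(n))
    # Length before trimming minus length after trimming = zeros removed
    return len(text) - len(text.rstrip('0'))

for value in (2020, 2000, 3010000, 123):
    print(value, trailing_zeros(value))
```

As with the trimming formula above, an input of 0 counts as one trailing zero, because the whole text "0" gets stripped.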

Regex for matching up to a total of x digits where of those x digits, at most y can be after the decimal

I'm currently writing a validator where I need to check the formats of floats. My code reads in a format of (x,y) where x is the total possible digits in the float and y is the maximum digits out of x that can be after the decimal point. Apologies if this question has already been answered before, but I wasn't able to find anything similar.
For example, given a format of (5,3):
Valid values
55555
555.33
55.333
5555.3
.333
Invalid values
55555.5
555555
5.5555
.55555
This is my first time working with regex so if you guys have any tutorials that you recommend, please send it my way!
You can use a lookahead to ensure both conditions, like
^(?=(?:\.?\d){1,5}$)\d*(?:\.\d{1,3})?$
^ match from the start of the string
(?=(?:\.?\d){1,5}$) check for the presence of 1 up to 5 digits to the end of the string - not caring too much about the correct number of dots
\d* match any number of digits
(?:\.\d{1,3})? match up to 3 decimal places
$ ensure end of the string
See https://regex101.com/r/lrP56w/1
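Since the format (x,y) is read in at runtime, the pattern has to be built from the two numbers. Here is a rough Python sketch of that, reusing the lookahead pattern above. The helper name make_validator is hypothetical, and the sketch assumes plain unsigned values with no sign or exponent.

```python
import re

def make_validator(total, dec):
    """Build a checker for at most `total` digits overall, at most `dec` after the dot."""
    pattern = re.compile(
        r'^(?=(?:\.?\d){1,%d}$)\d*(?:\.\d{1,%d})?$' % (total, dec)
    )
    return lambda value: pattern.match(value) is not None

is_valid = make_validator(5, 3)

valid = ['55555', '555.33', '55.333', '5555.3', '.333']
invalid = ['55555.5', '555555', '5.5555', '.55555']

print([is_valid(v) for v in valid])
print([is_valid(v) for v in invalid])
```

Note that in Python $ also matches just before a trailing newline, so strip the input first if that matters for your validator.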
Assuming JS you can try
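function validate(value, total, dec) {
  // Anchor at both ends so the whole digit string is checked, not just its tail
  let totalRegex = new RegExp(`^\\d{0,${total}}$`);
  // A dot followed by at most `dec` digits at the end of the value
  let decimalRegex = new RegExp(`\\.\\d{0,${dec}}$`);
  return totalRegex.test(value.replace(".","")) && (!(/\./.test(value)) || decimalRegex.test(value));
}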
console.log(validate("555.55", 5, 2));
console.log(validate("55.555", 5, 2));
console.log(validate("5.5", 5, 2));
console.log(validate("55555", 5, 2));
console.log(validate("5.5555", 5, 2));

AS3 regex split

I want to split a number using regex. I have a number like xyz (x and y are single digits, z can be a 2 or three digit number), for example 001 or 103 or 112. I want to split it into separate numbers. This can be done, if I'm not wrong by doing split("",3); This will split the number (saved as string, but I don't think it makes difference in this case) 103 in an array with values 1,0,3.
That case is easy; the problem is that the last number z may be a 2 or 3 digit number.
So I could have 1034, 0001, 1011 so on. And I have to split it respectively into [1,0,34] [0,0,01] [1,0,11]
How can I do that?
Thanks
Sergiu
var regex:RegExp = /(\d)(\d)(\d+)/;
var n:Number = 1234;
var res:Array = regex.exec(n.toString()) as Array;
trace(res.join("\n")); /** Traces:
*
* 1234
* 1
* 2
* 34
*
* The first 1234 is the whole matched string
* and the rest are the three (captured) groups.
*/
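The same three-group pattern can be tried out in other regex flavours too. Here is a minimal Python sketch, for illustration only: the function name split_code is invented, and it assumes the input is a string of at least three digits. Unlike exec in AS3, groups() returns only the captured groups and not the whole match.

```python
import re

def split_code(code):
    """Split a digit string into two single digits and the remaining digits."""
    m = re.fullmatch(r'(\d)(\d)(\d+)', code)
    if m is None:
        raise ValueError('expected at least three digits: %r' % code)
    return list(m.groups())

for code in ('1034', '0001', '1011', '103'):
    print(code, split_code(code))
```

Keeping the value as a string matters here, since converting something like 0001 to a number would drop the leading zeros.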
Found the solution, I was going the hard way... it was possible to just use substr to extract the characters I want and then put them in an array.