MATLAB to C++: Coder: not consistent array dimension concatenation - c++

Adapting the code from this Coder-compatible solution to read CSV data, I ran into the following issue during the run-time issue check of MATLAB Coder:
Error using cat>>check_non_axis_size (line 283)
Dimensions of arrays being concatenated are not consistent.
Error in cat>>cat_impl (line 102)
check_non_axis_size(isempty, i, sizes{i}, varargin{:});
Error in cat (line 22)
result = cat_impl(@always_isempty_matrix, axis, varargin{:});
Error in readCsv (line 28)
coder.ceval('sscanf', [token, NULL], ['%lf', NULL], coder.wref(result(k)));
My adaptation:
function result = readCsv(filepath, rows, columns)
    NULL = char(0);
    fid = fopen(filepath, 'r');
    % read entire file into char array
    remainder = fread(fid, '*char');
    % preallocation for speedup
    result = coder.nullcopy(zeros(columns,rows));
    k = 1;
    while ~isempty(remainder)
        % comma, newline
        delimiters = [',', char(10)];
        % strtok ignores leading delimiter,
        % returns chars upto, but not including,
        % the next delimiter
        [token,remainder] = strtok(remainder, delimiters);
        % string to double conversion
        % no need to worry about return type / order
        % since we only look at one token at a time
        if coder.target('MATLAB')
            result(k) = sscanf(token, '%f');
        else
            coder.ceval('sscanf', [token, NULL], ['%lf', NULL], coder.wref(result(k)));
        end
        k = k + 1;
    end
    % workaround for filling column-major but breaks on single-line csv
    result = reshape(result,rows, [])';
    disp(k)
    fclose(fid);
The .csv in question is a 200x51 matrix.
Testing in MATLAB works as expected: the .csv is read 1:1, as with csvread().
The error pops up during code generation and, as far as I understand, it is an issue with writing the result of sscanf into the preallocated result array, but only for the C code.
Addendum: a line with only integer values (1,1,1,...,0) works fine; a line with actual floats (6.7308,38.7101,...,40.5999,0) breaks with the aforementioned error.

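remainder = fread(fid, [1, Inf], '*char');
It turns out the sizeA argument is not optional in this case. Without it, fread returns a column vector, so token presumably comes back as a column as well, and [token, NULL] then tries to horizontally concatenate an N-by-1 array with a scalar. That only works when N is 1, which would explain why single-digit values passed while longer tokens such as 6.7308 triggered the error. Reading the file into a 1-by-N row vector keeps the concatenation dimensions consistent.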

Related

Extracting numbers using Regex in Matlab

I would like to extract integers from strings from a cell array in Matlab. Each string contains 1 or 2 integers formatted as shown below. Each number can be one or two digits. I would like to convert each string to a 1x2 array. If there is only one number in the string, the second column should be -1. If there are two numbers then the first entry should be the first number, and the second entry should be the second number.
'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'
Thank you very much!
I have tried a few different methods that did not work out. I think that I need to use regex and am having difficulty finding the proper expression.
You can use str2num to convert well formatted chars (which you appear to have) to the correct arrays/scalars. Then simply pad from the end+1 element to the 2nd element (note this is nothing in the case there's already two elements) with the value -1.
This is most clearly done in a small loop, see the comments for details:
% Set up the input
c = { ...
'[1, 2]'
'[3]'
'[10, 3]'
'[1, 12]'
'[11, 12]'
};
n = cell(size(c)); % Initialise output
for ii = 1:numel(n) % Loop over chars in 'c'
n{ii} = str2num(c{ii}); % convert char to numeric array
n{ii}(end+1:2) = -1; % Extend (if needed) to 2 elements = -1
end
% (Optional) Convert from a cell to an Nx2 array
n = cell2mat(n);
If you really wanted to use regex, you could replace the loop part with something similar:
n = regexp( c, '\d{1,2}', 'match' ); % Match between one and two digits
for ii = 1:numel(n)
n{ii} = str2double(n{ii}); % Convert cellstr of chars to arrays
n{ii}(end+1:2) = -1; % Pad to be at least 2 elements
end
But there are lots of ways to do this without touching regex, for example you could erase the square brackets, split on a comma, and pad with -1 according to whether or not there's a comma in each row. Wrap it all in a much harder to read (vs a loop) cellfun and ta-dah you get a one-liner:
n = cellfun( @(x) [str2double( strsplit( erase(x,{'[',']'}), ',' ) ), -1*ones(1,1-nnz(x==','))], c, 'uni', 0 );
I'd recommend one of the loops for ease of reading and debugging.

Time optimize C++ function to find number of decoding possibilities

This is an interview practice problem from CodeFights. I have a solution that's working except for the fact that it takes too long to run for very large inputs.
Problem Description (from the link above)
A top secret message containing uppercase letters from 'A' to 'Z' has been encoded as numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
You are an FBI agent and you need to determine the total number of ways that the message can be decoded.
Since the answer could be very large, take it modulo 10^9 + 7.
Example
For message = "123", the output should be
mapDecoding(message) = 3.
"123" can be decoded as "ABC" (1 2 3), "LC" (12 3) or "AW" (1 23), so the total number of ways is 3.
Input/Output
[time limit] 500ms (cpp)
[input] string message
A string containing only digits.
Guaranteed constraints:
0 ≤ message.length ≤ 10^5.
[output] integer
The total number of ways to decode the given message.
My Solution so far
We have to implement the solution in a function int mapDecoding(std::string message), so my entire solution is as follows:
/*0*/ void countValidPaths(int stIx, int endIx, std::string message, long *numPaths)
/*1*/ {
/*2*/ //check out-of-bounds error
/*3*/ if (endIx >= message.length())
/*4*/ return;
/*5*/
/*6*/ int subNum = 0, curCharNum;
/*7*/ //convert substr to int
/*8*/ for (int i = stIx; i <= endIx; ++i)
/*9*/ {
/*10*/ curCharNum = message[i] - '0';
/*11*/ subNum = subNum * 10 + curCharNum;
/*12*/ }
/*13*/
/*14*/ //check for leading 0 in two-digit number, which would not be valid
/*15*/ if (endIx > stIx && subNum < 10)
/*16*/ return;
/*17*/
/*18*/ //if number is valid
/*19*/ if (subNum <= 26 && subNum >= 1)
/*20*/ {
/*21*/ //we've reached the end of the string with success, therefore return a 1
/*22*/ if (endIx == (message.length() - 1) )
/*23*/ ++(*numPaths);
/*24*/ //branch out into the next 1- and 2-digit combos
/*25*/ else if (endIx == stIx)
/*26*/ {
/*27*/ countValidPaths(stIx, endIx + 1, message, numPaths);
/*28*/ countValidPaths(stIx + 1, endIx + 1, message, numPaths);
/*29*/ }
/*30*/ //proceed to the next digit
/*31*/ else
/*32*/ countValidPaths(endIx + 1, endIx + 1, message, numPaths);
/*33*/ }
/*34*/ }
/*35*/
/*36*/ int mapDecoding(std::string message)
/*37*/ {
/*38*/ if (message == "")
/*39*/ return 1;
/*40*/ long numPaths = 0;
/*41*/ int modByThis = static_cast<int>(std::pow(10.0, 9.0) + 7);
/*42*/ countValidPaths(0, 0, message, &numPaths);
/*43*/ return static_cast<int> (numPaths % modByThis);
/*44*/ }
The Issue
I have passed 11/12 of CodeFight's initial test cases, e.g. mapDecoding("123") = 3 and mapDecoding("11115112112") = 104. However, the last test case has message = "1221112111122221211221221212212212111221222212122221222112122212121212221212122221211112212212211211", and my program takes too long to execute:
Expected_output: 782204094
My_program_output: <empty due to timeout>
I wrote countValidPaths() as a recursive function, and its recursive calls are on lines 27, 28 and 32. I can see how such a large input would cause the code to take so long, but I'm racking my brain trying to figure out what more efficient solutions would cover all possible combinations.
Thus the million dollar question: what suggestions do you have to optimize my current program so that it runs in far less time?
A couple of suggestions.
First, this problem can probably be formulated as a dynamic programming problem. It has that smell to me: you are computing the same thing over and over again.
The second is the insight that long contiguous sequences of "1"s and "2"s form a Fibonacci sequence in terms of the number of possibilities. Any other value terminates the sequence. So you can split the string into runs of ones and twos, each terminated by any other digit. You will need special logic for a terminating zero, since zero does not correspond to a character on its own. So split the string, count the length of each segment, look up the Fibonacci number for that length (which can be pre-computed) and multiply the values. Your example "11115112112" yields "11115" and "112112", and f(5) = 8 and f(6) = 13, so 8*13 = 104.
Your long string is a sequence of 1's and 2's that is 100 digits long. The following Java program (sorry, my C++ is rusty) correctly computes its value by this method:
import java.math.BigInteger;

public class FibPaths {
private static final int MAX_LEN = 105;
private static final BigInteger MOD_CONST = new BigInteger("1000000007");
private static BigInteger[] fibNum = new BigInteger[MAX_LEN];
private static void computeFibNums() {
fibNum[0] = new BigInteger("1");
fibNum[1] = new BigInteger("1");
for (int i = 2; i < MAX_LEN; i++) {
fibNum[i] = fibNum[i-2].add(fibNum[i-1]);
}
}
public static void main(String[] argv) {
String x = "1221112111122221211221221212212212111221222212122221222112122212121212221212122221211112212212211211";
computeFibNums();
BigInteger val = fibNum[x.length()].mod(MOD_CONST);
System.out.println("N=" + x.length() + " , val = " + val);
}
}
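The dynamic-programming idea from the first suggestion can be sketched as follows. This is only an illustration, written in Python for brevity rather than the C++ the site requires; the names map_decoding and MOD are made up here. It keeps the number of decodings for the two previous prefixes, so each digit is handled once and the running time is linear in the message length. As in the question's code, an empty message is assumed to count as one decoding.

```python
MOD = 10**9 + 7  # modulus from the problem statement


def map_decoding(message):
    """Count decodings of a digit string under 'A'->1 ... 'Z'->26, modulo MOD."""
    # prev1: ways to decode the prefix ending just before the current digit
    # prev2: ways to decode the prefix ending one digit earlier
    prev2, prev1 = 0, 1  # the empty prefix has exactly one decoding
    for i, ch in enumerate(message):
        ways = 0
        if ch != '0':
            # the current digit stands alone as a letter 1..9
            ways += prev1
        if i > 0 and '10' <= message[i - 1:i + 1] <= '26':
            # the last two digits together form a letter 10..26
            ways += prev2
        prev2, prev1 = prev1, ways % MOD
    return prev1


print(map_decoding("123"))          # 3
print(map_decoding("11115112112"))  # 104
```

The same recurrence ports directly to int mapDecoding(std::string message) with two integer variables and a single loop; a leading or isolated '0' simply contributes zero ways, so no separate special case is needed.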

How can I create an array from a messy text file

I have a text file in the form below...
Some line of text
Some line of text
Some line of text
--
data entry 0 (i = 0, j = 0); value = 1.000000
data entry 1 (i = 0, j = 1); value = 1.000000
data entry 2 (i = 0, j = 2); value = 1.000000
data entry 3 (i = 0, j = 3); value = 1.000000
etc for quite a large number of lines. The total array ends up being 433 rows x 400 columns. There is a line of hyphens -- separating each new i value. So far I have the following code:
f = open('text_file_name', 'r')
lines = f.readlines()
which simply opens the file and converts it to a list with each line as a separate string. I need to be able to create an array with the given values at the i and j positions; let's call the array A. The value of A[0,0] should be 1.000000. I don't know how to get from a messy text file (at the stage I am at, a messy list) to a usable array.
EDIT:
The expected output is a NumPy array. If I can get to that point, I can work through the rest of the tasks in the problem
UPDATE:
Thank you, Lukasz, for the suggestion below. I sort of understand the code you wrote, but I don't understand it well enough to use it. However, you have given me some good ideas on what to do. The data entries begin on line 12 of the text file. Values for i are within the 22nd and 27th character places, values for j are within the 33rd and 39th character places, and values for value are within the 49th and 62nd character places. I realize this is overly specific for this particular text file, but my professor is fine with that.
Now, I've written the following code using the formatting of this text file
for x in range(12,len(lines)):
    if not lines[x].startswith(' data entry'):
        continue
    else:
        i = int(lines[x][22:28])
        j = int(lines[x][33:39])
        r = int(lines[x][49:62])
        matrix[i,j] = r
print matrix
and the following ValueError message is given:
r = int(lines[x][49:62])
ValueError: invalid literal for int() with base 10: '1.000000'
Can anyone explain why this is given (I should be able to convert the string '1.000000' to integer 1) and what I can do to correct the issue?
You can simply skip every line that does not look like a data line.
To retrieve the indices and the value, a simple regular expression is used.
import numpy as np
import re

def parse(line):
    # raw string so that \( and \d reach the regex engine unchanged
    m = re.search(r'\(i = (\d+), j = (\d+)\); value = (\S+)', line)
    if not m:
        raise ValueError("Invalid line", line)
    # the value is parsed with float(), since it looks like '1.000000'
    return int(m.group(1)), int(m.group(2)), float(m.group(3))

R = 433
C = 400
data_file = 'file.txt'
matrix = np.zeros((R, C))
with open(data_file) as f:
    for line in f:
        # lstrip() tolerates leading whitespace before 'data entry'
        if not line.lstrip().startswith('data entry'):
            continue
        i, j, v = parse(line)
        matrix[i, j] = v
print(matrix)
The main trouble here is the hardcoded matrix size. Ideally you would somehow detect the size of the destination matrix before reading the data, or use another data structure and rebuild the NumPy array from that structure.
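One way to avoid the hardcoded size, sketched under the assumption that every data line matches the format shown in the question: collect the entries in a dictionary first, then infer the shape from the largest indices. The helper names read_entries and to_grid are invented for this illustration. It also shows why the ValueError in the question occurs: int() only accepts integer literals, so a string such as '1.000000' has to go through float() instead.

```python
import re

# Matches lines like: "data entry 1 (i = 0, j = 1); value = 1.000000"
LINE_RE = re.compile(r'\(i = (\d+), j = (\d+)\); value = (\S+)')


def read_entries(lines):
    """Collect {(i, j): value} from text lines, skipping anything that is not a data line."""
    entries = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # header text or a '--' separator
        # float() parses '1.000000'; int() would raise ValueError on it
        entries[int(m.group(1)), int(m.group(2))] = float(m.group(3))
    return entries


def to_grid(entries, fill=0.0):
    """Build a dense list of rows whose size is inferred from the largest indices seen."""
    if not entries:
        return []
    n_rows = max(i for i, _ in entries) + 1
    n_cols = max(j for _, j in entries) + 1
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for (i, j), value in entries.items():
        grid[i][j] = value
    return grid


# Made-up sample lines in the format from the question
sample = [
    "Some line of text",
    "--",
    "data entry 0 (i = 0, j = 0); value = 1.000000",
    "data entry 1 (i = 0, j = 1); value = 2.500000",
    "--",
    "data entry 2 (i = 1, j = 0); value = 3.000000",
]
grid = to_grid(read_entries(sample))
print(grid)  # [[1.0, 2.5], [3.0, 0.0]]
```

Passing the resulting list of rows to np.array() gives the NumPy array asked for; for the real file, the lines would come from open(data_file) instead of the sample list.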

How to convert a binary byte into a printable numeric value?

I have to convert a 128-bit Crypto++ AES ciphertext into a printable numerical string.
I am currently using the following code to do the casting, but bitset is too slow for my case. Does anyone know any efficient way of doing this?
string output = "";
for (std::size_t i = 0; i < 16; ++ i) {
output += bitset<8>(ciphertext[i]).to_string();
}
How to convert a binary byte into a printable numeric value? Thanks a lot!
There are plenty of clever methods to compute a binary string from a number, but it doesn't really matter: whatever method you use, you can use it to fill a table once:
std::string bytes[256];
// int counter: an unsigned char would wrap around at 255 and the loop would never end
for (int c = 0; c < 256; ++c) {
    bytes[c] = bitset<8>(c).to_string();
}
And then bytes[c] will give you the string for a particular byte.
In your post you show four lines of code. Below is what those four lines of code would change to using the above precomputed strings:
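string output = "";
for (std::size_t i = 0; i < 16; ++ i) {
    // cast so the index stays in 0..255 even if ciphertext holds signed chars
    output += bytes[static_cast<unsigned char>(ciphertext[i])];
}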
Also, your code likely involves some allocations during your loop. The best way to avoid those depends entirely on how you use the output string, but at the minimum output.reserve(16*8) can't hurt.
I would do
unsigned char ct_b[16]; // the 16 ciphertext bytes
char ct_h[33]; // 2 hex digits per byte + NUL
snprintf(ct_h, 33,
         "%02x%02x%02x%02x%02x%02x%02x%02x"
         "%02x%02x%02x%02x%02x%02x%02x%02x",
         ct_b[ 0], ct_b[ 1], ct_b[ 2], ct_b[ 3],
         ct_b[ 4], ct_b[ 5], ct_b[ 6], ct_b[ 7],
         ct_b[ 8], ct_b[ 9], ct_b[10], ct_b[11],
         ct_b[12], ct_b[13], ct_b[14], ct_b[15]);
This will certainly be faster than what you have, at the expense of a good bit more repetition. It does produce hexadecimal rather than binary, but it's very likely that hex is what you really want.
(In case you haven't seen string constant concatenation before: The absence of a comma after the first half of the string constant is intentional.)
(Please tell me you aren't using ECB.)
There's also the Crypto++ source/sink method if you are interested:
string output;
ArraySource as(ciphertext, sizeof(ciphertext),
true /*pump*/,
new HexEncoder(
new StringSink(output)
) // HexEncoder
); // ArraySource

Regarding the .replace() Function

I am fairly new to Python and am trying to create my own small program. I'm having trouble with the replace function. I want to replace every even position in a string with its position number, and a problem occurs once the position reaches 10: it just keeps replacing every character after that with an even number. Here is my code:
def replaceEvenUntil(st,n):
    for i in range(len(st)):
        if i % 2 == float(0):
            st = st.replace(st[i], str(i), n)
    return st
>>> replaceEvenUntil("abcdefghijklmnop", 100)
'0b2d4f6h8j101214161820'
Where in my code have I made my error?
A few things:
float and str are type constructors in Python. Literals such as 0 or 'foo' already have a usable type, so you don't need to write float(0) or str('foo').
st.replace(old, new, n) replaces up to n occurrences of old anywhere in the string, not the character at position i. You don't really want that.
You're re-assigning st in the loop, but the length of st may change (10 is two characters), so from index 10 onwards st[i] no longer refers to the character you expect.
I would construct a temporary string instead:
def replaceEvenUntil(s, n):
    result = ''
    for i in range(min(n, len(s))):
        if i % 2 == 0:
            result += str(i)
        else:
            result += s[i]
    return result
Or with enumerate():
def replaceEvenUntil(s, n):
    result = ''
    for i, c in enumerate(s):
        if i <= n and i % 2 == 0:
            result += str(i)
        else:
            result += c
    return result
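The same idea can be written more compactly with str.join. This is a sketch, not the only reasonable reading of the question: it assumes n means "only positions below n are replaced", and the name replace_even_until is made up here. Because every output piece is built from the original index and character, the result never depends on how long the earlier replacements were.

```python
def replace_even_until(s, n):
    """Replace characters at even positions below n with their position number."""
    return ''.join(
        str(i) if i < n and i % 2 == 0 else c
        for i, c in enumerate(s)
    )


print(replace_even_until("abcdefghijklmnop", 100))  # 0b2d4f6h8j10l12n14p
```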