nul bytes in regexp MATLAB - regex

Can someone explain what MATLAB is doing with nul bytes (x00) in regular expressions?
Examples:
>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
1 % current
4 % expected
>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
4 % current
4 % expected
>> regexp(char([0 0 0 1 0 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
[] % current
[] % expected
>> regexp(char([0 0 0 0 10 0 0 1 0 0 10 0 0 0]),char([0 0 0 0 46 0 0 10]))
ans =
1 % current
[] % expected
>> regexp(char([0 0 0 0 0 0 0 1 0 0 10 0 0 0]),char([1 0 0 0 46 0 0 10]))
ans =
[] % current
[] % expected
The answer might simply be that MATLAB regular expressions aren't meant to handle non-printable characters, but I would expect an error if that were the case.
EDIT: The 46 is expected to be '.' as in the regex wildcard.
EDIT2:
>> regexp(char([0 0 0 0 50 0 0 100 0 0 90 0 0 0]),char([0 0 46 0 0 90]))
ans =
1 9
I realized that 10 might be treated as a special character, so this example contains only printable characters and nul bytes. I would expect it to match only at 9, because the fifth character, 50, does not match 0.

This bug has probably been fixed already. I tested your example from MATLAB Central in several versions:
in R2013b:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
2
in R2015a:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
2
in R2016a:
>> regexp(char([0 0 1 0 41 41 41 41 41 41]),char([0 '.' 0 40 40 40 40]))
ans =
[]


matrix filling, but it is skipping an element each row, why? [closed]

I have these vectors as follows
vector< vector< Arc * > * > _adjacences;
vector <int> list[_adjacences.size()];
vector <int> listD[_adjacences.size()];
vector < vector <int> > matrix( _adjacences.size(), vector<int>(_adjacences.size(),0 ));
vector < vector <int> > shortPath( _adjacences.size(), vector<int>(_adjacences.size(),0 ));
I want to convert the adjacency list into an adjacency matrix.
Arc contains these:
int sommetArrive;
int longueur;
string nom;
I tried to build two vectors, one for the distances and the other for the destination vertices:
for (unsigned i = 0; i < _adjacences.size(); i++){
    for (auto j : *_adjacences[i]){
        list[i].push_back(j->sommetArrive);
        listD[i].push_back(j->longueur);
    }
}
Then I build the adjacency matrix. This is where I am doing things wrong:
for (int i = 0; i < _adjacences.size(); i++) {
    for (auto j : list[i]){
        for (auto k : listD[i]){
            matrix[i][j] = k;
        }
    }
}
Instead of getting this:
0 0 0 0 0 0 0 0 0 0 0 0 0 120 0 62
0 0 253 0 0 0 0 0 0 0 204 0 0 0 0 0
0 53 0 12 0 0 105 0 0 0 0 0 0 0 0 0
0 0 15 0 38 0 0 108 0 0 0 0 0 0 0 0
0 0 0 93 0 123 0 0 113 0 0 0 0 0 0 0
0 0 0 0 158 0 0 0 0 118 0 0 0 0 0 0
0 0 97 0 0 0 0 17 0 0 0 87 0 0 0 0
0 0 0 103 0 0 3 0 53 0 0 0 73 0 0 0
0 0 0 0 153 0 0 33 0 113 0 0 0 0 0 0
0 0 0 0 0 55 0 0 91 0 0 0 0 72 0 0
I end up with this
0 0 0 0 0 0 0 0 0 0 0 0 0 62 0 62
0 0 204 0 0 0 0 0 0 0 204 0 0 0 0 0
0 105 0 105 0 0 105 0 0 0 0 0 0 0 0 0
0 0 108 0 108 0 0 108 0 0 0 0 0 0 0 0
0 0 0 113 0 113 0 0 113 0 0 0 0 0 0 0
0 0 0 0 118 0 0 0 0 118 0 0 0 0 0 0
0 0 87 0 0 0 0 87 0 0 0 87 0 0 0 0
0 0 0 73 0 0 73 0 73 0 0 0 73 0 0 0
0 0 0 0 113 0 0 113 0 113 0 0 0 0 0 0
0 0 0 0 0 72 0 0 72 0 0 0 0 72 0 0
The same number is repeated across each row instead of moving on to the next element k.
What did I do wrong?
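Notice that for every j you iterate over the same complete list listD[i]. The inner loop:
for (auto k : listD[i]){
    matrix[i][j] = k;
}
assigns every value in listD[i], one by one, to matrix[i][j], so only the final value remains. This is the same as writing matrix[i][j] = listD[i].back().
You probably need to replace both inner loops with a single loop over a shared index k, assigning matrix[i][list[i][k]] = listD[i][k]. Note that j holds a destination vertex, not a position in listD[i], so listD[i][j] would pick the wrong element or read out of range.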
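A minimal sketch of pairing each destination with its own distance, assuming the two lists for a row hold matching entries at the same position. The function name build_matrix and its parameters are hypothetical; plain vectors of vectors stand in for list and listD from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: dest[i][k] is the k-th neighbour of vertex i and
// dist[i][k] is the length of that same arc (parallel lists, as in the question).
std::vector<std::vector<int>> build_matrix(
    const std::vector<std::vector<int>>& dest,
    const std::vector<std::vector<int>>& dist,
    std::size_t columns)
{
    std::vector<std::vector<int>> matrix(dest.size(), std::vector<int>(columns, 0));
    for (std::size_t i = 0; i < dest.size(); ++i) {
        // One loop over a shared index k, not a nested loop over both lists.
        for (std::size_t k = 0; k < dest[i].size(); ++k) {
            matrix[i][dest[i][k]] = dist[i][k];
        }
    }
    return matrix;
}
```

Calling build_matrix(dest, dist, 16) with dest[0] = {13, 15} and dist[0] = {120, 62} fills row 0 with 120 in column 13 and 62 in column 15, which matches the first row of the expected output.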

Create new variable based on conditions across multiple variables in SAS

I would like to create a new variable "type" based on conditions being true across multiple variables, but I have too many variables (~100) to type. I am using SAS Studio v 9.4.
My data is set up similar to this:
DATA have;
INPUT id
a_var_a a_var_b a_var_c a_var_d a_var_e
b_var_a b_var_b b_var_c b_var_d
c_var_a c_var_b c_var_c c_var_d;
DATALINES;
01 1 0 0 0 0 0 0 0 0 0 0 0 0
02 0 1 0 0 0 0 0 0 0 0 0 0 0
03 0 0 1 0 0 0 0 0 0 0 0 0 0
04 0 0 0 1 0 0 0 0 0 0 0 0 0
05 0 0 0 0 1 0 0 0 0 0 0 0 0
06 0 0 0 0 0 1 0 0 0 0 0 0 0
07 0 0 0 0 0 0 1 0 0 0 0 0 0
08 0 0 0 0 0 0 0 1 0 0 0 0 0
09 0 0 0 0 0 0 0 0 1 0 0 0 0
10 0 0 0 0 0 0 0 0 0 1 0 0 0
11 0 0 0 0 0 0 0 0 0 0 1 0 0
12 0 0 0 0 0 0 0 0 0 0 0 1 0
13 0 0 0 0 0 0 0 0 0 0 0 0 1
;
Run;
"type" is coded as:
1 If any of the group a vars (a_var:) are equal to 1
2 If any of the group b vars (b_var:) are equal to 1
3 If any of the group c vars (c_var:) are equal to 1
else equal to 0
I thought it would be as simple as:
Data want;
Set have;
If a_var: = 1 then type = 1;
Else If b_var: = 1 then type = 2;
Else If c_var: = 1 then type = 3;
Else type = 0;
Run;
However I keep getting an error code because I am not allowed to group the variables.
I tried doing the same thing with an array but I am still unable to arrive at a solution:
Data want;
Set have;
Array a (*) a_var:;
Array other (2,4) b_var: c_var:;
do i = 1 to dim(a);
If a(i) = 1 then type=1;
end;
do i = 1 to 4;
If other (1,i) = 1 then type=2;
If other (2,i) = 1 then type=3;
Else type=0;
end;
drop i;
Run;
I am trying to create 4 categories of the "type" variable (0, 1, 2, and 3) based on which conditions are met.
Thank you!
This is the code that eventually worked.
DATA have;
INPUT id
a_var_a a_var_b a_var_c a_var_d a_var_e
b_var_a b_var_b b_var_c b_var_d
c_var_a c_var_b c_var_c c_var_d;
if whichn (1, of a_var: ) =>1 then type=1;
else if whichn (1, of b_var: ) =>1 then type=2;
else if whichn(1, of c_var:) =>1 then type=3;
else type = 0;
DATALINES;
01 1 0 0 0 0 0 0 0 0 0 0 0 0
02 0 1 0 0 0 0 0 0 0 0 0 0 0
03 0 0 1 0 0 0 0 0 0 0 0 0 0
04 0 0 0 1 0 0 0 0 0 0 0 0 0
05 0 0 0 0 1 0 0 0 0 0 0 0 0
06 0 0 0 0 0 1 0 0 0 0 0 0 0
07 0 0 0 0 0 0 1 0 0 0 0 0 0
08 0 0 0 0 0 0 0 1 0 0 0 0 0
09 0 0 0 0 0 0 0 0 1 0 0 0 0
10 0 0 0 0 0 0 0 0 0 1 0 0 0
11 0 0 0 0 0 0 0 0 0 0 1 0 0
12 0 0 0 0 0 0 0 0 0 0 0 1 0
13 0 0 0 0 0 0 0 0 0 0 0 0 1
14 0 0 0 0 0 0 0 0 0 0 0 0 0
;
Run;
I don't think the prefix: shortcut can be used for something like this.
Instead I suggest you use macros to generate the code you need based on DICTIONARY.COLUMNS (see data set column names into macro variable(s) for an example).
You can generate conditions like a_var_a=1 or a_var_b=1 or a_var_c=1 or a_var_d=1 or a_var_e=1 using something like this (untested):
/* preferably enclose this in a macro and declare the macrovariable as %local mvGroupAIsSet; */
proc sql noprint;
select cats(name, '=1') into :mvGroupAIsSet separated by ' or '
from dictionary.columns
where name like 'a_var_%' /* don't remember if you need to escape the underscores */
and libname = 'WORK'
and memname = 'HAVE';
quit;
Then use this in your DATA step:
data want;
set have;
if &mvGroupAIsSet then type = 1;
/* etc */
run;

awk if-greater-than and replacement under condition

I have following data
......
6 4 4 17 154 93 309 0 11930
7 3 2 233 311 0 11936 11932 111874
8 3 1 15 0 11938 11943 211004 11449
9 3 2 55 102 0 11932 11941 111883
10 3 2 197 231 0 11925 11921 111849
11 3 2 160 777 0 11934 11928 111875
......
I want to replace any value greater than 5000 with 0 in columns 4 through 9. How can I do this with awk?
To print with lots of spaces like the input, something like this:
awk '{for(i=4;i<=NF;i++)if($i>5000)$i=0; for(i=1;i<=NF;i++)printf "%7d",$i;printf"\n"}' file
Output
6 4 4 17 154 93 309 0 0
7 3 2 233 311 0 0 0 0
8 3 1 15 0 0 0 0 0
9 3 2 55 102 0 0 0 0
10 3 2 197 231 0 0 0 0
11 3 2 160 777 0 0 0 0
For scrunched up together (TM) output, you can use this:
awk '{for(i=4;i<=NF;i++)if($i>5000)$i=0}1' file
6 4 4 17 154 93 309 0 0
7 3 2 233 311 0 0 0 0
8 3 1 15 0 0 0 0 0
9 3 2 55 102 0 0 0 0
10 3 2 197 231 0 0 0 0
11 3 2 160 777 0 0 0 0
An alternative approach (requires gawk4+):
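{
    patsplit($0, a, "[0-9]+", s)
    printf "%s", s[0]
    for (i = 1; i <= length(a); i++) {
        # columns 4 and up, as asked in the question
        if (i >= 4 && a[i] > 5000) {
            l = length(a[i])   # remember the width so the 0 stays aligned
            a[i] = 0
        }
        else l = 0
        printf "%" l "s%s", a[i], s[i]
    }
    printf "\n"
}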
It is more flexible when the spacing varies, unlike in the example data. It might also be faster than the accepted answer when the number of fields is much larger than 9.

read data from file to an array

I have the following data written to a file. I want to skip all the zeros at the beginning and store the values starting from 181 in an array, one number per element, so I can use them easily.
I know how to put data in an array, but how can I skip all these zeros?
0 177 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 181 98 1 2 28 0 104 93 165 3 7 110 239 5 172 164 176 29 56 147 4 0 234 215 3 0 166 6 0 0 78 5 0 0 164 145 181 98 1 2 28 0 80 97 165 3 7 110 239 5 172 164 176 29 56 147 4 0 234 215 3 0 169 6 0 0 78 5 0 0 147 117 181 98 1 2 28 0 56 101 165 3 7 110 239 5 172 164 176 29 56 147 4 0 234 215 3 0 173 6 0 0 81 5 0 0 134 109 181 98 1 2 28 0 32 105 165 3 7 110 239 5 172 164 176 29 56 147 4 0 234 215 3 0 181 6 0 0 85 5 0 0 126 137 181 98 1 2 28 0 8 109 165 3 7 110 239 5 172 164 176 29 56 147 4 0 234 215 3 0 182 6 0 0 87 5 0 0 109 101
I am not sure I understand your question, so I will post multiple answers. Choose the one that fits with your problem's description.
Case 1: ignore everything before (or before and including) 181:
#include <iostream>
#include <fstream>
#include <vector>
int main() {
    std::ifstream in("input.txt");
    std::vector<int> vec;
    int reached_181 = 0, x;
    while(in >> x) {
        if(x == 181) reached_181 = 1;
        if(reached_181) vec.push_back(x);
        // if you also want to neglect 181 then just change the order of the two commands
        // if(reached_181) vec.push_back(x);
        // if(x == 181) reached_181 = 1;
    }
    for(std::vector<int>::size_type i=0; i<vec.size(); ++i) {
        std::cout << vec[i] << " ";
    }
    return 0;
}
Case 2: ignore every zero before 181
#include <iostream>
#include <fstream>
#include <vector>
int main() {
    std::ifstream in("input.txt");
    std::vector<int> vec;
    int reached_181 = 0, x;
    while(in >> x) {
        if(x == 181) reached_181 = 1;
        if(reached_181 || x) vec.push_back(x);
    }
    for(std::vector<int>::size_type i=0; i<vec.size(); ++i) {
        std::cout << vec[i] << " ";
    }
    return 0;
}
Case 3: ignore all the zeroes in the input file
#include <iostream>
#include <fstream>
#include <vector>
int main() {
    std::ifstream in("input.txt");
    std::vector<int> vec;
    int x;
    while(in >> x) {
        if(x) vec.push_back(x);
    }
    for(std::vector<int>::size_type i=0; i<vec.size(); ++i) {
        std::cout << vec[i] << " ";
    }
    return 0;
}
Try this:
#include <fstream>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> v;
    std::ifstream in("out.txt"); // name of your file
    bool hit = false;            // becomes true once the first 181 is read
    for (int n; in >> n;)
    {
        if (n == 181)
            hit = true;
        // before 181: keep only non-zero values; from 181 on: keep everything
        if (hit || n)
            v.push_back(n);
    }
    typedef std::vector<int>::const_iterator iter_type;
    for (iter_type it = v.begin(); it != v.end(); ++it)
        std::cout << *it << std::endl;
}

Minimal double value in C below which the computer starts treating it as zero

Sorry for my English. Can you tell me the smallest double value below which the computer considers the number to be equal to zero?
Actual zero is zero, but a result can become zero in different ways. A double has a value range of roughly +/-10^+/-308. A nonzero result whose magnitude is well below the smallest representable positive value is rounded to zero. With #include <limits>, std::numeric_limits<double>::denorm_min() gives the smallest positive value that can be represented in a double.
But you can get "the effect of zero" in other ways. Say you have a fairly large number, 10 million, and you add to it (or subtract from it) a very small number, say 10^-10. The addition has no effect, because the small number falls below the 53 bits of mantissa that a double holds, so the result is the same as adding zero. (Near 10 million, adjacent doubles are about 1.9 * 10^-9 apart, so 1/10 million would still change the result in a double, although not in a float.) In other words, even if a number is not zero, adding it to another number will not always change that number.
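A small sketch of this absorption effect, assuming IEEE-754 doubles with the default round-to-nearest mode. The helper name is_absorbed is hypothetical. Near 10 million the gap between adjacent doubles is about 1.9e-9, so an addend below roughly half that gap disappears.

```cpp
// Hypothetical helper: true when adding `tiny` leaves `big` unchanged.
bool is_absorbed(double big, double tiny)
{
    // volatile forces the sum to be rounded to a 64-bit double before comparing,
    // even on platforms that evaluate expressions in extended precision.
    volatile double sum = big + tiny;
    return sum == big;
}
```

For example, is_absorbed(1e7, 1e-10) is true while is_absorbed(1e7, 1e-7) is false, even though neither addend is zero.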
See IEEE-754 on Wikipedia (other floating point formats do exist, but they are unusual).
You could try:
#include <limits>
std::numeric_limits<double>::denorm_min();
See the documentation for denormal (aka subnormal) numbers.
If this number is divided by, e.g., 2, the result is 0.
To check these values on a specific platform, the following code can be used:
#include <iostream>
#include <limits>
using std::cout;
using std::endl;
int main() {
    typedef double real;
    union dbl {
        real d;
        unsigned char c[sizeof(d)];
        dbl(const dbl &n = 0.0) : d(n.d) {}
        dbl(double n) : d(n) {}
        void pr(const char *txt = 0) const {
            if (txt) cout << txt << ": ";
            cout << d << ":";
            for (int i = sizeof(d) -1; i >= 0; --i)
                cout << std::hex << " " << (int)c[i];
            cout << endl;
        }
    };
    dbl n = 1.0;
    for (; n.d > 0.0; n.d /= 2.0)
        n.pr();
    n.pr("zero");
    n.d = std::numeric_limits<real>::min();
    n.pr("min");
    n.d = std::numeric_limits<real>::denorm_min();
    n.pr("denorm_min");
}
Output on 32 bit linux (intel cpu) (doc about double format):
1: 3f f0 0 0 0 0 0 0
0.5: 3f e0 0 0 0 0 0 0
0.25: 3f d0 0 0 0 0 0 0
0.125: 3f c0 0 0 0 0 0 0
0.0625: 3f b0 0 0 0 0 0 0
...
8.9003e-308: 0 30 0 0 0 0 0 0
4.45015e-308: 0 20 0 0 0 0 0 0
2.22507e-308: 0 10 0 0 0 0 0 0
1.11254e-308: 0 8 0 0 0 0 0 0
5.56268e-309: 0 4 0 0 0 0 0 0
...
7.90505e-323: 0 0 0 0 0 0 0 10
3.95253e-323: 0 0 0 0 0 0 0 8
1.97626e-323: 0 0 0 0 0 0 0 4
9.88131e-324: 0 0 0 0 0 0 0 2
4.94066e-324: 0 0 0 0 0 0 0 1
zero: 0: 0 0 0 0 0 0 0 0
min: 2.22507e-308: 0 10 0 0 0 0 0 0
denorm_min: 4.94066e-324: 0 0 0 0 0 0 0 1
If real is defined as long double the output is:
1: 0 0 3f ff 80 0 0 0 0 0 0 0
0.5: 0 0 3f fe 80 0 0 0 0 0 0 0
0.25: 0 0 3f fd 80 0 0 0 0 0 0 0
0.125: 0 0 3f fc 80 0 0 0 0 0 0 0
0.0625: 0 0 3f fb 80 0 0 0 0 0 0 0
...
5.83232e-4950: 0 0 0 0 0 0 0 0 0 0 0 10
2.91616e-4950: 0 0 0 0 0 0 0 0 0 0 0 8
1.45808e-4950: 0 0 0 0 0 0 0 0 0 0 0 4
7.2904e-4951: 0 0 0 0 0 0 0 0 0 0 0 2
3.6452e-4951: 0 0 0 0 0 0 0 0 0 0 0 1
zero: 0: 0 0 0 0 0 0 0 0 0 0 0 0
min: 3.3621e-4932: 0 0 0 1 80 0 0 0 0 0 0 0
denorm_min: 3.6452e-4951: 0 0 0 0 0 0 0 0 0 0 0 1
Or for float:
1: 3f 80 0 0
0.5: 3f 0 0 0
0.25: 3e 80 0 0
0.125: 3e 0 0 0
0.0625: 3d 80 0 0
...
2.24208e-44: 0 0 0 10
1.12104e-44: 0 0 0 8
5.60519e-45: 0 0 0 4
2.8026e-45: 0 0 0 2
1.4013e-45: 0 0 0 1
zero: 0: 0 0 0 0
min: 1.17549e-38: 0 80 0 0
denorm_min: 1.4013e-45: 0 0 0 1
In the IEEE 754 single-precision (32-bit) and double-precision (64-bit) formats:
The smallest positive normal value of double is 0x1.0p-1022 (2.2250738585072014E-308).
The smallest positive denormal value of double is 0x0.0000000000001P-1022 (about 4.9e-324).
The smallest positive normal value of float is 0x1.0p-126f (1.17549435E-38f).
The smallest positive denormal value of float is 0x0.000002P-126f (about 1.4e-45f).
Positive numbers smaller than above may result in 0, depending on the rounding-mode as Marc Glisse commented.
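These limits can be cross-checked in code. The sketch below assumes an IEEE 754 double, the default round-to-nearest mode, and no flush-to-zero setting. The function names are hypothetical. It builds the two double values from their exponents with std::ldexp and checks what happens when the smallest one is halved.

```cpp
#include <cmath>
#include <limits>

// Smallest positive normal double, built from its exponent: 2^-1022.
double smallest_normal() { return std::ldexp(1.0, -1022); }

// Smallest positive subnormal (denormal) double: 2^-1074.
double smallest_subnormal() { return std::ldexp(1.0, -1074); }

// Halving the smallest subnormal underflows to zero under round-to-nearest.
bool half_of_smallest_is_zero()
{
    volatile double half = std::numeric_limits<double>::denorm_min() / 2.0;
    return half == 0.0;
}
```

On such a platform smallest_normal() compares equal to std::numeric_limits<double>::min() and smallest_subnormal() to std::numeric_limits<double>::denorm_min().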
When you compare a double value that has been calculated, you should never check for exact equality. Instead, check whether it lies within a range around the expected value. Otherwise there is a strong possibility that a comparison you expect to be true is not.
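One common way to write such a range check is to combine an absolute and a relative tolerance. This is only a sketch: the name nearly_equal and both default tolerances are assumptions chosen for illustration, and suitable values depend on the computation.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most abs_tol, or by at
// most rel_tol times the larger magnitude. Both default tolerances are
// illustrative and should be chosen for the computation at hand.
bool nearly_equal(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12)
{
    const double diff = std::fabs(a - b);
    if (diff <= abs_tol) return true;  // handles values close to zero
    return diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

With this helper, nearly_equal(0.1 + 0.2, 0.3) is true, whereas the exact comparison 0.1 + 0.2 == 0.3 is false with IEEE 754 doubles.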
This is possibly a duplicate of this question.