In Solidity, what does | mean in the context of "if"?

I see an if statement in Solidity that looks like this:
// Set a flag if this is an NFI.
if (_isNF)
_type = _type | TYPE_NF_BIT;
What does the | mean? Usually it means "or", but it doesn't make sense to me here...

It is a bitwise OR, one of the bitwise operations. It can be used, for example, to set on/off flags (indicators) that later operations check.

It's a bitwise or. As the comment says, the statement sets a flag. E.g. if _type is (binary) 00100100 and TYPE_NF_BIT is 00000010, the result will be 00100110, i.e. it makes sure that the second bit from the right of _type is set to 1. This way you can store up to 8 boolean values in a byte.

Universe OCONV argument for zero-padding

I'm looking for some argument (ARG) such that this code:
A = 5
B = OCONV(A,'ARG5')
PRINT B
will print to the screen
00005
Anybody know something which will do this for me?
In Universe I would use the MR% conversion code. Just be aware that it will truncate anything longer than 5 characters.
A = 5
B = OCONV(A,'MR%5')
PRINT B
I use this a lot when I need EVAL in a conditional, or as an aggregate function in a SQL or other TCL statement, for example to find the record with the most fields in a file.
SELECT MAX(EVAL "DCOUNT(#RECORD,#FM)") FROM VOC;
SELECT MAX(EVAL "OCONV(DCOUNT(#RECORD,#FM),'MR%8')") FROM VOC;
Masking aside, these generally return 2 different values on our system.
I am using UniData, but looking at the commands reference manual I can't see anything quite right, in terms of one simple argument to OCONV, or similar. I came up with these (somewhat kludgy) alternatives, though:
NUMLEN=5
VALUE=5
PRINT CHANGE(SPACES(NUMLEN-LEN(VALUE))," ","0"):VALUE
Here the SPACES function creates the required number of space characters, and CHANGE then converts them to zeros.
PRINT OCONV(VALUE,"MR":NUMLEN:"(#####)")
This uses OCONV but has to define a mask string so that only the final 5 digits are shown. If NUMLEN changes, the mask string definition has to change too.
PRINT OCONV(VALUE,"MR":NUMLEN)[3,NUMLEN]
This version uses OCONV but prints starting at the 3rd character and shows the next NUMLEN characters, thereby trimming off the initial "0." produced by the "MR" parameter.
PADDED.VALUE = VALUE 'R%5' is the simplest way to do this.
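For readers who want to check the expected results outside UniVerse or UniData, here is a rough Python sketch of zero-padding to a fixed width with truncation. The function name zero_pad is hypothetical, and keeping the rightmost characters when the value is too long is an assumption of this sketch, not something the answers above confirm.

```python
def zero_pad(value, width: int) -> str:
    """Left-pad value with zeros to width characters, truncating if longer.

    Assumes width is a positive integer.
    """
    text = str(value)
    # Pad on the left with zeros, then keep only the last `width` characters.
    # Keeping the rightmost characters on overflow is an assumption.
    return text.rjust(width, "0")[-width:]


print(zero_pad(5, 5))        # 00005
print(zero_pad(1234567, 5))  # 34567
```

The second call shows the truncation caveat mentioned for MR%5: a value longer than the mask does not survive intact.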

How to find strings beginning with X?

I am trying to identify strings that begin with X using the function regexm() in Stata.
My code:
for var lookin: count if regexm(X, "X")
I have tried using double quotes, square brackets, adding the options for the other characters in the string X[0-9][0-9] etc. but to no avail.
I expect the resultant number to be about 1000, but it returns 0.
The following works for me:
clear
input str22 foo
"Xhello"
"this is a X sentence"
"X a silly one"
"but serves the purpose"
end
generate tag = strmatch(foo, "X*")
list
     +------------------------------+
     | foo                      tag |
     |------------------------------|
  1. | Xhello                     1 |
  2. | this is a X sentence       0 |
  3. | X a silly one              1 |
  4. | but serves the purpose     0 |
     +------------------------------+
count if tag
2
This is the regular expression solution based on the above example:
generate tag = regexm(foo, "^X")
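The difference between an anchored and an unanchored pattern can be checked with a short Python sketch on the same four strings. This uses Python's re module rather than Stata's regex engine, so it only illustrates the anchoring idea.

```python
import re

foo = [
    "Xhello",
    "this is a X sentence",
    "X a silly one",
    "but serves the purpose",
]

# "^X" only matches when X is the first character of the string.
anchored = [int(re.search(r"^X", s) is not None) for s in foo]
# "X" matches anywhere in the string, so the second row is tagged too.
unanchored = [int(re.search(r"X", s) is not None) for s in foo]
# A plain prefix test gives the same answer as the anchored pattern.
prefix = [int(s.startswith("X")) for s in foo]

print(anchored)    # [1, 0, 1, 0]
print(unanchored)  # [1, 1, 1, 0]
print(sum(anchored))  # 2
```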
for in Stata is ancient and now undocumented syntax. If you are using a very old version of Stata, it would be better to say so.
X is the default loop element which is substituted everywhere it is found.
Hence your syntax -- looping over a single variable -- reduces to
count if regexm(lookin, "lookin")
and even without a data example we can believe that the answer is 0.
This would be legal and is closer to what you seek:
for Y in var lookin : count if regexm(Y, "X")
but the regular expression is wrong, as @Pearly Spencer points out.
Incidentally,
count if strpos(lookin, "X") == 1
is a direct alternative to your code.
In any Stata that supports regexm() you should be looping with foreach or forvalues.
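The substitution described above can be mimicked with a plain text replacement. This is only a sketch: expand_for is a hypothetical helper, and it assumes the loop element is replaced as literal text everywhere it occurs, as the answer describes. Real Stata parsing may differ in detail.

```python
def expand_for(command: str, element: str, placeholder: str = "X") -> str:
    """Replace every occurrence of the placeholder with the loop element."""
    return command.replace(placeholder, element)


# With the default placeholder X, the search pattern "X" is rewritten as well.
print(expand_for('count if regexm(X, "X")', "lookin"))
# count if regexm(lookin, "lookin")

# Naming a different placeholder leaves the pattern untouched.
print(expand_for('count if regexm(Y, "^X")', "lookin", placeholder="Y"))
# count if regexm(lookin, "^X")
```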

SAS converting character to numeric in data step

I cannot figure out why SAS is converting some character variables to numeric in a data step. I've confirmed that each variable is a character variable with a length of 1 using PROC CONTENTS. Below is the code that is generating this problem. I've not found any answers through google searches that make sense for this issue.
data graddist.graddist11;
set graddist.graddist10;
if (ACCT2301_FA16 | BIOL1106_FA16 | BIOL1107_FA16 | BIOL1306_FA16 |
BIOL2101_FA16 | BIOL2301_FA16 | CHEM1111_FA16 | CHEM1305_FA16 | CHEM1311_FA16 |
ECON2301_FA16 | ENGL1301_FA16 | ENGL1302_FA16 | ENGR1201_FA16 | GEOG1313_FA16 |
HIST1301_FA16 | HIST1302_FA16 | MARK3311_FA16 | MATH1314_FA16 | MATH2413_FA16 |
MATH2414_FA16 | PHIL2306_FA16 | POLS2305_FA16 | POLS2306_FA16 |
PSYC1301_FA16 | PSYC2320_FA16) in ('A','B','C','D','F','W','Q','I') then FA16courses_b=1;
else FA16courses_b=0;
Thank you,
Brian
Unfortunately the in operator does not work like that in SAS. It only compares a single value on the left to a list of values on the right. Your code evaluates everything to the left of the in operator first and returns a TRUE/FALSE value (in SAS this is a numeric value where 0 or missing=FALSE and anything else=TRUE). It is then effectively saying:
if (0) in ('A'....'I') then...
or
if (1) in ('A'....'I') then...
You would need to rewrite your equivalence test in some other way such as :
if ACCT2301_FA16 in ('A'....'I')
or BIOL1106_FA16 in ('A'....'I')
or ...
As Robert notes, you can't compare multiple variables to multiple values at once with one equals query.
The way to do this with a long string is with FINDC. Concatenate everything into a string, concatenate your finds into a string, and compare; FINDC looks for any char in (charlist).
data graddist10;
length ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16 $1;
input
ACCT2301_FA16 $
BIOL1106_FA16 $
BIOL1107_FA16 $
;
datalines;
X Y Z
A A B
B A B
. . .
A X .
X . B
W Y A
;;;;
run;
data graddist11;
set graddist10;
if findc(catx('|',of ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16),
('ABCDFWQI')) then FA16courses_b=1;
else FA16courses_b=0;
run;
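The FINDC approach above amounts to asking whether any of the variables holds one of the listed letters. A small Python sketch of that logic on the same sample rows follows. Missing values are represented as empty strings, which is an assumption about how the periods in the datalines are read, and any_grade_in_list is a hypothetical name.

```python
VALID_GRADES = set("ABCDFWQI")

# Sample rows from the datalines above; "" stands in for a missing value.
rows = [
    ("X", "Y", "Z"),
    ("A", "A", "B"),
    ("B", "A", "B"),
    ("", "", ""),
    ("A", "X", ""),
    ("X", "", "B"),
    ("W", "Y", "A"),
]


def any_grade_in_list(values) -> int:
    """Return 1 if at least one value is a listed grade, else 0."""
    return int(any(v in VALID_GRADES for v in values))


flags = [any_grade_in_list(r) for r in rows]
print(flags)  # [0, 1, 1, 0, 1, 1, 1]
```

Each variable is tested against the list separately, which is the same idea as the chain of "variable in (...) or variable in (...)" tests suggested earlier.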
You made two syntax mistakes but they cancelled each other out and SAS considered your statement valid. First the IN () operator can only take one value on the left. Second the | token means logical or and is not used to separate items in a list.
The use of | between the variable names caused SAS to reduce the left hand side to a single value and so your use of IN was then syntactically correct. That is also why SAS noted that it had converted your character variables to numeric values so that they could be evaluated as boolean logic.
It is not clear what logic you meant for testing of those multiple values.
You might be able to use the verify() function. Perhaps something like this, but watch out for when all of the values are missing.
FA16courses_b = 0 ^= verify(cats(of ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16),'ABCDFQWI');
But it is probably easier to loop over the variables and test them one by one. So for example you could set the target variable to zero and then conditionally set it to one when one of the variables meets the condition.
FA16courses_b=0;
array c ACCT2301_FA16 BIOL1106_FA16 BIOL1107_FA16 ;
do i=1 to dim(c) while (FA16courses_b=0);
if not indexc(c(i),'ABCDFQWI') then FA16courses_b=1;
end;

Is there a way to usefully index a text column containing regex patterns?

I'm using PostgreSQL, currently version 9.2 but I'm open to upgrading.
In one of my tables, I have a column of type text that stores regex patterns.
CREATE TABLE foo (
id serial,
pattern text,
PRIMARY KEY(id)
);
CREATE INDEX foo_pattern_idx ON foo(pattern);
Then I do queries on it like this:
INSERT INTO foo (pattern) VALUES ('^abc.*$');
SELECT * FROM foo WHERE 'abc literal string' ~ pattern;
I understand that this is sort of a reverse LIKE or reverse pattern match. If it was the other, more common way, if my haystack was in the database, and my needle was anchored, I could use a btree index more or less effectively depending on the exact search pattern and data.
But the data that I have is a table of patterns and other data associated with the patterns. I need to ask the database which rows have patterns that match my query text. Is there a way to make this more efficient than a sequential scan that checks every row in my table?
There is no way.
Indexes require IMMUTABLE expressions. The result of your expression depends on the input string. I don't see any other way than to evaluate the expression for every row, meaning a sequential scan.
Related answer with more details for the IMMUTABLE angle:
Does PostgreSQL support "accent insensitive" collations?
Just that there is no workaround for your case, which is impossible to index. The index needs to store constant values in its tuples, which is just not available because the resulting value for every row is computed based on the input. And you cannot transform the input without looking at the column value.
Postgres index usage is bound to operators and only indexes on expressions left of the operator can be used (due to the same logical constraints). More:
Can PostgreSQL index array columns?
Many operators define a COMMUTATOR which allows the query planner / optimizer to flip the indexed expression to the left. Simple example: the commutator of = is =, and the commutator of > is < and vice versa. The documentation:
the index-scan machinery expects to see the indexed column on the left of the operator it is given.
The regular expression match operator ~ has no commutator, again, because that's not possible. See for yourself:
SELECT oprname, oprright::regtype, oprleft::regtype, oprcom
FROM pg_operator
WHERE oprname = '~'
AND 'text'::regtype IN (oprright, oprleft);
 oprname | oprright |  oprleft  | oprcom
---------+----------+-----------+--------
 ~       | text     | name      | 0
 ~       | text     | text      | 0
 ~       | text     | character | 0
 ~       | text     | citext    | 0
And consult the manual here:
oprcom ... Commutator of this operator, if any
...
Unused columns contain zeroes. For example, oprleft is zero for a prefix operator.
I have tried before and had to accept it's impossible in principle.
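To make the sequential-scan point concrete, here is a rough Python model of what the query has to do: apply every stored pattern to the input text, one row at a time. Rows 2 and 3 are hypothetical additions for illustration, and Python's re syntax is not identical to PostgreSQL's regular expressions, so treat this as a sketch of the access pattern only.

```python
import re

# (id, pattern) pairs standing in for table foo; rows 2 and 3 are hypothetical.
foo = [
    (1, r"^abc.*$"),
    (2, r"^xyz"),
    (3, r"literal"),
]


def matching_rows(text: str, rows):
    """Return the rows whose pattern matches text, testing every row in turn."""
    return [(row_id, pattern) for row_id, pattern in rows
            if re.search(pattern, text)]


print(matching_rows("abc literal string", foo))
# [(1, '^abc.*$'), (3, 'literal')]
```

Nothing about a pattern can be precomputed without knowing the input text, which is why no stored index value helps here.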

Using re2c with ISO-8859-x

We have some text in ISO-8859-15 that we want to tokenize. (ISO-8859-15 is ISO-8859-1 with the Euro sign and other common accented characters; for more details see ISO-8859-15.)
I am trying to get the parser to recognize all the characters. The native character representation of the text editors I'm using is UTF-8, so to avoid hidden conversion problems, I'm restricting all re2c code to ASCII e.g.
LATIN_CAPITAL_LETTER_A_WITH_GRAVE = "\xc0" ;
LATIN_CAPITAL_LETTER_A_WITH_ACUTE = "\xc1" ;
LATIN_CAPITAL_LETTER_A_WITH_CIRCUMFLEX = "\xc2" ;
LATIN_CAPITAL_LETTER_A_WITH_TILDE = "\xc3" ;
...
Then:
UPPER = [A-Z] | LATIN_CAPITAL_LETTER_A_WITH_GRAVE
| LATIN_CAPITAL_LETTER_A_WITH_CIRCUMFLEX
| LATIN_CAPITAL_LETTER_AE
| LATIN_CAPITAL_LETTER_C_WITH_CEDILLA
| ...
WORD = UPPER LOWER* | LOWER+ ;
It compiles no problem and runs great on ASCII, but stalls whenever it hits these extended characters.
Has anyone seen this, and is there a way to fix it?
Thank you,
Yimin
Yes, I've seen it. It has to do with the comparison of signed vs. unsigned types for bytes ≥ 128.
There are two ways to fix it: use unsigned char as your default type, e.g. re2c:define:YYCTYPE = "unsigned char";, or pass -funsigned-char as a compile flag (if using gcc; other compilers have an equivalent). Use whichever interferes least with your existing code.
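The signed/unsigned mismatch can be seen by reinterpreting one byte both ways. The Python sketch below uses the struct module to mimic how a C signed char and unsigned char would read the byte 0xC0. It only models the conversion and says nothing about re2c's generated code.

```python
import struct

# LATIN CAPITAL LETTER A WITH GRAVE is the single byte 0xC0 in ISO-8859-15.
raw = "\u00c0".encode("iso-8859-15")

as_signed = struct.unpack("b", raw)[0]    # read like a C signed char
as_unsigned = struct.unpack("B", raw)[0]  # read like a C unsigned char

print(as_signed)    # -64
print(as_unsigned)  # 192
# Only the unsigned reading compares equal to the value 0xC0 used in the rules.
print(as_signed == 0xC0, as_unsigned == 0xC0)  # False True
```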