not sure about my bitmask logic - bit-manipulation

I have objects, which I only want to display to the visitor based on different criteria.
The object has a bitmask and I have defined the following conditions:
const FLAG_ALWAYS = 0; // always show this item
const FLAG_LOGIN = 1; // only display to logged in users
const FLAG_NOTLOGIN = 2; // only display to users not logged in
const FLAG_OTHER = 4; // other criteria
const FLAG_NORTH = 8; // GeoIP
const FLAG_SOUTH = 16;
Combinations of flags are possible of course like 1+4+16 or 2+4.
An item can be displayed under 3 conditions for login for example: logged in, not logged in, or both. Therefore I need FLAG_NOTLOGIN.
I'm confused by the FLAG_ALWAYS... should it be 0, or should it cover all other flags like 4095 ?
Or should I remove FLAG_NOTLOGIN ?

The answer depends on how you combine criteria. There are two simple cases.
Any match (OR combination): every flag you set adds matches; more flags, more matches.
In this case all flags reset (0x0000) will never match, because no criteria are met.
All flags set (0xFFFF) will cause the most matches. If you have complementary criteria (exactly one of them is always set), the filter will always match.
Match is implemented like this: 0!=(filter & criteria)
where filter is the set of criteria to filter on and criteria is the same set of flags, set according to the conditions that currently hold.
All match (AND combination): every flag you set filters out some matches; more flags, fewer matches.
In this case all flags reset will always match.
All flags set will cause the fewest matches. If you have mutually exclusive criteria (when one is set, the others are reset), then setting all of them will cause no matches.
E.g. your flags FLAG_LOGIN and FLAG_NOTLOGIN: a user is either logged in or not, so BOTH criteria will never be met, and FLAG_LOGIN+FLAG_NOTLOGIN will never match, but 0 will match in any case, as no criteria are set.
Match is implemented using this formula: 0==(filter & ~criteria), where filter and criteria are the same as above: the match fails only if some flag required by filter is missing from criteria. (The equivalent form filter == (filter & criteria) also works. Watch the operand order: criteria == (filter & criteria) looks similar but tests the opposite inclusion and is wrong.)
If your filter is 1 | 4 and the object has 1 | 8, then the first case will match (criterion 1 is met, and one is sufficient) and the second case will not (criterion 4 is not met, but you need both 1 and 4).
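A minimal Python sketch of the two combination rules, assuming `required` is the set of flags a filter demands and `satisfied` is the set of criteria that actually hold (both names are illustrative, not from the question). The AND rule is written as `required & ~satisfied == 0`, i.e. no required flag may be missing.

```python
FLAG_LOGIN = 1
FLAG_NOTLOGIN = 2

def matches_any(required: int, satisfied: int) -> bool:
    """OR combination: at least one required flag is satisfied."""
    return (required & satisfied) != 0

def matches_all(required: int, satisfied: int) -> bool:
    """AND combination: no required flag is missing from satisfied."""
    return (required & ~satisfied) == 0

# The example from the answer: filter 1 | 4, object 1 | 8.
print(matches_any(1 | 4, 1 | 8))  # True: flag 1 alone is enough
print(matches_all(1 | 4, 1 | 8))  # False: flag 4 is missing

# An empty filter never matches under OR and always matches under AND.
print(matches_any(0, FLAG_LOGIN), matches_all(0, FLAG_LOGIN))  # False True

# Mutually exclusive flags can never both be satisfied under AND.
print(matches_all(FLAG_LOGIN | FLAG_NOTLOGIN, FLAG_LOGIN))  # False
```

Python integers are unbounded, so `~satisfied` is negative, but the `&` with a non-negative `required` still only keeps the bits that `required` actually uses.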

FLAG_ALWAYS should be a combination of all the other flags, and it should not be zero. FLAG_NOTLOGIN does not need to be removed.
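A quick brute-force check of that claim, assuming the OR rule and assuming every visitor is either logged in or not (the other flags may be in any combination). It enumerates every such visitor and confirms that FLAG_ALWAYS built from all flags matches each one, while 0 matches none.

```python
from itertools import product

FLAG_LOGIN, FLAG_NOTLOGIN, FLAG_OTHER, FLAG_NORTH, FLAG_SOUTH = 1, 2, 4, 8, 16
FLAG_ALWAYS = FLAG_LOGIN | FLAG_NOTLOGIN | FLAG_OTHER | FLAG_NORTH | FLAG_SOUTH  # 31

def matches_any(required: int, satisfied: int) -> bool:
    """OR combination: at least one required flag is satisfied."""
    return (required & satisfied) != 0

def visitor_flags():
    """Yield every visitor: exactly one login state plus any mix of the rest."""
    for login_state in (FLAG_LOGIN, FLAG_NOTLOGIN):
        for other, north, south in product((0, FLAG_OTHER), (0, FLAG_NORTH), (0, FLAG_SOUTH)):
            yield login_state | other | north | south

always_shown = all(matches_any(FLAG_ALWAYS, v) for v in visitor_flags())
zero_shown = any(matches_any(0, v) for v in visitor_flags())
print(always_shown, zero_shown)  # True False
```

The complementary pair FLAG_LOGIN / FLAG_NOTLOGIN is what makes this work: one of the two is always satisfied, so any filter containing both always matches under OR.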

Related

Having difficulty in pattern matching Postal Codes for an oracle regexp_like command

The Problem:
All I'm trying to do is come up with a pattern matching string for my regular expression that lets me select Canadian postal codes in this format: 'A1A-2B2' (for example).
The types of data I am trying to insert:
Insert Into Table
(Table_Number, Person_Name, EMail_Address, Street_Address, City, Province, Postal_Code, Hire_Date)
Values
(87, 'Tommy', 'mobster#gmail.com', '123 Street', 'location', 'ZY', 'T4X-1S2', To_Date('30-Aug-2020 08:50:56'));
This is a slightly modified, generic version to protect some of the data. All of the other columns insert just fine with no complaints, but the postal code is rejected when I run a load-data script.
The Column & Constraint in question:
Postal_Code varchar2(7) Constraint Table_Postal_Code Null
Constraint CK_Postal_Code Check ((Regexp_like (Postal_Code, '^\[[:upper:]]{1}[[:digit:]]{1}[[:upper:]][[:punct:]]{1}[[:digit:]]{1}[[:upper:]](1}[[:digit:]]{1}$')),
My logic here: following the regular expression documentation:
I have:
an open quote
a caret (^) to indicate the start of the string
a backslash (I think to interpret a string literal)
1 uppercase letter, 1 digit, 1 uppercase letter, 1 :punct: to account for the hyphen, 1 digit, 1 uppercase letter, 1 digit
$ to indicate the end of the string
a close quote
In my mind, something like this should work: it accounts for every single letter/character and the ranges they have to be in. But something is off in how I formatted this pattern-matching string.
The error I get is:
ORA-02290: check constraint (user.CK_POSTAL_CODE) violated
(slightly modified once more to protect my identity)
This tells me that the insert statement is tripping my check constraint, and that's about it. So it's an issue with the condition of the constraint itself, i.e. the string I'm using to match. My instructor has told me that the insert data is valid and doesn't need any fix-up, so I'm at a loss.
Limits/rules: to my understanding of the problem, the hyphen has to be there and be matched. The values are all uppercase in the dataset, so I don't have to worry about lowercase for this example.
I have tried countless variations of this regexp statement to see if anything at all would work, including:
changing all those uppers to :alpha:, then using 'i' to ignore case sensitivity for the time being
removing the {1} in case it was redundant
using \- (backslash hyphen) to turn the hyphen into a string literal, maybe
using only the hyphen by itself
even removing regexp altogether and trying a LIKE [A-Z][0-9][A-Z]-[0-9][A-Z][0-9] etc.
keeping the uppers and turning the :digit:s into [0-9] to see if that would work
The only logical thing I can think of now is: the check constraint is actually working fine and tripping off when it matches my syntax. But I didn't write it clearly enough to say "IGNORE these cases and only get tripped/activated if it doesn't meet these conditions"
But I'm at my wits' end and asking here as a last resort. I wouldn't if I could eventually see my mistake, but I've probably tried everything I can think of. I'm sure it's some tiny formatting rule I just can't see (I can feel it). Thank you kindly to anyone who knows how to format a pattern-matching string like this properly.
It looks like you may have been overcomplicating the regex a bit. The regex below matches the format described in the first list in your question:
REGEXP_LIKE (postal_code, '^[A-Z]\d[A-Z]-\d[A-Z]\d$')
I see two problems with that regexp.
Firstly, you have a spurious \ at the start. It serves you no purpose, get rid of it.
Secondly, the second-from last {1} appears in your code with mismatched brackets as (1}. I get the error ORA-12725: unmatched parentheses in regular expression because of this.
To be honest, you don't need the {1}s at all: they just tell the regular expression that you want one of the previous item, which is exactly what you'd get without them.
So you can fix the regexp in your constraint by getting rid of the \ and removing the {1}s, including the one with mismatched parentheses.
Here's a demo of the fixed constraint in action:
SQL> CREATE TABLE postal_code_test (
2 Postal_Code varchar2(7) Constraint Table_Postal_Code Null
3 Constraint CK_Postal_Code Check ((Regexp_like (Postal_Code, '^[[:upper:]][[:digit:]][[:upper:]][[:punct:]][[:digit:]][[:upper:]][[:digit:]]$'))));
Table created.
SQL> INSERT INTO postal_code_test (postal_code) VALUES ('T4X-1S2');
1 row created.
SQL> INSERT INTO postal_code_test (postal_code) VALUES ('invalid');
INSERT INTO postal_code_test (postal_code) VALUES ('invalid')
*
ERROR at line 1:
ORA-02290: check constraint (user.CK_POSTAL_CODE) violated
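The two fixes can also be checked outside Oracle. This is a sketch in Python's `re`, which has no POSIX classes, so `[A-Z]`, `[0-9]` and a literal `-` stand in for `[[:upper:]]`, `[[:digit:]]` and `[[:punct:]]` (an ASCII-only assumption; Oracle's own engine is not being tested here). It shows that a stray `(1}` makes the pattern uncompilable, much like ORA-12725, and that the `{1}`s make no difference once removed.

```python
import re

# ASCII stand-ins for the POSIX classes used in the Oracle constraint.
FIXED = r"^[A-Z][0-9][A-Z]-[0-9][A-Z][0-9]$"
WITH_REDUNDANT_ONES = r"^[A-Z]{1}[0-9]{1}[A-Z]{1}-{1}[0-9]{1}[A-Z]{1}[0-9]{1}$"

def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# "(1}" opens a group that is never closed, so compilation fails.
print(compiles(r"^[A-Z](1}[0-9]$"))  # False

for sample in ("T4X-1S2", "invalid"):
    print(sample,
          re.fullmatch(FIXED, sample) is not None,
          re.fullmatch(WITH_REDUNDANT_ONES, sample) is not None)
```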
You do not need the backslash and you have (1} instead of {1}.
You can simplify the expression to:
Postal_Code varchar2(7)
Constraint Table_Postal_Code Null
Constraint CK_Postal_Code Check (
REGEXP_LIKE(Postal_Code, '^[A-Z]\d[A-Z][[:punct:]]\d[A-Z]\d$')
)
or:
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[A-Z][0-9][A-Z][[:punct:]][0-9][A-Z][0-9]$'
)
)
or:
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[[:upper:]][[:digit:]][[:upper:]][[:punct:]][[:digit:]][[:upper:]][[:digit:]]$'
)
)
or (although the {1} syntax is redundant here):
Constraint CK_Postal_Code Check (
REGEXP_LIKE(
Postal_Code,
'^[[:upper:]]{1}[[:digit:]]{1}[[:upper:]]{1}[[:punct:]]{1}[[:digit:]]{1}[[:upper:]]{1}[[:digit:]]{1}$'
)
)
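Note that every variant above uses `[[:punct:]]`, which accepts any punctuation character, not only the hyphen the question requires; the first answer's literal `-` is stricter. A small Python comparison, assuming ASCII data and using `string.punctuation` as a rough stand-in for `[[:punct:]]` in a C/POSIX locale (Oracle's class may differ under other NLS settings):

```python
import re
import string

# Rough ASCII stand-in for POSIX [[:punct:]].
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"
PUNCT_PATTERN = rf"^[A-Z][0-9][A-Z]{PUNCT_CLASS}[0-9][A-Z][0-9]$"
HYPHEN_PATTERN = r"^[A-Z][0-9][A-Z]-[0-9][A-Z][0-9]$"

for code in ("T4X-1S2", "T4X.1S2", "T4X/1S2"):
    print(code,
          re.match(PUNCT_PATTERN, code) is not None,   # any punctuation accepted
          re.match(HYPHEN_PATTERN, code) is not None)  # only a hyphen accepted
```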
removing regexp altogether and trying a LIKE [A-Z][0-9][A-Z]-[0-9][A-Z][0-9] etc
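That will not work, as the LIKE operator does not match regular expression patterns: its only wildcards are % (any sequence of characters) and _ (exactly one character), so [A-Z] is matched literally.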
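To see why, here is a simplified model of LIKE in Python. `like_to_regex` and `sql_like` are hypothetical helpers, and the sketch ignores the ESCAPE clause: only `%` and `_` become wildcards and every other character, brackets included, is matched literally.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern (no ESCAPE clause) into a regex body."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else, including [ ] and -, is literal
    return "".join(parts)

def sql_like(value: str, pattern: str) -> bool:
    """LIKE must match the whole value, hence fullmatch."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

print(sql_like("T4X-1S2", "[A-Z][0-9][A-Z]-[0-9][A-Z][0-9]"))  # False: brackets are literal
print(sql_like("T4X-1S2", "___-___"))                          # True
```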

Regex to detect string is x.x.x where x is a digit from 1-3 digits

I have 1000+ rows with values entered as below:
5.99
5.188.0
v5.33
v.440.0
In Google Sheets, I want another column to perform the following operations:
Remove the 'v' character from the values
If the second '.' is missing, append '.0', so that e.g. 5.88 becomes 5.88.0
Can you please help with the regex and replace logic? I tried the following, but I'm new to writing regexes. Thanks for the help.
=regexmatch(<cellvalue>,"^[0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3}$")
So far I can detect the values: 5.88.0 returns TRUE and 5.99 returns FALSE. Now I need to append ".0" so that 5.99 becomes 5.99.0, and remove 'v' if present.
You can use a combination of functions. It may not be pretty, but it does the job.
Replace any instance of v with an empty string using SUBSTITUTE, after converting the cell's content to upper case. SUBSTITUTE is case-sensitive, so without UPPER(cell) you would only remove one of V or v (depending on which letter you search for).
SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])
=SUBSTITUTE(UPPER(A1),"V","")
Look for cells missing the last .xxx block. You need to update your regex a bit to specify that the last group is not present:
^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$
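The `{0}` quantifier repeats its group zero times, so that group contributes nothing and the pattern is equivalent to `^[0-9]\.[0-9]{1,3}$`. A quick check in Python's `re` (Google Sheets uses RE2, which should behave the same for these simple patterns, but that is an assumption here):

```python
import re

FULL = r"^[0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3}$"                # the question's pattern
MISSING_LAST = r"^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$"  # last group absent
SIMPLER = r"^[0-9]\.[0-9]{1,3}$"                            # same meaning, no {0}

for value in ("5.99", "5.188.0", "v5.33"):
    print(value,
          re.search(FULL, value) is not None,
          re.search(MISSING_LAST, value) is not None,
          re.search(SIMPLER, value) is not None)
```

Note that "v5.33" matches none of them until the v is removed.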
Using REGEXMATCH and IF, we can then append the last group as .0 with CONCATENATE:
REGEXMATCH(text, regular_expression)
CONCATENATE(string1, [string2, ...])
=IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$"),concatenate(A2,".0"), A2)
The last A2 will be replaced with something similar to what we have so far. Before that, we need a small change to the regex: now we look for values where the last group is present, which is your original regex. If the value matches, it is put in the cell; otherwise the cell shows INVALID (you can change that to anything you want).
^([0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3})$
This is the piece we put in place of the last A2:
IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3})$"),substitute(upper(A2),"V",""),"INVALID")
With this the final code to put in your cell will be:
=IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$"),concatenate(SUBSTITUTE(UPPER(A2),"V",""),".0"),IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3})$"),substitute(upper(A2),"V",""),"INVALID"))
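For readers who want to test the logic outside Sheets, this is a Python port of the final formula. `normalize_version` is a hypothetical name; the uppercase-then-remove-V step and the two regexes mirror the formula, with the redundant `{1}` and `{0}` parts dropped.

```python
import re

ONE_DOT = re.compile(r"^[0-9]\.[0-9]{1,3}$")              # x.y   -> needs ".0"
TWO_DOTS = re.compile(r"^[0-9]\.[0-9]{1,3}\.[0-9]{1,3}$")  # x.y.z -> already fine

def normalize_version(raw: str) -> str:
    """Mirror of the sheet formula: drop v/V, pad x.y to x.y.0, else validate."""
    cleaned = raw.upper().replace("V", "")  # like SUBSTITUTE(UPPER(A2),"V","")
    if ONE_DOT.match(cleaned):
        return cleaned + ".0"
    if TWO_DOTS.match(cleaned):
        return cleaned
    return "INVALID"

for raw in ("5.99", "5.188.0", "v5.33", "v.440.0"):
    print(raw, "->", normalize_version(raw))
```

One consequence worth knowing: "v.440.0" becomes ".440.0" after the v is removed, which has no leading digit, so both the formula and this port report it as INVALID. If such values should be accepted, the leading [0-9] would need adjusting.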

Does mongodb $regex without the option `i` still make use of the index if I am searching on the Index?

I have a model with a normal index using Mongoose.
const mod = new mongoose.Schema({
number: { type: String, required: true, index: { unique: true } },
});
I am using a regex in a query to get the mod corresponding to a specific number. Will my regex query utilize the index that is on this model?
query.number = {
$regex: `.*Q10.*`
}
modelName.find(query)
I am concerned that this is looking through the entire collection without using the index. What would be the best way to know whether I am using the index? Or, if you happen to know a way that will utilize the index, could you show me? Here I am looking for all values containing Q10, not trying to get an exact match. Would using /^Q10.*/ be better and use the index?
I am referencing the MongoDB documentation on regex index use and the comments made on a previous Stack Overflow question.
The best way to confirm index usage for a given query is using MongoDB's query explain() feature. See Explain Results in the manual for your version of MongoDB for more information on the output fields and interpretation.
With regular expressions a main concern is efficient use of indexes. An unanchored substring match like /Q10/ will require examining all index keys (assuming a candidate index exists, as in your example). This is an improvement over scanning the full collection data (as would be the case without an index), but not as ideal as being able to check a subset of relevant index keys as is possible with a regex prefix search.
If you are routinely searching for substring matches and there is a common pattern to your strings, you could design a more scalable schema. For example, you could save whatever your Q10 value represents into a separate field (such as part_number) where you could use a prefix match or an exact match (non-regex).
To illustrate, I set up some test data using MongoDB 3.4.2 and the mongo shell:
// Needles: strings to search for
db.mod.insert([{number:'Q10'}, {number: 'foo-Q10'}, {number:'Q10-123'}])
// Haystack: some string values to illustrate key comparisons
for (i=0; i<1000; i++) { db.mod.insert({number: "I" + i}) }
Regex search without an index:
db.mod.find({ number: { $regex: /Q10/ }}).explain('executionStats')
The winningPlan is a COLLSCAN (collection scan) which requires the server retrieve every document in the collection to perform the comparison. Note that the original regex includes an unnecessary .* prefix and suffix; this is implicit with a substring match so can be written more concisely as /Q10/.
Highlights from the executionStats section of the explain output:
"nReturned": 3,
"totalKeysExamined": 0,
"totalDocsExamined": 1003,
The explain output confirms there are no index keys examined and 1003 documents (all docs in this collection).
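The point about the redundant `.*` prefix and suffix can be checked with Python's `re.search`, which, like `$regex`, finds a match anywhere in the string unless the pattern is anchored. This only checks which strings match; it says nothing about MongoDB's index use.

```python
import re

values = ["Q10", "foo-Q10", "Q10-123", "I42"]
padded = [v for v in values if re.search(r".*Q10.*", v)]  # the original pattern
plain = [v for v in values if re.search(r"Q10", v)]       # the concise form
print(padded == plain, plain)  # True ['Q10', 'foo-Q10', 'Q10-123']
```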
Add an index for the following two examples:
db.mod.createIndex({number:1}, {unique: true})
Regex substring search with an index:
db.mod.find({ number: { $regex: /Q10/}}).explain('executionStats')
The winningPlan is now an IXSCAN, but it has to examine all 1003 indexed string values to find substring matches:
"nReturned": 3,
"totalKeysExamined": 1003,
"totalDocsExamined": 3,
Regex prefix search with an index:
db.mod.find({ number: { $regex: /^Q10/}}).explain('executionStats')
The winningPlan is an IXSCAN (Index scan) which requires 3 key comparisons and 2 document fetches to return the 2 matching documents:
"nReturned": 2,
"totalKeysExamined": 3,
"totalDocsExamined": 2,
A prefix search isn't equivalent to the first two searches, as it will not match the document with value foo-Q10. However, this does illustrate a more efficient regex search.
Note that totalKeysExamined is 3. It might be reasonable to expect this to be 2 since there were only 2 matches, however this metric includes any comparisons with out-of-range keys (eg. end of a range of values). For more information see Explain Results: keysExamined.
With an index, a case-sensitive regular expression query traverses the entire index (loading it into memory) and then loads the matching documents to be returned. It's expensive, but it can still be better than a full collection scan.
For a /John Doe/ regex, MongoDB will scan the entire key set in the index and then fetch the matched documents.
However, if you use a prefix query:
Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.

DB2: find field value where first character is a lower case letter

I am trying to pick out a value in a field where the first character is a lower case letter. This is difficult since DB2 does not permit regular expressions. My current attempt is:
select * from mytable
where field1 like lcase('_%')
I was hoping the underscore followed by the percent wildcard would match any character in the first position, and that wrapping lcase() around it would ensure that character is lowercase. Instead, every value gets selected: lcase() does not do what I want, and in hindsight it just converts its argument to lowercase.
With that in mind, how do I ensure that the first character matched by ('_%') is lowercase only?
Thanks very much
I would use something like:
... where substr(field1,1,1) <> upper(substr(field1,1,1))
A solution based on the range 'a'...'z' will not work with characters outside the Latin character set (e.g. Cyrillic).
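The same test expressed in Python, as a sketch of the logic only: Python's `str.upper()` uses Unicode case mappings, whereas DB2's UPPER depends on the database code page, so results for unusual characters may differ. `first_char_is_lowercase` is a hypothetical name.

```python
def first_char_is_lowercase(value: str) -> bool:
    """Python analogue of substr(field1,1,1) <> upper(substr(field1,1,1))."""
    first = value[:1]  # empty string when value is empty
    return first != first.upper()

for value in ("apple", "Apple", "жук", "9lives", ""):
    print(repr(value), first_char_is_lowercase(value))
```

Digits and punctuation are unchanged by uppercasing, so they are correctly rejected, while Cyrillic lowercase letters are accepted.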
Why not
where field1 >= 'a' and field1 < '{'
This will even make use of an appropriate index, if any.
Be warned, however, that this won't work when your DB instance uses lexicographic ordering. I am not sure whether the latter is a database attribute or a session attribute, however.
Another, more general way (especially when considering non-ASCII letters) is to check that the field is non-empty, that lowercasing its first character leaves it unchanged, and that uppercasing its first character changes it. (Look up the functions in the DB2 reference; I don't have mine at hand at the moment.)
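Both checks from this answer, sketched in Python. The range check assumes plain code-point (binary) ordering, which is exactly the caveat above; the general check follows the described length/lower/upper test. Function names are hypothetical.

```python
def in_ascii_lower_range(value: str) -> bool:
    """field1 >= 'a' AND field1 < '{' under code-point ordering."""
    return "a" <= value < "{"

def is_lowercase_first_char(value: str) -> bool:
    """Non-empty; first char unchanged by lowercasing but changed by uppercasing."""
    if not value:
        return False
    first = value[0]
    return first.lower() == first and first.upper() != first

for value in ("apple", "Apple", "zebra", "{x", "жук"):
    print(repr(value), in_ascii_lower_range(value), is_lowercase_first_char(value))
```

"{x" is rejected by the range check because it sorts after "{", and "жук" shows the difference: only the general check accepts it.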
DB2 DOES allow regular expressions with XQuery. For example:
with cteGender(VALUE) as
(
values
('M'),('F'),('U'),('S'),(' M'),('f')
),
cteResult(VALUE,RESULT_BOOLEAN) as
(
select '"' || VALUE || '"',
xmlquery('fn:matches($VALUE,''^[MFU]{1}$'')') from cteGender
)
select VALUE, RESULT_BOOLEAN,
xmlcast(RESULT_BOOLEAN as integer) RESULT_INTEGER from cteResult;
I took this example from: http://www.idug.org/p/bl/et/blogid=278&blogaid=187 That article explains very well how to use XQuery.
DB2 does not have SQL functions for Regular Expressions, but with xQuery you can do that. But if you really want SQL functions for RegEx, please visit this site: https://www.ibm.com/developerworks/jp/data/library/db2/j_d-regularexpression/ (In Japanese, but the code can be understood)
For more information about RegEx in DB2 please visit: http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.xml.doc/doc/xqrregexp.html
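For reference, this is what the example query should produce, reproduced with Python's `re.search` (which, like `fn:matches`, searches anywhere in the string, so the `^` and `$` anchors are what restrict it to the whole value). The values are the ones from the CTE above; the 1/0 column mirrors RESULT_INTEGER.

```python
import re

values = ["M", "F", "U", "S", " M", "f"]
results = [int(re.search(r"^[MFU]{1}$", v) is not None) for v in values]
for v, r in zip(values, results):
    print(f'"{v}"', r)
```

"S" fails the character class, " M" fails because of the leading space, and "f" fails because the match is case-sensitive.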

Postgres set varchar field to regular expression of itself

I'm trying to normalise a data field by removing a fairly common postfix. I've got as far as using the substring() function in postgres, but can't quite get it to work. For example, if I want to strip the postfix 'xyz' from any values that have it;
UPDATE my_table SET my_field=substring(my_field from '#"%#"xyz' for '#');
But this is having some weird effects that I can't pin down. Any thoughts? Many thanks as always.
update my_table
set my_field = regexp_replace(my_field, 'xyz$', '')
where my_field ~ 'xyz$';
This will also change the value 'xyz' into an empty string. I don't know whether you want that (or whether the suffix can exist on its own).
The where clause is not strictly necessary but will make the update more efficient because only those rows are updated that actually meet the criteria.
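A Python sketch of this UPDATE, with a list standing in for the table. It uses `\Z` as a strict end-of-string anchor because Python's `$` would also match just before a trailing newline; the row values are illustrative.

```python
import re

SUFFIX = re.compile(r"xyz\Z")

def strip_suffix_regex(value: str) -> str:
    """Analogue of regexp_replace(my_field, 'xyz$', '')."""
    return SUFFIX.sub("", value)

rows = ["fooxyz", "xyz", "xyzfoo", "bar"]
# The WHERE clause: only rows that actually end in xyz are touched.
updated = {v: strip_suffix_regex(v) for v in rows if SUFFIX.search(v)}
print(updated)  # {'fooxyz': 'foo', 'xyz': ''}
```

The 'xyz' row turning into an empty string is the edge case mentioned above.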
UPDATE my_table
SET my_field = left(my_field, -3)
WHERE my_field LIKE '%xyz';
For several reasons:
If you don't want to change every single row, always add a WHERE clause to your UPDATE. Even if only some rows are actually changed by the expression. An UPDATE from the same value to the same value is still an UPDATE and will produce dead rows and table bloat and trigger triggers ...
Use left() in combination with LIKE.
left() with a negative second parameter effectively trims that number of characters from the end of the string. left() was introduced with PostgreSQL 9.1. I quote the manual here:
When n is negative, return all but last |n| characters.
Always pick LIKE over a regular expression (~) if you can. LIKE is not as versatile, but much faster. (SIMILAR TO is rewritten as regular expression internally). Details in this related answer on dba.SE.
If you want to make sure that a minimum of characters remains:
WHERE my_field LIKE '_%xyz'; -- prepend as many _ as you want chars left
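The `left()` + LIKE combination modeled in Python (helper names are hypothetical; the LIKE check ignores escape characters). Python slicing with a negative end index happens to behave like `left()` with a negative count, including returning an empty string when the value is shorter than the count.

```python
def left(value: str, n: int) -> str:
    """PostgreSQL-style left(): negative n returns all but the last |n| chars."""
    return value[:n]

def like_suffix(value: str, suffix: str = "xyz", min_prefix: int = 0) -> bool:
    """LIKE '%xyz' when min_prefix=0, LIKE '_%xyz' when min_prefix=1."""
    return len(value) >= len(suffix) + min_prefix and value.endswith(suffix)

rows = ["abcxyz", "xyz", "xyzabc"]
plain = {v: left(v, -3) for v in rows if like_suffix(v)}
at_least_one = {v: left(v, -3) for v in rows if like_suffix(v, min_prefix=1)}
print(plain)         # {'abcxyz': 'abc', 'xyz': ''}
print(at_least_one)  # {'abcxyz': 'abc'}
```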
substring() would work like this (one possibility):
substring(my_field, '^(.*)xyz$');
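One thing to keep in mind with substring(): when the pattern does not match, it returns NULL, so running it in an UPDATE without a WHERE clause would set non-matching rows to NULL (a likely cause of the "weird effects" in the question). A rough Python analogue, where `pg_substring` is a hypothetical helper returning the first capture group or None, and `\Z` stands in for the end anchor:

```python
import re
from typing import Optional

def pg_substring(value: str, pattern: str) -> Optional[str]:
    """First capture group if the pattern matches, else None (like SQL NULL)."""
    m = re.search(pattern, value)
    if m is None:
        return None
    return m.group(1) if m.re.groups else m.group(0)

for v in ("fooxyz", "xyzxyz", "bar"):
    print(v, pg_substring(v, r"^(.*)xyz\Z"))
```

The greedy `(.*)` only strips the last "xyz", so "xyzxyz" becomes "xyz".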