What algorithm generates the code on a 100 USD banknote? - check-digit

I am designing the primary key for storing products. I have been looking around for ideas on how to design the ID, since plain auto-increment is too boring. Does anyone know what the code 'KB46279860I' on the banknote below means?
100 USD picture
I think that code is not just an auto-increment value but uses some algorithm, such as a check digit.
Could anyone give me some hints? Thanks!

If you're not planning on showing the user your ID then auto-increment could save you processing time as it is handled by your database directly.
If you are planning to show an ID to the user without exposing the one in the database, you could consider using Hashids or a GUID, or generating your own unique random value with a check digit. You can use the Luhn or Damm algorithm to compute the check digit.
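As a minimal sketch of the check-digit idea, the Python below computes and verifies a Luhn check digit. The function names are illustrative and not taken from any library, and the sketch assumes IDs made up only of decimal digits.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left. The rightmost payload digit is doubled
    # because the check digit will later occupy the last position.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check a full number whose last digit is the Luhn check digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])
```

A random or sequential numeric value plus this digit lets you reject most mistyped IDs before querying the database.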


Android/Java: Is there a fast way to filter large data saved in a list? And how to get high-quality pictures with small storage space on the server?

I have two questions.
The first one is:
I have a large amount of data that comes from the server and that I save in a list. The customer can filter this data with 7 filters, two of them driven by a TextWatcher. This makes the filtering operation slow: it takes 4 seconds each time.
I tried putting all the filter keywords (such as length or width) in one if statement with && between them, but that did not give me a result. I also tried replacing the TextWatcher with a Spinner, but that did not help either.
I'm using a single for loop.
So the question is: how can I apply multiple filters to a list of up to 2000 rows with minimal or no slowdown?
The second one is:
I save from 2 to 8 pictures on the server in string form.
The question is: when I get these pictures back from the server, how can I show them in high quality?
When I show them I can see the pixels, and this is not good for the customer.
I don't want these pictures to take up a lot of space on the server, but at the same time I want them in good quality when I restore them for display.
I'm using Android/Java.
Thank you
The answer to my first question: if you want to filter (as when you filter an online clothes shop by lowest price), you should use a hash map rather than an ordinary list; it will be faster.
The answer to my second question: if you want to store images in a database, you should save each one as a link, not as a string or any other data type.
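A rough sketch of the hash-map idea from the first answer, written in Python for brevity (a Java HashMap would follow the same shape). Rows are grouped by each filter field once, so applying a filter becomes a lookup and a set intersection instead of a scan of the whole list. The field names `length` and `width` and the helper names are assumptions for illustration.

```python
from collections import defaultdict


def build_index(rows, field):
    """Map each value of `field` to the set of row positions having it."""
    index = defaultdict(set)
    for pos, row in enumerate(rows):
        index[row[field]].add(pos)
    return index


def filter_rows(rows, indexes, criteria):
    """Return rows matching every (field, value) pair in `criteria`."""
    matches = None
    for field, value in criteria.items():
        hits = indexes[field].get(value, set())
        matches = hits if matches is None else matches & hits
    if matches is None:  # no criteria given: every row matches
        return list(rows)
    return [rows[pos] for pos in sorted(matches)]


rows = [
    {"length": 10, "width": 5, "name": "a"},
    {"length": 10, "width": 7, "name": "b"},
    {"length": 12, "width": 5, "name": "c"},
]
# Build the indexes once, when the data arrives from the server.
indexes = {f: build_index(rows, f) for f in ("length", "width")}
result = filter_rows(rows, indexes, {"length": 10, "width": 5})
```

This covers exact-match filters only; free-text filters typed by the user would still need a scan or a different index.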

Application for filtering a database in a short period of time

I need to create an application that lets me get the phone numbers of users matching specific conditions as fast as possible. For example, we have 4 columns in an SQL table (region, income, age, and a 4th with the phone number itself). I want to get phone numbers from the table with specific region and income. Just running an SQL query won't help because it takes a significant amount of time. The database is updated once a day, so I have some time to prepare the data as I wish.
The question is: how would you make getting phone numbers with specific conditions as fast as possible? O(1) in the best scenario. Consider storing the values from the SQL table in RAM for the fastest access.
I came up with the following idea:
For each phone number create something like a bitset: 0 if a particular condition is false and 1 if it is true. But I'm not sure I can implement it for columns with non-boolean values.
Create a vector with phone numbers.
Create a vector with phone numbers' bitsets.
To get phone numbers, iterate over the 2nd vector and compare each bitset with the required one.
It's not O(1) at all. And I still don't know what to do about non-boolean columns. I thought maybe it's possible to do something good with std::unordered_map (all phone numbers are unique), or to improve my idea with vectors and masks.
P.S. The SQL table consumes 4GB of memory and I can store up to 8GB in RAM. There are 500 columns.
I want to get phone numbers from the table with specific region and income.
You would create indexes in the database on (region, income). Let the database do the work.
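A small sketch of the composite-index suggestion, using Python's built-in sqlite3 module with an in-memory database. The table name `users`, the index name, and the sample rows are assumptions for illustration; the real database engine and schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "phone TEXT PRIMARY KEY, region TEXT, income INTEGER, age INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("555-0001", "north", 50000, 30),
        ("555-0002", "north", 70000, 41),
        ("555-0003", "south", 50000, 25),
    ],
)
# Composite index so lookups by region and income avoid a full table scan.
conn.execute("CREATE INDEX idx_region_income ON users (region, income)")

phones = [
    row[0]
    for row in conn.execute(
        "SELECT phone FROM users WHERE region = ? AND income = ? ORDER BY phone",
        ("north", 50000),
    )
]

# Ask SQLite how it plans to run the query; the plan should name the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT phone FROM users WHERE region = ? AND income = ?",
    ("north", 50000),
).fetchall()
```

Checking the query plan is a quick way to confirm that the index is actually used for the searches you care about.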
If you really want it to be fast I think you should consider ElasticSearch. Think of every phone in the DB as a doc with properties (your columns).
You will need to reindex the table once a day (or in realtime) but when it's time to search you just use the filter of ElasticSearch to find the results.
Another option is to have an index for every column. In this case the engine will do an Index Merge to increase performance. I would also consider using MEMORY Tables. In case you write to this table - consider having a read replica just for reads.
To optimize your table, log your queries somewhere and add multi-column indexes only for the top X most popular searches, depending on your memory limitations.
You can use NVMe storage as your DB disk (if you can't load the data into memory).

Preserve Order for Cross Validation in Weka

I am using the Weka GUI for classifying sensor data.
I have measures of 10 people, the data is sorted. So the first 10% correspond to participant 1, the second 10% to participant 2 etc.
I would like to use 10 fold cross validation to build a model on 9 participants and test it on the remaining participant. In my case I believe I could accomplish this by simply not randomizing the data splits.
How would I best go about doing this?
I don't know how to do this in the Explorer.
In the KnowledgeFlow GUI, there is a CrossValidationFoldMaker used to create cross-validation folds. This has an option to Preserve instances order, which says it preserves the order of instances rather than randomly shuffling.
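To illustrate what preserving instance order does to the folds, here is a sketch in plain Python, independent of Weka. It splits sorted instances into contiguous folds without shuffling, so with 10 participants of equal size each test fold holds exactly one participant. The function name is made up for this example and is not part of the Weka API.

```python
def contiguous_folds(n_instances, n_folds):
    """Yield (train_indices, test_indices) pairs without shuffling."""
    fold_size, remainder = divmod(n_instances, n_folds)
    start = 0
    for fold in range(n_folds):
        # Spread any remainder over the first folds, one extra instance each.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_instances))
        yield train, test
        start = stop


# 100 sorted instances, 10 per participant: fold k tests on participant k.
folds = list(contiguous_folds(n_instances=100, n_folds=10))
```

If the participants do not contribute equal numbers of instances, contiguous folds will no longer line up with participant boundaries.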
There's a video describing the KnowledgeFlow interface here:

Check a fingerprint in the database

I am saving the fingerprints in a "blob" field, and I wonder whether the only way to compare these prints is to retrieve all the prints saved in the database and then build a vector to check with the function "identify_finger". Can you check directly in the database using a SELECT?
I'm working with libfprint. In this code the verification is done in a vector:
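    def test_identify():
        cur = DB.cursor()
        # Load every stored print from the database.
        cur.execute('select id, fp from print')
        id = []
        gallary = []
        for row in cur.fetchall():
            # Rebuild a print object from the stored blob.
            data = pyfprint.pyf.fp_print_data_from_data(str(row['fp']))
            gallary.append(pyfprint.Fprint(data_ptr = data))
        # Compare the scanned finger against every print in the gallery.
        n, fp, img = FingerDevice.identify_finger(gallary)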
There are two fundamentally different ways to use a fingerprint database. One is to verify the identity of a person who is known through other means, and one is to search for a person whose identity is unknown.
A simple library such as libfprint is suitable for the first case only. Since you're using it to verify someone you can use their identity to look up a single row from the database. Perhaps you've scanned more than one finger, or perhaps you've stored multiple scans per finger, but it will still be a small number of database blobs returned.
A fingerprint search algorithm must be designed from the ground up to narrow the search space, to compare quickly, and to rank the results and deal with false positives. Just as a Google search may come up with pages totally unrelated to what you're looking for, so too will a fingerprint search. There are companies that devote their entire existence to solving this problem.
Another way would be to have a MySQL plugin that knows how to work with fingerprint images and can select based on what you are looking for.
I really doubt that there is such a thing.
You could also try to parallelize the fingerprint comparison, i.e. calling:
in parallel, on different cores/machines
You can't check directly from the database using a SELECT because each scan is different and will produce a different blob. libfprint does the hard work of comparing different scans and judging whether they are from the same person or not.
What zinking and Tudor are saying, I think, is that if you understand how that judgement process works (which is, by the way, by minutiae comparison), you can develop a method of storing the relevant data for the process (the minutiae, maybe?) in the database and then a method for fetching the relevant values, maybe a kind of index or some type of extension to the database.
In other words, you would have to reimplement the libfprint algorithms in a more complex (and beautiful) way, instead of just accepting the libfprint method of comparing the scan with all stored fingerprints in a loop.
Other solutions for speeding up your program
use C:
I only know enough C to write hello-world kind of programs, but it was not hard to write code in pure C that uses the fp_identify_finger_img function of libfprint, and I can tell you it is much faster than pyfprint.identify_finger.
You can continue doing the enrollment part in Python. I do it.
use a time / location based SELECT:
If you know your users are more likely to scan their fingerprints at some times than others, or at some places than others (maybe they scan when arriving at work at a certain time, or when leaving, or when entering the building by one gate or another), you can collect data at each scan to measure those probabilities and create parallel tables that sort the users by their probability of arriving at each time and location.
We know that identify_finger tries to identify fingers in a loop over the fingerprint objects you provide in a list, so we can take advantage of that and pass it the objects sorted so that the most likely user for that time and location comes first in the list, and so on.
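The ordering idea can be sketched as follows. This is only an illustration: the scan log of (user_id, hour, location) tuples and the function name are assumptions, and the code does not call libfprint or pyfprint at all.

```python
from collections import Counter


def rank_users(scan_log, hour, location):
    """Order user ids by how often they scanned at this hour and location.

    `scan_log` is an iterable of (user_id, hour, location) tuples, a
    hypothetical record written at each successful scan.
    """
    counts = Counter(
        user for user, h, loc in scan_log if h == hour and loc == location
    )
    all_users = {user for user, _, _ in scan_log}
    # Most frequent first; users never seen in this slot go last, by id.
    return sorted(all_users, key=lambda u: (-counts[u], u))


scan_log = [
    (1, 9, "gate A"), (2, 9, "gate A"), (2, 9, "gate A"),
    (3, 18, "gate B"), (1, 18, "gate B"),
]
order = rank_users(scan_log, hour=9, location="gate A")
```

The gallery passed to identify_finger would then be built from the stored prints in this order, so the likeliest users are compared first.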

Collaborative Filtering: Ways to determine implicit scores for products for each user?

Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm.
My objective is to calculate a score for each product that a user has some sort of history with.
The data I am currently collecting:
User order history
Product pageview history for both anonymous and registered users
All of this data is timestamped.
What I'm looking for
There are a couple of things I'm looking for suggestions on, and ideally this question should be treated more for discussion rather than aiming for a single 'right' answer.
Any additional data I can collect for a user that can directly imply an interest in a product
Algorithms/equations for turning this data into scores for each product
What I'm NOT looking for
Just to avoid this question being derailed with the wrong kind of answers, here is what I'm doing once I have this data for each user:
Generating a number of user clusters (21 at the moment) using the k-means clustering algorithm, with the Pearson correlation coefficient as the distance score
For each user (on demand), calculating a graph of similar users by looking for their most and least similar users within their cluster, and repeating for an arbitrary depth
Calculating a score for each product based on the preferences of other users within the user's graph
Sorting the scores to return a list of recommendations
Basically, I'm not looking for ideas on what to do once I have the input data (I may need further help with that later, but it's not the point of this question), just for ideas on how to generate this input data in the first place.
Here's a haymaker of a response:
time spent looking at a product
semantic interpretation of comments left about the product
make a discussion page about a product, brand, or product category and semantically interpret the comments
whether they shared a product page (email, del.icio.us, etc.)
browser (mobile might make them spend less time on the page vis-à-vis laptop while indicating great interest) and connection speed (affects the amount of time spent on the page)
facebook profile similarity
heatmap data (e.g. à la kissmetrics)
What kind of products are you selling? That might help us answer you better. (Since this is an old question, I am addressing both @Andrew Ingram and anyone else who has the same question and found this thread through search.)
You can allow users to explicitly state their preferences, the way Netflix allows users to assign stars.
You can assign a positive numeric value to everything they bought, since you say you do have their purchase history. Assign zero to the things they didn't buy.
You could use some sort of weighted value for the things they bought, adjusted for what's popular (if nearly everybody bought a product, the fact that one person also bought it doesn't tell you much about that person). See "term frequency–inverse document frequency".
You could also assign some lesser numeric value for items that users looked at but did not buy.
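A sketch that combines the suggestions above: a purchase gets a positive score scaled by an IDF-style rarity factor, and a view without a purchase gets a smaller fixed score. The weights, the `1 + log` form of the rarity factor, and the function name are all assumptions to be tuned against real data.

```python
import math
from collections import defaultdict

PURCHASE_WEIGHT = 1.0  # assumed value, tune for your data
VIEW_WEIGHT = 0.3      # assumed: a view counts for less than a purchase


def implicit_scores(purchases, views, n_users):
    """Return {user: {product: score}} from (user, product) event pairs."""
    buyers = defaultdict(set)
    for user, product in purchases:
        buyers[product].add(user)

    def rarity(product):
        # IDF-style factor: 1.0 when every user bought the product,
        # larger when fewer users bought it.
        return 1.0 + math.log(n_users / len(buyers[product]))

    scores = defaultdict(dict)
    for user, product in views:
        scores[user][product] = VIEW_WEIGHT
    for user, product in purchases:
        # A purchase overrides a mere view of the same product.
        scores[user][product] = PURCHASE_WEIGHT * rarity(product)
    return {user: dict(products) for user, products in scores.items()}
```

Products a user never viewed or bought are simply absent from the result, which corresponds to the zero score mentioned above.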