How can I index a tree in a SQLite table? - c++

I have a table with the following fields:
id VARCHAR(32) PRIMARY KEY,
parent VARCHAR(32),
name VARCHAR(32)
parent is a foreign key referencing the same table. This structure forms a tree, which is supposed to replicate a filesystem tree. The problem is that looking up an id from a path is slow, so I want to build an index. What is the best way of doing this?
Example Data:
id       parent    name
-------- --------- ----------
1        NULL      root
2        1         foo
3        1         bar
4        3         baz
5        4         aii
Would Index To:
id       parent    name
-------- --------- ----------------
1        NULL      root
2        1         root/foo
3        1         root/bar
4        3         root/bar/baz
5        4         root/bar/baz/aii
I am currently thinking about using a temporary table and manually running a series of INSERT ... SELECT statements in code to build the index. (The reason I make it temporary is that if this db is accessed from a Windows system the path needs backslashes, whereas from *nix it needs forward slashes.) Is there an alternative to this?

So, you have a function that does something like this (pseudocode):

int getId(char **path) {
    int id = 0;              /* start at the root */
    char query[1000];
    int i;
    for (i = 0; path[i] != NULL; i++) {
        /* descend one level: find the child of `id` named path[i] */
        snprintf(query, sizeof(query),
                 "select id from table where parent = %d and name = '%s'",
                 id, path[i]);
        id = QuerySimple(query);  /* helper: run query, return a single int */
        if (id < 0)               /* component not found */
            break;
    }
    return id;
}
Looking at the query, you need a (non-unique) index on the columns (parent, name), but maybe you already have it.
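In SQLite that could look like this (a sketch; the real table name is not given in the question, so node is assumed):

CREATE INDEX idx_node_parent_name ON node(parent, name);

Each iteration of the loop above then becomes an index seek instead of a full table scan.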
A (temporary) table can be used like you said; note that you can change the path separator in your program, avoiding the need for different tables for Windows and Unix. You also need to keep the additional table in sync with the master. If updates/deletes/inserts are rare, instead of a table you can simply keep an in-memory cache of already-looked-up data and clear the cache when an update happens (you can also do partial deletes on the cache if you want). In this case you can also read more data (e.g. given a parent, read all of its children) to fill the cache faster. At the extreme, you can read the entire table into memory and work there! It depends on how much data you have, your typical access patterns (how many reads, how many writes), and the deployment environment (do you have RAM to spare?).
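If you do keep a path table, SQLite (3.8.3+) can build the whole thing in one statement with a recursive CTE; a sketch, again assuming the table is named node and using '/' as the separator:

CREATE TEMP TABLE node_path AS
WITH RECURSIVE walk(id, path) AS (
    SELECT id, name FROM node WHERE parent IS NULL
    UNION ALL
    SELECT n.id, walk.path || '/' || n.name
    FROM node AS n JOIN walk ON n.parent = walk.id
)
SELECT id, path FROM walk;

CREATE INDEX idx_node_path ON node_path(path);

A path lookup is then a single indexed query: SELECT id FROM node_path WHERE path = 'root/bar/baz';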


Crash at rte_hash_del_key()

I am using DPDK 18.02.2 in my application, where we have created a hash table with the following parameters:

/* DPDK hash table configuration parameters */
struct rte_hash_parameters flowHashParams = {
    .name = "Hash Table for FLOW",
    .entries = 128000,
    .key_len = sizeof(ipv4_flow_key_t),
    .hash_func = ipv4_flow_hash_crc,
    .hash_func_init_val = 0,
    /* FLAG - multiple threads can use this hash table */
    .extra_flag = RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD,
};
Adding and deleting entries in the hash table works well on DPDK 18.02.2.
After moving to DPDK 19.11.13 (the latest stable version), we are facing a crash at rte_hash_del_key(). If we remove the extra_flag RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD from the hash table parameters, we do not see any crash at rte_hash_del_key().
In my application, one thread adds entries to the hash table, and another thread reads that information and deletes the entries.
How can we add multi-writer support to the hash table? Is there any alternative way to enable that support?
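For reference, one direction worth testing (an assumption on my part, not a verified fix): with one thread adding while another looks up and deletes, the table needs read-write concurrency in addition to multi-writer add. The rte_hash library in DPDK 19.11 exposes RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY (and a lock-free variant, RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF) for this pattern:

/* Sketch: combine multi-writer add with read-write concurrency so that
 * concurrent add (thread 1) and lookup + delete (thread 2) are both
 * protected. ipv4_flow_key_t / ipv4_flow_hash_crc are from the question. */
struct rte_hash_parameters flowHashParams = {
    .name = "Hash Table for FLOW",
    .entries = 128000,
    .key_len = sizeof(ipv4_flow_key_t),
    .hash_func = ipv4_flow_hash_crc,
    .hash_func_init_val = 0,
    .extra_flag = RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD
                | RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY,
};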

Where/when to check the version of the row when doing optimistic locking? [duplicate]

I want to implement optimistic locking for a relational database.
Suppose there is a users table
id   name    version
---- ------- -------
1    Jhon    1
2    Jhane   1
My application fetches the Jhon user to change his name
SELECT id, name, version FROM users WHERE id = 1;
jhon = get_user_by_id('1');
jhon.change_name_to('Jhin');
jhon.save() // this method should fail or succeed depending on the version of the row in the database
So where do I need to compare the version of the selected row with the version of the row that is in the database?
Is a database transaction a good place to fetch the existing version of the row and compare it with the already fetched record?
transaction_begin()
current = get_user_by_id('1')             // re-read the row inside the transaction
if (current.version !== jhon.version) {   // do the versions still match?
    // no: someone else changed the row in the meantime, give up
    transaction_rollback();
} else {
    // yes: update the name and bump the version
    query("UPDATE users SET name = {jhon.name}, version = {jhon.version + 1} WHERE id = 1;");
    transaction_commit();
}
I found an answer in a similar question:
How does Hibernate do row version check for Optimistic Locking before committing the transaction
The answer is not to read the version back at all:
"Optimistic locking does not require any extra SELECT to get and check the version after the entity was modified"
In order to update a record (user), we also pass the expected version:
UPDATE users SET name = 'Jhin', version = version + 1 WHERE id = 1 AND version = 1;
If the number of affected records is 1, the row was not changed by someone else between our read and our update.
If the number of affected records is 0, someone else changed the row during our modification, and we can report the conflict or retry.
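To make the mechanics concrete, here is a sketch of two sessions racing on the same row (schema as above; the names and values are only illustrative):

-- Both sessions have read (id = 1, name = 'Jhon', version = 1).

-- Session A commits its change first:
UPDATE users SET name = 'Jhin', version = version + 1
WHERE id = 1 AND version = 1;
-- 1 row affected: success, the row now has version = 2.

-- Session B later tries to update using its stale version:
UPDATE users SET name = 'Jhane', version = version + 1
WHERE id = 1 AND version = 1;
-- 0 rows affected: conflict detected; session B must re-read and retry.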

How can I SELECT records using a select list made of foreign keys?

I have a table, DEBTOR, with a structure like this:
and a second table, DEBTOR.INFO structured like this:
I have a select list made of record IDs from the DEBTOR.INFO table. How can I
select * from DEBTOR WHERE 53 IN (name of select list)?
Is this even possible?
I realize this query looks more like SQL than RetrieVe but I wrote it that way for an easier understanding of what I'm trying to accomplish.
Currently, I accomplish this query by writing
SELECT DEBTOR WITH 53 EQ [paste list of DEBTOR.INFO record IDs]
but obviously this is unwieldy for large lists.
It looks to me like you can't do that. Even if you use an I-descriptor, it only works in one direction: TRANS("DEBTOR.INFO",53,0,"X") works from the DEBTOR file, but TRANS("DEBTOR",#ID,53,"X") from DEBTOR.INFO will return nothing.
See this article on U2's site for a possible solution.
Would something like this work (two steps)?

SELECT DEBTOR.INFO SAVING PACKET
LIST DEBTOR ....

This creates a select list from the data in the PACKET field of the DEBTOR.INFO file and makes it active. (If you have duplicate values, you can add the keyword UNIQUE after SAVING.)
The subsequent LIST command then uses that active select list, which contains values found in the #ID field of the DEBTOR file.
Not sure if you are still looking at this, but there is a simple option that will not require a lot of programming.
I did it with a program, a subroutine and a dictionary item.
First I set a named common variable to contain the list of DEBTOR.INFO ids:
SETLIST
*
* Use named common to hold list of keys
COMMON /MYKEYS/ KEYLIST
*
* Note for this example I am reading the list from SAVEDLISTS
OPEN "SAVEDLISTS" TO FILE ELSE STOP "CAN NOT OPEN SAVEDLISTS"
READ KEYLIST FROM FILE, "MIKE000" ELSE STOP "NO MIKE000 ITEM"
Now, I can create a subroutine that checks for a value in that list
CHECKLIST
SUBROUTINE CHECKLIST( RVAL, IVAL)
COMMON /MYKEYS/ KEYLIST
LOCATE IVAL IN KEYLIST <1> SETTING POS THEN
RVAL = 1
END ELSE RVAL = 0
RETURN
Lastly, I use a dictionary item to call the subroutine with the field I am looking for:
INLIST:
I
SUBR("CHECKLIST", FK)
IN LIST
10R
S
Now all I have to do is put the correct criteria on my list statement:
LIST DEBTOR WITH INLIST = 1 ACCOUNT STATUS FK
I'd use the very powerful EVAL with an XLATE:
SELECT DEBTOR WITH EVAL \XLATE('DEBTOR.INFO',#RECORD<53>,'-1','X')\ NE ""

Search query in sqlite3 database for certain items

I have a list of items as a string array in C++. I also have a sqlite3 database which contains blacklisted strings. Now I must go through my list of items and mark each one with 0 or 1, telling me whether it is blacklisted or not. I could search for them one by one using "Select * from ITEMS_TABLE WHERE item = string[i]", but that takes time. I could also pull the blacklist from the database and then look the entries up in my list. But is there an efficient way to find out which of the items in my list are blacklisted?
Let's say I have the following structure:

struct item
{
    char name[MAX_NAME_LEN];
    bool isBlacklisted;
};

Then I use an array of these structures to know which of them are blacklisted. So I have to set the isBlacklisted flag to true if the entry is found in the database. If I use the Select approach, it returns me the list of items that were blacklisted, but I still need to find them in my array using string comparisons. Is there some efficient way to do this? Does the database provide any such functionality?
Thanks and regards,
Mike.
Design your database structure according to your requirements. If you want to know which items are blacklisted, simply use a column which contains 0 or 1 for blacklisted or not, i.e. your table ITEMS_TABLE has these columns:

itemcode   itemname   isblacklist
--------   --------   -----------
1          item1      0
2          item2      0
3          item3      1

Now

Select * from ITEMS_TABLE WHERE isblacklist=0

will return the non-blacklisted items, and

Select * from ITEMS_TABLE WHERE isblacklist=1

will return the blacklisted items. Hope this helps.
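For the other half of the problem, resolving an application-side list in one round trip instead of one SELECT per item, a temporary table can do the matching inside the database. A sketch, with ITEMS_TABLE read as in the question (a table of blacklisted strings in a column named item):

CREATE TEMP TABLE lookup(name TEXT PRIMARY KEY);
-- insert every string from the C++ array into lookup,
-- using one prepared INSERT statement inside a single transaction
SELECT l.name,
       EXISTS (SELECT 1 FROM ITEMS_TABLE t
               WHERE t.item = l.name) AS isBlacklisted
FROM lookup l;

Each returned row carries the original string plus a 0/1 flag, so one pass over the result set is enough to set the isBlacklisted flags, with no per-item queries.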

Multiple access to static data in a django app

I'm building an application and I'm having trouble choosing the best way to access static data multiple times in a Django app. My experience in the field is close to zero, so I could use some help.
The app basically consists of drag & drop of foods. When you drag a food to a given place (breakfast, for example), different values get updated: total breakfast calories, total day nutrients (micro/macro), total day calories, ... That's why I think the way I store and access the data is pretty important, performance-wise.
This is an excerpt of the json file I'm currently using:
foods.json
{
  "112": {
    "type": "Vegetables",
    "description": "Mushrooms",
    "nutrients": {
      "Niacin": {
        "unit": "mg",
        "group": "Vitamins",
        "value": 3.79
      },
      "Lysine": {
        "unit": "g",
        "group": "Amino Acids",
        "value": 0.123
      }
      ... (+40 nutrients)
    },
    "amount": 1,
    "unit": "cup whole",
    "grams": 87.0
  }
}
I've thought about different options:
1) JSON (the one I'm currently using):
Every time I drag a food to a "droppable" place, I call a getJSON function to access that food's data and then update the corresponding values. This file currently weighs in at 2 MB, but it will surely grow as I add more foods. I'm using this option because it was the quickest way to start building the app, but I don't think it's a good choice for the live app.
2) RDBMS with normalized fields:
I could create two models, Food and Nutrient, with each food's 40+ nutrients related by a FK. The problem I see with this is that every time a food's data is requested, the app will hit the db a lot of times to retrieve it.
3) RDBMS with a picklefield:
This is the option I'm actually considering. I could create a Food model and put the nutrients in a picklefield.
4) Something with Redis/Django's cache system:
I'll dive more deeply into this option. I've read some things about these, but I don't clearly know whether there's a way to use them to solve the problem I have.
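For what it's worth, option 4 with Django's low-level cache API could stay very small (a sketch; load_food_from_db is a hypothetical loader standing in for whichever storage backend ends up being chosen):

from django.core.cache import cache

def get_food(food_id):
    # Return the nutrient data for one food, hitting the backing
    # store at most once; static data can be cached without expiry.
    key = "food:%s" % food_id
    data = cache.get(key)
    if data is None:
        data = load_food_from_db(food_id)  # hypothetical loader
        cache.set(key, data, timeout=None)  # timeout=None: cache forever
    return data

This works the same whether the underlying data lives in a JSON file or in the RDBMS.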
Thanks in advance,
Mariano.
This is a typical use case for a relational database; a more or less normalized form is the proper way most of the time.
I wrote this data model up off the top of my head, according to your example:
CREATE TABLE unit(
  unit_id integer PRIMARY KEY
 ,unit text NOT NULL
 ,metric_unit text NOT NULL
 ,atomic_amount numeric NOT NULL
);

CREATE TABLE food_type(
  food_type_id integer PRIMARY KEY
 ,food_type text NOT NULL
);

CREATE TABLE nutrient_type(
  nutrient_type_id integer PRIMARY KEY
 ,nutrient_type text NOT NULL
);

CREATE TABLE food(
  food_id serial PRIMARY KEY
 ,food text NOT NULL
 ,food_type_id integer REFERENCES food_type(food_type_id) ON UPDATE CASCADE
 ,unit_id integer REFERENCES unit(unit_id) ON UPDATE CASCADE
 ,base_amount numeric NOT NULL DEFAULT 1
);

CREATE TABLE nutrient(
  nutrient_id serial PRIMARY KEY
 ,nutrient text NOT NULL
 ,metric_unit text NOT NULL
 ,base_amount numeric NOT NULL
 ,calories integer NOT NULL DEFAULT 0
);

CREATE TABLE food_nutrient(
  food_id integer REFERENCES food(food_id) ON UPDATE CASCADE ON DELETE CASCADE
 ,nutrient_id integer REFERENCES nutrient(nutrient_id) ON UPDATE CASCADE
 ,amount numeric NOT NULL DEFAULT 1
 ,CONSTRAINT food_nutrient_pkey PRIMARY KEY (food_id, nutrient_id)
);

CREATE TABLE meal(
  meal_id serial PRIMARY KEY
 ,meal text NOT NULL
);

CREATE TABLE meal_food(
  meal_id integer REFERENCES meal(meal_id) ON UPDATE CASCADE ON DELETE CASCADE
 ,food_id integer REFERENCES food(food_id) ON UPDATE CASCADE
 ,amount numeric NOT NULL DEFAULT 1
 ,CONSTRAINT meal_food_pkey PRIMARY KEY (meal_id, food_id)
);
This is definitely not how it should work:

"every time a food data request is made, the app will hit the db a lot of times to retrieve it."

You should calculate / aggregate all the values you need in a view or function and hit the database only once per request, not many times.
Simple example to calculate the calories of a meal according to the above model:
SELECT sum(n.calories * fn.amount * f.base_amount * u.atomic_amount * mf.amount)
AS meal_calories
FROM meal_food mf
JOIN food f USING (food_id)
JOIN unit u USING (unit_id)
JOIN food_nutrient fn USING (food_id)
JOIN nutrient n USING (nutrient_id)
WHERE mf.meal_id = 7;
You can also use materialized views. For instance, store computed values per food in a table and update it automatically if underlying data changes. Most likely, those rarely change (but are still easily updated this way).
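As a sketch of that idea under the model above (the summary table food_calories is my own name; refresh it whenever the underlying rows change):

CREATE TABLE food_calories AS
SELECT f.food_id
      ,sum(n.calories * fn.amount * f.base_amount) AS calories
FROM   food f
JOIN   food_nutrient fn USING (food_id)
JOIN   nutrient n USING (nutrient_id)
GROUP  BY f.food_id;

Per-food reads then cost a single lookup instead of a join over 40+ nutrient rows.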
I think the flat-file version you are using comes in last place: every time it is requested, it is read from top to bottom, and at 2 MB and growing that will only get slower. The cache system would provide the best performance, but the RDBMS would be the easiest to manage and extend, and your queries will automatically be cached as well.