Create tables
I have a database composed of two tables:
ENTITE_CANDIDATE
VARIATIONS
Tables are created by using the following queries:
CREATE TABLE IF NOT EXISTS ENTITE_CANDIDATE (ID INTEGER PRIMARY KEY NOT NULL, ID_KBP TEXT NOT NULL, wiki_title TEXT, type TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS VARIATIONS (ID INTEGER PRIMARY KEY NOT NULL, ID_ENTITE INTEGER, NAME TEXT, TYPE TEXT, LANGUAGE TEXT, FOREIGN KEY(ID_ENTITE) REFERENCES ENTITE_CANDIDATE(ID));
Table ENTITE_CANDIDATE contains 818,742 records.
Table VARIATIONS contains 154,716,653 records.
Index tables
I indexed the previous tables by using the following queries:
`CREATE INDEX var_id ON VARIATIONS (ID, ID_ENTITE, NAME);`
`CREATE INDEX entity_id ON ENTITE_CANDIDATE (ID, wiki_title);`
Retrieve information
I want to retrieve the following records from table VARIATIONS:
SELECT ID, ID_ENTITE, NAME FROM VARIATIONS WHERE NAME = 'foo';
Every such query takes around 5.414931 seconds. I know the table contains a very large number of records, but can I make the retrieval faster? Am I indexing the tables correctly?
The documentation says:
the index might be used if the initial columns of the index … appear in WHERE clause terms.
This query uses only the NAME column to search, so the var_id index cannot be used. (That index is useful only for lookups that use ID, which is mostly useless because the ID column is already indexed as PRIMARY KEY.)
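A minimal way to see this, assuming the database is SQLite (the schema and the quoted documentation suggest it, but the question doesn't say so): the sketch below uses Python's built-in sqlite3 module to run EXPLAIN QUERY PLAN on the question's schema. It checks the plan before and after adding an index whose first column is NAME. The index name var_name is hypothetical.

```python
import sqlite3

# Empty in-memory copy of the question's schema; no data is needed to see the plan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENTITE_CANDIDATE (ID INTEGER PRIMARY KEY NOT NULL, ID_KBP TEXT NOT NULL,
                               wiki_title TEXT, type TEXT NOT NULL);
CREATE TABLE VARIATIONS (ID INTEGER PRIMARY KEY NOT NULL, ID_ENTITE INTEGER, NAME TEXT,
                         TYPE TEXT, LANGUAGE TEXT,
                         FOREIGN KEY(ID_ENTITE) REFERENCES ENTITE_CANDIDATE(ID));
CREATE INDEX var_id ON VARIATIONS (ID, ID_ENTITE, NAME);
""")

QUERY = "SELECT ID, ID_ENTITE, NAME FROM VARIATIONS WHERE NAME = ?"

def plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# NAME is not the first column of var_id, so SQLite has to scan every row.
before = plan(QUERY, ("foo",))

# Hypothetical index with NAME as its leading column: now a direct lookup.
conn.execute("CREATE INDEX var_name ON VARIATIONS (NAME)")
after = plan(QUERY, ("foo",))

print(before)  # a SCAN step
print(after)   # a SEARCH step using var_name (NAME=?)
```

The exact wording of the plan varies between SQLite versions. Declaring the index as (NAME, ID_ENTITE) instead would let SQLite answer this particular query from the index alone (a covering index), at the cost of a larger index.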
Related
I want to display an image in a report column. My table has a composite primary key: ID and DATE.
When I set these columns in the BLOB attributes as Primary Key Column 1 and Primary Key Column 2, the report cannot find the data because of the DATE column. Is it a problem with the date format, or something else?
I'd suggest using only one column as the primary key column (a sequence or, if your database version supports it, an identity column).
The [ID, DATE] combination you currently have can then be made a unique key (set both columns NOT NULL to "mimic" what a primary key would do).
Why? Although your data model is probably fine, certain Apex features "suffer" from composite keys and prefer a single-column primary key.
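The schema pattern can be sketched as follows. This is a minimal illustration in SQLite through Python's sqlite3 module, not Oracle or APEX code. In Oracle the surrogate key would come from a sequence or an identity column, as suggested above. The table and column names are hypothetical, and entry_date stands in for the question's DATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# img_id is the single-column surrogate key. In SQLite an INTEGER PRIMARY KEY
# is filled in automatically, playing the role of an Oracle sequence/identity.
conn.execute("""
    CREATE TABLE images (
        img_id     INTEGER PRIMARY KEY,
        id         INTEGER NOT NULL,
        entry_date TEXT    NOT NULL,
        picture    BLOB,
        UNIQUE (id, entry_date)  -- the former composite key, still enforced
    )
""")

insert = "INSERT INTO images (id, entry_date, picture) VALUES (?, ?, ?)"
conn.execute(insert, (1, "2014-01-30", b"\x89PNG"))
try:
    # Same (id, entry_date) pair again: the unique constraint rejects it.
    conn.execute(insert, (1, "2014-01-30", b"\x89PNG"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

keys = conn.execute("SELECT img_id FROM images").fetchall()
```

With this layout, the report's BLOB attributes would only need img_id as Primary Key Column 1.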
I have a DynamoDB table with the following schema:
{
"id": String [hash key]
"type": String [range key]
}
I have a use case where I need to fetch the last 3 rows for a given id when the type is unknown.
Your items need a timestamp attribute; without one they can't be sorted or filtered by time. Once you have it, you can define a local secondary index with id as the partition key and the timestamp as the sort key (local secondary indexes can only be defined when the table is created). You can then query the index with ScanIndexForward set to false and Limit set to 3 to get the three most recent items.
See the DynamoDB documentation on Local Secondary Indexes for more information.
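A minimal sketch of that query, assuming boto3's low-level DynamoDB client. The function only builds the keyword arguments, so it runs without AWS access. The table name, the index name, and the string type of id are hypothetical. The id attribute is aliased through ExpressionAttributeNames so the expression is safe even if the name collides with a reserved word.

```python
def latest_items_query(table_name, index_name, item_id, n=3):
    """Build kwargs for boto3's DynamoDB client.query() on an LSI whose
    sort key is the timestamp attribute."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#id = :id",
        "ExpressionAttributeNames": {"#id": "id"},
        "ExpressionAttributeValues": {":id": {"S": item_id}},
        "ScanIndexForward": False,  # descending by the index sort key (newest first)
        "Limit": n,                 # stop after the n most recent items
    }

params = latest_items_query("my_table", "id-timestamp-index", "abc")
# Usage (requires AWS credentials):
#   import boto3
#   items = boto3.client("dynamodb").query(**params)["Items"]
```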
Add a timestamp attribute to the items
Use Query to fetch all the records for the given hash key
Query always returns records sorted by the range key, and you cannot sort by another attribute without changing the table's schema, so sort the records by timestamp in your code
Take the top 3 records
If you have a lot of records, use filter expressions to drop extra results. For example, if you know the latest records will always be less than an hour (day, week, ...) old, you can filter out older records.
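The client-side steps can be sketched as follows, assuming the items have already been fetched with Query (for example with boto3's Table.query, which returns plain dicts) and each carries a numeric timestamp attribute. The attribute name and the sample items are hypothetical.

```python
def latest_n(items, n=3, ts_attr="timestamp"):
    """Sort Query results newest-first by a timestamp attribute and keep the top n."""
    return sorted(items, key=lambda item: item[ts_attr], reverse=True)[:n]

# Hypothetical Query results for one id: type is the range key, timestamp an attribute.
items = [
    {"id": "abc", "type": "a", "timestamp": 100},
    {"id": "abc", "type": "b", "timestamp": 400},
    {"id": "abc", "type": "c", "timestamp": 200},
    {"id": "abc", "type": "d", "timestamp": 300},
]

print([item["type"] for item in latest_n(items)])  # ['b', 'd', 'c']
```

This reads every item for the id, so it is only practical when each id has a modest number of items. The filter expression mentioned above reduces the data returned, but DynamoDB still reads (and charges for) the filtered-out items.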
I'm having a hard time understanding Cassandra. I couldn't write this question without making it look confusing, but it should become clearer as I detail it below.
Suppose I have this datatype that I've created:
CREATE TYPE transaction (
transaction_id UUID,
value float,
transaction_date timestamp,
PRIMARY KEY (transaction_id, transaction_date)
);
PS: I'm using it as if it were a 'class', but that might be a logical mistake on my part; please correct me if it can't be used that way.
Anyway, I also have this Column Family, which contains a list of this 'transaction' datatype:
CREATE TABLE transactions_history_by_date (
wallet_address UUID,
user_id UUID,
transactions list <transaction>,
PRIMARY KEY (wallet_address, transaction_date))
WITH CLUSTERING ORDER BY (transaction_date DESC);
So what I'd like to know is whether the Column Family above is correct. I'd like to get all the transactions of a wallet, sorted by transaction date. However, the date is a column of the 'transaction' datatype, and to complicate things further, this Column Family holds a list of transactions rather than a single one.
No. In Cassandra you can sort only on the values of clustering columns, so in this case you need to move transaction_date into the table itself.
To expand on Alex's answer, in your situation I think the best approach would probably be to denormalise your table. Rather than using a UDT, you could create something like this:
CREATE TABLE transactions_history_by_date (
wallet_address UUID,
user_id UUID,
transaction_id UUID,
value float,
transaction_date timestamp,
PRIMARY KEY ((wallet_address), transaction_date, transaction_id))
WITH CLUSTERING ORDER BY (transaction_date DESC);
Now you can make the following query and the results will be sorted by date:
SELECT * FROM transactions_history_by_date WHERE wallet_address = ...;
Note that I added transaction_id as a second clustering key. Without it, the table could not hold two transactions with the same wallet_address and the same transaction_date. Rows are uniquely identified by the primary key, so a second insert with the same key would overwrite the first.
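The overwrite behaviour can be illustrated with the hypothetical ToyTable class below. It is a deliberately simplified in-memory model, not a Cassandra client. It upserts rows by their full primary key (wallet_address plus the clustering columns) and returns a partition newest-first, mimicking CLUSTERING ORDER BY (transaction_date DESC). The sample data is made up.

```python
import uuid
from datetime import datetime

class ToyTable:
    """Toy model of a Cassandra table partitioned by wallet_address."""

    def __init__(self, clustering_columns):
        self.clustering_columns = clustering_columns
        self.partitions = {}  # wallet_address -> {clustering key tuple: row}

    def insert(self, row):
        ckey = tuple(row[c] for c in self.clustering_columns)
        partition = self.partitions.setdefault(row["wallet_address"], {})
        partition[ckey] = row  # same primary key => the later write overwrites

    def select(self, wallet_address):
        rows = self.partitions.get(wallet_address, {}).values()
        # Newest first, like CLUSTERING ORDER BY (transaction_date DESC).
        return sorted(rows, key=lambda r: r["transaction_date"], reverse=True)

wallet = uuid.uuid4()
day = datetime(2020, 1, 1)
txs = [
    {"wallet_address": wallet, "transaction_id": uuid.uuid4(), "value": 10.0, "transaction_date": day},
    {"wallet_address": wallet, "transaction_id": uuid.uuid4(), "value": 20.0, "transaction_date": day},
    {"wallet_address": wallet, "transaction_id": uuid.uuid4(), "value": 5.0,
     "transaction_date": datetime(2020, 1, 2)},
]

date_only = ToyTable(["transaction_date"])                      # PRIMARY KEY ((wallet), date)
date_and_id = ToyTable(["transaction_date", "transaction_id"])  # PRIMARY KEY ((wallet), date, id)
for tx in txs:
    date_only.insert(tx)
    date_and_id.insert(tx)

print([r["value"] for r in date_only.select(wallet)])    # [5.0, 20.0]: one same-day row lost
print([r["value"] for r in date_and_id.select(wallet)])  # [5.0, 10.0, 20.0]
```

Rows sharing a date come back in insertion order here. In Cassandra they would also be ordered by transaction_id.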
I'm using boto's dynamodb2 module with Python. I have the following code for creating a table:
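from boto.dynamodb2.fields import HashKey, RangeKey, GlobalAllIndex
from boto.dynamodb2.table import Table
from boto.dynamodb2.types import NUMBER, STRING
# conn is an existing boto.dynamodb2 connection
table = Table.create('mySecondTable',
    schema=[HashKey('ID'), RangeKey('advertiser')],
    throughput={'read': 5, 'write': 2},
    global_indexes=[GlobalAllIndex('otherDataIndex', parts=[
        HashKey('date', data_type=NUMBER),
        RangeKey('publisher', data_type=STRING),
    ], throughput={'read': 5, 'write': 3})],
    connection=conn)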
I would like to be able to query by the following attributes:
ID, advertiser, date, publisher, size, and color
That means I need a different schema: when I add additional attributes, I can't query on them unless the attribute name is listed in the schema.
The problem is that right now I can only query by ID, advertiser, date, and publisher. How can I add additional attributes that I can query by?
I read this page, which appears to say that it is possible:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html
However, there is no example here:
http://boto.readthedocs.org/en/latest/dynamodb2_tut.html
I tried adding an additional range key, but it doesn't work (cannot have duplicates).
I'd like it to look something like this:
table = Table.create('mySecondTable',
schema=[
RangeKey('advertiser'),
otherKey('date')
fourthKey('publisher') ... etc
throughput={'read':5,'write':2},
connection=conn)
Thanks!
If you want additional range keys, you need to use local secondary indexes (LSIs). Each LSI shares the table's hash key and defines its own range key.
You query an LSI the same way you query the base table: provide an exact value for the hash key and, optionally, a comparison predicate on the range key.
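A sketch of such a table definition, under several assumptions. It uses boto3's low-level client.create_table parameters rather than the question's boto dynamodb2 classes. The types of size (number) and color (string) and the index names are hypothetical. The existing date/publisher GSI is left out for brevity; it would go under GlobalSecondaryIndexes. The function only builds the arguments, so it runs without AWS access. LSIs have to be declared when the table is created.

```python
def table_with_lsis(table_name, extra_sort_keys):
    """Build kwargs for boto3's DynamoDB client.create_table(): base key
    (ID, advertiser) plus one LSI per extra queryable attribute."""
    attribute_defs = [
        {"AttributeName": "ID", "AttributeType": "S"},
        {"AttributeName": "advertiser", "AttributeType": "S"},
    ]
    lsis = []
    for name, attr_type in extra_sort_keys:
        attribute_defs.append({"AttributeName": name, "AttributeType": attr_type})
        lsis.append({
            "IndexName": f"{name}Index",
            "KeySchema": [
                {"AttributeName": "ID", "KeyType": "HASH"},   # same hash key as the table
                {"AttributeName": name, "KeyType": "RANGE"},  # alternate range key
            ],
            "Projection": {"ProjectionType": "ALL"},
        })
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "ID", "KeyType": "HASH"},
            {"AttributeName": "advertiser", "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": attribute_defs,
        "LocalSecondaryIndexes": lsis,
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 2},
    }

params = table_with_lsis("mySecondTable", [("size", "N"), ("color", "S")])
# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("dynamodb").create_table(**params)
```

A query on, say, colorIndex would then pass IndexName="colorIndex" and a key condition on ID plus, optionally, color.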
I'm trying to code a many-to-many relationship in C++ with sqlite3.
In the diagram below, managers can add many job opportunities, and each job opportunity can be added by many managers.
My CREATE TABLE statements:
"CREATE TABLE Manager(" \
"manager_id INTEGER PRIMARY KEY NOT NULL,"\
"name varchar(45) NOT NULL);"
"CREATE TABLE jobs ("
"jobId INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,"\
"jobTitle varchar(45) NOT NULL);"
"CREATE TABLE Add ("
"manager_id,jobId INTEGER PRIMARY KEY NOT NULL,"\
"date varchar(45) NOT NULL,"\
"FOREIGN KEY(manager_id) REFERENCES Manager(manager_id),"\
"FOREIGN KEY(job_id) REFERENCES jobs(job_id));";
My Manager table is populated with the following data:
1|john
2|bob
Let's say manager john has added two jobs, with jobTitle jobA and jobB.
Then my insert statement code will look like this: http://pastebin.com/0E8CzPgX
Then my jobs table is populated with the following data:
1|jobA
2|jobB
The final step is to take john's ID (manager_id = 1) and the two job IDs (1, 2) and insert them into the Add table. I don't know how I should code this so that the Add table ends up like this:
Add table:
manager_id|job_id|date
1 | 1 |30-01-2014
1 | 2 |30-01-2014
Please advise. Thanks.
Do you mean something like
sql = "INSERT INTO Add(manager_id,jobId,date) VALUES (?,?,?);";
?
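Your problem seems to be that you defined jobId as the primary key of the Add table, which you don't want. A primary key on jobId alone lets each job appear only once in Add, so a job could never be linked to a second manager:
jobId INTEGER PRIMARY KEY NOT NULL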
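A minimal sketch of that failure, using Python's sqlite3 module for brevity (the question's code is C++, but the SQL behaves the same through the C API). The table name is double-quoted because ADD is also an SQL keyword. The column types are simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Link table as declared in the question: jobId alone is the primary key.
conn.execute('CREATE TABLE "Add" (manager_id INTEGER, '
             'jobId INTEGER PRIMARY KEY NOT NULL, date TEXT NOT NULL)')

sql = 'INSERT INTO "Add"(manager_id, jobId, date) VALUES (?, ?, ?)'
conn.execute(sql, (1, 1, "30-01-2014"))      # john -> jobA: accepted
try:
    conn.execute(sql, (2, 1, "30-01-2014"))  # bob -> jobA: same jobId again
    second_manager_ok = True
except sqlite3.IntegrityError:
    second_manager_ok = False                # rejected by the jobId primary key
```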
A common approach to many-to-many relations in a database is to include an intermediate table.
This intermediate table (let's call it Manager_jobs) would have at least 2 columns, both referring to other tables via foreign key. The first attribute would be the primary key of Manager, the second one the primary key of jobs.
Each time you add a job, you just add an entry to Manager_jobs with the corresponding foreign keys.
So, Manager_jobs would look like this:
ManagerID | JobID
==========|======
4 | 2
3 | 2
4 | 1
As you can see, Manager_jobs can encode that a Manager has multiple jobs assigned and vice versa.
This approach, of course, requires you to have some form of primary key for both data tables.
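A sketch of the intermediate-table approach in SQLite, again via Python's sqlite3 for brevity. From C++ the same SQL would go through sqlite3_prepare_v2 and the sqlite3_bind_* functions, and sqlite3_last_insert_rowid returns the new jobId. Manager_jobs gets a composite primary key (manager_id, jobId), so each pairing is stored once while a manager can have many jobs and a job many managers. The date column follows the question's desired output. The add_job helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Manager (manager_id INTEGER PRIMARY KEY NOT NULL, name VARCHAR(45) NOT NULL);
CREATE TABLE jobs (jobId INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, jobTitle VARCHAR(45) NOT NULL);
CREATE TABLE Manager_jobs (
    manager_id INTEGER NOT NULL REFERENCES Manager(manager_id),
    jobId      INTEGER NOT NULL REFERENCES jobs(jobId),
    date       TEXT    NOT NULL,
    PRIMARY KEY (manager_id, jobId)   -- one row per manager/job pairing
);
""")
conn.executemany("INSERT INTO Manager (manager_id, name) VALUES (?, ?)",
                 [(1, "john"), (2, "bob")])

def add_job(manager_id, title, date):
    """Insert a job, then link it to the manager who added it."""
    cur = conn.execute("INSERT INTO jobs (jobTitle) VALUES (?)", (title,))
    job_id = cur.lastrowid  # the jobId generated by AUTOINCREMENT
    conn.execute("INSERT INTO Manager_jobs (manager_id, jobId, date) VALUES (?, ?, ?)",
                 (manager_id, job_id, date))
    return job_id

add_job(1, "jobA", "30-01-2014")
add_job(1, "jobB", "30-01-2014")
# The same job can also be linked to another manager (bob -> jobA).
conn.execute("INSERT INTO Manager_jobs VALUES (?, ?, ?)", (2, 1, "30-01-2014"))

rows = conn.execute(
    "SELECT manager_id, jobId, date FROM Manager_jobs ORDER BY manager_id, jobId"
).fetchall()
```

The first two rows of rows match the desired table from the question. The third row shows the reverse direction of the relationship.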