I have a field in my database that has 5 possible values: fair, good, very good, ideal, signature ideal.
I have a ColdFusion form that has 2 drop-downs, each with all the values. What I am looking to do is let the user select a range. For example, dropdown1 = Fair and dropdown2 = Very Good, which would somehow generate the SQL WHERE clause:
grade IN ('fair', 'good', 'very good')
Can you think of a smart way to program this, given that the values have to stay as they are? I think maybe I could put them in an array and loop through it, or something. I'm a little stumped on this; any help would be appreciated.
As others mentioned, redesigning is ultimately the better course of action, both in terms of efficiency and data integrity. However, if you absolutely cannot change the structure, a possible workaround is to create a lookup table of the allowable grade descriptions, along with a numeric rating value for each one:
GradeID | GradeText | Rating
1 | Fair | 0
2 | Good | 1
3 | Very Good | 2
4 | Ideal | 3
5 | Signature Ideal | 4
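For reference, here is a minimal sketch of how such a lookup table might be created and seeded (the table name YourLookupTable is a placeholder matching the query further below; adjust names, types, and syntax to your database):
-- illustrative DDL; adjust to your schema
CREATE TABLE YourLookupTable (
    GradeID   INT         PRIMARY KEY,
    GradeText VARCHAR(50) NOT NULL,
    Rating    INT         NOT NULL
);

INSERT INTO YourLookupTable (GradeID, GradeText, Rating) VALUES
    (1, 'Fair', 0),
    (2, 'Good', 1),
    (3, 'Very Good', 2),
    (4, 'Ideal', 3),
    (5, 'Signature Ideal', 4);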
Then populate your select list from a query on the lookup table. Be sure to ORDER BY Rating ASC and use the rating number as the list value. Then on your action page, use the selected values to filter by range. (Obviously, validate the submitted range on the server side as well.)
SELECT t.ColumnName1, t.ColumnName2
FROM SomeTable t INNER JOIN YourLookupTable lt ON lt.GradeText = t.grade
WHERE lt.Rating BETWEEN <cfqueryparam value="#form.dropdown1#" cfsqltype="cf_sql_integer">
AND <cfqueryparam value="#form.dropdown2#" cfsqltype="cf_sql_integer">
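On the validation point: if users might pick the bounds in either order, BETWEEN returns no rows when dropdown1's rating is higher than dropdown2's. Assuming your database supports LEAST() and GREATEST() (MySQL, PostgreSQL, and Oracle do), one way to make the filter order-proof is:
WHERE lt.Rating BETWEEN LEAST(<cfqueryparam value="#form.dropdown1#" cfsqltype="cf_sql_integer">,
                              <cfqueryparam value="#form.dropdown2#" cfsqltype="cf_sql_integer">)
                    AND GREATEST(<cfqueryparam value="#form.dropdown1#" cfsqltype="cf_sql_integer">,
                                 <cfqueryparam value="#form.dropdown2#" cfsqltype="cf_sql_integer">)
Alternatively, swap the two values in ColdFusion before building the query.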
Again, I would recommend restructuring instead. However, the above should work if that is really not an option.
I have a simple table that stores a list of names for a particular record. I want to ensure that a name can never be used by more than one record. The Names column should also never be empty; there should always be at least 1 name.
| ID  | Names                             |
|-----|-----------------------------------|
| 111 | [john, bob]                       |
| 222 | [tim]                             |
| 333 | [bob] (invalid: bob already used) |
The easiest solution for this case, I believe, is to use a second table in which the values you want to keep unique are the primary key. Then, in application code, simply check that table to determine whether the value exists before creating a new record in the primary table. For a List[L] this avoids having to traverse every list of every record to determine whether a particular scalar value already exists. Credit to Tamas.
https://advancedweb.hu/how-to-properly-implement-unique-constraints-in-dynamodb/
Here’s a blog post describing how to best enforce uniqueness constraints: https://aws.amazon.com/blogs/database/simulating-amazon-dynamodb-unique-constraints-using-transactions/
User id 8a0615d2-b123-4714-b76e-a9607a518979 has many entries in the mylog table, each with an ip_id field. I'd like to see a weighted list of these ip_id values.
In SQL I use:
select distinct(ip_id), count(ip_id) from mylog
where user_id = '8a0615d2-b123-4714-b76e-a9607a518979'
group by ip_id
this gets me:
ip_id count
--------------------------------------+--------
84285515-0855-41f4-91fb-bcae6bf840a2 | 187
fc212052-71e3-4489-86ff-eb71b73c54d9 | 102
687ab635-1ec9-4c0a-acf1-3a20d0550b7f | 84
26d76a90-df12-4fb7-8f9e-a5f9af933706 | 18
389a4ae4-1822-40d2-a4cb-ab4880df6444 | 10
b5438f47-0f3a-428b-acc4-1eb9eae13c9e | 3
Now I am trying to get the same result in Django. It's surprisingly elusive.
Getting the user:
u = User.objects.get(id='8a0615d2-b123-4714-b76e-a9607a518979') #this works fine.
I tried:
logs = MyLog.objects.filter(Q(user=u) & Q(ip__isnull=False)).values('ip').annotate(total=Count('ip', distinct=True))
I am getting 6 rows in logs which is fine, but the count is always 6, not the weight of the unique ip as it is in the SQL response above.
What am I doing wrong?
You seem to be mistaken about what the keyword argument distinct does in the Count function. It simply means you want to count only the distinct values (which is not what you want here). In fact, the distinct(ip_id) part of your SQL query is also redundant, since you are grouping by that column anyway.
(And if your actual code has .value('ip') rather than .values('ip'), that is a typo; it should be .values('ip').)
So your ORM query should be:
logs = MyLog.objects.filter(Q(user=u) & Q(ip__isnull=False)).values('ip').annotate(total=Count('ip'))
I have a table in Amazon DynamoDB with a partition key and a range key.
Table structure
Subscriber ID (partition key) | Item Id (Range Key) | Date |...
123 | P_345 | some date 1 | ...
123 | I_456 | some date 2 |
123 | A_678 | some date 3 | ...
Now I want to retrieve data from the table using the QueryAsync C# library with multiple scan conditions:
HashKey = 123
Condition 1: Date is between 'some date 1' and 'some date 2'.
Condition 2: Range Key begins_with 'I_' or 'P_'.
Is there any way I can achieve this using the C# DynamoDB APIs?
Please help
You'll need to do the following (I'm not a C# expert, but these instructions should help you find the right C# syntax):
Because you are looking for a specific hashkey, this will be a Query request, not a Scan.
You have a begins_with() condition on the range key. You specify that using the KeyConditionExpression parameter to the Query. The KeyConditionExpression will ask for HashKey=123 AND begins_with(RangeKey,"P_").
However, KeyConditionExpression does not allow an "OR" (rangekey begins with either "P_" or "I_"). You'll just need to run two separate queries - one with "I_" and one with "P_" (you can even do the two queries in parallel, if you wish).
The date is not one of the key columns, so you will need to filter it with a FilterExpression parameter to the query. Note that filtering only happens in the last step, after DynamoDB has already read all the items matching the KeyConditionExpression above (this may increase your costs if filtering removes a lot of items, because you still pay to read them).
I want to implement a search query for a bookshop. I use MySQL, and I have a varchar column which contains the name, author, and other details, like "The Tragedy of Hamlet, Prince of Denmark, by William Shakespeare". I want a search like "shakespeare tragedy" or "denmark tragedy" to return the books that have all of those words in that one column.
I have three ways to implement this, but I want to know about their performance.
LIKE %%
My first way is to split the search text into words and build a dynamic query based on the number of words:
SELECT * FROM books
WHERE name LIKE '%shakespeare%'
AND name LIKE '%tragedy%'
But I was told that LIKE is a slow operator, especially with two %, because it cannot use an index.
TAG table and relational division
My second way is to have another table which contains tags like:
-------------------------
| book_id | tag |
|-----------------------|
| 1 | Tragedy |
| 1 | Hamlet |
| 1 | Prince |
| 1 | Denmark |
| 1 | William |
| 1 | Shakespeare |
-------------------------
And build a dynamic relational-division query:
SELECT DISTINCT book_id FROM booktag AS b1
WHERE ((SELECT 'shakespeare' as tag UNION SELECT 'tragedy' as tag)
EXCEPT
SELECT tag FROM booktag AS b2 WHERE b1.book_id = b2.book_id) IS NULL
But I was told that relational division is very slow, too.
REGEXP
My third way is to use regular expressions:
SELECT * FROM books
WHERE name REGEXP '(?=.*shakespeare)(?=.*tragedy)'
But someone told me that it is slower than LIKE.
Please help me decide which way is fastest.
Surely LIKE, which is a built-in operator, is better optimized than a regular expression. But there is an important point here: you cannot really compare these two approaches, because LIKE only does simple wildcard matching against a string, while a regex matches a string against a pattern which can be much more complex.
Anyway, the best approaches that come to my mind for this are the following:
Use LIKE on your column, which has been indexed properly.1 (See the sketch after this list.)
Use an optimized search technology like Elasticsearch.
Implement a multithreaded algorithm,2 which performs very well for IO-bound tasks. For this one you can use tricks like defining an offset and dividing the table among the threads.
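To illustrate point 1, here is a minimal sketch (the index name is made up); a plain B-tree index only helps LIKE when the pattern has no leading wildcard:
-- hypothetical index on the search column
CREATE INDEX idx_books_name ON books (name);

-- can use the index: no leading wildcard
SELECT * FROM books WHERE name LIKE 'The Tragedy%';

-- cannot use the index: the leading % forces a full scan
SELECT * FROM books WHERE name LIKE '%shakespeare%' AND name LIKE '%tragedy%';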
Also, for some alternative approaches, read this article: https://technet.microsoft.com/en-us/library/aa175787%28v=sql.80%29.aspx
1. You should be careful about the way you put indexes on your columns. Read this answer for more info: https://stackoverflow.com/a/10354292/2867928 and this post: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
2. Read this answer for more info: Multi Thread in SQL?
I use the following query to create my table.
create table t1 (url varchar(250) unique);
Then I insert about 500 URLs, twice. Since the URLs are already there the second time, I am expecting no new entries to show up in my table, but instead the count doubles for:
select count(*) from t1;
What I want is that when I try to add a URL that is already in my table, it is skipped.
Have I declared something in my table declaration incorrectly?
I am using Redshift from AWS.
Sample
urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
url | wascrawled | source | date_crawled
-----------------------+------------+--------+--------------
http://www.google.com | 0 | 1 |
(1 row)
urlenrich=# insert into seed(url, source) select 'http://www.google.com', '1';
INSERT 0 1
urlenrich=# select * from seed;
url | wascrawled | source | date_crawled
-----------------------+------------+--------+--------------
http://www.google.com | 0 | 1 |
http://www.google.com | 0 | 1 |
(2 rows)
Output of \d seed
urlenrich=# \d seed
Table "public.seed"
Column | Type | Modifiers
--------------+-----------------------------+-----------
url | character varying(250) |
wascrawled | integer | default 0
source | integer | not null
date_crawled | timestamp without time zone |
Indexes:
"seed_url_key" UNIQUE, btree (url)
Figured out the problem
Amazon Redshift does not enforce constraints...
As explained here
http://docs.aws.amazon.com/redshift/latest/dg/t_Defining_constraints.html
They said they may get around to changing it at some point.
NEW 11/21/2013
RDS has added support for Postgres; if you need unique constraints and the like, a Postgres RDS instance is now the best way to go.
In Redshift, constraints are recommended but not enforced; they just help the query planner select better ways to perform the query.
Usually, columnar databases do not manage indexes or constraints.
Although Amazon Redshift doesn't enforce unique constraints, there are some ways to delete duplicated records that can be helpful.
See the following link for the details.
copy data from Amazon s3 to Red Shift and avoid duplicate rows
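For example, one common pattern (a rough sketch only, not necessarily the exact approach from the linked answer; table and column names follow the seed table above) is to rebuild the table keeping one row per duplicated key:
-- keep one row per url, then swap the tables
CREATE TABLE seed_dedup AS
SELECT url, wascrawled, source, date_crawled
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY date_crawled) AS rn
    FROM seed
) t
WHERE rn = 1;

ALTER TABLE seed RENAME TO seed_old;
ALTER TABLE seed_dedup RENAME TO seed;
DROP TABLE seed_old;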
Primary and unique key enforcement in distributed systems, never mind column store systems, is difficult. Both Redshift (ParAccel) and Vertica face the same problems.
The challenge with a column store is that the question being asked is "does this table row have a matching entry in another table", but column stores are not designed for row operations.
In HP Vertica there is an explicit command to report on constraint violations.
In Redshift it appears that you have to roll your own.
SELECT COUNT(*) AS TotalRecords, COUNT(DISTINCT {your PK_Column}) AS UniqueRecords
FROM {Your table}
HAVING COUNT(*) > COUNT(DISTINCT {your PK_Column})
Obviously, if you have a multi-column PK you have to do something more heavyweight.
SELECT COUNT(*)
FROM (
SELECT {PkColumns}
FROM {Your Table}
GROUP BY {PKColumns}
HAVING COUNT(*)>1
) AS DT
If the above returns a value greater than zero then you have a primary key violation.
For anyone who:
Needs to use Redshift
Wants unique inserts in a single query
Doesn't care too much about query performance
Only really cares about inserting a single unique value at a time
Here's an easy way to get it done:
INSERT INTO MY_TABLE (MY_COLUMNS)
SELECT MY_UNIQUE_VALUE WHERE MY_UNIQUE_VALUE NOT IN (
SELECT MY_UNIQUE_VALUE FROM MY_TABLE
WHERE MY_UNIQUE_COLUMN = MY_UNIQUE_VALUE
)