Field order when querying MySQL from Clojure JDBC - clojure

When I query MySQL from Clojure (JDBC), the fields are not returned in the order specified in the SELECT clause (I'm calling a stored procedure that does the select). Initially it seemed that the fields were returned in reverse order, but this happens only if there are 1 to 9 fields. Adding a tenth field makes the result set go in no particular order, although the order is always the same for a given number of fields in the result set.
Has anyone else observed this?

You can ask java.jdbc to return individual rows as vectors rather than maps (vectors, not Java arrays, in spite of the option name) by passing :as-arrays? true to query; field order will then be preserved:
;; checked with java.jdbc 0.3.0-alpha4
(query db [sql params...] :as-arrays? true)
Note that in this mode of operation an extra vector containing the keys corresponding to the column names (which would otherwise be used in the constructed maps) will be prepended to the seq of actual result vectors.
By default, java.jdbc returns rows as maps, as per Arthur's answer. These will be array maps up to 9 entries (maintaining insertion order) and hash maps beyond this threshold (with no useful ordering).

It is most likely that the fields are being returned in order and then subsequently re-ordered by the data structure they are packed into by the Clojure JDBC library. Assuming you are using clojure.java.jdbc, the results are returned as a list of maps like this:
{:name "Apple" :appearance "rosy" :cost 24}
{:name "Orange" :appearance "round" :cost 49}
where each map is one row. The order of the rows is preserved because they are presented in a list, but the order of the fields is not, because they are presented in maps (which do not guarantee ordering). You can sort the fields afterwards if you need a particular order, or call query with :as-arrays? true to preserve the field order.

Related

Querying DynamoDB with a partition key and list of specific sort keys

I have a DynamoDB table that, for the sake of this question, looks like this:
id (String partition key)
origin (String sort key)
I want to query the table for a subset of origins under a specific id.
From my understanding, the only operator DynamoDB allows on sort keys in a Query are 'between', 'begins_with', '=', '<=' and '>='.
The problem is that my query needs a form of 'CONTAINS' because the 'origins' list is not necessarily ordered (for a between operator).
If this was SQL it would be something like:
SELECT * from Table where id={id} AND origin IN {origin_list}
My exact question is: what do I need to do to achieve this functionality in the most efficient way? Should I change my table structure? Maybe add a GSI? Open to suggestions.
I am aware that this can be achieved with a Scan operation but I want to have an efficient query. Same goes for BatchGetItem, I would rather avoid that functionality unless absolutely necessary.
Thanks
This is a case for using Filter Expressions for Query. They support the IN operator:
Comparison Operator
a IN (b, c, d) — true if a is equal to any value in the list — for
example, any of b, c or d. The list can contain up to 100 values,
separated by commas.
However, you cannot use filter expressions on key attributes.
Filter Expressions for Query
A filter expression cannot contain partition key or sort key
attributes. You need to specify those attributes in the key condition
expression, not the filter expression.
So, what you could do is not use origin as a sort key (or duplicate it into another non-key attribute) and filter on it after the query. Of course, the filter first reads all the items with that id and filters afterwards, which consumes read capacity and is less efficient, but there is no other way to query it otherwise. Depending on your item sizes, query frequency, and the estimated number of returned items, BatchGetItem could be a better choice.
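The "query by id, then filter origin" idea can be sketched in plain Python. The item shapes and origin values below are made up for illustration; in real code the filtering would typically be a boto3 FilterExpression rather than a client-side loop, but the read-capacity behavior is the same either way: every item with that id is read first, then filtered.

```python
def filter_by_origins(items, origin_list):
    """Client-side equivalent of an IN filter on `origin`:
    keep only items whose origin is in the requested subset."""
    wanted = set(origin_list)
    return [item for item in items if item.get("origin") in wanted]

# Items as a Query on id="user-1" might return them (all origins for that id):
fetched = [
    {"id": "user-1", "origin": "web"},
    {"id": "user-1", "origin": "mobile"},
    {"id": "user-1", "origin": "api"},
]

print(filter_by_origins(fetched, ["web", "api"]))
# → [{'id': 'user-1', 'origin': 'web'}, {'id': 'user-1', 'origin': 'api'}]
```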

Is there a benefit to using STORING clause on an INTERLEAVE secondary index in Cloud Spanner?

If we are using the Interleave option with a secondary index, is there still benefit to using the storing clause?
https://cloud.google.com/spanner/docs/secondary-indexes
Yes, there can still be benefit, although it's less likely to be a major benefit:
Let's say you have interleaved tables Singers->Albums->Songs, and you have an index:
CREATE INDEX SongsBySingerSongName ON Songs(SingerId, SongName),
INTERLEAVE IN Singers
Let's also assume that Songs has a FLOAT64 column, LengthInSeconds, for storing the length of a song.
If you wanted to look up all songs for SingerId 123 that started with "T" and were less than 4 minutes long, your query could be executed by:
1. Using SongsBySingerSongName to look up all songs for Singer 123 that start with "T"
2. For those songs, back-joining with Songs to look up LengthInSeconds and filter by length.
Since both Songs and SongsBySingerSongName are interleaved in the Singers table, we know that our data should all be in the same split, which means it will all reside on the same machine, which means the back-join in step (2) won't be terribly costly. However, the local back-join still incurs a cost to lookup the data, so saving step (2) by having a STORING clause could still reduce your query latency and overall cost. You would want to do benchmarking of your workload to see if the extra storing clause provides a net-benefit.
In general, if you have filters in your query that refer to columns in the index (either key columns or 'storing' columns), the filters can be evaluated before doing the back-join to the base table, and if the filter does not match, the back-join can be avoided. If the filter refers to a column that is not in the index, the back-join has to be done first to get the column value that the filter refers to.
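For comparison, here is a sketch of the same index with a STORING clause, so the length filter can be evaluated from the index alone and step (2)'s back-join skipped entirely (LengthInSeconds is the assumed column from the example above; check the DDL against your Spanner version):

```sql
CREATE INDEX SongsBySingerSongName ON Songs(SingerId, SongName)
  STORING (LengthInSeconds), INTERLEAVE IN Singers
```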

DynamoDB create index on map or list type

I'm trying to add an index to an attribute inside of a map object in DynamoDB and can't seem to find a way to do so. Is this something that is supported, or are indexes really only allowed on scalar values? The documentation around this seems to be quite sparse. I'm hoping that the indexing functionality is similar to MongoDB, but so far the approach I've taken of referencing the attribute to index using dot syntax has not been successful. Any help or additional info that can be provided is appreciated.
Indexes can be built only on top-level JSON attributes. In addition, range keys must be scalar values in DynamoDB (one of String, Number, Binary, or Boolean).
From http://aws.amazon.com/dynamodb/faqs/:
Q: Is querying JSON data in DynamoDB any different?
No. You can create a Global Secondary Index or Local Secondary Index
on any top-level JSON element. For example, suppose you stored a JSON
document that contained the following information about a person:
First Name, Last Name, Zip Code, and a list of all of their friends.
First Name, Last Name and Zip code would be top-level JSON elements.
You could create an index to let you query based on First Name, Last
Name, or Zip Code. The list of friends is not a top-level element,
therefore you cannot index the list of friends. For more information
on Global Secondary Indexing and its query capabilities, see the
Secondary Indexes section in this FAQ.
Q: What data types can be indexed?
All scalar data types (Number, String, Binary, and Boolean) can be
used for the range key element of the local secondary index key. Set,
list, and map types cannot be indexed.
I have tried doing hash(str(object)) while storing the object separately. This hash gives me an integer (Number) and I am able to use a secondary index on it. Below is a sample in Python; it is important to use a hash function which generates the same hash key every time for the same value, so I am using sha1.
# Generate a small integer hash:
import hashlib

def hash_8_digits(source):
    return int(hashlib.sha1(source.encode()).hexdigest(), 16) % (10 ** 8)
The idea is to keep the entire object small while keeping the entity intact, i.e. rather than serializing the object to a string and changing the whole way it is used, I store a smaller hash value alongside the actual list or map.
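A quick check of the property that matters here: unlike Python's built-in hash(), which is salted per process for strings, the sha1-based key is deterministic across runs and always fits in 8 digits (the serialized value below is just an illustrative stand-in for str(object)):

```python
import hashlib

def hash_8_digits(source):
    # Same function as above: stable 8-digit integer derived from sha1.
    return int(hashlib.sha1(source.encode()).hexdigest(), 16) % (10 ** 8)

key = hash_8_digits('{"name": "Apple", "cost": 24}')
assert key == hash_8_digits('{"name": "Apple", "cost": 24}')  # deterministic
assert 0 <= key < 10 ** 8  # indexable as a DynamoDB Number
print(key)
```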

Datomic queries and laziness

I'm surprised to find that query results in datomic are not lazy, when entities are.
Is there an obvious rationale for this choice that I am missing? It seems reasonable that someone might want to (map some-fn (take 100 query-result-containing-millions)), but this would force the evaluation of the entire set of entity ids, no?
Is there a way to get a lazy seq (of entity-ids) directly back from the query, or do they always have to be loaded into memory first, with laziness only available through the entity?
You can use the datomic.api/datoms fn to get access to entities in a lazy way.
Note that you have to specify the index type when calling datoms, and the types of indexes available to you depend on the type of the attribute that you're interested in. E.g. the :avet index is only available if your attribute has :db/index set in the schema, and the :vaet index is only available if your attribute is of type :db.type/ref.
We use something like this at work (note: the attribute, ref-attr, must be of :db.type/ref for this to work):
(defn datoms-by-ref-value
  "Returns a lazy seq of all the datoms in the database matching the
  given reference attribute value."
  [db ref-attr value]
  (d/datoms db :vaet value ref-attr))
The datoms documentation is a bit sparse, but with some trial and error you can probably work out what you need. There's a post by August Lilleaas about using the :avet index (which requires an index on the attribute in the datomic schema) that I found somewhat helpful.

CouchDB reduce error?

I have a sample database in CouchDB with the information of a number of aircraft, and a view which shows the manufacturer as key and the model as the value.
The map function is
function(doc) {
  emit(doc["Manufacturer"], doc._id);
}
and the reduce function is
function(keys, values, rereduce) {
  return values.length;
}
This is pretty simple. And I indeed get the correct result when I show the view using Futon, where I have 26 Boeing aircraft:
"BOEING" 26
But if I use a REST client to query the view using
http://localhost:6060/aircrafts/_design/basic/_view/VendorProducts?key="BOEING"
I get
{"rows":[
{"key":null,"value":2}
]}
I have tested different clients (including web browser, REST client extensions, and curl), all give me the value 2! While queries with other keys work correctly.
Is there something wrong with the MapReduce function or my query?
The issue could be due to grouping.
Using group=true (which is Futon's default), you get a separate reduce value for each unique key in the map - that is, all values which share the same key are grouped together and reduced to a single value.
Were you passing group=true as a query parameter when querying with curl etc.? Since it is passed by default in Futon, you saw results like
BOEING : 26
whereas without group=true only the overall reduced value is returned.
So try this query
http://localhost:6060/aircrafts/_design/basic/_view/VendorProducts?key="BOEING"&group=true
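One gotcha when building that URL from a script is that the key parameter must be JSON-encoded, hence the quotes around BOEING. A small Python sketch using only the standard library, with the host and database names taken from the question:

```python
import json
from urllib.parse import urlencode

# json.dumps adds the quotes CouchDB expects around a string key;
# urlencode then percent-encodes them for the query string.
base = "http://localhost:6060/aircrafts/_design/basic/_view/VendorProducts"
params = urlencode({"key": json.dumps("BOEING"), "group": "true"})
url = f"{base}?{params}"
print(url)
```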
You seem to be falling into the re-reduce trap. CouchDB, strictly speaking, uses a map-reduce-rereduce process.
Map: reformats your data in the output format.
Reduce: aggregates the data of several (but not necessarily all) entries with the same key - which works correctly in your case.
Re-reduce: does the same as reduce, but on previously reduced data.
As you change the format of the value in the reduce stage, the re-reduce call will aggregate the number of already reduced values.
Solutions:
You can just set the value in the map to 1 and reduce a sum of the values.
You can check for rereduce == true and in that case return a sum of the values, which will be the integer counts returned by the initial reduce calls.
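The rereduce trap can be simulated outside CouchDB. This Python sketch is a simplified model (CouchDB decides its own batching; the two batches of 13 are made up), putting the broken reduce from the question next to the second suggested fix:

```python
def broken_reduce(values, rereduce=False):
    # Mirrors the JavaScript reduce above: always returns a count.
    return len(values)

def fixed_reduce(values, rereduce=False):
    # Solution 2: on rereduce, sum the partial counts instead of counting them.
    return sum(values) if rereduce else len(values)

# 26 BOEING rows arrive in two partial batches of 13:
batch_a, batch_b = ["id"] * 13, ["id"] * 13

partials_broken = [broken_reduce(batch_a), broken_reduce(batch_b)]  # [13, 13]
partials_fixed = [fixed_reduce(batch_a), fixed_reduce(batch_b)]     # [13, 13]

print(broken_reduce(partials_broken, rereduce=True))  # 2  -- counts the partials
print(fixed_reduce(partials_fixed, rereduce=True))    # 26 -- sums them
```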