Index versions (historical) data - amazon-web-services

I have a table in DynamoDB storing all the old versions of each item. Is there a way to index the version data using a search engine or something similar so that I can handle the following scenarios gracefully?
Use case 1:
Search for current items that match a specific query, with sorting and pagination, and get their historical data for a date range (e.g. between January and June, what was the status of each of those items in each month?). Ideally this would happen in one query.
Use case 2:
Find differences in a field based on the version history.
e.g. Find all items whose field changed from 'A' to 'B'.
Current state:
Use case 1:
1. Get all the items matching the query first.
2. For each item, issue a separate query to get all the versions of that item.
3. Do a lot of in-memory filtering in the back end to extract the data within the date range (see the sketch after this list).
Use case 2:
Not supported at the moment.
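For reference, a minimal boto3 sketch of that current flow for use case 1 (all table and attribute names here are hypothetical):

import boto3
from boto3.dynamodb.conditions import Attr, Key

dynamodb = boto3.resource("dynamodb")
items = dynamodb.Table("Items")            # hypothetical current-items table
versions = dynamodb.Table("ItemVersions")  # hypothetical versions table

# 1. Get all the items matching the query (here: a filtered scan).
matched = items.scan(FilterExpression=Attr("category").eq("book"))["Items"]

# 2. One query per matched item for its versions in the date range,
# 3. then filter/aggregate in memory (e.g. pick one status per month).
jan = 1420070400000  # 2015-01-01 UTC as epoch millis
jun = 1435708799000  # 2015-06-30 UTC as epoch millis
history = {}
for item in matched:
    resp = versions.query(
        KeyConditionExpression=Key("item_id").eq(item["id"])
        & Key("version_ts").between(jan, jun)
    )
    history[item["id"]] = resp["Items"]

The per-item loop is exactly the N+1 query pattern that makes this flow expensive, which is why offloading the version history to a search index is attractive.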

Related

Django filter get at least n records

Is there an efficient way to get at least n records for a given filter and order? More specifically, I want to get a list of all entries in a model that have a certain date field greater than a month ago; but if fewer than 10 entries match that filter, I want to relax the filter and get at least 10 entries anyway. I can do this by getting the count, checking it, and making the query again, but I was wondering if there is a better way to do it.
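A minimal Django sketch of the relax-the-filter fallback described above, assuming a hypothetical Entry model with a date field:

from datetime import timedelta
from django.utils import timezone
from myapp.models import Entry  # hypothetical app and model

def entries_at_least(n=10):
    cutoff = timezone.now() - timedelta(days=30)
    # All entries newer than a month ago, newest first.
    matching = list(Entry.objects.filter(date__gt=cutoff).order_by("-date"))
    if len(matching) < n:
        # Too few matches: relax the filter and take the n newest overall.
        matching = list(Entry.objects.order_by("-date")[:n])
    return matching

Evaluating the first queryset directly, instead of calling .count() and then querying again, keeps it to at most two queries.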

Fastest way to select several inserted rows

I have a table in a database which stores items. Each item has a unique ID, which the DB generates upon insertion (auto-increment).
A user may perform a specific task that will add X items to the database; my program (a C++ server application using the MySQL connector) should return the IDs that the database generated right away. For example, if I add 6 items, the server must return 6 new unique IDs to the client.
What is the fastest/cleanest way to do such a thing? So far I have been doing an INSERT followed by a SELECT for each new item, or an INSERT followed by last_insert_id; however, if there are 50 items to add, it will take at least a few seconds, which is not good at all for the user experience.
sql_task.query("INSERT INTO `ItemDB` (`ItemName`, `Type`, `Time`) VALUES ('%s', '%d', '%d')", strName.c_str(), uiType, uiTime);
Getting the ID:
uint64_t item_id { sql_task.last_id() }; // This calls mysql_insert_id
I believe you need to rethink your design slightly. Let's use the analogy of a sales order. With a sales order (or invoice #), the user gets an invoice number (auto_inc) as well as multiple line item numbers (also auto_inc).
The sales order and all of its line items are submitted for insert (from the GUI) and the inserts are performed. First, the sales order row is inserted and its id is saved in a variable for the subsequent calls that insert the line items. The line items are then inserted without immediately returning their auto_inc id values; the application is merely handed back the sales order number at the end. How your app uses that sales order number in subsequent calls is up to you, but it does not need to retrieve all X (or 50) row ids at once, because it has the sales order number saved somewhere. Let's call that sales order number XYZ.
When you actually need the information, an example call could look like
select lineItemId
from lineItems
where salesOrderNumber=XYZ
order by lineItemId
You need to remember that in a multi-user system there is no guarantee of receiving a contiguous block of numbers. Nor should it matter to you, as they are all attached to the correct sales order number.
Again, the above is just an analogy, used for illustration purposes.
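In Python terms, the flow the analogy describes might look like the following (mysql.connector, with a hypothetical schema and credentials):

import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="shop")
cur = conn.cursor()

# Insert the parent row and keep only its generated id.
cur.execute("INSERT INTO salesOrders (customer) VALUES (%s)", ("ACME",))
order_id = cur.lastrowid  # the 'XYZ' the application holds on to

# Insert the line items without asking for their ids.
line_items = [("widget", 2), ("gadget", 1)]
cur.executemany(
    "INSERT INTO lineItems (salesOrderNumber, itemName, qty) VALUES (%s, %s, %s)",
    [(order_id, name, qty) for name, qty in line_items],
)
conn.commit()

# Later, when the ids are actually needed:
cur.execute(
    "SELECT lineItemId FROM lineItems WHERE salesOrderNumber = %s ORDER BY lineItemId",
    (order_id,),
)
line_item_ids = [row[0] for row in cur.fetchall()]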
That's a common but hard-to-solve problem. I'm unsure about MySQL, but PostgreSQL uses sequences to generate automatic ids. Insertion frameworks (object-relational mappers) use this when they expect to insert many values: they query the sequence directly for a batch of IDs and then insert the new rows using those already-known IDs. That way, there is no need for an additional query after each insert to get the ID.
The downside is that the relation between ID and insertion time can be non-monotonic when different writers intermix their inserts. That is not a problem for the database, but some (poorly written?) programs could expect it to be.
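A minimal sketch of that sequence trick with psycopg2, assuming a hypothetical PostgreSQL table items whose id column defaults to a sequence named items_id_seq:

import psycopg2

conn = psycopg2.connect("dbname=shop")  # hypothetical database
cur = conn.cursor()

names = ["widget", "gadget", "gizmo"]

# Reserve one id per row from the sequence up front...
cur.execute(
    "SELECT nextval('items_id_seq') FROM generate_series(1, %s)",
    (len(names),),
)
ids = [row[0] for row in cur.fetchall()]

# ...then insert rows whose ids are already known; no follow-up SELECT needed.
cur.executemany(
    "INSERT INTO items (id, name) VALUES (%s, %s)",
    list(zip(ids, names)),
)
conn.commit()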
As your ID is auto-incremented, you can do only two SELECT queries - one before and one after the INSERT queries:
SELECT AUTO_INCREMENT FROM information_schema.tables WHERE table_name = 'dbTable' AND table_schema = DATABASE();
--
-- INSERT INTO dbTable... (one or many, does not matter);
--
SELECT LAST_INSERT_ID() AS lastID;
This will give you the sequence between the first and last inserted IDs, and from that you can easily calculate how many there are.
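A sketch of this approach with mysql.connector; it assumes no other session inserts into the table between the two markers (see the contiguous-block caveat above), and note that MySQL 8 may serve a cached information_schema value unless information_schema_stats_expiry is set to 0:

import mysql.connector

conn = mysql.connector.connect(user="app", password="...", database="shop")
cur = conn.cursor()

# The next id the table will hand out.
cur.execute(
    "SELECT AUTO_INCREMENT FROM information_schema.tables "
    "WHERE table_name = 'ItemDB' AND table_schema = DATABASE()"
)
first_id = cur.fetchone()[0]

for name, item_type, item_time in [("sword", 1, 100), ("shield", 2, 100)]:
    cur.execute(
        "INSERT INTO ItemDB (ItemName, Type, Time) VALUES (%s, %s, %s)",
        (name, item_type, item_time),
    )
conn.commit()

cur.execute("SELECT LAST_INSERT_ID() AS lastID")
last_id = cur.fetchone()[0]

# Assuming no concurrent writers, the new ids are the inclusive range:
new_ids = list(range(first_id, last_id + 1))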

Sort treelistex by displayname, both available and selected items per language

I'm working with Sitecore 8 Update 2.
I'm looking for a way to sort all items (mainly the available items) from a TreelistEx by display name, per language.
I've found a way to extend the list of selected items, but not the available items (the left column):
How to sort the selected items in a Sitecore Treelist?
I've also found this, but I can't seem to get it to work (SortBy):
http://www.sitecore.net/learn/blogs/technical-blogs/john-west-sitecore-blog/posts/2012/10/more-enhancements-to-the-treelist-field-type-in-the-sitecore-aspnet-cms.aspx
Can someone give me a clear explanation of how to achieve this?
One way of achieving this could be to write a processor on the save event pipeline. When an item save is called, you can check for the TreelistEx field and sort the selected values based on any field of the selected items (since the value is a list of pipe-separated GUIDs, you may need to fetch each item by GUID and re-arrange the pipe-separated GUIDs according to your sort). I also think this will incur a performance hit on save (though likely not much).

How to Query Large Sharepoint 2013 Lists in Infopath 2010?

I'm designing an InfoPath form to help guide people in a data creation process. The form needs to draw from a SharePoint list that contains around 19,000 rows, each with six columns that contain attributes (Column 1 = Attribute A, Column 2 = Attribute B, etc.). I've reduced the first three columns to their own lists, which contain only a few hundred unique entries each, if that. When I get to Column 4, there are 8,000 unique entries, which makes querying the list outright impossible.
In an attempt to get around the item limitation, I've created an InfoPath form with a data connection to the list (which does not automatically query when the form is loaded). Additionally, I've added drop-downs that set values for the queryFields of the secondary data source (one for Column 1, another for Column 2, and another for Column 3). On the last drop-down, I set an action to query the database, but I still get the error about list limits and that rules cannot be applied.
Is there any way to "pre-filter" the data connection so that I can bypass the limitation by only drawing the data I need? Am I going about this the right way?
Any guidance would be greatly appreciated.
Are you able to add indexes to the list columns that you intend to query on? I've found that I can get around the error message about list limits if I go to the list and add an index for each of the columns that I will be setting as query fields, prior to running my query data connection.

nosql/dynamodb hash and range use case

It's my first time using a NoSQL database, so I'm really confused. I'd really appreciate any help I can get.
I want to store data comprising announcements in my table. Essentially, each announcement has an ID, a date, and a text.
So, for example, an announcement might have an ID of 1, a date of 2014/02/26, and text of "This is a sample announcement". Newer announcements always have a greater ID value than older announcements, since they are added to the table later.
There are two types of queries I want to run on this table:
I want to retrieve the text of the announcements sorted in order of date.
I want to retrieve the text and dates of the x most recent announcements (say, the 3 most recent announcements).
So I've set up the table with the following attributes:
ID (number) as the primary hash key, and
date (string) as the range key
Is this appropriate for my use cases? And if so, what kind of query/reads/requests/scans/whatever (I'm really confused about the terminology here, too) should I be running to accomplish the two types of queries I want to make?
Any help will be very much appreciated. Thanks!
You are on the right track.
As far as sorting, DynamoDB will sort by the range key, so date will work, but I'd recommend storing it as a number (perhaps milliseconds since the Unix epoch) rather than a string. This will make it trivial to get the announcements in ascending or descending order based on their created date.
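For example, with boto3, and assuming (hypothetically, unlike the per-announcement ID hash key described above) a design where every announcement shares one constant partition key so they all sit in a single item collection sorted by a numeric created attribute, the "3 most recent" read becomes:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Announcements")  # hypothetical table

# Newest three announcements: walk the sort key in descending order.
resp = table.query(
    KeyConditionExpression=Key("pk").eq("announcement"),  # constant hash key
    ScanIndexForward=False,  # descending by the numeric 'created' range key
    Limit=3,
)
for item in resp["Items"]:
    print(item["created"], item["body"])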
See this answer for an overview of local vs global secondary indexes and what capabilities they provide: Optional secondary indexes in DynamoDB
As far as retrieving all items, you would need to perform a scan. Scans are not as efficient as queries, but since all of Dynamo is on SSDs they're still relatively quick. You don't get the single-digit-millisecond performance with a scan that you get with a query, so if there's a way to associate announcements with a user ID, you might get better performance than with a scan.
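With the ID/date schema from the question, that scan plus a client-side sort might look like this (boto3 again, hypothetical table and attribute names):

import boto3

table = boto3.resource("dynamodb").Table("Announcements")  # hypothetical table

# Scans return at most 1 MB per page, so follow LastEvaluatedKey.
items, resp = [], table.scan()
items.extend(resp["Items"])
while "LastEvaluatedKey" in resp:
    resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
    items.extend(resp["Items"])

items.sort(key=lambda it: it["date"], reverse=True)  # sort client-side
print(items[:3])  # the three most recent announcements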
Note that you cannot modify the table schema (hash key, range key, and indexes) after you create the table. There are ways to manually migrate a table or import/export it, but the point is that you should think hard about current and future query requirements up front and design the table to support them. It's very easy to add or stop storing non-key or non-item attributes though, which provides nice flexibility.
Finally, try to avoid thinking of Dynamo as relational. With Dynamo, in a lot of cases you may well be better off denormalizing or duplicating some of the data in exchange for fast query performance.