Pagination with DynamoDBMapper Java AWS SDK

According to the API docs, DynamoDB supports pagination for scan and query operations. The catch is that you set the ExclusiveStartKey of the current request to the LastEvaluatedKey of the previous request to get the next set (logical page) of results.
I'm trying to implement the same, but I'm using DynamoDBMapper, which seems to have a lot more advantages, such as tight coupling with data models. So if I wanted to do the above, I'm assuming I would do something like this:
// Key of the last item evaluated by the previous query operation
Map<String, AttributeValue> lastHashKey = ..
DynamoDBQueryExpression<Table> expression = new DynamoDBQueryExpression<Table>();
...
expression.setExclusiveStartKey(lastHashKey);
List<Table> nextPageResults = mapper.query(Table.class, expression);
I hope my understanding of paginating with DynamoDBMapper is correct.
Secondly, how would I know that I've reached the end of the results? From the docs, if I use the low-level API:
QueryResult result = dynamoDBClient.query((QueryRequest) request);
boolean isEndOfResults = result.getLastEvaluatedKey() == null;
Coming back to DynamoDBMapper, how can I know that I've reached the end of the results in this case?

You have a couple of different options with the DynamoDBMapper, depending on which way you want to go.
query - returns a PaginatedQueryList
queryPage - returns a QueryResultPage
scan - returns a PaginatedScanList
scanPage - returns a ScanResultPage
The important part here is understanding the difference between the methods and what functionality their returned objects encapsulate.
I'll go over PaginatedScanList and ScanResultPage, but these methods/objects basically mirror each other.
The PaginatedScanList documentation says the following:
Implementation of the List interface that represents the results from a scan in AWS DynamoDB. Paginated results are loaded on demand when the user executes an operation that requires them. Some operations, such as size(), must fetch the entire list, but results are lazily fetched page by page when possible.
This says that results are loaded as you iterate through the list. When you get through the first page, the second page is fetched automatically without you having to explicitly make another request. Lazy loading is the default strategy, but it can be overridden if you call the overloaded methods and supply a DynamoDBMapperConfig with a different DynamoDBMapperConfig.PaginationLoadingStrategy.
This is different from the ScanResultPage. You are given a page of results, and it is up to you to deal with the pagination yourself.
Here is a quick code sample showing example usage of both methods, which I ran against a table of 5 items using DynamoDBLocal:
final DynamoDBMapper mapper = new DynamoDBMapper(client);
final int limit = 2; // page size; not shown in the original snippet, inferred from the output below
// Using 'PaginatedScanList'
final DynamoDBScanExpression paginatedScanListExpression = new DynamoDBScanExpression()
        .withLimit(limit);
final PaginatedScanList<MyClass> paginatedList = mapper.scan(MyClass.class, paginatedScanListExpression);
paginatedList.forEach(System.out::println);
System.out.println();
// Using 'ScanResultPage'
final DynamoDBScanExpression scanPageExpression = new DynamoDBScanExpression()
        .withLimit(limit);
do {
    ScanResultPage<MyClass> scanPage = mapper.scanPage(MyClass.class, scanPageExpression);
    scanPage.getResults().forEach(System.out::println);
    System.out.println("LastEvaluatedKey=" + scanPage.getLastEvaluatedKey());
    // A null key means there are no more pages to fetch
    scanPageExpression.setExclusiveStartKey(scanPage.getLastEvaluatedKey());
} while (scanPageExpression.getExclusiveStartKey() != null);
And the output:
MyClass{hash=2}
MyClass{hash=1}
MyClass{hash=3}
MyClass{hash=0}
MyClass{hash=4}
MyClass{hash=2}
MyClass{hash=1}
LastEvaluatedKey={hash={N: 1,}}
MyClass{hash=3}
MyClass{hash=0}
LastEvaluatedKey={hash={N: 0,}}
MyClass{hash=4}
LastEvaluatedKey=null
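To make the end-of-results check explicit, here is a minimal, SDK-free sketch of the same loop. The `Page` class and `fetchPage` method are hypothetical stand-ins for `ScanResultPage` and `mapper.scanPage`, and a list index plays the role of the key; none of these names come from the AWS SDK. The only point is the termination rule: keep feeding the last key back in until it comes back null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for mapper.scanPage(...); not part of the AWS SDK.
final class PageLoopSketch {

    /** One page of results plus the key to resume from (null on the last page). */
    static final class Page {
        final List<Integer> results;
        final Integer lastEvaluatedKey;

        Page(List<Integer> results, Integer lastEvaluatedKey) {
            this.results = results;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    /** Returns up to 'limit' items that come after 'exclusiveStartKey' (a list index here). */
    static Page fetchPage(List<Integer> table, Integer exclusiveStartKey, int limit) {
        int from = (exclusiveStartKey == null) ? 0 : exclusiveStartKey + 1;
        int to = Math.min(from + limit, table.size());
        List<Integer> results = new ArrayList<>(table.subList(from, to));
        // Simplification: a resume key is reported only when items remain.
        Integer last = (to < table.size()) ? Integer.valueOf(to - 1) : null;
        return new Page(results, last);
    }

    /** Walks every page and returns the page sizes, stopping when the resume key is null. */
    static List<Integer> pageSizes(List<Integer> table, int limit) {
        List<Integer> sizes = new ArrayList<>();
        Integer startKey = null;
        do {
            Page page = fetchPage(table, startKey, limit);
            sizes.add(page.results.size());
            startKey = page.lastEvaluatedKey; // null means end of results
        } while (startKey != null);
        return sizes;
    }
}
```

With five items and a limit of 2 this walks three pages of sizes 2, 2 and 1, matching the shape of the output above.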

Related

How do I query for relationship data in spring data neo4j 4?

I have a cypher query that is supposed to return nodes and edges so that I can render a representation of my graph in a web app. I'm running it with the query method in Neo4jOperations.
start n=node({id}) match n-[support:SUPPORTED_BY|INTERPRETS*0..5]->(argument:ArgumentNode)
return argument, support
Earlier, I was using spring data neo4j 3.3.1 with an embedded database, and this query did a fine job of returning relationship proxies with start nodes and end nodes. I've upgraded to spring data neo4j 4.0.0 and switched to using a remote server, and now it returns woefully empty LinkedHashMaps.
This is the json response from the server:
{"commit":"http://localhost:7474/db/data/transaction/7/commit","results":[{"columns":["argument","support"],
"data":[
{"row":[{"buildVersion":-1},[]]},
{"row":[{"buildVersion":-1},[{}]]}
]}],"transaction":{"expires":"Mon, 12 Oct 2015 06:49:12 +0000"},"errors":[]}
I obtained this json by putting a breakpoint in DefaultRequest.java and executing EntityUtils.toString(response.getEntity()). The query is supposed to return two nodes which are related via an edge of type INTERPRETS. In the response you see [{}], which is where data about the edge should be.
How do I get a response with the data I need?

Disclaimer: this is not a definitive answer, just what I've pieced together so far.
You can use the queryForObjects method in Neo4jOperations, and make sure that your query returns a path. Example:
neo4jOperations.queryForObjects(ArgumentNode.class, "start n=node({id}) match path=n-[support:SUPPORTED_BY|INTERPRETS*0..5]->(argument:ArgumentNode) return path", params);
The POJOs that come back should be hooked together properly based on their relationship annotations. Now you can poke through them and manually build a set of edges that you can serialize. Not ideal, but workable.
Docs suggesting that you return a path:
From http://docs.spring.io/spring-data/data-neo4j/docs/4.0.0.RELEASE/reference/html/#_cypher_queries:
For the query methods that retrieve mapped objects, the recommended
query format is to return a path, which should ensure that known types
get mapped correctly and joined together with relationships as
appropriate.
Explanation of why queryForObjects helps:
Under the hood, there is a distinction between different types of queries. They have GraphModelQuery, RowModelQuery, and GraphRowModelQuery, each of which passes a different permutation of resultDataContents: ["row", "graph"] to the server. If you want data sufficient to reconstruct the graph, you need to make sure "graph" is in the list.
You can find this code inside ExecuteQueriesDelegate:
if (type != null && session.metaData().classInfo(type.getSimpleName()) != null) {
    Query qry = new GraphModelQuery(cypher, parameters);
    ...
} else {
    RowModelQuery qry = new RowModelQuery(cypher, parameters);
    ...
}
Using queryForObjects allows you to provide a type, and kicks things over into GraphModelQuery mode.
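As a rough illustration of the "poke through them and manually build a set of edges" step, here is a sketch that walks linked objects and records each relationship once. The `Node` class and its `supportedBy` field are hypothetical; the real ArgumentNode mapping is not shown in the question, so treat the shape as an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class EdgeCollector {

    /** Hypothetical mapped node; the real ArgumentNode and its fields are not shown in the post. */
    static final class Node {
        final long id;
        final List<Node> supportedBy = new ArrayList<>();

        Node(long id) {
            this.id = id;
        }
    }

    /** Depth-first walk that records each relationship once as "startId->endId". */
    static Set<String> collectEdges(Node root) {
        Set<String> edges = new LinkedHashSet<>();
        walk(root, new LinkedHashSet<>(), edges);
        return edges;
    }

    private static void walk(Node node, Set<Long> visited, Set<String> edges) {
        if (!visited.add(node.id)) {
            return; // already expanded; guards against cycles
        }
        for (Node target : node.supportedBy) {
            edges.add(node.id + "->" + target.id);
            walk(target, visited, edges);
        }
    }
}
```

The resulting set of strings is easy to serialize; a real implementation would probably carry the relationship type as well.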

Passing List of Integers to GET REST API

I want to fetch a list of entities from the database for the front end.
So I have written a POST REST HTTP call in Spring MVC.
But I read in the HTTP documentation that GET is preferred whenever you retrieve data.
So, is there any way I can replace the POST call with a GET call from AngularJS and pass a list of integers?
But GET has drawbacks, such as the limited URL length. Consider the case where we have to fetch 1000 entities from the database.
Please suggest a possible way to get the entities, or to write a GET REST API in Spring MVC that accepts a list of integers (the IDs of the entities).
For example: there are 100 books in the book table, but I want only a few of them, say ids 5, 65, 42, 10, 53, 87, 34, 23.
That's why I am passing this list of IDs as a List of Integer in the POST call.
I'm currently stuck on how to convert this to a GET call. In short, how do I pass a list of integers through a GET REST call?

I prefer the HTTP path variable variant for your problem, because in REST a resource ID is passed after the resource name ('http://../resource/id') and HTTP parameters are used for filtering.
Through HTTP parameters
If you need to pass your ids through HTTP parameters, see the example below.
Here is your Spring MVC controller method:
@RequestMapping(value = "/books", params = "ids", method = RequestMethod.GET)
@ResponseBody
Object getBooksById_params(@RequestParam List<Integer> ids) {
    return "ids=" + ids.toString();
}
And you can call it using either of these variants:
http://server:port/ctx/books?ids=5,65,42
http://server:port/ctx/books?ids=5&ids=65&ids=42
Also take a look at this discussion: https://stackoverflow.com/a/9547490/1881761
Through HTTP path variables
You can also pass your ids through a path variable; see the example below:
@RequestMapping(value = "/books/{ids}", method = RequestMethod.GET)
@ResponseBody
Object getBooksById_pathVariable(@PathVariable List<Integer> ids) {
    return "ids=" + ids.toString();
}
And your call will look like this: http://server:port/ctx/books/5,65,42
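Spring does the string-to-list conversion for you in both controller variants. Purely to illustrate what arrives on the wire, here is a plain-Java sketch that turns either query-string form into a list of integers. `IdListParser` and `parseIds` are hypothetical names, and the sketch skips URL decoding.

```java
import java.util.ArrayList;
import java.util.List;

final class IdListParser {

    /** Parses a raw query string such as "ids=5,65,42" or "ids=5&ids=65&ids=42". */
    static List<Integer> parseIds(String query, String name) {
        List<Integer> ids = new ArrayList<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0 || !pair.substring(0, eq).equals(name)) {
                continue; // not the parameter we are looking for
            }
            // A single value may itself hold a comma-separated list.
            for (String token : pair.substring(eq + 1).split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(Integer.valueOf(trimmed));
                }
            }
        }
        return ids;
    }
}
```

Both `parseIds("ids=5,65,42", "ids")` and `parseIds("ids=5&ids=65&ids=42", "ids")` produce the list 5, 65, 42.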

Pros of the GET HTTP call: it is the method meant for retrieving data. (From this perspective, we should use it for each and every retrieval.)
Through HTTP parameters
If you need to pass your ids through HTTP parameters, see the example below.
Here is your Spring MVC controller method:
@RequestMapping(value = "/book", params = "ids", method = RequestMethod.GET)
@ResponseBody
Object getBooksById_params(@RequestParam List<Integer> ids) {
    return "ids=" + ids.toString();
}
It works fine except in one exceptional case: say the URI is longer than 2048 characters, which means there are many ids in the list (e.g. 1000).
Then the server rejects the request with:
414 (Request-URI Too Long)
which is described at http://www.checkupdown.com/status/E414.html
After some research, my understanding is: the HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resources they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle.
I also went through the specification to check the GET URI length:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.1
So the conclusion is to stick to a POST call when the URI length can vary at runtime, to avoid the 414 (Request-URI Too Long) response.
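A client could also make that choice at runtime. The sketch below builds the GET URL and falls back to POST only when the URL would exceed a threshold. The 2048 figure is simply the number mentioned above; the real limit depends on the server and any proxies in between, and `BookRequestPlanner` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

final class BookRequestPlanner {

    // Assumed limit taken from the answer above; real limits depend on the server and proxies.
    static final int MAX_URL_LENGTH = 2048;

    /** Builds the GET URL for the given ids, e.g. base + "?ids=5,65,42". */
    static String buildGetUrl(String base, List<Integer> ids) {
        String joined = ids.stream().map(String::valueOf).collect(Collectors.joining(","));
        return base + "?ids=" + joined;
    }

    /** Returns "GET" when the URL fits under the assumed limit, otherwise "POST". */
    static String chooseMethod(String base, List<Integer> ids) {
        return buildGetUrl(base, ids).length() <= MAX_URL_LENGTH ? "GET" : "POST";
    }
}
```

This keeps GET for the common small case and only pays the price of a non-idiomatic POST for very long id lists.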

Alternatively, to allow for complex queries you'll need to give the query its own rest language. So to create a query you POST to /library/query then you edit it with things like POST /library/query/12345 with data {id:34} and then you execute the query with GET /library/books?query=12345

You can simply pass the list like this:
***/users?id=1&id=2

If you want to read entities from your API, you have to use a GET call.
The better way to get several of them is to use query params as a filter.
In REST design, a request by id retrieves only one entity at a time.
An example:
GET /library/books/58731 -> returns only one book identified by 58731
GET /library/books?numPages>70 -> returns all the books with more than 70 pages
I think that if you need to retrieve a lot of books because they all match some logic, you should try to express that logic as a query string.
Another example:
GET /library/books?stored>20150101 -> returns all the books added to the library in 2015
If you give us more information about the collection and the requirements, we can answer more directly.

Overcoming querying limitations in Couchbase

We recently made a shift from relational (MySQL) to NoSQL (Couchbase). Basically it's a back-end for a social mobile game. We were facing a lot of problems scaling our back-end to handle an increasing number of users. When using MySQL, loading a user took a lot of time as there were a lot of joins between multiple tables. We saw a huge improvement after moving to Couchbase, especially when loading data, as most of it is kept in a single document.
On the downside, Couchbase also seems to have a lot of limitations as far as querying is concerned. Couchbase's alternative to SQL queries is views. While we managed to handle most of our queries using map-reduce, we are really having a hard time figuring out how to handle time-based queries. For example, we need to filter users based on a timestamp attribute. We only need a user in the view if its time is less than the current time:
if(user.time < new Date().getTime() / 1000)
What happens is that once a user's time is set to some future time, it gets excluded from this view, which is the desired behavior, but it never gets added back to the view unless we update it: a document only gets re-indexed in a view when it's updated.
Our solution right now is to load the first x user documents and then check the time in our application. Sorting is done on the user.time attribute, so we get those users whose time is less than or near the current time. But I am not sure if this is actually going to work in a live environment. Ideally we would like to avoid these types of checks at the application level.
Also, there are times, e.g. matchmaking, when we need to check multiple time-based attributes. Our current strategy doesn't work in such cases, and we frequently get documents from the view which do not pass these checks when done in the application. I would really appreciate it if someone who has already tackled similar problems could share their experiences. Thanks in advance.
Update:
We tried using range queries, which work for only one key. Like I said, in most cases we have multiple time-based keys, meaning multiple ranges, which does not work.

If you use Date().getTime() inside a view function, you'll always get the time when the document was indexed by that view, which is why, as you said, "it never gets added back to view unless we update it".
There are two ways:
Bad way (don't do this in production): query the views with the stale=false param. That forces the view to update before it returns results. But view indexing is a slow process, especially if you have more than 1 million records.
Good way: use range requests. You just need to emit your date in the map function as a key, or as part of a complex key, and query with a range. You can see one example here or here (also, if you want to use DateTime in Couchbase, this example will be more useful). Or just look at my example below:
Say you have docs like:
doc = {
    "id": 1,
    "type": "doctype",
    "timestamp": 123456, // document update or creation time
    "data": "lalala"
}
For those docs, the map function will look like:
map = function (doc, meta) {
    if (doc.type === "doctype") {
        emit(doc.timestamp, null);
    }
}
And now to get recently "updated" docs you need to query this view with params:
startKey="dateTimeNowFromApp"
endKey="{}"
descending=true
Note that startKey and endKey are swapped, because I used descending order. Here is also a link to the documentation about the key types that Couchbase supports.
Also I've found a link to a question that can also help.
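To see why emitting the timestamp as the key solves the re-indexing problem, here is a small in-memory analogy in Java. A sorted map plays the role of the view index, and "now" is supplied at query time instead of being baked in at indexing time. `TimeRangeIndex` is a hypothetical class for illustration only, not a Couchbase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TimeRangeIndex {

    // Sorted index: emitted key (timestamp) -> ids of the documents that emitted it.
    private final TreeMap<Long, List<String>> index = new TreeMap<>();

    /** Mirrors emit(doc.timestamp, null): the key is stored once, at indexing time. */
    void emit(long timestamp, String docId) {
        index.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(docId);
    }

    /** Range query: ids of all documents whose timestamp is <= now, newest first. */
    List<String> dueAt(long now) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : index.headMap(now, true).descendingMap().entrySet()) {
            ids.addAll(e.getValue());
        }
        return ids;
    }
}
```

The same index answers `dueAt(150)` and `dueAt(250)` differently without any document being re-indexed, which is the behaviour the timestamp check inside the map function could not give you.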

EclipseLink JPA: Can I run multiple queries from one builder?

I have a method that builds and runs a Criteria query. The query does what I want it to, specifically it filters (and sorts) records based on user input.
Also, the query size is restricted to the number of records on the screen. This is important because the data table can be potentially very large.
However, if filters are applied, I want to count the number of records that would be returned if the query was not limited. So this means running two queries: one to fetch the records and then one to count the records that are in the overall set. It looks like this:
public List<Log> runQuery(TableQueryParameters tqp) {
    // get the builder, query, and root
    CriteriaBuilder builder = em.getCriteriaBuilder();
    CriteriaQuery<Log> query = builder.createQuery(Log.class);
    Root<Log> root = query.from(Log.class);
    // build the requested filters
    Predicate filter = null;
    for (TableQueryParameters.FilterTerm ft : tqp.getFilterTerms()) {
        // this section runs through the user input and constructs the
        // predicate
    }
    if (filter != null) query.where(filter);
    // attach the requested ordering
    List<Order> orders = new ArrayList<Order>();
    for (TableQueryParameters.SortTerm st : tqp.getActiveSortTerms()) {
        // this section constructs the Order objects
    }
    if (!orders.isEmpty()) query.orderBy(orders);
    // run the query
    TypedQuery<Log> typedQuery = em.createQuery(query);
    typedQuery.setFirstResult((int) tqp.getStartRecord());
    typedQuery.setMaxResults(tqp.getPageSize());
    List<Log> list = typedQuery.getResultList();
    // if we need the result size, fetch it now
    if (tqp.isNeedResultSize()) {
        CriteriaQuery<Long> countQuery = builder.createQuery(Long.class);
        countQuery.select(builder.count(countQuery.from(Log.class)));
        if (filter != null) countQuery.where(filter);
        tqp.setResultSize(em.createQuery(countQuery).getSingleResult().intValue());
    }
    return list;
}
As a result, I call createQuery twice on the same CriteriaBuilder and I share the Predicate object (filter) between both of them. When I run the second query, I sometimes get the following message:
Exception [EclipseLink-6089] (Eclipse Persistence Services -
2.2.0.v20110202-r8913):
org.eclipse.persistence.exceptions.QueryException Exception
Description: The expression has not been initialized correctly. Only
a single ExpressionBuilder should be used for a query. For parallel
expressions, the query class must be provided to the ExpressionBuilder
constructor, and the query's ExpressionBuilder must always be on the
left side of the expression. Expression: [ Base
com.myqwip.database.Log] Query: ReportQuery(referenceClass=Log ) at
org.eclipse.persistence.exceptions.QueryException.noExpressionBuilderFound(QueryException.java:874)
at
org.eclipse.persistence.expressions.ExpressionBuilder.getDescriptor(ExpressionBuilder.java:195)
at
org.eclipse.persistence.internal.expressions.DataExpression.getMapping(DataExpression.java:214)
Can someone tell me why this error shows up intermittently, and what I should do to fix this?

Short answer to the question: yes, you can, but only sequentially.
In the method above, you start creating the first query, then start creating the second, then execute the second, then execute the first.
I had the exact same problem. I don't know why it's intermittent, though.
In other words, you start creating your first query, and before having finished it, you start creating and executing another.
Hibernate doesn't complain, but EclipseLink doesn't like it.
If you just start with the count query, execute it, and then create and execute the other query (which is what you've done by splitting it into 2 methods), EclipseLink won't complain.
see https://issues.jboss.org/browse/SEAMSECURITY-91

It looks like this posting isn't going to draw much more response, so I will answer this in how I resolved it.
Ultimately I ended up breaking my runQuery() method into two methods: runQuery() that fetches the records and runQueryCount() that fetches the count of records without sort parameters. Each method has its own call to em.getCriteriaBuilder(). I have no idea what effect that has on the EntityManager, but the problem has not appeared since.
Also, the DAO object that has these methods used to be @ApplicationScoped. It now has no declared scope, so it is now constructed on demand from the various @RequestScoped and @ConversationScoped beans that use it. I don't know if this has any effect on the problem but since it has not appeared since I will use this as my code pattern from now on. Suggestions welcome.

Comparing entities while unit testing with Hibernate

I am running JUnit tests using in memory HSQLDB. Let's say I have a method that inserts some values to the DB and I am checking if the method inserted the values correctly. Note that order of the insertion is not important.
@Test
public void should_insert_correctly() {
    MyEntity[] expectedEntities = new MyEntity[2];
    // init expected entities
    Inserter out = new Inserter(session); // out: object under test
    out.insert();
    List list = session.createCriteria(MyEntity.class).list();
    assertTrue(list.contains(expectedEntities[0]));
    assertTrue(list.contains(expectedEntities[1]));
}
The problem is I cannot compare expected entities to actual ones because the expected's id and the actual's id are different. Since setId() of MyEntity is private (to prevent setting id explicitly), I cannot set all of the entities' id to 0 and compare like that.
How can I compare two result sets regardless of their ids?

I found this more practical. Instead of fetching all results at once, I fetch results matching each set of criteria and assert that they are not null.
public void should_insert_correctly() {
    Inserter out = new Inserter(session); // out: object under test
    out.insert();
    Criteria criteria;
    criteria = getCriteria(session, 0);
    assertNotNull(criteria.uniqueResult());
    criteria = getCriteria(session, 1);
    assertNotNull(criteria.uniqueResult());
}
private Criteria getCriteria(Session session, int i) {
    Criteria criteria = session.createCriteria(MyEntity.class);
    criteria.add(Restrictions.eq("x", expectedX[i]));
    criteria.add(Restrictions.eq("y", expectedY[i]));
    return criteria;
}

A stateful entity should not override equals -- that is, entities should be compared for equality by reference identity -- so List.contains will not work as you want.
What I do is use reflection to compare the fields of the original and reloaded entities. The function that walks over the fields of the objects ignores transient fields and those annotated as @Transient.
I don't find I need to ignore the id. When the object is first flushed to the database, Hibernate allocates it an id. When it is reloaded, the object will have the same id.
The flaw in your test is that you have not set transaction boundaries. You need to save the objects in one transaction. When you commit that transaction, Hibernate will flush the objects to the database and allocate their ids. Then in another transaction load the entities back from the database. You will get another set of objects that should have the same ids and persistent (i.e. non-transient) state.
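The reflection-based comparison described in this answer could look roughly like the sketch below. It is not the answerer's actual code. `PersistentStateComparer` and `SampleEntity` are hypothetical names, and the @Transient annotation is matched by simple name so that the sketch compiles without a JPA dependency.

```java
import java.lang.annotation.Annotation;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Objects;

final class PersistentStateComparer {

    /** True when both objects have the same class and equal values in every persistent field. */
    static boolean samePersistentState(Object a, Object b) {
        if (a == null || b == null || a.getClass() != b.getClass()) {
            return false;
        }
        try {
            for (Class<?> c = a.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (isSkipped(f)) {
                        continue;
                    }
                    f.setAccessible(true);
                    if (!Objects.equals(f.get(a), f.get(b))) {
                        return false;
                    }
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Skips static, synthetic and transient fields, and any field annotated @Transient. */
    private static boolean isSkipped(Field f) {
        int mods = f.getModifiers();
        if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
            return true;
        }
        for (Annotation ann : f.getAnnotations()) {
            // Matched by simple name so the sketch needs no JPA dependency.
            if (ann.annotationType().getSimpleName().equals("Transient")) {
                return true;
            }
        }
        return false;
    }
}

/** Hypothetical entity used only to exercise the comparer. */
final class SampleEntity {
    private final int x;
    private final String y;
    private transient int cache; // ignored by the comparison

    SampleEntity(int x, String y, int cache) {
        this.x = x;
        this.y = y;
        this.cache = cache;
    }
}
```

In a test you would call `samePersistentState(original, reloaded)` after reloading the entity in a second transaction. Note that array-valued fields are compared by reference here, so they would need extra handling.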

I would try to implement Object.equals(Object) method in your MyEntity class.
List.contains(Object) uses Object.equals(Object) (Source: Java 6 API) to determine if an Object is in this list.
The method session.createCriteria(MyEntity.class).list(); returns a list of new instances with the values you inserted (hopefully).
So you need to compare the values. This is easily done via the implementation of Object.equals(Object).
Clarification edit:
You could ignore the ids in your equals method, so that the comparison only cares about "real values".
YAE (Yet Another Edit):
I recommend reading this article about the equals() method: Angelika Langer: Secrets Of Equal. It explains all background information very well.
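For illustration, an equals implementation that ignores the id might look like the sketch below. The class name and the `assignId` helper are hypothetical, and the fields x and y are assumed from the criteria-based answer earlier in this thread. Keep in mind that another answer here argues that entities should not override equals at all, so treat this as one option and not a recommendation.

```java
import java.util.Objects;

/** Hypothetical shape for MyEntity; the real class is not shown in the question. */
final class MyEntitySketch {
    private Long id; // assigned by the persistence layer, deliberately ignored below
    private final String x;
    private final String y;

    MyEntitySketch(String x, String y) {
        this.x = x;
        this.y = y;
    }

    /** Stand-in for the id assignment that Hibernate would perform on flush. */
    void assignId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof MyEntitySketch)) {
            return false;
        }
        MyEntitySketch that = (MyEntitySketch) other;
        return Objects.equals(x, that.x) && Objects.equals(y, that.y);
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y); // must agree with equals, so id is left out here too
    }
}
```

With this in place, `list.contains(expected)` matches a reloaded entity that has the same x and y even though only the reloaded one has an id.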
