I want to perform a transaction with multiple write operations (~5 inserts/updates to different tables) in Cassandra, but if any of them fails, the rest should not be written (either each operation is rolled back or the whole transaction fails).
Please let me know the proper approach to this in Cassandra and how to do it (an example would be welcome).
Yes, you can use the logged batch functionality to accomplish this atomically. Note that you do take a hit on performance. See the BATCH Statements section of the C++ Driver documentation.
Here is an example of how to do this in C++, taken from the documentation linked above. It shows how to batch an INSERT, an UPDATE, and a DELETE together:
/* This logged batch will make sure that all the mutations eventually succeed */
CassBatch* batch = cass_batch_new(CASS_BATCH_TYPE_LOGGED);

/* Statements can be immediately freed after being added to the batch */
{
  CassStatement* statement
    = cass_statement_new(cass_string_init("INSERT INTO example1 (key, value) VALUES ('a', '1')"), 0);
  cass_batch_add_statement(batch, statement);
  cass_statement_free(statement);
}

{
  CassStatement* statement
    = cass_statement_new(cass_string_init("UPDATE example2 SET value = '2' WHERE key = 'b'"), 0);
  cass_batch_add_statement(batch, statement);
  cass_statement_free(statement);
}

{
  CassStatement* statement
    = cass_statement_new(cass_string_init("DELETE FROM example3 WHERE key = 'c'"), 0);
  cass_batch_add_statement(batch, statement);
  cass_statement_free(statement);
}

CassFuture* batch_future = cass_session_execute_batch(session, batch);

/* Batch objects can be freed immediately after being executed */
cass_batch_free(batch);

/* This will block until the query has finished */
CassError rc = cass_future_error_code(batch_future);
printf("Batch result: %s\n", cass_error_desc(rc));

cass_future_free(batch_future);
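For comparison, roughly the same logged batch expressed as plain CQL (e.g. run from cqlsh), reusing the tables from the driver example above:
BEGIN BATCH
  INSERT INTO example1 (key, value) VALUES ('a', '1');
  UPDATE example2 SET value = '2' WHERE key = 'b';
  DELETE FROM example3 WHERE key = 'c';
APPLY BATCH;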
I am currently building a map/reduce script in NetSuite which passes the results of a saved search from the getInputData stage to the map stage. This is done by first running a while loop in the getInputData stage to obtain the internal id of each entry, inserting the ids into an array, and then passing the array over to the map stage, like so:
// run saved search - unlimited rows from saved search.
do {
    var subresults = invoiceSearch.run().getRange({ start: start, end: start + pageSize });
    results = results.concat(subresults);
    count = subresults.length;
    start += pageSize; // getRange's end index is exclusive, so advance by pageSize
} while (count == pageSize);
var invSearchArray = [];
if (invoiceSearch) {
    // NOTE: .run().each has a limit of 4,000 results, hence the do-while loop above.
    for (var i = 0; i < results.length; i++) {
        var invObj = new Object();
        invObj['invID'] = results[i].getValue({ name: 'internalid' });
        invSearchArray.push(invObj);
    }
}
return invSearchArray;
I implemented it this way because I feared there would be result restrictions, just as the ".run().each" function has (limited to 4000 results).
I made the assumption that passing the search object directly from getInputData to map would also be restricted to 4,000 results. Can someone offer clarity on whether there are such restrictions? Am I right to fear the script halting prematurely because search results cannot be processed beyond 4,000 in the getInputData stage of a map/reduce script?
Any example to aid me in understanding how a search object is processed in a map/reduce script would be most appreciated.
Thanks
If you simply return the Search instance, all results will be passed along to map, beyond the 1000 or 4000 limits of the getRange and each methods.
If the Search has 8500 results, all 8500 will get passed to map.
function getInputData() {
    return search.load(...); // alternatively search.create(...)
}
Using
call ldaps_search(handle,shandle,filter, attrs, num, rc);
with Microsoft Active Directory, I get: WARNING: LDAP server sizelimit was exceeded.
Is there a way to page through the results somehow in SAS?
I have tried ldaps_setOptions with sizeLimit=2000, for example, but it still generates the warning, as I guess the limit is set on Microsoft's side.
Thanks
sample:
more = 1;
do while (more eq 1);
    call ldaps_search_page(handle, shandle, filter, attrs, num, rc, more, 1000);
    if rc ne 0 then do;
        more = 0;
        msg = sysmsg();
        put msg;
    end;
    /* free search results page */
    if shandle NE 0 then do;
        call ldaps_free(shandle, rc);
    end;
end;
It's not possible to control the LDAP server's sizelimit from the client side (see AD's MaxPageSize), but yes, you can still work around this via paging controls.
The idea is to request a paged result set, with a number of entries per page less than the server's MaxPageSize limit.
SAS provides the CALL LDAPS_SEARCH_PAGE routine, which returns only a single page for a given search request and requires subsequent calls to retrieve the entirety of the results:
CALL LDAPS_SEARCH_PAGE(lHandle, sHandle, filter, attr, num, rc, more <, pageSize>);
pageSize (optional) specifies a positive integer value, which is the number of
results on a page of output. By default, this value is set to 50. If
pageSize is 0, this function acts as if paging is turned off. This
argument is case-insensitive.
For example, if a query matches n results (exceeding the server-side limit) and the page size is set to 50, you need to make up to ceil(n/50) calls (e.g. 8,200 matches means 164 calls).
Here is an example taken from the doc; it uses the more argument in a loop to continue retrieving paged results until there is no more information to retrieve:
more = 1;
do while (more eq 1);
    call ldaps_search_page(handle, shandle, filter, attrs, num, rc, more, 50);
    ...
    /* free search results page */
    if shandle NE 0 then do;
        call ldaps_free(shandle, rc);
    end;
end;
https://documentation.sas.com/api/docsets/itechdsref/9.4/content/itechdsref.pdf
For those having trouble with more stuck at 1, causing the code above to loop forever (I don't know why the reference wouldn't get updated, but the OP was in this situation): you don't actually need it. Incrementing a counter until the number of fetched entries reaches num should do the trick, as sketched below.
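A minimal sketch of that counter-based variant, assuming num holds the total number of matching entries and each call returns at most pageSize of them (only routines already shown above are used; adapt the entry processing to your DATA step):
pageSize = 50;
fetched  = 0;
do until (fetched >= num);
    call ldaps_search_page(handle, shandle, filter, attrs, num, rc, more, pageSize);
    if rc ne 0 then do;
        msg = sysmsg();
        put msg;
        leave;
    end;
    /* ... process this page's entries via shandle ... */
    fetched = fetched + min(pageSize, num - fetched);
    /* free the search results page */
    if shandle ne 0 then call ldaps_free(shandle, rc);
end;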
We are using DynamoDB for counting user actions, and an item must be either inserted or updated, depending on whether it already exists. The code must also update a counter. Right now we do this in 2 steps:
using (var client = AWSClientFactory.CreateAmazonDynamoDBClient(RegionEndpoint.USEast1))
{
    var table = Table.LoadTable(client, TableName);
    var item = await table.GetItemAsync(id);
    if (item == null)
    {
        // row does not exist -> insert & return 1
        var document = new Document();
        document["Id"] = id;
        document["Counter"] = 1;
        await table.PutItemAsync(document);
        return 1;
    }

    // row exists -> increment counter & update
    var counter = item["Counter"].AsInt();
    item["Counter"] = counter + 1;
    await table.UpdateItemAsync(item);
    return counter + 1;
}
The problem with this code is that it increases latency and server load. I would prefer to do this with a single operation. I think this should be possible with conditional expressions, but I cannot figure out how to do it using the .NET SDK.
Be careful about incrementing counters yourself, as you could have race conditions if multiple instances of your app can increment the counter. Instead, use DynamoDB Atomic Counters. For example, my Ruby code calls the UpdateItem API with the following (older) way of incrementing counters:
{"counter" => {value: {n: "1"}, action: "ADD"}}
The newer way is to use an Update Expression, which I haven't implemented yet (a .NET sketch follows below). Also, if the counter/item doesn't already exist, DynamoDB will assume the value is 0 and increment the counter to 1.
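A rough sketch of that single-call increment using the low-level .NET client with an update expression (the table, key, and attribute names are carried over from the question, so adjust them to your schema; treat this as an illustration rather than drop-in code):

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class CounterSketch
{
    // Atomically increments "Counter" (creating the item if it doesn't exist yet)
    // and returns the new value, all in a single UpdateItem call.
    public static async Task<int> IncrementAsync(string id)
    {
        using (var client = new AmazonDynamoDBClient(RegionEndpoint.USEast1))
        {
            var request = new UpdateItemRequest
            {
                TableName = "TableName",
                Key = new Dictionary<string, AttributeValue>
                {
                    { "Id", new AttributeValue { S = id } } // adjust if your key is numeric
                },
                // ADD treats a missing attribute as 0, so a brand-new item ends up with Counter = 1
                UpdateExpression = "ADD #c :inc",
                ExpressionAttributeNames = new Dictionary<string, string> { { "#c", "Counter" } },
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    { ":inc", new AttributeValue { N = "1" } }
                },
                ReturnValues = ReturnValue.UPDATED_NEW
            };

            var response = await client.UpdateItemAsync(request);
            return int.Parse(response.Attributes["Counter"].N);
        }
    }
}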
You have a race condition in your code: it's possible that 2 different workers create the item at the same time.
The recommended pattern for what you are trying to do is:
a create-if-not-exists operation for the item;
an atomic counter update on "Counter".
So instead of 3 operations (get, put, update), which also have a race condition, you will only have 2 operations (and the correct behavior).
Hope this helps.
When I call setMaxResults on a query, it seems to want to treat the max number as "2", no matter what its actual value is.
function findMostRecentByOwnerUser(\Entities\User $user, $limit)
{
    echo "2: $limit<br>";
    $query = $this->getEntityManager()->createQuery('
        SELECT t
        FROM Entities\Thread t
        JOIN t.messages m
        JOIN t.group g
        WHERE
            g.ownerUser = :owner_user
        ORDER BY m.timestamp DESC
    ');
    $query->setParameter("owner_user", $user);
    $query->setMaxResults(4);
    echo $query->getSQL()."<br>";
    $results = $query->getResult();
    echo "3: ".count($results);
    return $results;
}
When I comment out the setMaxResults line, I get 6 results. When I leave it in, I get the 2 most recent results. When I run the generated SQL code in phpMyAdmin, I get the 4 most recent results. The generated SQL, for reference, is:
SELECT <lots of columns, all from t0_>
FROM Thread t0_
INNER JOIN Message m1_ ON t0_.id = m1_.thread_id
INNER JOIN Groups g2_ ON t0_.group_id = g2_.id
WHERE g2_.ownerUser_id = ?
ORDER BY m1_.timestamp DESC
LIMIT 4
Edit:
While reading the DQL "Limit" documentation, I came across the following:
If your query contains a fetch-joined collection specifying the result limit methods are not working as you would expect. Set Max Results restricts the number of database result rows, however in the case of fetch-joined collections one root entity might appear in many rows, effectively hydrating less than the specified number of results.
I'm pretty sure that I'm not doing a fetch-joined collection. I'm under the impression that a fetch-joined collection is where I do something like SELECT t, m FROM Threads JOIN t.messages. Am I incorrect in my understanding of this?
An update: with Doctrine 2.2+ you can use the Paginator: http://docs.doctrine-project.org/en/latest/tutorials/pagination.html
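A minimal sketch of how the Paginator could be applied to the query from the question, so the limit counts distinct root Thread entities rather than joined rows (names are reused from the question):

use Doctrine\ORM\Tools\Pagination\Paginator;

$query = $this->getEntityManager()->createQuery('
    SELECT t
    FROM Entities\Thread t
    JOIN t.messages m
    JOIN t.group g
    WHERE g.ownerUser = :owner_user
    ORDER BY m.timestamp DESC
');
$query->setParameter('owner_user', $user);
$query->setMaxResults(4);

// Wrapping the query makes Doctrine issue the extra queries needed to
// apply the limit per root entity.
$paginator = new Paginator($query, true);
$results = iterator_to_array($paginator);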
Using ->groupBy('your_entity.id') seems to solve the issue!
I solved the same issue by fetching only the contents of the master table and having all joined tables fetched with fetch="EAGER", which is defined in the Entity (described here: http://www.doctrine-project.org/docs/orm/2.1/en/reference/annotations-reference.html?highlight=eager#manytoone).
class VehicleRepository extends EntityRepository
{
    /**
     * @var integer
     */
    protected $pageSize = 10;

    public function page($number = 1)
    {
        return $this->_em->createQuery('SELECT v FROM Entities\VehicleManagement\Vehicles v')
            ->setMaxResults(100)
            ->setFirstResult($number - 1)
            ->getResult();
    }
}
In my example repo you can see I only fetched the vehicle table to get the correct result amount. But all properties (like make, model, category) are fetched immediately.
(I also iterated over the Entity-contents because I needed the Entity represented as an array, but that shouldn't matter afaik.)
Here's an excerpt from my entity:
class Vehicles
{
    ...

    /**
     * @ManyToOne(targetEntity="Makes", fetch="EAGER")
     * @var Makes
     */
    public $make;

    ...
}
It's important that you map every Entity correctly, otherwise it won't work.
Basically I'm trying to order objects by their score over the last hour.
I'm trying to generate an hourly votes sum for objects in my database. Votes are embedded into each object. The object schema looks like this:
{
    _id: ObjectId
    score: int
    hourly-score: int <- need to update this value so I can order by it
    recently-voted: boolean
    votes: {
        "4e4634821dff6f103c040000": { <- Key is __toString of voter ObjectId
            "_id": ObjectId("4e4634821dff6f103c040000"), <- Voter ObjectId
            "a": 1, <- Vote amount
            "ca": ISODate("2011-08-16T00:01:34.975Z"), <- Created at MongoDate
            "ts": 1313452894 <- Created at timestamp
        },
        ... repeat ...
    }
}
This question is actually related to a question I asked a couple of days ago: Best way to model a voting system in MongoDB.
How would I (can I?) run a MapReduce command to do the following:
Only run on objects with recently-voted = true OR hourly-score > 0.
Calculate the sum of the votes created in the last hour.
Update hourly-score = the sum calculated above, and recently-voted = false.
I also read here that I can perform a MapReduce on the slave DB by running db.getMongo().setSlaveOk() before the M/R command. Could I run the reduce on a slave and update the master DB?
Are in-place updates even possible with Mongo MapReduce?
You can definitely do this. I'll address your questions one at a time:
1.
You can specify a query along with your map-reduce, which filters the set of objects which will be passed into the map phase. In the mongo shell, this would look like (assuming m and r are the names of your mapper and reducer functions, respectively):
> db.coll.mapReduce(m, r, {query: {$or: [{"recently-voted": true}, {"hourly-score": {$gt: 0}}]}})
2.
Step #1 will let you use your mapper on all documents with at least one vote in the last hour (or with recently-voted set to true), but not all the votes will have been in the last hour. So you'll need to filter the list in your mapper, and only emit those votes you wish to count:
function m() {
    // "ts" is a Unix timestamp in seconds, so compute the cutoff in seconds
    var hour_ago = new Date().getTime() / 1000 - 3600;
    // "votes" is an object keyed by voter id, so iterate over its keys
    for (var voter in this.votes) {
        var vote = this.votes[voter];
        if (vote.ts > hour_ago) {
            emit(/* your key */, vote.a);
        }
    }
}
And to reduce:
function r(key, values) {
    var sum = 0;
    values.forEach(function(value) { sum += value; });
    return sum;
}
3.
To update the hourly scores collection, you can use the reduce output mode of map-reduce (out: {reduce: ...}), which will call your reducer with both the newly emitted values and the previously saved value in the output collection (if any). The result of that pass will be saved into the output collection. This looks like:
> db.coll.mapReduce(m, r, {query: ..., out: {reduce: "output_coll"}})
In addition to re-reducing output, you can use merge, which will overwrite documents in the output collection with newly created ones (but leave behind any documents with an _id different from the _ids created by your M/R job); replace, which is effectively a drop-and-create operation and is the default; or {inline: 1}, which will return the results directly to the shell or to your driver. Note that when using {inline: 1}, your results must fit within the size allowed for a single document (16MB in recent MongoDB releases).
4.
You can run map-reduce jobs on secondaries ("slaves"), but since secondaries cannot accept writes (that's what makes them secondary), you can only do this when using inline output.
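Putting points 3 and 4 together, a run against a secondary would therefore have to return its results inline rather than write to a collection, along the lines of:
> db.getMongo().setSlaveOk()
> db.coll.mapReduce(m, r, {query: ..., out: {inline: 1}})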