Writing just a function that saves given objects to sqlite - clojure

I am only starting off with clojure, and am stuck thinking how to implement a seemingly straightforward functionality.
There is a function generator, which takes (among others) an saver function as an argument. The generator does all sorts of stuff and generates certain data objects regularly, which need to be saved. This saving is supposed to be handled by the saver function, and so the generator calls the saver with the data that needs to be saved, every time data is generated.
Now, one saver function I am to write is one that saves the data to a sqlite db. How should I go about this?
One strategy I thought of is to create a connection to the sqlite db in the saver function. Create a new connection every time data is to be saved, save the data (only one row in one table) and close the connection. This seemed to be a bit inefficient. Especially considering the data gets generated every 2-5 secs.
Another idea is to keep an open connection as a module-level var, which is set to nil at start. The connection is opened the first time the saver function is called and is reused in the subsequent calls. This seems it would probably be more efficient, but to my knowledge, it would require a def form inside of the saver function. Personally, I don't enjoy doing that.
One more (crazy?) thought I had was to use an agent that saves the connection object, initially set to nil. The saver will be a function that sends data to the agent. The agent, creates the connection the first time it needs it, and saves in its associated data object. This looks like it might work well, but agents aren't designed for this, are they?
So, how do you people address problem? Is there any other paradigm suited just for this case? Or should I do one of the above?
PS. I spent a good deal of time writing this as it's very hard to put my problem in words. I'm not sure if I got it all right. Let me know if something is unclear.

your second solution sounds best. if you don't want to use a mutable Var (created via def) then you could create the connection in a "factory" function as a simple immutable value (so it's just carried around in the closure):
(defn sqlite-saver-factory [path]
(let [db-connection (open-sqlite-connection path)]
(fn [data]
(save-to-sqlite db-connection data))))
...
(generator (sqlite-saver-factory path) ...)
disclaimer: i am no great clojure expert - the above is just how i would do this in pretty much any functional language. so perhaps there is a more idiomatic clojure approach.

Related

C++, server-side logging multiply files at once

I do have a 2D game server written in C++ that can host up to 1000 players (that limit is because of map size, not by a performance by any means).
It's optimization and so is more-or-less great, as it was written by some really good developers.
What I am now trying to do, is to attach logging for each and every player. I do want to log such actions like:
Move things within, talks, die, login and so on and so on (so basically not everything, but fairly bit of what the player is doing)
I am not having problem programming-wise, rather I do lack knowledge about how to (performance) handle it and if my attempt is any good. What I am trying to do right now is:
I, for some reason, thought that it would be good to have each player log save to different files using ofstream.open. (I do believe that's an actually bad idea, but (1.) is it really? (2.) And why is it so? (3.) Can it handle up to 1000 files open?) I do have a class LoggingPlayer; on player login, I create an instance of this class and follow players actions by sending information to logging function.
I do save write to file when the buffer's full. I do close file only when player logs-off.
If I lose file - that's not the big deal, really. If the server crashes or slows for few seconds - that's a big deal for me.
I do have a few questions regarding:
(4.) How can I make it better? (e.g. using one logger for all players? log in one file?)
(5.) Should I totally abandon trying to make my own logger and try the already existing ones?
(5a.) If so, what are the recommendations for such a case? (online with 500-1000 users). I heard about boost log, but never tried.
I do appreciate all the help; so if you know the answer just for the one question, please don't hesitate.

Should I store a list in memory or in a database and should I build a class to connect to DB?

I am writing a C++ program, I have a class that provides services for the rest of the clases in the program.
I am writing now the clases and the UML.
1) the class that I refer to has a task list that is changing over time and conditions are being checked on this list, I am thinking to keep it in a table in a databasse that every line in the table would represent a task, this way in case that the program crashes or stops working I can restore the last situation, the other option is to keep the task list in memory and keep a copy in the database.
the task list should be searched every second
Which approach is more recommended?
2) In order to write and to read to the database I can call the database directly from the class or build a database communication class, if I write a data communication class I need to give specific options and to build a mini server for this,
e.g. write a line to the database, read a line to the database, update only the first column etc..
what is the recommended approach for this?
Thanks.
First, if the database is obvious and easy, and there are no performance problems, just do that. You're talking about running a query once/second, and maybe marking a task done or adding a new one every so often; even sqlite on a slow SMB share should be able to handle that just fine.
If you do need to optimize it, then there are two approaches: Either still with the database and cache it in-memory, or use memory as your primary storage and come up with a persistence mechanism that uses the database. But until you need to optimize it, don't.
Next, how should you do it? Your question makes it sound like you're thinking in terms of a whole three-tier system, with a "mini-server" sitting between the database server and your task list. There's really no need for that. What you want is a bespoke ORM, but that makes it sound more complicated than it is. All you're doing is writing a class that wraps a database connection and provides a handful of methods—get_due, mark_done, add, get_next_id—each of which maps SQL parameters to Task members. For example (with no error handling):
void mark_done(Task task) {
db.execute("UPDATE Task SET done=true WHERE id=%s", task.id);
}
Three more methods like that, plus a constructor to connect to the database (including creating the Task table if it didn't already exist), and your class is done.
The reason you don't want to write the database stuff directly into Task is that you don't really have anywhere to store shared information like the database connection object; either you need globals (or class attributes, which are effectively globals), or you need copies in every single Task instance (or, really, weak references—which you're going to fake with either a reference or a raw pointer, either way leading to shutdown problems somewhere down the line).
Finally, your whole reason for doing this is error recovery, and databases do a great job of journaling so nothing ever gets inconsistent, but you do have to make sure to structure your app to take advantage of that. For example, you may want to mark all the now-due tasks "in process", then process them, then mark them all "done"; that way, at recovery time, you know exactly which tasks may or may not have been done, and can act appropriately. The more steps you can commit to the database, the less data loss you have to deal with—but of course the more code you have to write, and the slower it gets. So, do as much as necessary, but no more.
Saving information in Database just to recover crashed information may be bit of an overkill.
You ideally want to serialize the list and save it - as binary, xml or csv based values. This can be done based on a timer or certain events in your applications.
Databases may also be used if you can come up with a structure that looks exactly similar to tables - so that you can do one-to-one mapping between the objects and probably write SQL queries easily. But keep that on a separate layer for abstraction.

Save, Load and Replay game. c++

Basically as part of a team I have had to create a pacman like game for my university course, just zombies instead of ghosts.
We have built all of the game so far and it seems to work really well. Our current problem is that we have to Save a game (with a username and score), Load the game into the position it was once saved, with the correct username and score, and finally be able to offer a replay option where the user can see all the moves that they have previously made (as well as the moves the zombies have made). The zombies will always make the same moves that the user makes as they are designed to chase the user.
My question is what would be the best way to do the save, load and reload options? We cannot use vectors, stacks or queues. We can only really use strings, arrays and other basic variables.
We were thinking to do the reload first by adding everything onto the end of a string and then popping the last value off the string. We could then delay each one by a second and the user will be able to see his/her moves.
As for Saving we were unsure, there are also holes (0 symbols) and pills (* symbols) to take into account. So the position of character, zombies, pills and holes will need to be saved. The character can start from any random position and pretty much everything else is placed after.
The way we do the loading will depend on the way you suggest we do the saving.
Does anyone have any suggestions of the way we should do save, load and replay?
thanks
The simplest way I could think of is saving the user inputs.
This way you could easily replay the game by sending the inputs to the game engine (this may require a lot of restructuring depending on the design of the game engine). To accelerate the loading you could also save the game state at the time of the save (through serialization).
That's the idea, how to do it... you need an ever-expanding array to record the user-input, so let's use a linked-list.
struct Node {
T data;
Node* next_node;
};
//Google for the rest of the code, it is a reeeaaallly
// basic/fundamental data structure.
The data would be the user-inputs and the time they happened.
To save the data, you simply have to iterate through the linked-list and append it to a std::ostream& (to be generic, a std::ofstream& to be specific).
You may add some other useful information (such as the game state and the highscore) before or after the user inputs (or even in another file, which would really make sense for the highscores).
You'll need to read up on some serialization. I wrote some articles on it here, but this is going to be overkill for you guys: http://www.randygaul.net/2013/01/05/c-reflection-part-5-automated-serialization/
You can use some very simple serialization to write out the moves of each zombie into a file. Then when you want to reload this information you deserialize the information in the file. Each move will likely be stored in some form of a linked list so you'll have to come up with a way of recreating such lists upon deserialization.
Your question is really broad so my answer has to be quite broad as well. Really it's up to you to research a solution and implement it.

Pattern for synching object lists among computers (in C++)?

I've got an app that has about 10 types of objects. There will be potentially a few thousand object instances of each type. These lists of objects need to stay synchronized between apps running on different machines. If an object is added, changed or deleted, that needs to propagate to the other machines.
This will be a star topology -- there is a central master, and the rest are clients.
I DO have the concept of a session, so can store data about each client.
Is there a good design pattern to follow for this? Even better, is there a (template based?) library that would handle asking the container what has changed since client X came by and getting that delta to send out?
Right now I'm thinking every object-type container has an update counter. When something is added/changed/removed, the update counter is incremented, and the changed object(s) are tagged with that value. Each client will save the value of the update counter when it gets an update. Later it will come back and ask for any changes since it's update counter value. Finally, deletes are kept as tombstone records (although I'm not exactly sure when to clear them out).
One thing that makes this harder is clients can come and go without the central server necessarily knowing, although I guess there could be a timeout concept (if the server haven't heard from a client in 5 minutes, it assumes the client is gone)
Is this a well-known pattern? Any additional suggestions?
How you implement synchronization very much depends on your needs. Do the changes need to be sent to the clients, or is it sufficient that the clients checks if an object is up to date whenever it uses the objects? How bout using the Proxy pattern? This pattern allows you to create a proxy-implementation of your objects that can check if they are up to date or not, do update if they are not, and then return the result. I would do this by having a lastChanged timestamp on the objects on the master and a lastUpdated timestamp on the client objects. If latency is an issue checking if an object is up-to-date on each call is probably not a good idea. Consider having a separate thread that queries the master for changed objects and marks them "dirty". This could dramatically reduce the network traffic as well.
You could also look into the Observer pattern and Publish/Subscribe.
An option that might be simple to implement and still pretty efficient is to treat the pile of objects as an opaque blob and use librsync to synchronize them. It sounds like all of the updates flow one direction, from master to clients, and there's probably some persistent representation of the objects on the clients -- a file or something. I'm assuming it's a file for the rest of this answer, though any sequence of bytes can be used.
The way it would work is that each client would generate a librsync "signature" of its local copy of the blob and send that signature to the master. The signature is about 1% of the size of the blob. The master would then use librsync to compute a delta between that signature and the current data, and send the delta to the client, which would use librsync to apply the delta to its local copy of the blob.
The librsync API is simple, and the signature/delta data transfer is relatively efficient.
If that's not workable, it may still be useful to take a more manual "delta-based" approach, to avoid having to do per-object versioning. Each time the master makes a change, it should log that change to a journal, recording what was done and to which object. Versioning is done at the whole-database level, so in effect a version number is assigned to each journal entry.
When a client connects, it should send its version of the whole object collection, and the server can then respond with the contents of the journal between the client's version and the newest entry. If updates on a given object are done by completely replacing the object contents, then you can optimize this by filtering out all but the most recent version of each object. If the master also keeps track of which versions it has sent to which client, it can know when it is safe to discard old journal entries. Even if it doesn't track that, you can still discard old journal entries according to some heuristic (probably just age) and if you receive a connection from a client whose last version is older than your oldest journal entry, then you just have to send the entire set of objects to that client.

How should I link a data Class to my GUI code (to display attributes of object, in C++)?

I have a class (in C++), call it Data, that has thousands of instances (objects) when the code is run. I have a widget (in Qt), call it DataWidget that displays attributes of the objects. To rapidly build the widget I simply wrote the object attributes to a file and had the widget parse the file for the attributes - this approach works, but isn't scalable or pretty.
To be more clear my requirements are:
1 - DataWidget should be able to display multiple, different, Data object's attributes at a time
2 - DataWidget should be able to display thousands of Data objects per second
3 - DataWidget should be run along side the code that generates new Data objects
4 - each Data object needs to be permanently saved to file/database
Currently, the GUI is created and the DataWidget is created then the experiment runs and generates thousands of Data objects (periodically writing some of them to file). After the experiment runs the DataWidget displays the last Data object written to file (they are written to XML files).
With my current file approach I can satisfy (1) by grabbing more than one file after the experiment runs. Since the experiment isn't tied to DataWidget, there is no concurrency, so I can't do (3) until I add a signal that informs the DataWidget that a new file exists.
I haven't moved forward with this approach for 2 reasons:
Firstly, even though the files aren't immediately written to disk, I can't imagine that this method is scalable unless I implement a caching system - but, this seems like I'm reinvent the wheel? Secondly, Data is a wrapper for a graph data-structure and I'm using Graphml (via Boost Graph Library i.e. write_graphml()) to write the structure to XML files, and to read the structure back in with Boost's read_graphml() requires me to read the file back into a Data object ... which means the experiment portion of the program encodes the object into XML, writes the XML to a file (but hopefully in memory and not to disk), then the DataWidget reads the XML from a file and decodes it into an object!
It seems to me like I should be using a database which would handle all the caching etc. Moreover, it seems like I should be able to skip the file/database step and pass the Data to the DataWidget in the program (perhaps pass it a reference to a list of Data). Yet, I also want to save the Data to file to the file/database step isn't entirely pointless - I'm just using it in the wrong way at the wrong time.
What is the better approach given my requirements?
Are there any general resources and/or guidelines for handling and displaying data like this?
I see you're using Qt. This is good because Qt 4.0 and later includes a powerful model/view framework. And I think this is what you want.
Model/View
Basically, have your Data class inherit and implement QAbstractItemModel, or a different Qt Model class, depending on the kind of model you want. Then set your view widget (most likely a QListView) to use Data for its model.
There are lots of examples at their site and this solution scales nicely with large data sets.
Added: This model test code from labs.trolltech.com comes in real handy:
http://labs.trolltech.com/page/Projects/Itemview/Modeltest
It seems to me like I should be using
a database which would handle all the
caching etc. Moreover, it seems like I
should be able to skip the
file/database step and pass the Data
to the DataWidget in the program
(perhaps pass it a reference to a list
of Data). Yet, I also want to save the
Data to file to the file/database step
isn't entirely pointless - I'm just
using it in the wrong way at the wrong
time.
If you need to display that much rapidly changing data, having an intermediate file or database will slow it down and likely become the bottleneck. I think the Widget should read the newly generated data directly from memory. This doesn't prevent you from storing the data in a file or database though, it can be done in a separate thread/process.
If all of the data items will fit in memory, I'd say put them in a vector/list, and pass a reference to that to the DataWidget. When it's time to save them, pass a reference to your serializing method. Then your experiment just populates the data structure for the other processes to use.