Selecting an appropriate STL container for logging data - C++

I need a logging and filtering mechanism in my client-server application, where a client may request log data based on certain parameters.
Each log entry has a MAC ID, date and time, command type, and direction as fields.
The server can filter log data based on these parameters as well.
The log is limited to 10 MB; after that, new messages overwrite the log from the beginning.
My approach is to log data to a file and also keep it "in memory" in an STL container, so that when a client requests data the server can filter the log on any criterion.
So the process is: the server first sorts the vector<> on a particular criterion and then filters it using binary search.
I am planning to use vector as the STL container for the in-memory log data.
I am a bit confused about whether vector is appropriate in this situation, since the data in the vector can grow up to 10 MB.
My question: is vector good enough for this case or not?

I'd go with a deque, double ended queue. It's like a vector but you can add/remove elements from both ends.
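A minimal sketch of how a deque could model the wrap-around log from the question. The LogEntry fields and the BoundedLog class are hypothetical names chosen for illustration, and the cap is counted in entries rather than bytes, which is a simplification of the 10 MB limit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical record layout, based on the fields named in the question.
struct LogEntry {
    std::string mac;
    long long timestamp;  // assumed: seconds since some epoch
    int commandType;
    bool inbound;         // assumed encoding of "direction"
};

// Keeps at most maxEntries records; the oldest record is dropped first.
class BoundedLog {
public:
    explicit BoundedLog(std::size_t maxEntries) : maxEntries_(maxEntries) {
        assert(maxEntries_ > 0);
    }

    void add(const LogEntry& entry) {
        if (entries_.size() == maxEntries_) {
            entries_.pop_front();  // cheap at the front of a deque
        }
        entries_.push_back(entry);
    }

    const std::deque<LogEntry>& entries() const { return entries_; }

private:
    std::size_t maxEntries_;
    std::deque<LogEntry> entries_;
};
```

With a vector, dropping the oldest entry would shift every remaining element, which is the main reason a deque fits this pattern better.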

I would first say that I would use a logging library, since there are many and I assure you they will do a better job (log4cxx, for example). If you insist on doing this yourself, a vector is an appropriate mechanism, but you will have to manually sort the data based upon user requests. One other idea is to use SQLite and let it manage storing, sorting, and filtering your data.

The actual response will depend a lot on the usage pattern and interface. If you are using a graphical UI, then chances are that there is already a widget that implements that feature to some extent (ability to sort by different columns, and even filter). If you really want to implement this outside of the UI, then it will depend on the usage pattern, will the user want a particular view more than others? does she need only filtering, or also sorting?
If there is one view of the data that will be used in most cases, and you only need to show a different order a few times, I would keep an std::vector or std::deque of the elements, and filter out with remove_copy_if when needed. If a different sort is required, I would copy and sort the copy, to avoid having to re-sort back to time order to continue adding elements to the log. Beware that if the application keeps pushing data, you will need to update the copy with the new elements in place (or provide a fixed view and rerun the operation periodically).
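The filter-and-sort-a-copy idea could look roughly like this. It is a sketch under assumptions: LogEntry and the function names are hypothetical, and std::copy_if (C++11) is used in place of remove_copy_if, which does the same job with the predicate inverted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical record layout for illustration.
struct LogEntry {
    std::string mac;
    long long timestamp;
    int commandType;
};

bool byMac(const LogEntry& a, const LogEntry& b) { return a.mac < b.mac; }

// Filter into a new vector; the time-ordered master log is left untouched.
std::vector<LogEntry> filterByCommand(const std::vector<LogEntry>& log, int commandType) {
    std::vector<LogEntry> out;
    std::copy_if(log.begin(), log.end(), std::back_inserter(out),
                 [commandType](const LogEntry& e) { return e.commandType == commandType; });
    return out;
}

// Takes the log by value, so only the copy is reordered.
// stable_sort keeps time order among entries with the same MAC.
std::vector<LogEntry> sortedByMac(std::vector<LogEntry> copy) {
    std::stable_sort(copy.begin(), copy.end(), byMac);
    return copy;
}

// Binary search on a copy that is already sorted by MAC.
std::size_t countByMac(const std::vector<LogEntry>& sortedCopy, const std::string& mac) {
    assert(std::is_sorted(sortedCopy.begin(), sortedCopy.end(), byMac));
    const LogEntry probe{mac, 0, 0};
    auto range = std::equal_range(sortedCopy.begin(), sortedCopy.end(), probe, byMac);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

The sorted copy is a snapshot: entries logged after it was made are not in it, which is the caveat about keeping the copy up to date.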
If there is no particular view that occurs much more often than the rest, or if you don't want to go through the pain of implementing the above, take a look at Boost.MultiIndex containers. They keep synchronized views of the same data with different criteria. That will probably be the most efficient in this last case, and even if it might be less efficient in the general case of a dominating view, it might make things simpler, so it could still be worth it.

Related

Understanding the difference between Google's sparse_hash_map, dense_hash_map, and flat_hash_map?

After some research, I know of these three hash tables (sparse_hash_map, dense_hash_map, flat_hash_map) from CppCon 2017.
I know that they share some techniques that make them faster than std::unordered_map:
Cache-friendly storage: e.g. dense_hash_map/flat_hash_map use metadata to store pointers to the key/value so that they fit in one cache line, to speed things up.
Collision handling: e.g. dense_hash_map uses quadratic probing, and flat_hash_map uses Robin Hood hashing?
But if they use the same mechanisms, they should have almost the same performance.
Are there any other algorithmic details that differentiate them?

Storing named data, where the 'name' is larger than the 'data'?

I'm writing the logic portion of a game, and want to create, retrieve, and store values (integers) to keep track of progress. For instance, a door would create the pair ("location.room.doorlock", 0) in an std::map, and unlocking this door would set that value to 1. Anytime the player wants to go through this door, it would retrieve the value by that keyname to see if it's passable. (Just an example, but it's important that this information exist outside of the "door" object itself, as characters or other events might retrieve this data and act on it.)
The problem though is that the name (or map key) itself is far larger than the data it's referring to, which seems wasteful, and feels 'wrong' as a result.
Is there a commonly used or best approach for storing this type of data, one where the key isn't so much larger than the data itself?
It is possible to know how much space to allocate at compile time for the progress data itself, if it's important. It need not use std::map either, so long as I don't have to use raw array indices to get or store data.
It seems like you have two options, if you really want to diminish the size of the string (although the string length does not seem to be that bad at all).
You can either just change your naming conventions or implement hashing. Hashing can be implemented in the form of a hashmap (also known as an unordered map) or by hand (you can create a small program that hashes your names to an int, then use that as a pair). Hashmaps/unordered maps are probably your best bet, as there is a lot of support code out there for it and you don't run the risk of having to deal with bugs in your own programs.
http://www.cplusplus.com/reference/unordered_map/unordered_map/
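As a rough sketch of the unordered_map suggestion, the progress values could sit behind a small wrapper like the one below. ProgressFlags and its member names are made up for illustration. Note that std::unordered_map still stores the full key string, so this mainly gives fast lookups by name rather than smaller keys.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical store for named progress values.
class ProgressFlags {
public:
    void set(const std::string& name, int value) {
        assert(!name.empty());  // assumed rule: names are never empty
        flags_[name] = value;
    }

    // Returns fallback when the name has never been set.
    int get(const std::string& name, int fallback = 0) const {
        auto it = flags_.find(name);
        return it == flags_.end() ? fallback : it->second;
    }

private:
    std::unordered_map<std::string, int> flags_;
};
```

A door would call set("location.room.doorlock", 0) when created, and any other object could later call get("location.room.doorlock") without knowing about the door itself.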

Iterating through a list made up of a custom Class. How do I do it? C++

I am working on an assignment for my Operating Systems class. I am to simulate how a scheduler works with Processes. I have a Process class which holds all the information about the processes. I also have a class called scheduler which holds two Process Lists, interactive and Real-Time.
Using a test text file, I am able to read through the file and place Processes into two lists. One for Interactive Processes and one for Real-time processes.
My issue is this. My professor did not let us know if he will put the processes in order of FCFS, as he said that they must be executed in that order. So, what I must now do is iterate through the lists and sort the Processes based on their arrival times. How do I iterate through the list?
I've tried using
list<Process>::iterator it;
for (it=super.interactive.begin() ; it !=super.interactive.end(); it++)
Where super is the name of the Scheduler object I'm using and interactive is the Interactive Process List.
But the issue with this is that since it's a list made out of Processes, I can't access the int starttime that tells me when the processes start because I don't know how to access individual Processes in these lists.
Any help would be much appreciated or suggestions on any other container I could use for this task would be greatly appreciated.
I first had it set up to use Queues but when it came time to iterate through it, I was told I couldn't. Which is why I've switched to lists but I'm not too familiar with those.
My only other idea is to use just dynamic arrays but it would be nice to be able to use Lists because of the push_back() functions. I wouldn't have to worry about increasing the array's capacity since with a list and a queue you can just add to the back.
One quality of iterators is that they act like pointers to the data you are iterating over, so if you want starttime (and it is public), you can do it->starttime inside of your loop.
But first, you probably don't want std::list. Use std::vector instead, which behaves like a "dynamic array" but handles all of the memory allocation internally. Random access is going to be helpful for keeping a sorted list.
Next, you need a way to sort. Luckily, the standard library has std::sort. You will need to either overload operator< or provide a BinaryPredicate (as described in the link).
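A small sketch of the vector-plus-sort approach, assuming a simplified Process with a public starttime (the id field and the function names are hypothetical). It uses std::stable_sort rather than std::sort so that processes with equal start times keep their input order, which seems to matter for FCFS.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for the Process class in the question.
struct Process {
    int id;
    int starttime;
};

// Orders processes by arrival time; ties keep their original order.
void sortByArrival(std::vector<Process>& processes) {
    std::stable_sort(processes.begin(), processes.end(),
                     [](const Process& a, const Process& b) {
                         return a.starttime < b.starttime;
                     });
}

// After sorting, the first element is the next process to run.
const Process& nextToRun(const std::vector<Process>& sorted) {
    assert(!sorted.empty());
    return sorted.front();
}
```

The lambda plays the role of the BinaryPredicate; overloading operator< on Process would let you call the sort without it.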
But the issue with this is that since it's a list made out of Processes, I can't access the int starttime that tells me when the processes start because I don't know how to access individual Processes in these lists.
For that, you can do : (*it).something(), or the better looking : it->something().
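For example, staying with std::list, a loop that reads starttime through the iterator might look like this. Process here is a simplified stand-in with public members, and both function names are made up for illustration.

```cpp
#include <cassert>
#include <list>

// Simplified stand-in for the Process class in the question.
struct Process {
    int id;
    int starttime;
};

// Walks the list with an iterator and reads a member via it->member.
int latestStart(const std::list<Process>& processes) {
    assert(!processes.empty());
    int latest = processes.front().starttime;
    for (std::list<Process>::const_iterator it = processes.begin();
         it != processes.end(); ++it) {
        if (it->starttime > latest) {
            latest = it->starttime;
        }
    }
    return latest;
}

// std::list has its own sort member; std::sort needs random access iterators.
void sortByArrival(std::list<Process>& processes) {
    processes.sort([](const Process& a, const Process& b) {
        return a.starttime < b.starttime;
    });
}
```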
The only reason I asked is because I needed to sort a bunch of my Process classes. Turns out, the rest of my class is assuming the professor will format the input text as first come first serve so I needn't worry about it. Thanks for the help you two. It did help figuring out other bits of the assignment :D

Fastest C++ Container: Unique Values

I am writing an email application that interfaces with a MySQL database. I have two tables that are sourcing my data, one of which contains unsubscriptions, the other of which is a standard user table. As of now, I'm creating a vector of pointers to email objects, and storing all of the unsubscribed emails in it, initially. I then have a standard SQL loop in which I'm checking to see if the email is not in the unsubscribe vector, then adding it to the global send email vector. My question, is, is there a more efficient way of doing this? I have to search the unsub vector for every single email in my system, up to 50K different ones. Is there a better structure for searching? And, a better structure for maintaining a unique collection of values? Perhaps one that would simply discard the value if it already contains it?
If your C++ Standard Library implementation supports it, consider using a std::unordered_set or, on older compilers, a nonstandard hash_set extension.
You can also use std::set, though its overhead might be higher (it depends on the cost of generating a hash for the object versus the cost of comparing two of the objects several times).
If you do use a node based container like set or unordered_set, you also get the advantage that removal of elements is relatively cheap compared to removal from a vector.
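A minimal sketch of the unordered_set approach, assuming the emails are plain std::string values rather than pointers to email objects (buildSendList is a hypothetical name). A second set drops duplicate addresses as they are seen.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Returns users that are not unsubscribed, each address at most once.
std::vector<std::string> buildSendList(const std::vector<std::string>& users,
                                       const std::unordered_set<std::string>& unsubscribed) {
    std::vector<std::string> sendList;
    std::unordered_set<std::string> seen;
    for (const std::string& email : users) {
        if (unsubscribed.count(email) != 0) {
            continue;  // average O(1) lookup instead of scanning a vector
        }
        if (seen.insert(email).second) {  // .second is false for duplicates
            sendList.push_back(email);
        }
    }
    assert(sendList.size() <= users.size());  // sanity check
    return sendList;
}
```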
Tasks like this (set manipulations) are better left to what is MEANT to execute them - the database!
E.g. something along the lines of:
SELECT email FROM all_emails_table e WHERE NOT EXISTS (
SELECT 1 FROM unsubscribed u where e.email=u.email
)
If you want an ALGORITHM, you can do this fast by retrieving both the list of emails AND the list of unsubscriptions as ORDERED lists. Then you can go through the e-mail list (which is ordered), and as you do it you glide along the unsubscribe list. The idea is that, with both lists in ascending order, you move 1 forward in whichever list has the smaller current element, so the other list can catch up. This algorithm is O(M+N) instead of O(M*N) like your current one.
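The ordered walk could be sketched as follows. This assumes both lists are already sorted ascending in the same order that std::string's operator< uses; a database collation may order strings differently, so that is worth checking. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the emails that do not appear in unsubscribed.
// Both inputs must be sorted ascending; runs in O(M+N).
std::vector<std::string> subtractSorted(const std::vector<std::string>& emails,
                                        const std::vector<std::string>& unsubscribed) {
    assert(std::is_sorted(emails.begin(), emails.end()));
    assert(std::is_sorted(unsubscribed.begin(), unsubscribed.end()));

    std::vector<std::string> keep;
    std::size_t i = 0;  // position in emails
    std::size_t j = 0;  // position in unsubscribed
    while (i < emails.size()) {
        if (j == unsubscribed.size() || emails[i] < unsubscribed[j]) {
            keep.push_back(emails[i]);  // no match possible for this email
            ++i;
        } else if (unsubscribed[j] < emails[i]) {
            ++j;  // unsubscribe entry is behind; let it catch up
        } else {
            ++i;  // equal: this email is unsubscribed, skip it
        }
    }
    return keep;
}
```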
Or, you can use a hash map which maps from unsubscribed e-mail address to 1. Then you do find() calls on that map, which for correct hash implementations are O(1) on average for each lookup.
Unfortunately, there was no standard hash map in C++ before C++11 (which added std::unordered_map) - please see this SO question for existing implementations (a couple of ideas there are SGI's STL hash_map and Boost and/or TR1 std::tr1::unordered_map).
One of the comments on that post indicates it will be added to the standard: "With this in mind, the C++ Standard Library Technical Report introduced the unordered associative containers, which are implemented using hash tables, and they have now been added to the Working Draft of the C++ Standard."
Store your email addresses in a std::set or use std::set_difference().
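A short sketch of the std::set_difference route, assuming both collections are held as std::set<std::string> (the function name is made up). A std::set is always sorted and free of duplicates, which is exactly what std::set_difference requires of its inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Everything in allEmails that is not in unsubscribed, in sorted order.
std::vector<std::string> sendList(const std::set<std::string>& allEmails,
                                  const std::set<std::string>& unsubscribed) {
    std::vector<std::string> out;
    std::set_difference(allEmails.begin(), allEmails.end(),
                        unsubscribed.begin(), unsubscribed.end(),
                        std::back_inserter(out));
    assert(out.size() <= allEmails.size());  // sanity check
    return out;
}
```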
The best way to do this is within MySQL, I think. You can modify your users table schema with another column, a BIT column, for "is unsubscribed". Better yet: add a DATETIME column for "date deleted" with a default value of NULL.
If using a BIT column, your query becomes something like:
SELECT * FROM `users` WHERE `unsubscribed` <> 0b1;
If using a DATETIME column, your query becomes something like:
SELECT * FROM `users` WHERE `date_unsubscribed` IS NULL;

Caching data from MySQL DB - technique and appropriate STL container?

I am designing a data caching system that could have a very large amount of records held at a time, and I need to know what stl container to use and how to use it. The application is that I have an extremely large DB of records for users - when they log in to my system I want to pull their record and cache some data such as username and several important properties. As they interact with the system, I update and access their properties. Several properties are very volatile and I'm doing this to avoid "banging" on the DB with many transactions. Also, I rarely need to be using the database for sorting or anything - I'm using this just like a glorified binary save file (which is why I am happy to cache records to memory..); a more important goal for me is to be able to scale to huge numbers of users.
When the user logs out, server shuts down, or periodically in round-robin fashion (just in case..), I want to write their data back to the DB.
The server keeps its own:
vector <UserData *> loggedInUsers;
With UserData keeping things like username (string) and other properties from the DB, as well as other temporary data like network handles.
My first Q is, if I need to find a specific user in this vector, what's the fastest way to do that and is there a different stl container I can use to do this faster? What I do now is create an iterator, start it at loggedInUsers.begin() and iterate to .end(), checking (*iter)->username == "foo" and returning when it's found. If the username is at the end of the vector, or if the vector has 5000 users, this is a significant delay.
My second Q is, how can I round-robin schedule this data to be written back to the DB? I can call a function every time I'm ready to write a few records to the DB. But I can't hold an iterator to the vector, because it will become invalid. What I'd like to do is have a rotating queue where I can access the head of the queue, persist it to the DB, then rotate it to be the end of the queue. That seems like a lot of overhead.. what type could I use to do this better?
My third Q is, I'm using MySQL server and libmysqlclient connector/C.. is there any kind of built in caching that could solve this problem "for free", or is there a different technique altogether? I'm open to suggestions
A1. You're better off with a map; this is a tree that does the lookup for you. Test with a map and (assuming you have the right compiler) a hash_map, which does the same thing with a different lookup mechanism. They have different performance characteristics for different types of data storage workloads.
A2. A list would probably be better for you - push to the front, pull off the end. (a deque could also be used, but you cannot keep an iterator if you erase from it, you can with a list). push_back and pop_front (or vice-versa) will allow you to keep a rolling queue of cached data.
A3. You could try SQLite, which is a mini-database designed for simple application-level db storage needs. It can work entirely in-memory too.
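A1 and A2 could be combined roughly as below: a map for lookup by username, plus a list that serves as the round-robin queue. This is a sketch under assumptions: UserCache and its member names are hypothetical, and UserData is reduced to two fields. It relies on std::list iterators staying valid when other elements are added or when an element is moved with splice.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <string>

// Reduced stand-in for the UserData in the question.
struct UserData {
    std::string username;
    int score;
};

class UserCache {
public:
    // Returns the cached record, creating it on first login.
    UserData& login(const std::string& username) {
        assert(!username.empty());
        auto found = index_.find(username);
        if (found != index_.end()) {
            return *found->second;
        }
        order_.push_back(UserData{username, 0});
        auto pos = std::prev(order_.end());
        index_[username] = pos;  // list iterators stay valid
        return *pos;
    }

    // O(log n) lookup instead of a linear scan; nullptr if absent.
    UserData* find(const std::string& username) {
        auto found = index_.find(username);
        return found == index_.end() ? nullptr : &*found->second;
    }

    // Moves the head of the queue to the tail and returns it,
    // so repeated calls visit every user in turn.
    UserData* nextToPersist() {
        if (order_.empty()) {
            return nullptr;
        }
        order_.splice(order_.end(), order_, order_.begin());
        return &order_.back();
    }

private:
    std::list<UserData> order_;
    std::map<std::string, std::list<UserData>::iterator> index_;
};
```

The splice call relinks one node without copying, so rotating the queue costs O(1) per call.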
You don't say what your system does or how it's accessed, but this kind of technique probably won't scale well (because eventually you'll run out of memory and whatever you use to find information won't be as efficient as a database) and won't necessarily handle concurrent users properly, unless you make sure that data can be shared properly between them.
That said.. you might be better off using a map (http://www.cplusplus.com/reference/stl/map/) with the username as the key.
In terms of writing it back to the database, why not store a separate structure (a queue) that you can clear every time you write it to the database? As long as you're storing pointers it won't use much more memory. Which brings me to.. rather than using pointers you should take a look at smart pointers (for example boost's shared_ptr) which let you pass them around without worrying about ownership.
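One way to read the separate-queue idea, sketched with std::shared_ptr (the C++11 counterpart of boost's shared_ptr). Session, markDirty, and flush are hypothetical names, and the save callback stands in for whatever actually writes to MySQL.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <string>

// Reduced stand-in for the UserData in the question.
struct UserData {
    std::string username;
    int score = 0;
};

class Session {
public:
    std::shared_ptr<UserData> add(const std::string& username) {
        auto user = std::make_shared<UserData>();
        user->username = username;
        users_[username] = user;
        return user;
    }

    void logout(const std::string& username) { users_.erase(username); }

    // Queue a record whose changes still need to be written back.
    void markDirty(const std::shared_ptr<UserData>& user) {
        assert(user);  // null pointers are not queued
        dirty_.push(user);
    }

    // Calls save(record) for each queued record, empties the queue,
    // and returns how many records were written.
    template <typename SaveFn>
    std::size_t flush(SaveFn save) {
        std::size_t written = 0;
        while (!dirty_.empty()) {
            save(*dirty_.front());
            dirty_.pop();
            ++written;
        }
        return written;
    }

private:
    std::map<std::string, std::shared_ptr<UserData>> users_;
    std::queue<std::shared_ptr<UserData>> dirty_;
};
```

Because the queue holds its own shared_ptr, a record that is still waiting to be written stays alive even if the user logs out first.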