How can I dirty parts of an array efficiently? [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
EDIT: apparently I asked this question wrong. Before voting to close, please allow me to know what the question is missing. I promise you, this is not an unanswerable question. You can always come back and vote to close it later.
I'm currently working in C++, but I think this question applies to most compiled languages.
In my project, I have an array of values that are calculated individually, one at a time, as late as possible, based on a single variable. These values are not all calculated at once; they are calculated if and only if they need to be. As is normally the case when using "dirty", the objective is to label certain things as being in need of an update, without updating them preemptively. These values are cycled through over and over, so I'd like to cache the computation if possible. Whenever the single variable changes, all the values should be marked dirty so the cycle knows to recalculate before storing and moving on.
I can think of a few ways of achieving this, but I'm not sure what is most efficient:
Have two arrays, one of booleans and one of values. Mark all booleans to false if dirty, and true when clean.
Have a clean start point. Consider everything dirty until passing that cycle point again. Has the drawback of not allowing skipping of cycle entries.
Brand new array. Just create a new array, if any of the items are unset, set them. This one seems like it would have tons of problems, but it's a thought.
Perhaps use some built in class meant for this stuff?
The above are just the first things that came to mind for me, but I'm kind of new to c++ and would like to have some idea of normal or special solutions to marking an array dirty.
How can I dirty an array efficiently?
In order to show an example of code, I will show js which I'm more used to:
const numbers = [];
const clean = [];
let length = 1000;
let variable;
// Changing the variable marks every cached entry dirty.
const setVariable = (num) => {
  variable = num;
  for (let i = 0; i < length; i++) { clean[i] = false; }
};
setVariable(42);
let pos = 0;
while (true) {
  if (clean[pos] === false) {
    // Entry is dirty: recompute and cache it.
    clean[pos] = true;
    numbers[pos] = someIntensiveMath(pos, variable);
  }
  doSomethingWithNumbers(numbers[pos]);
  pos++;
  if (pos >= length) pos = 0;
  // wait a bit;
}
in js you could also do
const setVariable = (num) => {
  variable = num;
  numbers.length = 0; // empties the array in place; works even though numbers is const
};
const isDirt = numbers[pos] === undefined;
With js the latter would probably be faster due to the native implementation of the script, but I don't think that's the case with compiled languages. I think you guys do things differently.

I've found elsewhere that the typical way to label entries of an array "dirty" is by having a parallel array of booleans.
@stark mentioned in the comments the idea of using a map, and speed comparisons of the two appear to be pretty decent, but the following answer advises using an array for indexed items.
performance of array vs. map
Whether modern coding practice has produced a new de facto way of labeling items or parts of an array (or any linear collection of items) as "dirty" is unknown to me. But at the very least, as an answer, the most straightforward beginner's approach is to have a parallel array of booleans.
Further, depending on how you are iterating through the "array", it may make sense to use a vector or a map. With either of those, dirtying would probably be best done by clearing the map or removing vector entries(??).
So, to give my best answer, it would seem one should first find the storage method that best fits their needs, and then use whichever dirtying method is most normal for it.
For arrays, which this question was specifically about, parallel arrays appear to be the answer.
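A minimal C++ sketch of the parallel-array approach described above, assuming the values are doubles and that the expensive calculation can be passed in as a callable. The names LazyCache, setVariable, and get are illustrative only and do not come from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Lazily computed cache: values_[i] is trusted only while clean_[i] is non-zero.
class LazyCache {
public:
    // `compute` stands in for the expensive per-entry calculation (hypothetical).
    LazyCache(std::size_t n, std::function<double(std::size_t, double)> compute)
        : values_(n), clean_(n, 0), compute_(std::move(compute)) {}

    // Changing the shared variable marks every entry dirty in one linear pass.
    void setVariable(double v) {
        variable_ = v;
        std::fill(clean_.begin(), clean_.end(), 0);
    }

    // Recompute only if the entry is dirty, then serve the cached value.
    double get(std::size_t i) {
        assert(i < values_.size());
        if (!clean_[i]) {
            values_[i] = compute_(i, variable_);
            clean_[i] = 1;
        }
        return values_[i];
    }

private:
    std::vector<double> values_;
    std::vector<unsigned char> clean_;  // one byte per flag, sidestepping std::vector<bool>
    std::function<double(std::size_t, double)> compute_;
    double variable_ = 0.0;
};
```

Resetting the flags costs one pass over a small byte array each time the variable changes. Whether that is cheap enough depends on how often the variable changes relative to how often entries are read.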

Related

Most efficient way to iterate through Lists and Forward List

One of the studio questions for my class asks me to create iterators for two different container types (list, forward_list) that point two past the beginning of each, in the MOST EFFICIENT MANNER. I'm not sure what constitutes the most efficient manner.
Obviously I could use a for loop to advance the iterator twice, but I'm not sure that's what is being looked for. I also tried using the next function, but it did not appear to work. Is there a better way than the for loop?
auto it = list.begin();
for (int i = 0; i < 2; i++) {
    ++it;
}
Thanks so much for your time!
Efficiency - for this context - can mean efficient use of memory, cpu cycles, non-duplication of efforts, and more.
The most important aspect of your exercise is to have some kind of understanding of the code that will be used. That does not just mean the code you write, but the code contained in the libraries you use. Once you are comfortable with all of the code in use, then you can begin to analyze it.
Some ideas:
read about the iterator and collection libraries that you use for lists. Also read about how they perform 'begin()' and 'next()'. If there are no articles on this, you may have to have a look at the source code.
can the int loop index be swapped for a byte? (splitting hairs).
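For reference, the standard library's std::next (from the iterator header) works with both std::list and std::forward_list, since it only needs a forward iterator. The sketch below wraps it in a helper whose name, twoPastBegin, is made up for illustration. For these containers std::next still advances one step at a time internally, so it is mainly tidier than the hand-written loop rather than asymptotically faster.

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <list>

// Returns an iterator two elements past the beginning.
// Precondition: the container holds at least two elements.
template <typename Container>
typename Container::const_iterator twoPastBegin(const Container& c) {
    assert(!c.empty());  // cheap partial check; the full precondition is size >= 2
    return std::next(c.begin(), 2);
}
```

Usage would look like `auto it = twoPastBegin(myList);`, or simply `auto it = std::next(myList.begin(), 2);` inline.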

The fastest dynamic data structure in C++ [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I've got a task which mainly consists of adding or removing elements from an array in C++. Since arrays aren't dynamic but operations on them are very fast, I've been looking for a dynamic data structure that is nearly as fast to operate on. I've been thinking about std::vector, but since it is a predefined and fairly heavyweight construct, I'm worried about the cost of its operations, which is crucial for me. Could anybody share their point of view? I'd be very glad for any help!
edited:
I'm really sorry I didn't include all the important points in my question; below I try to add more info:
I'll be traversing the elements of the structure many times and accessing them in a random manner, so operations on elements at every possible position are possible.
I think that there will be (depending on the tests provided) many operations on elements in the middle of the data structure as well as near its ends.
I believe that will make my post clearer, more specific and, thus, more useful for others.
Thank You for all the answers!
Refer to Mikael Persson's "container choice" diagram:
http://www.cim.mcgill.ca/~mpersson/pics/STLcontainerChoices.png
The different data structures were implemented in the STL to be used for different reasons. Therefore the structures differ when it comes to insertion/deletion speeds at the start, the middle or the end of the structures or even when it comes to the random access of the structure elements.
A nice short comparison of STL containers:
http://john-ahlgren.blogspot.com/2013/10/stl-container-performance.html
If it's possible for you to use an associative array, maps at least guarantee an insertion/look-up time of O(log n) which is a good bit faster for large amounts of data/lots of insertions and deletes than vector's guarantee of O(n) for non-back insertions.
Not sure if they will work here or not, this link also shows some graphs of benchmarks using random insert/removes/searches/fills/sorts, etc. on several different containers:
http://www.baptiste-wicht.com/2012/12/cpp-benchmark-vector-list-deque/
Lastly, a flow chart from SO that could help you decide on a container:
In which scenario do I use a particular STL container?
While not perfect, it still might turn out that a vector is your best bet.
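To make the vector option concrete, here is a small sketch of inserting and erasing at arbitrary positions in a std::vector while keeping constant-time indexed access. The helper names insertAt and eraseAt are invented for this example. Both operations shift the elements after the position, so they are linear in the worst case, but the shifting happens in contiguous memory. Measuring with your own data is the only reliable way to compare it against the alternatives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insert `value` before index `pos`; later elements shift one slot to the right.
void insertAt(std::vector<int>& v, std::size_t pos, int value) {
    assert(pos <= v.size());
    v.insert(v.begin() + static_cast<std::ptrdiff_t>(pos), value);
}

// Remove the element at index `pos`; later elements shift one slot to the left.
void eraseAt(std::vector<int>& v, std::size_t pos) {
    assert(pos < v.size());
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(pos));
}
```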
Will a linked list implemented using an array meet your needs?
class AList
{
public:
    AList()
    {
        // Link the 256 nodes into a ring by index.
        for (int i = 0; i != 256; ++i)
        {
            nodes[i].prev = (i - 1 + 256) % 256;
            nodes[i].next = (i + 1) % 256;
        }
    }
    int const& operator[](int index)
    {
        // Deal with the case where nodes[index].isSet == false
        return nodes[index].data;
    }
    // Not sure what the requirements are for adding
    // and removing items from the list.
    //
    // add();
    // remove();
private:
    struct Node
    {
        Node() : data(0), prev(0), next(0), isSet(false) {}
        int data;
        unsigned char prev;
        unsigned char next;
        bool isSet;
    };
    Node nodes[256];
};

why can I not instantiate a class by doing "myVector[i].data()", where myVector[i].data() is a string? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am using code that someone else wrote for calculating chemical reactions. The user must specify many values for a calculation and this can lead to mistakes. I am trying to automate/simplify this process.
I can instantiate a class by doing (for example):
Algorithm<double> chlorine;
I would like to do multiple instantiations, for example chlorine, hydrogen, and oxygen. I don't understand why I get a segmentation fault when I put "chlorine," "hydrogen," and "oxygen" as elements in a vector of strings called "chemicalElements" and then do:
for (i = 0; i < chemicalElements.size(); i++)
{
Algorithm<double> chemicalElements[i].data();
}
Am I missing something simple here? When I write:
Algorithm<double> chlorine;
"chlorine" is just a string, right? So why would it not work to add "chlorine" from an element in a vector of strings?
chlorine is not a string in your example code, it's an identifier for a variable (of type Algorithm<double>).
Variables must be given compile-time identifiers; that means the identifier must be specified when the compiler is traversing your code. The result of chemicalElements[i].data() is unknown until runtime.
C++ doesn't have any facility for creating variable names at runtime, so you cannot do what you are directly asking. However, it sounds like what you really need is a collection of algorithm objects, one for each of your elements. To create an array of algorithm objects, you can do:
Algorithm<double> algorithms[15];
This creates 15 distinct algorithm objects, which you can map to your elements however you like. You can of course choose a different number than 15, so long as that number is a compile-time constant value.
You may also be interested in learning about std::vector<T>, a type that allows you to create dynamically-resizing arrays, or std::map<K,V>, which allows you to create an associative mapping between a key (a string, such as "chlorine") and a value (such as the associated algorithm).
To use the latter, you can do something like this:
std::map<std::string, Algorithm<double>> algorithms;
algorithms["chlorine"] = Algorithm<double>();
algorithms["argon"] = Algorithm<double>();
and then later:
auto results = algorithms["chlorine"].data();
(You should of course peruse the linked documentation on the above types, since I am omitting some error handling for brevity.)
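Putting the pieces together, the sketch below builds one object per name from a vector of strings, which is what the original loop was trying to do. The Algorithm template here is only a stand-in, since the real class is not shown in the question, and the helper names makeAlgorithms and findAlgorithm are made up for this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for the asker's class template (its real interface is not shown).
template <typename T>
struct Algorithm {
    T value{};
    T data() const { return value; }
};

// Build one Algorithm<double> per element name; names become map keys at runtime.
std::map<std::string, Algorithm<double>> makeAlgorithms(
    const std::vector<std::string>& chemicalElements) {
    std::map<std::string, Algorithm<double>> algorithms;
    for (const std::string& name : chemicalElements) {
        algorithms.emplace(name, Algorithm<double>{});
    }
    return algorithms;
}

// Look up without inserting: operator[] would silently create a missing entry.
const Algorithm<double>* findAlgorithm(
    const std::map<std::string, Algorithm<double>>& algorithms,
    const std::string& name) {
    auto it = algorithms.find(name);
    return it == algorithms.end() ? nullptr : &it->second;
}
```

Using find instead of operator[] for lookups is one way to add the error handling mentioned above: a misspelled element name yields a null pointer instead of a freshly default-constructed object.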
Algorithm<double> chlorine; means that you have instantiated an Algorithm<double> object named "chlorine".
To make an array of Algorithm objects, you write:
Algorithm<double> chemicalElements[Const_num];
To access each item, use the array's name plus its index:
chemicalElements[0 or 1 or 2 or ... etc].data();
So the loop would be:
for (i = 0; i < Const_num; i++)
{
    chemicalElements[i].data();
}
In this statement
Algorithm<double> chlorine;
chlorine is not a string. It is an identifier that names an object of type Algorithm<double>.
This construction
Algorithm<double> chemicalElements[i].data();
is not syntactically valid C++, and the compiler will report an error.

Something wrong with BFS maze solving algorithm in OCaml

http://ideone.com/QXyVzR
The above link contains a program I wrote to solve mazes using a BFS algorithm. The maze is represented as a 2D array, initially passed in as numbers, (0's represent an empty block which can be visited, any other number represent a "wall" block), and then converted into a record type which I defined, which keeps track of various data:
type mazeBlock = {
walkable : bool;
isFinish : bool;
visited : bool;
prevCoordinate : int * int
}
The output is a list of ordered pairs (coordinates/indices) which trace a shortest path through the maze from the start to the finish, the coordinates of which are both passed in as parameters.
It works fine for smaller mazes with a low branching factor, but when I test it on larger mazes (say 16 x 16 or larger), especially ones with no walls (high branching factor), it takes up a LOT of time and memory. I am wondering if this is inherent to the algorithm or related to the way I implemented it. Can any OCaml hackers out there offer me their expertise?
Also, I have very little experience with OCaml so any advice on how to improve the code stylistically would be greatly appreciated. Thanks!
EDIT:
http://ideone.com/W0leMv
Here is a cleaned-up, edited version of the program. I fixed some stylistic issues, but I didn't change the semantics. As before, the second test still takes up a huge amount of resources and cannot seem to finish at all. Still seeking help on this issue...
EDIT2:
SOLVED. Thanks so much to both answerers. Here is the final code:
http://ideone.com/3qAWnx
In your critical section, that is mazeSolverLoop, you should only visit elements that have not been visited before. When you take an element from the queue, you should first check whether it has been visited, and in that case do nothing but recurse to get the next element. This is precisely what gives the algorithm its good time complexity (you never visit a place twice).
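The visited-check discipline is language independent. As an illustration only, here it is sketched in C++ rather than OCaml, and it is not a translation of the linked program. This variant marks a cell as visited at the moment it is enqueued, so no cell enters the queue twice, which has the same effect as skipping already-visited cells on dequeue. The function name shortestPathLength and the grid convention (0 is walkable, a walkable start) are assumptions for the example.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;  // (row, column)

// Number of steps on a shortest path from start to finish, or -1 if unreachable.
// Grid convention assumed here: 0 is walkable, anything else is a wall.
int shortestPathLength(const std::vector<std::vector<int>>& grid, Cell start, Cell finish) {
    const int rows = static_cast<int>(grid.size());
    const int cols = rows > 0 ? static_cast<int>(grid[0].size()) : 0;
    assert(start.first >= 0 && start.first < rows &&
           start.second >= 0 && start.second < cols);

    // dist doubles as the visited set: -1 means "not visited yet".
    std::vector<std::vector<int>> dist(rows, std::vector<int>(cols, -1));
    std::queue<Cell> frontier;
    dist[start.first][start.second] = 0;  // mark visited when enqueued
    frontier.push(start);

    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        Cell cur = frontier.front();
        frontier.pop();
        if (cur == finish) return dist[cur.first][cur.second];
        for (int k = 0; k < 4; ++k) {
            int r = cur.first + dr[k];
            int c = cur.second + dc[k];
            if (r < 0 || r >= rows || c < 0 || c >= cols) continue;  // out of bounds
            if (grid[r][c] != 0 || dist[r][c] != -1) continue;       // wall or already visited
            dist[r][c] = dist[cur.first][cur.second] + 1;
            frontier.push({r, c});
        }
    }
    return -1;
}
```

With this discipline each cell is enqueued at most once, so an open 16 x 16 grid involves at most 256 queue insertions.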
Otherwise, yes, your OCaml style could be improved. Some remarks:
the convention in OCaml-land is rather to write_like_this instead of writeLikeThis. I recommend that you follow it, but admittedly that is a matter of taste and not an objective criterion.
there is no point in returning a datastructure if it is a mutable structure that was updated; why do you make a point to always return a (grid, pair) queue, when it is exactly the same as the input? You could just have those functions return unit and have code that is simpler and easier to read.
the abstraction level allowed by pairs is good and you should preserve it; you currently don't. There is no point in writing for example, let (foo, bar) = dimension grid in if in_bounds pos (foo, bar). Just name the dimension dim instead of (foo, bar), it makes no sense to split it in two components if you don't need them separately. Remark that for the neighbor, you do use neighborX and neighborY for array access for now, but that is a style mistake: you should have auxiliary functions to get and set values in an array, taking a pair as input, so that you don't have to destruct the pair in the main function. Try to keep all the code inside a single function at the same level of abstraction: all working on separate coordinates, or all working on pairs (named as such instead of being constructed/deconstructed all the time).
If I understand you right, for an N x N grid with no walls you have a graph with N^2 nodes and roughly 4*N^2 edges. These don't seem like big numbers for N = 16.
I'd say the only trick is to make sure you track visited nodes properly. I skimmed your code and don't see anything obviously wrong in the way you're doing it.
Here is a good OCaml idiom. Your code says:
let isFinish1 = mazeGrid.(currentX).(currentY).isFinish in
let prevCoordinate1 = mazeGrid.(currentX).(currentY).prevCoordinate in
mazeGrid.(currentX).(currentY) <-
{ walkable = true;
isFinish = isFinish1;
visited = true;
prevCoordinate = prevCoordinate1}
You can say this a little more economically as follows:
mazeGrid.(currentX).(currentY) <-
{ mazeGrid.(currentX).(currentY) with visited = true }

How to create a hash table

I would like to mention before continuing that I have looked at other questions asking the same thing on this site as well as on other sites. I hope that I can get a good answer, because my goal is twofold:
Foremost, I would like to learn how to create a hash table.
Secondly, I find that a lot of answers on Stack Overflow tend to assume a certain level of knowledge on a subject that is often not there, especially for the newer types. That being said, I hope to edit my main message to include an explanation of the process a bit more in depth once I figure it out myself.
Onto the main course:
As I understand it so far, a hash table is an array of lists (or a similar data structure) that hopes to, optimally, have as few collisions as possible in order to preserve its lauded O(1) complexity. The following is my current process:
So my first step is to create an array of pointers:
Elem ** table;
table = new Elem*[size];//size is the desired size of the array
My second step is to create a hashing function( a very simple one ).
int hashed = 0;
hashed = ( atoi( name.c_str() ) + id ) % size;
//name is a std string, and id is a large integer. Size is the size of the array.
My third step would be to create something to detect collisions, which is the part I'm currently at.
Here's some pseudo-code:
while( table[hashedValue] != empty )
hashedValue++
else
put in the list at that index.
It's relatively inelegant, but I am still at the "what is this" stage. Bear with me.
Is there anything else? Did I miss something or do something incorrectly?
Thanks
Handle finding no empty slots and resizing the table.
You're missing a definition for Elem. That's not trivial, as it depends on whether you want a chaining or a probing hash table.
A hash function produces the same value for the same data. Your collision check, however, modifies that value, which means that the hash value not only depends on the input, but also on the presence of other elements in the hash map. This is bad, as you almost never will be able to actually access the element you put in before through its name, only through iterating over the map.
Second, your collision check is vulnerable to overflow / range errors, as you just increase the hash value without checking against the size of the map (though, as I said before, you shouldn't even be doing this).
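One way to reconcile the points above is to keep the hash a pure function of the key and treat probing as a separate step that insertion and lookup both repeat in the same order, wrapping around at the end of the array. The following is a minimal sketch of that idea (open addressing with linear probing), assuming C++17 for std::optional. It stores name/id pairs and does not support deletion or resizing. The class name ProbingTable and its interface are invented for this example and are not the asker's Elem design.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Fixed-capacity open-addressing table mapping names to ids.
class ProbingTable {
public:
    explicit ProbingTable(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);  // home() divides by the capacity
    }

    // Returns false when the table is full; a complete table would grow and rehash.
    bool insert(const std::string& name, int id) {
        std::size_t i = home(name);
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
            if (!slots_[i] || slots_[i]->first == name) {
                slots_[i] = std::make_pair(name, id);  // new entry or overwrite
                return true;
            }
        }
        return false;
    }

    // Follows the same probe sequence as insert, stopping at the first empty slot.
    std::optional<int> find(const std::string& name) const {
        std::size_t i = home(name);
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) % slots_.size()) {
            if (!slots_[i]) return std::nullopt;
            if (slots_[i]->first == name) return slots_[i]->second;
        }
        return std::nullopt;
    }

private:
    // The home index depends only on the key, never on what else is stored.
    std::size_t home(const std::string& name) const {
        return std::hash<std::string>{}(name) % slots_.size();
    }

    std::vector<std::optional<std::pair<std::string, int>>> slots_;
};
```

The modulo in the probe step addresses the range concern, and the bounded loop addresses the case where no empty slot exists. Growing the table when it fills up is left out to keep the sketch short.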