I'm just getting started with Go, and I have a situation where I need a collection of entities whose size/length is only known at runtime. I first thought a list would be a good fit, but I soon realized slices are the idiomatic data structure in Go.
Curious, I wrote the following benchmarks:
package main

import (
    "container/list"
    "testing"
)

var N = 10000000

func BenchmarkSlices(B *testing.B) {
    s := make([]int, 1)
    for i := 0; i < N; i += 1 {
        s = append(s, i)
    }
}

func BenchmarkLists(B *testing.B) {
    l := list.New()
    for i := 0; i < N; i += 1 {
        l.PushBack(i)
    }
}
which gave me
BenchmarkSlices-4 2000000000 0.03 ns/op
BenchmarkLists-4 1 1665489308 ns/op
Given that append creates a new array and copies all the data over from the old array to the new one when the old array is full, I expected lists to perform better than slices in the example above. However, my expectations are obviously wrong, and I'm trying to understand why.
I wrote the following in order to understand a little better how append creates new arrays when it needs to:
package main

import "fmt"

func describe(s []int) {
    fmt.Printf("len = %d, cap = %d\n", len(s), cap(s))
}

func main() {
    s := make([]int, 2)
    for i := 0; i < 15; i += 1 {
        fmt.Println(i)
        describe(s)
        s = append(s, i)
    }
}
which gave me
0
len = 2, cap = 2
1
len = 3, cap = 4
2
len = 4, cap = 4
3
len = 5, cap = 8
4
len = 6, cap = 8
5
len = 7, cap = 8
6
len = 8, cap = 8
7
len = 9, cap = 16
8
len = 10, cap = 16
9
len = 11, cap = 16
10
len = 12, cap = 16
11
len = 13, cap = 16
12
len = 14, cap = 16
13
len = 15, cap = 16
14
len = 16, cap = 16
My only guess at the moment as to why slices perform better than lists is that allocating memory for a new array of double the size and copying all the data over is faster than allocating memory for a single element on every insertion.
Is my guess correct? Am I missing something?
You are running the benchmarks wrong. You should first set up the initial data structure and then run the operation being benchmarked as many times as the testing.B instance indicates.
I replaced your code with:
var N = 1

func BenchmarkSlices(B *testing.B) {
    s := make([]int, 1)
    for n := 0; n < B.N; n++ {
        for i := 0; i < N; i++ {
            s = append(s, i)
        }
    }
}

func BenchmarkLists(B *testing.B) {
    l := list.New()
    for n := 0; n < B.N; n++ {
        for i := 0; i < N; i++ {
            l.PushBack(i)
        }
    }
}
And got this result:
BenchmarkSlices-4 100000000 14.3 ns/op
BenchmarkLists-4 5000000 275 ns/op
At least this time the difference seems reasonable, not a factor of a trillion.
Note that I also changed the value of N to 1 so that ns/op actually means nanoseconds per operation and not nanoseconds per N operations. However, this might also impact the results.
Now onto your question: linked lists, as implemented in Go, suffer from additional costs compared to simply adding another int to a pre-allocated slice: the list method needs to create a new Element, wrap your value in an interface{}, and reassign some pointers.
Meanwhile appending to a slice which has not maxed out its capacity will result in just a few instructions at the CPU level: move an int to a memory location, increment the length of the slice, and you're done.
There is also the fact that the underlying allocator might reallocate the slice in place, thus avoiding the need to copy the existing underlying array elements at all.
Related
I'm trying to solve a graph problem:
There are (n+1) cities (numbered 0 to n) and (m+1) bus lines (numbered 0 to m).
(A line may contain repeated cities, meaning the line has a cycle.)
Each line covers several cities, and it takes t_ij to run from city i to city j (t_ij may differ between lines).
Moreover, an extra transfer_time is added each time you get on a bus.
An edge looks like this: city i --(time)--> city j
Example1:
n = 2, m = 2, start = 0, end = 2
line0:
0 --(1)--> 1 --(1)--> 2; transfer_time = 1
line1:
0 --(2)--> 2; transfer_time = 2
Line0 takes 1+1+1 = 3 and line1 takes 2+2 = 4, so the min is 3.
Example2:
n = 4, m = 0, start = 0, end = 4
line0:
0 --(2)--> 1 --(3)--> 2 --(3)--> 3 --(3)--> 1 --(2)--> 4; transfer_time = 1
It takes 1 (get on at 0) + 2 (from 0 to 1) + 1 (get off and get on again, a transfer) + 2 (from 1 to 4) = 6.
I've tried to solve it with Dijkstra's algorithm, but failed to handle graphs with cycles (like Example2).
Below is my code.
struct Edge {
    int len;
    size_t line_no;
};

class Solution {
public:
    Solution() = default;

    // edges[i][j] is a vector, containing ways from city i to j in different lines
    int findNearestWay(vector<vector<vector<Edge>>>& edges, vector<int>& transfer_time, size_t start, size_t end) {
        size_t n = edges.size();
        vector<pair<int, size_t>> distance(n, { INT_MAX / 2, -1 }); // first: len, second: line_no
        distance[start].first = 0;
        vector<bool> visited(n);
        int cur_line = -1;
        for (int i = 0; i < n; ++i) {
            int next_idx = -1;
            // find the nearest city
            for (int j = 0; j < n; ++j) {
                if (!visited[j] && (next_idx == -1 || distance[j].first < distance[next_idx].first))
                    next_idx = j;
            }
            visited[next_idx] = true;
            cur_line = distance[next_idx].second;
            // update distance of other cities
            for (int j = 0; j < n; ++j) {
                for (const Edge& e : edges[next_idx][j]) {
                    int new_len = distance[next_idx].first + e.len;
                    // transfer
                    if (cur_line == -1 || cur_line != e.line_no) {
                        new_len += transfer_time[e.line_no];
                    }
                    if (new_len < distance[j].first) {
                        distance[j].first = new_len;
                        distance[j].second = e.line_no;
                    }
                }
            }
        }
        return distance[end].first == INT_MAX / 2 ? -1 : distance[end].first;
    }
};
Is there a better way to solve this? Thanks in advance.
Your visited set looks wrong. The "nodes" you would feed into Dijkstra's algorithm cannot simply be cities, because that doesn't let you model the cost of switching from one line to another within a city. Each node must be a pair consisting of a city number and a bus line number, representing the bus line you are currently riding on. The bus line number can be -1 to represent that you are not on a bus, and the starting and destination nodes would have a bus line number of -1.
Then each edge would represent either the cost of staying on the line you are currently on to ride to the next city, or getting off the bus in your current city (which is free), or getting on a bus line within your current city (which has the transfer cost).
You said the cities are numbered from 0 to n but I see a bunch of loops that stop at n - 1 (because they use < n as the loop condition), so that might be another problem.
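For illustration, here is a minimal sketch (C++17, untested) of that (city, line) formulation, reusing the Edge type and argument layout from the question; line index 0 stands for "not on a bus", and a real line x is stored at index x + 1:
#include <climits>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>
using namespace std;

// Dijkstra over (city, line) states, reusing the Edge struct from the question.
int findNearestWay(vector<vector<vector<Edge>>>& edges,
                   vector<int>& transfer_time, size_t start, size_t end) {
    int n = (int)edges.size();
    int m = (int)transfer_time.size();
    vector<vector<int>> dist(n, vector<int>(m + 1, INT_MAX / 2));
    using State = tuple<int, int, int>; // (distance, city, line index)
    priority_queue<State, vector<State>, greater<State>> pq;
    dist[start][0] = 0;
    pq.emplace(0, (int)start, 0);
    while (!pq.empty()) {
        auto [d, city, line] = pq.top();
        pq.pop();
        if (d > dist[city][line]) continue;   // stale queue entry
        if (line != 0 && d < dist[city][0]) { // getting off the bus is free
            dist[city][0] = d;
            pq.emplace(d, city, 0);
        }
        for (int j = 0; j < n; ++j) {
            for (const Edge& e : edges[city][j]) {
                int next = (int)e.line_no + 1;
                int nd = d + e.len;
                if (next != line) // boarding a different line costs a transfer
                    nd += transfer_time[e.line_no];
                if (nd < dist[j][next]) {
                    dist[j][next] = nd;
                    pq.emplace(nd, j, next);
                }
            }
        }
    }
    return dist[end][0] == INT_MAX / 2 ? -1 : dist[end][0];
}
Each pop either gets off for free or extends the ride, paying the transfer cost only when boarding a different line.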
I have a for-loop that constructs a vector of 101 elements, using one formula (call it equation 1) for the first half of the vector, equation 2 for the centre element, and mirroring the first half into the latter half.
Like so,
double fc = 0.25;
const double PI = 3.1415926;

// initialise vectors
int M = 50;
int N = 101;
std::vector<double> fltr;
fltr.resize(N);
std::vector<int> mArr;
mArr.resize(N);

// Creating vector mArr of 101 elements, going from -50 to +50
int count;
for (count = 0; count < N; count++)
    mArr[count] = count - M;

// using these elements, enter in to equations to form vector 'fltr'
int n;
for (n = 0; n < M+1; n++)
    // for elements 0 to 50 --> use equation 1
    fltr[n] = (sin((fc*mArr[n])-M))/((mArr[n]-M)*PI);
    // for element 51 --> use equation 2
    fltr[M] = fc/PI;
This part of the code works fine and does what I expect, but for elements 52 to 101 I would like to mirror the vector around element 51 (the element computed with equation 2).
For a basic example;
1 2 3 4 5 6 0.2 6 5 4 3 2 1
This is what I have so far, but it just outputs 0's as the elements:
for(n = N; n > M; n--){
for(i = 0; n < M+1; i++)
fltr[n] = fltr[i];
}
I feel like there is an easier way to mirror part of a vector but I'm not sure how.
I would expect the values to plot symmetrically around the centre element.
After you have inserted the middle element, you can take a reverse iterator to the midpoint and copy that range back into the vector through std::back_inserter. Reserve first, so that push_back cannot reallocate and invalidate the iterators. The vector is named vec in the example.
vec.reserve(2 * vec.size() - 1); // prevent reallocation, which would invalidate the iterators
auto rbeg = vec.rbegin(), rend = vec.rend();
++rbeg; // skip the middle element itself
copy(rbeg, rend, back_inserter(vec));
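For instance, a small self-contained sketch (values chosen for illustration):
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::vector<double> vec{1, 2, 3};   // first half plus the middle element
    vec.reserve(2 * vec.size() - 1);    // keep the reverse iterators valid
    auto rbeg = vec.rbegin(), rend = vec.rend();
    ++rbeg;                             // skip the middle element
    std::copy(rbeg, rend, std::back_inserter(vec));
    for (double x : vec) std::cout << x << ' '; // prints: 1 2 3 2 1
}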
Let's look at your code:
for (n = N; n > M; n--)
    for (i = 0; n < M+1; i++)
        fltr[n] = fltr[i];
And let's make things shorter: N = 5, M = 2,
array is 1 2 3 0 0 and should become 1 2 3 2 1
Note first that the inner loop's condition tests n, which never changes inside that loop; since the outer loop keeps n > M, the condition n < M+1 is always false and the inner body never runs, which is why you only get zeros. The walkthrough below assumes you meant i < M+1. Consider the outer iteration with n = 3, pointing us to the first zero. In the inner loop, we set i to 0 and assign fltr[3] = fltr[0], leaving us with the array as
1 2 3 1 0
We could now continue, but it should be obvious that this first assignment was useless.
With this I want to give you a simple way to go through your code and see what it actually does. You clearly had something different in mind. What should be clear is that we need to assign every element of the second half exactly once.
What your code does is for each value of n to change the value of fltr[n] M times, ending with setting it to fltr[M] in any case, regardless of what value n has. The result should be that all values in the second half of the array are now the same as the center, in my example it ends with
1 2 3 3 3
Note that there is also a direct error: starting with n = N and then accessing fltr[n]. N is out of bounds for an array of size N.
To give you a very simple working solution:
for(int i=0; i<M; i++)
{
fltr[N-i-1] = fltr[i];
}
N-i-1 is the mirrored address of i (i = 0 -> N-i-1 = 101-0-1 = 100, last valid address in an array with 101 entries).
Now, I saw several people answering with more elaborate code, but I thought that as a beginner, it might be beneficial for you to do this in a very simple manner.
Other than that, as @Pzc already said in the comments, you could do this assignment in the loop where the data is generated.
Another thing, with your code
for (n = 0; n < M+1; n++)
    // for elements 0 to 50 --> use equation 1
    fltr[n] = (sin((fc*mArr[n])-M))/((mArr[n]-M)*PI);
    // for element 51 --> use equation 2
    fltr[M] = fc/PI;
I have two issues:
First, the indentation makes it look like fltr[M]=... is inside the loop. Don't do that, even if it was just a mistake when writing the question and the real code is different. It will lead to errors in the future. Indentation is important; using the auto-indentation of your IDE is an easy way to go. And try to use brackets, even if the body is only one command.
Second, n < M+1 as a condition includes the center. The center is located at address 50, and 50 < 50+1. You haven't seen a problem because you overwrite it after the loop, but in a different situation this can easily produce errors.
There are other small things I'd change, and I recommend that, when your code works, you post it on CodeReview.
Let's use std::iota, std::transform, and std::copy instead of raw loops:
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

const double fc = 0.25;
constexpr double PI = 3.1415926;
const std::size_t M = 50;
const std::size_t N = 2 * M + 1;

std::vector<double> mArr(M);
std::iota(mArr.rbegin(), mArr.rend(), 1.); // = [M, M - 1, ..., 1]

const auto fn = [=](double m) { return std::sin((fc * m) + M) / ((m + M) * PI); };

std::vector<double> fltr(N);
std::transform(mArr.begin(), mArr.end(), fltr.begin(), fn);
fltr[M] = fc / PI;
std::copy(fltr.begin(), fltr.begin() + M, fltr.rbegin());
I'm trying to rewrite this outer for-loop so that multiple threads can execute the iterations in parallel. Actually, for now I just want two threads to compute this, although a more general solution would be ideal. The issue is that there's a loop-carried dependence: each iteration's avval is accumulated into nval[j], which may already hold contributions from earlier iterations.
void loop_func(Arr* a, int blen, double* nval) {
    // an Arr is a struct with a val array and a len field
    assert(a->len < blen);
    long long int i = 0, j = 0, j_lower = 0, j_upper = 0;
    double avval = 0.0;
    for (i = 0; i < blen; i++)
        nval[i] = 0.0;
    const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
    for (i = 1; i < a->len - 1; i++) {
        j_lower = static_cast<long long>(0.5 * (2 * i - 1) * quot);
        j_upper = static_cast<long long>(0.5 * (2 * i + 1) * quot);
        printf("a->val index: %lld\t\tnval index: %lld-%lld\n", i, j_lower, j_upper);
        avval = a->val[i] / (j_upper - j_lower + 1);
        for (j = j_lower; j <= j_upper; j++) {
            nval[j] += avval;
        }
    }
}
For reference, here is the output of the printf above.
a->val index: 1 nval index: 1-3
a->val index: 2 nval index: 3-5
a->val index: 3 nval index: 5-7
a->val index: 4 nval index: 7-9
a->val index: 5 nval index: 9-11
a->val index: 6 nval index: 11-13
I'd ideally want thread 1 to handle a->val indices 1-3 and thread 2 to handle a->val indices 4-6.
Question 1: can anyone describe a code transformation that would remove this dependence?
Question 2: are there C/C++ tools that can do this? Maybe something built with LLVM? I will likely have a number of different situations like this where I need to do some parallel execution. Similarly, if there are general techniques that can be applied to remove such loop-carried dependencies, I'd like to learn about this more generally.
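For reference, one standard transformation here is array privatization: since adjacent iterations of i only conflict at a shared boundary j, each thread can accumulate into its own private copy of nval, and the copies are summed afterwards. A minimal sketch with std::thread, assuming the same Arr layout as above (the split point and helper names are illustrative):
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical parallel variant of loop_func: two threads write to private
// buffers, then a serial pass merges them into nval.
void loop_func_parallel(Arr* a, int blen, double* nval) {
    assert(a->len < blen);
    const int nthreads = 2;
    const double quot = static_cast<double>(blen) / static_cast<double>(a->len);
    std::vector<std::vector<double>> partial(nthreads, std::vector<double>(blen, 0.0));
    auto worker = [&](int t, long long ilo, long long ihi) {
        double* out = partial[t].data();
        for (long long i = ilo; i < ihi; i++) {
            long long j_lower = static_cast<long long>(0.5 * (2 * i - 1) * quot);
            long long j_upper = static_cast<long long>(0.5 * (2 * i + 1) * quot);
            double avval = a->val[i] / (j_upper - j_lower + 1);
            for (long long j = j_lower; j <= j_upper; j++)
                out[j] += avval; // private buffer: no sharing between threads
        }
    };
    long long mid = 1 + (a->len - 2) / 2; // split the i range [1, a->len - 1)
    std::thread t1(worker, 0, 1LL, mid);
    std::thread t2(worker, 1, mid, static_cast<long long>(a->len - 1));
    t1.join();
    t2.join();
    for (int i = 0; i < blen; i++) { // merge: the only serial step
        nval[i] = 0.0;
        for (int t = 0; t < nthreads; t++)
            nval[i] += partial[t][i];
    }
}
The extra cost is one buffer per thread plus the merge pass; OpenMP (4.5+) can automate the same pattern with an array-section reduction.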
I have a list of 100 random integers. Each random integer has a value from 0 to 99. Duplicates are allowed, so the list could be something like
56, 1, 1, 1, 1, 0, 2, 6, 99...
I need to find the smallest integer (>= 0) that is not contained in the list.
My initial solution is this:
vector<int> integerList(100); //list of random integers
...
vector<bool> listedIntegers(101, false);
for (int theInt : integerList)
{
listedIntegers[theInt] = true;
}
int smallestInt;
for (int j = 0; j < 101; j++)
{
if (!listedIntegers[j])
{
smallestInt = j;
break;
}
}
But that requires a secondary array for book-keeping and a second (potentially full) list iteration. I need to perform this task millions of times (the actual application is in a greedy graph coloring algorithm, where I need to find the smallest unused color value with a vertex adjacency list), so I'm wondering if there's a clever way to get the same result without so much overhead?
It's been a year, but ...
One idea that comes to mind is to keep track of the interval(s) of unused values as you iterate the list. To allow efficient lookup, you could keep intervals as tuples in a binary search tree, for example.
So, using your sample data:
56, 1, 1, 1, 1, 0, 2, 6, 99...
You would initially have the unused interval [0..99], and then, as each input value is processed:
56: [0..55][57..99]
1: [0..0][2..55][57..99]
1: no change
1: no change
1: no change
0: [2..55][57..99]
2: [3..55][57..99]
6: [3..5][7..55][57..99]
99: [3..5][7..55][57..98]
Result (lowest value in lowest remaining interval): 3
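A minimal sketch of this idea, using std::map as the balanced search tree (the function name and signature are illustrative):
#include <map>

// Tracks the unused ranges as [start, end] pairs: key = start, value = end.
int smallest_unused(const int* values, int count, int maxValue) {
    std::map<int, int> free_intervals;
    free_intervals[0] = maxValue;
    for (int k = 0; k < count; ++k) {
        int v = values[k];
        auto it = free_intervals.upper_bound(v); // first interval starting after v
        if (it == free_intervals.begin()) continue;
        --it;                                    // the interval that could contain v
        int lo = it->first, hi = it->second;
        if (v > hi) continue;                    // v was already removed
        free_intervals.erase(it);                // split [lo, hi] around v
        if (lo < v) free_intervals[lo] = v - 1;
        if (v < hi) free_intervals[v + 1] = hi;
    }
    return free_intervals.empty() ? maxValue + 1 : free_intervals.begin()->first;
}
Feeding it the sample values above returns 3, matching the trace.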
I believe there is no faster way to do it. What you can do in your case is to reuse the vector<bool>: you need just one such vector per thread.
Though the better approach might be to reconsider the whole algorithm to eliminate this step entirely. Maybe you can update the least unused color on every step of the algorithm?
Since you have to scan the whole list no matter what, the algorithm you have is already pretty good. The only improvement I can suggest without measuring (that will surely speed things up) is to get rid of your vector<bool>, and replace it with a stack-allocated array of 4 32-bit integers or 2 64-bit integers.
Then you won't have to pay the cost of allocating an array on the heap every time, and you can get the first unused number (the position of the first 0 bit) much faster. To find the word that contains the first 0 bit, you only need to find the first one that isn't the maximum value, and there are bit twiddling hacks you can use to get the first 0 bit in that word very quickly.
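A sketch of that idea with two 64-bit words, assuming GCC/Clang's __builtin_ctzll for the bit scan (the function name is illustrative):
#include <cstdint>

// Values are in [0, 99], so 128 bits on the stack cover the whole range.
int first_unused(const int* values, int count) {
    uint64_t seen[2] = {0, 0};
    for (int k = 0; k < count; ++k)
        seen[values[k] >> 6] |= 1ULL << (values[k] & 63);
    for (int w = 0; w < 2; ++w)
        if (seen[w] != ~0ULL) // word with at least one 0 bit
            return w * 64 + __builtin_ctzll(~seen[w]); // index of its first 0 bit
    return 128; // unreachable for 100 values in [0, 99]
}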
Your program is already very efficient, in O(n). Only a marginal gain can be found.
One possibility is to divide the range of possible values into blocks of size block, and to register the values not in an array of bool but in an array of int, memorizing each value modulo block as one bit.
In practice, we replace a loop of size N by a loop of size N/block plus a loop of size block.
Theoretically, we could select block = sqrt(N) = 10 in order to minimize the quantity N/block + block.
In the program hereafter, blocks of size 8 are selected, assuming that dividing integers by 8 and calculating values modulo 8 should be fast.
However, it is clear that a gain, if any, can be obtained only when the smallest missing value is rather large!
constexpr int N = 100;

int find_min1(const std::vector<int>& IntegerList) {
    constexpr int block = 8;
    constexpr int Size = 13;   // ceil(N / block)
    constexpr int Vmax = 255;  // 2^block - 1
    int listedBlocks[Size] = {0};
    for (int theInt : IntegerList) {
        listedBlocks[theInt / block] |= 1 << (theInt % block);
    }
    for (int j = 0; j < Size; j++) {
        if (listedBlocks[j] == Vmax) continue;
        int& k = listedBlocks[j];
        for (int b = 0; b < block; b++) {
            if ((k % 2) == 0) return block * j + b;
            k /= 2;
        }
    }
    return -1;
}
Potentially you can reduce the last step to O(1) by using some bit manipulation: in your case, set the corresponding bits of an __int128 in loop one, then call something like __builtin_clz, or use an appropriate bit hack.
The best solution I could find for finding smallest integer from a set is https://codereview.stackexchange.com/a/179042/31480
Here is a C++ version (note that it finds the smallest missing positive integer, i.e. >= 1):
int solution(std::vector<int>& A)
{
    for (std::vector<int>::size_type i = 0; i != A.size(); i++)
    {
        // keep moving A[i] toward its home slot A[i] - 1 while possible
        while (0 < A[i] && A[i] - 1 < A.size()
               && A[i] != i + 1
               && A[i] != A[A[i] - 1])
        {
            std::swap(A[i], A[A[i] - 1]);
        }
    }
    for (std::vector<int>::size_type i = 0; i != A.size(); i++)
    {
        if (A[i] != i + 1)
        {
            return i + 1;
        }
    }
    return A.size() + 1;
}
I know that a reverse-ordered list should yield Θ(n^2) comparisons and Θ(n^2) exchanges for bubble sort. In my sample code I am using a list of size n = 10. I implemented counters for numComparisons and numExchanges, and although this doesn't seem very complicated, I can't figure out why my results don't yield 100 comparisons and 100 exchanges. Am I really far off target?
void testList::bubbleSort()
{
int k = 10;
bool flag = true;
while(flag)
{
k = k - 1;
flag = false;
for(int j = 0; j < k; j++)
{
if( vecPtr[j] > vecPtr[j+1])
{
int temp = vecPtr[j];
vecPtr[j] = vecPtr[j+1];
vecPtr[j+1] = temp;
numExchanges += 1;
flag = true;
}
numComparisons++;
}
}
}
The resulting output:
Original List: 10 9 8 7 6 5 4 3 2 1
Sorted List: 1 2 3 4 5 6 7 8 9 10
Comparisons: 45
Exchanges: 45
I also tried this implementation, but my results were the same:
void testList::bubbleSort()
{
int temp;
for(long i = 0; i < 10; i++)
{
for(long j = 0; j < 10-i-1; j++)
{
if (vecPtr[j] > vecPtr[j+1])
{
temp = vecPtr[j];
vecPtr[j] = vecPtr[j+1];
vecPtr[j+1] = temp;
numExchanges++;
}
numComparisons++;
}
}
}
Approximately N^2/2 comparisons and exchanges are expected.
In particular, the inner loop starts at the current value of the outer loop. So, on the first iteration, it traverses the entire array. On each subsequent iteration, it traverses one fewer item in the array.
So, the number of iterations of the inner loop is N + N-1 + N-2 + ... + 1. On average, that's approximately N/2.
If you want to get more precise, there's one more detail to consider: the inner loop iterates over i+1...N, so its largest count is N-1 iterations, not N.
Therefore, instead of being precisely N^2/2, it's really N * (N-1)/2. In your case, that's 10 * 9 / 2 = 45.
That's the count for the number of comparisons. For swaps, you get some percentage of that, depending on the number of items that are out of order. In your specific case, all items are always out of order (because you're starting with reverse order) so you do a swap for every comparison. With any other ordering, you'd expect the number of swaps to be reduced.
45 = 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1, so for the exchanges this is correct, but for the comparisons I think there must be a mistake somewhere. Edit: You implemented a slightly more intelligent version than the standard bubble sort; that's why you get only 45 comparisons instead of 90 (and it's not 100, because one full pass takes 9 comparisons, not 10).
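To make the two counts concrete, here is a small sketch that just counts the inner-loop iterations of a full-pass bubble sort versus the shrinking version from the question (counter names are illustrative):
#include <iostream>

int main() {
    const int n = 10;
    long naive = 0, shrinking = 0;
    for (int i = 0; i < n; i++)          // full pass every time: 10 * 9
        for (int j = 0; j < n - 1; j++)
            naive++;
    for (int i = 0; i < n; i++)          // bound shrinks: 9 + 8 + ... + 1 + 0
        for (int j = 0; j < n - i - 1; j++)
            shrinking++;
    std::cout << naive << " vs " << shrinking << "\n"; // prints: 90 vs 45
}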