I was trying to understand the code that generates all the subsets of a set. Here is the code:
#include <stdio.h>

/* Applies the mask to a set like {1, 2, ..., n} and prints it */
void printv(int mask[], int n) {
    int i;
    printf("{ ");
    for (i = 0; i < n; ++i)
        if (mask[i])
            printf("%d ", i + 1); /* i+1 is part of the subset */
    printf("\b }\n");
}

/* Generates the next mask */
int next(int mask[], int n) {
    int i;
    for (i = 0; (i < n) && mask[i]; ++i)
        mask[i] = 0;
    if (i < n) {
        mask[i] = 1;
        return 1;
    }
    return 0;
}

int main(int argc, char *argv[]) {
    int n = 3;
    int mask[16]; /* Guess what this is */
    int i;
    for (i = 0; i < n; ++i)
        mask[i] = 0;
    /* Print the first set */
    printv(mask, n);
    /* Print all the others */
    while (next(mask, n))
        printv(mask, n);
    return 0;
}
I do not understand the logic behind the line for (i = 0; (i < n) && mask[i]; ++i) inside the next function. How is the next mask being generated here?
Code and algorithm found here:
http://compprog.wordpress.com/2007/10/10/generating-subsets/
That is simply an implementation of counting in binary. The basic idea is to change the least-significant (last) zero to a one, and change all the ones after it to zeroes. The "next" mask will be "one more" than the previous if interpreted as a binary number.
Because the array is arranged with the one's place first, it looks backwards from traditional numeric notation.
Instead of using an array of Boolean values, it could just as well use the bits in the binary representation of one number and the ++ operator.
int next(int &mask, int n) { // using a C++ reference
    if (mask == (1u << n) - 1)
        return 0;
    ++mask;
    return 1;
}

void printv(int mask, int n) {
    int i;
    printf("{ ");
    for (i = 0; i < n; ++i)
        if (mask & (1 << i))
            printf("%d ", i + 1); /* i+1 is part of the subset */
    printf("\b }\n");
}
I've used a little C++ since you tagged the question as such, but the posted code is plain C.
Last year I participated in the C language contest of the 6th ITAT competition, where I solved the second problem by generating all combinations with the help of a mask (though it might not be an optimal solution to that problem).
When you try to derive all the subsets of {a,b,c}, you do it this way:
You may or may not take the first element a.
May or may not take the 2nd element b.
Same for c.
So you wind up with a set of 3 take-or-not-take choices. This can be represented with binary digits or Boolean values: represent taking by 1 and not taking by 0.
You get the following eight masks (in the order a, b, c):
000 100 010 110 001 101 011 111
To generate the next mask of 110:
element 0 is 1. Switch it to 0.
element 1 is 1. Switch it to 0.
element 2 is 0. Switch it to 1.
now you have 001 which is the next mask, which generates subset {c}.
for (i = 0; (i < n) && mask[i]; ++i) does exactly that.
1. Start at element 0.
2. while (i doesn't exceed your mask length AND element i is 1)
3. Do the body code, which flips element i to 0, and ++i (go to the next element). Go to 2 (check).
If the current mask is 111 (the last mask), the loop clears every element and i reaches n, so the next() function returns 0 to indicate END.
(P.S. a non-zero integer always represents true.)
The loop in question starts at the beginning of the array and sets all 1s to 0s until a 0 is encountered. The next statement sets this 0 to a 1 (if possible). So what happens is: 0,0,0 -> 1,0,0 -> 0,1,0 -> 1,1,0 -> 0,0,1... I am not a hardcore C programmer, but I think this could have been done more easily by using a single integer as a bit field and incrementing it by 1 each time.
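A minimal sketch of that single-integer idea (my own illustration, not code from the answer): the counter itself is the mask, and ++ does all the carrying that next() does by hand.
#include <stdio.h>

int main(void) {
    int n = 3;
    unsigned mask;
    /* Each value of mask from 0 to 2^n - 1 encodes one subset of {1, ..., n}. */
    for (mask = 0; mask < (1u << n); ++mask) {
        int i;
        printf("{ ");
        for (i = 0; i < n; ++i)
            if (mask & (1u << i)) /* bit i set means i+1 is in the subset */
                printf("%d ", i + 1);
        printf("}\n");
    }
    return 0;
}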
for (i = 0; (i < n) && mask[i]; ++i)
for:
start at 0 and increment i by 1 each time
keep going as long as i is less than n and the bit in the mask array at position i is set
it's straightforward really: 3 parts to a for statement: initial state; end condition; operation.
So if you can understand for (i=0; i < 5; i++) means start at 0 and increment i by 1 each time until it fails to be less than 5, you can understand the more complex for loop you asked about.
in this case, it's going through the loop looking for the first element of the mask that is not set, clearing each element as it goes; then it performs the follow-up operation, namely setting that first clear element to 1 (or, if every bit was set and it reached the end of the array, reporting that there is no next mask). Stepping through, that counts through all the masks in sequence: 000, 100, 010, 110, 001, and so on.
I am trying to find the number of subarrays that have a sum equal to k:
int subarraySum(vector<int>& nums, int k)
{
    int start, end, curr_sum = 0, count = 0;
    start = 0, end = 0;
    while (end < (int)nums.size())
    {
        curr_sum = curr_sum + nums[end];
        end++;
        while (start < end && curr_sum >= k)
        {
            if (curr_sum == k)
                count++;
            curr_sum = curr_sum - nums[start];
            start++;
        }
    }
    return count;
}
The above code I have written works for most cases, but fails for the following:
array = {-1, -1, 1} with k = 0
I have tried to add another while loop to iterate from the start and go up the array until it reaches the end:
int subarraySum(vector<int>& nums, int k)
{
    int start, end, curr_sum = 0, count = 0;
    start = 0, end = 0;
    while (end < (int)nums.size())
    {
        curr_sum = curr_sum + nums[end];
        end++;
        while (start < end && curr_sum >= k)
        {
            if (curr_sum == k)
                count++;
            curr_sum = curr_sum - nums[start];
            start++;
        }
    }
    while (start < end)
    {
        if (curr_sum == k)
            count++;
        curr_sum = curr_sum - nums[start];
        start++;
    }
    return count;
}
Why is this not working? I am sliding the window until the last element is reached, so it should have found a sum equal to k. How can I solve this issue?
Unfortunately, you did not implement the sliding window correctly, and a sliding window is not really a solution for this problem anyway. One of your main issues is that you do not move the start of the window based on the proper conditions. You always sum up and wait until the sum is greater than (or equal to) the search value.
This will not really work, especially for your example -1, -1, 1. The running sum of this is: -1, -2, -1, and you never see the 0, although it is there. You may have the idea to write while (start < end && curr_sum != k), but this will also not work, because you do not handle the start pointer correctly.
Your approach will lead to the brute force solution, which typically takes something like N*N loop operations, where N is the size of the array. This is because we need a doubly nested loop.
That will of course always work, but it may be very time-consuming and, in the end, too slow.
Anyway, let us implement that. We will start from each value in the std::vector and try out all subarrays starting at that value. We must evaluate all following values in the std::vector, because, for example, the last value could be a big negative number and bring the sum back down to the search value.
We could implement this for example like the following:
#include <iostream>
#include <vector>
using namespace std;

int subarraySum(vector<int>& numbers, int searchSumValue) {
    // Here we will store the result
    int resultingCount{};

    // Iterate over all values in the array. So, use all different start values
    for (std::size_t i{}; i < numbers.size(); ++i) {

        // Here we store the running sum of the elements in the vector
        int sum{ numbers[i] };

        // Check for the trivial case. A one-element subarray may already match the search value
        if (sum == searchSumValue) ++resultingCount;

        // Now we build all subarrays beginning with the start value
        for (std::size_t k{ i + 1 }; k < numbers.size(); ++k) {
            sum += numbers[k];
            if (sum == searchSumValue) ++resultingCount;
        }
    }
    return resultingCount;
}

int main() {
    vector v{ -1,-1,1 };
    std::cout << subarraySum(v, 0);
}
But, as said, the above is often too slow for big vectors, and there is indeed a better solution available, which is based on a DP (dynamic programming) algorithm.
It uses so-called prefix sums (running sums): for each element we compare the current running sum with the running sums seen before it.
We need an example. Let's use a std::vector with 5 values {1,2,3,4,5}, and let's look for subarrays with a sum of 9.
We can “guess” that there are 2 subarrays, {2,3,4} and {4,5}, that have a sum of 9.
Let us investigate further
Index:        0   1   2   3   4
Value:        1   2   3   4   5
We can now add a running sum and see how much delta there is between the current evaluated element and its left neighbor, the neighbor before that, and so on. If we find a delta that is equal to our search value, then there must be a subarray building this sum.
Running sum:  1   3   6  10  15
Deltas:       2   3   4   5       (against the next element to the left)
              5   7   9           (against the element two to the left)
              9  12               (against the element three to the left)
Example {2,3,4}. If we evaluate the 4 with a running sum of 10, and subtract the search value 9, then we get the previous running sum 1. “1+9=10” all values are there.
Example {4,5}. If we evaluate the 5 with a running sum of 15, and subtract the search value 9, then we get the previous running sum = 6. “6+9=15” all values are there.
We can find all solutions using the same approach.
So, the only thing we need to do, is to subtract the search value from the current running sum and see, if we have this running sum already calculated before.
Like: “Search-Value” + “previously Calculated Sum” = “Current Running Sum”.
Or: “Current Running Sum” – “Search-Value” = “previously Calculated Sum”
Again, we need to do the subtraction and check, if we already calculated such a sum previously.
So, we need to store all previously calculated running sums. And, because such a sum may appear more than once, we need to count the occurrences of equal running sums.
It is very hard to digest, and you need to think a while to understand.
With the above wisdom, you can draft the below potential solution.
#include <iostream>
#include <vector>
#include <unordered_map>

int subarraySum(std::vector<int>& numbers, int searchSumValue) {
    // Here we will store the result
    int resultingSubarrayCount{};

    // Here we will store all running sums and how often each value appeared
    std::unordered_map<int, int> countOfRunningSums;

    // Continuously calculated running sum
    int runningSum{};

    // And initialize the first value
    countOfRunningSums[runningSum] = 1;

    // Now iterate over all values in the vector
    for (const int n : numbers) {

        // Calculate the running sum
        runningSum += n;

        // Check if we have the searched value already available,
        // and add the number of occurrences to our resulting number of subarrays
        resultingSubarrayCount += countOfRunningSums[runningSum - searchSumValue];

        // Store the new running sum. Respectively, add 1 to the counter if the running sum already existed
        countOfRunningSums[runningSum]++;
    }
    return resultingSubarrayCount;
}

int main() {
    std::vector v{ 1,2,3,4,5 };
    std::cout << subarraySum(v, 9);
}
I have a list of 100 random integers. Each random integer has a value from 0 to 99. Duplicates are allowed, so the list could be something like
56, 1, 1, 1, 1, 0, 2, 6, 99...
I need to find the smallest integer (>= 0) that is not contained in the list.
My initial solution is this:
vector<int> integerList(100); //list of random integers
...
vector<bool> listedIntegers(101, false);
for (int theInt : integerList)
{
    listedIntegers[theInt] = true;
}

int smallestInt;
for (int j = 0; j < 101; j++)
{
    if (!listedIntegers[j])
    {
        smallestInt = j;
        break;
    }
}
But that requires a secondary array for book-keeping and a second (potentially full) list iteration. I need to perform this task millions of times (the actual application is in a greedy graph coloring algorithm, where I need to find the smallest unused color value with a vertex adjacency list), so I'm wondering if there's a clever way to get the same result without so much overhead?
It's been a year, but ...
One idea that comes to mind is to keep track of the interval(s) of unused values as you iterate the list. To allow efficient lookup, you could keep intervals as tuples in a binary search tree, for example.
So, using your sample data:
56, 1, 1, 1, 1, 0, 2, 6, 99...
You would initially have the unused interval [0..99], and then, as each input value is processed:
56: [0..55][57..99]
1: [0..0][2..55][57..99]
1: no change
1: no change
1: no change
0: [2..55][57..99]
2: [3..55][57..99]
6: [3..5][7..55][57..99]
99: [3..5][7..55][57..98]
Result (lowest value in lowest remaining interval): 3
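Here is a minimal sketch of that interval bookkeeping, using a std::map keyed by interval start in place of a hand-rolled binary search tree (my own illustration; the names are made up, not from the answer). The result is the start of the first remaining interval.
#include <cstdio>
#include <map>
#include <vector>

int smallestUnusedViaIntervals(const std::vector<int>& values, int maxValue) {
    std::map<int, int> unused;   // interval start -> interval end, inclusive
    unused[0] = maxValue;        // initially the whole range [0..maxValue] is unused

    for (int v : values) {
        auto it = unused.upper_bound(v);     // first interval starting after v
        if (it == unused.begin()) continue;  // no interval starts at or before v
        --it;                                // candidate interval that may contain v
        int lo = it->first, hi = it->second;
        if (v < lo || v > hi) continue;      // v is already marked as used
        unused.erase(it);                    // split the interval around v
        if (lo <= v - 1) unused[lo] = v - 1; // left piece, if non-empty
        if (v + 1 <= hi) unused[v + 1] = hi; // right piece, if non-empty
    }
    return unused.empty() ? maxValue + 1 : unused.begin()->first;
}

int main() {
    std::vector<int> input{56, 1, 1, 1, 1, 0, 2, 6, 99};
    std::printf("%d\n", smallestUnusedViaIntervals(input, 99)); // prints 3
}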
I believe there is no faster way to do it. What you can do in your case is to reuse the vector<bool>; you need just one such vector per thread.
Though the better approach might be to reconsider the whole algorithm to eliminate this step altogether. Maybe you can update the least unused color on every step of the algorithm?
Since you have to scan the whole list no matter what, the algorithm you have is already pretty good. The only improvement I can suggest without measuring (that will surely speed things up) is to get rid of your vector<bool>, and replace it with a stack-allocated array of 4 32-bit integers or 2 64-bit integers.
Then you won't have to pay the cost of allocating an array on the heap every time, and you can get the first unused number (the position of the first 0 bit) much faster. To find the word that contains the first 0 bit, you only need to find the first one that isn't the maximum value, and there are bit twiddling hacks you can use to get the first 0 bit in that word very quickly.
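A sketch of that two-word bitmask idea for values in [0, 99] (my own illustration, assuming a GCC/Clang compiler for the __builtin_ctzll intrinsic): two 64-bit words on the stack replace the vector<bool>, and the first clear bit is found with one builtin per word.
#include <cstdint>
#include <cstdio>
#include <vector>

int smallestUnusedViaBitmask(const std::vector<int>& values) {
    std::uint64_t used[2] = {0, 0};
    for (int v : values)
        used[v >> 6] |= std::uint64_t{1} << (v & 63);   // mark value v as seen

    for (int w = 0; w < 2; ++w)
        if (used[w] != ~std::uint64_t{0})               // this word has at least one 0 bit
            return w * 64 + __builtin_ctzll(~used[w]);  // index of its lowest 0 bit

    return 128;  // all 128 bits set (cannot happen with only 100 values in [0, 99])
}

int main() {
    std::vector<int> input{56, 1, 1, 1, 1, 0, 2, 6, 99};
    std::printf("%d\n", smallestUnusedViaBitmask(input));  // prints 3
}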
Your program is already very efficient, running in O(n). Only a marginal gain can be found.
One possibility is to divide the range of possible values into blocks of size block, and to register the values not in an array of bool but in an array of int bitmasks, where the value modulo block selects the bit within its block.
In practice, we replace a loop of size N by a loop of size N/block plus a loop of size block.
Theoretically, we could select block = sqrt(N) = 10 in order to minimize the quantity N/block + block.
In the program hereafter, blocks of size 8 are selected, assuming that dividing integers by 8 and calculating values modulo 8 should be fast.
However, it is clear that a gain, if any, can be obtained only when the smallest missing value is rather large!
constexpr int N = 100;

int find_min1(const std::vector<int> &IntegerList) {
    constexpr int Size = 13;   // ceiling of N / block
    constexpr int block = 8;
    constexpr int Vmax = 255;  // 2^block - 1

    int listedBlocks[Size] = {0};
    for (int theInt : IntegerList) {
        listedBlocks[theInt / block] |= 1 << (theInt % block);
    }
    for (int j = 0; j < Size; j++) {
        if (listedBlocks[j] == Vmax) continue;
        int &k = listedBlocks[j];
        for (int b = 0; b < block; b++) {
            if ((k % 2) == 0) return block * j + b;
            k /= 2;
        }
    }
    return -1;
}
Potentially you can reduce the last step to O(1) by using some bit manipulation: in your case an __int128. Set the corresponding bits in the first loop and then call something like __builtin_clz, or use an appropriate bit hack.
The best solution I could find for finding the smallest missing integer in a set is https://codereview.stackexchange.com/a/179042/31480
Here is the C++ version.
int solution(std::vector<int>& A)
{
    for (std::vector<int>::size_type i = 0; i != A.size(); i++)
    {
        while (0 < A[i] && A[i] - 1 < A.size()
               && A[i] != i + 1
               && A[i] != A[A[i] - 1])
        {
            int j = A[i] - 1;
            auto tmp = A[i];
            A[i] = A[j];
            A[j] = tmp;
        }
    }
    for (std::vector<int>::size_type i = 0; i != A.size(); i++)
    {
        if (A[i] != i + 1)
        {
            return i + 1;
        }
    }
    return A.size() + 1;
}
Here is the description:
The gray code is a binary numeral system where two successive values differ in only one bit.
Given a non-negative integer n representing the total number of bits in the code, print the sequence of gray code. A gray code sequence must begin with 0.
For example, given n = 2, return [0,1,3,2]. Its gray code sequence is:
00 - 0
01 - 1
11 - 3
10 - 2
Note:
For a given n, a gray code sequence is not uniquely defined.
For example, [0,2,3,1] is also a valid gray code sequence according to the above definition.
Actually this topic is totally new to me, so I looked through its introduction on Wikipedia and found a solution (maybe called the Mirror Construction Method); here is a graph about it: Mirror. And here is the code written in this method:
// Mirror arrangement
class Solution {
public:
    vector<int> grayCode(int n) {
        vector<int> res{0};
        for (int i = 0; i < n; ++i) {
            int size = res.size();
            for (int j = size - 1; j >= 0; --j) {
                res.push_back(res[j] | (1 << i));
            }
        }
        return res;
    }
};
The problem now is that I can't figure out the meaning of res.push_back(res[j] | (1 << i)). I can't understand and use the bitwise operators very well.
res.push_back(res[j] | (1 << i));
The parameter passed to res.push_back() is the contents of res[j] with the ith bit set.
If i is 2, the bit representing the value 4 is set. The parameter will also have all the bits that were set in res[j] set.
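Written out for n = 2, here is a standalone trace of that same mirror loop (my own illustration, not part of the answer), showing how each pass appends the mirrored values with the new bit set:
#include <cstdio>
#include <vector>

int main() {
    // res starts as {0}.
    // i = 0 (bit value 1): mirror {0}    -> append 0|1 = 1          => res = {0, 1}
    // i = 1 (bit value 2): mirror {0, 1} -> append 1|2 = 3, 0|2 = 2 => res = {0, 1, 3, 2}
    std::vector<int> res{0};
    for (int i = 0; i < 2; ++i)
        for (int j = static_cast<int>(res.size()) - 1; j >= 0; --j)
            res.push_back(res[j] | (1 << i));

    for (int x : res)
        std::printf("%d ", x);  // prints: 0 1 3 2
    std::printf("\n");
}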
Given an array of N non-negative integers A1, A2, …, AN, how do I find a pair of integers Au, Av (1 ≤ u < v ≤ N) such that (Au and Av) is as large as possible?
Example: Let N=4 and the array be [2 4 8 10]. Here the answer is 8.
Explanation
2 and 4 = 0
2 and 8 = 0
2 and 10 = 2
4 and 8 = 0
4 and 10 = 0
8 and 10 = 8
How do I do it if N can go up to 10^5?
I have an O(N^2) solution, but it is not efficient.
Code:
for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
        if ((arr[i] & arr[j]) > ans)   // parentheses needed: & binds less tightly than >
        {
            ans = arr[i] & arr[j];
        }
    }
}
One way you could speed it up is to take advantage of the fact that if any of the high bits are set in any two numbers, then the AND of those two numbers will ALWAYS be larger than any combination using lower bits.
Therefore, if you order your numbers by the bits set you may decrease the number of operations drastically.
In order to find the most significant bit efficiently, GCC has a builtin intrinsic: __builtin_clz(unsigned int x), which returns the number of leading zero bits, so the index of the most significant set bit is BITS - 1 - __builtin_clz(x). (Other compilers have similar intrinsics, translating to a single instruction on at least x86.)
const unsigned int BITS = sizeof(unsigned int) * 8; // Assuming 8 bit bytes.

// Your implementation above.
unsigned int max_and_trivial(const std::vector<unsigned int> &input);

// Partition the set.
unsigned int max_and(const std::vector<unsigned int> &input) {
    // For small input, just use the trivial algorithm.
    if (input.size() < 100) {
        return max_and_trivial(input);
    }

    // Bucket each element under every bit index that is set in it.
    std::vector<unsigned int> by_bit[BITS];
    for (auto elem : input) {
        unsigned int mask = elem;
        while (mask) { // Ignore elements that are 0.
            unsigned int most_sig = BITS - 1 - __builtin_clz(mask); // index of the highest set bit
            by_bit[most_sig].push_back(elem);
            mask ^= 1u << most_sig; // clear that bit and continue with the lower ones
        }
    }
    // Now, if any of the vectors in by_bit has more
    // than one element, the one with the highest index
    // will include the largest AND-value.
    for (int i = BITS - 1; i >= 0; i--) {
        if (by_bit[i].size() > 1) {
            return max_and_trivial(by_bit[i]);
        }
    }
    // If you get here, the largest value is 0.
    return 0;
}
This algorithm still has worst case runtime O(N*N), but on average it should perform much better. You can also further increase the performance by repeating the partition step when you search through the smaller vector (just remember to ignore the most significant bit in the partition step, doing this should increase the performance to a worst case of O(N)).
Guaranteeing that there are no duplicates in the input-data will further increase the performance.
Sort the array in descending order.
Take the first two numbers. If they are both between two consecutive powers of 2 (say 2^k and 2^(k+1)), then you can remove all elements that are less than 2^k.
From the remaining elements, subtract 2^k.
Repeat steps 2 and 3 until the number of elements in the array is 2.
Note: If you find that only the largest element is between 2^k and 2^(k+1) and the second largest element is less than 2^k, then you will not remove any element, but just subtract 2^k from the largest element.
Also, determining where an element lies in the series {1, 2, 4, 8, 16, ...} can be done in O(log(log(MAX))) time where MAX is the largest number in the array.
I didn't test this, and I'm not going to. O(N) memory and O(N) complexity.
#include <vector>
#include <utility>
#include <algorithm>
using namespace std;

/*
 * The idea is as follows:
 * 1.) Create a mathematical set A that holds integers.
 * 2.) Initialize importantBit = highest bit in any integer in v
 * 3.) Put into A all integers that have importantBit set to 1.
 * 4.) If |A| = 2, that is our answer. If |A| < 2, --importantBit and try again. If |A| > 2, basically
 *     redo the problem but only on the integers in set A.
 *
 * Keep "set A" at the beginning of v.
 */

// Forward declarations so find_and_sum_pair can use the helpers defined below it.
int partial_sort_for_bit(vector<unsigned> &v, unsigned importantBit, unsigned vSize);
unsigned highest_bit_index(unsigned i);

pair<unsigned, unsigned> find_and_sum_pair(vector<unsigned> v)
{
    // Find highest bit in v.
    unsigned importantBit = 0;
    for (auto num : v)
        importantBit = max(importantBit, highest_bit_index(num));

    // Move all elements with importantBit to the front of the vector until doing so gives us at least 2 in the set.
    // Check importantBit first so we never shift by importantBit - 1 when it is 0.
    int setEnd = 0;
    while (importantBit > 0 && (setEnd = partial_sort_for_bit(v, importantBit, v.size())) < 2)
        --importantBit;

    // If the set is never sufficient, no answer exists
    if (importantBit == 0)
        return pair<unsigned, unsigned>();

    // Repeat the problem only on the subset defined by A until |A| = 2 and impBit > 0 or impBit = 0
    while (importantBit > 1)
    {
        unsigned secondSetEnd = partial_sort_for_bit(v, --importantBit, setEnd);
        if (secondSetEnd >= 2)
            setEnd = secondSetEnd;
    }

    return pair<unsigned, unsigned>(v[0], v[1]);
}

// Returns end index (1 past last) of set A
int partial_sort_for_bit(vector<unsigned> &v, unsigned importantBit, unsigned vSize)
{
    unsigned setEnd = 0;
    unsigned mask = 1u << (importantBit - 1);
    for (decltype(v.size()) index = 0; index < vSize; ++index)
        if ((v[index] & mask) > 0)   // parentheses: & binds less tightly than >
            swap(v[index], v[setEnd++]);
    return setEnd;
}

unsigned highest_bit_index(unsigned i)
{
    unsigned ret = i != 0;
    while (i >>= 1)
        ++ret;
    return ret;
}
I came upon this problem again and solved it a different way (much more understandable to me):
unsigned findMaxAnd(vector<unsigned> &input) {
    vector<unsigned> candidates;
    for (unsigned mask = 1u << 31; mask; mask >>= 1) {   // 1u, not 1: shifting a signed 1 into the sign bit is undefined
        for (unsigned i : input)
            if (i & mask)
                candidates.push_back(i);
        if (candidates.size() >= 2)
            input = move(candidates);
        candidates = vector<unsigned>();
    }
    if (input.size() < 2)
        return 0;
    return input[0] & input[1];
}
Here is an O(N * log MAX_A) solution:
1)Let's construct the answer greedily, iterating from the highest bit to the lowest one.
2)To do it, one can maintain a set S of numbers that currently fit. Initially, it consists of all numbers in the array. Let's also assume that initially ANS = 0.
3)Now lets iterate over all the bits from the highest to the lowest. Let's say that current bit is B.
4)If the number of elements in S with value 1 of the B-th bit is greater than 1, it is possible to have 1 in this position without changing the values of higher bits in ANS so we should add 2^B to the ANS and remove all elements from S which have 0 value of this bit(they do not fit anymore).
5)Otherwise, it is not possible to obtain 1 in this position, so we do not change S and ANS and proceed to the next bit.
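A compact sketch of the greedy approach described in the steps above (my own illustration, not the answerer's code); the set S is kept as a filtered vector:
#include <cstdio>
#include <utility>
#include <vector>

unsigned maxAndPair(const std::vector<unsigned>& input) {
    std::vector<unsigned> s = input;   // S: numbers still compatible with ANS
    unsigned ans = 0;
    for (int b = 31; b >= 0; --b) {    // from the highest bit to the lowest
        unsigned bit = 1u << b;
        std::vector<unsigned> withBit;
        for (unsigned x : s)
            if (x & bit)
                withBit.push_back(x);
        if (withBit.size() > 1) {      // at least two numbers can keep this bit set
            ans |= bit;                // add 2^B to ANS
            s = std::move(withBit);    // drop numbers whose B-th bit is 0
        }
    }
    return ans;
}

int main() {
    std::vector<unsigned> a{2, 4, 8, 10};
    std::printf("%u\n", maxAndPair(a));   // prints 8
}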
I have to solve a problem where, given a grid of size N x M, I have to find the number of parallelograms that "can be put in it", in such a way that every coordinate is an integer.
Here is my code:
/*
    ~Keep It Simple!~
*/
#include <cstdio>   // needed for freopen/scanf/printf
#include <fstream>
#define MaxN 2005

int N, M;
long long Paras[MaxN][MaxN]; // Number of parallelograms of Height i and Width j
long long Rects;             // Final Number of Parallelograms

int cmmdc(int a, int b)
{
    while (b)
    {
        int aux = b;
        b = a - ((a / b) * b);
        a = aux;
    }
    return a;
}

int main()
{
    freopen("paralelograme.in", "r", stdin);
    freopen("paralelograme.out", "w", stdout);
    scanf("%d%d", &N, &M);

    for (int i = 2; i <= N + 1; i++)
        for (int j = 2; j <= M + 1; j++)
        {
            if (!Paras[i][j])
                Paras[i][j] = Paras[j][i] = 1LL*(i-2)*(j-2) + i*j - cmmdc(i-1, j-1) - 2; // number of parallelograms with all edges on the grid + number of parallelograms with only 2 edges on the grid.
            Rects += 1LL*(M-j+2)*(N-i+2) * Paras[j][i]; // each parallelogram can be moved in (M-j+2)(N-i+2) places.
        }

    printf("%lld", Rects);
}
Example: For a 2x2 grid we have 22 possible parallelograms.
My algorithm works and it is correct, but I need to make it a little bit faster. I want to know how that is possible.
P.S. I've heard that I should pre-compute the greatest common divisors and save them in an array, which would reduce the run-time to O(n*m), but I'm not sure how to do that without using the cmmdc (greatest common divisor) function.
Make sure N is not smaller than M:
if( N < M ){ swap( N, M ); }
Leverage the symmetry in your loops; you only need to run j from 2 to i:
for(int j=2; j<=min( i, M+1); j++)
You don't need the extra array Paras; drop it and use a temporary variable instead:
long long temparas = 1LL*(i-2)*(j-2) + i*j - cmmdc(i-1,j-1) - 2;
Rects += temparas * (M-j+2)*(N-i+2);
// check if the mirrored case i <-> j must be considered as well;
// note that the weight for the swapped pair is (M-i+2)*(N-j+2)
if( i != j && i <= M+1 ) // j <= N+1 is always true because of j <= i <= N+1
    Rects += temparas * (M-i+2)*(N-j+2);
Replace this line: b = a -(( a/b ) * b); using the remainder operator:
b = a % b;
Caching the cmmdc results would probably be possible; you can initialize the array using a sort of sieve algorithm: create a 2D array indexed by a and b, put a "2" at each position where a and b are multiples of 2, then put a "3" at each position where a and b are multiples of 3, and so on, roughly like this:
int gcd_cache[N][N];   // N here is a compile-time bound such as MaxN

void init_cache() {
    for (int u = 1; u < N; ++u) {
        for (int i = u; i < N; i += u)
            for (int k = u; k < N; k += u) {
                gcd_cache[i][k] = u;
            }
    }
}
Not sure if it helps a lot though.
The first comment in your code states "keep it simple", so, in the light of that, why not try solving the problem mathematically and printing the result.
If you select two lines of length N from your grid, you would find the number of parallelograms in the following way:
Select two points next to each other in both lines: there are (N-1)^2 ways of doing this, since you can position the two points at N-1 positions on each of the lines.
Select two points with one space between them in both lines: there are (N-2)^2 ways of doing this.
Select two points with two, three and up to N-2 spaces between them.
The resulting number of combinations would be (N-1)^2+(N-2)^2+(N-3)^2+...+1.
By solving the sum, we get the formula: 1/6*N*(2*N^2-3*N+1). Check WolframAlpha to verify.
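A quick numeric check of that closed form (a standalone sketch, not part of the answer): it compares the direct sum (N-1)^2 + (N-2)^2 + ... + 1 with N*(2*N^2 - 3*N + 1)/6 for a few values of N.
#include <cstdio>

int main() {
    for (long long N = 2; N <= 6; ++N) {
        long long direct = 0;
        for (long long k = 1; k < N; ++k)
            direct += k * k;                               // (N-1)^2 + ... + 1^2
        long long closed = N * (2 * N * N - 3 * N + 1) / 6; // the closed form above
        std::printf("N=%lld  direct=%lld  closed=%lld\n", N, direct, closed);
    }
}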
Now that you have a solution for two lines, you simply need to multiply it by the number of combinations of order 2 of M, which is M!/(2*(M-2)!).
Thus, the whole formula would be: 1/12*N*(2*N^2-3*N+1)*M!/(M-2)!, where the ! mark denotes factorial, and the ^ denotes a power operator (note that the same sign is not the power operator in C++, but the bitwise XOR operator).
This calculation requires fewer operations than iterating through the matrix.