How to accelerate reading elements from a C++ array

I need to loop over a big array (2 million elements) many, many times.
The structure of my code looks like this:
loop_tag= 0;
// the ig1, ig2 loop will run many times,
// and N & M are about 30000
for(ig1=0; ig1<N; ig1++)
{
for(ig2=0; ig2<M; ig2++)
{
for(k=0;k<45;k++)
{
element_val = arr[loop_tag];
loop_tag ++;
// there're a few lines to calculate something
}
if(loop_tag == the end of arr){loop_tag=0;}
}
}
I will run this code more than 100,000 times, and each run takes about 200 to 1000 seconds. I already use MPI to save time, but it still takes about 10 hours on 300 CPUs.
I find that most of the time is spent on "element_val = arr[loop_tag];". If I instead assign a constant, such as "element_val = 0.01", each run takes only about 30% as long.
How can I speed up this part? Thanks!
Here are some outputs in the log file:
expo pair: 0-0-0(20193) <-> 1-0-1(22275)
Finish in 1017.22 sec. 4.49795e+08 pairs. Expo pairs got now: 1
Now 0 buffers, 1 block in current buffer
expo pair: 23-5-2(18259) <-> 201-18-1(9704)
Finish in 70.86 sec. 3.17283e+07 pairs. Expo pairs got now: 2
Now 0 buffers, 2 block in current buffer
expo pair: 23-5-2(18259) <-> 559-47-1(15243)
Finish in 608.50 sec. 2.78322e+08 pairs. Expo pairs got now: 3
Now 0 buffers, 3 block in current buffer
Here is the code inside the ig1 and ig2 loops. Each run reads data from two exposure files and then loops over their rows. All the data (arrays) are stored in the structure expo_info.
// if two galaxies come from the same CFHTLenS exposure, break
if(expo_info->obs_expo_label_1[ig1] == expo_info->obs_expo_label_2[ig2]){break;}
n = ig2*expo_info->expo_data_col;
ra_z2 = expo_info->expo_data[expo_label_1][n+expo_info->ra_idx];
dec_z2 = expo_info->expo_data[expo_label_1][n+expo_info->dec_idx];
cos_dec_z2 = expo_info->expo_data[expo_label_1][n+expo_info->cos_dec_idx];
// the separation angle (arc minute)
delta_ra = (ra_z2 - ra_z1)*cos_dec_z1;
delta_dec = dec_z2 - dec_z1;
delta_radius = sqrt(delta_ra*delta_ra + delta_dec*delta_dec);
theta_tag = -1;
for(ir=0; ir<expo_info->theta_bin_num; ir++)
{
if(delta_radius > expo_info->theta_bin[ir] and delta_radius <= expo_info->theta_bin[ir+1]){theta_tag=ir;break;}
}
if(theta_tag > -1)
{
pairs+= 1;
// shear estimators rotation (position angle defined as East of North)
sin_theta = delta_ra/delta_radius;
cos_theta = delta_dec/delta_radius;
sin_2theta = 2*sin_theta*cos_theta;
cos_2theta = cos_theta*cos_theta - sin_theta*sin_theta;
sin_4theta = 2*sin_2theta*cos_2theta;
cos_4theta = cos_2theta*cos_2theta - sin_2theta*sin_2theta;
mg1_z2 = expo_info->expo_data[expo_label_1][n+expo_info->mg1_idx]*cos_2theta - expo_info->expo_data[expo_label_1][n+expo_info->mg2_idx]*sin_2theta;
mg2_z2 = expo_info->expo_data[expo_label_1][n+expo_info->mg1_idx]*sin_2theta + expo_info->expo_data[expo_label_1][n+expo_info->mg2_idx]*cos_2theta;
mnu1_z2 = expo_info->expo_data[expo_label_1][n+expo_info->mu_idx]*cos_4theta - expo_info->expo_data[expo_label_1][n+expo_info->mv_idx]*sin_4theta;
mnu2_z2 = mnu1_z2;
mnu1_z2 = expo_info->expo_data[expo_label_1][n+expo_info->mn_idx] + mnu2_z2;
mnu2_z2 = expo_info->expo_data[expo_label_1][n+expo_info->mn_idx] - mnu2_z2;
// there're zbin_num *zbin_num blocks, iz1 is row, iz2 is the col, each block
// has a length of mg_bin_num*mg_bin_num*chi_guess_num*theta_bin_num.
iz2 = expo_info->expo_zbin_label[expo_label_1][ig2];
////////////////////// the key part of PDF_SYM //////////////////////////////
ic_len = theta_tag*ir_chi_block_len + (iz1 + iz2)*expo_info->iz_chi_block_len;
gg_1 = expo_info->gg_1[loop_label];
gg_2 = expo_info->gg_2[loop_label];
temp_tt[2] = mg1_z1 - gg_1*mnu1_z1;
temp_tt[3] = mg1_z2 - gg_2*mnu1_z2;
hist_2d_new(temp_tt[2], temp_tt[3], expo_info->mg_bin,
mg_bin_num,mg_bin_num1, mg_bin_num2, mg_bin_num3, ix_tt, iy_tt);
expo_info->expo_num_count_chit[ic_len + iy_tt*mg_bin_num+ix_tt] += 1;
temp_xx[2] = mg2_z1 - gg_1*mnu2_z1;
temp_xx[3] = mg2_z2 - gg_2*mnu2_z2;
hist_2d_new(temp_xx[2], temp_xx[3], expo_info->mg_bin,
mg_bin_num,mg_bin_num1, mg_bin_num2, mg_bin_num3, ix_xx, iy_xx);
expo_info->expo_num_count_chix[ic_len + iy_xx*mg_bin_num+ix_xx] += 1;
loop_label += 1;
for(ic=1; ic<chi_guess_num; ic++)
{
ic_len += chi_block_len;
// these two lines, gg_1 & gg_2, take a lot of time.
// expo_info->gg_1 & gg_2 are two big arrays, 2 Million elements
// if I just use something like gg_1 = 0.001; gg_2 = 0.001,
// it runs very fast
gg_1 = expo_info->gg_1[loop_label];
gg_2 = expo_info->gg_2[loop_label];
bin_para_tt[0] = ix_tt;
bin_para_tt[1] = iy_tt;
temp_tt[0] = temp_tt[2];
temp_tt[1] = temp_tt[3];
temp_tt[2] = mg1_z1 - gg_1*mnu1_z1;
temp_tt[3] = mg1_z2 - gg_2*mnu1_z2;
hist_2d_new(expo_info->mg_bin, mg_bin_num, temp_tt, bin_para_tt, ix_tt, iy_tt);
expo_info->expo_num_count_chit[ic_len + iy_tt*mg_bin_num+ix_tt] += 1;
bin_para_xx[0] = ix_xx;
bin_para_xx[1] = iy_xx;
temp_xx[0] = temp_xx[2];
temp_xx[1] = temp_xx[3];
temp_xx[2] = mg2_z1 - gg_1*mnu2_z1;
temp_xx[3] = mg2_z2 - gg_2*mnu2_z2;
hist_2d_new(expo_info->mg_bin, mg_bin_num, temp_xx, bin_para_xx, ix_xx, iy_xx);
expo_info->expo_num_count_chix[ic_len + iy_xx*mg_bin_num+ix_xx] += 1;
loop_label += 1;
}
if(loop_label >= gg_len){loop_label = 0;}
////////////////////// the key part of PDF_SYM -end //////////////////////////////
}
I find most of the time is spent on
gg_1 = expo_info->gg_1[loop_label];
gg_2 = expo_info->gg_2[loop_label];
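Two things may be worth checking here. First, when gg_1 and gg_2 are replaced by constants the compiler can fold them into the arithmetic and possibly drop work, so the 30% figure is at best an upper bound on the cost of the reads themselves. Second, because the loop also writes through expo_info, the compiler may have to reload expo_info->gg_1 and expo_info->gg_2 on every iteration. The following is a minimal sketch, not the original code, of hoisting the base pointers and the running index into locals. It assumes the gg arrays hold double; ExpoInfo, accumulate_chi and the summed expression are hypothetical stand-ins for the real structure and histogram work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the part of expo_info used in the inner loop.
struct ExpoInfo {
    std::vector<double> gg_1;
    std::vector<double> gg_2;
};

// Reads chi_guess_num consecutive entries of gg_1/gg_2 starting at loop_label
// and advances loop_label, wrapping to 0 at the end of the arrays.
// Assumes loop_label + chi_guess_num <= gg_1.size() == gg_2.size().
double accumulate_chi(const ExpoInfo& info, std::size_t& loop_label,
                      int chi_guess_num, double mg1, double mnu1,
                      double mg2, double mnu2)
{
    // Hoist the base pointers and the length out of the loop so they are
    // loaded once instead of being re-read through the struct each time.
    const double* gg_1 = info.gg_1.data();
    const double* gg_2 = info.gg_2.data();
    const std::size_t gg_len = info.gg_1.size();

    std::size_t idx = loop_label;
    double sum = 0.0;
    for (int ic = 0; ic < chi_guess_num; ++ic, ++idx) {
        // Placeholder for the real per-element work (histogram updates).
        sum += (mg1 - gg_1[idx] * mnu1) + (mg2 - gg_2[idx] * mnu2);
    }
    loop_label = (idx >= gg_len) ? 0 : idx;
    return sum;
}
```

Whether this helps depends on the compiler and on what the profiler actually attributes to those two lines, so it should be measured rather than assumed.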

Related

Converting Youtube Data API V3 video duration format to seconds in Dart

In my case I get the time in this format: PT2H3M20S. I don't know how to write a regular expression for it in Dart, so I just want to know how to calculate the duration in milliseconds from this format. Thanks in advance.
Future<http.Response> getVideoDuration({var videoUri}) async {
// print(videoUri);
final BI_YT_API_KEY = "some_API";
var lArr = videoUri.split('/');
var lId = lArr[lArr.length - 1];
var data = await http.get('https://www.googleapis.com/youtube/v3/videos' +
"?id=$lId&part=contentDetails&key=$BI_YT_API_KEY");
if (data.statusCode == 200) {
var jom = json.decode(data.body);
print(jom['items'][0]['contentDetails']['duration']);
var duration = data.body[0];
}

It took some time, but it's finally done.
You can use it like this:
int seconds = convertTime("PT1H11S");
Here, seconds is the converted duration in seconds. For PT1H11S the result is 3611, because 1 hour is 3600 seconds, plus 11 seconds.
int convertTime(String duration) {
RegExp regex = new RegExp(r'(\d+)');
List<String> a = regex.allMatches(duration).map((e) => e.group(0)!).toList();
// Pad any missing component with "0" so that a is always
// [hours, minutes, seconds], whichever of H, M and S are present.
final bool hasH = duration.contains('H');
final bool hasM = duration.contains('M');
final bool hasS = duration.contains('S');
int i = 0;
a = [
hasH ? a[i++] : "0",
hasM ? a[i++] : "0",
hasS ? a[i++] : "0",
];
int seconds = 0;
if (a.length == 3) {
seconds = seconds + int.parse(a[0]) * 3600;
seconds = seconds + int.parse(a[1]) * 60;
seconds = seconds + int.parse(a[2]);
}
if (a.length == 2) {
seconds = seconds + int.parse(a[0]) * 60;
seconds = seconds + int.parse(a[1]);
}
if (a.length == 1) {
seconds = seconds + int.parse(a[0]);
}
return seconds;
}

I also managed to get the duration in seconds using Dart, in case someone needs it. Note that it expects the string without the leading PT.
/// For duration = 2H1M48S
converToSeconds(String duration){
var hour = "", minute = "", seconds = "";
var tempList = duration.split('');
/// HOUR
if (tempList.contains('H')) {
var ind = tempList.indexOf('H');
for (int i = 0; i < ind; i++) {
hour = hour + tempList[i];
}
tempList.removeRange(0, ind + 1);
}
/// MINUTES
if (tempList.contains('M')) {
var ind = tempList.indexOf('M');
for (int i = 0; i < ind; i++) {
minute = minute + tempList[i];
}
tempList.removeRange(0, ind + 1);
}
/// SECONDS
if (tempList.contains('S')) {
var ind = tempList.indexOf('S');
for (int i = 0; i < ind; i++) {
seconds = seconds + tempList[i];
}
tempList.removeRange(0, ind + 1);
}
/// CONVERT TO INT
hour = hour != "" ? hour : '0';
seconds = seconds != "" ? seconds : '0';
minute = minute != "" ? minute : '0';
return int.parse(hour) * 3600 + int.parse(minute) * 60 + int.parse(seconds);
}

Note: I do not know Flutter, but I have heard that a Flutter developer should be able to use Java code. This answer is based on Java.
tl;dr
With Java, all you need is:
Duration.parse("PT2H3M20S").toMillis()
java.time.Duration is modelled on the ISO 8601 standard and was introduced with Java 8 as part of the JSR-310 implementation.
PT2H3M20S specifies a duration of 2 hours, 3 minutes and 20 seconds. You can parse it into a Duration object and then convert that to milliseconds.
Demo:
import java.time.Duration;
public class Main {
public static void main(String[] args) {
String strIso8601Duration = "PT2H3M20S";
Duration duration = Duration.parse(strIso8601Duration);
long millis = duration.toMillis();
System.out.println(millis);
}
}
Output:
7400000
Learn more about the modern date-time API* from Trail: Date Time.
* If for any reason you have to stick to Java 6 or Java 7, you can use ThreeTen-Backport, which backports most of the java.time functionality to Java 6 and 7. If you are working on an Android project and your Android API level is still not compliant with Java 8, check "Java 8+ APIs available through desugaring" and "How to use ThreeTenABP in Android Project".
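The parsing logic itself does not depend on the language. As a rough illustration, written in C++ to match the other code on this page, the PT[nH][nM][nS] form can be handled in one pass over the string without a regular expression. This is only a sketch: parseIsoDurationSeconds is a hypothetical name, it accepts whole numbers only, ignores the day and fractional forms that ISO 8601 also allows, and does not enforce the H, M, S order.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Parses durations of the form PT[nH][nM][nS] with whole-number components.
// Returns the total number of seconds, or -1 if the input is malformed.
long long parseIsoDurationSeconds(const std::string& s)
{
    if (s.size() < 3 || s[0] != 'P' || s[1] != 'T') return -1;

    long long total = 0;
    long long value = 0;      // number currently being read
    bool haveDigits = false;  // true once at least one digit has been read

    for (std::size_t i = 2; i < s.size(); ++i) {
        const char c = s[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            value = value * 10 + (c - '0');
            haveDigits = true;
        } else {
            if (!haveDigits) return -1;  // a unit letter with no number
            if (c == 'H')      total += value * 3600;
            else if (c == 'M') total += value * 60;
            else if (c == 'S') total += value;
            else return -1;              // unsupported designator
            value = 0;
            haveDigits = false;
        }
    }
    // Trailing digits without a unit letter are treated as an error.
    return haveDigits ? -1 : total;
}
```

Multiplying the result by 1000 gives milliseconds, so parseIsoDurationSeconds("PT2H3M20S") * 1000 matches the 7400000 shown above.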

ON and OFF time control over 24 hour period in C, C++

Using C or C++ (mbed, Arduino, etc.), is there a trick to set an ON time and an OFF time over a 24-hour period? For instance, ON at 20:00 and OFF at 06:30 the following morning.
Timers are no good here if there is an NVIC reset. If the device falls over and restarts at, say, 23:40, we still need to service that 20:00 to 06:30 time frame.
I'm stuck on handling the period that goes past midnight.
I've got this far using seconds, but it is not quite working. I suspect I'm barking up the wrong tree, so I would appreciate some clever input here.
lockStatus = 1 is 'ON'
lockStatus = 0 is 'OFF'
void autoLOCK()
{
int hour_from, minute_from = 0;
int seconds_from = 0 ;
int hour_to, minute_to = 0;
int seconds_to = 0;
lockFrom = "20:00";
lockTo = "06:30";
if (sscanf(lockFrom, "%d:%d", &hour_from, &minute_from) >= 2)
{
seconds_from = (hour_from * 3600 + minute_from * 60);
}
if (sscanf(lockTo, "%d:%d", &hour_to, &minute_to) >= 2)
{
seconds_to = (hour_to * 3600 + minute_to * 60);
}
lockStatus = 0;
if (seconds_now >= seconds_from) {
lockStatus = 1;
}
if (seconds_from > seconds_to) {
lockStatus = 1;
}
if (seconds_now >= seconds_to && seconds_from >= seconds_to) {
lockStatus = 0;
}
Serial.printf("Lock Status: %d\n\n", lockStatus);
}
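One common way to handle a window that crosses midnight is to compare times of day directly: if the start is later than the end, the window wraps, and "inside" means "after the start OR before the end". The sketch below assumes the current time of day comes from a real-time clock, so it gives the right answer immediately after a reset. The names toMinutes and inWindow are hypothetical, and treating from == to as an empty window is an assumption.

```cpp
// Minutes since midnight for a given hour and minute.
inline int toMinutes(int hour, int minute)
{
    return hour * 60 + minute;
}

// Returns true when `now` lies in the half-open window [from, to).
// All arguments are minutes since midnight. If from > to, the window is
// taken to span midnight (for example 20:00 to 06:30).
inline bool inWindow(int now, int from, int to)
{
    if (from == to) return false;                    // assumption: empty window
    if (from < to)  return now >= from && now < to;  // same-day window
    return now >= from || now < to;                  // window wraps past midnight
}
```

With this, lockStatus could be set from inWindow(toMinutes(hourNow, minuteNow), toMinutes(20, 0), toMinutes(6, 30)), where hourNow and minuteNow stand for whatever the clock source provides.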

GIF LZW decompression

I am trying to implement a simple GIF reader in C++.
I am currently stuck on decompressing the image data.
If an image includes a Clear Code, my decompression algorithm fails.
After the Clear Code I rebuild the code table and reset the code size to MinimumLzwCodeSize + 1.
Then I read the next code and add it to the index stream. The problem is that after clearing, the next codes include values greater than the size of the current code table.
For example, the sample file from Wikipedia, rotating-earth.gif, has a code value of 262, but the GlobalColorTable has only 256 entries. How do I handle this?
I implemented the LZW decompression according to the GIF spec.
Here is the main part of the decompression code:
int prevCode = GetCode(ptr, offset, codeSize);
codeStream.push_back(prevCode);
while (true)
{
auto code = GetCode(ptr, offset, codeSize);
//
//Clear code
//
if (code == IndexClearCode)
{
//reset codesize
codeSize = blockA.LZWMinimumCodeSize + 1;
currentNodeValue = pow(2, codeSize) - 1;
//reset codeTable
codeTable.resize(colorTable.size() + 2);
//read next code
prevCode = GetCode(ptr, offset, codeSize);
codeStream.push_back(prevCode);
continue;
}
else if (code == IndexEndOfInformationCode)
break;
//exists in dictionary
if (codeTable.size() > code)
{
if (prevCode >= codeTable.size())
{
prevCode = code;
continue;
}
for (auto c : codeTable[code])
codeStream.push_back(c);
newEntry = codeTable[prevCode];
newEntry.push_back(codeTable[code][0]);
codeTable.push_back(newEntry);
prevCode = code;
if (codeTable.size() - 1 == currentNodeValue)
{
codeSize++;
currentNodeValue = pow(2, codeSize) - 1;
}
}
else
{
if (prevCode >= codeTable.size())
{
prevCode = code;
continue;
}
newEntry = codeTable[prevCode];
newEntry.push_back(codeTable[prevCode][0]);
for (auto c : newEntry)
codeStream.push_back(c);
codeTable.push_back(newEntry);
prevCode = codeTable.size() - 1;
if (codeTable.size() - 1 == currentNodeValue)
{
codeSize++;
currentNodeValue = pow(2, codeSize) - 1;
}
}
}

Found the solution.
It is called a deferred clear code. When I check whether codeSize needs to be incremented, I also need to check whether it has already reached the maximum of 12 bits, because it is possible to get codes that are of the maximum code size. See spec-gif89a.txt.
if (codeTable.size() - 1 == currentNodeValue && codeSize < 12)
{
codeSize++;
currentNodeValue = (1 << codeSize) - 1;
}
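The bookkeeping behind that check can be isolated into a small helper, sketched below. It is an illustration under assumptions rather than the code from the question: LzwCodeWidth and its members are hypothetical names, and it tracks only the table size and code width, not the table contents. Besides capping the width at 12 bits, it also refuses new entries once the table holds 4096 codes, which is what a decoder has to do while the encoder defers its clear code.

```cpp
#include <cstddef>

constexpr int kMaxCodeSize = 12;            // GIF codes are at most 12 bits wide
constexpr std::size_t kMaxEntries = 4096;   // 2^12 code table entries

// Hypothetical helper that tracks the code width of a GIF LZW decoder.
struct LzwCodeWidth {
    int minCodeSize;        // LZW minimum code size from the image data
    int codeSize;           // current width, in bits, of the codes being read
    std::size_t tableSize;  // number of entries currently in the code table

    explicit LzwCodeWidth(int minimumCodeSize) : minCodeSize(minimumCodeSize)
    {
        reset();
    }

    // Call when a clear code is read: back to the initial table and width.
    void reset()
    {
        codeSize = minCodeSize + 1;
        // colour indices + clear code + end-of-information code
        tableSize = (std::size_t{1} << minCodeSize) + 2;
    }

    // Call when the decoder is about to add a table entry.
    // Returns false when the table is full and the entry must be skipped.
    bool addEntry()
    {
        if (tableSize >= kMaxEntries) return false;
        ++tableSize;
        // Widen the code only while the 12-bit limit has not been reached.
        if (tableSize == (std::size_t{1} << codeSize) && codeSize < kMaxCodeSize)
            ++codeSize;
        return true;
    }
};
```

In the decoder loop above, codeTable.push_back(newEntry) would then only run when addEntry() returns true.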

optimize octree octant_determination function in c++

I am building a spatial octree. To determine in which branch/octant a certain point (x,y,z) should be placed, I use this function:
if (x>x_centre) {
xsign = 1;
}
else {
xsign = 0;
}
if (y>y_centre) {
ysign = 1;
}
else {
ysign = 0;
}
if (z>z_centre) {
zsign = 1;
}
else {
zsign = 0;
}
return xsign + 2*ysign + 4*zsign;
It returns a number between 0 and 7 that is unique for every octant. It turns out this snippet is called a great many times, and it becomes quite time-consuming when building large trees.
Is there an easy way to speed this process up?
This already gives a 30 percent speed-up:
xsign = x>x_centre;
ysign = y>y_centre;
zsign = z>z_centre;
return xsign + 2*ysign + 4*zsign;
Any other tips?
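A further variation on the branch-free version is to combine the three comparison results with shifts and bitwise OR instead of multiplication and addition. The sketch below assumes double coordinates and uses the hypothetical name octantIndex. Modern compilers may already generate the same code for both forms, so any gain should be measured.

```cpp
// Returns the octant index (0..7) of point (x, y, z) relative to the centre.
// Bit 0 is set when x is above the centre, bit 1 for y, bit 2 for z.
inline int octantIndex(double x, double y, double z,
                       double x_centre, double y_centre, double z_centre)
{
    return  static_cast<int>(x > x_centre)
         | (static_cast<int>(y > y_centre) << 1)
         | (static_cast<int>(z > z_centre) << 2);
}
```

This returns the same values as xsign + 2*ysign + 4*zsign, so it can replace the existing function without changing how the children are numbered.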

What am I doing wrong? (multithreading)

Here's what I'm doing in a nutshell.
In my class's cpp file I have:
std::vector<std::vector<GLdouble>> ThreadPts[4];
The thread proc looks like this:
unsigned __stdcall BezierThreadProc(void *arg)
{
SHAPETHREADDATA *data = (SHAPETHREADDATA *) arg;
OGLSHAPE *obj = reinterpret_cast<OGLSHAPE*>(data->objectptr);
for(unsigned int i = data->start; i < data->end - 1; ++i)
{
obj->SetCubicBezier(
obj->Contour[data->contournum].UserPoints[i],
obj->Contour[data->contournum].UserPoints[i + 1],
data->whichVector);
}
_endthreadex( 0 );
return 0;
}
SetCubicBezier looks like this:
void OGLSHAPE::SetCubicBezier(USERFPOINT &a,USERFPOINT &b, int &currentvector )
{
std::vector<GLdouble> temp;
if(a.RightHandle.x == a.UserPoint.x && a.RightHandle.y == a.UserPoint.y
&& b.LeftHandle.x == b.UserPoint.x && b.LeftHandle.y == b.UserPoint.y )
{
temp.clear();
temp.push_back((GLdouble)a.UserPoint.x);
temp.push_back((GLdouble)a.UserPoint.y);
ThreadPts[currentvector].push_back(temp);
temp.clear();
temp.push_back((GLdouble)b.UserPoint.x);
temp.push_back((GLdouble)b.UserPoint.y);
ThreadPts[currentvector].push_back(temp);
}
}
The code that calls the threads looks like this:
for(int i = 0; i < Contour.size(); ++i)
{
Contour[i].DrawingPoints.clear();
if(Contour[i].UserPoints.size() < 2)
{
break;
}
HANDLE hThread[4];
SHAPETHREADDATA dat;
dat.objectptr = (void*)this;
dat.start = 0;
dat.end = floor((Contour[i].UserPoints.size() - 1) * 0.25);
dat.whichVector = 0;
dat.contournum = i;
hThread[0] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
dat.start = dat.end;
dat.end = floor((Contour[i].UserPoints.size() - 1) * 0.5);
dat.whichVector = 1;
hThread[1] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
dat.start = dat.end;
dat.end = floor((Contour[i].UserPoints.size() - 1) * 0.75);
dat.whichVector = 2;
hThread[2] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
dat.start = dat.end;
dat.end = Contour[i].UserPoints.size();
dat.whichVector = 3;
hThread[3] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
WaitForMultipleObjects(4,hThread,true,INFINITE);
}
Is there something wrong with this?
I'd expect it to fill ThreadPts[4]. There should never be any conflicts the way I have it set up. I usually get an "error writing at..." message on the last thread, where dat.whichVector = 3. If I remove:
dat.start = dat.end;
dat.end = Contour[i].UserPoints.size();
dat.whichVector = 3;
hThread[3] = (HANDLE)_beginthreadex(NULL,0,&BezierThreadProc,&dat,0,0);
Then it does not seem to crash. What could be wrong?
Thanks

The problem is that you're passing the same dat structure to each thread as the argument to the thread proc.
For example, when you start thread 1, there's no guarantee that it will have read the information in the dat structure before your main thread starts loading that same structure with the information for thread 2 (and so on). In fact, each thread keeps using the dat structure directly throughout its loop, so a thread isn't finished with the structure passed to it until it has basically done all of its work.
Also note that currentvector in SetCubicBezier() is a reference to data->whichVector, which refers to the exact same location in all threads. So SetCubicBezier() will end up performing push_back() calls on the same object from separate threads.
There's a very simple fix: use four separate SHAPETHREADDATA instances, one to initialize each thread.
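The idea of one data instance per thread can be sketched as follows. This is a simplified illustration, not the original program: it uses std::thread instead of _beginthreadex so that it is portable, and ThreadData, worker and processInFourThreads are hypothetical stand-ins for SHAPETHREADDATA, BezierThreadProc and the calling loop.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for SHAPETHREADDATA.
struct ThreadData {
    std::size_t start;
    std::size_t end;
    int whichVector;
};

// Each worker reads only its own ThreadData and writes only to
// out[data.whichVector], so no two threads touch the same vector.
void worker(const std::vector<double>& points, const ThreadData& data,
            std::array<std::vector<double>, 4>& out)
{
    for (std::size_t i = data.start; i < data.end; ++i)
        out[data.whichVector].push_back(points[i]);
}

std::array<std::vector<double>, 4>
processInFourThreads(const std::vector<double>& points)
{
    std::array<std::vector<double>, 4> out;
    std::array<ThreadData, 4> data;      // one instance per thread
    std::array<std::thread, 4> threads;
    const std::size_t n = points.size();

    for (int t = 0; t < 4; ++t) {
        // Fill this thread's own instance; earlier instances are not reused.
        data[t] = ThreadData{ n * t / 4, n * (t + 1) / 4, t };
        threads[t] = std::thread(worker, std::cref(points),
                                 std::cref(data[t]), std::ref(out));
    }
    // The data array must outlive the threads, so join before returning.
    for (auto& th : threads) th.join();
    return out;
}
```

The same pattern applies with _beginthreadex: declare SHAPETHREADDATA dat[4] and pass &dat[0] to &dat[3] to the four calls.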
