I have been running a Python octo.py script to do word/author counting on a series of files. The script works well -- I tried it on a limited set of data and got the correct results.
But when I run it on the complete data set, it takes forever. I am running on a Windows XP laptop with a dual-core 2.33 GHz CPU and 2 GB of RAM.
I opened up my CPU usage monitor and it shows the processors running at only 0-3% of maximum.
What can I do to force Octo.py to utilize more CPU?
Thanks.
As your application isn't very CPU intensive, the slow disk turns out to be the bottleneck. Old 5400 RPM laptop hard drives are very slow, and combined with fragmentation and low RAM (which limits disk caching) they make reading very slow. This in turn slows down processing and yields low CPU usage. You can try defragmenting, compressing the input files (as they become smaller on disk, processing speed will increase), or other means of improving I/O.
Is there a way in Windows to monitor the TLB misses of an application? The Resource Monitor in Windows shows me 0 hard faults/sec (so, no misses where the page isn't even in main memory). But is there also some way to monitor TLB misses where the page is in main memory?
I have an app where I need to make random accesses across about 100 GB of data. I'm running it on a computer with 160 GB of RAM and am keeping all the data in the working set. Still, what I'm seeing is that during the sections where the random access happens, the Windows Task Manager shows the CPUs running at only about 20% load (the app is multi-threaded with as many threads as there are CPU cores, without any critical sections and with no I/O).
I'm currently suspecting TLB misses to be the problem and wonder how I could confirm/reject this theory.
Assuming the following code:
#include <fstream>
#include <memory>

// pFile is assumed to hold the path of the input file.
constexpr int nBufferSize = 1024 * 1024;
auto aBuffer = std::make_unique<char[]>(nBufferSize);
std::ifstream pInput(pFile, std::ios::binary | std::ios::in);

// Read the file in 1 MiB chunks until EOF.
while (pInput)
{
    pInput.read(&aBuffer[0], nBufferSize);
}
On my desktop computer, this entire task takes 1400 ms to complete on the first run, then 1100 ms on the second run. The file size is 1.8 GB and I'm reading it from an M.2 SSD.
When I run it against my HDD, the task takes ~9000 ms to complete, which makes sense, since hard drives are slower.
However, on my colleague's machine, which also has an M.2 SSD (although of a different brand), the task takes 12000 ms to complete, which is even slower than my HDD.
Our CPUs and RAM speeds are similar (Ryzen 7 3700X here, Ryzen 5 3600X on their machine; 3200 MHz RAM on both machines). Both operating systems are the latest version of Windows 10. They ran system health checks to ensure the hardware isn't faulty, and it seems fine. Other programs, disk benchmarking software included, do not seem to be affected by any kind of slowness, and our speeds are almost identical there.
Could be a lot of things: amount of RAM, how many programs and services are running, health of the drive, how full the drive is, etc.
The first thing I'd check for is antivirus interference: if you've sent your colleague an .exe, their antivirus is probably scanning every file it touches.
I am in the process of creating a C++ application that measures disk usage. I've been able to retrieve current disk usage (read and write speeds) by reading /proc/diskstats at regular intervals.
I would now like to be able to display this usage as a percentage (I find it more user-friendly than raw numbers, which can be hard to interpret). Therefore, does anyone know of a method for retrieving the maximum (or nominal) disk I/O speed programmatically on Linux (API call, reading a file, etc.)?
I am aware of various answers about measuring disk speeds (e.g. https://askubuntu.com/questions/87035/how-to-check-hard-disk-performance), but they are all based on testing. I would like to avoid such methods, as they take some time to run and entail heavy disk I/O while running (thus potentially degrading the performance of other running applications).
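For context, here is a minimal sketch of the sampling approach described above. The device name "sda" and the one-second interval are assumptions, not part of the question; /proc/diskstats reports sectors in 512-byte units, and the fields after the device name are reads, reads merged, sectors read, ms reading, writes, writes merged, sectors written, and so on.

#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <thread>

// Return total bytes read + written so far for one block device,
// parsed from /proc/diskstats (sectors are always 512-byte units there).
long long DiskBytes(const std::string& device)
{
    std::ifstream stats("/proc/diskstats");
    std::string line;
    while (std::getline(stats, line))
    {
        std::istringstream iss(line);
        unsigned long long major, minor;
        std::string name;
        unsigned long long reads, readsMerged, sectorsRead, msReading;
        unsigned long long writes, writesMerged, sectorsWritten;
        iss >> major >> minor >> name
            >> reads >> readsMerged >> sectorsRead >> msReading
            >> writes >> writesMerged >> sectorsWritten;
        if (name == device)
            return static_cast<long long>(sectorsRead + sectorsWritten) * 512;
    }
    return -1; // device not found
}

int main()
{
    const std::string device = "sda"; // assumption: adjust to your disk
    const long long before = DiskBytes(device);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    const long long after = DiskBytes(device);
    std::cout << "Current throughput: " << (after - before) << " bytes/s\n";
}

The percentage would then be this sample divided by whatever maximum you settle on, which is exactly the part the question is asking about.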
Back in the early IBM PC era, there was a great DOS utility; I forget its name, but it measured the speed of the computer (maybe Speedtest? whatever). There was a bar in the bottom 2/3 of the screen which represented the speed of the CPU. If you had a 4.0 MHz (not GHz!) machine, the bar occupied 10% of the screen.
2-3 years later, '386 computers arrived, and the speed indicator bar outgrew not just the line but the whole screen, and it looked crappy.
So, there is no such thing as 100% disk speed, CPU speed, etc.
The best you can do: if your program runs for a while, you can remember the highest value seen and treat it as 100%. You could even save that value into a tmp file so it persists across runs.
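A minimal sketch of that "remember the max" idea (the class and method names are purely illustrative):

#include <algorithm>

// Report each throughput sample as a percentage of the highest value seen so far.
class AdaptiveMax
{
public:
    double Percent(double bytesPerSecond)
    {
        m_max = std::max(m_max, bytesPerSecond);
        return m_max > 0.0 ? 100.0 * bytesPerSecond / m_max : 0.0;
    }

private:
    double m_max = 0.0;
};

Loading the saved maximum from the tmp file at startup would simply mean initializing m_max from that file instead of zero.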
I'm working on a "search" project. The main idea is how to build an index that can respond to a search request as fast as possible. The input is a query such as "termi termj"; the output is the docs where both termi and termj appear.
The index file looks like this (each line is called a postlist, which is a sorted array of unsigned ints and can be compressed with a good compression ratio):
term1:doc1, doc5, doc8, doc10
term2:doc10, doc51, doc111, doc10000
...
termN:doc2, doc4, doc10
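For reference, one common way such postlists are compressed (not necessarily the scheme this index uses) is to store the gaps between consecutive doc IDs with variable-byte encoding; a minimal decode sketch:

#include <cstdint>
#include <vector>

// Decode a variable-byte encoded list of doc-ID gaps back into doc IDs.
// Each gap is stored 7 bits per byte; the high bit marks the last byte of a gap.
// Purely illustrative: real postlist formats vary.
std::vector<uint32_t> DecodePostlist(const std::vector<uint8_t>& bytes)
{
    std::vector<uint32_t> docs;
    uint32_t prev = 0, value = 0;
    int shift = 0;
    for (uint8_t b : bytes)
    {
        value |= static_cast<uint32_t>(b & 0x7F) << shift;
        if (b & 0x80)          // high bit set: this gap is complete
        {
            prev += value;     // gaps are deltas from the previous doc ID
            docs.push_back(prev);
            value = 0;
            shift = 0;
        }
        else
        {
            shift += 7;
        }
    }
    return docs;
}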
The 3 main time-consuming procedures are:
seek to termi's and termj's postlists in the file (random disk read)
decode the postlists (CPU)
calculate the intersection of the 2 postlists (CPU; see the sketch below)
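A minimal sketch of that intersection step, assuming both decoded postlists are sorted arrays of doc IDs:

#include <cstdint>
#include <vector>

// Intersect two sorted postlists with a two-pointer scan, O(n + m).
std::vector<uint32_t> Intersect(const std::vector<uint32_t>& a,
                                const std::vector<uint32_t>& b)
{
    std::vector<uint32_t> result;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i;
        else if (b[j] < a[i])
            ++j;
        else
        {
            result.push_back(a[i]);
            ++i;
            ++j;
        }
    }
    return result;
}

When one postlist is much shorter than the other, a galloping/binary search over the longer list is usually faster than this linear two-pointer scan.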
My question is: how can I know whether the application can't be made any more efficient, i.e. whether it has a disk I/O bottleneck? How can I measure whether my computer is using its disk at 100 percent? Are there any tools on Linux to help? Is there any tool that can profile disk I/O the way the Google CPU profiler measures CPU?
My development environment is Ubuntu 14.04.
CPU: 8 cores, 2.6 GHz
Disk: SSD
The benchmark is currently about 2000 queries/second, but I don't know how to improve it.
Any suggestion will be appreciated! Thank you very much!
I am using Eclipse for C/C++ development. I am trying to compile and run a project. When I compile and run the project, after a while my CPU reaches 100% usage. I checked Task Manager and found that Eclipse isn't closing any of the previous builds; they keep running in the background, which uses my CPU heavily. How do I solve this problem? At 100% usage my PC becomes very, very slow.
If you don't want the build to use up all your CPU time (maybe because you want to do other stuff while building), then you could decrease the parallelism of the build to a point where it leaves one or more cores unused. For example, if you have 8 cores, you could configure your build to only use 6 of them.
Your build will take longer, but your machine will be more responsive for other tasks while the build runs.
Adding more RAM seems to have solved my problem. Disk usage is also low now. Maybe since there wasn't enough RAM in my laptop, the CPU was fetching data from the disk directly, which made the disk usage go up.