I would like to know how to monitor a specific program (by its PID) and get a report of its RAM usage, similar to perf record -p <PID> sleep 15 && perf report, showing me which instructions use the most memory.
I already know about commands like top, but that is not what I want.
Massif is a heap profiler included in the Valgrind suite, and it can provide some of this information.
Start it with valgrind --tool=massif <your program>. This will create a massif.out.<pid> file that contains various "snapshots" of heap memory usage taken while the program ran. A simple viewer, ms_print, is included and will dump all the snapshots with stack traces.
For example:
83.83% (10,476B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->30.03% (3,752B) 0x4E6079B: _nl_make_l10nflist (l10nflist.c:241)
| ->24.20% (3,024B) 0x4E608E7: _nl_make_l10nflist (l10nflist.c:285)
| | ->12.10% (1,512B) 0x4E5A091: _nl_find_locale (findlocale.c:218)
| | | ->12.10% (1,512B) 0x4E5978B: setlocale (setlocale.c:340)
| | | ->12.10% (1,512B) 0x4016BA: main (sleep.c:106)
| | |
| | ->12.10% (1,512B) 0x4E608E7: _nl_make_l10nflist (l10nflist.c:285)
| | ->09.41% (1,176B) 0x4E5A091: _nl_find_locale (findlocale.c:218)
| | | ->09.41% (1,176B) 0x4E5978B: setlocale (setlocale.c:340)
| | | ->09.41% (1,176B) 0x4016BA: main (sleep.c:106)
| | |
| | ->02.69% (336B) 0x4E608E7: _nl_make_l10nflist (l10nflist.c:285)
| | ->02.69% (336B) 0x4E5A091: _nl_find_locale (findlocale.c:218)
| | ->02.69% (336B) 0x4E5978B: setlocale (setlocale.c:340)
| | ->02.69% (336B) 0x4016BA: main (sleep.c:106)
| |
| ->05.83% (728B) 0x4E5A091: _nl_find_locale (findlocale.c:218)
| ->05.83% (728B) 0x4E5978B: setlocale (setlocale.c:340)
| ->05.83% (728B) 0x4016BA: main (sleep.c:106)
Check pmap:
pmap <PID>
With pmap, you can see all of the resources being used by the process. And there are many other techniques as well.
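If you want something scriptable, the numbers pmap reports come from /proc. Here is a rough sketch (Linux only, and the top-10 cutoff is arbitrary) that sums resident memory per mapping from /proc/<PID>/smaps:
import re
import sys
from collections import defaultdict

# Mapping header lines in /proc/<PID>/smaps look like:
#   00400000-0040b000 r-xp 00000000 08:01 123456 /bin/cat
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")

def rss_by_mapping(pid):
    rss = defaultdict(int)
    current = "[anon]"
    with open("/proc/%d/smaps" % pid) as f:
        for line in f:
            if HEADER.match(line):
                parts = line.split()
                current = parts[5] if len(parts) > 5 else "[anon]"
            elif line.startswith("Rss:"):
                rss[current] += int(line.split()[1])  # value is in kB
    return rss

if __name__ == "__main__":
    pid = int(sys.argv[1])
    for path, kb in sorted(rss_by_mapping(pid).items(), key=lambda kv: -kv[1])[:10]:
        print("%8d kB  %s" % (kb, path))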
According to my assignment, the admin must be able to create Polls with Questions (create, delete, update) and Choices related to those questions. All of this should be displayed and changeable on the same admin page.
Poll
|
|_question_1
| |
| |_choice_1(text)
| |
| |_choice_2
| |
| |_choice_3
|
|_question_2
| |
| |_choice_1
| |
| |_choice_2
| |
| |_choice_3
|
|_question_3
|
|_choice_1
|
|_choice_2
|
|_choice_3
OK, it's not a problem to display one level of nesting, like so:
from django.contrib import admin

class QuestionInline(admin.StackedInline):
    model = Question

class PollAdmin(admin.ModelAdmin):
    inlines = [
        QuestionInline,
    ]
But what should I do to get the required poll design structure?
Check out this library; it should provide the functionality.
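The library link is not reproduced above; as an illustration only, here is a minimal sketch assuming a nested-inline package such as django-nested-admin (installed and added to INSTALLED_APPS), which is one library that provides this kind of nesting. The model import path is an assumption.
import nested_admin
from django.contrib import admin

from .models import Poll, Question, Choice  # assumed location of the models

class ChoiceInline(nested_admin.NestedStackedInline):
    model = Choice
    extra = 1

class QuestionInline(nested_admin.NestedStackedInline):
    model = Question
    inlines = [ChoiceInline]  # choices nested under each question
    extra = 1

@admin.register(Poll)
class PollAdmin(nested_admin.NestedModelAdmin):
    inlines = [QuestionInline]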
I'm trying to find out how the "regional format" setting on Windows 10 can be retrieved (see picture below).
I tried GetLocaleInfoEx, with virtually all combinations of parameters, but this one showed up nowhere.
On the other hand this setting has an influence on what's returned by GetThreadLocale:
Here are some examples with expected return values from GetThreadLocale, as per this Microsoft documentation (the C++ code is at the end of the question):
+--------------------------+-----------------------------------+
| Regional format | Value returned by GetThreadLocale |
+--------------------------+-----------------------------------+
| French (Switzerland) | 0x100c |
| French (France) | 0x040c |
| German (Germany) | 0x0407 |
| English (United States)  | 0x0409                            |
| English (United Kingdom) | 0x0809 |
+--------------------------+-----------------------------------+
Some examples with unexpected (and undocumented) return values from GetThreadLocale:
+-----------------------+-----------------------------------+
| Regional format | Value returned by GetThreadLocale |
+-----------------------+-----------------------------------+
| English (Switzerland) | 0x0c00 |
| English (Germany) | 0x0c00 |
| German (Italy) | 0x0c00 |
+-----------------------+-----------------------------------+
I really wonder: what is this 0x0c00 value returned by GetThreadLocale?
C++ code
#include <windows.h>
#include <stdio.h>

int main()
{
    printf("GetThreadLocale: %08x\n", GetThreadLocale());
}
I am writing some text into a file:
import codecs
outfile=codecs.open("c:/temp/myfile.sps","w+","utf-8-sig")
#procedures for creating the text_to_write
outfile.write (text_to_write)
outfile.close()
Now, what I want to do is insert some additional text into the file, always at a certain line (say line 10), but this additional text is only final after all the procedures for creating text_to_write have run. So the code for inserting the additional text at line 10 should come last:
Is this possible without closing the file, reopening it, and then saving again?
(the reopen-insert-close approach is detailed here, but I would like to avoid it). I am looking for something like this:
import codecs
outfile=codecs.open("c:/temp/myfile.sps","w+","utf-8-sig")
#procedures for creating the text_to_write
outfile.write (text_to_write)
#code for inserting additional text at line 10
outfile.close()
Since you don't know the exact position (in bytes) of the insertion point, you need to read the lines of the file content, insert the additional text after line 10, and write the file a second time.
Note: a Python 2+3 way to open a file is to use the io module instead of the codecs module.
For instance, you have the following text to write and additional text:
text_to_write = u"""\
| 1 | This
| 2 |
| 3 | text
| 4 |
| 5 | contains
| 6 |
| 7 | at
| 8 |
| 9 | least
| 10 |
| 11 | ten
| 12 |
| 13 | lines."""
additional_text = u"""\
| ++ | ADDITIONAL
| ++ | TEXT
"""
You can open the file for reading and writing with mode "w+": the file is created if it does not exist, otherwise it is truncated, and the stream is positioned at the beginning of the file.
import io

with io.open("file.txt", mode="w+", encoding="utf-8-sig") as f:
    f.write(text_to_write)
    f.seek(0)
    lines = f.readlines()
    lines[10:10] = additional_text.splitlines(keepends=True)
    f.seek(0)
    f.writelines(lines)
This solution is not very efficient because you read back the content you just wrote.
You can also process everything in memory and then write the file.
The result is:
| 1 | This
| 2 |
| 3 | text
| 4 |
| 5 | contains
| 6 |
| 7 | at
| 8 |
| 9 | least
| 10 |
| ++ | ADDITIONAL
| ++ | TEXT
| 11 | ten
| 12 |
| 13 | lines.
Another solution using a list in memory:
lines = text_to_write.splitlines(keepends=True)
lines[10:10] = additional_text.splitlines(keepends=True)

with io.open("file2.txt", mode="w+", encoding="utf-8-sig") as f:
    f.writelines(lines)
I figured out how to read files into my pyspark shell (and script) from an S3 directory, e.g. by using:
rdd = sc.wholeTextFiles('s3n://bucketname/dir/*')
But while that's great for letting me read all the files in ONE directory, I want to read every single file from all of the directories.
I don't want to flatten them or load everything at once, because I will have memory issues.
Instead, I need it to automatically load all the files from each sub-directory in a batched manner. Is that possible?
Here's my directory structure:
S3_bucket_name -> year (2016 or 2017) -> month (max 12 folders) -> day (max 31 folders) -> sub-day folders (max 30; basically just how I partitioned each day's collection).
Something like this, except it'll go for all 12 months and up to 31 days...
BucketName
|
|
|---Year(2016)
| |
| |---Month(11)
| | |
| | |---Day(01)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| | |---Day(02)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| |---Month(12)
|
|---Year(2017)
| |
| |---Month(1)
| | |
| | |---Day(01)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| | |---Day(02)
| | | |
| | | |---Sub-folder(01)
| | | |
| | | |---Sub-folder(02)
| | | |
| |---Month(2)
Each arrow above represents a fork. For example, I've been collecting data for 2 years, so there are 2 entries in the "year" fork. Then for each year there are up to 12 months, and for each month up to 31 possible day folders. And in each day there will be up to 30 folders, just because I split it up that way...
I hope that makes sense...
I was looking at another post (read files recursively from sub directories with spark from s3 or local filesystem) where I believe they suggested using wildcards, so something like:
rdd = sc.wholeTextFiles('s3n://bucketname/*/data/*/*')
But the problem with that is it tries to find a common folder among the various subdirectories - in this case there are no guarantees and I would just need everything.
However, on that line of reasoning, I thought what if I did..:
rdd = sc.wholeTextFiles("s3n://bucketname/*/*/*/*/*')
But the issue is that now I get OutOfMemory errors, probably because it's loading everything at once and freaking out.
Ideally, this is what I would like to be able to do:
Go to the sub-directory level of the day and read those in, e.g.:
First read in 2016/12/01, then 2016/12/02, up until 2016/12/31, and then 2017/01/01, then 2017/01/02, ... 2017/01/31, and so on.
That way, instead of using five wildcards (*) as I did above, I would somehow have it know to look through each sub-directory at the level of "day".
I thought of using a Python dictionary to specify the file path for each of the days, but that seems like a rather cumbersome approach. What I mean by that is as follows:
file_dict = {
    0: '2016/12/01/*/*',
    1: '2016/12/02/*/*',
    ...
    30: '2016/12/31/*/*',
}
Basically one entry for each of the folders, and then iterating through them and loading them in using something like this:
sc.wholeTextFiles('s3n://bucketname/' + file_dict[i])
But I don't want to manually type out all those paths. I hope this made sense...
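To be concrete, the kind of batched loop I have in mind would look roughly like this (the zero-padding of the month/day folder names and the date ranges are assumptions taken from my examples; days that don't exist in the bucket would fail once the job runs and would need to be skipped or caught):
for year in (2016, 2017):
    for month in range(1, 13):
        for day in range(1, 32):
            path = 's3n://bucketname/{:d}/{:02d}/{:02d}/*/*'.format(year, month, day)
            day_rdd = sc.wholeTextFiles(path)
            # ... process this one day's files here, then move on to the next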
EDIT:
Another way of asking the question is: how do I read the files from a nested sub-directory structure in a batched way? How can I enumerate all the possible folder names in my S3 bucket in Python? Maybe that would help...
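For example, something along these lines might enumerate them (a rough sketch assuming boto3 with credentials configured; the bucket name and the year/month/day key layout are taken from my structure above):
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

day_prefixes = set()
for page in paginator.paginate(Bucket="bucketname"):
    for obj in page.get("Contents", []):
        # keys look like "2016/12/01/<sub-folder>/<file>"; keep year/month/day
        day_prefixes.add("/".join(obj["Key"].split("/")[:3]))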
EDIT2:
The structure of the data in each of my files is as follows:
{json object 1},
{json object 2},
{json object 3},
...
{json object n},
For it to be "true JSON", it would either just need to be like the above without a trailing comma at the end, or something like this (note the square brackets and the lack of a final trailing comma):
[
{json object 1},
{json object 2},
{json object 3},
...
{json object n}
]
The reason I did it entirely in PySpark, as a script I submit, is that it forced me to handle this formatting quirk manually. If I use Hive/Athena, I am not sure how to deal with it.
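For reference, a rough sketch of how a file in my comma-terminated format can be coerced into valid JSON before parsing (the helper name is made up):
import json

def parse_quirky_json(file_text):
    # Drop the final trailing comma and wrap the objects in square brackets
    body = file_text.strip().rstrip(",")
    return json.loads("[" + body + "]")

# With wholeTextFiles, which yields (path, content) pairs:
# records = rdd.flatMap(lambda kv: parse_quirky_json(kv[1]))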
Why don't you use Hive, or even better, Athena? These will both deploy tables on top of file systems to give you access to all the data. Then you can capture this into Spark.
Alternatively, I believe you can also use HiveQL in Spark to set up a tempTable on top of your file system location, and it will register it all as a Hive table which you can execute SQL against. It's been a while since I've done that, but it is definitely doable.
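Roughly, the idea looks like this (a sketch only: it assumes a pyspark shell with sc, a Spark build with Hive support, and S3 credentials already set up; the table name, column, and bucket path are placeholders, and reading nested sub-directories through a flat external table may need extra settings or a partitioned table):
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)

# Register an external table over the bucket location, then query it with SQL.
sqlContext.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (line STRING)
    LOCATION 's3n://bucketname/'
""")

sqlContext.sql("SELECT COUNT(*) FROM raw_events").show()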
How should I implement sub-windows in my OpenGL viewport? Inside my viewport, I want to reserve some space on the left for labels, and some space around the edges as a border. I've got all the coordinates figured out and everything is displaying properly. My problem is clipping the things in one sub-window that are spilling over into the others. I can't seem to figure out what the OpenGL 3.3 core-context way of doing things is. Is it to:
use per-vertex clipping?
a scissor test?
a stencil test?
associate a framebuffer with different parts of my window?
Which commands should I be looking at?
Before I spend time writing a full answer, I would like you to confirm that this is what you were describing in your original question:
*---------------------------------------*
| ------------------------------------- |
| | | | |
| | | | |
| | | | |
|C| A | B |C|
| | | | |
| | | | |
| |___|_______________________________| |
*---------------------------------------*
A = Labels
B = Main Window
C = Border