dtrace execute action only when the function returns to a specific module

I'm tracing some libc functions with DTrace. I want to write a predicate that only runs the action when the function returns to an address inside a specific module, given as a parameter.
copyin(uregs[R_ESP],1) in the return probe should give the return address, I think, but I'm not entirely sure, so it would be nice if someone could confirm.
I then need a way to resolve that address to a module. Is this possible, and how?

There is a ucaller variable which will give you the
saved program counter as a uint64_t and umod() will
translate it into the corresponding module name, e.g.
# dtrace -n 'pid$target:::entry {@[umod(ucaller)]=count()}' -p `pgrep -n xscreensaver`
dtrace: description 'pid$target:::entry ' matched 14278 probes
^C
xscreensaver 16
libXt.so.4 73
libX11.so.4 92
libxcb.so.1 141
libc.so.1 144
^C#
However, umod() is an action (as opposed to a subroutine); it
cannot be assigned to an lvalue and therefore cannot be used in
an expression (because the translation is deferred until the address
is received by the dtrace(1) user-land program).
Fortunately, there's nothing stopping you from finding the address
range occupied by libc in your process and comparing it with ucaller.
Here's an example on Solaris (where a hardware-specific libc is
mounted at boot time):
# mount | fgrep libc
/lib/libc.so.1 on /usr/lib/libc/libc_hwcap1.so.1 read/write/setuid/devices/rstchown/dev=30d0002 on Sat Jul 13 20:27:32 2013
# pmap `pgrep -n gedit` | fgrep libc_hwcap1.so.1
FEE10000 1356K r-x-- /usr/lib/libc/libc_hwcap1.so.1
FEF73000 44K rwx-- /usr/lib/libc/libc_hwcap1.so.1
FEF7E000 4K rwx-- /usr/lib/libc/libc_hwcap1.so.1
#
I'll assume that the text section is the one with only
read & execute permissions, but note that in some
circumstances the text section will be writeable.
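The address range can also be pulled out of the pmap output mechanically. The following is a minimal Python sketch, not part of the original answer: the function name text_range and the parsing rules are assumptions based on the pmap output format shown above (hex start address, size in K, permissions, path). It takes the end of the range as start plus size, which can be lower than the start of the next mapping.

```python
def text_range(pmap_output, module):
    """Return (start, end) of the first r-x mapping whose path ends with module.

    Assumes lines shaped like the pmap output above:
        FEE10000 1356K r-x-- /usr/lib/libc/libc_hwcap1.so.1
    Returns None if no such mapping is found.
    """
    for line in pmap_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[3].endswith(module):
            continue
        start_hex, size, perms = fields[0], fields[1], fields[2]
        # Only accept read+execute, non-writable mappings with a size in K.
        if not perms.startswith("r-x") or not size.endswith("K"):
            continue
        start = int(start_hex, 16)
        return start, start + int(size[:-1]) * 1024
    return None


sample = """\
FEE10000 1356K r-x-- /usr/lib/libc/libc_hwcap1.so.1
FEF73000 44K rwx-- /usr/lib/libc/libc_hwcap1.so.1
FEF7E000 4K rwx-- /usr/lib/libc/libc_hwcap1.so.1
"""

start, end = text_range(sample, "libc_hwcap1.so.1")
print(hex(start), hex(end))
```

The two printed values are candidates for the $1 and $2 arguments of a D script that compares ucaller against an address range.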
# cat Vision.d
/*
* self->current is a boolean indicating whether or not execution is currently
* within the target range.
*
* self->next is a boolean indicating whether or not execution is about to
* return to the target range.
*/
BEGIN
{
self->current = 1;
}
pid$target:::entry
{
self->current = (uregs[R_PC] >= $1 && uregs[R_PC] < $2);
}
syscall:::return
/pid==$target/
{
self->next = self->current;
self->current = 0;
}
pid$target:::return
{
self->next = (ucaller >= $1 && ucaller < $2);
}
pid$target:::return,syscall:::return
/pid==$target && self->next && !self->current/
{
printf("Returning to target from %s:%s:%s:%s...\n",
probeprov, probemod, probefunc, probename);
ustack();
printf("\n");
}
pid$target:::return,syscall:::return
/pid==$target/
{
self->current = self->next;
}
# dtrace -qs Vision.d 0xFEE10000 0xFEF73000 -p `pgrep -n gedit`
This produces results like
Returning to target from pid2095:libcairo.so.2.10800.10:cairo_bo_event_compare:return...
libcairo.so.2.10800.10`cairo_bo_event_compare+0x158
libc.so.1`qsort+0x51c
libcairo.so.2.10800.10`_cairo_bo_event_queue_init+0x122
libcairo.so.2.10800.10`_cairo_bentley_ottmann_tessellate_bo_edges+0x2d
libcairo.so.2.10800.10`_cairo_bentley_ottmann_tessellate_polygon+0
.
.
.
Returning to target from syscall::pollsys:return...
libc.so.1`__pollsys+0x15
libc.so.1`poll+0x81
libxcb.so.1`_xcb_conn_wait+0xb5
libxcb.so.1`_xcb_out_send+0x3b
libxcb.so.1`xcb_writev+0x65
libX11.so.4`_XSend+0x17c
libX11.so.4`_XFlush+0x30
libX11.so.4`XFlush+0x37

Related

In conky, how do I nest a variable within a template?

EXAMPLES:
${template2 enp0s25} <- WORKS (fixed string)
${template2 ${gw_iface}} <- FAILS (nested variable)
${template2 ${execpi 10 ls -d /sys/class/net/enp* 2> /dev/null | sed -e 's,/sys/class/net/,,'}} <- FAILS (nested variable command)
I've also tried (and failed):
${combine ${template2 ${gw_iface}}}
${combine ${template2} ${gw_iface}}
Here is "template2":
template2 = [[
${if_existing /proc/net/route \1}Wired Ethernet ("\1"):
- MAC: ${execi 5 cat /sys/class/net/\1/address} IP: ${addr \1}
- Max: ${execi 5 /sbin/ethtool '\1' 2>/dev/null | sed -n -e 's/^.*Speed: //p'}${goto 190}${if_match ${downspeedf \1} > 0}${font :bold:size=14}${endif}Down: ${downspeedf \1}kB/s${font}${goto 370}${if_match ${upspeedf \1} > 0}${font :bold:size=14}${endif}Up: ${upspeedf \1}kB/s${font}
${endif}]]
Thanks for the help.

Templates are a little limited as you cannot evaluate the parameters before they are passed through. One workaround is to use a bit of lua code to do the eval explicitly and then parse the template. For example,
conky.config = {
lua_load = '/tmp/myfunction.lua',
...
};
conky.text = [[
${lua myeval template2 ${gw_iface}}
]]
Create the lua file, /tmp/myfunction.lua holding
function conky_myeval(tpl, var1)
v = conky_parse(var1)
cmd = "${"..tpl.." "..v.."}"
return conky_parse(cmd)
end
The lua function takes the name of the template, tpl, and the parameter to evaluate, var1. It evaluates the latter using conky_parse() to, say, string "xxx", then constructs a new string "${template2 xxx}", which is parsed and returned as the value of the ${lua} call.
The same can be done for the longer example ${execpi ...} too.

gdb how to print variable name along with variable value like "$number = variable-name = variable-value"

By default, p variable-name displays $num = variable-value, where $num is the value-history entry. Is there a way to print the variable name along with the value, like $num = variable-name = variable-value?
I want this since I use
define p
set $i = 0
while $i < $argc
eval "print $arg%d", $i
set $i = $i + 1
end
end
in my ~/.gdbinit to redefine the p command, so that p var1 var2 var3... prints multiple variables at once. But print only outputs $num = variable-value, so I can't tell which variable each line of output belongs to. The same problem occurs when I print from the value history with just p $num: the output is hard to read because it doesn't show the variable name.
NOTE: the variable may be int/char/pointer/array/vector/...

One solution is to first add the wanted variables to the display list and then display them all together. Note that the display list must be cleared with undisplay first; otherwise the variables from previous invocations are printed too.
define p
set confirm off
eval "undisplay"
set confirm on
set $i = 0
while $i < $argc
eval "display $arg%d", $i
set $i = $i + 1
end
display
end
The undisplay evaluation is enclosed between set confirm off/on to suppress the following message:
[answered Y; input not from terminal]
If you have already set confirm off in your ~/.gdbinit file, you will need to remove these two lines.
Edit: Honestly, I only learned about the display command while looking for a solution to this question. Although this answer may be useful for printing multiple variables with their names, after several days of using display in my workflow I no longer recommend it: display on its own fits my needs better (printing multiple variables at every stop). From the official documentation:
If you find that you want to print the value of an expression frequently (to see how it changes), you might want to add it to the automatic display list so that GDB prints its value each time your program stops. Each expression added to the list is given a number to identify it; to remove an expression from the list, you specify that number. The automatic display looks like this:
2: foo = 38
3: bar[5] = (struct hack *) 0x3804
Basically, I now use the command like this: I add a variable to the list with display $var, and every time a breakpoint is hit the listed variables are printed automatically. It makes sense for gdb to have a feature like this. Thanks @CodyChan for the motivation.

In a nutshell, we want to output
$num = variable-name = variable-value
instead of
$num = variable-value
As far as I can tell, gdb adds to the value history in only three places: the print command, the call command, and the history-append! Scheme function. Since my Scheme is rusty, we'll need to use the CLI or Python to run print and modify its output.
Using the CLI
define pp
set $i = 0
while $i < $argc
eval "pipe print $arg%d | awk -v name='$arg%d' '{ if (NR == 1 && $2 == \"=\") { $2 = \"= \" name \" =\" }; print }'", $i, $i
set $i++
end
end
Pipe is new in gdb 10.
That awk command is, after unescaping,
awk -v name='$arg%d' '{ if (NR == 1 && $2 == "=") { $2 = "= " name " =" }; print }'
which changes the = (second field) in $num = variable-value to = variable-name = . If gdb's print command outputs more than one line, the NR == 1 in the awk command makes sure the substitution is only done on the first line.
Security note: gdb's pipe command appears to parse the shell_command into tokens and uses execve to run it, rather than passing it to an actual shell. This prevents some code injection attacks (if, for instance, the $arg%d in name='$arg%d' contains single quotes), but you should be careful of running any shell command comprised of text you haven't vetted.
Using Python
class PP(gdb.Command):
    """print value history index, name, and value of each arg"""
    def __init__(self):
        super(PP, self).__init__("pp", gdb.COMMAND_DATA, gdb.COMPLETE_EXPRESSION)
    def invoke(self, argstr, from_tty):
        for arg in gdb.string_to_argv(argstr):
            line = gdb.execute("print " + arg, from_tty=False, to_string=True)
            line = line.replace("=", "= " + arg + " =", 1)
            gdb.write(line)
PP()
Here, we're using a more sed-like approach, using str.replace with a count of 1 so that only the first = is rewritten.
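The effect of limiting the replacement to the first match can be checked outside gdb. This is a standalone sketch: the helper name label_line is hypothetical, and the sample strings imitate the shape of gdb's print output rather than being captured from a session.

```python
def label_line(line, name):
    # Replace only the first "=" so that "=" characters inside the
    # printed value (or on later lines) are left alone.
    return line.replace("=", "= " + name + " =", 1)


print(label_line("$2 = 4\n", "argc"), end="")
print(label_line('$5 = "a=b"\n', "s"), end="")
```

Without the count argument, the = inside the quoted string in the second call would be rewritten as well.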
Sample session:
(gdb) set args a b c
(gdb) start
Starting program: /home/mp/argprint a b c
Temporary breakpoint 2, main (argc=4, argv=0x7ffffffee278) at argprint.c:4
4 for(int i=0; i < argc; i++) {
(gdb) pp i argc argv argv[0]@argc
$1 = i = 0
$2 = argc = 4
$3 = argv = (char **) 0x7ffffffee278
$4 = argv[0]@argc = {0x7ffffffee49f "/home/mp/argprint", 0x7ffffffee4b1 "a", 0x7ffffffee4b3 "b", 0x7ffffffee4b5 "c"}

GAWK - Multiple BEGIN and END sections

I'm trying to process a bunch of files extracting data using gawk.
The files are fixed-width, space-formatted files.
I'm trying to extract data from two different lines matched by two different regular expressions, but return the data from both of these lines in ONE print statement.
I can achieve this with the following in an .awk file, run with gawk -f. The first BEGIN section sets up the input file format (FIELDWIDTHS), and in the second BEGIN I am trying to use a loop per file to output based on the extracted data. The first END completes the inner BEGIN and the second matches the outer BEGIN.
However, I can only apply this to one file at a time: if I apply it to a bunch of files (as in gawk -f regex.awk km*.txt), I only get the last file's output.
Can I get one line of output per input file without resorting to a script that loops over the input files and runs the awk script each time?
Thanks
BEGIN{
OFS=","; FIELDWIDTHS ="2 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12";
printf("Date, Turnover, SalesA, SalesB, SalesC, SalesD, Other Data\n");
}
BEGIN{ Sales = 0;
SalesA = 0;
SalesB = 0;
SalesC = 0;
SalesD = 0;
JointSales = 0;
Turnover = 0;
OtherData = 0;}
/^03/ || /^06/ {
if ($1 == "03") {
Sales = $15/100;
SalesA = $17/100;
SalesB = $26/100;
SalesC = $20/100;
SalesD = $22/100;
JointSales = SalesA - SalesB;
Turnover = JointSales + SalesB + SalesC + SalesD; }
else if ( $1 == "06") {
OtherData = substr($0,183,12)/100; }
# printf("%s, %10.2f, %10.2f, %10.2f, %10.2f, %10.2f, %10.2f\n", getDate(FILENAME), Sales, JointSales, SalesB, SalesC, SalesD, OtherData )
}
END{printf("%s, %10.2f, %10.2f, %10.2f, %10.2f, %10.2f, %10.2f\n", getDate(FILENAME), Sales, JointSales, SalesB, SalesC, SalesD, OtherData ) }
END {}
function getDate(str)
{ date = substr(str,3,6);
year = substr(date,1,2);
month= substr(date,3,2);
day=substr(date,5,2);
odate=(day"/"month"/"year);
return odate
}

If you are using gawk, you're in luck. In addition to BEGIN and END blocks, gawk implements BEGINFILE and ENDFILE blocks, which are executed just as you want: before and after processing each file. See the handy gawk programming guide.
Like all awk implementations, GNU awk allows you to have multiple BEGIN and END blocks. All BEGIN blocks are run in order (first to last) before the first file is read, and all END blocks are run in the same first-to-last order after the last file is done. Since the same order is used for both types of special block, they don't "nest".

awk only allows one begin and end action set per run (though they can be spread across multiple blocks, they're all combined into one action set) and a run includes all files that you process.
If you want to do something between each file as well, you can use gawk's ARGIND variable, which holds the index in ARGV of the file currently being processed (ARGV[0] is the program name, so the first file argument has index 1). You just need to maintain the last argument index (initially one) and, if the actual argument index is different, execute your special actions and update the last index.
With empty files (for which no code would be run), the current argument index may be more than one higher than the last so you may need to loop, incrementing the last index until it reaches the current one.
For example, let's print the lines of each file but with special markers for before, within and after. With the file a.in:
xyzzy
plugh
and a b.in file containing nothing, you can use the following script demo.awk:
function middleCheck() {
while (lastArgInd != ARGIND) {
print "MIDDLE after "lastArgInd":"ARGV[lastArgInd]
lastArgInd++
}
}
BEGIN { print "BEGIN"
lastArgInd = 1
}
{ middleCheck()
print " "$0
}
END { middleCheck()
print "END"
}
to effect an action between each file:
pax> vi demo.awk ; awk -f demo.awk b.in a.in a.in b.in a.in b.in b.in
BEGIN
MIDDLE after 1:b.in
xyzzy
plugh
MIDDLE after 2:a.in
xyzzy
plugh
MIDDLE after 3:a.in
MIDDLE after 4:b.in
xyzzy
plugh
MIDDLE after 5:a.in
MIDDLE after 6:b.in
END
You just have to make that action match what you need, your current "inner" end followed by your current "inner" begin.

Is there a good way to replace home directory with tilde in bash?

I'm trying to work with a path and replace the home directory with a tilde in bash, and I'm hoping to get it done with as few external programs as possible. Is there a way to do it with just bash? I got
${PWD/#$HOME/\~}
But that's not quite right. It needs to convert:
/home/alice to ~
/home/alice/ to ~/
/home/alice/herp to ~/herp
/home/alicederp to /home/alicederp
As a note of interest, here's how the bash source does it when converting the \w value in the prompt:
/* Return a pretty pathname. If the first part of the pathname is
the same as $HOME, then replace that with `~'. */
char *
polite_directory_format (name)
char *name;
{
char *home;
int l;
home = get_string_value ("HOME");
l = home ? strlen (home) : 0;
if (l > 1 && strncmp (home, name, l) == 0 && (!name[l] || name[l] == '/'))
{
strncpy (tdir + 1, name + l, sizeof(tdir) - 2);
tdir[0] = '~';
tdir[sizeof(tdir) - 1] = '\0';
return (tdir);
}
else
return (name);
}

I don't know of a way to do it directly as part of a variable substitution, but you can do it as a command:
[[ "$name" =~ ^"$HOME"(/|$) ]] && name="~${name#$HOME}"
Note that this doesn't do exactly what you asked for: it replaces "/home/alice/" with "~/" rather than "~". This is intentional, since there are places where the trailing slash is significant (e.g. cp -R ~ /backups does something different from cp -R ~/ /backups).
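The boundary rule shared by the C function in the question and the regex above (the home directory must be followed by the end of the string or a slash) can be tried against the four cases from the question. The following Python sketch only illustrates that rule and is not a bash solution; abbreviate_home is a hypothetical name.

```python
def abbreviate_home(path, home):
    """Replace a leading home directory with "~", only on a path-component boundary."""
    n = len(home)
    # Mirror the C check: home must be longer than "/" and must be followed
    # by the end of the string or by a "/".
    if n > 1 and path.startswith(home) and (len(path) == n or path[n] == "/"):
        return "~" + path[n:]
    return path


for p in ["/home/alice", "/home/alice/", "/home/alice/herp", "/home/alicederp"]:
    print(p, "->", abbreviate_home(p, "/home/alice"))
```

The last case is left unchanged because the character after the home prefix is neither a slash nor the end of the string.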

See this unix.stackexchange answer:
If you're using bash, then the dirs builtin has the desired
behavior:
dirs +0
~/some/random/folder
That probably uses Bash's own C code that you pasted there. :)
And here's how you could use it:
dir=... # <- Use your own here.
# Switch to the given directory; Run "dirs" and save to variable.
# "cd" in a subshell does not affect the parent shell.
dir_with_tilde=$(cd "$dir" && dirs +0)
Note that this will only work with directory names that already exist.

Generating the shortest regex to match an arbitrary word list

I'm hoping someone might know of a script that can take an arbitrary word list and generate the shortest regex that matches that list exactly (and nothing else).
For example, suppose my list is
1231
1233
1234
1236
1238
1247
1256
1258
1259
Then the output should be:
12(3[13468]|47|5[689])

This is an old post, but for the benefit of those finding it through web searches as I did, there is a Perl module that does this, called Regexp::Optimizer, here: http://search.cpan.org/~dankogai/Regexp-Optimizer-0.23/lib/Regexp/Optimizer.pm
It takes a regular expression as input, which can consist just of the list of input strings separated with |, and outputs an optimal regular expression.
For example, this Perl command-line:
perl -mRegexp::Optimizer -e "print Regexp::Optimizer->new->optimize(qr/1231|1233|1234|1236|1238|1247|1256|1258|1259/)"
generates this output:
(?^:(?^:12(?:3[13468]|5[689]|47)))
(assuming you have installed Regexp::Optimizer), which matches the OP's expectation quite well.
Here's another example:
perl -mRegexp::Optimizer -e "print Regexp::Optimizer->new->optimize(qr/314|324|334|3574|384/)"
And the output:
(?^:(?^:3(?:[1238]|57)4))
For comparison, an optimal trie-based version would output 3(14|24|34|574|84). In the above output, you can also search and replace (?: and (?^: with just ( and eliminate redundant parentheses, to obtain this:
3([1238]|57)4

You are probably better off saving the entire list, or if you want to get fancy, create a Trie:
1231
1234
1247
    1
    |
    2
   / \
  3   4
 / \   \
1   4   7
Now when you take a string, check whether it reaches a leaf node. If it does, it's valid.
If you have variable length overlapping strings (eg: 123 and 1234) you'll need to mark some nodes as possibly terminal.
You can also use the trie to generate the regex if you really like the regex idea:
Nodes from the root to the first branching are fixed (eg: 12)
Branches create |: (eg: 12(3|4))
Leaf nodes generate a character class (or single character) that follows the parent node: (eg: 12(3[14]|47))
This might not generate the shortest regex; to do that you'll need some extra work:
"Compact" ranges if you find them (eg: [12345] becomes [1-5])
Add quantifiers for repeated elements (eg: [1234][1234] becomes [1234]{2})
???
I really don't think it's worth it to generate the regex.
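The trie-to-regex steps listed above can be sketched in Python. This is an illustration under assumptions, not a shortest-regex generator: the names build_trie and to_regex are made up here, the sketch only merges common prefixes (not suffixes), and it does not compact ranges or add quantifiers.

```python
import re

END = ""  # key marking that a word ends at this node


def build_trie(words):
    """Build a nested-dict trie; a node containing END is a possible terminal."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def to_regex(node):
    """Return a regex matching every suffix stored below node."""
    optional = END in node
    alts = [re.escape(ch) + to_regex(child)
            for ch, child in sorted(node.items()) if ch != END]
    if not alts:
        return ""
    if len(alts) == 1:
        # No branching here: the path is fixed.
        body = alts[0]
        single_atom = len(body) == 1
    elif all(len(a) == 1 for a in alts):
        # Only single unescaped characters: use a character class.
        body = "[" + "".join(alts) + "]"
        single_atom = True
    else:
        # A real branch: alternation inside a group.
        body = "(" + "|".join(alts) + ")"
        single_atom = True
    if optional:
        # A word may also end at this node, so the remainder is optional.
        body = (body if single_atom else "(" + body + ")") + "?"
    return body


words = ["1231", "1233", "1234", "1236", "1238", "1247", "1256", "1258", "1259"]
pattern = to_regex(build_trie(words))
print(pattern)
```

The optional handling covers the "possibly terminal" nodes mentioned above: for the words 123 and 1234 the sketch produces 1234?.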

This project generates a regexp from a given list of words: https://github.com/bwagner/wordhierarchy
It does almost the same as the JavaScript solution in this thread, but avoids certain superfluous parentheses.
It only uses "|", non-capturing group "(?:)" and option "?".
There's room for improvement when there's a row of single characters:
Instead of e.g. (?:3|8|1|6|4) it could generate [38164].
The generated regexp could easily be adapted to other regexp dialects.
Sample usage:
java -jar dist/wordhierarchy.jar 1231 1233 1234 1236 1238 1247 1256 1258 1259
-> 12(?:5(?:6|9|8)|47|3(?:3|8|1|6|4))

Here's what I came up with (JavaScript). It turned a list of 20,000 6-digit numbers into a 60,000-character regular expression. Compared to a naive (word1|word2|...) construction, that's almost 60% "compression" by character count.
I'm leaving the question open, as there's still a lot of room for improvement and I'm holding out hope that there might be a better tool out there.
var list = new listChar("");
function listChar(s, p) {
this.char = s;
this.depth = 0;
this.parent = p;
this.add = function(n) {
if (!this.subList) {
this.subList = {};
this.increaseDepth();
}
if (!this.subList[n]) {
this.subList[n] = new listChar(n, this);
}
return this.subList[n];
}
this.toString = function() {
var ret = "";
var subVals = [];
if (this.depth >=1) {
for (var i in this.subList) {
subVals[subVals.length] = this.subList[i].toString();
}
}
if (this.depth === 1 && subVals.length > 1) {
ret = "[" + subVals.join("") + "]";
} else if (this.depth === 1 && subVals.length === 1) {
ret = subVals[0];
} else if (this.depth > 1) {
ret = "(" + subVals.join("|") + ")";
}
return this.char + ret;
}
this.increaseDepth = function() {
this.depth++;
if (this.parent) {
this.parent.increaseDepth();
}
}
}
function wordList(input) {
var listStep = list;
while (input.length > 0) {
var c = input.charAt(0);
listStep = listStep.add(c);
input = input.substring(1);
}
}
words = [/* WORDS GO HERE*/];
for (var i = 0; i < words.length; i++) {
wordList(words[i]);
}
document.write(list.toString());
Using
words = ["1231","1233","1234","1236","1238","1247","1256","1258","1259"];
Here's the output:
(1(2(3[13468]|47|5[689])))
