Why is the syscall read argument null? - dtrace

The DTrace one-liner in question is
dtrace -n 'syscall::read:entry { @[fds[arg0].fi_fs] = count(); }'
I want to find out what the arguments to read are, so I listed the probes:
dtrace -lvn 'syscall::*read*:entry'
933 syscall read_nocancel entry
Probe Description Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA
Argument Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA
Argument Types
None
963 syscall readv_nocancel entry
Probe Description Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA
Argument Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA
Argument Types
None
969 syscall pread_nocancel entry
Probe Description Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA
Argument Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: ISA
Argument Types
None
But the argument types are listed as None. How can I find out what the arguments are?

You are confusing the meaning of an argument with the type of an argument.
The meaning of an argument depends on the provider. If you want to learn about syscall::: probes then you need to consult the documentation for the syscall provider, which says
Arguments
For entry probes, the arguments (arg0 .. argn) are the arguments to
the system call. For return probes, both arg0 and arg1 contain the
return value. A non-zero value in the D variable errno indicates
system call failure.
Therefore in the clause
syscall::read:entry
{
...
}
which corresponds to
ssize_t read(int fildes, void *buf, size_t nbyte);
arg0 would be the value of fildes, arg1 would be the value of buf and arg2 would be the value of nbyte.
The type of arg0, arg1, arg2 etc. is always an int64_t, regardless of the types of the arguments that they represent. This is enough for scalar quantities, but for a structure dtrace(1) needs to understand types. It's possible to cast arguments, e.g.
((struct mystruct *)(arg0))->my_member
but this is irritating. Sometimes, but not always, DTrace knows the types of the arguments themselves and allows them to be described using the notation args[0], args[1] etc.; thus under certain circumstances I could instead write the much more convenient
args[0]->my_member
For the syscall provider, DTrace doesn't know the arguments' types, which is why you see
# dtrace -lv -n syscall::read:entry
...
Argument Types
None
#
and why
dtrace -n 'syscall::read:entry {trace(args[0])}'
is not valid.
For the io provider, however, DTrace does know the arguments' types, e.g.
# dtrace -lv -n io:::start
...
Argument Types
args[0]: bufinfo_t *
args[1]: devinfo_t *
args[2]: fileinfo_t *
#
By reading the documentation for the io provider one can see that the definition of a bufinfo_t includes
typedef struct bufinfo {
...
size_t b_bcount; /* number of bytes */
...
} bufinfo_t;
and this allows one to write, e.g.
dtrace -n 'io:::start {trace(args[0]->b_bcount)}'.
Finally, you mention fds[]. As I explained before, the type of fds[n] is fileinfo_t *.
I recommend that you follow this introduction.

How about man 2 read? On Mac OS, I get this:
READ(2) BSD System Calls Manual READ(2)
NAME
pread, read, readv -- read input
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
ssize_t
pread(int d, void *buf, size_t nbyte, off_t offset);
ssize_t
read(int fildes, void *buf, size_t nbyte);
ssize_t
readv(int d, const struct iovec *iov, int iovcnt);
...
This will obviously only work for the syscall provider, however.

Related

What is a 16 byte signed integer data type?

I made this program to test what data types arbitrary integer literals get evaluated to. This program was inspired by reading some other questions on StackOverflow:
How do I define a constant equal to -2147483648?
Why do we define INT_MIN as -INT_MAX - 1?
(-2147483648> 0) returns true in C++?
In these questions, we have an issue: the programmer wants to write INT_MIN as -2^31, but 2^31 (2147483648) is itself the literal, and - is the unary negation operator applied to it. Since INT_MAX is 2^31 - 1 on a platform with a 32-bit int, the literal 2147483648 cannot be represented as an int, so it is given a larger data type. The second answer to the third question has a chart showing how the data type of an integer literal is determined: the compiler goes down the list from the top until it finds a data type that can represent the literal.
Suffix    Decimal constants
none      int
          long int
          long long int
In my little program, I define a macro that will return the "name" of a variable, literal, or expression, as a C-string. Basically, it returns the text that is passed inside of the macro, exactly as you see it in the code editor. I use this for printing the literal expression.
I want to determine the data type of the expression, what it evaluates to. I have to be a little clever about how I do this. How can we determine the data type of a variable or an expression in C? I've concluded that only two "bits" of information are necessary: the width of the data type in bytes, and the signedness of the data type.
I use the sizeof() operator to determine the width of the data type in bytes. I also use another macro to determine whether the data type is signed. typeof() is a GNU compiler extension that yields the data type of a variable or expression, but that type cannot be printed directly, so I cast -1 to it instead. If it's a signed data type, the result is still -1; if it's an unsigned data type, it becomes the maximum value of that type (e.g. UINT_MAX for unsigned int).
#include <stdio.h> /* C standard input/output - for printf() */
#include <stdlib.h> /* C standard library - for EXIT_SUCCESS */
/**
* Returns the name of the variable or expression passed in as a string.
*/
#define NAME(x) #x
/**
* Returns 1 if the passed in expression is a signed type.
* -1 is cast to the type of the expression.
* If it is signed, -1 < 0 == 1 (TRUE)
* If it is unsigned, UMax < 0 == 0 (FALSE)
*/
#define IS_SIGNED_TYPE(x) ((typeof(x))-1 < 0)
int main(void)
{
/* What data type is the literal -9223372036854775808? */
printf("The literal is %s\n", NAME(-9223372036854775808));
printf("The literal takes up %zu bytes\n", sizeof(-9223372036854775808));
if (IS_SIGNED_TYPE(-9223372036854775808))
printf("The literal is of a signed type.\n");
else
printf("The literal is of an unsigned type.\n");
return EXIT_SUCCESS;
}
As you can see, I'm testing -2^63 to see what data type it is. The problem is that in ISO C99, the "largest" data type for integer literals appears to be long long int, if we can believe the chart. As we all know, long long int has a numerical range of -2^63 to 2^63 - 1 on a modern 64-bit system. However, the - above is the unary negation operator, not part of the integer literal, so I'm really determining the data type of 2^63, which is too big for long long int. I'm attempting to cause a bug in C's type system. That is intentional, and only for educational purposes.
I am compiling and running the program. I use -std=gnu99 instead of -std=c99 because I am using typeof(), a GNU compiler extension, not actually part of the ISO C99 standard. I get the following output:
$ gcc -m64 -std=gnu99 -pedantic experiment.c
$
$ ./a.out
The literal is -9223372036854775808
The literal takes up 16 bytes
The literal is of a signed type.
I see that the integer literal equivalent to 2^63 evaluates to a 16 byte signed integer type! As far as I know, there is no such data type in the C programming language. I also don't know of any Intel x86_64 processor that has a 16 byte register to store such an rvalue. Please correct me if I'm wrong. Can someone explain what's going on here? Why is there no overflow? Also, is it possible to define a 16 byte data type in C? How would you do it?
Your platform likely has __int128 and 9223372036854775808 is acquiring that type.
A simple way to get a C compiler to print a typename is with something like:
int main(void)
{
#define LITERAL (-9223372036854775808)
_Generic(LITERAL, struct {char x;}/*can't ever match*/: "");
}
On my x86_64 Linux, the above generates the error
error: ‘_Generic’ selector of type ‘__int128’ is not compatible with any association
which implies __int128 is indeed the type of the literal.
(Given this, gcc's warning "integer constant is so large that it is unsigned" is wrong: the type it actually picks is the signed __int128. Well, gcc isn't perfect.)
After some digging this is what I've found. I converted the code to C++, assuming that C and C++ behave similarly in this case. I want to create a template function to be able to accept any data type. I use __PRETTY_FUNCTION__ which is a GNU compiler extension which returns a C-string containing the "prototype" of the function, I mean the return type, the name, and the formal parameters that are input. I am interested in the formal parameters. Using this technique, I am able to determine the data type of the expression that gets passed in exactly, without guessing!
#include <iostream>
/**
* This is a templated function.
* It accepts a value "object" of any data type, which is labeled as "T".
*
* The __PRETTY_FUNCTION__ is a GNU compiler extension which is actually
* a C-string that evaluates to the "pretty" name of a function,
* meaning it includes the function's return type and the types of its
* formal parameters.
*
* I'm using __PRETTY_FUNCTION__ to determine the data type of the passed
* in expression: T is deduced at compile time, and the string is printed at run time.
*/
template<typename T>
void foo(T value)
{
(void)value; // only the deduced type matters here
std::cout << __PRETTY_FUNCTION__ << std::endl;
}
int main()
{
foo(5);
foo(-9223372036854775808);
}
Compiling and running, I get this output:
$ g++ -m64 -std=c++11 experiment2.cpp
$
$ ./a.out
void foo(T) [with T = int]
void foo(T) [with T = __int128]
I see that the passed in expression is of type __int128. Apparently, this is a GNU compiler specific extension, not part of the C standard.
Why isn't there int128_t?
https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/_005f_005fint128.html
https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gcc/C-Extensions.html#C-Extensions
How is a 16 byte data type stored on a 64 bit machine
With all warnings enabled (-Wall), gcc issues the warning "integer constant is so large that it is unsigned". Even so, gcc gives this integer constant the type __int128, and sizeof(__int128) = 16.
You can check that with _Generic macro:
#include <stdio.h>
#define typestring(v) _Generic((v), \
long long: "long long", \
unsigned long long: "unsigned long long", \
__int128: "__int128" \
)
int main()
{
printf("Type is %s\n", typestring(-9223372036854775808));
return 0;
}
Type is __int128
Or with warnings from printf:
#include <stdio.h>
int main() {
printf("%s", -9223372036854775808);
return 0;
}
will compile with warning:
warning: format '%s' expects argument of type 'char *', but argument 2 has type '__int128' [-Wformat=]

C++ "Reflection" for bit field structs

I am looking for a way to do "reflection" in C++ over a bit-field struct like this:
struct Bits {
uint32_t opCode : 5;
uint32_t header : 3;
uint32_t source : 5;
uint32_t destination : 5;
uint32_t offset : 13;
};
My goal is to be able to get things like:
All member names as std::string list ex. "opCode", "header", etc...
The underlying typename as a std::string ex. "uint32_t"
The number of bits for each field as a list of ints ex. 5,3,5,5,13
with minimal hand-coding, macros, or boilerplate code. These values should, in principle, be determinable at compile time, though that is not a necessity.
After reading through this StackOverflow post, it is clear there are some "reflection" capabilities provided by libraries outside of standard C++. Looking into things like Precise and Flat Reflection, Ponder, Boost Reflect, and even Boost Hana, I have not found support for something like this.
I have already found a way to get the member names and typenames of a non-bitfield struct using Boost Hana [this achieves #1 and #2 for ordinary structs], and there is a way to get the member names of a bitfield struct using Boost Fusion [this achieves #1 for bitfield structs], though it is a custom code solution.
Alternatively, manually flattening/decaying the struct to a struct with a single uint32_t member, like in this post:
struct BitsFlattened{
uint32_t message;
};
with manually coded getters and setters for the appropriate members is also a possibility, where getters might expose the information by their name ex. getopCode05(), getheader69().

Why do Boost Format and printf behave differently on same format string

The Boost Format documentation says:
One of its goal is to provide a replacement for printf, that means
format can parse a format-string designed for printf, apply it to the
given arguments, and produce the same result as printf would have.
When I compare the output of boost:format and printf using the same format string I get different outputs. Online example is here
#include <cstdio>
#include <iostream>
#include <boost/format.hpp>
int main()
{
boost::format f("BoostFormat:%d:%X:%c:%d");
unsigned char cr =65; //'A'
int cr2i = int(cr);
f % cr % cr % cr % cr2i;
std::cout << f << std::endl;
printf("Printf:%d:%X:%c:%d\n",cr,cr,cr,cr2i);
}
The output is:
BoostFormat:A:A:A:65
Printf:65:41:A:65
The difference shows up when I want to display a char as an integral type.
Why is there a difference? Is this a bug or intended behavior?
This is expected behaviour.
In the Boost manual, this is what is written about the classical type-specification you use:
But the classical type-specification flag of printf has a weaker
meaning in format. It merely sets the appropriate flags on the
internal stream, and/or formatting parameters, but does not require
the corresponding argument to be of a specific type.
Note also that in the stdlib printf call all char arguments are automatically
converted to int because of the vararg call, so the generated code is equivalent to:
printf("Printf:%d:%X:%c:%d",cr2i,cr2i,cr2i,cr2i);
This automatic conversion is not done with the % operator.
Addition to the accepted answer:
This also happens to arguments of type wchar_t as well as unsigned short and other equivalent types, which may be unexpected, for example, when using members of structs in the Windows API (e.g., SYSTEMTIME), which are short integers of type WORD for historical reasons.
If you are using Boost Format as a replacement for printf and "printf-like" functions in legacy code, you may consider creating a wrapper, which overrides the % operator in such a way that it converts
char and short to int
unsigned char and unsigned short to unsigned int
to emulate the behavior of C variable argument lists. It will still not be 100% compatible, but most of the remaining incompatibilities are actually helpful for fixing potentially unsafe code.
Newer code should probably not use Boost Format, but the standard std::format (C++20), which is not compatible with printf.

Where to get device type constants description?

I'm getting the information about system network devices through netlink socket.
I'm parsing three message types, RTM_NEWLINK, RTM_DELLINK and RTM_GETLINK, whose payload is the ifinfomsg structure:
struct ifinfomsg {
unsigned char ifi_family; /* AF_UNSPEC */
unsigned short ifi_type; /* Device type */
int ifi_index; /* Interface index */
unsigned int ifi_flags; /* Device flags */
unsigned int ifi_change; /* change mask */
};
the definition is from here http://www.kernel.org/doc/man-pages/online/pages/man7/rtnetlink.7.html
But there is no description of the device type field ifi_type. Where can I find the constants that describe its possible values?
There is no description here either:
http://www.foxprofr.com/rfc/RFC3549-LINUX-NETLINK-AS-AN-IP-SERVICES-PROTOCOL/3549.aspx
Now I know that 1 is Ethernet and 772 is loopback, but I'd like to know all possible values.
Maybe the answer is very obvious, but Google doesn't want to tell me anything useful.
Take a look at /usr/include/net/if_arp.h, you will find the constants there as ARPHRD_*. If you want to make your life somewhat easier, check out libnl if you don't use it already.

What's the shortest code to write directly to a memory address in C/C++?

I'm writing system-level code for an embedded system without memory protection (on an ARM Cortex-M1, compiling with gcc 4.3) and need to read/write directly to a memory-mapped register. So far, my code looks like this:
#define UART0 0x4000C000
#define UART0CTL (UART0 + 0x30)
volatile unsigned int *p;
p = (volatile unsigned int *) UART0CTL;
*p &= ~1;
Is there any shorter way (shorter in code, I mean) that does not use a pointer? I'm looking for a way to write the actual assignment code as short as this (it would be okay if I had to use more #defines):
*(UART0CTL) &= ~1;
Anything I tried so far ended up with gcc complaining that it could not assign something to the lvalue...
#define UART0CTL ((volatile unsigned int *) (UART0 + 0x30))
:-P
Edited to add: Oh, in response to all the comments about how the question is tagged C++ as well as C, here's a C++ solution. :-P
inline unsigned volatile& uart0ctl() {
return *reinterpret_cast<unsigned volatile*>(UART0 + 0x30);
}
This can be stuck straight in a header file, just like the C-style macro, but you have to use function call syntax to invoke it.
I'd like to nitpick: are we talking C or C++?
If C, I defer to Chris' answer willingly (and I'd like the C++ tag to be removed).
If C++, I advise against the use of those nasty C-Casts and #define altogether.
The idiomatic C++ way is to use a global variable:
volatile unsigned int& UART0 = *((volatile unsigned int*)0x4000C000);
volatile unsigned int& UART0CTL = *(&UART0 + 0x0C);
I declare a typed global variable, which will obey scope rules (unlike macros).
It can be used easily (no need to use *()) and is thus even shorter!
UART0CTL &= ~1; // no need to dereference, it's already a reference
If you want it to be pointer, then it would be:
volatile unsigned int* const UART0 = reinterpret_cast<volatile unsigned int*>(0x4000C000); // Note the const to prevent rebinding
But what is the point of using a const pointer that cannot be null? This is semantically what references were created for.
You can go one further than Chris's answer if you want to make the hardware registers look like plain old variables:
#define UART0 0x4000C000
#define UART0CTL (*((volatile unsigned int *) (UART0 + 0x30)))
UART0CTL &= ~1;
It's a matter of taste which might be preferable. I've worked in situations where the team wanted the registers to look like variables, and I've worked on code where the added dereference was considered 'hiding too much' so the macro for a register would be left as a pointer that had to be dereferenced explicitly (as in Chris' answer).
#define UART0 ((volatile unsigned int*)0x4000C000)
#define UART0CTL (UART0 + 0x0C)
I like to specify the actual control bits in a struct, then assign that to the control address. Something like:
typedef struct {
unsigned other_bits : 31; /* bit-field order is implementation-defined: check your compiler's ABI */
unsigned disable : 1;
} uart_ctl_t;
volatile uart_ctl_t *uart_ctl = (volatile uart_ctl_t *) 0x4000C030;
uart_ctl->disable = 1;
(Apologies if the syntax isn't quite right, I haven't actually coded in C for quite a while...)
Another option, which I kind of like for embedded applications, is to use the linker to define sections for your hardware devices and map your variables to those sections. This has the advantage that if you are targeting multiple devices, even from the same vendor such as TI, you will typically have to alter the linker files on a device-by-device basis anyway, so the addresses end up in one place. For example, different devices in the same family have different amounts of internal direct-mapped memory, and from board to board you might have different amounts of RAM as well as hardware at different locations. Here's an example from the GCC documentation:
Normally, the compiler places the objects it generates in sections
like data and bss. Sometimes, however, you need additional sections,
or you need certain particular variables to appear in special
sections, for example to map to special hardware. The section
attribute specifies that a variable (or function) lives in a
particular section. For example, this small program uses several
specific section names:
struct duart a __attribute__ ((section ("DUART_A"))) = { 0 };
struct duart b __attribute__ ((section ("DUART_B"))) = { 0 };
char stack[10000] __attribute__ ((section ("STACK"))) = { 0 };
int init_data __attribute__ ((section ("INITDATA")));
main()
{
/* Initialize stack pointer */
init_sp (stack + sizeof (stack));
/* Initialize initialized data */
memcpy (&init_data, &data, &edata - &data);
/* Turn on the serial ports */
init_duart (&a);
init_duart (&b);
}
Use the section attribute with global variables and not local variables, as shown in the example.
You may use the section attribute with initialized or uninitialized
global variables but the linker requires each object be defined once,
with the exception that uninitialized variables tentatively go in the
common (or bss) section and can be multiply “defined”. Using the
section attribute will change what section the variable goes into and
may cause the linker to issue an error if an uninitialized variable
has multiple definitions. You can force a variable to be initialized
with the -fno-common flag or the nocommon attribute.