There are a great many tools to help developers debug complex applications. Many of them are based, in one way or another, on the ptrace() system call. Such is the case of the quintessential strace command-line tool, which lets you track all system calls made by a Linux process. Another example is gdb, which lets you set breakpoints and step through your code. Yet another is ltrace, which lets you track the library calls made by applications.

Both strace and gdb rely on some of ptrace()'s most fundamental features: attaching to other processes for debugging, reading and writing the data and text of other processes, and being notified when other processes enter or exit the kernel. In the case of strace, for instance, ptrace() is used to request a notification whenever the observed process enters or exits a system call. That functionality is built right into ptrace() and provided by the kernel. You can check out ptrace()'s man page for more info. Here are the relevant snippets:

#include <sys/ptrace.h>

long ptrace(enum __ptrace_request request, pid_t pid,
            void *addr, void *data);
...
PTRACE_SYSCALL, PTRACE_SINGLESTEP
       Restarts  the stopped child as for PTRACE_CONT, but arranges for
       the child to be stopped at the next entry to or exit from a sys-
       tem  call,  or  after execution of a single instruction, respec-
       tively.  (The child will also, as usual, be stopped upon receipt
       of  a  signal.)   From  the parent's perspective, the child will
...

So, in short, strace's job is pretty simple: call ptrace() to request to be notified whenever a system call is made. In the same way, gdb attaches to a process and uses ptrace() to insert breakpoints in the code it's debugging. But what about ltrace? Unlike strace, which tracks entries to and exits from the kernel, ltrace tracks entries to and exits from library calls. Yet ptrace() is a kernel mechanism; it has no way of knowing about library boundaries. A visit to ptrace()'s man page or the kernel sources should convince you of that. So how is ltrace actually able to track all the library calls made by a command or a given PID?

Surprisingly, or maybe not, there's very little documentation on how ltrace operates. The only document I've been able to find on the topic is a paper presented at the 2007 Ottawa Linux Symposium entitled Ltrace internals by Rodrigo Rubira Branco from IBM. But despite going through the paper a few times, I still felt I was missing some key pieces, especially when I tried following the latest version of the sources in ltrace's git repository. What follows is an explanation of how ltrace 0.7.90-git works.

Note that this is a best effort. If you see anything that doesn't make sense, or if you know better, by all means please let me know.

ltrace's entry point

ltrace starts in a pretty classic way with main.c:main(). It does two basic things:

  • Call ltrace_init(), which sets up the program to be ltrace'ed
  • Call ltrace_main(), which loops around dealing with events to be printed out to the user

ltrace has two main modes of operation:

  • Attach to a running process (-p option)
  • Start a command and then trace it

ltrace_init() starts by calling options.c:process_options() to parse the command-line options passed to it. There are two variables of importance here: libltrace.c:"char* command" and options.c:"struct opt_p_t *opt_p". The former is set if a command is given to ltrace for it to start and the latter if it's a PID to attach to.

If a command is set

Here's what ltrace_init() does if a command is set:

	if (command) {
		/* Check that the binary ABI is supported before
		 * calling execute_program.  */
		struct ltelf lte = {};
		open_elf(&lte, command);
		do_close_elf(&lte);

		pid_t pid = execute_program(command, argv);
		struct process *proc = open_program(command, pid);
		if (proc == NULL) {
			fprintf(stderr, "couldn't open program '%s': %s\n",
				command, strerror(errno));
			exit(EXIT_FAILURE);
		}

		trace_set_options(proc);
		continue_process(pid);
	}

There are essentially four calls here:

  1. execute_program()
  2. open_program()
  3. trace_set_options()
  4. continue_process()

execute_program.c:execute_program() calls fork() to start a child process, marks the forked process as being ptrace'ed (it calls ptrace(PTRACE_TRACEME, ...) in sysdeps/linux-gnu/trace.c:trace_me()) and then does an execvp() on the command to be started. Because of PTRACE_TRACEME, the child stops once the execvp() succeeds and waits for the parent to let it continue. The parent side of execute_program(), for its part, calls sysdeps/linux-gnu/trace.c:wait_for_proc(), which essentially does a waitpid(), to wait for the child to stop on the execvp().

proc.c:open_program() results in a cascade of calls that ends with a breakpoint being set at the entry point of the program being started. It calls process_init(), which makes two important calls: a) process_bare_init() to set up a struct process for this process, and b) process_init_main(). The latter calls breakpoints.c:breakpoints_init(), which itself calls breakpoints.c:entry_breakpoint_init(), which installs the breakpoints.c:entry_breakpoint_on_hit() function as the handler for a breakpoint at the program's entry point, i.e. the address at which the main binary starts executing. This breakpoint on program entry will allow us to stop the program as it's starting and do a few clever things.

The call to sysdeps/linux-gnu/trace.c:trace_set_options() results in a call to ptrace() using parameters to indicate to the kernel that we want to catch all processes that are forked by the program that's starting.

Finally, sysdeps/linux-gnu/trace.c:continue_process() essentially does a ptrace(PTRACE_SYSCALL, ...). As we saw from the man page snippet above, this call restarts the process and arranges for it to be suspended at every system call entry and exit, so that the parent ltrace can inspect the system call being made. The same will happen if the child receives a signal. The ptrace() man page has the details on the use of PTRACE_SYSCALL.
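To make the launch-and-continue sequence more concrete, here's a minimal sketch in C. It is not ltrace's code: count_syscall_stops() is a made-up helper, and the use of PTRACE_O_TRACESYSGOOD to tell system call stops apart from other SIGTRAP stops is an assumption on my part rather than something taken from the sources discussed here. It forks, marks the child as traced, execs the command, waits for the stop caused by the exec, and then keeps resuming the child with PTRACE_SYSCALL until it exits.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of ltrace or strace): runs argv under
 * ptrace and returns how many system call stops (entries plus exits)
 * were observed, or -1 on error. */
long count_syscall_stops(char *const argv[])
{
	int status;
	int sig = 0;
	long stops = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: ask to be traced, then become the target program. */
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if execvp() failed */
	}

	/* Parent: the first stop is the SIGTRAP raised by the successful exec. */
	if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
		return -1;

	/* Assumption: make syscall stops report SIGTRAP | 0x80. */
	ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

	for (;;) {
		/* Resume until the next syscall entry/exit or signal. */
		if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
			return -1;
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		if (WIFEXITED(status) || WIFSIGNALED(status))
			return stops;
		if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
			stops++;		/* a syscall entry or exit */
			sig = 0;
		} else {
			sig = WSTOPSIG(status);	/* pass other signals through */
		}
	}
}
```

Calling count_syscall_stops() with an argv such as { "true", NULL } should return a positive number; the exact count depends on the C library and the dynamic linker.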

When the program actually starts, the breakpoint set earlier will cause breakpoints.c:entry_breakpoint_on_hit() to be called. This will delete the very same breakpoint that caused it to be called and then call proc.c:process_hit_start(). This last function is how ltrace attaches to the library calls. We'll cover what it does below.

If a PID is set

Here's what ltrace_init() does if a PID, or a list of PIDs, is set to be traced:

	opt_p_tmp = opt_p;
	while (opt_p_tmp) {
		open_pid(opt_p_tmp->pid);
		opt_p_tmp = opt_p_tmp->next;
	}

As you can see, instead of calling execute_program.c:execute_program(), ltrace_init() calls proc.c:open_pid(). The latter calls proc.c:open_one_pid() on each task that belongs to the PID being attached to. This results in a call to the open_program() function we saw earlier to set up the struct process for the process being attached to. Since the program has already started, using an entry-point breakpoint to trigger proc.c:process_hit_start() won't work. Hence, proc.c:open_pid() calls process_hit_start() directly.
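The "each task that belongs to the PID" part deserves a quick illustration. On Linux, the tasks (threads) of a process show up as entries under /proc/PID/task, so a tracer can enumerate them from there before attaching to each one. The sketch below assumes that approach; list_tasks() and its "0 means the calling process" convention are made up for this example and are not ltrace functions.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stores up to max task IDs of process pid in tids
 * and returns how many were stored, or -1 if /proc/PID/task can't be
 * opened. A pid of 0 means "the calling process" (a convention made up
 * for this sketch). */
int list_tasks(pid_t pid, pid_t *tids, int max)
{
	char path[64];
	struct dirent *ent;
	DIR *dir;
	int n = 0;

	if (pid == 0)
		pid = getpid();
	snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);

	dir = opendir(path);
	if (dir == NULL)
		return -1;

	while (n < max && (ent = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(ent->d_name, &end, 10);

		/* Keep purely numeric names only; this skips "." and "..". */
		if (end == ent->d_name || *end != '\0')
			continue;
		/* A tracer would typically attach to this task here. */
		tids[n++] = (pid_t)tid;
	}
	closedir(dir);
	return n;
}
```

For a single-threaded process the only entry is the PID itself, which is why attaching to "a PID" and attaching to "all of its tasks" look the same in the simple case.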

Linkmap hacking

proc.c:process_hit_start()'s most important call is to sysdeps/linux-gnu/proc.c:linkmap_init(). This results in a call to sysdeps/linux-gnu/proc.c:crawl_linkmap(), which instruments all already-loaded libraries, and in a call to insert_breakpoint_at(), which inserts a breakpoint so that ltrace is informed of any library loaded after this point and can instrument it too.
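If you've never looked at a link map, here's a small sketch that walks one. Keep in mind that this is an in-process analogue and not what ltrace does: ltrace has to read these structures out of the traced process's memory, whereas this code simply walks its own. It assumes a dynamically linked program on a glibc-style system where the dynamic section has a DT_DEBUG entry pointing at a struct r_debug. count_link_map_entries() is a name made up for this example.

```c
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The dynamic section of the running program, provided by the linker
 * for dynamically linked executables. */
extern ElfW(Dyn) _DYNAMIC[];

/* Hypothetical helper: prints and counts the objects on this process's
 * own link map. Returns -1 if no DT_DEBUG entry is found. */
int count_link_map_entries(void)
{
	const struct r_debug *dbg = NULL;
	const struct link_map *lm;
	const ElfW(Dyn) *d;
	int n = 0;

	/* The dynamic linker stores the address of its r_debug structure
	 * in the DT_DEBUG entry at startup. */
	for (d = _DYNAMIC; d->d_tag != DT_NULL; d++)
		if (d->d_tag == DT_DEBUG)
			dbg = (const struct r_debug *)(uintptr_t)d->d_un.d_ptr;
	if (dbg == NULL)
		return -1;

	/* r_map heads a linked list with one node per loaded object. */
	for (lm = dbg->r_map; lm != NULL; lm = lm->l_next) {
		const char *name = (lm->l_name != NULL && lm->l_name[0] != '\0')
			? lm->l_name : "(unnamed, e.g. the main program)";

		printf("%#lx %s\n", (unsigned long)lm->l_addr, name);
		n++;
	}
	return n;
}
```

The same struct r_debug also has an r_brk field, which <link.h> documents as the address of a function the dynamic linker calls whenever it maps or unmaps an object. That's the conventional spot for a debugger to place a breakpoint if it wants to hear about libraries loaded later on.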

Obviously the most interesting function here is sysdeps/linux-gnu/proc.c:crawl_linkmap(). That's where the link map is walked through and each library is read and instrumented. ltrace-elf.c:ltelf_read_library() (the core of it is in ltrace-elf.c:read_module()) is used to read the ELF library and then proc.c:proc_add_library() has the core piece of code that allows ltrace to do what it does:

	/* Insert breakpoints for all active (non-latent) symbols.  */
	struct library_symbol *libsym = NULL;
	while ((libsym = library_each_symbol(lib, libsym,
					     cb_breakpoint_for_symbol,
					     proc)) != NULL)
		fprintf(stderr,
			"Couldn't insert breakpoint for %s to %d: %s.\n",
			libsym->name, proc->pid, strerror(errno));

cb_breakpoint_for_symbol is actually a function pointer. cb_breakpoint_for_symbol() calls proc.c:breakpoint_for_symbol(). The latter adds the breakpoint and enables it (breakpoints.c:breakpoint_turn_on()). The details are in sysdeps/linux-gnu/breakpoint.c and involve two ptrace() calls. The first does a PTRACE_PEEKTEXT to copy the original content of the location where the breakpoint is to be inserted. The second involves a PTRACE_POKETEXT to insert the actual breakpoint. The breakpoint itself is architecture dependent and you can see the definitions of BREAKPOINT_VALUE in the various arch.h files found under the sysdeps/linux-gnu/[$YOUR_FAVOURITE_ARCH] directory of your choice. Here are two examples:

  • For ARM: #define BREAKPOINT_VALUE { 0xf0, 0x01, 0xf0, 0xe7 }
  • For x86: #define BREAKPOINT_VALUE {0xcc}
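PTRACE_PEEKTEXT and PTRACE_POKETEXT transfer one machine word at a time, so inserting a one-byte breakpoint means reading a word, replacing only its leading byte(s) and writing the whole word back. Here's a sketch of that splice as a pure function, using the x86 value quoted above. splice_breakpoint() is a made-up helper, not ltrace's implementation, and it assumes the breakpoint sequence fits within a single long.

```c
#include <stddef.h>
#include <string.h>

/* x86 breakpoint value as quoted above; assumed to fit in one long. */
static const unsigned char breakpoint_value[] = { 0xcc };

/* Hypothetical helper: takes the word read from the breakpoint address
 * (what PTRACE_PEEKTEXT would return) and returns the word to write
 * back (what PTRACE_POKETEXT would be given). The bytes about to be
 * overwritten are copied into saved so they can be restored later. */
long splice_breakpoint(long orig_word, unsigned char *saved)
{
	unsigned char bytes[sizeof(long)];

	/* View the word as the bytes laid out in the tracee's memory. */
	memcpy(bytes, &orig_word, sizeof(bytes));

	/* Remember the original instruction bytes. */
	memcpy(saved, bytes, sizeof(breakpoint_value));

	/* Overwrite them with the breakpoint; the rest is left untouched. */
	memcpy(bytes, breakpoint_value, sizeof(breakpoint_value));

	memcpy(&orig_word, bytes, sizeof(bytes));
	return orig_word;
}
```

In a real tracer the input word would come from ptrace(PTRACE_PEEKTEXT, pid, addr, NULL) and the result would go to ptrace(PTRACE_POKETEXT, pid, addr, word). Removing the breakpoint is the same operation in reverse, with the saved bytes copied back in.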

Handling of the breakpoint

The hex values in BREAKPOINT_VALUE are actually hard-coded CPU instructions which cause the program that contains them to be stopped by the CPU when they're encountered. The CPU then traps into the kernel, which turns around and sends a SIGTRAP signal to the process that executed the instruction. In the case of an ltrace'ed process, this will be caught by the parent ltrace inside the sysdeps/linux-gnu/events.c:next_event() function, which is called inside ltrace_main()'s while(1) loop:

void
ltrace_main(void) {
	Event * ev;
	while (1) {
		ev = next_event();
		dispatch_callbacks(ev);
		handle_event(ev);
	}
}

next_event() does a waitpid(-1, &status, 0) to wait for the child and get its status. It'll also try to determine the type of event that caused the waitpid() to return, in this case a breakpoint. Once next_event() returns with an event, the event will be dispatched to handle_event.c:handle_event() which will call handle_breakpoint() as this was a breakpoint. The latter then uses output functions in output.c to print info about what library function was called and manages the breakpoint to let the program continue.
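The "determine the type of event" step boils down to decoding the status word filled in by waitpid(). Here's a simplified sketch of that decoding. The enum and classify_status() are made up for illustration and are much cruder than ltrace's own event types. In particular, a SIGTRAP stop is only a candidate breakpoint hit; a real tracer would still compare the stopped process's instruction pointer against its list of known breakpoints.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical event kinds for this sketch (not ltrace's Event types). */
enum stop_kind {
	STOP_EXITED,	/* tracee called exit */
	STOP_KILLED,	/* tracee was terminated by a signal */
	STOP_TRAP,	/* stopped by SIGTRAP: possibly a breakpoint */
	STOP_SIGNAL,	/* stopped by some other signal */
	STOP_UNKNOWN
};

/* Classifies a status value as returned by waitpid(). */
enum stop_kind classify_status(int status)
{
	if (WIFEXITED(status))
		return STOP_EXITED;
	if (WIFSIGNALED(status))
		return STOP_KILLED;
	if (WIFSTOPPED(status))
		return WSTOPSIG(status) == SIGTRAP ? STOP_TRAP : STOP_SIGNAL;
	return STOP_UNKNOWN;
}
```

A loop in the spirit of ltrace_main() would call waitpid(-1, &status, 0), pass the status through something like classify_status(), and branch on the result.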

In Sum

There's of course much more to be said about ltrace than can fit in this post. The most important thing to remember about ltrace is that unlike strace which can request the kernel to notify it whenever its barrier of interest (i.e. system calls) is crossed, there's no way for ltrace to ask for the same. Instead, ltrace has to insert a breakpoint at each symbol of interest (i.e. all symbols in all loaded libraries), catch each of these breakpoints and print information about the call that's just been made.

How are breakpoints handled in the kernel?

If you're interested in understanding how the kernel handles breakpoints, you'll need to check the kernel sources. For ARM, for instance, that'll be the arch/arm/kernel/ directory, and for x86 that'll be the arch/x86/kernel/ directory.

For ARM, the call sequence starts with entry-armv.S:__und_fault, which calls traps.c:do_undefinstr(), which in turn calls traps.c:call_undef_hook(), which calls the hook that was registered for handling the breakpoint. The hook is registered at startup by ptrace.c:ptrace_break_init(), which passes in the static struct undef_hook arm_break_hook, whose .fn function pointer is set to break_trap(). Hence, when a breakpoint is encountered, that's the function that gets called. break_trap() then calls ptrace_break(), which itself calls force_sig_info(SIGTRAP...). In other words, a SIGTRAP signal is sent to the process when one of its breakpoints is encountered. If the process is being ptraced, which is the case when it's being run by ltrace, the tracing parent gets notified of that signal.

In the case of the x86, the path is somewhat simpler. The CPU trap is handled by entry_32.S:int3 which calls on traps.c:do_int3(), which then calls do_trap(..., SIGTRAP, ...) which results in a call to force_sig_info(signr, ...).