Linux’s ptrace API sucks!

I love Linux. As a developer, I find the tools available suit my style of work perfectly. Sometimes the tool that I want isn’t available. That’s OK though, because whenever I can, I try to contribute.

I do a lot of reverse engineering work, and the lack of anything like OllyDbg on Linux led me to start my EDB project. It’s a debugger designed to focus on applications at a machine code level. This project is coming along nicely, but there is one thing that I really wish I could change… ptrace sucks, and it sucks a lot.

First of all, it has no inherent support for threads. This is a huge problem, as many modern applications are multithreaded. Instead, since Linux treats threads as independent processes which happen to share the majority of their address space, you are supposed to ptrace each thread individually. So, you need to attach to all “pids” found in “/proc/<pid>/task/”. On top of this, you have to set an option in ptrace to attach to new threads as they are created with the clone system call. It is entirely undocumented whether you need to do this on a per-process basis or a per-thread basis. Finally, you need to track threads exiting so you know to stop looking for events on those. That’s an awful lot of effort just to be able to debug threads. By the way, the information necessary to know which bits of the status code tell you if it was a clone event (new thread) is also entirely undocumented.

Did I mention the potential race condition with attaching to all threads in the /proc/<pid>/task/ directory? The threads could be spawning more threads while you are enumerating the directory. Threads could even exit during this time. So you have to loop, repeatedly trying to attach, until you are sure the thread count has stabilized and every thread is attached. The only saving grace here is that attaching stops a thread, so it is possible to get them all if you think it through.
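The loop described above can be sketched roughly like this. It is only an illustration under some assumptions: list_tasks, contains, attach_all_threads and the MAX_TASKS limit are hypothetical names, error handling is minimal, and it assumes the first stop reported by waitpid after PTRACE_ATTACH is the attach stop, which a real debugger would have to verify.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX_TASKS = 1024 };  /* arbitrary limit chosen for this sketch */

/* Store the thread ids found in /proc/<pid>/task in out[0..cap).
 * Returns how many were stored, or -1 if the directory is unreadable. */
static int list_tasks(pid_t pid, pid_t *out, size_t cap) {
    assert(out != NULL);
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    DIR *dir = opendir(path);
    if (dir == NULL) return -1;
    size_t n = 0;
    struct dirent *ent;
    while (n < cap && (ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        /* Only purely numeric names are threads; this skips "." and "..". */
        if (end != ent->d_name && *end == '\0') out[n++] = (pid_t)tid;
    }
    closedir(dir);
    return (int)n;
}

static int contains(const pid_t *set, size_t n, pid_t tid) {
    for (size_t i = 0; i < n; i++) {
        if (set[i] == tid) return 1;
    }
    return 0;
}

/* Attach to every thread of `pid`, rescanning until a full pass over
 * /proc/<pid>/task attaches nothing new. Returns the number attached. */
static int attach_all_threads(pid_t pid, pid_t *attached, size_t cap) {
    size_t count = 0;
    int found_new;
    do {
        pid_t seen[MAX_TASKS];
        int n = list_tasks(pid, seen, MAX_TASKS);
        if (n < 0) return -1;
        found_new = 0;
        for (int i = 0; i < n; i++) {
            if (count == cap || contains(attached, count, seen[i])) continue;
            /* A failure here usually means the thread exited after the scan. */
            if (ptrace(PTRACE_ATTACH, seen[i], NULL, NULL) == -1) continue;
            int status;
            waitpid(seen[i], &status, __WALL);  /* assumed to be the attach stop */
            ptrace(PTRACE_SETOPTIONS, seen[i], NULL,
                   (void *)(long)PTRACE_O_TRACECLONE);
            attached[count++] = seen[i];
            found_new = 1;
        }
    } while (found_new);
    return (int)count;
}
```

The loop terminates because every attached thread is stopped and can no longer spawn anything, so each pass can only discover threads created by the ones not yet attached.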

Next, I wish that the PTRACE_PEEK* and PTRACE_POKE* request types had support for non-word-width granularity. It makes setting a breakpoint more annoying than it has to be, since you really only want to read/write a single byte in that case. Not only that, but reading/writing at the edges of region boundaries is equally annoying. A much better interface would have been similar to the file API, where you can specify an address and a length. In addition to this, you need to pay careful attention to the various gotchas due to the fact that the return value is both an error code and a result. So if a peek returns (long)-1, you need to clear errno beforehand and check it afterwards to tell a real error from a word that happens to be all ones.
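To make the annoyance concrete, here is a rough sketch of what writing a single byte (such as an int3 breakpoint, 0xCC on x86) ends up looking like. The names patch_byte, peek_word and poke_byte are hypothetical helpers, not part of any real API, and rounding the address down to a word boundary is a choice made in this sketch so that the access never straddles the end of a mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Return `word` with the byte at `offset` (in memory order) replaced. */
static long patch_byte(long word, size_t offset, uint8_t value) {
    assert(offset < sizeof word);
    memcpy((unsigned char *)&word + offset, &value, 1);
    return word;
}

/* Read one word from the tracee. Returns 0 on success, -1 on error.
 * errno must be cleared first because -1 is also a valid word value. */
static int peek_word(pid_t tid, uintptr_t addr, long *out) {
    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, tid, (void *)addr, NULL);
    if (word == -1 && errno != 0) return -1;
    *out = word;
    return 0;
}

/* Write one byte via read-modify-write of the word that contains it.
 * If `old` is non-NULL, the byte that was overwritten is stored there. */
static int poke_byte(pid_t tid, uintptr_t addr, uint8_t value, uint8_t *old) {
    uintptr_t base = addr & ~(uintptr_t)(sizeof(long) - 1);
    size_t offset = (size_t)(addr - base);
    long word;
    if (peek_word(tid, base, &word) == -1) return -1;
    if (old != NULL) memcpy(old, (unsigned char *)&word + offset, 1);
    long patched = patch_byte(word, offset, value);
    if (ptrace(PTRACE_POKETEXT, tid, (void *)base, (void *)patched) == -1) {
        return -1;
    }
    return 0;
}
```

Setting a breakpoint would then be something like poke_byte(tid, addr, 0xCC, &saved), and removing it is the same call with the saved byte. That is two ptrace requests and some bit fiddling for what should be a one-byte write.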

The usage of wait for debug events is just awkward. It works great for single-threaded command line debuggers like gdb, but for a GUI, where you want things to be interactive while the debugger is waiting for the next event, it is a disaster. Sure, you can use a separate thread to capture events and deliver them to the GUI, but then you have issues properly shutting down that thread, since it will pretty much always be blocked! Also, wait has no timeout, so if you aren’t careful it is possible to get hung forever waiting for an event that will never happen. There is SIGCHLD, which sounds promising at first, but the fun part is that without sigprocmask trickery, you can’t predict which of your threads will get the signal. Grr!

Finally, there is lots of information that would be better suited being part of the debugging API. A great example of this is x86 segments. This really should be in the user area. You can get the segment values from the user area or even a PTRACE_GETREGS request. But the segment values are nearly worthless without being able to look at things like the segment base and limits. I understand not all platforms have this data, and x86-64 has much less usage of segments, but that’s why it should be in the user area.

A better API would, first of all, be at a process level. I don’t care how it works under the hood, I want to attach to “processes.” There should also be a function to enumerate threads, which would only be valid while the process is stopped. This way you could get/set the context of each thread by passing a tid. Just these changes would make things much easier.

Overall, the user space API provided by ptrace could use a large overhaul. I understand the desire to be consistent with the debugging APIs of other Unixes, but this should not get in the way of making something usable.

utrace sounds OK, but as far as I know, it is designed as a set of kernel-level changes. In fact, it appears there are plans to have ptrace implemented on top of utrace in the future. That’s great and all, but the user space API needs an update! I can only hope that utrace brings along a new user space API as well.

17 thoughts on “Linux’s ptrace API sucks!”

  1. freenity

    Oh man thanks for this great debugger 🙂
    I love ollybdg and that’s exactly what linux needs a good debugger.
    Thanks again and keep with the project
    Good luck

  2. cyphunk

    thanks for the concise overview of complaints, and some junk im dealing with finding a workaround for now. i guess ill go dig in your code to find a solution ,)

  3. cyphunk

    aaaah, yes. I tried to erase from my memory all that you have thus documented here, for good reason. Tell me, other than utrace, what’s your opinion on something like SystemTap?

  4. Evan Teran Post author

    at first glance, systemtap looks pretty cool. Though I think it is trying to fill a different void than the ptrace API. I would imagine that systemtap is implemented using ptrace and provides a nicer abstraction (which is a good thing and ptrace is deeply annoying). Maybe I’ll take a look at the source and see what techniques they use.

    Thanks.

  5. Marlow

    So what do you recommend in finding the ptrace documentation necessary for this? I’m having one hell of a time.

    FWIW, systemtap is nice and all, but you have to build a new module for each stupid set of options you want to use. Furthermore, you need to patch the kernel in order to use the markers version.

    There is hope on the horizon for people who don’t need these abilities now: there is a new in-kernel package called cgroup, although it had definite problems the first time I tried to use it (and I need the 2.6.15 kernel).

  6. Marlow

    PS. It doesn’t use ptrace; it uses kprobes, which does not guarantee catching of exit calls. Kprobes allow you to catch any global system call, whereas ptrace seems geared mainly to signals.

    Also, if you are using a diskless system and trying to transfer stuff over the wire and need a binary form for speed, systemtap is useless.

  7. Rodrigo (BSDaemon)

    Hello,

    Nice article 😉

    Just a comment: System Tap is designed for kernel debugging (of course you can see what the program is doing, but it will give you an overview of the kernel-mode execution of it) whereas ptrace is for user-mode debugging…

    Ptrace completely sucks mainly because nobody really knows its inner workings anymore. I mean, the developers are just keeping it working, but the low-level interfaces with the hardware are mainly unknown and everybody is afraid of new patches (even for supporting new architectures) because of that.

    Regards,

  8. Pingback: Blog » Blog Archive » EDM Ollydbg for Linux

  9. atlas 0f d00m

    it sounds like the biggest complaint is how hard it is to figure all this crap out.
    ptrace sucks, no doubt about it. it’s “simple” which means non-comprehensive and often incomprehensible. also means it differs from platform to platform (ugh. nice standard, eh?)

    if you are interested, a friend of mine wrote a cross-platform debugging platform and a debugger on top of it, oh, and it’s python with ctypes. quite interesting. i find when dealing with the difficult problems that are not necessarily *new*, that correlation with other like-tools can be very handy, and reading GDB code is like cork-screwing your temples. check it out at http://visi.kenshoto.com/releases
    it’s a python toolset. what you’re looking for is in “vtrace,” in particular vtrace/posix.py and vtrace/linux.py. if you like to chat about it, email me or get on #vtrace on freenode.

    welldone on the debugger. it’s certainly not childsplay.

    @

  10. Pingback: FuzzMon | Pearltrees

  11. random_guy

    “Since the threads could be spawning more threads while you are enumerating the directories.”

    That’s not accurate – there are ways to deterministically trace every thread of a process, re. docs:

    If PTRACE_O_TRACE[V]FORK or PTRACE_O_TRACECLONE options are in effect,
    then children created by (vfork or clone(CLONE_VFORK)), (fork or
    clone(SIGCHLD)) and (other kinds of clone) respectively are
    automatically attached to the same tracer which traced their parent.
    SIGSTOP is delivered to them, causing them to enter
    signal-delivery-stop as they exit syscall which created them.

  12. Evan Teran Post author

    @random_guy: unfortunately that only applies to threads you are already attached to (that’s the only way to apply those flags…). Imagine this scenario: I get a directory listing from proc (before attaching to any threads) and THEN one of the threads spawns a new one. Finally, my program attempts to attach. In this case, I will have missed one or more threads. So the best you can do is enumerate all threads and attach to what you can (and yes, set the TRACE[V]FORK/TRACECLONE options). Then you need to re-enumerate the proc directory to make sure that no threads were spawned between the enumeration and the attaching.

  13. random_guy

    Yes, I can see how that particular case is problematic, although I would question the need, or even usefulness, to attach to such a thread. My point was that using an approach like strace, you can avoid the artificial problem that relying on /proc/PID/task presents.

  14. Evan Teran Post author

    @random_guy, you are right that spawning the process, attaching to the main thread and catching each thread creation event is the ideal solution as it avoids the problem entirely :-).

    But when a user chooses to attach to an already running program, they certainly expect all threads to be attached to. Ideally, I would like there to be some mechanism to attach to an entire process (meaning all threads associated with the process) in an atomic way. Perhaps attaching to -pid or something else that would be compatible with the existing API.

  15. M Smith

    ptrace is a low-level kernel API. This means that the code to implement it has to go into the kernel. Some of the conveniences he may want would probably be best done with a userspace library rather than in the kernel. Tracking the threads could be done by userspace libraries using ptrace as a backend. This way the convenience code is loaded by the application that needs it rather than having to be loaded into the kernel. If there is something that cannot be done in userspace, or userspace needs some data from the kernel, that would be good to add to the kernel. Generally, ptrace is fine; if userspace needs some piece of data or some additional control functionality, it can be added to the existing API. The kernel API generally tends to be raw data and fine-grained control, not necessarily easy to use. Ease of use can be provided by a userspace library.
