Linux’s ptrace API sucks!

I love Linux. As a developer, I find that the available tools suit my style of work perfectly. Sometimes the tool that I want isn’t available. That’s OK though, because whenever I can, I try to contribute.

I do a lot of reverse engineering work, and the lack of anything like OllyDbg on Linux is what spawned my EDB project. It’s a debugger designed to focus on applications at the machine code level. This project is coming along nicely, but there is one thing that I really wish I could change… ptrace sucks, and it sucks a lot.

First of all, it has no inherent support for threads. This is a huge problem, as many modern applications are multithreaded. Instead, since Linux treats threads as independent processes which happen to share the majority of their address space, you are supposed to ptrace each thread individually. So, you need to attach to all “pids” found in “/proc/<pid>/task/”. On top of this, you have to set a ptrace option (PTRACE_O_TRACECLONE) to attach to new threads as they are created with the clone system call. It is entirely undocumented whether you need to do this on a per-process basis or a per-thread basis. Finally, you need to track threads exiting so you know to stop looking for events on those. That’s an awful lot of effort just to be able to debug threads. By the way, the information necessary to know which bits of the status code tell you if it was a clone event (new thread) is also entirely undocumented.
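
To make the thread bookkeeping above a little more concrete, here is a minimal sketch in C of the two pieces involved: asking for clone notifications and recognizing one in the status that waitpid hands back. The helper names (trace_clones, is_clone_event, new_thread_id) are made up for illustration and are not taken from EDB. The sketch simply applies the option to whichever thread ID it is given, so calling it once per attached thread is the conservative answer to the per-process versus per-thread question.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: ask the kernel to report clone() calls made by an
 * already attached and stopped thread. Returns -1 on error (see errno). */
long trace_clones(pid_t tid) {
    return ptrace(PTRACE_SETOPTIONS, tid, NULL,
                  (void *)(long)PTRACE_O_TRACECLONE);
}

/* Hypothetical helper: returns 1 if a waitpid() status describes a clone
 * event. The event code sits in the bits above the stop signal. */
int is_clone_event(int status) {
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_CLONE << 8));
}

/* Hypothetical helper: after a clone event on tid, fetch the ID of the
 * newly created thread into *new_tid. Returns -1 on error. */
long new_thread_id(pid_t tid, unsigned long *new_tid) {
    return ptrace(PTRACE_GETEVENTMSG, tid, NULL, new_tid);
}
```

A debugger loop would call is_clone_event on every status it receives and, when it matches, use new_thread_id to start tracking the new thread.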

Did I mention the potential race condition when attaching to all threads in the /proc/<pid>/task/ directory? The threads could be spawning more threads while you are enumerating the directory. Threads could even exit during this time. So you have to loop, repeatedly trying to attach, until you are sure the thread count has stabilized and every thread is attached. The only saving grace here is that attaching stops a thread, so it is possible to get them all if you think it through.
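
The loop described above might look roughly like the following. This is only a sketch under a few assumptions: the function names and the fixed-size array of seen thread IDs are mine, error handling is minimal, and a thread that fails to attach is assumed to have exited in the meantime. It keeps rescanning the task directory until a full pass attaches nothing new.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: returns 1 if tid is already in seen[0..count). */
int already_attached(const pid_t *seen, size_t count, pid_t tid) {
    for (size_t i = 0; i < count; ++i) {
        if (seen[i] == tid) return 1;
    }
    return 0;
}

/* Hypothetical helper: attach to every thread of pid, recording thread IDs
 * in seen. Returns the number of threads attached, or -1 if the task
 * directory cannot be read or seen is too small. */
long attach_all_threads(pid_t pid, pid_t *seen, size_t capacity) {
    char path[64];
    size_t count = 0;
    int found_new;

    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    do {
        DIR *dir = opendir(path);
        struct dirent *entry;

        if (dir == NULL) return -1;
        found_new = 0;
        while ((entry = readdir(dir)) != NULL) {
            char *end;
            long tid = strtol(entry->d_name, &end, 10);
            int status;

            if (*end != '\0' || tid <= 0) continue;      /* "." and ".." */
            if (already_attached(seen, count, (pid_t)tid)) continue;
            if (count == capacity) { closedir(dir); return -1; }
            /* Attach can fail if the thread exited since readdir saw it. */
            if (ptrace(PTRACE_ATTACH, (pid_t)tid, NULL, NULL) == -1) continue;
            /* Wait for the attach stop; __WALL also covers clone threads. */
            if (waitpid((pid_t)tid, &status, __WALL) == -1) continue;
            if (!WIFSTOPPED(status)) continue;
            seen[count++] = (pid_t)tid;
            found_new = 1;
        }
        closedir(dir);
    } while (found_new);  /* rescan until a pass finds nothing new */

    return (long)count;
}
```

Because each attached thread is stopped before the next pass, no attached thread can spawn more threads, which is why the loop eventually settles.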

Next, I wish that the PTRACE_PEEK* and PTRACE_POKE* request types had support for non-word-width granularity. It makes setting a breakpoint more annoying than it has to be, since you really only want to read/write a single byte in that case. Not only that, but reading/writing at the edges of region boundaries is equally annoying. A much better interface would have been similar to the file API, where you can specify an address and a length. In addition to this, you need to pay careful attention to the various gotchas due to the fact that the return value is both an error code and a result. A return of (long)-1 can be legitimate data, so you need to clear errno before the call and check it afterwards just to make sure that it isn’t an error.
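
As an illustration of how much ceremony a one-byte write takes, here is a rough sketch in C. The names patch_byte and poke_byte are hypothetical, and rounding the address down to a word boundary is a choice made in this sketch so that the word being read never runs past the end of a mapped region. It reads the containing word, swaps one byte, and writes the word back, using errno to tell a real failure from a word that happens to be -1.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: replace the byte at the given memory offset inside
 * a word, independent of the host byte order. */
long patch_byte(long word, size_t offset, uint8_t value) {
    unsigned char bytes[sizeof word];

    memcpy(bytes, &word, sizeof word);
    bytes[offset] = value;
    memcpy(&word, bytes, sizeof word);
    return word;
}

/* Hypothetical helper: write one byte at addr in a stopped tracee.
 * If old is not NULL it receives the byte that was there before, which a
 * debugger needs in order to remove a breakpoint later.
 * Returns 0 on success, -1 on error (see errno). */
int poke_byte(pid_t tid, uintptr_t addr, uint8_t value, uint8_t *old) {
    uintptr_t aligned = addr & ~(uintptr_t)(sizeof(long) - 1);
    size_t offset = (size_t)(addr - aligned);
    long word;

    errno = 0;  /* -1 is valid data, so errno is the only error signal */
    word = ptrace(PTRACE_PEEKTEXT, tid, (void *)aligned, NULL);
    if (word == -1 && errno != 0) return -1;

    if (old != NULL) {
        unsigned char bytes[sizeof word];
        memcpy(bytes, &word, sizeof word);
        *old = bytes[offset];
    }

    word = patch_byte(word, offset, value);
    if (ptrace(PTRACE_POKETEXT, tid, (void *)aligned, (void *)word) == -1) {
        return -1;
    }
    return 0;
}
```

Setting an x86 breakpoint would then be something like poke_byte(tid, addr, 0xCC, &saved), with saved kept around to restore the original instruction byte.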

The usage of wait for debug events is just awkward. It works great for single-threaded command line debuggers like gdb, but for a GUI, where you want things to be interactive while the debugger is waiting for the next event, it is a disaster. Sure, you can use a separate thread to capture events and deliver them to the GUI, but then you have issues properly shutting down that thread, since it will pretty much always be blocked! Also, wait has no timeout, so if you aren’t careful it is possible to get hung forever waiting for an event that will never happen. There is SIGCHLD, which sounds promising at first, but the fun part is that without sigprocmask trickery, you can’t predict which of your threads will get the signal. Grr!
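
For the curious, the sigprocmask trickery can be sketched along these lines. This is one possible approach rather than the only one, and the function names are invented for the example. The idea is to block SIGCHLD before any other threads are created (so they all inherit the mask), then have one thread collect the signal with sigtimedwait, which does accept a timeout, and follow up with a non-blocking waitpid.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Hypothetical helper: block SIGCHLD in the calling thread. Call it early,
 * before spawning threads, so that every thread inherits the mask. */
int block_sigchld(void) {
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Hypothetical helper: wait up to timeout_ms for an event from any tracee.
 * Returns the thread ID and fills *status when an event is available,
 * 0 on timeout, -1 on error (see errno). */
pid_t wait_event_timeout(int *status, long timeout_ms) {
    struct timespec ts;
    sigset_t set;
    pid_t tid;

    /* An event may already be pending, so check before sleeping. */
    tid = waitpid(-1, status, __WALL | WNOHANG);
    if (tid != 0) return tid;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    if (sigtimedwait(&set, NULL, &ts) == -1) {
        return errno == EAGAIN ? 0 : -1;  /* EAGAIN means timed out */
    }

    /* SIGCHLD does not queue, so one signal can stand for several events;
     * callers should keep calling until this returns 0. */
    return waitpid(-1, status, __WALL | WNOHANG);
}
```

With something like this, the event thread wakes up at regular intervals and can notice a shutdown request instead of sitting in wait forever.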

Finally, there is lots of information that would be better suited being part of the debugging API. A great example of this is x86 segments. This really should be in the user area. You can get the segment values from the user area or even a PTRACE_GETREGS request. But the segment values are nearly worthless without being able to look at things like the segment base and limits. I understand not all platforms have this data, and x86-64 has much less usage of segments, but that’s why it should be in the user area.

A better API would, first of all, be at a process level. I don’t care how it works under the hood, I want to attach to “processes.” There should also be a function to enumerate threads, which would only be valid while the process is stopped. This way you could get/set the context of each thread by passing a tid. Just these changes would make things much easier.

Overall, the user space API provided by ptrace could use a large overhaul. I understand the desire to be consistent with the debugging APIs of other Unixes, but this should not get in the way of making something usable.

utrace sounds OK, but as far as I know, it is designed as a kernel-level change. In fact, it appears there are plans to have ptrace implemented on top of utrace in the future. That’s great and all, but the user space API needs an update! I can only hope that utrace brings along a new user space API as well.


9 Responses to Linux’s ptrace API sucks!

  1. freenity says:

    Oh man thanks for this great debugger :)
    I love OllyDbg and that’s exactly what Linux needs: a good debugger.
    Thanks again and keep with the project
    Good luck

  2. Pingback: Dr. Riekeyword - Child Abuse Avoidance Suggestion Tool

  3. Pingback: Mr_Keyword_Suggestion

  4. cyphunk says:

    thanks for the concise overview of complaints, and some junk im dealing with finding a workaround for now. i guess ill go dig in your code to find a solution ,)

  5. cyphunk says:

    aaaah, yes. I tried to erase from my memory all that you have thus documented here, for good reason. Tell me, other than utrace, what’s your opinion on something like SystemTap?

  6. Evan Teran says:

    at first glance, systemtap looks pretty cool. Though I think it is trying to fill a different void than the ptrace API. I would imagine that systemtap is implemented using ptrace and provides a nicer abstraction (which is a good thing and ptrace is deeply annoying). Maybe I’ll take a look at the source and see what techniques they use.

    Thanks.

  7. Marlow says:

    So what do you recommend in finding the ptrace documentation necessary for this? I’m having one hell of a time.

    FWIW, systemtap is nice and all, but you have to build a new module for each stupid set of options you want to use. Furthermore, you need to patch the kernel in order to use the markers version.

    There is hope on the horizon for people who don’t need these abilities now-there is a new in-kernel package called cgroup, although it had definite problems the first time I tried to use it (and I need the 2.6.15 kernel).

  8. Marlow says:

    PS. It doesn’t use ptrace-it uses kprobes, which does not guarantee catching of exit calls. Kprobes allow you to catch any global system call, whereas ptrace seems geared mainly to signals.

    Also, if you are using a diskless system and trying to transfer stuff over the wire and need a binary form for speed, systemtap is useless.

  9. Hello,

    Nice article ;)

    Just a comment: System Tap is designed for kernel debugging (of course you can see what the program is doing, but it will give you an overview of the kernel-mode execution of it) whereas ptrace is for user-mode debugging…

    Ptrace completely sucks mainly because nobody really knows its inner workings anymore. I mean, the developers are just keeping it working, but the low-level interfaces with the hardware are mainly unknown and everybody is afraid of new patches (even for supporting new architectures) because of that.

    Regards,
