Thursday, June 14, 2012

Exploring Python Using GDB


People tend to have a narrow view of the problems they can solve using GDB. Many think that GDB is just for debugging segfaults or that it's only useful with C or C++ programs. In reality, GDB is an impressively general and powerful tool. When you know how to use it, you can debug just about anything, including Python, Ruby, and other dynamic languages. It's not just for inspection either—GDB can also be used to modify a program's behavior while it's running.
When we ran our Capture The Flag contest, a lot of people asked us about introductions to that kind of low-level work. GDB can be a great way to get started. In order to demonstrate some of GDB's flexibility, and show some of the steps involved in practical GDB work, we've put together a brief example of debugging Python with GDB.
Imagine you're building a web app in Django. The standard cycle for building one of these apps is to edit some code, hit an error, fix it, restart the server, and refresh in the browser. It's a little tedious. Wouldn't it be cool if you could hit the error, fix the code while the request is still pending, and then have the request complete successfully?
As it happens, the Seaside framework supports exactly this. Using one of Stripe's example projects, let's take a look at how we could pull it off in Python using GDB.
Pretty cool, right? Though a little contrived, this example demonstrates many helpful techniques for making effective real-world use of GDB. I'll walk through what we did in a little more detail, and explain some of the GDB tricks as we go.
For the sake of brevity, I'll show the commands I type, but elide some of the output they generate. I'm working on Ubuntu 12.04 with GDB 7.4. The manipulation should still work on other platforms, but you probably won't get automatic pretty-printing of Python types. You can get the same representation by hand by running p PyString_AsString(PyObject_Repr(obj)) in GDB.

Getting Set Up

First, let's start the monospace-django server with --noreload so that Django's autoreloading doesn't get in the way of our GDB-based reloading. We'll also use the python2.7-dbg interpreter, which will ensure that less of the program's state is optimized away.
$ git clone http://github.com/stripe/monospace-django
$ cd monospace-django/
$ virtualenv --no-site-packages env
$ cp /usr/bin/python2.7-dbg env/bin/python
$ source env/bin/activate
(env)$ pip install -r requirements.txt
(env)$ python monospace/manage.py syncdb
(env)$ python monospace/manage.py runserver --noreload

$ sudo gdb -p $(pgrep -f monospace/manage.py)
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
[...]
Attaching to process 946
Reading symbols from /home/evan/monospace-django/env/bin/python...done.
(gdb) symbol-file /usr/bin/python2.7-dbg
Load new symbol table from "/usr/bin/python2.7-dbg"? (y or n) y
Reading symbols from /usr/bin/python2.7-dbg...done.
As of version 7.0 of GDB, it's possible to script GDB's behavior in Python, and even register your own code to pretty-print C types. Python comes with its own hooks which can pretty-print Python types (such as PyObject *) and understand the Python stack. These hooks are loaded automatically if you have the python2.7-dbg package installed on Ubuntu.
Whatever you're debugging, you should look to see if there are relevant GDB scripts available—useful helpers have been created for many dynamic languages.
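To give a feel for what such a script looks like, here is a minimal sketch of a GDB pretty-printer written in Python. It is not the set of hooks that ships with Python. The C type struct point and the names PointPrinter and lookup_point are hypothetical, made up for illustration. The gdb module only exists inside a GDB session, so the import is guarded.

```python
try:
    import gdb  # only importable when the script runs inside GDB
except ImportError:
    gdb = None


class PointPrinter(object):
    """Pretty-printer for a hypothetical C `struct point { int x; int y; }`."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # Inside GDB, val is a gdb.Value and val["x"] reads the struct field x.
        return "point(x=%s, y=%s)" % (self.val["x"], self.val["y"])


def lookup_point(val):
    # GDB calls each registered lookup function with the value it is about
    # to print; returning None means "not mine, try the next printer".
    if str(val.type.strip_typedefs()) == "struct point":
        return PointPrinter(val)
    return None


if gdb is not None:
    gdb.pretty_printers.append(lookup_point)
```

You would load a file like this with GDB's source command. After that, printing a struct point value should go through to_string instead of the default field-by-field dump.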

Catching the Error

The Python interpreter creates a PyFrameObject every time it starts executing a Python stack frame. From that frame object, we can get the name of the function being executed. It's stored as a Python object, so we can convert it to a C string using PyString_AsString, and then stop the interpreter only if it begins executing a function called handle_uncaught_exception.
The obvious way to catch this would be by creating a GDB breakpoint. A lot of frames are allocated in the process of executing Python code, though. Rather than tediously continue through hundreds of false positives, we can set a conditional breakpoint that'll break on only the frame we care about:
(gdb) b PyEval_EvalFrameEx if strcmp(PyString_AsString(f->f_code->co_name), "handle_uncaught_exception") == 0
Breakpoint 1 at 0x519d64: file ../Python/ceval.c, line 688.
(gdb) c
Continuing.
Breakpoint conditions can be pretty complex, but it's worth noting that a conditional breakpoint on a frequently hit location (like PyEval_EvalFrameEx) can slow the program down significantly, since the condition is evaluated on every hit.
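If the condition looks opaque, it may help to see the same filter written in pure Python. The sketch below is only an analogue and not what GDB does internally. It uses sys.settrace to watch every new frame and reacts only when the frame's code object is named handle_uncaught_exception, the same f_code.co_name field the breakpoint condition reads. The two stand-in functions are hypothetical, not Django's.

```python
import sys

hits = []


def tracer(frame, event, arg):
    # A "call" event is reported once for each new Python frame, which is
    # roughly when PyEval_EvalFrameEx starts evaluating that frame.
    if event == "call" and frame.f_code.co_name == "handle_uncaught_exception":
        hits.append(frame.f_code.co_name)
    return None  # no per-line tracing needed inside the frame


def handle_uncaught_exception():
    # Stand-in for Django's error handler (hypothetical body).
    return "500"


def get_response():
    # Stand-in for the request handler that ends up in the error path.
    return handle_uncaught_exception()


sys.settrace(tracer)
try:
    result = get_response()
finally:
    sys.settrace(None)
```

Both frames pass through the tracer, but only the one named handle_uncaught_exception is recorded, just as only that frame stops GDB.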

Generating the Initial Return Value

Okay, let's see if we can actually fix things during the next request. We resubmit the form. Once again, GDB halts when the app starts generating the internal server error response. While we investigate more, let's disable the breakpoint in order to keep things fast.
What we really want to do here is to let the app finish generating its original return value (the error response) and then to replace that with our own (the correct response). We find the stack frame where get_response is being evaluated. Once we've jumped to that frame with the up or frame command, we can use the finish command to wait until the currently selected stack frame finishes executing and returns.
Breakpoint 1, PyEval_EvalFrameEx (f=
    Frame 0x3534110, for file [...]/django/core/handlers/base.py, line 186, in handle_uncaught_exception [...], throwflag=0) at ../Python/ceval.c:688
688 ../Python/ceval.c: No such file or directory.
(gdb) disable 1
(gdb) frame 3
#3  0x0000000000521276 in PyEval_EvalFrameEx (f=
    Frame 0x31ac000, for file [...]/django/core/handlers/base.py, line 169, in get_response [...], throwflag=0) at ../Python/ceval.c:2666
2666      in ../Python/ceval.c
(gdb) finish
Run till exit from #3  0x0000000000521276 in PyEval_EvalFrameEx (f=
    Frame 0x31ac000, for file [...]/django/core/handlers/base.py, line 169, in get_response [...], throwflag=0) at ../Python/ceval.c:2666
0x0000000000526871 in fast_function (func=<function at remote 0x26e96f0>, 
    pp_stack=0x7fffb296e4b0, n=2, na=2, nk=0) at ../Python/ceval.c:4107
4107                         in ../Python/ceval.c
Value returned is $1 = 
    <HttpResponseServerError[...] at remote 0x3474680>

Patching the Code

Now that we've gotten the interpreter into the state we want, we can use Python's internals to modify the running state of the application. GDB allows you to make fairly complicated dynamic function invocations, and we'll use lots of that here.
We use the C equivalent of the Python reload function to reimport the code. We have to also reload the monospace.urls module so that it picks up the new code in monospace.views.
One handy trick, which we use to invoke git in the video and curl here, is that you can run shell commands from within GDB.
(gdb) shell curl -s -L https://gist.github.com/raw/2897961/ | patch -p1
patching file monospace/views.py
(gdb) p PyImport_ReloadModule(PyImport_AddModule("monospace.views"))
$2 = <module at remote 0x31d4b58>
(gdb) p PyImport_ReloadModule(PyImport_AddModule("monospace.urls"))
$3 = <module at remote 0x31d45a8>
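Why reload monospace.urls as well? A likely reason is that it holds references to the old view functions, and reloading monospace.views alone does not rebind them. The sketch below reproduces that effect in plain Python 3 with importlib.reload, the Python-level counterpart of PyImport_ReloadModule (in Python 2.7 it is the builtin reload). The modules demo_views and demo_urls are hypothetical stand-ins written to a temporary directory.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # always recompile from source in this demo

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)


def write(name, source):
    with open(os.path.join(workdir, name), "w") as f:
        f.write(source)


# demo_urls grabs a direct reference to the view, as a URLconf might.
write("demo_views.py", "def home():\n    return 'broken'\n")
write("demo_urls.py", "from demo_views import home\n")
importlib.invalidate_caches()

import demo_views
import demo_urls

# "Patch" the view on disk, then reload only the views module.
write("demo_views.py", "def home():\n    return 'fixed'\n")
importlib.reload(demo_views)
stale = demo_urls.home()  # demo_urls still points at the old function

# Reloading the importing module rebinds the name to the new function.
importlib.reload(demo_urls)
fresh = demo_urls.home()
```

After the first reload, demo_urls.home still returns the old value. Only reloading demo_urls picks up the patched function, which mirrors the second PyImport_ReloadModule call above.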
We've now patched and reloaded the code. Next, let's generate a new response by finding self and request in the local variables of this stack frame, and then fetching and calling self's get_response method.
(gdb) p $self = PyDict_GetItemString(f->f_locals, "self")
$4 = 
    <WSGIHandler([...]) at remote 0x311c610>
(gdb) set $request = PyDict_GetItemString(f->f_locals, "request")
(gdb) set $get_response = PyObject_GetAttrString($self, "get_response")
(gdb) set $args = Py_BuildValue("(O)", $request)
(gdb) p PyObject_Call($get_response, $args, 0)
$5 = 
    <HttpResponse([...]) at remote 0x31b9fb0>
In the above snippet, we use GDB's set command to assign values to convenience variables such as $request.
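For readers less familiar with the C API, here is a rough Python rendering of those five GDB commands, with each line annotated with the call it mirrors. FakeHandler and the frame_locals dictionary are hypothetical stand-ins for the real WSGIHandler and the frame's f_locals.

```python
class FakeHandler(object):
    # Hypothetical stand-in for Django's WSGIHandler.
    def get_response(self, request):
        return "response for %s" % request


# Stand-in for f->f_locals in the selected frame.
frame_locals = {"self": FakeHandler(), "request": "/signup"}

self_obj = frame_locals["self"]                   # PyDict_GetItemString(f->f_locals, "self")
request = frame_locals["request"]                 # PyDict_GetItemString(f->f_locals, "request")
get_response = getattr(self_obj, "get_response")  # PyObject_GetAttrString($self, "get_response")
args = (request,)                                 # Py_BuildValue("(O)", $request)
response = get_response(*args)                    # PyObject_Call($get_response, $args, 0)
```

Because get_response is looked up on the instance, it is already a bound method, which is why the argument tuple holds only the request.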
Alright, we now have a new response. Remember that we stopped the program right where the original get_response method returned. The C return value of the interpreter's frame-evaluation function is the Python return value itself (a PyObject *). And so, to replace that return value on x86, we just have to store the new return value in the return-value register ($rax on 64-bit x86) and then allow execution to continue.
GDB lets you refer by number to the value returned by each expression you evaluate. In this case, we want $5:
(gdb) set $rax = $5
(gdb) c
Continuing.
And, like magic, our web request finishes successfully.
GDB is a powerful precision tool. Even if you spend most of your time writing code in a much higher-level language, it can be extremely useful to have it available when you need to investigate subtle bugs or complex issues in running applications.
