Associate Editor: Karim Ali (@karimhamdanali)
Let's start with a simple Python example (run it live here):
This code creates instances of three basic data structures: an ordered collection (called a list in Python), a key-value mapping (called a dict or dictionary in Python), and an unordered set. Note how elements within these data structures can point to other data structures; for instance, the second element of the top list (accessible via the global variable x) points to the bottom list. Using Python Tutor, novices can easily see pointer and aliasing relationships by following the arrows in these diagrams. Without this tool, they would need to print out serialized string values to the terminal, which obscures these critical details.
How is Python Tutor implemented? By hooking into Python's built-in debugger protocol (bdb in its standard library). This tool runs the user's inputted code, single-steps through execution one line at a time, and traverses the object graph starting from globals and stack-local variables. It records a full trace of the stack and heap state at all execution steps and then sends the trace to the web frontend to render as interactive diagrams.
The main limitation of this "trace-everything" approach is scalability: it's clearly not suitable for code which runs for millions of steps or creates millions of objects. But code written by instructors and students in educational settings is usually small -- running for dozens of steps and creating around a dozen data structures -- so this simple approach works well in practice.
Here's what this example looks like in Ruby, once again implemented by hooking into the interpreter's built-in debugger protocol (run it live here):
This tool visualizes Java code in a similar way, but I'll skip that illustration to save space. Let's now turn to a more challenging pair of languages: C and C++. Unlike code in the above languages, C and C++ programs are not necessarily type- or memory-safe. This means that hooking into a debugger such as gdb isn't enough, since it's not clear which chunks of memory correspond to valid data. Here's a C example to show what I mean (run it live here):
This code allocates a 6-element integer array on the stack (accessible via localArray) and a 10-element integer array allocated on the heap via malloc (accessible via b). It then populates the elements of both arrays at indices 1, 3, and 5. The resulting visualization shows those initialized values and ? symbols next to the remaining uninitialized values. In addition, it knows that the heap array has exactly 10 elements and does not try to read unallocated elements beyond that bound, which risks crashing the program. Readers familiar with C and C++ will recognize that such memory allocation and initialization data is not available to debuggers such as gdb. Python Tutor hooks into Valgrind Memcheck to get this vital data. Without something like Memcheck, it would be impossible to build a safe and accurate visualizer for C and C++.
Finally, let's end with a C++ example (run it live here):
This visualization shows object-oriented programming concepts such as calling an instance method Date::set(), its this pointer referring to a Date object on the stack (accessible via stackDay), and another Date object on the heap (allocated with the new keyword and accessible via heapDay). Just like it does for C programs, Valgrind Memcheck ensures that Python Tutor reads only memory that has been both allocated (here recognizing that there is only one Date object on the heap) and initialized (so that it doesn't show stale junk values).
That was my quick tour of how Python Tutor works for a variety of languages that are frequently used to teach programming. The underlying principle that drives my implementation decisions is that authenticity is key: experts can work around bugs or other quirks in debugging tools, but novices will get confused and demoralized if a tool renders inaccurate diagrams. Thus, this tool needs to be able to take whatever code that users put into it and do something reasonable, or at least fail in a transparent way (e.g., stopping after 1,000 execution steps and suggesting for the user to shorten their code). I've tried out alternative language interpreters and experimental runtimes to get more detailed tracing (e.g., finer-grained stepping into subexpressions rather than line-level stepping), but time and time again I've gone back to built-in debugger hooks and widely-deployed tools such as Valgrind since they are far more robust than experimental alternatives. Try it out today at http://pythontutor.com/ and let me know what you think!