I have nothing against assembly as such; sometimes a bit of inline assembly is required (e.g. for SSE/AVX acceleration). But no one in their right mind would write a project of any real complexity in assembly, not in 2016. I often prefer to inspect the generated assembly to see whether I can hint the compiler into doing better, rather than writing inline snippets.
Maybe the advantages of the example presented in the talk are not clear to everyone, so let me explain. His memory(...) function, with its reinterpret_cast, is the only point of contact with the hardware. One might call it a minimalistic HAL. The rest is C++ business as usual, readable to anyone familiar with C++ even if they don't know this particular chip's assembly language.
Also, this is very easy to test. The memory(...) function can be mocked, and unit tests can be written in C++ and run on a developer's workstation (as there is no inline assembly!) and on Continuous Integration systems. As a next step, this function might instead interface with a SystemC simulation of the 6502 or a Cadence simulator, since sometimes the hardware is not there yet. Only when the hardware reaches the FPGA or ASIC stage can we actually run the unmodified code on it, and this can be expensive, so such testing is done in the later stages and mostly for performance profiling.
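To make the mocking idea concrete, here is a rough sketch of how it could look. This is not the code from the talk: HardwareMemory, MockMemory, set_border, cycle_border and check_border_logic are names I made up for illustration, and I am assuming a Commodore 64 style target where the border colour register sits at 0xD020. The "game" code is templated on the memory backend, so the same logic can run against a plain array on a workstation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Real backend (hypothetical name): the single point of contact with hardware.
// Only meaningful on the target; never call this on a workstation.
struct HardwareMemory {
    volatile std::uint8_t& operator()(std::uint16_t loc) const {
        return *reinterpret_cast<volatile std::uint8_t*>(
            static_cast<std::uintptr_t>(loc));
    }
};

// Test backend (hypothetical name): 64 KiB of ordinary RAM standing in
// for the 16-bit address space.
struct MockMemory {
    std::array<std::uint8_t, 65536> ram{};
    std::uint8_t& operator()(std::uint16_t loc) { return ram[loc]; }
};

// Assumption: border colour register of a C64-style video chip.
constexpr std::uint16_t kBorderColor = 0xD020;

// "Game" logic, written once against whatever backend is passed in.
template <typename Memory>
void set_border(Memory& memory, std::uint8_t color) {
    memory(kBorderColor) = color;
}

template <typename Memory>
void cycle_border(Memory& memory) {
    // Advance to the next of 16 colours, wrapping from 15 back to 0.
    memory(kBorderColor) =
        static_cast<std::uint8_t>((memory(kBorderColor) + 1) & 0x0F);
}

// A unit test that runs on any host, since no real hardware is touched.
inline void check_border_logic() {
    MockMemory mem;
    set_border(mem, 15);
    assert(mem(kBorderColor) == 15);
    cycle_border(mem);
    assert(mem(kBorderColor) == 0);
}
```

On the target you would pass a HardwareMemory instead; since the backend is a template parameter, the compiler should be able to inline it away, though I have not checked the generated assembly for this particular sketch.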
Additionally, for some test build type, this memory(...) function can be replaced with one that does RPC over UART/Ethernet/whatever to a target system running only a so-called "Monitor", which does nothing but read and write memory/peripherals, while the code itself runs on x86.
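A sketch of what such a remote backend might look like, again with invented names (Transport, RemoteByte, RemoteMemory, LoopbackMonitor) and no real wire protocol. Since a remote byte cannot be handed out as a reference, memory(...) returns a small proxy object here: assigning to it becomes a write request, and reading it becomes a read request. The LoopbackMonitor merely stands in for the target-side monitor so the example can run in one process.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical wire interface; a real one would wrap a UART or a socket.
struct Transport {
    virtual ~Transport() = default;
    virtual std::uint8_t read(std::uint16_t addr) = 0;
    virtual void write(std::uint16_t addr, std::uint8_t value) = 0;
};

// Proxy for one remote byte: assignment sends a write, conversion sends a read.
class RemoteByte {
public:
    RemoteByte(Transport& t, std::uint16_t addr) : t_(t), addr_(addr) {}
    RemoteByte& operator=(std::uint8_t v) {
        t_.write(addr_, v);
        return *this;
    }
    operator std::uint8_t() const { return t_.read(addr_); }

private:
    Transport& t_;
    std::uint16_t addr_;
};

// Drop-in backend: same call syntax as a local memory(...) function.
struct RemoteMemory {
    Transport& transport;
    RemoteByte operator()(std::uint16_t loc) const {
        return RemoteByte(transport, loc);
    }
};

// In-process stand-in for the monitor running on the target.
struct LoopbackMonitor : Transport {
    std::map<std::uint16_t, std::uint8_t> cells;
    int writes = 0;
    std::uint8_t read(std::uint16_t addr) override {
        auto it = cells.find(addr);
        return it == cells.end() ? 0 : it->second;
    }
    void write(std::uint16_t addr, std::uint8_t value) override {
        cells[addr] = value;
        ++writes;
    }
};

// Round trip through the proxy: one write request, then one read request.
inline void check_remote_roundtrip() {
    LoopbackMonitor monitor;
    RemoteMemory memory{monitor};
    memory(0xD020) = 7;
    std::uint8_t seen = memory(0xD020);
    assert(seen == 7);
    assert(monitor.writes == 1);
}
```

The obvious cost is that every access turns into a round trip, so this kind of build is for poking real peripherals from host-side tests, not for anything timing-sensitive.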
Of course, all this testing can be done in C, and has been, but it may require uglier constructs (preprocessor macros etc.), and with inline assembly it becomes a royal pain. The point here is that his game is heavily abstracted, with classes, TMP and other C++ tricks (which improves testability), and yet it produces very neat and compact assembly.