Compiling without optimization usually produces awful code. It is useful only for debugging, where you need to step through the code without being disturbed by the optimizer shuffling it all over the shop or "hiding" variables. Even O1 optimization produces much better code density.
Along with the "benchmark results" you should also give the CPU clock speed, otherwise the numbers cannot be compared with anyone else's.
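To show why the clock speed matters, here is a rough sketch of normalizing a score to the clock, the way per-MHz figures are usually quoted. The function name `score_per_mhz`, the board names and all the numbers are made up for illustration; they are not measurements from any real hardware.

```python
def score_per_mhz(score: float, clock_hz: float) -> float:
    """Return a benchmark score normalized to 1 MHz of CPU clock."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return score / (clock_hz / 1_000_000)


# Hypothetical results: (raw score, CPU clock in Hz). Not real data.
results = {
    "board_a": (240.0, 48_000_000),
    "board_b": (300.0, 72_000_000),
}

for name, (score, clock_hz) in results.items():
    per_mhz = score_per_mhz(score, clock_hz)
    print(f"{name}: raw {score:.1f}, per MHz {per_mhz:.2f}")
```

With these invented numbers the board with the higher raw score does less work per clock cycle, which is exactly what a bare result without the clock speed would hide.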
Benchmarks are something of a black art, with lots of folks holding very different opinions and arguing forever. Some benchmarks exercise one particular kind of work (floating point, random or sequential memory access), so they do well on some architectures and poorly on others.