I've worked _extensively_ with nRF51x22-based products, and just this week I completed my "research" into how the nRF52832 compares.
Things I'm not sure about are transferring large(ish) amounts of data over BLE
Like other people have mentioned, this is not what BLE was designed for. You _can_ do it, but the energy per bit makes it less efficient than regular (Classic) Bluetooth once you pass about 1.5 kbps.
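To get a feel for how much data you can actually push, here's a back-of-the-envelope calculator. It's a sketch under assumptions, not a measurement: 20 bytes of application payload per notification (what you get with the default ATT_MTU of 23), a hypothetical number of packets per connection event (this depends heavily on the softdevice version and on the phone at the other end), and the connection interval in the spec's 1.25 ms units. The function name is made up for illustration.

```c
#include <stdint.h>

/* Rough upper bound on BLE notification throughput, in bits per second.
 * Assumptions (all hypothetical, adjust to your setup):
 *  - payload_bytes of application data per packet (20 with ATT_MTU = 23),
 *  - packets_per_event packets sent in each connection event,
 *  - conn_interval_units in 1.25 ms units (6 units = 7.5 ms, the minimum).
 * Ignores retransmissions, skipped connection events and protocol overhead. */
static uint32_t ble_throughput_bps(uint32_t payload_bytes,
                                   uint32_t packets_per_event,
                                   uint32_t conn_interval_units)
{
    uint64_t bits_per_event = (uint64_t)payload_bytes * 8u * packets_per_event;
    uint64_t interval_us = (uint64_t)conn_interval_units * 1250u;

    /* Integer math in microseconds keeps the result exact for these inputs. */
    return (uint32_t)(bits_per_event * 1000000u / interval_us);
}
```

For example, `ble_throughput_bps(20, 6, 6)` (six 20-byte packets every 7.5 ms) gives 128000 bps, while `ble_throughput_bps(20, 1, 80)` (one packet every 100 ms) gives 1600 bps. The gap between those two is mostly how much longer the radio is on, which is where the energy-per-bit argument comes from.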
updating OTA
Nordic has a DFU-over-BLE implementation in their SDK and softdevice. It's... adequate.
secure bootloader/DFU
There's no such thing as security on a Nordic chip. There are various attempts at it, but since the read-out protection was cracked a while back, it's all security theater.
how good Nordic's software is
The SDK is actually pretty okay. The examples are a convoluted mess though.
For example, you can take the RSC (Running Speed and Cadence) example and trim it down from 3000+ lines of code to about 200-300 lines while retaining full functionality.
If you don't like the library, the peripherals aren't very complicated, and the documentation is pretty good, so banging the bits yourself isn't very hard.
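To illustrate how little there is to it, here's a minimal sketch of driving a GPIO pin directly through its registers. The register offsets (OUT at 0x504 through DIRCLR at 0x51C, GPIO base at 0x50000000) are as I recall them from Nordic's reference documentation, so check them against the product specification before trusting them; in real firmware you'd normally use the register definitions from Nordic's device headers instead. `gpio_regs_t`, `ON_TARGET` and the `pin_*` helpers are hypothetical names. On a host build the writes land in a RAM mock so the code can be exercised without hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Partial GPIO register block, up to DIRCLR (offsets assumed, verify them). */
typedef struct {
    volatile uint32_t RESERVED0[321]; /* 0x000 - 0x503 */
    volatile uint32_t OUT;            /* 0x504 */
    volatile uint32_t OUTSET;         /* 0x508: write 1 to drive pin high */
    volatile uint32_t OUTCLR;         /* 0x50C: write 1 to drive pin low */
    volatile uint32_t IN;             /* 0x510 */
    volatile uint32_t DIR;            /* 0x514 */
    volatile uint32_t DIRSET;         /* 0x518: write 1 to make pin an output */
    volatile uint32_t DIRCLR;         /* 0x51C: write 1 to make pin an input */
} gpio_regs_t;

/* Compile-time check that the struct layout matches the assumed offsets. */
_Static_assert(offsetof(gpio_regs_t, OUT) == 0x504, "OUT offset");
_Static_assert(offsetof(gpio_regs_t, DIRSET) == 0x518, "DIRSET offset");

#ifdef ON_TARGET /* hypothetical macro defined only in the firmware build */
#define GPIO ((gpio_regs_t *)0x50000000UL)
#else
static gpio_regs_t mock_gpio; /* host build: register writes go to RAM */
#define GPIO (&mock_gpio)
#endif

/* pin must be 0..31. The SET/CLR registers only affect the bits written as 1,
 * so no read-modify-write is needed. */
static void pin_make_output(unsigned pin) { GPIO->DIRSET = 1UL << pin; }
static void pin_high(unsigned pin)        { GPIO->OUTSET = 1UL << pin; }
static void pin_low(unsigned pin)         { GPIO->OUTCLR = 1UL << pin; }
```

The SET/CLR register pairs are the main design nicety here: they make pin updates single writes, which is also why they're safe to use from interrupt context without masking.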
if the BLE event handling will become difficult while running alongside my application in an RTOS, etc.?
Nordic _just_ introduced RTOS support in their SDK. It works remarkably well for their first attempt, but the integration between the softdevice and the RTOS (FreeRTOS in their case) is a bit awkward at times.
I'm sure they'll smooth that out eventually, though.
In its current implementation, there's a _slight_ power penalty for running FreeRTOS. Their port drives the tick from the SysTick interrupt at a fixed rate, so there's no way to do dynamic (tickless) ticks.
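To put a rough number on that penalty: with a fixed-rate tick the CPU wakes up every tick whether or not there's work to do. The sketch below estimates the average current those wakeups add. It's a crude duty-cycle model, and every input is a hypothetical figure you'd replace with measurements from your own board (the tick rate corresponds to FreeRTOS's `configTICK_RATE_HZ`).

```c
#include <stdint.h>

/* Crude estimate of the average extra current caused by a fixed-rate tick.
 *  tick_hz  - tick interrupts per second
 *  wake_us  - time the CPU stays awake per tick, in microseconds (hypothetical)
 *  extra_ua - current drawn while awake minus sleep current, in microamps
 * Returns the added average current in microamps. */
static double tick_overhead_ua(uint32_t tick_hz, double wake_us, double extra_ua)
{
    double duty = (double)tick_hz * wake_us * 1e-6; /* fraction of time awake */
    return duty * extra_ua;
}
```

With a hypothetical 1 kHz tick, 10 µs per wakeup and 3 mA extra while awake, `tick_overhead_ua(1000, 10.0, 3000.0)` comes out at about 30 µA. That's slight next to an active radio, but noticeable if the rest of the system sleeps at a few µA.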
As for Nordic's support, go visit their forums. I've never posted a question there myself, but my googling points me there quite often, and the Nordic people on it seem to answer every question with remarkable technical know-how (as opposed to many other manufacturers' forums).
Right now, we won't be switching over to the nRF52832 from the nRF51822, mainly because the 52 isn't available in WLCSP (yet), but there are other issues as well.
The entire chip seems to run at 16 MHz except the CPU (64 MHz). This means that things most certainly won't be going 4x as fast. In fact, you'll have plenty of wait states when executing code from flash.
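A simple way to see why the 4x clock doesn't translate into 4x speed is Amdahl's law: only the fraction of run time that is actually limited by the CPU core clock gets faster. The sketch below treats everything else (for instance peripheral accesses on the 16 MHz side) as not speeding up at all. It's a rough model of my own, not anything from Nordic: it ignores differences in instructions per cycle between the Cortex-M0 and M4F, and flash wait states can be folded in by passing a `clock_ratio` below 4.

```c
/* Amdahl-style estimate of overall speedup.
 *  cpu_bound_fraction - share of original run time limited by the core clock (0..1)
 *  clock_ratio        - effective speedup of that share (4.0 for 16 -> 64 MHz,
 *                       less once flash wait states are taken into account)
 * The remaining share is assumed not to speed up at all. */
static double effective_speedup(double cpu_bound_fraction, double clock_ratio)
{
    return 1.0 / ((1.0 - cpu_bound_fraction) + cpu_bound_fraction / clock_ratio);
}
```

For example, `effective_speedup(0.5, 4.0)` is 1.6: code that spends half its time CPU-bound ends up about 1.6x faster, not 4x.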
Also, writes from the Cortex-M4F seem to be buffered across the _entire_ address space, peripherals included (strictly speaking this is the core's write buffer rather than a cache, since the M4F has no data cache). This means that if you want to write to a hardware register and be _certain_ that the write went through to the peripheral before continuing, you need to read the register back.
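The read-back trick is easy to wrap in a helper. This is a minimal sketch with a hypothetical name (`reg_write_sync`): it writes the register, then does a dummy read of the same register. Because that read is ordered after the write on the bus, it can't complete until the write has landed. The classic place you need this is clearing an event register at the end of an interrupt handler, e.g. `reg_write_sync(&NRF_TIMER0->EVENTS_COMPARE[0], 0);`, so the interrupt doesn't fire a second time.

```c
#include <stdint.h>

/* Write a peripheral register and read it back so the write has reached
 * the peripheral before this function returns. */
static inline void reg_write_sync(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    (void)*reg; /* dummy read: completes only after the buffered write */
}
```

The `volatile` qualifier matters here: without it the compiler would be free to drop the seemingly useless read.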
Also, the wake-from-sleep latency seems (I haven't measured it, though) to be a bit longer than on the 51. This is a major issue for us, as we're severely power-constrained.