Thanks for appreciating my input; I really do want to help you avoid the same old mistakes. I don't want to just say "you are doing it wrong", I also want to give you pointers on how to fix it.
But I reiterate: yes, it really is a huge issue, judging by the number of destroyed-by-BMS conversion EV projects I have seen personally or analysed on the interwebz. Don't be one of those guys; focus on the core reliability issues at the expense of "nice features". It would be good practice for a BMS designer to wake up every morning and ask themselves: what is my task? To always keep the cells within the safe operating area, regardless of what buttons the end user presses! If not that, why have a BMS at all? In other words, a BMS ain't an entertainment gadget; it's a safety-critical component, and one that manages expensive assets, and thus requires due diligence.
The problem isn't only the self-discharge of the pack.
(This actually is a problem as well; as you calculated, if you lose 100% of capacity in 8 months, you are already doomed. If the user runs the pack down to 10%, which is a totally sensible cut-off limit since you don't want to waste expensive and heavy energy storage on excessive safety margins, you are already down to about 3 weeks; I'll put this arithmetic in code form right after this aside. As the consequence is serious, loss of expensive property and, based on my observations, death of the project or company, I think this is a problem. You can't blame the user for not seeing the "if you run this flat, don't leave it standing for a few weeks but go charge immediately" warning. Robust products do not require such instructions; the only task of the BMS is to take care of the battery, not to push extra responsibilities onto the user and then destroy the pack if the user fails to follow these often unwritten rules.
I witnessed a €50000 conversion EV self-destruct exactly because of this: the 12V lead-acid battery ran flat, the doors were locked so no one used the car for two months or so, and during that relatively short time the BMS destroyed half of the cells. It was also a very nice BMS with many features (including then-trendy redistributive balancing!), likely "worth" that consumption if you asked the designer. But in the end, the massively expensive and "advanced" BMS failed at its only real job. That also triggered the redesign of the BMS into something that does not do that, focusing on core reliability instead of feature creep. Sadly, that resulting BMS project, after being tested in a few vehicles and energy storage systems for a few years, ended up in my drawer, but such is life; the last 5% takes most of the time, and proving that a supposedly good design really is good is tedious work.)
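To make that napkin math concrete, here it is as a small C program. All numbers are illustrative; I'm assuming a 100Ah pack for round figures, not quoting your actual design:

#include <stdio.h>

int main(void)
{
    const double pack_Ah = 100.0;             /* assumed pack capacity */
    const double dead_h  = 8.0 * 30.0 * 24.0; /* ~8 months to flat */
    const double drain_A = pack_Ah / dead_h;  /* implied continuous drain */

    /* Parked at 10% state of charge, the damage clock reads: */
    const double soc       = 0.10;
    const double days_left = (pack_Ah * soc / drain_A) / 24.0;

    printf("implied drain: %.1f mA\n", drain_A * 1e3);
    printf("days from 10%% SoC to over-discharge: %.0f\n", days_left);
    return 0;
}

With these assumptions it prints roughly 17mA and 24 days, which is where the "3 weeks" above comes from.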
But the problem I focused on in the above reply is the unbalancing of the cells.
If one cell module leaks 10mA and another leaks 12mA, the cell charge levels drift apart at 2mA. Your balancer needs to correct this imbalance. Now, if the balancer has a 0.1% duty cycle to do its balancing work (and how do you even prove that assumption?), the required balancing current just to correct the imbalance the BMS itself caused is already 2A.
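The arithmetic is trivial but worth writing down; here it is as a sketch, using the example figures above:

#include <stdio.h>

int main(void)
{
    const double mismatch_A = 0.002; /* 12mA vs 10mA module drain */
    const double duty       = 0.001; /* balancer active 0.1% of the time */

    /* Average bleed must match average drift, so the required peak
       current scales inversely with the duty cycle: */
    printf("required balance current: %.1f A\n", mismatch_A / duty);
    return 0;
}

That is 2A of balancing current just to stand still, before touching any real cell-to-cell self-discharge difference.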
You can solve this in a few ways:
The first, obvious one is to reduce the imbalance you cause by minimizing the current draw. The limitation is that at some point the actual self-discharge of the cells (the reason why you balance in the first place!) dominates again. But at a 2mA difference, for example, the BMS-induced imbalance will dominate. First get this down to numbers where you are not causing the very problem you are solving.
The second one is to make the balance current higher. This usually leads nowhere, since you can't go arbitrarily high before you hit thermal issues, or complexity/cost issues (by turning it into a charge-redistribution-type BMS). IMHO, you already have moar power than I'm comfortable with, so you need to look further...
The third one is to increase the duty cycle of balancing; in other words, let it do its job often. This increases algorithmic complexity, which is a bitch to prove correct. The "classic" approach of only triggering balancing during the CV phase of charging won't cut it. The real challenge is to prove the minimum duty cycle over all typical and atypical use cases!
My BMS design focused on #1 and #3 and sacrificed #2. I brought quiescent current down to some 20µA, and active-state current to some 0.5mA IIRC, then implemented a balancing algorithm which senses the amount of imbalance (as delta-V between the cells) near the end of charge, but stores that information and keeps doing the balancing work even after charging is terminated, even while the EV is driven. The algorithm was something like this (assuming LFP):
V1 = 3.50V
V2 = 3.55V
V3 = 3.60V <-- this causes the charge cutoff
--> calculated shunting time:
Cell 1 = 0
Cell 2 = 2 hours
Cell 3 = 4 hours <-- the cell that ran ahead gets bled the most
There was a self-adjusting gain term etc., but that's the general idea.
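In code, the core of the idea looks something like the sketch below. This is not my original firmware, just a minimal reconstruction; the gain constant is illustrative, chosen to reproduce the example above (0.05V above the lowest cell buys 2 hours of shunting):

#include <stddef.h>

#define GAIN_H_PER_V 40.0f /* illustrative; the real one was self-adjusting */

/* Call once when the charger cuts off on the highest cell. The result is
   stored and worked off whenever the module is awake, even while driving. */
void schedule_shunt_hours(const float cell_v[], float shunt_h[], size_t n)
{
    float v_min = cell_v[0];
    for (size_t i = 1; i < n; i++)
        if (cell_v[i] < v_min)
            v_min = cell_v[i];

    /* Bleed each cell in proportion to how far it sits above the lowest
       one; the lowest cell gets no shunting at all. */
    for (size_t i = 0; i < n; i++)
        shunt_h[i] = (cell_v[i] - v_min) * GAIN_H_PER_V;
}

Feeding in {3.50, 3.55, 3.60} gives {0, 2, 4} hours, matching the example.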
Now, about the price you pay: it's great to have an Android app, but really, does it require consuming charge from the individual cells? In the end, what is the task the cell modules have to do, or what can they even do? Measure voltage. Maybe measure temperature at each individual cell. Cell-level actions? Consume charge from the cell to balance. That's pretty much it. Current measurement already goes into the master module; Android communication should go there as well. The amount of data to transfer between the cell modules and the master is minuscule, because voltages and temperatures are slowly changing variables. This doesn't need to consume so much current.
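To put "minuscule" into numbers, here is a hypothetical per-module report; the field names, widths and rate are my assumptions, not anyone's actual protocol:

#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint8_t  module_id;
    uint16_t cell_mV; /* 0..65535 mV covers any single cell */
    int8_t   temp_C;  /* -128..127 degC */
    uint8_t  crc8;
} cell_report_t;

_Static_assert(sizeof(cell_report_t) == 5, "wire format is 5 bytes");

At one report per second that is 5 bytes per second per module; even a slow, low-power bus carries a hundred modules with enormous margin. The power-hungry radio and app logic can then live on the master, fed from the whole pack, where it cannot unbalance anything.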
OTOH, you may be completely right that you don't have a current variation issue, or that the variation you have is advantageous; that's the case if you have a high-accuracy constant-resistance load, as it will draw more current at higher voltage, which tends to balance the cells. But semiconductors (including microcontrollers) have large unit-to-unit variations, and in particular temperature variations. It's only a matter of coincidence whether these random variables balance or unbalance the pack. Maybe you don't have a problem. Maybe you do. Maybe the problem only manifests itself years later, after a batch of 100 cell modules is made in such a way that some of the MCUs happen to come from a different production lot. IMHO, a BMS can't be designed this way. It needs to be robust and reliable even in varying conditions, and you need to prove as much as humanly possible, because human errors creep in anyway. If you ignore alarming calculations regarding the most basic functions and use cases, you are taking a huge risk.
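If you want to see how mere unit-to-unit spread plays out, a toy Monte Carlo is enough. Every number here is made up (1mA nominal module drain, ±30% spread); the point is the mechanism, not the figures:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const double nominal_mA = 1.0;  /* assumed per-module drain */
    const double spread     = 0.30; /* assumed unit-to-unit spread */
    const int    modules    = 100;
    double lo = 1e9, hi = -1e9;

    srand(42); /* fixed seed for a repeatable example */
    for (int i = 0; i < modules; i++) {
        double u  = (double)rand() / RAND_MAX; /* uniform 0..1 */
        double mA = nominal_mA * (1.0 + spread * (2.0 * u - 1.0));
        if (mA < lo) lo = mA;
        if (mA > hi) hi = mA;
    }

    /* The worst pair of cells drifts apart by (hi - lo) mA, continuously: */
    double drift_Ah_year = (hi - lo) * 1e-3 * 24.0 * 365.0;
    printf("drain spread: %.2f mA -> %.1f Ah/year of drift\n",
           hi - lo, drift_Ah_year);
    return 0;
}

With these made-up numbers the worst pair drifts apart by several Ah per year, which your balancer must silently absorb, forever.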