If there is 1 active position out of 10,000 possible positions, you have 10,000 possible states in total (an entropy of Lg(10,000) ≈ 13.3 bits), and need ceil(Lg(10,000)) = 14 bits to store which one it is.
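For what it's worth, the arithmetic is a one-liner in Python:

```python
import math

N = 10_000  # number of possible positions

# One active position out of N equally likely states needs
# ceil(log2(N)) bits to address it.
bits = math.ceil(math.log2(N))
print(bits)  # 14
```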
The most typical encoding would be binary positional, i.e., taking the value as an address into an array of 10k bits; the 74HC138 (a 1-of-8 decoder) is a simple hardware example.
I can't tell what else you're talking about; the only part that seems consistent is "1 of those will be a 1 value". I don't know why you'd want a "set of 0 size", or what good it could be (you can declare any number of sets of size 0, anywhere in the array; it's a null operation?). I don't know how "ratio" or "scope" apply to bit arrays.
If you ever have multiple 1's, you will need some way to encode how many, and where. In principle, you need ceil(Lg(N-n)) bits for the n-th bit (counting from n = 0), because, presumably, bits shouldn't be placed on top of each other -- that would just be one (redundant) bit, right? That is, the first bit has 10,000 locations to choose from, the next has 9,999, and so on. This is an insignificant difference at first (you don't even get to save one bit off the address width until 8,192 locations remain), but becomes important at high densities -- which is why it seems plausible that a low-frequency pattern of 1s and 0s (say, run lengths averaging 6 or more) could be cheaply encoded by deltas or RLE.
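A quick sketch of that accounting in Python (the function names are mine), comparing flat addressing, the shrinking-address scheme, and the combinatorial lower bound of Lg(C(N, n)) -- which applies because the order in which the positions are listed carries no information:

```python
import math

def flat_bits(N, n):
    # n independent addresses, each a full ceil(log2(N)) bits
    return n * math.ceil(math.log2(N))

def shrinking_bits(N, n):
    # each subsequent address has one fewer location to choose from
    return sum(math.ceil(math.log2(N - i)) for i in range(n))

def combinatorial_bits(N, n):
    # theoretical minimum: only log2(N choose n) bits are really needed
    return math.ceil(math.log2(math.comb(N, n)))

N, n = 10_000, 2_000
print(flat_bits(N, n), shrinking_bits(N, n), combinatorial_bits(N, n))
```

At this density the shrinking addresses save only a few hundred bits over flat addressing, while the combinatorial bound is far smaller -- that gap is what delta and run-length schemes try to close.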
If there are few bits, a sparse model (a list of positions) will compress better than an array or bitstream -- at least among the more naive methods, without getting into LZ compression, which is very good all-around and worth considering if your problem isn't so trivial that one of these methods is easier, and if your platform can implement it. If the 1s tend to be grouped together, taking deltas between positions will be cheap. If they tend to be dense (short runs of 1s and 0s), RLE might still be used, but with relatively short lengths; further compression can be had with Huffman coding, perhaps (which I've written before, for an embedded bitmap font).
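As a rough illustration (the positions and helper names are invented for the sketch), here is how a raw bitmap, a sparse address list, and a delta list compare on a handful of clustered bits:

```python
import math

N = 10_000
ones = [4000, 4003, 4004, 4010, 4012, 4015]  # clustered 1 positions

# 1) Raw bitmap: always N bits, regardless of density.
bitmap_bits = N

# 2) Sparse list of absolute addresses: ceil(log2(N)) bits each.
sparse_bits = len(ones) * math.ceil(math.log2(N))

# 3) Deltas between consecutive positions: small gaps need few bits.
deltas = [ones[0]] + [b - a for a, b in zip(ones, ones[1:])]
# One full-width field for the first position, then a fixed field
# sized to the largest remaining delta.
delta_field = max(d.bit_length() for d in deltas[1:])
delta_bits = math.ceil(math.log2(N)) + (len(deltas) - 1) * delta_field

print(bitmap_bits, sparse_bits, delta_bits)  # 10000 84 29
```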
Or even, if the exact number or position of bits doesn't matter so much, a lossy method might be used: take any of the above cases and quantize the arguments to shave off a few bits here and there.
Not to mention if there are higher-order patterns in the data, for example if it can be wrapped into a 2D array, and columns, diagonals, blocks, etc. become useful. Mind that the more shapes you put in, the more bits you need to select between them; so at some point, so many shapes will just be one-offs that the compression ratio goes past a peak. (This would be equivalent, in a certain way, to LZ encoding, which uses pattern matching as well to build an ad-hoc "dictionary".)
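A hypothetical model of that peak (all numbers invented): suppose each use of a dictionary shape saves some raw bits, but costs ceil(Lg(k)) selector bits once k shapes exist. Total savings rise and then fall as ever-rarer shapes are added:

```python
import math

uses = [50, 20, 8, 3, 1, 1, 1, 1]  # invented: later shapes are one-offs
raw_bits = 12                      # bits saved per use, before selector cost

def total_savings(k):
    # selector bits needed to choose among the first k shapes
    sel = math.ceil(math.log2(k)) if k > 1 else 0
    return sum(uses[:k]) * (raw_bits - sel)

best_k = max(range(1, len(uses) + 1), key=total_savings)
print(best_k, total_savings(best_k))  # 4 810
```

Past four shapes, every reference in the whole stream pays for a wider selector while the new shapes contribute almost nothing -- hence the peak.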
Tim