Entropy in programmed vs blank devices
Circlotron:
So I’m sitting here downloading firmware into MCUs and it occurred to me that when the device is blank the flash memory space contains all zeros (or maybe all ones like an EPROM, I don’t really know), but when it is programmed it contains a varying mix of ones and zeros. So... the blank unprogrammed memory contains no information but is completely uniform and orderly, while the programmed memory contains *information* but has ones and zeros scattered all over the place. Which would you call high entropy and which would you call low entropy?
profdc9:
One definition of entropy is the amount of information needed to specify the microstate, that is, the exact configuration of the memory, given only a description of the macrostate. A memory described simply as, for example, 64 kilobytes has 524,288 bits, so there are 2^524288 possible states consistent with that description. It is therefore proper to describe the entropy as 524,288 bits, which is the base-2 logarithm of the number of combinations.
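As a quick sanity check of that counting (a minimal Python sketch, just re-doing the 64 kilobyte example above):

--- Code: ---
import math

size_bytes = 64 * 1024        # the 64 kilobyte example above
bits = size_bytes * 8         # 524,288 bits
print(bits)                   # 524288

# Entropy of a uniform distribution over all possible contents is
# log2(2**bits) = bits, i.e. one bit of entropy per bit of memory.
n_states = 2 ** bits
print(math.log2(n_states))    # 524288.0
--- End code ---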
The same idea applies to physical systems such as gases. We describe a gas with certain extensive quantities such as volume and mass, and intensive quantities such as pressure and temperature. With physical systems, the system can take on a continuum of states, because each gas particle can be at any position in the box, as opposed to a discrete set of countable states like all of the possible combinations of 0 and 1 bits in a flash memory. So for physical systems with uncountably many states we work with entropy differences instead of absolute counts.
Let's say we have an ideal gas, that is, a gas whose particles do not interact, collide, or have any attraction or repulsion. The gas is in a cylinder with a movable piston, and the piston can be pulled in or out very slowly to increase or decrease the volume. The position of every particle can be described as an x, y, and z position somewhere in the volume. Now imagine that the piston is pushed in slowly to halve the volume, compressing the gas in the cylinder. Because each particle can now only be in half of the volume it used to occupy, one bit less of information is required to describe the position of each gas particle: if you describe the position with the same absolute accuracy, only half the range needs to be covered after compression. To force the gas particles into the smaller space, we had to do work. Boltzmann's constant, k = 1.38E-23 J/K, converts that change in information (with logarithms taken base e) into thermodynamic entropy: removing one bit per particle reduces the entropy by k ln 2, which is about 9.6E-24 J/K per gas particle. At room temperature, T = 300 K, the corresponding minimum work is kT ln 2, or about 3E-21 joules per gas particle.
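A quick numeric check of those figures (a minimal sketch; the constants are just the standard values):

--- Code: ---
import math

k = 1.380649e-23        # Boltzmann's constant, J/K
T = 300.0               # room temperature, K

entropy_per_bit = k * math.log(2)   # entropy change for one bit, J/K
work_per_bit = entropy_per_bit * T  # minimum work at temperature T, J

print(entropy_per_bit)  # ~9.57e-24 J/K per particle
print(work_per_bit)     # ~2.87e-21 J per particle at 300 K
--- End code ---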
Basically, we pushed on the piston and did work on the system, reducing its entropy (and the number of states the particles can be in) while increasing the entropy of the universe as a whole. So you can see with a microchip or a gas how entropy is really just a measure of what is not known about the system given its macroscopic description.
In fact, for memory, Landauer's principle ( https://en.wikipedia.org/wiki/Landauer%27s_principle ) describes exactly how much energy is needed to erase a bit. That is exactly what we did by compressing the gas in the cylinder: we erased one bit of the range needed to describe each gas particle's position, and so the minimum energy needed to erase a bit is likewise about 3E-21 J at room temperature.
I wanted a rude username:
Thanks for the splendid explanation!
That analogy of gas in a piston would also hold for data compression, right?
profdc9:
--- Quote from: I wanted a rude username on May 06, 2020, 11:35:29 am ---Thanks for the splendid explanation!
That analogy of gas in a piston would also hold for data compression, right?
--- End quote ---
For lossless data compression the analogy does not hold. When we compressed the gas, there were fewer states available to each gas particle than before compression, so the state before compression cannot be recovered: more states were available before compression than after. That is why the entropy decreases in the cylinder (but not in the universe as a whole); we used work to reduce the possible number of states. The pigeonhole principle says that if a function or transformation is going to preserve data, there must be at least as many available unique output bit strings as input strings, but here we have reduced the number of output strings, so each output string cannot be mapped uniquely back to an input string.
A lossless data compression function is a deterministic function that maps each input configuration to a different output configuration. By the way, if the positions of the gas particles are truly random, data compression algorithms in general cannot be effective, because there are not enough short bit strings to map onto all the long random bit strings. This is also a consequence of the pigeonhole principle: a typical random ensemble, like the gas particle positions, simply does not compress.
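You can see the pigeonhole argument in practice with any lossless compressor; here's a minimal sketch using zlib as a stand-in:

--- Code: ---
import os
import zlib

n = 64 * 1024

structured = bytes(range(256)) * (n // 256)   # highly regular data
random_data = os.urandom(n)                   # data with no structure

# A lossless compressor shrinks structured data, but on typical random
# data it can only break even or expand slightly.
print(len(zlib.compress(structured)))    # far smaller than 65536
print(len(zlib.compress(random_data)))   # roughly 65536 or a bit more
--- End code ---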
Lossy data compression is even worse in this regard, because we deliberately throw away information by mapping the input to fewer output bits than input bits. The data that is lost, for example in JPEG or video codecs, is usually data deemed not noticeable by the viewer, but still the image or video that was compressed cannot be uniquely recovered from the compressed data stream. A lossy compression algorithm is therefore a better analogy to the compressed gas in the cylinder: bits are actually lost and unrecoverable, just as the initial positions of the gas particles before compression cannot be recovered after compression.
Incidentally, I mentioned before that we were compressing the gas slowly, or isothermally. This means that during the compression the gas remains in thermal equilibrium with the environment, which is assumed to be large compared to the cylinder and therefore stays at a constant temperature. The information lost from the gas particles' positions enters the environment and becomes effectively unrecoverable.

But what if we push the piston in very suddenly, i.e. adiabatically? Then there is no time for heat, and with it information, to be transferred to the environment. The gas particles are squeezed together but they also get hotter. The particles' possible locations have been halved in volume, so one bit less is required to describe each position, but the particles are hotter and so the distribution of velocities broadens: more bits are required to describe the velocities of the gas particles if we compress quickly, whereas if we compress slowly the temperature does not change and therefore the distribution of velocities does not change. On the whole, the same number of bits is needed to describe the positions and velocities of the particles together after sudden compression as before. No information is lost in the sudden compression, and if you quickly release the piston afterwards, it will rebound to its original location as the pressure of the hot gas inside pushes it back out. Only if we give the system time for the heat to transfer to the environment will the entropy inside the cylinder decrease.
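To see that bookkeeping in numbers, here's a minimal sketch; it assumes a monatomic ideal gas and a reversible adiabatic stroke (neither of which is spelled out above) and uses the ideal-gas entropy change dS = N k [ln(V2/V1) + (3/2) ln(T2/T1)] per particle:

--- Code: ---
import math

k = 1.380649e-23     # Boltzmann's constant, J/K
N = 1.0              # per particle
T1, V1 = 300.0, 1.0  # arbitrary starting temperature and volume
V2 = V1 / 2          # piston pushed in to half the volume

# Slow (isothermal) compression: temperature unchanged, entropy drops.
dS_isothermal = N * k * math.log(V2 / V1)     # = -k ln 2
print(dS_isothermal / (k * math.log(2)))      # -1.0 -> one bit lost

# Sudden (adiabatic, reversible) compression of a monatomic ideal gas:
# T * V**(2/3) stays constant, so the gas heats up as it shrinks.
T2 = T1 * (V1 / V2) ** (2.0 / 3.0)
dS_adiabatic = N * k * (math.log(V2 / V1) + 1.5 * math.log(T2 / T1))
print(dS_adiabatic)                           # ~0.0 -> no bits lost
--- End code ---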
So a simple heat engine's operation can be described as a cycle of isothermal and adiabatic operations, and now you know how a heat engine is basically a computational device (and vice versa). ;) You can go further with these analyses relating computation and thermodynamics with Maxwell's Demon and the Feynman Ratchet.
T3sl4co1l:
There's also the abstract information definition,
https://en.wikipedia.org/wiki/Information_content
which, it looks like, needs a probability model to work with; the base (expected) probability is simply 0.5 for each bit, but you'd need to account for conditional probabilities to handle long runs of zeros, ones, or patterns thereof, I think? I don't know an algorithmic calculation offhand. Put another way, the autocorrelation of machine code, for example, tends to be spiky, while random data has an expected distribution of peaks and valleys around a flat baseline.
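One crude but common estimator along those lines (a minimal sketch; it ignores conditional probabilities entirely and just uses the byte histogram):

--- Code: ---
import collections
import math
import os

def byte_entropy(data: bytes) -> float:
    """Shannon entropy estimate in bits per byte (0..8) from byte frequencies."""
    counts = collections.Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

blank = b"\xff" * 65536            # erased flash reads as all ones
random_fill = os.urandom(65536)    # stand-in for encrypted/compressed data

print(byte_entropy(blank))         # 0.0 (printed as -0.0): no uncertainty
print(byte_entropy(random_fill))   # ~8.0: close to the maximum
# Real firmware typically lands somewhere in between.
--- End code ---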
This kind of analysis can be useful for investigating unknown code; an obfuscated program might have an "extractor" of orderly machine code to start, then the bulk is suddenly high entropy: compressed or encrypted data. The analysis might be applied again during various stages of execution, to see if it's being decrypted or expanded, or remains especially obfuscated, etc.
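A sliding-window version of the same idea could locate those regions in an unknown image (again just a sketch under the same assumptions; byte_entropy is the helper from the previous snippet and "firmware.bin" is a made-up file name):

--- Code: ---
# Assumes byte_entropy() from the previous snippet is already defined.
WINDOW = 1024

with open("firmware.bin", "rb") as f:   # hypothetical firmware dump
    image = f.read()

for offset in range(0, len(image) - WINDOW + 1, WINDOW):
    h = byte_entropy(image[offset:offset + WINDOW])
    # Low values suggest padding or plain machine code/tables;
    # values near 8 bits/byte suggest compressed or encrypted blobs.
    print(f"0x{offset:06x}  {h:.2f} bits/byte")
--- End code ---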
Tim