I really don't see how "efficient" could imply higher speed (as measured by a real-time clock). Again, I'm not a native speaker, though.
Well, the dictionary definitions said "without wasting time" and "making good use of time". And higher speed means less time.
I only ever use "efficient" in the sense of less waste.
As in, if you get a more efficient gearbox in your ute, it doesn't mean it'll go faster or give more torque, as that depends on the gear ratios. It just means the losses in the gearbox are smaller. Here, the operation overheads are smaller. Whether those overheads matter to any specific user is a completely separate question.
end users care about time taken
Well, that too actually varies. For interactive applications and utilities, that absolutely applies. (That's why e.g. a tool that sorts its input is better implemented by reading the input into a self-sorting data structure like a heap, instead of first reading all input and then sorting it more efficiently: the former does most of the work while the I/O is still being transferred, and therefore completes sooner than the latter, while using a bit more CPU time.)
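To illustrate what I mean, here is a throwaway sketch (not a replacement for sort(1); the heap code is just the textbook binary min-heap of strings):

```c
/* Sketch: "sort" standard input by pushing each line into a binary
 * min-heap while reading, then popping them all in order. The heap
 * work overlaps with the I/O, so the tool completes sooner than
 * read-everything-then-sort, at the cost of a bit more CPU time. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char  **heap = NULL;
static size_t  used = 0, have = 0;

static void heap_push(char *line)
{
    if (used >= have) {
        have = have ? 2 * have : 1024;
        heap = realloc(heap, have * sizeof heap[0]);
        if (!heap)
            exit(EXIT_FAILURE);
    }
    /* Sift up. */
    size_t  i = used++;
    while (i > 0 && strcmp(line, heap[(i - 1) / 2]) < 0) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = line;
}

static char *heap_pop(void)
{
    char   *top = heap[0], *last = heap[--used];
    size_t  i = 0;
    /* Sift down. */
    while (1) {
        size_t  c = 2 * i + 1;
        if (c >= used)
            break;
        if (c + 1 < used && strcmp(heap[c + 1], heap[c]) < 0)
            c++;
        if (strcmp(last, heap[c]) <= 0)
            break;
        heap[i] = heap[c];
        i = c;
    }
    if (used > 0)
        heap[i] = last;
    return top;
}

int main(void)
{
    char    *line = NULL;
    size_t   size = 0;
    ssize_t  len;

    /* Most of the comparison work happens here, while the
     * input is still being transferred. */
    while ((len = getline(&line, &size, stdin)) > 0) {
        char *copy = strdup(line);
        if (!copy)
            exit(EXIT_FAILURE);
        heap_push(copy);
    }
    free(line);

    while (used > 0) {
        char *s = heap_pop();
        fputs(s, stdout);
        free(s);
    }
    return EXIT_SUCCESS;
}
```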
Batch utilities and background stuff, things that you run under nice -n 19 ionice -c 3, now that's different. You don't want them to hog the CPU or storage access, but you do want them to do their work at maximum efficiency, not wasting any RAM or such.
Which one do fractals belong to? I dunno. I used fractint in the nineties, and I consider it a slow background task, but that's just me.
(Humans are weird in other ways, too. If they compare an action that has a latency of half a second but completes in a second, and an action that has a latency of ten milliseconds but completes in a second and a half, most of them say that the latter is "faster", even though it is really slower and just "more responsive" due to the smaller latency.)
If one wanted an interactive fractal viewer, doing so using Qt (C++) or Gtk+ (C) or even FLTK, SDL, Cairo Graphics, etc., would make more sense to me.
Making one that triggers a background process to calculate the currently visible region at a much higher resolution might be nice. More on that below.
TL;DR: the "efficiency" of an algorithm, without any clarifying context, will generally be assumed to refer to the big-O time complexity.
But big-O time complexity does not describe "speed" at all, only how the algorithm scales!
In HPC (especially parallel and distributed computing, like in atomic simulations, which is kinda-sorta my field), "efficiency" is a general metric that includes the big-O time and space (additional memory needed) complexity, and usually even qualitative judgment on things like cache behaviour. It is used as a qualitative comparison between two or more implementations that take the same input and produce the same output, but use different amounts of the available resources (memory: cache, RAM, storage space; CPU time and wall-clock time; network bandwidth; etc.) while doing so.
"Speed" is only a minor component of "efficiency" there.
Here, I used it in the sense that PPM is one of the rare image formats you can memory-map, allowing reduced memory pressure when running in the background, plus bonus features like real-time progress checking by simply viewing the image file while it is still being generated, all without spending extra CPU or wall-clock time. It does not make anything faster, just more "well-behaved" in a sense.
I've never seen any better term than "efficient" for describing this. Suggestions welcome!
In the sense you use it, I've always seen "speed"/"faster"/"slower" used instead.
If you've ever done any coding competitions, that could explain the difference, because none of the ones I've seen actually measure efficiency at all, only time (either CPU time or wall-clock time; usually they don't even tell you which). Similarly in the demoscene, and in game development.
Also, since you said "Memory-mapping is not about speed or time taken" – we're in total agreement here! I think we're in total agreement about everything really. I think it's just good practice to volunteer that point as soon as possible, because it's such a common misconception that that's what it's about, and there's an army of us out there who (perhaps too eagerly) shoot down that point whenever it even gets slightly hinted at!
Definitely.
(I do realize now that I should have added a couple of smileys to indicate my attitude in this discussion. Apologies!)
Also, just to try to defuse a bit
No worries, I've typed all of my messages in this thread with a smile on my face, just forgot to ensure it is conveyed in the text. Me fail that way often, but I'm working on it.
I'm as guilty as the next person of trying to find problems to use a cool solution like memory-mapping on.
I don't think of memory-mapping as "cool" at all. It is actually pretty annoying; I do prefer to avoid it unless the features it yields are worth it.
I definitely did not suggest it because memory-mapping itself is "cool", but because of the actual benefits in this particular case, which can be a bit surprising.
It's a useful tool, that's all.
As to the downsides: for example, the reason the ppm_create() code does the posix_fallocate() call is to reserve the disk space for the entire target file, similarly as if one had written zeroes to it (but letting the filesystem handle the actual details instead of writing actual zeros). With the MAP_NORESERVE flag, if the file were sparse and the kernel wanted to evict a page covering any part of a sparse hole in the file, the process would get a SIGSEGV signal, which is an utter bitch to handle.
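For illustration, the overall shape of such a create-and-map function might look something like this (a simplified sketch of the approach, not the actual ppm_create(); the name is mine, a binary P6 PPM is assumed, and most error reporting is omitted):

```c
/* Sketch: create a PPM (P6) file of known final size, reserve the
 * storage for it up front, and memory-map the whole thing. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

typedef struct {
    unsigned char  *base;    /* Start of the mapping (for munmap()). */
    unsigned char  *pixels;  /* First pixel byte, just past the header. */
    size_t          length;  /* Total mapping length in bytes. */
} ppm;

static int ppm_create_mapped(const char *path, int width, int height, ppm *out)
{
    char  header[64];
    int   hlen = snprintf(header, sizeof header, "P6\n%d %d\n255\n",
                          width, height);
    if (hlen <= 0)
        return -1;

    const size_t  total = (size_t)hlen + 3 * (size_t)width * (size_t)height;

    int  fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* Reserve the storage for the entire file up front, so the
     * mapping never covers a hole the filesystem could fail to fill. */
    if (posix_fallocate(fd, 0, (off_t)total) != 0 ||
        write(fd, header, (size_t)hlen) != (ssize_t)hlen) {
        close(fd);
        return -1;
    }

    unsigned char  *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    close(fd);  /* The mapping stays valid after closing the descriptor. */
    if (base == MAP_FAILED)
        return -1;

    out->base   = base;
    out->pixels = base + hlen;
    out->length = total;
    return 0;
}
```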
Without the MAP_NORESERVE flag, the kernel reserves swap space for the mapping (and includes the entire mapping in the process memory accounting), so that even if niced and running in the background, the memory pressure from the image buffer is twice the size of the mapping. (This is the use case MAP_NORESERVE was designed to fix, really.) Since most Linux machines run with overcommit enabled, that can lead to surprising memory accounting limitations that don't feel reasonable or sane.
With respect to fractal generation, I'm starting to think it would make more sense to instead memory-map a temporary/work file that contains only the iteration counts. A count of zero would be reserved for "not computed yet", and a nonzero count would be the number of iterations needed to exit the set. That means the file contents would not need to be initialized per se: just creating the file, ftruncate()'ing it to the correct size, and posix_fallocate()'ing it to make sure the storage space for the entire file is available and reserved, suffices here, because POSIX semantics guarantee that the data reads as all zeros.
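As a sketch (the counts_create name is just for illustration; 32-bit counts, as discussed further below):

```c
/* Sketch: create and map the iteration-count work file. POSIX
 * guarantees the extended part reads as zeros, and zero means
 * "not computed yet", so no initialization pass is needed. */
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

static uint32_t *counts_create(const char *path, int width, int height)
{
    const size_t  length = sizeof (uint32_t) * (size_t)width * (size_t)height;

    int  fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return NULL;

    /* Set the final size, then make sure the storage is reserved. */
    if (ftruncate(fd, (off_t)length) == -1 ||
        posix_fallocate(fd, 0, (off_t)length) != 0) {
        close(fd);
        return NULL;
    }

    uint32_t  *map = mmap(NULL, length, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
    close(fd);
    return (map == MAP_FAILED) ? NULL : map;
}
```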
A separate process – say, the interactive viewer, which chose the name of the temporary/work file in the first place – could monitor the progress by simply looking at the file.
The worker process could use a multi-pass method of calculating the pixels, essentially progressively filling in non-neighboring pixels in the buffer, so that the separate process could show a very low-resolution version of the fractal sooner, allowing the user to cancel the calculation. Perhaps the worker could notify the parent process (a single byte in the response pipe/socket pair?) whenever a full pass is completed?
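The pass ordering could be as simple as interleaved strides, something like this sketch (compute_pixel() stands in for the actual per-pixel work; the initial stride of 8 is an arbitrary choice):

```c
/* Sketch: interleaved multi-pass order. Each pass fills in the pixels
 * at the given stride that a coarser pass did not already cover, so a
 * viewer sees a rough version of the whole image very early on. */
extern void compute_pixel(int x, int y);

static void compute_passes(int width, int height)
{
    for (int stride = 8; stride >= 1; stride /= 2) {
        for (int y = 0; y < height; y += stride)
            for (int x = 0; x < width; x += stride) {
                /* Skip pixels already done by a coarser pass. */
                if (stride < 8 && x % (2 * stride) == 0
                               && y % (2 * stride) == 0)
                    continue;
                compute_pixel(x, y);
            }
        /* This is where the worker could write a single byte to the
         * response pipe, to tell the viewer a full pass completed. */
    }
}
```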
The reason for using a temporary file like this is that it would act as shared memory between the parent and the child, but be restartable/continuable if the computing child is killed, or restarted with different settings (say, a different number of computing threads, a different priority, etc.).
It also keeps the code rather simple. For example, instead of trying to raise an already-reduced CPU or I/O priority, or to change the number of threads used in calculating the fractal, the separate process just tells the worker to exit, and starts a new worker with otherwise the same settings, but different priorities or thread count.
Each pixel can be treated as one task in a task pool, with a worker thread first checking whether the target pixel has already been computed, and if so, skipping it and grabbing the next one. While this is not optimal, it should not be too wasteful or slow either.
The task pool itself would just be the calculation of which pixel in the image comes next. The complex coordinates for the iteration can easily be derived from the pixel coordinates, if you specify the complex coordinates of the center of the image, the distance between neighboring pixels in the complex coordinate system, and the direction (or rotation) of the image around its center. These coordinates would only be calculated after checking that the pixel hasn't been computed yet, making the above pixel-skipping an easy but not too wasteful way of letting a worker just die, and then starting another using the same image size and fractal parameters.
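A sketch of that mapping (C99 complex types; the parameter names are mine, and note that if your pixel y axis grows downwards, you may want to negate dy):

```c
#include <complex.h>

/* Sketch: map pixel (x, y) to its point in the complex plane, given
 * the complex coordinates of the image center, the distance between
 * neighboring pixels, and the rotation of the image around its center. */
static double complex pixel_to_complex(int x, int y,
                                       int width, int height,
                                       double complex center,
                                       double pixel_distance,
                                       double rotation)  /* radians */
{
    /* Offset from the image center, in pixels. */
    const double  dx = (double)x - 0.5 * (double)(width - 1);
    const double  dy = (double)y - 0.5 * (double)(height - 1);

    /* Scale to the complex plane, and rotate around the center:
     * multiplying by cexp(I*rotation) rotates counterclockwise. */
    return center + pixel_distance * (dx + dy * I) * cexp(I * rotation);
}
```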
Using 32-bit iteration counts, one could safely use __atomic_store_n(map + x + y*width, count, __ATOMIC_SEQ_CST) to atomically update the iteration count for a given pixel on Linux. (The corresponding atomic load would be count = __atomic_load_n(map + x + y*width, __ATOMIC_SEQ_CST).) This would eliminate any "pixel flicker" from non-atomic updates.
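Put together, the per-pixel task would look roughly like this (GCC/Clang atomic builtins as above; iterate() is a placeholder for the escape-time loop, assumed to return at least 1):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: one task from the pool, as run by a worker thread.
 * A zero count means "not computed yet". */
extern uint32_t iterate(int x, int y);

static void do_pixel(uint32_t *map, int width, int x, int y)
{
    uint32_t  *cell = map + (size_t)x + (size_t)y * (size_t)width;

    /* Skip pixels a previous (possibly killed) worker already did. */
    if (__atomic_load_n(cell, __ATOMIC_SEQ_CST) != 0)
        return;

    /* Only now derive the complex coordinates and iterate. */
    const uint32_t  count = iterate(x, y);

    /* Atomic store, so the viewer never sees a torn value. */
    __atomic_store_n(cell, count, __ATOMIC_SEQ_CST);
}
```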
Then, the interactive viewer could dynamically switch "palettes" – iteration-count-to-color functions; I especially recommend looking at ones working internally in HSL, HSV, or L*a*b* color spaces – even during fractal calculation, just by switching the function it uses to turn each iteration count into a color.
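For example, a simple palette function working internally in HSV might look like this (the hue-cycling constant is pure taste, and how "inside the set" is encoded is up to you; here I assume it is stored as max_count):

```c
#include <math.h>
#include <stdint.h>

/* Sketch: HSV-based palette, cycling the hue with the iteration count. */
static void count_to_rgb(uint32_t count, uint32_t max_count,
                         uint8_t *r, uint8_t *g, uint8_t *b)
{
    if (count == 0 || count >= max_count) {
        *r = *g = *b = 0;  /* Not computed yet, or inside the set. */
        return;
    }

    /* Hue cycles eight times over the full count range. */
    const double  h = fmod(360.0 * 8.0 * (double)count / (double)max_count,
                           360.0);
    const double  c = 1.0;  /* chroma = value * saturation, both 1 here */
    const double  x = c * (1.0 - fabs(fmod(h / 60.0, 2.0) - 1.0));
    double  rr = 0.0, gg = 0.0, bb = 0.0;

    /* Standard HSV to RGB conversion (value = saturation = 1). */
    if      (h <  60.0) { rr = c; gg = x; }
    else if (h < 120.0) { rr = x; gg = c; }
    else if (h < 180.0) { gg = c; bb = x; }
    else if (h < 240.0) { gg = x; bb = c; }
    else if (h < 300.0) { rr = x; bb = c; }
    else                { rr = c; bb = x; }

    *r = (uint8_t)(255.0 * rr + 0.5);
    *g = (uint8_t)(255.0 * gg + 0.5);
    *b = (uint8_t)(255.0 * bb + 0.5);
}
```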
To compress the images, one could choose the best set of up to 256 iteration counts (because in iteration-count space, the difference between counts is trivial; in color space, it is very, very nontrivial), map the iteration counts to those, and generate a paletted image. One can do this directly to PCX (without any libraries); or just save a PPM with only 256 (or fewer) unique colors, in which case ppmtopng will convert it into an indexed-color PNG; or use libpng directly to save a PNG image. Personally, I'd probably implement an easily extensible interface where external programs are used to convert the data to an image format.
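A naive sketch of the count selection and remapping (histogram the counts, greedily keep the up-to-256 most common ones, then map each pixel to the nearest kept count; nothing clever, and it assumes every count is at most max_iter):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch: quantize iteration counts to at most 256 palette entries.
 * Returns the number of palette entries actually used. */
static int quantize_counts(const uint32_t *counts, size_t pixels,
                           uint32_t max_iter, uint8_t *indexed,
                           uint32_t palette[256])
{
    size_t  *histogram = calloc((size_t)max_iter + 1, sizeof *histogram);
    int      chosen = 0;

    if (!histogram)
        return 0;

    for (size_t i = 0; i < pixels; i++)
        histogram[counts[i]]++;

    /* Repeatedly grab the most frequent remaining count. */
    while (chosen < 256) {
        uint32_t  best = 0;
        size_t    best_n = 0;
        for (uint32_t c = 0; c <= max_iter; c++)
            if (histogram[c] > best_n) {
                best_n = histogram[c];
                best = c;
            }
        if (best_n == 0)
            break;
        palette[chosen++] = best;
        histogram[best] = 0;
    }
    free(histogram);

    /* Map each pixel to the nearest palette entry. "Nearest" is
     * trivial in count space, unlike in color space. */
    for (size_t i = 0; i < pixels; i++) {
        int       bi = 0;
        uint32_t  bd = UINT32_MAX;
        for (int p = 0; p < chosen; p++) {
            const uint32_t  d = counts[i] > palette[p]
                              ? counts[i] - palette[p]
                              : palette[p] - counts[i];
            if (d < bd) {
                bd = d;
                bi = p;
            }
        }
        indexed[i] = (uint8_t)bi;
    }

    return chosen;
}
```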
(I do find plugin interfaces using <dlfcn.h> "cool", though, in the sense that I often implement support for them even when I don't have an actual plugin, or am not even certain I will ever have any... The exact interface usually does need to be refined when the first few plugins are implemented, because it is rare (for me!) to get the interface right beforehand.)
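By "plugin interface" I mean the usual dlopen()/dlsym() dance, roughly like this (the exported plugin_init name and its signature are made up for illustration; a real interface needs more thought):

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*plugin_init_fn)(void);

/* Sketch: load a plugin and call its (hypothetical) init function. */
static int load_plugin(const char *path)
{
    void  *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "%s: %s\n", path, dlerror());
        return -1;
    }

    plugin_init_fn  init = (plugin_init_fn)dlsym(handle, "plugin_init");
    if (!init) {
        fprintf(stderr, "%s: %s\n", path, dlerror());
        dlclose(handle);
        return -1;
    }

    /* On success, the handle should be kept for later dlclose();
     * this sketch just leaks it for simplicity. */
    return init();
}
```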
In this scheme, the reason for the memory map is twofold: it acts as shared memory between the interactive UI and the worker calculating the fractal, and it allows trivially restarting the worker and continuing previously interrupted work. The reason for putting the worker into a separate process (hence the need for shared memory) is that it allows the (unprivileged, normal-priority) interactive UI to reduce the priority of the worker, so that the fractals can be calculated at an easily managed, lower priority than the other tasks the user might need.
If you don't care about these features, or don't intend to use them, then I see no need nor use for memory mapping here at all.
Oh, and once again, apologies for the wall of text.