I really don't see how "efficient" could imply higher speed (as measured by a real-time clock). Again, I'm not a native speaker, though.
Well, the dictionary definitions said "without wasting time" and "making good use of time". And higher speed means less time.
I only ever use "efficient" in the sense of less waste.
As in, if you get a more efficient gearbox in your ute, it doesn't mean it'll go faster or give more torque, as that depends on the gear ratios. It just means the losses in the gearbox are smaller. Here, the operation overheads are smaller. Whether those overheads matter to any specific user is a completely separate question.
end users care about time taken
Well, that too actually varies. For interactive applications and utilities, that absolutely applies. (That's why e.g. a tool that sorts its input is better implemented by reading the input into a self-sorting data structure like a heap, instead of first reading all input and then sorting it more efficiently: the former does most of the work while the I/O is still being transferred, and therefore completes sooner than the latter, while using a bit more CPU time.)
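To illustrate what I mean, here is a throwaway sketch (not a replacement for sort(1); the heap code is just the textbook binary min-heap of strings):

```c
/* Sketch: "sort" standard input by pushing each line into a binary
 * min-heap while reading, then popping them all in order. The heap
 * work overlaps with the I/O, so the tool completes sooner than
 * read-everything-then-sort, at the cost of a bit more CPU time. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char  **heap = NULL;
static size_t  used = 0, have = 0;

static void heap_push(char *line)
{
    if (used >= have) {
        have = have ? 2 * have : 1024;
        heap = realloc(heap, have * sizeof heap[0]);
        if (!heap)
            exit(EXIT_FAILURE);
    }
    /* Sift up. */
    size_t  i = used++;
    while (i > 0 && strcmp(line, heap[(i - 1) / 2]) < 0) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = line;
}

static char *heap_pop(void)
{
    char   *top = heap[0], *last = heap[--used];
    size_t  i = 0;
    /* Sift down. */
    while (1) {
        size_t  c = 2 * i + 1;
        if (c >= used)
            break;
        if (c + 1 < used && strcmp(heap[c + 1], heap[c]) < 0)
            c++;
        if (strcmp(last, heap[c]) <= 0)
            break;
        heap[i] = heap[c];
        i = c;
    }
    if (used > 0)
        heap[i] = last;
    return top;
}

int main(void)
{
    char    *line = NULL;
    size_t   size = 0;
    ssize_t  len;

    /* Most of the comparison work happens here, while the
     * input is still being transferred. */
    while ((len = getline(&line, &size, stdin)) > 0) {
        char *copy = strdup(line);
        if (!copy)
            exit(EXIT_FAILURE);
        heap_push(copy);
    }
    free(line);

    while (used > 0) {
        char *s = heap_pop();
        fputs(s, stdout);
        free(s);
    }
    return EXIT_SUCCESS;
}
```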
Batch utilities and background stuff, things that you run under nice -n 19 ionice -c 3, now that's different. You don't want them to hog the CPU or storage access, but you do want them to do their work at maximum efficiency, not wasting any RAM or such.
Which one do fractals belong to? I dunno. I used fractint in the nineties, and I consider it a slow background task, but that's just me.
(Humans are weird in other ways, too. If they compare an action that has a latency of half a second but completes in a second, and an action that has a latency of ten milliseconds but completes in a second and a half, most of them say that the latter is "faster", even though it is really slower and just "more responsive" due to the smaller latency.)
If one wanted an interactive fractal viewer, doing so using Qt (C++) or Gtk+ (C) or even FLTK, SDL, Cairo Graphics, etc., would make more sense to me.
Making one that triggers a background process to calculate the currently visible region at a much higher resolution might be nice. More on that below.
TL;DR: the "efficiency" of an algorithm, without any clarifying context, will generally be assumed to refer to the big-O time complexity.
But big-O time complexity does not describe "speed" at all, only how the algorithm scales!
In HPC (especially parallel and distributed computing, like in atomic simulations, which is kinda-sorta my field), "efficiency" is a general metric that includes the big-O time and space (additional memory needed) complexity, and usually even qualitative judgment on things like cache behaviour. It is used as a qualitative comparison between two or more implementations that take the same input and produce the same output, but use different amounts of the available resources (memory: cache, RAM, storage space; CPU time and wall-clock time; network bandwidth; etc.) while doing so.
"Speed" is only a minor component of "efficiency" there.
Here, I used it in the sense that PPM is one of the rare image formats you can memory-map, allowing reduced memory pressure when running in the background, plus bonus features like real-time progress checking by simply viewing the image file while it is still being generated, all without spending extra CPU or wall-clock time. It does not make anything faster, just more "well-behaved" in a sense.
I've never seen any better term than "efficient" for describing this. Suggestions welcome!
In the sense you use it, I've always seen "speed"/"faster"/"slower" used instead.
If you've ever done any coding competitions, that could explain the difference, because none of the ones I've seen actually measure efficiency at all, only time (either CPU time or wall-clock time; usually they don't even tell you which). Similarly in the demoscene, and in game development.
Also, since you said "Memory-mapping is not about speed or time taken" – we're in total agreement here! I think we're in total agreement about everything really. I think it's just good practice to volunteer that point as soon as possible, because it's such a common misconception that that's what it's about, and there's an army of us out there who (perhaps too eagerly) shoot down that point whenever it even gets slightly hinted at!
Definitely.
(I do realize now that I should have added a couple of smileys to indicate my attitude in this discussion. Apologies!)
Also, just to try to defuse a bit
No worries, I've typed all of my messages in this thread with a smile on my face, just forgot to ensure it is conveyed in the text. Me fail that way often, but I'm working on it.
I'm as guilty as the next person of trying to find problems to use a cool solution like memory-mapping on.
I don't think of memory-mapping as "cool" at all. It is actually pretty annoying; I do prefer to avoid it unless the features it yields are worth it.
I definitely did not suggest it because memory-mapping itself is "cool", but because of the actual benefits in this particular case, which can be a bit surprising.
It's a useful tool, that's all.
As to the downsides: for example, the reason the ppm_create() code does the posix_fallocate() call is to reserve the disk space for the entire target file, similarly as if one had written zeroes to it (but letting the filesystem handle the actual details instead of writing actual zeros). With the MAP_NORESERVE flag, if the file were sparse and the kernel wanted to evict a page covering any part of a sparse hole in the file, the process would get a SIGSEGV signal, which is an utter bitch to handle.
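For illustration, the overall shape of such a create-and-map function might look something like this (a simplified sketch of the approach, not the actual ppm_create(); the name is mine, a binary P6 PPM is assumed, and most error reporting is omitted):

```c
/* Sketch: create a PPM (P6) file of known final size, reserve the
 * storage for it up front, and memory-map the whole thing. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

typedef struct {
    unsigned char  *base;    /* Start of the mapping (for munmap()). */
    unsigned char  *pixels;  /* First pixel byte, just past the header. */
    size_t          length;  /* Total mapping length in bytes. */
} ppm;

static int ppm_create_mapped(const char *path, int width, int height, ppm *out)
{
    char  header[64];
    int   hlen = snprintf(header, sizeof header, "P6\n%d %d\n255\n",
                          width, height);
    if (hlen <= 0)
        return -1;

    const size_t  total = (size_t)hlen + 3 * (size_t)width * (size_t)height;

    int  fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* Reserve the storage for the entire file up front, so the
     * mapping never covers a hole the filesystem could fail to fill. */
    if (posix_fallocate(fd, 0, (off_t)total) != 0 ||
        write(fd, header, (size_t)hlen) != (ssize_t)hlen) {
        close(fd);
        return -1;
    }

    unsigned char  *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    close(fd);  /* The mapping stays valid after closing the descriptor. */
    if (base == MAP_FAILED)
        return -1;

    out->base   = base;
    out->pixels = base + hlen;
    out->length = total;
    return 0;
}
```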
Without the MAP_NORESERVE flag, the kernel reserves swap space for the mapping (and includes the entire mapping in the process memory accounting), so that even if niced and running in the background, the memory pressure from the image buffer is twice the size of the mapping. (This is the use case MAP_NORESERVE was designed to fix, really.) Since most Linux machines run with overcommit enabled, that can lead to surprising memory accounting limitations that don't feel reasonable or sane.
With respect to fractal generation, I'm starting to think it would make more sense to instead memory-map a temporary/work file that contains only the iteration counts. A count of zero would be reserved for "not computed yet", and a nonzero count would be the number of iterations needed to exit the set. That means the file contents would not need to be initialized per se: just creating the file, ftruncate()'ing it to the correct size, and posix_fallocate()'ing it to make sure the storage space for the entire file is available and reserved, suffices here, because POSIX semantics guarantee that the data reads as all zeros.
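As a sketch (the counts_create name is just for illustration; 32-bit counts, as discussed further below):

```c
/* Sketch: create and map the iteration-count work file. POSIX
 * guarantees the extended part reads as zeros, and zero means
 * "not computed yet", so no initialization pass is needed. */
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

static uint32_t *counts_create(const char *path, int width, int height)
{
    const size_t  length = sizeof (uint32_t) * (size_t)width * (size_t)height;

    int  fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return NULL;

    /* Set the final size, then make sure the storage is reserved. */
    if (ftruncate(fd, (off_t)length) == -1 ||
        posix_fallocate(fd, 0, (off_t)length) != 0) {
        close(fd);
        return NULL;
    }

    uint32_t  *map = mmap(NULL, length, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
    close(fd);
    return (map == MAP_FAILED) ? NULL : map;
}
```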
A separate process – say, the interactive viewer, which chose the name of the temporary/work file in the first place – could monitor the progress by simply looking at the file.
The worker process could use a multi-pass method of calculating the pixels, essentially progressively filling in non-neighboring pixels in the buffer, so that the separate process could show a very low-resolution version of the fractal sooner, allowing the user to cancel the calculation. Perhaps the worker could notify the parent process (a single byte in the response pipe/socket pair?) whenever a full pass is completed?
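The pass ordering could be as simple as interleaved strides, something like this sketch (compute_pixel() stands in for the actual per-pixel work; the initial stride of 8 is an arbitrary choice):

```c
/* Sketch: interleaved multi-pass order. Each pass fills in the pixels
 * at the given stride that a coarser pass did not already cover, so a
 * viewer sees a rough version of the whole image very early on. */
extern void compute_pixel(int x, int y);

static void compute_passes(int width, int height)
{
    for (int stride = 8; stride >= 1; stride /= 2) {
        for (int y = 0; y < height; y += stride)
            for (int x = 0; x < width; x += stride) {
                /* Skip pixels already done by a coarser pass. */
                if (stride < 8 && x % (2 * stride) == 0
                               && y % (2 * stride) == 0)
                    continue;
                compute_pixel(x, y);
            }
        /* This is where the worker could write a single byte to the
         * response pipe, to tell the viewer a full pass completed. */
    }
}
```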
The reason for using a temporary file like this is that it would act as shared memory between the parent and the child, but be restartable/continuable if the computing child is killed, or restarted with different settings (say, a different number of computing threads, a different priority, etc.).
It also keeps the code rather simple. For example, instead of trying to raise an already-reduced CPU or I/O priority, or to change the number of threads used in calculating the fractal, the separate process just tells the worker to exit, and starts a new worker with otherwise the same settings, but different priorities or thread count.
Each pixel can be treated as one task in a task pool, with a worker thread first checking whether the target pixel has already been computed, and if so, skipping it and grabbing the next one. While this is not optimal, it should not be too wasteful or slow either.
The task pool itself would just be the calculation of which pixel in the image comes next. The complex coordinates for the iteration can easily be derived from the pixel coordinates, if you specify the complex coordinates of the center of the image, the distance between neighboring pixels in the complex coordinate system, and the direction (or rotation) of the image around its center. These coordinates would only be calculated after checking that the pixel hasn't been computed yet, making the above pixel-skipping an easy but not too wasteful way of letting a worker just die, and then starting another using the same image size and fractal parameters.
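A sketch of that mapping (C99 complex types; the parameter names are mine, and note that if your pixel y axis grows downwards, you may want to negate dy):

```c
#include <complex.h>

/* Sketch: map pixel (x, y) to its point in the complex plane, given
 * the complex coordinates of the image center, the distance between
 * neighboring pixels, and the rotation of the image around its center. */
static double complex pixel_to_complex(int x, int y,
                                       int width, int height,
                                       double complex center,
                                       double pixel_distance,
                                       double rotation)  /* radians */
{
    /* Offset from the image center, in pixels. */
    const double  dx = (double)x - 0.5 * (double)(width - 1);
    const double  dy = (double)y - 0.5 * (double)(height - 1);

    /* Scale to the complex plane, and rotate around the center:
     * multiplying by cexp(I*rotation) rotates counterclockwise. */
    return center + pixel_distance * (dx + dy * I) * cexp(I * rotation);
}
```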
Using 32-bit iteration counts, one could safely use __atomic_store_n(map + x + y*width, count, __ATOMIC_SEQ_CST) to atomically update the iteration count for a given pixel on Linux. (The corresponding atomic load would be count = __atomic_load_n(map + x + y*width, __ATOMIC_SEQ_CST).) This would eliminate any "pixel flicker" from non-atomic updates.
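Put together, the per-pixel task would look roughly like this (GCC/Clang atomic builtins as above; iterate() is a placeholder for the escape-time loop, assumed to return at least 1):

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: one task from the pool, as run by a worker thread.
 * A zero count means "not computed yet". */
extern uint32_t iterate(int x, int y);

static void do_pixel(uint32_t *map, int width, int x, int y)
{
    uint32_t  *cell = map + (size_t)x + (size_t)y * (size_t)width;

    /* Skip pixels a previous (possibly killed) worker already did. */
    if (__atomic_load_n(cell, __ATOMIC_SEQ_CST) != 0)
        return;

    /* Only now derive the complex coordinates and iterate. */
    const uint32_t  count = iterate(x, y);

    /* Atomic store, so the viewer never sees a torn value. */
    __atomic_store_n(cell, count, __ATOMIC_SEQ_CST);
}
```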
Then, the interactive viewer could dynamically switch "palettes" – iteration-count-to-color functions; I especially recommend looking at ones working internally in HSL, HSV, or L*a*b* color spaces – even during fractal calculation, just by switching the function it uses to turn each iteration count into a color.
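For example, a simple palette function working internally in HSV might look like this (the hue-cycling constant is pure taste, and how "inside the set" is encoded is up to you; here I assume it is stored as max_count):

```c
#include <math.h>
#include <stdint.h>

/* Sketch: HSV-based palette, cycling the hue with the iteration count. */
static void count_to_rgb(uint32_t count, uint32_t max_count,
                         uint8_t *r, uint8_t *g, uint8_t *b)
{
    if (count == 0 || count >= max_count) {
        *r = *g = *b = 0;  /* Not computed yet, or inside the set. */
        return;
    }

    /* Hue cycles eight times over the full count range. */
    const double  h = fmod(360.0 * 8.0 * (double)count / (double)max_count,
                           360.0);
    const double  c = 1.0;  /* chroma = value * saturation, both 1 here */
    const double  x = c * (1.0 - fabs(fmod(h / 60.0, 2.0) - 1.0));
    double  rr = 0.0, gg = 0.0, bb = 0.0;

    /* Standard HSV to RGB conversion (value = saturation = 1). */
    if      (h <  60.0) { rr = c; gg = x; }
    else if (h < 120.0) { rr = x; gg = c; }
    else if (h < 180.0) { gg = c; bb = x; }
    else if (h < 240.0) { gg = x; bb = c; }
    else if (h < 300.0) { rr = x; bb = c; }
    else                { rr = c; bb = x; }

    *r = (uint8_t)(255.0 * rr + 0.5);
    *g = (uint8_t)(255.0 * gg + 0.5);
    *b = (uint8_t)(255.0 * bb + 0.5);
}
```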
To compress the images, one could choose the best set of up to 256 iteration counts (because in iteration-count space, the difference between counts is trivial; in color space, it is very, very nontrivial), map the iteration counts to those, and generate a paletted image. One can do this directly to PCX (without any libraries); or just save a PPM with only 256 (or fewer) unique colors, in which case ppmtopng will convert it into an indexed-color PNG; or use libpng directly to save a PNG image. Personally, I'd probably implement an easily extensible interface where external programs are used to convert the data to an image format.
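A naive sketch of the count selection and remapping (histogram the counts, greedily keep the up-to-256 most common ones, then map each pixel to the nearest kept count; nothing clever, and it assumes every count is at most max_iter):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch: quantize iteration counts to at most 256 palette entries.
 * Returns the number of palette entries actually used. */
static int quantize_counts(const uint32_t *counts, size_t pixels,
                           uint32_t max_iter, uint8_t *indexed,
                           uint32_t palette[256])
{
    size_t  *histogram = calloc((size_t)max_iter + 1, sizeof *histogram);
    int      chosen = 0;

    if (!histogram)
        return 0;

    for (size_t i = 0; i < pixels; i++)
        histogram[counts[i]]++;

    /* Repeatedly grab the most frequent remaining count. */
    while (chosen < 256) {
        uint32_t  best = 0;
        size_t    best_n = 0;
        for (uint32_t c = 0; c <= max_iter; c++)
            if (histogram[c] > best_n) {
                best_n = histogram[c];
                best = c;
            }
        if (best_n == 0)
            break;
        palette[chosen++] = best;
        histogram[best] = 0;
    }
    free(histogram);

    /* Map each pixel to the nearest palette entry. "Nearest" is
     * trivial in count space, unlike in color space. */
    for (size_t i = 0; i < pixels; i++) {
        int       bi = 0;
        uint32_t  bd = UINT32_MAX;
        for (int p = 0; p < chosen; p++) {
            const uint32_t  d = counts[i] > palette[p]
                              ? counts[i] - palette[p]
                              : palette[p] - counts[i];
            if (d < bd) {
                bd = d;
                bi = p;
            }
        }
        indexed[i] = (uint8_t)bi;
    }

    return chosen;
}
```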
(I do find plugin interfaces using <dlfcn.h> "cool", though, in the sense that I often implement support for them even when I don't have an actual plugin, or am not even certain I will ever have any... The exact interface usually does need to be refined when the first few plugins are implemented, because it is rare (for me!) to get the interface right beforehand.)
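By "plugin interface" I mean the usual dlopen()/dlsym() dance, roughly like this (the exported plugin_init name and its signature are made up for illustration; a real interface needs more thought):

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*plugin_init_fn)(void);

/* Sketch: load a plugin and call its (hypothetical) init function. */
static int load_plugin(const char *path)
{
    void  *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "%s: %s\n", path, dlerror());
        return -1;
    }

    plugin_init_fn  init = (plugin_init_fn)dlsym(handle, "plugin_init");
    if (!init) {
        fprintf(stderr, "%s: %s\n", path, dlerror());
        dlclose(handle);
        return -1;
    }

    /* On success, the handle should be kept for later dlclose();
     * this sketch just leaks it for simplicity. */
    return init();
}
```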
In this scheme, the reason for the memory map is twofold: it acts as shared memory between the interactive UI and the worker calculating the fractal, and it allows trivially restarting the worker and continuing previously interrupted work. The reason for putting the worker into a separate process (hence the need for shared memory) is that it allows the (unprivileged, normal-priority) interactive UI to reduce the priority of the worker, so that the fractals can be calculated at an easily managed, lower priority than the other tasks the user might need.
If you don't care about these features, or don't intend to use them, then I see no need nor use for memory mapping here at all.
Oh, and once again, apologies for the wall of text.