Author Topic: FatFS sorted directory listing, but without needing a large RAM buffer  (Read 2485 times)


Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
The standard code (this is mine, so not as smart as people here would expect) gets filenames in the order they appear in the directory/FAT.

Code: [Select]
/*
 * Scans the directory, filling the array flist with names, sizes and
 * attributes, up to maxcount entries (each entry takes 17 bytes).
 * Returns the number of files actually found.
 * The array is not initialised, so use the returned count to determine how
 * many filenames were loaded.
 * Any filename shorter than 12 bytes (the 12 includes the dot) has a 0x00 appended.
 * Directories are found too.
 * Each entry is 17 bytes:
 *  0..11 filename
 * 12..15 filesize (uint32_t, LSB in byte 12)
 *     16 attribute byte:
 *
 * #define AM_RDO 0x01 Read only
 * #define AM_HID 0x02 Hidden
 * #define AM_SYS 0x04 System
 * #define AM_DIR 0x10 Directory
 * #define AM_ARC 0x20 Archive
 */

uint32_t get_file_list( char *flist, uint32_t maxcount )
{
    FRESULT fr;     /* Return value */
    DIR dj;         /* Directory object */
    FILINFO fno;    /* File information */

    uint32_t fcount = 0;    /* # of files found so far */
    uint32_t idx = 0;       /* index into flist */

    fr = f_findfirst(&dj, &fno, "", "*");   /* Start to search for any item */

    while (fr == FR_OK && fno.fname[0] && maxcount) /* Repeat while an item is found */
    {
        memcpy(&flist[idx], fno.fname, 12);
        flist[idx+12] =  fno.fsize        & 0xff;   /* filesize, LSB first */
        flist[idx+13] = (fno.fsize >>  8) & 0xff;
        flist[idx+14] = (fno.fsize >> 16) & 0xff;
        flist[idx+15] = (fno.fsize >> 24) & 0xff;
        flist[idx+16] = fno.fattrib;
        idx += 17;
        fcount++;
        maxcount--;
        fr = f_findnext(&dj, &fno);                 /* Search for next item */
    }

    f_closedir(&dj);

    return fcount;
}

The disk is 2 MB, which is FAT12, and I think the max number of files in the root directory is 512 (no subdirectories are supported in this case). Sorting these conventionally would need a lot of RAM. The names are limited to 8.3 (the LFN build option bloats the code quite a lot).

I was thinking of a rather crude method, like this:

Initialise a uint64_t smallest_name = 0xffffffffffffffff.

Do one directory scan. Load each filename, left-justified and padded with trailing spaces, into a uint64_t, but in big-endian order, so you have a number which is (a) unique for the filename and (b) such that comparing two of these numbers yields an alphabetical-equivalent comparison. Determine the "smallest"-named file in the list. Set smallest_name to that. Display that filename / send it to the client / whatever.

Now scan the directory again, looking for the smallest name that is greater than the last one found. When found, display that one.

Repeat until no more files.

So if you have 500 files you scan the dir 500 times, but with FatFS running on a FLASH (or RAM) disk this is very fast - around 500kbytes/sec on a 168MHz arm32.
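A minimal sketch of what this could look like (untested; it compares full names with strcmp() rather than the uint64_t packing, since an 8.3 name including the dot can run to 12 characters and only the first 8 would fit in a uint64_t):

Code: [Select]
#include <string.h>
#include "ff.h"

/* Emit all root-directory names in sorted order, doing one full directory
   scan per file.  emit() stands in for "send to the client". */
FRESULT list_sorted(void (*emit)(const FILINFO *))
{
    char last[13] = "";                  /* sorts before any real name */

    for (;;) {
        DIR dj;
        FILINFO fno, best;
        int found = 0;

        FRESULT fr = f_findfirst(&dj, &fno, "", "*");
        while (fr == FR_OK && fno.fname[0]) {
            /* Candidate must sort strictly after the last name emitted,
               and be the smallest such name seen so far in this pass. */
            if (strcmp(fno.fname, last) > 0 &&
                (!found || strcmp(fno.fname, best.fname) < 0)) {
                best = fno;
                found = 1;
            }
            fr = f_findnext(&dj, &fno);
        }
        f_closedir(&dj);

        if (fr != FR_OK)
            return fr;                   /* scan error */
        if (!found)
            return FR_OK;                /* nothing left: done */

        emit(&best);
        strcpy(last, best.fname);        /* 8.3 names fit in 12 chars + NUL */
    }
}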

Surprisingly, a search did not dig up anything like this (apart from the usual threads of people asking and getting no replies), so I wonder if there are slicker ways. This is a simple file-listing function running in an HTTP server, to deliver a file listing to a client browser.
« Last Edit: June 21, 2022, 09:22:21 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3035
  • Country: us
Quote
This is a simple file listing function running in an HTTP server, to deliver a file listing to a client browser.

How about having the client's browser sort the results using javascript?
 

Online ejeffrey

  • Super Contributor
  • ***
  • Posts: 3713
  • Country: us
This is a reasonably standard way to find the M smallest entries in a list.  Normally you wouldn't do it for every element in a large list, because you are trading N^2 time and IO for N memory, but if you don't have the memory, that's just what you have to do.

Another thing you can do is an indirect sort.  Instead of sorting the data in memory, sort an array of index values.  Instead of 512*11 bytes you would only need 512*2.  With a bit of cleverness, using a combination of merge sort and quicksort, you should be able to do that in 512 bytes.  The downside of this is that you have a lot more random IO.  Also, 512 bytes is a lot better than kB, but that might still be too much.
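A sketch of the indirect-sort idea (untested; fetch_name() is a hypothetical helper that re-reads the 8.3 name of the i-th directory entry from the filesystem, and a plain insertion sort is shown for brevity, where the merge/quick hybrid mentioned above would cut the random IO):

Code: [Select]
#include <stdint.h>
#include <string.h>

/* Hypothetical: read the 8.3 name of directory entry i into buf[13]. */
extern void fetch_name(uint16_t i, char buf[13]);

/* Sort the index array idx[0..n-1] by name; only the 16-bit indices plus
   two small name buffers live in RAM. */
void indirect_sort(uint16_t idx[], int n)
{
    char a[13], b[13];

    for (int i = 1; i < n; i++) {
        uint16_t v = idx[i];
        fetch_name(v, a);

        int j = i;
        while (j > 0) {
            fetch_name(idx[j - 1], b);
            if (strcmp(b, a) <= 0)
                break;                /* found the insertion point */
            idx[j] = idx[j - 1];      /* shift larger entry up */
            j--;
        }
        idx[j] = v;
    }
}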
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Quote
How about having the client's browser sort the results using javascript?

I did wonder about that but how futureproof / browser-compatible is that? So much JS breaks as browsers are updated...

Quote
you should be able to do that in 512 bytes

I wondered about a CRC16 hash, so 512 x 2 bytes of storage; you check for a collision, and if you get one you just serve the listing unsorted :)
« Last Edit: June 21, 2022, 03:09:55 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline mariush

  • Super Contributor
  • ***
  • Posts: 5015
  • Country: ro
  • .
It would make more sense to have the file list sorted in the browser.  You can add more cleverness like "group files by extension, then sort each group by filename" or stuff like that.
If you want it sorted in the microcontroller, I'd just build an array with the first one or two characters and the sector where the file info is stored, so basically 4-6 bytes per entry.. sort this list and then print each full file name by re-reading that sector.

If it's really FAT12 there are other limitations you could take advantage of, like the file names and extensions always being uppercase... you could use the top bit of each of those 12 possible characters for something (like position in the listing, for example, as you only need 9 bits if the maximum number of entries in a folder is 512).
Also, you wouldn't need to store the file size in 4 bytes, because the maximum FAT12 can do is 32 MB.
« Last Edit: June 21, 2022, 04:58:49 pm by mariush »
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21658
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Since the strings are fixed length, a radix sort can do it in O(N) time.  Which you've somewhat hit upon with the uint64_t scheme, I think.  Granted, a full int64 radix is quite a large array, but... :D  A permutation array should be craftable in a couple kB of RAM.
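A sketch of an LSD radix sort over the fixed-length names (untested; it assumes the 12-byte names are available in RAM, e.g. in the flist entries from the first post, and builds a 16-bit permutation array — a 257-entry count table plus two 512-entry index arrays, a couple kB in total):

Code: [Select]
#include <stdint.h>
#include <string.h>

#define MAXFILES 512
#define NAMELEN  12

/* Sort idx[0..n-1] so that the names come out in ascending order:
   one stable counting pass per character position, right to left. */
void radix_sort_names(const char name[][NAMELEN], uint16_t idx[], int n)
{
    static uint16_t tmp[MAXFILES];   /* scratch permutation array */
    uint16_t count[257];

    for (int pos = NAMELEN - 1; pos >= 0; pos--) {
        memset(count, 0, sizeof count);
        for (int i = 0; i < n; i++)                     /* histogram */
            count[(uint8_t)name[idx[i]][pos] + 1]++;
        for (int c = 1; c <= 256; c++)                  /* prefix sums */
            count[c] += count[c - 1];
        for (int i = 0; i < n; i++)                     /* stable scatter */
            tmp[count[(uint8_t)name[idx[i]][pos]]++] = idx[i];
        memcpy(idx, tmp, (size_t)n * sizeof idx[0]);
    }
}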

How about seeing how MS-DOS does it on ye olde XT PC?

...Ahh, DOS 2.0 (the latest with released source) "DIR" didn't have any options besides /P and /W.  Simpler times, they were!  Well, still an option, admittedly a remote one: trace COMMAND.COM itself and see how DIR /ON works.  Pretty sure that was in by 6.2...  Well, never mind. :-DD

But yeah, it's literally a one-liner in JS, and at that, in JS that'll run on a freaking Pentium with IE3 or Firefox 1, or Netscape for that matter.  Maybe a few more lines back then; you'd have to check what the standard functions were.  For sure, the usual way to do it today will run on anything from IE5 or 6 up, with IE being the worst offender in terms of compatibility.  Avoid newer (say, ES3+) stuff and you're fine.

And even if the JS fails, you're still left with a, whatever, list box or table of unsorted items -- it can fail softly, at minor irritation to the user.  And the user can always CTRL+F to find something if they know what they're looking for (even in Lynx!).

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3035
  • Country: us
Another approach...

add an SPI SRAM chip to your BOM - enough to hold all of the file names + some more. Crack open Knuth's "The Art of Computer Programming", Vol. 3, "Sorting and Searching" and start reading section 5.4:

Quote
5.4 EXTERNAL SORTING

Now it is time for us to study the interesting problems which arise when the number of records to be sorted is larger than our computer can hold in its high-speed internal memory. ...
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Define large. Since you're talking about FAT12, filenames are 8.3 only, thus for 500 files you'd need 11*500 = 5500 bytes. Oh, and you'd probably want an associated index in FAT order too; that would be, say, a 16-bit integer, so 13*500 = 6500 bytes. Is that a lot for your use case?
Then doing an in-place sort on such an array would be much faster than rescanning the directory a lot of times.
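A sketch of that in-place approach (untested; assumes the ~6.5 kB buffer can be spared, and that the names were stored space-padded to a fixed 11 bytes, with a 16-bit FAT-order index appended, during a single directory scan):

Code: [Select]
#include <stdlib.h>
#include <string.h>

#define ENT_LEN 13   /* 11 name bytes + 16-bit index in FAT order */

static int ent_cmp(const void *a, const void *b)
{
    return memcmp(a, b, 11);   /* fixed-length, space-padded names */
}

/* entries holds n records of ENT_LEN bytes, filled during one scan. */
void sort_entries(char *entries, size_t n)
{
    qsort(entries, n, ENT_LEN, ent_cmp);
}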
« Last Edit: June 21, 2022, 08:32:43 pm by SiliconWizard »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
5k+ is too much, yes. The target does have it but there are more important uses for it.

I already have an SPI RAM option - 8 megabytes :)
https://www.eevblog.com/forum/microcontrollers/lyontek-ly68l6400-8-megabyte-spi-sram-am-i-reading-the-data-sheet-right/
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Ouch. OK.
Well, your approach is certainly doable. People were sorting large databases on tapes back in the day.
You could look at external sorting algorithms, which seem suited to your use case (and even if you don't use external sorting, at least it's some CS history):
https://opendsa.cs.vt.edu/ODSA/Books/CS3/html/ExternalSort.html
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Rather than scan the directory as many times as there are files in it, you can simply keep only the N smallest names that sort after a specified file name.  In other words, you'll need enough memory for N+2 directory entries, and each pass through the directory will yield N names in sorted order.

One entry is your cache, receiving data from the storage.  One entry is the last entry output from the previous directory scan; on the first scan, initialize it to all zeros, or whatever value sorts before any possible file name.

During the scan, you keep the directory entries in the N-entry cache in sorted order, with the last entry from the previous scan as the '-1'th entry.

If a new entry sorts at or before the '-1'th entry, it is discarded (because it was reported during a previous scan).  Otherwise, if there are fewer than N entries, it is inserted into the sorted array.  Otherwise, if the new entry sorts before the last sorted entry, the last sorted entry is discarded, and the new entry is inserted (into the correct position in the sorted array).

There are several ways of implementing this, but the key idea is that you can choose any integer N starting at 1 (which yields your original idea), depending on how much memory you can spare for this.  Personally, I'd use N at most 255, with an additional N-byte array describing the order (i.e., as a "pointer" array), so that swapping two entries is just swapping the corresponding indexes in the N-byte array.  (255 and not 256, since having a reserved index value to indicate an unused slot tends to simplify things.)  If N is more than a dozen or so, using a binary search to find the insertion point for a new name can be useful.
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3035
  • Country: us
With the file names in randomly accessible storage you could do the following...

Almost every sorting algorithm boils down to comparing two records. When the records are strings, a comparison function like strcmp() is used, which compares the i-th characters of each string until a difference is found. If the records are of fixed length (not a problem in this case), you can compute exactly where the i-th character of the j-th record resides in memory.

So, you choose your sorting algorithm (heap sort, merge sort, quicksort, etc.) and for your comparison function you provide a version of strcmp() which compares two strings residing in the external memory. This will require SPI transfers, and you could either 1) first transfer both strings into main memory and do the comparison there, or even 2) perform SPI transfers of single characters on an as-needed basis.

When you "swap" records you just keep an array of indexes in main memory and swap the indexes.

 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Here is what I was suggesting above – but beware, this hasn't even been compile-tested, so it likely has typos and such.
Code: [Select]
/* Per-file descriptor for directory scanning */
typedef struct {
    uint32_t        size;       /* File size in bytes */
    uint8_t         attr;       /* File attributes */
    char            name[13];   /* File name as a C string, NUL-terminated, includes dot */
} dirscan_entry;

/* Directory scanning structure */
typedef struct {
    int            (*filterfunc)(const FILINFO *);
    int            (*sortfunc)(const dirscan_entry *, const dirscan_entry *);
    DIR             dirh;
    dirscan_entry   last;
} dirscan;

/* Fill dirscan_entry from the FILINFO structure.
   Return FR_NO_FILE (if not a valid entry), or FR_OK (if a valid entry). */
static FRESULT dirscan_entry_fill(dirscan_entry *ent, FILINFO *info)
{
    if (info->fname[0] == 0)
        return FR_NO_FILE;

    ent->size = info->fsize;
    ent->attr = info->fattrib;

    memset(ent->name, 0, sizeof ent->name);
    if (sizeof info->fname >= sizeof ent->name)
        memcpy(ent->name, info->fname, (sizeof ent->name) - 1);
    else
        memcpy(ent->name, info->fname, sizeof info->fname);

    return FR_OK;
}

int dirscan_namesort(const dirscan_entry *ent1, const dirscan_entry *ent2)
{
    const char *name1 = ent1->name;
    const char *name2 = ent2->name;
    while (1) {
        unsigned char  c1 = *(name1++);
        unsigned char  c2 = *(name2++);

        /* Case folding?  Not necessary on FATFS, but let's do it anyway. */
        if (c1 >= 'a' && c1 <= 'z')
            c1 = c1 - 'a' + 'A';
        if (c2 >= 'a' && c2 <= 'z')
            c2 = c2 - 'a' + 'A';

        if (c1 != c2)
            return (int)c1 - (int)c2;  /* == c1 - c2, due to C integer promotions */
        else
        if (!c1)
            return 0;
    }
}

/* Begin directory scanning.  Returns FR_OK if valid, an error result otherwise.
   filterfunc() is optional; it returns zero if a file is to be skipped, nonzero to list.
   sortfunc(a, b) compares two dirscan_entries, and returns negative if a < b,
   zero if a == b, and positive if a > b.
*/
FRESULT dirscan_begin(dirscan *ds, const char *path,
                      int (*filterfunc)(const FILINFO *),
                      int (*sortfunc)(const dirscan_entry *, const dirscan_entry *))
{

    /* Remember the filter function.  It is optional. */
    ds->filterfunc = filterfunc;

    /* Remember the sort function. NULL defaults to dirscan_namesort. */
    if (!sortfunc)
        ds->sortfunc = dirscan_namesort;
    else
        ds->sortfunc = sortfunc;

    /* Clear the last entry to all zeros. */
    ds->last.size = 0;
    ds->last.attr = 0;
    memset(ds->last.name, 0, sizeof ds->last.name);

    /* Open the specified path. */
    return f_opendir(&(ds->dirh), path);
}

FRESULT dirscan_end(dirscan *ds)
{
    return f_closedir(&(ds->dirh));
}

/* Repeatedly call this function with a dirscan handle,
   with an array of dirscan_entry ent[ents_max];,
   until the function returns zero (complete) or negative (error).
   The function returns the number of entries in sorted order
   filled in the ent array.
*/
int dirscan_get(dirscan *ds, dirscan_entry *ent, int ents_max)
{
    FILINFO        info;
    dirscan_entry  tempent;
    FRESULT        result;
    int            ents = 0;

    if (ents_max < 1)
        return -1;  /* TODO: Better error reporting? */

    f_rewinddir(&(ds->dirh));

    while (1) {
        result = f_readdir(&(ds->dirh), &info);
        if (result != FR_OK)
            break;

        /* If there is a filter function, apply. */
        if (ds->filterfunc && !ds->filterfunc(&info))
            continue;  /* Filter returned zero, so we skip this entry. */

        result = dirscan_entry_fill(&tempent, &info);
        if (result != FR_OK)
            break;

        /* If we have already reported this entry on a previous scan
           (it sorts at or before the last reported name), skip it. */
        if (ds->last.name[0] && ds->sortfunc(&tempent, &(ds->last)) <= 0)
            continue;

        /* If we have no entries yet, this will be the first one. */
        if (!ents) {
            ent[0] = tempent;
            ents++;
            continue;
        }

        /* Is this a new last entry? */
        if (ds->sortfunc(ent + ents - 1, &tempent) <= 0) {
            /* Append if array is not full yet. */
            if (ents < ents_max)
                ent[ents++] = tempent;
            continue;
        }

        int  enti = 0;     /* Entry insertion index */

        /* If the new entry is not new first entry in the array, .. */
        if (ds->sortfunc(ent, &tempent) <= 0) {
            /* .. do a binary search. We know it is after [0] but before [ents-1]. */
            int  entj = ents - 1;
            while (entj > enti) {
                const int  i = (enti + entj) / 2;
                if (ds->sortfunc(&tempent, ent + i) <= 0)
                    entj = i;
                else
                    enti = i + 1;  /* i + 1, or the search can loop forever */
            }
        }

        /* If the array is full, drop the last entry.  Otherwise, keep all. */
        if (ents >= ents_max) {
            memmove(ent + enti + 1, ent + enti, (ents_max - enti - 1) * sizeof ent[0]);
            ent[enti] = tempent;
        } else {
            memmove(ent + enti + 1, ent + enti, (ents - enti) * sizeof ent[0]);
            ent[enti] = tempent;
            ents++;
        }
    }

    if (result != FR_OK && result != FR_NO_FILE)
        return -1;  /* TODO: Better error reporting to caller? */

    /* If we got any entries, save the last one. */
    if (ents > 0)
        ds->last = ent[ents - 1];

    return ents;
}

The idea is that you define a sort function (or use the above dirscan_namesort(), which is the default if you supply a NULL sort function to dirscan_begin()), and optionally a filter function (for example, to avoid listing hidden files, directories, or the . and .. entries).

To list the contents of say the root directory, you use
Code: [Select]
    dirscan  handle;

    if (dirscan_begin(&handle, "/", filterfunc_or_NULL, dirscan_namesort) != FR_OK) {
        /* Failed, probably a path error! */
    } else {
        dirscan_entry  ent[15];
        int    ents;
        while ((ents = dirscan_get(&handle, ent, sizeof ent / sizeof ent[0])) > 0) {
            /* output ent[0 .. ents-1], inclusive */
        }
        dirscan_end(&handle);
    }

Note that this works with FF_USE_FIND=0 (and FF_FS_MINIMIZE=0 or 1).

The smaller the ent[] array is, the more times one needs to read through the directory contents.
 
The following users thanked this post: peter-h

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
I think a reasonable solution to this, which can also address other stuff in the future, is to simply push out a block of data to the client browser at the start of each web page. This data will come from a file in the filesystem, so one can send out some JavaScript to do the sorting. Then one can do all kinds of clever stuff like sorting on different fields, and the JS will be trivial to change, just by editing that file.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11622
  • Country: my
  • reassessing directives...
Quote
Code: [Select]
uint32_t get_file_list( char *flist, uint32_t maxcount )
...
    memcpy(&flist[idx],&fno.fname,12);
...
So if you have 500 files you scan the dir 500 times, but with FatFS running on a FLASH (or RAM) disk this is very fast...
why? you already saved everything in flist, why the need to rescan 500x? just run through the flist!

Quote
Another thing you can do is an indirect sort.  Instead of sorting the data in memory, sort an array of index values.  Instead of 512*11 bytes you would only need 512*2.  With a bit of cleverness, using a combination of merge sort and quicksort, you should be able to do that in 512 bytes.  The downside of this is that you have a lot more random IO.  Also, 512 bytes is a lot better than kB, but that might still be too much.
since flist entries are a fixed 17 bytes, he only needs an extra 17 bytes (char) as temporary space for swapping flist entries..

Quote
I think a reasonable solution to this, which can also address other stuff in the future, is to simply push out a block of data to the client browser at the start of each web page. This data will come from a file in the filesystem, so one can send out some JavaScript to do the sorting. Then one can do all kinds of clever stuff like sorting on different fields, and the JS will be trivial to change, just by editing that file.
now you've got the logic sorted out. leave the content of this thread for other readers to get ideas from ;) cheers.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline mariush

  • Super Contributor
  • ***
  • Posts: 5015
  • Country: ro
  • .
I don't think you need 17 bytes per entry.  You don't need to store the file size, the attributes, and the whole file name and extension.

For example, I'm thinking you could use as little as 5-6 bytes per file entry:

3 bytes - offset to where the file information is located (so you can seek and retrieve the file size, attributes, and whole file name and extension when needed)
1 byte
   -  1 bit : is it a directory or not (assuming you want to have the directories at the top)
   -  3 bits : the number of characters in the filename (max 8, so 0..7 + 1)
   -  4 bits : for other uses you can think of
1 byte or more : the first character (or more) of the file name

So minimum 5 bytes per entry, plus a few temporary bytes to swap entries when sorting.  If you want to reduce shuffles / writes, you could add a byte per entry to store the position when sorted (9 bits for 0..511; the 9th bit could go in the 3 offset bytes, because you have a maximum of 8 MB and that can be stored in 23 bits) and then only update that single byte of two entries at a time.  A struct sketch of this layout follows.
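A sketch of that compact entry as a C struct (hypothetical; the field widths follow the description above, and __attribute__((packed)) assumes GCC/Clang):

Code: [Select]
#include <stdint.h>

/* 5-byte compact entry: 24-bit offset to the on-disk file information,
   packed flags, and the first character of the name. */
typedef struct __attribute__((packed)) {
    uint8_t offset[3];   /* where to seek to re-read the full entry */
    uint8_t flags;       /* bit 7   : directory flag
                            bits 4-6: name length - 1 (1..8 chars)
                            bits 0-3: spare (e.g. part of a sorted position) */
    char    first;       /* first character of the file name */
} compact_entry;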

So you could do a preliminary sort doing multiple passes through this array:
1. move directories to the top,
2. group the entries by first letter AND number of characters in the file name, e.g. A2, A1 moved up because they have 2 characters, above file names like AB1, EBA3, because those have 3 and 4 characters and E is after A, etc.

Then, for groups of files that agree in however many characters you cached in the array, you would need to seek and load the full file names to compare two file names at a time.

You could also sort only if the total number of files is below some amount, for example 64-128 files max. If that amount is exceeded, simply dump the listing to the browser and optionally add some JavaScript routine to sort it.


 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Quote
Code: [Select]
uint32_t get_file_list( char *flist, uint32_t maxcount )
...
    memcpy(&flist[idx],&fno.fname,12);
...
So if you have 500 files you scan the dir 500 times, but with FatFS running on a FLASH (or RAM) disk this is very fast...
why? you already saved everything in flist, why the need to rescan 500x?
No.  It is because peter-h uses f_findfirst() and f_findnext().  Each call to either of these scans once through the file list, to obtain a single (the first or the next) entry.

The code I described in reply #10 and showed in reply #12 uses f_readdir() instead.  This one simply reads the next entry in the directory.

Essentially, my dirscan_get() is like f_findnext() (as dirscan_begin() does not return any files, it just prepares the directory for scanning), as it too does a full scan through the directory, except that mine returns the next ents_max entries using a custom selection and sorting function, much like POSIX.1 scandir().  Note that the sorting is global (done across the entire directory, not just the next ents_max entries in it).  Thus, while f_findfirst()/f_findnext() does N passes to obtain all N files in a directory, mine does only N/ents_max passes, and you get to choose ents_max freely yourself (including using a different count for consecutive dirscan_get() calls; that is, you do K passes, where the sum of the ents_max values is N).

The core problem and idea is that there is not enough RAM to store the entire array describing the N files, much less all of them in FILINFO structures as returned by f_findfirst()/f_findnext()/f_readdir().  In a typical browser-based file list, you have the file name and extension, plus the size in bytes, and optionally an icon (or a dash at the end of the filename) to indicate whether the name refers to a file or a directory, is readable and/or writable (by the current user), and so on.  Thus, this kind of code is needed, to trade additional processing work and rereading of data for a smaller RAM footprint.

The other option, the one peter-h alluded to, is to provide the list of directory entries to the user's browser in unsorted format, and use JavaScript to populate the listing in the desired order, using the browser's resources.  Because anything with a browser nowadays has plenty of RAM, this is a very viable option, it just requires more setup work (essentially, a template for a directory listing, where the list of unsorted directory entries is inserted into something like a JavaScript object – I recommend splitting the template into separate header and footer parts, and saving these under reserved file names in the root directory, perhaps ________.__h and ________.__f, or some such).

My own suggestion implements a middle ground, using minimal additional memory.  (We could fit even more entries by using 6 bytes plus the file name (four bytes for file size, one byte for type/attributes, the unpadded file name, plus a terminating NUL byte), but that would add somewhat more processing work; namely, the insertion point in the compressed/variable-length array would have to be found using a linear search, not a binary one.)

Adapting to the other option, we could supply it with a fixed-size char buffer –– reuse the TCP send buffer! –– so that after the JS header part has been emitted, the code fills the buffer with JS object strings ({n:"filename.ext",a:attrs,s:size},) that can be sent directly to the client browser; the JS footer part would then emit the DHTML to convert that array of objects into a nicely formatted file listing, optionally with re-sort buttons as well.

If the directories are read/traversed much more often than the files are modified, then one further option would be to regenerate the file listing page locally after each modification completes, and then just emit that page to the client whenever the directory listing is requested.  I do not think it would be appropriate here.
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11622
  • Country: my
  • reassessing directives...
Quote
No.  It is because peter-h uses f_findfirst() and f_findnext().  Each call to either of these scans once through the file list, to obtain a single (the first or the next) entry.
but he already saved/duplicated every search result he needs in flist ??? that means he has enough memory for that... unless his problem is how to minimize the flist size, in which case the indirect pointer method suggested by mariush is the way: he probably needs only 2-4 bytes per entry in flist, not 5 and not 11, depending on the FAT size on the disk, i.e. the pointer is the location in the FAT (if he has such low-level access). but if i read correctly, it's the sorting problem (when he already has flist).
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Quote
Quote
No.  It is because peter-h uses f_findfirst() and f_findnext().  Each call to either of these scans once through the file list, to obtain a single (the first or the next) entry.
but he already saved/duplicated every search result he needs in flist ???
Only up to maxcount.  peter-h did not implement a way to obtain more than the first maxcount entries of a directory.

Quote
indirect pointer method suggested by mariush is the way
No, it will not work reliably, because FAT does not have inode numbers.  Consider what happens if the directory is modified between the indirect index generation and its use: the wrong file will be referenced.  Horrible; I'd never accept software that does that sort of thing.

What does my solution do, then?  It may show stale data (deleted entries, or entries that have since been renamed), but it never confuses them, since each dirscan_get() does a full pass over the directory contents.

Quote
but if i read correctly, it's the sorting problem (when he already has flist).
The way I read the question is: "I have very limited memory, not enough to store the names and sizes and attributes of all directory entries in a directory.  How can I provide a list of the directory entries to a remote browser client, in sorted order?"

His example code (not a solution, but a reference) uses f_findfirst()/f_findnext() to read as many of the entries as fit in the given buffer.  It neither sorts nor supports more directory entries than can fit in memory.

My suggestion describes code that acts like f_findnext(), but provides the next ents_max entries of the directory, optionally filtered, with the entire directory sorted using an arbitrary function, doing only one pass over the directory per call.  It has a somewhat similar interface to the POSIX.1 scandir() function (which is in all ways superior to the opendir()/readdir()/closedir() that idiots continuously suggest to new programmers, even on POSIX-compatible OSes), but needs to be called repeatedly until all entries have been reported.  That is, it solves the excess directory scanning problem (f_readdir() is used instead of f_findfirst()/f_findnext(), so only number_of_entries/ents_max passes over the directory are done), the sort problem (by letting the caller specify the sort function in the dirscan_begin() call), the unstated filtering problem (you don't normally want to list all files; you may wish to e.g. omit hidden or non-readable files in such file lists), and the memory limitation (it works even when you supply a buffer of just one entry, although that's a bit silly).

Since then, peter-h has pivoted to a different approach, somewhat dropping the original question.  This approach is to emit the directory listing as a JavaScript-enhanced HTML page, and do any sorting on the client side.  This means that, for an HTTP directory listing request, the response boils down to something like the following:
Code: [Select]
FRESULT http_dirlist_response(connection_t *client, const char *path, int (*filter)(FILINFO *))
{
    DIR      dh;
    FILINFO  fi;
    FRESULT  rc;

    /* Open the specified path, to make sure it exists. */
    if (!path || !*path)
        rc = f_opendir(&dh, "/");
    else
        rc = f_opendir(&dh, path);
    if (rc != FR_OK)
        return rc;

    /* Send the header part of the page.
        We assume this ends in the middle of a JavaScript assignment, say
        var fileList = [
    */
    rc = http_dirlist_header(client, path);
    if (rc != FR_OK) {
        f_closedir(&dh);
        return rc;
    }

    /* List each directory entry as a JavaScript object */
    while (1) {
        rc = f_readdir(&dh, &fi);
        if (rc != FR_OK || !fi.fname[0])
            break;

        if (filter && !filter(&fi))
            continue;

        rc = http_dirlist_entry(client, &fi);
        if (rc != FR_OK)
            break;
    }

    /* If successful thus far, emit footer part of the page, starting with
            ];
        to end the JavaScript directory entry object array. */       
    if (rc == FR_OK)
        rc = http_dirlist_footer(client, path);

    if (rc != FR_OK) {
        f_closedir(&dh);
        return rc;
    }

    return f_closedir(&dh);
}
where connection_t is just a handle to the client; it could be anything, really.

The above refers to three other functions:
  • http_dirlist_header(connection_t *client, const char *path) - emitting the fixed initial part of the HTML directory listing to the user
  • http_dirlist_entry(connection_t *client, FILINFO *entry) - emitting a single entry in the directory listing to the user
  • http_dirlist_footer(connection_t *client, const char *path) - emitting the fixed final part of the HTML directory listing to the user
If specified, the filter(FILINFO *entry) function can be used to suppress the listing of specific files, such as hidden files and files used internally by this implementation (for example, files in the root directory describing the header and the footer of the HTML listing page).  The entries themselves are not sorted, as that is done by the client-side JavaScript.
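As a sketch of the middle helper (hypothetical: connection_send() stands in for whatever the TCP stack provides, and the object format follows the {n:...,a:...,s:...} idea from my previous post):

Code: [Select]
#include <stdio.h>   /* snprintf() */

/* Hypothetical send primitive: returns nonzero on success. */
extern int connection_send(connection_t *client, const void *data, size_t len);

FRESULT http_dirlist_entry(connection_t *client, FILINFO *entry)
{
    char line[48];
    int  len = snprintf(line, sizeof line, "{n:\"%s\",a:%u,s:%lu},\n",
                        entry->fname, (unsigned)entry->fattrib,
                        (unsigned long)entry->fsize);

    if (len < 0 || len >= (int)sizeof line)
        return FR_INVALID_NAME;   /* formatting error or name too long */

    return connection_send(client, line, (size_t)len) ? FR_OK : FR_INT_ERR;
}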

If the footer part contains say
Code: [Select]
<table>
 <thead>
  <tr>
   <th><a class="th" onclick="return sortType();">Type</a></th>
   <th><a class="th" onclick="return sortName();">Filename</a></th>
   <th><a class="th" onclick="return sortTime();">Modified</a></th>
   <th><a class="th" onclick="return sortSize();">Size</a></th>
  </tr>
 </thead>
 <tbody id="listparent"></tbody>
</table>
then a simple JavaScript for loop can construct the dynamic HTML (<tr><td>Type</td><td>Name</td><td>Date Time</td><td>Size</td></tr>) for each directory entry in sorted order, combine them into one long string, and (re)display the list using
    document.getElementById("listparent").innerHTML = stringCombiningTheTableRows;

If the header part ends with <script type="text/javascript">var listing = [ and the footer part starts with ];, then defines the four sort functions (each returning false), and finally adds window.addEventListener("load", sortName);, then when the page loads, the file list is initially shown sorted by sortName().

As you can see from the above outline, this is a reasonable solution, pushing the memory-hungry sorting to the client, with minimal server-side processing.  Sure, there are additional functions, but they shouldn't be too large, and one can also drop f_findfirst()/f_findnext() support (setting FF_USE_FIND=0).

In a properly designed implementation, however, we'd write the response handler as a coroutine, so that the TCP stack, HTTP server, and/or RTOS used can call it to generate a packet's worth or so of further data to send to the client, instead of the single-threaded, one-client-at-a-time approach shown above.  But that gets into some annoyingly complicated stuff (as in, depending on everything else that is being used) that is better discussed in a separate thread.



In case it is not obvious, my intent with these posts is not to just reply to those who replied to me, but hopefully to describe the solutions and approaches in a way that is understandable to even those who much later on stumble on this thread, looking for answers to a similar question.  So, if I happen to explain something that you very well know, do not be offended: I'm not assuming you don't understand this, I'm just trying to keep things at a level that anyone interested can follow too.  Answering to just one person with this kind of verbosity (which I cannot help but do) isn't much fun; but writing it in a way that it might help others too, makes it worthwhile and fun for me.
 
The following users thanked this post: peter-h

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11622
  • Country: my
  • reassessing directives...
sorry i'm not a FAT expert, but i believe FAT12 is quite a bit simpler than the more modern FAT32 etc. you could lock the disk to avoid new file/dir creation during the record scan/capture process.. or embed the pointer array into the FAT library, continually evolving/updating it on every FAT update or new file/dir creation. but that can be a daunting task for some people... ymmv.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3694
  • Country: gb
  • Doing electronics since the 1960s...
Thank you all for your ideas.

I don't personally code in JS so have passed this on to the guy doing that part.

I want the JS sorting option to be entirely optional, so the basic project makes no JS assumption in the browser. The reason for this is not trivial; basically, it is to make sure that all current and future browsers will function adequately. The product is serving HTTP, not HTTPS, which generates dire warnings to the user, and even with HTTPS the certificate would have to be self-signed, which is something browsers increasingly dislike and emit ever more dire warnings over :) So one day the only clients which will work with such a product may be some oddball stuff.

Or I could have just written the C code to sort the listing in the product, but that is less flexible if you want to sort by name, by size, etc. In most applications there will actually be only a few files, so this won't be an issue, but it might be; hence the extra flexibility of client-side sorting is probably a good way to do it.

One often finds that the original idea is not all that great and there is a much better way, coming out of a discussion :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Quote
if you want to sort by name, by size, etc.
Yup; I personally often do (even in desktop environments I prefer to see folder contents as lists, instead of as icons, if possible), so that's why my C implementation takes a separate sort function pointer.

Note that in the HTML+JS approach I showed, JS support is mandatory, but all the JS is embedded in the same HTML file, not in external files.  This makes JS support pretty solid in my experience.  To add a plain-HTML listing, you need to split the HTML index page into at least three static parts: header, middle, and footer.  I would put these as "hidden" (specially named, filtered out in the file list and file access functions) files in the root directory.  The JS directory entry list would be emitted between header and middle, and the initial HTML table rows between middle and footer (within the <tbody id="listparent"> and </tbody> elements).

It would be very interesting and nice if you could post a summary of the solution afterwards, with the reasons why you ended up with it, and, if possible, what you would have done differently if you were to start the project from scratch.  ^-^
 
The following users thanked this post: peter-h

