
FatFS sorted directory listing, but without needing a large RAM buffer


mariush:
I don't think you need 17 bytes per entry. You don't need to store the file size, the attributes, or the whole file name and extension.

For example, I'm thinking you could use as little as 5-6 bytes per file entry:

3 bytes - offset to where the file information is located (so you can seek and retrieve the file size, attributes, and whole file name and extension when needed)
1 byte
   - 1 bit : is it a directory or not (assuming you want to have the directories at the top)
   - 3 bits : the number of characters in the file name (max 8, so 0..7 + 1)
   - 4 bits : for other uses you can think of
1 byte or more : the first character (or more) of the file name

so a minimum of 5 bytes per entry, plus a few temporary bytes to swap entries when sorting.  If you want to reduce shuffles/writes, you could add a byte per entry to store the position when sorted (9 bits for 0..511; the 9th bit could go in the 3 offset bytes, because you have a maximum of 8 MB, which fits in 23 bits) and then only update that single byte of two entries at a time.
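For illustration, such a packed entry might look like this in C (the layout above; the struct and field names are purely illustrative):

--- Code: ---#include <stdint.h>

/* Hypothetical packed sort key, 5 bytes per file entry (sketch only). */
typedef struct __attribute__((packed)) {
    uint8_t offset[3]; /* 24-bit offset to the on-disk file information;
                          8 MB fits in 23 bits, leaving 1 bit spare
                          (e.g. the 9th bit of a sorted-position index) */
    uint8_t flags;     /* bit 7: directory flag;
                          bits 4..6: name length - 1 (1..8 characters);
                          bits 0..3: free for other uses */
    char    first;     /* first character of the file name; widen if RAM allows */
} sortkey_t;

_Static_assert(sizeof(sortkey_t) == 5, "sortkey_t must pack to 5 bytes");

--- End code ---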

so you could do a preliminary sort doing multiple passes through this array:
1. move directories to the top,
2. group the entries by first letter AND number of characters in the file name, e.g. A2 and A1 move up because they have 2 characters, above file names like AB1 and EBA3, because those have 3 and 4 characters and E sorts after A, etc.

then, for groups of files whose names are longer than the 2-3 (or however many) characters you cached in the array, you would need to seek and load the full file names to compare two entries at a time


You could also sort only if the total number of files is below some limit, for example 64-128 files max. If that limit is exceeded, simply dump the unsorted listing to the browser and optionally add a small JavaScript routine to sort it.


Nominal Animal:

--- Quote from: Mechatrommer on June 24, 2022, 03:44:07 pm ---
--- Quote from: peter-h on June 21, 2022, 09:08:46 am ---
--- Code: ---uint32_t get_file_list( char *flist, uint32_t maxcount )
...
    memcpy(&flist[idx],&fno.fname,12);
...

--- End code ---
So if you have 500 files you scan the dir 500 times, but with FatFS running on a FLASH (or RAM) disk this is very fast...

--- End quote ---
why? you already saved everything in flist, why the need to rescan 500x?
--- End quote ---
No.  It is because peter-h uses f_findfirst() and f_findnext().  Each call to either of these scans once through the file list, to obtain a single (the first or the next) entry.

The code I described in reply #10 and showed in reply #12 uses f_readdir() instead.  This one simply reads the next entry in the directory.

Essentially, my dirscan_get() is like f_findnext() (as dirscan_begin() does not return any files, it just prepares the directory for scanning), in that it too does a full scan through the directory, except that mine returns the next ents_max entries using a custom selection and sorting function, much like POSIX.1 scandir().  Note that the sorting is global (done across the entire directory, not just the next ents_max entries in it).  Thus, while f_findfirst()/f_findnext() does N passes to obtain all N files in a directory, mine only does N/ents_max passes, and you can freely choose ents_max yourself, including using a different count for consecutive dirscan_get() calls; that is, you do K passes, where the sum of the ents_max values is N.
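For reference, the shape of that interface (the actual code is in replies #10 and #12; the names and members below are only a sketch) is something like:

--- Code: ---#include <stddef.h>
#include "ff.h"

/* Sketch of the interface described above; see replies #10 and #12
   for the actual code.  Names and members here are illustrative. */
typedef struct {
    DIR  dir;                                        /* FatFS directory handle */
    int (*accept)(const FILINFO *);                  /* NULL: accept all       */
    int (*before)(const FILINFO *, const FILINFO *); /* sort order predicate   */
    /* ... plus bookkeeping to remember where the previous pass left off ... */
} dirscan_t;

/* Prepares the directory for scanning; returns no entries itself. */
FRESULT dirscan_begin(dirscan_t *scan, const char *path,
                      int (*accept)(const FILINFO *),
                      int (*before)(const FILINFO *, const FILINFO *));

/* Does one full pass over the directory, filling ents[] with the next
   ents_max entries in globally sorted order.  Returns the number of
   entries stored, 0 when the listing is complete, or negative on error. */
int dirscan_get(dirscan_t *scan, FILINFO *ents, size_t ents_max);

/* Usage: ceil(N / ents_max) passes over an N-entry directory. */
void example(void)
{
    static dirscan_t scan;
    static FILINFO   ents[8];
    int n;
    if (dirscan_begin(&scan, "/", NULL, NULL) != FR_OK)
        return;
    while ((n = dirscan_get(&scan, ents, 8)) > 0) {
        /* ... emit ents[0] .. ents[n-1] to the client here ... */
    }
}

--- End code ---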

The core problem and idea is that there is not enough RAM to store the entire array describing the N files, much less all of them in FILINFO structures as returned by f_findfirst()/f_findnext()/f_readdir().  In a typical browser-based file list, you have the file name and extension, plus the size in bytes, and optionally an icon (or a dash at the end of the file name) to indicate whether the name refers to a file or a directory, is readable and/or writable (by the current user), and so on.  Thus, this kind of code is needed to trade additional processing work and rereading of data for a smaller RAM footprint.

The other option, the one peter-h alluded to, is to provide the list of directory entries to the user's browser in unsorted form, and use JavaScript to populate the listing in the desired order, using the browser's resources.  Because anything with a browser nowadays has plenty of RAM, this is a very viable option; it just requires more setup work (essentially, a template for a directory listing, where the list of unsorted directory entries is inserted into something like a JavaScript object.  I recommend splitting the template into separate header and footer parts, and saving these under reserved file names in the root directory, perhaps ________.__h and ________.__f, or some such).

My own suggestion implements a middle ground, using minimal additional memory.  (We could stuff in even more entries by using 6 bytes plus the file name (four bytes for the file size, one byte for type/attributes, the unpadded file name, plus a terminating NUL byte), but that would add somewhat more processing work; namely, the insertion point for keeping the compressed/variable-length array sorted would have to be found using a linear search, not a binary one.)
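For illustration, appending one such variable-length record to a byte buffer could look like the following sketch (the layout just described; pack_entry is a hypothetical name):

--- Code: ---#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append one variable-length entry to buf[]:
   4-byte little-endian file size, 1 attribute byte, the unpadded
   file name, and a terminating NUL.  Returns the new used count,
   or 0 if the entry does not fit. */
static size_t pack_entry(uint8_t *buf, size_t used, size_t cap,
                         uint32_t size, uint8_t attr, const char *name)
{
    const size_t nlen = strlen(name) + 1;  /* name plus its NUL byte */
    if (used + 5 + nlen > cap)
        return 0;
    buf[used++] = (uint8_t)(size);
    buf[used++] = (uint8_t)(size >> 8);
    buf[used++] = (uint8_t)(size >> 16);
    buf[used++] = (uint8_t)(size >> 24);
    buf[used++] = attr;
    memcpy(buf + used, name, nlen);
    return used + nlen;
}

--- End code ---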

Adapting to the other option, we could supply it with a fixed-size char buffer (reuse the TCP send buffer!), so that after the JS header part has been emitted, the code would fill the buffer with JS object strings ({n:"filename.ext",a:attrs,s:size},) that can be sent directly to the client browser, and the JS footer part would then emit the DHTML to convert that array of objects into a nicely formatted file listing, optionally with re-sort buttons as well.

If the directories are read/traversed much more often than the files are modified, one further option would be to regenerate the file listing page locally after each modification completes, and then just emit that page to the client whenever the directory listing is requested.  I do not think it would be appropriate here, though.

Mechatrommer:

--- Quote from: Nominal Animal on June 25, 2022, 05:29:57 pm ---No.  It is because peter-h uses f_findfirst() and f_findnext().  Each call to either of these scans once through the file list, to obtain a single (the first or the next) entry.

--- End quote ---
but he already saved/duplicated every search result he needs in flist ??? that means he has enough memory for that... unless his problem is how to minimize the flist size, in which case the indirect pointer method suggested by mariush is the way: he probably needs only 2-4 bytes per entry in flist, not 5, not 11, depending on the FAT size on the disk, i.e. the pointer is the location in the FAT (if he has such low-level access). but if i read correctly, it's the sorting problem (when he already has flist).

Nominal Animal:

--- Quote from: Mechatrommer on June 25, 2022, 07:01:27 pm ---
--- Quote from: Nominal Animal on June 25, 2022, 05:29:57 pm ---No.  It is because peter-h uses f_findfirst() and f_findnext().  Each call to either of these scans once through the file list, to obtain a single (the first or the next) entry.

--- End quote ---
but he already saved/duplicate every search result he needs in flist ???
--- End quote ---
Only up to maxcount.  peter-h did not implement a way to obtain more than the first maxcount entries of a directory.


--- Quote from: Mechatrommer on June 25, 2022, 07:01:27 pm ---indirect pointer method suggested by mariush is the way
--- End quote ---
No, it will not work reliably, because FAT does not have inode numbers.  Consider what happens if the directory is modified between the indirect index generation and its use: the wrong file will be referenced.  Horrible; I'd never accept software that does that sort of thing.

What does my solution do, then?  It may show stale data (entries that have since been deleted or renamed), but it never confuses them, since each dirscan_get() does a full pass over the directory contents.


--- Quote from: Mechatrommer on June 25, 2022, 07:01:27 pm ---but if i read correctly, its the sorting problem (when he already has flist).
--- End quote ---
The way I read the question is: "I have very limited memory, not enough to store the names and sizes and attributes of all directory entries in a directory.  How can I provide a list of the directory entries to a remote browser client, in sorted order?"

His example code (not a solution, but a reference) uses f_findfirst()/f_findnext() to read as many of the entries as fit in the given buffer.  It neither sorts, nor supports more directory entries than can fit in memory.

My suggestion describes code that acts like f_findnext(), but provides the next ents_max entries of the directory, optionally filtered, with the entire directory sorted using an arbitrary function, doing only one pass over the directory per call.  It has an interface somewhat similar to the POSIX.1 scandir() function (which is in all ways superior to the opendir()/readdir()/closedir() that idiots continuously suggest to new programmers, even on POSIX-compatible OSes), but needs to be called repeatedly until all entries have been reported.  That is, it solves the excess directory scanning problem (f_readdir() is used instead of f_findfirst()/f_findnext(), so only number_of_entries/ents_max passes over the directory are done), the sort problem (by letting the caller specify the sort function in the dirscan_begin() call), the unstated filtering problem (you don't normally want to list all files; you may wish to e.g. omit hidden or non-readable files in such listings), and the memory limitation (it works even when you supply a buffer of just one entry, although that's a bit silly).

Since then, peter-h has pivoted to a different approach, somewhat dropping the entire question.  This approach is to emit the directory listing as a JavaScript-enhanced HTML page, and do any sorting on the client side.  This means that to an HTTP directory listing request, the response boils down to something like the following:

--- Code: ---FRESULT http_dirlist_response(connection_t *client, const char *path, int (*filter)(FILINFO *))
{
    DIR      dh;
    FILINFO  fi;
    FRESULT  rc;

    /* Open the specified path, to make sure it exists. */
    if (!path || !*path)
        rc = f_opendir(&dh, "/");
    else
        rc = f_opendir(&dh, path);
    if (rc != FR_OK)
        return rc;

    /* Send the header part of the page.
        We assume this ends in the middle of a JavaScript assignment, say
        var listing = [
    */
    rc = http_dirlist_header(client, path);
    if (rc != FR_OK) {
        f_closedir(&dh);
        return rc;
    }

    /* List each directory entry as a JavaScript object */
    while (1) {
        rc = f_readdir(&dh, &fi);
        if (rc != FR_OK || !fi.fname[0])
            break;

        if (filter && !filter(&fi))
            continue;

        rc = http_dirlist_entry(client, &fi);
        if (rc != FR_OK)
            break;
    }

    /* If successful thus far, emit footer part of the page, starting with
            ];
        to end the JavaScript directory entry object array. */       
    if (rc == FR_OK)
        rc = http_dirlist_footer(client, path);

    if (rc != FR_OK) {
        f_closedir(&dh);
        return rc;
    }

    return f_closedir(&dh);
}

--- End code ---
where connection_t is just a handle to the client; it could be anything, really.

The above refers to three other functions:
* http_dirlist_header(connection_t *client, const char *path) - emits the fixed initial part of the HTML directory listing to the user
* http_dirlist_entry(connection_t *client, FILINFO *entry) - emits a single entry in the directory listing to the user
* http_dirlist_footer(connection_t *client, const char *path) - emits the fixed final part of the HTML directory listing to the user

If specified, the filter(FILINFO *entry) function can be used to suppress the listing of specific files, such as hidden files and files used internally by this implementation (for example, the files in the root directory holding the header and the footer of the HTML listing page).  The entries themselves are not sorted, as that is done by the client-side JavaScript.
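Of these, http_dirlist_entry() might look roughly like the sketch below, emitting the JS object format shown earlier; connection_send() is a hypothetical transmit function (returning nonzero on success), not part of any real API:

--- Code: ---#include <stdio.h>
#include "ff.h"

/* Sketch only: emit one directory entry as a JavaScript object literal,
   {n:"NAME",a:ATTR,s:SIZE}, matching the format described earlier.
   connection_t and connection_send() are hypothetical placeholders. */
FRESULT http_dirlist_entry(connection_t *client, FILINFO *entry)
{
    char line[64];
    int  len = snprintf(line, sizeof line, "{n:\"%s\",a:%u,s:%lu},\n",
                        entry->fname,
                        (unsigned)entry->fattrib,
                        (unsigned long)entry->fsize);
    if (len <= 0 || (size_t)len >= sizeof line)
        return FR_INVALID_PARAMETER;  /* entry did not fit in the buffer */
    return connection_send(client, line, (size_t)len) ? FR_OK : FR_DISK_ERR;
}

--- End code ---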

If the footer part contains, say,

--- Code: ---<table>
 <thead>
  <tr>
   <th><a class="th" onclick="return sortType();">Type</a></th>
   <th><a class="th" onclick="return sortName();">Filename</a></th>
   <th><a class="th" onclick="return sortTime();">Modified</a></th>
   <th><a class="th" onclick="return sortSize();">Size</a></th>
  </tr>
 </thead>
 <tbody id="listparent"></tbody>
</table>

--- End code ---
then a simple JavaScript for loop can construct the dynamic HTML (<tr><td>Type</td><td>Name</td><td>Date Time</td><td>Size</td></tr>) for each directory entry in sorted order, combine them into one long string, and (re)display the list using
    document.getElementById("listparent").innerHTML = stringCombiningTheTableRows;

If the header part ends with the JavaScript <script type="text/javascript">var listing = [ and the footer part starts with ];, then defines the four sort functions (each returning false), and finally adds window.addEventListener("load", sortName);, then when the page loads, the file list is initially shown sorted by sortName().

As you can see from the above outline, this is a reasonable solution, pushing the memory-hungry sorting to the client, with minimal server-side processing.  Sure, there are additional functions, but they shouldn't be too large, and one can also drop f_findfirst()/f_findnext() support (by setting FF_USE_FIND=0).
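For reference, FF_USE_FIND is a standard FatFS configuration option in ffconf.h:

--- Code: ---/* In ffconf.h: 0 removes f_findfirst()/f_findnext() from the build,
   saving some code space when only f_readdir() is needed. */
#define FF_USE_FIND 0

--- End code ---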

In a properly designed implementation, however, we'd write the response handler as a coroutine, so that the TCP stack, HTTP server, and/or RTOS used can call it to generate around a packet's worth of additional data to send to the client, instead of the single-threaded one-client-at-a-time approach shown above.  But that gets into some annoyingly complicated stuff (as in, depending on everything else that is being used) that is better discussed in a separate thread.


In case it is not obvious, my intent with these posts is not just to reply to those who replied to me, but hopefully to describe the solutions and approaches in a way that is understandable even to those who much later stumble on this thread, looking for answers to a similar question.  So, if I happen to explain something that you know very well, do not be offended: I'm not assuming you don't understand this, I'm just trying to keep things at a level that anyone interested can follow.  Answering just one person with this kind of verbosity (which I cannot help but do) isn't much fun; but writing it in a way that might help others too makes it worthwhile and fun for me.

Mechatrommer:
sorry, i'm not a FAT expert, but i believe FAT12 is quite a bit simpler than the more modern FAT32 etc. you can lock the disk to avoid new file/dir creation during the record scan/capture process... or embed the pointer array into the FAT library, continually evolving/updating it on every FAT update or new file/dir creation. but that can be a daunting task for some people... ymmv.
