EEVblog Electronics Community Forum

Electronics => Projects, Designs, and Technical Stuff => Topic started by: Koen on January 26, 2016, 03:02:47 am

Title: Filesystem to cache small files
Post by: Koen on January 26, 2016, 03:02:47 am
Hello,

    I would like to cache 512 8KB files on an SD card. These files are only of interest to the microcontroller and are write-once/read-often. I am currently placing them on a regular FAT32 partition, but each file name is 24 bytes long, which would require long filenames (LFN) or splitting the name across folders. Sadly, tonight's test of creating many folders and looking up files deep in the hierarchy was slow.

    So I would like to use another file system in another partition, possibly one I code myself. I was previously using an SPI flash module with a simple lookup table. I was thinking of shrinking the card's FAT32 partition and recreating the SPI flash scheme in the raw zone. The FAT32 partition should still be accessible by the microcontroller and exposed through USB-MSC. Is there a risk to this? The first thing that comes to mind is that users could re-format the USB-MSC drive as one single partition and overwrite the second "raw zone".

Thank you very much,
Koen
Title: Re: Filesystem to cache small files
Post by: adam1213 on January 26, 2016, 06:40:17 am
What sort of data are you storing?
Is the problem that switching between one file and another is slow? If so, you could try opening all the files you need up front (though you would eventually have to deal with closing them, e.g. on power loss).
Title: Re: Filesystem to cache small files
Post by: free_electron on January 26, 2016, 07:38:43 am
FAT12. Used for floppies.
Title: Re: Filesystem to cache small files
Post by: Psi on January 26, 2016, 07:50:11 am
I'm not sure what your issue is with long filenames, FAT32 supports them.

Here's an interesting idea. You could code the MCU to read/write a zip file containing your 512 files in "store" mode (no compression).

Title: Re: Filesystem to cache small files
Post by: bktemp on January 26, 2016, 08:59:42 am
Quote
Hello,

    I would like to cache 512 8KB files on an SD card. These files are only of interest to the microcontroller and are write-once/read-often. I am currently placing them on a regular FAT32 partition but the single filenames are 24 bytes and would require "Long filename" or to be divided in folders. Sadly, tonight's test of creating many folders and looking up the files deep in the hierarchy is slow.
Why do you need 24 byte long filenames? Can't you store the name somewhere else and use shorter names?

Quote
So I would like to use another file system in another partition, eventually code it. I was previously using an SPI flash module with a simple lookup table. I was thinking of shrinking the card FAT32 partition and recreate the SPI flash principle in the raw zone. The FAT32 partition should still be accessible by the microcontroller and exposed through USB-MSC. Is there a risk to this ? First thing which comes to mind is that users could re-format the USB-MSC drive as one single partition and override the second "raw zone".
Can't you lock the raw sectors in the firmware by simply reporting a smaller size over USB, so the host cannot write into the raw zone? Or create multiple partitions and forward only the main partition to USB?
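
To picture the "smaller size" idea: a USB-MSC host learns the drive size from the SCSI READ CAPACITY (10) reply, which carries the last addressable block and the block size as two big-endian 32-bit values. The sketch below, in Python for trying things out on a PC, builds that reply for a card whose tail is reserved. The function name and the sector counts are made up for illustration.

```python
import struct

SECTOR_SIZE = 512  # bytes per logical block, as on typical SD cards

def read_capacity10_reply(total_sectors, reserved_tail_sectors):
    """Build the 8-byte READ CAPACITY (10) data for the exposed part only."""
    exposed = total_sectors - reserved_tail_sectors
    if exposed <= 0:
        raise ValueError("nothing left to expose")
    last_lba = exposed - 1  # the reply carries the LAST block address, not a count
    return struct.pack(">II", last_lba, SECTOR_SIZE)

# Hypothetical card: 7,744,512 sectors, last 8,192 sectors (4 MB) kept as raw zone.
reply = read_capacity10_reply(total_sectors=7_744_512, reserved_tail_sectors=8_192)
```

The firmware should still bounds-check READ and WRITE commands against the same limit rather than rely on the host alone.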

If that is your problem, I see no difference from writing the data into files, because they will also get erased when the drive is formatted.

If you write the data once, you could emulate the SPI flash style using a single file on the drive that holds your table and all 512 8kB sections. That also simplifies the filesystem: if you only write once and never need to erase individual files, you can create a fixed table at the beginning with the names and start addresses of each data block and simply append the data at the end.
Title: Re: Filesystem to cache small files
Post by: daveatol on January 26, 2016, 09:12:22 am
Can't you just concatenate all 512x8kB files into a single 4MB file on the partition root? Call it "data.bin" or whatever, and just tell the user that it has to be present for your device to work (and that maybe they should back it up). This means you only have to look up one file, seeking should be fast, and its contents can be in any format that you want (so it's similar to the raw format you were using before).
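
A minimal sketch of that single-file layout, written in Python so it can be tried on a PC first. It assumes fixed 8 KB slots addressed by number, so the offset of a block is just `index * 8192`. The helper names are invented for the example; "data.bin" is the name suggested above.

```python
import os
import tempfile

BLOCK_SIZE = 8192   # one cached entry
BLOCK_COUNT = 512   # 512 * 8 KB = 4 MB

def create_cache(path):
    """Create the file at its full size in one go."""
    with open(path, "wb") as f:
        f.truncate(BLOCK_SIZE * BLOCK_COUNT)

def write_block(f, index, data):
    if not 0 <= index < BLOCK_COUNT or len(data) != BLOCK_SIZE:
        raise ValueError("bad index or block size")
    f.seek(index * BLOCK_SIZE)
    f.write(data)

def read_block(f, index):
    if not 0 <= index < BLOCK_COUNT:
        raise ValueError("bad index")
    f.seek(index * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

path = os.path.join(tempfile.mkdtemp(), "data.bin")
create_cache(path)
with open(path, "r+b") as f:          # open once, then only seek
    write_block(f, 125, bytes([0xA5]) * BLOCK_SIZE)
    block = read_block(f, 125)
    untouched = read_block(f, 0)
```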
Title: Re: Filesystem to cache small files
Post by: Koen on January 26, 2016, 10:53:18 am
Hello and thank you for your answers !

Quote
What sort of data are you storing?
The regular FAT32 partition stores a 4GB file of which only a few blocks need to be read by the microcontroller. Once processed, the result is stored in these 8KB files and later retrieved as pre-computed.

Quote
I'm not sure what your issue is with long filenames, fat32 supports them.
The issue is licensing it from Microsoft or having to implement it myself. In which case I'd rather implement the solution below.

Quote
Why do you need 24 byte long filenames? Can't you store the name somewhere else and use shorter names?
The unique identifier of each file is 24 bytes.

Quote
If you write the data once, you could emulate the SPI flash style using a single file on the drive with your table and all 512 8kB sections.
Quote
Can't you just concatenate all 512x8kB files into a single 4MB file on the partition root?
I will try this today.
Title: Re: Filesystem to cache small files
Post by: Koen on January 26, 2016, 01:09:49 pm
Hello, I am coding a test case and a related question popped up. When the Flash memories I used are delivered or later erased, the default unused state of data is 0xFF. To reduce wear from writes, I would code my unused sections as 0xFF. With SD cards, the unused state seems to be announced by the manufacturer in the SCR register and a majority of cards use 0x00. Is this correct ? Thank you !
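
For reference, the field in question is DATA_STAT_AFTER_ERASE, a single bit (bit 55) of the 64-bit SCR that the card returns for ACMD51. A small sketch of decoding it, assuming the firmware has the 8 SCR bytes in the order received (most significant byte first); the function name and the example register contents are made up.

```python
def data_stat_after_erase(scr):
    """Return the byte value (0x00 or 0xFF) the card reports for erased data.

    `scr` is the 8-byte SCR, most significant byte first.
    DATA_STAT_AFTER_ERASE is SCR bit 55, i.e. the top bit of the second byte.
    """
    if len(scr) != 8:
        raise ValueError("SCR is 64 bits")
    return 0xFF if scr[1] & 0x80 else 0x00

# Made-up example contents, not read from a real card.
example_scr = bytes([0x02, 0x35, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00])
erased_value = data_stat_after_erase(example_scr)
```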
Title: Re: Filesystem to cache small files
Post by: tooki on January 26, 2016, 01:11:00 pm
Quote
Why do you need 24 byte long filenames? Can't you store the name somewhere else and use shorter names?
The unique identifier of each file is 24 bytes.
I think you missed the point. You don't need 24 bytes (192 bits) of filename to distinguish just 512 files (it takes only 9 bits). In other words, just use one Table of Contents file (e.g. cache.toc) containing an array that associates each 24 byte UID to a file with a short name (e.g. cache001.dat). This completely avoids the need to store the UID as the file name. Instead of dicking around with Microsoft long file name licensing, just use this lookup array. Easy.
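
A rough sketch of such a table of contents, in Python for PC-side prototyping. The on-disk record layout (24-byte UID followed by a 16-bit file number) and the function names are assumptions made for the example; only the idea of mapping each UID to a short 8.3 name comes from the post above.

```python
import struct

UID_SIZE = 24
ENTRY = struct.Struct(f"<{UID_SIZE}sH")  # 24-byte UID + 16-bit file number

def short_name(number):
    """8.3 name for cache entry `number`, e.g. 1 -> 'CACHE001.DAT'."""
    return f"CACHE{number:03d}.DAT"

def pack_toc(uids):
    """Serialize the table; each UID gets the next file number in order."""
    return b"".join(ENTRY.pack(uid, n) for n, uid in enumerate(uids))

def load_toc(blob):
    """Rebuild the in-RAM lookup: UID -> short file name."""
    return {uid: short_name(n) for uid, n in ENTRY.iter_unpack(blob)}

# Three dummy UIDs; the blob is what would be stored in the TOC file.
uids = [bytes([i]) * UID_SIZE for i in range(3)]
toc_blob = pack_toc(uids)
toc = load_toc(toc_blob)
```

With 512 entries of 26 bytes each, the whole table is about 13 KB.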
Title: Re: Filesystem to cache small files
Post by: Koen on January 26, 2016, 01:40:31 pm
I missed the point indeed, thank you.

Now I wonder if it is better to open a single cache file and seek into it to the 8KB block of interest [1] or to open different files [2].

Code:
open cache file
  store cache index from first 8192 bytes to memory
  for(i from random+0 to random+16)
    seek to location from cache index [i]
    read 8192 bytes of interest
close cache file

Code:
open cache index file
  store cache index to memory
close cache index file

for(i from random+0 to random+16)
   open cache file at path from cache index [i]
     read 8192 bytes of interest
   close cache file

I'll try both but [1] is limited by the pre-allocated size of the cache file and always occupies 4MB, even if nothing is cached.

Your solution, on the other hand, would let the cache grow, limited only by the card size.

I tried to keep the first post simple but there would be multiple caches like this. Some might need only 10 cache files but others could do with 1024 or more.

Thank you.
Title: Re: Filesystem to cache small files
Post by: bktemp on January 26, 2016, 02:05:07 pm
It is impossible to give you an answer without knowing more about the purpose of your cache system.

I would use a single file:
A typical cache should be fast, therefore the index should be kept in ram. I assume you need an index (the 24 byte long name). Maybe you can calculate the index directly from the 24 byte name without any additional table.
From the index you get the address in memory (= in the cache file). If you keep the file open all the time you can seek very fast to a specific location and read the desired entry.
If you allocate the cache file in one go on an empty card it will not be fragmented, so the filesystem does not need to jump around in the cluster chain.
You can also increase the cache file anytime if needed, but it will fragment the file if other files have been created and make it slower.

Opening different files is slower in most cases because the filesystem driver needs to go through the directory list and look for the specific filename. But it highly depends on the filesystem driver and the amount of data it buffers in RAM.
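
One possible reading of "calculate the index directly from the name" is to hash the 24-byte name into a slot number. Since two names can land on the same slot, the sketch below keeps the names in a RAM array and probes linearly on a collision. The CRC32 hash, the class name and the slot count are choices made for this example, not something taken from the thread.

```python
import zlib

SLOTS = 512

class SlotIndex:
    """Open-addressing table: slot number == block number in the cache file."""

    def __init__(self):
        self.uids = [None] * SLOTS  # kept in RAM; None marks a free slot

    def _probe(self, uid):
        start = zlib.crc32(uid) % SLOTS
        for step in range(SLOTS):
            yield (start + step) % SLOTS

    def find(self, uid):
        """Return the slot holding `uid`, or None if it is not cached."""
        for slot in self._probe(uid):
            if self.uids[slot] is None:
                return None
            if self.uids[slot] == uid:
                return slot
        return None

    def insert(self, uid):
        """Claim a slot for `uid` (write-once: entries are never removed)."""
        for slot in self._probe(uid):
            if self.uids[slot] is None or self.uids[slot] == uid:
                self.uids[slot] = uid
                return slot
        raise RuntimeError("cache full")

index = SlotIndex()
slot_a = index.insert(b"A125".ljust(24, b"\0"))
slot_b = index.insert(b"A938".ljust(24, b"\0"))
```

Because the cache is write-once, no entry is ever deleted, which is what keeps plain linear probing this simple.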
Title: Re: Filesystem to cache small files
Post by: Koen on January 26, 2016, 02:40:03 pm
Yes, I will test both and decide. Performance and power consumption are important, and I have a lot more space on the card than I need, so wasting even 64MB per cache, even when empty, isn't a problem.

I have a 4GB file divided in multiple headers and sections (A, B, C, ...). These sections contain subsections (A1, A2.., B1, B2..). On the first call, I have to open/read/compute the main header, the section A header then finally A125, A938, A428, Arandom, ... then store the results in corresponding A125, A938, A428 cache files.

On the next call, I can directly access A125, A938, A428 cache files.

I can split the cache into files of 4MB or more, one per section (A, B, C...). 4MB is enough, but as it is write-once/read-often and 1 GB on disk is free, there is no harm in using a bigger cache, except for the size of the index, which still has to be query-able in RAM.

I have no prior knowledge of the sections needed. 32 subsections of section A might end up read and cached, 1024 of section B, and so on.

I would prefer not to "harm" the card for wear/reliability concerns.

Thank you !
Title: Re: Filesystem to cache small files
Post by: Karel on January 26, 2016, 03:18:47 pm
Quote
The issue is licensing it from Microsoft ...

You don't need a licence for FAT32 if you don't implement the short filenames and only implement the long filenames.
It's what the Linux kernel maintainers did after TomTom got sued by Microsoft for using FAT32.
The patent involved describes a way to combine short and long filenames. Simply not writing the short filenames avoids a patent conflict.
Title: Re: Filesystem to cache small files
Post by: djacobow on January 26, 2016, 04:01:01 pm
Quote
It is impossible to give you an answer without knowing more about the purpose of your cache system.

I would use a single file:
A typical cache should be fast, therefore the index should be kept in ram. I assume you need an index (the 24 byte long name). Maybe you can calculate the index directly from the 24 byte name without any additional table.
From the index you get the address in memory (= in the cache file). If you keep the file open all the time you can seek very fast to a specific location and read the desired entry.
If you allocate the cache file in one go on an empty card it will not be fragmented, so the filesystem does not need to jump around in the cluster chain.
You can also increase the cache file anytime if needed, but it will fragment the file if other files have been created and make it slower.

Opening different files is slower in most cases because the filesystem driver needs to go through the directory list and look for the specific filename. But it highly depends on the filesystem driver and the amount of data it buffers in RAM.

This. Single file with everything catted together, with a lookup table of offsets into the file for the various files. If you want to get really clever, you can put the lookup table at the beginning of the file, too, so it's all just one blob (basically a simple rom-fs). Also, try it out on a PC first so that you can make sure it all works as planned -- much easier to debug there.
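
A PC-side sketch of such a blob, under a few assumptions that are not from the thread: each table entry is a 24-byte UID plus a 32-bit byte offset, an all-zero UID marks an unused entry, and entries are filled in order. `io.BytesIO` stands in for the file on the card, and the table is kept to 16 entries to keep the demo small.

```python
import io
import struct

UID_SIZE = 24
BLOCK_SIZE = 8192
MAX_ENTRIES = 16                         # kept small for the demo
ENTRY = struct.Struct(f"<{UID_SIZE}sI")  # UID + byte offset of its data block
TABLE_SIZE = MAX_ENTRIES * ENTRY.size
FREE_UID = bytes(UID_SIZE)               # assumption: all-zero UID means "unused"

def format_blob(f):
    """Write an empty lookup table at the start of the blob."""
    f.seek(0)
    f.write(bytes(TABLE_SIZE))

def load_table(f):
    """Read the table into RAM as {uid: offset}."""
    f.seek(0)
    raw = f.read(TABLE_SIZE)
    return {uid: off for uid, off in ENTRY.iter_unpack(raw) if uid != FREE_UID}

def append_entry(f, table, uid, data):
    """Append one block, then record it in the next free table entry."""
    if uid == FREE_UID or uid in table:
        raise ValueError("UID is reserved or already cached")
    if len(table) >= MAX_ENTRIES or len(data) != BLOCK_SIZE:
        raise ValueError("table full or wrong block size")
    offset = TABLE_SIZE + len(table) * BLOCK_SIZE
    f.seek(offset)
    f.write(data)                        # data first ...
    f.seek(len(table) * ENTRY.size)
    f.write(ENTRY.pack(uid, offset))     # ... then the entry that points at it
    table[uid] = offset

def read_entry(f, table, uid):
    f.seek(table[uid])
    return f.read(BLOCK_SIZE)

blob = io.BytesIO()                      # stands in for the file on the card
format_blob(blob)
table = load_table(blob)
uid = b"A125".ljust(UID_SIZE, b"\0")
append_entry(blob, table, uid, b"\x42" * BLOCK_SIZE)
reloaded = load_table(blob)              # as if after a reboot
payload = read_entry(blob, reloaded, uid)
```

Writing the data before its table entry means an entry never points at a block that has not been written yet, at least as far as this code is concerned; what the filesystem driver buffers is a separate question.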
Title: Re: Filesystem to cache small files
Post by: Koen on January 26, 2016, 04:23:49 pm
Hello Karel, djacobow

      I've opened another topic earlier about LFN licensing on this same subforum.

      With the previous cache on SPI Flash, I had the available 4MB divided into 512 blocks of 8192 bytes, with the index in the first block and the cached data in the following blocks. I'll reproduce this with a file tonight.
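
A quick arithmetic check that may be worth running before reproducing that layout, assuming each index entry stores the full 24-byte identifier and nothing else (the real entry format is not given in the thread). The helper name is made up.

```python
BLOCK_SIZE = 8192
TOTAL_BLOCKS = 512
UID_SIZE = 24

def index_blocks_needed(total_blocks, entry_size):
    """Smallest number of leading index blocks able to describe all the other blocks."""
    for index_blocks in range(1, total_blocks):
        data_blocks = total_blocks - index_blocks
        if data_blocks * entry_size <= index_blocks * BLOCK_SIZE:
            return index_blocks
    raise ValueError("entries too large for this layout")

entries_per_block = BLOCK_SIZE // UID_SIZE                 # entries that fit in one 8 KB block
needed_24 = index_blocks_needed(TOTAL_BLOCKS, UID_SIZE)    # full 24-byte identifiers
needed_16 = index_blocks_needed(TOTAL_BLOCKS, 16)          # e.g. a shortened 16-byte key
```

Under that assumption one 8 KB block holds 341 entries, so 511 data blocks need two index blocks; a 16-byte key would fit in one.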

Thank you !