Author Topic: Glibc read/write operations  (Read 2073 times)


Offline KarelTopic starter

  • Super Contributor
  • ***
  • Posts: 2217
  • Country: 00
Glibc read/write operations
« on: March 29, 2022, 09:48:17 am »
Cppcheck reports:

"Read and write operations without a call to a positioning function (fseek, fsetpos or rewind) or fflush in between result in undefined behaviour"

Is this true for glibc when using functions like fgetc() and fputc(), etc?

If you believe this is true for glibc, please show me where this is written in the documentation of glibc because I can't find it.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: nz
Re: Glibc read/write operations
« Reply #1 on: March 29, 2022, 10:38:50 am »
Presumably that means switching from read to write or vice versa.

It makes sense. There can be independent buffering on each I/O direction that needs to be synchronised.

I learned back on the PDP-11 that it was necessary to call flush on stdio even for simple programs such as 'print "What is your name?: ";input $name'. There are probably implementations where it is not necessary for disk files, but it makes perfect sense that there will be some where it is.
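
In stdio terms that old pattern boils down to something like this minimal sketch (flushing the prompt before blocking on input):

    #include <stdio.h>

    int main(void)
    {
        char name[64];

        /* Without the flush, a fully buffered stdout (e.g. when redirected)
           may not show the prompt before the program blocks on input. */
        printf("What is your name?: ");
        fflush(stdout);

        if (scanf("%63s", name) == 1)
            printf("Hello, %s!\n", name);
        return 0;
    }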

Apparently this is in C99, but the standard is not free and I don't have a copy. You can, however, do a Google search for "output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function" and you will find it quoted in many sources, including Microsoft and other vendors.
 

Offline KarelTopic starter

  • Super Contributor
  • ***
  • Posts: 2217
  • Country: 00
Re: Glibc read/write operations
« Reply #2 on: March 29, 2022, 11:31:46 am »
Thank you brucehoult, I understand what you mean.

However, from a technical point of view, is it necessary to flush or use any of the file-positioning functions when mixing input and output with glibc?
 

Offline KarelTopic starter

  • Super Contributor
  • ***
  • Posts: 2217
  • Country: 00
Re: Glibc read/write operations
« Reply #3 on: March 29, 2022, 11:35:03 am »
I want to clarify that my question isn't about the Cxx standard. Instead it's about how it's implemented in glibc.
 

Offline Ian.M

  • Super Contributor
  • ***
  • Posts: 12856
Re: Glibc read/write operations
« Reply #4 on: March 29, 2022, 01:58:45 pm »
Compare the text in the man page for glibc fopen():
Quote
       Reads and writes may be intermixed on read/write streams in any
       order.  Note that ANSI C requires that a file positioning
       function intervene between output and input, unless an input
       operation encounters end-of-file.  (If this condition is not met,
       then a read is allowed to return the result of writes other than
       the most recent.)  Therefore it is good practice (and indeed
       sometimes necessary under Linux) to put an fseek(3) or fgetpos(3)
       operation between write and read operations on such a stream.
       This operation may be an apparent no-op (as in fseek(..., 0L,
       SEEK_CUR) called for its synchronizing side effect).
and that in the C99 standard (draft):
Quote from: '7.19.5.3 The fopen function', clause 6
6 When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

I'd read that as: If you don't flush or file position, results of read after write may be unpredictable on *SOME* systems . . .
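
For example, on an update-mode stream the pattern would look roughly like this (a sketch only; the file name and data are made up, and error checks are abbreviated):

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("data.bin", "r+b");   /* hypothetical file */
        if (!f)
            return 1;

        int c = fgetc(f);                     /* read ... */
        fseek(f, 0L, SEEK_CUR);               /* read -> write: positioning call required */
        fputc('X', f);                        /* ... write ... */
        fflush(f);                            /* write -> read: fflush (or a seek) required */
        c = fgetc(f);                         /* ... read again */

        (void)c;
        fclose(f);
        return 0;
    }
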
« Last Edit: March 29, 2022, 02:15:29 pm by Ian.M »
 
The following users thanked this post: Karel, SiliconWizard

Offline Ed.Kloonk

  • Super Contributor
  • ***
  • Posts: 4000
  • Country: au
  • Cat video aficionado
Re: Glibc read/write operations
« Reply #5 on: March 29, 2022, 02:04:08 pm »
Philosophical question incoming...Warning!!

Sorry for the hijack but what is the goal here? Is your code supposed to tell the r/w file pointer what it should be or is the lib supposed to tell you where the file pos pointer is?

Who is supposed to be in charge?
iratus parum formica
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: nz
Re: Glibc read/write operations
« Reply #6 on: March 29, 2022, 02:24:48 pm »
Philosophical question incoming...Warning!!

Sorry for the hijack but what is the goal here? Is your code supposed to tell the r/w file pointer what it should be or is the lib supposed to tell you where the file pos pointer is?

Who is supposed to be in charge?

You can do a relative seek of 0 from the current position, so you don't have to know the actual position.
 
The following users thanked this post: Ed.Kloonk

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: nz
Re: Glibc read/write operations
« Reply #7 on: March 29, 2022, 02:29:28 pm »
I want to clarify that my question isn't about the Cxx standard. Instead it's about how it's implemented in glibc.

Read the code?

I don't know why you'd want to write code that doesn't do a seek() or flush(), knowing that it will work on glibc (if that is so), but WILL fail on other libc implementations. If no synchronisation is actually needed then a seek() will probably be almost free anyway.
 
 

Offline dave j

  • Regular Contributor
  • *
  • Posts: 128
  • Country: gb
Re: Glibc read/write operations
« Reply #8 on: March 29, 2022, 02:52:47 pm »
I want to clarify that my question isn't about the Cxx standard. Instead it's about how it's implemented in glibc.

Which version of glibc?

brucehoult has mentioned the code failing on other libc implementations, but there's no guarantee that the behaviour won't change between glibc versions either.
I'm not David L Jones. Apparently I actually do have to point this out.
 

Offline KarelTopic starter

  • Super Contributor
  • ***
  • Posts: 2217
  • Country: 00
Re: Glibc read/write operations
« Reply #9 on: March 29, 2022, 03:01:23 pm »
Compare the text in the man page for glibc fopen()

Thanks! Exactly the answer I was looking for.
I already checked the man pages for fputc(), fgetc(), fseek(), etc. except fopen()...  :-[



 

Offline KarelTopic starter

  • Super Contributor
  • ***
  • Posts: 2217
  • Country: 00
Re: Glibc read/write operations
« Reply #10 on: March 29, 2022, 03:05:02 pm »
I don't know why you'd want to write code that doesn't do a seek() or flush(), knowing that it will work on glibc (if that is so), but WILL fail on other libc implementations.

Because glibc is a requirement.

Anyway, I'll consider calling fseek(..., 0L, SEEK_CUR) when changing I/O direction.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6253
  • Country: fi
    • My home page and email address
Re: Glibc read/write operations
« Reply #11 on: March 29, 2022, 04:55:24 pm »
Just to be clear: this entire discussion only applies when doing reads and writes to the same stream.

Between a write and a read, you want to do a fflush(stream). This ensures that the writes are visible to future reads.

Between a read and a write, you want to do a fseek(stream, 0, SEEK_CUR). This ensures that the C library is prepared for either input or output to that offset in the stream.

In general, I would recommend using a tristate flag (some signed integer type, say int_fast8_t) that is initialized to zero, set to positive after a read, and set to negative after a write.
Before a read, check the flag. If it is negative, do a fflush(stream).
Before a write, check the flag. If it is positive, do a fseek(stream, 0, SEEK_CUR).
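
A compact sketch of that flag scheme (the wrapper names are purely illustrative, and a per-stream flag would be needed if more than one stream is involved):

    #include <stdio.h>
    #include <stdint.h>

    static int_fast8_t last_op = 0;   /* 0 = none, >0 = read, <0 = write */

    static size_t flagged_read(void *buf, size_t size, size_t count, FILE *stream)
    {
        if (last_op < 0)                      /* previous operation was a write */
            fflush(stream);
        size_t n = fread(buf, size, count, stream);
        last_op = 1;
        return n;
    }

    static size_t flagged_write(const void *buf, size_t size, size_t count, FILE *stream)
    {
        if (last_op > 0)                      /* previous operation was a read */
            fseek(stream, 0L, SEEK_CUR);
        size_t n = fwrite(buf, size, count, stream);
        last_op = -1;
        return n;
    }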



Because I do not want my programs to destroy my existing data, I do prefer programs to write the modified data to a new file, and only if no errors occurred writing the new file, replace the old file with the new file (e.g. using rename()).  In Linux (and all other POSIXy systems), files can always be renamed or replaced with new ones, even if they are in use, because of the dnode (file or directory name) and inode (contents and metadata) separation.  Any process having the old file open will see the old contents, and the disk space will only be freed after the last handle closes.
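
The save path then looks roughly like this (a sketch; save_file() and the paths are hypothetical, and crash durability would additionally need fsync()):

    #include <stdio.h>

    /* Write new contents to a temporary file, and only replace the original
       if everything succeeded. */
    int save_file(const char *path, const char *tmp_path, const char *data)
    {
        FILE *out = fopen(tmp_path, "w");
        if (!out)
            return -1;

        if (fputs(data, out) == EOF || fflush(out) == EOF || ferror(out)) {
            fclose(out);
            remove(tmp_path);
            return -1;
        }
        if (fclose(out) == EOF) {
            remove(tmp_path);
            return -1;
        }
        return rename(tmp_path, path);        /* replaces the old file */
    }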

Because of this, the pattern of mixing reads and writes to the same stream is actually rather rare.
 
The following users thanked this post: newbrain, DiTBho

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14464
  • Country: fr
Re: Glibc read/write operations
« Reply #12 on: March 29, 2022, 05:23:09 pm »
Because of this, the pattern of mixing reads and writes to the same stream is actually rather rare.

I've done this pretty rarely myself too, but it's probably not that rare if you're using files as some kind of database, like SQLite does.
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Glibc read/write operations
« Reply #13 on: March 29, 2022, 06:31:23 pm »
Because of this, the pattern of mixing reads and writes to the same stream is actually rather rare.

Not often, but it happens with one of the applications I linked against my B*tree library.

It's a side effect caused by its disk-backed virtual memory: the B*tree tries to allocate a new block in the pool, but when the pool is full it looks for the least-used block and flushes it back to disk, then loads the new block from disk into RAM.

Here we are, the pattern of behaviour looks like { seek(iblock1), write, seek(iblock2), read } on the same stream  :o :o :o
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6253
  • Country: fi
    • My home page and email address
Re: Glibc read/write operations
« Reply #14 on: March 29, 2022, 07:18:31 pm »
Whenever I need both read and write access to the same file, I use the lower-level <unistd.h> POSIX interfaces –– open(), read()/pread()/readv(), write()/pwrite()/writev(), fsync()/fdatasync(), close(), with advisory record locking via fcntl() –– that use file descriptors, and have a very different set of rules.

For database stuff, I use POSIX memory mapping with the Linux-specific MAP_NORESERVE, so that the size of the mapping is not limited to the size of available RAM and swap, but can exceed a terabyte on 64-bit architectures, while consuming only moderate amounts of actual RAM (kernel page tables in particular; those have to stay in memory).  The way memory maps are implemented in Linux means that all normal file accesses (that go through the page cache; i.e. everything except open(path, O_DIRECT|...)) are in sync with the memory map, which is quite useful.
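
A stripped-down sketch of that kind of mapping (map_whole_file() is just an illustrative name; MAP_NORESERVE is Linux-specific as noted above):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map an existing data file read-write, shared with normal file I/O. */
    void *map_whole_file(const char *path, size_t *length)
    {
        int fd = open(path, O_RDWR);
        if (fd == -1)
            return NULL;

        struct stat st;
        if (fstat(fd, &st) == -1) {
            close(fd);
            return NULL;
        }

        void *base = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_NORESERVE, fd, 0);
        close(fd);                            /* the mapping keeps the file referenced */
        if (base == MAP_FAILED)
            return NULL;

        *length = (size_t)st.st_size;
        return base;
    }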

With the low-level I/O, taking an exclusive lock on the region to be written, writing the data in a loop (noting that each write call may be short, i.e. write less than requested), and then releasing the record lock makes for very robust file access even when multiple processes access that same file.  When the file is stored on a shared volume, as long as it is configured to support file locking, the same automatically works even across machines.
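
Sketched out, the locked-write part could look like this (locked_write() is a made-up name; error handling is abbreviated):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Take an advisory write lock on the region, write it in a loop because
       write calls may be short, then release the lock. */
    ssize_t locked_write(int fd, const void *data, size_t len, off_t offset)
    {
        struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = offset, .l_len = (off_t)len };
        if (fcntl(fd, F_SETLKW, &lk) == -1)
            return -1;

        const char *p = data;
        size_t done = 0;
        while (done < len) {
            ssize_t n = pwrite(fd, p + done, len - done, offset + (off_t)done);
            if (n == -1 && errno == EINTR)
                continue;
            if (n <= 0)
                break;
            done += (size_t)n;
        }

        lk.l_type = F_UNLCK;                  /* release the record lock */
        fcntl(fd, F_SETLK, &lk);
        return (ssize_t)done;
    }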

I've even used POSIX fcntl() file leases, for the case when an untrusted black box process occasionally opens an important file and sometimes scribbles all over it, to grab a copy of the contents of the file prior to that access.  It has its limitations, and nowadays the Linux fanotify provides a better interface for it; just don't confuse that with inotify, which provides only filesystem events, and not access interception capabilities.

However, this is quite POSIX-specific, and thus not really portable (to Windows; everything else is more or less POSIX-y already), unlike the standard C <stdio.h> I/O.

The one function I wish that C or POSIX would adopt, that already is available in Linux and BSDs, is asprintf().  It is so nice to not worry about the buffer size, and have it just allocate one dynamically as needed.  It can be implemented in terms of snprintf() by "printing" it twice: the first time to find out the size, and the second time if the initial dynamic buffer size guess was wrong.  A "native" implementation is usually more efficient than that.  It would be even better if the interface was ssize_t msprintf(char **dataptr, size_t *sizeptr, const char *format, ...) similar to getline(), so that one could reuse the same buffer but just have the print function reallocate it as needed.  But me and the C and POSIX standards folks are not on speaking terms  :-[.
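
The emulation goes roughly like this (my_asprintf() is of course not the real glibc function, just an illustration):

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* "Print" once to measure, allocate, then print again into the buffer. */
    int my_asprintf(char **out, const char *format, ...)
    {
        va_list args;

        va_start(args, format);
        int len = vsnprintf(NULL, 0, format, args);
        va_end(args);
        if (len < 0)
            return -1;

        char *buf = malloc((size_t)len + 1);
        if (!buf)
            return -1;

        va_start(args, format);
        vsnprintf(buf, (size_t)len + 1, format, args);
        va_end(args);

        *out = buf;
        return len;
    }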

For those writing tools that read file names or paths from a stream, I recommend supporting the nul ('\0') separator (similar to xargs -0) among the more commonly used options.  You then read the names using POSIX getdelim(&lineptr, &sizeptr, '\0', stream), until it returns negative.  (At that point, check that feof(stream) is true and ferror(stream) is false; otherwise, a read error occurred.)  This way, you get the file names in the exact same format the Linux kernel uses them, so all possible file names (including those with newlines and such in their names) will work without issues.
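
Reading such a stream then looks roughly like this (a sketch; process_name() stands in for whatever the tool actually does with each name):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    extern void process_name(const char *name);   /* hypothetical */

    int read_nul_separated(FILE *stream)
    {
        char   *name = NULL;
        size_t  size = 0;
        ssize_t len;

        while ((len = getdelim(&name, &size, '\0', stream)) != -1)
            process_name(name);                   /* name is '\0'-terminated */
        free(name);

        /* On -1, it must be a clean end-of-file, not a read error. */
        return (feof(stream) && !ferror(stream)) ? 0 : -1;
    }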

I could go on, and explain why the common tutorial stuff like opendir()/readdir()/closedir() is horrible (because scandir(), glob(), wordexp(), and nftw() exist), but yeah.  There is a reason for everything I suggest, and I'll be happy to describe those reasons, if anyone just asks.  My opinion is worth nothing, but those reasons, they can be discussed rationally.
 
The following users thanked this post: DiTBho

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Glibc read/write operations
« Reply #15 on: March 31, 2022, 08:58:02 am »
For database stuff, I use POSIX memory mapping with the Linux-specific MAP_NORESERVE, so that the size of the mapping is not limited to the size of available RAM and swap, but can exceed a terabyte on 64-bit architectures, while consuming only moderate amounts of actual RAM (kernel page tables in particular; those have to stay in memory).  The way memory maps are implemented in Linux means that all normal file accesses (that go through the page cache; i.e. everything except open(path, O_DIRECT|...)) are in sync with the memory map, which is quite useful.

Yes, this is the best choice for an application that needs to run natively on Linux. In my case, Linux is a kind of test bed for software running on a bare-metal system.

Basically, I manually implemented a mini virtual-memory engine, something the Linux kernel offers for free, as you pointed out. It's just that I have seen this coding model applied on Haiku-OS and VxWorks, probably because Linux offers features that are lacking on other UNIX-like operating systems  :-//
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14464
  • Country: fr
Re: Glibc read/write operations
« Reply #16 on: March 31, 2022, 05:41:06 pm »
For database stuff, I use POSIX memory mapping with the Linux-specific MAP_NORESERVE,(...)

Yes, that would be the best approach, but it's not fully portable...
Speaking of SQLite, I've used it, but I admit I haven't taken a look at how they do it.

For performance reasons, the "best" approach is probably to use POSIX mmap for platforms supporting it, and an alternative for other platforms.
 
The following users thanked this post: DiTBho

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6253
  • Country: fi
    • My home page and email address
Re: Glibc read/write operations
« Reply #17 on: March 31, 2022, 06:31:07 pm »
Basically, I manually implemented a mini virtual-memory engine, something the Linux kernel offers for free, as you pointed out
Right; sometimes the memory mapping approach just isn't valid.  Another example would be a low-powered embedded device that provides access to some large database, where time is not as big a factor as the RAM footprint: then I would use low-level I/O as well.

For database stuff, I use POSIX memory mapping with the Linux-specific MAP_NORESERVE,(...)
Yes, that would be the best approach, but it's not fully portable...
Speaking of SQLite, I've used it, but I admit I haven't taken a look at how they do it.
It is actually quite nice, implementing its own pager (page cache), with memory mapping support on many OSes (even partial maps, not simply "map this entire file" stuff).



For implementing low-level I/O access to a binary database-like file, I do recommend taking a look at the POSIX pread() and pwrite() functions.  They take a file descriptor, pointer to the buffer, the size of that buffer (noting that it is not guaranteed that all of that is read or written!), plus the offset at which the read/write should start.  These do not affect the file position, you see.
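
For instance, reading one fixed-size record at a given index could look like this sketch (the record layout is hypothetical; note the loop, because pread() may return less than requested):

    #include <unistd.h>

    struct record { char data[64]; };         /* hypothetical record layout */

    int read_record(int fd, size_t index, struct record *out)
    {
        char   *dst    = (char *)out;
        size_t  left   = sizeof *out;
        off_t   offset = (off_t)(index * sizeof *out);

        while (left > 0) {
            ssize_t n = pread(fd, dst, left, offset);
            if (n <= 0)                       /* 0 = end of file, -1 = error */
                return -1;
            dst    += n;
            left   -= (size_t)n;
            offset += n;
        }
        return 0;                             /* the file position was never touched */
    }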

For portable code that needs to do reads and writes to the same stream, I would consider implementing wrapper functions
    size_t  file_read(FILE *stream, void *buffer, size_t size, size_t count, off_t offset);
    size_t  file_write(FILE *stream, const void *buffer, size_t size, size_t count, off_t offset);
the functions returning zero with errno set to indicate the error, if an error occurs; otherwise the count of successfully read or written records of size bytes each.

On Linux and Unix systems that do provide unlocked stdio, they can be made thread-safe by locking the stream handle (using flockfile()/funlockfile()).  They'd do an fseek() to the specified offset by default, and the write variant an fflush() afterwards (before releasing the stream handle).
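
One possible shape of the read-side wrapper under those assumptions (a sketch only; fseeko() is the POSIX off_t variant of fseek()):

    #include <stdio.h>
    #include <sys/types.h>

    /* Lock the stream, seek to the requested offset, read, unlock.
       Returns the number of complete records read; on error, 0 with
       errno set by the failing call (check ferror()/feof() as needed). */
    size_t file_read(FILE *stream, void *buffer, size_t size, size_t count, off_t offset)
    {
        size_t n = 0;

        flockfile(stream);
        if (fseeko(stream, offset, SEEK_SET) == 0)
            n = fread(buffer, size, count, stream);
        funlockfile(stream);

        return n;
    }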

On systems that do have the file descriptor abstraction (basically all; Windows just calls them handles instead of descriptors), I would consider
    size_t  fd_read(int desc, void *buffer, size_t size, size_t count, off_t offset);
    size_t  fd_write(int desc, const void *buffer, size_t size, size_t count, off_t offset);
with three different implementations: one for Linux, BSDs, and Unix systems having pread() and pwrite(), one for those that do not, and one for Windows; based on pre-defined compiler macros.  I would also use a compile or run-time option, or perhaps add a sixth flags parameter, so that if desired, the operation takes an advisory record lock via fcntl(); this provides "atomic" accesses (for processes that do take advisory locks; across processes, but not across threads in the same process).
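
A very rough outline of the descriptor-based read variant (the Windows branch is omitted, the _POSIX_VERSION check is only a stand-in for a real feature test, and overflow/short-read handling is left out):

    #include <unistd.h>
    #include <sys/types.h>

    size_t fd_read(int desc, void *buffer, size_t size, size_t count, off_t offset)
    {
        if (size == 0 || count == 0)
            return 0;
    #if defined(_POSIX_VERSION)
        /* Systems with pread(): the file position is left untouched. */
        ssize_t n = pread(desc, buffer, size * count, offset);
    #else
        /* Fallback: seek, then read (moves the file position). */
        ssize_t n = (lseek(desc, offset, SEEK_SET) == (off_t)-1)
                  ? -1 : read(desc, buffer, size * count);
    #endif
        return (n > 0) ? (size_t)n / size : 0;
    }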

As you can see, the POSIX approach yields many more options – even avoiding the file position mess completely – and that's why it is more applicable to mixed read and write accesses to binary data.  Of course, one could say you just switch the set of pitfalls, because short reads and writes are always possible in practice (i.e., you need more than one call in a loop to get all the data you want), and because some systems, like Linux, limit a single read or write call to just under 2 GiB (because of historical bugs in certain filesystem kernel drivers).



Thinking about this further, I would claim that the solution here, in this particular case, is not to add a call before and/or after each standard I/O call, but to create wrapper functions that implement the logical read and write operations one needs.  These wrapper functions need to take care of the fflush()/fseek(), obviously, but the "trickiness" is then restricted to those wrapper functions.

My own mind needs this kind of tool to work well on complex applications and problems.  It not only lets my mind concentrate on the issues at the correct complexity level (from nitty gritty details, to the highest concept level "okay, so how are the users going to do their thang with this app?"), but it also lets me unit test such wrappers, and after testing, trust them.  That means that whenever there is a bug, I have sort-of automatically limited the scope of that bug, simply by observing which function reports the problem first.  (That also means my programs are often full of "unnecessary" error checks, with people offhandedly mentioning that "that call can never fail".  Ha!  In a perfect world, yes.  In my world, everything I touch can fail.  :P)
 
The following users thanked this post: DiTBho

