(I had to think for quite a while before responding, because this might be seen as nitpicking, but I'll try to express the point clearly. It is a nuance, but an important one, because it gives a developer the basis to make an informed decision on what they consider reasonable in a given situation.)
write() may be short simply because some internal kernel buffer can't take more input at that time. As of what size the buffers are, who knows. Can you write a gigabyte to /dev/ttyUSB0? No. Can you assume you can write 100 bytes? Why?
No, it won't. File descriptor writes (without O_NONBLOCK) are not limited by the size of kernel buffers. You can absolutely write a gigabyte to /dev/ttyUSB0, and the call should simply block until all the data is written, the device disappears, or an asynchronous signal handler is invoked.
There are a few exceptions, but they are not about kernel buffers per se.
The kernel limits single read and write syscalls to a bit less than two gigabytes (2³¹ bytes). The reason is complicated, but boils down to minimizing bugs due to unstated assumptions in old filesystem drivers. (That is, it is difficult to ensure old filesystem drivers work correctly with requests larger than 32 bits in all cases.)
A file descriptor refers to a file description (an entry in the open file table), which in Linux has an associated set of handlers, a struct file_operations, for a file-like object. Essentially, the write syscall implementation in the kernel (in fs/read_write.c) calls ksys_write(), which calls vfs_write(), which maintains the file position associated with the file description, limits the write size to MAX_RW_COUNT (defined in include/linux/fs.h in the Linux kernel, currently exactly one page less than 2 GiB), verifies the userspace buffer (the source of the data to write) is valid, and then calls the filesystem/device specific ->write() handler. Note that it is up to the filesystem/device specific read/write handlers to honor the O_NONBLOCK flag, too!
Now, because the Linux kernel dictators are pretty big on the principle of least surprise, filesystem drivers should fulfill the entire write (and read). But the kernel does not verify that they do. So, the expectation is that drivers fulfill the entire request, but there is no enforcement or guarantee. If a filesystem/device does return a short count when it technically could service the entire operation, the maintainers will accept a clean patch to fix that. However, do we know that there cannot be any legitimate reason for a device or filesystem to return a short count?
(I've been considering implementing a USB serial driver with pipe/socket semantics, bypassing the TTY layer. The O_NONBLOCK semantics are clear and easy; it's the blocking syscall semantics that give me pause: most userspace programmers use the simplest, least amount of code to get things done, and reality rules. I'd like to return short counts if the device NAKs a USB packet, but will that throw off the simple userspace code?)
So, it boils down to:
Should short reads/writes happen with blocking reads and writes? No, they shouldn't.

Can short reads/writes happen with blocking reads and writes? Yes, they can. There is no code in the Linux kernel or the C library that verifies they do not happen. The code implementing them should not return short counts for blocking reads and writes, but technically they could occur, depending on the kernel driver implementing the object you are reading from/writing to.
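Given that short counts could technically occur, the defensive userspace pattern is simply to loop. A minimal sketch (write_all is a hypothetical helper name, not from the discussion above) that retries on short writes and on EINTR:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all of buf, looping over short counts and retrying on EINTR.
   Returns 0 on success, -1 with errno set on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n > 0) {
            /* Possibly a short write: advance and continue. */
            p += n;
            len -= (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;          /* Interrupted by a signal handler; retry. */
        } else {
            if (n == 0)
                errno = EIO;   /* Should not happen for len > 0. */
            return -1;
        }
    }
    return 0;
}
```

Code like this costs a few extra lines but works identically whether the underlying driver fulfills the entire request in one call or not.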