Author Topic: What is the HTTP server-client interaction to do file transfers?  (Read 6270 times)

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #50 on: August 01, 2022, 12:11:32 pm »
Quote
Do whatever you wish

You probably come from a different end of things, where you have a proper server to work on, whereas I am having to byte-bang everything at a low level, and with very limited memory.

Quote
the HTTP method (POST or PUT), is irrelevant to the encoding.

AIUI, the data is sent as binary, so it affects only the delimiters of the data, which are messy to parse in my situation, and which using PUT avoids.

Maybe there is something else but I haven't discovered it yet.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline tellurium

  • Regular Contributor
  • *
  • Posts: 226
  • Country: ua
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #51 on: August 01, 2022, 12:31:38 pm »
Quote
AIUI, the data is sent as binary, so it affects only the delimiters of the data, which are messy to parse in my situation, and which using PUT avoids.

You missed the point again.

Those "delimiters" are irrelevant to PUT or POST. What you call "delimiters" is the format of the body; please re-read my message if you're interested. PUT does not "avoid" anything. You can make a PUT request with the same "delimiters", or a POST request without them.

I come from the same end of things: I have dealt with networking in very RAM-constrained environments, more constrained than the STM32F4 you're dealing with.
Open source embedded network library https://mongoose.ws
TCP/IP stack + TLS1.3 + HTTP/WebSocket/MQTT in a single file
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #52 on: August 07, 2022, 08:01:42 pm »
I am coming back to this for some peripheral stuff. My web server is 99% finished, downloading and uploading etc.

What I remain puzzled about is the state of the client after a file transfer is finished.

In both directions, a byte count is sent and the receiving end is using this to find out the end.

If downloading a file (to a client browser), I use

Code: [Select]
// Header for file download
static const uint8_t DOWNLOAD_HEADER[] =
"HTTP/1.1 200 OK\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: attachment\r\n"
"<meta http-equiv=refresh content=1000>\r\n" // cancels out the file listing refresh
"Pragma-directive: no-cache\r\n"
"Cache-directive: no-cache\r\n"
"Cache-control: no-cache\r\n"
"Pragma: no-cache\r\n"
"Expires: 0\r\n"
;

and then send this (not a fixed header because it has to have a date and size in it - the date is actually another story since it is either ignored or the EXIF, if present, is used to stamp the received file, by some browsers... much discussion online about changing this and doing date stamps properly but most browser developers are against it):

Code: [Select]
// send filesize, as "Content-Length: 1910916\r\n"
strcpy((char*)pagebuf, "Content-Length: ");
itoa(fno.fsize,(char*)&pagebuf[16],10); // place size after "Content-Length: "
strcat((char*)pagebuf,"\r\n");
netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

// Send date/time, as "Date: Mon, 21 Oct 2015 07:28:00 GMT\r\n" (HTTP dates end in GMT, not UTC)
// To save the hassle of calculating the day of week (which isn't stored in the directory anyway)
// we use Mon in case the client is validating the presence of the day string ;)
// For jpegs, FF and Edge extract this from exif, interestingly, if header is missing.
// Otherwise browsers appear to ignore this, and same with Last-Modified.

strcpy((char*)pagebuf, "Date: Mon, ");
int monidx = 3*(((fno.fdate >> 5) & 15)-1);
// %02u: HTTP dates need a zero-padded two-digit day
snprintf(datebuf,sizeof(datebuf),"%02u %c%c%c %4u %02u:%02u:%02u GMT",
fno.fdate & 31,
montab[0+monidx],
montab[1+monidx],
montab[2+monidx],
(fno.fdate >> 9) + 1980,
fno.ftime >> 11,
(fno.ftime >> 5) & 63,
2*(fno.ftime & 0x1f));
strcat((char *)pagebuf, datebuf);
strcat((char *)pagebuf, "\r\n\r\n");   // 2xCRLF is the last thing before the binary file data
netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

and then the binary data is sent. Nothing is sent after it, and nothing seems to get to the client even if it is sent, as if the browser has unilaterally closed the connection. But is that really true? What is the actual browser state after a "Content-Disposition: attachment\r\n" download? I am suspecting that the file is treated as within an HTML body, and maybe doing a </body> may do something. It's quite strange. All I want to do at this point is to send the browser to a specific URL on the server, or even to simulate pressing F5 (which is impossible in HTML; only client-side JS can do that).
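
For comparison, here is a minimal sketch of building the whole download header with one bounded snprintf() call instead of separate strcpy/itoa/strcat steps. It is an illustration under assumptions, not a drop-in replacement: the helper names day_of_week and build_download_header are hypothetical, the FAT date/time bit layout is taken from the code above, and the file time is sent as Last-Modified with a computed weekday rather than as Date. FAT timestamps are normally local time, so labelling them GMT is an approximation. "Connection: close" announces that the server will close the connection after the body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Day of week via Sakamoto's method: 0 = Sunday ... 6 = Saturday. */
static int day_of_week(int y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3) y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/*
 * Hypothetical helper (not part of lwIP or FatFs): writes a complete
 * download header into buf. Returns its length, or -1 if it does not fit
 * or the FAT date is invalid. fdate/ftime use the FAT directory bit layout.
 */
static int build_download_header(char *buf, size_t cap, unsigned long fsize,
                                 unsigned fdate, unsigned ftime)
{
    static const char days[] = "SunMonTueWedThuFriSat";
    static const char mons[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    int year = (int)(fdate >> 9) + 1980;
    int mon  = (int)((fdate >> 5) & 15);
    int day  = (int)(fdate & 31);
    if (mon < 1 || mon > 12 || day < 1) return -1;
    int dow = day_of_week(year, mon, day);

    int n = snprintf(buf, cap,
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        "Content-Disposition: attachment\r\n"
        "Content-Length: %lu\r\n"
        "Last-Modified: %.3s, %02d %.3s %04d %02u:%02u:%02u GMT\r\n"
        "Cache-Control: no-cache\r\n"
        "Connection: close\r\n"
        "\r\n",                      /* blank line: binary data follows */
        fsize,
        &days[3 * dow], day, &mons[3 * (mon - 1)], year,
        (ftime >> 11) & 31, (ftime >> 5) & 63, 2 * (ftime & 31));

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The returned length can be passed straight to netconn_write(), so no strlen() pass over the buffer is needed.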

What is clear is that one cannot do a progress bar from the server end, because the binary data is being sent down in the same direction. But browsers give you progress anyway...

If uploading a file to a server, I am using JS to create a PUT (can't be done in HTML in response to a form sent to the client, e.g. returning a TEXTAREA, if done in HTML, goes back as a POST). This all works too, but I struggle to get the server, upon successful receipt of all the bytes, to send back the "200 OK" (or some failure) message. The JS is written to look for it but it isn't getting it.

It is as if, again, a PUT transfer to a server, while obviously keeping alive that data direction during the transfer, has closed the connection in the other direction. Is that possible? I am returning one of these, according to the upload success

Code: [Select]
// Return this to client if file write (textarea file or Upload file) got written ok
static const uint8_t FILE_WRITE_GOOD[] =
"HTTP/1.1 200 OK\r\n"
"Content-type: text/plain\r\n"
"Server: XXXXXX\r\n\r\n"
;


// Return this to client if file write (textarea file or Upload file) failed
static const uint8_t FILE_WRITE_BAD[] =
"HTTP/1.1 413 Write Error\r\n"
"Content-type: text/plain\r\n"
"Server: XXXXXX\r\n\r\n"
;

I can post the JS script but if I did that nobody would reply ;)

It also seems possible to do an HTML-only "progress bar" (even if it is just a string of dots, getting longer) on an Upload, because there is no data going server -> client in that situation. I have a nice progress report from the JS script actually. Has anyone tried this? I realise anything sent to a browser has to be a properly formed HTML page, so you would do <head></head><body> etc and then within the body you would emit some dots, or numbers, etc and then </body>. Why would this not work? In HTML it isn't possible to do a "clear screen" (again, lots of people have asked this, and the answer seems to be that the HTML for a fresh page does a CLS implicitly) and it isn't possible to do a "cursor home" (CR, no LF) for the same reason. But you could output a string of dots, say one every 10k bytes received. What is wrong with that? I have seen such things but they may have been done with JS.

One final thing I am seeing on the uploads is that if uploading say a 2MB file, the progress goes to ~500k immediately and then increments at the expected speed (30kbytes/sec flash writing) until 100% done. And then when it gets to 100% on the client, I see flash writing carrying on for some tens of seconds. This looks like 500k of transmit caching in the PC client stack. Is that really possible?

This is the relevant bit of JS

Code: [Select]
function updateProgress(evt)
{
   if (evt.lengthComputable)
   {
     document.getElementById("progress").innerHTML = evt.loaded + " of " + evt.total + ", " + (evt.loaded/evt.total*100).toFixed(1) + "%";
   }
}

function uploadFile()
{
    const fileInput = document.getElementById('file');

    if (fileInput.files.length == 0) {
        alert("Select a file to upload first");
        return;
    }

    document.getElementById("submit").disabled = true;

    const fileReader = new FileReader();
    fileReader.addEventListener("load", function (e) {
        const rawData = e.target.result;
        const putRequest = new XMLHttpRequest();
        putRequest.open("PUT", "/ufile=" + fileInput.files[0].name);
        putRequest.upload.addEventListener("progress", updateProgress, false);
        putRequest.addEventListener("load", function (f) {
            if (putRequest.status == 200 || putRequest.status == 201) {
                document.getElementById("progress").innerHTML = '';
                alert("Upload succeeded");
                history.back();
            } else {
                alert("Upload failed with code " + putRequest.status);
                document.getElementById("progress").innerHTML = 'Upload failed.';
                document.getElementById("submit").disabled = false;
            }
        });
        putRequest.send(rawData);
    });
    fileReader.readAsArrayBuffer(fileInput.files[0]);
}

Quite a lot of people have posted this kind of thing online, almost always without any resolution, because nobody seems to be doing this stuff at the byte level.
« Last Edit: August 07, 2022, 08:10:33 pm by peter-h »
 

Offline mariush

  • Super Contributor
  • ***
  • Posts: 4982
  • Country: ro
  • .
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #53 on: August 07, 2022, 08:43:52 pm »
Jesus again with the same questions.

You don't seem to understand some fundamental things.

Once a page is loaded and parsed, it's like it's an island, with its own universe.
When you click on a link,  it's like the browser will create a new island, with its own separate rules.  Sending the meta refresh in the response header not only is invalid, incorrect, stupid, but it would not work, because NOTHING sent in this new island, in this new instance, whatever you want to call it, will happen on the original page.
The original page will continue to refresh inside the browser, if the user opens the download link in a new tab or if you use content disposition attachment . If you use inline download, the active page is killed and the browser reuses the page for download.

Then it's something super basic.

Browser creates a connection to your web server
Browser sends the request headers:
GET /page-name HTTP/1.1
Host: value
Key: value

until it sends an empty line.
These "Key: Value" pairs may tell the web server that the browser is capable of doing some things. For example, "Connection: keep-alive" tells the web server "Hey server, I'm smart enough to reuse the established connection to request more files, so if you agree with my request, leave the connection open and I may request more files on this same connection".
 
Then, it waits for a response from you, the web server.

The web server's job is to parse these request headers and respond with a suitable response header, and then serve the content of the page, and then close the connection UNLESS you chose to support features like the "Connection: keep-alive" I mentioned.
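
The request parsing described above could be sketched in C roughly as follows. This is only an illustration under assumptions: both helper names are hypothetical, the buffer sizes are arbitrary, and it assumes the received bytes have been collected into one NUL-terminated buffer. The path "/ufile=..." in the usage note is the one from the upload script earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns the offset of the first body byte (just past
 * the empty line that ends the headers), or -1 if the empty line has not
 * arrived yet and more data must be received first.
 */
static long find_body_offset(const char *buf)
{
    const char *end = strstr(buf, "\r\n\r\n");
    return end ? (long)(end - buf) + 4 : -1;
}

/*
 * Hypothetical helper: splits the request line "METHOD target HTTP/1.x"
 * into method and target. Returns 0 on success, -1 if the line is malformed.
 */
static int parse_request_line(const char *buf, char method[8], char target[64])
{
    char version[16];
    /* field widths keep sscanf inside the caller's buffers */
    if (sscanf(buf, "%7s %63s %15s", method, target, version) != 3)
        return -1;
    return strncmp(version, "HTTP/1.", 7) == 0 ? 0 : -1;
}
```

For a request starting "PUT /ufile=test.bin HTTP/1.1", parse_request_line() yields method "PUT" and target "/ufile=test.bin", and find_body_offset() tells the server where the uploaded bytes begin.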

In your case, you should say "Connection: close" in your response header, which tells the browser "No dude, I don't feel like serving multiple requests / giving you multiple files on the same connection, as soon as I'm done transferring this page/download you SHOULD close the connection - I may or may not close the connection from my end but either way I'm gonna ignore whatever further requests come on this connection".


A download is a download, it has nothing to do with HTML tags, you don't add  body tags, you don't add html tags ... you send the response headers and you dump the binary content of the file, and once it's done you should CLOSE THE CONNECTION.


again

"<meta http-equiv=refresh content=1000>\r\n"   // cancels out the file listing refresh

in the DOWNLOAD_HEADER  IS NOT PROPER , it's garbage, any browser should ignore it, or even go further and reject your whole page and not load it, it's malformed response header. Everything after HTTP/1.1 200 OK  should be Key : Value pairs ... pay attention how that line is not a key : value pair.   

And I think I've said it in a previous post.  Your web server should not look into pictures for exif information and to extract dates from there... it's a sure way to introduce vulnerabilities in your project.
Your download code doesn't even have to send that Date: whatever pair.


You should NOT use Javascript to show progress bars and show dots or whatever, first do it the regular POST style which is the easiest and then complicate your life with these other things. Use Javascript for client-side only stuff that won't affect you, like sorting file lists in the browser, or retrieving the file list through some super basic API , instead of using meta refresh or some other shit like that. 

Figure out  multiple independent connections at the same time ... access your device in several tabs at the same time, and click on various parts of the interface in each time, do they get mixed or what happens... what happens when you click on a link in a tab, while you download a file in another tab?
These are things you have to fix first and then worry about more features.





« Last Edit: August 07, 2022, 08:46:08 pm by mariush »
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3032
  • Country: us
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #54 on: August 07, 2022, 11:02:40 pm »
Regarding how to handle form submissions and file uploads, you might find this article on the PRG pattern (or "redirect after post") helpful:

https://en.wikipedia.org/wiki/Post/Redirect/Get


 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #55 on: August 08, 2022, 08:51:43 am »
Quote
Jesus again with the same questions.
You don't seem to understand some fundamental things.

No need to be rude.

Quote
Then it's something super basic.

If you are super clever.

Quote
A download is a download, it has nothing to do with HTML tags, you don't add  body tags, you don't add html tags ... you send the response headers and you dump the binary content of the file, and once it's done you should CLOSE THE CONNECTION.

The Q I asked, which you didn't answer, is what state the browser is in. Has it closed the connection?

Quote
"<meta http-equiv=refresh content=1000>\r\n"   // cancels out the file listing refresh

Yes this was in the wrong place; it has to be inside <head>, not <body>. I actually removed these because they were not needed; the change of context (e.g. invoking
a textarea) stopped the refresh.

Quote
Your web server should not look into pictures for exif information and to extract dates from there... it's a sure way to introduce vulnerabilities in your project.
Your download code doesn't even have to send that Date: whatever pair.

You misread. It is the browser which looks for EXIF and uses it to create a date for the downloaded file.

This file on the server



gets saved in Chrome or FF (not Edge) as



If you think I am stupid, try it. It gets it out of EXIF (not spent time testing which of the three dates does it, or whether browsers do it on TIFF files also):



Quote
You should NOT use Javascript to show progress bars and show dots or whatever, first do it the regular POST style which is the easiest and then complicate your life with these other things. Use Javascript for client-side only stuff that won't affect you, like sorting file lists in the browser, or retrieving the file list through some super basic API , instead of using meta refresh or some other shit like that.

That's your opinion. If writing a 2MB file, to a flash filesystem whose write speed is 30kbytes/sec, a progress bar is highly desirable. You also didn't comment on whether during an upload, textual data can be sent to the browser.

Quote
Regarding how to handle form submissions and file uploads, you might find this article on the PRG pattern (or "redirect after post") helpful:
https://en.wikipedia.org/wiki/Post/Redirect/Get

Thanks - that's really clever :) Somebody sent me this cunning trick but I didn't get around to testing it because I had already moved to a JS version

Code: [Select]
static const unsigned char GOTO_URL[] =
"<body><script>\r\n"
"window.location.replace(\"/xxxxxx.html\")\r\n"
"</script>\r\n"
"</body>"
;

The only drawback is that the above JS has to be inside the <body> context. You can't just send it to a browser, presumably because once a browser has seen </html>, or finished downloading a file, it closes the connection. So any "go to URL" done at that point has to be initiated by the browser, which implies a) user doing it, or b) some JS doing it.

---

If anyone has any idea why uploads run instantly to ~520k and then run at the target data acceptance speed, it would be interesting. It is as if, on an upload, the browser does a malloc() of 520k, fills that up (instantly) and feeds data out of that to the windows tcp/ip socket. So an upload progress report looks a bit silly, but more importantly the data keeps flowing out of the browser to the target for some tens of seconds after the browser has finished. That said, Edge does not do the foregoing at all; it first sends the file header (ending with CRLFCRLF) as a single packet, and then sends the file out without caching!


« Last Edit: August 08, 2022, 12:28:21 pm by peter-h »
 

Offline httpmies

  • Newbie
  • Posts: 2
  • Country: fi
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #56 on: August 08, 2022, 03:41:58 pm »
Quote
the date is actually another story since it is either ignored or the EXIF, if present, is used to stamp the received file, by some browsers... much discussion online about changing this and doing date stamps properly but most browser developers are against it)

The Date header tells the browser when the *message* was produced. In this context, "message" refers to the HTTP response which contains the data in the body. Thus, the date is not indicative of when the file was modified, but of when the HTTP response transferring that file was generated. This header is useful for caching proxies and other middleware on the internet, but not so useful to you.

Quote
Nothing is sent after it, and nothing seems to get to the client even if it is sent, as if the browser has unilaterally closed the connection. But is that really true? What is the actual browser state after a "Content-Disposition: attachment\r\n" download?

Content-Disposition: attachment simply instructs the browser that the response body should be downloaded, so browsers show a "Save as" dialog. The default is inline, which indicates that the body should be displayed as a web page or inside the web page. The browser state after such a download is the same as after any other HTTP response, which is "it depends".

The browser will usually want to keep the TCP connection alive after a request, as it's probable that there will be further requests. When using HTTP/1.1, keeping the connection open is the default for requests. This can be controlled with the header Connection. For HTTP/1.0 requests, the default is Connection: close, meaning that the browser will close the TCP connection after each request. Although in practice the browser may pre-emptively open another TCP connection.
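
As a rough sketch of those defaults, a small server could decide whether to leave the TCP connection open like this. The helper names are hypothetical, and the matching is deliberately simplistic: it assumes a single space after "Connection:" and a single token as the value, which is not everything the spec allows. A server that never wants to reuse connections can skip this and always answer with "Connection: close".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring search; returns NULL if needle is absent. */
static const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay; hay++) {
        size_t i = 0;
        while (i < n && hay[i] &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n) return hay;
    }
    return NULL;
}

/*
 * Hypothetical helper: given the request head (request line plus headers),
 * returns 1 if the client expects the connection to stay open, else 0.
 * An explicit Connection header wins; otherwise HTTP/1.1 defaults to
 * keep-alive and anything older defaults to close.
 */
static int client_wants_keep_alive(const char *head)
{
    int http11 = strstr(head, " HTTP/1.1\r\n") != NULL;
    if (find_nocase(head, "\r\nConnection: close")) return 0;
    if (find_nocase(head, "\r\nConnection: keep-alive")) return 1;
    return http11;
}
```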

If you're interested in seeing how your browser behaves on the TCP level, you could download Wireshark and do some packet captures. If you try on a server running on your localhost, choose the loopback interface. If you try on a server running in your LAN or on the internet, choose the network interface which you use to connect to those, and set "tcp" as your capture filter. I took a screenshot of a Wireshark capture of a server running locally which keeps the connections alive for 5 seconds after each request. See attachment.

However, "when will the browser close the connection" is also related to the question of "when will the browser interpret the response to have ended", and the Connection header is only one part of it.

Quote
The only drawback is that the above JS has to be inside the <body> context. You can't just send it to a browser, presumably because once a browser has seen </html>, or finished downloading a file, it closes the connection.

So, when will the browser interpret the response to have ended? RFC-9112 says the following:
Quote
2.2. Message Parsing
The normal procedure for parsing an HTTP message is to read the start-line into a structure, read each header field line into a hash table by field name until the empty line, and then use the parsed data to determine if a message body is expected. If a message body has been indicated, then it is read as a stream until an amount of octets equal to the message body length is read or the connection is closed.
Emphasis mine. The browser will wait for more content until it reaches the length specified by the Content-Length header, or until the TCP connection is closed. The Content-Length header is also related to the Transfer-Encoding header, as the former is useless when using chunked encoding (which you could use to drip-feed new content onto a page by keeping the response mid-flight forever - though, this is very unidiomatic).
Edit: The RFC 9112 subsection 6.3 Message Body Length outlines the entire algorithm of the spec - it is slightly more involved than just content-length, but not by much.
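
On the server side the same rule applies to an incoming PUT or POST: the declared Content-Length says how many body bytes to expect. A minimal sketch of extracting it from the request head follows. The function name is hypothetical, and the sketch ignores Transfer-Encoding, which takes precedence over Content-Length in the full algorithm of section 6.3.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns the Content-Length value found in a
 * NUL-terminated request head, or -1 if the header is absent or malformed.
 * Header names are matched case-insensitively at the start of each line.
 */
static long content_length(const char *head)
{
    static const char name[] = "content-length:";
    const char *line = head;

    /* each header line starts right after a CRLF */
    while ((line = strstr(line, "\r\n")) != NULL) {
        line += 2;
        size_t i = 0;
        while (name[i] && tolower((unsigned char)line[i]) == name[i])
            i++;
        if (name[i] != '\0')
            continue;                     /* some other header */

        const char *p = line + i;
        while (*p == ' ' || *p == '\t')   /* optional whitespace */
            p++;
        if (!isdigit((unsigned char)*p))
            return -1;
        return strtol(p, NULL, 10);
    }
    return -1;
}
```

A result of -1 on an upload means the body length is unknown up front, so a simple server may prefer to reject the request rather than read until the connection closes.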

Just sending a </html> will do nothing to the connection. Browsers are also almost painfully lenient on malformed HTML, and will be more than happy to merge multiple <html> trees into one <body> for you, depending on the browser.

To see for yourself, install netcat, run one of the following in your *nix terminal, and connect to http://localhost:3000 on your browser and observe its behavior.

1. No content-length
Code: [Select]
echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<\!DOCTYPE html><html><html><body>123</body></html>" | nc -l -p 3000
2. Content-length
Code: [Select]
echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 50\r\n\r\n<\!DOCTYPE html><html><html><body>123</body></html>" | nc -l -p 3000

Notice that example 1 will render the page, but will show the loading spinner until you cancel the request. The second example will finish loading successfully. To emulate the "until the connection is closed" situation, press Ctrl+C in the first terminal.


Quote
What is clear is that one cannot do a progress bar from the server end, because the binary data is being sent down in the same direction. But browsers give you progress anyway...

This is true. The browser's progress bar when downloading is based on the Content-Length header, which the server will use to tell the client how large the response body is.

You have multiple options for indicating progress. Your files will need to be in a <form> with a <input type="file"/> for binary files, but textual content can be in a text area - whatever.
  • Use the browser's POST submit mechanics to transfer the file (via <input type="submit"/> or <button type="submit"/>). To transfer files from an input[type="file"], you'll need to use multipart/form-data as the form's enctype, which means you'll need to parse the boundary separators on the server. Progress indicator provided by browser itself (in the bottom left corner as a percentage, often).
    • Since you also want to be able to modify text files in a <textarea>, these can be transferred as application/x-www-form-urlencoded, which may be easier to parse. To do that, just leave out enctype in the <form>, or explicitly set it.
    • Using the browser's form submission mechanics allows you to respond with a redirect from the server to implement the PRG pattern. Just send a 302 Found with a "Location: /xxxxxx.html" and no content.
  • Use JavaScript to submit the form as FormData via XMLHttpRequest, and monitor the progress events
    • Still have to parse the multipart for binary files
    • Can programmatically show a progress bar, and cancel the upload.
  • Use JavaScript to submit the form's file directly with XMLHttpRequest, and monitor the progress events
    • No need to parse multipart, as you send the file's bytes directly
    • Can programmatically show a progress bar, and cancel the upload.
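
For the redirect-after-POST variant in the first option, the server's reply could be built like the sketch below. The helper name is hypothetical. Note the swap: it uses 303 See Other, the status defined for redirecting to a GET after a POST, instead of the 302 Found mentioned above; both should work in current browsers. "Content-Length: 0" marks the response as complete without relying on the connection being closed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a body-less redirect response into buf.
 * Returns the number of bytes to send, or -1 if buf is too small.
 */
static int build_redirect(char *buf, size_t cap, const char *location)
{
    int n = snprintf(buf, cap,
        "HTTP/1.1 303 See Other\r\n"
        "Location: %s\r\n"
        "Content-Length: 0\r\n"   /* no body: response ends at the blank line */
        "Connection: close\r\n"
        "\r\n",
        location);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Calling build_redirect(buf, sizeof buf, "/xxxxxx.html") after a successful form upload would send the browser back to that page with a fresh GET.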

Here's an example of option 3. Start a dummy netcat server on localhost:3001 with the following command:
Code: [Select]
while true; do echo -e "HTTP/1.1 200 OK\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Headers: *\r\n\r\n" | nc -l -p 3001; done
HTML & associated script, can just open this directly in your browser to test it out:
Code: [Select]
<!DOCTYPE html>
<html>
  <head>
    <title>Title</title>
  </head>
  <body>
    <div>progress: <span id="progress"></span></div>
    <form id="form">
      <input type="file" name="file_input" />
      <button type="submit">submit</button>
    </form>

    <script>
      const POST_URL = "http://localhost:3001/foobar";
      async function readFile(file) {
        return new Promise((resolve, reject) => {
          const fileReader = new FileReader();
          fileReader.onload = (e) => {
            const arrayBuffer = e.target.result;
            resolve(arrayBuffer);
          };
          fileReader.onerror = (ev) => {
            reject(new Error("FileReader error"));
          };

          fileReader.readAsArrayBuffer(file);
        });
      }

      const progressBar = document.getElementById("progress");

      document.getElementById("form").addEventListener("submit", async (e) => {
        console.log("submit");
        e.preventDefault();

        const form = e.currentTarget;
        const fileInput = form["file_input"];
        if (fileInput.files.length <= 0) {
          // no files selected
          return;
        }

        const file = fileInput.files[0];
        console.log(`Uploading ${file.name} (${file.type})`);
        const type = file.type; // MIME type (for Content-Type)
        const fileBuffer = await readFile(file);

        const request = new XMLHttpRequest();
        request.upload.onprogress = (e) => {
          if (!e.lengthComputable) {
            progressBar.innerText = "<incomputable>";
            return;
          }
          const percentage = (e.loaded / e.total) * 100;
          progressBar.innerText = `${percentage.toFixed(1)}%`;
        };
        request.upload.onload = (e) => {
          // see also onerror, onabort
          progressBar.innerText = "upload done, processing...";
          // now, wait for response
        };
        request.onreadystatechange = () => {
          if (request.readyState === XMLHttpRequest.DONE) {
            // grab response as text from request.responseText
            console.log("got response");
          }
        };
        request.open("POST", POST_URL);
        request.setRequestHeader("Content-Type", type);
        request.send(fileBuffer);
      });
    </script>
  </body>
</html>
« Last Edit: August 10, 2022, 12:06:36 am by httpmies »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #57 on: August 09, 2022, 05:56:07 am »
Thank you all, especially httpmies for the super detail.

Today I made a lot of progress on the little bits which weren't right.

Firstly, I can confirm that after a browser has finished uploading a file to the server, it does not close the connection. It is thus capable of receiving whatever error response e.g.

Code: [Select]
// Return this to client if file write (textarea file or Upload file) failed
static const uint8_t FILE_WRITE_BAD[] =
"HTTP/1.1 413 Write Error\r\n"
"Content-type: text/plain\r\n"
"Server: xxxxxx\r\n\r\n"
;

The reason this was previously not working was a subtle mistake in the data receiving loop, where a blocking function netconn_recv() was called at the very end of the transfer, but there was no more data coming... It needed just one more byte to come out of it. Now I terminate the loop once the declared file size has been fully received, so netconn_recv() gets called only if the count is incomplete.

This made me wonder about timeouts. TCP/IP is error corrected end to end, but if the link actually broke during the transfer, you don't want it to hang. LWIP must implement some sort of recovery, but it appears that the default setting is to disable timeouts. Could that possibly be right? Maybe it refers to different kinds of timeouts. Currently, netconn_recv() never returns if the data stops. I am surprised a TCP/IP stack could function usefully with no timeouts.

Anyway I set up a 5 sec timeout on LWIP. Interestingly netconn_recv never returns a "timeout" errorcode; it returns only "connection closed", even if I close the browser itself halfway through an upload.
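
The shape of the fixed loop, sketched in plain C so it can be run off-target. Everything here is an assumption for illustration: recv_fn stands in for a blocking receive such as netconn_recv() plus the netbuf copy, sink_fn for the flash write, and mem_recv/mem_sink are test doubles, not lwIP calls. The point is that the receive call is made only while bytes are still owed, and that a close or timeout (if I read the lwIP headers right, netconn_set_recvtimeout() is available when LWIP_SO_RCVTIMEO is enabled) ends the loop with an error instead of hanging.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns bytes copied into dst (> 0), or <= 0 on close, timeout or error. */
typedef int (*recv_fn)(void *ctx, unsigned char *dst, size_t cap);
/* Returns 0 if the bytes were stored, nonzero on a write failure. */
typedef int (*sink_fn)(void *ctx, const unsigned char *src, size_t len);

/* Returns 0 when exactly `expected` bytes were stored, -1 if the
 * connection ended early, -2 if the sink failed. */
static int receive_body(recv_fn rx, void *rx_ctx,
                        sink_fn sink, void *sink_ctx, size_t expected)
{
    unsigned char chunk[512];
    size_t remaining = expected;

    while (remaining > 0) {   /* never block in rx once the count is met */
        size_t want = remaining < sizeof chunk ? remaining : sizeof chunk;
        int got = rx(rx_ctx, chunk, want);
        if (got <= 0) return -1;
        if (sink(sink_ctx, chunk, (size_t)got) != 0) return -2;
        remaining -= (size_t)got;
    }
    return 0;
}

/* Test double for the network: at most 3 bytes per call, then 0 (closed). */
struct mem_src { const unsigned char *data; size_t len, pos; int calls; };
static int mem_recv(void *ctx, unsigned char *dst, size_t cap)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->pos;
    if (n > 3) n = 3;
    if (n > cap) n = cap;
    s->calls++;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}

/* Test double for the file write: appends to a small RAM buffer. */
struct mem_dst { unsigned char data[64]; size_t len; };
static int mem_sink(void *ctx, const unsigned char *src, size_t len)
{
    struct mem_dst *d = ctx;
    if (len > sizeof d->data - d->len) return -1;
    memcpy(d->data + d->len, src, len);
    d->len += len;
    return 0;
}
```

With a 10-byte source the loop makes four receive calls (3 + 3 + 3 + 1 bytes) and never a fifth, which is the call that used to block.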

On downloads, I removed the file date header and all works as before. Thanks for that tip. The file gets saved with the current date, except with Chrome and FF where there is EXIF as mentioned before.

Downloads do appear to cause the browser to close the connection upon completion. This is just HTML, no JS involved. Further data from the server just disappears. And I am sending a fully formed web page.

Of the 3 browsers, only FF does the 500kbyte+ cache on uploads, somewhat messing up any progress bar in the process. This doesn't matter. You just get the Transfer Complete popup (from JS) about 15 seconds after the upload reached 100% :) I could upload in chunks but then I have to implement that on the server.

I used Wireshark a long time ago and really should get back to it. In fact just learning the browser debugging tools would be good. There is some way to see all data going back and forth but I have not been able to work it out.

Interesting that the keep alive causes the server to send 304 every 5 seconds. I could implement that, although my "server" is a primitive one which is mostly stateless and just responds to each client request immediately.
« Last Edit: August 09, 2022, 08:04:10 am by peter-h »
 

Offline httpmies

  • Newbie
  • Posts: 2
  • Country: fi
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #58 on: August 09, 2022, 10:39:45 pm »
Quote
Firstly, I can confirm that after a browser has finished uploading a file to the server, it does not close the connection. It is thus capable of receiving whatever error response e.g.

...

Downloads do appear to cause the browser to close the connection upon completion. This is just HTML, no JS involved. Further data from the server just disappears. And I am sending a fully formed web page.

I suspect some of the frustration in this thread originates from a differing mental model of HTTP. Please allow me to try to transfer some of my mental model to you.

HTTP is a stateless protocol. It is communicated over a stateful TCP connection, which exposes a reliable duplex stream of bytes. An important note is that there is no such thing as a HTTP connection, as data transfer with HTTP happens via a sequence of request-response pairs (in essence, though there exists HTTP/1.1 pipelining and all sorts of multiplexing in HTTP/2 and /3). Whether the underlying TCP connection is closed in between request-response pairs depends, as I mentioned earlier. A client (e.g. a browser) makes a request for a resource (e.g. GET /path), the server responds with some data or a redirect, and that's it. Your browser only knows that a response is associated with its request because they happen sequentially in the TCP connection.

This is essentially what RFC-9110 "HTTP Semantics" says in the subsection 3.3. Connections, Clients and Servers and 3.4. Messages. As you're implementing a HTTP server from scratch, I would recommend glancing over the section 3. Terminology and Core Concepts. Even though it's a RFC, this one is quite human readable.

When you say that after "finishing uploading a file to a server, [a browser] does not close the connection" and that "downloads do appear to ... close the connection", my assumption is that you mean that you're trying to write an HTTP response with your microcontroller after handling a "download", and the server cannot send another document to the browser after already writing a "download".

Please open the image I've attached. It is a sequence diagram (created with PlantUML, if you're curious) of what happens when a user submits a small binary file to a server from the following page:

Code: [Select]
<!DOCTYPE html>
<html>
  <body>
    <form action="/" enctype="multipart/form-data" method="post">
      <input type="file" name="file_input" />
      <button type="submit">submit</button>
    </form>
  </body>
</html>

As you can see from the diagram, after the browser performs the form submission ("uploads a file"), it is expecting a response, as I mentioned earlier. I think that this is related to your observation that "after ... uploading a file, [the browser] does not close the connection", if by connection you mean "the request-response sequence". Indeed, it waits for a response from the server. In the diagram, the server responds to the "upload" by redirecting the browser to navigate to the address of the newly created file (302 Found) so that the user downloads said file.
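The upload-then-redirect sequence from the diagram can be sketched in Python as follows. This is only an illustration under assumptions: the path /files/upload.bin, the in-memory STORED dict, and the handler name are all hypothetical, and the handler stores the raw request body without parsing multipart/form-data. The client code plays the browser's role by following the Location header itself.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STORED = {}  # hypothetical in-memory "filesystem": path -> bytes


class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes, then answer with a redirect.
        length = int(self.headers.get("Content-Length", 0))
        STORED["/files/upload.bin"] = self.rfile.read(length)
        self.send_response(302)
        self.send_header("Location", "/files/upload.bin")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = STORED.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def upload_then_follow_redirect(payload: bytes):
    """POST the payload, then GET whatever the Location header points at."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), UploadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("POST", "/", body=payload)
        first = conn.getresponse()
        first.read()
        location = first.getheader("Location")
        conn.close()
        # Second, independent request: this is the "download".
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", location)
        second = conn.getresponse()
        data = second.read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return first.status, location, second.status, data
```

Note that the server never pushes the file: it answers the POST with a short redirect, and the file only travels when the client makes the second request.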

The operation of "downloading" is in fact the browser making a request for a specific path, and the server sending an HTTP message with the contents of that file or whatever content it pleases as the message body. The meaning attached to the colloquial use of "downloading" also implies the familiar "Save as..." dialog, and the tab closing as the file begins to download to disk in the background. The only difference between this behavior, and your browser simply showing a web page (you navigating to a web address) is whether or not the server included Content-Disposition: attachment in the headers of the response.
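As a rough illustration of how small that difference is on the wire, here is a Python sketch that builds a complete response as bytes, roughly what a hand-rolled server would write to the socket. The function name build_response and its parameters are made up for this example; only the header names come from HTTP itself.

```python
from typing import Optional


def build_response(body: bytes, content_type: str,
                   download_name: Optional[str] = None) -> bytes:
    """Return a complete HTTP/1.1 response (status line, headers, body)."""
    headers = [
        "HTTP/1.1 200 OK",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    if download_name is not None:
        # This one header asks the browser to save rather than display.
        headers.append(
            f'Content-Disposition: attachment; filename="{download_name}"'
        )
    head = "\r\n".join(headers) + "\r\n\r\n"  # blank line ends the headers
    return head.encode("ascii") + body
```

For example, build_response(jpeg_bytes, "image/jpeg") would normally be shown in the tab, while build_response(jpeg_bytes, "image/jpeg", "tb20-5.jpg") asks for the "Save as..." behavior.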

Turns out, there's actually nothing special about uploads or downloads - simply HTTP messages with headers and a body, which have semantics attached to them through standards (e.g. browser should use the POST method when submitting form data to the server and encode the fields like so and so, 204 No Content shouldn't have a body, etc.) and conventions (e.g. if I ask for /foo.exe, I'm expecting it to start a download for a file named foo.exe via Content-Disposition: attachment).

It now makes sense why "downloads cause the browser to close the connection upon completion": the server has already written the response (the contents of the file), and it cannot write a second response to the same request. From the perspective of the browser, it has completed what you've asked of it, and is ready to wait for the user's next move (what request shall we make next?).

Quote
LWIP must implement some sort of recovery, but it appears that the default setting is to disable timeouts. Could that possibly be right? Maybe it refers to different kinds of timeouts.

I'm not familiar with LWIP, but I would expect there to be some timeouts at least. Perhaps your browser is just keeping the TCP socket open after you close the tab, if you have other tabs open? I would investigate this with Wireshark to check, or use your browser's developer tools (see below).

Quote
I used Wireshark a long time ago and really should get back to it. In fact just learning the browser debugging tools would be good. There is some way to see all data going back and forth but I have not been able to work it out.

Oh dear! So sorry to hear that you've been debugging blind! Of the 3 major browsers, I've found that Chrome (and other Chromium-based browsers like Edge) currently offers the best development tools. FF is pretty close as well.

Press F12 to open up the devtools drawer (you can move it around or pull it out into its own window from the three vertical dots menu on the right). There are multiple ways of opening the devtools, but this one is simple.

- The Elements tab shows the currently rendered HTML (may differ from what the server sent - to see that, right click on the web page and choose "View source").
- The Console shows...console messages and warnings.
- The Network tab is what you're looking for here. It shows all requests with a waterfall display, allowing you to see request and response details, headers and body and all. Enable "Preserve log" or otherwise it will clear out the list when your browser navigates. Click on a request to expand it. The details view has a Timing tab which breaks down what took how long during the request.
- The Performance and Memory tabs are useful for profiling JavaScript in larger applications; the former also allows you to throttle the CPU.
- The Application tab lets you see whatever information the current site has persisted, including cookies, LocalStorage, etc. ESC expands another utility drawer.

Chrome has comprehensive documentation for the devtools here.

Another useful tool is just using curl from the terminal. If you pass -v, it'll enable verbose mode and show you both the full request as well as the response. e.g. curl -v https://google.com

As for making HTTP requests via a GUI, I've found Postman to be quite handy. Just ignore their prompts to login or register.

Quote
Of the 3 browsers, only FF does the 500kbyte+ cache on uploads

Regarding this and your previous question about the ~500kB buffer thing, there are multiple buffers throughout the TCP/IP stack of both the sender and the receiver. I haven't noticed or paid attention to this behavior before, so I'm unable to comment there. I would start from the following:
- Open the FF devtools' Network tab and enable "Persist logs" from the cogwheel to the right
- See what the POST request looks like there, see its Timings tab in the details
- Take a look with tcpdump/Wireshark to see if there's anything obvious there
  * does the TCP traffic look different between FF and Chrome?
- Investigate if this happens with other servers (try spinning one up locally and throttling the network speed?), or just your microcontroller server
  * if just your stack, can you reproduce it with a different stack than LWIP? maybe LWIP buffers?
  * can you reproduce it with a different client device on your server?
- What if you change your connection to, let's say, your phone's WiFi hotspot?
- If nothing else, maybe hop onto a Linux distro and use strace to comb through the system calls Firefox makes during the uploading. It's possible to filter strace for specific syscalls, or just grep it.
  * Does it happen on a Linux machine as well?

Quote
Interesting that the keep alive causes the server to send 304 every 5 seconds. I could implement that, although my "server" is a primitive one which is mostly stateless and just responds to each client request immediately.

Sorry, I forgot to mention that the screenshot had both TCP and HTTP mixed (different levels of abstraction!). There was only one 304 response over HTTP, and the keep alive just instructed the client to keep the TCP connection open instead of closing it after receiving the response (and also communicated that the server intends to keep the TCP connection open for at least 5 seconds). It should be stateless and respond to each request immediately :). Unless we're talking about WebSockets or WebRTC or other technologies.
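For reference, the "5 seconds" travels as a parameter of the Keep-Alive response header (for example "Keep-Alive: timeout=5, max=100"). A small Python sketch of reading those parameters follows; the function name parse_keep_alive is made up for this example, and the sample values are illustrative rather than taken from the screenshot.

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a Keep-Alive header value such as 'timeout=5, max=100'."""
    params = {}
    for item in value.split(","):
        name, sep, number = item.strip().partition("=")
        # Keep only well-formed name=integer pairs; ignore anything else.
        if sep and number.strip().isdigit():
            params[name.strip().lower()] = int(number.strip())
    return params
```

The header only states how long an idle TCP connection may be kept open; it does not cause any extra HTTP responses to be sent.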
« Last Edit: August 10, 2022, 12:12:39 am by httpmies »
 
The following users thanked this post: peter-h

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #59 on: August 10, 2022, 11:11:48 am »
Thank you again, httpmies, for such an informative post.

Quote
HTTP is a stateless protocol.

I "get" that HTTP is stateless. I think my ideas about connections being closed were partly due to a couple of factors:

I also "get" that HTTP is client-server, and the server serves only what the client requested (apart from Ajax etc which I know nothing about). This has quite specific implications e.g. client requests x, the server returns x+y, but the client, having got x, discards y. Even if y is something trivial like a line of JS to go to a URL. So if doing e.g. a download (header, byte count x, CRLFCRLF, data) after the client has counted off x bytes, it is done and finished. Any further data is dumped, and this is why I have not been able to do anything after a download. For example this is my filesystem



so clicking on tb20-5.jpg downloads that file. But after the data has been sent to the client, any more data is ignored, because it wasn't requested. The server obviously knows when it has sent all the data, but there is no way to get the server to display another page on the browser, even refresh the one above, etc, after the file transfer. There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5.
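The "client counts off x bytes and is then finished" behavior can be sketched in Python like this. It is a simplified model of what a client does with Content-Length framing (no chunked encoding, no error recovery), and the function name split_first_response is made up for this example.

```python
def split_first_response(stream: bytes):
    """Split a byte stream into (headers, body, leftover) via Content-Length."""
    head, sep, rest = stream.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header block")
    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("this sketch only handles Content-Length framing")
    # The body is exactly `length` bytes; anything after it is not part of
    # this response, so a client has no request to attach it to.
    return head, rest[:length], rest[length:]
```

Whatever ends up in the third element is the "extra data" described above: it arrives on the socket, but it belongs to no request, so the browser has nothing to do with it.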

The only way I can think of would be to download a JS script (before the file, obviously) which accepts the data intended for the browser, passes it on to the browser, and either intercepts the byte count or runs a timeout, and on the earlier of the two it does something like "window.location.replace("/files.html")\r\n". Is that possible? The web is full of people asking whether a server can get the browser to "press F5". The answer is NO, other than by running client-side code.

The other factor is that the TCP/IP API does have a concept of open connection, read, write, close connection. My simple server closes the connection after each "serving" and then re-opens it, to wait for the next client request. This is probably not needed; I am not sure. It was the demo code for the netconn API, which I am using.

Quote
but I would expect there to be some timeouts at least

An error corrected comms protocol obviously can't be done without timeouts. The stuff at the bottom here https://www.nongnu.org/lwip/2_0_x/timeouts_8c.html seems to refer to these. They are internal and presumably with some appropriate defaults. They must also be varied e.g. with satellite comms you may need seconds, while on a LAN only 100s of ms. And these are not related to a timeout on say netconn_recv which is a blocking read, and with an infinite timeout, but one you can change (I dropped mine to 5000ms).
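To illustrate the distinction between the stack's internal timers and an application-level receive timeout, here is a desktop Python analogue using the standard socket module. This is not the lwIP netconn API, just the same idea of turning an infinite blocking read into one that gives up; the function name recv_with_timeout is made up for this example.

```python
import socket


def recv_with_timeout(sock: socket.socket, seconds: float):
    """Return received bytes, or None if nothing arrives within the timeout."""
    sock.settimeout(seconds)  # without this, recv() would block indefinitely
    try:
        return sock.recv(1500)
    except socket.timeout:
        return None
```

A server loop can treat a None result as "client went quiet" and close the connection, independently of whatever retransmission timers TCP runs underneath.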

Quote
500kB buffer thing, there are multiple buffers throughout the TCP/IP stack of both the sender and the receiver.

Of Chrome, FF and Edge, only FF does this 500k initial thingy. It all still works; just makes a simple progress report look weird because you get "520k of 1500k" immediately. The others do something like 40k, which at my flash file system writing speed of 30kbytes/sec is hardly noticed. One could probably hack it for a better visual presentation by making the JS (the upload is done with JS, as a "PUT", to make parsing it easier at the server) output say 30k blocks, separated by 500ms.

There is negligible buffering at my server. Just 4 MTU-sized (1500+) packets, plus 2 more at ETH level.

If I wasn't nearly finished I would probably get Wireshark set up. I had a play with the browser tools but there seems to be some data which they don't display.

 

Offline mariush

  • Super Contributor
  • ***
  • Posts: 4982
  • Country: ro
  • .
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #60 on: August 10, 2022, 11:59:50 am »

Quote
so clicking on tb20-5.jpg downloads that file. But after the data has been sent to the client, any more data is ignored, because it wasn't requested. The server obviously knows when it has sent all the data, but there is no way to get the server to display another page on the browser, even refresh the one above, etc, after the file transfer. There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5.

The only way I can think of would be to download a JS script (before the file, obviously) which accepts the data intended for the browser, passes it on to the browser, and either intercepts the byte count or runs a timeout, and on the earlier of the two it does something like "window.location.replace("/files.html")\r\n". Is that possible? The web is full of people asking whether a server can get the browser to "press F5". The answer is NO, other than by running client-side code.

No, you can't do that because once you click on the download link it's as if the original page never existed, the download transfer is completely independent from everything.
You could right click on the link for the file, select "Copy link address",  paste it in another browser or mail it to some friend, and that other browser will directly request the file. What would those browsers do with your extra junk data you try to push? There's no index page opened in browser, they just clicked the direct link to download the file.
The link could even be passed to a download manager that hooks into the browser and catches download requests, e.g. FlashGet, Mass Downloader, Internet Download Manager, etc.

It's not "right" for the server to attempt to control what happens on the client's computer...  think of it, would you think it's acceptable for this forum's web server to serve you an image attachment and then inject code to redirect/refresh your page to an ad page, or to move you to amazon.com with an affiliate link? 

You could have some JavaScript on the page that polls the server and constantly asks "has something changed since year-month-day hour-minute-second?", and the server could instantly reply with a few bytes saying yes or no. If yes, then your JavaScript code could trigger a refresh of the whole page.
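The server side of such a poll can be tiny. Below is a Python sketch of the decision logic only; note that it swaps the timestamp for a simple version counter that the firmware would bump on every filesystem write, which is an assumption of this example rather than something from the thread. The URL /changed?since=N and the function name answer_change_poll are likewise hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def answer_change_poll(request_path: str, fs_version: int) -> bytes:
    """Answer a poll like '/changed?since=7' with b'yes' or b'no'.

    fs_version is a hypothetical counter incremented on every filesystem
    write; the client sends back the last value it has seen.
    """
    query = parse_qs(urlsplit(request_path).query)
    try:
        seen = int(query["since"][0])
    except (KeyError, ValueError):
        return b"yes"  # unknown client state: safest to make it refresh
    return b"yes" if fs_version != seen else b"no"
```

The reply is a few bytes, so polling costs far less than re-sending the whole directory page each time.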

Some other thoughts about the upload form: when you serve the page to the user, you could have a hidden field in the form with a unique ID. JavaScript could grab that unique ID and, once the user clicks the button, start a function that asks the server once a second "where are you with the transfer of the file with unique id xyz?" Your server could do a stat on the file (or whatever) to see how many bytes have been written so far and reply "still receiving data, received n bytes so far for file with uid xyz".
But this implies that your code can actually switch between parallel requests, e.g. pause receiving the upload data and writing it to disk, handle the request from the other connection, then switch back to the previous request to resume writing.
The original connection that started the upload (the POST) doesn't receive this data; it's a separate async connection created by the JavaScript code, which communicates with your server independently from the first connection, like two people's computers in different parts of the city accessing the same page at the same time.
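A minimal Python sketch of the bookkeeping this would need on the server follows. The class name UploadProgress, its methods, and the reply wording are all made up for this example; the lock stands in for whatever mechanism protects shared state between the upload task and the status-query task.

```python
import threading


class UploadProgress:
    """Tracks bytes received per upload id (ids assumed to come from the form)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._received = {}  # upload id -> (bytes so far, expected total)

    def start(self, upload_id: str, total: int) -> None:
        with self._lock:
            self._received[upload_id] = (0, total)

    def add(self, upload_id: str, chunk_len: int) -> None:
        # Called by the upload handler each time a chunk is written to disk.
        with self._lock:
            done, total = self._received[upload_id]
            self._received[upload_id] = (done + chunk_len, total)

    def status(self, upload_id: str) -> str:
        # Called by the handler for the separate polling connection.
        with self._lock:
            entry = self._received.get(upload_id)
        if entry is None:
            return "unknown upload"
        done, total = entry
        return f"received {done} of {total} bytes"
```

The upload connection only ever calls add(), and the polling connection only ever calls status(), which is what keeps the two requests independent.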

Quote
There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5
It's too fast, and I wouldn't be surprised if browsers stop doing it on purpose, treat it as a badly coded web page (would you want to have 1 GB of phone data and have a page like yours eat through the allowance because it refreshes every two seconds?), or slow it down, because it pointlessly keeps that tab's threads active and makes the resources for that page harder to swap out for other tabs.

You could have a listener on the onclick event for links: when a link is clicked, that code could use setTimeout or setInterval to schedule another function to run a couple of seconds later. That function could restart the refresh of the page.
« Last Edit: August 10, 2022, 12:05:49 pm by mariush »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #61 on: August 10, 2022, 12:31:40 pm »
Quote
it's too fast, and I wouldn't be surprised if browsers don't do it on purpose or treat it as a badly coded webpage

No; it works fine. I have a 1Hz refresh on another page (system status; lists RTOS tasks etc) and 0.5Hz refresh on the files page. All works perfectly. But after a file download, the page (/files.html) stops the refresh.

The refresh is in the client, so for some reason, after downloading a file, it terminates the refresh. Not a big deal, but interesting why. Pressing F5 restores it, because it re-downloads the files.html page which has the refresh in the header.

It is obvious from browsing on various issues that vast numbers of web coders have been up these various paths, hence websites are loaded with weird JS, and all sorts of other stuff. A lot of people use fashionable frameworks (Laravel is one, which will be dead in 10 years' time) which do weird things anyway.

I am just trying to build a very simple server for system admin, up/downloading files, etc. No style sheets. But it does need JS support in the browser. It works ok on phones too, although auto resize is a bit of a challenge ;)

Quote
You could have a listener on the onclick event for links , when a link is clicked, that code could use setTimeout or setInterval to schedule another function to run a couple seconds later.

The file download can take over a minute.
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3032
  • Country: us
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #62 on: August 10, 2022, 12:37:31 pm »
Quote
but there is no way to get the server to display another page on the browser, even refresh the one above, etc, after the file transfer. There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5.

The only way I can think of would be to download a JS script (before the file, obviously) which accepts the data intended for the browser, passes it on to the browser, and either intercepts the byte count or runs a timeout, and on the earlier of the two it does something like "window.location.replace("/files.html")\r\n". Is that possible? The web is full of people asking whether a server can get the browser to "press F5". The answer is NO, other than by running client-side code.

You could do this: the download link for a file runs some javascript which uses AJAX to download the file. To save the file on the client's computer use either FileSaver.js or StreamSaver.js:

https://github.com/eligrey/FileSaver.js/
https://github.com/jimmywarting/StreamSaver.js

I understand that with StreamSaver.js you can tell when the download is finished. As for doing this with FileSaver.js, see this issue:

https://github.com/eligrey/FileSaver.js/issues/699
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #63 on: August 10, 2022, 12:51:33 pm »
Gosh, yeah... that's more or less what I said about downloading with JS. Some nasty browser dependent code in these, testing for different browsers and doing different things.

I've been involved with that on a few projects and it is a nightmare. One website (which I specified, paid ~10k for, and got a PHP+MariaDB guy in Poland to code from scratch) which supports picture upload, fails to work with some version of Safari on a Mac. Never found out why, not enough complaints, not worth the money at $40/hr, and might have been avoidable if I had let the guy use Dropzone, which does get gradually updated to handle the weird browser dependent crap.

Come to think of it I haven't yet tested this server with Safari  :-DD
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3032
  • Country: us
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #64 on: August 10, 2022, 01:24:19 pm »
Quote
The refresh is in the client, so for some reason, after downloading a file, it terminates the refresh. Not a big deal, but interesting why. Pressing F5 restores it, because it re-downloads the files.html page which has the refresh in the header.

Don't reload the entire page to update the directory listing. Poll the server with JS to get the directory contents and replace that part of the HTML with the updated contents. Basically it becomes a SPA -- Single Page App. You'll perform periodic polling with setTimeout() in javascript.



« Last Edit: August 10, 2022, 01:26:55 pm by ledtester »
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #65 on: August 10, 2022, 02:25:31 pm »
That's a lot of work though.

Someone else started on this project and did partly that. He used hidden tags in the HTML which would get picked up by client JS, so the page presentation would (or could) be controlled by the JS. Then you could have A-Z, Z-A, etc directory sorting and all sorts of fancy stuff. But somebody has still got to write the code.

 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3670
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #66 on: August 15, 2022, 08:54:56 pm »
This is just an update on this project.

The http server is now finished.

It remains based on that original ST-supplied netconn-API code which is found around the place e.g.
https://github.com/particle-iot/lwip/blob/master/contrib/apps/httpserver/httpserver-netconn.c

It was quite a lot of work since nobody uses netconn (people use the socket API) but eventually, with help of some clever people, it was worked out.

JS was used to implement the returning of large chunks of data to the server, so that the easier to parse PUT method can be used, instead of POST. The first of these was the textarea file editing, where JS was also used to enforce CRLF line endings (textarea strips off CRs when it is initially loaded with data). The second was file upload, where I wanted extra checking e.g. filename is valid-8.3 and file size no bigger than remaining filesystem space (I did a special GET function for that value).
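For anyone wondering what the parsing difference amounts to, here is a Python sketch contrasting the two body formats. As noted earlier in the thread, the difference is the format of the body rather than the method itself: a browser form submission uses multipart/form-data, while the JS upload sends the raw file. The function names are made up for this example, and the multipart version holds the whole body in memory, which a small microcontroller could not do; a streaming version would have to search for the boundary across packet boundaries.

```python
def extract_multipart_file(body: bytes, content_type: str) -> bytes:
    """Return the content of the first part of a multipart/form-data body."""
    boundary = content_type.split("boundary=", 1)[1].strip('"').encode("ascii")
    delimiter = b"--" + boundary
    # Everything between the first and second delimiter is the first part.
    first_part = body.split(delimiter)[1]
    # The part has its own small header block, ended by a blank line.
    _part_headers, _, content = first_part.partition(b"\r\n\r\n")
    # The CRLF before the next delimiter is framing, not file data.
    return content[:-2] if content.endswith(b"\r\n") else content


def extract_raw_file(body: bytes) -> bytes:
    """With a raw body (as sent by the JS upload), the body is the file."""
    return body
```

With the raw body, the server only needs Content-Length from the request headers and can write bytes to the filesystem as they arrive.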

The final JS push was done by a guy on freelancer.com for $100 - he was incredibly fast but these people have to be, to do what would be some full time employed programmer's day's work for $100. Clearly this is the way to get software done, but you need to be able to define a clear package.

« Last Edit: August 15, 2022, 09:27:17 pm by peter-h »
 

