EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: peter-h on July 16, 2022, 09:18:51 pm

Title: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 16, 2022, 09:18:51 pm
Back here
https://www.eevblog.com/forum/programming/what-actual-data-is-used-to-return-a-favicon-to-the-browser/
I implemented a favicon on my simple server.

I now understand how a web page is served and how one can have links on that page and deliver other pages as somebody clicks on the links. It seems really simple.

However I now need to do file transfers, of arbitrary size. My understanding is that the client end is a standard browser feature. On a download, the browser offers a Save As dialog. On an upload, it offers a file browser to pick the file. It is the transfer itself I don't know. I think it is done in blocks, chosen to suit the server memory availability, and the client has to request the next block. All the examples I see online apply to standard servers like Apache. Can anyone tell me the actual data flowing?

Many thanks :)
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ejeffrey on July 16, 2022, 11:52:43 pm
File downloads are handled like any other request.  The client decides whether to try to display it or download it by default based on file type.

File uploads are done with form submission.  A file upload must use the submission method "POST" not "GET". There is no chunking, and it isn't normally broken into multiple requests.  Throttling is just done by standard TCP connection throttling.

There is a "Range" request header (answered with a "Content-Range" response header) for requesting certain byte ranges. It is used to get early previews in some document types, but it's up to the client and often not used.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 17, 2022, 12:51:09 am
Downloads are served exactly like HTML pages, only the Content-Type: defines the file type.
If you want to force the file to be saved (and not opened in an application), use Content-Type: application/octet-stream.

File uploads use POST method requests, where the header part is followed by (an empty line and) the uploaded file data.
The Content-Type is multipart/form-data; boundary=boundarystring with the boundarystring usually quoted.
Mozilla Developer Network has an example (https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/POST):
Code: [Select]
POST /test HTTP/1.1
Host: foo.example
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="field1"

value1
--boundary
Content-Disposition: form-data; name="field2"; filename="example.txt"

value2
--boundary--
Note that each newline in the above snippet is \r\n.  Browsers tend to use long boundary strings, which can be annoying.
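
To make the boundary handling concrete, here is a minimal sketch in C of pulling the boundary string out of a Content-Type header value. The function name extract_boundary and its buffer-based interface are made up for illustration, and it assumes the caller has already isolated the header value as a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the boundary parameter of a multipart/form-data Content-Type value
   into out. Returns the boundary length, or 0 if it is missing or does not
   fit. Sketch only: the parameter name is matched case-sensitively. */
size_t extract_boundary(const char *content_type, char *out, size_t outlen)
{
    assert(content_type != NULL && out != NULL);

    const char *p = strstr(content_type, "boundary=");
    if (p == NULL)
        return 0;
    p += strlen("boundary=");

    size_t n;
    if (*p == '"') {                    /* quoted form: boundary="abc" */
        p++;
        const char *end = strchr(p, '"');
        if (end == NULL)
            return 0;
        n = (size_t)(end - p);
    } else {                            /* bare form: ends at ';', space or end */
        n = strcspn(p, "; \t\r\n");
    }

    if (n == 0 || n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return n;
}
```

In the request body, each part then starts with a line consisting of "--" followed by this string, and the last part ends with "--" + boundary + "--", as in the MDN example above.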
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: SiliconWizard on July 17, 2022, 01:25:04 am
I think there's a specific request for partial transfers too, but I don't remember the details.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 17, 2022, 02:22:08 am
Unless the server provides an Accept-Ranges (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Ranges) HTTP header, clients shouldn't try to request ranges or partial responses.  Requests themselves are always complete, not split into ranges.

If a request does contain a Range (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range) HTTP header, it is up to the server to completely ignore it (just return a normal 200 OK response with the entire contents), return only the range(s) (in a 206 Partial Content response), or error out with 416 Range Not Satisfiable.  The most robust approach is to ignore it, and just always return full contents.
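
For anyone who does want to honour simple ranges, the three outcomes above could be sketched roughly like this in C. The name check_range is hypothetical; the sketch assumes the header value has already been trimmed of trailing whitespace, handles only a single "bytes=" range, and falls back to "send everything" for anything it does not understand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Decide how to answer a Range header value for a file of `size` bytes.
   Returns 206 and sets *first / *last (inclusive offsets) for one
   satisfiable range, 416 if the range lies beyond the end of the file,
   and 200 when the header should be ignored and the whole file sent. */
int check_range(const char *value, long size, long *first, long *last)
{
    assert(value != NULL && first != NULL && last != NULL && size >= 0);

    if (strncmp(value, "bytes=", 6) != 0 || strchr(value, ',') != NULL)
        return 200;                     /* unknown unit, or multiple ranges */
    value += 6;

    char *end;
    if (*value == '-') {                /* suffix form "-N": the last N bytes */
        long n = strtol(value + 1, &end, 10);
        if (end == value + 1 || *end != '\0' || n < 0)
            return 200;
        if (n == 0 || size == 0)
            return 416;
        if (n > size)
            n = size;
        *first = size - n;
        *last = size - 1;
        return 206;
    }

    long a = strtol(value, &end, 10);   /* "A-B" or open-ended "A-" */
    if (end == value || *end != '-' || a < 0)
        return 200;
    long b = size - 1;
    if (end[1] != '\0') {
        const char *bs = end + 1;
        b = strtol(bs, &end, 10);
        if (end == bs || *end != '\0' || b < a)
            return 200;
    }
    if (a >= size)
        return 416;
    if (b >= size)
        b = size - 1;                   /* clamp to the end of the file */
    *first = a;
    *last = b;
    return 206;
}
```

A 206 reply would then carry a "Content-Range: bytes first-last/size" header and a Content-Length of last - first + 1.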
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 17, 2022, 07:58:33 am
This is really interesting and, at the "byte shuffling level" doesn't sound that complicated... especially as in my case the client has "unlimited" memory and speed.

I too thought that the server can just push a 2MB file out in one go and especially nowadays the client should manage to stream it to disk, but what if the server has a flash file system with a 30kbyte/sec write speed? Is that flow rate controlled by the normal TCP/IP mechanisms too, so that while writing to flash you just don't poll the socket API for more data?

Isn't "multipart" a requirement? I have seen some awfully complicated implementations.

Also I guess there need to be timeouts so the server can recover if the data stops arriving, but how does it know the end of data? Is a byte count sent by the client, first?
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ejeffrey on July 17, 2022, 04:27:54 pm
That is all what TCP is for.  Don't worry about flow control. You are overthinking this.

The only reason to use byte ranges is if you only want part of the file: say you are resuming an interrupted transfer, or if you want to fetch the data out of order.

Both the client and the server are always allowed to fall back to entire file transfers.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 17, 2022, 05:13:35 pm
In case you're not aware of it,  both Firefox and Chrome have Developer Tools  (usually can be opened in both using F12 key)

Go in the network tab  and when you load a page, you will see there all the requests being made by the browser and you can click on one to see on the right the request and response headers

There's a response header content-disposition that you can use to force the browser to ask where to save the file instead of automatically saving it to a default location, or opening in browser
See https://coding.tools/blog/force-file-download-instead-of-opening-in-browser-using-http-header-and-flask

basically Content-Disposition: inline will tell browser that it's ok to load it in a new tab or same tab (ex a pdf),
Content-Disposition: attachment; filename="abc.txt"  tells browser to pop up the save as and suggest the name abc.txt for the file that you then push to the browser.
Up to you if you implement ranges, I'd say don't bother, or ignore  ranges in the header. Would add quite a bit of complexity, to parse the range parameter, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
Respond with Accept-Ranges: none and always send the whole file with a 200 status, and the browser will know resume is not supported.
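
A rough C sketch of building such a download response header follows. The function name build_download_header is made up for illustration, and it assumes the file name contains no quotes, backslashes or control characters (it refuses those rather than trying to escape them).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a response header for a forced download into buf.
   Returns the header length, or -1 if the name is unsafe or buf is too small. */
int build_download_header(char *buf, size_t buflen,
                          const char *filename, unsigned long size)
{
    assert(buf != NULL && filename != NULL);

    /* Characters that would break the quoted filename or the header line. */
    if (strpbrk(filename, "\"\\\r\n") != NULL)
        return -1;

    int n = snprintf(buf, buflen,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: application/octet-stream\r\n"
                     "Content-Disposition: attachment; filename=\"%s\"\r\n"
                     "Content-Length: %lu\r\n"
                     "Accept-Ranges: none\r\n"
                     "\r\n",               /* empty line ends the header */
                     filename, size);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

The file contents follow immediately after the empty line, exactly Content-Length bytes of them.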
 
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 17, 2022, 06:23:34 pm
Thank you. Yes; I have used that feature to some extent.

I have a simple HTTP web server (for local config/status only; no security is needed) which just looks for the GET... (url) string and returns a dynamically generated page which is the FreeRTOS task listing. This actually came out of some ST code.

It seems pretty obvious how to handle a page with links to other pages. The server looks for the base URL and then squirts out the HTML. This is what I've been doing for years in Dreamweaver, building simple websites with no CSS or any sort of style sheets, which scale perfectly on any device. Just implement multiple string compares... and on a match squirt out the HTML for the selected page.

Then I need to implement a file listing (FatFS file system). I already have done this and it gets sent to a file, so sending it to the browser is trivial.

The next bit is a bit harder. I want to do

- file upload (using the browser file picker, as discussed above)
- file download (the file listing will have clickable links for each file, so you select the file by clicking on it)
- file editing (only for small text files; opens a browser window, with a SAVE button under it)

The last bit I really don't know but it is not essential since the user can edit a downloaded file locally.

I also need to add a username/password login, which will involve the browser popping up text entry boxes for these. This is probably a variant of the editing window above. The credentials are in a file in the filesystem, configured via a separate process.

I got someone to write this, using some open source web server, but after spending about 4k it turned out to be a disaster which doesn't run (well not as an RTOS task) and it is way too bloated for me to fix. It's got some multipart stuff for the file transfers, which seems unnecessary. So I will adapt my existing simple server.

There will probably be vulnerabilities because I am using simple tests like this to see which URL is being sent

Code: [Select]
/* Is this an HTTP GET command? (only check the first 5 chars, since
   there are other formats for GET, and we're keeping it very simple) */
if ((buflen >= 5) && (strncmp(buf, "GET /", 5) == 0))
{
    if (strncmp(buf, "GET /index.html", 15) == 0) // The 15 depends on the URL length!
    {
        // Return the dynamic status page
        DynWebPage(conn);
    }
    else if (strncmp(buf, "GET /favicon.ico", 16) == 0) // The 16 depends on the URL length!
    {
        // Return the favicon
        DoFavicon(conn);
    }
}

but it doesn't matter.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 17, 2022, 07:36:25 pm
I followed your posts so I know a bit about what you're doing.

I would probably look for the first 4 characters, and reject anything that's not "GET[space]" or "POST". Then I'd extract the URL after GET[space]  or POST[space] and do something depending on it.

Keep in mind that you'll HAVE TO account for parameters in the URL even if your pages won't have such parameters, and you also have to account for the HTTP tag after the URL. 
Some browsers when they don't get information through response header about caching the page will add some bogus parameter at the end of URLs because that often forced web servers to send the document again, instead of replying with HTTP 304 Not Modified.  Some browsers did this when user pressed Shift + F5 (forced refresh)

ex you could have GET /index.html?parameter=value HTTP/1.1

Could be HTTP/1.0 , could be HTTP/1.1, could be HTTP/2  so you can't just assume it will always be 8 characters. You'd have to look for the space after the verb, then the next space as an end for the URL. The URL's spaces will be escaped as %20 so space is a good separator character, to separate verb, url and http version.

The most common verbs are POST, GET, PUT, PATCH and DELETE   .... but really GET AND POST are the most used, DELETE is sometimes used in APIs of various services, and PUT/PATCH are very rarely used.
You really only need GET and POST
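
As a rough illustration of that splitting, here is a small C sketch that separates the verb and the path and drops any ?parameters. The function name parse_request_line and its interface are invented for this example; it assumes the start of the request is in a NUL-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "METHOD SP target SP HTTP-version" and copy out the method and the
   path (the target up to any '?'). The HTTP version is ignored.
   Returns 0 on success, -1 if the line is malformed or a buffer is too small. */
int parse_request_line(const char *line, char *method, size_t mlen,
                       char *path, size_t plen)
{
    assert(line != NULL && method != NULL && path != NULL);

    const char *sp1 = strchr(line, ' ');        /* space after the verb */
    if (sp1 == NULL)
        return -1;
    const char *target = sp1 + 1;
    const char *sp2 = strchr(target, ' ');      /* space after the URL */
    if (sp2 == NULL)
        return -1;
    /* Both spaces must be on the first line of the request. */
    if (memchr(line, '\n', (size_t)(sp2 - line)) != NULL)
        return -1;

    size_t m = (size_t)(sp1 - line);
    size_t p = (size_t)(sp2 - target);
    const char *q = memchr(target, '?', p);     /* drop ?parameters */
    if (q != NULL)
        p = (size_t)(q - target);

    if (m == 0 || m >= mlen || p == 0 || p >= plen)
        return -1;
    memcpy(method, line, m);
    method[m] = '\0';
    memcpy(path, target, p);
    path[p] = '\0';
    return 0;
}
```

With this, "GET /index.html?parameter=value HTTP/1.1" yields the method "GET" and the path "/index.html", which can then be compared with plain strcmp() instead of length-dependent strncmp() calls.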

File upload is easy.  You put a form on the page with the fields you need, and when the user hits upload, the browser will send a POST request to your server with the data. The Content-Length request header tells you how many bytes of body to expect; once the browser has sent them it keeps the connection open, waiting for your reply.
You accept the incoming data, dump it into a temporary file if you don't have a lot of memory, then do your thing on the temporary file and you then send a proper reply to the browser. It could be something as simple as a 200 OK and do a redirect in the html  ex  <html><head> <meta http-equiv="Refresh" content="0; URL=https://example.com/"></head><body></body></html> and you could redirect to the page listing the newly uploaded content.

The form on the page would be something like this:

// see the input type="file" for a lot of good details : https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/file

<form method="POST" enctype="multipart/form-data" action="/upload.html" >
<div>
 <label for="file_up">Select file to upload:</label>
 <input type="file" name="file_up" accept =".doc,.pdf,application/msword" placeholder="Please pick a file!" >
 <input type="hidden" name="folder" value="/documents/uploads/" >
 <input type="submit" name="button_upload" value="Upload" >
</div>
</form>

The form has a file selection called file_up , a hidden parameter named "folder" with the value the path (filled by you when serving the page to the user) and a button named "button_upload"

When user hits the Upload button, the browser is gonna do  POST /upload.html  with a multipart/form-data body and you'll get at least these 3 items separated by that boundary

The accept and placeholder are optional, I just added them to show it's possible.

Editing of a document can be done just the same with a POST form, only you can use a TEXTAREA - https://developer.mozilla.org/en-US/docs/Web/HTML/Element/textarea -  around the contents of the document and the user will see the text in a editable text box on the screen
When you hit the save button, the browser will send you the whole contents of that textarea and whatever other parameters are in the form (hidden on purpose by you or not)

Note that the contents of the document would have to be escaped ex < > and & at the very least :  < would be &lt; , > would be &gt; and & would be &amp; - the browser will parse the html and show the characters properly on the screen and send you the actual characters in the POST form data, not the escaped ones.
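
A minimal C sketch of that escaping step, for text that goes between the textarea tags. The name html_escape is made up for illustration, and it only handles the three characters mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, replacing <, > and & with their HTML entities.
   Returns the number of characters written (excluding the NUL),
   or -1 if the result does not fit in outlen bytes. */
long html_escape(const char *in, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);

    size_t o = 0;
    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };    /* default: the character itself */
        const char *rep;
        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = one;     break;
        }
        size_t n = strlen(rep);
        if (o + n >= outlen)            /* keep room for the final NUL */
            return -1;
        memcpy(out + o, rep, n);
        o += n;
    }
    if (o >= outlen)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

On a small device the same loop can write straight to the connection in chunks instead of into one big buffer.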

There's some other gotchas, like if you don't specify a text encoding for your html files, it's assumed your html file and whatever is in it is UTF-8, so if you dump a .txt file that was written with character encoding ISO-8859-1 then some valid characters in ISO-8859-1 would be invalid in UTF-8 (invalid code points, those specific characters get two byte combinations in utf-8) and not rendered properly on screen.  See for example https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/ for explanation

user and password ... easiest would be to use cookies  or a session id

Basically you make a form with  username  and password and a login button and you submit these with a POST request to your server (GET works too, but it's not "cool" to use GET because the parameters may show up in your address bar)
You get the username, password and you verify them against some internal database and if they're correct you add to the HTTP response headers a cookie with the Set-Cookie attribute : https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
For example, say  jsmith logs in and the password he submits is correct, you generate a token of some sort ex "20220717jsm453" and you add this to your records with an expiration date (ex in case user leaves browser page opened for 10 days and someone hits refresh you don't want that random user to mess around)
Then you send cookie 

Set-cookie: token=20220717jsm453; Max-Age=86400

When 86400 seconds go by, the cookie is deleted. You can't rely on this as a user with bad intentions could just go and edit the cookie database in the browser and set the max age to 2 billion and never expire the cookie, on your server it would be wise to also check when token was generated and invalidate the cookie by sending Set-Cookie with another token value
If you don't say Max-Age or Expires the cookie is treated as a "Session cookie", it lasts as long as the session lasts, which could be indefinite (if user has option restore tabs at start)

So once you do a Set-Cookie, the browser will send the cookies back to you with every request, ex
Cookie: token=20220717jsm453;[SPACE]anothercookie=somedata;[space]yetanothercookie=morecrap

And you can check on every request to see if that token is valid, and if it's not, you could just give a 302 temporary redirect and redirect to a login page.
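
To show what that per-request check involves, here is a small C sketch that pulls one cookie out of the Cookie: header value. The function name get_cookie is hypothetical, and it assumes the header value has already been separated from the rest of the request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the cookie `name` in a Cookie header value such as
   "token=abc; other=xyz" and copy its value to out.
   Returns 0 if found, -1 if missing or too long for out. */
int get_cookie(const char *cookies, const char *name, char *out, size_t outlen)
{
    assert(cookies != NULL && name != NULL && out != NULL);

    size_t nlen = strlen(name);
    const char *p = cookies;
    while (*p != '\0') {
        while (*p == ' ' || *p == ';')          /* skip separators */
            p++;
        size_t span = strcspn(p, ";\r\n");      /* one name=value pair */
        if (span > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = span - nlen - 1;
            if (vlen >= outlen)
                return -1;
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += span;
        if (*p == '\r' || *p == '\n')           /* end of the header line */
            break;
    }
    return -1;
}
```

The value returned for "token" is what you would compare against your own record of issued tokens and their expiry times.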


Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 17, 2022, 09:18:43 pm
Thank you very much mariush.

The new point you made me realise is that a login has to be implemented with something "sticky". But cookies are a fair bit of code.

Quote
Keep in mind that you'll HAVE TO account for parameters in the URL even if your pages won't have such parameters, and you also have to account for the HTTP tag after the URL.

Can the extra data be just ignored? I can have a large enough buffer to hold it.

This stuff is actually rather similar to the business of "scraping" websites. I have never written such code myself but have known lots of people who have.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 17, 2022, 09:44:16 pm
I guess you could ignore parameters in the url ex search for first "?" or space to get to that end.

But being able to parse those parameters could be useful for pagination or if you implement some api if you'll end up using Javascript in your pages.

ex you could have a list of files and show only 10 or 25 per page, so you would have for example GET /documents/xyz/index.html?page=2 HTTP/1.1 or Javascript on your page could do a fetch saying GET /api?function=filelist&folder=/documents/xyz/&results=10&page=2&api_token=12345

to get up to 10 results, starting from 11th file, for the folder /documents/xyz/ folder
(page 2 and results=10: 10 files on first page, so entries 11-20 on page 2)
api_token would be set when user logs in

or maybe view a text document in chunks by saying GET /api?function=viewtext&filename=/text/a.txt&offset=###&maxsize=1000&api_token=12345  to get up to 1000 bytes from offset ### and put on the screen (useful for example to progressively load a text file as you scroll down and get close to the end of scroll)

Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 18, 2022, 07:33:58 am
I will first try file transfer and see how that goes. If it can just be streamed nonstop that will be far simpler than I thought. Others have told me it is done block by block and there is an ack after each block, to control the flow rate.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 18, 2022, 08:07:44 am
Quote
I will first try file transfer and see how that goes. If it can just be streamed nonstop that will be far simpler than I thought. Others have told me it is done block by block and there is an ack after each block, to control the flow rate.

It is streamed nonstop at the HTTP protocol level.  At the TCP/IP level, each TCP segment is acked by your IP stack when it has been received.  So both are true, just looked at from a different level.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 18, 2022, 09:09:36 am
Sure; that much I knew. TCP/IP has error correction and inherent flow control. UDP doesn't (it has a checksum). If I understand it right, from the code writing POV, it seems to "just work" i.e.

embedded -> browser
read the file and send it to the socket API, until EOF

browser -> embedded
read the socket API and write each buffer size, say 512 bytes, to the filesystem

An interesting Q is how to do a "progress report" say xx%. In the former case, the client should be showing that at bottom left. In the latter case, I don't think a browser shows anything, but the server can send back a string saying "xx% done".
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 18, 2022, 10:43:34 am
Quote
An interesting Q is how to do a "progress report" say xx%.

The Content-Length (https://developer.mozilla.org/en-US/docs/web/http/headers/content-length) header is the key here.  It should contain the size of the uploaded data (the file size, if nothing else is included in the POST data) in requests, and the download size in responses.
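
On the server side, a rough C sketch of picking Content-Length out of the received request headers could look like this. The name find_content_length is made up for illustration; it assumes the header block is in a NUL-terminated buffer with lines ending in \r\n.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Scan a header block line by line for Content-Length (name matched
   case-insensitively) and store its value in *length.
   Returns 0 if found, -1 if the header is missing or malformed. */
int find_content_length(const char *headers, unsigned long *length)
{
    static const char name[] = "content-length:";
    const size_t nlen = sizeof name - 1;

    assert(headers != NULL && length != NULL);

    const char *line = headers;
    /* Stop at the empty line that ends the header block. */
    while (*line != '\0' && *line != '\r' && *line != '\n') {
        size_t i = 0;
        while (i < nlen && tolower((unsigned char)line[i]) == name[i])
            i++;
        if (i == nlen) {
            const char *v = line + nlen;
            while (*v == ' ' || *v == '\t')
                v++;
            if (!isdigit((unsigned char)*v))
                return -1;
            *length = strtoul(v, NULL, 10);
            return 0;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return -1;
}
```

For an upload, the server then keeps reading from the connection until it has received that many body bytes (or a timeout expires), and bytes received divided by this total gives the percentage.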

The upload progress report can then be implemented in JavaScript; download progress report is shown by the browser itself automatically, but only when the response header reports the download size via the Content-Length header.

(The upload progress report is based on making the upload itself in JS, using an XMLHttpRequest object.  Don't let the XML in the name fool you.  You use the object to open a POST connection to the server, attaching a progress event listener to its upload property.  Then you use the object to send() the file contents by creating a FormData object and appending the selected File to it.  You can also append other stuff to the FormData object.  The XMLHttpRequest interface will make sure the request has a Content-Length header, if one is computable.  The Mozilla Developer Network Using FormData Objects (https://developer.mozilla.org/en-US/docs/Web/API/FormData/Using_FormData_Objects) page should help here.)

Have you considered writing a HTTP server on a fully hosted OS, on top of a known good TCP stack?  Sure, it would be "extra" work, but that way it would be very easy to debug it, and gain a full understanding of what is done at the application level (on top of TCP/IP).  If you use Linux, I could definitely write an example skeleton of it for you.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 18, 2022, 11:05:47 am
Quote
The upload progress report can then be implemented in JavaScript

What happens, during a client -> server file upload, if the server sends some data to the client? Surely that data should appear in the client browser window. There is, AIUI, nothing else being sent to the client during the transfer. With appropriate HTML formatting, one ought to be able to display a "xx%" string.

Quote
Have you considered writing a HTTP server on a fully hosted OS, on top of a known good TCP stack?

Well... I started down this road with someone else who picked up an open source server and after 100+ billable hrs didn't produce something which worked. Partly because I didn't ask for detailed progress reports, but there were also some fundamental misunderstandings e.g. he was supposed to write it as an RTOS task (like everything else in the project is) but instead wrote it as some sort of standalone code, and there is so much stuff there I can't work it out. It sort of partly runs. After more expense than I can afford, I have now scrapped this and will try to write it myself, by extending the simple server I have had working for ages (which just does the RTOS task list and the favicon).
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 18, 2022, 11:32:19 am
Quote
The upload progress report can then be implemented in JavaScript
What happens, during a client -> server file upload, if the server sends some data to the client? Surely that data should appear in the client browser window. There is, AIUI, nothing else being sent to the client during the transfer. With appropriate HTML formatting, one ought to be able to display a "xx%" string.

Try it in practice.  It isn't as simple as it sounds, especially because the response must be a HTML page itself, and many clients won't actually read the response until they've sent all the data.

Quote
After more expense than I can afford, I have now scrapped this and will try to write it myself, by extending the simple server I have had working for ages (which just does the RTOS task list and the favicon).

Right.  It is just that isolating the service logic from the underlying TCP/IP stack really helps understand how the clients (browsers) interact with the server.
Let me throw something together, and I'll post it here as an example.  Be back in a moment.

(If I recall correctly from my early Common Gateway Interface programming, it's the boundary strings in multipart requests that is the most annoying to handle, so I'm not sure I will do upload support.)
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ejeffrey on July 18, 2022, 01:45:28 pm
Quote
What happens, during a client -> server file upload, if the server sends some data to the client? Surely that data should appear in the client browser window. There is, AIUI, nothing else being sent to the client during the transfer. With appropriate HTML formatting, one ought to be able to display a "xx%" string.

I doubt that will work.  The POST request is only allowed to generate one response.  Even if the browser will process the response before it finishes the transfer it would be a single page and be displayed all at once.  You can't really do a progress bar that way.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 18, 2022, 01:56:21 pm

Another upload technique would be to use javascript exclusively to upload the file to the server in chunks.
For example, the javascript code loads the file in a byte array and then opens a connection, does a POST to the server with the start position and the amount and the data (a small amount, for example 4KB , 64 KB, 128 KB) then closes the connection, updates a progress bar on the html page, and creates another connection for the next chunk , and repeat until all chunks are uploaded.
It would be the server's job to combine all segments into a single file, when all the chunks are successfully uploaded. 

I don't recommend it though ... it's a lot of Javascript code and since you're most likely not gonna upload megabytes to your device, it won't be necessary. People will probably be fine with waiting 5-10s or so it takes to upload some file to your device.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 18, 2022, 02:41:13 pm
Yes; I need to stick to stuff I can understand :)

Biggest uploaded file will be 2MB which at 30kbytes/sec net flash writing speed is about 1 minute.

I have spent today trying to work out what the other guy did on that (cancelled) http server project which doesn't work properly. It turns out to be a complicated state machine, which hooks into LWIP, and I haven't got a clue how to hack it to run as a standalone RTOS task. It's also got interesting memory usage, doing a malloc out of LWIP's private heap (both the malloc, and where from, are a very bad idea). There is also a complicated multipart section (another open source thing from github) which was apparently needed for file transfers, but according to above it isn't necessary. So from tomorrow I will be writing my own one :)

I now have a web page running with clickable links. Tomorrow I will try a file transfer.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 18, 2022, 10:26:30 pm
Which one of the lwIP APIs (https://www.nongnu.org/lwip/2_1_x/group__api.html) do you use?  altcp (https://www.nongnu.org/lwip/2_1_x/group__altcp__api.html)/tcp (https://www.nongnu.org/lwip/2_1_x/group__tcp__raw.html) (raw), netconn (https://www.nongnu.org/lwip/2_1_x/group__netconn.html), or socket (https://www.nongnu.org/lwip/2_1_x/group__socket.html)?  Have you already checked out the HTTP/HTTPS server (https://www.nongnu.org/lwip/2_1_x/group__httpd.html) included in lwIP 2.1?

The reason I ask, is that I can recreate the same API (for any of those) in hosted C, so that code should be directly portable between the two.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 19, 2022, 05:28:59 am
Netconn, currently.

When I was doing some debugging (other thread about mutexing LWIP API and LWIP_TCPIP_CORE_LOCKING=1 blowing the whole thing up because the code examples used the same mutex around the low_level_input/output as they used for the API higher up) I found netconn ends up in the same place as sockets.

Quote
Have you already checked out the HTTP/HTTPS server included in lwIP 2.1?

Yes; this
(https://peter-ftp.co.uk/screenshots/202207194813932606.jpg)

I can't work it out; it is too complicated. And if it is anything like the rest of that ETH code it doesn't work (e.g. the NTP there is only a skeleton). The abandoned project was seemingly based on that code.

Today I am doing the file listing, with clickable filenames, and clicking on a file will download it.

Quote
Downloads are served exactly like HTML pages, only the Content-Type: defines the file type.
If you want to force the file to be saved (and not opened in an application), use Content-Type: application/octet-stream.

EDIT: found something curious. I am displaying the file listing, using conventional HTML

(https://peter-ftp.co.uk/screenshots/202207192913944909.jpg)

and if I click on say that jpg file, the browser (Chrome) pops up a Save File dialogue. I haven't sent it any headers at that point. It correctly selects what to do according to the file type. If there is no file type, it offers to save it under "all files". This is exactly right. Very neat!

I can also see some back doors here, because in this simple system each file can be downloaded as a direct URL, bypassing any login, unless that is specifically blocked.

EDIT: both chrome and FF also save the file with the correct date (in 2006). Where the hell does the browser get that info from? It is not hidden in the HTML.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 20, 2022, 04:55:27 am
Quote
EDIT: both chrome and FF also save the file with the correct date (in 2006). Where the hell does the browser get that info from? It is not hidden in the HTML.

From the HTTP response header, specifically the Date one.

Note that browsers can send HEAD requests, to obtain the date (Date:), MIME type (Content-Type:), and size (Content-Length:) headers without bothering with the body content.  (It is just like GET, but the body part is not sent.)

Quote
I can also see some back doors here, because in this simple system each file can be downloaded as a direct URL, bypassing any login, unless that is specifically blocked.

Yep.  The most practical way is to set a cookie at login, that you can check at request header processing time.

There are several approaches.  The two main ones differ between whether you keep a table of every logged in user in memory on the server, or not.

If your server keeps a small table of every logged in user in memory, consisting of allowed access mask, an expiry timestamp, optionally an IP address, and a random number (that you also set as the login cookie value), it is a simple matter of verifying on each request that the cookie value matches an entry in the table, that the entry has not expired, that the request comes from the recorded IP address (if one is stored), and that the access mask allows the request.

The IP address limits the use of the cookie to that client IP address, but does not otherwise add anything to the security.  When you generate a new random number, always make sure it does not already exist in the authorized user table.  The security stems from the random number being unguessable, so it needs to be large and cryptographically secure.  Your TLS implementation should provide a suitable pseudorandom number generator.

Note that if the authorization is cancelled after some inactivity period, you'll want to support two such records for each logged-in user.  Whenever the cookie value is old enough (typically a fraction of the expiry time), it is replaced by a new one.  However, because the order of HTTP requests is not deterministic, there is a time window during which the client will still use the old value: you'll want to accept either one for a duration.  Personally, I'd just accept both, always adding a Set-Cookie header with the new one to responses where the request used the old cookie.

The other way is to use a single secret "salt" on the server, and keep the authorization information in the user cookie.  The cookie also contains a hash from a suitable secure hash function, with the plaintext (the data the hash is calculated from) containing the secret salt.  On each request, the server verifies the cookie value is what it itself has set before, by recalculating the hash, and comparing it to the hash in the cookie.  If any character of the cookie is changed, the hash no longer matches.  This is computationally much more work for the server, but lets things like multiple load-sharing servers work (as long as they share the secret "salt") without having to share possibly huge user authorization tables.  Furthermore, to avoid certain types of attacks, it is important to construct the plaintext and salt combination in specific ways.

Neither of these is secure, unless the connection is encrypted using TLS.  None of the cookie-based authentication methods are.  And nobody uses HTTP authentication, because the user interface in browsers (especially regarding logging out) is so poor.  (Anyone listening in on the unencrypted traffic can simply steal and reuse the authentication cookie.  Even IP address limiting is just a small hurdle, because spoofing the source IP address is easy unless encryption is used.)

I would definitely use the table-of-authorized-users approach in an embedded server, with at minimum 128-bit (16-byte), preferably 256-bit (32-byte, 43 chars if base64-encoded, 40 chars if encoded in base85) random numbers.
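
As a rough illustration of the table-of-authorized-users check, here is a minimal sketch in C.  The names (struct session, session_check, MAX_SESSIONS, TOKEN_LEN) and the 40-character token length are my own assumptions, not anything from an existing library.  Each slot keeps both the current and the previous cookie value, so a request that still carries the old one is accepted during rotation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SESSIONS 4      /* assumed table size */
#define TOKEN_LEN    40     /* assumed: 256 random bits in a 40-char encoding */

struct session {
    char     token[TOKEN_LEN + 1];      /* current cookie value; empty if slot unused */
    char     old_token[TOKEN_LEN + 1];  /* previous value, still accepted after rotation */
    uint32_t access;                    /* allowed access mask */
    uint32_t expires;                   /* expiry time in seconds (any monotonic clock) */
};

static struct session sessions[MAX_SESSIONS];

/* Compare two tokens without an early exit on the first differing byte. */
static int token_equal(const char *a, const char *b)
{
    unsigned char diff = 0;
    size_t i;

    if (strlen(a) != TOKEN_LEN || strlen(b) != TOKEN_LEN)
        return 0;                       /* also rejects empty (unused) slots */
    for (i = 0; i < TOKEN_LEN; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

/* Return the session if the cookie is known, unexpired, and grants every
   bit in `needed`; otherwise return NULL. */
static struct session *session_check(const char *cookie, uint32_t needed, uint32_t now)
{
    int i;

    assert(cookie != NULL);
    for (i = 0; i < MAX_SESSIONS; i++) {
        struct session *s = &sessions[i];

        if (!token_equal(cookie, s->token) && !token_equal(cookie, s->old_token))
            continue;
        if (now >= s->expires)
            return NULL;
        if ((s->access & needed) != needed)
            return NULL;
        return s;
    }
    return NULL;
}
```

When the match was against old_token, the response would also carry a Set-Cookie header with the new value.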
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 20, 2022, 11:10:20 am
Quote
From the HTTP response header, specifically the Date one.

I am not sending one. I know, because I wrote all the code on the server :) All I am sending is

Code: [Select]
// send header
netconn_write(conn, DOWNLOAD_HEADER, strlen((char*)DOWNLOAD_HEADER), NETCONN_COPY);

// send filesize, as "Content-Length: 1910916\r\n\r\n"
strcpy((char*)pagebuf, "Content-Length: ");
itoa(file_length,(char*)&pagebuf[16],10); // place size after "Content-Length: "
strcat((char*)pagebuf,"\r\n\r\n");
netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

if (f_open(&fp, fname, FA_READ | FA_OPEN_EXISTING) == FR_OK)
{
do
{
if ( f_read(&fp, pagebuf, 512, &numread) != FR_OK )
{
numread=0;
break;
}
netconn_write(conn, pagebuf, numread, NETCONN_COPY);
offset+=512;
}
while (numread==512);

f_close(&fp);
}



// Header for file download
static const uint8_t DOWNLOAD_HEADER[] =
"HTTP/1.1 200 OK\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: attachment>\r\n"
"<meta http-equiv=refresh content=1000>\r\n" // cancels out the 1Hz refresh used elsewhere (no obvious way to just cancel it)
;



I have concluded that Chrome and FF extract the EXIF data from the file (a jpeg), if there is no date in the header. I will test this later. Edge doesn't do it and saves the file under current date. To do it properly, should I use this format

Code: [Select]
"Content-Length: 364\r\n"
"Date: Sat, 01 Jan 2022 00:00:00 UTC\r\n"

Regarding a login, one interesting Q is whether it can support multiple concurrent clients. Most of it is stateless (client strings are immediately processed) so it can, provided they don't do stuff at literally the same time. There is only one RTOS thread. So obviously this is a hack, but I am ok with that.

I will read up on what sort of data is involved in cookies. It is probably more complicated than the rest of the server :)

I've spent a lot of time running file downloads and trying to see which packet / buffer size etc options made any difference. I am getting interesting results: 120kbytes/sec and even ridiculous amounts of buffering improve this only slightly (to 140). Reducing the buffers makes little difference until you get down to ridiculous levels e.g. 1 buffer at low level ETH which halves the speed.

Quote
I would definitely use the table-of-authorized-users approach in an embedded server,

The box has a 2MB FatFS filesystem and in there is a config.ini file which contains http_name= and http_pwd= and there is just one pair of these. I think this is consistent with there being only one possible user at any one time, although I guess one could have multiple sets of credentials so you could revoke those of somebody you just fired ;) However the security of this box will probably be laughable...
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 20, 2022, 02:35:46 pm
Quote
I have concluded that Chrome and FF extract the EXIF data from the file (a jpeg), if there is no date in the header. I will test this later. Edge doesn't do it and saves the file under current date.

Interesting; I definitely did not know that.  For ordinary files, the Date header should suffice, but not all browsers will honor it.
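
On the format itself: HTTP dates end in the literal "GMT" rather than "UTC", and the weekday is part of the fixed format.  Strictly speaking Date is the time the response was generated, while Last-Modified is the header meant for the file's own timestamp.  A minimal sketch of building such a string from FatFs-style packed date and time words follows; the function names are made up for illustration, and it assumes the filesystem timestamps are kept in UTC, which FAT itself does not guarantee.

```c
#include <assert.h>
#include <stdio.h>

/* Day of week, 0 = Sunday (Sakamoto's method, Gregorian calendar). */
static int day_of_week(int y, int m, int d)
{
    static const int t[12] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };

    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Format FAT packed date/time as e.g. "Thu, 21 Jul 2022 13:52:06 GMT".
   Returns the string length, or -1 if the packed date is invalid. */
static int fat_to_http_date(unsigned fdate, unsigned ftime, char *out, size_t outlen)
{
    static const char days[7][4] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
    static const char months[12][4] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
    int year  = (int)(fdate >> 9) + 1980;     /* bits 15..9: years since 1980 */
    int month = (int)((fdate >> 5) & 15);     /* bits 8..5:  month 1..12 */
    int day   = (int)(fdate & 31);            /* bits 4..0:  day 1..31 */

    assert(out != NULL);
    if (month < 1 || month > 12 || day < 1)
        return -1;
    return snprintf(out, outlen, "%s, %02d %s %04d %02u:%02u:%02u GMT",
                    days[day_of_week(year, month, day)], day, months[month - 1], year,
                    ftime >> 11,              /* hours */
                    (ftime >> 5) & 63,        /* minutes */
                    2 * (ftime & 31));        /* FAT stores seconds / 2 */
}
```

The result would be sent as "Last-Modified: " followed by the string and CR LF.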

Quote
Regarding a login, one interesting Q is whether it can support multiple concurrent clients. Most of it is stateless (client strings are immediately processed) so it can, provided they don't do stuff at literally the same time. There is only one RTOS thread. So obviously this is a hack, but I am ok with that.

I would use a separate file with for example
Code: [Select]
struct user {
    unsigned char  user[12];  /* Length dictated by making this structure 64 bytes long */
    uint32_t  access;  /* Bit mask, each bit set grants access to something */
    uint32_t  salt[4];  /* Random number */
    uint32_t  hash[8];  /* SHA256((access) (user) \0 (password without padding) \0 (salt)) */
};
At login time, you read the file until you find the first matching user.  Then, you calculate the SHA256 hash of the access word, username with a single terminating nul byte, password (obtained via url-encoded POST) with a single terminating nul byte, and the salt stored in the file.  If the hash matches the stored hash, the user is authorized.  You generate a cryptographically secure pseudorandom number or string (i.e., unguessable), and add an entry in the logged in users table in RAM for this time, this user, with the pseudorandom number.  In the login response, you include a HTTP header,
    Set-Cookie: peterslogin=pseudorandomnumber; Max-Age=seconds; SameSite=Strict; Secure

In future requests the client makes, it will include a HTTP header
    Cookie: peterslogin=pseudorandomnumber
although there could be additional values (separated by a semicolon and a space) on the same line too.  The name of the cookie must be preceded by a space, and the value must be succeeded either by a semicolon or a CR (of a CR LF pair), or be enclosed in doublequotes.
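
Picking the value out of the Cookie header can be sketched roughly like this.  The function name cookie_get is hypothetical; it expects the header value (the text after "Cookie: "), accepts the value with or without double quotes, and stops at a semicolon or the end of the line.

```c
#include <assert.h>
#include <string.h>

/* Copy the value of cookie `name` from a Cookie header value into `out`.
   Returns the value length, or -1 if not found or `out` is too small. */
static int cookie_get(const char *hdr, const char *name, char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    const char *p = hdr;

    assert(out != NULL && outlen > 0);
    while (*p) {
        const char *end;

        while (*p == ' ' || *p == ';')          /* skip separators between pairs */
            p++;
        end = p;
        while (*end && *end != ';' && *end != '\r' && *end != '\n')
            end++;                               /* end of this name=value pair */

        if ((size_t)(end - p) > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            const char *v = p + nlen + 1;
            size_t len = (size_t)(end - v);

            if (len >= 2 && v[0] == '"' && v[len - 1] == '"') {
                v++;                             /* strip optional double quotes */
                len -= 2;
            }
            if (len >= outlen)
                return -1;
            memcpy(out, v, len);
            out[len] = '\0';
            return (int)len;
        }
        p = end;
        if (*p == '\r' || *p == '\n')
            break;                               /* end of the header line */
    }
    return -1;
}
```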

If the cookie has been in use for sufficiently long time (typically, a fraction of the lifetime; this is also the precision at which these cookies expire), you add a new one by adding in the response headers another
    Set-Cookie: peterslogin=pseudorandomnumber; Max-Age=seconds; SameSite=Strict; Secure
with hopefully a new pseudorandom number; remembering both old one and new one for at least a while, because the requests are not strictly ordered, and especially when you serve just one request at a time, they can be delayed so that a later request can easily still have the "wrong", old, cookie value.

Whenever you get a request, you first decide whether the access needs authorization or not.  If it does, you consult the Cookie header(s) in the request, and verify that it has a valid peterslogin=pseudorandomnumber pair. The value may or may not be double-quoted; that is, peterslogin="pseudorandomnumber" would be equally valid.

The cookie name should only contain letters A-Z and a-z, digits 0-9, and any of ! # $ % & ' * + - . ^ _ ` | ~  (This means there are 26+26+10+15 = 77 allowed ASCII characters.)
The value can contain ASCII codes 33, 35-43, 45-58, 60-91, 93-126; or all non-whitespace non-control-characters except doublequote (34), comma (44), semicolon (59), or backslash (92).  (This means there are 90 allowed ASCII characters, so one can use Base85 (or own variant) to encode each 32-bit part in five characters, making the 256-bit pseudorandom cookie value 40 characters long. I'd keep the encoded value in RAM, so that one can do a string comparison instead of decoding/encoding it each time.)
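
A minimal sketch of the five-characters-per-32-bits encoding, in case it helps.  The function name base85_encode is my own; the alphabet is the one ZeroMQ's Z85 encoding uses, picked here only because it happens to avoid doublequote, comma, semicolon, and backslash.  Any other 85 of the 90 allowed characters would do equally well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 85 cookie-safe characters (Z85 alphabet). */
static const char b85[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#";

/* Encode `count` 32-bit words as 5*count characters plus a terminating nul.
   `out` must have room for 5*count + 1 bytes. */
static void base85_encode(const uint32_t *words, size_t count, char *out)
{
    size_t i;

    assert(words != NULL && out != NULL);
    for (i = 0; i < count; i++) {
        uint32_t v = words[i];
        int k;

        for (k = 4; k >= 0; k--) {      /* most significant digit first */
            out[5 * i + (size_t)k] = b85[v % 85];
            v /= 85;
        }
    }
    out[5 * count] = '\0';
}
```

With eight words from the random number generator this yields the 40-character cookie value, which can then be stored and compared as a plain string.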

Quote
I've spent a lot of time running file downloads and trying to see which packet / buffer size etc options made any difference. I am getting interesting results: 120kbytes/sec and even ridiculous amounts of buffering improve this only slightly (to 140). Reducing the buffers makes little difference until you get down to ridiculous levels e.g. 1 buffer at low level ETH which halves the speed.

What kind of bandwidth is that MCU supposed to achieve, then?
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 20, 2022, 04:18:41 pm
As a quick reply:

Quote
Interesting; I definitely did not know that.  For ordinary files, the Date header should suffice, but not all browsers will honor it.

This turns out to be very complicated. If no Date or Last-Modified header gets sent, the EXIF data seems to be getting used to save the file. But if either of these two headers is included, this feature disappears and the browsers save the file under the current date, disregarding the date in the header. Well, that's unless I am doing something else wrong, but I have tested it very carefully, stepping through the code and checking all the data being sent, and doing an Inspect in both Chrome and FF shows the correct dates which the browsers then ignore. IOW, neither of the two headers actually works, with any browser. And I get the same result with most files I download from elsewhere. This turns out to have been another rabbit hole. This link shows it to be a deliberate policy.
https://bugs.chromium.org/p/chromium/issues/detail?id=4574
The EXIF behaviour varies, too. FF and Edge use it to save the file if it is a jpeg with EXIF. Chrome does not. All 3 browsers disregard the Date: header.

Quote
For ordinary files, the Date header should suffice, but not all browsers will honor it.

I think they all ignore it. They see it (use Inspect on the file and you see it) and I didn't test obscure stuff like Opera. I spent way too many hours learning that this doesn't actually work, by design.

Quote
What kind of bandwidth is that MCU supposed to achieve, then?

No idea but I would expect a lot faster. The flash read for a 2MB file is ~2 secs, so the diff between that and about 15 secs actual time looks like LWIP+ETH low level stuff. The problem is that for all the GB posted on the net about this, there are few clues, and the standard buffer config makes almost no diff.

EDIT: One curious issue is that after a file transfer the page could do with a refresh. The reason for this is the structure of the code (not multi threaded etc). I have done a ton of googling on how to do a "server-side F5" but without javascript. Some hits suggested a header and I tried this

Code: [Select]
// Header for refreshing the browser
static const uint8_t REFRESH_HEADER[] =
"<!DOCTYPE html>\r\n"
"<html>\r\n"
"<head>\r\n"
"<title>xxxxxx</title>\r\n"
"<meta http-equiv=refresh content=2>\r\n"
"</head>\r\n"
"<body>\r\n"
"<META HTTP-EQUIV=\"refresh\" CONTENT=\"1\">\r\n"
"</body>\r\n"
"</html>\r\n"
;

where the META tag is the relevant one, but it doesn't work. I guess one can't just send data to a browser... Some posts above suggest that one has to send a complete HTML page for anything to work, but I thought the above does that.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 06:06:39 am
I managed to double the file download speed (to 250kbytes/sec) by polling for receive data 2x more often. This is the wrong data path so it should not make any difference, but evidently during a target -> PC file transfer there is a lot of data going in the other direction :)

Code: [Select]
/**
  * This function is the ethernetif_input task. It uses the function low_level_input()
  * that should handle the actual reception of bytes from the network
  * interface. Then the type of the received packet is determined and
  * the appropriate input function is called.
  *
  * This is a standalone RTOS task so is a forever loop.
  *
  */
void ethernetif_input( void * argument )
{
struct pbuf *p;
struct netif *netif = (struct netif *) argument;

// This mutex position is based on an idea from Piranha here, with the delay added
// https://community.st.com/s/question/0D50X0000BOtUflSQF/bug-stm32-lwip-ethernet-driver-rx-deadlock

do
    {
sys_mutex_lock(&lock_eth_if_in);
p = low_level_input( netif );
sys_mutex_unlock(&lock_eth_if_in);

if (p!=NULL)
{
if (netif->input( p, netif) != ERR_OK )
{
pbuf_free(p);
}
}

osDelay(10);   // this polling period has a corresponding effect on data flow speed in *both* directions

    } while(true);

}

Somebody should find this very funny. I should have realised that TCP/IP is a constant packet flow in both directions :)
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 21, 2022, 08:22:35 am
Quote
I managed to double the file download speed (to 250kbytes/sec) by polling for receive data 2x more often. This is the wrong data path so it should not make any difference, but evidently during a target -> PC file transfer there is a lot of data going in the other direction :)

Somebody should find this very funny. I should have realised that TCP/IP is a constant packet flow in both directions :)

Well, we did already discuss the fact that each TCP data packet received needs to be acknowledged as received.. ;)
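
A back-of-the-envelope model, offered as a guess rather than a diagnosis: the ethernetif_input loop as posted hands at most one received frame to lwIP per osDelay() period, so at most one ACK gets processed per period.  If the PC acknowledges every second full-size segment (a common delayed-ACK behaviour, and 1460 bytes per segment is also an assumption), the download rate cannot exceed two segments per poll.  The helper name below is made up.

```c
#include <assert.h>

/* Upper bound on the download rate in bytes per second when only one ACK,
   covering `bytes_per_ack` bytes, is processed every `poll_ms` milliseconds. */
static unsigned long ack_limited_rate(unsigned long bytes_per_ack, unsigned long poll_ms)
{
    assert(poll_ms > 0);
    return bytes_per_ack * 1000UL / poll_ms;
}
```

ack_limited_rate(2 * 1460, 20) gives 146000 and ack_limited_rate(2 * 1460, 10) gives 292000, which is in the same range as the reported 120-140 and 250 kbytes/sec.  If that model holds, draining every pending frame on each wake-up instead of just one should raise the ceiling without shortening the delay.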

If the files are 2 MiB or less in size, that's around eight seconds, so I guess it'll do.. but still, less than a megabyte per second does seem low, considering how even an 8-bit AVR ATmega32u4 running at 16 MHz can easily do 1 MiB/s over USB Serial.

I just haven't played enough with the innards of IP stacks to know how to squeeze everything out of lwIP, for example. :(
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 09:56:58 am
Yes, for this application, 250kbytes/sec is plenty. I am now getting that with a 10ms poll period. And still playing with the buffer settings makes sod-all difference. With a 5ms poll period and tweaking buffers way up (to the point of wasting a lot of RAM) I can get nearly 1mbyte/sec.

Having the rx interrupt driven would obviously be smarter but there is no free lunch, and it exposes you to hanging the product with fast input, unless you do one of the hacks of throttling the packet rate with ISR+timers etc. In this case it isn't needed and the osDelay(10) serves the important purpose of allowing other tasks to run. I tried taskYIELD() which should be ideal but immediately the whole thing more or less hangs.

With a poll period of 1 (which with osDelay actually means anything from 0 to 1, AIUI) I get 1.2mbyte/sec. There you go... with an ISR that would be even better.

Anyway, enough time spent on the download header date/time rabbit-hole, and enough time spent on the ETH speed rabbit hole. File UPload next :)
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 21, 2022, 12:29:55 pm

Quote
EDIT: One curious issue is that after a file transfer the page could do with a refresh. The reason for this is the structure of the code (not multi threaded etc). I have done a ton of googling on how to do a "server-side F5" but without javascript. Some hits suggested a header and I tried this

Code: [Select]
// Header for refreshing the browser
static const uint8_t REFRESH_HEADER[] =
"<!DOCTYPE html>\r\n"
"<html>\r\n"
"<head>\r\n"
"<title>xxxxxx</title>\r\n"
"<meta http-equiv=refresh content=2>\r\n"
"</head>\r\n"
"<body>\r\n"
"<META HTTP-EQUIV=\"refresh\" CONTENT=\"1\">\r\n"
"</body>\r\n"
"</html>\r\n"
;

where the META tag is the relevant one, but it doesn't work. I guess one can't just send data to a browser... Some posts above suggest that one has to send a complete HTML page for anything to work, but I thought the above does that.

See https://en.wikipedia.org/wiki/Meta_refresh

1. It must be between  <head> </head> tags
2. It should be written correctly, with quotes where they have to be  :  <meta http-equiv="refresh" content="5">

Some tips if you want to save some bytes ... 

Browsers will render the page even without the <!DOCTYPE html> line, so you could drop it to save a few bytes. Note, though, that without it they fall back to quirks mode rather than treating the page as standard HTML 5, so it is safer to keep that one short line.

The browsers don't care about those new lines between tags. Could use just \n or you could use nothing ... this is just as valid:
<html><head><title>abc</title><meta http-equiv="refresh" content="5"></head><body><p>Text</p></body></html>
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 01:41:14 pm
None of this seems to work, for the post file download page refresh.

Code: [Select]
// Header for refreshing the browser
static const uint8_t REFRESH_HEADER[] =
"<!DOCTYPE html>"
"<html>"
"<head>"
//"<meta http-equiv=\"refresh\" content=\"5\">"
"<meta http-equiv=\"refresh\" content=\"0; url=/&files.html\">"
"</head>"
"</html>"
;

Even just a means of loading the /&files.html URL would do the job.

I have been trying delays before/after sending that stuff.

Note that the file data is going to the client, which saves it off to a file, and then this refresh header gets sent. Is the client browser really receptive to HTML data after it has had the file data?
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: Nominal Animal on July 21, 2022, 02:06:55 pm
One possibility is for the upload to respond with a Status: 303 See Other (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/303) with a Location: (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Location) header specifying the URI.

For example, the response from a successful upload could be
Code: [Select]
HTTP/1.1 303 See Other
Date: Thu, 21 Jul 2022 13:52:07 GMT
Location: /files.html
Connection: Close


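For an embedded server, a redirect like the one above could be assembled along these lines.  This is only a sketch: build_redirect is a hypothetical helper, and the Content-Length: 0 line is my addition so the browser knows no body follows.  The resulting buffer would then go out in a single netconn_write().

```c
#include <assert.h>
#include <stdio.h>

/* Build a "303 See Other" response pointing at `location`.
   Returns the response length, or -1 if it does not fit in `buf`. */
static int build_redirect(char *buf, size_t buflen, const char *location)
{
    int n;

    assert(buf != NULL && location != NULL);
    n = snprintf(buf, buflen,
                 "HTTP/1.1 303 See Other\r\n"
                 "Location: %s\r\n"
                 "Content-Length: 0\r\n"
                 "Connection: close\r\n"
                 "\r\n",                 /* empty line ends the headers */
                 location);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```
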
However, usually the JavaScript approach is taken, because the notification of an upload success is really useful for users:
Code: [Select]
HTTP/1.1 200 Ok
Date: Thu, 21 Jul 2022 13:52:07 GMT
Content-Type: text/html; charset=utf-8
Connection: Close

<!DOCTYPE html><html><head><title>Upload successful</title><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><script type="text/javascript">
window.addEventListener('load', function (){ setTimeout(function (){ window.location = "/files.html"; }, 3000); });
</script><style type="text/css">
p { text-align: center; }
</style></head><body><p>Upload successful!</p><p>If you are not redirected to the file list in three seconds, please <a href="/files.html">click here</a>.</p></body></html>
Again, I'd just use a "hidden" file and emit it as if requested via a normal GET request.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 02:38:11 pm
This is after a download. I haven't got to the upload yet.

I just want the browser to either "do an F5" on the server, or request a particular URL.

I am fairly sure the packet gets sent to the browser, because there is no error return from netconn. But it looks like the browser is not seeing it.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 21, 2022, 02:52:58 pm
Try with javascript ...add this between the body tags

<script type="text/javascript">
  setTimeout(function () { location.reload(true); }, 5000);
</script>


5000 is the number of milliseconds to wait from the moment the javascript code is parsed by the browser

after 5000 ms, the function sent as a parameter will run which does location.reload()  , reloading the page.

In theory reload() has no parameters, so the true shouldn't do anything.  But in Firefox they added a forceGet parameter which if set to true, forces the browser to ignore cached content and force reload the page
See https://developer.mozilla.org/en-US/docs/Web/API/Location/reload
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 03:26:45 pm
No luck; tried with and without the CRLF, and same for the <head> tags.

Code: [Select]
// Header for refreshing the browser
static const uint8_t REFRESH_HEADER[] =
"<!DOCTYPE html>"
"<html>"
"<head>"
"<body>"
//"<meta http-equiv=\"refresh\" content=\"5\">"
//"<meta http-equiv=\"refresh\" content=\"0; url=/&files.html\">"
"<script type=\"text/javascript\">"
"setTimeout(function () { location.reload(true); }, 5000);\r\n"
"</script>"
"</body>"
"</head>"
"</html>"
;

Googling around, I also tried window.location.reload(); unsuccessfully.

But perhaps I am supposed to send this to the browser before the file data is sent to it. The problem is that the transfer could take a long time.

Using the various Inspect tools (I am no expert there) I see no evidence that this piece of data is reaching the browser. I suspect this is the same thing as the comments earlier on a progress bar with a file upload.

This actually raises another point: I have a Reboot link on this server, and post-reboot I would like the browser to periodically refresh that URL so it finds the server again.

Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 21, 2022, 04:00:03 pm


Quote
But perhaps I am supposed to send this to the browser before the file data is sent to it. The problem is that the transfer could take a long time.

wait, what?

No, once you output </html> you're done, you close the connection and the page transfer is complete.
If you want to put something on the page, you put it within the <body> </body> tags.

the javascript bit (what's between <script> </script> tags) ideally should be at the bottom of the page, because the timer starts as soon as the </script> tag is parsed.  You can change the javascript code to be triggered only on page load complete event or some other events, but in this version, it starts as soon as it's  parsed by browser.

Any html is made out of these bits

<html>  <- tells browser it's html content

<head >   
 ^- in this block there's properties of the html page that are not visible on the actual "page" , like the title that will show up on title bar , favicon ,
  here you also define stylesheets (css files) , third party fonts,  description and keywords for the page if desired and so on
 here you can also link to javascript files  (*.js) that a browser could load and cache
</head >

<body >

between these tags you output anything that you want to be visible on the page to the user
text, tables, forms, pictures
you can also put invisible stuff like inline javascript , code between <script> </script> tags   

</body>

</html> - this tells the browser the page is done.

You don't output all that as a header, and then continue to print stuff or whatever. Once </html> is printed, you close connection. Maybe the page doesn't refresh because the browser is still waiting for you to close the connection or because the html page is invalid as you write stuff after  the </html> is sent?

Last but not least, that meta refresh should be considered unreliable at best. Some browsers may even ignore it because they treat it as annoying, like those joke pages or scripts that show popup after popup to annoy the user or prevent the user from closing the page. (For this reason browsers added a checkbox to disable further popup messages on a page, or one could hold Shift down to abort popup messages.)

Another annoyance with the fact you have little memory is that it's hard for you to buffer the whole page in ram in order to determine its actual length (number of bytes) and send the proper header Content-Length before sending the html page.
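
One way around not knowing the length up front is HTTP/1.1 chunked transfer encoding: the response header carries "Transfer-Encoding: chunked" instead of Content-Length, and the body is sent as a series of chunks, each prefixed with its size in hexadecimal.  A zero-length chunk ends the body.  A minimal sketch, with make_chunk being a hypothetical helper and not part of lwIP:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Wrap `len` bytes of `data` as one HTTP/1.1 chunk in `out`:
   hex size, CR LF, data, CR LF.  Returns bytes written, or -1 if too small.
   Calling it with len == 0 produces the terminating "0\r\n\r\n". */
static int make_chunk(char *out, size_t outlen, const void *data, size_t len)
{
    int n;

    assert(out != NULL && (data != NULL || len == 0));
    n = snprintf(out, outlen, "%lx\r\n", (unsigned long)len);
    if (n < 0 || (size_t)n + len + 2 > outlen)
        return -1;
    if (len > 0)
        memcpy(out + n, data, len);
    memcpy(out + n + len, "\r\n", 2);
    return n + (int)len + 2;
}
```

Each piece of the page can then be sent as soon as it is generated, so only one small buffer is needed.  This only applies to HTTP/1.1 clients, which every current browser is.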


As for the reboot and show page to refresh once reboot is done I don't see that without javascript.
You can set up a timer, a function that launches another javascript function on an interval, let's say every 5 seconds.
That function could attempt to retrieve a file that's guaranteed to be there and timeout in a few seconds if the connection fails. If the connection is successful, your javascript function tells the browser to go to default index / login page.
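
Something along these lines, kept as a C string constant like the other pages in this thread.  It is only a sketch: REBOOT_PAGE and reboot_page_length are made-up names, and it assumes /favicon.ico is always served and that / is the page to return to.  The script uses the browser's fetch() function, so it needs a reasonably recent browser.

```c
#include <string.h>

/* Page sent in response to the reboot request.  Every 5 seconds the script
   tries to fetch a small known file; once that succeeds, it loads "/". */
static const char REBOOT_PAGE[] =
    "<!DOCTYPE html><html><head><title>Rebooting</title></head><body>"
    "<p>Rebooting, please wait...</p>"
    "<script>"
    "setInterval(function () {"
    "  fetch('/favicon.ico', { cache: 'no-store' })"
    "    .then(function (r) { if (r.ok) { window.location = '/'; } })"
    "    .catch(function () { /* server not back yet; try again next tick */ });"
    "}, 5000);"
    "</script></body></html>";

/* Length of the page body, for the Content-Length header. */
static size_t reboot_page_length(void)
{
    return strlen(REBOOT_PAGE);
}
```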
 
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 04:44:40 pm
I must be getting my tags mixed up, but actually everything is running ok. It is merely that after the file download (to the browser) the browser has not refreshed the page. I notice this because the 2 sec auto refresh has disappeared. If I press F5, it returns.

The first header I send to the browser (when it goes to the base URL) is

Code: [Select]
// This one is at the top of every page. This header is not terminated and must be
// followed by e.g. PAGE_HEADER_MAIN or PAGE_HEADER_STATUS.
static const unsigned char PAGE_HEADER_ALL[] =
"HTTP/1.1 200 OK\r\n"
"Server: XXXXXX \r\n"
"Content-Type: text/html\r\n\r\n"
"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">"
"<html>"
"<head>"
"<title>XXXXXX </title>"
"<meta http-equiv=Content-Type content=\"text/html; charset=utf-8\">"
#ifdef BLOCK_FAVICON
"<link rel=\"icon\" href=\"data:,\">"
#endif
;

Next I send this one, which terminates the page with </html>

Code: [Select]
// This one is the main page
// (eventually this will appear only after a login)
static const unsigned char PAGE_HEADER_MAIN[] =
"<p><font size= 2><b>XXXXXX HTTP Server</b></font></p><p>"
"<a href=\"&files.html\">Files</a><br>"
"<a href=\"&status.html\">Status</a><br>"
"<a href=\"&reboot.html\">Reboot</a><br>"
"<a href=\"&test.html\">Test</a><br>"
"</body></html>\n\r";

Then if somebody clicks on the &files.html link, they get a file listing, which sends the first one above and then this one

Code: [Select]
// This one is the file listing
static const unsigned char PAGE_HEADER_FILES[] =
"</head>"
"<body>"
"<p><font size= 2><b>XXXXXX Files</b></font></p><p>"
"<meta http-equiv=refresh content=2>" // 0.5Hz refresh
"<pre>";

Then if somebody clicks on one of the files, they get this one which starts a new page

Code: [Select]
// Header for file download
static const uint8_t DOWNLOAD_HEADER[] =
"HTTP/1.1 200 OK\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: attachment>\r\n"
"<meta http-equiv=refresh content=1000>\r\n" // cancels out the file listing refresh
"Pragma-directive: no-cache\r\n"
"Cache-directive: no-cache\r\n"
"Cache-control: no-cache\r\n"
"Pragma: no-cache\r\n"
"Expires: 0\r\n"
;
and that page is not closed with </html>. Then this is the download code which should be fairly obvious

Code: [Select]
static void Download(struct netconn *conn, char *fname)
{

#define DBUF 512

    uint8_t pagebuf[DBUF];
    uint32_t offset=0;
    UINT numread=0;
    FIL fp;
    FILINFO fno;
    FRESULT fr;
    //err_t netconn_err;
    static const char montab[36+1]="JanFebMarAprMayJunJulAugSepOctNovDec";
    char datebuf[40];

    // We have the filename, now get its parameters (if it still exists, which is virtually certain)

    fr = f_stat( fname, &fno );

    if ( fr == FR_OK )
    {

        // send filesize, as "Content-Length: 1910916\r\n\r\n"
        strcpy((char*)pagebuf, "Content-Length: ");
        itoa(fno.fsize,(char*)&pagebuf[16],10); // place size after "Content-Length: "
        strcat((char*)pagebuf,"\r\n");
        netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

        // Send date/time, as "Date: Mon, 21 Oct 2015 07:28:00 UTC\r\n"
        // To save the hassle of calculating the day of week (which isn't stored in the FAT12 directory anyway)
        // we use Mon in case the client is validating the presence of the day string ;)
        // For jpegs, FF and Edge extract this from exif, interestingly, if header is missing
        // Otherwise browsers appear to ignore this, and same with Last-Modified.

        strcpy((char*)pagebuf, "Date: Mon, ");
        int monidx = 3*(((fno.fdate >> 5) & 15)-1);
        snprintf(datebuf,sizeof(datebuf),"%2u %c%c%c %4u %02u:%02u:%02u%c",
            fno.fdate & 31,
            montab[0+monidx],
            montab[1+monidx],
            montab[2+monidx],
            (fno.fdate >> 9) + 1980,
            fno.ftime >> 11,
            (fno.ftime >> 5) & 63,
            2*(fno.ftime & 0x1f),
            0);
        strcat((char *)pagebuf, datebuf);
        strcat((char *)pagebuf, "\r\n\r\n");  // 2xCRLF is the last thing before the binary file data
        netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

        // send file; this is faster than KDE_file_read() which gradually slows down for big files

        if (f_open(&fp, fname, FA_READ | FA_OPEN_EXISTING) == FR_OK)
        {
            do
            {
                if ( f_read(&fp, pagebuf, DBUF, &numread) != FR_OK )
                {
                    numread=0;
                    break;
                }
                netconn_write(conn, pagebuf, numread, NETCONN_COPY);
                offset+=DBUF;
            }
            while (numread==DBUF);

            f_close(&fp);
        }
    }

    // An attempt to reload the page after the transfer
    netconn_write(conn, REFRESH_HEADER, strlen((char*)REFRESH_HEADER), NETCONN_COPY);

}

What I am not doing is closing the page (with </body> and/or with </html>) after the transfer, but is that necessary? IOW, is the file transfer a part of the body? I tried the foregoing but it makes no difference.

The Q seems to be: what state is the browser in after the file download? I don't think you can just send it data.

Quote
Maybe the page doesn't refresh because the browser is still waiting for you to close the connection or because the html page is invalid as you write stuff after  the </html> is sent?

Indeed.

I think the file data must be within the body, so I sent the JS stuff

Code: [Select]
"<script type=\"text/javascript\">"
"setTimeout(function () { window.location.reload(true); }, 5000);\r\n"
"</script>"

and then closed the page with </body></html> but that doesn't work either.

At the end of the file download, the browser still has the download form on the screen. If one presses F5, that form disappears and the page gets refreshed.

Note that this website is dropping some chars above, like the backslash in \r.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 21, 2022, 06:12:15 pm
Oh... some issues there.

First of all, drop that long HTML 4.01 DOCTYPE, stop using that. It's only needed for compatibility purposes, if you have some html content that actually uses deprecated html tags and other bad things from the past.
For HTML 5 the short <!DOCTYPE html> is all you need. Browsers will still render a page without any DOCTYPE, but they do so in quirks mode, so leave the short one in unless you really need to save the bytes.
Then, you don't need to say meta http-equiv = "content-type" because you already tell the browser through the HTTP response header.

( --- assume [ENTER] or [enter] means "\r\n" everywhere I say it, I type [enter] faster than \r\n  --- )

You should have a function that does the HTTP Response, and a separate function that does the actual html headers:

Code: [Select]
function generate_headers( contentType="text/html", filename="", filesize=-1, extraHeaders = "" )
Your HTTP Response would be 

Code: [Select]
HTTP/1.1 200 OK [enter]
Content-Type: text/html [enter]  <= this "text/html" could be passed as parameter to the function that builds the string so you could reuse this function with "octet-stream" as well
[enter]

This is your basic header.
If you want to serve a file as a download, then you can add some lines after changing the content-type to octet-stream or something else:

Code: [Select]
Content-Disposition: attachment; filename="file.xyz" [ENTER] 
Content-Length: <filesize> [ENTER]

 
<---  you can add the file name, file size as parameters to your generate header function, see above

In your generate_headers function you could do something as basic as this

if filesize != -1 then you know you have a download, so add those two lines.  Optionally, only add the ; filename="###" segment if filename is not an empty string (filename != ""). If it is empty, leave just "attachment".

You can add those no-cache tags as well, if you want.

You can also have an extra parameter to your generate_headers function called "extraHeaders", which is useful when you want to add an extra response line to your header, for example Set-Cookie to set the session info or the token that determines whether the user is logged in.

You have to be careful how you glue these bits together, so that you'll only have an empty line when you're done with the Response Header lines.

ex generate_headers("text/plain", "", -1, "Set-Cookie: token=value; Secure; Max-Age=3600 [ENTER]")
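
In C, the generate_headers idea could look roughly like the sketch below.  The parameter order follows the pseudocode above; the buffer handling, the long type for the file size, and the convention that extra_headers already ends in CR LF are my assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Build the HTTP response header into `buf`.
   filesize < 0 means "not a download"; extra_headers may be NULL or one or
   more complete header lines, each ending in "\r\n".
   Returns the header length, or -1 if it does not fit. */
static int generate_headers(char *buf, size_t buflen, const char *content_type,
                            const char *filename, long filesize,
                            const char *extra_headers)
{
    size_t used;
    int n;

    assert(buf != NULL && content_type != NULL);

    n = snprintf(buf, buflen, "HTTP/1.1 200 OK\r\nContent-Type: %s\r\n", content_type);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    used = (size_t)n;

    if (filesize >= 0) {                       /* download: add the two extra lines */
        if (filename != NULL && filename[0] != '\0')
            n = snprintf(buf + used, buflen - used,
                         "Content-Disposition: attachment; filename=\"%s\"\r\n"
                         "Content-Length: %ld\r\n", filename, filesize);
        else
            n = snprintf(buf + used, buflen - used,
                         "Content-Disposition: attachment\r\n"
                         "Content-Length: %ld\r\n", filesize);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }

    /* Optional extra lines, then the single empty line that ends the header. */
    n = snprintf(buf + used, buflen - used, "%s\r\n",
                 extra_headers != NULL ? extra_headers : "");
    if (n < 0 || (size_t)n >= buflen - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

Keeping the empty line in exactly one place, at the very end, avoids the problem of a stray blank line cutting the header short.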

Once you're done sending the Response Header,  you send the content ... which can be a html page, or it can be a download.  As soon as you're done, you close the connection and everything ends here.
Your webserver will receive another request.

For your html responses, AFTER the Response Header lines, you will have the basic HTML header, which would show up in all html files :

Code: [Select]
<html>
  <head>
    <title> this title is optional but shows up in the title bar </title>
    <link rel="shortcut icon" href="/favicon.ico" or href="data; ... " - see https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs -  />
    <link rel="stylesheet" href="filename.css" > <-- if you choose to serve a stylesheet as a separate CSS file (will be cached)
    <script src="file.js"></script>  <--  if you choose to serve a javascript each time a html page is loaded (will be cached)
 </head>
 <body>

After body, you can put actual content that's visible on a page.
You could have everything up to <body> including it, in another function, let's call it generate_pageHeader(title="Main Page") 

Then you print the stuff you want to be seen on the page, and last you close the body tag and the html tag.

The body tag, the html tag,  some optional inline javascript if you wish ... you could have a function called generate_pageFooter() that outputs
Code: [Select]
</body>
</html>
After you output this, you close the connection, the server's job is done.


NOW ... for your main page with the menu ... & is a bad character to use, and you shouldn't rely on  it to make distinction between menu options and actual downloads.
I would basically set a rule, for example :

anything that's in the root of the web server is a command or something that returns a html / picture response
* index.html = show main page
* login.html =  login user
* logout.html = log out
* upload.html = upload file
* edit.html  = edit a file
* delete.html = delete a file
* script.css = serve a css file by setting the proper mime type in the header function and then sending the css text after the response
* script.js = serve a javascript file, like with css file
* favicon.ico = serve favicon file

If the request has a folder in the url, then you know it's a download  ex   

* /FILES/CONFIG.INI  -  you receive the request, you see there's FILES/ there, so you know a download was requested, send proper response header then send the file, close the connection
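
As a rough illustration of that rule, the path taken from the request line can be classified before anything else is done with it. The names route_t and classify_path and the "/FILES/" prefix handling are assumptions for this sketch; query-string parsing (edit.html?filename=...) is left out.

```c
#include <assert.h>
#include <string.h>

typedef enum { ROUTE_PAGE, ROUTE_DOWNLOAD, ROUTE_BAD } route_t;

/* Classifies the path from the request line ("GET <path> HTTP/1.1").
 * On ROUTE_DOWNLOAD, *name points at the file name inside path. */
route_t classify_path(const char *path, const char **name)
{
    static const char prefix[] = "/FILES/";
    const char *file;

    assert(path && name);
    *name = NULL;

    if (path[0] != '/') return ROUTE_BAD;
    if (strstr(path, "..") != NULL) return ROUTE_BAD;     /* refuse path traversal */

    if (strncmp(path, prefix, sizeof prefix - 1) == 0) {
        file = path + (sizeof prefix - 1);
        /* only files directly under /FILES/ are served */
        if (file[0] == '\0' || strchr(file, '/') != NULL) return ROUTE_BAD;
        *name = file;
        return ROUTE_DOWNLOAD;
    }

    if (strchr(path + 1, '/') != NULL) return ROUTE_BAD;  /* unknown folder */
    return ROUTE_PAGE;                                    /* index.html, files.html, ... */
}
```

ROUTE_PAGE requests then go to the page handlers by name, and ROUTE_DOWNLOAD requests get the download header followed by the file contents.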

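If you do this, then in your html pages you need to write every address relative to the root, i.e. starting with / (href="/files.html" rather than href="files.html"), so the links still resolve correctly no matter which URL the page was loaded from.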

anyway, we're going off topic... back to your default  index page

B tags and <font> tags are kind of deprecated, it's best to use style= " " , or move what you put in the style, into separate CSS or  inline CSS in the head

So your page after body could be this :

Code: [Select]
<h1>Menu</h1>
<a href="files.html">Files</a><br/>
<a href="status.html">Status</a><br/>
<a href="reboot.html">Reboot</a><br/>
<a href="test.html">Test</a><br/>

then you send the footer  aka </body></html>  and connection is closed.

Code: [Select]
<table>
<tr><td>Filename</td><td>Size</td><td>Attr.</td><td></td></tr>

<tr>
 <td><a href="/FILES/CONFIG.INI">CONFIG.INI</a></td>
 <td>954B</td>
 <td>---a-</td>

 <td> <a href="edit.html?filename=CONFIG.INI">edit</a>  <a href="delete.html?filename=CONFIG.INI">delete</a></td>
</tr>

</table>

In the files list, there's no reason why you would refresh the file list often ... are files gonna be created all the time? No. Once you dump the list of files, connection is closed, job done. User can hit reload to refresh the page if needed.

When user clicks on the file name, the browser creates a new connection to your server and requests the file by asking for /FILES/CONFIG.INI  or /FILES/WHATEVER ... your code sees that "FILES" and it can extract the file name desired to download from the string.
Or, you could have  download.html?filename=CONFIG.INI I guess. But you still have to parse the URL in the request headers.

There's no reason to add that meta refresh to disable the refresh because it's not possible, it's like this request for a file is loaded in a different browser, or another computer, it has no connection to the previous page.
You can't mix that meta tag with the HTML Response Header lines, it doesn't work like that.

For downloads, you call your generate response header function but you DO NOT call the generate html header function or your generate html footer function. You simply output the proper response headers, then dump the file and close the connection.






Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 21, 2022, 07:56:30 pm
Quote
In the files list, there's no reason why you would refresh the file list often ... are files gonna be created all the time?

As a bit of explanation: I have a 2MB FAT12 filesystem in a piece of SPI FLASH. This is accessible from embedded code (for which FatFS implements a 2MB drive) and also from Windows over USB (which just sees a 2MB removable drive with 512 byte sectors).

So files can just pop up from nowhere for this HTTP server file listing :) They can be created by one or more RTOS tasks (this http server is just an RTOS task) or by Windows over USB.

FatFS has been configured to support 8.3 names only (no LFN) and a decision was made to support only root file ops in the FatFS API. No directories will be supported, although ones already created in the root are visible:

(https://peter-ftp.co.uk/screenshots/202207214913955720.jpg)

You will point out that Windows can do what the hell it likes in that 2MB drive (it doesn't know there are other players) and that does work for the windoze->embedded direction; FatFS presents windoze-created LFNs as 8.3 using the standard alternate names and embedded code (including this server, of course) will see all windoze creations immediately, as soon as windoze has written the FAT entry. The bit which has no good solution is that embedded->windoze; windoze won't see filesystem changes due to its internal caching (it assumes it totally owns any removable device); there are crude workarounds like a periodic dismount of the drive, and there is an embedded function for that...

Hence the auto refresh on the file listing is desirable. I will document that it is lost after a file transfer.

Quote
& is a bad character to use

I have removed that; the URLs like files.html cannot appear accidentally in the 8.3-only filesystem :)

Favicon.ico could but I check for that before checking for potential filenames, so a file called that will not be picked up.

Quote
you send the content ... which can be a html page, or it can be a download.  As soon as you're done, you close the connection and everything ends here.

I realised the download is a part of the body so I am doing a </body></html> after the download is done. Not that it does anything. I think the browser has disconnected anyway after it got the specified # of bytes in the file.

Quote
B tags and <font> tags are kind of deprecated, it's best to use style= " " , or move what you put in the style, into separate CSS or  inline CSS in the head

There is also <strong> but I am not using any style sheets. I pay others for CSS work :)

Quote
then you send the footer  aka </body></html>  and connection is closed.

This is curious. I am sure LWIP (the TCP/IP embedded stack) does not implement any of that so it must be the browser which detects the </head> and closes the connection at the windoze socket API.

Quote
Or, you could have  download.html?filename=CONFIG.INI

I had that originally but you can see that whole string when you hover over the filename. Still, my Edit and Delete links do show that stuff...

Quote
There's no reason to add that meta refresh to disable the refresh because it's not possible, it's like this request for a file is loaded in a different browser, or another computer, it has no connection to the previous page.
You can't mix that meta tag with the HTML Response Header lines, it doesn't work like that.

OK, good to have this confirmed. Still, why is that? The browser must be closing the connection after it has got the specified file size. I guess it does not do that if the content is embedded in the page e.g. an inline image. But for application/octet-stream (which is what triggers the browser to open the Save dialogue) it is evidently closing the connection right afterwards.

With an all-JS approach one could do anything but I need to do this within limits of my knowledge, more or less. I spent all day today playing with re-establishing the refresh after a file download and none of the zillion suggestions online work, HTML or JS.

Moving to uploads, presumably the server doesn't know what is sending the data, so one would use some JS in the browser to implement the sending, and have a progress bar. I can see stuff like this HTML5 solution https://codepen.io/PerfectIsShit/pen/zogMXP but as usual there are key parts missing, because all these people are running with proper servers.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 21, 2022, 08:25:02 pm

Quote
I realised the download is a part of the body so I am doing a </body></html> after the download is done. Not that it does anything. I think the browser has disconnected anyway after it got the specified # of bytes in the file.


No. When you do a download you do not transmit any html: no <html>....<body>, no </html>. You send the response header, you send the data, and then you close the connection. (Under HTTP/1.1 the browser otherwise keeps the connection open for reuse unless the response says Connection: close; the Keep-Alive header is no longer allowed in http/2.)
You're messing around with the dates and extracting from files... no need... send as binary data.

that meta refresh is not your solution... go with javascript, you can have something basic as last file system update or something like that, your javascript requests that value and if different than previous value it has, something has changed and force the page reload using javascript.
Parsing the fat table to get newest creation date shouldn't be hard.

I'd suggest leaving this refresh for last.

you have a whole http server with multiple connections and lots of features you don't need in lwip/src/apps/http/httpd.c  , check functions there.



Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 22, 2022, 07:25:17 am
Quote
You're messing around with the dates and extracting from files... no need... send as binary data.

The file being downloaded is sent as binary. That works. Interesting to learn that that closes the connection afterwards.

Quote
you have a whole http server with multiple connections and lots of features you don't need in lwip/src/apps/http/httpd.c  , check functions there.

The cancelled server project used that. I cancelled it after it ran to a huge number of hours; could not afford to pay so much. That ST stuff is mostly junk which doesn't work properly and needs a lot of time spent fixing.

Agreed; JS must be the way to do refresh and I will dig around again at the end of this.

Re uploads, I am going to get someone on freelancer.com to write me some JS. It won't be completely trivial.

I have that "Upload file" link (see image above) and when somebody clicks on that, I can emit whatever data I want and send that to the browser. The browser JS can then return the file (with a byte count header at the start) and I can save the data to the filesystem as it arrives.

There are no doubt more standard ways to do this but I am having difficulty working out which side sends what, and then I would like the progress bar which does need JS. And with JS I have control of both ends of the link. Presumably JS can also pop up the standard client file picker.

Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on July 22, 2022, 11:17:54 am
The browser prefers to keep the connection alive, because that initial handshake between client and server takes time.  With HTTP/1.0 the connection is closed after each response unless keep-alive is negotiated; with HTTP/1.1 it stays open by default unless either side sends Connection: close.
At the same time, browsers are optimized and assume the average website will have extra resources to download (css, js files, images on the page etc) so some will be ready to open 2-3 or even more connections in parallel as soon as the html page tells it there's some extra file to be retrieved (ex a img tag, a favicon address) even if the server says it supports recycling the connection to serve more content. It won't wait until the whole html page is downloaded to reuse the connection to get those extra resources.

Your server listens on an IP and on port 80 (default for unencrypted http).
Client's browser picks a random unused port (let's say 50120) and a connection is created between  server ip : 80  <-> client ip : 50120
if there's more resources on the page being requested, the browser may decide to open a second connection by picking another free port , let's say 50300 .. and now there's connection serverip:80 <-> clientip:50300
It's your server's job to keep track of these connections and manage their state, and close the connection when you're done serving something, so that entry in the pool can be reused for another connection.
A simple way would be to have a pool of 2-8 connections and allow a maximum of two or three connections per IP (this would allow one upload or download, while browsing page, refreshing page using the other connection)
If your server doesn't distinguish between connections, it could happen that while your server pushes a download to a client, you get another request and you serve the reply to that in your download corrupting the download.
My guess is that right now, if you send something you're blocking anything else or ignoring any incoming data which is not good. 

I'd suggest looking some more into that httpd.c file, strip whatever you don't need from it , ex ipv6 support, https support, but try to figure out how it keeps the state of connections and how it parses the request header and all that.

Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 22, 2022, 12:19:43 pm
I have this thing nearly done but I am getting stuck on simple things like file deletion. Displaying the updated Files page after a deletion turned out to be really hard. It was solved with these two

static const unsigned char PAGE_HEADER_ALL[] =
   "HTTP/1.1 200 OK\r\n"
   "Server: XXXXXX\r\n"
   "Content-Type: text/html\r\n\r\n"
   "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">"
   "<html>"
   "<head>"
   "<title>XXXXXX</title>"
   "<meta http-equiv=Content-Type content=\"text/html; charset=utf-8\">"
#ifdef BLOCK_FAVICON
   "<link rel=\"icon\" href=\"data:,\">"
#endif
   ;

static const uint8_t REFRESH_HEADER[] =
      "<meta http-equiv=\"refresh\" content=\"0; url=files.html\" />"
      "</body></html>\r\n"
      ;

However I have been doing some version of this for ages. Some part of that first header is key to make the refresh/redirect line work, and it isn't the obvious bits like <head>. I reckon it is some deprecation.

I asked someone about JS for uploading (it seems to me that once you are running JS in the client, you can run whatever private protocol you like for the data transfer) and I will now have a go at editing:

Quote
Editing of a document can be done just the same with a POST form, only you can use a TEXTAREA - https://developer.mozilla.org/en-US/docs/Web/HTML/Element/textarea (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/textarea) -  around the contents of the document and the user will see the text in a editable text box on the screen
When you hit the save button, the browser will send you the whole contents of that textarea and whatever other parameters are in the form (hidden on purpose by you or not)

Note that the contents of the document would have to be escaped ex < > and & at the very least :  < would be &lt; , > would be &gt; and & would be &amp; - the browser will parse the html and show the characters properly on the screen and send you the actual characters in the POST form data, not the escaped ones.

There's some other gotchas, like if you don't specify a text encoding for your html files, it's assumed your html file and whatever is in it is UTF-8, so if you dump a .txt file that was written with character encoding ISO-8859-1 then some valid characters in ISO-8859-1 would be invalid in UTF-8 (invalid code points, those specific characters get two byte combinations in utf-8) and not rendered properly on screen.  See for example https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/ (https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/) for explanation
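
The escaping step mentioned in the quote can be done block by block while the file is streamed into the TEXTAREA. A minimal sketch follows; the function name html_escape and its signature are made up for illustration. It only handles the three characters named above, and the output buffer has to allow for up to 5 bytes per input byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escapes src[0..n) into dst[0..cap). Returns the number of bytes written,
 * or -1 if dst is too small. The output is not NUL-terminated, so it can be
 * handed straight to a socket write together with the returned length. */
int html_escape(char *dst, size_t cap, const char *src, size_t n)
{
    size_t out = 0;
    size_t i;

    assert(dst && src);
    for (i = 0; i < n; i++) {
        const char *rep = &src[i];   /* default: copy the byte unchanged */
        size_t len = 1;

        switch (src[i]) {
        case '<': rep = "&lt;";  len = 4; break;
        case '>': rep = "&gt;";  len = 4; break;
        case '&': rep = "&amp;"; len = 5; break;
        default: break;
        }
        if (len > cap - out) return -1;
        memcpy(dst + out, rep, len);
        out += len;
    }
    return (int)out;
}
```

With a 512 byte read block, a 2560 byte output buffer is always enough for one call.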

I have the text box opening ok, 80x25

(https://peter-ftp.co.uk/screenshots/20220722324651817.jpg)

Now I have to work out how to send it back. Will have to check there is enough room for the temp file and flag an error or something... can this be done with just HTML? It needs a bidirectional protocol of some sort. I need to send back a message (which again must be a complete HTML page) advising whether the write passed or failed.

This
http://marc.merlins.org/htmlearn/cgitutor/textareas.html (http://marc.merlins.org/htmlearn/cgitutor/textareas.html)
seems to have examples of POST to return the textarea to the server. And I guess that is it - you POST the textarea back to the server, and it sends back a page with some message, or nothing.

Am I right that a server can accept a POST at any time? If not, how is the session kept alive? Is there some underlying keep-alive process running when a textarea is opened?
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 24, 2022, 09:24:47 pm
A colleague reckons I should use JS for both returning the textarea and for the file upload.

JS would enable a progress bar in both cases; relevant due to the slow 30kbyte/sec flash filesystem writing speed.

He's having a go at it.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 31, 2022, 12:40:10 pm
I've been getting on well with this http server. Quite a learning exercise, doing this with the basic TCP/IP API in LWIP (I am using netconn), and byte-banging everything. 99% of stuff on google is of little use because it assumes a proper web server...

From the POV of "embedded", two things which are quite messy are returning a TEXTAREA edit box, and file upload. These both use the POST method which involves searching incoming data for huge delimiting strings, which is tricky if your RX buffer is say 512 bytes. The string itself should not be > 512 in length but they can be > 512 bytes from the start of the incoming data. The result is quite tacky code, if you are going to correctly handle partial matches.

The alternative is to use PUT which is much simpler (you just get a filename and byte count and search for 2xCRLF) but a browser can't transmit a PUT. You have to use JS.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: tellurium on July 31, 2022, 07:12:54 pm
Quote
The alternative is to use PUT which is much simpler (you just get a filename and byte count and search for 2xCRLF) but a browser can't transmit a PUT. You have to use JS.

The safest strategy is to load the whole file on the client side (browser) using JS, split it by small pieces, and POST/PUT piece by piece (again, using JS xhr or fetch), using binary encoding. This way, the next chunk gets sent only when the previous is acknowledged, which guarantees a limit on RAM usage.

A standard HTML form upload is quite difficult to implement on a RAM-limited device.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on July 31, 2022, 08:35:04 pm
Quote
The safest strategy is to load the whole file on the client side (browser) using JS, split it by small pieces, and POST/PUT piece by piece (again, using JS xhr or fetch), using binary encoding. This way, the next chunk gets sent only when the previous is acknowledged, which guarantees a limit on RAM usage.

You know far more about this stuff than I ever will, but AIUI you can upload to a server in one go. TCP/IP does the flow control automatically so if a browser sends stuff at 10mbytes/sec, but the server is taking data out of the netconn or winsock API at 30kbytes/sec (flash write speed) that will "just work".

I am actually reading a file in 512 byte blocks and sending each one via the netconn lwip api. Optimising some things to the limit I can get 1.2mbytes/sec out of my board, to the browser, which is entirely respectable, and this is genuine application to application, going through god knows how many protocol layers.

There is bidirectional traffic anyway, as part of TCP/IP - as I soon discovered when I found that the RX ETH data rate (which on my system is limited to 100 low level packets/sec, for various reasons) determines the TX data rate quite precisely, to 250kbytes sec ;)

Quote
A standard HTML form upload is quite difficult to implement on a RAM-limited device.

AFAICT the messy bit about POST is looking for the long delimiter strings, with limited RAM, and doing it correctly in case they appear in the data by accident. Whereas with a PUT you get a simple header with a byte count (but need to use JS to send the stuff).

But one needs JS to do a progress bar or any kind of progress report, on any upload to the server, anyway.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ejeffrey on August 01, 2022, 03:31:06 am
The delimiter should never appear in the body.  The client is supposed to ensure that, although AFAIK on file uploads it will only do so statistically (it doesn't read the file in advance to see if the boundary is there).  The boundary is also required to be no longer than 70 characters and on a line by itself (plus the two implicit leading hyphens).  I don't remember if the 70 character limit includes the hyphens, double check that.

This should be enough to parse it even on a low memory machine but you need to do so carefully, especially if you are also trying to have good performance.  One way is to read in chunks that always are either max size or end in a newline.  That way you will always start each line at the beginning of a buffer and the delimiter can never be split across blocks.

One important thing to consider for small memory devices is how to handle a failed POST.  If the client drops the connection you should make sure that the entire POST is reverted.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: tellurium on August 01, 2022, 11:39:33 am
Do whatever you wish :)

Just to note - as far as I know, the HTTP method (POST or PUT), is irrelevant to the encoding.
The multipart encoding (which browser does on "standard" form upload) can be done with either PUT or POST.
Likewise, a binary encoding, without multipart chunks/boundaries, can be done with either PUT or POST.

What matters is the body format, which is specified in the Content-Type together with Transfer-Encoding headers. Thus, there is nothing messy about POST and good about PUT.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 01, 2022, 12:11:32 pm
Quote
Do whatever you wish

You probably come from a different end of things, where you have a proper server to work on, whereas I am having to byte-bang everything at a low level, and with very limited memory.

Quote
the HTTP method (POST or PUT), is irrelevant to the encoding.

AIUI, the data is sent as binary, so it affects only the delimiters of the data, which are messy to parse in my situation, and which using PUT avoids.

Maybe there is something else but I haven't discovered it yet.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: tellurium on August 01, 2022, 12:31:38 pm
Quote
AIUI, the data is sent as binary, so it affects only the delimiters of the data, which are messy to parse in my situation, and which using PUT avoids.

You missed the point again.

Those "delimiters" are irrelevant to PUT or POST. What you call "delimiters", is the format of the body - please re-read my message again if you're interested . PUT does not "avoid" anything. You can make PUT request with the same "delimiters", or POST request without those.

I came from the same end of things, dealt with networking on a very RAM constrained environments, more constrained than STM32F4 you're dealing with.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 07, 2022, 08:01:42 pm
I am coming back to this for some peripheral stuff. My web server is 99% finished, downloading and uploading etc.

What I remain puzzled about is the state of the client after a file transfer is finished.

In both directions, a byte count is sent and the receiving end is using this to find out the end.

If downloading a file (to a client browser), I use

Code: [Select]
// Header for file download
static const uint8_t DOWNLOAD_HEADER[] =
"HTTP/1.1 200 OK\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: attachment\r\n"
"<meta http-equiv=refresh content=1000>\r\n" // cancels out the file listing refresh
"Pragma-directive: no-cache\r\n"
"Cache-directive: no-cache\r\n"
"Cache-control: no-cache\r\n"
"Pragma: no-cache\r\n"
"Expires: 0\r\n"
;

and then send this (not a fixed header because it has to have a date and size in it - the date is actually another story since it is either ignored or the EXIF, if present, is used to stamp the received file, by some browsers... much discussion online about changing this and doing date stamps properly but most browser developers are against it):

Code: [Select]
// send filesize, as "Content-Length: 1910916\r\n"
strcpy((char*)pagebuf, "Content-Length: ");
itoa(fno.fsize,(char*)&pagebuf[16],10); // place size after "Content-Length: "
strcat((char*)pagebuf,"\r\n");
netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

// Send date/time, as "Date: Mon, 21 Oct 2015 07:28:00 GMT\r\n"
// To save the hassle of calculating the day of week (which isn't stored in the directory anyway)
// we use Mon in case the client is validating the presence of the day string ;)
// For jpegs, FF and Edge extract this from exif, interestingly, if header is missing.
// Otherwise browsers appear to ignore this, and same with Last-Modified.

strcpy((char*)pagebuf, "Date: Mon, ");
int monidx = 3*(((fno.fdate >> 5) & 15)-1);
// HTTP dates use a zero-padded 2-digit day and end in a literal "GMT"; datebuf needs >= 25 bytes
snprintf(datebuf,sizeof(datebuf),"%02u %c%c%c %4u %02u:%02u:%02u GMT",
fno.fdate & 31,
montab[0+monidx],
montab[1+monidx],
montab[2+monidx],
(fno.fdate >> 9) + 1980,
fno.ftime >> 11,
(fno.ftime >> 5) & 63,
2*(fno.ftime & 0x1f));
strcat((char *)pagebuf, datebuf);
strcat((char *)pagebuf, "\r\n\r\n");   // 2xCRLF is the last thing before the binary file data
netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);
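
The day of week does not have to be hardcoded: it can be computed from the FAT date with a few integer operations (Sakamoto's method). The sketch below turns the packed FAT date/time words into an HTTP-style date string. The function name fat_to_http_date is made up, and labelling the FAT timestamp as GMT is an assumption, since FAT stores local time with no zone. Note also that HTTP defines Date as the time the response was generated, so a file timestamp more naturally belongs in Last-Modified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats a FAT directory date/time pair as e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
 * Returns the string length, or -1 on a bad date or too small a buffer. */
int fat_to_http_date(char *out, size_t cap, unsigned fdate, unsigned ftime)
{
    static const char days[]   = "SunMonTueWedThuFriSat";
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    static const int  t[]      = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    int year  = (int)(fdate >> 9) + 1980;
    int month = (int)((fdate >> 5) & 15);
    int day   = (int)(fdate & 31);
    int y, dow, n;

    assert(out);
    if (month < 1 || month > 12 || day < 1) return -1;

    /* Sakamoto's day-of-week method, 0 = Sunday */
    y = (month < 3) ? year - 1 : year;
    dow = (y + y / 4 - y / 100 + y / 400 + t[month - 1] + day) % 7;

    n = snprintf(out, cap, "%.3s, %02d %.3s %04d %02u:%02u:%02u GMT",
                 days + 3 * dow, day, months + 3 * (month - 1), year,
                 ftime >> 11, (ftime >> 5) & 63, 2 * (ftime & 31));
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```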

and then the binary data is sent. Nothing is sent after it, and nothing seems to get to the client even if it is sent, as if the browser has unilaterally closed the connection. But is that really true? What is the actual browser state after a "Content-Disposition: attachment\r\n" download? I am suspecting that the file is treated as within an HTML body, and maybe doing a </body> may do something. It's quite strange. All I want to do at this point is to send the browser to a specific URL on the server, or even to simulate pressing F5 (which is impossible in HTML; only client-side JS can do that).

What is clear is that one cannot do a progress bar from the server end, because the binary data is being sent down in the same direction. But browsers give you progress anyway...

If uploading a file to a server, I am using JS to create a PUT (can't be done in HTML in response to a form sent to the client, e.g. returning a TEXTAREA, if done in HTML, goes back as a POST). This all works too, but I struggle to get the server, upon successful receipt of all the bytes, to send back the "200 OK" (or some failure) message. The JS is written to look for it but it isn't getting it.

It is as if, again, a PUT transfer to a server, while obviously keeping alive that data direction during the transfer, has closed the connection in the other direction. Is that possible? I am returning one of these, according to the upload success

Code: [Select]
// Return this to client if file write (textarea file or Upload file) got written ok
static const uint8_t FILE_WRITE_GOOD[] =
"HTTP/1.1 200 OK\r\n"
"Content-type: text/plain\r\n"
"Server: XXXXXX\r\n\r\n"
;


// Return this to client if file write (textarea file or Upload file) failed
static const uint8_t FILE_WRITE_BAD[] =
"HTTP/1.1 413 Write Error\r\n"
"Content-type: text/plain\r\n"
"Server: XXXXXX\r\n\r\n"
;
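
One possible reason the script never sees the status (a guess, not verified against this server): a reply with no Content-Length on a connection that stays open gives the browser no way to tell where the empty body ends, so the XHR "load" event may never fire. A sketch of a reply builder that always states the body length and asks for the connection to be closed is below. The name build_status_reply is hypothetical. Also, 413 means "Payload Too Large", so a code such as 500 or 507 may describe a write error better.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds a complete small reply (header + body) into out[0..cap).
 * Returns its length, or -1 if it does not fit. body may be "". */
int build_status_reply(char *out, size_t cap, int code,
                       const char *reason, const char *body)
{
    int n;

    assert(out && reason && body);
    n = snprintf(out, cap,
                 "HTTP/1.1 %d %s\r\n"
                 "Content-Type: text/plain\r\n"
                 "Content-Length: %u\r\n"   /* lets the client find the end of the body */
                 "Connection: close\r\n"    /* no further requests on this connection */
                 "\r\n"
                 "%s",
                 code, reason, (unsigned)strlen(body), body);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The result would be sent with one netconn_write once the last byte of the upload has been written to the filesystem, and the connection closed afterwards.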

I can post the JS script but if I did that nobody would reply ;)

It also seems possible to do an HTML-only "progress bar" (even if it is just a string of dots, getting longer) on an Upload, because there is no data going server -> client in that situation. I have a nice progress report from the JS script actually. Has anyone tried this? I realise anything sent to a browser has to be a properly formed HTML page, so you would do <head></head><body> etc and then within the body you would emit some dots, or numbers, etc and then </body>. Why would this not work? In HTML it isn't possible to do a "clear screen" (again, lots of people have asked this, and the answer seems to be that the HTML for a fresh page does a CLS implicitly) and it isn't possible to do a "cursor home" (CR, no LF) for the same reason. But you could output a string of dots, say one every 10k bytes received. What is wrong with that? I have seen such things but they may have been done with JS.

One final thing I am seeing on the uploads is that if uploading say a 2MB file, the progress goes to ~500k immediately and then increments at the expected speed (30kbytes/sec flash writing) until 100% done. And then when it gets to 100% on the client, I see flash writing carrying on for some tens of seconds. This looks like 500k of transmit caching in the PC client stack. Is that really possible?

This is the relevant bit of JS

Code: [Select]
function updateProgress(evt)
{
   if (evt.lengthComputable)
   {
     document.getElementById("progress").innerHTML = evt.loaded + " of " + evt.total + ", " + (evt.loaded/evt.total*100).toFixed(1) + "%";
   }
}

function uploadFile()
{
   const fileInput = document.getElementById('file');

   if (fileInput.files.length == 0) {
      alert("Select a file to upload first");
      return;
   }

   document.getElementById("submit").disabled = true;

   const fileReader = new FileReader();
   fileReader.addEventListener("load", function (e) {
      const rawData = e.target.result;
      const putRequest = new XMLHttpRequest();
      putRequest.open("PUT", "/ufile=" + fileInput.files[0].name);
      putRequest.upload.addEventListener("progress", updateProgress, false);
      putRequest.addEventListener("load", function (f) {
         if (putRequest.status == 200 || putRequest.status == 201) {
            document.getElementById("progress").innerHTML = '';
            alert("Upload succeeded");
            history.back();
         } else {
            alert("Upload failed with code " + putRequest.status);
            document.getElementById("progress").innerHTML = 'Upload failed.';
            document.getElementById("submit").disabled = false;
         }
      });
      putRequest.send(rawData);
   });
   fileReader.readAsArrayBuffer(fileInput.files[0]);
}

Quite a lot of people have posted this kind of thing online, almost always without any resolution, because nobody seems to be doing this stuff at the byte level.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on August 07, 2022, 08:43:52 pm
Jesus again with the same questions.

You don't seem to understand some fundamental things.

Once a page is loaded and parsed, it's like it's an island, with its own universe.
When you click on a link,  it's like the browser will create a new island, with its own separate rules.  Sending the meta refresh in the response header not only is invalid, incorrect, stupid, but it would not work, because NOTHING sent in this new island, in this new instance, whatever you want to call it, will happen on the original page.
The original page will continue to refresh inside the browser if the user opens the download link in a new tab, or if you use Content-Disposition: attachment. If you use an inline download, the active page is killed and the browser reuses the page for the download.

Then it's something super basic.

Browser creates a connection to your web server
Browser sends the  request headers 
GET  page name / protocol
Host : value
Key : value

until it sends an empty line.
These "Key : Value" pairs may tell the web server that the browser is capable of doing some things. For example: Connection : keep-alive  tells the web server "Hey server, I'm smart enough to reuse the established connection to request more files, so if you agree with my request, leave the connection open and I may request more files on this same connection"
 
Then, it waits for a response from you, the web server.

The web server's job is to parse these request headers and respond with a suitable response header, and then serve the content of the page, and then close the connection UNLESS you chose to support features like what I said, "Connection : keep-alive".
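
To make the "parse until the empty line" step concrete, here is a minimal sketch in JavaScript (just for illustration, the same logic ports to C on the target). The function name parseRequestHead and the shape of the returned object are made up for this example, and it assumes CRLF line endings and that the whole header has already been received.

```javascript
// Hypothetical helper: split a raw HTTP request head into its parts.
// Assumes CRLF line endings; returns null if the empty line has not arrived yet.
function parseRequestHead(raw) {
  const headEnd = raw.indexOf("\r\n\r\n");
  if (headEnd < 0) return null; // header incomplete: keep reading from the socket

  const lines = raw.slice(0, headEnd).split("\r\n");
  // First line is "METHOD target HTTP-version", e.g. "GET /files.html HTTP/1.1"
  const [method, target, version] = lines[0].split(" ");

  const headers = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon < 0) continue; // not a "Key: value" line, skip it
    // Field names are case-insensitive, so store them in lower case.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }

  // bodyOffset is where any request body (e.g. an upload) starts.
  return { method, target, version, headers, bodyOffset: headEnd + 4 };
}
```

Anything after bodyOffset belongs to the request body, which matters for uploads but is empty for a plain GET.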

In your case, you should say  Connection : close in your response header,  which tells the browser  "No dude, I don't feel like serving multiple requests / giving you multiple files on the same connection, as soon as I'm done transferring this page/download you SHOULD close the connection - I may or may not close the connection from my end but either way I'm gonna ignore whatever further requests come on this connection"


A download is a download, it has nothing to do with HTML tags, you don't add  body tags, you don't add html tags ... you send the response headers and you dump the binary content of the file, and once it's done you should CLOSE THE CONNECTION.
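
As a rough sketch of what such a download header could look like (written in JavaScript only to show the bytes; buildDownloadHeader is a made-up name, and the Content-Type is an assumption), note that every line after the status line is a Key: value pair and the header ends with an empty line. It assumes a plain ASCII file name without quote characters.

```javascript
// Hypothetical helper: build the response header for a file download.
// The file bytes follow immediately after the returned string.
function buildDownloadHeader(filename, byteLength) {
  return [
    "HTTP/1.1 200 OK",
    "Content-Type: application/octet-stream", // assumed generic binary type
    `Content-Disposition: attachment; filename="${filename}"`,
    `Content-Length: ${byteLength}`,          // lets the browser know when the file ends
    "Connection: close",                      // no further requests on this connection
    "",                                       // empty line terminates the header
    "",
  ].join("\r\n");
}
```

After sending this string the server dumps exactly byteLength bytes of file content and then closes the connection.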


again

"<meta http-equiv=refresh content=1000>\r\n"   // cancels out the file listing refresh

in the DOWNLOAD_HEADER  IS NOT PROPER , it's garbage, any browser should ignore it, or even go further and reject your whole page and not load it, it's malformed response header. Everything after HTTP/1.1 200 OK  should be Key : Value pairs ... pay attention how that line is not a key : value pair.   

And I think I've said it in a previous post.  Your web server should not look into pictures for exif information and to extract dates from there... it's a sure way to introduce vulnerabilities in your project.
Your download code doesn't even have to send that Date: whatever pair.


You should NOT use Javascript to show progress bars and show dots or whatever, first do it the regular POST style which is the easiest and then complicate your life with these other things. Use Javascript for client-side only stuff that won't affect you, like sorting file lists in the browser, or retrieving the file list through some super basic API , instead of using meta refresh or some other shit like that. 

Figure out  multiple independent connections at the same time ... access your device in several tabs at the same time, and click on various parts of the interface in each time, do they get mixed or what happens... what happens when you click on a link in a tab, while you download a file in another tab?
These are things you have to fix first and then worry about more features.





Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ledtester on August 07, 2022, 11:02:40 pm
Regarding how to handle form submissions and file uploads, you might find this article on the PRG pattern (or "redirect after post") helpful:

https://en.wikipedia.org/wiki/Post/Redirect/Get
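
A minimal sketch of the redirect step, assuming the server answers the POST with 303 See Other and an empty body (buildRedirectAfterPost and the target path are made up for illustration):

```javascript
// Hypothetical helper: the response a server could send after handling a POST,
// telling the browser to GET another page instead of showing the POST result.
function buildRedirectAfterPost(location) {
  return [
    "HTTP/1.1 303 See Other",
    `Location: ${location}`, // where the browser should go next
    "Content-Length: 0",     // no body, so the response ends at the empty line
    "Connection: close",
    "",
    "",
  ].join("\r\n");
}
```

For example, buildRedirectAfterPost("/files.html") would send the browser back to a file listing page once the upload has been stored.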


Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 08, 2022, 08:51:43 am
Quote
Jesus again with the same questions.
You don't seem to understand some fundamental things.

No need to be rude.

Quote
Then it's something super basic.

If you are super clever.

Quote
A download is a download, it has nothing to do with HTML tags, you don't add  body tags, you don't add html tags ... you send the response headers and you dump the binary content of the file, and once it's done you should CLOSE THE CONNECTION.

The Q I asked, which you didn't answer, is what state the browser is in. Has it closed the connection?

Quote
"<meta http-equiv=refresh content=1000>\r\n"   // cancels out the file listing refresh

Yes this was in the wrong place; it has to be inside <head>, not <body>. I actually removed these because they were not needed; the change of context (e.g. invoking
a textarea) stopped the refresh.

Quote
Your web server should not look into pictures for exif information and to extract dates from there... it's a sure way to introduce vulnerabilities in your project.
Your download code doesn't even have to send that Date: whatever pair.

You misread. It is the browser which looks for EXIF and uses it to create a date for the downloaded file.

This file on the server

(https://peter-ftp.co.uk/screenshots/202208082815304109.jpg)

gets saved in Chrome or FF (not Edge) as

(https://peter-ftp.co.uk/screenshots/202208083715314209.jpg)

If you think I am stupid, try it. It gets it out of EXIF (not spent time testing which of the three dates does it, or whether browsers do it on TIFF files also):

(https://peter-ftp.co.uk/screenshots/202208085115324309.jpg)

Quote
You should NOT use Javascript to show progress bars and show dots or whatever, first do it the regular POST style which is the easiest and then complicate your life with these other things. Use Javascript for client-side only stuff that won't affect you, like sorting file lists in the browser, or retrieving the file list through some super basic API , instead of using meta refresh or some other shit like that.

That's your opinion. If writing a 2MB file, to a flash filesystem whose write speed is 30kbytes/sec, a progress bar is highly desirable. You also didn't comment on whether during an upload, textual data can be sent to the browser.

Quote
Regarding how to handle form submissions and file uploads, you might find this article on the PRG pattern (or "redirect after post") helpful:
https://en.wikipedia.org/wiki/Post/Redirect/Get

Thanks - that's really clever :) Somebody sent me this cunning trick but I didn't get around to testing it because I had already moved to a JS version

Code: [Select]
static const unsigned char GOTO_URL[] =
"<body><script>\r\n"
"window.location.replace(\"/xxxxxx.html\")\r\n"
"</script>\r\n"
"</body>"
;

The only drawback is that the above JS has to be inside the <body> context. You can't just send it to a browser, presumably because once a browser has seen </html>, or finished downloading a file, it closes the connection. So any "go to URL" done at that point has to be initiated by the browser, which implies a) user doing it, or b) some JS doing it.

---

If anyone has any idea why uploads run instantly to ~520k and then run at the target data acceptance speed, it would be interesting. It is as if, on an upload, the browser does a malloc() of 520k, fills that up (instantly) and feeds data out of that to the windows tcp/ip socket. So an upload progress report looks a bit silly, but more importantly the data keeps flowing out of the browser to the target for some tens of seconds after the browser has finished. That said, Edge does not do the foregoing at all; it first sends the file header (ending with CRLFCRLF) as a single packet, and then sends the file out without caching!


Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: httpmies on August 08, 2022, 03:41:58 pm
Quote
the date is actually another story since it is either ignored or the EXIF, if present, is used to stamp the received file, by some browsers... much discussion online about changing this and doing date stamps properly but most browser developers are against it)

The Date header (https://httpwg.org/specs/rfc7231.html#header.date) tells the browser when the *message* was produced. In this context, "message" refers to the HTTP response which contains the data in the body. Thus, the date is not indicative of when the file was modified, but of when the HTTP response which is transferring that file was generated. This header is useful for caching proxies and other middleware on the internet, but not so useful to you.

Quote
Nothing is sent after it, and nothing seems to get to the client even if it is sent, as if the browser has unilaterally closed the connection. But is that really true? What is the actual browser state after a "Content-Disposition: attachment\r\n" download?

Content-Disposition: attachment (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition) simply instructs the browser that the response body should be downloaded, so browsers show a "Save as" dialog. The default is inline, which indicates that the body should be displayed as a web page or inside the web page. The browser state after such a download is the same as after any other HTTP response, which is "it depends".

The browser will usually want to keep the TCP connection alive after a request, as it's probable that there will be further requests. When using HTTP/1.1, keeping the connection open is the default for requests. This can be controlled with the header Connection (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Connection). For HTTP/1.0 requests, the default is Connection: close, meaning that the browser will close the TCP connection after each request. Although in practice the browser may pre-emptively open another TCP connection.
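
The defaults described above can be sketched as a small decision function. This is only an illustration of the rule, not code from any particular server; keepConnectionOpen is a made-up name and it only looks at the request's HTTP version and Connection header.

```javascript
// Hypothetical helper: should the TCP connection stay open after this exchange?
// version is e.g. "HTTP/1.1"; connectionHeader is the Connection value or undefined.
function keepConnectionOpen(version, connectionHeader) {
  const tokens = (connectionHeader || "")
    .toLowerCase()
    .split(",")
    .map((t) => t.trim());

  if (tokens.includes("close")) return false; // explicit close always wins
  if (version === "HTTP/1.1") return true;    // persistent by default
  return tokens.includes("keep-alive");       // HTTP/1.0 must opt in explicitly
}
```

A simple server can of course ignore all this and always answer with Connection: close.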

If you're interested in seeing how your browser behaves on the TCP level, you could download Wireshark and do some packet captures. If you try on a server running on your localhost, choose the loopback interface. If you try on a server running in your LAN or on the internet, choose the network interface which you use to connect to those, and set "tcp" as your capture filter. I took a screenshot of a Wireshark capture of a server running locally which keeps the connections alive for 5 seconds after each request. See attachment.

However, "when will the browser close the connection" is also related to the question of "when will the browser interpret the response to have ended", and the Connection header is only one part of it.

Quote
The only drawback is that the above JS has to be inside the <body> context. You can't just send it to a browser, presumably because once a browser has seen </html>, or finished downloading a file, it closes the connection.

So, when will the browser interpret the response to have ended? RFC-9112 (https://httpwg.org/specs/rfc9112.html#message.parsing) says the following:
Quote
2.2. Message Parsing
The normal procedure for parsing an HTTP message is to read the start-line into a structure, read each header field line into a hash table by field name until the empty line, and then use the parsed data to determine if a message body is expected. If a message body has been indicated, then it is read as a stream until an amount of octets equal to the message body length is read or the connection is closed.
Emphasis mine. The browser will wait for more content until it reaches the length specified by the Content-Length header, or until the TCP connection is closed. The Content-Length header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Length) is also related to the Transfer-Encoding header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding), as the former is useless when using chunked encoding (https://httpwg.org/specs/rfc9112.html#chunked.encoding) (which you could use to drip-feed new content onto a page by keeping the response mid-flight forever - though, this is very unidiomatic).
Edit: The RFC 9112 subsection 6.3 Message Body Length (https://httpwg.org/specs/rfc9112.html#message.body.length) outlines the entire algorithm of the spec - it is slightly more involved than just content-length, but not by much.
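
A simplified sketch of how a client decides where a response body ends, loosely following the order in RFC 9112 section 6.3. It deliberately ignores the special cases for HEAD and CONNECT requests, and the function name and return shape are invented for this example. Header names are assumed to be lower-cased already.

```javascript
// Hypothetical helper: how will the end of the response body be detected?
function responseBodyFraming(statusCode, headers) {
  // 1xx, 204 and 304 responses never carry a body.
  if ((statusCode >= 100 && statusCode < 200) || statusCode === 204 || statusCode === 304) {
    return { kind: "none" };
  }
  // Transfer-Encoding takes precedence over Content-Length.
  const codings = (headers["transfer-encoding"] || "")
    .toLowerCase()
    .split(",")
    .map((t) => t.trim());
  if (codings[codings.length - 1] === "chunked") return { kind: "chunked" };

  const cl = headers["content-length"];
  if (cl !== undefined && /^\d+$/.test(cl)) {
    return { kind: "length", length: Number(cl) };
  }
  // Neither header present: the body runs until the server closes the connection.
  return { kind: "until-close" };
}
```

The last case is why a response without Content-Length keeps the browser waiting until the socket is closed.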

Just sending a </html> will do nothing to the connection. Browsers are also almost painfully lenient on malformed HTML, and will be more than happy to merge multiple <html> trees into one <body> for you, depending on the browser.

To see for yourself, install netcat, run one of the following in your *nix terminal, and connect to http://localhost:3000 (http://localhost:3000) on your browser and observe its behavior.

1. No content-length
Code: [Select]
echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<\!DOCTYPE html><html><html><body>123</body></html>" | nc -l -p 3000
2. Content-length
Code: [Select]
echo -e "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 50\r\n\r\n<\!DOCTYPE html><html><html><body>123</body></html>" | nc -l -p 3000

Notice that example 1 will render the page, but will keep showing the loading spinner until you cancel the request. The second example will finish loading successfully. To emulate an "until the connection is closed" situation, press Ctrl+C in the terminal running example 1.
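
Content-Length counts bytes, not characters, so it has to be computed on the encoded body. A minimal sketch (contentLengthOf is a made-up name; it assumes the body is sent as UTF-8):

```javascript
// Hypothetical helper: number of bytes the body occupies on the wire as UTF-8.
function contentLengthOf(body) {
  return new TextEncoder().encode(body).length;
}

// The HTML from example 2 (without the shell escaping backslash) is 50 bytes,
// which matches the Content-Length: 50 header used there.
const exampleBody = "<!DOCTYPE html><html><html><body>123</body></html>";
const exampleLength = contentLengthOf(exampleBody); // 50
```

For pure ASCII the byte count equals the string length; it only differs once non-ASCII characters appear in the body.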


Quote
What is clear is that one cannot do a progress bar from the server end, because the binary data is being sent down in the same direction. But browsers give you progress anyway...

This is true. The browser's progress bar when downloading is based on the Content-Length header, which the server will use to tell the client how large the response body is.

You have multiple options for indicating progress. Your files will need to be in a <form> with a <input type="file"/> for binary files, but textual content can be in a text area - whatever.

Here's an example of option 3. Start a dummy netcat server on localhost:3001 with the following command:
Code: [Select]
while true; do echo -e "HTTP/1.1 200 OK\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Headers: *\r\n\r\n" | nc -l -p 3001; done
HTML & associated script, can just open this directly in your browser to test it out:
Code: [Select]
<!DOCTYPE html>
<html>
  <head>
    <title>Title</title>
  </head>
  <body>
    <div>progress: <span id="progress"></span></div>
    <form id="form">
      <input type="file" name="file_input" />
      <button type="submit">submit</button>
    </form>

    <script>
      const POST_URL = "http://localhost:3001/foobar";
      async function readFile(file) {
        return new Promise((resolve, reject) => {
          const fileReader = new FileReader();
          fileReader.onload = (e) => {
            const arrayBuffer = e.target.result;
            resolve(arrayBuffer);
          };
          fileReader.onerror = (ev) => {
            reject(new Error("FileReader error"));
          };

          fileReader.readAsArrayBuffer(file);
        });
      }

      const progressBar = document.getElementById("progress");

      document.getElementById("form").addEventListener("submit", async (e) => {
        console.log("submit");
        e.preventDefault();

        const form = e.currentTarget;
        const fileInput = form["file_input"];
        if (fileInput.files.length <= 0) {
          // no files selected
          return;
        }

        const file = fileInput.files[0];
        console.log(`Uploading ${file.name} (${file.type})`);
        const type = file.type; // MIME type (for Content-Type)
        const fileBuffer = await readFile(file);

        const request = new XMLHttpRequest();
        request.upload.onprogress = (e) => {
          if (!e.lengthComputable) {
            progressBar.innerText = "<incomputable>";
            return;
          }
          const percentage = (e.loaded / e.total) * 100;
          progressBar.innerText = `${percentage.toFixed(1)}%`;
        };
        request.upload.onload = (e) => {
          // see also onerror, onabort
          progressBar.innerText = "upload done, processing...";
          // now, wait for response
        };
        request.onreadystatechange = () => {
          if (request.readyState === XMLHttpRequest.DONE) {
            // grab response as text from request.responseText
            console.log("got response");
          }
        };
        request.open("POST", POST_URL);
        request.setRequestHeader("Content-Type", type);
        request.send(fileBuffer);
      });
    </script>
  </body>
</html>
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 09, 2022, 05:56:07 am
Thank you all, especially httpmies for the super detail.

Today I made a lot of progress on the little bits which weren't right.

Firstly, I can confirm that after a browser has finished uploading a file to the server, it does not close the connection. It is thus capable of receiving whatever error response e.g.

Code: [Select]
// Return this to client if file write (textarea file or Upload file) failed
static const uint8_t FILE_WRITE_BAD[] =
"HTTP/1.1 413 Write Error\r\n"
"Content-type: text/plain\r\n"
"Server: xxxxxx\r\n\r\n"
;

The reason this was previously not working was a subtle mistake in the data receiving loop, where a blocking function netconn_recv() was called at the very end of the transfer, but there was no more data coming... It needed just one more byte to come out of it. Now I terminate the loop once the declared file size has been fully received, so netconn_recv() gets called only if the count is incomplete.
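
The fixed loop looks roughly like this, sketched in JavaScript rather than the actual C. Here recv is only a stand-in for a blocking read such as netconn_recv(): it is assumed to return the next chunk as a Uint8Array, or null when the connection is closed. The names are made up for illustration.

```javascript
// Hypothetical sketch: read the request body, but never call the blocking
// read again once the declared number of bytes has arrived.
function receiveBody(recv, contentLength) {
  const chunks = [];
  let received = 0;
  while (received < contentLength) { // checked BEFORE each blocking read
    const chunk = recv();
    if (chunk === null) break;       // peer closed the connection early
    chunks.push(chunk);
    received += chunk.length;
  }
  return { chunks, received, complete: received >= contentLength };
}
```

The important part is that the byte count is tested before each read, so the last read is the one that delivers the final bytes and no extra blocking call is made.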

This made me wonder about timeouts. TCP/IP is error corrected end to end, but if the link actually broke during the transfer, you don't want it to hang. LWIP must implement some sort of recovery, but it appears that the default setting is to disable timeouts. Could that possibly be right? Maybe it refers to different kinds of timeouts. Currently, netconn_recv() never returns if the data stops. I am surprised a TCP/IP stack could function usefully with no timeouts.

Anyway I set up a 5 sec timeout on LWIP. Interestingly netconn_recv never returns a "timeout" errorcode; it returns only "connection closed", even if I close the browser itself halfway through an upload.

On downloads, I removed the file date header and all works as before. Thanks for that tip. The file gets saved with the current date, except with Chrome and FF where there is EXIF as mentioned before.

Downloads do appear to cause the browser to close the connection upon completion. This is just HTML, no JS involved. Further data from the server just disappears. And I am sending a fully formed web page.

Of the 3 browsers, only FF does the 500kbyte+ cache on uploads, somewhat messing up any progress bar in the process. This doesn't matter. You just get the Transfer Complete popup (from JS) about 15 seconds after the upload reached 100% :) I could upload in chunks but then I have to implement that on the server.

I used Wireshark a long time ago and really should get back to it. In fact just learning the browser debugging tools would be good. There is some way to see all data going back and forth but I have not been able to work it out.

Interesting that the keep alive causes the server to send 304 every 5 seconds. I could implement that, although my "server" is a primitive one which is mostly stateless and just responds to each client request immediately.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: httpmies on August 09, 2022, 10:39:45 pm
Quote
Firstly, I can confirm that after a browser has finished uploading a file to the server, it does not close the connection. It is thus capable of receiving whatever error response e.g.

...

Downloads do appear to cause the browser to close the connection upon completion. This is just HTML, no JS involved. Further data from the server just disappears. And I am sending a fully formed web page.

I suspect some of the frustration in this thread originates from a differing mental model of HTTP. Please allow me to try to transfer some of my mental model to you.

HTTP is a stateless protocol. It is communicated over a stateful TCP connection, which exposes a reliable duplex stream of bytes. An important note is that there is no such thing as a HTTP connection, as data transfer with HTTP happens via a sequence of request-response pairs (in essence, though there exists HTTP/1.1 pipelining and all sorts of multiplexing in HTTP/2 and /3). Whether the underlying TCP connection is closed in between request-response pairs depends (https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection_management_in_HTTP_1.x), as I mentioned earlier. A client (e.g. a browser) makes a request for a resource (e.g. GET /path), the server responds with some data or a redirect, and that's it. Your browser only knows that a response is associated with its request because they happen sequentially in the TCP connection.

This is essentially what RFC-9110 "HTTP Semantics" says in the subsection 3.3. Connections, Clients and Servers (https://httpwg.org/specs/rfc9110.html#connections) and 3.4. Messages. As you're implementing a HTTP server from scratch, I would recommend glancing over the section 3. Terminology and Core Concepts. Even though it's a RFC, this one is quite human readable.

When you say that after "finishing uploading a file to a server, [a browser] does not close the connection" and that "downloads do appear to ... close the connection", my assumption is that you mean that you're trying to write a HTTP response with your microcontroller after handling a "download", and the server cannot send another document to the browser after already writing a "download".

Please open the image I've attached. It is a sequence diagram (created with PlantUML, if you're curious) of what happens when a user submits a small binary file to a server from the following page:

Code: [Select]
<!DOCTYPE html>
<html>
  <body>
    <form action="/" enctype="multipart/form-data" method="post">
      <input type="file" name="file_input" />
      <button type="submit">submit</button>
    </form>
  </body>
</html>

As you can see from the diagram, after the browser performs the form submission ("uploads a file"), it is expecting a response, as I mentioned earlier. I think that this is related to your observation that "after ... uploading a file, [the browser] does not close the connection", if by connection you mean "the request-response sequence". Indeed, it waits for a response from the server. In the diagram, the server responds to the "upload" by redirecting the browser to navigate to the address of the newly created file (302 Found) so that the user downloads said file.
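
For reference, the request body the browser sends for the form above looks roughly like what this sketch builds. The boundary string is chosen by the browser and announced in the request's Content-Type header (multipart/form-data; boundary=...); buildMultipartBody and its parameters are made-up names, and a single file field is assumed.

```javascript
// Hypothetical helper: build a multipart/form-data body with one file part.
// fileBytes is a Uint8Array holding the raw file content.
function buildMultipartBody(boundary, fieldName, filename, contentType, fileBytes) {
  const enc = new TextEncoder();
  // Part header: delimiter line, two header lines, then an empty line.
  const head = enc.encode(
    `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${fieldName}"; filename="${filename}"\r\n` +
      `Content-Type: ${contentType}\r\n\r\n`
  );
  // Closing delimiter: CRLF, then the boundary with two extra dashes.
  const tail = enc.encode(`\r\n--${boundary}--\r\n`);

  const body = new Uint8Array(head.length + fileBytes.length + tail.length);
  body.set(head, 0);
  body.set(fileBytes, head.length); // raw file bytes, not encoded in any way
  body.set(tail, head.length + fileBytes.length);
  return body;
}
```

So on the server side the file content sits between the first empty line after the part header and the final CRLF plus closing boundary. That extra parsing is what a raw PUT of the file avoids.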

The operation of "downloading" is in fact the browser making a request for a specific path, and the server sending a HTTP message with the contents of that file or whatever content it pleases as the message body. The meaning attached to the colloquial use of "downloading" also implies the familiar "Save as..." dialog, and the tab closing as the file begins to download to disk in the background. The only difference between this behavior, and your browser simply showing a web page (you navigating to a web address) is whether or not the server included Content-Disposition: attachment in the headers of the response.

Turns out, there's actually nothing special about uploads or downloads - simply HTTP messages with headers and a body, which have semantics attached to them through standards (e.g. browser should use the POST method when submitting form data to the server and encode the fields like so and so, 204 No Content shouldn't have a body, etc.) and conventions (e.g. if I ask for /foo.exe, I'm expecting it to start a download for a file named foo.exe via Content-Disposition: attachment).

It now makes sense why "downloads cause the browser to close the connection upon completion": the server has already written the response (the contents of the file), and it cannot write a second response until the browser makes another request. From the perspective of the browser, it has completed what you've asked of it, and is ready to wait for the user's next move (what request shall we make next?).

Quote
LWIP must implement some sort of recovery, but it appears that the default setting is to disable timeouts. Could that possibly be right? Maybe it refers to different kinds of timeouts.

I'm not familiar with LWIP, but I would expect there to be some timeouts at least. Perhaps your browser is just keeping the TCP socket open after you close the tab, if you have other tabs open? I would investigate this with Wireshark to check, or use your browser's developer tools (see below).

Quote
I used Wireshark a long time ago and really should get back to it. In fact just learning the browser debugging tools would be good. There is some way to see all data going back and forth but I have not been able to work it out.

Oh dear! So sorry to hear that you've been debugging blind! Of the 3 major browsers, I've found that Chrome (and other Chromium-based browsers like Edge) currently offers the best development tools. FF is pretty close as well.

Press F12 to open up the devtools drawer (you can move it around or pull it out into its own window from the three vertical dots menu on the right). There's multiple ways (https://developer.chrome.com/docs/devtools/open/) of opening the devtools, but this one is simple.

- The Elements tab shows the currently rendered HTML (may differ from what the server sent - to see that, right click on the web page and choose "View source").
- The Console shows...console messages and warnings.
- The Network tab is what you're looking for here. It shows all requests with a waterfall display, allowing you to see request and response details, headers and body and all. Enable "Preserve log" or otherwise it will clear out the list when your browser navigates. Click on a request to expand it. The details view has a Timing tab which breaks down what took how long during the request.
- The Performance and Memory tabs are useful for profiling JavaScript in larger applications; the Performance tab also allows you to throttle the CPU.
- The Application tab lets you see whatever information the current site has persisted, including cookies, LocalStorage, etc. ESC expands another utility drawer.

Chrome has comprehensive documentation for the devtools here (https://developer.chrome.com/docs/devtools/overview/).

Another useful tool is just using curl (https://curl.se/) from the terminal. If you pass -v, it'll enable verbose mode (https://everything.curl.dev/usingcurl/verbose) and show you both the full request as well as the response. e.g. curl -v https://google.com (https://google.com)

As for making HTTP requests via a GUI, I've found Postman (https://www.postman.com/downloads/) to be quite handy. Just ignore their prompts to login or register.

Quote
Of the 3 browsers, only FF does the 500kbyte+ cache on uploads

Regarding this and your previous question about the ~500kB buffer thing, there are multiple buffers throughout the TCP/IP stack of both the sender and the receiver. I haven't noticed or paid attention to this behavior before, so I'm unable to comment there. I would start from the following:
- Open the FF devtools' Network tab and enable "Persist logs" from the cogwheel to the right
- See what the POST request looks like there, see its Timings tab in the details
- Take a look with tcpdump/Wireshark to see if there's anything obvious there
  * does the TCP traffic look different between FF and Chrome?
- Investigate if this happens with other servers (try spinning one up locally and throttling the network speed?), or just your microcontroller server
  * if just your stack, can you reproduce it with a different stack than LWIP? maybe LWIP buffers?
  * can you reproduce it with a different client device on your server?
- What if you change your connection to, let's say, your phone's WiFi hotspot?
- If nothing else, maybe hop onto a Linux distro and use strace to comb through the system calls Firefox makes during the uploading. It's possible to filter strace for specific syscalls, or just grep it.
  * Does it happen on a Linux machine as well?

Quote
Interesting that the keep alive causes the server to send 304 every 5 seconds. I could implement that, although my "server" is a primitive one which is mostly stateless and just responds to each client request immediately.

Sorry, I forgot to mention that the screenshot had both TCP and HTTP mixed (different levels of abstraction!). There was only one 304 response over HTTP; the keep-alive just instructed the client to keep the TCP connection open instead of closing it after receiving the response (and also communicated that the server intends to keep the TCP connection open for at least 5 seconds). Your server should be stateless and respond to each request immediately :). Unless we're talking about WebSockets (https://developer.mozilla.org/en-US/docs/Web/API/WebSocket) or WebRTC (https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API) or other technologies.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 10, 2022, 11:11:48 am
Thank you again, httpmies, for such an informative post.

Quote
HTTP is a stateless protocol.

I "get" that HTTP is stateless. I think my ideas about connections being closed were partly due to a couple of factors:

I also "get" that HTTP is client-server, and the server serves only what the client requested (apart from Ajax etc which I know nothing about). This has quite specific implications e.g. client requests x, the server returns x+y, but the client, having got x, discards y. Even if y is something trivial like a line of JS to go to a URL. So if doing e.g. a download (header, byte count x, CRLFCRLF, data) after the client has counted off x bytes, it is done and finished. Any further data is dumped, and this is why I have not been able to do anything after a download. For example this is my filesystem

(https://peter-ftp.co.uk/screenshots/20220810254824411.jpg)

so clicking on tb20-5.jpg downloads that file. But after the data has been sent to the client, any more data is ignored, because it wasn't requested. The server obviously knows when it has sent all the data, but there is no way to get the server to display another page on the browser, even refresh the one above, etc, after the file transfer. There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5.

The only way I can think of would be to download a JS script (before the file, obviously) which accepts the data intended for the browser, passes it on to the browser, and either intercepts the byte count or runs a timeout, and on the earlier of the two it does something like "window.location.replace("/files.html")\r\n". Is that possible? The web is full of people asking whether a server can get the browser to "press F5". The answer is NO, other than by running client-side code.

The other factor is that the TCP/IP API does have a concept of open connection, read, write, close connection. My simple server closes the connection after each "serving" and then re-opens it, to wait for the next client request. This is probably not needed; I am not sure. It was the demo code for the netconn API, which I am using.

Quote
but I would expect there to be some timeouts at least

An error corrected comms protocol obviously can't be done without timeouts. The stuff at the bottom here https://www.nongnu.org/lwip/2_0_x/timeouts_8c.html (https://www.nongnu.org/lwip/2_0_x/timeouts_8c.html) seems to refer to these. They are internal and presumably with some appropriate defaults. They must also be varied e.g. with satellite comms you may need seconds, while on a LAN only 100s of ms. And these are not related to a timeout on say netconn_recv which is a blocking read, and with an infinite timeout, but one you can change (I dropped mine to 5000ms).

Quote
500kB buffer thing, there are multiple buffers throughout the TCP/IP stack of both the sender and the receiver.

Of Chrome, FF and Edge, only FF does this 500k initial thingy. It all still works; just makes a simple progress report look weird because you get "520k of 1500k" immediately. The others do something like 40k, which at my flash file system writing speed of 30kbytes/sec is hardly noticed. One could probably hack it for a better visual presentation by making the JS (the upload is done with JS, as a "PUT", to make parsing it easier at the server) output say 30k blocks, separated by 500ms.
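
A rough sketch of that "30k blocks, separated by 500ms" idea follows. It assumes the server would accept a file as several successive PUT requests, each carrying a hypothetical `offset` query parameter; nothing in this thread says the server does that, so treat the URL format as invented for illustration.

```javascript
// Split a transfer of totalBytes into [start, end) ranges of at most chunkBytes.
function chunkRanges(totalBytes, chunkBytes) {
  const ranges = [];
  for (let start = 0; start < totalBytes; start += chunkBytes) {
    ranges.push([start, Math.min(start + chunkBytes, totalBytes)]);
  }
  return ranges;
}

// Promise-based pause.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Send the file one block at a time, pausing between blocks.
// ASSUMPTION: "?offset=" is a made-up convention the server would need to implement.
async function pacedUpload(file, url, chunkBytes = 30 * 1024, pauseMs = 500) {
  for (const [start, end] of chunkRanges(file.size, chunkBytes)) {
    const response = await fetch(url + '?offset=' + start, {
      method: 'PUT',
      body: file.slice(start, end), // Blob.slice copies nothing, it is just a view
    });
    if (!response.ok) throw new Error('block at ' + start + ' failed: ' + response.status);
    await sleep(pauseMs);
  }
}
```

Because each block is its own request, a progress display driven from this loop would advance in 30k steps rather than jumping to whatever the browser buffered up front.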

There is negligible buffering at my server. Just 4 MTU-sized (1500+) packets, plus 2 more at ETH level.

If I wasn't nearly finished I would probably get wireshark set up. I had a play with the browser tools but there seems to be some data which it doesn't display.

Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: mariush on August 10, 2022, 11:59:50 am

Quote
so clicking on tb20-5.jpg downloads that file. But after the data has been sent to the client, any more data is ignored, because it wasn't requested. The server obviously knows when it has sent all the data, but there is no way to get the server to display another page on the browser, even refresh the one above, etc, after the file transfer. There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5.

The only way I can think of would be to download a JS script (before the file, obviously) which accepts the data intended for the browser, passes it on to the browser, and either intercepts the byte count or runs a timeout, and on the earlier of the two it does something like "window.location.replace("/files.html")\r\n". Is that possible? The web is full of people asking whether a server can get the browser to "press F5". The answer is NO, other than by running client-side code.

No, you can't do that because once you click on the download link it's as if the original page never existed, the download transfer is completely independent from everything.
You could right click on the link for the file, select "Copy link address",  paste it in another browser or mail it to some friend, and that other browser will directly request the file. What would those browsers do with your extra junk data you try to push? There's no index page opened in browser, they just clicked the direct link to download the file.
The link could even be passed to some download manager that hooks into the browser and catches download requests, e.g. FlashGet, Mass Downloader, Internet Download Manager, etc.

It's not "right" for the server to attempt to control what happens on the client's computer...  think of it, would you think it's acceptable for this forum's web server to serve you an image attachment and then inject code to redirect/refresh your page to an ad page, or to move you to amazon.com with an affiliate link? 

You could have some javascript on the page that polls the server, constantly asking "has something changed since year-month-day hour-minute-second?", and the server could instantly reply with a few bytes saying yes or no ... if yes, then your javascript code could trigger a refresh of the whole page.
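
A minimal sketch of that polling idea, assuming a hypothetical `/changed?since=...` URL that answers with the plain text "yes" or "no". Neither the URL nor the reply format comes from this thread, and the timestamp here is taken from the browser's clock, which a real device might not share.

```javascript
// ASSUMPTION: hypothetical endpoint, GET /changed?since=<token> replies "yes" or "no".
function changedUrl(sinceToken) {
  return '/changed?since=' + encodeURIComponent(sinceToken);
}

// Interpret the few-byte reply; anything other than "yes" counts as "no change".
function hasChanged(replyText) {
  return replyText.trim().toLowerCase() === 'yes';
}

// Ask once, reload the page if something changed, otherwise ask again later.
async function pollForChanges(sinceToken, intervalMs) {
  try {
    const response = await fetch(changedUrl(sinceToken), { cache: 'no-store' });
    if (response.ok && hasChanged(await response.text())) {
      window.location.reload();
      return;
    }
  } catch (err) {
    // Server busy or unreachable: fall through and try again on the next tick.
  }
  setTimeout(() => pollForChanges(sinceToken, intervalMs), intervalMs);
}

// Start polling only when running in a browser.
if (typeof window !== 'undefined') {
  pollForChanges(new Date().toISOString(), 2000);
}
```

The next request is scheduled only after the previous one has finished, so a slow server never ends up with a queue of overlapping polls.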

Some other thoughts... about the upload form ... when you serve the page to the user, you could have a hidden field in the form with a unique ID. Javascript could grab that unique ID, and once the user clicks on the button, you start a javascript function that once a second asks the server "where are you with the transfer of the file with unique id xyz", and your server could do a stat on the file or whatever, see how many bytes were written in the file and reply "still receiving data, received n bytes so far for file with uid xyz".
But this implies that your code can actually switch between parallel requests, e.g. pause receiving the upload data and writing it to disk in order to handle the request from the other connection, and then switch back to the previous request to resume writing.
The progress replies don't come back on the original connection that started to upload the data (the post); they go over a separate async connection created by the javascript code, which communicates with your server independently from the first connection - like two people's computers in different parts of the city accessing the same page at the same time.
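
As an illustration of that progress query, here is a sketch that assumes a hypothetical `/progress?uid=...` URL whose reply is two numbers, "received total", as plain text. Both the URL and the reply format are invented for the example.

```javascript
// ASSUMPTION: hypothetical reply format "<received> <total>", e.g. "15360 1536000".
function parseProgress(replyText) {
  const parts = replyText.trim().split(/\s+/).map(Number);
  if (parts.length !== 2 || parts.some(Number.isNaN)) return null;
  const [received, total] = parts;
  return { received, total, done: total > 0 && received >= total };
}

// Human-readable label in the "520k of 1500k" style.
function progressLabel(p) {
  return Math.round(p.received / 1024) + 'k of ' + Math.round(p.total / 1024) + 'k';
}

// Once a second, ask the server how far the upload with this uid has got.
// `show` is a caller-supplied function that displays the label somewhere.
function watchUpload(uid, show, intervalMs = 1000) {
  const timer = setInterval(async () => {
    try {
      const response = await fetch('/progress?uid=' + encodeURIComponent(uid));
      const p = response.ok ? parseProgress(await response.text()) : null;
      if (!p) return;
      show(progressLabel(p));
      if (p.done) clearInterval(timer);
    } catch (err) {
      // A missed poll is harmless; the next tick asks again.
    }
  }, intervalMs);
  return timer;
}
```

This only works if the server can answer the small GET while the upload connection is still open, which is exactly the parallel-request caveat above.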

Quote
There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5
it's too fast, and I wouldn't be surprised if browsers don't do it on purpose or treat it as a badly coded webpage (would you want to have 1 GB of phone data and have a page like yours eat through the allowance because it refreshes twice a second?) or slow it down because it sort of pointlessly keeps that tab's threads active and the resources for that page would be harder to swap out for other tabs.

You could have a listener on the onclick event for links , when a link is clicked, that code could use setTimeout or setInterval to schedule another function to run a couple seconds later. That function could restart the refresh of the page.
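
Roughly, that could look like the sketch below. The two-second delay and the choice to simply reload the page are assumptions for the example, and a fixed delay obviously knows nothing about how long the download actually takes.

```javascript
const RESTART_DELAY_MS = 2000; // "a couple seconds"; an arbitrary choice

// Schedule `restart` to run later. The scheduler is a parameter only so the
// logic can be exercised outside a browser; it defaults to setTimeout.
function scheduleRefreshRestart(restart, delayMs = RESTART_DELAY_MS, schedule = setTimeout) {
  return schedule(restart, delayMs);
}

// In a browser: whenever any link is clicked, re-arm the refresh shortly after.
if (typeof document !== 'undefined') {
  document.addEventListener('click', (event) => {
    const link = event.target.closest('a');
    if (link) scheduleRefreshRestart(() => window.location.reload());
  });
}
```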
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 10, 2022, 12:31:40 pm
Quote
it's too fast, and I wouldn't be surprised if browsers don't do it on purpose or treat it as a badly coded webpage

No; it works fine. I have a 1Hz refresh on another page (system status; lists RTOS tasks etc) and 0.5Hz refresh on the files page. All works perfectly. But after a file download, the page (/files.html) stops the refresh.

The refresh is in the client, so for some reason, after downloading a file, it terminates the refresh. Not a big deal, but interesting why. Pressing F5 restores it, because it re-downloads the files.html page which has the refresh in the header.

It is obvious from browsing on various issues that vast numbers of web coders have been up these various paths, hence websites are loaded with weird JS, and all sorts of other stuff. A lot of people use fashionable frameworks (Laravel is one, which will be dead in 10 years' time) which do weird things anyway.

I am just trying to build a very simple server for system admin, up/downloading files, etc. No style sheets. But it does need JS support in the browser. It works ok on phones too, although auto resize is a bit of a challenge ;)

Quote
You could have a listener on the onclick event for links , when a link is clicked, that code could use setTimeout or setInterval to schedule another function to run a couple seconds later.

The file download can take over a minute.
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ledtester on August 10, 2022, 12:37:31 pm
Quote
but there is no way to get the server to display another page on the browser, even refresh the one above, etc, after the file transfer. There is a 0.5Hz auto refresh on that page, because the filesystem could be modified from internal software, or from Windows via USB, but for some reason this refresh stops after a download... until the user presses F5.

The only way I can think of would be to download a JS script (before the file, obviously) which accepts the data intended for the browser, passes it on to the browser, and either intercepts the byte count or runs a timeout, and on the earlier of the two it does something like "window.location.replace("/files.html")\r\n". Is that possible? The web is full of people asking whether a server can get the browser to "press F5". The answer is NO, other than by running client-side code.

You could do this: the download link for a file runs some javascript which uses AJAX to download the file. To save the file on the client's computer use either FileSaver.js or StreamSaver.js:

https://github.com/eligrey/FileSaver.js/
https://github.com/jimmywarting/StreamSaver.js

I understand that with StreamSaver.js you can tell when the download is finished. As for doing this with FileSaver.js, see this issue:

https://github.com/eligrey/FileSaver.js/issues/699
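
For comparison, here is a sketch of the same idea using only built-in browser APIs instead of either library. It holds the whole file in memory before saving, which may be a problem for large files, and the one-second delay before returning to /files.html is an arbitrary assumption.

```javascript
// Derive a save-as name from the link, e.g. "/files/tb20-5.jpg?x=1" -> "tb20-5.jpg".
function filenameFromUrl(url) {
  const path = url.split(/[?#]/)[0];
  return decodeURIComponent(path.substring(path.lastIndexOf('/') + 1));
}

// Fetch the file with JS, hand it to the browser's save mechanism, then
// go back to the listing once the data has fully arrived.
async function downloadThenRefresh(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error('download failed: ' + response.status);
  const blob = await response.blob(); // resolves only after the whole body is received

  const objectUrl = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = objectUrl;
  a.download = filenameFromUrl(url);
  document.body.appendChild(a);
  a.click();
  a.remove();

  // Give the browser a moment to start the save before cleaning up and navigating.
  setTimeout(() => {
    URL.revokeObjectURL(objectUrl);
    window.location.replace('/files.html');
  }, 1000);
}
```

A link would call it with something like `onclick="downloadThenRefresh(this.href); return false;"` so the normal navigation is suppressed.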
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 10, 2022, 12:51:33 pm
Gosh, yeah... that's more or less what I said about downloading with JS. Some nasty browser dependent code in these, testing for different browsers and doing different things.

I've been involved with that on a few projects and it is a nightmare. One website (which I specified, paid ~10k for, and got a PHP+MariaDB guy in Poland to code from scratch) which supports picture upload, fails to work with some version of Safari on a Mac. Never found out why, not enough complaints, not worth the money at $40/hr, and might have been avoidable if I let the guy use Dropzone which does get gradually updated to handle the weird browser dependent crap.

Come to think of it I haven't yet tested this server with Safari  :-DD
Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: ledtester on August 10, 2022, 01:24:19 pm
Quote
The refresh is in the client, so for some reason, after downloading a file, it terminates the refresh. Not a big deal, but interesting why. Pressing F5 restores it, because it re-downloads the files.html page which has the refresh in the header.

Don't reload the entire page to update the directory listing. Poll the server with JS to get the directory contents and replace that part of the HTML with the updated contents. Basically it becomes a SPA -- Single Page App. You'll perform periodic polling with setTimeout() in javascript.
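
A small sketch of what that might look like. It assumes the server offers a hypothetical `/dir.txt` that returns one "name size" pair per line, and that the page has an element with id `listing`; both are made up for the example.

```javascript
// ASSUMPTION: hypothetical listing format, one "name size" pair per line.
function parseListing(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const [name, size] = line.trim().split(/\s+/);
      return { name, size: Number(size) };
    });
}

// Escape text before putting it into HTML.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Build the list items that replace the old directory listing.
function renderListing(entries) {
  return entries
    .map((e) => '<li><a href="/' + encodeURIComponent(e.name) + '">' +
                escapeHtml(e.name) + '</a> ' + e.size + ' bytes</li>')
    .join('\n');
}

// Fetch the listing, swap it into the page, and schedule the next poll.
async function refreshListing() {
  try {
    const response = await fetch('/dir.txt', { cache: 'no-store' });
    if (response.ok) {
      document.getElementById('listing').innerHTML =
        renderListing(parseListing(await response.text()));
    }
  } catch (err) {
    // Keep the old listing on error and try again next time.
  }
  setTimeout(refreshListing, 2000); // 0.5Hz, matching the existing refresh rate
}

if (typeof document !== 'undefined') refreshListing();
```

Since the page itself is never reloaded, the polling keeps running regardless of what a file download does to a refresh set in the page header.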



Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 10, 2022, 02:25:31 pm
That's a lot of work though.

Someone else started on this project and did partly that. He used hidden tags in the HTML which would get picked up by client JS, so the page presentation would (or could) be controlled by the JS. Then you could have A-Z, Z-A, etc directory sorting and all sorts of fancy stuff. But somebody has still got to write the code.

Title: Re: What is the HTTP server-client interaction to do file transfers?
Post by: peter-h on August 15, 2022, 08:54:56 pm
This is just an update on this project.

The http server is now finished.

It remains based on that original ST-supplied netconn-API code which is found around the place e.g.
https://github.com/particle-iot/lwip/blob/master/contrib/apps/httpserver/httpserver-netconn.c

It was quite a lot of work since nobody uses netconn (people use the socket API) but eventually, with help of some clever people, it was worked out.

JS was used to implement the returning of large chunks of data to the server, so that the easier to parse PUT method can be used, instead of POST. The first of these was the textarea file editing, where JS was also used to enforce CRLF line endings (textarea strips off CRs when it is initially loaded with data). The second was file upload, where I wanted extra checking e.g. filename is valid-8.3 and file size no bigger than remaining filesystem space (I did a special GET function for that value).
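
To give an idea of the client-side checks described above, here is a sketch (not the code actually used in the project). The `/freespace` URL stands in for the special GET function, and the 8.3 character set is deliberately conservative; both are assumptions.

```javascript
// Force CRLF line endings regardless of what the textarea produced.
function toCrlf(text) {
  return text.replace(/\r\n|\r|\n/g, '\r\n');
}

// Conservative 8.3 check: 1-8 name characters, optional 1-3 character extension.
// ASSUMPTION: only letters, digits, '_', '~' and '-' are allowed here.
function isValid83(name) {
  return /^[A-Za-z0-9_~-]{1,8}(\.[A-Za-z0-9_~-]{1,3})?$/.test(name);
}

// True if the file fits. A non-numeric free-space reply gives NaN, which fails the test.
function fitsInFreeSpace(fileSize, freeBytes) {
  return fileSize <= freeBytes;
}

// Validate, then send the file body as a single PUT.
async function uploadFile(file) {
  if (!isValid83(file.name)) throw new Error('not a valid 8.3 filename');
  const free = Number(await (await fetch('/freespace')).text()); // hypothetical URL
  if (!fitsInFreeSpace(file.size, free)) throw new Error('file larger than free space');
  const response = await fetch('/' + encodeURIComponent(file.name), {
    method: 'PUT',
    body: file, // raw bytes, no multipart wrapping for the server to parse
  });
  if (!response.ok) throw new Error('upload failed: ' + response.status);
}
```

The attraction of PUT shows in the last call: the request body is just the file, whereas a form POST would wrap it in multipart boundaries that the server has to find and strip.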

The final JS push was done by a guy on freelancer.com for $100 - he was incredibly fast but these people have to be, to do what would be some full time employed programmer's day's work for $100. Clearly this is the way to get software done, but you need to be able to define a clear package.