Author Topic: What is the HTTP server-client interaction to do file transfers?  (Read 6595 times)


Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3728
  • Country: gb
  • Doing electronics since the 1960s...
Back here
https://www.eevblog.com/forum/programming/what-actual-data-is-used-to-return-a-favicon-to-the-browser/
I implemented a favicon on my simple server.

I now understand how a web page is served and how one can have links on that page and deliver other pages as somebody clicks on the links. It seems really simple.

However I now need to do file transfers, of arbitrary size. My understanding is that the client end is a standard browser feature. On a download, the browser offers a Save As dialog. On an upload, it offers a file browser to pick the file. It is the transfer itself I don't understand. I think it is done in blocks, chosen to suit the server memory availability, and the client has to request the next block. All the examples I see online apply to standard servers like Apache. Can anyone tell me the actual data flowing?

Many thanks :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 3733
  • Country: us
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #1 on: July 16, 2022, 11:52:43 pm »
File downloads are handled like any other request.  The client decides whether to try to display it or download it by default based on file type.

File uploads are done with form submission.  A file upload must use the submission method "POST" not "GET". There is no chunking, and it isn't normally broken into multiple requests.  Throttling is just done by standard TCP connection throttling.

There is a "Range" request header for requesting certain byte ranges (the server answers with a matching "Content-Range" response header). It is used to get early previews of some document types, but it's up to the client and often not used.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6317
  • Country: fi
    • My home page and email address
« Reply #2 on: July 17, 2022, 12:51:09 am »
Downloads are served exactly like HTML pages, only the Content-Type: defines the file type.
If you want to force the file to be saved (and not opened in an application), use Content-Type: application/octet-stream.

File uploads use POST method requests, where the header part is followed by an empty line and then the uploaded file data.
The Content-Type is multipart/form-data; boundary=boundarystring with the boundarystring usually quoted.
Mozilla Developer Network has an example:
Code:
POST /test HTTP/1.1
Host: foo.example
Content-Type: multipart/form-data;boundary="boundary"

--boundary
Content-Disposition: form-data; name="field1"

value1
--boundary
Content-Disposition: form-data; name="field2"; filename="example.txt"

value2
--boundary--
Note that each newline in the above snippet is \r\n.  Browsers tend to use long boundary strings, which can be annoying.
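To split a multipart body, the server first needs the boundary string from the request's Content-Type header. The following is a minimal sketch of that step, not a complete multipart parser. The function name get_boundary is hypothetical, and it assumes the parameter name is spelled in lower case, as browsers normally send it.

```c
#include <stddef.h>
#include <string.h>

/* Copy the boundary parameter of a multipart Content-Type value into out.
 * Handles both boundary="abc" and boundary=abc.
 * Simplification: the match on "boundary=" is case-sensitive.
 * Returns 0 on success, -1 if there is no boundary or it does not fit. */
int get_boundary(const char *content_type, char *out, size_t outsize)
{
    const char *p = strstr(content_type, "boundary=");
    size_t len;

    if (p == NULL)
        return -1;
    p += strlen("boundary=");

    if (*p == '"') {                    /* quoted form */
        const char *end = strchr(++p, '"');
        if (end == NULL)
            return -1;
        len = (size_t)(end - p);
    } else {                            /* bare form: ends at ';', space or line end */
        len = strcspn(p, "; \r\n");
    }
    if (len == 0 || len >= outsize)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

In the body, each part then starts with "--" followed by this string, and the last delimiter has an extra "--" appended, as in the example above.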
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14548
  • Country: fr
« Reply #3 on: July 17, 2022, 01:25:04 am »
I think there's a specific request for partial transfers too, but I don't remember the details.
 

Online Nominal Animal

« Reply #4 on: July 17, 2022, 02:22:08 am »
Unless the server provides an Accept-Ranges HTTP header, clients shouldn't try to request ranges or partial responses.  Requests themselves are always complete, not split into ranges.

If a request does contain a Range HTTP header, it is up to the server to completely ignore it (just return a normal 200 Ok response with the entire contents), return only the range(s) (in a 206 Partial Content response), or error out with 416 Range Not Satisfiable.  The most robust approach is to ignore it, and just always return full contents.
 

Offline peter-hTopic starter

« Reply #5 on: July 17, 2022, 07:58:33 am »
This is really interesting and, at the "byte shuffling level" doesn't sound that complicated... especially as in my case the client has "unlimited" memory and speed.

I too thought that the server can just push a 2MB file out in one go and especially nowadays the client should manage to stream it to disk, but what if the server has a flash file system with a 30kbyte/sec write speed? Is that flow rate controlled by the normal TCP/IP mechanisms too, so that while writing to flash you just don't poll the socket API for more data?

Isn't "multipart" a requirement? I have seen some awfully complicated implementations.

Also I guess there need to be timeouts so the server can recover if the data stops arriving, but how does it know the end of data? Is a byte count sent by the client, first?
« Last Edit: July 17, 2022, 08:27:29 am by peter-h »
 

Offline ejeffrey

« Reply #6 on: July 17, 2022, 04:27:54 pm »
That is all what TCP is for.  Don't worry about flow control. You are overthinking this.

The only reason to use byte ranges is if you only want part of the file: say you are resuming an interrupted transfer, or if you want to fetch the data out of order.

Both the client and the server are always allowed to fall back to entire file transfers.
 

Online mariush

  • Super Contributor
  • ***
  • Posts: 5056
  • Country: ro
  • .
« Reply #7 on: July 17, 2022, 05:13:35 pm »
In case you're not aware of it,  both Firefox and Chrome have Developer Tools  (usually can be opened in both using F12 key)

Go in the network tab  and when you load a page, you will see there all the requests being made by the browser and you can click on one to see on the right the request and response headers

There's a response header content-disposition that you can use to force the browser to ask where to save the file instead of automatically saving it to a default location, or opening in browser
See https://coding.tools/blog/force-file-download-instead-of-opening-in-browser-using-http-header-and-flask

basically Content-Disposition: inline will tell browser that it's ok to load it in a new tab or same tab (ex a pdf),
Content-Disposition: attachment; filename="abc.txt"  tells browser to pop up the save as and suggest the name abc.txt for the file that you then push to the browser.
Up to you if you implement ranges, I'd say don't bother, or ignore  ranges in the header. Would add quite a bit of complexity, to parse the range parameter, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
Respond with Accept-Ranges: none and always send the whole file in a normal 200 response ... and the browser will know resume is not supported.
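Putting the headers from this post together, a download response could be built roughly as below. This is a sketch under assumptions: build_download_header is a hypothetical helper, the file size is known before sending, and the filename contains no double quotes or line breaks (it is not escaped here).

```c
#include <stddef.h>
#include <stdio.h>

/* Format the response headers for a forced file download into buf.
 * Returns the header length, or -1 if buf is too small.
 * Assumption: filename is trusted and contains no '"', CR or LF. */
int build_download_header(char *buf, size_t bufsize,
                          const char *filename, unsigned long filesize)
{
    int n = snprintf(buf, bufsize,
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        "Content-Disposition: attachment; filename=\"%s\"\r\n"
        "Content-Length: %lu\r\n"
        "Accept-Ranges: none\r\n"
        "Connection: close\r\n"
        "\r\n",                         /* empty line ends the headers */
        filename, filesize);

    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```

The file bytes follow immediately after the empty line. Sending Content-Length also lets the browser show download progress.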
 
 

Offline peter-hTopic starter

« Reply #8 on: July 17, 2022, 06:23:34 pm »
Thank you. Yes; I have used that feature to some extent.

I have a simple HTTP web server (for local config/status only; no security is needed) which just looks for the GET... (url) string and returns a dynamically generated page which is the FreeRTOS task listing. This actually came out of some ST code.

It seems pretty obvious how to handle a page with links to other pages. The server looks for the base URL and then squirts out the HTML. This is what I've been doing for years in Dreamweaver, building simple websites with no CSS or any sort of style sheets, which scale perfectly on any device. Just implement multiple string compares... and on a match squirt out the HTML for the selected page.

Then I need to implement a file listing (FatFS file system). I already have done this and it gets sent to a file, so sending it to the browser is trivial.

The next bit is a bit harder. I want to do

- file upload (using the browser file picker, as discussed above)
- file download (the file listing will have clickable links for each file, so you select the file by clicking on it)
- file editing (only for small text files; opens a browser window, with a SAVE button under it)

The last bit I really don't know but it is not essential since the user can edit a downloaded file locally.

I also need to add a username/password login, which will involve the browser popping up text entry boxes for these. This is probably a variant of the editing window above. The credentials are in a file in the filesystem, configured via a separate process.

I got someone to write this, using some open source web server, but after spending about 4k it turned out to be a disaster which doesn't run (well not as an RTOS task) and it is way too bloated for me to fix. It's got some multipart stuff for the file transfers, which seems unnecessary. So I will adapt my existing simple server.

There will probably be vulnerabilities because I am using simple tests like this to see which URL is being sent

Code:
/* Is this an HTTP GET command? (Only check the first 5 chars, since
   there are other formats for GET, and we're keeping it very simple.) */
if ((buflen >= 5) && (strncmp(buf, "GET /", 5) == 0))
{
    /* The length passed to strncmp must match the length of the string
       literal: 15 for "GET /index.html", 16 for "GET /favicon.ico".
       Checking buflen first avoids reading past the received data. */
    if ((buflen >= 15) && (strncmp(buf, "GET /index.html", 15) == 0))
    {
        // Return the dynamic status page
        DynWebPage(conn);
    }
    else if ((buflen >= 16) && (strncmp(buf, "GET /favicon.ico", 16) == 0))
    {
        // Return the favicon
        DoFavicon(conn);
    }
}

but it doesn't matter.
« Last Edit: July 17, 2022, 06:27:04 pm by peter-h »
 

Online mariush

« Reply #9 on: July 17, 2022, 07:36:25 pm »
I followed your posts so I know a bit about what you're doing.

I would probably look for the first 4 characters, and reject anything that's not "GET[space]" or "POST". Then I'd extract the URL after GET[space]  or POST[space] and do something depending on it.

Keep in mind that you'll HAVE TO account for parameters in the URL even if your pages won't have such parameters, and you also have to account for the HTTP tag after the URL. 
Some browsers when they don't get information through response header about caching the page will add some bogus parameter at the end of URLs because that often forced web servers to send the document again, instead of replying with HTTP 304 Not Modified.  Some browsers did this when user pressed Shift + F5 (forced refresh)

ex you could have GET /index.html?parameter=value HTTP/1.1

Could be HTTP/1.0 , could be HTTP/1.1, could be HTTP/2  so you can't just assume it will always be 8 characters. You'd have to look for the space after the verb, then the next space as an end for the URL. The URL's spaces will be escaped as %20 so space is a good separator character, to separate verb, url and http version.

The most common verbs are POST, GET, PUT, PATCH and DELETE   .... but really GET AND POST are the most used, DELETE is sometimes used in APIs of various services, and PUT/PATCH are very rarely used.
You really only need GET and POST
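The request-line handling described above (find the space after the verb, then the next space, and drop anything after "?") could be sketched like this. parse_request_line is a hypothetical name, and the code only accepts GET and POST, which is an assumption taken from the advice in this post.

```c
#include <stddef.h>
#include <string.h>

/* Parse the first line of a request, e.g.
 *   "GET /index.html?parameter=value HTTP/1.1\r\n"
 * Copies the method and the path (query string stripped) into the caller's
 * buffers. Returns 0 on success, -1 on a malformed or oversized line. */
int parse_request_line(const char *line,
                       char *method, size_t msize,
                       char *path, size_t psize)
{
    size_t n = strcspn(line, "\r\n");           /* length of the first line only */
    const char *sp1 = memchr(line, ' ', n);     /* space after the verb */
    const char *target, *sp2, *end;
    size_t mlen, plen;

    if (sp1 == NULL)
        return -1;
    target = sp1 + 1;
    sp2 = memchr(target, ' ', n - (size_t)(target - line)); /* space before HTTP/x.y */
    if (sp2 == NULL)
        return -1;

    mlen = (size_t)(sp1 - line);
    if (!((mlen == 3 && memcmp(line, "GET", 3) == 0) ||
          (mlen == 4 && memcmp(line, "POST", 4) == 0)) || mlen >= msize)
        return -1;

    end = memchr(target, '?', (size_t)(sp2 - target));      /* start of parameters */
    if (end == NULL)
        end = sp2;
    plen = (size_t)(end - target);
    if (plen == 0 || target[0] != '/' || plen >= psize)
        return -1;

    memcpy(method, line, mlen);
    method[mlen] = '\0';
    memcpy(path, target, plen);
    path[plen] = '\0';
    return 0;
}
```

The returned path can then be compared with strcmp against the known URLs, so the comparison length no longer depends on what follows the URL.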

File upload is easy.  You put a form on the page with the fields you need, and when the user hits upload, the browser will send a POST request to your server with the data. The request headers include a Content-Length giving the size of the body, so you know the transmission is complete when you have received that many bytes after the empty line that ends the headers. The browser then keeps the connection active, waiting for your reply.
You accept the incoming data, dump it into a temporary file if you don't have a lot of memory, then do your thing on the temporary file and then send a proper reply to the browser. It could be something as simple as a 200 OK with a redirect in the HTML, e.g. <html><head> <meta http-equiv="Refresh" content="0; URL=https://example.com/"></head><body></body></html>, and you could redirect to the page listing the newly uploaded content.

The form on the page would be something like this:

// see the input type="file" for a lot of good details : https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/file

<form method="POST" enctype="multipart/form-data" action="/upload.html" >
<div>
 <label for="file_up">Select file to upload:</label>
 <input type="file" name="file_up" accept =".doc,.pdf,application/msword" placeholder="Please pick a file!" >
 <input type="hidden" name="folder" value="/documents/uploads/" >
 <input type="submit" name="button_upload" value="Upload" >
</div>
</form>

The form has a file selection called file_up , a hidden parameter named "folder" with the value the path (filled by you when serving the page to the user) and a button named "button_upload"

When the user hits the Upload button, the browser is going to do POST /upload.html with a multipart/form-data body, and you'll get at least these 3 items separated by that boundary.

The accept and placeholder are optional, I just added them to show it's possible.

Editing of a document can be done just the same with a POST form, only you can use a TEXTAREA - https://developer.mozilla.org/en-US/docs/Web/HTML/Element/textarea -  around the contents of the document and the user will see the text in a editable text box on the screen
When you hit the save button, the browser will send you the whole contents of that textarea and whatever other parameters are in the form (hidden on purpose by you or not)

Note that the contents of the document would have to be escaped ex < > and & at the very least :  < would be &lt; , > would be &gt; and & would be &amp; - the browser will parse the html and show the characters properly on the screen and send you the actual characters in the POST form data, not the escaped ones.
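A minimal sketch of that escaping step, for copying file contents into the TEXTAREA. html_escape is a hypothetical helper name. It only handles the three characters mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Copy in to out, replacing '<', '>' and '&' with HTML entities.
 * Returns the length written (excluding the terminating NUL),
 * or -1 if out is too small. */
int html_escape(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    if (outsize == 0)
        return -1;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };     /* default: the character itself */
        const char *rep;
        size_t len;

        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = single;  break;
        }
        len = strlen(rep);
        if (o + len >= outsize)             /* keep room for the NUL */
            return -1;
        memcpy(out + o, rep, len);
        o += len;
    }
    out[o] = '\0';
    return (int)o;
}
```

On a small device the same switch could be applied while streaming the file out in blocks, instead of building the whole escaped text in memory.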

There are some other gotchas. If you don't declare a text encoding for your HTML (e.g. Content-Type: text/html; charset=utf-8), the browser has to guess. If it treats the page as UTF-8 and you dump in a .txt file that was written in ISO-8859-1, any character above 0x7F is a single byte in ISO-8859-1 but needs a two-byte sequence in UTF-8, so those bytes are invalid UTF-8 and are not rendered properly on screen.  See for example https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/ for an explanation.

user and password ... easiest would be to use cookies  or a session id

Basically you make a form with  username  and password and a login button and you submit these with a POST request to your server (GET works too, but it's not "cool" to use GET because the parameters may show up in your address bar)
You get the username, password and you verify them against some internal database and if they're correct you add to the HTTP response headers a cookie with the Set-Cookie attribute : https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie
For example, say  jsmith logs in and the password he submits is correct, you generate a token of some sort ex "20220717jsm453" and you add this to your records with an expiration date (ex in case user leaves browser page opened for 10 days and someone hits refresh you don't want that random user to mess around)
Then you send cookie 

Set-cookie: token=20220717jsm453; Max-Age=86400

When 86400 seconds go by, the browser deletes the cookie. You can't rely on this, as a user with bad intentions could just edit the cookie database in the browser and set the max age to 2 billion so the cookie never expires. On your server it would be wise to also check when the token was generated, and to invalidate an old cookie by sending Set-Cookie with another token value.
If you don't give Max-Age or Expires, the cookie is treated as a "session cookie": it lasts as long as the browser session lasts, which could be indefinite (if the user has the option to restore tabs at start).

So once you do a Set-Cookie, the browser will send the cookies back to you with every request, ex
Cookie: token=20220717jsm453;[SPACE]anothercookie=somedata;[space]yetanothercookie=morecrap

And you can check on every request to see if that token is valid, and if it's not, you could just give a 302 Found (temporary redirect) response with a Location header pointing to a login page.
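Pulling one cookie out of the Cookie header does not need much code. The following is a sketch that assumes the header value has already been isolated (the text after "Cookie: "); get_cookie is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Find cookie `name` in a Cookie header value such as
 *   "token=20220717jsm453; anothercookie=somedata"
 * and copy its value into out.
 * Returns 0 on success, -1 if not found or out is too small. */
int get_cookie(const char *cookies, const char *name, char *out, size_t outsize)
{
    size_t nlen = strlen(name);
    const char *p = cookies;

    while (*p != '\0') {
        size_t len;

        while (*p == ' ' || *p == ';')          /* skip separators */
            p++;
        len = strcspn(p, ";\r\n");              /* length of this name=value pair */
        if (len > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = len - nlen - 1;
            if (vlen >= outsize)
                return -1;
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += len;
        if (*p == '\r' || *p == '\n')           /* end of the header line */
            break;
    }
    return -1;
}
```

The value returned for "token" is then what gets checked against the server's own record of valid tokens and their expiry times.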


« Last Edit: July 17, 2022, 07:38:34 pm by mariush »
 

Offline peter-hTopic starter

« Reply #10 on: July 17, 2022, 09:18:43 pm »
Thank you very much mariush.

The new point you made me realise is that a login has to be implemented with something "sticky". But cookies are a fair bit of code.

Quote
Keep in mind that you'll HAVE TO account for parameters in the URL even if your pages won't have such parameters, and you also have to account for the HTTP tag after the URL.

Can the extra data be just ignored? I can have a large enough buffer to hold it.

This stuff is actually rather similar to the business of "scraping" websites. I have never written such code myself but have known lots of people who have.
 

Online mariush

« Reply #11 on: July 17, 2022, 09:44:16 pm »
I guess you could ignore parameters in the url ex search for first "?" or space to get to that end.

But being able to parse those parameters could be useful for pagination or if you implement some api if you'll end up using Javascript in your pages.

ex you could have a list of files and show only 10 or 25 per page, so you would have for example GET /documents/xyz/index.html?page=2 HTTP/1.1 or Javascript on your page could do a fetch saying GET /api?function=filelist&folder=/documents/xyz/&results=10&page=2&api_token=12345

to get up to 10 results, starting from 11th file, for the folder /documents/xyz/ folder
(page 2 and results=10: 10 files on first page, so entries 11-20 on page 2)
api_token would be set when user logs in

or maybe view a text document in chunks by saying GET /api?function=viewtext&filename=/text/a.txt&offset=###&maxsize=1000&api_token=12345  to get up to 1000 bytes from offset ### and put on the screen (useful for example to progressively load a text file as you scroll down and get close to the end of scroll)
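If the parameters are to be used rather than ignored, a lookup along these lines may be enough for URLs like the ones above. It is a sketch only: get_query_param is a hypothetical name, and it does not decode %xx escapes or "+" in the values.

```c
#include <stddef.h>
#include <string.h>

/* Look up `name` in the query part of a request target such as
 *   "/api?function=filelist&results=10&page=2"
 * and copy its (still URL-encoded) value into out.
 * Returns 0 on success, -1 if absent or out is too small. */
int get_query_param(const char *target, const char *name,
                    char *out, size_t outsize)
{
    size_t nlen = strlen(name);
    const char *p = strchr(target, '?');

    if (p == NULL)
        return -1;
    p++;                                        /* first name=value pair */
    while (*p != '\0' && *p != ' ') {
        size_t len = strcspn(p, "& \r\n");      /* length of this pair */

        if (len > nlen && strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen = len - nlen - 1;
            if (vlen >= outsize)
                return -1;
            memcpy(out, p + nlen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += len;
        if (*p != '&')                          /* space or line end: no more pairs */
            break;
        p++;
    }
    return -1;
}
```

Numeric values such as page or offset can then be converted with strtoul, with a range check before they are used as file offsets.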

 

Offline peter-hTopic starter

« Reply #12 on: July 18, 2022, 07:33:58 am »
I will first try file transfer and see how that goes. If it can just be streamed nonstop that will be far simpler than I thought. Others have told me it is done block by block and there is an ack after each block, to control the flow rate.
 

Online Nominal Animal

« Reply #13 on: July 18, 2022, 08:07:44 am »
Quote
I will first try file transfer and see how that goes. If it can just be streamed nonstop that will be far simpler than I thought. Others have told me it is done block by block and there is an ack after each block, to control the flow rate.

It is streamed nonstop at the HTTP protocol level.  At the TCP/IP level, each TCP segment is acknowledged by the receiving IP stack once it has been received.  So both are true; they just look at different levels.
 

Offline peter-hTopic starter

« Reply #14 on: July 18, 2022, 09:09:36 am »
Sure; that much I knew. TCP/IP has error correction and inherent flow control. UDP doesn't (it has a checksum). If I understand it right, from the code writing POV, it seems to "just work" i.e.

embedded -> browser
read the file and send it to the socket API, until EOF

browser -> embedded
read the socket API and write each buffer size, say 512 bytes, to the filesystem
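Both directions come down to the same fixed-buffer copy loop. The sketch below uses stdio FILE streams as stand-ins so it can be tried on a PC; on the target the read and write calls would presumably be replaced by the filesystem and socket calls (for example FatFs f_read/f_write and lwIP netconn_write/netconn_recv), which is an assumption about the port, not tested code. stream_copy and CHUNK are hypothetical names.

```c
#include <stdio.h>

#define CHUNK 512   /* buffer size; pick whatever suits the available RAM */

/* Copy everything from src to dst in CHUNK-sized pieces.
 * Returns the number of bytes copied, or -1 on a read or write error.
 * A blocking write is what gives flow control: if the receiver is slow,
 * the write call simply takes longer to return. */
long stream_copy(FILE *src, FILE *dst)
{
    unsigned char buf[CHUNK];
    long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n)
            return -1;
        total += (long)n;
    }
    return ferror(src) ? -1 : total;
}
```

For an upload the loop would stop after Content-Length bytes instead of at end of file, since the browser keeps the connection open while it waits for the response.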

An interesting Q is how to do a "progress report" say xx%. In the former case, the client should be showing that at bottom left. In the latter case, I don't think a browser shows anything, but the server can send back a string saying "xx% done".
« Last Edit: July 18, 2022, 09:37:04 am by peter-h »
 

Online Nominal Animal

« Reply #15 on: July 18, 2022, 10:43:34 am »
Quote
An interesting Q is how to do a "progress report" say xx%.

The Content-Length header is the key here.  It should contain the size of the uploaded data (the file size, if nothing else is included in the POST data) in requests, and the download size in responses.
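On the server side, reading that header also answers the earlier question of how the server knows where the upload ends. A minimal sketch, with hypothetical helper names get_content_length and percent_done, assuming the whole header block is in one NUL-terminated buffer and the header name uses the usual capitalisation:

```c
#include <stdlib.h>
#include <string.h>

/* Return the Content-Length value found in a header block, or -1 if it is
 * missing or malformed. Simplification: case-sensitive substring match. */
long get_content_length(const char *headers)
{
    static const char key[] = "Content-Length:";
    const char *p = strstr(headers, key);
    char *end;
    long v;

    if (p == NULL)
        return -1;
    p += sizeof key - 1;
    v = strtol(p, &end, 10);        /* strtol skips the leading space */
    if (end == p || v < 0)
        return -1;
    return v;
}

/* Percentage of the body received so far, clamped to 0..100. */
int percent_done(long received, long total)
{
    if (total <= 0)
        return 0;
    if (received >= total)
        return 100;
    return (int)((long long)received * 100 / total);
}
```

The body is complete once the number of bytes received after the empty line equals the Content-Length value; a timeout is still needed in case the client stops sending before that point.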

The upload progress report can then be implemented in JavaScript; the download progress report is shown by the browser itself automatically, but only when the response header reports the download size via the Content-Length header.

(The upload progress report is based on making the upload itself in JS, using an XMLHttpRequest object.  Don't let the XML in the name fool you.  You use the object to open a POST connection to the server, attaching a progress event listener to its upload property.  Then you use the object to send() the file contents by creating a FormData object and appending the File object from the file input to it.  You can also append other stuff to the FormData object.  The XMLHttpRequest interface will make sure the request has a Content-Length header, if one is computable.  The Mozilla Developer Network Using FormData Objects page should help here.)

Have you considered writing a HTTP server on a fully hosted OS, on top of a known good TCP stack?  Sure, it would be "extra" work, but that way it would be very easy to debug it, and gain a full understanding of what is done at the application level (on top of TCP/IP).  If you use Linux, I could definitely write an example skeleton of it for you.
« Last Edit: July 18, 2022, 10:45:34 am by Nominal Animal »
 

Offline peter-hTopic starter

« Reply #16 on: July 18, 2022, 11:05:47 am »
Quote
The upload progress report can then be implemented in JavaScript

What happens, during a client -> server file upload, if the server sends some data to the client? Surely that data should appear in the client browser window. There is, AIUI, nothing else being sent to the client during the transfer. With appropriate HTML formatting, one ought to be able to display a "xx%" string.

Quote
Have you considered writing a HTTP server on a fully hosted OS, on top of a known good TCP stack?

Well... I started down this road with someone else who picked up an open source server and after 100+ billable hrs didn't produce something which worked. Partly because I didn't ask for detailed progress reports, but there were also some fundamental misunderstandings e.g. he was supposed to write it as an RTOS task (like everything else in the project is) but instead wrote it as some sort of standalone code, and there is so much stuff there I can't work it out. It sort of partly runs. After more expense than I can afford, I have now scrapped this and will try to write it myself, by extending the simple server I have had working for ages (which just does the RTOS task list and the favicon).
« Last Edit: July 18, 2022, 11:09:52 am by peter-h »
 

Online Nominal Animal

« Reply #17 on: July 18, 2022, 11:32:19 am »
Quote
The upload progress report can then be implemented in JavaScript
What happens, during a client -> server file upload, if the server sends some data to the client? Surely that data should appear in the client browser window. There is, AIUI, nothing else being sent to the client during the transfer. With appropriate HTML formatting, one ought to be able to display a "xx%" string.

Try it in practice.  It isn't as simple as it sounds, especially because the response must be an HTML page itself, and many clients won't actually read the response until they've sent all the data.

Quote
After more expense than I can afford, I have now scrapped this and will try to write it myself, by extending the simple server I have had working for ages (which just does the RTOS task list and the favicon).

Right.  It is just that isolating the service logic from the underlying TCP/IP stack really helps understand how the clients (browsers) interact with the server.
Let me throw something together, and I'll post it here as an example.  Be back in a moment.

(If I recall correctly from my early Common Gateway Interface programming, it's the boundary strings in multipart requests that is the most annoying to handle, so I'm not sure I will do upload support.)
 

Offline ejeffrey

« Reply #18 on: July 18, 2022, 01:45:28 pm »
Quote
What happens, during a client -> server file upload, if the server sends some data to the client? Surely that data should appear in the client browser window. There is, AIUI, nothing else being sent to the client during the transfer. With appropriate HTML formatting, one ought to be able to display a "xx%" string.

I doubt that will work.  The POST request is only allowed to generate one response.  Even if the browser will process the response before it finishes the transfer it would be a single page and be displayed all at once.  You can't really do a progress bar that way.
 

Online mariush

« Reply #19 on: July 18, 2022, 01:56:21 pm »

Another upload technique would be to use javascript exclusively to upload the file to the server in chunks.
For example, the javascript code loads the file in a byte array and then opens a connection, does a POST to the server with the start position and the amount and the data (a small amount, for example 4KB , 64 KB, 128 KB) then closes the connection, updates a progress bar on the html page, and creates another connection for the next chunk , and repeat until all chunks are uploaded.
It would be the server's job to combine all segments into a single file, when all the chunks are successfully uploaded. 

I don't recommend it though ... it's a lot of Javascript code and since you're most likely not gonna upload megabytes to your device, it won't be necessary. People will probably be fine with waiting 5-10s or so it takes to upload some file to your device.
 

Offline peter-hTopic starter

« Reply #20 on: July 18, 2022, 02:41:13 pm »
Yes; I need to stick to stuff I can understand :)

Biggest uploaded file will be 2MB which at 30kbytes/sec net flash writing speed is about 1 minute.

I have spent today trying to work out what the other guy did on that (cancelled) http server project which doesn't work properly. It turns out to be a complicated state machine, which hooks into LWIP, and I haven't got a clue how to hack it to run as a standalone RTOS task. It's also got interesting memory usage, doing a malloc out of LWIP's private heap (both the malloc, and where from, are a very bad idea). There is also a complicated multipart section (another open source thing from github) which was apparently needed for file transfers, but according to above it isn't necessary. So from tomorrow I will be writing my own one :)

I now have a web page running with clickable links. Tomorrow I will try a file transfer.
« Last Edit: July 18, 2022, 09:46:29 pm by peter-h »
 

Online Nominal Animal

« Reply #21 on: July 18, 2022, 10:26:30 pm »
Which one of the lwIP APIs do you use?  altcp/tcp (raw), netconn, or socket?  Have you already checked out the HTTP/HTTPS server included in lwIP 2.1?

The reason I ask, is that I can recreate the same API (for any of those) in hosted C, so that code should be directly portable between the two.
 

Offline peter-hTopic starter

« Reply #22 on: July 19, 2022, 05:28:59 am »
Netconn, currently.

When I was doing some debugging (other thread about mutexing LWIP API and LWIP_TCPIP_CORE_LOCKING=1 blowing the whole thing up because the code examples used the same mutex around the low_level_input/output as they used for the API higher up) I found netconn ends up in the same place as sockets.

Quote
Have you already checked out the HTTP/HTTPS server included in lwIP 2.1?

Yes; this


I can't work it out; it is too complicated. And if it is anything like the rest of that ETH code it doesn't work (e.g. the NTP there is only a skeleton). The abandoned project was seemingly based on that code.

Today I am doing the file listing, with clickable filenames, and clicking on a file will download it.

Quote
Downloads are served exactly like HTML pages, only the Content-Type: defines the file type.
If you want to force the file to be saved (and not opened in an application), use Content-Type: application/octet-stream.

EDIT: found something curious. I am displaying the file listing, using conventional HTML



and if I click on say that jpg file, the browser (Chrome) pops up a Save File dialogue. I haven't sent it any headers at that point. It correctly selects what to do according to the file type. If there is no file type, it offers to save it under "all files". This is exactly right. Very neat!

I can also see some back doors here, because in this simple system each file can be downloaded as a direct URL, bypassing any login, unless that is specifically blocked.

EDIT: both chrome and FF also save the file with the correct date (in 2006). Where the hell does the browser get that info from? It is not hidden in the HTML.
« Last Edit: July 19, 2022, 07:11:13 pm by peter-h »
 

Online Nominal Animal

« Reply #23 on: July 20, 2022, 04:55:27 am »
Quote
EDIT: both chrome and FF also save the file with the correct date (in 2006). Where the hell does the browser get that info from? It is not hidden in the HTML.

From the HTTP response header, specifically the Date one.

Note that browsers can send HEAD requests, to obtain the date (Date:), MIME type (Content-Type:), and size (Content-Length:) headers without bothering with the body content.  (It is just like GET, but the body part is not sent.)

Quote
I can also see some back doors here, because in this simple system each file can be downloaded as a direct URL, bypassing any login, unless that is specifically blocked.

Yep.  The most practical way is to set a cookie at login, that you can check at request header processing time.

There are several approaches.  The two main ones differ between whether you keep a table of every logged in user in memory on the server, or not.

If your server keeps a small table of every logged in user in memory, consisting of allowed access mask, an expiry timestamp, optionally an IP address, and a random number (that you also set as the login cookie value), it is a simple matter of verifying on each request that:
  • it contains a Cookie header with a cookiename=randomnumber pair
  • the authorization has not expired yet
  • the request comes from the same IP address (optional)
  • the access mask grants access to this particular URI and request method
The IP address limits the use of the cookie to that client IP address, but does not otherwise add anything to the security.  When you generate a new random number, always make sure it does not already exist in the authorized user table.  The security stems from the random number being unguessable, so it needs to be large and cryptographically secure.  Your TLS implementation should provide a suitable pseudorandom number generator.
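The per-request check above can be sketched roughly as follows. This is only an illustration under assumptions: the struct layout, the table size, the 43-character token length, and the names session_allows and token_equal are all made up here, and the IP address is taken to be IPv4 with 0 meaning "do not check".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define TOKEN_LEN    43   /* assumed: base64 of 32 random bytes, no padding */
#define MAX_SESSIONS 4    /* hypothetical table size */

/* One logged-in user. All names here are illustrative. */
struct session {
    char     token[TOKEN_LEN + 1]; /* cookie value; empty string = free slot */
    time_t   expires;              /* authorization is invalid from this time on */
    uint32_t client_ip;            /* IPv4 address, 0 = do not check */
    uint32_t access_mask;          /* bit set of permitted operations */
};

static struct session sessions[MAX_SESSIONS];

/* Compare two full-length tokens without stopping at the first mismatch,
   so the response time does not reveal how many leading bytes matched. */
static int token_equal(const char *a, const char *b)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < TOKEN_LEN; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

/* Return 1 if the request may proceed, 0 otherwise.
   cookie_token is the value taken from the Cookie header,
   needed is the access bit(s) this URI and method require. */
int session_allows(const char *cookie_token, uint32_t ip,
                   uint32_t needed, time_t now)
{
    assert(cookie_token != NULL);
    if (strlen(cookie_token) != TOKEN_LEN)
        return 0;
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        const struct session *s = &sessions[i];
        if (s->token[0] == '\0' || !token_equal(s->token, cookie_token))
            continue;                       /* free slot or different user */
        if (now >= s->expires)
            return 0;                       /* authorization has expired */
        if (s->client_ip != 0 && s->client_ip != ip)
            return 0;                       /* cookie used from another address */
        return (s->access_mask & needed) == needed;
    }
    return 0;                               /* no such cookie in the table */
}
```

With only a handful of entries a linear scan is plenty; parsing the Cookie header to extract the token is left out here.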

Note that if the authorization is cancelled after some inactivity period, you'll want to support two such records for each logged-in user.  Whenever the cookie value is old enough (typically a fraction of the expiry time), it is replaced by a new one.  However, because the order of HTTP requests is not deterministic, there is a time window during which the client will still use the old value: you'll want to accept either one for a duration.  Personally, I'd just accept both, always adding a Set-Cookie header with the new one to responses where the request used the old cookie.

The other way is to use a single secret "salt" on the server, and keep the authorization information in the user cookie.  The cookie also contains a hash from a suitable secure hashing function, with the plaintext (the data the hash is calculated from) containing the secret salt.  On each request, the server verifies that the cookie value is what it itself has set before, by recalculating the hash and comparing it to the hash in the cookie.  If any character of the cookie is changed, the hash no longer matches.  This is computationally much more work for the server, but lets multiple load-sharing servers work (as long as they share the secret "salt") without having to share possibly huge user authorization tables.  Furthermore, to avoid certain types of attacks, it is important to construct the plaintext and salt combination in specific ways.

Neither of these is secure, unless the connection is encrypted using TLS.  None of the cookie-based authentication methods are.  And nobody uses HTTP authentication, because the user interface in browsers (especially regarding logging out) is so poor.  (Anyone listening in on the unencrypted traffic can simply steal and reuse the authentication cookie.  Even IP address limiting is just a small hurdle, because spoofing the source IP address is easy unless encryption is used.)

I would definitely use the table-of-authorized-users approach in an embedded server, with at minimum 128-bit (16-byte), preferably 256-bit (32-byte, 43 chars if base64-encoded, 40 chars if encoded in base85) random numbers.
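To illustrate the 43-character figure, here is a minimal sketch that encodes 32 bytes as unpadded URL-safe base64. The function name token_encode is made up for this example, and the URL-safe alphabet ('-' and '_' instead of '+' and '/') is a choice made here so the value needs no quoting in a cookie. The 32 input bytes must come from a cryptographically secure generator (for example the one in your TLS library), which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOKEN_BYTES 32
#define TOKEN_CHARS 43   /* ceil(32 * 8 / 6) */

/* Encode 32 random bytes as 43 URL-safe base64 characters (no '=' padding).
   out must have room for TOKEN_CHARS + 1 bytes. */
void token_encode(const uint8_t in[TOKEN_BYTES], char out[TOKEN_CHARS + 1])
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    size_t   o = 0;
    uint32_t acc = 0;    /* bit accumulator; only the low 'bits' bits matter */
    int      bits = 0;   /* number of not yet emitted bits in acc */

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < TOKEN_BYTES; i++) {
        acc = (acc << 8) | in[i];
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out[o++] = alphabet[(acc >> bits) & 0x3F];
        }
    }
    if (bits > 0)        /* 32 bytes leave 4 bits over; fill the rest with zeros */
        out[o++] = alphabet[(acc << (6 - bits)) & 0x3F];
    out[o] = '\0';
}
```

The result is a plain NUL-terminated string that can be stored in the user table and sent as the cookie value.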
 

Offline peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3728
  • Country: gb
  • Doing electronics since the 1960s...
Re: What is the HTTP server-client interaction to do file transfers?
« Reply #24 on: July 20, 2022, 11:10:20 am »
Quote
From the HTTP response header, specifically the Date one.

I am not sending one. I know, because I wrote all the code on the server :) All I am sending is

Code: [Select]
// send header
netconn_write(conn, DOWNLOAD_HEADER, strlen((char*)DOWNLOAD_HEADER), NETCONN_COPY);

// send filesize, as "Content-Length: 1910916\r\n\r\n"
strcpy((char*)pagebuf, "Content-Length: ");
itoa(file_length,(char*)&pagebuf[16],10); // place size after "Content-Length: "
strcat((char*)pagebuf,"\r\n\r\n");
netconn_write(conn, pagebuf, strlen((char*)pagebuf), NETCONN_COPY);

if (f_open(&fp, fname, FA_READ | FA_OPEN_EXISTING) == FR_OK)
{
    do
    {
        if ( f_read(&fp, pagebuf, 512, &numread) != FR_OK )
        {
            numread=0;
            break;
        }
        netconn_write(conn, pagebuf, numread, NETCONN_COPY);
        offset+=512;
    }
    while (numread==512);

    f_close(&fp);
}



// Header for file download
static const uint8_t DOWNLOAD_HEADER[] =
"HTTP/1.1 200 OK\r\n"
"Content-Type: application/octet-stream\r\n"
"Content-Disposition: attachment>\r\n"
"<meta http-equiv=refresh content=1000>\r\n" // cancels out the 1Hz refresh used elsewhere (no obvious way to just cancel it)
;
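For comparison, here is a sketch of building the whole header in one go with snprintf instead of strcpy/itoa/strcat. It assumes the intent is a plain attachment download, so it drops the stray '>' after "attachment" and the meta refresh line (that is HTML, not an HTTP header field, so it does not belong between the status line and the blank line). The function name build_download_header is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write a complete download response header, including the terminating
   blank line, into buf.  Returns the header length in bytes, or -1 if
   buf is too small. */
int build_download_header(char *buf, size_t buflen, unsigned long file_length)
{
    assert(buf != NULL);
    int n = snprintf(buf, buflen,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: application/octet-stream\r\n"
                     "Content-Disposition: attachment\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n",
                     file_length);
    if (n < 0 || (size_t)n >= buflen)
        return -1;       /* formatting error, or output was truncated */
    return n;
}
```

The returned length can be passed straight to netconn_write(conn, buf, n, NETCONN_COPY), so no strlen call is needed and the header goes out in a single write.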



I have concluded that Chrome and FF extract the EXIF data from the file (a jpeg) if there is no date in the header. I will test this later. Edge doesn't do it and saves the file under the current date. To do it properly, should I use this format?

Code: [Select]
"Content-Length: 364\r\n"
"Date: Sat, 01 Jan 2022 00:00:00 UTC\r\n"
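On the format itself: HTTP dates are always written with the literal zone name "GMT" (not "UTC"), e.g. "Sat, 01 Jan 2022 00:00:00 GMT". Also, Date: is the time the response was generated; the header that carries a file's modification time is Last-Modified:, which uses the same date format. Whether a given browser applies it to the saved file is up to the browser. A minimal sketch of a formatter, assuming the C library provides gmtime and strftime and that the file timestamp has already been converted to a time_t (the name http_date is made up here):

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Format t as an HTTP-date, e.g. "Sat, 01 Jan 2022 00:00:00 GMT".
   Returns the number of characters written, or 0 on failure. */
size_t http_date(char *buf, size_t buflen, time_t t)
{
    assert(buf != NULL);
    const struct tm *utc = gmtime(&t);  /* uses a static buffer; fine with one thread */
    if (utc == NULL)
        return 0;
    /* %a and %b produce English names in the default "C" locale. */
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", utc);
}
```

The result would be sent as, for example, "Last-Modified: " followed by the formatted string and "\r\n".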

Regarding a login, one interesting Q is whether it can support multiple concurrent clients. Most of it is stateless (client strings are immediately processed) so it can, provided they don't do stuff at literally the same time. There is only one RTOS thread. So obviously this is a hack, but I am ok with that.

I will read up on what sort of data is involved in cookies. It is probably more complicated than the rest of the server :)

I've spent a lot of time running file downloads and trying to see which packet / buffer size etc options made any difference. I am getting interesting results: 120kbytes/sec and even ridiculous amounts of buffering improve this only slightly (to 140). Reducing the buffers makes little difference until you get down to ridiculous levels e.g. 1 buffer at low level ETH which halves the speed.

Quote
I would definitely use the table-of-authorized-users approach in an embedded server,

The box has a 2MB FatFS filesystem and in there is a config.ini file which contains http_name= and http_pwd= and there is just one pair of these. I think this is consistent with there being only one possible user at any one time, although I guess one could have multiple sets of credentials so you could revoke those of somebody you just fired ;) However the security of this box will probably be laughable...
« Last Edit: July 20, 2022, 01:08:32 pm by peter-h »
 

