Author Topic: What is the best protocol to use for IoT device with Big Data? (FTP, HTTP, MQTT)  (Read 3331 times)


Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Hi there,

We manufacture CAN bus data loggers for use in e.g. cars, trucks etc.

The logger records data to an SD card. When the logger is in range of a WiFi/Cellular hotspot, it will start pushing the data to a specified server. The files will be ½-20 MB in size and need to be transferred frequently (e.g. every few seconds). A user can have 100-300 loggers deployed, all sending data to the server simultaneously.

Today, we handle this using the FTP protocol. It’s been a simple solution that most users understand.

However, we fear it's not future-proof and we'd like to look into other options (e.g. HTTP, MQTT), in particular to enable easy transfers to e.g. Amazon/Google cloud servers.

A few questions that we’d like input on:

1) What do you believe would be the best protocol for the case described above?
2) Do you have any examples of use cases, APIs, open source code etc. in such applications?
3) If we switch from FTP to another protocol, is there a way to keep it simple for our “less techy” users that do not use e.g. Amazon/Google servers, but just their own PCs?

Thanks a ton for your inputs!
Martin
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 18566
  • Country: nl
    • NCT Developments
How are you safeguarding privacy? I reckon the data can be used to check where people and/or valuable items are. Is that already covered?

AFAIK the protocols used by cloud services are out in the open. Modern Linux distributions are able to access data on several cloud-based file systems without needing third-party software. I don't know if your embedded platform will be able to implement that. However, a problem is that cloud services may change their APIs and render your devices inoperable.

Did you consider running your own servers, to which the devices upload data and from which users can pull it via a website? That way you can A) use a proprietary encrypted protocol between your servers and the devices, and B) give users access over secured HTTPS. Another bonus is that you can push firmware updates to the devices.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online Rerouter

  • Super Contributor
  • ***
  • Posts: 4459
  • Country: au
  • Question Everything... Except This Statement
The other thing to cut your data size is to do simple compression; 70% of what is broadcast on CAN doesn't change often.

As for the protocol, FTP is OK; just make it a one-way affair: files can be uploaded, but the directory can never be read in any way; instead, the FTP server sends back an acknowledgement.

You can add a layer to this in your proxy/caching server to allow only inbound FTP connections and an outbound port for the ack.

Equally, locally on the devices, keep it simple: don't have them reply to pings, etc., and require some level of authentication if you allow firmware upgrades.
« Last Edit: May 04, 2018, 02:29:37 am by Rerouter »
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 2039
  • Country: us
Yeah, you definitely want to figure out what your risk profile is on security. If possible it would be good to have hardware safeguards in place to prevent dangerous conditions, such as a snoop-only CAN receiver that can only monitor traffic, not send messages. But if that data includes vehicle IDs or location data, you also want to encrypt it in transit.

If you are just uploading files to customer sites and then it is their responsibility what to do with it, then encrypted FTP or scp/sftp is probably just fine. Unless your customers are asking for it, I wouldn't implement an HTTPS upload if it is just going to be a different kind of file server. If they are asking, they probably have a specific service in mind.

On the other hand, if you upload the data to a cloud database service like Google's BigQuery or the Amazon or Microsoft equivalents, that could give you some really nice features. The big disadvantage is that your product gets locked into a third-party platform, but you or your customers get a bunch of analysis and monitoring tools, and they can start just running queries. This possibly opens you up to the ability to offer additional services to your customers, and also may help you win customers who can't or won't run their own services. You also have to get more involved in your customers' lives. There is some ability for interaction between AWS, GCP and Azure, but ultimately, if your customers have already bought into one cloud ecosystem, they would prefer you to send data there.
 

Offline djnz

  • Regular Contributor
  • *
  • Posts: 169
  • Country: 00
Just switching from FTP to SFTP / SCP could be a big improvement in terms of security.

However, note that contrary to the name, SFTP is not "FTP + something", it's a pretty different protocol. Open source implementations are readily available.

FTP is quite old now and you should definitely consider moving away from it. Apart from security issues, it also has other design shortcomings: it requires multiple round trips, etc.
 

Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Hi all, thanks a ton for your inputs!

Quick clarification: The data on the logger is heavily encrypted and the logger does not allow transmission of data to the CAN bus (i.e. it's only able to silently record).

As for the protocol, our main concern with FTP is whether it will be difficult to transfer data to e.g. Amazon/Google/Microsoft cloud servers.

1) If we use FTP, will we need to embed some sort of hardware alteration in our logger to upload files to e.g. an Amazon cloud server (for security reasons)?
We believe we've heard something along these lines, and that the required hardware chip would be quite space-consuming.

2) If we wanted to shift to e.g. HTTP or MQTT, would this be realistic at all for the large file transfers we describe (½-20 MB, sent every few seconds)?
E.g. for MQTT we've heard a lot about instability for even semi-large files (< 1 MB) - any experiences with this?

3) If we want to implement Firmware and Configuration updates over-the-air, are there any existing methodologies/APIs/source codes available for either of the protocols? (FTP, SFTP, MQTT, HTTP)

Thanks again!

Martin
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1396
  • Country: us
You will need to consider how to (or whether to) gate access to the ability to write data into the system in AWS/Azure/GCP/other.

This might be a per-device client certificate, a per-device serial number, or another similar technique. Client certificates and end-to-end encryption will have the highest CPU and code requirements. You can probably get by with an unencrypted on-the-wire protocol if you aren't at all worried about man-in-the-middle attacks. (Most applications of this sort don't need to worry about such things, but you know your application better than anyone else on EEVblog.)

Data ingestion and processing costs on AWS are so low that I wouldn't particularly worry about some rogue discovering your endpoint on the internet and sending unrelated data to it. Just make sure you can identify the good traffic you do want to keep (from your devices) and ignore the portscan or script-kiddie type of spurious traffic.
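As a rough illustration of the per-device identification approach: each logger could hold a per-device secret and sign every upload, and the ingest endpoint checks the signature before accepting the file. A minimal Python sketch (the header names and token scheme are made up for illustration, not any AWS/Azure/GCP API):

```python
import hashlib
import hmac
import time

def sign_upload(device_id: str, secret: bytes, body: bytes) -> dict:
    """Build headers that let the server verify which device sent the file."""
    ts = str(int(time.time()))
    mac = hmac.new(secret, device_id.encode() + ts.encode() + body, hashlib.sha256)
    return {"X-Device-Id": device_id,      # hypothetical header names
            "X-Timestamp": ts,
            "X-Signature": mac.hexdigest()}

def verify_upload(headers: dict, secret: bytes, body: bytes, max_skew: int = 300) -> bool:
    """Recompute the signature server-side; reject stale or tampered uploads."""
    expected = hmac.new(secret,
                        headers["X-Device-Id"].encode()
                        + headers["X-Timestamp"].encode() + body,
                        hashlib.sha256).hexdigest()
    fresh = abs(time.time() - int(headers["X-Timestamp"])) < max_skew
    return fresh and hmac.compare_digest(expected, headers["X-Signature"])
```

This only authenticates the sender; it does not hide the data in transit, so it pairs with TLS (or accepts the man-in-the-middle trade-off described above).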

A 20 MB file sent every 5 seconds is 4 MB/s, or roughly 32 Mbps per device before considering any radio-related overhead. Per device that is within WiFi's reach, but with 100-300 loggers uploading simultaneously you are looking at an aggregate of several Gbps at the server, literally multiple gigabit links. You need to look at this part of the product requirements and nail them down a little better. (Most CAN buses are in the hundred-kilobit to one-megabit-per-second range, so each logger would be producing tens of times the full bandwidth of a CAN bus, and the fleet the bandwidth of thousands of CAN buses, so I wonder if there's a unit or order-of-magnitude bug somewhere in your product planning.)
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1396
  • Country: us
Even with OTA update capability, I'd try to put as much of the "smarts" and computing power in AWS and as little as possible on the edge device. Much easier to push fixes and debug code on a server than on a panoply of versions of firmware on a variety of revisions of hardware in the field.
 

Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Thanks again for your inputs!

As for the WiFi transfer, our logger currently transfers around 9 MB/s (~540 MB/min) without problem using the FTP functionality.

Regarding the protocol, we've tested a bit with MQTT and found it difficult to handle large transfers.
With that, we're probably down to a choice between FTP/SFTP - and HTTP.

Any thoughts on pros/cons of FTP vs. HTTP for our use case?

Really appreciate all the great feedback and reflections!

Martin
 

Online Rerouter

  • Super Contributor
  • ***
  • Posts: 4459
  • Country: au
  • Question Everything... Except This Statement
9 MB/s still sounds like an order of magnitude error; most cars and trucks use a 250 kbit/s CAN bus, with about 45% utilization being the highest I have seen.

Mercedes runs at 500 kbit/s, and a few Fords at 1 Mbit/s, but your figure corresponds to a fully utilised bus running far faster than any of those, which to me is unheard of in vehicles.

Have you looked into basic compression? Most of the status inputs (e.g. brake, selected gear, indicators), plus all the other stuff, barely ever change in this context; it's only really fuel, speed and revs that change quickly. So if you receive the same packet every 50 ms and it's unchanged, build into your device a way to compress this down to which ID, how many times, and how often; then when it changes, record the entire message and continue along.

I've got recordings of close to 400 vehicles running trips; when you do that basic compression, utilization falls to under 6% on most.
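That record-the-ID-and-repeat-count idea can be sketched in a few lines of Python (the frame layout and function name are illustrative): hold back a frame while its payload is unchanged, and emit the previous payload with its repeat count once it changes.

```python
def suppress_repeats(frames):
    """Collapse repeated identical CAN frames per ID.

    frames: iterable of (timestamp, can_id, payload_bytes) tuples.
    Returns (first_timestamp, can_id, payload, repeat_count) records,
    one per run of identical payloads on a given ID.
    """
    last = {}   # can_id -> (payload, repeat_count, first_timestamp)
    out = []
    for ts, cid, payload in frames:
        prev = last.get(cid)
        if prev and prev[0] == payload:
            # Same payload as last time on this ID: just bump the count.
            last[cid] = (payload, prev[1] + 1, prev[2])
        else:
            if prev:
                # Payload changed: flush the finished run.
                out.append((prev[2], cid, prev[0], prev[1]))
            last[cid] = (payload, 1, ts)
    # Flush whatever runs are still open at end of capture.
    for cid, (payload, n, ts) in last.items():
        out.append((ts, cid, payload, n))
    return out
```

On real traffic this is exactly the effect described: slow-changing status IDs collapse to a handful of records, while fast-changing ones (speed, revs) pass through largely unchanged.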
 

Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Hi again,

We have looked into compression; the 9 MB/s figure is mainly from simulations, as well as some non-automotive applications.
We will indeed compress the data as you outline.

With that said, we still expect to be transferring large amounts of data, hence we're looking to understand if we should continue with FTP or whether to go for e.g. HTTP instead - but we're a bit unsure on the pros/cons between the two.

Best,
Martin
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1396
  • Country: us
Quote
As for the WiFi transfer, our logger currently transfers around 9 MB/s (~540 MB/min) without problem using the FTP functionality.
That (9 megabytes per second, roughly 72 Mbps) still seems a factor of ~200 "too high" to me for a single CAN bus, but I'll drop it as you're the one with the device in your hand.
Quote
With that, we're probably down to a choice between FTP/SFTP - and HTTP.

Any thoughts on pros/cons of FTP vs. HTTP for our use case?
(The bare minimum needed of) HTTP is an easier protocol to implement on the client, IMO. FTP has weirdness with separate control and data channels and the directionality of opening one of them (from the server to the client) in non-PASV-mode FTP.

HTTP is just: open a TCP socket and stream bytes in a particular format, and there is probably a robust library for any WiFi-enabled device already prepared.
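To illustrate how little the client side needs, here is a rough Python sketch of a bare-bones HTTP/1.1 PUT upload over a plain TCP socket (host, path and port are placeholders; a real device would add retries, TLS and authentication):

```python
import socket

def build_put_request(host: str, path: str, body: bytes) -> bytes:
    """Assemble the raw bytes an HTTP/1.1 PUT upload places on the wire."""
    head = ("PUT {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Content-Type: application/octet-stream\r\n"
            "Content-Length: {}\r\n"
            "Connection: close\r\n"
            "\r\n").format(path, host, len(body))
    return head.encode("ascii") + body

def upload(host: str, path: str, body: bytes, port: int = 80) -> bytes:
    """Open a TCP socket, stream the request, return the raw response."""
    with socket.create_connection((host, port), timeout=30) as s:
        s.sendall(build_put_request(host, path, body))
        resp = b""
        while chunk := s.recv(4096):
            resp += chunk
    return resp
```

That is essentially the whole protocol for a one-shot upload, which is why HTTP client libraries exist for practically every connected MCU platform.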
 

Offline krho

  • Regular Contributor
  • *
  • Posts: 213
  • Country: si
If you are going with HTTPS, then take a look at https://tus.io/
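For context, the core of a tus 1.0 resumable upload is: ask the server how much it already has (a HEAD request, reading the Upload-Offset response header), then PATCH the remaining bytes with that offset declared. A minimal Python sketch of just the header/offset logic (the actual HTTP calls are left out):

```python
def tus_patch_headers(offset: int) -> dict:
    """Headers for a tus 1.0 PATCH request resuming from byte `offset`."""
    return {"Tus-Resumable": "1.0.0",
            "Upload-Offset": str(offset),
            "Content-Type": "application/offset+octet-stream"}

def resume_payload(data: bytes, offset: int) -> bytes:
    """The bytes still to send after a server-reported offset."""
    return data[offset:]
```

For loggers that drift in and out of hotspot range, resuming mid-file instead of restarting a 20 MB transfer is exactly the failure mode tus was designed for.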
 

Offline ejeffrey

  • Super Contributor
  • ***
  • Posts: 2039
  • Country: us
Quote
9 MB/s still sounds like an order of magnitude error; most cars and trucks use a 250 kbit/s CAN bus, with about 45% utilization being the highest I have seen.

It sounds like that is dictated by a low duty cycle of network access: the logger records data to an SD card, then transmits in a burst when it gets network access.

HTTP is pretty easy to implement for the upload itself. The tricky part, if you are uploading to a public cloud service, will be authentication. If you need to use one of the OAuth flows, you need a library that supports that. These are easy to come by for a fully functioning OS, but may be more difficult on a microcontroller/RTOS. The second part is figuring out how to implement one of the authentication flows without a display for a user to open a web browser on. It has been a while since I looked at this, but it should be possible.
 

Offline viperidae

  • Regular Contributor
  • *
  • Posts: 198
  • Country: nz
MQTT is not designed to transfer large files. Go with HTTP. If you go with HTTPS, make sure you have enough processing power to do the encryption.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 5762
  • Country: nl
What you are describing is IMO not a true IoT device. A true IoT device has the ability to connect to the net anytime, ranging from sensors that wake up once a minute, post a few bytes and go back to sleep, to always-on devices with a fixed connection.

You describe a standalone logger that now and then connects to the internet to dump its data.
MQTT is out; the rest is open and up to you. I would opt to size-limit the data chunks to, for instance, 1 MB, and encrypt each one, including a secure hash, so each chunk is intrinsically secure. The encryption could be done offline, so you don't need state-of-the-art processing power.

 If you get a connection, you can then send as many chunks as possible before the connection is lost again. That way you are more flexible with connection time.
 You could even pack the data chunks into a mail protocol and send them over any public untrusted connection if you like, but IMO it is better to get an acknowledgement of reception, or data could get lost. Depending on your requirements, of course, there are many protocols to choose from, but not all roadside open WiFi hotspots support them. So HTTP, or another protocol over port 80, will probably be the most widely supported.
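The chunking-plus-hash idea can be sketched in a few lines of Python: each 1 MiB chunk carries its own SHA-256, so the receiver can verify chunks independently and ask for a re-send of just the broken ones (function names are illustrative):

```python
import hashlib

CHUNK = 1 << 20  # 1 MiB per chunk

def make_chunks(data: bytes, seq_start: int = 0):
    """Split a log file into 1 MiB chunks, each tagged with a sequence
    number and its SHA-256, so every chunk is independently verifiable."""
    chunks = []
    for i in range(0, len(data), CHUNK):
        part = data[i:i + CHUNK]
        chunks.append((seq_start + i // CHUNK, part,
                       hashlib.sha256(part).hexdigest()))
    return chunks

def verify_chunk(seq: int, part: bytes, digest: str) -> bool:
    """Receiver-side check that a chunk arrived intact."""
    return hashlib.sha256(part).hexdigest() == digest
```

Encryption of each chunk (as suggested above) would be layered on top of this; the hash-per-chunk means a dropped connection only costs you the chunk in flight.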
« Last Edit: May 05, 2018, 08:22:58 am by Kjelt »
 

Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Thanks again, super helpful!

I agree that it sounds like HTTP would be an option.

Could you give a few thoughts on what the pros/cons would be with HTTP vs FTP?

One potential con we see with HTTP is that we've not managed to find a simple client that the "less techy" can easily install to receive files e.g. on their own PC. We find this to be fairly simple with FTP - but maybe we've just missed it for HTTP?

Also, would FTP vs. HTTP differ in terms of the ease of transferring the data to e.g. AWS servers, Google servers, etc.? I.e., would there be differences in hardware requirements between the two protocols?

Thanks again,
Martin


 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1396
  • Country: us
No hardware differences server-side.
Free (as in beer and as in speech) server software is available for both protocols.
HTTP is more widely used in the field (meaning every WiFi router or hotspot is likely to support it).
Almost no one blocks HTTP; a small number of networks block FTP.
HTTP works easily over most anything, as it's become a common transport protocol, even for “non-webby” things.

For desktop use case, I’d send things to your cloud server and then have the user download using a browser to keep things easy (but requires your cloud server to exist and an internet connection). This pattern is part of how “internet of shit” started, but it really is the easiest to implement.

If you want truly disconnected, works-in-Antarctica operation, you can get free server software for a desktop PC too. (You are looking for a web server [or FTP server], nothing more: Apache, nginx, IIS, and several others. Or, for just your app, you can write a TCP listener in a few hundred lines of code.) I sell a small hardware device in low quantity, and user support is >95% about the software on their PC. It's a drag; try to avoid it! Someone has Windows, someone has macOS, someone has XP, someone has a virus, someone's not an admin, you name it. Everyone has a browser, and if they don't, they don't expect it's your fault...
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 18566
  • Country: nl
    • NCT Developments
Quote
One potential con we see with HTTP is that we've not managed to find a simple client that the "less techy" can easily install to receive files e.g. on their own PC. We find this to be fairly simple with FTP - but maybe we've just missed it for HTTP?
HTTP will definitely be more complicated because the user will need to set up a web server. Another problem is that your solution would require the user to have a server on an externally accessible internet connection; from a security standpoint this is a less-than-optimal solution. I'd concentrate on logging data onto a cloud service and/or setting up your own servers and creating some extra revenue. The cloud servers aren't free, so why should your service be?

Quote
Also, would FTP vs. HTTP differ in terms of the ease of transferring the data to e.g. AWS servers, Google servers, etc.? I.e., would there be differences in hardware requirements between the two protocols?
That depends entirely on what the protocol requires (encryption, flash space for code) and what your current hardware can handle. This is impossible to say without knowing the protocol and the abilities of your hardware. You'd need to assess that first, as a research project, before taking this any further.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Hi again,

We're leaning increasingly towards HTTP/HTTPS.
In this regard, we're looking at alternatives to the simple FTP server that a user could previously install on their local PC or on whatever server they use.

It seems fewer clients exist for HTTP than FTP, but we did come across something like Zenko that might be an option.
However, we have very little experience in this field.

1) Does anyone know if there are other/better alternatives to Zenko that we should consider for this?

2) Has anyone had experience with using Zenko in combination with an IoT/data logger device ala what we've described?

Thanks,
Martin
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1396
  • Country: us
The browser that you are using to post to EEVBlog is an http client. There are many: Chrome, Firefox, Safari, lynx, curl, wget, ...

The other side is an http server. Apache, nginx, and IIS are some of the leading http/https servers out there.

I work in the web field and have never heard of Zenko. I had a quick look, but it wasn't obvious what led you to consider them. If it was a lack of awareness of the leaders in the field, pick one of the leaders instead: IIS if your company is all-Microsoft, and Apache or nginx otherwise. Don't go with an oddball if you have little experience, unless you have a very good reason.

If you need something that anyone can install on their computers, choose Apache. It’s the most popular webserver that runs on most platforms.

https://news.netcraft.com/archives/2018/02/13/february-2018-web-server-survey.html
 

Offline larsdenmark

  • Regular Contributor
  • *
  • Posts: 89
  • Country: dk
Any webserver will do.

It is probably easiest to get permission from the customer's IT department (if they have one) to install one of the standard web servers (Apache, Jetty, Node.js), but you can easily make your own using .NET's HttpListener.

Since each server is not handling thousands of connections simultaneously and only needs to support uploading of files, it shouldn't matter much which server you choose.

Remember that anyone can connect to a web server, hence you need to protect it with a firewall so people can't use it to enter the customer's network.

I know nothing about Zenko; from the little I've read, it seems best at receiving small amounts of information at a time (which doesn't seem to be what you want to do).
 

Offline Martin F

  • Regular Contributor
  • *
  • Posts: 128
  • Country: dk
Sorry for being unclear. What we need is akin to an FTP file server, but for HTTP: something simple that end users can install on either their server or their local PC to receive the files from the logger.

Today we use FileZilla Server for testing the FTP functionality. I.e., we are not looking for a web server.

Sorry if it remains unclear,
Martin
« Last Edit: May 09, 2018, 11:58:57 am by Martin F »
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1396
  • Country: us
Quote
Sorry for being unclear. What we need is akin to an FTP file server, but for HTTP: something simple that end users can install on either their server or their local PC to receive the files from the logger.

Today we use FileZilla Server for testing the FTP functionality. I.e., we are not looking for a web server.
If you want a server to serve up http, that’s a webserver. It doesn’t have to serve HTML and CSS or be intended for humans to connect to, but it’s still a webserver (IMO).

Plenty of Apache, IIS, nginx, tomcat servers are serving web services or other machine to machine applications.

You could whack together or buy some custom application if you had to, but I'm not seeing a reason not to use standardized, off-the-shelf software that is fairly well-proven and understood in terms of security, that millions of people can administer, that you can easily get help with via Google, and whose source you can usually read.

Can you help us understand better what you mean by “we are not looking for a webserver”? Because I think you are and just don’t realize it.
 

Offline borjam

  • Supporter
  • ****
  • Posts: 778
  • Country: es
  • EA2EKH
First you should consider whether a push or a pull approach suits you.

I favor push for three main reasons.

- The device chooses when to upload data.

- You don't need to fiddle with authorizing incoming connections in a firewall, which means everything is simpler and the attack surface is greatly reduced.

- In case you end up having several consumers of that data, you don't encourage direct connections to pull data from the devices. Several apparently harmless connections can add up and become a problem.

Push of course opens another Pandora's box: securing the server where you receive the data. For that I prefer an approach like MQTT: from the point of view of the data sources it's a push service, while it's pull for the consumers.

Again, for a consumer (i.e. the server that gathers all the data in order to store/process it), it's much easier to stay secure if you don't have to manage any incoming connections at all. Of course the MQTT broker must be properly secured, but which has the worse impact: a compromise of your über-important "big data" server, or a compromise of an MQTT broker that will mostly hold only a relatively small amount of in-transit data?

 

