No Script, No Fear, All Opinion
  • ISP Bandwidth Sawtooth Oscillation – Buffer Bloat!

    Posted on May 16th, 2014 EEVblog 22 comments

    So I just got my new 8M/8M VDSL2 office internet installed in the lab (see video).
    And speedtest.net looked great of course. Everything seemed to work a treat: the connection felt quick, and a quick YouTube speed test showed it to be much faster than my old ADSL2+ connection, as you’d expect.

    SpeedTest8M8M

    But yesterday I tried to upload a new video to YouTube and it didn’t seem to be as fast as expected. So I looked at my upload bandwidth graph and noticed something strange:

    YoutubeUploadBandwidthOscillating

    ISP upload bandwidth sawtooth oscillation

    The upload bandwidth speed (green line) looked like it was oscillating! A classic sawtooth waveform in fact. Linearly ramping up from a low of about 2Mbps, to about 6.5Mbps peak. But as soon as it reached the peak it instantly shot back down and started to ramp up again. Naturally if you take the average of that waveform you are going to get a lot less than the peak value. So my 8Mbps upload connection was looking and performing very ordinary. And this happened with both youtube uploads and FTP uploads to my server.
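
    As a back-of-envelope check on that last point, here is a minimal sketch that assumes an ideal linear ramp with an instant reset. The 2 and 6.5 Mbps figures are the approximate values quoted above; the function name is purely illustrative.

```python
def sawtooth_average(low_mbps: float, peak_mbps: float) -> float:
    """Mean of an ideal linear ramp from low to peak followed by an instant reset."""
    # A straight ramp spends equal time at every level, so its mean is the midpoint.
    return (low_mbps + peak_mbps) / 2.0


low, peak = 2.0, 6.5   # approximate values read off the graph
plan = 8.0             # nominal upload rate of the 8M/8M plan
avg = sawtooth_average(low, peak)
print(f"average ~ {avg:.2f} Mbps ({avg / plan:.0%} of the {plan:.0f} Mbps plan)")
```

    Under those assumptions the average works out to roughly 4.25 Mbps, a little over half of the nominal 8 Mbps.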

    I hadn’t seen this before. My Telstra 100M/2.4Mbps cable connection at home always had a flat line, or a very small amount of ripple on the flat line (yes, I of course think of these graphs as electrical waveforms, I can’t help myself). And my old ADSL2+ connection in the lab also had a similar line, although being physical line limited instead of software limited like the cable modem, the line wasn’t as flat. But this oscillation thing was new.

    So onto Twitter I went for a collective nerd decision as to what was causing this, and everyone had their pet theory – Buffer Bloat!, bandwidth capping!, dropped packets!, QoS bandwidth limiting effect on TCP!, typical TCP congestion control!, higher priority voice data bursts!

    Well, a Google search turned up nothing; I couldn’t find a single similar image of this linear ramping.

    So I contacted my ISP Building Connect (Oxygen Networks) and the head tech guy Paul who installed my system wasn’t entirely sure what was going on either at first. But he called me back and we experimented with a few things, like lowering my bandwidth plan, but that didn’t work. But he could see the exact same waveform on his end as on my end, so that was good. He even started to suspect it might be something at my end, Windows related, not backing off properly on bandwidth as it should when peak limit is reached.

    But then he tweaked the buffer size values on the access router at their end, and sure enough, we hit upon a magic value that made the line go completely flat:

    YoutubeUploadBandwidthOscillatingFixed

    There was still the ramp up at the start as expected, but it was flat after that, with the occasional small glitch as you’d expect. This was for an FTP upload to my server in the US. YouTube upload also showed the same flat response. Woo-Hoo! Problem fixed in record time.

    So it was some sort of buffer issue on the ISP access router, and maybe in combination with a buffer or other setting in my Windows 7 O/S. There seems to be an optimum total system buffer setting, and getting it wrong causes this issue. If it’s not set correctly then the ISP router dumps packets galore and presumably reflushes the buffer, hence the linear ramp up in bandwidth again until the buffer hits maximum and the whole process starts again. Bingo, you get a classic sawtooth waveform.
    So this seems to be the infamous “buffer bloat” I hear Cringely talking about all the time.

    I haven’t looked into all the mechanics of this, but it certainly is a nasty little problem, and had I not gone with a small ISP where I had access to the head technical person who could tweak my connection for me, I might have been stuck with this problem. Thumbs up for small ISPs that provide personal service.

     

     

    • np

      Hi Dave,

      Use a serious testing platform like http://testmy.net/

      You can do real and conclusive tests!
      Upload/Download/Dual, Multithread, Multisite, and you can even batch test your line every 10m, hour, …

      All tests are recorded in a public database, and you can easily conclude what your internet link is worth (per provider, country, city, …).

      Stop using fancy stuff, use real engineering tools ;)

      BR

      • http://www.eevblog.com EEVblog

        Didn’t know about that site, but it failed miserably. I tried the upload test and it tells me I only get 280Kbps?

        • nplanel

          That depends on the default server(s) selected.
          Look at the top to change/select your preferred server to test.

          I think the default selected server is Dallas, Texas, so 280Kbps is your international upload rate (to Dallas, Texas).

          • http://www.eevblog.com EEVblog

            None in Oz, so much for that.

            • http://TestMy.net CA3LE

              Thanks for the comment nplanel. I’m flattered when I read comments like yours.

              I’m the programmer behind TestMy.net by the way. I’m always scanning around the net seeing what people have to say about my service. I rarely comment, I usually just use the research to make my site better.

              So I assume by OZ you’re talking about Australia, right? Let me explain some things that may help you understand my site better.

              I purposely make my tests harder, an easy test doesn’t help anyone, that goes for any type of test. I have a multithread test that often enables the client to pull much higher speeds, I could easily make it the default… but if I went around just telling people the best news I wouldn’t be able to help anyone improve anything. Tuning your performance to be faster on TestMy.net’s classic single thread test will noticeably improve your speed everywhere. Ookla tests are designed to make you feel better and pull the highest numbers. Personally, I don’t care if you feel good about the results, I want you to know what I saw during the test. Plain and simple, raw information and I have very powerful networks and servers behind my methodology.

              If you allow me to post a couple links it can help you understand a little more. http://testmy.net/ipb/topic/28902-why-do-my-results-differ-from-speedtestnet-ookla-speed-tests/ & http://testmy.net/legit-speed-test.php

              If you’re in OZ I recommend testing against my servers in Singapore and the West Coast US. Don’t get hung up because I don’t have a server in AU. I offer MANY ways to test. Like I said, the multithread test will help you pull much higher speeds internationally, I also have CDN options on the mirrors and multithread pages which pull from world-wide networks. Ping and traceroute to cloud.testmy.net and google.testmy.net and you’ll see how close you are to those options. In my opinion you should also seriously look at the longer route speed test numbers too. You probably hit a lot more servers at distance than you think. Traceroute 20 random addresses out of your history and find out. Don’t you want your speed test to reflect your real world performance?

              PLUS, if you have a server you’d like to test against… I’ll even arrange that for you, for free like everything else on TMN.

              I hope after really using my system you’ll start to understand why it’s the best. Many don’t notice till they have an issue that the other speed tests fail to detect… the way TMN is designed, if your connection is influenced by something it is ALWAYS apparent in the results. It’s so deceiving how other tests can adjust the negative right out of the results, before it’s processed the worst portions are removed… isn’t that what you’re trying to see??! At the end of the day I really just want to make all of our Internet faster and hold ISPs accountable for their claims. Look it up, Ookla was founded by a former ISP CEO… hummmm, I wonder why their results favor the ISPs so much. Hahahaha, and yet… the majority choose to ride with blinders on and keep using them. Brainwashed into thinking you need to have a server nearby to be accurate…. the only reason you need a server nearby is if you want to eliminate the factors you came to test in the first place! … If people truly want faster Internet they need to take off the blinders and admit to themselves that their connections aren’t always as fast as they think they are.

              My connection is no different, I pull over my 105/20 Mbps.. I get up to 116 Mbps download using the google.testmy.net option, 20-21 Mbps up. Using multithread to test all of the servers simultaneously (only TestMy.net can do this) I get 109 Mbps, http://testmy.net/db/KGy12fo pretty good. But if I go internationally it varies. Asia, furthest route from me, I only pull 13 Mbps on single thread, 47 Mbps on multithread. Amsterdam, 2nd furthest, 24 Mbps single 86 Mbps multi. Testing to both Amsterdam and Singapore simultaneously using the multithread test I’m able to pull 93 Mbps http://testmy.net/db/CrxtkbT … I’m in Colorado Springs CO, 4900 miles from Amsterdam and 9100 miles from Singapore. See how multithread tests can make things look better. Yeah, your connection can do that speed but it has to open a bunch of threads to pull it off. The difference in my site is that I give you the opportunity to see all of those details where others don’t — I leave it up to you how you want to use the tests and interpret the results.

              Happy Testing! Thanks again for the mention.

              • http://www.bufferbloat.net dave taht

                I would really like it if your service had a test that tested download and upload speeds – vs latency – at the *same* time. You can do this with a short http get as the basic ping service is unavailable.

                I’d be able then to recommend it to my bufferbloat-sensitive users as a better means of tuning their systems for SQM.

                Presently we don’t use inbrowser tools as they had been too slow and too gamed to rely upon for a valid test.

                http://snapon.lab.bufferbloat.net/~cero2/jimreisert/results.html

                • http://TestMy.net CA3LE

                  That will definitely be in my next version. It’s already doing this… and much, much more. I’m just the only one that’s able to see it right now. :-P

              • Chupacabras

                I don’t find http://testmy.net/ to be accurate.
                I am from Europe and have a 100/100Mbps connection, but http://testmy.net/ shows download speed 25-45Mbps (I have tried different servers from the list, and multiple threads as well). Other speed tests (here in Europe) detect my download/upload speed between 85-95Mbps.

    • Tom

      Could be your window (MSS) size. What OS?

      I think the way it works is your system blurts out a TCP window worth of data and then ticks off each packet as the ACKs come back. If you send more than the upstream router can take, the router drops some packets and then your computer is left waiting for the lost packet ACK until it gives up and re-sends. This means it pauses, resends, then gets the ACK and then keeps blasting, fills the buffer, a packet gets lost… and the cycle continues.

      I think Windows 7/2008R2 dynamically fiddles with the MSS to fix this.

      Could be some errant filtering blocking an ICMP coming back.

      • http://www.aykira.com.au/ Keith

        Tom, that’s window scaling, and it should be in every TCP/IP stack. I think that, combined with RTT timestamps, it lets the stack work out how much data can be in transit down the ‘pipe’ at a time without backing up. Fragmentation & contention ‘throw out’ the balance.

        A little aside – Years ago I wrote a data transfer protocol on Econet (Acorn) that used NAK – i.e. tell me when you have something missing or that we are done. Went like the clappers…

        Keith
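
        To put rough numbers on "how much data can be in transit", the sketch below computes a bandwidth-delay product and the throughput ceiling imposed by a fixed window. The 8 Mbps figure is the upload rate from the post; the 200 ms round-trip time is an assumed, illustrative value for a long international path, not a measurement. 65535 bytes is the largest window TCP can advertise without window scaling. The function names are hypothetical.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_mbps * 1e6 / 8 * rtt_ms / 1e3


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Best-case throughput when at most one window can be sent per round trip."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e6


UPLINK_MBPS = 8          # from the post
ASSUMED_RTT_MS = 200     # assumption for illustration only
UNSCALED_WINDOW = 65535  # maximum TCP window without the window scale option

print(f"in-flight data needed: {bdp_bytes(UPLINK_MBPS, ASSUMED_RTT_MS):.0f} bytes")
print(f"ceiling with an unscaled window: "
      f"{window_limited_mbps(UNSCALED_WINDOW, ASSUMED_RTT_MS):.2f} Mbps")
```

        With these assumed numbers the pipe holds about 200 kB, so a sender limited to an unscaled 64 KiB window could not fill an 8 Mbps link at that distance.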

    • http://www.bufferbloat.net dave taht

      Might have been underbuffering or overbuffering.

      The core thing you should do now that it is “fixed” is measure your latency with ping before and during your upload. If he used something like fq_codel on his end, latency will only increase by a few ms. If he didn’t, latency probably got pretty bad.
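
      A minimal sketch of that before/during comparison, assuming you have already collected round-trip times (for example by copying them out of ping output). The sample values below are invented placeholders to show the calculation, not measurements from this connection, and the function name is hypothetical.

```python
from statistics import median


def latency_increase_ms(idle_rtts_ms, loaded_rtts_ms):
    """Difference between the median RTT under load and the median RTT when idle."""
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Placeholder samples, for illustration only.
idle = [21.0, 22.0, 22.0, 23.0, 22.0]          # ping RTTs with the link idle
loaded = [25.0, 310.0, 400.0, 380.0, 395.0]    # ping RTTs during an upload

increase = latency_increase_ms(idle, loaded)
print(f"latency under load rose by {increase:.0f} ms")
```

      The median is used rather than the mean so that a single stray sample does not dominate the comparison.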

    • Paul

      Now you have given me a problem.
      1/ Internet geeky stuff.
      2/ Watching paint dry on this door I just painted.

      1/2 can’t decide, I forgot there’s the third option,
      Yes that’s the one “The Off button”

    • http://www.aykira.com.au/ Keith

      Combination of window size & packet fragmentation I’d say – when the size of the packet used doesn’t directly match the upstream packet sizes (plus protocol ‘junk’). You get a rolling processing overhead that comes and goes, which translates into effective throughput. As the fragmented packet remains fragmented until it gets to its destination (or something else in between needs to break it up further..) – so everyone pays for fragmentation overhead; especially if things arrive out of order or not at all…

      Tune your MTU always…
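
      To illustrate the size mismatch described above, here is a small sketch that counts how many IPv4 fragments a packet needs on a link with a smaller MTU. It assumes a 20-byte IPv4 header with no options, and uses a 1492-byte link MTU (the usual PPPoE value) purely as an example; nothing in the post says which MTU this connection uses. The function name is hypothetical.

```python
import math


def ipv4_fragments(payload_bytes: int, mtu: int, header: int = 20) -> int:
    """Number of IPv4 fragments needed to carry payload_bytes over a link with this MTU."""
    # Every fragment except the last must carry a multiple of 8 bytes of data.
    per_fragment = (mtu - header) // 8 * 8
    return math.ceil(payload_bytes / per_fragment)


# A full 1500-byte packet carries 1480 bytes after its 20-byte IPv4 header.
print(ipv4_fragments(1480, 1500))  # fits in a single packet on a 1500-byte link
print(ipv4_fragments(1480, 1492))  # needs a second, nearly empty fragment
```

      In this example the second fragment carries only 8 bytes of data plus its own header, which is the kind of overhead MTU tuning is meant to avoid.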

    • Speedman

      8M/8M VDSL2? How agricultural. At least it isn’t Morse code speeds. Why don’t you try 4G at work? Or cable? 8Mbps download is worse than a decent ADSL2+ connection, but I guess you are far from the telco’s Central Office.

      I get 90-120Mbps download here – and cheap if you bargain Telstra down and if you live/work in a decent location.

      If I were you, I’d do some research into something decent. If you are in a crap area, try 4G at least and put up a decent dual polarity antenna. Slightly costly, but at least it is fast.

      Telstra won’t improve bad copper lines any more – they are simply not interested.

      • http://www.eevblog.com EEVblog

        1) I don’t have cable in my building
        2) 4G is the wrong solution for my needs. Potentially unreliable, high latency (I do a live radio show, these things matter), expensive and pointless in Australia for anything approaching my needs.
        3) 8M/8M is TEN TIMES faster than my ADSL2+ connection. I’m a content producer, I care about uploads, not downloads.
        “Something decent” is what I have now. It is a rock solid low latency fast upload connection that is expandable, and has a proper SLA in place.

    • Stephen

      Use testmy.net; speedtest doesn’t give real numbers.

    • Paul

      I get two completely different results: Virgin give me one set & TestMy give another. It’s too much for me to take in, all I know is it’s fibre optics and it’s fast, very fast. Can we move on? I don’t understand this stuff.

      • JoeO

        Paul: No one is stopping YOU from moving on. Bye Bye.

    • Alex

      There’s something called “Additive increase/multiplicative decrease” used in TCP congestion avoidance. It creates a sawtooth pattern.

      http://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

    • CF

      Alex is the man,
      That is the typical look of congestion avoidance: TCP backs off to at least half speed by decreasing the number of packets that can be unacknowledged, then the congestion window is slowly increased as packets are acknowledged. At some speed the router at the ISP dropped packets, causing the congestion avoidance to kick in. That happens when the expected RTT is exceeded, i.e. some packet is unacknowledged for longer than expected. Hence most likely there was “tail drop” in the router buffer at the ISP.
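
      The additive increase/multiplicative decrease behaviour described in the two comments above can be sketched with a toy model. This is an idealised illustration, not how any real TCP stack is implemented: the window grows by a fixed step each round trip and is halved whenever it reaches an assumed capacity (standing in for the point where the router buffer tail-drops). The capacity value and function name are hypothetical.

```python
def aimd_trace(capacity, steps, increase=1.0, decrease=0.5, start=1.0):
    """Return the congestion window for each round trip under idealised AIMD."""
    cwnd = start
    trace = []
    for _ in range(steps):
        trace.append(cwnd)
        if cwnd >= capacity:
            # Buffer full: a packet is dropped, so back off multiplicatively.
            cwnd = cwnd * decrease
        else:
            # No loss this round trip: probe for more bandwidth additively.
            cwnd += increase
    return trace


trace = aimd_trace(capacity=16, steps=40)
for cwnd in trace:
    print("#" * int(cwnd))   # crude text plot of the sawtooth
```

      Once past the initial ramp, this toy window cycles between half of the capacity and the capacity itself, so its long-run average sits at three quarters of the peak.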

    • Bruno

      It is certainly TCP congestion avoidance. It looks exactly like TCP Tahoe: http://degas.cis.udel.edu/QualNet/Exercises/Chapter3/cwnd_tahoe.JPG
      It might be that router’s buffer got full and started dropping packets and congestion avoidance kicked in…

    • http://www.flickr.com/photos/razor512/ Razor512

      When a router runs into a buffer issue, it means that the ISP has oversold the service and is unable to provide the advertised throughput to all of the customers, and thus buffers fill and then packets are dropped which causes that throughput pattern.