Author Topic: how does google not run out of space  (Read 11422 times)


Offline Red Squirrel

  • Super Contributor
  • ***
  • Posts: 2750
  • Country: ca
Re: how does google not run out of space
« Reply #25 on: April 27, 2017, 02:51:03 am »
Due to the high volume of writes flash is probably also not viable as flash has write limitations so I imagine they'll always stay on spinning disks
That would assume they actually delete things.  Flash write limitation only comes into play when you write, delete, write, delete, write, delete, over and over again, hundreds of TB per drive.  If all you're doing is archiving, that's one write cycle, and then it just sits there forever being occasionally read from.

True, I guess when you make a change it won't overwrite; it will just create a new copy. I now wonder whether they don't delete anything either, like YouTube videos. That would mean they REALLY need to keep expanding storage at a crazy rate.
 

Offline fubar.gr

  • Supporter
  • ****
  • Posts: 366
  • Country: gr
    • Fubar.gr
Re: how does google not run out of space
« Reply #26 on: April 27, 2017, 10:26:32 pm »
There are several tricks used in the industry to deal with minimizing hard drive space.

One of them is thin provisioning. For example, there are 1000 Gmail accounts, and each one has a 5GB limit.

So in theory Google would need a total of 5000GB of hard disk space. But in practice, many of these accounts are dormant and most users will never come anywhere near the 5GB limit. So in reality you only need to have a fraction of the total advertised space available, with some headroom for safety, and then add more drives as needed.
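
As a rough sketch of that arithmetic: the 1000 accounts and the 5GB quota are from the example above, but the usage mix and the 25% headroom are made-up assumptions, not Google figures, and `physical_capacity_needed` is just an illustrative helper.

```python
# Thin provisioning sketch. The usage mix and headroom are illustrative assumptions.

def physical_capacity_needed(used_gb_per_account, headroom=0.25):
    """Return GB of real disk to provision: actual usage plus a safety margin."""
    actual_usage = sum(used_gb_per_account)
    return actual_usage * (1 + headroom)

ACCOUNTS = 1000
QUOTA_GB = 5

# Assumed usage mix: 600 dormant accounts, 350 light users, 50 near the quota.
usage = [0.0] * 600 + [0.5] * 350 + [4.5] * 50

advertised = ACCOUNTS * QUOTA_GB           # 5000 GB promised on paper
needed = physical_capacity_needed(usage)   # what actually has to exist on disk

print(f"advertised:  {advertised} GB")
print(f"provisioned: {needed:.0f} GB ({needed / advertised:.0%} of advertised)")
```

With these assumed numbers, 500 GB of real disk backs 5000 GB of advertised space. The catch is the "monitored" part: usage has to be tracked so drives get added before the headroom runs out.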

Another trick is data deduplication. For example, I send a file attachment from my Gmail account to another. Now we both have this file in our Gmail accounts. But it is the same freaking file! It doesn't make sense to keep two separate copies of the same file. Gmail only stores it once and all links lead to this one file.
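
A minimal sketch of the idea, assuming whole-file deduplication keyed by a SHA-256 hash of the contents. `DedupStore` and its methods are invented for illustration and say nothing about how Gmail actually implements this.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}   # SHA-256 hex digest -> file bytes
        self.links = {}   # (account, filename) -> digest

    def put(self, account, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)        # store bytes only if unseen
        self.links[(account, filename)] = digest   # every owner gets a link
        return digest

    def get(self, account, filename):
        return self.blobs[self.links[(account, filename)]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
attachment = b"quarterly report " * 1000
store.put("alice", "report.pdf", attachment)   # sender's copy
store.put("bob", "report.pdf", attachment)     # recipient's copy, same bytes

print(len(store.links), "links,", len(store.blobs), "stored blob")
```

Both accounts can read the attachment back, but the bytes exist once. A real system would also need reference counting so the blob is only freed when the last link is deleted.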

But even with these tricks google still needs a shitload of hard drive space, so they keep adding more datacenters to their network.

Offline bill.coghill

  • Contributor
  • Posts: 14
Re: how does google not run out of space
« Reply #27 on: April 28, 2017, 08:19:22 am »
I can tell you they use a lot of mechanical drives and have people who destroy them using a fantastic big spiky tool once they reach a certain age. They don't like SSDs as they can be hard to fully destroy to stop data leakage.  Longer term storage is on tape with robots for tape handling.


 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: how does google not run out of space
« Reply #28 on: April 28, 2017, 08:42:35 am »
Due to the high volume of writes flash is probably also not viable as flash has write limitations so I imagine they'll always stay on spinning disks, maybe also ram.

I don't think that's the case. I recall a request to the Flash industry (from Facebook, IIRC), to develop a cost-reduced Flash technology, which more-or-less simplified the endurance requirement to 'write once, read never'.

Think about it... once something is posted to the web, it's rarely changed. A photo of your dog gets posted, viewed by the two or three people in the world who care about it, and then it's forgotten completely (but remains on the server). Physically, it sits in a cache for the first few days after being uploaded, then it needs to be moved somewhere for long-term storage. It may never be accessed again, and it certainly won't be getting updated on a regular basis.

Ultra low cost, very low write endurance, but nevertheless long-term reliable storage is what's needed.
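
To put rough numbers on the write-once argument, here is a small sketch. The drive size and endurance rating are hypothetical, not taken from any datasheet, and write amplification is ignored.

```python
def endurance_used(capacity_tb, rated_tbw, full_drive_writes):
    """Fraction of the rated write endurance consumed by N full-drive writes."""
    return capacity_tb * full_drive_writes / rated_tbw

CAPACITY_TB = 4     # hypothetical drive size
RATED_TBW = 1200    # hypothetical rated "terabytes written" endurance

archive = endurance_used(CAPACITY_TB, RATED_TBW, full_drive_writes=1)
churn = endurance_used(CAPACITY_TB, RATED_TBW, full_drive_writes=300)

print(f"fill once and keep: {archive:.2%} of rated endurance")
print(f"rewrite 300 times:  {churn:.0%} of rated endurance")
```

Under these assumptions, filling the drive once uses well under 1% of its rated endurance, which is why an archive workload could tolerate much cheaper, lower-endurance flash.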

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: how does google not run out of space
« Reply #29 on: April 28, 2017, 09:53:52 am »
There are several tricks used in the industry to deal with minimizing hard drive space.

De-dupe and thin are two of the most common. Thin is very cost effective if monitored and managed properly. As for de-dupe, I don't know how it would scale across Google's geographically diverse datacentres and massive storage needs. I can imagine a measurable performance hit unless the algorithms took into account the location of users, which seems rather complex?
 

Offline Hypernova

  • Supporter
  • ****
  • Posts: 655
  • Country: tw
Re: how does google not run out of space
« Reply #30 on: April 28, 2017, 01:23:20 pm »
Due to the high volume of writes flash is probably also not viable as flash has write limitations so I imagine they'll always stay on spinning disks
That would assume they actually delete things.  Flash write limitation only comes into play when you write, delete, write, delete, write, delete, over and over again, hundreds of TB per drive.  If all you're doing is archiving, that's one write cycle, and then it just sits there forever being occasionally read from.

The charges in high density triple-level (TLC) flash cells dissipate in a time frame measured in the low thousands of hours. They need to be cycled around and that does impose a limit on their lifespan. As cold storage they are worse than tapes.
 

Offline suicidaleggroll

  • Super Contributor
  • ***
  • Posts: 1453
  • Country: us
Re: how does google not run out of space
« Reply #31 on: April 28, 2017, 02:04:46 pm »
Due to the high volume of writes flash is probably also not viable as flash has write limitations so I imagine they'll always stay on spinning disks
That would assume they actually delete things.  Flash write limitation only comes into play when you write, delete, write, delete, write, delete, over and over again, hundreds of TB per drive.  If all you're doing is archiving, that's one write cycle, and then it just sits there forever being occasionally read from.

The charges in high density triple-level (TLC) flash cells dissipate in a time frame measured in the low thousands of hours. They need to be cycled around and that does impose a limit on their lifespan. As cold storage they are worse than tapes.

There is a lot of FUD in this post...

First, the "low thousand hour" charge dissipation time only applies to drives that are:
1) Unpowered (it doesn't apply when the drive has power)
2) Already at or beyond their write cycle limit
3) Stored at extreme temperatures

Yes, if you take an SSD, abuse the hell out of it for years and manage to hit the write limit, then remove it from the computer and throw it in a box, and then put that box on a shelf in a 50C warehouse, the data will likely decay and become unusable after a few weeks/months.  That's a pretty extreme use case though.

I don't know what you mean by "cycling around" the drives reducing their lifespan...all you need to do is power on the drive to get around this decay clock, you don't need to write anything, you don't need to move anything.  And who said anything about using flash media for "cold storage" anyway?  I don't think anybody has suggested Google would archive PBs onto SSDs and then throw them in a box on a shelf in a warehouse for years, it wouldn't make any sense.
« Last Edit: April 28, 2017, 02:07:06 pm by suicidaleggroll »
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4078
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: how does google not run out of space
« Reply #32 on: April 28, 2017, 02:20:59 pm »
how can they keep up with so much data being sent to their servers
High level file systems. Like gluster.

Simple way to get a petabyte at your small home office. You know, to store... stuff....


You can also look up LTT's Petabyte project.
 

Offline kc8apf

  • Regular Contributor
  • *
  • Posts: 103
  • Country: us
Re: how does google not run out of space
« Reply #33 on: April 29, 2017, 10:30:24 pm »
Hi there. I managed Google's hard drive team for a year and now manage their server management controller firmware teams.  Google uses a lot of enterprise-class hard drives, a lot of flash, and a lot of tape. Hard drives are for bulk storage, flash is for when access speed is critical, and tape is for backups. Cluster-level filesystems (descendants of GFS, similar to Gluster) hide the details of individual storage devices from the apps and provide fault tolerance.
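
As a very rough illustration of what "hide the devices and provide fault tolerance" can mean, here is a toy replicated store. It is a sketch under simple assumptions (three full copies, hash-based placement) and not how GFS, its descendants, or Gluster are actually implemented. `ReplicatedStore` and the device names are made up.

```python
import hashlib

class ReplicatedStore:
    """Toy cluster store: each object is copied to several devices."""

    def __init__(self, device_names, replicas=3):
        self.devices = {name: {} for name in device_names}  # name -> {key: bytes}
        self.failed = set()
        self.replicas = replicas   # assumed to be <= number of devices

    def _placement(self, key):
        # Hash the key to pick a starting device, then take the next ones in order.
        names = sorted(self.devices)
        start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def write(self, key, data):
        for name in self._placement(key):
            self.devices[name][key] = data

    def read(self, key):
        # The caller never names a device; any healthy replica will do.
        for name in self._placement(key):
            if name not in self.failed and key in self.devices[name]:
                return self.devices[name][key]
        raise OSError(f"all replicas of {key!r} are unavailable")

    def fail(self, name):
        self.failed.add(name)

cluster = ReplicatedStore(["hdd-a", "hdd-b", "hdd-c", "hdd-d", "hdd-e"])
cluster.write("photos/dog.jpg", b"dog photo")
cluster.fail(cluster._placement("photos/dog.jpg")[0])   # one drive dies
print(cluster.read("photos/dog.jpg"))                   # still readable
```

The application only ever sees keys and bytes. Which drives hold the copies, and which of them have died, stays inside the storage layer.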

Google has been getting more involved in the Open Compute Project. If you're interested in what a Google server looks like, the schematics and board layouts for a recent design, named Zaius, are available on GitHub at https://github.com/opencomputeproject/zaius-barreleye-g2.

 

Online John_ITIC

  • Frequent Contributor
  • **
  • Posts: 514
  • Country: us
  • ITIC Protocol Analyzers
    • International Test Instruments Corporation
Re: how does google not run out of space
« Reply #34 on: May 01, 2017, 01:04:57 am »
Which begs the question; what form of data storage has the highest density ?
Used to be tape.  :-//

Magnetic media (hard disks). And it will keep being many times cheaper / Mbyte than solid state memory for 15 years.

Source: Western Digital (manufacturer of both solid state and HDDs).
 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: how does google not run out of space
« Reply #35 on: May 01, 2017, 06:17:34 am »
Magnetic media (hard disks). And it will keep being many times cheaper / Mbyte than solid state memory for 15 years.

Source: Western Digital (manufacturer of both solid state and HDDs).
Who wants to make sure any trend they attempt to set allows for comfortably amortizing their HDD manufacturing plants, while they hope competition doesn't require them to invest in more SSD manufacturing too quickly :P
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4078
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: how does google not run out of space
« Reply #36 on: May 01, 2017, 07:10:32 am »
Which begs the question; what form of data storage has the highest density ?
Used to be tape.  :-//
I think that flash will win in terms of absolute density.
But there are three factors involved in choosing storage:
- Density.
- Price.
- Latency.

Can't have all 3.
Flash has density and latency. Tape has density and price. HDDs have price and latency.
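
The same "pick two" claim written as a lookup table, purely as a restatement of the three lines above. `STRENGTHS` and `candidates` are illustrative names, and the table encodes this post's opinion, not measured data.

```python
# Each medium gets two of the three properties, per the claim above.
STRENGTHS = {
    "flash": {"density", "latency"},
    "tape": {"density", "price"},
    "hdd": {"price", "latency"},
}

def candidates(required):
    """Return the media whose strengths cover every required property."""
    return sorted(m for m, s in STRENGTHS.items() if set(required) <= s)

print(candidates({"price", "latency"}))              # cheap and quick
print(candidates({"density", "price", "latency"}))   # all three at once
```

Asking for all three returns an empty list, which is the whole point.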
 

Offline cncjerry

  • Supporter
  • ****
  • Posts: 1283
Re: how does google not run out of space
« Reply #37 on: May 07, 2017, 06:19:51 pm »
I've been in the IT business since '76, retired two weeks ago.  I had clients that have 65PB spinning which is nothing compared to large content providers and insurance companies.  Most use tape.  It's all automatic and cached.  For example, if you stream a video from the beginning you don't notice the delay because the look-ahead grabs and spools the tape before you get to it.  If you try jumping ahead in a video stream that's when you notice the lag.  I have friends that sold storage arrays to Google and I could find out, but I doubt they have 100PB spinning. They have a lot of flash now for tier 0. Remember, they are more of an indexing and caching system though they have content now.
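
A toy sketch of that look-ahead behaviour, assuming fixed-size segments and a synchronous prefetch standing in for what would really be background I/O. `ReadAheadStream` and `fetch_segment` are hypothetical names, not any vendor's API.

```python
class ReadAheadStream:
    """Toy read-ahead cache in front of a slow backing store (e.g. tape)."""

    def __init__(self, fetch_segment, total_segments, look_ahead=3):
        self.fetch_segment = fetch_segment   # slow call: segment index -> bytes
        self.total = total_segments
        self.look_ahead = look_ahead
        self.cache = {}
        self.misses = 0                      # reads the viewer had to wait for

    def read(self, index):
        if index not in self.cache:
            self.misses += 1                 # viewer stalls while this loads
            self.cache[index] = self.fetch_segment(index)
        data = self.cache.pop(index)
        # Prefetch the next few segments so sequential playback does not stall.
        for nxt in range(index + 1, min(index + 1 + self.look_ahead, self.total)):
            if nxt not in self.cache:
                self.cache[nxt] = self.fetch_segment(nxt)
        return data

stream = ReadAheadStream(lambda i: f"segment-{i}".encode(), total_segments=100)
for i in range(10):
    stream.read(i)                 # play from the beginning
sequential_misses = stream.misses
stream.read(50)                    # jump ahead in the video
print(sequential_misses, stream.misses)
```

Playing from the start only stalls on the very first segment; the jump to segment 50 is a second stall because nothing near it was spooled yet.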

If you want to see storage, look at large banks and insurance companies.  The largest I've seen in my career was a large insurance company whose data center was built into bedrock that could withstand a 20Mt blast in New York City, as they were about 70 air miles away.  I heard they had 135PB spinning but were reducing and going to all flash arrays because with compression and de-dup, the economics were in favor of SS arrays.
 

Offline Red Squirrel

  • Super Contributor
  • ***
  • Posts: 2750
  • Country: ca
Re: how does google not run out of space
« Reply #38 on: May 07, 2017, 07:58:38 pm »
Downside in this day and age is that because we store everything digitally, it's going to completely vanish one day, there won't be a single record.

Maybe that's a good thing though, do we really want generations in centuries from now realizing that we just spent most of our time looking at cat videos and making memes?  :P
 

Offline Kilrah

  • Supporter
  • ****
  • Posts: 1852
  • Country: ch
Re: how does google not run out of space
« Reply #39 on: May 07, 2017, 09:01:26 pm »
Why?
I believe the opposite, we're finally able to store data with no alteration for any length of time.
 

