Author Topic: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.  (Read 19756 times)

0 Members and 1 Guest are viewing this topic.

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26757
  • Country: nl
    • NCT Developments
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #75 on: April 27, 2016, 04:24:56 pm »
One of the reasons I have my backups on active hard drives (spread over several devices) nowadays is because I can check them very easely for completeness.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline StuUK

  • Frequent Contributor
  • **
  • Posts: 390
  • Country: gb
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #76 on: April 27, 2016, 04:53:02 pm »
99.99% is still over to 3.5 days of outage per year!

Seriously, have you any idea about just how good that is in the real IT world..... most service providers never achieve it even for critical systems....
That would not class as mission-critical. Heck, if your Internet was down for more than 12 hours you'd be on the phone to your service provider. With RAID redundancy, hot-swappable hardware, clusters across different countries and UPS 3 days is far too long, even over a major holiday like Christmas.

Think E2E not just hardware.....
 

Offline Brumby

  • Supporter
  • ****
  • Posts: 12288
  • Country: au
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #77 on: April 28, 2016, 02:25:23 am »
One of the reasons I have my backups on active hard drives (spread over several devices) nowadays is because I can check them very easely for completeness.

The fact that you have backups on active drives does address the one somewhat important and seldom thought about issue: "Can you be sure your backups are readable?".  That's one issue you shouldn't have.  But, all too often, people will assume that they will be able to just grab their backups and everything will be there and readable.  This is particularly a risk if you use flash drives.  You're not going to be happy if your "32GB" flash drive is actually a hacked 4GB one (that wasn't checked before use).  You won't get any write errors during the backup, but the data available will be a lottery.


But here's my question: "How do you know they are complete?"  I'm not picking on you - this question applies to everyone who does backups.

Data like names and addresses, inventory, sales and accounting is the obvious - but the list of candidates is much more than this and includes.
  • Program installation files
  • Program updates
  • Program patches
  • Product keys
  • Parameter and configuration files
  • Operating system installation files (Anybody running XP?)
  • Operating system updates
  • Operating system patches
  • Network configuration (routers and such)

I'm sure people could add a few other things if they sat down and thought about it.  You aren't really going to know unless you do the recovery exercise.
 

Offline Brumby

  • Supporter
  • ****
  • Posts: 12288
  • Country: au
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #78 on: April 28, 2016, 02:30:25 am »
I'll also go just one more step into the DRP dilemma....

Do you have any hardware that is necessary to run a particular application?  For example, in the case of a dongle that gets lost in a fire, what will you do?


Apologies ... I'm getting a bit off topic.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #79 on: April 28, 2016, 02:42:07 am »
No, that is a good backup. The whole kitchen sink is good, though hopefully there is more than a single copy, so you can at least have the last week, month just in case.

I have the last 5 days, plus twice a month.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #80 on: April 28, 2016, 02:43:58 am »
I wasn't going to say anything, but nctnico raised the topic.  I know this is not going to be enthusiastically received, but it applies to anyone who does backups .....
Do you know your backups will have everything you need to recover?  (Not just the things you thought of.)
The only way to really know is to take your backups and restore onto a blank machine.  If you haven't done that exercise when it isn't required, how do you know it will work when it IS?

Because it's the standard WHM backup that backs up all files and databases. How many millions of systems must use the WHM backup?
If it doesn't work then I'm sure that would be pretty big news.
It is also the system that the host uses to recover is something goes wrong.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #81 on: April 28, 2016, 02:57:51 am »
But here's my question: "How do you know they are complete?"  I'm not picking on you - this question applies to everyone who does backups.

Looks, here is the level of redundancy and backup on the EEVblog dedicated server and the levels of how shit would hit the fan:

1) Server uses RAID-1, so an exact disk copy is always on hand should one drive fail. Yes, I know RAID isn't backup, blah blah, but it keeps the system working. This is monitored by the web host company and drives replaces within hours of a fault. I believe they also have potential fault detection of some description to catch thing before they get bad.
So for this to fail, either both drives have physically fail, or something deliberate has to happen to nuke the server software and both copies.
Not a single issue with this from day 1

2) WHM backups of the entire server files and all the databases are done daily, Last 5 days worth, plus twice a month for two months. These are stored locally on RAID-1 system, so the host can use these to recover the server and/or database should something happen to the server, the database craps itself, files got deleted etc. Once again, never had to resort to this, because the first backup system takes care of failed drives.
Even if the latest backup got trashed and started to propagate through the daily backup system, it's almost inconceivable that not a single forum user wouldn't notice something is up, threads are missing or corrupted for two months that it would take to cycle through the backups.

3) Those same WHM backup files also get mirrored onto Amazon S3 (currently one big 25GB uncompressed TAR file (maybe this isn't the best option, maybe i should just copy the zillion files individually?), so even if the entire RAID-1 system goes down, the server building burns down etc, those backup are still available off-site.

If we ever had to resort to reinstalling the Amazon backups then that would likely be a lot of work, but no data should be lost apart from recent posts up until the last backup.
« Last Edit: April 28, 2016, 03:06:21 am by EEVblog »
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #82 on: April 28, 2016, 03:38:07 am »
BTW, you might have seen the forum for down for a bit just now, it was me optimising database tables.
 

Offline Brumby

  • Supporter
  • ****
  • Posts: 12288
  • Country: au
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #83 on: April 28, 2016, 05:17:24 am »
Actually, I wasn't thinking about the EEVBlog with my comments - just a general comment about backups in the wider community.  This tangent was getting away from the topic at hand, which is why I offered the apology.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #84 on: April 28, 2016, 05:21:39 am »
I think gnif talked about maybe having some sort of mirrored redundant server system at one stage, but it was messy.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #85 on: April 28, 2016, 05:25:07 am »
If I read Brumby's comment correctly it is the difference between "hoping" you can restore from backups and "knowing" it works because you have actually tried it.

And trying it once doesn't necessarily mean it'll work in a years time...

There is also a difference between having a backup that simply won't lose your data but may require work to recover. An having a backup that is trivial to recover from in a moments notice.
I liken my current backup system to the former if SHTF. Although it is the later if it's just a drive failure, or a system recovery, my host will take care of that for me.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #86 on: April 28, 2016, 05:36:21 am »
Does it get you a running hot backup that you can switch to in the event of a prolonged outage at your current hosting service? All without touching a single backup. Or transferring GB's of data across a network.

That was the idea, yeah.
I don't recall details, but it involved some complex thing that didn't involve changing the DNS (which can take days to propagate and negates the point of the whole thing)

Quote
I could survive without the forum. It's not like it is a bank ATM network. I keep a months cash against that eventuality.
Since you will be footing the bill it is strictly your decision.

That's the whole point, the world isn't going to end if the EEVblog forum goes down for a few days. It's happened before when changing servers etc and no one really seemed to mind.
 

Offline Brumby

  • Supporter
  • ****
  • Posts: 12288
  • Country: au
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #87 on: April 28, 2016, 05:48:36 am »
Again, my comments here are general ones..............

If I read Brumby's comment correctly it is the difference between "hoping" you can restore from backups and "knowing" it works because you have actually tried it.

And trying it once doesn't necessarily mean it'll work in a years time...
That's very true, which is why some organisations go through the exercise periodically ... but going through it at least once will show you if you've missed anything in your original thinking.

Quote
There is also a difference between having a backup that simply won't lose your data but may require work to recover. An having a backup that is trivial to recover from in a moments notice.
That is also very true - and the decision on which way you go will depend on the risk.  The dire situation I mentioned earlier demands a ready-to-rock solution - but the EEVBlog forum won't fold it it took several days ... although there would likely be some who could suffer withdrawal in the meant time.


The whole subject is one where everybody will have different needs with different priorities and different risks - resulting in different solutions.  My aim was to generally show how broad the subject is, not to point fingers.  For the record, from what Dave has said, the forum backup situation seems well in hand.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26757
  • Country: nl
    • NCT Developments
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #88 on: April 28, 2016, 01:51:16 pm »
3) Those same WHM backup files also get mirrored onto Amazon S3 (currently one big 25GB uncompressed TAR file (maybe this isn't the best option, maybe i should just copy the zillion files individually?),
Rsync works very well for backing up individual files on seperate 2 systems because only new and changed files get copied (and if you want deleted files on the original get deleted in the copy as well). Many NAS systems support Rsync for backup purposes.
A log from rsync also serves as a check that the backup is identical to the original.
« Last Edit: April 28, 2016, 01:54:53 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 16272
  • Country: za
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #89 on: April 28, 2016, 05:55:33 pm »
The TAR will in any case be compressed by Amazon in putting it into Glacier storage, so there is really no need to compress it, they will do it for you for free and with something they know how to use. The files in the archive will in any case be safer in the big blob than spread across a whole filesystem, as the TAR will have rudimentary error detection, and it is more likely the file is a non data file so can simply be copied from another TAR in case needed.

Good enough I think, and a lot better than many do. Dave obviously has lost data before. Having individual files is an easy way for insidious data loss to delete things, but the blob can easily be checked just by using an archive manager to open it, which will scan and make sure checksums are correct at the least.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26757
  • Country: nl
    • NCT Developments
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #90 on: April 28, 2016, 07:17:49 pm »
Well, try to retrieve a file quickly from a 25GB blob and you'll notice having a single blob isn't a good idea. In general I agree copying single files isn't a good idea with generic tools like the copy command or -shudder- Explorer which are prone to losing files and have no proper reporting. However with a tool like Rsync which checks the integrity of every file backing up to a collection of single files suddenly isn't problematic and has the advantage of being able to access a few files quickly.
« Last Edit: April 28, 2016, 07:19:56 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline SeanB

  • Super Contributor
  • ***
  • Posts: 16272
  • Country: za
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #91 on: April 28, 2016, 07:36:23 pm »
Well, try to retrieve a file quickly from a 25GB blob and you'll notice having a single blob isn't a good idea. In general I agree copying single files isn't a good idea with generic tools like the copy command or -shudder- Explorer which are prone to losing files and have no proper reporting. However with a tool like Rsync which checks the integrity of every file backing up to a collection of single files suddenly isn't problematic and has the advantage of being able to access a few files quickly.

Yes, but you can mount an archive as a drive. Not as fast as a real disk, but usable. It is just another block device.
 

Offline kcbrownTopic starter

  • Frequent Contributor
  • **
  • Posts: 880
  • Country: us
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #92 on: April 28, 2016, 08:43:34 pm »
Well, try to retrieve a file quickly from a 25GB blob and you'll notice having a single blob isn't a good idea.

That would be a reasonable thing to conclude at first glance, and would definitely apply to a compressed tar archive.

But we're talking about an uncompressed tar archive here.  The tar utility is smart enough to do direct seeks to skip files in the archive.  Which is to say, what it does is read a header, determine if the filename matches what you're after, and if it doesn't, then it skips directly over the data for that file (it can do this because the header includes the size of the file) and goes to the next header entry.  Lather, rinse, repeat until you hit the end of the archive.  I know it does this because I just did an strace on tar for precisely that kind of operation, and I can see it doing a whole bunch of lseek system calls as it processes the archive.

With that approach, with a 6 gigabyte archive, it takes seconds to extract a small file near the end.  There's no need to mount the archive as a drive if all you're after is extraction of a file.  That makes other operations simpler, so there's some advantage to it and it's a nice capability to have.

Dave's backup approach here is really an excellent one.
« Last Edit: April 28, 2016, 08:45:42 pm by kcbrown »
 
The following users thanked this post: SeanB

Offline Macbeth

  • Super Contributor
  • ***
  • Posts: 2571
  • Country: gb
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #93 on: April 28, 2016, 10:21:19 pm »
It's amazing how many people are banging on about all the minutiae of disaster recovery and what database is best. Yet the only server downtime I have witnessed has been due to other stuff, notably DDoS attacks and spamming by a certain somebody, cloudflare stuff, etc. Nothing to do with MySQL.

It's clear that EEVblog is more than covered in a disaster situation and also isn't running the worlds stock markets so even a few days outage is neither here nor there...

The only issue with the present MySQL database that is actually important is the stupid character set used - is it 7 bit ASCII or something?

The forum software (SMF) works perfectly with UTF-8 characters. Indeed I just installed the latest to my own server and greek, cyrillic, chinese all work right out of the box just by specifying "UTF-8 general" when creating the SMF database.

This is a no brainer to resolve.
 
The following users thanked this post: Zbig

Offline Macbeth

  • Super Contributor
  • ***
  • Posts: 2571
  • Country: gb
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #94 on: April 28, 2016, 10:29:28 pm »
Example:

I do like my Soviet ??-12? nixies, anyone got an original datasheet?

How do you pick an NTC thermistor? They are given a nominal resistance at 25°C, say 1000 k?, but also have ? value. How do I use it?

Can anyone find me an English spec for my Maynuo M9710? I can only find M9710????? on the site in the Chinese section.

Ok. I'm hungry now - After watching EEVBlog #873 I am going to skewer two nails into my Brätwurst sausage and connect to 240 VAC.

PS. I don't need any answers to my questions. This is just a test / example ;)

OK - I click Preview and everything appears ok...




Now let's Post and see what happens...

ETA: FAIL! Wah Wah Wahhh.... the original post has replaced all the interesting characters with ? marks. But at least my sausage survived the character mashing.
« Last Edit: April 28, 2016, 10:32:56 pm by Macbeth »
 

Offline dadler

  • Supporter
  • ****
  • Posts: 851
  • Country: us
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #95 on: April 29, 2016, 03:39:09 am »
I built, developed the software for, grew, maintained, and co-owned a massive online forum (that still exists today, but will remain nameless-I am no longer affiliated). Development started in 2000-2001.
 
At the time I left the project, it was at least three orders of magnitude larger than this forum in content and membership.

I used MyISAM for the primary thread and post tables. When InnoDB became available in MySQL, I test drove it and query times and database load tripled (this was 2003 era). I also tried PostgreSQL, which was even slower. Yes, I had caching layers, side-band search indexing, scheduled aggregation job queues, and all sorts of other necessary optimization layers to make it all work. But I used MyISAM for the primary content.

For something like a forum, which is almost entirely reads, MyISAM is really fast, especially on dedicated hardware with fast disks. Forums are not banking systems.

We maintained consistent backups and never lost any appreciable quantity of data over the entire time I was associated with the project.

RDBMSes are tools, and they, and their table types/configurations all have tradeoffs.

I see nothing wrong with running a forum like this on MyISAM.

If I were to reimplement something like this forum from scratch today, I still might use MyISAM for the primary content tables.

Now in the days of the "cloud", and things like Amazon AWS, Google App Engine, Heroko, etc there are many choices. There are overkill enterprise solutions from Oracle and others. There are NoSQL databases like Cassandra, MongoDB, Courchbase. In big projects today, I use DynamoDB, Amazon RDS, Amazon Redshift, and Redis.

But this is a PHP web forum, and with proper backups, MyISAM tables are a perfectly acceptable solution.

Many different tools, many different jobs, many different trade-offs. There is no one-size-fits-all solution.
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37664
  • Country: au
    • EEVblog
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #96 on: April 29, 2016, 05:20:50 am »
The forum software (SMF) works perfectly with UTF-8 characters. Indeed I just installed the latest to my own server and greek, cyrillic, chinese all work right out of the box just by specifying "UTF-8 general" when creating the SMF database.
This is a no brainer to resolve.

And hoes does one do that?
Is it something a dummy like me can do without screwing up?
 

Offline Brumby

  • Supporter
  • ****
  • Posts: 12288
  • Country: au
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #97 on: April 29, 2016, 07:00:39 am »
I don't know the ins and outs of the exercise, but there is one point that stands out and it begs a question....

The point is this
... just by specifying "UTF-8 general" when creating the SMF database.
which indicates to me when originally setting up a forum, it is simply a matter of specifying that option.

But the question is: Will there be any issues migrating the existing data of this well established forum?
 

Offline Zbig

  • Frequent Contributor
  • **
  • Posts: 927
  • Country: pl
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #98 on: April 29, 2016, 08:41:53 am »
The forum software (SMF) works perfectly with UTF-8 characters. Indeed I just installed the latest to my own server and greek, cyrillic, chinese all work right out of the box just by specifying "UTF-8 general" when creating the SMF database.
This is a no brainer to resolve.

And hoes does one do that?
Is it something a dummy like me can do without screwing up?

Haven't tried this myself and don't recommend you just go ahead and try it on a live "production" DB without testing, but based on this: http://stackoverflow.com/questions/6115612/how-to-convert-an-entire-mysql-database-characterset-and-collation-to-utf-8
it seems like a matter of single ALTER statement. I'm not sure whether you have to explicitly make SMF aware of this change afterwards. Perhaps it'd be best to ask on SMF forums/support? Seems like a fairly common problem to me.
 

Offline Macbeth

  • Super Contributor
  • ***
  • Posts: 2571
  • Country: gb
Re: Forum is on MySQL. Should be on PostgreSQL for maximum robustness.
« Reply #99 on: April 29, 2016, 05:36:46 pm »
Cheers Zbig. I decided to have a go. This worked for me typed into a bash session:

Code: [Select]
DB="eevblog"; USER="root"; PASS="hunter2"; ( echo 'ALTER DATABASE `'"$DB"'` CHARACTER SET utf8 COLLATE utf8_general_ci;'; mysql "$DB" -u$USER -p$PASS -e "SHOW TABLES" --batch --skip-column-names | xargs -I{} echo 'ALTER TABLE `'{}'` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;' ) | mysql "$DB" -u$USER -p$PASS

Some caveats. I could only install SMF 2.1 beta on my Ubuntu 16.04 LTS, due to the deprecation of the old versions of PHP and MySQL. SMF 2.1 actually uses utf8 by default so I had to use pretty much the same as above to "downgrade" to latin1_swedish_ci before proceeding on entering some messages and "upgrading" to utf8 again.  Of course I could only get the extended characters working properly after the upgrade.

Dave, take a backup and run the above on that just as a dry run to see if any conversion errors and also to time the whole thing? I imagine it could be pretty slow! Then set up another SMF in apache to test it with?  :-+
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf