The odds of you deliberately posting malicious code here are vanishingly small.
No, I meant that in general, the likelihood of any non-new member posting malicious stuff is very small; that affects what one should consider a reasonable balance between security and usefulness.
I am a very heavy SVG format user, and would like it to be more widely supported, because it solves two annoying problems for me: huge, and non-zoomable, illustrations.
Anyone who thinks SVG files are a pain to work with should just try Inkscape, which is available for all operating systems.
However, permitting SVG upload *WILL* encourage spammers, and as the drive-by variety never seem to check whether their spam links actually work, filtering the SVGs will do little to discourage them.
Okay, so instead of a sanitizer, make it a filter that rejects any and all SVG files that contain CSS or Javascript. To conserve server resources, SVG files could be permitted only for users with enough posts (a limitation which IMHO makes sense for all attached files). Again, the alnorris implementation is quite robust and very suitable for SVG filtering, and I am offering to do the work myself if the feature is considered.
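To be concrete, the kind of reject-on-sight filter I have in mind is only a few dozen lines. Here is a rough sketch in Python (not the alnorris code, just the general gist); the element and attribute lists are illustrative rather than exhaustive, and a production filter should also use a hardened parser such as defusedxml:

[code]
# Sketch of a reject-don't-sanitize SVG check, in Python for illustration.
# The element/attribute lists are not exhaustive.
import xml.etree.ElementTree as ET

REJECT_ELEMENTS = {"script", "style", "foreignObject"}
REJECT_SCHEMES = ("javascript:", "data:")

def svg_is_acceptable(path):
    try:
        tree = ET.parse(path)  # not well-formed XML? reject it too
    except ET.ParseError:
        return False
    for elem in tree.iter():
        tag = elem.tag.rsplit("}", 1)[-1]               # strip the XML namespace
        if tag in REJECT_ELEMENTS:
            return False
        for name, value in elem.attrib.items():
            attr = name.rsplit("}", 1)[-1].lower()
            if attr.startswith("on") or attr == "style":   # event handlers, inline CSS
                return False
            if attr == "href" and value.strip().lower().startswith(REJECT_SCHEMES):
                return False
    return True
[/code]

The point is that nothing gets "cleaned up": anything even slightly suspicious causes the whole file to be rejected, and the burden is on the uploader to provide a plain SVG.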
With PNG and GIF files, the problem is that even with maximum compression and indexed color mode to reduce file size, images large enough to allow zooming easily get into the megabyte range. Most users do not do even that sort of optimization, and tools really cannot do it for you (heuristic/opportunistic methods are too slow and processor-intensive to run on a server). That means the image files are large and thus slow to load, which I find aggravating. If the image files are small, the details get lost.
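(For reference, the manual optimization I mean is roughly the following, sketched with the Pillow library; the filenames are placeholders:)

[code]
# Rough sketch of the manual optimization: quantize to a 256-colour
# indexed palette and let the PNG encoder do its best.
from PIL import Image

img = Image.open("schematic.png")
indexed = img.quantize(colors=256)                    # convert to indexed ("P") mode
indexed.save("schematic-indexed.png", optimize=True)  # smaller, but details suffer
[/code]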
Consider my home page: it is about 22 kB, but sharp at any magnification/zoom, because it consists of embedded SVG images. That is, if you just copy the HTML file, the local copy will look exactly the same and not depend on having a connection to the server: everything is contained in that one file. Yet it is not even "optimized"; it is quite human-readable.
For compression, I prefer xz over ZIP, because it yields smaller files with almost as fast decompression. (There is even a public-domain embeddable xz decompressor for limited use cases.) However, since I understand that many OSes have no native tools for handling xz, I accept ZIP files.
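If you want to see the difference yourself, the Python standard library can do a quick side-by-side of xz (LZMA) against the Deflate method that ZIP uses; the filename is a placeholder:

[code]
# Quick comparison of xz (LZMA) against Deflate (what ZIP uses),
# using only the Python standard library.
import lzma, zlib

data = open("illustration.svg", "rb").read()
xz_size = len(lzma.compress(data, preset=9))
deflate_size = len(zlib.compress(data, level=9))
print(f"original {len(data)} B, xz {xz_size} B, deflate {deflate_size} B")
[/code]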
Trusting forums not to serve you malicious code is not wise. Even with the best of intentions, slip-ups can happen. For example, when Dave first added MathJax equation rendering support, I pointed out an arbitrary Javascript execution vulnerability, which he quickly secured. However, for a couple of days it was wide open, and anyone could have registered a new user and added any arbitrary script to any thread, served from the eevblog.com domain.
Fully agreed. I never reuse passwords, because forum database security is never perfect. My browser clears all cookies, cache, and site data whenever I close it, and I do that at least daily. I use both NoScript and uBlock Origin, along with other custom (strict) browser settings, but even then, I don't trust the browser to keep me safe. It is all a matter of balance between security and usefulness.
Story time.
A bit over a decade ago, I created a small engine in PHP for displaying web pages from static files (HTML or PHP) managed via groups by several overlapping administrators/editors, with customizable navigation menus, with overall navigation roughly reflecting the directory structure. It worked very, very well for several years, and was eventually used by the entire department.
One of its features was that, as an engine, it would be executed for an entire subtree of URLs (for all matching prefixes). To ensure it did not leak files, especially files or content not meant to be public, it was excessively paranoid about the URLs it handled. For usability, it URL-decoded and converted accented UTF-8 letters to ASCII (so that /yhtiö/äänestys becomes /yhtio/aanestys, and so on), then filtered the result, accepting only known safe patterns: in particular, not allowing more than one consecutive . or /, and removing all but the known safe characters (letters converted to lower case, digits, dash, underscore, slash, etc.).
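From memory, that filtering step amounted to roughly the following, here sketched in Python rather than the original PHP. The accent-folding table is deliberately abbreviated, and the real engine may have rejected repeated dots and slashes outright rather than collapsing them:

[code]
# From-memory sketch of the URL filtering, in Python instead of PHP.
import re
from urllib.parse import unquote

ACCENT_MAP = str.maketrans("äöåéü", "aoaeu")    # the real table was longer

def clean_path(raw_url):
    path = unquote(raw_url).lower().translate(ACCENT_MAP)
    path = re.sub(r"[^a-z0-9_/.-]+", "", path)  # keep only known-safe characters
    path = re.sub(r"\.\.+", ".", path)          # collapse repeated dots...
    path = re.sub(r"//+", "/", path)            # ...and repeated slashes
    return path

# clean_path("/yhti%C3%B6/%C3%A4%C3%A4nestys") -> "/yhtio/aanestys"
[/code]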
The directory tree under which it looked for content was absolutely anchored, and although it accepted symbolic links, it checked the owner and group of the directory the content was in, to ensure only content from known-published directories was provided.
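The ownership check itself is nothing exotic; in Python it would look roughly like this. The anchored root and the published-content group name are placeholders, and the real code also checked the directory owner:

[code]
# Roughly what the ownership check amounts to; names are placeholders.
import os, grp

DOCROOT = "/srv/www/content"
PUBLISHED_GID = grp.getgrnam("webpublish").gr_gid    # placeholder group name

def is_published_dir(path):
    real = os.path.realpath(path)                    # resolve symlinks first
    if not real.startswith(DOCROOT + os.sep):        # stay under the anchored root
        return False
    return os.stat(real).st_gid == PUBLISHED_GID     # only dedicated published dirs
[/code]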
I would have preferred to write it in C instead of PHP, because then I could have added checks against runtime switch attacks (changing links in the hope of hitting a race window), and the backwards directory tree walk (from target towards document root) could have verified that there were no ongoing shenanigans. (The backwards walk also lets users use shared navigation menus for a shared subtree.) However, for maintenance reasons, a scripting language was considered necessary, and all users with access to the files were "trusted" (under contractual obligations to do no such shenanigans), so PHP it was. I am not sure, but I might have written a Python version also. (I am writing this from memory, too lazy to dig up the actual code, so I might recall some details wrong too.)
Now, that entire scheme comes together when you understand that the security approach relies on group ownership. That is, users modified files using their own local user accounts (each with a personal default group), manipulated access controls via file and directory group, and had a default umask of 002; i.e. by default, users granted the group rw access to their files and directories. To avoid putting their personal files at risk, they needed personal default groups as well, so that their own home directories were owned by their own group. To grant access to additional groups, ACLs could have been used, but IIRC they were not needed in practice.
The security approach of that engine is essentially twofold: the URL it gets is not trusted in any way and is filtered for both usefulness and security, and the content is only considered if it lives in a directory specially dedicated to public information, based on its group ownership. Since system and user directories are never owned by the dedicated published-information groups, link/symlink/rename races are irrelevant, and there is no risk of a content administrator without system administration access publishing anything privileged or sensitive, short of explicitly copy-pasting such information.
In fact, I had even designed a WYSIWYG editor, loosely based on TinyMCE (I wanted a floating toolbar, not content boxes), but nobody was interested in that (unless I were to do it for free); most of that work would have been filtering HTML code (since most browsers insert really silly things into DHTML, like FONT elements) and mapping content between different HTML snippet templates (for layout "chunking"). Technically, the approach would have allowed two different users to modify different parts of the same web page at the same time. And the entire system would have worked without any database at all.
I didn't just come up with it one morning. Rather, it was the culmination of about a decade of web server maintenance and of trying to get webmasters to Do The Sensible Thing, often while not cooperating with each other... Basically, it was the approach with the fewest risks I could find that did not unnecessarily limit the users.
Since then, I have looked at existing web forum software, and to be honest, I don't trust it at all. I have considered writing my own from scratch, but doing it in a secure manner would require several users and groups (to establish security boundaries), and current web hotels cannot provide such an environment, so it would be limited to those running their own (virtual) servers rather than a shared web hosting environment. The way most packages allow the forum software to create new (script-)executable files and directories indistinguishable from the installed forum software files is a horrible security risk: it escalates every file-creation bug into privilege escalation. Me, I'd have a hierarchy of accounts, with the login and user account management isolated from normal operation, and all content files and directories owned by a dedicated user and group, with the service process belonging to that group but running as yet another separate user. This means the forum service processes cannot create executable files, which avoids script-drop problems. The only downside is that the forum service cannot upgrade itself; it can only trigger the upgrade to be done by the real owner, either a human or an automated script.
A secondary stopper is that I dislike using a separate database to hold all content. For a forum, it is an unreasonable dislike; it is just that that dependence too often creates a fragile website, down due to limited database resources or other database connectivity issues. I can do it with flat files quite efficiently, but that is considered caveman style... So, I would need to do a bit of research into the different database engines (PostgreSQL, MariaDB, MongoDB) and how to map the security features I'd need onto database access controls. (For SQL databases, a few database users would suffice, with proper read/write access rights to different tables. There are a few ways to pass the needed credentials safely, but it turns out that checking those credentials (the password) is a significant part of the overall load in most cases, so finding a way to safely cache database connections would make it much more resource-friendly.)
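To illustrate that parenthetical: the idea is simply separate read-only and read-write database roles, each behind a small cached connection pool so the password check does not happen on every page load. A rough Python/psycopg2 sketch, with role names, passwords, table names and pool sizes all as placeholders:

[code]
# Sketch of the idea: two database roles with different table privileges,
# each behind a small cached connection pool so the credential check is
# not repeated for every page load. All names here are placeholders.
from psycopg2.pool import SimpleConnectionPool

read_pool = SimpleConnectionPool(1, 4, dbname="forum",
                                 user="forum_reader", password="...")
write_pool = SimpleConnectionPool(1, 2, dbname="forum",
                                  user="forum_writer", password="...")

def fetch_thread(thread_id):
    conn = read_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT subject, body FROM posts WHERE thread_id = %s",
                        (thread_id,))
            return cur.fetchall()
    finally:
        read_pool.putconn(conn)

# Posting and editing functions would use write_pool in the same way.
[/code]

On the database side, the read-only role would be granted SELECT on the content tables only, while the writer role would also get INSERT and UPDATE; neither would own the schema.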