Looking for a site that will reduce HTML or make compatible with older browsers
cdev:
You could likely use the mobile versions of sites now, if your TLS library were up to date.
Also, similarly to my earlier suggestion, you could download the source for modern browsers and possibly compile some of them yourself, if you had the dependencies. Depending on how old your OS is, this might be impossible for the most modern browsers, but there should be a midpoint where basic browsing still works. With some OSes you need to compile everything yourself. It's not that bad; it just takes a lot of time.
You could perhaps even cross-compile it on a newer machine, setting the build target to the older system.
Syntax Error:
@edy For rabbit hole read black hole. Historically...
The elephant dung in the room is that there are not, and have never been, any binding web standards (say wot?). There is no ISO, DIN or IEEE body making definitive rulings about the syntax of the WWW. Instead, we have RFCs from the IETF and vague 'Recommendations' from a fuzzy entity called the World Wide Web Consortium. The result: there is only ever convergence of practice, until someone (Apple, Google, Mozilla) decides they want to create their own features that only they support. There is no web law that says we cannot create the <bobbydazzler> tag or the @ripper CSS directive, and push it out over the eevblog browser. When the geeks on Reddit hear of this new 'thing', a spec gets raised and now it's a 'web standard'. For five minutes.
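In fact, nothing even requires browser-vendor muscle these days: the standard Custom Elements API lets any page mint a new tag. A minimal sketch, with the tag renamed bobby-dazzler because custom element names must contain a hyphen:

--- Code: ---
// Hypothetical example: nothing stops anyone defining a new tag today.
// Custom element names must contain a hyphen, hence "bobby-dazzler".
class BobbyDazzler extends HTMLElement {
  connectedCallback() {
    // Runs when the tag is attached to the document.
    this.textContent = "Rendered by <bobby-dazzler>!";
  }
}
customElements.define("bobby-dazzler", BobbyDazzler);
--- End code ---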
The early days of the WWW were littered with the floppy-disked corpses of browsers that succumbed in the Browser Wars of the 1990s. Then, it was not about collective convergence but corporate divergence.
Early website developers were faced with supporting at least Internet Explorer AND Netscape, two browsers which handled events and active content in very different ways. Early sites would often show an image that said, "this site is best viewed in Netscape." Meaning: we're not doing a friggin IE version as well. To span this chasm, early sites used Java Applets, which were a whole new parallel universe of plugin divergence. But at least Java Applets used the real Java language, with a graphics canvas that you could draw on. HTML5 rendered Java Applets and the whole zoo of other active object tags obsolete. Which was a good thing.
You probably could build a web time machine, but your 'Tardis' process will need to regenerate your Doctor Who backwards; from Jodie Whittaker to David Tennant.
Nominal Animal:
--- Quote from: janoc on December 22, 2020, 06:46:09 pm ---a) You are giving way too much credit to scammers, who are most often not native speakers of the language of the targeted population and thus use weird and wonderful linguistic constructions and typos most people wouldn't make. A lot of spam is even machine-translated.
--- End quote ---
Spam, scams, and phishing are three different problems. Email addresses are scraped for spam. Hacked data is used for scams and phishing.
Spam is basically advertising. Those who buy mass-marketing services don't want to pay for sending mail to addresses obtained via hacks, because it can backfire. (Advertisers really don't like being contacted by police investigating a hack, after receiving email at an address that has never been public.)
Advertisers prefer profile databases sold by social networks. Those cost money. By scraping existing web pages with relevant keywords, spammers can construct mailing lists that have a tenuous link to search terms, for much less money. OCRing the pages, or setting up machinery to extract JavaScript-obfuscated email addresses, is not cost-effective. If you have the technical skill to do that, you make more money by copying entire sites and putting advertising on them.
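To make "JavaScript-obfuscated" concrete, a minimal sketch of the technique, assuming a page with a contact element; the address and element id are made up:

--- Code: ---
// Minimal sketch of JavaScript email obfuscation. The address never
// appears assembled in the static HTML, so a scraper has to execute
// the script to recover it. Address and element id are made up.
const user = ["in", "fo"].join("");
const host = ["example", "com"].join(".");

window.addEventListener("DOMContentLoaded", () => {
  const link = document.createElement("a");
  link.href = "mailto:" + user + "@" + host;
  link.textContent = user + " [at] " + host; // readable to humans only
  document.getElementById("contact")?.appendChild(link);
});
--- End code ---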
If you ever buy anything from eBay or Banggood, your email address will be sold to spammers for advertising.
Scams are easiest to filter out. They do deliberately have errors, because they want responses from gullible targets. If you can tell the message is a scam, they don't like to waste time on you anyway. They use whatever email addresses they can find.
Phishing attempts tend to look as genuine as possible, and range from disguised files to links to fake web sites. These are the hardest to filter out. They also come in two completely different categories: targeted and scattershot. Scattershot phishing uses whatever email addresses they can find, including hacked databases, but is easily filtered out once detected (based on keywords or URI fragments in the message body).
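As a rough illustration of that kind of filtering, a minimal sketch; the patterns are invented, not a real blocklist:

--- Code: ---
// Minimal sketch of signature-based filtering of scattershot phishing.
// Once a campaign is detected, its URLs and stock phrases become
// signatures. These patterns are invented, not a real blocklist.
const knownBadFragments: RegExp[] = [
  /secure-login\.example\.net/i,          // fake site from a detected campaign
  /verify your account within 24 hours/i, // stock phishing phrase
];

function looksLikeKnownPhish(body: string): boolean {
  return knownBadFragments.some((re) => re.test(body));
}
--- End code ---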
--- Quote from: janoc on December 22, 2020, 06:46:09 pm ---b) You have never had to actually administer a larger e-mail system and deal with both web scraping and spam ... I did, for many years.
--- End quote ---
No large email servers, only one large mailing list server for a few years, but lots of web servers of different kinds. I do know scraping well – both how to scrape, and how to make scraping as frustrating as possible. Which is the point: to keep addresses off mass marketing lists. Scams we can mostly filter out; and users just have to learn to be wary of phishing. FWIW, one of my email addresses has been in active use over 25 years now.
cdev:
--- Quote from: Syntax Error on December 22, 2020, 09:04:40 pm ---@edy For rabbit hole read black hole. Historically...
The elephant dung in the room is that there are not, and have never been, any binding web standards (say wot?). There is no ISO, DIN or IEEE body making definitive rulings about the syntax of the WWW. Instead, we have RFCs from the IETF and vague 'Recommendations' from a fuzzy entity called the World Wide Web Consortium. The result: there is only ever convergence of practice, until someone (Apple, Google, Mozilla) decides they want to create their own features that only they support. There is no web law that says we cannot create the <bobbydazzler> tag or the @ripper CSS directive, and push it out over the eevblog browser. When the geeks on Reddit hear of this new 'thing', a spec gets raised and now it's a 'web standard'. For five minutes.
The early days of the WWW were littered with the floppy-disked corpses of browsers that succumbed in the Browser Wars of the 1990s. Then, it was not about collective convergence but corporate divergence.
Early website developers were faced with supporting at least Internet Explorer AND Netscape, two browsers which handled events and active content in very different ways. Early sites would often show an image that said, "this site is best viewed in Netscape." Meaning: we're not doing a friggin IE version as well. To span this chasm, early sites used Java Applets, which were a whole new parallel universe of plugin divergence. But at least Java Applets used the real Java language, with a graphics canvas that you could draw on. HTML5 rendered Java Applets and the whole zoo of other active object tags obsolete. Which was a good thing.
You probably could build a web time machine, but your 'Tardis' process will need to regenerate your Doctor Who backwards; from Jodie Whittaker to David Tennant.
--- End quote ---
It seems not to be as bad as it used to be, if you avoid using the kinds of tools that produce wildly bloated code.
You bring up good points, but all other factors aside, I do think we need the ability to invent new kinds of HTML tags; otherwise there would never be new kinds of content for the web.
There should also be a setting that lets the browser request simpler code: a subset of the full feature set, or alternatively all the bleeding-edge functionality the server can throw at you.
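Something close to this already exists in the form of the Save-Data request header, a client hint some browsers send when the user opts for reduced data use. A minimal Node sketch of a server honoring it; both page variants are made up for illustration:

--- Code: ---
// Minimal Node sketch, assuming the real "Save-Data: on" client hint;
// the two page variants are made up for illustration.
import { createServer } from "node:http";

createServer((req, res) => {
  const wantsLite = (req.headers["save-data"] ?? "") === "on";
  res.setHeader("Content-Type", "text/html");
  res.end(
    wantsLite
      ? "<html><body>Plain, script-free page</body></html>"
      : "<html><body>Full page, every bleeding-edge feature</body></html>",
  );
}).listen(8080);
--- End code ---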
tooki:
--- Quote from: janoc on December 22, 2020, 06:46:09 pm ---
--- Quote from: Nominal Animal on December 22, 2020, 04:36:37 pm ---Fact is, phishing and scamming emails are deliberately full of typos and errors, because they are not interested in those who notice them; they are interested in those who do not, and are therefore easier targets.
--- End quote ---
That is an interesting theory: in other words, with this strategy you would be intentionally seeding your mail with keywords that are very unlikely to occur in your target's legitimate mail, and thus make wonderful "spamminess" indicators for various filters. Even the ancient SpamAssassin used this almost twenty years ago. Makes total sense ... :palm:
Not to mention that most people these days are trained to treat the mail as suspicious/scam when it is full of typos and errors, regardless of whether it is actually one. It is such a tell-tale sign.
I think that:
a) You are giving way too much credit to scammers, who are most often not native speakers of the language of the targeted population and thus use weird and wonderful linguistic constructions and typos most people wouldn't make. A lot of spam is even machine-translated.
--- End quote ---
No, Nominal Animal is absolutely correct: the misspellings are almost entirely deliberate. Why? Because it's trivial to create spam filters that find critical words, correctly spelled. (This is why, for example, people like me who earned their university diploma cum laude don't dare write that into the body of a message, instead writing "with honors". Never mind the poor residents of Scunthorpe.) So the spammers quickly began using misspellings to get around such keyword-based spam filters. Nowadays, with Unicode, they frequently use lookalike characters to fool spam filters, e.g. ΑΒΕϜΗΙΚΜΝΟΡΤΥΧΖ instead of ABEFHIKMNOPTYXZ, ϳ in place of j, р instead of p, etc. And that's just different alphabets, never mind how those alphabets are repeated multiple times over as mathematical symbols. Depending on the font, the lookalikes may be completely indistinguishable, and even if they're not, a human casually reading the message won't realize the substitution is deliberate. Meanwhile, software not specifically designed to equate lookalike characters will not recognize the words.
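For illustration, a minimal sketch of the countermeasure, assuming a hand-rolled lookalike map (a real filter would use the full Unicode confusables table):

--- Code: ---
// Minimal sketch of homoglyph normalization: map a few Greek/Cyrillic
// lookalikes back to ASCII before keyword matching. A production
// filter would use the full Unicode confusables table; this tiny map
// is illustrative only.
const confusables: Record<string, string> = {
  "Α": "A", "Β": "B", "Ε": "E", "Η": "H", "Ι": "I", "Κ": "K",
  "Μ": "M", "Ν": "N", "Ο": "O", "Ρ": "P", "Τ": "T", "Υ": "Y",
  "Χ": "X", "Ζ": "Z", "ϳ": "j", "р": "p", // Greek yot, Cyrillic er
};

function normalizeLookalikes(text: string): string {
  return [...text].map((ch) => confusables[ch] ?? ch).join("");
}

// normalizeLookalikes("ΡΑΥΡΑL") yields "PAYPAL", which an ordinary
// keyword filter can then match.
--- End code ---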
Sure, you can train a filter to find a misspelling, or even use fuzzy matching to catch similar misspellings. But nonetheless, parsing the correctly-spelled content of an email remains a key part of spam filtering. And the sheer variety of possible misspellings and weird wordings means that recognizing the fingerprint of one particular variant may not give you much by which to recognize others.
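A minimal sketch of such fuzzy matching using edit distance; the keyword and threshold are illustrative:

--- Code: ---
// Flag a word as a disguised keyword when its edit (Levenshtein)
// distance to a known keyword is small. Keyword and threshold are
// illustrative.
function editDistance(a: string, b: string): number {
  // Classic dynamic-programming table: d[i][j] is the distance between
  // the first i characters of a and the first j characters of b.
  const d: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return d[a.length][b.length];
}

function isDisguisedKeyword(word: string, keyword: string): boolean {
  return editDistance(word.toLowerCase(), keyword) <= 1;
}

// isDisguisedKeyword("v1agra", "viagra") returns true.
--- End code ---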