Fraser is doubtless right about the 'specification' points but there is another, simpler answer.
So-called 'NTSC' (the US standard) analogue television used 525 lines, interlaced, at 30 frames per second (actually 29.97, but I'll ignore that). The 525-line system comprised 480 lines of active picture information; the remaining lines were used for sync pulses and vertical blanking.
Those 480 active TV lines were interlaced, meaning each field was 240 lines. On a 4:3 ratio screen, equal horizontal and vertical resolution therefore requires 240 × 4/3 = 320 horizontal pixels. That's basically where the 320x240 figure stems from.
A 320x240 pixel image can be 'read out twice' to give an interlaced signal far more easily than making a 640x480 sensor, so that's where all the development effort went. A
well-focused, contrasty image doesn't need to be particularly high resolution to still be very usable.
(Compatibility with NTSC TV standards is also why VGA is 640x480. There was a wealth of experience in making displays, i.e. TVs, for that resolution; producing anything to a different standard was costly.)
It's interesting to note that in 625-line, 50Hz (PAL) countries the equivalent sums work out to 576 active lines (288 per field, and 288 × 4/3 = 384), so 384x288 sensors are the most convenient. That's almost certainly why the likes of Ulis developed sensors in that format. Converting between 240- and 288-line systems requires either a big border with no information or some kind of interpolation that smears the image and reduces the apparent resolution.
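The same sum covers both standards; a minimal sketch (the helper name is my own, not any standard API):

```python
# Lines per interlaced field, then horizontal pixels for square
# pixels on a 4:3 screen. Works for both 525- and 625-line systems.
def field_format(active_lines, aspect=(4, 3)):
    field = active_lines // 2
    w, h = aspect
    return (field * w // h, field)

print(field_format(480))   # NTSC -> (320, 240)
print(field_format(576))   # PAL  -> (384, 288)
```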
Now that we're largely free of the shackles of analogue TV systems, which had to maintain compatibility with black-and-white systems originating in the late 1930s, in theory we can have pictures (and sensors) of any format we please, although there are still some well-established screen formats with their roots in computerland (1024x768, 1280x1024, 1600x1200, 1280x720, 1920x1080...). But there's no reason why, if you have the cash, you can't have a triangular, circular, trapezoidal or whatever display made for you, with more or less as many pixels as you want. And I'm sure that FLIR and their contemporaries would be delighted to relieve you of a (very) large wheelbarrow of cash to develop a custom sensor in the same format.
The nearer to circular you make a
sensor, the nearer it is to using the maximum amount of the
lens system's image circle. Back in the days of the
Argus 1, which had a round imaging tube (a
pyroelectric vidicon), they made use of the maximum possible amount of the sensitive area by scanning the whole front (active) area and presenting the image as a circle within a large black border (an electronic mask over the parts of the video signal carrying no genuine information). This also meant the light from the lens was used as efficiently as possible. But circles are tricky to cut out of (silicon) wafers and/or wasteful of silicon real estate, so in practice the nearer to square you get, the better. Display screens, too, are generally rectangular these days (the earliest CRTs were round), and an X-Y array is convenient to scan electronically. Today's 5:4 ratio sensors, such as 640x512 or 1280x1024, come closest to making optimum use of the lens image circle; electronic processing means the result can be displayed in any number of ways. Widescreen 16:9 thermal imaging sensors seem to remain rare, perhaps because they make very poor use of the optics (unless there's an anamorphic element in the lens, but that's another story).
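The image-circle argument is easy to put numbers on: a rectangle inscribed in a circle has its diagonal equal to the circle's diameter, so the fraction of the circle's area it covers is 4wh/π(w²+h²). A quick sketch (function name mine, for illustration only):

```python
import math

# Fraction of a lens's image circle covered by an inscribed
# rectangle of aspect ratio w:h (diagonal = circle diameter).
def circle_usage(w, h):
    return 4 * w * h / (math.pi * (w**2 + h**2))

for name, (w, h) in {"1:1": (1, 1), "5:4": (5, 4),
                     "4:3": (4, 3), "16:9": (16, 9)}.items():
    print(f"{name}: {circle_usage(w, h):.1%}")
```

That works out at roughly 63.7% for a square, 62.1% for 5:4, 61.1% for 4:3 and only 54.4% for 16:9, which is why the 5:4 formats sit so close to the practical optimum and widescreen throws so much of the optics away.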
Perhaps one day widescreen thermal imaging will become the norm, especially if a revolution in sensitivity means that cheaper optical elements can be used. Moulded glass (of some sort) is very significantly cheaper than single-point diamond-turned single-crystal germanium but, as yet, nothing quite offers the same all-round performance, though chalcogenide glasses may be coming close. There are of course many other substances that could be used as LWIR lens elements (e.g. KBr, KCl, TlBr-TlI, ZnSe, ZnS to name a few), but I'm getting off-topic fast...