Code: [Select]|0000000000111111111122222222223333333333
|0123456789012345678901234567890123456789
00| |
01| B |
02| BBB |
03| BBBBBB |
04| GBBBBBBBB |
05| GGGBBBBBBBBB |
06| GGGGGGgBBBBBBBB |
07| GGGGGGGGGgbBBBBBBB |
08| GGGGGGGGGgbBBBBBBB |
09| GGGGGGGGg+bBBBBBB |
10| GGGGGGGggbbBBBBB |
11| GGGGGGGggbbBBBBB |
12| GGGGGGggbbbBBBB |
13| GGGGGgggbbbBBB |
14| GGGGGggbbbbBBB |
15| GGGGgggbbbbBB |
16| GGGgggrrbbbB |
17| GGGggrrrrbbB |
18| GGgrrrrrrrB |
19| GgrrrrrrRR |
20| GrrrrRRRR |
21| rRRRRRRRR |
22| RRRRRRRRR |
23| RRRRRRRR |
24| RRRRRRRR |
25| RRRRRRR |
26| RRRRRR |
27| RRRRRR |
28| RRRRR |
29| RRRR |
30| RRRR |
31| RRR |
32| RR |
33| RR |
34| R |
35| | <------------ defect !!!
36| R |
37| |
38| |
39| |
|0000000000111111111122222222223333333333
|0123456789012345678901234567890123456789
here it is an example: see the hole on the vertex.
|0000000000111111111122222222223333333333
|0123456789012345678901234567890123456789
00| |
01| B |
02| BBB |
03| BBBBBB |
04| GBBBBBBBBB |
05| GGGBBBBBBBBB |
06| GGGGGGgBBBBBBBB |
07| GGGGGGGGGgbBBBBBBBB |
08| GGGGGGGGGgbBBBBBBB |
09| GGGGGGGGg+bBBBBBB |
10| GGGGGGGggbbBBBBBB |
11| GGGGGGGggbbBBBBB |
12| GGGGGGggbbbBBBB |
13| GGGGGgggbbbBBBB |
14| GGGGGggbbbbBBB |
15| GGGGgggbbbbBB |
16| GGGgggrrbbbBB |
17| GGGggrrrrbbB |
18| GGgrrrrrrrB |
19| GgrrrrrrRRR |
20| GrrrrRRRRR |
21| rRRRRRRRR |
22| RRRRRRRRRR |
23| RRRRRRRRR |
24| RRRRRRRR |
25| RRRRRRRR |
26| RRRRRRR |
27| RRRRRR |
28| RRRRRR |
29| RRRRR |
30| RRRR |
31| RRRR |
32| RRR |
33| RR |
34| RR |
35| R | <------------ defect !!!
36| R |
37| |
38| |
39| |
|0000000000111111111122222222223333333333
|0123456789012345678901234567890123456789
if(this_error > max_error) {
y++;
this_error -= max_error;
}
if(this_error >= max_error) {
y++;
this_error -= max_error;
}
for(x = x0; x <= x1; x++) {
y = y0 + (x-x0) * (y1-y0) / (x1-x0); // Integer division
}
y = y0 + ((x-x0) * (y1-y0)+(y1-y0)/2) / (x1-x0);
One thing that has been slightly glossed over so far is the consideration of exactly where the borders of a polygon lie. This can manifest itself in adjacent polygons either by gaps appearing between them, or by them overlapping.
For example, in Figure 6.19, [a 4x4 grid showing a 3x3 square] the width of the polygon is 3 units, so it should have an area of 9 units, whereas it has been rendered with an area of 16 units. The traditional solution to this problem, and the one usually advocated in textbooks, is to consider the sample point of the pixel to lie in its centre, that is, at (x + 0,5, y + 0.5). {A pixel can be considered to be a rectangle of finite area with dimensions 1,0 x 1.0, and its sample point is the point within the pixel area where the scene is sampled in order to determine the value of the pixel.) So, for example, the intersection of an edge with a scan line is calculated for y + O.5, rather than for y, as we assumed above.
This is messy, and excludes the possibility of using integer-only arithmetic, A simpler solution is to assume that the sample point lies at one of the four corners of the pixel; we have chosen the top right-hand corner of the pixel. This has the consequence that the entire image is displaced half a pixel to the left and down, which in practice is insignificant. The upshot of this is that it provides the following simple rasterization rules;
(1) Horizontal edges are simply discarded.
(2) An edge which goes from scan line y0 to y1 should generate x values for scan lines y0 through to y1-1 (that is missing the top scan line), or if y0 = y1 then it generates no values.
(3) Similarly, horizontal segments should be filled from Xleft to Xright-1 (with no pixels generated if Xleft = Xright).
Incidentally, in rules (2) and (3), whether the first or last element is ignored is arbitrary, and the choice is based around programming convenience. The four possible permutations of these two rules define the sample point as one of the four corners of the pixel. The effect of these rules can be demonstrated in Figure 6.20. Here we have three adjacent polygons A, B and C, with edges a, b, c and d. The rounded a- values produced by these edges for the scan shown are 2, 4 ,4 and 7 respectively. Rule 3 then gives pixels 2 and 3 for polygon A, none for polygon B, and 4 to 6 for polygon C. Thus, overall, there are no gaps, and no overlapping. The reason why horizontal edges are discarded is because the edges adjacent to them will have already contributed the x values to make up the segment (for example, the base of the polygon in Figure 6.18; note also that, for the sake of simplicity, the scan conversion of this polygon was not done strictly in accordance with the rasterization rules mentioned above).
I don't have the answer for you but it is possible to do a perfectly consistent implementation. The GPU guys have it solved for example. You can draw a triangle mesh with very tiny triangles, long thin triangles, etc and every pixel will be hit exactly once and there will be no holes. Here's a link to some stuff to think about and maybe give you some search terms to follow up with.
https://docs.microsoft.com/en-us/windows/desktop/direct3d11/d3d10-graphics-programming-guide-rasterizer-stage-rules
#include <stdlib.h>
#include <inttypes.h>
#include <stdio.h>
#include <time.h>
/* Written by Nominal Animal, 2018.
No rights reserved. I do not believe there is anything copyrightable here,
but if there is, this is licensed under the Creative Commons Zero license,
[url]https://creativecommons.org/share-your-work/public-domain/cc0/[/url]
In other words, do whatever you like with it, but at your own risk.
It should be safe, but I provide no warranties or guarantees of any kind. */
#define ONE UINT32_C(2147483648)
#define FAIL(...) do { fprintf(stderr, __VA_ARGS__); exit(EXIT_FAILURE); } while (0)
typedef struct {
int32_t x;
int32_t y;
} i32vec2d;
/* Xorshift64* pseudo-random number generator. */
static uint64_t prng_state = 1;
static uint64_t prng_u64(void)
{
uint64_t x = prng_state;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
prng_state = x;
return x * UINT64_C(2685821657736338717);
}
static int32_t prng_x0 = 0;
static int32_t prng_y0 = 0;
static uint32_t prng_dx = 0;
static uint32_t prng_dy = 0;
static uint32_t prng_mx = 0;
static uint32_t prng_my = 0;
static void prng_set_point_range(const int32_t xmin, const int32_t ymin,
const int32_t xmax, const int32_t ymax)
{
if (xmin <= xmax) {
prng_x0 = xmin;
prng_dx = (uint32_t)(xmax - xmin);
} else {
prng_x0 = xmax;
prng_dx = (uint32_t)(xmin - xmax);
}
if (ymin <= ymax) {
prng_y0 = ymin;
prng_dy = (uint32_t)(ymax - ymin);
} else {
prng_y0 = ymax;
prng_dy = (uint32_t)(ymin - ymax);
}
prng_mx = prng_dx; prng_my = prng_dy;
prng_mx |= prng_mx >> 1; prng_my |= prng_my >> 1;
prng_mx |= prng_mx >> 2; prng_my |= prng_my >> 2;
prng_mx |= prng_mx >> 4; prng_my |= prng_my >> 4;
prng_mx |= prng_mx >> 8; prng_my |= prng_my >> 8;
prng_mx |= prng_mx >> 16; prng_my |= prng_my >> 16;
}
static i32vec2d prng_point(void)
{
uint64_t r;
uint32_t x, y;
i32vec2d result;
do {
r = prng_u64();
x = r & prng_mx;
y = (r >> 32) & prng_my;
} while (x > prng_dx || y > prng_dy);
result.x = prng_x0 + (int32_t)x;
result.y = prng_y0 + (int32_t)y;
return result;
}
/* Bresenham's line algorithm with increasing major x and increasing minor y; 31-bit error term. */
static inline void line_xpyp(int32_t x0, int32_t y0,
int32_t x1, int32_t y1,
void (*set)(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y))
{
uint32_t derr = (x1 > x0) ? (uint64_t)(y1 - y0) * ONE / (uint64_t)(x1 - x0) : 0;
uint32_t err = 0;
int32_t x, y = y0;
if (x1 < x0) FAIL("line_xpyp(): x1 < x0\n");
if (y1 < y0) FAIL("line_xpyp(): y1 < y0\n");
for (x = x0; x <= x1; x++) {
set(x0, y0, x1, y1, x, y);
err += derr;
if (err >= ONE) {
y++;
err -= ONE;
}
}
}
/* Bresenham's line algorithm with increasing major x and decreasing minor y; 31-bit error term. */
static inline void line_xpyn(int32_t x0, int32_t y0,
int32_t x1, int32_t y1,
void (*set)(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y))
{
uint32_t derr = (x1 > x0) ? (uint64_t)(y0 - y1) * ONE / (uint64_t)(x1 - x0) : 0;
uint32_t err = 0;
int32_t x, y = y0;
if (x1 < x0) FAIL("line_xpyn(): x1 < x0\n");
if (y1 > y0) FAIL("line_xpyn(): y1 > y0\n");
for (x = x0; x <= x1; x++) {
set(x0, y0, x1, y1, x, y);
err += derr;
if (err >= ONE) {
y--;
err -= ONE;
}
}
}
/* Bresenham's line algorithm with increasing major y and increasing minor x; 31-bit error term. */
static inline void line_ypxp(int32_t x0, int32_t y0,
int32_t x1, int32_t y1,
void (*set)(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y))
{
uint32_t derr = (y1 > y0) ? (uint64_t)(x1 - x0) * ONE / (uint64_t)(y1 - y0) : 0;
uint32_t err = 0;
int32_t x = x0, y;
if (y1 < y0) FAIL("line_ypxp(): y1 < y0\n");
if (x1 < x0) FAIL("line_ypxp(): x1 < x0\n");
for (y = y0; y <= y1; y++) {
set(x0, y0, x1, y1, x, y);
err += derr;
if (err >= ONE) {
x++;
err -= ONE;
}
}
}
/* Bresenham's line algorithm with increasing major y and decreasing minor x; 31-bit error term. */
static inline void line_ypxn(int32_t x0, int32_t y0,
int32_t x1, int32_t y1,
void (*set)(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y))
{
uint32_t derr = (y1 > y0) ? (uint64_t)(x0 - x1) * ONE / (uint64_t)(y1 - y0) : 0;
uint32_t err = 0;
int32_t x = x0, y;
if (y1 < y0) FAIL("line_ypxn(): y1 < y0\n");
if (x1 > x0) FAIL("line_ypxn(): x1 > x0\n");
for (y = y0; y <= y1; y++) {
set(x0, y0, x1, y1, x, y);
err += derr;
if (err >= ONE) {
x--;
err -= ONE;
}
}
}
/* Bresenham's line algorithm. */
void line(int32_t x0, int32_t y0,
int32_t x1, int32_t y1,
void (*set)(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y))
{
if (!set)
return;
if (x1 >= x0) {
if (y1 >= y0) {
if (x1 - x0 >= y1 - y0) {
line_xpyp(x0, y0, x1, y1, set);
} else {
line_ypxp(x0, y0, x1, y1, set);
}
} else {
if (x1 - x0 >= y0 - y1) {
line_xpyn(x0, y0, x1, y1, set);
} else {
line_ypxn(x1, y1, x0, y0, set);
}
}
} else {
if (y1 >= y0) {
if (x0 - x1 >= y1 - y0) {
line_xpyn(x1, y1, x0, y0, set);
} else {
line_ypxn(x0, y0, x1, y1, set);
}
} else {
if (x0 - x1 >= y0 - y1) {
line_xpyp(x1, y1, x0, y0, set);
} else {
line_ypxp(x1, y1, x0, y0, set);
}
}
}
}
static unsigned long pixel_count = 0;
static unsigned long error_count = 0;
/* Analytical version of Bresenham's algorithm, with increasing x major and increasing y minor. */
void is_at_xpyp(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y)
{
if (x1 < x0) FAIL("is_at_xpyp(): x1 < x0");
if (y1 < y0) FAIL("is_at_xpyp(): y1 < y0");
if (x < x0 || x > x1) FAIL("is_at_xpyp(): x is not between x0 and x1.\n");
if (y < y0 || y > y1) FAIL("is_at_xpyp(): y is not between y0 and y1.\n");
const uint64_t derr = (x1 > x0) ? (uint64_t)(y1 - y0) * ONE / (uint64_t)(x1 - x0) : 0;
const int32_t ay = y0 + (int32_t)( (0 + derr * (x - x0)) / ONE );
pixel_count++;
error_count += (ay != y);
}
/* Analytical version of Bresenham's algorithm, with increasing x major and decreasing y minor. */
void is_at_xpyn(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y)
{
if (x1 < x0) FAIL("is_at_xpyn(): x1 < x0");
if (y1 > y0) FAIL("is_at_xpyn(): y1 > y0");
if (x < x0 || x > x1) FAIL("is_at_xpyn(): x is not between x0 and x1.\n");
if (y < y1 || y > y0) FAIL("is_at_xpyn(): y is not between y0 and y1.\n");
const uint64_t derr = (x1 > x0) ? (uint64_t)(y0 - y1) * ONE / (uint64_t)(x1 - x0) : 0;
const int32_t ay = y0 - (int32_t)( (0 + derr * (x - x0)) / ONE );
pixel_count++;
error_count += (ay != y);
}
/* Analytical version of Bresenham's algorithm, with increasing y major and increasing x minor. */
void is_at_ypxp(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y)
{
if (y1 < y0) FAIL("is_at_ypxp(): y1 < y0");
if (x1 < x0) FAIL("is_at_ypxp(): x1 < x0");
if (y < y0 || y > y1) FAIL("is_at_ypxp(): y is not between y0 and y1.\n");
if (x < x0 || x > x1) FAIL("is_at_ypxp(): x is not between x0 and x1.\n");
const uint64_t derr = (y1 > y0) ? (uint64_t)(x1 - x0) * ONE / (uint64_t)(y1 - y0) : 0;
const int32_t ax = x0 + (int32_t)( (0 + derr * (y - y0)) / ONE );
pixel_count++;
error_count += (ax != x);
}
/* Analytical version of Bresenham's algorithm, with increasing y major and decreasing x minor. */
void is_at_ypxn(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y)
{
if (y1 < y0) FAIL("is_at_ypxn(): y1 < y0");
if (x1 > x0) FAIL("is_at_ypxn(): x1 > x0");
if (y < y0 || y > y1) FAIL("is_at_ypxn(): y is not between y0 and y1.\n");
if (x < x1 || x > x0) FAIL("is_at_ypxn(): x is not between x0 and x1.\n");
const uint64_t derr = (y1 > y0) ? (uint64_t)(x0 - x1) * ONE / (uint64_t)(y1 - y0) : 0;
const int32_t ax = x0 - (int32_t)( (0 + derr * (y - y0)) / ONE );
pixel_count++;
error_count += (ax != x);
}
/* Analytical version of Bresenham's algorithm. */
void is_at(int32_t x0, int32_t y0, int32_t x1, int32_t y1, int32_t x, int32_t y)
{
if (x1 >= x0) {
if (y1 >= y0) {
if (x1 - x0 >= y1 - y0) {
is_at_xpyp(x0, y0, x1, y1, x, y);
} else {
is_at_ypxp(x0, y0, x1, y1, x, y);
}
} else {
if (x1 - x0 >= y0 - y1) {
is_at_xpyn(x0, y0, x1, y1, x, y);
} else {
is_at_ypxn(x1, y1, x0, y0, x, y);
}
}
} else {
if (y1 >= y0) {
if (x0 - x1 >= y1 - y0) {
is_at_xpyn(x1, y1, x0, y0, x, y);
} else {
is_at_ypxn(x0, y0, x1, y1, x, y);
}
} else {
if (x0 - x1 >= y0 - y1) {
is_at_xpyp(x1, y1, x0, y0, x, y);
} else {
is_at_ypxp(x1, y1, x0, y0, x, y);
}
}
}
}
int main(int argc, char *argv[])
{
unsigned long i, n;
int32_t xmin, ymin, xmax, ymax;
i32vec2d xy0, xy1;
char dummy;
if (argc < 4 || argc > 5) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help | help ]\n", argv[0]);
fprintf(stderr, " %s XMIN:XMAX YMIN:YMAX LINES [ SEED ]\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
if (sscanf(argv[1], "%" SCNi32 " : %" SCNi32 " %c", &xmin, &xmax, &dummy) != 2) {
fprintf(stderr, "%s: Invalid x range.\n", argv[1]);
return EXIT_FAILURE;
}
if (sscanf(argv[2], "%" SCNi32 " : %" SCNi32 " %c", &ymin, &ymax, &dummy) != 2) {
fprintf(stderr, "%s: Invalid y range.\n", argv[2]);
return EXIT_FAILURE;
}
if (sscanf(argv[3], " %lu %c", &n, &dummy) != 1 || n < 1) {
fprintf(stderr, "%s: Invalid number of lines to test.\n", argv[3]);
return EXIT_FAILURE;
}
if (argc > 4) {
if (sscanf(argv[4], " %" SCNu64 " %c", &prng_state, &dummy) != 1 || !prng_state) {
fprintf(stderr, "%s: Invalid Xorshift64* seed. (Zero is not a valid seed.)\n", argv[4]);
return EXIT_FAILURE;
}
} else {
prng_state = (uint64_t)time(NULL);
for (i = 0; i < 100; i++) {
prng_state ^= prng_state >> 12;
prng_state ^= prng_state << 25;
prng_state ^= prng_state >> 27;
}
if (!prng_state)
prng_state = 1;
fprintf(stderr, "Using seed %" PRIu64 ".\n", prng_state);
}
prng_set_point_range(xmin, ymin, xmax, ymax);
for (i = 0; i < n; i++) {
xy0 = prng_point();
xy1 = prng_point();
line(xy0.x, xy0.y, xy1.x, xy1.y, is_at);
}
if (!error_count) {
printf("%lu pixels tested; no errors detected.\n", pixel_count);
return EXIT_SUCCESS;
} else {
printf("%lu pixels tested; %lu erroneus cases (%.06f%%) found.\n", pixel_count, error_count,
100.0 * (double)error_count / (double)pixel_count);
return EXIT_FAILURE;
}
}
well, I forgot mentioning that divisions must be avoided because they take too many clock-edges. This is the main reason for the existence of the Bresenham's line drawing!
void bresenham_xpyp(const int x0, const int y0, const int x1, const int y1)
{
const int xerr = 2*(y1-y0);
const int yerr = 2*(x1-x0);
int err = xerr - (x1-x0);
int x = x0, y = y0;
while (1) {
/* Draw pixel (x, y) */
if (x >= x1)
return;
x++;
if (err > 0) {
y++;
err += xerr - yerr;
} else
err += xerr;
}
}
int on_bresenham_xpyp(const int x0, const int y0, const int x1, const int y1, const int x, const int y)
{
/* Only tested within the axis-aligned bounding box for the line. */
if (x < x0 || y < y0 || x > x1 || y > y1)
return 0;
/* Only needed for degenerate, 1-pixel long lines; when x0 == x1 and y0 == y1 */
if (x == x0 && y == y0)
return 1;
const int err = 2 * (x - x0 + 1) * (y1 - y0) - (x1 - x0) * (2 * (y - y0) + 1);
return (err > 2*(x0 - x1 + y1 - y0)) && (err <= 2*(y1 - y0));
}
(...)
This is the Bresenham's algorithm. Inside the loop, the error looks something like ~err[ i+1 ] = ~err[ i ] + 2 * ( Dy – Dx ) × (y[ i+1 ] – y[ i ]). Reason why I said it's recursive.
This is the Bresenham's algorithm.
it seems there is something wrong
With my team we have worked a speculative approach (see screenshots on previous pages), that does the job very neatly, except a few "false positive" pixels, but this is contained in 1-pixel of error as maximal error.
It avoids the division in the cycle introducing a pre-division
(I mean I suspect numerical weakness)
47-bit dividend with 16 low bits always zero; and a 31-bit result
how have you calculated the size of the this?
#include <stdlib.h>
#include <inttypes.h>
#include <stdio.h>
#include <errno.h>
#include <time.h>
#include <math.h>
typedef struct {
int32_t x;
int32_t y;
} i32vec;
static uint64_t prng_state = 1;
static inline uint64_t prng_read_uint64(const char *path)
{
uint64_t result = 0;
FILE *src;
if (!path || !*path)
return 0;
src = fopen(path, "rb");
if (!src)
return 0;
if (fread(&result, sizeof result, 1, src) != 1) {
fclose(src);
return 0;
}
fclose(src);
return result;
}
static inline uint64_t prng_randomize(void)
{
uint64_t seed = 0;
int rounds = 100;
/* Try some known randomness sources. */
if (!seed) seed ^= prng_read_uint64("/dev/urandom");
if (!seed) seed ^= prng_read_uint64("/dev/arandom");
if (!seed) seed ^= prng_read_uint64("/dev/random");
/* Mix in a bit of time. */
seed ^= (uint64_t)time(NULL) * UINT64_C(59121997897);
/* Mix in a bit of clock, as well. */
seed ^= (uint64_t)clock() * UINT64_C(5054973881);
/* Make sure it is nonzero. */
if (!seed)
seed = 1;
/* Churn it a bit, to make it less predictable. */
while (rounds-->0) {
seed ^= seed >> 12;
seed ^= seed << 25;
seed ^= seed >> 27;
}
/* Done. */
prng_state = seed;
return seed;
}
static inline uint64_t prng_u64(void)
{
uint64_t state = prng_state;
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
prng_state = state;
return state * UINT64_C(2685821657736338717);
}
static uint32_t prng_xmask = 0;
static uint32_t prng_ymask = 0;
static uint32_t prng_xmax = 0;
static uint32_t prng_ymax = 0;
static int32_t prng_xmin = 0;
static int32_t prng_ymin = 0;
static inline i32vec prng_i32vec(void)
{
uint32_t x, y;
i32vec result;
do {
uint64_t u = prng_u64();
x = u & prng_xmask;
y = (u >> 32) & prng_ymask;
} while (x > prng_xmax || y > prng_ymax);
result.x = x + prng_xmin;
result.y = y + prng_ymin;
return result;
}
static inline void prng_i32vec_range(int32_t xmin, int32_t ymin,
int32_t xmax, int32_t ymax)
{
if (xmin <= xmax) {
prng_xmin = xmin;
prng_xmax = xmax - xmin;
} else {
prng_xmin = xmax;
prng_xmax = xmin - xmax;
}
if (ymin <= ymax) {
prng_ymin = ymin;
prng_ymax = ymax - ymin;
} else {
prng_ymin = ymax;
prng_ymax = ymin - ymax;
}
prng_xmask = prng_xmax;
prng_xmask |= prng_xmask >> 1;
prng_xmask |= prng_xmask >> 2;
prng_xmask |= prng_xmask >> 4;
prng_xmask |= prng_xmask >> 8;
prng_xmask |= prng_xmask >> 16;
prng_ymask = prng_ymax;
prng_ymask |= prng_ymask >> 1;
prng_ymask |= prng_ymask >> 2;
prng_ymask |= prng_ymask >> 4;
prng_ymask |= prng_ymask >> 8;
prng_ymask |= prng_ymask >> 16;
}
static inline double fx_div_correct(const int32_t n, const int32_t d)
{
return (double)n * 65536.0 / (double)d;
}
static uint64_t max_n = 0;
static uint32_t max_d = 0;
static inline int32_t fx_div(const int32_t n, const int32_t d)
{
uint64_t dividend;
uint32_t un, ud;
int32_t result;
int negative;
if (n < 0) {
un = -n;
negative = 1;
} else {
un = n;
negative = 0;
}
if (d < 0) {
ud = -d;
negative = !negative;
} else {
ud = d;
}
if (max_d < ud)
max_d = ud;
/* 47-bit division */
dividend = ((uint64_t)un) << 16;
#ifdef ROUNDING
dividend += (ud / 2);
#endif
if (max_n < dividend)
max_n = dividend;
result = dividend / ud;
return (negative) ? -result : result;
}
static int parse_ulong(const char *src, unsigned long *to)
{
const char *end;
unsigned long val;
if (!src || !*src)
return -1;
end = src;
errno = 0;
val = strtoul(src, (char **)&end, 0);
if (errno)
return -1;
if (end == src)
return -1;
while (*end == '\t' || *end == '\n' || *end == '\v' ||
*end == '\f' || *end == '\r' || *end == ' ')
end++;
if (*end)
return -1;
if (to)
*to = val;
return 0;
}
static int parse_i32(const char *src, int32_t *to)
{
const char *end;
long val;
if (!src || !*src)
return -1;
end = src;
errno = 0;
val = strtol(src, (char **)&end, 0);
if (errno)
return -1;
if (end == src)
return -1;
if (val < INT32_MIN || val > INT32_MAX)
return -1;
while (*end == '\t' || *end == '\n' || *end == '\v' ||
*end == '\f' || *end == '\r' || *end == ' ')
end++;
if (*end)
return -1;
if (to)
*to = val;
return 0;
}
int main(int argc, char *argv[])
{
int32_t min, max;
double fraction, correct, err, errmin, errmax;
i32vec xy;
unsigned long i, iters;
if (argc != 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help | help ]\n", argv[0]);
fprintf(stderr, " %s MIN MAX ITERATIONS\n", argv[0]);
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
if (parse_i32(argv[1], &min)) {
fprintf(stderr, "%s: Invalid minimum. Try 1.\n", argv[1]);
return EXIT_FAILURE;
}
if (parse_i32(argv[2], &max)) {
fprintf(stderr, "%s: Invalid maximum. Try 32767.\n", argv[2]);
return EXIT_FAILURE;
}
if (parse_ulong(argv[3], &iters) || iters < 1) {
fprintf(stderr, "%s: Invalid number of iterations.\n", argv[3]);
return EXIT_FAILURE;
}
prng_i32vec_range(min, min, max, max);
errmin = HUGE_VAL;
errmax = -HUGE_VAL;
printf("Using seed %" PRIu64 ".\n", prng_randomize());
for (i = 0; i < iters; i++) {
do {
xy = prng_i32vec();
correct = fx_div_correct(xy.x, xy.y);
} while (xy.y == 0 || correct < (double)INT32_MIN || correct > (double)INT32_MAX);
fraction = fx_div(xy.x, xy.y);
err = correct - fraction;
if (errmin > err)
errmin = err;
if (errmax < err)
errmax = err;
}
printf("Largest dividend was %" PRIu64 " (%.0f bits)\n", max_n, ceil(log((double)max_n)/log(2.0)));
printf("Largest divisor was %" PRIu32 " (%.0f bits)\n", max_d, ceil(log((double)max_d)/log(2.0)));
printf("Maximum negative error found: %9.6f ULP\n", errmin);
printf("Maximum positive error found: %9.6f ULP\n", errmax);
return EXIT_SUCCESS;
}
This is exactly what I mean with DSP-engine!
Both the multiplier and divider need more internal bits.
This approach seems to cost more complexity and LE