Never underestimate the power of
vector algebra.
For example, if you have a 2D unit vector \$(x, y)\$ (unit meaning length is 1, \$x^2 + y^2 = 1^2 = 1\$), then \$x = \cos \varphi\$ and \$y = \sin \varphi\$ where \$\varphi\$ is the angle, positive counter-clockwise, the vector makes with the positive \$x\$ axis.
To rotate any 2D vector \$(u, v)\$ by angle \$\varphi\$ represented by unit vector \$(x, y)\$, you do
$$\begin{aligned}
u^\prime &= u x - v y \\
v^\prime &= v x + u y \\
\end{aligned}$$
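A minimal C sketch of this rotation (the function name `rotate2d` is mine, not from the text above):

```c
#include <math.h>

/* Rotate (u, v) by the angle phi represented by the unit vector (x, y),
   where x = cos(phi) and y = sin(phi).  Results go to *up and *vp, so
   the same variables can be passed as both input and output. */
static void rotate2d(double x, double y, double u, double v,
                     double *up, double *vp)
{
    double ru = u * x - v * y;
    double rv = v * x + u * y;
    *up = ru;
    *vp = rv;
}
```

For example, `rotate2d(0.0, 1.0, u, v, &u, &v)` rotates \$(u, v)\$ by 90° counter-clockwise, since \$(\cos 90°, \sin 90°) = (0, 1)\$.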
If a vector \$(x^\prime, y^\prime)\$ should be a unit vector but isn't quite (rounding errors tend to creep in, causing a bit of shrinking or expansion), it can always be
normalized to a unit vector \$(x, y)\$ via
$$\begin{aligned}
L^\prime &= \sqrt{ {x^\prime}^2 + {y^\prime}^2 } \\
x &= \frac{x^\prime}{L^\prime} \\
y &= \frac{y^\prime}{L^\prime} \\
\end{aligned}$$
Since division tends to be much slower than multiplication, the division by the square root is often replaced by multiplication with the reciprocal square root, \$1 / \sqrt{{x^\prime}^2 + {y^\prime}^2}\$. This may sound like a slow operation, but there are tricks one can use to speed it up significantly, even when using, say, fixed-point numbers with no hardware square root or division operations. One is to first calculate \${L^\prime}^2\$. If it is less than one half, multiply it by four and double both \$x^\prime\$ and \$y^\prime\$; repeat until it is at least one half. Similarly, if it is two or greater, divide it by four and halve both \$x^\prime\$ and \$y^\prime\$. This reduces the problem to finding the reciprocal square root of \$z\$ with \$1/2 \le z \lt 2\$. As \$1/\sqrt{z} = \sqrt{1/z}\$ is a nice smooth monotonically decreasing curve, one can use a binary search, a polynomial approximation, or even Newton-Raphson iteration via \$r \gets r ( 3 - z r^2 ) / 2\$. Binary search gives one additional bit of precision per iteration and needs fewer elementary operations per iteration than Newton-Raphson does, but Newton-Raphson converges faster, requiring fewer iterations (after the first one or two) to reach the same precision.
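Here is a floating-point sketch of the range reduction plus Newton-Raphson approach; a fixed-point version would replace the scaling multiplications with bit shifts, and the loop counts would be tuned to the available precision:

```c
#include <math.h>

/* Compute 1/sqrt(z) via range reduction into [1/2, 2) by factors of
   four, followed by Newton-Raphson iteration r <- r*(3 - z*r*r)/2.
   Each factor of four removed from z is a factor of two in 1/sqrt(z),
   accumulated in 'scale'. */
static double rsqrt(double z)
{
    double scale = 1.0;
    while (z < 0.5)  { z *= 4.0;  scale *= 2.0; }
    while (z >= 2.0) { z *= 0.25; scale *= 0.5; }

    /* r = 1 is a safe initial guess: z*r*r < 2 < 3, so the first
       step cannot overshoot into divergence. */
    double r = 1.0;
    for (int i = 0; i < 6; i++)
        r = r * (3.0 - z * r * r) * 0.5;  /* quadratic convergence */

    return r * scale;
}
```

Since the iteration converges quadratically, six steps from the crude guess \$r = 1\$ already reach full `double` precision over the reduced range.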
If you use a polynomial approximation for sine in the first octant, you can construct a unit vector given only the angle. The above normalization is useful then too, to minimise the approximation error: the approximation error then shows up as "noise" in the angle parameter \$\varphi\$, but not in the unit vector length per se. That is, the sine-cosine pair constructed thus satisfies \$\sin^2 \varphi + \cos^2 \varphi = 1\$.
With this unit vector approach, if you don't have hardware floating-point or division operations, sooner or later you will end up at
CORDIC: an algorithm to calculate trigonometric functions with only bit shifts, additions, and a small look-up table.
With the above, and using quaternions or bivectors to represent 3D orientation and rotations, even an 8-bit AVR has easily enough computational power to do real-time 3D position and orientation tracking with sensor fusion from accelerometers and magnetic field sensors, using standard software floating-point operations. It has much more headroom still if one uses fixed-point arithmetic with 8, 16, 24, or 32 fractional bits (and 7 integral bits, if using unit vectors and unit quaternions/bivectors), with extended inline assembly implementations for their efficient multiplication.