Something like this would be overkill, but the test circuits give some idea of what is required.
A simple tester might just measure:
1. Input offset voltage.
2. Input bias current.
3. Input offset current.
4. Supply quiescent current.
5. Maximum output source current.
6. Maximum output sink current.
That covers the most common subtle failure modes and can be done with one test assembly. The next step up would be to repeat those measurements over the common-mode input voltage range, which is easy enough to do. The step above that would be to also measure the source and sink current versus output voltage.
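If the tester is automated, the common-mode sweep could be sketched roughly as below. This is only an illustration: `sweep_common_mode` and the `measure` callback are hypothetical names, and `measure` stands in for whatever your own fixture does to set the common-mode voltage and return readings.

```python
from typing import Callable, Dict, List


def sweep_common_mode(
    measure: Callable[[float], Dict[str, float]],
    v_cm_min: float,
    v_cm_max: float,
    steps: int = 5,
) -> List[Dict[str, float]]:
    """Repeat the basic measurements at evenly spaced common-mode voltages.

    `measure` is a hypothetical callback supplied by the fixture: it takes a
    common-mode voltage in volts and returns a dict of readings.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    results = []
    for i in range(steps):
        # Evenly spaced points, including both ends of the range.
        v_cm = v_cm_min + (v_cm_max - v_cm_min) * i / (steps - 1)
        row = {"v_cm": v_cm}
        row.update(measure(v_cm))
        results.append(row)
    return results
```

Each row then holds the common-mode voltage plus the readings taken there, so a parameter that drifts near one end of the range is easy to spot.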
For what it is worth, the most common subtle failure I encounter is high input bias and/or offset current because the input stage was damaged.
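As a rough sketch of how that check might look in software, the usual definitions are: bias current is the average of the two input currents, and offset current is the magnitude of their difference. The function names and the limit values in the usage note are my own assumptions for illustration; real limits come from the part's datasheet.

```python
from typing import List, Tuple


def input_currents(i_plus: float, i_minus: float) -> Tuple[float, float]:
    """Return (bias, offset) current from the two measured input currents.

    Bias current is the average of the two inputs; offset current is the
    magnitude of their difference. Units follow the inputs (e.g. amps).
    """
    i_bias = (i_plus + i_minus) / 2.0
    i_offset = abs(i_plus - i_minus)
    return i_bias, i_offset


def check_input_stage(
    i_plus: float,
    i_minus: float,
    i_bias_max: float,
    i_offset_max: float,
) -> List[str]:
    """Return a list of failure descriptions; an empty list means pass.

    The limits are hypothetical parameters: take them from the datasheet
    of the part under test.
    """
    i_bias, i_offset = input_currents(i_plus, i_minus)
    failures = []
    # Compare magnitude, since bias current polarity depends on the input stage.
    if abs(i_bias) > i_bias_max:
        failures.append("input bias current too high")
    if i_offset > i_offset_max:
        failures.append("input offset current too high")
    return failures
```

For example, with made-up limits of 100 nA bias and 30 nA offset, `check_input_stage(80e-9, 60e-9, 100e-9, 30e-9)` passes, while a part with one input drawing 900 nA fails both checks.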