A disclaimer: I haven't done much with FPGAs (and nothing formal like this), so I'm approaching this from a more general hardware standpoint.
You can write a spec at any level, for whatever purpose. The important part is that it's complete in scope, leaving no undefined edge cases that can cause unexpected trouble, while being either malleable enough to allow for issues percolating up from below, or carefully agnostic of implementation, or smartly considered enough to anticipate a typical implementation and the issues it will have. A good spec typically incorporates all three strategies: we are imperfect, and presumably we haven't done this exact thing before, so some allowance must be made for possible issues. We can anticipate some when we're familiar with the underlying subject (like above: allowing buffer size, bus width, propagation delay, clock speed, etc. to suit the platform), but unless we've literally done the whole exact design before, we shouldn't be so arrogant as to assume we can think of every possible concern, on every possible platform.
Older specs tend to have this locked down much more solidly, as their rough spots, projections into higher and lower layers, have been carefully polished away over the years. Networking stacks are a good example: take something like TCP, running on IP, on Ethernet and so on. Every layer of that stack has been re-implemented over the better part of
half a century! Each layer has been slotted into a stack running across dozens of computer architectures, dozens of network interfaces (made by hundreds of manufacturers?), and they all have to work perfectly to get on the internet together. (Although perhaps my ignorance (and to be honest, fear) of networks is oversimplifying the hell that is network implementation...)
For an FPGA project specification, a good place to start is everything the FPGA chip itself has to talk to. Inputs and outputs, system clocks, voltages and currents, pinouts as applicable -- this is what I, as a design engineer, need to begin drawing schematics, for instance. Probably V/I will be set by the device, so there is that consideration bubbling up; but we can patch over that with interfaces, level translators, whatever, as needed. (Say, mating a 3.3V FPGA to a 5V Arduino board, or an RS-485 bus, or an Ethernet PHY, or...) Or we can set that as a requirement, in which case the choice of FPGA chip itself is determined by the spec. Which, I mean, it is anyway, in terms of IOs, logic gates and speed -- at least to a rough degree. How much logic is used depends on the implementation and synthesizer, so one needs to be generous in the specification; it's better to, say, double the available resources (pick one of the higher, if not the highest, gate counts in the part series) than to completely run out of them during development!
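That sizing rule can be sketched as a tiny selection routine. Everything here is hypothetical -- the part names and LE counts are invented purely to illustrate the "pick with 2x margin" logic, not taken from any real part series:

```python
# Hypothetical part-selection sketch: given an estimated logic usage,
# pick the smallest device in a (made-up) part series that still
# leaves a 2x margin for synthesis surprises and feature creep.

# (part name, logic elements) -- invented numbers for illustration only
PART_SERIES = [
    ("XC-small", 5_000),
    ("XC-mid", 15_000),
    ("XC-big", 50_000),
    ("XC-huge", 150_000),
]

def pick_part(estimated_les: int, margin: float = 2.0) -> str:
    """Return the first part with at least margin * estimated LEs."""
    needed = int(estimated_les * margin)
    for name, les in PART_SERIES:
        if les >= needed:
            return name
    raise ValueError("No part in the series is big enough; rethink the spec")

print(pick_part(6_000))  # needs 12k LEs with 2x margin -> "XC-mid"
```

The point of writing it down this way is that the margin becomes an explicit, arguable number in the spec, rather than a gut feeling.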
Be careful not to be overzealous, though: for example, 5V parts are all but obsolete these days, so requiring 5V operation off the bat would be a mistake. (Offhand, I'm not sure which, if any, are left; and there may be some with 5V-tolerant inputs but a 3.3V supply and thus 3.3V outputs, which isn't the same as native 5V operation.)
And note that, where translators are necessary, bus direction needs to be explicit -- level translators have to know which way data is flowing -- so additional IOs will be required for bus-direction signals. Probably toss in some extra IOs for debug too (could just be test points or a header, could be blinkenlights, etc.). Maybe they aren't necessary and it doesn't matter; maybe they save your ass when an extra signal needs to be fanned out to patch a mistake elsewhere on the prototype.

Maybe the specification should reach a little deeper. There needs to be some kind of description of functional operation. This could be diagrammed with waveforms, a state machine (flowchart, etc.), a block diagram of high-level components, and so on. In the same way that we build computers from a CPU, peripherals and a bus connecting them, we might follow a similar approach. (We don't need to worry about the electrical semantics of the bus; internal tri-state buses are actually something FPGAs strictly prohibit, AFAIK -- IEEE 1164 'Z' states get synthesized away into gates and muxes connected by strict design rules. Some on-chip "buses" take advantage of this, having somewhat different architectures than the traditional multi-master parallel bus, allowing more bandwidth between devices while taking similar amounts of resources. Or something like that, I guess.)
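One lightweight way to capture that functional description, before any HDL exists, is a plain transition table. A sketch (the states and events here are invented, loosely UART-receiver-flavored, just to show the shape):

```python
# Hypothetical state machine written as a transition table:
# (state, event) -> next state. A table like this can live in the
# spec document and later be checked against the actual RTL.
TRANSITIONS = {
    ("IDLE", "start_bit"): "RECEIVING",
    ("RECEIVING", "byte_done"): "PARITY",
    ("PARITY", "parity_ok"): "IDLE",
    ("PARITY", "parity_err"): "ERROR",
    ("ERROR", "reset"): "IDLE",
}

def step(state: str, event: str) -> str:
    # Undefined (state, event) pairs hold state rather than vanish;
    # whichever behavior you choose, the spec should say so explicitly.
    return TRANSITIONS.get((state, event), state)

s = "IDLE"
for ev in ["start_bit", "byte_done", "parity_err", "reset"]:
    s = step(s, ev)
print(s)  # back to IDLE after the error is acknowledged
```

The table doubles as documentation and as test fixture: every row is an assertion you can run against a simulation later.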
A very precise description might even go down to the gate level: for example, the exact logic between inputs and outputs, for strict control purposes. One should be careful not to be too restrictive here, as there is some flexibility in how FPGAs synthesize gates -- one logic element might serve the function of several discrete gates (AND/OR/mux/etc.), but this flexibility is wasted if signals are strictly specified as elemental operations. (At least, I recall that was the case back in the '00s. Maybe synthesizers are better at that now?) Not that you're likely to be wasting many LEs in the process, or needing strict timing (it would be one ballbuster of a control logic system if it's dozens or hundreds of inputs and outputs, taking thousands of gates...), but it's more efficient in general to let the synthesizer handle it. So, express logic functions in a process, rather than as all the gates and signals that might be used to implement that function.
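To make the "function, not gates" point concrete, here's a toy comparison (in Python rather than HDL, and the gate decomposition is just one possible netlist) of a behavioral description against a hand-gated one. A synthesizer is free to implement the behavioral form however it likes, as long as the truth tables match:

```python
from itertools import product

def mux_behavioral(sel: int, a: int, b: int) -> int:
    """Spec-level description: 'output b when sel, else a'."""
    return b if sel else a

def mux_gates(sel: int, a: int, b: int) -> int:
    """Over-specified gate netlist: (a AND NOT sel) OR (b AND sel)."""
    return (a & (1 - sel)) | (b & sel)

# Exhaustively compare the two descriptions over all input combinations.
for sel, a, b in product([0, 1], repeat=3):
    assert mux_behavioral(sel, a, b) == mux_gates(sel, a, b)
print("behavioral and gate-level descriptions agree")
```

Specifying only the behavioral form leaves the synthesizer free to pack that mux, and its neighbors, into however few logic elements it can manage.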
A good intermediate is RTL (Register Transfer Level), using macro blocks (adders, latches, muxes, etc.) to implement things. These may in turn be made of collections of LEs, but may also be hardware accelerated ("DSP blocks", multipliers, block RAM, etc.). Allowing that flexibility lets the implementor or synthesizer make use of those resources, while keeping the design cycle-accurate to the specification. (DSP signal chains are a good example, as the latency between accumulators might need to be a single clock cycle, while running at >100MHz.) Or if some latency (pipelining) is impossible to avoid, perhaps that part of the specification can be relaxed, or another part can be modified to accommodate it.
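A cycle-accurate model of that kind of pipelined DSP chain can be prototyped in a few lines; the two-stage structure here (one register between multiply and accumulate) is an assumption for illustration, but it shows exactly the latency question a spec has to pin down or explicitly allow:

```python
# Two-stage pipelined multiply-accumulate, modelled cycle by cycle:
# stage 1 registers the product, stage 2 accumulates it. The result
# therefore lags the input by one cycle -- the sort of latency a spec
# should either require, forbid, or explicitly leave open.

def mac_run(samples, coeff):
    prod_reg = 0   # pipeline register between multiply and accumulate
    acc = 0
    trace = []
    for x in samples:
        acc += prod_reg          # stage 2: accumulate last cycle's product
        prod_reg = x * coeff     # stage 1: register this cycle's product
        trace.append(acc)
    return acc + prod_reg, trace  # flush the pipeline at the end

total, per_cycle = mac_run([1, 2, 3, 4], coeff=3)
print(total)  # 30: same result as an unpipelined sum, one cycle later
```

The `trace` list is the spec-level artifact: it says what value the accumulator holds on each clock, which is what "cycle-accurate to the specification" actually means in practice.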
So, a very good spec must consider all the possibilities. Doing this exactly is intractable for anything of modest scale; but that doesn't mean we must give up entirely. We might not be able to investigate every possible state (say, a 64-bit register holding nearly 2x10^19 possible states -- good luck testing that even at 1ns/state!), but the better we can describe large swaths of it, the better chance of complete success we will have. Example: define that every state must eventually return to an explicit state in the graph -- including prohibited and undefined states. The last thing you want is a disconnected graph, where operation becomes trapped in some error state and has to be reset or power cycled! (Such a state might be semantically impossible to reach, but that doesn't mean electrical interference or cosmic rays can't cause it.)
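That "no disconnected graph" rule is mechanically checkable on small state machines. A sketch, with invented states, that verifies every state -- including the explicit catch-alls for undefined encodings -- can eventually drain back to a recovery state:

```python
# Check that every state, including the ERROR/UNDEFINED catch-alls,
# can eventually reach a designated recovery state, so nothing gets
# trapped short of a reset or power cycle.

GRAPH = {  # state -> set of possible next states (invented example)
    "IDLE": {"RUN"},
    "RUN": {"IDLE", "ERROR"},
    "ERROR": {"IDLE"},       # error state drains back to IDLE
    "UNDEFINED": {"ERROR"},  # even 'impossible' encodings get an exit
}

def can_reach(start, target, graph):
    """Depth-first search: is target reachable from start?"""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s == target:
            return True
        if s not in seen:
            seen.add(s)
            stack.extend(graph.get(s, ()))
    return False

trapped = [s for s in GRAPH if not can_reach(s, "IDLE", GRAPH)]
print(trapped)  # [] -- no state is stranded
```

Delete the `"UNDEFINED": {"ERROR"}` edge and the check fails, which is exactly the cosmic-ray trap the spec rule is there to forbid.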
The same sorts of precautions that describe good software design apply here. Make functions pure if possible (no side effects, only strict inputs and outputs). Use few states, operands, etc., and test them when possible. Be aware of edge cases and design to them (for the most part, the bulk of the intended function is trivial; it's the extremes where it gets hard to reason about). Consider using formal proof tools to evaluate your logic: this isn't usually necessary for commercial stuff, but it's invaluable for high-reliability systems.
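When the operand space is small, "test them when possible" can literally mean all of it. As a sketch, here a hypothetical 4-bit saturating adder (a pure function, invented for illustration) is checked exhaustively, overflow corners included:

```python
from itertools import product

def sat_add4(a: int, b: int) -> int:
    """Pure 4-bit saturating add: clamps at 15 instead of wrapping."""
    return min(a + b, 15)

# Only 256 input pairs -- exhaustively check the whole function,
# paying particular attention to the saturation edge cases.
for a, b in product(range(16), repeat=2):
    out = sat_add4(a, b)
    assert 0 <= out <= 15                          # output always in range
    assert out == (a + b if a + b <= 15 else 15)   # matches the spec
print("all 256 cases pass")
```

A 64-bit datapath can't be swept like this, of course -- which is where describing large swaths of the state space, and the proof tools mentioned above, earn their keep.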
Tim