A BJT isn't really needed here, and I would prefer a MOSFET, because the base current requirement is fairly steep (several mA).
2N7002 is fine for such a load, it would seem. The '3055 is rather overkill, but very jellybean, so you can use it if you like. It should be adequate for Vgs(on) = 3V service, too.
Some notes on use of datasheets, and design with both types:
- The plateau isn't what you're after (although that's how you've phrased the question). It's merely a side effect of what you're after.
- Vgs(th) is where the transistor begins conducting (specifically, at the Id shown: 0.25mA). This has a spread due to manufacturing tolerance.
- All the curves are plotted for the average case, i.e., at typical Vgs(th). Referring to Fig.1 for example: the spread is roughly equivalent to an off-by-one error in which curve you're looking at. That is, the curve labeled 3.5V could behave like the 4.0V or the 3.0V curve on a given part. They don't give 2.5V, 2.0V and so on (which are probably nearly zero on this scale), so it's hard to say how much current it can actually handle under worst-case conditions (lowest Vgs(on) in your application, highest Vgs(th) of the transistor).
- It's also not simply one step lower, because transconductance drops sharply at low Vgs. This is illustrated by Fig.2, where the slope of the curve becomes quite shallow at low Vgs. But it's hard to tell how much, and the tempco is also quite exaggerated down there (it's not given in this datasheet, but a curve of Vgs(th) vs. temperature is sometimes seen). The effect is that the curves in Fig.1 get more closely packed as you go down in voltage, and current doesn't actually go to zero, it tapers off more gradually.
The Rds(on) curves don't show this either; they end at Id = 5A. (They're also mislabeled, which is dumb.)
- Overall, I expect the gain is high enough that you can reasonably count on handling much more than 100mA at Vgs(on) = 3V. That's only 1V above Vgs(th), but it's probably capable of 0.5A or more under this condition, too.
- The plateau occurs while the drain voltage is changing. It is due to Miller effect, i.e., Cdg multiplied by effective gain. You only have gain while the drain voltage is changing, and that's only while load current is going from somewhere near zero to somewhere near full load (Vds in voltage saturation). So Fig.8 only applies to the load current specified. The plateau will of course be shorter (less charge) at lower Vds, and sit at a lower Vgs for lower load current.
Note that Coss (Cds + Cdg) and Crss (Cdg) are very high at low drain voltages, so the plateau isn't proportionally shorter at lower voltages.
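To put rough numbers on the worst-case corner and the plateau, here's a small Python sketch. It assumes the plateau lasts about Qgd / Ig, with Ig = (Vdrive - Vplateau) / Rg held roughly constant while Vds swings. The function names and every number in it (3.0V minimum drive, 2.5V maximum Vgs(th), Qgd = 1nC, 2.8V plateau, 100 ohm gate resistance) are hypothetical placeholders, not 2N7002 datasheet values. Read Qgd off the gate charge curve at conditions close to yours, since it only applies at the Id and Vds it was measured at.

```python
# Sketch: worst-case gate overdrive and approximate Miller-plateau duration.
# All numeric values are hypothetical placeholders, not datasheet figures.


def worst_case_overdrive(v_drive_min: float, v_th_max: float) -> float:
    """Gate overdrive at the worst corner: lowest drive voltage minus highest Vgs(th).

    A small (or negative) result means the part may barely conduct,
    since transconductance collapses near threshold.
    """
    return v_drive_min - v_th_max


def plateau_time(q_gd: float, v_drive: float, v_plateau: float, r_gate: float) -> float:
    """Approximate time spent on the Miller plateau, in seconds.

    While Vds is swinging, Vgs sits near v_plateau, so the gate current is
    roughly constant: Ig = (v_drive - v_plateau) / r_gate. The plateau lasts
    as long as it takes to move the gate-drain charge: t = Qgd / Ig.
    """
    if v_drive <= v_plateau:
        raise ValueError("drive voltage must exceed the plateau voltage")
    i_gate = (v_drive - v_plateau) / r_gate
    return q_gd / i_gate


if __name__ == "__main__":
    # Hypothetical: 3.0 V minimum logic-high, 2.5 V maximum Vgs(th).
    print(f"worst-case overdrive: {worst_case_overdrive(3.0, 2.5):.2f} V")
    # Hypothetical: Qgd = 1 nC, 3.3 V drive, 2.8 V plateau, 100 ohm drive resistance.
    t = plateau_time(1e-9, 3.3, 2.8, 100.0)
    print(f"plateau duration: {t * 1e9:.0f} ns")
```

A small overdrive at the worst corner is exactly where the curves bunch up and the estimate gets shaky, so treat the result as a sanity check rather than a guarantee.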
BJTs:
- The standard rule of thumb is Ib(on) = Ic(max) / 10 or so. The 10 is the saturated (forced) hFE. It's "forced" because, when Vce is low, putting in more base current won't turn it on any further (Vce may drop slightly, that's it). So, for a forced hFE anywhere from about 0.5 up to 50, you can use whatever base current you like. hFE is still defined as the ratio Ic/Ib, so if Ic is independent of Ib, we can make (force) hFE whatever we want it to be.
- For real parts with hFE (linear) in the >100 range, I prefer hFE(sat) in the <30 range. This gives reasonable Vce(sat) (usually not the lowest possible, but also definitely not unsaturated), and doesn't store as much excess charge, so it turns off faster.
- For low-Vce(sat) types (e.g., PBSS303NX), hFE is fairly high to begin with, and it stays high even at low Vce and even at very high Ic (several amperes for this example). hFE(sat) = 80 is often found in their datasheets!
- Stored charge is the BJT equivalent of diode reverse recovery. The amount of excess charge stored depends on the forward base current, and the transistor doesn't turn off until that charge has been removed from the B-E junction. (It is for this reason that a BJT, like the MOSFET for the reasons just covered, can be more accurately considered a charge-controlled device rather than a current-controlled one.) If left alone, this takes 10+ microseconds on its own. (The B-E junction can be modeled as a very small, very leaky battery: complete with exponential V(I) and charge(V) dependency, for very similar reasons, actually!)
- For this reason, normally the base is driven through a voltage divider, so that there is a path to discharge the base. Design the divider for a Thevenin equivalent output "on" voltage of 1.2 to 2V, "off" of less than 0.3V, and a Thevenin equivalent resistance adequate to deliver the required base current.
Or to put that another way: supply, say, 1.5 Ib of turn-on current, and put a resistor Rbe = Vbe / (0.5 Ib) across B-E. That way you get about Ib of base current to turn on and keep it on, and about 0.5 Ib of reverse current to turn it off. This will give turn-off times of 50-500ns, a significant improvement.
If you aren't pressed for time, and have a low logic voltage to begin with (like 2.5V CMOS), simply using one resistor will suffice. The turn-off current flows through the same resistor, there's just less of it than with the divider.
3V CMOS is on the edge of where I'd say "yeah just put in a B-E resistor" versus shrugging it off. YMMV.
- If you do follow tradition and use hFE(sat) = 10, then you'll need quite a lot of base current (10mA for a 100mA load!), which may even need a buffer. (Not all logic is created equal: some pin drivers are only rated for a few mA DC. FPGAs are often limited this way.)
- This is also an efficiency hit (10% increased current consumption!), particularly for variable loads that may be lighter (so, not your relay coil, but suppose this were a general switch output that someone might put just a few mA of LEDs on, or a full 100mA coil, or anything in between; such a waste unless it's actually needed, isn't it?).
- It is for these reasons that I would also prefer a MOSFET. Basically no DC gate current, so it's fine all around.
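The forced-hFE arithmetic above can be sketched in a few lines of Python. It only computes Ib = Ic / hFE(forced) and the extra supply current that base drive costs as a fraction of the load, for the 100mA load discussed here at forced hFE values of 10, 30 and 80. The function names are made up for illustration; the 10% overhead at hFE(sat) = 10 falls straight out of 1/hFE.

```python
# Sketch: base current and drive overhead for a chosen forced hFE.
# Function names are hypothetical; the load current is the 100 mA example.


def base_current(i_c: float, hfe_forced: float) -> float:
    """Base current needed to hold the transistor saturated at collector current i_c."""
    if hfe_forced <= 0:
        raise ValueError("forced hFE must be positive")
    return i_c / hfe_forced


def drive_overhead(hfe_forced: float) -> float:
    """Base current as a fraction of collector current (extra supply current drawn)."""
    if hfe_forced <= 0:
        raise ValueError("forced hFE must be positive")
    return 1.0 / hfe_forced


if __name__ == "__main__":
    i_c = 0.100  # 100 mA load, e.g. a relay coil
    for hfe in (10, 30, 80):
        ib = base_current(i_c, hfe)
        print(f"forced hFE {hfe:>2}: Ib = {ib * 1e3:.2f} mA, "
              f"overhead = {drive_overhead(hfe) * 100:.1f} %")
```

Going from hFE(sat) = 10 to 30 cuts both the pin current and the overhead by a factor of three, which is the whole argument for not over-driving the base when the part has gain to spare.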
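The 1.5 Ib / 0.5 Ib rule for the base divider can be worked the same way. This is a rough Python sketch, assuming Vbe = 0.7V (a placeholder; check the Vbe(sat) curve at your Ib), a 3.3V logic-high, and a forced hFE of 20 for the 100mA load, i.e. Ib = 5mA. The function and field names are hypothetical. It also reports the Thevenin equivalent of the divider, so you can check it against the 1.2 to 2V "on" target.

```python
# Sketch: size the series base resistor (Rb) and B-E resistor (Rbe)
# using the 1.5*Ib supplied / 0.5*Ib shunted rule of thumb.
# Vbe = 0.7 V and all example values are assumptions, not datasheet figures.
from dataclasses import dataclass


@dataclass
class BaseDrive:
    r_series: float    # resistor from logic pin to base (ohms)
    r_be: float        # resistor from base to emitter (ohms)
    v_thevenin: float  # open-circuit "on" voltage at the base node (volts)
    r_thevenin: float  # Thevenin resistance seen by the base (ohms)


def design_base_drive(v_logic: float, i_b: float, v_be: float = 0.7) -> BaseDrive:
    """Pin supplies 1.5*Ib through Rb; Rbe carries 0.5*Ib at Vbe, leaving ~Ib for the base.

    With the pin held low, Rbe gives the stored base charge a discharge path.
    """
    if v_logic <= v_be:
        raise ValueError("logic high must exceed Vbe")
    r_be = v_be / (0.5 * i_b)
    r_series = (v_logic - v_be) / (1.5 * i_b)
    # Thevenin equivalent of the Rb/Rbe divider with the pin at v_logic.
    v_th = v_logic * r_be / (r_series + r_be)
    r_th = r_series * r_be / (r_series + r_be)
    return BaseDrive(r_series, r_be, v_th, r_th)


if __name__ == "__main__":
    # Hypothetical: 3.3 V logic, 100 mA load, forced hFE of 20 -> Ib = 5 mA.
    d = design_base_drive(3.3, 0.100 / 20)
    print(f"Rb = {d.r_series:.0f} ohm, Rbe = {d.r_be:.0f} ohm")
    print(f"Thevenin: {d.v_thevenin:.2f} V through {d.r_thevenin:.0f} ohm")
```

Under these assumptions it lands around Rb of roughly 347 ohm and Rbe of 280 ohm, with a Thevenin voltage near 1.5V, inside the 1.2 to 2V window; nearby standard values such as 330 and 270 ohm would be a reasonable starting point.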
Tim