A case in point is a recent game that went 1.e4 e5 2.f4 exf4 3.Nf3 d6 4.Bc4 h6 5.O-O. The book move order is 5.d4 g5 6.O-O Bg7. So I had to stop and think about "what weakness does 5.O-O have that 5.d4 does not?"
My thinking went: 1) the pawn on f4 is not attacked, so ...g5 can be postponed, and 2) I got attracted to ...c6 because 2a) it supports a later ...d5, blunting the bishop, and 2b) it opens up a nice ...Qb6+ possibility.
Stockfish says no: 5...g5 is best, with 5...Nc6 a close second, and 5...c6 only fifth, behind even ...Nd7 and ...a6! To be fair to myself, none of these moves is horrible, and all score as Black maintaining the edge. So what is Stockfish saying, that 5.O-O g5 6.d4 transposes to 5.d4 g5 6.O-O? Not at all; Stockfish seems to be saying, counter to theory, that both 5.d4 and 5.O-O are inferior for White!
This needs to be investigated. Let's see what Stockfish thinks White should be doing after 1.e4 e5 2.f4 exf4 3.Nf3 d6 4.Bc4 h6. Both 5.h4 and 5.b3 show up. Interesting. Both interfere with Black's fianchetto plan on g7. Actually 5.h4 is also theory, but I thought 5...Be7 and 6...Nf6 refuted it. Let's see what Stockfish thinks.
Black manages to keep the extra pawn, but with an advantage of less than one pawn, in the liquidation line 5.h4 Be7 6.d4 Nf6 7.Nc3 Nh5 8.Ne2 Bg4. Basically Black will take on h4 (with check) while White will take on f4.
But there is a larger question here. What is the full range of "playable" move orders (for White)? I've started an experimental compilation of such "playable" lines, with some strict criteria to keep it tractable. This should produce a comprehensive opening book of sorts, which should be interesting to compare with "known"/published theory. The criteria I'm starting with: 1) only White gets variations, on moves 3 through 9 (after 1.e4 e5 2.f4 exf4), i.e. only a single Black response (selected by me) to each White variation; 2) at most 3 White variations at each move, taken from the top of Stockfish's evaluations regardless of "existing theory"; 3) any White move that evaluates more than 0.5 worse than the top move is cut.
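Criteria 2) and 3) amount to a small filter over the engine's move list. Here is a minimal sketch in Python, assuming the evaluations have already been collected by hand into a dictionary of centipawn scores from White's point of view (so the 0.5 cutoff becomes 50). The function name, the parameter names and the sample numbers are all made up for illustration; none of them are Stockfish output.

```python
def select_white_candidates(evals_cp, max_moves=3, max_gap_cp=50):
    """Pick the White moves that stay in the book.

    evals_cp:   dict mapping a move label to its evaluation in centipawns,
                from White's point of view (higher is better for White).
    max_moves:  criterion 2), at most this many variations per position.
    max_gap_cp: criterion 3), cut moves more than this much worse than the
                top move (50 centipawns = 0.5 pawns).
    """
    if not evals_cp:
        return []
    # Best move for White first.
    ranked = sorted(evals_cp.items(), key=lambda kv: kv[1], reverse=True)
    best_score = ranked[0][1]
    # Criterion 3): keep only moves within the allowed gap of the top move.
    kept = [move for move, score in ranked if best_score - score <= max_gap_cp]
    # Criterion 2): keep at most max_moves of those.
    return kept[:max_moves]


# Hypothetical evaluations, NOT real engine output.
example = {"move_a": -30, "move_b": -45, "move_c": -70, "move_d": -85, "move_e": -20}
print(select_white_candidates(example))  # ['move_e', 'move_a', 'move_b']
```

Here move_c survives the 0.5 cutoff (exactly 50 centipawns behind the top move) but is dropped by the limit of three, while move_d fails the cutoff outright.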
The reason for "1)" is that I'm only interested in this as a "repertoire for Black" (and to reduce combinatorial explosion). The reason for "2)" is of course to limit combinatorial explosion, while still allowing some minimal required variation for interest. The reason for "3)" is to further restrict explosion by assuming "bad" moves (and their exploitation) would be more easily found over the board, so not needed in "book". That last assumption is pretty tenuous, especially for players at my level. Nevertheless, it makes for good puzzles/challenges as to why move X got excluded.
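To put a rough number on the "combinatorial explosion", here is a back-of-the-envelope sketch in Python. It assumes every position gets the full 3 White candidates (an upper bound, since the 0.5 cutoff will prune some) and ignores transpositions. The alternative with 3 Black replies per move is purely a hypothetical comparison, not something the project does.

```python
# White varies on moves 3 through 9 inclusive, i.e. 7 moves.
WHITE_MOVES = range(3, 10)


def max_lines(white_branching, black_branching, n_moves):
    """Upper bound on the number of distinct lines after n_moves full moves,
    if every position offers the given number of choices to each side."""
    return (white_branching * black_branching) ** n_moves


n = len(WHITE_MOVES)

# The project's setup: up to 3 White moves, exactly 1 Black reply.
print(max_lines(3, 1, n))  # 2187

# Hypothetical comparison: also allowing 3 Black replies at each move.
print(max_lines(3, 3, n))  # 4782969
```

So a single Black reply keeps the worst case at a couple of thousand lines, instead of several million.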
Caveat: Of course the Stockfish evaluations are going to vary, possibly greatly, depending on 1) how deep I let Stockfish go, 2) how much hash table is used (memory of previous evaluations), 3) how manual probing changes the population of the hash table, and so on. The purpose of the project is just to produce one experimental benchmark, of reasonable quality but in a reasonable amount of time, for possible critique and/or refinement later. I am of course assuming that Stockfish gives "reasonable" evaluations even in the opening; I think that is valid at this stage of engine power. I expect that a project like this will identify at least as many valid lines that have never been considered "theory" as lines that theory has already validly refuted.
I hope to give examples in a following post.