Developers measure the effectiveness of their test suites using metrics like block (or line) coverage, which indicates what percentage of a program's basic blocks (or lines) were cumulatively executed by the test suite. Such simple metrics, however, are flawed in multiple ways (e.g., even if every line of a program is executed, many potentially buggy paths through those lines remain untested).
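To make the flaw concrete, here is a minimal (hypothetical) example: a test suite that executes 100% of the lines of a function while still missing a crashing path, because line coverage does not require exercising every *combination* of branches.

```python
def f(a, b):
    x = 0
    if a:
        x = 1
    if b:
        return 10 / x   # divides by zero when a is False and b is True
    return x

# These two tests together execute every line of f (100% line coverage):
assert f(True, True) == 10    # takes both branches
assert f(False, False) == 0   # takes neither branch

# ...yet the path (a=False, b=True) was never tested, and it crashes:
try:
    f(False, True)
except ZeroDivisionError:
    pass  # latent bug, invisible to line coverage
```

With two independent branches there are four paths; the two tests above cover all lines but only two of the four paths.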
Fuzz testers try out various points in the input space. Symbolic execution is a little more clever: it builds constraints that characterize entire classes of executions, effectively partitioning the program's input (or output) space into equivalence classes of behavior.
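The partitioning idea can be sketched in a few lines. This is not a real symbolic executor; the branch predicates and the brute-force witness search (standing in for an SMT solver) are illustrative assumptions, but the structure mirrors what symbolic execution does: one path condition per branch outcome, one representative input per satisfiable class.

```python
from itertools import product

# Toy program under analysis:
#   if x > 10:     ...   (branch P)
#   if x % 2 == 0: ...   (branch Q)
# A symbolic executor would derive one constraint per path; here we
# hand-write the branch predicates and, instead of calling an SMT
# solver, brute-force one concrete witness per path condition.
branches = [lambda x: x > 10, lambda x: x % 2 == 0]

def path_condition(outcomes):
    """Conjunction fixing each branch to the given True/False outcome."""
    return lambda x: all(b(x) == o for b, o in zip(branches, outcomes))

witnesses = {}
for outcomes in product([True, False], repeat=len(branches)):
    cond = path_condition(outcomes)
    # One representative input suffices for the whole equivalence class.
    witnesses[outcomes] = next(x for x in range(-50, 50) if cond(x))

# Four paths -> four equivalence classes -> four test inputs, versus
# the unbounded number of concrete points a fuzzer might sample.
```

Running all four witnesses exercises every feasible path of the toy program, which is exactly the stronger guarantee that plain line coverage fails to provide.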
Please define a smarter quality metric for tests that addresses the shortcomings of block/line coverage, perhaps leveraging lessons from symbolic execution. Support your argument that this metric is better with quantitative evidence or reasoning.