One thing I realized: most of my comparisons (software vs D2D, with or without ClearType) have been done with a standalone app at a 200% DPI setting, and with a scalable UI (i.e. a scale transform on the main component based on window size).
For some reason, VST3 does not seem to suffer from the same problems in this context, or at least not to the same extent: software and D2D are much closer.
Even more surprising: D2D SVG rendering also looks worse in standalone mode, whereas in VST3 mode it is pretty similar to (and even better than) software.
Could this be due to the UI scale (both the affine transform and the OS DPI) not being interpreted correctly in standalone?
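
One way to check this would be to log both scale factors in each mode and compare the resulting effective scale. Here is a rough sketch in plain C++ of what I mean. It makes no framework calls: `ScaleInfo` and the helper functions are names I made up for illustration, and it assumes the scale is uniform and that the OS DPI factor and the transform factor simply multiply.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type, not part of any framework API.
struct ScaleInfo
{
    double osDpiScale;      // e.g. 2.0 for a 200% OS DPI setting
    double transformScale;  // uniform scale of the affine transform on the main component
};

// Combined factor from logical units to physical pixels,
// assuming the two factors simply multiply.
inline double effectiveScale (const ScaleInfo& s)
{
    assert (s.osDpiScale > 0.0 && s.transformScale > 0.0);
    return s.osDpiScale * s.transformScale;
}

// Size in physical pixels of a length given in logical units.
inline double physicalPixels (double logicalLength, const ScaleInfo& s)
{
    return logicalLength * effectiveScale (s);
}

// True when two contexts (e.g. standalone and VST3) end up
// rasterising at the same effective scale, within a tolerance.
inline bool sameEffectiveScale (const ScaleInfo& a, const ScaleInfo& b,
                                double tolerance = 1e-6)
{
    return std::fabs (effectiveScale (a) - effectiveScale (b)) <= tolerance;
}
```

If the values measured in standalone and in VST3 give `sameEffectiveScale(...) == false` for the same window size, that would point to one of the two factors being dropped or applied twice in one of the modes.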