Thinking about this a bit further, this is probably due to the heavy STF element.
Here is how I thought about it:
There is a manual way to simulate an STF element: make multiple exposures, using a different aperture each time. You can also slowly close the aperture during a single exposure to get the same effect. The movement (and therefore the exposure time at each aperture), however, is not linear. You have to stay (and therefore expose) at the smaller apertures much longer, so the majority of the final exposure comes from those smaller apertures. The Minolta 7 (film) had this feature built in, but you can simulate it if you have an aperture ring or a camera capable of multiple exposures. Or simply in Photoshop.
So, that's what the STF element is doing. It's therefore quite natural that you hit the diffraction limit rather early, and this is not due to the distance between the STF element and the aperture, but due to the (rather) heavy STF element.
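To put rough numbers on the claim that most of the exposure comes from the smaller apertures, here is a minimal sketch. It uses the click stops and dwell times from the contest quote in this post, and it assumes (my simplification, not something stated there) that the light collected per second scales as 1/N² for f-number N, with the nominal f-numbers taken at face value. The variable names are just for illustration.

```python
# Rough estimate: which apertures contribute most of the light in a
# stop-down-during-exposure STF simulation?
# Assumptions (not from the post itself): light per second scales as 1/N^2,
# and the nominal f-numbers are used as-is.

# Click stops and dwell times taken from the contest quote in this post.
f_numbers = [3.5, 5.6, 8, 11, 16, 22]
dwell_s = [0.03, 0.3, 1.6, 4.8, 9.7, 13.5]

# Relative light collected at each stop: time * aperture area ~ t / N^2.
light = [t / n**2 for t, n in zip(dwell_s, f_numbers)]
total_light = sum(light)
total_time = sum(dwell_s)

for n, t, l in zip(f_numbers, dwell_s, light):
    print(f"f/{n:<4} time {t / total_time:6.1%}   light {l / total_light:6.1%}")

# Share contributed by f/11 and smaller apertures.
small = [i for i, n in enumerate(f_numbers) if n >= 11]
small_light_share = sum(light[i] for i in small) / total_light
small_time_share = sum(dwell_s[i] for i in small) / total_time
print(f"f/11 and smaller: {small_time_share:.0%} of the time, "
      f"{small_light_share:.0%} of the light")
```

Under these assumptions, f/11 and smaller account for well over 90% of the exposure time and roughly three quarters of the collected light, which fits the idea that the result is dominated by the small apertures.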
Below is a quote from a guy who did that STF simulation for the Lensrentals Photo Geek Contest 2013:
I don’t have a great Bokehlicious lens, but I can turn a normal lens into one! The Olympus OM 50mm f/3.5 Macro lens has a 6-bladed aperture with non-rounded blades, giving a relatively uniform bokeh shape. However, a Gaussian is often considered the ideal bokeh shape. To make a more Gaussian blur, I used an exposure time of 30 seconds and closed down the aperture slowly from f/3.5 to 22 during the exposure. This has the effect of exposing the inner area of the bokeh longer than the outer areas, making it brighter in the center. The lens itself has aperture click-stops at f/3.5, 5.6, 8, 11, 16, and 22. I used values from Pascal’s triangle to determine the exposure time at each click stop, which respectively were 0.03, 0.3, 1.6, 4.8, 9.7, and 13.5 seconds. The first two steps I just did as quickly as possible. The approximately Gaussian bokeh is clearly much smoother than when using the constant aperture, and in my opinion quite beautiful.
Sadly, the photo is gone, but Roger Cicala probably has it stored somewhere.
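The quote does not say which row of Pascal's triangle was used. As far as I can tell, the six times match the first six entries of row 11 (1, 11, 55, 165, 330, 462), scaled so that they add up to the 30 second exposure. That is my reading of the numbers, not something the quote states. A short sketch under that assumption (`ROW` and `TOTAL_S` are names I made up for illustration):

```python
from math import comb

# Assumption: the quoted times look like the first six entries of
# row 11 of Pascal's triangle, scaled to a 30 s exposure.
TOTAL_S = 30.0
ROW = 11
stops = [3.5, 5.6, 8, 11, 16, 22]

# Binomial coefficients C(11, 0) .. C(11, 5): 1, 11, 55, 165, 330, 462.
weights = [comb(ROW, k) for k in range(len(stops))]

# Scale the weights so the dwell times sum to the total exposure.
times = [TOTAL_S * w / sum(weights) for w in weights]

for n, t in zip(stops, times):
    print(f"f/{n}: {t:.2f} s")
```

Binomial coefficients approximate a Gaussian, which is presumably why they were chosen for the weights. The computed values agree with the quoted 0.03, 0.3, 1.6, 4.8, 9.7, and 13.5 seconds to within rounding.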
Also, having a heavier transition at that 2.0 EV effect would make it worse.