After reviewing my settings, I found a major flaw: I had used a prescaler of 1. One might think this means the clock frequency is divided by 1, i.e. that the clock to the external flash is the same as the main clock set in the MCU. But this is not true! The prescaler is counted from 0, so 0 gives a division by 1 and a prescaler of 1 divides the clock by 2. So a simple change from 1 to 0 doubled my speed. I also turned on DTR so that both edges of the clock are used for transferring data, and now it looks much better. I still have a small drawing error: it misses one or two pixels at the beginning, and I believe I need to tweak the timings a little more to get rid of it.
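To make the off-by-one concrete, here is a minimal sketch in C of how the prescaler and DTR affect the flash clock and the peak data rate. It assumes the divide-by-(N + 1) encoding described above. The function names are made up for illustration and are not part of any vendor library, and the peak rate ignores command, address and dummy cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: a prescaler register value of N divides the
 * kernel clock by N + 1, so 0 means "no division". */
uint32_t spi_clock_hz(uint32_t kernel_hz, uint8_t prescaler)
{
    return kernel_hz / ((uint32_t)prescaler + 1u);
}

/* Hypothetical helper: theoretical peak payload rate in bytes per second.
 * Each data line moves one bit per clock cycle, or two bits per cycle
 * when DTR is on (data is sampled on both clock edges). */
uint64_t peak_bytes_per_second(uint32_t clock_hz, uint8_t data_lines, bool dtr)
{
    /* Only the usual single, dual, quad and octal widths are accepted. */
    assert(data_lines == 1 || data_lines == 2 ||
           data_lines == 4 || data_lines == 8);

    uint64_t bits_per_second =
        (uint64_t)clock_hz * data_lines * (dtr ? 2u : 1u);
    return bits_per_second / 8u;
}
```

With an example kernel clock of 100 MHz (a made-up number, not my actual setting), spi_clock_hz(100000000, 1) gives 50 MHz while spi_clock_hz(100000000, 0) gives the full 100 MHz. Turning on DTR doubles the peak rate once more.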
I did a quick comparison between the different memory types to get a better feeling for the speed.
I used the same bitmap as in the last blog post (i.e. 32 x 32 in ARGB8888) but copied the image from different memories to the framebuffer.
As you can see in my measurements, there are no big surprises. The internal RAM is the quickest, followed by the internal flash and finally the external flash. But 36.8 us is a lot better than 112 us!
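To put those two numbers in perspective, here is a small C sketch that turns a copy time into an effective throughput. It assumes both times are for copying the whole 32 x 32 ARGB8888 bitmap (4 bytes per pixel, so 4096 bytes). The function names are made up for illustration.

```c
#include <stdint.h>

/* Size in bytes of a bitmap with the given dimensions and pixel size. */
uint32_t bitmap_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Effective throughput in MB/s (1 MB = 10^6 bytes).
 * Bytes per microsecond is numerically the same as megabytes per second. */
double throughput_mb_per_s(uint32_t bytes, double copy_time_us)
{
    return (double)bytes / copy_time_us;
}
```

For 4096 bytes this works out to roughly 111 MB/s at 36.8 us against roughly 37 MB/s at 112 us, so the copy is about three times faster.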