
flag_operations_are_free

Differences

This shows you the differences between two versions of the page.

flag_operations_are_free [2026/01/11 16:21] appledog
flag_operations_are_free [2026/01/11 16:30] (current) appledog
Line 15:
 Here's LDAL-1:
 
-<codify nasm>
+<codify armasm>
 ; Test program: 1 million LDAL [$1000] operations
 ; Uses CD (32-bit "count down" counter register)
Line 67:
 * OVERFLOW_FLAG = value === 0x8000;
  
-That is a significant amount of flags. Having this make no impact whatsoever was surprising, so I removed the IF statements blocking these flags on DEC. This produces LDAL-2b, which surprised me by getting again the exact same 2.1 MIPS. So, over 2 million if statements wasn't moving the needle? That felt strange.
+That is a significant amount of code to remove, yet ONE compare op was killing performance. Seeing no impact whatsoever was surprising, so I removed the IF statements blocking these flags on DEC. This produced LDAL-2b, which surprised me by again scoring exactly the same 2.1 MIPS. So over 2 million if statements AND two million runs of the five lines of code above weren't moving the needle? Wow.
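A minimal sketch of what a flag fence on DEC might look like. Only the OVERFLOW_FLAG expression comes from the list above; the `CpuState` shape, the zero and negative flags, the 16-bit wrap, and the `flagsEnabled` switch (standing in for SEF/CLF mode) are assumptions made for illustration, not the emulator's actual code.

```typescript
// Hypothetical CPU state; field names are illustrative only.
interface CpuState {
  reg: number;           // a 16-bit register value (assumed width)
  zeroFlag: boolean;     // assumed flag
  negativeFlag: boolean; // assumed flag
  overflowFlag: boolean;
  flagsEnabled: boolean; // assumed stand-in for SEF (true) / CLF (false)
}

function dec(cpu: CpuState): void {
  const value = (cpu.reg - 1) & 0xffff; // wrap to 16 bits
  cpu.reg = value;
  // The "fence": one IF that skips all flag bookkeeping when flags are off.
  if (cpu.flagsEnabled) {
    cpu.zeroFlag = value === 0;
    cpu.negativeFlag = (value & 0x8000) !== 0;
    cpu.overflowFlag = value === 0x8000; // expression taken from the list above
  }
}
```

The fence costs one predictable branch per instruction, and the guarded body is a handful of compares and stores, which fits the observation that adding or removing it barely changes the score.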
-
-I replaced the flag fences and I created LDAL-3; this time, I had only 100,000 execution cycles, but 10 copies of LDAL. My heart lept when I saw the score; 7.55 MIPS! This meant that LDAL was executing much faster than the other instructions. I immediately created LDAL-4 which had 1,000 lines of LDAL and loaded CD with 1 million. The goal was simple: execute 1 billion LDAL instructions and time the result. The results were spectacular. 78 MIPS. I did try with CMP,0 and SEF mode, and it was slower (73 MIPS). The immediate conclusion is that SEF mode was useless. CMP was dragging everything down. But I didn't know why.
-
-For the record, I created versions which used LDA and LDAB
 
+I replaced the flag fences and created LDAL-3; this time, I had 100,000 runs of 10 LDAL operations. My heart leapt for joy when I saw the score: 7.55 MIPS! This meant that LDAL was executing much faster than the other instructions. I immediately created LDAL-4, which had 1,000 lines of LDAL and loaded CD with 1 million. The goal was simple: execute 1 billion LDAL instructions and time the result. The results were spectacular: 78 MIPS. I did try with CMP,0 and SEF mode, and it was slower (73 MIPS). The immediate conclusion was that SEF mode was useless. CMP was dragging everything down. But I didn't know why.
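The arithmetic behind the LDAL-4 numbers can be sketched as follows. The function and variable names are made up for this example, and the loop-overhead instructions (the CD countdown and branch) are ignored, so the times are only what the scores imply for the LDAL instructions alone.

```typescript
// Million instructions per second for `instructions` executed in `seconds`.
function mips(instructions: number, seconds: number): number {
  if (seconds <= 0) throw new RangeError("seconds must be positive");
  return instructions / seconds / 1e6;
}

// Wall-clock seconds implied by a MIPS score for a given instruction count.
function secondsAt(score: number, instructions: number): number {
  return instructions / (score * 1e6);
}

// LDAL-4 as described: 1,000 LDAL lines, loop counter CD loaded with 1 million.
const ldalPerPass = 1000;
const passes = 1000000;
const totalLdal = ldalPerPass * passes; // 1 billion LDAL instructions

console.log(secondsAt(78, totalLdal).toFixed(2)); // seconds implied by 78 MIPS
console.log(secondsAt(73, totalLdal).toFixed(2)); // seconds implied by 73 MIPS
```

Under those assumptions, 78 MIPS corresponds to roughly 12.8 seconds for the billion LDALs and 73 MIPS to roughly 13.7 seconds, so the CMP variant costs a little under a second over the whole run.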
  
-78 MIPS With SEF & CMP CD
-73 MIPS With CLF & no CMP
+I experimented with some other LD instructions. It turned out that LDBLX and LDAB were extremely slow, just as slow as CMP. I once again tested CMP with and without SEF/CLF just to confirm: yes, one CMP operation was many times slower than millions of incidental flag checks. Adding a CMP lowered the score to 73 MIPS, and removing it got us over 78.
  
 +The final conclusion was that my memory system was not optimized. One of the major issues was that I was creating an array in WebAssembly on every register access. I moved that out of the loop and saw MIPS return to normal. In fact, it was better than normal: for ordinary load and store operations I was at 55 MIPS.
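The allocation problem can be illustrated with a small sketch. This is a TypeScript analogue only: the actual code is on the WebAssembly side and is not shown here, so `REGISTER_COUNT`, `readRegisterAllocating`, and `RegisterFile` are hypothetical names chosen for the example.

```typescript
const REGISTER_COUNT = 16; // hypothetical register file size

// Slow pattern: a fresh typed-array view is allocated on every access.
function readRegisterAllocating(bytes: Uint8Array, index: number): number {
  const view = new Uint32Array(bytes.buffer, bytes.byteOffset, REGISTER_COUNT);
  return view[index];
}

// Faster pattern: build the view once, outside the hot loop, and reuse it.
class RegisterFile {
  private readonly view: Uint32Array;

  constructor(bytes: Uint8Array) {
    this.view = new Uint32Array(bytes.buffer, bytes.byteOffset, REGISTER_COUNT);
  }

  read(index: number): number {
    return this.view[index];
  }

  write(index: number, value: number): void {
    this.view[index] = value >>> 0; // keep the value in unsigned 32-bit range
  }
}

const backing = new Uint8Array(REGISTER_COUNT * 4);
const regs = new RegisterFile(backing);
regs.write(3, 0x1000);
console.log(regs.read(3) === readRegisterAllocating(backing, 3)); // both paths read the same bytes
```

Both paths return the same value; the difference is that the first creates a short-lived object per instruction, which at tens of millions of instructions per second means tens of millions of allocations for the runtime to clean up.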
  
  
  
  
flag_operations_are_free.txt · Last modified: by appledog
