flag_operations_are_free
Differences
This shows you the differences between two versions of the page.
| flag_operations_are_free [2026/01/11 15:41] – appledog | flag_operations_are_free [2026/01/11 23:19] (current) – appledog | ||
|---|---|---|---|
| Line 15: | Line 15: | ||
| Here's LDAL-1: | Here's LDAL-1: | ||
| - | <codeprism | + | <codify |
| ; Test program: 1 million LDAL [$1000] operations | ; Test program: 1 million LDAL [$1000] operations | ||
| ; Uses CD (32-bit "count down" counter register) | ; Uses CD (32-bit "count down" counter register) | ||
| Line 31: | Line 31: | ||
| HALT ; Halt when done | HALT ; Halt when done | ||
| - | </codeprism> | + | </codify> |
| This is a pretty simple take on a simple concept: execute 1 million LDAL operations and see what happens. The result was a MIPS score of 1.85. I became depressed. How had my beautiful CPU become so slow? Just a few weeks ago it was pulling over 60 MIPS. Now, it was showing scores that didn't make sense. | This is a pretty simple take on a simple concept: execute 1 million LDAL operations and see what happens. The result was a MIPS score of 1.85. I became depressed. How had my beautiful CPU become so slow? Just a few weeks ago it was pulling over 60 MIPS. Now, it was showing scores that didn't make sense. | ||
| Line 41: | Line 41: | ||
| Next I moved to LDAL-3 where I removed the CMP since it was no longer needed: | Next I moved to LDAL-3 where I removed the CMP since it was no longer needed: | ||
| - | <codeprism asm6502> | + | <codify armasm> |
| ; Test program: 1 million LDAL [$1000] operations | ; Test program: 1 million LDAL [$1000] operations | ||
| ; Uses CD (32-bit "count down" counter register) | ; Uses CD (32-bit "count down" counter register) | ||
| Line 57: | Line 57: | ||
| HALT ; Halt when done | HALT ; Halt when done | ||
| - | </code> | + | </codify> |
| Now this was a real eye-opener. Removing the explicit check and keeping the flag ops ON resulted in a MIPS score of 2.1! Well now, this was surprising but not entirely unexpected. Well, no, it was unexpected. Removing flag operations for LD and DEC is significant as they are both being executed 1 million times each. Here's the code that we're talking about: | Now this was a real eye-opener. Removing the explicit check and keeping the flag ops ON resulted in a MIPS score of 2.1! Well now, this was surprising but not entirely unexpected. Well, no, it was unexpected. Removing flag operations for LD and DEC is significant as they are both being executed 1 million times each. Here's the code that we're talking about: | ||
| Line 67: | Line 67: | ||
| * OVERFLOW_FLAG = value === 0x8000; | * OVERFLOW_FLAG = value === 0x8000; | ||
| - | That is a significant amount of flags. Having this make no impact whatsoever was surprising, so I removed the IF statements blocking these flags on DEC. This produces LDAL-2b, which surprised me by getting again the exact same 2.1 MIPS. So, over 2 million if statements wasn't moving the needle? | + | That is a significant amount of code to remove, but ONE compare op was killing it. Having this make no impact whatsoever was surprising, so I removed the IF statements blocking these flags on DEC. This produces LDAL-2b, which surprised me by getting again the exact same 2.1 MIPS. So, over 2 million if statements |
| - | I replaced the flag fences and I created LDAL-3; this time, I had only 100, | + | I replaced the flag fences and I created LDAL-3; this time, I had 100, |
| - | For the record, | + | I experimented with some other LD instructions. It turned out that LDBLX and LDAB were extremely slow, and when put into an unrolled loop would drop to under 10 MIPS. | ||
| + | <codify armasm> | ||
| + | ; Test program: 1,000 LDAB [$1000] operations | ||
| + | ; Uses CD (32-bit "count down" counter register) | ||
| + | |||
| + | .address $010100 | ||
| + | |||
| + | LDCD #1000 ; Load 1,000 into CD (0x989680 is 10 mil) | ||
| + | |||
| + | loop: | ||
| + | LDAB [$1000] | ||
| + | DEC CD ; Decrement CD | ||
| + | CMP CD, 0 | ||
| + | JNZ loop ; Jump to loop if CD != 0 | ||
| + | |||
| + | HALT ; Halt when done | ||
| + | </codify> | ||
| - | 78 MIPS With SEF & CMP CD, 0 | + | The final conclusion was that my memory system was not optimized. One of the major issues was that I was creating an array in WebAssembly on every register access. I moved that out of the loop and inlined memory access directly into the LD/ST instructions. That brought | ||
| - | 73 MIPS With CLF & no CMP | + | |
| + | The turbo boost over and above this was batching all the reads to the start of the opcode handler and then masking down depending on how we needed to access the registers. In closing, here's the 87.5 MIPS version of LDA [$addr]: | ||
| + | <codify armasm> | ||
| + | case OP.LD_MEM: { | ||
| + | // Load reg (1 byte) + addr (3 bytes) = 4 bytes total | ||
| + | let instruction = load< | ||
| + | let reg:u8 = instruction as u8; // Extract low byte | ||
| + | let addr = (instruction >> 8) & 0x00FFFFFF; | ||
| + | // Pre-load 32 bits from target address | ||
| + | let value = load< | ||
| + | let reg_index = reg & 0x0F; // Extract physical register 0-15 | ||
| + | IP += 4; | ||
| + | if (reg < 16) { | ||
| + | set_register_16bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x8000) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } else if (reg < 48) { | ||
| + | set_register_8bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x80) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } else if (reg < 80) { | ||
| + | set_register_24bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x800000) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } else { | ||
| + | set_register_32bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x80000000) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } | ||
| + | break; | ||
| + | } | ||
| + | </codify> | ||
| + | For more information, please contact Appledog. | ||
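
The "read once, then mask down" idea from the final handler can be sketched in plain TypeScript. This is only an illustration under stated assumptions, not the emulator's actual code: `RAM`, `widthBitsFor`, and `loadMasked` are hypothetical names, memory is assumed to be little-endian, and the flags are assumed to be computed on the masked value. The register ranges (below 16, below 48, below 80, otherwise) are taken from the handler above.

```typescript
// Hypothetical sketch: RAM, widthBitsFor and loadMasked are illustrative
// names, not part of the original emulator.

// The buffer and view are allocated once, outside the hot path, so that a
// load never allocates (the text identifies per-access allocation as a cost).
const RAM = new Uint8Array(0x2000);
const view = new DataView(RAM.buffer);

// Register-number ranges mirror the handler above:
// <16 is 16-bit, <48 is 8-bit, <80 is 24-bit, everything else is 32-bit.
function widthBitsFor(reg: number): number {
  if (reg < 16) return 16;
  if (reg < 48) return 8;
  if (reg < 80) return 24;
  return 32;
}

interface LoadResult {
  value: number;     // value after masking to the register width
  zero: boolean;     // assumed ZERO_FLAG semantics
  negative: boolean; // assumed NEGATIVE_FLAG semantics (top bit of the width)
}

function loadMasked(reg: number, addr: number): LoadResult {
  // One unconditional 32-bit read (little-endian is an assumption).
  const raw = view.getUint32(addr, true);
  const bits = widthBitsFor(reg);
  // Build the width mask; the 32-bit case is special because 1 << 32 wraps.
  const mask = bits === 32 ? 0xffffffff : (1 << bits) - 1;
  const value = (raw & mask) >>> 0; // >>> 0 keeps the result unsigned
  const signBit = (1 << (bits - 1)) >>> 0;
  return { value, zero: value === 0, negative: (value & signBit) !== 0 };
}
```

The design point is that the memory read happens exactly once and without branching; only the cheap mask and flag computation depend on the register class.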
flag_operations_are_free.1768146094.txt.gz · Last modified: by appledog
