flag_operations_are_free
Differences
This shows you the differences between two versions of the page.
| flag_operations_are_free [2026/01/11 16:21] – appledog | flag_operations_are_free [2026/01/11 23:19] (current) – appledog | ||
|---|---|---|---|
| Line 15: | Line 15: | ||
| Here's LDAL-1: | Here's LDAL-1: | ||
| - | < | + | < |
| ; Test program: 1 million LDAL [$1000] operations | ; Test program: 1 million LDAL [$1000] operations | ||
| ; Uses CD (32-bit "count down" counter register) | ; Uses CD (32-bit "count down" counter register) | ||
| Line 67: | Line 67: | ||
| * OVERFLOW_FLAG = value === 0x8000; | * OVERFLOW_FLAG = value === 0x8000; | ||
| - | That is a significant amount of flags. Having this make no impact whatsoever was surprising, so I removed the IF statements blocking these flags on DEC. This produces LDAL-2b, which surprised me by getting again the exact same 2.1 MIPS. So, over 2 million if statements wasn't moving the needle? | + | That is a significant amount of code to remove, but ONE compare op was killing it. Having this make no impact whatsoever was surprising, so I removed the IF statements blocking these flags on DEC. This produces LDAL-2b, which surprised me by getting again the exact same 2.1 MIPS. So, over 2 million if statements |
| - | I replaced the flag fences and I created LDAL-3; this time, I had only 100, | + | I replaced the flag fences and I created LDAL-3; this time, I had 100, |
| - | For the record, | + | I experimented with some other LD instructions. It turned out that LDBLX and LDAB were extremely slow, and when put into an unrolled loop they would drop to under 10 MIPS. |
| + | <codify armasm> | ||
| + | ; Test program: 1,000 LDAB [$1000] operations | ||
| + | ; Uses CD (32-bit "count down" counter register) | ||
| - | 78 MIPS With SEF & CMP CD, 0 | + | .address $010100 |
| - | 73 MIPS With CLF & no CMP | + | |
| + | LDCD #1000 ; Load 1,000 into CD (0x989680 is 10 mil) | ||
| + | loop: | ||
| + | LDAB [$1000] | ||
| + | DEC CD ; Decrement CD | ||
| + | CMP CD, 0 | ||
| + | JNZ loop ; Jump to loop if CD != 0 | ||
| + | HALT ; Halt when done | ||
| + | </ | ||
| + | |||
| + | The final conclusion was that my memory system was not optimized. One of the major issues was that I was creating an array in WebAssembly on every register access. I moved that out of the loop and inlined memory access directly into the LD/ST instructions. That brought LDA up to 87 MIPS and LDAB up to 55 MIPS. These were better numbers than before. I probably hadn't noticed how badly some instructions were weighing down the system. | ||
| + | |||
| + | The turbo boost over and above this was batching all the reads to the start of the opcode handler and then masking down depending on how we needed to access the registers. In closing, here's the 87.5 MIPS version of LDA [$addr]: | ||
| + | |||
| + | <codify armasm> | ||
| + | case OP.LD_MEM: { | ||
| + | // Load reg (1 byte) + addr (3 bytes) = 4 bytes total | ||
| + | let instruction = load< | ||
| + | let reg:u8 = instruction as u8; // Extract low byte | ||
| + | let addr = (instruction >> 8) & 0x00FFFFFF; | ||
| + | // Pre-load 32 bits from target address | ||
| + | let value = load< | ||
| + | let reg_index = reg & 0x0F; // Extract physical register 0-15 | ||
| + | IP += 4; | ||
| + | |||
| + | if (reg < 16) { | ||
| + | set_register_16bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x8000) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } else if (reg < 48) { | ||
| + | set_register_8bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x80) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } else if (reg < 80) { | ||
| + | set_register_24bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x800000) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } else { | ||
| + | set_register_32bit(reg, | ||
| + | ZERO_FLAG = value === 0; | ||
| + | NEGATIVE_FLAG = (value & 0x80000000) !== 0; | ||
| + | //if (DEBUG) log(`$${hex24(IP_now)} | ||
| + | } | ||
| + | break; | ||
| + | } | ||
| + | </ | ||
| + | For more information, please contact Appledog. | ||
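
The memory fix described above (stop creating an array on every access, read straight from memory instead) can be sketched roughly as follows. This is a plain TypeScript illustration, not the emulator's AssemblyScript code: `MEMORY`, `VIEW`, `readU32Allocating` and `readU32Direct` are hypothetical names, and little-endian byte order is assumed because WebAssembly loads are little-endian.

```typescript
// Hypothetical stand-ins for the emulator's memory; not the original code.
const MEMORY = new Uint8Array(1 << 16);

// Slow shape: every call allocates a fresh 4-byte copy before decoding it.
function readU32Allocating(addr: number): number {
  const bytes = MEMORY.slice(addr, addr + 4); // new array per access
  return new DataView(bytes.buffer).getUint32(0, true); // little-endian
}

// Faster shape: the view is created once, outside the hot path, and reused.
const VIEW = new DataView(MEMORY.buffer);
function readU32Direct(addr: number): number {
  return VIEW.getUint32(addr, true); // no allocation per access
}

// Both readers agree on the value; only the allocation behaviour differs.
MEMORY.set([0x78, 0x56, 0x34, 0x12], 0x1000);
console.log(readU32Allocating(0x1000) === readU32Direct(0x1000));
```

Both functions return the same value, so the change is purely about what happens per call. How much it helps will depend on the engine and on how hot the path is.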
flag_operations_are_free.1768148509.txt.gz · Last modified: by appledog
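
The "batch the reads, then mask down" idea from the LD_MEM handler can be sketched as below. This is a minimal TypeScript illustration under stated assumptions, not the original handler: `RAM`, `RAM_VIEW`, `widthForRegister` and `decodeAndLoad` are hypothetical names, and the register ranges (below 16 is 16-bit, below 48 is 8-bit, below 80 is 24-bit, otherwise 32-bit) are taken from the branches in the listing. The sketch masks the loaded value to the register width before computing the flags, which is an assumption about what the truncated `set_register_*` calls receive.

```typescript
// Hypothetical memory image; sizes and names are assumptions for the sketch.
const RAM = new Uint8Array(0x20000);
const RAM_VIEW = new DataView(RAM.buffer);

interface LoadResult {
  reg: number;
  addr: number;
  value: number;
  zero: boolean;
  negative: boolean;
}

// Register width in bits, following the ranges used in the handler.
function widthForRegister(reg: number): number {
  if (reg < 16) return 16;
  if (reg < 48) return 8;
  if (reg < 80) return 24;
  return 32;
}

function decodeAndLoad(ip: number): LoadResult {
  // One 32-bit read fetches the register byte and the 24-bit address together.
  const instruction = RAM_VIEW.getUint32(ip, true);
  const reg = instruction & 0xff;
  const addr = (instruction >>> 8) & 0x00ffffff;

  // One 32-bit read from the target, done before the width is known.
  // Note: this throws a RangeError if addr + 4 runs past the end of RAM.
  const raw = RAM_VIEW.getUint32(addr, true);

  // Mask down to the width the register actually needs.
  const width = widthForRegister(reg);
  const mask = width === 32 ? 0xffffffff : (1 << width) - 1;
  const value = (raw & mask) >>> 0; // keep the result unsigned
  const signBit = 2 ** (width - 1);

  return { reg, addr, value, zero: value === 0, negative: value >= signBit };
}
```

The point of the shape is that the two memory reads happen unconditionally up front, and the per-width branches only pick a mask and a sign bit.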
