Author Topic: 6502 addressing modes: VERY confused. (Read 2383 times)

eti · « **on:** September 28, 2022, 02:42:24 am »

Hi all.

I've finally got my mind around the first principles of how the 6502 works, but am stuck on one part; the "addressing modes". As usual with YouTube, there's no consistency in videos... we see all the "retro" channels showing off restorations, we see poorly explained "tutorials" which are either in barely understandable broken English, or... well, you know how frustrating it all is.

When you don't know something and are desperate for someone to explain what it does, one's thoughts are fragile in form, and wading through blog after blog, video after video is an exercise in frustration and futility. I merely want to know how AND WHY these various 6502 addressing modes exist, please.

The closest I've got is this SUPERB video from "Chaos Computer Club" on the 6502; the trouble is, the addressing modes section didn't seem long or thorough enough to me:

Help! Please? Thanks.

ledtester · « **Reply #1 on:** September 28, 2022, 03:32:43 am »

Do you have a specific question about the addressing modes?

This page is pretty good at explaining things:

http://www.emulator101.com/6502-addressing-modes.html

The syntax can also be helpful if you understand the patterns behind it.

A bare address just means access that memory location, e.g.:

LDA $1234
STX $56

means load accumulator from address 0x1234 and store X into zero-page location $56.

A comma means to add the X or Y register to the address, e.g.:

LDA $1234,X
LDX $56,Y

means to load A from the address 0x1234+X and to load X from the address 0x0056+Y. Note that in this last case the addition 0x0056+Y is done mod 256 so the result is always in the "zero page" (first 256 bytes of memory)

Parentheses around an address mean indirection, e.g.:

JMP ($4000)

means to set the program counter to the contents of addresses 0x4000 and 0x4001. By contrast the instruction "JMP $4000" means to set the program counter to 0x4000.

Now there are two ways to combine the comma and paren operators:

LDA ($20,X)
STA ($20),Y

In the first case you add $20+X (mod 256) to get a zero page address. Then the contents of that zero page location and the next one are fetched to create a 16-bit address from which data is fetched to be stored in the accumulator. In other words, you do the comma first and then the indirection.

In the second case you look at the contents of 0x0020 and 0x0021 to create a 16-bit memory address. Then the Y register is added to this address to give the address to which the accumulator is stored. Here you do the indirection first and then the comma.

Another way to look at things...

in "LDA ($20,X)" you have set up a bunch of addresses in zero page, e.g. the locations 0x0020 and 0x0021 hold a 16-bit address, the locations 0x0022, 0x0023 hold another 16-bit address, etc. With X set to 0 you will be indirecting through the first address; with X set to 2 you will be indirecting through the second address.

in "STA ($20),Y", the locations 0x0020 and 0x0021 hold a 16-bit address and Y is an offset to that address. This is a much more conventional way of using an index value.
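
The patterns above can be sketched as a small Python model of how each mode arrives at its effective address. This is only an illustration: `mem` is a hypothetical 64 KB array standing in for RAM, and the function names are made up for this example rather than taken from any real emulator.

```python
# Toy model of 6502 effective-address calculation (illustrative only).
mem = bytearray(0x10000)  # hypothetical 64 KB address space


def read16_zp(zp):
    """Read a little-endian 16-bit pointer from zero page, wrapping within page 0."""
    return mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)


def absolute_x(addr, x):      # LDA $1234,X
    return (addr + x) & 0xFFFF


def zero_page_y(zp, y):       # LDX $56,Y (the sum wraps mod 256)
    return (zp + y) & 0xFF


def indexed_indirect(zp, x):  # LDA ($20,X): comma first, then indirection
    return read16_zp(zp + x)


def indirect_indexed(zp, y):  # STA ($20),Y: indirection first, then comma
    return (read16_zp(zp) + y) & 0xFFFF


# Two pointers stored in zero page: $20/$21 -> $3000, $22/$23 -> $4000
mem[0x20], mem[0x21] = 0x00, 0x30
mem[0x22], mem[0x23] = 0x00, 0x40

print(hex(absolute_x(0x1234, 5)))      # 0x1239
print(hex(zero_page_y(0x56, 0xF0)))    # 0x46
print(hex(indexed_indirect(0x20, 2)))  # 0x4000
print(hex(indirect_indexed(0x20, 2)))  # 0x3002
```

With X = 2 the first form picks the second pointer in the table, while with Y = 2 the second form steps two bytes past the target of the first pointer.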

eti · « **Reply #2 on:** September 28, 2022, 07:08:48 am »

Thank you SO much for such a detailed and thorough explanation, and for taking the time to help me. In the interim I also found this wonderful video, equally as well explained:

brucehoult · « **Reply #3 on:** September 28, 2022, 07:26:37 am »

Quote from: eti on September 28, 2022, 02:42:24 am

I've finally got my mind around the first principles of how the 6502 works, but am stuck on one part; the "addressing modes".

It's not surprising.

Today the 6502 is often used as an example of a very simple processor -- and it certainly uses very few transistors, and especially gets very good performance from a very small number of transistors -- but it does it by being quite complex to understand.

The 6502's spiritual ancestor (and actual ancestor if you look at the people, not the companies) the 6800 is far simpler to understand, and easier to write programs for. But it's a lot less efficient, and that's important when you have a 1 MHz clock.

ledtester did a pretty good job of explaining the 6502 addressing modes in terms of assembly language syntax -- syntax that is very similar to that still used in modern assembly language today.

I will give a couple of examples, specifically of how to use the "indexed indirect" (ZP,X) and "indirect indexed" (ZP),Y addressing modes.

First, let's say we want to have a subroutine that copies a block of bytes of memory from one place to another. We'll call it "memcpy". It has a "from" argument and a "to" argument and a size of the memory block which, for today, we'll limit to between 1 and 128 bytes (don't bother to call it at all if you want to copy 0 bytes!)

Suppose you have two blocks of memory:

Code: [Select]

foo:    .byte 0,0,0,0,0,0,0
bar:    .byte 3,1,4,1,5,9,3

To copy the contents of bar to foo we want to be able to do:

Code: [Select]

        lda #>foo
        pha
        lda #<foo
        pha
        lda #>bar
        pha
        lda #<bar
        pha
        lda #7
        pha
        jsr memcpy

Here, a ">" means the hi 8 bits of a 16 bit address and "<" means the lo 8 bits.

There are many other ways we might do this. Instead of pushing everything on to the stack we might store the various values into Zero Page locations that we know the memcpy() function expects to find them in. That would be faster, but it would add five extra bytes of code to every caller of memcpy() and you might have quite a lot of them in the program. Or the memcpy() function might expect to find the length in the X or Y register instead of on the stack. That would both be faster and save a byte of code in the caller (or two bytes compared to storing it in Zero Page)

Another way of doing this that makes the caller a lot smaller -- 8 bytes instead of 18 bytes (!) -- but less flexible is:

Code: [Select]

        jsr memcpy
        .byte >foo, <foo, >bar, <bar, 7

This makes memcpy work a lot harder to get the parameters, but it could be worth it if getting the most amount of program features into fixed size memory is important. A real program might have both this version and another that was faster but more code at the caller.
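
To make "work a lot harder" concrete, here is a rough Python model of what such a memcpy would have to do to find its parameters. It relies on the fact that JSR pushes the address of its own last byte; the function name, the `mem` array and the addresses chosen for the JSR, foo and bar are all hypothetical.

```python
def fetch_inline_params(mem, pushed_ret, count):
    """Return (params, new_pushed_ret) for `count` bytes placed right after a JSR.

    pushed_ret is the address JSR pushes: the LAST byte of the JSR itself,
    so the first parameter byte lives at pushed_ret + 1.
    """
    params = [mem[(pushed_ret + 1 + i) & 0xFFFF] for i in range(count)]
    # RTS adds 1 to the pulled address, so leave it pointing at the last parameter byte.
    return params, (pushed_ret + count) & 0xFFFF


mem = bytearray(0x10000)
jsr_addr = 0x0800                                         # hypothetical location of the JSR
mem[jsr_addr:jsr_addr + 3] = bytes([0x20, 0x00, 0x09])    # jsr $0900 (made-up memcpy address)
# .byte >foo, <foo, >bar, <bar, 7   with foo = $3000 and bar = $3010 (made-up)
mem[jsr_addr + 3:jsr_addr + 8] = bytes([0x30, 0x00, 0x30, 0x10, 7])

params, new_ret = fetch_inline_params(mem, jsr_addr + 2, 5)
dst = (params[0] << 8) | params[1]
src = (params[2] << 8) | params[3]
print(hex(dst), hex(src), params[4], hex(new_ret + 1))    # 0x3000 0x3010 7 0x808
```

The last value printed is where execution resumes after RTS, i.e. the first instruction after the five parameter bytes.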

On modern computers where you are often interfacing to code from a C compiler or using a standard library, there is a standard convention for how subroutines and their callers interact, but on the 6502 and z80 it was pretty much wild west and someone who wrote a function did whatever they wanted, and the users of the function needed to consult the documentation for every function before calling it.

Let's stick with the first version for now.

Here's what the memcpy() subroutine might look like:

Code: [Select]

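src     equ 13 ; completely arbitrary location in first 256 bytes of RAM. We need 2 bytes free here
dst     equ 42 ; again, arbitrary
ret     equ 44 ; arbitrary too: 2 bytes (assumed free) to park the return address

memcpy:
        pla             ; JSR pushed the return address on top of our arguments,
        sta ret         ; so pull it off first (lo byte, then hi byte)
        pla
        sta ret+1
        pla             ; size
        tay
        pla             ; lo byte of source address
        sta src
        pla
        sta src+1
        pla             ; lo byte of destination address
        sta dst
        pla
        sta dst+1
        lda ret+1       ; put the return address back (hi byte first) so RTS works
        pha
        lda ret
        pha
        dey ; convert 1..128 into 0..127, one less than how many bytes we will copy
loop:
        lda (src),y
        sta (dst),y
        dey
        bpl loop
        rts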

We'd really like to test the carry flag here, so that we could copy up to 256 bytes at a time, but unfortunately the DEY instruction only sets N and Z. We could add a CPY #255 after the DEY and then use BNE loop (or BCC loop, since CPY sets the carry only when Y >= 255), but this would slow down the loop quite a bit. Ok, 12.5% -- 18 cycles per loop instead of 16. Maybe it's worth it.
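
The effect of using the N flag as the loop test can be seen in a small Python model of the DEY / BPL loop. This is a sketch, not an emulator: `memcpy_6502`, `mem` and the addresses used for foo and bar are hypothetical.

```python
def memcpy_6502(mem, dst, src, size):
    """Mimic the DEY / BPL loop: copy from the highest index down to 0.

    Returns how many bytes were actually copied.
    """
    y = (size - 1) & 0xFF              # the DEY before the loop
    copied = 0
    while True:
        mem[(dst + y) & 0xFFFF] = mem[(src + y) & 0xFFFF]  # lda (src),y / sta (dst),y
        copied += 1
        y = (y - 1) & 0xFF             # dey
        if y & 0x80:                   # N flag set: BPL falls through to RTS
            break
    return copied


mem = bytearray(0x10000)
foo, bar = 0x3000, 0x3010              # hypothetical addresses
mem[bar:bar + 7] = bytes([3, 1, 4, 1, 5, 9, 3])

print(memcpy_6502(mem, foo, bar, 7))            # 7
print(list(mem[foo:foo + 7]))                   # [3, 1, 4, 1, 5, 9, 3]
print(memcpy_6502(mem, 0x5000, 0x6000, 200))    # 1
```

The last call shows why the size has to stay small: once Y starts out with its top bit set, the loop gives up after a single byte.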

Programming weird things such as the 6502 is full of such trade-offs.

OK, so when would we use ZP,x and absolute,Y addressing modes?

One example would be if we have a program doing a lot of arithmetic on 16 bit or 32 bit integers. Maybe a subroutine has a number of variables like this, and stores them in Zero Page while working on them.

Let's say we have some 32 bit variables in Zero Page with their least significant 8 bits at addresses ... 13, 42, 69, and 100. And for some reason we want to add them all up and put the result at address 42 (the answer).

You could do it like this:

Code: [Select]

add32:
        clc
        lda $0000,y ;sadly, there is no ZP,y
        adc $00,x
        sta $00,x
        lda $0001,y
        adc $01,x
        sta $01,x
        lda $0002,y
        adc $02,x
        sta $02,x
        lda $0003,y
        adc $03,x
        sta $03,x
        rts

... and then using it ...

Code: [Select]

        ldx #42
        ldy #13
        jsr add32
        ldy #69
        jsr add32
        ldy #100
        jsr add32
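
For anyone who wants to check the carry chaining, here is a rough Python equivalent of add32 and the three calls above. It is a sketch only: the helper names, the `mem` array and the sample values are made up for illustration.

```python
def add32(mem, x, y):
    """mem[x..x+3] += mem[y..y+3] (little-endian), byte by byte with carry."""
    carry = 0                                            # clc
    for i in range(4):
        total = mem[y + i] + mem[(x + i) & 0xFF] + carry  # lda $000i,y / adc $0i,x
        mem[(x + i) & 0xFF] = total & 0xFF                # sta $0i,x
        carry = total >> 8                                # carry into the next byte
    return carry


def store32(mem, addr, value):
    mem[addr:addr + 4] = value.to_bytes(4, "little")


def load32(mem, addr):
    return int.from_bytes(mem[addr:addr + 4], "little")


mem = bytearray(0x10000)
store32(mem, 13, 100000)        # sample values, chosen arbitrarily
store32(mem, 42, 0)
store32(mem, 69, 0x00FFFFFF)
store32(mem, 100, 1)

for y in (13, 69, 100):         # ldx #42 / ldy #... / jsr add32
    add32(mem, 42, y)

print(load32(mem, 42))          # 16877216
```

The value at 69 was picked so that adding 1 ripples a carry through three bytes, which is exactly what the chain of ADC instructions handles.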

Finally, when would you use (ZP,x) addressing mode?

To be honest, I think I've almost never used it, except when X was 0 and Y was being used for something else. Note that (ZP,x) and (ZP),y are exactly the same thing if X and Y are 0. But (ZP),y is 1 clock cycle faster, except for STA.

I do have an example where you might use it, but it's kind of obscure so I'll leave it for now.

Peabody · « **Reply #4 on:** September 28, 2022, 02:50:40 pm »

Not related to addressing modes, but one thing to note on the 6502 is the behavior of the carry flag on a subtract or compare. If the two numbers are the same, or if you are subtracting a smaller number from a larger one, so there is no borrow, the resulting CF will be set. In fact, you generally need to set the CF before doing a subtract to get the right answer. Some will consider this to be backward.

brucehoult · « **Reply #5 on:** September 28, 2022, 05:38:34 pm »

Quote from: Peabody on September 28, 2022, 02:50:40 pm

Not related to addressing modes, but one thing to note on the 6502 is the behavior of the carry flag on a subtract or compare. If the two numbers are the same, or if you are subtracting a smaller number from a larger one, so there is no borrow, the resulting CF will be set. In fact, you generally need to set the CF before doing a subtract to get the right answer. Some will consider this to be backward.

True, and the code I showed made use of this fact, but note it is not at all unusual! Others with the same property include ARM, MSP430, PowerPC, PA-RISC, PIC, and even the IBM 360.

The reason you notice it more on 6502 is the other ISAs generally have both "subtract" and "subtract with carry" instructions, and in the former case the carry flag is automatically set for you. On 6502 you need to manually set the carry before a subtract (but not before a CMP).
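
A small Python sketch of the binary-mode behaviour may make the "carry means no borrow" convention easier to see. The function names are made up, and decimal mode is deliberately ignored.

```python
def sbc(a, m, carry):
    """6502 SBC (binary mode): A - M - (1 - C). Returns (result, carry_out)."""
    total = a + (m ^ 0xFF) + carry   # subtract by adding the one's complement plus carry
    return total & 0xFF, total >> 8


def cmp_carry(a, m):
    """CMP acts like SBC with the carry forced to 1; only the flags are kept."""
    return sbc(a, m, 1)[1]


print(sbc(5, 3, 1))      # (2, 1)   no borrow, carry stays set
print(sbc(3, 5, 1))      # (254, 0) borrow, carry cleared
print(sbc(5, 3, 0))      # (1, 1)   forgot SEC: result is one too small
print(cmp_carry(5, 5))   # 1        equal values leave the carry set
```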

Peabody · « **Reply #6 on:** September 28, 2022, 09:41:29 pm »

Yes, I felt right at home when I started on assembler on the MSP430. None of that 80x86 rubbish. Anyway, my memory is that the 6502 only has the SBC opcode. There is no SUB. So you have to make sure the CF is right before you subtract.

Also, the 6502 and MSP430 are both von Neumann processors. None of that Harvard rubbish. :-) By the way, the MSP430 is very cool. It's 16 bit, which is very convenient.

brucehoult · « **Reply #7 on:** September 28, 2022, 10:11:36 pm »

Quote from: Peabody on September 28, 2022, 09:41:29 pm

Anyway, my memory is that the 6502 only has the SBC opcode. There is no SUB. So you have to make sure the CF is right before you subtract.

It always confuses me when people give information that I just gave in the message they are replying to, in neither a "no that's wrong" or a "yes, I agree" way, but just seeming as if ... they didn't see it?

Quote

Also, the 6502 and MSP430 are both von Neumann processors. None of that Harvard rubbish. :-) By the way, the MSP430 is very cool. It's 16 bit, which is very convenient.

A couple of months ago I did some prototyping experiments towards the idea of an MSP430 emulator written in 6502 assembly language, to see how large the code would be and what kind of performance it would offer compared to other VMs such as SWEET16 or UCSD P-Code. I think it could work out quite well. It's quite a good fit. Execution would be a little slower than interpreters based on 8 bit instructions but not awfully, but the code would be smaller (and the interpreter not toooo large), and MSP430 has the advantage of having a good gcc port.

Sherlock Holmes · « **Reply #8 on:** November 22, 2022, 10:18:23 pm »

I recall writing code for the 6502 back in the 1970s, at the time I built one of these from a kit:

It was the first real microprocessor I ever worked on; later I used a Z80 based "Microprofessor" which was more powerful.

The biggest gripe I had about the 6502 was the lack of a relative call instruction, the call (JSR) instruction needed an absolute address, there was no other way to do a JSR, and that made life harder than it needed to be!

brucehoult · « **Reply #9 on:** November 22, 2022, 11:50:15 pm »

A ±128 byte PC-relative JSR wouldn't be very useful (although 6800 had one), and there is no support at all in any other instructions for the 16 bit add needed for an arbitrary PC-relative call, so it would have been tricky to provide.

There was very little support at all for PIC in any 8 bit ISA except the M6809, which came a bit too late after 16 bit machines were already introduced.

M6800 could BSR ±128 bytes, or JSR to an address in the X register (plus an unsigned 8 bit offset), so you could calculate a target address into a pair of zero page locations and then load them into X. There's no way to access the current PC contents other than a JSR/BSR, so you'd need a stub function that copied its return address to A/B or X or into memory before returning. Ugh. At least, yeah, you can call a calculated address and easily get the return address pushed.

z80 has no calculated call at all. Only absolute. And you can only get the PC with a call. So need the stub routine. At least the stub can simply load the pushed PC into a 16 bit register before returning. But you need to calculate the return address and manually push it before using JP (HL/IX/IY).

8086 also doesn't have PC-relative call.

M6809 has both BSR with 8 bit offset and LBSR with 16 bit offset. And JSR with absolute address.

Probably with everything else the best approach would be to load the desired offset into registers and then do an absolute call to a utility function that uses the return address in the normal way, as well as to calculate an address to jump to.

e.g. for 6502

Code: [Select]

// the call site
// NB offset is relative to the last byte of the JSR, *not* to the jsr or the next instruction
  ldy #offsetHI
  lda #offsetLO
  jsr relativeCall

// the utility function
relativeCall:
  tsx
  clc
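  adc $101,x  // lo byte of return address (S points one below the last byte pushed)
  sta zptmp
  tya
  adc $102,x  // hi byte of return address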
  sta zptmp+1
  jmp (zptmp)

So you're looking at 7 bytes instead of 3 at the call site, which is not awful, but 2+2+6+2+2+4+3+2+4+3+5 = 35 cycles instead of 6 for the call. Plus, it nukes ALL the registers, so you can't pass any actual arguments in registers.
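
The address arithmetic can be checked with a short Python sketch. It assumes, as the comment in the call site says, that the offset is relative to the last byte of the JSR (the address JSR pushes); the function names and example addresses are hypothetical.

```python
def relative_call_target(jsr_addr, offset):
    """Address reached by the helper for a JSR located at jsr_addr."""
    pushed = (jsr_addr + 2) & 0xFFFF                  # JSR pushes the address of its last byte
    lo = (offset & 0xFF) + (pushed & 0xFF)            # clc / adc (lo byte)
    hi = (offset >> 8) + (pushed >> 8) + (lo >> 8)    # tya / adc (hi byte, plus carry)
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def offset_for(jsr_addr, target):
    """16-bit offset to load into Y (hi) and A (lo) to reach `target`."""
    return (target - (jsr_addr + 2)) & 0xFFFF


jsr_addr = 0x0804                                     # made-up address of the JSR
fwd = offset_for(jsr_addr, 0x0900)
back = offset_for(jsr_addr, 0x0700)
print(hex(fwd), hex(relative_call_target(jsr_addr, fwd)))     # 0xfa 0x900
print(hex(back), hex(relative_call_target(jsr_addr, back)))   # 0xfefa 0x700
```

Backward targets simply use a large unsigned offset, because the 16-bit sum wraps around.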

z80 a little better, but still not fun:

Code: [Select]

// call site. Offset is from start of following instruction
  ld bc,#offset
  call relativeCall

// the utility function
relativeCall:
  pop hl
  push hl
  add hl,bc
  jp (hl)

So that's 6 bytes at the call site and 10+17+10+11+11+4 = 63 cycles instead of 17 for an absolute call.

At least the z80 utility function is only 4 bytes vs 16 bytes. But you only need one copy of it, so that hardly matters.

Sherlock Holmes · « **Reply #10 on:** November 23, 2022, 05:53:49 pm »

Sure, but who said it could only be relative? the 6502's JMP instruction has absolute and indirect forms for example, JSR could have leveraged these two modes too.

Kleinstein · « **Reply #11 on:** November 23, 2022, 07:13:42 pm »

The JSR instruction is already one of the slowest instructions. Making it even more complicated with a complicated addressing mode would make it even slower, and this time would add to the worst case interrupt latency. A longer instruction may also need more entries for the microcode. The 6502 is still a CPU with more priority on few transistors in the CPU than on short code for more rare cases.

The main reason for PC relative JSR would be writing relocatable code. At least for the PET and C64, but likely also for the Apple 2, there are tools (a command in the PET debug / ASM tool) to relocate code. So one can shift code to a different position in a different way. If really needed one could likely write a subroutine for such a jump. E.g. the C64 basic has a small bit of code in the zero page to have a computed jump / JSR. Another way would be doing it like many modern OSes: have a special loader that does the final linking.

Loading a given code to different positions in the memory was not so much a thing when the 6502 was new. It was more made to work with code from ROM. In this case there is little need for a PC relative JSR. Even with code loaded from a disc or tape, it was mostly one program at a time.

brucehoult · « **Reply #12 on:** November 23, 2022, 07:56:56 pm »

Quote from: Sherlock Holmes on November 23, 2022, 05:53:49 pm

Sure, but who said it could only be relative? the 6502's JMP instruction has absolute and indirect forms for example, JSR could have leveraged these two modes too.

"there is no support at all in any other instructions for the 16 bit add needed for an arbitrary PC-relative call, so it would have been tricky to provide."

If you're happy with indirect JSR instead of relative JSR then instead of the ideal...

Code: [Select]

  jsr ($nnnn)

... all you need is ...

Code: [Select]

  jsr indirectJsrViaNN

  :
  :

indirectJsrViaNN:
  jmp ($nnnn)

I expect an actual indirect JSR instruction would take 8 cycles -- certainly no fewer, since it reads and writes 8 bytes to/from RAM -- and the substitute takes 11 cycles, so not a huge deal.

Peabody · « **Reply #13 on:** November 24, 2022, 04:08:07 am »

Quote from: Kleinstein on November 23, 2022, 07:13:42 pm

The JSR instruction is already one of the slowest instructions. Making it even more complicated with a complicated addressing mode would make it even slower and this time would add to the worst case interrupt latency. A longer instruction may also need more entries for the microcode.

Microcode? Microcode in a 6502?

brucehoult · « **Reply #14 on:** November 24, 2022, 05:36:59 am »

Quote from: Peabody on November 24, 2022, 04:08:07 am

Quote from: Kleinstein on November 23, 2022, 07:13:42 pm
The JSR instruction is already one of the slowest instructions. Making it even more complicated with a complicated addressing mode would make it even slower and this time would add to the worst case interrupt latency. A longer instruction may also need more entries for the microcode.

Microcode? Microcode in a 6502?

Yes. And if a hypothetical indirect JSR (for example) took 8 clock cycles while the currently longest instruction takes 6 cycles, then that would expand the microcode ROM by 33%, and it's ALREADY taking up a very big part of the chip.

radiogeek381 · « **Reply #15 on:** November 25, 2022, 01:47:41 pm »

Quote

Yes. And if a hypothetical indirect JSR (for example) took 8 clock cycles while the currently longest instruction takes 6 cycles, then that would expand the microcode ROM by 33%, and it's ALREADY taking up a very big part of the chip.

That's an interesting statement. Where do you see a microcode ROM on the die photo? (I'm looking at https://www.embeddedrelated.com/showarticle/1453.php) When I look at https://en.wikipedia.org/wiki/MOS_Technology_6502 I see that the annotated picture labels the regular structure at the top as the instruction decode logic. It is extremely unlikely that a commercially developed 6502 would use microcode. In some designs, microcode was stored in a ROM that looks kinda like that ROM at the top, but it is only 11 bits wide (or 11 words long?) so I'd suspect this is a decode table (think PLA).

This is not microcode. As a term of art, going back to the 1950's (see Maurice Wilkes' 1951 paper "The Best Way to Design an Automatic Calculating Machine") microcode was an actual sequence of program steps where the transformation between bits in the instruction and actual control signals was simple, if not direct. The instructions would also include some sequencing information that determined the location of the next microinstruction.

The 6502 ISA encoding appears to be laid out to facilitate direct, or at least simple, decoding.

Microcode was not seen very frequently in microprocessor designs of the 70's. It wasn't necessary, there was a pretty well established design culture that knew how to decode the simple ISAs of the day, and direct decode makes the most out of the silicon area.

None of the PDP-8's were microcoded. To my knowledge, many of the PDP-11 implementations were direct-executed.

In later years, microcode became anathema to high performance processor designers. None of the DEC Alpha processors used microcode -- I was there. It is *extremely* unlikely that any of the other RISC processors of the day would have used microcode. Microprogramming defeats the entire purpose of these architectures.

That ROM looking thing at the top is a decoder, not a sequential microcode table. In fact, it was likely designed as a sort-of-PLA. I'm pretty sure I could build a decoder to manage a simple implementation in a week or so -- there are only 80 or so instructions in the base ISA. The encoding isn't ideal from a modern perspective, but I suspect that every assignment to the table was motivated by some logic path that they wanted to simplify. (Notice for instance that all the opcodes with an implied operand have a low nibble of 8 or A.)

This is off topic, sorry, but it seemed worth noting.

radiogeek381 · « **Reply #16 on:** November 25, 2022, 02:15:04 pm »

Not to beat a dead horse,

(but to beat a dead horse anyway.)

I see in the regular structure at the top of the die a number of "all-the-same" rows. (I can't tell if these are "device populated" or "empty" entries.) Often these rows are adjacent. They appear to correspond, at least in density, and perhaps in location and pairing, to "unused" opcodes. This is a further indication that it is not a microcode ROM. Microcode space was so precious that every machine I've ever known would have optimized the heck out of it so that there were no duplicated instructions, never mind adjacent identical instructions. (This is why the next-micropc field in many microcode layouts can be *really* bizarre.)

brucehoult · « **Reply #17 on:** November 25, 2022, 11:21:16 pm »

There used to be some excellent pages fully describing the instruction sequencing, including the ROM/PLA contents, what the output signals control etc. But I can't find them now. I think there was good stuff at visual6502.org but it is currently down.

Looking at the organisation of the ROM again, I think I overestimated the expansion by introducing an instruction or two taking 8 cycles instead of 6.

Logically, it's a ROM with 11 input bits (8 from the opcode, 3 from the instruction cycle number) and 21 output bits, but compressed using AND/OR logic with 22 inputs.

The 22 inputs are the 8 opcode bits, the inversion of the 8 opcode bits, and 6 one-hot bits for the cycle number. There are 130 rows in the array.

Each row fires if the 1 bits in the 22 control bits in the row match 1 bits in the 22 input bits. Multiple rows can fire at the same time. Each row that fires enables some subset of the 21 output bits. The output bits from different rows firing at the same time are OR'd together.

So each row applies to only 1 of the 6 cycles in an instruction, but can select for "opcode bit n must be 0", "opcode bit n must be 1", or "don't care what opcode bit n is". (Or "opcode bit n must be both 0 and 1 at the same time" :-) )
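
The row-matching rule can be sketched in a few lines of Python. The bit layout follows the description above (8 opcode bits, their 8 inverses, 6 one-hot cycle bits), but the helper names and the three example rows are entirely invented; they are not the real 6502 decode table.

```python
def pla_inputs(opcode, cycle):
    """Build the 22 input bits: opcode, inverted opcode, one-hot cycle (0..5)."""
    return opcode | ((~opcode & 0xFF) << 8) | (1 << (16 + cycle))


def row(ones=0, zeros=0, cycle=None, outputs=0):
    """A row needing some opcode bits to be 1, some to be 0, on a given cycle."""
    mask = ones | (zeros << 8)
    if cycle is not None:
        mask |= 1 << (16 + cycle)
    return mask, outputs


def pla_outputs(rows, opcode, cycle):
    """OR together the outputs of every row whose 1 bits all appear in the inputs."""
    inputs = pla_inputs(opcode, cycle)
    out = 0
    for mask, outputs in rows:
        if (inputs & mask) == mask:
            out |= outputs
    return out


rows = [
    row(ones=0b01, zeros=0b10, cycle=2, outputs=0b0001),  # bit 0 = 1 and bit 1 = 0
    row(cycle=2, outputs=0b1000),                         # don't care about the opcode
    row(ones=0x80, zeros=0x80, cycle=2, outputs=0b0100),  # bit 7 both 0 and 1: never fires
]

print(pla_outputs(rows, 0xA5, 2))   # 9
print(pla_outputs(rows, 0xA6, 2))   # 8
print(pla_outputs(rows, 0xA5, 3))   # 0
```

For opcode $A5 on cycle 2 the first two rows fire together and their outputs are OR'd; the third row can never fire for any opcode.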

I can't find a sufficiently detailed reference now, but I *think* some of the 21 output bits specify "next cycle number" to enable tail sharing for instructions that e.g. use the ALU or write back the result or read the next opcode in the last cycle or two of their execution but execute in a different number of cycles e.g. ADC with different addressing modes so the actual operand value arrives at different cycle numbers. So even 2 or 3 cycle instructions end on cycle #6. I'm sure this must be the case, otherwise you'd need a lot more than 130 rows.

So, yeah, ok, adding some 8 cycle "JSR indirect" instruction would add 2 new input bits to the array, but might add only a handful of rows.

metertech58761 · « **Reply #18 on:** December 06, 2022, 05:34:00 am »

As someone who poked around a bit with 65xx assembly back in the day, I thought I'd just chime in and say of the two indirect addressing modes - the (indirect),Y addressing mode was very common while I personally never saw any deliberate usage of (indirect,X).

Any time I saw any (indirect,X) in a 6502 disassembly, I was most likely looking at a data table misinterpreted as code by whichever disassembler I was using.

I would also say that I learned a fair bit about 65xx coding by reading that chapter in the Commodore 64 Programmer's Reference Manual, and there's also a scanned version online of "6502 Assembly Language Programming" by Lance Leventhal that's supposed to be very good.

MIS42N · « **Reply #19 on:** December 06, 2022, 09:08:06 am »

This thread tickled some long dormant neurons. I first encountered a 6502 on a KIM-1. I wrote a very primitive assembler that occupied very little memory (I have searched for the source, I think it is in a storage unit somewhere). It almost fitted the available RAM in the two MCS6530 chips (128 bytes). It accepted all 6502 op codes and addressing modes and if the instructions were properly formatted, would generate the right binary. However, it had zero error checking. The base operator was created from a 26 byte lookup table, each byte segmented into 3 bit patterns (I think it was 3bit, 3bit, 2bit) for the three characters in the op code. The relevant patterns were added to create the base op code binary. The operand was examined for X,Y,(),$,comma and any hex digit. A hex digit automatically became the operand binary, the other characters modified the base op code binary to make the final op code. It was testament to how orthogonal the instructions were, that only JMP needed special handling. An idiosyncrasy was it didn't matter where in the operand the special characters appeared. LDA ($20,X) could be written LDA X2,($0) and produce the same result.

Because there was zero checking, a disassembler was written that accepted the output of the assembler and compared it character for character with the input, and flagged a mismatch. So first the source was run through the assembler/disassembler, then through the standalone assembler.

The familiarity with 6502 came in handy, modifying a C64 to be a terminal connected to a modem. The serial communication was bit bashed through the parallel port with either a 1200/75 or 1200/1200 baud modem. It could handle 2400 baud half duplex but the application used a sliding windows error correcting protocol which ran full duplex (and nobody could afford a 2400 baud modem). The comms was mostly hidden behind the BASIC ROM - a write to those addresses went to RAM 'behind' the ROM, and when the time came the ROM could be deselected so the RAM could be read.

The main reason the 6502 was fast is that it was pipelined. In the same clock that decoded an instruction, it fetched the next byte. Most times that was a useful thing to do, almost doubling the speed compared with a non pipelined processor (most of the other contenders). Nice to work with.

