Sunday, 16 June 2013

First working emulator.

I will release all the source code and runnable versions of the programs and so on, but not yet (it will be mostly Java)

However I have managed to get a working SM-510 emulator going (see picture). I haven't bought a Mac, it is ArchLinux with Cupertino decorations (same as Mac).

This is part of my emulator framework, which does 99% of the work here ; all you have to do is tell it what to put in the CPU area, the code memory area and the data memory area. It also provides stuff like an expression evaluator for assemblers and a beeper that tend to get used a fair bit (have you ever tried to write code to simply beep in C# ???)

It won't work for really complicated processors but it works fine for the 4 and 8 bit processors I've tried it on and some really old ones (I have been tinkering with really old computers like SEAC and the TX-0, TX-0 is 18 bits wide and SEAC is 40. One big mid innings tweak was to change the datatype from int to long to cope with the very wide instruction words)

The framework does all the work of the time synchronisation and single stepping and breakpoints and all that kind of stuff. It's a little primitive but it does basically work.  As you can see from the picture I haven't quite got the hang of Swing layouts yet, hence the big white gap at the bottom :)

All you need to do is to subclass the Processor class and ask it to implement a couple of interfaces so the controller can interrogate and control it. Going from the processor class which just emulated the CPU to this working emulator took about 20 minutes.

There is one cheat though. Many of these early CPUs do not use a program counter that goes up in ones, but a recycling shift register (look up LFSR on wikipedia) which generates a predictable pseudo random sequence. Presumably because it is much lighter on silicon. So instead of numbers going 0,1,2,3 it goes 0,13,23,14,7,57,48 or something like that. I have simply missed that out. I'm pretty certain that the developer tools (as with the TMS series) sort this out for you and you just pretend it doesn't exist. Otherwise all your code lines would be mixed up.

At the moment, this cannot run anything at all (the code and data in the display above is just random numbers used for testing). There are some very good free macroassemblers out there - ASL will assemble all kinds of wierd and wonderful processors, but nothing that will Assemble SM-510 source.

So the next task is to create an assembler so I can actually get some real code running.

Then I will implement the hardware interface - this system here works but the I/O commands go into the ether - a dummy implementation of the IHardware interface is connected to it.  This means the display will look much the same but it will acquire a partner window, which is the watch display face.


Macro Instruction Set Generation

Having got that out of my system,  I have actually been doing something productive. I've added the CPU Core generator to the links on the right.

This is a bit like a specialised macro processor - it's a program in Python and a definition file which contains macro style definitions for the processor. It looks a bit like this (these are three real examples).

0B    "EXBLA"     _temp = A;A = BL;BL = _temp
5F    "LBL @"     _temp = fetch();BL = _temp & 0xF;BM = _temp >> 4    
18-1B "LDA {O}"   A = {M};BM = BM ^ 0x{O}

The first one is a simple instruction "EXBLA" which exchanges A and Bl. The second one is a two byte instruction (that's what the '@' is for that fetches an operand and puts it into BL and BM.  The final one is three instructions , 18-1B which generate instructions LDA 0, LDA 1, LDA 2, LDA 3 - each reads memory into A and the exclusive or's the Bm register with 0-3 respectively.

The macroprocessor scans it, chucks the comments and produces two files. One is the innards of a huge switch() statement - for the first one it would generated something like:

case 0x0B: /* exbla */
   _temp = A;A = BL;BL = _temp;
   break;

and it also creates an array of all the mnemonic names which the disassembler built into the emulation shell uses. These are cut and pasted into Eclipse.

This does produce a very long and unwieldy case statement. But it does have some advantages. Any error can be corrected by changing the definition file and will fix everything consequently. It keeps the Java compiler generating a table jump rather than a sequence of If statements which it does if the case is sparsely populated. Where it wastes is things like TM which has 64 near identical instructions which vary only in their single parameter.But what I've found is that this doesn't seem to balloon the code, which is probably down to the compiler.

So then that's wrapped in a simple processor shell which has some methods for handling the more complex instructions. Hardware stuff is passed of to an Interface which represents the I/O pins.

The next stage is to make it work as an emulator. Which I've actually already done so I will write another post immediately following this.



   

Instruction Set

One thing I've learnt about messing around with retrocomputing is when you dig into something, never assume the designer was sober or sane.

There are design decisions that make you think "What ?" on a regular basis. Mixing up connections for no apparent reason. Sometimes the whole design is bonkers - for example the "Invisible Alien Neutralizer".

When I reversed engineered the BIOS in the Odyssey 2 years ago, there was a bit of code that had me baffled, it was bit hacking in an incomprehensible manner. I couldn't step through it and in the end just marked it 'unknown purpose'. Another fellow filled in the gaps later on. It was a routine to calculate the exclusive or of 2 bytes. Which would be fine if the 8048 in the Odyssey 2 didn't actually have an Exclusive Or instruction built in. 

The SM510 is actually relatively sensible.  Still there are wierd things.

The person who designed it has an obsession with "T" in mnemonics. So in the branching code it is used for Transfer (what everyone else calls branch or jump) - T (transfer in page) or TL (Transfer long). Subroutines are indicated with an "M" - I guess for Memory (of return address) so you get TM (see later) and TML (Transfer memory long). Everyone else uses Jump, Jump Subroutine, Branch, Call, but what the heck. Transfer if used is used for register to register transfer (e.g. TAX in the 6502). These conventions date back to things like the 4004 and the TMS1000 in Microcomputing and further back - the PDP-8 has skip instructions.

This wouldn't have been so bad if all the skip instructions didn't begin with T as well. So you have TC (test on carry), TAM (test A = memory) and so on. Everyone else in the world uses variants on SKIP for skip tests SKZ, SKIPZ and so on, but this lot use TA0 (test accumulator is zero). Some of them are backwards (TC tests if carry is zero for example .....). I'm seriously considering producing an alternate mnemonic set just to make it clearer. The indirect jump is not "JIN" or something like that but "ATPL" (A transfer to P-Lower) which isn't actually what it does anyway. (it transfers into the lower 4 bits only).

On the subject of subroutines, what do you think RTN0 and RTN1 do ? Return 0 and 1 ? Wronnnnng. RTN0 is return. RTN1 is return and skip. The COP series (NS 4 bitters) has the same thing but uses RET and RETSK.

Then there's ADD11. Now, I thought this was a copier smear when I saw this first, but it's in all the documents. Have a guess at what it does ?

Nope, it doesn't ADD11. It has nothing at all to do with 11, or two 1's (Though there is an ADX 11 which does add the constant 11 - the X is probably for consistency with LAX 11 which is load 11 constant, wich still doesn't explain the X anyway). 

The stock Add (add memory to A) is called ADD - it just adds it and changes nothing else. ADD11 is add with carry in, carry out and skip on carry. I did wonder if it was related to the opcode value, but as that's 09 that's somewhat unlikely.

Why not ADC, ADCSK or something that gives some resemblance to what the opcode actually does ?


Then there's TM. It took me ages to figure out what this actually does. According to the data sheet it does:

TM x      C0-FF   R<-S<-PC+1,PU<-0,PM<-0
                  PL<-x(I5-I0),PU<-y(I7-I6)
IDX yz    00-FF   PL<-z(I5-I0),PM<-(0100) base2

which is pretty obvious ...... no (the datasheet doesn't tell you what x,y,z are). And it says it's a two byte instruction. It isn't. Well, it sort of is, it's just the two bytes are nowhere near each other. Most two byte instructions are consecutive.

It describes it as an indexed jump, which means whoever wrote it didn't know the difference between indexing and indirection. I figured it out with the help of the Patent and deductive reasoning.

What it actually is is a one byte subroutine call. The TM x (e.g. TM 20) refers to a byte in Page 0. This byte is split into two bits - the lower 6 bits go to the program counter, the upper 2 bits have 16 added to them and go in the page register. 

This baffled me for ages because the instruction set actually has a two byte long subroutine call (TL) which on the face of it does the same thing. The advantage that the TM call has is that each call uses only one byte of code memory - and one byte of memory in page 0 which is the indirect address - but the byte in page 0 is only required once. So if you call the same subroutine 10 times you use 11 bytes, whereas TL uses 20.

The instruction does a second fetch at the point of the comma in the middle line. Of course, it doesn't bother to say this.

There is an example of it in the datasheet which takes half a page to make this utterly unclear.

Finally the leap of faith. There are two different 3A opcode instructions (ADX 10 and DC). Both add 10 to the accumulator and leave the carry flag unchanged. One skips on carry. The other doesn't. You can't really miss them in the datasheet as they are seperated by one instruction. Well, Sharp, which one is it ?

I hate to guess, but I've gone for the 'skip on carry' one because all the other add constant instructions are skip on carry out. It would be mind numbing to have an add constant instruction which skipped on carry for every value other than 10 being added.

But that gets back to the start ; don't assume sobriety or sanity. In the absence of a Sharp SM5 programmers guide though, in this case I have to.

Sharp SM510 Innards

This (right) is a block diagram of the SM510 4 Bit Microcontroller made by Sharp.

Almost all of these microcontrollers are very very similar of this era, they are probably based around the TMS1x00 series' design.

There's some ROM, some RAM which is accessed via an index register pair (BM:BL on this processor - it's a pair because you need more than 4 bits usually !), and some input and output ports.

Most of the connections here are to the LCD the processor is driving (a1-a16,b1-b16,H1-H4). a1-a16 and b1-b16 are the segments, and H1-H4 are common selection lines, allowing 32 x 4 segments in total , or 128 segments.

Other than that there are a few oddities. The 'W' register on the left is a Serial in Parallel out shift register written to by two instructions - write one/zero and shift  (WS and WR), these output on the S-Lines on the left.

There is a divider which can be tested (the DIV(15) bottom middle) which clocks at the Osc rate which is 32,768Hz, which as any good assembler programmer knows is 2^15Hz. Divide this by two 15 times and you get one Hz, which is ideal for a Watch or Clock device which is what this chip is for, one pulse a second.

R1 and R2 provide a 4,096 Hz tone which can be then modulated under program control. The later versions of this (see the SM511 datasheet) have automatic melody generation and even later ones have automatic sound effect generation. But on the SM510 you have to modulate it by hand. Oh joy.

ACL is the reset, K1-K4 are inputs that can be read into the Accumulator, and B and BA are inputs that can be tested individually by an instruction. A lot of these early microcontrollers have instructions that effect I/O directly - this one has (as an example) ATR which copies the 2 LSBs of the Accumulator into R1 and R2. Modern MCUs almost all have memory mapped I/O so instead of having WS and WR you'd have a register to write to to update the W register (and therefore S1-S8)

It's quite slow ; it clocks at 16,384Hz normally, a 61us instruction cycle which is even slower than a TMS1000 (50,000 Hz).  But if you are developing a watch all you need to do is wake the processor up every second (this MCU has a sleep mode), update the LCD display and go to sleep again.

However against this is the fact that you don't need to refresh the display. The display RAM (top right) which is between $60 and $80 is directly mapped onto the an,bn,hn lines, so to light an LCD segment all you have to do is set a memory bit, which is a single instruction. The MCU does all the work for you.

(This is probably partly responsible for the slower speed. The MCU will have to do 2 things at once - drive the LCD and run the Processor code, and dual ported Memory is expensive. It's more likely that the RAM access is locked for one or the other)

Having done some messing with the MB Microvision, this is much easier. Coding for the Microvision is a bit like coding for the 2600, in that you have to generate the display and refresh it at about 20 or so fps, it is not fun to put it mildly if you have the TMS instruction set which is very basic. With a Microvision (a 16x16 grid) you write 16 bits of row data, then 16 bits of column data, any pixel which is on in both the row and column lights up (actually it goes dark :) ) so this means usually you write a row at a time, with one column bit set, then you do the next one, round and round you go for ever.