Summary: A new RISC processor to enhance the computing abilities of the scientific, modeling, data-processing, and engineering worlds. While the proposed architecture would work for general-purpose personal computers, it goes far beyond what 85% of the home market will ever need.
Over the years, much debate has been centered on the pros and cons of RISC and CISC approaches to computing. I am proposing a new RISC-based processor. So far, much consideration has been given to 64-bit computing, but larger architectures seem to get very little, if any, mention.
I would like to see a 128-bit or 256-bit general-purpose processor manufactured. This document describes its hardware instructions, the format to be used in writing code for the processor, and arguments for its adoption.
There are a few separate categories of instructions:
Arithmetic: add, sub, mul, div, inc, dec
Logical: xor, or, and, not
Floating-point: fadd, fsub, fmul, fdiv
Bit-wise: shl, shr, rol, ror
Stack: push, pop
Memory: get, put
Flow-control: cmp, jmp, je, jne, jl, jg
Subroutine: call, ret
System: int
Cache contributes greatly to the performance of a given processor. To enable high-performance applications to be implemented for this architecture, there will be at least 512K L1, 4M L2, and 8M L3 on-board cache for a 128-bit processor. For the 256-bit implementation, there should be at least 1M L1, 8M L2, and 16M L3 cache on-board. For very small operating systems, it is conceivable that the OS could always stay resident in the cache of one of the processors in a multi-processing system. If the applications being run were small enough, the OS could even stay resident in the cache of a single-CPU system.
One of the overarching needs of any new processing architecture is to have lots of general-purpose registers (GPRs). Registers, which sit on the processor, are substantially faster than hitting memory. To achieve this goal, my proposed architecture will have as many registers as the architecture has bits. If this instruction set is implemented in a 128-bit processor, there will be 128 GPRs (r0..r127). On the other hand, if it is implemented in a 256-bit CPU, there should be 256 GPRs (r0..r255).
There is also the need for a few special-purpose registers:
| rx | Register to hold remainder in div operations |
| rf | Flags register |
| rs | Stack pointer register |
| rip | Instruction pointer register |
| rm | Register to hold number of bits to be shifted/rotated |
| rt | Current value at the top of the stack |
In addition to the normal GPRs, there will be a second ‘stack’ of registers to handle overflow in mathematical operations. Two 128-bit operands could yield a 256-bit result in a multiplication operation, so there will be a set of registers s0..s127 in a 128-bit CPU to handle the potential overflow in the result. Additionally, division will use the full double-architecture-width register pair (e.g. s2-r2) as its dividend. The second stack of registers will thus hold the ‘high half’ of the result in a multiplication, and the high half of the dividend in a division.
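The register pairing described above can be sketched in Python, whose integers are arbitrary-precision. This is only an illustrative model, not a specification: the names WIDTH, MASK, mul, and div are hypothetical, a 128-bit width is assumed, and the handling of a quotient wider than one register is left open because the proposal does not define it.

```python
WIDTH = 128                    # assumed architecture width in bits
MASK = (1 << WIDTH) - 1        # selects the low WIDTH bits of a value

def mul(r, operand):
    """Model `mul reg, reg/const`: return (s, r) = (high half, low half)."""
    product = r * operand
    return product >> WIDTH, product & MASK

def div(s, r, divisor):
    """Model `div reg, reg/const` with the s:r pair as the dividend.

    Returns (quotient, remainder); the remainder is what would land in rx.
    The quotient is NOT truncated here, since the proposal does not say
    what happens when it needs more than WIDTH bits.
    """
    dividend = (s << WIDTH) | r
    return divmod(dividend, divisor)

# Largest 128-bit value squared: the product needs both halves.
s2, r2 = mul(MASK, MASK)
# Dividing the double-width pair by the same value recovers the operand.
quotient, rx = div(s2, r2, MASK)
```

Here s2 holds the high half of the product and r2 the low half, mirroring the s2-r2 pairing in the text.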
Every instruction will follow the same basic form, with all operations working on a register value. Only get and put will access memory. Binary operations (add, sub, and, etc.) can use either two registers or a register and a constant.
add reg, reg/const
sub reg, reg/const
mul reg, reg/const*
div reg, reg/const*
xor reg, reg/const
and reg, reg/const
or reg, reg/const
get reg, reg/mem/const
put mem, reg/const
shl reg, rm/const*
shr reg, rm/const
rol reg, rm/const
ror reg, rm/const
* Overflow goes to the second register (e.g. shl r4, 7 puts the high 7 bits of r4 into s4); div uses the value in the second register as the high half of the dividend
not reg
push reg/const*
pop reg*
cmp reg, reg/const
jmp <label>
je <label>
jl <label>
jg <label>
jne <label>
call <subroutine>
ret*
call
fadd reg, reg/const
fsub reg, reg/const
fmul reg, reg/const*
fdiv reg, reg/const*
This accesses BIOS and rudimentary OS routines.
int const
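The overflow behavior noted for shl in the instruction formats (shl r4, 7 puts the high 7 bits of r4 into s4) can be modeled the same way. This is a minimal sketch under stated assumptions: a 128-bit width, a shift count between 0 and the width, and an s register that is simply overwritten by the shifted-out bits. The function name shl and the constants are hypothetical.

```python
WIDTH = 128                    # assumed architecture width in bits
MASK = (1 << WIDTH) - 1        # selects the low WIDTH bits of a value

def shl(r, count):
    """Model `shl reg, rm/const`: return (s, r) after a left shift.

    Bits shifted out of the top of r are captured in s (assumed to be
    overwritten, not OR-ed); count is assumed to be in 0..WIDTH.
    """
    wide = r << count
    return wide >> WIDTH, wide & MASK

# Put a recognizable 7-bit pattern in the top bits of r4, plus a low bit.
r4 = (0b1010101 << (WIDTH - 7)) | 1
s4, r4 = shl(r4, 7)
# s4 now holds the 7 bits that left r4; the low bit has moved up to bit 7.
```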
There are myriad potential benefits of large processor architectures. The first is that, for all practical purposes, unlimited quantities of RAM and storage space can be directly addressed.
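To put numbers on that claim, the following sketch computes how many bytes each address width can reach. It assumes byte addressing and that the full register width is usable as an address, neither of which the proposal states; the helper name addressable_bytes is hypothetical.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits

EXBIBYTE = 2 ** 60             # bytes in one EiB

for bits in (32, 64, 128, 256):
    total = addressable_bytes(bits)
    print(f"{bits:>3}-bit addresses: {total:.3e} bytes "
          f"({total / EXBIBYTE:.3e} EiB)")
```

A 64-bit address space already covers 16 EiB; a 128-bit space is larger by a further factor of 2^64.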
Other benefits from using such a large architecture come in the scientific processing world. Computer scientists constantly need to think about the level of precision their machines can handle. Even top-end supercomputers cannot handle the floating-point operations I described above. Large-scale simulations, such as those performed by the ‘Earth Simulator’ in Japan, could have their accuracy greatly enhanced by using larger architectures. Floating-point round-off error in 256 or 512 bits is not worrisome to the vast majority of the scientific community.
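As a rough illustration of what wider floating-point formats buy, the sketch below compares round-off bounds for three widths. The proposal does not specify a floating-point layout, so this assumes the IEEE 754 interchange formats, whose significand precisions are 53 bits (binary64), 113 bits (binary128), and 237 bits (binary256). The helper names are hypothetical.

```python
import math

# Significand precision in bits, including the implicit leading bit.
# Assumption: the processor would follow IEEE 754 interchange formats.
PRECISION_BITS = {"binary64": 53, "binary128": 113, "binary256": 237}

def unit_roundoff(p):
    """Largest relative error of one correctly rounded operation: 2**-p."""
    return 2.0 ** -p

def decimal_digits(p):
    """Decimal digits guaranteed to survive a round trip through the format."""
    return math.floor((p - 1) * math.log10(2))

for name, p in PRECISION_BITS.items():
    print(f"{name}: unit round-off {unit_roundoff(p):.3e}, "
          f"about {decimal_digits(p)} decimal digits")
```

Under these assumptions, each doubling of the format width roughly doubles the number of reliable decimal digits.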
Other possible uses for this processor include engineering workstations, data-mining servers, dedicated cryptography machines, and the high-end graphics market. A wide data path, such as the one described in this article, makes implementing and executing most encryption algorithms simpler. Most encryption algorithms operate on large chunks at a time, such as the Advanced Encryption Standard, which operates on 128-bit blocks using 128-, 192-, or 256-bit keys.
Performing advanced data analysis and mining using a machine that could directly address all of the storage in use on the planet would be a statistician’s dream. Engineers and graphics professionals alike would see their productivity enhanced by having more of their calculations performed directly rather than in extremely segmented chunks, as they are on current commercial off-the-shelf hardware.
Overall, I believe the processor manufacturing industry needs to take a quantum leap forward in technology. Recently, AMD and Intel have introduced 64-bit extensions to the ubiquitous x86 architecture. DEC gave us the Alpha CPU several years ago, and Apple uses IBM’s PowerPC 970 (G5) processor in its newest desktop machines. Shifting from 32- to 64-bit computing is a welcome change for those among us who are at the edge of exceeding the abilities of 32-bit processors and their inherent limitations, but why stop there? Why not take that extra step now to ensure that we won’t have to make another architectural change in another decade or two?
There is a similar shift taking place in moving from the current IPv4 to IPv6 addressing schemes for computer networks, going directly from 32-bit to 128-bit addresses. This change will take us from approximately 4 billion available nodes on the network to about one mole of addresses per square yard of the earth’s surface.
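The "one mole per square yard" figure can be checked with a few lines of arithmetic. The sketch assumes an Earth surface area of about 5.10e14 square meters (land and ocean combined), which is a rounded value supplied here and not taken from the article. The yard-to-meter conversion and Avogadro's number are exact by definition.

```python
IPV6_ADDRESSES = 2 ** 128
EARTH_SURFACE_M2 = 5.10e14     # assumed rounded total surface area, in m^2
M2_PER_SQ_YARD = 0.9144 ** 2   # 1 yard is exactly 0.9144 m
AVOGADRO = 6.02214076e23       # entities per mole (exact by definition)

earth_surface_sq_yards = EARTH_SURFACE_M2 / M2_PER_SQ_YARD
addresses_per_sq_yard = IPV6_ADDRESSES / earth_surface_sq_yards
moles_per_sq_yard = addresses_per_sq_yard / AVOGADRO

print(f"{addresses_per_sq_yard:.3e} addresses per square yard")
print(f"{moles_per_sq_yard:.2f} moles per square yard")
```

With these inputs the result comes out slightly under one mole per square yard, so the approximation in the text holds.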
Let’s make this change sooner, rather than later.