Connexions


Proposal For Large Processor Architectures

Module by: Warren Myers.

Summary: A new RISC processor to enhance the computing abilities of the scientific, modeling, data-processing, and engineering worlds. While the proposed architecture would work for general-purpose personal computers, it far exceeds what 85% of the home market will ever need.

Over the years, much debate has centered on the pros and cons of the RISC and CISC approaches to computing. I am proposing a new RISC-based processor. So far, much consideration has been given to 64-bit computing, but larger architectures get little, if any, mention.

I would like to see a 128-bit or 256-bit general-purpose processor manufactured. This document describes its hardware instructions and the format for writing code for the processor, and argues for its adoption.

Instructions

The instructions fall into the following categories:

Arithmetic

add, sub, mul, div, inc, dec

Logical

xor, or, and, not

Floating-point

fadd, fsub, fmul, fdiv

Bit-wise

shl, shr, rol, ror

Stack

push, pop

Memory

get, put

Flow-control

cmp, jmp, je, jne, jl, jg

Subroutine

call, ret

System

int

Cache

Cache contributes greatly to a processor's performance. To enable high-performance applications on this architecture, a 128-bit processor will have at least 512 KB of L1, 4 MB of L2, and 8 MB of L3 cache on-board. A 256-bit implementation should have at least 1 MB of L1, 8 MB of L2, and 16 MB of L3 cache on-board. A very small operating system could conceivably stay resident in the cache of one processor in a multiprocessor system. If the applications being run were small enough, the OS could even stay resident in the cache of a single-CPU system.

Registers

One of the overarching needs of any new processor architecture is a large number of general-purpose registers (GPRs). Registers sit on the processor and are substantially faster to access than main memory. To meet this need, the proposed architecture will have as many registers as it has bits. A 128-bit implementation will have 128 GPRs (r0..r127), and a 256-bit implementation will have 256 GPRs (r0..r255).

A few special-purpose registers are also needed:

Table 1: Special-purpose registers

Register   Purpose
rx         Holds the remainder of a div operation
rf         Flags register
rs         Stack pointer
rip        Instruction pointer
rm         Holds the number of bits to shift or rotate
rt         Holds the current value at the top of the stack

In addition to the normal GPRs, there will be a second ‘stack’ of registers to hold overflow from arithmetic operations. Multiplying two 128-bit operands can yield a 256-bit result, so a 128-bit CPU will have a second set of registers, s0..s127, to hold the high half of such a result. Division likewise uses the full double-width register pair (e.g. s2-r2). In a multiplication, the second-stack register holds the ‘high half’ of the product. In a division, it holds the high half of the dividend.
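As a rough illustration, the register pairing can be modeled in Python, whose arbitrary-precision integers stand in for 128-bit hardware. The class name `Registers` and its method names are hypothetical. Two behaviors are also assumptions of this sketch rather than part of the proposal: div leaves the quotient in r[i] and clears s[i], and a quotient too large for one register raises an error.

```python
class Registers:
    """Toy register file: r0..r(N-1), the second stack s0..s(N-1), and rx."""

    def __init__(self, width: int = 128) -> None:
        self.width = width
        self.mask = (1 << width) - 1   # keeps a value within one register
        self.r = [0] * width           # one GPR per bit of width, as proposed
        self.s = [0] * width           # high halves (mul) / dividend highs (div)
        self.rx = 0                    # remainder of the last div

    def mul(self, i: int, operand: int) -> None:
        # Full double-width product: the low half stays in r[i] and the high half goes to s[i].
        product = self.r[i] * (operand & self.mask)
        self.r[i] = product & self.mask
        self.s[i] = product >> self.width

    def div(self, i: int, divisor: int) -> None:
        # The dividend is the double-width pair s[i]-r[i].
        dividend = (self.s[i] << self.width) | self.r[i]
        quotient, remainder = divmod(dividend, divisor & self.mask)
        if quotient > self.mask:
            # Assumption: an oversized quotient is treated as a fault.
            raise OverflowError("quotient does not fit in one register")
        self.r[i] = quotient
        self.s[i] = 0                  # assumption: div consumes the high half
        self.rx = remainder


regs = Registers()
regs.r[2] = 2**127
regs.mul(2, 4)   # 2**129 overflows: r2 becomes 0, s2 becomes 2
regs.div(2, 4)   # s2-r2 divided by 4: r2 becomes 2**127, rx becomes 0
```

Keeping the high half in a same-index register means a mul followed by a div on the same index round-trips without any extra moves. That is the point of the proposal's pairing.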

Instruction Format

Every instruction follows the same basic form, and all operations work on register values. Only get and put access memory. Binary operations, such as add, sub, and xor, take either two registers or a register and a constant.

Result stored in the left operand:

add reg, reg/const

sub reg, reg/const

mul reg, reg/const*

div reg, reg/const*

xor reg, reg/const

and reg, reg/const

or reg, reg/const

get reg, reg/mem/const

put mem, reg/const

shl reg, rm/const*

shr reg, rm/const

rol reg, rm/const

ror reg, rm/const

Note:

*mul and shl place overflow in the second-stack register with the same index (e.g. shl r4, 7 moves the high 7 bits of r4 into s4); div uses the value in the second-stack register as the high half of the dividend.
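The overflow rule for shl can be sketched for a 128-bit register alongside the other shift and rotate instructions. The function names mirror the mnemonics. The shift count `n` stands for either rm or a constant. Because the note routes nothing from shr into s, this sketch assumes shr simply discards the low bits.

```python
WIDTH = 128
MASK = (1 << WIDTH) - 1


def shl(value: int, n: int) -> tuple[int, int]:
    """shl: return (new r, new s); bits shifted out of the top land in s."""
    shifted = value << n
    return shifted & MASK, shifted >> WIDTH


def shr(value: int, n: int) -> int:
    """shr: bits shifted out of the bottom are discarded (assumption)."""
    return value >> n


def rol(value: int, n: int) -> int:
    """rol: bits leaving the top re-enter at the bottom."""
    n %= WIDTH
    return ((value << n) | (value >> (WIDTH - n))) & MASK


def ror(value: int, n: int) -> int:
    """ror: bits leaving the bottom re-enter at the top."""
    n %= WIDTH
    return ((value >> n) | (value << (WIDTH - n))) & MASK


# shl r4, 7 with the top byte of r4 set: s4 receives the seven high bits.
r4, s4 = shl(0xFF << 120, 7)
assert (r4, s4) == (1 << 127, 0x7F)
```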

Single operand (result written back to the operand, or operand unchanged):

not reg

push reg/const*

pop reg*

Note:

*these also update the top-of-stack register, rt.
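The following is a minimal model of how push and pop could keep rs and rt current. The proposal does not say which way the stack grows, where it starts, or what rt holds when the stack is empty. This sketch therefore assumes a downward-growing stack of 16-byte words starting at a hypothetical address 0x1000, with rt reading 0 when the stack is empty. The `Stack` class is illustrative only.

```python
class Stack:
    """Toy model of push/pop updating rs (stack pointer) and rt (top value)."""

    def __init__(self, top_address: int = 0x1000, word_bytes: int = 16) -> None:
        self.base = top_address        # hypothetical initial stack address
        self.word = word_bytes         # 16 bytes per word on a 128-bit CPU
        self.memory: dict[int, int] = {}
        self.rs = top_address          # stack pointer
        self.rt = 0                    # cached value at the top of the stack

    def push(self, value: int) -> None:
        self.rs -= self.word           # assumption: the stack grows downward
        self.memory[self.rs] = value
        self.rt = value

    def pop(self) -> int:
        if self.rs == self.base:
            raise IndexError("pop from empty stack")
        value = self.memory.pop(self.rs)
        self.rs += self.word
        # rt mirrors the new top, or reads 0 once the stack is empty (assumption).
        self.rt = self.memory.get(self.rs, 0)
        return value
```

Caching the top value in rt lets code inspect the stack without a memory access, which fits the proposal's rule that only get and put touch memory.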

Comparison and flow-control:

cmp reg, reg/const

jmp <label>

je <label>

jl <label>

jg <label>

jne <label>

call <subroutine>

ret*

Note:

*returns to the instruction following the most recent call
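To show how cmp, the conditional jumps, and call/ret interact, here is a toy interpreter for a handful of the instructions. It rests on several assumptions that are not in the proposal:

- rf is modeled as two flags (equal, less).
- Comparisons use unbounded Python integers.
- Return addresses live in a Python list instead of on the memory stack.
- `halt` is a hypothetical stop instruction, added only so the loop can end.

```python
def run(program, labels, regs):
    """Execute a tiny subset of the proposed instructions until `halt`."""
    rf = {"eq": False, "lt": False}   # assumed flag layout for rf
    call_stack = []                   # return addresses (assumption: not in memory)
    rip = 0                           # instruction pointer
    while True:
        op, *args = program[rip]
        rip += 1                      # rip now points at the next instruction
        if op == "cmp":
            a = regs[args[0]]
            b = regs[args[1]] if isinstance(args[1], str) else args[1]
            rf["eq"], rf["lt"] = a == b, a < b
        elif op == "jmp":
            rip = labels[args[0]]
        elif op in ("je", "jne", "jl", "jg"):
            taken = {
                "je": rf["eq"],
                "jne": not rf["eq"],
                "jl": rf["lt"],
                "jg": not rf["eq"] and not rf["lt"],
            }[op]
            if taken:
                rip = labels[args[0]]
        elif op == "call":
            call_stack.append(rip)    # return to the instruction after the call
            rip = labels[args[0]]
        elif op == "ret":
            rip = call_stack.pop()
        elif op == "inc":
            regs[args[0]] += 1
        elif op == "halt":            # hypothetical stop marker
            return regs
        else:
            raise ValueError(f"unsupported instruction: {op}")


# Count r0 up to 10, doing the increment in a subroutine.
program = [
    ("call", "bump"),   # 0  start:
    ("cmp", "r0", 10),  # 1
    ("jl", "start"),    # 2
    ("halt",),          # 3
    ("inc", "r0"),      # 4  bump:
    ("ret",),           # 5
]
labels = {"start": 0, "bump": 4}
print(run(program, labels, {"r0": 0}))  # prints {'r0': 10}
```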

Floating-point:

fadd reg, reg/const

fsub reg, reg/const

fmul reg, reg/const*

fdiv reg, reg/const*

Note:

*these use the second register for the entire calculation; they do not place overflow anywhere.

System:

The int instruction invokes BIOS and rudimentary OS routines.

int const

Benefits of a large architecture

Large processor architectures offer many potential benefits. The first is that effectively unlimited quantities of RAM and storage can be addressed directly. A 128-bit address reaches $2^{128}$ words. At 16 bytes (128 bits) per word, that is $2^{132} \approx 5.4 \times 10^{39}$ bytes, or about $4 \times 10^{40}$ bits. A 256-bit CPU could theoretically address $2^{256}$ 32-byte words, or $2^{261} \approx 3.7 \times 10^{78}$ bytes. Realistically, this much storage will never be available to us. Suppose instead that memory addresses must share a word-length instruction with the opcode and the register operand. Even then, about $6 \times 10^{35}$ bytes would remain addressable. This is still far beyond what any machine could hold, so the architecture should remain usable well into the distant future.
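The address-space figures follow from simple powers of two. The helper below is a sketch. The instruction-field widths in the last part are assumptions, since the proposal does not define an encoding: a 7-bit register field for r0..r127 and a hypothetical 6-bit opcode, leaving 115 address bits in a 128-bit word. Under those assumptions the result lands near the $6 \times 10^{35}$ figure quoted in the text.

```python
def addressable_bytes(address_bits: int, word_bits: int) -> int:
    """Bytes reachable when each of 2**address_bits addresses names one word."""
    return 2 ** address_bits * (word_bits // 8)


# Full-width addresses on the 128- and 256-bit designs.
for width in (128, 256):
    total = addressable_bytes(width, width)
    print(f"{width}-bit: 2**{total.bit_length() - 1} bytes ~ {total:.2e}")

# Assumed encoding: 128-bit instruction = 6-bit opcode + 7-bit register + address.
address_field = 128 - 6 - 7
print(f"encoded: {addressable_bytes(address_field, 128):.2e} bytes")
```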

Other benefits of such a large architecture come in the scientific processing world. Computer scientists constantly need to think about the level of precision their machines can handle. Even top-end supercomputers do not natively support floating-point operations as wide as those described above. Large-scale simulations, such as those performed by the ‘Earth Simulator’ in Japan, could greatly improve their accuracy with larger architectures. Floating-point round-off error at 256 or 512 bits would not worry the vast majority of the scientific community.
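To put the round-off point in numbers, the sketch below rounds 1/10 to p significant bits with exact rational arithmetic and reports the relative error. The significand widths are an assumption borrowed from IEEE 754: p = 53 for today's 64-bit double, 113 for binary128, and 237 for binary256. The proposal does not specify a floating-point format. With round-to-nearest, the relative error is bounded by $2^{-p}$, so each step up in width shrinks it dramatically.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round a positive rational x to p significant bits, ties to even."""
    # Find k = floor(log2(x)) exactly, without converting to float.
    e = x.numerator.bit_length() - x.denominator.bit_length()
    k = e if x >= Fraction(2) ** e else e - 1
    # Scale so the significand lands in [2**(p-1), 2**p), round, scale back.
    scale = Fraction(2) ** (p - 1 - k)
    return Fraction(round(x * scale)) / scale


x = Fraction(1, 10)
for name, p in (("binary64", 53), ("binary128", 113), ("binary256", 237)):
    rel_err = abs(round_to_bits(x, p) - x) / x
    print(f"{name:9} p={p:3}  relative error of 1/10 ~ {float(rel_err):.1e}")
```

As a sanity check, rounding 1/10 to 53 bits reproduces exactly the value Python stores for the float literal `0.1`.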

Other possible uses for this processor include engineering workstations, data-mining servers, dedicated cryptography machines, and the high-end graphics market. A wide data path, such as the one described in this article, simplifies implementing and executing many encryption algorithms, because they operate on large chunks of data at a time. The Advanced Encryption Standard (AES), for example, processes 128-bit blocks and accepts 128-, 192-, or 256-bit keys.
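The AES example can be made concrete. Each AddRoundKey step is a bitwise XOR of the 128-bit state with a 128-bit round key, so a 128-bit register does it in one xor where a 64-bit machine needs two. The sketch below uses the example plaintext and key from FIPS-197 Appendix C.1. For the initial AddRoundKey, the round key is the cipher key itself. The sketch checks that both approaches agree. It covers only this one step, not a full AES implementation.

```python
MASK64 = (1 << 64) - 1

state = 0x00112233445566778899AABBCCDDEEFF      # FIPS-197 C.1 plaintext
round_key = 0x000102030405060708090A0B0C0D0E0F  # FIPS-197 C.1 key (initial round key)

# 128-bit machine: a single xor on one register.
wide = state ^ round_key

# 64-bit machine: two xors on the high and low halves, then recombine.
hi = (state >> 64) ^ (round_key >> 64)
lo = (state & MASK64) ^ (round_key & MASK64)
narrow = (hi << 64) | lo

assert wide == narrow
print(f"{wide:032x}")  # prints 00102030405060708090a0b0c0d0e0f0
```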

Performing advanced data analysis and mining on a machine that could directly address all of the storage in use on the planet would be a statistician’s dream. Engineers and graphics professionals alike would see their productivity improve if more of their calculations were performed directly, rather than in the heavily segmented chunks that current commercial off-the-shelf hardware requires.

Concluding remarks

Overall, I believe the processor manufacturing industry needs to take a quantum leap forward in technology. Recently, AMD and Intel have introduced 64-bit extensions to the ubiquitous x86 architecture. DEC gave us the 64-bit Alpha CPU several years ago, and Apple uses IBM’s 64-bit PowerPC 970 (G5) processor in its newest desktop machines. The shift from 32- to 64-bit computing is welcome for those of us who are close to exceeding the abilities of 32-bit processors and their inherent limitations, but why stop there? Why not take the extra step now, so that we won’t have to make another architectural change in another decade or two?

A similar shift is under way in computer networks, which are moving from IPv4 to IPv6 addressing and going directly from 32-bit to 128-bit addresses. This change takes us from approximately 4 billion available addresses to about one mole of addresses per square yard of the earth’s surface.
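The "one mole per square yard" figure can be checked with a few lines of arithmetic. The only assumed input is Earth's surface area, taken here as roughly 5.10 × 10^14 m². The yard conversion (1 yd = 0.9144 m) and the Avogadro constant are exact SI values.

```python
IPV4 = 2 ** 32
IPV6 = 2 ** 128

EARTH_SURFACE_M2 = 5.10e14   # approximate surface area of Earth (assumed input)
SQ_YARD_M2 = 0.9144 ** 2     # 1 yd = 0.9144 m exactly
AVOGADRO = 6.02214076e23     # exact SI value, per mole

earth_sq_yards = EARTH_SURFACE_M2 / SQ_YARD_M2
moles_per_sq_yard = IPV6 / earth_sq_yards / AVOGADRO

print(f"IPv4 addresses: {IPV4:,}")
print(f"IPv6 addresses per square yard: {moles_per_sq_yard:.2f} mol")
```

The result is roughly 0.93 mol per square yard, which supports the "about one mole" description.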

Let’s make this change sooner, rather than later.
