RM7000A: A RISC Microprocessor Based on the MIPS Architecture
Abstract: This paper briefly introduces the main features of the RM7000A, a microprocessor that implements the MIPS instruction set. These features include large on-chip caches, a superscalar pipeline, dual instruction issue and large register files. The paper also discusses two application schemes.
Key words: MIPS architecture; RM7000A; microprocessor
Introduction
Among the many RISC CPU families, MIPS (Microprocessor without Interlocked Pipeline Stages) is one of the most successful. John Hennessy's group at Stanford University completed the first MIPS microprocessor based on the RISC concept in 1983. Since then, MIPS-based CPUs have been widely used in networking, communications, multimedia and entertainment, and other fields. Cisco routers, IBM network color printers, HP 4000, 5000, 8000 and 9000 series laser printers, process scanners, and Sony's PlayStation and PlayStation 2 game consoles all use microprocessors that implement various versions of the MIPS instruction set.
MIPS Technologies Inc. does not manufacture microprocessors. It designs high-performance 32-bit and 64-bit CPU architectures and cores for industry. It then licenses these cores (IP) to other semiconductor companies, which use them to build their own distinctive MIPS-based microprocessors. According to MIPS Technologies, more than 50 companies have taken licenses. These include well-known IT companies such as AMD, ATI, TI, NEC, Toshiba, Philips, PMC-Sierra, IDT, QuickLogic and Marvell.
1 RM7000A overview
The RM7000A is a member of PMC-Sierra's RM7000 family of microprocessors. It is fabricated in a 0.18 μm CMOS process and contains two independent 64-bit integer units and one 64-bit floating-point unit. The primary caches, the L2 cache and a controller for an external cache are integrated on chip, with support for up to 8 MB of L3 cache. The processor can issue two instructions per clock cycle and supports data prefetch. It runs at up to 400 MHz. It also operates over a wide temperature range: the 350 MHz industrial-grade part is rated for -40 ℃ to +85 ℃.
The microprocessor has the following main features.
(1) Large primary and L2 caches integrated on chip
The primary cache consists of a 16 KB instruction cache and a 16 KB data cache. Each cache has its own 64-bit read path and 128-bit write (refill) path, so the instruction cache and the data cache can be accessed at the same time. At 400 MHz, the primary caches provide the integer and floating-point units with an aggregate bandwidth of up to 6.4 GB/s. The 256 KB L2 cache has a single 64-bit path shared by reads and writes. It is accessed only when an access to the primary cache misses.
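As a quick check of the 6.4 GB/s figure, assume that it counts one 64-bit instruction-cache read plus one 64-bit data-cache access per clock. This breakdown is our reading and is not stated explicitly above. Under that assumption the arithmetic is:

```latex
B = (64 + 64)\ \text{bit} \times 400 \times 10^{6}\ \text{s}^{-1}
  = 16\ \text{B} \times 4 \times 10^{8}\ \text{s}^{-1}
  = 6.4 \times 10^{9}\ \text{B/s} = 6.4\ \text{GB/s}
```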
The primary caches and the L2 cache are 4-way set associative, with 32-byte cache lines. All of them are non-blocking caches. When a cache access misses, the processor does not stall to wait for the miss to be serviced; it can continue to issue cache accesses. The RM7000A allows cache accesses to proceed with up to two misses outstanding. The processor stalls in only two cases. First, another cache-access instruction is issued before the first two misses have been resolved. Second, two misses are outstanding and a subsequent instruction needs the data from an instruction that caused one of the misses.
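The cache organization (size, 4 ways, 32-byte lines) determines how an address is split into a tag, a set index and a byte offset within the line. The sketch below works this out for the 16 KB primary caches and the 256 KB L2. The function names are hypothetical and purely illustrative. The sketch ignores details such as whether an index is taken from the virtual or the physical address.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    All three arguments are assumed to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits


def split_address(addr, size_bytes, ways, line_bytes):
    """Split an address into (tag, set index, byte offset within the line)."""
    sets, offset_bits, index_bits = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


KB = 1024
print(cache_geometry(16 * KB, 4, 32))            # primary I- or D-cache: (128, 5, 7)
print(cache_geometry(256 * KB, 4, 32))           # on-chip L2: (2048, 5, 11)
print(split_address(0x12345, 16 * KB, 4, 32))    # (18, 26, 5)
```

Each of the 128 sets in a primary cache holds 4 lines of 32 bytes: 128 × 4 × 32 B = 16 KB. A lookup therefore compares the tag against only the 4 lines of one set rather than against the whole cache.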
(2) Dual instruction issue
RM7000A instructions fall into four types: integer, floating-point, branch and load/store. The superscalar dispatch unit in the integer unit contains two independent pipelines, the M-pipe (memory) and the F-pipe (function). The F-pipe handles integer operations, branches and floating-point arithmetic such as addition, subtraction, multiplication and division. The M-pipe handles integer operations and loads/stores, and also moves floating-point values between registers. A block diagram of dual issue is shown in Figure 1.
Because one instruction can leave each pipeline every clock cycle, two instructions are effectively executed at the same time. However, the processor cannot always dual issue; whether it can depends on the particular combination of instructions. For example, an instruction that accesses a control register cannot be issued together with any other instruction.
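The pairing rule can be sketched as a small matching problem. Two adjacent instructions issue together only if one can go down the M-pipe and the other down the F-pipe, and neither accesses a control register. The instruction-class names and the `PIPES` table are hypothetical simplifications of the description above, not the RM7000A's actual issue logic. In particular, the sketch ignores data dependencies between the two instructions and any other hardware constraints.

```python
# Simplified model of dual issue. Class names and the PIPES table are illustrative
# assumptions based on the text; data dependencies within a pair are ignored.
PIPES = {
    "int": {"M", "F"},        # integer ALU operations: either pipe
    "branch": {"F"},          # branches go down the F-pipe
    "fp_arith": {"F"},        # FP add/sub/mul/div go down the F-pipe
    "load_store": {"M"},      # loads and stores go down the M-pipe
    "fp_move": {"M"},         # register-to-register FP moves go down the M-pipe
    "control_reg": set(),     # control-register access: always issues alone
}


def can_dual_issue(a, b):
    """True if instruction classes a and b can issue in the same cycle."""
    pa, pb = PIPES[a], PIPES[b]
    if not pa or not pb:
        return False  # a control-register instruction cannot be paired
    # One instruction must take the M-pipe and the other the F-pipe.
    return ("M" in pa and "F" in pb) or ("F" in pa and "M" in pb)


def issue_cycles(seq):
    """Cycles needed to issue an in-order sequence, pairing neighbours when allowed."""
    cycles, i = 0, 0
    while i < len(seq):
        if i + 1 < len(seq) and can_dual_issue(seq[i], seq[i + 1]):
            i += 2  # dual issue
        else:
            i += 1  # single issue
        cycles += 1
    return cycles


program = ["load_store", "fp_arith", "int", "branch", "control_reg", "int"]
print(issue_cycles(program))  # 4 cycles for 6 instructions
```

In this example the load/store and FP arithmetic pair up, as do the integer operation and the branch. The control-register instruction and the final integer operation each issue alone, so six instructions take four cycles rather than three.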
(3) Superscalar pipeline
The RM7000A has a 5-stage superscalar instruction pipeline of degree m = 2, one lane for the M-pipe and one for the F-pipe. Each instruction passes through five stages: I (instruction fetch), R (register read), A (execute), D (data access) and W (write back), as shown in Figure 2.
With dual issue, two new instructions enter the pipeline every clock cycle. When the pipeline is full, up to 10 instructions (5 stages × 2) are in different stages at the same time. For peak throughput, this is equivalent to doubling the clock frequency.
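The "10 instructions in flight" figure follows directly from 5 stages × 2 instructions per stage. The sketch below assumes an ideal pipeline with no stalls, in which every adjacent pair can be dual issued. It draws the resulting pipeline diagram and counts how many instructions are in the pipeline at each cycle.

```python
# Ideal dual-issue 5-stage pipeline: no stalls, every pair can be issued together.
STAGES = ["I", "R", "A", "D", "W"]  # fetch, register read, execute, data access, write back
WIDTH = 2                           # instructions entering the pipeline per cycle


def stage_of(instr, cycle):
    """Stage occupied by instruction number `instr` at `cycle`, or None if not in the pipeline."""
    step = cycle - instr // WIDTH  # instruction k enters at cycle k // WIDTH
    return STAGES[step] if 0 <= step < len(STAGES) else None


def in_flight(cycle, n_instr):
    """Number of instructions somewhere in the pipeline at `cycle`."""
    return sum(stage_of(i, cycle) is not None for i in range(n_instr))


def diagram(n_instr):
    """Text pipeline diagram: one row per instruction, one column per cycle."""
    n_cycles = (n_instr - 1) // WIDTH + len(STAGES)
    return "\n".join(
        f"i{i:<2} " + " ".join(stage_of(i, c) or "." for c in range(n_cycles))
        for i in range(n_instr)
    )


print(diagram(6))
print(max(in_flight(c, 20) for c in range(20)))  # 10 = 5 stages x 2 instructions
```

Any stall, such as an unpairable instruction or a cache miss that must be waited on, lowers the count below this ideal.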
In fact, the RM7000A also has a separate 7-stage pipeline for floating-point operations, but this pipeline is completely transparent to the user.
(4) Register files
The RM7000A contains many registers:
◆ Integer register file. Located in the integer unit, it contains 32 64-bit general-purpose registers (GPRs), two registers, HI and LO, dedicated to integer multiplication and division, and a program counter (PC), which is transparent to the user. General register R0 is hardwired to 0. It can be used as a destination register to discard results that are not needed, and it can supply the constant 0 as an operand.
◆ Floating-point register file. It contains 32 64-bit floating-point general registers (FGRs) and 32 32-bit control registers.
◆ System control registers (CP0), used for memory management, address translation, exception handling, etc.
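The behavior of R0 and of HI/LO can be illustrated with a small software model of the integer register file. The class below is a hypothetical sketch, not a description of the hardware. Writes to R0 are discarded, so reads of R0 always return 0. The multiply method mirrors the MIPS64 DMULTU instruction, which places the high 64 bits of the 128-bit product in HI and the low 64 bits in LO.

```python
MASK64 = (1 << 64) - 1


class IntRegisterFile:
    """Sketch of the integer register set: 32 x 64-bit GPRs plus HI and LO."""

    def __init__(self):
        self._gpr = [0] * 32
        self.hi = 0
        self.lo = 0

    def read(self, n):
        # R0 is hardwired to zero, whatever was "written" to it.
        return 0 if n == 0 else self._gpr[n]

    def write(self, n, value):
        # Writes to R0 are discarded, so R0 can serve as a throw-away destination.
        if n != 0:
            self._gpr[n] = value & MASK64

    def dmultu(self, rs, rt):
        # Unsigned 64 x 64 -> 128-bit multiply: high half to HI, low half to LO.
        product = self.read(rs) * self.read(rt)
        self.hi = product >> 64
        self.lo = product & MASK64


rf = IntRegisterFile()
rf.write(0, 123)      # discarded
rf.write(1, -1)       # stored as the 64-bit pattern 0xFFFF_FFFF_FFFF_FFFF
print(rf.read(0), hex(rf.read(1)))  # 0 0xffffffffffffffff
```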
(5) Integrated, efficient memory management unit
To translate virtual addresses to physical addresses quickly, the RM7000A uses a large fully associative TLB (translation lookaside buffer, shown in Figure 3). This TLB is shared by instructions and data and is called the JTLB (joint TLB). It can be configured with 48 or 64 entry pairs, which map 96 or 128 virtual pages respectively. The page size is configurable: 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB or 16 MB. On a TLB miss, the RM7000A mainly uses random replacement, which simplifies the hardware design. It also provides a mechanism for locking specific TLB entries, so that the operating system can keep certain pages permanently mapped to improve performance.
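ASID (address space identifier): identifies the address space, i.e. the process, to which a TLB entry belongs, so that entries of different processes can coexist in the TLB.
G (global): a bit in each TLB entry; when it is set, the entry matches regardless of the current ASID.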
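Each TLB entry maps a pair of adjacent virtual pages, an even page and an odd page. A virtual address therefore splits into a VPN2 (the page-pair number compared against the entry), one bit selecting the even or odd page, and the page offset. The sketch below shows this split, the ASID/G matching rule and the amount of memory the TLB can map. Function names are hypothetical. The model is simplified: it assumes a single page size for all entries and ignores the other fields of an entry.

```python
KB, MB = 1 << 10, 1 << 20
PAGE_SIZES = (4 * KB, 16 * KB, 64 * KB, 256 * KB, 1 * MB, 4 * MB, 16 * MB)


def split_vaddr(vaddr, page_size):
    """Split a virtual address into (VPN2, odd-page bit, page offset)."""
    if page_size not in PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page_size}")
    shift = page_size.bit_length() - 1   # log2(page size) = number of offset bits
    offset = vaddr & (page_size - 1)
    odd = (vaddr >> shift) & 1           # picks the even or odd page of an entry pair
    vpn2 = vaddr >> (shift + 1)          # compared with the VPN2 field of each entry
    return vpn2, odd, offset


def entry_matches(entry_vpn2, entry_asid, entry_global, vpn2, current_asid):
    """An entry hits if VPN2 matches and it is global or belongs to the current ASID."""
    return entry_vpn2 == vpn2 and (entry_global or entry_asid == current_asid)


def tlb_reach(entry_pairs, page_size):
    """Bytes of virtual address space covered when every entry pair is valid."""
    return entry_pairs * 2 * page_size


print(split_vaddr(0x00403ABC, 4 * KB))     # (513, 1, 2748), i.e. VPN2 0x201, odd page, offset 0xABC
print(tlb_reach(48, 4 * KB) // KB, "KB")   # 384 KB with the smallest pages
print(tlb_reach(64, 16 * MB) // MB, "MB")  # 2048 MB with the largest pages
```

The last two lines show why configurable page sizes matter. With 4 KB pages the TLB covers only a few hundred kilobytes without a miss, while with 16 MB pages 64 entry pairs cover 2 GB.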
(6) Instruction formats and addressing modes
Note that although the RM7000A is a 64-bit microprocessor, its instructions are a fixed 32 bits long.
The RM7000A is a typical register-register (load/store) microprocessor: apart from load and store instructions, no instruction can access memory directly. The advantages are clear. Registers are faster to access than memory. Keeping operands in registers also makes it easier for the compiler to generate efficient code, so programs run faster. In addition, a register number needs fewer bits than a memory address, which improves code density.
MIPS instructions come in three formats: R-type (register), I-type (immediate) and J-type (jump).
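The three formats differ only in how the 32 bits after the 6-bit opcode are divided. R-type uses three 5-bit register fields, a 5-bit shift amount and a 6-bit function code. I-type uses two register fields and a 16-bit immediate. J-type uses a 26-bit jump target. The encoder below is a minimal sketch with hypothetical function names. The opcode and function values used in the examples are the standard MIPS ones for ADDU, ADDIU, LW and J.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16), stored as 16-bit two's complement."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, target_addr):
    """J-type: opcode(6) | target(26); the target is a word address (byte address >> 2)."""
    return (opcode << 26) | ((target_addr >> 2) & 0x3FFFFFF)


print(f"{encode_r(0x00, 2, 3, 1, 0, 0x21):08x}")  # addu  $1, $2, $3  -> 00430821
print(f"{encode_i(0x09, 1, 1, 10):08x}")          # addiu $1, $1, 10  -> 2421000a
print(f"{encode_i(0x23, 2, 1, 10):08x}")          # lw    $1, 10($2)  -> 8c41000a
print(f"{encode_j(0x02, 0x00400000):08x}")        # j 0x00400000      -> 08100000
```

Every instruction, including those operating on 64-bit data, still fits in one 32-bit word.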
The MIPS architecture supports two addressing modes. The examples below use a generic notation; in actual MIPS code, only load and store instructions take a memory operand.
◆ Immediate. Example: add R1, #10, where #10 is the constant 10. The instruction means Regs[R1] ← Regs[R1] + 10.
◆ Displacement. Example: add R1, 10(R2), which means Regs[R1] ← Regs[R1] + Mem[10 + Regs[R2]].
Changing the operand slightly yields two further addressing modes:
◆ add R1, (R2): Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
Register R2 holds only an address, and the data to be fetched is stored at that address. This is register-indirect addressing, i.e. displacement addressing with a displacement of 0.
◆ add R1, 10: Regs[R1] ← Regs[R1] + Mem[10]
Here the base register must be R0. Because R0 always supplies 0 (it is hardwired to 0, as noted above), the result is direct (absolute) addressing.
In practice, therefore, four addressing modes are available, which makes programming more flexible.
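All four modes reduce to one effective-address computation, disp + Regs[base], and R0 always supplies 0. The sketch below uses hypothetical function names and a toy dictionary for memory. It shows how register-indirect and direct addressing fall out of displacement addressing, and it models the generic "add R1, disp(base)" notation used above. On real MIPS hardware that notation corresponds to a load followed by an add.

```python
MASK64 = (1 << 64) - 1


def reg(regs, n):
    """Read register n; R0 always reads as 0."""
    return 0 if n == 0 else regs[n]


def effective_address(regs, base, disp):
    """Displacement addressing: EA = disp + Regs[base] (64-bit wraparound)."""
    return (disp + reg(regs, base)) & MASK64


def add_from_memory(regs, mem, rd, base, disp):
    """Generic 'add rd, disp(base)': Regs[rd] <- Regs[rd] + Mem[disp + Regs[base]].

    On MIPS hardware this takes two instructions, e.g. lw into a temporary, then add.
    """
    if rd != 0:  # results written to R0 are discarded
        regs[rd] = (reg(regs, rd) + mem[effective_address(regs, base, disp)]) & MASK64


regs = [0] * 32
regs[1], regs[2] = 5, 100
mem = {10: 42, 100: 7, 110: 3}          # toy memory: byte address -> value
print(effective_address(regs, 2, 10))   # displacement:      110
print(effective_address(regs, 2, 0))    # register indirect: 100
print(effective_address(regs, 0, 10))   # direct (base R0):  10
```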
(7) Data types
The MIPS architecture has strict alignment requirements. Instructions must be 32-bit aligned, and data wider than one byte must be aligned as follows:
◇ a halfword must start at an even address (address mod 2 = 0);
◇ a word must start at an address with address mod 4 = 0;
◇ a doubleword must start at an address with address mod 8 = 0.
Figure 4 illustrates data alignment. This alignment scheme simplifies the hardware that checks and controls memory accesses and saves chip area. It also helps programs run faster.
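Because every access size is a power of two, the alignment rules above reduce to a single AND with size - 1. This is one reason the check is cheap in hardware. In the sketch below, the function names are hypothetical. The `AddressError` class is a stand-in for the address-error exception that MIPS raises on a misaligned load or store.

```python
SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}


class AddressError(Exception):
    """Stands in for the address-error exception MIPS raises on a misaligned access."""


def is_aligned(addr, size):
    """True if addr is a multiple of size (size is a power of two)."""
    return (addr & (size - 1)) == 0


def check_access(addr, kind):
    """Return addr unchanged if a `kind` access there is legal, else raise AddressError."""
    size = SIZES[kind]
    if not is_aligned(addr, size):
        raise AddressError(f"misaligned {kind} access at {addr:#x}")
    return addr


for addr in (0x1000, 0x1002, 0x1004):
    # Which access sizes are legal at this address?
    print(hex(addr), [k for k, s in SIZES.items() if is_aligned(addr, s)])
```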
2 Applications and implementation
Combined with different peripheral devices, the RM7000A can be used in a variety of applications.
(1) Typical application
Figure 5 shows a relatively simple configuration. It uses few devices, is highly reliable and takes up little space, so it suits applications where space is limited. Because there are few devices to detect and drive, the boot code and the application programs are relatively simple. Through the dual UART interface, the board can communicate with a host computer and display system status in HyperTerminal on the host, as shown in Figure 6.
Specific applications:
① As a stand-alone computer board with suitable peripherals, forming a small high-speed computer system for relatively simple, single-purpose tasks;
② With bus controller chips and interfaces added as the application requires, as a CPU board on a bus such as PCI or CompactPCI. Working with sensors or other boards such as A/D and D/A boards and DSP communication boards, it can perform testing, high-speed data acquisition, large-scale graphics and image processing, etc.;
③ In network equipment such as large routers and switches.
(2) Extended applications
In the application above, storage capacity is limited in order to reduce size, and there are no external interfaces such as a keyboard and mouse. Because people are used to PCs, they prefer to operate other systems in a familiar way. By suitably extending the application above, a PC-like structure can be built. The block diagram is shown in Figure 7.
Adding chips such as a south bridge and a Super I/O provides interfaces similar to those in a PC. Specific applications include:
① As a development board, for porting and testing different operating systems on MIPS systems, and for developing and debugging MIPS-based application software;
② As a complete system, forming a portable computer;
③ Using the dual UART display or the extended VGA display, with increased external storage, as a high-speed graphics workstation for graphics and image design and development;
④ As a network server on the Internet or an enterprise LAN, providing various network services;
⑤ as distributed
Copyright © 2011 JIN SHI