# **Embedded Systems**

## 8. Hardware Components

© Lothar Thiele

Computer Engineering and Networks Laboratory



#### Where we are ...




## **Do You Remember?**



## **High-Level Physical View**

Figure: implementation alternatives arranged by flexibility versus performance and energy efficiency (listed from most flexible to most performance- and energy-efficient):

- General-purpose processors
- Application-specific instruction set processors (ASIPs)
  - Microcontroller
  - DSPs (digital signal processors)
- Programmable hardware
  - FPGA (field-programmable gate arrays)
- Application-specific integrated circuits (ASICs)



## Topics

- General Purpose Processors
- System Specialization
- Application Specific Instruction Sets
  - Micro Controller
  - Digital Signal Processors and VLIW
- Programmable Hardware
- ASICs
- System-on-Chip

## **General-Purpose Processors**

- High performance
  - Highly optimized circuits and technology
  - Use of parallelism
    - superscalar: dynamic scheduling of instructions
    - super-pipelining: instruction pipelining, branch prediction, speculation
  - complex memory hierarchy
- Not suited for real-time applications
  - Execution times are highly unpredictable because of intensive resource sharing and dynamic decisions
- Properties
  - Good average performance for large application mix
  - High power consumption

### **General-Purpose Processors**

- Multicore Processors
  - Potential of providing higher execution performance by exploiting parallelism
  - Especially useful in high-performance embedded systems, e.g. autonomous driving
  - Disadvantages and problems for embedded systems:
    - Increased interference on shared resources such as buses and shared caches
    - Increased timing uncertainty

#### **Multicore Examples**

- Intel Xeon Phi (5 billion transistors, 22 nm technology, 350 mm<sup>2</sup> area)
- Oracle SPARC T5


## **System Specialization**

- The main difference between general-purpose, high-volume microprocessors and embedded systems is *specialization*.
- Specialization should respect flexibility
  - application domain specific systems shall cover a class of applications
  - some flexibility is required to account for late changes, debugging
- System analysis required
  - identification of application properties which can be used for specialization
  - quantification of individual specialization effects

## **Embedded Multicore Example**

#### Recent development:

- Specialize multicore processors towards real-time processing and low power consumption
- Target domains:



## **Example: Code-size Efficiency**

- RISC (Reduced Instruction Set Computer) machines are designed for run-time efficiency, not for code-size efficiency.
- Compression techniques: key idea



#### **Example: Multimedia-Instructions**

- Multimedia instructions exploit the fact that many registers, adders etc. are quite wide (32/64 bit), whereas most multimedia data types are narrow (e.g. 8 bit per color, 16 bit per audio sample per channel).
- Idea: Several values can be stored in one register and added in parallel.
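The idea of storing several narrow values in one wide register can be sketched in software. The following is a minimal illustration, assuming four 8-bit lanes in a 32-bit word and wrap-around (modulo 256) arithmetic; real multimedia instruction sets often also offer saturating variants. The function names `pack4`, `unpack4` and `packed_add8` are hypothetical.

```python
MASK_LOW7 = 0x7F7F7F7F  # low 7 bits of every 8-bit lane
MASK_MSB = 0x80808080   # most significant bit of every 8-bit lane


def pack4(values):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]


def packed_add8(a, b):
    """Add four 8-bit lanes with one 32-bit addition.

    The low 7 bits of each lane are added first, so a carry can only reach
    bit 7 of the same lane. Bit 7 is then fixed up with an XOR, which
    discards the carry that would otherwise spill into the next lane.
    """
    low = (a & MASK_LOW7) + (b & MASK_LOW7)
    return (low ^ ((a ^ b) & MASK_MSB)) & 0xFFFFFFFF
```

For example, `unpack4(packed_add8(pack4([10, 200, 255, 100]), pack4([20, 100, 1, 27])))` adds four color values at once; lanes that overflow wrap around independently of their neighbours.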



#### **Example: Heterogeneous Processor Registers**

Example (ADSP 210x):



Different functionality of registers AR, AX, AY, AF, MX, MY, MF, MR

#### **Example: Multiple Memory Banks**



Enables parallel fetches for some operations

## **Example: Address Generation Units**

#### Example (ADSP 210x):



- Data memory can only be accessed with an address held in register file A, but the address update can be done in parallel with the operation in the main data path (effectively taking zero time).
- Register file A contains several precomputed addresses A[i].
- A second register file M contains modification values M[j].
- Possible updates:
  - M[j] := 'immediate'
  - A[i] := A[i] ± M[j]
  - A[i] := A[i] ± 1
  - A[i] := A[i] ± 'immediate'
  - A[i] := 'immediate'
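The update rules for A[i] and M[j] can be illustrated with a small software model. This is only a sketch under assumptions: the class `AddressGenerationUnit`, its methods, and the register file sizes are hypothetical and do not describe the actual ADSP 210x hardware interface. In hardware the address update happens in parallel with the data path operation; the model simply performs it after the fetch.

```python
class AddressGenerationUnit:
    """Toy model of an address register file A and a modifier register file M."""

    def __init__(self, num_a=4, num_m=4):
        self.A = [0] * num_a  # address registers A[i]
        self.M = [0] * num_m  # modification values M[j]

    def set_a(self, i, immediate):
        """A[i] := 'immediate'"""
        self.A[i] = immediate

    def set_m(self, j, immediate):
        """M[j] := 'immediate'"""
        self.M[j] = immediate

    def fetch_post_modify(self, memory, i, j):
        """Read memory[A[i]], then apply A[i] := A[i] + M[j]."""
        value = memory[self.A[i]]
        self.A[i] += self.M[j]
        return value


def strided_sum(memory, start, stride, count):
    """Sum `count` elements starting at `start`, stepping by `stride`."""
    agu = AddressGenerationUnit()
    agu.set_a(0, start)
    agu.set_m(0, stride)
    total = 0
    for _ in range(count):
        # The loop body contains no explicit address arithmetic.
        total += agu.fetch_post_modify(memory, 0, 0)
    return total
```

With `memory = list(range(10))`, the call `strided_sum(memory, 1, 3, 3)` reads addresses 1, 4 and 7. The point of the design is that the loop never spends a data path operation on computing the next address.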


# Microcontroller

- Control-dominant applications
  - supports process scheduling and synchronization
  - preemption (interrupt), context switch
  - short latency times
- Low power consumption
- Peripheral units often integrated
- Suited for real-time applications



### **Microcontroller as a System-on-Chip**

- complete system
- I<sup>2</sup>C bus and parallel/serial interfaces for communication
- A/D converter
- watchdog (SW activity timeout): safety
- on-chip memory (volatile/non-volatile)
- interrupt controller
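The watchdog item in the list above can be made concrete with a small simulation. This is a sketch under assumptions, not the register interface of any particular microcontroller: the class `Watchdog`, its `kick` and `tick` methods, and the helper `run` are hypothetical names. The software must restart the counter regularly; if it stops doing so, the counter expires and a reset is triggered.

```python
class Watchdog:
    """Toy watchdog timer: expires unless it is restarted in time."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.counter = timeout_ticks
        self.reset_triggered = False

    def kick(self):
        """Called by healthy software to restart the countdown."""
        self.counter = self.timeout_ticks

    def tick(self):
        """Called once per timer tick; triggers a reset when the counter expires."""
        if self.reset_triggered:
            return
        self.counter -= 1
        if self.counter <= 0:
            self.reset_triggered = True


def run(kick_pattern, timeout_ticks):
    """Simulate one tick per entry; True in kick_pattern means the software kicked.

    Returns True if the watchdog triggered a reset during the run.
    """
    watchdog = Watchdog(timeout_ticks)
    for software_alive in kick_pattern:
        if software_alive:
            watchdog.kick()
        watchdog.tick()
        if watchdog.reset_triggered:
            return True
    return False
```

A program that kicks on every tick never resets, whereas `run([True, True, False, False, False], 3)` reports a reset once the software has been silent for too long.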


## **Data Dominated Systems**

- Streaming oriented systems with mostly periodic behavior
- Underlying *model of computation* is often a signal flow graph or data flow graph:



- Typical *application examples*:
  - signal processing
  - multimedia processing
  - automatic control
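As an illustration of a signal flow graph with periodic, streaming behavior, consider a finite impulse response (FIR) filter: a chain of delay elements feeds multipliers whose results are summed. The filter is chosen here only as a familiar example and is not taken from the slide; the function name `fir_filter` is hypothetical.

```python
from collections import deque


def fir_filter(samples, coefficients):
    """Process a sample stream and produce one output per input sample.

    The delay line models the chain of delay elements in the signal flow
    graph; each output is the weighted sum of the most recent inputs.
    """
    taps = len(coefficients)
    delay_line = deque([0] * taps, maxlen=taps)  # oldest sample drops out on the right
    outputs = []
    for x in samples:
        delay_line.appendleft(x)
        acc = 0
        for c, d in zip(coefficients, delay_line):
            acc += c * d  # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs
```

Feeding a unit impulse, `fir_filter([1, 0, 0, 0], [1, 2, 3])`, returns the coefficients followed by zero, which is a quick sanity check for such a graph. The inner multiply-accumulate loop is the kind of regular, data-dominated computation that the following slides address.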

# **Digital Signal Processor**

- optimized for data-flow applications
- suited for simple control flow
- parallel hardware units (VLIW)
- specialized instruction set
- high data throughput
- zero-overhead loops
- specialized memory
- suited for real-time applications



Figure 2–1. TMS320C62x/C67x Block Diagram

## Very Long Instruction Word (VLIW)

*Key idea:* possible parallelism is detected by the compiler, not by hardware at run time (which is inefficient).

*VLIW:* parallel operations (instructions) encoded in one long word (instruction packet), each instruction controlling one functional unit.



#### **Explicitly Parallel Instruction Computing (EPIC)**

The TMS320C62xx VLIW Processor as an example of EPIC:



Instruction sequence: Instr. A, Instr. B, Instr. C, Instr. D, Instr. E, Instr. F, Instr. G

| Cycle | Instructions executed in parallel |
|-------|-----------------------------------|
| 1     | A                                 |
| 2     | B, C, D                           |
| 3     | E, F, G                           |
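How a sequential instruction stream turns into the cycle schedule in the table can be sketched as follows. The sketch assumes a C6x-style encoding in which one bit per instruction (often called the p-bit) states whether the next instruction executes in the same cycle. The concrete bit pattern used in the usage note is an assumption chosen to reproduce the table, and the function name `group_execute_packets` is hypothetical.

```python
def group_execute_packets(instructions, p_bits):
    """Group a linear instruction stream into sets that execute in one cycle.

    p_bits[k] == 1 means instruction k+1 runs in parallel with instruction k;
    p_bits[k] == 0 means instruction k is the last one of its cycle.
    """
    packets = []
    current = []
    for name, p_bit in zip(instructions, p_bits):
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:
        # Stream ended while a parallel chain was still open.
        packets.append(current)
    return packets
```

With the instructions A to G and the bit pattern `[0, 1, 1, 0, 1, 1, 0]`, the function yields three groups: A alone, then B, C, D, then E, F, G. The grouping is fixed by the compiler at build time, so the hardware needs no dependency-checking logic at run time.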

## **Example Infineon**



Processor core for car mirrors (Infineon)



25Gops @ 32b

### **Example NXP Trimedia VLIW**



Nexperia Digital Video Platform (NXP)




#### **FPGA – Basic Structure**

- Logic Units
- I/O Units
- Connections
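The logic units of an FPGA are commonly built around lookup tables (LUTs): a small memory whose contents define a Boolean function of its inputs. The sketch below is a software model under that assumption, not a description of a specific device; the names `make_lut` and `lut_eval` are hypothetical.

```python
def make_lut(function, num_inputs):
    """Compute the truth table (the configuration bits) of a Boolean function."""
    table = []
    for index in range(2 ** num_inputs):
        # Input k is bit k of the table index.
        bits = [(index >> k) & 1 for k in range(num_inputs)]
        table.append(function(*bits) & 1)
    return table


def lut_eval(table, inputs):
    """Evaluate the configured function: the inputs select one table entry."""
    index = 0
    for k, bit in enumerate(inputs):
        index |= (bit & 1) << k
    return table[index]
```

For instance, `make_lut(lambda a, b, c: a ^ b ^ c, 3)` configures a 3-input parity function. Reprogramming the device corresponds to writing a different table, while the evaluation hardware stays the same.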



### **Floor-plan of VIRTEX II FPGAs**





# **Example Virtex-6**

- Combination of flexibility (CLBs), integration and performance (heterogeneity of hard-IP blocks)



#### XILINX Virtex UltraScale

| Resource          | Value  |
|-------------------|--------|
| Effective LEs (K) | 3,435  |
| Logic Cells (K)   | 2,863  |
| UltraRAM (Mb)     | 432.0  |
| Block RAM (Mb)    | 94.5   |
| DSP Slices        | 11,904 |
| I/O Pins          | 832    |



#### Virtex-6 CLB Slice

Swiss Federal Institute of Technology Computer Engineering and Networks Laboratory




# **Application-Specific Circuits (ASICs)**

#### Custom-designed circuits are necessary

- if ultimate speed or
- energy efficiency is the goal and
- large numbers can be sold.

#### Approach *suffers* from

- long design times,
- lack of flexibility (changing standards) and
- high costs (e.g. mask costs of millions of \$).




#### **Example: Heterogeneous Architecture**



Samsung Galaxy Note II

- Exynos 4412 System on a Chip (SoC)
- ARM Cortex-A9 processing core
- 32 nanometer process technology
- Four processing cores





## **Example: ARM big.LITTLE Architecture**



Block diagram excerpt (i.MX 8X Family):

- Core Complex 2: Cortex-M4F, 16 KB I-cache, 16 KB D-cache, 256 KB SRAM, 1 x I<sup>2</sup>C, 1 x UART, 6 x GPIO, 1 x TPM Timer
- Memory: DDR3L @ 933 MHz, LPDDR4 @ 1200 MHz, 2 x SDIO3.0/eMMC5.1, RAW NAND-BCH62, 2 x Quad/1 x Octal SPI
- Security: HAB, SRTC, SJTAG, TrustZone®, AES256, RSA4096, SHA-256, 3DES, ARC4, MD-5, Flashless SHE, ECC, Tamper, Inline Enc Engine
- System Control: Power Control, Clocks, Reset, BootROMs, PMIC interface (dedicated I<sup>2</sup>C), Domain Resource Partitioning





#### Toradex Colibri Compute-on-Module

