## Analysis and Design of Novel Secured NoC for High Speed Communications

#### S. Rekha<sup>1\*</sup> and A. M. Bhavikatti<sup>2</sup>

<sup>1</sup>Visvesvaraya Technological University, Belagavi - 590018, Karnataka, India; rekha\_nl@yahoo.co.in <sup>2</sup>Department of CSE, Bheemanna Khandre Institute of Technology, Bhalki, Bidar - 585328, Karnataka, India; arvindbhavikatti@gmail.com

#### Abstract

**Background/Objectives:** Mainstream electronic designs are realized as Systems on Chip (SoC), which push the limits of integration. High-speed Networks-on-Chip (NoC) are in demand in communication systems to transfer data between a transmitter (Tx) and a receiver (Rx). A NoC switch consists of Tx, Rx, Processing Elements (PEs) and control circuits to process data. **Methods:** The PEs are connected to a hub in a communication topology through NoC switches, which are mainly responsible for establishing the communication channels between PEs. A novel NoC Bit Encoder and Decoder Transition (NoC BEDT) algorithm is used to optimize the hardware utilization, speed and power consumption of the communication system. The NoC BEDT consists of a single switch and an XOR-based encoder with a decoder. **Findings:** In existing work, each switch occupies a large amount of hardware when implemented on a Field Programmable Gate Array (FPGA), and the speed is limited to 10 Mbps. The proposed NoC BEDT algorithm was tested on real data and validated with ChipScope Pro in the Xilinx 13.4 DSP design suite, showing a 24% reduction in power consumption and a 65% improvement in transmission speed, at data rates of up to 10 Gbps.

Keywords: BEDT, Encoder, Decoder, NoC, Routing, Switching, Topology, FPGA

## 1. Introduction

In the past decade, most computer components have been integrated on a single silicon chip, i.e. a System on Chip (SoC), which contains a powerful processor together with analog, digital and mixed-signal components on one silicon substrate. SoCs are mainly used in embedded applications. The foremost problem with these components is the communication between them<sup>1</sup>, and as the number of components increases, that communication becomes considerably more complex. Communication networks connect different, geographically distributed points. In point-to-point communication, the connection between any two resources is fixed. This provides flexibility and avoids arbitration, but some resources are not involved in processing data, which leads to poorly utilized resources. To increase resource utilization, bus-based on-chip communication was introduced. In bus technology, all components are connected through a single bus. The bus is simple and widely used, but because there is only one bus, servicing of components is delayed and much power is consumed. It is also not scalable, as the bandwidth is shared among all system resources. To overcome this problem, hierarchical buses were used: bridges connect multiple buses, which saves power because long wires are avoided. However, multiple buses require arbitration between the buses and many wires connected to each bus. To reduce this arbitration overhead, the bus matrix was introduced, in which resources are connected in a matrix. The bus matrix overcomes the arbitration problem, but electrical noise on the on-chip interconnects degrades performance, increases energy consumption and limits scalability. These problems gave rise to a new paradigm, the NoC. Dally<sup>2</sup> and Benini<sup>3</sup> introduced this on-chip communication approach for SoCs. A NoC consists of a group of routers with Processing Elements (PEs) arranged in a topology (e.g. mesh, torus, folded torus, tree/fat tree or ring).

While transferring data, the source router connects to other routers through the PEs until the data reaches its destination router. The PEs control the data transmission over flits or dedicated channels. NoCs give notable improvements over conventional buses, and sharing the NoC interconnects allows resources to be reused. NoC research has been presented in various journals, special conferences and the NoC symposium<sup>4</sup>. Another important NoC parameter is latency, which depends on the number of router pipeline stages, typically up to 5 clock cycles: input buffering, virtual channel allocation, routing, arbitration and switching. According to the ITRS, the total gate count per router is about 15 kilogates, and the average area in 130 nm technology is 0.14 mm<sup>2</sup>. Circuit switching gives even smaller routers because it requires no buffers. The operating frequency is another key NoC parameter: reported designs operate at 400 MHz in 0.18 µm technology, and at 500 MHz in 0.18 µm with an area of 0.25 mm<sup>2</sup>, while the average operating frequency is about 600 MHz in 130 nm technology. Reducing latency and memory requirements remains an open research area for NoCs.

Traditional NoC routers are characterized by structural and transmission parameters. The structural parameters include arbiters, buffers and routing algorithms, while the transmission parameters are channel allocation and switching mode. A typical router has five ports to handle information: the North, East, South, West and Local ports. Each port has a bidirectional channel to transmit and receive data from port to port. The router is connected to its own PE through the local port and a Network Interface (NI). The remainder of the paper is organized as follows. Section 2 discusses a typical NoC router architecture. Section 3 presents the different topologies for NoC. Section 4 discusses the switching techniques of NoC. Section 5 presents channel routing algorithms. Section 6 describes the methodology of the proposed NoC BEDT, Section 7 discusses the results and comparison of the NoC BEDT algorithm, and Section 8 presents the conclusion and future work.

## 2. Typical NoC Router Architecture

A NoC is made up of the following building blocks: topology, switching, routing and crossbar. A typical NoC router has bidirectional inputs and outputs (N, S, W, E and L)<sup>5</sup>. Data is transferred over bidirectional channels, either between the local port and the router or from router to router. The bidirectional local port is connected to its PE through the NI. The PE processes the data and transfers it through a fixed channel. Each input port has a buffer to store the incoming data arriving from the output port of the neighbouring router. The buffers operate as First In First Out (FIFO) queues, as shown in Figure 1, and are associated with the input and output ports. The buffer size of an input port depends on the packet size of the data. The arbitration unit controls the allocation of the input and output ports among the available ports.



**Figure 1.** Typical Router architecture with five input and output ports (north, east, south, west and local).

The routing unit determines how packets are routed from the source router to the destination router. Using the routing and arbitration units, the crossbar switch transfers data from an input port to an output port. If the buffer at the requested destination port is full, the packet must wait until the buffer becomes free. To avoid such stalls, virtual channels are used in association with the physical channels. These virtual channels provide extra channels for moving packets from one port to another, which also helps avoid deadlock. A virtual channel controller is needed to manage the virtual channels on each physical channel.

## 3. Topology

The topology describes how ports/nodes are connected in a network. It determines the number of alternative routes between ports and controls network contention under different traffic patterns. Different topologies are used in NoCs depending on the application, and the choice affects fundamental parameters such as area, latency and throughput. There has been extensive work on NoC topologies aimed at superior results. Direct topologies include mesh, torus, ring, star, spidergon and octagon<sup>6</sup>, and indirect topologies include tree/fat tree and matrix. Some NoC architectures use 2D and 3D topologies. In NoCs, the mesh topology is the most popular, as it provides more parallelism and scalability than other topologies<sup>7-9</sup>. The mesh topology is used in a variety of applications such as image processing and security systems. The torus topology is a variant of the typical mesh: it is as regular as the mesh, but the nodes at each end of a row or column are connected to the nodes at the other end in every direction. A torus therefore provides better path diversity than a mesh, with shorter minimal routes. The disadvantage of the torus is its long wrap-around channels, which increase delay; this led to the folded torus topology. In the binary tree topology, nodes are connected like the leaves of a tree and each router has 4 ports. The number of levels depends on the number of nodes: if the number of nodes is $n$, the number of levels is $\log_4 n$. At the first level of the tree, the $n$ nodes are connected to $n/4$ routers. The binary tree topology is widely used in the DNS system.
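To make the path-diversity comparison concrete, the sketch below computes minimal hop counts in a k×k mesh and a k×k torus, and the number of levels of the 4-port tree ($\log_4 n$, rounded up). Representing routers as (x, y) pairs and the function names are illustrative assumptions, not part of the paper.

```python
def mesh_hops(src, dst):
    """Minimal hop count between (x, y) routers in a 2D mesh (Manhattan distance)."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


def torus_hops(src, dst, k):
    """Minimal hop count in a k x k torus, where wrap-around links join the edges."""
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    # Each dimension can be traversed directly or around the wrap-around link.
    return min(dx, k - dx) + min(dy, k - dy)


def tree_levels(n):
    """Levels needed when every router serves 4 children: ceil(log4(n))."""
    levels, capacity = 0, 1
    while capacity < n:
        capacity *= 4
        levels += 1
    return levels


print(mesh_hops((0, 0), (3, 3)))      # 6
print(torus_hops((0, 0), (3, 3), 4))  # 2
print(tree_levels(64))                # 3
```

In a 4×4 network, a corner-to-corner route needs 6 hops in the mesh but only 2 in the torus, because the wrap-around links connect opposite edges; the price is the longer physical wires mentioned above.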

## 4. Switching Mode

Switching determines how transmission resources (bandwidth, buffer capacity, etc.) are allocated to users. Switching systems decrease network cost by reducing the number of transmission links required to let a population of users communicate. NoCs use two types of switching technique, circuit switching and packet switching<sup>10</sup>. In Circuit Switching (CS), data is transferred through a dedicated channel<sup>11,12</sup>. The channel is allocated to the resource before the data transfer and remains reserved until the transfer is complete. In this way, CS provides Guaranteed Throughput (GT) and Quality of Service (QoS)<sup>13</sup>. In CS, data is transferred in a pipelined manner along the channel, and only one register is needed for buffering. CS is most efficient for high network traffic and high transmission rates, and another advantage is that the bandwidth does not change until the data transfer ends. However, CS does not meet tight latency requirements and uses resources inefficiently. Because CS mostly has a fixed structure, it provides limited flexibility. CS is popularly used in telephone networks.

To overcome the problems of CS, Packet Switching (PS) transfers the data in packets<sup>14</sup>. PS is the most commonly used technique in NoCs, as it provides high bandwidth and efficient resource utilization. PS has a few limitations arising from buffer size and network size: when the network is large, PS reaches a low saturation point and suffers from channel latency<sup>15</sup>. In PS, data is broken into a number of packets, each packet into flits (flow control digits), and each flit further into phits (physical units). A packet consists of three kinds of flit: head, body and tail. The head flit carries the source, destination and routing information, and the body flits carry the actual data. The tail flit marks the end of the packet; on receiving the tail flit, the router releases the communication channel.
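The head/body/tail structure can be illustrated with a small Python model. The packing of source and destination into the head-flit payload (8 bits each) and the empty tail payload are hypothetical choices for illustration, not a flit format defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class Flit:
    kind: str     # "head", "body" or "tail"
    payload: int  # head: packed routing info; body: a data word; tail: unused


def packetize(src, dst, data_words):
    """Build one packet: a head flit, one body flit per data word, and a tail flit.

    Hypothetical format: src and dst (each < 256) are packed into the head payload.
    """
    head = Flit("head", (src << 8) | dst)
    body = [Flit("body", word) for word in data_words]
    tail = Flit("tail", 0)
    return [head, *body, tail]


def forward_packet(flits):
    """Forward flits in order; the head reserves the channel and the tail releases it."""
    channel_reserved = False
    sent = []
    for flit in flits:
        if flit.kind == "head":
            channel_reserved = True
        sent.append(flit.kind)
        if flit.kind == "tail":
            channel_reserved = False
    return sent, channel_reserved


packet = packetize(src=1, dst=5, data_words=[0xA1, 0xB2])
print(forward_packet(packet))  # (['head', 'body', 'body', 'tail'], False)
```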

## 5. Channel Routing Algorithm

The routing algorithm determines how data is transferred from source to destination. The routing algorithm must prevent deadlock, livelock and starvation, and a number of routing algorithms have been proposed to avoid these problems. Routing algorithms can be grouped according to various parameters, as shown in Figure 2:

1. By the number of destinations (unicast, multicast and broadcast)<sup>16</sup>.
2. By adaptivity (deterministic, oblivious and adaptive)<sup>17</sup>.
3. By where the routing decision is made (source, distributed and centralized)<sup>18</sup>.
4. By implementation (lookup table and finite state machine).

Among the different types of routing algorithm, adaptive routing algorithms have given low-latency results. Deterministic algorithms are simple and inexpensive, but they do not utilize path diversity and are thus weak at load balancing. Oblivious algorithms often give good results, since they allow effective load balancing and their effects are easy to analyse. Adaptive algorithms, although superior in theory, are complex and power hungry. The deterministic (XY) algorithm uses the X and Y coordinates<sup>19</sup>: a packet first travels to the X coordinate of the destination and then moves to its Y coordinate. Let $X_{offset} = X_{destination} - X_{source}$ and $Y_{offset} = Y_{destination} - Y_{source}$, recomputed at each router with the current router taking the role of the source. If $X_{offset} = 0$ and $Y_{offset} = 0$, the current router is the destination router. If $X_{offset} < 0$, the data is transferred to the left side of the current router; if $X_{offset} > 0$, it is transferred to the right side. The same process is then repeated for the Y coordinate<sup>20</sup>. A number of deterministic algorithms have been proposed<sup>21</sup>. The second class, minimal adaptive routing, uses the shortest path between routers<sup>22-30</sup>: at every router, it checks for the shortest path among the available routers. There remains considerable scope for researchers to implement better routing algorithms.

#### 5.1 Arbiter

When several input and output ports are available, selecting the suitable input and output port is complex for the processor, so an arbiter performs this selection. Different types of arbiter (fixed priority, round robin) have been introduced based on performance<sup>31</sup>. The fixed priority arbiter is the simplest: priorities are assigned to the resources, and inputs and outputs are selected according to their priority. It is easy to implement, but its path delay is proportional to the number of input ports. The round robin arbiter gives the highest degree of fairness compared with other arbiters<sup>32,33</sup>. It assigns priorities to the input and output ports in round robin fashion, which increases area and latency. To reduce this, the distributed round robin method is used, in which inputs and outputs are selected in a distributed manner, and various types of distributed round robin arbiter have been proposed for NoCs.
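Returning to the deterministic XY rule described at the start of this section, the offset test can be sketched in Python as follows. The sketch assumes that "left" means west (-x), "right" means east (+x) and +y is north, and it recomputes the offsets at every router; the function names are hypothetical.

```python
def xy_route_step(cur, dst):
    """Return the output port for one hop of XY routing (X resolved before Y)."""
    x_offset = dst[0] - cur[0]
    y_offset = dst[1] - cur[1]
    if x_offset > 0:
        return "east"    # right side of the current router
    if x_offset < 0:
        return "west"    # left side of the current router
    if y_offset > 0:
        return "north"
    if y_offset < 0:
        return "south"
    return "local"       # both offsets are zero: this is the destination router


def xy_path(src, dst):
    """Follow XY routing hop by hop and return the list of ports taken."""
    moves = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}
    cur, ports = src, []
    while True:
        port = xy_route_step(cur, dst)
        ports.append(port)
        if port == "local":
            return ports
        dx, dy = moves[port]
        cur = (cur[0] + dx, cur[1] + dy)


print(xy_path((0, 0), (2, 1)))  # ['east', 'east', 'north', 'local']
```

Because the X dimension is always exhausted before Y, the route between a given pair of routers never changes, which is why deterministic routing cannot exploit path diversity.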

#### 5.2 Cross Bar Switch

The crossbar switches data from an input port to an output port based on the arbiter module<sup>34</sup>. High-speed routers use a fully connected crossbar, whereas low-speed routers use a crossbar without full connectivity. A typical crossbar design is simple when built from multiplexers and demultiplexers. It consists of $M \times N$ transmission lines, where $M$ is the number of horizontal lines along the X axis and $N$ the number of vertical lines along the Y axis. The input of the crossbar is given by the output of the arbiter. The main problem with a crossbar is the speed-up required to transfer data from input to output. Pande et al.<sup>29</sup> have presented performance parameters for NoCs.

Figure 2. Classification of the routing algorithms in NoC.
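Since the crossbar input is driven by the arbiter output, the round-robin arbitration described in Section 5.1 can be sketched for a single crossbar output port. This is a behavioural Python model under the assumption of one request line per input port; the class name is hypothetical.

```python
class RoundRobinArbiter:
    """Grant one of n requesters, rotating priority after each grant.

    The requester just granted gets the lowest priority next time, which is
    what makes round robin fairer than a fixed-priority arbiter.
    """

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so requester 0 has the highest priority initially

    def grant(self, requests):
        """requests: list of n booleans. Returns the granted index, or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


# Five input ports (N, E, S, W, L) all requesting the same output port
arb = RoundRobinArbiter(5)
print([arb.grant([True] * 5) for _ in range(6)])  # [0, 1, 2, 3, 4, 0]
```

A fixed-priority arbiter would return 0 on every cycle in this example, starving the other four ports.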

## 6. Methodology of Proposed NoC BEDT

The proposed BEDT is intended as an improvement over existing NoCs for high-speed communication. At the transmitter it has encoder and transmission modules, and at the receiver it has decoder and receiver modules, and vice versa. The BEDT is evaluated in terms of speed and power consumption for effective communication, and consists of the following modules:

- 6.1 Design of single switch NoC
- 6.2 Novel Bit transition encoder
- 6.3 Novel Bit transition decoder
- 6.4 Hardware Implementation on FPGA
- 6.5 Comparison of NoC BEDT with existing work

#### 6.1 Design of Single Switch NoC

The single-switch NoC is designed to transfer data between source and destination at a transmission rate of 10 GHz. The proposed switch consists of two PEs, a control circuit, a First-In-First-Out (FIFO) buffer and a Finite State Machine (FSM). The two PEs operate in parallel on two data streams received from two different sources. Each PE has memory for storage and operates at a rate of 10 GHz, i.e. each packet of data is transmitted in $\frac{1}{10\,\text{GHz}} = 0.1\,\text{ns}$, which is the time required to transmit each packet. Therefore, the proposed NoC switch can access data from two different sources and process both in parallel.

The control unit has reset and selection control signals to control the overall design: reset clears all internal registers, and selection chooses the direction of data transfer. There are four directions (north, south, west and east) and one virtual channel. Based on the selection signal, data is transferred from source to destination, as shown in Figure 3.



**Figure 3.** 3x3 NoC switch and their direction with virtual channel.

When all direction lines are busy with data transmission, the proposed switch automatically switches over to the virtual channel for effective communication. The virtual channel is a costly and complex circuit, but it can be used for emergency data transmission. Each switch has its own virtual channel, which is enabled and connected to the destination based on the request command on the selection line.
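The fallback rule can be sketched as a small selection function. The paper only specifies the case in which every direction is busy; the sketch assumes that a transfer whose requested direction is busy, while others are free, simply waits, and that it also waits if the virtual channel is occupied. All names are hypothetical.

```python
DIRECTIONS = ("north", "south", "east", "west")


def select_channel(requested, busy, vc_busy=False):
    """Choose the channel for a transfer.

    requested: direction chosen by the selection signal.
    busy: set of physical directions currently in use.
    vc_busy: whether the switch's single virtual channel is in use.
    """
    if requested not in busy:
        return requested                      # normal case: direction is free
    if all(d in busy for d in DIRECTIONS) and not vc_busy:
        return "virtual"                      # every direction busy: use the VC
    return "wait"                             # assumed: otherwise hold the data


print(select_channel("east", {"north", "south", "east", "west"}))  # virtual
```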

A FIFO is present in each switch for temporary storage and serial data transfer; based on the priority of the data coming from the different sources, the data is transferred to a particular destination. The FIFO has write and read signals for writing the incoming data and reading it out. The size of the FIFO is 256x8, and a counter is used to control the write and read operations. When the counter reaches "1111", the FIFO memory is completely full and the full signal goes high. While the full signal is low, the counter is incremented as data is written into the FIFO, and when the read signal is high, data is read from the FIFO.
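A behavioural Python model of such a FIFO is sketched below. It assumes a circular buffer with write and read pointers and an occupancy counter sized to the configured depth, which asserts full when the depth is reached; it illustrates the write/read/full/empty behaviour and is not the authors' Verilog. The class name is hypothetical.

```python
class Fifo:
    """FIFO with write/read operations and full/empty flags (default 256 x 8)."""

    def __init__(self, depth=256, width=8):
        self.depth = depth
        self.mask = (1 << width) - 1  # keep only `width` bits of each word
        self.mem = [0] * depth
        self.wr_ptr = 0
        self.rd_ptr = 0
        self.count = 0                # occupancy counter

    @property
    def full(self):
        return self.count == self.depth

    @property
    def empty(self):
        return self.count == 0

    def write(self, data):
        """Write one word if the FIFO is not full; return True if accepted."""
        if self.full:
            return False
        self.mem[self.wr_ptr] = data & self.mask
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None if the FIFO is empty."""
        if self.empty:
            return None
        data = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return data


f = Fifo(depth=4)
for value in [1, 2, 3, 4]:
    f.write(value)
print(f.full, f.read())  # True 1
```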

Finally, an FSM controller is designed to find the shortest path from source to destination and then transfer the data. For the 3x3 NoC switch, there are at most three possible directions: horizontal, vertical and diagonal. The FSM controls all directions based on the shortest path and the demand for the virtual channel; "req\_in" is the control signal used to select the destination, as illustrated in Figure 4. The 8-bit data packet is stored in the FIFO, whose output is connected to the FSM to determine the destination (Figure 4). There are six enable signals for the six directions, named "en\_e", "en\_w", "en\_s", "en\_n", "en\_r" and "en\_l", where l and r represent left and right, for controlling data transfer in all directions.



Figure 4. RTL diagram of FSM and FIFO.

#### 6.2 Novel Bit Transition Encoder

Each switch has an encoder and a decoder to provide security for the data transmitted from source to destination. The encoder operates as follows.

Take one frame of 8 bits:

X is the packet input; its frame value is assumed to be 10011001, and the information bits are 11100110.

```
Present bits:       X   = 0100101101001011
Previous bits:      Y   = 0001001011010010
Transition of Y:    T_y = 11111111
Second transition:  T_2 = 00000000 (and so on)
Fourth transition:  T_4 = 00000000
End transition:     T_e = 11111111

T_y count = 1000 = 8
T_2 count = 0000
T_4 count = 0000
T_e count = 1000 = 8
```

The transition counts above determine, through the module-C circuit (a circuit built from XOR gates), whether odd or even inversion is applied.

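Each invert flag is set to '1' or '0' according to the following 2-bit codes; the decision uses the majority condition $T_e > (w-1)/2$, where $w$ is the word width:

- 10: odd invert
- 01: even invert
- 11: full invert
- 00: no inversion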

If full invert = 11, then odd invert = 1 and even invert = 1. After all the above steps, the encoder output $Z$ is obtained by XORing each input bit with its invert flag; under full inversion every flag is 1:

$$\begin{aligned}
Z[0] &= X[0] \oplus \text{invert} = 1 \oplus 1 = 0 \\
Z[1] &= X[1] \oplus \text{invert} = 0 \oplus 1 = 1 \\
Z[2] &= X[2] \oplus \text{invert} = 0 \oplus 1 = 1 \\
Z[3] &= X[3] \oplus \text{invert} = 1 \oplus 1 = 0 \\
Z[4] &= X[4] \oplus \text{invert} = 1 \oplus 1 = 0 \\
Z[5] &= X[5] \oplus \text{invert} = 0 \oplus 1 = 1 \\
Z[6] &= X[6] \oplus \text{invert} = 0 \oplus 1 = 1 \\
Z[7] &= X[7] \oplus \text{invert} = 1 \oplus 1 = 0
\end{aligned}$$

Reading the results from $Z[7]$ down to $Z[0]$ gives 01100110. The two MSBs of the encoder output carry the invert code 11 (full invert, i.e. odd invert = 1 and even invert = 1), so the 10-bit encoder output is 1101100110.
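A minimal Python sketch of the XOR step of the encoder is shown below. It assumes that "odd" and "even" refer to odd- and even-indexed bit positions and that the 2-bit code is packed as (odd invert, even invert) above the 8 data bits, matching the worked example. The transition-count decision made by module-C is not modelled: the invert code is passed in. Constant and function names are hypothetical.

```python
WIDTH = 8
EVEN_MASK = 0b01010101  # bit positions 0, 2, 4, 6 (assumed "even" lines)
ODD_MASK = 0b10101010   # bit positions 1, 3, 5, 7 (assumed "odd" lines)

# 2-bit invert codes from the table above, packed as (odd_invert << 1) | even_invert
NO_INVERT, EVEN_INVERT, ODD_INVERT, FULL_INVERT = 0b00, 0b01, 0b10, 0b11


def invert_mask(code):
    """Expand a 2-bit invert code into an 8-bit XOR mask."""
    mask = 0
    if code & 0b10:
        mask |= ODD_MASK
    if code & 0b01:
        mask |= EVEN_MASK
    return mask


def encode(x, code):
    """Return the 10-bit word: the invert code in the 2 MSBs, then X XOR mask."""
    z = (x ^ invert_mask(code)) & ((1 << WIDTH) - 1)
    return (code << WIDTH) | z


word = encode(0b10011001, FULL_INVERT)
print(format(word, "010b"))  # 1101100110
```

With full inversion the odd/even position assumption does not affect the result, so the sketch reproduces the encoder output 1101100110 from the example.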

#### 6.3 Novel Bit Transition Decoder

The input to the decoder is the output of the encoder, i.e. 1101100110; this encoded data is transmitted through different channels. At the receiver end, an error-correction technique is adopted to correct the data and decode it into the original data (i.e. 10011001). The steps of the novel decoder algorithm are as follows.

Z = 1101100110 is the output of the encoder and the input to the decoder.

$Z = 1011010010110100$, $R = 0010110100101101$, $T_y = 11111111$, and the count of $T_y$ is $8 = 1000_2$. The module-C output decides '1' or '0' according to the condition $T_y > (w-1)/2$, i.e. $8 > (8-1)/2 = 7/2$; here the majority output is '0'. Each decoded bit is the received bit XORed with the invert flag (1 under full inversion) and with the majority output (0):

$$\begin{aligned}
X[0] &= Z[0] \oplus \text{invert} \oplus 0 = 0 \oplus 1 \oplus 0 = 1 \\
X[1] &= Z[1] \oplus \text{invert} \oplus 0 = 1 \oplus 1 \oplus 0 = 0 \\
X[2] &= Z[2] \oplus \text{invert} \oplus 0 = 1 \oplus 1 \oplus 0 = 0 \\
X[3] &= Z[3] \oplus \text{invert} \oplus 0 = 0 \oplus 1 \oplus 0 = 1 \\
X[4] &= Z[4] \oplus \text{invert} \oplus 0 = 0 \oplus 1 \oplus 0 = 1 \\
X[5] &= Z[5] \oplus \text{invert} \oplus 0 = 1 \oplus 1 \oplus 0 = 0 \\
X[6] &= Z[6] \oplus \text{invert} \oplus 0 = 1 \oplus 1 \oplus 0 = 0 \\
X[7] &= Z[7] \oplus \text{invert} \oplus 0 = 0 \oplus 1 \oplus 0 = 1
\end{aligned}$$

Therefore, the decoded output is 10011001, which is the same as the encoder input.
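The decoding steps can be sketched in Python as the inverse of the encoder: strip the 2-bit invert code from the MSBs, then XOR each data bit with its invert flag and with the module-C majority bit (0 in the example above). As with the encoder, the odd/even bit-position mapping is an assumption and the function name is hypothetical; the sketch does not model any error correction.

```python
WIDTH = 8
EVEN_MASK = 0b01010101  # bit positions 0, 2, 4, 6 (assumed "even" lines)
ODD_MASK = 0b10101010   # bit positions 1, 3, 5, 7 (assumed "odd" lines)


def decode(word, majority_bit=0):
    """Recover the 8-bit frame from a 10-bit word (invert code in the 2 MSBs)."""
    code = (word >> WIDTH) & 0b11        # (odd_invert, even_invert)
    z = word & ((1 << WIDTH) - 1)
    mask = 0
    if code & 0b10:
        mask |= ODD_MASK
    if code & 0b01:
        mask |= EVEN_MASK
    if majority_bit:
        mask ^= (1 << WIDTH) - 1         # X[i] = Z[i] xor invert xor majority
    return z ^ mask


print(format(decode(0b1101100110), "08b"))  # 10011001
```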

#### 6.4 Hardware Implementation on FPGA

The NoC BEDT is developed in Verilog HDL and implemented on a Virtex-5 FPGA, and the signals of all modules are analyzed using ChipScope Pro. The top-module RTL diagram and its interconnections are shown in Figure 5.



Figure 5. RTL diagram of NoC BEDT.

## 7. Result and Discussion

The implementation results of the proposed NoC BEDT are reported below. NoC results depend on the topology, the type of switching and the routing algorithm. Router area depends mainly on the flit width and buffer size. The most common flit width is 32 bits, which is much smaller than 256 bits; some proposals use flit widths of 20 and 128 bits. Note that flits are the unit of flow control, whereas phits correspond to the number of parallel wires between routers; in most cases the two have the same width. Buffers are generally built from flip-flops, and many proposals report that buffers take 50-90% of the router area, which leads to higher power consumption. The virtual channels of each port are associated with the buffers; they reduce blocking and improve performance. Virtual channels are required in adaptive routing algorithms to avoid deadlock. Some proposals have also presented three-dimensional topologies.

Figure 6 shows the simulation results of the proposed single NoC switch, which has four directions together with left and right directions, and a request command to select the required direction. The request takes the form $2^n$, where $n$ selects the direction. The following request codes are used (values in hexadecimal, as shown in Figure 6):

- $n = 1$: $2^n$ = 0x02, left direction
- $n = 2$: $2^n$ = 0x04, right direction
- $n = 3$: $2^n$ = 0x08, west direction
- $n = 4$: $2^n$ = 0x10, east direction
- $n = 5$: $2^n$ = 0x20, south direction
- $n = 6$: $2^n$ = 0x40, north direction
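The request codes are one-hot, so each can drive exactly one of the FSM enable signals named in Section 6.1 (en\_l, en\_r, en\_w, en\_e, en\_s, en\_n). The sketch below assumes this one-to-one mapping between codes and enables and models the decoding in Python; the dictionary and function names are hypothetical.

```python
# Assumed mapping from req_in (hexadecimal, 2**n for n = 1..6) to enable signals
REQUEST_TO_ENABLE = {
    0x02: "en_l",  # left
    0x04: "en_r",  # right
    0x08: "en_w",  # west
    0x10: "en_e",  # east
    0x20: "en_s",  # south
    0x40: "en_n",  # north
}


def decode_request(req_in):
    """Return the enable signals for a req_in value; unknown codes enable nothing."""
    enables = {name: 0 for name in REQUEST_TO_ENABLE.values()}
    name = REQUEST_TO_ENABLE.get(req_in)
    if name is not None:
        enables[name] = 1
    return enables


print(decode_request(0x40))  # only en_n is 1, as in the request-40 case of Figure 6
```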

The novel encoder and decoder are implemented with the NoC switch for high security: the encoder is used at the transmitter and the decoder at the receiver, and vice versa. The encoder output is the input to the decoder, and the decoder corrects bits that are corrupted during transmission through a noisy channel or other medium. In Figure 7, `X` is the 32-bit input data, `out` is the encoder output and `decoder_out` is the decoded data, which is the same as the input; the encoder and decoder therefore work for arbitrary data in real-time applications. The developed NoC BEDT algorithm has been implemented on a Virtex-5 FPGA and observed in ChipScope Pro using the Integrated Logic Analyzer (ILA) and Integrated Controller (ICON) cores; the results are shown in Figure 8.

Figure 6. NoC simulation timing results (request = 40).

Figure 7. BEDT simulation timing results: the 32-bit encoder input and the 32-bit decoded output `decoder_out[31:0]` both read 4243221a, shown with the 32-bit encoded output `out[31:0]`, the transition and count signals, and the `half_invert` and `full_invert` flags.

Figure 8. Hardware implementation results for NoC BEDT in ChipScope Pro (ILA core on device XC5VLX50T; captured signals include `datainl` and `datainr`).

## 8. Conclusion

This paper has analyzed different NoC architectures and found that earlier NoCs mostly used packet-switched 2D topologies with deterministic routing. The proposed NoC was simulated in Xilinx ISE 14.7 with Verilog and implemented on an FPGA. Silicon area, latency, power consumption and throughput are the important NoC metrics, and sharp differences were observed in the implementation results. NoC is an emerging research topic, and based on this paper several topics are important for developing NoCs: benchmarking procedures and test cases, traffic characterization and modeling, design automation, fault tolerance and QoS policies. Future work will propose new methods to achieve low-latency, low-area NoCs. The proposed NoC BEDT was developed and analyzed for effective communication at a frequency of 10 GHz, and the performance analysis showed a 24% reduction in power and a 65% improvement in transmission speed.

## 9. Acknowledgement

I am thankful to His Holiness Poojya Dr. Sharanabaswappa Appa, Mahadasoha Peethadhipati, Sharanabasaweshwar Samsthana, Gulbarga, and President, Sharanabasaweshwar Vidhya Vardhak Sangha, Gulbarga. I am also thankful to the Principal and the Dean, AIET Gulbarga, and to my guide.

## 10. References

- 1. Guerrier P, Greiner A. A generic architecture for on-chip packet-switched interconnections. DATE. 2000 Mar:250–6.
- 2. Dally WJ, Towles B. Route packets, not wires: on-chip interconnection networks. DAC '01 Proceedings of the 38th Annual Design Automation Conference; 2001. p. 684–9.
- 3. Benini L, Micheli GD. Networks on chips: A new SoC paradigm. Computer. 2002 Jan; 35(1):70–8.
- 4. Salminen E et al. Survey of network-on-chip proposals. White Paper, ©OCP-IP; 2008 Mar.
- 5. Moraes FG, Calazans N, Mello A, Möller L, Ost L. HERMES: An infrastructure for low area overhead packet switching networks on chip. Integration. 2004; 38(1):69–93.
- 6. Ortín-Obón M, Suárez-Gracia D. Analysis of network-on-chip topologies for cost-efficient chip multiprocessors. Microprocessors and Microsystems; 2016 Feb 5. p. 1–13.
- 7. Wiklund D, Liu D. SoCBUS: Switched network on chip for hard real time embedded systems. IEEE Computer Society; 2003. p. 8–16.
- 8. Goossens K, Dielissen J, Radulescu A. Æthereal network on chip: Concepts, architectures, and implementations. IEEE Design & Test of Computers. 2005 May; 22(5):414–21.
- 9. Bobda C, Ahmadinia A. Dynamic interconnection of reconfigurable modules on reconfigurable devices. IEEE Design & Test of Computers. 2005 May; 22(5):443–51.
- 10. Benini L, Bertozzi D. Xpipes: A network-on-chip architecture for gigascale systems-on-chip. IEEE Circuits and Systems Magazine. 2005 Sep; 4(2):18–31.
- 11. Lusala K, Legat J-D. A SDM-TDM based circuit-switched router for on-chip networks. Proceedings of the 6th International Workshop on Reconfigurable Communication-centric Systems-on-Chip; 2011 Jun. p. 1–8.
- 12. Jara-Berrocal, Gordon-Ross A. SCORES: A scalable and parametric streams-based communication architecture for modular reconfigurable systems. Proceedings Design, Automation and Test in Europe Conference; 2009. p. 268–73.
- 13. Lin J, Lin X. Express circuit switching: Improving the performance of bufferless networks-on-chip. IEEE First International Conference on Network Computers; 2010 Nov. p. 162–6.
- 14. Jiang W, Bhardwaj K. A lightweight early arbitration method for low-latency asynchronous 2D-mesh NoC's. ACM; 2015.
- 15. Rampal R, Chandel R, Daniel P. A network-on-chip router for deadlock-free multicast mesh routing. IEEE; 2015.
- 16. Nasiri F, Sarbazi-Azad H, Khademzadeh A. Reconfigurable multicast routing for networks on chip. Microprocessors and Microsystems. 2016; 42:180–9.
- 17. Ahmed AB, Abdallah AB. Adaptive fault-tolerant architecture and routing algorithm for reliable many-core 3D-NoC systems. Journal of Parallel and Distributed Computing. 2016.
- 18. Yaghini PM, Eghbal A, Bagherzadeh N. On the design of hybrid routing mechanism for mesh-based network-on-chip. Integration.
- 19. Eggenberger M, Strobel M, Radetzki M. Globally asynchronous locally synchronous simulation of NoCs on many-core architectures. 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing; 2016.
- 20. Akbar R, Safaei F. A novel power efficient adaptive RED-based flow control mechanism for networks-on-chip; 2015.
- 21. Teimouri N, Modarressi M, Sarbazi-Azad H. Power and performance efficient partial circuits in packet-switched networks-on-chip. IEEE 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing; 2013 Feb. p. 509–13.
- 22. Kumar R, Gordon-Ross A. MACS: A highly customizable low-latency communication architecture. IEEE Transactions on Parallel and Distributed Systems. 2016 Jan; 27(1):237–49.
- 23. Kim J et al. A low latency router supporting adaptivity for on-chip interconnects. DAC; 2005 Jun. p. 559–64.
- 24. Bishnoi R, Laxmi V, Gaur MS, Zwolinski M. Resilient routing implementation in 2D mesh NoC. Microelectronics Reliability. 2016; 56:189–201.
- 25. Moreno EI, Marcon CAM. Arbitration and routing impact on NoC design. IEEE; 2011.
- 26. Chi H-C, Chen J-H. Design and implementation of a routing switch for on-chip interconnection networks. AP-ASIC. 2004 Aug. p. 392–5.
- 27. Jabbar AIA, AL Malah NT. Design and implementation of a network on chip using FPGA. Al-Rafidain Engineering. 2013 Feb; 21(1):91–100.
- 28. Wang L, Ma S. A high performance reliable NoC router. IEEE; 2016.
- 29. Pande PP, Ivanov A. Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Transactions on Computers. 2005 Aug; 54(8):1025–40.
- 30. Henkel J, Wolf W, Chakradhar S. On-chip networks: A scalable, communication-centric embedded system design paradigm. VLSI. 2004 Jan. p. 845–51.
- 31. Devaux L, Pillement S, Chillet D, Demigny D. R2NoC: Dynamically reconfigurable routers for flexible networks on chip. International Conference on Reconfigurable Computing; 2010.