ABSTRACT

VLSI Standard Cell Placement is a hard optimization problem, which is further complicated with new issues such as power dissipation and performance. In this work, a fast hybrid algorithm is designed to address this problem. The algorithm employs Simulated Evolution (SE), an iterative search heuristic that comprises three steps: evaluation, selection and allocation. Solution quality is a strong function of the allocation procedure which is both time consuming and difficult. In this work a force directed approach in the allocation step of SE is used to both accelerate and improve the solution quality. Due to the imprecise nature of design information at the placement stage, objectives to be optimized are expressed in the fuzzy domain. The search evolves towards a vector of fuzzy goals. The proposed heuristic is compared with a previously presented SE approach. It exhibits significant improvement in terms of runtime for the same quality of solution.

2. PROBLEM FORMULATION

SE has proved to be an excellent heuristic for standard cell placement problem in terms of solution quality. However, in terms of runtime, it is not as efficient as other well known deterministic placement algorithms. In this paper we have targeted the problem of decreasing the runtime of SE without degrading the overall solution quality. After inspecting the previous SE-based techniques it is observed that the main time consuming step is the allocation step [7, 4, 6]. The asymptotic time complexity of this step in these algorithms is \( O(n^2) \). In this paper we integrate a new allocation scheme into SE based on a force-directed algorithm that has a complexity of \( O(n) \). We show that this hybrid approach gives results that are comparable (or of better quality especially for large test cases) but with much smaller runtimes [6].

The paper is organized as follows. Section 2 covers the problem formulation. In Section 3 the proposed algorithm is discussed. Experimental results are presented in Section 4 and conclusions in Section 5.

3. PROPOSED ALGORITHM

In this section we describe our Simulated Evolution based hybrid search algorithm. We begin with a brief discussion of the basic SE heuristic.

3.1. Basic Simulated Evolution (SE)

The general SE algorithm is illustrated in Figure 1 and comprises three main steps: evaluation, selection, and allocation. Of these, the allocation step is the slowest, with a complexity of \( O(n^2) \).
Algorithm Simulated Evolution \(\{B, \Phi_{\text{initial}}, \text{Stopping Condition}\}\)

\textbf{NOTATION:}

- \(B\) = Bias Value.
- \(\Phi\) = Complete solution.
- \(m_i\) = Module \(i\).
- \(g_i\) = Goodness of \(m_i\).
- \(\text{Allocate}(m_i, \Phi)\) = Function to allocate \(m_i\) in partial solution \(\Phi\).

\textbf{Begin}

\textbf{Repeat}

\textbf{EVALUATION:}

\textbf{ForEach} \(m_i \in \Phi\) evaluate \(g_i\).

\textbf{IF} \(\text{Only elements that were affected by moves of previous step}\)

\textbf{THEN}

\textbf{Iteration get their goodnesses recalculate?}\)

\textbf{ELSE}

\textbf{SELECT:}

\textbf{ForEach} \(m_i \in \Phi\) \textbf{DO}

\begin{align*}
\text{begin} & \text{IF Random} > \min[g_i, 1] \text{ THEN} \\
& \text{begin} \\
& \text{end} \\
& \text{end} \\
& \text{Sort the elements of} \ S \\
& \text{ALLOCATION:}
\end{align*}

\textbf{ForEach} \(m_i \in S\) \textbf{DO}

\begin{align*}
& \text{begin} \\
& \text{end} \\
& \text{Until Stopping Condition is satisfied} \\
& \text{Return Best solution.} \\
& \text{End } \text{(Simulated Evolution)}
\end{align*}

\textbf{Fig. 1.} Structure of the Simulated Evolution algorithm \[7\].

\textbf{tion.} In the evaluation step the goodness of each cell in its current location, in the range \([0, 1]\), is computed using some measure.

In the selection step, the algorithm probabilistically selects unfit elements. Elements with low goodness values have higher probabilities of getting selected for relocation. These selected elements are identified as the selection set and are removed from the solution. These selected elements are one by one reassigned to new locations in a constructive allocation step. The objective of this step is to improve their goodness values, thereby reducing the overall cost of the solution.

Different constructive allocation schemes are proposed in literature \[8, 7\]. One such scheme is \textbf{sorted individual best fit}, where all the selected elements are sorted in descending order with respect to their connectivity with the partial solution and placed in a queue. The sorted elements are removed one at a time and trial moves are carried out for all the available empty positions. The element is finally placed in a position where maximum reduction in cost for the partial solution is achieved. This process is continued until the selected queue is empty. The overall complexity of this algorithm is \(O(n^2)\) where \(n\) is the number of selected elements. Other more elaborate schemes are \textbf{weighted bipartite matching allocation} and \textbf{branch-and-bound search allocation} \[8\]. However, these allocation strategies are more complex than “sorted individual best fit”, while the quality of solution remains comparable \[8\]. In summary, selection and allocation steps determine and dictate the search strategy, while evaluation provides feedback to the search scheme.

\textbf{3.2. Fuzzy Force Directed Allocation}

In the allocation stage, the selected cells are to be reassigned to best available locations. We consider selected cells as movable modules and remaining cells as fixed modules. In previous works \[4, 5, 6\], \textbf{sorted individual best fit} scheme was employed. For large circuits the run time was unreasonably high. To address this problem, a force directed allocation is proposed in this work. According to this approach optimal \(x\)-position and \(y\)-position of the cell under consideration are found. The \(y\)-position indicates the row to which the cell should be relocated. If the \(y\)-position is in between two rows then the row nearest to \(y\)-position is selected. In order to satisfy the width constraint, if the width of selected row after adding the cell is more than the maximum allowable width then the next nearest row that satisfies the width constraint is chosen. The \(x\)-position indicates the exact location of the cell in the selected row.

The basic idea behind the force directed method is that cells connected by a net exert forces on each other. Suppose a cell \(a\) is connected to another cell \(b\) by a net of weight \(w_{ab}\). Let \(d_{ab}\) represents the distance between \(a\) and \(b\). Then the force of attraction between the cells is proportional to the product \(w_{ab} \times d_{ab}\). A cell \(i\) connected to several cells \(j\) at distance \(d_{ij}\) by wires of weights \(w_{ij}\), experiences a total force \(F_i\) given by

\begin{equation}
F_i = \sum_j w_{ij} \cdot d_{ij} \tag{1}
\end{equation}

The best location for a cell \(i\) is where the \(x\)-component
values. We use the notion that a cell will try to move in the directions of those values that have higher weight (higher badness). We use the sign of a badness factor (opposite of goodness in evaluation). The cell will try to move in the directions of those values that are better in terms of all objectives. For this purpose, we have a weight allocation for all the nets. We can write this as follows,

\[ \sum_j w_{ij} \cdot (x_j - x_i) = 0; \quad \sum_j w_{ij} \cdot (y_j - y_i) = 0 \]  

(2)

Solving the above equations for \( x_i \) and \( y_i \) we have

\[ x_i = \frac{\sum_j w_{ij} \cdot x_j}{\sum_j w_{ij}} \quad y_i = \frac{\sum_j w_{ij} \cdot y_j}{\sum_j w_{ij}} \]  

(3)

Values \( x_i \) and \( y_i \) are the optimal \( x \)-position and \( y \)-position for a cell \( i \) with respect to current \( x \) and \( y \) positions of all the cells \( j \) connected to it [1]. They point to the new location that is better in terms of all objectives. For this purpose, proper weights to each of the nets connecting cell \( i \) and cell \( j \) are to be chosen. A good way to choose these weights is to use fuzzy logic. The following fuzzy rule is used to find these weights:

**Rule R1:** IF a net is good in wire-length AND good in power AND good in delay THEN it has a low weight.

According to this rule, a net will have a smaller weight only if it is good in terms of all the objectives. In fact weight signifies a badness factor (opposite of goodness in evaluation). The cell will try to move in the directions of those nets that have higher weight (higher badness). We use the membership functions in Figure 2, with the following base values.

\[ X_{ij_{bw}}(x) = \frac{l_{ij}}{l_{ij}} \quad X_{ij_{p}}(x) = \frac{l_{ij}}{(1 + S_{ij}) l_{ij}} \quad X_{ij_{d}}(x) = \frac{1D_{ij}}{1D_{ij}} \quad X_{ij_{d,ij}}(x) = \frac{T_{max} \cdot (ij)}{T_{max}(ij)} \]  

(4)

where \( l_{ij} \) represents wire-length of a net \( ij \) and \( l_{ij}^* \) is its estimated lower bound. \( S_{ij} \) is its switching probability (required to estimate power in CMOS circuits). \( 1D_{ij} \) is interconnect delay of net \( ij \) and \( 1D_{ij}^* \) is its estimated lower bound. \( T_{max} \) is the delay of longest path and \( T_{max}(ij) \) is the delay of longest path traversing net \( ij \).

Using these base values and corresponding \( \mu_{ij}^a \) where superscript \( a \) denotes allocation, we find a goodness factor \( g_{ij} \), using AFA and OFA operators proposed in [9], for the net connecting cells \( i \) and \( j \), as follows,

\[ g_{ij} = \mu_{ij}^a = 1 - \frac{\sum_{k=1}^{m} \mu_{k,ij}^a}{\sum_{k=1}^{m} \mu_{k,j}^a} \]  

(5)

where

\[ \mu_{ij}^a = \mu_{ij,ab}^a + \mu_{ij,ab}^a \]  

(6)

Now the weight of the net \( w_{ij} \) is calculated as follows,

\[ w_{ij} = 1 - g_{ij} \]  

(7)

In this proposed allocation schemes it is clear that for each cell we have to find the best location only once, therefore the complexity of the proposed allocation scheme is \( O(n) \) where \( n \) is the number of cells selected in selection stage of the algorithm. All other issues such as already occupied zero-force locations, cells already in their zero-force locations, etc., are resolved using the previous ad-hoc approaches available in the literature [1].

### 4. EXPERIMENTS AND RESULTS

Fast Fuzzy Force Directed Simulated Evolution (FFSE) and Biasless Fuzzy Simulated Evolution (BLFSE) [6], were applied on 12 ISCAS benchmark circuits. In BLFSE, execution is aborted when no improvement is observed in the last 500 iterations (maximum of 5000 iterations), whereas for FFSE the algorithm is run for a fixed 5000 iterations. The 0.25 micron CMOS digital low power standard cell library for MOSIS is used.

Table 1 compares the quality of the final solution generated by BLFSE and FFSE. The circuits are listed in order of their size (136-10383 modules). From the results, it is clear that FFSE outperformed BLFSE for all circuits in terms of execution time. Also, in most cases, FFSE shows minimal degradation in terms of solution quality but with a significant improvement in runtime. For larger circuits (S3330 & S5378), FFSE performed better than BLFSE in terms of both the final solution quality and the runtime.

Observe that the algorithm converges very fast. This behavior can be observed in Figure 3, where convergence is achieved after approximately 400 seconds (6.6 minutes), and the remaining time is spent in fine tuning the solution.
quality. It is also obvious from these results that FFSE totally avoids early random walk, which is a problem in other non-deterministic heuristics such as Simulated Annealing. Memory requirement of SE is also considerably low when compared to other iterative heuristics such as Genetic algorithm as SE keeps only one solution in its memory at a time.

5. CONCLUSION

In this paper, we have proposed Fuzzy Force Directed Simulated Evolution Algorithm for multiobjective VLSI standard cell placement. The allocation stage is improved by using a force directed strategy to reduce the execution time from $O(n^2)$ in previous SE based approaches to $O(n)$. Fuzzy logic is used to overcome the multiobjective nature of the problem. It is employed at evaluation and allocation stages and in the choice of the best solution from the set of generated solutions.

The proposed scheme is compared with BLFSE. It is observed that FFSE performs much better than BLFSE in terms of runtime, with no significant degradation in solution quality. FFSE can be used for large circuits whereas BLFSE performs poorly for circuits with more than 2000 – 3000 cells.

Acknowledgment: The authors and the research team acknowledge King Fahd University of Petroleum & Minerals for its support under research project COE/ITERATE/221.

6. REFERENCES


<table>
<thead>
<tr>
<th>Circuit</th>
<th># of Cells</th>
<th>L (μm)</th>
<th>P (μm)</th>
<th>D (ps)</th>
<th>T (s)</th>
<th>L (μm)</th>
<th>P (μm)</th>
<th>D (ps)</th>
<th>T (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>S298</td>
<td>136</td>
<td>4548</td>
<td>915</td>
<td>139</td>
<td>46</td>
<td>4975</td>
<td>999</td>
<td>135</td>
<td>4.8</td>
</tr>
<tr>
<td>S386</td>
<td>172</td>
<td>8357</td>
<td>2036</td>
<td>203</td>
<td>117</td>
<td>9422</td>
<td>2169</td>
<td>213</td>
<td>6.8</td>
</tr>
<tr>
<td>S832</td>
<td>310</td>
<td>23140</td>
<td>5251</td>
<td>416</td>
<td>192</td>
<td>26112</td>
<td>5863</td>
<td>400</td>
<td>11</td>
</tr>
<tr>
<td>S641</td>
<td>433</td>
<td>12811</td>
<td>3072</td>
<td>687</td>
<td>175</td>
<td>12485</td>
<td>2897</td>
<td>674</td>
<td>24</td>
</tr>
<tr>
<td>S953</td>
<td>440</td>
<td>29576</td>
<td>5025</td>
<td>223</td>
<td>351</td>
<td>29988</td>
<td>4683</td>
<td>244</td>
<td>17</td>
</tr>
<tr>
<td>S1238</td>
<td>540</td>
<td>41318</td>
<td>12303</td>
<td>363</td>
<td>699</td>
<td>41362</td>
<td>12934</td>
<td>377</td>
<td>20</td>
</tr>
<tr>
<td>S1196</td>
<td>561</td>
<td>35810</td>
<td>11276</td>
<td>360</td>
<td>613</td>
<td>38282</td>
<td>12363</td>
<td>350</td>
<td>22</td>
</tr>
<tr>
<td>S3330</td>
<td>1961</td>
<td>183288</td>
<td>24797</td>
<td>459</td>
<td>5351</td>
<td>163756</td>
<td>12303</td>
<td>483</td>
<td>87</td>
</tr>
<tr>
<td>S5378</td>
<td>2993</td>
<td>326840</td>
<td>48360</td>
<td>435</td>
<td>11823</td>
<td>243721</td>
<td>41560</td>
<td>376</td>
<td>149</td>
</tr>
<tr>
<td>S9234</td>
<td>5844</td>
<td>UH</td>
<td>UH</td>
<td>UH</td>
<td>UH</td>
<td>655370</td>
<td>114231</td>
<td>908</td>
<td>440</td>
</tr>
<tr>
<td>S13207</td>
<td>8651</td>
<td>UH</td>
<td>UH</td>
<td>UH</td>
<td>UH</td>
<td>1339837</td>
<td>144189</td>
<td>1604</td>
<td>885</td>
</tr>
<tr>
<td>S15850</td>
<td>10383</td>
<td>UH</td>
<td>UH</td>
<td>UH</td>
<td>UH</td>
<td>1477662</td>
<td>115049</td>
<td>2006</td>
<td>1202</td>
</tr>
</tbody>
</table>

Table 1. Layout found by BLFSE, and FFSE. ‘L’, ‘P’ and ‘D’ represent the wire-length, power, and delay costs respectively. ‘T’ is the execution time in Seconds. The last 3 circuits were not tested for BLFSE because of large runtime requirement. ‘UH’ indicates Unreasonably High runtime requirement.