# Basic building blocks and architectures for realizable QCA devices

### **Michael Niemier**

(with contributions from Amitabh Chaudhary, Danny Chen, Pranay Harsh, Sharon Hu, Peter Kogge, Craig Lent, Marya Lieberman, Wolfgang Porod, Ram Ravichandran, and Kevin Whitton)

# Talk Outline

- Review basic constructs
  - Circuit constructs and clock
- Implementations
  - Molecular and Magnetic QCA
    - (systems with cells having only 1 orientation)
    - (systems with cells having 2 orientations)
- Basic building blocks for various implementations
  - ...fundamental building blocks first...
- ...and then architectures that use them...
  - ...and also map well to QCA's device architecture
- Possible killer apps + what's next.

"Conceptual" QCA

### A Device



### A Wire



Signal Propagation Direction



### Wire Cross in the Plane



A 45-degree Wire



### Clocked Molecular QCA



No current leads. No need to contact individual molecules.

### Can use clock for I/O too...

How does a signal from "off chip" address an individual molecular QCA device that is approximately 1.2 nm × 1.2 nm?

Need a lithographic clock anyhow - use it to provide paths to permanent 0s and 1s.

**T-Junction** 

T-junction input mapped to a 23-tile DNA raft





# Implementations

- Molecular
  - See Craig Lent's talk...
- Magnetic
  - Bigger: 100s of nm <sup>(A,B)</sup>
  - Energy difference b/t 2 states ~ 100-200 $k_bT$ <sup>(A,B)</sup>
    - (This is at room temperature)
    - (Energy of 40 $k_bT$ needed to keep thermally induced errors < 1/year) <sup>(A)</sup>
  - Maximum dot dissipation ~ $10^{-17}$ J <sup>(A)</sup>
    - Microprocessor might dissipate ~ 1 W <sup>(A)</sup>
  - Slower: ~100s of MHz for cross-chip frequency <sup>(A)</sup>
  - Could be integrated w/MRAM, insensitive to radiation <sup>(B)</sup>
  - Useful for space, military applications?

- A: R.P. Cowburn and M.E. Welland, Science, Vol. 287, Issue 5457, 1466-1468, February 2000.
- B: G.H. Bernstein et al., *Microelectronics Journal*, 36 (2005) p. 619-624.
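The energy figures above can be put into joules and into a rough error rate with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes room temperature is 300 K, models thermally induced flips with a simple Arrhenius law, and uses an attempt frequency of 1 GHz, which is an assumed value and does not come from the slides or the cited papers.

```python
import math

K_B = 1.380649e-23                      # Boltzmann constant, J/K
T_ROOM = 300.0                          # assumed room temperature, K
ATTEMPT_HZ = 1e9                        # ASSUMED attempt frequency, not from the slides
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def barrier_joules(multiples_of_kt, temperature=T_ROOM):
    """Convert an energy given in units of k_b*T into joules."""
    return multiples_of_kt * K_B * temperature


def thermal_flips_per_year(multiples_of_kt, attempt_hz=ATTEMPT_HZ):
    """Arrhenius estimate: rate = f0 * exp(-E / k_b T), scaled to one year."""
    return attempt_hz * math.exp(-multiples_of_kt) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (40, 100, 200):
        print(f"{n:>3} k_bT = {barrier_joules(n):.2e} J, "
              f"~{thermal_flips_per_year(n):.2e} thermal flips/year")
```

Under these assumptions a 40 $k_bT$ barrier gives well under one thermal flip per year, and 100-200 $k_bT$ corresponds to roughly 4-8 × 10<sup>-19</sup> J, below the ~10<sup>-17</sup> J maximum dissipation quoted above.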



(note - our focus here mainly molecular, but basic building blocks + architectures should apply to both)

### Molecular QCA - directed assembly (not the only way -- but what I'll talk about...)

Idea: Integrate non-DNA components (devices + interconnect)



### Experimental Liftoff of APTES/attachment of DNA rafts From Marya Lieberman



### Cross-section views of rafts on EBL features



"Nanometer scale rafts built from DNA tiles," K. Sarveswaran, P. Huber, M. Lieberman, C. Russo, and C.S. Lent; Proceedings of the 2003 3rd IEEE Conference on Nanotechnology, 2003, p.417-20, vol. 2.

### Jammin' on the surface



# Molecular Systems - What's first?

### <u>...but probably only one cell type on DNA raft...</u>



### This first target is not even that restrictive...

### Ways to cross wires...

### Logical crossings



A wire crossing



A "logical" wire crossing



XOR: (A and B') or (A' and B) (inherent crossing)



Crossing can be made planar

using NANDs
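The figure for this slide did not survive, so the exact circuit is unknown. One well-known way to cross two signals in the plane is to chain three XORs, each built from four NANDs (12 NANDs in total). The sketch below checks that construction; the class and method names are illustrative only.

```python
from itertools import product


class NandNetlist:
    """Evaluates logic built only from NAND gates and counts the gates used."""

    def __init__(self):
        self.gates = 0

    def nand(self, a, b):
        self.gates += 1
        return 1 - (a & b)

    def xor(self, a, b):
        # Classic planar 4-NAND XOR: (A and B') or (A' and B)
        n1 = self.nand(a, b)
        n2 = self.nand(a, n1)
        n3 = self.nand(b, n1)
        return self.nand(n2, n3)

    def crossing(self, a, b):
        # m = a xor b; then m xor a = b and m xor b = a,
        # so the two signals swap sides without any wire physically crossing.
        m = self.xor(a, b)
        return self.xor(m, a), self.xor(m, b)


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        net = NandNetlist()
        print((a, b), "->", net.crossing(a, b), "using", net.gates, "NANDs")
```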

### **Duplication**

• Make extra copies of logic to minimize crossings – especially if logic is so small...



### <u>Time</u>

• 2 signals share the same wire

# Logical Crossings

Statistical mechanics<sup>A</sup> tells us we need ~10-12 nm between parallel wires -- implies a <u>3 cell QCA pitch</u>

NAND crossing can get (relatively) big... ...but, can remap this logic...



If <u>pitch</u> is increased to 3 (from 1), 2 more <u>tiles</u> required in y direction, 1 more in x direction - b/c of inverter

As 12 NANDs needed for logical X, this means at least 36 more tiles!
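The 36-tile figure is a lower bound that follows from the per-NAND overhead: 2 extra tiles in y plus 1 in x, for each of the 12 NANDs. A minimal sketch of that arithmetic (the function name is illustrative):

```python
NANDS_PER_CROSSING = 12   # from the slide
EXTRA_TILES_Y = 2         # per NAND, when pitch goes from 1 to 3
EXTRA_TILES_X = 1         # per NAND, because of the inverter


def min_extra_tiles(nands=NANDS_PER_CROSSING,
                    extra_y=EXTRA_TILES_Y,
                    extra_x=EXTRA_TILES_X):
    """Lower bound on added tiles: each NAND grows by extra_y + extra_x tiles."""
    return nands * (extra_y + extra_x)


if __name__ == "__main__":
    print(min_extra_tiles())
```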



### ...to reduce area in x dimension

A: Based on: "Thermodynamic behavior of molecular QCA wires and logic devices", Lieberman and Wang, in IEEE T. Nano.

# Logical Crossings

| Design<br>(all 3 cell pitch) | # of tiles<br>in x | # of tiles<br>in y | ~ # of<br>tiles | ~ XOR<br>area            | ~ area of<br>crossing  |
|------------------------------|--------------------|--------------------|-----------------|--------------------------|------------------------|
| NAND-based<br>(1 cell thick) | 8                  | 15                 | 120             | 5,760<br>nm <sup>2</sup> | 23,040 nm <sup>2</sup> |
| Revised<br>(1 cell thick)    | 4                  | 9                  | 36              | 1,728<br>nm <sup>2</sup> | 6,912 nm <sup>2</sup>  |
| Revised<br>(2 cells thick)   | 7                  | 13                 | 91              | 4,368<br>nm <sup>2</sup> | 17,472 nm <sup>2</sup> |
| Revised<br>(3 cells thick)   | 8                  | 17                 | 136             | 6,528<br>nm <sup>2</sup> | 26,112 nm <sup>2</sup> |
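The table's columns appear to follow a simple rule, which the sketch below uses as a consistency check. Both constants are inferred from ratios in the table, not stated on the slide: about 48 nm<sup>2</sup> per tile (1,728 nm<sup>2</sup> / 36 tiles) and a crossing area of about 4 times the XOR area.

```python
TILE_AREA_NM2 = 48          # INFERRED from the table: 1,728 nm^2 / 36 tiles
CROSSING_PER_XOR = 4        # INFERRED from the table: 17,472 / 4,368

DESIGNS = {
    "NAND-based (1 cell thick)": (8, 15),
    "Revised (1 cell thick)": (4, 9),
    "Revised (2 cells thick)": (7, 13),
    "Revised (3 cells thick)": (8, 17),
}


def summarize(tiles_x, tiles_y):
    """Return (tile count, XOR area in nm^2, crossing area in nm^2)."""
    tiles = tiles_x * tiles_y
    xor_area = tiles * TILE_AREA_NM2
    return tiles, xor_area, CROSSING_PER_XOR * xor_area


if __name__ == "__main__":
    for name, (x, y) in DESIGNS.items():
        tiles, xor_area, crossing = summarize(x, y)
        print(f"{name}: {tiles} tiles, XOR ~{xor_area:,} nm^2, "
              f"crossing ~{crossing:,} nm^2")
```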

What does this number *mean?*

<sup>A</sup> shows structures containing up to 200 correct tiles

What do these numbers mean?

<sup>B</sup> shows redundancy to defects

What about this number?


A: Rothemund PW, Papadakis N, Winfree E., PLoS Biol. 2004 Dec;2(12):e424. Epub 2004 Dec 7

B: Enrique Blair, M.S. Thesis, 2003.

## Logical Crossings

Some perspective on logical crossing area: Consider...



# An adder with fundamental blocks

Biggest individual structure needed ~36 tiles

(Figure labels: C<sub>in</sub>, A, B, M)

- Crossing area larger than EBL interconnect pitch matching
  - (I.e. these trenches depend on a pitch too...)
- Only adder itself might be a problem
  - Majority gates can be very small (5-7 DNA tiles)
- Solve by abutting rafts...
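To make the "adder from fundamental blocks" idea concrete, here is a behavioral sketch of a full adder built only from majority gates and inverters. It uses one well-known three-majority-gate formulation from the QCA literature, which is not necessarily the layout in the slide's (lost) figure; it models logic only, not clocking or tile placement.

```python
from itertools import product


def majority(a, b, c):
    """3-input majority vote, the basic QCA logic gate."""
    return 1 if a + b + c >= 2 else 0


def inv(a):
    return 1 - a


def and_gate(a, b):
    return majority(a, b, 0)   # one input tied to a permanent 0


def or_gate(a, b):
    return majority(a, b, 1)   # one input tied to a permanent 1


def full_adder(a, b, cin):
    """Return (sum, carry) using three majority gates and two inverters."""
    carry = majority(a, b, cin)
    total = majority(inv(carry), cin, majority(a, b, inv(cin)))
    return total, carry


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        print(a, b, cin, "->", full_adder(a, b, cin))
```

Tying one majority input to a fixed 0 or 1 is also where the "paths to permanent 0s and 1s" mentioned earlier would be used.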

# Logic Crossing - basic building blocks







### Idea: leverage Duke tile for wire...

- Can place QCA cells at all points...
- ...have universal wiring tile...
- ... simulated with stat. mech.

Goal: tiles self-align in EBL trench; "snap together" @ thermal equilibrium

John Reif, et al. -- Duke.

# Duplication

- Idea: push (some) crosses to input<sup>A</sup>
  - Also lets us reuse some basic building/logic blocks
- Duplication...
  - ...works in some cases (all IC local)...
    - E.g., in an ISCAS benchmark, area *decreases* because all IC is local
  - ...not in others
    - Problem inherently can explode exponentially
- Doing this in select cases works...



A: Chaudhary, Chen, Hu, Niemier, Ravichandran, Whitton -- to appear at ICCAD, 2005, San Jose, CA, Nov. 5-9.

### Number of devices/cm<sup>2</sup>?

Use previous info./designs for back of envelope calculation: how many QCA devices might be in 1 cm<sup>2</sup>... Assume:



### Number of devices/cm<sup>2</sup>?

### What do we get?

| Design                               | devices/<br>bit | Area (cm²)            | ~devices/<br>cm²      | % of <i>logical</i> devices |
|--------------------------------------|-----------------|-----------------------|-----------------------|-----------------------------|
| Adder with<br>logical Xs             | 1750            | 8.5×10 <sup>-10</sup> | 1.50×10 <sup>12</sup> | 7%                          |
| Adder<br>(duplication)               | 400             | 3.3×10 <sup>-10</sup> | 1.20×10 <sup>12</sup> | 22%                         |
| Adder –<br>theoretical<br>constructs | 160             | 1.1×10 <sup>-10</sup> | 1.47×10 <sup>12</sup> | 35%                         |

Seemingly doesn't make sense...

Huh? EBL for adder with logical crossings masks some wiring overhead

- + need to consider how many devices are logic vs. IC...

Realistically higher - adder leverages majority gate function... Also, must consider that this leverages a traditional architecture/adder design, and in QCA, wires are made of devices.

### Architectures

- EVERYTHING is pipelined
  - In the past, instruction execution was pipelined





A: Hinton et al., "The Microarchitecture of the Pentium 4 Processor," Intel Technology Journal, Q1, 2001, p. 1-12.
B: E.g., J. Cong, Y. Fan, Z. Zhang, "Architectural-Level Synthesis for Automatic Interconnect Pipelining", DAC 2004, June 7-11.

# Architectures (cont.)

- Data can be/is latched on wires
  - Good and bad:
    - Lends itself to high throughput (example soon)...
    - ...but medium + global IC can be difficult
  - Forwarding difficult at best...



# Architectures (cont.)

- Defects
  - Must consider when computing at the nano-scale...
    - ...especially anything that is self-assembled
  - Simple, regular, and replicable offers some protection
    - E.g., broken wire, missing tile, or defective tile
- We'll discuss:
  - PLAs, reconfigurable, systolic, and counterflow
    - PLA
      - again, seemingly "simple" clock + some inherent redundancy
      - NOT best architecture for QCA but illustrates what might work quite well...
    - Systolic and counterflow seem to map well...
      - ...no global IC + potential for simpler clock structures...
    - What else?

# Example PLA design (AND plane)



# Example PLA design ("throughput")



Dependent on granularity of CMOS clock.

If each colored region clocked separately, get two *f* values per "clock cycle"

If not, depends on granularity of clock...

# Example PLA design ("parts")



# Example PLA design (area)



Back of the envelope calculations...

Considering EBL, each node would be about 150 nm × 150 nm



Considering theoretical constructs, each node *possibly* 110 nm × 60 nm...
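The two node sizes quoted above translate into node areas and a rough upper bound on nodes per cm<sup>2</sup>. This sketch assumes perfect tiling with no routing, clocking, or I/O overhead, so real densities would be lower; the function names are illustrative.

```python
NM2_PER_CM2 = 1e14   # (1e7 nm per cm)^2


def node_area_nm2(width_nm, height_nm):
    return width_nm * height_nm


def nodes_per_cm2(width_nm, height_nm):
    """Upper bound: assumes nodes tile the plane with no overhead."""
    return NM2_PER_CM2 / node_area_nm2(width_nm, height_nm)


if __name__ == "__main__":
    for label, (w, h) in {"150 nm x 150 nm node": (150, 150),
                          "110 nm x 60 nm node": (110, 60)}.items():
        print(f"{label}: {node_area_nm2(w, h):,} nm^2, "
              f"~{nodes_per_cm2(w, h):.2e} nodes/cm^2")
```

Under these assumptions the smaller node is about 3.4 times denser than the 150 nm × 150 nm node.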

# Example PLA design (programming)



Can *physically* program by keeping certain clock wires permanently high... The idea can also be applied in reverse: wires kept low to always keep part of a circuit off...<sup>A</sup>


## Example PLA Design (AND and OR)



OR plane is almost exactly the same structure, just reversed...
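For readers less familiar with PLAs, the AND-plane/OR-plane structure can be sketched as a small behavioral model. This is generic PLA logic, not the QCA layout from the slides: the `and_program` and `or_program` tables are hypothetical stand-ins for whatever is physically programmed (for example, by holding clock wires), and nothing here models clocking or timing.

```python
from itertools import product


def and_plane(inputs, and_program):
    """Each product term lists, per input: 1 (use input), 0 (use complement),
    or None (input not connected to this term)."""
    terms = []
    for term in and_program:
        value = 1
        for bit, sel in zip(inputs, term):
            if sel is None:
                continue
            value &= bit if sel == 1 else 1 - bit
        terms.append(value)
    return terms


def or_plane(terms, or_program):
    """Each output row flags (1/0) which product terms feed that output."""
    return [int(any(t and use for t, use in zip(terms, row)))
            for row in or_program]


def pla(inputs, and_program, or_program):
    return or_plane(and_plane(inputs, and_program), or_program)


# Example program: f = (A and B') or (A' and B), i.e. XOR
XOR_AND = [(1, 0), (0, 1)]
XOR_OR = [(1, 1)]

if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print((a, b), "->", pla((a, b), XOR_AND, XOR_OR))
```

The two planes share the same loop-over-a-grid shape, which mirrors the remark that the OR plane is almost the same structure as the AND plane.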

# PLA - counterflow



### Conclusions

- Most architectural work should apply to all implementations
  - Even with first target, can do interesting things at reasonable scales...
- Can design a processor + memory to...
  - Conventional von Neumann architecture probably not most efficient<sup>A</sup>...
- CS work should guide PS as to what parts to build 1st...
- Density numbers good for (probably) bad architectures...
  - ...and a gate is only 6 cells and all IC is cells...

## Conclusions

- Systolic<sup>A</sup>, wave-like, counterflow architectures all insinuated by PLA slides... (Doug Berger's work too...)
  - (Some) applications that might map well to QCA
    - Signal processing: FIR, IIR
    - Matrix arithmetic, Eigenvalue calculations
    - Non-numeric applications: graph algorithms, language recognition, polynomial division, etc.
- ...interesting designs look possible with even the simplest of constructs...
  - In working group yesterday...
    - Intel successful in part b/c they found a way to build lots of the same basic part with high yield...
    - ...apply this lesson here...