

#### **IBM Research**

#### Prospects for Solid State Data Storage: Beyond Flash Memory and the Hard Disk Drive

Gian-Luca Bona

gianni@us.ibm.com IBM Research, Almaden Research Center



© 2006 IBM Corporation



## **Incumbent Semiconductor Memories**




## Non-volatile, universal semiconductor memory



- Everyone is looking for a dense (cheap) crosspoint memory.
- It is relatively easy to identify materials that show bistable hysteretic behavior (easily distinguishable, stable on/off states).


## The Memory Landscape





# **Emerging Memory Technologies**


Memory technology remains an active focus area for the industry





## Critical applications are undergoing a paradigm shift



Thesis: Neither disks nor Flash can keep up with data-centric applications

Proposal: Develop the device technology, build a high-density array, and demonstrate the performance and endurance needed for the data-centric paradigm



## What are the limitations with disks?

- Bandwidth, access time, reliability, power
- Disk Performance improves very slowly
  - Gap between processor and disk performance widens rapidly
  - Bandwidth gap can be solved with many parallel disks
    - but need 10,000 disks today, 500,000 disks by 2020
      - but that's just for a traditional high-end HPC system
      - data intensive problems are much worse
  - Access time gap has no good solution
    - disk access times decrease only 5% per year
    - complex caching or task switching schemes help sometimes
- Newest disk generations are *less* reliable than older ones
  - Data losses occur in even the best enterprise-class storage systems
- Disk power dissipation is a major factor in data-centric systems
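
A rough compounding calculation shows why a 5% annual improvement cannot close the access time gap. The sketch below is illustrative only: the function name and the choice of "halving" as a target are assumptions made here, not figures from the slides.

```python
import math

def years_to_reach(fraction_remaining: float, annual_decrease: float) -> float:
    """Years until a quantity that shrinks by `annual_decrease` per year
    falls to `fraction_remaining` of its starting value."""
    return math.log(fraction_remaining) / math.log(1.0 - annual_decrease)

# 5% per year is the figure quoted above; halving is an illustrative target.
print(f"{years_to_reach(0.5, 0.05):.1f} years to halve disk access time")
```

At that rate, halving the access time takes well over a decade, which is the sense in which the gap "has no good solution".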



## HDD Issue:

Mean Time to Data Loss



Assumptions: MTBF 1 Million Hours and RAID5
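
The slide does not state how mean time to data loss (MTTDL) was computed. A minimal sketch, assuming the common textbook approximation for RAID5 (data is lost when a second disk in the same array fails during the rebuild of the first), looks like this. The array width, rebuild time, and number of arrays are hypothetical values chosen for illustration.

```python
def mttdl_raid5_hours(mtbf_hours: float, n_disks: int, mttr_hours: float) -> float:
    """Textbook RAID5 approximation: MTTDL = MTBF^2 / (N * (N - 1) * MTTR).
    Ignores unrecoverable read errors during rebuild and correlated failures."""
    return mtbf_hours ** 2 / (n_disks * (n_disks - 1) * mttr_hours)

MTBF_HOURS = 1_000_000   # from the slide's assumptions
N_DISKS = 8              # disks per RAID5 array: assumption
MTTR_HOURS = 24          # time to rebuild a failed disk: assumption
N_ARRAYS = 1250          # e.g. 10,000 disks in arrays of 8: assumption
HOURS_PER_YEAR = 8766

per_array_years = mttdl_raid5_hours(MTBF_HOURS, N_DISKS, MTTR_HOURS) / HOURS_PER_YEAR
# For independent arrays, the system-level MTTDL shrinks in proportion to their number.
print(f"per array: {per_array_years:,.0f} years")
print(f"system of {N_ARRAYS} arrays: {per_array_years / N_ARRAYS:,.1f} years")
```

The point of the exercise is the scaling: a figure that looks comfortable for one array drops by the number of arrays in a large installation.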


## What are the limitations with Flash?

- Read/write access times, write endurance, block architecture
- Flash performance is showing no improvement
  - Gap between processor and Flash performance continues to widen
  - Write endurance is $<10^6$ cycles and shows no improving trend
    - Need >10<sup>9</sup> to cater to frequent writes as data continually flows into the system
      - Tomorrow's hand-held devices will be continuously updated
      - Intel applications characterized by continuous data streams
  - Access time gap has no good solution
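
To make the endurance requirement concrete, the sketch below estimates how long a device lasts under continuous writes. It assumes ideal wear leveling and no write amplification, and the 1 GB capacity and 100 MB/s write rate are hypothetical values picked for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def wearout_years(capacity_gb: float, endurance_cycles: float, write_mb_per_s: float) -> float:
    """Time to exhaust write endurance, assuming every cell is written
    equally often (ideal wear leveling) and no write amplification."""
    total_writable_mb = capacity_gb * 1024 * endurance_cycles
    return total_writable_mb / write_mb_per_s / SECONDS_PER_YEAR

# Hypothetical 1 GB buffer absorbing a continuous 100 MB/s data stream.
for cycles in (1e6, 1e9):
    print(f"endurance {cycles:.0e}: {wearout_years(1, cycles, 100):.2f} years")
```

Under these assumptions, an endurance of 10^6 cycles wears out in a matter of months, while 10^9 cycles lasts a thousand times longer.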




## **Storage Historic Price Trend and Forecast**




## **Storage Class Memory Target Specifications**

| Parameter        | Target                             |
|------------------|------------------------------------|
| Access time      | ~100-200 ns                        |
| Data rate (MB/s) | 100                                |
| Endurance        | 10<sup>9</sup> - 10<sup>12</sup>   |
| HER (/TB)        | 10<sup>-4</sup>                    |
| MTBF (MH)        | 2                                  |
| On power (mW)    | 100                                |
| Standby (mW)     | 1                                  |
| Cost (\$/GB)     | <5.5                               |
| CGR              | 35%                                |



Very challenging to achieve in combination



# SCM Basic Concepts (Phase Change Example)

- Using a phase transition of a Ge-Sb-Te alloy to store a bit
- Ge-Sb-Te exists in a stable amorphous and a stable crystalline phase
  - Phases have very different electrical resistances
- Transition between phases by controlled heating/cooling
  - Write '1': a short (10 ns), intense current pulse melts the alloy (crystalline => amorphous)
  - Write '0': a longer (50 ns), weaker current pulse recrystallizes the alloy (amorphous => crystalline)
  - Read: a short, weak pulse senses the resistance without changing the phase
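
The write and read scheme above can be sketched as a toy state model. This is not a device simulation: the class name and the resistance values are hypothetical, chosen only so that the amorphous state reads as high resistance and the crystalline state as low resistance.

```python
from dataclasses import dataclass

# Illustrative resistances only; real values depend on the device.
R_AMORPHOUS_OHM = 1_000_000
R_CRYSTALLINE_OHM = 10_000
R_THRESHOLD_OHM = 100_000

@dataclass
class PhaseChangeCell:
    amorphous: bool = False  # start in the crystalline phase

    def write(self, bit: int) -> None:
        if bit == 1:
            # Short (~10 ns), intense pulse melts the alloy, leaving it amorphous.
            self.amorphous = True
        else:
            # Longer (~50 ns), weaker pulse recrystallizes the alloy.
            self.amorphous = False

    def resistance_ohm(self) -> float:
        return R_AMORPHOUS_OHM if self.amorphous else R_CRYSTALLINE_OHM

    def read(self) -> int:
        # A weak pulse senses the resistance and leaves the phase unchanged.
        return 1 if self.resistance_ohm() > R_THRESHOLD_OHM else 0

cell = PhaseChangeCell()
cell.write(1)
print(cell.read(), cell.read())  # reading is non-destructive
cell.write(0)
print(cell.read())
```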

- Non-Si based proprietary diode materials are being developed for high ON current density (> 10<sup>7</sup> A/cm<sup>2</sup>, needed for PCM) and ultra-low OFF current density (< 1 A/cm<sup>2</sup>).




## Phase-Change Nano-Bridge

- Prototype memory device with ultra-thin (3 nm) films demonstrated Dec '06

- $3\,\mathrm{nm} \times 20\,\mathrm{nm} \rightarrow 60\,\mathrm{nm}^2$ ≈ Flash roadmap for **2013** $\rightarrow$ phase-change scales
- Fast (<100 ns SET)
- Low current (<100 μA RESET)

Figure: phase-change "bridge"; panels plot current [µA] and applied voltage [V] against time [ns].





Current scales with area
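
As a worked check on the numbers for the 3 nm × 20 nm bridge, the sketch below converts the quoted RESET current bound (<100 μA) into a current density. Treating the full 60 nm² cross-section as the conducting area is an assumption made here for illustration.

```python
NM2_TO_CM2 = 1e-14  # 1 nm = 1e-7 cm, so 1 nm^2 = 1e-14 cm^2

def current_density_a_per_cm2(current_a: float, area_nm2: float) -> float:
    """Current density for a given current through a given cross-section."""
    return current_a / (area_nm2 * NM2_TO_CM2)

area_nm2 = 3 * 20  # bridge cross-section from the slide
j_upper = current_density_a_per_cm2(100e-6, area_nm2)  # upper bound, since RESET < 100 uA
print(f"RESET current density below about {j_upper:.2e} A/cm^2")
```

Because the current density is what melts the material, shrinking the cross-section lowers the required current in proportion.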





## Processing Cost and F<sup>2</sup>

- The bit cell size drives the cost of any memory
- Cell area is expressed in units of F<sup>2</sup> where F is the minimum lithographic feature of the densest process layer
  - Half pitch dimension of metallization connecting drain and source for ICs
  - MR sensor width in magnetic recording
- Cell areas
  - DRAM $8F^2 \rightarrow 6F^2$
  - NAND $4F^2 \rightarrow 2F^2$
  - SRAM $100F^2$
  - MRAM $20F^2$ to $40F^2$
  - Hard disk $0.5F^2 \rightarrow 1F^2$
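
To see how cell size in F² translates into density, the sketch below computes raw array density for a few of the cell areas listed above. The feature size of 45 nm is a hypothetical value, and the result ignores peripheral circuitry, so it is an upper bound on array density rather than a chip-level figure.

```python
def bits_per_cm2(cell_area_f2: float, f_nm: float) -> float:
    """Raw bit density for a cell of `cell_area_f2` * F^2 at feature size F."""
    cell_area_cm2 = cell_area_f2 * (f_nm * 1e-7) ** 2  # 1 nm = 1e-7 cm
    return 1.0 / cell_area_cm2

F_NM = 45.0  # hypothetical feature size
CELL_AREAS_F2 = {"SRAM": 100, "MRAM": 20, "DRAM": 6, "NAND": 4}

for name, k in CELL_AREAS_F2.items():
    print(f"{name:5s} {k:4d} F^2 -> {bits_per_cm2(k, F_NM) / 1e9:6.2f} Gbit/cm^2")
```

At a fixed F, density is inversely proportional to the cell area in F², which is why the cell size dominates cost per bit.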





## Low Cost Requires High Density



#### Need Effective Cell Size < $4F^2$ (F $\rightarrow$ Lithography Half-Pitch)

2D $\rightarrow$ Better Scaling & Fewer Process Steps, but Requires

- Interface Between Litho (F) & Sub-Litho (Fs)
- Viable Method to Manufacture Sub-Lithographic Arrays
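
A short calculation shows how a sub-lithographic array gets below $4F^2$. It assumes an idealized crosspoint cell whose footprint is one pitch (two half-pitches) on each side, and it ignores the area of the litho to sub-litho interface. The 45 nm and 10 nm half-pitches are hypothetical.

```python
def effective_cell_size_f2(sublitho_half_pitch_nm: float, litho_half_pitch_nm: float) -> float:
    """Area of a crosspoint cell at half-pitch Fs, (2 * Fs)^2,
    expressed in units of the lithographic F^2."""
    ratio = sublitho_half_pitch_nm / litho_half_pitch_nm
    return 4.0 * ratio ** 2

print(effective_cell_size_f2(45, 45))            # purely lithographic crosspoint: 4 F^2
print(round(effective_cell_size_f2(10, 45), 3))  # hypothetical Fs = 10 nm at F = 45 nm
```

In a real design the decoder and interface overhead would add to this figure, which is why the two requirements listed above matter.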



# **Crossbar Memory Fundamentals**





# Micro-Nanoscale Decoder

Figure: current in fins (absolute).



- A sub-lithographic feature is selected by moving a depletion region across the fine structure
- Modulating signal is brought in by lithographically defined lines


- Fins down to sub-20 nm have been addressed






## **MNAB Concept Demonstrated**



- 100 nm pitch MNAB devices fabricated by e-beam lithography
- Obtained fully functional devices
- Selectivity > 10<sup>5</sup>






# Combining Micro-Nano Decoder and ROM



4-fin UMB+ROM test structure

FIB x-SEM through gated fins (A-A')

✓ Successful integration of UMB with memory element (2 terminal oxide antifuse ROM)

✓ Verified operation over all bit sequences for 4-fin UMB+ROM



## **Nanoscale Patterning Techniques**

**Spacers**

**Self Assembly**

(IBM T J Watson Research Center)

Frequency doubling: 40 nm to 20 nm pitch (IBM)

**Nanoimprint Lithography**

(IBM Almaden)

- Various nanoscale patterning techniques exist.
- Sub-20 nm pitch demonstrated.
- Only regular line / space patterns possible.




## **Value Proposition for SCM in Storage Controllers**

- Significantly improved cost/performance long before SCM competes with DASD on cost
- Simple cache model (cube root rule), queuing effects ignored, miss rate starts at 50%
- Columns in figure
  - 1<sup>st</sup>: business as usual (BAU), DRAM cache @ 0.2% of DASD capacity
  - 2<sup>nd</sup>: same cost as BAU, but DRAM replaced by SCM; SCM @ 2% of DASD capacity, perf. ~2x
  - 3<sup>rd</sup>: hierarchical storage; cost now 2.8x 1<sup>st</sup> column, SCM capacity now 20% of DASD, perf. >3.6x
  - 4<sup>th</sup>: cost now 1-10x; SCM-only storage used in system, perf. >12x
- Performance is assessed at the application interface to the OS, so the I/O stack in the host, fabric latency, storage controller microcode processing time, and data transfer time are all included
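
The cube root rule can be sketched as follows, assuming it means that the miss rate falls as the inverse cube root of cache size. The starting point (50% misses at 0.2% of capacity) comes from the bullets above. The function name is hypothetical, and the sketch does not attempt to reproduce the response times in the figure, which also depend on hit latency and the software stack.

```python
def miss_rate(cache_fraction: float,
              base_fraction: float = 0.002,
              base_miss: float = 0.50) -> float:
    """Cube root rule: miss rate scales as (cache size)^(-1/3),
    anchored at `base_miss` for a cache of `base_fraction` of capacity."""
    return base_miss * (cache_fraction / base_fraction) ** (-1.0 / 3.0)

for frac in (0.002, 0.02, 0.20):
    m = miss_rate(frac)
    print(f"cache = {frac:.1%} of capacity: miss rate {m:.1%}, "
          f"{0.50 / m:.2f}x fewer misses than BAU")
```

Under this model a tenfold larger cache roughly halves the miss rate, which is broadly in line with the ~2x performance quoted for the second column.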



Response Time from Simple Cache Model


# SCM impact on large HPC storage systems

## Large file throughput

- >10x improvement per TB of file system possible
  - Limited by interconnect and controller bandwidth
  - Limited by file system OS software overheads
- Good match for check-pointing
- Bulk storage costs high

## Small file and metadata access rates

- Access rate improvement of >100x feasible
- Limited by software stack and controller overhead


## **Magnetic Racetrack Memory**



- Data is stored as a pattern of domains in a long nanowire or "racetrack" of magnetic material.
- Data is stored magnetically and is non-volatile.
- Current pulses move domains along the racetrack: *no moving parts, just the patterns move.*
- Each memory location stores *an entire bit pattern* (10, 100, 1000 bits?) rather than just a single bit.
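
The access pattern described above, in which a stored pattern is shifted past a fixed read position, can be sketched with a toy shift-register model. The class and method names are hypothetical, and the wrap-around of the pattern is a simplification made here so that no bits are lost; it says nothing about the physical device.

```python
from collections import deque

class Racetrack:
    """Toy model: a bit pattern that shifts past a fixed read head."""

    def __init__(self, bits, head_position=0):
        self.track = deque(bits)
        self.head = head_position

    def pulse(self, steps=1):
        # A current pulse shifts the whole pattern along the track.
        # Wrap-around is a modelling convenience, not device behaviour.
        self.track.rotate(steps)

    def read(self):
        # The read head only ever senses the domain in front of it.
        return self.track[self.head]

    def read_all(self):
        out = []
        for _ in range(len(self.track)):
            out.append(self.read())
            self.pulse(-1)  # bring the next domain to the head
        return out

rt = Racetrack([1, 0, 1, 1, 0])
print(rt.read_all())
```

One read head thus serves every bit on the track, which is how a single memory location can hold an entire bit pattern.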


## Magnetic Race-Track Memory

- Information stored as domain walls in vertical "race track"
  - Data stored in the third dimension in tall columns of magnetic material
- Domains moved around track using nanosecond pulses of current
- 10 to 100 times the storage capacity of conventional solid state memory



Magnetic Race Track Memory S. Parkin (IBM), US patents 6,834,005 (2004) & 6,898,132 (2005)








## Large Magnetic Anisotropy for Single Atoms







- The energy that is required to change the direction of a single spin was measured.
- Large single-atom magnetic anisotropy for iron of about 6 meV.
- About 50x weaker anisotropy for manganese on same surface.
- Spin excitation spectroscopy reveals spin energy levels, including their magnetic field dependence.
- DFT calculations elucidate the surface structure and lead to the same total spin as experiment.
- GOAL: engineer very large magnetic anisotropy to demonstrate data storage.
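
A rough comparison with thermal energy suggests why a much larger anisotropy is the goal. The sketch below converts the measured 6 meV into an equivalent temperature using the Boltzmann constant. It is an order-of-magnitude check only, and stable storage would in practice need a barrier many times larger than the thermal energy.

```python
K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K

def equivalent_temperature_k(energy_mev: float) -> float:
    """Temperature at which k_B * T equals the given energy."""
    return energy_mev * 1e-3 / K_B_EV_PER_K

print(f"6 meV corresponds to k_B * T at T = {equivalent_temperature_k(6.0):.0f} K")
print(f"k_B * T at 300 K = {K_B_EV_PER_K * 300 * 1e3:.1f} meV")
```

The measured anisotropy is several times smaller than the thermal energy at room temperature, so a single spin would not hold its direction there.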



• Atomic memory – "there's a lot of room at the bottom..."

~2030?








## **Innovation and Impact**

### Storage class memory (SCM)

- New nonvolatile solid state memory with fast access and high throughput
- Robustness, volumetric density and power significantly better than disks

### Will revolutionize memory/storage hierarchy

- >10x throughput, >100x transaction rate potential
- Applications will be revamped to exploit new technology

### New applications or significantly extended applications, e.g.

- Sensor/actuator systems with storage at the network edge
- Mobile applications, e.g. semiautonomous video gatherers