Introduction to UBI

Post on 15-Jun-2015


Overview
Sub-systems
• Kernel API & EBA subsystem
• Wear-leveling subsystem
• Scanning subsystem


UBI – Unsorted Block Images

A volume management system
• Provides static and dynamic volumes

Wear-leveling across whole flash device

Transparent bad blocks management

Read disturbance handling

Merged in the mainline Linux kernel since v2.6.22


[Figure: an MTD device divided into MTD partitions. One MTD partition is attached as a UBI device whose PEBs 0 ... N back a static UBI volume and a dynamic UBI volume, with LEBs labelled (0,0) ... (0,P) and (1,0) ... (1,Q). Labels for the stored content: Bootloader, Kernel Image, Root Filesystem (UBIFS).]


MTD Layer

UBI Wear-leveling Subsystem

UBI Kernel API

UBI I/O Subsystem

UBI Scanning Subsystem

UBI Erase Block Association Subsystem

UBI Initialization


Filesystem
• fs_write()
• fs_read()

UBI KAPI
• ubi_leb_write()
• ubi_leb_map()
• ubi_leb_read()
• ubi_leb_unmap()
• ubi_leb_erase()

UBI EBA
• ubi_eba_write_leb()
• ubi_eba_map_leb()
• ubi_eba_unmap_leb()
• ubi_eba_read_leb()
• ubi_eba_copy_leb()

UBI WL
• ubi_wl_get_peb()
• ubi_wl_put_peb()
• ubi_wl_scrub_peb()
• ubi_wl_flush()

UBI IO
• ubi_io_read()
• ubi_io_write()
• ubi_io_sync_erase()
• ubi_io_read_vid_hdr()
• ubi_io_write_vid_hdr()
• ubi_io_read_data()
• ubi_io_write_data()
• ubi_io_read_ec_hdr()
• ubi_io_write_ec_hdr()
• ubi_io_mark_bad()

Operations:
• Read from an unmapped LEB
• Read from a mapped LEB
• Write to a mapped LEB
• Write to an unmapped LEB
• Map a LEB
• Unmap a LEB
• Erase a LEB
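The operations above all revolve around a table that maps each LEB to a PEB. The following is a minimal sketch of that idea, not UBI code: the toy_* names, the sizes and the in-memory "flash" are hypothetical. It assumes that a read from an unmapped LEB yields 0xFF bytes and that a write to an unmapped LEB maps it first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_LEBS      4     /* LEBs in the toy volume (hypothetical) */
#define TOY_PEBS      4     /* PEBs in the toy flash (hypothetical) */
#define TOY_LEB_SIZE  8     /* bytes per eraseblock (hypothetical) */
#define TOY_UNMAPPED  (-1)  /* marker: no PEB behind this LEB */

struct toy_vol {
    int eba_tbl[TOY_LEBS];                        /* LEB number -> PEB number */
    unsigned char flash[TOY_PEBS][TOY_LEB_SIZE];  /* toy storage, indexed by PEB */
};

static void toy_init(struct toy_vol *v)
{
    for (int i = 0; i < TOY_LEBS; i++)
        v->eba_tbl[i] = TOY_UNMAPPED;
    memset(v->flash, 0xFF, sizeof(v->flash));     /* erased flash reads as 0xFF */
}

/* Find a PEB that no LEB points at; returns -1 if none is left. */
static int toy_find_free_peb(const struct toy_vol *v)
{
    for (int p = 0; p < TOY_PEBS; p++) {
        int used = 0;
        for (int l = 0; l < TOY_LEBS; l++)
            if (v->eba_tbl[l] == p)
                used = 1;
        if (!used)
            return p;
    }
    return -1;
}

/* Write to a LEB, mapping it to a free PEB first if it is unmapped. */
static int toy_write(struct toy_vol *v, int lnum, const void *buf, size_t len)
{
    assert(lnum >= 0 && lnum < TOY_LEBS && len <= TOY_LEB_SIZE);
    if (v->eba_tbl[lnum] == TOY_UNMAPPED) {
        int pnum = toy_find_free_peb(v);
        if (pnum < 0)
            return -1;
        v->eba_tbl[lnum] = pnum;
    }
    memcpy(v->flash[v->eba_tbl[lnum]], buf, len);
    return 0;
}

/* Read from a LEB; an unmapped LEB reads back as all 0xFF. */
static void toy_read(const struct toy_vol *v, int lnum, void *buf, size_t len)
{
    assert(lnum >= 0 && lnum < TOY_LEBS && len <= TOY_LEB_SIZE);
    if (v->eba_tbl[lnum] == TOY_UNMAPPED)
        memset(buf, 0xFF, len);
    else
        memcpy(buf, v->flash[v->eba_tbl[lnum]], len);
}

/* Unmap a LEB; the memset stands in for the scheduled erase of its PEB. */
static void toy_unmap(struct toy_vol *v, int lnum)
{
    assert(lnum >= 0 && lnum < TOY_LEBS);
    int pnum = v->eba_tbl[lnum];
    if (pnum != TOY_UNMAPPED)
        memset(v->flash[pnum], 0xFF, TOY_LEB_SIZE);
    v->eba_tbl[lnum] = TOY_UNMAPPED;
}
```

In UBI itself the free PEB would come from the wear-leveling layer rather than from a linear search, and the erase would be deferred to a background worker.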

Responsible for
• Management of PEBs
• Wear-leveling
• Scrubbing (read disturbance)

Works in terms of PEBs and erase counters

Knows nothing about LEBs, volumes, etc.

Internal data structures
• Four RB-trees and one queue

External interfaces
• ubi_wl_get_peb()
• ubi_wl_put_peb()
• ubi_wl_scrub_peb()
• ubi_wl_flush()


drivers/mtd/ubi/ubi.h

struct ubi_device {
    ...
    struct rb_root used;
    struct rb_root erroneous;
    struct rb_root free;
    struct rb_root scrub;
    struct list_head pq[UBI_PROT_QUEUE_LEN];
    ...
}

All good PEBs are managed with four RB-trees and one queue

Note: These RB-trees use (ec, pnum) pairs as keys

[Figure: good PEBs shown as (ec, pnum) pairs. Free PEBs, in the free tree: (2,5) (3,2) (1,1) (1,7) (7,8). In-use PEBs, in the used tree: (8,3) (3,9) (6,6), and in the pq queue: (3,4). The scrub and erroneous trees are shown without entries.]
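To make the (ec, pnum) key concrete, here is a small sketch of the ordering such a key implies: erase counter first, PEB number as tie-breaker. The toy_* names are hypothetical and a sorted array stands in for the RB-tree; this is an illustration, not the kernel's comparison code.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_wl_entry {
    int ec;    /* erase counter */
    int pnum;  /* physical eraseblock number */
};

/* Order by erase counter first, then by PEB number to break ties. */
static int toy_wl_cmp(const void *a, const void *b)
{
    const struct toy_wl_entry *x = a;
    const struct toy_wl_entry *y = b;

    if (x->ec != y->ec)
        return x->ec < y->ec ? -1 : 1;
    if (x->pnum != y->pnum)
        return x->pnum < y->pnum ? -1 : 1;
    return 0;
}

/* Sort entries into the order an in-order walk of such a tree would give. */
static void toy_wl_sort(struct toy_wl_entry *e, size_t n)
{
    assert(e != NULL && n > 0);
    qsort(e, n, sizeof(*e), toy_wl_cmp);
}
```

With this ordering the leftmost entry is always the least-worn PEB, which is what the wear-leveling code wants to find cheaply.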

drivers/mtd/ubi/wl.c

int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
int ubi_wl_put_peb(struct ubi_device *ubi, int pnum, int torture)
int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum)
int ubi_wl_flush(struct ubi_device *ubi)

[Figure: the four entry points ubi_wl_get_peb(), ubi_wl_put_peb(), ubi_wl_scrub_peb() and ubi_wl_flush(), together with the background thread ubi_thread, operating on the erroneous, scrub, free and used trees and the pq queue. Example state: used (8,3) (3,9) (6,6); pq (3,4); free (2,5) (3,2) (1,1) (1,7) (7,8).]

drivers/mtd/ubi/wl.c

int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)

1. Pick a PEB from the free RB-tree according to the hint @dtype
   • longterm
   • shortterm
   • unknown
2. Move the picked PEB to the pq queue
   • why pq? why not used?
     Keep newly allocated PEBs from being moved due to wear-leveling.

[Figure: ubi_thread and an example state. used: (8,3) (3,9) (6,6); pq: (3,4); free: (2,5) (3,2) (1,1) (1,7) (7,8), with regions of the free tree labelled shortterm, longterm and unknown.]
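One way to read the @dtype hint is sketched below: short-term data gets the least-worn free PEB, long-term data a more-worn one that is still within a bounded distance of the least-worn, and unknown data something in between. This is a simplification under stated assumptions: the toy_* names are hypothetical, a sorted array replaces the free RB-tree, and the max_diff / 2 rule for the unknown case is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum toy_dtype { TOY_SHORTTERM, TOY_UNKNOWN, TOY_LONGTERM };

struct toy_wl_entry {
    int ec;    /* erase counter */
    int pnum;  /* physical eraseblock number */
};

/*
 * Return the index of the entry with the highest erase counter that is
 * still below (lowest erase counter + max). free_pebs must be sorted by
 * ascending ec and hold at least one entry.
 */
static size_t toy_find_wl_entry(const struct toy_wl_entry *free_pebs,
                                size_t n, int max)
{
    assert(free_pebs != NULL && n > 0);
    int limit = free_pebs[0].ec + max;
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (free_pebs[i].ec < limit)
            best = i;
    return best;
}

/* Pick a free PEB according to the expected lifetime of the data. */
static size_t toy_pick_free(const struct toy_wl_entry *free_pebs, size_t n,
                            enum toy_dtype dtype, int max_diff)
{
    switch (dtype) {
    case TOY_SHORTTERM:
        return 0;  /* least worn: expected to be erased again soon */
    case TOY_LONGTERM:
        return toy_find_wl_entry(free_pebs, n, max_diff);
    default:
        return toy_find_wl_entry(free_pebs, n, max_diff / 2);
    }
}
```

Putting long-lived data on more-worn PEBs keeps those PEBs out of circulation for longer, while less-worn PEBs absorb the frequent erases.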

drivers/mtd/ubi/wl.c

int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum)

1. Move the PEB @pnum from pq/used to scrub
2. Schedule a wear-leveling request

ubi_thread: "Besides wear-leveling, I also take care of scrubbing."

[Figure: example state. used: (1,1) (3,9) (6,6); pq: (3,4); free: (2,5) (3,2) (1,7) (7,8); PEB (8,3) is shown moving out of used towards scrub.]

drivers/mtd/ubi/wl.c

int ubi_wl_put_peb(struct ubi_device *ubi, int pnum, int torture)

1. Remove the PEB @pnum from one of the in-use RB-trees or pq.
2. Schedule the PEB @pnum for erasure.
3. When the erasure completes without error, the PEB is put back into the free RB-tree.

ubi_thread: "Again, the erasure will be delegated to me."

[Figure: example state. used: (3,9); pq: (3,4); free: (2,5) (3,2) (1,7) (7,8); PEB (6,6) is shown leaving the in-use side and rejoining free after erasure; (8,3) and (1,1) appear elsewhere in the diagram.]

drivers/mtd/ubi/wl.c

int ubi_wl_flush(struct ubi_device *ubi)

1. Flush all pending works

[Figure: ubi_thread processing a queue of ubi_work items, each handled by erase_worker() or wear_leveling_worker().]

drivers/mtd/ubi/wl.c

struct ubi_work {
    struct list_head list;
    int (*func)(struct ubi_device *ubi, struct ubi_work *wrk, int cancel);

    /* The below fields are only relevant to erasure works */
    struct ubi_wl_entry *e;
    int torture;
};
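The work item above is essentially a queue node with a function pointer. A minimal sketch of how such a queue can be scheduled and flushed follows; the toy_* names, the singly linked FIFO and the erased counter are hypothetical stand-ins, not the kernel's list handling or locking.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

struct toy_work {
    struct toy_work *next;
    int (*func)(struct toy_dev *dev, struct toy_work *wrk, int cancel);
    int pnum;  /* only meaningful for erase works */
};

struct toy_dev {
    struct toy_work *head;  /* oldest pending work */
    struct toy_work *tail;  /* newest pending work */
    int erased;             /* counter that makes the effect observable */
};

/* Append a work item to the end of the pending queue. */
static void toy_schedule(struct toy_dev *dev, struct toy_work *wrk)
{
    assert(wrk != NULL && wrk->func != NULL);
    wrk->next = NULL;
    if (dev->tail)
        dev->tail->next = wrk;
    else
        dev->head = wrk;
    dev->tail = wrk;
}

/* Example worker: pretend to erase the PEB unless the work is cancelled. */
static int toy_erase_worker(struct toy_dev *dev, struct toy_work *wrk, int cancel)
{
    (void)wrk;
    if (!cancel)
        dev->erased++;
    return 0;
}

/* Run pending works in order until the queue is empty or one fails. */
static int toy_flush(struct toy_dev *dev)
{
    while (dev->head) {
        struct toy_work *wrk = dev->head;

        dev->head = wrk->next;
        if (!dev->head)
            dev->tail = NULL;

        int err = wrk->func(dev, wrk, 0);
        if (err)
            return err;
    }
    return 0;
}
```

The same loop body is what a background thread would run one item at a time; flushing simply drains the queue synchronously.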

drivers/mtd/ubi/wl.c

static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk, int cancel)

static struct ubi_wl_entry *find_wl_entry(struct rb_root *root, int max)

    if (!free || (!scrub && !used))
        return

    if (scrub) {
        e1 = pick the least worn out PEB from the @scrub
        e2 = find_wl_entry(free, WL_FREE_MAX_DIFF)
    }
    else {
        e1 = pick the least worn out PEB from the @used
        e2 = find_wl_entry(free, WL_FREE_MAX_DIFF)

        if ((e2->ec - e1->ec) < UBI_WL_THRESHOLD)
            return;
    }

    ubi_eba_copy_leb(ubi, e1->pnum, e2->pnum, vid_hdr)

[Figure: example state. used: (3,9); pq: (3,4); free: (2,5) (3,2) (1,7) (6,6); (8,3), (1,1) and (7,8) are shown outside these trees.]

drivers/mtd/ubi/wl.c

static int erase_worker(struct ubi_device *ubi, struct ubi_work *wrk, int cancel)

    err = sync_erase(ubi, e, wl_wrk->torture);
    if (!err) {
        wl_tree_add(e, &ubi->free);
        serve_prot_queue(ubi);
        return ensure_wear_leveling(ubi);
    }

    if (err == -EINTR || err == -ENOMEM || err == -EAGAIN || err == -EBUSY)
        return schedule_erase(ubi, e, 0);
    else if (err != -EIO)
        goto out_ro;

    /* It is %-EIO, the PEB went bad */
    if (!ubi->bad_allowed)
        goto out_ro;

    if (ubi->beb_rsvd_pebs == 0)
        goto out_ro;

    err = ubi_io_mark_bad(ubi, pnum);
    return err;

out_ro:
    ubi_ro_mode(ubi); /* switch to read-only mode */
    return err;

[Figure: example state. used: (3,9); pq: (3,4); free: (2,5) (3,2) (1,7) (6,6); (8,3), (1,1) and (7,8) are shown outside these trees.]
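The error handling in the excerpt above can be summarised as a four-way classification. The sketch below restates it as a pure function; the enum and function names are hypothetical, and only the errno values and the two conditions (bad_allowed, beb_rsvd_pebs) come from the excerpt.

```c
#include <assert.h>
#include <errno.h>

enum toy_erase_action {
    TOY_ADD_TO_FREE,  /* erase succeeded: PEB goes back to the free tree */
    TOY_RETRY,        /* transient error: schedule the erase again */
    TOY_MARK_BAD,     /* -EIO and a reserved PEB is available: mark it bad */
    TOY_READ_ONLY     /* anything else: switch to read-only mode */
};

/* Map the result of an erase attempt to the follow-up action. */
static enum toy_erase_action toy_classify_erase(int err, int bad_allowed,
                                                int beb_rsvd_pebs)
{
    assert(err <= 0);

    if (err == 0)
        return TOY_ADD_TO_FREE;
    if (err == -EINTR || err == -ENOMEM || err == -EAGAIN || err == -EBUSY)
        return TOY_RETRY;
    if (err != -EIO)
        return TOY_READ_ONLY;
    if (!bad_allowed || beb_rsvd_pebs == 0)
        return TOY_READ_ONLY;
    return TOY_MARK_BAD;
}
```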

Responsible for
• Scanning the flash media
• Checking UBI headers
• Providing complete information about the UBI flash image

UBI on-flash data structures
• Erase Counter Header
• Volume Identifier Header
• Volume Table

Temporary data structures during scanning process
• Scan Info
• Scan Volume
• Scan Erase Block
• Four lists: free, erase, corr, alien

Unclean reboot


Every good PEB has a 64-byte Erase Counter Header

Every good mapped PEB has a 64-byte Volume Identifier Header

A “layout volume” contains two copies of the Volume Table

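[Figure: a row of PEBs numbered 0 ... N above a row of LEBs; each mapped PEB is associated with one LEB, labelled by pairs such as (0,0), (0,1), ..., (0,P), (1,0), (1,1), (2,0), (2,1), (2,2), ..., (2,Q).]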

drivers/mtd/ubi/ubi-media.h

struct ubi_ec_hdr {
    __be32 magic;           /* EC header magic number (%UBI_EC_HDR_MAGIC) */
    __u8   version;         /* version of UBI implementation */
    __u8   padding1[3];     /* reserved for future, zeroes */
    __be64 ec;              /* the erase counter */
    __be32 vid_hdr_offset;  /* where the VID header starts */
    __be32 data_offset;     /* where the user data start */
    __be32 image_seq;       /* image sequence number */
    __u8   padding2[32];    /* reserved for future, zeroes */
    __be32 hdr_crc;         /* erase counter header CRC checksum */
} __attribute__ ((packed));
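As a sanity check of the 64-byte claim and of the big-endian (__be64) erase counter, here is a sketch that mirrors the header layout with plain byte arrays. The toy_* names are hypothetical, and using uint8_t arrays instead of the kernel's __be32/__be64 types is a portability choice made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-for-byte mirror of the EC header fields listed above. */
struct toy_ec_hdr {
    uint8_t magic[4];
    uint8_t version;
    uint8_t padding1[3];
    uint8_t ec[8];              /* big-endian 64-bit erase counter */
    uint8_t vid_hdr_offset[4];
    uint8_t data_offset[4];
    uint8_t image_seq[4];
    uint8_t padding2[32];
    uint8_t hdr_crc[4];
};

/* 4 + 1 + 3 + 8 + 4 + 4 + 4 + 32 + 4 = 64 */
_Static_assert(sizeof(struct toy_ec_hdr) == 64, "EC header must be 64 bytes");

/* Store a 64-bit value most significant byte first. */
static void toy_put_be64(uint8_t out[8], uint64_t v)
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a 64-bit value stored most significant byte first. */
static uint64_t toy_get_be64(const uint8_t in[8])
{
    assert(in != NULL);
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    return v;
}
```

Storing the counter in a fixed byte order is what lets the same flash image be read on hosts of either endianness.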


drivers/mtd/ubi/ubi-media.h

struct ubi_vid_hdr {
    __be32 magic;        /* VID magic number (%UBI_VID_HDR_MAGIC) */
    __u8   version;      /* version of UBI implementation */
    __u8   vol_type;     /* volume type (%UBI_VID_DYNAMIC or %UBI_VID_STATIC) */
    __u8   copy_flag;    /* for wear-leveling reasons */
    __u8   compat;       /* compatibility of this volume */
    __be32 vol_id;       /* ID of this volume */
    __be32 lnum;         /* LEB number */
    __u8   padding1[4];  /* reserved for future, zeroes */
    __be32 data_size;    /* bytes of data this LEB contains */
    __be32 used_ebs;     /* total number of used LEBs in this volume */
    __be32 data_pad;     /* padded bytes at the end of this PEB */
    __be32 data_crc;     /* CRC of the data stored in this LEB */
    __u8   padding2[4];  /* reserved for future, zeroes */
    __be64 sqnum;        /* sequence number */
    __u8   padding3[12]; /* reserved for future, zeroes */
    __be32 hdr_crc;      /* VID header CRC checksum */
} __attribute__ ((packed));


drivers/mtd/ubi/ubi-media.h

struct ubi_vtbl_record {
    __be32 reserved_pebs;            /* physical eraseblocks reserved for this volume */
    __be32 alignment;                /* volume alignment */
    __be32 data_pad;                 /* padded bytes for the requested alignment */
    __u8   vol_type;                 /* %UBI_VID_DYNAMIC or %UBI_VID_STATIC */
    __u8   upd_marker;               /* if volume update was started but not finished */
    __be16 name_len;                 /* volume name length */
    __u8   name[UBI_VOL_NAME_MAX+1]; /* volume name */
    __u8   flags;                    /* volume flags (%UBI_VTBL_AUTORESIZE_FLG) */
    __u8   padding[23];              /* reserved for future, zeroes */
    __be32 crc;                      /* CRC32 checksum of the record */
} __attribute__ ((packed));


[Figure: the temporary scanning data structures. Scan Info points to Scan Volumes (including the internal “layout volume”); each Scan Volume holds Scan Erase Blocks (SEBs) that refer to PEBs; Scan Info also keeps four lists of SEBs: corr, free, erase and alien.]


EC hdr is written to a PEB right after the PEB is erased

drivers/mtd/ubi/wl.c

static int sync_erase(struct ubi_device *ubi, struct ubi_wl_entry *e, int torture)
{
    unsigned long long ec = e->ec;

    [... Deleted ...]

    err = ubi_io_sync_erase(ubi, e->pnum, torture);
    if (err < 0)
        goto out_free;

    ec += err;
    if (ec > UBI_MAX_ERASECOUNTER) {
        /*
         * Erase counter overflow. Upgrade UBI and use 64-bit
         * erase counters internally.
         */
        ubi_err("erase counter overflow at PEB %d, EC %llu", e->pnum, ec);
        err = -EINVAL;
        goto out_free;
    }

    dbg_wl("erased PEB %d, new EC %llu", e->pnum, ec);
    ec_hdr->ec = cpu_to_be64(ec);
    err = ubi_io_write_ec_hdr(ubi, e->pnum, ec_hdr);

    [... Deleted ...]
}
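The counter update in the excerpt above adds the non-negative return value of the erase call to the old counter and rejects the result if it exceeds a limit. A sketch of just that arithmetic follows; the toy_* names are hypothetical, and the limit value 0x7FFFFFFF is an assumption made for this example rather than something stated on the slide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed limit for this example; the slide only names UBI_MAX_ERASECOUNTER. */
#define TOY_MAX_ERASECOUNTER 0x7FFFFFFFULL

/*
 * Compute the erase counter after 'erases' additional erase cycles.
 * Returns 0 and stores the result in *new_ec, or -EINVAL on overflow.
 */
static int toy_next_ec(unsigned long long ec, int erases,
                       unsigned long long *new_ec)
{
    assert(erases >= 0 && new_ec != NULL);

    ec += (unsigned long long)erases;
    if (ec > TOY_MAX_ERASECOUNTER)
        return -EINVAL;

    *new_ec = ec;
    return 0;
}
```

The new value is what would then be converted to big-endian and written into the EC header of the freshly erased PEB.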


Map a LEB L to PEB P
• Write VID header (with lnum L) to P

Unmap a LEB L from PEB P
• Schedule P for erasure

Remap a LEB L from PEB P0 to PEB P1
• Schedule P0 for erasure
• Write VID header (with lnum L) to P1

Copy a PEB P0 which is mapped to L to PEB P1
• Write VID header (with lnum L) to P1
• Copy contents of P0 to P1
• Schedule P0 for erasure


Whenever the volume table needs an update
(the steps below are in the context of the “layout volume”)

1. Unmap LEB 0
2. Write updated table to LEB 0
3. Unmap LEB 1
4. Write updated table to LEB 1

drivers/mtd/ubi/vtbl.c

int ubi_change_vtbl_record(struct ubi_device *ubi, int idx,
                           struct ubi_vtbl_record *vtbl_rec)
{
    [... Deleted ...]
    layout_vol = ubi->volumes[vol_id2idx(ubi, UBI_LAYOUT_VOLUME_ID)];
    [... Deleted ...]

    memcpy(&ubi->vtbl[idx], vtbl_rec, sizeof(struct ubi_vtbl_record));
    for (i = 0; i < UBI_LAYOUT_VOLUME_EBS; i++) {
        err = ubi_eba_unmap_leb(ubi, layout_vol, i);
        if (err)
            return err;

        err = ubi_eba_write_leb(ubi, layout_vol, i, ubi->vtbl, 0,
                                ubi->vtbl_size, UBI_LONGTERM);
        if (err)
            return err;
    }

    return 0;
}
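Updating the two copies strictly one after the other means that an interruption at any point leaves at least one complete copy. The sketch below simulates this with two in-memory copies and an artificial crash point; all toy_* names, the valid flag and the step counting are hypothetical devices for the illustration.

```c
#include <assert.h>
#include <string.h>

#define TOY_COPIES   2
#define TOY_TBL_SIZE 16

struct toy_copy {
    int valid;                /* 0 while unmapped or not yet rewritten */
    char data[TOY_TBL_SIZE];  /* table contents as a short string */
};

/*
 * Update both copies in sequence. crash_after simulates a power cut after
 * that many steps (each unmap and each write counts as one step); pass a
 * negative value for no crash. Returns the number of steps executed.
 */
static int toy_update(struct toy_copy c[TOY_COPIES], const char *new_tbl,
                      int crash_after)
{
    assert(new_tbl != NULL && strlen(new_tbl) < TOY_TBL_SIZE);
    int step = 0;

    for (int i = 0; i < TOY_COPIES; i++) {
        if (step == crash_after)
            return step;
        c[i].valid = 0;               /* unmap copy i */
        step++;

        if (step == crash_after)
            return step;
        strcpy(c[i].data, new_tbl);   /* write updated table to copy i */
        c[i].valid = 1;
        step++;
    }
    return step;
}

/* Count the copies that hold a complete table. */
static int toy_valid_copies(const struct toy_copy c[TOY_COPIES])
{
    int n = 0;

    for (int i = 0; i < TOY_COPIES; i++)
        n += c[i].valid;
    return n;
}
```

Whatever value crash_after takes, toy_valid_copies() never drops below one, which is the property the unmap/write ordering is meant to provide.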

More information about MTD and UBI can be found on the MTD website

http://www.linux-mtd.infradead.org/