Overview
Sub-systems
• Kernel API & EBA subsystem
• Wear-leveling subsystem
• Scanning subsystem
UBI – Unsorted Block Images
A volume management system
• Provides static and dynamic volumes
Wear-leveling across whole flash device
Transparent bad blocks management
Read disturbance handling
Merged in the mainline Linux kernel since v2.6.22
Figure: an MTD device is divided into MTD partitions, one of which is attached as a UBI device. The UBI device exposes PEBs 0 to N, which back the LEBs of a static UBI volume and a dynamic UBI volume, numbered (0,0) ... (0,P) and (1,0) ... (1,Q). Contents shown: Bootloader, Kernel Image, Root Filesystem (UBIFS).
MTD Layer
UBI Wear-leveling Subsystem
UBI Kernel API
UBI I/O Subsystem
UBI Scanning Subsystem
UBI Erase Block Association Subsystem
UBI Initialization
Figure: call paths through the UBI layers.
Filesystem: fs_write(), fs_read()
UBI KAPI: ubi_leb_write(), ubi_leb_map(), ubi_leb_read(), ubi_leb_unmap(), ubi_leb_erase()
UBI EBA: ubi_eba_write_leb(), ubi_eba_map_leb(), ubi_eba_unmap_leb(), ubi_eba_read_leb(), ubi_eba_copy_leb()
UBI WL: ubi_wl_get_peb(), ubi_wl_put_peb(), ubi_wl_scrub_peb(), ubi_wl_flush()
UBI IO: ubi_io_read(), ubi_io_write(), ubi_io_sync_erase(), ubi_io_read_vid_hdr(), ubi_io_write_vid_hdr(), ubi_io_read_data(), ubi_io_write_data(), ubi_io_read_ec_hdr(), ubi_io_write_ec_hdr(), ubi_io_mark_bad()
Operations traced:
• Read from an unmapped LEB
• Read from a mapped LEB
• Write to a mapped LEB
• Write to an unmapped LEB
• Map a LEB
• Unmap a LEB
• Erase a LEB
Responsible for
• Management of PEBs
• Wear-leveling
• Scrubbing (read disturbance)
Works in terms of PEBs and erase counters
Knows nothing about LEBs, volumes, etc.
Internal data structures
• Four RB-trees and one queue
External interfaces
• ubi_wl_get_peb()
• ubi_wl_put_peb()
• ubi_wl_scrub_peb()
• ubi_wl_flush()
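Figure: example state, with each PEB written as (ec, pnum). erroneous: no entries shown; scrub: no entries shown; used: (8,3), (3,9), (6,6); pq: (3,4).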
drivers/mtd/ubi/ubi.h
struct ubi_device {
    ...
    struct rb_root used;
    struct rb_root erroneous;
    struct rb_root free;
    struct rb_root scrub;
    struct list_head pq[UBI_PROT_QUEUE_LEN];
    ...
};
All good PEBs are managed with four RB-trees, and one queue
Note: These RB-trees use (ec, pnum) pairs as keys
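The (ec, pnum) ordering can be sketched in plain userspace C. This is only an illustration, not the kernel code: `struct wl_key` and `wl_key_cmp()` are hypothetical names. Comparing the erase counter first keeps the least worn PEB at the leftmost position of a tree, and the PEB number breaks ties so that every key is unique.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for a wear-leveling tree key. */
struct wl_key {
    int ec;   /* erase counter */
    int pnum; /* physical eraseblock number */
};

/*
 * Order by erase counter first, then by PEB number to break ties.
 * Returns a negative value, zero, or a positive value, like strcmp().
 */
static int wl_key_cmp(const struct wl_key *a, const struct wl_key *b)
{
    assert(a && b);
    if (a->ec != b->ec)
        return a->ec < b->ec ? -1 : 1;
    if (a->pnum != b->pnum)
        return a->pnum < b->pnum ? -1 : 1;
    return 0;
}
```

With this ordering, the PEB (1,1) sorts before (1,7), and both sort before (2,5).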
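Figure: the free tree holds the free PEBs (2,5), (3,2), (1,1), (1,7), (7,8); the other structures hold the in-use PEBs. Together they make up the good PEBs.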
drivers/mtd/ubi/wl.c
int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
int ubi_wl_put_peb(struct ubi_device *ubi, int pnum, int torture)
int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum)
int ubi_wl_flush(struct ubi_device *ubi)
Figure: the four interfaces above and the background thread ubi_thread operate on the RB-trees and the pq queue.
drivers/mtd/ubi/wl.c
int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
1. Pick a PEB from the free RB-tree according to the hint @dtype
• longterm
• shortterm
• unknown
2. Move the picked PEB to the pq queue
• Why pq rather than used? The protection queue keeps newly allocated PEBs from being moved due to wear-leveling.
Figure: free tree with (2,5), (3,2), (1,1), (1,7), (7,8); regions labeled shortterm, unknown, longterm.
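The role of the protection queue can be sketched with a toy userspace model. This is a rough illustration under stated assumptions, not the kernel implementation: it assumes the queue behaves as a ring of slots that advances once per served erase cycle, and all names (`toy_wl`, `toy_pq_add()`, `toy_pq_serve()`, `PQ_LEN`) are hypothetical. A freshly allocated PEB is parked in the slot furthest from the head, so it becomes an ordinary used PEB (and thus a wear-leveling candidate) only after `PQ_LEN` cycles.

```c
#include <assert.h>

#define PQ_LEN   3 /* toy value; stands in for UBI_PROT_QUEUE_LEN */
#define NUM_PEBS 8 /* toy device size */

enum peb_state { PEB_FREE, PEB_PROTECTED, PEB_USED };

struct toy_wl {
    enum peb_state state[NUM_PEBS];
    int slot[NUM_PEBS]; /* pq slot of each protected PEB */
    int pq_head;        /* slot released by the next serve */
};

/* Park a newly allocated PEB in the slot furthest from the head. */
static void toy_pq_add(struct toy_wl *wl, int pnum)
{
    assert(pnum >= 0 && pnum < NUM_PEBS);
    wl->state[pnum] = PEB_PROTECTED;
    wl->slot[pnum] = (wl->pq_head + PQ_LEN - 1) % PQ_LEN;
}

/* Release every PEB in the head slot to "used", then advance the head. */
static void toy_pq_serve(struct toy_wl *wl)
{
    for (int p = 0; p < NUM_PEBS; p++)
        if (wl->state[p] == PEB_PROTECTED && wl->slot[p] == wl->pq_head)
            wl->state[p] = PEB_USED;
    wl->pq_head = (wl->pq_head + 1) % PQ_LEN;
}
```

In this model a PEB added while the head is at slot 0 stays protected through two calls to `toy_pq_serve()` and is released by the third.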
drivers/mtd/ubi/wl.c
int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum)
1. Move the PEB @pnum from pq/used to scrub
2. Schedule a wear-leveling request
ubi_thread: “Besides wear-leveling, I also take care of scrubbing.”
Figure: used: (1,1), (3,9), (6,6); pq: (3,4); free: (2,5), (3,2), (1,7), (7,8); also shown: (8,3).
drivers/mtd/ubi/wl.c
int ubi_wl_put_peb(struct ubi_device *ubi, int pnum, int torture)
1. Remove the PEB @pnum from one of the in-use RB-trees or pq.
2. Schedule the PEB @pnum for erasure.
3. When the erasure completes without error, the PEB is put back into the free RB-tree.
ubi_thread: “Again, the erasure will be delegated to me.”
Figure: used: (3,9); pq: (3,4); free: (2,5), (3,2), (6,6), (1,7), (7,8); also shown: (8,3), (1,1), (6,6).
drivers/mtd/ubi/wl.c
int ubi_wl_flush(struct ubi_device *ubi)
1. Flush all pending works
Figure: ubi_thread processes a queue of ubi_work items; each item is handled by erase_worker() or wear_leveling_worker().
drivers/mtd/ubi/wl.c
struct ubi_work {
    struct list_head list;
    int (*func)(struct ubi_device *ubi, struct ubi_work *wrk, int cancel);
    /* The below fields are only relevant to erasure works */
    struct ubi_wl_entry *e;
    int torture;
};
drivers/mtd/ubi/wl.c
static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk, int cancel)
static struct ubi_wl_entry *find_wl_entry(struct rb_root *root, int max)

    if (!free || (!scrub && !used))
        return
    if (scrub) {
        e1 = pick the least worn out PEB from the @scrub
        e2 = find_wl_entry(free, WL_FREE_MAX_DIFF)
    } else {
        e1 = pick the least worn out PEB from the @used
        e2 = find_wl_entry(free, WL_FREE_MAX_DIFF)
        if ((e2->ec - e1->ec) < UBI_WL_THRESHOLD)
            return;
    }
    ubi_eba_copy_leb(ubi, e1->pnum, e2->pnum, vid_hdr)
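The decision made by the wear-leveling worker can be sketched in userspace C. This is a toy model, not the kernel code. It assumes that find_wl_entry() returns the most worn entry whose erase counter is still below the tree minimum plus `max`, and it walks a sorted array instead of an RB-tree. The names `toy_find_wl_entry()`, `toy_should_move()` and the two `TOY_` constants are hypothetical, and the constant values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Toy values; the real constants depend on the kernel configuration. */
#define TOY_WL_FREE_MAX_DIFF 8
#define TOY_WL_THRESHOLD     4

/*
 * ecs[] holds erase counters sorted in ascending order (a stand-in for an
 * RB-tree walk). Return the index of the largest counter that is still
 * below ecs[0] + max_diff. Requires n > 0.
 */
static size_t toy_find_wl_entry(const int *ecs, size_t n, int max_diff)
{
    assert(ecs && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n && ecs[i] < ecs[0] + max_diff; i++)
        best = i;
    return best;
}

/*
 * Decide whether data should be moved from the source PEB (e1) to the
 * chosen free PEB (e2). Scrubbing always moves; plain wear-leveling moves
 * only when the counters differ by at least the threshold.
 */
static int toy_should_move(int e1_ec, int e2_ec, int scrubbing)
{
    if (scrubbing)
        return 1;
    return (e2_ec - e1_ec) >= TOY_WL_THRESHOLD;
}
```

The threshold avoids copying data when the wear difference is too small to justify an extra erase cycle.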
Figure: used: (3,9); pq: (3,4); free: (2,5), (3,2), (1,7), (6,6); also shown: (8,3), (1,1), (7,8).
drivers/mtd/ubi/wl.c
static int erase_worker(struct ubi_device *ubi, struct ubi_work *wrk, int cancel)

    err = sync_erase(ubi, e, wl_wrk->torture);
    if (!err) {
        wl_tree_add(e, &ubi->free);
        serve_prot_queue(ubi);
        return ensure_wear_leveling(ubi);
    }
    if (err == -EINTR || err == -ENOMEM || err == -EAGAIN || err == -EBUSY)
        return schedule_erase(ubi, e, 0);
    else if (err != -EIO)
        goto out_ro;
    /* It is %-EIO, the PEB went bad */
    if (!ubi->bad_allowed)
        goto out_ro;
    if (ubi->beb_rsvd_pebs == 0)
        goto out_ro;
    err = ubi_io_mark_bad(ubi, pnum);
    return err;
out_ro:
    ubi_ro_mode(ubi); /* switch to read-only mode */
    return err;
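The error handling in erase_worker() amounts to a small decision table, sketched here as a pure function in userspace C. The enum and the function `toy_classify_erase()` are hypothetical names introduced for illustration; only the branching mirrors the excerpt above.

```c
#include <assert.h>
#include <errno.h>

enum toy_erase_action {
    TOY_PUT_ON_FREE, /* erase succeeded: PEB goes back to the free tree */
    TOY_RETRY_ERASE, /* transient error: schedule the erase again */
    TOY_MARK_BAD,    /* -EIO and bad PEBs can be handled: mark it bad */
    TOY_READ_ONLY    /* anything else: switch to read-only mode */
};

/* err is 0 on success or a negative errno value, as in the excerpt. */
static enum toy_erase_action
toy_classify_erase(int err, int bad_allowed, int beb_rsvd_pebs)
{
    assert(err <= 0);
    if (err == 0)
        return TOY_PUT_ON_FREE;
    if (err == -EINTR || err == -ENOMEM || err == -EAGAIN || err == -EBUSY)
        return TOY_RETRY_ERASE;
    if (err != -EIO)
        return TOY_READ_ONLY;
    /* -EIO: the PEB went bad */
    if (!bad_allowed || beb_rsvd_pebs == 0)
        return TOY_READ_ONLY;
    return TOY_MARK_BAD;
}
```

For example, an -EIO result with no reserved PEBs left for bad-block handling ends in read-only mode rather than in marking the block bad.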
Responsible for
• Scanning the flash media
• Checking UBI headers
• Providing complete information about the UBI flash image
UBI on-flash data structures
• Erase Counter Header
• Volume Identifier Header
• Volume Table
Temporary data structures during scanning process
• Scan Info
• Scan Volume
• Scan Erase Block
• Four lists: free, erase, corr, alien
Unclean reboot
Every good PEB has a 64-byte Erase Counter Header
Every good mapped PEB has a 64-byte Volume Identifier Header
A “layout volume” contains two copies of the Volume Table
Figure: PEBs 0 to N, each mapped to an LEB identified by a (volume, LEB number) pair such as (0,0), (0,1), ..., (0,P), (1,0), (1,1), (2,0), (2,1), (2,2), ..., (2,Q).
drivers/mtd/ubi/ubi-media.h
struct ubi_ec_hdr {
    __be32 magic;          /* EC header magic number (%UBI_EC_HDR_MAGIC) */
    __u8   version;        /* version of UBI implementation */
    __u8   padding1[3];    /* reserved for future, zeroes */
    __be64 ec;             /* the erase counter */
    __be32 vid_hdr_offset; /* where the VID header starts */
    __be32 data_offset;    /* where the user data start */
    __be32 image_seq;      /* image sequence number */
    __u8   padding2[32];   /* reserved for future, zeroes */
    __be32 hdr_crc;        /* erase counter header CRC checksum */
} __attribute__ ((packed));
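The 64-byte size and the big-endian field encoding can be checked with a small userspace sketch. It is illustrative only: `struct toy_ec_hdr`, `put_be32()`, `put_be64()` and `toy_ec_hdr_init()` are hypothetical names, the magic value 0x55424923 (ASCII "UBI#") and version 1 are assumptions not stated on the slide, and the CRC is left at zero rather than computed. Multi-byte fields are declared as byte arrays so that the layout has no padding and the byte order is explicit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_EC_HDR_MAGIC 0x55424923u /* assumed value, ASCII "UBI#" */

/* Same field order and sizes as the struct above, as raw bytes. */
struct toy_ec_hdr {
    uint8_t magic[4];
    uint8_t version;
    uint8_t padding1[3];
    uint8_t ec[8];
    uint8_t vid_hdr_offset[4];
    uint8_t data_offset[4];
    uint8_t image_seq[4];
    uint8_t padding2[32];
    uint8_t hdr_crc[4];
};

/* Store a 32-bit value most significant byte first. */
static void put_be32(uint8_t *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(v >> (24 - 8 * i));
}

/* Store a 64-bit value most significant byte first. */
static void put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Fill in a header image; hdr_crc is left as zero in this sketch. */
static void toy_ec_hdr_init(struct toy_ec_hdr *h, uint64_t ec,
                            uint32_t vid_off, uint32_t data_off)
{
    assert(h);
    memset(h, 0, sizeof(*h));
    put_be32(h->magic, TOY_EC_HDR_MAGIC);
    h->version = 1; /* assumed UBI version */
    put_be64(h->ec, ec);
    put_be32(h->vid_hdr_offset, vid_off);
    put_be32(h->data_offset, data_off);
}
```

The field sizes add up to 4 + 1 + 3 + 8 + 4 + 4 + 4 + 32 + 4 = 64 bytes.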
drivers/mtd/ubi/ubi-media.h
struct ubi_vid_hdr {
    __be32 magic;        /* VID magic number (%UBI_VID_HDR_MAGIC) */
    __u8   version;      /* version of UBI implementation */
    __u8   vol_type;     /* volume type (%UBI_VID_DYNAMIC or %UBI_VID_STATIC) */
    __u8   copy_flag;    /* for wear-leveling reasons */
    __u8   compat;       /* compatibility of this volume */
    __be32 vol_id;       /* ID of this volume */
    __be32 lnum;         /* LEB number */
    __u8   padding1[4];  /* reserved for future, zeroes */
    __be32 data_size;    /* bytes of data this LEB contains */
    __be32 used_ebs;     /* total number of used LEBs in this volume */
    __be32 data_pad;     /* padded bytes at the end of this PEB */
    __be32 data_crc;     /* CRC of the data stored in this LEB */
    __u8   padding2[4];  /* reserved for future, zeroes */
    __be64 sqnum;        /* sequence number */
    __u8   padding3[12]; /* reserved for future, zeroes */
    __be32 hdr_crc;      /* VID header CRC checksum */
} __attribute__ ((packed));
drivers/mtd/ubi/ubi-media.h
struct ubi_vtbl_record {
    __be32 reserved_pebs;            /* physical eraseblocks reserved for this volume */
    __be32 alignment;                /* volume alignment */
    __be32 data_pad;                 /* padded bytes for the requested alignment */
    __u8   vol_type;                 /* %UBI_VID_DYNAMIC or %UBI_VID_STATIC */
    __u8   upd_marker;               /* if volume update was started but not finished */
    __be16 name_len;                 /* volume name length */
    __u8   name[UBI_VOL_NAME_MAX+1]; /* volume name */
    __u8   flags;                    /* volume flags (%UBI_VTBL_AUTORESIZE_FLG) */
    __u8   padding[23];              /* reserved for future, zeroes */
    __be32 crc;                      /* CRC32 checksum of the record */
} __attribute__ ((packed));
Figure: the Scan Info structure references the Scan Volumes (including the internal “layout volume”), each with its Scan Erase Blocks (SEBs) for the PEBs that belong to it, plus four lists of SEBs: corr, free, erase, and alien.
EC hdr is written to a PEB right after the PEB is erased
drivers/mtd/ubi/wl.c
static int sync_erase(struct ubi_device *ubi, struct ubi_wl_entry *e, int torture)
{
    unsigned long long ec = e->ec;
    [... Deleted ...]
    err = ubi_io_sync_erase(ubi, e->pnum, torture);
    if (err < 0)
        goto out_free;
    ec += err;
    if (ec > UBI_MAX_ERASECOUNTER) {
        /*
         * Erase counter overflow. Upgrade UBI and use 64-bit
         * erase counters internally.
         */
        ubi_err("erase counter overflow at PEB %d, EC %llu", e->pnum, ec);
        err = -EINVAL;
        goto out_free;
    }
    dbg_wl("erased PEB %d, new EC %llu", e->pnum, ec);
    ec_hdr->ec = cpu_to_be64(ec);
    err = ubi_io_write_ec_hdr(ubi, e->pnum, ec_hdr);
    [... Deleted ...]
}
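The counter update in sync_erase() can be isolated as a small userspace function. This is a sketch, not the kernel code: `toy_next_ec()` is a hypothetical name, and the limit 0x7FFFFFFF is an assumed value for UBI_MAX_ERASECOUNTER. As in the excerpt, the erase routine is taken to return the number of erase operations it actually performed, which is added to the old counter.

```c
#include <assert.h>

/* Assumed value of UBI_MAX_ERASECOUNTER. */
#define TOY_MAX_ERASECOUNTER 0x7FFFFFFFULL

/*
 * Add the number of performed erasures to the old counter.
 * Returns 0 and stores the result in *new_ec, or returns -1 on
 * overflow and leaves *new_ec untouched.
 */
static int toy_next_ec(unsigned long long ec, int erased,
                       unsigned long long *new_ec)
{
    assert(erased >= 0 && new_ec);
    ec += (unsigned long long)erased;
    if (ec > TOY_MAX_ERASECOUNTER)
        return -1;
    *new_ec = ec;
    return 0;
}
```

Only when this check passes would the new value be converted to big-endian and written into the EC header of the freshly erased PEB.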
Map a LEB L to PEB P
• Write VID header (with lnum L) to P
Unmap a LEB L from PEB P
• Schedule P for erasure
Remap a LEB L from PEB P0 to PEB P1
• Schedule P0 for erasure
• Write VID header (with lnum L) to P1
Copy a PEB P0 which is mapped to L to PEB P1
• Write VID header (with lnum L) to P1
• Copy contents of P0 to P1
• Schedule P0 for erasure
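If power is lost between these steps, for example after the VID header is written to P1 but before P0 is erased, a later scan can find two PEBs that claim the same LEB. A minimal sketch of resolving such a conflict is shown below. It assumes that the copy with the larger sqnum is the newer one; this is a simplification, and real UBI is understood to apply further checks (such as copy_flag and the data CRC) that are omitted here. `struct toy_vid` and `toy_pick_newer()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the VID header fields found during a scan. */
struct toy_vid {
    int      pnum;   /* PEB the header was read from */
    uint32_t vol_id; /* volume ID */
    uint32_t lnum;   /* LEB number */
    uint64_t sqnum;  /* sequence number */
};

/*
 * Return the record to keep when both claim the same (vol_id, lnum),
 * or NULL when the two records do not conflict.
 */
static const struct toy_vid *toy_pick_newer(const struct toy_vid *a,
                                            const struct toy_vid *b)
{
    assert(a && b);
    if (a->vol_id != b->vol_id || a->lnum != b->lnum)
        return NULL;
    return a->sqnum >= b->sqnum ? a : b;
}
```

The PEB that loses the comparison holds stale data and can be scheduled for erasure.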
Whenever the volume table needs an update
(The following applies to the “layout volume”)
1. Unmap LEB 0
2. Write the updated table to LEB 0
3. Unmap LEB 1
4. Write the updated table to LEB 1
drivers/mtd/ubi/vtbl.c
int ubi_change_vtbl_record(struct ubi_device *ubi, int idx,
                           struct ubi_vtbl_record *vtbl_rec)
{
    [... Deleted ...]
    layout_vol = ubi->volumes[vol_id2idx(ubi, UBI_LAYOUT_VOLUME_ID)];
    [... Deleted ...]
    memcpy(&ubi->vtbl[idx], vtbl_rec, sizeof(struct ubi_vtbl_record));
    for (i = 0; i < UBI_LAYOUT_VOLUME_EBS; i++) {
        err = ubi_eba_unmap_leb(ubi, layout_vol, i);
        if (err)
            return err;
        err = ubi_eba_write_leb(ubi, layout_vol, i, ubi->vtbl, 0,
                                ubi->vtbl_size, UBI_LONGTERM);
        if (err)
            return err;
    }
    return 0;
}
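Updating the two copies one after the other appears intended to ensure that an interruption at any point leaves at least one complete copy of the table. The toy model below checks that property for the four-step sequence. It is a userspace sketch with hypothetical names (`toy_copy`, `toy_vtbl_update()`, `toy_valid_copies()`), and it treats a write as atomic, which a real flash write is not.

```c
#include <assert.h>

enum toy_copy { TOY_OLD, TOY_UNMAPPED, TOY_NEW };

#define TOY_LAYOUT_EBS 2 /* two copies of the volume table */

/*
 * Apply the first `steps` steps of the update sequence
 * (unmap 0, write 0, unmap 1, write 1) to copies[].
 */
static void toy_vtbl_update(enum toy_copy copies[TOY_LAYOUT_EBS], int steps)
{
    assert(steps >= 0 && steps <= 2 * TOY_LAYOUT_EBS);
    for (int s = 0; s < steps; s++) {
        int leb = s / 2;
        copies[leb] = (s % 2 == 0) ? TOY_UNMAPPED : TOY_NEW;
    }
}

/* Count the copies that still hold a complete table (old or new). */
static int toy_valid_copies(const enum toy_copy copies[TOY_LAYOUT_EBS])
{
    int n = 0;
    for (int i = 0; i < TOY_LAYOUT_EBS; i++)
        if (copies[i] != TOY_UNMAPPED)
            n++;
    return n;
}
```

Stopping the sequence after any number of steps from 0 to 4 leaves `toy_valid_copies()` at 1 or 2 in this model, never 0.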
More information about MTD and UBI is available on the MTD website:
http://www.linux-mtd.infradead.org/