InnoDB Double Write (Translation)


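[Original] One of the very interesting techniques InnoDB uses is a technique called “doublewrite”. It means InnoDB will write data twice when it performs tablespace writes; writes to log files are done only once.
[Translation] One of the more interesting techniques in the InnoDB engine is “doublewrite”: data is written twice whenever a tablespace write is performed, while writes to the log files happen only once.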


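[Original] So why is doublewrite needed? It is needed to achieve data safety in case of partial page writes. InnoDB does not log full pages to the log files, but uses what is called “physiological” logging, which means log records contain the page number for the operation as well as the operation data (i.e. update the row) and log sequence information. Such a logging structure is great as it requires less data to be written to the log; however, it requires pages to be internally consistent. It does not matter which page version it is: it could be the “current” version, in which case InnoDB will skip the page update operation, or the “former” one, in which case InnoDB will perform the update. If the page is inconsistent, recovery can’t proceed.

[Translation] Why is doublewrite necessary?
It keeps data safe when a page is only partially written. InnoDB does not record full pages in its log files. Instead it uses so-called “physiological” logging: each log record holds the page number the operation applies to, the operation data (for example, a row update), and log sequence information. This log structure is attractive because very little data has to be written to the log, but it requires every page to be internally consistent. Which version of the page is on disk does not matter: if it is the “current” version, InnoDB skips the update during recovery; if it is the “former” version, InnoDB applies the update. If the page is inconsistent, recovery cannot proceed.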
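
To make the “current” versus “former” distinction concrete, here is a minimal Python sketch of how a physiological log record could be replayed against a page. It is only an illustration under stated assumptions: `Page`, `LogRecord`, `apply_redo`, and the `consistent` flag are hypothetical names invented here, and none of this reflects InnoDB's actual on-disk format or recovery code.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    page_no: int
    lsn: int = 0                 # LSN of the last change already applied to this page
    rows: dict = field(default_factory=dict)
    consistent: bool = True      # False models a torn (partially written) page


@dataclass
class LogRecord:
    lsn: int       # log sequence information
    page_no: int   # page the operation applies to
    row_id: int    # operation data: which row to update...
    value: str     # ...and its new value


def apply_redo(pages, log):
    """Replay records in LSN order; return the LSNs that were actually applied."""
    applied = []
    for rec in sorted(log, key=lambda r: r.lsn):
        page = pages[rec.page_no]
        if not page.consistent:
            # A record describes a change *within* a page, so a damaged page
            # gives it nothing valid to be applied to.
            raise RuntimeError(f"page {rec.page_no} is inconsistent; cannot apply redo")
        if page.lsn >= rec.lsn:
            continue  # "current" version: the change is already on the page, skip it
        page.rows[rec.row_id] = rec.value  # "former" version: perform the update
        page.lsn = rec.lsn
        applied.append(rec.lsn)
    return applied


# Page 1 is a "former" version (LSN 10), page 2 is already "current" (LSN 20).
pages = {1: Page(1, lsn=10, rows={7: "old"}), 2: Page(2, lsn=20, rows={3: "new"})}
log = [LogRecord(20, 2, 3, "new"), LogRecord(15, 1, 7, "new")]
print(apply_redo(pages, log))  # prints [15]
```

Only the record for page 1 is applied; the record for page 2 is skipped because that page already carries the change. A page marked inconsistent stops the replay, which is the situation doublewrite is meant to prevent.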


[Original] Now let’s talk a bit about partial page writes: what they are and why they happen. A partial page write is when a page write request submitted to the OS completes only partially. For example, out of a 16K InnoDB page only the first 4KB are updated and the other parts remain in their former state. Most typically, partial page writes happen when a power failure happens. They can also happen on an OS crash: there is a chance the operating system will split your 16K write into several writes and the failure happens just between their execution. Reasons for splitting could be file fragmentation: most file systems use 4K block sizes by default, so a 16K page could use more than one fragment. Also, if software RAID is used, a page may fall on a stripe border, requiring multiple IO requests. The same happens with hardware RAID on power failure if it does not have a battery-backed cache. If there is a single write issued to the disk itself, it should in theory be completed even if power goes down, as there should be enough power accumulated inside the drive to complete it. I honestly do not know if this is always the case; it is hard to check, as it is not the only reason for partial page writes. I just know they tend to happen, and before InnoDB doublewrite was implemented I had a couple of data corruptions due to it.
[Translation] A few words on partial page writes
What are they, and why do they happen? A partial page write occurs when a page write request submitted to the operating system completes only partially. For example, of a 16K InnoDB page only the first 4K is updated while the rest keeps its former state. Partial page writes most typically happen on power failure. They can also happen when the operating system crashes: the OS may split the 16K write into several writes, and the failure falls between them. One reason for splitting is file fragmentation: most file systems use a 4K block size by default, so a 16K page can span more than one fragment. With software RAID, a page may sit on a stripe boundary and need multiple IO requests. The same happens with hardware RAID on power failure when its cache has no battery backup (BBU).
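
The 16K-page, 4K-block example can be simulated in a few lines of Python. This is a rough sketch rather than InnoDB's real page format: the 4-byte CRC32 header and the helper names (`make_page`, `is_consistent`, `torn_write`, `recover`) are assumptions made up for illustration. The `recover` function models the idea behind doublewrite in simplified form: an intact second copy of the page is available to replace a torn one.

```python
import zlib

PAGE_SIZE = 16 * 1024   # InnoDB page size used in the text
BLOCK_SIZE = 4 * 1024   # typical file system block size used in the text


def make_page(payload: bytes) -> bytes:
    """Build a 16K page: a 4-byte CRC32 of the body, followed by the body."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def is_consistent(page: bytes) -> bool:
    """A page is internally consistent if its stored checksum matches its body."""
    return zlib.crc32(page[4:]).to_bytes(4, "big") == page[:4]


def torn_write(old: bytes, new: bytes, blocks_written: int) -> bytes:
    """Model a crash after only the first `blocks_written` 4K blocks reached disk."""
    cut = blocks_written * BLOCK_SIZE
    return new[:cut] + old[cut:]


def recover(data_page: bytes, doublewrite_copy: bytes) -> bytes:
    """If the data page is torn, fall back to the intact doublewrite copy."""
    if is_consistent(data_page):
        return data_page
    if is_consistent(doublewrite_copy):
        return doublewrite_copy
    raise RuntimeError("both copies are damaged")


old = make_page(b"old row " * 1500)
new = make_page(b"new row " * 1500)

# Power fails after the first 4K of the 16K write: new head, old tail.
torn = torn_write(old, new, blocks_written=1)
print(is_consistent(torn))          # prints False
print(recover(torn, new) == new)    # prints True
```

A write that never started (`blocks_written=0`) or that finished (`blocks_written=4`) leaves a consistent page, either the “former” or the “current” version. Only the in-between states are the problem.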


To be continued.
