MyISAM与InnoDB

我正在处理一个涉及很多数据库写入的项目,我会说( 70%的插入和30%的读取 )。 这个比例还将包括我认为是一个阅读和一个写作的更新。 读取可能很脏(例如,在阅读时我不需要100%准确的信息)。
有关任务将每小时进行超过100万次的数据库事务。

我在网上读了一大堆关于MyISAM和InnoDB之间差异的东西,MyISAM似乎是我将用于这个任务的特定数据库/表的显而易见的select。 从我看来,InnoDB是好的,如果事务是需要的,因为行级locking是支持的。

有没有人有这种types的负载(或更高)的经验? MyISAM是否要走?

我用表格简单地讨论这个问题,所以你可以得出结论,哪一个必须selectInnoDB或者MyISAMhttp://developer99.blogspot.com/2011/07/mysql-innodb-vs-myisam.html

下面是在哪种情况下应该使用哪种types的小概述:

                                                  MyISAM InnoDB
 -------------------------------------------------- --------------
要求的全文search是5.6.4
 -------------------------------------------------- --------------
要求交易是的
 -------------------------------------------------- --------------
频繁select查询是      
 -------------------------------------------------- --------------
频繁插入,更新,删除是的
 -------------------------------------------------- --------------
行locking(在单个表上进行多处理)是
 -------------------------------------------------- --------------
关系基础devise是的

总结:

经常阅读,几乎没有写作=> MyISAM
在MySQL中进行全文search<= 5.5 => MyISAM

在所有其他情况下, InnoDB通常是最好的select。

我不是一个数据库专家,而且我也没有经验。 然而:

MyISAM表使用表级locking 。 根据您的stream量估计,您每秒接近200次写入。 使用MyISAM, 只有其中一个可以随时进行 。 您必须确保您的硬件能够跟上这些事务,以避免超限,即单个查询最多不超过5ms。

这意味着你需要一个支持行级locking的存储引擎,例如InnoDB。

另一方面,写一些简单的脚本来模拟每个存储引擎的负载,然后比较结果应该是相当简单的。

人们经常谈论性能,读写操作,外键等等,但是我认为存储引擎还有一个必须具备的特性: primefaces更新。

尝试这个:

  1. 发出一个UPDATE对你的MyISAM表,需要5秒钟。
  2. 当UPDATE正在进行时,比如2.5秒,按Ctrl-C来中断它。
  3. 观察桌子上的效果。 有多less行更新? 有多less没有更新? 表格是否可读,或者当您按Ctrl-C时是否损坏?
  4. 对InnoDB表尝试与UPDATE相同的实验,中断正在进行的查询。
  5. 观察InnoDB表。 行更新。 InnoDB保证你有primefaces更新,如果完全更新不能提交,它会回滚整个更改。 另外,表格也没有损坏。 即使您使用killall -9 mysqld来模拟崩溃,这也是有效的。

性能当然是可取的,但不会丢失数据应该胜过这一点。

我已经在使用MySQL的高容量系统上工作过,并且我已经尝试了MyISAM和InnoDB。

我发现MyISAM中的表级locking会给我们的工作负载带来严重的性能问题,这听起来和你的类似。 不幸的是,我也发现InnoDB下的性能也比我想象的要差。

最后,我通过对数据进行分段来解决争用问题,例如插入到“热”表中,并且从不查询热表。

这也允许删除(数据是时间敏感的,我们只保留X天价值)发生在“陈旧”的表,这再次没有被select查询触及。 InnoDB似乎在批量删除方面performance不佳,所以如果您计划清除数据,您可能希望以旧数据处于旧数据表的方式来构build数据,这样可以简单地删除旧数据,而不用在其上运行删除操作。

当然,我不知道你的应用程序是什么,但希望这可以让你对MyISAM和InnoDB的一些问题有所了解。

对于具有更多写入和读取的负载,您将受益于InnoDB。 因为InnoDB提供了行locking而不是表locking,所以你的SELECT可以是并发的,而不仅仅是彼此之间,而且还有很多的INSERT 。 但是,除非您打算使用SQL事务,否则将InnoDB提交刷新设置为2( innodb_flush_log_at_trx_commit )。 这样可以使您回收从MyISAM到InnoDB的表格时所丢失的许多原始性能。

另外,考虑添加复制。 这给你一些阅读缩放,因为你说你的阅读不必是最新的,你可以让复制落后一点。 只要确保它能在最繁忙的交通状况下迎头赶上,否则它总会落在后面,永远赶不上。 但是,如果你这样做,我强烈build议你将从站的读取和复制延迟pipe理隔离到数据库处理程序。 如果应用程序代码不知道这个,那就简单多了。

最后,注意不同的表格加载。 所有表上的读写比例不会相同。 一些接近100%的较小表格可以保留MyISAM。 同样,如果你有一些接近100%写入的表,你可以从INSERT DELAYED受益,但是这只在MyISAM中受到支持(InnoDB表忽略了DELAYED子句)。

但基准是肯定的。

有点晚了…但是这里有一个我写了几个月后相当全面的post ,详细介绍了MYISAM和InnoDB之间的主要区别。 抓一杯(也许是一个cookies),享受。


MyISAM和InnoDB的主要区别在于参照完整性和交易。 还有其他的区别,如locking,回滚和全文search。

参照完整性

引用完整性可确保表之间的关系保持一致。 更具体地说,这意味着当表(例如,列表)具有指向不同表(例如产品)的外键(例如产品ID)时,当指向表发生更新或删除时,这些改变被级联到链接表。 在我们的例子中,如果产品被重命名,链接表的外键也会更新; 如果产品从“产品”表中删除,则任何指向已删除条目的列表也将被删除。 此外,任何新的列表必须具有指向有效的现有条目的外键。

InnoDB是一个关系数据库pipe理系统(RDBMS),因此具有参照完整性,而MyISAM则没有。

交易和primefaces

使用数据操作语言(DML)语句(如SELECT,INSERT,UPDATE和DELETE)来pipe理表中的数据。 一个事务组将两个或更多的DML语句合并成一个单一的工作单元,因此整个单元都被应用,或者它们都不是。

MyISAM不支持事务,而InnoDB则支持事务。

如果在使用MyISAM表时操作中断,操作立即中止,即使操作没有完成,受影响的行(甚至每行内的数据)仍然受到影响。

如果一个操作在使用InnoDB表时被中断,因为它使用具有primefaces性的事务,任何未完成的事务都不会生效,因为没有提交。

表锁与行锁

当一个查询运行在MyISAM表上时,它所查询的整个表将被locking。 这意味着后续查询只能在当前查询完成后执行。 如果您正在阅读大型表格和/或频繁进行读取和写入操作,则这可能意味着查询积压。

当查询针对InnoDB表运行时,只有涉及的行被locking,表的其余部分仍然可用于CRUD操作。 这意味着查询可以在同一个表上同时运行,只要它们不使用同一行。

InnoDB中的这个特性被称为并发性。 与并发性一样,存在一个主要的缺点,即适用于选定范围的表,因为在内核线程之间切换会有开销,并且应该对内核线程设置限制以防止服务器停止。

交易和回滚

在MyISAM中运行一个操作时,设置的更改; 在InnoDB中,这些更改可以回滚。 用于控制事务的最常用命令是COMMIT,ROLLBACK和SAVEPOINT。 1. COMMIT – 你可以编写多个DML操作,但只有在提交COMMIT时才会保存更改2. ROLLBACK – 你可以放弃尚未提交的任何操作3. SAVEPOINT – 在列表中设置一个点ROLLBACK操作可以回滚到的操作

可靠性

MyISAM不提供数据完整性 – 硬件故障,不清洁的closures和取消的操作可能导致数据损坏。 这将需要完全修复或重build索引和表格。

另一方面,InnoDB使用事务日志,双写缓冲区和自动校验和validation来防止损坏。 在InnoDB进行任何更改之前,它将事务之前的数据logging到名为ibdata1的系统表空间文件中。 如果发生崩溃,InnoDB将通过重播这些日志来自动恢复。

FULLTEXT索引

InnoDB直到MySQL版本5.6.4才支持FULLTEXT索引。 在撰写本文时,许多共享主机提供商的MySQL版本仍然低于5.6.4,这意味着InnoDB表不支持FULLTEXT索引。

但是,这不是使用MyISAM的有效理由。 最好转换为支持最新版本MySQL的主机提供商。 不是说使用FULLTEXT索引的MyISAM表不能被转换成InnoDB表。

结论

总之,InnoDB应该是你select的默认存储引擎。 在满足特定需求时selectMyISAM或其他数据types。

为了在这里涵盖两种发动机之间机械差异的广泛select,我提出了一个经验的速度比较研究。

就纯速度而言,并不总是MyISAM比InnoDB更快,但根据我的经验,PURE READ工作环境的速度往往要快大约2.0-2.5倍。 很显然,这不适用于所有的环境 – 正如其他人所写的,MyISAM缺乏交易和外键等。

我在下面做了一些基准testing – 我使用python进行循环,并使用timeit库进行时序比较。 为了提高兴趣,我还包括了内存引擎,尽pipe它只适用于较小的表(不断遇到当你超出MySQL内存限制时The table 'tbl' is full ,但是这样可以提供最好的性能。 我select的四种types是:

  1. 香草SELECTs
  2. 计数
  3. 条件SELECTs
  4. 索引和非索引子select

首先,我使用下面的SQL创build了三个表

 CREATE TABLE data_interrogation.test_table_myisam ( index_col BIGINT NOT NULL AUTO_INCREMENT, value1 DOUBLE, value2 DOUBLE, value3 DOUBLE, value4 DOUBLE, PRIMARY KEY (index_col) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 

在第二个和第三个表中用“MyISAM”代替“InnoDB”和“内存”。

1)香草select

查询: SELECT * FROM tbl WHERE index_col = xx

结果: 绘制

通过不同的数据库引擎比较香草选择

这些速度大致相同,并且如所期望的那样,在要select的列的数量上是线性的。 InnoDB似乎比MyISAM 稍微快一些,但这实际上是微不足道的。

码:

 import timeit import MySQLdb import MySQLdb.cursors import random from random import randint db = MySQLdb.connect(host="...", user="...", passwd="...", db="...", cursorclass=MySQLdb.cursors.DictCursor) cur = db.cursor() lengthOfTable = 100000 # Fill up the tables with random data for x in xrange(lengthOfTable): rand1 = random.random() rand2 = random.random() rand3 = random.random() rand4 = random.random() insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" cur.execute(insertString) cur.execute(insertString2) cur.execute(insertString3) db.commit() # Define a function to pull a certain number of records from these tables def selectRandomRecords(testTable,numberOfRecords): for x in xrange(numberOfRecords): rand1 = randint(0,lengthOfTable) selectString = "SELECT * FROM " + testTable + " WHERE index_col = " + str(rand1) cur.execute(selectString) setupString = "from __main__ import selectRandomRecords" # Test time taken using timeit myisam_times = [] innodb_times = [] memory_times = [] for theLength in [3,10,30,100,300,1000,3000,10000]: innodb_times.append( timeit.timeit('selectRandomRecords("test_table_innodb",' + str(theLength) + ')', number=100, setup=setupString) ) myisam_times.append( timeit.timeit('selectRandomRecords("test_table_myisam",' + str(theLength) + ')', number=100, setup=setupString) ) memory_times.append( timeit.timeit('selectRandomRecords("test_table_memory",' + str(theLength) + ')', number=100, setup=setupString) ) 

2)计数

查询: SELECT count(*) FROM tbl

结果: MyISAM获胜

不同数据库引擎的计数比较

这表明了MyISAM和InnoDB之间的巨大差异 – MyISAM(和内存)跟踪表中的logging数,所以这个事务处理速度很快,并且O(1)。 在我调查的范围内,InnoDB计数所需的时间量随着表格大小的增加而线性增加。 我怀疑在实践中观察到的许多MyISAM查询的加速是由于类似的影响。

码:

 myisam_times = [] innodb_times = [] memory_times = [] # Define a function to count the records def countRecords(testTable): selectString = "SELECT count(*) FROM " + testTable cur.execute(selectString) setupString = "from __main__ import countRecords" # Truncate the tables and re-fill with a set amount of data for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]: truncateString = "TRUNCATE test_table_innodb" truncateString2 = "TRUNCATE test_table_myisam" truncateString3 = "TRUNCATE test_table_memory" cur.execute(truncateString) cur.execute(truncateString2) cur.execute(truncateString3) for x in xrange(theLength): rand1 = random.random() rand2 = random.random() rand3 = random.random() rand4 = random.random() insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" cur.execute(insertString) cur.execute(insertString2) cur.execute(insertString3) db.commit() # Count and time the query innodb_times.append( timeit.timeit('countRecords("test_table_innodb")', number=100, setup=setupString) ) myisam_times.append( timeit.timeit('countRecords("test_table_myisam")', number=100, setup=setupString) ) memory_times.append( timeit.timeit('countRecords("test_table_memory")', number=100, setup=setupString) ) 

3)有条件的select

查询: SELECT * FROM tbl WHERE value1<0.5 AND value2<0.5 AND value3<0.5 AND value4<0.5

结果: MyISAM获胜

比较不同数据库引擎的条件选择

在这里,MyISAM和内存的性能大致相同,而对于更大的表来说,InnoDB的性能却大约是50%。 这是MyISAM的好处似乎被最大化的那种查询。

码:

 myisam_times = [] innodb_times = [] memory_times = [] # Define a function to perform conditional selects def conditionalSelect(testTable): selectString = "SELECT * FROM " + testTable + " WHERE value1 < 0.5 AND value2 < 0.5 AND value3 < 0.5 AND value4 < 0.5" cur.execute(selectString) setupString = "from __main__ import conditionalSelect" # Truncate the tables and re-fill with a set amount of data for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]: truncateString = "TRUNCATE test_table_innodb" truncateString2 = "TRUNCATE test_table_myisam" truncateString3 = "TRUNCATE test_table_memory" cur.execute(truncateString) cur.execute(truncateString2) cur.execute(truncateString3) for x in xrange(theLength): rand1 = random.random() rand2 = random.random() rand3 = random.random() rand4 = random.random() insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" cur.execute(insertString) cur.execute(insertString2) cur.execute(insertString3) db.commit() # Count and time the query innodb_times.append( timeit.timeit('conditionalSelect("test_table_innodb")', number=100, setup=setupString) ) myisam_times.append( timeit.timeit('conditionalSelect("test_table_myisam")', number=100, setup=setupString) ) memory_times.append( timeit.timeit('conditionalSelect("test_table_memory")', number=100, setup=setupString) ) 

4)子select

结果: InnoDB获胜

对于这个查询,我为子select创build了一组额外的表。 每个BIGINT只有两列,一个是主键索引,另一个是没有索引的。 由于表的大小,我没有testing内存引擎。 SQL表创build命令是

 CREATE TABLE subselect_myisam ( index_col bigint NOT NULL, non_index_col bigint, PRIMARY KEY (index_col) ) ENGINE=MyISAM DEFAULT CHARSET=utf8; 

在第二个表中,“MyISAM”再次代替了“InnoDB”。

在这个查询中,我将select表的大小保留为1000000,而是改变了子select列的大小。

不同数据库引擎的子选择比较

InnoDB在这里很容易获胜。 当我们得到一个合理的大小表后,两个引擎都按照子select的大小线性缩放。 索引加速了MyISAM命令,但有趣的是对InnoDB速度影响不大。 subSelect.png

码:

 myisam_times = [] innodb_times = [] myisam_times_2 = [] innodb_times_2 = [] def subSelectRecordsIndexed(testTable,testSubSelect): selectString = "SELECT * FROM " + testTable + " WHERE index_col in ( SELECT index_col FROM " + testSubSelect + " )" cur.execute(selectString) setupString = "from __main__ import subSelectRecordsIndexed" def subSelectRecordsNotIndexed(testTable,testSubSelect): selectString = "SELECT * FROM " + testTable + " WHERE index_col in ( SELECT non_index_col FROM " + testSubSelect + " )" cur.execute(selectString) setupString2 = "from __main__ import subSelectRecordsNotIndexed" # Truncate the old tables, and re-fill with 1000000 records truncateString = "TRUNCATE test_table_innodb" truncateString2 = "TRUNCATE test_table_myisam" cur.execute(truncateString) cur.execute(truncateString2) lengthOfTable = 1000000 # Fill up the tables with random data for x in xrange(lengthOfTable): rand1 = random.random() rand2 = random.random() rand3 = random.random() rand4 = random.random() insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")" cur.execute(insertString) cur.execute(insertString2) for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]: truncateString = "TRUNCATE subselect_innodb" truncateString2 = "TRUNCATE subselect_myisam" cur.execute(truncateString) cur.execute(truncateString2) # For each length, empty the table and re-fill it with random data rand_sample = sorted(random.sample(xrange(lengthOfTable), theLength)) rand_sample_2 = random.sample(xrange(lengthOfTable), theLength) for (the_value_1,the_value_2) in zip(rand_sample,rand_sample_2): insertString = "INSERT INTO subselect_innodb (index_col,non_index_col) VALUES (" + str(the_value_1) + "," + str(the_value_2) + ")" insertString2 = "INSERT INTO subselect_myisam (index_col,non_index_col) VALUES (" + str(the_value_1) + "," + str(the_value_2) + ")" cur.execute(insertString) cur.execute(insertString2) db.commit() # Finally, time the queries innodb_times.append( timeit.timeit('subSelectRecordsIndexed("test_table_innodb","subselect_innodb")', number=100, setup=setupString) ) myisam_times.append( timeit.timeit('subSelectRecordsIndexed("test_table_myisam","subselect_myisam")', number=100, setup=setupString) ) innodb_times_2.append( timeit.timeit('subSelectRecordsNotIndexed("test_table_innodb","subselect_innodb")', number=100, setup=setupString2) ) myisam_times_2.append( timeit.timeit('subSelectRecordsNotIndexed("test_table_myisam","subselect_myisam")', number=100, setup=setupString2) ) 

我认为所有这一切的信息是,如果你真的关心速度,你需要对你正在做的查询进行基准testing,而不是假设哪个引擎更适合。

有点偏离主题,但为了文档和完整性,我想补充以下内容。

一般来说,使用InnoDB会导致一个非常复杂的应用程序,也可能更无bug。 因为您可以将所有引用完整性(外键约束)放入数据模型中,所以不需要使用MyISAM所需的任何应用程序代码。

每次插入,删除或replacelogging时,都必须检查并维护关系。 例如,如果你删除父母,所有的孩子也应该被删除。 例如,即使在一个简单的博客系统中,如果你删除了一个博客发布logging,你将不得不删除评论logging,喜欢等。在InnoDB中,这是由数据库引擎自动完成的(如果你在模型中指定了约束)并且不需要应用程序代码。 在MyISAM中,这将被编码到应用程序中,这在networking服务器中是非常困难的。 Web服务器本质上是并行/并行的,而且由于这些操作应该是primefaces性的,MyISAM不支持真正的事务,所以使用MyISAM作为Web服务器是有风险的/容易出错的。

在大多数一般情况下,InnoDB的performance会更好,原因有很多,一个是能够使用logging级别locking,而不是表级locking。 不仅在写入比读取更频繁的情况下,也在大型数据集上复杂连接的情况下。 我们注意到,通过在MyISAM表上使用InnoDB表进行非常大的连接(花费几分钟),性能提高了3倍。

我会说,一般来说,使用MySQL时,InnoDB(使用带有参照完整性的3NF数据模型)应该是默认的select。 MyISAM只能用于非常特殊的情况。 它很可能会执行较less,导致一个更大,更多的bug应用程序。

说了这些。 数据模型是网页devise师/编程人员很less发现的一门艺术。 没有冒犯,但它确实解释了MyISAM被使用了这么多。

InnoDB提供:

 ACID transactions row-level locking foreign key constraints automatic crash recovery table compression (read/write) spatial data types (no spatial indexes) 

在InnoDB中,除了TEXT和BLOB之外,所有行中的数据最多可以占用8000个字节。 InnoDB没有全文索引。 在InnoDB中,COUNT(*)s(当不使用WHERE,GROUP BY或JOIN时)的执行速度比在MyISAM中慢,因为行数不是在内部存储的。 InnoDB将数据和索引存储在一个文件中。 InnoDB使用缓冲池caching数据和索引。

MyISAM提供:

 fast COUNT(*)s (when WHERE, GROUP BY, or JOIN is not used) full text indexing smaller disk footprint very high table compression (read only) spatial data types and indexes (R-tree) 

MyISAM有表级locking,但没有行级locking。 没有交易。 没有自动崩溃恢复,但它确实提供修复表function。 没有外键约束。 与InnoDB表相比,MyISAM表通常在磁盘上更紧凑。 如果需要,MyISAM表格可以通过使用myisampack进行压缩而进一步大大缩小,但是只能是只读的。 MyISAM在一个文件中存储索引,在另一个文件中存储数据。 MyISAM使用密钥缓冲区来caching索引,并将数据cachingpipe理留给操作系统。

总的来说,我会推荐InnoDB的大多数用途和MyISAM的专门用途只。 InnoDB现在是新版MySQL中的默认引擎。

如果使用MyISAM,则每小时不会执行任何事务,除非您将每个DML语句都视为事务(在任何情况下,在发生崩溃时,这些事务不会持久或不可primefaces化)。

所以我觉得你必须使用InnoDB。

每秒300笔交易听起来不less。 如果您绝对需要这些事务在整个电源故障期间保持持久性,请确保您的I / O子系统可以轻松处理每秒多次的写入操作。 您至less需要一个带电池支持caching的RAID控制器。

如果你可以用一个小的耐久性命中,你可以使用innodb_flush_log_at_trx_commit设置为0或2的InnoDB(详见文档),你可以提高性能。

有许多补丁可以提高Google和其他人的并发性 – 如果没有他们,仍然无法获得足够的性能,这些补丁可能会引起人们的兴趣。

请注意 ,我的正式教育和经验是在甲骨文,而我在MySQL上的工作完全是个人的,在我自己的时间,所以如果我说的事情对于Oracle来说是真实的,但对于MySQL不是真的,我表示歉意。 虽然这两个系统共享很多,但关系理论/代数是一样的,而关系数据库仍然是关系数据库,还是有很大的区别!

我特别喜欢(以及行级locking)InnoDB是基于事务的,这意味着您可能会多次更新/插入/创build/更改/删除/等等操作您的Web应用程序的一个“操作”。 The problem that arises is that if only some of those changes/operations end up being committed, but others do not, you will most times (depending on the specific design of the database) end up with a database with conflicting data/structure.

Note: With Oracle, create/alter/drop statements are called "DDL" (Data Definition) statements, and implicitly trigger a commit. Insert/update/delete statements, called "DML" (Data Manipulation), are not committed automatically, but only when a DDL, commit, or exit/quit is performed (or if you set your session to "auto-commit", or if your client auto-commits). It's imperative to be aware of that when working with Oracle, but I am not sure how MySQL handles the two types of statements. Because of this, I want to make it clear that I'm not sure of this when it comes to MySQL; only with Oracle.

An example of when transaction-based engines excel:

Let's say that I or you are on a web-page to sign up to attend a free event, and one of the main purposes of the system is to only allow up to 100 people to sign up, since that is the limit of the seating for the event. Once 100 sign-ups are reached, the system would disable further signups, at least until others cancel.

In this case, there may be a table for guests (name, phone, email, etc.), and a second table which tracks the number of guests that have signed up. We thus have two operations for one "transaction". Now suppose that after the guest info is added to the GUESTS table, there is a connection loss, or an error with the same impact. The GUESTS table was updated (inserted into), but the connection was lost before the "available seats" could be updated.

Now we have a guest added to the guest table, but the number of available seats is now incorrect (for example, value is 85 when it's actually 84).

Of course there are many ways to handle this, such as tracking available seats with "100 minus number of rows in guests table," or some code that checks that the info is consistent, etc…. But with a transaction-based database engine such as InnoDB, either ALL of the operations are committed, or NONE of them are. This can be helpful in many cases, but like I said, it's not the ONLY way to be safe, no (a nice way, however, handled by the database, not the programmer/script-writer).

That's all "transaction-based" essentially means in this context, unless I'm missing something — that either the whole transaction succeeds as it should, or nothing is changed, since making only partial changes could make a minor to SEVERE mess of the database, perhaps even corrupting it…

But I'll say it one more time, it's not the only way to avoid making a mess. But it is one of the methods that the engine itself handles, leaving you to code/script with only needing to worry about "was the transaction successful or not, and what do I do if not (such as retry)," instead of manually writing code to check it "manually" from outside of the database, and doing a lot more work for such events.

Lastly, a note about table-locking vs row-locking:

DISCLAIMER: I may be wrong in all that follows in regard to MySQL, and the hypothetical/example situations are things to look into, but I may be wrong in what exactly is possible to cause corruption with MySQL. The examples are however very real in general programming, even if MySQL has more mechanisms to avoid such things…

Anyway, I am fairly confident in agreeing with those who have argued that how many connections are allowed at a time does not work around a locked table. In fact, multiple connections are the entire point of locking a table!! So that other processes/users/apps are not able to corrupt the database by making changes at the same time.

How would two or more connections working on the same row make a REALLY BAD DAY for you?? Suppose there are two processes both want/need to update the same value in the same row, let's say because the row is a record of a bus tour, and each of the two processes simultaneously want to update the "riders" or "available_seats" field as "the current value plus 1."

Let's do this hypothetically, step by step:

  1. Process one reads the current value, let's say it's empty, thus '0' so far.
  2. Process two reads the current value as well, which is still 0.
  3. Process one writes (current + 1) which is 1.
  4. Process two should be writing 2, but since it read the current value before process one write the new value, it too writes 1 to the table.

I'm not certain that two connections could intermingle like that, both reading before the first one writes… But if not, then I would still see a problem with:

  1. Process one reads the current value, which is 0.
  2. Process one writes (current + 1), which is 1.
  3. Process two reads the current value now. But while process one DID write (update), it has not committed the data, thus only that same process can read the new value that it updated, while all others see the older value, until there is a commit.

Also, at least with Oracle databases, there are isolation levels, which I will not waste our time trying to paraphrase. Here is a good article on that subject, and each isolation level having it's pros and cons, which would go along with how important transaction-based engines may be in a database…

Lastly, there may likely be different safeguards in place within MyISAM, instead of foreign-keys and transaction-based interaction. Well, for one, there is the fact that an entire table is locked, which makes it less likely that transactions/FKs are needed .

And alas, if you are aware of these concurrency issues, yes you can play it less safe and just write your applications, set up your systems so that such errors are not possible (your code is then responsible, rather than the database itself). However, in my opinion, I would say that it is always best to use as many safeguards as possible, programming defensively, and always being aware that human error is impossible to completely avoid. It happens to everyone, and anyone who says they are immune to it must be lying, or hasn't done more than write a "Hello World" application/script. 😉

I hope that SOME of that is helpful to some one, and even more-so, I hope that I have not just now been a culprit of assumptions and being a human in error!! My apologies if so, but the examples are good to think about, research the risk of, and so on, even if they are not potential in this specific context.

Feel free to correct me, edit this "answer," even vote it down. Just please try to improve, rather than correcting a bad assumption of mine with another. 😉

This is my first response, so please forgive the length due to all the disclaimers, etc… I just don't want to sound arrogant when I am not absolutely certain!

I think this is an excellent article on explaining the differences and when you should use one over the other: http://tag1consulting.com/MySQL_Engines_MyISAM_vs_InnoDB

Also check out some drop-in replacements for MySQL itself:

MariaDB

http://mariadb.org/

MariaDB is a database server that offers drop-in replacement functionality for MySQL. MariaDB is built by some of the original authors of MySQL, with assistance from the broader community of Free and open source software developers. In addition to the core functionality of MySQL, MariaDB offers a rich set of feature enhancements including alternate storage engines, server optimizations, and patches.

Percona Server

https://launchpad.net/percona-server

An enhanced drop-in replacement for MySQL, with better performance, improved diagnostics, and added features.

MYISAM:

  1. MYISAM supports Table-level Locking

  2. MyISAM designed for need of speed

  3. MyISAM does not support foreign keys hence we call MySQL with MYISAM is DBMS
  4. MyISAM stores its tables, data and indexes in diskspace using separate three different files. (tablename.FRM, tablename.MYD, tablename.MYI)
  5. MYISAM not supports transaction. You cannot commit and rollback with MYISAM. Once you issue a command it's done.

INNODB:

  1. InnoDB supports Row-level Locking
  2. InnoDB designed for maximum performance when processing high volume of data
  3. InnoDB support foreign keys hence we call MySQL with InnoDB is RDBMS
  4. InnoDB stores its tables and indexes in a tablespace
  5. InnoDB supports transaction. You can commit and rollback with InnoDB

MyISAM

The MyISAM engine is the default engine in most MySQL installations and is a derivative of the original ISAM engine type supported in the early versions of the MySQL system. The engine provides the best combination of performance and functionality, although it lacks transaction capabilities (use the InnoDB or BDB engines) and uses table-level locking .

FlashMAX and FlashMAX Connect: Leading the Flash Platform Transformation Download Now Unless you need transactions, there are few databases and applications that cannot effectively be stored using the MyISAM engine. However, very high-performance applications where there are large numbers of data inserts/updates compared to the number of reads can cause performance proboelsm for the MyISAM engine. It was originally designed with the idea that more than 90% of the database access to a MyISAM table would be reads, rather than writes.

With table-level locking, a database with a high number of row inserts or updates becomes a performance bottleneck as the table is locked while data is added. Luckily this limitation also works well within the restrictions of a non-transaction database.

MyISAM Summary

Name -MyISAM

Introduced -v3.23

Default install -Yes

Data limitations -None

Index limitations -64 indexes per table (32 pre 4.1.2); Max 16 columns per index

Transaction support -No

Locking level -Table


InnoDB的

The InnoDB Engine is provided by Innobase Oy and supports all of the database functionality (and more) of MyISAM engine and also adds full transaction capabilities (with full ACID (Atomicity, Consistency, Isolation, and Durability) compliance) and row level locking of data.

The key to the InnoDB system is a database, caching and indexing structure where both indexes and data are cached in memory as well as being stored on disk. This enables very fast recovery, and works even on very large data sets. By supporting row level locking, you can add data to an InnoDB table without the engine locking the table with each insert and this speeds up both the recovery and storage of information in the database.

As with MyISAM , there are few data types that cannot effectively be stored in an InnoDB database. In fact, there are no significant reasons why you shouldn't always use an InnoDB database. The management overhead for InnoDB is slightly more onerous, and getting the optimization right for the sizes of in-memory and on disk caches and database files can be complex at first. However, it also means that you get more flexibility over these values and once set, the performance benefits can easily outweigh the initial time spent. Alternatively, you can let MySQL manage this automatically for you.

If you are willing (and able) to configure the InnoDB settings for your server, then I would recommend that you spend the time to optimize your server configuration and then use the InnoDB engine as the default.

InnoDB Summary

Name -InnoDB

Introduced -v3.23 (source only), v4.0 (source and binary)

Default install -No

Data limitations -None

Index limitations -None

Transaction support -Yes (ACID compliant)

Locking level -Row

In my experience, MyISAM was a better choice as long as you don't do DELETEs, UPDATEs, a whole lot of single INSERT, transactions, and full-text indexing. BTW, CHECK TABLE is horrible. As the table gets older in terms of the number of rows, you don't know when it will end.

I've figure out that even though Myisam has locking contention, it's still faster than InnoDb in most scenarios because of the rapid lock acquisition scheme it uses. I've tried several times Innodb and always fall back to MyIsam for one reason or the other. Also InnoDB can be very CPU intensive in huge write loads.

Every application has it's own performance profile for using a database, and chances are it will change over time.

The best thing you can do is to test your options. Switching between MyISAM and InnoDB is trivial, so load some test data and fire jmeter against your site and see what happens.

I tried to run insertion of random data into MyISAM and InnoDB tables. The result was quite shocking. MyISAM needed a few seconds less for inserting 1 million rows than InnoDB for just 10 thousand!

The Question and most of the Answers are out of date .

Yes, it is an old wives' tale that MyISAM is faster than InnoDB. notice the Question's date: 2008; it is now almost a decade later. InnoDB has made significant performance strides since then.

The dramatic graph was for the one case where MyISAM wins: COUNT(*) without a WHERE clause. But is that really what you spend your time doing?

If you run concurrency test, InnoDB is very likely to win, even against MEMORY .

If you do any writes while benchmarking SELECTs , MyISAM and MEMORY are likely to lose because of table-level locking.

In fact, Oracle is so sure that InnoDB is better that they removed MyISAM from 8.0!

The Question was written early in the days of 5.1. Since then, these major versions were marked "General Availability":

  • 2010: 5.5 (.8 in Dec.)
  • 2013: 5.6 (.10 in Feb.)
  • 2015: 5.7 (.9 in Oct.)
  • [TBD, maybe 2018], 8.0

Bottom line: Don't use MyISAM

myisam is a NOGO for that type of workload (high concurrency writes), i dont have that much experience with innodb (tested it 3 times and found in each case that the performance sucked, but it's been a while since the last test) if you're not forced to run mysql, consider giving postgres a try as it handles concurrent writes MUCH better

I know this won't be popular but here goes:

myISAM lacks support for database essentials like transactions and referential integrity which often results in glitchy / buggy applications. You cannot not learn proper database design fundamentals if they are not even supported by your db engine.

Not using referential integrity or transactions in the database world is like not using object oriented programming in the software world.

InnoDB exists now, use that instead! Even MySQL developers have finally conceded to change this to the default engine in newer versions, despite myISAM being the original engine that was the default in all legacy systems.

No it does not matter if you are reading or writing or what performance considerations you have, using myISAM can result in a variety of problems, such as this one I just ran into: I was performing a database sync and at the same time someone else accessed an application that accessed a table set to myISAM. Due to the lack of transaction support and the generally poor reliability of this engine, this crashed the entire database and I had to manually restart mysql!

Over the past 15 years of development I have used many databases and engines. myISAM crashed on me about a dozen times during this period, other databases, only once! And that was a microsoft SQL database where some developer wrote faulty CLR code (common language runtime – basically C# code that executes inside the database) by the way, it was not the database engine's fault exactly.

I agree with the other answers here that say that quality high-availability, high-performance applications should not use myISAM as it will not work, it is not robust or stable enough to result in a frustration-free experience. See Bill Karwin's answer for more details.

PS Gotta love it when myISAM fanboys downvote but can't tell you which part of this answer is incorrect.

For that ratio of read/writes I would guess InnoDB will perform better. Since you are fine with dirty reads, you might (if you afford) replicate to a slave and let all your reads go to the slave. Also, consider inserting in bulk, rather than one record at a time.

Almost every time I start a new project I Google this same question to see if I come up with any new answers.

It eventually boils down to – I take the latest version of MySQL and run tests.

I have tables where I want to do key/value lookups… and that's all. I need to get the value (0-512 bytes) for a hash key. There is not a lot of transactions on this DB. The table gets updates occasionally (in it's entirety), but 0 transactions.

So we're not talking about a complex system here, we are talking about a simple lookup,.. and how (other than making the table RAM resident) we can optimize performance.

I also do tests on other databases (ie NoSQL) to see if there is anywhere I can get an advantage. The biggest advantage I have found is in key mapping but as far as the lookup goes, MyISAM is currently topping them all.

Albeit, I wouldn't perform financial transactions with MyISAM tables but for simple lookups, you should test it out.. typically 2x to 5x the queries/sec.

Test it, I welcome debate.

If it is 70% inserts and 30% reads then it is more like on the InnoDB side.

In short, InnoDB is good if you are working on something that needs a reliable database that can handles a lot of INSERT and UPDATE instructions.

and, MyISAM is good if you needs a database that will mostly be taking a lot of read (SELECT) instructions rather than write (INSERT and UPDATES), considering its drawback on the table-lock thing.

you may want to check out;
Pros and Cons of InnoDB
Pros and Cons of MyISAM

bottomline: if you are working offline with selects on large chunks of data, MyISAM will probably give you better (much better) speeds.

there are some situations when MyISAM is infinitely more efficient than InnoDB: when manipulating large data dumps offline (because of table lock).

example: I was converting a csv file (15M records) from NOAA which uses VARCHAR fields as keys. InnoDB was taking forever, even with large chunks of memory available.

this an example of the csv (first and third fields are keys).

 USC00178998,20130101,TMAX,-22,,,7,0700 USC00178998,20130101,TMIN,-117,,,7,0700 USC00178998,20130101,TOBS,-28,,,7,0700 USC00178998,20130101,PRCP,0,T,,7,0700 USC00178998,20130101,SNOW,0,T,,7, 

since what i need to do is run a batch offline update of observed weather phenomena, i use MyISAM table for receiving data and run JOINS on the keys so that i can clean the incoming file and replace VARCHAR fields with INT keys (which are related to external tables where the original VARCHAR values are stored).