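<p>I have asked two related questions (<a href="https://stackoverflow.com/questions/10412604/how-can-i-speed-up-fetching-the-results-after-running-an-sqlite-query">How can I speed up fetching the results after running an sqlite query?</a> and <a href="https://stackoverflow.com/questions/10336492/is-it-normal-that-sqlite-fetchall-is-so-slow">Is it normal that sqlite.fetchall() is so slow?</a>). I have changed a few things and gained some speed, but the select statement still takes more than an hour to finish.</p>
<p>I have a table <code>feature</code> that contains an <code>rtMin</code>, <code>rtMax</code>, <code>mzMin</code> and <code>mzMax</code> value. Together these values are the corners of a rectangle (if you read my earlier questions: I now store these values separately instead of getting the min() and max() from the <code>convexhull</code> table, which is faster).<br/>I also have a table <code>spectrum</code> with an <code>rt</code> and an <code>mz</code> value (the <code>scan_start_time</code> and <code>base_peak_mz</code> columns in the query below). A third table links features to spectra when the spectrum's <code>rt</code> and <code>mz</code> values fall inside the feature's rectangle.</p>
<p>To do this I use the following SQL and Python code to retrieve the ids of the spectra and features:</p>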
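<p>To make the matching rule concrete, here is a minimal Python sketch of the rectangle test. The helper name <code>in_feature_rectangle</code> and the sample numbers are made up for illustration; they are not part of my actual code or data.</p>
<pre><code>def in_feature_rectangle(rt, mz, rt_min, rt_max, mz_min, mz_max):
    """Return True when the point (rt, mz) lies inside the rectangle, edges included."""
    return rt >= rt_min and rt_max >= rt and mz >= mz_min and mz_max >= mz


# Hypothetical feature rectangle (not real data).
feature = {"rtMin": 100.0, "rtMax": 120.0, "mzMin": 400.0, "mzMax": 401.5}

# One hypothetical spectrum inside the rectangle, one outside (rt too large).
inside = in_feature_rectangle(110.0, 400.7, feature["rtMin"], feature["rtMax"],
                              feature["mzMin"], feature["mzMax"])
outside = in_feature_rectangle(130.0, 400.7, feature["rtMin"], feature["rtMax"],
                               feature["mzMin"], feature["mzMax"])
print(inside, outside)
</code></pre>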
<pre><code>self.cursor.execute("SELECT spectrum_id, feature_table_id "+
"FROM `spectrum` "+
"INNER JOIN `feature` "+
"ON feature.msrun_msrun_id = spectrum.msrun_msrun_id "+
"WHERE spectrum.scan_start_time >= feature.rtMin "+
"AND spectrum.scan_start_time <= feature.rtMax "+
"AND spectrum.base_peak_mz >= feature.mzMin "+
"AND spectrum.base_peak_mz <= feature.mzMax")
spectrumAndFeature_ids = self.cursor.fetchall()
for spectrumAndFeature_id in spectrumAndFeature_ids:
    spectrum_has_feature_inputValues = (spectrumAndFeature_id[0], spectrumAndFeature_id[1])
    self.cursor.execute("INSERT INTO `spectrum_has_feature` VALUES (?,?)", spectrum_has_feature_inputValues)
</code></pre>
<p>I timed the execute, fetch and insert steps and got the following results:</p>
<pre><code>query took: 74.7989799976 seconds
5888.845541 seconds since fetchall
returned a length of: 10822
inserting all values took: 3.29669690132 seconds
</code></pre>
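<p>For reference, the timings were taken roughly like the sketch below. It is a self-contained approximation, not my real code: it uses an in-memory database, cut-down versions of the tables, and made-up rows, and it writes the range checks with <code>BETWEEN</code>, which is equivalent to the paired comparisons in my query.</p>
<pre><code>import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Cut-down, hypothetical versions of the real tables.
cur.execute("CREATE TABLE feature (feature_table_id INT PRIMARY KEY, "
            "mzMin DOUBLE, mzMax DOUBLE, rtMin DOUBLE, rtMax DOUBLE, msrun_msrun_id INT)")
cur.execute("CREATE TABLE spectrum (spectrum_id INT PRIMARY KEY, "
            "base_peak_mz DOUBLE, scan_start_time DOUBLE, msrun_msrun_id INT)")
cur.execute("CREATE TABLE spectrum_has_feature "
            "(spectrum_spectrum_id INT, feature_feature_table_id INT)")

# Made-up rows: spectrum 1 falls inside the feature rectangle, spectrum 2 does not.
cur.execute("INSERT INTO feature VALUES (1, 400.0, 401.5, 100.0, 120.0, 1)")
cur.executemany("INSERT INTO spectrum VALUES (?,?,?,?)",
                [(1, 400.7, 110.0, 1), (2, 400.7, 130.0, 1)])

query = ("SELECT spectrum_id, feature_table_id "
         "FROM spectrum INNER JOIN feature "
         "ON feature.msrun_msrun_id = spectrum.msrun_msrun_id "
         "WHERE spectrum.scan_start_time BETWEEN feature.rtMin AND feature.rtMax "
         "AND spectrum.base_peak_mz BETWEEN feature.mzMin AND feature.mzMax")

# Time each step separately: execute, fetchall, insert.
start = time.time()
cur.execute(query)
print("query took: %s seconds" % (time.time() - start))

start = time.time()
rows = cur.fetchall()
print("%s seconds since fetchall" % (time.time() - start))
print("returned a length of: %s" % len(rows))

start = time.time()
for row in rows:
    cur.execute("INSERT INTO spectrum_has_feature VALUES (?,?)", row)
print("inserting all values took: %s seconds" % (time.time() - start))
</code></pre>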
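<p>So this query takes roughly an hour and a half, and most of that time is spent in fetchall(). How can I speed this up? Should I do the <code>rt</code> and <code>mz</code> comparison in Python code instead?</p>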
<hr/>
<h2>Update:</h2>
<p>To show which indexes I have, here are the create statements for the tables:</p>
<pre><code>CREATE TABLE IF NOT EXISTS `feature` (
`feature_table_id` INT PRIMARY KEY NOT NULL ,
`feature_id` VARCHAR(40) NOT NULL ,
`intensity` DOUBLE NOT NULL ,
`overallquality` DOUBLE NOT NULL ,
`charge` INT NOT NULL ,
`content` VARCHAR(45) NOT NULL ,
`intensity_cutoff` DOUBLE NOT NULL,
`mzMin` DOUBLE NULL ,
`mzMax` DOUBLE NULL ,
`rtMin` DOUBLE NULL ,
`rtMax` DOUBLE NULL ,
`msrun_msrun_id` INT NOT NULL ,
CONSTRAINT `fk_feature_msrun1`
FOREIGN KEY (`msrun_msrun_id` )
REFERENCES `msrun` (`msrun_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION);
CREATE UNIQUE INDEX `id_UNIQUE` ON `feature` (`feature_table_id` ASC);
CREATE INDEX `fk_feature_msrun1` ON `feature` (`msrun_msrun_id` ASC);
CREATE TABLE IF NOT EXISTS `spectrum` (
`spectrum_id` INT PRIMARY KEY NOT NULL ,
`spectrum_index` INT NOT NULL ,
`ms_level` INT NOT NULL ,
`base_peak_mz` DOUBLE NOT NULL ,
`base_peak_intensity` DOUBLE NOT NULL ,
`total_ion_current` DOUBLE NOT NULL ,
`lowest_observes_mz` DOUBLE NOT NULL ,
`highest_observed_mz` DOUBLE NOT NULL ,
`scan_start_time` DOUBLE NOT NULL ,
`ion_injection_time` DOUBLE,
`binary_data_mz` BLOB NOT NULL,
`binaray_data_rt` BLOB NOT NULL,
`msrun_msrun_id` INT NOT NULL ,
CONSTRAINT `fk_spectrum_msrun1`
FOREIGN KEY (`msrun_msrun_id` )
REFERENCES `msrun` (`msrun_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION);
CREATE INDEX `fk_spectrum_msrun1` ON `spectrum` (`msrun_msrun_id` ASC);
CREATE TABLE IF NOT EXISTS `spectrum_has_feature` (
`spectrum_spectrum_id` INT NOT NULL ,
`feature_feature_table_id` INT NOT NULL ,
CONSTRAINT `fk_spectrum_has_feature_spectrum1`
FOREIGN KEY (`spectrum_spectrum_id` )
REFERENCES `spectrum` (`spectrum_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_spectrum_has_feature_feature1`
FOREIGN KEY (`feature_feature_table_id` )
REFERENCES `feature` (`feature_table_id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION);
CREATE INDEX `fk_spectrum_has_feature_feature1` ON `spectrum_has_feature` (`feature_feature_table_id` ASC);
CREATE INDEX `fk_spectrum_has_feature_spectrum1` ON `spectrum_has_feature` (`spectrum_spectrum_id` ASC);
</code></pre>
<hr/>
<h2>Update 2:</h2>
<p>I have 20938 spectra, 305742 features and 2 msruns. The query returns 10822 matches.</p>
<hr/>
<h2>Update 3:</h2>
<p>Using a new index (<code>CREATE INDEX `fk_spectrum_msrun1_2` ON `spectrum` (`msrun_msrun_id`, `base_peak_mz`);</code>) saves about 20 seconds between the two runs:<br/>
query took: 76.4599349499 seconds<br/>
5864.15418601 seconds since fetchall</p>
<hr/>
<h2>Update 4:</h2>
<p>Output printed from EXPLAIN QUERY PLAN:</p>
<pre><code>(0, 0, 0, u'SCAN TABLE spectrum (~1000000 rows)'), (0, 1, 1, u'SEARCH TABLE feature USING INDEX fk_feature_msrun1 (msrun_msrun_id=?) (~2 rows)')
</code></pre>
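<p>In case it helps to reproduce this, the plan can be printed from Python along the lines of the sketch below. It uses an in-memory database with cut-down, hypothetical versions of the tables, so the row estimates will differ from mine, and the exact layout of each plan row depends on the SQLite version.</p>
<pre><code>import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Cut-down, hypothetical versions of the real tables, with the same index on feature.
cur.execute("CREATE TABLE feature (feature_table_id INT PRIMARY KEY, "
            "mzMin DOUBLE, mzMax DOUBLE, rtMin DOUBLE, rtMax DOUBLE, msrun_msrun_id INT)")
cur.execute("CREATE INDEX fk_feature_msrun1 ON feature (msrun_msrun_id ASC)")
cur.execute("CREATE TABLE spectrum (spectrum_id INT PRIMARY KEY, "
            "base_peak_mz DOUBLE, scan_start_time DOUBLE, msrun_msrun_id INT)")

query = ("SELECT spectrum_id, feature_table_id "
         "FROM spectrum INNER JOIN feature "
         "ON feature.msrun_msrun_id = spectrum.msrun_msrun_id "
         "WHERE spectrum.scan_start_time BETWEEN feature.rtMin AND feature.rtMax "
         "AND spectrum.base_peak_mz BETWEEN feature.mzMin AND feature.mzMax")

# Prefixing the statement with EXPLAIN QUERY PLAN returns the plan instead of the results.
plan = cur.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row)
</code></pre>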