I'm converting a PHP app from using Visual FoxPro as the database
backend to using MySQL as the backend. I'm testing on MySQL 4.1.22 on
Mac OSX 10.4. The end application will be deployed cross platform and
to both 4.x and 5.x MySQL servers.
This query returned 21 records in .27 seconds.
SELECT zip FROM zipcodes WHERE
degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
< 5
This query returned 21442 records in 1.08 seconds.
SELECT custzip FROM customers
This query is still running half an hour later, with a Time of 2167
and a State of "Sending Data" (according to the mysql process list)
SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
< 5)
When I try to EXPLAIN the query it gives me the following...
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,PRIMARY,customers,index,NULL,cw-custzip,30,NULL,21226,Using where; Using index
2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where
If it matters both tables are INNODB and both customers.custzip and
zipcodes.zip are indexed. We used a program called DBConvert from
DMSoft to convert the data so it's "exactly" the same on both the VFP
side and the MySQL side. With all that in mind... VFP returns the
exact same query in 5-10 seconds and that includes render time in the
web browser.
By comparison... the query WHERE id IN (SELECT id FROM phone WHERE
phonedate >= '2001-01-01' AND phonedate <= '2009-06-18') returns
almost instantly.
I'm at a complete loss... The suggestions I've seen online for
optimizing dependent subqueries basically revolve around changing the
sub-query to a join, but that would require more re-architecting than
I want to do (unless I'm forced), especially since more than a few of
those solutions suggested precalculating the distance between zipcodes,
which only works if the distances are known in advance (only allowing
10, 50 and 100 mile radii, for example).
Any ideas?
Thanks in advance!
Matt
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Johnny Withers on
2009-06-19T02:31:29+00:00
I often find doing the IN (subquery...) is really slow versus doing a join:
SELECT custzip
FROM customers
INNER JOIN zipcodes ON customers.custzip=zipcodes.zip
WHERE
degrees(acos(sin(radians(39.0788994))
*
sin(radians(latitude))
+
cos(radians(39.0788994))
*
cos(radians(latitude))
*
cos(radians(-77.1227036-longitude))
)
)
*60
*1.1515
< 5
That query may have un-matched ()'s, not sure. hard to tell =)
Try a join.
-jw
On Thu, Jun 18, 2009 at 8:06 PM, Matt Neimeyer <matt@neimeyer.org> wrote:
> [snip]
How to Optimize distinct with index by 周彦伟 on
2009-06-19T02:55:37+00:00
Hi,
I have an SQL query:
Select distinct user_id from user where key1=value and
key2=value2 and key3=value2;
I added an index on (key1,key2,key3,user_id), but the query still uses a
temporary table.
I have thousands of queries per second.
How can I optimize it?
Another question:
Select * from user where user_id in(id1,id2,id3,id4,.....) order by user_id;
I added an index on user_id, but after the IN, the ORDER BY uses a temporary
table. How can I optimize it?
Thanks!
zhouyanwei
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Dan Nelson on
2009-06-19T07:16:37+00:00
In the last episode (Jun 18), Matt Neimeyer said:
> I'm converting a PHP app from using Visual FoxPro as the database backend
> to using MySQL as the backend. I'm testing on MySQL 4.1.22 on Mac OSX
> 10.4. The end application will be deployed cross platform and to both 4.x
> and 5.x MySQL servers.
>
> This query returned 21 records in .27 seconds:
>
> SELECT zip FROM zipcodes WHERE
> degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
> cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
> < 5
Ouch. You might want to calculate the rectangle enclosing your target
distance, add an index on lat (and/or long), and add the rectangle check to
your where clause: WHERE latitude BETWEEN lt1 AND lt2 AND longitude BETWEEN
ln1 AND ln2. That way mysql can use the index to pare down most of the rows
without having to call all those trig functions for every zipcode.
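A minimal Python sketch of the rectangle-then-exact-distance idea described above. The function names, the sample rows, and the in-memory filtering are illustrative assumptions (in practice the BETWEEN test would sit in the WHERE clause so the index does the work); the 60*1.1515 miles-per-degree constant is taken from the original query. The longitude span is a flat approximation that should be adequate for small radii away from the poles and the 180-degree meridian.

```python
import math

# Statute miles per degree of arc, the same constant the original query uses.
MILES_PER_DEGREE = 60 * 1.1515


def bounding_box(lat, lon, miles):
    """Return (min_lat, max_lat, min_lon, max_lon) around a circle of `miles`."""
    dlat = miles / MILES_PER_DEGREE
    # Meridians converge toward the poles, so one mile covers more degrees of
    # longitude at higher latitudes: divide by cos(latitude).
    dlon = dlat / math.cos(math.radians(lat))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, mirroring the expression in the query."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(math.radians(lon1 - lon2)))
    # Clamp so rounding error never pushes acos outside its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle)) * MILES_PER_DEGREE


def zips_within(rows, lat, lon, miles):
    """rows: iterable of (zip, latitude, longitude). Cheap box test first."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, miles)
    return [z for z, la, lo in rows
            if min_lat <= la <= max_lat and min_lon <= lo <= max_lon
            and great_circle_miles(lat, lon, la, lo) < miles]
```

Keeping the exact distance test after the box test means points that fall in the corners of the rectangle are still rejected, so the result should match the original query while the trig only runs on rows that survive the box.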
> This query returned 21442 records in 1.08 seconds:
>
> SELECT custzip FROM customers
>
> This query is still running half an hour later, with a Time of 2167
> and a State of "Sending Data" (according to the mysql process list)
>
> SELECT custzip FROM customers WHERE custzip IN (SELECT zip FROM
> zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
> cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
> < 5)
>
> When I try to EXPLAIN the query it gives me the following...
>
> id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
> 1,PRIMARY,customers,index,NULL,cw-custzip,30,NULL,21226,Using where; Using index
> 2,DEPENDENT SUBQUERY,zipcodes,ALL,NULL,NULL,NULL,NULL,42144,Using where
Neither mysql 4 nor 5 is very smart when it comes to subqueries. Unless
your inner query is dead simple, mysql assumes it's a dependent subquery and
runs it once per row in your outer query. You might want to try mysql 6 and
see if it does any better. For example, here are explain plans for mysql 5
and 6 for the following query on the famous Oracle "emp" sample table:
select ename from emp where mgr in
(select empno from emp where ename in ('scott'));
Mysql 5.1.30:
Note that it didn't use an index on the outer query, and had to examine all
14 rows. It even used the wrong index on the inner query :)
Mysql 6.0.11:
Note that the queries have flipped and aren't nested anymore (id is 1 on
both queries). The first query uses the ename index and estimates it will
return one row. The second query uses the mgr index based on the empno
value returned by the first query and estimates it will return 2 rows. Much
better :)
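To make the "once per row in your outer query" cost concrete, here is a rough Python model, not MySQL's actual executor. Both function names and the toy data are assumptions made up for illustration. With the row estimates from the EXPLAIN quoted earlier (21226 outer rows, 42144 inner rows scanned with type ALL), the nested plan could evaluate the trig expression up to about 21226 * 42144, roughly 894 million times, versus 42144 times if the inner query ran once.

```python
def dependent_in(outer, inner, predicate):
    """Re-scan `inner` for every outer row, like a DEPENDENT SUBQUERY plan."""
    evaluations = 0
    matches = []
    for value in outer:
        found = False
        for key, payload in inner:
            evaluations += 1  # one inner-row check (the expensive part)
            if predicate(payload) and key == value:
                found = True
                break
        if found:
            matches.append(value)
    return matches, evaluations


def materialized_in(outer, inner, predicate):
    """Run the inner query once, keep its keys, then probe per outer row."""
    evaluations = 0
    keys = set()
    for key, payload in inner:
        evaluations += 1
        if predicate(payload):
            keys.add(key)
    return [value for value in outer if value in keys], evaluations
```

Both functions return the same matches; only the number of inner-row evaluations differs, which is the gap a join or derived table is meant to close.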
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Matt Neimeyer on
2009-06-19T14:05:58+00:00
>> SELECT zip FROM zipcodes WHERE
>> degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
>> cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
>> < 5
>
> Ouch. You might want to calculate the rectangle enclosing your target
> distance, add an index on lat (and/or long), and add the rectangle check to
> your where clause: WHERE latitude BETWEEN lt1 AND lt2 AND longitude BETWEEN
> ln1 AND ln2. That way mysql can use the index to pare down most of the rows
> without having to call all those trig functions for every zipcode.
I like this idea the best (it always bothered me running a query that
involved multiple mathematical functions).
So... Here's the "scratch" php code I ended up with... Anyone see any
problems with it? The only problem I see is that the old code was more
"circular", while this will be a square (within the limits of a square
on a non-spherical earth... etc.. etc..), so more zip codes will be
included in the corners. If there are too many complaints about that I
might look at some sort of overlapping rectangle scheme instead of a
square.
function ChangeInLatitude($Miles) { return rad2deg($Miles/3960); }
// A degree of longitude shrinks by cos(latitude), so cos() belongs in the divisor.
function ChangeInLongitude($Lat, $Miles) { return
rad2deg($Miles/(3960*cos(deg2rad($Lat)))); }
$Miles = 5;
$OriginalLat = 39.0788994;
$OriginalLon = -77.1227036;
$ChangeInLat = ChangeInLatitude($Miles);
$ChangeInLon = ChangeInLongitude($OriginalLat, $Miles);
$MinLat = $OriginalLat-$ChangeInLat;
$MaxLat = $OriginalLat+$ChangeInLat;
$MinLon = $OriginalLon-$ChangeInLon;
$MaxLon = $OriginalLon+$ChangeInLon;
My only other question is... when I explained the new query... On the
dependent subquery it says possible keys are zip, longitude and
latitude but it used zip. It seems like a better index would be
longitude or latitude? On the primary query, even though there is an
index on custzip it doesn't say it's using ANY indexes. I should
probably leave well enough alone... but I'm curious.
Thanks again!
Matt
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Brent Baisley on
2009-06-19T15:30:39+00:00
It sounds like you want to use spatial indexes, but they only became
available in v4.1
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
http://dev.mysql.com/doc/refman/5.0/en/using-a-spatial-index.html
You would need to switch your table type from InnoDB to MyISAM, which
is fairly easy with ALTER TABLE. But that should allow you to drop all
your calculations in the query.
You don't have to do any re-architecture to change your subquery to a join:
SELECT custzip FROM customers
JOIN
(SELECT zip FROM
zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
< 5) AS zips
ON custzip=zip
Often times that simple change speeds things up considerably in MySQL.
An explain should show it has a DERIVED TABLE if I recall correctly.
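A small runnable sketch of the IN-to-derived-table rewrite, using Python's built-in sqlite3 module only so that it executes anywhere. It says nothing about MySQL's timing. The `nearby` flag is a made-up stand-in for the distance expression, and the sample rows are invented. DISTINCT is added in the derived table as a precaution: if zip were not unique in zipcodes, a plain join would repeat customers where IN would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zipcodes (zip TEXT PRIMARY KEY, nearby INTEGER);
    CREATE TABLE customers (custid INTEGER PRIMARY KEY, custzip TEXT);
    INSERT INTO zipcodes VALUES ('20850', 1), ('20852', 1), ('90210', 0);
    INSERT INTO customers VALUES (1, '20850'), (2, '90210'),
                                 (3, '20852'), (4, '20850');
""")

# Original shape: filter customers with an IN subquery.
in_form = """
    SELECT custid, custzip FROM customers
    WHERE custzip IN (SELECT zip FROM zipcodes WHERE nearby = 1)
    ORDER BY custid
"""

# Rewritten shape: join against the subquery as a derived table.
join_form = """
    SELECT custid, custzip FROM customers
    JOIN (SELECT DISTINCT zip FROM zipcodes WHERE nearby = 1) AS zips
      ON custzip = zip
    ORDER BY custid
"""

in_rows = conn.execute(in_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
```

Both statements return the same customers here; on the MySQL versions discussed in this thread, the expectation is that the derived table is evaluated once instead of once per customer row.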
Brent Baisley
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Peter Brawley on
2009-06-20T00:16:24+00:00
Matt,
>This query is still running half an hour later, with a Time of 2167
>and a State of "Sending Data" (according to the mysql process list)
> SELECT custzip FROM customers WHERE custzip IN ( ...
For explanation & alternatives see "The unbearable slowness of IN()" at
http://localhost/artful/infotree/queries.php.
PB
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Walter Heck - OlinData.com on
2009-06-20T02:45:38+00:00
Peter,
On Thu, Jun 18, 2009 at 9:27 PM, Peter
Brawley<peter.brawley@earthlink.net> wrote:
> For explanation & alternatives see "The unbearable slowness of IN()" at
> http://localhost/artful/infotree/queries.php.
>
you prolly meant to not post a url pointing at your local copy of your
website. This works better for most of us:
http://www.artfulsoftware.com/infotree/queries.php ;)
Walter
Re: Half Hour Sub-query in MySQL vs. 5 Seconds in VFP? by Matt Neimeyer on
2009-06-22T15:06:06+00:00
On Fri, Jun 19, 2009 at 11:27 AM, Brent Baisley<brenttech@gmail.com> wrote:
> It sounds like you want to use spatial indexes, but they only became
> available in v4.1
> http://dev.mysql.com/doc/refman/5.0/en/create-index.html
> http://dev.mysql.com/doc/refman/5.0/en/using-a-spatial-index.html
That "feels" like the right thing (spatial calculations = spatial
indexes?) but I looked at the docs and my head exploded. Can anyone
recommend a good book that takes me through it gently?
That said I'm intrigued by the MBRContains and the Polygon functions...
If I read those right I could create a simplified "circle" (probably
just an octagon) to help eliminate false positives in the "corners"
when using a plain square as the enclosure.
> You don't have to do any re-architecture to change your subquery to a join:
> SELECT custzip FROM customers
> JOIN
> (SELECT zip FROM
> zipcodes WHERE degrees(acos(sin(radians(39.0788994))*sin(radians(latitude))+
> cos(radians(39.0788994))*cos(radians(latitude))*cos(radians(-77.1227036-longitude))))*60*1.1515
> < 5) AS zips
> ON custzip=zip
Will that work after a where clause? Multiple Times? For example...
(pseudo-code...)
SELECT * FROM customers WHERE saleslastyear > 100000
JOIN (SELECT zip FROM etc....) AS zips ON custzip=zip
JOIN (SELECT MAX(date) FROM phonecalls) AS LastCalledOn ON custid=custid
Just from thinking about that... I assume that the only limitation is
that in a subselect you can do something like WHERE NOT IN (select
etc) but with a JOIN you are assuming a "positive" relationship? For
example using the JOIN methods above there isn't a way to simply do
"AND custid NOT IN (SELECT custid FROM ordersplacedthisyear)" other
than doing exactly that and adding this clause to the saleslastyear
clause. (In this particular case a column "lastorderdate" in customer
that was programmatically updated on ordering would also be useful but
I'm thinking examples here... ;) )
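For what it's worth, a sketch of how the pieces above usually fit together in standard SQL: JOIN clauses come before WHERE, and a "NOT IN" condition can be written as a LEFT JOIN that keeps only the rows with no match. It runs on Python's built-in sqlite3 module purely for convenience. The `nearby_zips` table is a hypothetical stand-in for the distance subquery, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custid INTEGER PRIMARY KEY, custzip TEXT,
                            saleslastyear INTEGER);
    CREATE TABLE nearby_zips (zip TEXT PRIMARY KEY);
    CREATE TABLE ordersplacedthisyear (orderid INTEGER PRIMARY KEY,
                                       custid INTEGER);
    INSERT INTO customers VALUES (1, '20850', 150000), (2, '20850', 50000),
                                 (3, '20852', 200000), (4, '90210', 300000);
    INSERT INTO nearby_zips VALUES ('20850'), ('20852');
    INSERT INTO ordersplacedthisyear VALUES (10, 3);
""")

# Joins first, then WHERE. The LEFT JOIN plus "IS NULL" keeps only customers
# that have no row in ordersplacedthisyear (an anti-join).
join_query = """
    SELECT c.custid
    FROM customers AS c
    JOIN nearby_zips AS z ON c.custzip = z.zip
    LEFT JOIN ordersplacedthisyear AS o ON o.custid = c.custid
    WHERE c.saleslastyear > 100000
      AND o.custid IS NULL
    ORDER BY c.custid
"""

# The same question asked with subqueries, for comparison.
subquery_query = """
    SELECT custid FROM customers
    WHERE saleslastyear > 100000
      AND custzip IN (SELECT zip FROM nearby_zips)
      AND custid NOT IN (SELECT custid FROM ordersplacedthisyear)
    ORDER BY custid
"""

join_rows = conn.execute(join_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
```

One caveat worth keeping in mind: the two forms only agree when the column inside NOT IN contains no NULLs, because NOT IN against a list containing NULL matches nothing.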
I've never seen JOIN used outside of a traditional "SELECT t1.*,t2.*
FROM table1 AS t1 LEFT JOIN table2 AS t2 ON t1.id=t2.id" type of
structure so I kinda feel like I have a new toy...
Thanks!