OK – I’ve been wrestling with this for about 3 months on and off and since I’ve exhausted every geo proximity formula out there that I’ve come across and I’m no closer to getting the right results I figured it time to ask for some help.
THE AIM
I’m setting up a fairly basic implementation of a store locator. The user enters their postcode and selects from a predefined list of search radii. The gmaps API generates lat/long coordinates for this address and passes them to a php script. In this script the user coords are queried against a mysql database table (structure below)
post_id int(11)
post_type varchar(20)
lat float(10,6)
lng float(10,6)
The results of this query (post ids) are entered into a wordpress query which generates the XML that contains the map marker data. (the wordpress query uses post__in and posts_per_page -1 to display info for all ID generated by the query
THE PROBLEM
In a nutshell, every implementation of the Haversine formula I’ve come across seems to result in missing markers – specifically any markers that are very close to the users entered coordinates (don’t know precisely but I think it’s within about 500m). This is a big problem as if the user enters their postcode and there is a store very close to their location it won’t show up.
I’ve tried about 8 different permutations of the forumla that I’ve dug up from various tutorials with the same results. Below is the formula that I’m currently using on the site which provides all markers except for the those very close to the users entered position:
$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];
// Calculate square radius search
$lat1 = (float) $center_lat - ( (int) $radius / 69 );
$lat2 = (float) $center_lat + ( (int) $radius / 69 );
$lng1 = (float) $center_lng - (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );
$lng2 = (float) $center_lng + (int) $radius / abs( cos( deg2rad( (float) $center_lat ) ) * 69 );
$sqlsquareradius = "
SELECT
post_id, lat, lng
FROM
wp_geodatastore
WHERE
lat BETWEEN ".$lat1." AND ".$lat2."
AND
lng BETWEEN ".$lng1." AND ".$lng2."
"; // End $sqlsquareradius
// Create sql for circle radius check
$sqlcircleradius = "
SELECT
t.post_id,
3956 * 2 * ASIN(
SQRT(
POWER(
SIN(
( ".(float) $center_lat." - abs(t.lat) ) * pi() / 180 / 2
), 2
) + COS(
".(float) $center_lat." * pi() / 180
) * COS(
abs(t.lat) * pi() / 180
) * POWER(
SIN(
( ".(float) $center_lng." - t.lng ) * pi() / 180 / 2
), 2
)
)
) AS distance
FROM
(".$sqlsquareradius.") AS t
HAVING
distance <= ".(int) $radius."
ORDER BY distance
"; // End $sqlcircleradius
$result = mysql_query($sqlcircleradius);
$row = mysql_fetch_array( $result );
while($row = mysql_fetch_array( $result )) {
// the contents of each row
$post_ids[] = $row['post_id'];
}
There was 1 formula that I tried that was suggested by Mike Pelley here: Geolocation SQL query not finding exact location
This formula seemed to show markers that were very close to the users entered location but missed out others that should have been displayed within the given radius. To clear up any confusion this is the code I used:
$center_lat = $_GET["lat"];
$center_lng = $_GET["lng"];
$radius = $_GET["radius"];
$sql = "
SELECT post_id, lat, lng,
truncate((degrees(acos( sin(radians(lat))
* sin(radians(".$center_lat."))
+ cos(radians(lat))
* cos(radians(".$center_lat."))
* cos(radians(".$center_lng." - lng) ) ) )
* 69.09*1.6),1) as distance
FROM wp_geodatastore HAVING distance <= ".$radius." ORDER BY distance desc
"; // End $sqlcircleradius
$result = mysql_query($sql);
$row = mysql_fetch_array( $result );
while($row = mysql_fetch_array( $result )) {
// Print out the contents of each row
$post_ids[] = $row['post_id'];
}
THE REQUEST
Basically I would like to know why neither of these blocks of code are displaying the correct markers. If anyone can suggest any improvements to the code or can point me towards some resource that I might have missed that would be great
EDIT
Thought my psudeo answer was working but as it turns out that was still having problems. I’ve ended up going for a very different tack now and I’m using a very good jquery store locator which can be found here: http://www.bjornblog.com/web/jquery-store-locator-plugin
Won’t work for every project out there but for my needs it’s perfect (and works!)
EDIT
This location-finder comes up often enough that I’ve written an article on it.
http://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/
Original Post
Let’s start by dealing with the haversine formula once for all, by putting it into a stored function so we can forget about its gnarly details. NOTE: This whole solution is in statute miles.
Now let’s put together a query that searches on the bounding box, and then refines the search with our distance function and orders by distance
Based on the PHP code in the your question:
Assume
$radius
is your radius,$center_lat
,$center_lng
is your reference point.Notice a few things about this.
First, it does the bounding box computation in SQL rather than in PHP. There’s no good reason for that, except keeping all the computation in one environment.
(radius / 69)
is the number of degrees inradius
statute miles.Second, it doesn’t fiddle with the size of the longitudinal bounding box based on latitude. Instead it uses a simpler, but slightly too large, bounding box. This bounding box catches a few extra records, but the distance measurement gets rid of them. For your typical postcode / store finder app the performance difference is negligible. If you were searching many more records (e.g. a database of all utility poles) it might not be so trivial.
Third, it uses a nested query to do the distance elimination, to avoid having to run the distance function more than once for each item.
Fourth, it orders by distance ASCENDING. This means your zero-distance results should show up first in the result set. It usually makes sense to list nearest things first.
Fifth, it uses
FLOAT
rather thanDOUBLE
throughout. There’s a good reason for that. The haversine distance formula is not perfect, because it makes the approximation that the earth is a perfect sphere. That approximation happens to break down at roughly the same level of accuracy as the epsilon forFLOAT
numbers. SoDOUBLE
is deceptive numerical overkill for this problem. (Don’t use this haversine formula to do civil engineering work like parking lot drainage, or you will get big puddles a couple of epsilon, a few inches, deep, I promise.) It’s fine for store-finder applications.Sixth, you are definitely going to want to create an index for your
lat
column. If your table of locations doesn’t change very often, it will help to create an index for yourlng
column as well. But yourlat
index will give you most of your query performance gain.Lastly, I tested the stored procedure and the SQL, but not the PHP.
Reference: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
Also my experience with a bunch of proximity finders for health care facilities.
————— EDIT ——————–
If you don’t have a user interface that lets you define a stored procedure, that’s a nuisance. At any rate, PHP lets you use numbered parameters in the sprintf call, so you can generate the whole nested statement like this. NOTE: You might need %$1f etc. You’ll need to experiment with this.
Here’s a solution I used successfully for a while in my own geo proximity calculations:
This is code from a working production system,
Uses a different distance formular, but for a store locator the difference is minimal.
Thinking a little laterally I’ve come up with a ‘sort of’ solution to the problem of the missing markers. The two equations I posted originally gave the correct results but each missed out either markers close to the target or on the edges of the search radius
It’s not very elegant but I figured that running both equations and producing 2 arrays which I then combined (removing any duplicates) would give me all the markers I’m looking for. This does work (obviously a performance hit but it’s not a high traffic application) so I’ll work with this for the time being but I’m still after a more practical solution if anyone has one!
You can try my class at http://www.phpclasses.org/package/6202-PHP-Generate-points-of-an-Hilbert-curve.html. It uses the harvesine formula and a hilbert curve to compute a quadkey. You can then search the quadkey from left to right. Every position of the key is a point on the monster curve. A better explanation of the curve can be found at Nick’s spatial index quadtree hilbert curve blog. It’s like using the spatial index extension from mysql but you have more control. You can use a z curve or moore curve or you can change the look.