Insertion and Query
This section outlines the fundamental design strategy behind Geoprox, detailing the algorithms used for inserting and querying geospatial data.
The core approach involves encoding geographic coordinates into geohashes, utilizing a Patricia Trie for efficient storage and retrieval, and employing a KD-Tree for fast nearest neighbor searches. These techniques collectively enable Geoprox to efficiently manage and query large volumes of geospatial data.
Data Structures
-
Approximate Position Tracking:
- Utilize a
HashMap
(* see v0.3.0 release notes) to track approximate positions. - Format:
object_key
=>geohash(lat, lng)
- Utilize a
-
Key/Value Storage:
- Use
StringPatriciaMap
for efficient key/value storage. - Format:
geohash_region
=>HashSet(object_key)
- Use
Insert Procedure
-
Hash Object's Latitude/Longitude:
- Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called
insert_depth
, should be sufficient to ensure the resulting geohash region is large enough to store the density of objects in that area.
- Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called
-
Upsert Object Key:
- Insert or update the
object_key
in theStringPatriciaMap
with itsgeohash_region
. - If
object_key
had an existinggeohash_region
, remove its entry from that region in theStringPatriciaMap
.
- Insert or update the
Search Procedure
-
Initiate Search:
- A search is initiated at a specific location with a given search range (in kilometers).
-
Encode Latitude/Longitude:
- Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called
search_depth
, should ensure the geohash region is small enough to fit within the search radius.
- Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called
-
Optimize Geohash Search Region:
- Optimize the geohash search region by expanding it until it covers the search area.
- If the geohash search region does not contain the search radius, remove the last character (effectively expanding the region) until it fits within the search area.
-
Partition
StringPatriciaMap
:- Partition the
StringPatriciaMap
by the common prefix of the optimized geohash.
- Partition the
-
Construct
KdTree
:- Iterate over the partition and construct a
KdTree
. - Store objects and their approximate locations in the
KdTree
.
- Iterate over the partition and construct a
-
Compute Nearest Neighbor:
- Perform nearest neighbor computation with the search requirements against the
KdTree
to get the final result.
- Perform nearest neighbor computation with the search requirements against the