Introduction
Geoprox is a geospatial indexing and querying API designed to simplify and optimize location-based operations.
It provides efficient methods for encoding and decoding geohashes, finding neighboring regions, and managing geospatial indexes for a variety of applications such as food delivery services and real-time inventory tracking.
Key Features:
- Geohash Encoding and Decoding: Convert geographic coordinates into geohashes and retrieve coordinates from geohashes for easy location-based operations.
- Neighbor Identification: Quickly find neighboring geohashes to support spatial queries and region-based operations.
- Geospatial Index Management: Create, update, and delete geospatial indexes to organize and query location-based data efficiently.
- Proximity Searches: Perform radius-based searches to locate nearby resources or inventory items.
Benefits:
- Efficient Spatial Data Handling: Streamline operations by leveraging geohashes for compact and hierarchical representation of locations.
- Flexible Querying: Use proximity-based queries to enhance real-time decision-making and operational efficiency.
- Easy Integration: Seamlessly integrate Geoprox with applications for real-time tracking, delivery services, and more.
Contributing
Geoprox is an open-source project and welcomes contributions from the community. If you’d like to contribute, please visit the GitHub repository to find the source code, report issues, and submit feature requests. For detailed guidance on contributing, please refer to the CONTRIBUTING document.
License
Geoprox is licensed under the Apache 2.0 or MIT License (your choice).
How Geoprox Works
The Problem
Initially, for the May 2024 Rust Indy.rs meetup, the proof of concept problem began with ride-share pairing like Uber or Lyft.
Ride pairing depends on being able to track and pair a set of drivers within the vicinity of some location (i.e., the pickup location of an order).
Since then, the problem has generalized to how can we efficiently track and retrieve resources geographically near some location.
Objectives
- Keep track of the approximate location of objects.
- Search should return a set of objects within a radius of the search location.
Solution
Index the objects and hash their key and geographical position. Using a geohash to encode the approximate location, we can map geohash prefixes into the set of objects contained in the geohash using a Prefix Tree stored in-memory.
We can efficiently partition the search space and perform a nearest neighbor search on a merged set of objects in the search region and return the results.
What is a Geohash?
A geohash is a string representation of a geographic location, encoded as a series of characters. This encoding allows for efficient storage and querying of spatial data. Geohashes are hierarchical, meaning that as the string length increases, the represented area becomes more precise.
How Geohashes Work
Geohashes divide the world into a grid of cells, each with a unique identifier (encoded in base 32). The precision of the geohash can be adjusted by changing its length:
- Short geohashes represent larger areas.
- Long geohashes represent smaller, more precise areas.
Geohash Length | Lat Bits | Lng Bits | Lat Error | Lng Error | Km Error |
---|---|---|---|---|---|
1 | 2 | 3 | ±23 | ±23 | ±2,500 km (1,600 mi) |
2 | 5 | 5 | ±2.8 | ±5.6 | ±630 km (390 mi) |
3 | 7 | 8 | ±0.70 | ±0.70 | ±78 km (48 mi) |
4 | 10 | 10 | ±0.087 | ±0.18 | ±20 km (12 mi) |
5 | 12 | 13 | ±0.022 | ±0.022 | ±2.4 km (1.5 mi; 2,400 m) |
6 | 15 | 15 | ±0.0027 | ±0.0055 | ±0.61 km (0.38 mi; 610 m) |
7 | 17 | 18 | ±0.00068 | ±0.00068 | ±0.076 km (0.047 mi; 76 m) |
8 | 20 | 20 | ±0.000085 | ±0.00017 | ±0.019 km (0.012 mi; 19 m) |
For example, the following geohashes are all within a maximum distance of +/- 78km (48mi) from each other:
- s008nb00j8
- s00twy01mt
- s00j8n012j
However, these geohashes represent completely different regions:
- s128nb00j8
- s34twy01mt
- s56j8n012j
Benefits of Using Geohashes
- Compact Representation: Efficiently stores geographic coordinates.
- Hierarchical Structure: Allows for easy aggregation and precision adjustment.
- Proximity Queries: Simplifies the process of finding nearby locations.
Check out the Geohash Explorer for an interactive map to help visualize geohashing.
Prefix Searching
What is Prefix Search?
Prefix search is a method for quickly finding all entries in a dataset that share a common prefix. In the context of geohashes, it allows us to retrieve all geohashes that start with the same characters, which is useful for geographical proximity searches.
Trie Tree
We can store Geohashes in a Trie Tree data structure. Tries enable us to efficiently retrieve geohashes with a common prefix in O(L)
time, where L
is the length of the prefix being searched.
Key Features of Trie Tree:
- Efficient Retrieval: Quickly find geohashes sharing a common prefix.
- Time Complexity:
O(L)
, whereL
is the prefix length.
Patricia Trie
A Patricia Trie, also known as a Radix Tree, is a type of Trie Tree that optimizes space by merging common prefixes into single nodes. This further enhances the efficiency of prefix searching by reducing the number of nodes.
Key Features of Patricia Trie Tree:
- Space Optimization: Merges common prefixes to save space.
- Efficient Lookup: Retrieves geohashes with common prefixes efficiently.
- Time Complexity: Best case
O(log N)
whereN
is the length of the longest geohash stored, and worst caseO(L)
whereL
is the length of the prefix.
Comparisons
Skip List
A Skip List is a probabilistic data structure that allows for fast search, insertion, and deletion operations. It consists of multiple levels of linked lists, with each higher level acting as an "express lane" for elements lower down.
-
Pros:
- Efficient Average-case Performance: Provides good average-case performance for search operations, typically
O(log N)
time complexity. - Cache-friendly: The contiguous arrangement of nodes makes Skip Lists more cache-friendly than tree-based structures.
- Efficient Average-case Performance: Provides good average-case performance for search operations, typically
-
Cons:
- Space Overhead: Requires additional space to maintain pointers for multiple levels, resulting in higher space usage compared to a Patricia Trie.
- Probability Calculations: Maintaining balance requires careful calculation of node levels' probabilities, involving some understanding of probability theory.
B-Trees and MemTables
B-Trees are balanced tree data structures that maintain sorted data and enable searches, sequential access, insertions, and deletions with logarithmic time complexity. MemTables are in-memory data structures used in conjunction with B-Trees to facilitate efficient storage and retrieval of key-value pairs before persisting data to disk.
-
Pros:
- Efficient Disk-based Storage: Ideal for managing large datasets and effective for disk-based storage solutions.
- Range Queries and Sorted Data Retrieval: Provides excellent performance for range queries and retrieving sorted data efficiently.
-
Cons:
- Memory Usage: MemTables require significant memory to store data before flushing to disk, which can be a limitation for very large datasets.
- Complexity in Merging: Managing and merging data between MemTables and disk-based storage can introduce complexity and overhead in maintaining data consistency.
Why Patricia Trie?
For geospatial data, where object locations might change frequently, in-memory solutions like Patricia Trie are preferred. They offer superior speed and efficiency in managing dynamic data, avoiding the latency associated with disk I/O operations.
Insertion and Query
This section outlines the fundamental design strategy behind Geoprox, detailing the algorithms used for inserting and querying geospatial data.
The core approach involves encoding geographic coordinates into geohashes, utilizing a Patricia Trie for efficient storage and retrieval, and employing a KD-Tree for fast nearest neighbor searches. These techniques collectively enable Geoprox to efficiently manage and query large volumes of geospatial data.
Data Structures
-
Approximate Position Tracking:
- Utilize a
HashMap
(* see v0.3.0 release notes) to track approximate positions. - Format:
object_key
=>geohash(lat, lng)
- Utilize a
-
Key/Value Storage:
- Use
StringPatriciaMap
for efficient key/value storage. - Format:
geohash_region
=>HashSet(object_key)
- Use
Insert Procedure
-
Hash Object's Latitude/Longitude:
- Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called
insert_depth
, should be sufficient to ensure the resulting geohash region is large enough to store the density of objects in that area.
- Hash the object's latitude and longitude up to a certain depth (e.g., 6). This depth, called
-
Upsert Object Key:
- Insert or update the
object_key
in theStringPatriciaMap
with itsgeohash_region
. - If
object_key
had an existinggeohash_region
, remove its entry from that region in theStringPatriciaMap
.
- Insert or update the
Search Procedure
-
Initiate Search:
- A search is initiated at a specific location with a given search range (in kilometers).
-
Encode Latitude/Longitude:
- Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called
search_depth
, should ensure the geohash region is small enough to fit within the search radius.
- Encode the latitude and longitude into a geohash up to a specific depth (e.g., 6). This depth, called
-
Optimize Geohash Search Region:
- Optimize the geohash search region by expanding it until it covers the search area.
- If the geohash search region does not contain the search radius, remove the last character (effectively expanding the region) until it fits within the search area.
-
Partition
StringPatriciaMap
:- Partition the
StringPatriciaMap
by the common prefix of the optimized geohash.
- Partition the
-
Construct
KdTree
:- Iterate over the partition and construct a
KdTree
. - Store objects and their approximate locations in the
KdTree
.
- Iterate over the partition and construct a
-
Compute Nearest Neighbor:
- Perform nearest neighbor computation with the search requirements against the
KdTree
to get the final result.
- Perform nearest neighbor computation with the search requirements against the
Getting Started
Geoprox comes in different flavors depending on what you're cooking up. If you're looking to dive deep and use core features directly in your Rust projects, the geoprox-core
crate is your go-to. It's packed with all the fundamental data structures that make Geoprox tick.
But if you're aiming for a more out-of-the-box experience, the main geoprox
package is where the magic happens. It's the CLI runtime designed to sidecar with your other application services, making it super easy to integrate Geoprox into your existing setup.
Crates
- geoprox - Standalone CLI for running the Geoprox service
- geoprox-core - Core data structure implementations
- geoprox-server - HTTP API wrapper around
geoprox-core
- geoprox-client - HTTP client library for
geoprox-server
Installation
To install geoprox
, use the following command:
cargo install geoprox
Usage
Geoprox provides several commands to start the server and work with geohashes:
geoprox help
Usage: geoprox [COMMAND]
Commands:
run Start Geoprox server
encode Hash latitude/longitude into geohash
decode Decode geohash into approximate longitude/latitude
spec Generate an OpenAPI specification file
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
To start using Geoprox, you can run the following command to start the server:
geoprox run
You can also encode and decode geohashes directly from the CLI:
# Encode latitude and longitude into geohash
geoprox encode --lat=37.7749 --lng=-122.4194 --depth=5
# Decode geohash into latitude and longitude
geoprox decode 9q8yy
For detailed help on any command, use the help command:
geoprox help encode
Configuration
Geoprox run
command allows specifying a configuration file using the -c
or --config
option or set the GEOPROX_CONFIG
environment variable. This file can contain various settings to customize the behavior of the Geoprox server and commands. The configuration can be provided in any common format such as YAML
, TOML
, JSON
, or INI
.
Example Usage
Specify config using -c
flag:
# from current working directory
geoprox run -c geoprox.toml
# from absolute path
geoprox run -c /my/custom/config.yaml
or using environment variables:
export GEOPROX_CONFIG='/my/custom/config.json'
geoprox run
Example Config
Here's an example configuration file in TOML
format:
# All settings are optional; these are the default values.
[server]
# The address the server will bind to
http_addr = '0.0.0.0'
# The port the server will listen on
http_port = 5000
# Request timeout
timeout = '10s'
[shard]
# Determines the default geohash length for inserts
insert_depth = 6
# Determines the default geohash length for searches
search_depth = 6
# Specifies the default number of results returned in range queries
default_count = 100
# Toggles the default sorting behavior for query results
default_sorted = false
[server.snapshots]
# Toggle snapshot usage
disabled = false
# Directory where snapshots will be stored
path = '/var/lib/geoprox/snapshots'
# How often snapshots will be taken
every = '30s'
Environment Variables
These are the currently supported environment variables. They will take precedence over settings defined in the configuration file.
Environment Variable | Description | Default Value |
---|---|---|
GEOPROX_CONFIG | Specifies the configuration file path. | |
GEOPROX_HTTP_ADDR | The address the server will bind to. | 0.0.0.0 |
GEOPROX_HTTP_PORT | The port the server will listen on. | 5000 |
Fine Tuning
The insert_depth
and search_depth
settings control the precision of geohashing and impact the performance of distance calculations and search queries.
Optimizing insert_depth
Higher Insert Depth:
- Description: Objects are grouped into smaller, more precise regions.
- Impact: Searching takes slightly longer, but distance calculations are more accurate.
- Use Case: Ideal for scenarios where precise distance measurements are crucial.
Lower Insert Depth:
- Description: Objects are grouped into larger regions.
- Impact: Search performance improves, but distance calculations become less accurate.
- Use Case: Best for cases where fast general distance estimates are acceptable.
Optimizing search_depth
Higher Search Depth:
- Description: The initial search region is smaller and more precise.
- Impact: Search queries take slightly longer, but results are more accurate.
- Use Case: Suitable for narrow range queries where high accuracy is needed.
Lower Search Depth:
- Description: The initial search region is larger, leading to faster searches.
- Impact: Search speed improves, but accuracy may be reduced.
- Use Case: Ideal for wide range queries where speed is prioritized over precision.
Docker Guide
The Geoprox Docker image (ezrasingh/geoprox
) is available on two registries:
Getting Started
To quickly get started with Docker Compose, skip to the Quick Start Guide.
Pull the Geoprox container from one of the available repositories.
# from Docker Hub
docker pull ezrasingh/geoprox:latest
# or from GitHub Container Registry
docker pull ghcr.io/ezrasingh/geoprox:latest
To run the container with default configurations, use:
docker run -t -p 5000:5000 ezrasingh/geoprox:latest
This command will start the Geoprox container with the default settings, exposing port 5000 for access.
To use another port on your machine try
-p <my-port>:5000
Variants & Tagging
The Geoprox image supports two variant tags: alpine
and debian
.
By default, the latest
tag uses the alpine
variant. Latest is built from the main
branch.
To pin your Geoprox instance to a specific version and variant, use the format <version>-<variant>
, such as v0.3.0-debian
.
If you specify only the version, like
v0.3.0
, it will default to thealpine
variant, resulting inv0.3.0-alpine
.
Check the respective registry for available tags.
Customizing the Configuration
For complete settings and configuration details, refer to the Geoprox CLI README.
Geoprox allows specifying a configuration file using the -c
or --config
option or set the GEOPROX_CONFIG
environment variable. Geoprox will automatically detect and parse configuration files in formats like YAML
, TOML
, JSON
, or INI
if they are named geoprox.yaml
, geoprox.toml
, geoprox.json
, or geoprox.ini
, respectively.
For example, if you have a configuration file named geoprox.toml
in the container working directory (/var/lib/geoprox
), you can run the container with the following command:
docker run -t -p 5000:5000 \
-v $(pwd)/geoprox.toml:/var/lib/geoprox/geoprox.toml:ro \
ezrasingh/geoprox:latest \
-c geoprox.toml
In this command:
-v $(pwd)/custom-config.yaml:/some/path/custom-config.yaml:ro
mounts the local configuration file into the container.
If you need to specify a different path or file name for your configuration, use the -c
or --config
option:
docker run -t -p 5000:5000 \
-v $(pwd)/custom-config.yaml:/some/path/custom-config.yaml:ro \
ezrasingh/geoprox:latest \
-c /some/path/custom-config.yaml
In this command:
-c /some/path/custom-config.yaml
specifies the configuration file to be used by the Geoprox server.
Or use the GEOPROX_CONFIG
environment variable:
docker run -t -p 5000:5000 \
-v $(pwd)/custom-config.yaml:/some/path/custom-config.yaml:ro \
-e GEOPROX_CONFIG='/some/path/custom-config.yaml' \
ezrasingh/geoprox:latest
In this command:
-e GEOPROX_CONFIG='/some/path/custom-config.yaml'
specifies the configuration file to be used by the Geoprox server.
Quick Start
To quickly get started with Geoprox using Docker, follow these steps to configure and run the server with Docker Compose.
The provided Docker Compose configuration serves as a boilerplate to showcase how Geoprox can be configured and allows you to get an instance up and running fast for evaluation purposes.
Docker Compose
Use the following command to start the Geoprox service:
docker-compose -f quickstart/docker-compose.yml up
This command will start the Geoprox service using the configuration specified in quickstart/docker-compose.yml
. The configuration file in this setup is a basic example, allowing you to easily evaluate Geoprox with default settings.
For custom configurations, you can modify the Docker Compose file or replace it with your own configuration as needed.
Client SDK
HTTP Client libraries are generated from an OpenAPI specification using OpenAPI Generator.
HTTP Client Libraries
Availble clients can be found at contrib/client-sdk
, along with steps for generating your own client if needed.
Language | Package Name | Repository Link |
---|---|---|
C# | GeoproxClient | GeoproxClient |
Dart | geoprox_client | geoprox_client |
Dart (Dio) | geoprox_client_dio | geoprox_client_dio |
Elixir | geoprox_client | geoprox_client |
Go | geoprox_client | geoprox_client |
Groovy | geoprox-client-groovy | geoprox-client-groovy |
Kotlin | geoprox_client | geoprox_client |
PHP | GeoproxClient | GeoproxClient |
Python | geoprox_client | geoprox_client |
Ruby | geoprox_client | geoprox_client |
Rust | geoprox-client | geoprox-client |
TypeScript (Axios) | geoprox-client-ts-axios | geoprox-client-ts-axios |
TypeScript (Fetch) | geoprox-client-ts-fetch | geoprox-client-ts-fetch |
TypeScript (Node) | geoprox-client-ts-node | geoprox-client-ts-node |
TypeScript (RxJS) | geoprox-client-ts-rxjs | geoprox-client-ts-rxjs |