A comprehensive autonomous navigation robot system integrating SLAM mapping, path planning, voice control, and face recognition. Built on ROS framework with LiDAR-based localization and multi-sensor fusion, achieving robust autonomous navigation and intelligent interaction in indoor environments.
This section summarizes the differential-drive vehicle kinematics and control models, including forward/inverse kinematics, relationships between wheel speeds and robot linear/angular velocities, and common trajectory tracking algorithms.
In ROS-based robots the most common frames are `map`, `odom`, and `base_link`. The `map` frame is a fixed, long-term global reference (Z axis up). Because localization may apply discrete corrections when new sensor data arrives, poses expressed in `map` can occasionally "jump"; this makes `map` suitable as a global reference but not ideal as a fast local control frame.

The `odom` frame is a locally continuous reference produced by dead-reckoning sources (wheel encoders, visual odometry, IMU). `odom` guarantees smooth, continuous pose updates (no sudden jumps) and is useful as a short-term reference for sensors and controllers, but it accumulates drift and therefore cannot serve as a precise long-term global frame. Typically `map` and `odom` start aligned; localization modules compute the transform `map` → `odom` (via `tf`) to correct odometry drift.

Note the distinction between the `odom` coordinate frame and the `/odom` topic: the frame is the coordinate reference, whereas the `/odom` topic commonly publishes the transform from `odom` to `base_link` (i.e., the robot pose in the `odom` frame).

The `base_link` frame is the robot body frame (usually centered on the platform); the core localization task is estimating the pose of `base_link` in the `map` frame. Each sensor has its own fixed sensor frame (for example `laser` or `base_laser` for a LiDAR), and the transform between that sensor frame and `base_link` is constant and should be published via `tf`.
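As an illustration, a minimal rospy sketch that broadcasts such a fixed sensor-to-base transform with `tf2_ros` is shown below; the frame names and the mounting offset are assumed example values, not measurements from this robot.

```python
#!/usr/bin/env python
# Minimal sketch: publish a fixed base_link -> base_laser transform via tf2_ros.
# The 0.10 m forward / 0.20 m upward offset is an assumed mounting position.
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

if __name__ == "__main__":
    rospy.init_node("laser_tf_broadcaster")
    broadcaster = tf2_ros.StaticTransformBroadcaster()

    t = TransformStamped()
    t.header.stamp = rospy.Time.now()
    t.header.frame_id = "base_link"      # parent frame
    t.child_frame_id = "base_laser"      # sensor frame
    t.transform.translation.x = 0.10
    t.transform.translation.y = 0.0
    t.transform.translation.z = 0.20
    t.transform.rotation.w = 1.0         # identity rotation (sensor not tilted)

    broadcaster.sendTransform(t)         # static transform is latched; publish once
    rospy.spin()
```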
The key kinematic relations are summarized below. Let the linear speeds of the two wheels be \(V_a\) (wheel A) and \(V_b\) (wheel B), and let the track width be \(2L\). When only one wheel is turning, the contact point of the stationary wheel is the instantaneous center of rotation (ICR), which yields the special-case kinematic relations.
1) Only wheel A rotates (wheel B stationary):
$$ \begin{cases} V_{ox}=\frac{V_a}{2}\\ V_{oy}=0\\ \dot{\theta}=-\frac{V_a}{2L} \end{cases} $$
2) Only wheel B rotates (wheel A stationary):
$$ \begin{cases} V_{ox}=\frac{V_b}{2}\\ V_{oy}=0\\ \dot{\theta}=\frac{V_b}{2L} \end{cases} $$
Combining the two special cases gives the general form (superposition of wheel speeds):
$$ \begin{cases} V_{ox}=\tfrac{1}{2}(V_a+V_b)\\ V_{oy}=0\\ \dot{\theta}=\tfrac{1}{2L}(V_b-V_a) \end{cases} $$
Based on the forward kinematic formulation, the corresponding inverse kinematic equations can be obtained directly, establishing the mapping from the robot's body-frame velocity to each wheel's linear velocity (consistent with \(\dot{\theta}=\tfrac{1}{2L}(V_b-V_a)\) above):
$$ \begin{cases} V_a = V_{ox} - L\,\dot{\theta}\\ V_b = V_{ox} + L\,\dot{\theta} \end{cases} $$
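A minimal Python sketch of both mappings (the function and variable names are illustrative; `L` is the half track width as defined above):

```python
def forward_kinematics(v_a, v_b, L):
    """Wheel speeds (m/s) -> body linear speed v_ox (m/s) and yaw rate (rad/s)."""
    v_ox = 0.5 * (v_a + v_b)
    theta_dot = (v_b - v_a) / (2.0 * L)
    return v_ox, theta_dot

def inverse_kinematics(v_ox, theta_dot, L):
    """Body velocity command -> individual wheel speeds (m/s)."""
    v_a = v_ox - L * theta_dot
    v_b = v_ox + L * theta_dot
    return v_a, v_b

# Example: 0.3 m/s forward while turning at 0.5 rad/s, half track width 0.15 m
print(inverse_kinematics(0.3, 0.5, 0.15))   # (0.225, 0.375)
```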
The following summarizes the coordinate transform and velocity relationships illustrated in the diagram above. Let the robot body frame be (x',y') rotated by heading \(\theta\) relative to the global frame (x,y). The coordinate transform from body to global is:
$$x = x'\cos\theta - y'\sin\theta\\ y = x'\sin\theta + y'\cos\theta$$
For the two-wheel differential drive with wheel linear speeds \(V_a\) and \(V_b\) (as labeled in the figure), the velocity components expressed in the global (world) frame, with \(\phi\) denoting the heading angle used in the figure, are:
$$ \begin{cases} V_{ox}=\tfrac{1}{2}(V_a+V_b)\cos\phi\\ V_{oy}= -\tfrac{1}{2}(V_a+V_b)\sin\phi\\ \dot{\theta}=\tfrac{1}{2L}(V_b-V_a) \end{cases} $$
This set of equations represents the transformation from the angular velocities of the two motors to the robot's motion velocity in the global (world) coordinate frame. During odometry computation, the robot's linear and angular velocities are integrated over each time interval, and the accumulated result yields the current pose, which corresponds to the data published on the `/odom` topic.
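A minimal sketch of this dead-reckoning integration (simple Euler integration over a fixed time step; the names and step size are illustrative assumptions):

```python
import math

def integrate_odometry(x, y, theta, v, omega, dt):
    """One Euler step: propagate the pose in the odom frame from body velocities.

    x, y, theta : current pose in the odom frame
    v, omega    : body linear speed (m/s) and yaw rate (rad/s) from the kinematics above
    dt          : time step (s)
    """
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

# Example: 100 steps of 10 ms while driving a gentle arc
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = integrate_odometry(*pose, v=0.3, omega=0.5, dt=0.01)
```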
With this, the derivation of the differential-drive kinematic model is complete.
GMapping is a widely used SLAM (Simultaneous Localization and Mapping) algorithm based on the Rao-Blackwellized Particle Filter (RBPF) framework.
It factorizes the full SLAM posterior into two parts: the robot trajectory estimation and the map estimation.
$$ p(x_{1:t}, m \mid z_{1:t}, u_{1:t-1}) = p(m \mid x_{1:t}, z_{1:t}) \, p(x_{1:t} \mid z_{1:t}, u_{1:t-1}) $$
where \(x_{1:t}\) is the robot trajectory, \(m\) the map, \(z_{1:t}\) the sequence of observations, and \(u_{1:t-1}\) the sequence of control inputs. The first factor is map estimation given a known trajectory; the second is trajectory estimation, which is handled by the particle filter.
In traditional FastSLAM 1.0, the proposal distribution for sampling new particle poses only depends on the motion model:
$$p(x_t\mid x_{t-1}, u_t)$$
This approach suffers from high uncertainty due to odometry drift and noise, often causing particle degeneracy: only a few particles have significant weights.
GMapping improves this by incorporating the current observation \(z_t\) into the proposal distribution:
$$p(x_t\mid x_{t-1}, u_t, z_t)$$
Key idea: combining motion and observation information corrects the motion-model prediction using the latest laser scan, reducing uncertainty of sampled particles.
Implementation steps: (1) draw an initial pose estimate from the odometry motion model; (2) refine this estimate by scan matching the current laser scan against the particle's map; (3) sample poses around the refined estimate and fit a Gaussian approximation of the improved proposal; (4) draw the new particle pose from this Gaussian and compute the corresponding weight.
The particle weight update becomes:
$$w_t^{(i)} \propto w_{t-1}^{(i)} \dfrac{p(z_t\mid x_t^{(i)}, m_{t-1}^{(i)})\, p(x_t^{(i)}\mid x_{t-1}^{(i)}, u_t)}{q(x_t^{(i)}\mid x_{t-1}^{(i)}, u_t, z_t)}$$
Effect: particles are concentrated in high-likelihood regions, their weights are more uniform, and far fewer particles are required to build an accurate map.
In a standard particle filter, resampling is performed at every iteration to prevent weight degeneracy. Frequent resampling can cause sample impoverishment, reducing particle diversity and potentially leading to local minima. GMapping uses selective resampling:
Only perform resampling when the effective sample size \(N_{\text{eff}}\) falls below a threshold.
The effective sample size is defined as:
$$N_{\text{eff}} = \dfrac{1}{\sum_{i=1}^{N} (w_t^{(i)})^2}$$
When \(N_{\text{eff}} < N_{\text{threshold}}\) (typically \(N/2\) or \(N/3\)), resampling is triggered.
Benefits: particle diversity is preserved, sample impoverishment is avoided, and unnecessary resampling steps (and their computational cost) are skipped.
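A minimal NumPy sketch of the selective-resampling check (threshold of \(N/2\) assumed; low-variance/systematic resampling is one common choice, not necessarily the exact routine used inside GMapping):

```python
import numpy as np

def selective_resample(particles, weights, rng):
    """Resample only when the effective sample size drops below N/2."""
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                               # normalize weights
    n_eff = 1.0 / np.sum(w ** 2)               # effective sample size
    if n_eff >= n / 2.0:
        return particles, w                    # keep current particle set
    # systematic (low-variance) resampling
    positions = (rng.random() + np.arange(n)) / n
    indices = np.searchsorted(np.cumsum(w), positions)
    new_particles = [particles[i] for i in indices]
    return new_particles, np.full(n, 1.0 / n)  # weights reset to uniform

# usage: particles, weights = selective_resample(particles, weights, np.random.default_rng())
```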
In the ROS navigation stack, path planning is handled by the `move_base` node, which integrates inputs from odometry, robot pose estimation, and map data.
This node performs both global planning and local planning: the global planner generates an overall path to the target, while the local planner continuously adjusts the trajectory in response to environmental changes, thus achieving autonomous obstacle avoidance during navigation.
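For instance, a navigation goal can be sent to this node through its standard actionlib interface; the node name `move_base` and the message types are the standard ROS ones, while the goal coordinates below are placeholder values:

```python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node("send_nav_goal")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"       # goal expressed in the map frame
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 1.0         # placeholder coordinates
goal.target_pose.pose.position.y = 0.5
goal.target_pose.pose.orientation.w = 1.0      # face along +x

client.send_goal(goal)                          # move_base plans and follows the path
client.wait_for_result()
```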
Below is the standard MCL (Monte Carlo Localization) algorithm presented as pseudocode. Variables: \(\mathcal{X}_{t-1}\) is the particle set at time \(t-1\), \(u_t\) the control input, \(z_t\) the current observation, and \(m\) the map.
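Since the pseudocode figure itself is not reproduced here, the following Python-style sketch outlines one MCL iteration; `sample_motion_model` and `measurement_likelihood` are placeholders for the motion and sensor models rather than functions from any particular library.

```python
import random

def mcl(X_prev, u_t, z_t, m):
    """One iteration of Monte Carlo Localization (Python-style pseudocode).

    X_prev : list of (pose, weight) particles at time t-1
    u_t    : control / odometry input; z_t : current observation; m : map
    sample_motion_model() and measurement_likelihood() stand in for the
    motion and sensor models and are not defined here.
    """
    X_bar = []
    for pose_prev, _ in X_prev:
        pose = sample_motion_model(u_t, pose_prev)      # prediction: sample p(x_t | x_{t-1}, u_t)
        weight = measurement_likelihood(z_t, pose, m)   # correction: p(z_t | x_t, m)
        X_bar.append((pose, weight))

    # resample M particles with probability proportional to weight
    M = len(X_bar)
    poses = [p for p, _ in X_bar]
    weights = [w for _, w in X_bar]
    X_t = [(p, 1.0 / M) for p in random.choices(poses, weights=weights, k=M)]
    return X_t
```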
AMCL (Adaptive Monte Carlo Localization) is a particle-filter-based method for robot localization. Its objective is to estimate the robot pose over time from sensor measurements and control inputs. We denote the robot pose at time \(t\) by \(x_t\); the control (odometry) input applied between time \(t-1\) and \(t\) by \(u_t\); and the sensor observation at time \(t\) by \(z_t\). When referring to histories we use the shorthand sequences \(x_{1:t}\), \(u_{1:t}\) and \(z_{1:t}\). Using the Bayes filter formulation, the posterior can be written as:
$$ p(x_t\mid z_{1:t}, u_{1:t}) = \eta\; p(z_t\mid x_t)\; p(x_t\mid z_{1:t-1}, u_{1:t}) $$
Here the prediction (motion update) step is given by the ChapmanāKolmogorov equation:
$$ p(x_t\mid z_{1:t-1}, u_{1:t}) = \int p(x_t\mid x_{t-1}, u_t)\; p(x_{t-1}\mid z_{1:t-1}, u_{1:t-1})\, dx_{t-1}. $$
The measurement update then incorporates the current observation \(z_t\) via the likelihood \(p(z_t\mid x_t)\), and \(\eta\) is a normalizing constant.
AMCL approximates the posterior with a set of weighted particles \(\{x_t^{(i)}, w_t^{(i)}\}_{i=1}^M\). The particle filter loop typically performs three steps: prediction (propagating each particle through the motion model with \(u_t\)), measurement update (re-weighting each particle by the observation likelihood \(p(z_t\mid x_t^{(i)})\)), and resampling (drawing a new particle set in proportion to the weights).
Here, M denotes the number of particles.
$$ w_t^{(i)} \propto w_{t-1}^{(i)}\; p(z_t\mid x_t^{(i)}) $$
A common metric to decide whether to resample is the Effective Sample Size (ESS):
$$ N_{\text{eff}} = \dfrac{1}{\sum_{i=1}^{M} (w_t^{(i)})^2} $$
When \(N_{\text{eff}}\) falls below a chosen threshold (for example \(M/2\)), resampling is performed. AMCL uses adaptive resampling strategies (and can vary the number of particles) based on ESS to balance computational cost and localization accuracy. This adaptive behavior helps maintain robustness in real-time robot localization.
The `move_base` stack commonly uses costmaps to represent the environment and perform planning. In ROS, the `costmap_2d` package generates different costmap layers on top of the base map to support both local and global planners. Typical layers include the static layer (built from the published map), the obstacle layer (updated from live sensor data such as laser scans), and the inflation layer (which expands obstacles by the robot radius plus a safety margin to produce a gradient of cost).
The navigation stack typically separates planning into global planners (produce a full path on a known map) and local planners (reactive, operate in velocity space to follow the global path while avoiding obstacles).
We designed and deployed a SLAM system on NVIDIA Jetson, integrating 2D LiDAR and RealSense D435 for robust multi-sensor fusion. Implemented Google Cartographer for high-accuracy 2D localization and mapping, and extended to 3D ESDF-based mapping for precise environmental reconstruction and optimized path planning in real-world navigation tasks.
We used OpenCV to rectify the camera-captured map and extract obstacle coordinates for path planning. Based on the map, we designed an efficient motion-planning algorithm using the A* search algorithm, enabling the car to avoid randomly positioned obstacles. We then implemented a PID controller for precise speed and steering control, ensuring smooth navigation.
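The project's exact controller is not shown here; the following is a minimal discrete PID sketch of the kind described (the gains and time step are illustrative assumptions, not tuned values from this system):

```python
class PID:
    """Minimal discrete PID controller (illustrative gains, not tuned values)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: regulate wheel speed toward 0.5 m/s at a 50 Hz control rate
speed_pid = PID(kp=1.2, ki=0.4, kd=0.05, dt=0.02)
command = speed_pid.update(setpoint=0.5, measurement=0.42)
```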
We use OpenCV's `convertScaleAbs` to increase contrast (alpha=1.5) so colors are easier to distinguish.

Sampling strategy (9x9 grid center sampling):
```python
i = j = 40
tiles = []
# Start from (40, 40), step = 80 pixels: sample the center of each grid cell.
# Covers the 720x720 image -> 9x9 = 81 tiles.
while i < 720:
    while j < 720:
        tiles.append(hsv2color(hsv[i, j]))  # classify the color at the cell center
        j += 80
    j = 40
    i += 80
```
Color to state mapping for each sampled grid center:
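The exact HSV thresholds used by `hsv2color` are not reproduced here; a hypothetical version illustrating the idea (state codes chosen so that 0 marks an obstacle, matching the graph code below; all ranges are placeholder values) might look like:

```python
def hsv2color(pixel):
    """Map one sampled HSV pixel to a cell state code (placeholder thresholds).

    Returns 0 for an obstacle and non-zero codes for other cell types; only the
    obstacle/passable distinction is relied on by the graph construction below.
    """
    h, s, v = int(pixel[0]), int(pixel[1]), int(pixel[2])
    if s < 40 and v < 60:          # dark, unsaturated -> obstacle (hypothetical)
        return 0
    if h < 10 or h > 170:          # red-ish -> start marker (hypothetical)
        return 2
    if 100 <= h <= 130:            # blue-ish -> treasure marker (hypothetical)
        return 3
    return 1                       # anything else treated as passable floor
```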
Graph construction logic:
```python
import networkx as nx

# Build graph: node indices 0-80, row-major from top-left to bottom-right.
# Weight strategy: edges into obstacle cells cost 1000 (penalized but not
# forbidden), edges into passable cells cost 1.
G = nx.Graph()
for target in range(0, 81):
    up = target - 9
    down = target + 9
    left = target - 1
    right = target + 1
    if up >= 0:
        weight = 1000 if tiles[up] == 0 else 1
        G.add_weighted_edges_from([(target, up, weight)])
    if down < 81:
        weight = 1000 if tiles[down] == 0 else 1
        G.add_weighted_edges_from([(target, down, weight)])
    # ensure left/right neighbors do not cross row boundaries
    if left // 9 == target // 9:
        weight = 1000 if tiles[left] == 0 else 1
        G.add_weighted_edges_from([(target, left, weight)])
    if right // 9 == target // 9:
        weight = 1000 if tiles[right] == 0 else 1
        G.add_weighted_edges_from([(target, right, weight)])

# Two-stage routing: first to the treasure, then from the treasure to the exit.
# start, treasure, destination are node indices determined from the color mapping.
find_treasure = nx.dijkstra_path(G, source=start, target=treasure)
find_exit = nx.dijkstra_path(G, source=treasure, target=destination)
```
A* uses a heuristic to guide the search. For a 2D grid, the Manhattan distance is a simple admissible heuristic:
```python
def manhattan(u, v):
    """Admissible heuristic: Manhattan distance between grid cells u and v."""
    ux, uy = divmod(u, 9)   # row, column of node u
    vx, vy = divmod(v, 9)   # row, column of node v
    return abs(ux - vx) + abs(uy - vy)

# Using networkx's A* (assuming G is built as above and weights represent cost):
path_to_treasure = nx.astar_path(G, source=start, target=treasure,
                                 heuristic=manhattan, weight='weight')
path_to_exit = nx.astar_path(G, source=treasure, target=destination,
                             heuristic=manhattan, weight='weight')
```