Augmented reality (AR) overlays place virtual items onto live camera input using tracking and segmentation. For eyewear, overlays typically anchor frames to facial landmarks (eyes, nose bridge, temples) and follow head rotation. For clothing, overlays may use shoulder and torso landmarks to estimate position and scale. Overlay approaches can be pixel-based (2D image warping) or geometry-based (mapping a 3D asset to detected coordinates). The two approaches trade visual realism against execution speed: geometry-based methods can render more convincing perspective, but they generally need more processing power than 2D warping.
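To make the pixel-based approach concrete, the sketch below fits a 2D similarity transform (uniform scale, rotation, translation) from two landmark correspondences, such as an eyewear asset's lens centers matched to detected eye centers. It is a minimal illustration rather than a production method: the function names, asset coordinates, and landmark values are hypothetical, and it assumes a landmark detector has already supplied pixel coordinates for the current frame.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
# (scale, rotation in radians, tx, ty)
Similarity = Tuple[float, float, float, float]


def similarity_from_two_points(src_a: Point, src_b: Point,
                               dst_a: Point, dst_b: Point) -> Similarity:
    """Solve the 2D similarity transform that maps src_a -> dst_a and
    src_b -> dst_b. Two point correspondences determine it exactly."""
    sx, sy = src_b[0] - src_a[0], src_b[1] - src_a[1]
    dx, dy = dst_b[0] - dst_a[0], dst_b[1] - dst_a[1]
    src_len = math.hypot(sx, sy)
    if src_len == 0.0:
        raise ValueError("source anchor points must be distinct")
    # Scale: ratio of landmark distance to asset anchor distance.
    scale = math.hypot(dx, dy) / src_len
    # Rotation: difference between the two segment angles (e.g. head roll).
    angle = math.atan2(dy, dx) - math.atan2(sy, sx)
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    # Translation chosen so that src_a lands exactly on dst_a.
    tx = dst_a[0] - (c * src_a[0] - s * src_a[1])
    ty = dst_a[1] - (s * src_a[0] + c * src_a[1])
    return scale, angle, tx, ty


def apply_similarity(t: Similarity, p: Point) -> Point:
    """Map an asset-space point into camera-frame pixel coordinates."""
    scale, angle, tx, ty = t
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


# Hypothetical eyewear texture (400 x 160 px): lens centers in asset space.
ASSET_LEFT_LENS, ASSET_RIGHT_LENS = (120.0, 80.0), (280.0, 80.0)
# Hypothetical eye centers reported by a face-landmark detector for one frame.
left_eye, right_eye = (310.0, 240.0), (390.0, 250.0)

glasses_t = similarity_from_two_points(ASSET_LEFT_LENS, ASSET_RIGHT_LENS,
                                       left_eye, right_eye)
# Where the texture's four corners land in the camera frame.
corners = [apply_similarity(glasses_t, p)
           for p in [(0.0, 0.0), (400.0, 0.0), (400.0, 160.0), (0.0, 160.0)]]
```

The same function applies to clothing by pairing a garment's shoulder-seam points with detected shoulder landmarks. In a full 2D pipeline, the transform would be converted to a 2×3 affine matrix and passed to an image-warping routine (for example OpenCV's `cv2.warpAffine`); a geometry-based pipeline would instead estimate a 3D head or body pose, which also captures out-of-plane rotation that a 2D similarity transform cannot represent.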

Tracking and occlusion handling are common technical challenges for AR overlays. Reliable tracking typically relies on feature-point detection, optical flow, or model-based pose estimation to follow movement between frames. Occlusion, for example a hand passing in front of a garment, often requires depth cues or segmentation masks so that virtual objects are rendered plausibly in front of or behind body parts. Some systems use simple heuristics for common scenarios, while others use depth-sensing hardware or learned depth estimation to improve occlusion fidelity on compatible devices.
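The occlusion step can be illustrated as a per-pixel compositing rule: the overlay's alpha is multiplied by one minus an occluder mask, so the garment disappears wherever a hand or other body part is in front of it. The sketch below is a hypothetical, dependency-free illustration on tiny nested-list images. The occluder mask may come from a segmentation model or, as the hypothetical `occluder_mask_from_depth` assumes, from comparing a per-pixel scene depth map (from depth hardware or learned estimation) against the rendered garment's depth. Real systems would typically do this on the GPU or with array libraries.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Grid = List[List[float]]


def occluder_mask_from_depth(scene_depth: Grid, garment_depth: Grid) -> Grid:
    """1.0 where the real scene (e.g. a hand) is closer to the camera than the
    rendered garment at that pixel, else 0.0. Both maps use the same units.
    Pixels with no garment can use float('inf') so they never count as occluded."""
    return [[1.0 if s < g else 0.0 for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(scene_depth, garment_depth)]


def composite(camera: List[List[RGB]], overlay: List[List[RGB]],
              overlay_alpha: Grid, occluder_mask: Grid) -> List[List[RGB]]:
    """Alpha-blend the rendered overlay onto the camera frame, suppressing it
    wherever the occluder mask marks a body part in front of the garment."""
    out: List[List[RGB]] = []
    for y, cam_row in enumerate(camera):
        out_row: List[RGB] = []
        for x, cam_px in enumerate(cam_row):
            # Soft masks (values between 0 and 1) give feathered occlusion edges.
            a = overlay_alpha[y][x] * (1.0 - occluder_mask[y][x])
            r, g, b = (round(a * o + (1.0 - a) * c)
                       for o, c in zip(overlay[y][x], cam_px))
            out_row.append((r, g, b))
        out.append(out_row)
    return out
```

Keeping mask generation separate from compositing lets the same blend step accept either a segmentation mask or a depth-derived mask, depending on what the device supports.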

Environmental variability affects AR overlay performance. Lighting changes can alter perceived color and shading, so many systems apply automatic white-balance adjustments or estimate scene illumination to shade virtual items more consistently with their surroundings. Cluttered backgrounds and loose clothing can interfere with landmark detection; guidance screens that suggest form-fitting clothing or a plain backdrop may improve initial capture. Designers often frame such guidance as optional suggestions rather than strict requirements.
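One simple way to estimate scene illumination is the gray-world assumption, which treats the average color of the camera frame as neutral gray and attributes any color cast to the light source. The text above does not name a specific method; gray-world is a swapped-in, deliberately basic technique, and the function names below are hypothetical. The sketch estimates per-channel light multipliers from the frame and tints a virtual item's base color by them, so the item picks up the same warm or cool cast as its surroundings.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def estimate_illuminant(pixels: Sequence[RGB]) -> RGB:
    """Gray-world estimate: assume the scene averages to neutral gray, so any
    cast in the per-channel means is attributed to the light source.
    Returns per-channel multipliers normalized to average 1.0."""
    if not pixels:
        raise ValueError("need at least one pixel")
    means = [sum(p[i] for p in pixels) / len(pixels) for i in range(3)]
    gray = sum(means) / 3.0
    if gray == 0.0:
        return (1.0, 1.0, 1.0)  # black frame: no usable color information
    return (means[0] / gray, means[1] / gray, means[2] / gray)


def shade_overlay_color(albedo: RGB, illuminant: RGB) -> RGB:
    """Tint an asset's base color by the estimated light so it shares the
    scene's cast. Values are clamped to the 0-255 range."""
    return (min(255.0, albedo[0] * illuminant[0]),
            min(255.0, albedo[1] * illuminant[1]),
            min(255.0, albedo[2] * illuminant[2]))
```

Gray-world breaks down when one color dominates the frame (for example, a user standing in front of a red wall), which is one reason platform light-estimation features or learned estimators are often preferred in practice.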

Operational considerations include asset preparation and camera permissions. Virtual assets used in overlays need consistent origin points, scaling metadata, and simplified collision geometry. Developers may prepare multiple levels of detail (LODs) for each asset to accommodate a range of device capabilities. Requesting camera permission with a clear explanation of how image data is handled is common practice: many implementations either process images on-device or transmit only non-identifying fit parameters to back-end systems, in line with privacy expectations.
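The asset metadata and LOD selection described above might be modeled as follows. This is a hypothetical sketch: the `OverlayAsset` and `LodLevel` types, the mesh file names, and the idea of expressing a device's capability as a single triangle budget are all assumptions made for illustration. `pick_lod` returns the most detailed level that fits the budget and falls back to the coarsest level otherwise.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class LodLevel:
    triangle_count: int
    mesh_uri: str  # hypothetical path to the mesh file for this level


@dataclass
class OverlayAsset:
    name: str
    # Anchor point in model space, e.g. the nose-bridge center for eyewear.
    origin: Tuple[float, float, float]
    # Scaling metadata: real-world meters represented by one model unit.
    meters_per_unit: float
    lods: List[LodLevel] = field(default_factory=list)

    def pick_lod(self, triangle_budget: int) -> LodLevel:
        """Most detailed LOD within the budget, else the coarsest available."""
        if not self.lods:
            raise ValueError(f"asset {self.name!r} has no LODs")
        fitting = [lod for lod in self.lods if lod.triangle_count <= triangle_budget]
        if fitting:
            return max(fitting, key=lambda lod: lod.triangle_count)
        return min(self.lods, key=lambda lod: lod.triangle_count)


# Hypothetical eyewear asset authored in millimeters with three detail levels.
glasses = OverlayAsset(
    name="aviator",
    origin=(0.0, 0.0, 0.0),
    meters_per_unit=0.001,
    lods=[LodLevel(40_000, "aviator_hi.glb"),
          LodLevel(12_000, "aviator_mid.glb"),
          LodLevel(3_000, "aviator_lo.glb")],
)
mid_range_choice = glasses.pick_lod(triangle_budget=15_000)  # the 12,000-triangle level
```

Falling back to the coarsest level, rather than refusing to render, keeps the overlay available on low-end devices at reduced visual quality.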