Depth Perception

The visual system's ability to perceive the three-dimensional structure of the world from two-dimensional retinal images, using binocular and monocular depth cues.

The retinal image is fundamentally two-dimensional, yet we experience a richly three-dimensional world. Depth perception — the ability to perceive distance and spatial relationships among objects — relies on an impressive array of cues that the visual system combines to construct a coherent spatial representation. Understanding how the brain solves this "inverse problem" has been central to vision science since the 19th century.

Binocular Cues

Because our two eyes are separated horizontally by approximately 6 centimeters, each receives a slightly different view of the world. This difference, called binocular disparity, provides powerful depth information for objects within several meters. The brain computes disparity by matching corresponding features across the two retinal images — a process called stereopsis.

Binocular disparity: δ = α_L − α_R

where δ is the disparity, and α_L and α_R are the angular positions of an object on the left and right retinas relative to the fixation point.
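
To make the geometry concrete, the sketch below computes δ for a simplified layout in which the fixation point and a second object both lie straight ahead of the observer. The interocular separation of 0.06 m follows the figure given above, and the function names are illustrative only.

    import math

    def angle_from_eye(eye_x, distance, point_x=0.0):
        """Angle (radians) of a point relative to straight ahead, as seen from an
        eye at horizontal position eye_x (meters)."""
        return math.atan2(point_x - eye_x, distance)

    def disparity(point_distance, fixation_distance, interocular=0.06):
        """Binocular disparity delta = alpha_L - alpha_R for a point straight ahead,
        with angles measured relative to the fixation point (also straight ahead)."""
        half = interocular / 2.0
        alpha_L = angle_from_eye(-half, point_distance) - angle_from_eye(-half, fixation_distance)
        alpha_R = angle_from_eye(+half, point_distance) - angle_from_eye(+half, fixation_distance)
        return alpha_L - alpha_R

    # Fixating at 1 m, a point at 0.8 m yields roughly 0.86 degrees of disparity.
    print(math.degrees(disparity(0.8, 1.0)))

Under this sign convention, points nearer than fixation (crossed disparity) come out positive and points farther than fixation (uncrossed disparity) come out negative.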

Béla Julesz's random-dot stereograms (1960) demonstrated that stereopsis does not require monocular form recognition: depth can be perceived from patterns that contain no recognizable objects in either eye alone. This established that binocular disparity processing occurs at an early stage, before object recognition.
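
The logic of Julesz's construction is easy to sketch in code. In the toy version below, the two images share the same random dots except for a central square that is shifted horizontally in one image, with the uncovered strip refilled by fresh dots; the sizes, shift, and helper name are illustrative rather than taken from the original work.

    import numpy as np

    def random_dot_stereogram(size=200, square=80, shift=6, seed=0):
        """Return a left/right image pair in which depth is carried only by disparity."""
        rng = np.random.default_rng(seed)
        left = rng.integers(0, 2, size=(size, size))   # random black/white dots
        right = left.copy()
        lo = (size - square) // 2
        hi = lo + square
        # Shift the central square horizontally in the right eye's image ...
        right[lo:hi, lo - shift:hi - shift] = left[lo:hi, lo:hi]
        # ... and fill the uncovered strip with new random dots, so neither image
        # contains any monocular outline of the square.
        right[lo:hi, hi - shift:hi] = rng.integers(0, 2, size=(square, shift))
        return left, right

    left, right = random_dot_stereogram()

Viewed in a stereoscope with each image routed to the corresponding eye, the shifted square appears to float in front of its surround, even though neither image alone contains any visible square (the depth sign reverses if the pair is cross-fused).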

Convergence — the inward rotation of the eyes when fixating near objects — provides an additional binocular cue through proprioceptive signals from the extraocular muscles, though it is effective only at close distances.
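
A quick calculation shows why convergence is informative only at near distances. The sketch below assumes a fixation point straight ahead and the roughly 6 cm interocular separation mentioned earlier.

    import math

    def convergence_angle_deg(distance, interocular=0.06):
        """Angle (degrees) between the two lines of sight when both eyes fixate a
        point straight ahead at the given distance (meters)."""
        return math.degrees(2 * math.atan(interocular / (2 * distance)))

    # The angle changes steeply up close (~13.7 degrees at 25 cm, ~3.4 at 1 m)
    # but is nearly flat beyond a few meters (~0.2 degrees at 20 m), so the
    # oculomotor signal carries little depth information at a distance.
    for d in (0.25, 0.5, 1.0, 5.0, 20.0):
        print(d, round(convergence_angle_deg(d), 2))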

Monocular Cues

A rich set of monocular (pictorial) cues allows depth perception even with one eye. These include occlusion (nearer objects block farther ones), relative size (more distant objects subtend smaller visual angles), texture gradients (surface texture becomes denser and finer with distance), linear perspective (parallel lines converge toward a vanishing point), atmospheric perspective (distant objects appear hazier and bluer), and height in the visual field (for objects below the horizon, the more distant ones appear higher).
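
The relative-size cue follows directly from the geometry of visual angle. The small sketch below, with purely illustrative numbers, shows how the same physical size subtends a much smaller angle at a greater distance.

    import math

    def visual_angle_deg(object_size, distance):
        """Visual angle (degrees) subtended by an object of a given size at a given
        distance (both in the same units)."""
        return math.degrees(2 * math.atan(object_size / (2 * distance)))

    # Two people of the same height (1.75 m), one at 5 m and one at 20 m:
    print(visual_angle_deg(1.75, 5))    # ~19.9 degrees
    print(visual_angle_deg(1.75, 20))   # ~5.0 degrees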

Motion parallax provides another powerful monocular cue: when the observer moves, nearby objects shift more rapidly across the retinal image than distant ones. This differential velocity directly specifies the relative distances of objects in the scene.
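
A back-of-the-envelope sketch of this geometry, assuming a laterally translating observer, points roughly straight ahead, and the small-angle approximation (the speeds and distances are illustrative):

    def retinal_angular_speed(observer_speed, distance):
        """Approximate angular speed (rad/s) at which a point roughly straight ahead
        sweeps across the retina of a laterally moving observer: about v / d."""
        return observer_speed / distance

    # Walking at 1.5 m/s, a tree 2 m away sweeps across the retina ten times
    # faster than one 20 m away, directly signalling their relative distances.
    for d in (2.0, 20.0):
        print(d, retinal_angular_speed(1.5, d))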

The Visual Cliff

Eleanor Gibson and Richard Walk's (1960) visual cliff experiment demonstrated that depth perception develops early. Infants as young as six months refused to crawl over a glass surface that appeared to drop off sharply, despite tactile evidence of a solid surface. This showed that depth cues are functional early in development, though the relative contributions of innate mechanisms and visual experience remain debated.

Cue Integration

The visual system does not rely on any single depth cue but integrates multiple cues according to their reliability. Bayesian models of cue combination propose that the brain weights each cue in proportion to its precision (inverse variance), producing a combined estimate that is more accurate than any individual cue alone. This optimal or near-optimal integration has been demonstrated experimentally for combinations of binocular disparity, texture, and motion cues.
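
The inverse-variance weighting scheme can be written down in a few lines. The sketch below assumes independent Gaussian noise on each cue; the cue values and variances are illustrative.

    def combine_cues(estimates, variances):
        """Combine independent Gaussian cue estimates by inverse-variance weighting.
        Returns the combined estimate and its (smaller) variance."""
        weights = [1.0 / v for v in variances]
        total = sum(weights)
        combined = sum(w * x for w, x in zip(weights, estimates)) / total
        return combined, 1.0 / total

    # Disparity says the surface is 95 cm away (variance 4); texture says 105 cm (variance 16).
    depth, var = combine_cues([95.0, 105.0], [4.0, 16.0])
    print(depth, var)   # 97.0 cm with variance 3.2

Because the weights are proportional to each cue's precision, the combined variance (3.2) is lower than that of the best single cue (4.0), which is exactly the benefit that near-optimal integration provides.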

Neural Basis

Disparity-selective neurons are found throughout the visual cortex, beginning in V1 and extending into V2, V3, V4, and parietal areas. Different cortical regions appear to specialize in different aspects of depth processing: V3A is particularly important for processing relative disparity, while MT/V5 combines disparity with motion information for depth from motion.

The dorsal visual stream plays a critical role in using depth information for action — guiding reaching and grasping movements — while the ventral stream uses depth for object recognition and scene understanding.
