A screen-to-camera optical data link that encodes a text string as phase angles in the 2-D discrete Fourier transform of a displayed image. A webcam reads the signal and a browser-based decoder recovers the original text.
```shell
pip install flask numpy opencv-python scipy Pillow
python server.py   # open http://localhost:5000
```
Point your webcam at the left half of the encoder display.
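The encoder builds a complex frequency-domain map F[fy, fx] of size
N × N (N = 256). It places energy at a small set of known frequency-bin
coordinates and sets the phase of each bin to carry one byte b of data:

$$\varphi = 2\pi \cdot \frac{b}{255}$$

The inverse mapping recovers the byte (with φ taken in [0, 2π)):

$$b = \operatorname{round}\!\left(\frac{\varphi}{2\pi} \cdot 255\right) \bmod 256$$

A single phase measurement therefore has a resolution of one byte step:

$$\Delta\varphi = \frac{2\pi}{255} \approx 0.0246\ \text{rad} \approx 1.41^{\circ}$$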
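The following is a minimal Python sketch of the byte/phase mapping. It assumes the encoder uses φ = 2π·b/255, the inverse of the decoder's byte = round(φ/2π · 255) step. The actual server.py code may differ, and the function names are hypothetical. The sketch makes one deliberate change: it reduces modulo 255 rather than 256. With that change, a phase reported in (−π, π] (as numpy.angle returns it) decodes correctly, and so does a phase just below 2π. On a 255-step scale, byte 255 and byte 0 share the same angle anyway.

```python
import numpy as np

STEPS = 255  # byte values per full 2*pi turn, as in round(phi / 2pi * 255)

def byte_to_phase(b: int) -> float:
    """Encoder side (assumed): phase in radians that carries byte b."""
    return 2 * np.pi * b / STEPS

def phase_to_byte(phi: float) -> int:
    """Decoder side: nearest byte step, valid for any angle.
    Reduces modulo 255 (not 256), so 0 rad and 2*pi rad both give byte 0."""
    return int(round(phi / (2 * np.pi) * STEPS)) % STEPS

# Round trip through a measured angle, which numpy reports in (-pi, pi]
measured = np.angle(np.exp(1j * byte_to_phase(200)))
print(phase_to_byte(measured))                                    # 200
print(phase_to_byte(np.angle(np.exp(1j * byte_to_phase(255)))))   # 0: byte 255 aliases to 0
```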
To produce a real-valued spatial image the frequency map must be Hermitian-symmetric
(indices taken modulo N):

$$F[-f_y, -f_x] = F^{*}[f_y, f_x]$$

Every data carrier (fy, fx) therefore also implies a conjugate at
(−fy, −fx). So that no two carriers collide, none of the 8 data-carrier
positions is the point reflection (−fy, −fx) of another one.
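The sketch below places each data carrier together with its conjugate and checks that the inverse FFT comes out real. It assumes the carrier table in this README and NumPy's unshifted DFT layout, where negative frequencies wrap modulo N. It also checks the "no carrier on another's conjugate" rule. Pilots and the spatial window are left out for brevity. build_spectrum is a hypothetical helper, not the project's API.

```python
import numpy as np

N = 256
# Data-carrier bins (fy, fx) from the carrier table in this README
CARRIERS = [(14, 0), (0, 14), (14, 14), (14, -14),
            (20, 0), (0, 20), (20, 10), (10, 20)]

def build_spectrum(phases, amp=1.0):
    """Hypothetical helper: unshifted N x N spectrum holding each carrier and its
    conjugate twin; negative frequencies wrap modulo N (numpy FFT layout)."""
    F = np.zeros((N, N), dtype=complex)
    for (fy, fx), phi in zip(CARRIERS, phases):
        F[fy % N, fx % N] = amp * np.exp(1j * phi)
        F[-fy % N, -fx % N] = amp * np.exp(-1j * phi)   # F[-f] = conj(F[f])
    return F

# No carrier may sit on another carrier's conjugate position
conjugates = {(-fy, -fx) for fy, fx in CARRIERS}
assert not conjugates & set(CARRIERS)

phases = 2 * np.pi * np.array([72, 101, 108, 108, 111, 33, 0, 200]) / 255
image = np.fft.ifft2(build_spectrum(phases))
print(np.abs(image.imag).max())   # only floating-point rounding remains
spatial = image.real              # the real-valued pattern the encoder would display
```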
The spatial image is multiplied by a 2-D window (Hann or Blackman, chosen in encoder settings) to suppress the spectral leakage that would otherwise spread energy from the strong pilot bins across neighbouring data bins.
Every symbol is held for symbol_frames frames. During the last 20% of
each symbol each carrier is interpolated in the complex plane with a
raised-cosine cross-fade, avoiding a discontinuous phase jump that would
smear energy across the spectrum:

$$F(t') = \bigl(1 - w(t')\bigr)\,F_{\text{cur}} + w(t')\,F_{\text{next}}, \qquad w(t') = \tfrac{1}{2}\bigl(1 - \cos \pi t'\bigr)$$

where t' ∈ [0, 1] is the normalised position within the blend region.
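Here is a sketch of the cross-fade for one carrier. It assumes a 0-based frame counter within the symbol and a blend region of round(0.2 × symbol_frames) frames that ends exactly on the next symbol's value. carrier_at_frame is a hypothetical name.

```python
import numpy as np

def blend_weight(t):
    """Raised-cosine weight w(t') = (1 - cos(pi t')) / 2: 0 at t'=0, 1 at t'=1."""
    return 0.5 * (1.0 - np.cos(np.pi * t))

def carrier_at_frame(frame, symbol_frames, current, nxt, blend_frac=0.2):
    """Complex value of one carrier at 0-based `frame` of the current symbol."""
    blend_frames = max(1, round(symbol_frames * blend_frac))
    blend_start = symbol_frames - blend_frames
    if frame < blend_start:
        return current                                 # stable part of the symbol
    t = (frame - blend_start + 1) / blend_frames       # t' in (0, 1]; last frame reaches 1
    w = blend_weight(t)
    return (1 - w) * current + w * nxt                 # linear interpolation in the complex plane

a, b = np.exp(1j * 0.0), np.exp(1j * np.pi)            # a worst-case pi phase jump
print(abs(carrier_at_frame(80, 90, a, b)))             # ~0: the magnitude dips mid-blend
```

The interpolation is linear in the complex plane, so the magnitude dips mid-blend when consecutive phases are far apart. For a π jump it passes through zero. Frames captured in that region are the ones affected by symbol-transition corruption.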
Four pilot bins at symmetric corners (±9, ±9) carry a fixed known
phase of 0 rad at a fixed amplitude.
Purposes:
| Purpose | Mechanism |
|---|---|
| Global phase anchor | Pilots are sent at 0 rad, so their measured phase estimates any global phase offset introduced by the channel |
| Geometric de-skew | Four pilot peaks form a square; their displacement from expected positions reveals camera rotation, zoom, and keystone distortion |
| Per-symbol alignment | Because pilots are constant they can be averaged across many frames to sharpen the distortion estimate |
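
The decoder pipeline computes φ_global as mean(measured pilot phases). An arithmetic mean of angles breaks when the readings straddle ±π. The sketch below swaps in a circular mean, i.e. the angle of the summed unit phasors. This is an assumption about one way to implement the step, not the project's code, and the pilot readings are made up.

```python
import numpy as np

def global_phase(pilot_phases):
    """Circular mean of the pilot phases: angle of the summed unit phasors."""
    return float(np.angle(np.sum(np.exp(1j * np.asarray(pilot_phases)))))

# Four pilot readings scattered around +/-pi (made-up values)
pilots = [3.05, 3.10, -3.12, -3.08]
print(np.mean(pilots))        # about -0.0125: the naive mean points the opposite way
print(global_phase(pilots))   # about 3.129, i.e. pi - 0.0125
```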
```
webcam frame
│
▼
resize to 256×256 (bilinear)
│
▼
normalise to [0, 1]
│
▼
multiply 2-D window W[y,x] ← suppresses leakage
│
▼
2-D DFT F = FFT2(frame · W)
│
▼
fftshift → magnitude and phase maps
│
├──► temporal average over temporal_avg frames (reduces noise)
│
▼
find pilot peaks (±search_r bins around expected position)
│
├──► global phase offset φ_global = mean(measured pilot phases)
│
├──► (optional) homography correction using 4 pilot positions
│
▼
read phase at each data-carrier bin
│
▼
apply corrections: φ_corr = φ_meas − φ_global − φ_offset − φ_nudge[i]
│
▼
byte = round(φ_corr / 2π · 255) mod 256
│
▼
multi-symbol ring buffer → scan for 0x00 sentinels → extract message
```
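This is a condensed Python sketch of the decoder stages above for one symbol. It assumes the frames have already been resized to 256 × 256 grayscale and uses the pilot and carrier positions from the carrier table. Homography correction, the SNR gate and the ring buffer are left out. spectrum and decode_symbol are hypothetical names. φ_global is a circular mean here, where the pipeline says "mean". The final byte is reduced modulo 255 rather than 256, so phases just either side of 0 decode to the same byte.

```python
import numpy as np

N = 256
PILOTS = [(9, 9), (9, -9), (-9, 9), (-9, -9)]          # (fy, fx)
CARRIERS = [(14, 0), (0, 14), (14, 14), (14, -14),
            (20, 0), (0, 20), (20, 10), (10, 20)]

def spectrum(frame, window):
    """Normalise to [0, 1], apply the 2-D window, FFT, and move DC to the centre."""
    f = frame.astype(np.float64)
    f = (f - f.min()) / (f.max() - f.min() + 1e-12)
    return np.fft.fftshift(np.fft.fft2(f * window))

def decode_symbol(frames, window, phi_offset=0.0, nudges=None):
    """Decode 8 bytes from a list of 256 x 256 grayscale frames of one symbol."""
    F = np.mean([spectrum(fr, window) for fr in frames], axis=0)  # coherent average
    c = N // 2                                                   # DC index after fftshift
    # Global phase from the pilots (circular mean of the four pilot phases)
    phi_global = np.angle(sum(np.exp(1j * np.angle(F[c + fy, c + fx]))
                              for fy, fx in PILOTS))
    nudges = np.zeros(len(CARRIERS)) if nudges is None else np.asarray(nudges)
    out = []
    for i, (fy, fx) in enumerate(CARRIERS):
        phi = np.angle(F[c + fy, c + fx]) - phi_global - phi_offset - nudges[i]
        # byte = round(phi / 2pi * 255), reduced mod 255 so 0 and 2pi agree
        out.append(int(round(phi / (2 * np.pi) * 255)) % 255)
    return out
```

Usage: build a window such as np.outer(np.hanning(256), np.hanning(256)) and call decode_symbol(frames, window) to get eight byte values. A real capture would first go through a grayscale conversion and cv2.resize(frame, (256, 256), interpolation=cv2.INTER_LINEAR).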
The text string to transmit. It is UTF-8 encoded, framed by a leading and a
trailing 0x00 sentinel byte, then padded to a multiple of 8 bytes
(one 8-byte row per symbol).
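This sketch covers the framing and the sentinel scan in the ring buffer. It assumes the padding bytes are also 0x00, since the README does not say which padding value is used. UTF-8 never produces a 0x00 byte except for the NUL character itself, so a 0x00 sentinel cannot appear inside ordinary text. Both function names are hypothetical.

```python
def frame_message(text: str, row: int = 8, pad: int = 0x00) -> list[list[int]]:
    """UTF-8 encode, add 0x00 sentinels, pad to a multiple of `row` bytes.
    The padding value is an assumption; the README does not specify it."""
    data = [0x00] + list(text.encode("utf-8")) + [0x00]
    data += [pad] * (-len(data) % row)
    return [data[i:i + row] for i in range(0, len(data), row)]

def extract_messages(stream: bytes) -> list[str]:
    """Split a received byte stream on 0x00 sentinels and decode the non-empty runs."""
    return [part.decode("utf-8", errors="replace")
            for part in stream.split(b"\x00") if part]

rows = frame_message("Hello, wörld")
received = b"".join(bytes(r) for r in rows) * 2      # the message loops on screen
print(len(rows), extract_messages(received))        # 2 ['Hello, wörld', 'Hello, wörld']
```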
Applied in the spatial domain: by the encoder to the synthesised image before
display, and by the decoder to each camera frame before the FFT. A separable
window w[y,x] = w₁[y] · w₁[x] trades main-lobe width for sidelobe suppression.
| Window | Main-lobe width, null to null (bins) | Peak sidelobe (dB) | When to use |
|---|---|---|---|
| Hann | 4 | −31.5 | Good default; narrower main lobe |
| Blackman | 6 | −57.3 | When carriers are close together or SNR is low |
The sidelobe level determines how much energy from one carrier bleeds into a neighbouring carrier's bin, which sets a noise floor on the phase measurement. For carriers spaced ≥ 10 bins apart the Hann window is sufficient; Blackman is advisable at < 8 bin spacing.
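To see the trade-off numerically, this sketch works in 1-D, which is enough because the 2-D window is separable. It measures how much of an off-bin tone at 14.5 cycles per frame leaks into bin 20, 5.5 bins away. Such a tone is the kind of thing camera zoom can produce, since on-bin tones barely leak at all. The tone frequency and probe bin are illustrative choices. np.hanning and np.blackman are NumPy's symmetric windows.

```python
import numpy as np

N = 256
n = np.arange(N)

def leakage_db(window, f0=14.5, probe_bin=20):
    """Level (dB re. the spectral peak) at `probe_bin` for a tone at f0 cycles/frame."""
    tone = np.cos(2 * np.pi * f0 * n / N)
    spec = np.abs(np.fft.rfft(tone * window))
    return 20 * np.log10(spec[probe_bin] / spec.max())

windows = {"rectangular": np.ones(N), "hann": np.hanning(N), "blackman": np.blackman(N)}
for name, w in windows.items():
    print(f"{name:12s} {leakage_db(w):7.1f} dB")
```

The Blackman result sits below the Hann result, which sits well below no window at all. This is the ordering the table predicts.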
How many frames each symbol (group of 8 bytes) is displayed before advancing.
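A longer symbol gives the temporal averager more frames to accumulate from the stable
(non-blend) 80% of the symbol, which caps the coherent gain at:

$$G_{\max} = 10 \log_{10}\bigl(0.8 \cdot \text{symbol\_frames}\bigr)\ \text{dB}$$

At 30 fps, 90 frames = 3 s per symbol: 72 stable frames (2.4 s) plus an 18-frame
(0.6 s) blend. With temporal_avg = 5 the decoder integrates over 5 / 30 ≈ 0.17 s,
short enough to fit inside the stable part.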
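The timing arithmetic above fits in a small Python helper. The sketch assumes the 20% blend fraction from the encoder description. symbol_timing is a hypothetical name, and its defaults are the example values from the text.

```python
def symbol_timing(symbol_frames=90, fps=30.0, temporal_avg=5,
                  blend_frac=0.2, bytes_per_symbol=8):
    """Hypothetical helper: durations and throughput for the given settings."""
    blend = round(symbol_frames * blend_frac)      # frames in the cross-fade
    stable = symbol_frames - blend                 # frames holding one clean symbol
    return {
        "symbol_s": symbol_frames / fps,
        "stable_frames": stable,
        "blend_s": blend / fps,
        "integration_s": temporal_avg / fps,
        "averaging_fits": temporal_avg <= stable,
        "bytes_per_s": bytes_per_symbol * fps / symbol_frames,
    }

print(symbol_timing())   # symbol_s 3.0, stable_frames 72, integration_s ~0.167, bytes_per_s ~2.67
```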
Same trade-off as the encoder window. Both encoder and decoder should ideally use the same window type so the combined spectral response is predictable. If the encoder window is unknown, Blackman is the safer choice.
A global manual phase correction applied to all decoded carrier phases after the automatic pilot-based correction.
When to adjust:
If all decoded bytes are consistently wrong by the same amount (e.g., every
character is shifted by the same ASCII distance) the total phase offset is
incorrect. Increase or decrease this slider until bytes align.
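The total correction applied to each carrier bin i is:

$$\varphi_{\text{corr}}[i] = \varphi_{\text{meas}}[i] - \varphi_{\text{global}} - \varphi_{\text{offset}} - \varphi_{\text{nudge}}[i]$$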
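The sketch below shows the total correction and reproduces the symptom described under "When to adjust". It assumes the φ = 2π·b/255 phase scale. A residual offset of three byte steps that the pilots do not capture shifts every character by three code points, and setting φ_offset to that residual restores the text. The 0.4 rad pilot phase and the three-step residual are made-up demo values.

```python
import numpy as np

STEP = 2 * np.pi / 255   # phase per byte value (assumed encoder scale)

def decode_bytes(phi_meas, phi_global, phi_offset, phi_nudge):
    """phi_corr = phi_meas - phi_global - phi_offset - phi_nudge, then nearest step."""
    phi = np.asarray(phi_meas) - phi_global - phi_offset - np.asarray(phi_nudge)
    return [int(k) % 255 for k in np.round(phi / STEP)]

text = b"HELLO!!!"
phi_meas = np.array(list(text)) * STEP + 0.4 + 3 * STEP   # demo channel: 0.4 rad + 3 steps
pilot = 0.4                                                # pilots only capture the 0.4 rad
no_nudge = np.zeros(len(text))
print(bytes(decode_bytes(phi_meas, pilot, 0.0, no_nudge)))        # b'KHOOR$$$'
print(bytes(decode_bytes(phi_meas, pilot, 3 * STEP, no_nudge)))   # b'HELLO!!!'
```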
The decoder accumulates K = temporal_avg complex FFT frames and averages them in the
complex plane before reading phases:

$$\bar{F}[f_y, f_x] = \frac{1}{K} \sum_{k=1}^{K} F_k[f_y, f_x]$$

With independent noise in each frame, coherent (complex) averaging improves SNR by:

$$G = 10 \log_{10} K\ \text{dB}$$

e.g. 5 frames → +7 dB, 10 frames → +10 dB, 30 frames → +15 dB.
Trade-off: a large K assumes the phase is constant over those frames.
If the averaging window overlaps a symbol's blend region, the average mixes two
symbols and smears the phase, temporarily corrupting that byte. Keep K well below
symbol_frames (encoder) so the average stays within the stable part of the symbol.
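This Monte Carlo sketch checks the gain of coherent averaging. It assumes independent complex Gaussian noise in each frame on a unit carrier with true phase 0, and the SNR value and trial count are arbitrary. Averaging in the complex plane before taking the angle shrinks the RMS phase error by about √K.

```python
import numpy as np

def coherent_gain_db(k):
    """SNR gain of averaging k frames with independent noise: 10*log10(k) dB."""
    return 10 * np.log10(k)

def rms_phase_error(k, snr=5.0, trials=2000, seed=0):
    """RMS phase error of a unit carrier (true phase 0) after averaging k noisy
    frames in the complex plane. snr is amplitude / total complex noise std."""
    rng = np.random.default_rng(seed)
    noise = (rng.standard_normal((trials, k))
             + 1j * rng.standard_normal((trials, k))) / (snr * np.sqrt(2))
    averaged = (1.0 + noise).mean(axis=1)        # average first, then take the angle
    return float(np.sqrt(np.mean(np.angle(averaged) ** 2)))

for k in (1, 5, 10, 30):
    print(k, round(coherent_gain_db(k), 1), rms_phase_error(k))
```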
When the camera frame is scaled, rotated, or cropped, pilot peaks shift from
their expected bin positions. The decoder searches within ±r bins of each
expected pilot location and takes the argmax of the local magnitude patch.
Too small: misses pilots when geometric distortion is large.
Too large: may lock onto a data carrier instead of a pilot,
corrupting the phase-offset estimate.
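A value of r ≈ 0.5 × (bin spacing between a pilot and its nearest data carrier)
is safe. The nearest data carriers to the (±9, ±9) pilots are C2, C3 and their
conjugates at (±14, ±14), 5 bins away on each axis (≈ 7.1 bins diagonally). By this
rule r should be about 2.5 to 3.5. The default of 6 clears those carriers only if the
search region is circular: a square ±6 patch around (+9, +9) spans bins 3 to 15 on
both axes and contains C2 at (+14, +14).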
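This sketch implements the local argmax, assuming a square (±r per axis) patch on the fftshifted magnitude spectrum. The synthetic example has two made-up peaks. One is a pilot displaced one bin from (+9, +9). The other is a stronger data carrier at (+14, +14); its greater strength is an assumption for the demo. With r = 2 the search finds the pilot, but with r = 6 it locks onto the data carrier.

```python
import numpy as np

def find_peak(mag, expected, r):
    """(row, col) of the largest magnitude within +/-r bins of `expected`.
    Square patch; assumes the patch lies fully inside `mag`."""
    y0, x0 = expected
    patch = mag[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1]
    dy, dx = np.unravel_index(np.argmax(patch), patch.shape)
    return y0 - r + int(dy), x0 - r + int(dx)

c = 128                                  # DC after fftshift for N = 256
mag = np.zeros((256, 256))
mag[c + 10, c + 8] = 5.0                 # pilot expected at (+9, +9), seen one bin off
mag[c + 14, c + 14] = 10.0               # data carrier C2 (made-up, stronger than the pilot)
print(find_peak(mag, (c + 9, c + 9), r=2))   # (138, 136): the pilot
print(find_peak(mag, (c + 9, c + 9), r=6))   # (142, 142): locked onto C2
```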
A carrier bin is declared undetected (shown as ·) if its SNR
(signal vs. a local noise annulus) falls below max(1.5, threshold × 20).
Lowering the threshold accepts weaker signals but increases the chance of
decoding noise as a valid byte. Raise it if a carrier with reasonable magnitude
is flickering; lower it if a carrier is consistently showing ·.
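Here is a sketch of the detection gate. It assumes the SNR is a magnitude ratio between the carrier bin and the mean magnitude over a circular annulus around it. The annulus radii (2 to 5 bins) are hypothetical, since the README does not give them. Only the max(1.5, threshold × 20) rule comes from the text.

```python
import numpy as np

def bin_snr(mag, y, x, r_in=2, r_out=5):
    """Carrier magnitude divided by the mean magnitude in an annulus around it.
    The annulus radii are assumptions, not values taken from the project."""
    yy, xx = np.ogrid[:mag.shape[0], :mag.shape[1]]
    dist = np.hypot(yy - y, xx - x)
    ring = (dist >= r_in) & (dist <= r_out)
    return mag[y, x] / (mag[ring].mean() + 1e-12)

def detected(mag, y, x, threshold):
    """Gate from the README: undetected if SNR < max(1.5, threshold * 20)."""
    return bin_snr(mag, y, x) >= max(1.5, threshold * 20)

rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((256, 256)))     # noise floor
mag[142, 142] = 10.0                               # one carrier
print(bin_snr(mag, 142, 142), detected(mag, 142, 142, 0.1), detected(mag, 142, 142, 2.0))
```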
Controls the zoom level of the annotated FFT display: ±zoom bins around DC
are shown. This is display only — it has no effect on decoding.
Wider zoom → more context, smaller circles.
Narrower zoom → exaggerated positions, easier to see pilot displacement.
When ON, the four measured pilot peak positions are used to compute a homography (perspective transform) of the full complex FFT map before reading data-carrier phases. This corrects:
- Rotation of the screen in the camera frame
- Zoom / distance variation
- Keystone / trapezoid perspective distortion
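The homography H maps measured pilot positions (u, v) to expected pilot positions
in homogeneous coordinates:

$$\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad (u_{\text{exp}}, v_{\text{exp}}) = \left(\frac{u'}{w'}, \frac{v'}{w'}\right)$$

It is solved via the Direct Linear Transform (OpenCV findHomography) from the
four pilot correspondences.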
When to disable: if pilots are not detected correctly (high displacement or anomalous phase errors) the homography may make things worse. Turn it off and rely on pilot-phase-only correction until the pilot peaks are clean.
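The DLT is short enough to write directly in NumPy. This sketch builds the 8 × 9 constraint matrix from four correspondences and takes the null vector from the SVD. For four exact correspondences in general position the homography is unique, so cv2.findHomography(src, dst) would return the same matrix. This sketch skips the coordinate normalisation a production implementation would add. Points are (fx, fy) pairs, and the rotation and zoom in the example are made up.

```python
import numpy as np

def dlt_homography(src, dst):
    """3x3 H with dst ~ H @ [x, y, 1] from >= 4 point pairs (no normalisation)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)          # null vector of the constraint matrix
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (x, y) points through H, including the perspective divide."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Made-up distortion: pilots seen rotated by 4 degrees and zoomed by 1.1
expected = np.array([(9, 9), (9, -9), (-9, 9), (-9, -9)], dtype=float)
a = np.deg2rad(4.0)
R = 1.1 * np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
measured = expected @ R.T
H = dlt_homography(measured, expected)                 # measured -> expected
print(np.round(apply_homography(H, measured), 6))      # recovers the expected grid
```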
Fine-grained per-carrier phase correction φ_nudge[i], subtracted from carrier i's phase after the global offset.
Useful when individual carriers experience a different phase bias, e.g. due to locally different screen brightness, sensor non-uniformity, or being near a Moiré resonance frequency.
How to set: with a known test message, nudge carrier i until its decoded
byte matches the expected value. The required nudge equals the steady-state
phase error of that carrier.
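This sketch shows the calibration step. It assumes φ_meas has already been averaged over the stable part of a symbol carrying a known 8-byte test message, and uses the φ = 2π·b/255 scale. calibrate_nudges is a hypothetical helper, and the per-carrier bias values are made up. The result is wrapped to [−π, π), so a small negative bias is not reported as nearly 2π.

```python
import numpy as np

STEP = 2 * np.pi / 255   # phase per byte value (assumed encoder scale)

def wrap(phi):
    """Wrap angles to [-pi, pi)."""
    return (np.asarray(phi) + np.pi) % (2 * np.pi) - np.pi

def calibrate_nudges(phi_meas, phi_global, phi_offset, expected_bytes):
    """Hypothetical helper: per-carrier steady-state error against a known message."""
    target = np.asarray(list(expected_bytes)) * STEP
    return wrap(np.asarray(phi_meas) - phi_global - phi_offset - target)

known = b"CALIBRAT"                                                  # 8-byte test symbol
bias = np.array([0.01, -0.02, 0.05, 0.0, 0.0, -0.03, 0.02, 0.015])  # made-up per-carrier bias
phi_meas = np.array(list(known)) * STEP + 0.3 + bias                 # 0.3 rad global offset
nudges = calibrate_nudges(phi_meas, 0.3, 0.0, known)
print(np.round(nudges, 3))                                           # recovers the bias
```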
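```
Frequency plane: rows fy, columns fx, DC at (0, 0).
Only occupied bins are listed, so the axes are not to scale.

fy\fx    -20  -14  -10   -9    0   +9  +10  +14  +20
  -20      .    .  C6*    .  C4*    .    .    .    .
  -14      .  C2*    .    .  C0*    .    .  C3*    .
  -10    C7*    .    .    .    .    .    .    .    .
   -9      .    .    .    P    .    P    .    .    .
    0    C5*  C1*    .    .   DC    .    .   C1   C5
   +9      .    .    .    P    .    P    .    .    .
  +10      .    .    .    .    .    .    .    .   C7
  +14      .   C3    .    .   C0    .    .   C2    .
  +20      .    .    .    .   C4    .   C6    .    .

P      = pilot carriers P0–P3 at (±9, ±9), phase = 0 rad (fixed)
C0–C7  = data carriers
Cn*    = Hermitian conjugate of Cn at (−fy, −fx)
```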
Actual carrier frequencies:
| Carrier | (fy, fx) | Conjugate at |
|---|---|---|
| C0 | (+14, 0) | (−14, 0) |
| C1 | (0, +14) | (0, −14) |
| C2 | (+14, +14) | (−14, −14) |
| C3 | (+14, −14) | (−14, +14) |
| C4 | (+20, 0) | (−20, 0) |
| C5 | (0, +20) | (0, −20) |
| C6 | (+20, +10) | (−20, −10) |
| C7 | (+10, +20) | (−10, −20) |
- Phase wrapping — phase is defined mod 2π, so the byte scale wraps around: with a step of 2π/255 per byte, byte 255 lands on 2π ≡ 0 rad, the same angle as byte 0, and bytes at the two ends of the range are easily confused.
- No shared clock — encoder and decoder have independent frame rates and timestamps; frame-rate drift causes a slowly rotating phase bias.
- Codec compression — webcam MJPEG/H.264 block-DCT destroys fine phase relationships; a single quantisation step can rotate a bin's phase by tens of degrees.
- Monitor gamma non-linearity — display gamma (≈2.2) distorts the IFFT signal amplitude before it reaches the camera, adding harmonic content.
- Moiré / pixel-grid aliasing — the display pixel grid and camera sensor grid beat together, shifting apparent bin positions unpredictably.
- Perspective / geometric distortion — any rotation or keystoning smears energy across FFT bins; pilot correction helps but cannot fully recover severe distortion.
- Ambient light — room light adds a DC component and spatially-varying gain, shifting amplitudes non-uniformly.
- Camera noise — shot/readout noise adds Gaussian phase noise of σ_φ ≈ 1/SNR radians at every bin (SNR as the amplitude ratio A/σ, with σ the per-component noise std; valid at high SNR).
- Low bitrate — 8 carriers × 1 byte = 8 bytes per symbol; at 90 frames per symbol and 30 fps that is 8 bytes every 3 s ≈ 2.7 bytes/s (≈ 21 bit/s).
- Symbol transition corruption — frames captured during the blend region carry a mixture of two symbols and are decoded incorrectly.
- No error correction — a single noisy frame corrupts the associated byte with no way to detect or repair it without FEC.
- Flicker — high-amplitude high-frequency patterns can cause visible display flicker at certain carrier frequencies.
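To put the camera-noise item in numbers: one byte step is 2π/255 rad. Keeping σ_φ below half a step (π/255 ≈ 0.0123 rad) therefore needs an amplitude SNR of about 81 (≈ 38 dB) per bin. Even at that level, about 32% of single readings fall outside ±1σ and decode to the wrong byte. The Monte Carlo sketch below checks the σ_φ ≈ 1/SNR approximation. It assumes complex Gaussian noise with per-component standard deviation σ = A/SNR.

```python
import numpy as np

def phase_std(snr, n=200_000, seed=1):
    """Std of the measured phase of a unit carrier in complex Gaussian noise.
    snr = A / sigma, with sigma the std of each noise component (real, imag)."""
    rng = np.random.default_rng(seed)
    z = 1.0 + (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / snr
    return float(np.angle(z).std())

half_step = np.pi / 255                    # half a byte step, about 0.0123 rad
for snr in (20, 81, 320):
    print(snr, round(phase_std(snr), 4), round(1 / snr, 4))
```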