Week 2

Image Upload + Preprocessing

Validate uploaded files, read 16-bit GeoTIFF satellite imagery with Rasterio, normalize bands, and shape the tensor the model expects.

Why not OpenCV imread for GeoTIFF?

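By default (cv2.IMREAD_COLOR), OpenCV's imread silently converts 16-bit raster values to 8-bit, and it never returns geospatial metadata (CRS, affine transform). Flags like cv2.IMREAD_UNCHANGED keep the 16-bit values, but the georeferencing is still lost. Rasterio preserves the full uint16 range and exposes the metadata you need for downstream alignment. Use Rasterio for any .tif from xBD, Sentinel, or Maxar.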
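The sketch below (NumPy only, with made-up pixel values) shows what squeezing 16-bit data into 8 bits costs. It does not reproduce OpenCV's exact conversion; it compares two generic approaches.

```python
import numpy as np

# Hypothetical 16-bit values: two similar roof pixels, a slightly brighter
# one, and a very bright pixel (e.g. a metal roof).
values = np.array([1000, 1020, 1100, 40000], dtype=np.uint16)

# Option 1: keep only the high byte (divide by 256, rounding down).
high_byte = (values >> 8).astype(np.uint8)

# Option 2: a naive cast keeps the LOW byte, i.e. wraps modulo 256.
wrapped = values.astype(np.uint8)

print(high_byte.tolist())  # [3, 3, 4, 156]
print(wrapped.tolist())    # [232, 252, 76, 64]
```

Keeping the high byte merges 1000 and 1020 into the same level, and wrapping scrambles the ordering so that 1100 ends up darker than 1000. Either way, subtle brightness differences in the original data are lost.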

File validation by extension and size

```python
from typing import Optional

from fastapi import HTTPException

ALLOWED = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB

def validate_filename(name: Optional[str], size: int) -> None:
    if not name:  # UploadFile.filename can be None
        raise HTTPException(400, "Missing filename")
    ext = "." + name.rsplit(".", 1)[-1].lower()   # 'foo.bar.TIF' -> '.tif'
    if ext not in ALLOWED:
        raise HTTPException(415, f"Unsupported file type: {ext}")
    if size > MAX_BYTES:
        raise HTTPException(413, "File larger than 50 MB")
```

Plain-English explanation

Reject uploads that are not images, or are too large, before they reach the model.

Why it matters

If you skip validation, a 4 GB file or a Word document can crash the server. Fail fast with a clear HTTP error.

Line by line

ALLOWED = {...}: A set of the extensions we accept.
HTTPException(415, ...): 415 = Unsupported Media Type; 413 = Payload Too Large.
name.rsplit(".", 1): Splits the filename once from the right, so 'foo.bar.tif' → ['foo.bar', 'tif'], and [-1] picks 'tif'.

Expected output

# .docx upload → HTTP 415 {"detail": "Unsupported file type: .docx"}

Common errors

AttributeError: 'NoneType' object has no attribute 'lower'

file.filename may be None. Check truthiness first.
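If you would rather not hand-roll the extension parsing, pathlib does it for you. A framework-free sketch (upload_status is a hypothetical name) that returns the HTTP status you would raise, or None when the upload passes, so it can be unit-tested without FastAPI:

```python
from pathlib import PurePath
from typing import Optional

ALLOWED = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB

def upload_status(name: Optional[str], size: int) -> Optional[int]:
    """Return 415/413 for a bad upload, None if it passes (hypothetical helper)."""
    if not name:                          # UploadFile.filename may be None or ""
        return 415
    ext = PurePath(name).suffix.lower()   # 'foo.bar.TIF' -> '.tif', 'README' -> ''
    if ext not in ALLOWED:
        return 415
    if size > MAX_BYTES:
        return 413
    return None

print(upload_status("report.docx", 1024))  # 415
```

PurePath.suffix only looks at the last dot, so 'foo.bar.tif' is treated as a TIFF, matching the rsplit version.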

Quick quiz

Which HTTP status means 'unsupported media type'?

Magic-byte detection

```python
from fastapi import HTTPException

def detect_type(b: bytes) -> str:
    # Compare the leading bytes against each format's file signature.
    if b[:4] == b"\x89PNG":               return "png"
    if b[:3] == b"\xff\xd8\xff":          return "jpeg"
    if b[:4] in (b"II*\x00", b"MM\x00*"): return "tiff"
    raise HTTPException(415, "Not an image")
```

Plain-English explanation

Read the first few bytes of the file and compare them to the standard signatures for PNG, JPEG, and TIFF.

Why it matters

A user can rename mal.exe to sat.tif. Trusting the extension alone is a security hole; the magic bytes reveal what the file actually contains.

Line by line

b[:4] == b"\x89PNG": Every PNG starts with the bytes 89 50 4E 47; the full PNG signature is 8 bytes long.
"II*\x00" or "MM\x00*": TIFF. II = little-endian, MM = big-endian, followed by the version number 42 ("*").

Expected output

detect_type(open("tile.tif","rb").read(8)) # → 'tiff'
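A slightly fuller signature table also recognizes the full 8-byte PNG signature and BigTIFF (version 43, written as "+"), the TIFF variant used when a file is too large for classic TIFF's 32-bit offsets (about 4 GB). The sketch below assumes you can peek at a seekable binary stream; SIGNATURES and sniff are hypothetical names, and the function rewinds the stream so the real reader still sees the whole file.

```python
import io
from typing import BinaryIO, Optional

# Hypothetical extended signature table.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),   # full 8-byte PNG signature
    (b"\xff\xd8\xff",      "jpeg"),
    (b"II*\x00",           "tiff"),  # classic TIFF, little-endian
    (b"MM\x00*",           "tiff"),  # classic TIFF, big-endian
    (b"II+\x00",           "tiff"),  # BigTIFF, little-endian
    (b"MM\x00+",           "tiff"),  # BigTIFF, big-endian
]

def sniff(f: BinaryIO) -> Optional[str]:
    """Peek at the header without consuming the stream (hypothetical helper)."""
    pos = f.tell()
    head = f.read(8)
    f.seek(pos)                      # rewind so later readers start from the same place
    for sig, kind in SIGNATURES:
        if head.startswith(sig):
            return kind
    return None

# Usage: works on any binary file object, e.g. open(path, "rb") or io.BytesIO.
print(sniff(io.BytesIO(b"II+\x00\x08\x00\x00\x00")))  # tiff
```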

Quick quiz

Why check magic bytes after the extension?

Read a GeoTIFF with Rasterio

```python
import rasterio
import numpy as np

def read_geotiff(path: str):
    with rasterio.open(path) as src:
        arr = src.read()              # shape: (bands, H, W); dtype as stored, e.g. uint16
        meta = {
            "crs":       str(src.crs),
            "transform": list(src.transform)[:6],   # affine coefficients a, b, c, d, e, f
            "bounds":    list(src.bounds),
            "width":     src.width,
            "height":    src.height,
            "count":     src.count,
        }
    return arr, meta
```

Plain-English explanation

Open the GeoTIFF, read every band into a NumPy array, and capture the geospatial metadata.

Why it matters

Rasterio preserves the full 16-bit range and the CRS — both needed for accurate inference and for showing the user where on Earth the tile came from.

Line by line

with rasterio.open(path) as src: The context manager closes the file when the block exits.
src.read(): Returns shape (bands, H, W). Note: bands first, not last.
str(src.crs): Coordinate reference system, e.g. 'EPSG:4326'.
src.transform: Affine matrix mapping pixel (col, row) → coordinates (x, y) in the CRS, e.g. UTM meters or degrees.
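Rasterio can convert pixel positions for you with src.xy(row, col), but that needs the file open. The sketch below spells out the arithmetic on the six coefficients read_geotiff saves in meta["transform"], which still works after the file is closed. pixel_to_crs is a hypothetical helper, and the example transform is made up.

```python
# Hypothetical helper: apply the six affine coefficients (a, b, c, d, e, f)
# stored in meta["transform"] to a pixel position.
def pixel_to_crs(transform6, col, row, center=True):
    a, b, c, d, e, f = transform6
    if center:                            # (col, row) names a pixel's top-left corner;
        col, row = col + 0.5, row + 0.5   # shift by half a pixel to get its center
    x = a * col + b * row + c             # easting (or longitude)
    y = d * col + e * row + f             # northing (or latitude)
    return x, y

# A north-up 0.5 m UTM tile whose top-left corner is at (500000, 4000000).
t = [0.5, 0.0, 500000.0, 0.0, -0.5, 4000000.0]
print(pixel_to_crs(t, 0, 0, center=False))  # (500000.0, 4000000.0)
print(pixel_to_crs(t, 10, 20))              # (500005.25, 3999989.75)
```

The negative e coefficient is why y decreases as the row index grows: row 0 is the northern edge of a north-up image.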

Expected output

arr.shape → (8, 1024, 1024) meta → {"crs": "EPSG:32618", "width": 1024, "height": 1024, "count": 8, ...}

Common errors

rasterio.errors.RasterioIOError: ... not recognized as a supported file format

The file is not a real GeoTIFF. Check magic bytes first.

Quick quiz

Why use Rasterio instead of OpenCV for GeoTIFF?

Select bands 5, 3, 2 (false-color)

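```python
import numpy as np

# Bands 5, 3, 2 (1-indexed, as in sensor docs) are rows 4, 2, 1 of the
# (bands, H, W) array. Which wavelengths they hold depends on the sensor:
# on Landsat 8/9 they are NIR, Green, Blue; on Sentinel-2, NIR is band 8.
def select_532(arr: np.ndarray) -> np.ndarray:
    b5, b3, b2 = arr[4], arr[2], arr[1]       # 0-indexed
    rgb = np.stack([b5, b3, b2], axis=-1)     # H, W, 3
    return rgb
```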

Plain-English explanation

Pick the three bands that highlight vegetation, urban damage, and water, then stack them as an RGB-like image.

Why it matters

The team's model was trained on this exact 5-3-2 false-color composite. Feeding it a different band combination, or the same bands in a different order, would produce meaningless predictions.

Line by line

arr[4], arr[2], arr[1]: Bands 5, 3, 2. Bands are 1-indexed in remote-sensing docs (and in Rasterio's src.read(1)), but NumPy is 0-indexed.
np.stack([...], axis=-1): Stacks along a new last axis so the result is (H, W, 3).
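select_532 hard-codes the indices, so a 3- or 4-band file fails with a bare IndexError. A hypothetical generalization, select_bands, takes 1-indexed band numbers exactly as the sensor docs write them and raises a readable error instead. The name and the default tuple are assumptions for illustration.

```python
import numpy as np

def select_bands(arr: np.ndarray, bands: tuple = (5, 3, 2)) -> np.ndarray:
    """Hypothetical generalization of select_532: pick 1-indexed bands from a
    (bands, H, W) array and return them as (H, W, len(bands))."""
    if arr.ndim != 3:
        raise ValueError(f"expected (bands, H, W), got shape {arr.shape}")
    if max(bands) > arr.shape[0] or min(bands) < 1:
        raise ValueError(f"need bands {bands}, raster has {arr.shape[0]}")
    idx = [b - 1 for b in bands]              # 1-indexed -> 0-indexed
    return np.moveaxis(arr[idx], 0, -1)       # fancy-index, then move bands last

# Usage with a fake 8-band tile where band k is filled with the value k.
tile = np.stack([np.full((4, 4), k, dtype=np.uint16) for k in range(1, 9)])
print(select_bands(tile)[0, 0].tolist())  # [5, 3, 2]
```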

Expected output

rgb.shape → (1024, 1024, 3) rgb.dtype → uint16

Common errors

IndexError: index 4 is out of bounds

Your raster has fewer than 5 bands. Inspect arr.shape first.

Quick quiz

If bands are 1-indexed in docs, what NumPy index is band 5?

Normalize 16-bit imagery to [0, 1]

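```python
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, (2, 98))              # one stretch for all bands
    img = np.clip((img - lo) / (hi - lo + 1e-6), 0, 1)
    # percentile's float64 scalars can upcast the result; pin float32 for ONNX.
    return img.astype(np.float32, copy=False)
```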

Plain-English explanation

Map the 2nd-percentile pixel value to 0 and the 98th-percentile value to 1, clipping everything outside that range, so the image fills the 0–1 range the model expects.

Why it matters

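Raw uint16 values can span 0–65535, but real imagery usually fills only a narrow slice of that range (an 11-bit sensor tops out at 2047). Divide by 65535 and the network sees an almost uniformly dark image; feed raw values and it sees numbers far outside the range it was trained on. Either way, it predicts garbage.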

Line by line

img.astype(np.float32): Convert from uint16 first; uint16 arithmetic wraps around instead of going negative (100 - 200 becomes 65436).
np.percentile(img, (2, 98)): A robust min/max that ignores outliers such as saturated pixels.
np.clip(..., 0, 1): Values outside the percentile band are pinned to 0 or 1.
+ 1e-6: Prevents division by zero on flat tiles.
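normalize() uses one percentile pair for all three channels, which preserves their relative brightness. If your training pipeline instead stretched each band separately (check before switching; the two are not interchangeable), a per-band variant could look like this. normalize_per_band is a hypothetical name.

```python
import numpy as np

def normalize_per_band(img: np.ndarray, p_lo=2, p_hi=98) -> np.ndarray:
    """Hypothetical variant of normalize(): stretch each channel of an
    (H, W, C) image with its own percentiles instead of one shared pair."""
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, (p_lo, p_hi), axis=(0, 1))   # each has shape (C,)
    out = (img - lo) / (hi - lo + 1e-6)                       # broadcasts over H, W
    return np.clip(out, 0, 1).astype(np.float32)

# A bright channel and a dim channel end up on the same 0-1 scale.
ramp = np.arange(10_000, dtype=np.uint16).reshape(100, 100)
img = np.stack([ramp, ramp // 10], axis=-1)
out = normalize_per_band(img)
print(out.shape, out.dtype)  # (100, 100, 2) float32
```

With the global normalize(), the dim channel would stay compressed near 0 because it shares the bright channel's percentiles.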

Expected output

img.min(), img.max() → (0.0, 1.0) img.dtype → float32

Quick quiz

Why use the 2nd and 98th percentiles instead of min and max?

Capture geospatial metadata

```python
def extract_meta(src) -> dict:
    # src is an open rasterio dataset (inside a `with rasterio.open(...)` block).
    return {
        "crs":        str(src.crs),
        "resolution": src.res,         # (x_size, y_size) in CRS units
        "bounds":     tuple(src.bounds),
        "height":     src.height,
        "width":      src.width,
        "bands":      src.count,
    }
```

Plain-English explanation

Return everything an engineer needs to locate the tile on a map and reproduce the read.

Why it matters

Inspection reports must cite WHERE a damaged building is. Without CRS and bounds the prediction is useless.

Line by line

src.res: Pixel size in CRS units (meters for UTM zones such as EPSG:32618).
src.bounds: (left, bottom, right, top) in CRS coordinates.

Expected output

{"crs":"EPSG:32618","resolution":(0.5,0.5),"bounds":(...),"height":1024,"width":1024,"bands":8}

Quick quiz

What does src.res represent?

Quality checks

```python
import numpy as np

def quality_checks(arr: np.ndarray, meta: dict) -> list[str]:
    warnings = []
    # read_geotiff stores the band count as "count", extract_meta as "bands".
    n_bands = meta.get("bands", meta.get("count", arr.shape[0]))
    if n_bands < 5:   # select_532 needs bands 5, 3 and 2
        warnings.append("Fewer than 5 bands — false-color disabled.")
    if arr.std() < 1.0:
        warnings.append("Flat image — possible cloud or fill.")
    if np.isnan(arr).any():
        warnings.append("NaNs detected — interpolation recommended.")
    return warnings
```

Plain-English explanation

Catch obvious data problems before they reach the model and confuse the engineer.

Why it matters

A confident wrong prediction on a cloudy tile is dangerous. Surface uncertainty up front.

Line by line

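arr.std() < 1.0: Low standard deviation means a flat tile: cloud, fill, or corruption. The threshold is in raw pixel units, so tune it to your sensor.
np.isnan(arr).any(): NaN propagates through every layer, including softmax, so the output probabilities become NaN.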
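Integer rasters such as uint16 cannot hold NaN, so the np.isnan check only fires for float rasters. Missing pixels in integer tiles are usually marked with a nodata sentinel value, which Rasterio reports as src.nodata. A hypothetical companion check, nodata_fraction, that you could pass src.nodata to:

```python
import numpy as np

def nodata_fraction(arr: np.ndarray, nodata) -> float:
    """Hypothetical check: share of pixels equal to the nodata sentinel in
    any band of a (bands, H, W) array. For float rasters whose nodata is NaN,
    use np.isnan(arr) instead, because NaN never compares equal to itself."""
    if nodata is None:                        # file declares no nodata value
        return 0.0
    missing = (arr == nodata).any(axis=0)     # (bands, H, W) -> (H, W)
    return float(missing.mean())

tile = np.ones((3, 4, 4), dtype=np.uint16)
tile[:, :2, :] = 0                            # top half is fill
print(nodata_fraction(tile, 0))  # 0.5
```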

Expected output

["Flat image — possible cloud or fill."]

Quick quiz

Why warn on low standard deviation?

Standard image preprocessing with PIL

```python
from PIL import Image
import numpy as np

def load_standard(path: str, size=(512, 512)) -> np.ndarray:
    with Image.open(path) as im:                       # closes the file handle
        img = im.convert("RGB").resize(size)           # note: ignores aspect ratio
    arr = np.asarray(img, dtype=np.float32) / 255.0    # H, W, 3
    return arr
```

Plain-English explanation

For JPG/PNG photos (drone or phone shots), use Pillow to load, resize, and convert to a normalized array.

Why it matters

Phone uploads are 8-bit RGB and need a simpler path than GeoTIFF. Same downstream shape, different reader.

Line by line

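.convert("RGB"): Turns grayscale, palette, and RGBA images into a consistent 3-channel image (alpha is dropped).
.resize(size): The model expects exactly 512×512 input. This stretches non-square photos rather than cropping or padding them.
/ 255.0: Scales 8-bit values to [0, 1].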
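Without .convert("RGB"), np.asarray returns differently shaped arrays depending on the image mode: a grayscale PNG gives (H, W) and an RGBA PNG gives (H, W, 4), and both would break to_tensor later. If you ever receive already-decoded arrays, a hypothetical NumPy-only helper like ensure_rgb can handle the common cases. It cannot recover colors from palette ("P") images, whose arrays hold palette indices; that still needs Pillow.

```python
import numpy as np

def ensure_rgb(arr: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy analogue of .convert("RGB") for decoded arrays:
    grayscale (H, W), single-channel (H, W, 1) or RGBA (H, W, 4) -> (H, W, 3)."""
    if arr.ndim == 2:                       # grayscale: add a channel axis first
        arr = arr[..., np.newaxis]
    if arr.shape[-1] == 1:                  # repeat the gray channel 3 times
        return np.repeat(arr, 3, axis=-1)
    if arr.shape[-1] == 4:                  # RGBA: drop the alpha channel
        return arr[..., :3]
    if arr.shape[-1] == 3:
        return arr
    raise ValueError(f"cannot make RGB from shape {arr.shape}")

print(ensure_rgb(np.zeros((8, 8), dtype=np.uint8)).shape)     # (8, 8, 3)
print(ensure_rgb(np.zeros((8, 8, 4), dtype=np.uint8)).shape)  # (8, 8, 3)
```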

Expected output

arr.shape → (512, 512, 3) arr.dtype → float32

Common errors

UnidentifiedImageError

The bytes do not match any image format Pillow knows. Re-check magic bytes.

Quick quiz

Why .convert("RGB") before resizing?

Reshape to (1, 3, H, W)

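```python
import numpy as np

def to_tensor(img_hwc: np.ndarray) -> np.ndarray:
    # H, W, C  →  C, H, W  →  1, C, H, W
    chw = np.transpose(img_hwc, (2, 0, 1))
    batch = np.expand_dims(chw, axis=0)
    # transpose returns a view with permuted strides; make a contiguous float32 copy.
    return np.ascontiguousarray(batch, dtype=np.float32)  # ready for ONNX Runtime
```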

Plain-English explanation

Switch from height-width-channel to channel-first, then add a batch dimension. This is the (batch, channels, height, width) layout a model exported from PyTorch expects.

Why it matters

PyTorch (and therefore its ONNX export) uses NCHW. Feeding NHWC data gives a shape error at best; reshaping instead of transposing gives the right shape but silently wrong outputs.

Line by line

np.transpose(img_hwc, (2, 0, 1)): Reorders axes: (H, W, C) → (C, H, W).
np.expand_dims(chw, axis=0): Adds a leading axis so the shape becomes (1, C, H, W).
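To see why a reshape is the dangerous mistake, compare it with a transpose on a tiny image. Both produce shape (1, 3, 2, 2), but only the transpose keeps each color in its own channel. The image values are made up for illustration.

```python
import numpy as np

# A tiny 2 x 2 RGB image whose red, green and blue planes are easy to spot.
img = np.zeros((2, 2, 3), dtype=np.float32)
img[..., 0] = 1.0   # red plane
img[..., 1] = 2.0   # green plane
img[..., 2] = 3.0   # blue plane

right = np.transpose(img, (2, 0, 1))[np.newaxis]   # (1, 3, 2, 2), channels intact
wrong = img.reshape(1, 3, 2, 2)                    # same shape, scrambled data

print(right[0, 0].ravel().tolist())  # [1.0, 1.0, 1.0, 1.0]  the red plane
print(wrong[0, 0].ravel().tolist())  # [1.0, 2.0, 3.0, 1.0]  red, green, blue mixed
```

A shape check cannot catch this, which is why the transpose in to_tensor matters.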

Expected output

batch.shape → (1, 3, 512, 512) batch.dtype → float32

Common errors

ValueError: axes don't match array

Your input is not (H, W, 3). Check arr.shape after preprocessing.

Quick quiz

Which tensor layout does ONNX (from PyTorch) expect?

Update /classify with real preprocessing

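```python
import os
import uuid
from fastapi import File, UploadFile
# router and ClassifyResponse come from the Week 1 app.

@router.post("", response_model=ClassifyResponse)
async def classify(file: UploadFile = File(...)):
    raw = await file.read()
    validate_filename(file.filename, len(raw))
    kind = detect_type(raw)

    # Never build the path from the client's filename: "../../x.tif" escapes
    # uploads/, and two users uploading "tile.tif" would overwrite each other.
    os.makedirs("uploads", exist_ok=True)
    tmp = os.path.join("uploads", f"{uuid.uuid4().hex}.{kind}")
    with open(tmp, "wb") as f:
        f.write(raw)

    try:
        if kind == "tiff":
            arr, meta = read_geotiff(tmp)
            rgb = select_532(arr)             # native tile size; not resized to 512x512
        else:
            rgb = load_standard(tmp) * 65535  # rescale to the uint16 range, like the GeoTIFF branch
            meta = {"crs": None, "bands": 3}
    finally:
        os.remove(tmp)                        # don't let uploads pile up on disk

    norm   = normalize(rgb)
    tensor = to_tensor(norm)
    # Week 3: model.run(tensor) goes here.
    return ClassifyResponse(filename=file.filename, placard="GREEN",
        top_class="no-damage", confidence=0.72, entropy=0.83,
        classes=[...], recommendation="Stub — model wires up Week 3.")
```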

Plain-English explanation

Wire the preprocessing functions into the endpoint. The model call is the only missing piece; it arrives in Week 3.

Why it matters

By the end of this cell, your server accepts both GeoTIFF and standard images and produces the exact tensor the AI model will consume.

Line by line

raw = await file.read(): Reads the upload into memory once so both its size and its magic bytes can be checked. The whole file is buffered before the size check runs.
validate_filename / detect_type: Two-layer validation, by extension and by content.
to_tensor(norm): Final NCHW float32 tensor, ready for ONNX Runtime.
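Because await file.read() pulls in the entire upload before validate_filename sees its size, a 4 GB upload still lands in memory. A sketch of a size-capped reader, assuming you pass it UploadFile's async read(n) method; read_limited and TooLarge are hypothetical names.

```python
import asyncio
import io

class TooLarge(Exception):
    """Raised when an upload exceeds the byte limit (hypothetical)."""

async def read_limited(read, max_bytes: int, chunk_size: int = 1 << 20) -> bytes:
    """Read via an async `read(n)` callable such as UploadFile.read, but stop
    as soon as more than max_bytes have arrived instead of buffering it all."""
    chunks, total = [], 0
    while True:
        chunk = await read(chunk_size)
        if not chunk:                     # empty bytes means end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            raise TooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)

# Usage with a fake async reader; in the endpoint you would pass file.read
# and turn TooLarge into HTTPException(413, ...).
async def demo():
    buf = io.BytesIO(b"x" * 10)
    async def fake_read(n):
        return buf.read(n)
    return await read_limited(fake_read, max_bytes=100, chunk_size=4)

print(len(asyncio.run(demo())))  # 10
```

At most max_bytes + chunk_size bytes are ever held, so memory use stays bounded no matter how large the upload is.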

Expected output

# tensor.shape (1, 3, 512, 512) → model.run() in Week 3
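That (1, 3, 512, 512) shape holds for the Pillow branch, which resizes. The GeoTIFF branch never resizes, so a 1024 × 1024 tile comes out as (1, 3, 1024, 1024). A minimal sketch of one fix, assuming tile sizes are exact multiples of 512; block_mean_resize is a hypothetical helper that averages non-overlapping blocks.

```python
import numpy as np

def block_mean_resize(img: np.ndarray, size=(512, 512)) -> np.ndarray:
    """Hypothetical helper: shrink an (H, W, C) array by averaging
    non-overlapping blocks. Assumes H and W are integer multiples of the
    target size (e.g. 1024 x 1024 -> 512 x 512 via 2 x 2 blocks)."""
    h, w, c = img.shape
    th, tw = size
    if h % th or w % tw:
        raise ValueError(f"{h}x{w} is not a multiple of {th}x{tw}")
    fy, fx = h // th, w // tw
    blocks = img.astype(np.float32).reshape(th, fy, tw, fx, c)
    return blocks.mean(axis=(1, 3))   # average each fy x fx block

tile = np.arange(1024 * 1024 * 3, dtype=np.float32).reshape(1024, 1024, 3)
print(block_mean_resize(tile).shape)  # (512, 512, 3)
```

In the endpoint, that would be rgb = block_mean_resize(select_532(arr)). Rasterio can also resample while reading, via src.read(out_shape=(src.count, 512, 512), resampling=Resampling.bilinear) with Resampling from rasterio.enums. Area averaging and bilinear resampling give slightly different pixels, so use whichever matches how the training tiles were prepared.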

Quick quiz

Why write the upload to a temporary file before parsing it?