Week 3

AI Model Inference

Load the ONNX model, run a forward pass, convert logits to probabilities, and map the prediction to an ATC-20 safety placard.

ModelManager class

python
# app/models/manager.py
import onnxruntime as ort
import numpy as np

class ModelManager:
    # Order must match the order of the model's output classes.
    LABELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]

    def __init__(self, onnx_path: str):
        self.session = ort.InferenceSession(
            onnx_path, providers=["CPUExecutionProvider"])
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, tensor: np.ndarray) -> np.ndarray:
        logits = self.session.run(None, {self.input_name: tensor})[0]
        return logits[0]   # drop the batch axis: shape (4,)

Plain-English explanation

A small class that owns the loaded model and exposes one method: predict(tensor) → logits.

Why it matters

Wrapping the session in a class lets main.py create one instance at startup and reuse it for every request, instead of reloading the .onnx file on each call.

Line by line

ort.InferenceSession(path): Loads the .onnx graph into memory.
providers=["CPUExecutionProvider"]: Run on CPU; on GPU machines, list "CUDAExecutionProvider" first (this needs the onnxruntime-gpu package).
session.get_inputs()[0].name: Look up the input tensor name baked into the ONNX graph.
session.run(None, feed): None = return all outputs. feed maps input name → numpy array.
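The provider swap described above can be made automatic. This is a minimal sketch, assuming you pass in the list returned by `ort.get_available_providers()`; the helper name `choose_providers` is hypothetical and not part of the course code.

```python
def choose_providers(available: list[str]) -> list[str]:
    """Prefer CUDA when it is available, always keeping CPU as a fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # If nothing recognised is available, fall back to plain CPU.
    return chosen or ["CPUExecutionProvider"]


# Inside ModelManager.__init__ this could be used roughly as:
#   providers = choose_providers(ort.get_available_providers())
#   self.session = ort.InferenceSession(onnx_path, providers=providers)
print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(choose_providers(["CPUExecutionProvider"]))
```

Keeping the decision in a pure function means it can be checked without a GPU or a model file.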

Expected output

logits → array([2.31, -0.42, -1.10, -3.05], dtype=float32)

Common errors

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: Invalid Feed Input

Your tensor shape or dtype does not match the model. Verify (1, 3, H, W) float32.
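One way to catch this before it reaches the session is a small pre-flight check. The sketch below assumes the (1, 3, H, W) float32 layout stated above; `check_input` is a hypothetical helper, and the 224 × 224 size in the usage line is only a placeholder.

```python
import numpy as np


def check_input(tensor: np.ndarray) -> np.ndarray:
    """Raise a readable error before the tensor reaches session.run()."""
    if tensor.ndim != 4 or tensor.shape[0] != 1 or tensor.shape[1] != 3:
        raise ValueError(f"expected shape (1, 3, H, W), got {tensor.shape}")
    if tensor.dtype != np.float32:
        # Casting is a choice made for this sketch; rejecting is equally valid.
        tensor = tensor.astype(np.float32)
    return tensor


ok = check_input(np.zeros((1, 3, 224, 224), dtype=np.float64))
print(ok.shape, ok.dtype)  # (1, 3, 224, 224) float32
```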

Quick quiz

Why load the model once in a class?

Load model at startup

python
# app/main.py
from app.models.manager import ModelManager

# `app` is the existing FastAPI() instance defined earlier in this file.
model: ModelManager | None = None   # assigned once by the startup hook

@app.on_event("startup")
def load_model():
    global model
    model = ModelManager("app/models/damage_v1.onnx")
    print("Model loaded:", model.LABELS)

Plain-English explanation

Run the model loader exactly once when Uvicorn starts. Store the instance in a module-level variable.

Why it matters

Startup hooks guarantee the model is ready before the first /classify call arrives.

Line by line

@app.on_event("startup"): FastAPI runs this function during server boot.
global model: Allow the function to assign to the module-level name.
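Recent FastAPI releases mark `on_event` as deprecated in favour of a lifespan handler passed as `FastAPI(lifespan=...)`. The sketch below shows the same load-once pattern in that style using only the standard library so it runs anywhere; `load_model_stub` and the `state` dict are hypothetical stand-ins for the real `ModelManager` and however you choose to store it.

```python
import asyncio
from contextlib import asynccontextmanager

state: dict[str, object] = {}  # holds the loaded model for the app's lifetime


def load_model_stub() -> object:
    """Hypothetical stand-in for ModelManager("app/models/damage_v1.onnx")."""
    return object()


@asynccontextmanager
async def lifespan(app):
    state["model"] = load_model_stub()  # runs once, before requests are served
    yield
    state.clear()                       # runs once, at shutdown


# With FastAPI installed this would be wired up as: app = FastAPI(lifespan=lifespan)
async def demo() -> bool:
    async with lifespan(app=None):
        return "model" in state


print(asyncio.run(demo()))  # True
```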

Expected output

Model loaded: ['no-damage', 'minor-damage', 'major-damage', 'destroyed']
INFO: Application startup complete.

Common errors

FileNotFoundError: damage_v1.onnx

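Copy the exported .onnx into app/models/ before launching, and start the server from the project root: the path "app/models/damage_v1.onnx" is resolved relative to the current working directory.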

Quick quiz

Where should the ONNX session live?

Softmax — convert logits to probabilities

python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())   # numerical stability
    return e / e.sum()

Plain-English explanation

Turn raw network outputs (any real numbers) into a probability distribution that sums to 1.

Why it matters

ATC-20 logic compares probabilities, not raw logits. Without softmax you cannot threshold a confidence.

Line by line

x - x.max(): Subtracting the max prevents overflow when logits are large; the result is unchanged because the shift cancels in the ratio.
np.exp(...): Exponentiate so values become positive.
/ e.sum(): Normalize so the result sums to 1.0.

Expected output

softmax([2.31, -0.42, -1.10, -3.05]) → [0.907, 0.059, 0.030, 0.004]
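To see why the max is subtracted, compare the stable version with a naive one on deliberately large logits. This is an illustrative sketch; `naive_softmax` and the test values are made up for the demonstration.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # the largest exponent is exactly 0
    return e / e.sum()


def naive_softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x)            # overflows to inf for large logits
    return e / e.sum()       # inf / inf gives nan


big = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over="ignore", invalid="ignore"):  # silence the expected warnings
    print(naive_softmax(big))    # every entry is nan
print(softmax(big).round(3))     # approximately 0.090, 0.245, 0.665
```

The stable result equals softmax([-2, -1, 0]), which shows that only the differences between logits matter.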

Quick quiz

Why subtract x.max() before exp?

Confidence and entropy

python
import numpy as np

def confidence(probs: np.ndarray) -> float:
    return float(probs.max())

def entropy(probs: np.ndarray) -> float:
    # Shannon entropy (natural log), divided by log(N) so the result lies in [0, 1]
    p = probs.clip(1e-9, 1.0)
    h = -(p * np.log(p)).sum()
    return float(h / np.log(len(p)))

Plain-English explanation

Confidence is the highest probability. Entropy measures how spread-out the distribution is — high entropy = uncertain.

Why it matters

A confident prediction with entropy 0.1 is far more trustworthy than one with entropy 0.9, even if the top class is the same.

Line by line

probs.clip(1e-9, 1.0): Avoid log(0).
-(p * np.log(p)).sum(): Shannon entropy formula.
/ np.log(len(p)): Normalize to [0, 1] so different class counts are comparable.

Expected output

confidence → 0.91  entropy → 0.28  (sharp, trustworthy)
confidence → 0.34  entropy → 0.95  (flat, uncertain)
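The two ends of the entropy scale are easy to verify by hand. A small sketch, using made-up distributions plus the rounded softmax of the logits [2.31, -0.42, -1.10, -3.05]:

```python
import numpy as np


def entropy(probs: np.ndarray) -> float:
    p = probs.clip(1e-9, 1.0)
    h = -(p * np.log(p)).sum()
    return float(h / np.log(len(p)))


uniform = np.array([0.25, 0.25, 0.25, 0.25])      # model has no preference
one_hot = np.array([1.0, 0.0, 0.0, 0.0])          # model is certain
sharp   = np.array([0.907, 0.059, 0.030, 0.004])  # rounded softmax of the example logits

for name, p in [("uniform", uniform), ("one_hot", one_hot), ("sharp", sharp)]:
    print(f"{name:8s} confidence={p.max():.2f} entropy={entropy(p):.2f}")
# uniform  confidence=0.25 entropy=1.00
# one_hot  confidence=1.00 entropy=0.00
# sharp    confidence=0.91 entropy=0.28
```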

Quick quiz

A high entropy value indicates…

ATC-20 mapping

python
def to_placard(label: str, conf: float) -> tuple[str, str]:
    if conf < 0.55:                       # low-confidence default
        return "YELLOW", "Confidence low — engineer inspection required."
    if label == "no-damage":
        return "GREEN",  "Safe to enter — routine inspection only."
    if label == "minor-damage":
        return "YELLOW", "Restricted use — limited entry permitted."
    if label == "major-damage":
        return "ORANGE", "Engineer inspection required before re-entry."
    return "RED", "Unsafe — do NOT enter."

Plain-English explanation

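Map the model's class label to an ATC-20-style placard color plus a human-readable recommendation for the engineer. Official ATC-20 defines three placards (green INSPECTED, yellow RESTRICTED USE, red UNSAFE); ORANGE is an extra intermediate tier used by this project.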

Why it matters

The placard is what gets posted on the actual building. The mapping is cautious about uncertainty: any prediction below the 0.55 threshold is flagged for engineer inspection.

Line by line

if conf < 0.55: Default to YELLOW whenever the model is not confident.
tuple[str, str]: Returns (placard, recommendation).

Expected output

to_placard("destroyed", 0.93) → ("RED", "Unsafe — do NOT enter.")
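Because the confidence check comes first, the branch order matters. The sketch below restates the mapping as a lookup table that returns only the color, so each branch can be checked in one pass; `placard_color` and `PLACARD_BY_LABEL` are hypothetical names, not part of the course code.

```python
PLACARD_BY_LABEL = {
    "no-damage": "GREEN",
    "minor-damage": "YELLOW",
    "major-damage": "ORANGE",
    "destroyed": "RED",
}


def placard_color(label: str, conf: float, threshold: float = 0.55) -> str:
    """Color-only restatement of to_placard, for checking the branch order."""
    if conf < threshold:  # the confidence guard runs before any label check
        return "YELLOW"
    # Unrecognised labels fall through to RED, as in to_placard.
    return PLACARD_BY_LABEL.get(label, "RED")


cases = [("no-damage", 0.93), ("major-damage", 0.78),
         ("destroyed", 0.93), ("destroyed", 0.40)]
for label, conf in cases:
    print(f"{label:13s} {conf:.2f} -> {placard_color(label, conf)}")
```

The last case is worth noticing: a "destroyed" prediction at 0.40 confidence comes back YELLOW, not RED, so in that situation the escalation is carried by the recommendation text rather than by the color.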

Quick quiz

Why does low confidence escalate to YELLOW instead of trusting the top class?

Real /classify response

python
@router.post("", response_model=ClassifyResponse)
async def classify(file: UploadFile = File(...)):
    raw = await file.read()
    validate_filename(file.filename, len(raw))
    tensor = preprocess(raw, file.filename)        # from Week 2

    logits = model.predict(tensor)                 # raw scores, shape (4,)
    probs  = softmax(logits)                       # probabilities summing to 1
    idx    = int(probs.argmax())
    label  = ModelManager.LABELS[idx]
    conf   = confidence(probs)
    placard, rec = to_placard(label, conf)

    return ClassifyResponse(
        filename=file.filename, placard=placard, top_class=label,
        confidence=conf, entropy=entropy(probs),
        classes=[DamageClass(label=l, probability=float(p))
                 for l, p in zip(ModelManager.LABELS, probs)],
        recommendation=rec,
    )

Plain-English explanation

The complete endpoint: preprocess → predict → softmax → placard. Every line builds on Weeks 1 and 2.

Why it matters

This is the production handler. Everything else (batch, compare, visualise) wraps this same core.

Line by line

model.predict(tensor): Calls the ONNX session and returns the raw logits.
probs.argmax(): Index of the most likely class.
DamageClass(label=l, probability=float(p)): Build the per-class list for the frontend chart.

Expected output

{
  "filename": "tile_001.tif",
  "placard": "ORANGE",
  "top_class": "major-damage",
  "confidence": 0.78,
  "entropy": 0.54,
  "classes": [
    {"label": "no-damage", "probability": 0.05},
    {"label": "minor-damage", "probability": 0.12},
    {"label": "major-damage", "probability": 0.78},
    {"label": "destroyed", "probability": 0.05}
  ],
  "recommendation": "Engineer inspection required before re-entry."
}
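The numeric core of the handler can be exercised without a server or a model file by feeding it logits directly. This is a sketch under that assumption; `summarize` is a hypothetical helper that reproduces only the logits-to-fields steps.

```python
import numpy as np

LABELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def summarize(logits: np.ndarray) -> dict:
    """Hypothetical helper: logits -> the numeric fields of the response."""
    probs = softmax(logits)
    idx = int(probs.argmax())
    return {
        "top_class": LABELS[idx],
        "confidence": float(probs[idx]),
        "classes": [{"label": l, "probability": round(float(p), 3)}
                    for l, p in zip(LABELS, probs)],
    }


result = summarize(np.array([2.31, -0.42, -1.10, -3.05], dtype=np.float32))
print(result["top_class"], round(result["confidence"], 3))  # no-damage 0.907
```

Separating this step from the HTTP layer makes it straightforward to unit-test the label and probability logic on fixed inputs.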

Common errors

AttributeError: 'NoneType' object has no attribute 'predict'

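The model variable is still None when the handler runs. Either the startup hook did not run (check the launch logs for the "Model loaded" line), or the handler imported the name with `from app.main import model`, which copies the None value at import time; read it through the module (`app.main.model`) instead.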
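A small accessor can turn this failure into a clearer message. The sketch below is one possible guard, not the course's implementation; `get_model` is a hypothetical name, and in a FastAPI handler you might prefer to return an HTTP 503 instead of raising RuntimeError.

```python
model = None  # stands in for the module-level variable in app/main.py


def get_model():
    """Hypothetical accessor: fail with a readable message instead of AttributeError."""
    if model is None:
        raise RuntimeError("model not loaded: check that the startup hook ran")
    return model


try:
    get_model()
except RuntimeError as exc:
    print(exc)  # model not loaded: check that the startup hook ran
```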

Quick quiz

Which step turns model output into a human-readable safety decision?