Week 3

AI Model Inference

Load the ONNX model, run a forward pass, convert logits to probabilities, and map the prediction to an ATC-20 safety placard.

1

ModelManager class

```python
# app/models/manager.py
import numpy as np
import onnxruntime as ort


class ModelManager:
    # Order must match the order of the model's output logits.
    LABELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]

    def __init__(self, onnx_path: str):
        # Parse the ONNX graph and build the inference session once.
        self.session = ort.InferenceSession(
            onnx_path, providers=["CPUExecutionProvider"])
        # Name of the graph's first (and only) input tensor.
        self.input_name = self.session.get_inputs()[0].name

    def predict(self, tensor: np.ndarray) -> np.ndarray:
        # run(None, ...) returns a list with every graph output; take the first.
        logits = self.session.run(None, {self.input_name: tensor})[0]
        return logits[0]  # first (only) item of the batch, shape (4,)
```

Plain-English explanation

A small class that owns the loaded model and exposes one method: predict(tensor) → logits.

Why it matters

Encapsulating the model in a class lets main.py load it once at startup instead of on every request, so each call skips reading the file and building a new inference session.

Line by line

  • ort.InferenceSession(path): loads the .onnx graph into memory and prepares it for execution.
  • providers=["CPUExecutionProvider"]: runs on the CPU; on GPU machines use "CUDAExecutionProvider" (requires the onnxruntime-gpu package).
  • session.get_inputs()[0].name: looks up the input tensor name baked into the ONNX graph.
  • session.run(None, feed): None means "return all outputs"; feed maps each input name to a NumPy array.
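Hard-coding the provider is easy to forget when the same code runs on laptops and GPU servers. A minimal sketch, assuming you want CUDA whenever it is available and the CPU otherwise; choose_providers is a hypothetical helper, and its input would be the list returned by ort.get_available_providers():

```python
def choose_providers(available: list[str]) -> list[str]:
    """Build an ordered provider list for ort.InferenceSession.

    `available` is what ort.get_available_providers() reports on this machine.
    Providers are listed in priority order, so CUDA (if present) comes first
    and the CPU provider stays in the list as the fallback.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # Always keep the CPU provider as a safety net.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

In ModelManager.__init__ the call would then read providers=choose_providers(ort.get_available_providers()).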
Expected output
logits → array([2.31, -0.42, -1.10, -3.05], dtype=float32)

Common errors

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: Invalid Feed Input
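The feed does not match the model: wrong input name, shape, or dtype. Use self.input_name as the key and pass a (1, 3, H, W) float32 array. NumPy often produces float64 (for example after dividing a uint8 image by 255.0), so cast with .astype(np.float32).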
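Most of these failures come from a missing batch axis or a float64 array. A minimal pre-flight sketch, assuming the model expects one channels-first (NCHW) float32 image; to_model_input is a hypothetical helper, not part of ONNX Runtime:

```python
import numpy as np


def to_model_input(image: np.ndarray) -> np.ndarray:
    """Return a (1, 3, H, W) float32 array, adding the batch axis if needed."""
    arr = np.asarray(image)
    if arr.ndim == 3:  # (3, H, W): add the batch dimension
        arr = arr[np.newaxis, ...]
    if arr.ndim != 4 or arr.shape[0] != 1 or arr.shape[1] != 3:
        raise ValueError(f"expected (1, 3, H, W), got {arr.shape}")
    # Cast explicitly: a float64 feed to a float32 graph is rejected.
    return np.ascontiguousarray(arr, dtype=np.float32)
```

A channels-last array such as (H, W, 3) is rejected here rather than silently transposed, because guessing the layout would hide a preprocessing bug.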
Quick quiz

Why load the model once in a class?

2

Load model at startup

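```python
# app/main.py
from fastapi import FastAPI

from app.models.manager import ModelManager

app = FastAPI()  # the app instance main.py already defines
model: ModelManager | None = None  # filled in by the startup hook


@app.on_event("startup")
def load_model():
    global model  # rebind the module-level name instead of creating a local
    # Relative path: resolved against the directory Uvicorn is started from.
    model = ModelManager("app/models/damage_v1.onnx")
    print("Model loaded:", model.LABELS)
```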

Plain-English explanation

Run the model loader exactly once when Uvicorn starts (once per worker process if you run several workers) and store the instance in a module-level variable.

Why it matters

Startup hooks guarantee the model is ready before the first /classify call arrives.

Line by line

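  • @app.on_event("startup"): FastAPI runs this function once while the server boots, before it accepts requests. Newer FastAPI releases mark on_event as deprecated in favour of lifespan handlers, but it still works.
  • global model: lets the function assign to the module-level name instead of creating a local variable.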
Expected output
INFO:     Waiting for application startup.
Model loaded: ['no-damage', 'minor-damage', 'major-damage', 'destroyed']
INFO:     Application startup complete.

Common errors

FileNotFoundError: damage_v1.onnx
Copy the exported .onnx into app/models/ before launching, and start Uvicorn from the project root: the path is relative to the current working directory.
Quick quiz

Where should the ONNX session live?

3

Softmax — convert logits to probabilities

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())  # shift by the max for numerical stability
    return e / e.sum()       # normalize so the probabilities sum to 1
```

Plain-English explanation

Turn raw network outputs (any real numbers) into a probability distribution that sums to 1.

Why it matters

ATC-20 logic compares probabilities, not raw logits. Without softmax you cannot threshold a confidence.

Line by line

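  • x - x.max(): subtracting the max makes every exponent ≤ 0, which prevents overflow for large logits and does not change the result (softmax is unchanged by a constant shift).
  • np.exp(...): exponentiates so every value becomes positive.
  • / e.sum(): normalizes so the result sums to 1.0.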
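The function above handles the logits of one image. A sketch of a batched variant, assuming logits shaped (batch, classes) as a batch endpoint might produce; softmax_batch is a hypothetical name. keepdims=True keeps each row's max and sum broadcastable against that row:

```python
import numpy as np


def softmax_batch(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax applied independently along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)  # each row's max becomes 0
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)               # each row sums to 1


batch = np.array([[2.31, -0.42, -1.10, -3.05],
                  [1000.0, 999.0, 0.0, 0.0]])  # a naive np.exp(1000.0) overflows to inf
probs = softmax_batch(batch)
```

The second row would produce inf/inf = nan without the shift; with it, every entry stays finite.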
Expected output
softmax([2.31, -0.42, -1.10, -3.05]) → [0.907, 0.059, 0.030, 0.004]
Quick quiz

Why subtract x.max() before exp?

4

Confidence and entropy

```python
import numpy as np


def confidence(probs: np.ndarray) -> float:
    return float(probs.max())  # probability of the top class


def entropy(probs: np.ndarray) -> float:
    # Shannon entropy (natural log), divided by log(N) so the result lies in [0, 1]
    p = probs.clip(1e-9, 1.0)  # avoid log(0)
    h = -(p * np.log(p)).sum()
    return float(h / np.log(len(p)))
```

Plain-English explanation

Confidence is the highest probability. Entropy measures how spread-out the distribution is — high entropy = uncertain.

Why it matters

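A prediction with entropy 0.1 is far more trustworthy than one with entropy 0.9, even when both pick the same top class. Entropy also adds information beyond confidence: two predictions with the same top probability can spread the remaining probability very differently across the other classes.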

Line by line

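  • probs.clip(1e-9, 1.0): avoids log(0) for classes with zero probability.
  • -(p * np.log(p)).sum(): the Shannon entropy H = −Σ pᵢ ln pᵢ.
  • / np.log(len(p)): log(N) is the entropy of a uniform distribution over N classes, so dividing by it maps the result to [0, 1] and makes models with different class counts comparable.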
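A small worked check of that extra information, using the same normalized entropy (redefined here so the snippet runs on its own) and two illustrative distributions that share a top probability of 0.6:

```python
import numpy as np


def entropy(probs: np.ndarray) -> float:
    # Same normalized Shannon entropy as above: about 0 for one-hot, 1 for uniform.
    p = probs.clip(1e-9, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))


# Both have confidence 0.6; only the remaining 0.4 is distributed differently.
two_way = np.array([0.6, 0.4, 0.0, 0.0])              # remainder on one class
spread = np.array([0.6, 0.4 / 3, 0.4 / 3, 0.4 / 3])   # remainder spread evenly

print(round(entropy(two_way), 2))  # 0.49
print(round(entropy(spread), 2))   # 0.8
```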
Expected output
confidence → 0.91  entropy → 0.28  (sharp, trustworthy)
confidence → 0.34  entropy → 0.95  (flat, uncertain)
Quick quiz

A high entropy value indicates…

5

ATC-20 mapping

```python
def to_placard(label: str, conf: float) -> tuple[str, str]:
    """Return (placard, recommendation) for a predicted label and its confidence."""
    if conf < 0.55:  # low confidence: always hand off to an engineer
        return "YELLOW", "Confidence low — engineer inspection required."
    if label == "no-damage":
        return "GREEN", "Safe to enter — routine inspection only."
    if label == "minor-damage":
        return "YELLOW", "Restricted use — limited entry permitted."
    if label == "major-damage":
        return "ORANGE", "Engineer inspection required before re-entry."
    return "RED", "Unsafe — do NOT enter."  # remaining label: "destroyed"
```

Plain-English explanation

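Map the model's class label to a placard plus a human-readable engineer recommendation. ATC-20 itself defines three placards (GREEN "Inspected", YELLOW "Restricted Use", RED "Unsafe"); the ORANGE tier in this mapping is a project-specific addition for major damage, not an official ATC-20 placard.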

Why it matters

The placard is what gets posted on the actual building. The mapping is conservative: low confidence always escalates to a human.

Line by line

  • if conf < 0.55: defaults to YELLOW (engineer inspection) whenever the model is not confident, regardless of the predicted class.
  • tuple[str, str]: the function returns (placard, recommendation).
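Because the confidence check runs first, a low-confidence "destroyed" prediction is posted YELLOW rather than RED. If low confidence should escalate without ever downgrading, one option is to rank placards by severity and keep the more severe one. A hypothetical variant that returns the placard only; the severity order and the name to_placard_strict are assumptions, not part of the course code:

```python
# Severity order of this project's placards, least to most severe.
SEVERITY = ["GREEN", "YELLOW", "ORANGE", "RED"]

BASE_PLACARD = {
    "no-damage": "GREEN",
    "minor-damage": "YELLOW",
    "major-damage": "ORANGE",
    "destroyed": "RED",
}


def to_placard_strict(label: str, conf: float, threshold: float = 0.55) -> str:
    """Like to_placard, but low confidence can only raise the placard, never lower it."""
    placard = BASE_PLACARD[label]
    if conf < threshold:
        # At least YELLOW (engineer inspection); keep ORANGE or RED if already more severe.
        placard = max(placard, "YELLOW", key=SEVERITY.index)
    return placard
```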
Expected output
to_placard("destroyed", 0.93) → ("RED", "Unsafe — do NOT enter.")
Quick quiz

Why does low confidence escalate to YELLOW instead of trusting the top class?

6

Real /classify response

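```python
from fastapi import APIRouter, File, UploadFile

from app.models.manager import ModelManager

# Also needed (imports omitted; paths depend on your project): ClassifyResponse and
# DamageClass (response schemas), validate_filename and preprocess (earlier weeks),
# softmax, confidence, entropy and to_placard (steps 3-5), and the `model`
# instance loaded by the startup hook in step 2.

router = APIRouter(prefix="/classify")  # prefix assumed; an empty route path needs one


@router.post("", response_model=ClassifyResponse)
async def classify(file: UploadFile = File(...)):
    raw = await file.read()
    validate_filename(file.filename, len(raw))
    tensor = preprocess(raw, file.filename)  # from Week 2

    logits = model.predict(tensor)           # raw scores, shape (4,)
    probs = softmax(logits)                  # probabilities that sum to 1
    idx = int(probs.argmax())                # index of the most likely class
    label = ModelManager.LABELS[idx]
    conf = confidence(probs)
    placard, rec = to_placard(label, conf)

    return ClassifyResponse(
        filename=file.filename, placard=placard, top_class=label,
        confidence=conf, entropy=entropy(probs),
        classes=[DamageClass(label=l, probability=float(p))
                 for l, p in zip(ModelManager.LABELS, probs)],
        recommendation=rec,
    )
```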

Plain-English explanation

The complete endpoint: preprocess → predict → softmax → placard. It combines the Week 1 and Week 2 code with the helpers from steps 1 to 5.

Why it matters

This is the production handler. Everything else (batch, compare, visualise) wraps this same core.

Line by line

  • model.predict(tensor): calls the ONNX session and returns the raw logits.
  • probs.argmax(): index of the most likely class.
  • DamageClass(label=l, probability=float(p)): builds the per-class list for the frontend chart; float() turns each NumPy value into a plain Python float.
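Since the handler is a thin wrapper, its numeric core can be tested without a server or an ONNX file. A self-contained sketch, assuming the same four labels and the 0.55 threshold; classify_logits is a hypothetical helper that returns a plain dict with the placard color only (recommendation strings omitted):

```python
import numpy as np

LABELS = ["no-damage", "minor-damage", "major-damage", "destroyed"]
PLACARD = {"no-damage": "GREEN", "minor-damage": "YELLOW",
           "major-damage": "ORANGE", "destroyed": "RED"}


def classify_logits(logits: np.ndarray, threshold: float = 0.55) -> dict:
    """softmax -> argmax -> confidence/entropy -> placard, mirroring the endpoint."""
    e = np.exp(logits - logits.max())      # stable softmax
    probs = e / e.sum()
    idx = int(probs.argmax())
    conf = float(probs[idx])
    p = probs.clip(1e-9, 1.0)              # normalized entropy, as in step 4
    ent = float(-(p * np.log(p)).sum() / np.log(len(p)))
    label = LABELS[idx]
    placard = "YELLOW" if conf < threshold else PLACARD[label]
    return {
        "top_class": label,
        "placard": placard,
        "confidence": round(conf, 3),
        "entropy": round(ent, 2),
        "classes": {name: round(float(q), 3) for name, q in zip(LABELS, probs)},
    }


result = classify_logits(np.array([2.31, -0.42, -1.10, -3.05], dtype=np.float32))
```

A unit test can then assert on result["placard"] for fixed logits without starting Uvicorn.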
Expected output
{
  "filename": "tile_001.tif",
  "placard": "ORANGE",
  "top_class": "major-damage",
  "confidence": 0.78,
  "entropy": 0.54,
  "classes": [
    {"label": "no-damage", "probability": 0.05},
    {"label": "minor-damage", "probability": 0.12},
    {"label": "major-damage", "probability": 0.78},
    {"label": "destroyed", "probability": 0.05}
  ],
  "recommendation": "Engineer inspection required before re-entry."
}

Common errors

AttributeError: 'NoneType' object has no attribute 'predict'
The model is still None when the request runs: the startup hook did not run or raised an error. Check the launch logs for the "Model loaded" line.
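The same error also appears when the router module does from app.main import model: that statement copies the None that exists at import time, and the startup hook's later assignment rebinds the name in app.main only. A minimal sketch reproducing this without FastAPI, using a throwaway module object in place of app/main.py (fake_main and the placeholder string are illustrative):

```python
import types


def simulate_import_binding() -> tuple[object, object]:
    """Show how `from app.main import model` can leave a stale None behind."""
    # Stand-in for app/main.py right after import: the global is still None.
    fake_main = types.ModuleType("fake_main")
    fake_main.model = None

    # `from app.main import model` binds a new name to the *current* value.
    copied = fake_main.model

    # Later, the startup hook rebinds the module-level global.
    fake_main.model = "ModelManager instance"  # placeholder for the real object

    # The copy still points at None; the module attribute sees the new value.
    return copied, fake_main.model


print(simulate_import_binding())  # (None, 'ModelManager instance')
```

Reading the attribute through the module at request time (main.model), or storing the instance on app.state, avoids the stale copy.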
Quick quiz

Which step turns model output into a human-readable safety decision?