Model Selection Guide
Choose a model path based on data sensitivity, deployment boundary, workload, hardware, context length, budget, and governance. This guide recommends an architecture path first, then model families.
SovAIHub principle
Do not start with model hype. Start with the boundary: where the data can go, where the model can run, how updates are approved, and how outputs are audited.
Data class
Private contracts, employee data, business-sensitive data.
Deployment boundary
Your own servers, GPUs, storage, and network controls.
Workload type
Answer from internal documents with citations.
Available hardware
Good for 7B–14B models and many RAG workloads.
Context need
Most RAG apps and internal assistants.
Budget priority
Control, auditability, and the data boundary are the priority.
Governance need
Prompt, response, source, user, and model traceability.
Language need
Primary documents and users are English.
Why this path
- Your data class requires a controlled boundary, and you have infrastructure to self-host inference.
- Self-hosting gives stronger data control, model version control, private networking, and auditability.
- For RAG, retrieval quality, chunking, reranking, citations, and answer evaluation matter as much as model choice.
Implementation steps
1. Benchmark 2–3 candidate models against your real evaluation set.
2. Add retrieval, reranking, prompt templates, citation enforcement, and hallucination tests if using RAG.
3. Deploy the selected model behind an internal API endpoint.
4. Track token volume, latency, GPU utilization, failure cases, and evaluation scores.
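The benchmarking and tracking steps can be sketched as a small harness. This is a minimal illustration, not a prescribed tool: the evaluation set, the substring-match scoring rule, and the two stub models are all hypothetical stand-ins for your real evaluation data and inference endpoints.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation set: (question, substring expected in a correct answer).
EVAL_SET: List[Tuple[str, str]] = [
    ("What is the notice period in the supplier contract?", "30 days"),
    ("Who approves model updates?", "change board"),
]


def evaluate(model: Callable[[str], str],
             eval_set: List[Tuple[str, str]]) -> Dict[str, float]:
    """Return accuracy and mean latency in seconds for one candidate model."""
    hits = 0
    latencies = []
    for question, expected in eval_set:
        start = time.perf_counter()
        answer = model(question)
        latencies.append(time.perf_counter() - start)
        # Assumed scoring rule: case-insensitive substring match.
        if expected.lower() in answer.lower():
            hits += 1
    return {
        "accuracy": hits / len(eval_set),
        "mean_latency_s": sum(latencies) / len(latencies),
    }


def rank_candidates(candidates: Dict[str, Callable[[str], str]],
                    eval_set: List[Tuple[str, str]]):
    """Sort candidates by accuracy (descending), then by latency (ascending)."""
    results = {name: evaluate(fn, eval_set) for name, fn in candidates.items()}
    return sorted(
        results.items(),
        key=lambda item: (-item[1]["accuracy"], item[1]["mean_latency_s"]),
    )


# Stub models standing in for real endpoints (hypothetical).
def stub_model_a(prompt: str) -> str:
    return "The notice period is 30 days."


def stub_model_b(prompt: str) -> str:
    if "notice" in prompt:
        return "The notice period is 30 days."
    return "Updates are approved by the change board."


if __name__ == "__main__":
    for name, scores in rank_candidates(
        {"model_a": stub_model_a, "model_b": stub_model_b}, EVAL_SET
    ):
        print(name, scores)
```

In practice each stub would be replaced by a call to a candidate's inference endpoint, and the substring check by whatever scoring your evaluation set defines.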
Cautions
- Do not choose a 70B model before testing whether 7B–14B solves the workload.
- Hardware sizing should include context length, concurrency, and KV cache memory, not only model size.
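To see why context length and concurrency matter for sizing, a rough KV cache estimate helps. The sketch below uses the standard approximation for transformer attention caches (one key and one value tensor per layer). The model shape in the example is a hypothetical 8B-class configuration, not a specific product, and the result is a worst case that assumes every concurrent request fills the whole context window. Activations and runtime overhead are not included.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, concurrent_requests: int,
                   bytes_per_value: int = 2) -> int:
    """Worst-case KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes a 16-bit cache (FP16 or BF16).
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_tokens * concurrent_requests * bytes_per_value)


def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Memory for model weights, assuming 16-bit weights by default."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128.
    cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                           context_tokens=8192, concurrent_requests=4)
    weights = weight_bytes(8_000_000_000)
    print(f"KV cache: {cache / GIB:.1f} GiB")
    print(f"Weights:  {weights / GIB:.1f} GiB")
```

Under these assumptions the cache alone needs 4 GiB on top of roughly 15 GiB of weights, and it grows linearly with both context length and concurrency.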
Common model selection patterns
Use the patterns below as a starting point. Validate the final selection against your own dataset, latency target, governance requirements, and hardware budget.
When not to use an LLM
If the task is deterministic routing, exact matching, simple classification, or rules-based validation, start with simpler automation before using an LLM.
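As an illustration of "simpler automation first", the sketch below routes requests with regular expressions and validates an identifier with an exact pattern, calling an LLM only when no rule matches. The queue names, keywords, ticket ID format, and the `llm_fallback` callable are all hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical routing rules: the first matching pattern wins.
RULES = [
    (re.compile(r"\b(invoice|payment|refund)\b", re.IGNORECASE), "finance"),
    (re.compile(r"\b(password|login|vpn)\b", re.IGNORECASE), "it_support"),
]

# Hypothetical ticket ID format: three capital letters, a dash, four digits.
TICKET_ID = re.compile(r"[A-Z]{3}-\d{4}")


def route(text: str) -> Optional[str]:
    """Return a queue name if a deterministic rule matches, else None."""
    for pattern, queue in RULES:
        if pattern.search(text):
            return queue
    return None


def is_valid_ticket_id(value: str) -> bool:
    """Rules-based validation: no model is needed for an exact format check."""
    return TICKET_ID.fullmatch(value) is not None


def handle(text: str, llm_fallback: Callable[[str], str]) -> str:
    """Try the rules first and call the (assumed) LLM only when none match."""
    queue = route(text)
    if queue is not None:
        return queue
    return llm_fallback(text)
```

Measuring how often the fallback is actually reached gives a concrete basis for deciding whether an LLM is needed for the task at all.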
RAG before fine-tuning
For private knowledge, start with retrieval, citations, and evaluation. Fine-tune only when behavior, format, or domain language requires it.
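A minimal sketch of retrieval with citation enforcement follows. It uses plain keyword overlap in place of a real embedding index or reranker, and the chunk IDs, chunk texts, and bracketed citation format are hypothetical. The point is the shape of the check: an answer passes only if it cites at least one source and every cited ID was actually retrieved.

```python
import re
from typing import Dict, List, Set

# Hypothetical document chunks keyed by citation ID.
CHUNKS: Dict[str, str] = {
    "policy-1": "Employees accrue 25 days of annual leave per year.",
    "policy-2": "Remote work requires written manager approval.",
    "contract-7": "The supplier contract has a 30 day notice period.",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank chunks by token overlap with the query; drop chunks with no overlap."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(body)), chunk_id)
              for chunk_id, body in chunks.items()]
    ranked = sorted((item for item in scored if item[0] > 0),
                    key=lambda item: (-item[0], item[1]))
    return [chunk_id for _, chunk_id in ranked[:top_k]]


def cited_ids(answer: str) -> Set[str]:
    """Extract citation IDs written as [chunk-id] in the answer text."""
    return set(re.findall(r"\[([^\[\]]+)\]", answer))


def citations_are_grounded(answer: str, retrieved: List[str]) -> bool:
    """True only if the answer cites something and every citation was retrieved."""
    ids = cited_ids(answer)
    return bool(ids) and ids <= set(retrieved)
```

For example, `retrieve("What is the notice period in the supplier contract?", CHUNKS)` returns only `contract-7` here, so an answer citing any other ID, or citing nothing, would be rejected by `citations_are_grounded`.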
Edge AI is a separate path
Device AI needs hardware profiling, compact models, optimized formats, controlled updates, and field testing. Treat it as an engineering assessment, not a model dropdown.
Need a model architecture review?
SovAIHub can help compare local LLMs, managed cloud models, RAG design, edge deployment paths, hardware sizing, and governance controls for your environment.