Hosting an on-prem LLM (SSC)
- Hengjian Zhang - AI application support engineer
- NVIDIA AI Blueprint
- monitoring
    - Zipkin, Grafana
    - Prometheus
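For the monitoring stack above, a scrape config might look roughly like this (a sketch only — job names and ports are my assumptions, not from the talk):

```yaml
# prometheus.yml - hypothetical scrape config for the deployment
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "nim"        # NIM container metrics endpoint (port assumed)
    static_configs:
      - targets: ["localhost:8000"]
  - job_name: "milvus"     # Milvus metrics endpoint (port assumed)
    static_configs:
      - targets: ["localhost:9091"]
```

Grafana would then use Prometheus as a data source for dashboards, while Zipkin collects traces separately.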
    - RAG: NeMo Retriever + PaddleOCR
- Milvus vector DB
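To make the RAG part concrete: NeMo Retriever produces the embeddings (PaddleOCR extracts text from scanned documents first), and Milvus stores and searches the vectors. A toy stand-in for the Milvus similarity-search step, in plain Python instead of the real `pymilvus` client (the vectors and cosine metric here are illustrative assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, index, k=1):
    """Return the k most similar (score, chunk) pairs -- what a
    Milvus search() call does at scale with ANN indexes."""
    scored = [(cosine(query, vec), chunk) for vec, chunk in index]
    return sorted(scored, reverse=True)[:k]

# Tiny fake "embedding" index: (vector, text chunk) pairs.
index = [
    ([1.0, 0.0, 0.0], "GPU cluster docs"),
    ([0.0, 1.0, 0.0], "license terms"),
    ([0.9, 0.1, 0.0], "DGX B200 specs"),
]

print(top_k([1.0, 0.05, 0.0], index, k=1))
```

The real pipeline replaces the list scan with Milvus's approximate nearest-neighbor search, which is what makes this work over millions of chunks.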
- Why
    - UvA has a chatbot already (so we should as well ;p)
- Advantages
- monitoring and safety control
    - NVIDIA Inference Microservices (NIM; higher throughput)
- “dont need to maintain software” (aka someone else does it for you)
- Spike - license obtained by TU/e
    - it's an AI supercomputer (NVIDIA DGX B200)
- nvidia/llama-3.3-nemotron-super-49b-v1
    - sending data to a TU/e server instead of OpenAI is the only selling point
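Since the selling point is an OpenAI-compatible endpoint hosted at TU/e, switching a client over should mostly be a base-URL change. A stdlib-only sketch (the endpoint URL is a placeholder assumption; NIM serves the usual OpenAI-style `/v1/chat/completions` route):

```python
import json
import urllib.request

# Placeholder: the real TU/e endpoint URL was not given in the talk.
BASE_URL = "https://llm.example.tue.nl/v1"
MODEL = "nvidia/llama-3.3-nemotron-super-49b-v1"

def build_chat_request(prompt, model=MODEL):
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def chat(prompt):
    """POST to the on-prem server instead of api.openai.com."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_chat_request("Summarize the DGX B200 spec sheet.")
print(payload["model"])
```

Existing OpenAI SDK code would work the same way by pointing the client's `base_url` at the TU/e server.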
- at the moment
- deployed on a vm
- no dynamic scaling yet
    - no money for this atm, so it doesn't "really" exist
    - if there's demand, then TU/e will spend money on it
    - has guardrails, but those require even more GPU resources
- SURF
    - self-hosted GPUs are not okay
    - 4x A100, but not enough
    - EduGenAI is a service by them
- Spike-1 is not too stable right now
- supercomputing@tue.nl
- IMO: kinda sucks right now (I guess) (or not????)