Essentials for Building Intelligent Systems Using AI
by
Subhaditya Mukherjee
OpenML
Overview of the session
Building AI systems in the real world
- Warning: “Advanced/Not cool” topics ahead, so feel free to ask questions :)
- Limited time, so if you want to know more about something, ask
- It's okay if something does not make sense
- What are intelligent systems?
- University vs the real world
- Understanding the AI hype train
- Essential components of AI systems
- Bonus - MCP servers
What is an Intelligent System?

- Something like ChatGPT perhaps? Or more?
- Do we expect it to do more complex tasks than just answer questions?
What you learn in school vs the real world

- If you want to found a startup, get a tech job, or build cool tools for the broader community:
- Production is a lot more than just working code that does XYZ
- AI models are just one small part of the system
- With great power … comes great responsibility
AI Hype, a reality check
- Everyone and their grandma is using ChatGPT these days
- We “believe” that this is the answer to everything, but is it?
- One large AI model (GPT-5, OLMo) vs one large + many small + a lot of handwritten tech
- At the end of the day - it is a product. And we are the future “paying” customers
- Computer Vision → LLMs → Multimodal
- Fundamental breakthroughs
- Data Availability
- Transformers
- Better training, e.g. RLHF
- Is this enough?
RLHF and “people pleasing”
- Manually or semi-automatically labelled data points in a question-answer format
- Biased answers, even if a real answer does not exist
- People pleasing?
- OpenAI is a good case study
- Resources - InstructGPT, HF blog
Essential 1 - Data
Kaggle data vs real world data
- Clean
- Not biased
- Labelled
- Structured
Collect your own!
- Explore, label
- Train a simple model, test it on data outside the scope of your dataset, and see how it does
- Resources - MLCommons, Croissant
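A minimal sketch of the "train a simple model, test outside your dataset" idea, using only the Python standard library. Everything here is an assumption made for illustration: the data is synthetic, the "model" is just a threshold between two class means, and the `shift` value standing in for "outside the scope" is arbitrary.

```python
import random

def make_data(rng, n, shift=0.0):
    """Synthetic 1-D data: class 0 centred at 0 + shift, class 1 at 2 + shift."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label + shift, 0.5)
        data.append((x, label))
    return data

def fit_threshold(train):
    """The 'simple model': the midpoint between the two class means."""
    means = []
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        means.append(sum(xs) / len(xs))
    return (means[0] + means[1]) / 2

def accuracy(threshold, data):
    """Fraction of points where 'x above threshold' agrees with the label."""
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(rng, 500)
test_same = make_data(rng, 500)                 # same distribution as training
test_shifted = make_data(rng, 500, shift=1.5)   # made-up "real world" shift

threshold = fit_threshold(train)
acc_in = accuracy(threshold, test_same)
acc_out = accuracy(threshold, test_shifted)
print(f"in-distribution accuracy:     {acc_in:.2f}")
print(f"out-of-distribution accuracy: {acc_out:.2f}")
```

The gap between the two numbers is the point: a model can look great on a held-out split of your own dataset and still fail on data collected elsewhere.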
Essential 2 - Model

Choosing a model
- Type of task - Papers with code (discontinued)
- Vision vs Text vs LLMs vs Multimodal?
- Performance vs accuracy
- Resources available
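One way to make the trade-off above concrete, reading "performance" as speed: pick the most accurate model that still fits your latency and memory budget. This is only a sketch. The `Candidate` class, the `pick_model` helper, and all numbers below are hypothetical placeholders, not measurements of any real model; in practice you would fill them in from your own validation set and hardware.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    accuracy: float    # measured on your own validation data
    latency_ms: float  # time per request on your hardware
    memory_gb: float   # memory needed to serve the model

def pick_model(candidates: List[Candidate],
               max_latency_ms: float,
               max_memory_gb: float) -> Optional[Candidate]:
    """Return the most accurate candidate within budget, or None if none fits."""
    feasible = [
        c for c in candidates
        if c.latency_ms <= max_latency_ms and c.memory_gb <= max_memory_gb
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)

# Placeholder numbers, invented purely for illustration.
candidates = [
    Candidate("small", accuracy=0.86, latency_ms=15, memory_gb=0.5),
    Candidate("medium", accuracy=0.91, latency_ms=80, memory_gb=4),
    Candidate("large", accuracy=0.94, latency_ms=900, memory_gb=40),
]

choice = pick_model(candidates, max_latency_ms=100, max_memory_gb=8)
print(choice.name if choice else "nothing fits the budget")
```

The most accurate model is not automatically the right one: with this budget the "large" candidate is ruled out before accuracy is even compared.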
Essential 3 - Ethics

Bias
- Algorithmic harm - Employment, Cost, Surveillance, Stereotyping
- Deploying models without further thought is never a good idea
- Resources - Ethics fast.ai
Safety rails
- Fake content generation and hallucinating information
- Hate speech
- People pleasing - AI psychosis
Privacy
- How much data are you storing?
- Sending user information to a server vs local processing
- Why is this not easy? Big models vs on device ML
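A small sketch of the "local processing" side of this trade-off: strip obvious identifiers on the device before any text is sent to a server. The `redact` helper and both regular expressions are assumptions for illustration; they catch only simple email and phone formats and are nowhere near a complete anonymisation solution.

```python
import re

# Deliberately simple patterns; real identifiers come in many more shapes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

message = "Call +31 6 1234 5678 or mail jane@example.com today"
safe_message = redact(message)
print(safe_message)  # this is what would leave the device
```

Only `safe_message` would be sent on; the original text never leaves the user's machine.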
GDPR
- EU AI Act
- Why does it matter to you?
- A really good set of guidelines
- Related to everything we have discussed so far
Very Advanced - MCP Servers
- Have any of you heard of them?
- What is it? - LLM + Tool
- GitHub
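To make "LLM + Tool" concrete, here is a toy sketch of the idea in plain Python: the program exposes named functions, the model asks for one by name with JSON arguments, and the program runs it and returns the result. This is not the MCP protocol or any MCP SDK. The `tool` decorator, `list_tools`, `call_tool`, and the request format are all hypothetical, and the model's output is a hard-coded string.

```python
import json

TOOLS = {}  # registry: tool name -> Python function

def tool(fn):
    """Register a function so it can be called by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

def list_tools():
    """What the model would be shown: tool names and descriptions."""
    return [{"name": name, "description": fn.__doc__} for name, fn in TOOLS.items()]

def call_tool(request_json: str) -> str:
    """Run the requested tool and return the result as JSON."""
    request = json.loads(request_json)
    name = request["tool"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**request["arguments"])
    return json.dumps({"result": result})

# Stand-in for what an LLM might emit when it decides to use a tool.
fake_model_output = '{"tool": "add", "arguments": {"a": 2, "b": 3}}'
response = call_tool(fake_model_output)
print(list_tools())
print(response)
```

The model never runs code itself; it only picks a tool and arguments, and the surrounding handwritten code decides what actually gets executed.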

Eclair
- “Data scientist intern”
- GitHub

Questions?

Thank you :)
- Link to the slides
- My website