I’m a rising senior at Purdue studying data science and applied statistics, and by most measures I came into my first real engineering role feeling prepared. I’d taken the courses, built the models and worked on corporate data projects through Purdue’s programs. The technical foundation was real.
Earlier this year, I worked as a forward-deployed engineer at a YC-backed startup, deploying AI analytics pipelines into live enterprise supply chain systems. Operations teams were using the output to make actual procurement decisions, and if the numbers were wrong, people would notice.
What I found was that the technical skills transferred fine. Everything around them, from the judgment calls to communication to the operational context, was something I had to figure out as I went. Here’s what I wish someone had told me earlier.
6 Things School Didn’t Teach Me About Working with Data in the Real World
- Data cleaning isn’t a preprocessing step. It’s the job.
- Stakeholder trust is a technical problem.
- The questions that matter most aren’t technical.
- Speed matters more than perfection.
- Documentation isn’t busy work. It’s how teams stay aligned.
- Speed of learning matters more than current knowledge.
1. Data Cleaning Isn’t a Preprocessing Step. It’s the Job.
In every course I took, data cleaning was something you did before the real work started. A few lines of pandas, maybe some missing-value imputation and then on to the model. In production, data cleaning is the real work. The modeling is almost the easy part.
Production data is messy in ways that are hard to appreciate until you’re in it. The same entity might appear under different naming conventions across different systems. Two columns that should match often don’t. Fields that seem straightforward turn out to have multiple competing definitions that nobody wrote down because everyone just internalized them over time.
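To make the naming problem concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than anything from a real system: the two supplier lists, the list of legal suffixes and the helper names `normalize_name` and `compare_systems` are all hypothetical. It shows only the first pass, normalizing names and listing what still fails to match, which is usually the list you end up taking to a person.

```python
import re

# Hypothetical legal suffixes to ignore when comparing supplier names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_name(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def compare_systems(system_a, system_b):
    """Return which normalized names match and which appear on one side only."""
    keys_a = {normalize_name(name) for name in system_a}
    keys_b = {normalize_name(name) for name in system_b}
    return {
        "matched": sorted(keys_a & keys_b),
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
    }


# Made-up example data: the same suppliers spelled differently in two systems.
erp_suppliers = ["Acme Corp.", "Globex, Inc.", "Initech LLC"]
warehouse_suppliers = ["ACME CORP", "globex inc", "Umbrella Ltd"]

report = compare_systems(erp_suppliers, warehouse_suppliers)
for bucket, names in report.items():
    print(bucket, names)
```

The unmatched buckets are the useful part. A rule like this will never resolve every name, so the leftovers become the questions you ask whoever owns each system.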
Before you can build anything reliable, you must understand the data’s provenance. Where did it come from? Who owns it? What assumptions are baked in? These questions take up most of your time, and answering them requires talking to people, not just querying tables.
2. Stakeholder Trust Is a Technical Problem
You can build something technically correct and still fail. If the people using your work don’t understand how it was built, don’t trust the numbers or can’t connect the output to a decision they make, it doesn’t matter how clean the pipeline is.
Building trust means making your work transparent: documenting metric definitions, surfacing data quality issues instead of hiding them and taking the time to understand what decisions your output is meant to support. It means treating the relationship with your stakeholders as part of the technical work, not separate from it.
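One way to surface data quality issues rather than hide them is to ship a small summary alongside the numbers. The sketch below is a hypothetical example, not a standard tool: the `quality_report` function, the field names and the sample orders are all assumptions made up for illustration. It counts missing values in the fields a stakeholder relies on and flags duplicated keys.

```python
from collections import Counter


def quality_report(rows, key_field, required_fields):
    """Summarize missing values and duplicate keys in a list of dict rows."""
    # A value counts as missing if it is None or an empty string.
    missing = {
        field: sum(1 for row in rows if row.get(field) in (None, ""))
        for field in required_fields
    }
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicate_keys = sorted(
        key for key, count in key_counts.items() if key is not None and count > 1
    )
    return {
        "row_count": len(rows),
        "missing": missing,
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample data with one blank supplier, one missing cost, one repeated ID.
orders = [
    {"order_id": "A1", "supplier": "acme", "unit_cost": 4.5},
    {"order_id": "A2", "supplier": "", "unit_cost": 3.0},
    {"order_id": "A2", "supplier": "globex", "unit_cost": None},
]

summary = quality_report(orders, "order_id", ["supplier", "unit_cost"])
print(summary)
```

Putting a summary like this next to the output tells the people using it what the numbers can and cannot support, which is a large part of why they come to trust them.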
Reliability is a feature. An imperfect model the team trusts and uses beats a sophisticated one they ignore.
3. The Questions That Matter Most Aren’t Technical
Before writing a single line of code, the most valuable thing you can do is ask, “What decision is this meant to support?”
In production, the hardest problems are rarely statistical. They’re definitional. What exactly are we measuring? What counts as success? Why does this metric matter to this team right now? If you skip these questions, you risk building something that, while precise, answers the wrong problem entirely.
School teaches you to optimize for a metric. Real work experience teaches you to first ask whether you’re measuring the right thing. Developing that type of judgment only comes from being in the room with the people whose decisions depend on your output.
4. Speed Matters More Than Perfection
In school, two weeks on a project is normal. You iterate, refine, polish and turn in something clean. In production, two weeks on a single deliverable is often too long. The business environment shifts, stakeholder priorities change and a good answer delivered fast is usually worth more than a perfect answer delivered late.
The mindset shift I had to make was learning to ship something useful, get real feedback and improve from there, rather than disappearing for two weeks to build the definitive version. Scoping tightly, building quickly and iterating in the open is one of the most underrated skills in data work, and it’s one that school doesn’t really train you for.
5. Documentation Isn’t Busy Work. It’s How Teams Stay Aligned.
On a small, fast-moving team, there’s no project manager tracking what everyone is doing. You own your work entirely, which means you also own communicating it. Documenting what you built, what decisions you made and why isn’t overhead. It’s the connective tissue that lets a team move fast without constantly losing context.
In school, documentation feels like something you do after the real work is done. In practice, it is part of the real work, especially when the people around you need to stay aligned without slowing you down.
6. Speed of Learning Matters More Than Current Knowledge
The most important thing I brought to the job wasn’t any specific technical skill. It was the ability to pick up new tools quickly, ask good questions and figure things out without being handed a roadmap. The tech stack you learn in school probably won’t be exactly what you use at work. The data sets you train on won’t look like production data. The problems will be messier and less well-defined than anything in a problem set.
The skill that scales is how fast you can go from not knowing something to being useful with it. Purdue taught me how to learn rigorously. The real world taught me how to learn fast. Both matter, and neither is sufficient on its own.
The gap between school and production is real, but it’s not a flaw in your education. Instead, it’s just the nature of applied work. The classroom gives you the tools. The real world teaches you when and how to use them, and the sooner you start working on real problems with real stakes, the faster that gap closes.
