TIL fine-tuning an LLM takes way more trial runs than I thought

I spent the last 3 weekends trying to get a custom GPT to summarize support tickets for our small team. I figured it would take maybe an afternoon to tweak the training data and be done. Nope. I ran into this problem where the model kept using overly formal language no matter how many examples I gave it. Finally on the 4th try I realized I was including too many long examples and not enough short, casual ones. The model was basically learning the wrong pattern from my own data. It took around 20 hours total to get something halfway useable. Has anyone else found that the data formatting matters way more than the actual model you pick?

3 comments

3 Comments

fiona_kim1mo ago

Wait, is 20 hours really that bad for fine tuning?

rowanhernandez1mo ago

Oh man, @leo_black76 is totally right about the data thing. I once spent like 15 hours tweaking prompt templates before realizing my training data had a typo in every single label, lmao.

leo_black761mo ago

Count me among the ones who had the same rude awakening. Spent a solid weekend wrestling with a model that kept spitting out lawyer-style responses when I wanted something a barista could write. @fiona_kim, 20 hours is a lot if you're just trying to get a tool to work, not build a whole product. The data formatting thing is real though, I bet half the time people blame the model when it's really their own messy examples. Eventually you just learn to strip everything down to the basics and let the model fill in the gaps.