Hit 1,000 false positives from our new AI moderation tool in one week

I installed a content filter for our forum last Monday and it flagged 1,000 harmless posts by Friday, mostly blocking users talking about their pets and hobbies instead of actual spam. Has anyone else had a model that just completely misses the point like that?

3 comments

jana881 · 1mo ago

1,000 false positives in a week is barely a hiccup for most forums I've seen.

claire872 · 1mo ago · OG Member

...and from what I've seen, the problem is usually that the training data mostly taught the model to latch onto stuff like swear words and product links, so it completely misses the context around pets and hobbies, where people using those same words aren't breaking any rules. Take the pet posts you mentioned, say someone talking about their aquarium fish: a real spammer would probably be pushing a supplement or a scam link using completely different wording. The model just doesn't understand that "I love my new puppy" and "click here for free money" are not the same kind of post, so it flags everything with a certain word pattern instead of learning the actual intent behind it.
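
To make the "word pattern instead of intent" point concrete, here's a minimal sketch of a keyword-weight filter. The word list, weights, and threshold are all made up for illustration; nobody in this thread knows what the OP's tool actually does internally. It scores each post by the spammy words it contains and nothing else, so a pet post with "free" in it gets flagged while a supplement pitch that avoids those words sails through.

```python
import re

# Hypothetical weights and threshold, NOT taken from any real moderation tool.
SPAM_WEIGHTS = {"free": 2, "click": 2, "money": 2, "offer": 1}
THRESHOLD = 2

def spam_score(post: str) -> int:
    """Sum the weights of known 'spammy' words; context and intent are ignored."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(SPAM_WEIGHTS.get(w, 0) for w in words)

def is_flagged(post: str) -> bool:
    return spam_score(post) >= THRESHOLD

posts = [
    "I love my new puppy",                         # harmless, not flagged
    "Free kittens to a good home, message me",     # harmless, flagged (false positive)
    "click here for free money",                   # spam, flagged
    "Great deal on vitamins, DM me for the link",  # spam, not flagged (false negative)
]
for p in posts:
    print(is_flagged(p), spam_score(p), p)
```

Both failure modes come from the same design choice: the score only looks at which words appear, so anything that shares vocabulary with spam gets caught and anything that avoids it gets through.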

juliarodriguez · 1mo ago

...you know, that whole thing about the model not getting intent reminds me of the time I got flagged for "inappropriate language" in a forum post about my classroom pet hamster. I said something like "the little guy made a mess in his cage," and the filter thought I was swearing or something. It's so frustrating: the people who actually have pets or hobbies are the ones getting hit with false flags, while the real spammers are out there crafting these weirdly specific links that slip right through. I'm not gonna lie, I've probably reported a few innocent fish tank photos myself before I realized what was happening. Makes me wonder whether the people training these things have ever actually moderated a real forum, or if they just feed it a bunch of random keywords and call it a day.
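
We can't tell from the post what actually triggered that flag. One classic culprit, assuming the filter does plain substring matching against a blocklist, is a blocked term hiding inside an innocent word (often called the Scunthorpe problem): "classroom" contains "ass" and "hello" contains "hell". The blocklist below is hypothetical, just to show the difference between substring and whole-word matching.

```python
import re

# Hypothetical blocklist for illustration only.
BLOCKLIST = ["ass", "hell"]

def naive_flag(post: str) -> list[str]:
    """Substring check: flags a term if it appears anywhere, even inside another word."""
    text = post.lower()
    return [term for term in BLOCKLIST if term in text]

def word_boundary_flag(post: str) -> list[str]:
    """Flags a term only when it appears as a whole word (regex \\b boundaries)."""
    text = post.lower()
    return [term for term in BLOCKLIST if re.search(rf"\b{re.escape(term)}\b", text)]

post = "Hello! Our classroom hamster made a mess in his cage again."
print(naive_flag(post))          # both terms match inside "hello" and "classroom"
print(word_boundary_flag(post))  # nothing matches as a whole word
```

Whole-word matching removes this particular kind of false flag, but it still has no idea what a post is about, so it doesn't address the intent problem discussed above.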