20

TIL that AI training data can include your private Reddit DMs

I saw a thread on r/MachineLearning last Tuesday where someone proved their old private chat logs showed up in a public dataset. So if you think your posts here are safe from scraping, you're wrong. Has anyone actually checked what datasets like The Pile pulled from their accounts?
3 comments

ward.jamie 1mo ago
The Pile had like 250GB of Reddit data including deleted stuff.
7
nelson.vera
Feels like EVERYTHING we do online is just feeding someone else's machine now, from our shopping habits to our private messages. It's getting harder to tell where our digital lives end and their training data begins.
1
jessej23 1mo ago
Think about how much of what you type or click gets scraped up by some company you never even heard of. It's like we're all just unpaid workers building their systems for them. Hard to feel good about any of it when your private thoughts become somebody else's product.
6