In practice
Even a handful of corrupted documents in a web crawl can create persistent backdoors or biases. Particularly risky for models that continuously train on public content or are fine-tuned on unvetted third-party datasets.
Even a handful of corrupted documents in a web crawl can create persistent backdoors or biases. Particularly risky for models that continuously train on public content or are fine-tuned on unvetted third-party datasets.