Semantic HTML is LLM Training Fuel: Why 'Div Soup' Poisons Models

In the early days of the web, we were told to use Semantic HTML for accessibility. We were told it allowed screen readers to navigate our content, providing a better experience for the visually impaired. We were told it might help SEO, though Google’s engineers were always famously coy about whether an <article> tag carried significantly more weight than a well-placed <div>.

In 2025, that game has changed entirely. We are no longer just optimizing for screen readers or the ten blue links on a search results page. We are optimizing for the training sets of Large Language Models (LLMs).

Read more →