I have written about link checking on this blog before: first in Mastering internal links in Zola, where I covered converting self-referencing links to Zola’s internal syntax, then in Fragment link checking in Astro, where I set up anchor validation with lychee after migrating to Astro. Both of those focused on internal links. External links were left unchecked, and it was starting to show.

The problem with checking everything

The naive approach would be to run lychee against the entire site and fix whatever it reports. I tried that. It does not work in practice. With over 250 posts, there are hundreds of external links. Many point to sites like GitHub or Reddit that actively block automated requests. They return 403 or 429 regardless of how politely you ask. You end up with a wall of false positives and no way to tell which links are actually broken.

Running a full site check also takes a long time. Even with generous timeouts and retries, you are looking at minutes of waiting just to get a report you cannot fully trust.

Checking only what you touch

The approach I settled on is simpler. When I push changes, a git pre-push hook runs lychee only on the markdown files that were actually modified in the pushed commits. If I am editing a post from 2021, only that post’s external links get checked. The rest of the blog is left alone.

The hook reads the push range from stdin to figure out which commits are being pushed, then filters for .md files:

pushed_md=$(git diff --name-only --diff-filter=ACM "$push_range" | grep '\.md$' || true)
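A minimal sketch of how the push range could be derived. Git passes a pre-push hook one line per pushed ref on stdin, in the form "local ref, local sha, remote ref, remote sha", and uses an all-zero hash for a ref that does not exist yet or is being deleted. The function name push_ranges and the fallback base branch main are assumptions for illustration, not taken from the actual hook.

```shell
# Hypothetical fallback base for branches the remote has not seen yet.
default_base=main

# True when the hash consists only of zeros (git's "no such object" marker).
is_zero_sha() {
    case $1 in
        *[!0]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Read pre-push stdin lines and print one commit range per pushed ref.
push_ranges() {
    while read -r local_ref local_sha remote_ref remote_sha; do
        # Deleting a remote branch: there is nothing to check.
        if is_zero_sha "$local_sha"; then
            continue
        fi
        if is_zero_sha "$remote_sha"; then
            # New branch on the remote: compare against the assumed base.
            echo "$default_base..$local_sha"
        else
            echo "$remote_sha..$local_sha"
        fi
    done
}
```

Each printed range can then be used as $push_range in the git diff command above.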

Then lychee runs on just those files:

echo "$pushed_md" | xargs lychee \
    --base-url 'http://localhost:4321' \
    --exclude 'localhost' \
    --exclude '127\.0\.0\.1' \
    --no-progress \
    --max-retries 2 \
    --timeout 20

The --base-url is there because root-relative links in markdown like [feed](/atom.xml) would otherwise fail to resolve. With the base URL set to the local dev server (which is already running from the internal link check), these get resolved to http://localhost:4321/atom.xml and then excluded by the localhost filter.

The check runs as a warning only. If lychee reports failures, the push still proceeds. This is intentional because some sites will always reject link checkers and I do not want a false positive to block my workflow.
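One way to get this warn-only behaviour is a small wrapper that runs a command, prints a warning if it fails, and always reports success to the hook. This is a sketch; the helper name warn_only is hypothetical.

```shell
# Run the given command; on failure print a warning but still return 0,
# so the calling hook does not abort the push.
warn_only() {
    if ! "$@"; then
        echo "warning: '$1' reported failures; pushing anyway" >&2
    fi
    return 0
}
```

The lychee invocation above could then be wrapped as echo "$pushed_md" | warn_only xargs lychee ... with the same flags.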

Since I made my blog dual-language, there is another dimension to link checking. Each post exists as en.md and sk.md in the same folder. The text differs, but links are not translated, so they should stay identical. If I fix a broken link in the English version but forget the Slovak one, or vice versa, I end up with diverging content.

The pre-push hook now also extracts all URLs from both language files and compares them. If there is a mismatch, the push is blocked:

Links mismatch in my-post:
  only in en: https://example.com/old-link
  only in sk: https://example.com/new-link

This has already caught a few cases where a URL was updated in one language but not the other. The URL extraction strips frontmatter first and scans the entire body, not just the links section at the bottom.
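The comparison can be sketched roughly as follows. It assumes YAML frontmatter delimited by --- lines and a deliberately simple URL pattern; the function names extract_urls and compare_pair are made up for this example.

```shell
# Print the URLs found in a markdown file's body, sorted and de-duplicated.
extract_urls() {
    # Skip frontmatter: everything between a leading '---' and the next '---'.
    awk 'NR == 1 && /^---[[:space:]]*$/ { fm = 1; next }
         fm && /^---[[:space:]]*$/      { fm = 0; next }
         !fm' "$1" |
        grep -oE 'https?://[^[:space:])>"]+' |
        sort -u
}

# Compare en.md and sk.md in one post folder; return 1 on any mismatch.
compare_pair() {
    dir=$1
    en=$(mktemp)
    sk=$(mktemp)
    extract_urls "$dir/en.md" > "$en"
    extract_urls "$dir/sk.md" > "$sk"
    only_en=$(comm -23 "$en" "$sk")   # URLs present only in en.md
    only_sk=$(comm -13 "$en" "$sk")   # URLs present only in sk.md
    rm -f "$en" "$sk"
    if [ -z "$only_en$only_sk" ]; then
        return 0
    fi
    echo "Links mismatch in $(basename "$dir"):"
    if [ -n "$only_en" ]; then
        printf '%s\n' "$only_en" | sed 's/^/  only in en: /'
    fi
    if [ -n "$only_sk" ]; then
        printf '%s\n' "$only_sk" | sed 's/^/  only in sk: /'
    fi
    return 1
}
```

A hook would call compare_pair for each post folder and exit non-zero if any call fails.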

Broken links are a bad experience for readers. Someone clicks a link expecting to learn more and gets a 404 instead. It erodes trust in the content, especially for technical articles where readers need to verify claims or follow along with documentation.

Search engines also care about this. Outbound links to dead pages can signal that content is outdated or unmaintained.

Lychee also reports 301 redirects by default and this is worth paying attention to. A 301 means the destination has moved permanently. The link still works because the browser follows the redirect, but there are good reasons to update it:

  1. Extra round trip - every redirect adds latency. The browser has to make an additional request before reaching the actual content.
  2. Redirect chains - the new location might itself move later, turning one redirect into a chain. Chains degrade performance and eventually one hop in the chain may break entirely.
  3. Content accuracy - if a project moved from one domain to another, the old URL might stop redirecting at some point. Updating now saves future breakage.
  4. Signal freshness - updated links show the content is maintained. Both readers and search engines pick up on this.

When lychee reports a 301, I update the URL in the post to point directly to the new location. It takes a few seconds per link and prevents problems down the road.

What I ended up with

The pre-push hook now does three things in sequence:

  1. Checks internal links and fragments (blocks push on failure)
  2. Compares URLs between language pairs (blocks push on mismatch)
  3. Checks external links in touched files (warns but does not block)

The ordering matters. If the language pair check finds a mismatch, there is no point running the external check, because at least one side of the mismatch has to be fixed anyway. The external check runs last and only on files that passed the earlier checks.
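The sequencing can be sketched like this. The three check_* functions are placeholders for the real checks and do nothing here; only the control flow is the point.

```shell
# Placeholder checks; a real hook would replace these bodies.
check_internal_links() { :; }
check_language_pairs() { :; }
check_external_links() { :; }

run_checks() {
    # Steps 1 and 2 are blocking: stop at the first failure.
    check_internal_links || return 1
    check_language_pairs || return 1
    # Step 3 only warns, so a failure here does not change the result.
    check_external_links || echo "warning: external link check reported failures" >&2
    return 0
}
```

Because each blocking step returns early, the external check is never reached when an earlier step fails.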

It is not perfect. A full site audit would catch more. But it is practical and catches problems right when I am working on a post, which is exactly when I have the context to fix them. Enjoy!