EdgeSEO.pro

Automate FAQ Schema using Cloudflare Workers

FAQ schema is a way to tell the search engine about questions and answers your article is talking about… blah blah blah, you surely know that already if you want to automate the process. And if you don't, there are plenty of resources around, so I'll just get right into it.

How to automate FAQ Schema with Cloudflare Workers?

We will use Cloudflare Workers as HTML processors. Every time someone opens your website, the worker will analyze the HTML on the fly. Once it spots a heading ending with a question mark followed by a paragraph, it will happily create FAQ Schema out of the pair, so that you don't have to do it manually. Finally, more time to scroll Facebook!
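To make the idea concrete before touching Cloudflare, here is a minimal sketch of the "question heading plus paragraph" step in plain JavaScript. The function name `buildFaqSchema` and the sample data are made up for illustration; the real worker further down collects the pairs while streaming the HTML.

```javascript
// Minimal sketch: turn [heading, paragraph] pairs into an FAQPage JSON-LD object.
// `buildFaqSchema` is a hypothetical helper, not part of any Cloudflare API.
function buildFaqSchema(pairs, url) {
  const questions = pairs
    .map(([heading, paragraph]) => [heading.trim(), paragraph.trim()])
    // only headings that end with a question mark count as questions
    .filter(([heading]) => heading.endsWith('?'))
    .map(([heading, paragraph]) => ({
      '@type': 'Question',
      name: heading,
      acceptedAnswer: { '@type': 'Answer', text: `<p>${paragraph}</p>` },
    }))
  // no questions found: nothing to inject
  if (questions.length === 0) return null
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    '@id': url,
    mainEntity: questions,
  }
}

const exampleSchema = buildFaqSchema(
  [
    ['What is FAQ schema?', 'Structured data describing questions and answers.'],
    ['Getting started', 'This heading is not a question, so it is skipped.'],
  ],
  'https://example.com/faq'
)
console.log(JSON.stringify(exampleSchema, null, 2))
```

Only the first pair ends up in `mainEntity`, because the second heading does not end with a question mark.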

Best thing is, because it's doing the analysis on every request, the schema will get updated automatically once you tweak the content.

Since it works on the HTML returned by the server, it doesn't matter which CMS you use, or whether you use one at all. It will work exactly the same on WordPress, Webflow or Joomla (does anyone remember Joomla?).

Pretty cool, huh? If you are sold, read on how to implement it on your website.

Automating FAQ schema

First of all, you need to make sure your website is already on Cloudflare. If it's not, I highly recommend using it; the official tutorial is a good way to get started. There are ways to adapt this method to other edge networks like Akamai, but here I'll assume you are on Cloudflare.

Step 1: make sure your domain is proxied

In your Cloudflare Dashboard Home, pick the website you want to work with.

Go to the DNS tab and check that the Proxy status of your A record (or CNAME record, if you don't have an A record) is Proxied. Otherwise requests bypass Cloudflare, and the worker won't be able to process them and inject the FAQ Schema.

Screenshot showing Cloudflare Proxying enabled
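If you would rather double-check from a script than from the dashboard, the sketch below looks at response headers. It assumes that proxied responses carry Cloudflare's usual `cf-ray` header or `server: cloudflare`; treat it as a heuristic, and note that `looksProxiedByCloudflare` is a hypothetical helper name, not a Cloudflare API.

```javascript
// Heuristic check, assuming proxied responses carry Cloudflare's usual headers.
// `headers` is anything with a get(name) method, e.g. the Headers of a fetch response.
function looksProxiedByCloudflare(headers) {
  const server = (headers.get('server') || '').toLowerCase()
  return server === 'cloudflare' || Boolean(headers.get('cf-ray'))
}

// Usage with the Fetch API (needs network access, so it is left commented out):
// const res = await fetch('https://example.com/', { method: 'HEAD' })
// console.log(looksProxiedByCloudflare(res.headers))
```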

Step 2: Create Worker

Go to the Workers tab, hit Manage Workers and then Create a Service.

Time to pick a name. You can leave the default, but I recommend picking something that tells you what this worker does. That's more professional.

I tried to be professional too, so I named my little worker "it-depends". I'll surely know it's for SEO.

Once you are happy with your name, click Create service.

Screenshot showing the Create Worker form

Step 3: Let's make your Worker smart

Right now, the worker doesn't do anything useful. The example code will just render some useless HTML page.

Hit Quick edit to open the code editor.

You can safely delete all the code and replace it with the following (long code ahead):

// by default, the worker will inject the schema even if the page already has one. Change this to true to skip pages that already contain a schema.
const DISABLE_IF_SCHEMA_PRESENT = false

// by default, the script will run on all the pages. If you want to disable it for specific URLs, add a regexp here.
const FAQ_SCHEMA_IGNORE = [
  // uncomment the following line to ignore all pages with "blog" in the URL
  // /blog/
]

// by default, it will look within <h2-h6> for questions, but you can customize it here
const HEADINGS_SELECTOR = 'h2, h3, h4, h5, h6'

// by default, it will look in <p> for answers, but you can customize it here
const CONTENT_SELECTOR = 'p'

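addEventListener('fetch', (event) => {
  // fail open: if the worker throws, Cloudflare serves the origin response instead.
  // (catching the error here would defeat this fallback and show stack traces to visitors)
  event.passThroughOnException()
  event.respondWith(handleRequest(event.request))
})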
async function handleRequest(request) {
  const response = await fetch(request)
  if (
    response.ok &&
    response.headers.get('content-type')?.startsWith('text/html') &&
    !isMatching(request.url, FAQ_SCHEMA_IGNORE)
  ) {
    return createFaqSchemaRewriter(request.url).transform(response)
  } else {
    return response
  }
}
/**
 * FAQ schema rewriter, copyright 2022 Lucjan Suski (lucjan@edgeseo.pro)
 * https://edgeseo.pro/guides/automate-faq-schema-with-cloudflare-workers
 * license: MIT
 */
// BEGIN SCHEMA REWRITER
function createFaqSchemaRewriter(url) {
  let recentHeadingContent = ''
  let faqPairs = []
  let detectedExistingSchema = false
  class ParagraphHandler {
    constructor() {
      this.content = ''
    }
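    element(el) {
      el.onEndTag(() => {
        // only record paragraphs that come right after a heading
        if (recentHeadingContent !== '') {
          faqPairs.push([recentHeadingContent, this.content.trim()])
        }
        recentHeadingContent = ''
        this.content = ''
      })
    }
    text(text) {
      // text arrives in chunks; collect it only while a heading is pending
      if (recentHeadingContent !== '') {
        this.content += text.text
      }
    }
  }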
  class ExtractHeadingsContent {
    constructor() {
      this.partialContent = ''
    }
    element(el) {
      el.onEndTag(() => {
        // trim so that trailing whitespace doesn't hide the question mark
        recentHeadingContent = this.partialContent.trim()
        this.partialContent = ''
      })
    }
    text(text) {
      this.partialContent += text.text
    }
  }
  class ExistingSchemaScript {
    element(el) {
      if (el.getAttribute('type') === 'application/ld+json') {
        detectedExistingSchema = true
      }
    }
  }
  return new HTMLRewriter()
    .on(HEADINGS_SELECTOR, new ExtractHeadingsContent())
    .on(CONTENT_SELECTOR, new ParagraphHandler())
    .on('script', new ExistingSchemaScript())
    .on('link', {
      element(el) {
        const href = el.getAttribute('href')
        if (el.getAttribute('rel') === 'canonical' && typeof href === 'string') {
          url = href
        }
      },
    })
    .on('body', {
      element(el) {
        el.onEndTag((end) => {
          if (DISABLE_IF_SCHEMA_PRESENT && detectedExistingSchema) {
            return
          }
          const questionsEntity = faqPairs
            .filter(([heading]) => heading.endsWith('?'))
            .map(([heading, paragraph]) => ({
              '@type': 'Question',
              name: heading,
              acceptedAnswer: {
                '@type': 'Answer',
                text: `<p>${paragraph}</p>`,
              },
            }))
          if (questionsEntity.length >= 1) {
            const schema = {
              '@context': 'https://schema.org',
              '@type': 'FAQPage',
              mainEntity: questionsEntity,
              '@id': url,
            }
            const txt = `<script type="application/ld+json">${JSON.stringify(schema)}</script>`
            end.before(txt, { html: true })
          }
        })
      },
    })
}
// END SCHEMA REWRITER
function isMatching(str, regexpList) {
  return regexpList.some((regexp) => str.match(regexp))
}

There are a few tweaks you can make by adjusting constants at the top of the script. Did my best with the comments to tell you how, but if you are still not sure, just ask me on Slack.
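As an example of such a tweak, here is how an ignore list could behave. The two patterns are hypothetical; `isMatching` is copied from the script above, and the worker passes it the full request URL, query string included.

```javascript
// Same helper as in the worker script above.
function isMatching(str, regexpList) {
  return regexpList.some((regexp) => str.match(regexp))
}

// Hypothetical ignore list: skip the blog and any URL with a "preview" query parameter.
const FAQ_SCHEMA_IGNORE = [/\/blog\//, /[?&]preview=/]

console.log(isMatching('https://example.com/blog/my-post', FAQ_SCHEMA_IGNORE)) // true
console.log(isMatching('https://example.com/pricing', FAQ_SCHEMA_IGNORE)) // false
```

A URL is skipped as soon as any one of the patterns matches it.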

Once you are ready, hit Save and deploy and keep your fingers crossed that the script will not inject a bitcoin miner into your website.

Screenshot showing Worker code

Congrats! Your worker is ready to take some work off your plate. But we still need to instruct it when to activate.

Step 4: add the route

Right now, the worker is enabled only on its testing domain. To make it work on your website, we need to add a route.

Go back and open the Triggers tab, then hit the blue Add route button. In the Route field, enter <your domain (including a subdomain)>/*; in my case it'll be edgeseo.pro/*. For the Zone, pick the zone of your website. Then confirm with another blue Add route button.

Screenshot showing adding route form
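To see why the subdomain matters in the route, here is a rough approximation of how a pattern like edgeseo.pro/* is compared with a URL. It assumes the pattern is matched against hostname plus path and that * stands for any run of characters; `routeMatches` is a hypothetical helper written for this post, not Cloudflare's actual matching code.

```javascript
// Illustrative approximation of route matching, not Cloudflare's implementation.
// Assumption: the pattern is compared with hostname + path, and * matches anything.
function routeMatches(pattern, url) {
  const { hostname, pathname } = new URL(url)
  const target = hostname + pathname
  const source = pattern
    .split('*')
    // escape regex metacharacters in the literal parts of the pattern
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${source}$`).test(target)
}

console.log(routeMatches('edgeseo.pro/*', 'https://edgeseo.pro/guides/faq')) // true
console.log(routeMatches('edgeseo.pro/*', 'https://www.edgeseo.pro/guides/faq')) // false
```

In this sketch the second call fails because the hostname starts with www, which is why the route has to name the exact (sub)domain your visitors use.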

We are done, but don't close the tab just yet, it might still be useful!

Troubleshooting

There is a chance something went wrong and everything will just stop working. Quickly open up your website to make sure it still loads just fine. Don't panic if it doesn't!

If that happens, go back to the tab I told you not to close. (If you weren't listening and lost it, don't worry: go to dash.cloudflare.com, hit the Workers tab, click the worker you just created and open the Triggers tab.)

Find the route you just added, and under the ... menu hit Delete route. Your website will be back after a few seconds, and as soon as it is, you can call me on the carpet.

Step 5: dance party 💃🕺

You are all done! If you view the source of a page that has any questions in headings, you should see a nice and shiny <script> with the schema picked up from your content, just before the closing </body> tag.
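If squinting at page source is not your thing, a rough script can do the spot check for you. The sketch below pulls JSON-LD blocks out of an HTML string with a regex and keeps the FAQPage ones; `findFaqSchemas` and the sample HTML are made up for illustration, and a regex is good enough for a quick look but is not a real HTML parser.

```javascript
// Rough check: find JSON-LD blocks in an HTML string and keep the FAQPage ones.
function findFaqSchemas(html) {
  const pattern = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi
  const found = []
  for (const match of html.matchAll(pattern)) {
    try {
      const data = JSON.parse(match[1])
      if (data['@type'] === 'FAQPage') found.push(data)
    } catch {
      // ignore blocks that are not valid JSON
    }
  }
  return found
}

// Made-up page source, shaped like what the worker injects.
const sampleHtml =
  '<body><h2>Is it free?</h2><p>Yes.</p>' +
  '<script type="application/ld+json">{"@context":"https://schema.org","@type":"FAQPage",' +
  '"mainEntity":[{"@type":"Question","name":"Is it free?",' +
  '"acceptedAnswer":{"@type":"Answer","text":"<p>Yes.</p>"}}]}</script></body>'

for (const faq of findFaqSchemas(sampleHtml)) {
  console.log(faq.mainEntity.map((question) => question.name))
}
```

To run it against a live page, fetch the HTML first and pass the response text to `findFaqSchemas`.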

You can use Google's Rich Results Test to make sure it looks right. After a few days, once Googlebot has recrawled your website, you should see FAQ enhancements going up in your Search Console.

Enjoy!
