29 Jan 2026 · 10 min read

Which schema and content patterns do AI engines actually reward?

AI engines reward four schema types (Organisation, LocalBusiness, Service, FAQPage) paired with three content patterns (question-led headings, answer-first paragraphs, customer-language FAQs). Most sites fix one half and wonder why citations do not move. Both halves have to be in place. The fastest combined fix is FAQPage schema on a properly written FAQ section.

You spent the weekend rewriting your homepage. The copy reads better than it has in years. You added a clear FAQ section. You even drafted a services page for the work you actually want more of. Then you asked ChatGPT for a recommendation in your category and your name did not come up.

I get this message most weeks. The site is well-written. The owner has done real work. And yet the AI engine that is supposed to be reading the web walked past the front door.

The reason is almost always the same. The visible content is sound. The structural signals are missing or in the wrong shape. AI engines are pattern matchers, not readers, and they reward a specific stack of signals: four schema types, three content patterns, and a small text file most sites have never heard of.

This post walks you through the stack that produces citations. By the end you will know what to install, what to rewrite, and which single change moves the dial fastest this week.

Key Takeaways

  • Four schema types do most of the visibility work for small businesses: Organisation, LocalBusiness, Service, and FAQPage. Missing schema is a measurable visibility deduction in any AEO scan.
  • Question-based H2 and H3 headings improve AI extractability. Topic-label headings make the same content harder to lift.
  • Answer-first paragraphs win the extraction. AI engines pull the first sentence of a section more often than any other line, so it has to be the answer, not the warm-up.
  • Per-service pages beat bundled service pages. AI matches specific intent queries to pages dedicated to that service, not to a single page covering everything.
  • The Schema-and-Pattern AEO Stack pairs the four schemas with the three content patterns and an llms.txt anchor. Build the stack once and every future page inherits it.

Why do well-written websites still get skipped by AI engines?

Most small business sites are more readable than they are useful to AI. The copy is fine. The arguments are organised. The pages load. None of that is the bottleneck.

The bottleneck is structural. AI engines do not read your homepage the way a human reader does. They look for confirmed signals they can extract without guessing: a declared business name, a declared service, a question their user just asked, an answer in the first sentence under that question. When those signals are absent, the engines fall back to inference, and inference is conservative. They would rather skip you than recommend the wrong business.

Think of schema markup as the barcode on your packaging. The scanner does not care how attractive the box is. It reads the barcode. No barcode, no checkout. Without schema, AI engines have to interpret everything you wrote, and they do that interpretation with a confidence penalty attached.

The same logic applies to content. A heading written as a topic label ("Employment law for small businesses") gives AI engines nothing to match against a real query. The same content under a question heading ("What happens when a small business in Victoria gets an unfair dismissal claim?") matches the way prospects actually type. The content barely changes. The extractability changes a lot.

So what for you: if your scan says you are missing schema and missing question-shaped headings, do not assume your copy is the issue. The copy is probably fine. The signals around the copy are what AI is looking for, and those are the cheaper fix.

What does the research actually say about schema and content patterns?

The Princeton GEO study (Generative Engine Optimisation, 2024 onwards) is the most rigorous public dataset on what AI engines cite. Two of its findings bear directly on this article. Structured data appears in roughly 61% of AI-cited pages, which means the absence of schema is not a neutral choice; it is a measurable disadvantage. And content that includes statistics, expert quotes, and structured citations gets a lift in citation likelihood of around 41% versus content that does not.

SE Ranking's analysis of 2.3 million pages adds a second piece. Sites with a complete Google Business Profile show up in AI search results around 2.8 times more often than sites with incomplete profiles. A complete profile is a structured-data signal in another wrapper: the same logic at work on a different surface.

Then there are the question headings. AEO research consensus across the major studies is consistent: AI engines lift answers from headings phrased as questions far more often than from topic labels. The mechanism is simple. The engine is matching a user query to a heading on your page. A query is a question. A question heading is a one-step match. A topic label is a translation step the engine has to do, and most of the time it does not bother.

The GetRecommended.io scan reads the same signals. Missing schema markup is a measurable visibility deduction. Missing question-based headings is a separate deduction. Bundled service pages, where five services share one page, are a third. The deductions are not arbitrary. They reflect the patterns AI engines themselves reward.

So what for you: the things that move your AI visibility score in the scan are the same things the public research says move citations. There is no contradiction between the scan and the public data. There is also no shortcut around schema and question headings, because the engines run on those signals whether you like them or not.

The Schema-and-Pattern AEO Stack

The Schema-and-Pattern AEO Stack is the working framework I use whenever a business asks why AI engines pass over a site that looks fine. It has three layers. Layer one is the four schema types AI engines read first. Layer two is the three content patterns that turn extractable schema into citable answers. Layer three is the llms.txt anchor that tells AI which pages on your site actually count.

You build the stack once. Every future page on your site inherits it.

Layer 1: The four schema types

Organisation schema. This declares your business name, website, industry, description, and contact details. In the markup itself, the schema.org type uses the American spelling, Organization. It is the foundation. Every other schema type sits on top of it. Without Organisation schema, AI engines build their understanding of your business from guesswork, and guesswork costs you visibility.
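To make that concrete, here is a minimal sketch of what this markup can look like when generated as JSON-LD. The business name, URL, phone number, and the helper name organization_jsonld are all placeholders I made up for illustration, and your platform's schema plugin may produce a different shape.

```python
import json


def organization_jsonld(name: str, url: str, description: str, telephone: str) -> str:
    """Return a <script> tag holding schema.org Organization markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",  # schema.org spells the type with a "z"
        "name": name,
        "url": url,
        "description": description,
        "contactPoint": {
            "@type": "ContactPoint",
            "telephone": telephone,
            "contactType": "customer service",
        },
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, for illustration only.
print(organization_jsonld(
    name="Example Plumbing Co",
    url="https://www.example.com",
    description="Residential plumbing repairs and installations.",
    telephone="+61-3-5550-0100",
))
```

The printed tag would normally go in the head of your homepage, or be emitted by your CMS on every page.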

LocalBusiness schema. This applies to any business whose customers care where you are. It declares your address, the area you serve, your opening hours, and your phone number. Gemini reads this directly alongside your Google Business Profile. Sites with strong LocalBusiness schema and a complete Profile show up in Google's AI local recommendations far more consistently than sites relying on either alone.
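As a rough sketch, the four things named above map onto schema.org properties like this. The address and hours are invented, and the missing_local_fields helper is a hypothetical check of my own, not something schema.org or Google requires.

```python
import json

# The four details this article says LocalBusiness markup should declare.
EXPECTED_FIELDS = ("address", "areaServed", "openingHoursSpecification", "telephone")

# Placeholder business, for illustration only.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Plumbing Co",
    "url": "https://www.example.com",
    "telephone": "+61-3-5550-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Melbourne",
        "addressRegion": "VIC",
        "postalCode": "3000",
        "addressCountry": "AU",
    },
    "areaServed": "Melbourne",
    "openingHoursSpecification": [
        {
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
            "opens": "08:00",
            "closes": "17:00",
        }
    ],
}


def missing_local_fields(markup: dict) -> list:
    """List which of the expected fields are absent or empty in the markup."""
    return [key for key in EXPECTED_FIELDS if not markup.get(key)]


print(json.dumps(local_business, indent=2))
print("Missing fields:", missing_local_fields(local_business))
```

Whatever you declare here should agree with what your Google Business Profile says, since the two are read side by side.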

Service schema. This declares each thing you sell. The position here is firm: per-service pages, not bundled. A clinic that lists ten services on one page gives AI engines a thin signal for each service. The same clinic with ten dedicated service pages, each with its own Service schema, can be matched to ten different intent queries. Bundling dilutes the match.
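Here is one way the per-service idea could look in practice: one Service block per dedicated page, each pointing back to the same provider. The clinic, the service names, and the URLs are hypothetical examples, and service_jsonld is just a name I picked for the sketch.

```python
import json

# Hypothetical provider shared by every service page.
PROVIDER = {
    "@type": "LocalBusiness",
    "name": "Example Clinic",
    "url": "https://www.example.com",
}

# One entry per dedicated service page: (name, description, page URL).
SERVICES = [
    ("Physiotherapy", "Assessment and treatment of sports injuries.",
     "https://www.example.com/services/physiotherapy"),
    ("Remedial massage", "Targeted massage for muscle pain and tension.",
     "https://www.example.com/services/remedial-massage"),
]


def service_jsonld(name: str, description: str, page_url: str) -> dict:
    """Build schema.org Service markup for a single dedicated service page."""
    return {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": name,
        "description": description,
        "url": page_url,
        "provider": PROVIDER,
    }


for name, description, page_url in SERVICES:
    print(f"Markup for {page_url}:")
    print(json.dumps(service_jsonld(name, description, page_url), indent=2))
```

The point of the loop is the one-to-one mapping: each page carries exactly one Service block describing exactly what that page is about.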

FAQPage schema. This labels your question-and-answer section so AI engines know to extract from it. This is the highest-return schema for most small businesses, because AI engines deliver answers in question-and-answer shape and FAQPage hands them content already in that shape.
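A minimal sketch of FAQPage markup, built from plain question-and-answer pairs, might look like this. The helper name faq_jsonld is my own, and the sample pair is borrowed from the FAQ at the end of this post.

```python
import json


def faq_jsonld(pairs: list) -> dict:
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faqs = [
    (
        "Do I need a developer to add schema markup?",
        "Often no. WordPress sites with a schema plugin let you configure the fields yourself.",
    ),
]

print(json.dumps(faq_jsonld(faqs), indent=2))
```

The question and answer text in the markup should be the same text a visitor sees on the page, not a separate version written for machines.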

Layer 2: The three content patterns

Pattern 1 is question-led headings. Every page heading rewritten as a question gives AI engines a clear extraction point. "Pricing" is a label. "How much does a small business website refresh actually cost?" is a question. The engine can match the second one against a real prospect query in one step.

Pattern 2 is answer-first paragraphs. The first sentence of every section has to directly answer the heading. No context-setting, no "it depends" warm-up. AI engines lift the first sentence of a section more often than any other line. If the first sentence only sets the stage, you have given the engine your warm-up to cite, not your answer.

Pattern 3 is customer-language FAQs. A dedicated FAQ section with ten to fifteen questions, each phrased the way a real prospect would type it into ChatGPT, is the highest-yield single content investment for AI visibility. The questions cannot be the ones you wish prospects asked. They have to be the ones prospects actually ask.

Layer 3: The llms.txt anchor

An llms.txt file is a short text document on your website that tells AI engines which pages matter most and how to interpret your content. Think of it as the note you leave for a new colleague: these three folders are the important ones, the rest is background. The format guide is at llmstxt.org and the file takes about thirty minutes to write. Most small business sites do not have one, which is why writing one is one of the fastest visibility gains available.
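For a sense of what the file contains, here is a small sketch that assembles one in the layout described at llmstxt.org: a title line, a one-line summary in a blockquote, then sections of links with short notes. The site name, page URLs, and the build_llms_txt helper are placeholders of mine, so check the format guide before publishing.

```python
def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Assemble llms.txt text: title, blockquote summary, then link sections.

    `sections` maps a section heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder site, for illustration only.
text = build_llms_txt(
    site_name="Example Clinic",
    summary="Physiotherapy and remedial massage clinic in Melbourne.",
    sections={
        "Services": [
            ("Physiotherapy", "https://www.example.com/services/physiotherapy",
             "What we treat, session length, and pricing."),
        ],
        "Questions": [
            ("FAQ", "https://www.example.com/faq",
             "Answers to the questions new patients ask most."),
        ],
    },
)
print(text)
```

Save the output as llms.txt and list only the pages you would actually want an AI engine to quote.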

So what for you: the stack works because each layer reinforces the others. Schema without good content patterns gets ignored. Good content patterns without schema get under-cited. The llms.txt without either is a sticky note on an empty wall. Build all three and you make it almost easy for AI engines to choose you.

Practical steps for the next 30 days

This 30-day plan maps each layer of the stack to a specific action and a day range.

  1. Day 1 to 3. Audit what you already have. Run your homepage through Schema.org's validator and Google's Rich Results Test. Note which of the four schema types are present. Pull a list of every H2 and H3 across your top ten pages. Mark which ones are questions and which are topic labels. This is your baseline. Most owners are surprised at how few question-shaped headings they actually have.

  2. Day 4 to 7. Install Organisation and LocalBusiness schema. If you are on WordPress, configure Yoast or Rank Math to handle both. If you are on Wix, Squarespace, or Shopify, switch on the built-in schema fields. If you are on a custom site, brief a developer with the four schema types and ask for a quote. The work is usually a few hours, not days.

  3. Day 8 to 14. Rewrite your top ten H2s and H3s as questions. Pick the ten pages that drive the most search traffic or convert the best. For each topic-label heading, ask yourself what question this content answers, then rewrite the heading as that question. This is a content edit, not a technical task. You can do it without a developer.

  4. Day 15 to 21. Build or rebuild your FAQ section with ten customer-language questions. Brainstorm by listing the questions prospects actually ask in sales calls, on intake forms, or in support emails. Phrase each one the way they would type it into ChatGPT. Write each answer so the first sentence directly answers the question. Add FAQPage schema to wrap the section.

  5. Day 22 to 27. Split bundled service pages. If your site has one services page covering everything, split the top three services into dedicated pages, each with its own Service schema. Per-service pages give AI engines specific intent matches. Bundled pages give them confusion.

  6. Day 28 to 30. Write your llms.txt and run the scan. Use the format at llmstxt.org to list your priority pages and a short description of each. Upload the file to your site root. Then run a free AEO scan and compare your AI Presence and Content Quality scores to your starting position. The gap tells you which layer of the stack is doing the most work for you.
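If you would rather script the Day 1 to 3 audit than do it by hand, something along these lines could give you the baseline. It is a sketch under simple assumptions: a heading counts as a question only if it ends in a question mark, schema types are read only from the top level and @graph of JSON-LD blocks, and the class and function names are my own.

```python
import json
from html.parser import HTMLParser

WANTED_TYPES = {"Organization", "LocalBusiness", "Service", "FAQPage"}


def collect_types(node) -> set:
    """Gather @type values from top-level JSON-LD nodes and any @graph list."""
    found = set()
    if isinstance(node, list):
        for item in node:
            found |= collect_types(item)
    elif isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        found |= collect_types(node.get("@graph", []))
    return found


class PageAudit(HTMLParser):
    """Collect H2/H3 text and JSON-LD schema types from one HTML page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self.schema_types = set()
        self._capture = None  # "heading" or "jsonld" while inside a tag of interest
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._capture, self._buffer = "heading", []
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture, self._buffer = "jsonld", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if self._capture == "heading" and tag in ("h2", "h3"):
            self.headings.append(text)
            self._capture = None
        elif self._capture == "jsonld" and tag == "script":
            try:
                self.schema_types |= collect_types(json.loads(text))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is ignored in this sketch
            self._capture = None


def audit(html: str) -> dict:
    """Split headings into questions and labels, and list missing schema types."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {
        "question_headings": [h for h in parser.headings if h.endswith("?")],
        "label_headings": [h for h in parser.headings if not h.endswith("?")],
        "missing_schema": sorted(WANTED_TYPES - parser.schema_types),
    }


sample = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Co"}
</script></head><body><h2>Pricing</h2><h2>How much does a refresh cost?</h2></body></html>"""
print(audit(sample))
```

One limitation worth knowing: a page that declares a more specific subtype of LocalBusiness would still be reported as missing LocalBusiness here, so treat the output as a prompt for a manual look, not a verdict.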

You do not need every layer of the stack perfect to see results. The schema layer is usually the cheapest to install. The content patterns layer is usually the highest-leverage to rewrite. The llms.txt is the easiest one to forget. Start with whichever your scan flags hardest, and build the rest in as you go.

Frequently asked questions

Which schema type matters most for a small business?

Organisation schema first because every other schema relies on it being in place. FAQPage schema produces the most visible improvement once Organisation is set, because AI engines answer in question-and-answer format and FAQPage gives them content already in that shape. LocalBusiness sits alongside Organisation for any business that serves a place. Service schema lifts you in specific intent queries.

Do I need a developer to add schema markup?

Often no. WordPress sites with Yoast or Rank Math have schema built in and you configure the fields. Wix, Squarespace, and Shopify support the basics natively. Custom Service or FAQPage blocks usually need a developer for a few hours of work, which is small relative to the visibility return. If your scan flags missing schema as a deduction, the remediation cost is low.

Does Google's Rich Results Test confirm that AI engines will use my schema?

No. Google's Rich Results Test and the Schema.org validator only check that your markup is well-formed. They do not predict whether ChatGPT, Perplexity, or Gemini will weight it. Valid schema is the floor, not the ceiling. The schema also has to match your visible content and sit on a page AI engines can crawl and trust.
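One narrow way to check the "match your visible content" part yourself is to confirm that every question in your FAQPage markup also appears in the page text. The sketch below assumes that simple reading of "match", and questions_missing_from_page is a hypothetical helper, not a feature of any validator.

```python
import re


def _normalise(text: str) -> str:
    """Collapse whitespace and ignore case so small formatting differences pass."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def questions_missing_from_page(faq_markup: dict, visible_text: str) -> list:
    """Return FAQPage questions whose text does not appear in the visible page text."""
    page = _normalise(visible_text)
    missing = []
    for item in faq_markup.get("mainEntity", []):
        question = item.get("name", "")
        if _normalise(question) not in page:
            missing.append(question)
    return missing


markup = {
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "Do I need a developer to add schema markup?"},
        {"@type": "Question", "name": "How many FAQs do I need for FAQPage schema to work?"},
    ],
}
page_text = """Frequently asked questions
Do I need a developer to   add schema markup?
Often no."""

print(questions_missing_from_page(markup, page_text))
# prints ['How many FAQs do I need for FAQPage schema to work?']
```

An empty list means the markup and the page agree on the questions. It says nothing about whether an engine will trust or use them.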

How many FAQs do I need for FAQPage schema to work?

Five is the minimum that signals depth. Ten to fifteen is the range that produces reliable extraction across multiple AI engines. The questions have to be in customer language, the way a real prospect would type them into ChatGPT, not the way you describe your services internally. Generic questions get skipped. Specific, outcome-shaped questions get cited.

Does content freshness change how often AI engines extract from my pages?

Yes, particularly for engines that read the live web. Perplexity and Gemini check live content rather than relying entirely on training cutoffs, so a fresh FAQ page can show up in answers within days. Publishing consistently builds compounding material AI engines can cite, which is why one-off content sprints underperform a steady cadence.

The Bottom Line

Your website is probably better written than it is structured. That is the most common gap I see, and it is the cheaper one to fix. Schema and content patterns are not a stylistic preference. They are the signals AI engines use to decide whether to extract from your site or skip it.

Build the four schemas. Rewrite your headings as questions. Put the answer in the first sentence. Split your service pages. Add the llms.txt. Each step is small. Together they make the difference between a site AI engines confidently cite and one they walk past.

If you want to see exactly what AI engines are reading from your site right now, run a free AEO scan. The scan checks your schema markup, your heading patterns, your service-page structure, and the rest of the seven core signals, then tells you which layer of the stack to fix first. For broader context on the full signal set, read how to get recommended by AI search engines. For the diagnostic version, why ChatGPT doesn't recommend your business walks through the gaps in order. Common questions about scan results are answered on the scan FAQ.

