Setting Up AI Search

This guide walks you through enabling semantic search for your project. By the end, you’ll have a working search endpoint that understands natural language.

Prerequisites

  • A Foir project with content (records in one or more models)
  • An API key with the appropriate scopes (see Step 4):
    • search:read for semantic search
    • records:write for generating embeddings
  • Search features enabled for your project (check with the isAIEnabled query)

What You’ll Build

A search experience that:

  1. Finds content by meaning, not just keywords
  2. Works across all your models
  3. Returns results ranked by semantic similarity

Step 1: Enable Embeddings on Your Models

Embeddings need to be enabled per model. In the Foir admin dashboard:

  1. Navigate to Settings > Models
  2. Select the model you want to make searchable
  3. Open the AI & Search settings tab
  4. Toggle Enable Embeddings
  5. Select which fields to include in the embedding (e.g., title, description, content)
  6. Save the model

Tip: Include the fields that best describe the record’s content. A product might use title + description, while a blog post might use title + content.

Step 2: Generate Embeddings

Once embeddings are enabled, new and updated records will be embedded automatically. For existing content, you can trigger embedding generation:

Generate for a Single Record

mutation {
  generateEmbedding(
    recordId: "rec_abc123"
    modelKey: "product"
    source: RECORD
  ) {
    success
    tokenCount
    dimensions
  }
}
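To backfill several existing records in one round trip, standard GraphQL aliases let a single request carry more than one generateEmbedding call. The sketch below only builds the mutation text. The helper name buildBackfillMutation is hypothetical, and this guide does not state whether Foir limits the number of mutations per request, so treat the batch size as something to verify.

```typescript
// Hypothetical helper: builds one mutation document with an aliased
// generateEmbedding call per record. Only the field and argument names
// (generateEmbedding, recordId, modelKey, source: RECORD) come from the guide.
function buildBackfillMutation(recordIds: string[], modelKey: string): string {
  if (recordIds.length === 0) {
    throw new Error("recordIds must not be empty");
  }
  const calls = recordIds.map(
    (id, i) =>
      // JSON.stringify yields a quoted, escaped literal that is also a valid
      // GraphQL string, so an ID cannot break out of its argument.
      `  r${i}: generateEmbedding(recordId: ${JSON.stringify(id)}, ` +
      `modelKey: ${JSON.stringify(modelKey)}, source: RECORD) ` +
      `{ success tokenCount dimensions }`,
  );
  return `mutation {\n${calls.join("\n")}\n}`;
}
```

Each result comes back under its alias (r0, r1, and so on), so a failed record can be matched to its position in the input array.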

Verify Embeddings Exist

query {
  embeddingStatus(
    recordIds: ["rec_abc123", "rec_def456"]
    source: RECORD
  ) {
    recordId
    hasEmbedding
    lastUpdated
  }
}

Note: Embedding generation happens asynchronously. After creating or updating content, there may be a brief delay before the embedding is available for search.
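Because generation is asynchronous, a script that creates content and then searches it right away may want to wait until embeddingStatus reports hasEmbedding for every record. The following is a minimal polling sketch. The names waitForEmbeddings and StatusFetcher, and the default attempt count and delay, are assumptions for illustration; the fetcher is injected so you can back it with whatever GraphQL client you use.

```typescript
// Shape of one embeddingStatus entry, limited to the fields used here.
interface EmbeddingStatus {
  recordId: string;
  hasEmbedding: boolean;
}

// Hypothetical transport: runs the embeddingStatus query for the given IDs
// and returns its entries.
type StatusFetcher = (recordIds: string[]) => Promise<EmbeddingStatus[]>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until every record has an embedding or the attempts run out.
// Returns the IDs still missing an embedding (empty means all are ready).
async function waitForEmbeddings(
  recordIds: string[],
  fetchStatus: StatusFetcher,
  { maxAttempts = 10, delayMs = 1000 } = {},
): Promise<string[]> {
  let pending = recordIds.slice();
  for (let attempt = 1; attempt <= maxAttempts && pending.length > 0; attempt++) {
    const statuses = await fetchStatus(pending);
    const ready = new Set(
      statuses.filter((s) => s.hasEmbedding).map((s) => s.recordId),
    );
    pending = pending.filter((id) => !ready.has(id));
    // Only wait if another attempt will actually follow.
    if (pending.length > 0 && attempt < maxAttempts) {
      await sleep(delayMs);
    }
  }
  return pending;
}
```

Returning the leftover IDs instead of throwing lets the caller decide whether a slow embedding is an error or just something to retry later.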

Step 3: Search Your Content

query {
  semanticSearch(input: {
    query: "comfortable shoes for long walks"
    limit: 5
  }) {
    recordId
    modelKey
    naturalKey
    similarity
    highlights
  }
}

Narrow results to specific models:

query {
  semanticSearch(input: {
    query: "return and refund information"
    modelKeys: ["faq", "policy"]
    limit: 3
    threshold: 0.7
  }) {
    recordId
    naturalKey
    similarity
    highlights
  }
}

Using the Results

Here’s a complete example in TypeScript:

const SEARCH_QUERY = `
  query SemanticSearch($query: String!, $limit: Int) {
    semanticSearch(input: {
      query: $query
      modelKeys: ["product"]
      limit: $limit
      threshold: 0.6
    }) {
      recordId
      naturalKey
      similarity
      highlights
    }
  }
`;

async function searchProducts(userQuery: string) {
  const response = await fetch('https://api.foir.io/graphql', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'x-api-key': process.env.FOIR_API_KEY!,
    },
    body: JSON.stringify({
      query: SEARCH_QUERY,
      variables: { query: userQuery, limit: 10 },
    }),
  });

  // Surface HTTP and GraphQL failures instead of reading `data` blindly
  if (!response.ok) {
    throw new Error(`Search request failed with status ${response.status}`);
  }
  const { data, errors } = await response.json();
  if (errors?.length) {
    throw new Error(errors[0].message);
  }
  return data.semanticSearch;
}

// Usage
const results = await searchProducts("waterproof hiking boots");
// Returns products matching the intent, even if they don't
// contain the exact words "waterproof hiking boots"

Step 4: Set Up Your API Key

Make sure your API key has the right scopes for the features you need:

Feature               Required Scope
Semantic search       search:read
Generate embeddings   records:write

To update scopes:

  1. Go to Settings > API Keys in the admin dashboard
  2. Edit your API key
  3. Add the required scopes
  4. Save

Tips and Best Practices

Choosing Fields to Embed

  • Do include: title, description, body content, tags, category names
  • Don’t include: IDs, timestamps, internal slugs, boolean flags
  • More text generally produces better search results, but overly long content dilutes relevance
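If you produce embeddings in your own service rather than through the dashboard toggle, the same guidance applies to the text you feed your embedding model. The sketch below joins only the descriptive fields and caps the length. The function name buildEmbeddingText and the 4000-character default are placeholders chosen for illustration, not Foir settings.

```typescript
type RecordFields = Record<string, unknown>;

// Concatenates the chosen descriptive fields into one text blob.
// Non-text values (IDs, booleans, timestamps stored as numbers) are ignored
// even if they are listed, and the result is cut off at maxChars.
function buildEmbeddingText(
  record: RecordFields,
  fields: string[],
  maxChars = 4000, // placeholder cap, tune for your embedding model
): string {
  const parts: string[] = [];
  for (const field of fields) {
    const value = record[field];
    if (typeof value === "string" && value.trim() !== "") {
      parts.push(value.trim());
    } else if (Array.isArray(value)) {
      // Tag-like arrays: keep the string entries only.
      const tags = value.filter((v): v is string => typeof v === "string");
      if (tags.length > 0) parts.push(tags.join(", "));
    }
  }
  return parts.join("\n").slice(0, maxChars);
}
```

For example, passing ["title", "description", "tags"] for a product keeps the searchable wording together and leaves out fields that carry no meaning for similarity.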

Setting Similarity Threshold

The threshold parameter filters out low-quality matches by dropping results whose similarity score falls below it. As a rough guide to reading the scores:

Score Range   Quality
0.9 – 1.0     Near-exact semantic match
0.7 – 0.9     Strong relevance
0.5 – 0.7     Moderate relevance
Below 0.5     Weak match, usually noise

Start with 0.6 and adjust based on your content and use case.
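When tuning the threshold, it can help to see how the results of a few representative queries fall across the bands in the table. The sketch below labels a similarity score and counts results per band. The function names are hypothetical, and assigning each boundary value to the higher band (0.9 counts as near-exact, 0.7 as strong) is an assumption, since the table's ranges overlap at the edges.

```typescript
type MatchQuality = "near-exact" | "strong" | "moderate" | "weak";

// Maps a similarity score to the quality bands from the table.
function matchQuality(similarity: number): MatchQuality {
  if (similarity >= 0.9) return "near-exact";
  if (similarity >= 0.7) return "strong";
  if (similarity >= 0.5) return "moderate";
  return "weak";
}

// Counts how many results land in each band, given objects that carry
// the `similarity` field returned by semanticSearch.
function summarizeQuality(
  hits: { similarity: number }[],
): Record<MatchQuality, number> {
  const counts: Record<MatchQuality, number> = {
    "near-exact": 0,
    strong: 0,
    moderate: 0,
    weak: 0,
  };
  for (const hit of hits) {
    counts[matchQuality(hit.similarity)] += 1;
  }
  return counts;
}
```

If most useful results for your test queries sit in the moderate band, raising the threshold above 0.7 would hide them; if the weak band is crowded, the threshold is probably too low.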

Combining with Model Filters

Semantic search works best when combined with model filters. Use modelKeys to narrow the search space, then let vector similarity handle the ranking:

query {
  semanticSearch(input: {
    query: "eco-friendly packaging"
    modelKeys: ["product"]
    limit: 5
    threshold: 0.65
  }) {
    recordId
    naturalKey
    similarity
  }
}

Keeping Embeddings Fresh

  • Embeddings update automatically when records are saved or published
  • If a record’s content hasn’t changed, re-embedding is skipped (deduplication is automatic)
  • Use the coverage queries below to confirm every record is embedded before going live with search

Auditing Coverage and Running a Sweep

If you generate embeddings from your own service (for example a background worker), you can audit coverage and repair gaps without re-embedding your whole corpus. All three queries are scoped to your own records.

Get a per-model summary of how many records are embedded versus pending:

query {
  embeddingCoverage(modelKey: "note") {
    modelKey
    totalRecords
    embeddedRecords
    pendingRecords
    lastEmbeddedAt
  }
}
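A pre-launch check can turn that summary into a simple pass or fail per model. The sketch below assumes the count fields arrive as plain numbers and treats a model with no records as fully covered; both are assumptions, as are the helper names.

```typescript
// Subset of the embeddingCoverage fields used by this check.
interface EmbeddingCoverage {
  modelKey: string;
  totalRecords: number;
  embeddedRecords: number;
  pendingRecords: number;
}

// Fraction of records that have an embedding, between 0 and 1.
// An empty model counts as fully covered (nothing is missing).
function coverageRatio(c: EmbeddingCoverage): number {
  return c.totalRecords === 0 ? 1 : c.embeddedRecords / c.totalRecords;
}

// Returns the model keys whose coverage is below the required ratio.
function modelsBelowCoverage(
  coverages: EmbeddingCoverage[],
  minRatio = 1,
): string[] {
  return coverages
    .filter((c) => coverageRatio(c) < minRatio)
    .map((c) => c.modelKey);
}
```

Running modelsBelowCoverage over every searchable model and requiring an empty result is one way to gate a release on complete coverage.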

List exactly which records are missing an embedding for a key, paginated, so you can embed only the gaps:

query {
  recordsMissingEmbedding(modelKey: "note", key: "default", first: 100) {
    edges {
      node {
        recordId
        naturalKey
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
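The edges and pageInfo shape suggests cursor-based pagination. The sketch below walks every page and collects the record IDs. It assumes the query accepts an after argument carrying the previous endCursor, which is the usual convention for this response shape but is not shown in this guide; the page fetcher is injected, and collectMissingRecordIds and the maxPages guard are hypothetical.

```typescript
// One page of recordsMissingEmbedding, limited to the fields used here.
interface MissingPage {
  edges: { node: { recordId: string; naturalKey: string } }[];
  pageInfo: { hasNextPage: boolean; endCursor: string | null };
}

// Hypothetical transport: runs the query with the given cursor
// (null for the first page) and returns the page.
type PageFetcher = (after: string | null) => Promise<MissingPage>;

// Follows endCursor until hasNextPage is false and returns all record IDs.
// maxPages is a safety stop against a cursor that never terminates.
async function collectMissingRecordIds(
  fetchPage: PageFetcher,
  maxPages = 1000,
): Promise<string[]> {
  const ids: string[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const result: MissingPage = await fetchPage(cursor);
    for (const edge of result.edges) {
      ids.push(edge.node.recordId);
    }
    if (!result.pageInfo.hasNextPage || result.pageInfo.endCursor === null) {
      break;
    }
    cursor = result.pageInfo.endCursor;
  }
  return ids;
}
```

For very large gaps you may prefer to embed each page as it arrives instead of collecting every ID first.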

Before re-embedding, read back the content hash you stored alongside each embedding. If it matches the hash of the record’s current content, skip that record:

query {
  embeddingDigests(
    modelKey: "note"
    key: "default"
    recordIds: ["rec_abc123", "rec_def456"]
  ) {
    recordId
    contentHash
  }
}
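The comparison itself is a hash-and-compare step on your side. The sketch below computes a SHA-256 digest with Node's built-in crypto module and lists the records whose stored hash differs from the current one. The "sha256:" prefix mirrors the sample value used in this guide, but the exact format is your choice as long as you write and compare it consistently; the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hashes the text that would be embedded. The "sha256:" prefix is an
// assumed convention, not a format required by the API.
function computeContentHash(text: string): string {
  return "sha256:" + createHash("sha256").update(text, "utf8").digest("hex");
}

// One embeddingDigests entry; contentHash may be absent for records
// that were written without one.
interface EmbeddingDigest {
  recordId: string;
  contentHash: string | null;
}

// Returns the IDs whose current content no longer matches the stored hash,
// including records that have no stored hash at all.
function recordsNeedingEmbedding(
  currentText: Map<string, string>,
  digests: EmbeddingDigest[],
): string[] {
  const stored = new Map<string, string | null>();
  for (const d of digests) {
    stored.set(d.recordId, d.contentHash);
  }
  const stale: string[] = [];
  currentText.forEach((text, recordId) => {
    if (stored.get(recordId) !== computeContentHash(text)) {
      stale.push(recordId);
    }
  });
  return stale;
}
```

Hash exactly the text you send to the embedding model; hashing the raw record instead would trigger re-embedding whenever an unrelated field changes.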

When you write an embedding, pass the contentHash you computed so a later sweep can read it back:

mutation {
  writeEmbeddings(input: {
    entries: [{
      recordId: "rec_abc123"
      key: "default"
      embedding: [0.0123, -0.0456, 0.0789]
      contentHash: "sha256:9f86d0…"
    }]
  })
}

Each record also exposes the same signals as fields, which is handy for quick per-record checks:

query {
  note(id: "rec_abc123") {
    _id
    _hasEmbedding(key: "default")
    _embeddingContentHash(key: "default")
  }
}

A typical sweep: call recordsMissingEmbedding to find gaps, embeddingDigests to skip records whose contentHash is unchanged, embed the rest, and write them back with writeEmbeddings (including the new contentHash).
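The sweep described above can be sketched as a single function with its collaborators injected. Everything named here (runSweep, SweepDeps, and the individual callbacks) is hypothetical scaffolding: each callback stands in for one of the queries or mutations in this section, or for your own content lookup and embedding model.

```typescript
interface EmbeddingEntry {
  recordId: string;
  key: string;
  embedding: number[];
  contentHash: string;
}

// Hypothetical collaborators; wire each one to your own client code.
interface SweepDeps {
  listMissing: () => Promise<string[]>; // recordsMissingEmbedding
  getDigests: (ids: string[]) => Promise<Map<string, string>>; // embeddingDigests
  loadContent: (recordId: string) => Promise<string>; // your content lookup
  hashContent: (text: string) => string; // e.g. a SHA-256 digest
  embed: (text: string) => Promise<number[]>; // your embedding model
  write: (entries: EmbeddingEntry[]) => Promise<void>; // writeEmbeddings
}

// Finds gaps, skips records whose stored hash is unchanged, embeds the
// rest, and writes them back together with the new content hash.
async function runSweep(
  deps: SweepDeps,
  key = "default",
): Promise<{ written: number; skipped: number }> {
  const candidates = await deps.listMissing();
  if (candidates.length === 0) {
    return { written: 0, skipped: 0 };
  }
  const digests = await deps.getDigests(candidates);
  const entries: EmbeddingEntry[] = [];
  let skipped = 0;
  for (const recordId of candidates) {
    const text = await deps.loadContent(recordId);
    const contentHash = deps.hashContent(text);
    if (digests.get(recordId) === contentHash) {
      skipped += 1; // content unchanged since the last write
      continue;
    }
    entries.push({ recordId, key, embedding: await deps.embed(text), contentHash });
  }
  if (entries.length > 0) {
    await deps.write(entries);
  }
  return { written: entries.length, skipped };
}
```

Keeping the transport and the embedding model behind callbacks makes the sweep easy to test with fakes and to run from a scheduled worker.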
