Search & Discovery

Trove uses semantic search. Instead of matching keywords, it matches meaning. A search for “papers about attention in neural networks” finds documents about transformer architectures even if they never use those exact words.

search: Targeted retrieval ranked by relevance score (0-1). Best for specific questions or fact lookups.

discover: Thematic exploration with a lower relevance threshold. It returns results that are related but not necessarily direct matches; for example, searching “batch processing” may surface articles about event-driven architectures or MapReduce. Good for brainstorming and for rediscovering connections you had forgotten.

recent: Chronological listing of recently indexed content, sorted by indexing date, with no semantic matching. Use it to see what was recently synced or saved.

Each mode supports a different subset of filters.

| Filter | Applies to | Description |
| --- | --- | --- |
| connector / connectorId | search, discover, recent | Scope to a specific data source |
| connectorType | search, discover, recent | Scope to a connector type (e.g., all RSS feeds) |
| author | search, recent | Filter by content author |
| after / before | search | Date range on content date |
| since | recent | Time window for recent items (e.g., last 24 hours) |
| contentType | search | Filter by type: text, transcript, highlight, bookmark |
| tags | search | Filter by tags |
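As an illustration, a GraphQL search request using these filters might be assembled as below. Only the filter names come from the table above; the query shape, field names, and `SearchFilters` type are assumptions for the sketch, not the confirmed Trove schema.

```python
import json

# Hypothetical query shape; only the filter names are documented.
SEARCH_QUERY = """
query Search($query: String!, $filters: SearchFilters) {
  search(query: $query, filters: $filters) {
    title
    snippet
    score
  }
}
"""

def build_search_payload(query, connector_id=None, author=None, after=None,
                         before=None, content_type=None, tags=None):
    """Assemble a GraphQL request body using the documented filter names."""
    filters = {}
    if connector_id:
        filters["connectorId"] = connector_id
    if author:
        filters["author"] = author
    if after:
        filters["after"] = after
    if before:
        filters["before"] = before
    if content_type:
        filters["contentType"] = content_type
    if tags:
        filters["tags"] = tags
    return {"query": SEARCH_QUERY,
            "variables": {"query": query, "filters": filters}}

payload = build_search_payload(
    "attention in neural networks",
    connector_id="conn_123",      # GraphQL takes IDs, not names (see note below)
    after="2024-01-01",
    content_type="highlight",
)
print(json.dumps(payload["variables"]["filters"]))
```

Omitted filters are simply left out of the `filters` object rather than sent as nulls.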

Note: MCP tools accept connector names (e.g., Readwise). GraphQL queries accept connector IDs. Use the connectors query to look up IDs.
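A small sketch of the name-to-ID lookup described in the note. The response shape of the connectors query (a list of objects with `id`, `name`, and `type`) is an assumption; the example IDs are made up.

```python
# Hypothetical response from the connectors GraphQL query.
connectors_response = {
    "connectors": [
        {"id": "conn_123", "name": "Readwise", "type": "readwise"},
        {"id": "conn_456", "name": "Blog Feed", "type": "rss"},
    ]
}

def connector_id(name, response=connectors_response):
    """Resolve a connector name (used by MCP tools) to the ID GraphQL needs."""
    for c in response["connectors"]:
        if c["name"] == name:
            return c["id"]
    raise KeyError(f"no connector named {name!r}")

print(connector_id("Readwise"))
```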

Search and discover results include a relevance score from 0.0 to 1.0.

| Score Range | Meaning |
| --- | --- |
| 0.9+ | Very strong match |
| 0.7-0.9 | Good match, clearly related |
| 0.5-0.7 | Related but possibly tangential |
| Below 0.5 | Weak match |
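In client code, these ranges can be applied as a simple threshold filter. The result shape below (dicts with a `score` key) and the sample data are assumptions for illustration.

```python
def score_label(score: float) -> str:
    """Map a relevance score to the documented ranges."""
    if score >= 0.9:
        return "very strong match"
    if score >= 0.7:
        return "good match"
    if score >= 0.5:
        return "related but possibly tangential"
    return "weak match"

results = [
    {"title": "Attention Is All You Need", "score": 0.93},
    {"title": "MapReduce overview", "score": 0.55},
    {"title": "Unrelated note", "score": 0.31},
]

# Keep only results at or above the "related" threshold for downstream use.
relevant = [r for r in results if r["score"] >= 0.5]
for r in relevant:
    print(f'{r["title"]}: {score_label(r["score"])}')
```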

Each search result includes a snippet: the most relevant text chunk from the matched document. Snippets come from the vector index (the specific chunk that matched your query) rather than from simple text truncation, so the snippet shows you why a document matched.

The recommended workflow:

  1. Use search (or the search GraphQL query) to find relevant documents. Results include titles, metadata, snippets, and relevance scores.
  2. Use trove_get_document (or the document GraphQL query) to read the full text of interesting results.

This pattern keeps search responses fast (metadata from D1) and only loads full text from R2 when you need it.
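The two-step pattern can be sketched as follows. The in-memory index, the return shapes, and the sample data are assumptions standing in for the real tools, which serve metadata from D1 and full text from R2.

```python
# Stand-in index: doc id -> metadata plus full text.
INDEX = {
    "doc-1": {
        "title": "Transformers",
        "snippet": "self-attention weighs relationships between tokens",
        "text": "FULL TEXT: the complete document body, loaded on demand.",
    },
}

def search(query):
    """Step 1: return lightweight metadata and snippets only (fast)."""
    return [{"id": doc_id, "title": d["title"], "snippet": d["snippet"]}
            for doc_id, d in INDEX.items()]

def trove_get_document(doc_id):
    """Step 2: fetch full text only for documents worth reading."""
    return INDEX[doc_id]["text"]

hits = search("attention in neural networks")
interesting = hits[0]                        # pick a promising result
full_text = trove_get_document(interesting["id"])
```

The key property is that step 1 never touches full document bodies, so responses stay small even over large collections.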

How it works under the hood:

  1. When a document is ingested, its text is split into overlapping chunks.
  2. Each chunk is converted to a 1024-dimensional vector using the bge-m3 model (via Workers AI).
  3. Vectors are stored in Vectorize with metadata filters (user ID, connector, author, content type, tags).
  4. At search time, your query is converted to a vector using the same model and compared against stored vectors using cosine similarity.
  5. The best-matching chunks identify the most relevant documents.
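The steps above can be sketched in miniature. The toy bag-of-characters embedding below stands in for the real 1024-dimensional bge-m3 vectors, and the chunk sizes are made up; only the overall pipeline (chunk, embed, cosine similarity, best chunk wins) mirrors the description.

```python
import math

def chunk(text, size=40, overlap=10):
    """Step 1: split text into overlapping chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Step 2 stand-in: toy letter-frequency vector (real: bge-m3 via Workers AI)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Step 4: cosine similarity between query and stored vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

doc = "Transformers use self-attention to weigh relationships between tokens."
chunks = chunk(doc)
query_vec = embed("attention between tokens")
# Step 5: the best-matching chunk identifies the document (and its snippet).
best = max(chunks, key=lambda c: cosine(embed(c), query_vec))
```

Note that consecutive chunks share their boundary text, so a sentence split across a chunk edge still appears whole in at least one chunk.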
Tips for better results:

  1. Be specific. “How does RLHF work in large language models” returns better results than “AI training.”
  2. Use filters. Narrow by connector, author, or date range when you know the source.
  3. Combine search + get_document. Search returns snippets; drill into full text when you need more context.
  4. Try discover for exploration. When you want related content, not exact matches.
  5. Use recent for chronological. No semantic matching, just what was indexed recently.
  6. Filter by date. Use after/before to find content from a specific time period.
  7. Scope to a connector. Search within a single source when you know where the content lives.