Search & Discovery

Trove uses semantic search. Instead of matching keywords, it matches meaning. A search for “papers about attention in neural networks” finds documents about transformer architectures even if they never use those exact words.

search: Targeted retrieval ranked by relevance score (0-1). Best for specific questions or fact lookups.

discover: Thematic exploration with a lower relevance threshold. It returns results that are related but not necessarily direct matches; for example, searching “batch processing” may surface articles about event-driven architectures or MapReduce. Good for brainstorming and for rediscovering connections you had forgotten.

recent: Chronological listing of recently indexed content, sorted by indexing date, with no semantic matching. Use it to see what was recently synced or saved.

Each mode supports a different subset of filters.

| Filter | Applies to | Description |
| --- | --- | --- |
| connector / connectorId | search, discover, recent | Scope to a specific data source |
| connectorType | search, discover, recent | Scope to a connector type (e.g., all RSS feeds) |
| author | search, recent | Filter by content author |
| after / before | search | Date range on content date |
| since | recent | Time window for recent items (e.g., last 24 hours) |
| contentType | search | Filter by type: text, transcript, highlight, bookmark |
| tags | search | Filter by tags |
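As an illustration, a GraphQL search request using these filters might be assembled as below. Only the filter names come from the table above; the query shape, field names, and `SearchFilters` type are assumptions for the sketch, not the confirmed Trove schema.

```python
import json

# Hypothetical query shape; only the filter names are documented.
SEARCH_QUERY = """
query Search($query: String!, $filters: SearchFilters) {
  search(query: $query, filters: $filters) {
    title
    snippet
    score
  }
}
"""

def build_search_payload(query, connector_id=None, author=None, after=None,
                         before=None, content_type=None, tags=None):
    """Assemble a GraphQL request body using the documented filter names."""
    filters = {}
    if connector_id:
        filters["connectorId"] = connector_id
    if author:
        filters["author"] = author
    if after:
        filters["after"] = after
    if before:
        filters["before"] = before
    if content_type:
        filters["contentType"] = content_type
    if tags:
        filters["tags"] = tags
    return {"query": SEARCH_QUERY,
            "variables": {"query": query, "filters": filters}}

payload = build_search_payload(
    "attention in neural networks",
    connector_id="conn_123",      # GraphQL takes IDs, not names (see note below)
    after="2024-01-01",
    content_type="highlight",
)
print(json.dumps(payload["variables"]["filters"]))
```

Omitted filters are simply left out of the `filters` object rather than sent as nulls.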

Note: MCP tools accept connector names (e.g., Readwise). GraphQL queries accept connector IDs. Use the connectors query to look up IDs.
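A small sketch of the name-to-ID lookup described in the note. The response shape of the connectors query (a list of objects with `id`, `name`, and `type`) is an assumption; the example IDs are made up.

```python
# Hypothetical response from the connectors GraphQL query.
connectors_response = {
    "connectors": [
        {"id": "conn_123", "name": "Readwise", "type": "readwise"},
        {"id": "conn_456", "name": "Blog Feed", "type": "rss"},
    ]
}

def connector_id(name, response=connectors_response):
    """Resolve a connector name (used by MCP tools) to the ID GraphQL needs."""
    for c in response["connectors"]:
        if c["name"] == name:
            return c["id"]
    raise KeyError(f"no connector named {name!r}")

print(connector_id("Readwise"))
```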

Search and discover results include a relevance score from 0.0 to 1.0.

| Score Range | Meaning |
| --- | --- |
| 0.9+ | Very strong match |
| 0.7-0.9 | Good match, clearly related |
| 0.5-0.7 | Related but possibly tangential |
| Below 0.5 | Weak match |
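In client code, these ranges can be applied as a simple threshold filter. The result shape below (dicts with a `score` key) and the sample data are assumptions for illustration.

```python
def score_label(score: float) -> str:
    """Map a relevance score to the documented ranges."""
    if score >= 0.9:
        return "very strong match"
    if score >= 0.7:
        return "good match"
    if score >= 0.5:
        return "related but possibly tangential"
    return "weak match"

results = [
    {"title": "Attention Is All You Need", "score": 0.93},
    {"title": "MapReduce overview", "score": 0.55},
    {"title": "Unrelated note", "score": 0.31},
]

# Keep only results at or above the "related" threshold for downstream use.
relevant = [r for r in results if r["score"] >= 0.5]
for r in relevant:
    print(f'{r["title"]}: {score_label(r["score"])}')
```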

Each search result includes a snippet: the most relevant text chunk from the matched document. Snippets come from the vector index (the specific chunk that matched your query) rather than from simple text truncation, so the snippet shows you why a document matched.

The recommended workflow:

  1. Use search (or the search GraphQL query) to find relevant documents. Results include titles, metadata, snippets, and relevance scores.
  2. Use trove_get_document (or the document GraphQL query) to read the full text of interesting results.

This pattern keeps search responses fast (metadata from D1) and only loads full text from R2 when you need it.
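The two-step pattern can be sketched as follows. The in-memory index, the return shapes, and the sample data are assumptions standing in for the real tools, which serve metadata from D1 and full text from R2.

```python
# Stand-in index: doc id -> metadata plus full text.
INDEX = {
    "doc-1": {
        "title": "Transformers",
        "snippet": "self-attention weighs relationships between tokens",
        "text": "FULL TEXT: the complete document body, loaded on demand.",
    },
}

def search(query):
    """Step 1: return lightweight metadata and snippets only (fast)."""
    return [{"id": doc_id, "title": d["title"], "snippet": d["snippet"]}
            for doc_id, d in INDEX.items()]

def trove_get_document(doc_id):
    """Step 2: fetch full text only for documents worth reading."""
    return INDEX[doc_id]["text"]

hits = search("attention in neural networks")
interesting = hits[0]                        # pick a promising result
full_text = trove_get_document(interesting["id"])
```

The key property is that step 1 never touches full document bodies, so responses stay small even over large collections.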

How it works under the hood:

  1. When a document is ingested, its text is split into overlapping chunks.
  2. Each chunk is converted to a 1024-dimensional vector using the bge-m3 model (via Workers AI).
  3. Vectors are stored in Vectorize with metadata filters (user ID, connector, author, content type, tags).
  4. At search time, your query is converted to a vector using the same model and compared against stored vectors using cosine similarity.
  5. The best-matching chunks identify the most relevant documents.
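The steps above can be sketched in miniature. The toy bag-of-characters embedding below stands in for the real 1024-dimensional bge-m3 vectors, and the chunk sizes are made up; only the overall pipeline (chunk, embed, cosine similarity, best chunk wins) mirrors the description.

```python
import math

def chunk(text, size=40, overlap=10):
    """Step 1: split text into overlapping chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Step 2 stand-in: toy letter-frequency vector (real: bge-m3 via Workers AI)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Step 4: cosine similarity between query and stored vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

doc = "Transformers use self-attention to weigh relationships between tokens."
chunks = chunk(doc)
query_vec = embed("attention between tokens")
# Step 5: the best-matching chunk identifies the document (and its snippet).
best = max(chunks, key=lambda c: cosine(embed(c), query_vec))
```

Note that consecutive chunks share their boundary text, so a sentence split across a chunk edge still appears whole in at least one chunk.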
Tips for better results:

  1. Be specific. “How does RLHF work in large language models” returns better results than “AI training.”
  2. Use filters. Narrow by connector, author, or date range when you know the source.
  3. Combine search + get_document. Search returns snippets; drill into full text when you need more context.
  4. Try discover for exploration. When you want related content, not exact matches.
  5. Use recent for chronological. No semantic matching, just what was indexed recently.
  6. Filter by date. Use after/before to find content from a specific time period.
  7. Scope to a connector. Search within a single source when you know where the content lives.