# Search & Discovery
Trove uses semantic search. Instead of matching keywords, it matches meaning. A search for “papers about attention in neural networks” finds documents about transformer architectures even if they never use those exact words.
## Three Modes

### Search

Targeted retrieval ranked by relevance score (0–1). Best for specific questions or fact lookup.
### Discover

Thematic exploration with a lower relevance threshold. Returns results that are related but not necessarily direct matches (e.g., searching “batch processing” surfaces articles about event-driven architectures or MapReduce). Good for brainstorming and finding connections you forgot about.
### Recent

Chronological listing of recently indexed content, sorted by indexing date. No semantic matching. Use it to see what was recently synced or saved.
## Filters

Each mode supports a different subset of filters.
| Filter | Applies to | Description |
|---|---|---|
| `connector` / `connectorId` | search, discover, recent | Scope to a specific data source |
| `connectorType` | search, discover, recent | Scope to a connector type (e.g., all RSS feeds) |
| `author` | search, recent | Filter by content author |
| `after` / `before` | search | Date range on content date |
| `since` | recent | Time window for recent items (e.g., last 24 hours) |
| `contentType` | search | Filter by type: text, transcript, highlight, bookmark |
| `tags` | search | Filter by tags |
Note: MCP tools accept connector names (e.g., Readwise). GraphQL queries accept connector IDs. Use the `connectors` query to look up IDs.
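One way to keep the mode/filter matrix straight is to encode it in code. This is a minimal sketch, not part of Trove itself: the `FILTERS_BY_MODE` table and `validate_filters` helper are hypothetical, transcribed from the table above.

```python
# Hypothetical helper: which filters each mode accepts, per the table above.
FILTERS_BY_MODE = {
    "search": {"connectorId", "connectorType", "author", "after", "before",
               "contentType", "tags"},
    "discover": {"connectorId", "connectorType"},
    "recent": {"connectorId", "connectorType", "author", "since"},
}

def validate_filters(mode: str, filters: dict) -> dict:
    """Raise if any supplied filter is not supported by the chosen mode."""
    unsupported = set(filters) - FILTERS_BY_MODE[mode]
    if unsupported:
        raise ValueError(f"{mode} does not support: {sorted(unsupported)}")
    return filters

# `since` is valid for recent, but would be rejected for search:
validate_filters("recent", {"since": "24h"})
```

Checking filters client-side like this surfaces a typo or a mode mismatch before a query is ever sent.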
## Relevance Scores

Search and discover results include a relevance score from 0.0 to 1.0.
| Score Range | Meaning |
|---|---|
| 0.9+ | Very strong match |
| 0.7-0.9 | Good match, clearly related |
| 0.5-0.7 | Related but possibly tangential |
| Below 0.5 | Weak match |
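If you consume results programmatically, the score bands above are easy to mirror in a small helper. This `describe_score` function is a hypothetical convenience, not a Trove API; the labels come straight from the table.

```python
# Hypothetical helper mapping a relevance score to the bands above.
def describe_score(score: float) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError("relevance scores range from 0.0 to 1.0")
    if score >= 0.9:
        return "very strong match"
    if score >= 0.7:
        return "good match, clearly related"
    if score >= 0.5:
        return "related but possibly tangential"
    return "weak match"
```

A cutoff like `score >= 0.7` is a reasonable default when you only want results you'd confidently act on.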
## Snippets

Each search result includes a snippet: the most relevant text chunk from the matched document. Snippets come from the vector index (the specific chunk that matched your query), not from simple text truncation. The snippet shows you why a document matched.
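The difference from truncation can be sketched in a few lines. Assuming each matched document carries per-chunk similarity scores (the data shapes here are illustrative, not Trove's actual schema), the snippet is the text of the best-scoring chunk, wherever it sits in the document:

```python
# Illustrative only: snippet = text of the best-matching chunk,
# not the first N characters of the document.
def pick_snippet(chunks: list[dict]) -> str:
    """chunks: [{"text": ..., "score": ...}] for one matched document."""
    best = max(chunks, key=lambda c: c["score"])
    return best["text"]

chunks = [
    {"text": "Intro paragraph about the blog itself.", "score": 0.41},
    {"text": "Attention lets a transformer weigh tokens by relevance.", "score": 0.88},
]
```

Here a naive truncation would show the intro paragraph; chunk-based snippets show the sentence that actually matched.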
## Search-then-Drill-Down

The recommended workflow:

- Use `search` (or the `search` GraphQL query) to find relevant documents. Results include titles, metadata, snippets, and relevance scores.
- Use `trove_get_document` (or the `document` GraphQL query) to read the full text of interesting results.
This pattern keeps search responses fast (metadata from D1) and only loads full text from R2 when you need it.
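The two-step shape looks like this in practice. The `search` and `get_document` functions below are stand-ins for the real MCP tools / GraphQL queries, and the in-memory index is purely illustrative:

```python
# Toy stand-in for the search-then-drill-down pattern.
INDEX = {
    "doc-1": {
        "title": "Attention Is All You Need, annotated",
        "snippet": "Self-attention compares every token to every other...",
        "text": "Full annotated text of the paper...",
    },
}

def search(query: str) -> list[dict]:
    # Step 1: fast response with metadata and snippets only.
    return [{"id": doc_id, "title": d["title"], "snippet": d["snippet"]}
            for doc_id, d in INDEX.items()]

def get_document(doc_id: str) -> str:
    # Step 2: fetch full text only for documents worth reading.
    return INDEX[doc_id]["text"]

hits = search("attention in neural networks")
full_text = get_document(hits[0]["id"])
```

Note that step 1 never returns `text`; that's what keeps the search response small.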
## Embeddings

- When a document is ingested, its text is split into overlapping chunks.
- Each chunk is converted to a 1024-dimensional vector using the bge-m3 model (via Workers AI).
- Vectors are stored in Vectorize with metadata filters (user ID, connector, author, content type, tags).
- At search time, your query is converted to a vector using the same model and compared against stored vectors using cosine similarity.
- The best-matching chunks identify the most relevant documents.
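The steps above can be made concrete with a toy example. Real Trove embeds chunks with bge-m3 into 1024-dimensional vectors via Workers AI; here tiny hand-made vectors stand in so the chunking and cosine-similarity steps are visible:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    # Overlapping character windows, as in the ingestion step.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# Pretend embeddings (3-dimensional instead of 1024) for a query and two chunks.
query_vec = [0.9, 0.1, 0.0]
chunk_vecs = {"chunk-a": [0.8, 0.2, 0.1], "chunk-b": [0.0, 0.1, 0.9]}
best = max(chunk_vecs, key=lambda k: cosine(query_vec, chunk_vecs[k]))
```

The chunk whose vector points in nearly the same direction as the query vector wins, regardless of whether the two share any keywords.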
## Tips

- Be specific. “How does RLHF work in large language models” returns better results than “AI training.”
- Use filters. Narrow by connector, author, or date range when you know the source.
- Combine search + get_document. Search returns snippets; drill into full text when you need more context.
- Try discover for exploration. When you want related content, not exact matches.
- Use recent for chronological. No semantic matching, just what was indexed recently.
- Filter by date. Use `after`/`before` to find content from a specific time period.
- Scope to a connector. Search within a single source when you know where the content lives.