Documents

A document is a single unit of content in Trove. Articles, tweets, highlights, transcripts, bookmarks, and notes are all documents. Connectors create them automatically during sync; you can also save them manually.

| Field | Type | Description |
| --- | --- | --- |
| id | ID | Unique identifier (16-character hex string) |
| externalId | String | Source system ID, used for deduplication |
| title | String? | Document title |
| url | String? | Original URL |
| author | String? | Content author |
| contentDate | DateTime? | When the content was originally created |
| indexedAt | DateTime | When Trove indexed this document |
| previewText | String? | First ~300 words (stored in D1 for fast access) |
| fullText | String? | Complete text (lazy-loaded from R2) |
| wordCount | Int? | Total word count |
| contentType | String | One of: text, transcript, highlight, bookmark |
| tags | [String] | User-defined tags |
| metadata | JSON? | Connector-specific extra data |

| Content type | Description | Examples |
| --- | --- | --- |
| text | Articles, blog posts, essays | RSS feed items, Hacker News stories (default) |
| transcript | Audio or video transcriptions | Podcast transcripts, meeting recordings |
| highlight | Excerpts and annotations | Readwise highlights, Kindle notes |
| bookmark | Saved URLs with extracted content | Browser bookmarks, saved links |
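
The fields and content types listed above could be mirrored on the client roughly as follows. This is a minimal sketch, not a Trove SDK type: the `Document` class, the `is_valid_id` helper, and the snake_case attribute names are hypothetical, and the `"text"` default is an assumption read from the "(default)" note on the text row.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

# The four content types named in the documentation.
CONTENT_TYPES = {"text", "transcript", "highlight", "bookmark"}


def is_valid_id(value: str) -> bool:
    """Check the documented shape of an id: a 16-character hex string."""
    return re.fullmatch(r"[0-9a-fA-F]{16}", value) is not None


@dataclass
class Document:
    """Hypothetical client-side mirror of the document fields."""

    id: str
    external_id: str
    indexed_at: datetime
    content_type: str = "text"  # assumed default, see lead-in
    title: Optional[str] = None
    url: Optional[str] = None
    author: Optional[str] = None
    content_date: Optional[datetime] = None
    preview_text: Optional[str] = None
    full_text: Optional[str] = None  # only populated when explicitly requested
    word_count: Optional[int] = None
    tags: List[str] = field(default_factory=list)
    metadata: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject content types outside the documented set.
        if self.content_type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type: {self.content_type!r}")
```

Nullable fields (marked `?` in the field list) map to `Optional` attributes, so a record with only `id`, `external_id`, and `indexed_at` is still valid.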

Documents are uniquely identified by the pair (connector_id, external_id). If you sync a document with the same external_id from the same connector again, the pipeline skips it.

  • Re-running a connector sync is safe. Duplicates are ignored.
  • The same content from different connectors is stored separately (different connector IDs).
  • Use stable, source-system identifiers as your external_id (e.g., the RSS item GUID, the Notion page ID, the tweet ID).
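
The deduplication rule can be illustrated with an in-memory model. This is a sketch of the behavior described above, not Trove's pipeline code: the `ingest` function and the dict-shaped items are hypothetical, and a real pipeline would check a persistent store rather than a Python set.

```python
from typing import Dict, Iterable, List, Set, Tuple

DedupKey = Tuple[str, str]  # (connector_id, external_id)


def ingest(items: Iterable[Dict[str, str]], seen: Set[DedupKey]) -> List[Dict[str, str]]:
    """Return only the items whose (connector_id, external_id) pair is new.

    `seen` is updated in place, so a repeated sync of the same batch
    yields nothing the second time.
    """
    accepted = []
    for item in items:
        key = (item["connector_id"], item["external_id"])
        if key in seen:
            continue  # already indexed from this connector: skip
        seen.add(key)
        accepted.append(item)
    return accepted
```

Running the same batch twice accepts every item on the first pass and none on the second, while an item with the same `external_id` from a different connector is still accepted.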

Documents are stored across two layers for performance: D1 holds metadata and preview text, and R2 holds the full text.

  • previewText is always available. It contains the first ~300 words and lives in D1 alongside other metadata. Use this in list views and search results.
  • fullText requires a read from R2. Only request it when you need the complete document, for example, when a user opens a document detail view.

In GraphQL, only include fullText in your selection set when you need it. In MCP, trove_search returns snippets; use trove_get_document to read the full text.
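
One way to keep fullText out of list queries is to build the selection set conditionally. The sketch below assumes a root field `document(id: ID!)`, which this page does not confirm; only the selected field names come from the field list. `build_document_query` is a hypothetical helper.

```python
# Fields that are cheap to read because they live alongside the metadata.
LIST_FIELDS = ["id", "title", "url", "previewText", "contentType", "tags"]


def build_document_query(include_full_text: bool = False) -> str:
    """Build a GraphQL query string, adding fullText only on request.

    The `document(id: ...)` root field is an assumption for illustration.
    """
    fields = list(LIST_FIELDS)
    if include_full_text:
        fields.append("fullText")  # triggers the slower full-text read
    selection = "\n    ".join(fields)
    return (
        "query Document($id: ID!) {\n"
        "  document(id: $id) {\n"
        f"    {selection}\n"
        "  }\n"
        "}"
    )
```

A list or search view would call `build_document_query()`, and a detail view would call `build_document_query(include_full_text=True)`.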

Tags are arbitrary strings, stored as an array on each document. They can be:

  • Set during ingestion (via the Sync API or trove_save)
  • Updated after the fact via the updateDocumentTags mutation
  • Used as filters in search, discover, and document listings

There is no predefined tag vocabulary. Use whatever taxonomy fits your knowledge base.
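
Tag filtering can be pictured with a small local model. Trove applies tag filters on the server, and this page does not say whether a filter matches any or all of the given tags; the sketch below assumes "all". `filter_by_tags` is a hypothetical helper, not part of any Trove API.

```python
from typing import Any, Dict, Iterable, List


def filter_by_tags(documents: Iterable[Dict[str, Any]], required: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep documents that carry every tag in `required` (assumed semantics)."""
    required_set = set(required)
    return [
        doc for doc in documents
        if required_set.issubset(doc.get("tags", []))
    ]
```

Because tags are free-form, matching here is exact and case-sensitive; a knowledge base that wants "AI" and "ai" to match would need to normalize tags at ingestion.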