diff --git a/docs/METADATA_FIELD_MAPPINGS.md b/docs/METADATA_FIELD_MAPPINGS.md new file mode 100644 index 0000000..0c14f99 --- /dev/null +++ b/docs/METADATA_FIELD_MAPPINGS.md @@ -0,0 +1,95 @@ +# Metadata Field Mappings + +Maps each canonical field to the dot-path where its value lives inside `sourcedMetadata.`. + +Used by `SOURCE_FIELD_PATHS` in `utils/metadata.resolution.utils.ts` and drives both the auto-resolution algorithm and the cherry-pick comparison view. + +Dot-notation paths are relative to `sourcedMetadata.` +(e.g. `volumeInformation.name` → `comic.sourcedMetadata.comicvine.volumeInformation.name`). + +--- + +## Source API Notes + +| Source | API | Auth | Notes | +|---|---|---|---| +| ComicVine | `https://comicvine.gamespot.com/api/` | Free API key | Covers Marvel, DC, and independents | +| Metron | `https://metron.cloud/api/` | Free account | Modern community DB, growing | +| GCD | `https://www.comics.org/api/` | None | Creators/characters live inside `story_set[]`, not top-level | +| LOCG | `https://leagueofcomicgeeks.com` | No public API | Scraped or partner access | +| ComicInfo.xml | Embedded in archive | N/A | ComicRack standard | +| Shortboxed | `https://api.shortboxed.com` | Partner key | Release-focused; limited metadata | +| Marvel | `https://gateway.marvel.com/v1/public/` | API key | Official Marvel API | +| DC | No official public API | — | Use ComicVine for DC issues | + +--- + +## Scalar Fields + +| Canonical Field | ComicVine | Metron | GCD | LOCG | ComicInfo.xml | Shortboxed | Marvel | +|---|---|---|---|---|---|---|---| +| `title` | `name` | `name` | `title` | `name` | `Title` | `title` | `title` | +| `series` | `volumeInformation.name` | `series.name` | `series_name` | TBD | `Series` | TBD | `series.name` | +| `issueNumber` | `issue_number` | `number` | `number` | TBD | `Number` | TBD | `issueNumber` | +| `volume` | `volume.name` | `series.volume` | `volume` | TBD | `Volume` | TBD | TBD | +| `publisher` | `volumeInformation.publisher.name` | `publisher.name` | `indicia_publisher` | `publisher` | `Publisher` | `publisher` | `"Marvel"` (static) | +| `imprint` | TBD | `imprint.name` | `brand_emblem` | TBD | `Imprint` | TBD | TBD | +| `coverDate` | `cover_date` | `cover_date` | `key_date` | TBD | `CoverDate` | TBD | `dates[onsaleDate].date` | +| `publicationDate` | `store_date` | `store_date` | `on_sale_date` | TBD | TBD | `release_date` | `dates[focDate].date` | +| `description` | `description` | `desc` | `story_set[0].synopsis` | `description` | `Summary` | `description` | `description` | +| `notes` | TBD | TBD | `notes` | TBD | `Notes` | TBD | TBD | +| `pageCount` | TBD | `page_count` | `page_count` | TBD | `PageCount` | TBD | `pageCount` | +| `ageRating` | TBD | `rating.name` | `rating` | TBD | `AgeRating` | TBD | TBD | +| `format` | TBD | `series.series_type.name` | `story_set[0].type` | TBD | `Format` | TBD | `format` | +| `communityRating` | TBD | TBD | TBD | `rating` | TBD | TBD | TBD | +| `coverImage` | `image.original_url` | `image` | `cover` | `cover` | TBD | TBD | `thumbnail.path + "." + thumbnail.extension` | + +--- + +## Array / Nested Fields + +GCD creator credits live as free-text strings inside `story_set[0]` (e.g. `"script": "Grant Morrison, Peter Milligan"`), not as structured arrays. These need to be split on commas during mapping. + +| Canonical Field | ComicVine | Metron | GCD | LOCG | ComicInfo.xml | Shortboxed | Marvel | +|---|---|---|---|---|---|---|---| +| `creators` (writer) | `person_credits[role=writer]` | `credits[role=writer]` | `story_set[0].script` | TBD | `Writer` | `creators` | `creators.items[role=writer]` | +| `creators` (penciller) | `person_credits[role=penciller]` | `credits[role=penciller]` | `story_set[0].pencils` | TBD | `Penciller` | TBD | `creators.items[role=penciler]` | +| `creators` (inker) | `person_credits[role=inker]` | `credits[role=inker]` | `story_set[0].inks` | TBD | `Inker` | TBD | `creators.items[role=inker]` | +| `creators` (colorist) | `person_credits[role=colorist]` | `credits[role=colorist]` | `story_set[0].colors` | TBD | `Colorist` | TBD | `creators.items[role=colorist]` | +| `creators` (letterer) | `person_credits[role=letterer]` | `credits[role=letterer]` | `story_set[0].letters` | TBD | `Letterer` | TBD | `creators.items[role=letterer]` | +| `creators` (editor) | `person_credits[role=editor]` | `credits[role=editor]` | `story_set[0].editing` | TBD | `Editor` | TBD | `creators.items[role=editor]` | +| `characters` | `character_credits` | `characters` | `story_set[0].characters` | TBD | `Characters` | TBD | `characters.items` | +| `teams` | `team_credits` | `teams` | TBD | TBD | `Teams` | TBD | TBD | +| `locations` | `location_credits` | `locations` | TBD | TBD | `Locations` | TBD | TBD | +| `storyArcs` | `story_arc_credits` | `arcs` | TBD | TBD | `StoryArc` | TBD | `events.items` | +| `stories` | TBD | TBD | `story_set[].title` | TBD | TBD | TBD | `stories.items` | +| `genres` | TBD | `series.genres` | `story_set[0].genre` | TBD | `Genre` | TBD | TBD | +| `tags` | TBD | TBD | `story_set[0].keywords` | TBD | `Tags` | TBD | TBD | +| `universes` | TBD | TBD | TBD | TBD | TBD | TBD | TBD | +| `reprints` | TBD | `reprints` | TBD | TBD | TBD | TBD | TBD | +| `urls` | `site_detail_url` | `resource_url` | `api_url` | `url` | TBD | TBD | `urls[type=detail].url` | +| `prices` | `price` | TBD | `price` | `price` | `Price` | `price` | `prices[type=printPrice].price` | +| `externalIDs` | `id` | `id` | `api_url` | TBD | TBD | `diamond_id` | `id` | + +--- + +## Identifiers / GTINs + +| Canonical Field | ComicVine | Metron | GCD | LOCG | ComicInfo.xml | Shortboxed | Marvel | +|---|---|---|---|---|---|---|---| +| `gtin.isbn` | TBD | TBD | `isbn` | TBD | TBD | TBD | `isbn` | +| `gtin.upc` | TBD | TBD | `barcode` | TBD | TBD | TBD | `upc` | + +--- + +## Special Mapping Notes + +- **DC Comics**: No official public API. DC issue metadata is sourced via **ComicVine** (which has comprehensive DC coverage). There is no separate `dc` source key. +- **GCD creators**: All credit fields (`script`, `pencils`, `inks`, `colors`, `letters`, `editing`) are comma-separated strings inside `story_set[0]`. Mapping code must split these into individual creator objects with roles assigned. +- **GCD characters/genre/keywords**: Also inside `story_set[0]`, not top-level on the issue. +- **Marvel publisher**: Always "Marvel Comics" — can be set as a static value rather than extracted from the API response. +- **Marvel cover image**: Constructed by concatenating `thumbnail.path + "." + thumbnail.extension`. +- **Marvel dates**: Multiple date types in a `dates[]` array — filter by `type == "onsaleDate"` for cover date, `type == "focDate"` for FOC/publication date. +- **Marvel creators/characters**: Nested inside collection objects (`creators.items[]`, `characters.items[]`) with `name` and `role` sub-fields. +- **Shortboxed**: Release-focused service; limited metadata. Best used for `publicationDate`, `price`, and `publisher` only. No series/issue number fields. +- **LOCG**: No public API; fields marked TBD will need to be confirmed when integration is built. diff --git a/docs/METADATA_RECONCILIATION_PLAN.md b/docs/METADATA_RECONCILIATION_PLAN.md new file mode 100644 index 0000000..1573adb --- /dev/null +++ b/docs/METADATA_RECONCILIATION_PLAN.md @@ -0,0 +1,330 @@ +# Metadata Reconciliation System Plan + +## Context + +Comics in the library can have metadata from multiple sources: ComicVine, Metron, GCD, LOCG, ComicInfo.xml, Shortboxed, Marvel, DC, and Manual. The existing `canonicalMetadata` + `sourcedMetadata` architecture already stores raw per-source data and has a resolution algorithm, but there's no way for a user to interactively compare and cherry-pick values across sources field-by-field. This plan adds that manual reconciliation workflow (Phase 1) and lays the groundwork for ranked auto-resolution (Phase 2). + +--- + +## Current State (what already exists) + +- `sourcedMetadata.{comicvine,metron,gcd,locg,comicInfo}` — raw per-source data (Mongoose Mixed) — **Shortboxed, Marvel, DC not yet added** +- `canonicalMetadata` — resolved truth, each field is `{ value, provenance, userOverride }` +- `analyzeMetadataConflicts(comicId)` GraphQL query — conflict view for 5 fields only +- `setMetadataField(comicId, field, value)` — stores MANUAL override with raw string +- `resolveMetadata(comicId)` / `bulkResolveMetadata(comicIds)` — trigger auto-resolution +- `previewCanonicalMetadata(comicId, preferences)` — dry run +- `buildCanonicalMetadata()` in `utils/metadata.resolution.utils.ts` — covers only 7 fields +- `UserPreferences` model with `sourcePriorities`, `conflictResolution`, `autoMerge` +- `updateUserPreferences` resolver — fully implemented +- `autoResolveMetadata()` in `services/graphql.service.ts` — exists but only for scalar triggers + +--- + +## Phase 1: Manual Cherry-Pick Reconciliation + +### Goal +For any comic, a user can open a comparison table: each row is a canonical field, each column is a source. They click a cell to "pick" that source's value for that field. The result is stored as `canonicalMetadata.` with the original source's provenance intact and `userOverride: true` to prevent future auto-resolution from overwriting it. + +### Expand `MetadataSource` enum (`models/comic.model.ts` + `models/graphql/typedef.ts`) + +Add new sources to the enum: + +```ts +enum MetadataSource { + COMICVINE = "comicvine", + METRON = "metron", + GRAND_COMICS_DATABASE = "gcd", + LOCG = "locg", + COMICINFO_XML = "comicinfo", + SHORTBOXED = "shortboxed", + MARVEL = "marvel", + DC = "dc", + MANUAL = "manual", +} +``` + +Also add to `sourcedMetadata` in `ComicSchema` (`models/comic.model.ts`): +```ts +shortboxed: { type: mongoose.Schema.Types.Mixed, default: {} }, +marvel: { type: mongoose.Schema.Types.Mixed, default: {} }, +dc: { type: mongoose.Schema.Types.Mixed, default: {} }, +``` + +And in GraphQL schema enum: +```graphql +enum MetadataSource { + COMICVINE + METRON + GRAND_COMICS_DATABASE + LOCG + COMICINFO_XML + SHORTBOXED + MARVEL + DC + MANUAL +} +``` + +> **Note:** Shortboxed, Marvel, and DC field paths in `SOURCE_FIELD_PATHS` will be stubs (`{}`) until those integrations are built. The comparison view will simply show no data for those sources until then — no breaking changes. + +--- + +### New types (GraphQL — `models/graphql/typedef.ts`) + +```graphql +# One source's value for a single field +type SourceFieldValue { + source: MetadataSource! + value: JSON # null if source has no value for this field + confidence: Float + fetchedAt: String + url: String +} + +# All sources' values for a single canonical field +type MetadataFieldComparison { + field: String! + currentCanonical: MetadataField # what is currently resolved + sourcedValues: [SourceFieldValue!]! # one entry per source that has data + hasConflict: Boolean! # true if >1 source has a different value +} + +type MetadataComparisonView { + comicId: ID! + comparisons: [MetadataFieldComparison!]! +} +``` + +Add to `Query`: +```graphql +getMetadataComparisonView(comicId: ID!): MetadataComparisonView! +``` + +Add to `Mutation`: +```graphql +# Cherry-pick a single field from a named source +pickFieldFromSource(comicId: ID!, field: String!, source: MetadataSource!): Comic! + +# Batch cherry-pick multiple fields at once +batchPickFieldsFromSources( + comicId: ID! + picks: [FieldSourcePick!]! +): Comic! + +input FieldSourcePick { + field: String! + source: MetadataSource! +} +``` + +### Changes to `utils/metadata.resolution.utils.ts` + +Add `SOURCE_FIELD_PATHS` — a complete mapping of every canonical field to its path in each sourced-metadata blob: + +```ts +export const SOURCE_FIELD_PATHS: Record< + string, // canonical field name + Partial> // source → dot-path in sourcedMetadata[source] +> = { + title: { comicvine: "name", metron: "name", comicinfo: "Title", locg: "name" }, + series: { comicvine: "volumeInformation.name", comicinfo: "Series" }, + issueNumber: { comicvine: "issue_number", metron: "number", comicinfo: "Number" }, + publisher: { comicvine: "volumeInformation.publisher.name", locg: "publisher", comicinfo: "Publisher" }, + coverDate: { comicvine: "cover_date", metron: "cover_date", comicinfo: "CoverDate" }, + description: { comicvine: "description", locg: "description", comicinfo: "Summary" }, + pageCount: { comicinfo: "PageCount", metron: "page_count" }, + ageRating: { comicinfo: "AgeRating", metron: "rating.name" }, + format: { metron: "series.series_type.name", comicinfo: "Format" }, + // creators → array field, handled separately + storyArcs: { comicvine: "story_arc_credits", metron: "arcs", comicinfo: "StoryArc" }, + characters: { comicvine: "character_credits", metron: "characters", comicinfo: "Characters" }, + teams: { comicvine: "team_credits", metron: "teams", comicinfo: "Teams" }, + locations: { comicvine: "location_credits", metron: "locations", comicinfo: "Locations" }, + genres: { metron: "series.genres", comicinfo: "Genre" }, + tags: { comicinfo: "Tags" }, + communityRating: { locg: "rating" }, + coverImage: { comicvine: "image.original_url", locg: "cover", metron: "image" }, + // Shortboxed, Marvel, DC — paths TBD when integrations are built + // shortboxed: {}, marvel: {}, dc: {} +}; +``` + +Add `extractAllSourceValues(field, sourcedMetadata)` — returns `SourceFieldValue[]` for every source that has a non-null value for the given field. + +Update `buildCanonicalMetadata()` to use `SOURCE_FIELD_PATHS` instead of the hard-coded 7-field mapping. This single source of truth drives both auto-resolve and the comparison view. + +### Changes to `models/graphql/resolvers.ts` + +**`getMetadataComparisonView` resolver:** +- Fetch comic by ID +- For each key in `SOURCE_FIELD_PATHS`, call `extractAllSourceValues()` +- Return the comparison array with `hasConflict` flag +- Include `currentCanonical` from `comic.canonicalMetadata[field]` if it exists + +**`pickFieldFromSource` resolver:** +- Fetch comic, validate source has a value for the field +- Extract value + provenance from `sourcedMetadata[source]` via `SOURCE_FIELD_PATHS` +- Write to `canonicalMetadata[field]` with original source provenance + `userOverride: true` +- Save and return comic + +**`batchPickFieldsFromSources` resolver:** +- Same as above but iterate over `picks[]`, do a single `comic.save()` + +### Changes to `services/library.service.ts` + +Add Moleculer actions that delegate to GraphQL: + +```ts +getMetadataComparisonView: { + rest: "POST /getMetadataComparisonView", + async handler(ctx) { /* call GraphQL query */ } +}, +pickFieldFromSource: { + rest: "POST /pickFieldFromSource", + async handler(ctx) { /* call GraphQL mutation */ } +}, +batchPickFieldsFromSources: { + rest: "POST /batchPickFieldsFromSources", + async handler(ctx) { /* call GraphQL mutation */ } +}, +``` + +### Changes to `utils/import.graphql.utils.ts` + +Add three helper functions mirroring the pattern of existing utils: +- `getMetadataComparisonViewViaGraphQL(broker, comicId)` +- `pickFieldFromSourceViaGraphQL(broker, comicId, field, source)` +- `batchPickFieldsFromSourcesViaGraphQL(broker, comicId, picks)` + +--- + +## Architectural Guidance: GraphQL vs REST + +The project has two distinct patterns — use the right one: + +| Type of operation | Pattern | +|---|---| +| Complex metadata logic (resolution, provenance, conflict analysis) | **GraphQL mutation/query** in `typedef.ts` + `resolvers.ts` | +| User-facing operation the UI calls | **REST action** in `library.service.ts` → delegates to GraphQL via `broker.call("graphql.graphql", {...})` | +| Pure acquisition tracking (no resolution) | Direct DB write in `library.service.ts`, no GraphQL needed | + +**All three new reconciliation operations** (`getMetadataComparisonView`, `pickFieldFromSource`, `batchPickFieldsFromSources`) follow the first two rows: GraphQL for the logic + REST wrapper for UI consumption. + +### Gap: `applyComicVineMetadata` bypasses canonicalMetadata + +Currently `library.applyComicVineMetadata` writes directly to `sourcedMetadata.comicvine` in MongoDB without triggering `buildCanonicalMetadata`. This means `canonicalMetadata` goes stale when ComicVine data is applied. + +The fix: change `applyComicVineMetadata` to call the existing `updateSourcedMetadata` GraphQL mutation instead of the direct DB write. `updateSourcedMetadata` already triggers re-resolution via `autoMerge.onMetadataUpdate`. + +**File**: `services/library.service.ts` lines ~937–990 (applyComicVineMetadata handler) +**Change**: Replace direct `Comic.findByIdAndUpdate` with `broker.call("graphql.graphql", { query: updateSourcedMetadataMutation, ... })` + +--- + +## Phase 2: Source Ranking + AutoResolve (design — not implementing yet) + +The infrastructure already exists: +- `UserPreferences.sourcePriorities[]` with per-source `priority` (1=highest) +- `conflictResolution` strategy enum (PRIORITY, CONFIDENCE, RECENCY, HYBRID, MANUAL) +- `autoMerge.enabled / onImport / onMetadataUpdate` +- `updateUserPreferences` resolver + +When this phase is implemented, the additions will be: +1. A "re-resolve all comics" action triggered when source priorities change (`POST /reResolveAllWithPreferences`) +2. `autoResolveMetadata` in graphql.service.ts wired to call `resolveMetadata` on save rather than only on import/update hooks +3. Field-specific source overrides UI (the `fieldOverrides` Map in `SourcePrioritySchema` is already modeled) + +--- + +## TDD Approach + +Each step follows Red → Green → Refactor: +1. Write failing spec(s) for the unit being built +2. Implement the minimum code to make them pass +3. Refactor if needed + +**Test framework:** Jest + ts-jest (configured in `package.json`, zero existing tests — these will be the first) +**File convention:** `*.spec.ts` alongside the source file (e.g., `utils/metadata.resolution.utils.spec.ts`) +**No DB needed for unit tests** — mock `Comic.findById` etc. with `jest.spyOn` / `jest.mock` + +--- + +## Implementation Order + +### Step 1 — Utility layer (prerequisite for everything) +**Write first:** `utils/metadata.resolution.utils.spec.ts` +- `SOURCE_FIELD_PATHS` has entries for all canonical fields +- `extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } })` returns 2 entries with correct source + value +- `extractAllSourceValues` returns empty array when no source has the field +- `buildCanonicalMetadata()` covers all fields in `SOURCE_FIELD_PATHS` (not just 7) +- `buildCanonicalMetadata()` never overwrites fields with `userOverride: true` + +**Then implement:** +- `models/comic.model.ts` — add `SHORTBOXED`, `MARVEL`, `DC` to `MetadataSource` enum; add 3 new `sourcedMetadata` fields +- `models/userpreferences.model.ts` — add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default `sourcePriorities` +- `utils/metadata.resolution.utils.ts` — add `SOURCE_FIELD_PATHS`, `extractAllSourceValues()`, rewrite `buildCanonicalMetadata()` + +### Step 2 — GraphQL schema (no tests — type definitions only) +**`models/graphql/typedef.ts`** +- Expand `MetadataSource` enum (add SHORTBOXED, MARVEL, DC) +- Add `SourceFieldValue`, `MetadataFieldComparison`, `MetadataComparisonView`, `FieldSourcePick` types +- Add `getMetadataComparisonView` to `Query` +- Add `pickFieldFromSource`, `batchPickFieldsFromSources` to `Mutation` + +### Step 3 — GraphQL resolvers +**Write first:** `models/graphql/resolvers.spec.ts` +- `getMetadataComparisonView`: returns one entry per field in `SOURCE_FIELD_PATHS`; `hasConflict` true when sources disagree; `currentCanonical` reflects DB state +- `pickFieldFromSource`: sets field with source provenance + `userOverride: true`; throws when source has no value +- `batchPickFieldsFromSources`: applies all picks in a single save +- `applyComicVineMetadata` fix: calls `updateSourcedMetadata` mutation (not direct DB write) + +**Then implement:** `models/graphql/resolvers.ts` + +### Step 4 — GraphQL util helpers +**Write first:** `utils/import.graphql.utils.spec.ts` +- Each helper calls `broker.call("graphql.graphql", ...)` with correct query/variables +- GraphQL errors are propagated + +**Then implement:** `utils/import.graphql.utils.ts` + +### Step 5 — REST surface +**Write first:** `services/library.service.spec.ts` +- Each action delegates to the correct GraphQL util helper +- Context params pass through correctly + +**Then implement:** `services/library.service.ts` + +--- + +## Critical Files + +| File | Step | Change | +|---|---|---| +| `models/comic.model.ts` | 1 | Add `SHORTBOXED`, `MARVEL`, `DC` to `MetadataSource` enum; add 3 new `sourcedMetadata` fields | +| `models/userpreferences.model.ts` | 1 | Add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default `sourcePriorities` | +| `utils/metadata.resolution.utils.ts` | 1 | Add `SOURCE_FIELD_PATHS`, `extractAllSourceValues()`; rewrite `buildCanonicalMetadata()` | +| `models/graphql/typedef.ts` | 2 | Expand `MetadataSource` enum; add 4 new types + query + 2 mutations | +| `models/graphql/resolvers.ts` | 3 | Implement 3 resolvers + fix `applyComicVineMetadata` | +| `utils/import.graphql.utils.ts` | 4 | Add 3 GraphQL util functions | +| `services/library.service.ts` | 5 | Add 3 Moleculer REST actions | + +--- + +## Reusable Existing Code + +- `resolveMetadataField()` in `utils/metadata.resolution.utils.ts` — reused inside `buildCanonicalMetadata()` +- `getNestedValue()` in same file — reused in `extractAllSourceValues()` +- `convertPreferences()` in `models/graphql/resolvers.ts` — reused in `getMetadataComparisonView` +- `autoResolveMetadata()` in `services/graphql.service.ts` — called after `pickFieldFromSource` if `autoMerge.onMetadataUpdate` is true + +--- + +## Verification + +1. **Unit**: `extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } })` → 2 entries with correct provenance +2. **GraphQL**: `getMetadataComparisonView(comicId)` on a comic with comicvine + comicInfo data → all fields populated +3. **Cherry-pick**: `pickFieldFromSource(comicId, "title", COMICVINE)` → `canonicalMetadata.title.provenance.source == "comicvine"` and `userOverride == true` +4. **Batch**: `batchPickFieldsFromSources` with 3 fields → single DB write, all 3 updated +5. **Lock**: After cherry-picking, `resolveMetadata(comicId)` must NOT overwrite picked fields (`userOverride: true` takes priority) +6. **REST**: `POST /api/library/getMetadataComparisonView` returns expected JSON