📖 Metadata field mapping of popular sources
Some checks failed
Docker Image CI / build (push) Has been cancelled

This commit is contained in:
2026-04-03 18:15:49 -04:00
parent bfaf7bb664
commit bf7e57a274
2 changed files with 425 additions and 0 deletions

View File

@@ -0,0 +1,330 @@
# Metadata Reconciliation System Plan
## Context
Comics in the library can have metadata from multiple sources: ComicVine, Metron, GCD, LOCG, ComicInfo.xml, Shortboxed, Marvel, DC, and Manual. The existing `canonicalMetadata` + `sourcedMetadata` architecture already stores raw per-source data and has a resolution algorithm, but there's no way for a user to interactively compare and cherry-pick values across sources field-by-field. This plan adds that manual reconciliation workflow (Phase 1) and lays the groundwork for ranked auto-resolution (Phase 2).
---
## Current State (what already exists)
- `sourcedMetadata.{comicvine,metron,gcd,locg,comicInfo}` — raw per-source data (Mongoose Mixed) — **Shortboxed, Marvel, DC not yet added**
- `canonicalMetadata` — resolved truth, each field is `{ value, provenance, userOverride }`
- `analyzeMetadataConflicts(comicId)` GraphQL query — conflict view for 5 fields only
- `setMetadataField(comicId, field, value)` — stores MANUAL override with raw string
- `resolveMetadata(comicId)` / `bulkResolveMetadata(comicIds)` — trigger auto-resolution
- `previewCanonicalMetadata(comicId, preferences)` — dry run
- `buildCanonicalMetadata()` in `utils/metadata.resolution.utils.ts` — covers only 7 fields
- `UserPreferences` model with `sourcePriorities`, `conflictResolution`, `autoMerge`
- `updateUserPreferences` resolver — fully implemented
- `autoResolveMetadata()` in `services/graphql.service.ts` — exists but only for scalar triggers
---
## Phase 1: Manual Cherry-Pick Reconciliation
### Goal
For any comic, a user can open a comparison table: each row is a canonical field, each column is a source. They click a cell to "pick" that source's value for that field. The result is stored as `canonicalMetadata.<field>` with the original source's provenance intact and `userOverride: true` to prevent future auto-resolution from overwriting it.
### Expand `MetadataSource` enum (`models/comic.model.ts` + `models/graphql/typedef.ts`)
Add new sources to the enum:
```ts
enum MetadataSource {
COMICVINE = "comicvine",
METRON = "metron",
GRAND_COMICS_DATABASE = "gcd",
LOCG = "locg",
COMICINFO_XML = "comicinfo",
SHORTBOXED = "shortboxed",
MARVEL = "marvel",
DC = "dc",
MANUAL = "manual",
}
```
Also add to `sourcedMetadata` in `ComicSchema` (`models/comic.model.ts`):
```ts
shortboxed: { type: mongoose.Schema.Types.Mixed, default: {} },
marvel: { type: mongoose.Schema.Types.Mixed, default: {} },
dc: { type: mongoose.Schema.Types.Mixed, default: {} },
```
And in GraphQL schema enum:
```graphql
enum MetadataSource {
COMICVINE
METRON
GRAND_COMICS_DATABASE
LOCG
COMICINFO_XML
SHORTBOXED
MARVEL
DC
MANUAL
}
```
> **Note:** Shortboxed, Marvel, and DC field paths in `SOURCE_FIELD_PATHS` will be stubs (`{}`) until those integrations are built. The comparison view will simply show no data for those sources until then — no breaking changes.
---
### New types (GraphQL — `models/graphql/typedef.ts`)
```graphql
# One source's value for a single field
type SourceFieldValue {
source: MetadataSource!
value: JSON # null if source has no value for this field
confidence: Float
fetchedAt: String
url: String
}
# All sources' values for a single canonical field
type MetadataFieldComparison {
field: String!
currentCanonical: MetadataField # what is currently resolved
sourcedValues: [SourceFieldValue!]! # one entry per source that has data
hasConflict: Boolean! # true if >1 source has a different value
}
type MetadataComparisonView {
comicId: ID!
comparisons: [MetadataFieldComparison!]!
}
```
Add to `Query`:
```graphql
getMetadataComparisonView(comicId: ID!): MetadataComparisonView!
```
Add to `Mutation`:
```graphql
# Cherry-pick a single field from a named source
pickFieldFromSource(comicId: ID!, field: String!, source: MetadataSource!): Comic!
# Batch cherry-pick multiple fields at once
batchPickFieldsFromSources(
comicId: ID!
picks: [FieldSourcePick!]!
): Comic!
input FieldSourcePick {
field: String!
source: MetadataSource!
}
```
### Changes to `utils/metadata.resolution.utils.ts`
Add `SOURCE_FIELD_PATHS` — a complete mapping of every canonical field to its path in each sourced-metadata blob:
```ts
export const SOURCE_FIELD_PATHS: Record<
string, // canonical field name
Partial<Record<MetadataSource, string>> // source → dot-path in sourcedMetadata[source]
> = {
title: { comicvine: "name", metron: "name", comicinfo: "Title", locg: "name" },
series: { comicvine: "volumeInformation.name", comicinfo: "Series" },
issueNumber: { comicvine: "issue_number", metron: "number", comicinfo: "Number" },
publisher: { comicvine: "volumeInformation.publisher.name", locg: "publisher", comicinfo: "Publisher" },
coverDate: { comicvine: "cover_date", metron: "cover_date", comicinfo: "CoverDate" },
description: { comicvine: "description", locg: "description", comicinfo: "Summary" },
pageCount: { comicinfo: "PageCount", metron: "page_count" },
ageRating: { comicinfo: "AgeRating", metron: "rating.name" },
format: { metron: "series.series_type.name", comicinfo: "Format" },
// creators → array field, handled separately
storyArcs: { comicvine: "story_arc_credits", metron: "arcs", comicinfo: "StoryArc" },
characters: { comicvine: "character_credits", metron: "characters", comicinfo: "Characters" },
teams: { comicvine: "team_credits", metron: "teams", comicinfo: "Teams" },
locations: { comicvine: "location_credits", metron: "locations", comicinfo: "Locations" },
genres: { metron: "series.genres", comicinfo: "Genre" },
tags: { comicinfo: "Tags" },
communityRating: { locg: "rating" },
coverImage: { comicvine: "image.original_url", locg: "cover", metron: "image" },
// Shortboxed, Marvel, DC — paths TBD when integrations are built
// shortboxed: {}, marvel: {}, dc: {}
};
```
Add `extractAllSourceValues(field, sourcedMetadata)` — returns `SourceFieldValue[]` for every source that has a non-null value for the given field.
Update `buildCanonicalMetadata()` to use `SOURCE_FIELD_PATHS` instead of the hard-coded 7-field mapping. This single source of truth drives both auto-resolve and the comparison view.
### Changes to `models/graphql/resolvers.ts`
**`getMetadataComparisonView` resolver:**
- Fetch comic by ID
- For each key in `SOURCE_FIELD_PATHS`, call `extractAllSourceValues()`
- Return the comparison array with `hasConflict` flag
- Include `currentCanonical` from `comic.canonicalMetadata[field]` if it exists
**`pickFieldFromSource` resolver:**
- Fetch comic, validate source has a value for the field
- Extract value + provenance from `sourcedMetadata[source]` via `SOURCE_FIELD_PATHS`
- Write to `canonicalMetadata[field]` with original source provenance + `userOverride: true`
- Save and return comic
**`batchPickFieldsFromSources` resolver:**
- Same as above but iterate over `picks[]`, do a single `comic.save()`
### Changes to `services/library.service.ts`
Add Moleculer actions that delegate to GraphQL:
```ts
getMetadataComparisonView: {
rest: "POST /getMetadataComparisonView",
async handler(ctx) { /* call GraphQL query */ }
},
pickFieldFromSource: {
rest: "POST /pickFieldFromSource",
async handler(ctx) { /* call GraphQL mutation */ }
},
batchPickFieldsFromSources: {
rest: "POST /batchPickFieldsFromSources",
async handler(ctx) { /* call GraphQL mutation */ }
},
```
### Changes to `utils/import.graphql.utils.ts`
Add three helper functions mirroring the pattern of existing utils:
- `getMetadataComparisonViewViaGraphQL(broker, comicId)`
- `pickFieldFromSourceViaGraphQL(broker, comicId, field, source)`
- `batchPickFieldsFromSourcesViaGraphQL(broker, comicId, picks)`
---
## Architectural Guidance: GraphQL vs REST
The project has two distinct patterns — use the right one:
| Type of operation | Pattern |
|---|---|
| Complex metadata logic (resolution, provenance, conflict analysis) | **GraphQL mutation/query** in `typedef.ts` + `resolvers.ts` |
| User-facing operation the UI calls | **REST action** in `library.service.ts` → delegates to GraphQL via `broker.call("graphql.graphql", {...})` |
| Pure acquisition tracking (no resolution) | Direct DB write in `library.service.ts`, no GraphQL needed |
**All three new reconciliation operations** (`getMetadataComparisonView`, `pickFieldFromSource`, `batchPickFieldsFromSources`) follow the first two rows: GraphQL for the logic + REST wrapper for UI consumption.
### Gap: `applyComicVineMetadata` bypasses canonicalMetadata
Currently `library.applyComicVineMetadata` writes directly to `sourcedMetadata.comicvine` in MongoDB without triggering `buildCanonicalMetadata`. This means `canonicalMetadata` goes stale when ComicVine data is applied.
The fix: change `applyComicVineMetadata` to call the existing `updateSourcedMetadata` GraphQL mutation instead of the direct DB write. `updateSourcedMetadata` already triggers re-resolution via `autoMerge.onMetadataUpdate`.
**File**: `services/library.service.ts` lines ~937990 (applyComicVineMetadata handler)
**Change**: Replace direct `Comic.findByIdAndUpdate` with `broker.call("graphql.graphql", { query: updateSourcedMetadataMutation, ... })`
---
## Phase 2: Source Ranking + AutoResolve (design — not implementing yet)
The infrastructure already exists:
- `UserPreferences.sourcePriorities[]` with per-source `priority` (1=highest)
- `conflictResolution` strategy enum (PRIORITY, CONFIDENCE, RECENCY, HYBRID, MANUAL)
- `autoMerge.enabled / onImport / onMetadataUpdate`
- `updateUserPreferences` resolver
When this phase is implemented, the additions will be:
1. A "re-resolve all comics" action triggered when source priorities change (`POST /reResolveAllWithPreferences`)
2. `autoResolveMetadata` in graphql.service.ts wired to call `resolveMetadata` on save rather than only on import/update hooks
3. Field-specific source overrides UI (the `fieldOverrides` Map in `SourcePrioritySchema` is already modeled)
---
## TDD Approach
Each step follows Red → Green → Refactor:
1. Write failing spec(s) for the unit being built
2. Implement the minimum code to make them pass
3. Refactor if needed
**Test framework:** Jest + ts-jest (configured in `package.json`, zero existing tests — these will be the first)
**File convention:** `*.spec.ts` alongside the source file (e.g., `utils/metadata.resolution.utils.spec.ts`)
**No DB needed for unit tests** — mock `Comic.findById` etc. with `jest.spyOn` / `jest.mock`
---
## Implementation Order
### Step 1 — Utility layer (prerequisite for everything)
**Write first:** `utils/metadata.resolution.utils.spec.ts`
- `SOURCE_FIELD_PATHS` has entries for all canonical fields
- `extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } })` returns 2 entries with correct source + value
- `extractAllSourceValues` returns empty array when no source has the field
- `buildCanonicalMetadata()` covers all fields in `SOURCE_FIELD_PATHS` (not just 7)
- `buildCanonicalMetadata()` never overwrites fields with `userOverride: true`
**Then implement:**
- `models/comic.model.ts` — add `SHORTBOXED`, `MARVEL`, `DC` to `MetadataSource` enum; add 3 new `sourcedMetadata` fields
- `models/userpreferences.model.ts` — add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default `sourcePriorities`
- `utils/metadata.resolution.utils.ts` — add `SOURCE_FIELD_PATHS`, `extractAllSourceValues()`, rewrite `buildCanonicalMetadata()`
### Step 2 — GraphQL schema (no tests — type definitions only)
**`models/graphql/typedef.ts`**
- Expand `MetadataSource` enum (add SHORTBOXED, MARVEL, DC)
- Add `SourceFieldValue`, `MetadataFieldComparison`, `MetadataComparisonView`, `FieldSourcePick` types
- Add `getMetadataComparisonView` to `Query`
- Add `pickFieldFromSource`, `batchPickFieldsFromSources` to `Mutation`
### Step 3 — GraphQL resolvers
**Write first:** `models/graphql/resolvers.spec.ts`
- `getMetadataComparisonView`: returns one entry per field in `SOURCE_FIELD_PATHS`; `hasConflict` true when sources disagree; `currentCanonical` reflects DB state
- `pickFieldFromSource`: sets field with source provenance + `userOverride: true`; throws when source has no value
- `batchPickFieldsFromSources`: applies all picks in a single save
- `applyComicVineMetadata` fix: calls `updateSourcedMetadata` mutation (not direct DB write)
**Then implement:** `models/graphql/resolvers.ts`
### Step 4 — GraphQL util helpers
**Write first:** `utils/import.graphql.utils.spec.ts`
- Each helper calls `broker.call("graphql.graphql", ...)` with correct query/variables
- GraphQL errors are propagated
**Then implement:** `utils/import.graphql.utils.ts`
### Step 5 — REST surface
**Write first:** `services/library.service.spec.ts`
- Each action delegates to the correct GraphQL util helper
- Context params pass through correctly
**Then implement:** `services/library.service.ts`
---
## Critical Files
| File | Step | Change |
|---|---|---|
| `models/comic.model.ts` | 1 | Add `SHORTBOXED`, `MARVEL`, `DC` to `MetadataSource` enum; add 3 new `sourcedMetadata` fields |
| `models/userpreferences.model.ts` | 1 | Add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default `sourcePriorities` |
| `utils/metadata.resolution.utils.ts` | 1 | Add `SOURCE_FIELD_PATHS`, `extractAllSourceValues()`; rewrite `buildCanonicalMetadata()` |
| `models/graphql/typedef.ts` | 2 | Expand `MetadataSource` enum; add 4 new types + query + 2 mutations |
| `models/graphql/resolvers.ts` | 3 | Implement 3 resolvers + fix `applyComicVineMetadata` |
| `utils/import.graphql.utils.ts` | 4 | Add 3 GraphQL util functions |
| `services/library.service.ts` | 5 | Add 3 Moleculer REST actions |
---
## Reusable Existing Code
- `resolveMetadataField()` in `utils/metadata.resolution.utils.ts` — reused inside `buildCanonicalMetadata()`
- `getNestedValue()` in same file — reused in `extractAllSourceValues()`
- `convertPreferences()` in `models/graphql/resolvers.ts` — reused in `getMetadataComparisonView`
- `autoResolveMetadata()` in `services/graphql.service.ts` — called after `pickFieldFromSource` if `autoMerge.onMetadataUpdate` is true
---
## Verification
1. **Unit**: `extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } })` → 2 entries with correct provenance
2. **GraphQL**: `getMetadataComparisonView(comicId)` on a comic with comicvine + comicInfo data → all fields populated
3. **Cherry-pick**: `pickFieldFromSource(comicId, "title", COMICVINE)``canonicalMetadata.title.provenance.source == "comicvine"` and `userOverride == true`
4. **Batch**: `batchPickFieldsFromSources` with 3 fields → single DB write, all 3 updated
5. **Lock**: After cherry-picking, `resolveMetadata(comicId)` must NOT overwrite picked fields (`userOverride: true` takes priority)
6. **REST**: `POST /api/library/getMetadataComparisonView` returns expected JSON