threetwo-core-service/docs/METADATA_RECONCILIATION_PLAN.md
Rishi Ghan bf7e57a274
📖 Metadata field mapping of popular sources
2026-04-03 18:15:49 -04:00


Metadata Reconciliation System Plan

Context

Comics in the library can have metadata from multiple sources: ComicVine, Metron, GCD, LOCG, ComicInfo.xml, Shortboxed, Marvel, DC, and Manual. The existing canonicalMetadata + sourcedMetadata architecture already stores raw per-source data and has a resolution algorithm, but there's no way for a user to interactively compare and cherry-pick values across sources field-by-field. This plan adds that manual reconciliation workflow (Phase 1) and lays the groundwork for ranked auto-resolution (Phase 2).


Current State (what already exists)

  • sourcedMetadata.{comicvine,metron,gcd,locg,comicInfo} — raw per-source data (Mongoose Mixed) — Shortboxed, Marvel, DC not yet added
  • canonicalMetadata — resolved truth, each field is { value, provenance, userOverride }
  • analyzeMetadataConflicts(comicId) GraphQL query — conflict view for 5 fields only
  • setMetadataField(comicId, field, value) — stores MANUAL override with raw string
  • resolveMetadata(comicId) / bulkResolveMetadata(comicIds) — trigger auto-resolution
  • previewCanonicalMetadata(comicId, preferences) — dry run
  • buildCanonicalMetadata() in utils/metadata.resolution.utils.ts — covers only 7 fields
  • UserPreferences model with sourcePriorities, conflictResolution, autoMerge
  • updateUserPreferences resolver — fully implemented
  • autoResolveMetadata() in services/graphql.service.ts — exists but only for scalar triggers

Phase 1: Manual Cherry-Pick Reconciliation

Goal

For any comic, a user can open a comparison table: each row is a canonical field, each column is a source. They click a cell to "pick" that source's value for that field. The result is stored as canonicalMetadata.<field> with the original source's provenance intact and userOverride: true to prevent future auto-resolution from overwriting it.

Expand MetadataSource enum (models/comic.model.ts + models/graphql/typedef.ts)

Add new sources to the enum:

enum MetadataSource {
  COMICVINE       = "comicvine",
  METRON          = "metron",
  GRAND_COMICS_DATABASE = "gcd",
  LOCG            = "locg",
  COMICINFO_XML   = "comicinfo",
  SHORTBOXED      = "shortboxed",
  MARVEL          = "marvel",
  DC              = "dc",
  MANUAL          = "manual",
}

Also add to sourcedMetadata in ComicSchema (models/comic.model.ts):

shortboxed: { type: mongoose.Schema.Types.Mixed, default: {} },
marvel:     { type: mongoose.Schema.Types.Mixed, default: {} },
dc:         { type: mongoose.Schema.Types.Mixed, default: {} },

And in GraphQL schema enum:

enum MetadataSource {
  COMICVINE
  METRON
  GRAND_COMICS_DATABASE
  LOCG
  COMICINFO_XML
  SHORTBOXED
  MARVEL
  DC
  MANUAL
}

Note: Shortboxed, Marvel, and DC field paths in SOURCE_FIELD_PATHS will be stubs ({}) until those integrations are built. The comparison view will simply show no data for those sources until then — no breaking changes.


New types (GraphQL — models/graphql/typedef.ts)

# One source's value for a single field
type SourceFieldValue {
  source: MetadataSource!
  value: JSON            # null if source has no value for this field
  confidence: Float
  fetchedAt: String
  url: String
}

# All sources' values for a single canonical field
type MetadataFieldComparison {
  field: String!
  currentCanonical: MetadataField  # what is currently resolved
  sourcedValues: [SourceFieldValue!]!  # one entry per source that has data
  hasConflict: Boolean!            # true if at least two sources report different values
}

type MetadataComparisonView {
  comicId: ID!
  comparisons: [MetadataFieldComparison!]!
}

Add to Query:

getMetadataComparisonView(comicId: ID!): MetadataComparisonView!

Add to Mutation:

# Cherry-pick a single field from a named source
pickFieldFromSource(comicId: ID!, field: String!, source: MetadataSource!): Comic!

# Batch cherry-pick multiple fields at once
batchPickFieldsFromSources(
  comicId: ID!
  picks: [FieldSourcePick!]!
): Comic!

input FieldSourcePick {
  field: String!
  source: MetadataSource!
}

Changes to utils/metadata.resolution.utils.ts

Add SOURCE_FIELD_PATHS — a complete mapping of every canonical field to its path in each sourced-metadata blob:

export const SOURCE_FIELD_PATHS: Record<
  string,         // canonical field name
  Partial<Record<MetadataSource, string>>  // source → dot-path in sourcedMetadata[source]
> = {
  title:         { comicvine: "name", metron: "name", comicinfo: "Title", locg: "name" },
  series:        { comicvine: "volumeInformation.name", comicinfo: "Series" },
  issueNumber:   { comicvine: "issue_number", metron: "number", comicinfo: "Number" },
  publisher:     { comicvine: "volumeInformation.publisher.name", locg: "publisher", comicinfo: "Publisher" },
  coverDate:     { comicvine: "cover_date", metron: "cover_date", comicinfo: "CoverDate" },
  description:   { comicvine: "description", locg: "description", comicinfo: "Summary" },
  pageCount:     { comicinfo: "PageCount", metron: "page_count" },
  ageRating:     { comicinfo: "AgeRating", metron: "rating.name" },
  format:        { metron: "series.series_type.name", comicinfo: "Format" },
  // creators → array field, handled separately
  storyArcs:     { comicvine: "story_arc_credits", metron: "arcs", comicinfo: "StoryArc" },
  characters:    { comicvine: "character_credits", metron: "characters", comicinfo: "Characters" },
  teams:         { comicvine: "team_credits", metron: "teams", comicinfo: "Teams" },
  locations:     { comicvine: "location_credits", metron: "locations", comicinfo: "Locations" },
  genres:        { metron: "series.genres", comicinfo: "Genre" },
  tags:          { comicinfo: "Tags" },
  communityRating: { locg: "rating" },
  coverImage:    { comicvine: "image.original_url", locg: "cover", metron: "image" },
  // Shortboxed, Marvel, DC — paths TBD when integrations are built
  // shortboxed: {},  marvel: {},  dc: {}
};

Add extractAllSourceValues(field, sourcedMetadata) — returns SourceFieldValue[] for every source that has a non-null value for the given field.
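As a rough illustration of how extractAllSourceValues might walk the path table, here is a minimal sketch. It assumes that getNestedValue resolves a dot-path and returns undefined on a miss, and that sources with a null or undefined value are skipped. The trimmed two-field table, the local getNestedValueSketch, and the simplified value shape (source and value only) are stand-ins for illustration, not the project's actual code.

```typescript
// Illustrative stand-ins; the real enum and types live in the project's models.
type SourceName = "comicvine" | "metron" | "locg" | "comicinfo";

interface SourceFieldValueSketch {
  source: SourceName;
  value: unknown;
}

// Trimmed copy of the mapping: two canonical fields only.
const PATHS_SKETCH: Record<string, Partial<Record<SourceName, string>>> = {
  title: { comicvine: "name", metron: "name", comicinfo: "Title", locg: "name" },
  publisher: { comicvine: "volumeInformation.publisher.name", locg: "publisher" },
};

// Hypothetical local version of getNestedValue: walks a dot-path,
// returning undefined as soon as a segment is missing.
function getNestedValueSketch(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Returns one entry per source that has a non-null value for the field.
function extractAllSourceValuesSketch(
  field: string,
  sourcedMetadata: Partial<Record<SourceName, unknown>>,
): SourceFieldValueSketch[] {
  const paths = PATHS_SKETCH[field] ?? {};
  const results: SourceFieldValueSketch[] = [];
  for (const [source, path] of Object.entries(paths) as [SourceName, string][]) {
    const value = getNestedValueSketch(sourcedMetadata[source], path);
    if (value !== undefined && value !== null) {
      results.push({ source, value });
    }
  }
  return results;
}
```

Called with "title" and { comicvine: { name: "A" }, metron: { name: "B" } }, this sketch returns two entries, one per source, and an unknown field or an empty sourcedMetadata yields an empty array.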

Update buildCanonicalMetadata() to use SOURCE_FIELD_PATHS instead of the hard-coded 7-field mapping. This single source of truth drives both auto-resolve and the comparison view.

Changes to models/graphql/resolvers.ts

getMetadataComparisonView resolver:

  • Fetch comic by ID
  • For each key in SOURCE_FIELD_PATHS, call extractAllSourceValues()
  • Return the comparison array with hasConflict flag
  • Include currentCanonical from comic.canonicalMetadata[field] if it exists
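The hasConflict flag in the steps above could be derived along these lines. This is a sketch under one stated assumption: values are compared by their JSON serialisation, so the string "1" and the number 1 count as different, and two objects with the same keys in a different order also count as different. The plan does not specify a comparison rule, so the real resolver may normalise values first.

```typescript
// Simplified stand-in for SourceFieldValue (source and value only).
interface SourcedValueSketch {
  source: string;
  value: unknown;
}

// True when at least two sources report different values.
// Assumption: equality is decided by JSON serialisation.
function hasConflictSketch(values: SourcedValueSketch[]): boolean {
  const distinct = new Set(values.map((v) => JSON.stringify(v.value)));
  return distinct.size > 1;
}
```

With zero or one sourced value the function returns false, which matches the idea that a conflict needs at least two disagreeing sources.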

pickFieldFromSource resolver:

  • Fetch comic, validate source has a value for the field
  • Extract value + provenance from sourcedMetadata[source] via SOURCE_FIELD_PATHS
  • Write to canonicalMetadata[field] with original source provenance + userOverride: true
  • Save and return comic

batchPickFieldsFromSources resolver:

  • Same as above but iterate over picks[], do a single comic.save()
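The pick logic shared by both resolvers can be sketched as a pure function over an in-memory comic, leaving the fetch and the single comic.save() to the caller. Everything named here (ComicSketch, FLAT_PATHS, applyPicksSketch) is hypothetical, the path table uses top-level keys only for brevity, and the provenance object is reduced to its source. The sketch validates every pick before building the result, so one bad pick leaves the comic untouched; whether the real batch resolver should be all-or-nothing is a design choice the plan leaves open.

```typescript
interface CanonicalFieldSketch {
  value: unknown;
  provenance: { source: string };
  userOverride: boolean;
}

interface ComicSketch {
  sourcedMetadata: Record<string, Record<string, unknown>>;
  canonicalMetadata: Record<string, CanonicalFieldSketch>;
}

interface PickSketch {
  field: string;
  source: string;
}

// Trimmed, flat stand-in for SOURCE_FIELD_PATHS (no dot-paths).
const FLAT_PATHS: Record<string, Record<string, string>> = {
  title: { comicvine: "name", metron: "name" },
  issueNumber: { comicvine: "issue_number", metron: "number" },
};

// Applies every pick and returns a new comic object; throws if any
// picked source has no value for its field.
function applyPicksSketch(comic: ComicSketch, picks: PickSketch[]): ComicSketch {
  const updates: Record<string, CanonicalFieldSketch> = {};
  for (const { field, source } of picks) {
    const key = FLAT_PATHS[field]?.[source];
    const value = key === undefined ? undefined : comic.sourcedMetadata[source]?.[key];
    if (value === undefined || value === null) {
      throw new Error(`Source "${source}" has no value for field "${field}"`);
    }
    // Keep the original source as provenance and lock the field.
    updates[field] = { value, provenance: { source }, userOverride: true };
  }
  return { ...comic, canonicalMetadata: { ...comic.canonicalMetadata, ...updates } };
}
```

In this shape, pickFieldFromSource is the one-element case of batchPickFieldsFromSources, which keeps the two resolvers from drifting apart.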

Changes to services/library.service.ts

Add Moleculer actions that delegate to GraphQL:

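getMetadataComparisonView: {
  rest: "POST /getMetadataComparisonView",
  // Delegates to the GraphQL query via the helper in utils/import.graphql.utils.ts
  async handler(ctx) {
    return getMetadataComparisonViewViaGraphQL(this.broker, ctx.params.comicId);
  },
},
pickFieldFromSource: {
  rest: "POST /pickFieldFromSource",
  // Delegates to the GraphQL mutation; assumes params carry comicId, field, source
  async handler(ctx) {
    const { comicId, field, source } = ctx.params;
    return pickFieldFromSourceViaGraphQL(this.broker, comicId, field, source);
  },
},
batchPickFieldsFromSources: {
  rest: "POST /batchPickFieldsFromSources",
  // Delegates to the GraphQL mutation; assumes params carry comicId and picks
  async handler(ctx) {
    const { comicId, picks } = ctx.params;
    return batchPickFieldsFromSourcesViaGraphQL(this.broker, comicId, picks);
  },
},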

Changes to utils/import.graphql.utils.ts

Add three helper functions mirroring the pattern of existing utils:

  • getMetadataComparisonViewViaGraphQL(broker, comicId)
  • pickFieldFromSourceViaGraphQL(broker, comicId, field, source)
  • batchPickFieldsFromSourcesViaGraphQL(broker, comicId, picks)
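One of these helpers might look roughly like the sketch below. Several details are assumptions rather than facts about the codebase: BrokerLike is a minimal structural type covering only the call method used here, the graphql.graphql action is assumed to resolve to an object with optional data and errors, and the id selection on Comic is a guess at the schema. The existing helpers in the same file are the better template for the real implementation.

```typescript
// Minimal structural type for the one broker method this sketch needs (assumption).
interface BrokerLike {
  call(action: string, params: Record<string, unknown>): Promise<unknown>;
}

// Assumed response shape of the graphql.graphql action.
interface GraphQLResultSketch {
  data?: Record<string, unknown> | null;
  errors?: { message: string }[];
}

// The `id` selection is an assumption about the Comic type.
const PICK_FIELD_MUTATION = `
  mutation PickFieldFromSource($comicId: ID!, $field: String!, $source: MetadataSource!) {
    pickFieldFromSource(comicId: $comicId, field: $field, source: $source) {
      id
    }
  }
`;

async function pickFieldFromSourceViaGraphQLSketch(
  broker: BrokerLike,
  comicId: string,
  field: string,
  source: string,
): Promise<unknown> {
  const result = (await broker.call("graphql.graphql", {
    query: PICK_FIELD_MUTATION,
    variables: { comicId, field, source },
  })) as GraphQLResultSketch;

  // Propagate GraphQL errors to the caller instead of returning partial data.
  if (result.errors && result.errors.length > 0) {
    throw new Error(result.errors.map((e) => e.message).join("; "));
  }
  return result.data?.pickFieldFromSource;
}
```

Because the broker is passed in as a parameter, a spec can hand the helper a plain object with a fake call method and assert on the action name and variables without starting Moleculer.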

Architectural Guidance: GraphQL vs REST

The project has two distinct patterns — use the right one:

| Type of operation | Pattern |
| --- | --- |
| Complex metadata logic (resolution, provenance, conflict analysis) | GraphQL mutation/query in typedef.ts + resolvers.ts |
| User-facing operation the UI calls | REST action in library.service.ts → delegates to GraphQL via broker.call("graphql.graphql", {...}) |
| Pure acquisition tracking (no resolution) | Direct DB write in library.service.ts, no GraphQL needed |

All three new reconciliation operations (getMetadataComparisonView, pickFieldFromSource, batchPickFieldsFromSources) follow the first two rows: GraphQL for the logic + REST wrapper for UI consumption.

Gap: applyComicVineMetadata bypasses canonicalMetadata

Currently library.applyComicVineMetadata writes directly to sourcedMetadata.comicvine in MongoDB without triggering buildCanonicalMetadata. This means canonicalMetadata goes stale when ComicVine data is applied.

The fix: change applyComicVineMetadata to call the existing updateSourcedMetadata GraphQL mutation instead of the direct DB write. updateSourcedMetadata already triggers re-resolution via autoMerge.onMetadataUpdate.

File: services/library.service.ts, lines ~937-990 (applyComicVineMetadata handler)

Change: Replace the direct Comic.findByIdAndUpdate with broker.call("graphql.graphql", { query: updateSourcedMetadataMutation, ... })


Phase 2: Source Ranking + AutoResolve (design — not implementing yet)

The infrastructure already exists:

  • UserPreferences.sourcePriorities[] with per-source priority (1=highest)
  • conflictResolution strategy enum (PRIORITY, CONFIDENCE, RECENCY, HYBRID, MANUAL)
  • autoMerge.enabled / onImport / onMetadataUpdate
  • updateUserPreferences resolver

When this phase is implemented, the additions will be:

  1. A "re-resolve all comics" action triggered when source priorities change (POST /reResolveAllWithPreferences)
  2. autoResolveMetadata in graphql.service.ts wired to call resolveMetadata on save rather than only on import/update hooks
  3. Field-specific source overrides UI (the fieldOverrides Map in SourcePrioritySchema is already modeled)
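For orientation, the PRIORITY strategy could reduce to something like the following once Phase 2 is built. This is only a sketch: the type and function names are hypothetical, the priority entries are reduced to source and priority, and sources missing from the priority list are assumed to rank last rather than being excluded. The CONFIDENCE, RECENCY, and HYBRID strategies are not covered.

```typescript
interface SourcePrioritySketch {
  source: string;
  priority: number; // 1 = highest
}

interface CandidateSketch {
  source: string;
  value: unknown;
}

// Picks the candidate whose source has the lowest priority number.
// Assumption: sources absent from the priority list rank after all listed ones.
function pickByPrioritySketch(
  candidates: CandidateSketch[],
  priorities: SourcePrioritySketch[],
): CandidateSketch | undefined {
  const rank = new Map<string, number>(
    priorities.map((p): [string, number] => [p.source, p.priority]),
  );
  let best: CandidateSketch | undefined;
  let bestRank = Infinity;
  for (const candidate of candidates) {
    const r = rank.get(candidate.source) ?? Number.MAX_SAFE_INTEGER;
    if (r < bestRank) {
      best = candidate;
      bestRank = r;
    }
  }
  return best;
}
```

A "re-resolve all comics" action would run a function of this kind per field for every comic, skipping any field where userOverride is true.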

TDD Approach

Each step follows Red → Green → Refactor:

  1. Write failing spec(s) for the unit being built
  2. Implement the minimum code to make them pass
  3. Refactor if needed

  • Test framework: Jest + ts-jest (configured in package.json; zero existing tests, so these will be the first)
  • File convention: *.spec.ts alongside the source file (e.g., utils/metadata.resolution.utils.spec.ts)
  • No DB needed for unit tests: mock Comic.findById etc. with jest.spyOn / jest.mock


Implementation Order

Step 1 — Utility layer (prerequisite for everything)

Write first: utils/metadata.resolution.utils.spec.ts

  • SOURCE_FIELD_PATHS has entries for all canonical fields
  • extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } }) returns 2 entries with correct source + value
  • extractAllSourceValues returns empty array when no source has the field
  • buildCanonicalMetadata() covers all fields in SOURCE_FIELD_PATHS (not just 7)
  • buildCanonicalMetadata() never overwrites fields with userOverride: true

Then implement:

  • models/comic.model.ts — add SHORTBOXED, MARVEL, DC to MetadataSource enum; add 3 new sourcedMetadata fields
  • models/userpreferences.model.ts — add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default sourcePriorities
  • utils/metadata.resolution.utils.ts — add SOURCE_FIELD_PATHS, extractAllSourceValues(), rewrite buildCanonicalMetadata()

Step 2 — GraphQL schema (no tests — type definitions only)

models/graphql/typedef.ts

  • Expand MetadataSource enum (add SHORTBOXED, MARVEL, DC)
  • Add SourceFieldValue, MetadataFieldComparison, MetadataComparisonView, FieldSourcePick types
  • Add getMetadataComparisonView to Query
  • Add pickFieldFromSource, batchPickFieldsFromSources to Mutation

Step 3 — GraphQL resolvers

Write first: models/graphql/resolvers.spec.ts

  • getMetadataComparisonView: returns one entry per field in SOURCE_FIELD_PATHS; hasConflict true when sources disagree; currentCanonical reflects DB state
  • pickFieldFromSource: sets field with source provenance + userOverride: true; throws when source has no value
  • batchPickFieldsFromSources: applies all picks in a single save
  • applyComicVineMetadata fix: calls updateSourcedMetadata mutation (not direct DB write)

Then implement: models/graphql/resolvers.ts

Step 4 — GraphQL util helpers

Write first: utils/import.graphql.utils.spec.ts

  • Each helper calls broker.call("graphql.graphql", ...) with correct query/variables
  • GraphQL errors are propagated

Then implement: utils/import.graphql.utils.ts

Step 5 — REST surface

Write first: services/library.service.spec.ts

  • Each action delegates to the correct GraphQL util helper
  • Context params pass through correctly

Then implement: services/library.service.ts


Critical Files

| File | Step | Change |
| --- | --- | --- |
| models/comic.model.ts | 1 | Add SHORTBOXED, MARVEL, DC to MetadataSource enum; add 3 new sourcedMetadata fields |
| models/userpreferences.model.ts | 1 | Add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default sourcePriorities |
| utils/metadata.resolution.utils.ts | 1 | Add SOURCE_FIELD_PATHS, extractAllSourceValues(); rewrite buildCanonicalMetadata() |
| models/graphql/typedef.ts | 2 | Expand MetadataSource enum; add 4 new types + query + 2 mutations |
| models/graphql/resolvers.ts | 3 | Implement 3 resolvers + fix applyComicVineMetadata |
| utils/import.graphql.utils.ts | 4 | Add 3 GraphQL util functions |
| services/library.service.ts | 5 | Add 3 Moleculer REST actions |

Reusable Existing Code

  • resolveMetadataField() in utils/metadata.resolution.utils.ts — reused inside buildCanonicalMetadata()
  • getNestedValue() in same file — reused in extractAllSourceValues()
  • convertPreferences() in models/graphql/resolvers.ts — reused in getMetadataComparisonView
  • autoResolveMetadata() in services/graphql.service.ts — called after pickFieldFromSource if autoMerge.onMetadataUpdate is true

Verification

  1. Unit: extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } }) → 2 entries with correct provenance
  2. GraphQL: getMetadataComparisonView(comicId) on a comic with comicvine + comicInfo data → all fields populated
  3. Cherry-pick: pickFieldFromSource(comicId, "title", COMICVINE) → canonicalMetadata.title.provenance.source == "comicvine" and userOverride == true
  4. Batch: batchPickFieldsFromSources with 3 fields → single DB write, all 3 updated
  5. Lock: After cherry-picking, resolveMetadata(comicId) must NOT overwrite picked fields (userOverride: true takes priority)
  6. REST: POST /api/library/getMetadataComparisonView returns expected JSON
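The lock rule in check 5 can be pictured as a merge step that skips any field already marked userOverride: true. The sketch below is an assumption about how buildCanonicalMetadata() could honour the flag, with hypothetical names and a provenance object reduced to its source. It is not the existing implementation.

```typescript
interface LockedFieldSketch {
  value: unknown;
  provenance: { source: string };
  userOverride: boolean;
}

type CanonicalSketch = Record<string, LockedFieldSketch>;

// Merges freshly resolved fields into the existing canonical metadata,
// leaving fields with userOverride: true exactly as they are.
function mergeResolvedSketch(
  existing: CanonicalSketch,
  resolved: CanonicalSketch,
): CanonicalSketch {
  const merged: CanonicalSketch = { ...existing };
  for (const [field, next] of Object.entries(resolved)) {
    if (existing[field]?.userOverride) continue; // user pick wins
    merged[field] = next;
  }
  return merged;
}
```

A spec for this check would cherry-pick a field, run resolution with a different winning source, and assert that the picked value and provenance are unchanged while unlocked fields still update.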