📖 Metadata field mapping of popular sources
Some checks failed
Docker Image CI / build (push) Has been cancelled

This commit is contained in:
2026-04-03 18:15:49 -04:00
parent bfaf7bb664
commit bf7e57a274
2 changed files with 425 additions and 0 deletions

View File

@@ -0,0 +1,95 @@
# Metadata Field Mappings
Maps each canonical field to the dot-path where its value lives inside `sourcedMetadata.<source>`.
Used by `SOURCE_FIELD_PATHS` in `utils/metadata.resolution.utils.ts` and drives both the auto-resolution algorithm and the cherry-pick comparison view.
Dot-notation paths are relative to `sourcedMetadata.<source>`
(e.g. `volumeInformation.name``comic.sourcedMetadata.comicvine.volumeInformation.name`).
---
## Source API Notes
| Source | API | Auth | Notes |
|---|---|---|---|
| ComicVine | `https://comicvine.gamespot.com/api/` | Free API key | Covers Marvel, DC, and independents |
| Metron | `https://metron.cloud/api/` | Free account | Modern community DB, growing |
| GCD | `https://www.comics.org/api/` | None | Creators/characters live inside `story_set[]`, not top-level |
| LOCG | `https://leagueofcomicgeeks.com` | No public API | Scraped or partner access |
| ComicInfo.xml | Embedded in archive | N/A | ComicRack standard |
| Shortboxed | `https://api.shortboxed.com` | Partner key | Release-focused; limited metadata |
| Marvel | `https://gateway.marvel.com/v1/public/` | API key | Official Marvel API |
| DC | No official public API | — | Use ComicVine for DC issues |
---
## Scalar Fields
| Canonical Field | ComicVine | Metron | GCD | LOCG | ComicInfo.xml | Shortboxed | Marvel |
|---|---|---|---|---|---|---|---|
| `title` | `name` | `name` | `title` | `name` | `Title` | `title` | `title` |
| `series` | `volumeInformation.name` | `series.name` | `series_name` | TBD | `Series` | TBD | `series.name` |
| `issueNumber` | `issue_number` | `number` | `number` | TBD | `Number` | TBD | `issueNumber` |
| `volume` | `volume.name` | `series.volume` | `volume` | TBD | `Volume` | TBD | TBD |
| `publisher` | `volumeInformation.publisher.name` | `publisher.name` | `indicia_publisher` | `publisher` | `Publisher` | `publisher` | `"Marvel"` (static) |
| `imprint` | TBD | `imprint.name` | `brand_emblem` | TBD | `Imprint` | TBD | TBD |
| `coverDate` | `cover_date` | `cover_date` | `key_date` | TBD | `CoverDate` | TBD | `dates[onsaleDate].date` |
| `publicationDate` | `store_date` | `store_date` | `on_sale_date` | TBD | TBD | `release_date` | `dates[focDate].date` |
| `description` | `description` | `desc` | `story_set[0].synopsis` | `description` | `Summary` | `description` | `description` |
| `notes` | TBD | TBD | `notes` | TBD | `Notes` | TBD | TBD |
| `pageCount` | TBD | `page_count` | `page_count` | TBD | `PageCount` | TBD | `pageCount` |
| `ageRating` | TBD | `rating.name` | `rating` | TBD | `AgeRating` | TBD | TBD |
| `format` | TBD | `series.series_type.name` | `story_set[0].type` | TBD | `Format` | TBD | `format` |
| `communityRating` | TBD | TBD | TBD | `rating` | TBD | TBD | TBD |
| `coverImage` | `image.original_url` | `image` | `cover` | `cover` | TBD | TBD | `thumbnail.path + "." + thumbnail.extension` |
---
## Array / Nested Fields
GCD creator credits live as free-text strings inside `story_set[0]` (e.g. `"script": "Grant Morrison, Peter Milligan"`), not as structured arrays. These need to be split on commas during mapping.
| Canonical Field | ComicVine | Metron | GCD | LOCG | ComicInfo.xml | Shortboxed | Marvel |
|---|---|---|---|---|---|---|---|
| `creators` (writer) | `person_credits[role=writer]` | `credits[role=writer]` | `story_set[0].script` | TBD | `Writer` | `creators` | `creators.items[role=writer]` |
| `creators` (penciller) | `person_credits[role=penciller]` | `credits[role=penciller]` | `story_set[0].pencils` | TBD | `Penciller` | TBD | `creators.items[role=penciler]` |
| `creators` (inker) | `person_credits[role=inker]` | `credits[role=inker]` | `story_set[0].inks` | TBD | `Inker` | TBD | `creators.items[role=inker]` |
| `creators` (colorist) | `person_credits[role=colorist]` | `credits[role=colorist]` | `story_set[0].colors` | TBD | `Colorist` | TBD | `creators.items[role=colorist]` |
| `creators` (letterer) | `person_credits[role=letterer]` | `credits[role=letterer]` | `story_set[0].letters` | TBD | `Letterer` | TBD | `creators.items[role=letterer]` |
| `creators` (editor) | `person_credits[role=editor]` | `credits[role=editor]` | `story_set[0].editing` | TBD | `Editor` | TBD | `creators.items[role=editor]` |
| `characters` | `character_credits` | `characters` | `story_set[0].characters` | TBD | `Characters` | TBD | `characters.items` |
| `teams` | `team_credits` | `teams` | TBD | TBD | `Teams` | TBD | TBD |
| `locations` | `location_credits` | `locations` | TBD | TBD | `Locations` | TBD | TBD |
| `storyArcs` | `story_arc_credits` | `arcs` | TBD | TBD | `StoryArc` | TBD | `events.items` |
| `stories` | TBD | TBD | `story_set[].title` | TBD | TBD | TBD | `stories.items` |
| `genres` | TBD | `series.genres` | `story_set[0].genre` | TBD | `Genre` | TBD | TBD |
| `tags` | TBD | TBD | `story_set[0].keywords` | TBD | `Tags` | TBD | TBD |
| `universes` | TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| `reprints` | TBD | `reprints` | TBD | TBD | TBD | TBD | TBD |
| `urls` | `site_detail_url` | `resource_url` | `api_url` | `url` | TBD | TBD | `urls[type=detail].url` |
| `prices` | `price` | TBD | `price` | `price` | `Price` | `price` | `prices[type=printPrice].price` |
| `externalIDs` | `id` | `id` | `api_url` | TBD | TBD | `diamond_id` | `id` |
---
## Identifiers / GTINs
| Canonical Field | ComicVine | Metron | GCD | LOCG | ComicInfo.xml | Shortboxed | Marvel |
|---|---|---|---|---|---|---|---|
| `gtin.isbn` | TBD | TBD | `isbn` | TBD | TBD | TBD | `isbn` |
| `gtin.upc` | TBD | TBD | `barcode` | TBD | TBD | TBD | `upc` |
---
## Special Mapping Notes
- **DC Comics**: No official public API. DC issue metadata is sourced via **ComicVine** (which has comprehensive DC coverage). There is no separate `dc` source key.
- **GCD creators**: All credit fields (`script`, `pencils`, `inks`, `colors`, `letters`, `editing`) are comma-separated strings inside `story_set[0]`. Mapping code must split these into individual creator objects with roles assigned.
- **GCD characters/genre/keywords**: Also inside `story_set[0]`, not top-level on the issue.
- **Marvel publisher**: Always "Marvel Comics" — can be set as a static value rather than extracted from the API response.
- **Marvel cover image**: Constructed by concatenating `thumbnail.path + "." + thumbnail.extension`.
- **Marvel dates**: Multiple date types in a `dates[]` array — filter by `type == "onsaleDate"` for cover date, `type == "focDate"` for FOC/publication date.
- **Marvel creators/characters**: Nested inside collection objects (`creators.items[]`, `characters.items[]`) with `name` and `role` sub-fields.
- **Shortboxed**: Release-focused service; limited metadata. Best used for `publicationDate`, `price`, and `publisher` only. No series/issue number fields.
- **LOCG**: No public API; fields marked TBD will need to be confirmed when integration is built.

View File

@@ -0,0 +1,330 @@
# Metadata Reconciliation System Plan
## Context
Comics in the library can have metadata from multiple sources: ComicVine, Metron, GCD, LOCG, ComicInfo.xml, Shortboxed, Marvel, DC, and Manual. The existing `canonicalMetadata` + `sourcedMetadata` architecture already stores raw per-source data and has a resolution algorithm, but there's no way for a user to interactively compare and cherry-pick values across sources field-by-field. This plan adds that manual reconciliation workflow (Phase 1) and lays the groundwork for ranked auto-resolution (Phase 2).
---
## Current State (what already exists)
- `sourcedMetadata.{comicvine,metron,gcd,locg,comicInfo}` — raw per-source data (Mongoose Mixed) — **Shortboxed, Marvel, DC not yet added**
- `canonicalMetadata` — resolved truth, each field is `{ value, provenance, userOverride }`
- `analyzeMetadataConflicts(comicId)` GraphQL query — conflict view for 5 fields only
- `setMetadataField(comicId, field, value)` — stores MANUAL override with raw string
- `resolveMetadata(comicId)` / `bulkResolveMetadata(comicIds)` — trigger auto-resolution
- `previewCanonicalMetadata(comicId, preferences)` — dry run
- `buildCanonicalMetadata()` in `utils/metadata.resolution.utils.ts` — covers only 7 fields
- `UserPreferences` model with `sourcePriorities`, `conflictResolution`, `autoMerge`
- `updateUserPreferences` resolver — fully implemented
- `autoResolveMetadata()` in `services/graphql.service.ts` — exists but only for scalar triggers
---
## Phase 1: Manual Cherry-Pick Reconciliation
### Goal
For any comic, a user can open a comparison table: each row is a canonical field, each column is a source. They click a cell to "pick" that source's value for that field. The result is stored as `canonicalMetadata.<field>` with the original source's provenance intact and `userOverride: true` to prevent future auto-resolution from overwriting it.
### Expand `MetadataSource` enum (`models/comic.model.ts` + `models/graphql/typedef.ts`)
Add new sources to the enum:
```ts
enum MetadataSource {
COMICVINE = "comicvine",
METRON = "metron",
GRAND_COMICS_DATABASE = "gcd",
LOCG = "locg",
COMICINFO_XML = "comicinfo",
SHORTBOXED = "shortboxed",
MARVEL = "marvel",
DC = "dc",
MANUAL = "manual",
}
```
Also add to `sourcedMetadata` in `ComicSchema` (`models/comic.model.ts`):
```ts
shortboxed: { type: mongoose.Schema.Types.Mixed, default: {} },
marvel: { type: mongoose.Schema.Types.Mixed, default: {} },
dc: { type: mongoose.Schema.Types.Mixed, default: {} },
```
And in GraphQL schema enum:
```graphql
enum MetadataSource {
COMICVINE
METRON
GRAND_COMICS_DATABASE
LOCG
COMICINFO_XML
SHORTBOXED
MARVEL
DC
MANUAL
}
```
> **Note:** Shortboxed, Marvel, and DC field paths in `SOURCE_FIELD_PATHS` will be stubs (`{}`) until those integrations are built. The comparison view will simply show no data for those sources until then — no breaking changes.
---
### New types (GraphQL — `models/graphql/typedef.ts`)
```graphql
# One source's value for a single field
type SourceFieldValue {
source: MetadataSource!
value: JSON # null if source has no value for this field
confidence: Float
fetchedAt: String
url: String
}
# All sources' values for a single canonical field
type MetadataFieldComparison {
field: String!
currentCanonical: MetadataField # what is currently resolved
sourcedValues: [SourceFieldValue!]! # one entry per source that has data
hasConflict: Boolean! # true if >1 source has a different value
}
type MetadataComparisonView {
comicId: ID!
comparisons: [MetadataFieldComparison!]!
}
```
Add to `Query`:
```graphql
getMetadataComparisonView(comicId: ID!): MetadataComparisonView!
```
Add to `Mutation`:
```graphql
# Cherry-pick a single field from a named source
pickFieldFromSource(comicId: ID!, field: String!, source: MetadataSource!): Comic!
# Batch cherry-pick multiple fields at once
batchPickFieldsFromSources(
comicId: ID!
picks: [FieldSourcePick!]!
): Comic!
input FieldSourcePick {
field: String!
source: MetadataSource!
}
```
### Changes to `utils/metadata.resolution.utils.ts`
Add `SOURCE_FIELD_PATHS` — a complete mapping of every canonical field to its path in each sourced-metadata blob:
```ts
export const SOURCE_FIELD_PATHS: Record<
string, // canonical field name
Partial<Record<MetadataSource, string>> // source → dot-path in sourcedMetadata[source]
> = {
title: { comicvine: "name", metron: "name", comicinfo: "Title", locg: "name" },
series: { comicvine: "volumeInformation.name", comicinfo: "Series" },
issueNumber: { comicvine: "issue_number", metron: "number", comicinfo: "Number" },
publisher: { comicvine: "volumeInformation.publisher.name", locg: "publisher", comicinfo: "Publisher" },
coverDate: { comicvine: "cover_date", metron: "cover_date", comicinfo: "CoverDate" },
description: { comicvine: "description", locg: "description", comicinfo: "Summary" },
pageCount: { comicinfo: "PageCount", metron: "page_count" },
ageRating: { comicinfo: "AgeRating", metron: "rating.name" },
format: { metron: "series.series_type.name", comicinfo: "Format" },
// creators → array field, handled separately
storyArcs: { comicvine: "story_arc_credits", metron: "arcs", comicinfo: "StoryArc" },
characters: { comicvine: "character_credits", metron: "characters", comicinfo: "Characters" },
teams: { comicvine: "team_credits", metron: "teams", comicinfo: "Teams" },
locations: { comicvine: "location_credits", metron: "locations", comicinfo: "Locations" },
genres: { metron: "series.genres", comicinfo: "Genre" },
tags: { comicinfo: "Tags" },
communityRating: { locg: "rating" },
coverImage: { comicvine: "image.original_url", locg: "cover", metron: "image" },
// Shortboxed, Marvel, DC — paths TBD when integrations are built
// shortboxed: {}, marvel: {}, dc: {}
};
```
Add `extractAllSourceValues(field, sourcedMetadata)` — returns `SourceFieldValue[]` for every source that has a non-null value for the given field.
Update `buildCanonicalMetadata()` to use `SOURCE_FIELD_PATHS` instead of the hard-coded 7-field mapping. This single source of truth drives both auto-resolve and the comparison view.
### Changes to `models/graphql/resolvers.ts`
**`getMetadataComparisonView` resolver:**
- Fetch comic by ID
- For each key in `SOURCE_FIELD_PATHS`, call `extractAllSourceValues()`
- Return the comparison array with `hasConflict` flag
- Include `currentCanonical` from `comic.canonicalMetadata[field]` if it exists
**`pickFieldFromSource` resolver:**
- Fetch comic, validate source has a value for the field
- Extract value + provenance from `sourcedMetadata[source]` via `SOURCE_FIELD_PATHS`
- Write to `canonicalMetadata[field]` with original source provenance + `userOverride: true`
- Save and return comic
**`batchPickFieldsFromSources` resolver:**
- Same as above but iterate over `picks[]`, do a single `comic.save()`
### Changes to `services/library.service.ts`
Add Moleculer actions that delegate to GraphQL:
```ts
getMetadataComparisonView: {
rest: "POST /getMetadataComparisonView",
async handler(ctx) { /* call GraphQL query */ }
},
pickFieldFromSource: {
rest: "POST /pickFieldFromSource",
async handler(ctx) { /* call GraphQL mutation */ }
},
batchPickFieldsFromSources: {
rest: "POST /batchPickFieldsFromSources",
async handler(ctx) { /* call GraphQL mutation */ }
},
```
### Changes to `utils/import.graphql.utils.ts`
Add three helper functions mirroring the pattern of existing utils:
- `getMetadataComparisonViewViaGraphQL(broker, comicId)`
- `pickFieldFromSourceViaGraphQL(broker, comicId, field, source)`
- `batchPickFieldsFromSourcesViaGraphQL(broker, comicId, picks)`
---
## Architectural Guidance: GraphQL vs REST
The project has two distinct patterns — use the right one:
| Type of operation | Pattern |
|---|---|
| Complex metadata logic (resolution, provenance, conflict analysis) | **GraphQL mutation/query** in `typedef.ts` + `resolvers.ts` |
| User-facing operation the UI calls | **REST action** in `library.service.ts` → delegates to GraphQL via `broker.call("graphql.graphql", {...})` |
| Pure acquisition tracking (no resolution) | Direct DB write in `library.service.ts`, no GraphQL needed |
**All three new reconciliation operations** (`getMetadataComparisonView`, `pickFieldFromSource`, `batchPickFieldsFromSources`) follow the first two rows: GraphQL for the logic + REST wrapper for UI consumption.
### Gap: `applyComicVineMetadata` bypasses canonicalMetadata
Currently `library.applyComicVineMetadata` writes directly to `sourcedMetadata.comicvine` in MongoDB without triggering `buildCanonicalMetadata`. This means `canonicalMetadata` goes stale when ComicVine data is applied.
The fix: change `applyComicVineMetadata` to call the existing `updateSourcedMetadata` GraphQL mutation instead of the direct DB write. `updateSourcedMetadata` already triggers re-resolution via `autoMerge.onMetadataUpdate`.
**File**: `services/library.service.ts` lines ~937990 (applyComicVineMetadata handler)
**Change**: Replace direct `Comic.findByIdAndUpdate` with `broker.call("graphql.graphql", { query: updateSourcedMetadataMutation, ... })`
---
## Phase 2: Source Ranking + AutoResolve (design — not implementing yet)
The infrastructure already exists:
- `UserPreferences.sourcePriorities[]` with per-source `priority` (1=highest)
- `conflictResolution` strategy enum (PRIORITY, CONFIDENCE, RECENCY, HYBRID, MANUAL)
- `autoMerge.enabled / onImport / onMetadataUpdate`
- `updateUserPreferences` resolver
When this phase is implemented, the additions will be:
1. A "re-resolve all comics" action triggered when source priorities change (`POST /reResolveAllWithPreferences`)
2. `autoResolveMetadata` in graphql.service.ts wired to call `resolveMetadata` on save rather than only on import/update hooks
3. Field-specific source overrides UI (the `fieldOverrides` Map in `SourcePrioritySchema` is already modeled)
---
## TDD Approach
Each step follows Red → Green → Refactor:
1. Write failing spec(s) for the unit being built
2. Implement the minimum code to make them pass
3. Refactor if needed
**Test framework:** Jest + ts-jest (configured in `package.json`, zero existing tests — these will be the first)
**File convention:** `*.spec.ts` alongside the source file (e.g., `utils/metadata.resolution.utils.spec.ts`)
**No DB needed for unit tests** — mock `Comic.findById` etc. with `jest.spyOn` / `jest.mock`
---
## Implementation Order
### Step 1 — Utility layer (prerequisite for everything)
**Write first:** `utils/metadata.resolution.utils.spec.ts`
- `SOURCE_FIELD_PATHS` has entries for all canonical fields
- `extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } })` returns 2 entries with correct source + value
- `extractAllSourceValues` returns empty array when no source has the field
- `buildCanonicalMetadata()` covers all fields in `SOURCE_FIELD_PATHS` (not just 7)
- `buildCanonicalMetadata()` never overwrites fields with `userOverride: true`
**Then implement:**
- `models/comic.model.ts` — add `SHORTBOXED`, `MARVEL`, `DC` to `MetadataSource` enum; add 3 new `sourcedMetadata` fields
- `models/userpreferences.model.ts` — add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default `sourcePriorities`
- `utils/metadata.resolution.utils.ts` — add `SOURCE_FIELD_PATHS`, `extractAllSourceValues()`, rewrite `buildCanonicalMetadata()`
### Step 2 — GraphQL schema (no tests — type definitions only)
**`models/graphql/typedef.ts`**
- Expand `MetadataSource` enum (add SHORTBOXED, MARVEL, DC)
- Add `SourceFieldValue`, `MetadataFieldComparison`, `MetadataComparisonView`, `FieldSourcePick` types
- Add `getMetadataComparisonView` to `Query`
- Add `pickFieldFromSource`, `batchPickFieldsFromSources` to `Mutation`
### Step 3 — GraphQL resolvers
**Write first:** `models/graphql/resolvers.spec.ts`
- `getMetadataComparisonView`: returns one entry per field in `SOURCE_FIELD_PATHS`; `hasConflict` true when sources disagree; `currentCanonical` reflects DB state
- `pickFieldFromSource`: sets field with source provenance + `userOverride: true`; throws when source has no value
- `batchPickFieldsFromSources`: applies all picks in a single save
- `applyComicVineMetadata` fix: calls `updateSourcedMetadata` mutation (not direct DB write)
**Then implement:** `models/graphql/resolvers.ts`
### Step 4 — GraphQL util helpers
**Write first:** `utils/import.graphql.utils.spec.ts`
- Each helper calls `broker.call("graphql.graphql", ...)` with correct query/variables
- GraphQL errors are propagated
**Then implement:** `utils/import.graphql.utils.ts`
### Step 5 — REST surface
**Write first:** `services/library.service.spec.ts`
- Each action delegates to the correct GraphQL util helper
- Context params pass through correctly
**Then implement:** `services/library.service.ts`
---
## Critical Files
| File | Step | Change |
|---|---|---|
| `models/comic.model.ts` | 1 | Add `SHORTBOXED`, `MARVEL`, `DC` to `MetadataSource` enum; add 3 new `sourcedMetadata` fields |
| `models/userpreferences.model.ts` | 1 | Add SHORTBOXED (priority 7), MARVEL (8), DC (9) to default `sourcePriorities` |
| `utils/metadata.resolution.utils.ts` | 1 | Add `SOURCE_FIELD_PATHS`, `extractAllSourceValues()`; rewrite `buildCanonicalMetadata()` |
| `models/graphql/typedef.ts` | 2 | Expand `MetadataSource` enum; add 4 new types + query + 2 mutations |
| `models/graphql/resolvers.ts` | 3 | Implement 3 resolvers + fix `applyComicVineMetadata` |
| `utils/import.graphql.utils.ts` | 4 | Add 3 GraphQL util functions |
| `services/library.service.ts` | 5 | Add 3 Moleculer REST actions |
---
## Reusable Existing Code
- `resolveMetadataField()` in `utils/metadata.resolution.utils.ts` — reused inside `buildCanonicalMetadata()`
- `getNestedValue()` in same file — reused in `extractAllSourceValues()`
- `convertPreferences()` in `models/graphql/resolvers.ts` — reused in `getMetadataComparisonView`
- `autoResolveMetadata()` in `services/graphql.service.ts` — called after `pickFieldFromSource` if `autoMerge.onMetadataUpdate` is true
---
## Verification
1. **Unit**: `extractAllSourceValues("title", { comicvine: { name: "A" }, metron: { name: "B" } })` → 2 entries with correct provenance
2. **GraphQL**: `getMetadataComparisonView(comicId)` on a comic with comicvine + comicInfo data → all fields populated
3. **Cherry-pick**: `pickFieldFromSource(comicId, "title", COMICVINE)``canonicalMetadata.title.provenance.source == "comicvine"` and `userOverride == true`
4. **Batch**: `batchPickFieldsFromSources` with 3 fields → single DB write, all 3 updated
5. **Lock**: After cherry-picking, `resolveMetadata(comicId)` must NOT overwrite picked fields (`userOverride: true` takes priority)
6. **REST**: `POST /api/library/getMetadataComparisonView` returns expected JSON