2025-10-29 12:25:05 -04:00

Canonical Comic Metadata Model - Implementation Guide

🎯 Overview

The canonical metadata model manages comic book metadata gathered from multiple sources. Each value keeps its provenance (which source supplied it, and when) and carries a confidence score. When sources disagree, the values go through conflict resolution.

🏗️ Architecture

Core Components:

📋 Type Definitions (models/canonical-comic.types.ts)
🎯 GraphQL Schema (models/graphql/canonical-typedef.ts)
🔧 Resolution Engine (utils/metadata-resolver.utils.ts)
💾 Database Model (models/canonical-comic.model.ts)
⚙️ Service Layer (services/canonical-metadata.service.ts)

📊 Metadata Sources & Ranking

Source Priority (highest to lowest; a lower rank number means higher priority):

enum MetadataSourceRank {
    USER_MANUAL = 1,        // User overrides - highest priority
    COMICINFO_XML = 2,      // Embedded metadata - high trust
    COMICVINE = 3,          // ComicVine API - authoritative
    METRON = 4,             // Metron API - authoritative  
    GCD = 5,                // Grand Comics Database - community
    LOCG = 6,               // League of Comic Geeks - specialized
    LOCAL_FILE = 7          // Filename inference - lowest trust
}
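A lower enum value means higher priority, so "highest rank" in this model refers to the smallest number. The sketch below orders candidate entries by priority. It assumes each entry carries the numeric `rank` shown in the Data Structure section. `RankedEntry` and `sortByPriority` are hypothetical names, not part of the service.

```typescript
// Mirrors the MetadataSourceRank enum above: a lower number means higher priority.
enum MetadataSourceRank {
  USER_MANUAL = 1,
  COMICINFO_XML = 2,
  COMICVINE = 3,
  METRON = 4,
  GCD = 5,
  LOCG = 6,
  LOCAL_FILE = 7,
}

// Hypothetical minimal entry: only the fields needed for ordering.
interface RankedEntry<T> {
  value: T;
  rank: MetadataSourceRank;
}

// Returns a copy sorted so the highest-priority (lowest rank number) entry comes first.
// Array.prototype.sort is stable since ES2019, so equal ranks keep their input order.
function sortByPriority<T>(entries: RankedEntry<T>[]): RankedEntry<T>[] {
  return entries.slice().sort((a, b) => a.rank - b.rank);
}

const ordered = sortByPriority([
  { value: "amazing spider-man 001", rank: MetadataSourceRank.LOCAL_FILE },
  { value: "Amazing Spider-Man #1", rank: MetadataSourceRank.COMICVINE },
  { value: "The Amazing Spider-Man #1", rank: MetadataSourceRank.USER_MANUAL },
]);
console.log(ordered[0].value); // The Amazing Spider-Man #1
```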

Confidence Scoring:

User Manual: 1.0 (100% trusted)
ComicInfo.xml: 0.8-0.95 (based on completeness)
ComicVine: 0.9 (highly reliable API)
Metron: 0.85 (reliable API)
GCD: 0.8 (community-maintained)
Local File: 0.3 (inference-based)
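The list gives fixed defaults for most sources but only a range for ComicInfo.xml. The sketch below assumes one possible rule for that range: scale linearly from 0.8 to 0.95 by the share of a hypothetical set of expected fields that are populated. The field list, `comicInfoConfidence`, and `defaultConfidence` are illustrative. LOCG is left out because no default is listed for it.

```typescript
type MetadataSource =
  | "USER_MANUAL" | "COMICINFO_XML" | "COMICVINE" | "METRON" | "GCD" | "LOCG" | "LOCAL_FILE";

// Fixed defaults from the list above; LOCG has no listed default, so it is omitted.
const DEFAULT_CONFIDENCE: Partial<Record<MetadataSource, number>> = {
  USER_MANUAL: 1.0,
  COMICVINE: 0.9,
  METRON: 0.85,
  GCD: 0.8,
  LOCAL_FILE: 0.3,
};

// Hypothetical set of ComicInfo.xml fields that count toward "completeness".
const COMPLETENESS_FIELDS = ["Series", "Number", "Year", "Month", "Writer", "Penciller", "Publisher", "Summary"];

// Assumption: scale linearly from 0.8 (nothing populated) to 0.95 (everything populated).
function comicInfoConfidence(xml: Record<string, unknown>): number {
  const populated = COMPLETENESS_FIELDS.filter((field) => {
    const v = xml[field];
    return v !== undefined && v !== null && String(v).trim() !== "";
  }).length;
  return 0.8 + 0.15 * (populated / COMPLETENESS_FIELDS.length);
}

function defaultConfidence(source: MetadataSource, xml?: Record<string, unknown>): number | undefined {
  if (source === "COMICINFO_XML") return xml ? comicInfoConfidence(xml) : 0.8;
  return DEFAULT_CONFIDENCE[source];
}
```

In the `xmlData` example from the usage section, 7 of these 8 fields are populated (there is no `Summary`), which gives roughly 0.93.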

🔄 Usage Examples

1. Import ComicVine Metadata

// REST API
POST /api/canonicalMetadata/importComicVine/60f7b1234567890abcdef123
{
  "comicVineData": {
    "id": 142857,
    "name": "Amazing Spider-Man #1",
    "issue_number": "1",
    "cover_date": "2023-01-01",
    "volume": {
      "id": 12345,
      "name": "Amazing Spider-Man",
      "start_year": 2023,
      "publisher": { "name": "Marvel Comics" }
    },
    "person_credits": [
      { "name": "Dan Slott", "role": "writer" }
    ]
  }
}

// Service usage (comicVineData holds the same payload as the REST body above)
const result = await broker.call('canonicalMetadata.importComicVineMetadata', {
  comicId: '60f7b1234567890abcdef123',
  comicVineData: comicVineData,
  forceUpdate: false
});
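Importing ComicVine data means turning one API payload into several provenance-tagged entries. The sketch below assumes the payload subset shown in the request above and the entry shape from the Data Structure section. `mapComicVine` is a hypothetical helper, not the service's actual mapper.

```typescript
// Hypothetical subset of the ComicVine issue payload used in the request above.
interface ComicVineIssue {
  id: number;
  name: string | null;
  issue_number: string;
  cover_date?: string;
  volume: { id: number; name: string; start_year?: number; publisher?: { name: string } };
  person_credits?: { name: string; role: string }[];
}

interface MetadataEntry<T> {
  value: T;
  source: "COMICVINE";
  confidence: number;
  rank: number;
  timestamp: string;
  sourceId: string;
}

// Builds one provenance-tagged entry per field, using ComicVine's rank (3) and default confidence (0.9).
function mapComicVine(issue: ComicVineIssue, now: Date = new Date()) {
  const entry = <T>(value: T): MetadataEntry<T> => ({
    value,
    source: "COMICVINE",
    confidence: 0.9,
    rank: 3,
    timestamp: now.toISOString(),
    sourceId: String(issue.id),
  });
  return {
    title: issue.name ? entry(issue.name) : undefined,
    issueNumber: entry(issue.issue_number),
    series: entry({
      name: issue.volume.name,
      startYear: issue.volume.start_year,
      publisher: issue.volume.publisher?.name,
    }),
    creators: entry((issue.person_credits ?? []).map((c) => ({ name: c.name, role: c.role }))),
  };
}
```

The ComicVine request uses lowercase roles ("writer"), while the ComicInfo.xml example uses "Writer". Any cross-source comparison of roles should therefore be case-insensitive.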

2. Import ComicInfo.xml

POST /api/canonicalMetadata/importComicInfo/60f7b1234567890abcdef123
{
  "xmlData": {
    "Title": "Amazing Spider-Man",
    "Series": "Amazing Spider-Man", 
    "Number": "1",
    "Year": 2023,
    "Month": 1,
    "Writer": "Dan Slott",
    "Penciller": "John Romita Jr",
    "Publisher": "Marvel Comics"
  }
}
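In the ComicInfo.xml schema, creator fields such as `Writer` and `Penciller` hold comma-separated name lists, so one field can yield several creators. The sketch below flattens them into the `{ name, role }` shape used in the Data Structure section. It assumes the parsed XML arrives as a plain object like the `xmlData` body above. `comicInfoCreators` is a hypothetical helper.

```typescript
// Creator fields from the ComicInfo.xml schema; each holds a comma-separated list of names.
const CREATOR_FIELDS = ["Writer", "Penciller", "Inker", "Colorist", "Letterer", "CoverArtist", "Editor"];

interface Creator {
  name: string;
  role: string;
}

// Splits every populated creator field into individual { name, role } records.
function comicInfoCreators(xml: Record<string, unknown>): Creator[] {
  const creators: Creator[] = [];
  for (const role of CREATOR_FIELDS) {
    const raw = xml[role];
    if (typeof raw !== "string") continue;
    for (const part of raw.split(",")) {
      const name = part.trim();
      if (name !== "") creators.push({ name, role });
    }
  }
  return creators;
}
```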

3. Set Manual Metadata (Highest Priority)

PUT /api/canonicalMetadata/manual/60f7b1234567890abcdef123/title
{
  "value": "The Amazing Spider-Man #1",
  "confidence": 1.0,
  "notes": "User corrected title formatting"
}
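The sketch below builds the manual-override request on the client side. It assumes a caller-supplied `baseUrl`, and that an omitted `confidence` should default to 1.0 as in the example body. `buildManualOverrideRequest` is hypothetical and does not send anything itself.

```typescript
interface ManualOverride<T> {
  value: T;
  confidence?: number;
  notes?: string;
}

// Builds (but does not send) the PUT request for the manual-override endpoint.
// Assumption: an omitted confidence defaults to 1.0, matching the example body.
function buildManualOverrideRequest<T>(baseUrl: string, comicId: string, field: string, override: ManualOverride<T>) {
  const confidence = override.confidence ?? 1.0;
  // The negated range check also rejects NaN.
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError("confidence must be between 0 and 1");
  }
  return {
    method: "PUT" as const,
    url: `${baseUrl}/api/canonicalMetadata/manual/${encodeURIComponent(comicId)}/${encodeURIComponent(field)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...override, confidence }),
  };
}
```

The result can be passed to `fetch(req.url, req)`. Encoding the path segments keeps a field name with special characters from breaking the URL.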

4. Resolve Metadata Conflicts

// Get conflicts
GET /api/canonicalMetadata/conflicts/60f7b1234567890abcdef123

// Resolve by selecting preferred source
POST /api/canonicalMetadata/resolve/60f7b1234567890abcdef123/title
{
  "selectedSource": "COMICVINE"
}
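Conceptually, a field is in conflict when its stored entries disagree on the value. The sketch below performs that check and produces the `resolutionConflicts` shape from the resolved-metadata example. Comparing values with `JSON.stringify` is a simplifying assumption, because key order matters. The service's own detection logic may differ.

```typescript
interface MetadataEntry {
  value: unknown;
  source: string;
  confidence: number;
}

interface Conflict {
  field: string;
  conflictingValues: MetadataEntry[];
}

// Reports every field whose entries disagree, listing candidates by descending confidence.
// Values are compared via JSON.stringify, so object key order matters (a simplification).
function detectConflicts(canonical: Record<string, MetadataEntry[]>): Conflict[] {
  const conflicts: Conflict[] = [];
  for (const field of Object.keys(canonical)) {
    const entries = canonical[field];
    const distinct = new Set(entries.map((e) => JSON.stringify(e.value)));
    if (distinct.size > 1) {
      conflicts.push({
        field,
        conflictingValues: entries
          .map(({ value, source, confidence }) => ({ value, source, confidence }))
          .sort((a, b) => b.confidence - a.confidence),
      });
    }
  }
  return conflicts;
}
```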

5. Query with Source Filtering

query {
  searchComicsByMetadata(
    title: "Spider-Man"
    sources: [COMICVINE, COMICINFO_XML]
    minConfidence: 0.8
  ) {
    resolvedMetadata {
      title
      series { name volume publisher }
      creators { name role }
    }
    canonicalMetadata {
      title {
        value
        source
        confidence
        timestamp
        sourceUrl
      }
    }
  }
}
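One plausible reading of the `sources` and `minConfidence` arguments is a filter applied to each field's candidate entries before matching. The sketch below assumes that reading and a case-insensitive substring match on titles. Both functions are hypothetical and are not taken from the GraphQL resolver.

```typescript
type MetadataSource =
  | "USER_MANUAL" | "COMICINFO_XML" | "COMICVINE" | "METRON" | "GCD" | "LOCG" | "LOCAL_FILE";

interface MetadataEntry {
  value: unknown;
  source: MetadataSource;
  confidence: number;
}

// Keeps entries from the requested sources whose confidence meets the floor.
function filterEntries(entries: MetadataEntry[], sources: MetadataSource[], minConfidence: number): MetadataEntry[] {
  return entries.filter((e) => sources.indexOf(e.source) !== -1 && e.confidence >= minConfidence);
}

// Case-insensitive substring match on the title candidates that survive the filter.
function titleMatches(titleEntries: MetadataEntry[], search: string, sources: MetadataSource[], minConfidence: number): boolean {
  const needle = search.toLowerCase();
  return filterEntries(titleEntries, sources, minConfidence).some(
    (e) => typeof e.value === "string" && e.value.toLowerCase().indexOf(needle) !== -1,
  );
}
```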

🔧 Data Structure

Canonical Metadata Storage:

{
  "canonicalMetadata": {
    "title": [
      {
        "value": "Amazing Spider-Man #1",
        "source": "COMICVINE",
        "confidence": 0.9,
        "rank": 3,
        "timestamp": "2023-01-15T10:00:00Z",
        "sourceId": "142857", 
        "sourceUrl": "https://comicvine.gamespot.com/issue/4000-142857/"
      },
      {
        "value": "Amazing Spider-Man",
        "source": "COMICINFO_XML",
        "confidence": 0.8,
        "rank": 2,
        "timestamp": "2023-01-15T09:00:00Z"
      }
    ],
    "creators": [
      {
        "value": [
          { "name": "Dan Slott", "role": "Writer" },
          { "name": "John Romita Jr", "role": "Penciller" }
        ],
        "source": "COMICINFO_XML",
        "confidence": 0.85,
        "rank": 2,
        "timestamp": "2023-01-15T09:00:00Z"
      }
    ]
  }
}
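The stored document can be described with TypeScript types. The sketch below assumes the field names from the JSON above. The real definitions live in `models/canonical-comic.types.ts` and may differ in detail.

```typescript
type MetadataSource =
  | "USER_MANUAL" | "COMICINFO_XML" | "COMICVINE" | "METRON" | "GCD" | "LOCG" | "LOCAL_FILE";

// One candidate value for a field, with its provenance.
interface MetadataEntry<T> {
  value: T;
  source: MetadataSource;
  confidence: number;  // 0.0 to 1.0
  rank: number;        // 1 (USER_MANUAL) to 7 (LOCAL_FILE)
  timestamp: string;   // ISO 8601
  sourceId?: string;   // identifier in the external source, when there is one
  sourceUrl?: string;
}

interface Creator {
  name: string;
  role: string;
}

// Every field keeps all candidate entries, not only the winning one.
interface CanonicalMetadata {
  title?: MetadataEntry<string>[];
  creators?: MetadataEntry<Creator[]>[];
  [field: string]: MetadataEntry<unknown>[] | undefined;
}

const stored: { canonicalMetadata: CanonicalMetadata } = {
  canonicalMetadata: {
    title: [
      { value: "Amazing Spider-Man #1", source: "COMICVINE", confidence: 0.9, rank: 3,
        timestamp: "2023-01-15T10:00:00Z", sourceId: "142857" },
      { value: "Amazing Spider-Man", source: "COMICINFO_XML", confidence: 0.8, rank: 2,
        timestamp: "2023-01-15T09:00:00Z" },
    ],
  },
};
```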

Resolved Metadata (best values). Here the title comes from ComicVine because its confidence of 0.9 beats ComicInfo.xml's 0.8 under the highest_confidence strategy:

{
  "resolvedMetadata": {
    "title": "Amazing Spider-Man #1",
    "series": {
      "name": "Amazing Spider-Man",
      "volume": 1,
      "publisher": "Marvel Comics"
    },
    "creators": [
      { "name": "Dan Slott", "role": "Writer" },
      { "name": "John Romita Jr", "role": "Penciller" }
    ],
    "lastResolved": "2023-01-15T10:30:00Z",
    "resolutionConflicts": [
      {
        "field": "title",
        "conflictingValues": [
          { "value": "Amazing Spider-Man #1", "source": "COMICVINE", "confidence": 0.9 },
          { "value": "Amazing Spider-Man", "source": "COMICINFO_XML", "confidence": 0.8 }
        ]
      }
    ]
  }
}

⚙️ Resolution Strategies

Available Strategies:

const strategies = {
  // Use source with highest confidence score
  highest_confidence: { strategy: 'highest_confidence' },
  
  // Use source with highest rank, i.e. lowest rank number (USER_MANUAL > COMICINFO_XML > COMICVINE...)
  highest_rank: { strategy: 'highest_rank' },
  
  // Use most recently added metadata  
  most_recent: { strategy: 'most_recent' },
  
  // Prefer user manual entries
  user_preference: { strategy: 'user_preference' },
  
  // Attempt to find consensus among sources
  consensus: { strategy: 'consensus' }
};
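The sketch below implements the five named strategies for a single field. It assumes the entry shape from the Data Structure section. Several rules are assumptions rather than documented behavior: on ties the earlier entry wins, consensus ties go to the higher summed confidence, and `user_preference` falls back to `highest_rank` when there is no manual entry.

```typescript
type StrategyName = "highest_confidence" | "highest_rank" | "most_recent" | "user_preference" | "consensus";

interface Entry<T> {
  value: T;
  source: string;
  confidence: number;
  rank: number;      // lower number = higher priority
  timestamp: string; // ISO 8601
}

// Returns the entry for which `better` wins; on ties the earlier entry is kept.
function pick<T>(entries: Entry<T>[], better: (a: Entry<T>, b: Entry<T>) => boolean): Entry<T> | undefined {
  let best: Entry<T> | undefined;
  for (const e of entries) {
    if (best === undefined || better(e, best)) best = e;
  }
  return best;
}

function resolve<T>(entries: Entry<T>[], strategy: StrategyName): T | undefined {
  switch (strategy) {
    case "highest_confidence":
      return pick(entries, (a, b) => a.confidence > b.confidence)?.value;
    case "highest_rank":
      return pick(entries, (a, b) => a.rank < b.rank)?.value;
    case "most_recent":
      return pick(entries, (a, b) => Date.parse(a.timestamp) > Date.parse(b.timestamp))?.value;
    case "user_preference": {
      // Assumption: without a manual entry, fall back to source rank.
      const manual = entries.find((e) => e.source === "USER_MANUAL");
      return manual ? manual.value : resolve(entries, "highest_rank");
    }
    case "consensus": {
      // Majority vote on serialized values; ties go to the higher summed confidence.
      const groups = new Map<string, { value: T; count: number; weight: number }>();
      for (const e of entries) {
        const key = JSON.stringify(e.value);
        const group = groups.get(key) ?? { value: e.value, count: 0, weight: 0 };
        group.count += 1;
        group.weight += e.confidence;
        groups.set(key, group);
      }
      const ranked = Array.from(groups.values()).sort((a, b) => b.count - a.count || b.weight - a.weight);
      return ranked.length > 0 ? ranked[0].value : undefined;
    }
  }
}

const titles: Entry<string>[] = [
  { value: "Amazing Spider-Man #1", source: "COMICVINE", confidence: 0.9, rank: 3, timestamp: "2023-01-15T10:00:00Z" },
  { value: "Amazing Spider-Man", source: "COMICINFO_XML", confidence: 0.8, rank: 2, timestamp: "2023-01-15T09:00:00Z" },
];
console.log(resolve(titles, "highest_confidence")); // Amazing Spider-Man #1
console.log(resolve(titles, "highest_rank"));       // Amazing Spider-Man
```

On the title example from the Data Structure section, `highest_confidence` and `highest_rank` disagree. ComicVine wins on confidence (0.9 vs 0.8), while ComicInfo.xml wins on rank (2 vs 3). The chosen strategy therefore determines which title ends up in `resolvedMetadata`.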

Custom Strategy:

const customStrategy: MetadataResolutionStrategy = {
  strategy: 'highest_rank',
  minimumConfidence: 0.7,
  allowedSources: [MetadataSource.COMICVINE, MetadataSource.COMICINFO_XML],
  fieldSpecificStrategies: {
    'creators': { strategy: 'consensus' },  // Merge creators from multiple sources
    'title': { strategy: 'highest_confidence' }  // Use most confident title
  }
};
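The custom strategy layers global filters over per-field overrides. The sketch below shows how such a config might be applied. It assumes `minimumConfidence` and `allowedSources` filter entries before any strategy runs, including field-specific ones. The per-strategy logic is passed in as a `resolver` callback so the example stays self-contained.

```typescript
type StrategyName = "highest_confidence" | "highest_rank" | "most_recent" | "user_preference" | "consensus";

interface Entry {
  value: unknown;
  source: string;
  confidence: number;
  rank: number;
}

interface StrategyConfig {
  strategy: StrategyName;
  minimumConfidence?: number;
  allowedSources?: string[];
  fieldSpecificStrategies?: Record<string, { strategy: StrategyName }>;
}

// Per-strategy logic is injected so this sketch only handles filtering and dispatch.
type Resolver = (entries: Entry[], strategy: StrategyName) => unknown;

// Assumption: global filters apply before any strategy, including field-specific ones.
function resolveField(field: string, entries: Entry[], config: StrategyConfig, resolver: Resolver): unknown {
  const floor = config.minimumConfidence ?? 0;
  const allowed = config.allowedSources;
  const eligible = entries.filter(
    (e) => e.confidence >= floor && (allowed === undefined || allowed.indexOf(e.source) !== -1),
  );
  if (eligible.length === 0) return undefined;
  const override = config.fieldSpecificStrategies?.[field];
  return resolver(eligible, override ? override.strategy : config.strategy);
}

const customStrategy: StrategyConfig = {
  strategy: "highest_rank",
  minimumConfidence: 0.7,
  allowedSources: ["COMICVINE", "COMICINFO_XML"],
  fieldSpecificStrategies: {
    creators: { strategy: "consensus" },
    title: { strategy: "highest_confidence" },
  },
};
```

Under this assumption, the example config above drops USER_MANUAL entries entirely, because USER_MANUAL is not in `allowedSources`. Add it to the list if user overrides should still take effect.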

🚀 Integration Workflow

1. Local File Import Process:

// 1. Extract file metadata
const localMetadata = extractLocalMetadata(filePath);
comic.addMetadata('title', localMetadata.title, MetadataSource.LOCAL_FILE, 0.3);

// 2. Parse ComicInfo.xml (if present)
if (comicInfoXML) {
  await broker.call('canonicalMetadata.importComicInfoXML', {
    comicId: comic._id,
    xmlData: comicInfoXML
  });
}

// 3. Enhance with external APIs
const comicVineMatch = await searchComicVine(comic.resolvedMetadata.title);
if (comicVineMatch) {
  await broker.call('canonicalMetadata.importComicVineMetadata', {
    comicId: comic._id, 
    comicVineData: comicVineMatch
  });
}

// 4. Resolve final metadata
await broker.call('canonicalMetadata.reResolveMetadata', {
  comicId: comic._id
});
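This guide does not show `extractLocalMetadata`. Below is a hypothetical filename parser that assumes names in the common "Series 001 (Year).cbz" style. Real-world filenames vary widely, which is why LOCAL_FILE entries get the low 0.3 confidence.

```typescript
interface LocalMetadata {
  series?: string;
  issueNumber?: string;
  year?: number;
  title?: string;
}

// Hypothetical parser for names like "Amazing Spider-Man 001 (2023).cbz".
function extractLocalMetadata(filePath: string): LocalMetadata {
  const base = filePath.split(/[\\/]/).pop() ?? filePath;           // drop directories
  const stem = base.replace(/\.(cbz|cbr|cb7|cbt|pdf)$/i, "");       // drop the extension
  const yearMatch = stem.match(/\((\d{4})\)/);                      // first "(YYYY)" group
  const cleaned = stem.replace(/\([^)]*\)/g, "").replace(/_/g, " ").trim();
  const issueMatch = cleaned.match(/^(.*?)\s+#?(\d+(?:\.\d+)?)$/);  // trailing issue number
  const series = (issueMatch ? issueMatch[1] : cleaned).trim();
  const issueNumber = issueMatch ? String(Number(issueMatch[2])) : undefined; // "001" -> "1"
  return {
    series: series || undefined,
    issueNumber,
    year: yearMatch ? Number(yearMatch[1]) : undefined,
    title: issueNumber ? `${series} #${issueNumber}` : series || undefined,
  };
}
```

With this shape, step 1 would pass `localMetadata.title` to `addMetadata`.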

2. Conflict Resolution Workflow:

// 1. Detect conflicts
const conflicts = await broker.call('canonicalMetadata.getMetadataConflicts', {
  comicId: comic._id
});

// 2. Present to user for resolution
if (conflicts.length > 0) {
  // Show UI with conflicting values and sources
  const userChoice = await presentConflictResolution(conflicts);
  
  // 3. Apply user's resolution
  await broker.call('canonicalMetadata.resolveMetadataConflict', {
    comicId: comic._id,
    field: userChoice.field,
    selectedSource: userChoice.source
  });
}

📈 Performance Considerations

Database Indexes:

✅ Text search: resolvedMetadata.title, resolvedMetadata.series.name
✅ Unique identification: series.name + volume + issueNumber
✅ Source filtering: canonicalMetadata.*.source + confidence
✅ Import status: importStatus.isImported + tagged
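The indexes above could be declared as plain MongoDB key specifications. The sketch below assumes the field paths `resolvedMetadata.issueNumber`, `resolvedMetadata.series.volume`, and `importStatus.tagged`, because the list names them only loosely. A literal `*` is not a valid path segment, so the source/confidence index is shown for the `title` field only.

```typescript
// MongoDB key directions: 1 ascending, -1 descending, "text" for a text index.
type IndexKey = 1 | -1 | "text";

interface IndexSpec {
  fields: Record<string, IndexKey>;
  options?: { name?: string; unique?: boolean };
}

// Field paths below are assumptions based on the index list above.
const canonicalComicIndexes: IndexSpec[] = [
  {
    fields: { "resolvedMetadata.title": "text", "resolvedMetadata.series.name": "text" },
    options: { name: "resolved_text" },
  },
  {
    fields: { "resolvedMetadata.series.name": 1, "resolvedMetadata.series.volume": 1, "resolvedMetadata.issueNumber": 1 },
    options: { name: "series_volume_issue", unique: true },
  },
  {
    // One index per field, since "canonicalMetadata.*.source" is not an indexable path.
    fields: { "canonicalMetadata.title.source": 1, "canonicalMetadata.title.confidence": -1 },
    options: { name: "title_source_confidence" },
  },
  {
    fields: { "importStatus.isImported": 1, "importStatus.tagged": 1 },
    options: { name: "import_status" },
  },
];
```

With Mongoose, these specs could be registered via `schema.index(fields, options)`. A unique MongoDB index treats a missing field as `null`. Two comics in the same series and volume that both lack an issue number would therefore collide. A `partialFilterExpression` that limits the index to complete documents avoids this.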

Optimization Tips:

Batch metadata imports for large collections
Cache resolved metadata for frequently accessed comics
Index on confidence scores for quality filtering
Paginate conflict resolution for large libraries
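Batching imports and paginating conflicts both come down to splitting a list into fixed-size pages. The sketch below assumes the per-item import is an async function, such as a wrapper around `broker.call`. `chunk` and `importInBatches` are hypothetical helpers.

```typescript
// Splits items into consecutive pages of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError("size must be a positive integer");
  const pages: T[][] = [];
  for (let i = 0; i < items.length; i += size) pages.push(items.slice(i, i + size));
  return pages;
}

// Runs one batch at a time, with the calls inside a batch in parallel,
// so at most `size` imports are in flight at once. Returns the number processed.
async function importInBatches<T>(items: T[], size: number, importOne: (item: T) => Promise<void>): Promise<number> {
  let done = 0;
  for (const batch of chunk(items, size)) {
    await Promise.all(batch.map(importOne));
    done += batch.length;
  }
  return done;
}
```

Running batches one after another caps how many imports run at once, which keeps the load on external APIs bounded. The same `chunk` call can page conflicts for the resolution UI.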

🛡️ Best Practices

Data Quality:

Always validate external API responses before import
Set appropriate confidence scores based on source reliability
Preserve original data in source-specific fields
Log metadata changes for audit trails
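The sketch below validates a ComicVine payload before import, written as a TypeScript type guard. It checks only the fields that the import example relies on, which is an assumption about what the importer needs. `isComicVineIssue` is hypothetical.

```typescript
interface ComicVineIssue {
  id: number;
  issue_number: string;
  volume: { id: number; name: string };
  person_credits?: { name: string; role: string }[];
}

function isRecord(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null && !Array.isArray(x);
}

// Accepts only payloads with the fields the importer relies on; anything else is
// rejected up front instead of being stored as partial metadata.
function isComicVineIssue(x: unknown): x is ComicVineIssue {
  if (!isRecord(x)) return false;
  if (typeof x.id !== "number" || typeof x.issue_number !== "string") return false;
  const volume = x.volume;
  if (!isRecord(volume) || typeof volume.id !== "number" || typeof volume.name !== "string") return false;
  const credits = x.person_credits;
  if (credits === undefined) return true;
  return Array.isArray(credits) && credits.every(
    (c: unknown) => isRecord(c) && typeof c.name === "string" && typeof c.role === "string",
  );
}
```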

Conflict Management:

Prefer user overrides for disputed fields
Use consensus for aggregatable fields (creators, characters)
Maintain provenance links to original sources
Provide clear UI for conflict resolution
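For aggregatable fields, "consensus" can mean a union rather than a vote. The sketch below merges creators across sources while keeping provenance. It assumes duplicates are identified by case-insensitive name and role. `mergeCreators` is a hypothetical helper.

```typescript
interface Creator {
  name: string;
  role: string;
}

interface CreatorEntry {
  value: Creator[];
  source: string;
}

interface MergedCreator extends Creator {
  sources: string[]; // every source that lists this credit
}

// Union of creators across sources, deduplicated on case-insensitive name + role.
// The first spelling seen is kept; later duplicates only add their source.
function mergeCreators(entries: CreatorEntry[]): MergedCreator[] {
  const merged = new Map<string, MergedCreator>();
  for (const entry of entries) {
    for (const c of entry.value) {
      const key = `${c.name.trim().toLowerCase()}|${c.role.trim().toLowerCase()}`;
      const existing = merged.get(key);
      if (existing) {
        if (existing.sources.indexOf(entry.source) === -1) existing.sources.push(entry.source);
      } else {
        merged.set(key, { name: c.name.trim(), role: c.role.trim(), sources: [entry.source] });
      }
    }
  }
  return Array.from(merged.values());
}
```

Entries are processed in the order given. Passing them in priority order lets the preferred source's spelling and casing win.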

Performance:

Re-resolve metadata only when sources change
Cache frequently accessed resolved metadata
Batch operations for bulk imports
Use appropriate indexes for common queries
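One way to re-resolve only when sources change is to store a fingerprint of all candidate entries at resolution time and compare it later. The sketch below works under that assumption. It also assumes the resolution strategy itself has not changed, since a new strategy can change the result without any new entries. `sourcesFingerprint` and `needsReResolve` are hypothetical.

```typescript
interface MetadataEntry {
  value: unknown;
  source: string;
  confidence: number;
  timestamp: string;
}

// Deterministic string over all fields and entries; field order and entry order do not matter.
function sourcesFingerprint(canonical: Record<string, MetadataEntry[]>): string {
  return Object.keys(canonical)
    .sort()
    .map((field) => {
      const parts = canonical[field]
        .map((e) => `${e.source}@${e.timestamp}:${e.confidence}:${JSON.stringify(e.value)}`)
        .sort();
      return `${field}=[${parts.join(",")}]`;
    })
    .join(";");
}

// Re-resolve when nothing was stored yet or any candidate entry changed since the last run.
function needsReResolve(canonical: Record<string, MetadataEntry[]>, lastFingerprint: string | undefined): boolean {
  return sourcesFingerprint(canonical) !== lastFingerprint;
}
```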

In summary, the canonical metadata model stores every candidate value with its provenance and a confidence score. It then derives one resolved value per field through a configurable conflict-resolution strategy.
