threetwo-core-service/CANONICAL_METADATA_GUIDE.md
2025-10-29 12:25:05 -04:00

Canonical Comic Metadata Model - Implementation Guide

🎯 Overview

The canonical metadata model stores comic book metadata from multiple sources side by side. Each value records its provenance (source, timestamp, source ID or URL) and a confidence score, and conflicts between sources are resolved into a single set of best values.

🏗️ Architecture

Core Components:

  1. 📋 Type Definitions (models/canonical-comic.types.ts)
  2. 🎯 GraphQL Schema (models/graphql/canonical-typedef.ts)
  3. 🔧 Resolution Engine (utils/metadata-resolver.utils.ts)
  4. 💾 Database Model (models/canonical-comic.model.ts)
  5. ⚙️ Service Layer (services/canonical-metadata.service.ts)

📊 Metadata Sources & Ranking

Source Priority (Highest to Lowest; a lower rank value means higher priority):

enum MetadataSourceRank {
    USER_MANUAL = 1,        // User overrides - highest priority
    COMICINFO_XML = 2,      // Embedded metadata - high trust
    COMICVINE = 3,          // ComicVine API - authoritative
    METRON = 4,             // Metron API - authoritative  
    GCD = 5,                // Grand Comics Database - community
    LOCG = 6,               // League of Comic Geeks - specialized
    LOCAL_FILE = 7          // Filename inference - lowest trust
}

Confidence Scoring:

  • User Manual: 1.0 (100% trusted)
  • ComicInfo.XML: 0.8-0.95 (based on completeness)
  • ComicVine: 0.9 (highly reliable API)
  • Metron: 0.85 (reliable API)
  • GCD: 0.8 (community-maintained)
  • Local File: 0.3 (inference-based)
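
The ranking and confidence values above could be encoded roughly as follows. This is a minimal sketch, not the service's actual code: `DEFAULT_CONFIDENCE`, `comicInfoConfidence`, and `outranks` are hypothetical names, and the linear interpolation for ComicInfo.XML is an assumption, since the guide only says the score depends on completeness. LOCG is left without a default because no score is listed for it.

```typescript
// Mirrors the ranking enum above; a lower value means higher priority.
enum MetadataSourceRank {
  USER_MANUAL = 1,
  COMICINFO_XML = 2,
  COMICVINE = 3,
  METRON = 4,
  GCD = 5,
  LOCG = 6,
  LOCAL_FILE = 7,
}

type SourceName = keyof typeof MetadataSourceRank;

// Hypothetical defaults copied from the list above. LOCG has no documented
// score, so it is deliberately omitted rather than guessed.
const DEFAULT_CONFIDENCE: Partial<Record<SourceName, number>> = {
  USER_MANUAL: 1.0,
  COMICVINE: 0.9,
  METRON: 0.85,
  GCD: 0.8,
  LOCAL_FILE: 0.3,
};

// ComicInfo.XML is scored 0.8-0.95 "based on completeness". Scaling linearly
// with the fraction of filled fields is an assumption made for this sketch.
function comicInfoConfidence(filledFields: number, totalFields: number): number {
  if (totalFields <= 0) return 0.8;
  const ratio = Math.min(Math.max(filledFields / totalFields, 0), 1);
  return 0.8 + 0.15 * ratio;
}

// True when source `a` has priority over source `b`.
function outranks(a: SourceName, b: SourceName): boolean {
  return MetadataSourceRank[a] < MetadataSourceRank[b];
}
```

Keeping rank and confidence as separate concepts matters later: a lower-ranked source (ComicVine) can still carry a higher confidence than a higher-ranked one (ComicInfo.XML).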

🔄 Usage Examples

1. Import ComicVine Metadata

// REST API
POST /api/canonicalMetadata/importComicVine/60f7b1234567890abcdef123
{
  "comicVineData": {
    "id": 142857,
    "name": "Amazing Spider-Man #1",
    "issue_number": "1",
    "cover_date": "2023-01-01",
    "volume": {
      "id": 12345,
      "name": "Amazing Spider-Man",
      "start_year": 2023,
      "publisher": { "name": "Marvel Comics" }
    },
    "person_credits": [
      { "name": "Dan Slott", "role": "writer" }
    ]
  }
}
// Service usage
const result = await broker.call('canonicalMetadata.importComicVineMetadata', {
  comicId: '60f7b1234567890abcdef123',
  comicVineData: comicVineData,
  forceUpdate: false
});

2. Import ComicInfo.XML

POST /api/canonicalMetadata/importComicInfo/60f7b1234567890abcdef123
{
  "xmlData": {
    "Title": "Amazing Spider-Man",
    "Series": "Amazing Spider-Man", 
    "Number": "1",
    "Year": 2023,
    "Month": 1,
    "Writer": "Dan Slott",
    "Penciller": "John Romita Jr",
    "Publisher": "Marvel Comics"
  }
}

3. Set Manual Metadata (Highest Priority)

PUT /api/canonicalMetadata/manual/60f7b1234567890abcdef123/title
{
  "value": "The Amazing Spider-Man #1",
  "confidence": 1.0,
  "notes": "User corrected title formatting"
}

4. Resolve Metadata Conflicts

// Get conflicts
GET /api/canonicalMetadata/conflicts/60f7b1234567890abcdef123

// Resolve by selecting preferred source
POST /api/canonicalMetadata/resolve/60f7b1234567890abcdef123/title
{
  "selectedSource": "COMICVINE"
}

5. Query with Source Filtering

query {
  searchComicsByMetadata(
    title: "Spider-Man"
    sources: [COMICVINE, COMICINFO_XML]
    minConfidence: 0.8
  ) {
    resolvedMetadata {
      title
      series { name volume publisher }
      creators { name role }
    }
    canonicalMetadata {
      title {
        value
        source
        confidence
        timestamp
        sourceUrl
      }
    }
  }
}

🔧 Data Structure

Canonical Metadata Storage:

{
  "canonicalMetadata": {
    "title": [
      {
        "value": "Amazing Spider-Man #1",
        "source": "COMICVINE",
        "confidence": 0.9,
        "rank": 3,
        "timestamp": "2023-01-15T10:00:00Z",
        "sourceId": "142857", 
        "sourceUrl": "https://comicvine.gamespot.com/issue/4000-142857/"
      },
      {
        "value": "Amazing Spider-Man",
        "source": "COMICINFO_XML",
        "confidence": 0.8,
        "rank": 2,
        "timestamp": "2023-01-15T09:00:00Z"
      }
    ],
    "creators": [
      {
        "value": [
          { "name": "Dan Slott", "role": "Writer" },
          { "name": "John Romita Jr", "role": "Penciller" }
        ],
        "source": "COMICINFO_XML",
        "confidence": 0.85,
        "rank": 2,
        "timestamp": "2023-01-15T09:00:00Z"
      }
    ]
  }
}
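
The storage layout above suggests a per-field list of provenance entries. The following sketch infers a type from the JSON; `MetadataEntry`, `CanonicalMetadata`, and `upsertEntry` are hypothetical names, and the rule that each source contributes at most one entry per field is an assumption, not something the guide states.

```typescript
// Hypothetical shape of one provenance entry, inferred from the JSON above.
interface MetadataEntry<T = unknown> {
  value: T;
  source: string;
  confidence: number; // 0.0 to 1.0
  rank: number;       // lower value means higher priority
  timestamp: string;  // ISO 8601
  sourceId?: string;
  sourceUrl?: string;
}

// Each field maps to the list of values reported by the different sources.
type CanonicalMetadata = Record<string, MetadataEntry[]>;

// Adds an entry, replacing any earlier entry from the same source so that
// each source contributes at most one value per field (an assumption).
function upsertEntry(store: CanonicalMetadata, field: string, entry: MetadataEntry): void {
  const existing = store[field] ?? [];
  store[field] = [...existing.filter((e) => e.source !== entry.source), entry];
}
```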

Resolved Metadata (Best Values):

{
  "resolvedMetadata": {
    "title": "Amazing Spider-Man #1",           // From ComicVine (higher confidence)
    "series": {
      "name": "Amazing Spider-Man",
      "volume": 1,
      "publisher": "Marvel Comics"
    },
    "creators": [
      { "name": "Dan Slott", "role": "Writer" },
      { "name": "John Romita Jr", "role": "Penciller" }
    ],
    "lastResolved": "2023-01-15T10:30:00Z",
    "resolutionConflicts": [
      {
        "field": "title",
        "conflictingValues": [
          { "value": "Amazing Spider-Man #1", "source": "COMICVINE", "confidence": 0.9 },
          { "value": "Amazing Spider-Man", "source": "COMICINFO_XML", "confidence": 0.8 }
        ]
      }
    ]
  }
}
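
The `resolutionConflicts` list above can be derived from the canonical storage by checking which fields have more than one distinct value. A minimal sketch under stated assumptions: `detectConflicts` is a hypothetical helper, and values are compared by their JSON serialization, which is sensitive to key order inside object values.

```typescript
interface Candidate {
  value: unknown;
  source: string;
  confidence: number;
}

interface Conflict {
  field: string;
  conflictingValues: Candidate[];
}

// Reports every field whose sources disagree. Two values count as equal when
// their JSON serializations match (a simplification for this sketch).
function detectConflicts(canonical: Record<string, Candidate[]>): Conflict[] {
  const conflicts: Conflict[] = [];
  for (const field of Object.keys(canonical)) {
    const entries = canonical[field];
    const distinct = new Set(entries.map((e) => JSON.stringify(e.value)));
    if (distinct.size > 1) {
      conflicts.push({ field, conflictingValues: entries });
    }
  }
  return conflicts;
}
```

Applied to the storage example, this would flag `title` (two different strings) but not `creators` (a single source).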

⚙️ Resolution Strategies

Available Strategies:

const strategies = {
  // Use source with highest confidence score
  highest_confidence: { strategy: 'highest_confidence' },
  
  // Use source with highest rank, i.e. lowest rank value (USER_MANUAL > COMICINFO_XML > COMICVINE...)
  highest_rank: { strategy: 'highest_rank' },
  
  // Use most recently added metadata  
  most_recent: { strategy: 'most_recent' },
  
  // Prefer user manual entries
  user_preference: { strategy: 'user_preference' },
  
  // Attempt to find consensus among sources
  consensus: { strategy: 'consensus' }
};
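
To show how the strategies can disagree, here is a minimal sketch of three of them (`highest_confidence`, `highest_rank`, `most_recent`). `pickBest` is a hypothetical function, not the resolver in utils/metadata-resolver.utils.ts, and ties are broken by keeping the earlier entry, which is an assumption.

```typescript
type StrategyName = "highest_confidence" | "highest_rank" | "most_recent";

interface Entry {
  value: unknown;
  source: string;
  confidence: number;
  rank: number;      // lower value means higher priority
  timestamp: string; // ISO 8601
}

// One "is a better than b?" comparison per strategy.
const isBetter: Record<StrategyName, (a: Entry, b: Entry) => boolean> = {
  highest_confidence: (a, b) => a.confidence > b.confidence,
  highest_rank: (a, b) => a.rank < b.rank,
  most_recent: (a, b) => Date.parse(a.timestamp) > Date.parse(b.timestamp),
};

// Returns the winning entry for a field, or undefined when there are none.
function pickBest(entries: Entry[], strategy: StrategyName): Entry | undefined {
  if (entries.length === 0) return undefined;
  return entries.reduce((best, e) => (isBetter[strategy](e, best) ? e : best));
}
```

With the two `title` entries from the storage example, `highest_confidence` and `most_recent` select the ComicVine value, while `highest_rank` selects the ComicInfo.XML value, so the choice of strategy changes the resolved title.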

Custom Strategy:

const customStrategy: MetadataResolutionStrategy = {
  strategy: 'highest_rank',
  minimumConfidence: 0.7,
  allowedSources: [MetadataSource.COMICVINE, MetadataSource.COMICINFO_XML],
  fieldSpecificStrategies: {
    'creators': { strategy: 'consensus' },  // Merge creators from multiple sources
    'title': { strategy: 'highest_confidence' }  // Use most confident title
  }
};
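
A custom strategy like the one above implies two steps before any value is picked: choose the strategy that applies to the field, then drop entries that fail the confidence or source filters. The sketch below illustrates that reading; `strategyNameFor` and `eligibleEntries` are hypothetical helpers, and plain strings stand in for the `MetadataSource` enum.

```typescript
interface ResolutionStrategy {
  strategy: string;
  minimumConfidence?: number;
  allowedSources?: string[];
  fieldSpecificStrategies?: Record<string, { strategy: string }>;
}

interface SourcedEntry {
  source: string;
  confidence: number;
}

// A field-specific strategy, when present, overrides the top-level one.
function strategyNameFor(field: string, config: ResolutionStrategy): string {
  return config.fieldSpecificStrategies?.[field]?.strategy ?? config.strategy;
}

// Keeps only entries that pass the optional confidence and source filters.
function eligibleEntries<E extends SourcedEntry>(entries: E[], config: ResolutionStrategy): E[] {
  return entries.filter(
    (e) =>
      (config.minimumConfidence === undefined || e.confidence >= config.minimumConfidence) &&
      (config.allowedSources === undefined || config.allowedSources.includes(e.source)),
  );
}
```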

🚀 Integration Workflow

1. Local File Import Process:

// 1. Extract file metadata
// Assumes extractLocalMetadata returns an object with an inferred `title`
const localMetadata = extractLocalMetadata(filePath);
comic.addMetadata('title', localMetadata.title, MetadataSource.LOCAL_FILE, 0.3);

// 2. Parse ComicInfo.XML (if exists)
if (comicInfoXML) {
  await broker.call('canonicalMetadata.importComicInfoXML', {
    comicId: comic._id,
    xmlData: comicInfoXML
  });
}

// 3. Enhance with external APIs
const comicVineMatch = await searchComicVine(comic.resolvedMetadata.title);
if (comicVineMatch) {
  await broker.call('canonicalMetadata.importComicVineMetadata', {
    comicId: comic._id, 
    comicVineData: comicVineMatch
  });
}

// 4. Resolve final metadata
await broker.call('canonicalMetadata.reResolveMetadata', {
  comicId: comic._id
});

2. Conflict Resolution Workflow:

// 1. Detect conflicts
const conflicts = await broker.call('canonicalMetadata.getMetadataConflicts', {
  comicId: comic._id
});

// 2. Present to user for resolution
if (conflicts.length > 0) {
  // Show UI with conflicting values and sources
  const userChoice = await presentConflictResolution(conflicts);
  
  // 3. Apply user's resolution
  await broker.call('canonicalMetadata.resolveMetadataConflict', {
    comicId: comic._id,
    field: userChoice.field,
    selectedSource: userChoice.source
  });
}

📈 Performance Considerations

Database Indexes:

  • Text search: resolvedMetadata.title, resolvedMetadata.series.name
  • Unique identification: series.name + volume + issueNumber
  • Source filtering: canonicalMetadata.*.source + confidence
  • Import status: importStatus.isImported + tagged

Optimization Tips:

  • Batch metadata imports for large collections
  • Cache resolved metadata for frequently accessed comics
  • Index on confidence scores for quality filtering
  • Paginate conflict resolution for large libraries
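
Batching imports for a large collection could look like the sketch below. `chunk` and `importInBatches` are hypothetical helpers; the `importOne` callback stands in for whatever call performs a single import (for example a `broker.call` to one of the import actions), and running each batch concurrently with `Promise.all` is a design assumption.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Imports items batch by batch: items within a batch run concurrently,
// batches run one after another to bound load on the database and APIs.
async function importInBatches<T>(
  items: T[],
  batchSize: number,
  importOne: (item: T) => Promise<void>,
): Promise<number> {
  let imported = 0;
  for (const batch of chunk(items, batchSize)) {
    await Promise.all(batch.map(importOne));
    imported += batch.length;
  }
  return imported;
}
```

The batch size is the tuning knob: larger batches finish sooner but put more concurrent load on external APIs that may be rate limited.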

🛡️ Best Practices

Data Quality:

  1. Always validate external API responses before import
  2. Set appropriate confidence scores based on source reliability
  3. Preserve original data in source-specific fields
  4. Log metadata changes for audit trails
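
Validating an external response before import can be done with a type guard. The sketch below checks only fields that appear in the ComicVine example payload in this guide; `ComicVineIssue` and `isComicVineIssue` are hypothetical names, and which fields are actually required is an assumption.

```typescript
// Subset of the ComicVine payload shown in the usage example.
interface ComicVineIssue {
  id: number;
  name: string;
  issue_number: string;
  volume: { id: number; name: string };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrows untrusted data to ComicVineIssue when the checked fields are present
// and have the expected primitive types.
function isComicVineIssue(data: unknown): data is ComicVineIssue {
  if (!isRecord(data)) return false;
  const volume = data.volume;
  return (
    typeof data.id === "number" &&
    typeof data.name === "string" &&
    typeof data.issue_number === "string" &&
    isRecord(volume) &&
    typeof volume.id === "number" &&
    typeof volume.name === "string"
  );
}
```

Rejecting a malformed payload at this point keeps partial or mistyped values out of the canonical storage, where they would otherwise compete in resolution.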

Conflict Management:

  1. Prefer user overrides for disputed fields
  2. Use consensus for aggregatable fields (creators, characters)
  3. Maintain provenance links to original sources
  4. Provide clear UI for conflict resolution
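
For aggregatable fields such as creators, one possible reading of "consensus" is a union of the lists from all sources with duplicates removed. The sketch below takes that reading; `mergeCreators` is a hypothetical helper, and matching on case-insensitive name plus role is an assumption (the examples in this guide use both "writer" and "Writer").

```typescript
interface Creator {
  name: string;
  role: string;
}

// Unions creator lists from several sources. Two credits are treated as the
// same when name and role match after trimming and lowercasing; the first
// spelling encountered is kept.
function mergeCreators(lists: Creator[][]): Creator[] {
  const seen = new Map<string, Creator>();
  for (const list of lists) {
    for (const creator of list) {
      const key = `${creator.name.trim().toLowerCase()}|${creator.role.trim().toLowerCase()}`;
      if (!seen.has(key)) {
        seen.set(key, creator);
      }
    }
  }
  return Array.from(seen.values());
}
```

Passing the lists in rank order means the highest-priority source decides the spelling that survives.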

Performance:

  1. Re-resolve metadata only when sources change
  2. Cache frequently accessed resolved metadata
  3. Batch operations for bulk imports
  4. Use appropriate indexes for common queries
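
Re-resolving only when sources change can be approximated by caching the resolved result together with a fingerprint of the canonical entries. This is a sketch under stated assumptions: `ResolutionCache` is a hypothetical in-memory class, and the fingerprint is simply the JSON serialization of the entries, which a real service might replace with a hash or a version counter.

```typescript
// Caches one resolved result per comic and recomputes it only when the
// canonical entries have changed since the last resolution.
class ResolutionCache<R> {
  private readonly entries = new Map<string, { fingerprint: string; resolved: R }>();

  get(comicId: string, canonical: unknown, resolve: () => R): R {
    const fingerprint = JSON.stringify(canonical);
    const hit = this.entries.get(comicId);
    if (hit !== undefined && hit.fingerprint === fingerprint) {
      return hit.resolved; // sources unchanged: skip resolution
    }
    const resolved = resolve();
    this.entries.set(comicId, { fingerprint, resolved });
    return resolved;
  }
}
```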

This canonical metadata model gives comic book collections full provenance tracking, per-source confidence scoring, and configurable conflict resolution.