# Canonical Comic Metadata Model - Implementation Guide

## 🎯 Overview

The canonical metadata model manages comic book metadata from multiple sources with **provenance tracking**, **confidence scoring**, and **conflict resolution**.

## 🏗️ Architecture

### **Core Components:**

1. **📋 Type Definitions** ([`models/canonical-comic.types.ts`](models/canonical-comic.types.ts:1))
2. **🎯 GraphQL Schema** ([`models/graphql/canonical-typedef.ts`](models/graphql/canonical-typedef.ts:1))
3. **🔧 Resolution Engine** ([`utils/metadata-resolver.utils.ts`](utils/metadata-resolver.utils.ts:1))
4. **💾 Database Model** ([`models/canonical-comic.model.ts`](models/canonical-comic.model.ts:1))
5. **⚙️ Service Layer** ([`services/canonical-metadata.service.ts`](services/canonical-metadata.service.ts:1))

---

## 📊 Metadata Sources & Ranking

### **Source Priority (Highest to Lowest):**

```typescript
enum MetadataSourceRank {
  USER_MANUAL = 1,     // User overrides - highest priority
  COMICINFO_XML = 2,   // Embedded metadata - high trust
  COMICVINE = 3,       // ComicVine API - authoritative
  METRON = 4,          // Metron API - authoritative
  GCD = 5,             // Grand Comics Database - community
  LOCG = 6,            // League of Comic Geeks - specialized
  LOCAL_FILE = 7       // Filename inference - lowest trust
}
```

### **Confidence Scoring:**

- **User Manual**: 1.0 (100% trusted)
- **ComicInfo.XML**: 0.8-0.95 (based on completeness)
- **ComicVine**: 0.9 (highly reliable API)
- **Metron**: 0.85 (reliable API)
- **GCD**: 0.8 (community-maintained)
- **Local File**: 0.3 (inference-based)

---

## 🔄 Usage Examples

### **1. Import ComicVine Metadata**

```typescript
// REST API
POST /api/canonicalMetadata/importComicVine/60f7b1234567890abcdef123
{
  "comicVineData": {
    "id": 142857,
    "name": "Amazing Spider-Man #1",
    "issue_number": "1",
    "cover_date": "2023-01-01",
    "volume": {
      "id": 12345,
      "name": "Amazing Spider-Man",
      "start_year": 2023,
      "publisher": { "name": "Marvel Comics" }
    },
    "person_credits": [
      { "name": "Dan Slott", "role": "writer" }
    ]
  }
}
```

```typescript
// Service usage
const result = await broker.call('canonicalMetadata.importComicVineMetadata', {
  comicId: '60f7b1234567890abcdef123',
  comicVineData: comicVineData,
  forceUpdate: false
});
```

### **2. Import ComicInfo.XML**

```typescript
POST /api/canonicalMetadata/importComicInfo/60f7b1234567890abcdef123
{
  "xmlData": {
    "Title": "Amazing Spider-Man",
    "Series": "Amazing Spider-Man",
    "Number": "1",
    "Year": 2023,
    "Month": 1,
    "Writer": "Dan Slott",
    "Penciller": "John Romita Jr",
    "Publisher": "Marvel Comics"
  }
}
```

### **3. Set Manual Metadata (Highest Priority)**

```typescript
PUT /api/canonicalMetadata/manual/60f7b1234567890abcdef123/title
{
  "value": "The Amazing Spider-Man #1",
  "confidence": 1.0,
  "notes": "User corrected title formatting"
}
```

### **4. Resolve Metadata Conflicts**

```typescript
// Get conflicts
GET /api/canonicalMetadata/conflicts/60f7b1234567890abcdef123

// Resolve by selecting preferred source
POST /api/canonicalMetadata/resolve/60f7b1234567890abcdef123/title
{
  "selectedSource": "COMICVINE"
}
```

### **5. Query with Source Filtering**

```graphql
query {
  searchComicsByMetadata(
    title: "Spider-Man"
    sources: [COMICVINE, COMICINFO_XML]
    minConfidence: 0.8
  ) {
    resolvedMetadata {
      title
      series { name volume publisher }
      creators { name role }
    }
    canonicalMetadata {
      title { value source confidence timestamp sourceUrl }
    }
  }
}
```

---

## 🔧 Data Structure

### **Canonical Metadata Storage:**

```typescript
{
  "canonicalMetadata": {
    "title": [
      {
        "value": "Amazing Spider-Man #1",
        "source": "COMICVINE",
        "confidence": 0.9,
        "rank": 3,
        "timestamp": "2023-01-15T10:00:00Z",
        "sourceId": "142857",
        "sourceUrl": "https://comicvine.gamespot.com/issue/4000-142857/"
      },
      {
        "value": "Amazing Spider-Man",
        "source": "COMICINFO_XML",
        "confidence": 0.8,
        "rank": 2,
        "timestamp": "2023-01-15T09:00:00Z"
      }
    ],
    "creators": [
      {
        "value": [
          { "name": "Dan Slott", "role": "Writer" },
          { "name": "John Romita Jr", "role": "Penciller" }
        ],
        "source": "COMICINFO_XML",
        "confidence": 0.85,
        "rank": 2,
        "timestamp": "2023-01-15T09:00:00Z"
      }
    ]
  }
}
```

### **Resolved Metadata (Best Values):**

```typescript
{
  "resolvedMetadata": {
    "title": "Amazing Spider-Man #1", // From ComicVine (higher confidence)
    "series": {
      "name": "Amazing Spider-Man",
      "volume": 1,
      "publisher": "Marvel Comics"
    },
    "creators": [
      { "name": "Dan Slott", "role": "Writer" },
      { "name": "John Romita Jr", "role": "Penciller" }
    ],
    "lastResolved": "2023-01-15T10:30:00Z",
    "resolutionConflicts": [
      {
        "field": "title",
        "conflictingValues": [
          { "value": "Amazing Spider-Man #1", "source": "COMICVINE", "confidence": 0.9 },
          { "value": "Amazing Spider-Man", "source": "COMICINFO_XML", "confidence": 0.8 }
        ]
      }
    ]
  }
}
```

---

## ⚙️ Resolution Strategies

### **Available Strategies:**

```typescript
const strategies = {
  // Use source with highest confidence score
  highest_confidence: { strategy: 'highest_confidence' },

  // Use source with highest rank (USER_MANUAL > COMICINFO_XML > COMICVINE...)
  highest_rank: { strategy: 'highest_rank' },

  // Use most recently added metadata
  most_recent: { strategy: 'most_recent' },

  // Prefer user manual entries
  user_preference: { strategy: 'user_preference' },

  // Attempt to find consensus among sources
  consensus: { strategy: 'consensus' }
};
```

### **Custom Strategy:**

```typescript
const customStrategy: MetadataResolutionStrategy = {
  strategy: 'highest_rank',
  minimumConfidence: 0.7,
  allowedSources: [MetadataSource.COMICVINE, MetadataSource.COMICINFO_XML],
  fieldSpecificStrategies: {
    'creators': { strategy: 'consensus' },        // Merge creators from multiple sources
    'title': { strategy: 'highest_confidence' }   // Use most confident title
  }
};
```

---

## 🚀 Integration Workflow

### **1. Local File Import Process:**

```typescript
// 1. Extract file metadata
// (assumes extractLocalMetadata returns an object with an inferred `title`)
const localMetadata = extractLocalMetadata(filePath);
comic.addMetadata('title', localMetadata.title, MetadataSource.LOCAL_FILE, 0.3);

// 2. Parse ComicInfo.XML (if exists)
if (comicInfoXML) {
  await broker.call('canonicalMetadata.importComicInfoXML', {
    comicId: comic._id,
    xmlData: comicInfoXML
  });
}

// 3. Enhance with external APIs
const comicVineMatch = await searchComicVine(comic.resolvedMetadata.title);
if (comicVineMatch) {
  await broker.call('canonicalMetadata.importComicVineMetadata', {
    comicId: comic._id,
    comicVineData: comicVineMatch
  });
}

// 4. Resolve final metadata
await broker.call('canonicalMetadata.reResolveMetadata', {
  comicId: comic._id
});
```

### **2. Conflict Resolution Workflow:**

```typescript
// 1. Detect conflicts
const conflicts = await broker.call('canonicalMetadata.getMetadataConflicts', {
  comicId: comic._id
});

// 2. Present to user for resolution
if (conflicts.length > 0) {
  // Show UI with conflicting values and sources
  const userChoice = await presentConflictResolution(conflicts);

  // 3. Apply user's resolution
  await broker.call('canonicalMetadata.resolveMetadataConflict', {
    comicId: comic._id,
    field: userChoice.field,
    selectedSource: userChoice.source
  });
}
```

---

## 📈 Performance Considerations

### **Database Indexes:**

- ✅ **Text search**: `resolvedMetadata.title`, `resolvedMetadata.series.name`
- ✅ **Unique identification**: `series.name` + `volume` + `issueNumber`
- ✅ **Source filtering**: `canonicalMetadata.*.source` + `confidence`
- ✅ **Import status**: `importStatus.isImported` + `tagged`

### **Optimization Tips:**

- **Batch metadata imports** for large collections
- **Cache resolved metadata** for frequently accessed comics
- **Index on confidence scores** for quality filtering
- **Paginate conflict resolution** for large libraries

---

## 🛡️ Best Practices

### **Data Quality:**

1. **Always validate** external API responses before import
2. **Set appropriate confidence** scores based on source reliability
3. **Preserve original data** in source-specific fields
4. **Log metadata changes** for audit trails

### **Conflict Management:**

1. **Prefer user overrides** for disputed fields
2. **Use consensus** for aggregatable fields (creators, characters)
3. **Maintain provenance** links to original sources
4. **Provide clear UI** for conflict resolution

### **Performance:**

1. **Re-resolve metadata** only when sources change
2. **Cache frequently accessed** resolved metadata
3. **Batch operations** for bulk imports
4. **Use appropriate indexes** for common queries

---

This canonical metadata model combines provenance tracking, confidence scoring, and configurable conflict resolution to manage comic book metadata drawn from multiple sources.
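
As a rough illustration of how the `highest_confidence`, `highest_rank`, and `most_recent` strategies could choose between stored values, here is a minimal sketch. It is not the code in `utils/metadata-resolver.utils.ts`: the `Candidate`, `ResolveOptions`, and `resolveField` names are hypothetical, and the entry shape is assumed from the canonical storage example in this guide.

```typescript
// Hypothetical shapes modeled on the storage example in this guide; the real
// definitions live in models/canonical-comic.types.ts and may differ.
type Source =
  | "USER_MANUAL" | "COMICINFO_XML" | "COMICVINE" | "METRON"
  | "GCD" | "LOCG" | "LOCAL_FILE";

interface Candidate<T> {
  value: T;
  source: Source;
  confidence: number; // 0..1
  rank: number;       // 1 = highest priority
  timestamp: string;  // ISO 8601
}

type Strategy = "highest_confidence" | "highest_rank" | "most_recent";

interface ResolveOptions {
  strategy: Strategy;
  minimumConfidence?: number;
  allowedSources?: Source[];
}

// Filters candidates by confidence and source, then picks the best remaining
// one under the chosen strategy. Returns undefined when nothing is eligible.
function resolveField<T>(
  candidates: Candidate<T>[],
  options: ResolveOptions
): Candidate<T> | undefined {
  const { strategy, minimumConfidence = 0, allowedSources } = options;
  const eligible = candidates.filter(
    (c) =>
      c.confidence >= minimumConfidence &&
      (!allowedSources || allowedSources.some((s) => s === c.source))
  );
  if (eligible.length === 0) return undefined;

  // For each strategy: does candidate `a` beat candidate `b`?
  const better: Record<Strategy, (a: Candidate<T>, b: Candidate<T>) => boolean> = {
    highest_confidence: (a, b) => a.confidence > b.confidence,
    highest_rank: (a, b) => a.rank < b.rank, // lower number = higher rank
    most_recent: (a, b) => Date.parse(a.timestamp) > Date.parse(b.timestamp),
  };
  return eligible.reduce((best, c) => (better[strategy](c, best) ? c : best));
}

// The two title entries from the storage example.
const titleCandidates: Candidate<string>[] = [
  { value: "Amazing Spider-Man #1", source: "COMICVINE", confidence: 0.9, rank: 3, timestamp: "2023-01-15T10:00:00Z" },
  { value: "Amazing Spider-Man", source: "COMICINFO_XML", confidence: 0.8, rank: 2, timestamp: "2023-01-15T09:00:00Z" },
];

const byConfidence = resolveField(titleCandidates, { strategy: "highest_confidence" });
const byRank = resolveField(titleCandidates, { strategy: "highest_rank" });
const byRecency = resolveField(titleCandidates, { strategy: "most_recent" });
const tooStrict = resolveField(titleCandidates, { strategy: "highest_rank", minimumConfidence: 0.95 });
```

With the sample data, `highest_confidence` and `most_recent` both select the ComicVine title, while `highest_rank` selects the ComicInfo.XML title because rank 2 outranks rank 3. The strategy choice therefore changes the resolved value even for a two-source conflict, which is one reason to set it per field.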