
Editor's Note

exceltable-in-ppt__artifact-generation

[Common wrapper] You are revising a Task skill, not directly solving the task. Use the provided Meta schema as class-level guidance. Your goal is to improve the Task skill for this task while preserving useful existing structure.

Install

npx skills add https://github.com/jackal092927/skillX --skill exceltable-in-ppt__artifact-generation

SKILL.md

[Meta schema block]
Category: artifact-generation
Semantic intent: Produce a constrained artifact that must exactly satisfy an output contract.

Emphasize:

exact output schema compliance - verify each required field, format constraint, and structural requirement before finalization
preservation constraints and no-invention discipline
missing-input detection with explicit user query before attempting to populate or finalize incomplete artifacts
final contract verification pass - checklist each artifact requirement, flag any deviation or omission
field-level artifact completeness verification - every required field present and valid before submission
explicit artifact contract checklist derived from output requirements
deterministic field population from inputs - computation only to satisfy schema, not to analyze or optimize

Avoid:

multi-stage scaffolding when the task is a direct input-to-artifact transformation with no intermediate analytical steps
speculative filling of unknown fields
tool usage that is not required by the artifact contract
computational optimization or analysis beyond what is strictly required to populate artifact fields

Expected good fit:

form filling
formatting / conversion with strict output expectations
single-artifact generation or transformation
deterministic field population where inputs map directly to output fields

Expected bad fit:

benchmark-gated code repair
control/simulation loops
deep retrieval-heavy synthesis
tasks where computation or optimization is required to determine what content belongs in the artifact
multi-stage analytical pipelines where the artifact is the terminal output of a computational workflow
tasks where analytic-pipeline's workflow orchestration bias is more central than artifact contract satisfaction

Hypothesized primary failure modes:
unavailable in prompt-bank-v0.1; use task-local evidence instead

Meta schema seed guidance: You are revising a skill for an artifact-generation task. Schema intent: Produce a constrained artifact that must exactly satisfy an output contract.

Outer-loop update mode: differentiating. Keep the Render layer fixed and change only schema-level guidance.

Prioritize:

exact output schema compliance - verify each required field, format constraint, and structural requirement before finalization
preservation constraints and no-invention discipline
missing-input detection with explicit user query before attempting to populate or finalize incomplete artifacts
final contract verification pass - checklist each artifact requirement, flag any deviation or omission
field-level artifact completeness verification - every required field present and valid before submission
explicit artifact contract checklist derived from output requirements
deterministic field population from inputs - computation only to satisfy schema, not to analyze or optimize

Avoid:

multi-stage scaffolding when the task is a direct input-to-artifact transformation with no intermediate analytical steps
speculative filling of unknown fields
tool usage that is not required by the artifact contract
computational optimization or analysis beyond what is strictly required to populate artifact fields

Good fit when:

form filling
formatting / conversion with strict output expectations
single-artifact generation or transformation
deterministic field population where inputs map directly to output fields

Bad fit when:

benchmark-gated code repair
control/simulation loops
deep retrieval-heavy synthesis
tasks where computation or optimization is required to determine what content belongs in the artifact
multi-stage analytical pipelines where the artifact is the terminal output of a computational workflow
tasks where analytic-pipeline's workflow orchestration bias is more central than artifact contract satisfaction

Primary failure modes to guard against:

insufficient artifact-contract-first emphasis causes losses to analytic-pipeline on pure form-filling tasks (evidenced by court-form-filling 68.75pp gap)
schema boundary ambiguity with analytic-pipeline on tasks requiring both artifact production and computational steps

Regenerate task-specific skill guidance from these slots; do not invent a new policy outside them.

[Task context block]
Task name: exceltable-in-ppt
Task summary: You are working on a pptx file /root/input.pptx for financial reporting. The file contains an embedded Excel table of currency rates, where each value corresponds to the exchange rate from the row currency to the column currency and some cells contain formulas. There is a text box next to the table that provides an updated exchange rate for one currency pair. You need to:

Task constraints:

seed schema prior: artifact-generation
verifier mode: deterministic-artifact-plus-stage-check
workflow topology: staged-multi-step
tool surface regime: tool-medium
primary pattern: pipeline
annotation confidence: high
secondary patterns: tool-wrapper, reviewer

Task output requirements:
verifier note: deterministic-artifact-plus-stage-check
current skill count: 2

[Current Task skill block] Current Task skill:

pptx

name: pptx
description: "Presentation creation, editing, and analysis. When Claude needs to work with presentations (.pptx files) for: (1) Creating new presentations, (2) Modifying or editing content, (3) Working with layouts, (4) Adding comments or speaker notes, or any other presentation tasks"
license: Proprietary. LICENSE.txt has complete terms

PPTX creation, editing, and analysis

Overview

A user may ask you to create, edit, or analyze the contents of a .pptx file. A .pptx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks.
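Because a .pptx is a ZIP archive, its parts can be listed with Python's standard `zipfile` module. The following is a minimal sketch of that idea; the helper names `list_parts` and `build_demo_archive` are hypothetical, and the demo archive is a tiny in-memory stand-in, not a valid presentation.

```python
import io
import zipfile


def list_parts(pptx_bytes: bytes) -> list[str]:
    """Return the sorted part names stored inside a .pptx (ZIP) archive."""
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as archive:
        return sorted(archive.namelist())


def build_demo_archive() -> bytes:
    """Build a tiny stand-in archive in memory (NOT a valid presentation)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("ppt/presentation.xml", "<p:presentation/>")
        archive.writestr("ppt/slides/slide1.xml", "<p:sld/>")
    return buffer.getvalue()


parts = list_parts(build_demo_archive())
print(parts)
```

For a real file, reading it with `open(path, "rb").read()` and passing the bytes to `list_parts` should give the same kind of listing.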

Reading and analyzing content

Text extraction

If you just need to read the text contents of a presentation, you should convert the document to markdown:

# Convert document to markdown
python -m markitdown path-to-file.pptx

Raw XML access

You need raw XML access for: comments, speaker notes, slide layouts, animations, design elements, and complex formatting. For any of these features, you'll need to unpack a presentation and read its raw XML contents.

Unpacking a file

python ooxml/scripts/unpack.py <office_file> <output_dir>

Note: The unpack.py script is located at skills/pptx/ooxml/scripts/unpack.py relative to the project root. If the script doesn't exist at this path, use find . -name "unpack.py" to locate it.

Key file structures

ppt/presentation.xml - Main presentation metadata and slide references
ppt/slides/slide{N}.xml - Individual slide contents (slide1.xml, slide2.xml, etc.)
ppt/notesSlides/notesSlide{N}.xml - Speaker notes for each slide
ppt/comments/modernComment_*.xml - Comments for specific slides
ppt/slideLayouts/ - Layout templates for slides
ppt/slideMasters/ - Master slide templates
ppt/theme/ - Theme and styling information
ppt/media/ - Images and other media files
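As a rough illustration of the naming pattern above, the sketch below picks out the slide parts from a list of archive member names and sorts them numerically, so that slide10.xml follows slide2.xml. The function name `slide_parts_in_order` is hypothetical. Note that the file number is not guaranteed to match the on-screen slide order, which is defined by the slide references in ppt/presentation.xml.

```python
import re

# Matches ppt/slides/slide{N}.xml but not relationship files under _rels/.
SLIDE_PART = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def slide_parts_in_order(part_names):
    """Return slide part names sorted by their numeric suffix."""
    numbered = []
    for name in part_names:
        match = SLIDE_PART.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


names = [
    "ppt/slides/slide10.xml",
    "ppt/slides/slide2.xml",
    "ppt/slides/_rels/slide2.xml.rels",
    "ppt/presentation.xml",
    "ppt/slides/slide1.xml",
]
ordered = slide_parts_in_order(names)
print(ordered)
```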

Typography and color extraction

When given an example design to emulate: Always analyze the presentation's typography and colors first using the methods below:

Read theme file: Check ppt/theme/theme1.xml for colors (<a:clrScheme>) and fonts (<a:fontScheme>)
Sample slide content: Examine ppt/slides/slide1.xml for actual font usage (<a:rPr>) and colors
Search for patterns: Use grep to find color (<a:solidFill>, <a:srgbClr>) and font references across all XML files
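The theme lookup described above could be sketched as follows. This is an illustrative example only: `SAMPLE_THEME` is a hand-written fragment rather than a real theme file, `theme_summary` is a hypothetical helper, and the standard-library parser is used to keep the sketch dependency-free (defusedxml, listed under Dependencies, could be swapped in for untrusted input).

```python
import xml.etree.ElementTree as ET

# DrawingML namespace used by the a: prefix in theme and slide XML.
A_NS = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}

# Hand-written stand-in for ppt/theme/theme1.xml (assumption, heavily trimmed).
SAMPLE_THEME = """<a:theme xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:themeElements>
    <a:clrScheme name="Sample">
      <a:dk2><a:srgbClr val="1C2833"/></a:dk2>
      <a:accent1><a:srgbClr val="5EA8A7"/></a:accent1>
    </a:clrScheme>
    <a:fontScheme name="Sample">
      <a:majorFont><a:latin typeface="Georgia"/></a:majorFont>
      <a:minorFont><a:latin typeface="Arial"/></a:minorFont>
    </a:fontScheme>
  </a:themeElements>
</a:theme>"""


def theme_summary(theme_xml):
    """Return (colors, fonts) found in <a:clrScheme> and <a:fontScheme>."""
    root = ET.fromstring(theme_xml)
    colors = {}
    scheme = root.find(".//a:clrScheme", A_NS)
    if scheme is not None:
        for slot in scheme:
            rgb = slot.find("a:srgbClr", A_NS)
            if rgb is not None:
                # slot.tag looks like "{namespace}accent1"; keep the local name.
                colors[slot.tag.split("}")[1]] = rgb.get("val")
    fonts = {}
    for role in ("majorFont", "minorFont"):
        latin = root.find(f".//a:fontScheme/a:{role}/a:latin", A_NS)
        if latin is not None:
            fonts[role] = latin.get("typeface")
    return colors, fonts


colors, fonts = theme_summary(SAMPLE_THEME)
print(colors, fonts)
```

Color slots that use `<a:sysClr>` instead of `<a:srgbClr>` are skipped by this sketch.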

Creating a new PowerPoint presentation without a template

When creating a new PowerPoint presentation from scratch, use the html2pptx workflow to convert HTML slides to PowerPoint with accurate positioning.

Design Principles

CRITICAL: Before creating any presentation, analyze the content and choose appropriate design elements:

Consider the subject matter: What is this presentation about? What tone, industry, or mood does it suggest?
Check for branding: If the user mentions a company/organization, consider their brand colors and identity
Match palette to content: Select colors that reflect the subject
State your approach: Explain your design choices before writing code

Requirements:

✅ State your content-informed design approach BEFORE writing code
✅ Use web-safe fonts only: Arial, Helvetica, Times New Roman, Georgia, Courier New, Verdana, Tahoma, Trebuchet MS, Impact
✅ Create clear visual hierarchy through size, weight, and color
✅ Ensure readability: strong contrast, appropriately sized text, clean alignment
✅ Be consistent: repeat patterns, spacing, and visual language across slides

Color Palette Selection

Choosing colors creatively:

Think beyond defaults: What colors genuinely match this specific topic? Avoid autopilot choices.
Consider multiple angles: Topic, industry, mood, energy level, target audience, brand identity (if mentioned)
Be adventurous: Try unexpected combinations - a healthcare presentation doesn't have to be green, finance doesn't have to be navy
Build your palette: Pick 3-5 colors that work together (dominant colors + supporting tones + accent)
Ensure contrast: Text must be clearly readable on backgrounds
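One way to make the contrast check concrete is the WCAG 2.x contrast ratio, sketched below. This is an outside reference, not something the skill prescribes; the 4.5:1 threshold mentioned in the comment is the WCAG AA value for normal text, and the function names are hypothetical.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color such as '#1C2833'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0."""
    first = relative_luminance(foreground)
    second = relative_luminance(background)
    lighter, darker = max(first, second), min(first, second)
    return (lighter + 0.05) / (darker + 0.05)


# WCAG AA asks for at least 4.5:1 for normal-size text (external guideline).
ratio = contrast_ratio("#1C2833", "#F4F6F6")
print(f"{ratio:.2f}", "passes AA" if ratio >= 4.5 else "fails AA")
```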

Example color palettes (use these to spark creativity - choose one, adapt it, or create your own):

Classic Blue: Deep navy (#1C2833), slate gray (#2E4053), silver (#AAB7B8), off-white (#F4F6F6)
Teal & Coral: Teal (#5EA8A7), deep teal (#277884), coral (#FE4447), white (#FFFFFF)
Bold Red: Red (#C0392B), bright red (#E74C3C), orange (#F39C12), yellow (#F1C40F), green (#2ECC71)
Warm Blush: Mauve (#A49393), blush (#EED6D3), rose (#E8B4B8), cream (#FAF7F2)
Burgundy Luxury: Burgundy (#5D1D2E), crimson (#951233), rust (#C15937), gold (#997929)
Deep Purple & Emerald: Purple (#B165FB), dark blue (#181B24), emerald (#40695B), white (#FFFFFF)
Cream & Forest Green: Cream (#FFE1C7), forest green (#40695B), white (#FCFCFC)
Pink & Purple: Pink (#F8275B), coral (#FF574A), rose (#FF737D), purple (#3D2F68)
Lime & Plum: Lime (#C5DE82), plum (#7C3A5F), coral (#FD8C6E), blue-gray (#98ACB5)
Black & Gold: Gold (#BF9A4A), black (#000000), cream (#F4F6F6)
Sage & Terracotta: Sage (#87A96B), terracotta (#E07A5F), cream (#F4F1DE), charcoal (#2C2C2C)
Charcoal & Red: Charcoal (#292929), red (#E33737), light gray (#CCCBCB)
Vibrant Orange: Orange (#F96D00), light gray (#F2F2F2), charcoal (#222831)
Forest Green: Black (#191A19), green (#4E9F3D), dark green (#1E5128), white (#FFFFFF)
Retro Rainbow: Purple (#722880), pink (#D72D51), orange (#EB5C18), amber (#F08800), gold (#DEB600)
Vintage Earthy: Mustard (#E3B448), sage (#CBD18F), forest green (#3A6B35), cream (#F4F1DE)
Coastal Rose: Old rose (#AD7670), beaver (#B49886), eggshell (#F3ECDC), ash gray (#BFD5BE)
Orange & Turquoise: Light orange (#FC993E), grayish turquoise (#667C6F), white (#FCFCFC)

Visual Details Options

Geometric Patterns:

Diagonal section dividers instead of horizontal
Asymmetric column widths (30/70, 40/60, 25/75)
Rotated text headers at 90° or 270°
Circular/hexagonal frames for images
Triangular accent shapes in corners
Overlapping shapes for depth

Border & Frame Treatments:

Thick single-color borders (10-20pt) on one side only
Double-line borders with contrasting colors
Corner brackets instead of full frames
L-shaped borders (top+left or bottom+right)
Underline accents beneath headers (3-5pt thick)

Typography Treatments:

Extreme size contrast (72pt headlines vs 11pt body)
All-caps headers with wide letter spacing
Numbered sections in oversized display type
Monospace (Courier New) for data/stats/technical content
Condensed fonts (Arial Narrow) for dense information
Outlined text for emphasis

Chart & Data Styling:

Monochrome charts with single accent color for key data
Horizontal bar charts instead of vertical
Dot plots instead of bar charts
Minimal gridlines or none at all
Data labels directly on elements (no legends)
Oversized numbers for key metrics

Layout Innovations:

Full-bleed images with text overlays
Sidebar column (20-30% width) for navigation/context
Modular grid systems (3×3, 4×4 blocks)
Z-pattern or F-pattern content flow
Floating text boxes over colored shapes
Magazine-style multi-column layouts

Background Treatments:

Solid color blocks occupying 40-60% of slide
Gradient fills (vertical or diagonal only)
Split backgrounds (two colors, diagonal or vertical)
Edge-to-edge color bands
Negative space as a design element

Layout Tips

When creating slides with charts or tables:

Two-column layout (PREFERRED): Use a header spanning the full width, then two columns below - text/bullets in one column and the featured content in the other. This provides better balance and makes charts/tables more readable. Use flexbox with unequal column widths (e.g., 40%/60% split) to optimize space for each content type.
Full-slide layout: Let the featured content (chart/table) take up the entire slide for maximum impact and readability
NEVER vertically stack: Do not place charts/tables below text in a single column - this causes poor readability and layout issues

Workflow

MANDATORY - READ ENTIRE FILE: Read html2pptx.md completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with presentation creation.
Create an HTML file for each slide with proper dimensions (e.g., 720pt × 405pt for 16:9)
- Use <p>, <h1>-<h6>, <ul>, <ol> for all text content
- Use class="placeholder" for areas where charts/tables will be added (render with gray background for visibility)
- CRITICAL: Rasterize gradients and icons as PNG images FIRST using Sharp, then reference in HTML
- LAYOUT: For slides with charts/tables/images, use either full-slide layout or two-column layout for better readability
Create and run a JavaScript file using the html2pptx.js library to convert HTML slides to PowerPoint and save the presentation
- Use the html2pptx() function to process each HTML file
- Add charts and tables to placeholder areas using PptxGenJS API
- Save the presentation using pptx.writeFile()
Visual validation: Generate thumbnails and inspect for layout issues
- Create thumbnail grid: python scripts/thumbnail.py output.pptx workspace/thumbnails --cols 4
- Read and carefully examine the thumbnail image for:
  - Text cutoff: Text being cut off by header bars, shapes, or slide edges
  - Text overlap: Text overlapping with other text or shapes
  - Positioning issues: Content too close to slide boundaries or other elements
  - Contrast issues: Insufficient contrast between text and backgrounds
- If issues found, adjust HTML margins/spacing/colors and regenerate the presentation
- Repeat until all slides are visually correct

Editing an existing PowerPoint presentation

When editing slides in an existing PowerPoint presentation, you need to work with the raw Office Open XML (OOXML) format. This involves unpacking the .pptx file, editing the XML content, and repacking it.

Workflow

MANDATORY - READ ENTIRE FILE: Read ooxml.md (~500 lines) completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed guidance on OOXML structure and editing workflows before any presentation editing.
Unpack the presentation: python ooxml/scripts/unpack.py <office_file> <output_dir>
Edit the XML files (primarily ppt/slides/slide{N}.xml and related files)
CRITICAL: Validate immediately after each edit and fix any validation errors before proceeding: python ooxml/scripts/validate.py <dir> --original <file>
Pack the final presentation: python ooxml/scripts/pack.py <input_directory> <office_file>
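To illustrate the unpack, edit, and repack cycle in miniature, here is a sketch that uses only the standard `zipfile` module. It is a stand-in for the idea, not a description of what unpack.py, validate.py, or pack.py actually do; `rewrite_part` is a hypothetical helper, the demo package is not a valid presentation, and the byte-level replace is for illustration only (a real edit should go through the XML and the validation step above).

```python
import io
import zipfile


def rewrite_part(package: bytes, part_name: str, transform) -> bytes:
    """Copy every part of a ZIP package, passing one part through `transform`."""
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as source:
        if part_name not in source.namelist():
            raise KeyError(f"{part_name} is not in the package")
        with zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as target:
            for info in source.infolist():
                data = source.read(info.filename)
                if info.filename == part_name:
                    data = transform(data)
                # All other parts are copied through unchanged.
                target.writestr(info.filename, data)
    return out_buffer.getvalue()


# Tiny stand-in package built in memory (NOT a valid presentation).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("ppt/slides/slide1.xml", "<a:t>Old rate</a:t>")
    archive.writestr("ppt/media/image1.png", b"\x89PNG")

edited = rewrite_part(
    demo.getvalue(),
    "ppt/slides/slide1.xml",
    lambda xml: xml.replace(b"Old rate", b"New rate"),
)
with zipfile.ZipFile(io.BytesIO(edited)) as check:
    slide_xml = check.read("ppt/slides/slide1.xml")
    media = check.read("ppt/media/image1.png")
print(slide_xml)
```

The point of the design is that only the targeted part changes; every other part is carried over byte for byte.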

Creating a new PowerPoint presentation using a template

When you need to create a presentation that follows an existing template's design, you'll need to duplicate and rearrange template slides and then replace the placeholder content.

Workflow

Extract template text AND create visual thumbnail grid:
- Extract text: python -m markitdown template.pptx > template-content.md
- Read template-content.md: Read the entire file to understand the contents of the template presentation. NEVER set any range limits when reading this file.
- Create thumbnail grids: python scripts/thumbnail.py template.pptx
- See Creating Thumbnail Grids section for more details
Analyze template and save inventory to a file:
- Visual Analysis: Review thumbnail grid(s) to understand slide layouts, design patterns, and visual structure
- Create and save a template inventory file at template-inventory.md containing:
```
# Template Inventory Analysis
**Total Slides: [count]**
**IMPORTANT: Slides are 0-indexed (first slide = 0, last slide = count-1)**

## [Category Name]
- Slide 0: [Layout code if available] - Description/purpose
- Slide 1: [Layout code] - Description/purpose
- Slide 2: [Layout code] - Description/purpose
[... EVERY slide must be listed individually with its index ...]
```
- Using the thumbnail grid: Reference the visual thumbnails to identify:
  - Layout patterns (title slides, content layouts, section dividers)
  - Image placeholder locations and counts
  - Design consistency across slide groups
  - Visual hierarchy and structure
- This inventory file is REQUIRED for selecting appropriate templates in the next step
Create presentation outline based on template inventory:
- Review available templates from step 2.
- Choose an intro or title template for the first slide. This should be one of the first templates.
- Choose safe, text-based layouts for the other slides.
- CRITICAL: Match layout structure to actual content:
  - Single-column layouts: Use for unified narrative or single topic
  - Two-column layouts: Use ONLY when you have exactly 2 distinct items/concepts
  - Three-column layouts: Use ONLY when you have exactly 3 distinct items/concepts
  - Image + text layouts: Use ONLY when you have actual images to insert
  - Quote layouts: Use ONLY for actual quotes from people (with attribution), never for emphasis
  - Never use layouts with more placeholders than you have content
  - If you have 2 items, don't force them into a 3-column layout
  - If you have 4+ items, consider breaking into multiple slides or using a list format
- Count your actual content pieces BEFORE selecting the layout
- Verify each placeholder in the chosen layout will be filled with meaningful content
- Select one option representing the best layout for each content section.
- Save outline.md with content AND template mapping that leverages available designs
- Example template mapping:
```
# Template slides to use (0-based indexing)
# WARNING: Verify indices are within range! Template with 73 slides has indices 0-72
# Mapping: slide numbers from outline -> template slide indices
template_mapping = [
    0,   # Use slide 0 (Title/Cover)
    34,  # Use slide 34 (B1: Title and body)
    34,  # Use slide 34 again (duplicate for second B1)
    50,  # Use slide 50 (E1: Quote)
    54,  # Use slide 54 (F2: Closing + Text)
]
```
Duplicate, reorder, and delete slides using rearrange.py:
- Use the scripts/rearrange.py script to create a new presentation with slides in the desired order:
```
python scripts/rearrange.py template.pptx working.pptx 0,34,34,50,54
```
- The script handles duplicating repeated slides, deleting unused slides, and reordering automatically
- Slide indices are 0-based (first slide is 0, second is 1, etc.)
- The same slide index can appear multiple times to duplicate that slide
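The range warning in the template mapping can be checked before calling the script. The sketch below is a hypothetical pre-check, not part of rearrange.py: it verifies that every 0-based index is within the template's slide count and builds the comma-separated argument.

```python
def rearrange_argument(template_mapping, slide_count):
    """Validate 0-based slide indices and join them for the command line."""
    out_of_range = [i for i in template_mapping if not 0 <= i < slide_count]
    if out_of_range:
        raise ValueError(
            f"indices {out_of_range} are outside 0-{slide_count - 1}"
        )
    return ",".join(str(i) for i in template_mapping)


# Mapping from the outline example; 73 slides means valid indices are 0-72.
argument = rearrange_argument([0, 34, 34, 50, 54], slide_count=73)
print(f"python scripts/rearrange.py template.pptx working.pptx {argument}")
```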

Extract ALL text using the inventory.py script:

Run inventory extraction:

python scripts/inventory.py working.pptx text-inventory.json

Read text-inventory.json: Read the entire text-inventory.json file to understand all shapes and their properties. NEVER set any range limits when reading this file.

The inventory JSON structure:

  {
    "slide-0": {
      "shape-0": {
        "placeholder_type": "TITLE",  // or null for non-placeholders
        "left": 1.5,                  // position in inches
        "top": 2.0,
        "width": 7.5,
        "height": 1.2,
        "paragraphs": [
          {
            "text": "Paragraph text",
            // Optional properties (only included when non-default):
            "bullet": true,           // explicit bullet detected
            "level": 0,               // only included when bullet is true
            "alignment": "CENTER",    // CENTER, RIGHT (not LEFT)
            "space_before": 10.0,     // space before paragraph in points
            "space_after": 6.0,       // space after paragraph in points
            "line_spacing": 22.4,     // line spacing in points
            "font_name": "Arial",     // from first run
            "font_size": 14.0,        // in points
            "bold": true,
            "italic": false,
            "underline": false,
            "color": "FF0000"         // RGB color
          }
        ]
      }
    }
  }

Key features:
- Slides: Named as "slide-0", "slide-1", etc.
- Shapes: Ordered by visual position (top-to-bottom, left-to-right) as "shape-0", "shape-1", etc.
- Placeholder types: TITLE, CENTER_TITLE, SUBTITLE, BODY, OBJECT, or null
- Default font size: default_font_size in points extracted from layout placeholders (when available)
- Slide numbers are filtered: Shapes with SLIDE_NUMBER placeholder type are automatically excluded from inventory
- Bullets: When bullet: true, level is always included (even if 0)
- Spacing: space_before, space_after, and line_spacing in points (only included when set)
- Colors: color for RGB (e.g., "FF0000"), theme_color for theme colors (e.g., "DARK_1")
- Properties: Only non-default values are included in the output
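A quick way to get an overview of the inventory is to flatten it into one row per shape. The following sketch assumes the file on disk is plain JSON with the structure shown above (without the explanatory comments); `SAMPLE_INVENTORY` and `summarize_inventory` are illustrative names, not part of inventory.py.

```python
import json

# Hand-written sample following the documented structure (assumption).
SAMPLE_INVENTORY = json.loads("""
{
  "slide-0": {
    "shape-0": {
      "placeholder_type": "TITLE",
      "left": 1.5, "top": 2.0, "width": 7.5, "height": 1.2,
      "paragraphs": [{"text": "Paragraph text", "bold": true}]
    },
    "shape-1": {
      "placeholder_type": null,
      "left": 1.5, "top": 3.5, "width": 7.5, "height": 2.0,
      "paragraphs": [
        {"text": "First point", "bullet": true, "level": 0},
        {"text": "Second point", "bullet": true, "level": 0}
      ]
    }
  }
}
""")


def summarize_inventory(inventory):
    """Return (slide, shape, placeholder_type, joined text) for every shape."""
    rows = []
    for slide_id, shapes in inventory.items():
        for shape_id, shape in shapes.items():
            text = " / ".join(p["text"] for p in shape.get("paragraphs", []))
            rows.append((slide_id, shape_id, shape.get("placeholder_type"), text))
    return rows


rows = summarize_inventory(SAMPLE_INVENTORY)
for row in rows:
    print(row)
```

With a real file, `json.load(open("text-inventory.json"))` would take the place of the inline sample.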

Generate replacement text and save the data to a JSON file. Based on the text inventory from the previous step:
- CRITICAL: First verify which shapes exist in the inventory - only reference shapes that are actually present
- VALIDATION: The replace.py script will validate that all shapes in your replacement JSON exist in the inventory
  - If you reference a non-existent shape, you'll get an error showing available shapes
  - If you reference a non-existent slide, you'll get an error indicating the slide doesn't exist
  - All validation errors are shown at once before the script exits
- IMPORTANT: The replace.py script uses inventory.py internally to identify ALL text shapes
- AUTOMATIC CLEARING: ALL text shapes from the inventory will be cleared unless you provide "paragraphs" for them
- Add a "paragraphs" field to shapes that need content (not "replacement_paragraphs")
- Shapes without "paragraphs" in the replacement JSON will have their text cleared automatically
- Paragraphs with bullets will be automatically left aligned. Don't set the alignment property when "bullet": true
- Generate appropriate replacement content for placeholder text
- Use shape size to determine appropriate content length
- CRITICAL: Include paragraph properties from the original inventory - don't just provide text
- IMPORTANT: When bullet: true, do NOT include bullet symbols (•, -, *) in text - they're added automatically
- ESSENTIAL FORMATTING RULES:
  - Headers/titles should typically have "bold": true
  - List items should have "bullet": true, "level": 0 (level is required when bullet is true)
  - Preserve any alignment properties (e.g., "alignment": "CENTER" for centered text)
  - Include font properties when different from default (e.g., "font_size": 14.0, "font_name": "Lora")
  - Colors: Use "color": "FF0000" for RGB or "theme_color": "DARK_1" for theme colors
  - The replacement script expects properly formatted paragraphs, not just text strings
  - Overlapping shapes: Prefer shapes with larger default_font_size or more appropriate placeholder_type
- Save the updated inventory with replacements to replacement-text.json
- WARNING: Different template layouts have different shape counts - always check the actual inventory before creating replacements
Example paragraphs field showing proper formatting:
```
"paragraphs": [
  {
    "text": "New presentation title text",
    "alignment": "CENTER",
    "bold": true
  },
  {
    "text": "Section Header",
    "bold": true
  },
  {
    "text": "First bullet point without bullet symbol",
    "bullet": true,
    "level": 0
  },
  {
    "text": "Red colored text",
    "color": "FF0000"
  },
  {
    "text": "Theme colored text",
    "theme_color": "DARK_1"
  },
  {
    "text": "Regular paragraph text without special formatting"
  }
]
```
Shapes not listed in the replacement JSON are automatically cleared:
```
{
  "slide-0": {
    "shape-0": {
      "paragraphs": [...] // This shape gets new text
    }
    // shape-1 and shape-2 from inventory will be cleared automatically
  }
}
```
Common formatting patterns for presentations:
- Title slides: Bold text, sometimes centered
- Section headers within slides: Bold text
- Bullet lists: Each item needs "bullet": true, "level": 0
- Body text: Usually no special properties needed
- Quotes: May have special alignment or font properties
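Several of the rules above can be checked before running the replacement script. The sketch below is a hypothetical pre-flight check, separate from the validation that replace.py performs itself: it reports unknown slides or shapes, bullet paragraphs that lack a level, set an alignment, or start with a bullet symbol, and it lists the shapes that would be cleared because no "paragraphs" were supplied. The bullet-symbol test is a simple heuristic and would also flag text that legitimately starts with "-".

```python
BULLET_SYMBOLS = ("•", "-", "*")


def preflight(inventory, replacements):
    """Return (problems, cleared) for replacements checked against an inventory."""
    problems = []
    for slide_id, shapes in replacements.items():
        if slide_id not in inventory:
            problems.append(f"Slide '{slide_id}' not found in inventory")
            continue
        for shape_id, shape in shapes.items():
            if shape_id not in inventory[slide_id]:
                problems.append(f"Shape '{shape_id}' not found on '{slide_id}'")
                continue
            for para in shape.get("paragraphs", []):
                if not para.get("bullet"):
                    continue
                if "level" not in para:
                    problems.append(f"{slide_id}/{shape_id}: bullet without level")
                if "alignment" in para:
                    problems.append(f"{slide_id}/{shape_id}: alignment set on bullet")
                if para.get("text", "").lstrip().startswith(BULLET_SYMBOLS):
                    problems.append(f"{slide_id}/{shape_id}: bullet symbol in text")
    # Inventory shapes with no "paragraphs" in the replacement get cleared.
    cleared = [
        f"{slide_id}/{shape_id}"
        for slide_id, shapes in inventory.items()
        for shape_id in shapes
        if "paragraphs" not in replacements.get(slide_id, {}).get(shape_id, {})
    ]
    return problems, cleared


inventory = {"slide-0": {"shape-0": {}, "shape-1": {}, "shape-2": {}}}
replacements = {
    "slide-0": {
        "shape-0": {"paragraphs": [{"text": "Title", "bold": True}]},
        "shape-1": {
            "paragraphs": [
                {"text": "- Point", "bullet": True, "alignment": "CENTER"}
            ]
        },
        "shape-99": {"paragraphs": []},
    }
}
problems, cleared = preflight(inventory, replacements)
print(problems)
print(cleared)
```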
Apply replacements using the replace.py script
```
python scripts/replace.py working.pptx replacement-text.json output.pptx
```
The script will:
- First extract the inventory of ALL text shapes using functions from inventory.py
- Validate that all shapes in the replacement JSON exist in the inventory
- Clear text from ALL shapes identified in the inventory
- Apply new text only to shapes with "paragraphs" defined in the replacement JSON
- Preserve formatting by applying paragraph properties from the JSON
- Handle bullets, alignment, font properties, and colors automatically
- Save the updated presentation
Example validation errors:
```
ERROR: Invalid shapes in replacement JSON:
  - Shape 'shape-99' not found on 'slide-0'. Available shapes: shape-0, shape-1, shape-4
  - Slide 'slide-999' not found in inventory
```
```
ERROR: Replacement text made overflow worse in these shapes:
  - slide-0/shape-2: overflow worsened by 1.25" (was 0.00", now 1.25")
```

Creating Thumbnail Grids

To create visual thumbnail grids of PowerPoint slides for quick analysis and reference:

python scripts/thumbnail.py template.pptx [output_prefix]

Features:

Creates: thumbnails.jpg (or thumbnails-1.jpg, thumbnails-2.jpg, etc. for large decks)
Default: 5 columns, max 30 slides per grid (5×6)
Custom prefix: python scripts/thumbnail.py template.pptx my-grid
- Note: The output prefix should include the path if you want output in a specific directory (e.g., workspace/my-grid)
Adjust columns: --cols 4 (range: 3-6, affects slides per grid)
Grid limits: 3 cols = 12 slides/grid, 4 cols = 20, 5 cols = 30, 6 cols = 42
Slides are zero-indexed (Slide 0, Slide 1, etc.)
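The grid limits listed above all fit the pattern cols × (cols + 1), which makes it easy to estimate how many grid images a deck will produce. The sketch below relies on that inferred pattern; it is an assumption drawn from the listed numbers, not a reading of thumbnail.py, and `grid_plan` is a hypothetical helper.

```python
import math


def grid_plan(slide_count, cols=5):
    """Return (slides per grid, number of grid images) for a deck."""
    if not 3 <= cols <= 6:
        raise ValueError("cols must be between 3 and 6")
    # Inferred from the documented limits: 3 -> 12, 4 -> 20, 5 -> 30, 6 -> 42.
    per_grid = cols * (cols + 1)
    grids = math.ceil(slide_count / per_grid)
    return per_grid, grids


print(grid_plan(73, cols=4))
print(grid_plan(31))
```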

Use cases:

Template analysis: Quickly understand slide layouts and design patterns
Content review: Visual overview of entire presentation
Navigation reference: Find specific slides by their visual appearance
Quality check: Verify all slides are properly formatted

Examples:

# Basic usage
python scripts/thumbnail.py presentation.pptx

# Combine options: custom name, columns
python scripts/thumbnail.py template.pptx analysis --cols 4

Converting Slides to Images

To visually analyze PowerPoint slides, convert them to images using a two-step process:

Convert PPTX to PDF:

soffice --headless --convert-to pdf template.pptx

Convert PDF pages to JPEG images:
```
pdftoppm -jpeg -r 150 template.pdf slide
```
This creates files like slide-1.jpg, slide-2.jpg, etc.

Options:

-r 150: Sets resolution to 150 DPI (adjust for quality/size balance)
-jpeg: Output JPEG format (use -png for PNG if preferred)
-f N: First page to convert (e.g., -f 2 starts from page 2)
-l N: Last page to convert (e.g., -l 5 stops at page 5)
slide: Prefix for output files

Example for specific range:

pdftoppm -jpeg -r 150 -f 2 -l 5 template.pdf slide  # Converts only pages 2-5
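When the two-step conversion is driven from a script, the commands can be assembled as argument lists. The sketch below only builds the lists and does not run anything; `conversion_commands` is a hypothetical helper, and it assumes that soffice writes the PDF into the current working directory under the presentation's base name.

```python
from pathlib import PurePosixPath


def conversion_commands(pptx_path, prefix="slide", dpi=150, first=None, last=None):
    """Build the soffice and pdftoppm argument lists without running them."""
    # Assumption: soffice writes <stem>.pdf into the current working directory.
    pdf_name = PurePosixPath(pptx_path).with_suffix(".pdf").name
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf", pptx_path]
    to_jpeg = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first is not None:
        to_jpeg += ["-f", str(first)]
    if last is not None:
        to_jpeg += ["-l", str(last)]
    to_jpeg += [pdf_name, prefix]
    return to_pdf, to_jpeg


to_pdf, to_jpeg = conversion_commands("template.pptx", first=2, last=5)
print(" ".join(to_pdf))
print(" ".join(to_jpeg))
```

Each list can then be passed to `subprocess.run(command, check=True)` on a machine where LibreOffice and Poppler are installed.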

Code Style Guidelines

IMPORTANT: When generating code for PPTX operations:

Write concise code
Avoid verbose variable names and redundant operations
Avoid unnecessary print statements

Dependencies

Required dependencies (should already be installed):

markitdown: pip install "markitdown[pptx]" (for text extraction from presentations)
pptxgenjs: npm install -g pptxgenjs (for creating presentations via html2pptx)
playwright: npm install -g playwright (for HTML rendering in html2pptx)
react-icons: npm install -g react-icons react react-dom (for icons)
sharp: npm install -g sharp (for SVG rasterization and image processing)
LibreOffice: sudo apt-get install libreoffice (for PDF conversion)
Poppler: sudo apt-get install poppler-utils (for pdftoppm to convert PDF to images)
defusedxml: pip install defusedxml (for secure XML parsing)

xlsx

name: xlsx
description: "Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas"
license: Proprietary. LICENSE.txt has complete terms

Requirements for Outputs

All Excel files

Zero Formula Errors

Every Excel model MUST be delivered with ZERO formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
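A simple scan for the error values named above might look like the following sketch. It works on a plain mapping of cell address to cached value so that it runs without any spreadsheet library; how that mapping is obtained is left open (one option, stated here as an assumption, is reading a recalculated workbook with openpyxl's `load_workbook(path, data_only=True)`). The helper name `find_formula_errors` is hypothetical.

```python
ERROR_TOKENS = ("#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?")


def find_formula_errors(cells):
    """Return the cells whose cached value is one of the Excel error tokens."""
    return {
        address: value
        for address, value in cells.items()
        if isinstance(value, str) and value.strip() in ERROR_TOKENS
    }


# Illustrative cached values keyed by cell address (made-up sample data).
sample_cells = {"B2": 1.0, "B3": "#DIV/0!", "C2": "USD", "C3": "#REF!"}
errors = find_formula_errors(sample_cells)
print(errors)
```

An empty result is the condition the requirement asks for.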

Preserve Existing Templates (when updating templates)

Study and EXACTLY match existing format, style, and conventions when modifying files
Never impose standardized formatting on files with established patterns
Existing template conventions ALWAYS override these guidelines

Financial models

Color Coding Standards

Unless otherwise stated by the user or existing template

Industry-Standard Color Conventions

Blue text (RGB: 0,0,255): Hardcoded inputs, and numbers users will change for scenarios
Black text (RGB: 0,0,0): ALL formulas and calculations
Green text (RGB: 0,128,0): Links pulling from other worksheets within same workbook
Red text (RGB: 255,0,0): External links to other files
Yellow background (RGB: 255,255,0): Key assumptions needing attention or cells that need to be updated

Number Formatting Standards

Required Format Rules

Years: Format as text strings (e.g., "2024" not "2,024")
Currency: Use $#,##0 format; ALWAYS specify units in headers ("Revenue ($mm)")
Zeros: Use number formatting to make all zeros "-", including percentages (e.g., "$#,##0;($#,##0);-")
Percentages: Default to 0.0% format (one decimal)
Multiples: Format as 0.0x for valuation multiples (EV/EBITDA, P/E)
Negative numbers: Use parentheses (123) not minus -123
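The three-section format string "$#,##0;($#,##0);-" applies its first section to positive values, the second to negative values, and the third to zero. The sketch below mimics that behavior in plain Python to make the rules concrete; it is an approximation for illustration, not Excel's formatting engine, and `format_currency` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_currency(value):
    """Approximate the display of "$#,##0;($#,##0);-" for a number."""
    amount = Decimal(str(value))
    if amount == 0:
        return "-"  # third section: zeros shown as a dash
    # Round the magnitude half-up to a whole number, then add separators.
    whole = abs(amount).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    text = f"${int(whole):,}"
    # Second section: negatives in parentheses rather than with a minus sign.
    return text if amount > 0 else f"({text})"


for sample in (1234.5, -1234, 0):
    print(sample, "->", format_currency(sample))
```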

Formula Construction Rules

Assumptions Placement

Place ALL assumptions (growth rates, margins, multiples, etc.) in separate assumption cells
Use cell references instead of hardcoded values in formulas
Example: Use =B5*(1+$B$6) instead of =B5*1.05

Formula Error Prevention

Verify all cell references are correct
Check for off-by-one errors in ranges
Ensure consistent formulas across all projection periods
Test with edge cases (zero values, negative numbers)
Verify no unintended circular references

Documentation Requirements for Hardcodes

Record the source of each hardcode in a cell comment, or in an adjacent cell when the value sits at the end of a table. Format: "Source: [System/Document], [Date], [Specific Reference], [URL if applicable]"
Examples:
- "Source: Company 10-K, FY2024, Page 45, Revenue Note, [SEC EDGAR URL]"
- "Source: Company 10-Q, Q2 2025, Exhibit 99.1, [SEC EDGAR URL]"
- "Source: Bloomberg Terminal, 8/15/2025, AAPL US Equity"
- "Source: FactSet, 8/20/2025, Consensus Estimates Screen"

XLSX creation, editing, and analysis

Overview

A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.

Important Requirements

LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the recalc.py script. The script automatically configures LibreOffice on first run.

Reading and analyzing data

Data analysis with pandas

For data analysis, visualization, and basic operations, use pandas, which provides powerful data manipulation capabilities:

import pandas as pd

# Read Excel
df = pd.read_excel('file.xlsx')  # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets as dict

# Analyze
df.head()      # Preview data
df.info()      # Column info
df.describe()  # Statistics

# Write Excel
df.to_excel('output.xlsx', index=False)

Excel File Workflows

CRITICAL: Use Formulas, Not Hardcoded Values

Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.

❌ WRONG - Hardcoding Calculated Values

# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total  # Hardcodes 5000

# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth  # Hardcodes 0.15

# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg  # Hardcodes 42.5

✅ CORRECT - Using Excel Formulas

# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'

# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'

# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'

This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.

Common Workflow

Choose tool: pandas for data, openpyxl for formulas/formatting
Create/Load: Create new workbook or load existing file
Modify: Add/edit data, formulas, and formatting
Save: Write to file
Recalculate formulas (MANDATORY IF USING FORMULAS): Use the recalc.py script
```
python recalc.py output.xlsx
```
Verify and fix any errors:
- The script returns JSON with error details
- If status is errors_found, check error_summary for specific error types and locations
- Fix the identified errors and recalculate again
- Common errors to fix:
  - #REF!: Invalid cell references
  - #DIV/0!: Division by zero
  - #VALUE!: Wrong data type in formula
  - #NAME?: Unrecognized formula name

Creating new Excel files

# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
sheet = wb.active

# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])

# Add formula
sheet['B2'] = '=SUM(A1:A10)'

# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')

# Column width
sheet.column_dimensions['A'].width = 20

wb.save('output.xlsx')

Editing existing Excel files

# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook

# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active  # or wb['SheetName'] for specific sheet

# Working with multiple sheets (a separate name keeps `sheet` above from being rebound)
for sheet_name in wb.sheetnames:
    ws = wb[sheet_name]
    print(f"Sheet: {ws.title}")

# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2)  # Insert row at position 2
sheet.delete_cols(3)  # Delete column 3

# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'

wb.save('modified.xlsx')

Recalculating formulas

Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided recalc.py script to recalculate formulas:

python recalc.py <excel_file> [timeout_seconds]

Example:

python recalc.py output.xlsx 30

The script:

Automatically sets up LibreOffice macro on first run
Recalculates all formulas in all sheets
Scans ALL cells for Excel errors (#REF!, #DIV/0!, etc.)
Returns JSON with detailed error locations and counts
Works on both Linux and macOS
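
When the recalculation step is driven from Python, the command line above can be wrapped in a small function. This is a sketch under two assumptions that the text does not confirm: that the script prints only its JSON report to stdout, and that it can be launched with the current Python interpreter. `run_recalc` is a hypothetical name.

```python
import json
import subprocess
import sys


def run_recalc(workbook_path, script="recalc.py", timeout_seconds=30):
    """Run the recalc script on a workbook and return its JSON report.

    Assumes the script writes only the JSON report to stdout; if it does
    not, json.loads raises and the caller should inspect the raw output.
    """
    proc = subprocess.run(
        [sys.executable, script, workbook_path, str(timeout_seconds)],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)
```

A call such as `run_recalc("output.xlsx")` would then return a dict whose `status` field can be checked before the workbook is delivered.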

Formula Verification Checklist

Quick checks to ensure formulas work correctly:

Essential Verification

Test 2-3 sample references: Verify they pull correct values before building full model
Column mapping: Confirm Excel columns match (e.g., column 64 = BL, not BK)
Row offset: Remember Excel rows are 1-indexed while DataFrame positions are 0-indexed (DataFrame row 5 = Excel row 6, or Excel row 7 when row 1 holds the header)
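
The column and row mappings above are easy to get wrong by one, so it can help to compute them rather than count by hand. A minimal sketch follows; `column_letter` and `excel_row` are hypothetical helpers, and `header_rows` is an assumed parameter for the number of spreadsheet rows consumed as headers when the data was read.

```python
def column_letter(index):
    """Convert a 1-based column index to an Excel column letter (1 -> A)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def excel_row(df_position, header_rows=1):
    """Map a 0-based DataFrame row position to a 1-based Excel row."""
    return df_position + header_rows + 1


print(column_letter(64))               # column 64
print(excel_row(5, header_rows=0))     # no header row in the sheet
print(excel_row(5))                    # one header row in the sheet
```

openpyxl also ships `openpyxl.utils.get_column_letter` for the column conversion; the hand-written version here only shows the arithmetic.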

Common Pitfalls

NaN handling: Check for null values with pd.notna()
Far-right columns: FY data often in columns 50+
Multiple matches: Search all occurrences, not just first
Division by zero: Check denominators before using / in formulas (#DIV/0!)
Wrong references: Verify all cell references point to intended cells (#REF!)
Cross-sheet references: Use correct format (Sheet1!A1) for linking sheets
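
Two of the pitfalls above, cross-sheet references and division by zero, can be handled while the formula string is being built. The sketch below is illustrative: `sheet_ref` and `guarded_ratio` are hypothetical helpers, the quoting rule is deliberately conservative, and returning 0 for a zero denominator is an assumed design choice that may not suit every model.

```python
import re


def sheet_ref(sheet_name, cell):
    """Build a cross-sheet reference, quoting the sheet name when needed."""
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", sheet_name):
        return f"{sheet_name}!{cell}"
    escaped = sheet_name.replace("'", "''")  # embedded quotes are doubled
    return f"'{escaped}'!{cell}"


def guarded_ratio(numerator, denominator):
    """Return a formula that yields 0 instead of #DIV/0! on a zero denominator."""
    return f"=IF({denominator}=0,0,{numerator}/{denominator})"


print(sheet_ref("Sheet1", "A1"))
print(sheet_ref("Q1 Data", "B2"))
print(guarded_ratio("C4", "C2"))
```

Sheet names that themselves look like cell addresses are not covered by this check; quoting every sheet name is a safe fallback.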

Formula Testing Strategy

Start small: Test formulas on 2-3 cells before applying broadly
Verify dependencies: Check all cells referenced in formulas exist
Test edge cases: Include zero, negative, and very large values
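
The dependency check above can be approximated by pulling the cell references out of a formula string and comparing them with the cells known to be populated. This is a rough regex-based sketch, not a formula parser: it reports only the endpoints of a range, and it may misread text inside string literals or unusual sheet names. `CELL_REF`, `referenced_cells`, and `missing_dependencies` are hypothetical names.

```python
import re

# A1-style reference with optional $ anchors; the lookarounds skip
# function names that end in digits, such as LOG10(.
CELL_REF = re.compile(r"(?<![A-Za-z_])\$?[A-Z]{1,3}\$?\d+(?![\d(])")


def referenced_cells(formula):
    """List the A1-style references in a formula, without $ anchors."""
    return [m.group().replace("$", "") for m in CELL_REF.finditer(formula)]


def missing_dependencies(formula, populated):
    """Return referenced cells that are not in the populated set."""
    return [ref for ref in referenced_cells(formula) if ref not in populated]


print(referenced_cells("=B5*(1+$B$6)"))
print(missing_dependencies("=(C4-C2)/C2", {"C2"}))
```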

Interpreting recalc.py Output

The script returns JSON with error details:

{
  "status": "success",           // or "errors_found"
  "total_errors": 0,              // Total error count
  "total_formulas": 42,           // Number of formulas in file
  "error_summary": {              // Only present if errors found
    "#REF!": {
      "count": 2,
      "locations": ["Sheet1!B5", "Sheet1!C10"]
    }
  }
}
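
Once the report is parsed, the error locations can be flattened into a list to work through. The sketch below assumes the report has exactly the fields shown above; `error_locations` is a hypothetical helper, and the sample values are made up to match that shape.

```python
import json


def error_locations(report):
    """Flatten a recalc report into (error_type, location) pairs."""
    if report.get("status") != "errors_found":
        return []
    pairs = []
    for error_type, detail in report.get("error_summary", {}).items():
        for location in detail.get("locations", []):
            pairs.append((error_type, location))
    return pairs


# Sample report in the documented shape (values are illustrative).
sample = json.loads("""
{
  "status": "errors_found",
  "total_errors": 2,
  "total_formulas": 42,
  "error_summary": {
    "#REF!": {"count": 2, "locations": ["Sheet1!B5", "Sheet1!C10"]}
  }
}
""")
print(error_locations(sample))
```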

Best Practices

Library Selection

pandas: Best for data analysis, bulk operations, and simple data export
openpyxl: Best for complex formatting, formulas, and Excel-specific features

Working with openpyxl

Cell indices are 1-based (row=1, column=1 refers to cell A1)
Use data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)
Warning: If opened with data_only=True and saved, formulas are replaced with values and permanently lost
For large files: Use read_only=True for reading or write_only=True for writing
Formulas are preserved but not evaluated - use recalc.py to update values
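
The difference between the two load modes is easiest to see on a freshly written file. This is a small demonstration, assuming openpyxl is installed; the temporary path and cell contents are arbitrary. Because openpyxl does not evaluate formulas, a workbook it has just saved carries no cached result for the formula cell.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Write two inputs and one formula, then save without recalculating.
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"
wb.save(path)

# Default load returns the formula text; data_only=True returns the
# cached value, which is missing until the file has been recalculated.
formula_view = load_workbook(path).active["A3"].value
cached_view = load_workbook(path, data_only=True).active["A3"].value

print(formula_view)
print(cached_view)
```

Saving the workbook loaded with `data_only=True` would write that missing value back and drop the formula, which is the loss the warning above describes.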

Working with pandas

Specify data types to avoid inference issues: pd.read_excel('file.xlsx', dtype={'id': str})
For large files, read specific columns. Pass Excel column letters as a single string, or header names as a list: pd.read_excel('file.xlsx', usecols='A,C,E')
Handle dates properly: pd.read_excel('file.xlsx', parse_dates=['date_column'])

Code Style Guidelines

IMPORTANT: When generating Python code for Excel operations:

Write minimal, concise Python code without unnecessary comments
Avoid verbose variable names and redundant operations
Avoid unnecessary print statements

For Excel files themselves:

Add comments to cells with complex formulas or important assumptions
Document data sources for hardcoded values
Include notes for key calculations and model sections

[Evidence block]
No Skills: 60
With Skills: 80
Delta: 20
Failure summary: embedded Excel extraction, targeted update, and PPTX preservation require ordered artifact manipulation
Competing schema note: No prior round-0 pair evidence available.

[Output contract block] Return YAML with fields: revised_task_skill, change_summary{keep/add/remove/sharpen}, rationale

revised_task_skill: |
  ...
change_summary:
  keep:
    - ...
  add:
    - ...
  remove:
    - ...
  sharpen:
    - ...
rationale: |
  ...

[Outer-loop candidate block]
Source round: outer-loop-round0
Next round: outer-loop-round1
Candidate id: artifact-generation::round1::differentiating
Candidate mode: differentiating
Pair reason: full_matrix_eval
Next pair plan mode: full_matrix
Use this as a candidate schema rerun, not as an accepted final schema.

Language: Python

Added: Jun 10, 2026
