Skills majiayu000/claude-skill-registry-datasemantic-scholar

Editor's Note

semantic-scholar

Install

npx skills add https://github.com/majiayu000/claude-skill-registry-data --skill semantic-scholar

SKILL.md

Semantic Scholar Paper Metadata Skill

Search and retrieve research paper metadata using the Semantic Scholar Academic Graph API.

Base URL: https://api.semanticscholar.org/graph/v1 Auth: No key required for basic use. For higher rate limits, get a free key at https://www.semanticscholar.org/product/api and pass it as x-api-key header.

Choosing the Right Endpoint

Goal	Endpoint
Search by keywords/topic	`GET /paper/search`
Boolean/filtered bulk search	`GET /paper/search/bulk`
Match by exact/near-exact title	`GET /paper/search/match`
Look up a specific paper (DOI, arXiv, etc.)	`GET /paper/{paper_id}`
Fetch multiple papers at once	`POST /paper/batch`
Papers that cite a paper	`GET /paper/{paper_id}/citations`
Papers referenced by a paper	`GET /paper/{paper_id}/references`
Author's papers	`GET /author/{author_id}/papers`

Field Selection

All endpoints accept a fields parameter — a comma-separated list. Default response only includes paperId and title. Always request the fields you need.

Common fields:

paperId, corpusId, externalIds, title, abstract, year, venue,
publicationDate, authors, citationCount, referenceCount,
influentialCitationCount, isOpenAccess, openAccessPdf,
fieldsOfStudy, s2FieldsOfStudy, publicationTypes

Nested fields use dot notation:

authors.name, authors.affiliations,
citations.title, citations.year, citations.authors,
references.title, references.externalIds,
openAccessPdf.url

Endpoint Details & curl Examples

1. Keyword Search — `/paper/search`

Best for: topic discovery, finding relevant papers by concept.

curl -G "https://api.semanticscholar.org/graph/v1/paper/search" \
  --data-urlencode "query=transformer attention mechanism" \
  --data-urlencode "fields=title,year,authors,citationCount,abstract" \
  --data-urlencode "limit=10"

Filters (all optional, combine freely):

--data-urlencode "year=2020-2024"               # year range
--data-urlencode "venue=NeurIPS,ICML"           # specific venues
--data-urlencode "fieldsOfStudy=Computer Science"
--data-urlencode "minCitationCount=50"
--data-urlencode "openAccessPdf="               # open access only (empty value = true)
--data-urlencode "publicationDateOrYear=2023-01-01:2023-12-31"

Pagination: Response includes total, offset, next. Use offset=N to page through results (max 1,000 total).

2. Bulk Search — `/paper/search/bulk`

Best for: large result sets, boolean queries, sorted/filtered exports.

Supports AND/OR/NOT operators and returns up to 1,000 results per call with token-based pagination (no 1,000 total cap).

curl -G "https://api.semanticscholar.org/graph/v1/paper/search/bulk" \
  --data-urlencode "query=large language models AND safety" \
  --data-urlencode "fields=title,year,citationCount,authors" \
  --data-urlencode "sort=citationCount:desc" \
  --data-urlencode "limit=100"

Sort options: paperId, publicationDate, citationCount (append :asc or :desc).

Pagination: Response includes a token field. Pass it as &token=... on the next call.

3. Title Match — `/paper/search/match`

Best for: "I have this paper title, give me its metadata." Returns a single best match or 404. Includes a matchScore.

curl -G "https://api.semanticscholar.org/graph/v1/paper/search/match" \
  --data-urlencode "query=Attention Is All You Need" \
  --data-urlencode "fields=title,year,authors,citationCount,externalIds"

4. Paper by ID — `/paper/{paper_id}`

Best for: fetching full metadata when you already have an identifier.

Supported ID formats (prefix as shown):

Format	Example
S2 Paper ID (bare)	`649def34f8be52c8b66281af98ae884c09aef38a`
Corpus ID	`CorpusId:215416146`
DOI	`DOI:10.18653/v1/2020.acl-main.463`
arXiv	`ARXIV:2005.14165`
PubMed	`PMID:23193287`
PubMed Central	`PMCID:PMC4535869`
ACL Anthology	`ACL:2020.acl-main.463`
Semantic Scholar URL	`URL:https://www.semanticscholar.org/paper/...`

# By arXiv ID
curl "https://api.semanticscholar.org/graph/v1/paper/ARXIV:1706.03762?fields=title,abstract,year,authors,citationCount,referenceCount,isOpenAccess,openAccessPdf"

# By DOI
curl "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1145/3292500.3330919?fields=title,year,venue,citationCount"

5. Batch Paper Lookup — `POST /paper/batch`

Best for: fetching metadata for a list of known paper IDs (up to 500).

curl -X POST "https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year,citationCount" \
  -H "Content-Type: application/json" \
  -d '{"ids": ["ARXIV:1706.03762", "ARXIV:2005.14165", "DOI:10.18653/v1/2020.acl-main.463"]}'

6. Citations & References

# Papers that CITE this paper (incoming)
curl "https://api.semanticscholar.org/graph/v1/paper/ARXIV:1706.03762/citations?fields=title,year,authors,citationCount&limit=20"

# Papers this paper CITES (outgoing)
curl "https://api.semanticscholar.org/graph/v1/paper/ARXIV:1706.03762/references?fields=title,year,authors&limit=20"

Response wraps each item in a citingPaper or citedPaper key:

{ "data": [{ "citingPaper": { "title": "...", "year": 2023 } }] }

7. Author Lookup

# Find an author's ID by name
curl -G "https://api.semanticscholar.org/graph/v1/author/search" \
  --data-urlencode "query=Yoshua Bengio" \
  --data-urlencode "fields=name,affiliations,paperCount,citationCount,hIndex"

# Get their papers
curl "https://api.semanticscholar.org/graph/v1/author/{authorId}/papers?fields=title,year,citationCount&limit=20"

Rate Limits & API Key

Without a key: ~100 requests/5 min (shared pool)
With a free API key: ~1 request/sec sustained

To use an API key:

curl -H "x-api-key: YOUR_KEY" "https://api.semanticscholar.org/graph/v1/paper/..."

Store the key in an env var: export S2_API_KEY=... Then use: -H "x-api-key: $S2_API_KEY"

Parsing Results

Use jq to extract fields from responses:

# List titles and citation counts from a search
curl -s -G "https://api.semanticscholar.org/graph/v1/paper/search" \
  --data-urlencode "query=BERT language model" \
  --data-urlencode "fields=title,year,citationCount" \
  --data-urlencode "limit=5" \
| jq '.data[] | "\(.citationCount)\t\(.year)\t\(.title)"'

# Get all author names from a paper
curl -s "https://api.semanticscholar.org/graph/v1/paper/ARXIV:1706.03762?fields=authors.name" \
| jq '[.authors[].name]'

# Extract open access PDF URL
curl -s "https://api.semanticscholar.org/graph/v1/paper/ARXIV:2005.14165?fields=openAccessPdf" \
| jq '.openAccessPdf.url'

Workflow Tips

Start with /paper/search/match when the user provides a full title — it's faster than a keyword search.
Use /paper/{id} when they give you a DOI, arXiv link, or PubMed ID — extract the ID and call directly.
For "papers about X" questions, use /paper/search with minCitationCount and year filters to surface foundational work.
For citation counts and h-index exploration, combine /author/search → /author/{id}/papers sorted by citationCount:desc.
Always request only the fields you need — smaller responses, faster calls.
When displaying results, format as a table with title, year, venue, and citation count for readability.

Installs0

GitHub Stars7

LanguageHTML

AddedMay 25, 2026

View on GitHub