These datasets contain full-text US patents (date of publication, title, abstract, description, claims) along with their Cooperative Patent Classification (CPC) codes for the years 2010 and 2011. The documents are stored in zip-compressed JSON Lines format, partitioned by year. The code used to create this dataset, which harvests and parses patent documents made publicly available on the USPTO website, can be found here.
Dataset | File Size (MB) | Download Links |
---|---|---|
2010 US Patents | 2556 | main, mirror |
2011 US Patents | 2652 | main, mirror |
Last harvested and compiled: 10/8/2017
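Because each year is distributed as zip-compressed JSON Lines, the records can be streamed without unpacking the archive to disk. The following is a minimal sketch, not the authors' own loader: the function name `iter_patents`, the archive file name, and the `"title"` key in the usage comment are assumptions made for illustration, so inspect the keys of an actual record before relying on them.

```python
import io
import json
import zipfile
from typing import Iterator


def iter_patents(zip_path) -> Iterator[dict]:
    """Yield one parsed JSON record per non-empty line of every file in the archive.

    `zip_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as raw:
                # Decode lazily so a multi-gigabyte member is never held in memory at once.
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield json.loads(line)


# Usage (hypothetical file name and key; check them against the downloaded data):
# for patent in iter_patents("patents-2010.zip"):
#     print(patent.get("title"))
#     break
```

Streaming line by line keeps memory use roughly constant, which matters given that each archive is over 2.5 GB compressed.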
If you use this dataset, please consider citing the following paper:
@inproceedings{tran2017supervised,
title={Supervised Approaches to Assign Cooperative Patent Classification (CPC) Codes to Patents},
author={Tran, Tung and Kavuluru, Ramakanth},
booktitle={International Conference on Mining Intelligence and Knowledge Exploration},
pages={22--34},
year={2017},
organization={Springer}
}
For more information on the CPC system, check out: