Protecting Privacy in the Archives: Preliminary Explorations of Topic Modeling for Born-Digital Collections
dc.contributor.author | Hutchinson, Tim | |
dc.date.accessioned | 2018-06-22T19:24:02Z | |
dc.date.available | 2018-06-22T19:24:02Z | |
dc.date.issued | 2017-12 | |
dc.description.abstract | Natural language processing (NLP) is an area of increased interest for digital archivists, although most research to date has focused on digitized rather than born-digital collections. This study in progress explores whether NLP techniques can be used effectively to surface documents requiring restrictions due to their personal information content. This phase of the research focuses on using topic modeling to find records relating to human resources. Early results show some promise, but suggest that topic modeling on its own will not be sufficient; other techniques to be explored include sentiment analysis and named entity extraction. | en_US |
dc.identifier.citation | Tim Hutchinson, 2017. Protecting Privacy in the Archives: Preliminary Explorations of Topic Modeling for Born-Digital Collections. Proceedings of the 2017 IEEE International Conference on Big Data. Boston, MA: 11-14 December 2017, pp. 2251-2255. | en_US |
dc.identifier.uri | http://hdl.handle.net/10388/8625 | |
dc.language.iso | en | en_US |
dc.publisher | IEEE | en_US |
dc.title | Protecting Privacy in the Archives: Preliminary Explorations of Topic Modeling for Born-Digital Collections | en_US |
dc.type | Conference Presentation | en_US |
dc.type | Refereed Paper | en_US |
Files
Original bundle
- Name: Hutchinson_ArchivalComputationalScience2017_final.pdf
- Size: 194.02 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 2.29 KB
- Format: Item-specific license agreed upon to submission