About The Church in the Southern Black Community Collection
===========================================================
The physical collection was digitized and transcribed by hand. This means that the text is far more reliable than the uncorrected OCR output that is common in digitized archives.

More information about the collection and access to individual page images can be found here: http://docsouth.unc.edu/church/

This DocSouth Data collection zip file contains a folder named Data and several documents. The documents relate to the Library of Congress' BagIt process (learn more about BagIt here: http://www.digitalpreservation.gov/multimedia/videos/bagit0609.html).

The folder named Data is what will be of most interest to researchers. It contains the following items:

    A folder containing each of the texts in the collection as plain text;
    A folder containing each of the texts in the collection marked up in TEI/XML;
    A .csv file (comma-separated values, which can be opened in Microsoft Excel or another spreadsheet program) that lists each item with its title, author, date of publication, and a web address for each individual text;
    A “Read Me” file that explains the project and the contents of the DocSouthData folder;
    A file named text-only.xsl.

The plain text files can be used in text mining projects such as topic modeling, sentiment analysis, and natural language processing. Please note that the full text may contain paratextual elements such as title pages and appendices, which will be included in any word counts you perform. You may wish to delete these in order to focus your analysis on just the narratives.
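
As a minimal sketch of the kind of word count mentioned above, the following Python example tallies word frequencies for each plain text file in a folder. The function names and the tokenization rule (lowercase letters, with an optional internal apostrophe) are assumptions made for illustration; they are not part of the collection, and any paratextual material left in a file will be counted along with the narrative.

```python
import re
from collections import Counter
from pathlib import Path

# Simple tokenizer (an assumption): runs of letters, optionally joined by one apostrophe.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(text):
    """Return a Counter of lowercase word tokens found in text."""
    return Counter(WORD_PATTERN.findall(text.lower()))


def count_words_in_folder(folder):
    """Map each .txt file name in folder to a Counter of its words."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the sketch running if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = count_words(text)
    return results
```

Calling count_words_in_folder with the path to the plain text folder inside Data returns one Counter per file; counts.most_common(10) then lists the ten most frequent words in a given text.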

The TEI/XML files have been included for advanced users who would like to use the markup to isolate particular parts of text for analysis.
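
One way to use the markup is to keep only the main body of a text and drop front and back matter. The sketch below uses Python's standard library and assumes that the narrative sits inside a TEI body element; that is the usual TEI layout, but it should be checked against the files themselves. The function names are hypothetical. Elements are matched by local name so that the code works whether or not the files declare an XML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def body_text(xml_string):
    """Return the text of the first <body> element, with whitespace collapsed.

    Returns an empty string if the document has no <body> element.
    """
    root = ET.fromstring(xml_string)
    for elem in root.iter():
        if local_name(elem.tag) == "body":
            # itertext() yields the text of the element and all its descendants.
            return " ".join("".join(elem.itertext()).split())
    return ""
```

Files that rely on a DTD for character entities may not parse with this standard-library parser; in that case a more tolerant XML library would be needed.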

The .csv file acts as a table of contents for the collection and includes Title, Author, Publication Date, a URL pointing to the digitized version of the text, and a unique URL pointing to a web-accessible plain text version of the text (this is particularly useful with Voyant: http://voyant-tools.org/).
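
Because the .csv file is a table of contents, it can be loaded with Python's standard csv module and used to select or group texts. The sketch below is illustrative only: the column headers "Author" and "Title" are assumptions, so check the first row of the actual file and pass the real header names if they differ.

```python
import csv
from collections import defaultdict


def load_index(csv_file):
    """Read an open CSV file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


def titles_by_author(rows, author_col="Author", title_col="Title"):
    """Group titles under each author; the column names are assumed defaults."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[author_col].strip()].append(row[title_col].strip())
    return dict(grouped)
```

To use it, open the file with open(path, newline="", encoding="utf-8"), pass the file object to load_index, and pass the resulting rows to titles_by_author.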

The text-only.xsl file is the XSLT stylesheet that was used to create the folder of plain text files.

Feedback
========
Please let us know how you are using the data and whether you have any suggestions for making it even more useful. Send any feedback to wilsonlibrary@unc.edu.

Copyright Statement
===================
With the exception of "Fields's Observation: The Slave Narrative of a Nineteenth-Century Virginian," which has no known rights, the texts, encoding, and metadata available in DocSouth Data are made available for use under the terms of a Creative Commons Attribution License (CC BY 4.0: http://creativecommons.org/licenses/by/4.0/). Users are free to copy, share, adapt, and re-publish any of the content in Open DocSouth as long as they credit the University Library at the University of North Carolina at Chapel Hill for making this material available.

About the DocSouth Data Project
===============================
DocSouth Data provides access to some of the Documenting the American South collections in formats that work well with common text mining and data analysis tools.

Documenting the American South is one of the longest-running digital publishing initiatives at the University of North Carolina. It was designed to give researchers digital access to some of the library’s unique collections in the form of high-quality page scans as well as structured, corrected, and machine-readable text.

DocSouth Data is an extension of this original goal and has been designed for researchers who want to use emerging technology to look for patterns across entire texts or compare patterns found in multiple texts. We have made it easy to use tools such as Voyant to conduct simple word counts and frequency visualizations (such as word clouds) or to use other tools to perform more complex processes such as topic modeling, named-entity recognition, or sentiment analysis.