Living-with-machines
diff --git a/README.md b/README.md
diff --git a/docs/Demo.md b/docs/Demo.md
diff --git a/docs/README.md b/docs/README.md
@@ -56,17 +56,16 @@ pip install -i https://test.pypi.org/simple/ alto2txt==0.3.1a20
 
 ## Usage
 
-Downsampling can be used to convert only every Nth issue of each newspaper. One text file is output per article, each complemented by one XML metadata file.
+Downsampling can be used to convert only every Nth issue of each newspaper. One text file is output per article, each complemented by one `XML` metadata file.
 
 
 
 ```
-extract_publications_text.py [-h] [-d [DOWNSAMPLE]]
-                                    [-p [PROCESS_TYPE]]
-                                    [-l [LOG_FILE]]
-                                    [-n [NUM_CORES]]
-                                    xml_in_dir txt_out_dir
-
+usage: alto2txt [-h] [-p [PROCESS_TYPE]] [-l [LOG_FILE]] [-d [DOWNSAMPLE]] [-n [NUM_CORES]]
+                xml_in_dir txt_out_dir
+
 Converts XML publications to plaintext articles
 
 positional arguments:
@@ -75,19 +74,17 @@ positional arguments:
 
 optional arguments:
   -h, --help            show this help message and exit
-  -d [DOWNSAMPLE], --downsample [DOWNSAMPLE]
-                        Downsample. Default 1
+  -p [PROCESS_TYPE], --process-type [PROCESS_TYPE]
+                        Process type. One of: single,serial,multi,spark Default: multi
   -l [LOG_FILE], --log-file [LOG_FILE]
                         Log file. Default out.log
-  -p [PROCESS_TYPE], --process-type [PROCESS_TYPE]
-                        Process type.
-                        One of: single,serial,multi,spark
-                        Default: multi
+  -d [DOWNSAMPLE], --downsample [DOWNSAMPLE]
+                        Downsample. Default 1
   -n [NUM_CORES], --num-cores [NUM_CORES]
                         Number of cores (Spark only). Default 1")
 ```
 
-`xml_in_dir` is expected to hold XML for multiple publications, in the following structure:
+`xml_in_dir` is expected to hold `XML` for multiple publications, in the following structure:
 
 ```
 xml_in_dir
@@ -129,32 +126,32 @@ The following `XSLT` files need to be in an `extract_text.xslts` module:
 
 ## Process publications
 
-Assume `~/BNA` exists and matches the structure above.
+Assume a folder named `BNA` exists in the current directory and matches the structure above.
 
 Extract text from every publication:
 
 ```bash
-./extract_publications_text.py ~/BNA txt
+alto2txt BNA txt
 ```
 
 Extract text from every 100th issue of every publication:
 
 ```bash
-./extract_publications_text.py ~/BNA txt -d 100
+alto2txt BNA txt -d 100
 ```
 
 ## Process a single publication
 
 Extract text from every issue of a single publication:
 
 ```bash
-./extract_publications_text.py -p single ~/BNA/0000151 txt
+alto2txt -p single BNA/0000151 txt
 ```
 
 Extract text from every 100th issue of a single publication:
 
 ```bash
-./extract_publications_text.py -p single ~/BNA/0000151 txt -d 100
+alto2txt -p single BNA/0000151 txt -d 100
 ```
 
 ## Configure logging
@@ -164,7 +161,7 @@ By default, logs are put in `out.log`.
 To specify an alternative location for logs, use the `-l` flag e.g.
 
 ```bash
-./extract_publications_text.py -l mylog.txt ~/BNA txt -d 100 2> err.log
+alto2txt -l mylog.txt BNA txt -d 100 2> err.log
 ```
 
 ## Process publications via Spark
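
Review note on the `-d` examples in the README.md hunks above: the docs describe downsampling as converting "only every Nth issue". A minimal Python sketch of that idea follows. It is an illustration only, not `alto2txt`'s actual selection code, which this diff does not show. The function name `downsample` and the assumption that selection starts from the first issue in the list are hypothetical.

```python
from typing import List, Sequence


def downsample(issues: Sequence[str], n: int = 1) -> List[str]:
    """Keep every nth issue, starting with the first one.

    Assumption: issues are already in the order the tool would visit them.
    With n == 1 (the documented default) nothing is dropped.
    """
    if n < 1:
        raise ValueError("downsample value must be a positive integer")
    return list(issues[::n])


if __name__ == "__main__":
    # Hypothetical issue directory names, for illustration only.
    example_issues = ["0101", "0108", "0115", "0122", "0129"]
    print(downsample(example_issues, 2))
```

Under these assumptions, `-d 100` would correspond to `downsample(issues, 100)`.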
 
@@ -1,10 +1,10 @@
 # Demo
 
-A working example of alto2txt.
+A working example of `alto2txt`.
 
-Input xml files from digitised newspapers create an object for every section, paragraph, sentence, and individual word, making it difficult to read articles. Each newspaper page has an associated alto (.xml) file with content, and the pages share a mets (.xml) file with meta data about what articles/other content contain and where.
+Input `XML` files from digitised newspapers create an object for every section, paragraph, sentence, and individual word, making it difficult to read articles. Each newspaper page has an associated ALTO (`.xml`) file with content, and the pages share a METS (`.xml`) file with metadata about which articles and other content the pages contain and where.
 
-The resulting .txt files are one per article, which may span multiple newspaper pages.
+The resulting `.txt` files are one per article, which may span multiple newspaper pages.
 
 ## Quick Demo
 
@@ -17,10 +17,21 @@ Navigate to an empty directory in the terminal and run the following commands:
 > cd alto2txt
 > conda create -n py37alto python=3.7
 > conda activate py37alto
-> pip install -r requirements.txt
-> ./extract_publications_text.py -p single demo-files demo-output
 ```
-The resulting plain text files of the articles are in `alto2txt/demo-output/`.
+
+To install that checkout, run:
+```
+> pip install .
+```
+Alternatively, install the latest release (which may not include local changes):
+```
+> pip install alto2txt
+```
+Either way, the following command should now run:
+```
+> alto2txt -p single demo-files demo-output
+```
+The resulting plain text files of the articles will be in `alto2txt/demo-output/`.
 
 Read on for a more in-depth explanation.
 
@@ -32,7 +43,7 @@ It is recommended to use [Anaconda](https://docs.anaconda.com/anaconda/install/i
 
 #### Download the code directory
 
-If you are familiar with git, use the following command in a blank directory from your terminal:
+If you are familiar with `git`, use the following command in a blank directory from your terminal:
 
 ```
 git clone https://github.com/Living-with-machines/alto2txt.git
@@ -63,30 +74,30 @@ conda activate py37alto
 Install the required packages which are outlined in `requirements.txt`:
 
 ```
-pip install -r requirements.txt
+pip install .
 ```
-Follow the instructions to download and install the packages. You should now have all the required Python packages within your conda environment to run Alto2txt.
+Follow the instructions to download and install the packages. You should now have all the required Python packages within your conda environment to run `alto2txt`.
 
 
 
-## Run Alto2Txt
+## Run `alto2txt`
 
 Make sure you have navigated to the `alto2txt` directory in your terminal or Anaconda prompt. For this demo, we are using a single edition for a single publication. The output files will be created in `/demo-output` which you can check is currently empty.
 
 ```
-./extract_publications_text.py -p single demo-files demo-output
+alto2txt -p single demo-files demo-output
 ```
 
 Here we use the positional argument `-p` to determine which process type, in this case `single`. The script can be run on many publications and years by default, but in this case we only have one publication. [Click here](/#process-types) to read more about different process types.
 
-The next argument `demo-files` provides the input directory, and then `demo-output` provides the output directory (which should be empty). Once alto2txt has run, the output directory structure will mirror the input directory.
+The next argument `demo-files` provides the input directory, and then `demo-output` provides the output directory (which should be empty). Once `alto2txt` has run, the output directory structure will mirror the input directory.
 
 We will now look in more detail at the ALTO/METS input files and output plain text files.
 
 
 ## Input ALTO/METS files
 
-We ran alto2txt on the ALTO/METS files within a subdirectory called `demo-files`. These come from a newspaper published on the 17th of February, 1824. The directory tree structure is important, and will be mirrored in the output.
+We ran `alto2txt` on the ALTO/METS files within a subdirectory called `demo-files`. These come from a newspaper published on the 17th of February, 1824. The directory tree structure is important, and will be mirrored in the output.
 
 ```
 alto2txt/
@@ -119,7 +130,7 @@ There are four files with the file name ending in `_000x.xml`. These alto files
                 <String ID = "word000001" ... CONTENT = "hello" ... />
 ```
 
-Alto2txt will extract all these individual words and create a text file for each article.
+`alto2txt` will extract all these individual words and create a text file for each article.
 
 #### METS File Contents
 
@@ -136,7 +147,7 @@ Here is a short example, which defines **Article 01** as the first paragraph on
     </mets:smLinkGrp>
 </mets:structLink>
 ```
-Alto2txt will produce a `.txt` file for every Article (and other content, for example Advert) defined in this mets file.
+`alto2txt` will produce a `.txt` file for every Article (and other content, for example Advert) defined in this mets file.
 
 
 ## Output Files
@@ -163,31 +174,31 @@ A total of 26 articles are extracted from the alto files, and one advert. Each p
 
 ## Further Examples
 
-Running these steps for your own files works in the same way. Your source and/or output directory does not need to be within `/alto2txt/` as long as you put the full path name into the command arguments.
+Running these steps for your own files works in the same way. Your source and/or output directory can be anywhere, as long as you put the path name into the command arguments.
 
 
 #### Run on a single publication, multiple years, multiple editions
 
 ```
-./extract_publications_text.py -p single input-directory output-directory
+alto2txt -p single input-directory output-directory
 ```
 
 
 #### Run on multiple publications, multiple years, multiple editions
 
 ```
-./extract_publications_text.py input-directory output-directory
+alto2txt input-directory output-directory
 ```
 
 #### Extract every 100th edition from every publication
 
 ```
-./extract_publications_text.py input-directory output-directory -d 100
+alto2txt input-directory output-directory -d 100
 ```
 Where `-d` determines the downsample value.
 
 #### Extract every 100th edition from one publication
 
 ```
-./extract_publications_text.py -p single input-directory output-directory -d 100
+alto2txt -p single input-directory output-directory -d 100
 ```
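
Review note on the docs/Demo.md hunks above: the demo says each word sits in a `<String ... CONTENT="..."/>` element and that `alto2txt` extracts those words. The sketch below illustrates that step with Python's standard library only. It is not how `alto2txt` does it (the README refers to `XSLT` files for the real conversion). The sample document, the `extract_words` name, and the simplified element layout are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# A heavily simplified, hypothetical ALTO-like page (no namespaces or coordinates).
SAMPLE_ALTO = """<alto>
  <Layout>
    <Page>
      <TextBlock ID="block1">
        <TextLine>
          <String ID="word000001" CONTENT="hello"/>
          <String ID="word000002" CONTENT="world"/>
        </TextLine>
      </TextBlock>
    </Page>
  </Layout>
</alto>"""


def extract_words(alto_xml: str) -> List[str]:
    """Return the CONTENT attribute of every String element, in document order."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        # Drop a "{namespace}" prefix if the real file declares one.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "String" and "CONTENT" in element.attrib:
            words.append(element.attrib["CONTENT"])
    return words


if __name__ == "__main__":
    print(" ".join(extract_words(SAMPLE_ALTO)))
```

Grouping the words into articles would additionally need the METS file, which this sketch does not attempt.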
@@ -1,52 +1,54 @@
-# Alto2txt: Extract plain text from digitised newspapers
+# `alto2txt`: Extract plain text from digitised newspapers
 
 *Version extract_text 0.3.0*
 
-Alto2txt converts XML publications to plaintext articles with minimal metadata.
+`alto2txt` converts `XML` publications to plaintext articles with minimal metadata.
 ALTO and METS is the current industry standard for newspaper digitization used by hundreds of modern, large-scale newspaper digitization projects.
-One text file is output per article, each complemented by one XML metadata file.
+One text file is output per article, each complemented by one `XML` metadata file.
 
-**XML compatibility: METS 1.8/ALTO 1.4, METS 1.3/ALTO 1.4, BLN, or UKP format**
+**`XML` compatibility: METS 1.8/ALTO 1.4, METS 1.3/ALTO 1.4, BLN, or UKP format**
 
 ## Usage
 
-
+> *Note*: the formatting below is altered for readability
 ```
-extract_publications_text.py [-h [HELP]]
-                             [-d [DOWNSAMPLE]]
-                             [-p [PROCESS_TYPE]]
-                             [-l [LOG_FILE]]
-                             [-n [NUM_CORES]]
-                             xml_in_dir txt_out_dir
-
+usage: alto2txt [-h]
+                [-p [PROCESS_TYPE]]
+                [-l [LOG_FILE]]
+                [-d [DOWNSAMPLE]]
+                [-n [NUM_CORES]]
+                xml_in_dir txt_out_dir
+                                    
 Converts XML publications to plaintext articles
 
 positional arguments:
   xml_in_dir            Input directory with XML publications
   txt_out_dir           Output directory for plaintext articles
 
 optional arguments:
-  -h, --help            Show this help message and exit
-  -d, --downsample      Downsample, process every [integer] nth edition.  Default 1
-  -l, --log-file        Log file. Default out.log
-  -p, --process-type    Process type.
-                        One of: single,serial,multi,spark
-                        Default: multi
-  -n, --num-cores       Number of cores (Spark only). Default 1
+  -h, --help            show this help message and exit
+  -p [PROCESS_TYPE], --process-type [PROCESS_TYPE]
+                        Process type. One of: single,serial,multi,spark Default: multi
+  -l [LOG_FILE], --log-file [LOG_FILE]
+                        Log file. Default out.log
+  -d [DOWNSAMPLE], --downsample [DOWNSAMPLE]
+                        Downsample. Default 1
+  -n [NUM_CORES], --num-cores [NUM_CORES]
+                        Number of cores (Spark only). Default 1
 ```
 To read about downsampling, logs, and using spark see [Advanced Information](advanced.md).
 
 
 ## Quick Install
 
-If you are comfortable with the command line, git, and already have Python & Anaconda installed, you can install Alto2txt by navigating to an empty directory in the terminal and run the following commands:
+If you are comfortable with the command line and `git`, and already have Python & Anaconda installed, you can install `alto2txt` by navigating to an empty directory in the terminal and running the following commands:
 
 ```
 > git clone https://github.com/Living-with-machines/alto2txt.git
 > cd alto2txt
 > conda create -n py37alto python=3.7
 > conda activate py37alto
-> pip install -r requirements.txt
+> pip install .
 ```
 
 [Click here](/Demo.md) for more in-depth installation instructions using demo files.
@@ -78,21 +80,21 @@ xml_in_dir/
 Assuming `xml_in_dir` follows this structure, run alto2txt with the following in the terminal:
 
 ```bash
-./extract_publications_text.py ~/xml_in_dir ~/txt_out_dir
+alto2txt xml_in_dir txt_out_dir
 ```
 
 To downsample and only process every 100th edition:
 
 ```bash
-./extract_publications_text.py ~/xml_in_dir ~/txt_out_dir -d 100
+alto2txt xml_in_dir txt_out_dir -d 100
 ```
 
 
 ## Process Single Publication
 
 [A demo for processing a single publication is available here.](Demo.md)
 
-If `-p|--process-type single` is provided then `xml_in_dir` is expected to hold XML for a single publication, in the following structure:
+If `-p|--process-type single` is provided then `xml_in_dir` is expected to hold `XML` for a single publication, in the following structure:
 
 ```
 xml_in_dir/
@@ -102,16 +104,16 @@ xml_in_dir/
   └── year
 ```
 
-Assuming `xml_in_dir` follows this structure, run alto2txt with the following in the terminal:
+Assuming `xml_in_dir` follows this structure, run `alto2txt` with the following in the terminal, from the folder that contains `xml_in_dir`:
 
 ```bash
-./extract_publications_text.py -p single ~/xml_in_dir ~/txt_out_dir
+alto2txt -p single xml_in_dir txt_out_dir
 ```
 
 To downsample and only process every 100th edition from the one publication:
 
 ```bash
-./extract_publications_text.py -p single ~/xml_in_dir ~/txt_out_dir -d 100
+alto2txt -p single xml_in_dir txt_out_dir -d 100
 ```
 
 ## Plain Text Files Output
@@ -125,7 +127,7 @@ Quality assurance is performed to check for:
 
 * Unexpected directories.
 * Unexpected files.
-* Malformed XML.
+* Malformed `XML`.
 * Empty files.
 * Files that otherwise do not expose content.
 
@@ -135,5 +137,4 @@ Quality assurance is performed to check for:
 * Check and ensure that articles that span multiple pages are pulled into a single article file.
 * Smarter handling of articles spanning multiple pages.
 
-
-> Last updated 2022-06-30
+> Last updated 2022-11-10
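
Review note on the command examples in this diff: the invocations all combine the same documented flags (`-p`, `-l`, `-d`) with the two positional directories. The sketch below assembles such a command line in Python, assuming the `alto2txt` entry point is on the `PATH` after installation. The helper name `build_alto2txt_command` is hypothetical. The flag names, process types, and defaults are taken from the usage text in the diff.

```python
from typing import List

# Process types listed in the usage text.
PROCESS_TYPES = ("single", "serial", "multi", "spark")


def build_alto2txt_command(
    xml_in_dir: str,
    txt_out_dir: str,
    process_type: str = "multi",
    downsample: int = 1,
    log_file: str = "out.log",
) -> List[str]:
    """Build an argument list for the alto2txt CLI using only documented flags."""
    if process_type not in PROCESS_TYPES:
        raise ValueError("process_type must be one of: " + ",".join(PROCESS_TYPES))
    if downsample < 1:
        raise ValueError("downsample must be a positive integer")
    return [
        "alto2txt",
        "-p", process_type,
        "-l", log_file,
        "-d", str(downsample),
        xml_in_dir,
        txt_out_dir,
    ]


if __name__ == "__main__":
    command = build_alto2txt_command("BNA/0000151", "txt", process_type="single", downsample=100)
    print(" ".join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` once `alto2txt` is installed. Building a list rather than a single string avoids shell quoting problems with directory names that contain spaces.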