fix: decode/clean garbled text using config
ciatph committed Apr 21, 2024
1 parent d6332ed commit 3cf7396
Showing 5 changed files with 61 additions and 9 deletions.
3 changes: 2 additions & 1 deletion .env.example
@@ -1,4 +1,5 @@
 EXCEL_FILE_URL=https://pubfiles.pagasa.dost.gov.ph/pagasaweb/files/climate/tendayweatheroutlook/day1.xlsx
 DEFAULT_EXCEL_FILE_URL=https://pubfiles.pagasa.dost.gov.ph/pagasaweb/files/climate/tendayweatheroutlook/day1.xlsx
 SHEETJS_COLUMN=__EMPTY
-SORT_ALPHABETICAL=1
+SORT_ALPHABETICAL=1
+SPECIAL_CHARACTERS=├â┬▒:ñ,â:
9 changes: 5 additions & 4 deletions README.md
@@ -1,10 +1,10 @@
 ## ph-municipalities

-**ph-municipalities** have **npm scripts** that allow interactive querying of Philippines municipalities included in one or more provinces or from a whole region, with an option of writing them to JSON files from the command line.
+**ph-municipalities** have **NPM scripts** that allow interactive querying of Philippines municipalities included in one or more provinces or from a whole region, with an option of writing them to JSON files from the command line.

 It uses `/data/day1.xlsx` (downloaded and stored as of this 20220808) from PAGASA's [10-day weather forecast excel files](https://www.pagasa.dost.gov.ph/climate/climate-prediction/10-day-climate-forecast) as the default data source.

-It also asks users to key in the download URL of a remote excel file should they want to use another excel file for a new and updated data source.
+It also asks users to key in the download URL of a remote PAGASA 10-Day weather forecast excel file should they want to use another excel file for a new and updated data source.

 Extracted municipalities are written in JSON files following the format:

@@ -81,8 +81,9 @@ The following dependencies are used for this project. Feel free to use other dep
 | Variable Name | Description |
 | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | EXCEL_FILE_URL | (Optional) Remote excel file's download URL.<br>If provided, the excel file will be downloaded and saved on the specified `pathToFile` local filesystem location during the `ExcelFile` class initialization.<br>Read on [Usage](#usage) for more information. |
-| SHEETJS_COLUMN | Column name read by [sheetjs](https://sheetjs.com/) in an excel file.<br>This column contains the municipality and province names following the string pattern<br>`"municipalityName (provinceName)"`<br>Default value is `__EMPTY` |
-| SORT_ALPHABETICAL | Arranges the municipality names in alphabetical order.<br>Default value is `1`. Set to `0` to use the ordering as read from the Excel file. |
+| SHEETJS_COLUMN | Column name read by [sheetjs](https://sheetjs.com/) in an excel file.<br>This column contains the municipality and province names following the string pattern<br>`"municipalityName (provinceName)"`<br>Default value is `__EMPTY`|
+| SORT_ALPHABETICAL | Arranges the municipality names in alphabetical order.<br>Default value is `1`. Set to `0` to use the ordering as read from the Excel file. |
+| SPECIAL_CHARACTERS | Key-value pairs of special characters or garbled text and their normalized text conversions, delimited by the `":"` character.<br>Multiple key-value pairs are delimited by the `","` character.<br>If a special character key's value is a an empty string, write it as i.e.,: `"some-garbled-text:"` |

 ## Available Scripts
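The `SPECIAL_CHARACTERS` format described in the README change above can be illustrated with a small parser. This is a minimal sketch and not the code from this commit: `parseSpecialCharacters` is a hypothetical helper, and it assumes that items without a `":"` delimiter or without a key are skipped.

```javascript
// Hypothetical helper (not part of the commit): turns a SPECIAL_CHARACTERS
// value such as "├â┬▒:ñ,â:" into a { garbledText: replacement } map.
function parseSpecialCharacters (raw = '') {
  const map = {}

  for (const pair of raw.split(',')) {
    // Split on the first ":" only, so the replacement part may be empty
    const index = pair.indexOf(':')

    // Assumption: skip empty items, items without ":", and items with no key
    if (index <= 0) continue

    map[pair.slice(0, index)] = pair.slice(index + 1)
  }

  return map
}

// Usage with the value from .env.example
const parsed = parseSpecialCharacters('├â┬▒:ñ,â:')
console.log(parsed['├â┬▒']) // ñ
console.log(parsed['â'] === '') // true
```

An entry written as `some-garbled-text:` maps to an empty string, so the garbled text is removed rather than replaced.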
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
 {
   "name": "ph-municipalities",
-  "version": "1.0.9",
+  "version": "1.0.10",
   "description": "List and write the `municipalities` of Philippines provinces or regions into JSON files",
   "main": "index.js",
   "scripts": {
52 changes: 51 additions & 1 deletion src/classes/excel/index.js
@@ -188,6 +188,52 @@ class ExcelFile {
     return /[a-zA-z] *\([^)]*\) */.test(str)
   }

+  /**
+   * Checks if a string contains special characters
+   * @param {String} str - String to check
+   * @returns {Bool}
+   */
+  static hasSpecialChars (str) {
+    /* eslint-disable no-control-regex */
+    const regex = /[^\x00-\x7F]/g
+    return regex.test(str)
+  }
+
+  /**
+   * Cleans/removes default-known special characters and garbled text defined in config from string.
+   * @param {String} str - String to clean
+   * @returns {String} - Clean string
+   */
+  static removeGarbledText (str) {
+    // Known garbled special text
+    let charMap = {
+      '├â┬▒': 'ñ', // Replace "├â┬▒" with "ñ"
+      â: '' // Remove "â"
+    }
+
+    // Other special characters from config
+    const specialChars = (process.env.SPECIAL_CHARACTERS?.split(',') ?? [])
+      .reduce((list, item) => {
+        const [key, value] = item.split(':')
+
+        return {
+          ...list,
+          ...((key || value) && { [key]: value ?? '' })
+        }
+      }, {})
+
+    charMap = {
+      ...charMap,
+      ...specialChars
+    }
+
+    for (const [key, value] of Object.entries(charMap)) {
+      str = str.replace(new RegExp(key, 'g'), value)
+    }
+
+    return str
+  }
+
   /**
    * Extract the municipality name from a string following the pattern:
    * "municipalityName (provinceName)"
@@ -267,7 +313,11 @@ class ExcelFile {
         acc[item.province] = []
       }

-      acc[item.province].push(item.municipality)
+      const cleanText = ExcelFile.hasSpecialChars(item.municipality)
+        ? ExcelFile.removeGarbledText(item.municipality)
+        : item.municipality
+
+      acc[item.province].push(cleanText)

       // Sort municipality names alphabetically
       if (process.env.SORT_ALPHABETICAL === '1') {
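The cleaning step in `src/classes/excel/index.js` builds a `RegExp` directly from each configured key, so a key containing regex metacharacters (for example `.` or `(`) would be interpreted as a pattern rather than literal text. The sketch below shows one possible way to guard against that. It is not the commit's implementation: `escapeRegExp`, `hasNonAscii`, and `cleanText` are hypothetical names, and the sample municipality string is only an illustrative input.

```javascript
// Hypothetical helpers, shown only to illustrate the cleaning step.

// Escapes regex metacharacters so a key is matched as literal text
function escapeRegExp (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Returns true when the string has any character outside the ASCII range
function hasNonAscii (str) {
  // eslint-disable-next-line no-control-regex
  return /[^\x00-\x7F]/.test(str)
}

// Replaces every occurrence of each key in charMap with its value
function cleanText (str, charMap) {
  // Longer keys first, so "├â┬▒" is handled before its substring "â"
  const keys = Object.keys(charMap).sort((a, b) => b.length - a.length)

  return keys.reduce(
    // A function replacement keeps "$" in values from being treated specially
    (text, key) => text.replace(new RegExp(escapeRegExp(key), 'g'), () => charMap[key]),
    str
  )
}

// Usage: clean only the strings that contain non-ASCII characters
const charMap = { â: '', '├â┬▒': 'ñ' }
const name = 'Do├â┬▒a Remedios Trinidad'
console.log(hasNonAscii(name) ? cleanText(name, charMap) : name)
// Doña Remedios Trinidad
```

Sorting keys by length is a design choice in this sketch: if the shorter key `â` were applied first, the longer sequence `├â┬▒` would no longer match.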
