Commit 22d7508

chore: update README and docs
1 parent 91e37a3 commit 22d7508

8 files changed (+66, -63 lines changed)

README.md

+24-24
Original file line number | Diff line number | Diff line change
@@ -4,24 +4,23 @@
44

55
## Background
66

7-
This bot has been developed in an attempt to help capture possible vandalism. This includes:
8-
9-
- Removing all code
10-
- Replacing all content with nonsense
11-
- Replacing all content with repeated words
12-
- Adding solutions to their questions instead of posting an answer
13-
- Removing large amounts of text from their post
14-
- Using certain keywords or offensive language within the edit summary
7+
This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:
8+
9+
- remove all code
10+
- replace content with nonsense or repeated words
11+
- include solutions to questions
12+
- remove large amounts of text from the post
13+
- use certain keywords or offensive language within the edit summary
1514

1615
## Why do we need the bot?
1716

1817
The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.
1918

2019
## Implementation
2120

22-
The bot queries the [Stack Exchange API][1] once every minute to get a list of the latest posts. There is logic to check that the post has been edited and that it has been edited by the author.
21+
The bot queries the [Stack Exchange API][1] every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.
2322

24-
The `post_id` from each post is then taken and the [Stack Exchange API][2] is again queried for the list of revisions. To limit calls we utilise the functionality of pushing multiple ids into the API and then logic is in place to ensure we are using the latest revision.
23+
The `post_id` from each post is then extracted and the [Stack Exchange API][2] is queried again for a list of revisions. To reduce API calls, multiple ids are sent in a single request, and the bot then makes sure it uses the latest revision.
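
The batching step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's actual Java code: the function names, the `2.3` API version, and the default `site` value are assumptions, and the `post_id` and `revision_number` field names follow my reading of the Stack Exchange API revision object. The API accepts several ids in one request when they are joined with semicolons, up to 100 at a time.

```python
from typing import Dict, Iterable, List

API_ROOT = "https://api.stackexchange.com/2.3"  # assumed API version
MAX_IDS_PER_REQUEST = 100  # limit for semicolon-delimited id lists


def build_revision_urls(post_ids: Iterable[int], site: str = "stackoverflow") -> List[str]:
    """Build one /posts/{ids}/revisions URL per chunk of up to 100 ids."""
    ids = list(dict.fromkeys(post_ids))  # drop duplicates, keep order
    urls = []
    for start in range(0, len(ids), MAX_IDS_PER_REQUEST):
        chunk = ids[start:start + MAX_IDS_PER_REQUEST]
        joined = ";".join(str(post_id) for post_id in chunk)
        urls.append(f"{API_ROOT}/posts/{joined}/revisions?site={site}")
    return urls


def latest_revisions(revisions: Iterable[dict]) -> Dict[int, dict]:
    """Keep only the revision with the highest revision_number for each post."""
    latest: Dict[int, dict] = {}
    for rev in revisions:
        post_id = rev["post_id"]
        number = rev.get("revision_number", 0)
        if post_id not in latest or number > latest[post_id].get("revision_number", 0):
            latest[post_id] = rev
    return latest
```

Sending 250 ids this way would take three requests instead of 250, which is the saving the paragraph refers to.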
2524

2625
Edits can be made up of a title change, body change of a question, tag changes or changes made to the body of an answer. Currently tags are not checked. Instead the title, question body and answer body depending on what has been edited are run through filters, as is the edit summary.
2726

@@ -33,17 +32,17 @@ Edits can be made up of a title change, body change of a question, tag changes o
3332

3433
### The question/answer body is run through the following filters:
3534

36-
- `TextRemoved`; 80% or more of the body must have been removed and then it must have a [Jaro Winkler][3] score of less than 0.6
35+
- `TextRemoved`; the bot checks if 80% or more of the body has been removed and whether the [Jaro Winkler][3] score of the diff is less than 0.6.
3736
- `BlacklistedWords`; certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for
38-
- `CodeRemoved`; the bot watches for all code being removed
39-
- `FewUniqueCharacters`; the body must either be 30 plus characters long and have less than 7 unique characters or be 100 plus characters long and have less than 16 unique characters
40-
- `RepeatedWords`; this is when an edit is made were all the body is replaced with repeated words. The bot will output if 5 or less unique words are found
41-
- `VeryLongWord`; the bot checks the post for a word longer than 50 characters long. Code is removed before the check is done
37+
- `CodeRemoved`; the bot checks if the latest edit removed all code from the post.
38+
- `FewUniqueCharacters`; the bot checks if the post contains few unique characters — this rule is similar to [SmokeDetector's "Few unique characters" one](https://metasmoke.erwaysoftware.com/reason/23).
39+
- `RepeatedWords`; the bot checks whether there are 5 or fewer unique words in the post.
40+
- `VeryLongWord`; the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.
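
To make the `RepeatedWords` and `VeryLongWord` rules above more concrete, here is a minimal Python sketch. It is not the bot's implementation (the bot is written in Java). The function names, the regular expressions, and the assumption that the body arrives as HTML with code inside `<pre>` or `<code>` elements are all illustrative.

```python
import re

# Assumption: post bodies are HTML and code lives in <pre> or <code> elements.
CODE_BLOCK = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>", re.DOTALL | re.IGNORECASE)


def strip_code(body_html: str) -> str:
    """Replace code elements with a space so they are ignored by later checks."""
    return CODE_BLOCK.sub(" ", body_html)


def has_repeated_words(body: str, max_unique: int = 5) -> bool:
    """True when the body is non-empty and has at most max_unique distinct words."""
    words = re.findall(r"\w+", body.lower())
    return bool(words) and len(set(words)) <= max_unique


def has_very_long_word(body_html: str, limit: int = 50) -> bool:
    """True when any whitespace-separated token outside code exceeds limit characters."""
    text = strip_code(body_html)
    return any(len(token) > limit for token in text.split())
```

Stripping code first matters because long identifiers, URLs, and minified snippets inside code blocks would otherwise trigger `VeryLongWord` on perfectly reasonable edits.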
4241

4342
### Edit summaries are run through the following filters:
4443

45-
- `BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for
46-
- `OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file
44+
- `BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
45+
- `OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file.
4746

4847
## Accounts
4948

@@ -76,17 +75,19 @@ A sample image of a report is:
7675

7776
mvn clean install
7877

79-
- Fill in `properties/login.properties`.
80-
- Start the bot:
78+
- Run
79+
80+
cp properties/login.example.properties properties/login.properties
81+
82+
and fill in `properties/login.properties`.
83+
84+
- Start the bot by running:
8185

8286
java -cp target/belisarius-1.7.1.jar:./lib/* bugs.stackoverflow.belisarius.Application
8387

8488
-----
8589

86-
If you want to change the location of the log file, edit `src/main/resources/log4j.xml` and change the path in line 16.
87-
Please note that the project should be rebuilt (`mvn install`), for the changes to be applied.
88-
89-
The source code is available on [GitHub][8] and suggestions are welcome.
90+
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.
9091

9192
[1]: https://api.stackexchange.com/docs/posts
9293
[2]: https://api.stackexchange.com/docs/revisions-by-ids
@@ -95,4 +96,3 @@ The source code is available on [GitHub][8] and suggestions are welcome.
9596
[5]: http://chat.stackoverflow.com/rooms/111347/sobotics
9697
[6]: http://belisarius.sobotics.org/commands
9798
[7]: https://user-images.githubusercontent.com/38133098/94342659-2af8d680-001b-11eb-9842-e6d0f5f4a70b.png
98-
[8]: https://github.com/SOBotics/Belisarius

docs/commands.md

+10-8
@@ -1,11 +1,13 @@
11
# Commands
22

3-
The list of commands is as follows
3+
The list of commands is as follows:
44

5-
alive - Test to check if the bot is alive or not.
6-
check - Checks a post, through the API, for potential vandalism (must be either a moderator or a room owner).
7-
help - Returns the description of the bot.
8-
quota - Returns the current quota
9-
reboot - Stops and starts the bot (must be either a moderator or a room owner).
10-
stop - Stops the bot (must be a either a moderator or a room owner).
11-
commands - Returns the list of commands associated with this bot.
5+
| Commands | Description |
6+
|------------|-------------------------------------------------------------------------------------------------------|
7+
| `alive` | Test to check if the bot is alive or not |
8+
| `check` | Checks a post, through the API, for potential vandalism (must be either a moderator or a room owner). |
9+
| `help` | Returns the description of the bot. |
10+
| `quota` | Returns the current API quota |
11+
| `reboot` | Restarts the bot (must be either a moderator or a room owner). |
12+
| `stop` | Stops the bot (must be either a moderator or a room owner). |
13+
| `commands` | Returns the list of commands associated with this bot. |

docs/comments.md

+2-3
@@ -1,7 +1,6 @@
11
# Auto-comments
22

3-
Sometimes, it's good to leave some comments in a vandalised post to help OP understand what they did wrong. Here are some you can use.
4-
They are in the format used by the [Stack Exchange AutoReview Comments userscript](https://stackapps.com/q/2116/58907) so that you can import and use them easily.
3+
In cases of potential vandalism, consider leaving constructive comments to help the post author understand their mistake. Below you can find a list of auto-comments (in a format compatible with the [Stack Exchange AutoReview Comments userscript](https://stackapps.com/q/2116)) that you can import and use easily:
54

65
```
76
###[Q] Vandalism
@@ -14,4 +13,4 @@ Editing questions to improve them (e.g. adding additional information, etc.) is
1413
Please don't make more work for others by vandalizing your posts. By posting on the Stack Exchange (SE) network, you've granted a non-revocable right, under the [CC BY-SA 4.0 license](//creativecommons.org/licenses/by-sa/4.0), for SE to distribute the content (i.e. regardless of your future choices). By SE policy, the non-vandalized version is distributed. Thus, any vandalism will be reverted. Please see: [How does deleting work?](//meta.stackexchange.com/q/5221). If permitted to delete, there's a "delete" button below the post, on the left, but it's only in browsers, not the mobile app.
1514
```
1615

17-
*Important! The second and the third autocomments are taken from Makyen ([1](https://stackoverflow.com/posts/comments/113202985), [2](https://stackoverflow.com/posts/comments/113198538)).*
16+
For the second and third auto-comments, credit goes to Makyen ([1](https://stackoverflow.com/posts/comments/113202985), [2](https://stackoverflow.com/posts/comments/113198538)).

docs/feedback.md

+1-1
@@ -9,7 +9,7 @@ There are two feedback:
99

1010
### How can I send feedback?
1111

12-
There are two ways to do this:
12+
There are two ways that you can send feedback:
1313

1414
1. Reply to the report with either `tp` or `fp`. There's a [userscript](https://github.com/SOBotics/Userscripts/blob/master/Belisarius/Belisarius_Controls.user.js) which may be helpful.
1515
2. Go to the respective Higgs report (click the "Hippo" link) and select the type of feedback you wish to send.

docs/filters.md

+10-12
@@ -4,24 +4,22 @@ In order to find if a post has been wrongly edited, titles, bodies and edit summ
44

55
### Titles are run through the following filters:
66

7-
- `BlacklistedWords`; the title matches a blacklisted regex.
7+
- `BlacklistedWords`; certain words are appended to titles. The bot reads a file which holds a list of keywords to watch out for within titles
88

99
### The question/answer body is run through the following filters:
1010

11-
- `TextRemoved`; 80% or more of the body must have been removed with a [Jaro Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) score of less than 0.6
12-
- `BlacklistedWords`; the post matches a blacklisted regex.
13-
- `CodeRemoved`; the code has been removed with the latest edit (for questions only).
14-
- `FewUniqueCharacters`; the body is either 30+ characters long and has less than 7 unique characters or 100+ characters long and has less than 15 unique characters.
15-
- `RepeatedWords`; the body has been replaced with 5 or less unique words as of the latest edit
16-
- `VeryLongWord`; there a word bigger than 50 characters in the body.
11+
- `TextRemoved`; the bot checks if 80% or more of the body has been removed and whether the [Jaro Winkler][3] score of the diff is less than 0.6.
12+
- `BlacklistedWords`; certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for
13+
- `CodeRemoved`; the bot checks if the latest edit removed all code from the post.
14+
- `FewUniqueCharacters`; the bot checks if the post contains few unique characters — this rule is similar to [SmokeDetector's "Few unique characters" one](https://metasmoke.erwaysoftware.com/reason/23).
15+
- `RepeatedWords`; the bot checks whether there are 5 or fewer unique words in the post.
16+
- `VeryLongWord`; the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.
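
As an illustration of the `TextRemoved` rule above, the following Python sketch combines a length-based "80% removed" test with a Jaro-Winkler similarity below 0.6. Several things here are assumptions rather than facts about the bot: that the removed fraction is measured by character count, that the similarity is computed between the previous and current body, and that the standard prefix boost (scaling 0.1, prefix capped at 4 characters) is applied without a minimum Jaro threshold. All function names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def text_removed(previous: str, current: str,
                 min_removed: float = 0.8, max_similarity: float = 0.6) -> bool:
    """Assumed reading of the rule: most text is gone and the rest is dissimilar."""
    if not previous:
        return False
    removed_fraction = 1 - len(current) / len(previous)
    return removed_fraction >= min_removed and jaro_winkler(previous, current) < max_similarity
```

Requiring both conditions keeps the rule from firing on edits that merely trim a long post while leaving what remains close to the original wording.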
1717

1818
### Edit summaries are run through the following filters:
1919

20-
- `BlacklistedWords`; the edit summary matches a blacklisted regex.
21-
- `OffensiveWord`; the edit summary matches an offensive regex.
22-
23-
**Note**: In order to reduce false positives in `VeryLongWord`, `TextRemoved` and `BlacklistedWords` reasons, some HTML tags are stripped (`a`, `code`, `img`, `pre`, `blockquote`).
20+
- `BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
21+
- `OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file.
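
The regex-based edit summary checks above can be pictured with a short Python sketch. The patterns shown are made-up placeholders, not entries from the bot's real blacklist or offensive-word files, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Pattern


def compile_patterns(lines: Iterable[str]) -> List[Pattern[str]]:
    """Compile one case-insensitive regex per non-empty line."""
    return [re.compile(line.strip(), re.IGNORECASE) for line in lines if line.strip()]


def matching_patterns(summary: str, patterns: Iterable[Pattern[str]]) -> List[str]:
    """Return the source text of every pattern found in the edit summary."""
    return [pattern.pattern for pattern in patterns if pattern.search(summary)]


# Placeholder patterns for illustration only.
EXAMPLE_PATTERNS = compile_patterns([r"\bsolved\b", r"\bdelete(d)?\b", ""])
```

Returning the matched patterns, rather than a plain boolean, lets a report say which keyword triggered it.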
2422

2523
### Where's the list of blacklisted and offensive words?
2624

27-
The bot fetches the blacklisted and offensive regexes from the database. You can find the blacklisted words CSV [here](https://github.com/SOBotics/Belisarius/blob/e5e7be6425209a2bb217275c901d0790d76a1c2f/ini/BlacklistedWords.csv) and the one with the offensive words [here](https://github.com/SOBotics/Belisarius/blob/e5e7be6425209a2bb217275c901d0790d76a1c2f/ini/OffensiveWords.csv).
25+
The bot fetches the blacklisted and offensive regexes from the database. You can find the [blacklisted words CSV here](https://github.com/SOBotics/Belisarius/blob/master/ini/BlacklistedWords.csv) and the [offensive words CSV here](https://github.com/SOBotics/Belisarius/blob/master/ini/OffensiveWords.csv).

docs/hippo.md

+3-3
@@ -2,13 +2,13 @@
22

33
Hippo is a Higgs web dashboard for Belisarius. It is the place where all the posts the bot catches are sent. Most of its features are publicly available, however you need to sign in to send feedback.
44

5-
Higgs is developed by [Rob](https://github.com/rjrudman) and hosted by Das_Geek. The GitHub repository is [here](https://github.com/SOBotics/Higgs).
5+
Higgs is developed by [Rob](https://github.com/rjrudman) and hosted by Das_Geek. You can find [the GitHub repository here](https://github.com/SOBotics/Higgs).
66

77
Here's what a report looks like:
88

9-
[![Higgs report](https://i.stack.imgur.com/ffjpv.png)](https://i.stack.imgur.com/ffjpv.png)
9+
[![Higgs report](https://i.sstatic.net/ffjpv.png)](https://i.sstatic.net/ffjpv.png)
1010

11-
- The prepended string `Answer to:` is added by the code and it means that the post is an answer.
11+
- `Answer to:` is prepended to the title if the reported post is an answer.
1212
- Current body contains the text the latest revision had at the time of reporting.
1313
- Similarly, last body contains the text the previous revision had at the time of reporting.
1414
- Confidence is the score of each reason.

docs/index.md

+8-8
@@ -2,23 +2,23 @@
22

33
## Background
44

5-
This bot has been developed in an attempt to help capture possible vandalism. This includes:
5+
This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:
66

7-
- Removing all code
8-
- Replacing all content with nonsense/repeated words
9-
- Adding solutions to their questions instead of posting an answer
10-
- Removing large amounts of text from their post
11-
- Using certain keywords or offensive language within the edit summary
7+
- remove all code
8+
- replace content with nonsense or repeated words
9+
- include solutions to questions
10+
- remove large amounts of text from the post
11+
- use certain keywords or offensive language within the edit summary
1212

1313
## Why do we need the bot?
1414

1515
The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.
1616

1717
## Implementation
1818

19-
The bot queries the [Stack Exchange API][1] once every minute to get a list of the latest posts. There is logic to check that the post has been edited and that it has been edited by the author.
19+
The bot queries the [Stack Exchange API][1] every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.
2020

21-
The `post_id` from each post is then taken and the [Stack Exchange API][2] is again queried for the list of revisions. To limit calls we utilise the functionality of pushing multiple ids into the API and then logic is in place to ensure we are using the latest revision.
21+
The `post_id` from each post is then extracted and the [Stack Exchange API][2] is queried again for a list of revisions. To reduce API calls, multiple ids are sent in a single request, and the bot then makes sure it uses the latest revision.
2222

2323
Edits can be made up of a title change, body change of a question, tag changes or changes made to the body of an answer. Currently tags are not checked. Instead the title, question body and answer body depending on what has been edited are run through filters, as is the edit summary.
2424

docs/run.md

+8-4
@@ -17,12 +17,16 @@
1717

1818
mvn clean install
1919

20-
- Fill in `properties/login.properties`.
21-
- Start the bot:
20+
- Run
21+
22+
cp properties/login.example.properties properties/login.properties
23+
24+
and fill in `properties/login.properties`.
25+
26+
- Start the bot by running:
2227

2328
java -cp target/belisarius-1.7.1.jar:./lib/* bugs.stackoverflow.belisarius.Application
2429

2530
-----
2631

27-
If you want to change the location of the log file, edit `src/main/resources/log4j.xml` and change the path in line 16.
28-
Please note that the project should be rebuilt (`mvn install`), for the changes to be applied.
32+
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.
