README.md
+24 -24
@@ -4,24 +4,23 @@
4
4
5
5
## Background
6
6
7
-
This bot has been developed in an attempt to help capture possible vandalism. This includes:
8
-
9
-
- Removing all code
10
-
- Replacing all content with nonsense
11
-
- Replacing all content with repeated words
12
-
- Adding solutions to their questions instead of posting an answer
13
-
- Removing large amounts of text from their post
14
-
- Using certain keywords or offensive language within the edit summary
7
+
This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:
8
+
9
+
- remove all code
10
+
- replace content with nonsense or repeated words
11
+
- include solutions to questions
12
+
- remove large amounts of text from the post
13
+
- use certain keywords or offensive language within the edit summary
15
14
16
15
## Why do we need the bot?
17
16
18
17
The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.
19
18
20
19
## Implementation
21
20
22
-
The bot queries the [Stack Exchange API][1]once every minute to get a list of the latest posts. There is logic to check that the post has been edited and that it has been edited by the author.
21
+
The bot queries the [Stack Exchange API][1] every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.
23
22
24
-
The `post_id` from each post is then taken and the [Stack Exchange API][2] is again queried for the list of revisions. To limit calls we utilise the functionality of pushing multiple ids into the API and then logic is in place to ensure we are using the latest revision.
23
+
The `post_id` from each post is then extracted and the [Stack Exchange API][2] is again queried for a list of revisions. To reduce API calls, multiple ids are sent at once, and logic is in place to ensure we are using the latest revision.
25
24
26
25
Edits can consist of a title change, a change to the body of a question, tag changes, or a change to the body of an answer. Currently, tags are not checked. Instead, the title, question body, and answer body (depending on what has been edited) are run through filters, as is the edit summary.
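The batched revisions lookup described above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the helper names, the API version (`2.3`), and the `stackoverflow` site parameter are assumptions. It relies on the Stack Exchange API accepting up to 100 semicolon-delimited ids per call on `/posts/{ids}/revisions`, and on each returned revision carrying `post_id` and `creation_date` fields.

```python
from typing import Dict, Iterable, List

API_ROOT = "https://api.stackexchange.com/2.3"  # assumed API version
MAX_IDS_PER_CALL = 100  # the API accepts at most 100 ids in one vectorised call


def revision_urls(post_ids: Iterable[int], site: str = "stackoverflow") -> List[str]:
    """Build one revisions URL per chunk of up to 100 unique post ids."""
    ids = sorted(set(post_ids))
    urls = []
    for start in range(0, len(ids), MAX_IDS_PER_CALL):
        chunk = ids[start:start + MAX_IDS_PER_CALL]
        joined = ";".join(str(post_id) for post_id in chunk)
        urls.append(f"{API_ROOT}/posts/{joined}/revisions?site={site}")
    return urls


def latest_revisions(items: Iterable[dict]) -> Dict[int, dict]:
    """Keep only the most recent revision for each post_id."""
    latest: Dict[int, dict] = {}
    for revision in items:
        current = latest.get(revision["post_id"])
        if current is None or revision["creation_date"] > current["creation_date"]:
            latest[revision["post_id"]] = revision
    return latest
```

Here `items` stands for the `items` array of the decoded API response; fetching and decoding the response is left out of the sketch.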
27
26
@@ -33,17 +32,17 @@ Edits can be made up of a title change, body change of a question, tag changes o
33
32
34
33
### The question/answer body is run through the following filters:
35
34
36
-
-`TextRemoved`; 80% or more of the body must have been removed and then it must have a [Jaro Winkler][3] score of less than 0.6
35
+
-`TextRemoved`; the bot checks if 80% or more of the body has been removed and whether the [Jaro Winkler][3] score of the diff is less than 0.6.
37
36
-`BlacklistedWords`; certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for.
38
-
-`CodeRemoved`; the bot watches for all code being removed
39
-
-`FewUniqueCharacters`; the body must either be 30 plus characters long and have less than 7 unique characters or be 100 plus characters long and have less than 16 unique characters
40
-
-`RepeatedWords`; this is when an edit is made were all the body is replaced with repeated words. The bot will output if 5 or less unique words are found
41
-
-`VeryLongWord`; the bot checks the post for a word longer than 50 characters long. Code is removed before the check is done
37
+
-`CodeRemoved`; the bot checks if the latest edit removed all code from the post.
38
+
-`FewUniqueCharacters`; the bot checks if the post contains few unique characters — this rule is similar to [SmokeDetector's "Few unique characters" one](https://metasmoke.erwaysoftware.com/reason/23).
39
+
-`RepeatedWords`; the bot checks whether there are 5 or fewer unique words in the post.
40
+
-`VeryLongWord`; the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.
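To make the `TextRemoved` rule above more concrete, here is a minimal Python sketch under stated assumptions. It is not the bot's implementation. In particular, it assumes that "80% or more removed" is measured by character length and that the Jaro-Winkler score is computed between the previous and the current body; the function names are hypothetical.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro-Winkler similarity: Jaro boosted by a common prefix of up to 4 chars."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def text_removed(old_body: str, new_body: str) -> bool:
    """Flag an edit that removed >= 80% of the text and left a dissimilar body."""
    if not old_body:
        return False
    removed_fraction = 1 - len(new_body) / len(old_body)  # assumed length-based measure
    return removed_fraction >= 0.8 and jaro_winkler(old_body, new_body) < 0.6
```

Combining both conditions means that a large deletion alone is not enough to trigger the filter: the remaining text must also score as dissimilar to the previous body.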
42
41
43
42
### Edit summaries are run through the following filters:
44
43
45
-
-`BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for
46
-
-`OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file
44
+
-`BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
45
+
-`OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file.
47
46
48
47
## Accounts
49
48
@@ -76,17 +75,19 @@ A sample image of a report is:
If you want to change the location of the log file, edit `src/main/resources/log4j.xml` and change the path in line 16.
87
-
Please note that the project should be rebuilt (`mvn install`), for the changes to be applied.
88
-
89
-
The source code is available on [GitHub][8] and suggestions are welcome.
90
+
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.
docs/comments.md
+2 -3
@@ -1,7 +1,6 @@
1
1
# Auto-comments
2
2
3
-
Sometimes, it's good to leave some comments in a vandalised post to help OP understand what they did wrong. Here are some you can use.
4
-
They are in the format used by the [Stack Exchange AutoReview Comments userscript](https://stackapps.com/q/2116/58907) so that you can import and use them easily.
3
+
In cases of potential vandalism, consider leaving constructive comments to help the post author understand their mistake. Below you can find a list of auto-comments (in a format compatible with the [Stack Exchange AutoReview Comments userscript](https://stackapps.com/q/2116)) that you can import and use easily:
5
4
6
5
```
7
6
###[Q] Vandalism
@@ -14,4 +13,4 @@ Editing questions to improve them (e.g. adding additional information, etc.) is
14
13
Please don't make more work for others by vandalizing your posts. By posting on the Stack Exchange (SE) network, you've granted a non-revocable right, under the [CC BY-SA 4.0 license](//creativecommons.org/licenses/by-sa/4.0), for SE to distribute the content (i.e. regardless of your future choices). By SE policy, the non-vandalized version is distributed. Thus, any vandalism will be reverted. Please see: [How does deleting work?](//meta.stackexchange.com/q/5221). If permitted to delete, there's a "delete" button below the post, on the left, but it's only in browsers, not the mobile app.
15
14
```
16
15
17
-
*Important! The second and the third autocomments are taken from Makyen ([1](https://stackoverflow.com/posts/comments/113202985), [2](https://stackoverflow.com/posts/comments/113198538)).*
16
+
For the second and third auto-comments, credit goes to Makyen ([1](https://stackoverflow.com/posts/comments/113202985), [2](https://stackoverflow.com/posts/comments/113198538)).
docs/feedback.md
+1 -1
@@ -9,7 +9,7 @@ There are two feedback:
9
9
10
10
### How can I send feedback?
11
11
12
-
There are two ways to do this:
12
+
There are two ways that you can send feedback:
13
13
14
14
1. Reply to the report with either `tp` or `fp`. There's a [userscript](https://github.com/SOBotics/Userscripts/blob/master/Belisarius/Belisarius_Controls.user.js) which may be helpful.
15
15
2. Go to the respective Higgs report (click the "Hippo" link) and select the type of feedback you wish to send.
docs/filters.md
+10 -12
@@ -4,24 +4,22 @@ In order to find if a post has been wrongly edited, titles, bodies and edit summ
4
4
5
5
### Titles are run through the following filters:
6
6
7
-
-`BlacklistedWords`; the title matches a blacklisted regex.
7
+
-`BlacklistedWords`; certain words are appended to titles. The bot reads a file which holds a list of keywords to watch out for within titles.
8
8
9
9
### The question/answer body is run through the following filters:
10
10
11
-
-`TextRemoved`; 80% or more of the body must have been removed with a [Jaro Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) score of less than 0.6
12
-
-`BlacklistedWords`; the post matches a blacklisted regex.
13
-
-`CodeRemoved`; the code has been removed with the latest edit (for questions only).
14
-
-`FewUniqueCharacters`; the body is either 30+ characters long and has less than 7 unique characters or 100+ characters long and has less than 15 unique characters.
15
-
-`RepeatedWords`; the body has been replaced with 5 or less unique words as of the latest edit
16
-
-`VeryLongWord`; there a word bigger than 50 characters in the body.
11
+
-`TextRemoved`; the bot checks if 80% or more of the body has been removed and whether the [Jaro Winkler](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) score of the diff is less than 0.6.
12
+
-`BlacklistedWords`; certain words are appended to posts. The bot reads a separate file for questions and answers. Both hold a list of keywords to watch for.
13
+
-`CodeRemoved`; the bot checks if the latest edit removed all code from the post.
14
+
-`FewUniqueCharacters`; the bot checks if the post contains few unique characters — this rule is similar to [SmokeDetector's "Few unique characters" one](https://metasmoke.erwaysoftware.com/reason/23).
15
+
-`RepeatedWords`; the bot checks whether there are 5 or fewer unique words in the post.
16
+
-`VeryLongWord`; the bot checks the post for a word longer than 50 characters. Code blocks are stripped before the check is performed.
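The `RepeatedWords` and `VeryLongWord` checks above could look roughly like the following Python sketch. It is illustrative only: it assumes the body is available as HTML, that code lives in `<pre>` or `<code>` elements, and that an empty body should not be reported. The function names and the regex-based tag stripping are hypothetical simplifications.

```python
import re

# Assumed representation: code is wrapped in <pre> or <code> elements.
CODE_BLOCK = re.compile(r"<pre\b.*?</pre>|<code\b.*?</code>",
                        re.DOTALL | re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def strip_code(body_html: str) -> str:
    """Remove code blocks so that long identifiers or URLs in code are ignored."""
    return CODE_BLOCK.sub(" ", body_html)


def words(body_html: str) -> list:
    """Split the body into whitespace-separated words after dropping HTML tags."""
    return TAG.sub(" ", body_html).split()


def has_repeated_words(body_html: str, max_unique: int = 5) -> bool:
    """True when the body contains 5 or fewer unique words (case-insensitive)."""
    unique = {word.lower() for word in words(body_html)}
    return 0 < len(unique) <= max_unique  # assumption: empty bodies are not flagged


def has_very_long_word(body_html: str, max_length: int = 50) -> bool:
    """True when a word outside code blocks is longer than 50 characters."""
    return any(len(word) > max_length for word in words(strip_code(body_html)))
```

Stripping code before the length check matters because code often contains long unbroken tokens that would otherwise produce false positives.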
17
17
18
18
### Edit summaries are run through the following filters:
19
19
20
-
-`BlacklistedWords`; the edit summary matches a blacklisted regex.
21
-
-`OffensiveWord`; the edit summary matches an offensive regex.
22
-
23
-
**Note**: In order to reduce false positives in `VeryLongWord`, `TextRemoved` and `BlacklistedWords` reasons, some HTML tags are stripped (`a`, `code`, `img`, `pre`, `blockquote`).
20
+
-`BlacklistedWords`; certain words are used within the edit summaries. The bot holds a separate file for question edit summaries and answer edit summaries. Both hold a list of keywords to watch for.
21
+
-`OffensiveWord`; the bot checks for offensive language used within the edit summary. This is done via a separate regex file.
24
22
25
23
### Where's the list of blacklisted and offensive words?
26
24
27
-
The bot fetches the blacklisted and offensive regexes from the database. You can find the blacklisted words CSV [here](https://github.com/SOBotics/Belisarius/blob/e5e7be6425209a2bb217275c901d0790d76a1c2f/ini/BlacklistedWords.csv) and the one with the offensive words [here](https://github.com/SOBotics/Belisarius/blob/e5e7be6425209a2bb217275c901d0790d76a1c2f/ini/OffensiveWords.csv).
25
+
The bot fetches the blacklisted and offensive regexes from the database. You can find the [blacklisted words CSV here](https://github.com/SOBotics/Belisarius/blob/master/ini/BlacklistedWords.csv) and the [offensive words CSV here](https://github.com/SOBotics/Belisarius/blob/master/ini/OffensiveWords.csv).
docs/hippo.md
+3 -3
@@ -2,13 +2,13 @@
2
2
3
3
Hippo is a Higgs web dashboard for Belisarius. It is the place where all the posts the bot catches are sent. Most of its features are publicly available; however, you need to sign in to send feedback.
4
4
5
-
Higgs is developed by [Rob](https://github.com/rjrudman) and hosted by Das_Geek. The GitHub repository is [here](https://github.com/SOBotics/Higgs).
5
+
Higgs is developed by [Rob](https://github.com/rjrudman) and hosted by Das_Geek. You can find [the GitHub repository here](https://github.com/SOBotics/Higgs).
docs/index.md
+8 -8
@@ -2,23 +2,23 @@
2
2
3
3
## Background
4
4
5
-
This bot has been developed in an attempt to help capture possible vandalism. This includes:
5
+
This bot has been developed in an attempt to help capture possible vandalism by identifying edits that:
6
6
7
-
-Removing all code
8
-
-Replacing all content with nonsense/repeated words
9
-
-Adding solutions to their questions instead of posting an answer
10
-
-Removing large amounts of text from their post
11
-
-Using certain keywords or offensive language within the edit summary
7
+
-remove all code
8
+
-replace content with nonsense or repeated words
9
+
-include solutions to questions
10
+
-remove large amounts of text from the post
11
+
-use certain keywords or offensive language within the edit summary
12
12
13
13
## Why do we need the bot?
14
14
15
15
The point of the bot is to help identify bad edits and/or potential vandalism made to posts in real time so that the changes can be quickly rolled back.
16
16
17
17
## Implementation
18
18
19
-
The bot queries the [Stack Exchange API][1]once every minute to get a list of the latest posts. There is logic to check that the post has been edited and that it has been edited by the author.
19
+
The bot queries the [Stack Exchange API][1] every minute to fetch a list of the most recently edited posts. There is logic to check that the post has been edited and that it has been edited by the author.
20
20
21
-
The `post_id` from each post is then taken and the [Stack Exchange API][2] is again queried for the list of revisions. To limit calls we utilise the functionality of pushing multiple ids into the API and then logic is in place to ensure we are using the latest revision.
21
+
The `post_id` from each post is then extracted and the [Stack Exchange API][2] is again queried for a list of revisions. To reduce API calls, multiple ids are sent at once, and logic is in place to ensure we are using the latest revision.
22
22
23
23
Edits can consist of a title change, a change to the body of a question, tag changes, or a change to the body of an answer. Currently, tags are not checked. Instead, the title, question body, and answer body (depending on what has been edited) are run through filters, as is the edit summary.
If you want to change the location of the log file, edit `src/main/resources/log4j.xml` and change the path in line 16.
28
-
Please note that the project should be rebuilt (`mvn install`), for the changes to be applied.
32
+
If you want to change the location of the log file, edit `src/main/resources/log4j.xml`. The project must be rebuilt (`mvn install`) for the changes to be applied.