Option for generation of abbreviations #253

Open
pedropaulofb opened this issue Jun 10, 2022 · 6 comments
@pedropaulofb

Hi Peter @FlamingTempura!

The page limit set in a Call for Papers is frequently too small to fit all of the research content. In that case, it is common practice to shrink the bibliography by using abbreviations and acronyms. Doing this manually is time-consuming and error-prone. It would be wonderful if your tool had an option to generate accepted abbreviations automatically! For example, "Proceedings of the International" can become "Proc. Int.".

The ‘Conference Abbreviations’ section of the IEEE Editorial Style Manual (2021) (p. 63) provides many examples of valid abbreviations.

As I always say in my contributions: congratulations on the tool! I use it constantly and I am always recommending it to my colleagues (e.g., here).

@pedropaulofb
Author

Hi @FlamingTempura! I am using your tool for a paper and, once again, I need to reduce the size of the generated references.

While looking for a better list of abbreviations, I found this source, which is based on ISO 4. It is by far the best source of abbreviations I could find.

As it already provides the list as a CSV file, this feature should not be hard to implement. So please consider this source instead of the previous one I sent you.

Thanks for your excellent work!

@FlamingTempura
Owner

Thanks for the suggestion, and apologies for not responding sooner.

I can see this being a useful feature but there are some complexities:

  • How do we package the data? The linked CSV is 1.6 MB (~500 kB gzipped), which is a lot to add to the JS bundle, particularly for the browser. We could get the browser to load the CSV dynamically, but we'd also need to think about the CLI and the JS library, probably with a separate approach for each.
  • We would also need to ensure that the replacement rules are followed (e.g. not abbreviating names); it's not a straightforward search and replace. A JS library exists which may be worth looking into: https://github.com/marcinwrochna/abbrevIso/blob/master/browserBundle.js
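
To illustrate why this goes beyond search and replace, here is a minimal Python sketch of word-by-word abbreviation. Everything in it is an assumption made for illustration: the tiny `RULES` table (written in the LTWA style, where a trailing `-` marks a prefix pattern), the incomplete `STOP_WORDS` list, and the function names. It does not handle names, punctuation, or the many other ISO 4 conventions.

```python
# Hypothetical mini rule table in LTWA style: a trailing "-" marks a prefix
# pattern that matches any word starting with that stem.
RULES = {
    "proceedings": "Proc.",
    "international": "Int.",
    "conference": "Conf.",
    "journal": "J.",
    "comput-": "Comput.",
}

# Words dropped from abbreviated titles (illustrative, incomplete list).
STOP_WORDS = {"of", "the", "on", "and", "in", "for", "a", "an"}


def abbreviate_word(word):
    """Return the abbreviation for one word, or the word unchanged."""
    key = word.lower()
    if key in RULES:
        return RULES[key]
    # Try prefix patterns, longest stem first so the most specific rule wins.
    prefixes = sorted((p for p in RULES if p.endswith("-")), key=len, reverse=True)
    for pattern in prefixes:
        if key.startswith(pattern[:-1]):
            return RULES[pattern]
    return word


def abbreviate_title(title):
    """Abbreviate a venue title word by word."""
    words = title.split()
    # Single-word titles are left as they are.
    if len(words) <= 1:
        return title
    out = []
    for i, word in enumerate(words):
        # Drop articles and prepositions, except in first position.
        if i > 0 and word.lower() in STOP_WORDS:
            continue
        out.append(abbreviate_word(word))
    return " ".join(out)


print(abbreviate_title("Proceedings of the International Conference on Computing"))
```

Even this toy version needs three separate decisions per word (exact rule, prefix rule, or drop), which is the part a plain string replacement would get wrong.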

So while I think this could be a good feature it's looking like a difficult one to implement. That's to say, don't expect this feature soon. I'm wondering if this might be more suitable as a separate tool. I'm surprised one doesn't exist already.

@andrewfowlie

What would be an acceptable database size for replacement rules?

I have a suspicion that 90% of journal abbreviations could be covered by 10% of the ISO4 replacement rules. On top of that, the third column of the rules (language) can be discarded for these purposes.

Applying a subset of ISO4 rules to the journal entry could still be very useful.
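
One way to check that suspicion would be to count how often each rule fires on a list of real journal titles. The sketch below is only an outline of that measurement: `rule_coverage`, `SAMPLE_RULES`, and `SAMPLE_TITLES` are hypothetical names and made-up data, it matches whole words only (no prefix patterns), and the number it prints says nothing about the real ISO 4 list.

```python
from collections import Counter

# Made-up sample data, for illustration only.
SAMPLE_RULES = {
    "journal": "J.", "international": "Int.", "annals": "Ann.",
    "review": "Rev.", "letters": "Lett.", "bulletin": "Bull.",
    "transactions": "Trans.", "society": "Soc.", "research": "Res.",
    "science": "Sci.",
}
SAMPLE_TITLES = [
    "Journal of Physics",
    "Journal of Chemistry",
    "International Journal of Biology",
    "Annals of Mathematics",
]


def rule_coverage(titles, rules, fraction=0.1):
    """Share of all rule applications made by the most-used `fraction` of rules."""
    usage = Counter()
    for title in titles:
        for word in title.lower().split():
            if word in rules:
                usage[word] += 1
    total = sum(usage.values())
    if total == 0:
        return 0.0
    # Always keep at least one rule, even for very small rule tables.
    keep = max(1, int(len(rules) * fraction))
    return sum(count for _, count in usage.most_common(keep)) / total


print(rule_coverage(SAMPLE_TITLES, SAMPLE_RULES))
```

Running the same count over a large list of real titles and the full rule list would show how small a subset could be shipped.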

@pedropaulofb
Author

Another option would be to start only with the English abbreviations. That would cover the most important cases.

@andrewfowlie

andrewfowlie commented Mar 13, 2024

I will make a few minimal databases, and let’s see the sizes.


import pandas as pd

data = pd.read_csv("ltwa_current.csv", sep="\t")

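data = data.loc[(data['LANGUAGES'].str.contains('Multiple Languages', na=False)) | (data['LANGUAGES'].str.contains("English", na=False))]  # keep only English and multi-language rules; na=False treats missing language values as non-matches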
data = data.drop(columns=['LANGUAGES'])  # drop language column
data = data.dropna()  # drop nans (these are words that explicitly don't have a replacement rule)

compression_opts = dict(method='zip', archive_name='reduced_ltwa_current.csv')
data.to_csv('reduced_ltwa_current.zip', index=False, compression=compression_opts)  # save as zipped csv

This gives a 41 kB compressed / 122 kB uncompressed database containing only the rules explicitly marked as English or multiple languages, with the words explicitly marked as do-not-abbreviate removed.
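
For comparing candidate databases without writing files, the raw and gzipped sizes can also be measured in memory with the standard library. This is a sketch only: `table_sizes` is a hypothetical helper, the rows are placeholder data, and gzip sizes will differ somewhat from the zip archive produced above.

```python
import csv
import gzip
import io


def table_sizes(rows):
    """Return (raw_bytes, gzipped_bytes) for rows serialised as CSV."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    raw = buffer.getvalue().encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Placeholder rows; a real check would pass the filtered LTWA rows instead.
rows = [["WORDS", "ABBREVIATIONS"]] + [["international", "int."]] * 1000
raw_size, gzipped_size = table_sizes(rows)
print(raw_size, gzipped_size)
```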

@flxmr

flxmr commented Apr 11, 2024

This repository also has some JS code to do the abbreviations: https://github.com/marcinwrochna/abbrevIso
