
Error when aggregating then splitting: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position #24

Open
gdaudin opened this issue Nov 3, 2020 · 9 comments

Comments

@gdaudin
Collaborator

gdaudin commented Nov 3, 2020

When I run the following two Python scripts back to back:

aggregate_sources_in_bdd_centrale.py
and
split_bdd_centrale_in_sources.py

I get this error:

guillaumedaudin@Oronte scripts % python3 /Users/guillaumedaudin/Documents/Recherche/Commerce\ International\ Français\ XVIIIe.xls/Balance\ du\ commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py
Traceback (most recent call last):
File "/Users/guillaumedaudin/Documents/Recherche/Commerce International Français XVIIIe.xls/Balance du commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py", line 59, in <module>
existing_files[filepath] = sum((1 for _ in f)) - 1
File "/Users/guillaumedaudin/Documents/Recherche/Commerce International Français XVIIIe.xls/Balance du commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py", line 59, in <genexpr>
existing_files[filepath] = sum((1 for _ in f)) - 1
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte

@gdaudin
Collaborator Author

gdaudin commented Nov 4, 2020

Yet <grep -axv '.*' bdd_centrale.csv> does not detect any non-UTF-8 characters.
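
A minimal sketch (not taken from the scripts in this repository) of a byte-level check that could complement grep here. It reads a file in binary mode and reports the offset of the first byte that is not valid UTF-8. The function name `find_invalid_utf8` is hypothetical. The traceback fails while reading `filepath`, which may be a file other than bdd_centrale.csv, so it may be worth running the check on every file the script opens.

```python
def find_invalid_utf8(path):
    """Return (offset, byte) of the first invalid UTF-8 byte, or None."""
    # Read raw bytes so that no decoding happens implicitly.
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte in `data`.
        return exc.start, data[exc.start]
    return None
```

For example, `find_invalid_utf8("bdd_centrale.csv")` returns None when the whole file decodes cleanly as UTF-8.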

@gdaudin
Collaborator Author

gdaudin commented Nov 5, 2020

The error also occurs with the August 12th version of the production branch on my computer. Might it be a machine-specific bug?

@paulgirard
Member

OK, so I don't get this on my Linux machine.
I removed an unused dependency and added some explicit args to the line that breaks.
I will try to reuse what you did before rolling back tomorrow.

@gdaudin
Collaborator Author

gdaudin commented Nov 5, 2020

I pulled and still have the issue:

guillaumedaudin@Oronte base % python3 /Users/guillaumedaudin/Documents/Recherche/Commerce\ International\ Français\ XVIIIe.xls/Balance\ du\ commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py
Traceback (most recent call last):
File "/Users/guillaumedaudin/Documents/Recherche/Commerce International Français XVIIIe.xls/Balance du commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py", line 58, in <module>
existing_files[filepath] = sum((1 for _ in f)) - 1
File "/Users/guillaumedaudin/Documents/Recherche/Commerce International Français XVIIIe.xls/Balance du commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py", line 58, in <genexpr>
existing_files[filepath] = sum((1 for _ in f)) - 1
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
guillaumedaudin@Oronte base %

@gdaudin
Collaborator Author

gdaudin commented Nov 5, 2020

I am not sure I understood what you meant by "I will try to reuse what you did before rolling back tomorrow."

Could you simply add the two columns/variables to bdd_centrale.csv and do the split? I will take care of the schema and of filling in the values. Please put them between value_minus_unit_val_x_qty and trade_deficit.

@gdaudin
Collaborator Author

gdaudin commented Nov 5, 2020

Though I admit this is unsatisfying...

gdaudin added a commit that referenced this issue Nov 6, 2020
@gdaudin
Collaborator Author

gdaudin commented Nov 6, 2020

It works now.

gdaudin closed this as completed Nov 6, 2020
@gdaudin
Collaborator Author

gdaudin commented Nov 7, 2020

The bug is back @paulgirard

Traceback (most recent call last):
File "/Users/guillaumedaudin/Documents/Recherche/Commerce International Français XVIIIe.xls/Balance du commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py", line 59, in <module>
existing_files[filepath] = sum((1 for _ in blouf)) - 1
File "/Users/guillaumedaudin/Documents/Recherche/Commerce International Français XVIIIe.xls/Balance du commerce/Retranscriptions_Commerce_France/toflit18_data_GIT/scripts/split_bdd_centrale_in_sources.py", line 59, in <genexpr>
existing_files[filepath] = sum((1 for _ in blouf)) - 1
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/csv.py", line 110, in __next__
self.fieldnames
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/csv.py", line 97, in fieldnames
self._fieldnames = next(self.reader)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf5 in position 571: invalid start byte

gdaudin reopened this Nov 7, 2020
gdaudin added a commit that referenced this issue Nov 7, 2020
@gdaudin
Collaborator Author

gdaudin commented Nov 7, 2020

So I replaced r+ with r and it works. Gulp, gulp.
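
A minimal sketch of the read-only variant, assuming the script only needs to count the data rows of each existing file. The helper name `count_data_rows` is hypothetical, and passing `encoding="utf-8"` explicitly is an assumption, added so that decoding does not depend on the platform default. It does not explain why `r+` failed on this machine.

```python
def count_data_rows(filepath):
    """Count the lines of a CSV file, excluding the header line."""
    # "r" opens the file read-only; "r+" would also request write access.
    with open(filepath, "r", encoding="utf-8") as f:
        return sum(1 for _ in f) - 1
```

With this helper, the failing line would read `existing_files[filepath] = count_data_rows(filepath)`.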
