ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (207,) + inhomogeneous part. #4

Open
Parasite-231 opened this issue Apr 8, 2024 · 3 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

Comments

Parasite-231 commented Apr 8, 2024

I can't understand why training the classifier in Task2_3_Multiclass_classification_of_NFR_subclasses.ipynb suddenly fails. If there is a solution, please share it. I have attached a screenshot of the error as well as the cell that produced it.

Image: [screenshot of the error]

Cell : Decide how to fold and train the classifier
Code snippet :

overall_flat_predictions, overall_flat_true_labels, results = [], [], []
initLog()
if config.fold == Fold.TenFold:
  skf = StratifiedKFold(n_splits=10)
  fold_number = 1
  for train, test in skf.split(df, df[config_data.label_column]):
    df_train = df.iloc[train]
    df_eval = df.iloc[test]
    log_text = '/////////////////////// Fold: {} of {} /////////////////////////////'.format(fold_number,10)
    logLine(log_text)
    classifier, overall_flat_predictions, overall_flat_true_labels, results = train_and_predict(df_train, df_eval, overall_flat_predictions, overall_flat_true_labels, results)
    fold_number = fold_number + 1
elif config.fold == Fold.ProjFold:     
  for k in config_data.project_fold:
    test = df.loc[df['ProjectID'].isin(k)].index
    train = df.loc[~df['ProjectID'].isin(k)].index
    df_train = df.loc[train]
    df_eval = df.loc[test]
    log_text = '/////////////////////// Test-Projects: {} /////////////////////////////'.format(k)
    logLine(log_text)
    classifier, overall_flat_predictions, overall_flat_true_labels, results = train_and_predict(df_train, df_eval, overall_flat_predictions, overall_flat_true_labels, results)
else:
  df_train, df_eval = train_test_split(df,stratify=df[config_data.label_column], train_size=config.train_size, random_state= config.seed)
  classifier, overall_flat_predictions, overall_flat_true_labels, results = train_and_predict(df_train, df_eval, overall_flat_predictions, overall_flat_true_labels, results)

get_memory_usage_str() 

Error :

Train Dataframe shape: (332, 18)
Evaluation Dataframe shape: (37, 18)
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-b2bf3dae31b4> in <cell line: 29>()
     35     log_text = '/////////////////////// Fold: {} of {} /////////////////////////////'.format(fold_number,10)
     36     logLine(log_text)
---> 37     classifier, overall_flat_predictions, overall_flat_true_labels, results = train_and_predict(df_train, df_eval, overall_flat_predictions, overall_flat_true_labels, results)
     38     fold_number = fold_number + 1
     39 elif config.fold == Fold.ProjFold:

10 frames
/usr/local/lib/python3.10/dist-packages/fastai/core.py in array(a, dtype, **kwargs)
    300     if np.int_==np.int32 and dtype is None and is_listy(a) and len(a) and isinstance(a[0],int):
    301         dtype=np.int64
--> 302     return np.array(a, dtype=dtype, **kwargs)
    303 
    304 class EmptyLabel(ItemBase):

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (249,) + inhomogeneous part.
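
The detected shape `(249,)` plus an "inhomogeneous part" suggests that the outer sequence has 249 items whose lengths differ. A small helper like the one below can confirm this before the data reaches `np.array`. It is only a sketch: `summarize_lengths` is a hypothetical name, and the sample data is made up rather than taken from the notebook.

```python
from collections import Counter

def summarize_lengths(items):
    """Count how many elements of `items` have each length.

    Elements without a length (plain numbers, None) are grouped under "scalar".
    """
    counts = Counter()
    for item in items:
        try:
            counts[len(item)] += 1
        except TypeError:
            counts["scalar"] += 1
    return dict(counts)

# Made-up example: three sequences of differing length and one bare number.
sample = [[1, 2, 3], [4, 5], [6, 7, 8], 9]
summary = summarize_lengths(sample)
print(summary)
```

More than one key in the result means the input cannot be turned into a regular (rectangular) array as it stands.
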
Collaborator

Gram21 commented Apr 9, 2024

Thank you for your interest and your detailed request! We will look into it, but this might take a short while.

Gram21 commented Apr 14, 2024

Quick update: I can reproduce the error, but due to time constraints it will take me at least until next week to find a fix.

Gram21 commented May 7, 2024

Okay, here is the thing: we were able to locate the bug and kind of fix it.

The issue occurs when we create the databunches.
The dataframes that we pass in seem to have an inhomogeneous shape. This was not a problem in the past, but newer versions of Python and NumPy no longer allow it. So far, I have not found a way to make the dataframes homogeneous. One idea is to set the dtype to something like object, but I ran into problems when doing so.
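
To illustrate the point about inhomogeneous shapes, here is a minimal sketch that is independent of the notebook. The token lists are made up, and the real input to fastai's `array` helper may look different. The sketch relies on NumPy 1.24 and later raising `ValueError` for ragged nested sequences unless `dtype=object` is requested, and it shows padding as one possible way to get a homogeneous array. The helper names `to_object_array` and `pad_to_same_length` and the pad value 0 are assumptions for illustration, not part of the project.

```python
import numpy as np

# Hypothetical token-id sequences of different lengths (a "ragged" list).
ragged = [[1, 2, 3], [4, 5], [6]]

try:
    np.array(ragged)  # raises ValueError on NumPy >= 1.24
except ValueError as err:
    print("NumPy refused the ragged input:", err)

def to_object_array(seqs):
    """Build a 1-D object array that holds each sequence as one element."""
    out = np.empty(len(seqs), dtype=object)
    for i, seq in enumerate(seqs):
        out[i] = list(seq)
    return out

def pad_to_same_length(seqs, pad_value=0):
    """Right-pad every sequence so that all rows have equal length."""
    width = max(len(seq) for seq in seqs)
    return np.array([list(seq) + [pad_value] * (width - len(seq)) for seq in seqs])

obj = to_object_array(ragged)        # shape (3,), dtype object
padded = pad_to_same_length(ragged)  # shape (3, 3), numeric dtype
```

Filling a preallocated object array keeps the result one-dimensional even when all sequences happen to have the same length. Whether the downstream training code accepts object arrays or padded rows is a separate question that this sketch does not answer.
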

The fix for now is to use Python 3.7 (as stated in the INSTALL.md), as this version still allows these kinds of shapes.
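
A notebook can check its environment before training starts. The sketch below assumes the relevant cutoff is NumPy 1.24, the release in which creating an array from ragged nested sequences without `dtype=object` became an error. The function names `numpy_rejects_ragged` and `check_environment` are hypothetical and not part of the project.

```python
import re

def numpy_rejects_ragged(version: str) -> bool:
    """Return True if this NumPy version raises on ragged input (assumed cutoff: 1.24)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 24)

def check_environment() -> str:
    """Describe whether the installed NumPy is expected to hit the error."""
    try:
        import numpy as np
    except ImportError:
        return "NumPy is not installed"
    if numpy_rejects_ragged(np.__version__):
        return f"NumPy {np.__version__}: ragged arrays raise ValueError"
    return f"NumPy {np.__version__}: ragged arrays are still accepted (possibly with a warning)"

print(check_environment())
```

This fits the Python 3.7 recommendation: the NumPy releases that support Python 3.7 end with the 1.21 series, which is below the assumed cutoff.
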

If you want or need to use a newer version of Python, feel free to adapt the code and fix the underlying issue. Due to time constraints, we cannot update the code right now. We are sorry if this causes any inconvenience.

Gram21 added the bug (Something isn't working) and wontfix (This will not be worked on) labels on May 7, 2024