
[CRITICAL] [acdcli.cache.db] - Error opening database. #124

Open
StSimmons opened this issue Aug 28, 2015 · 24 comments

Comments

@StSimmons

Hi,
I see this error when running 'acd_cli sync', 'upload', etc.
It seems to happen when doing an acd_cli upload while the FUSE mount is open. Waiting a while doesn't fix it... Any ideas?
Steve

@StSimmons
Author

Killing all the acd_cli processes does fix this, but that prevents uploads while the FUSE filesystem is mounted, which is the only way to upload whilst the rsync/cp unreliability is present.

@StSimmons
Author

My 'stab in the dark' investigation tells me this happens when doing large uploads. I assume the FUSE process tries to sync every now and then, fails to access the locked database, and gets into a bad state. This leaves the database locked (so ul and sync can't access the db) and leaves the FUSE mount broken too. Could we perhaps use different cache directories? That seems like an ideal solution to me (the upload cache wouldn't need to be identical to the FUSE one) - eventual consistency and all that!

@yadayada
Owner

"Error opening database" should occur only (?) if nodes.db is faulty. Which action were you performing when this error occurred? Did you check the database file for corruption afterwards?

You can have two separate node caches by copying your .cache directory and using ACD_CLI_CACHE_PATH="/copied/path"; acdcli ul [...] to make use of the non-standard one.

The FUSE background sync can be disabled by adding -fg 0 -i0 to the mount action.

@StSimmons
Author

I don't think it's corruption of nodes.db; if I kill all the acd_cli processes, it all starts working again on the next sync.

Also, doing an ls in the mounted directory when it's broken gives 'Bad address', if that's any help. Doing the above and remounting gets it working again.

@yadayada
Owner

That's strange. An ls should call ACDFuse's readdir, which does not raise an EFAULT (bad address) but an ENOENT (no such file or directory).

Can you tell me which operating system/distribution you're using and your Python 3 version?

It would also be helpful if you could write up some steps on how to reproduce this reliably.

@yadayada yadayada added the cache label Aug 29, 2015
@StSimmons
Author

Unfortunately, I don't have a reliable repro; it just happens sporadically. I got it again today, but this time I was not doing a transfer at the same time, so it looks to be FUSE-related...

Ubuntu x64 14.04LTS, Python 3.4.0

@StSimmons
Author

I'm seeing this more and more when doing multiple things with the FUSE mount at once. 'Bad address' comes back when trying to ls the acd mount. This is a regular occurrence when a number of apps start scanning the directory... Any ideas, or anything I could/should debug?

@yadayada
Owner

Mounting read-only might help.

EFAULT should be raised on a read or write at an invalid address (hence "Bad Address").

It would be helpful if you could inspect the debug log, i.e. run acdcli -d mount -fg -i0 /mount/point and have a look where the errors happen.

Update: Sorry, the -i0 argument prevents this from happening.

@StSimmons
Author

Haven't seen it since (I didn't change any permissions, just left it in logging mode to catch it when it happens). I'll update when I have more news! I did see it on another Linux install in Azure, but I didn't have debug on at that point.

@StSimmons
Author

It's been a while, but I have still seen this (even with only one instance running). I didn't manage to get logs, as I accidentally tripped over the power cable and lost them...

@ratherDashing

Here is the stack trace. The log looks good around the same time:

15-10-16 08:24:06.728 [ERROR] [acd_cli] - Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/acd_cli.py", line 183, in autosync
    sync_node_list(full=False)
  File "/usr/local/bin/acd_cli.py", line 147, in sync_node_list
    cache.KeyValueStorage.update({CacheConsts.LAST_SYNC_KEY: time.time()})
  File "/usr/local/lib/python3.4/dist-packages/acdcli/cache/schema.py", line 238, in update
    self.__setitem__(key, dict_[key])
  File "/usr/local/lib/python3.4/dist-packages/acdcli/cache/schema.py", line 226, in __setitem__
    self.Session.merge(md)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/scoping.py", line 150, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/session.py", line 1698, in merge
    load=load, _recursive=_recursive)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/session.py", line 1737, in _merge
    merged = self.query(mapper.class_).get(key[1])
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 819, in get
    return self._get_impl(ident, loading.load_on_ident)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 852, in _get_impl
    return fallback_fn(self, key)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/loading.py", line 219, in load_on_ident
    return q.one()
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2473, in one
    ret = list(self)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2516, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/orm/query.py", line 2531, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/util/compat.py", line 188, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/util/compat.py", line 181, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/local/lib/python3.4/dist-packages/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked [SQL: 'SELECT metadata."key" AS metadata_key, metadata.value AS metadata_value \nFROM metadata \nWHERE metadata."key" = ?'] [parameters: ('last_sync',)]

acd_cli.log at the same time:

15-10-16 08:17:59.792 [INFO] [acdcli.api.metadata] - 1 pages in changes.
15-10-16 08:39:03.637 [DEBUG] [acdcli.acd_fuse] - -> getattr /Plex (None,)

Nothing at 8:24

[edit]

15-10-16 14:27:07.494 [ERROR] [acd_cli] - Traceback (most recent call last):
[Same as Last Time]

Ran the FUSE mount at debug level and also didn't see anything:

15-10-16 14:26:39.624 [INFO] [acdcli.api.metadata] - Getting changes with checkpoint "removed".
15-10-16 14:26:39.625 [DEBUG] [acdcli.api.backoff_req] - Retry 0, waiting -59.378617s
15-10-16 14:26:39.625 [DEBUG] [acdcli.api.backoff_req] - POST "https://cdws.us-east-1.amazonaws.com/drive/v1/changes"
15-10-16 14:26:39.625 [DEBUG] [acdcli.api.backoff_req] - {"checkpoint": "removed", "includePurged": "true"}
15-10-16 14:26:39.626 [INFO] [requests.packages.urllib3.connectionpool] - Starting new HTTPS connection (1): cdws.us-east-1.amazonaws.com
send: b'POST /drive/v1/changes HTTP/1.1\r\nHost: cdws.us-east-1.amazonaws.com\r\nUser-Agent: acd_cli/0.3.0a6 python-requests/2.8.1\r\nAccept: */*\r\nAuthorization: Bearer Atza|removed\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nContent-Length: 111\r\n\r\n{"checkpoint": "removed", "includePurged": "true"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server header: Date header: Content-Type header: Transfer-Encoding header: Connection header: x-amzn-RequestId header: Content-Encoding header: Cache-Control header: Pragma header: Vary 15-10-16 14:26:39.774 [DEBUG] [requests.packages.urllib3.connectionpool] - "POST /drive/v1/changes HTTP/1.1" 200 None
15-10-16 14:26:39.775 [DEBUG] [acdcli.api.metadata] - Checkpoint: removed
15-10-16 14:26:39.939 [INFO] [acdcli.api.metadata] - 1 pages in changes.
15-10-16 14:27:39.992 [INFO] [acdcli.api.metadata] - Getting changes with checkpoint "removed".
15-10-16 14:27:39.992 [DEBUG] [acdcli.api.backoff_req] - Retry 0, waiting -60.019070s
15-10-16 14:27:39.992 [DEBUG] [acdcli.api.backoff_req] - POST "https://cdws.us-east-1.amazonaws.com/drive/v1/changes"
15-10-16 14:27:39.992 [DEBUG] [acdcli.api.backoff_req] - {"checkpoint": "removed", "includePurged": "true"}
15-10-16 14:27:39.993 [INFO] [requests.packages.urllib3.connectionpool] - Starting new HTTPS connection (1): cdws.us-east-1.amazonaws.com
send: b'POST /drive/v1/changes HTTP/1.1\r\nHost: cdws.us-east-1.amazonaws.com\r\nUser-Agent: acd_cli/0.3.0a6 python-requests/2.8.1\r\nAccept: */*\r\nAuthorization: Bearer Atza|removed\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nContent-Length: 111\r\n\r\n{"checkpoint": "removed", "includePurged": "true"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server header: Date header: Content-Type header: Transfer-Encoding header: Connection header: x-amzn-RequestId header: Content-Encoding header: Cache-Control header: Pragma header: Vary 15-10-16 14:27:40.146 [DEBUG] [requests.packages.urllib3.connectionpool] - "POST /drive/v1/changes HTTP/1.1" 200 None
15-10-16 14:27:40.147 [DEBUG] [acdcli.api.metadata] - Checkpoint: CP_rjo-HKhoiCAYaHgEUAAAAA5Ma6oaIIfBQqvanmeCk4YvIy5opj8xFbg==
15-10-16 14:27:40.272 [INFO] [acdcli.api.metadata] - 1 pages in changes.
15-10-16 14:28:40.333 [INFO] [acdcli.api.metadata] - Getting changes with checkpoint "removed".
15-10-16 14:28:40.334 [DEBUG] [acdcli.api.backoff_req] - Retry 0, waiting -59.722795s
15-10-16 14:28:40.334 [DEBUG] [acdcli.api.backoff_req] - POST "https://cdws.us-east-1.amazonaws.com/drive/v1/changes"
15-10-16 14:28:40.334 [DEBUG] [acdcli.api.backoff_req] - {"checkpoint": "removed", "includePurged": "true"}
15-10-16 14:28:40.335 [INFO] [requests.packages.urllib3.connectionpool] - Starting new HTTPS connection (1): cdws.us-east-1.amazonaws.com
send: b'POST /drive/v1/changes HTTP/1.1\r\nHost: cdws.us-east-1.amazonaws.com\r\nUser-Agent: acd_cli/0.3.0a6 python-requests/2.8.1\r\nAccept: */*\r\nAuthorization: Bearer Atza|removed\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\nContent-Length: 111\r\n\r\n{"checkpoint": "removed", "includePurged": "true"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Server header: Date header: Content-Type header: Transfer-Encoding header: Connection header: x-amzn-RequestId header: Content-Encoding header: Cache-Control header: Pragma header: Vary 15-10-16 14:28:40.456 [DEBUG] [requests.packages.urllib3.connectionpool] - "POST /drive/v1/changes HTTP/1.1" 200 None
15-10-16 14:28:40.457 [DEBUG] [acdcli.api.metadata] - Checkpoint: removed
15-10-16 14:28:40.645 [INFO] [acdcli.cache.sync] - Inserted/updated 1 folder(s).
15-10-16 14:28:40.854 [INFO] [acdcli.cache.sync] - Inserted/updated 1 file(s).
15-10-16 14:28:41.280 [INFO] [acdcli.cache.sync] - Parented 2 node(s).
15-10-16 14:28:42.984 [INFO] [acdcli.api.metadata] - 1 pages in changes.

@sjfer

sjfer commented Oct 23, 2015

Is there any way the PID of the acdcli upload task can be killed automatically when the upload has completed or been stopped?

@hazcod

hazcod commented Nov 6, 2015

I have this issue happening from time to time.
I have a read-only mount with simultaneous acdcli upload tasks running.

This always solves the problem, but I do not know yet how to catch those specific exceptions in bash to do this automatically:

acdcli umount /tmp/.media
rm /home/media/.cache/acd_cli/nodes.db

acdcli sync
acdcli --verbose mount "/tmp/.media" --ro > /var/log/acd.log 2>&1
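One way to automate that recovery without catching the exceptions themselves is to poll the mount and trigger the same unmount/resync/remount sequence whenever ls reports the 'Bad address' symptom seen earlier in this thread. The paths below are the ones from the commands above; treat this as an untested sketch, not a supported acdcli feature:

```shell
#!/bin/sh
# Watchdog sketch: detect the wedged FUSE mount by the "Bad address" error
# that ls reports in this thread, then run the same recovery sequence.
MOUNT=/tmp/.media
NODES_DB=/home/media/.cache/acd_cli/nodes.db

mount_is_broken() {
    # A wedged mount makes ls fail with "Bad address" on stderr
    ls "$MOUNT" 2>&1 | grep -q 'Bad address'
}

if mount_is_broken; then
    acdcli umount "$MOUNT"
    rm -f "$NODES_DB"
    acdcli sync
    acdcli --verbose mount "$MOUNT" --ro > /var/log/acd.log 2>&1
fi
```

Run it from cron every minute or so; when the mount is healthy the check is a no-op.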

@dellipse

I am also experiencing this issue, but I have not had a chance to gather logs. I will do so on the next occurrence.

However, since I am not a sqlite expert... is there anything I can do to help prevent or even mitigate this issue while still being able to use both the FUSE mount in r/w mode and the directory uploads? Until the acd_cli/encfs combination speed issues are fixed and the FUSE side becomes more stable, I still need to use both.

Also, if you give me a list of things to look for when this happens, I will provide them.

@yadayada
Owner

However, since I am not a sqlite expert... is there anything I can do to help prevent or even mitigate this issue while still being able to use both the FUSE mount in r/w mode and the directory uploads?

See #124 (comment).

I'd also appreciate it if anyone could check whether the following patch changes anything.

diff --git a/acdcli/cache/db.py b/acdcli/cache/db.py
index 9399711..54f31f4 100644
--- a/acdcli/cache/db.py
+++ b/acdcli/cache/db.py
@@ -59,6 +59,8 @@ class NodeCache(QueryMixin, SyncMixin):
                lambda conn: conn.connection.create_function('REGEXP', 2, _regex_match))

         initialized = os.path.exists(db_path)
+        self.engine.execute('PRAGMA busy_timeout = 30000;')
+
         if initialized:
             try:
                 initialized = self.engine.has_table(schema.Metadate.__tablename__) and \

Until the acd_cli/encfs combination speed issues are fixed

Not going to happen.
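To see what the busy_timeout pragma in the patch changes, here is a plain-stdlib demonstration (not acd_cli code; the file and table names are illustrative). Without a timeout, a second connection fails immediately with "database is locked" while another connection holds the write lock; with a timeout, SQLite retries for up to the given number of milliseconds before giving up:

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file for the demo.
db = os.path.join(tempfile.mkdtemp(), "nodes-demo.db")

writer = sqlite3.connect(db, isolation_level=None)
writer.execute("CREATE TABLE metadata (key TEXT, value TEXT)")
writer.execute("BEGIN IMMEDIATE")  # hold the write lock, like a running sync

# timeout=0 behaves like busy_timeout = 0: fail instead of waiting.
impatient = sqlite3.connect(db, timeout=0, isolation_level=None)
try:
    impatient.execute("BEGIN IMMEDIATE")
    locked_err = ""
except sqlite3.OperationalError as e:
    locked_err = str(e)
print(locked_err)  # -> database is locked

# With the pragma from the patch, the connection instead waits up to 30 s
# for the lock to be released before raising the error.
patient = sqlite3.connect(db, isolation_level=None)
patient.execute("PRAGMA busy_timeout = 30000")
timeout_ms = patient.execute("PRAGMA busy_timeout").fetchone()[0]
print(timeout_ms)  # -> 30000
```

So the patch should turn most sporadic "database is locked" failures into short waits, as long as no single writer holds the lock for more than 30 seconds.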

@dellipse

I tried using this command, but it appears to still use the default nodes.db cache:

ACD_CLI_CACHE_PATH="/home/acd_ul_cache"; acdcli -u ul --remove-source-files -o -x 8 -xp /home/excluded/path /home/FilesToTransfer/* /

I got the same errors again. I tried to use the 'fuser' command from #143 (comment) to see what was using it, but when I checked there was nothing.

@dellipse

I just took the ; out of the command and it is now using the correct cache path. I checked using the command

ACD_CLI_CACHE_PATH="/home/acd_ul_cache" acdcli -v init

before running it this time. It used the default when the ; was in the command.
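For anyone else hitting this: the difference is plain shell semantics, not acd_cli. `VAR=value command` places the variable in that one command's environment, whereas `VAR=value; command` sets an unexported shell variable that the child process never sees. A generic illustration (ACD_DEMO is a made-up variable name):

```shell
# With the semicolon: the assignment is a separate statement, creating an
# unexported shell variable, so the child process sees nothing.
ACD_DEMO="per-command"; sh -c 'echo "with semicolon:    ${ACD_DEMO:-unset}"'

# Without the semicolon: the assignment is part of the command itself, so it
# is exported into that one child's environment only.
ACD_DEMO="per-command" sh -c 'echo "without semicolon: ${ACD_DEMO:-unset}"'
```

The first line prints `unset`, the second prints `per-command`, which matches the behavior observed above. (`export ACD_CLI_CACHE_PATH=...` would also work, but affects every subsequent command in the session.)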

@egyptianbman

Chiming in: I'm also experiencing many 'database is locked' errors, when mounting multiple mounts or mounting and sending at the same time.

@yadayada yadayada mentioned this issue Dec 8, 2015
@calisro
Contributor

calisro commented Dec 19, 2015

Same issue here. It is sporadic though. Once this occurs, I need to unmount and remount it to get it to work again.

@marcelowa

Experiencing the same issue too

@yadayada
Owner

@marcelowa Could you provide a log?

@marcelowa

I actually changed my configuration to use a different cache path as suggested; for now that seems to do the trick,
but I will change it back later to get the log.

@marcelowa

It actually happened anyway, even with a different cache path for uploading.

Suspicious parts

15-12-24 23:40:16.566 [DEBUG] [acdcli.acd_fuse] - <- getattr '[Unhandled Exception]'
15-12-25 03:08:08.698 [CRITICAL] [acdcli.cache.db] - Error opening database: (sqlite3.OperationalError) database is locked [SQL: 'PRAGMA table_info("metadata")']

This is what I see when I try to access my mounted folder:

ll /media/amazon/
ls: reading directory /media/amazon/: Bad address

Attached the log

acd_cli.log.tar.gz

@animetosho

Hi,
I'm not sure whether this is related, but it does seem similar (sorry, I haven't read everything in this thread yet).

16-05-21 00:55:00.110 [INFO] [acdcli.cache.sync] - Inserted/updated 1 file(s).
16-05-21 00:55:06.377 [ERROR] [acd_cli] - Traceback (most recent call last):
  File "/usr/local/bin/acd_cli.py", line 217, in wrapped
    ret_val = f(*args, **kwargs)
  File "/usr/local/bin/acd_cli.py", line 548, in upload_stream
    return upload_complete(r, log_fname, hasher.get_result(), None, False)
  File "/usr/local/bin/acd_cli.py", line 267, in upload_complete
    cache.insert_node(node)
  File "/usr/local/lib/python3.4/dist-packages/acdcli/cache/sync.py", line 80, in insert_node
    self.insert_nodes([node])
  File "/usr/local/lib/python3.4/dist-packages/acdcli/cache/sync.py", line 74, in insert_nodes
    self.insert_parentage(files + folders, partial)
  File "/usr/local/lib/python3.4/dist-packages/acdcli/cache/sync.py", line 138, in insert_parentage
    [n['id'] for n in slice_])
sqlite3.OperationalError: database is locked

I don't do any FUSE mounting; I just run two instances at a time via acdcli stream, and this happens quite frequently.

Do you think using an alternative database back-end like in #312 would eliminate the issue?
Thanks!
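Before swapping backends entirely, SQLite's write-ahead-log journal mode may be worth trying: in WAL mode readers no longer block the writer (and vice versa), which is the two-process contention pattern described above. A plain-stdlib sketch (illustrative file and table names, not acd_cli's actual schema):

```python
import os
import sqlite3
import tempfile

# Throwaway database file standing in for a nodes.db-style cache.
db = os.path.join(tempfile.mkdtemp(), "cache-demo.db")

conn = sqlite3.connect(db)
# Switch the database to write-ahead logging; the pragma returns the mode.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # -> wal

conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO nodes VALUES ('root')")
conn.commit()

# A second connection (e.g. a second acdcli process) opening the same file
# inherits WAL mode and can keep reading while the first one writes.
other = sqlite3.connect(db)
n = other.execute("SELECT count(*) FROM nodes").fetchone()[0]
print(n)  # -> 1
```

WAL mode is persistent (it is recorded in the database file), so a one-off `PRAGMA journal_mode=WAL` against nodes.db would apply to all later connections; whether it actually resolves this issue for acd_cli is untested here.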
