Feature #8

Add some sort of caching for config from DB settings

Added by Lance Edgar over 2 years ago. Updated about 2 years ago.

Status: New
Priority: Low
Assignee:
Start date: 07/31/2022
Due date:
% Done: 0%
Estimated time:

Description

I've long been aware of inefficiencies caused by constant querying of the setting table for real-time config values. It was originally noticed because of the sheer number of queries required e.g. for a single page load in the web app.

Accordingly, I've long intended to add some sort of caching mechanism to cut down on the number of queries. However, there is a new problem, illustrated by #6 and #7: not only are there too many queries, but they require too many DB sessions!

So the caching mechanism needs to allow for:

  • no change in calling code, e.g. config.get('foo', 'bar') should "just work"
  • caller should get "current value" without need for DB session/query
  • changes to settings in DB must take effect "immediately" (more or less)

I think two approaches make sense:

Basic / Native Caching

First, the RattailConfig class should implement a basic caching mechanism. This should keep track of values already fetched from the DB, and when each was fetched. If an already-fetched value is requested by a caller, and it was fetched "recently" enough (e.g. within the last 30 seconds?), then the cached value is returned to the caller as-is. If the fetch was not recent enough, the value is re-fetched via DB query.
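
A rough sketch of the idea (the names here are hypothetical, not the actual RattailConfig internals), assuming a fetch callable which performs the real DB query:

    import time

    class SettingCache:
        """Hypothetical TTL cache for DB-backed settings."""

        def __init__(self, fetch, ttl=30):
            self.fetch = fetch   # callable(section, option) -> value, runs the DB query
            self.ttl = ttl       # cache expiry timeout, in seconds
            self.cache = {}      # (section, option) -> (value, fetched_at)

        def get(self, section, option):
            key = (section, option)
            hit = self.cache.get(key)
            if hit is not None:
                value, fetched_at = hit
                if time.time() - fetched_at < self.ttl:
                    # fetched recently enough; return cached value as-is
                    return value
            # not cached, or not recent enough; re-fetch via DB query
            value = self.fetch(section, option)
            self.cache[key] = (value, time.time())
            return value

The idea is that config.get() would consult such a cache before falling back to the usual DB query, so calling code stays the same.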

This approach has the advantage of being built into Rattail, and therefore can work out of the box with no setup needed. We may still want a config option to disable the caching (perhaps it should even be off by default?), as well as config to determine the cache expiry timeout, i.e. the TTL for cached values.

But it has the disadvantage of still tying up DB sessions and running queries semi-regularly. The cutoff of 30 seconds strikes a balance: we don't want to keep running queries if the same values are requested frequently; however the values may be changed by an admin user at any time, so it's not safe to "always" return the same value from cache. The hope is that 30 seconds is frequent enough to reflect actual changes "quickly" but infrequent enough to help somewhat with session/query overhead.

Beaker Caching

Ultimately I think the best fix here is to split out the config caching to its own "service". Since the web app already uses Beaker (for user sessions), it probably makes sense to use it here as well, since it provides a caching system with support for multiple back-ends (e.g. file, memcached, redis).

Assuming a "normal" back-end (e.g. file, memcached, redis), I believe all running Rattail apps could effectively leverage the same cache. For a given app, this would mean not only fewer queries to fetch settings, but in fact (potentially) zero queries.

Depending on the desired back-end, this may require some more setup. But hopefully a basic file-based cache would require only turning on a single config flag; i.e. the default file path for cache storage could be determined automatically (or overridden via config).
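
For illustration, wiring Beaker up could look roughly like this (the cache options and the fetch_from_db helper are placeholders, not actual Rattail config keys or functions):

    from beaker.cache import CacheManager
    from beaker.util import parse_cache_config_options

    # placeholder options; a file back-end needs only a couple of paths,
    # while memcached/redis would point at the external service instead
    cache_opts = {
        'cache.type': 'file',
        'cache.data_dir': '/srv/rattail/cache/data',
        'cache.lock_dir': '/srv/rattail/cache/lock',
    }

    manager = CacheManager(**parse_cache_config_options(cache_opts))
    settings_cache = manager.get_cache('settings')

    def get_setting(name, fetch_from_db):
        # only invokes fetch_from_db (i.e. a DB query) on a cache miss
        return settings_cache.get_value(name, createfunc=lambda: fetch_from_db(name))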

One gotcha here is the question of when/how to "invalidate" the cache for a given setting. This is especially important if multiple apps are leveraging a single cache. Consider:

  • an admin user can modify Raw Settings via the web app
  • some apps will auto-write settings to DB, e.g. datasync lastrun times

Anytime a setting is modified in the DB, the cache system needs to know about it so that it will re-fetch the new value when it is next requested by a caller.

I believe that since settings are usually written via a dedicated function, we can add a hook there to invalidate the cache for the setting being written. But this will need closer attention when I get to that point.
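
The hook might look something like this (an illustrative write path, not the actual Rattail API, assuming the setting table is keyed by name; the essential piece is the remove_value() call after the write):

    def save_setting(session, name, value, cache):
        """Illustrative only: persist the setting, then drop any cached
        copy so the next read re-fetches the new value."""
        setting = session.get(Setting, name)    # Setting model assumed
        if setting is None:
            setting = Setting(name=name)
            session.add(setting)
        setting.value = value
        session.flush()
        cache.remove_value(name)   # Beaker: invalidate this key only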


Related issues 3 (1 open, 2 closed)

  • Related to Rattail - Bug #6: Datasync error: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 (Closed, Lance Edgar, 07/26/2022)
  • Related to Rattail - Bug #7: Datasync error: remaining connection slots are reserved for non-replication superuser connections (Closed, Lance Edgar, 07/26/2022)
  • Related to Rattail - Feature #9: Split out config to separate package? (In Progress, Lance Edgar, 08/06/2022)
Actions #1

Updated by Lance Edgar over 2 years ago

  • Related to Bug #6: Datasync error: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30 added
Actions #2

Updated by Lance Edgar over 2 years ago

  • Related to Bug #7: Datasync error: remaining connection slots are reserved for non-replication superuser connections added
Actions #3

Updated by Josh Smith over 2 years ago

I'm curious if any of your other clients, such as the Sac coop, have run into this too-many-queries issue with Milo, or are Dtail page loads just particularly full of settings? Maybe we have more simultaneous users?

With either option, if a changed setting isn't actually invalidated in the cache immediately, we may encounter unexpected behavior, or outright bugs. So it seems to me we have to have invalidation. Are you saying that even if we choose the Beaker cache system, you'll still have to implement some way to invalidate a setting, i.e. that's not something Beaker provides? If instant invalidation must be done either way, then is the only disadvantage of not using Beaker that there's no possibility of a shared back-end for settings across all apps?

You mentioned Beaker handles user sessions, but I wonder what proportion of settings are user-specific versus global. I'm not clear on whether the setting object that gets passed around is global.

Actions #4

Updated by Lance Edgar over 2 years ago

  • Status changed from New to In Progress

Yes, the problem has definitely occurred for other installs. Mostly it shows up when datasync involves "several" nodes, e.g. multi-store. The pseudo-fix has been to increase pool_size and max_overflow as described in #6. It's trial and error to see how big the pool size must be to avoid the issue. That process already happened elsewhere; you're just late to the party. ;) But your upgrade did trigger me to try to get to the bottom of it, instead of just applying the pool_size band-aid.
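
For reference, that band-aid just means bumping SQLAlchemy's QueuePool limits past their defaults of 5 / 10, something like this (example values only; in practice they come from the app's config file):

    from sqlalchemy import create_engine

    engine = create_engine(
        'postgresql://rattail@localhost/rattail',
        pool_size=20,       # default is 5
        max_overflow=40,    # default is 10
    )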

You bring up a good point though. My "basic" cache idea with 30-second expiry would in fact not work correctly for datasync in particular, since in many cases it will auto-update certain settings as often as once per second. The idea of the 30-second expiry was to avoid the complexity of invalidation for "native" caching.

Beaker definitely supports invalidation; my concern is only that I must be sure to wire it up correctly.

This pretty much tells me that the "basic caching" is not worth fooling with (even though I already implemented it, I may rip that out) and we need to go straight for Beaker. So future installs will have no caching by default, but Beaker caching could be configured as needed.

As for global vs. user-specific, only web session data is user-specific. All "settings" (i.e. values in the setting table in DB) are effectively global in nature, although some of those values may only apply to a certain user. In practice none of that should be an issue.

Actions #5

Updated by Lance Edgar over 2 years ago

I added initial Beaker cache support, and went live with it for a few apps. So far it seems to be working, with one caveat:

In a multi-node datasync setup, there may be "many" settings normally read on app startup. With the Beaker cache enabled, each time a setting must be fetched from the DB, a new/separate session is used, which means errors such as #6 may happen on the "first" app startup after enabling the Beaker cache. However, once each setting has been cached, the app happily uses the cache with no fetching from the DB. This means you can just restart datasync to get past any initial errors, and after that things should run fine.

But that is annoying; the cache should "just work" really. I guess I'll need to figure out a way to share a single session when the cache must fetch settings from DB? Or possibly the cache should "auto-fill" when first used, by pre-fetching all settings in the DB?

Actions #6

Updated by Lance Edgar over 2 years ago

The problem described above, with first app startup after enabling the Beaker cache, may be tricky to solve. I believe the cause is a "burst" of multiple queries at once, which is perhaps due to the number of threads in datasync? I believe sessions are being closed and returned to the pool... right?? In which case sequential queries should only use "one session at a time", but the use of threads means queries may not be sequential and may instead overlap, hence many sessions at once.

If that's all so, then the larger / more complex apps may have the issue, but maybe not the simpler ones, due to the difference in thread count, and probably the sheer number of settings to be fetched for running logic, etc.

I'd lean toward leaving the complex apps "unsolved" here, since they already need some expertise to maintain, and since it's a one-time problem. And because the real fix here, I think, would require the setting queries to be made "sequentially" somehow, and that is a can of worms. Not only would the various threads need to call a single shared routine (?) to fetch settings from the DB, but possibly the various apps ought to share it too. The irony is that the "fix" required looks an awful lot like what Beaker is already doing for us - after it gets past any initial "burst o' queries".
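
Just to illustrate what "sequentially" might mean here (a hypothetical sketch, not something implemented): the datasync threads would funnel cache misses through a single lock, so only one DB session is ever checked out at a time.

    import threading

    _fetch_lock = threading.Lock()   # shared by all datasync threads

    def get_setting_serialized(cache, name, fetch_from_db):
        # hypothetical: cache misses (and their DB queries) happen one at a
        # time, so the startup "burst" cannot exhaust the connection pool
        with _fetch_lock:
            return cache.get_value(name, createfunc=lambda: fetch_from_db(name))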

Oh and I didn't like the idea of pre-fetching all settings when the cache is first enabled, because there may be lots of settings, and many may not be needed, so no reason to clutter the cache.

TL;DR: I think I'm happy with Beaker caching as-is, unless something further comes up.

Actions #7

Updated by Lance Edgar over 2 years ago

  • Status changed from In Progress to Closed

Cache is in use for some fairly major installs now, with no further issues so far. Calling this good for now.

Actions #8

Updated by Lance Edgar over 2 years ago

  • Related to Feature #9: Split out config to separate package? added
Actions #10

Updated by Lance Edgar about 2 years ago

  • Status changed from Closed to New
  • Priority changed from High to Low

Well dang it. Something is not quite right about the caching. Regardless of back-end (I have tried file and memcached), "stale" values have been encountered in real-world use. Meaning, the app behaves incorrectly per "older" config values which have since been updated, as if it ignores the newer values.

So this feature is not complete; in fact, it is not recommended to use it. Docs updated accordingly.

Even though I've wanted to pursue this feature for a long time, #6 is really what pushed me to tackle it this time around. Not sure yet what the next motivator will be. I'll lower the priority and just ignore it for now... :/
