Fetch<dynamic> caching #657
@SimonHurrNZ I would have thought you'd need to pass a generic argument to Fetch. I suspect the generic argument is being kept the same for each outcome of the stored proc. PetaPoco uses the generic argument to decide how to map the results from the DB to the POCO. |
Apologies, my simple example didn't include the generic argument. |
Ah, it's the markdown excluding the dynamic argument. |
@SimonHurrNZ I suspect the dynamic types are wrongly being treated as equal based on the number of properties they contain. Seems to be how .NET works. We would have to try and find a workaround for this. In the meantime, you could try adding a call to FlushCaches() before each call to the stored proc.
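Something along these lines (a rough, untested sketch; `db`, `dbo.MyProc`, and `selector` are placeholders, and the exact member that exposes FlushCaches() may vary by version):

```csharp
// Sketch of the suggested workaround: clear PetaPoco's cached poco factories before each
// dynamic fetch, so column names from a previous result shape aren't reused.
// Assumption: FlushCaches() is the cache-clearing method discussed below; the exact type
// it hangs off may differ between PetaPoco versions.
PocoData.FlushCaches();
var rows = db.Fetch<dynamic>("EXEC dbo.MyProc @0", selector);
```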
Can you let me know if ☝️ works? Also, the version of PetaPoco and the database you're using would help too. Might be some time before I can look at this. I'm unsure of @asherber's availability, but hopefully one of us will get to it. Unless you're feeling keen to jump into the internals to see if you can resolve it yourself. |
I'm on v6.0.441 at present, which doesn't have that FlushCaches() method exposed. Thanks for looking at this so promptly! |
I'm not able to reproduce using |
Yes, I can reproduce using a stored proc, even if I'm using |
I think the issue is here. When we create and cache a factory method for turning the reader results into a POCO, the column names get baked into the cached factory, so later calls that hit the same cache entry reuse the first call's names. I'm not sure right now what the best fix is for this, will need to think on it. It seems like a bit of an edge case -- I think the general assumption is that a stored proc will always return the same set of columns. And if we add the params to the cache key, that will sort of defeat the general intent of the cache. |
I can confirm that upgrading to v6.0.480 and using FlushCaches() has given us a usable workaround. I'm not certain how much of a performance hit constantly flushing the cache will cause, but I'm very grateful to have a viable path for now. We'll start testing this tomorrow. |
@asherber I think you're correct. Great find. I agree that a proc returning different result sets seems a bit strange. But I have to admit I don't use procs all that much. I wonder if the following would address the issue whilst still being performant?
|
I don't think that using that would help. In general, with all respect to @SimonHurrNZ, my initial reaction is to say that this is enough of an edge case that maybe we don't need to account for it. |
We'll monitor to see to what degree flushing the whole application's cache before each call to that method impacts overall performance for clients using that particular feature. Is there perhaps scope to add an overload / parameter / setting that would bypass the cache build/lookup for a given query, instead of flushing the entire cache before making the call? That would be a perfect outcome from our requirements point of view, as these queries are essentially ad-hoc and thus there is no basis for caching the factory method. |
I agree. Also agree that if there isn't a concrete way of supporting stored procs with differing result sets, then maybe it's not worth supporting at all. I wonder if the compromise is a way to disable the cache for a particular operation 🤔 |
@SimonHurrNZ Sorry. I didn't see your comment before replying. It seems we have come to the same conclusion. |
Well, the library does look for that. I'm still not sure this is something the library needs to support. The user could, for example, design his stored proc so that it always returns the same set of columns, even if some of them are null. Something like:

```sql
CREATE PROCEDURE GetData(@selector int) AS
IF @selector = 1 SELECT tmpA.*, null as e, null as f FROM tmpA
IF @selector = 2 SELECT tmpB.*, null as a, null as b, null as c, null as d FROM tmpB
```

Now the proc will always return columns a-f. You could even add the selector value as a column in each select. The user could also move the logic into code, so that based on the value of the selector, different stored procs are called (see the sketch below). If the library did start supporting bypassing the factory cache, I think there's a question as to whether it's supported on the operation level (similar to passing in a custom mapper) or on the database level (similar to a configuration option). |
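As an illustration of the "move the logic into code" option above — a sketch with hypothetical proc names (`GetDataA`/`GetDataB`), where each proc returns a single, fixed shape:

```csharp
// Hypothetical proc names; each proc returns one fixed set of columns,
// so each call gets its own (correct) cached factory keyed by its own SQL.
List<dynamic> data = selector switch
{
    1 => db.Fetch<dynamic>(";EXEC GetDataA"),
    2 => db.Fetch<dynamic>(";EXEC GetDataB"),
    _ => throw new ArgumentOutOfRangeException(nameof(selector))
};
```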
Hi guys, it took me a while to find the code, hence me posting this late. So here it is:

```csharp
// Create factory function that can convert an IDataReader record into a POCO
public Delegate GetFactory(string sql, string connString, int firstColumn, int countColumns, IDataReader r) {
...
il.Emit(OpCodes.Dup); // obj, obj
//Old
il.Emit(OpCodes.Ldstr, r.GetName(i));// obj, obj, fieldname
//New
il.Emit(OpCodes.Ldarg_0); // obj, obj, rdr
il.Emit(OpCodes.Ldc_I4, i); // obj, obj, rdr, i
//read column names from IDataReader instead of Cache
il.Emit(OpCodes.Callvirt, fnGetName);// obj, obj, fieldname
...
}
static List<Func<object, object>> _converters = new List<Func<object, object>>();
//New
static MethodInfo fnGetName = typeof(IDataRecord).GetMethod("GetName", new Type[] { typeof(int) });
static MethodInfo fnGetValue = typeof(IDataRecord).GetMethod("GetValue", new Type[] { typeof(int) });
```

I haven't done any benchmarking to prove/disprove whether this change causes any performance issues, but it has worked well for us at the time. I've used the following code to test it:

```csharp
[TestMethod]
public void PetaPocoShouldNotCacheDynamicObjectColumnNames()
{
//Arrange
var db = new PersistanceManager("TestConnection");
db.Execute(Scripts.Drop.ColumnNameCacheTest);
ExecuteBatch(db, Scripts.Create.ColumnNameCacheTest);
var expectedFirst = new Dictionary<string, object>
{
{ "One", 1 },
{ "Two", 2 },
{ "Three", 3 }
};
var expectedSecond = new Dictionary<string, object>
{
{ "Ten", 10 },
{ "Twenty", 20 },
{ "Thirty", 30 }
};
//Act
var actualFirst = (IDictionary<string, object>)db.Fetch<dynamic>("EXEC dbo.CSP_ColumnsTest @0", 0).First(); //<--Must output one, two, three
var actualSecond = (IDictionary<string, object>)db.Fetch<dynamic>("EXEC dbo.CSP_ColumnsTest @0", 1).First(); //<--Must output ten, twenty, thirty
try
{
//Assert
foreach (var item in actualFirst)
{
Assert.IsTrue(expectedFirst.ContainsKey(item.Key), $"First result: '{item.Key}' property was unexpected");
Assert.AreEqual(expectedFirst[item.Key], item.Value, $"First result: '{item.Key}' property value was unexpected.");
}
//The second result previously had the correct values, but first result's keys
foreach (var item in actualSecond)
{
Assert.IsTrue(expectedSecond.ContainsKey(item.Key), $"Second result: '{item.Key}' property was unexpected");
Assert.AreEqual(expectedSecond[item.Key], item.Value, $"Second result: '{item.Key}' property value was unexpected.");
}
} finally
{
db.Execute(Scripts.Drop.ColumnNameCacheTest);
}
}
public static class Scripts
{
public static class Create
{
public const string ColumnNameCacheTest = @"SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[CSP_ColumnsTest] @@i INT = NULL
AS
BEGIN
IF(@@i = 0)
BEGIN
SELECT 1 AS One,
2 AS Two,
3 AS Three;
END;
ELSE
BEGIN
SELECT 10 AS Ten,
20 AS Twenty,
30 AS Thirty;
END;
END;";
}
public static class Drop
{
public const string ColumnNameCacheTest = @"IF (OBJECT_ID('[dbo].[CSP_ColumnsTest]') IS NOT NULL) DROP PROCEDURE [dbo].[CSP_ColumnsTest];";
}
}
```

Not sure if this solution is still viable or implementable (if there's such a word) in the new version of PetaPoco, but hopefully it can help @SimonHurrNZ with his issue. |
@Curlack Interesting, thanks! So I think this doesn't bypass the factory cache entirely; it's just that when the generic type is dynamic, the factory reads the column names from the reader at execution time instead of baking them into the cached delegate. This would solve for the current edge case, but at the expense of the much more common case (slight performance hit for reading the column names each time). Current code is here. |
@asherber, @pleb I would be more than happy to do the work if you guys consider the proposed solution to be a candidate. I can add an option for it and put together a benchmark to measure the impact. |
One problem is that PocoData and its factory caches live outside of the databases. If db1 has EnableCache = true and makes a call, then the column names will be cached in the factory. If db2 with EnableCache = false then makes a call, they're going to get the same column names from the previously cached factory. |
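To illustrate the concern (EnableCache here is the hypothetical per-instance flag under discussion, not an existing PetaPoco option; the connection string is a placeholder):

```csharp
// Two Database instances, but PocoData's factory cache is static and therefore shared.
var db1 = new Database("MyConnectionString", "System.Data.SqlClient"); // imagine EnableCache = true
var db2 = new Database("MyConnectionString", "System.Data.SqlClient"); // imagine EnableCache = false

// db1 primes the shared factory cache with the column names from selector 1 ...
var first = db1.Fetch<dynamic>(";EXEC GetData @0", 1);

// ... and db2 hits the same cached factory, so a per-database flag can't isolate its behaviour.
var second = db2.Fetch<dynamic>(";EXEC GetData @0", 2);
```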
I see. Then I guess we have no choice but to decide whether we're gonna implement it or not. Would the benchmark still be value adding in making this decision, or have we already decided?
|
Thanks @Curlack - if nothing else, it's nice to know I'm not the only developer on the planet who has had this issue! In response to the question of what level a 'bypass cache' flag would be held at: we know exactly which call(s) require it, as in our implementation it is only going to need to be set for a call that returns a dynamic type. @asherber - I can see that implementing a bypass flag in PocoData could be problematic, although if it could be reduced down to a flag like 'DisableDynamicColumnCaching' that only checked and bypassed the cache for calls returning type dynamic, that might work for us. Ideally it would be a query-level overload or parameter, but I can see the issue of polluting the interface to cater for an edge case like this. |
I ran a few quick benchmarks, and on 10K selects of a single row with 59 columns, the difference between caching the field names or not was about 32 sec vs. 37 sec. That's not a small difference, percentage-wise, even though the absolute overall effect may not be a big deal. I'm not crazy about making this change globally, and personally I'm still not convinced that it's a problem that needs fixing. If we want to do it, my suggestion would be a static flag on PocoData. I also think this needs more testing. So far, I think we've all been looking at simple Fetch<dynamic> cases. |
Wow! 5 seconds! I agree with @asherber, it's too much of a performance hit. It's a shame actually. I do have one last suggestion for @SimonHurrNZ though. What if we solve it outside of PetaPoco? |
Unfortunately @Curlack, that's simply not within the purview of our API. Our API only has the job of authenticating, building an SP query based on a request, executing it, and outputting the result set. There's no knowledge of any given SP within the API itself. The SPs involved have many different potential responsibilities, and our clients work differently enough from each other that it's more straightforward to have a generic API call custom SPs, to make sure each specific client executes the specific set of business logic they need. This is why the API has no idea what an SP is going to return, and relies on the dynamic data type to get an object it can serialize and output.

I'm in complete agreement that a general implementation of making dynamic result sets work within the caching system is not really called for. Unfortunately, our API has other responsibilities, which is why flushing the cache completely is likely to cause something of a performance hit. That is what has me wondering if being able to set a flag telling the factory to temporarily bypass cache lookup for dynamic result sets would be both feasible and performant. In this scenario, we would disable the caching, execute the query, and re-enable caching immediately after the query completed.

If this cannot be achieved, we'll change our design to minimize the performance hit of the FlushCaches() call, which will be about all we can do without a fairly radical design change to certain aspects of our API. |
@SimonHurrNZ, maybe a better workaround, though a bit hacky, might be:

```csharp
for (int i = 1; i < 5; i++)
{
var proc = Sql.Builder.Append($";EXEC GetData @@selector = @selector -- {i}", new { selector = i });
var data = db.Fetch<dynamic>(proc);
Console.WriteLine(JsonConvert.SerializeObject(data));
}
```

The cache key includes the SQL statement, giving a new key per selector. Note: I haven't tried it, and be careful of potential SQL injection attacks. |
@pleb That's a great idea! If the selector is always an int, then there shouldn't be any issues with injection, but if it's a string you'd want to be careful. |
Possible way to deal with strings:

```csharp
$";EXEC GetData @@selector = @selector -- {i.GetHashCode()}", new { selector = i }
```

These methods make the cache work for you, by letting the cached factory store the correct field names for each parameterized call to the stored proc. You could also completely bust the cache by using the same technique, but with a random number at the end of each call (see the sketch below). This would essentially make every call to the stored proc look completely unique, whether or not it reuses a previous selector. The drawback here is that the cache would grow with every single call, which would eventually cause memory issues (unless you periodically flush the cache). |
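For completeness, the cache-busting variant mentioned above might look something like this (untested sketch; `db`, `GetData`, and `selector` are carried over from the earlier examples):

```csharp
// Append a throwaway random number so every call gets a brand-new cache key.
// Every call compiles a fresh factory, and the cache grows on each call,
// so flush it periodically (or avoid this on a hot path).
var rng = new Random();
var sql = Sql.Builder.Append(
    $";EXEC GetData @@selector = @selector -- {rng.Next()}",
    new { selector });
var rows = db.Fetch<dynamic>(sql);
```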
Another way to tackle this and make the support more formal would be to expose a noCacheQueryHint prop on the DB Provider and check for it in the SQL string:

```csharp
var db = DatabaseConfiguration
.Build()
.UsingCommandTimeout(180)
.WithAutoSelect()
.WithNoCacheQueryHint()
.Create();
// PocoData.GetFactory
var cacheFactory = true;
if (db.EnableNoCacheQueryHint && sql.Contains(provider.NoCacheQueryHint))
{
    cacheFactory = false;
}
```

This should have little impact on existing users while providing an official way to skip caching. Though, maybe with the workaround above, and given it's not a common issue, it might not be of much value. |
@pleb, is that suggesting that the db instantiation sets a property to allow for the option of bypassing cache on a query by query basis, and an individual query then uses a query hint to disable the cache lookup for that specific query? If so, that sounds like it would be an ideal solution. Failing that, we may be able to implement a variation on the workaround suggested by you and @asherber above, although I should confess that the code snippet I supplied isn't the actual production code, but rather a minimal implementation that can reproduce the underlying issue. I wasn't entirely sure what to expect when I submitted it! |
@pleb I'm not clear on what you're saying about the provider having a property that can control this. Are you suggesting that there should be a magic string that tells PocoData to bypass the cache, along the lines of a leading semicolon bypassing auto-select? |
@asherber I thought the provider would be needed because I assumed varying comment syntax between DBs. A quick search seems to suggest they all support the -- comment syntax. If the comment style isn't an issue, I guess the code could be simplified to:

```csharp
var db = DatabaseConfiguration
.Build()
.UsingCommandTimeout(180)
.WithAutoSelect()
.WithNoCacheQueryHint()
.Create();
// PocoData.GetFactory
var cacheFactory = true;
if (checkForSkipCache && sql.EndsWith("-- nocache"))
{
    cacheFactory = false;
}
```

Yes, an additional argument would be needed on the GetFactory call. |
Ah, okay. I still don't like the idea. I feel like it's enough of an edge case, and there are enough other ways to work around it, that adding this to the library isn't necessary. |
Not sure if this will help anyone, but I stumbled across this post dealing with this same issue for concrete types. The issue being the same: the results returned from the stored procedure have dynamic columns. We already have our own class inheriting from Database, so I added an overload that appends a cache key to the stored proc name and then restores the original name just before execution. So now I can gather any properties or data that affect the shape of the stored procedure results and have that unique collection be part of my SQL cache string. In short: make the SQL used by PetaPoco for caching unique, and then reset it before sending to the DB.

```csharp
// all below code is within a class inheriting Database
private static ConcurrentDictionary<string, string> _cacheKeyToProcName = new ConcurrentDictionary<string, string>();
//combines the name and key, and also stores in our dictionary
private string AppendCacheValue(string storedProcedureName, string key)
{
var newName = $"{storedProcedureName} cache=||{key}||";
_cacheKeyToProcName.TryAdd(newName, storedProcedureName);
return newName;
}
//other overloads like fetch just use this.
//of course the string cache key can easily be serialized JSON
//I left any of that responsibility outside of this.
public IEnumerable<T> QueryProc<T>(string storedProcedureName, string cacheKey, params object[] args)
{
storedProcedureName = AppendCacheValue(storedProcedureName, cacheKey);
return base.QueryProc<T>(storedProcedureName, args);
}
public override void OnExecutingCommand(IDbCommand cmd)
{
//see if this was an updated sp command via cache buster
if (_cacheKeyToProcName.TryGetValue(cmd.CommandText, out var originalName))
{
cmd.CommandText = originalName;
}
//.... other code and stuff
base.OnExecutingCommand(cmd);
}
```
|
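A usage sketch for the wrapper above (proc name, parameter, connection details, and the `MyDatabase` subclass name are all made up; the cache-key string just needs to uniquely describe the result shape of the call):

```csharp
// MyDatabase is a stand-in for whatever Database subclass holds the QueryProc overload above.
// "shape=summary" is whatever string uniquely identifies the columns this call will return.
var db = new MyDatabase("MyConnectionString", "System.Data.SqlClient");
var rows = db.QueryProc<dynamic>("dbo.GetReport", "shape=summary", new { reportType = 1 }).ToList();
```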
We have a stored procedure that can return a variety of different result sets, which are defined via parameters.
I've been using Fetch<dynamic> to support this, as the calling code has no idea what the stored procedure will return.
However, when two calls to the stored procedure return the same number of columns, PetaPoco is returning the column names of the first call for both.
To reproduce:
PetaPoco code:
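A minimal repro of the behaviour described (a sketch; the proc is the CSP_ColumnsTest example shown earlier in the thread, and connection details are placeholders):

```csharp
// Hypothetical setup: @i = 0 returns columns One/Two/Three, @i = 1 returns Ten/Twenty/Thirty.
var db = new PetaPoco.Database("MyConnectionString", "System.Data.SqlClient");

var first  = db.Fetch<dynamic>("EXEC dbo.CSP_ColumnsTest @0", 0);
var second = db.Fetch<dynamic>("EXEC dbo.CSP_ColumnsTest @0", 1);

// Expected: the second result has Ten/Twenty/Thirty.
// Actual:   the second result comes back with the first call's column names (One/Two/Three),
//           because the cached factory for this SQL already baked those names in.
Console.WriteLine(JsonConvert.SerializeObject(first));
Console.WriteLine(JsonConvert.SerializeObject(second));
```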
Results from output - as can be seen, the column names for the last two calls are those from the first two calls.