Performance Questions

Topics: Developer Forum
Aug 21, 2007 at 5:09 PM
Edited Aug 21, 2007 at 5:11 PM
Hi there,

Great looking framework! :-)

I had a question or two relating to performance.

(1) If I've understood what I've seen in the code, the framework needs to create and store PropertyInfo objects for each instance being validated, so it can access the values for validation. Why this choice rather than FieldInfo? At least in the context of config implementations I would think this would allow side stepping the Getter method of the property, on the chance there might be code in there that affects the value read.

(2) I realize you're wrapping the PropertyInfo objects to help with the cost of reflection, but there is still quite a cost each time the value read takes place. Have you explored Dynamic Methods yet? This is supposed to be significantly faster. One of the links below deals with the speed difference, and it is significant.

Here are some links on this:

Link re: hooking up delegate for fields:
http://www.codeproject.com/useritems/DynamicCodeGeneration.asp?print=true

Another good dynamic method article:
http://blog.lab49.com/archives/446

Speed test regarding the two methods:
http://www.ookii.org/post/reflectionvsdynamiccodegeneration.aspx

Reflection performance article:
http://msdn.microsoft.com/msdnmag/issues/05/07/Reflection/default.aspx

Dynamic Methods How-To:
http://msdn2.microsoft.com/en-us/library/exczf7b9.aspx
http://msdn2.microsoft.com/en-us/library/system.reflection.emit.dynamicmethod.aspx

(3) Rather than loading up all descriptors for an instance, could there be a way to lazy load these only when a value has been changed? If a property value were never set, the object's state could be assumed to be valid in some contexts. Not too sure if that would be doable at all, but given the cost of reflection to create the descriptors/dynamic methods is instance based, I would think it would be positive to see if that could be accomplished.

Anyway, I'll appreciate your thoughts and responses on these questions.

Great work! :-)

Kel
Aug 21, 2007 at 5:10 PM
Edited Aug 21, 2007 at 6:30 PM
I may have spoken too soon.

Is the above what the FastInvokeHandler is up to?

Kel
Coordinator
Aug 22, 2007 at 2:46 AM

Let me start by saying I have put significant thought into the performance of this project. It is designed to be called within property setters and method calls. As a result, poor performance of my code could seriously degrade the performance of the consuming code. So this means that among the various design tradeoffs (RAM usage, complexity of code, size of assembly etc.) performance almost always wins. I have even placed performance ahead of usability of the API. An example of this can be seen in the fact that ParameterValidationManager.Validate accepts a RuntimeMethodHandle instead of a string to identify the method.

Having said all that I believe there are still performance improvements to be made and I really appreciate your questions and feedback. In fact any further critical analysis would be greatly appreciated.

To answer your questions:

Choosing properties over fields

Respect Encapsulation.

As much as possible I try to respect the OO concept of encapsulation.

Support for IDataErrorInfo and INotifyPropertyChanged

I wanted to ensure support for both these interfaces. They are very useful when databinding and they both revolve around property names.

Smarter people than I

In the world of code there is almost always someone smarter than you who has done it before. So when building an application/framework you should try and find the parts that all the smarter people have built and glue them together. So when thinking about properties vs. fields I had a look at nhibernate, monorail, windows forms data binding and a few others I can’t remember. They all favour properties over fields.

This is not to say that I won’t also support fields in a future version. I particularly like how Castle's ActiveRecord framework supports fields. By default it uses properties, but you can override how it reads a property value using the following enumeration:
public enum PropertyAccess
{
    Property,
    Field,
    FieldCamelcase,
    FieldCamelcaseUnderscore,
    FieldPascalcaseMUnderscore,
    FieldLowercaseUnderscore,
    NosetterCamelcase,
    NosetterCamelcaseUnderscore,
    NosetterPascalcaseMUndersc,
    NosetterLowercaseUnderscore,
    NosetterLowercase
}

Dynamic methods

Yes, your follow-up comment is correct. FastInvokeHandler, and more specifically MethodInvokerCreator, is where I use dynamic method creation. It is based on a great article on CodeProject (http://www.codeproject.com/csharp/FastMethodInvoker.asp), although I had to make a few minor modifications to meet my requirements.

Loading validation on demand

Are you asking about this because you have noticed a significant performance hit with the initial load of validation rules? On a quick test of loading the rules of an average complexity type I clocked it at 3ms (obviously dependent on many factors). Do you have a scenario where this initial load time is significantly larger? I ask because if so then maybe I have a performance problem that can be tweaked.

Or is this more a theoretical question? I am sure it is possible to achieve this, but the side effects must be considered.

At the moment TypeCache is designed in such a way that when a developer requests the validation for a type they can assume that all rules are populated and accessible. This gets a little more difficult with lazy loading.

Also remember that rules can be defined through attributes, added programmatically and added through xml configuration. If rules were only defined in attributes, lazy loading of rules would be simpler. But when you take into account rules added through xml and through code it gets more complicated.

And remember that whenever you do lazy loading you must place a small overhead on every call. This is because on every call you need to check if loading is required.
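To make that overhead concrete, here is a minimal sketch of a lazily loaded rule list. LazyRules, LoadRules and the rule names are hypothetical and are not taken from the framework; the point is only that the null check runs on every access, even after the rules have been loaded.

```csharp
using System;

// Hypothetical holder for a type's rules; not the framework's TypeCache.
public class LazyRules
{
    private string[] rules;

    // Counts how often the expensive load actually ran (for illustration only).
    public int LoadCount;

    public string[] Rules
    {
        get
        {
            // This check is the small cost paid on every call.
            if (rules == null)
            {
                rules = LoadRules();
            }
            return rules;
        }
    }

    // Stand-in for reading attributes, xml configuration and programmatic rules.
    private string[] LoadRules()
    {
        LoadCount++;
        return new string[] { "Required", "MaxLength" };
    }
}
```

This sketch is not thread safe; a real implementation would also need to decide what happens when rules are added programmatically before the first load.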

The framework also supports external validation of a class. This means that from outside the class you can validate its contents. The class being interrogated does not store an internal validity state or validate properties on set. Have a look at
\Validation\Examples\ExampleLibraryCSharp\PropertyValidationManager\ExternalSample.cs
How would lazy loading on property set work in this scenario?

As you can see I have considered lazy loading of rules and, so far, I have placed it in the too hard basket. You are welcome to take it out of the basket and have a go. Although if you would like to contribute I have much higher priority things that I could use help with.


Hope all of this has made sense.
Keep the feedback coming

Regards
Simon
Aug 22, 2007 at 5:07 AM
Edited Aug 22, 2007 at 5:18 AM
Hi Simon,

Thanks for your very thorough reply. As I'm reviewing the framework code I'm extremely pleased with how many things you appear to be doing right (not that I'm an expert on what's right, but of the things I'm aware of, you sure are hitting most of them on the head! :-)

I'm realizing now that it was the original project by Steve Michelotti (which I've also reviewed recently, and which led me to your framework) where the PropertyInfo objects themselves were being invoked directly whenever the values were required for validation. So it's great you've moved on to the Dynamic Method approach, and harnessed the greater efficiencies there.

I also admire your desire to respect encapsulation of objects. I give IT developer courses on these subjects regularly and (hopefully this will not be offensive, I definitely don't intend it that way) I always describe the ability of Reflection to access private members as "object oriented rape." The ability to entirely violate an object designer's intentions is an odd capability for an infrastructure. That said, the one place where I think the "exception" is potentially valid is in the context of support frameworks for object persistence, validation, and so forth. The point of such frameworks is to provide a service to the "real" design of an application. When it's time to persist the object, it's just time to get the data out of there and save it. When it's time to validate existing state, I think getting directly to the data makes sense as well.

I definitely agree that direct field access should not be the only option though, but ultimately should be an available option where it is feasible (such as with configuration file definition of rules). I took my inspiration for this point from Paul Wilson's OR Mapper tool, where the mapping file simply allows a member name to be defined for the mapping. This can be either a Property name, or a field.

As you suggested, my lazy load question was just theoretical at this point, though I've done a lot of thinking down these lines in other contexts too. My main concern is not with the loading of the rules themselves for the type, which I understand are cached for each type just once. Is that right? My focus is on the use of Reflection with each object instance of a given type, to create the property pointers required to validate that particular instance.

I'm thinking that must be a pretty significant overall speed hit, and one that must be borne for each object being validated, whether or not any of its values have actually changed from an already valid state. If there was some way to only create the property pointer for a property at the point it was first modified, that hit could be avoided where this never happened. Again, Paul Wilson's tool implements a similar approach when performing updates on the database. It provides the option to only create update SQL for fields that have changed on an instance. Obviously the implementation to do that is not identical to what's happening here, but the concept is similar. I believe Paul's infrastructure actually tracks running instances to accomplish this, which makes sense in a persistence layer but wouldn't here.

To do this without tracking, you are correct that lazy loading would require the object itself to notify the infrastructure of an update to its members upon the first update. So this feature would not be available for scenarios where we are external to the object being validated. But when we do have such access, a method on the PropertyValidationManager, associated with the class either through base class inheritance or encapsulation, could be called from each property set, to notify the infrastructure an update took place. I just peeked at INotifyPropertyChanged, and it also seems the PropertyValidationManager could subscribe to this event for the object instance it's associated with, as a way of accomplishing this if the object implemented that interface for its member updates.
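A rough sketch of the INotifyPropertyChanged idea, to show the shape of it. Account and ChangeTracker are hypothetical names made up for illustration; the framework's PropertyValidationManager is not shown here and may not work this way.

```csharp
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical business object that raises PropertyChanged from its setter.
public class Account : INotifyPropertyChanged
{
    private string owner;

    public event PropertyChangedEventHandler PropertyChanged;

    public string Owner
    {
        get { return owner; }
        set
        {
            owner = value;
            PropertyChangedEventHandler handler = PropertyChanged;
            if (handler != null)
            {
                handler(this, new PropertyChangedEventArgs("Owner"));
            }
        }
    }
}

// Hypothetical tracker: remembers which properties changed on one instance,
// so that work for a property can be deferred until it is first modified.
public class ChangeTracker
{
    private readonly HashSet<string> changed = new HashSet<string>();

    public ChangeTracker(INotifyPropertyChanged source)
    {
        source.PropertyChanged += OnPropertyChanged;
    }

    private void OnPropertyChanged(object sender, PropertyChangedEventArgs e)
    {
        changed.Add(e.PropertyName);
    }

    public bool NeedsValidation(string propertyName)
    {
        return changed.Contains(propertyName);
    }
}
```

A property that is never set never shows up in the tracker, which is the "assume valid until modified" behaviour described above. It only applies when the object cooperates by raising the event.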

You're right that there would need to be a flag indicating whether validation is required, since this may be called on the object more than once after the property pointer is in place. So it would depend on how the cost of creating all property pointers for every object instance ever validated, compared with just running those checks on existing property pointers for a given instance, whenever validation occurs. Given the cost of Reflection, I still suspect that may be a better way to go in the long run. But you also may be right that "too difficult" is the better response.

A final and simpler thought on the use of Reflection though. One thing Paul has implemented in his ORM framework is an IObjectHelper interface that class designers can choose to implement. It has a single method that returns type Object, and takes a string parameter of the member name. The class designer can wire his classes to return the value of the requested member. During persistence, the infrastructure calls this interface if the class being persisted implements it, instead of reflecting the class to retrieve the values.

Paul has told me IObjectHelper results in about twice the speed of using Reflection, though I think this may have been before Dynamic Methods. I think this kind of idea could be implemented quite easily in the .net Validation Framework, if you wanted to provide the option to totally avoid reflection during validation. If a PropertyValidationManager object is passed a type that implements IValidationHelper (let's call it that for now), it will use the method provided on that interface to retrieve the values when they are required. Again, this could simply be another available option for those who don't mind the extra work of wiring up the members, for the extra speed gains.

Sorry to be so long-winded again, but I appreciate the opportunity to reflect on your framework. I'm coming to the close of a few weeks of researching available options for validation and expect I will likely be settling here soon.

Regards.

Kel

Coordinator
Aug 22, 2007 at 8:39 AM
Yes, rules are cached the first time a Type is validated.

Re your statement
“Reflection with each object instance of a given type, to create the property pointers required to validate that particular instance”
What reflection are you referring to?
When a type is cached, all the properties (with rules) are placed in a dictionary that is keyed on the name (a string) of the property. So when you validate a property the only overhead, to work out which dynamic method to call, is doing a hash lookup in the dictionary.
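As a minimal sketch of that lookup, assuming a cache shaped roughly like the description above: TypeCacheSketch and Customer are hypothetical names, and the getter delegates here simply wrap PropertyInfo.GetValue as a placeholder where the framework would hold a compiled dynamic method.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical business object.
public class Customer
{
    public string Name { get; set; }
}

// Hypothetical stand-in for the per-type cache; not the framework's TypeCache.
public static class TypeCacheSketch
{
    private static readonly Dictionary<Type, Dictionary<string, Func<object, object>>> cache =
        new Dictionary<Type, Dictionary<string, Func<object, object>>>();

    // Counts how often a type's getters were built (for illustration only).
    public static int BuildCount;

    public static object GetValue(object instance, string propertyName)
    {
        // Per call: two hash lookups and one delegate invocation.
        return GetGetters(instance.GetType())[propertyName](instance);
    }

    private static Dictionary<string, Func<object, object>> GetGetters(Type type)
    {
        Dictionary<string, Func<object, object>> getters;
        if (!cache.TryGetValue(type, out getters))
        {
            // Reflection happens here, once per type, not once per instance.
            BuildCount++;
            getters = new Dictionary<string, Func<object, object>>();
            foreach (PropertyInfo property in type.GetProperties())
            {
                if (property.GetIndexParameters().Length > 0) continue; // skip indexers
                PropertyInfo captured = property;
                // Placeholder getter; a dynamic method would go here instead.
                getters[captured.Name] = delegate(object target)
                {
                    return captured.GetValue(target, null);
                };
            }
            cache[type] = getters;
        }
        return getters;
    }
}
```

The sketch is not thread safe and ignores rules entirely; it only illustrates that validating a second instance of the same type reuses the entries built for the first.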


Re allowing users to provide their own property values (IObjectHelper)
Although I can see the benefits of this approach over reflection, I am not sure it is worth it when using dynamic methods.
Thinking about what needs to be done to implement IObjectHelper on a property validation…
-check if an IObjectHelper is available to validate a property
-do a call to the business object's IObjectHelper
-inside the implementation the developer would need to do a switch on the property name (string).
-when the property is found return the value of it.
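The steps above could be sketched roughly as follows. The interface shape is a guess based on the description in this thread (one method taking a member name and returning object); the method name GetMemberValue, the Order class and ValueReader are all hypothetical.

```csharp
using System;

// Hypothetical interface shape; the real IObjectHelper may differ.
public interface IObjectHelper
{
    object GetMemberValue(string memberName);
}

// Hypothetical business object that wires up its own members.
public class Order : IObjectHelper
{
    private readonly int quantity;
    private readonly string customer;

    public Order(int quantity, string customer)
    {
        this.quantity = quantity;
        this.customer = customer;
    }

    public int Quantity { get { return quantity; } }
    public string Customer { get { return customer; } }

    public object GetMemberValue(string memberName)
    {
        // The string switch the developer has to write and maintain by hand.
        switch (memberName)
        {
            case "Quantity": return quantity;
            case "Customer": return customer;
            default: throw new ArgumentException("Unknown member: " + memberName);
        }
    }
}

public static class ValueReader
{
    public static object Read(object instance, string memberName)
    {
        // Step 1: check whether the helper is available.
        IObjectHelper helper = instance as IObjectHelper;
        if (helper != null)
        {
            // Steps 2 to 4: call it and let the switch find the value.
            return helper.GetMemberValue(memberName);
        }
        // Fallback: plain reflection (a dynamic method in the framework's case).
        return instance.GetType().GetProperty(memberName).GetValue(instance, null);
    }
}
```

Each read therefore costs a type check, an interface call, a string switch and possibly a box, which is the work being weighed against a dynamic method call.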

Have a look at the performance table for dynamic methods on http://www.codeproject.com/csharp/FastMethodInvoker.asp.
For 1,000,000 calls to a property
Reflection = 2703ms
Dynamic method invoke = 45ms
Direct Call = 12ms

So this means that all the code to get a property value from IObjectHelper would have to be no more than about 4 times slower than a direct call to the property (45ms / 12ms ≈ 3.75). And when dynamic methods are only 33ms slower over 1,000,000 calls I don’t think we are picking any low-hanging fruit with this discussion. I would however, for a learning experience, be interested to see how an IObjectHelper scenario performs against dynamic methods.


Re choosing this validation framework over others.
Is there a reason you are not using the entlib validation block?
Aug 22, 2007 at 10:50 AM
Hi Simon,

I may well not yet fully grasp how dynamic methods work then. (Actually discovered them yesterday, in the process of attempting to review my past "reflections" on Reflection - done pre .Net 2.0.)

My understanding was that while the rules themselves are cached for the type in your framework, the ability to retrieve the property value being evaluated in a particular running instance of the type still requires you to wire up a dynamic method for each property on that running instance, and to do so the first time that particular instance is validated. (i.e. Retrieve all the PropertyInfo objects for this particular instance and use these to create dynamic method pointers by calling PropertyInfo.GetGetMethod for each. These would then be used from then on to invoke the property values on this running instance.)

Are you saying you only need to create one dynamic method per property, per type, which you can then reuse to invoke the value from any running instance of the type? If so then yes, that would probably beat out the IObjectHelper approach, since you would only face the 2703ms once for each property on the type.

Is that how it actually works?

Thanks for your patience as I get my mind wrapped around this...

Kel
Coordinator
Aug 22, 2007 at 11:41 AM
Kel

Correct. The dynamically created handler I use to get a property is type based not instance based. So it only needs to be created once for each property. This is done at the same time that the validation rules are cached. Have a look in the constructor of PropertyDescriptor for the implementation of this.
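For anyone following along, here is a minimal sketch of what a type-based getter can look like with DynamicMethod. It is not the code from PropertyDescriptor or MethodInvokerCreator; GetHandler, GetterFactory and Person are hypothetical names, and the sketch assumes the declaring type is a reference type with a public getter.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical delegate shape; the framework's FastInvokeHandler may differ.
public delegate object GetHandler(object target);

public static class GetterFactory
{
    // Builds one getter per (type, property). The instance is passed in as an
    // argument at call time, so the same delegate serves every instance.
    public static GetHandler Create(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        MethodInfo getMethod = property.GetGetMethod();

        DynamicMethod method = new DynamicMethod(
            "Get_" + propertyName,
            typeof(object),
            new Type[] { typeof(object) },
            type.Module);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                        // load the target object
        il.Emit(OpCodes.Castclass, type);                // cast to the declaring type (reference types only)
        il.EmitCall(OpCodes.Callvirt, getMethod, null);  // call the property getter
        if (property.PropertyType.IsValueType)
        {
            il.Emit(OpCodes.Box, property.PropertyType); // box value types to return object
        }
        il.Emit(OpCodes.Ret);

        return (GetHandler)method.CreateDelegate(typeof(GetHandler));
    }
}

// Hypothetical business object.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}
```

The reflection and IL generation cost is paid once in Create; after that each read is just a delegate call, whichever instance is passed in.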

It is actually possible to create an instance based handler. But as you correctly pointed out, type based handlers are much better for us.

And no apologies. So far all of your questions have been well founded. It is also good to have someone question your design decisions.

Simon
Aug 22, 2007 at 12:12 PM
Edited Aug 22, 2007 at 3:24 PM
Hi Simon,

Thanks for your reply, and that's good news. So it sort of compiles an instance access method to the class definition on the fly, which can then be invoked for any instance. Pretty cool stuff! :-)

I realized too you had asked why I was not using the Validation Block from Enterprise Library.

Primarily I prefer not to get tied into monolithic frameworks like EntLib. I realize they've done a lot to separate out the blocks now so you are not committed to all of them when you only want to use some of them. Validation Block definitely tempted me to take a serious renewed look during my recent exploration into validation options.

But the EntLib framework is just too large and complicated. I also dislike how every version release has constituted major breaking changes with previous versions. I don't want to have to rewire everything in my applications every time a new version comes out.

I prefer simpler frameworks. That's why I've preferred Paul Wilson's OR Mapper over NHibernate and others. The others attempt to do "everything." This makes them overly complicated. Paul has made a deliberate choice to cover the 80% of commonly needed features, and leave the other 20% up to the developer. This keeps his framework much simpler to use.

I really hope you'll work with a similar philosophy as you proceed with your validation framework. So far this seems to be your plan...

Kel