Re: [Still LONG] Re: [LONG] reasons why "advanced" expressions are needed

From: Giulio Cesare Solaroli (slrgcs..bn-italy.com)
Date: Fri Oct 03 2003 - 05:25:54 EDT

  • Next message: Mike Kienenberger: "Re: DataContext delegate?"

    On Thursday, Oct 2, 2003, at 08:35 Europe/Rome, Andrus Adamchik wrote:

    > On Tuesday, September 30, 2003, at 01:18 PM, Giulio Cesare Solaroli
    > wrote:
    >> I hope this LONG message have shown you my point.
    >
    > It sure did. I perfectly understand the deficiencies of Cayenne
    > expression API in this respect (and "vanilla" EOF for that matter). I
    > am glad you are not just pointing them out, but also offering an O/R
    > solution. I am aware of SQL solutions; we just needed a push from
    > somebody to actually start bringing this to Cayenne :-).
    >
    >> Now the main question is: which is the best place, given Cayenne
    >> current architecture, to place the logic to handle these cases?
    >> I will be very pleased to "migrate" our current EOF/Objective-C
    >> implementation into Cayenne if this is achievable with a reasonable
    >> effort.
    >
    >
    > CAYENNE API
    >
    > 1. Expressions (org.objectstyle.cayenne.exp)
    >
    > As you noticed, Expression class simply defines semantics of the
    > expression, and doesn't contain any processing logic. Processing of
    > SQL generation is done by the access layer. In-memory evaluation is
    > done by ExpressionEval (called from Expression.eval(), still
    > incomplete). Since creating expressions directly is sort of
    > counter-intuitive (at least in their current form), ExpressionFactory
    > static methods are used instead. Their names (ideally should :-) )
    > follow the "common logic".
    >
    > Also note that some of currently defined expression types (mostly
    > aggregate functions and things like ALL, EXISTS, etc.) are not used
    > or supported in Cayenne. We may reuse some of them where it makes
    > sense in the new API described below.
    >
    > 2. QueryTranslator/QualifierTranslator
    >
    > These access layer classes define algorithms for SQL translation.
    > There is a default implementation, which can be (optionally)
    > customized by each DbAdapter (e.g. if some database doesn't support
    > feature X, or implements it differently).

    My opinion here is biased by a log usage of EOF, but I don't like this
    arrangement, for two reasons:
    - it keep information and logic in different places; this in not a good
    OOP practice;
    - if I understand the whole thing right, you end up having basically 3
    classes: one expression (I understand that there are 4 subclasses, but
    ther are there only to store differente values, nothing really
    interesting), one ExpressionEval (for in memory evaluation of
    expressions objects) and one QueryTranslator (again, this class can be
    subclassed, but for DB specific behaviour).

    This solution could be perfectly right if the given set of expressions
    were fixed, but I think it will fail badly as soon as you want to
    extend the semantic of the Expression class.

    > IMPLEMENTATION IDEAS/NOTES
    >
    > 0. Preparation.
    >
    > My +1 for the idea to start by creating tests for all the cases we
    > plan to cover.... Creating upfront user documentation is another thing
    > that will help us here.... Document cases that absolutely require
    > EXISTS support in the database and will blow without it.

    I will try to take a closer look at Cayenne tests and try to migrate
    the tests I did wrote while developing our extension for Cayenne. I'll
    keep you update of any progress on this topic.

    > 1. Unit Tests
    >
    > Though Cayenne is not using DBUnit, it has an extensive testing
    > framework of its own, and it should be easy to write all relevant
    > tests. A few hints:
    >
    > - Test cases are located under cayenne/src/tests/java.

    I was missing this altogether. (UPDATE: now I have looked better; in
    the distribution package there are the sources, but not the tests.
    That's why I couldn't find them).

    > - Ant scripts will run all matching "*Tst" as unit tests, so adding
    > a new test suite is as simple as creating an XYZTst class in an
    > appropriate package.

    I am not using Ant, yet. A good point to start.

    > - Subclass CayenneTestCase to get access to the Cayenne stack during
    > testing.

    Excellent.

    > - Test DataMap is located under src/tests/resources/test-resources.
    > Db schema is dropped and recreated on each run (but not on each test
    > of course).

    This solution is perfectly fine, but will create some trouble while
    running read/write tests, as following test could find corrupted data
    upon failure of previous tests.

    This is not a problem on the current topic, as expression here a needed
    only to select records, so basically a read-only situation, but in
    general I would find more correct a way to tell the testCase if it was
    read-only, and thus could create the DB once per run, or read-write,
    and thus it should reset the content of the DB for each test.

    > There were some proposals to use DBUnit in the past, but it didn't get
    > too far, so we have to use our own API to create test data sets.

    Perfectly fine.

    > 2. Expression Semantics.
    >
    > The fact that Expressions are abstract and do not have any processing
    > logic should allow to define semantics regardless of how it may be
    > eventually translated to SQL...as long as there is enough info/hints
    > collected in the expression.

    My problem with this solution is, as already stated before, that it
    strikes against OOP attitude of keeping data and logic together.

    > From what I can tell from the cases provided (if there are more,
    > please bring them on), we are dealing with a general problem of
    > matching a *collection* of values against a relationship path (with
    > special cases being : matching an attribute value instead of
    > relationship, matching a to-one relationship, matching against a
    > single-object collection and, finally, matching against a wildcard
    > value, e.g. "relationship not empty"). Additional logical operation
    > applied after the match is "none", "any", "all". Since collections can
    > consist of either DataObjects or scalar values, we may introduce
    > another variable into the equation - match type (e.g use ">" instead
    > of "="), but lets not do it just yet, or my head will explode :-).
    > This is how it can possibly be defined using ExpressionFactory:
    >
    > [please let me know if this analysis attempt fails to cover any of the
    > cases we are trying to solve]
    >
    > [...]

    As you can see yourself, with this arrangement you are forced of
    thinking about all possible cases upfront. To me this is a relevant
    sign of bad design.

    Please, don't take it as a personal comment on your work; what you have
    done is remarkable, and I am joining here as I think Cayenne is the
    most promising persistent framework for Java. Never the less, my
    personal opinion is that the design of the Expression package has many
    shortcomes.

    Let's sees how we could approach this:

    > 3. SQL Translation
    >
    > SQL Translators is arguably the messiest part of our current codebase.
    > Anyway, I can identify the following pieces needed in the translator:
    >
    > 1. I like Scott's idea about having support for EXIST and subqueries.
    > I've been toying with this idea from the day one of Cayenne, but never
    > got back to actually implementing it.... Doing it independently might
    > as well give us building blocks to do the spec above.
    >
    > 2. Add support for "doNotSplit" OBJ_PATH and DB_PATH.. This shouldn't
    > be too hard to do (in a backwards compatible way too - current policy
    > is "doNotSplit = fullPath").
    >
    > 3. (1) and (2) being prerequisites, implement support for all the new
    > expressions... may turn out to be less work than it seems now. Esp.
    > after all the translator refactoring that might be needed by [1], and
    > fresh eyes looking at the translator's messy flow :-)
    >
    > I suspect we will need a separate discussion of SQL Translators... I
    > don't know if it is time for a redesign contest :-/ Maybe it is....
    > It'll definitely be a good thing to discuss alternative translator
    > implementations once we stumble on any problems while extending them
    > (if we don't...oh well). Current test cases should give us a good
    > cushion in case we have to seriously redo them.

    Given all my comments (please, comment on them), if we managed to agree
    on my position that separating data from logic (like in the Expression,
    ExpressionEval, QueryTranslator case) is a bad think, I would propose
    the following path:

    - create a new cayenne.qualifiers (can you see how much I am biased?!
    :-] );
    - add all relevant methods to DataContext to be able to use qualifiers,
    still keeping Expressions working;
    - if everything works fine, we could start deprecating Expressions,
    with all the related classes and methods, and move up for a 2.0
    release!! ;-]

    This path seems long, but I don't think it will take too much. All
    other Cayenne features should be easily update where needed to use the
    new package.

    Needless to say, I will be pleased to start writing this new package,
    even if I will need much help in understanding how to integrate will
    all other Cayenne classes.

    Does this make any sense?

    Giulio Cesare



    This archive was generated by hypermail 2.0.0 : Fri Oct 03 2003 - 05:26:08 EDT