[Still LONG] Re: [LONG] reasons why "advanced" expressions are needed

From: Andrus Adamchik (andru..bjectstyle.org)
Date: Thu Oct 02 2003 - 02:35:34 EDT

  • Next message: Giulio Cesare Solaroli: "Re: [Still LONG] Re: [LONG] reasons why "advanced" expressions are needed"

    On Tuesday, September 30, 2003, at 01:18 PM, Giulio Cesare Solaroli
    wrote:
    > I hope this LONG message have shown you my point.

    It sure did. I perfectly understand the deficiencies of Cayenne
    expression API in this respect (and "vanilla" EOF for that matter). I
    am glad you are not just pointing them out, but also offering an O/R
    solution. I am aware of SQL solutions; we just needed a push from
    somebody to actually start bringing this to Cayenne :-).

    > Now the main question is: which is the best place, given Cayenne
    > current architecture, to place the logic to handle these cases?
    > I will be very pleased to "migrate" our current EOF/Objective-C
    > implementation into Cayenne if this is achievable with a reasonable
    > effort.

    CAYENNE API

    1. Expressions (org.objectstyle.cayenne.exp)

    As you noticed, Expression class simply defines semantics of the
    expression, and doesn't contain any processing logic. Processing of SQL
    generation is done by the access layer. In-memory evaluation is done by
    ExpressionEval (called from Expression.eval(), still incomplete). Since
    creating expressions directly is sort of counter-intuitive (at least in
    their current form), ExpressionFactory static methods are used instead.
    Their names (ideally should :-) ) follow the "common logic".

    Also note that some of currently defined expression types (mostly
    aggregate functions and things like ALL, EXISTS, etc.) are not used or
    supported in Cayenne. We may reuse some of them where it makes sense in
    the new API described below.

    2. QueryTranslator/QualifierTranslator

    These access layer classes define algorithms for SQL translation. There
    is a default implementation, which can be (optionally) customized by
    each DbAdapter (e.g. if some database doesn't support feature X, or
    implements it differently).

    IMPLEMENTATION IDEAS/NOTES

    0. Preparation.

    My +1 for the idea to start by creating tests for all the cases we plan
    to cover.... Creating upfront user documentation is another thing that
    will help us here.... Document cases that absolutely require EXISTS
    support in the database and will blow without it.

    1. Unit Tests

    Though Cayenne is not using DBUnit, it has an extensive testing
    framework of its own, and it should be easy to write all relevant
    tests. A few hints:

       - Test cases are located under cayenne/src/tests/java.
       - Ant scripts will run all matching "*Tst" as unit tests, so adding a
    new test suite is as simple as creating an XYZTst class in an
    appropriate package.
       - Subclass CayenneTestCase to get access to the Cayenne stack during
    testing.
       - Test DataMap is located under src/tests/resources/test-resources.
    Db schema is dropped and recreated on each run (but not on each test of
    course).

    There were some proposals to use DBUnit in the past, but it didn't get
    too far, so we have to use our own API to create test data sets.

    2. Expression Semantics.

    The fact that Expressions are abstract and do not have any processing
    logic should allow to define semantics regardless of how it may be
    eventually translated to SQL...as long as there is enough info/hints
    collected in the expression.

     From what I can tell from the cases provided (if there are more, please
    bring them on), we are dealing with a general problem of matching a
    *collection* of values against a relationship path (with special cases
    being : matching an attribute value instead of relationship, matching a
    to-one relationship, matching against a single-object collection and,
    finally, matching against a wildcard value, e.g. "relationship not
    empty"). Additional logical operation applied after the match is
    "none", "any", "all". Since collections can consist of either
    DataObjects or scalar values, we may introduce another variable into
    the equation - match type (e.g use ">" instead of "="), but lets not do
    it just yet, or my head will explode :-). This is how it can possibly
    be defined using ExpressionFactory:

    [please let me know if this analysis attempt fails to cover any of the
    cases we are trying to solve]

    // Wildcards...

       /** Qualifier for not empty relationship. Creates unary expression of
    type EXISTS.
        * Translated to "FK is not null" for to-one, and "EXISTS" for
    to-many.
        */
       public static Expression hasAnyExp(String relationshipPath);

       /** Qualifier for an empty relationship. Creates unary expression of
    type NOT_EXISTS.
        * Translated to "FK is null" for to-one, and "NOT EXISTS" for
    to-many.
        */
       public static Expression hasNoneExp(String relationshipPath);

    // Single objects... some existing API can be reused...

       /** Don't think current implementation has to change. */
       public static Expression matchExp(String relationshipPath, Object
    value);

       /** Redo translator for the existing expression to use NOT EXISTS for
    to-many. */
       public static Expression noMatchExp(String relationshipPath, Object
    value);

    // Collections...
    // additionally have to handle "doNotSplit"... internally i suggest
    making OBJ_PATH
    // expression a binary expression containing two operands to describe
    the full path and the doNotSplit part....
    // This can be easily made backwards compatible, assuming "doNotSplit
    is fullPath"

       /** Qualifier for matching all of collection values. Creates binary
    expression of type EQUAL_TO with
        * second parameter being a collection. Translated to the list of
    joins taking split policy into account.
        * Should blow during execution if used with to-one relationship and
    collection with size > 1. (??)
        */
       public static Expression hasAllOfExp(String relationshipPath, String
    doNotSplitPath, Collection values);

       /** Qualifier for matching any of collection values. Creates binary
    expression of type IN with
        * second parameter being a collection.
        * Notes:
        * - Is this any different from our current "inExp", other that
    in doNotSplitPath?
        * - Matching on a compound PK DataObject will definitely prevent
    using IN, instead a group of OR
        * statements will be needed: ((PK1 = v1 AND PK2 = v2) or ...)
        */
       public static Expression hasAnyOfExp(String relationshipPath, String
    doNotSplitPath, Collection values);

       /** Qualifier for matching none of collection values. Creates binary
    expression of type NOT_IN with
        * second parameter being a collection. Translated to NOT EXISTS
    with correct joins for to-many,
        * or NOT IN for to-one, or NOT ((PK1 = v1 AND PK2 = v2) or ...) for
    compound PK DataObjects.
        */
       public static Expression hasNoneOfExp(String relationshipPath, String
    doNotSplitPath, Collection values);

    3. SQL Translation

    SQL Translators is arguably the messiest part of our current codebase.
    Anyway, I can identify the following pieces needed in the translator:

    1. I like Scott's idea about having support for EXIST and subqueries.
    I've been toying with this idea from the day one of Cayenne, but never
    got back to actually implementing it.... Doing it independently might
    as well give us building blocks to do the spec above.

    2. Add support for "doNotSplit" OBJ_PATH and DB_PATH.. This shouldn't
    be too hard to do (in a backwards compatible way too - current policy
    is "doNotSplit = fullPath").

    3. (1) and (2) being prerequisites, implement support for all the new
    expressions... may turn out to be less work than it seems now. Esp.
    after all the translator refactoring that might be needed by [1], and
    fresh eyes looking at the translator's messy flow :-)

    I suspect we will need a separate discussion of SQL Translators... I
    don't know if it is time for a redesign contest :-/ Maybe it is....
    It'll definitely be a good thing to discuss alternative translator
    implementations once we stumble on any problems while extending them
    (if we don't...oh well). Current test cases should give us a good
    cushion in case we have to seriously redo them.

    Andrus



    This archive was generated by hypermail 2.0.0 : Thu Oct 02 2003 - 02:34:54 EDT