Re: Patch to make SelectQueries work...

From: Mikaël Cluseau (nwr..wrk.dyndns.org)
Date: Sun Apr 17 2005 - 19:20:42 EDT

    Andrus,

    Here is the method I use to get the same behaviour. It uses two queries
    but is still faster: about 0.3 seconds for the query on primary keys and
    1.4 seconds to fetch the full rows from those PKs. I don't really
    understand why the second step is that slow, but it is skipped for
    objects that are already in the cache. I think it needs some work before
    it can be used in the real Cayenne, because I didn't check strict
    equivalence in every case (in particular, when the original query
    already has custom DB attributes, or uses attributes I don't copy; a
    clone method on the query could be useful).

    Note that the QueryFactory used here is home-made; its function names
    are self-describing.

    public static <T extends DataObject> List<T> query(DataContext context,
                    Class<T> clazz, SelectQuery query, boolean fetchByPk) {
            if (!fetchByPk) {
                    // Use the normal method
                    return query(context, clazz, query);
            }
            DbEntity entity = context.getEntityResolver().lookupDbEntity(clazz);

            // Copy the original query
            SelectQuery query2 = new SelectQuery(clazz);
            query2.setQualifier(query.getQualifier());
            query2.addOrderings(query.getOrderings());
            query2.setFetchLimit(query.getFetchLimit());

            // But modify the copy
            query2.setDistinct(true);

            List<DbAttribute> attrs = entity.getPrimaryKey();
            if (attrs.size() == 1) {
                    query2.addCustomDbAttribute(attrs.get(0).getName());
            } else {
                    throw new UnsupportedOperationException(
                                    "Unable to handle multi-columns (" + attrs.size()
                                                    + ") primary keys");
            }

            // Performs the modified query
            List results = context.performQuery(query2);

            if (results.isEmpty()) {
                    // Avoid useless pain
                    return new LinkedList<T>();
            }

            List<T> cached = new LinkedList<T>();
            Expression e;
            if (attrs.size() == 1) {
                    List<Object> pks = new LinkedList<Object>();
                    final String name = attrs.get(0).getName();
                    for (Object o : results) {
                            Object pk = ((DataRow) o).get(name);
                            T cachedObj = clazz.cast(context.getObjectStore().getObject(
                                            new ObjectId(clazz, name, pk)));
                            if (cachedObj == null) {
                                    pks.add(pk);
                            } else {
                                    cached.add(cachedObj);
                            }
                    }
                    e = QueryFactory.createIn("db:" + attrs.get(0).getName(), pks);
            } else {
                    throw new UnsupportedOperationException(
                                    "Unable to handle multi-columns (" + attrs.size()
                                                    + ") primary keys");
            }

            if (e == null) {
                    // No expression => nothing to get
                    return cached;
            }

            // Get full and real objects
            List<T> retval = query(context, clazz, new SelectQuery(clazz, e));
            // Add the cached objects
            if (!cached.isEmpty()) {
                    retval.addAll(cached);
            }
            // Sort the results
            List<Ordering> orderings = query.getOrderings();
            if (orderings != null && !orderings.isEmpty()) {
                    Collections.sort(retval, new CompositeComparator(orderings));
            }
            return retval;
    }
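
    CompositeComparator is not shown in this message. As a rough idea of
    what such a class might look like, here is a minimal sketch that chains
    plain java.util.Comparator instances and returns the first non-zero
    result. The class name CompositeComparatorSketch is hypothetical, and
    the sketch assumes each ordering can be treated as a Comparator; it does
    not use any Cayenne class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the home-made CompositeComparator used above.
// It only assumes that every ordering behaves like a java.util.Comparator.
class CompositeComparatorSketch<T> implements Comparator<T> {
    private final List<Comparator<? super T>> delegates;

    CompositeComparatorSketch(List<? extends Comparator<? super T>> delegates) {
        // Defensive copy so later changes to the caller's list have no effect
        this.delegates = new ArrayList<Comparator<? super T>>(delegates);
    }

    @Override
    public int compare(T a, T b) {
        // The first comparator that distinguishes the two objects decides
        for (Comparator<? super T> c : delegates) {
            int result = c.compare(a, b);
            if (result != 0) {
                return result;
            }
        }
        // All comparators consider the objects equal
        return 0;
    }

    public static void main(String[] args) {
        List<Comparator<String>> orderings = new ArrayList<Comparator<String>>();
        orderings.add((a, b) -> Integer.compare(a.length(), b.length()));
        orderings.add((a, b) -> a.compareTo(b));

        List<String> words = new ArrayList<String>(
                Arrays.asList("pear", "fig", "apple", "kiwi"));
        Collections.sort(words, new CompositeComparatorSketch<String>(orderings));
        System.out.println(words); // prints [fig, kiwi, pear, apple]
    }
}
```

    The in-memory sort is needed because merging cached objects with
    freshly fetched ones loses the order produced by the database.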

    I did it in my spare time, so you are free to use it as you want.

    **** some more comments below ****

    On Sunday 17 April 2005 at 09:58 -0400, Andrus Adamchik wrote:
    > Mikaël,
    >
    > I applied the patch - thanks!
    >
    > Re: subselects. You are talking about queries with qualifiers over
    > to-many relationships, right?

    Yes

    > As this is the case when DISTINCT is
    > added behind the scenes. I've done some research in this area before -
    > http://www.objectstyle.org/cayenne/lists/cayenne-user/2003/05/0031.html
    > I've been thinking of adding this as an alternative translation
    > strategy configurable either per query or per adapter.
    >
    > Can't say when. Patches are always welcome ;-)

    The previous method is an attempt ;-)

    > However I just realized that Cayenne already supports another strategy
    > described above - fetching duplicate rows and then internally applying
    > "distinct" logic, returning rows with unique PK. Can't say if this is
    > faster than PostgreSQL distinct ... somebody needs to try.

    For me it won't be: the join multiplies the result set's size by
    approximately 300-700. It also disables some optimizations, if I
    understand the EXPLAIN results correctly.
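
    To illustrate the strategy under discussion (fetch the duplicate rows,
    then keep one row per primary key on the client), here is a minimal
    sketch. It is not the code of DistinctResultIterator; the class
    DistinctByPkSketch, the helper row() and the column names "ID" and
    "NAME" are hypothetical, and rows are modelled as plain maps.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of client-side "distinct by primary key".
class DistinctByPkSketch {

    // Keeps the first row seen for each PK value, preserving fetch order.
    static List<Map<String, Object>> distinctByPk(
            List<Map<String, Object>> rows, String pkColumn) {
        Set<Object> seen = new HashSet<Object>();
        List<Map<String, Object>> unique = new ArrayList<Map<String, Object>>();
        for (Map<String, Object> row : rows) {
            // Set.add returns false when the PK was already encountered
            if (seen.add(row.get(pkColumn))) {
                unique.add(row);
            }
        }
        return unique;
    }

    // Builds a fake row; "ID" and "NAME" are made-up column names.
    static Map<String, Object> row(Object id, String name) {
        Map<String, Object> r = new LinkedHashMap<String, Object>();
        r.put("ID", id);
        r.put("NAME", name);
        return r;
    }

    public static void main(String[] args) {
        // A join over a to-many relationship repeats each parent row
        // once per matching child row.
        List<Map<String, Object>> fetched = new ArrayList<Map<String, Object>>();
        for (int id : new int[] {1, 1, 2, 1, 2, 3}) {
            fetched.add(row(id, "parent" + id));
        }
        List<Map<String, Object>> unique = distinctByPk(fetched, "ID");
        System.out.println(fetched.size() + " rows fetched, "
                + unique.size() + " kept"); // prints 6 rows fetched, 3 kept
    }
}
```

    Every duplicate row still has to be sent by the database before it can
    be discarded, which is why the multiplication factor of the join
    matters here.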

    > The actual worker class is
    > "org.objectstyle.cayenne.access.util.DistinctResultIterator". It is
    > used behind the scenes if SelectTranslator returns true from
    > "isSuppressingDistinct" method.
    >
    > Setting this up is not very user-friendly right now, as it wasn't
    > intended for public use, but if we have proof that it actually improves
    > performance, we can make it one of SelectQuery flags.

    I answered that just above. You can't know how much more network load
    it will cause, since that depends on the data.



    This archive was generated by hypermail 2.0.0 : Sun Apr 17 2005 - 19:20:45 EDT