Maximum Date that exists for two or three parties and also another table

  • Hey guys, feel like this one should be obvious to me but it's just not working. I'm trying to get the maximum date that would similarly occur amongst the clients that I search for (which I'm not getting working), and also occur in another table.

    declare @Table1 table(Name nvarchar(25),date datetime)

    declare @Table2 table(Item nvarchar(25),date datetime)

    declare @Date datetime

    insert into @Table1

    select 'John','2010-03-22'

    union

    select 'Mary','2010-03-22'

    union

    -- Note Mary doesn't have a '2010-03-23' value

    select 'John','2010-03-23'

    union

    select 'Mark','2010-03-22'

    union

    -- Duplicates can occur

    select 'Mark','2010-03-22'

    union

    select 'Mark','2010-03-23'

    union

    select 'Mark','2009-09-01'

    union

    -- Not searching for Neville, just indicating that there are other

    -- entries that are ignored

    select 'Neville','2009-09-01'

    union

    -- As there would be earlier values where Clients could have the same date

    select 'Mary','2009-01-01'

    union

    select 'Mark','2009-01-01'

    union

    select 'John','2009-01-01'

    insert into @Table2

    select 'Table','2010-03-22'

    union

    select 'Table','2000-03-22'

    union

    select 'Table','2010-03-23'

    set @Date = (select max(nm.date) from @Table1 nm, @Table2 obj

    where nm.date = obj.date

    and exists (select date from @Table1 where name = 'John')

    and exists (select date from @Table1 where name = 'Mary')

    and exists (select date from @Table1 where name = 'Mark')

    )

    select @date

    So effectively I'm looking for '2010-03-22' (not '2010-03-23'), as the Maximum Date where all three clients that I'm looking for would return values, and then only if the other table has that date - EDIT - that is, of this date doesn't occur in @Table2, it should return an earlier date where the date exists for each of the Clients in @Table1 and also @Table2 - , which I believe works (hence not populating a set of Data that meets the Table 1 criteria but not the Table2 - joining the tables should work but checking the existance of two, three values in a column is tripping me up).

    Thanks!

  • Hi Nugby, try this. But let me warn you, this is a crooked way of doing it.

    And it may not give good performance for large tables (see the functions in the where clause).

    create table #t(name varchar(25), date datetime)

    insert into #t

    select * from @table1 where name = 'Mary' or name = 'John' or name = 'Mark'

    union

    select * from @table2 where item = 'Mary' or item = 'John' or item = 'Mark'

    ;with cte as

    (

    select date, (select name + ',' from #t where date = t.date for xml path(''))'Names'

    from #t t

    group by date

    )

    select max(date) from cte

    where

    charindex('mark',names,0) > 0

    and charindex('mary',names,0) > 0

    and charindex('john',names,0) > 0

    drop table #t

    https://sqlroadie.com/

  • select

    @date = max(obj.date)

    from

    @Table2 obj

    join

    @Table1 t1 on t1.date = obj.date and t1.name = 'John'

    join

    @Table1 t2 on t2.date = obj.date and t2.name = 'Mary'

    join

    @Table1 t3 on t3.date = obj.date and t3.name = 'Mark'

  • SELECT MAX(date)

    FROM (

    SELECT date

    FROM @Table1 T1

    WHERE Name IN ('John', 'Mark', 'Mary')

    AND EXISTS (SELECT T1.date INTERSECT SELECT T2.date FROM @Table2 T2)

    GROUP BY date

    HAVING COUNT(*) = 3

    ) SQ;

  • Hi Paul, as you just mentioned in another thread, "Never use a UNION where a UNION ALL will do"! I believe the test script contains an error because the comments say that duplicates are allowed but UNION instead of UNION ALL is used.

    If you change UNION to UNION ALL in the test script your solution returns 2009-01-01.

    Peter

  • And the modified version returns NULL

  • Peter Brinkhaus (3/23/2010)


    I believe the test script contains an error

    Oh well. Never mind then.

  • Thanks guys!

    I'm glad you mentioned the performance aspect, Arjun. Unfortunately (for me!) @Table1 reflects the largest table I ever have to use. Merely doing a "Select * from" just now took 1min03secs (Thank God for Indexes!) so I'll probably use Peter's solution seeing they both work perfectly.

    Had never seen a table joined to itself before (have always used Subqueries for similar Join tasks, which would have given me each client's individual Max Date and therefore not worked in this case). Will definitely find this technique useful in future!

  • If you are looking for performance I should try Paul's solution as well, with a minor change (replace COUNT(*) with COUNT(DISTINCT name)). Given the small test data set and no indexing I am not sure if my solution will perform better than Paul's.

    Peter

  • Sure thing. I'd taken Paul's "Never mind then" comment as conceding it didn't work having not tested it myself. If he's in NZ, Code I'd written at that time of the morning (I'm in Australia) wouldn't have worked either!

    Once I get knock over these damn VBA Macros i'm going to be stuck with this morning I'll test the solutions out against the Production data and see if there's a difference in performance in the solutions and report back. Incidentally, I do have an Index against that table on the Date and Name fields so I'm still expecting to see this execute in the 3-4 second range.

  • If he's in NZ, Code I'd written at that time of the morning (I'm in Australia) wouldn't have worked either!

    Paul made a mistake based on an improper data set. Even then, his contribution was original and worth to look at. Anyway, good luck with the tests and glad we all could help.

    Peter

  • Thanks for taking the time to fix my solution Peter 🙂

    I have another chance to look at this, and find the following interesting:

    Setup:

    CREATE TABLE #Table1

    (

    Name NVARCHAR(25) NOT NULL,

    date DATETIME NOT NULL

    );

    CREATE TABLE #Table2

    (

    Item NVARCHAR(25) NOT NULL,

    date DATETIME NOT NULL

    );

    INSERT #Table1

    SELECT 'John', '2010-03-22' UNION ALL

    SELECT 'Mary', '2010-03-22' UNION ALL

    SELECT 'John', '2010-03-23' UNION ALL

    SELECT 'Mark', '2010-03-22' UNION ALL

    SELECT 'Mark', '2010-03-22' UNION ALL

    SELECT 'Mark', '2010-03-23' UNION ALL

    SELECT 'Mark', '2009-09-01' UNION ALL

    SELECT 'Neville', '2009-09-01' UNION ALL

    SELECT 'Mary', '2009-01-01' UNION ALL

    SELECT 'Mark', '2009-01-01' UNION ALL

    SELECT 'John','2009-01-01';

    INSERT #Table2

    SELECT 'Table', '2010-03-22' UNION ALL

    SELECT 'Table', '2000-03-22' UNION ALL

    SELECT 'Table', '2010-03-23';

    -- Indexes help!

    CREATE INDEX nc1 ON #Table1 (date, name);

    CREATE INDEX nc2 ON #Table2 (date);

    Solution:

    SELECT MAX(SQ.date)

    FROM (

    SELECT T1.date,

    T1.Name,

    rnk = RANK() OVER (PARTITION BY Date ORDER BY Name)

    FROM #Table1 T1

    WHERE Name IN (N'John', N'Mark', N'Mary')

    GROUP BY

    T1.date,

    T1.Name

    ) SQ

    WHERE SQ.rnk = 3

    HAVING EXISTS (SELECT date FROM #Table2 T2 WHERE T2.date = MAX(SQ.date));

    Execution plan:

    I like this plan because there is only one (semi) join, no blocking operators, and no memory grant.

    Four logical reads on the sample data provided!

  • Hi Paul. I don't think that solution works if Table 2 doesn't contain the Max Date from Table 1. Your other solution works perfectly though.

    Finally managed to test all of these (I had managed to make a simple mistake with Arjun's code looking at the Production data which managed to take half an hour to locate :pinch:). Each takes around 1-2 seconds normally, though when I test this against Production (where @Table2 Item column becomes effectively left(datasource,9) = "etcetcetc"), I got the following averages from about 40 attempts each (10 sets of Running each four times and leaving for a bit).

    Arjun's - Min 2, Ave 5, Max 10

    Peter's - Min 2, Ave 5, Max 8

    Paul's (1st Solution) - Min 1, Ave 3, Max 7

    Given I'd added the left(datasource) etc, I ended up removing the Having Count line from Paul's solution, but have otherwise kept the structure.

    If you guys would like to see the specific code I used, you're welcome to PM me, but I'm pretty happy with Paul's query's performance given that the table it's hitting is the only thing slowing it down!

    Thanks again guys, sorry I didn't get back yesterday (like I thought I would).

  • Nugby (3/24/2010)


    Hi Paul. I don't think that solution works if Table 2 doesn't contain the Max Date from Table 1.

    Did you try it? I think it meet the criteria originally set out - return the highest date that all three people have in common, but only if that date exists in the other table. I would also expect it to be fastest.

  • Of course! I promise I'd never be so laissez-faire when someone is using their valuable time to help me on here.

    I found that while it would work perfectly for this data, if I changed the 22nd of the 3rd references in @Table2 (this is part of a function hence my constantly reverting to Table Variables) to the 21st and added a 21st reference for each of the Clients in @Table1 (e.g. so the maximum date for the three clients was the 22nd but there was no match in @Table 2 for that date) I'd get nothing back despite there being a potential match for the 21st. In hindsight, I probably didn't make that requirement clear enough though (that I'd need @Table2 to contain that value also rather than just joining what data is available).

    Your existing solution works perfectly for me though and never took even 2 seconds to run by itself when I tested them all individually against this data (it's the join to a 1.6 Million row table that slows it down to the 3 Second average despite the indexes on that table).

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply