Semantics of simple-seeming UPDATE statement with FROM clause

  • Dear Experts

    I am very unhappy with my current understanding of the UPDATE T-SQL statemet in SQL Server. Specifically, there are things about the FROM clause that I don't get.

    OK, so UPDATE with FROM is used when I want to update one table (e.g Table1) using the values found in some table expression generated by the FROM clause. This expresion may be a table itself, or it may be some query that returns rows. If I have a single column (strL say) whose values I wish to update, I put it in the SET clause, for instance SET strL=Table2.strF, where Table2.strF is a column in the FROM clause. Hopefully all obvious so far.

    My question is this: how does SQL Server decide which row returned by the FROM clause is used to
    update the row in the destination table?

    The answer to the question often *seems* obvious because the table expression in the FROM clause involves
    the table whose name appears after the UPDATE, and the table expression involves an inner join on a primary key.
    However, I haven't yet found an explicit statement (in BOL or anywhere else) stating what the determining factor or
    factors are.

    Partly to assure myself that I wasn't missing something obvious, I set up the following:IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL
      DROP TABLE dbo.Table1;
    GO
    IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL
      DROP TABLE dbo.Table2;
    GO
    CREATE TABLE dbo.Table1 
      (ColSeq int NOT NULL, strL varchar(1) NOT NULL);
    GO
    CREATE TABLE dbo.Table2 
      (Forename varchar(20) PRIMARY KEY NOT NULL, strF varchar(1) NOT NULL);
    GO
    INSERT INTO dbo.Table1 VALUES(1, 'A'), (2, 'B'), (3,'C'), (4, 'D'), (5, 'E'), (6,'F');
    INSERT INTO dbo.Table2 VALUES('Will','W'), ('Sam','S'), ('Ned','N'), ('Tom','T'), ('John','J'), ('Gary','G');
    GO
    select * from dbo.Table1;
    go
    select * from dbo.Table2
    go

    The contents of the new tables were exactly as expected:
    ColSeq    strL
    1    A
    2    B
    3    C
    4    D
    5    E
    6    F

    Forename    strF
    Gary    G
    John    J
    Ned    N
    Sam    S
    Tom    T
    Will    W

    The point here is that there is no particular relationship between the two tables, though they do have the same number of rows.

    I then executed the statement:
    UPDATE dbo.Table1 
    SET strL = dbo.Table2.strF
    FROM dbo.Table2;
    GO
    SELECT colSeq, strL FROM Table1
    GO

    The result was:
    ColSeq    strL
    1    G
    2    G
    3    G
    4    G
    5    G
    6    G

    I have no good explanation for this - unless it was the SQL Server was doing the minimum amount of work.

    So - what are the factors that determine how the table and the table expression are linked? I have to have it
    spelled out - I don't want to be performing inadvertant garbage updates because of a misunderstanding.

    Yours, seeking enlightenment

    MarkD

  • What SQL did was to take one row from the source table and updated the target with the value.  Take a look at your query plan and look at your Nested Loops operator.  The little red circle indicates that there's a warning there.  In this case, it's warning you that there's no join predicate.  Without a relationship between the two, SQL can't match the rows from the two tables.

    What is it that you wanted to see happen?  I'm just guessing here, but if you wanted each row in dbo.Table1 to be updated with a different value from dbo.Table2, you have to define some kind of relationship between them.  Here's an example of using a row number (based on no particular sort order) to join them.

    WITH cteTarget AS (
    SELECT strL, RN = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
      FROM dbo.Table1
    ),
    cteSource AS (
    SELECT strF, RN = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
      FROM dbo.Table2
    )
    UPDATE cteTarget
    SET strL = s.strF
    FROM cteSource s
    WHERE cteTarget.RN = s.RN;

    Like I said, this is only a guess, but my real hope is that it helps illustrate the point.

    BTW, my sincerest compliments on explaining the problem and providing a great script to reproduce the it.

  • Your result is to be expected, as you're not defining any JOINs. SQL needs to now what records are related to what, otherwise it has no idea.

    It this case, you're attempting to set a single value (the value strL) to a range of value from strF. In this case SQL will take the first value that it evaluates, which happens to be G in your example. As it is doing this process to every row, every row gets the value G.

    This is why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.

    If you were to turn your data into something relational, it might look something like this:

    IF OBJECT_ID ('dbo.Table1', 'U') IS NOT NULL
    DROP TABLE dbo.Table1;
    GO
    IF OBJECT_ID ('dbo.Table2', 'U') IS NOT NULL
    DROP TABLE dbo.Table2;
    GO
    CREATE TABLE dbo.Table1
    (T1ID INT IDENTITY(1,1), ColSeq int NOT NULL, strL varchar(1) NOT NULL);
    GO
    CREATE TABLE dbo.Table2
    (T2ID INT IDENTITY(1,1), T1ID INT, Forename varchar(20) PRIMARY KEY NOT NULL, strF varchar(1) NOT NULL);
    GO
    INSERT INTO dbo.Table1 (ColSeq, strL)
    VALUES(1, 'A'), (2, 'B'), (3,'C'), (4, 'D'), (5, 'E'), (6,'F');
    GO

    INSERT INTO dbo.Table2 (T1ID, Forename, strF)
    VALUES(6, 'Will','W'), (5, 'Sam','S'), (4, 'Ned','N'), (3, 'Tom','T'), (2, 'John','J'), (1, 'Gary','G');
    GO

    select * from dbo.Table1;
    go
    select * from dbo.Table2;
    go

    UPDATE dbo.Table1
    SET Table1.strL = T2.strF
    FROM dbo.Table2 T2
        JOIN dbo.Table1 T1 ON T2.T1ID = T1.T1ID;
    GO

    SELECT colSeq, strL
    FROM Table1;
    GO

    Notice that I've now joined Table1 and Table2, using ID values. This is what a relationship means, as the data in one table is now related to data in another table.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Mark Dalley - Thursday, January 26, 2017 7:42 AM

    My question is this: how does SQL Server decide which row returned by the FROM clause is used to
    update the row in the destination table?

    The point here is that there is no particular relationship between the two tables, though they do have the same number of rows.

    I then executed the statement:
    UPDATE dbo.Table1 
    SET strL = dbo.Table2.strF
    FROM dbo.Table2;
    GO
    SELECT colSeq, strL FROM Table1
    GO

    The result was:
    ColSeq    strL
    1    G
    2    G
    3    G
    4    G
    5    G
    6    G

    I have no good explanation for this - unless it was the SQL Server was doing the minimum amount of work.

    So - what are the factors that determine how the table and the table expression are linked? I have to have it
    spelled out - I don't want to be performing inadvertant garbage updates because of a misunderstanding.

    MarkD

    As you stated there is no particular relationship between the 2 tables. If there is no relationship, there is no way to to update any particular row. So in your test case scenario you told SQL to update all rows in Table1 with row(s) from Table2. Since there is no relationship, SQL takes just the first row from Table2 and assigns it to Table1. Try removing the primary key from Table2 and re-run you scripts. What do you get?  A random row would have been just as correct as taking the first row.

    As was stated, you have to define a relationship between the 2 tables, even if it is an artificial one, in order for the update to occur properly.

    __________________________________________________________________________________________________________
    How to Post to get the most: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Thanks guys for the really rapid responses.

    I am well aware of relational concepts. My example contained two non-related tables on purpose,
    as a step on the way to teasing out understanding of a concept. I admit that it wouldn't really make sense
    to do this without defining some sort of relationship between them.

    From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
    primary key value, but could equally well have picked any other. Is that right?

    If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
    it is going to do something daft, I'd like a bit of notice.

    MarkD

  • This is why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.

    I don't mean to split hairs Thom but that's not technically why a Relational database is called Relational.  A database relation is a set of tuples representing the same type of entity.  The relation in this sense is a table of rows.


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    β€”Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • BTW, I meant to post this in the SQL Server 2012 T-SQL forum - my apologies.

    MarkD

  • Mark Dalley - Thursday, January 26, 2017 8:26 AM

    From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
    primary key value, but could equally well have picked any other. Is that right?

    If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
    it is going to do something daft, I'd like a bit of notice.

    MarkD

    Effectively yes, it could be random. Your key just meant that SQL had a default ORDER is could choose from.

    SQL isn't designed to hold your hand when you tell it to do things. If you tell it to update 1 Million rows with the same value, it'll do it, as you asked it to. In a production system, where updating millions of rows at a time is a day to day occurrence, it would be awful if SQL asked you everytime "Are you sure you want to do this?".

    This is why DEV and TEST environments are so important. SQL doesn't question your actions, as it's "assumed" that the person has enough knowledge to know what they are doing, and have tested their code. If they make a mistake, it isn't SQL's fault it did what was asked of it.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • BWFC - Thursday, January 26, 2017 8:30 AM

    This is part of the reason why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.

    I don't mean to split hairs Thom but that's not technically why a Relational database is called Relational.  A database relation is a set of tuples representing the same type of entity.  The relation in this sense is a table of rows.

    Would you prefer the above amendment? ^_^

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Thursday, January 26, 2017 8:36 AM

    BWFC - Thursday, January 26, 2017 8:30 AM

    This is part of the reason why SQL is call an RDMS (Relational database management system). The key is that the data is RELATIONAL. As your data above has no relation, then it defeats the object.

    I don't mean to split hairs Thom but that's not technically why a Relational database is called Relational.  A database relation is a set of tuples representing the same type of entity.  The relation in this sense is a table of rows.

    Would you prefer to above amendment? ^_^

    It's better, but if CELKO catches you, don't say you weren't warned πŸ˜€


    On two occasions I have been asked, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
    β€”Charles Babbage, Passages from the Life of a Philosopher

    How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537

  • BWFC - Thursday, January 26, 2017 8:40 AM

    It's better, but if CELKO catches you, don't say you weren't warned πŸ˜€

    Eh, I do sometimes enjoy his one man crusade posts πŸ™‚

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Mark Dalley - Thursday, January 26, 2017 8:26 AM

    Thanks guys for the really rapid responses.

    I am well aware of relational concepts. My example contained two non-related tables on purpose,
    as a step on the way to teasing out understanding of a concept. I admit that it wouldn't really make sense
    to do this without defining some sort of relationship between them.

    From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
    primary key value, but could equally well have picked any other. Is that right?

    If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
    it is going to do something daft, I'd like a bit of notice.

    MarkD

    Your UPDATE statement was perfectly valid; it just didn't do what you thought it would do. This is completely normal - that's why we have development environments. πŸ˜‰
    It did generate a warning in the query plan, but I hope that SQL never throws an error based on what It thnks is right or wrong versus what it interprets you're trying to do. The optimizer isn't perfect.

    BTW, Thom is exactly right - the order could be random., which could be a disaster.

  • Thom A - Thursday, January 26, 2017 8:34 AM

    Mark Dalley - Thursday, January 26, 2017 8:26 AM

    From what you are saying Thom, the result of the UPDATE above isn't really defined. It picked the first record by
    primary key value, but could equally well have picked any other. Is that right?

    If so, I find it a bit unsettling that no warning or error message was produced. UPDATE can do drastic stuff, and if
    it is going to do something daft, I'd like a bit of notice.

    MarkD

    Effectively yes, it could be random. Your key just meant that SQL had a default ORDER is could choose from.

    SQL isn't designed to hold your hand when you tell it to do things. If you tell it to update 1 Million rows with the same value, it'll do it, as you asked it to. In a production system, where updating millions of rows at a time is a day to day occurrence, it would be awful if SQL asked you everytime "Are you sure you want to do this?".

    This is why DEV and TEST environments are so important. SQL doesn't question your actions, as it's "assumed" that the person has enough knowledge to know what they are doing, and have tested their code. If they make a mistake, it isn't SQL's fault it did what was asked of it.

    I realise that SQL Server doesn't hold my hand, ouch, yes. My point is, if I ask it to do something meaningless, it might at least come back and and say "What???" rather than just going ahead and doing *something* involving a potentially massive update of rows.

    MarkD

  • To sum up then, what people seem to be saying is that...

    In order for the UPDATE to make sense, the table expression in the FROM must necessarily
    involve the table whose name appears after the UPDATE keyword, and must be of such a form
    that each record that is being updated relates to exactly one record in the table from which
    the new value is being taken?

    I know that BOL has an example where the result of the UPDATE is not actually defined, although - once again -
    no error or warning is emitted.

    Is that a fair description?

    MarkD

  • Mark Dalley - Thursday, January 26, 2017 9:23 AM

    To sum up then, what people seem to be saying is that...

    In order for the UPDATE to make sense, the table expression in the FROM must necessarily
    involve the table whose name appears after the UPDATE keyword, and must be of such a form
    that each record that is being updated relates to exactly one record in the table from which
    the new value is being taken?

    I know that BOL has an example where the result of the UPDATE is not actually defined, although - once again -
    no error or warning is emitted.

    Is that a fair description?

    MarkD

    Mark

    Yes.  The UPDATE...FROM syntax can be dangerous if you unknowingly use it where there's a one-to-many relationship.  As you have observed, there's no error message - it just chooses one of the "many" values to update the "one" value to.  The two examples on this page are two ways of doing the same thing; the first will warn if there's more than one value for the update, while the second won't.

    John

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply