RE: How to ensure order is kept in import from csv

SSC Guru

Points: 60312

March 22, 2004 at 3:00 am

The behaviour is NOT guaranteed. You need to understand what is meant by 'order' in a relational database, and how this interacts with a DTS load.

The physical order in which rows exist has no relevance to the actual order rows are presented to the user in a SELECT statement. There is 1 and only 1 way to obtain rows from a relational database in a specified sequence and that is the ORDER clause. If you do not use SELECT ... ORDER BY ... then any sequence of rows you obtain may not be repeatable. There are many reasons for this, some of which are given below.

Even if the table you are loading into exists in a single physical file and there is no cluster index, then the physical order in which the data is stored is not guaranted. If the table exists in multiple physical files the physical order is even less guaranteed. Even with a cluster index the physical order within the database is not guaranteed.

When you are inserting rows into a table, SQL Server will allocate an extent of space for your table. Assuming there is no cluster index and you have a large number of rows, this extent will be filled sequentially from start to end. If more rows need to be inserted, a new extent will be obtained and rows inserted into that. Extents are allocated to a table in the physical sequence they exist in the database.

If the current DTS load is the first load into the table, then at the end of the load, the rows will almost certainly be in the same physical order in the table as they were in the input file.

If you do any maintenance work on the database before your next DTS load you may have released an extent before the data for your table starts, or even in the middle of it. The next DTS load will again use free extents in the physical sequence they exist in the database, but as you can see there is no guarantee these extents will be after all previous extents for your table. In this situation, data for the current load will be physically interspersed between data previously loaded.

It is important to stress there is nothing wrong with this behaviour. On the contary this behaviour is desirable as it helps re-use space without extending the physical size of the database.

Even if a cluster index is used, if the key sequence of new data is intermixed with existing data, an extent split will eventually occur. The new extent will be the first available in the database, which could physically exist before extents containing data with lower key values. A cluster index will guarantee that all rows in a given extent are physically stored in the order specified by the key, but has no impact on the physical sequence that extents are stored in. Although a 'SELECT *' on a table with a cluster index but NOT using an ORDER BY will normally present all rows in the sequence given by the key, the order is not actually guaranteed, and if multiple physical files or parallelism are involved then the order of presentation may not be repeatable.

So to sum up, there is no point in being concerned about trying to maintain physical sequence of rows across multiple DTS loads, because there is no way to guarantee the sequence. Your concern should be directed towards obtaining the rows in a desired sequence in a SELECT statement, which will require an ORDER BY. You can reduce the time required to obtain the data in the desired sequence by defining a cluster index that matches your ODER BY.

Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara