How to store lots of data (1024 columns) for a particular reading

  • GilaMonster (11/19/2011)


    Sparse columns (requires SQL 2008)

    Thing is, all the columns must be nullable and it's intended for columns that are mostly (90%+) NULL

    I'm not going to write out 1000+ columns, but the table design would look like this:

    CREATE TABLE Samples (

    SampleDate DATETIME,

    Reading1 int SPARSE NULL,

    Reading2 int SPARSE NULL,

    Reading3 int SPARSE NULL,

    ...

    Reading1024 int SPARSE NULL,

    cs xml column_set FOR ALL_SPARSE_COLUMNS);

    Small correction: columns marked SPARSE must be NULLable; the rest of the columns can be NOT NULL. The example above has a column 'SampleDate DATETIME', which is valid DDL and can hold NOT NULL datetime values.
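
    For illustration, here is a minimal sketch of that behaviour (the table and column names are placeholders, not from the original posts): the non-sparse column is declared NOT NULL, and SELECT * folds the sparse columns into the column set as XML.

    CREATE TABLE dbo.SamplesDemo (
        SampleDate DATETIME NOT NULL,   -- non-sparse, so NOT NULL is allowed
        Reading1   INT SPARSE NULL,
        Reading2   INT SPARSE NULL,
        Reading3   INT SPARSE NULL,
        cs XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
    );

    INSERT INTO dbo.SamplesDemo (SampleDate, Reading2)
    VALUES (GETDATE(), 42);

    -- Returns SampleDate plus the column set, e.g. <Reading2>42</Reading2>
    SELECT * FROM dbo.SamplesDemo;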

  • Dev (11/20/2011)


    I don’t understand how it will consume triple the space? Even if it does, space is cheap, but that’s not the concern (for the time being).

    If you do not understand it, then how are you able to argue against the comment? Please understand it first. I would recommend comparing the tall schema you proposed with the flat (not wide) schema Jeff proposed, where a single fixed-width binary column holds all spectrometer readings. You will easily see why your schema will require significantly more storage space.

    In reference to your comment about space being cheap... that's not the point. Disk space may be cheap; however, if it takes many times the amount of space to store this data on disk, it will also take many times the amount of memory once it is read into memory for various necessary operations (like searching for it and returning it to the caller). Lobbing bags of money at the problem may be a good escape route for some legacy systems, but if you have bags of money to spare from the outset, it is wasteful to volunteer them as justification for a design choice.

    Dev (11/20/2011)


    GilaMonster (11/19/2011)


    Sparse columns (requires SQL 2008)

    Thing is, all the columns must be nullable and it's intended for columns that are mostly (90%+) NULL

    I'm not going to write out 1000+ columns, but the table design would look like this:

    CREATE TABLE Samples (

    SampleDate DATETIME,

    Reading1 int SPARSE NULL,

    Reading2 int SPARSE NULL,

    Reading3 int SPARSE NULL,

    ...

    Reading1024 int SPARSE NULL,

    cs xml column_set FOR ALL_SPARSE_COLUMNS);

    Small correction: columns marked SPARSE must be NULLable; the rest of the columns can be NOT NULL. The example above has a column 'SampleDate DATETIME', which is valid DDL and can hold NOT NULL datetime values.

    You're kicking a dead horse, Dev, or a dead design as it were... Gail already made the point that sparse columns are meant for columns where 90+% of values will be NULL. That, coupled with Jeff's earlier comment that with spectrometer data of this nature all 1024 columns are usually equally important and must be stored (even the columns with "0" data are important in spectral analysis), makes "wide tables" a poor option.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three beat me to the punch on a lot of the comments I was going to make, but I'll post them anyway because they're so important in this case. I also have a wad of amplifying information and proof that the "tall skinny" table will take 3 times as much storage space (and all the problems that go with that fact) even compared with using separate columns for the 1024 readings required.

    Dev (11/20/2011)


    Having a "tall skinny" (NVP or Name Value Pair) table, like what you're suggesting, would actually more than triple the storage space required because, not only must you store the data, but you also need to store the column name and where you were storing a row number reference only once, you must now store it for each single piece of data. Use of a such a triple size table would require three times the IO to get anything done.

    It’s a lookup / master table. We would populate it ONCE & it would be flexible enough to accept business rule changes (adding / removing more values). It’s a major advantage.

    Ordinarily, I'd agree but, with this type of consistent, fixed-width data, I have to disagree. You'll see the reasons why in a minute.

    The child tables (where actual data needs to be stored) will have reference from lookup table. We may intelligently populate child records (say populate the records where you have values, ignore NULLs).

    You'd have to show me the CREATE TABLE statements for the "child tables" you're talking about because I see one Parent Table for the "run" information and only 1 "child table" for the collected data.

    I also believe that you don't understand the nature of the data insofar as NULLs are concerned. Unless one of the 1024 frequency channels is broken, the output of a spectrometer (or spectrum analyzer, as they're sometimes called) ALWAYS returns a value for every frequency channel. If a channel is broken, then keeping the NULLs for that channel is actually very important to the analysis process. You simply can't skip storing NULL values in this case.

    We may also use Table Variables to pass values from Front End so SET based operations can be done on the data.

    I don’t understand how it will consume triple the space? Even if it does, space is cheap, but that’s not the concern (for the time being).

    The fact that disk space is cheap has nothing to do with whether or not you should optimize disk space storage. You have to consider several other factors...

    1. In the event of a crisis, which will be faster to restore? A 10GB database or a 30GB database?

    2. Which will take longer to backup? A 10GB database or a 30GB database?

    3. Considering #1 and #2 above, which will be easier on the network? A 10GB database or a 30GB database?

    4. Considering a table that holds a single (fairly small for spectral analysis) run of 100,000 rows where you want to analyse the entire run, which will be faster to read? A 20.000MB table which contains 100,000 rows or a 59.480MB table which contains 2,000,000 rows (a ton more rows if all 1024 channels were used)?

    5. Considering #4 above, which will cause the least amount of IO (the slowest and usually largest bottle-neck on any server) when joining to the Parent Table to get the run information? A 20.000MB table which contains 100,000 rows or a 59.480MB table which contains 2,000,000 rows?

    6. As opc.three stated, you also have to consider memory requirements, which are also a seriously important bottle-neck on all servers. Which will use less memory? A 20.000MB table which contains 100,000 rows or a 59.480MB table which contains 2,000,000 rows?

    And, no... I'm not just pulling numbers out of my butt here. Consider the storage requirements of a "wide" vs. "tall skinny" table for a paltry 20 channels...

    --===== Do this in a nice safe place that everyone has.

    USE TempDB;

    --===== Conditionally drop the test tables to make reruns in SSMS easier.

    IF OBJECT_ID('TempDB..#TestSize1','U') IS NOT NULL DROP TABLE #TestSize1;

    IF OBJECT_ID('TempDB..#TestSize2','U') IS NOT NULL DROP TABLE #TestSize2;

    GO

    --===== Create and populate the "wide" table first.

    -- ISNULL is used to make a NOT NULL column for the PK we're going to make

    SELECT TOP 100000

    RunNumber = ISNULL(CAST(1 AS INT),0),

    SampleNumber = ISNULL(CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT),0),

    Channel0001 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*100,

    Channel0002 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*200,

    Channel0003 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*300,

    Channel0004 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*400,

    Channel0005 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*500,

    Channel0006 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*600,

    Channel0007 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*700,

    Channel0008 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*800,

    Channel0009 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*900,

    Channel0010 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1000,

    Channel0011 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1100,

    Channel0012 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1200,

    Channel0013 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1300,

    Channel0014 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1400,

    Channel0015 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1500,

    Channel0016 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1600,

    Channel0017 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1700,

    Channel0018 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1800,

    Channel0019 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*1900,

    Channel0020 = CAST(RAND(CHECKSUM(NEWID())) AS DECIMAL(9,2))*2000

    INTO #TestSize1

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    --===== Add the expected PK to the "wide" table.

    ALTER TABLE #TestSize1

    ADD PRIMARY KEY CLUSTERED (RunNumber, SampleNumber)

    ;

    --===== Now create and populate the "tall skinny" table using the same data.

    -- Again, ISNULL is used to make NOT NULL columns for the PK we're going to make.

    SELECT RunNumber = ISNULL(ts1.RunNumber,0),

    SampleNumber = ISNULL(ts1.SampleNumber,0),

    ChannelNumber = ISNULL(ca.ChannelNumber,0),

    ca.ChannelValue

    INTO #TestSize2

    FROM #TestSize1 ts1

    CROSS APPLY

    ( --=== Unpivot the data on each row

    SELECT 1, Channel0001 UNION ALL

    SELECT 2, Channel0002 UNION ALL

    SELECT 3, Channel0003 UNION ALL

    SELECT 4, Channel0004 UNION ALL

    SELECT 5, Channel0005 UNION ALL

    SELECT 6, Channel0006 UNION ALL

    SELECT 7, Channel0007 UNION ALL

    SELECT 8, Channel0008 UNION ALL

    SELECT 9, Channel0009 UNION ALL

    SELECT 10, Channel0010 UNION ALL

    SELECT 11, Channel0011 UNION ALL

    SELECT 12, Channel0012 UNION ALL

    SELECT 13, Channel0013 UNION ALL

    SELECT 14, Channel0014 UNION ALL

    SELECT 15, Channel0015 UNION ALL

    SELECT 16, Channel0016 UNION ALL

    SELECT 17, Channel0017 UNION ALL

    SELECT 18, Channel0018 UNION ALL

    SELECT 19, Channel0019 UNION ALL

    SELECT 20, Channel0020

    ) ca (ChannelNumber, ChannelValue)

    ;

    --===== Add the expected PK to the "tall skinny" table.

    ALTER TABLE #TestSize2

    ADD PRIMARY KEY CLUSTERED (RunNumber, SampleNumber, ChannelNumber)

    ;

    --===== Finally, let's see how much dataspace is used for each table.

    EXEC sp_SpaceUsed '#TestSize1';

    EXEC sp_SpaceUsed '#TestSize2';

    Here's the output of sp_SpaceUsed for the two tables. Like I said, 3 times the space (both for data and the clustered index), which will equate to 3 times the network traffic, 3 times the IO, and 1/3rd the performance (or worse, depending on how rows of information are reassembled) for queries.

    Whatever the solution we opt we need to perform operations on same volume of data. The advantage we would get here with Tall Approach is we would be able to skip the unnecessary data split operation.

    Yep... we could skip the "unnecessary" data split operation. But, in the case of the flat, fixed-field binary data storage I proposed where ALL rows have precisely the same number of "fields", such splits are fairly easy and quite fast to accomplish because they can be done programmatically using a cteTally or physical Tally Table and a simple SUBSTRING instead of having to search for embedded delimiters. Yep... that'll definitely make for non-SARGable WHERE clauses but, even if an INDEX SEEK were to occur on the "tall skinny" table, that seek will only find the first row of a "run". All of the rest of the data will be (at best) accessed behind the scenes as a sequential scan, and reading 3 times the number of data pages will prove very expensive on the "tall skinny" table.

    An INDEX SEEK could also be used to find the first row of a run in the flat, fixed-field table I propose and it, too, would do a scan from there behind the scenes. The difference is that it would only have to read 1/3rd the total number of pages compared to the "tall skinny" method. Further, if the data is accessed/analyzed in a GUI, it would be better and much faster to split the data in the front end rather than going through 3 times the amount of data.
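
    As a rough sketch of that kind of positional split (purely illustrative: the table, column names, and the 4-byte INT channel format are assumptions, not the thread's actual design):

    DECLARE @Samples TABLE (
        SampleNumber INT             NOT NULL,
        ChannelData  VARBINARY(4096) NOT NULL   -- up to 1024 channels * 4 bytes each (illustrative layout)
    );

    -- One demo row holding four channels' worth of readings.
    INSERT INTO @Samples (SampleNumber, ChannelData)
    VALUES (1, CAST(10 AS BINARY(4)) + CAST(20 AS BINARY(4))
             + CAST(30 AS BINARY(4)) + CAST(40 AS BINARY(4)));

    WITH cteTally(N) AS
    (   -- 1 through 1024, one row per possible channel position
        SELECT TOP (1024) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns
    )
    SELECT s.SampleNumber,
           t.N AS ChannelNumber,
           CAST(SUBSTRING(s.ChannelData, (t.N - 1) * 4 + 1, 4) AS INT) AS ChannelValue
    FROM @Samples s
    JOIN cteTally t
      ON t.N <= DATALENGTH(s.ChannelData) / 4;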

    Just to make a small correction to what opc.three said, neither a flat wide (1024+ column) table nor a tall skinny table is likely to be the way to go here. The flat, fixed-width binary column is very efficient for all the reasons previously stated, especially with the help of a properly written front end or CLR. If neither of those is possible, splitting the binary column will still be so fast that it'll make up for the fact that the tall skinny table must process 3 times the amount of data, never mind the Pivot code that must be used to reassemble the rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/20/2011)


    Just to make a small correction to what opc.three said, neither a flat wide (1024+ column) table nor a tall skinny table is likely to be the way to go here.

    Sorry to muddle the terminology, Jeff. Just to clarify, "the flat (not wide) schema Jeff proposed" that I mentioned in an earlier post is neither a "flat wide (1024+ columns)" table nor a "tall skinny table"; it was meant to describe the table you proposed with only a runtime, deviceID and the fixed-width binary column to store readings.

    That said, I am not convinced that a small group of "readings tables" that make up 1024 readings columns would be so bad here (group of tables necessary given the 1024 column-limit). For me the driver is type safety, which you would not get from a binary column. Yes, you could build structure around that column, but nothing with the level of testing the DB engine has received to maintain proper data in that column, i.e. anyone could mangle your column data with a bad user-defined T-SQL or CLR object. Typed XML and typed columns give you type safety.
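
    As a rough illustration of that "group of readings tables" idea (all names and the 512-column split are placeholders, and only a couple of reading columns are written out):

    CREATE TABLE dbo.SampleHeader (
        RunNumber    INT      NOT NULL,
        SampleNumber INT      NOT NULL,
        SampleDate   DATETIME NOT NULL,
        CONSTRAINT PK_SampleHeader PRIMARY KEY CLUSTERED (RunNumber, SampleNumber)
    );

    CREATE TABLE dbo.SampleReadings_0001_0512 (   -- channels 1-512; a second table would hold 513-1024
        RunNumber    INT NOT NULL,
        SampleNumber INT NOT NULL,
        Channel0001  DECIMAL(9,2) NOT NULL,
        Channel0002  DECIMAL(9,2) NOT NULL,
        -- ... Channel0003 through Channel0512 would follow the same pattern ...
        CONSTRAINT PK_SampleReadings_0001_0512 PRIMARY KEY CLUSTERED (RunNumber, SampleNumber),
        CONSTRAINT FK_SampleReadings_0001_0512_Header FOREIGN KEY (RunNumber, SampleNumber)
            REFERENCES dbo.SampleHeader (RunNumber, SampleNumber)
    );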

    I think all the solutions have merit, even the sparse columns solution. Some more than others though 😉 It will definitely depend on the rest of the requirements to reach the solution that works best.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks opc.three & Jeff for your thoughts.

    I would like to clarify a few things on my end.

    • By saying that “Space is Cheap” I never meant that we can misuse disk storage. I do understand all the points that you guys raised, like backup, network & I/O.

    • Now, “kicking a dead horse”: I never argued for the Wide Table option. You can verify in previous posts that I gave reasons NOT to use them. My intention there was just to correct the misunderstanding about Wide Tables that “all columns should be NULL”.

    • Jeff, your comment ‘I also believe that you don't understand the nature of the data’ is 100% right. I don’t know spectrometer data; that’s why all of my comments related to the data were generic.

    • Also, I am happy that you understand my worry: “Yep... that'll definitely make for non-SARGable”.

    • (I already accepted that I am not sure about the nature of the data.) If the required data columns change, then with your suggestions there would be code changes for each enhancement. Losing flexibility.

    I think all the solutions have merit, even the sparse columns solution. Some more than others though. It will definitely depend on the rest of the requirements to reach the solution that works best.

    That would be the conclusion anyway & everybody would agree on that. 😀

  • Thanks for the very informative post. Sorry I have not participated but I was away for the weekend. As Jeff pointed out, yes, the spectral data is 1024 randomized (due to its nature) radiation detection counts, so our device will always return 1024 non-NULL pieces of data.

    From the discussion, I definitely think that the binary solution sounds the most attractive. In fact, one of the developers here asked if SQL columns had a binary type representation. At the time I was unaware of this. I will have to do some reading on how to work with this data type and parse it back out.

    Thanks for all of the help, it is much appreciated.

  • What's the front end for this data? What are you using to put data in and later display/interpret/create pretty pictures with? 😉

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Heh, I misunderstood the final use of this data. I realized the spectrometer took all those readings as a whole, but my brain went a bit sideways on the expected use. I figured you'd want to compare, say, 'reading 3' across a period of time while ignoring readings 4-1024... thus my recommendation to twist the data.

    If the data must be a comprehensive whole, I would definitely find a way to keep the flattened arrangement. On a side note, you CAN get some performance out of the XML type because it will index. I'm not sure if you're ever going to need that level of depth, but I'd need a heck of a lot better understanding of the business use case to know if there would ever be one (considering the original comments from the boss).

    You can find some additional information on XML indexing inside SQL Server here: http://msdn.microsoft.com/en-us/library/ms191497.aspx
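
    A minimal sketch of what that XML indexing looks like (the table and index names here are purely illustrative):

    CREATE TABLE dbo.SpectralSamplesXml (
        SampleID    INT IDENTITY PRIMARY KEY,   -- a clustered primary key is required for XML indexes
        SampleDate  DATETIME NOT NULL,
        ReadingsXml XML      NOT NULL
    );

    -- The primary XML index shreds the XML into an internal node table...
    CREATE PRIMARY XML INDEX PXML_ReadingsXml
        ON dbo.SpectralSamplesXml (ReadingsXml);

    -- ...and secondary indexes (PATH / VALUE / PROPERTY) build on it.
    CREATE XML INDEX IXML_ReadingsXml_Path
        ON dbo.SpectralSamplesXml (ReadingsXml)
        USING XML INDEX PXML_ReadingsXml FOR PATH;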


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • GilaMonster (11/21/2011)


    What's the front end for this data? What are you using to put data in and later display/interpret/create pretty pictures with? 😉

    That's just the thing, we are not really sure as to how the data is going to be interpreted. This is a real business scenario so of course nothing is laid out in great detail. The client just said that they would like to capture the spectral data along with some other readings. It is likely that the spectral data will be thrown away or just ignored, and the associated data, which is only about 3 additional values, will be used.

    However, we need to keep the option open for this data to be viewed somehow. The current way to present this data is to hand the client a linked Access database that looks at the base tables (I personally don't like that idea, but it seems to be what they want). I guess there would then be a way of running a macro in Access to break up this 1024-reading column into an Excel export for data manipulation there.

    EDIT: Excel has a max limit of 256 columns, so not sure how the client will review this data in a separated form...

    Ideally I would have a web app (most of our data is used for web apps) that graphs this data over a period of time for each channel, with an option to window this spectra for more detail, but I am not in position to make that decision.

  • Jeff Moden (11/18/2011)


    For a spectrometer, you would expect all numeric values. Rather than bother with XML, why not store it all as "fixed length" fields within a fixed length BINARY datatype which is easily "split" positionally?

    I thought I understood binary data correctly; apparently I don't. How would you write to this field if you wanted to store the following values?

    Array { 10,200,3000,40000 }

    I'm having trouble figuring out how to get fixed-size data. Would I have to assume that there is an absolute maximum count returned (say the max is 99 999) and make sure to store all of the elements in the above array in a byte format that could contain that maximum count? It seems I have to do some sort of 'bin' method so I can tell where one channel count ends and another begins.

  • opc.three (11/20/2011)


    That said, I am not convinced that a small group of "readings tables" that make up 1024 readings columns would be so bad here (group of tables necessary given the 1024 column-limit).

    No problem... using that thought, write a query for a given run (assume a paltry 100,000 rows) that will search all 1024 channels on each "sample record" for a maximum value (usually in Lumens, Decibels, or Pascals, depending on what is being measured) and then return that value, the sample number of that "record", and the channel number, and you'll understand why I suggested a fixed-field binary format. If that doesn't convince you, take the average, min, max, and standard deviation of each channel per second of run time. And it doesn't matter whether you're using SQL Server for the analysis or a GUI. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Dev (11/21/2011)


    Thanks opc.three & Jeff for your thoughts.

    I would like to clarify a few things on my end.

    • By saying that “Space is Cheap” I never meant that we can misuse disk storage. I do understand all the points that you guys raised, like backup, network & I/O.

    • Now, “kicking a dead horse”: I never argued for the Wide Table option. You can verify in previous posts that I gave reasons NOT to use them. My intention there was just to correct the misunderstanding about Wide Tables that “all columns should be NULL”.

    • Jeff, your comment ‘I also believe that you don't understand the nature of the data’ is 100% right. I don’t know spectrometer data; that’s why all of my comments related to the data were generic.

    • Also, I am happy that you understand my worry: “Yep... that'll definitely make for non-SARGable”.

    • (I already accepted that I am not sure about the nature of the data.) If the required data columns change, then with your suggestions there would be code changes for each enhancement. Losing flexibility.

    I think all the solutions have merit, even the sparse columns solution. Some more than others though. It will definitely depend on the rest of the requirements to reach the solution that works best.

    That would be the conclusion anyway & everybody would agree on that. 😀

    I don't know what scale and precision would be needed for each of the 1024 channels of spectral output. Assuming something reasonable like DECIMAL(9,2) (which is 5 bytes when stored as a fixed-width binary), 1024 channels would require a fixed-width binary of 5120 bytes, which also has the advantage of being 100% "in row".

    Since it is known that the start of each element (channel) in the binary would be precisely 5 bytes from element to element and that the width of each element is precisely 5 bytes, a cteTally splitter used with CROSS APPLY could easily approach CLR speeds if the analysis needed to be done in T-SQL. It also has the advantage of doing it with at least 1/3rd the reads (even logical reads take extra time) and 1/3rd the memory requirements. It also has the advantage of not having to "pivot" data to get whole rows, and things like MIN, MAX, and AVG are easy to calculate over rolling bins of time. Even things like MODE and MEDIAN could be easily calculated, and it would all be done with less IO and less memory than an NVP-style table.

    The reason why all of this is possible is the fixed field width and how easy it is for cteTally splitters to process the data programmatically (much like what would be done in externally managed code) without having to do such things as write (or have the code write) a 1024-column pivot to reassemble rows of data.

    To summarize, I understand your concerns about a loss of flexibility and how flexible NVPs can be, but I don't believe those concerns are applicable here because of the fixed field format of the data itself and the fact that the required structure of the NVP would require at least 3 times the IO and memory. Flexibility wouldn't be lost in this case... it would just have to take on a different personality.
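
    As a rough sketch of that kind of per-channel aggregation (illustrative only: the table name is hypothetical, and for simplicity each channel is assumed to be a 4-byte INT holding the reading scaled by 100 rather than the 5-byte DECIMAL(9,2) layout described above):

    WITH cteTally(N) AS
    (   -- 1 through 1024, one row per channel position
        SELECT TOP (1024) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns
    )
    SELECT t.N AS ChannelNumber,
           MIN(v.ChannelValue) AS MinReading,
           MAX(v.ChannelValue) AS MaxReading,
           AVG(v.ChannelValue) AS AvgReading
    FROM dbo.SpectrometerSamples s            -- hypothetical table: (RunNumber, SampleNumber, ChannelData VARBINARY(4096))
    CROSS JOIN cteTally t
    CROSS APPLY (SELECT CAST(SUBSTRING(s.ChannelData, (t.N - 1) * 4 + 1, 4) AS INT) / 100.0) v (ChannelValue)
    WHERE s.RunNumber = 1
      AND t.N <= DATALENGTH(s.ChannelData) / 4
    GROUP BY t.N
    ORDER BY t.N;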

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • loki1049 (11/21/2011)


    Thanks for the very informative post. Sorry I have not participated but I was away for the weekend. As Jeff pointed out, yes, the spectral data is 1024 randomized (due to its nature) radiation detection counts, so our device will always return 1024 non-NULL pieces of data.

    From the discussion, I definitely think that the binary solution sounds the most attractive. In fact, one of the developers here asked if SQL columns had a binary type representation. At the time I was unaware of this. I will have to do some reading on how to work with this data type and parse it back out.

    Thanks for all of the help, it is much appreciated.

    I don't know what kind of output your spectrometers can cough out but, if they can use the kind of "concatenated binary" format I'm thinking of, then SQL Server would have to do very, very little work to capture the data. Depending on what your sample rates are, it might even be possible to capture the data "real time".

    I'm assuming that radiation levels don't mind their privacy being invaded 😛 in saying this... would it be possible for you to attach a file of, say, 100 rows of data that you get from the spectrometer so we can "play" on behalf of this very interesting problem? That would let us know all sorts of things including the precision and scale of the data you're collecting so we'd know how big to make the fixed-field binary.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • loki1049 (11/21/2011)


    Jeff Moden (11/18/2011)


    For a spectrometer, you would expect all numeric values. Rather than bother with XML, why not store it all as "fixed length" fields within a fixed length BINARY datatype which is easily "split" positionally?

    I thought I understood binary data correctly; apparently I don't. How would you write to this field if you wanted to store the following values?

    Array { 10,200,3000,40000 }

    I'm having trouble figuring out how to get fixed-size data. Would I have to assume that there is an absolute maximum count returned (say the max is 99 999) and make sure to store all of the elements in the above array in a byte format that could contain that maximum count? It seems I have to do some sort of 'bin' method so I can tell where one channel count ends and another begins.

    "Binary" data in SQL Server can be, depending on usage, a bit of a misnomer. In SQL Server, you can create a "binary" of concatenated binary values quite easily so that a (for example) 5120 byte binary is actually a fixed-field formatted "string" of 1024 5-byte binary encoded values. I'll work up a "10 field" example of binary encoded INTEGERs (4 bytes each) so you can see what I mean. I'll be back in a few.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (11/21/2011)


    I don't know what kind of output your spectrometers can cough out but, if they can use the kind of "concatenated binary" format I'm thinking of, then SQL Server would have to do very, very little work to capture the data. Depending on what your sample rates are, it might even be possible to capture the data "real time".

    I'm assuming that radiation levels don't mind their privacy being invaded 😛 in saying this... would it be possible for you to attach a file of, say, 100 rows of data that you get from the spectrometer so we can "play" on behalf of this very interesting problem? That would let us know all sorts of things including the precision and scale of the data you're collecting so we'd know how big to make the fixed-field binary.

    Thanks for taking the time to help with examples, I'm much more of a visual learner.

    For the data, we are reading the channels once every minute, so it is fairly close to real time. The problem is we don't have the channel analyzer yet, so I can't give you test data. The detector is fairly sensitive, so I would assume the values would range from 0 to 50 000 counts. The original idea was to store the data in a fixed string format of 4 significant figures and an exponent:

    i.e: 1.234e56

    That way we would know how many channels could fit given a fixed NVARCHAR column size. I still want to abandon this comma-separated string format and use a BLOB.
