Help with query

  • Hello! I'm trying to query between 2 tables: one contains companies (each with a number of keywords separated by commas) and another table that contains the results of user searches in the database (each word that was searched and the number of searches for it ). As a result I want to get: a table with the first N (as number of occurrences) search words and the number of occurrences in the Search table that are found for company X.

    CREATE TABLE [dbo].[Companies](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](150) NOT NULL,
    [KeyWords] [nvarchar](max) NULL
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    CREATE TABLE [dbo].[Search](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [word] [nvarchar](100) NOT NULL,
    [Count] [int] NOT NULL
    ) ON [PRIMARY]
    GO
    SET IDENTITY_INSERT [dbo].[Companies] ON
    GO
    INSERT [dbo].[Companies] ([id], [Name], [KeyWords]) VALUES (1, N'Company1', N'word1, word2, word3, word8, word9')
    GO
    INSERT [dbo].[Companies] ([id], [Name], [KeyWords]) VALUES (2, N'Company2', N'word2, word4, word5, word6, word7')
    GO
    INSERT [dbo].[Companies] ([id], [Name], [KeyWords]) VALUES (3, N'Company3', N'word1, word3, word4, word9')
    GO
    SET IDENTITY_INSERT [dbo].[Companies] OFF
    GO
    SET IDENTITY_INSERT [dbo].[Search] ON
    GO
    INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (1, N'word1', 6)
    GO
    INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (2, N'word3', 10)
    GO
    INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (3, N'word4', 2)
    GO
    INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (4, N'word5', 1)
    GO
    INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (5, N'word2', 3)
    GO
    INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (6, N'word6', 8)
    GO
    SET IDENTITY_INSERT [dbo].[Search] OFF
    GO

    Specifically, if I want to query for Company1 (id:1)  to get the results:

    word3, 10

    word1, 6

    word2, 3

    for Company2 (id:2)

    word6, 8

    word2, 3

    word4, 2

    word5, 1

    I tested a code made by myself but it think is not optimal and I want good results for tables with thousands of records. Thanks so much for any ideas!

  • Depends on how big the data set is, but you could use string_split

     

     

    ;with cte as (
    Select c.ID, c.name, ltrim(rtrim(s.value)) as word, c.KeyWords
    from #Companies c
    Cross apply
    string_split(KeyWords, ',') s)

    Select c.ID, c.name, c.word, s.Count, c.KeyWords
    from cte c
    join #Search s
    on c.word = s.word

    • This reply was modified 2 years, 8 months ago by  Mike01.

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • yup! very fast indeed! Thank you!

  • One of Dykstra's best quotes to his students when they were trying to learn programming was "you're doing everything completely wrong." And he really

    meant it. I prefer the gentler version of this for Mark Twain – "the pumpkin is a failure is a shade tree." In short, you got exactly the wrong tool and you're misusing it.

    Did you know the table by definition, must have a key? It is not  provided in any of your DDL. Did you know the key must be one or more attributes (columns) in the table so identity which is a table property can never be a key by definition.  Apparently, you don't seem to know the purpose of SQL is to handle tables and structured data and you should never use it for text. Finally, a column is by definition a scalar value; but you're using a list! That's not part of SQL or the relational model!

    Here is an improved version of your attempt to create a table; it is still fundamentally wrong, it's a little bit closer to a proper design

    CREATE TABLE Companies

    (duns CHAR(9) NOT NULL PRIMARY KEY,

    company_name NVARCHAR (150) NOT NULL,

    non_relational_keyword_list NVARCHAR (MAX)) ;

    The DUNS is the international standard for identifying companies. It's required by law in many countries, and it's constantly used in federal contracts in the US. If you're going to be a database programmer, then you'd better start learning all about data and the standards that go with it.

    CREATE TABLE Searches

    (word NVARCHAR (100) NOT NULL PRIMARY KEY, -- really? That big?

    word_ count INTEGER NOT NULL CHECK( word_count >= 0)

    );

    An observation from over 30 years of writing SQL; if you allow an oversized variable or column in a table, then someone will provide such a value and screw you over. You also should get in the habit of providing check constraints that protect you from bad data.

    You might also want to look up the term "normalization" and apply it to your tables. Your bad design should have looked more like this:

    CREATE TABLE Companies

    (duns CHAR(9) NOT NULL PRIMARY KEY,

    company_name NVARCHAR (150) NOT NULL);

    CREATE TABLE Foobar

    (duns CHAR(9) NOT NULL REFERENCES Companies,

    keyword NVARCHAR (20) NOT NULL

    PRIMARY KEY (duns, keyword));

    You also don't seem to know that the VALUES clause in SQL can be used to construct an entire table in one statement. Besides saving a lot of typing, it also lets the optimizer get to your insertions.

    What you were trying to do is called a relational division. You can look it up in any decent book on SQL.

    >> tested a code made by myself but it think is not optimal and I want good results for tables with thousands of records [sic: rows are not records] <<

    Actually, it will get to be worse as the size of your database increases. You're using tool for something it was never intended to do, and are using it wrong. Get a textbase tool instead. It will literally run 2 to 3 orders of magnitude faster than trying to do this in SQL. Please look at my credentials, I think you can say this with some authority 🙂

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Puiu wrote:

    yup! very fast indeed! Thank you!

    Yep... for now.  Your company should really work on normalizing the keyword lookup instead of it being in a CSV.  Then it wouldn't only be lightning fast even compared to Mike's solution, it would be a whole lot more scalable for the future.  Celko is correct about Dykstra's  and Twain's quotes... the table is setup completely wrong and you are trying to use the proverbial pumpkin for a shade tree.  If you don't fix it now while there's such a low impact, you will have to fix it in a crisis and it will become a crisis.

    Just like the old saying of someone knowing the cost of everything and the value of nothing, so is it with such advice and they eventually do have to fix it in a crisis.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply