Help with query

Question

Post reply

Help with query

SSC Enthusiast

Points: 180

January 12, 2022 at 6:11 pm

Hello! I'm trying to query between 2 tables: one contains companies (each with a number of keywords separated by commas) and another table that contains the results of user searches in the database (each word that was searched and the number of searches for it ). As a result I want to get: a table with the first N (as number of occurrences) search words and the number of occurrences in the Search table that are found for company X.

CREATE TABLE [dbo].[Companies](
	[id] [int] IDENTITY(1,1) NOT NULL,
	[Name] [nvarchar](150) NOT NULL,
	[KeyWords] [nvarchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

CREATE TABLE [dbo].[Search](
	[id] [int] IDENTITY(1,1) NOT NULL,
	[word] [nvarchar](100) NOT NULL,
	[Count] [int] NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Companies] ON 
GO
INSERT [dbo].[Companies] ([id], [Name], [KeyWords]) VALUES (1, N'Company1', N'word1, word2, word3, word8, word9')
GO
INSERT [dbo].[Companies] ([id], [Name], [KeyWords]) VALUES (2, N'Company2', N'word2, word4, word5, word6, word7')
GO
INSERT [dbo].[Companies] ([id], [Name], [KeyWords]) VALUES (3, N'Company3', N'word1, word3, word4, word9')
GO
SET IDENTITY_INSERT [dbo].[Companies] OFF
GO
SET IDENTITY_INSERT [dbo].[Search] ON 
GO
INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (1, N'word1', 6)
GO
INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (2, N'word3', 10)
GO
INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (3, N'word4', 2)
GO
INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (4, N'word5', 1)
GO
INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (5, N'word2', 3)
GO
INSERT [dbo].[Search] ([id], [word], [Count]) VALUES (6, N'word6', 8)
GO
SET IDENTITY_INSERT [dbo].[Search] OFF
GO

Specifically, if I want to query for Company1 (id:1) to get the results:

word3, 10

word1, 6

word2, 3

for Company2 (id:2)

word6, 8

word2, 3

word4, 2

word5, 1

I tested a code made by myself but it think is not optimal and I want good results for tables with thousands of records. Thanks so much for any ideas!

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Mike01 SSChampion Points: 11381 More actions · Answer 1

Depends on how big the data set is, but you could use string_split

;with cte as (
Select c.ID, c.name, ltrim(rtrim(s.value)) as word, c.KeyWords
from #Companies c
Cross apply 
string_split(KeyWords, ',') s)

Select c.ID, c.name, c.word, s.Count, c.KeyWords
from cte c
join #Search s
on c.word = s.word

This reply was modified 2 years, 8 months ago by Mike01.

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Puiu SSC Enthusiast Points: 180 More actions · Answer 2

Puiu

SSC Enthusiast

Points: 180

January 12, 2022 at 8:00 pm

#3972514

yup! very fast indeed! Thank you!

jcelko212 32090 SSCrazy Eights Points: 9220 More actions · Answer 3

One of Dykstra's best quotes to his students when they were trying to learn programming was "you're doing everything completely wrong." And he really

meant it. I prefer the gentler version of this for Mark Twain – "the pumpkin is a failure is a shade tree." In short, you got exactly the wrong tool and you're misusing it.

Did you know the table by definition, must have a key? It is not provided in any of your DDL. Did you know the key must be one or more attributes (columns) in the table so identity which is a table property can never be a key by definition. Apparently, you don't seem to know the purpose of SQL is to handle tables and structured data and you should never use it for text. Finally, a column is by definition a scalar value; but you're using a list! That's not part of SQL or the relational model!

Here is an improved version of your attempt to create a table; it is still fundamentally wrong, it's a little bit closer to a proper design

CREATE TABLE Companies

(duns CHAR(9) NOT NULL PRIMARY KEY,

company_name NVARCHAR (150) NOT NULL,

non_relational_keyword_list NVARCHAR (MAX)) ;

The DUNS is the international standard for identifying companies. It's required by law in many countries, and it's constantly used in federal contracts in the US. If you're going to be a database programmer, then you'd better start learning all about data and the standards that go with it.

CREATE TABLE Searches

(word NVARCHAR (100) NOT NULL PRIMARY KEY, -- really? That big?

word_ count INTEGER NOT NULL CHECK( word_count >= 0)

);

An observation from over 30 years of writing SQL; if you allow an oversized variable or column in a table, then someone will provide such a value and screw you over. You also should get in the habit of providing check constraints that protect you from bad data.

You might also want to look up the term "normalization" and apply it to your tables. Your bad design should have looked more like this:

CREATE TABLE Companies

(duns CHAR(9) NOT NULL PRIMARY KEY,

company_name NVARCHAR (150) NOT NULL);

CREATE TABLE Foobar

(duns CHAR(9) NOT NULL REFERENCES Companies,

keyword NVARCHAR (20) NOT NULL

PRIMARY KEY (duns, keyword));

You also don't seem to know that the VALUES clause in SQL can be used to construct an entire table in one statement. Besides saving a lot of typing, it also lets the optimizer get to your insertions.

What you were trying to do is called a relational division. You can look it up in any decent book on SQL.

>> tested a code made by myself but it think is not optimal and I want good results for tables with thousands of records [sic: rows are not records] <<

Actually, it will get to be worse as the size of your database increases. You're using tool for something it was never intended to do, and are using it wrong. Get a textbase tool instead. It will literally run 2 to 3 orders of magnitude faster than trying to do this in SQL. Please look at my credentials, I think you can say this with some authority 🙂

Please post DDL and follow ANSI/ISO standards when asking for help.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

Puiu wrote:

yup! very fast indeed! Thank you!

Yep... for now. Your company should really work on normalizing the keyword lookup instead of it being in a CSV. Then it wouldn't only be lightning fast even compared to Mike's solution, it would be a whole lot more scalable for the future. Celko is correct about Dykstra's and Twain's quotes... the table is setup completely wrong and you are trying to use the proverbial pumpkin for a shade tree. If you don't fix it now while there's such a low impact, you will have to fix it in a crisis and it will become a crisis.

Just like the old saying of someone knowing the cost of everything and the value of nothing, so is it with such advice and they eventually do have to fix it in a crisis.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions