Advanced Queries With SQL That Will Save Your Time

And now we can perform a DELETE here and remove all the duplication values based on our definition of duplication.WITH CTE AS( SELECT NumberFrom, CountryFrom, NumberTo, CountryTo, Duration, id, RN = ROW_NUMBER()OVER(PARTITION BY NumberFrom, CountryFrom, NumberTo, CountryTo ORDER BY CountryFrom) FROM #t)DELETE FROM CTE WHERE RN > 1;SELECT *FROM #tPARTITION BY will serve as the duplication definition in this case and with ORDER BY we can, for example, remove calls that were shorter/longer in duration or those that started earlier/later or whatever condition you may found in other columns..Please note, ORDER BY clause is a must here, so if you don’t have any specific condition for ordering just put any field there.Selecting Top N Records Within the GroupI believe you already know how to do that :D..Yes, with ROW_NUMBER() again..This function will partition our data and assign a number to each record within a group, so then we just have to specify which range of top N we want to select: top 10? — RN = 3 and RNLet’s select only top 2nd record within the groups of countries from which the calls were made (‘CountryFrom’) that have their names starting with the same letter and sort values by duration within the groups..(I have inserted the same data into temp table 3 times to better see the differences)WITH CTE AS( SELECT NumberFrom, CountryFrom, NumberTo, CountryTo, Duration, id, RN = ROW_NUMBER()OVER(PARTITION BY LEFT(CountryFrom,1) ORDER BY Duration ASC) FROM #t)SELECT *FROM CTE;Look how Germany and Greece are put in one group and Ireland with Italy into another..And to select only second record within each group:WITH CTE AS( SELECT NumberFrom, CountryFrom, NumberTo, CountryTo, Duration, id, RN = ROW_NUMBER()OVER(PARTITION BY LEFT(CountryFrom,1) ORDER BY Duration ASC) FROM #t)SELECT * FROM CTEWHERE RN = 2;Shuffling Records Within the GroupActually this technique can be used to shuffle records in any case, not only within the group, but as we already started with groups, ROW_NUMBER and WITH statements I decided to make it little bit more complicated. I don’t know if in any period of your life you will need to shuffle records within a group, but if that one occasion appears you know how to do that. Again, this is part of my day-to-day job and I know it’s a rare task :)) So back to queries..We will use the same grouping condition and for shuffling it’ll be a combination of functions ABS() — function that is used to get the absolute value of a number passed as an argument, CHECKSUM() — function that returns the checksum value computed over a table row, or over an expression list, and NEWID() — that creates a unique value of type uniqueidentifier..While reading you might think — what the hell?.But I’m telling, it does the job..Look:WITH CTE AS( SELECT NumberFrom, CountryFrom, NumberTo, CountryTo, CallStarted, Duration, id, RN = ROW_NUMBER()OVER(PARTITION BY LEFT(CountryFrom,1) ORDER BY ABS(CHECKSUM(NewId())) % 1000000 DESC) FROM [test].[dbo].[12162018_FakeCalls])SELECT * FROM CTE;It might be not that obvious with small amount of data, but the most exiting thing is every time you run this query it shuffles again..So after a second run:You may noticed now POLAND is the first record and SPAIN is the second.That’s it for today..Few tips with ROW_NUMBER() that help me a lot in my work and I hope will help someone else too.Despite it doesn’t include machine learning, artificial intelligence, image recognition, model tuning, deep neural networks or any other fancy terms that pop up in data science field, working with data using SQL is still part of data science.. More details

Leave a Reply