Percent Rank

  • Hi Everyone

    I have an Access query that computes the percentile for a given set of values.  The key is that the calculation is based on a rolling window of the past X days.  For example, if calculated today, it will look at the last 30 periods (including today's date) and then determine the percentile of today's value.  The rolling 30 periods window is applied to each day.  The query I have uses a lot of nested selects etc because Access has no native function to calculate percentile.  The problem is that my implementation is not remotely efficient.  I want to use SQL Server's native function PERCENT_RANK instead.  The part I am struggling with is how to implement the rolling part.  Sample data is as follows:

    Stk      Dt                  RN      VL

    A        2020-01-03     1     -0.0341460056

    A        2020-01-06     2     0.0211422434

    A        2020-01-07     3     -0.0040987371

    A        2020-01-08    4     0.0089279974

    A        2020-01-09    5     0.1026746903

    A       2020-01-10     6     -0.1027788847

    A       2020-01-13     7     0.0764304778

    RN is row number.  The DT has gaps in them so to address this I added a row number (RN) field.  This way the last 30 periods is always going to be treated consistently (ie, it will always have 30 data points) vs using last 30 days (which may not have 30 days b/c there are holidays).

    I did some research on the PERCENT_RANK function.  Looks like I can get it to work as I have described by using the PARTITION clause.  However, I don't know how to partition my data as described (ie look at the last 30 periods).  I know I have to use RN but not really sure how.

    Is someone able to help guide me on this?  How can I partition the data so each day it only looks at the last 30 periods (not days).

    Thank you

  • a standard rolling window would be something like

    SUM([Qty]) OVER (PARTITION BY <columnName> ORDER BY <date> ROWS BETWEEN PREVIOUS 30 ROWS AND CURRENT ROW)

    and then you would divide by that.

    • This reply was modified 2 years, 10 months ago by  pietlinden.
    • This reply was modified 2 years, 10 months ago by  pietlinden.
  • I am getting an error message.

    SELECT *,

    PERCENT_RANK() OVER (PARTITION BY UNDERLYING_SYMBOL ORDER BY QUOTE_DATE ROWS BETWEEN PREVIOUS 12 ROWS AND CURRENT ROW) AS PCT_RANK

    FROM DBO.Stock_Log_Price_Changes

    error message is

    Msg 102, Level 15, State 1, Line 2

    Incorrect syntax near 'PREVIOUS'.

    Completion time: 2021-11-16T10:51:42.1779430-08:00

    How do i fix this?

  • Got some data for us to play with? That's not data.

  • good idea.  I have attached sample data with expected outcome

    Attachments:
    You must be logged in to view attached files.
  • If a parent record doesn't have 30 previous periods/records, then you want to exclude it?

    Any chance you could translate your question into a Sales kind of question, so I can use something with a lot of data already, ... like everybody's favorite, AdventureWorks database?

  • How to post code problems

    • People trying to help you are donating their time. Providing scripts to create tables, views, procedures, etc. and populate data saves them time in reproducing the situation, and demonstrates that you have devoted time/effort to the problem, which increases the likelihood someone can/will help you.
    • People don't like to open files that could contain dangerous scripts.

     

     

  • which ever is easier ... exclude or mark as NULL.  I can work with NULLs in subsequent queries.

    I don't have AW DB downloaded.  Are you able to work with the Excel I provided?

  • Thank you for the feedback.  I will definitely review it.

    I am not strong with SQL so I am learning as I go.

  • water490 wrote:

     Are you able to work with the Excel I provided?

    Copy-paste from your spreadsheet into "Insert/Edit Code Sample"

    • This reply was modified 2 years, 10 months ago by  homebrew01.
  • I didn't know about that feature!!  the table look great.  Thank you for bringing this to my attention.  I used 5 days look back period to keep the table small and manageable.  If the period has less than 5 days then return NULL.  I can deal with the NULL in future queries.  Thank you so much !!

     

    SYMBOL	QUOTE_DATE	ROWNUMBER	VALUE	% Rank
    ABC 2020-01-03 1 -0.004146006 0
    ABC 2020-01-06 2 0.001142243 0.5
    ABC 2020-01-07 3 -9.87371E-05 0.25
    ABC 2020-01-08 4 0.006927997 1
    ABC 2020-01-09 5 0.00267469 1
    ABC 2020-01-10 6 -0.002778885 0.5
    ABC 2020-01-22 13 4.81656E-05 0.5
    ABC 2020-01-23 14 0.000701149 0.75
    ABC 2020-01-24 15 -0.010873242 0.25
    ABC 2020-01-27 16 -0.012509795 0.25
    ABC 2020-01-28 17 0.010326945 1
    ABC 2020-01-29 18 -0.000435917 0.5
    ABC 2020-01-30 19 -0.000667961 0.25
    ABC 2020-01-31 20 -0.016488927 NULL
    ABC 2020-02-03 21 0.008613701 NULL
    ABC 2020-02-04 22 0.014874236 NULL
    ABC 2020-02-05 23 0.010221595 NULL
    DEF 2020-02-06 24 0.00313202 0.25
    DEF 2020-02-07 25 -0.004239256 0
    DEF 2020-02-10 26 0.004711501 0.75
    DEF 2020-02-11 27 0.004121172 0.75
    DEF 2020-02-12 28 0.005614608 1
    DEF 2020-02-13 29 -0.001842341 0.25
    DEF 2020-02-14 30 0.000598689 0.75
    DEF 2020-02-18 31 -0.001090949 0.75
    DEF 2020-02-19 32 0.004761188 1
    DEF 2020-02-20 33 -0.004357873 0.75
    DEF 2020-02-21 34 -0.012245824 0.75
    DEF 2020-02-24 35 -0.02982495 0.75
    DEF 2020-02-25 36 -0.032720969 0.5
    DEF 2020-02-26 37 -0.001358889 0.75
    DEF 2020-02-27 38 -0.037744758 NULL
    DEF 2020-02-28 39 -0.04355777 NULL
    DEF 2020-03-02 40 0.055845488 NULL
    DEF 2020-03-03 41 -0.007660045 NULL

     

    • This reply was modified 2 years, 10 months ago by  water490. Reason: table not showing up correctly
  • wait a minute...it changed format.  isn't there a way to add a table so all the values are easily viewable?

  • I got this working with a SUM

    SELECT *,

    SUM(PRICES) OVER (PARTITION BY SYMBOL ORDER BY QUOTE_DATE ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS CURRENTANDPREV3

    FROM DBO.Stock

    It produces the sum as noted.  I checked in Excel and the calculation is correct.  This means that the syntax is correct.  Next, I changed SUM(PRICE_CHANGE) to PERCENT_RANK():

    SELECT *,

    PERCENT_RANK() OVER (PARTITION BY SYMBOL ORDER BY QUOTE_DATE ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS CURRENTANDPREV3

    FROM DBO.Stock

    Msg 10752, Level 15, State 3, Line 2

    The function 'PERCENT_RANK' may not have a window frame.

    Completion time: 2021-11-16T14:35:52.3292022-08:00

    Looks like using ROWS command doesn't work with PERCENT_RANK().

    Any suggestions on how to make it work?

  • Don't forget that if you are looking for 30 days rolling you will need to inject missing dates so that they are contiguous as the BETWEEN is looking at row counts not date offsets.  If there is no data for (lets say...) Chrismas, Boxing day and New Year, then your rolling 30 rows would actually pick up 33 days of data.  You would also then need to consider how to treat these NULL days in the average.

     

     

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply