First, consider a 4-bit counter with two adjustable thresholds (Figure 12). When Count is greater than High Threshold, Algorithm A is deemed appropriate; when Count is less than Low Threshold, the same is true of Algorithm B.

For the range between High Threshold and Low Threshold, either algorithm may be in effect. This is because a switch from Algorithm B to Algorithm A occurs only when Count is above High Threshold and increasing, and a switch from Algorithm A to Algorithm B occurs only when Count is below Low Threshold and decreasing.

The overlap range is also the Target Range, as the system will naturally attempt to maintain Count between these two points. This is because Algorithm A tends to lower Count while Algorithm B tends to raise it. The hysteresis acts to reduce or eliminate rapid thrashing between algorithms.
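A minimal sketch of this two-threshold selector may make the behavior concrete. The 4-bit Count is from the text; the specific threshold values and all names here are illustrative assumptions, not Intel's implementation:

# Hypothetical sketch of the two-threshold selector described above.
# The 4-bit Count is from the text; the threshold values and all names
# are illustrative assumptions, not Intel's implementation.

LOW_THRESHOLD = 5      # assumed value within the 4-bit range (0..15)
HIGH_THRESHOLD = 10    # assumed value within the 4-bit range (0..15)

def select_algorithm(count, current):
    """Return which algorithm ('A' or 'B') should be in effect."""
    if count > HIGH_THRESHOLD:
        return 'A'     # Count high and rising: switch to Algorithm A
    if count < LOW_THRESHOLD:
        return 'B'     # Count low and falling: switch to Algorithm B
    return current     # dead band: keep whichever algorithm is in effect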

Figure 12. Another way of looking at this: if the MSB of Count is a 1, then the page close policy is too loose

Next, we define a truth table (Figure 13) describing how Count varies, encoding a feedback mechanism into the system. Successful predictions by the Adaptive Page Close Logic - a prevented page-miss access (good) in response to a decision to close a page, or a facilitated page-hit access (good) in response to a decision to leave a page open - suggest no change to policy is required, and so never modify Count.

For a facilitated page-miss access (bad) due to a poor decision to leave a page open, increment Count. If Count trends upward we can conclude that the current policy is most often wrong, and moreover tends to leave pages open far too long while "fishing" for page-hit operations. The current algorithm must not be closing pages aggressively enough.

For a prevented page-hit access (bad) due to a poor decision to close a page early, decrement Count. If Count trends downward we suspect the opposite: the algorithm is closing pages too aggressively, leaving potential page-hits on the cutting room floor.
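Because only the two "bad" outcomes move Count, the truth table reduces to a saturating up/down counter. A sketch under that reading (parameter and function names are ours):

def update_count(count, left_open, next_access_same_row):
    """Apply the feedback rules of Figure 13 to the 4-bit Count."""
    if left_open and not next_access_same_row:
        count += 1     # facilitated page-miss: page was left open too long
    elif not left_open and next_access_same_row:
        count -= 1     # prevented page-hit: page was closed too early
    # both "good" outcomes (prevented miss, facilitated hit) leave Count alone
    return max(0, min(15, count))   # saturate at the 4-bit limits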

Figure 13. The policy is controlling just right whenever we reduce the number of page-miss operations and increase the number of page-hit operations

As best we can tell, this construct represents reality for APM Technology. Although we would like to believe the system has more than two gears (algorithms), our model perfectly explains the existing control registers, both in type and number.

Looking ahead, you will see that Max Page Close Limit and Min Page Close Limit are the specified High and Low Threshold values, respectively. Setting a larger difference increases the size of the feedback dead band, slowing the rate at which the system responds to its own evaluative efforts. Mistake Counter is represented by the starting Count and should be set somewhere near the middle of the dead band.

Adaptive Timeout Counter sets the assertion time of any decision to keep a page open (i.e. how long a decision to keep a page open stands before we give up hope of a page-hit access). Repeated access to the same page resets this counter each time, as long as the remaining lifetime is non-zero. Lower values result in a more aggressive page close policy; higher values, a more relaxed one.
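In other words, the counter behaves like a down-counter that each page-hit rearms. A sketch under that assumption (class and method names are ours, not Intel's):

class OpenPageTimer:
    """Hypothetical model of the Adaptive Timeout Counter behavior."""

    def __init__(self, timeout):
        self.timeout = timeout              # assertion time, in arbitrary ticks
        self.remaining = timeout

    def on_page_hit(self):
        if self.remaining > 0:
            self.remaining = self.timeout   # repeated access rearms the timer

    def tick(self):
        """Advance one tick; return True once the page should be closed."""
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0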

Request Rate, we believe, controls how often Count (Mistake Counter) is updated, and therefore how smoothly the system adapts to quickly changing workloads. There must be a good reason not to flippantly set this interrupt rate as low as possible. Perhaps frequent updates deplete hardware resources needed for other operations, or maybe higher duty cycles disproportionately raise power consumption. Whatever the reason, there's more than a fair chance you can hurt performance if you're just spit-balling with this setting.
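If Request Rate works the way we suspect, it simply decimates the feedback stream. A speculative sketch, reusing update_count() from above (the sampling scheme is our guess, not a documented mechanism):

def run_feedback(events, count, request_rate):
    """Update Count only once every `request_rate` page decisions.

    `events` is a sequence of (left_open, next_access_same_row) pairs.
    A larger request_rate means fewer updates: a cheaper but
    slower-adapting feedback loop.
    """
    for i, (left_open, same_row) in enumerate(events):
        if i % request_rate == 0:       # sample only every Nth request
            count = update_count(count, left_open, same_row)
    return count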

Comments

  • Dwebtron - Monday, August 16, 2010 - link

    How did you know I was afraid to ask!!
  • 0ldman79 - Tuesday, May 28, 2019 - link

    It's because we're all from the future.
  • neslog - Monday, August 16, 2010 - link

    Thank you for a great article on memory and you are right, I was afraid to ask.
  • landerf - Monday, August 16, 2010 - link

    I've found for the i7 platform the perfect ram setup is 1200 Mhz + cas5 or 6 timings, a 3:1 uncore ratio, and a B2B of 4. Not only does this perform well even in synthetics, it provides the "smoothest" intel experience. Something people who use amd and intel have been complaining about intel lacking. Check this chart and see how well that setup performs compared to all the conventional 2:1 setups. https://spreadsheets.google.com/ccc?key=0AsaXlcTga...
  • Servando Silva - Monday, August 16, 2010 - link

    Thanks for a great article. It will take me a while to read it carefully and fully understand it.
    Kris + Raju = Killer combo.
  • neslog - Monday, August 16, 2010 - link

    On page 8 you may want to change the wording in the last paragraph " Once you've had...
    to cordially invite[d] (you) to do some..."

    Thanks again for the article. I appreciate all the work that went into putting it together
  • elforeign - Monday, August 16, 2010 - link

    It's a site willing to go the extra mile like this to report and educate the masses that are truly worth the time to peruse and read the posted articles. I check this site daily because there is always something interesting to read. Thank you to all the staff who do a great job here!
  • chizow - Monday, August 16, 2010 - link

    Just kidding....

    Or am I? :D
  • JarredWalton - Monday, August 16, 2010 - link

    There's obviously benefits to either direction. Reducing latency is definitely a priority, but something not mentioned in the text that bears repeating is that latency is a factor of clock speed as well as the various timings. While CAS 6 will always be better than CAS 7 at the same base clock (and likewise for the other timings), if you have a faster memory speed CAS 7 could end up being better.

    So here's the scoop:
    DDR3-1066 = 266MHz base clock, or 3.75ns per cycle.
    DDR3-1333 = 333MHz base clock, or 3.00ns per cycle.
    DDR3-1600 = 400MHz base clock, or 2.50ns per cycle.
    DDR3-2000 = 500MHz base clock, or 2.00ns per cycle.

    That gives this table in order of increasing latency, with rough pricing for 2x2GB. Based on pricing and latency, I've starred the best buys on Newegg:

    CAS 6 DDR3-2000 = 12.0ns. ($180)
    CAS 7 DDR3-2000 = 14.0ns. ($140)
    CAS 6 DDR3-1600 = 15.0ns. ($115) ***
    CAS 8 DDR3-2000 = 16.0ns. ($150)
    CAS 7 DDR3-1600 = 17.5ns. ($101) ***
    CAS 9 DDR3-2000 = 18.0ns. ($100) ***
    CAS 6 DDR3-1333 = 18.0ns. ($100) ***
    CAS 10 DDR3-2000 = 20.0ns. ($118)
    CAS 8 DDR3-1600 = 20.0ns. ($85) ***
    CAS 7 DDR3-1333 = 21.0ns. ($90)
    CAS 9 DDR3-1600 = 22.5ns. ($92)
    CAS 8 DDR3-1333 = 24.0ns. ($92)
    CAS 7 DDR3-1066 = 26.3ns. ($80)
    CAS 9 DDR3-1333 = 27.0ns. ($85)
    CAS 8 DDR3-1066 = 30.0ns. ($93)

    Notice how the total latency often comes in groups. The DDR3-1333 CL6, DDR3-1600 CL7, and DDR3-2000 CL9 are all priced around $100. If you buy any of these modules, there's a good chance (though YMMV) that you can tweak timings to run at whichever value makes you happiest. I'd probably err on the side of buying the higher speed rated modules, though, or at least grab the 1600MHz set.
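The figures in the table above follow directly from the per-cycle times listed: latency in nanoseconds is just the CAS cycle count times the base-clock period. A minimal sketch reproducing the comment's arithmetic (function name ours; "base clock" convention as stated in the comment, i.e. one quarter of the DDR3 transfer rate):

def cas_latency_ns(ddr3_rate, cas):
    """Latency in ns, using the comment's base clock = transfer rate / 4."""
    base_clock_mhz = ddr3_rate / 4.0        # e.g. DDR3-1600 -> 400 MHz
    cycle_ns = 1000.0 / base_clock_mhz      # e.g. 2.5 ns per cycle
    return cas * cycle_ns

print(cas_latency_ns(1600, 8))              # 20.0 ns, matching the table
print(cas_latency_ns(2000, 6))              # 12.0 ns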
  • Rick83 - Monday, August 16, 2010 - link

    Your pricing comparison is sadly missing one important factor:
    Operating voltage.
    I was at first surprised by the high cost of 1333/9, but I expect the voltage of that kit to be around 1.5, where most 1333/7 kits already clock in at 1.65.
    The 2000/9 kit probably also runs higher V's than the identically priced 1333/6?

    Lower voltages are usually preferred, as they give you a) more headroom and b) less heat at stock - with on-die controllers even less cpu heat.
