Question for basically just Agrul probably

Discussion in 'Tech Heads' started by AgelessDrifter, Oct 26, 2018.

  1. AgelessDrifter

    AgelessDrifter TZT Neckbeard Lord

    Post Count:
    43,342
    So I'm trying to reproduce a study on blood test analyte levels and mortality odds ratios

    Basically making a smooth function that maps a blood analyte (glucose, say) to an odds ratio

    So there's this data set of lots of observations of lots of patients each with a level of some analyte, mostly ranging between like 2.0 and 5.0

    And the deal is they split the analyte levels up into overlapping bins and did a population-weighted average of the level within each bin, and called that x, with the OR for the people within that bin being f(x). Then they went back and fit a smooth function through those points (with loess or something) to fill in the gaps.
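
    A minimal sketch of that binning step, under assumptions the description above doesn't settle: each bin's OR compares deaths inside the bin against everyone outside it, the CI is a Wald interval on the log scale with a 0.5 continuity correction, and "population-weighted average" is read as the plain mean of the observed levels in the bin. The function names, `width`, and `step` are made up for illustration; the smoothing pass is left out.

```python
import numpy as np

def bin_odds_ratio(died_in, n_in, died_out, n_out, z=1.959964):
    """OR of death for in-bin vs. out-of-bin patients, with a Wald CI on the log scale."""
    # Haldane-Anscombe correction: add 0.5 to every cell so a zero count is not fatal.
    a = died_in + 0.5               # died, inside the bin
    b = (n_in - died_in) + 0.5      # survived, inside the bin
    c = died_out + 0.5              # died, outside the bin
    d = (n_out - died_out) + 0.5    # survived, outside the bin
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return np.exp(log_or), np.exp(log_or - z * se), np.exp(log_or + z * se)

def overlapping_bin_curve(level, died, width, step):
    """Rows of (x, OR, ci_low, ci_high) for bins [start, start + width) advanced by `step`."""
    level = np.asarray(level, dtype=float)
    died = np.asarray(died, dtype=bool)
    rows = []
    k = 0
    while level.min() + k * step <= level.max():
        start = level.min() + k * step
        in_bin = (level >= start) & (level < start + width)
        n_in = int(in_bin.sum())
        if 0 < n_in < level.size:      # need patients both inside and outside the bin
            x = level[in_bin].mean()   # equal weight per observation (assumption)
            rows.append((x, *bin_odds_ratio(died[in_bin].sum(), n_in,
                                            died[~in_bin].sum(), level.size - n_in)))
        k += 1
    return np.array(rows)
```

    Picking `step` smaller than `width` is what makes the bins overlap, e.g. `overlapping_bin_curve(level, died, width=0.5, step=0.1)`. The x and OR columns are then the points a smoother would be fit through.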

    But the puzzle is this line from the paper

    Which I'm reading as not two different criteria but one criterion stated two ways, since "0.05 significance" doesn't have a meaning I'm aware of without the context of a hypothesis (such as "The OR is not equal to 1").
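
    A quick numerical check of that reading, assuming the interval is a Wald CI on log(OR) and the test is the matching two-sided Wald test of OR = 1 (the paper may use a different test/interval pair, in which case the two can disagree near the boundary). `wald_or_test` and the 2x2 counts are made up for illustration, and every cell has to be nonzero.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def wald_or_test(a, b, c, d, alpha=0.05):
    """2x2 table: a, b = died/survived in the bin; c, d = died/survived in the reference group."""
    log_or = log(a * d / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    ci = (exp(log_or - z_crit * se), exp(log_or + z_crit * se))
    p = 2 * (1 - NormalDist().cdf(abs(log_or / se)))   # two-sided p-value for H0: OR = 1
    return ci, p

for table in [(30, 70, 10, 90), (12, 88, 10, 90)]:
    ci, p = wald_or_test(*table)
    # The CI excludes 1 exactly when p < alpha, since both are the same |z| > z_crit check.
    print(table, ci, p, (p < 0.05) == (ci[0] > 1 or ci[1] < 1))
```

    Both statements reduce to |log(OR) / SE| > z_crit, which is why they come out as the same criterion here.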

    But then so the question is, how the hell do you go about constructing those intervals, computationally speaking? There are gaps in the tails of the data, so it probably makes sense to have (-inf, x] and (x, inf) bins that cover those gaps, but then, what--guess and check in the middle? I mean you can't even really automate that, because presumably there's going to be some pair of points where the "true" curve crosses 1, but if you say "throw out any binning that has a bin whose OR CI crosses 1 more than twice" you could still get stuck with one that happens to cross exactly twice just because there's barely any data and the CI is really wide in those two intervals
     
  2. AgelessDrifter

    And like this code has to run on a dataset of arbitrary size of which my dev data is a subset, so I can't just plod through it manually until something happens to work
     
  3. Agrul

    Agrul TZT Neckbeard Lord

    Post Count:
    44,931
    u should share the paper ya jabroney BUT ITS OK ILL DO IT

    http://alivesci.com/literature/INTERPRETING RISK.pdf

    i agree w/ you those are two ways of saying the same thing

    no method is going to guarantee the CIs always dodge 1

    are all the buckets assumed to be the same size? their description is very vague, but if the point is to capture local behavior of the OR it would be weird to allow for buckets of non-uniform size. if this is the case, then i guess you could just pick bucket size to optimize the proportion of intervals with CIs that don't cross 1
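
    A rough sketch of that bucket-size search, assuming uniform non-overlapping buckets over the observed range, an in-bucket vs. out-of-bucket OR, and a Wald 95% CI with a 0.5 continuity correction. None of that is confirmed by the paper, and the function names are made up. Tuning the bucket count on the same data that gets reported will tend to make the intervals look better than they are.

```python
import numpy as np

Z95 = 1.959964  # two-sided 95% normal quantile

def ci_excludes_one(died_in, n_in, died_out, n_out):
    """True if the Wald 95% CI for the in-bucket vs. out-of-bucket OR does not contain 1."""
    a, b = died_in + 0.5, n_in - died_in + 0.5
    c, d = died_out + 0.5, n_out - died_out + 0.5
    log_or = np.log(a * d / (b * c))
    half = Z95 * np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return bool(log_or - half > 0 or log_or + half < 0)

def proportion_significant(level, died, n_bins):
    """Share of non-empty uniform buckets (n_bins >= 2) whose OR CI stays off 1."""
    level = np.asarray(level, dtype=float)
    died = np.asarray(died, dtype=bool)
    edges = np.linspace(level.min(), level.max(), n_bins + 1)
    idx = np.digitize(level, edges[1:-1])   # bucket index 0 .. n_bins - 1
    hits = used = 0
    for k in range(n_bins):
        in_bin = idx == k
        n_in = int(in_bin.sum())
        if n_in == 0 or n_in == level.size:
            continue                        # nothing to compare against
        used += 1
        hits += ci_excludes_one(int(died[in_bin].sum()), n_in,
                                int(died[~in_bin].sum()), level.size - n_in)
    return hits / used if used else 0.0

def best_bin_count(level, died, candidates):
    """Score each candidate bucket count; ties go to the earliest candidate."""
    scores = {n: proportion_significant(level, died, n) for n in candidates}
    return max(scores, key=scores.get), scores
```

    Something like `best_bin_count(level, died, range(2, 21))` returns the winning bucket count plus the score for every candidate, so the whole curve of scores can be eyeballed rather than trusting the single maximum.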
     
  4. AgelessDrifter

    Yeah I think they are supposed to be uniform, but like you said, it's really vague, so I dunno

    My current thought is something like:

    Do divide and conquer to find the inner bounds of the tails (-inf, x] and (x, inf):
    Split the interior up into equal intervals, increasing the number of intervals until more than two intervals' OR CIs cross 1 or the width of the intervals is less than the precision of the data

    Since the data's pretty coarse it shouldn't take many iterations to halt and then I can just look at the results for each epoch and throw out results that don't make sense
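
    The loop described above could be sketched roughly like this. The tail bounds are taken as inputs (`lo_edge`, `hi_edge`) rather than searched for, each bin's OR is in-bin vs. out-of-bin with a Wald 95% CI and a 0.5 continuity correction, and an empty bin is counted as crossing 1. All of those are assumptions, and the names are made up for illustration.

```python
import numpy as np

Z95 = 1.959964  # two-sided 95% normal quantile

def ci_crosses_one(died_in, n_in, died_out, n_out):
    """True if the Wald 95% CI for the in-bin vs. out-of-bin OR contains 1."""
    if n_in == 0 or n_out == 0:
        return True  # assumption: no comparison possible, so treat it as uninformative
    a, b = died_in + 0.5, n_in - died_in + 0.5
    c, d = died_out + 0.5, n_out - died_out + 0.5
    log_or = np.log(a * d / (b * c))
    half = Z95 * np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return bool(log_or - half <= 0 <= log_or + half)

def refine_bins(level, died, lo_edge, hi_edge, precision, max_epochs=50):
    """Return [(n_inner, n_crossing), ...], one entry per epoch that was evaluated."""
    level = np.asarray(level, dtype=float)
    died = np.asarray(died, dtype=bool)
    history = []
    for n_inner in range(1, max_epochs + 1):
        if (hi_edge - lo_edge) / n_inner < precision:
            break                                    # bins narrower than the data's precision
        edges = np.linspace(lo_edge, hi_edge, n_inner + 1)
        # right=True: 0 is the tail (-inf, lo_edge], n_inner + 1 is the tail (hi_edge, inf)
        idx = np.digitize(level, edges, right=True)
        n_crossing = 0
        for k in range(n_inner + 2):
            in_bin = idx == k
            n_in = int(in_bin.sum())
            n_crossing += ci_crosses_one(int(died[in_bin].sum()), n_in,
                                         int(died[~in_bin].sum()), level.size - n_in)
        history.append((n_inner, n_crossing))
        if n_crossing > 2:
            break                                    # halting rule: more than two CIs cross 1
    return history
```

    The returned history is the per-epoch record to look through afterwards. The last entry is the one that tripped the halting rule, if the loop didn't run out of precision first.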