So I'm trying to reproduce a study on blood test analyte levels and mortality odds ratios. Basically, making a smooth function that maps a blood analyte (glucose, say) to an odds ratio. There's a data set of lots of observations of lots of patients, each with a level of some analyte, mostly ranging between about 2.0 and 5.0. The deal is they split the analyte range into overlapping bins and did a population-weighted average within each bin, and called that x, with the OR for the people within that bin being f(x); then they went back and interpolated a smooth function to fill in the gaps with loess or something. But the puzzle is this line from the paper, which I'm reading as not two different criteria but one criterion stated two ways, since "0.05 significance" doesn't have a meaning I'm aware of without the context of a hypothesis (such as "the OR is not equal to 1").

But then the question is: how the hell do you go about constructing those intervals, computationally speaking? There are gaps in the tails of the data, so it probably makes sense to have (-inf, x] and (x, inf) bins that cover those gaps, but then what, guess and check in the middle? You can't even really automate that, because presumably there's going to be some pair of points where the "true" curve crosses 1, but if you say "throw out any binning where more than two bins' OR CIs cross 1" you could still get stuck with one where exactly two do, just because there's barely any data and the CIs are really wide in those two intervals.

And this code has to run on a dataset of arbitrary size, of which my dev data is just a subset, so I can't just plod through it manually until something happens to work.
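For concreteness, here's roughly how I'm computing the per-bin OR and its 95% CI: a Wald interval on the log-OR from a 2x2 table (in-bin vs. out-of-bin against died vs. survived), with a Haldane correction when a cell is zero. The `(level, died)` record shape is just what my dev data happens to look like:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Wald OR and 95% CI from a 2x2 table:
    a = deaths in bin, b = survivors in bin,
    c = deaths outside bin, d = survivors outside bin.
    Adds 0.5 to every cell (Haldane correction) if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

def bin_or(patients, lo, hi):
    """OR for patients whose analyte level falls in [lo, hi) vs. everyone else.
    `patients` is a list of (level, died) pairs -- my dev data's shape."""
    a = sum(1 for lvl, died in patients if lo <= lvl < hi and died)
    b = sum(1 for lvl, died in patients if lo <= lvl < hi and not died)
    c = sum(1 for lvl, died in patients if not (lo <= lvl < hi) and died)
    d = sum(1 for lvl, died in patients if not (lo <= lvl < hi) and not died)
    return odds_ratio_ci(a, b, c, d)
```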

u should share the paper ya jabroney BUT ITS OK ILL DO IT http://alivesci.com/literature/INTERPRETING RISK.pdf

i agree w/ you, those are two ways of saying the same thing. no method is going to guarantee the CIs always dodge 1. are all the buckets assumed to be the same size? their description is very vague, but if the point is to capture local behavior of the OR it would be weird to allow buckets of non-uniform size. if that's the case, then i guess you could just pick the bucket size that maximizes the proportion of intervals whose CIs don't cross 1
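something like this maybe -- `bin_ci` here is just a stand-in for whatever gives you the (lo, hi) CI of a bucket's OR, i'm not writing that part:

```python
def best_bucket_count(levels, bin_ci, max_bins=50):
    """scan uniform bucket counts and return the n whose buckets' OR CIs
    most often exclude 1. `bin_ci(lo, hi) -> (ci_lo, ci_hi)` is a
    placeholder for the real per-bucket CI computation."""
    lo, hi = min(levels), max(levels)
    best_score, best_n = -1.0, None
    for n in range(2, max_bins + 1):
        width = (hi - lo) / n
        significant = 0
        for i in range(n):
            ci_lo, ci_hi = bin_ci(lo + i * width, lo + (i + 1) * width)
            if not (ci_lo <= 1 <= ci_hi):  # CI dodges 1
                significant += 1
        score = significant / n
        if score > best_score:  # ties keep the coarsest binning
            best_score, best_n = score, n
    return best_n
```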

Yeah, I think they're supposed to be uniform, but like you said, it's really vague, so I dunno. My current thought is something like: use divide and conquer to find the inner bounds of the tails (-inf, x] and (x, inf), then split the interior up into equal intervals and increase the number of intervals until more than two intervals' OR CIs cross 1, or the width of the intervals drops below the precision of the data. Since the data's pretty coarse it shouldn't take many iterations to halt, and then I can just look at the results for each epoch and throw out results that don't make sense.
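So the interior loop would look roughly like this -- `ci_fn` is again a placeholder for the per-interval OR CI computation, and it keeps the results from every epoch so I can eyeball them afterward:

```python
def refine_bins(lo, hi, ci_fn, precision):
    """split [lo, hi) into n equal intervals, increasing n until more than
    two intervals' OR CIs cross 1 or the interval width drops below the
    data's precision. `ci_fn(a, b) -> (ci_lo, ci_hi)` is a placeholder.
    Returns one (n, crossings, edges) record per epoch."""
    history = []
    n = 1
    while True:
        n += 1
        width = (hi - lo) / n
        edges = [(lo + i * width, lo + (i + 1) * width) for i in range(n)]
        crossings = sum(1 for a, b in edges
                        if ci_fn(a, b)[0] <= 1 <= ci_fn(a, b)[1])
        history.append((n, crossings, edges))
        if crossings > 2 or width < precision:
            return history
```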