Closed Bug 1882584 Opened 2 years ago Closed 2 years ago

stopAndAccumulate/Cancel methods of FOG TimingDistributions coerce null timerids to 0

Categories

(Data Platform and Tools :: Glean: SDK, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: perry.mcmanis, Assigned: perry.mcmanis)

References

Details

Attachments

(1 file)

Based on our discovery in 1882110 we notice unexpected behavior in FOG.

Specifically, null appears to coerce to 0 in the TimingDistribution metrics stopAndAccumulate and cancel methods. This is an issue because 0 is actually not only a valid timer ID, it is the first one assigned.

As a result, sometimes we are calling stop/cancel in abstracted stopwatches, which causes submission of a null timerid. If 0 was already stopped, this results in an invalidState error being recorded and in fact that is precisely what we observe with the two Fenix metrics Geckoview.page_reload_time and Geckoview.page_load_time

There are a few related things that likely need to come from this:

  1. 1882110 mitigated in the short term by guarding stop/cancel
  2. [this bug] See if FOG can be modified so this behavior does not occur
  3. start timerIDs at 1 if not, in the SDK itself.

Maybe we can fix this in FOG to prevent or detect the coercion of null to 0. This is a correct conversion as it follows the WebIDL spec (which uses ECMA's ToNumber algorithm). At the cost of making the interface less sensible to someone reading it, we could try other mechanisms:

  • Use a nullable timerid, so null gets reported to the C++ impl as null. It does give us a problem about what to do with null in C++ though (taps the bug 1691073 sign).
  • Use an opaque TimerId structure of some kind. This more closely follows the intent of the interface and it lets us hide the implementation of TimerId. It'd likely be a weightier thing to construct, store, and use these objects than a primitive.

But we could also (or instead) change the SDK to make errors of this sort harder to stumble over.

  • We could, for instance, reserve TimerID 0 and never supply it. Then any method receiving it would reject it as invalid_value and be about its business. Not too different from the status quo where the misuse of null would maybe work at most once, but then be invalid_state from then on.
  • We could make the TimerId type properly opaque at the glean-core layer. LBs would then have to decide how to supply timerids to their consumers if they're unable to supply opaque types themselves, ensuring they have solitary responsibility for managing any primitive values that might be coerced. This approach could have its down sides, though (taps the bug 1691073 sign again).

Given the constraints on a FOG-level solution and the rarity of coming across this in the wild at present, this is more properly an SDK bug to begin.

Component: Telemetry → Glean: SDK
Product: Toolkit → Data Platform and Tools
Assignee: nobody → pmcmanis
Severity: -- → N/A
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: