Given what you’ve got here, I suspect you may be unaware of the W3C spec for media fragments, which may make portions of what you’re attempting a bit easier (and also much more standardized). The spec is broadly supported by modern browsers, so it immediately makes things a tad easier from a plumbing perspective.
Some people will be somewhat familiar with the targeting technique, as it’s similar to the one YouTube uses to let users link directly to specific points in a video on their platform.
To summarize the concept: on most audio and video files one can add #t=XXX to the end of a URL, where XXX is the number of seconds into the file at which one wants to start. One can target stretches of audio similarly with the pattern #t=XXX,YYY, where XXX is the start and YYY is the stop time for the fragment, again in seconds.
As an example, I can use it to target a specific stretch of a standalone audio file like so:
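For instance, a link of the following shape (the file URL here is just a placeholder for any directly linked audio file) starts playback at 4:29 and stops at 4:55:

```
https://example.com/podcast/episode.mp3#t=269,295
```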
With some clever JavaScript, one can go a step further and implement this at the level of audio/video embedded in a particular page, which may contain a wealth of additional (and potentially necessary) context. As an example, we can look at the audio above in its original context as part of a podcast, using the same time fragment notation:
https://martymcgui.re/2017/10/29/163907/#t=269
As an added bonus, on this particular page you’ll notice that if you play the audio and then pause it, the page URL in your browser automatically updates to reflect the timestamp of that position! As in your earlier example, that makes things far easier to bookmark, save, or even share!
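Here’s a minimal sketch of how that pause-to-update behavior might be wired up, assuming a single audio element on the page (the actual script on that site may well differ):

```javascript
// Keep the page URL's #t= fragment in sync with the playback position
// of the first <audio> element on the page. (A sketch, not the real script.)
const audio = document.querySelector('audio');

// When the user pauses, record the current position in the URL so the
// exact moment can be bookmarked or shared.
audio.addEventListener('pause', () => {
  history.replaceState(null, '', '#t=' + Math.floor(audio.currentTime));
});

// On load, honor an incoming #t= fragment by seeking to it once the
// audio's metadata (and thus its seekable range) is available.
audio.addEventListener('loadedmetadata', () => {
  const match = location.hash.match(/^#t=(\d+(?:\.\d+)?)/);
  if (match) audio.currentTime = parseFloat(match[1]);
});
```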
For use within Hypothesis, I suspect one could use this same scheme to directly annotate the original audio file on the original page, potentially via similar JavaScript within the Hypothesis browser plugin.
It would be nice if the user could cue up a particular audio segment, press pause, and then annotate that portion of the page using such a targeting segment. One could then share a specific URL for the annotation (in typical Hypothesis fashion) that not only targets the original page with the embedded audio, but also has that audio cued up to the correct portion (potentially with a page refresh to reset the audio, depending on the annotation).
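Notably, the W3C Web Annotation Data Model already defines a FragmentSelector that can declare conformance to the media fragments spec, so a stored annotation might look roughly like this sketch (the body text and times here are invented, and Hypothesis’s internal format may differ):

```javascript
// Sketch of a W3C Web Annotation targeting a stretch of audio on a page
// via a FragmentSelector conforming to the media fragments spec.
const annotation = {
  '@context': 'http://www.w3.org/ns/anno.jsonld',
  type: 'Annotation',
  body: {
    type: 'TextualBody',
    value: 'Great point in this stretch of the episode.', // invented note
  },
  target: {
    source: 'https://martymcgui.re/2017/10/29/163907/',
    selector: {
      type: 'FragmentSelector',
      conformsTo: 'http://www.w3.org/TR/media-frags/',
      value: 't=269,295', // start and stop, in seconds
    },
  },
};
```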
The nice part is that the audio can be annotated within the page on which it originally lived, rather than on some alternate page that strips away the context and invites context collapse. It also means one doesn’t have to host an intermediate page to make the whole thing work.
For more information on the idea, take a peek at the IndieWeb’s page on audio fragments, which includes a few examples of people using the technique in the wild as well as a link to sample JavaScript for doing the targeting within the page itself.
I’m curious whether this scheme might make putting all the smaller loose pieces together even easier, particularly for use within Hypothesis, while keeping more of the original context in which the audio was found.
I also suspect these types of standards could be used to annotate audio in much the same way SoundCloud handles its audio annotations, though in a far more open way. One would simply need some additional UI to present the annotations on such audio differently.
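As a rough sketch of what that extra UI might look like, one could drop clickable markers onto a simple timeline above a page’s audio element, one per annotation (all of the data and styling below are invented for illustration):

```javascript
// Render SoundCloud-style markers for timed annotations above an <audio>
// element. The annotation data here is invented for illustration.
const annotations = [
  { start: 269, note: 'Discussion of webmentions begins' },
  { start: 412, note: 'Question from a listener' },
];

const audio = document.querySelector('audio');

// Once the duration is known, build a timeline strip and position one
// marker per annotation, proportional to its start time.
audio.addEventListener('loadedmetadata', () => {
  const timeline = document.createElement('div');
  timeline.style.cssText = 'position:relative;height:1em;background:#eee;';

  for (const { start, note } of annotations) {
    const marker = document.createElement('a');
    marker.href = '#t=' + start;
    marker.title = note; // show the annotation text on hover
    marker.style.cssText =
      'position:absolute;width:4px;height:100%;background:orange;' +
      'left:' + (start / audio.duration) * 100 + '%;';
    // Clicking a marker jumps the audio to that annotation.
    marker.addEventListener('click', () => { audio.currentTime = start; });
    timeline.appendChild(marker);
  }
  audio.before(timeline);
});
```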
Just for fun, this type of sub-targeting on web pages also works visually for text, via the concept of the fragmention. As an example, I can target this specific paragraph with the link http://boffosocko.com/2018/01/07/reply-to-annotating-web-audio-by-jon-udell/#Just+for+fun, and a snippet of JavaScript on the page creates a yellow highlighting effect as well.
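The core of that fragmention behavior can be sketched in a few lines (again, a simplified illustration rather than the actual script running on my site):

```javascript
// Fragmention sketch: if the URL hash looks like #Some+Words, find the
// first text node containing that phrase, highlight it, and scroll to it.
window.addEventListener('load', () => {
  const hash = decodeURIComponent(location.hash.slice(1));
  if (!hash || hash.startsWith('t=')) return; // not a fragmention

  const phrase = hash.replace(/\+/g, ' ');

  // Walk the page's text nodes looking for the phrase.
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  let node;
  while ((node = walker.nextNode())) {
    if (node.textContent.includes(phrase)) {
      const el = node.parentElement;
      el.style.backgroundColor = 'yellow'; // simple highlight effect
      el.scrollIntoView();
      break;
    }
  }
});
```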