The interactive pins seem to be HTML elements that are shown or hidden depending on which video is currently playing. Their position is defined purely in CSS, and clicking on one replaces the current video.
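To make that concrete, here is a rough sketch of how the pin logic could work. This is my guess at the approach, not the site's actual code: the `Pin` shape, the pin ids, the video file names and the `showVideo` / `clickPin` helpers are all made up for illustration.

```typescript
// Hypothetical data model: each pin is shown during one video and links to another.
interface Pin {
  id: string;      // id of the pin's HTML element (its position lives in CSS)
  shownOn: string; // video during which this pin is visible
  target: string;  // video that replaces the current one on click
}

// Minimal structural types, so the logic can run without a real DOM.
interface PinElement { hidden: boolean; }
interface VideoElement { src: string; }

// Made-up pins and file names, for illustration only.
const pins: Pin[] = [
  { id: "pin-kitchen", shownOn: "lobby.mp4", target: "kitchen.mp4" },
  { id: "pin-terrace", shownOn: "lobby.mp4", target: "terrace.mp4" },
  { id: "pin-back", shownOn: "kitchen.mp4", target: "lobby.mp4" },
];

// Swap the video source and show only the pins that belong to that video.
function showVideo(
  video: VideoElement,
  elements: Map<string, PinElement>,
  name: string,
): void {
  video.src = name;
  for (const pin of pins) {
    const el = elements.get(pin.id);
    if (el) el.hidden = pin.shownOn !== name;
  }
}

// Clicking a visible pin replaces the current video with the pin's target.
function clickPin(
  video: VideoElement,
  elements: Map<string, PinElement>,
  pinId: string,
): void {
  const pin = pins.find((p) => p.id === pinId);
  const el = elements.get(pinId);
  if (!pin || !el || el.hidden) return; // unknown or hidden pins do nothing
  showVideo(video, elements, pin.target);
}
```

In a real page, `video` would be the `<video>` element and `elements` would be built from the pin elements (for example via `document.getElementById`), with a click listener on each pin calling `clickPin`. You would probably also call `video.play()` after changing the source.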
So in short, there is some JavaScript to write, but I’m guessing that most of the time went into creating those videos: the render quality is good, so it probably took a while to fine-tune and render.
Another option for making such a website is to render the 3D right in the browser, using WebGL. It won’t look as photorealistic yet, but it opens the door to much more interactivity. (Note: that’s what I specialise in, so feel free to ask if you’re curious to know more.)