I am not sure I understand your goal.
Is it to create an experience similar to a video (syncing audio and video…)? If so, then the simplest and most efficient way to do that is… well… video.
No combination of HTML/CSS/JS visual and sound elements will stay as well synced as a video file does (loading times, dependencies, browser support, etc.).
Back in 2006, video compressors were not as good as they are today, so Flash was a good choice. Today we have WebM and advanced MPEG codecs. With those you can get great video quality and still keep the file size small.
The goal is to use web technology efficiently, without the excessive resource use that comes with video, and to do it on the website itself rather than linking externally.
For example, a 36-image slideshow story with an accompanying 2m35s audio track would use less bandwidth, and less client- and server-side resources/energy, than the equivalent 2m35s video at 1080p/1440p/4K hosted off-site (YouTube, Vimeo, etc.).
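To put rough numbers on the bandwidth side of that comparison, here is a back-of-envelope sketch in TypeScript. Only the 36 images and the 2m35s duration come from the example; the average image size (200 KB), audio bitrate (128 kbit/s) and 1080p video bitrate (5,000 kbit/s) are hypothetical placeholders, not measurements. It counts transferred bytes only, not CPU time or energy.

```typescript
// Back-of-envelope transfer sizes in kilobytes (1 KB = 1000 bytes).
// Only the image count and duration come from the example; the other inputs
// are hypothetical placeholders to be replaced with real sizes and bitrates.

// Size of a constant-bitrate stream: kbit/s * seconds / 8 bits per byte.
function streamKB(bitrateKbps: number, seconds: number): number {
  return (bitrateKbps * seconds) / 8;
}

// Slideshow: every image downloaded once, plus the audio track.
function slideshowKB(
  imageCount: number,
  avgImageKB: number,
  audioKbps: number,
  seconds: number,
): number {
  return imageCount * avgImageKB + streamKB(audioKbps, seconds);
}

// Video: a single stream carrying both picture and sound.
function videoKB(videoKbps: number, audioKbps: number, seconds: number): number {
  return streamKB(videoKbps + audioKbps, seconds);
}

const durationS = 2 * 60 + 35; // 2m35s = 155 s
// Assumed: 200 KB per image, 128 kbit/s audio.
const slideshowTotalKB = slideshowKB(36, 200, 128, durationS);
// Assumed: 5,000 kbit/s video for 1080p, same 128 kbit/s audio.
const videoTotalKB = videoKB(5000, 128, durationS);
console.log(
  `slideshow ~${(slideshowTotalKB / 1000).toFixed(1)} MB, ` +
    `video ~${(videoTotalKB / 1000).toFixed(1)} MB`,
);
```

Under these particular guesses the slideshow comes out at roughly a tenth of the video's size, but larger images or a lower video bitrate would narrow the gap, so it's worth plugging in the real file sizes.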
It also allows a freedom of presentation and interaction that isn't possible in a video format.
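On the syncing concern, one way to keep a slideshow aligned with its audio track is to let the audio element's playback position decide which slide is shown, instead of running a separate timer. The TypeScript below is a minimal sketch of that idea, not a tested library: `Cue`, `slideIndexAt` and `attachSlideshow` are hypothetical names, and the cue timings would have to come from the story itself. It relies only on the standard `currentTime` property and the `timeupdate` and `seeked` media events.

```typescript
// Minimal sketch: the audio's playback position decides which slide is shown,
// so pausing, seeking or a buffering stall doesn't leave the slides running ahead.

// One cue per slide (hypothetical shape): when it appears and what it shows.
interface Cue {
  start: number; // seconds into the audio track
  src: string; // image URL
}

// Structural stand-ins for HTMLAudioElement / HTMLImageElement, so the sketch
// doesn't depend on DOM typings. Real DOM elements satisfy these shapes.
interface AudioLike {
  currentTime: number;
  addEventListener(type: string, listener: () => void): void;
}
interface ImageLike {
  src: string;
}

// Index of the slide that should be visible at time t (seconds).
// Assumes cues are sorted by start and cues[0].start is 0.
function slideIndexAt(cues: readonly Cue[], t: number): number {
  // Binary search for the last cue whose start <= t.
  let lo = 0;
  let hi = cues.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (cues[mid].start <= t) {
      lo = mid; // slide mid has started; the answer is mid or later
    } else {
      hi = mid - 1; // slide mid hasn't started yet; the answer is earlier
    }
  }
  return lo;
}

// Keep img in step with audio. Does nothing for an empty cue list.
function attachSlideshow(audio: AudioLike, img: ImageLike, cues: readonly Cue[]): void {
  if (cues.length === 0) return;
  let current = -1;
  const update = (): void => {
    const i = slideIndexAt(cues, audio.currentTime);
    if (i !== current) {
      // Only touch the image when the slide actually changes.
      current = i;
      img.src = cues[i].src;
    }
  };
  audio.addEventListener("timeupdate", update); // fires periodically during playback
  audio.addEventListener("seeked", update); // the listener scrubbed the audio
  update(); // show the correct slide immediately
}
```

In a page this would be called with a real `<audio>` element, an `<img>` element and a sorted cue list. `timeupdate` only fires a few times per second, so a slide change can lag by a fraction of a second; if tighter timing matters, polling `currentTime` from a `requestAnimationFrame` loop is a common alternative.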