Dav1d, fast AV1 decoder, version 1.0 (videolan.org)
216 points by coldpie on March 18, 2022 | 108 comments


(President of VideoLAN here)

What you should really look at in dav1d is the fact that the codebase now has around 186 kLoC of hand-written assembly in .S and .asm files…

I think this is quite a feat (this is more asm than the whole of FFmpeg), and it is very rare these days to write so much asm.


I hope someone is fuzzing all those variants well to find vulnerabilities; that's a lot of hand-written code parsing untrusted data.


It is tested by Google's OSS-Fuzz among others: https://github.com/google/oss-fuzz/tree/master/projects/dav1...

One of the bugs found by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11464


I can't see any configuration in that repo for testing different asm variants.

There's cpu = 'i686' in the config file which would in fact seem to imply none of the modern archs mentioned in a different subthread here ("x86_64(SSE3, AVX2), ARMv7 and ARMv8") are fuzzed.


> 186 kLoC of hand-written assembly

That's for x86_64 (SSE3, AVX2), ARMv7 and ARMv8, right? Is the 186 kLoC equally distributed between those arches, or does one arch get most of the optimization focus?

Is it all hand-written, or do you have some kind of script/macros to create a good part of this assembly (like OpenSSL does IIRC)?


All arches have the same level of optimization, but AVX2 might have a couple more functions. AVX-512 is still behind…

Hand-written, with some macros, notably for the x86 mess (the Windows calling convention, SSSE3 in both 32- and 64-bit).


A pretty damning indictment of current compilers, I would conclude. I thought they had gotten much better in the meantime.

But having a look at Clang or GCC output, I can confirm: still baby steps, years away from proper optimizations.


Reading this it sounds like you are discouraging using it due to the implementation being big and hard to maintain. Did you mean it this way?


No. Most encoders/decoders need assembly to get the last drop of performance. Most of the time it comes down to when you give up. The team here was quite relentless in building a usable AV1 decoder. A few years ago this was a pipe dream, and jbk and team have done a tremendous job.


I really can't see how you read anything other than "it's very optimized" into my message, tbh.


"[..] code has now around 186 kLoC of hand-written assembly in .S and .asm files…"

This reads to me as it being a PITA to maintain. Cross platform code is usually a pain, cross platform with assembly optimizations is more of a pain. Optimization nearly always makes things harder to maintain, and this sounds like it was optimized to hell and back...


> This reads to me as it being a PITA to maintain.

Not more than any other language: when it's well done, it's manageable. When it's spaghetti code, it's not manageable.


I have a suspicion that the president of VideoLAN has a better idea of the complexity & maintainability of this code than you do.


In addition to others noting that decoders and encoders almost all have an enormous amount of hand-optimized code because they are literally the most demanding applications most people run (which is why most codecs today are hardware-accelerated, and why they are used in almost all CPU benchmarks): 186k LoC, given that assembly is nearly always "vertical" code, is also relatively small for a decoder.

People don't really get how complex video codecs are.


It's not that they're writing a library full of assembly for the sake of it. Reading the README, one discovers they have dedicated optimised implementations for a large range of processors, which in turn means there's a lot of assembly involved.


Using the Netflix 1080p AV1 2 Mbps test clip, my old MacBook 2015 with an Intel Core i5-5287U (Broadwell dual-core, 3.3 GHz) just manages to play it at about 70% of total CPU.


For those that want to test decoding, here are the Netflix test clips: http://download.opencontent.netflix.com/?prefix=AV1/Chimera/


Or you can use YouTube with a browser that uses dav1d, like Firefox.

An AV1 example at 1080p50: https://www.youtube.com/watch?v=YhXtJWi2PjI

You can verify whether YouTube videos are playing back in AV1 by right-clicking on the video and choosing "Stats for nerds". The codec is av01 for AV1 video.


My completely unscientific anecdote: i7 6600K, opened the link, Chrome CPU went from <1% to 14%, which is similar to VP9 decoding.

I know so little about this space that I'm not sure how helpful the comparison is, but it's like magic what we are able to do with math.


Apparently this is using hardware AV1 decode on my 11700K. No significant CPU utilization.


> Or you can use YouTube with a browser which uses Dav1d like Firefox.

I'm using Chrome on macOS, which seemingly supports this as well: Codecs av01.0.08M.08 (398) / opus (251)

What decoder is Chrome using?


dav1d on macOS; you can actually find this out by going to `chrome://media-internals`, and looking at the field `kVideoDecoderName`


Tbh, those results are a bit surprising. We have much better results with similar hardware...

How are you testing? Is that dav1d 1.0? 8bit or 10bit content?

If you tested before 1.0 and 10bit content, you should run it again, since we pushed new optimizations for this.


>We have much better results with similar hardware...

I should have included in the original comment that this machine needs 60% to 70% CPU to play VP9 at a similar bitrate / bit depth / framerate, i.e. VP9 and AV1 decoding are at a similar level of CPU usage. Considering the complexity of AV1, I thought this was an incredible achievement. (It actually blows my mind this can be done.)

I tested it with both 8- and 10-bit content; the results aren't that much different. (This is just eyeballing CPU usage in Activity Monitor.) The 3.3 GHz was turbo, so it's really a 2.9 GHz dual-core Broadwell CPU.


This number doesn't mean anything by itself, does it? How does the 2015 MBP i5 performance compare with other machines?

I didn't see benchmarks mentioned in TFA; please let me know if I've overlooked something.


Congratulations on a major milestone! So cool to see the continued evolution of AV1


What's the state of the art AV1 encoder right now? Is it practical to encode hours-long videos on regular personal computers now?


I use either aom, when trying to make "optimized" encodes that squeeze quality into the smallest file size, or SVT for fast encodes that aren't really any more efficient than H.265, just with the better codec.

I just transcoded some movies to a VMAF of 98 from 4K HDR Blu-rays this past week. It takes about two days of mostly single-core work with cpu-used=4 on a 5950X, but they come out at average bitrates of around 4400 kbps, which is really, really good for that quality target.

The trick is that since it's mostly single-core, I just do 16 movies at a time, or 24 on my server.
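A sketch of what one such encode might look like through ffmpeg's libaom-av1 wrapper (filenames and the CRF value here are hypothetical; `-cpu-used 4` matches the setting mentioned above, and in practice you'd tune the CRF until VMAF lands around the 98 target):

```shell
# Hypothetical single-movie AV1 encode with libaom-av1 via ffmpeg.
# -b:v 0 with -crf enables constant-quality mode; -cpu-used 4 trades
# speed for compression efficiency; 10-bit pixel format for HDR sources.
ffmpeg -i movie.mkv \
  -c:v libaom-av1 -crf 22 -b:v 0 -cpu-used 4 \
  -pix_fmt yuv420p10le \
  -c:a copy \
  movie.av1.mkv
```

Since libaom mostly uses one core per encode, launching many of these in parallel (as the comment describes) is what actually saturates a 16-core machine.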


I was like "wtf is vmaf?!"

Turns out it's $NFLX's OSS automated encoding-quality assessment tool. Pretty cool!

https://github.com/Netflix/vmaf

p.s. zanny, nice casual stealth drop you did there ;)


I have no affiliation with Netflix, I just like using av1an which supports vmaf quality targets.

If Netflix wants to hire me, dm me, lol.


The best way to encode video is to use a scene-detection algorithm to chunk the video into multiple parts, configure your encoder to encode each part with a single CPU thread, and spin off one encode per scene up to your CPU's thread count. You then just concat the scenes together afterwards.
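Roughly, those steps can be sketched in shell with ffmpeg alone (hypothetical filenames; the 0.4 scene threshold and CRF are assumptions, and note that stream-copy splitting can only cut on keyframes, which is why dedicated tools do this more precisely):

```shell
# 1. Find scene-change timestamps with ffmpeg's scene-detect filter.
ffmpeg -i input.mkv -vf "select='gt(scene,0.4)',showinfo" -f null - 2>&1 \
  | grep -o 'pts_time:[0-9.]*' | cut -d: -f2 > scenes.txt

# 2. Split at those timestamps into chunks (stream copy, no re-encode).
ffmpeg -i input.mkv -f segment -segment_times "$(paste -sd, scenes.txt)" \
  -c copy -reset_timestamps 1 chunk_%04d.mkv

# 3. Encode each chunk single-threaded, N chunks in parallel.
ls chunk_*.mkv | xargs -P "$(nproc)" -I{} \
  ffmpeg -i {} -c:v libaom-av1 -crf 30 -b:v 0 -cpu-used 4 -threads 1 "enc_{}"

# 4. Concatenate the encoded chunks back together.
for f in enc_chunk_*.mkv; do echo "file '$f'"; done > list.txt
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mkv
```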


For anyone actually wanting to do this, Av1an handles all of it automatically for you.

https://github.com/master-of-zen/Av1an


SVT-AV1 has massively improved encoding speed. I was getting ~4x encode speed with some heavy compression (barely any noticeable quality drop) on 1080p/24fps footage on my Ryzen 7 3700X just yesterday.


4x means that you can encode in 1/4 of the video's runtime, right? That sounds nice, a lot more practical than when I tried before.


To get a rough idea - is it faster or slower than HEVC encoding on a CPU (without any hardware encoder)?


https://www.spiedigitallibrary.org/conference-proceedings-of...

It's good and getting gooder.

Just give it a try using https://github.com/Alkl58/NotEnoughAV1Encodes or whatever, svt-6 is fairly quick.


For CPU I don't know, but H.265 hardware-acceleration ASICs are mainstream, unlike AV1's. Also, H.265 can reach a higher compression level than AV1, although the difference is apparently relatively small. However, technologically AV1 is largely obsolete now that H.266 stable has been released.


Every study I've seen shows AV1 beats H.265 on VMAF at all bitrates. H.265 never got significant browser support, and I don't expect H.266 to either. https://caniuse.com/hevc


I saw a scientific paper a few weeks ago stating H.265 was ~7% smaller than AV1 on the tested dataset. Maybe it's dataset-sensitive, or differs because of the metric (maybe it was PSNR or SSIM?). Anyway, is there any data on AV1 hardware-decoding energy consumption vs H.265? That is probably the main differentiating metric of interest here.


svtav1 is the best I've found, personally. Unlike libaom, it's parallelized, which obviously helps massively. I use it at quality profile 20 (as given to ffmpeg params), but I'm not an expert here.
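For reference, an invocation along those lines might look like this (hypothetical filenames; `-crf 20` corresponds to the quality value mentioned, the preset is an assumption, and older ffmpeg builds expose `-qp` instead of `-crf` for libsvtav1):

```shell
# SVT-AV1 via ffmpeg: -crf sets the constant-quality target, -preset the
# speed/efficiency trade-off (lower = slower but better compression).
# Audio is passed through untouched.
ffmpeg -i input.mkv -c:v libsvtav1 -crf 20 -preset 6 -c:a copy output.mkv
```

Unlike libaom, SVT-AV1 parallelizes well internally, so a single invocation like this will use most of the machine's cores on its own.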



If you want quality your choice is still limited to the reference encoder libaom.


Kudos to jbk, VideoLAN and all the great people involved. Thanks folks! Also the licensing approach is great <3


Are there even any CPUs left with AVX-512?


Alder Lake has it, and apparently Zen 4 will have AVX-512 too.

https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512



Alder Lake Xeons have AVX-512, though.


Alder Lake's AVX-512 comes with a significant asterisk: it's only in the P-cores (so it can only be turned on if you disable the E-cores), isn't officially supported, and can't be enabled on some motherboards IIRC.

We should be seeing better AVX-512 support with CPUs in the coming years though.


According to that link, Zen 4 will only have a very limited kind of AVX-512 instructions, for working with half-precision floats. It won't even have the base AVX-512 instructions (the F set).


That Wikipedia article is incomplete, Zen 4 will be able to run all AVX-512 code that Ice Lake can run.


Ice Lake and Tiger Lake. On AWS, this is m6i/c6i. [edit] Earlier AWS machines (c5/m5) have AVX-512 also, but not the subsets required by dav1d's assembly.


libdav1d, when it was included in mpv, was such an incredible experience. Before it, my QX9300 ThinkPad didn't manage double-digit fps with anything AV1, regardless of bitrate. The love and attention put into supporting older vector extensions is insane: suddenly it managed playback of 8-bit content without frame drops. Though I'm sure anything high-bitrate will smother that QX9300.


I'm thinking about buying the latest Roku Ultra because it says it supports the AV1 media format. Does it have a hardware decoder, or how does it manage it, when even my Nvidia Shield Pro doesn't direct-play AV1 content?

https://www.roku.com/products/roku-ultra


The Ultra 2020 edition and onwards use the Realtek RTD1319 for av1 decoding.

(disclaimer: I used to work there)


I am so curious about the Roku and what is under the hood. They really do have a lot of value for the cost. Quirky at times, though.


Thank you, just ordered the device.


> my nvidia Shield Pro doesn't direct play av1

Even the newest Shield Pro is based on a 2015 SoC (Tegra X1) with an integrated 2014 Nvidia GPU (Maxwell). Not surprising it doesn't support AV1.


AV1 is the answer to H.265, and now that H.266 is "ready", I guess AV2 should see the light soon, shouldn't it? In the future, once mainstream CPUs cannot software-decode compressed video/audio in realtime and only a specialized hardware block will be fast enough, will that be a "patent-encumbered" format with tons of royalties?


This endless churn of video formats isn't sustainable if every one needs to have specialized hardware.


I was referring to the future: when silicon has hit its "best" (in a mass-production context) and the compressed video/audio format of that future cannot be software-decoded in real time even by those "best silicon" CPUs; only a specialized hardware block would be able to decode such a format in real time.

The danger is that block only implementing the decoding of a non-royalty-free, patent-encumbered format à la MPEG, presuming "intellectual property" has not been fixed globally and is still as toxic as it is nowadays.


How good is the performance? Is 4K60 on a phone possible? 1080p60? Hard to find numbers on this


I think that for most of these libraries software-only performance is a moot point: the library is there for other software to use to be able to parse the files, and the library/decoder will simply offload the heavy compute to the (hopefully eventually available) decoding hardware.

To a certain extent it's just a placeholder so folks can start writing software against it, so that when the CPU vendors ship, everyone can hit the ground running.


Wouldn't that only be true of the reference encoder libaom?

This is the description the dav1d page gives:

dav1d is the fastest AV1 decoder on all platforms :)

Targeted to be small, portable and very fast.

Since they use the word fast twice I was wondering how fast.


I'll try to dig up the slide where they demonstrated it, but basically faster at equal quality than any software decoder of any codec.

Which is cool, but if you have hardware decode then you may not care, though software decode gets used in all sorts of weird places that you might not expect.


Of any codec? Are you sure?


Yeah, the key is that newer generations of codecs use more time to encode, but often the decode time goes up less than the encode time. Combine that with greater design for parallelism and you find yourself in the slightly counterintuitive situation where decoding the same quality of video is faster in the newest codecs (at the one time cost of longer encodes).

I don't think it was one single slide I was thinking of, but you can for example find comparisons of VP8, VP9 and H.264 by the same VideoLAN/FFmpeg devs that demonstrate this general principle: the decode fps is surprisingly flat between software decodes of different codec families and generations.

The earlier generations often hit bottlenecks though: I think FFmpeg's VP9 decode (built by the same people as dav1d) was the leader for a while, but it hits a wall at 4 cores due to format limitations. HEVC and especially dav1d show further gains when you can throw a modern number of cores at them, and even phones often have 8.


>Is 4K60 on a phone possible?

Depends on the SoC: an A15, or a low-end UniSoC? But generally speaking, a modern smartphone should be able to play 1080p 25 fps 1 Mbps AV1 video without dropping frames. You will just be burning away your battery life.


It's bad. As a reference, I tried decoding a measly 2 mbit/s test clip on an older Core 2 Duo MacBook Pro (which has no issues playing back the same in H.264 purely with software decoding), and it stuttered a lot. I don't see this thing taking off until there's decent GPU acceleration in place. "Fast decoder" is highly relative.


> older Core 2 Duo MacBook Pro

Core 2 Duos are processors from 16 years ago...

Mobile phones are more powerful than those machines...


Yes, most are. That Core 2 Duo machine is the weakest hardware I have that I know can decode H.264 lower bitrate FullHD content in software. It made for a good benchmarking baseline.


And those machines cannot decode H.265 in software… Sorry, but you cannot say dav1d software decoding is bad on phones because some hardware from 15 years ago that can barely decode H.264 cannot decode it.


It will drain your device battery pretty fast because of high CPU usage. There's nothing good about such a performance. It's just bad.


As with all software decoders… So you're comparing software decoders to hardware decoders?

H.265 or VP9 in software are the same… Here, we managed to build a SW decoder for a format that is an order of magnitude more complex than those codecs and yet consumes less CPU.


"So you compare software to hardware decoders?"

No. I understand your vitriol, but please just stop. I'm not demeriting AV1's technical basis.


I tried decoding a measly 2 mbit/s AV1 test clip on a 2014 MacBook Pro with a 2.2 GHz Quad-Core Intel Core i7 and it was buttery smooth.


So is an H.264/H.265 video clip at 5 or even 10 mbit/s on my 4 years old iPhone, at about 2 watts of power consumption instead of 35 watts. Can you see the advantage?

I like open and free standards, and I dislike royalties, but the political aspect is best foregone when one of the alternatives really is objectively so much better for everyone.


One of the alternatives is clearly not objectively so much better for everyone. If they were so much better then VP9 and AV1 wouldn't exist. Here's what video streamers say:

https://youtube-eng.googleblog.com/2015/04/vp9-faster-better...

https://engineering.fb.com/2018/04/10/video-engineering/av1-...

https://netflixtechblog.com/netflix-now-streaming-av1-on-and...

https://medium.com/netflix-techblog/bringing-av1-streaming-t...

https://bitmovin.com/bitmovin-improves-av1-video-encoding/

https://blog.webex.com/video-conferencing/cisco-leap-frogs-h...

https://blog.webex.com/engineering/the-av1-video-codec-comes...

Your problem is fundamentally an emotional one. Now that you suspect Device X will perform poorly at Task Y you feel buyer's remorse.

Try not to worry about it so much. Computer hardware will continue to be made obsolete by more demanding software for quite some time yet.


There are no emotions involved here, with the exception of your childish choice to try to misconstrue my argument. I consider the MPEG family objectively better on the technical merit of GPU/VPU hardware decoding being available everywhere: it's fast and it's energy-efficient.


> There are no emotions involved here

Of course there are. This is typical.

> I consider the MPEG family as objectively better on the technical merit of GPU/VPU hardware decoding being available everywhere

Then there will never be any codec development. It's a silly position to take.


You're very stubborn with telling me what my position and my own arguments are. I'll give you the benefit of the doubt and presume you're not just looking for people to rile up, but rather have the bad habit of leaping ahead and making other people's arguments and stances up on your own to better fit your desired counterpoint, so I'll clarify: I am certainly not against new development or progress, and I don't think that it's pointless to put an effort into AV1 or VP9. As I wrote elsewhere in this discussion, I just don't see this taking off until there's hardware decoding available, which I fully expect there to be. I'm merely underlining that the MPEG family is the state of the art.


> I just don't see this taking off until there's hardware decoding available

Hardware decoding is available. And dav1d is a very fast software decoder. I don't have AV1 hardware and yet I play back AV1 on YouTube just fine via dav1d.

> I'm merely underlining that the MPEG family is the state of the art.

It isn't.


This is not true.

AV1 is at least as widely supported as VVC.

Nothing supports hardware decoding of MPEG EVC, and hardware decoding of the original MPEG-4 (the Xvid generation) was never a big deal.


Trying to imagine how close you would have to be to the screen to tell the difference between 1080 and 4k...


It's hard to tell for uncompressed, but easy for compressed YouTube videos


So you're saying that a given 4K source compressed to X Mbit will look better if it's not downscaled to 1080p before compression, but instead downscaled by the display (I don't believe there are any phones with a 2160-line display)?


4K YouTube videos have a higher bitrate than 1080p videos


What they're saying is that a 1080p source upscaled to 4K and then uploaded to youtube and then downscaled by the client to a 1080p screen will look better than a 1080p upload because it has more bits.


That's nothing to do with the codec or the resolution


The reason it looks better isn't directly codec or resolution.

But it does mean that there are reasons to want 4K playback on a phone.


But doesn't YouTube negotiate a "channel-sized" rate? It seems like it switches between bitrates based on what your connection can tolerate. All of this is to say that it seems like a tough-to-nail-down metric to evaluate.


I always assumed that for streaming sites bitrate was a bigger factor


Of course it is, a 1080p video at 10mbit will almost always look better than a 4k video at 1mbit, and a 4k video at 10mbit will almost always look better than a 1080p video at 1mbit.

The far more interesting question is if you accept you have Xmbit to play with, what is better on a given platform (screen size, resolution, viewing situation, how well compression works, how much battery is used in decoding, etc)


And bitrate goes up for 4k tiers. Even if bitrate only doubles for 4x the amount of pixels, that'll probably look better with modern codecs (depending on where in the quantization curve you are), and most services use at least 3x the bitrate for 4k over 2k, across the same codec.


Compressed by what? Are you just talking about videos you record on your phone?

I do not know what phone you have, but an iPhone 12 (as an example) is under 1300 pixels on the short dimension, so you are not getting much more than 1080 pixels no matter what. It seems any experiential difference would have more to do with compression quality than with resolution.

Speaking for myself, I do not think I could tell the difference between 4K and 1080p on a phone ( on a decent AV1 clip ).


Surely, sitting in front of a laptop or iMac would make it easy to tell.


"on a phone"

SMPTE suggests that 4K viewing is only worthwhile for people with 20/20 vision once the viewing angle increases beyond 30 degrees.

I'm currently looking at a 40cm wide screen, I'm 80cm away from it, which is about 30 degrees.

For my phone to fill that amount of space it would have to be about 10cm in front of my eyes due to binocular vision (if I just use one eye it's a bit further).

When I watch something on my phone it's typically 50cm away, about 15-20 degrees viewing. 720p is more than enough at that size, 1080p if you have particularly good eyesight.
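The geometry above is easy to check: the horizontal viewing angle subtended by a screen of width w at distance d is 2·atan((w/2)/d). A quick sketch with awk, using the values from the comment:

```shell
# Viewing angle for a 40 cm wide screen at 80 cm:
# 2 * atan((40/2)/80) comes out just under the ~30-degree SMPTE threshold.
awk 'BEGIN { pi = atan2(0, -1); w = 40; d = 80;
             printf "%.1f degrees\n", 2 * atan2(w/2, d) * 180/pi }'
```

For the 50 cm phone-viewing distance mentioned, plugging in a ~7 cm wide phone screen gives an angle well under that threshold, matching the conclusion that 720p–1080p is plenty.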


Unrelated, but I'm curious if console emulators could gain performance by hand-writing certain parts of it in assembly (the JIT recompiler, for example).


As the number of code paths go up, the percentage of run-once code does too. Most findings agree that optimization increasingly tends towards a small number of hot loops which could benefit from hand optimization.

In modern codebases this has the effect of micro-optimization of the "rewrite in assembly" sort mattering a great deal in heavy-lifting library code like codecs, CPU emulation, rasterizers, VMs etc., and not at all in other areas. Emulation is a decent candidate in this regard, but it also massively benefits from higher-level simplifying assumptions (e.g. not polling for I/O changes at the true device frequency). It can be hard to notice differences in emulation quality where high-level techniques are applied well.

On old 8-bit architectures it was rather the opposite: you couldn't be high-level about anything because you had to design for micro-efficiency. Therefore most of your problems were solved with very simple data structures and algorithms that heavily favored either space (by recomputing intermediate results) or time (by precomputing all answers in a LUT), and then given a very tight cycle-shaving, code-golfing implementation.


As far as I know, a lot of console emulators have optimized assembly in places.

Eventually computing power overcomes the need for optimized assembly, and understandable code becomes more desirable.


I'm praying that Apple starts supporting AV1 soon...


Apple is not really a friend of royalty-free video formats. They hold some patents on H.264/HEVC and might see AV1 as competition.


Apple is a "Founding Member" of the Alliance for Open Media (which doesn't really mean they are actually a founding member). I think Apple's main issue with VP9 was that it was, for all intents and purposes, a Google codec. Apple already supports FLAC and Opus in macOS and iOS, and VP9 in Safari.

It could be that they are waiting to see whether AV1 will actually get market share outside YouTube or not.


> Apple already supports [..] Opus

…too bad that they (in the usual Apple fashion) don't support it in a standard container which everyone else supports, and require you to package it in their special-snowflake .caf container. So even though in theory everyone supports Opus, you still need to use MP3 if you don't want to deal with supporting multiple formats/codecs at the same time.


I thought it supported .opus too, I guess I was too optimistic.


VP9 is an open format but can still be encumbered by patents. $20 says Google gave Apple an indemnity agreement covering VP9 support for YouTube only, which is why VP9 is locked down so hard on iOS. So if Apple enables VP9 and gets sued for patent infringement over their VP9 implementation, Google has to pay some or all of the legal costs and/or judgements resulting from it.


I'd say they are, but the real issue to Apple is how this or that format will affect their users and how it will portray their products. That is, whether or not they can decode the format without the device running hot, noisy and out of battery in 45 minutes. They like H.264/H.265 because users can decode them with high power-efficiency due to GPU- and dedicated hardware support. This is one of the greater benefits of formats that the whole world accepts and adopts.


It's a shame. 90+% of their customers would hear "AV1" and think "Apple Video 1".


I'm praying they start supporting h266 soon because mankind progress matters more than politics.



