More

thekozmo · on May 21, 2023

That's not true. Scylla does multi-threading. Scylla is a single process, single address space. It does pin the threads to individual hyper threads but there are additional other workers in the background as well.

ddorian43 · on May 22, 2023

The hot path is message-passing.

Native multi-threading is used when you have functionality that already works on threads and you don't want to port it.

Multi-thread is not used in the hot path.

A single data-part/shard is served by a single thread.

thekozmo · on March 13, 2023

Many need SVB, we (ScyllaDB) received a great service from them in the past. Later on we just used them as one of our banks, obviously, going forward, we're diversifying our services.

The problem for startups is that debt provider who are also a bank require to move most of the business to that provider. It's not the case other types of debt providers. w.r.t debt - everybody use this mechanism, doesn't matter if you're small or big.

thekozmo · on Oct 9, 2022

ScyllaDB uses AGPL, a good and valid license

thekozmo · on Oct 9, 2022

We (I'm a co-founder) run the following: - 1B (10^9) rows/sec with 86 servers https://www.scylladb.com/2019/12/12/how-scylla-scaled-to-one... - 1PB with 20 servers https://www.scylladb.com/presentations/operating-at-monstrou...

- Comcast moved from 970 Cassandra nodes (and 60 nodes of cache) to 78 Scylla nodes - https://www.scylladb.com/2020/01/15/comcast-sprinting-from-c...

- Palo Alto Networks run 8600 clusters(!) of ScyllaDB https://www.scylladb.com/2022/06/14/how-palo-alto-networks-r...

ScyllaDB excels in throughput and latency, we have also a better compaction algorithm that saves 37% of storage compared to C*. Usually one can replace lots of small nodes with gigantic nodes that have more resources and it allows much better management.

To run 100PB Scylla will need more than 300 nodes, even thousands but definitely not what Apple throw at the problem.

thekozmo · on Oct 9, 2022

There is no magic behind Scylla, mainly lots of hard work, hundreds of years of engineering, based on the former C* design which is based on Dynamo/Bigtable.

The JVM is part of the problem, not all of it. The main issue is that it hides the hardware and makes tracing harder - instruction level and block level. At Scylla we strive for efficiency, every operation is tagged with a priority class for the CPU and I/O schedulers. Folks are welcome to read the blogs about those topic. Lots of details and hard work

thekozmo · on Oct 4, 2022

Listening to SQLite creator podcast (https://corecursive.com/066-sqlite-with-richard-hipp/# ), it does feel that Glauber is right about weird collaboration standards. The guy is against gmail, git, etc. Fossil may be better for SQlite today, w/o many contributors, that's the problem Glauber is trying to solve

thekozmo · on Aug 16, 2022

This is indeed what we (ScyllaDB) do, pretty much everywhere. It works great for 95% of our users. Discord wanted to add a level of guarantee since they observed a too high level of local disk failures.

gooeyblob · on Aug 17, 2022

Yikes! Wonder what's up with GCP in that regard.

thekozmo · on July 5, 2022

Good point. This is more of a tcp stack comparison between the kernel and userspace. Seastar has a sharded (per core) stack, which is very beneficial when the number of threads is high

anonymoushn · on July 5, 2022

You can set up one or many rings per core, but the idea I alluded to elsewhere in this comment section of spending 2 cores to do kernel busy polling and userspace busy polling for a single ring is less useful if your alternative makes good use of all cores.

thekozmo · on July 5, 2022

What's amazing is that the seastar tcp stack hasn't been changed over the past 7 years, while the kernel received plenty of improvements (in order to close the gap vs kernel bypass mechanisms). Still, for >> 99% of users, there is no need to bypass the kernel.

thekozmo · on July 5, 2022

ScyllaDB uses Seastar as an engine and the DynamoDB compatible API use HTTP parsing, so this use case is real. Of course the DB has much more to do than this benchmark with a static http reply but Scylla also uses many more core in the server, thus it is close to real life. We do use the kernel's tcp stack, due to all of its features and also since we don't have capacity for a deeper analysis.

Some K/V workloads are affected by the networking stack and we recently seen issues if we chose not the ideal interrupt mode (multiqueue vs single queue in small machines)

menaerus · on July 5, 2022

Few questions if you will, it's an interesting work and I figure you're on ScyllaDB team?

1. Is 5s experiment with 1s warmup really a representative workload? How about running for several minutes or tens of minutes? Do you observe the same result?

2. How about 256 connections on 16 vCPUs creating contention against each other and therefore skewing the experiment results? Aren't they competing for the same resources against each other?

3. Are the experiment results reproducible on different machines (at first use the same and then similar SW+HW configurations)?

4. How many times is experiment (benchmark) repeated and what about the statistical significance of the observed results? How do you make sure to understand that what you're observing, and hence drawing a conclusion out of it in the end, is really what you thought you were measuring?

thekozmo · on July 5, 2022

Am ScyllaDB but Marc did completely independent work. The client vcpus don't matter that much, the experiment compares the server side, the client shouldn't suck. When we test ScyllaDB or other DBs, we run benchmarks for hours and days. This is just a stateless, static http daemon, so short timing is reasonable.

The whole intent is to make it a learning experience, if you wish to reproduce, try it yourself. It's aligned with past measurements of ours and also with former Linux optimizations by Marc.

menaerus · on July 6, 2022

I'm myself doing a lot of algorithmic design but I also enjoy designing e2e performance testing frameworks in order to confirm theories I or others had on a paper. The thing is that I fell too many times into a trap without realizing that the results I was observing weren't what I thought I was measuring. So what I was hoping for is to spark a discussion around the thoughts and methodologies other people from the field use and hopefully learn something new.

thekozmo · on Feb 1, 2022

Impressive stuff! Worth to try to run the OSv unikernel (one of my babies) in it.