
Weird, I've been using Ubuntu for quite a few years now, both on my work laptop and my personal desktop, and have never even heard the terms "Ubuntu Pro" or "Expanded Security Maintenance for Applications."


Algorithms Illuminated by Tim Roughgarden really helped me understand algorithm design and analysis. Used it to prep for a master's in computer science, with no previous degree in the area.

https://www.algorithmsilluminated.org/

The author even shares free video classes and other extra material.


Learning optimal decision trees and related ML models using SAT, MILP, and other mathematical optimization methods.

Some recent literature reviews:

https://link.springer.com/article/10.1007/s11750-021-00594-1

https://www.ijcai.org/proceedings/2021/608


Is there a reason why all state is on the server? What about cases where you'd want to keep client-side state, like state strictly related to UI/UX?

(very interesting project nonetheless!)


Keeping the state on the server allows us to run arbitrary Python code and libraries in our event handlers that update the state. Currently only the UI is compiled to React; the logic stays in Python.

We're working to offload more logic to the client for purely UI operations like you mention, and in the future we want to leverage WASM once it's more mature.


It depends on what you mean by function. I did an internship at Coca-Cola in Brazil a decade ago. They had ~500 employees country-wide, mainly marketing and operations strategy folks. The production of the actual beverages and other products was in great part delegated to partner bottlers. My boss there used to joke that if all 500 employees were fired at once, the number of Coca-Cola cans and bottles sold wouldn't drop at all for months. And I do believe he was right. In the long run, of course, things would be different: that's where marketing and strategy pay off.


Which reminds me of https://fivebooks.com/, where people from a particular field are asked their top five book recommendations for a given theme. The interview format is great, and I've picked up a few recommended books along the way.


> The Best Apocalyptic Fiction, recommended by Elliot Ackerman

> 1. The King James Bible


It seems you are mistaking the actual book (a story, an exposition of a subject, etc.) for the printed set of cover and pages, which is only a physical instance of the book itself. Writing your notes in a separate medium is just as much engaging with the book as writing in the margins of a physical copy. It's a matter of taste really, but definitely not an insult to the author!


> This is overstated and easily disproved. ChatGPT produces accurate facts as a matter of course. test it right now

"the idea that this stopped clock is broken is overstated and easily disproved. the clock produces accurate time as a matter of course. go ahead and ask what time is it, just make sure it is 3:45am or 3:45pm"


And I’m also guessing that you don’t do anything that has 1 in a billion odds of death because it’s not worth the risk, right?

Edit: woooooosh...


What? The argument here is that ChatGPT giving factual answers is a mere coincidence, not at all what the model was trained to do. It's a broken clock: it can tell you the correct time in very specific contexts, but you shouldn't rely on it as your source of factual information. If you feed it enough data saying the Statue of Liberty is 1 cm tall, it will happily answer a query with that "fact".


You're not talking about probabilities. You're talking in binaries.

The broken clock is not the correct analogy.


Any analogy is incorrect if you stretch it enough, otherwise it wouldn't be an analogy...

My clock analogy works up to this point: ChatGPT's success in factually answering a query is merely a happy coincidence, so it does not work well as a primary source of facts. Exactly like... a broken clock. It correctly tells the time twice a day, but it does not work well as a primary source of timekeeping.

Please don't read more deeply into the analogy than that :)


A happy coincidence would imply random behavior.

That’s not even remotely how an LLM functions.

You’re not introducing any scale with regards to correctness either.

It is a poor analogy without any stretching required.


Nope, not random behavior. ChatGPT works by predicting the continuation of a sentence. It has been trained on enough data to emulate some pretty awesome and deep statistical structure in human language. Some studies even argue it has built world models in some contexts, but I'd say that needs more careful analysis. Nonetheless, in no way, shape, or form has it developed a sense of right vs. wrong, real vs. fiction, such that you can depend on it for precise, factual information. It's a language model. If enough data said bananas were larger than the Empire State Building, it would repeat that, even if it's absurd.
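The "predicts the continuation, with no notion of truth" point can be illustrated with a toy sketch. This is a trivial bigram model, nothing like ChatGPT's actual architecture, and the corpus is made up; it just shows that a next-word predictor faithfully reproduces whatever its training data says, absurd or not:

```python
import random
from collections import defaultdict

# Toy next-word predictor: a bigram model trained on a tiny corpus.
# It has no notion of truth; it reproduces whatever the data says.
corpus = (
    "bananas are larger than the empire state building . "
    "bananas are larger than the empire state building . "
    "bananas are yellow ."
).split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    """Return the most likely next word after `prev`."""
    options = counts[prev]
    return max(options, key=options.get)

# The model confidently continues the absurd claim, because that is
# what dominated its training data.
print(predict("are"))     # → "larger" (two occurrences beat "yellow")
print(predict("larger"))  # → "than"
```

Scale this up by a few billion parameters and you get much richer structure, but the training objective is still sentence continuation, not fact checking.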


I didn’t say it was random behavior. You did when you said it was a happy coincidence.

I know it is just a language model. I know that if you took the same model and trained it on some other corpus that it would produce different results.

But it wasn't, so it doesn't have enough data to say that bananas are larger than the Empire State Building, not that it would really matter anyway.

One important part of this story that you're missing is that even if there were no texts about bananas and skyscrapers, the model could infer a relationship between them based on the massive number of other size comparisons. It is comparing everything to everything else.

See the Norvig-Chomsky debate for a concrete example of how a language model can create sentences that have never existed.


> the model could infer a relationship between those based on the massive amounts of other size comparisons

That is true! But would it be factually correct? That's the whole point of my argument.

The knowledge and connections it acquires come from its training data, and it is trained for completing well-structured sentences, not correct ones. Its training data is the freaking internet. ChatGPT stating facts is a happy coincidence because (1) the internet is filled with incorrect information, (2) its training is wired for mimicking human language's rich statistical structure, not for generating factual sentences, and (3) its own powerful and awesome inference capabilities can make it hallucinate completely false but convincingly structured sentences.

Sure, it can regurgitate simple facts accurately, especially those that are repeated enough in its training corpus. But it fails for more challenging queries.

For a personal anecdote, I tried asking it for some references on a particular topic I needed to review for my master's dissertation. It gave me a few papers, complete with title, author, year, and a short summary. I got really excited. Turns out all the papers it referenced were completely hallucinated :)


> If enough data says bananas are larger than the Empire State building, it would repeat that, even if it's absurd.

And if it did stuff like that on almost every answer, it would be a broken clock.

But it doesn't. It's usually right about facts. It getting things right is not a coincidence!


The probability that the broken clock is right is straightforwardly 2/1440 ≈ 0.0014 = 0.14%, innit?
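For the pedantic, the arithmetic behind that figure, counting correctness to the minute on a 12-hour dial:

```python
# A stopped clock shows one fixed time; on a 12-hour dial that time
# comes around twice in each 24-hour day. Counting to the minute:
minutes_per_day = 24 * 60          # 1440
correct_minutes = 2                # once per 12-hour cycle
p = correct_minutes / minutes_per_day
print(f"{p:.4f}")  # → 0.0014, i.e. about 0.14%
```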


Clock correctness is relative. If the antique windup clock in your living room is off by 5 minutes, it's still basically right. But if the clock in your smartphone is 5 minutes off, something has clearly gone wrong.


To the second? To the millisecond? What are we wanting here? You're missing the point.

But I'll play this silly game: ChatGPT is not incorrect 99.9% of the time.


Nor is it only incorrect one billionth of the time, as you seem to be indicating through your hypotheticals. Depending on what I've asked it about, it can be incorrect at an extremely high rate.


That is definitely not what I am indicating. I'm pointing out the absurdity of speaking of probabilistic things in absolutes.

Yes, ask an LLM to multiply a few numbers together and you will get a near-100% failure rate.

The same goes for quotes, citations, website addresses, and most numerical facts.

The failures are predictable. That means the models can be augmented with external knowledge, Python or JS interpreters, etc.
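As a hedged sketch of what that augmentation can look like (the routing rule and function names here are mine, not any particular product's): detect a query class the model predictably fails at, exact arithmetic, and send it to a deterministic evaluator instead, falling back to the model for everything else.

```python
import re

# Hypothetical tool-augmentation sketch: route queries an LLM
# predictably fails at (exact arithmetic) to an interpreter, and
# fall back to the model for everything else.
ARITHMETIC = re.compile(r"^[\d\s+\-*/().]+$")

def answer(query: str, llm=lambda q: "<model answer>") -> str:
    if ARITHMETIC.match(query.strip()):
        # Input is only digits/operators, so eval() is safe here.
        return str(eval(query))
    return llm(query)

print(answer("123456789 * 987654321"))  # exact, unlike a model's guess
print(answer("How tall is the Statue of Liberty?"))
```

Real systems do something similar with code interpreters and retrieval, but the routing is learned rather than a regex; this only illustrates the "predictable failures can be delegated" point.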


Once I stayed in an Airbnb owned by Carl Friedrich Gauss' distant relatives in Brazil.

It was a very cozy cabin in the mountains around Rio, and I was celebrating a two-year anniversary with my girlfriend. There were a few books arranged in a short rack, mostly teen stuff, but one aged book stood out. It was an English version of Gauss' Theory of the Motion of the Heavenly Bodies, apparently borrowed from a university library in the 1970s but never returned. Inside, I found two documents from 1969: a voter registration and an exam card. They belonged to a woman with a Brazilian first name and Gauss' surname. Later, I had to transfer money to the Airbnb host, and she also had Gauss as a surname.

I was pretty thrilled with the whole thing. My girlfriend was more entertained by the cabin's cat.


I believe they tackle this exact bias. From the Sampling Effects section:

> Second, the choice of CDS n-grams could lead to a "recency bias" in our results, explaining their rise in prevalence in recent decades. We control for this effect with a null model that samples random n-grams more frequently from recent books, due to rapidly increasing publication volume since 1895, thereby inducing a bias toward more recent language. We observe increases of CDS n-gram prevalence well above levels predicted by this null model


> thereby inducing a bias toward more recent language. We observe increases of CDS n-gram prevalence well above levels predicted by this null model

I don't get why this would work. I get the null model predicting a bias X, and I guess they observe a greater bias X + Y, but I don't see how that handles their choice of cognitive distortion signifiers being biased? I mean, isn't it likely their choice of signifiers maps to Y?


They are claiming that their basket of distortion n-grams became more common faster than other randomly sampled n-grams from recent works.

This seems like an interesting approach to controlling for the bias, but I'd expect the random sampling to bias lower than a specific sample, since a random sampling of n-grams would pick up a lot of English grammar that hasn't changed in many years.
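To make the null-model idea concrete, here's a sketch with made-up numbers (the year buckets, volumes, and growth rate are mine, not the paper's data): draw random n-gram sources with probability proportional to each year's publication volume, so the baseline itself carries the recency bias, then compare the target basket's trend against that baseline.

```python
import random

# Hypothetical sketch of the recency-biased null model.
# Publication volume grows over time, so a random draw of n-gram
# sources already skews toward recent years; that skew IS the
# null baseline the target basket must exceed.
random.seed(0)

years = [1895, 1920, 1945, 1970, 1995]
volume = {y: 100 * 1.5 ** i for i, y in enumerate(years)}  # rising output

def sample_year():
    """Draw a source year, weighted by publication volume."""
    return random.choices(years, weights=[volume[y] for y in years])[0]

draws = [sample_year() for _ in range(10_000)]
for y in years:
    print(y, draws.count(y))  # later years dominate, by construction
```

The paper's claim, then, is that the CDS basket's prevalence rose even faster than this volume-driven baseline predicts; whether the baseline also controls for how the basket was *chosen* is the question raised above.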


Maybe they should construct a list of n-grams indicative of "non-depressed"/healthy people, and run the same analysis on that list.


They also address this to some degree in the Language effects section.

