Strong infohazard norms lead to predictable failure modes

Obligatory disclaimer: This post is meant to argue against overuse of infohazard norms in the AI safety community and to demonstrate failure modes that I have personally observed. It is not an argument for never applying infohazard norms anywhere, nor a claim that true infohazards do not exist. None of this is meant to...

Preference Aggregation as Bayesian Inference

A fundamental problem in AI alignment, as well as in many of the social sciences, is preference aggregation. Given a number of different actors, each with their own preferences, what is a consistent way of making decisions that ensures the outcome is fair and, ideally, that all of the...
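The excerpt cuts off before the post's actual construction, so purely as an illustration: below is a minimal sketch of one standard way to cast preference aggregation as Bayesian inference, treating each agent's utilities as softmax (Boltzmann-rational) evidence about a latent "socially best" option and combining the likelihoods under a uniform prior. The `aggregate` function, the `rationality` parameter, and the example numbers are my own assumptions, not necessarily what the post proposes.

```python
import numpy as np

def aggregate(utilities: np.ndarray, rationality: float = 1.0) -> np.ndarray:
    """Aggregate preferences via Bayesian inference (illustrative sketch).

    utilities: (n_agents, n_options) array of each agent's utilities.
    Returns a normalized posterior over options.
    """
    # Per-agent likelihood of each option being "best": softmax over
    # that agent's utilities, scaled by a rationality temperature.
    logits = rationality * utilities
    log_lik = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    # Bayes with a uniform prior: sum log-likelihoods across agents,
    # then renormalize to get a posterior over options.
    log_post = log_lik.sum(axis=0)
    return np.exp(log_post - np.logaddexp.reduce(log_post))

# Example: three agents, two options. Agent 2's strong preference for
# option 1 outweighs the mild opposition of agents 0 and 1.
prefs = np.array([[1.0, 0.5],
                  [0.8, 0.6],
                  [0.1, 3.0]])
print(aggregate(prefs))  # roughly [0.10, 0.90]
```

One property of this framing worth noting: because evidence multiplies, an agent with a very sharp preference can dominate many agents with mild ones, which is exactly the kind of fairness question the excerpt raises.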

Thoughts on loss landscapes and why deep learning works

Epistemic status: Pretty uncertain. I don’t have an expert-level understanding of current views in the science of deep learning about why optimization works, but just read papers as an amateur. Some of the arguments I present here might already be either known or disproven. If so, please let me...

My path to prosaic alignment and open questions

One of the big updates I have made in the past six months is a strong shift towards the belief that solving alignment for current LLM-like agents is not only possible, but actually fairly straightforward, and has a good chance of being solved by standard research progress over the next ten...

Entrepreneurs as Catalysts

I know this is not an original observation, but recent events have made me grok it in a way I had not really before. The principal function of an entrepreneur is to act as an economic catalyst: organizing talent, money, and other resources so as to be able to...