Jacob O'Bryant
Recommender system economics, reactive queries, and thoughts on Pioneer
25 February 2020

Findka

I mentioned last week that I was simplifying the way Findka works. I finished the first part of that on Friday, so Findka version 2 is officially launched. I'll be sending out the first batch of recommendations on Friday, so now would be a great time to sign up for the new version if you're interested.

Creator economics

At one point in the latest episode of Lex Fridman's AI podcast, Michael I. Jordan (apparently "the Michael Jordan of machine learning") talks about the economics of recommender systems. He says there's value in making direct connections between producers and consumers, but today that's often not what's happening. Because of the ad-based revenue model, producers end up being commoditized and consumers end up being spammed (that's my interpretation of what he said). However, he points out that advertising can serve a good purpose: if you're trying to introduce a new product to the market, recommender systems (at least collaborative filtering ones) will tend to not be very helpful. But if you're willing to spend $X on advertising for your new product, that's a useful signal. The amount you spend on advertising is a measure of your confidence in the new product.

This gave me an idea. Envision a hybrid recommender system that uses collaborative filtering for established items and content-based modeling for new items. In the content-based model, one of the features is how much the producer has spent on advertising. Then you let the system decide what to do with that number. If Michael Jordan is right and advertising spend is a useful signal, then the model will figure that out automatically and boost items with high advertising spend. Whereas if it's a noisy signal, it'll simply be ignored (and producers will stop spending money on advertising).
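A minimal sketch of that idea in Python, under stated assumptions: the collaborative filtering score is taken as precomputed, the content-based model is a plain linear model whose weights were learned elsewhere, and every name here (Item, MIN_RATINGS, topic_match, the weight keys) is hypothetical. The point is only that advertising spend enters as one feature among others, and the learned weight decides whether it matters.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    num_ratings: int    # how much interaction data exists for this item
    cf_score: float     # collaborative filtering score (assumed precomputed)
    topic_match: float  # hypothetical content feature in [0, 1]
    ad_spend: float     # dollars the producer has paid


# Hypothetical cutoff: items with at least this many ratings count as "established".
MIN_RATINGS = 20


def content_score(item: Item, weights: dict) -> float:
    """Linear content-based model; ad spend is just another feature."""
    return (
        weights["bias"]
        + weights["topic_match"] * item.topic_match
        # log1p keeps very large budgets from dominating the score
        + weights["ad_spend"] * math.log1p(item.ad_spend)
    )


def hybrid_score(item: Item, weights: dict) -> float:
    """Use collaborative filtering for established items, content model for new ones."""
    if item.num_ratings >= MIN_RATINGS:
        return item.cf_score
    return content_score(item, weights)
```

If training pushes weights["ad_spend"] toward zero, spend is effectively ignored; if the weight comes out positive, new items with higher spend get boosted. Nothing in the scoring code guarantees exposure in exchange for payment.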

It almost seems incorrect to call that "advertising spend," because the system wouldn't be guaranteed to show it to anyone just because of the payment. In any case, I am interested to try it out with Findka eventually.

Curation and algorithms

I also enjoyed this episode of the a16z podcast. There was some discussion about curation vs. algorithms. The sentiment was that curation is more trustworthy and higher quality than algorithms--partly since algorithms are trained to maximize ad revenue, but also because curation involves a direct human relationship. Basically if you run a newsletter and you start including a lot of clickbait articles, your readers will say "what the heck" and it'll be awkward for you. There's a social cost element at play which doesn't really happen with algorithmic recommendation.

I've felt the general sentiment of "curation > algorithms" elsewhere, but I'm still pro-algorithms myself. Algorithms have the potential for personalization which curation lacks. And I think the problems with algorithms can be fixed, especially by fixing the revenue model. And with regard to the podcast, I don't think they were trying to argue that algorithms are inherently worse than curation, rather that that's the situation right now. And there's merit to that argument. I think the ideal system uses an algorithm as the final decision maker but incorporates manual curation as one of the signals. I mean, that's practically the definition of collaborative filtering--but I want to experiment with giving higher weight to known, vetted curators.
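Here is one rough way the "curation as a signal" idea could look, assuming a simple user-based collaborative filter where a prediction is a similarity-weighted average of other people's ratings. The function name, the curator_boost parameter, and the data shapes are all assumptions for illustration, not a description of how Findka works.

```python
def predict_rating(ratings, similarity, item, vetted, curator_boost=1.0):
    """Similarity-weighted average of ratings for `item`.

    ratings:       {user: {item: rating}}
    similarity:    {user: similarity to the target user}
    vetted:        set of users treated as known, vetted curators
    curator_boost: multiplier applied to a vetted curator's weight
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, user_ratings in ratings.items():
        if item not in user_ratings:
            continue
        weight = similarity.get(user, 0.0)
        if user in vetted:
            weight *= curator_boost  # curators count for more, but don't decide alone
        weighted_sum += weight * user_ratings[item]
        total_weight += weight
    if total_weight == 0:
        return None  # nobody relevant has rated this item
    return weighted_sum / total_weight
```

With curator_boost=1.0 this is ordinary collaborative filtering. Raising it pulls predictions toward what the vetted curators said while the algorithm still makes the final call.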

Clojure guide

No new posts this week; I'm putting extra work into Findka v2.

There was a really exciting announcement on Tuesday though. Frank McSherry et al. have officially launched Materialize, a product that gives you efficient, reactive SQL queries. This is quite huge. This is a major step forward in eliminating the need for backend code in more applications. Right now, Firebase does this to an extent. You can make (very limited) queries which will be automatically updated whenever relevant data on the server changes, so you don't have to worry about any server <--> client sync logic.

(Firebase also has a rules system so you can read and write directly from/to the database without having to write backend endpoints. The rules system also feels limited to me, and I've spent some time designing my own spec-based rule systems that I think are potentially more flexible.)

But the hard part is allowing complex reactive queries. You can't get rid of the backend without those. SQL isn't Datalog, but I assume/hope that'll come at some point (there was some talk about translating Datalog to SQL, but I guess that's a hard problem). In the meantime, I've been itching to try out Materialize and see how far we can go with just SQL. You could output Datomic's transaction log to a stream of SQL-formatted data and then use Materialize for the queries. It'd be a step further than what we have with Firebase at least.
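To make "reactive query" concrete, here is a toy Python sketch of the easy case: a grouped count that is kept up to date from a stream of inserts and deletes, and that notifies subscribers whenever a result changes. This is not Materialize's API or its algorithm, and the class and method names are made up. The hard part is doing the same thing efficiently for joins and nested queries.

```python
from collections import defaultdict


class ReactiveCountView:
    """Incrementally maintains something like: SELECT key, COUNT(*) ... GROUP BY key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn            # extracts the grouping key from a row
        self.counts = defaultdict(int)  # current query result
        self.subscribers = []           # callbacks to notify on change

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def apply(self, row, diff):
        """Apply one change from the log: diff is +1 for an insert, -1 for a delete."""
        key = self.key_fn(row)
        self.counts[key] += diff
        if self.counts[key] == 0:
            del self.counts[key]  # drop empty groups from the result
        new_count = self.counts.get(key, 0)
        for callback in self.subscribers:
            callback(key, new_count)  # push the updated result to clients
```

A client would subscribe once and then receive updates as rows flow in, with no polling and no hand-written sync logic. Only the affected group is touched on each change, so the query is never recomputed from scratch.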

See The Web After Tomorrow for more background on using reactive queries for serverless architectures. This actually isn't the use case that Materialize is targeting right now. They're (understandably) marketing Materialize as a replacement/enhancement for OLAP systems. But the web application use case is the one I'm personally interested in. I spent a few months in early 2019 with someone else on a startup idea in this space. One of the reasons I stopped working on it is that we were mistakenly under the impression that the tech for complex reactive queries already existed and there was just a lot of glue work that needed to be done. But later I realized that no, this was actually an unsolved research problem, so I've been following Frank McSherry's work since then.

Anyway, hopefully I'll get to do a writeup of trying out Materialize next week, but no promises ;).

First impressions of Pioneer.app

I decided to try out Pioneer a couple of weeks ago. It's an online-only early-stage startup accelerator/community. They're basically aiming to be a precursor to Y Combinator, at least at first. I think there's important work to be done in that space. YC's partners say that you're ready to apply to YC if you've got an idea and a team, but even getting to that point is hard. Universities theoretically should be the perfect preparation for that, but they have tons of problems.

So, I'm interested in any scheme to support and build relationships between fellow wannabe startup founders and other people who are trying to make it on their own. However, Pioneer isn't quite what I'm looking for. I've decided to stop using it for now.

The gamification of it all feels pretty gimmicky, but it doesn't really matter. You can just ignore it. My main issue is the default assumption of how participants should interact. YC's Startup School had the same problem (I went through that last year). The idea is:

  1. You get grouped with a sample of other participants.

  2. Everyone gets a drive-by view of each other's startups.

  3. You try to give each other advice and encouragement.

The problem is that most people are simply not going to grok your startup idea. If it was obviously good, someone else would have probably done it already. It often takes a specific set of experiences to be able to see why this idea might be worth working on. But, you've gotta give each other feedback, so what do you do? Tell them to improve the landing page of course ;). I felt the pain myself all the time. I rarely knew what to tell other founders; it took a lot of work to think of something helpful to say. And almost invariably, I gave the advice I did not because I thought I had some great insight but because I had to say something and that's all I could come up with. The end result? Lots of noise, not much signal.

On the other hand, I remember one time during Startup School when I met someone on the forum who was working in the food space, and preparing food is a huge pain point for me. We had a Zoom chat later and it was exhilarating. We came up with all these ideas. I can't say if they were good ideas or not, but we both enjoyed the chat.

Those kinds of experiences are the things these communities should be optimizing for. As a participant, I want to somehow skim through all the projects and find the few that are potentially meaningful to me. Then I'd look into those projects a little more--if they look interesting, maybe I'd talk to the founders. I might subscribe to their newsletter and give them regular feedback on new features etc. Or if they ever wanted another perspective on some decision they're trying to make, I'd be available. On the extreme end, maybe I'd want to join them as a partner (or later as an employee).

How would you build a system like that? Well... a recommender system might be helpful ;). In fact, serendipitous human networking is one of the late-stage use cases I really want to explore with Findka (it's possibly the most important use case, especially as remote work becomes more prevalent). But it doesn't have to be that complicated. All you need is a directory of projects that can be navigated without too much effort. You should be able to navigate at least by project type, but maybe allow arbitrary tags as well. For instance, I'd be interested to see what other startups are using Clojure. The listings need to have enough detail that you'll notice the ones that are important to you, but be short enough that browsing doesn't get difficult. And there should be links to a mini-community for people interested in that idea. Maybe each project has a dedicated Slack channel, and the listing has a link to that channel.
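The simple version really is small. As a rough sketch, assuming each listing is a short record with a type, a few free-form tags, and a community link (all field and function names here are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    project_type: str                       # e.g. "food", "devtools"
    summary: str                            # a sentence or two, kept short for browsing
    tags: set = field(default_factory=set)  # arbitrary tags, e.g. {"clojure"}
    community_url: str = ""                 # link to the project's chat channel


def browse(projects, project_type=None, tag=None):
    """Return listings matching the given type and/or tag; no filter returns everything."""
    return [
        p
        for p in projects
        if (project_type is None or p.project_type == project_type)
        and (tag is None or tag in p.tags)
    ]
```

Calling browse(projects, tag="clojure") would answer the "who else is using Clojure" question, and each result carries the link to its mini-community.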

I also think it could be helpful to somehow avoid seeing these things through the startup lens. When you hear "this is my startup," you think "you're spending the majority of your waking hours building that?" Your expectations are too high, which makes it harder to notice the little spark of innovation that might be there. So if you could just think of it as a directory of projects (which might turn into startups, who knows) that might help. (I sometimes try to get myself to think of my own startup as just a project, but it's too late for that. I've already drunk my own kool-aid.)

Maybe the way to do it is to help people get to know each other a bit in addition to learning about each other's projects. When you don't know someone, it's easy to dismiss their project as a bad idea. But things are different when you think "Alice is really smart; if she's working on this, there must be something interesting about it."

I digress. I'd like to work more on communities at some point; unfortunately it'll have to wait for my current projects (and backup projects).

Have a great week and stuff,

Jacob O'Bryant