It's tough to know just what *machine learning* covers, includes, consists of. F...

synthc · on June 25, 2018

Which startup would that be?

graycat · on June 25, 2018

See below, to be precise, in

https://news.ycombinator.com/item?id=17396176

That's my first description of the work on HN. Now I want to rush to an alpha test, to be announced on HN.

cschmidt · on June 25, 2018

Is your startup Datathings?

graycat · on June 25, 2018

Naw.

The project is for a new Web site.

I have the hard technical work done, e.g., the crucial core applied math and the corresponding code apparently ready for production, but I have to do some routine work, e.g., add more data, pick a company name and get trademark protection, get a static IP address, pick and register a domain name, get a tax ID, find out what the ad networks want so that I can run and get paid for ads, tell my county that I'm "doing business as ...", maybe set up an LLC, get a business checking account, get an e-mail address just for the business and another one for Web site feedback, do some more testing internally, tweak some of the software, tweak my code for my Web site session state server to get a better Web site log server, kick back and give a critical appraisal of the effort and make some tweaks, announce an alpha test here on HN and elsewhere, then a beta test, then get some publicity by some of the usual ways, then maybe have a business.

The startup is supposed to be the first good solution for a pressing problem for nearly every user on the Internet around the world, smartphone to ... workstation.

The potential of the business as currently envisioned would be, on average, about three sessions per week, 30 minutes of eyeball time per session, for over 50% of the users of the Internet in the world. The site will be able to do some relatively good ad targeting while having some of the best protection of user privacy, e.g., no use of cookies, logins, Web browser user agent strings, or third-party tracking. So, make some assumptions about ad rates, multiply, and get an estimate of a good business.
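As a rough sketch of that "multiply" step, the arithmetic could look like the following. Only the three sessions per week and the over-50% share come from this post. The Internet user count, the ads per session (the "few dozen" mentioned further down, taken here as 36), and the revenue per thousand ads are placeholder assumptions for illustration, not real figures.

```python
def weekly_ad_impressions(internet_users, share_of_users,
                          sessions_per_week, ads_per_session):
    """Ads shown per week across all users of the site."""
    return internet_users * share_of_users * sessions_per_week * ads_per_session


def weekly_revenue(impressions, revenue_per_thousand):
    """Revenue for a number of impressions at a given rate per 1,000 ads."""
    return impressions / 1000.0 * revenue_per_thousand


if __name__ == "__main__":
    # From the post: about three sessions per week, over 50% of users.
    sessions_per_week = 3
    share_of_users = 0.5

    # ASSUMPTIONS (placeholders, not from the post):
    internet_users = 4_000_000_000   # hypothetical round number
    ads_per_session = 36             # "a few dozen", taken as three dozen
    revenue_per_thousand = 1.0       # hypothetical ad rate, in dollars

    impressions = weekly_ad_impressions(
        internet_users, share_of_users, sessions_per_week, ads_per_session
    )
    print("weekly impressions:", impressions)
    print("weekly revenue ($):", weekly_revenue(impressions, revenue_per_thousand))
```

Changing any one input scales the result linearly, so the estimate is only as good as the assumed ad rate and the assumed share of users.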

The problem: Given an interest of a person, typically a narrow interest, maybe a short term, recent, or new interest, that interest treated as unique in all the world, find the Internet content with the meaning that person wants for their interest.

So, part of the project is addressing the meaning of Internet content.

The content might be in any of the common data types of text, still images, videos, music, webcams, podcast audio, etc. The content might be in Web pages, PDF files, YouTube videos, Instagram images, art gallery images, etc.

The interests might be narrow topics in crafts, politics, skills, academic subjects, art, social, interior decorating, travel, intersections of those, etc.

So, really an interest can be essentially anything, and the content can be essentially anything on the Internet. And again the main criterion is meaning.

To the users, the Web site is just how to find content with the meaning they want for each of their interests.

So, the site is a new form of engine for search, discovery, recommendation, custom curation, etc. To heck with these categories: The site is for the users to find the content with the meaning they want.

Well, characterizing meaning with just keywords and phrases usually ranges from difficult to impossible in practice. So, my work makes no use of keywords or phrases. So, really, my site is not direct competition for Google, Bing, etc.

If a user (A) knows what content they want, (B) knows that the content exists, and (C) has keywords and/or phrases that accurately characterize that content, then there is a good chance that Google, Bing, etc. will do well for them, and my work will rarely do better.

But for nearly all people, interests, meaning, and Internet content, (A)-(C) is asking way too much. E.g., (A)-(C) often works poorly for meaning, even when the content is based on just simple text. For meaning of audio, video, still images, etc., (A)-(C) and keywords/phrases are still less effective. E.g., it is tough to use keywords/phrases to characterize accurately the meaning of most art.

So, that's my startup.

Again, the code appears to be ready for production, say, to a few dozen new users per second. On average, each user will see, in about 30 minutes, a few dozen ads. Then multiply out and get an estimate of a significant business. The way the internals work, a lot of scaling will be possible just from simple sharding.
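To make the capacity and sharding remarks concrete, here is a minimal sketch under stated assumptions. "A few dozen new users per second" is taken as 36, and the steady-state number of simultaneous sessions follows from arrival rate times session length. The post does not say what gets sharded or how, so routing each session to a shard by hashing a hypothetical session ID is only one common form of simple sharding, not a description of the actual code.

```python
import hashlib


def concurrent_sessions(arrivals_per_second, session_seconds):
    """Steady-state sessions in progress: arrival rate times session length."""
    return arrivals_per_second * session_seconds


def shard_for(session_id: str, shard_count: int) -> int:
    """Map a session ID to a shard index in [0, shard_count).

    ASSUMPTION: sharding by session ID is hypothetical; the post only
    says that simple sharding will allow a lot of scaling.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    # 36 new users per second (assumed), 30-minute sessions (from the post).
    active = concurrent_sessions(36, 30 * 60)
    print("sessions in progress:", active)  # 36 * 1800 = 64800

    # Hypothetical session IDs routed across 4 hypothetical shards.
    for sid in ("session-a", "session-b", "session-c"):
        print(sid, "-> shard", shard_for(sid, 4))
```

Because the same session ID always hashes to the same shard, each shard can keep its own session state without coordinating with the others. Changing the shard count moves most sessions to a different shard, which a production setup would have to plan for.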

The key to the work, the enabling crucial core, is some original applied math I derived (theorems and proofs) based on some advanced math prerequisites I got before, during, and after my applied math Ph.D. From all I can see, my applied math has nothing in common with current computer science, data science, machine learning, or artificial intelligence -- I certainly have intended no such connections. But the users will have no sense of anything mathematical.

Initially, I'm borrowing some from Paul Graham's essay

http://www.paulgraham.com/13sentences.html

in particular his point on "love":

"5. Better to make a few users love you than a lot ambivalent."

So, initially my site will be focused on some users and on some of their more likely interests and will not really be equally good for all interests of all users, i.e., will not be comprehensive.

If the site is then successful, it will grow. In an important sense, the site will grow automatically, organically, to please the users.

Feedback welcome!