[Tag search]

On language modelling and pirate AI (transcript)

Sunday 11 September 2022, 00:00

I've been thinking a lot recently about the current developments in deep neural network generative models, and how they implicate free software issues. I went into a lot of this stuff in my July 25 Twitch stream, and although I'd also like to write up some of my thoughts in a more organized way, the transcript of the relevant video content is a pretty decent introduction in itself, so I thought I'd post it here.

This is an automatically generated transcript (by another of these models!), with a fair bit of manual editing - so it's spoken English written down, not the usual standard form of written English. The video itself will go online eventually, but it may be a long time before that happens, because my video archive server is at capacity now and I probably won't expand it until the Twitch stream gets enough viewers that I can sell some subscriptions to pay the bills.

Syntax differences hide splits in meaning

Friday 7 September 2018, 17:33

One way people divide themselves into tribes is over word usage. If one tribe claims a certain sequence of letters has a certain meaning, and another claims it has a different meaning, then there are plenty of opportunities for them to misunderstand each other or each declare the other Wrong. There may not be a lot we can do about it when there's a direct disagreement on the one true meaning of exactly one word.

However, human language is more complicated than that. One sequence of letters may not have just one meaning, and in particular it may be used in more than one syntactic role, such that the different ways of using it have different meanings. At that point it may not even be right to call it one "word"; it is two words, with different meanings and also different grammar, that only happen to share a spelling. And if two tribes use words that differ in this way, maybe there is some hope of building a bridge between them by making clear that their uses of the same sequence of letters really refer to different things and do not need to have identical meaning. That is what I'd like to talk about here: how different syntax can be a clue to different meaning.

MCFG+2

Saturday 10 September 2011, 07:10

Writing this on September 10. I've been working very hard for the last few days, as well as struggling with technology issues (as mentioned, my computer in Canada seems to be kaput), so here are just a few notes.

Again with the child-porn PSAs

Sunday 21 November 2010, 10:24

I'm in Winnipeg at the moment, here to look for an apartment - and it looks like I was successful, in that I have an application and deposit in now on a place that seems pretty much perfect. Prices are a fair bit lower here than in Toronto, with the result that for only a little more than I was paying in the big smoke, I can get a significantly nicer apartment. It's a little hair-raising because it will take them longer to process my application than the length of my stay here, so if somehow I'm not approved, I'll be in trouble. But that's not likely.

There are a lot of anti-child-porn public service announcements here. Pretty much every transit bus carries at least one, usually more than one. My colleagues actually warned me about this before I came - yes, they said, it is kind of weird and disturbing, but we don't actually have massive amounts of child abuse here, honest! I'm sure it points to something interesting about the culture. But I noticed something more specific that I thought I'd highlight.

Okay, two posters. Nearly identical design, both advertising the same thing, obviously part of the same campaign. They're trying to convey that if you happen to see some child porn on the Net, you should report it to the police, or rather to an unaccountable private citizens' group. I note that Canadian law does not provide a strong safe harbour for doing so, and not only possession but "accessing" it are highly illegal, with mandatory minimum jail sentences, even in the case of fictional text created without the involvement of any real children, so you should have a really good story of how you happened to find the material by accident - but never mind that. I'm interested in the subtle difference between the two posters. One shows a woman looking concerned, with the caption "I wouldn't want my kids in those pictures. SO I REPORTED IT." The other shows a man looking concerned, with the caption "I wouldn't want my little girl in those pictures. SO I REPORTED IT."

Maybe the designers just wanted some variety, so they didn't use exactly the same wording on the two posters. But would it work just as well if you swapped the two captions? I think it wouldn't; and I think the reason for that is a big clue to why this subject matter is so difficult for us to think about.

All you're "based off" are belong 2 us

Saturday 30 October 2010, 21:40

I saw a Web BBS posting recently in which the poster, who was a foreigner learning English as a second language, asked "Which is correct - 'based off' or 'based off of'?" The person asking the question can probably be forgiven because they don't know any better, and at least were smart enough to ask, but if you know me you'll probably be able to guess that the general agreement among the answers, that "based off of" is incorrect and you should say "based off" instead, caused me to consider the merits of a tri-provincial killing spree.

I will not apologize for being a prescriptivist. There are some usages that would be wrong even if all the other native speakers of English used them; and "based off" (with or without an "of") is such a usage. I'm willing to accept "different than" as an issue of formalism, and acceptable in speech or informal writing even though I do not use it myself; I'm willing to (very grudgingly) grant that persons from the United States of America may be allowed to say "anyways" as a regional dialect thing, even though it makes them sound illiterate; but "based off" is just completely unacceptable.

Nonetheless, from a scientific perspective and from the point of view of "know the enemy," it may be interesting to look seriously at the questions of who does say "based off," and when they started.

Fun with text analysis

Tuesday 26 October 2010, 22:33

I wrote before about the writing style analysis toy; at that time I said the "blogosphere" wasn't ready for such technology, and I still believe that, but I recently did something sort of related that might interest you, and the stakes are a little lower, so I'm going to share it here.

The thing is, in my novel draft, there are 45 chapters, and some of them are deliberately written in different styles from others. I thought it'd be interesting to see if I could detect that statistically. I apologize for not posting the actual text here - you'll have to wait until the book is published - but I'll at least give you the raw numbers to play with and walk you through the analysis.

On language and the use thereof

Saturday 2 October 2010, 12:01

Hatred is not the same thing as fear, not even if they often occur at the same time to the same people. When you pretend that those two things are identical to each other, and attempt to build that pretense into the language instead of admitting that it is an activist position - for instance, when you use words like "homophobia" - you make the world a less good place and you harm those of your goals that are worth promoting.

This is important.

A note on similarity search

Monday 19 July 2010, 19:26

Hi! I'm a scientific researcher. I have a PhD in computer science. My doctoral dissertation is mostly about the mathematical background of "similarity search." That means looking at things to find other things they are similar to. I've travelled the world to present my work on similarity search at scientific conferences - and some very smart people with very limited funds chose to use those funds to pay for me to do that.

Argument from authority has its limitations, but I would like to make very clear: I am an expert in the specific area of how computers can answer questions of the form "Which thing does this thing most resemble?" Gee, why would I mention this right now?

Cantor vernacular petrify unfledged

Saturday 3 November 2001, 05:28

There are no meaningless words: for every sequence of sounds there exists an entry in the Great Dictionary. There are no wasted names. Although some unfortunates must walk unnamed eternally without a finite name, indeed, by the measure of infinity, all but a vanishingly small countable number of us, on the other side every possible name identifies a being. Though we may seek to drown significance in the pseudorandom noise floor, or to create fictional characters entirely disjoint from the world we know, it is a doomed procession into the infinite distance - for every word we speak implies the language that defined that word, and names the first person who spoke that language.