Can one of the LLM systems be trained to translate to or from this? Is that usef...

neontomo · on Aug 13, 2023

You can think of most common languages as "lossless" (at least in comparison), whereas Toki Pona is "lossy". Once you translate English to Toki, it loses a lot of granularity. For example...

Translating English to Toki Pona:

  Talk -> Toki

Translating Toki Pona to English:

  Toki -> To speak, Communicate, Use language, Think, Say "hello"

Which means, it's now impossible to translate it back to English and know which one is referred to, without context.
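The "lossy" point can be made concrete with a tiny sketch. The glossary below is hand-built from the example above and is an assumption for illustration only, not a real dictionary; the helper names `invert` and `round_trip` are hypothetical.

```python
# Illustrative glossary hand-built from the example above (not a real dictionary).
EN_TO_TP = {
    "talk": "toki",
    "speak": "toki",
    "communicate": "toki",
    "think": "toki",
    "hello": "toki",
}


def invert(mapping):
    """Group English words by their Toki Pona translation (many-to-one collapse)."""
    inverse = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


TP_TO_EN = invert(EN_TO_TP)


def round_trip(word):
    """Return every English candidate left after English -> Toki Pona -> English."""
    return TP_TO_EN[EN_TO_TP[word]]


# Without context, "talk" comes back as a list of candidates, not a single word.
print(round_trip("talk"))
```

Five distinct English entries collapse onto one key, so the reverse lookup can only return the whole candidate list; picking one requires context that the translated word no longer carries.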

The language forces you to be less specific in your meaning, but also to focus on core human emotions instead of unnecessary complexity. Having a translator perfectly convert English to Toki Pona is gonna be a lot easier than the reverse, anyway, unless you have a lot of context to piece it together.

Note: I tried learning this language many years ago and this is my personal opinion. It may not be shared by everyone.

Aachen · on Aug 13, 2023

> You can think of most common languages as "lossless" (at least in comparison),

If only! For most words in most sentences, you can find direct translations, but between most language pairs, I'm pretty sure any given text (just one page or so, let alone books) will have at least one point where you have to interpret the text to translate it: choose what meaning the author intended or what saying/expression conveys the same meaning the closest without going into descriptive mode.

I suppose that might be what you meant by "in comparison", but it seems misleading, at least to anyone not fluent/proficient enough in another language to know that lossless translation isn't how languages work.

But, yes, toki pona lifts this to a whole 'nother level. Does anyone happen to know how Esperanto compares? As far as I know, it should be quite similar to a "normal" (non-con)language in this regard (just simpler to learn, with bonus points if you speak a romance language afaik)

shadowofneptune · on Aug 13, 2023

Esperanto was intended as a sort of diplomatic language. It's got flaws, definitely. The sounds and spelling are very much from the creator's native Polish, a lot of important terms are rather obscure («Usono,» from "Usonia" is the word for the United States). That said, it is in the end relatively easy to learn, and it is easy to express the ideas of diplomacy, science, and civil society.

China and Japan used to have a lot of Esperantists before WWII, for that reason.

https://www.loc.gov/item/2021670575/

> After World War I, the League of Nations considered adopting Esperanto as a working language and recommending that it be taught in schools, but proposals along these lines were vetoed by France.

It may be Eurocentric, but it's a hell of a lot easier for diplomats to learn than English or French!

senloke · on Aug 14, 2023

An actual Esperanto speaker here. I need to correct this. It was never intended to be a "diplomatic" language, that is, a language spoken only by diplomats among their own kind: the language of a small elite which does not want to deal with the average man on the street. That sounds like a story which was told about the predecessor of Esperanto: Volapük.

Esperanto was at some point in time the "workers' Latin", because the less educated worker could learn it as a means to talk with people from other nations. That ended with the push for English or other "more practical languages" in schools, which continues to this day.

Esperanto still is a working living language with a working worldwide community.

bmacho · on Aug 14, 2023

Zamenhof stated multiple times that he wanted to create a universal second language, as opposed to a universal first language. I don't think this distinction makes much sense or had any effect on any design decision, but it was probably important for the marketing of the language. In this sense it was indeed intended to be a "diplomatic" language, so that diplomats can use a single language. (As well as international organizations, merchants, tourists etc.)

Aachen · on Aug 14, 2023

That doesn't sound logical to me. If Zamenhof didn't intend for it to be a primary language, one you learn from birth, then why couldn't it still be used by random people? There has been trading between countries for much longer than Esperanto has existed, especially in border regions or small countries but also across oceans and continents.

Esperanto is from 1887. I was curious what holidays were like at the time:

> According to Stowe (1994), “many nineteenth-century Americans traveled, and many more participated vicariously in the experience of travel by reading travel letters, sketches, and narratives in newspapers, magazines, and published volumes” (p. 3). Similarly, the appetite for travel in the U.K. was also voracious --https://regrom.com/2020/08/26/regency-travel-traveling-abroa...

So tourism is also a goal Zamenhof could have intended. I don't know how you get to the conclusion that, because it wasn't intended for my mom to use while I was a baby, it wasn't intended to be used by my mom or me on holiday if we're not "diplomats", unless you call any tourist an international diplomat.

shadowofneptune · on Aug 14, 2023

Western Europe is very different from the Europe that Zamenhof grew up in.

You get so many internationalist movements out of Russia because it already was in many ways international internally. Lots of languages and land, but both travel and speech were restricted by authorities, secret police seemed to hover invisibly everywhere. The language of everything important, the language of rulers, was Russian. Vacations were in-country, if they happened at all.

Looking to the UK, France, and the US is in this case misleading.

senloke · on Aug 14, 2023

I think your view is wrong. The distinction is an important one.

By saying you create a diplomatic language, you are marketing towards the elite, as I wrote in my post.

Saying that everybody speaks it as a second language, which is indeed what Zamenhof, the creator of Esperanto, wanted, is a different focus. The first focuses on an elite, the latter focuses on the people.

It's like the distinction between "computers for knowledge workers" and "personal computers": the first is only for a small elite, the latter is for everybody. Or the distinction between "politics for a select few" and "politics elected by everybody": the first is called a form of dictatorship, the latter democracy.

bmacho · on Aug 14, 2023

Right, Esperanto wasn't created for the elite, or only for diplomats. That wouldn't make much sense. Also I don't think GP intended to suggest it, but you clarified it anyway, so all's good.

shadowofneptune · on Aug 14, 2023

Frankly, this is why despite my admiration for Esperanto, I do not engage in it.

Posts like these are the 'no fun allowed' of constructed languages, and it pops up most often with Esperantists. Like a diplomat, you refuse to let people use words carelessly, or loosely.

Toki Pona is in itself a reaction to that. It's an exploration in wordplay, puns, and local culture.

EDIT: You also left like... a wall of text explaining why Esperanto is far superior to Toki Pona? That isn't fun to read or talk about. If the idea is to replace English as a language of the world, we don't have to bring the stern attitude of an English teacher along with it.

The sister post got my intent well.

senloke · on Aug 15, 2023

> Posts like these are the 'no fun allowed' of constructed languages, and it pops up most often with Esperantists. Like a diplomat, you refuse to let people use words carelessly, or loosely.

Wtf? What 'no fun allowed'?! Fun is allowed in the community. How and why are you making that stuff up, and based on what, actually? What interpretation are you constructing that is not based on any reality? We have wordplay, puns and local culture. People do these all the time and annoy the more grammatically inclined people with it all the time. These conflicts inside the community are normal: any community will develop people who care more about the language and people who care less. That's how new concepts are generated.

> EDIT: You also left like... a wall of text explaining why Esperanto is far superior to Toki Pona? That isn't fun to read or talk about. If the idea is to replace English as a language of the world, we don't have to bring the stern attitude of an English teacher along with it.

The wall of text tried to answer sincerely the question of how they compare. It also included my personal bitterness about people who constantly piss on Esperanto for the wrong reasons, such as exactly this post of yours. And that's also why I stopped writing it. I wrote that it's lacking the "functionality" that makes Esperanto more complex. Toki Pona is minimalist; it can't be the best language in the world for everything. But that does not make it bad. People enjoy learning it and, despite what you try to make people in the Esperanto community look like, a bunch of them speak that language too, for its value of minimalism, its value in playing around with the Sapir-Whorf hypothesis regarding depression (it's after all the language of good), its value in finding a community, etc.

You see something, interpret it wrongly and then piss on it, for the wrong reasons.

> Toki Pona is in itself a reaction to that. It's an exploration in wordplay, puns, and local culture.

That a niche of people who are inclined to perfectionism, down-beating and snobbishness are also inclined to favor Toki Pona is shown by your comment.

int_19h · on Aug 16, 2023

Slight correction: Zamenhof's native languages (so far as we can tell), in the sense of what he spoke at home, were Yiddish and Russian, although he certainly learned Polish at a very young age due to his place of residence. Not that it makes much difference in this case - the quirks of Esperanto phonology, such as all those affricates and consonant clusters, are familiar to speakers of pretty much any Slavic language. Esperanto orthography, on the other hand, appears to be inspired more by Czech than Polish - "v" rather than "w", diacritics over digraphs etc.

neontomo · on Aug 14, 2023

Thanks for the balanced reply! Yes you're right, I'm only using lossless and lossy as a metaphor here, because of course, any translation will lose information. Even speaking your native tongue with another fluent speaker will lose information because the meaning is interpreted.

However, I'd say that well-developed languages (which for me means having many hyper-specific words and a diverse vocab available) allow interpretation to be more precise and functionally allow communication to happen. With Toki Pona I have the sense that a lot more is lost, and we end up interpreting broad strokes of the original meaning. This isn't bad, it's just different. Poetic if you like.

robobro · on Aug 14, 2023

I just started learning Esperanto on Duolingo and it's been a breeze. I've studied other European languages (Spanish, French, German, Latin) as a hobby before and Esperanto is kind of a streamlined hybrid of all those.

Interlingua may be "more useful" as an auxlang but the size and history of the Esperanto community appeals to me.

I'm also studying Bahasa Indonesia right now; it also has a very regular and simple grammar. It's said to be the easiest non-European language for European language speakers to learn.

senloke · on Aug 14, 2023

> But, yes, toki pona lifts this to a whole 'nother level. Does anyone happen to know how Esperanto compares? As far as I know, it should be quite similar to a "normal" (non-con)language in this regard (just simpler to learn, with bonus points if you speak a romance language afaik)

As an Esperanto speaker with a shallow understanding of Toki Pona I can offer the following comparison:

1. Esperanto has way more vocabulary, which is useful for translating nuances from one source language to another language and back. This was added over the years and is still expanding, as any other living language does.

2. Esperanto words are way more specific than those of Toki Pona. A sentence which I took from a Toki Pona cheatsheet, "soweli li moku", can be translated as "a land animal is eating", "a cat is drinking", "a dog is ingesting", etc., so it is highly context-dependent. For the specific meaning of "a cat is eating", Esperanto would allow "kato estas manĝanta" or, if you don't care for the present progressive tense, "kato manĝas" (present tense), which is how people normally speak.

3. Toki Pona glues words together by putting them one after the other, without gluing the roots together. "telo pimeja" was one example of trying to say "coffee". The Esperanto principle would be to say "nigroakvo" (a kind of water which is black). Besides that, Esperanto allows more nuanced distinctions through adjectives, so "nigra akvo" would mean "black water", the water which is black. Also, as I wrote earlier, the roots in Esperanto are way more specific. "akvo" in Esperanto is just water, not liquid, not fluid, not beverage, as "telo" can mean in Toki Pona. Likewise, "nigra" in Esperanto is "black", not dark, not unlit, as "pimeja" can mean in Toki Pona. Besides, that was just an example based on the word coffee, which someone else in this topic used. The actual Esperanto word for "coffee" is "kafo" and for tea it is "teo", because words which are sufficiently internationally understood are put into the language by its speakers.

4. Esperanto is simpler to learn. It provides the same highly rewarding learning curve as Toki Pona does, but a steeper one, because it's a bigger language. Depending on intelligence, learning style, the ability to focus on things regularly, motivation, connections to other speakers, and time, I would say people can use the language after three months to one year. How well is another question. I learnt the language more intensively in the first year and then just maintained it, used it and improved upon it.

5. Esperanto uses an agglutination-based way of building words, as I wrote in the example of point 3. Based on my limited understanding of linguistic terms, this could be called "synthesizing" words, but it also supports building phrases in an analytical way, as Toki Pona does: "Ĉu mi povas uzi la lazertranĉilon?" (Can I use the laser cutter?) or "Ĉu mi povas uzi la ilon, kiu tranĉas per lazero?" (Can I use the machine which cuts with a laser?). Toki Pona would be something like "mi li pali ala pali e suno ilo" (Can I use the sun-tool?). "ilo", which means "tool", is by the way a word taken from Esperanto, as the creator of Toki Pona also understood Esperanto.

6. Esperanto has 136 years of literature behind it, Toki Pona doesn't. Let's wait another 136 years and see which of all the current constructed languages are then still around and see how much is produced in them.

7. When you learn Esperanto you acquire some vocabulary shared with Romance languages, plus a little from other languages in the mix, like German. That can be a bit helpful when learning those languages after Esperanto, or when trying to navigate on a trip through Italy: you won't understand 80% of what people are saying, but here and there you will encounter words which sound familiar. This shows the so-called "eurocentricity" of Esperanto, which some then use to campaign against the language, as if that property made it the ultimate evil in the world. If they applied the same standard to speaking English, they would stop writing on the internet at all.

8. Toki Pona is overhyped. Esperanto is not hyped anymore. Any long-existing constructed language loses its appeal at some point, because the times have changed and people don't understand its history and believe anything which at first sounds plausible enough but is in fact bullshit. Worldwide auxiliary languages like Esperanto always have the problem of not being "perfect" enough; they all replace each other in an eternal asymptotic drive towards "more perfection", without ever reaching that goal. Thus in the end the best-working language in that category is still Esperanto, and hopefully it will be for the next 136 years. But that's only my personal pet opinion.

Anyway, as I'm digressing into some general rant about planned languages, I end this comment, I can surely answer concrete questions about comparing these two languages later on, if wanted.

bravura · on Aug 13, 2023

Here was GPT4 giving me a full literal and then figurative translation of the sample text:

https://chat.openai.com/share/50fd27af-7515-4f89-b0e1-228501...

coder543 · on Aug 13, 2023

You assume that existing ones can’t?

Using the example provided in the main link, ChatGPT-4 had no trouble at all: https://chat.openai.com/share/3b8faf2e-4eca-4d52-97fb-ac89a4...

clankyclanker · on Aug 13, 2023

This is precisely the case where ChatGPT (or any LLM) will have trouble that a layperson would have trouble validating: Toki Pona has a limited document corpus and requires contextual interpretation obvious to experts.

In short, expect hallucinations and other lies.

coder543 · on Aug 13, 2023

In this case, I validated against the hand-translated example in the main link. It did fine.

This is probably easy mode for it, given the tiny vocabulary at hand, even though it has surely seen very little Toki Pona in the wild. Large language models are extremely good at translation. I’ve personally tested ChatGPT-4 against Google Translate and other translators on some hard translations (languages that are not similar to English, such as Chinese), and it has consistently provided better translations than Google Translate in my opinion. As an example, for validation purposes, news stories that have been reported worldwide make things easier, since you can establish the facts of the translation by hand, and then compare what comes out.

What does your comment actually add to the discussion? Yes, ChatGPT can be wrong. So can Google Translate!

If you’d like to provide a clear example of ChatGPT-4 (not ChatGPT-3.5) making up a Toki Pona translation that is completely disconnected from reality, please do. That would be legitimately interesting conversational material. General cynicism isn’t.

ChatGPT absolutely has limitations, and translation is a hard problem, even for the industry-standard tools. Cross-validation with multiple translation tools, and ideally humans, is important for any translations that actually matter.

janAkali · on Aug 14, 2023

Another challenge is that toki pona requires a lot of context, not only from previous sentences, but also visual and communal context. For example, I can say 'soweli lili' and point to a cat, then in all further conversations 'soweli lili' will mean exactly 'cat' until specified otherwise.

I could use a sentence that could literally mean a hundred thousand different things, but if I explain it properly once, you are expected to keep it as context.

ChatGPT4 will struggle to keep all context as it will surely accumulate.

dotancohen · on Aug 13, 2023

  > In this case, I validated against the hand-translated example in the main link. It did fine.

Presumably that hand-translated example was also part of the training data.

mistercow · on Aug 14, 2023

I've done a fair amount of translating Toki Pona based on my own sentences, and based on recent online posts. None of that would have been in the training set. It does a good job. Since Toki Pona translation is "lossy", you can even prompt it to give you multiple interpretations and discussion.

coder543 · on Aug 13, 2023

Not likely. The article says it was posted 1 year ago. ChatGPT’s training data cutoff is September 2021, so no, that would not make sense.

NoZebra120vClip · on Aug 13, 2023

Since LLMs' only valid uses involve entertainment and creativity, and Toki Pona is more or less a recreational language without serious application, I think they go together like two peas in a pod.

MostlyStable · on Aug 13, 2023

>only valid use...

Well. That's an opinion I suppose.

-edit- After posting, I decided this was too snarky/low value on its own. So to add:

I personally get quite a bit of value out of it as an ad-hoc German tutor and as one-among-many tools for research and learning on new topics for which I am not already an expert.

I've also used it to help me make and troubleshoot very simple code in languages that I am not an expert in.

I personally have trivially found lots of uses that probably all cluster around "educational" to one degree or another.

Is it perfect? Absolutely not. Does it hallucinate/get things wrong? Absolutely. Is it useful despite those limitations? Emphatically yes.

This idea that its "only valid use" is entertainment doesn't even make sense to me. "Entertainment" is perhaps where I find it the most deficient relative to other current options.

kordlessagain · on Aug 13, 2023

Lies are typically told on purpose, and in this example the word is used to indicate intent when there is none.

clankyclanker · on Aug 17, 2023

Nope:

https://news.ycombinator.com/item?id=35483823

devit · on Aug 13, 2023

Seems ineffective with abstract concepts.

"In mathematics, a real number is a number that can be used to measure a continuous one-dimensional quantity such as a distance, duration or temperature. Here, continuous means that pairs of values can have arbitrarily small differences. Every real number can be almost uniquely represented by an infinite decimal expansion."

->

"toki pi nanpa, nanpa lon li nanpa li ken lukin e suli tawa wan. lon ni la, nanpa tu li ken ala li poka li ken lili. nanpa lon ali li ken lukin kepeken nimi nanpa mute tawa pini ala"

->

"Talking about numbers, when a number is present, it can be seen to be important or significant to an individual."

"In this context, the number two might not be significant or might be seen as small or lesser."

"All numbers can be seen using many number names endlessly or without an end."

ChatGPT 4, obviously in separate chats: as you can see a complete failure to convey and roundtrip the original message.

tbodt · on Aug 13, 2023

I would definitely not attempt to translate something like that in one shot - I'd read it and digest it for a bit and come up with a way to say the same thing without being unnecessarily obtuse, Thing Explainer style.

mi nanpa e mute ijo la kipisi ijo li ken ala lon. ijo wan sin li kama la mi suli wan e nanpa. taso mi ken ala suli kipisi e nanpa. nanpa pi mute ijo la ni li ken ala: ijo tu en kipisi pi ijo wan li lon. taso mi nanpa e suli ijo la kipisi nanpa ale li ken. ijo li kama suli sin pi kipisi wan la mi ken suli pi kipisi wan e nanpa. kipisi li ken lili ale. nasin nanpa ni li ken nanpa e suli ijo e suli tenpo e suli seli. nanpa pi suli ijo la ni li ken: mi sitelen e nanpa la sitelen ni li ken nanpa ni taso li sama ala sitelen pi nanpa ante. taso nanpa mute la sitelen ona li suli ale li ken ala pini.

GPT3 doesn't translate this very well, it seems to miss the fine points of grammar, but maybe GPT4 would do better

coder543 · on Aug 14, 2023

This is what GPT4 came up with for the translation:

> The text you provided is in Toki Pona, a constructed language with a minimalistic vocabulary. The translation to English is challenging due to the highly context-dependent and simplified nature of the language. Without further context, here's a rough translation:

> "When I count many things, cutting things is not possible. When one new thing comes, I increase the count by one. But I can't increase the count by cutting it. In the case of counting many things, this is not possible: two things and a cut of one thing exist. But when I count large things, cutting all numbers is possible. When something becomes large by cutting one, I can increase the count by cutting one. Cutting can make everything small. This counting method can count large things, long times, and hot things. In the case of counting large things, it's possible: when I write a number, this writing can count it but is not the same as writing a different number. But with many numbers, its writing is always large and cannot end."

> Please note that the translation might not capture all the nuances, and the interpretation may vary based on the context in which the text is used.

tbodt · on Aug 14, 2023

Much closer

coder543 · on Aug 13, 2023

> Seems ineffective with abstract concepts

I’ve only heard of Toki Pona for about 20 minutes, but everything I’m reading says that’s just a normal property of the language, not a limitation of ChatGPT.

As a human, which of the 120 Toki Pona words would you confidently and correctly translate as “continuous” or “real” or any of these other linguistically uncommon concepts?

Based on my own chat with ChatGPT, I feel almost certain that it warned you about this exact limitation of the language, but you ignored that?

devit · on Aug 13, 2023

"continuous" (in this actually incorrect usage, since the correct term is "dense")

->

said of an algebraic structure such that for every x != y there exists z such that x < z and z < y

->

characteristic ("nimi"?) of a set ("poki") where for ("tawa?") all ("ale") thing ("ijo") different ("ante" + "lon"?) than other ("ante" + "nimi"?) thing ("ijo") there exists ("lon") a new ("sin") thing ("ijo") such that the first ("wan" + "nimi" + "ijo"?) thing is less ("lili") than the new thing ("sin" + "ijo"?) and the new thing ("sin" + "ijo"?) is less ("lili") than the second thing ("tu" + "nimi" + "ijo"?)

Would need some way to join into a grammatical structure and probably a preamble that establishes that the context is mathematics.
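For reference, the property being paraphrased above is usually written as follows for a strictly ordered set. The set name $S$ is introduced here only for illustration, and the premise is stated as $x < y$ rather than $x \neq y$, which is the usual textbook form and an assumption about what the comment intends:

```latex
\forall x, y \in S:\quad x < y \;\Longrightarrow\; \exists z \in S:\ x < z \,\wedge\, z < y
```

Every piece of the word-by-word gloss (all, thing, different, exists, new, less) maps onto one symbol here, which is why the Toki Pona rendering needs so many compound phrases.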

Aachen · on Aug 13, 2023

This gives a 404 for me

coder543 · on Aug 13, 2023

Sorry, I had just deleted it from my history without realizing that it would break the share link. Updated the link with a new session demonstrating the same thing now.

Rebelgecko · on Aug 14, 2023

What about going the other way? I'm on chat gpt3.5 FWIW, but when I ask it to translate simple phrases from English to Toki Pona (electric car, barking dog, etc) it hallucinates words that aren't in the ~120 word dictionary I found.
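A check like the one described here (spotting words that are not in the dictionary) can be sketched in a few lines. The word list below is only a partial set of words quoted elsewhere in this thread, not the full ~120-word dictionary, and both `unknown_words` and the sample word "elektrika" are hypothetical, made up for illustration.

```python
import re

# Partial word list for illustration only: words quoted elsewhere in this thread.
# A real check would load the complete dictionary instead.
KNOWN_WORDS = {
    "toki", "pona", "soweli", "li", "moku", "telo", "pimeja", "ilo",
    "suno", "mi", "sina", "nimi", "seme", "lili", "e", "ala", "pali",
}


def unknown_words(text, vocabulary=KNOWN_WORDS):
    """Return the sorted set of tokens in `text` that are not in `vocabulary`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sorted({token for token in tokens if token not in vocabulary})


print(unknown_words("toki! nimi sina li seme?"))   # → []
print(unknown_words("soweli elektrika li moku"))   # → ['elektrika']
```

One caveat with this sketch: it lowercases everything, so capitalized proper names, which are legitimate in Toki Pona but not part of the dictionary, would also be flagged and need separate handling.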

coder543 · on Aug 14, 2023

As you noticed, ChatGPT3.5 is extremely prone to hallucination. It can still happen on ChatGPT4, but it is much less frequent, in my experience.

Here is what I got on the first try: https://chat.openai.com/share/88b0358e-c96d-4880-b939-7c62cf...

The answers seem plausible, and the words are actual Toki Pona words.

My special instructions have nothing to do with language or translation, but for the sake of transparency I did it a second time with custom instructions disabled: https://chat.openai.com/share/b85a59d9-2678-45c8-8c2e-8b8a80...

Regardless, there is a very limited dictionary to work with in Toki Pona, so communicating extremely specific ideas seems like it is always going to be problematic. The language doesn’t even have a specific word for “dog”.

weinzierl · on Aug 13, 2023

I always wanted to train an LLM from scratch. There is an interesting project that does just that with Esperanto and my immediate thought was I'll do it with Toki Pona. Unfortunately, I had vastly underestimated the size of the existing Esperanto corpus and overestimated the size of the Toki Pona corpus.

So, I gave up on that idea because I believe that there is not enough training data, but I'd love to be proven wrong.

grondilu · on Aug 13, 2023

> There is an interesting project that does just that with Esperanto

That is intriguing. Link?

weinzierl · on Aug 14, 2023

https://huggingface.co/blog/how-to-train

andsoitis · on Aug 13, 2023

I just tried to say "hello what is your name" in toki pona in ChatGPT and the response was:

In toki pona, "hello what is your name" can be expressed as "toki! nimi sina li seme?"

Aachen · on Aug 13, 2023

GPT3(?) is not that good, making quite basic mistakes, as in, I can pick them out and I'm an absolute noob at toki pona. GPT4 link that someone posted in a sibling comment gives 404 for me so I can't judge that, please include versions when saying that some LLM can do it :)

cosmojg · on Aug 13, 2023

Here's an excerpt from the linked ChatGPT conversation:

> [User]: Please translate this Toki Pona into English:

> [User]: o! mi sona e toki pona. mi lukin sona e toki mute... taso mi toki ike e toki ale. toki mute li jo e nimi mute. nimi mute li pona ala tawa mi. toki pona li pona e mi tan ni. toki pona li jo nimi lili. nimi lili li pona e me

> [GPT-4]: Here's a translation of the provided Toki Pona text into English:

> [GPT-4]: "Oh! I know Toki Pona. I have studied many languages... but I speak all languages poorly. Many languages have many words. Many words are not good for me. Toki Pona is good for me because of this. Toki Pona has few words. Few words are good for me."

> [GPT-4]: Note: Toki Pona is a minimalistic constructed language with a small vocabulary. The translation may not capture all the nuances, but this is a close representation of the message in English.

bfung · on Aug 14, 2023

Lol nice. Reminds me of that one HN post a few years ago… https://news.ycombinator.com/item?id=31840331

> grug brain developer not so smart, but grug brain developer program many long year and learn some things although mostly still confused

int_19h · on Aug 16, 2023

You can see what happens when you ask an LLM (in this case I believe it's GPT-3.5) to use Toki Pona as the primary language while also providing English translations:

https://chirper.ai/tokipona?tab=activity

Deciding on usefulness of this experiment is left as an exercise for the reader.