Human Language

Human language is language used mostly by humans to communicate with each other; such languages are very hard for computers to handle (only quite recently have neural network programs become able to show true understanding of human language). They are studied by linguists. Human languages are most commonly natural languages, i.e. ones that evolved naturally over many centuries, such as English, Chinese, French or Latin, but there also exist a great number of so called constructed languages (conlangs), i.e. artificially made ones such as Esperanto, Interslavic or Lojban. All of these are still human languages, as opposed to e.g. computer languages such as C or XML. Natural human languages practically always show significant irregularities (exceptions to general rules), while constructed languages typically try to eliminate irregularities as much as possible so as to be easier to learn, but even a constructed human language is still extremely difficult for a computer to understand.

{ The following is a thought dump made without much research, please inform me if you're a linguist or something and have something enlightening to say, thank you <3 ~drummyfish }

On one hand human languages are cool when viewed from a cultural or artistic perspective: they allow us to write poetry and to describe feelings and the nature around us -- in this way they can be considered beautiful. However, from the perspective of others, e.g. programmers or historians, human languages are a nightmare. There is unfortunately an enormous, inherent curse connected to any human language, both natural and constructed, that comes from its inevitably fuzzy nature stemming from the fuzziness of real life concepts: the problem of defining the semantics of words and constructs. Syntax (i.e. the rules that say which sentences are valid and which are not) doesn't pose such a problem, we can quite easily define what's grammatically correct and what is not (it's not that hard to write a program that checks grammatical correctness). It is semantics (i.e. meanings) that is extremely hard to grasp. Even in rigorous languages (such as mathematical notation or programming languages) semantics is harder to define than syntax (quite often still relying on bits of human language), but while in a programming language we are essentially able to define quite EXACTLY what each construct means (e.g. a + b returns the sum of values a and b), in a natural language we are basically never able to do that: we can only ever form fuzzy connections between other fuzzy concepts and we can never have anything fixed.
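To illustrate the gap between syntax and semantics, here is a minimal sketch of a grammar checker for a tiny made-up fragment of English. The vocabulary, the single sentence pattern and the function name `is_grammatical` are all assumptions chosen for illustration; a real natural language grammar is vastly bigger. The point is only that checking form is mechanical, while the checker has no idea whether the sentence means anything.

```python
# Toy vocabulary (hypothetical, for illustration only).
ARTICLES = {"the", "a"}
NOUNS = {"fish", "dog", "idea", "number"}
VERBS = {"eats", "sees", "wants"}

# The only sentence shape this toy grammar accepts:
# ARTICLE NOUN VERB ARTICLE NOUN
PATTERN = (ARTICLES, NOUNS, VERBS, ARTICLES, NOUNS)


def is_grammatical(sentence: str) -> bool:
    """Return True if the sentence matches the toy grammar's single pattern."""
    words = sentence.lower().split()
    if len(words) != len(PATTERN):
        return False
    # Each word must belong to the word class expected at its position.
    return all(word in allowed for word, allowed in zip(words, PATTERN))


if __name__ == "__main__":
    for s in ("the dog sees a fish",      # valid and sensible
              "the idea eats a number",   # valid form, dubious meaning
              "dog the sees fish a"):     # invalid form
        print(s, "->", is_grammatical(s))
```

Note that "the idea eats a number" passes the check just as well as "the dog sees a fish": the program decides validity of form exactly, but says nothing about meaning.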

Due to this fuzziness human languages inevitably change over time no matter how hard we try to counter this. Any text written a few thousand years ago is nowadays very hard to understand -- not because the old languages aren't spoken anymore, but because the original meanings of specific words, phrases and constructs have been distorted by time; when learning an old language we learn what each word meant by reading its translation to some modern word, but the modern word is always more or less different. Even if it's a very simple word such as "fish", our modern word for fish means a slightly different thing than, let's say, the ancient Romans' word for fish, because that word had slightly different connotations such as potential references to other things: fish for example used to be the symbol of Christianity, while nowadays people don't even commonly make this connection. Some words may have referred to some contemporary "meme" that's been long forgotten, and if some text makes the reference, we won't understand it. While the Spanish word "perro" translates to English as "dog", the meanings aren't the same; English speaking gangsters use the word as a synonym for "friend" but in Spanish the word can be used as an insult, so shouting "perro" and "dog" in the street may lead to different images popping up in the heads of those who hear it. How do you describe a word precisely if you can only describe it with other imprecise words that are changing constantly? No, not even pictures will help -- if you attach the picture of a cat to the word "cat", it's still not clear what it means -- does it stand for only the one cat that's in the picture or for all other animals that are similar to the one in the picture? How similar? Is a lion a cat? Is a toy cat or a cartoon cat a cat? Or does the picture signify that anything with fur is a cat? Now imagine describing a more abstract term such as thought, number or existence.
There is no solid ground, even such essential words as "to want" or "to be" have different meanings between languages ("to be" can stand for "to exist", "to be in a place", "to temporarily have some property", "to permanently have some property" etc.). Even dictionaries admit defeat and are happy with having circular definitions: because there aren't any foundations to build upon, circular definitions are inevitable, and dictionaries just help you connect fuzzy concepts together. All of this extends to tenses, moods, cases and everything else. This can be very well seen e.g. with people interpreting old texts such as the Bible; for example some say Jesus claimed to be the son of God while others reject it, saying that even if he stated the sentence, it actually wasn't meant literally as it was a commonly used phrase that meant something else -- these people will argue about everything and they can comfortably interpret the same text in completely opposite ways. The point is that we just can't know.

This is a grand issue that people often overlook: it sadly allows any word to be twisted by politicians into anything they want, it destroys old knowledge and prevents us from communicating with clarity. This issue is very hard to solve, maybe impossible. It seems that due to the extreme complexity of real life our language can't operate with precise equations but rather has to settle for concepts that are just fuzzy blobs that our brains -- neural networks in our heads -- learn by trial and error over many years. We learn that if we hear the word X, it's best to react by feeling fear or turning our head or closing our eyes etc.

{ The only idea of a solution on how to make a "mathematically precise" human language for real world communication is the following. Firstly make a mathematical model of some artificial world that's similar to ours; for simplicity we can now just consider something like a 2D grid with differently colored cells, i.e. something like a cellular automaton. The world changes in steps and each cell can "talk", i.e. at any step it can emit a text string. Now make a language that's precisely defined in this world; if the world is simple, it's pretty doable e.g. like this: write a function in some programming language that takes the world and checks if what the cells are saying classifies as your language used in a correct way within this world (so the function just returns true/false, nothing else is needed). Now this single function mathematically defines your language -- by looking at your function's source code anyone can derive the absolutely correct meaning of any word or sentence because he can see how the function checks whether that word or phrase is used correctly, he will know exactly which situations fit a given sentence and which don't. Now the final step is only to find a correspondence between real life and your simplified mathematical world, e.g. that cells represent humans and so on (but this will have shortcomings, e.g. our simple world will make it difficult or impossible to talk about body parts since cells have none; also making the connection between the mathematical world and the real world relies on intuition). ~drummyfish }
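The idea above can be sketched roughly as follows. Everything concrete here is an assumption made up for illustration: the three colors, the tiny language (sentences of the form "SUBJECT is COLOR", where SUBJECT is "self" or a compass direction naming a neighboring cell) and the function names. For brevity the sketch also checks only a single step of the world rather than its whole history. What it is meant to show is that the true/false function itself serves as the definition of the language.

```python
from typing import Dict, List, Tuple

# Hypothetical toy world: a grid of colored cells.
COLORS = {"red", "green", "blue"}
# Direction words of the toy language mapped to (row, column) offsets.
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}

Grid = List[List[str]]               # color of every cell
Speech = Dict[Tuple[int, int], str]  # (row, col) -> sentence emitted this step


def sentence_ok(colors: Grid, row: int, col: int, sentence: str) -> bool:
    """Check one sentence said by the cell at (row, col)."""
    words = sentence.split()
    # Form: exactly "SUBJECT is COLOR".
    if len(words) != 3 or words[1] != "is" or words[2] not in COLORS:
        return False
    subject, claimed = words[0], words[2]
    if subject == "self":
        r, c = row, col
    elif subject in DIRECTIONS:
        dr, dc = DIRECTIONS[subject]
        r, c = row + dr, col + dc
    else:
        return False  # unknown subject word
    # Talking about a neighbor outside the grid counts as incorrect use.
    if not (0 <= r < len(colors) and 0 <= c < len(colors[0])):
        return False
    # Meaning: the sentence is used correctly only if the claim holds.
    return colors[r][c] == claimed


def language_used_correctly(colors: Grid, speech: Speech) -> bool:
    """Return True if everything said in this step is correct use of the language."""
    return all(sentence_ok(colors, r, c, s) for (r, c), s in speech.items())


if __name__ == "__main__":
    world = [["red", "blue"],
             ["green", "red"]]
    print(language_used_correctly(world, {(0, 0): "east is blue", (1, 1): "self is red"}))
    print(language_used_correctly(world, {(0, 0): "south is blue"}))
```

Anyone unsure what "north" or "is" means in this toy language can read `sentence_ok` and find out exactly; the hard part, as noted above, remains mapping such a world onto real life.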