_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (inofficial)
URI Visit Hacker News on the Web
COMMENT PAGE FOR:
URI Show HN: I trained a 9M speech model to fix my Mandarin tones
kris_builds wrote 29 min ago:
Super interesting project. Curious about the data collection - did you
record yourself, use existing datasets, or both? I've been thinking
about building something similar for Hebrew vowels (which are often
omitted in writing). Would love to hear what the hardest part of the
pipeline was.
victorbjorklund wrote 35 min ago:
Cool. Would love a write up about how you did it if you have time
kris_builds wrote 27 min ago:
+1 on wanting a writeup. The model architecture choices alone would
be interesting - did they use a transformer, CNN, or something
hybrid? And how they handled the tone pair ambiguities... Would read
that blog post for sure.
tomaytotomato wrote 55 min ago:
Can the implementation used here for tone and pronunciation apply to
Music?
It would be cool if a model could tell you if you are singing or
playing a piece of music with the right intonation and other ways.
contingencies wrote 1 hour 36 min ago:
Man, get a girlfriend.
zelphirkalt wrote 2 hours 5 min ago:
I think this is a good time for a shameless plug. For the last 2 months or
so I have been working on my own project [1] for learning more characters. I
have made a tool with powerful search function, training mode, and
other useful features, such as displaying plots that show you your
progress and whether you are reaching your daily training goal, and the
ability to save searches, a la Thunderbird saved filters. It is written
in Python and oldschool tkinter with custom widgets for a somewhat more
modern and capable feel. It is very configurable. Though currently
configuring it means touching a JSON file, as I have not yet bothered
writing GUI for that.
I am mostly developing this for myself, to have the perfect tool for
me, but I dare say, that I have not seen anything comparable and that I
let my 10y+ experience in learning Chinese influence my design
decisions. Oh, and it is free /libre software of course (AGPL). It
comes with an ever improving vocabulary file that has tons of metadata
about words, their usage, how to memorize them, etc. under ODbL (open
database license).
[1]
URI [1]: https://codeberg.org/ZelphirKaltstahl/xiaolong-dictionary
peterburkimsher wrote 1 hour 54 min ago:
Good to see that there are others learning and creating! Another
shameless plug for my translator site: [1] It takes text, adds
colours for tones, pinyin, literal, and parallel translations.
There's also a character decomposition tool at the bottom of the
page which can be helpful if you're able to recognise half a
character but can't remember the pronunciation for typing it.
The YouTube channel has some song lyrics, movie subtitles, and audio
Bible that might help with learning.
URI [1]: https://pingtype.github.io
redleader55 wrote 2 hours 51 min ago:
This is very cool to have! Thanks for putting in the time to build it.
For me it doesn't work very well. Even easy phrases like 他很忙 get
transcribed completely randomly as "ma he yu". Is it maybe over-fitted to
some type of voice?
namelosw wrote 2 hours 53 min ago:
Impressive work! The idea and the UI are very intuitive.
Though, as a guy from Beijing who speaks perfect Mandarin, I
struggle to pass even the easy ones… So it could definitely use some
improvements. The example 你好吃饭了吗 returns hào → hǎo,
fān → fàn, le → liǎo. The first two are the model mishearing my
tones, and the last one should be le instead of liǎo in this
context.
Also I see in the comment section that people are worried about tones. I can
guarantee tones are not particularly useful and you can communicate
with native speakers with all the tones messed up and that's
perfectly fine. Because as soon as you leave Beijing, you'll find all
the tones are shuffled because every region has its own dialect
and accent, which doesn't stop people from communicating at all. So
don't let tone stuff slow your learning process down.
zelphirkalt wrote 2 hours 20 min ago:
About the tones not being as useful ... I think there are cases, in
which they matter. Take for example 熊猫 and 胸毛: "有 xiongmao
吗？" "Are there pandas?" or "Do you have chest hair?". Another
one: 时间 and 事件. Sometimes it gets comical, but natives can
and some will be confused, when your tones are off by too much, and
the conversation just started, so that the context is not as narrowed
down. Context is key in the language. You can notice that, when you
are trying to join a conversation between natives. Until you
understand a phrase or most of a phrase, that gives you a hint for
the topic they are talking about, you will usually have a hard time
understanding anything.
I just tried the tool and it couldn't properly recognize a very
clearly pronounced "å" and instead heard some shi2. I think it
needs more training data or something. Or one needs a good mic.
tianqi wrote 2 hours 26 min ago:
Please allow me to share some of my views. I'm a native Mandarin
speaker.
> I can guarantee that tones are not particularly useful and that you
can communicate with native speakers with all the tones messed up,
and that's perfectly fine.
Not at all. Tones are extremely important. If you have all the tones
messed up, you can hardly communicate in Mandarin. It's true, as you
said, that different regions of China have different dialects, and
you'll find that people can communicate normally because: 1) The
tonal differences in nearby regions are not too significant, and
people can still try to understand based on context. And 2) In many
cases, people switch to regular Mandarin when their dialects cannot
communicate with each other. This is why Mandarin exists. It is an
officially regulated dialect that all Chinese people learn, to solve
the dialect problem among different regions. Chinese people may speak
their own dialects at hometown, but when two Chinese people meet and
find that their dialects cannot communicate, they immediately switch
to Mandarin. Therefore, the tones in Mandarin are very important. To
a considerable extent, Mandarin exists because of tones. You cannot
communicate in it with messed up tones.
mijoharas wrote 2 hours 32 min ago:
I feel like there is a commonly mentioned idea that "speaking a
foreign language is easier after having a drink or two".
I've found that especially true with Mandarin because (I think) a
beginner speaker is more likely to speak a little more quickly which
allows the listener to essentially ignore the occasional incorrect or
slightly mispronounced tone and understand what they're trying
to say.
(This is anecdotal, but with n>1. Discussed and observed with other
Mandarin language learners)
samiv wrote 2 hours 41 min ago:
As a person who lived in Taiwan and reached C1 in Chinese, I can also
say that the tones are indeed less important than one might think
once one can say more and communicate more context. In the beginning
when you're very limited in your expressive capacity and only can say
simple sentences there's less context and getting the tones wrong
does produce confusion.
"Because as soon as you leave Beijing, you'll find all the tones
are shuffled because every region has its own dialect and
accent, which doesn't stop people from communicating at all."
Isn't this in fact one of the reasons why China relies heavily on the
written language, because the different regions lose vocal
communication ability as the changes in tones and pronunciations
render the language unintelligible to people from other regions?
zelphirkalt wrote 2 hours 16 min ago:
The point about being a beginner and having limited capacity to
express oneself is an important point. When you can say more, you
will also have learned more about the language's tendency to use
words of 2 syllables, rather than 1 syllable words. Using 2
syllables instead of 1 already removes a lot of ambiguity, and
people will understand you better.
JCharante wrote 4 hours 0 min ago:
Cool! I'm not great at Chinese but I have to speak slowly for it to
recognize the tones/words. I wonder how fast the training data is.
mentalgear wrote 4 hours 7 min ago:
Very cool ! Will you make the source available as well?
yunusabd wrote 4 hours 11 min ago:
Super nice, thanks for sharing!
There's one thing that gave me pause:
In the phrase 我想学中文 it identified "wén" as "guó". While my
pronunciation isn't perfect, there's no way that what I said is closer
to "guó" than to "wén".
This indicates to me that the model learned word structures instead of
tones here. "Zhōng guó" probably appears in the training data a lot,
so the model has a bias towards recognizing that.
- Edit -
From the blog post:
> If my tone is wrong, I don't want the model to guess what I meant.
I want it to tell me what I actually said.
Your architecture also doesn't tell you what you actually said. It just
maps what you said to the likeliest of the 1254 syllables that you
allow. For example, it couldn't tell you that you said "wi" or "wr"
instead of "wo", because those syllables don't exist in your setup.
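A toy sketch of the closed-vocabulary point above, not the author's actual model: whatever scores a classifier produces, picking the likeliest entry from a fixed syllable list can only ever return something on that list. The four-syllable inventory and the scores are made up for illustration.

```python
import math

# Hypothetical, tiny stand-in for the closed syllable inventory
# (the real one is said to have 1254 entries).
SYLLABLES = ["wo3", "wu3", "wen2", "guo2"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode(scores):
    """Return the most probable in-inventory syllable and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SYLLABLES[best], probs[best]

# Made-up scores for an utterance that was really "wi": no entry fits well,
# but the decoder still has to answer with one of the known syllables.
label, p = decode([1.2, 1.0, 0.3, 0.1])
print(label, round(p, 2))
```

A low winning probability is the only hint that the input may have been something outside the inventory, so a tool could surface that number instead of a confident-looking syllable.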
vjerancrnjak wrote 2 hours 52 min ago:
I tried just repeating guó for as many times as symbols and
repetition was not recognized.
Although I like the active aspect of the approach. Language apps
where sound is the main form of learning should have a great
advantage, as any written text just confuses as every country has its
own spin on orthography. Even pinyin, despite making sense, for a
beginner, has so many conflicting symbols.
yunusabd wrote 2 hours 37 min ago:
> I tried just repeating guó for as many times as symbols and
repetition was not recognized.
Can you elaborate? I'm not sure I understand.
sim04ful wrote 4 hours 24 min ago:
I'm also working on a Chinese learning app (heyzima.com) and my
"solution" to this was to use the TTS token/word log probabilities.
felixbecker wrote 4 hours 44 min ago:
What a brilliant project!
wenjian wrote 4 hours 51 min ago:
Chinese here, some of the tones are wrong, maybe the env here has some
noise, good luck on learning Mandarin ;)
arjie wrote 5 hours 18 min ago:
Very cool. As a super newbie who's only made it to Pimsleur 15 and only
for the speaking, it would be cool to have a pinyin text entry and so
on. In the end, I just type into ChatGPT what I want and paste it in
your box so it's not a big deal.
martianlantern wrote 5 hours 29 min ago:
Nice! I need something similar for english now
olalonde wrote 5 hours 53 min ago:
It might be a mic issue but my wife, who is a native speaker, seems to
get most characters wrong. I will try again later in a quieter place
to see if that helps.
frozennothing wrote 6 hours 59 min ago:
This is really cool. Thank you for sharing. Before now I had not sought
to understand how this technology works under the hood, but seeing it
done at this scale made me curious to see if I could do something
similar.
cocoa19 wrote 7 hours 7 min ago:
Have you tried the Azure Speech Studio? I wonder how your custom model
compares to this solution.
I played around with python scripts for the same purpose. The AI gives
feedback that can be transformed to a percentage of correctness. One
annoyance is that for Mandarin, the percentage is calculated at the
character level, whereas with English, it gives you a more granular
score at the phoneme level.
dirteater_ wrote 6 hours 23 min ago:
IMO the SotA for this is [1] . Amazon suffers for similar
> One annoyance is that for Mandarin, the percentage is calculated at
the character level, whereas with English, it gives you a more
granular score at the phoneme level.
This is the case for most solutions you'd find for this task.
Probably because of the 1 character -> 1 syllable property. It's
pretty straightforward to split the detected pinyin into
initial+final and build a score from that though.
URI [1]: https://www.speechsuper.com/
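A minimal sketch of the initial+final split mentioned above, assuming the recognizer gives back numbered pinyin such as "shi4". The helper names and the equal weighting of initial, final and tone are assumptions for illustration, not how any of the named services score.

```python
# Pinyin initials, two-letter ones first so "zh" wins over "z".
# "y" and "w" are treated as initials here for simplicity.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_syllable(syllable):
    """Split numbered pinyin like 'shi4' into (initial, final, tone)."""
    tone = None
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    return "", syllable, tone  # zero-initial syllables such as "an" or "er"

def score(expected, detected):
    """Fraction of the three parts (initial, final, tone) that match."""
    exp = split_syllable(expected)
    det = split_syllable(detected)
    return sum(e == d for e, d in zip(exp, det)) / 3

print(split_syllable("shi4"))   # ('sh', 'i', 4)
print(score("shi4", "si4"))     # only the initial differs
print(score("wen2", "guo2"))    # only the tone matches
```

Scoring the parts separately means a wrong retroflex ("shi" heard as "si") is penalised less than a completely different syllable, which is closer to the phoneme-level feedback available for English.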
bunderbunder wrote 7 hours 22 min ago:
This is very cool, but from one Mandarin learner to another I'd
caution against relying too heavily on any external feedback mechanism
for improving your pronunciation.
If you can't easily hear your pronunciation mistakes so clearly it
hurts, consider putting more energy into training your ear. Adult
language learners usually have brains that have become resistant to,
but not incapable of, changing the parts of the brain responsible for
phoneme recognition. The neuroplasticity is still there but it needs
some nudging with focused exercises that make it clear to your brain
exactly what the problem is. Minimal pair recognition drills, for
example, are a great place to start.
It's not the most fun task, but it's worth it. You will tighten the
pronunciation practice feedback loop much more than is possible with
external feedback, so a better accent is the most obvious benefit. But
beyond that, it will make a night and day difference for your listening
comprehension. And that will get you access to more interesting
learning materials sooner. Which hopefully increases your enjoyment and
hence your time on task. Plus, more accurate and automatic phoneme
recognition leaves more neurological resources free for processing
other aspects of your input materials. So it may even help speed things
like vocabulary and grammar acquisition.
barrell wrote 5 hours 39 min ago:
I'm building a language learning app [ [1] ] and this is really
good advice. I've not had any interest in STT for the application,
and have no plans to integrate it. In my experience, I've never
seen it be truly beneficial in the language learning process.
What has been extremely beneficial has been having the text and audio
force-aligned and highlighted, karaoke-style, every time I hear the
audio. It has improved my phoneme recognition remarkably well with
remarkably little content. Several users also report the same thing -
that even native speech feels a lot more like separate words than
just a slew of sounds. I attribute this in large part to
this karaoke-style audio. It works better for phonetic scripts, so I
would recommend using this with pinyin/jyutping/furigana for
character-based languages.
For production, when I was at Regina Coeli (world-class language
institute) their main thing was just 1. you hear a short passage in
Dutch, 10-40 words 2. you record yourself reading the same passage
and 3. you play back the two audio tracks on top of one another and
listen for the difference. Optional step 4. Re-record and replay
until it's close enough.
There was no grading, no teacher checking recordings, no right or
wrong; just hundreds of random sentences and a simple app to layer
them. You needed to learn to hear the differences yourself and
experiment until you no longer could. (fwiw this is not present in
phrasing, I just found it relevant. One day soon I hope to add it!)
URI [1]: https://phrasing.app
zdc1 wrote 6 hours 44 min ago:
I completely agree with this. There's a certain confidence you get
when you can hear a word you don't know, but can still comprehend it
well enough to know what pinyin to type into your dictionary app.
Mandarin Blueprint has a nice pinyin pronunciation video on YouTube
that I worked through a while ago, and then followed with a few weeks
of immersion in Taiwan, I was able to really pick out what people
were saying.
I feel like listening is the key to speaking. You don't necessarily
need to rote learn the tones for each word. You just need to say words
as you hear them spoken by others.
memalign wrote 7 hours 48 min ago:
I wish this had a pinyin mode…! I am learning to speak Mandarin but I
am not learning to read/write.
( I'm learning using a flashcards web app I made and continue to
update with vocab I encounter or need: [1] )
URI [1]: https://memalign.github.io/m/mandarin/cards/index.html
siwatanejo wrote 3 hours 4 min ago:
+1 for pinyin
knocte wrote 3 hours 3 min ago:
+1
data_ders wrote 7 hours 47 min ago:
same! but if you get it inevitably wrong the first time it gives you
the pinyin. but i struggled to get it to transcribe the consonants I
was making let alone the tones. i'm pretty sure i'm not as bad as
that!
rablackburn wrote 7 hours 56 min ago:
> And if there's one thing we've learned over the last decade,
it's the bitter lesson: when you have enough data and compute,
learned representations usually beat carefully hand-tuned systems.
There are still holdouts!
Come back to me in a couple of decades when the trove of humanity's
data has been pored over and drifted further out of sync with
(verifiable) reality.
Hand-tuning is the only way to make progress when you've hit a domain's
limits. Go deep and have fun.
tifan wrote 7 hours 57 min ago:
Well, it would work only when I speak word by word, not as a sentence
or at a normal speed for daily conversations. The model thinks I was
making mistakes when I speak casually (as a native Chinese speaker, I
had Mandarin 2A certification, which is required for teachers or other
occupations that require a very high degree of Mandarin accuracy). You
wouldn't really notice it but pronunciation is very
different between casual and formal speech…
iamanllm wrote 8 hours 6 min ago:
holy crap, I was literally imagining how I wanted something exactly like
this yesterday! you are a hero!
baby wrote 8 hours 24 min ago:
For people trying to say the "j" sound correctly, as in "jiu" (old),
just say "dz", so in that example "dziu"
stuxnet79 wrote 8 hours 37 min ago:
How difficult would it be to adapt this to Cantonese? It is a
surprisingly difficult language to learn. It has more tones than
Mandarin plus comparatively less access to learning resources (in my
experience)
inkyoto wrote 5 hours 23 min ago:
Unlike Mandarin and other Chinese languages, Cantonese does not have
tone sandhi and has changed tones instead.
Cantonese tones are also different from those of Mandarin, so no, it
can't be adapted for Cantonese and it would require a complete
rework.
> It is a surprisingly difficult language to learn.
I keep hearing this quite a bit, but I do not find Cantonese to be
any more difficult than most languages[0]. Or at least we would need
to define a metric based on which we could assess the difficulty. If
it is the number of tones, their number (six, and no, not nine) may
look formidable at first, but they are, in fact, rather simple tones
and broadly fall into three categories: flat, rising, and falling. As
a random example, Cantonese does not even have a dipping tone.
In comparison, «fancy» tones of Vietnamese are significantly more
challenging or even difficult: they can curl and unfurl (so to
speak).
[0] That crown appears to belong to Archi, with honourable mentions
going out to Inuit, Basque, Georgian, Navajo, Yimas and several other
polysynthetic languages.
hnfong wrote 4 hours 7 min ago:
Cantonese is "hard" mainly for two reasons-
1. tones, and generally the gatekeeping of some Cantonese
communities towards people who haven't gotten the tones completely
right
2. the lack of learning materials relative to the number of
speakers, the confusion between written Chinese and written
Cantonese (and also the general lack of the latter)
As they say, "a language is a dialect with an army and navy"...
I'll leave it at that.
ChadNauseam wrote 8 hours 46 min ago:
This is amazing. I'm also working on free language learning tech. (I
have some SOTA NLP models on huggingface and a free app.) My most recent
research is a list of every phrase [0].
Pronunciation correction is an insanely underdeveloped field. Hit me up
via email/twitter/discord (my bio) if you're interested in collabing.
[0]:
URI [1]: https://gist.github.com/anchpop/acbfb6599ce8c273cc89c7d1bb363e...
SequoiaHope wrote 8 hours 50 min ago:
Amazingly I just did the same thing! Only with AISHELL. It needs work.
I used the encoder from the Meta MMS model.
URI [1]: https://github.com/sequoia-hope/mandarin-practice
byb wrote 8 hours 50 min ago:
Neat. A personal tone trainer. Seriously, shut up and take my money
now. Of course, it needs a vocabulary trainer, and zhuyin/traditional
character support.
dionian wrote 8 hours 59 min ago:
it heard wu2 but i heard wo2 from you fine. and it should sound like
wo2 not wo3 if spoken quickly. not a native speaker though so i could
be wrong
jrockway wrote 9 hours 1 min ago:
Interesting application! A friend of mine built a model like this to
help her make her voice more feminine, and it is neat to see a similar
use case here.
nirvanatikku wrote 9 hours 20 min ago:
talk about 30 seconds to wow. great app, UX and demo. would love to use
this. kudos.
cmuguythrow wrote 9 hours 20 min ago:
Awesome idea!
bytesandbits wrote 9 hours 34 min ago:
great work! I am going to try it out. Currently about to learn some
Mandarin to be able to talk with hawker stand owners for a trip I am
doing soon. I am trilingual and can speak a few languages on top of
that, but none of them tonal. I am new to tonal languages and I find
myself struggling with this... a lot!
anonzzzies wrote 9 hours 26 min ago:
good luck! I speak 6 languages fluently but none of them tonal and I
find mandarin very challenging; it does not help that people in
places where you might need it are not very forgiving; asking for
green fork in a tea shop has people very bewildered.
ecshafer wrote 9 hours 38 min ago:
For any native speaker of a European language who hasn't tried to
learn Chinese or some other tonal language, it's really hard to
understand how hard it is. The tones can really be very subtle, and
your ear is not fine tuned to them. So you think you are saying it
right, but native speakers have no idea what you are saying.
DiogenesKynikos wrote 3 hours 52 min ago:
The tones are really not as difficult as people make them out to be.
90% of the effort in learning any language is just learning massive
amounts of vocabulary.
Things like tone and grammar are the very basics that you learn right
at the beginning.‡ Beginners complain about them, but after a few
months of studying Chinese, you should be fairly comfortable with the
tones. Then, you spend years learning vocabulary.
The two things that make Chinese difficult are:
1. The lack of shared vocabulary with Indo-European languages (this
obviously doesn't apply if your native language is something with
more shared vocabulary with Chinese).
2. The writing system, which because it's not phonetic requires
essentially the same level of effort as learning an entirely new
language (beyond spoken Chinese).
‡. The same goes for grammar issues (like declension and
conjugation) that people always complain about when learning
Indo-European languages. These are the very basics that you learn
early on. Most of the real effort is in learning vocab.
snicky wrote 2 hours 3 min ago:
> 2. The writing system, which because it's not phonetic requires
essentially the same level of effort as learning an entirely new
language (beyond spoken Chinese).
This is an interesting observation. Another one that I sometimes
mention to my friends who didn't have an occasion to learn Chinese
before is that in this language speaking, reading and writing are
actually 3 separate components. You can read characters without
knowing how to write them properly or even remembering them
entirely. Lots of my Taiwanese acquaintances forget how to write
certain characters, because nowadays most of the text they write is
in bopomofo on their phones. Bopomofo represents sounds, so
basically knowing how an expression sounds and being able to read
the character (pick it from a set of given characters for the
chosen sound) is enough to "write" it.
vjvjvjvjghv wrote 8 hours 0 min ago:
Agree. It's really hard. It also explains why a lot of people born
in China tend to make serious pronunciation errors when speaking
English or German. They are used to focusing on different things than us
westerners.
It took me a very long time to really understand how important tone
is in Chinese.
DiogenesKynikos wrote 3 hours 41 min ago:
The reason why Chinese people have difficulty pronouncing
Indo-European languages is that Chinese has a very limited set of
syllables, and they always follow the pattern (consonant) + vowel +
(nasal/rhotic consonant), with possibly one of the consonants being
dropped.
Chinese does not have clusters of consonants like "rst" in "first."
The closest thing in Chinese phonology to "first" would be
something like "fi-re-se-te." If you grow up never pronouncing
consonant clusters, they are incredibly difficult to learn.
This is all related to the existence of tones, but tones are not
the direct reason why Chinese people have difficulty pronouncing
words like "first." Tone provides one additional way of
differentiating syllables, so Chinese can get away with having far
fewer syllables than non-tonal languages. You essentially get 4-5
different versions of every syllable.
danparsonson wrote 8 hours 15 min ago:
Wholeheartedly (or maybe downheartedly?) agree with this - sometimes
I try to say the simplest things and people just stare at me like I'm
speaking Martian. Which I suppose I might as well be! One of my big
problems is implicit use of tones for things like expressing
uncertainty; that's a very difficult habit to get out of.
bunderbunder wrote 7 hours 6 min ago:
Another one that I wish I had realized sooner is that, contrary to
the impression teachers tend to convey, tones aren't just a pitch
contour thing. There are also intensity and cadence elements.
Native speakers can fairly accurately recognize tones in recordings
that have had all the pitch contour autotuned out.
laurieg wrote 8 hours 52 min ago:
For someone who hasn't grown up speaking a language with tones or
pitches, the process of learning them can be maddening. I applaud
anyone who makes tools like this to try to make the process easier.
My experience in learning Japanese pitch accent was eye-opening. At
the start, I couldn't hear any difference. On quizzes I essentially
scored the same as random guessing.
The first thing that helped me a lot was noticing how there were
things in my native language (English) that used pitch information.
For example, "uh-oh" has a high-low pitch. If you say it wrong it
sounds very strange. "Uh-huh" to show understanding goes low-high.
Again, if you reverse it it sounds unusual.
The next part was just doing lots of practice with minimal pairs.
Each time I would listen and try my best to work out where the pitch
changed. This took quite a lot of time. I feel like massed practice
(many hours in a day) helped me more than trying to do 10 minutes
regularly. Try to hear them correctly, but don't try too hard. I
didn't have any luck with trying harder to 'understand' what was
going on. I liken it to trying to learn to see a new color. There
isn't much conscious thought.
The final piece of the puzzle was learning phrases, not individual
words, that had pitch changes. For example: "yudetamago" could be
boiled egg or boiled grandchildren. Somehow my brain just had a much
easier time latching on to multi-word phrases instead of single
words. Listening to kaki (persimmon) vs kaki (oyster) again and
again seemed much harder.
Of course, your mileage may vary with these techniques. I already
spoke decent Japanese when I started doing this.
ronyeh wrote 5 hours 54 min ago:
> For example, "uh-oh" has a high-low pitch. If you say it wrong it
sounds very strange. "Uh-huh" to show understanding goes low-high.
Again, if you reverse it it sounds unusual.
Wow… Thanks for making it clear that English also has tones! I
hadn't thought of it this way before. "Uh-huh" sounds similar
to Mandarin tones 3 & 2. "Uh-oh" is similar to Cantonese tones
1 & 3.
I'm wondering if we can find good examples to teach the Mandarin
tones. I think two or three syllable words are best because they
illustrate the contour of the tones.
dionian wrote 8 hours 56 min ago:
It's critical because without proper tonal enunciation the words can
be ambiguous.
cyberax wrote 9 hours 25 min ago:
I'm a native Russian speaker, and I decided to learn Mandarin,
because it's linguistically almost the opposite of Russian.
I had no problems with tone pronunciation, but tone recognition was
indeed much trickier. I still often get lost when listening to fast
speech although I can follow formal speech (news) usually without
problems.
thenthenthen wrote 3 hours 21 min ago:
Euro speaker here, no problem with recognising tones but speaking
them…:/
barrell wrote 5 hours 59 min ago:
I recently started learning a tonal language, and so far have not
struggled too much wrt tones when everything is slow. There was an
original strangeness and refusal for my vocal cords to want to work
that way, but probably only for the first month or so.
At least, this is the case for slow text. Once the text is sped up
it's amazing how my brain just stops processing that information.
Both listening and speaking.
I'm sure this will come with practice and time but for now I find
it fascinating
dapangzi wrote 9 hours 42 min ago:
Longtime lurker, made an account specifically to give feedback here as
an intermediate speaker. :)
This is a great initiative and I hope to see more come out of this; I
am not criticizing, but just want to provide my user experience here so
you have data points.
In short, my experience lines up with your native speakers.
I found that it loses track of the phonemes when speaking quickly, and
tones don't seem to line up when speaking at normal conversational
speed.
For example, if I say 他是我的朋友 at normal conversational
speed, it will assign `de` to æ, sometimes it interprets that I
didn't have the retroflexive in `shi` and renders it `si`. Listened
back to make sure I said everything, the phonemes are there in the
recording, but the UI displays the wrong phonemes and tones.
By contrast, if I speak slowly and really push each tone, the phonemes
and tones all register correctly.
Also, is this taking into account tone transformation? Example, third
tones (bottom out tone) tend to smoosh into a second tone (rising) when
multiple third tones are spoken in a row. Sometimes the first tone
influences the next tone slightly, etc.
Again, great initiative, but I think it needs a way to deal with speech
that is conversationally spoken and maybe even slurred a bit due to the
nature of conversational level speech.
mercanlIl wrote 7 hours 32 min ago:
The tool definitely needs to address tone transformations, it's a
big part of how the language is spoken. Otherwise it's mostly
useful for a first year student speaking in isolation.
Hoping to see improvements in this area
sqs wrote 7 hours 54 min ago:
I don't think it takes care of tone transformation (eg 他是 ni3shi4
-> ni2shi4). Or if it does, my tones are just off. But it's a really
cool idea!
carlmr wrote 4 hours 8 min ago:
他是 is tāshì which doesn't transform I think. Did you mean to
write 你是 nǐshì? I think that transforms differently though.
With the half 3rd tone only dropping.
The classical example is 4/4 不是. Which goes bùshì -> búshì.
Or 3/3 that becomes 2/3. E.g. 你好 nǐhǎo becoming níhǎo.
The 1/4 -> 2/4 transformation I think is specific to one. 一个
yīgè becomes yígè.
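For anyone wanting to play with this, here is a rough sketch of the three sandhi patterns mentioned above (3/3 becoming 2/3, and bu/yi moving to second tone before a fourth tone), working on numbered pinyin. It is a simplification under stated assumptions: it looks only at adjacent pairs, ignores the half third tone and prosodic grouping, and cannot tell 一 (yi1) from other syllables spelled yi1. The function name is made up for illustration.

```python
def apply_sandhi(syllables):
    """Apply three common tone sandhi rules to numbered pinyin syllables.

    Each syllable ends in a tone digit, e.g. ['ni3', 'hao3'].
    Rules are checked against the original (citation) tones.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        base, tone = syllables[i][:-1], syllables[i][-1]
        next_tone = syllables[i + 1][-1]
        if tone == "3" and next_tone == "3":
            # 3/3 -> 2/3, e.g. ni3 hao3 -> ni2 hao3
            result[i] = base + "2"
        elif syllables[i] in ("bu4", "yi1") and next_tone == "4":
            # bu4 shi4 -> bu2 shi4, yi1 ge4 -> yi2 ge4
            result[i] = base + "2"
    return result

print(apply_sandhi(["ni3", "hao3"]))   # ['ni2', 'hao3']
print(apply_sandhi(["bu4", "shi4"]))   # ['bu2', 'shi4']
print(apply_sandhi(["yi1", "ge4"]))    # ['yi2', 'ge4']
print(apply_sandhi(["ta1", "shi4"]))   # unchanged
```

A tone checker could run the expected pinyin through something like this before comparing it with what the model heard, so that a correctly spoken ni2 hao3 is not flagged as a mistake.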
jhanschoo wrote 6 hours 19 min ago:
The tone sandhi example you just gave looks incorrect to me
jimz wrote 4 hours 1 min ago:
Well, OP wrote "he is" but then wrote "you are" in pinyin for
one, and that's a bit hard to reconcile.
tifan wrote 7 hours 56 min ago:
I had the same issue! Perhaps being another dapangzi is the problem
here lol
et-al wrote 6 hours 54 min ago:
I'm not familiar with this slang: what's a big plate?
allan_s wrote 5 hours 6 min ago:
It's slang for somebody fat. 子 does not carry a specific
meaning; it is more a character with a grammatical function, to
nominalize
dirteater_ wrote 6 hours 34 min ago:
the commenter's username (i'm guessing they mean 大胖子, feel
free to google translate)
affogarty wrote 9 hours 57 min ago:
This is extremely cool, although I asked my wife (who is Chinese) to
try it out and it said she made some mistakes.
hawflakes wrote 6 hours 50 min ago:
I tried it out and it has some issues with my native speech. I grew
up with more Taiwan Mandarin but I know the Beijing standard and the
recognizer was flagging some of my utterances incorrectly.
btrlsnqtn wrote 9 hours 59 min ago:
The article mentions the bitter lesson. I'm confused about the status
of Sutton's opinion of the bitter lesson. On the one hand, he invented
the concept. On the other hand, he appears to be saying that LLMs are
not the correct approach to artificial intelligence, which to a naive
outsider looks like a contradiction. What gives?
allan_s wrote 5 hours 23 min ago:
Maybe he means that LLMs will hit a ceiling or that the "right"
approach will give equivalent results with less training/less intensive
compute requirements?
drekipus wrote 10 hours 11 min ago:
instantly awesome.
I suck at Chinese but I want to get better and I'm too embarrassed to
try and talk with real people and practise.
This is a great compromise. even just practising for a few minutes I
already feel way more confident based on its feedback, and I feel like
I know more about the details of pronunciation.
I'm worried this might get too big and start sucking like everything
else.
rahimnathwani wrote 10 hours 23 min ago:
This is incredible. When I was first learning Chinese (casually, ~20
years ago), my teacher used some Windows software that drew a diagram
of the shape of my pronunciation, so she could illustrate what I was
getting wrong in some objective way.
The thing you've built is so good, and I would have loved to have it
when I was learning Mandarin.
I tried it with a couple of sentences and it did a good job of
identifying which tones were off.
yunusabd wrote 4 hours 6 min ago:
You're probably thinking of Praat, which is still around. Even has
the same UI as 20 years ago.
vunderba wrote 10 hours 27 min ago:
When I was living in Taiwan, one of the ways I forced myself to
remember to pronounce the tones distinctly was by waving my hand in
front of me, tracing the arc of each character's tone.
It helped a lot even if I did look like an insane expat conducting an
invisible orchestra.
One more thing: there's quite a bit of variation in how regional
accents in the mainland can affect tonal pronunciation. It might be
worth reaching out to some native speakers to give you some baseline
figures.
sowbug wrote 7 hours 22 min ago:
You'll love Mike Laoshi:
URI [1]: https://youtu.be/cna89A2KAU4?si=SQEZ_0ooO1z119_k
cyberax wrote 9 hours 23 min ago:
Hand motions help! Especially when you want to memorize new words,
because initially you need to treat tone as something additional to
remember.
I used simple index finger motions to mark tones.
devin wrote 9 hours 32 min ago:
This sounds like how solfège training works. You use a hand signal to
indicate a specific tone: do re mi fa so la ti
zdragnar wrote 9 hours 43 min ago:
In a university Mandarin class, one of the adult students (i.e.
probably 40 or so) WAY over-exaggerated his tones, to the point that
the little old lady teaching us laughed out loud after one of his
answers.
A few years later, he had the most clean and consistent pronunciation
out of anyone I'd been in a class with, and easily switched between
the Beijing and other accents depending on which teacher we had on
any given day.
I rather regret not emulating him, even though I haven't really used
it for nearly 20 years and have forgotten most of it.
luckydata wrote 8 hours 53 min ago:
That's EXACTLY how I taught myself to speak with a Spanish accent
from Madrid. I imitated the way TV celebrities and the speakers on
the metro announced the stations, and it gave me a base for how to
use my mouth and throat appropriately. After a while I was able to
tone it down and my accent got so good that locals couldn't tell I
wasn't Spanish - I had this cool party trick pulling out my ID and
showing them I was truly a foreigner!
ecshafer wrote 9 hours 40 min ago:
From a language learning standpoint that does make sense.
Over-exaggeration while you are learning helps to cement the idea,
and then when you are speaking more naturally you will fall back
into a regular kind of tone.
mleonhard wrote 5 hours 1 min ago:
Over-exaggeration also works well when learning to play stringed
instruments like cello.
simedw wrote 10 hours 14 min ago:
For accents, I've mostly tested with a few friends so far. I'm
wondering whether region should be a parameter, because training on
all dialects might make the system too lax.
vunderba wrote 4 hours 40 min ago:
It would probably be a lot of work, but it would be really interesting
if you had sufficient data sets to train across accents.
Highly recommend taking a look at Phonemica for this:
URI [1]: https://phonemica.net/
jellojello wrote 10 hours 34 min ago:
This is amazing, if you feel like opening an entire language to being
learned more easily.. Farsi is a VERY overlooked language, my wife/her
family speak it but it's so difficult finding great language lessons
(it's also called Persian/Dari)
peterburkimsher wrote 1 hour 44 min ago:
I made a parallel literal translator for Farsi: [1] Paste in some
parallel text (e.g. Bible verses, movie subtitles, song lyrics) and
read what Farsi you can on the first line, looking to the lower lines
for clues if you get stuck.
The core version of Pingtype is for traditional Chinese, but it
supports a few other languages too.
URI [1]: https://pingtype.github.io/farsi.html
simedw wrote 10 hours 22 min ago:
Thank you.
I had a quick look at Farsi datasets, and there seem to be a few
options. That said, written Farsi doesn't include short vowels...
so can you derive pronunciation from the text using rules?
kranner wrote 10 hours 4 min ago:
> written Farsi doesn't include short vowels... so can you derive
pronunciation from the text using rules?
You can't, but Farsi dictionaries list the missing short
vowels/diacritics/"eraab" for every word.
For instance, see this entry: [1] With the short vowel on the first
letter it would be written حِساب (normally written as just
حساب)
The dictionary entry linked shows that there is a ِ on the first
letter ح
But you would have to disambiguate between homographs that differ
only in the eraab.
URI [1]: https://vajehyab.com/dehkhoda/%D8%AD%D8%B3%D8%A7%D8%A8?q=%...