If your English is grammatically solid but listeners still ask you to repeat yourself, the problem is almost always one of three things: a vowel contrast your first language doesn’t make, a consonant your mouth wasn’t trained to produce, or the way English glues words together when people actually speak it. Grammar drills won’t fix any of those. This post walks through the specific sounds and patterns that trip up most learners, with drills you can do without a teacher.

A note on the IPA symbols below. I use them because they’re unambiguous. If you’ve never seen them, treat them as labels: /θ/ is the th in think, /ð/ is the th in this. The symbols matter less than the mouth positions they refer to.

Th: /θ/ vs /ð/

English is one of the few major languages with dental fricatives, sounds made by putting your tongue against your teeth and pushing air through. Most learners substitute /t/, /s/, /d/, or /z/, which is intelligible but instantly marks a non-native speaker.

/θ/ (voiceless): think, bath, mouth, three, throw

The tongue tip touches the back of your upper front teeth, or sticks out slightly between them. Push air through gently. Vocal cords off.

A drill that works: hold a thin strip of paper an inch from your lips. Say thin, thick, thirty, thunder. The paper should flutter. If it doesn’t move, you’re probably saying /s/ or /t/.

Minimal pairs for /θ/ vs /s/: think/sink, thought/sought, thumb/sum, thick/sick, theme/seem, path/pass.

/ð/ (voiced): this, that, mother, breathe, with

Same tongue position. Vocal cords on. Put two fingers on your throat. You should feel a buzz when you produce /ð/ that you don’t feel when you produce /θ/.

Minimal pairs for /ð/ vs /d/: they/day, then/den, those/doze, breathe/breed. The voiced th occurs in fewer words than the voiceless one, but those words are among the most common in English (the, this, that, these, those, them, there), so substituting /d/ gets noticed in almost every sentence.

R vs L

If your first language is Japanese, Korean, or Mandarin, this contrast probably costs you more than any other. The two sounds aren’t on the same continuum. They’re produced with completely different tongue shapes.

/l/: the tongue tip presses up against the alveolar ridge, the bumpy area just behind your top teeth. The sides of the tongue drop, letting air flow around. The contact is firm.

/r/: the tongue doesn’t touch anything. Two methods both work. You can curl the tip up toward (not touching) the roof of your mouth, or bunch the middle of your tongue up while keeping the tip down. Most American speakers use the bunched version. Your lips also round slightly.

The trick: when practicing /r/, deliberately tense the muscle on the underside of your tongue and pull it back. If you can feel that pull, you’re probably making /r/. If your tongue is moving forward to touch something, you’re making /l/ or a tap sound from your first language.

Pairs: light/right, lice/rice, collect/correct, alive/arrive, glass/grass, play/pray, lock/rock, flesh/fresh.

A drill that helps: say each word slowly, then exaggerate by holding the consonant for two seconds. Llllight. Rrrright. You’re trying to make the muscular difference conscious before it can become automatic.

Vowels learners commonly merge

/iː/ vs /ɪ/, as in sheep vs ship. /iː/ is longer and tenser, with the lips spread. /ɪ/ is shorter, lax, lips relaxed. Most languages have one or the other but not both. Pairs: beat/bit, leave/live, feel/fill, deep/dip, sleep/slip.

/æ/ vs /ɛ/, as in bad vs bed. /æ/ requires you to drop your jaw further than feels natural in most languages. Pairs: bad/bed, sat/set, man/men, had/head, pan/pen.

/ʌ/ vs /ɑː/, as in cup vs cop (American English). /ʌ/ is short and central. /ɑː/ is open and back. Pairs: cup/cop, luck/lock, hut/hot, nut/not.

If you only have time to fix one vowel pair, fix the one in ship/sheep. That contrast separates a large number of common word pairs, so merging it tends to cause misunderstandings more often than the other two.

Connected speech: why “did you eat?” sounds like “didja eat?”

Native speakers don’t pronounce words as separate units. They link, drop, and reduce in patterns that are systematic, not lazy. Ignore the patterns and two things happen: you sound stilted, and you can’t understand fluent speech, because you’re listening for word boundaries that aren’t there.

Linking consonant to vowel

When a word ending in a consonant is followed by a word starting with a vowel, the consonant attaches to the next word.

  • take it off → “tay-ki-toff”
  • an apple → “a-napple”
  • clean up → “clea-nup”
  • not at all → “no-ta-tall”
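The linking rule above can be sketched as a toy function. This is only an illustration: the phoneme spellings and the names link_words and render are made up for the example, and real speech involves more than moving one consonant.

```python
# Toy illustration of consonant-to-vowel linking.
# The phoneme lists are hand-written assumptions, not dictionary output.
VOWELS = {"eɪ", "ɪ", "ɔ", "ə", "æ", "ɑː"}

def link_words(words):
    """Move a word-final consonant onto the next word if that word starts with a vowel.

    `words` is a list of words, each a list of phoneme strings.
    Returns new phoneme groups; the input is not modified.
    """
    groups = [list(w) for w in words]
    for i in range(len(groups) - 1):
        current, following = groups[i], groups[i + 1]
        # Require at least two phonemes so a word is never emptied out.
        ends_in_consonant = len(current) > 1 and current[-1] not in VOWELS
        if ends_in_consonant and following and following[0] in VOWELS:
            following.insert(0, current.pop())
    return groups

def render(groups):
    """Join phoneme groups with hyphens so the new boundaries are visible."""
    return "-".join("".join(g) for g in groups)

take_it_off = [["t", "eɪ", "k"], ["ɪ", "t"], ["ɔ", "f"]]
an_apple = [["ə", "n"], ["æ", "p", "ə", "l"]]

print(render(link_words(take_it_off)))  # teɪ-kɪ-tɔf
print(render(link_words(an_apple)))     # ə-næpəl
```

The hyphens in the output fall where the spoken boundaries are, not where the written word boundaries are. That is the mismatch that makes fluent speech hard to parse by ear.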

T-flapping (American English)

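A /t/ between two vowels becomes a quick tap that sounds like a soft /d/, as long as the second vowel is unstressed. That is why water flaps but attack does not. Across a word boundary, a final /t/ usually flaps whenever the next word starts with a vowel.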

  • water → “wadder”
  • better → “bedder”
  • get out → “geh-dout”
  • what is it → “wha-di-zit”
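For readers who like to see a rule written out, here is a rough sketch of flapping inside a single word. It assumes a common refinement of the rule, namely that the vowel after the /t/ must be unstressed. The phoneme lists, the stress mark convention, and the function names are all invented for the example.

```python
# Toy illustration of American English T-flapping inside one word.
# Phoneme lists are hand-written assumptions; "ˈ" marks a stressed vowel.
VOWELS = {"ɑː", "ɛ", "ə", "ɚ", "æ"}

def is_vowel(phoneme):
    return phoneme.lstrip("ˈ") in VOWELS

def is_stressed(phoneme):
    return phoneme.startswith("ˈ")

def flap_t(phonemes):
    """Replace /t/ with the tap /ɾ/ between a vowel and an unstressed vowel."""
    out = list(phonemes)
    for i in range(1, len(out) - 1):
        before, after = out[i - 1], out[i + 1]
        if (out[i] == "t" and is_vowel(before)
                and is_vowel(after) and not is_stressed(after)):
            out[i] = "ɾ"
    return out

water = ["w", "ˈɑː", "t", "ɚ"]
attack = ["ə", "t", "ˈæ", "k"]

print("".join(flap_t(water)))   # wˈɑːɾɚ  (the /t/ becomes a tap)
print("".join(flap_t(attack)))  # ətˈæk   (stressed vowel follows, so no tap)
```

The sketch deliberately ignores word boundaries, where flapping is more permissive (get out, what is it).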

Schwa reduction

Unstressed vowels collapse to /ə/, the most common vowel in English. Banana is /bəˈnænə/, not “ba-na-na” with three full vowels. Photograph (PHO-tə-graph) and photographer (phə-TO-grə-pher) don’t have the same vowels in the same positions, because stress shifts.

A handful of reductions are worth memorizing as fixed units, since they appear constantly in casual speech:

  • going to → “gonna” /ˈɡʌnə/
  • want to → “wanna” /ˈwɑnə/
  • did you → “didja” /ˈdɪdʒə/
  • would you → “wouldja” /ˈwʊdʒə/
  • got to → “gotta”
  • don’t know → “dunno”
  • let me → “lemme”

You don’t need to produce these reductions to sound fluent. Using full forms is fine. But you have to recognize them, or fast speech becomes a wall.

How to actually practice

Three techniques are worth more than the rest combined.

The first is shadowing. Pick a 30-second clip of a native speaker: a podcast, an interview, anything with clear audio. Play it once. Then play it again and speak along with the speaker, matching their rhythm and intonation, half a beat behind. You’re not translating; you’re imitating. Most learners find this excruciating for the first week and far more natural after that.

The second is recording yourself and comparing. Record yourself reading a paragraph aloud. Then find a native recording of the same text (or close to it) and listen back-to-back. The first time you do this you will hate the sound of your own voice. Do it anyway. The gap between what you think you sound like and what you actually sound like is the whole problem you’re trying to fix.

The third is minimal pair work as perception, not just production. Don’t only read pairs out loud. Have someone (or an app) say one word from a pair, and try to identify which one it was. Production and perception are different skills, and perception is usually the bottleneck. If you can’t reliably hear the difference between bit and beat when someone else says them, you’ll never reliably produce them.
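If you keep a log of your perception drills, a few lines of code can show which contrast you actually mishear. This is a minimal sketch, assuming you write down each trial as (contrast, word played, word you chose). The function name score_by_contrast and the sample trials are invented for illustration, not real results.

```python
from collections import defaultdict

def score_by_contrast(trials):
    """Return the fraction of correct answers for each contrast.

    `trials` is a list of (contrast, played, answered) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for contrast, played, answered in trials:
        total[contrast] += 1
        if played == answered:
            correct[contrast] += 1
    return {c: correct[c] / total[c] for c in total}

# Made-up sample log: the listener hears "beat" and "ship" every time.
trials = [
    ("iː/ɪ", "beat", "beat"),
    ("iː/ɪ", "bit", "beat"),
    ("iː/ɪ", "ship", "ship"),
    ("iː/ɪ", "sheep", "ship"),
    ("æ/ɛ", "bad", "bad"),
    ("æ/ɛ", "bed", "bed"),
]

for contrast, accuracy in score_by_contrast(trials).items():
    print(f"{contrast}: {accuracy:.0%}")
```

With two choices per trial, a score near 50% means you are guessing, so that contrast is the one to drill first. A handful of trials is too few to trust; a few dozen per contrast gives a steadier number.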

This kind of work has a lot in common with closing the comprehension-production gap: recognizing a sound is not the same as producing it on demand.

A final, unglamorous point. Pronunciation improvement is slow and non-linear. You’ll plateau, then a sound will click without warning. Don’t measure progress in days. Measure it in months.