What good in-workout AI coaching actually sounds like
Most AI fitness coaches do not fail because they have nothing useful to say. They fail because they say useful things at the wrong moments, too generically, and too often. The pattern is consistent across review sites and across products. Read the Tom’s Guide piece on ChatGPT as a personal trainer and you will see the same complaint distilled: the AI talks like it has read a textbook on coaching and has never met you.
This is not a technology problem. It is a design problem. And it shows up the moment you put headphones in and start a real training session.
You can build an AI coach that adapts your plan, tracks your volume, and analyses your splits, and it will still feel useless if it speaks at the wrong moment with a line that could have been written for anyone.
The three failure modes you keep seeing
If you read enough product reviews, three patterns repeat.
Generic phrasing. The AI says “great job, keep pushing” when you finish a set. Then it says the same thing on the next set. Then it says it again at the end of the workout. There is no athlete in the loop, just a script being triggered by rep counts. Trainerize’s analysis frames this as the core limitation of current AI products: they can suggest, but they cannot coach, because algorithms can track numbers without ever recognising the subtle cues of fatigue or progress that make a coaching line land.
Wrong timing. The AI is silent when you set a personal record, then interrupts your rest between sets with a motivational quote just as you are trying to focus. Or it stays quiet for the entire session and then sends a notification two hours after you finish suggesting you “stay consistent”. Timing is half of what makes coaching feel like coaching, and most AI products treat the workout as a single event rather than a sequence of distinct moments with very different communication needs.
Tracking confused with coaching. Many apps log your sets, calculate your one-rep max projections, and visualise your volume trends, then call that AI coaching. It is not. Tracking shows you what you did. Coaching tells you what to do about it. The two require different inputs, different timing, and different language, and conflating them produces an experience that feels like reading a dashboard during your workout.
The cumulative effect is what most reviewers actually mean when they call an AI fitness coach “generic” or “annoying”. It is rarely the content. It is the cadence and the calibration.
What in-workout coaching is actually for
Step back and ask what a coach is doing during a real training session with a real athlete.
They are not delivering information. The athlete already has the plan. They are not motivating in the cheerleader sense. The athlete chose to be there. What the coach is actually doing is shaping the athlete’s attention at specific moments where attention shapes the outcome. The first set sets the tone. The mid-session moments determine whether the athlete stays in the work or drifts. The closing exercise decides what the session feels like in memory.
This is consistent with the psychology of intrinsic motivation and attentional control during exercise. The Stanford HAI write-up on AI health coaches makes the same observation in different terms: the value of a coach during effort is mindset calibration, not information delivery.
The implication for in-workout AI is direct. A coach should speak at the moments where speaking changes the next decision the athlete makes, and stay quiet the rest of the time. Said differently, the cost of a generic line is the credibility of the next specific one. Every wasted intervention makes the calibrated one harder to hear.
The three moments that matter
There are three windows in a real session where calibrated coaching does meaningful work. The rest of the session, by design, should be quiet.
At the start. This is where intent gets named. Not a generic “let’s crush it” line. A specific acknowledgement of what this session is and what it is for, ideally with one number that anchors the work in the athlete’s actual history. “Lower body day, three sessions deep this week, last time you hit 100kg for eight on the squat” is coaching. “Time to train” is noise. The athlete already knew it was time to train.
The American Heart Association’s framing of AI in workouts makes the same point about pre-session AI: the highest-value moment is using data the athlete cannot easily see themselves, not restating what they already know.
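A minimal sketch of how a start-of-session line might be assembled from training history. The `TopSet` record and `opening_line` function are hypothetical names for illustration, and the sketch assumes the history lookup (sessions this week, last top set) has already happened upstream. What it shows is structural: the line is built from the athlete's data, and the history clause is dropped rather than faked when there is no history to cite.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopSet:
    """Best set of a lift from the athlete's previous session (hypothetical record)."""
    exercise: str
    weight_kg: float
    reps: int


def opening_line(session_name: str, sessions_this_week: int,
                 last_top_set: Optional[TopSet] = None) -> str:
    """Build a start-of-session line anchored in the athlete's own history."""
    parts = [
        f"{session_name} day",
        f"{sessions_this_week} sessions deep this week",
    ]
    # Only cite the previous top set if one actually exists; never invent it.
    if last_top_set is not None:
        parts.append(
            f"last time you hit {last_top_set.weight_kg:g}kg for "
            f"{last_top_set.reps} on the {last_top_set.exercise}"
        )
    return ", ".join(parts) + "."
```

Called as `opening_line("Lower body", 3, TopSet("squat", 100, 8))`, this produces the kind of line described above, with digits in place of spelled-out numbers.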
During the work. Here is where most AI products fail hardest, because they assume more frequency equals more value. The opposite is true. The athlete is in flow, focused on the next set, and the wrong interruption breaks attention without adding anything. The right rule is calibrated frequency: certain events always warrant a line, others get spoken about with cooldowns, and most sets get no comment at all.
A personal record on a tracked lift should always trigger a specific coaching line, because that is genuinely noteworthy and the athlete will remember it. A set that hit target after a string of misses earns a line, because the pattern is meaningful. A clean set that does nothing surprising earns silence, because there is nothing to add that the rep counter has not already said.
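The calibrated-frequency rule can be sketched as a small decision function that returns either a line or silence. This is a hypothetical illustration, not a description of how Pelaris does it: `SetResult`, `CoachState`, `decide_line`, the two-miss threshold and the three-set cooldown are all assumptions chosen for the example, and PR detection is assumed to happen upstream against tracked history.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SetResult:
    exercise: str
    weight_kg: float
    reps: int
    target_reps: int
    is_pr: bool  # assumed to be computed upstream from the athlete's history


@dataclass
class CoachState:
    recent_misses: int = 0             # consecutive sets below target
    sets_since_last_line: int = 999    # large, so the first event is never blocked
    cooldown_sets: int = 3             # hypothetical cooldown between optional lines


def decide_line(result: SetResult, state: CoachState) -> Optional[str]:
    """Return a coaching line for this set, or None for silence."""
    hit_target = result.reps >= result.target_reps
    line = None
    if result.is_pr:
        # A PR always earns a specific line, regardless of cooldown.
        line = f"PR: {result.weight_kg:g}kg for {result.reps} on the {result.exercise}."
    elif (hit_target and state.recent_misses >= 2
          and state.sets_since_last_line >= state.cooldown_sets):
        # Back on target after a run of misses: meaningful, but rate-limited.
        line = (f"{result.reps} reps at {result.weight_kg:g}kg, "
                f"back on target after {state.recent_misses} short sets.")
    # A clean, unsurprising set falls through to silence (line stays None).
    state.recent_misses = 0 if hit_target else state.recent_misses + 1
    state.sets_since_last_line = 0 if line else state.sets_since_last_line + 1
    return line
```

The design choice worth noticing is that silence is the default return value: a line has to qualify to be spoken, rather than a routine set having to qualify to be skipped.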
At the close. The final exercise sets the tone the athlete carries out of the gym. A generic “great workout” line lands as polite filler. A specific line that names what was accomplished, ties it to what the athlete is training for, and frames the closing effort as the finishing move of a session that mattered does real work. This is the moment athletes remember, and good coaching invests in it.
Three moments. Calibrated lines. Silence in between. That is the structural shape of in-workout coaching that actually feels like coaching.
Why citing a datum is the entire game
There is one rule that separates calibrated coaching from generic phrasing, and it is simple enough to test.
Every coaching line should reference a specific datum from the athlete’s training. A weight. A rep count. A comparison to last time. A position in the session (“set three of four”). A reference to a goal. A reference to recent volume. Something the athlete can verify is true about their actual training, not a phrase that could have been generated for anyone holding a barbell.
This is the difference between “great job on those squats” and “100kg for eight, two reps clearer than last week at this weight”. The first is decoration. The second is coaching. The second also requires the system to actually know what happened last week, which is why coaching that feels real cannot be separated from tracking that is comprehensive. We have written more about this in why multisport athletes who do not track are training blind and in our piece on the process metrics that actually drive long-term progress.
The reason this rule is so reliable is mechanistic. Generic coaching lines fail to shift attention because the athlete’s brain has heard them before, in too many contexts, and discounts them automatically. A specific data point cuts through because it is unique to this moment and this athlete, so there is no template the brain has already pattern-matched it against. The line earns attention by being verifiably specific.
This is also the rule that exposes which AI fitness coaches are doing real work and which are doing theatre. Pay attention next time your AI coach speaks to you during a session. If you can substitute the name of any other athlete and the line would still make sense, the coach is reading from a script. If the line could only have been said to you, in this session, after this set, you are getting actual coaching.
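The substitution test can be roughed out as a filter on candidate lines. This is a deliberately crude sketch under stated assumptions: `cites_datum` and `filter_lines` are hypothetical, and “specific” is approximated by a whole-word, case-insensitive match against values pulled from the athlete’s own record. A real system would need something sturdier, but the shape holds: a line that mentions nothing from this athlete’s training does not get spoken.

```python
import re
from typing import Iterable, List


def cites_datum(line: str, athlete_facts: Iterable[str]) -> bool:
    """Heuristic specificity check: does the line mention any value taken
    from this athlete's own training record? Whole-word, case-insensitive."""
    lowered = line.lower()
    for fact in athlete_facts:
        # Word boundaries stop "eight" from matching inside "weight".
        pattern = r"\b" + re.escape(fact.lower()) + r"\b"
        if re.search(pattern, lowered):
            return True
    return False


def filter_lines(candidates: Iterable[str], athlete_facts: Iterable[str]) -> List[str]:
    """Keep only candidate lines that cite at least one athlete datum."""
    facts = list(athlete_facts)  # materialise so the facts can be reused per line
    return [line for line in candidates if cites_datum(line, facts)]
```

With facts such as “100kg”, “eight” and “last week”, “great job on those squats” is filtered out, while “100kg for eight, two reps clearer than last week at this weight” survives.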
How to evaluate an AI coach in a single session
If you are weighing up AI fitness products, you do not need a thirty-day trial to make the call. One real training session is enough, if you know what to listen for.
- Does it stay quiet when there is nothing to say, or does it interrupt every set?
- Does every spoken line reference a specific number from your training, or does it default to generic encouragement?
- Does it acknowledge a personal record specifically and meaningfully, or does it react the same way to a PR as to a routine set?
- Does the closing line of the session refer to what you just did, or could it have been written for any workout?
- If you ask it to change the session, does it actually rewrite the plan, or does it just append new exercises to what was already there?
The last test is the one most products fail quietly. Many AI coaches treat plan modifications as additive: ask for a different workout, and they bolt new exercises onto the old structure rather than rewriting it. This is a small implementation choice with a large coaching consequence. An athlete who asks for a different session is not asking for more work. They are asking for a different plan, and a coach that cannot tell the difference is a coach the athlete will stop trusting.
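The difference between appending and rewriting is easy to see in a sketch. The names here (`Exercise`, `append_plan`, `rewrite_plan`) are hypothetical, and the sketch assumes a rewrite keeps exercises the athlete has already completed and replaces everything not yet started; that is one reasonable reading of “rewrite”, not the only one.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Exercise:
    name: str
    sets: int
    reps: int


def append_plan(plan: Sequence[Exercise], requested: Sequence[Exercise]) -> List[Exercise]:
    """The failure mode: new work is bolted onto the old structure."""
    return list(plan) + list(requested)


def rewrite_plan(plan: Sequence[Exercise], completed: int,
                 requested: Sequence[Exercise]) -> List[Exercise]:
    """Keep exercises already done; the request replaces everything not yet started."""
    if not 0 <= completed <= len(plan):
        raise ValueError("completed must be between 0 and len(plan)")
    return list(plan[:completed]) + list(requested)
```

Asking for a different session one exercise into a three-exercise plan, with two replacement exercises, leaves five exercises under `append_plan` and three under `rewrite_plan`. Only the second is the different plan the athlete asked for.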
Where this fits in a coaching system
The three-moment structure is not a one-off insight. It is the design principle that connects every piece of useful in-workout coaching, and it is the spine of what we have built at Pelaris.
The athlete starts a session, the coach speaks once, with intent anchored to their actual training history. During the work, calibrated lines fire on noteworthy moments and stay silent the rest of the time. At the close, the session gets a specific summary that ties what just happened to what the athlete is training toward. After the session, a longer reflection is generated and stored so the next session can reference it.
This shape applies whether the athlete is doing a heavy lower body lift, a structured swim, or a long run. The coaching changes character with the sport because the noteworthy moments change, but the structural rule holds: speak when speaking earns attention, stay quiet the rest of the time, and never say a line that could have been generated for anyone other than this athlete.
For triathletes and other multi-discipline athletes, the same rule applies with one extra constraint: the coach has to know what the athlete did across all three sports, not just the one they happen to be doing today. A swim session yesterday changes what counts as a noteworthy moment in today’s run. Cross-discipline awareness is not a nice-to-have for these athletes. It is the basis of whether the coaching feels intelligent or generic. You can read more about how we approach this in our methodology.
What this means for your training
If you are evaluating AI fitness products, or building one, the structural rules are clearer than the marketing makes them sound.
- Calibrated coaching speaks at three moments only: the start, noteworthy events during the work, and the close. Constant chatter is a design failure
- Every spoken line should reference a specific data point from your training. If it could have been said to any athlete, it should not have been said at all
- Personal records always warrant a specific acknowledgement. Generic responses to PRs erode trust faster than silence would
- Plan adaptations should rewrite, not append. An athlete asking for a different session is asking for a different plan, not extra work
- The right standard for an AI coach is whether it sounds like it knows you. If it does not name a number, compare to a past session, or reference your actual goals, it is reading a script
The technology is no longer the constraint. Calibrated, datum-aware coaching is buildable today with the tools every team in this category has access to. What separates the products that feel like coaching from the products that feel like scripts is a design decision about when to speak, when to stay quiet, and what to anchor every line to.
That decision is the entire game.