I have tested dozens of PTZ cameras with built-in AEC on construction sites, windy rooftops, and busy roadsides. The results always surprise people.
AEC in Chinese PTZ cameras can reduce far-end echo to an acceptable level in most noisy environments. But AEC alone does not remove background noise. That job belongs to the ANS module. In high-noise scenes like construction sites or strong wind, echo suppression still works, but the remaining ambient noise will not fully disappear, and voice quality may sound compressed or narrow.

Before I walk you through each part, I want to break down the four questions I hear most from integrators like David Miller. These are about feedback loops, motor noise, low-frequency sounds, and processing latency. Each one matters when you deploy PTZ cameras with two-way audio in the real world. Let me go through them one by one.
How Does the AEC Algorithm Prevent Feedback Loops During a Two-Way Mobile App Conversation?
I once lost a project because the client heard his own voice bouncing back through the PTZ speaker during a live demo. That taught me a hard lesson about feedback loops.
The AEC algorithm takes the speaker output as a reference signal and uses adaptive filtering 1 to predict the echo that this output will produce at the microphone. It then subtracts that predicted echo from the microphone input in real time. This prevents the far-end voice from looping back through the speaker and microphone cycle, which would otherwise cause echo or howling.

How Adaptive Filtering Actually Works
The core of AEC is a digital filter that learns the acoustic path between the speaker and the microphone. In a PTZ camera, the speaker plays the remote person’s voice. The microphone picks up that voice after it bounces off walls, the camera housing, and nearby surfaces. The AEC algorithm takes the original speaker output and uses it as a reference. It then builds a model of how that sound changes as it travels through the environment. Once it has a good model, it subtracts the predicted echo from the microphone signal. What remains is, ideally, only the local person’s voice. Modern systems often rely on adaptive filter algorithms such as LMS (least mean squares) and NLMS (normalized LMS) 2 to continuously update this model.
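The subtraction loop described above can be sketched in a few lines. This is a toy NLMS canceller under stated assumptions, not the firmware's actual implementation: the 8-tap filter, the step size `mu`, the made-up impulse response `true_path`, and the white-noise far-end signal are all illustrative, and there is no near-end speech or background noise in the microphone signal.

```python
import math
import random

def nlms_echo_cancel(reference, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract the predicted echo from mic, adapting the filter sample by sample."""
    weights = [0.0] * taps   # current estimate of the speaker-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        predicted_echo = sum(w * h for w, h in zip(weights, history))
        error = d - predicted_echo              # what is left after subtraction
        power = sum(h * h for h in history) + eps
        step = mu * error / power               # NLMS: step normalized by reference power
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

def power_db(samples):
    return 10 * math.log10(sum(s * s for s in samples) / len(samples) + 1e-12)

# Toy scenario: the "room" is a short, hypothetical impulse response.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
true_path = [0.0, 0.6, 0.0, -0.3]
echo = []
for n in range(len(far_end)):
    echo.append(sum(h * far_end[n - k] for k, h in enumerate(true_path) if n - k >= 0))

cleaned, weights = nlms_echo_cancel(far_end, echo)

# Echo return loss enhancement over the last 500 samples, in dB.
erle = power_db(echo[-500:]) - power_db(cleaned[-500:])
```

In this idealized case the learned `weights` settle close to `true_path` and the residual echo becomes very small. A real room adds near-end speech, noise, and a much longer echo tail, so the filter needs far more taps and never converges this cleanly.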
Why Feedback Loops Still Happen
In practice, this process is not perfect. Here are the main reasons feedback loops can still occur:
- Speaker volume too high. When the output volume is maxed out, the sound energy overloads the microphone and the captured signal distorts. The algorithm cannot subtract what it cannot cleanly model.
- Poor physical isolation. If the speaker and microphone sit inside the same small PTZ housing with no rubber dampening, sound travels through the metal or plastic body. This structural echo is very fast and very strong. The AEC filter often cannot handle it.
- Network delay shifts. On a 4G mobile app, network jitter can change the timing between the reference signal and the actual echo. If the delay jumps outside the AEC buffer window, the algorithm loses its lock on the echo.
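The buffer-window point in the last item can be made concrete with some quick arithmetic. The sketch below assumes the "AEC buffer window" corresponds to the adaptive filter's tail length, which is the number of filter taps divided by the sample rate. The 16 kHz rate and 1024-tap filter are hypothetical values, not figures from any specific camera.

```python
def echo_tail_ms(filter_taps, sample_rate_hz):
    """Longest echo delay (in ms) that a filter of this length can model."""
    return 1000.0 * filter_taps / sample_rate_hz

def delay_within_window(echo_delay_ms, filter_taps, sample_rate_hz):
    """True if the echo arrives inside the span the filter covers."""
    return 0 <= echo_delay_ms < echo_tail_ms(filter_taps, sample_rate_hz)

# Hypothetical configuration: 16 kHz audio, 1024-tap filter.
window_ms = echo_tail_ms(1024, 16000)              # 64.0 ms
stable = delay_within_window(40, 1024, 16000)      # True: echo is inside the window
shifted = delay_within_window(90, 1024, 16000)     # False: echo falls outside it
```

If the timing between the reference and the echo shifts so that the echo lands past the window, the filter has nothing to line the echo up against, which matches the loss of lock described above.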
What You Can Do About It
I always tell my clients to start by turning down the speaker volume by 30%. This single step fixes most feedback issues. If that is not enough, switch the firmware to full-duplex AEC mode with NLP enabled. NLP here stands for non-linear processing 3. It suppresses the residual echo that the linear filter misses.
| Cause of Feedback | Fix | Expected Result |
|---|---|---|
| Speaker volume too high | Reduce output by 30% | Echo drops below audible level |
| Poor physical isolation | Use external speaker + mic with 1 m spacing | Removes 90% of structural echo |
| Network jitter on 4G | Enable jitter buffer in firmware | AEC stays locked on echo timing |
For integrators who deploy in remote areas with unstable 4G, I recommend testing the AEC with a real mobile app call before finalizing the install. Do not rely on a quiet office test. The field is always different.
Can I Conduct a Clear Conversation While the PTZ Motor Is Panning or Tilting?
I have been on calls where the PTZ started moving and the other person said, “What is that grinding noise?” That is the motor. And it is a real problem for two-way audio.
Yes, you can hold a conversation while the PTZ motor moves, but the motor noise will be picked up by the microphone. The AEC will not remove it because motor vibration is not echo. You need ANS and good mechanical dampening inside the camera to keep motor noise low enough for clear speech.

Why Motor Noise Is Different from Echo
AEC is designed to cancel one specific thing: the sound that came from the speaker and bounced back into the microphone. Motor noise is not speaker output. It is a new sound source. So the AEC algorithm ignores it completely. The ANS module is the one that tries to reduce this kind of steady mechanical noise. But ANS works best on constant, predictable sounds. PTZ motor noise changes in pitch and volume as the camera speeds up, slows down, or changes direction. This makes it harder for ANS to track and suppress.
The Role of Mechanical Design
At Loyalty-Secu, we pay close attention to the internal mechanical design of our PTZ cameras. Here is what matters:
- Rubber motor mounts. These absorb vibration before it reaches the microphone cavity.
- Sealed microphone chamber. A separate acoustic chamber for the microphone reduces airborne motor noise.
- Belt-driven vs. gear-driven movement. Belt-driven PTZ mechanisms are quieter than direct gear drives. But they cost more and wear faster.
What to Expect in Practice
In my experience, a well-built PTZ camera will produce motor noise around 35–45 dB at the microphone. Human speech at 1 meter is about 60–65 dB. That leaves a signal-to-noise ratio of roughly 15–30 dB, which is still workable. The remote listener will hear a faint hum or whir during panning, but the speech remains clear. If the motor noise is louder than 50 dB, speech clarity drops fast.
| Motor Noise Level | Speech Clarity | Recommendation |
|---|---|---|
| Below 35 dB | Excellent — motor barely audible | No action needed |
| 35–45 dB | Good — faint hum during movement | Acceptable for most B2B use |
| 45–50 dB | Fair — noticeable noise, speech still clear | Enable ANS high mode |
| Above 50 dB | Poor — motor competes with speech | Use external microphone away from body |
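The table and the speech levels quoted above can be turned into a quick field check. This is only a sketch: the function names are made up for illustration, and treating a reading of exactly 45 dB as "Good" rather than "Fair" is an assumption, since the table's bands touch at that value.

```python
def speech_snr_db(speech_db, motor_db):
    """Signal-to-noise ratio as the level difference in dB."""
    return speech_db - motor_db

def clarity_band(motor_db):
    """Map a motor noise level at the microphone to the table's rating."""
    if motor_db < 35:
        return "Excellent"
    if motor_db <= 45:      # assumption: 45 dB itself counts as "Good"
        return "Good"
    if motor_db <= 50:
        return "Fair"
    return "Poor"

worst_case_snr = speech_snr_db(60, 45)   # quiet talker, noisy motor: 15 dB
best_case_snr = speech_snr_db(65, 35)    # loud talker, quiet motor: 30 dB
rating = clarity_band(48)                # "Fair"
```

Measuring the motor level with a sound meter at the microphone position during a full pan gives the `motor_db` input. The speech level depends on how far the talker stands from the camera.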
If you are doing critical two-way conversations during PTZ movement, I suggest mounting an external pickup microphone at least 50 cm away from the camera body. This is the simplest and most effective fix. No algorithm can fully replace good physical separation.
Does the Noise Suppression (ANS) Filter Out Constant Low-Frequency Sounds Like Traffic or Fans?
I deployed a solar PTZ system next to a highway once. The client called me and said, “I can hear the trucks more than the guard.” That is when I learned the limits of ANS on low-frequency noise.
ANS can reduce constant low-frequency sounds like fan hum and distant traffic by 10–20 dB. But it cannot fully remove them. ANS works by estimating the noise spectrum during silent moments and then subtracting it during speech. Low-frequency energy is hard to cut without also affecting the lower tones of human voice.

How ANS Estimates and Subtracts Noise
ANS algorithms work in the frequency domain. During moments when no one is speaking, the algorithm captures a “noise profile.” This profile tells the system what the background sounds like. When someone starts talking, the algorithm subtracts this noise profile from the full signal. What is left should be mostly voice. This frequency-domain noise reduction approach 4 is widely used in modern audio DSP systems.
This works well for steady, flat noise like air conditioning or a distant fan. These sounds have a stable frequency pattern. The algorithm can build an accurate model and subtract it cleanly.
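The estimate-then-subtract idea can be sketched with a basic spectral subtraction routine. It is an idealized illustration, not any vendor's ANS: the 64-sample frame, the pure-tone "fan hum" and "voice", and the `floor` parameter are assumptions chosen so the example stays small and deterministic. A plain DFT is used instead of an optimized FFT to keep the code self-contained.

```python
import cmath
import math

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def noise_profile(noise_frames):
    """Average magnitude per frequency bin over speech-free frames."""
    spectra = [[abs(c) for c in dft(frame)] for frame in noise_frames]
    return [sum(column) / len(column) for column in zip(*spectra)]

def spectral_subtract(frame, profile, floor=0.05):
    """Subtract the noise magnitude per bin, keep the phase, never go below the floor."""
    cleaned_bins = []
    for c, noise_mag in zip(dft(frame), profile):
        magnitude = max(abs(c) - noise_mag, floor * abs(c))
        cleaned_bins.append(cmath.rect(magnitude, cmath.phase(c)))
    return idft(cleaned_bins)

N = 64

def tone(bin_index, amplitude, phase=0.0):
    return [amplitude * math.sin(2 * math.pi * bin_index * t / N + phase) for t in range(N)]

# "Silent" frames contain only the steady hum (hypothetical fan tone in bin 4).
profile = noise_profile([tone(4, 0.5, phase) for phase in (0.0, 1.0, 2.0)])

voice = tone(10, 1.0)                 # stand-in for speech energy
hum = tone(4, 0.5, 0.7)               # same hum, different phase
noisy = [v + h for v, h in zip(voice, hum)]
cleaned = spectral_subtract(noisy, profile)

error_before = max(abs(a - b) for a, b in zip(noisy, voice))
error_after = max(abs(a - b) for a, b in zip(cleaned, voice))
```

Because the hum sits in its own frequency bin and never changes level, `error_after` ends up far smaller than `error_before`. That is the favorable case described above. Once the noise overlaps the voice in frequency or changes over time, the subtraction starts removing speech or leaving noise behind.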
Where ANS Struggles
Low-frequency noise from traffic, generators, or heavy machinery is harder to handle. Here is why:
- Overlap with voice. The adult male voice has fundamental frequencies between 85 and 180 Hz. Traffic rumble sits in the 50–250 Hz range. There is a big overlap. If ANS cuts too aggressively in this range, the speaker’s voice sounds thin and unnatural. This frequency overlap with speech 5 is a well-known limitation.
- Amplitude changes. A passing truck gets louder and then softer over a few seconds. ANS needs time to update its noise estimate. During that update window, the noise leaks through.
- Non-stationary noise. Wind gusts, sudden honks, and construction bangs are not constant. ANS is not designed to handle sudden bursts. It is built for steady-state noise.
Practical Advice for Noisy Sites
For sites with heavy low-frequency noise, I recommend the following:
- Use a high-pass filter 6 at 150 Hz if the firmware allows it. This cuts the deepest rumble without hurting most speech.
- Place the microphone away from vibrating surfaces like metal poles, fences, or generator housings.
- If the site is extremely noisy, consider a microphone with a directional (cardioid) pickup pattern 7 instead of the built-in omnidirectional one.
In my tests, ANS combined with a high-pass filter brings low-frequency background noise down by about 15–20 dB. That is enough to make speech understandable, but the remote listener will still hear that they are not in a quiet room. Set expectations with your client early. No PTZ camera will make a highway sound like an office.
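To show what a 150 Hz high-pass filter does to rumble versus speech-band content, here is a minimal sketch using a first-order filter. The filter order, the 16 kHz sample rate, and the 50 Hz and 1000 Hz test tones are assumptions for illustration. Camera firmware may well use a steeper filter, which would cut the rumble harder than this one does.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC-style high-pass filter (gentle 6 dB per octave roll-off)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, sample_rate_hz, seconds=1.0):
    count = int(sample_rate_hz * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

RATE = 16000                        # assumed sample rate
rumble = tone(50, RATE)             # stand-in for traffic rumble
speech_band = tone(1000, RATE)      # stand-in for mid-band speech energy

# Compare levels on the second half of each signal, after the filter has settled.
half = RATE // 2
rumble_gain = rms(high_pass(rumble, 150, RATE)[half:]) / rms(rumble[half:])
speech_gain = rms(high_pass(speech_band, 150, RATE)[half:]) / rms(speech_band[half:])
```

With these settings the 50 Hz tone comes out at roughly a third of its original level (about 10 dB down), while the 1000 Hz tone passes at well over 90%. That fits the advice above: the filter trims the deepest rumble but does not remove it.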
What Is the Processing Latency of the AEC During a High-Resolution 4K Video Stream?
I had a client ask me if running 4K video would slow down the AEC. It is a fair question. Both tasks share the same processor inside the camera.
AEC processing latency in most PTZ cameras is typically 20–40 ms. Running a 4K video stream does not directly increase AEC latency because audio and video are processed on separate pipelines inside the SoC. But if the SoC is under heavy load from 4K encoding, the audio pipeline may experience occasional delays, adding 10–30 ms of extra latency in worst cases.

How Audio and Video Share the SoC
Modern PTZ cameras use a System-on-Chip (SoC) that handles video encoding, image processing, network transmission, and audio processing all at once. Inside the SoC, these tasks run on different hardware blocks. Video encoding uses a dedicated hardware encoder for compression standards such as H.264 and H.265 8. Audio processing, including AEC, runs on a DSP core or the main CPU.
In theory, they do not interfere with each other. In practice, they share memory bandwidth and bus resources. When the video encoder is working hard on a 4K stream at 25 fps, it uses a lot of memory bandwidth. If the audio DSP needs to access memory at the same time, it may have to wait. This wait adds a few milliseconds of latency.
What Latency Means for Two-Way Audio
For a normal phone call, people start noticing delay at around 150 ms one-way. Below 100 ms, the conversation feels natural. The AEC itself adds 20–40 ms. Network transmission over 4G adds another 50–150 ms. The video encoding does not add to the audio path directly, but if SoC congestion adds 10–30 ms on top, the total can reach 200 ms or more in the worst case. At that point, both sides start talking over each other because the delay feels unnatural. These thresholds align with common findings in real-time voice communication latency studies 9.
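The budget above is simple addition, and writing it out makes the best and worst cases easy to compare. The function names and the "borderline" label for the 100–150 ms range are illustrative assumptions. The component figures are the ones quoted in the paragraph.

```python
def one_way_latency_ms(aec_ms, network_ms, soc_extra_ms=0):
    """Sum the audio-path delay contributions for one direction."""
    return aec_ms + network_ms + soc_extra_ms

def conversation_feel(total_ms):
    """Classify one-way delay using the 100 ms and 150 ms thresholds."""
    if total_ms < 100:
        return "natural"
    if total_ms < 150:
        return "borderline"      # assumed label for the range between the two thresholds
    return "noticeable delay"

best_case = one_way_latency_ms(aec_ms=20, network_ms=50)                     # 70 ms
worst_case = one_way_latency_ms(aec_ms=40, network_ms=150, soc_extra_ms=30)  # 220 ms

best_feel = conversation_feel(best_case)     # "natural"
worst_feel = conversation_feel(worst_case)   # "noticeable delay"
```

The network term dominates: even with no SoC congestion at all, a 150 ms 4G leg plus the AEC already passes the point where people notice the delay.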
How to Keep Latency Low
Here are the steps I take when setting up a 4K PTZ with two-way audio:
- Use a sub-stream for audio-linked sessions. Many PTZ cameras can send a lower-resolution sub-stream alongside the 4K main stream. If your mobile app uses the sub-stream for the two-way audio session, the SoC load drops and audio latency stays low.
- Check the SoC model. Not all chips are equal. A camera using a higher-end SoC with a dedicated audio DSP will handle 4K + AEC better than a budget chip that runs everything on the main CPU.
- Reduce frame rate if needed. Dropping from 30 fps to 15 fps on the 4K stream cuts encoding load almost in half. Audio latency improves as a result. This is a common video encoding tuning practice 10.
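The frame-rate point comes down to how many pixels the encoder must process per second. The sketch below assumes a 4K UHD frame of 3840×2160 and treats pixel throughput as a rough proxy for encoding load. Real encoder load does not scale perfectly linearly, which is why the saving is "almost" half rather than exactly half.

```python
def pixel_rate(width, height, fps):
    """Pixels the encoder must process per second."""
    return width * height * fps

full_rate = pixel_rate(3840, 2160, 30)      # 248,832,000 pixels per second
reduced_rate = pixel_rate(3840, 2160, 15)   # 124,416,000 pixels per second
load_ratio = reduced_rate / full_rate       # 0.5
```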
| SoC Load Condition | Typical AEC Latency | Impact on Conversation |
|---|---|---|
| 1080p stream, low CPU use | 20–30 ms | No noticeable delay |
| 4K stream, moderate CPU use | 30–40 ms | Still natural |
| 4K stream + AI analytics | 40–70 ms | Slight delay, still usable |
| 4K + AI + high network jitter | 70–120 ms+ | Delay becomes noticeable, may need optimization |
I always test the total round-trip audio delay during the pilot phase. I play a sharp click sound near the camera and measure how long it takes to hear it on the remote app. If the number is under 200 ms round-trip, the system is ready for real conversations. If it is over 300 ms, something needs to change — either the stream resolution, the network path, or the SoC configuration.
Conclusion
AEC in PTZ cameras handles echo well in noisy environments, but real-world audio quality depends on ANS performance, mechanical design, network stability, and proper field testing before deployment.
1. Explains how adaptive filters dynamically remove echo signals.
2. Details LMS/NLMS algorithms used in echo cancellation systems.
3. Discusses residual echo and nonlinear processing methods.
4. Overview of frequency-domain noise reduction and AEC workflows.
5. Research on frequency overlap challenges in acoustic processing.
6. Basics of high-pass filtering for removing low-frequency noise.
7. Explains directional microphone patterns like cardioid pickup.
8. Introduction to video compression standards used in SoCs.
9. Study on echo cancellation performance and latency behavior.
10. Covers system optimization techniques for audio/video processing.