Flushing audio from VS1011e

Writing software for systems that use VLSI Solution's devices as slave codecs to a host microcontroller.
ags
Senior User
Posts: 34
Joined: Sun 2013-03-31 7:51
Location: Silicon Valley

Post by ags » Mon 2017-01-02 7:13

Hi:

I have an application where I need to interrupt MP3 playback (on VS1011e) mid-stream and then begin playback of a new MP3 file. I've noticed that the abrupt termination (quite likely in mid-frame) results in an audible artifact when playback of the new MP3 file begins. Reading other posts here leads me to think this has something to do with the way the VS1011e deals with buffer underflow. I've seen this brought up for terminating WAV playback but not MP3.

Anyway, I've also read bits and pieces that seem to indicate that the way to "flush" partial audio is to send 2048 zero bytes to SDI. I am doing that, and the artifacts remain. I am sending 32 zero-value bytes at a time, ensuring that DREQ is high before continuing. I do this 64 times, and upon completion I also toggle CS low then high. I am using SDISHARE and SDINEW modes.

It seems that if I repeat this process four times (sending 8192 zero-value bytes) the artifacts don't happen (I think...). Is this correct? Am I wrong to expect that 2048 zero-value bytes should flush the bitstream FIFO?

Thanks for any insight into this.

pasi
VLSI Staff
Posts: 1670
Joined: Thu 2010-07-15 16:04

Re: Flushing audio from VS1011e

Post by pasi » Mon 2017-01-02 14:07

2048 bytes flush the SDI FIFO, but there will be decoded audio in the DAC FIFO.

If the samplerates of the interrupted mp3 and the next mp3 are different, the new mp3 will set the new samplerate and the rest of the samples in the DAC FIFO will be played with a wrong rate, causing an artefact.

Could that be the issue in your case?

The DAC FIFO in vs1011 is 512 stereo samples, so the time to play all remaining data varies with samplerate. At an 8 kHz rate it is up to 64 ms; at 48 kHz it is about 11 ms.
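The arithmetic behind those figures is just FIFO depth divided by sample rate. A minimal sketch in C, assuming the 512-sample depth quoted above; the function name `dac_fifo_drain_us` and the macro are made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* DAC FIFO depth in stereo samples, as quoted in the post above. */
#define VS1011_DAC_FIFO_SAMPLES 512u

/* Worst-case time, in microseconds, for a completely full DAC FIFO to
 * play out at the given sample rate (hypothetical helper). */
uint32_t dac_fifo_drain_us(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (uint32_t)(((uint64_t)VS1011_DAC_FIFO_SAMPLES * 1000000u)
                      / sample_rate_hz);
}
```

With this, `dac_fifo_drain_us(8000)` gives 64000 (64 ms) and `dac_fifo_drain_us(48000)` gives 10666 (roughly 11 ms), so a host that wants the old audio fully played out could wait that long before changing the sample rate.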
Visit https://www.facebook.com/VLSISolution VLSI Solution on Facebook

ags
Senior User
Posts: 34
Joined: Sun 2013-03-31 7:51
Location: Silicon Valley

Re: Flushing audio from VS1011e

Post by ags » Mon 2017-01-02 20:21

Pasi, thank you for the reply. However, this has always confused me, and still does. Perhaps I've read too much on my own while trying to solve this problem myself. I'll list what I *think* is true, from reading the docs and the forums, though I realize it might not be correct.

* the DAC FIFO (decoded MP3 at the final sample rate - ready to be fed to the DACs for conversion to audio out) is slowed and/or stuffed with zero values if there is an underflow of data from the Bitstream FIFO (MP3 coded input to SDI).
* the Bitstream FIFO is 2048 bytes long, so writing 2048 bytes to SDI clears all content in the Bitstream FIFO

So the question is: why doesn't the DAC FIFO also get cleared when I flush the Bitstream FIFO? Or where is the last bit of audio being stored? Doesn't the DAC read from the DAC FIFO continuously, at the specified sample rate? Are the DAC FIFO contents not cleared once converted by the DACs? I haven't connected a scope to view the output (the prototype is in a physically inaccessible location at present), but the symptoms are that the last few moments of audio from the terminated MP3 (some tens or hundreds of milliseconds, I would guess) are "stuck" somewhere (DAC FIFO, Bitstream FIFO, somewhere else?) and do not get flushed out until I start playback of another MP3 file, at which point that audio is heard before the content of the new MP3 file begins to play.

Brek
Senior User
Posts: 61
Joined: Sun 2016-09-11 5:51

Re: Flushing audio from VS1011e

Post by Brek » Tue 2017-01-03 12:39

If program memory on your controller is not too tight, you could try implementing a function that sends some valid frames of silence
to the chip just as a button is pushed to skip tracks in either direction.
That function must of course still honour the DREQ signal just the same, and not send more than 32 bytes at a time.

I’m not certain, but I figure it takes valid mp3 frames (or at least valid supported audio) for anything to be passed on to the DAC;
otherwise it would play any old junk data, including id3 tags and images, which it does not.

pasi
VLSI Staff
Posts: 1670
Joined: Thu 2010-07-15 16:04

Re: Flushing audio from VS1011e

Post by pasi » Tue 2017-01-03 12:46

Ah, now I see what's probably your issue (with some help from the other thread).

Mp3 uses a frequency-time transform with overlap. It means the samples at the head of the newly decoded audio frame are combined (windowed) with the tail of the samples of the previous frame (history).

If you send mp3 frames continuously, each decoded frame will be windowed with the previous frame. If you stop sending data from one file and switch to the next, and the files have compatible decoding parameters (samplerate and number of channels), the decoder does not know you have switched files, even with the added zeroes. (The zeroes act just as a delay, stopping the playback and not affecting the decoding itself, apart from filling up the last mp3 frame of the first file.)

What you are seeing is the history data being windowed to the start of the new mp3 file.
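To make the windowing effect concrete, here is a deliberately simplified overlap-add model in C. It is not the real MP3 filterbank: `toy_decoder`, `toy_decode`, `toy_reset`, and the block size `HALF` are all invented for illustration, and each "frame" is just a run of samples whose second half is saved and added to the head of the next frame.

```c
#include <assert.h>
#include <string.h>

#define HALF 4  /* output samples per frame in this toy model */

typedef struct {
    float history[HALF];  /* tail of the previous frame, awaiting overlap */
} toy_decoder;

/* Clear the history, standing in for a decoder restart. */
void toy_reset(toy_decoder *d)
{
    assert(d != NULL);
    memset(d->history, 0, sizeof d->history);
}

/* "Decode" one frame of 2*HALF samples: the output is the head of this
 * frame plus the saved tail of the previous one; the tail of this frame
 * becomes the new history. */
void toy_decode(toy_decoder *d, const float frame[2 * HALF], float out[HALF])
{
    assert(d != NULL);
    for (int i = 0; i < HALF; i++) {
        out[i] = frame[i] + d->history[i];
        d->history[i] = frame[HALF + i];
    }
}
```

In this model, decoding a loud frame and then the first frame of a "new file" puts the old tail into the new file's first output block. One silent frame in between, or a call to `toy_reset`, leaves the history empty so nothing leaks through.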

Our newer decoders have mechanisms to indicate that one file has ended and mp3 decoding should be restarted (with the history cleared); with older ones such as vs1011e, you would use a software reset between files.
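A software reset is issued by writing the mode register over SCI with the reset bit set. The sketch below only builds the four command bytes; the opcode, register address, and bit masks are the values I understand the VS10xx datasheets to give (SCI write opcode 0x02, SCI_MODE at address 0, SM_RESET as bit 2, SM_SDISHARE as bit 10, SM_SDINEW as bit 11) and should be checked against the VS1011e datasheet before use. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; verify against the VS1011e datasheet. */
#define SCI_WRITE_OPCODE 0x02u
#define SCI_MODE         0x00u
#define SM_RESET         0x0004u
#define SM_SDISHARE      0x0400u
#define SM_SDINEW        0x0800u

/* Fill buf with an SCI write: opcode, register, data high byte, data low byte. */
void sci_build_write(uint8_t buf[4], uint8_t reg, uint16_t value)
{
    assert(buf != NULL);
    buf[0] = SCI_WRITE_OPCODE;
    buf[1] = reg;
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)(value & 0xFFu);
}

/* Build a software-reset command that keeps the SDINEW and SDISHARE
 * mode bits used by the setup described in this thread. */
void vs_build_soft_reset(uint8_t buf[4])
{
    sci_build_write(buf, SCI_MODE, SM_SDINEW | SM_SDISHARE | SM_RESET);
}
```

The host would clock these four bytes out as a normal SCI transfer and then wait for DREQ to go high again before sending the first bytes of the next file. Any registers the application had changed (volume, clock settings) would presumably need to be rewritten after the reset.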

Brek
Senior User
Posts: 61
Joined: Sun 2016-09-11 5:51

Re: Flushing audio from VS1011e

Post by Brek » Tue 2017-01-03 14:03

I just dug up the function that did solve the issue once and for all for me, so will update the other thread.
The other thread will probably be a better search result in future.

Brek
Senior User
Posts: 61
Joined: Sun 2016-09-11 5:51

Re: Flushing audio from VS1011e

Post by Brek » Tue 2017-01-03 14:13

It doesn’t have a thread of its own. Maybe it was just one of my rambling threads with more than one issue.
I was trying to update that thread because there was no real resolution for me at that stage, but I was on the right track.

For sfcount (silence frame count), a value of 4 did not fix the problem, I did not try 5, and 6 works for me every time.
The silence frame is digital silence encoded with LAME, mono 8 bit. It's nice that it’s 26 bytes, because whenever DREQ is asking for data you can send one whole frame.

Code:

void silenceframe() {
    int sfcount = 0;
    while (sfcount < 6) {
        while (VSDREQ == 0) { delay_ms(0); }  // wait until the chip asks for data
        VSXDCS = 0; // select vs1003 data
        // 4-byte frame header
        WriteSPIManual(0xFF);
        WriteSPIManual(0xF3);
        WriteSPIManual(0x10);
        WriteSPIManual(0xC4);
        // rest of the 26-byte frame
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        WriteSPIManual(0x03);
        WriteSPIManual(0x48);
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        // ASCII "LAME3.99.5"
        WriteSPIManual(0x4C);
        WriteSPIManual(0x41);
        WriteSPIManual(0x4D);
        WriteSPIManual(0x45);
        WriteSPIManual(0x33);
        WriteSPIManual(0x2E);
        WriteSPIManual(0x39);
        WriteSPIManual(0x39);
        WriteSPIManual(0x2E);
        WriteSPIManual(0x35);
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        WriteSPIManual(0x00);
        VSXDCS = 1;
        delay_ms(0);
        sfcount++;
    } // sfcount
}

I have included fillbufferwithzeros because it is still in my code. It’s just an artefact of trying the same method as ags first,
and I never took it out. Both of these functions are called in my player straight after skipping a track (before even looking for the next sd card file).
Maybe fillbufferwithzeros is no longer needed between tracks; I haven’t tried with it removed.

Code:

void fillbufferwithzeros() {
    int bytecount = 0;
    int n;  // byte counter within one 32-byte burst
    while (bytecount < 2048) {
        if (VSDREQ == 1) {  // only send when the chip asks for data
            VSXDCS = 0; // select vs1003 data
            n = 0;
            while (n < 32) { WriteSPIManual(0); n++; }
            bytecount = bytecount + 32;
            VSXDCS = 1;
        } // dreq
    } // bytecount
}

pasi
VLSI Staff
Posts: 1670
Joined: Thu 2010-07-15 16:04

Re: Flushing audio from VS1011e

Post by pasi » Tue 2017-01-03 14:47

Yep, silent mp3 frames handle the same thing.

Either
a) they are the same rate and thus flush the windowing history, or
b) are different rate/parameters and cause the decoder to throw away the history because the previous frame is not compatible with the new frame.

(I remember reading the discussion somewhere here.)

ags
Senior User
Posts: 34
Joined: Sun 2013-03-31 7:51
Location: Silicon Valley

Re: Flushing audio from VS1011e

Post by ags » Thu 2017-01-05 7:00

OK. I think I am making progress. It is obvious to me now (after the replies) but I was missing the point that the bitstream FIFO is feeding an MP3 decoder, and the decoder doesn't just "pass through" byte values. Sending a bunch of zeros doesn't push a bunch of zero values to the DAC FIFO. If it's not a valid, silent MP3 frame, the MP3 decoder will throw the bytes away. Is that right? If so, then what purpose does sending the (recommended) 2048 zero-value bytes to the bitstream FIFO serve? I suppose it would "complete" a partial MP3 frame (although what it would be encoding would be somewhat random, depending on what was in the partial MP3 frame data).

I'm still fuzzy on the operation of the DAC FIFO. Doesn't it just run, and eventually clear itself? And once empty, it will then detect a "no audio" state and go into a low power mode (until more real audio data is sent from the MP3 decoder "listening" to the bitstream FIFO)? But I've read something about the DAC hardware trying to "stretch" audio data when there is a buffer underrun. Can you clarify how this works, particularly as it impacts the ending of one set of valid (perhaps truncated/incomplete) audio data and the beginning of some new data?

One more thing: I had to prove to myself that the "silent MP3 frame" provided by Brek was indeed valid (as an exercise and learning opportunity for me). I was able to see that it does, in fact, start with an MP3 frame header indicating 8 kbps, mono, 22050 samples/sec. I'm not an MP3 expert (and would prefer not to be). However, psychoacoustic dynamics and compression aside, I have presumed that the basic mechanism is to perform a Fourier transform to convert audio from the time domain to the frequency domain, and then to apply the magic compression. If this is correct, I'd expect the MP3 data for silence to be all zeros (which Brek's sample is not). Is the sample not really silent, or does absolute silence not result in MP3 encoding of all zeros (all frequencies having zero magnitude)?
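The header check described above can be reproduced with a few shifts and table lookups. This is a minimal sketch that only handles MPEG-2 Layer III headers, which is all the frame in this thread needs; the struct and function names are made up for the example, and the tables follow the standard MPEG audio header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned bitrate_kbps;
    unsigned sample_rate_hz;
    int      mono;         /* 1 if the channel mode is single channel */
    unsigned frame_bytes;  /* total frame length, header included */
} mp3_frame_info;

/* Decode a 4-byte MPEG-2 Layer III header. Returns 0 on success, -1 if the
 * header is not MPEG-2 Layer III or uses a free/reserved index. */
int mp3_decode_mpeg2_l3(const uint8_t h[4], mp3_frame_info *info)
{
    static const unsigned kbps[16] = {0, 8, 16, 24, 32, 40, 48, 56,
                                      64, 80, 96, 112, 128, 144, 160, 0};
    static const unsigned hz[4] = {22050, 24000, 16000, 0};

    assert(h != NULL && info != NULL);

    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0) return -1;  /* 11-bit sync */
    if (((h[1] >> 3) & 3) != 2) return -1;                 /* version: MPEG-2 */
    if (((h[1] >> 1) & 3) != 1) return -1;                 /* layer: III */

    unsigned bitrate = kbps[h[2] >> 4];
    unsigned rate = hz[(h[2] >> 2) & 3];
    unsigned padding = (h[2] >> 1) & 1;
    if (bitrate == 0 || rate == 0) return -1;

    info->bitrate_kbps = bitrate;
    info->sample_rate_hz = rate;
    info->mono = ((h[3] >> 6) == 3);
    /* MPEG-2 Layer III carries 576 samples per frame: 576 / 8 = 72. */
    info->frame_bytes = 72u * bitrate * 1000u / rate + padding;
    return 0;
}
```

Fed the first four bytes of Brek's frame (0xFF, 0xF3, 0x10, 0xC4), this reports 8 kbps, 22050 Hz, mono, and a frame length of 26 bytes, which matches both the byte count in the posted function and the observation that one frame fits inside a single 32-byte DREQ burst.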

Brek
Senior User
Posts: 61
Joined: Sun 2016-09-11 5:51

Re: Flushing audio from VS1011e

Post by Brek » Thu 2017-01-05 8:09

The authors of LAME have put their version string in there: the bytes from 0x4C to 0x35 spell “LAME3.99.5”.
It does still work fine if replaced with zeros, but I have left the 0x03 and 0x48 bytes that precede that string,
since I don’t really understand all there is to know about mp3 frames either.

Also looking at my player again today, I noticed I call that function 4 times, meaning that my player sends 24 frames in total.
This is still a minuscule and not noticeable time though.

It’s probably not needed when the player naturally finishes one track and automatically starts the next.
Not only do most songs begin and end with silence, but I assume the whole thing is really our fault because if you skip a track,
you will more often than not interrupt the track in a position somewhere that isn’t the exact end of a frame.

I can’t answer the other questions sorry, just this stuff that I’ve actually observed.
