question about audio frame format of vs1063

jwc1979
User
Posts: 16
Joined: Tue 2020-09-08 2:56

question about audio frame format of vs1063

Post by jwc1979 » Tue 2020-09-08 3:02

hello,

I need to packetize the audio frames from the vs1063 into live555 and stream them over the internet. I have set the vs1063 to output MP3, but I found that the output stream looks different from standard MP3 frames. does anyone know the output audio frame format?
any reply will be appreciated.

Henrik
VLSI Staff
Posts: 1178
Joined: Tue 2010-06-22 14:10

Re: question about audio frame format of vs1063

Post by Henrik » Tue 2020-09-08 14:00

Hello!

VS1063 MP3 output is fully conforming to the MP3 format. If you are getting results that don't look like valid MP3 frames, could you grab some data and attach to this thread? A few kilobytes, or a second or two's worth will be enough. Then we could analyze and see if we can find out what might be going wrong.

Kind regards,
- Henrik
Good signatures never die. They just fade away.

jwc1979
User
Posts: 16
Joined: Tue 2020-09-08 2:56

Re: question about audio frame format of vs1063

Post by jwc1979 » Wed 2020-09-09 3:59

here is the saved audio data.
would you please check this data against the MP3 frame format?
thank you very much.
Attachments
test-new-02.mp3
(93.38 KiB)

jwc1979
User
Posts: 16
Joined: Tue 2020-09-08 2:56

Re: question about audio frame format of vs1063

Post by jwc1979 » Wed 2020-09-09 10:06

it seems like there are some padding bytes in the audio frame data zone.
the padding is some strings like "VSMP3 enc v1.00."
is that right?

Henrik
VLSI Staff
Posts: 1178
Joined: Tue 2010-06-22 14:10

Re: question about audio frame format of vs1063

Post by Henrik » Wed 2020-09-09 14:44

Hello!

The following answer is a bit on the long side, I hope it will not be too confusing.

First, I've analyzed your file with a custom tool, and this is what I get:

Code:

Summary for test-new-02.mp3
    Size: 93.4 KiB / 0.1 MiB
  Format: MP3
    Conf: 1 channels @ 8000 Hz
    Time: 0:47.7
 Bitrate: 16.0 kbit/s CBR (16 kbit/s, 0.0% padding frames)
  ChMode: Mono 100.0%
  16kb/s: 664 *****************************************************************
I also checked the MP3 file individually, frame by frame, and there are no framing errors. So, the file fully conforms to the MP3 standard.

From the analysis, it seems you have set the encoder for Mono 8 kHz recording with Constant Bit-Rate (CBR) encoding at 16 kbit/s. Is this correct?

You are correct in that if the encoder fails to fill the whole MP3 frame with actual audio data, it will pad the rest of the frame with an identification string, which in the case of the VS1063a encoder is a repeating 16-character string "VSMP3 enc v1.00." The unused portion can contain anything; it will not affect decoding.

As for the question *why* there is unused space, I have some further questions:
What are you writing to SCI_WRAMADDR before starting encoding? Bit 10 (mask 0x0400) is particularly interesting. If set, it forces the encoder to *not* use the so-called bit reservoir. While setting this bit makes the encoding-decoding delay 576 samples (576/8000 = 0.072 seconds) shorter, it also causes encoding inefficiency, and exactly this kind of behaviour.
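
A minimal sketch of the bit test described above, written in Python for illustration only. The mask 0x0400 and the 576-sample figure come from this post; the function names are hypothetical, and how the value actually reaches the chip (the SCI register write in your firmware) is outside the sketch.

```python
# Bit 10 of the value written to SCI_WRAMADDR before encoding starts.
# Per the post above, a set bit forbids use of the MP3 bit reservoir.
NO_BIT_RESERVOIR = 0x0400


def bit_reservoir_disabled(wramaddr_value: int) -> bool:
    """Return True if the given 16-bit register value forbids the bit reservoir."""
    return bool(wramaddr_value & NO_BIT_RESERVOIR)


def allow_bit_reservoir(wramaddr_value: int) -> int:
    """Return the same 16-bit value with bit 10 cleared, all other bits untouched."""
    return wramaddr_value & ~NO_BIT_RESERVOIR & 0xFFFF


def reservoir_delay_seconds(sample_rate_hz: int, samples: int = 576) -> float:
    """Extra encode/decode delay attributed to the reservoir buffering."""
    return samples / sample_rate_hz
```

Only bit 10 is interpreted here; the remaining bits of the value are passed through unchanged because their meaning is not discussed in this thread.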

If you want to utilize the available bits as well as possible and avoid the "dead weight" of the padding string, and if your application can live with bit-rates that don't stay exactly the same all the time, I recommend you use Quality-based encoding. On the scale from 0 to 10, I recommend you try different values and see what you end up with.

I hope this helps. If you have further questions, don't hesitate to ask!

Kind regards,
- Henrik

Henrik
VLSI Staff
Posts: 1178
Joined: Tue 2010-06-22 14:10

Re: question about audio frame format of vs1063

Post by Henrik » Wed 2020-09-09 15:28

Hello again!

I developed my analysis tool a bit further, and it does indeed seem that you have forbidden the use of the MP3 bit reservoir by setting bit 10 of SCI_WRAMADDR before activating encoding. For better audio at the same bit-rate (and fewer padding strings), I recommend you clear the bit.

Kind regards,
- Henrik

jwc1979
User
Posts: 16
Joined: Tue 2020-09-08 2:56

Re: question about audio frame format of vs1063

Post by jwc1979 » Thu 2020-09-10 4:43

hi Henrik
first, thank you for your detailed explanation.
now I have some more questions:
1. in the MP3 frame header, there is a bit that indicates whether padding exists or not. in the file I attached last time, this bit is zero, which means there is no padding, but the padding actually exists. is this correct?

2. for audio data like this, can I packetize it into live555 for an RTSP stream directly? or should I remove the padding bytes? in other words, could decoders other than the vs1063 decode the audio frames we are talking about?

thanks again.

Wayne.

Henrik
VLSI Staff
Posts: 1178
Joined: Tue 2010-06-22 14:10

Re: question about audio frame format of vs1063

Post by Henrik » Thu 2020-09-10 9:05

Hello Wayne!

Another lengthy answer follows...
jwc1979 wrote:
Thu 2020-09-10 4:43
now I have some more questions:
1. in the MP3 frame header, there is a bit that indicates whether padding exists or not. in the file I attached last time, this bit is zero, which means there is no padding, but the padding actually exists. is this correct?

Ah, yes. That is not what the "padding" bit in the headers means. It is a concept needed for files that are encoded at 11025, 22050, or 44100 Hz. If the previous answer was long, this is going to be a novella!

The MP3 header padding bit

Let's take an example: we encode, as you do, a file at 8 kHz Constant Bit-Rate 16 kbit/s. As per the MP3 specification, each MP3 frame at 8 kHz outputs 576 samples. From this information, we can calculate a frame size of (576 sample/frame) / (8000 sample/s) * 16000 bit/s / (8 bit/byte) = 144 byte/frame.

The frame length is not explicitly written into the file, because that would make the file larger, and because the length of the frame can be calculated from the parameters already available in the MP3 header.

So far so good, but what has this all to do with the padding bit? We are getting there.

Now let's encode audio at 44100 Hz, and Constant Bit-Rate 128 kbit/s. As per the MP3 specification, each MP3 frame at 44.1 kHz outputs 2*576 = 1152 samples. From this information, we can calculate an average frame size of (1152 sample/frame) / (44100 sample/s) * 128000 bit/s / (8 bit/byte) = approximately 417.9591 byte/frame.

But, this is a bit annoying... The frame size is not an integer! Per the MP3 specification, the frame size is always rounded downwards, so the length of this particular kind of frame would be 417 bytes.

But, this makes no sense, you might shout out in exasperation! If calculating backwards, you can see that this would lead to a file that is not 128 kbit/s as requested, but approximately 127.706 kbit/s. This would clearly be unacceptable for ADSL-type streaming applications that were the original goal for the MP3 format. And wouldn't it be wiser to round the frame length to 418 bytes instead of 417, creating a much smaller rounding error? Actually no. There is a more clever way. The padding bit.

If the padding bit is set in a particular frame, that means that the frame is actually one byte longer than nominal, so in this case 418 bytes instead of 417 bytes. By alternating these two frame lengths, the encoder can make sure that on average the CBR-encoded file runs exactly at the given bitrate. So, in this case, 95.91% of the frames would have the padding bit set, and the rest would not, resulting in an average frame length of 417.9591 bytes/frame. Problem solved! Except with some MP3 encoders that don't know what to do with the padding bit, creating files that are fully decodable by any decoder, but strictly speaking not conforming.
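
The arithmetic above can be checked with a short Python sketch using exact fractions. The function names are hypothetical, and the running-remainder scheme in `padding_schedule` is just one plausible way for an encoder to decide which frames get the padding bit; it is not claimed to be what the VS1063 or any particular encoder does.

```python
from fractions import Fraction


def frame_stats(samples_per_frame: int, sample_rate: int, bitrate: int):
    """Return (exact_bytes, nominal_bytes, padded_fraction, unpadded_bitrate).

    exact_bytes      -- average frame size needed to hit the bitrate exactly
    nominal_bytes    -- exact_bytes rounded down (frame size without padding)
    padded_fraction  -- share of frames that must carry one extra byte
    unpadded_bitrate -- bit/s you would get if no frame were ever padded
    """
    exact = Fraction(samples_per_frame * bitrate, sample_rate * 8)
    nominal = exact.numerator // exact.denominator  # round downwards
    padded_fraction = exact - nominal
    unpadded_bitrate = Fraction(nominal * 8 * sample_rate, samples_per_frame)
    return exact, nominal, padded_fraction, unpadded_bitrate


def padding_schedule(samples_per_frame: int, sample_rate: int,
                     bitrate: int, n_frames: int) -> list:
    """Return a list of 0/1 padding bits, one per frame.

    Accumulates the fractional byte owed per frame and emits a padded
    frame whenever a whole byte has built up.
    """
    _, _, remainder, _ = frame_stats(samples_per_frame, sample_rate, bitrate)
    owed = Fraction(0)
    bits = []
    for _ in range(n_frames):
        owed += remainder
        if owed >= 1:
            bits.append(1)
            owed -= 1
        else:
            bits.append(0)
    return bits
```

For 44100 Hz at 128 kbit/s the exact size works out to 417 + 47/49 bytes, so 47 of every 49 frames are padded. For 8000 Hz at 16 kbit/s the size is exactly 144 bytes and no frame is ever padded, which matches the 0.0% padding frames reported for the attached file.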

Note that adding the padding bit is solely a responsibility of the encoder. The job of the decoder is to recognize whether the padding bit is set in a given frame, and act accordingly.
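
On the receiving side, the frame length can be recovered from the four header bytes alone. The following is a minimal Python sketch, assuming the commonly documented MPEG audio header layout and Layer III bitrate tables; 8 kHz falls under the MPEG-2.5 extension in that layout. The function names are hypothetical, free-format and CRC handling are left out, and a robust splitter would also confirm that the next header follows where expected.

```python
# Layer III bitrates in kbit/s, indexed by the 4-bit bitrate index.
# Index 0 (free format) and 15 (invalid) are not handled in this sketch.
BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96,
                  112, 128, 160, 192, 224, 256, 320, None]
BITRATES_V2_L3 = [None, 8, 16, 24, 32, 40, 48, 56,
                  64, 80, 96, 112, 128, 144, 160, None]

# Sample rates in Hz, keyed by the 2-bit version field.
SAMPLE_RATES = {
    0b11: [44100, 48000, 32000],  # MPEG-1
    0b10: [22050, 24000, 16000],  # MPEG-2
    0b00: [11025, 12000, 8000],   # MPEG-2.5
}


def parse_header(h: bytes):
    """Return (frame_length_bytes, padding_bit), or None if not a usable header."""
    if len(h) < 4 or h[0] != 0xFF or (h[1] & 0xE0) != 0xE0:
        return None  # 11-bit sync word missing
    version = (h[1] >> 3) & 0x03
    layer = (h[1] >> 1) & 0x03
    if version not in SAMPLE_RATES or layer != 0b01:  # 0b01 means Layer III
        return None
    bitrate_index = h[2] >> 4
    rate_index = (h[2] >> 2) & 0x03
    padding = (h[2] >> 1) & 0x01
    table = BITRATES_V1_L3 if version == 0b11 else BITRATES_V2_L3
    kbps = table[bitrate_index]
    if kbps is None or rate_index == 3:
        return None
    sample_rate = SAMPLE_RATES[version][rate_index]
    # 1152 samples/frame (MPEG-1) or 576 (MPEG-2, 2.5), divided by 8 bit/byte.
    coefficient = 144 if version == 0b11 else 72
    length = coefficient * kbps * 1000 // sample_rate + padding
    return length, padding


def iter_frames(data: bytes):
    """Yield complete frames from a byte string, skipping bytes until a header parses."""
    pos = 0
    while pos + 4 <= len(data):
        parsed = parse_header(data[pos:pos + 4])
        if parsed is None:
            pos += 1  # naive resync: slide forward one byte
            continue
        length, _ = parsed
        if pos + length > len(data):
            break  # trailing partial frame
        yield data[pos:pos + length]
        pos += length
```

With an 8 kHz mono 16 kbit/s header the function reports 144 bytes with the padding bit clear, and with a 44.1 kHz 128 kbit/s header it reports 417 or 418 bytes depending on the padding bit.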

What's the meaning of the "VSMP3 enc v1.00." padding then?

This is a completely different matter.

To be able to perform in real-time even at the highest sample rates and bitrates, our encoder is a one-pass encoder. What that means is that it cannot try to iteratively encode the frame again and again, fudging its parameters, until the frame just about gets filled up. No, it must encode the data in such a way that it can *never* exceed the maximum frame size. These restrictions make it so that our encoder will always, depending on the audio complexity, leave some 20-50% of the frame space unused.

Example: As we calculated earlier, your 16 kbit/s 8 kHz file consists of frames that are 144 bytes long. The header eats up 4 bytes, and as you have requested a CBR file, the encoder needs to make sure that in no case shall it be possible that the data encoded would exceed 140 bytes. Never. Because we are one-pass. And if we encode too wide, we cannot "go back" and redo it. So, we need to leave a margin wide enough so as never to fail.

So, let's say that our encoding made the actual audio data take 100 bytes. Now we have a header of 4 bytes, data of 100 bytes, but there are still 40 bytes left to fill in the frame to make it legal. What to do with them?

The answer is: it doesn't matter. You fill them up with any data you wish to. The decoder doesn't care. After the decoder has read the 4 bytes of header, and 100 bytes of audio data, it will ignore the rest of the frame. It's just dead weight, needed to make the MP3 frame conform to the standard. So we fill this dead weight with "VSMP3 enc v1.00." Could be anything, really, so why not?
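
If you are curious how much of each frame is dead weight, a rough measurement is possible by counting the filler bytes at the end of a frame. This Python sketch assumes the filler sits at the very end of the frame and repeats with a 16-byte period, starting at any phase; that placement is an assumption, not something confirmed in this thread. The function name is hypothetical, and very short matches (one or two bytes) may be coincidental audio data.

```python
FILLER = b"VSMP3 enc v1.00."  # 16-byte identification string from the post above


def trailing_filler_len(frame: bytes) -> int:
    """Return the length of the longest frame suffix made of repeated FILLER.

    The suffix may start at any offset within the 16-byte string, so it is
    matched against a repetition long enough to cover every phase.
    """
    repeated = FILLER * (len(frame) // len(FILLER) + 2)
    best = 0
    for n in range(1, len(frame) + 1):
        if frame[-n:] in repeated:
            best = n
        else:
            break  # a longer suffix cannot match once a shorter one fails
    return best
```

This is for inspection only. The filler bytes must stay in the stream, since they are part of the frame length that decoders compute from the header.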

Why would allowing the Bit Reservoir make a difference?

Earlier I suggested you would allow the encoder to use the so-called Bit Reservoir. This is a powerful and useful concept of MP3, which I will explain below.

So, what is the Bit Reservoir?

As you saw, in the earlier example we didn't use 40 bytes of a frame. Those 40 bytes were wasted, and of no use whatsoever.

With Bit Reservoir activated, any bytes not used in an MP3 frame can be spared, and used in encoding the next MP3 frame. So, if we leave 40 bytes unencoded as we did in the example above, for the next frame we would not have 140 bytes to encode audio data, but instead 140+40 = 180 bytes, 40 of them located in the previous frame, and the 140 bytes located in the new frame. Now, the VS1063 encoder can (and will) modify its encoding parameters to higher accuracy, and use more bits to represent the new frame. So, instead of perhaps 100 bytes, it could use 120 bytes, and still be safe that it wouldn't exceed the hard limit that was now 180 bytes.

So, if this frame used 120 bytes of the 180 it had, there are now 60 bytes spared for the next frame, making its largest possible size 140+60 = 200 bytes. Again, the encoder may encode the following frame with greater accuracy, until after a few frames an equilibrium is reached, and the average actual encoded amount of data is that of the frame size audio portion, or 140 bytes.
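
The bookkeeping in this example can be written down as a small Python sketch. It models only the simplified byte accounting used in this post (a fixed payload per frame plus whatever the previous frames left over); the function name is hypothetical, and the limit the real format places on how far back a frame may reach is deliberately ignored.

```python
def reservoir_budgets(frame_payload: int, used_per_frame: list):
    """Return (budgets, carry) for a sequence of encoded frame sizes.

    frame_payload  -- bytes available for audio data in each frame (e.g. 140)
    used_per_frame -- bytes the encoder actually spent on each frame
    budgets        -- the hard limit each frame had: payload plus saved bytes
    carry          -- bytes still saved up after the last frame
    """
    budgets = []
    carry = 0
    for used in used_per_frame:
        budget = frame_payload + carry
        if used > budget:
            raise ValueError("frame exceeds its hard limit")
        budgets.append(budget)
        carry = budget - used
    return budgets, carry
```

Feeding in the numbers from the example, `reservoir_budgets(140, [100, 120])` gives budgets of 140 and 180 bytes and leaves 60 bytes saved, so the third frame may use up to 200 bytes.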

Of course, writing to an earlier frame cannot be done after it has been output to the receiver, so this requires an additional step of buffering, which is automatically done by VS1063, and not a concern for the user. The only difference is that it causes a small extra delay because of the extra buffering. This delay amounts to 0.072 seconds with VS1063 at 8 kHz.

The Bit Reservoir is a fundamental part of the MP3 format. Unless you know of a specific reason not to use it, let the encoder use it!

jwc1979 wrote:
2. for audio data like this, can I packetize it into live555 for an RTSP stream directly?

Yes. (Assuming you can put any arbitrary MP3 data there.)

jwc1979 wrote:
or should I remove the padding bytes?

No. They are dead weight, but within the format, and ignored by decoders. Remove the padding data, and the file will not be MP3 anymore, and will not be decoded by anything.

jwc1979 wrote:
in other words, could decoders other than the vs1063 decode the audio frames we are talking about?

Yes. The files created by VS1063 fully conform to the MP3 standard.

Kind regards,
- Henrik

jwc1979
User
Posts: 16
Joined: Tue 2020-09-08 2:56

Re: question about audio frame format of vs1063

Post by jwc1979 » Thu 2020-09-10 10:54

Hello Henrik,

Thank you for your patience in explaining the cause so clearly.
I have no more questions now. thanks again.

kind regards,

Wayne.

Henrik
VLSI Staff
Posts: 1178
Joined: Tue 2010-06-22 14:10

Re: question about audio frame format of vs1063

Post by Henrik » Thu 2020-09-10 12:21

You're very welcome, Wayne!

Kind regards,
- Henrik
