<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sevana Oy &#187; test</title>
	<atom:link href="http://wordpress.sevana.fi/tag/test/feed/" rel="self" type="application/rss+xml" />
	<link>http://wordpress.sevana.fi</link>
	<description>The Sevana Product Blog</description>
	<lastBuildDate>Fri, 03 Sep 2010 16:57:04 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Who&#8217;s the main competitor to the new method? What&#8217;s the catch?</title>
		<link>http://wordpress.sevana.fi/whos-the-main-competitor-to-the-new-method-whats-the-catch/</link>
		<comments>http://wordpress.sevana.fi/whos-the-main-competitor-to-the-new-method-whats-the-catch/#comments</comments>
		<pubDate>Mon, 02 Mar 2009 05:14:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Voice and Sound Quality Testing Software]]></category>
		<category><![CDATA[approach]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[automatically]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[different]]></category>
		<category><![CDATA[dll]]></category>
		<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[file]]></category>
		<category><![CDATA[files]]></category>
		<category><![CDATA[Mean Opinion Score]]></category>
		<category><![CDATA[method]]></category>
		<category><![CDATA[mos]]></category>
		<category><![CDATA[P.862]]></category>
		<category><![CDATA[Perceptual]]></category>
		<category><![CDATA[qoe]]></category>
		<category><![CDATA[qos]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[quality of service]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[voice]]></category>
		<category><![CDATA[vqt]]></category>

		<guid isPermaLink="false">http://wordpress.sevana.fi/?p=72</guid>
		<description><![CDATA[The solution we are discussing in these posts is different and adds value to other products: it is not a so-called QoE (Quality of Experience) tool, it does not rely on real-time user input, and it is based neither on MOS (Mean Opinion Score) nor on operational parameters. The product in question is different [...]]]></description>
			<content:encoded><![CDATA[<p>The solution we are discussing in these posts is different and adds value to other products: it is not a so-called QoE (Quality of Experience) tool, it does not rely on real-time user input, and it is based neither on MOS (Mean Opinion Score) nor on operational parameters. The product in question is different and covers a specific area: it applies before any product is integrated and before one gets to the user experience, and it can easily be integrated into a larger system, for instance as a DLL. Or, if you already have a system in operation, you gain two big advantages: 1. you can generate a speech model to evaluate the voice quality your system provides, and 2. you can test it on the fly by simply comparing audio files.</p>
<p>It seems reasonable to regard ITU-T P.862, the so-called PESQ, as the method&#8217;s major competitor. If you google it, however, you will find that only a few world leaders have this method implemented in their systems, and&#8230; even fewer have actually implemented it in-house, since the majority use the same technology provider. (Just google it!)</p>
<p>OK, what&#8217;s the catch? Price is obviously part of it; for a better understanding of that, please refer to this presentation (http://www.sevana.fi/Automatic_Sound_Signals_Quality_Estimation_Business_Benefits.pdf). But that&#8217;s not all. Anyone who compares PESQ with the method in question will find some quite interesting differences, and advantages in favor of the new method. We will leave that for the next post, so that the information is easier to digest. See you in our next post!</p>
]]></content:encoded>
			<wfw:commentRss>http://wordpress.sevana.fi/whos-the-main-competitor-to-the-new-method-whats-the-catch/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to compare two audio files quality wise?</title>
		<link>http://wordpress.sevana.fi/how-to-compare-two-audio-files-quality-wise/</link>
		<comments>http://wordpress.sevana.fi/how-to-compare-two-audio-files-quality-wise/#comments</comments>
		<pubDate>Sat, 28 Feb 2009 08:30:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Voice and Sound Quality Testing Software]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[analyze]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[cisco]]></category>
		<category><![CDATA[compare]]></category>
		<category><![CDATA[compare speech]]></category>
		<category><![CDATA[compare two voice files]]></category>
		<category><![CDATA[file]]></category>
		<category><![CDATA[files]]></category>
		<category><![CDATA[how to compare two audio files]]></category>
		<category><![CDATA[ITU]]></category>
		<category><![CDATA[Mean Opinion Score]]></category>
		<category><![CDATA[mos]]></category>
		<category><![CDATA[P.862]]></category>
		<category><![CDATA[Perceptual]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[signals]]></category>
		<category><![CDATA[similarity]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[voice]]></category>
		<category><![CDATA[voice quality testing QoS MOS method methods pesq]]></category>
		<category><![CDATA[vqt]]></category>

		<guid isPermaLink="false">http://wordpress.sevana.fi/?p=66</guid>
		<description><![CDATA[To compare two audio signals we introduce an analytical module that separately compares matched pairs of fragments from the active and inactive phases of the signal, which gives a more accurate estimate.
For each fragment we determine the integral spectrum using the discrete cosine transform (DCT). Spectrum integration is calculated according to a proprietary formula. In the spectrum [...]]]></description>
			<content:encoded><![CDATA[<p>To compare two audio signals we introduce an analytical module that separately compares matched pairs of fragments from the active and inactive phases of the signal, which gives a more accurate estimate.</p>
<p>For each fragment we determine the integral spectrum using the discrete cosine transform (DCT). Spectrum integration is calculated according to a proprietary formula. In the spectrum calculation, consecutive windows overlap by N/2 samples, and the Hamming or Blackman-Harris window function is applied to every window. Spectrum energy levels are determined for all sets of bands. Several groups of critical bands are already known; they were determined by different authors from different models of sound perception and speech production.</p>
<p>Band boundaries (initial and terminal indices) and band energy values are determined by a set of proprietary formulas. The initial quality estimate is taken as 100% and decreases in proportion to the differences between the band energies. The most interesting fact is that we can scale our percentage of signal similarity to the well-known Mean Opinion Score (MOS) values, which in tests agree with Cisco MOS or ITU-T P.862 to as much as 97%. More interesting facts to follow &#8211; stay in touch!</p>
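<p>The idea of starting at 100% and lowering the estimate as band energies diverge can be illustrated with a minimal sketch. The actual formulas are proprietary, so the penalty used here (the mean relative energy difference per band) and the function name <code>band_similarity_percent</code> are assumptions made purely for illustration.</p>

```python
def band_similarity_percent(ref_energies, test_energies):
    """Start at 100% and subtract a penalty proportional to the relative
    energy difference in each band (hypothetical formula, not the
    proprietary one mentioned in the post)."""
    if len(ref_energies) != len(test_energies) or not ref_energies:
        raise ValueError("need two non-empty band lists of equal length")
    penalty = 0.0
    for e_ref, e_test in zip(ref_energies, test_energies):
        largest = max(e_ref, e_test)
        if largest > 0.0:
            # Relative difference lies in [0, 1] for non-negative energies.
            penalty += abs(e_ref - e_test) / largest
    return 100.0 * (1.0 - penalty / len(ref_energies))


identical = band_similarity_percent([4.0, 4.0], [4.0, 4.0])
degraded = band_similarity_percent([4.0, 4.0], [2.0, 4.0])
```

<p>With this choice, identical band energies keep the estimate at 100%, and halving the energy in one of two bands lowers it to 75%.</p>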
]]></content:encoded>
			<wfw:commentRss>http://wordpress.sevana.fi/how-to-compare-two-audio-files-quality-wise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Speech Intelligibility is not enough &#8211; let&#8217;s go for speech transmission index! The truth is out there&#8230;</title>
		<link>http://wordpress.sevana.fi/speech-intelligibility-is-not-enough-lets-go-for-speech-transmission-index-the-truth-is-out-there/</link>
		<comments>http://wordpress.sevana.fi/speech-intelligibility-is-not-enough-lets-go-for-speech-transmission-index-the-truth-is-out-there/#comments</comments>
		<pubDate>Sat, 21 Feb 2009 12:44:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Voice and Sound Quality Testing Software]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[codec]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[on the fly]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[RATSI]]></category>
		<category><![CDATA[SII]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[STIPA]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[voice]]></category>
		<category><![CDATA[voice quality testing QoS MOS method methods pesq]]></category>
		<category><![CDATA[vqt]]></category>

		<guid isPermaLink="false">http://wordpress.sevana.fi/?p=42</guid>
		<description><![CDATA[Today we would like to present two more methods of measuring voice quality, for example when it is degraded by a lossy codec or by transmission over mobile or fixed networks. The methods are similar in approach and differ in implementation; here they are:
STI (Speech Transmission Index). A speech signal can be approximately regarded as a broadband signal [...]]]></description>
			<content:encoded><![CDATA[<p>Today we would like to present two more methods of measuring voice quality, for example when it is degraded by a lossy codec or by transmission over mobile or fixed networks. The methods are similar in approach and differ in implementation; here they are:</p>
<p>STI (Speech Transmission Index). A speech signal can be approximately regarded as a broadband signal modulated by a low-frequency signal. The articulation rate determines the modulation frequency. When the modulation depth decreases, the speech signal becomes more like noise and its intelligibility decreases. Accordingly, the loss of intelligibility can be estimated from the decrease in modulation depth. The whole speech range is divided into 7 octave bands. An octave-band noise signal is the input. The intensity distribution of the test signal matches that of speech signals. The modulating signal frequencies vary from 0.63 to 12.5 Hz in one-third-octave steps (14 frequencies in all). The STI measuring method is specified in the international standard IEC 60268-16 (formerly IEC 268-16).</p>
<p>RASTI/STIPA (Rapid Speech Transmission Index). The STI method needs a lot of measurements and calculations. A simplified method was therefore developed that measures in only 2 bands with 5 modulation frequencies, reducing the number of measurements and calculations. For good intelligibility, RASTI values must be at least 0.6.</p>
<p>As these methods are analogous, they share common disadvantages: both the speech transmission index and the rapid speech transmission index imitate the speech production process by means of a noise model, but accounting for the properties of speech production and hearing in this way is far from optimal.</p>
<p>Wondering why we are posting short descriptions of these methods and their disadvantages? That will be revealed in the next post, about C50, the factor of clearness method.</p>
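<p>As a quick check of the modulation frequency grid, the following sketch generates 14 frequencies spaced one-third octave apart. The start value of 0.63 Hz is taken as the value for which 14 one-third-octave steps end near 12.5 Hz; the function name is hypothetical.</p>

```python
def modulation_frequencies(start_hz=0.63, count=14):
    """Return `count` frequencies spaced one-third octave apart.

    Each step multiplies the frequency by 2 ** (1/3), so three steps
    double it (one octave).
    """
    return [start_hz * 2.0 ** (k / 3.0) for k in range(count)]


freqs = modulation_frequencies()
```

<p>The exact geometric spacing ends at about 12.7 Hz; the nominal value quoted for the last frequency is the rounded 12.5 Hz.</p>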
]]></content:encoded>
			<wfw:commentRss>http://wordpress.sevana.fi/speech-intelligibility-is-not-enough-lets-go-for-speech-transmission-index-the-truth-is-out-there/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>White Paper:Automated Sound Quality Estimation</title>
		<link>http://wordpress.sevana.fi/white-paperautomated-sound-signals-quality-estimation/</link>
		<comments>http://wordpress.sevana.fi/white-paperautomated-sound-signals-quality-estimation/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 08:40:19 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Voice and Sound Quality Testing Software]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[automatically]]></category>
		<category><![CDATA[codec]]></category>
		<category><![CDATA[Intelligibility]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[mos]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[on the fly]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[RATSI]]></category>
		<category><![CDATA[SII]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[STIPA]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[voice]]></category>
		<category><![CDATA[voice quality testing QoS MOS method methods pesq]]></category>
		<category><![CDATA[vqt]]></category>

		<guid isPermaLink="false">http://wordpress.sevana.fi/?p=21</guid>
		<description><![CDATA[The paper presents a new method for automated voice quality estimation and analysis based on perceptual evaluation as described by such scientists as Sapozhnikov, Pokrovskiy and Sorokin. The approach presented in the article makes it possible to check the quality of any codec on the fly, compare two audio/voice files quality-wise, or organize continuous quality of service monitoring in VoIP networks.]]></description>
			<content:encoded><![CDATA[<h2>Automated sound signals quality estimation</h2>
<h1 class="western"><span style="font-size: x-small;">1. INTRODUCTION</span></h1>
<p class="western"><span style="font-size: x-small;">Sound signal quality estimation is becoming increasingly important with the spread of mobile communications, synthetic telephony systems, VoIP, and various portable sound recording and reproducing devices. Naturally, one wants a method that provides objective estimation (i.e. independent of any particular listener’s judgement) and that can be automated. This matters both for comparing competing commercial products and for optimising the parameters of proprietary products.</span></p>
<p class="western"><span style="font-size: x-small;">One of the main parameters of systems for the compression, transmission and reproduction of sound is the quality of the restored, received or reproduced sound.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">Quantitative measurement of sound quality has specific features because the final receiver of a sound signal is always a human, and a human is also the source of the majority of sound signals. It is well known that sound signal quality is determined not only by the technical characteristics of the sound processing and transmission systems, but also by individual peculiarities of speech perception and production, which vary over time and from individual to individual.</span></p>
<h1 class="western"><span style="font-size: x-small;">2. REVIEW OF QUALITY ESTIMATION METHODS</span></h1>
<p class="western" align="justify"><span style="font-size: x-small;">Methods of measuring speech quality are divided into subjective and objective ones. Subjective methods include a person’s hearing as a component of the measuring system. Objective methods, on the contrary, exclude human hearing from the measurement process.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">The most widespread subjective method of speech quality estimation is MOS (Mean Opinion Score), a five-point rating scale.<span id="more-21"></span></span></p>
<p class="western" align="justify"><span style="font-size: x-small;">This score is obtained by processing the ratings that groups of listeners give to sequences of sound signals reproduced by various audio systems. Each listener rates each signal, and the results are then averaged.</span></p>
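<p class="western" align="justify"><span style="font-size: x-small;">The averaging step can be sketched in a few lines. The panel data below is invented for illustration, and the function name is hypothetical; the 1 to 5 range follows the five-point scale mentioned above.</span></p>

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings on the five-point scale (1 = bad, 5 = excellent)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1..5 scale")
    return mean(ratings)


# Hypothetical panel: each row is one listener, each column one test signal.
panel = [[4, 2, 5], [5, 3, 4], [3, 2, 4]]
# Transpose so that each signal's ratings are averaged across listeners.
per_signal = [mean_opinion_score(column) for column in zip(*panel)]
```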
<p class="western" align="justify"><span style="font-size: x-small;">Organizing and carrying out subjective estimation is difficult, time-consuming and expensive. Research has therefore been conducted to find objective methods that give fast, automated estimates which correspond well to subjective tests.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">There are various automatic estimation methods; some of them are given below [1]:</span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><strong>AI (Articulation Index). </strong>The idea is that the whole frequency range of the speech signal is divided into 20 bands, whose widths are defined so that every band contributes equally to speech perception. The signal-to-noise ratio is calculated within every band, and the articulation index is taken to be the weighted sum of the band values.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><em>The disadvantage of the articulation index is that it does not take into account the properties of hearing and speech production, although it is oriented towards speech signals.</em></span></p>
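<p class="western" align="justify"><span style="font-size: x-small;">A minimal sketch of the articulation index as a weighted sum of band values follows. Two details are assumptions not stated above: each band’s signal-to-noise ratio is clipped to a 0 to 30 dB range and normalised by 30 dB, and all bands get the same weight because they are defined to contribute equally.</span></p>

```python
def articulation_index(band_snr_db, snr_range_db=30.0):
    """Weighted sum of per-band SNR values, normalised to 0..1.

    Assumptions: equal band weights and a usable SNR range of
    0..snr_range_db dB per band.
    """
    if not band_snr_db:
        raise ValueError("at least one band is required")
    weight = 1.0 / len(band_snr_db)
    total = 0.0
    for snr in band_snr_db:
        clipped = min(max(snr, 0.0), snr_range_db)
        total += weight * clipped / snr_range_db
    return total


# Hypothetical measurement: 15 dB SNR in each of the 20 bands.
ai_half = articulation_index([15.0] * 20)
```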
<p class="western" align="justify"><span style="font-size: x-small;"><strong>SII (Speech Intelligibility Index) </strong>is an evolution of the AI method and is defined in the American standard ANSI S3.5-1997. It provides 4 measuring procedures on different band groups: 21 critical bands, 18 one-third-octave bands, 17 equally contributing critical bands, and 6 octave bands. The signal-to-noise ratio is calculated within every band, and the total SII coefficient, which ranges from 0 to 1, is computed.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><em>The speech intelligibility index, however, takes into account only the properties of hearing, not speech production.</em></span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><strong>STI (Speech Transmission Index).</strong> A speech signal can be approximately regarded as a broadband signal modulated by a low-frequency signal. The articulation rate determines the modulation frequency. When the modulation depth decreases, the speech signal becomes more like noise and its intelligibility decreases. Accordingly, the loss of intelligibility can be estimated from the decrease in modulation depth.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">The whole speech range is divided into 7 octave bands. An octave-band noise signal is the input. The intensity distribution of the test signal matches that of speech signals. The modulating signal frequencies vary from 0.63 to 12.5 Hz in one-third-octave steps (14 frequencies in all).</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">The STI measuring method is specified in the international standard IEC 60268-16 (formerly IEC 268-16).</span></p>
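<p class="western" align="justify"><span style="font-size: x-small;">The link between modulation depth and the index can be sketched as follows. The sketch takes, for every band and modulation frequency, the ratio of output to input modulation depth, converts it to an apparent signal-to-noise ratio limited to ±15 dB, and averages the normalised values. Equal band weights are an assumption made for brevity (the standard uses its own weighting), and the function names are hypothetical.</span></p>

```python
import math


def transmission_index(m):
    """Map one modulation depth ratio m (0..1) to an index in 0..1."""
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    # Apparent signal-to-noise ratio implied by the loss of modulation.
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    snr_db = min(max(snr_db, -15.0), 15.0)
    return (snr_db + 15.0) / 30.0


def sti_from_modulation(ratios, band_weights=None):
    """ratios[band][k]: depth ratio for one band and modulation frequency k."""
    if band_weights is None:
        # Assumption: equal weights instead of the standard's weighting.
        band_weights = [1.0 / len(ratios)] * len(ratios)
    band_indices = [sum(transmission_index(m) for m in row) / len(row)
                    for row in ratios]
    return sum(w * ti for w, ti in zip(band_weights, band_indices))


# Hypothetical measurement: modulation depth halved everywhere (7 bands x 14).
sti_half = sti_from_modulation([[0.5] * 14 for _ in range(7)])
```

<p class="western" align="justify"><span style="font-size: x-small;">A ratio of 0.5 corresponds to an apparent signal-to-noise ratio of 0 dB, which is the midpoint of the range and gives an index of 0.5.</span></p>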
<p class="western" align="justify"><span style="font-size: x-small;"><strong>RASTI/STIPA (Rapid Speech Transmission Index).</strong> The STI method needs a lot of measurements and calculations. A simplified method was therefore developed that measures in only 2 bands with 5 modulation frequencies, reducing the number of measurements and calculations. For good intelligibility, RASTI values must be at least 0.6.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><em>Both the speech transmission index (STI) and the rapid speech transmission index (RASTI) imitate the speech production process by means of a noise model, but accounting for the properties of speech production and hearing in this way is far from optimal.</em></span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><strong>C50 (factor of clearness)</strong> characterises the clearness and clarity of sound. It is computed as the ratio of near echo to far echo, based on the fact that echo reduces signal intelligibility. The ratio is calculated in several frequency bands, with near echo (less than 33 ms) treated as useful signal and far echo (more than 33 ms) as disturbing signal.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><em>The factor of clearness takes into account only one kind of possible distortion, so it is worth applying only as one of several speech quality estimation approaches.</em></span></p>
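<p class="western" align="justify"><span style="font-size: x-small;">The near/far echo ratio can be sketched for a single band as an energy ratio over an impulse response. Several things here are assumptions: the impulse response is taken to start at the arrival of the direct sound, the ratio is expressed in decibels, and the boundary is a parameter. It defaults to the 33 ms used in the text, although the name C50 conventionally refers to a 50 ms boundary.</span></p>

```python
import math


def clarity_db(impulse_response, sample_rate_hz, boundary_ms=33.0):
    """Ratio of early (near echo) to late (far echo) energy, in dB."""
    split = int(round(sample_rate_hz * boundary_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    if early == 0.0 or late == 0.0:
        raise ValueError("both early and late parts must carry energy")
    return 10.0 * math.log10(early / late)


# Hypothetical response sampled at 1 kHz: 33 strong early samples,
# followed by 33 late samples at one tenth of the amplitude.
response = [1.0] * 33 + [0.1] * 33
clarity = clarity_db(response, 1000)
```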
<p class="western" align="justify"><span style="font-size: x-small;"><strong>ITU-T P.862 PESQ (Perceptual Evaluation of Speech Quality). </strong>PESQ is an objective measurement method that predicts the results of subjective listening tests on telephony systems. PESQ uses a sensory model to compare the original, unprocessed signal with the degraded signal from the network or network element. The resulting quality score is similar to the subjective &#8220;Mean Opinion Score&#8221; (MOS) measured using panel tests according to ITU-T P.800. The PESQ scores are calibrated using a large database of subjective tests. The method takes into account coding distortions, errors, packet loss, delay and variable delay, and filtering in analogue network components.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;"><em>Although it is one of the most popular tools, PESQ has a number of disadvantages. It requires test signals to be speech-like, because many systems are optimized for speech and respond in an unrepresentative way to non-speech signals (e.g. tones, noise, ITU-T P.50). The PESQ test signal is chosen by the tester, so vendor estimates may differ from end-customer estimates. The approach equalizes signal levels, which is theoretically questionable because speech at different volumes may have different spectra. PESQ also cannot catch the significant quality loss that occurs when the voice is equalized such that there is far less low-frequency and high-frequency energy than in the original voice file.</em></span></p>
<p class="western" align="justify"><span style="font-size: x-small;">The need to develop new methods and to improve existing ones is driven by the desire to bring objective and subjective quality estimates closer together and to make explicit use, in such systems, of our knowledge about hearing and speech production.</span></p>
<p><span style="font-size: x-small;">Whether an arbitrary or a specialized signal is used as the source signal depends on the purpose of the estimation (speech intelligibility evaluation, sound reproduction quality, quality estimation of speech transmitted through communication channels, etc.); choosing it to suit that purpose makes the estimation more objective.</span></p>
<h1 class="western"><span style="font-size: x-small;">3. GENERAL SCHEME OF THE SYSTEM</span></h1>
<p class="western" align="justify"><span style="font-size: x-small;">Figure 1 shows the general scheme of the quality estimation system for sound signals.</span></p>
<p class="western" align="center"><a name="_1212308666"></a><img src="http://docs.google.com/File?id=dcvss358_412gm75n23j_b" alt="" width="653" height="108" align="bottom" /></p>
<p class="western" align="center"><span style="font-size: x-small;"><strong>Fig.1. General scheme of the quality estimation system for sound signals</strong></span></p>
<p class="western"><span style="font-size: x-small;">The generator of test signals forms a sound signal according to one of the sound flow models. It can be either a specialized set of sound signals or a signal produced by the statistical speech model. (The signal models are considered in detail later.) The generator’s signal can either be saved for later use or passed on for processing and estimation. The bank of signals stores sound data produced by the signal generator or received from external sources.</span></p>
<p class="western"><span style="font-size: x-small;">Accordingly, the input of the estimation block is either the generator’s signal directly or a signal from the bank of signals. The test signal is fed both to the synchronizer and to the device under test, which can be, for example, a vocoder or a communication channel. The output signal of the device under test is also an input of the synchronizer.</span></p>
<p class="western"><span style="font-size: x-small;">The synchronizer aligns the initial signal and the processed signal in time. The synchronized signals are passed in chunks to the analytical module, which determines the degree of similarity between the signals and issues the quality estimate as a measure of similarity between the initial and the processed signals.</span></p>
<p class="western"><span style="font-size: x-small;">Let’s consider the functioning of the system modules in detail.</span></p>
<h3 class="western"><span style="font-size: x-small;">3.1. Generator of test signals</span></h3>
<p class="western"><span style="font-size: x-small;">The generator of test signals consists of a generator of noise signals and a simplified statistical speech model. Both generators simulate the process of “speaking”, but their approaches to simulating speech production differ. The statistical model forms the sound flow on the basis of human speech patterns, while the generator of noise signals is based on knowledge about sound perception and speech production.</span></p>
<h3 class="western"><span style="font-size: x-small;">3.2. Generator of noise signals</span></h3>
<p class="western"><span style="font-size: x-small;">The generator of noise signals uses a speech flow model like the one used in the STI method. The idea is that a speech signal can be approximately regarded as a broadband signal modulated by a low-frequency signal. The articulation rate determines the modulation frequency, which varies from 0.63 to 13.44 Hz.</span></p>
<p class="western"><span style="font-size: x-small;">The noise signal used for the modulation is obtained from white noise by cutting out the critical bands of either hearing or speech production. In the first case the generated signal allows the quality of sound signals in general to be estimated; in the second, specifically that of speech signals. The critical bands are considered in detail in the description of the analytical module.</span></p>
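<p class="western"><span style="font-size: x-small;">A minimal sketch of such a test signal follows: Gaussian white noise whose intensity is modulated by a low-frequency cosine. The band-limiting to critical bands is deliberately left out, and the function name, the Gaussian noise source and the <code>depth</code> parameter are assumptions made for illustration.</span></p>

```python
import math
import random


def modulated_noise(duration_s, sample_rate_hz, mod_freq_hz, depth=1.0, seed=0):
    """White noise with intensity envelope 1 + depth * cos(2*pi*f*t).

    The carrier is not band-limited here; a fuller generator would first
    filter the noise to the chosen critical bands.
    """
    rng = random.Random(seed)  # seeded so the signal is reproducible
    count = int(duration_s * sample_rate_hz)
    samples = []
    for i in range(count):
        t = i / sample_rate_hz
        intensity = 1.0 + depth * math.cos(2.0 * math.pi * mod_freq_hz * t)
        # Amplitude is the square root of intensity; clamp rounding noise.
        amplitude = math.sqrt(max(0.0, intensity))
        samples.append(amplitude * rng.gauss(0.0, 1.0))
    return samples


signal = modulated_noise(1.0, 8000, 1.0)
```

<p class="western"><span style="font-size: x-small;">With full modulation depth the envelope drops to zero half a period after the start, which is where the signal most resembles silence.</span></p>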
<h3 class="western"><span style="font-size: x-small;">3.3. Statistical speech model</span></h3>
<p class="western"><span style="font-size: x-small;">Language consists of sounds. Every individual generates a unique set of sounds. However, one can distinguish standard speakers (SS), generating average kinds of sounds. Standard speakers are subdivided according to their age, gender, region, social status, education, occupation etc.</span></p>
<p class="western"><span style="font-size: x-small;">For every standard speaker one should determine how often each sound occurs, the probabilities of sounds following one another, intonation contours, vocabularies, and the physical properties of individual sounds. Based on these data one can simulate a natural speech flow.</span></p>
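<p class="western"><span style="font-size: x-small;">One simple way to use the probabilities of sounds following each other is a first-order Markov chain, sketched below. This is an illustration only: the text does not say the model is a Markov chain, and the two-sound tables are invented.</span></p>

```python
import random


def sample_sound_sequence(initial, transitions, length, seed=0):
    """Draw a sequence of sound labels from a first-order Markov chain.

    `initial` maps each sound to its probability of starting a sequence;
    `transitions[s]` maps each possible follower of `s` to its probability.
    """
    if length <= 0:
        return []
    rng = random.Random(seed)
    sounds = list(initial)
    current = rng.choices(sounds, weights=[initial[s] for s in sounds])[0]
    sequence = [current]
    while len(sequence) < length:
        followers = transitions[current]
        names = list(followers)
        current = rng.choices(names, weights=[followers[n] for n in names])[0]
        sequence.append(current)
    return sequence


# Hypothetical two-sound speaker: "a" is always followed by "b" and vice versa.
initial = {"a": 1.0, "b": 0.0}
transitions = {"a": {"b": 1.0}, "b": {"a": 1.0}}
sequence = sample_sound_sequence(initial, transitions, 4)
```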
<p class="western"><span style="font-size: x-small;">The system should also include statistical information about the population structure, and use it to generate speech flows with the features that characterize the population of a region or of the whole country.</span></p>
<p class="western"><span style="font-size: x-small;">Broadly speaking, the statistical model (Fig. 2) contains statistical data about the population structure, speech bases of standard speakers, speech signal processing facilities (synthesis algorithms), means of determining speech sound parameters, and algorithms for generating the distributions of sounds and standard speakers.</span></p>
<p class="western" align="center"><a name="_1212308593"></a><img src="http://docs.google.com/File?id=dcvss358_413ddgf7hcn_b" alt="" width="572" height="222" align="bottom" /></p>
<p class="western" align="center"><strong><span style="font-size: x-small;">Fig.2. Extended structure of the statistical model</span></strong></p>
<p class="western"><span style="font-size: x-small;">The interface block provides interaction with the outside world (or the user) and also synchronizes the functions of the other blocks of the statistical model.</span></p>
<p class="western"><span style="font-size: x-small;">The block of speaker choice generates a sample of standard speakers (that is, a sequence of indexes of standard speakers). Depending on the command, either a representative sample of standard speakers or a sample from a single standard speaker can be generated. The sample is representative in the sense that the distribution of speech parameters in it corresponds to the distribution of speech parameters in the population described in the model.</span></p>
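<p class="western"><span style="font-size: x-small;">Generating a representative sequence of speaker indexes can be sketched as weighted random sampling, where each standard speaker is drawn with a probability equal to that speaker type’s share of the population. The shares below and the function name are hypothetical.</span></p>

```python
import random


def choose_speakers(population_shares, sample_size, seed=0):
    """Return a sequence of standard-speaker indexes.

    population_shares[i] is the share of the population represented by
    standard speaker i; indexes are drawn in proportion to these shares.
    """
    rng = random.Random(seed)
    indexes = list(range(len(population_shares)))
    return rng.choices(indexes, weights=population_shares, k=sample_size)


# Representative sample over three hypothetical standard speakers.
representative = choose_speakers([0.5, 0.3, 0.2], 1000)
# Sample from a single standard speaker: put all the weight on index 1.
single = choose_speakers([0.0, 1.0, 0.0], 5)
```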
<p class="western"><span style="font-size: x-small;">The sequence of indexes of standard speakers is saved in the block of standard speaker choice for further usage.</span></p>
<p class="western"><span style="font-size: x-small;">The block of sound choice forms the prosodic description (the descriptions of sounds). Depending on the command, it is built either for a representative sample of sounds, for a specified sequence of sounds, or for one specified sound.</span></p>
<p class="western"><span style="font-size: x-small;">The prosodic description is saved in the prosodic buffer for follow-up use.</span></p>
<p class="western"><span style="font-size: x-small;">The block of speech flow transforms the descriptions of sounds into samples of the speech signal.</span></p>
<p class="western"><span style="font-size: x-small;">The block of standard speaker descriptions stores the descriptions of standard speakers and, on request, returns the required parts of the descriptions, the number of speakers, and the list of speakers.</span></p>
<h3 class="western"><span style="font-size: x-small;">3.4. Signals synchronizer</span></h3>
<p class="western" align="justify"><span style="font-size: x-small;">The synchronizer aligns the initial and processed signals in the time domain. Its input receives signal segments (pDATA) whose duration equals one VAD (Voice Activity Detection) frame; the VAD activity criteria for them are specified in the pDATA segments.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">Any sound signal can be separated into active and inactive phases. The former correspond to active sound processes, the latter to low-level background noise. The elementary way of separating the two phases is by signal energy level. However, this approach is not accurate enough. In our approach the VAD algorithm from Recommendation G.723 is used for this purpose (as part of a vocoder with VAD).</span></p>
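<p class="western" align="justify"><span style="font-size: x-small;">For comparison, the elementary energy-based separation can be sketched as below. This is the simple approach described above as insufficiently accurate, not the G.723 algorithm; the frame length and threshold are arbitrary illustrative values.</span></p>

```python
def energy_vad(samples, frame_len, threshold):
    """Flag each full frame as active (True) when its mean energy
    exceeds the threshold; a trailing partial frame is ignored."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags


# Hypothetical signal: one silent frame followed by one loud frame.
flags = energy_vad([0.0] * 4 + [1.0] * 4, frame_len=4, threshold=0.5)
```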
<p class="western"><span style="font-size: x-small;">After filtering, the state criteria and signal frames enter the synchronizer blocks, which combine active signal fragments and pauses. The blocks share common data: a buffer for the active reference (etalon) signal (EBuffer1), a buffer for the active signal under test (TBuffer1), a buffer for pauses of the reference signal (EBuffer0), a buffer for pauses of the signal under test (TBuffer0), and readiness criteria for the active-signal and pause buffers (dReady[0..1]). There is also a counter of synchronization errors (dErrorCounter).</span></p>
<p class="western"><span style="font-size: x-small;">The synchronizer outputs either a pair of buffers with active signals or a pair of buffers with pauses. Either synchronizer block can trigger the output of a pair of synchronized buffers.</span></p>
<p><span style="font-size: x-small;">The synchronized buffers and the activity criterion form the input of the analytical module.</span></p>
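<p class="western" align="justify"><span style="font-size: x-small;">One possible way to organize the shared data listed above is sketched below. The fields mirror EBuffer0/1, TBuffer0/1, dReady and dErrorCounter; the routing policy (a buffer pair becomes ready when the phase changes, and disagreeing activity flags count as a synchronization error) is an assumption made for illustration, since the text does not specify it.</span></p>

```python
from dataclasses import dataclass, field

@dataclass
class SynchronizerState:
    # Index 0 holds pauses, index 1 holds active signal.
    e_buffer: list = field(default_factory=lambda: [[], []])  # EBuffer0, EBuffer1
    t_buffer: list = field(default_factory=lambda: [[], []])  # TBuffer0, TBuffer1
    d_ready: list = field(default_factory=lambda: [False, False])  # dReady[0..1]
    d_error_counter: int = 0  # dErrorCounter

def push_frames(state, ref_frame, ref_active, test_frame, test_active):
    """Route one pair of VAD frames into the pause or active buffers."""
    if ref_active != test_active:
        # Assumed policy: disagreeing VAD flags are a synchronization error.
        state.d_error_counter += 1
        return
    kind = 1 if ref_active else 0
    other = 1 - kind
    if state.e_buffer[other]:
        # The phase changed, so the buffers of the previous phase are complete.
        state.d_ready[other] = True
    state.e_buffer[kind].extend(ref_frame)
    state.t_buffer[kind].extend(test_frame)

def take_ready(state):
    """Pop every ready buffer pair as (kind, reference, test)."""
    pairs = []
    for kind in (0, 1):
        if state.d_ready[kind]:
            pairs.append((kind, state.e_buffer[kind], state.t_buffer[kind]))
            state.e_buffer[kind], state.t_buffer[kind] = [], []
            state.d_ready[kind] = False
    return pairs

state = SynchronizerState()
push_frames(state, [1, 2], True, [1, 1], True)     # active frame pair
push_frames(state, [0, 0], False, [0, 1], False)   # pause: active pair is now ready
ready = take_ready(state)
push_frames(state, [5, 5], True, [0, 0], False)    # flags disagree
print(ready, state.d_error_counter)
```

<p class="western" align="justify"><span style="font-size: x-small;">In this sketch the consumer is expected to call take_ready after every push, so that a completed pair is handed to the analytical module before new frames of the same phase arrive.</span></p>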
<h3 class="western"><span style="font-size: x-small;">3.5. Analytical module</span></h3>
<p class="western"><span style="font-size: x-small;">The analytical module compares the paired fragments of the active phase and of the inactive phase separately, which yields a more accurate estimate.</span></p>
<p class="western" align="center">
<p class="western" align="justify"><span style="font-size: x-small;">The integral spectrum is determined for each fragment using the discrete cosine transform (DCT). Spectrum integration is calculated according to a proprietary formula.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">In the spectrum calculation, adjacent windows overlap by N/2 samples, and the well-known Hamming or Blackman-Harris window function is applied to every window.</span></p>
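<p class="western" align="justify"><span style="font-size: x-small;">The windowing scheme can be illustrated with a short sketch. It applies a Hamming window to frames that overlap by N/2 samples and takes a DCT-II of each frame; averaging the squared coefficients over the frames is an assumed stand-in for the proprietary spectrum integration formula, and all function names are hypothetical.</span></p>

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dct2(x):
    """Unnormalized DCT-II, written out directly (O(N^2), fine for a sketch)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def averaged_power_spectrum(samples, n=64):
    """Average squared DCT coefficients over windows overlapping by n/2 samples.

    The averaging step is an assumption; the paper's integration formula
    is proprietary.
    """
    window = hamming(n)
    hop = n // 2  # 50 % overlap between adjacent windows
    acc = [0.0] * n
    count = 0
    for start in range(0, len(samples) - n + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(n)]
        for k, c in enumerate(dct2(frame)):
            acc[k] += c * c
        count += 1
    if count == 0:
        raise ValueError("fragment is shorter than one window")
    return [a / count for a in acc]

# A 1 kHz tone sampled at 8 kHz; DCT bin k corresponds to k * 8000 / (2 * 64) Hz,
# so the energy should concentrate near bin 16.
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(512)]
spectrum = averaged_power_spectrum(tone, 64)
peak = max(range(64), key=spectrum.__getitem__)
print(peak)
```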
<p class="western" align="justify"><span style="font-size: x-small;">Spectral energy levels per band are determined for every set of bands. Several groupings of critical bands are already known [2-6]; they were derived by different authors from different models of sound perception and speech production.</span></p>
<p class="western" align="justify">
<p class="western" align="justify"><span style="font-size: x-small;">Band boundaries (initial and terminal indexes) as well as band energy values are determined according to a set of proprietary formulas.</span></p>
<p align="center">
<p class="western" align="justify"><span style="font-size: x-small;">The initial quality estimate is taken as 100%. It then decreases in proportion to the difference between the band energies of the two signals. Quality estimates are determined for every set of bands, and the overall estimate across all bands is calculated according to proprietary formulas.</span></p>
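<p class="western" align="justify"><span style="font-size: x-small;">The idea of starting at 100% and lowering the estimate in proportion to band energy differences can be sketched as below. The band boundaries, the relative-difference measure and the weighting are assumptions chosen for illustration; the actual formulas are proprietary and are not reproduced here.</span></p>

```python
def band_energies(spectrum, bands):
    """Sum spectral energy over each (first, last) index range, inclusive."""
    return [sum(spectrum[first:last + 1]) for first, last in bands]

def quality_percent(ref_energy, test_energy, weights=None):
    """Start from 100 % and subtract weighted relative band energy differences.

    With weights=None every band counts equally; otherwise the weights are
    expected to sum to 1 (hypothetical importance coefficients).
    """
    if weights is None:
        weights = [1.0 / len(ref_energy)] * len(ref_energy)
    loss = 0.0
    for e_ref, e_test, w in zip(ref_energy, test_energy, weights):
        denom = max(e_ref, e_test)
        if denom > 0:
            loss += w * abs(e_ref - e_test) / denom
    return 100.0 * (1.0 - loss)

# Toy spectra with two bands; the second band of the test signal lost half its energy.
bands = [(0, 1), (2, 3)]
ref = band_energies([4.0, 4.0, 2.0, 2.0], bands)
test = band_energies([4.0, 4.0, 1.0, 1.0], bands)
equal = quality_percent(ref, test)
weighted = quality_percent(ref, test, weights=[0.8, 0.2])
print(equal, round(weighted, 1))
```

<p class="western" align="justify"><span style="font-size: x-small;">With equal weights the damaged band costs 25 points in this toy case; giving it a smaller importance weight reduces the penalty accordingly.</span></p>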
<p class="western">
<p class="western"><span style="font-size: x-small;">To determine sound (D) and word (W) intelligibility the following formulas may be used:</span></p>
<p class="western"><img src="http://docs.google.com/File?id=dcvss358_414cdwtzkhj_b" alt="" width="165" height="31" align="absmiddle" /><span style="font-size: x-small;">, where	(4)</span></p>
<p class="western"><span style="font-size: x-small;">S = 0.8 D<sup>2</sup> + 0.2 D<sup>4</sup> (the well-known Pokrovskiy formula)</span></p>
<p class="western"><img src="http://docs.google.com/File?id=dcvss358_415chfz3rcm_b" alt="" width="164" height="56" align="absmiddle" /><span style="font-size: x-small;"> (5)</span></p>
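<p class="western" align="justify"><span style="font-size: x-small;">Pokrovskiy's formula above is quadratic in D<sup>2</sup>, so it can be inverted in closed form. The sketch below evaluates it in both directions, taking S to be the syllable intelligibility plotted in Fig. 3 (an assumption about the notation). It does not attempt to reproduce formulas (4) and (5), and the function names are hypothetical.</span></p>

```python
import math

def syllable_from_sound(d):
    """Pokrovskiy's formula: S = 0.8 D^2 + 0.2 D^4, with D and S in [0, 1]."""
    return 0.8 * d ** 2 + 0.2 * d ** 4

def sound_from_syllable(s):
    """Inverse of the formula above.

    Substituting x = D^2 gives 0.2 x^2 + 0.8 x - S = 0, whose non-negative
    root is x = sqrt(4 + 5 S) - 2, hence D = sqrt(sqrt(4 + 5 S) - 2).
    """
    return math.sqrt(math.sqrt(4.0 + 5.0 * s) - 2.0)

for d in (0.0, 0.3, 0.7, 1.0):
    s = syllable_from_sound(d)
    print(d, round(s, 4), round(sound_from_syllable(s), 4))
```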
<p class="western"><span style="font-size: x-small;">To go from the quality loss coefficient to the sound intelligibility value, a correspondence table is used.</span></p>
<p class="western">
<p class="western"><span style="font-size: x-small;">Values at intermediate points are obtained by interpolation (for example, with a Lagrange interpolation polynomial). Figure 3 shows the dependence S(dQ).</span></p>
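<p class="western" align="justify"><span style="font-size: x-small;">Interpolating between table entries with a Lagrange polynomial can be sketched as follows. The table in the example is a placeholder sampled from a known parabola so that the result can be checked; it is not the correspondence table of the method, and the names are hypothetical.</span></p>

```python
def lagrange_interpolate(points, x):
    """Evaluate the Lagrange polynomial through the (x_i, y_i) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Placeholder table, NOT the paper's data: points taken from y = (x / 100)^2,
# standing in for (quality value, intelligibility) pairs.
EXAMPLE_TABLE = [(40.0, 0.16), (60.0, 0.36), (80.0, 0.64), (100.0, 1.0)]

value = lagrange_interpolate(EXAMPLE_TABLE, 70.0)
print(round(value, 4))  # 0.49, since the points lie on a parabola
```

<p class="western" align="justify"><span style="font-size: x-small;">A single polynomial through many table points can oscillate between them, so in practice only a few neighboring entries would typically be used for each lookup.</span></p>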
<p class="western">
<p class="western" align="center"><img src="http://docs.google.com/File?id=dcvss358_416qpm34jfg_b" alt="" width="411" height="252" align="bottom" /></p>
<p class="western" align="center"><strong><span style="font-size: x-small;">Fig. 3. Dependence of syllable intelligibility on the quality estimation value</span></strong></p>
<p><span style="font-size: x-small;">Quality estimates can be translated into MOS values in the same way.</span></p>
<h1 class="western"><span style="font-size: x-small;">4. IMPLEMENTATION &amp; CONCLUSIONS</span></h1>
<p class="western" align="justify"><span style="font-size: x-small;">The algorithms described have been implemented for voice quality estimation and for comparing external initial signals with signals under test.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">Arbitrary signals recorded at a sampling frequency of 8 kHz with 16-bit samples can be used as external signals. The signal under test is assumed to be obtained from an initial signal through some transformation (for example, compression/decompression, transmission through communication channels, or filtering). In addition, a recording of a phonetically representative text read aloud by several speakers of different ages and both genders can serve as an initial external signal.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">The internal initial signals (i.e. signals to which the user of the program has no access) are signals generated according to the noise model (the description of the generator is given below) and signals generated on the basis of the statistical model.</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">The internal signals are fed into the sound data compression/decompression system, implemented, for example, as a DLL with a specified interface. The signal processed by the methods contained in the DLL is treated as the signal under test and is subjected to the quality estimation procedure described earlier.</span></p>
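<p class="western" align="justify"><span style="font-size: x-small;">A sketch of how a processing method might be plugged into the estimation chain is given below. A plain Python callable stands in for the DLL interface, which the text does not specify, and a continuous mu-law compander with 8-bit quantization (not the segmented G.711 encoder) serves as a toy system under test; all names are hypothetical.</span></p>

```python
import math

MU = 255.0  # standard mu-law constant

def mu_law_roundtrip(sample):
    """Compress one 16-bit sample with continuous mu-law, quantize, expand back."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    code = round(compressed * 127)  # 8-bit quantization of the companded value
    y = code / 127.0
    expanded = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    restored = int(round(expanded * 32768.0))
    return max(-32768, min(32767, restored))  # keep the result in 16-bit range

def make_signal_under_test(reference, process):
    """Pass the reference signal through the system under test, sample by sample.

    `process` plays the role of the method exported by the DLL (assumed shape).
    """
    return [process(s) for s in reference]

reference = [0, 1000, -1000, 20000]
under_test = make_signal_under_test(reference, mu_law_roundtrip)
print(under_test)
```

<p class="western" align="justify"><span style="font-size: x-small;">The pair (reference, under_test) would then go through synchronization and the analytical module like any external signal pair.</span></p>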
<p class="western" align="justify"><span style="font-size: x-small;">The presented method of sound signal quality estimation has a number of advantages over known quality measurement methods, namely:</span></p>
<ul>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">it is universal, since it allows judging the quality of signals from various sources and processed in different ways;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">the quality estimation can be optimized for the purpose at hand:</span></p>
<ul>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">for speed (for example, a rough estimate can be obtained quickly);</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">for signal type (using different bands for speech signals and for sound signals in general);</span></p>
</li>
</ul>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">the resulting estimates correlate well with MOS values;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">quality estimates obtained for speech signals can be translated into values of various kinds of intelligibility.</span></p>
</li>
</ul>
<p class="western" align="justify">
<p class="western" align="justify"><span style="font-size: x-small;">Table 1 presents quality estimates for several standard voice codecs, obtained on various test signals using the suggested method and the implementation described. The table also lists MOS values for comparison.</span></p>
<p class="western" align="justify">
<p class="western" align="right"><strong><span style="font-size: x-small;">Table 1. Sound quality estimates for vocoders</span></strong></p>
<div>
<table border="1" cellspacing="0" cellpadding="8" width="529" bordercolor="#000000">
<col width="72"></col>
<col width="30"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<col width="23"></col>
<tbody>
<tr>
<td rowspan="3" width="72">
<p class="western" align="justify"><span style="font-size: x-small;">Codec</span></p>
</td>
<td rowspan="3" width="30">
<p class="western" align="justify"><span style="font-size: x-small;">MOS</span></p>
</td>
<td colspan="6" width="220">
<p class="western" align="center"><span style="font-size: x-small;">Noise model</span></p>
</td>
<td colspan="2" rowspan="2" width="63">
<p class="western" align="center"><span style="font-size: x-small;">Statistical model</span></p>
</td>
<td colspan="2" rowspan="2" width="63">
<p class="western" align="center"><span style="font-size: x-small;">PhRT</span></p>
</td>
</tr>
<tr>
<td colspan="2" width="63">
<p class="western" align="center"><span style="font-size: x-small;">Minimal</span></p>
</td>
<td colspan="2" width="63">
<p class="western" align="center"><span style="font-size: x-small;">Reduced</span></p>
</td>
<td colspan="2" width="63">
<p class="western" align="center"><span style="font-size: x-small;">Complete</span></p>
</td>
</tr>
<tr>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">-</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">Vc</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">-</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">Vc</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">-</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">Vc</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">-</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">Vc</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">-</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">Vc</span></p>
</td>
</tr>
<tr>
<td width="72">
<p class="western" align="justify"><span style="font-size: x-small;">A-Law</span></p>
</td>
<td width="30">
<p class="western" align="justify"><span style="font-size: x-small;">4,10</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,79</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,73</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,79</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,80</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,80</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,84</span></p>
</td>
</tr>
<tr>
<td width="72">
<p class="western" align="justify"><span style="font-size: x-small;">Mu-Law</span></p>
</td>
<td width="30">
<p class="western" align="justify"><span style="font-size: x-small;">4,10</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,79</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,84</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,77</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,77</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,77</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,79</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,79</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,82</span></p>
</td>
</tr>
<tr>
<td width="72">
<p class="western" align="justify"><span style="font-size: x-small;">G.723 (6.3 kbit/s)</span></p>
</td>
<td width="30">
<p class="western" align="justify"><span style="font-size: x-small;">3,90</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,25</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,48</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,21</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,29</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,22</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,33</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,15</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,04</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,08</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,95</span></p>
</td>
</tr>
<tr>
<td width="72">
<p class="western" align="justify"><span style="font-size: x-small;">GSM 06.10</span></p>
</td>
<td width="30">
<p class="western" align="justify"><span style="font-size: x-small;">3,70</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,20</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">1,99</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,01</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">1,65</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,04</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">1,78</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,22</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,66</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,01</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,21</span></p>
</td>
</tr>
<tr>
<td width="72">
<p class="western" align="justify"><span style="font-size: x-small;">G.723 (5.3 kbit/s)</span></p>
</td>
<td width="30">
<p class="western" align="justify"><span style="font-size: x-small;">3,65</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,23</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,44</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,18</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,27</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,19</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,32</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,14</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,04</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">4,06</span></p>
</td>
<td width="23">
<p class="western" align="justify"><span style="font-size: x-small;">3,93</span></p>
</td>
</tr>
</tbody>
</table>
</div>
<p class="western" align="justify">
<p class="western" align="justify"><span style="font-size: x-small;">Columns marked "-" contain the estimates obtained under the assumption that all bands are equally probable; columns marked "Vc" contain the estimates that take the band importance coefficients into account.</span></p>
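<p class="western" align="justify"><span style="font-size: x-small;">As a rough check of how the estimates track MOS, the Pearson correlation between the MOS column and one estimate column of Table 1 can be computed as in the sketch below. The choice of the PhRT Vc column is arbitrary, and with only five codecs the result is indicative at best.</span></p>

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Values copied from Table 1, with decimal commas written as points.
# Row order: A-Law, Mu-Law, G.723 at 6.3, GSM 6.10, G.723 at 5.3.
MOS = [4.10, 4.10, 3.90, 3.70, 3.65]
PHRT_VC = [4.84, 4.82, 3.95, 3.21, 3.93]

r = pearson(MOS, PHRT_VC)
print(round(r, 2))
```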
<p class="western" align="justify">
<h1 class="western"><span style="font-size: x-small;">5. TRENDS OF DEVELOPMENT</span></h1>
<p class="western" align="justify"><span style="font-size: x-small;">Given the structure of the suggested sound signal quality estimation system, it can be developed in the following directions:</span></p>
<ul>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">improving the test signal model: the noise model can be supplemented with a set of multiband modulated noise signals, the data and algorithms of the statistical speech model can be enriched, and the number of pre-prepared test signals (such as PhRT recordings) can be increased;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">developing more advanced synchronization algorithms based, for example, on the coincidence of maxima in the signal energy spectra;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">modernizing the acoustic model to take into account masking effects and the fact that pure tones and band noise affect hearing somewhat differently;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">modernizing the signal comparison scheme: the current distance measure is not accurate enough for signals that differ strongly, and for greater universality of the system it is desirable to use correlation analysis methods for comparison;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">supporting multichannel signals (stereo, quadraphonic, etc.) and delivering immediate quality estimates, both of which are needed to solve a number of practical problems;</span></p>
</li>
<li>
<p class="western" align="justify"><span style="font-size: x-small;">carrying out further experimental research, which a fully accurate translation of the objective estimates into MOS values requires.</span></p>
</li>
</ul>
<h3 class="western"><span style="font-size: x-small;">REFERENCES</span></h3>
<p class="western" align="justify"><span style="font-size: x-small;">1. Aldoshina I., &#8220;Bases of psychoacoustics&#8221;, The sound producer, 2002, №5, 8</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">2. Sekunov N., &#8220;Processing of a sound on PC&#8221;, bhv, Saint-Petersburg, 2001</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">3. Sapozhkov M.A., &#8220;Speech signal in cybernetics and communications&#8221;, Svyazizdat, Moscow, 1963</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">4. Pokrovskiy N.B., &#8220;Calculation and measurement of speech legibility&#8221;, Svyazizdat, Moscow, 1962</span></p>
<p class="western" align="justify"><span style="font-size: x-small;">5. Sorokin V.N., &#8220;Speech synthesis&#8221;, Nauka, Moscow, 1992</span></p>
<p class="western" align="justify">
]]></content:encoded>
			<wfw:commentRss>http://wordpress.sevana.fi/white-paperautomated-sound-signals-quality-estimation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Business Benefits</title>
		<link>http://wordpress.sevana.fi/business-benefits/</link>
		<comments>http://wordpress.sevana.fi/business-benefits/#comments</comments>
		<pubDate>Sun, 15 Feb 2009 08:53:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Voice and Sound Quality Testing Software]]></category>
		<category><![CDATA[automated]]></category>
		<category><![CDATA[automatically]]></category>
		<category><![CDATA[codec]]></category>
		<category><![CDATA[Index]]></category>
		<category><![CDATA[Intelligibility]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[on the fly]]></category>
		<category><![CDATA[qoe]]></category>
		<category><![CDATA[RATSI]]></category>
		<category><![CDATA[Speech]]></category>
		<category><![CDATA[STIPA]]></category>
		<category><![CDATA[test]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[voice]]></category>
		<category><![CDATA[voice quality testing QoS MOS method methods pesq]]></category>
		<category><![CDATA[vqt]]></category>

		<guid isPermaLink="false">http://wordpress.sevana.fi/?p=9</guid>
		<description><![CDATA[]]></description>
			<content:encoded><![CDATA[]]></content:encoded>
			<wfw:commentRss>http://wordpress.sevana.fi/business-benefits/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
