Liip Blog Kirby Thu, 14 Jun 2018 00:00:00 +0200 Latest articles from the Liip Blog en Let’s make Moodle amazing Thu, 14 Jun 2018 00:00:00 +0200 <h2>A new empowering direction for Moodle</h2> <p>MoodleMoot UK &amp; Ireland 2018 in Glasgow was the place to be if, like me, you were asking yourself: <em>“What will be the future of the Learning Management System (LMS) called Moodle?”</em> From the 26th to the 28th of March 2018, Moodle Headquarters organized a <a href="">conference</a> dedicated to Moodle Partners (companies offering Moodle services, such as Liip), as well as developers and administrators of the very popular open source course management system. It was a great opportunity to meet all these stakeholders and learn about the current trends of this LMS. The program begins with the announcement of a <a href="">$6 million investment</a> from the company Education for the Many. Moodle HQ will use this funding to improve consistency and sustainability, to build a new European headquarters in Barcelona and to improve its didactic approach. </p> <h2>A new investor believing in the Moodle mission</h2> <p>Martin Dougiamas, founder and CEO of Moodle HQ, opens the conference with an inspiring keynote about the goals for the near future. Recalling the mission – empowering educators to improve our world – he articulates the vision of the company.</p> <p><em>“Education is maybe the only weapon that can make a difference, as we need responsible persons to face the current issues of our world”.<br /> </em><br /> This turning point requires financial support. Education for the Many, an investment company of the French-based Leclercq family involved in well-known businesses such as Decathlon sporting goods, understands the challenges that Moodle is facing. They are not focused only on the return on investment but they also care about the educational vision. 
For the time and money invested, Education for the Many receives a minority stake in Moodle HQ and a seat on the board.</p> <h2>Future challenges</h2> <p><em>“It’s time to make Moodle amazing!”</em>, continues Martin. One of the benefits for Europeans will be the growth of the Moodle office in Barcelona. It is set to expand to become like the headquarters in Perth, turning Barcelona into the European Moodle HQ. As most Moodle users are located in Europe, being close to them is an advantage. The Moodle product is and should always remain competitive. Ensuring this is one of the pillars of the new strategy. With this goal in mind, the future Moodle 3.6+ versions will be designed to achieve sustainability at a high level. Furthermore, they will concentrate on improving usability, creating standards, enhancing system integrations and supporting all devices. </p> <h2>Engaging the learners</h2> <p>One of the big challenges for a teacher is to keep participants engaged during the learning process. To support this, Moodle HQ is developing a special certification for Moodle Partners, so they can deepen their software knowledge and get up to date on the best practices for online content creation. Through official Moodle Partners, teachers can access the same education platforms. This is how the Learn Moodle platform aims to significantly improve the quality of teaching. Moreover, effort will be invested in maximizing connections within the community of users and administrators, in order to build a big and strong user base through the <a href=""></a> association. This platform will support the creation of educational content as well as the sharing and offering of services. 
Every Moodler is welcome to take part in this project.</p> <p>To summarize, I came back from the conference more confident than ever about Moodle's potential, empowered as a Moodle Partner, and impatient to bring Moodle's capabilities to our customers.</p> Recipe Assistant Prototype with ASR and TTS on Socket.IO - Part 3 Developing the prototype Tue, 12 Jun 2018 00:00:00 +0200 <p>Welcome to part three of three in our mini blog post series on how to build a recipe assistant with automatic speech recognition and text to speech to deliver a hands-free cooking experience. In the first blog post we gave you a hands-on <a href="">market overview</a> of existing SaaS and open source TTS solutions; in the second post we put the user in the center by covering the <a href="">usability aspects of dialog driven apps</a> and how to create a good conversation flow. Finally it's time to get our hands dirty and show you some code. </p> <h3>Prototyping with Socket.IO</h3> <p>Although we envisioned the final app as a mobile app running on a phone, it was much faster for us to build a small web application that basically mimics how an app might work on a mobile device. Although Socket.IO is not the newest tool in the shed, it was great fun to work with because it was really easy to set up. All you need is a JS library on the HTML side, which you tell to connect to the server; in our case that is a simple Python Flask micro-webserver app.</p> <pre><code class="language-html">&lt;!-- Socket.IO integration in the HTML page --&gt;
...
&lt;script src=""&gt;&lt;/script&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;script&gt;
$(document).ready(function(){
    var socket = io.connect('http://' + document.domain + ':' + location.port);
    socket.on('connect', function() {
        console.log("Connected recipe");
        socket.emit('start');
    });
    ...</code></pre> <p>The code above connects to our Flask server and emits the start message, signaling that our audio service can start reading the first step. Depending on the messages we receive, we can quickly alter the DOM or do other things in almost real time, which is very handy.</p> <p>To make it work on the server side in the Flask app, all you need is a <a href="">python library</a> that you integrate into your application, and you are ready to go:</p> <pre><code class="language-python"># in flask
from flask_socketio import SocketIO, emit

socketio = SocketIO(app)
...
# listen to messages
@socketio.on('start')
def start_thread():
    global thread
    if not thread.is_alive():  # is_alive() replaces the removed isAlive()
        print("Starting Thread")
        thread = AudioThread()
        thread.start()
...
# emit some messages
socketio.emit('ingredients', {"ingredients": "xyz"})
</code></pre> <p>In the code excerpt above we start a thread that will be responsible for handling our audio processing. It starts when the web server receives the start message from the client, signalling that the client is ready to lead a conversation with the user. </p> <h3>Automatic speech recognition and state machines</h3> <p>The main part of the application is simply a while loop in the thread that listens to what the user has to say. Whenever we change the state of our application, it displays the next recipe state and reads it out loud. We’ve sketched out the flow of the states in the diagram below. This time it is a simple, mainly linear conversation flow; the only difference is that we sometimes branch off to remind the user to preheat the oven or to take things out of the oven. 
This way we can potentially save the user time, or at least offer a convenience that a “classic” recipe on paper doesn’t provide. </p> <img src="" alt="flow" format="png"> <p>The automatic speech recognition (see below) works with <a href=""></a> in the same way I showed in my recent <a href="">blog post</a>. Have a look there to read up on the technology behind it and find out how the RecognizeSpeech class works. In a nutshell we record 2 seconds of audio locally, send it over a REST API to <a href=""></a> and wait for it to be turned into text. While this is convenient from a developer’s point of view - there is little code to write and we can rely on a service - the downside is reduced usability for the user. Sending the data, processing it and receiving the results introduces roughly 1-2 seconds of lag. Ideally, I think the ASR should take place on the mobile device itself to introduce as little lag as possible. </p> <pre><code class="language-python"># abbreviated main thread
# note: "recipe" is an assumed name for the recipe texts; the original identifier is not recoverable
self.states = ["people","ingredients","step1","step2","step3","step4","step5","step6","end"]
while not thread_stop_event.is_set():
    socketio.emit("showmic")  # show the microphone symbol in the frontend, signalling that the app is listening
    text = recognize.RecognizeSpeech('myspeech.wav', 2)  # the speech recognition is hidden here :)
    socketio.emit("hidemic")  # hide the mic, signalling that we are processing the request
    if self.state == "people":
        ...
        if intro_not_played:
            self.play(recipe["about"])
            self.play(recipe["persons"])
            intro_not_played = False
        persons = re.findall(r"\d+", text)
        if len(persons) != 0:
            self.state = self.states[self.states.index(self.state)+1]
        ...
    if self.state == "ingredients":
        ...
        if intro_not_played:
            self.play(recipe["ingredients"])
            intro_not_played = False
        ...
        if "weiter" in text:  # forward
            self.state = self.states[self.states.index(self.state)+1]
        elif "zurück" in text:  # backward
            self.state = self.states[self.states.index(self.state)-1]
        elif "wiederholen" in text:  # repeat
            intro_not_played = True  # repeat the loop
        ...
</code></pre> <p>As we see above, depending on the state that we are in, we play the right TTS audio to the user and then progress to the next state. Each step also listens for whether the user wants to go forward (weiter), go backward (zurück) or repeat the step (wiederholen), in case something was misheard. </p> <p>The first prototype solution that I am showing above is not perfect though, as we are not using a wake-up word. Instead we periodically offer the user a chance to give us input. The main drawback is that when the user speaks at a moment when we don't expect it, we might not record it and consequently cannot react to the input. Additionally, sending audio back and forth to the cloud creates a rather sluggish experience. I would be much happier to have the ASR part directly on the client, especially since we are mainly listening for only 3-4 navigational words. </p> <h3>TTS with Slowsoft</h3> <p>Finally, you may have noticed that there is a play method in the code above. That's where the TTS is hidden. As you see below, we first show the speaker symbol in the application, signalling that now is the time to listen. We then send the text to Slowsoft via their API and in our case define the dialect &quot;CHE-gr&quot; as well as the speed and pitch of the output.</p> <pre><code class="language-python"># play function
def play(self, text):
    socketio.emit('showspeaker')
    headers = {'Accept': 'audio/wav', 'Content-Type': 'application/json', "auth": "xxxxxx"}
    with open("response.wav", "wb") as f:
        # call reconstructed as requests.post (an assumption); the endpoint URL is elided here
        resp = requests.post('', headers=headers, data=json.dumps({"text":text,"voiceorlang":"gsw-CHE-gr","speed":100,"pitch":100}))
        f.write(resp.content)
    # play the file only after it has been written and closed
    os.system("mplayer response.wav")</code></pre> <p>The text snippets are simply parts of the recipe. 
I tried to cut them into digestible parts, where each part contains roughly one action. Here, having an already structured recipe in the <a href="">open recipe</a> format helps a lot, because we don't need to do any manual processing before sending the data. </p> <h3>Wakeup-word</h3> <p>We took our prototype for a spin and realized in our experiments that a wake-up word is a must. We simply couldn’t time our input to coincide with the moments when the app was listening, which was a big pain for the user experience. </p> <p>I know that nowadays smart speakers like Alexa or Google Home provide their own wakeup word, but we wanted to have our own. Is that even possible? Well, you have different options here. You could train a deep network from scratch with <a href="">tensorflow-lite</a> or create your own model by following this tutorial on how to create a <a href="">simple</a> speech recognition with tensorflow. Yet the main drawback is that you might need a lot (and I mean A LOT, as in 65 thousand) of audio samples. That is not really feasible for most users. </p> <img src="" alt="snowboy" format="png"> <p>Luckily you can also take an existing deep network and train it to understand YOUR wakeup words. That means that it will not generalize as well to other people, but maybe that is not much of a problem. You might as well think of it as a feature: your assistant only listens to you and not to your kids :). A solution of this kind exists under the name <a href="">snowboy</a>, where a couple of ex-Googlers created a startup that lets you create your own wakeup words and then download those models. That is exactly what I did for this prototype. All you need to do is go to the snowboy website and provide three samples of your wakeup word. It then computes a model that you can download. 
You can also use their <a href="">REST API</a> to do that; the idea is that you can include this phase directly in your application, making it very convenient for users to set up their own wakeup word. </p> <pre><code class="language-python"># wakeup class
import snowboydecoder
import sys
import signal

class Wakeup():
    def __init__(self):
        self.detector = snowboydecoder.HotwordDetector("betty.pmdl", sensitivity=0.5)
        self.interrupted = False
        self.wakeup()  # blocks until the wakeup word is heard

    def signal_handler(self, signal, frame):
        self.interrupted = True

    def interrupt_callback(self):
        return self.interrupted

    def custom_callback(self):
        # called by the detector once the wakeup word is detected
        self.interrupted = True
        self.detector.terminate()
        return True

    def wakeup(self):
        self.interrupted = False
        self.detector.start(detected_callback=self.custom_callback,
                            interrupt_check=self.interrupt_callback,
                            sleep_time=0.03)
        return self.interrupted
</code></pre> <p>All that is needed then is a wakeup class that you can run from any other app that includes it. In the code above you’ll notice that we load our downloaded model (“betty.pmdl”); the rest of the methods are there to interrupt the wakeup method once we hear the wakeup word.</p> <p>We then included this class in our main application as a blocking call, meaning that whenever we hit the part where we are supposed to listen for the wakeup word, we remain there until we hear the word:</p> <pre><code class="language-python"># integration into main app
...
# record
socketio.emit("showear")
wakeup.Wakeup()
socketio.emit("showmic")
text = recognize.RecognizeSpeech('myspeech.wav', 2)
...</code></pre> <p>As you can see in the code above, we included the <em>wakeup.Wakeup()</em> call, which now waits until the user has spoken the word; only after that do we record 2 seconds of audio and send it off for speech recognition. In our testing that improved the user experience tremendously. 
You also see that we signal the listening to the user via graphical cues: we show a little ear when the app is listening for the wakeup word, and a microphone when the app is listening to your commands. </p> <h3>Demo</h3> <p>So, finally, it's time to show you the tech demo. It gives you an idea of how such an app might work and hopefully also gives you a starting point for new ideas and other improvements. While it's definitely not perfect, it does its job and allows me to cook hands-free :). Mission accomplished! </p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <h2>What's next?</h2> <p>In the first part of this blog post series we gave quite an <a href="">extensive overview</a> of the current capabilities of TTS systems. While we found an abundance of options on the commercial side, sadly we didn’t find the same number of sophisticated projects on the open source side. I hope the open source side catches up in the future, especially with the strong IoT movement and the need to have this kind of technology as an underlying stack for all kinds of smart assistant projects. Here is an <a href="">example</a> of a Kickstarter project for a small speaker with built-in open source ASR and TTS.</p> <p>In the <a href="">second blog post</a>, we discussed the user experience of audio-centered assistants. We realized that going audio-only might not always provide the best user experience, especially when the user is presented with a number of alternatives that he has to choose from. This was especially the case in the exploration phase, where you have to select a recipe, and in the cooking phase, where the user needs to go through the list of ingredients. 
Given that the <a href="">Alexas</a>, <a href="">Homepods</a> and the <a href="">Google Home</a> smart boxes are on their way to taking over the audio-based home assistant area, I think that their usage will only make sense in a number of very simple-to-navigate domains, as in “Alexa play me something from Jamiroquai”. In more difficult domains, such as cooking, mobile phones might be an interesting alternative, especially since they are much more portable (they are mobile after all), offer a screen, and almost everyone already has one. </p> <p>Finally, in the last part of the series I have shown you how to integrate a number of solutions - a cloud service for ASR, slowsoft for TTS, snowboy for the wakeup word, and Socket.IO and flask for prototyping - to create a nice working prototype of a hands-free cooking assistant. I have uploaded the code on github, so feel free to play around with it to sketch your own ideas. For us a next step could be taking the prototype to the next level by really building it as an app for iPhone or Android, and especially improving the speed of the ASR. Here we might use the existing <a href="">coreML</a> or <a href="">tensorflow lite</a> frameworks or check how well we could already use the built-in ASR capabilities of the devices. As a final key takeaway, we realized that building a hands-free recipe assistant is definitely something different from simply having the mobile phone read the recipe out loud for you. </p> <p>As always I am looking forward to your comments and insights and hope to update you on our little project soon.</p> Why I travel 4h+ every day to work and back just to write my graduation thesis Mon, 11 Jun 2018 00:00:00 +0200 <p>I am 24 years old and in the 8th semester of my studies in “Business Information Systems”. </p> <p>This May I started working at Liip. 
A lot of my fellow students and friends are asking me: <em>Why would you take that route back and forth every day if you can just do it somewhere near?</em> 🤷<br /> This is actually a very reasonable and good question, which will hopefully be answered by the end of this post. (<strong>Spoiler: No, the answer is not the juicy Swiss wage everyone thinks of 🤑</strong>)</p> <h2>My daily route</h2> <img src="" alt="distance-radolfzell-zurich"> <p>Figure 1: Here you can see the route I’d be taking if I were walking. Sure, let me swim across the lake real quick. 🏊<br /> The route is about <strong>80km</strong> (about <strong>50 miles</strong>) by car. Instead of driving, I take the train though. I’ll elaborate on why I take the train instead of driving later in this post.<br /> Long story short: It takes me 3 transfers and about <strong>2h one way</strong> to reach the Liip office in Zurich. That makes a total of about <strong>4h every day to the office and back</strong>.<br /> <em>A lot of time, isn’t it?</em> Read along.</p> <h2>How I got to Liip and why I chose to send an application to this company</h2> <p><em>Side note: I already knew I wanted to do my graduation thesis at a company. It was also clear to me that I wanted to go in the web development direction. The question was just: Which company do I want to do it at?</em></p> <p>I already knew about Liip through their <strong>open source contributions</strong>, such as their <a href="" rel="noopener noreferrer" target="_blank">PHP one-line installation</a> or their <a href="" rel="noopener noreferrer" target="_blank">LiipImagineBundle</a>. There are plenty more open source contributions. I just picked two which I had used in the past myself and still use from time to time.<br /> Further, I did some research and noticed the awards the company has. 
One example: Liip ranked among the <a href="" rel="noopener noreferrer" target="_blank">top 5 medium-sized companies</a>.<br /> This was enough to convince me to take the opportunity and write an email asking whether it would be possible to write my thesis at this company.<br /> After a little back-and-forth mailing, video calling and explaining how writing a graduation thesis at a company works, I was invited to come by for an interview. So I went for it, and it all went well. 🎉</p> <p>To put it in a nutshell: I don’t know of any other company with open source contributions that I have been using. Other than that, it’d be hard to beat the award count. Also, I’d be lying if I said Zurich didn’t look cool on my CV (right? :D).</p> <p><em>&quot;I think it is possible for ordinary people to choose to be extraordinary.&quot; - Elon Musk</em></p> <h2>What my topic is and why it influenced my choice of this company</h2> <p>I was <strong>told beforehand</strong> that if I worked at Liip, my topic would most likely be an <strong>open source bundle</strong> (I find it cool to support open source) for the eCommerce framework <a href="" rel="noopener noreferrer" target="_blank">Sylius</a>.<br /> ECommerce is becoming <strong>more and more important</strong> as more and more people shop online <strong>instead</strong> of walking to the shops. I want to use this as an <strong>opportunity</strong> to gain <strong>more knowledge</strong> in this area.<br /> More about my topic will follow in another blog post. :)</p> <h2>Why I am taking the train and how I am compensating for the travel time</h2> <p>As mentioned earlier, I take the train every time I go to the Zurich office, which takes about 2h one way. This is usually the point where people roll their eyes, and it is the reason they would reject such an opportunity.<br /> First of all, what do people usually do on the train? 
I mostly see people glued to their phones, probably checking the same social media posts over and over again until they eventually arrive. I told myself I don’t want to do this, so instead I read books.<br /> Yeah, correct. I think for people who get distracted quickly when reading books, this is a good opportunity to (kind of) force yourself to read. 📖<br /> Another big option: since I am working on a laptop, I can start working on the train already, so I work less in the office and go back home earlier. 💻<br /> Every time I get on the train I tell myself: Why pull out your phone if you can be productive instead?<br /> Not only do you make use of your travel time when you take the train, but you also help the environment. 🌍</p> <h3>An overview of the advantages of taking the train over the car:</h3> <table> <thead> <tr> <th style="text-align: left;">🚗</th> <th style="text-align: left;">🚆</th> </tr> </thead> <tbody> <tr> <td style="text-align: left;">Have to drive slow or you get poor</td> <td style="text-align: left;">No possibility of getting speeding tickets</td> </tr> <tr> <td style="text-align: left;">can't read books</td> <td style="text-align: left;">can read books</td> </tr> <tr> <td style="text-align: left;">can't work remotely</td> <td style="text-align: left;">can work remotely</td> </tr> <tr> <td style="text-align: left;">concentrate on the road</td> <td style="text-align: left;">can relax in the train</td> </tr> <tr> <td style="text-align: left;">don't care about the environment</td> <td style="text-align: left;">care about the environment</td> </tr> <tr> <td style="text-align: left;">❌</td> <td style="text-align: left;">✅</td> </tr> </tbody> </table> <p>TL;DR: I chose Liip because of their open source contributions and I wanted to contribute too. I am using the train time to read books and start working. No time is wasted when I am reading or working on my laptop. 
Also, you help the environment by taking the train over the car.</p> Recipe Assistant Prototype with ASR and TTS on Socket.IO - Part 2 UX Workshop Mon, 04 Jun 2018 00:00:00 +0200 <p>Welcome to part two of three in our mini blog post series on how to build a recipe assistant with automatic speech recognition and text to speech to deliver a hands-free cooking experience. In the last <a href="">blog post</a> we provided you with an exhaustive hands-on text to speech (TTS) market review; now it's time to put the user in the center. </p> <h3>Workshop: Designing a user experience without a screen</h3> <p>Although the screen used to dominate the digital world, more options are emerging thanks to the rapid improvement of technologies. Most mobile users have used or heard of Siri on Apple iOS or Amazon Echo, and almost <a href="">60 million Americans</a> apparently already own a smart speaker. Unheard of until recently, smart voice-based assistants are now changing our lives quickly. This means that user experience design has to think beyond screen-based interfaces. Actually, UX has always been about a holistic experience in the context in which the user is involved, and with speech as the main input source, UX work is needed just as much to prevent potential usability issues in the interaction. </p> <p>Yuri participated in our innoday workshop as a UX designer. Her goal was to help the team define a recipe assistant with ASR and TTS that helps the user cook recipes in the kitchen without using his hands and is enjoyable to use. In this blog post Yuri helped me write down our UX workshop steps. </p> <h3>Ideation</h3> <p>We started off by brainstorming our long-term and short-term vision and then wrote down our ideas and thoughts on post-its. We then grouped the ideas into three organically emerging topics: Business, Technology and User needs. 
I took the liberty of highlighting some of the aspects that came to our minds:</p> <ul> <li>User <ul> <li>Privacy: Users might not want to have their voice samples saved on some Google server. Click <a href="">here</a> to listen to all your samples, if you have an Android phone. </li> <li>Alexa vs. Mobile or is audio only enough?: We spent a lot of time discussing whether a cookbook could work in an audio-only mode. We were aware that there is for example an <a href=";ie=UTF8&amp;qid=1526581717&amp;sr=1-1&amp;keywords=chefkoch">Alexa Skill</a> from Chefkoch, but its low rating made us suspect that the user might need some minimal visual orientation. An app might be able to show you the ingredients or some visual cues on what to do in a certain step, and who doesn't like those delicious pictures that lure you into giving a recipe a try?</li> <li>Conversational Flow: An interesting aspect that is easy to overlook was how to design the conversational flow so that it gives the user enough flexibility when going through each step of the recipe without being too rigid.</li> <li>Wakeup Word: The so-called wakeup word is a crucial part of every ASR system; it triggers the start of the recording. I've written about it in a recent <a href="">blog post</a>. </li> <li>Assistant Mode: Working with audio also opens up interesting opportunities for features that are rather unusual in normal apps. We thought of a spoken audio alert with which the app notifies you to take the food out of the oven. Something that might feel very helpful, or very annoying, depending on how it is solved.</li> </ul></li> <li>Technology <ul> <li>Structured Data: Interestingly we soon realized that breaking down a cooking process means that we need to structure our data better than a simple text. An example is simply multiplying the ingredients by the number of people. 
An interesting project in this area is the <a href="">open recipe</a> format, which simply defines a YAML structure to hold all the necessary data. </li> <li>Lag and Usability: Combining TTS with ASR offers an interesting opportunity to combine different solutions in one product, but also poses the problem of time lags when two different cloud-based systems have to work together. </li> </ul></li> <li>Business <ul> <li>Tech and Cooking: Maybe a silly idea, but we definitely thought that as men it would feel much cooler to use a tech gadget to cook the meal, instead of a boring cookbook. </li> </ul></li> </ul> <img src="" alt="stickies"> <h3>User journey</h3> <p>From there we took on the question: “How might we design an assistant that allows for cooking without repeatedly looking at the recipe on the screen, since the user’s hands and eyes are busy with cooking?”</p> <p>We sketched the user journey as a full spectrum of activities that go beyond just cooking, and can be described as:</p> <ul> <li>Awareness of the recipes and their interface in the app or on the web</li> <li>Shopping for ingredients according to the selected recipe</li> <li>Cooking</li> <li>Eating</li> <li>After eating </li> </ul> <img src="" alt="journey" format="png"> <p>Due to the limited time of an inno-day, we decided to focus on the cooking phase only, while acknowledging that this phase is part of a much bigger user journey, where some parts, such as exploration, might be hard to tackle with an audio-only assistant. We then explored the cooking step of the journey and broke it down into its own sub-steps. For example: 
</p> <ul> <li>Cooking <ul> <li>Preparation</li> <li>Select intended Recipe to cook</li> <li>Select number of portions to cook</li> <li>Check ingredients if the user has them all ready</li> </ul></li> <li>Progress <ul> <li>Prepare ingredients</li> <li>The actual cooking (boiling, baking, etc)</li> <li>Seasoning and garnishing </li> <li>Setting on a table</li> </ul></li> </ul> <p>For our cooking assistant this meant that it needs to inform the user when to start each new sub-step and introduce the next steps in an easy, unobtrusive way. It also has to keep track of the multiple starts and stops of small actions during cooking, for example to remind the user to preheat the oven at an early point in time, when the user might not yet be thinking of that future step (see below).</p> <img src="" alt="steps" format="png"> <h3>User experience with a screen vs. no screen</h3> <p>Although we were at first keen on building an audio-only interface, we found that a quick visual overview helps to make the process faster and easier. For example, an overview of ingredients can be viewed at a glance on the mobile screen without listening to the app read out every single ingredient. As a result we decided that a combination of minimal screen output and voice output will smooth out potential usability problems. </p> <p>Since the user needs to navigate by voice with simple commands like “back”, “stop”, “forward” and “repeat”, we decided to also show on the screen the step that the user is currently in. This feedback helps the user to recover from small errors or just orient himself more easily. </p> <p>During the UX prototyping phase, we also realised that we should visually highlight the moments when the user is expected to speak and when he is expected to listen. That's why immediately after a question from the app, we would like to show an icon with a microphone meaning “Please tell me your answer!”. In a similar way we also want to show an audio icon when we want the user to listen carefully. 
Finally, since we didn’t want the assistant to listen to audio permanently but rather to listen for a so-called “wake-up word”, we show a little ear icon, signalling that the assistant is now listening for this wake-up word. </p> <p>While those micro-interactions and visual cues helped us to streamline the user experience, we still think that these areas are central to the user experience and should be improved in a next iteration. </p> <h3>Conclusion and what's next</h3> <p>I enjoyed that instead of starting to write code right away, we first sat together and sketched out the concept by writing sticky notes with the ideas and comments that came to mind. I enjoyed having a mixed group with UX people, developers, data scientists and project owners sitting at one table. Although our ambitious goal for the day was to deliver a prototype that could read recipes to the user, we ran out of time and I couldn’t code the prototype on that day. In exchange, I think we gathered very valuable insights into which user experiences work without a screen and which don’t. We realized that going totally without a screen is much harder than it seems. It is crucial for the user experience that the user has enough orientation to know where he is in the process, so that he does not feel lost or confused. </p> <p>In the third and final blog post of this mini series I will provide you with the details on how to write a simple flask and Socket.IO based prototype that combines automatic speech recognition, text to speech and wake-up word detection to create a hands-free cooking experience.</p> Growing Like Crazy in a Few Hours: Welcome to Startup Weekend! Sun, 03 Jun 2018 00:00:00 +0200 <p>Well, lucky you, I know a place where people are employees on Friday and entrepreneurs on Sunday. 
Embark on the journey, it’s going to be a lot of fun (and pleasure)!</p> <img src="" alt="startup-weekend-canvas-english-0-4" format="png"> <h3>ENLARGE YOUR IDEA</h3> <p>Everything begins with an eager desire <strong>[PROBLEM]</strong>. Shared with others, it spreads among the people who believe in it.</p> <p>While pitching, you clarify your thoughts and share your passion for it, which, by contagion, infects others <strong>[ONE MINUTE PITCH]</strong>.</p> <h2>GREAT TEAMMATE(S)</h2> <p>And because it’s always better with other people, you share energy with your teammates, stick together when it’s hard, and high-five to get going again after the low moments <strong>[TEAM]</strong>.</p> <p>You take tough decisions together and you accept them whatever they are, committed to being a great partner at any moment.</p> <h2>JUST DO IT</h2> <p>No one will tell you what to do there. You are free to stay or go, and also to say “no”!</p> <p>You are bold and do things you’ve never done <strong>[EXECUTION]</strong>. And it feels amazing.</p> <p>Organizers and coaches are here to support you. They won’t tell you what to do or how to do it, they will just ask questions and help you reflect. The decisions will be yours to take, and the consequences yours to deal with later.</p> <p>Most of the time, they will just tell you to continue. Don’t worry, they’ve got you covered!</p> <h3>GET THE F*** OUT OF HERE</h3> <p>But the idea itself is just an idea for you, so you go out and search for people who would be ready to pay for it <strong>[CUSTOMER VALIDATION]</strong>. You don’t need to deliver your service right away: fake it until you make it, they say <strong>[BUSINESS MODEL]</strong>!</p> <p>You may need to change direction at some point <strong>[PIVOT]</strong>. 
Then, just continue!</p> <p><br /></p> <p>At the very end, you wrap it up to present it to the world <strong>[FINAL PITCH]</strong>.</p> <p><br /></p> <p>Later, you look back and realize how much you grew, the new person you became in just a matter of hours. Inside of you something changed, and it cannot be removed. Congrats, you became an entrepreneur!</p> <p><br /></p> <p><strong><a href="">Startup Weekend Zürich</a>, next edition on October 26th, 2018.</strong></p> <p><br /></p> <p>Photograph by <a href="">Jürg Stuker</a>. Startup Weekend Canvas: original by <a href="">Jozué Morales</a>, translated into English and adjusted with the help of the community by <a href="">Léo Davesne</a>.</p> <p>Another post about this fantastic weekend on <a href="">Namics' blog</a>.</p> Why and how we use Xamarin Fri, 01 Jun 2018 00:00:00 +0200 <p>When we start a new project, we always ask ourselves if we should choose Xamarin over a full native solution. I wanted to reflect on past projects and see if it was really worth using Xamarin. </p> <p>But how do you compare projects? I decided to use line counting. It can seem obvious or simplistic, but the number of shared lines of code will easily show how much work has been done once instead of twice. I took the two most recent Xamarin projects that we worked on: <a href="">Ticketcorner Ski</a> and <a href="">together</a>.</p> <p>I used the following method:</p> <ul> <li>Use the well-known <a href="">cloc</a> tool to count the number of lines in a project.</li> <li>Count only C# files. <ul> <li>Other types such as JSON configuration files or API response mocks in unit tests do not matter.</li> </ul></li> <li>Make an exception with Android layout files. 
<ul> <li>Our iOS layout is all done in Auto Layout code and we don't use Xcode Interface Builder.</li> <li>To have a fair comparison, I included the Android XML files in the line count.</li> </ul></li> <li>Do not count auto-generated files.</li> <li>Do not count blank lines and comments.</li> <li>Other tools like <a href="">Fastlane</a> are also shared, but are not taken into account here.</li> </ul> <p>If you want to try this with one of your own projects, here are the commands I used for the C# files:</p> <pre><code class="language-bash">cloc --include-lang="C#" --not-match-f="(([Dd]esigner)|(AssemblyInfo))\.cs" .</code></pre> <p>For the Android XML layouts, I used:</p> <pre><code class="language-bash">cloc --include-lang="xml" --force-lang="xml",axml Together.Android/Resources/layout</code></pre> <h2>Here is what I found:</h2> <table> <thead> <tr> <th style="text-align: left;">Project</th> <th style="text-align: center;">Android</th> <th style="text-align: center;">iOS</th> <th style="text-align: center;">Shared</th> </tr> </thead> <tbody> <tr> <td style="text-align: left;">Ticketcorner Ski</td> <td style="text-align: center;">31%</td> <td style="text-align: center;">31%</td> <td style="text-align: center;"><strong>38%</strong></td> </tr> <tr> <td style="text-align: left;">together</td> <td style="text-align: center;">42%</td> <td style="text-align: center;">30%</td> <td style="text-align: center;"><strong>28%</strong></td> </tr> </tbody> </table> <p>We can see that on those projects, an average of one third of the code can be shared. I was pretty impressed to see that for Ticketcorner Ski we have the same number of lines on the two platforms. I was also pleasantly surprised to see that the project <strong>shares almost 40% of its code</strong>.</p> <p>In a mobile app, most of the unit tests target the business logic, which is exactly what is shared with Xamarin: business logic and its unit tests are only written once. 
Most libraries not directly related to business logic are also shared: REST client, database client, etc.</p> <p>The code that is not shared is mostly UI code, interaction, etc. But it is also platform-specific code: how to access the camera, how to handle push notifications, how to securely store user credentials according to each platform's guidelines.</p> <p>It would not be fair to conclude that doing those projects in native would have been 30% more expensive. The shared code sometimes has to take into account that it will be used on two different platforms, and it gets more generic than it would be if written twice.</p> <h2>So... how do you choose one or the other?</h2> <p>My goal with this blogpost is not to start a flame war on whether Xamarin is good or bad. I have shown here that for those projects, it was the right choice to use Xamarin. I want to share a few things we think about when we have to make a decision. Note that we use Xamarin.iOS and Xamarin.Android, but don't use Xamarin.Forms.</p> <ul> <li>Does the application contain a lot of business logic, or is it more UI-based? <ul> <li>With one Xamarin project we worked on in the past year, a specific (and complex) use-case was overlooked by the client and it resulted in paying users being pretty unhappy. We were very pleased to be able to fix the problem once, and write the related unit tests once too.</li> <li>As a counterexample, for the <a href="">Zürich Zoo app</a>, most of our job was writing UX/UI code. The business logic is solely doing GET requests to a backend.</li> </ul></li> <li>Do you plan to use external libraries/SDKs? <ul> <li>Xamarin is pretty good at <a href="">using .jar files on Android</a>.</li> <li>Native libraries on iOS <a href="">have to be processed manually</a> and it can be tedious to do. 
It is also hard to use a library packaged with CocoaPods that depends on many other pods.</li> <li>For both platforms, we encountered closed-source tools that are not that easy to convert. As an example, we could use the <a href="">Datatrans SDK</a>, but not without some <em>trial and error</em>.</li> <li>There are however other Xamarin libraries that can replace what you are used to when developing on both platforms. We replace <a href="">Picasso</a> on Android and <a href="">Kingfisher</a> on iOS with <a href="">FFImageLoading</a> on Xamarin. This library has the same API methods on both platforms, which makes it easy to use.</li> </ul></li> <li>Do you plan to use platform-specific features? <ul> <li>Xamarin is able to provide access to every platform feature, and it works well. It is also known that they update the Xamarin SDKs as soon as new iOS/Android versions are announced.</li> <li>For <a href="">Urban Connect</a> however, the most important part of the app is using <em>Bluetooth Low Energy</em> to connect to bike locks. Even if Xamarin is able to do it too, it was the right decision to remove this extra layer and code everything natively.</li> </ul></li> <li>Tooling, state of the platform ecosystems: <ul> <li>In the mobile world, things move really fast: <ul> <li>Microsoft pushes really hard for developers to adopt Xamarin, for example with <a href="">App Center</a>, the new solution to build, test, release, and monitor apps. 
But Visual Studio for Mac is still really buggy and slow.</li> <li>Google added first-class support for <a href="">Kotlin</a>, has an awesome IDE and pushes mobile development with platforms like Firebase or Android Jetpack.</li> <li>Apple follows along, but still somehow fails to improve Xcode and its tooling in a meaningful manner.</li> </ul></li> <li><strong>Choices made one day will certainly not be valid one year later.</strong></li> </ul></li> <li>Personal preferences: <ul> <li>Inside Liip there are very divergent opinions about Xamarin. We always choose the right tool for the job. Having someone efficient and motivated about a tool is important too.</li> </ul></li> </ul> <p>I hope I was able to share my view on why and how we use Xamarin here at Liip. I personally enjoy working on both Xamarin and native projects. Have a look at <a href="">together</a> and <a href="">Ticketcorner Ski</a> and tell us what you think!</p> Repair Café - Repair Instead of Buying New Thu, 31 May 2018 00:00:00 +0200 <h2>Succeeding through sustainability</h2> <p>Off to the Repair Café, an event where visitors bring along broken products and repair them together with professionals. From Romandie to Eastern Switzerland and from Ticino to Basel, Repair Cafés help people handle consumer goods more sustainably. Repair instead of throwing away is the motto of the Repair Café events. Started as a campaign by the Stiftung für Konsumentenschutz (the Swiss consumer protection foundation), there are now already 87 cafés and restaurants offering repair days. This number is growing steadily: in the last six months alone, 16 new repair days have been added. Last year, the Repair Café events saved more than four and a half tonnes of material from ending up in the dumpster. 
The trend is promising, but how do people find out when and where an event takes place?</p> <h3>How does it work?</h3> <p>The website <a href=""></a> provides all the important information on dates, repair times and cafés nearby. Visitors can use tools free of charge and buy common spare parts on site. Organisers can also complete the registration process via the website. </p> <h3>Web project</h3> <p>The goal of the Stiftung für Konsumentenschutz was a single website for all information that simplifies the collaboration between the foundation and the individual cafés and increases the visibility of the events. The website was needed quickly and on a tight budget. These were the requirements with which we started the project. </p> <p>Agile development methods and a lively exchange made the implementation possible. The website was built with OctoberCMS to ensure flexibility and easy handling on the foundation's side. The design is playful and the information structure is clear. </p> <h3>Collaboration - purpose over profit</h3> <p>The Stiftung für Konsumentenschutz stands up for the concerns of consumers and thus for sustainability. It was exactly the right project for a collaboration, as we have already received several awards for our sustainability. Accordingly, the collaboration was characterised by trust and the shared goal of making the world more environmentally conscious.</p> <p>“Zero waste” is on everyone's lips, yet many of our devices are still replaced every year. This is where the Repair Cafés come in. 
See for yourself and drop by the next event near you.</p> The role of CKAN in our Open Data Projects Tue, 29 May 2018 00:00:00 +0200 <h2>CKAN's Main Goal and Key Features</h2> <p><a href="">CKAN</a> is an open source management system whose main goal is to provide a managed data catalog system for Open Data. It is mainly used by public institutions and governments. At Liip we mainly use CKAN to help governments provide their data catalog and publish data to the public in an accessible fashion. Part of our work is supporting data owners in getting their data published in the required data format. We do this by providing interfaces and usable standards that enhance the user experience on the portal and make it easier to access, read and process the data.</p> <img src="" alt="bookcase" format="jpg"> <h3>Metadata-Catalog</h3> <p>Out of the box CKAN can be used to publish and manage different types of datasets. They can be clustered by organizations and topics. Each dataset can contain resources which themselves consist of files in different formats or links to other data sources. The metadata standard can be configured to represent the standard you need, but the plugin already includes a simple and useful metadata standard that can get you started. The data is saved into a Postgres database by default and is indexed using Solr.</p> <h3>Powerful Action-API</h3> <p>CKAN ships with an <a href="">API</a> which can be used to browse through the metadata catalog and create advanced queries on the metadata. With authorization the API can also be used to add, import and update data with straightforward requests. </p> <h3>CLI Commands</h3> <p>The standard also includes a range of CLI commands which can be used to process or execute different tasks. Those can be very useful, e.g. 
to manage, automate or schedule backend jobs.</p> <h3>Preview</h3> <p>CKAN offers the functionality to configure a preview of a number of different file types, such as tabular data (e.g. CSV, XLS), text data (e.g. TXT), images or PDFs. That way interested citizens can get a quick overview of the data itself without having to download it first and use local software merely to get a better idea of how the data looks.</p> <img src="" alt="Preview of data on Statistik Stadt Zürich" format="png"> <h2>Plugins</h2> <p>While CKAN itself acts as a CMS for data, it really shines when you make use of its extensibility and configure and develop it to fit your business needs and requirements. There is already a wide-ranging list of plugins developed for CKAN, which cover a broad range of additional features or make it easier to adjust CKAN to fit your use cases and look and feel. A collection of most of the plugins can be found on <a href="">CKAN-Extensions</a> and on <a href="">Github</a>.</p> <p>At Liip we also help maintain a couple of CKAN's plugins. The most important ones that we use in production for our customers are:</p> <h3>ckanext-harvest</h3> <p>The ckanext-harvest plugin offers the possibility to export and import data. First of all, it enables you to exchange data between portals that both use CKAN.</p> <p>Furthermore, we use this plugin to harvest data on a regular basis from different data sources. At <a href=""></a> we use two different types of harvesters. Our DCAT-Harvester consumes XML/RDF endpoints in the <a href="">DCAT-AP Switzerland</a> format, which is enforced on the Swiss portal.</p> <p>The Geocat-Harvester consumes data from <a href=""></a>. 
As the data from geocat is in the ISO-19139_che format (the Swiss version of ISO-19139), the harvester converts the data to the DCAT-AP Switzerland format and imports it.</p> <p>Another feature of this plugin that we use is our <a href="">DCAT-AP endpoint</a>, which allows other portals to harvest our data and also serves as an example for organizations that want to build an export that can be harvested by us.</p> <img src="" alt="How our Harvesters interact with the different Portals" format="png"> <h3>ckanext-datastore</h3> <p>The plugin ckanext-datastore stores the actual tabular data (as opposed to 'just' the metadata) in a separate database. With it, we are able to offer an easy-to-use API on top of the standard CKAN API to query the data and process it further. It also provides basic functionality on the resource detail page to display the data in simple graphs. </p> <p>The datastore is the most interesting plugin for data analysts who want to build apps based on the data or analyze the data on a deeper level. This is an <a href="">API-example of the Freibäder-dataset</a> on the portal of <a href="">Statistik Stadt Zürich</a>.</p> <h3>ckanext-showcase</h3> <p>We use ckanext-showcase to provide a platform for data analysts by displaying what has been built based on the data the portal is offering. There you can find a good overview of how the data can be viewed in meaningful ways as statistics, used as a source in narrated videos, or even used in apps that make everyday life easier. For example, you can browse through the <a href="">Showcases on the Portal of the City of Zurich</a>.</p> <h3>ckanext-xloader</h3> <p>The ckanext-xloader is a fairly new plugin which we were able to adopt for the City of Zurich portal. It enables us to automatically and asynchronously load data into the datastore so that the data is available after it has been harvested.</p> <h2>CKAN Community</h2> <p>The CKAN core and a number of its major plugins are maintained by the CKAN core team. 
The developers are spread around the globe, some of them working in companies that run their own open data portals. The community that contributes to CKAN and its plugins is always open to developers who would like to help with suggestions, report issues or provide pull requests on Github. It is a strong community which helps beginners, no matter their background. The <a href="">ckan-dev-Mailing-List</a> provides help in developing CKAN and is the platform for discussions and ideas about CKAN, too.</p> <h2>Roadmap and most recent Features</h2> <p>Since the major release 2.7, CKAN requires Redis to use a new system of asynchronous background jobs. This helps CKAN to be more performant and reliable. Just a few weeks ago the new major release 2.8 came out. A lot of work on this release went into driving CKAN forward by updating to a newer version of Bootstrap and deprecating old features that were holding back CKAN's progress. </p> <p>Another rather new feature is the datatables feature for tabular data. Its intention is to help the data owner describe the actual data in more detail by describing the values and how they were gathered or calculated.</p> <p>CKAN's roadmap has many interesting features ahead. One example is the development of the CKAN Data Explorer, which is a base component of CKAN. It allows you to converge data from any dataset in the DataStore of a CKAN instance in order to analyze it.</p> <h2>Conclusion</h2> <p>It is important to us to support the Open Data movement, as we see value in publishing governmental data to the public. CKAN helps us support this cause: we work with several organizations to publish their data and consult our customers while we develop and improve their portals together.</p> <p>Personally, I am happy to be a part of the CKAN community, which has always been very helpful and supportive. 
The cause of helping different organizations make their data public, together with the respectful CKAN community, makes it a lot of fun to contribute to the code and also to the community.</p> <img src="" alt="Open Data auf" format="png"> Recipe Assistant Prototype with Automatic Speech Recognition (ASR) and Text to Speech (TTS) on Socket.IO - Part 1 TTS Market Overview Mon, 28 May 2018 00:00:00 +0200 <h2>Intro</h2> <p>In one of our monthly innodays, where we try out new technologies and different approaches to old problems, we had the idea to collaborate with another company. Slowsoft is a provider of text to speech (TTS) solutions. To my knowledge they are the only ones who are able to generate Swiss German speech synthesis in various Swiss accents. We thought it would be a cool idea to combine it with our existing automatic speech recognition (ASR) expertise and build a cooking assistant that you can operate completely hands-free. So no more touching your phone with your dirty fingers only to check again how many eggs you need for that cake. We decided that it would be great to go with some recipes from a famous Swiss cookbook provider. </p> <h2>Overview</h2> <p>Generally there are quite a few text to speech solutions out there on the market. In the first of two blog posts I would like to give you a short overview of the available options. In the second blog post I will then describe the insights we arrived at in the UX workshop and how we combined them with the solution from Slowsoft in a quick and dirty web-app prototype built on Socket.IO and Flask. </p> <p>But first let us get an overview of existing text to speech (TTS) solutions. To showcase the performance of existing SaaS solutions I've chosen a random recipe from Betty Bossi and had it read by them:</p> <pre><code class="language-text">Ofen auf 220 Grad vorheizen. Broccoli mit dem Strunk in ca. 
1 1/2 cm dicke Scheiben schneiden, auf einem mit Backpapier belegten Blech verteilen. Öl darüberträufeln, salzen. Backen: ca. 15 Min. in der Mitte des Ofens. Essig, Öl und Dattelsirup verrühren, Schnittlauch grob schneiden, beigeben, Vinaigrette würzen. Broccoli aus dem Ofen nehmen. Einige Chips mit den Edamame auf dem Broccoli verteilen. Vinaigrette darüberträufeln. Restliche Chips dazu servieren. </code></pre> <h3>But first: How does TTS work?</h3> <p>The classical way works like this: You have to record at least dozens of hours of raw speaker material in a professional studio. Depending on your use case, the material can range from navigation instructions to jokes. The next trick is called &quot;unit-selection&quot;, where recorded speech is sliced into a high number (10k - 500k) of elementary components called <a href="">phones</a>, in order to be able to recombine those into new words that the speaker has never recorded. The recombination of these components is not an easy task because their characteristics depend on the neighboring phonemes and the accentuation or <a href="">prosody</a>. These depend a lot on the context. The problem is to find the right combination of units that satisfies the input text and the accentuation and that can be joined together without generating glitches. The raw input text is first translated into a phonetic transcription, which then serves as the input for selecting the right units from the database; these are then concatenated into a waveform. Below is a great example from Apple's Siri <a href="">engineering team</a> showing how the slicing takes place. </p> <img src="" alt="components" format="png"> <p>Using an algorithm called <a href="">Viterbi</a>, the units are then concatenated in such a way that they create the lowest overall &quot;cost&quot;, where the cost results from both selecting a unit and concatenating two units together. 
Below is a great conceptual graphic from Apple's engineering blog showing this cost estimation. </p> <img src="" alt="cost" format="png"> <p>Now, in contrast to the classical way of TTS, <a href="">new methods based on deep learning</a> have emerged. Here deep learning networks are used to predict the unit selection. If you are interested in how the new systems work in detail, I highly recommend the <a href="">engineering blog entry</a> describing how Apple created the Siri voice. As a final note I'd like to add that there is also a format called <a href="">Speech Synthesis Markup Language</a> (SSML) that allows users to manually specify the prosody for TTS systems. It can be used, for example, to put an emphasis on certain words, which is quite handy. So enough with the boring theory, let's have a look at the available solutions.</p> <h2>SaaS / Commercial</h2> <h3>Google TTS</h3> <p>When thinking about SaaS solutions, the first thing that comes to mind these days is obviously Google's <a href="">TTS solution</a>, which they used to showcase Google's virtual assistant capabilities at this year's Google I/O conference. Have a look <a href="">here</a> if you haven't been wowed today yet. When you go to their website I highly encourage you to try out their demo with a German text of your choice. It really works well - the only downside for us was that it's not really Swiss German. I doubt that they will offer it for such a small user group - but who knows. I've taken a recipe and had it read by Google and frankly liked the output. </p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <h3>Azure Cognitive Services</h3> <p>Microsoft also offers TTS as part of their Azure <a href="">cognitive services</a> (ASR, Intent detection, TTS). 
Similar to Google, having ASR and TTS from one provider definitely has the benefit of saving us one roundtrip, since normally you would need to perform the following trips:</p> <ol> <li>Send audio data from client to server. </li> <li>Get the response on the client (dispatch the message on the client).</li> <li>Send our text to be transformed to speech (TTS) from client to server. </li> <li>Get the response on the client. Play it to the user.</li> </ol> <p>Having ASR and TTS in one place reduces it to:</p> <ol> <li>ASR from client to server. Process it on the server. </li> <li>TTS response to client. Play it to the user.</li> </ol> <p>Judging the speech synthesis quality, I personally think that Microsoft's solution didn't sound as great as Google's synthesis. But have a look for yourself. </p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <h3>Amazon Polly</h3> <p>Amazon - having placed their bets on Alexa - of course has a sophisticated TTS solution, which they call <a href="">Polly</a>. I love the name :). To get where they are now, they acquired a startup called Ivona back in 2013, which was producing state-of-the-art TTS solutions at the time. Having tried it, I liked the soft tone and the fluency of the results. Check for yourself:</p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <h3>Apple Siri</h3> <p>Apple offers TTS as part of their iOS SDK under the name <a href="">SiriKit</a>. I haven’t had the chance yet to play with it in depth. Wanting to try it out, I made the mistake of thinking that Apple's TTS solution on the desktop is the same as SiriKit. Yet SiriKit is nothing like the built-in TTS on macOS. 
To have a bit of a laugh on your MacBook, you can produce some really poor TTS on the command line with a simple command:</p> <pre><code class="language-bash">say -v fred "Ofen auf 220 Grad vorheizen. Broccoli mit dem Strunk in ca. 1 1/2 cm dicke Scheiben schneiden, auf einem mit Backpapier belegten Blech verteilen. Öl darüberträufeln, salzen. Backen: ca. 15 Min. in der Mitte des Ofens."</code></pre> <p>While the output sounds awful, below is the same text read by Siri on the newest iOS 11.3. That shows you how far TTS systems have evolved in the last years. Sorry for the bad quality, but somehow it seems impossible to turn off the external microphone when recording on an iPhone. </p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <h3>IBM Watson</h3> <p>In this arms race IBM also offers a TTS system, with a way to define the prosody manually using the <a href="">SSML markup language standard</a>. I didn't like their output compared to the presented alternatives, since it sounded quite artificial. But give it a try for yourself.</p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <h3>Other commercial solutions</h3> <p>Finally there are also competitors beyond the obvious ones, such as <a href="">Nuance</a> (formerly Scansoft - originating from Xerox research). Despite their page promising a <a href="">lot</a>, I found the quality of the TTS in German to be a bit lacking. </p> <figure class="embed-responsive embed-responsive--16/9"><iframe src="//" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen="true"></iframe></figure> <p>Facebook doesn't offer a TTS solution yet - maybe they have put their bets on Virtual Reality instead. 
Other notable solutions are <a href="">Acapela</a>, <a href="">Innoetics</a>, <a href="">TomWeber Software</a>, <a href="">Aristech</a> and <a href="">Slowsoft</a> for Swiss TTS.</p> <h2>OpenSource</h2> <p>Instead of providing the same kind of overview for the open source area, I think it's easier to list a few projects and provide a sample of the synthesis. Many of these projects are academic in nature and often don't give you all the bells and whistles and fancy APIs of the commercial products, but with some dedication they could definitely work. </p> <ul> <li><a href="">Espeak</a>. <a href="">sample</a> - My personal favorite. </li> <li><a href="">Festival</a> a project from CMU, focused on portability. No sample.</li> <li><a href="">Mary</a>. From the German &quot;Forschungszentrum für Künstliche Intelligenz&quot; DFKI. <a href="">sample</a></li> <li><a href="">Mbrola</a> from the University of Mons <a href="">sample</a></li> <li><a href="">Simple4All</a> - an EU-funded project. <a href="">sample</a></li> <li><a href="">Mycroft</a>. More of an open source assistant, but runs on the Raspberry Pi.</li> <li><a href="">Mimic</a>. Only the TTS from the Mycroft project. No sample available.</li> <li>Mozilla has published over 500 hours of material in their <a href="">Common Voice project</a>. Based on this data they offer a deep learning ASR project <a href="">Deep Speech</a>. Hopefully they will offer TTS based on this data too someday. </li> <li><a href="">Char2Wav</a> from the University of Montreal (who btw. maintain the Theano library). <a href="">sample</a></li> </ul> <p>Overall my feeling is that unfortunately most of the open source systems have not yet caught up with the commercial versions. I can only speculate about the reasons: it might take a significant amount of good raw audio data to produce comparable results, and a lot of fine tuning on the final model for each language. 
For an elaborate overview of all TTS systems, especially the ones that work in German, I highly recommend checking out the <a href="">extensive list</a> that Felix Burkhardt from the Technical University of Berlin has compiled. </p> <p>That sums up the market overview of commercial and open source solutions. Overall I was quite amazed at how fluent some of these solutions sounded and think the technology is ready to really change how we interact with computers. Stay tuned for the next blog post, where I will explain how we put one of these solutions to use to create a hands-free recipe reading assistant.</p> What we learned at Typo Berlin 2018 Fri, 25 May 2018 00:00:00 +0200 <h2>5 things we learned</h2> <p>The presentations were full of everything from visionary thoughts to practical tips and tricks, with plenty of typography and content in between. </p> <img src="" alt="Designer Timothy Goodman at Typo Berlin 2018"> <p>New York-based designer <a href="">Timothy Goodman</a> had the most important message of all.</p> <img src="" alt="Brand Strategist Alex Mecklenburg at Typo Berlin 2018"> <p>Brand strategist <a href="">Alex Mecklenburg</a> shared a similar message, but from the corporate perspective.</p> <p>She posed the question &quot;The Wonder of Digital Creation is sacred ... or is it?&quot; and advised against creating internal innovation labs because they exclude everyone outside the lab from being innovative. </p> <img src="" alt="Digital Visionary Johann Jungwirth at Typo Berlin 2018"> <p>From vision and philosophy to more practical matters: the faceless truck of the future, as <a href="">imagined by Volkswagen</a>. And how about that fancy Volkswagen logo of the future?</p> <img src="" alt="Brand Talk at Typo Berlin 2018"> <p>Nivea showed how they designed fonts to provoke specific feelings. 
In graphology, these are called Eindruckscharaktere.<br /> Brand: Nivea<br /> Agency: <a href="">Juliasys</a></p> <img src="" alt="Brand Talk at Typo Berlin 2018"> <p>Europe’s largest network of health clinics presented their new corporate design and website. They followed an interesting approach: content first, because it’s content that would gain their patients’ trust.<br /> Brand: Helios Clinics<br /> Agency: <a href="">EdenSpiekermann</a> </p> <img src="" alt="Professor and typographer Gerd Fleischmann at Typo Berlin 2018"> <p>Famous German dadaist, surrealist, and constructivist <a href="">Kurt Schwitters</a> (1887–1948) wasn't just one of the defining artists of the 20th century – he also had lots to say about typography, as <a href="">Prof. Gerd Fleischmann</a> explained to us.</p> <h2>4 inspirational finds</h2> <p>Here is some work that’ll remind you how wonderful and multifaceted creativity can be.</p> <img src="" alt="Designer Hansje van Halem at Typo Berlin 2018"> <p>Dive into the work of <a href="">Hansje van Halem</a> – great fun.</p> <img src="" alt="XXX at Typo Berlin 2018"> <p>Fantastic work! Watch the case film <a href="">here</a>.<br /> Brand: London Symphony Orchestra<br /> Agency: <a href="">Superunion</a> </p> <img src="" alt="Urban developer Charles Landry at Typo Berlin 2018"> <p>Society is moving faster and faster, and urban developer Charles Landry talked about <a href="">his observations</a> on the consequences. For example, cities have to come up with creative solutions to cope with or even take advantage of the ever-increasing speed of life.</p> <img src="" alt="Brand talk at Typo Berlin 2018"> <p>Speaking of faster-moving times: in their “new agenda for strategic branding”, <a href="">the team from KMS Agency</a> showed how fast a channel reaches 50 million users. </p> <h2>3 fun facts</h2> <p>Observing the creative avant-garde, two things came to mind. First, even the big agencies deal with the stuff everyone else deals with. 
And second, the political resistance is alive!</p> <img src="" alt="Brand Talks at Typo Berlin 2018"> <p>We've all been there. But the good news is: even the big players in the agency world come to the same conclusion.<br /> Brand: Helios Clinics<br /> Agency: <a href="">EdenSpiekermann</a> </p> <img src="" alt="Brand Talk at Typo Berlin 2018"> <p>We've all been there, part 2: when the client wants to combine the two (very) different directions the agency presented.<br /> Brand: London Symphony Orchestra<br /> Agency: <a href="">Superunion</a> </p> <img src="" alt="Politics yeah! at Typo Berlin 2018"> <p>How refreshingly political Typo Berlin was! Several speakers came out passionately <a href="">against Trump</a>, while one German-speaking presenter did his entire talk <a href="">in the she form only</a>. We had a <a href="">female Muslim designer on stage</a>, and many speakers <a href="">poked fun at patriarchy</a>. Go go go, forward-thinking creatives!</p>