Liip Blog Kirby Fri, 25 May 2018 00:00:00 +0200 Latest articles from the Liip Blog en One.Thing.Less goes live! Fri, 25 May 2018 00:00:00 +0200 <h2>Making privacy and data protection actionable by anyone</h2> <p>You have surely received many companies’ emails about your data privacy during the last few weeks. That’s due to the new European law (called GDPR, for General Data Protection Regulation) that became enforceable on the 25th of May 2018.<br /> This is a major step towards a more responsible usage of personal data by companies, as you can now request information on how your data is being used, and ask to change it.</p> <p>Nevertheless, as James Aschberger (Founder &amp; CEO of One.Thing.Less AG) noticed, many people didn’t understand how they could concretely claim their rights under this new regulation.</p> <p><em>&quot;It’s overwhelming, where do I even start?&quot;</em></p> <p><em>&quot;I am worried about my data, but I don’t have the time or knowledge to contact companies individually or read through all the privacy policies and terms of use they put in front of me.&quot;</em></p> <p><em>&quot;I receive so many messages regarding data privacy, but I don’t understand all of it and don’t have time to read the fine print.&quot;</em></p> <h2>There is an app for that now</h2> <p>We at Liip have <a href="">worked hard</a> during the past four months to make James’ One.Thing.Less dream happen — a free and simple way to take control over the use of your personal data.</p> <p>You can now <a href="">download the app on the Apple App Store</a> (and in June on Google Play for Android) and make your voice heard by companies.</p> <img src="" alt="One.Thing.Less mobile app Homescreen" width="300"> <p><em>One.Thing.Less mobile app Homescreen</em></p> <h2>Keep It Simple, Stupid (aka KISS)</h2> <p>When James first came to us, he made it clear that the User Experience had to be simple. 
That was one of our main challenges: solving this complex legal problem in one click.</p> <p>This is what we did.</p> <img src="" alt="Ask companies about the use of your personal data, in one tap." width="300"> <p><em>Ask companies about the use of your personal data, in one tap.</em></p> <p>In the view above, you can just press the “Send request” button to ask a company seven (simple) questions about the use of your personal data. Under GDPR, companies have to provide a response within 30 days.</p> <p>Then you can request them to change how they make use of your personal data, still just one click away.</p> <img src="" alt="Change the way companies use your personal data." width="700"> <p><em>Change the way companies use your personal data</em></p> <h2>A trustful partnership</h2> <p>We can’t stop talking about trust here at Liip, as we believe it is one of the core elements of successful products and companies.<br /> This project was a great example of how one of our mottos, “Trust over control”, can lead to extraordinary results in less than four months.</p> <p><em>&quot;Successfully launching our platform in such a short time was only possible because we established and earned mutual trust along the entire journey together. If I had not been able to trust that each of the Liip team members delivers, we would have failed.&quot;</em> James Aschberger, Founder &amp; CEO at One.Thing.Less AG</p> <p><em>&quot;It was a chance to craft such a meaningful product, in a trustful and solution-oriented partnership.&quot;</em> Thomas Botton, Product Owner at Liip</p><img src="" height="1" width="1" alt=""/> One Scrum team, two companies Fri, 18 May 2018 00:00:00 +0200 <p><strong>All roles in one team</strong><br /> To reach the goal iteratively together, all roles are represented in one team. 
Only when all the necessary competences are present in a common project team can very close cooperation, high reactivity and fast, deliverable results be achieved.</p> <p><strong>Shared responsibility</strong><br /> In a mixed customer and service provider Scrum team, there is no longer a traditional distribution of roles. Raiffeisen is responsible for the product vision and design, and thus also for the product owner role. Liip, with many years of experience in agile development, provides the ScrumMaster. The result is a mixed team: developers from both companies, Interaction Design and Testing from Raiffeisen, and the architecture from Liip - all participating with their strengths.</p> <p><strong>Self-organization</strong><br /> In a team where no manager is in the vicinity - at Raiffeisen not in the same building, at Liip there are no superiors - self-organization is not just a buzzword.<br /> Scrum provides the guidelines: the product lead lies entirely with the product owner, and the ScrumMaster gradually makes him- or herself obsolete by coaching the team to improve continuously. The implementation team bears the remaining responsibility for the entire &quot;how&quot;: specifically, the complete implementation in the sprint with its technical solution, the design not defined by the product owner, the entire planning of the implementation, as well as ongoing cooperation and coordination.</p> <p><strong>Working together as a recipe for success</strong><br /> Physical proximity was also a decisive success factor in the previous MemberPlus cooperation. In the current project, however, not only are the daily coordination meetings held together on site; the entire project team works together - mostly 4 days a week - in the Liip office in St.Gallen. </p> <p><strong>Limits of cross-company Scrum teams</strong><br /> An ideal world in the MemberPlus team? Of course not. Anyone who works so closely together will notice almost everything, and there are challenges in every project. 
But thanks to this proximity, discrepancies surface quickly, which prevents major escalations.</p> <p><strong>To be continued</strong><br /> The next blog entry deals with the possible agile models of customer and service provider cooperation.</p><img src="" height="1" width="1" alt=""/> Sustainability Put into Practice Thu, 17 May 2018 00:00:00 +0200 <p><strong>Liip is one of the leading Swiss companies in the implementation of individual web and mobile applications. Almost 160 employees in Zurich, Fribourg, Lausanne, St.Gallen and Bern develop customised digital solutions for companies such as Migros and Raiffeisen, and the federal authorities. By employing a wide range of measures, starting with the choice of location, and with employees heavily involved, Liip always keeps its own carbon footprint as small as possible. In an interview with myclimate, Gerhard Andrey, co-founder of Liip, sees enormous untapped potential in Swiss companies. </strong></p> <p><strong>Mr Andrey, what does the name «Liip» mean?</strong></p> <p>Liip comes from Old High German and means «life». When spoken, the name sounds like the English word «leap», which also means «jump» or «spring». Both of these suit us perfectly, as we see ourselves as a lively organisation, committed to a progressive, sustainable world.</p> <p><strong>As a provider of web solutions, how have you come to be so heavily involved in sustainability?</strong></p> <p>Since the start of our business, it has been clear to us founders that we as a company also have a social responsibility and want to embrace it actively. For this reason, we have always tried to reconcile the three aspects of sustainability: social, ecological and economic concerns. </p> <p><strong>You have already consistently implemented environmentally aware and climate-friendly measures in many areas within your company. What areas are currently causing you the most anguish?</strong></p> <p>As we rent our offices, we are dependent on the regional property markets. 
In some locations we simply have no other choice but to use buildings that do not have state-of-the-art technology and sometimes even have to be heated with oil. That is frustrating. We could of course move away from the poorly maintained city centre, into modern buildings in zones outside the city. But that would thwart our efforts in terms of mobility.</p> <p>I wish that politicians would offer incentives to reduce urban sprawl and motivate the improvement of city centres. We will not get very far by simply «encouraging» the owners. These emissions, which we cannot really influence, are exactly why we are happy to have myclimate. Without the possibility of carbon offsetting, our hands would be tied in these cases and we would be able to do nothing.</p> <p><strong>In comparison, where do you see potential for more environmental and climate consciousness in companies in Switzerland?</strong></p> <p>There is enormous potential in terms of mobility. I am always amazed at how traditionally most companies act when it comes to the commute. For many organisations and employees it appears to still be completely normal to drive to work in a car and find a parking space. This has enormous consequences for land use and generates large quantities of harmful emissions.</p> <p>We are addressing this issue very consistently. Liip does not offer any car parking spaces, but it does subsidise bikes and public transport with half-fare travel cards or sometimes even general travel cards. So, according to a recent survey, around 95 per cent of all of our employees' commutes are currently on foot, by bicycle or using public transport. With almost 160 employees that really makes a difference.</p> <p>As for mobility for our work, we have invested in top-quality infrastructure for video conferences. This enables us to avoid many journeys, and we can hold meetings across sites or even from the comfort of home. 
This is not only advantageous from an ecological point of view but also from an economic one. We avoid air travel completely. Our business activities are limited to the Swiss market. Our people go abroad mainly for training or conferences. The rule here is simple: flight costs are not covered, but the usually more expensive costs of train travel are covered by the company.</p> <p><strong>To what extent is your focus on social and ecological sustainability also appreciated by your customers? Does it have any effect on the awarding of contracts?</strong></p> <p>There are customers who come to us specifically for that reason. I am convinced that this issue will become more important in future when it comes to selecting suppliers. We are maintaining our efforts to increase sustainability. After all, we are convinced that we will be able to harvest the fruits of our labour in the long term, be it directly as a company or most certainly as individuals in society.</p> <p><strong><em>In 2013 Liip received the Zurich Cantonal Bank (ZKB) sustainability prize for its commitment to its employees, their families and, not least, the climate. Since it was founded in 2007, Liip has been offsetting its own company emissions with <a href="">myclimate</a>.</em></strong></p><img src="" height="1" width="1" alt=""/> fenaco’s new online presence Thu, 17 May 2018 00:00:00 +0200 <h3>Agriculture goes digital</h3> <p>fenaco is an agricultural association with a concept dating back over a century; it is owned by around 192 LANDI cooperatives and their over 42,000 members, 22,000 of whom are active Swiss farmers. Some of its best-known brands include drinks manufacturer RAMSEIER Suisse, meat processor Ernst Sutter, retailers Volg and LANDI, fertiliser supplier Landor, feed manufacturer UFA and energy supplier AGROLA. 
The aim of fenaco is to help farmers economically develop their businesses.</p> <p>The company is celebrating its 25th birthday this year, an excellent reason to update its outdated online presence. A new company website was needed, but what would it look like – and what should it say?</p> <h3>The website project</h3> <p>The aim of the project was to achieve a simple, clearly structured appearance. The company’s website needed to be modern and attractive. The fenaco brand was to be strengthened without weakening the brands of the companies which are part of the association. These were the requirements with which we began the project. We had a clear aim, a short time frame and a modest budget.</p> <p>Agile development methods and an active exchange with customers meant that the project could be implemented within a short period of time and for a relatively small budget. Two POs (product owners) worked to ensure that everything ran smoothly: fenaco’s PO recorded the requirements in the form of tickets in Jira, and Liip’s PO prioritised them in conjunction with the development team. Regular coordination and approval meetings in small groups were held for this purpose.</p> <p>The website was developed using Drupal 8 to ensure a high level of flexibility and a modular structure. The design impresses with its simple colours, taken from fenaco’s corporate design, and its user-oriented menu navigation. The end results speak for themselves! We look forward to seeing how fenaco’s customers and stakeholders react.</p> <h3>Trust over control</h3> <p>We worked together to replace the outdated website and improve fenaco’s visibility without competing with the brands of its members, building on the trust placed in our collaboration and shared goals.</p> <p><em>«Liip delivered on their promises – consulting on equal terms, strong project support, a high level of design skill, and a technically perfect solution – all on budget and on time, in what was a very pleasant collaboration. 
Thank you!»</em><br /> Elias Loretan, Online and Social Media Manager at fenaco </p> <p><em>«A precision landing with minimum effort thanks to a straightforward collaboration between Liip and fenaco»</em> Daniel Frey, Liip PO</p><img src="" height="1" width="1" alt=""/> Progressive web apps, Meteor, Azure and the Data science stack or The future of web development conference. Wed, 09 May 2018 00:00:00 +0200 <h3>Back to the future</h3> <p>Although the conference (hosted last week at the Crowne Plaza in Zürich) explicitly had the word future in its title, I found that the new trends often felt a bit like &quot;back to the future&quot;. Why? Because some rather old concepts like plain old SQL, &quot;offline first&quot; or pure JavaScript frameworks seem to be making a comeback in web development - but with a twist. This already brings us to the first talk. </p> <h3>Modern single page apps with Meteor</h3> <img src="" alt="meteor" format="png"> <p>Timo Horstschaefer from <a href="">Ledgy</a> showed how to create modern single page apps with <a href="">meteor.js</a>. Although every framework promises to &quot;ship more with less code&quot;, he showed that for their project Ledgy - a mobile app to allocate shares among stakeholders - they were actually able to write it in less than 3 months using 13'000 lines of code. In contrast to other web stacks, where the backend is written in one language (e.g. Ruby with Rails, Python with Django, etc.) and combined with a JS-heavy frontend framework (e.g. React or Angular), Meteor does things differently by offering a tightly coupled frontend and backend written purely in JavaScript. The backend is mostly a node component. In their case it is really slim, at only 500 lines of code. It is mainly responsible for data consistency and authentication, while all the other logic simply runs in the client. 
Such client-heavy projects really shine when dealing with shaky Internet connections, because Meteor takes care of all the data transmission in the background and catches up on the changes once connectivity is restored. Although Meteor seems to have had a rough patch in the community in 2015 and 2016, it is heading for a strong comeback. The framework is highly opinionated, but I personally really liked the high abstraction level, which seemed to allow the team a blazingly fast time to market. A quite favorable development is that Meteor is trying to open up beyond MongoDB as a database by offering its own GraphQL client (Apollo), which even outshines Facebook's own client and so offers developers freedom in their choice of a database solution.</p> <p>I highly encourage you to have a look at Timo's <a href="">presentation.</a> </p> <h3>The data science stack</h3> <img src="" alt="datastack" format="png"> <p>Then it was my turn to present the data science stack. I won't bother you with the contents of my talk, since I've already blogged about it in detail <a href="">here</a>. If you still want to have a look at the presentation, you can of course <a href="">download</a> it. In the talk I offered a very subjective bird's-eye view of how the data-centric perspective touches modern web standards. An interesting piece of feedback from the panel was the question of whether such an overview really helps our developers to create better solutions. I personally think that having such maps or collections for orientation helps especially people in junior positions to expand their field of view. I think it might also help senior staff to look beyond their comfort zone and overcome the saying &quot;if all you have is a hammer, every problem looks like a nail&quot; - that is, using the same set of tools for every project. 
Yet I think the biggest benefit might be to offer clients a really unbiased perspective on their options, of which they might have many more than some big vendors would have them believe. </p> <h3>From data science stack to data stack</h3> <img src="" alt="azure" format="png"> <p>Meinrad Weiss from Microsoft offered an interesting glimpse into the Azure universe, showing us the many options for storing data in the Azure cloud. While some facts were indeed surprising - for example, Microsoft being unable to find two data centers in Switzerland that were more than 400 miles apart (apparently the country is too small!) - other facts, like the majority of clients still operating in the SQL paradigm, were less surprising. One thing that really amazed me was their &quot;really big&quot; storage solution, for basically everything beyond 40 petabytes: the data is spread across 60 storage blobs that operate independently of the computational resources, which can be scaled on demand on top of the data layer. In comparison to a classical Hadoop stack, where computation and data are baked into one node, here customers can temporarily scale up their computational power and scale it down again once their computations are finished, saving a bit of money. When it comes to the bill, though, such solutions are not cheap - we are talking about a five-figure monthly entrance price, so not really the typical SME scenario. Have a look at the <a href="">presentation</a> if you want a quick refresher on current options for big data storage at Microsoft Azure. An interesting insight was also that while a lot of different database paradigms have emerged in recent years, Microsoft has managed to include them all (e.g. Gremlin graph, Cassandra, MongoDB) in their database services, unifying their interfaces in one SQL endpoint. 
</p> <h3>Offline First or progressive web apps</h3> <img src="" alt="pwa" format="png"> <p>Nico Martin, a leading web and frontend developer from the <a href="">Say Hello</a> agency, showcased how the web is coming back to mobile again. Coming back? Yes, you heard right. If you thought you had been doing mobile first for many years now, you are right to ask why it is coming back. As it turns out (according to a 2017 comscore report), although people are indeed using their mobiles heavily, they are spending 87% of their time inside apps and not browsing the web - which might be surprising. On the other hand, while apps seem to dominate mobile usage, more than 50% of people don't install any new apps on their phone, simply because they are happy with the ones they have. In fact, they spend 80% of their time in their top 3 apps. That poses a really difficult problem for new apps: how can they get their foot in the door given such highly habitualized behavior? One potential answer might be <a href="">Progressive Web apps</a>, a standard promoted by Google quite a few years ago that seeks to offer highly responsive and fast website behavior that feels almost like an application. To pull this off, the main idea is that a so-called &quot;service worker&quot; - a piece of code that is installed on the mobile and continues running in the background - makes it possible for these web apps to, for example, send notifications to users while they are not using the website. This is something users otherwise know from their classical native apps. Another very simple benefit is that you can install these apps on your home screen, and by tapping them it feels like really using an app and not browsing a website (e.g. there is no browser address bar). 
Finally, the whole website can operate in offline mode too, thanks to a smart caching mechanism that allows developers to decide what to store on the device, in contrast to what the browser cache normally does. If you feel like trying out one of these apps, I highly recommend trying <a href=""></a>, where Google and Twitter sat together and tried to showcase everything that is possible with this new technology. If you are using an Android phone, these apps should work right away, but if you are using an Apple phone, make sure you have at least the most recent update, iOS 11.3, which finally supports progressive web apps on Apple devices. While Apple has slightly opened the door to PWAs, I fear that their lack of support for the major features might have something to do with politics. After all, developers circumventing the App Store and interacting with their customers without an intermediary doesn’t leave much love for Apple's beloved App Store. Have a look at Martin's great <a href="">presentation</a>. </p> <h3>Conclusion</h3> <p>Although the topics were a bit diverse, I definitely enjoyed the conference. A big thanks goes to the organizers of the <a href="">Internet Briefing series</a>, who do an amazing job of organizing these conferences every month. They are definitely a good way to exchange best practices and perhaps learn something new. For me it was the motivation to finally get my hands dirty with progressive web apps, knowing that you don't really need much to make them work. 
</p> <p>As usual I am happy to hear your comments on these topics and I hope that you enjoyed this little summary.</p><img src="" height="1" width="1" alt=""/> LiipImagineBundle 2.0.0 Release Tue, 08 May 2018 00:00:00 +0200 <p><strong>Changelog</strong><br /> <a href=""></a></p> <p><strong>Important changes</strong>:</p> <ul> <li>the minimum required <a href="">PHP</a> version is now 7.1</li> <li>added support for <a href="">Symfony 4</a></li> </ul> <p>The full list of changes can be found at <a href=""></a></p> <p><strong>Overview</strong></p> <p>This bundle provides an image manipulation abstraction toolkit for <a href="">Symfony</a>-based projects.</p> <ul> <li> <p><a href="">Filter Sets</a>:<br /> Using any Symfony-supported configuration language (such as YAML and XML), you can create <em>filter set</em> definitions that specify transformation routines. These definitions include a set of <em><a href="">filters</a></em> and <em><a href="">post-processors</a></em>, as well as other optional parameters.</p> </li> <li> <p><a href="">Filters</a>:<br /> Image transformations are applied using <em>filters</em>. 
A set of <a href="">built-in filters</a> is provided by the bundle,<br /> implementing the most common transformations; examples include <a href="">thumbnail</a>, <a href="">scale</a>, <a href="">crop</a>, <a href="">flip</a>, <a href="">strip</a>, and <a href="">watermark</a>.<br /> For more advanced transformations, you can easily create your own <a href="">custom filters</a>.</p> </li> <li><a href="">Post-Processors</a>:<br /> Modification of the resulting binary image file (created by your <em>filters</em>) is handled by <em>post-processors</em>.<br /> Examples include <a href="">JPEG Optim</a>, <a href="">Moz JPEG</a>, <a href="">Opti PNG</a>, and <a href="">PNG Quant</a>.<br /> Just like filters, you can easily create your own <a href="">custom post-processors</a>.</li> </ul> <p>For more detailed information about the features of this bundle, refer to the <a href="">documentation</a>.</p><img src="" height="1" width="1" alt=""/> Sentiment detection with Keras, word embeddings and LSTM deep learning networks Fri, 04 May 2018 00:00:00 +0200 <h3>Overview SaaS</h3> <p>Sentiment detection has become a bit of a commodity. The big five vendors in particular offer their own sentiment detection as a service. Google offers an <a href="">NLP API</a> with sentiment detection. Microsoft offers sentiment detection through their <a href="">Azure</a> platform. IBM has come up with a solution called <a href="">Tone Analyzer</a>, which tries to get the &quot;tone&quot; of a message and thus goes a bit beyond sentiment detection. Amazon offers a solution called <a href="">Comprehend</a> that runs on AWS. Facebook surprisingly doesn't offer an API or an open source project here, although they are the ones with the user-generated content where people often are not <a href="">so nice</a> to each other. 
Interestingly, they do not offer any assistance for page owners in that specific matter.</p> <p>Beyond the big five there are a few noteworthy companies like <a href="">Aylien</a> and <a href="">Monkeylearn</a> that are worth checking out. </p> <h3>Overview Open Source Solutions</h3> <p>Of course there are open source solutions and libraries that offer sentiment detection too.<br /> Generally all of these tools offer more than just sentiment analysis. Most of the SaaS solutions outlined above, as well as the open source libraries, offer a vast number of different NLP tasks:</p> <ul> <li>part-of-speech tagging (e.g. &quot;going&quot; is a verb), </li> <li>stemming (finding the &quot;root&quot; of a word, e.g. am, are, is -&gt; be), </li> <li>noun phrase extraction (e.g. car is a noun), </li> <li>tokenization (e.g. splitting text into words and sentences), </li> <li>word inflections (e.g. what's the plural of atlas), </li> <li>spelling correction and translation. </li> </ul> <p>I'd like to point you to Python's <a href="">NLTK library</a>, <a href="">TextBlob</a>, <a href="">Pattern</a> or R's <a href="">Text Mining</a> module and Java's <a href="">LingPipe</a> library. Finally, I encourage you to have a look at the latest <a href="">Spacy NLP suite</a>, which doesn't offer sentiment detection per se but has great NLP capabilities. </p> <p>If you are looking for more options, I encourage you to take a look at the full list that I have compiled in our <a href="">data science stack</a>. </p> <h3>Let's get started</h3> <p>So you see, when you need sentiment analysis in your web app or mobile app, you already have a myriad of options to get started. Of course, you might build something yourself if your language is not supported or if you have legal compliance requirements to meet when it comes to data privacy.</p> <p>Let me walk you through all of the steps needed to build a well-working sentiment detection with <a href="">Keras</a> and <a href="">long short-term memory networks</a>. 
Keras is a very popular Python deep learning library, similar to <a href="">TFlearn</a>, that allows you to create neural networks without writing too much boilerplate code. LSTM networks are a special form of network architecture that is especially useful for text tasks, as I am going to explain later. </p> <img src="" alt="keras" format="png"> <h3>Step 1: Get the data</h3> <p>Being a big movie nerd, I have chosen to classify IMDB reviews as positive or negative for this example. As a benefit, the IMDB sample already comes with the Keras <a href="">datasets</a> library, so you don't have to download anything. If you are interested though, not a lot of people know that IMDB offers its <a href="">own datasets</a>, which can be <a href="">downloaded</a> publicly. Among those we are interested in the ones that contain movie reviews, which have been marked by hand as either positive or negative. </p> <pre><code class="language-python"># download the data
from keras.datasets import imdb

top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)</code></pre> <p>The code above does a couple of things at once: </p> <ol> <li>It downloads the data </li> <li>It keeps only the 5,000 most frequent words across the reviews </li> <li>It splits the data into a test and a training set. </li> </ol> <img src="" alt="processed" format="png"> <p>If you look at the data you will realize it has already been pre-processed. All words have been mapped to integers, and the integers represent the words sorted by their frequency. Representing a dataset like this is very common in text analysis. The integer 1 is reserved for the start marker, the integer 2 for an unknown word and 0 for padding; the remaining indices are shifted by this reserved offset, so 4 represents the most frequent word, 5 the second most frequent word and so on... 
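</p> <p>This frequency-ranked integer encoding can be sketched in a few lines of plain Python. Note that this is a toy illustration of the idea, not the actual Keras implementation, and the helper name <code>build_index</code> is made up:</p>

```python
from collections import Counter

def build_index(texts, reserved=4):
    """Map each word to an integer by frequency rank, keeping the
    first few integers free for padding/start/unknown markers."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: rank + reserved
            for rank, (word, _) in enumerate(counts.most_common())}

index = build_index(["a great film", "a dull film"])
# "a" and "film" occur twice, so they receive the smallest free integers:
# {'a': 4, 'film': 5, 'great': 6, 'dull': 7}
```

<p>Frequent words get small integers, which is exactly what makes a cut-off like <code>num_words=5000</code> possible.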
</p> <p>If you want to peek at the reviews yourself and see what people have actually written, you can reverse the process too:</p> <pre><code class="language-python"># reverse lookup
INDEX_FROM = 3  # word index offset used by imdb.load_data
word_to_id = imdb.get_word_index()
word_to_id = {k: (v + INDEX_FROM) for k, v in word_to_id.items()}
word_to_id["&lt;PAD&gt;"] = 0
word_to_id["&lt;START&gt;"] = 1
word_to_id["&lt;UNK&gt;"] = 2
id_to_word = {value: key for key, value in word_to_id.items()}
print(' '.join(id_to_word[id] for id in X_train[0]))</code></pre> <p>The output might look something like this:</p> <pre><code class="language-python">&lt;START&gt; this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert &lt;UNK&gt; is an amazing actor and now the same being director &lt;UNK&gt; father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for &lt;UNK&gt; and would recommend it to everyone to watch and the fly &lt;UNK&gt; was amazing really cried at the end it was so sad and you know w</code></pre> <h3>One-hot encoder</h3> <p>If you want to do the same with your own text (in my example, some short work reviews), you can use Keras' built-in &quot;one-hot&quot; encoder feature, which allows you to encode your documents with integers. The method is quite useful, since it will remove extra marks (e.g. !&quot;#$%&amp;...), split sentences into words by spaces and transform the words into lowercase. 
</p> <pre><code class="language-python"># one-hot encode your documents
from keras.preprocessing.text import one_hot

# short German work reviews, e.g. 'Gut gemacht' = 'well done',
# 'Schwache arbeit' = 'weak work'
docs = ['Gut gemacht',
        'Gute arbeit',
        'Super idee',
        'Perfekt erledigt',
        'exzellent',
        'naja',
        'Schwache arbeit.',
        'Nicht gut',
        'Miese arbeit.',
        'Hätte es besser machen können.']

# integer encode the documents
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)</code></pre> <p>Although the encoding will not be sorted as in our example before (i.e. lower numbers representing more frequent words), this will still give you a similar output:</p> <pre><code>[[18, 6], [35, 39], [49, 46], [41, 39], [25], [16], [11, 39], [6, 18], [21, 39], [15, 23, 19, 41, 25]]</code></pre> <h3>Step 2: Preprocess the data</h3> <p>Since the reviews differ heavily in length, we want to trim each review to a length of 500 words. We need text samples of the same length in order to feed them into our neural network. If reviews are shorter than 500 words, we will pad them with zeros. Keras, being super nice, offers a set of <a href="">preprocessing</a> routines that can do this for us easily. </p> <pre><code class="language-python"># Truncate and pad the review sequences
from keras.preprocessing import sequence

max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)</code></pre> <img src="" alt="padded" format="png"> <p>As you can see above (I've just output the padded array as a pandas dataframe for visibility), a lot of the reviews are padded with 0 at the front, which means that those reviews are shorter than 500 words. </p> <h3>Step 3: Build the model</h3> <p>Surprisingly, we are already done with the data preparation and can start to build our model. 
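</p> <p>As an aside, what the <code>pad_sequences</code> call above does can be sketched in plain Python. This is a simplified re-implementation for illustration (the helper name <code>pad_to_length</code> is made up), mirroring Keras' default behaviour of padding and truncating at the front:</p>

```python
def pad_to_length(seqs, maxlen, value=0):
    """Left-pad short sequences with `value` and keep the last
    `maxlen` items of long ones (Keras' default 'pre' mode)."""
    padded = []
    for seq in seqs:
        seq = list(seq)[-maxlen:]  # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

print(pad_to_length([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4))
# → [[0, 0, 5, 6], [3, 4, 5, 6]]
```

<p>The real <code>pad_sequences</code> additionally lets you switch to padding or truncating at the end via its <code>padding</code> and <code>truncating</code> arguments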
</p> <pre><code class="language-python"># Build the model
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Flatten  # Flatten is used further below

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())</code></pre> <p>The two most important things in our code are the following:</p> <ol> <li>The Embedding layer and </li> <li>The LSTM layer. </li> </ol> <p>Let's cover what both are doing. </p> <h3>Word embeddings</h3> <p>The embedding layer will learn a word embedding for all the words in the dataset. It has three arguments: the input dimension, which is the size of our vocabulary (top_words); the output dimension, i.e. the vector space in which the words will be embedded, in our case 32 dimensions, so a vector of length 32 holds each word's coordinates; and the input length, which is the 500 words of each padded review. </p> <p>There are also pre-trained word embeddings (e.g. GloVe or <a href="">Word2Vec</a>) that you can <a href="">download</a> so that you don't have to train your embeddings all by yourself. These word embeddings are based on specialized algorithms that each do the embedding a bit differently, but we won't cover that here. </p> <p>How can you imagine what an embedding actually is? Generally, words that have a similar meaning in context should be embedded next to each other. Below is an example of word embeddings in a two-dimensional space:</p> <img src="" alt="embeddings" format="png"> <p>Why should we even care about word embeddings? Because it is a really useful trick. If we were to feed our reviews into a neural network and just one-hot encode them we would have very sparse representations of our texts. Why?
Let us have a look at the sentence &quot;I do my job&quot; in a &quot;bag of words&quot; representation with a vocabulary of 1000: a vector that holds 1000 words (each column is one word) has four ones in it (one for <strong>I</strong>, one for <strong>do</strong>, one for <strong>my</strong> and one for <strong>job</strong>) and 996 zeros. So it would be very sparse. This means that learning from it would be difficult, because we would need 1000 input neurons, each representing the occurrence of one word in our sentence. </p> <p>In contrast, if we use a word embedding we can fold these 1000 words into just as many dimensions as we want, in our case 32. This means that we just have an input vector of 32 values instead of 1000. So the word &quot;I&quot; would be some vector with values (0.4, 0.5, 0.2, ...) and the same would happen with the other words. With a word embedding like this, we just need 32 input neurons. </p> <h3>LSTMs</h3> <p>Recurrent neural networks are networks that are used for &quot;things&quot; that happen recurrently, so one thing after the other (e.g. time series, but also words). Long Short-Term Memory networks (LSTM) are a specific type of Recurrent Neural Network (RNN) that are capable of learning the relationships between elements in an input sequence. In our case the elements are words. So our next layer is an LSTM layer with 100 memory units.</p> <p>LSTM networks maintain a state, and so overcome the vanishing gradient problem of plain recurrent neural networks (basically the problem that when you make a network deep enough, the information for learning will &quot;vanish&quot; at some point). I do not want to go into detail on how they actually work, but the post linked <a href="">here</a> delivers a great visual explanation. Below is a schematic overview of the building blocks of LSTMs.</p> <p>So the output of the embedding layer is a 500 by 32 matrix. Each word is represented through its position in those 32 dimensions.
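</p> <p>The sparse-versus-dense contrast from the embedding section can be sketched in plain numpy. The vocabulary size, the word ids and the random embedding matrix below are made-up illustrations, not values a trained model would produce:</p>

```python
import numpy as np

vocab_size, embed_dim = 1000, 32
sentence_ids = [12, 47, 301, 5]  # made-up ids for "I do my job"

# Bag of words: a 1000-wide vector with four ones and 996 zeros
bow = np.zeros(vocab_size)
bow[sentence_ids] = 1
print(int(bow.sum()), bow.size)  # 4 1000

# Embedding lookup: every id becomes one dense 32-dimensional row
embedding_matrix = np.random.rand(vocab_size, embed_dim)
dense = embedding_matrix[sentence_ids]
print(dense.shape)  # (4, 32)
```

<p>The bag-of-words vector stays 1000 entries wide no matter how short the sentence is, while the embedding lookup hands the network only a handful of dense 32-dimensional vectors.</p> <p>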
And the sequence is the 500 words that we feed into the LSTM network. </p> <p>Finally, at the end we have a dense layer with one node with a sigmoid activation as the output. </p> <p>Since all we need is the binary decision whether a review is positive or negative, we will use binary_crossentropy as the loss function. The optimizer is the standard one (adam) and the metric is the standard accuracy metric. </p> <p>By the way, if you want you can build a sentiment analysis without LSTMs; then you simply need to replace the LSTM by a flatten layer:</p> <pre><code class="language-python"># Replace the LSTM by a flatten layer
# model.add(LSTM(100))
model.add(Flatten())</code></pre> <h3>Step 4: Train the model</h3> <p>After defining the model Keras gives us a summary of what we have built. It looks like this:</p> <pre><code># Summary from Keras
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101
=================================================================
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None</code></pre> <p>To train the model we simply call the fit function, supply it with the training data and also tell it which data it can use for validation. That is really useful because we have everything in one call. </p> <pre><code class="language-python"># Train the model, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)</code></pre> <p>The training of the model might take a while, especially when you are only running it on the CPU instead of the GPU.
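</p> <p>As a sanity check, the parameter counts in the Keras summary above can be derived by hand. Assuming top_words = 5000 (an assumption that matches the 160,000 embedding weights shown):</p>

```python
# Re-derive the parameter counts from the model summary by hand
top_words, embed_dim, lstm_units = 5000, 32, 100

# one 32-dimensional vector per vocabulary word
embedding_params = top_words * embed_dim
# an LSTM has 4 gates; each gate has weights for the input,
# weights for the recurrent state and one bias per unit
lstm_params = 4 * ((embed_dim + lstm_units + 1) * lstm_units)
# 100 weights plus 1 bias for the single output node
dense_params = lstm_units + 1

print(embedding_params, lstm_params, dense_params)  # 160000 53200 101
```

<p>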
While the model is training, what you want to observe is the loss: it should be going down steadily, which shows that the model is improving. We will make the model see the dataset 3 times, defined by the epochs parameter. The batch size defines how many samples the model will see at once - in our case 64 reviews. </p> <img src="" alt="training" format="png"> <p>To observe the training you can fire up TensorBoard, which will run in the browser and give you a lot of different analytics, especially the loss curve in real time. To do so type in your console:</p> <pre><code class="language-bash">sudo tensorboard --logdir=/tmp</code></pre> <h3>Step 5: Test the model</h3> <p>Once we have finished training the model we can easily test its accuracy. Keras provides a very handy function to do that:</p> <pre><code class="language-python"># Evaluate the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))</code></pre> <p>In our case the model achieved an accuracy of around 90%, which is excellent given the difficult task. By the way, if you are wondering what the results would have been with the Flatten layer: also around 90%. So in this case I would apply <a href="">Occam's razor</a> and, when in doubt, go with the simpler model.</p> <h3>Step 6: Predict something</h3> <p>Of course in the end we want to use our model in an application, so we want to use it to create predictions. In order to do so we need to translate our sentence into the corresponding word integers and then pad it to match our data.
We can then feed it into our model and see how it thinks we liked or disliked the movie.</p> <pre><code class="language-python"># Predict sentiment from reviews
bad = "this movie was terrible and bad"
good = "i really liked the movie and had fun"
for review in [good, bad]:
    tmp = []
    for word in review.split(" "):
        # words missing from the index map to &lt;UNK&gt;
        tmp.append(word_to_id.get(word, word_to_id["&lt;UNK&gt;"]))
    tmp_padded = sequence.pad_sequences([tmp], maxlen=max_review_length)
    print("%s. Sentiment: %s" % (review, model.predict(tmp_padded)[0][0]))

# i really liked the movie and had fun. Sentiment: 0.715537
# this movie was terrible and bad. Sentiment: 0.0353295</code></pre> <p>In this case a value close to 0 means the sentiment was negative and a value close to 1 means it's a positive review. You can also use &quot;model.predict_classes&quot; to just get the two classes, positive and negative. </p> <h3>Conclusion or what’s next?</h3> <p>So we have built quite a cool sentiment analysis for IMDB reviews that predicts whether a movie review is positive or negative with 90% accuracy. With this we are already <a href="">quite close</a> to industry standards. This means that in comparison to a <a href="">quick prototype</a> that a colleague of mine built a few years ago, we could potentially improve on it now. The big benefit of our self-built solution compared with a SaaS solution on the market is that we own our data and model. We can now deploy this model on our own infrastructure and use it as often as we like. Google or Amazon never get to see sensitive customer data, which might be relevant for certain business cases. We can train it with German or even Swiss German language, given that we find a nice dataset, or simply build one ourselves. </p> <p>As always I am looking forward to your comments and insights!
As usual you can download the IPython notebook with the code <a href="">here</a>.</p><img src="" height="1" width="1" alt=""/> One.Thing.Less AG entrusts Liip with the launch of its mobile app Wed, 02 May 2018 00:00:00 +0200 <p>One.Thing.Less AG will empower and enable individuals around the world to regain control over the use of their personal data in an easy and secure way. We currently help the startup to craft its mobile app and platform so it’s ready for the launch day on the 25th of May 2018.</p> <h2>From idea to product</h2> <p>This product idea comes from James Aschberger, Founder &amp; CEO of One.Thing.Less AG, who saw the opportunity in the new <a href="">GDPR regulation</a> to bring back balance in the personal data “crisis” that our world currently faces. Their mobile app will allow you — in one tap — to ask for visibility on what companies such as Facebook, Google or Starbucks have and do with your data. More importantly, once they have answered, you will be able to act and request changes on how they deal with your personal data. The goal of the product is that you are in control of the use of your personal data.<br /> I loved hearing such a pitch, and so did my mobile teammates. We hence kickstarted our collaboration back in January.</p> <img src="" alt="One.Thing.Less-Liip team during a User Experience workshop"> <p><em>One.Thing.Less-Liip team during a User Experience workshop.</em></p> <h2>Finding a strong mobile partner, with the right (startup) mindset</h2> <p>One.Thing.Less’ challenge was their lack of an internal technical team.
We solved this by setting up a cross-functional team tailored to their needs — composed of mobile and backend developers, as well as designers.</p> <p>On top of this, and that’s maybe the most critical point of all, we answered James’ need to find a trustworthy team with people having the right startup mindset:</p> <p><em>“Once our idea was clear, the biggest challenge was to identify the right UX and technical team to bring it to life. Being a small startup, the chemistry needs to be right and the team mindset needs to be diverse. After our first meeting with Liip I had a strong feeling about the team and their know-how. It was after the second meeting that we were totally convinced that they not only have the technical expertise but also the right spirit and determination to bring this idea into a tangible product.”</em> — James Aschberger, Founder &amp; CEO of One.Thing.Less AG</p> <p>We at Liip are all entrepreneurs, and the way we are <a href="">self-organized</a> allows us to act as such on a daily basis. This reassured and convinced the CEO of One.Thing.Less that Liip was the right choice.<br /> We have only been working together for three months, but we already feel like one unique team, with one common goal — to launch a product that will impact lives positively.</p> <img src="" alt="Development of the One.Thing.Less mobile app and platform"> <p><em>Development of the One.Thing.Less mobile app and platform.</em></p> <h2>Data, public or private, should be controlled by those it belongs to</h2> <p>If you have been following us for a while, you surely know that the data topic is part of our DNA. From <a href="">API</a>s to allow interoperability of IT systems, to <a href="">Open Data</a> to give back transparency to citizens, we have always been involved in such domains and will remain so for the long run. This collaboration with One.Thing.Less confirms it.</p> <p>We can’t wait to put this product into your hands on the 25th of May.
Stay tuned and sign up for a launch notification at <a href=""></a></p><img src="" height="1" width="1" alt=""/> Tensorflow and TFlearn or can deep learning predict if DiCaprio could have survived the Titanic? Wed, 25 Apr 2018 00:00:00 +0200 <p>Getting your foot into deep learning might feel weird, since there is so much going on at the same time. </p> <ul> <li>First, there are myriads of frameworks like <a href="">tensorflow</a>, <a href="">caffe2</a>, <a href="">torch</a>, <a href="">theano</a> and <a href="">Microsoft's open source deep learning toolkit CNTK</a>. </li> <li>Second, there are dozens of different ideas about how networks can be put to work, e.g. <a href="">recurrent neural networks</a>, <a href="">long short-term memory networks</a>, <a href="">generative adversarial networks</a> and <a href="">convolutional neural networks</a>. </li> <li>And then finally there are even more frameworks on top of these frameworks, such as <a href="">keras</a> and <a href="">tflearn</a>. </li> </ul> <p>In this blogpost I thought I'd just take the two subjectively most popular choices, Tensorflow and Tflearn, and show you how they work together. We won't put much emphasis on the network layout; it's going to be a plain vanilla fully connected network with two hidden layers. </p> <p>Tensorflow is the low-level library for deep learning. If you want you could use just this library, but then you need to write way more boilerplate code. Since I am not such a big fan of boilerplate (hi Java) we are not going to do this. Instead we will use Tflearn. Tflearn used to be an independent open-source library that provides an abstraction on top of Tensorflow. Last year Google integrated that project very tightly with Tensorflow to make the learning curve less steep and the handling more convenient. </p> <p>I will use both to predict the survival rate in a commonly known data set called the <a href="">Titanic Dataset</a>.
Beyond this dataset there are of course <a href="">myriads</a> of such classical sets, of which the most popular is the <a href="">Iris dataset</a>. It is handy to know these datasets, since a lot of tutorials are built around them, so when you are trying to figure out how something works you can google for those and the method you are trying to apply. A bit like the hello world of programming. Finally, since these sets are well studied, you can try the methods shown in the blogposts on other datasets and compare your results with others. But let’s focus on our Titanic dataset first.</p> <h3>Goal: Predict survivors on the Titanic</h3> <p>Being the most famous shipwreck in history, the Titanic sank after colliding with an iceberg on 15 April 1912. Of the 2224 passengers, around 1502 died, because there were not enough lifeboats for the passengers and the crew. </p> <img src="" alt="Titanic"> <p>Now we could say it was sheer luck who survived and who sank, but we could also be a bit more provocative and say that some groups of people were more likely to survive than others, such as women, children or ... the upper class. </p> <p>Now making such crazy assumptions about the upper class is not worth a dime if we cannot back them up with data. In our case, instead of doing boring descriptive statistics, we will train a machine learning model with Tensorflow and Tflearn that will predict survival rates for Leo DiCaprio and Kate Winslet for us. </p> <h3>Step 0: Prerequisites</h3> <p>To follow along in this tutorial you will obviously need the titanic data set (which can be automatically downloaded by Tflearn) and both a working Tensorflow and Tflearn installation. <a href="">Here</a> is a good tutorial on how to install both. Here is a quick recipe on how to install both on the mac, although it surely will run out of date soon (e.g.
new versions etc.):</p> <pre><code class="language-bash">sudo pip3 install\=OPT,TF_BUILD_IS_PIP\=PIP,TF_BUILD_PYTHON_VERSION\=PYTHON3,label\=mac-slave/lastSuccessfulBuild/artifact/pip_test/whl/tf_nightly-1.head-py3-none-any.whl</code></pre> <p>If you happen to have a macbook with an NVIDIA graphics card, you can also install Tensorflow with GPU support (your computations will then run in parallel on the graphics card, which is much faster). Before attempting this, please check your graphics card under &quot;About This Mac&quot; first. The chances that your macbook has one are slim.</p> <pre><code class="language-bash">sudo pip3 install</code></pre> <p>Finally install TFlearn - in this case the bleeding edge version:</p> <pre><code class="language-bash">sudo pip3 install git+</code></pre> <p>If you are having problems with the install, <a href="">here</a> is a good troubleshooting page to sort you out. </p> <p>To get started in an IPython notebook or a python file we need to load all the necessary libraries first. We will use numpy and pandas to make our life a bit easier and sklearn to split our dataset into a train and test set. Finally we will obviously also need tflearn and the datasets. </p> <pre><code class="language-python"># Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import tflearn
from tflearn.data_utils import load_csv
from tflearn.datasets import titanic</code></pre> <h3>Step 1: Load the data</h3> <p>The Titanic dataset is stored in a CSV file. Since this toy dataset comes with TFLearn, we can use the TFLearn load_csv() function to load the data from the CSV file into a python list. By specifying the 'target_column' we indicate that the labels - so the thing we try to predict (survived or not) - are located in the first column. We then store our data in a pandas dataframe to inspect it more easily (e.g. df.head()), and then split it into a train and test dataset.
</p> <pre><code class="language-python"># Download the Titanic dataset
titanic.download_dataset('titanic_dataset.csv')

# Load CSV file, indicate that the first column represents labels
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)

# Make a df out of it for convenience
df = pd.DataFrame(data)

# Do a test / train split
X_train, X_test, y_train, y_test = train_test_split(df, labels, test_size=0.33, random_state=42)
X_train.head()</code></pre> <img src="" alt="dataframe" format="png"> <p>Studying the data frame you also see that we have some information about each passenger. In this case I took a look at the first entry:</p> <ul> <li>passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)</li> <li>name (e.g. Allen, Miss. Elisabeth Walton)</li> <li>gender (e.g. female/male)</li> <li>age (e.g. 29)</li> <li>number of siblings/spouses aboard (e.g. 0)</li> <li>number of parents/children aboard (e.g. 0)</li> <li>ticket number (e.g. 24160) and</li> <li>passenger fare (e.g. 211.3375)</li> </ul> <h3>Step 2: Transform</h3> <p>Since the ticket number is a string, we could encode it as a category. But since we don’t know which ticket numbers Leo and Kate had, let’s just remove it as a feature. Similarly, the name of a passenger as a simple string is not going to be relevant either without preprocessing. To keep things short in this tutorial, we are simply going to remove both columns. We also want to dichotomize or <a href="">label-encode</a> the gender of each passenger, mapping male to 1 and female to 0. Finally we want to transform the data frame back into a numpy float32 array, because that's what our network expects.
To achieve those things I wrote a small function that works on a pandas dataframe and does exactly that:</p> <pre><code class="language-python"># Transform the data
def preprocess(r):
    # drop the name (column 1) and the ticket number (column 6)
    r = r.drop([1, 6], axis=1, errors='ignore')
    # label-encode the gender: female = 0, male = 1
    r[2] = r[2].astype('category')
    r[2] = r[2]
    for column in r.columns:
        r[column] = r[column].astype(np.float32)
    return r.values

X_train = preprocess(X_train)
pd.DataFrame(X_train).head()</code></pre> <p>We see that after the transformation the gender in the data frame is encoded as zeros and ones. </p> <img src="" alt="transformed" format="png"> <h3>Step 3: Build the network</h3> <p>Now we can finally build our deep learning network which is going to learn the data. First of all, we specify the shape of our input data. Each input sample has a total of 6 features, and we will process samples in batches to save memory. The None parameter means an unknown dimension, so we can change the total number of samples that are processed in a batch. Our data input shape is therefore [None, 6]. Finally, we build a three-layer neural network with this simple sequence of statements. </p> <pre><code class="language-python">net = tflearn.input_data(shape=[None, 6])
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 32)
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net)</code></pre> <p>If you want to visualize this network you can use Tensorboard to do so, although there will be not much to see (see below). Tensorflow won't draw all the nodes and edges but rather abstracts whole layers as one box. To have a look at it you need to start it in your console and it will then become available on <a href="http://localhost:6006">http://localhost:6006</a> . Make sure to use a chrome browser when you are looking at the graphs; safari crashed for me.
</p> <img src="" alt="tensorboard" format="png"> <pre><code class="language-bash">sudo tensorboard --logdir=/tmp</code></pre> <p>What we basically have are 6 nodes, which are our inputs. These inputs are then connected to 32 nodes, which are then all fully connected to another 32 nodes, which are then connected to our 2 output nodes: one for survival, the other for death. The activation function <a href="">softmax</a> is the way to define when a node &quot;fires&quot;. It is one option among others like <a href="">sigmoid</a> or <a href="">relu</a>. Below you see a schematic I drew with graphviz based on a dot file, which you can download <a href="">here</a>. Instead of 32 nodes in the hidden layers I just drew 8, but you hopefully get the idea. </p> <img src="" alt="Graph" format="png"> <h3>Step 4: Train it</h3> <p>TFLearn provides a wrapper called Deep Neural Network (DNN) that automatically performs neural network classifier tasks, such as training, prediction, save/restore, and more. I think this is pretty handy. We will run it for 20 epochs, which means that the network will see all the data 20 times, with a batch size of 32, which means that it will take in 32 samples at once. We will create one model without <a href="">cross validation</a> and one with it to see which one performs better. </p> <pre><code class="language-python"># Define model
model = tflearn.DNN(net)
# Start training (apply gradient descent algorithm), y_train, n_epoch=20, batch_size=32, show_metric=True)

# With cross validation if you want
model2 = tflearn.DNN(net), labels, n_epoch=10, batch_size=16, show_metric=True, validation_set=0.1)</code></pre> <h3>Step 5: Evaluate it</h3> <p>Well, finally we've got our model and can now see how well it really performs.
This is easy to do with:</p> <pre><code class="language-python"># Evaluation
X_test = preprocess(X_test)
metric_train = model.evaluate(X_train, y_train)
metric_test = model.evaluate(X_test, y_test)
metric_train_1 = model2.evaluate(X_train, y_train)
metric_test_1 = model2.evaluate(X_test, y_test)
print('Model 1 Accuracy on train set: %.9f' % metric_train[0])
print("Model 1 Accuracy on test set: %.9f" % metric_test[0])
print('Model 2 Accuracy on train set: %.9f' % metric_train_1[0])
print("Model 2 Accuracy on test set: %.9f" % metric_test_1[0])</code></pre> <p>The output gave me very similar results on the train set (0.78) and the test set (0.77) for both the normal and the cross-validated model. So for this small example the cross validation does not really seem to make a difference. Both models do a fairly good job at predicting the survival rate of the Titanic passengers. </p> <h3>Step 6: Use it to predict</h3> <p>We can finally see what Leonardo DiCaprio's (Jack) and Kate Winslet's (Rose) survival chances really were when they boarded that ship. To do so I modeled both by their attributes. So for example Jack boarded third class (today called the economy class), was male, 19 years old, had no siblings or parents on board and paid only 5$ for his passenger fare. Rose of course traveled first class, was female, 17 years old, had a sibling and two parents on board and paid 100$ for her ticket.
</p> <pre><code class="language-python"># Let's create some data for DiCaprio and Winslet
dicaprio = [3, 'Jack Dawson', 'male', 19, 0, 0, 'N/A', 5.0000]
winslet = [1, 'Rose DeWitt Bukater', 'female', 17, 1, 2, 'N/A', 100.0000]
# Preprocess the data (our function from above expects a dataframe)
passengers = preprocess(pd.DataFrame([dicaprio, winslet]))
# Predict surviving chances (class 1 results)
pred = model.predict(passengers)
print("DiCaprio Surviving Rate:", pred[0][1])
print("Winslet Surviving Rate:", pred[1][1])</code></pre> <p>The output gives us:</p> <ul> <li><strong>DiCaprio Surviving Rate: 0.128768</strong></li> <li><strong>Winslet Surviving Rate: 0.903721</strong></li> </ul> <img src="" alt="Rigged game"> <h3>Conclusion, what's next?</h3> <p>So after all we know it was a rigged game. Given his background, DiCaprio really had low chances of surviving this disaster. While we didn't really learn anything new about the outcome of the movie, I hope that you enjoyed this quick intro into Tensorflow and Tflearn, which are not really hard to get into and don't have to end in a disaster. </p> <p>In our example there was really no need to pull out the big guns; a simple regression or any other machine learning method would have worked fine. Tensorflow, TFlearn or Keras really shine though when it comes to image, text and audio recognition tasks. With the very popular Keras library we are able to reduce the boilerplate for these tasks even more, which I will cover in one of the future blog posts. In the meantime I encourage you to play around with neural networks in your browser with these <a href="">excellent</a> <a href="">examples</a>, and I am looking forward to your comments and hope that you enjoyed this little fun blog post.
If you want you can download the IPython notebook for this example <a href="">here</a>.</p><img src="" height="1" width="1" alt=""/> Neo4j graph database and GraphQL: A perfect match Tue, 24 Apr 2018 00:00:00 +0200 <p>As a graph database enthusiast I often got asked the question “So, as one of our graph experts, can you answer this GraphQL question?”. </p> <p>Every time I have said that a graph database is not the same as GraphQL, and that I was unable to answer the question. </p> <p>Another colleague of mine, Xavier, had the same experience. So we decided to learn about GraphQL so that we could answer those questions, and because it seemed like a really cool way to write APIs. </p> <p>So before we dig into this any further, a quick overview of what a graph database is and what GraphQL is: </p> <ul> <li>A graph database is a database storing data in a graph, which is super performant when it comes to data which has relations to other data - and that describes a lot of our modern complex data sets.</li> <li>GraphQL is a way to query data. A flexible API protocol that lets you query just the data that you need and get exactly that in return. Any storage can be behind this data; GraphQL is <em>not</em> responsible for storage. </li> </ul> <p>On our innovation day, Emanuele and I decided to try out Neo4j with the GraphQL extension. </p> <p>Emanuele had never tried Neo4j before, and was amazed at how easy it is to get going. GraphQL was also super easy to install. Basically what you do is: </p> <ol> <li>Download Neo4j desktop</li> <li>Go through the very obvious setup</li> <li>Click “install” on the GraphQL plugin, which is already listed</li> <li>Use “browser” to populate data</li> <li>Query the data with GraphQL, which automatically, thanks to the plugin, has a schema for you</li> </ol> <p>With this technology we aim to solve a problem that we currently have. The problem is one of our REST APIs which has a LOT of data.
When a consumer queries for something that they need, they will also get a lot of things that they don’t need but that another consumer may need. We end up wasting bandwidth.</p> <p>How does GraphQL solve this? And where does Neo4j come in?<br /> In GraphQL we query only for the data that we need, making a POST request. A request may look like: </p> <pre><code>{
  Product {
    title
    product_id
  }
}</code></pre> <p>If we have a graph database with a single node with the label “Product” and the properties “title” and “product_id”, we would get this behaviour out of the box with the GraphQL plugin. Tada! A simple API is created; all you had to do was add the data! DONE. </p> <p>This makes it very easy for consumers to query JUST for the data that they need. They can also see the entire schema, where we can write descriptions in the form of comments, so that the consumer knows what to do with it.</p> <p>How does Neo4j work, and how do you add data? It is super simple! With Neo4j desktop, you click “manage” on your database and then “Open Browser”; in browser you can browse your data and execute queries in a graphical way.</p> <img src="" alt="desktop" format="png"> <img src="" alt="browser" format="png"> <p>You can use browser to click through and start learning; it is also a great tool for experimenting, so for learning Neo4j I leave you with browser for your own hands-on experience. Have fun playing with it! They explain it really well.</p> <img src="" alt="browser-learning" format="png"> <p>Ok, so simple graphs are very easy to query for, but how do we query for relationships with GraphQL and Neo4j? Let’s say we have a node called “Product” again, but the node does not have the “title” directly on the product; it is in a node labeled “TranslatedProductProperties”. This node holds all the translated properties that the product may need, such as the title. The relation is called “in_language”. The relation itself has a property called “language”.
This is what the actual graph looks like in browser:</p> <img src="" alt="graph" format="png"> <p>In GraphQL we would need a custom IDL (schema) for that to be queried nicely. Our goal would be to query it the following way: we simply ask for the title in a given language, and the consumer doesn’t need to know about the actual relations: </p> <pre><code>{
  Product {
    product_id
    title(language: "de")
  }
}</code></pre> <p>A simple IDL for that could look like: </p> <pre><code>type Product {
  product_id: String
  # The title in your provided language =D
  title(language: String = "de"): String @cypher(statement: "MATCH(this)-[:in_language {language:$language}]-(pp:ProductProperties) return pp.title")
}</code></pre> <p>So here you can see that, with the help of the GraphQL plugin, we could write a cypher query to get whatever data we want for that field. Cypher is the query language of Neo4j.</p> <p>All of this together - being able to get any nodes and relationships, which stay decoupled unless you choose to couple them, plus the querying flexibility - makes Neo4j + GraphQL an API as flexible as... rubber? An awesome match!</p><img src="" height="1" width="1" alt=""/>
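<p>As a footnote, hitting such a GraphQL endpoint from code is just a POST with a JSON body. A minimal sketch in Python (the endpoint URL is an assumption based on a default local Neo4j install with the GraphQL plugin; check yours, and add authentication as needed):</p>

```python
import json
import urllib.request

# The Product query from above, asking only for the fields we need
query = """
{
  Product {
    product_id
    title(language: "de")
  }
}
"""

payload = json.dumps({"query": query}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:7474/graphql/",  # assumed plugin endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Uncomment to actually send the request against your local instance:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(
print(payload.decode("utf-8"))
```

<p>Any HTTP client works the same way; the contract is simply a JSON body with a &quot;query&quot; field.</p>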