About The Conference
What is DataMass?
It’s not just another conference. It’s an event created with passion – and targeted at those who use Big Data in practice in their everyday work. The main idea behind the conference is to promote knowledge and experience in designing and implementing tools for the analysis of big data volumes.
We connect the people who care
We believe that building a community of dedicated experts will help us share knowledge and exchange experience in shaping scalable and distributed computing solutions.
We value real-life expertise – is there any better way to learn than from actual specialists? Our speakers, true practitioners from top data-driven companies, will show you what they have learned and discovered about the Big Data world.
Answering your needs
Playing an active role in the Big Data world, we could see that Poland needed a big technical conference focused on the exchange of knowledge and experience in this field. The aim of DataMass is to create a synergy between businesses focused on creating and implementing Enterprise-class solutions and the experience and knowledge of the academic environment. The Data Science meetup in Gdańsk is tangible proof that this is possible.
Hadoop Application Development
Creation of distributed computing solutions using the Hadoop framework. Architecture, implementation and testing of software based on a cluster or a distributed file system. Creation of ETL processes and reporting systems.
Hadoop in R&D centers
Cluster solutions in R&D centers. One example is the use of the platform for computing in grant-funded activities at technical universities. We show how a cluster may be used for advanced computing and modelling.
Hadoop in business
Business applications of Big Data tools. Typical use-case studies showing how Big Data is winning markets all over the world. Pros and cons of solutions in terms of the effort and costs related to the implementation and maintenance of such environments. A comparison of commercial vs. open-source technologies.
Real-time processing of data flowing into a cluster. This form of data analysis enables an immediate response to any generated activity: not only activity generated by users themselves, but also by devices communicating with each other within the Internet of Things (IoT).
Data Science, Analytics and Reporting
Data analysis systems using machine learning and artificial intelligence. Creation and optimisation of analytically sophisticated solutions. Methods of modelling and verifying developed solutions.
Management and administration of clusters based on Big Data technologies. Here we focus on issues related to the installation and maintenance of advanced cluster solutions (installation, configuration, decomposition and updates of the cluster). This track also touches upon methods of automating maintenance processes within the cluster.
Jakub Nowacki, PhD
Data Scientist and Engineer at Sotrender
Jakub is a University of Bristol graduate, where he obtained a PhD in Engineering Mathematics. On a daily basis he uses his analytical and development skills working as a Data Scientist and Engineer. He is mostly interested in distributed processing and analysis of big data sets, and in machine learning. Jakub originally has a C/C++ background but currently works mostly in the JVM and Python worlds.
MVP AI, CEO@TIDK
Data enthusiast, architect and designer of BI and Big Data solutions. Together with the team at TIDK – Data Scientist as a Service, of which he is president – he implements projects related to advanced analytics and data science, in particular machine learning and artificial intelligence. Since 2010 he has been awarded the Microsoft MVP title in the Data Platform category, and since July 2018 he has been one of 50 MVPs in the world in the AI category. Scientifically connected with the Faculty of Computer Science at the Poznan University of Technology. Member of the board of PTI Wielkopolska and the Polish Society of Artificial Intelligence.
Data scientist at ClickMeeting
Dawid specialises in analysing and predicting sport results; currently his biggest emphasis is on gathering and predicting speedway results. Personally an NBA fan and former long-distance runner. Active member of the R community and author of two R packages, “sport” and “runner”, available on CRAN.
Sales Engineer at Cloudera
Balázs Gáspár is a Sales Engineer at Cloudera covering the countries of Central and Eastern Europe, helping customers identify use cases and find the right technical solution to turn their data into business value. Balázs has eight years of international presales experience in enterprise IT solutions.
Software Engineer at Intel
Maciej Godek is a philosopher of programming languages. He spends most of his evenings hacking in Scheme and answering programming questions on [Quora](https://www.quora.com/profile/Panicz-Godek).
You can also catch him on Twitter: [@PaniczGodek](https://twitter.com/PaniczGodek).
Lead Developer at 7N/Roche
Lead software developer and researcher studying the field of business management using Cool Stuff (a.k.a. machine learning, simulated environments, etc.). Member of the Polish Association of Economists. Loves travelling with a backpack, collecting adventures and creating things. Does not like people who think there is only one good programming language in the world.
Cyber Security Manager at Atos
Manager, Architect and Transformer with almost 15 years of experience in various global IT and Security roles spanning operations, transitions, design, strategy and continual improvement. With DevOps principles at heart, he is constantly focused on creating value for clients.
In his current role he is responsible for the end-to-end transformation of all cybersecurity services for a big global client, including advanced security analytics based on big data technologies, also known as prescriptive security or the Prescriptive SOC.
Data Scientist at Atos
Ewa Tułodziecka is a Data Scientist in Atos where she is currently working on building prescriptive security and user behavior analytics capabilities. Prior to Atos she was engaged as a Data Scientist developing real-time bidding and optimization algorithms for a performance marketing agency. She holds an MSc in Business Information Management from Erasmus University Rotterdam.
Technical Product Manager at Relativity
Sebastian Kacza is a senior technical product manager for the RelativityOne Staging Explorer at Relativity. Relativity is the on-premise and SaaS platform that helps organizations around the world to manage and analyze unstructured data during litigation and investigations. Sebastian is also a co-organizer of the Krakow Product Tank meetup, a part of Mind the Product, the world’s largest community of product managers, designers, and developers. Before joining Relativity, Sebastian led product management strategies at Oracle and Sabre Corporation.
Piers Campbell, PhD
Lead Data Scientist at Kainos
Dr Piers Campbell leads data science within the Data and Analytics capability at Kainos. Piers has significant experience in data-driven decision making and predictive models across multiple domains, and has designed and delivered advanced analytics solutions for government, telcos, financial services and professional sports teams. He is currently engaged with the DVSA, transforming their services through machine learning and analytics with a particular focus on risk and compliance. Before joining Kainos, Piers worked with both Accenture and Deloitte, having previously spent eight years in academic roles at universities in the UK, the Middle East and Australia.
Senior Data Engineer at Etsy
Emily Sommer is a senior data engineer at Etsy. She has worked on Etsy’s internal A/B testing and experimentation platform and is currently helping to build out Etsy’s streaming data analytics platform.
Marcin Siatkowski, PhD
Data Scientist at Roche
Whenever there is a need to answer difficult, data-driven business questions or build data science prototypes, Marcin and his IT-savvy colleagues are ready to tackle the challenge with cutting-edge technologies.
His role is to lead end-to-end proofs of concept that always have two things in common: data and innovation.
Prior to Roche, Marcin worked in advanced analytics back-end roles in strategic consulting at McKinsey & Co. and as a scientific researcher at the Medical University in Rostock, Germany. He holds a PhD in applied mathematics.
Stanisław Raczyński, PhD
Senior Researcher at PICTEC
Stanisław Raczyński has been working with data and signal processing and analysis since 2006, when he experimented with ICA and NMF and became an early adopter of the R language. He obtained his PhD from the prestigious University of Tokyo in 2011 and has since worked at different research institutions: the University of Tokyo (2011), INRIA (France; 2011-2013) and now the Gdańsk University of Technology and the non-public research institute PICTEC. He is mostly interested in applying novel methods from the fields of ML, optimization, signal processing, NLP and DLT to innovate business and industry in the region.
Kamil Folkert, PhD
Member of the Board, CTO at 3Soft S.A.
At 3Soft, Kamil is responsible for analysing technological trends and applying them in projects realized at the intersection of IT and business, with special concern for Big Data technologies (including Apache Hadoop and Apache Spark). Kamil leads teams of architects and analysts and is also operationally engaged in architecture design and technology recommendations. He has hands-on experience in designing, implementing and delivering complex Big Data architectures that are leveraged to automate advanced analytics, including deep learning, for customers in financial services, telco, industry, social media and healthcare.
CEO at SaaS Manager, CTO at Neoteric
Creator and Chief Everything Officer at SaaS Manager – enterprise software that uses Artificial Intelligence to help telecoms, insurance, and B2C subscription-based services reduce churn and multiply Customer Lifetime Value. 10+ years of experience in IT, with a portfolio of Cloud, Big Data and AI projects for Fortune 500 companies. Creator of a system scalable to thousands of machines and supporting 32,000 requests per second. Experienced in distributed systems, predictive analytics & AI, and Cloud & API development. Enthusiast of microservices architecture and DevOps culture.
Head of Data Science at VirtusLab
Grzegorz Gawron is a lead software engineer/manager with interests in advanced analytics and a taste for theory (that makes him a computer scientist surely!).
He acts as the Head of Data Science at VirtusLab. Previously he did data engineering at Base CRM, and before that built trading systems for banking (PRM). It all started with the joy of having his first Commodore 64 program stored securely on a magnetic tape.
He holds an MSc in computer science and an MSc in economics from the University of Warsaw.
Interests in data, algorithms, software engineering, distributed systems, machine learning.
Data & Analytics Capability Lead at Kainos
Bill heads up the Kainos Data and Analytics Capability and has a leadership role across a number of engagements, as well as defining policy around data ethics. By profession he is an Enterprise Data Architect and has spent 20 years wrestling with data challenges in commerce, government and the third sector.
- Stage A
Participants registration and coffee time | 8:00 - 9:00
The official start of the conference | 9:00 - 9:20
Science, Really? | 9:20 - 9:50 | Piers Campbell
This presentation examines the phenomenon that is data science. Lauded as the “sexiest job of the 21st century”, it grew out of humble and pretty un-sexy beginnings to become the must-have CV line item. How did it get to where it is today? Can anyone do it? Where might it be going next? And the question I’m asked most frequently: is it really a science? I will attempt to answer all of these questions, along with exploring how the evolution from big to smart data might affect the future of data science for practitioners and the wider data community.
Five things to consider when you plan to discover and automate your predictive models on top of Apache Hadoop | 10:00 - 10:30 | Kamil Folkert
Leveraging the Apache Hadoop ecosystem for data mining and the development of predictive models can be really beneficial, shortening the time needed to operationalize a model on production datasets at scale. At the same time, it introduces many challenges that need to be considered. During this presentation I will focus on the five most important of them, selected based on real-life projects realized in the finance, retail and industry sectors.
Tackling important Healthcare challenges through a Data Science competition on Real World Data | 10:40 - 11:10 | Marcin Siatkowski
An important component of today’s healthcare ecosystem is Real World Data (RWD): data relating to patient health status and/or the delivery of health care and outcomes, routinely collected from a variety of sources, for example electronic health records, claims and billing activities, or health-monitoring devices. The RWD Roche team organized an internal Kaggle-like Data Science competition within the Roche Group to bring analytic talent together on a real problem: predicting mortality risk in cancer patients from Real World Data. Marcin and his IT-savvy colleagues joined the competition and built a top-performing solution.
Data Science in Cyber Security based on the Prescriptive SOC | 11:20 - 12:00 | Ewa Tułodziecka and Maciej Żarski
The Prescriptive SOC (Security Operations Center) leverages Data Scientists on the Atos Data Lake Appliance to constantly develop UEBA (User and Entity Behavior Analytics) models, which are orchestrated with a number of security solutions and platforms. Combined with Atos Big Data analytics capabilities and powered by Bullion servers, the new security solution makes it possible for customers to predict security threats before they even occur. Detection and neutralization time is improved significantly compared to existing solutions.
Building a Cloud Data Warehouse with a Shared Data Experience | 12:00 - 12:45 | Balázs Gáspár
Cloud adoption is accelerating, and organisations of every size are beginning to leverage the benefits: flexibility, agility and cost-effectiveness. By moving data engineering and analytics workloads to the cloud, businesses can achieve a faster time to insights through self-service. Cloudera helps enterprises unlock value from their data faster, across their data centers and the public cloud. Our Cloudera SDX and Altus technologies help business users benefit from cloud flexibility, enable enterprise IT to control data workloads deployed anywhere, and deliver a shared data experience.
The results of the KAINOS competition | 12:50 - 13:30
LUNCH | 13:30 - 14:30
Pragmatic Machine Learning for Business | 14:30 - 15:00 | Jakub Nowacki
Machine learning and artificial intelligence currently get a lot of attention, especially from business, for which these concepts are quite new. Often, when you go through all the examples available online, it turns out that the general models distinguishing cats from dogs are not quite suitable for your particular business. Moreover, very little is said about how to obtain data in the first place, as most of the examples use readily available data sets that are clean and well prepared for the task; quite the opposite of what you usually have in real business. In this talk I will take you through a classification example that adapts an existing model to a specific business case using transfer learning, including data set creation. Then we will harness the power of distributed computing to train the model and deploy it to production in accordance with the serverless paradigm. The presentation proposes example solutions which can be transferred to different systems.
Machine Learning to the Rescue: Finding One Email Out of Two Million to Win | 15:00 - 15:30 | Sebastian Kacza
When a former employee of a hospital staffing agency was accused of stealing CRM data and supplying it to a competitor, Cozen O’Connor was hired to assist the U.S. Department of Justice in prosecuting the employee and filing a civil case against the complicit competing agency. Faced with an expedited trial schedule and two million documents, the firm took a bold approach to quickly comb through the relevant documents returned by the system. In this session you will learn about the technology and methods that made an effective review possible in just two weeks. You will have a chance to see behind the scenes of how a machine learning engine empowers legal teams to organize data, discover the truth and act on it.
Data Lake, Big Data Analytics & Advanced Analytics on Azure | 15:40 - | Łukasz Grala
In the contemporary world, the need for efficient mechanisms for data analytics is no longer surprising. More importantly, those mechanisms should be not only efficient but also valuable for an enterprise. During this session you will learn about integrating large amounts of data in the Data Lake service, along with practical information about advanced analytics based on machine learning and artificial intelligence. All scenarios assume the use of Microsoft Azure resources.
- Stage B
Participants registration and coffee time | 8:00 - 9:00
The official start of the conference | 9:00 - 9:20
How can I build a customer churn model which can also identify reasons for churn? | 10:00 - 10:30 | Grzegorz Gwoźdź
Discovering treasures in Natural Language data | 10:40 - 11:10 | Piotr Śliwa
How do you process a large amount of natural language documents and extract meaningful features from this sophisticated source of data? What tools help achieve this goal, and how efficient are they? This presentation will show an interesting example of text feature extraction using the spaCy and textacy libraries, and its application with supervised learning to produce a predictive model.
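As a taste of the kind of feature extraction the talk covers, here is a minimal, library-free sketch of TF-IDF scoring in Python. It is illustrative only: a real pipeline of the sort described would delegate tokenization and feature extraction to spaCy and textacy, and the helper names below are our own.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters; spaCy's tokenizer is far more capable."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf(docs):
    """Score each document's terms by term frequency x inverse document frequency."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count documents containing the term, not occurrences
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                        for term, count in term_freq.items()})
    return vectors
```

Terms appearing in every document (such as “the”) score zero, while rarer, more informative terms score higher, and the resulting sparse vectors can feed a supervised model.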
Engineering Methods to Improve the Performance of Data Science Models | 11:20 - 12:00 | Grzegorz Gawron
In situations where a 1% optimisation improvement might be worth millions, algorithm tuning, parallelisation and similar techniques might turn out to be just the things that count. Grzegorz Gawron walks you through some cases where software engineering (computer science?) goes hand in hand with data science. All based on VirtusLab's real-world projects.
Data ethics: from compliance to data trust | 12:00 - 12:45 | Bill Wilson
Conferences like DataMass are exciting because every year there are new opportunities to exploit data, and new technologies to do it more effectively. However, as highlighted by recent high-profile news stories in the area of data privacy, those opportunities may immediately run into ethical challenges. We already have some privacy regulation, and no doubt more is on the way, but data ethics goes beyond compliance. Could trust be a differentiator for your business? How can governments maintain data trust with citizens? This session will look at what can go wrong and some steps we might take if we want to lead on data trust. As data professionals who understand the power of the tools in our hands, I believe we need to lead on how to use them responsibly. This is not a technical presentation (you won’t see any demos), but it is particularly relevant for those doing data science with data sets that contain personal data or where individuals can be re-identified. A general audience will also appreciate this session – after all, we’re all data subjects…
The results of the KAINOS competition | 12:50 - 13:30
LUNCH | 13:30 - 14:30
Facilitating ambient intelligence with distributed ledgers: a case study in smart energy | 14:30 - 15:00 | Stanisław Raczyński
In this talk I will paint a vision of how distributed ledger technologies will create a new paradigm in big data. I will then tell you how IOTA is about to facilitate the ambient intelligence of distributed energy resources (DER) in the free market of data and energy: how your EV, your EV charger, your photovoltaics and your home energy storage will communicate, learn from a global decentralized database, and make autonomous cost-optimizing decisions and suggestions.
The Hassle with Monads | 15:00 - 15:30 | Maciej Godek
Monads are a concept from functional programming that tends to confuse most newcomers. Having studied them for five years, Maciej hopes to warn his audience about some common pitfalls that one can run into when trying to comprehend them.
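For readers new to the topic, the core idea can be sketched in a few lines of Python: a minimal Maybe monad (our illustration, not code from the talk) that chains computations which may fail, short-circuiting on the first failure instead of raising an exception.

```python
class Maybe:
    """A minimal Maybe monad: wraps a value that may be absent."""
    def __init__(self, value, present=True):
        self.value = value
        self.present = present

    @staticmethod
    def unit(value):
        """Wrap a plain value ('return' in Haskell terminology)."""
        return Maybe(value)

    @staticmethod
    def nothing():
        """The absent value: any computation chained onto it is skipped."""
        return Maybe(None, present=False)

    def bind(self, f):
        """Chain a function that itself returns a Maybe ('>>=' in Haskell)."""
        return f(self.value) if self.present else self

def safe_div(x, y):
    """Division that signals failure instead of raising ZeroDivisionError."""
    return Maybe.nothing() if y == 0 else Maybe.unit(x / y)

# The second division fails, and the failure propagates silently to the end.
result = Maybe.unit(10).bind(lambda a: safe_div(a, 2)).bind(lambda b: safe_div(b, 0))
```

A typical pitfall the talk's title hints at: treating the wrapper as if it were the value itself, when only `bind` ever looks inside.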
Predicting speedway results using online update algorithms | 15:40 - 16:10 | Dawid Kałędkowski
Sport and computer-game rivalries need up-to-date estimates of player strength to suggest relevant opponents, assess risk or identify value bets. Online update algorithms perform very well when data flows are continuous or sudden, saving computation capacity and processing time. Examining the performance of speedway riders, several methods will be presented, explained and applied using R. You will also learn who was the best speedway rider last year, who is at the top now, and how riders' abilities change over time.
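The flavour of such online updates can be shown with a minimal Elo-style rating step, sketched here in Python under standard Elo assumptions (a generic illustration, not code from the R packages mentioned in the talk):

```python
def elo_update(r_winner, r_loser, k=32.0):
    """One online Elo step: ratings move immediately after a single result."""
    expected = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected)  # surprise-weighted adjustment
    return r_winner + delta, r_loser - delta

# Ratings evolve heat by heat; no refit over the full history is needed.
ratings = {"rider_a": 1500.0, "rider_b": 1500.0}
ratings["rider_a"], ratings["rider_b"] = elo_update(ratings["rider_a"], ratings["rider_b"])
```

An upset (a win with a low expected score) produces a larger adjustment, which is what lets the estimate track sudden changes in a rider's form without reprocessing past results.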
While many technical conferences present the theory of whatever technology they cover, it was great to see that both the organizers and the speakers of DataMass paid attention to the practical part above all. Theory is one thing, but the implementation of it is a whole new experience – and it’s great to have an event that gives big data professionals space to present their work.
TIDK, Microsoft MVP
It’s amazing to see that Big Data is becoming more and more popular as numerous meetups and conferences spring up all across Europe. Gdańsk needed an event like that, and I believe that DataMass is a great occasion for those who use big data to meet other experts and enthusiasts and chat about their experiences and lessons learned.
Balázs Ferenc Gáspár
Being a speaker at the first-ever DataMass Summit was a fantastic experience. Not only was it an occasion to exchange knowledge with other professionals from the field, but also a great opportunity to see how the community of big data enthusiasts is growing and developing. Combining great fun with practical knowledge is the key to a successful event of this type – and the organizers managed to do it.
Jakub Nowacki, PhD
DataMass Summit 2017 was the first edition of the conference and definitely not just another conference of this kind. How was it different? First of all – the venue. It took place in a post-industrial shipyard building. Not a very typical place to hold a technical conference, but it worked! Secondly, the conference was purely practical. Speakers focused on their experience, and not just the theoretical part, which in turn gives a lot of space for others to learn.
Video about DataMass 2017