Latest Blogs


Using Transaction Data for Optimal Customer Segmentation Analysis posted by Mosaic Data Science A common challenge in customer segmentation analysis is to differentiate customers based on features that are relevant to purchasing behavior. For example, clothing retailers may be interested in segmentation based on gender, as this typically relates to style and sizing preferences. However, a financial institution may be more interested in prospects’ age, as age may matter much more than gender when it comes to investment product preferences. In these instances, the business may already know which customer characteristics impact product choices. In other situations, however, it may be less obvious. [su_button url=”https://mosaicatm.com/blogs/customer-segmentation-analysis-blog/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Root Cause Analysis of Telemetry Failures posted by Mosaic Data Science When executives at a large management consulting firm noticed that their Microsoft Office applications sometimes took upwards of 10 seconds to load, the firm’s IT department knew it had a problem. IT professionals suspected that one or more custom add-ins (e.g., macros with brand-consistent templates) might be to blame. Mosaic Data Science, a leading data science consulting firm, was brought in to investigate two questions: were particular versions of add-ins leading to longer-than-normal load times; and were particular computers experiencing long load times more often than others? [su_button url=”https://mosaicatm.com/blogs/root-cause-analysis-of-telemetry-failures/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Musings on Deep Learning posted by Mosaic Data Science For those folks not breathlessly tracking the latest developments in RNNs, CNNs, LSTMs, and TGIFs (just kidding), I’ll start with a quick overview of the topic. Deep learning models are a subclass of artificial neural network (ANN) models. ANNs are mathematical models, meaning simply that quantitative data goes in one side and a quantitative result comes out the other. As the name implies, the structure of ANNs takes inspiration from the structure of biological brains. The “neurons” in a neural network are simple mathematical functions (linear, step, sigmoid, etc.), called “activation functions.” But as you connect many of these simple functions by using the output of one set of functions as the inputs to the next set (the “network”) then you can begin to represent some very complex functions. [su_button url=”https://mosaicatm.com/blogs/musings-on-deep-learning/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Aircraft Trajectory Optimizer posted by Mosaic Data Science In this blog post we examine the use of predictive analysis and optimization for aircraft trajectory. The SSN Route Optimizer is a trajectory optimizer that utilizes the Clearable Routes Network (CRN) to find optimized trajectories between an origin point (which may be the aircraft’s current position en route) and a destination. The optimal path is a true 4D path that considers aircraft performance, aircraft category (S/L/H), engine type (P/T/J), and climbs/descents/cruise segments. The CRN contains a network representation of the 3-D structure of flight clearances and is created by the cleverly named CRN Generator. The Route Optimizer also relies on the aircraft performance model and a network-based search algorithm such as the A-Star search algorithm. Each is described below, followed by a brief discussion of how everything is pulled together. [su_button url=”https://mosaicatm.com/blogs/aircraft-trajectory-optimizer/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Aircraft Performance Model Data Mining posted by Mosaic Data Science Being a premier analytics consulting firm, we frequently encounter data mining projects. In this blog we wanted to share a recent experience of data mining that helped guide the optimization of aircraft takeoff. We have needed to be able to predict how long a flight will take to fly its trajectory. Quite often, it has been adequate and possible to use the outputs of one of our predictive analysis tools for this purpose. It predicts both the arrival time (ETA) as well as some intermediate times that we have used in a variety of other places. But what should we do when we can’t use our predictive analysis tool? For instance, what about when we’re planning a route rather than following an existing route that the system knows about? What do we do when the system isn’t good enough? A recent project faced some of these challenges. Despite having limited data, the resulting model turns out to be quite accurate. [su_button url=”https://mosaicatm.com/blogs/aircraft-performance-model-data-mining/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Tips for Customer Propensity Modeling posted by Mosaic Data Science A data scientist should use the insights from this exploratory analysis to drive feature engineering and model development activities. It is advantageous to follow an agile approach to model development. The data integration and modeling workflow should be implemented in an analytics platform to allow for multiple modeling approaches to be efficiently compared against each other. The data science consultant should focus on machine learning models for classification such as logistic regression, random forests, naïve Bayes classifiers, or support vector machines (SVMs). Value should be placed on model simplicity – model complexity should only be increased if there is sufficient performance improvement. The relative importance of model interpretability (the ability for human subject matter experts to understand the internal model logic) needs to be accounted for and should balance the objectives of model performance and interpretability. [su_button url=”https://mosaicatm.com/blogs/tips-for-propensity-modeling/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Intro to Docker posted by Mosaic Data Science With that basic outline in place, but with very little additional understanding, he wondered if Docker might help him. Our consultant needed to set up a Predictive Maintenance demo for an upcoming sales meeting. It requires various R libraries that are called via an R-to-Java bridge. It then relays the results through a web application. Previously he had only ever been able to successfully link this all up on Linux, and each time that he had needed to dust it off, it had required some configuration work to make sure that all the right libraries were interacting properly. It hadn’t really had a permanent home, so it had been tough to get demos up and running quickly. Our team discussed setting up a new Linux server for this, but it occurred to our consultant that this might be a case where Docker would be useful. [su_button url=”https://mosaicatm.com/blogs/intro-to-docker/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Human Decision Making in Machine Learning Deployment for Resume Matching posted by Mosaic Data Science, Part 2 of 2 The application of machine learning to text-based problem domains can use the text itself as a basis for explanation. Because the text is already understandable to a human observer, the groupings of text tokens and phrases can also be readily explained and understood. *Note that this is not intended to imply that all groupings or associations of words and phrases found through machine learning will be obvious and could have been found through trivial exploration. The point is that the groupings and associations derived through machine learning algorithms are more likely to be understandable because of their linguistic nature and will provide a basis for explanation of unique, unexpected, and/or hidden relationships between the resume and the job requirements. [su_button url=”https://mosaicatm.com/blogs/human-decision-making-in-machine-learning-deployment-for-resume-matching/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]    
Human Decision Making in Machine Learning Deployment for Air Traffic Flow Management posted by Mosaic Data Science, Part 1 of 2 Although some machine learning models can provide limited insight into and explanation of the model outputs, most machine learning model output is highly obfuscated and opaque. In the realm of many decision support tools for military and other safety- or life-critical applications, it is necessary and appropriate for humans to be involved in decisions using the recommendations and guidance of computer automation and information systems. However, the opacity can lead users of the technology to doubt the reliability of the information or recommendation that is provided. This lack of understanding of the technology can result in distrust and, eventually, in failure of the technology to gain acceptance and use in its intended operational domain. [su_button url=”https://mosaicatm.com/blogs/human-decision-making-machine-learning-deployment-for-atfm/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]

Filling Predictive Modeling Gaps with Anomaly Detection posted by Mosaic Data Science

Anomaly detection can be deployed alongside supervised machine learning models to fill an important gap in both of these use cases. Anomaly detection automates the process of determining whether the data that is currently being observed differs in a statistically meaningful and potentially operationally meaningful sense from typical data observed historically. This goes beyond simple thresholding of data. Anomaly detection models can look across multiple sensor streams to identify multi-dimensional patterns over time that are not typically seen.

[su_button url=”https://mosaicatm.com/blogs/anomaly-detection/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]

 

Data is Everywhere! posted by Mosaic Data Science

In the course of several recent data science projects, I’ve been examining data providers external to Mosaic. It’s certainly not the most exciting topic, but questions often seem to arise that are structured something like “If only we knew [X], then we could do [something awesome].” Trying to make progress on these projects has led me to chase down some data. Here are a few notes on various lessons and providers that may be useful for others.

[su_button url=”https://mosaicatm.com/blogs/data-is-everywhere/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]

 

Data Science in Manufacturing posted by Mosaic Data Science

Manufacturing holds multiple predictive analytics and data science opportunities. With the rise of the Internet of Things (IoT) and data collection technologies becoming more accessible, manufacturing companies have a wealth of data to mine. Companies can use predictive analysis and optimization algorithms on these data sets to apply data-driven guidance and decision making to improve efficiency and quality, and to reduce costs.    

[su_button url=”https://mosaicatm.com/blogs/data-science-for-manufacturing/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]


Debating the Issues with NLP posted by Mosaic Data Science Since August of 2015, the presidential hopefuls from both major political parties have been joining in the primary debates to jockey for the two coveted positions in the general presidential election later this fall. The debates have been spirited and full of rich information about each of the candidates. Back in February, the folks at About Techblog did an analysis of the candidates’ language use in the debates up to that time (see Analyzing the Language of the Presidential Debates). We thought it would be interesting to parse through all of the data, including the primary debates that have occurred since About Techblog did their analysis, using our own NLP techniques. [su_button url=”https://mosaicatm.com/blogs/debating-the-issues/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
NLP: Geeking out with Words posted by Mosaic Data Science While experts may debate exactly what makes a human being human, there are a couple of unique traits that everybody agrees upon. One of those traits is Language: the capacity to communicate one’s thoughts, ideas, and feelings to others through a highly complex system of vocal, visual, and orthographic signals. No other species on earth can do that in the same way or with the same level of complexity.

             [su_button url=”https://mosaicatm.com/blogs/nlp-geeking-out-with-words/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]

 
Word Frequency Models: A Natural Language Processing Technique posted by Mosaic Data Science In a recently completed project with a Mosaic client, we were able to use some natural language processing (NLP) techniques to great effect. We used a word frequency model (also called bag of words) to parse resumes and then returned a set of the most likely job roles each resume was suited for. The client’s metrics measured our outputs to be about ten times more accurate than what they were currently using. These models are pretty easy to use and can also be applied to other types of NLP problems. [su_button url=”https://mosaicatm.com/blogs/word-frequency-models/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Revolutionary information flow finding in the common squid posted by Mosaic Data Science In June of 2000, with much fanfare, the Human Genome Project completed the initial draft of the human genome. President Bill Clinton, with British PM Tony Blair and Francis Collins, then director of the National Human Genome Research Institute, announced that the newly decoded human genome would “revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.” Collins forecasted a grand vision of “personalized medicine” by 2010. The molecular biology revolution was producing an exponentially growing volume of data and expectations were high. But ten years later, in an article entitled “Revolution Postponed,” Scientific American conceded that the Human Genome Project had “failed so far to produce the medical miracles that scientists had promised.” Much excellent research work had been accomplished, but in the age of Big Data, the Human Genome Project is an example of how complex problems are not always solved merely with more data. Big Data sometimes needs Big Analysis. Consider the recent findings that the common squid, Doryteuthis pealeii, massively reprograms its own genetic data in real time. [su_button url=”https://mosaicatm.com/blogs/revolutionary-information-flow-finding-in-the-common-squid/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Ontology 101, Part 3: How to Create an Ontology posted by Mosaic Data Science

In Part 2 of the three part series, we discussed the motivation behind and a high-level overview of our TMI ontology. If you have yet to read either Part 1 or Part 2 of this series, please do so before continuing. In the final part of this series, we look at the steps that we took to create the TMI ontology. It is important to note that even though the examples for each step link back to the TMI ontology, the method that we utilized can be used for any domain. For the purposes of clarity, all references to specific classes, properties, and individuals contained within an ontology will be written in italics, like this.

[su_button url=”https://mosaicatm.com/blogs/ontology-101-part-3-how-to-create-an-ontology/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]      
Ontology 101, Part 2: A Practical Application of an Ontology posted by Mosaic Data Science

In Part 1 of the three part series, we discussed what an ontology is and what the key components are. If you have yet to read that article, please do so before continuing. In Part 2, we look at how an ontology can be applied to a domain, specifically our Traffic Management Initiative (TMI) ontology developed under the TMI Attribute Standardization (TAS) project. This article will first give a brief overview of why an ontology is needed for TMI data and then give a high-level overview of the ontology that we have created. For the purposes of clarity, all references to specific classes, properties, and individuals contained within an ontology will be written in italics, like this.

[su_button url=”https://mosaicatm.com/blogs/ontology-101-part-2-a-practical-application-of-an-ontology/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]      
Ontology 101, Part 1: What is an Ontology posted by Mosaic Data Science

Through the use of an ontology in the development process, each team member (i.e., business analysts, data architects, and developers) plays a crucial role in maintaining a consistent story and plan across all aspects of the application. Understanding that the word “ontology” is new to some people, I thought it would be useful to explore the world of ontologies by giving a more formal introduction.

[su_button url=”https://mosaicatm.com/blogs/ontology-101-part-1-what-is-an-ontology/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]      
The Taylor Series and Beyond posted by Mosaic Data Science

In the modern science of data analytics, sometimes oldies are goodies. I once took an optimization class where the answer to every question posed by the professor was “the Taylor series,” referring to a popular numerical method that will be 300 years old next year. Brook Taylor’s 1715 formulation, which can be traced back even further to James Gregory in the seventeenth century, is the foundation of a great many of today’s numerical methods, of which one of the most powerful is nonlinear batch least squares.

[su_button url=”https://mosaicatm.com/blogs/the-taylor-series-and-beyond/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]      
New Healthcare Study on Survivability Yields Some Surprises posted by Mosaic Data Science Fitness trainers have long debated the virtues of volume versus intensity. Should I do 50 pushups or a dozen bench presses? Now a new data analysis study of 58,000 heart stress tests suggests that when it comes to survivability, high-stress exercise may be more important than high repetitions. That may come as a surprise to those who like to take long walks. The first lesson healthcare researchers learn is to expect the unexpected. Any new data set usually has a few surprises, so it is important to impose a minimum of structure while analyzing the data. It is important to let the data speak for itself. In this study, demographic, clinical, exercise, and mortality data were collected for 58,020 participants from the Detroit, Michigan area. Participants were almost evenly split between male and female and the median age was 53 years. The data run from 1991-2009 with a 10-year median span for each participant. [su_button url=”https://mosaicatm.com/blogs/new-healthcare-study-survivability-yields-surprises/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
A Light Regulatory Touch Helps the NGS Data Revolution posted by Mosaic Data Science

About twenty years ago the post-genomic era began to emerge in computational biology disciplines. Rather than information flowing from DNA to RNA to protein sequences, a new central dogma, much broader in scope, began to take shape. Genomes led to gene products, which implied structures and functions, which led to pathways and physiology. In the post-genomic era computational biology would move from single genes and single functions to systems of genes, structures, functions, pathways and behaviors. And when this new approach was applied to the new genomics data the result would be, as Francis Collins put it, personalized medicine. That day is now fast approaching with…

[su_button url=”https://mosaicatm.com/blogs/a-light-regulatory-touch-helps-the-ngs-data-revolution/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]  
The Next Revolution: noninvasive and global visualization of protein metabolism posted by Mosaic Data Science

Fifty years ago the data revolution in molecular biology was beginning as Max Perutz had shown how to map protein tertiary structure using X-ray crystallography and Pehr Edman was learning to read the primary structure amino acid sequence using degradation. Since then, ever-improving methods have led to a data explosion, requiring new and better methods for analyzing and modeling the plethora of data in both research and in healthcare. Bioinformatics, computational biology, healthcare data analysis, and healthcare predictive modeling are working to keep pace with the enormous wealth of information, and now…

[su_button url=”https://mosaicatm.com/blogs/the-next-revolution-noninvasive-and-global-visualization-of-protein-metabolism/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]  
Data Architecture 101, Part 5: Indexes posted by Mosaic Data Science

Indexes have two main purposes in relational databases. First, they can improve query performance. Second, they can implement data-integrity constraints. (For example, you can create a unique index to enforce a uniqueness constraint.) This article focuses on the former purpose, in the BI/analytics (not OLTP) context. Throughout, we use Oracle indexes as examples. Oracle’s indexing capabilities generally lead the market, so if you understand how to use indexes in an Oracle database, it’s easy to transfer that knowledge to other (less capable) RDBMS platforms. For example, SQL Server clustered tables approximate Oracle index-organized tables.

[su_button url=”https://mosaicatm.com/blogs/indexes/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]  
How to Make the Most of Your Data-Science Dollar posted by Mosaic Data Science Data scientists are a scarce commodity, and are likely to remain so for years to come. At the same time, data science can create a substantial competitive advantage for early adopters who make the best use of their scarce data-science resources. [su_button url=”https://mosaicatm.com/blogs/how-to-make-the-most-of-your-data-science-dollar/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Debt posted by Mosaic Data Science

In 2011 Chris Sterling published the very instructive book Managing Software Debt: Building for Inevitable Change. The book generalizes the concept of technical debt to account for a variety of similar classes of software-development process debt. Besides technical debt, Mr. Sterling describes quality debt, configuration-management debt, design debt, and platform-experience debt.

[su_button url=”https://mosaicatm.com/blogs/data-debt/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]      
Data Architecture 101, Part 4: Ontology-Driven Development is Lean posted by Mosaic Data Science In software-development nirvana, the business analysts, database technologists, and application developers all speak the same language. Everyone agrees about what each user story means. Everyone knows what’s in each database table and column, just by looking at them. The source code practically explains itself. Nobody creates database tables that never get used. Nobody writes orphaned code. Sound too good to be true? Not really. It’s not even that hard. To do it, you just need to add two documents and a few straightforward steps to your agile/scrum development process. Here’s how. [su_button url=”https://mosaicatm.com/blogs/data-architecture-101-part-4-ontology-driven-development-is-lean/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Architecture 101, Part 3: Dimensions posted by Mosaic Data Science Data marts, data warehouses, and some operational datastores use dimension tables. A dimension table categorizes a fact table that joins to the dimension. At query time one filters the facts by values in the dimension table, and uses those values to label the query results. For example, four dimensions in Figure 2 of our second data-architecture post “Overview of Relational Architectures” categorize a sale line-item fact. [su_button url=”https://mosaicatm.com/blogs/data-architecture-101-part-3-dimensions/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Architecture 101, Part 2: Overview of Relational Architectures posted by Mosaic Data Science In our first post we reviewed the rudiments of relational data architecture. This post uses those concepts to survey the main types of relational architectures. These divide fundamentally into two types, the second having four sub-types: online transaction processing (OLTP), and business intelligence (BI), which comprises the online analytical processing (OLAP) cube, the data mart, the (enterprise) data warehouse, and the operational datastore (ODS). [su_button url=”https://mosaicatm.com/blogs/data-architecture-101-part-2-relational-architecture/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
The Role of Industry Experience in Data Science posted by Mosaic Data Science Executives considering how to apply data science to their organizations often ask Mosaic about “relevant industry experience.” Historically this has been a legitimate question to aim at a management consultant. Each industry has had its own set of best practices. A consultant’s responsibility has generally been to provide expertise about these practices and guide the customer in applying them profitably. For example, two decades ago a fashion retailer might reasonably ask a business consultant about her or his expertise with the Quick Response method, then a best practice for fashion retail. Posing the same sort of question now to a data scientist assumes that industry experience continues to play the same role in data science that it has historically in management consulting. [su_button url=”https://mosaicatm.com/blogs/the-role-of-industry-experience-in-data-science/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Science Design Pattern #5: Combining Source Variables posted by Mosaic Data Science Variable selection is perhaps the most challenging activity in the data science lifecycle. The phrase is something of a misnomer, unless we recognize that mathematically speaking we’re selecting variables from the set of all possible variables, not just the raw source variables currently available from a given data source. Among these possible variables are many combinations of source variables. When a combination of source variables turns out to be an important variable in its own right, we sometimes say that the source variables interact, or that one variable mediates another. We’ll coin the phrase synthetic variable to mean an independent variable that is a function of several source variables, regardless of the nature of the function. [su_button url=”https://mosaicatm.com/blogs/data-science-design-pattern-5-combining-source-variables/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Science Design Pattern #4: Transformations of Individual Variables posted by Mosaic Data Science It’s very common while exploring data to experiment with transformations of individual variables. Some transformations rescale while preserving order; others change both scale and order. In this post we describe some common ways to transform individual variables, and explore how doing so may benefit an analysis. (We’ll tackle transformations of multiple variables in another post.) [su_button url=”https://mosaicatm.com/blogs/data-science-design-pattern-4-transformations-of-individual-variables/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
The Executive Role in a Data-Driven Organization posted by Mosaic Data Science Executives know that supporting a technology change requires effecting a variety of organizational changes in a timely fashion. Otherwise, the organization may resist or reject the change. These changes may involve the formal and informal reward systems, organization structure, resource allocations, and cultural norms. [su_button url=”https://mosaicatm.com/blogs/executive-role…n-organization/ ” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Science Design Pattern #3: Handling Null Values posted by Mosaic Data Science Most data science algorithms do not tolerate nulls (missing values). So, one must do something to eliminate them, before or while analyzing a data set. [su_button url=”https://mosaicatm.com/blogs/data-science-design-pattern-3-handling-null-values/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Architecture 101, Part 1: Rudiments posted by Mosaic Data Science This post is the first in a series on relational database architecture and tuning. It’s a mature subject, but we continue to encounter programmers and data scientists who have limited exposure to the material. This blog aims to become a “nutshell” treatment of the subject, so those of you who work with data in a relational database management system (RDBMS) can quickly learn how to make the best possible use of a database. [su_button url=”https://mosaicatm.com/blogs/data-architecture-101-rudiments/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Science Design Pattern #2: Variable-Width Kernel Smoothing posted by Mosaic Data Science A fundamental problem in applied statistics is estimating a probability mass function (PMF) or probability density function (PDF) from a set of independent, identically distributed observations. When one is reasonably confident that a PMF or PDF belongs to a family of distributions having closed form, one can estimate the form’s parameters using frequentist techniques such as maximum likelihood estimation, or Bayesian techniques such as acceptance-rejection sampling. [su_button url=”/blogs/data-science-design-pattern-2-variable-width-kernel-smoothing/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Data Science Design Pattern #1: Decision Templates posted by Mosaic Data Science This is the first in a series of technical blog posts describing design patterns useful in constructing data science models, including decision-support and decision-automation systems. We hope that this blog will become a clearinghouse within the data science community for these design patterns, thereby extending the design-pattern tradition in software development and enterprise
architecture to data science. [su_button url=”/blogs/data-science-design-pattern-1-decision-templates/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]      
Sample Size Matters posted by Mosaic Data Science Given the current shortage of data scientists in the U.S. labor market, some argue that employers should simply train internal IT staff to program in a language such as Python or R having strong data-analysis capabilities, and then have these programmers do the company’s data science. Or they may hire analysts with statistical training, but little or no background in optimization. (We discuss this risk in our white paper “Standing up a Data Science Group.”) This post illustrates an important risk in this homegrown approach to data science. The programmers or statisticians may, in some sense, perform a correct statistical analysis. They may nevertheless fail to arrive at a good solution to an important optimization problem. And it is almost always the optimization problem that the business really cares about. Treating an optimization problem as a purely statistical problem can cost a business millions in lost revenue or cost reductions, in the name of minimizing data science labor expense. [su_button url=”/blogs/sample-size-matters/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
Small Data, Big ROI posted by Mosaic Data Science Welcome to Mosaic Data Science, and thanks for reading our blog! We’ll frequently opine here about various technical and managerial data science topics, so visit often. The phrase ‘big data’ has become enormously popular in the business press. Like many business buzz phrases, it has lost much of its original meaning. More often these days when a business writer says “big data” they mean data science, or data science applied to a large data set. Some traditional-BI vendors try to capitalize on the buzz by identifying new features of their offerings as supporting “big data,” even though they work in the traditional relational-database paradigm, which big data by definition does not fit. The phrase does have a clear (and useful) original definition. Big data is data that is too big to be stored economically in a relational database. Just what that means depends on whose budget we’re talking about, and what year. Regardless, many new data-storage technologies have been invented out of the need to store data that’s too expensive to manage with a relational database. There’s just too much of it. [su_button url=”/blogs/small-data-big-roi/” style=”flat” background=”#f9f9f9″ color=”#999″ size=”5″ radius=”3″]Read More »[/su_button]
[/vc_column_text][/vc_column][/vc_row]]]>