SEMOSS and R

SEMOSS Info, Jan 22

Introduction

R is a very impressive data analytics environment.

Pick any algorithm that needs to be implemented to summarize, transform, or analyze the data, and there is probably a package to do it.

In fact, there may be more than one package to do it.

R is one of the core environments that SEMOSS uses for its own analytic purposes.

Clean, Analytics, and other SEMOSS functions are typically thin wrappers on top of existing R packages: they are parameterized through widgets and run an R routine.

R’s weakness is that custom visualizations are not easy to create.

A single bar chart or pie chart is feasible, but custom visualizations and dashboards with SVG/canvas routines are easier to build in the comfort of a browser.

One of the most commonly used visualization packages in R is ggplot2.

If you have not yet been amazed by what ggplot2 can do, please see this link.

This article talks about how to use SEMOSS to work with R.

Together, they let you run custom analytics, visualize the results, and share these visualizations.

We will use the same Midwest data as the tutorial found here.

Our goal here is to show how these visualizations can be created through a simple user interface.

We will primarily be using the console to show what you can do in R inside of SEMOSS.

Conventions for this Article

The following boxes mark commands that need to be entered within the SEMOSS console.

The bolded words denote which console they should be typed in.

R Code: Enter this command in the R Console.

Python Code: Enter this command in the Python Console.

Pixel Code: Enter this command in the Pixel Console.

Comment: ## This is a comment to explain what is going on ##

Working with R

SEMOSS gives you a full R console experience.

This can be accessed by clicking on the console and clicking on R.

While it does not have the exact same controls as RStudio, many of the same functions exist in SEMOSS, including using the arrow keys to go back to the previous command, loading a library, and installing a library (not available on play.semoss.org yet), among other things.

However, from within SEMOSS the user cannot launch additional windows to create charts or graphs through libraries like ggplot2.

The user can type simple R commands like installed.packages() or some mathematical equations to see the results.

If the input requires multiple lines, use semicolons to separate the lines.
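As a small illustration of the single-line input described above, the following sketch chains several R statements with semicolons. The variable names are made up for this example, and it relies only on base R.

```r
# Hypothetical example: three statements entered as one console line.
x <- c(2, 4, 6, 8); m <- mean(x); print(m)

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
pkgs <- rownames(installed.packages()); print(head(pkgs))
```

Each semicolon ends one statement, so the line behaves as if the statements had been typed on separate lines.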

Much as ggplot has its own language in R, SEMOSS uses Pixel as its core language.

Each section below will show the Pixel equivalent of ggplot.

Setup

Let us start by loading a CSV file.

On a local instance where the file system can be locally referenced, the CSV can be loaded by pointing the R data.table fread function to the local file.

Here we load the file from a URL instead.

R Console:
## Load the data from the URL Location ##
d <- read.csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest.csv")
## Print the d out to verify the data ##
d

Pixel Console:
## Load this as a frame for SEMOSS to use ##
GenerateFrameFromRVariable("d")

You can shift between the consoles.

Typing %r takes you to the R console, %p to the Python console, and %s to the Pixel console.

SEMOSS also allows you to create other variations of a dataframe in your R console and use them as the foundation frame for SEMOSS.

Please see the article on dataframe for more details.

In fact, when you create an insight and specify R as the data frame, this is precisely the process that happens in the background.
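For the local-file case mentioned in this section, a minimal sketch with data.table's fread could look like the following. It assumes the data.table package is installed; the temporary file and its three rows are placeholders invented so the example runs anywhere, and on a real instance the path would point at your own CSV.

```r
library(data.table)  # assumption: the data.table package is installed

# Write a tiny stand-in CSV so the sketch runs anywhere; on a real local
# instance this would be the path to your own file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("PID,area,poptotal",
             "561,0.052,66090",
             "562,0.014,10626",
             "563,0.022,14991"), csv_path)

# fread detects the separator and the column types automatically.
d <- fread(csv_path)
print(d)
```

Once d exists in the R console, the GenerateFrameFromRVariable step shown earlier would be the way to hand it to SEMOSS.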

Visualizing as ggplot2

If you have not already clicked on the link above, here is the link to the tutorial.

We will start showing the parallels between ggplot2 and SEMOSS.

Step 1: Create a simple scatter plot of area against poptotal.

To do this, the tutorial uses ggplot, and the command is as follows:

ggplot(midwest, aes(x=area, y=poptotal)) + geom_point()

Pixel Code:
Select ( PID , area , poptotal ) | With ( Panel ( 0 ) ) | Format ( type = [ 'table' ] ) | TaskOptions ( { "0" : { "layout" : "Scatter" , "alignment" : { "label" : [ "PID" ] , "x" : [ "area" ] , "y" : [ "poptotal" ] } } } ) | Collect(-1) ;

In SEMOSS the dataframe is a given from the setup: it is implicit and assumed.

Let us break the remaining command into a few pieces:

Selecting some data from the list of available columns, here PID, area, and poptotal.

Specifying which Panel to visualize this data in: Panel 0.

Performing the task of collecting this data. You can request an arbitrary amount of data through Collect(n), where n is the number of items to show, or Collect(-1) to show everything.

Please be aware that if you have a lot of data, Collect(-1) will not be an optimal choice.

Assigning it to the Scatter layout as label, x, and y respectively. No surprises here.

Output for step 1 (SEMOSS on the left, ggplot2 on the right).
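For readers who think in R terms, the selection and collection pieces can be loosely compared to picking columns and limiting rows in a data frame. This is only an analogy, not a description of how SEMOSS executes Pixel. It assumes the ggplot2 package is installed and uses the midwest dataset bundled with it; the helper name collect_rows is made up for this sketch.

```r
library(ggplot2)  # assumption: used here only for its bundled midwest dataset

d <- as.data.frame(ggplot2::midwest)

# Hypothetical helper mimicking Collect(n): n = -1 keeps every row,
# any other n keeps at most the first n rows.
collect_rows <- function(frame, n) {
  if (n == -1) frame else head(frame, n)
}

selected <- d[, c("PID", "area", "poptotal")]  # like Select(PID, area, poptotal)
first_50 <- collect_rows(selected, 50)         # like Collect(50)
everything <- collect_rows(selected, -1)       # like Collect(-1)

print(dim(first_50))
print(dim(everything))
```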

Step 2: Using the scatter plot to draw a line of best fit.

The tutorial does this by calling geom_smooth(method="lm") as follows:

R Console:
ggplot(midwest, aes(x=area, y=poptotal)) + geom_point() + geom_smooth(method="lm")

SEMOSS implements the regression line as an ornament, i.e. something that is specific to the front end.

In this case the regression line is not computed by the backend; it is computed by the frontend.

This can be deceptive if you have only loaded a subset of the data.

We are in the process of implementing regression lines for both the subset and the overall dataset.

To do this on SEMOSS, simply navigate to additional tools, click regression, and select linear to draw the linear regression line.

Please also note that SEMOSS not only paints the regression line but also shows the function that defines it.

Output for step 2 (SEMOSS on the left, ggplot2 on the right).
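To see why a line fitted on a partial load can be deceptive, the following sketch fits the same linear model on the full dataset and on a subset. It assumes ggplot2 is installed and uses its bundled midwest dataset; the cutoff of 100 rows is arbitrary and only stands in for a partially loaded frame.

```r
library(ggplot2)  # assumption: used here only for its bundled midwest dataset

d <- as.data.frame(ggplot2::midwest)

# Fit poptotal as a linear function of area on the full dataset.
fit_all <- lm(poptotal ~ area, data = d)

# Fit the same model on only the first 100 rows, standing in for a
# partially loaded frame (the cutoff of 100 is arbitrary).
fit_subset <- lm(poptotal ~ area, data = head(d, 100))

# The intercept and slope define the line a chart would display.
print(coef(fit_all))
print(coef(fit_subset))
```

The two coefficient pairs differ, so a line drawn from the subset would not match the line for the whole dataset.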

Step 3 — Method 1: Adjusting X and Y limits by filtering.

The information that is being shown right now is clumped towards the bottom.

The article talks about how this can be made better either by using filters or by zooming in.

The ggplot article does this by adjusting the axes.

R Console:
## g is assumed to be the ggplot object from step 2 (scatter plot plus regression line) ##
## X is adjusted for 0, 0.1 and y for 0, 1M. Note that the x in this case has been mapped to area and y to poptotal ##
g + xlim(c(0, 0.1)) + ylim(c(0, 1000000))

In SEMOSS, this can be achieved by filtering your frame.

Pixel Console:
ReplaceFrameFilter(poptotal < 1000000) ;

Given that the frame is one connected grid, SEMOSS also allows you to filter based on a different column if you choose to.

In SEMOSS you could just as easily do this through a set of mouse clicks on the filter.

Step 3 — Method 2: Adjusting X and Y limits by using the zoom X-Axis and zoom Y-Axis.

Output for step 3 (SEMOSS on the left, ggplot2 on the right).
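The difference between the two methods matters for anything computed from the data. The sketch below, which assumes ggplot2 is installed and uses its bundled midwest dataset, contrasts filtering (rows are removed before the fit) with zooming through coord_cartesian (only the visible window changes). The variable names are illustrative.

```r
library(ggplot2)  # assumption: ggplot2 is installed; midwest ships with it

d <- as.data.frame(ggplot2::midwest)

# Filtering: rows outside the limits are removed from the data itself,
# comparable in spirit to the frame filter in Method 1.
filtered <- d[d$area <= 0.1 & d$poptotal < 1000000, ]

slope_all <- coef(lm(poptotal ~ area, data = d))[["area"]]
slope_filtered <- coef(lm(poptotal ~ area, data = filtered))[["area"]]
print(c(all = slope_all, filtered = slope_filtered))

# Zooming: coord_cartesian() only changes the visible window, so the
# smoothing line is still fitted on every row of d.
zoomed <- ggplot(d, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 1000000))
```

Because a few counties have more than a million people, dropping them changes the fitted slope, while zooming leaves the fit untouched.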

Step 4 — Changing the color of the points based on a different column.

Here is the ggplot code for plotting this with color.

library(ggplot2)
gg <- ggplot(midwest, aes(x=area, y=poptotal)) +
  geom_point(aes(col=percollege), size=3) +  # Set color to vary based on percollege.
  geom_smooth(method="lm", col="firebrick", size=2) +
  coord_cartesian(xlim=c(0, 0.1), ylim=c(0, 1000000)) +
  labs(title="Area Vs Population", subtitle="From midwest dataset", y="Population", x="Area", caption="Midwest Demographics")
plot(gg)

SEMOSS is not stateless and remembers all the previous directives given to the chart, such as painting a regression line, filtering y to a given value, etc.

There is no need to supply the same values again.

Arguably, you could achieve the same thing in R by reusing the ggplot variable.

In SEMOSS you only need to specify what to color this by.

Color is specified through the series.

The Pixel below looks very similar to the Pixel in step 1; the only difference is in the selector, i.e. selecting an additional column and then using it for color/series.

Note that the ggplot code above colors by percollege, while the Pixel below uses state_1 as the series.

Pixel Console:
Select ( PID , area , poptotal, state_1 ) | With ( Panel ( 0 ) ) | Format ( type = [ 'table' ] ) | TaskOptions ( { "0" : { "layout" : "Scatter" , "alignment" : { "label" : [ "PID" ] , "x" : [ "area" ] , "y" : [ "poptotal" ], "series" : [ "state_1" ] } } } ) | Collect(-1) ;

Output for step 4 (SEMOSS on the left, ggplot2 on the right).
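The remark above about reusing the ggplot variable can be sketched as follows: the base plot is stored once, and only the new directive is added on top. This assumes ggplot2 is installed and colors by the state column of its bundled midwest dataset, as a stand-in for the state_1 series used in the Pixel; the variable names are illustrative.

```r
library(ggplot2)  # assumption: ggplot2 is installed; midwest ships with it

# Base plot kept in a variable so later steps can build on it.
g <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_cartesian(xlim = c(0, 0.1), ylim = c(0, 1000000))

# Only the new directive is added: a point layer colored by state.
g_colored <- g + geom_point(aes(colour = state))

# Each distinct state becomes one color, i.e. one series in the chart.
n_series <- length(unique(midwest$state))
print(n_series)
```

The color mapping is placed on the point layer rather than on the whole plot so that the regression line stays a single line instead of being split per state.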

Step 5: Faceting / Shelfing — Drawing multiple Plots within one figure.

One of the things ggplot can do is faceting.

The idea behind faceting is to repeat the same graph for each value of an additional column so that the panels can be compared.

R Console:
g <- ggplot(mpg, aes(x=displ, y=hwy)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  theme_bw()  # apply bw theme
## Facet wrap with common scales ##
g + facet_wrap( ~ class, nrow=3) + labs(title="hwy vs displ", caption = "Source: mpg", subtitle="Ggplot2 - Faceting - Multiple plots in one figure")  # Shared scales
## Facet wrap with free scales ##
g + facet_wrap( ~ class, scales = "free") + labs(title="hwy vs displ", caption = "Source: mpg", subtitle="Ggplot2 - Faceting - Multiple plots in one figure with free scales")  # Scales free

Pixel Console:
Select ( PID , area , poptotal, state_1 ) | With ( Panel ( 0 ) ) | Format ( type = [ 'table' ] ) | TaskOptions ( { "0" : { "layout" : "Scatter" , "alignment" : { "label" : [ "PID" ] , "x" : [ "area" ] , "y" : [ "poptotal" ], "facet" : [ "state_1" ] } } } ) | Collect(-1) ;

Output for step 5 (SEMOSS on the left, ggplot2 on the right).
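The tutorial's faceting example uses the mpg dataset, while the Pixel facets the Midwest data. A closer ggplot2 counterpart to the Pixel might look like the sketch below, which assumes ggplot2 is installed and facets its bundled midwest dataset by the state column as a stand-in for state_1.

```r
library(ggplot2)  # assumption: ggplot2 is installed; midwest ships with it

# One scatter panel per state; by default all panels share the same scales.
g_facets <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point() +
  facet_wrap(~ state)

# The number of panels equals the number of distinct states.
n_panels <- length(unique(midwest$state))
print(n_panels)
```

Passing scales = "free" to facet_wrap, as in the tutorial code, would give each panel its own axis ranges instead.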

Conclusion

SEMOSS provides native integration with R.

You can run analytics through R and visualize the results through SEMOSS.

This can allow users to run elaborate R (and Python) pipelines as a dynamic page without the need for manual integration.

Feedback, Comments, Suggestions, Improvements? Please contact us at semoss.org.

Try it out at play.semoss.org.

Why login?

There are 2 reasons why a login is required:

1. There are many users using SEMOSS, and we need some way to identify you so that the right databases can be shown. Rather than creating a login/password and forcing you to remember yet another password, SEMOSS chooses to use a social login. Right now Google and Github are available. Twitter, Facebook, etc. are coming soon.

2. You are working with YOUR data wherever it is. It could be Google Drive, Dropbox, etc. This is an integrated login to allow you to keep your data safe.

What will you do with my login?

We will show you apps and insights that you have created or are an author or editor of.

Can I use SEMOSS without logging in?

Absolutely.

Download at www.semoss.org/download or use docker run -p 80:8080 semoss/docker sh run.sh.
