
DataStax Java Driver: 2.0.11 released!


The Java driver team is pleased to announce the release of version 2.0.11. The complete changelog can be found here, but we would like to highlight a few important items in this blog post. Also, make sure to read the upgrade guide.

Improvements to the Asynchronous API

There are now new methods to create a Session object asynchronously: Cluster.connectAsync() and Cluster.connectAsync(String) (JAVA-821). They return a ListenableFuture that will complete when the Session object is fully initialized, and all connection pools are up and running.

In addition, some internal features were refactored to avoid blocking calls (JAVA-822).

If you use the driver in a completely asynchronous manner, make sure to review this page in our documentation.
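
For example, here is a minimal sketch of creating a session without blocking; the contact point and keyspace name are placeholders.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;

public class AsyncConnect {
    public static void main(String[] args) {
        final Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        // connectAsync() returns immediately; the future completes once the session
        // is initialized and its connection pools are up and running.
        ListenableFuture<Session> sessionFuture = cluster.connectAsync("my_keyspace");
        Futures.addCallback(sessionFuture, new FutureCallback<Session>() {
            public void onSuccess(Session session) {
                // Chain further asynchronous calls instead of blocking the calling thread.
                ListenableFuture<ResultSet> rs =
                        session.executeAsync("SELECT release_version FROM system.local");
            }
            public void onFailure(Throwable t) {
                System.err.println("Could not create session: " + t);
                cluster.close();
            }
        });
    }
}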

Improvements to Connection Handling

Each Cluster instance maintains a “control connection” that is used to query schema and token metadata from the remote cluster, perform node discovery, and receive notifications such as schema or topology changes. This central piece of the driver’s architecture can sometimes be overwhelmed by lots of simultaneous queries and/or notifications; thanks to JAVA-657, all outbound queries and inbound notifications are now “debounced”, i.e. their execution is delayed, and if two or more requests are received within the delay, they are “merged” together so that only one query is executed or only one notification is processed. This can significantly reduce the network usage of the control connection, especially in large clusters or clusters with frequent schema changes. The debouncing delay is 1 second by default, but it is configurable through several new options in QueryOptions; refer to the javadocs for more information.
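
As a sketch, the debouncing windows could be tuned like this at cluster construction time; the setter names below are assumptions from memory of the 2.0 branch, so check the QueryOptions javadocs for the exact signatures in your driver version.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;

QueryOptions queryOptions = new QueryOptions();
// Wait up to 5 seconds before refreshing schema metadata, coalescing bursts of events
// (method name assumed; verify against the javadocs).
queryOptions.setRefreshSchemaIntervalMillis(5000);
// Force a refresh anyway once 20 events have been debounced (method name assumed).
queryOptions.setMaxPendingRefreshSchemaRequests(20);

Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(queryOptions)
        .build();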

Another noteworthy improvement is brought by JAVA-544: previously, when a connection failed, the driver would mark the host down immediately and close all of its connections. This behavior could be too aggressive, especially if the host is the control host, because it would close the control connection as well; thanks to JAVA-544, the driver now keeps going with the remaining connections and periodically tries to reopen the missing ones based on the configured ReconnectionPolicy. It only marks the host down when its last working connection goes down.

New RetryPolicy Decision: Try Next Host

DefaultRetryPolicy‘s behavior has changed in the case of an UnavailableException received from a coordinator. The new behavior causes the driver to try a different host (the next one in the query plan) at most once; otherwise an exception is thrown.

This change makes sense when the node initially tried for the request happens to be isolated from the rest of the cluster (e.g. because of a network partition) but can still answer the client normally. In this case, the isolated node will most likely respond with an UnavailableException for most queries, but by simply switching to another node – hopefully one that is not isolated – those queries will most likely succeed. The previous behavior was to always rethrow the exception.
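
For illustration only, here is a sketch of a custom RetryPolicy that uses the new decision; the RetryDecision.tryNextHost(ConsistencyLevel) factory name is an assumption based on the release notes, and this is not the DefaultRetryPolicy source.

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.policies.RetryPolicy;

public class NextHostOnUnavailablePolicy implements RetryPolicy {
    public RetryDecision onUnavailable(Statement stmt, ConsistencyLevel cl,
                                       int requiredReplica, int aliveReplica, int nbRetry) {
        // Retry once on the next host in the query plan, keeping the same consistency level;
        // tryNextHost(...) is assumed to be the new factory added in 2.0.11.
        return nbRetry == 0 ? RetryDecision.tryNextHost(cl) : RetryDecision.rethrow();
    }
    public RetryDecision onReadTimeout(Statement stmt, ConsistencyLevel cl,
                                       int requiredResponses, int receivedResponses,
                                       boolean dataRetrieved, int nbRetry) {
        return RetryDecision.rethrow();
    }
    public RetryDecision onWriteTimeout(Statement stmt, ConsistencyLevel cl, WriteType writeType,
                                        int requiredAcks, int receivedAcks, int nbRetry) {
        return RetryDecision.rethrow();
    }
}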

Improvements to Schema and Token Metadata API

Version 2.0.11 brings two important changes to the Schema and Token Metadata API:

First of all, thanks to JAVA-828, having the driver retrieve metadata is now optional; it is of course enabled by default, but some applications might wish to disable it in order to eliminate the overhead of querying the metadata and building its client-side representation. This can now be done by calling QueryOptions.setMetadataEnabled(boolean) with false. Note, however, that doing so has important consequences; refer to the javadocs for more information.

Furthermore, thanks to JAVA-151, clients can now subscribe to schema change notifications by implementing the new SchemaChangeListener interface. The listener must then be registered against a Cluster object through the new Cluster.register(SchemaChangeListener) method.
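
A minimal sketch of both features, with a placeholder contact point and a hypothetical listener instance; read the javadocs before disabling metadata in production.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;

QueryOptions options = new QueryOptions();
options.setMetadataEnabled(false); // disable client-side schema/token metadata entirely
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withQueryOptions(options)
        .build();

// To react to schema changes instead, keep metadata enabled and register a listener
// (SchemaChangeListener has one callback per kind of schema event):
// cluster.register(mySchemaChangeListener);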

Improvements to Prepared Statements

JAVA-797 introduces a new configuration parameter: QueryOptions.setPrepareOnAllHosts(boolean). It controls whether the driver prepares statements on all hosts in the cluster. A statement is normally prepared in two steps: first, prepare the query on a single host in the cluster; then, if that succeeds, prepare it on all other hosts. The new option controls whether the second step is executed, and it is enabled by default. You might want to disable it to reduce network usage if you have a large number of clients preparing the same set of statements at startup; the drawback is that if a host receives a BoundStatement for a prepared statement it does not know about, it will reply with an error, and the driver will have to re-prepare the statement and re-execute the BoundStatement, which implies a performance penalty.

Should you keep the default settings and prepare your statements on all hosts, the whole operation will now be faster thanks to JAVA-725: all hosts are now prepared in parallel.

Another similar optimization has been introduced by JAVA-658: when a down node comes back up, the driver usually re-prepares all cached prepared statements on it. It is now possible to disable this default behavior by setting QueryOptions.setReprepareOnUp(boolean) to false. The reason why you might want to disable it is to optimize reconnection time, but doing so can also incur a performance penalty. See the javadocs for more details.
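
Both options live on QueryOptions; a short sketch of turning them off:

import com.datastax.driver.core.QueryOptions;

QueryOptions options = new QueryOptions();
options.setPrepareOnAllHosts(false); // prepare only on the first host; others prepare lazily on demand
options.setReprepareOnUp(false);     // skip re-preparing cached statements when a node comes back up
// pass the options to Cluster.Builder.withQueryOptions(...) as in the earlier examples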

Netty Layer Configuration

Thanks to JAVA-853, it is now possible to configure the Timer instance used by the Netty layer. Timer instances are used to trigger client timeouts and speculative executions.
By default the driver creates a new Timer for every Cluster, but this can lead to situations where the number of Timer instances per JVM is too high; in these scenarios, users can now override the method NettyOptions.timer(ThreadFactory) and provide their own implementation of Timer (possibly shared among different Cluster instances).
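
A sketch of sharing one timer across clusters follows; the Netty 3 Timer/HashedWheelTimer imports are an assumption for the 2.0 branch (adjust them to the Netty version bundled with your driver), and the registration call shown in the trailing comment is an assumption as well.

import java.util.concurrent.ThreadFactory;

import com.datastax.driver.core.NettyOptions;
import org.jboss.netty.util.HashedWheelTimer;
import org.jboss.netty.util.Timer;

public class SharedTimerNettyOptions extends NettyOptions {
    // One timer for the whole JVM, shared by every Cluster that uses these options.
    private static final Timer SHARED_TIMER = new HashedWheelTimer();

    @Override
    public Timer timer(ThreadFactory threadFactory) {
        return SHARED_TIMER;
    }
}
// e.g. Cluster.builder().withNettyOptions(new SharedTimerNettyOptions())... (builder method name may differ)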

Improvements to Query Builder API

Version 2.0.11 ships with some nice additions to the Query Builder API:

  • Support for UPDATE ... IF EXISTS: a CAS statement such as UPDATE table1 SET c1 = 'foo' WHERE pk = 1 IF EXISTS can now be built with the Query Builder:
    update("table1").with(set("c1", "foo")).where(eq("pk", 1)).ifExists();
    
  • Support for SELECT DISTINCT: a statement such as SELECT DISTINCT c1 FROM table1 WHERE pk = 1 can now be built with the Query Builder:
    select("c1").distinct().from("table1").where(eq("pk", 1));
    
  • It is now possible to include bind markers when deleting list elements and map entries, e.g. DELETE c1[?] FROM table1 WHERE pk = 1 and DELETE c1[:key] FROM table1 WHERE pk = 1 can now be expressed as follows:
    delete().listElt("c1", bindMarker()).from("table1").where(eq("pk", 1));
    delete().mapElt("c1", bindMarker("key")).from("table1").where(eq("pk", 1));
    
  • It is now possible to create INSERT INTO statements supplying columns and values as Lists, e.g. INSERT INTO table1 (pk, c1) VALUES (1, 'foo') can now be built this way:
    List<String> columnNames = Lists.newArrayList("pk", "c1");
    List<Object> columnValues = Lists.<Object>newArrayList(1, "foo");
    insertInto("table1").values(columnNames, columnValues);
    

Logging & Debugging

Version 2.0.11 introduces two helpful tools to debug driver failures:

JAVA-720 now allows most exceptions thrown by the driver to report the address of the coordinator that triggered the error. A new interface, CoordinatorException, has been introduced, and most exceptions now implement it.

Thanks to JAVA-718 the driver now logs the stream ID of every request and response, at TRACE level. This can be helpful to correlate requests and responses in your application logs.

JAVA-765 introduces a new method to retrieve the values of a SimpleStatement: getObject(int), which returns the value at the given index. This can also be useful to log statement values.
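
As a sketch of how the two features might be used together when logging a failure: the table name and bound value are hypothetical, the snippet assumes an open Session named session and a SimpleStatement built with values, and the CoordinatorException accessor name (getAddress()) is an assumption to verify against the javadocs.

import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.CoordinatorException;
import com.datastax.driver.core.exceptions.DriverException;

SimpleStatement statement = new SimpleStatement("SELECT * FROM users WHERE id = ?", 42);
try {
    session.execute(statement);
} catch (DriverException e) {
    if (e instanceof CoordinatorException) {
        // getAddress() is assumed to return the coordinator that produced the error
        System.err.println("Failed on coordinator " + ((CoordinatorException) e).getAddress());
    }
    System.err.println("First bound value was: " + statement.getObject(0));
}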

Authentication

Thanks to JAVA-719, PlainTextAuthProvider now accepts runtime modifications of the credentials, through two new methods: setUsername(String) and setPassword(String): the new credentials will be used for all connections initiated after these methods are called.
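
A brief sketch with placeholder credentials:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PlainTextAuthProvider;

PlainTextAuthProvider authProvider = new PlainTextAuthProvider("app_user", "initial_secret");
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withAuthProvider(authProvider)
        .build();

// Later, e.g. after a credential rotation: connections opened from this point on
// will authenticate with the new username/password.
authProvider.setUsername("app_user");
authProvider.setPassword("rotated_secret");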

Getting the driver

As always, the driver is available from Maven and from our downloads server.

We’re also running a platform and runtime survey to improve our testing infrastructure. Your feedback would be most appreciated.


PHP Driver 1.0 GA


Hello PHPeople!

Today marks the first stable release of the DataStax PHP Driver for Apache Cassandra and DataStax Enterprise. Refer to the PHP Driver documentation for compatibility information.

With the very first beta coming out merely four months ago, the driver has come a long way thanks to the help and numerous contributions of the community.

This release of the DataStax PHP Driver for Apache Cassandra and DataStax Enterprise includes two new features in addition to everything introduced in the 1.0 release candidate:

Schema Metadata

The DataStax PHP Driver allows inspecting the schema of the Cassandra cluster that it is connected to. Call Cassandra\Session::schema() to get a hold of a Cassandra\Schema instance. From there you can access all keyspaces or an individual keyspace’s metadata using Cassandra\Schema::keyspaces() or Cassandra\Schema::keyspace() respectively.

The example below gives an adequate taste of what’s possible:

<?php

$schema = $session->schema();

foreach ($schema->keyspaces() as $keyspace) {
    printf("Keyspace: %s\n", $keyspace->name());
    printf("    Replication Strategy: %s\n", $keyspace->replicationClassName());
    printf("    Replication Options:\n");
    $options = $keyspace->replicationOptions();
    $keys    = $options->keys();
    $values  = $options->values();
    foreach (array_combine($keys, $values) as $key => $value) {
        printf("        %s: %s\n", $key, $value);
    }
    printf("    Durable Writes:       %s\n", $keyspace->hasDurableWrites() ? 'true' : 'false');

    foreach ($keyspace->tables() as $table) {
        printf("    Table: %s\n", $table->name());
        printf("        Comment:                    %s\n", $table->comment());
        printf("        Read Repair Chance:         %f\n", $table->readRepairChance());
        printf("        Local Read Repair Chance:   %f\n", $table->localReadRepairChance());
        printf("        GC Grace Seconds:           %d\n", $table->gcGraceSeconds());
        printf("        Caching:                    %s\n", $table->caching());
        printf("        Bloom Filter FP Chance:     %f\n", $table->bloomFilterFPChance());
        printf("        Memtable Flush Period Ms:   %d\n", $table->memtableFlushPeriodMs());
        printf("        Default Time To Live:       %d\n", $table->defaultTTL());
        printf("        Speculative Retry:          %s\n", $table->speculativeRetry());
        printf("        Index Interval:             %d\n", $table->indexInterval());
        printf("        Compaction Strategy:        %s\n", $table->compactionStrategyClassName());
        printf("        Populate IO Cache On Flush: %s\n", $table->populateIOCacheOnFlush() ? 'yes' : 'no');
        printf("        Replicate On Write:         %s\n", $table->replicateOnWrite() ? 'yes' : 'no');
        printf("        Max Index Interval:         %d\n", $table->maxIndexInterval());
        printf("        Min Index Interval:         %d\n", $table->minIndexInterval());

        foreach ($table->columns() as $column) {
            printf("        Column: %s\n", $column->name());
            printf("            Type:          %s\n", $column->type());
            printf("            Order:         %s\n", $column->isReversed() ? 'desc' : 'asc');
            printf("            Frozen:        %s\n", $column->isFrozen() ? 'yes' : 'no');
            printf("            Static:        %s\n", $column->isStatic() ? 'yes' : 'no');

            if ($column->indexName()) {
                printf("            Index:         %s\n", $column->indexName());
                printf("            Index Options: %s\n", $column->indexOptions());
            }
        }
    }
}

Types API

With the introduction of Schema Metadata, it became necessary to let users inspect the types of the columns defined in a given keyspace.

The newly introduced Cassandra\Type interface defines the contract for the methods that all types must support as well as declares a number of static methods for fluent type definition and creation.

Using the Type API you can define and create Maps, Sets and Collections as well as validate and coerce data into all supported scalar types.

A picture is worth a thousand words:

<?php

// define and instantiate a map<varchar, int>
$map = Cassandra\Type::map(Cassandra\Type::varchar(), Cassandra\Type::int())
                     ->create('a', 1, 'b', 2, 'c', 3, 'd', 4);

var_dump(array_combine($map->keys(), $map->values()));

// validate the data
$value = Cassandra\Type::int()->create('this will throw an exception');

Install using PECL

The DataStax PHP Driver for Apache Cassandra and DataStax Enterprise is now available via PECL; install it by following the installation instructions.


The Benefits of the Gremlin Graph Traversal Machine


A computer is a machine that evolves its state (data) according to a set of rules (program). Simple “one off” computers have their programs hardcoded. Programmable computers have an instruction set whereby parameterized instructions can be arbitrarily composed to yield different algorithms (different instruction sequences). If a programmable computer has a sufficiently complex instruction set, then not only can it simulate any “one off” computer, but it can also simulate any programmable computer. Such general-purpose computers are called universal. An example of a popular universal computer is the machine being used to read this blog post. Virtual machines emerged to enable the same program to run on different operating/hardware platforms (i.e. different machines). Popular virtual machines, such as the Java Virtual Machine, are universal in that they can be used to simulate other universal machines, including themselves.

In the domain of graph computing, the computer is a graph traversal machine. The data of this machine is a graph composed of vertices (dots, nodes) and edges (lines, links). The instruction set is called the machine’s step library. Steps are composed to form a program called a traversal. An example universal graph traversal machine is the Gremlin traversal machine.

Physical machine  | Java virtual machine | Gremlin traversal machine
------------------|----------------------|--------------------------
instructions      | bytecode             | steps
program           | program              | traversal
program counter   | program counter      | traverser
memory            | heap                 | graph

Machine Abstractions

This blog post will review the benefits of Apache TinkerPop’s Gremlin graph traversal machine for both graph language designers and graph system vendors. A graph language designer develops a language specification (e.g. SPARQL, GraphQL, Cypher, Gremlin) and a respective compiler for its evaluation over some graph system. A graph system vendor develops an OLTP graph database (e.g. Titan, Neo4j, OrientDB) and/or an OLAP graph processor (e.g. Titan/Hadoop, Giraph, Hama) for storing and processing graphs. The benefits of the Gremlin traversal machine to these stakeholders are enumerated below and discussed in depth in their respective sections following this prolegomenon.

  1. Language/system agnosticism: A graph language designer need only create a Gremlin traversal compiler for their language to execute over any graph system that supports the Gremlin traversal machine. Likewise, a graph system vendor that supports the Gremlin traversal machine supports any language that compiles to a Gremlin traversal. [Section 1]
  2. Provided traversal engine: A graph language designer need only concern themselves with writing a Gremlin traversal compiler, as they can rely on the Gremlin machine to handle traversal optimization and execution. Similarly, a graph system vendor only has to implement a few core abstractions to support the Gremlin traversal machine and can further enhance it with vendor-specific optimizations. [Section 2]
  3. Native distributed execution: A graph language designer need not concern themselves with the sequential, parallel, or distributed execution of their language as the Gremlin traversal machine can operate on a single machine or a multi-machine compute cluster. Additionally, if a graph system vendor implements the GraphComputer API, then Gremlin traversals can execute over their multi-machine partitioned graphs. [Section 3]
  4. Open source and Apache2 licensed: A graph language designer can contribute to the evolution of the Gremlin traversal machine. In turn, graph system vendors can also help advance Gremlin as Apache TinkerPop is part of the Apache Software Foundation and thus, is open source and free to use. [Section 4]

Language/System Agnosticism: “Many Graph Languages for Many Graph Systems”

Java is both a virtual machine and a programming language. The Java virtual machine (JVM) is a software representation of a programmable machine. The benefit of the JVM is that its software (a Java program) can be executed on any JVM-enabled operating system without being rewritten/recompiled to the instruction set of the underlying hardware machine. This feature is espoused by Java’s popular slogan “write once, run anywhere.” The Java programming language is a human-writable language that, when compiled by the Java compiler (javac), yields a sequence of instructions from the JVM’s instruction set called Java bytecode.

Java Virtual Machine Dataflow

Gremlin, like Java, is both a virtual machine and a programming language (or query language). The Gremlin traversal machine maintains a step library (i.e. instruction set) whose steps can be arbitrarily composed to enact any computable graph traversal algorithm. In other words, the Gremlin traversal machine is a programmable, universal machine (see The Gremlin Graph Traversal Machine and Language). Moreover, the Gremlin traversal machine can be supported by any graph system vendor, so Gremlin enjoys the same “write once, run anywhere” mantra. The Gremlin traversal language (aka Gremlin-Java8) is a human-writable language that, when compiled, yields a traversal that can be executed by the Gremlin traversal machine.

Gremlin Graph Traversal Machine Dataflow

Any Graph Language to Any Graph System

There is a noteworthy consequence of the virtual machine/programming language distinction. The Java virtual machine does not require the Java programming language. That is to say, any other programming language can have a compiler that generates Java bytecode (JVM instructions). It is this separation that enables other JVM languages to exist. Examples include Groovy, Scala, Clojure, JavaScript (Rhino), etc. Analogously, the Gremlin traversal machine does not require the Gremlin traversal language (Gremlin-Java8). Any other graph language can be compiled to a Gremlin traversal.1 It is because of this separation that other graph traversal languages exist, such as Gremlin-Groovy, Gremlin-Scala, and Gremlin-JavaScript. It is arguable that these Gremlin variants simply leverage the fluent interface of the Gremlin(-Java8) language within the programming constructs of their respective host languages. Regardless, nothing prevents any other graph language from being executed by a Gremlin traversal machine, such as SPARQL, GraphQL, Cypher, and the like.2 How is this possible? — the Gremlin traversal machine is universal and maintains an extensive step library of the common query motifs found in most every graph language.3

If there exists a compiler that translates language X to Gremlin traversals and graph system Y supports the Gremlin traversal machine, then graph language X can execute against graph system Y.
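
For a concrete feel of traversals as programs, here is a minimal sketch in plain Java (Gremlin-Java8) against TinkerPop’s bundled in-memory “modern” toy graph; the class names assume TinkerPop 3.0, and the graph and query are illustrative rather than the Grateful Dead dataset used below.

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;

import java.util.List;

public class GremlinMachineExample {
    public static void main(String[] args) {
        // The toy "modern" graph bundled with TinkerPop (people and the software they created).
        Graph graph = TinkerFactory.createModern();
        GraphTraversalSource g = graph.traversal();

        // Steps composed into a traversal: who created software written in java?
        List<Object> names = g.V().has("software", "lang", "java")
                              .in("created")
                              .values("name")
                              .dedup()
                              .toList();
        System.out.println(names);

        // Swapping TinkerGraph for any other TinkerPop-enabled graph system
        // (e.g. Titan) leaves the traversal itself unchanged.
    }
}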

Provided Traversal Engine: “SPARQL on the Gremlin Traversal Machine”

SPARQL is a graph query language used extensively in the RDF/Semantic Web community and supported by every RDF graph database vendor. SPARQL is not as popular in the property graph database space because the underlying data model is different. The primary mismatch is that property graph edges can have an arbitrary number of key/value pairs (called properties). However, this difference is minor when compared to their numerous similarities. Capitalizing on their alignment, it is possible to compile SPARQL to a Gremlin traversal and thus have a SPARQL query execute on any graph database/processor that supports the Gremlin traversal machine.

The examples to follow throughout the remainder of this post make use of the Grateful Dead graph distributed by Apache TinkerPop. The Grateful Dead graph is composed of song and artist vertices and respective interrelating edges. The graph schema is diagrammed below.

Grateful Dead Schema

A question that can be asked of this graph is:

Which song writers wrote songs that were sung by Jerry Garcia and performed by the Grateful Dead more than 300 times?

This query is expressed in Gremlin-Java8 as:

g.V().match(
  as("song").out("sungBy").as("artist"),
  as("song").out("writtenBy").as("writer"),
  as("artist").has("name","Garcia"),
    where(as("song").values("performances").is(gt(300)))
).select("song","writer").by("name")

It is possible to express this same query in SPARQL. However, in order for it to execute against a Gremlin-enabled graph system, it must be compiled to a Gremlin traversal. SPARQL-Gremlin was developed for this purpose.4 Apache Jena provides programmatic access to its SPARQL compiler, ARQ. The Jena SPARQL compiler builds a syntax tree that can be walked, and as the walk proceeds, Gremlin steps are concatenated and nested. The Gremlin-Java8 query above is represented in SPARQL below, where e: means “edge label” and v: means “property value.” The SPARQL query can be executed via a Java application or via the Gremlin-Console (REPL). A Gremlin-Console session using SPARQL-Gremlin over Titan (version 1.0) is provided below.5

SELECT ?songName ?writerName WHERE {
  ?song e:sungBy ?artist .
  ?song e:writtenBy ?writer .
  ?song v:name ?songName .
  ?writer v:name ?writerName .
  ?artist v:name "Garcia" .
  ?song v:performances ?performances .
    FILTER (?performances > 300)
}
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin> :install com.datastax sparql-gremlin 0.1
==>Loaded: [com.datastax, sparql-gremlin, 0.1]
gremlin> :plugin use datastax.sparql
==>datastax.sparql activated
gremlin> graph = TitanFactory.open('conf/titan-cassandra.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> :remote connect datastax.sparql graph
gremlin> query = """
  SELECT ?songName ?writerName WHERE {
    ?song e:sungBy ?artist .
    ?song e:writtenBy ?writer .
    ?song v:name ?songName .
    ?writer v:name ?writerName .
    ?artist v:name "Garcia" .
    ?song v:performances ?performances .
      FILTER (?performances > 300)
  }
"""
gremlin> :> @query
==>[songName:BERTHA, writerName:Hunter]
==>[songName:TENNESSEE JED, writerName:Hunter]
==>[songName:BROWN EYED WOMEN, writerName:Hunter]
==>[songName:CHINA CAT SUNFLOWER, writerName:Hunter]
==>[songName:CASEY JONES, writerName:Hunter]
==>[songName:BLACK PETER, writerName:Hunter]
==>[songName:RAMBLE ON ROSE, writerName:Hunter]
==>[songName:WHARF RAT, writerName:Hunter]
==>[songName:LADY WITH A FAN, writerName:Hunter]
==>[songName:HES GONE, writerName:Hunter]
==>[songName:LOSER, writerName:Hunter]
==>[songName:DEAL, writerName:Hunter]
==>[songName:SUGAREE, writerName:Hunter]
==>[songName:DONT EASE ME IN, writerName:Traditional]
==>[songName:UNCLE JOHNS BAND, writerName:Hunter]
==>[songName:SCARLET BEGONIAS, writerName:Hunter]
==>[songName:EYES OF THE WORLD, writerName:Hunter]
==>[songName:US BLUES, writerName:Hunter]
==>[songName:TERRAPIN STATION, writerName:Hunter]
==>[songName:STELLA BLUE, writerName:Hunter]
gremlin>

The generated Gremlin traversal “bytecode” (steps) is provided below. Unlike hardware machine code (or Java bytecode), a traversal is not just a linear concatenation of steps. It is possible for the parameters of a step to be a traversal (called an anonymous traversal). Step concatenation and traversal nesting yield the global structure of a Gremlin traversal.

[GraphStep([],vertex), 
  MatchStep(AND,[
    [MatchStartStep(song), VertexStep(OUT,[sungBy],vertex), MatchEndStep(artist)], 
    [MatchStartStep(song), VertexStep(OUT,[writtenBy],vertex), MatchEndStep(writer)], 
    [MatchStartStep(song), PropertiesStep([name],value), MatchEndStep(songName)], 
    [MatchStartStep(writer), PropertiesStep([name],value), MatchEndStep(writerName)], 
    [MatchStartStep(artist), PropertiesStep([name],value), IsStep(eq(Garcia)), MatchEndStep], 
    [MatchStartStep(song), PropertiesStep([performances],value), MatchEndStep(performances)]]),
  WhereTraversalStep([WhereStartStep(performances), IsStep(gt(300))]), SelectStep([songName, writerName])]

Traversal Strategies

Once the query is compiled to a Gremlin traversal, the Gremlin machine can optimize it both prior to and during its execution. There are two types of traversal strategies: compile-time and runtime. Compile-time strategies analyze the traversal and rewrite particular suboptimal step sequences into a more efficient form. For instance, the step sequence

...out().count().is(gt(10))...

is rewritten to the more efficient form

...outE().limit(11).count().is(gt(10))...

by means of the two compile-time strategies AdjacentToIncidentStrategy and RangeByIsCountStrategy. In the diagram above, traversal T1 is transformed via a sequence of strategies that ultimately yields traversal Tn. Note that T1, T2, Tn, etc., if executed as such, would all return the same result set. In essence, traversal strategies do not affect the semantics of the query, only its execution plan. Next, if a graph system has a particular feature such as indexing, the vendor can provide custom strategies to the compilation process. Finally, MatchStep makes use of a runtime strategy called CountMatchAlgorithm to dynamically re-order pattern execution based on runtime set-reduction statistics (amongst other analyses). In short, the larger the set reduction a particular pattern performs over time, the higher priority it has in the execution plan.

The Gremlin traversal machine will apply both compile-time and runtime optimizations (called traversal strategies) to any submitted traversal.

Native Distributed Execution: “A Gremlin Traversal over an OLAP Graph Processor”

The Gremlin traversal machine can not only execute a traversal compiled from any graph traversal language, but it can also execute the same traversal on a single machine or across a multi-machine compute cluster. There is no direct correlate to this on the Java virtual machine. If there were, it would be equivalent to saying that a distributed Java virtual machine provides a single address space across the entire cluster and can execute the same JVM code regardless of how many physical machines are involved. Perhaps the closest analogy is BigMemory by Terracotta, though the graph data in distributed Gremlin can exist both in memory and on disk.

Gremlin executes a traversal using traversers. Traversers maintain a reference to their location in the graph (data reference), to their location in the traversal (program counter), and to a local memory data structure (registers). When a traverser is confronted with a decision in the graph structure (e.g. multiple incident followedBy-edges), the traverser clones itself and takes each path. In even the simplest traversals, it is possible for billions of such traversers to be spawned due to recursion (repeat()-step) and the complex branching structure found in natural graphs (see Loopy Lattices Redux). Fortunately, a “bulking” optimization exists to limit traverser enumeration. This optimization is extremely important in distributed Gremlin where parallel execution can easily lead to a large memory footprint. The distributed traversal is complete when all existing traversers have halted (i.e. no more steps to execute). The result of the query is the aggregate of the locations of all halted traversers.

Distributed Gremlin Traversal Machine
Distributed Serialization Across Machine Boundaries

For distributed OLAP graph processors such as Apache Giraph (via Giraph-Gremlin), the graph data set is partitioned across the machines in the cluster (top left figure). Each machine is responsible for a subset of the vertices in the graph. Each vertex has direct reference to its properties, its incoming and outgoing edges, as well as the properties of those incident edges. This atomic data structure is called a “star graph.” The local edges of a vertex reference other vertices by their globally unique id. When a Gremlin traversal is submitted to the cluster it is copied to all machines. Each machine spawns a traverser at each vertex it maintains with the traverser’s initial step being the first step in the traversal. When a traverser traverses an edge that leads to a vertex that is stored on another machine in the cluster, the traverser serializes itself and sends itself to the machine containing its referenced vertex (top right figure). Serialization is expensive so a good graph partition is desirable to reduce the number of “machine hops” the traversers have to take to solve the traversal.

In the Gremlin-Console session below, the previous SPARQL query is executed in a distributed manner using Apache Spark (via Spark-Gremlin). The data accessed by Spark is pulled from Titan (CassandraInputFormat). Titan is a distributed graph system that supports both OLTP graph database operations and OLAP graph processor operations.

gremlin> graph = GraphFactory.open('conf/titan-cassandra-hadoop.properties')
==>hadoopgraph[cassandrainputformat->gryooutputformat]
gremlin> g = graph.traversal(computer(SparkGraphComputer))
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> :remote connect datastax.sparql g
==>SPARQL[graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]]
gremlin> query = """
  SELECT ?songName ?writerName WHERE {
    ?song e:sungBy ?artist .
    ?song e:writtenBy ?writer .
    ?song v:name ?songName .
    ?writer v:name ?writerName .
    ?artist v:name "Garcia" .
    ?song v:performances ?performances .
      FILTER (?performances > 300)
  }
"""
gremlin> :> @query
[Stage 0:===>                                            ]
[Stage 0:========>                                       ]
[Stage 0:==================>                             ]
...
==>[songName:DONT EASE ME IN, writerName:Traditional]
==>[songName:EYES OF THE WORLD, writerName:Hunter]
==>[songName:BLACK PETER, writerName:Hunter]
==>[songName:HES GONE, writerName:Hunter]
==>[songName:STELLA BLUE, writerName:Hunter]
==>[songName:BERTHA, writerName:Hunter]
==>[songName:WHARF RAT, writerName:Hunter]
==>[songName:DEAL, writerName:Hunter]
==>[songName:TERRAPIN STATION, writerName:Hunter]
==>[songName:BROWN EYED WOMEN, writerName:Hunter]
==>[songName:SUGAREE, writerName:Hunter]
==>[songName:TENNESSEE JED, writerName:Hunter]
==>[songName:LADY WITH A FAN, writerName:Hunter]
==>[songName:LOSER, writerName:Hunter]
==>[songName:SCARLET BEGONIAS, writerName:Hunter]
==>[songName:RAMBLE ON ROSE, writerName:Hunter]
==>[songName:CASEY JONES, writerName:Hunter]
==>[songName:CHINA CAT SUNFLOWER, writerName:Hunter]
==>[songName:UNCLE JOHNS BAND, writerName:Hunter]
==>[songName:US BLUES, writerName:Hunter]
gremlin> 

Any Gremlin traversal compiled from any query language can be executed by the Gremlin traversal machine on a single-machine or a multi-machine compute cluster.

Open Source and Apache2 Licensed: “Who/Where/When/What is The TinkerPop?”

Gremlin is designed and developed by Apache TinkerPop. TinkerPop is licensed Apache2 and thus, the software is free to use for any purpose commercial or otherwise. Apache TinkerPop is also open source with an active development community. There are four general types of stakeholders in the community.

  • TinkerPop committers: The core developers of TinkerPop. This core designs, develops, and documents Gremlin. Furthermore, they communicate with the other stakeholders to ensure that the Gremlin machine architecture and language evolve accordingly.
  • Graph system vendors: The commercial or open-source providers of graph system technology. These vendors ensure that their products are “TinkerPop-enabled” (i.e. have support for the Gremlin traversal machine). Furthermore, these vendors develop custom strategies to further optimize Gremlin traversals for their particular system.
  • Graph language designers: The creators of graph traversal languages with compilers for the Gremlin traversal machine. The Gremlin-XYZ language line is embedded in popular programming languages such that the developer’s database query language and application language are one and the same. With direct access to the Gremlin traversal machine, other languages that are not embedded in a host language (e.g. SPARQL, Cypher, GraphQL) can now be provided.
  • TinkerPop users: The everyday users of the graph technology developed and distributed by the previous stakeholders. With TinkerPop being vendor agnostic, users are not locked into a particular system/language and thus, can explore the unique advantages that each system and language provides.

Apache TinkerPop is an open source project that is open to contributions of all kind and is free to use for any purpose commercial or otherwise.

Conclusion

Gremlin is both a graph traversal machine and a graph traversal language. Any graph language can be compiled to Gremlin and evaluated against any of the numerous TinkerPop-enabled graph systems in existence today. Previous to this exposition, Gremlin had been presented as solely being agnostic to the underlying graph system. However, Gremlin is also agnostic to the user-facing query language that reads and writes data to and from the underlying graph system. Simply put, TinkerPop enables graph developers to use any query language with any graph system.

Acknowledgements

This blog post was written by Marko A. Rodriguez under the inspiration of The Gremlin Graph Traversal Machine and Language article. Daniel Kuppitz developed SPARQL-Gremlin as a proof-of-concept demonstrating that any arbitrary graph language can be compiled to Gremlin. Matthias Bröcheler, upon reviewing this article, stated: “It would be sweet if somebody wrote SQL-Gremlin.” A quick Google search revealed JSQLParser which has an analogous architecture to the ARQ parser used by SPARQL-Gremlin. The Gremlins in the title logo are (from left to right) Grem Stefani, The Grem Reaper, Gremicide, Gremlin the Grouch, Ain’t No Thing But a Chicken Wing, Gremlivich, Gremopoly, Gremalicious, and Clownin’ Around. The contributors would like to thank the Apache Software Foundation for their continued support of TinkerPop and DataStax for sponsoring this research effort.

Thank you and good night.


Footnotes

1. Apache TinkerPop’s Gremlin traversal machine is written in Java and intended to be used by JVM-based graph systems. While most graph systems are JVM-based, there are some that aren’t. For those that aren’t, TinkerPop’s Gremlin can be leveraged via the Java Native Interface. However, the Gremlin traversal machine and language specifications are quite simple and thus, can be implemented in the native language of the underlying graph system wishing to capitalize on the benefits of the Gremlin traversal machine and language.

2. Note that compilers for GraphQL and Cypher currently do not exist. The purpose of this discussion is to state that it is feasible for someone to write a Gremlin compiler for these languages. In this way, if a particular graph query language is more appreciated by the user, they can leverage it for whichever Gremlin-enabled graph system they wish.

3. If the underlying graph system already has support for a particular query language, then it may be pointless to do a compilation to the Gremlin traversal machine. For instance, Cypher is the primary language of Neo4j and SPARQL is the primary language of Stardog. The designers of these graph systems focus on ensuring that the queries of their respective languages execute as fast as possible on their respective systems. However, the reason it “may be pointless” is because, like C++ vs. Java, if the optimizers of the Gremlin traversal machine are advanced enough, then in theory, a compilation of these languages to the Gremlin traversal machine may be faster than the respective supported language engine.

4. SPARQL-Gremlin version 0.1 (September 17, 2015) is a prototype demonstrating the primary aspects of SPARQL being compiled to Gremlin. SPARQL-Gremlin version 0.1 only supports SELECT, WHERE, FILTER, and DISTINCT. Expanding its capabilities is a function of adding more translations to the compiler (e.g. ORDER, GROUP BY, etc.), where pull-requests are more than welcome.

5. The Gremlin-Console is a thin wrapper around the Groovy Shell and thus, can only interpret Gremlin-Groovy (a superset of Gremlin-Java). As such, the SPARQL query is represented as a String and passed to the SPARQL-Gremlin compiler via the datastax.sparql Gremlin Plugin.

Exciting Changes to KillrVideo Sample Application and Website


If you haven’t checked out the DataStax KillrVideo sample application and website yet, or if you haven’t visited the site in a while, let me encourage you to take a look. The KillrVideo team at DataStax has recently added some pretty interesting new features that help you better understand how to use Cassandra and DataStax Enterprise.

As a reminder, KillrVideo is a sample web application developed in C# that uses DataStax Enterprise running on Microsoft Azure as its database platform. The site showcases a number of DataStax Enterprise components in action, including Apache Cassandra and, now, DSE Analytics and DSE Search. The application’s code, data models, data and more are available for free on GitHub.

The use cases supported by the application include product catalog, user activity tracking, and user personalization with recommendations.

The new additions to the KillrVideo sample application and website include:

  • A guided tutorial (see the navigation bar widget to turn the Tour on or off) that walks you through the website step by step and shows you what’s happening under the covers where database access and activity are concerned.
  • Utilization of DSE Search that enables you to easily search the site for desired videos and information.
  • Recommendations that are powered by DSE Analytics (specifically, Spark).

All code for KillrVideo is freely available so you can learn exactly how to replicate the functionality for your own applications. Enjoy!

Fraud Prevention in DSE


In today’s technology-driven world, online fraud is one of the biggest problems for online providers. From your local Starbucks credit card machine to the world’s biggest online shops, everyone has to deal with fraud. The use of stolen credit cards is by far the biggest contributor to online fraud, and the use of mobile devices for financial transactions has increased the risks significantly. Detecting fraud can be difficult and costly, so what can we do to prevent it happening in the first place? This document will hopefully help you understand how DataStax Enterprise can reduce a business’ exposure to fraud and potentially prevent it happening in the first place. DataStax Enterprise can handle millions of low-latency transactions per second, which allows online algorithms to process transactions in real time and stop fraudulent transactions before they happen.

Current view of Fraud detection and prevention

Card fraud happens in many different ways, and online transactions are often the hardest to detect. The main problem is that by the time fraud has been detected, the transaction has already occurred and someone is out of pocket.

Let’s quickly look at the parties involved in a transaction:

  1. The Consumer – the entity who wants to make a purchase (point of sale, online, etc.)
  2. The Merchant – the other entity in the transaction (website, cashier, etc.)
  3. The Acquirer – a financial institution that handles the transaction for the merchant (e.g. Worldpay or PayPal)
  4. The Credit Card Network – the credit card service that handles the transaction between the acquirer and the issuer
  5. The Issuer – the bank or financial institution that provided the consumer with their credit card (e.g. HSBC, Citi)

In terms of preventing fraud, the Acquirer, Issuer and Credit Card Network each have some process in place to try to stop a fraudulent transaction before it happens.

  1. The Acquirer works for the merchant so they may provide some validation checks before sending the transaction on to the Credit Card Network.
  2. The Credit Card Network usually has watchlists and blacklists of card numbers that they want to monitor.
  3. The Issuer will be required to check if the consumer has enough available credit to process the transaction and check the status and pin of the card.

Here is an overview of a credit card transaction flow, all of which has to happen in less than a second.

Diagram 1.1


Fraud prevention

Criminals are always looking for devious ways to circumvent this process. To combat this, we can give more power to both the acquirers and the issuers. In the process above, the credit card company is going to be the bottleneck, but there shouldn’t be too much transaction flow for the acquirer and the issuer.

Using DataStax Enterprise, the acquirer can:

  1. Create unique rules for each merchant
  2. Scan previous transactions to ensure consistency
  3. Help to stop employee fraud by ensuring that transactions with certain attributes need confirmation by senior staff.

For example, take a busy coffee shop where 99% of card transactions are under £20 and 100% are under £50. We can create a rule that if, at any stage, a transaction over £50 is requested, a senior member of staff is notified and confirmation is required. We can also notify the merchant if, on a given day, it sees an unusual number of transactions above £20. This is a simple idea, but it requires effort on the part of the acquirer to ensure these requirements are met and that the required people are notified, as sketched in the code below.
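
A trivial sketch of such a per-merchant rule; the threshold value and class names are hypothetical.

import java.math.BigDecimal;

// Hypothetical rule: transactions over £50 need senior-staff confirmation.
public class MerchantRules {
    private static final BigDecimal CONFIRMATION_THRESHOLD = new BigDecimal("50.00");

    public static boolean needsConfirmation(BigDecimal amount) {
        return amount.compareTo(CONFIRMATION_THRESHOLD) > 0;
    }

    public static void main(String[] args) {
        System.out.println(needsConfirmation(new BigDecimal("3.20")));  // false
        System.out.println(needsConfirmation(new BigDecimal("75.00"))); // true
    }
}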

Let’s now look at the consumer. They take out a credit card with an issuing bank and usually have some web interface to see their transactions and pay their bills. With DataStax Enterprise, we can give a consumer more control over the transactions in their account.

For example, giving the issuer the power to:

  1. Require confirmation for transactions over a certain amount
  2. Require confirmation for transactions from a merchant not in their history
  3. Apply rules for certain merchants, e.g. a maximum of £20 in transactions to the Apple App Store per month
  4. Receive notifications for every transaction in real time, with the option of stopping the transaction if needed.

DataStax Enterprise is a product built to meet these needs and more.

DataStax Enterprise – A transactional database

What is Apache Cassandra?

Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure (Wikipedia). It’s used in production today by some of the largest transactional companies in the world like UBS, Credit Suisse and Bank of America.

What/Who is DataStax?

DataStax delivers Apache Cassandra™ in a database platform that meets the performance and availability demands of Internet-of-things (IoT), Web, and Mobile applications. It gives enterprises a secure, fast, always-on database that remains operationally simple when scaled in a single datacenter or across multiple datacenters and clouds. Along with Cassandra, DataStax incorporates complementary technologies like in-memory, advanced security, search and analytics using Apache Spark™. JSON and graph support will be added to that list later this year.

Netflix reinvented its business from DVDs by mail to online media on DataStax Enterprise in the cloud and now processes 10 million transactions per second to give users the most personalized viewing experience. Why did they choose DataStax? Quite simply because nothing else could do what DataStax does. The financial sector is going through a shift in both technology and thinking at the moment; open source offerings and big data requirements are becoming aligned, and the time is right to take on new projects that give control back to customers and clients.

(http://www.datastax.com/personalization/netflix)

Diagram 1.2 DataStax Enterprise global deployment.

Diagram 1.2 shows a typical global deployment with data centers in the US, Europe and Asia. These can be either physical or in the cloud. This allows clients to connect to their local data centers to avoid high latencies in their applications while also providing 100% availability even if a data center was to fail.

Fraud Prevention Architecture

Let’s look at how we could implement new applications for acquirers and issuers to fulfil the requirements that we have above. Beginning with the issuer, how do we address the following important requirements:

  • Availability
  • Speed
  • Scalability
  • Security

Availability

In today’s world any client-facing application is a critical one. Not providing certain functionality is one thing, but if a system is down when your clients need it, for any reason, it is a failure and will ultimately hurt your business and brand. DataStax provides a database system with 100% uptime and continuous availability. So when a server goes down, a network fails, an upgrade needs to occur or even a datacenter is flooded, DataStax can still serve 100% of requests. Because Apache Cassandra is a peer-to-peer system, there is no single point of failure, so it is fault tolerant. This should be the number one requirement of any critical system.

Speed

To process millions of transactions per second your database has to be fast; really fast. Each DataStax node/server can process tens of thousands of transactions per second, so by scaling the system to a large cluster of nodes, DataStax can process any number of transactions in a persistent and continuously available system. Along with throughput comes latency. To meet the latency requirements of a query-driven system like the one we want for fraud prevention, we need to be able to write and read at incredible speed. DataStax’s data model allows related data to be grouped together for low-latency reads. This is perfect for both reading and writing transactions for a particular credit card user, so millions of transactions can be processed at the same time without contention.
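
As a sketch of the kind of data model this implies, using the Java driver with hypothetical keyspace, table and column names:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CardTransactions {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Partition by card number, cluster by time: all of a card's recent
        // transactions live together and can be read with a single request.
        session.execute("CREATE KEYSPACE IF NOT EXISTS fraud WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        session.execute("CREATE TABLE IF NOT EXISTS fraud.transactions_by_card ("
                + "card_number text, tx_time timeuuid, merchant text, amount decimal, "
                + "PRIMARY KEY (card_number, tx_time)) "
                + "WITH CLUSTERING ORDER BY (tx_time DESC)");

        // Latest 20 transactions for one card, e.g. to drive a real-time rule check.
        ResultSet rs = session.execute(
                "SELECT merchant, amount FROM fraud.transactions_by_card WHERE card_number = ? LIMIT 20",
                "4111-1111-1111-1111");
        for (Row row : rs) {
            System.out.println(row.getString("merchant") + " " + row.getDecimal("amount"));
        }
        cluster.close();
    }
}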

Scalability

Where master/slave architectures always suffer from bottlenecks, peer-to-peer systems work more like web servers. With DataStax, adding servers is as simple as pointing a new server at an existing cluster. Scaling a DataStax cluster from 10 to 100 servers could take as little as a few hours with pre-provisioned servers, all without having to change the client application. The drivers provided by DataStax automatically load-balance and redirect requests to another data center if a client’s local data center becomes unavailable.

Security

DataStax Enterprise inherits the basic security feature set provided in open source Apache Cassandra™ and builds upon it to provide a set of commercial security extensions that enterprises need to protect critical data. For more complex security requirements, our partner Vormetric, offers a comprehensive data security solution for the data stored in DataStax Enterprise and helps organizations comply with PCI-DSS requirements. See Appendix A for a white paper on PCI security.

Conclusion

Financial companies are fast becoming aware of the challenge they face with regards to how they treat their customers. It’s easy to switch credit card providers so new issuers will be creating new and exciting tools and features to lure customers away from traditional cards. Companies like Final (https://getfinal.com) are giving the customer more powerful and user driven features that allow customers to have more control. Existing issuers and acquirers must provide similar features if they are to retain their customers.

Appendix A

PCI Compliance Architecture

http://www.datastax.com/wp-content/uploads/2013/12/WP-DataStax-Enterprise-PCI-Compliance.pdf?1

Geospatial search and Spark in DataStax Enterprise


In this post I will discuss using geospatial search in DataStax Enterprise (DSE) Search and with Apache Spark as part of DSE Analytics. I will also provide a demo project that you can download and try.

No ETL

Most search tools like Solr or Elasticsearch have a geospatial search feature which allows users to ask questions like ‘give me all locations within 1 km of the given co-ordinates’. This is a requirement we see increasingly, especially with mobile applications. In most cases, this requires you to ETL (extract, transform and load) the data from the main database to a specific search tool. There are a lot of disadvantages to this approach, especially the fact that we now need to keep the two sources in sync. DSE allows you to keep one version of your data that you use both for realtime access and for specialised search queries.

DSE SearchAnalytics

DSE allows you to create a node with three complementary features:

  1. A Cassandra node for storing realtime transactional data
  2. A Solr web application for all search and geospatial queries on the realtime data in Cassandra
  3. A Spark Worker to allow for analytics queries based on both the realtime data in Cassandra and the indexes provided by Solr.

One set of data is used in many ways to provide multiple features. I have personally been part of projects where the main dataset is held in an RDBMS and ETL’ed to a specialised search tool and also to a Hadoop cluster. Well, that time has passed.

Example

The following example is available on my GitHub page. The example loads all UK postcodes together with the longitude and latitude of their locations.

We can see our data by running: select post_code, lon_lat from postcodes;

DSE-SA-Select

We can also query our data using a Solr query like this:
select post_code, lon_lat from postcodes where solr_query = '{"q": "post_code:SW209AQ"}';

DSE-SA-SelectSolr

Now we can move into our geospatial queries and ask questions like:

show me all postcodes within a km of ‘SW20 9AQ’?

select * from postcodes where solr_query = '{"q": "*:*", "fq": "{!geofilt sfield=lon_lat pt=51.404970234124800,-.206445841245690 d=1}"}';

I can also connect to my data through Apache Spark using the Cassandra connector. I can use it in several ways, e.g. the cassandraTable method, the CassandraConnector class, and Spark SQL. The following sections show each approach.

Using Cassandra Table

 
//Get data within a 1km radius
sc.cassandraTable("datastax_postcode_demo", "postcodes").select("post_code").where("solr_query='{\"q\": \"*:*\", \"fq\": \"{!geofilt sfield=lon_lat pt=51.404970234124800,-.206445841245690 d=1}\"}'").collect.foreach(println)
 
//Get data within a rectangle 
 sc.cassandraTable("datastax_postcode_demo", "postcodes").select("post_code").where("solr_query='{\"q\": \"*:*\", \"fq\": \"lon_lat:[51.2,-.2064458 TO 51.3,-.2015418]\"}'").collect.foreach(println)

Filtering with radius and box bounds

 
import com.datastax.spark.connector.cql.CassandraConnector
import scala.collection.JavaConversions._

//Get data within a 1km radius
 CassandraConnector(sc.getConf).withSessionDo { session => session.execute("select * from datastax_postcode_demo.postcodes where solr_query='{\"q\": \"*:*\", \"fq\": \"{!geofilt sfield=lon_lat pt=51.404970234124800,-.206445841245690 d=1}\"}'")
 }.all.foreach(println)

//Get data within a 1km bounded box
CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute("select post_code, lon_lat from datastax_postcode_demo.postcodes where solr_query='{\"q\": \"*:*\", \"fq\": \"{!bbox sfield=lon_lat pt=51.404970234124800,-.206445841245690 d=1}\"}'")
}.all.foreach(println)

Spark SQL

import org.apache.spark.sql.cassandra.CassandraSQLContext

// create a Cassandra-aware SQL context (some environments, e.g. the DSE Spark shell, may already provide one)
val csc = new CassandraSQLContext(sc)

//Get data within a 1km radius
val rdd = csc.sql("select post_code, lon_lat from datastax_postcode_demo.postcodes where solr_query='{\"q\": \"*:*\", \"fq\": \"{!geofilt sfield=lon_lat pt=51.404970234124800,-.206445841245690 d=1}\"}'")
rdd.collect.foreach(println)

//Get data within a rectangle
val rdd2 = csc.sql("select post_code, lon_lat from datastax_postcode_demo.postcodes where solr_query='{\"q\": \"*:*\", \"fq\": \"lon_lat:[51.2,-.2064458 TO 51.3,-.2015418]\"}'")
rdd2.collect.foreach(println)

Want to learn more?

Visit DataStax Academy for tutorials, demos and self-paced training courses.

Query the Northwind Database as a Graph Using Gremlin



Gremlin artwork by Ketrina Yim — ”safety first.”

One of the most popular and interesting topics in the world of NoSQL databases is graph. At DataStax, we have invested in graph computing through the acquisition of Aurelius, the company behind TitanDB, and we are especially committed to ensuring the success of the Gremlin graph traversal language. Gremlin is part of the open source Apache TinkerPop graph framework project and is a graph traversal language used by many different graph databases.

I want to introduce you to a superb web site that our own Daniel Kuppitz maintains called “SQL2Gremlin” (http://sql2gremlin.com), which I think is a great way to start learning how to query graph databases for those of us who come from the traditional relational database world. It is full of excellent sample SQL queries from the popular public-domain RDBMS dataset Northwind and demonstrates how to produce the same results using Gremlin. For me, learning by example has been a great way to get introduced to graph querying, and I think you’ll find it very useful as well.

I’m only going to walk through a couple of examples here as an intro to what you will find at the full site. But if you are new to graph databases and Gremlin, then I highly encourage you to visit the sql2gremlin site for the rest of the complete samples. There is also a nice interactive visualization/filtering/search tool here that helps visualize the Northwind data set as it has been converted into a graph model.

I’ve worked with (and worked for) Microsoft SQL Server for a very long time. Since Daniel’s examples use T-SQL, we’ll stick with SQL Server for this blog post as an intro to Gremlin and we’ll use the Northwind samples for SQL Server 2014. You can download the entire Northwind sample database here. Load that database into your SQL Server if you wish to follow along.

In order to execute the graph queries using Gremlin as well, you can download the Gremlin console and run the queries via in-memory graphs as a way to become familiar with Gremlin and graph traversals. You can also download the Titan 1.0 graph database here, which also provides a Gremlin console that can persist the graph in Cassandra.

I’ll highlight 3 examples from the SQL2Gremlin site that I want to share with you as beginners:

Starting off with a simple example, here is a very common SQL query where you select multiple fields from a single table and filter based on 2 WHERE clause conditions:

SELECT ProductName, UnitsInStock
FROM Products
WHERE Discontinued = 1 AND UnitsInStock > 0

In SQL Server 2014 Management Studio, these were my results:

Picture1

And then in the Gremlin shell, this is the query you would use and subsequent result set:

g.V().has("product", "discontinued", true).has("unitsInStock", neq(0)).valueMap("name", "unitsInStock")
==> [unitsInStock:[29], name:[Mishi Kobe Niku]]
==> [unitsInStock:[20], name:[Guaraná Fantástica]]
==> [unitsInStock:[26], name:[Rössle Sauerkraut]]
==> [unitsInStock:[26], name:[Singaporean Hokkien Fried Mee]]

I’m skipping the instantiation of the Graph and traversal objects. Follow along on Daniel’s site to set those up in the Gremlin console. You’ll find that using g to represent the graph traversal source is a common convention, and the basic Gremlin documentation will walk you through the methods available for vertex searches.
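If you want a feel for what that setup looks like before heading over there, the sketch below launches the console and creates an empty in-memory graph plus the g traversal source; loading the Northwind sample itself is left to the script provided on Daniel’s site, so the last line is only a placeholder:

$ bin/gremlin.sh
gremlin> graph = TinkerGraph.open()
gremlin> g = graph.traversal()
gremlin> // load the Northwind sample data here, using the script from sql2gremlin.com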

Next, since relationships are first-class citizens in a graph database, you can very easily traverse relationships without needing to build those relationships via query-time joins like you see here in SQL:

SELECT Products.ProductName
FROM Products
INNER JOIN Categories
ON Categories.CategoryID = Products.CategoryID
WHERE Categories.CategoryName = 'Beverages'

In Gremlin, this translates to a simple traversal: you look for the category vertex whose name property is “Beverages” and then follow the incoming “inCategory” edges, which gives you the Product vertices:

g.V().has("name","Beverages").in("inCategory").values("name")

Finally, here is my favorite: an example of a recommendation query in T-SQL vs. Gremlin. The idea is to recommend the top 5 products for a customer, ranked by how often they were ordered by other customers who bought the same products. I took the full T-SQL for the recommender query, and here is what the results look like in my SQL Server 2014 Management Studio:

SELECT TOP (5) [t14].[ProductName]
FROM (SELECT COUNT(*) AS [value], [t13].[ProductName]
FROM [customers] AS [t0]
CROSS APPLY (SELECT [t9].[ProductName]
FROM [orders] AS [t1]
CROSS JOIN [order details] AS [t2]
INNER JOIN [products] AS [t3]
ON [t3].[ProductID] = [t2].[ProductID]
CROSS JOIN [order details] AS [t4]
INNER JOIN [orders] AS [t5]
ON [t5].[OrderID] = [t4].[OrderID]
LEFT JOIN [customers] AS [t6]
ON [t6].[CustomerID] = [t5].[CustomerID]
CROSS JOIN ([orders] AS [t7]
CROSS JOIN [order details] AS [t8]
INNER JOIN [products] AS [t9]
ON [t9].[ProductID] = [t8].[ProductID])
WHERE NOT EXISTS(SELECT NULL AS [EMPTY]
FROM [orders] AS [t10]
CROSS JOIN [order details] AS [t11]
INNER JOIN [products] AS [t12]
ON [t12].[ProductID] = [t11].[ProductID]
WHERE [t9].[ProductID] = [t12].[ProductID]
AND [t10].[CustomerID] = [t0].[CustomerID]
AND [t11].[OrderID] = [t10].[OrderID])
AND [t6].[CustomerID] <> [t0].[CustomerID]
AND [t1].[CustomerID] = [t0].[CustomerID]
AND [t2].[OrderID] = [t1].[OrderID]
AND [t4].[ProductID] = [t3].[ProductID]
AND [t7].[CustomerID] = [t6].[CustomerID]
AND [t8].[OrderID] = [t7].[OrderID]) AS [t13]
WHERE [t0].[CustomerID] = N'ALFKI'

 

(screenshot: recommendation query results in SQL Server 2014 Management Studio)

By contrast, you can see the much more natural language of Gremlin, where you traverse the relationships that are stored directly in the graph model. Your starting point is the vertex for the customer with ID “ALFKI”, which you label “customer”. Use the graph model diagram that I copied from Daniel’s site below: follow the “out” arrows in the query from “ordered” to “contains” to “is” to end up at Product. You can then walk back up the graph using the “in” edges and eliminate the current customer using the “customer” label.

gremlin> g.V().has("customerId", "ALFKI").as("customer").
out("ordered").out("contains").out("is").aggregate("products").
in("is").in("contains").in("ordered").where(neq("customer")).
out("ordered").out("contains").out("is").where(without("products")).
groupCount().by("name").
order(local).by(valueDecr).mapKeys().limit(5)
==>Gorgonzola Telino
==>Guaraná Fantástica
==>Camembert Pierrot
==>Chang
==>Jack's New England Clam Chowder

(diagram: Northwind graph model, copied from sql2gremlin.com)

So, that’s it for now. Again, please be sure to see the rest of the samples on Daniel’s site: http://sql2gremlin.com and the rest of the Apache TinkerPop project for graph computing. Also, stay tuned for upcoming announcements here from DataStax regarding the addition of a native Cassandra-based graph database that will be part of an upcoming DataStax Enterprise release.

Tuple and UDT support in DSE Search

Tuples and UDTs are convenient ways to handle data structures whose fields usually go together (check the Cassandra online documentation for details). DSE Search 4.8 and later supports them, and in this blog post we explain how to use them best.

Set-up for the demo

1. Start by creating a keyspace, a couple of user-defined types, and a CQL table:
CREATE KEYSPACE udt WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

USE udt;

CREATE TYPE Alias (
    known_alias text,
    use_alias_always boolean,
    alternate_alias_ frozen<map<text, text>>
);
CREATE TYPE Name (firstname text, surname text, alias_data frozen<Alias>);

CREATE table demo (
  "id" VARCHAR PRIMARY KEY,
  "name" frozen<Name>,
  "friends" list<frozen<Name>>,
  "magic_numbers" frozen<tuple<int, int, int>>,
  "status" VARCHAR);

2. Now create a DSE Search core against that table

dsetool create_core udt.demo generateResources=true

And that’s it. You’re ready to go.

Inserting some data

Let’s insert some data via CQL:

insert into demo (id, name, friends, magic_numbers) values ('2', {firstname:'Sergio', surname:'Bossa', alias_data:{known_alias:'Sergio', use_alias_always:false}}, [{firstname:'Berenguer', surname:'Blasi'}, {firstname:'Maciej', surname:'Zasada'}], (23,543,234));

Now let’s insert some more data by using the HTTP interface with curl:
curl http://localhost:8983/solr/udt.demo/update -H 'Content-type:application/json' -d '[{"id":"1","name":"{\"firstname\":\"Berenguer\", \"surname\": \"Blasi\", \"alias_data\":{\"know_alias\":\"Bereng\", \"use_alias_always\":true}}", "friends":"[{\"firstname\":\"Sergio\", \"surname\": \"Bossa\"}, {\"firstname\":\"Maciej\", \"surname\": \"Zasada\"}]", "magic_numbers":"{\"field1\":14, \"field2\":57, \"field3\":65}" }]'

Use of the HTTP interface is discouraged; the CQL insert is the preferred method. The HTTP example has been included in this exercise for the sake of completeness.

Notable points

If we take a look at the auto-generated schema, we will see that all UDTs and Tuples have been exploded into their individual fields following a ‘dot notation’. This dot notation is how we will navigate and refer to specific subfields in our queries.

dsetool get_core_schema udt.demo
...
<fields>
<field indexed="true" multiValued="false" name="id" stored="true" type="StrField"/>
<field indexed="true" multiValued="true" name="friends" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="friends.firstname" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="friends.surname" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="friends.alias_data" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="friends.alias_data.known_alias" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="friends.alias_data.use_alias_always" stored="true" type="BoolField"/>
<dynamicField indexed="true" multiValued="false" name="friends.alias_data.alternate_alias_*" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="name" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="name.firstname" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="name.surname" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="name.alias_data" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="name.alias_data.known_alias" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="name.alias_data.use_alias_always" stored="true" type="BoolField"/>
<dynamicField indexed="true" multiValued="false" name="name.alias_data.alternate_alias_*" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="status" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="magic_numbers" stored="true" type="TupleField"/>
<field indexed="true" multiValued="false" name="magic_numbers.field1" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="magic_numbers.field2" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="magic_numbers.field3" stored="true" type="TrieIntField"/>
</fields>
...

Instead of auto-generating the schema, we could have used the ‘dsetool infer_solr_schema’ command to propose a schema, fine-tuned the proposal to our needs, and then created the core with the tuned schema. This gives the same flexibility as with any other datatype: UDT and Tuple subfields are treated just like any other fields, except that you use the dot notation when referring to them.
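A rough sketch of that flow is below; the file names are arbitrary, it assumes infer_solr_schema writes the proposed schema to stdout the same way get_core_schema does, and it assumes you supply both a schema and a solrconfig to create_core when generateResources is not used:

dsetool infer_solr_schema udt.demo > proposedSchema.xml
# edit proposedSchema.xml: drop or re-type any UDT/Tuple subfields you don't want indexed
dsetool create_core udt.demo schema=proposedSchema.xml solrconfig=solrconfig.xml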

Let’s start searching

To search over UDT and Tuple fields, simply use the {!tuple} query parser in your queries:

1. ‘Basic’ querying

select * from udt.demo where solr_query='*:*';

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
  1 |    [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |   (14, 57, 65) |   {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null
  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

2. Querying a UDT subfield
select * from udt.demo where solr_query='{!tuple}name.firstname:Berenguer';

 id | friends                                                                                                                 | magic_numbers | name                                                                                                                        | solr_query | status
----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------
  1 | [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |  (14, 57, 65) | {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

3. Querying a Tuple subfield
select * from udt.demo where solr_query='{!tuple}magic_numbers.field1:14';

 id | friends                                                                                                                 | magic_numbers | name                                                                                                                        | solr_query | status
----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------
  1 | [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |  (14, 57, 65) | {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

Notice how Tuple subfields, lacking a field name, are referred to as fieldX, where X is the 1-based position of the field in the Tuple.

4. Querying collections of UDTS/Tuple
select * from udt.demo where solr_query='{!tuple}friends.surname:Zasada';

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
  1 |    [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |   (14, 57, 65) |   {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null
  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

5. Querying nested UDTS or Tuples
select * from udt.demo where solr_query='{!tuple}name.alias_data.use_alias_always:false';

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

6. Using ‘AND’, for instance. Notice the query is enclosed in parentheses.
select * from udt.demo where solr_query='({!tuple}friends.surname:Zasada AND {!tuple}friends.surname:Blasi)';

 id | friends                                                                                                                    | magic_numbers  | name                                                                                                                          | solr_query | status
----+----------------------------------------------------------------------------------------------------------------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------+------------+--------
  2 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] | (23, 543, 234) | {firstname: 'Sergio', surname: 'Bossa', alias_data: {known_alias: 'Sergio', use_alias_always: False, alternate_alias_: null}} |       null |   null

7. Negative queries
select * from udt.demo where solr_query='-{!tuple}name.alias_data.known_alias:*';

 id | friends                                                                                                                 | magic_numbers | name                                                                                                                        | solr_query | status
----+-------------------------------------------------------------------------------------------------------------------------+---------------+-----------------------------------------------------------------------------------------------------------------------------+------------+--------
  1 | [{firstname: 'Sergio', surname: 'Bossa', alias_data: null}, {firstname: 'Maciej', surname: 'Zasada', alias_data: null}] |  (14, 57, 65) | {firstname: 'Berenguer', surname: 'Blasi', alias_data: {known_alias: null, use_alias_always: True, alternate_alias_: null}} |       null |   null

8. Dynamic fields. Notice the map that is inserted at ‘name.alias_data.alternate_alias_’
insert into demo (id, name, friends, magic_numbers) values ('3', {firstname:'Maciej', surname:'Zasada', alias_data:{known_alias:'Maciej', use_alias_always:false, alternate_alias_:{'alternate_alias_one':'Super-Maciej', 'alternate_alias_two':'The-Great-Maciej'}}}, [{firstname:'Berenguer', surname:'Blasi'}, {firstname:'Sergio', surname:'Bossa'}], (2423,23,423));

select * from udt.demo where solr_query='{!tuple}name.alias_data.alternate_alias_one:*';

 id | friends                                                                                                                   | magic_numbers   | name                                                                                                                                                                                                         | solr_query | status
----+---------------------------------------------------------------------------------------------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------
  3 | [{firstname: 'Berenguer', surname: 'Blasi', alias_data: null}, {firstname: 'Sergio', surname: 'Bossa', alias_data: null}] | (2423, 23, 423) | {firstname: 'Maciej', surname: 'Zasada', alias_data: {known_alias: 'Maciej', use_alias_always: False, alternate_alias_: {'alternate_alias_one': 'Super-Maciej', 'alternate_alias_two': 'The-Great-Maciej'}}} |       null |   null

Changing the schema

Now suppose you decide you no longer want to index ‘use_alias_always’ because it is of little use to you:

1. dsetool get_core_schema udt.demo current=true > currentSchema.xml
2. Edit currentSchema.xml and remove the 'use_alias_always' fields
3. dsetool reload_core udt.demo schema=currentSchema.xml reindex=true

select * from udt.demo where solr_query='{!tuple}name.alias_data.use_alias_always:*';
ServerError: <ErrorMessage code=0000 [Server error] message="undefined field name.alias_data.use_alias_always">

As you can see, you have full granularity on Tuple/UDT fields.

At some point you may need to add a new Tuple/UDT column. Running ‘dsetool infer_solr_schema’ gives you a proposed schema with the new field exploded into all of its subfields following the dot notation. Following similar steps as above, you can edit the proposed schema to remove any unwanted fields and then reload it to start using your new field, as the sketch below shows.
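As a concrete sketch (the new column name and type here are made up purely for illustration):

cqlsh -k udt -e "ALTER TABLE demo ADD nicknames list<frozen<Alias>>;"
dsetool infer_solr_schema udt.demo > proposedSchema.xml
# edit proposedSchema.xml to drop any of the new dot-notation subfields you don't need
dsetool reload_core udt.demo schema=proposedSchema.xml reindex=true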

Limitations

There are only 2 limitations when using UDTs and Tuples in DSE Search:

  1. Tuples and UDTs cannot be part of the partition key.
  2. You can’t use tuples and UDTs as CQL map values. In other words you can’t have a dynamic field of UDT/Tuple type (See this workaround)

Advanced:

Tuples/UDTs are read and written as one single block, not on a per-field basis, so factor the single-block read and write into your design.


Using Brian’s cassandra-loader/unloader to migrate C* Maps for DSE Search compatibility

Intro

Using map collections in DSE Search takes advantage of dynamic fields in Solr for indexing. For this to work, every key in your map has to be prefixed with the name of the collection. Using an example, this article aims to demonstrate:

  1. How to create and populate map collections that are compatible with DSE Search
  2. How to use generateResources to generate the schema and index maps as dynamic fields, and
  3. How to perform a data migration using Brian’s cassandra-loader/unloader for existing data that lacks the prefix required by DSE Search

Note: this same methodology (cassandra-unloader | awk | cassandra-loader) can be used in many different ETL workloads; this is just one common example of the larger group of situations where it may come in handy.

Something to watch out for: Dynamic fields, like Cassandra collections, are not meant to store large amounts of data. The odds are, if you are misusing Cassandra collections, you will also have problems on the search side with dynamic fields because they tend to create significant heap pressure due to their memory footprint.

Creating and Populating the maps

If you are using a map to store contact information and your map column is named contact_info_, you may have the following table definition:

CREATE TABLE autogeneratedtest.customers_by_channel (  
    customer_id uuid,
    customer_type text,
    channel_id text,
    contact_info_ map<text, text>,
    country_code text,
    PRIMARY KEY ((customer_id), channel_id)
);

and you may have some rows as follows:

insert into autogeneratedtest.customers_by_channel (  
    customer_id,
    customer_type,
    channel_id,
    contact_info_,        
    country_code
)
VALUES (  
    uuid(), 
    'subscription', 
    'web-direct',
    {
        'email': 'betrio@gmail.com',
        'first_name': 'Bill',
        'last_name': 'Evans'
    },
    'USA'
);

insert into autogeneratedtest.customers_by_channel (  
    customer_id,
    customer_type,
    channel_id,
    contact_info_,
    country_code
) 
VALUES (  
    uuid(),
    'subscription',
    'web-direct',
    {
        'email': 'messengers@gmail.com',
        'first_name': 'Art',
        'last_name': 'Blakey'
    },
    'USA'
);

In order to index the map with DSE Search, the keys in the map would have to include the prefix contact_info_ as follows:

{
    'contact_info_email': 'messengers@gmail.com', 
    'contact_info_first_name': 'Art',
    'contact_info_last_name': 'Blakey'
}

Note: for existing systems, adding a prefix to the map’s keys will require changes in your application code.
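To illustrate, the full insert for the second row with the prefixed keys would look like this (shown here through cqlsh; the values are the same sample data used above):

cqlsh -e "
INSERT INTO autogeneratedtest.customers_by_channel
    (customer_id, customer_type, channel_id, contact_info_, country_code)
VALUES (
    uuid(),
    'subscription',
    'web-direct',
    {
        'contact_info_email': 'messengers@gmail.com',
        'contact_info_first_name': 'Art',
        'contact_info_last_name': 'Blakey'
    },
    'USA'
);"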

Indexing the field with generateResources

In previous versions of DSE Search, users had to manually create and upload their own schema.xml and solrconfig.xml files in order to index their tables. This process was rather painful because hand-crafting XML files is quite error prone. DSP-5373 (released with DSE 4.6.8 and 4.7.1) made it possible to index a table with a single API call; DSE takes care of generating both your schema.xml and your solrconfig.xml automagically.

Use dsetool or curl to index a core for the table in one fell swoop:

dsetool create_core autogeneratedtest.customers_by_channel generateResources=true

or

curl "http://<host>:8983/solr/admin/cores?action=CREATE&name=autogeneratedtest.customers_by_channel&generateResources=true"

Protip: if you’re using Cassandra authentication, dsetool does not yet work with it and you’ll have to use the curl command.
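In that case, pass your Cassandra credentials along with the curl command. This is only a sketch: it assumes HTTP Basic authentication is accepted by the Solr admin endpoint, and myuser/mypass stand in for your own Cassandra credentials:

curl -u myuser:mypass "http://<host>:8983/solr/admin/cores?action=CREATE&name=autogeneratedtest.customers_by_channel&generateResources=true"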

Data Migration with cassandra-loader/unloader

If your data set is very large, a Spark job is a good way of migrating your data (here’s an example by Ryan Svihla). That is a topic for another post.

This post will focus on small to medium datasets and simple transformations that can be implemented in awk. Because we can use input and output from stdin/stdout, the combination of the loader, the unloader, and some sed/awk magic can be used as a quick-and-dirty ETL tool.

Brian’s cassandra-loader and cassandra-unloader are a pair of Java applications (built using the DataStax Java driver). They are easy-to-use, full-featured, delimited bulk loading/unloading tools, built following all the Cassandra and Java driver best practices.

Note: Use this source code as a reference architecture when building Java (and other) applications that interact with Cassandra.

First download the binaries and set permissions:

wget "https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.17/cassandra-loader"

wget "https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.17/cassandra-unloader"

sudo chmod +x cassandra*  

Thanks to Brian for helping optimize the awk script so that we can pipe directly from the unloader through awk into the loader; this means we don’t have to fit the entire dataset in RAM.

Here’s how you would run it:

./cassandra-unloader -f stdout \
    -delim "|" \
    -host localhost \
    -schema "autogeneratedtest.customers_by_channel \
    (    \
        customer_id,    \
        customer_type,    \
        channel_id,    \
        contact_info_,    \
        country_code    \
    )" | \
awk -F "|" '{  \  
    a=substr($4, 3, length($4)-4);    \
    nb=split(a, b, ",");    \
    d=""; sep="";     \
    for (i=1; i<=nb; i+=2) {    \
        c=substr(b[i], 2);    \
        b[i]="\"contact_info_" c;    \
        d=d sep b[i] " : " b[i+1];    \
        sep=", ";    \
    }     \
    for (i=1;i<=3;i++) {    \
printf("%s|",$i);    \
    }     \
    printf("%s",d);    \
    for (i=5;i<=NF;i++) {    \
        printf("|%s", $i);    \
    }     \
    printf("\n");    \
}' |    \
./cassandra-loader    \
    -f stdin    \
    -delim "|"    \
    -host localhost    \
    -schema "autogeneratedtest.customers_by_channel2(    \
    customer_id,    \
    customer_type,    \
    channel_id,    \
    contact_info_,    \
    country_code    \
)"

The result is a new table with the map keys prefixed by the name of the map column, contact_info_.

The loader and unloader will use a number of threads equal to the number of CPU cores in your box and will keep 1,000 in-flight futures. These and other advanced options are configurable, but the defaults should work fine (especially if you run this from a separate box).

Enjoy!

DataStax C/C++ Driver: 2.2 GA released!

We are pleased to announce the 2.2 GA release of the C/C++ driver for Apache Cassandra. This release includes all the features necessary to take full advantage of Apache Cassandra 2.2, including support for the new data types (‘tinyint’, ‘smallint’, ‘time’, and ‘date’) and support for user defined function/aggregate (UDF/UDA) schema metadata. In addition to the Cassandra 2.2 features, the release also brings the whitelist load balancing policy, a streamlined schema metadata API, and several internal improvements.

What’s new

New data types

‘tinyint’ and ‘smallint’ can be used in cases where a smaller range would be a better fit than ‘int’ or ‘bigint’.

CREATE TABLE integers (key text PRIMARY KEY,
                       tiny tinyint,
                       small smallint);

INSERTing the new ‘tinyint’ and ‘smallint’ types

CassStatement* statement = cass_statement_new("INSERT INTO integers (key, tiny, small) "
                                              "VALUES (?, ?, ?)", 3);

cass_statement_bind_string(statement, 0, "abc");

/* 'tinyint' is a signed 8-bit integer. It can represent values between -128 and 127 */
cass_statement_bind_int8(statement, 1, 127);

/* 'smallint' is a signed 16-bit integer. It can represent values between -32768 and 32767 */
cass_statement_bind_int16(statement, 2, 32767);

CassFuture* future = cass_session_execute(session, statement);

/* Handle future result */

/* CassStatement and CassFuture both need to be freed */
cass_statement_free(statement);
cass_future_free(future);

The ‘date’ type uses an unsigned 32-bit integer (cass_uint32_t) to represent the number of days since the Epoch (January 1, 1970), centered at 2^31. Because it is centered at 2^31 rather than 0, it can also represent days before the Epoch. The ‘time’ type uses a signed 64-bit integer (cass_int64_t) to represent the number of nanoseconds since midnight; valid values are in the range 0 to 86399999999999. The following examples both use this schema:

CREATE TABLE date_time (key text PRIMARY KEY,
                        year_month_day date,
                        time_of_day time);

INSERTing the new ‘date’ and ‘time’ types

#include <time.h>

/* ... */

CassStatement* statement = cass_statement_new("INSERT INTO date_time (key, year_month_day, time_of_day) "
                                              "VALUES (?, ?, ?)", 3);

time_t now = time(NULL); /* Time in seconds from Epoch */

/* Converts the time since the Epoch in seconds to the 'date' type */
cass_uint32_t year_month_day = cass_date_from_epoch(now);

/* Converts the time since the Epoch in seconds to the 'time' type */
cass_int64_t time_of_day = cass_time_from_epoch(now);

cass_statement_bind_string(statement, 0, "xyz");

/* 'date' uses an unsigned 32-bit integer */
cass_statement_bind_uint32(statement, 1, year_month_day);

/* 'time' uses a signed 64-bit integer */
cass_statement_bind_int64(statement, 2, time_of_day);

CassFuture* future = cass_session_execute(session, statement);

/* Handle future result */

/* CassStatement and CassFuture both need to be freed */
cass_statement_free(statement);
cass_future_free(future);

SELECTing the new ‘date’ and ‘time’ types

#include <time.h>

/* ... */

CassStatement* statement = cass_statement_new("SELECT * FROM examples.date_time WHERE key = ?", 1);

cass_statement_bind_string(statement, 0, "xyz");

CassFuture* future = cass_session_execute(session, statement);

const CassResult* result = cass_future_get_result(future);
/* Make sure there's a valid result */
if (result != NULL && cass_result_row_count(result) > 0) {
  const CassRow* row = cass_result_first_row(result);

  /* Get the value of the "year_month_day" column */
  cass_uint32_t year_month_day;
  cass_value_get_uint32(cass_row_get_column(row, 1), &year_month_day);

  /* Get the value of the "time_of_day" column */
  cass_int64_t time_of_day;
  cass_value_get_int64(cass_row_get_column(row, 2), &time_of_day);

  /* Convert 'date' and 'time' to Epoch time */
  time_t time = (time_t)cass_date_time_to_epoch(year_month_day, time_of_day);
  printf("Date and time: %s", asctime(localtime(&time)));
} else {
  /* Handle error */
}

/* CassStatement and CassFuture both need to be freed */
cass_statement_free(statement);
cass_future_free(future);

Whitelist load balancing policy

By default the driver auto-discovers and connects to all the nodes in a Cassandra cluster. The whitelist load balancing policy can override this behavior by only connecting to a predefined set of hosts. This policy is useful for testing or debugging and is not optimal for production deployments. If the goal is to limit connections to a local data center then use the data center aware load balancing policy (cass_cluster_set_load_balance_dc_aware()).

CassCluster* cluster = cass_cluster_new();

/* Enable a whitelist by setting a comma-delimited list of hosts */
cass_cluster_set_whitelist_filtering(cluster, "127.0.0.1, 127.0.0.2, ...");

CassFuture* future = cass_session_connect(session, cluster);

/* ... */

cass_future_free(future);
cass_cluster_free(cluster);

New schema metadata API

This release improves the schema metadata API by adding concrete types for each of the different metadata types (CassKeyspaceMeta, CassTableMeta and CassColumnMeta) instead of the single CassSchemaMeta type used in the previous API. This allows specific functions to handle each of the metadata types and better represents the metadata hierarchy, i.e. user type, function and aggregate metadata now lives under the keyspace metadata. Applications that used the previous schema metadata API will require some small modifications to use the new API.

Retrieving a user defined type using the new API

/* Obtain a snapshot of the schema from the session */
const CassSchemaMeta* schema_meta = cass_session_get_schema_meta(session);

/* There is no need to free metadata types derived from the snapshot. Their lifetimes are bound
 * to the snapshot.
 */
const CassKeyspaceMeta* keyspace_meta = cass_schema_meta_keyspace_by_name(schema_meta, "some_keyspace");

 if (keyspace_meta != NULL) {
  const CassDataType* some_user_type = cass_keyspace_meta_user_type_by_name(keyspace_meta, "some_user_type");

  /* Use user type */
 } else {
   /* Handle error */
 }

/* Only the snapshot needs to be freed */
cass_schema_meta_free(schema_meta);

A more complete example of using the new schema metadata API can be found here.

User function and user aggregate metadata

Cassandra 2.2 added user defined functions (UDFs) and aggregates (UDAs). The 2.2 release of the C/C++ driver adds support for inspecting the metadata of these new types.

Retrieving and printing UDF metadata

USE keyspace1;

CREATE FUNCTION multiplyBy (x int, n int)
RETURNS NULL ON NULL INPUT
RETURNS int LANGUAGE javascript AS 'x * n';

/* Obtain a snapshot of the schema metadata */
const CassSchemaMeta* schema_meta
  = cass_session_get_schema_meta(session);

/* Search for the function's keyspace by name */
const CassKeyspaceMeta* keyspace_meta
  = cass_schema_meta_keyspace_by_name(schema_meta, "keyspace1");

/* Search for the function by name and argument types (overloads are possible) */
const CassFunctionMeta* function_meta
  = cass_keyspace_meta_function_by_name(keyspace_meta, "multiplyBy", "int, int");

/* Inspect the function's metadata */

const char* full_name;
size_t full_name_length;
cass_function_meta_full_name(function_meta, &full_name, &full_name_length);

const CassDataType* arg1_type = cass_function_meta_type_by_name(function_meta, "x");

const CassDataType* arg2_type = cass_function_meta_type_by_name(function_meta, "n");

const CassDataType* return_type = cass_function_meta_return_type(function_meta);

/* ... */

/* Only the snapshot needs to be freed */
cass_schema_meta_free(schema_meta);

Retrieving and printing UDA metadata

USE keyspace1;

CREATE OR REPLACE FUNCTION avgState ( state tuple<int,bigint>, val int ) CALLED ON NULL INPUT RETURNS tuple<int,bigint> LANGUAGE java AS
  'if (val !=null) { state.setInt(0, state.getInt(0)+1); state.setLong(1, state.getLong(1)+val.intValue()); } return state;';

CREATE OR REPLACE FUNCTION avgFinal ( state tuple<int,bigint> ) CALLED ON NULL INPUT RETURNS double LANGUAGE java AS
  'double r = 0; if (state.getInt(0) == 0) return null; r = state.getLong(1); r/= state.getInt(0); return Double.valueOf(r);';

CREATE AGGREGATE IF NOT EXISTS average ( int )
  SFUNC avgState STYPE tuple<int,bigint> FINALFUNC avgFinal INITCOND (0,0);

/* Obtain a snapshot of the schema metadata */
const CassSchemaMeta* schema_meta
  = cass_session_get_schema_meta(session);

/* Search for the aggregate's keyspace by name */
const CassKeyspaceMeta* keyspace_meta
  = cass_schema_meta_keyspace_by_name(schema_meta, "keyspace1");

/* Search for the aggregate by name and argument types (overloads are possible) */
const CassAggregateMeta* aggregate_meta
  = cass_keyspace_meta_aggregate_by_name(keyspace_meta, "average", "int");

/* Inspect the aggregate's metadata */

const char* full_name;
size_t full_name_length;
cass_aggregate_meta_full_name(aggregate_meta, &full_name, &full_name_length);

const CassFunctionMeta* avg_state_func = cass_aggregate_meta_state_func(aggregate_meta);

const CassFunctionMeta* avg_final_func = cass_aggregate_meta_final_func(aggregate_meta);

const CassDataType* state_type = cass_aggregate_meta_state_type(aggregate_meta);

const CassDataType* return_type = cass_aggregate_meta_return_type(aggregate_meta);

const CassValue* init_cond = cass_aggregate_meta_init_cond(aggregate_meta);

/* ... */

/* Only the snapshot needs to be freed */
cass_schema_meta_free(schema_meta);

Internal improvements

This release also includes the following internal improvements:

  • The default consistency is now LOCAL_QUORUM instead of ONE
  • Improved the performance of the UUID to/from string conversion functions
  • Support for server-side warnings that are logged at the CASS_LOG_WARN level

Looking forward

This release brings with it full support for Apache Cassandra 2.2 along with many other great features. In the next release we will be focusing our efforts on supporting Apache Cassandra 3.0. Let us know what you think about the 2.2 GA release; your feedback is important to us and it influences which features we prioritize.

Using the Cassandra Data Modeler to Stress and Size C*/DSE Instances

Summary

Here’s the link to the Data Modeler that is discussed in this post. The main drivers behind Cassandra performance are:
  1. Hardware
  2. Data Model
  3. Application specific design and configuration

For many early stage projects that are trying to make hardware and data modeling decisions to maximize performance, it is often beneficial to take the app-specific questions out of the equation and design and test a table that will scale on a given hardware configuration. Furthermore, projects may not even have datasets to use for testing. It can be time consuming to generate realistic test data and to build temporary benchmarking applications to read and write that data to and from Cassandra.

Jake’s post on cassandra-stress 2.1 depicts how stress can enable users to take their app out of the equation, and quickly run benchmarks with their data model on their hardware.

Why would you want to benchmark your own schema on your own hardware?

  • You may want to iterate on a data model decision before building your app. Avoid building an app on the wrong data model and finding out you have to change it all later!
  • This gives you a baseline of how your cluster will perform (in terms of reads / writes per second, latency SLA’s, and even node density). This is the first step for a sizing conversation as it gives an architect an idea of what provisioning requirements will look like for their peak workloads.
  • Know that your data model will scale linearly and have an idea (predictability) of at what point you should be planning to scale out.

Cassandra-stress, like many powerful tools, is also complex. It requires some statistical understanding and syntactic skill to get up and running with even a simple user-profile-driven test of your data model (for details see the cassandra-stress docs). Furthermore, designing a good Cassandra data model requires a basic understanding of how CQL works and how Cassandra data is laid out on disk as a result of partitioning and clustering.

The purpose of this post is to describe how the CassandraDataModeler aims to simplify this task, getting users up and running with user profile powered cassandra-stress tests in minutes.

The main goals are as follows:

  1. Help users design data models by helping them understand the main tradeoffs that data modeling presents them with and providing a dynamic storage engine visualization that shows how data is laid out on disk for a particular CREATE TABLE statement.
  2. Guide users on how to reason about and select appropriate Size, Population, and Cluster distributions that match the nature of their data quickly.
  3. Remove syntactic barriers by providing a web form that will generate a stress.yaml and the cassandra-stress command that should be used to run a simple test quickly and easily.

Note: there are many reasons why your app may not perform exactly like cassandra-stress; you must take this exercise for what it is: a baseline that gives you an idea of what kind of performance to strive for and how your Cassandra cluster can scale.

Data Model Tradeoffs, Visualizing the Storage Engine

The art of data modeling is a mix between:

  1. building tables that are convenient for development based on your access patterns and
  2. designing tables that will scale and meet performance SLA’s

The main reason a table may not scale well is if it allows unbounded partitions. The following heuristic is not exact, but it simplifies this exercise significantly and is quite battle-tested in the field: when designing your Cassandra table, aim to keep each partition smaller than 100 MB and 100,000 cells. I have seen multi-GB partitions and million-cell partitions; I assure you they are painful to work with and you want to avoid them from the get-go.
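A quick back-of-the-envelope check against that guideline can save you a lot of pain later; all of the numbers below are assumptions you would replace with your own column counts and row sizes:

# rough partition sizing against the 100 MB / 100,000-cell guideline
cells_per_row=5     # non-key columns per clustering row (assumption)
avg_row_bytes=512   # average on-disk size of one clustering row (assumption)
max_rows=$((100000 / cells_per_row))
echo "~${max_rows} rows per partition before hitting the cell guideline"
echo "~$((max_rows * avg_row_bytes / 1024 / 1024)) MB per partition at that point"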

The main way to ensure this is by controlling your partition key, which is the first element in your CQL primary key. The CassandraDataModeler renders a visual representation of your table on disk for ease of understanding.

For example you may want to take this data model:

single partition key

and add a compound partition key:

compound partition key

This helps you limit the amount of data per partition, making your data model more scalable. When doing this exercise you also want to ensure you’re preventing hotspots. For example, does most of your traffic come from one userid/productid combination? If so, only one set of replica nodes is going to be doing most of the work if you use the data model above!

The tradeoff is that with the new data model it may be a bit harder to get at your data.

I.E. previously you could run these queries:

Likely select queries for this data model:

SELECT * FROM reviews_by_day WHERE userid = ?;

SELECT * FROM reviews_by_day WHERE userid = ? AND productid = ?;

SELECT * FROM reviews_by_day WHERE userid = ? AND productid = ? AND time = ?;

SELECT * FROM reviews_by_day WHERE userid = ? AND productid = ? AND time = ? AND reviewid = ?;  

With the new data model you must know both the userid and the productid to get at your data, which implies some more work on your app side if you need cross product data:

Likely select queries for this data model:

SELECT * FROM reviews_by_day WHERE userid = ? AND productid = ?;

SELECT * FROM reviews_by_day WHERE userid = ? AND productid = ? AND time = ?;

SELECT * FROM reviews_by_day WHERE userid = ? AND productid = ? AND time = ? AND reviewid = ?;  

For use cases with a lot of data, I recommend designing a data model that scales in a healthy fashion and then working on any additional tables (yes, data duplication) you may need to match your access patterns.

Grokking Stress Field Distributions

There are three distributions in cassandra-stress: size, population, and cluster.

Good ballpark settings for each of your fields will give you a realistic stress profile.

Size Distribution

The size distribution answers the question: how big are my data values? When you select a cql type that maps to a Java primitive, the size on disk for each of those values is always the same. The Data Modeler will pre-populate those fields with their values (i.e. ints are 32 bits = 4 bytes). Make sure you select fixed as the distribution type since an int will always take up the same amount of space.

For variable types like text, try to think about the distribution of the field’s sizes. For example, if I have a field album_name in a music database, I might think about the shortest and longest album names in my database. According to Google, somebody called Alice in Chains has an album called 13, so let’s make 2 the minimum value in our distribution. Apparently there are some pretty long album names out there, so maybe this distribution has long tails. To keep it simple, let’s say the upper bound is 105 bytes.

If we assume that album names will most likely be around 50 bytes long and the distribution looks like a bell curve then we might guess that the size distribution for album_name is normal (or gaussian) and goes from 5 to 105.

(screenshot: the album_name size distribution settings in the Data Modeler)

Hover over the types for descriptions of the distributions. The histogram on the right will give you a visual representation of the distribution shape.

Population Distribution

You want to think about the population distribution in terms of cardinality and frequency.

Cardinality: you may have many unique album names in your database (a Google search tells me that Spotify has 30 million songs, so let’s say there will be ~4 million albums in our db). This means that the range of the distribution (regardless of which distribution type you pick) should be wide, let’s say (1..4000000).

Frequency: do you expect each album to appear in your dataset the same number of times? If a field has a flat frequency profile (the same for every value), its distribution is Uniform. If it has a bell-shaped profile, where some values appear more frequently than others, then it is Normal / Gaussian. Again, use the histogram images and the tooltip descriptions in the application for guidance when picking the distribution type based on the frequency of your field.

If our dataset stores album sales or plays, the population distribution of the albums might be normal since some artists might be played more than others resulting in a bell curve distribution, or more likely given the dominance of a few genres and artists (top 40) it might be exponential.

You know your business data better than I do! But this is the kind of thinking you want to apply when selecting your population distributions.

Cluster Distribution

The final distribution only applies to clustering columns. It is similar to the population distribution except that instead of thinking about the entire dataset, you want to think about occurrences of your clustering column within a single partition. This essentially gives you the ability to generate wide rows.

For a table with the following PK:

CREATE TABLE album_plays_by_user(
...
...
PRIMARY KEY (user_id, album_name)
);

What is the cardinality and frequency of album_names for a given user_id?

I expect the average user to listen to somewhere between 100 and 1000 different albums on Spotify, so we may have between 100 and 1000 album_names in a given partition.

Given this logic, the cluster distribution for album_names might be gaussian (100..1000).
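Pulling the album_name examples together, the corresponding columnspec entry in the generated stress yaml would look roughly like the sketch below; the keyspace, table and query sections of the file are omitted, and the exact output of the Data Modeler may differ:

columnspec:
  - name: album_name
    size: gaussian(5..105)            # value length in bytes
    population: gaussian(1..4000000)  # ~4 million distinct albums; could also be exponential
    cluster: gaussian(100..1000)      # albums per user_id partition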

Running Stress on the first try

Once you have finished filling out the second tab in the Cassandra Data Modeler, go to the third tab and click the download button. This generates a file that you can use to run cassandra-stress with.

Your command to insert 100k records would look like this:

cassandra-stress user profile=autoGen.yaml n=100000 ops\(insert=1\)

Optimizing Stress

Once you have cassandra-stress running here are a few considerations to look out for to ensure you have a good benchmark.

  1. If you run stress on one of your Cassandra nodes, it will contend for the same OS subsystems as your Cassandra instance. You may want to move your stress workload to its own machine(s).
  2. If your Cassandra cluster seems under-utilized and your ops rates are not increasing as you add nodes, chances are you are not generating enough traffic from the client (cassandra-stress). Is your cassandra-stress box itself fully utilized? If not, consider increasing the number of client threads with the -rate threads=<n> lever (see the sketch after this list).
  3. If your stress box is fully utilized and you are still not saturating your cassandra cluster, you may need to beef up the machine running stress (bigger box) or scale out by setting up multiple machines to run stress against the cluster.
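A sketch of what that scaled-up invocation might look like is below; the thread count, record count and node addresses are all placeholders you would tune for your own test:

cassandra-stress user profile=autoGen.yaml n=1000000 ops\(insert=1\) -rate threads=64 -node 10.0.0.1,10.0.0.2,10.0.0.3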

Tools like htop and dstat are invaluable when performing this exercise.

Specifically, I like to run dstat -rvn 10 and htop in multiple screen sessions across all nodes. Doing this constantly during benchmarks and regular workloads will help you get an idea of what is and isn’t normal and help you identify and remove bottlenecks.

For a deeper dive into Cassandra tuning, check out Al Tobey’s tuning guide.

Running multiple DataStax Enterprise nodes in a single host

This article is about setting up a DataStax Enterprise cluster running in a single host.

There are a variety of reasons why you might want to run a DataStax Enterprise cluster inside a single host. For instance, your server vendor talked you into buying this vertical-scale machine but Cassandra can’t effectively use all the resources available. Or your developers need to test your app as they develop it, and they’d rather test it locally.

Whatever the reason, you’ll learn how to set the cluster up from the ground up.

Multi-JVM Multi-Node HOWTO

The goal is to have a dense node: a single box running multiple DataStax Enterprise Cassandra nodes in a single cluster.

The DataStax Enterprise cluster that we build in this blog post will consist of:

  • 3 DataStax Enterprise Cassandra nodes.
  • A simple configuration without internode encryption
  • Multiple interfaces (all virtual in this example). Each node will bind its services to its own IP address.
  • Shared disks: all nodes will write their data and logs to the same disk. However, since data (or logs) directories can be any mount points, you can configure the nodes to point to different physical disks to improve performance for instance.

The resulting configuration will look like this:

  • Single binary tarball installation: we’ll install DataStax Enterprise once, and share it across nodes.
  • Multiple node locations: each node will have its own directory hierarchy with configuration files, data, and logs.

Installing the binaries

Register on DataStax Academy. Use your download credentials to download DataStax Enterprise into a directory of your choice:

    $ wget --user $USERNAME --password $PASSWORD http://downloads.datastax.com/enterprise/dse.tar.gz

After the download completes, unpack it:

    $ tar zxf dse.tar.gz

The unpacked tarball creates a dse-4.8.0/ directory that is our DSE_HOME directory for this tutorial:

    $ export DSE_HOME=`pwd`/dse-4.8.0/

Setting up the nodes

We’ll first create a root directory per node. In these directories, the nodes have their configuration files, data, and logs. Let’s also create the data/, and logs/ directories while we’re at it:

    $ for i in 1 2 3; do mkdir -p node$i/data node$i/logs; done

Next, we’ll copy all the configuration files. For each service, first create the corresponding directory in the node’s configuration directory, and then copy the files. For example, for Cassandra:

    $ mkdir -p node1/resources/cassandra && cp -r $DSE_HOME/resources/cassandra/conf node1/resources/cassandra

Iterating over every service in the resources directory can be done with a for loop:

    $ for service in `ls dse-4.8.0/resources | grep -v driver | grep -v log4j`;
        do mkdir -p node1/resources/$service &&
        cp -r $DSE_HOME/resources/$service/conf node1/resources/$service;
     done

Now we repeat this step for every node in our cluster:

    $ for node in node1 node2 node3; do
       for service in `ls dse-4.8.0/resources | grep -v driver | grep -v log4j`; do 
         mkdir -p $node/resources/$service && 
         cp -r $DSE_HOME/resources/$service/conf $node/resources/$service;
       done;
     done

After the files are in place, we can make the required changes to the resources/cassandra/conf/cassandra.yaml and resources/cassandra/conf/cassandra-env.sh files on each node to create a working DataStax Enterprise cluster. In these files, we configure the cluster name, the interface the node will bind to, the directories where the node will store its data and logs, and more.

Editing the cluster configuration

To configure parameters like the cluster name, data and log directories, edit the cassandra.yaml file in nodeN/resources/cassandra/conf. Below is a list of the minimum parameters (and their locations) we’ll have to set on each node to have a functional DataStax Enterprise cluster of Cassandra nodes; a consolidated node1 example follows the list.

Fire up your favourite text editor (by which I mean “fire up emacs”), and let’s do it.

cassandra.yaml

  • cluster_name: change the cluster name so that the nodes are all part of the same cluster, for example, cluster_name: ‘clusty’
  • commitlog_directory, data_file_directories, and saved_caches_directory: specify where the node will keep its data, its commit log, and saved caches, for example, commitlog_directory: node1/data/commitlog
  • listen_address: The IP address or hostname that Cassandra binds to for connecting to other Cassandra nodes. Alternatively we could change listen_interface. For example listen_address: 127.0.0.1 for node1, listen_address: 127.0.0.2 for node2, and so on.
  • rpc_address: The listen address for client connections (Thrift RPC service and native transport).
  • seeds: the list of IP addresses of the seed nodes will go in here
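Putting those together, the node1-specific portion of node1/resources/cassandra/conf/cassandra.yaml would look roughly like this (the data sub-directory layout and the choice of 127.0.0.1 as the only seed are assumptions; the seed_provider block keeps the stock SimpleSeedProvider structure):

    cluster_name: 'clusty'
    listen_address: 127.0.0.1
    rpc_address: 127.0.0.1
    commitlog_directory: node1/data/commitlog
    data_file_directories:
        - node1/data/data
    saved_caches_directory: node1/data/saved_caches
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "127.0.0.1"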

cassandra-env.sh

The only parameter to change here is the port that JMX binds to. For security reasons (see the vulnerability and the Cassandra fix) JMX will only bind to localhost so we’ll need a separate port per node.

Change the line JMX_PORT=”7199″ to list a different port for every node, e.g. 7199 for node1, 7299 for node2, and so on.
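If you prefer to script the change, something like this will do (node3’s port simply continues the pattern):

    $ sed -i 's/JMX_PORT="7199"/JMX_PORT="7299"/' node2/resources/cassandra/conf/cassandra-env.sh
    $ sed -i 's/JMX_PORT="7199"/JMX_PORT="7399"/' node3/resources/cassandra/conf/cassandra-env.sh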

Note: If you really want to bind JMX to an address that is different from localhost, you can use Al Tobey’s JMXIPBind. Just follow the instructions there.

logback.xml

The last bit that needs tweaking is the location of the nodes’ log directory in resources/cassandra/conf/logback.xml. We’ll have to define a property named cassandra.logdir to point to the right location for each node, e.g.

    <property name="cassandra.logdir" value="node1/logs/" />

Environment variables

After editing the configuration variables, we’re ready to try and start our nodes.

So that DSE can pick up the right configuration files, we’ll have to specify the configuration file locations via environment variables.

The first variable to be set is DSE_HOME. In the previous section we saw how to do it, but let’s refresh it here:

    $ export DSE_HOME=`pwd`/dse-4.8.0

Since we’re configuring a homogeneous cluster of Cassandra nodes only, set only the configuration environment variables for DataStax Enterprise Cassandra, pointing them at the files in each node’s configuration directory. For NODE=node1:

    $ export DSE_CONF=$NODE/resources/dse/conf
    $ export CASSANDRA_HOME=$NODE/resources/cassandra
    $ export CASSANDRA_CONF=$CASSANDRA_HOME/conf

The remaining environment variables can be set to the default configuration files:

    $ export TOMCAT_HOME=$DSE_HOME/resources/tomcat
    $ export TOMCAT_CONF_DIR=$TOMCAT_HOME/conf
    $ export HADOOP_CONF_DIR=$DSE_HOME/resources/hadoop/conf
    $ export HADOOP_HOME=$DSE_HOME/resources/hadoop
    $ export HIVE_CONF_DIR=$DSE_HOME/resources/hive/conf
    $ export SPARK_CONF_DIR=$DSE_HOME/resources/spark/conf

After setting these environment variables, we can start our DataStax Enterprise Cassandra node:

    $ $DSE_HOME/bin/dse cassandra -f

To stop the node, press Control+C.

To start all 3 nodes, we could run the start command without the -f flag so that each process starts in the background, then point the environment variables at the next node (e.g. set NODE=node2 and re-export them) and run the command again. But that’s not very practical. Fortunately, this process can be automated with scripts, so that we can start each node with a command like:

    $ with-dse-env.sh node1 bin/dse cassandra -f

Example scripts

In the previous sections we’ve outlined the steps necessary to configure a cluster of Cassandra nodes in a single host. The following example scripts automate the steps that are outlined above.

dense-install.sh

This script copies the relevant configuration files for each node, and edits them according to the description outlined in the previous sections.

In a directory that contains the DataStax Enterprise installation tarball (dse.tar.gz), use it like:

    $ path/to/dense-install.sh clusty 3

to create a cluster named clusty that consists of 3 DataStax Enterprise nodes. Keep in mind that the configuration done by the script is minimal (though it’ll give you a working cluster). If you want to change anything else, such as enabling encryption or modifying token ranges, make those changes before starting the nodes.

You can download the dense server installation script here.

with-dse-env.sh

This script will set the relevant environment variable values and execute the requested command. For instance, to start DSE Cassandra for node1:

    $ path/to/with-dse-env.sh node1 bin/dse cassandra -f

The script assumes that the current directory (as reported by `pwd`) contains the nodes’ configuration files as created by the dense-install.sh script.

You can download the script that sets up the right environment here.

A DataStax Enterprise Cassandra cluster

Now that you have downloaded the scripts, let’s use the scripts to create and start a DataStax Enterprise Cassandra cluster.

Configure the network interfaces

Before anything else, we must ensure our host has a network interface available for each of the nodes in the cluster. In this tutorial we will use virtual network interfaces.

To create the appropriate virtual network interfaces in Linux, use ifconfig. For example:

ifconfig lo:0 127.0.0.2

You must repeat this step for every node in the cluster. If there are 3 nodes, then the first node uses 127.0.0.1, the second node uses 127.0.0.2 (virtual), and the third node uses 127.0.0.3 (virtual as well).
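The third node needs its own alias as well, for example (the lo:1 alias name is arbitrary):

    $ ifconfig lo:1 127.0.0.3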

Create the cluster

Creating a new cluster, regardless of the type of workload, is done with the script dense-install.sh:

    $ dense-install.sh clusty 3
    ~= cluster: clusty, 3 nodes =~
    * Unpacking dse.tar.gz...
    Will install from dse-4.8.0
    * Setting up C* nodes...
      + Setting up node 1...
        - Copying configs
        - Setting up the cluster name
        - Setting up JMX port
        - Setting up directories
        - Binding services
      + Setting up node 2...
        - Copying configs
        - Setting up the cluster name
        - Setting up JMX port
        - Setting up directories
        - Binding services
      + Setting up node 3...
        - Copying configs
        - Setting up the cluster name
        - Setting up JMX port
        - Setting up directories
        - Binding services
    Done.

Starting the cluster

To start each node, we’ll run this script. For instance for node1:

    $ with-dse-env.sh node1 bin/dse cassandra -f

Use -f to quickly see what’s going on with the nodes. Next, we run the same command (in different terminal windows) for the remaining nodes (node2 and node3).

Now, verify that each node is up and running. For example, for node1:

    $ ./with-dse-env.sh node1 bin/dsetool ring
    Address          DC           Rack         Workload         Status  State    Load             Owns                 Token
    127.0.0.1        Cassandra    rack1        Cassandra        Up      Normal   156.77 KB        75.15%               -3479529816454052534
    127.0.0.2        Cassandra    rack1        Cassandra        Up      Normal   139.42 KB        19.71%               156529866135460310
    127.0.0.3        Cassandra    rack1        Cassandra        Up      Normal   74.07 KB         5.14%                1105391149018994887

 

Other workloads

In the previous section, we walked through the steps necessary to create a cluster of DataStax Enterprise Cassandra nodes. Now we’ll create a cluster of DSE Search nodes using the scripts provided as example.

We’ll re-use the cluster that we created in the previous section and do some DSE Search specific tweaks only.

Search related configuration changes

On the cluster that we configured in the previous section, you can start the nodes immediately to run as DSE Cassandra nodes. To set their workload to Search instead, we need to do a few small tweaks first (these are not done in the example scripts because they are specific to DSE Search and we wanted to keep the scripts as generic as possible).

server.xml

In DataStax Enterprise versions earlier than 4.8, DSE Search binds its services to 0.0.0.0 unless we configure a connector with a different IP address (from 4.8 onwards, DSE Search binds to the same IP address that Cassandra does). If you’re running a DataStax Enterprise version earlier than 4.8, in the <Service name="Solr"> section of the resources/tomcat/conf/server.xml file you should add (for node1):

    <Connector
      port="${http.port}"
      protocol="HTTP/1.1" 
      address="127.0.0.1" 
      connectionTimeout="20000" 
      redirectPort="8443" 
    />

For the rest of the nodes, you’ll need to change the IP address accordingly, that is 127.0.0.2 for node2 and 127.0.0.3 for node3.

Environment variables

In the previous sections, we set node-specific environment variables only for DataStax Enterprise Cassandra workloads. For the DSE Search specific environment, set these additional variables:

    $ export TOMCAT_HOME=$NODE/resources/tomcat
    $ export TOMCAT_CONF_DIR=$TOMCAT_HOME/conf

These variables are set in the example scripts, so you don’t have to set the variables manually here.

Starting the cluster

To start the nodes with their workload set to search, we need to add the -s flag. For example, for node1:

    $ with-dse-env.sh node1 bin/dse cassandra -s -f

After starting all nodes, we can check that the nodes are running and that their workload is now Search. For example, for node1:

    $ with-dse-env.sh node1 bin/dsetool ring
    Address          DC           Rack         Workload         Status  State    Load             Owns                 Token
    127.0.0.1        Solr         rack1        Search           Up      Normal   119.29 KB        75.15%               -3479529816454052534
    127.0.0.2        Solr         rack1        Search           Up      Normal   152.1 KB         19.71%               156529866135460310
    127.0.0.3        Solr         rack1        Search           Up      Normal   61.39 KB         5.14%                1105391149018994887

Notes and Caveats (sort of a conclusion)

In the above sections, we’ve outlined how you can set up a cluster of DataStax Enterprise Cassandra (and DSE Search) nodes that all run on the same host. We’ve simplified the setup to keep the tutorial brief, and provided several helper scripts to help you get started trying out dense-node installations in a development environment.

However, before you rush and put this in production, there are several points you should consider:

  1. Network interfaces: in this tutorial all nodes are bound to the same network interface. In production, however, this configuration provides poor performance. DataStax recommends one network adapter per node.
  2. Disks: just like with network adapters, the nodes are storing their data on the same physical disk. To minimize contention, configure assigned locations for each node so that their data ends up on different disks; for example, use different partitions for the commit log, the data, the logs, and so on.
  3. Replica placement: in terms of fault tolerance, having all replicas of a shard on the same physical host is not a great idea. To have replicas reside on different physical hosts, configure the PropertyFileSnitch so that all shards (taking into account the replication factor) have copies on different machines:
  • distribute your cluster across physical machines, e.g. host1 runs nodes a and b, host2 runs nodes c and d
  • configure each node to use the PropertyFileSnitch
  • place nodes in host1 as being in rack1, nodes in host2 as being in rack2
  4. cassandra -stop will stop all nodes on the host; consider using the -p pid option to stop a specific node (this is left as an exercise to the reader)
  5. numactl: use numactl --cpunodebind to split multi-socket machines down the middle; a sketch follows below. In our experience, this configuration provides a significant performance boost compared to interleaving, and as a bonus it provides much better isolation since the JVMs will never run on the same cores, avoiding all manner of performance-degrading behavior. You must modify bin/cassandra to override the hard-coded numactl --interleave if the numactl binary is available.
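As a rough illustration (the socket numbers and paths are assumptions, and remember that bin/cassandra must be modified as noted above, otherwise its own numactl --interleave invocation takes over), starting two nodes pinned to different sockets might look like:

    # Pin node1 to CPU socket 0 and node2 to socket 1, binding memory locally as well.
    $ numactl --cpunodebind=0 --membind=0 ./with-dse-env.sh node1 bin/dse cassandra -f
    $ numactl --cpunodebind=1 --membind=1 ./with-dse-env.sh node2 bin/dse cassandra -f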

Python Driver 3.0.0 Released, Including Support for Cassandra 3.0.0


Today we are happy to announce the release of the DataStax Python Driver 3.0.0 for Apache Cassandra. The main focus of this release was to add support for the updated schema metadata introduced in Cassandra 3.0, while maintaining compatibility with earlier server versions. This being a major release, we also took the opportunity to remove deprecated features and make a few improvements to the existing driver API.

Highlighted API Updates:

  • Default consistency level is changed from ONE to LOCAL_ONE
  • Execution API updates
    • Queries always return a new ResultSet object, instead of returning different types based on paging parameters.
    • Trace data is no longer attached to statements.
    • Binding named parameters on prepared statements now ignores extra names.
  • blist is removed as a soft dependency, queries now return util.SortedSet for sets
  • Metadata model API changes; most notably, model types are now CQL strings instead of reflecting Cassandra internal types

All API changes are discussed in detail in the Upgrade Guide found in the latest documentation.

In addition to the metadata support and API changes, a number of bug fixes and minor improvements are also included. See the CHANGELOG for a complete listing of tickets.

As always, thanks to all who provided contributions and bug reports. The continued involvement of the community is appreciated:

Interpreting Cassandra repair logs and leveraging the OpsCenter repair service


Introduction to repairs and the Repair Service

Cassandra repairs consist of comparing data between replica nodes, identifying inconsistencies, and streaming the latest values for the mismatched data. We can’t compare an entire Cassandra database value by value, so we build Merkle trees to identify the inconsistencies and then stream only the mismatched ranges.

Repairs are expensive: CPU is needed to generate the Merkle trees, and network and I/O are needed to stream the missing data. Repairs also usually trigger lots of compactions if they have not been run for a while (especially when there is a lot of inconsistent data and the leveled or date-tiered compaction strategies are being used).

The OpsCenter repair service splits the repair job into lots of little slices (256 per table) and runs them around the clock, turning a heavy, manual, weekly operation into an automatic, constant job. With the repair service on, clusters will see higher but more consistent CPU utilization and load, instead of a big spike once per week.

Steps in a repair

Each repair session is identified by a UUID (for example #0d4544b0-8fc9-11e5-a498-4b9679ec178d). The following are the logs from a healthy repair. Notice that all the messages are INFO messages; there are no WARN or ERROR messages.

A repair session consists of repair jobs, one for each table in the session.

Repeat for every table:
1) RepairJob.java: request Merkle trees
2) RepairSession.java: receive Merkle trees
3) Differencer.java: check for inconsistencies
4) StreamingRepairTask.java (optional): stream the differences, if any
5) RepairSession.java: the table is fully synced
Then, once per session:
6) StorageService.java: Repair session for range (,] finished

Summarizing repair logs – clean run

To group the tasks, use the following bash one-liner:

$ cat system.log| grep "11ff9870-8fc9-11e5-a498-4b9679ec178d" | sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/Source/'|sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/Target/' | awk '{ split($3,a,":"); $2=a[0] ; $3=""; $4=""; print }'|uniq -c

In this case there was no streaming and all the jobs complete successfully for the range.

   1 INFO    RepairSession.java:260 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] new session: will sync /Source, /Target1, /Target2, ...... on range (4393112290973329820,4394202908924102592] for OpsCenter.[rollups86400, events_timeline, rollups7200, events, bestpractice_results, backup_reports, settings, rollups60, rollups300, pdps]
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for rollups86400 (to [/Source, /Target, ...])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for rollups86400 from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for rollups86400
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] rollups86400 is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for events_timeline (to [/Source, /Target, ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for events_timeline from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for events_timeline
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] events_timeline is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for rollups7200 (to [/Source, /Target, ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for rollups7200 from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for rollups7200
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] rollups7200 is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for events (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for events from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for events
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] events is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for bestpractice_results (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for bestpractice_results from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for bestpractice_results
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] bestpractice_results is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for backup_reports (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for backup_reports from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for backup_reports
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] backup_reports is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for settings (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for settings from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for settings
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] settings is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for rollups60 (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for rollups60 from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for rollups60
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] rollups60 is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for rollups300 (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for rollups300 from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for rollups300
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] rollups300 is fully synced
   1 INFO    RepairJob.java:163 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for pdps (to [/Source, /Target,  ... ])
   9 INFO    RepairSession.java:171 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for pdps from /Source
  36 INFO    Differencer.java:67 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for pdps
   1 INFO    RepairSession.java:237 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] pdps is fully synced
   1 INFO    RepairSession.java:299 - [repair #11ff9870-8fc9-11e5-a498-4b9679ec178d] session completed successfully
   1 INFO    StorageService.java:3001 - Repair session 11ff9870-8fc9-11e5-a498-4b9679ec178d for range (4393112290973329820,4394202908924102592] finished

Summarizing repair logs – errors

Now let’s look at a repair session with some errors.

$ cat system.log| grep "0fb1b0d0-8fc9-11e5-a498-4b9679ec178d" | sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/Source/'|sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/Target/' | awk '{ split($3,a,":"); $2=a[0] ; $3=""; $4=""; print }'|uniq -c
   1 INFO    RepairSession.java:260 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] new session: will sync /Source, /Target, ... on range (4393112290973329820,4394202908924102592] for keyspace.[table1, table2, ...]
   1 INFO    RepairJob.java:163 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for table1 (to [/Source, /Target, ...])
   9 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for table1 from /Source
  36 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for datasources
   1 INFO    RepairSession.java:237 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] datasources is fully synced
   1 INFO    RepairJob.java:163 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for table1 (to [/Source, /Target, ...])
   9 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for table1 from /Source
  36 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex_error
   1 INFO    RepairSession.java:237 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] tablex_error is fully synced
   1 INFO    RepairJob.java:163 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for tablex (to [/Source, /Target, ])
   9 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for tablex from /Source
   9 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   2 INFO    Differencer.java:74 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target have 1 range(s) out of sync for tablex
   1 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   3 INFO    Differencer.java:74 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target have 1 range(s) out of sync for tablex
   1 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   3 INFO    Differencer.java:74 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target have 1 range(s) out of sync for tablex
   1 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   1 INFO    Differencer.java:74 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target have 1 range(s) out of sync for tablex
   3 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   3 INFO    Differencer.java:74 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target have 1 range(s) out of sync for tablex
   1 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   6 INFO    Differencer.java:74 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target have 1 range(s) out of sync for tablex
   2 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
  11 INFO    StreamingRepairTask.java:81 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Forwarding streaming repair of 1 ranges to /Source (to be streamed with /Target)
   2 INFO    StreamingRepairTask.java:68 - [streaming task #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Performing streaming repair of 1 ranges with /Source
   1 INFO    StreamingRepairTask.java:81 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Forwarding streaming repair of 1 ranges to /Source (to be streamed with /Target)
   4 INFO    StreamingRepairTask.java:68 - [streaming task #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Performing streaming repair of 1 ranges with /Source
   1 INFO    RepairJob.java:163 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for tablex (to [/Source, /Target, ...])
   1 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for tablex from /Source
   1 INFO    StreamingRepairTask.java:96 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] streaming task succeed, returning response to /Target
   1 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for tablex from /Source
   2 INFO    StreamingRepairTask.java:96 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] streaming task succeed, returning response to /Target
   7 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for tablex from /Source
  36 INFO    Differencer.java:67 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Endpoints /Source and /Target are consistent for tablex
   1 INFO    RepairSession.java:237 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] tablex is fully synced (1 remaining column family to sync for this session)
   1 INFO    StreamingRepairTask.java:96 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] streaming task succeed, returning response to /Target
   1 INFO    RepairJob.java:163 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] requesting merkle trees for tablex_processed (to [/Source, /Target, ...])
   2 INFO    StreamingRepairTask.java:96 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] streaming task succeed, returning response to /Target
   2 INFO    RepairSession.java:171 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Received merkle tree for tablex from /Source
   1 ERROR    RepairSession.java:303 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] session completed with the following error
   1 org.apache.cassandra.exceptions.RepairException:    keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target
   1 java.lang.RuntimeException:    on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target
   1 Caused    #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target
   1 ERROR    StorageService.java:3008 - Repair session 0fb1b0d0-8fc9-11e5-a498-4b9679ec178d for range (4393112290973329820,4394202908924102592] failed with error org.apache.cassandra.exceptions.RepairException: [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target
   1 java.util.concurrent.ExecutionException:    #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target
   1 Caused    [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target
   1 Caused    #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target

Notice the error message

ERROR    StorageService.java:3008 - Repair session 0fb1b0d0-8fc9-11e5-a498-4b9679ec178d for range (4393112290973329820,4394202908924102592] failed with error org.apache.cassandra.exceptions.RepairException: [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d on keyspace/tablex, (4393112290973329820,4394202908924102592]] Sync failed between /Source and /Target

Streaming for the range (4393112290973329820, 4394202908924102592] between nodes Source and Target failed and caused the repair session to fail. The OpsCenter repair service will retry this session (up to a configurable number of times) on failure.

What happened?

The failure may have been due to a networking problem, an sstable corruption, etc. To get more information, we can 1) check our logs at the Target repair node for additional errors and 2) run the slice again and see if it works.

Check the target

Checking the logs at the target, there are no failures associated with this repair session, which means the problem must have been on the stream-receiving end.

$ cat system.log| grep "0fb1b0d0-8fc9-11e5-a498-4b9679ec178d" | sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/Source/'|sed -E 's/([0-9]{1,3}\.){3}[0-9]{1,3}/Target/' | awk '{ split($3,a,":"); $2=a[0] ; $3=""; $4=""; print }'|uniq -c
   1 INFO    Validator.java:257 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Sending completed merkle tree to /Source for keyspace/tablex
   1 INFO    Validator.java:257 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Sending completed merkle tree to /Source for keyspace/tablex
   1 INFO    Validator.java:257 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Sending completed merkle tree to /Source for keyspace/tablex
   2 INFO    StreamingRepairTask.java:68 - [streaming task #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Performing streaming repair of 1 ranges with /Source
   1 INFO    Validator.java:257 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Sending completed merkle tree to /Source for keyspace/tablex
   1 INFO    Validator.java:257 - [repair #0fb1b0d0-8fc9-11e5-a498-4b9679ec178d] Sending completed merkle tree to /Source for keyspace/tablex

Run the slice

In this case we can run:

nodetool repair -par -st 4393112290973329820 -et 4394202908924102592

Here is the result (notice that the networking problem was temporary and the repair slice now succeeds):

$ nodetool repair -par -st 4393112290973329820 -et 4394202908924102592
[2015-11-23 21:36:44,138] Starting repair command #1, repairing 1 ranges for keyspace dse_perf (parallelism=PARALLEL, full=true)
[2015-11-23 21:36:46,086] Repair session 4aa7a290-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:36:46,086] Repair command #1 finished
[2015-11-23 21:36:46,095] Starting repair command #2, repairing 1 ranges for keyspace keyspace (parallelism=PARALLEL, full=true)
[2015-11-23 21:36:47,967] Repair session 4bc6f540-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:36:47,968] Repair command #2 finished
[2015-11-23 21:36:47,979] Nothing to repair for keyspace 'system'
[2015-11-23 21:36:47,985] Starting repair command #3, repairing 1 ranges for keyspace keyspace (parallelism=PARALLEL, full=true)
[2015-11-23 21:36:53,100] Repair session 4ce92e20-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:36:53,102] Repair command #3 finished
[2015-11-23 21:36:53,112] Starting repair command #4, repairing 1 ranges for keyspace dse_system (parallelism=PARALLEL, full=true)
[2015-11-23 21:36:53,979] Repair session 4fe50900-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:36:53,979] Repair command #4 finished
[2015-11-23 21:36:53,987] Starting repair command #5, repairing 1 ranges for keyspace keyspace (parallelism=PARALLEL, full=true)
[2015-11-23 21:36:58,390] Repair session 507477c0-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:36:58,390] Repair command #5 finished
[2015-11-23 21:36:58,399] Starting repair command #6, repairing 1 ranges for keyspace OpsCenter (parallelism=PARALLEL, full=true)
[2015-11-23 21:37:11,448] Repair session 531931f0-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:37:11,448] Repair command #6 finished
[2015-11-23 21:37:11,458] Starting repair command #7, repairing 1 ranges for keyspace system_traces (parallelism=PARALLEL, full=true)
[2015-11-23 21:37:11,878] Repair session 5ae2e890-922a-11e5-ae1c-4b5d0d7247d3 for range (4393112290973329820,4394202908924102592] finished
[2015-11-23 21:37:11,878] Repair command #7 finished

There are a couple of known streaming issues to keep an eye on: CASSANDRA-10791 and CASSANDRA-10012 may cause streaming errors. If you are on an affected version, upgrade. If you encounter a reproducible streaming error and can’t find the particular stack trace in an existing JIRA ticket, open a new one.

A corruption

In a different repair session we see a different repair error, this time it refers to a specific sstable.

WARN  [STREAM-IN-/x.x.x.x] 2015-11-20 20:55:45,529  StreamSession.java:625 - [Stream #114cea40-8fc9-11e5-ae1c-4b5d0d7247d3] Retrying for following error  
java.lang.RuntimeException: Last written key DecoratedKey(4393675392884570836, 000b313032333432333334303000000343504600) >= current key DecoratedKey(918610503192973903, 00102941d767895c11e5b5dfaadf7c0db7b80000087765627369746573ff) writing into /cassandra/data/mykeyspace/table1-bd0990407e6711e5a4cae7fe9b813e81/mykeyspace-table1-tmp-ka-7890-Data.db  
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:164) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:261) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:168) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.streaming.compress.CompressedStreamReader.read(CompressedStreamReader.java:89) ~[cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48) [cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) [cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) [cassandra-all-2.1.11.908.jar:2.1.11.908]
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250) [cassandra-all-2.1.11.908.jar:2.1.11.908]
    at java.lang.Thread.run(Unknown Source) [na:1.8.0_60]

When there is an sstable corruption (due to disk failures or possibly even a bug), the procedure is to run nodetool scrub on the affected table, which rewrites the sstables and discards the corrupted data.
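For example, for the keyspace and table named in the stack trace above:

    $ nodetool scrub mykeyspace table1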

In this case the issue was due to CASSANDRA-9133 which was fixed in 2.0.15 so an upgrade was also in order!

Summary, stay repaired!

Repairs can fail due to networking issues or sstable corruptions. The former are usually short lived and will go away on retry; the latter are more rare and require admin intervention in the form of running nodetool scrub.

Remember, repairs aren’t things you run when your cluster is broken; they are a mandatory anti-entropy administrative task (like an oil change) that keeps your cluster healthy. In many cases, running a repair on an unhealthy cluster will just make things worse.

Hopefully this post will help you understand how repairs work, how to troubleshoot the repair service, and keep your production cluster happy and your database boring. Enjoy!

DataStax DevCenter 1.5 is now MATERIALIZED with Apache Cassandra™ 3.0 support!


We’re very pleased to announce the availability of DataStax DevCenter 1.5, which can be downloaded here.
This new version is compatible with Apache Cassandra™ 3.0 supporting Materialized Views (blog / docs) and Multiple Indexes (docs) with content assist, quick fix suggestions, validations, and wizards!

You will also find numerous improvements and bug fixes in this release. Without further ado, let me walk you through the most notable features.

Materialized Views

Apache Cassandra™ 3.0 introduced Materialized Views, a powerful feature that handles automated server-side denormalization, removing the need for client-side handling of this denormalization and ensuring eventual consistency between the base table and the view data.
The new CQL statements for Materialized Views are very similar to those for tables. They support pretty much the same properties, with a few exceptions, but don’t worry, DevCenter will help you with that:

Fig 1: Materialized View statements highlighting and validation

Also, when you create a new Materialized View, you are required to include in its primary key the columns of the base table’s primary key. If you’re using the wizard, this happens automatically for your convenience:

Fig 2: Create Materialized View wizard

Of course, there’s an ALTER wizard too, but please note that you are not allowed to change the definition of a Materialized View, meaning that you cannot drop or add columns to an existing view, and therefore we take you straight to the Advanced Settings page:

Fig 3: Alter Materialized View wizard

Once the views are created they can be found in the Schema View panel at two different levels: as top-level elements under their keyspace node, along with all other views in that keyspace, and nested under the base table node, where you will find only the views for that specific table.

Fig 4: Materialized Views shown in the Schema View panel

By right-clicking on a Materialized View in this panel you will have the options to DROP and CLONE it, as with any other element.
It’s worth mentioning that you cannot drop a base table if you still have views associated with it. If you try to do that, DevCenter will display an error message like this:

Fig 5: Drop Table dialog checks for associated views

Multiple Indexes

Starting in Cassandra 3.0, it’s possible to create multiple indexes on the same column as long as they don’t duplicate each other. Let’s see some examples of how that works in the DevCenter CQL editor:

Fig 6: Multiple Indexes support

Also, as you can see in the screenshot above, Cassandra 3.0 introduced the keyword VALUES for indexes on a map column. Previously, if no specific index type was given for a map column, Cassandra would assume a values index; now you can make it explicit by using the keyword. In DevCenter it’s fine to use both notations, and we treat them as the same thing, so in the example above you can see that the index in line 13 is actually duplicating the index in line 12.
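As a rough illustration of that duplicate check, run from cqlsh (the keyspace, table and index names here are made up for the example):

    $ cqlsh -e "
      CREATE KEYSPACE IF NOT EXISTS ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
      CREATE TABLE ks.users (id int PRIMARY KEY, tags map<text, text>);
      CREATE INDEX tags_keys_idx   ON ks.users ( KEYS(tags) );    -- index on the map keys
      CREATE INDEX tags_values_idx ON ks.users ( VALUES(tags) );  -- explicit VALUES index
      CREATE INDEX tags_dup_idx    ON ks.users ( tags );          -- same target as VALUES(tags), i.e. a duplicate
      "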

The Index Wizard has been updated, too. Up until now the Index Wizard would propose each column only once, but starting in DevCenter 1.5, it will recognize when you’re connected to a Cassandra 3.0 cluster and propose the same column multiple times.

UDF/UDT Drop Dialogs

DevCenter 1.5 now makes it easy to drop User-defined Types and User-defined Functions.
Dropping a schema element in CQL usually takes the form DROP <element-type> <element-name>, but for functions, when there are overloads, it’s required that you specify the entire function signature like this:

Fig 7: Drop function with full signature

Instead of doing that, now you can simply right-click on the function you want to drop in the Schema View and select the Drop Function option:

Fig 8: Drop function context menu

Similarly to the Materialized View and base Table reference check, we also check for references when dropping a function (that could be referenced by an Aggregate) and a Type (that could be referenced by a Table or another UDT) and if that’s the case you will see an error message listing all the dependencies.

Improved Timestamp Format

A few of our users requested (through the built-in Feedback form; don’t hesitate to use it, too) that dates in DevCenter use the same format as in cqlsh (the CQL shell for command-line queries), and this is now the format used:

Fig 9: Improved date format in the results grid and details panel

New Icons for Schema Entities

DevCenter 1.5 has some new and updated icons for schema entities that stand out more and provide a better indication of their purpose; check them out in the screenshot below:

Fig 10: New icons for functions, aggregates, types and materialized views


Node.js Driver Adds Support for Cassandra 3.0


Version 3.0.0 of the DataStax Node.js driver is now available with support for Apache Cassandra 3.0.

The main focus for this release was to add support for the changes in the schema metadata introduced in Cassandra 3.0 that is used internally by the driver. Additionally, we exposed Materialized Views metadata information and introduced a new Index metadata API.

We also made other improvements to the driver.

Stream Throttling and Manual Paging

Client#stream() now features throttling. When retrieving large result sets that span multiple pages of rows, the driver will only request the following rows once the previous rows have been read from the stream.
Client#eachRow() results now expose a nextPage function to trigger the request for the following page, reusing the same callbacks, as an option to manually retrieve subsequent pages without having to deal with pageState.
You can read more about retrieving large result sets and paging in the Node.js driver in the documentation.

Performance Improvements

The driver now includes message coalescing, making fewer syscalls in highly concurrent scenarios. We also added some small performance improvements, like avoiding expensive JavaScript calls in the common execution path. You can read more on the tickets: NODEJS-142, NODEJS-130, NODEJS-198, and NODEJS-200.

A complete list of changes can be found on the changelog.

Looking Forward

Up until now, the Node.js driver used its version numbers to denote compatibility with Apache Cassandra. This is a good way to avoid doubts about which driver versions support which Cassandra versions, but it prevented us from delivering major/breaking changes to the driver API independently of Cassandra releases.

For future versions, we decided to move to pure semantic versioning.

As you may already know, when adding support for newer Cassandra / DSE versions in the Node.js driver, compatibility with earlier Cassandra versions was always maintained. This will continue to be the case.

Your feedback is important to us and it influences our priorities. To provide feedback use the following:

Advanced Time Series Data Modelling



Collecting Time Series Vs Storing Time Series

Cassandra is well known as the database of choice for collecting time series events. These may be messages, events or similar transactions that have a time element to them. If you are not familiar with how Cassandra stores time series data, there is a useful data modelling tutorial on the DataStax Academy website.

https://academy.datastax.com/demos/getting-started-time-series-data-modeling

In this document I will try to explain some of the pros and cons of using time series in Cassandra and show some techniques and tips which may make your application better, not just for now but also 5 years down the line.

Choosing your long term storage

Choosing your long term storage is not really a trivial thing. In most applications there are business requirements about how long data needs to be held, and sometimes these requirements change. More and more, businesses want, and are required, to hold data for longer. For example, a lot of financial companies must keep audit data for up to seven years.

Using some sample applications.

We will look at some examples and see how time series is used for each.

1. A credit card account which shows transactions for a particular account number. Data is streamed in real time.

2. Collecting energy data for a smart meter. Data comes from files sent from devices after one day of activity.

3. Tick data for a financial instrument. Data is streamed in real time.

All of the above use cases are time series examples and would benefit from using Cassandra. But when we look at the queries and retention policies for this data, we may consider different ways of storing it.

Clustering columns for time series.

The credit card application will need to query a user’s transactions and show them to the user. They will need to be in descending order with the latest transaction first, and the data may be paged over multiple pages. This data needs to be kept for 7 years.

Using a simple clustering column in the table definition allows all the transactions for a particular account to be stored in a single partition for extremely fast retrieval.

Our table model would be similar to this:

create table if not exists latest_transactions(
 credit_card_no text,
 transaction_time timestamp,
 transaction_id text,
 user_id text,
 location text,
 items map<text, double>,
 merchant text,
 amount double,
 status text,
 notes text,
 PRIMARY KEY (credit_card_no, transaction_time)
) WITH CLUSTERING ORDER BY ( transaction_time desc);
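A typical query then pages through a card’s most recent transactions, newest first. As a quick illustration (the keyspace name and card number are made up):

    $ cqlsh -e "SELECT transaction_time, merchant, amount, status
                FROM ks.latest_transactions
                WHERE credit_card_no = '1234-5678-9012-3456'
                LIMIT 20;"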

The smart meter application is a little different. For each meter, data comes in as an increment to the meter value every 30 minutes, e.g. 00:00: 13, 00:30: 11, 01:00: 3, ..., 23:30: 10. The daily amount is therefore an aggregation of all of those data points.

The business requirements state that the data must be held for 5 years and that a day’s data will always be looked up together. Cassandra has a Map column type which can be used to hold our daily readings as time offset/value pairs.

Our table model would look something like this:

create table if not exists smart_meter_reading (
 meter_id int,
 date timestamp,
 source_id text,
 readings map<text, double>,
 PRIMARY KEY(meter_id, date)
) WITH CLUSTERING ORDER BY(date desc);

This seems sensible until we look at how much data we will be holding and for how long. This application has 10 million meters and hopes to double that over the next 3 years. If we start at 10 million customers holding 5 years (365*5 days) of data, with 48 columns of offset data per day (one every half hour), this quickly adds up to over 500 billion points (a map of 48 entries is held as 48 columns), and we haven’t even accounted for the growth over those years. Since we don’t need to query the readings individually, other storage options might suit us better. A map can easily be transformed to and from a JSON string, which lets us hold the same data without the overhead of all those columns.

create table if not exists smart_meter_reading (
 meter_id int,
 date timestamp,
 source_id text,
 readings text,
 PRIMARY KEY(meter_id, date)
) WITH CLUSTERING ORDER BY(date desc);

So instead of over 500 billion individual points we now store roughly 18 billion rows (one per meter per day).
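For illustration, a day’s readings could then be written as a single JSON string (the keyspace name and values below are made up):

    $ cqlsh -e "INSERT INTO ks.smart_meter_reading (meter_id, date, source_id, readings)
                VALUES (42, '2015-11-23', 'headend-1',
                        '{\"00:00\": 13, \"00:30\": 11, \"01:00\": 3}');"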

Now we finally look at application no. 3. In this case we have data streaming to our application for thousands of different instruments, and we can expect around 100,000 ticks a day on some of them. If the requirement is to hold this data long term and to be able to create different views of the data for charting, the storage requirement will be extremely large.

Collecting data vs Storing data.

We can collect the data in the traditional way, using a clustering column, with a table like so:

CREATE TABLE tick_data ( 
 symbol text,
 date timestamp,
 time_of_day timestamp,
 value double,
 PRIMARY KEY ((symbol, date), time_of_day)
) WITH CLUSTERING ORDER BY (time_of_day DESC);

When we think of keeping this data long term, we have to understand the implications of having billions of columns in our tables. Our normal queries will be charting the last 5 days of instrument data in 15 minute intervals, or showing the open, high, low and close of an instrument for the last 10 days in 30 minute intervals. So 99% of the queries will be looking at whole days of data. In this example we can therefore create a second table for long term storage, which handles any request for data that is not from today. At the end of each day we can compress the data and store it more efficiently for the rest of its life in the database.

For example we can use the following:

CREATE TABLE tick_data_binary ( 
 symbol text,
 date timestamp,
 dates blob,
 ticks blob,
 PRIMARY KEY ((symbol, date))
);

The tick_data_binary table can sustain inserts and reads of around 5 million ticks per server, compared to 25,000 for the tick_data table. The tick_data_binary table also needs about a third of the storage of the tick_data table. This is not surprising: instead of holding 100,000 TTLs for all the columns, in the binary example we only hold 1. But there are bigger advantages when it comes to Cassandra’s management services like compaction and repair. Compaction needs to search for tombstones (deleted columns), which means that the more columns we have, the longer compaction can take. A similar problem arises in repair, as it is in fact a compaction job. Comparing the repair time of a table with clustering columns and a table with binary data shows a 10x increase for the clustering table over the binary table.

Trade offs

There are always trade offs with each of the models above. For example, the binary data in particular can’t be filtered using CQL; the filtering needs to happen in application code. This post isn’t meant to be a catch-all for time series applications, but it should help with the modelling of your data, both current and future, and with the thought process that goes into that. In particular, don’t be afraid to change the data model structure once its usefulness has decreased.

Check out https://academy.datastax.com/tutorials for more information on Cassandra and data modelling. Also have a look at the certification options at https://academy.datastax.com/certifications.

For examples of these data models, see the GitHub projects below.

https://github.com/DataStaxCodeSamples/datastax-creditcard-demo

https://github.com/DataStaxCodeSamples/datastax-iot-demo

https://github.com/DataStaxCodeSamples/datastax-tickdata-comparison

https://github.com/DataStaxCodeSamples/datastax-tickdb-full

DataStax C# Driver 3.0 Released


We’ve just released version 3.0.0 of the DataStax C# Driver with support for Apache Cassandra 3.0.

The main focus for this release was to add support for the changes in the schema metadata introduced in Cassandra 3.0 that is used internally by the driver, while maintaining compatibility with earlier versions of Cassandra. Additionally, we added Materialized Views metadata information and introduced a new Index metadata API.

The release also includes the following noteworthy improvements.

Performance Improvements

We are focused on delivering great performance on the .NET runtime. Over the past months we implemented a series of improvements that allow us to deliver significantly higher throughput rates compared to v2.7 of the driver:

Linq / Mapper: Inserting Entities with Null Property Values

We added an option to the Mapper and Linq components to avoid inserting null values, allowing you to avoid creating unnecessary tombstones.

mapper.Insert(new User { FirstName = "Jimi", LastName = "Hendrix" }, insertNulls: false);

Default Setting Changes

We’ve taken the opportunity of a major version release to revamp the default settings to the recommended values.

Looking Forward

Up until now, the C# driver used its version numbers to denote compatibility with Apache Cassandra. This is a good way to avoid doubts about which driver versions support which Cassandra versions, but it prevented us from delivering major/breaking changes to the driver API independently of Cassandra releases.

For future versions, we decided to move to pure semantic versioning.

As you may already know, when adding support for newer Cassandra / DSE versions in the C# driver, compatibility with earlier Cassandra versions was always maintained. This will continue to be the case.

Your feedback is important to us and it influences our priorities. To provide feedback use the following:

Version 3.0.0 of the DataStax C# driver is now available on Nuget.

Tableau + Spark + Cassandra


This article is a simple tutorial explaining how to connect Tableau Software to Apache Cassandra via Apache Spark.

This tutorial explains how to create a simple Tableau Software dashboard based on Cassandra data. The tutorial uses the Spark ODBC driver to integrate Cassandra and Apache Spark. Data and step-by-step instructions for installation and setup of the demo are provided.

1/ Apache Cassandra and DataStax Enterprise

First you need to install a Cassandra cluster and a Spark cluster connected with the DataStax Spark Cassandra connector. A very simple way to do that is to use DataStax Enterprise (DSE): it’s free for development or test, and it contains Apache Cassandra and Apache Spark already linked together.

You can download DataStax Enterprise from https://academy.datastax.com/downloads and find installation instructions here http://docs.datastax.com/en/getting_started/doc/getting_started/installDSE.html.

After the installation is complete, start your DSE Cassandra cluster (it can be a single node) with Spark enabled, using the command “dse cassandra -k”.

2/ Spark Thrift JDBC/ODBC Server

The Spark SQL Thrift server is a JDBC/ODBC server that allows clients like Tableau to connect to Spark (and from there to Cassandra) over JDBC or ODBC. See here for more details: http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/spark/sparkSqlThriftServer.html.

Start the Spark Thrift JDBC/ODBC server with the command line “dse start-spark-sql-thriftserver”.

You should see a new SparkSQL application running at http://127.0.0.1:4040/, listed in the Spark UI manager at http://127.0.0.1:7080/.

The IP address is the address of your Spark Master node. You may need to replace 127.0.0.1 with your instance IP address if you are not running the Spark cluster or DSE locally. With DSE, you can run the command “dsetool sparkmaster” to find your Spark Master node IP.


Note that to connect Tableau Software to Apache Cassandra we could also have connected directly via the DataStax ODBC driver. In that case, however, all computations, joins and aggregates are done on the client side, which is inefficient and risky for large datasets. With Spark, everything runs on the server side and in a distributed manner.

3/ Demo Data

Create the 3 demo tables. You can find all the data and the script to create the CQL schemas and load the tables here: https://drive.google.com/drive/u/1/folders/0BwpBQmtj50DFaU5jWTJtM1pleUU.

When you have downloaded everything, run the script “ScriptCQL.sh” to create the schemas and load the data (cqlsh must be in your path, or download everything into the cqlsh directory). This creates a keyspace named ks_music with 3 tables: albums, performers and countries.
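Once the script has finished, you can quickly check that the data is there, for example:

    $ cqlsh -e "SELECT * FROM ks_music.albums LIMIT 5;"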

4/ ODBC Driver

Download and install the Databricks ODBC driver for Spark from https://databricks.com/spark/odbc-driver-download or from https://academy.datastax.com/downloads/download-drivers.

No specific parameters are needed; the default installation is fine. The Mac version can be found only on the Databricks web site.

5/ Tableau Software

Open Tableau and connect to the Spark server with the following settings from the Connect panel:


The server IP is the Spark Master node IP, so it may also change depending on your installation. You may also need to change the authentication settings depending on your configuration.

6/ Cassandra Connection

Then you should be able to see all Cassandra keyspaces (called Schema in the Tableau interface) and tables (press Enter in the Schema and Table inputs to list all available Cassandra keyspaces and tables).

Drag and drop albums and performers tables from the ks_music keyspace.

Change the inner join clause to use the right columns from the 2 tables: Performer from the albums table and Name from the performers table (click on the blue part of the link between the 2 tables to edit the inner join).

Keep a “Live” connection! Don’t use “Extract”, otherwise all your data will be loaded into Tableau.


Click “Update Now” to see a sample of the data returned.

7/ Tableau Dashboard

Go to the Tableau worksheet “Sheet 1” and start a simple dashboard.

Convert the Year column (from the albums table) to the Discrete type (click to the right of the Year column to do that from a menu).

Add Year (from the albums table) as Rows, Gender (from the performers table) as Columns, and Number of Records as the measure.

And with the “Show Me” option, convert your table into a stacked bar chart.

Done, you have created your first Tableau dashboard on live Cassandra data!

8/ SparkSQL and SQL Queries

Finally, you can check the SQL queries generated on the fly and passed to SparkSQL from the Spark UI at http://127.0.0.1:4040/sql/ (the SQL tab of the SparkSQL UI).

This shows SparkSQL processes and all SQL queries generated by Tableau Software and executed on top of Cassandra data through the Spark Cassandra connector.


Additional links

Improving JBOD


Background

With Cassandra 3.2 we improve the way Cassandra handles JBOD configurations, that is, using multiple data_file_directories. Earlier versions have a few problems which are unlikely to happen but painful if they do. First, if you run more than one data directory and use SizeTieredCompactionStrategy (STCS), you can get failing compactions due to running out of disk space on one of your disks. The reason for this is that STCS picks a few (typically 4) similarly sized sstables from any data directory and compacts them together into a single new file. With a single data directory and the recommended 50% of disk space free, this works fine, but if you have 3 equally sized data directories on separate physical disks, all 50% full, and we create a compaction with all that data, we will put the resulting file in a single data directory and most likely run out of disk space. The same issue can happen with LeveledCompactionStrategy, since we do size-tiered compaction in L0 if compactions fall behind.
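As a reminder, a JBOD setup simply means listing several directories, ideally one per physical disk, in cassandra.yaml. The paths below are only an example:

    $ grep -A3 '^data_file_directories' conf/cassandra.yaml
    data_file_directories:
        - /mnt/disk1/cassandra/data
        - /mnt/disk2/cassandra/data
        - /mnt/disk3/cassandra/data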

The other problem (and the reason CASSANDRA-6696 was created in the first place) is that we can have deleted data come back with the following scenario:

If the tombstone and the actual data are in separate data directories, and the one with the tombstone goes corrupt, we can have deleted data come back:

  1. A user successfully writes key x to the cluster
  2. Some time later, user deletes x (we write a tombstone for x).
  3. This tombstone (by chance) gets stored in a separate data directory from the actual data.
  4. gc_grace_seconds passes.
  5. Two of the nodes containing x compact away both data and tombstone, which is OK since gc_grace has passed.
  6. The third node never runs a compaction that includes all the sstables containing x. This means we can’t drop the tombstone or the actual data.
  7. The data directory containing the tombstone for x gets corrupt on the third node, for example by having the disk backing the data directory break.
  8. Operator does the natural thing and replaces the broken disk and runs repair to get the data back to the node.
  9. The key x is now back on all nodes.

Splitting ranges

The way we decided to solve the problems above was to make sure that a single token never exists in more than one data directory. This means we needed to change the way we do compaction, flushing and streaming to make sure we never write a token in the wrong data directory. Note that we can still have tokens in the wrong directory temporarily, for example after adding nodes or changing the replication factor, but compaction will automatically move tokens to the correct locations as we compact the data. To do this, we need to split the locally owned ranges over the data directories configured for the node. We sum up the number of tokens the node owns, divide by the number of data directories, and then find boundary tokens so that each data directory gets that many tokens. Note that we can’t take disk size into account when splitting the tokens, as we might give too much data to a disk that is big but not 100% dedicated to Cassandra, and we also can’t take the amount of free space into account, as that would make the data directory boundaries change every time we write to the data directory. We are only able to split the ranges for the random partitioners (RandomPartitioner and Murmur3Partitioner); if you run an ordered partitioner, the behaviour stays the same as it is today: we flush/compact/stream to a single file and make no effort to put tokens in a specific data directory.

If you run vnodes, we make sure we never split a single vnode range across two data directories. The reason is that, in the future, this lets us take the affected vnodes offline after a disk failure until the node has been rebuilt. It also enables CASSANDRA-10540, where we split out separate sstables for every local range on the node, meaning each vnode will have its own set of sstables. Note that before CASSANDRA-7032 vnode allocation was random, so with a bit of bad luck the vnode ranges can vary a lot in size and make the amount of data in each data directory unbalanced. In practice this should not be a big problem, since each node typically has 256*3 local ranges (number of tokens * replication factor), making it easier to find boundary tokens that keep the data directories balanced. Also note that with 256*3 local ranges and 3 data directories we will not necessarily put exactly 256 ranges in each directory; instead we sum up the total number of tokens owned by the node and then find boundary tokens that make the token count per data directory as balanced as possible.
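For the vnode case, a similarly hedged sketch (invented names again, not the real implementation) that assigns whole local ranges to data directories while keeping the token count per directory as balanced as possible might look like this:

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch: assign whole vnode ranges (never splitting one) to data
    // directories so that the token count per directory is roughly balanced.
    public class VnodeGrouper {

        static class Range {
            final BigInteger start, end;
            Range(BigInteger start, BigInteger end) { this.start = start; this.end = end; }
            BigInteger tokenCount() { return end.subtract(start); }
        }

        /** localRanges must be sorted by start token and non-overlapping. */
        static List<List<Range>> group(List<Range> localRanges, int dataDirectories) {
            BigInteger total = BigInteger.ZERO;
            for (Range r : localRanges)
                total = total.add(r.tokenCount());
            BigInteger perDirectory = total.divide(BigInteger.valueOf(dataDirectories));

            List<List<Range>> directories = new ArrayList<>();
            List<Range> current = new ArrayList<>();
            BigInteger accumulated = BigInteger.ZERO;
            for (Range r : localRanges) {
                current.add(r);
                accumulated = accumulated.add(r.tokenCount());
                // Close this bucket once it holds roughly its share of tokens,
                // keeping at least one bucket for each remaining directory.
                if (accumulated.compareTo(perDirectory) >= 0 && directories.size() < dataDirectories - 1) {
                    directories.add(current);
                    current = new ArrayList<>();
                    accumulated = BigInteger.ZERO;
                }
            }
            directories.add(current); // the last directory gets whatever remains
            return directories;
        }
    }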

Splitting local range without vnodes – each data directory gets the same number of tokens

Splitting local ranges with vnodes – sum of the number of tokens in the local ranges (green boxes) in each data directory should be balanced

Keeping all data for a given token on the same disk ensures that if a disk breaks we lose every version of that token – this makes it safe to run repair again, since data that was deleted cannot come back to life.

Compaction

To solve the problem of compaction picking sstables from several data directories and putting the result in a single directory, we made compaction data-directory-local. Since CASSANDRA-8004 we run two compaction strategy instances – one for unrepaired data and one for repaired data – and after CASSANDRA-6696 we run one such pair of compaction strategy instances per data directory. Since a compaction can never pick sstables from two different compaction strategy instances, every compaction stays local to its data directory. If all tokens are placed correctly, the result stays in the same directory where the original sstables live – but if some tokens are in the wrong place, they are written into new sstables in the correct locations.
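To illustrate the bookkeeping (hypothetical class and method names, not the actual Cassandra internals), it amounts to keeping one pair of strategy instances per data directory and routing each sstable to the pair that owns its directory:

    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;

    // Minimal sketch: one pair of compaction strategy instances (unrepaired +
    // repaired) per data directory; sstables are routed by the directory they
    // live in, so a compaction can only ever mix sstables from one directory.
    public class PerDirectoryStrategies {

        interface CompactionStrategy { void addSSTable(File sstable); }

        static class StrategyPair {
            final CompactionStrategy unrepaired;
            final CompactionStrategy repaired;
            StrategyPair(CompactionStrategy unrepaired, CompactionStrategy repaired) {
                this.unrepaired = unrepaired;
                this.repaired = repaired;
            }
        }

        private final Map<File, StrategyPair> byDataDirectory = new HashMap<>();

        void register(File dataDirectory, StrategyPair pair) {
            byDataDirectory.put(dataDirectory, pair);
        }

        // Route an sstable to the strategy pair of the data directory it lives in.
        void add(File dataDirectory, File sstable, boolean isRepaired) {
            StrategyPair pair = byDataDirectory.get(dataDirectory);
            (isRepaired ? pair.repaired : pair.unrepaired).addSSTable(sstable);
        }
    }

Because every sstable is registered with exactly one pair, a compaction started from any strategy instance can only ever see sstables from that one data directory.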

Partitioning compactions over data directories like this also makes it possible to run more compactions in parallel with LeveledCompactionStrategy, since we now know that the sstables in one data directory cannot overlap with the sstables in another.

Major compaction

When a user triggers a major compaction, each compaction strategy instance picks all of its sstables and runs a major compaction over them. If all tokens are in the correct place, the resulting sstable(s) stay within the same compaction strategy instance; if some tokens are in the wrong place, they are moved into new sstables in the correct data directories. When running STCS, users might expect a single sstable (or two if you run incremental repairs: one for unrepaired and one for repaired data) as the result of a major compaction; after CASSANDRA-6696 we instead end up with one (or two) sstables per data directory if all tokens are in the correct place, and more sstables otherwise.
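For example (illustrative numbers only): with three data directories and incremental repairs enabled, a major compaction on a node whose tokens are all in the right place leaves at most six sstables, one repaired and one unrepaired sstable per data directory.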

Flushing

Flushing is now multi-threaded, with one thread per data directory. When we start a flush, we split the memtable into as many parts as there are data directories and give each part to its own thread to write to disk. This should improve flush throughput, unless you flush tiny sstables where the overhead of splitting is bigger than the gain from writing the parts in parallel.
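A minimal sketch of that flush path (hypothetical Memtable and MemtableSlice types, not Cassandra's real flush code) could look like this:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Minimal sketch: flush a memtable by splitting it at the data directory
    // boundary tokens and writing each slice in parallel, one thread per directory.
    public class ParallelFlush {

        interface MemtableSlice { void writeToSSTable(); }                                      // hypothetical
        interface Memtable { List<MemtableSlice> splitAtBoundaries(int dataDirectories); }      // hypothetical

        static void flush(Memtable memtable, int dataDirectories) throws Exception {
            List<MemtableSlice> slices = memtable.splitAtBoundaries(dataDirectories);
            ExecutorService flushers = Executors.newFixedThreadPool(dataDirectories);
            try {
                List<Future<?>> writes = new ArrayList<>();
                for (MemtableSlice slice : slices)
                    writes.add(flushers.submit(slice::writeToSSTable)); // each slice targets its own directory
                for (Future<?> write : writes)
                    write.get(); // the flush completes only when every slice is on disk
            } finally {
                flushers.shutdown();
            }
        }
    }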

Streaming

We simply write the incoming stream to the correct locations, which means that one remote file can get written to several local ones.
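To make that concrete, here is a minimal sketch (hypothetical writer and partition interfaces, not the streaming code itself) of routing one incoming, token-ordered stream into one local sstable writer per data directory, switching writers at the boundary tokens:

    import java.math.BigInteger;
    import java.util.Iterator;
    import java.util.List;

    // Minimal sketch: route partitions from one incoming stream into several
    // local sstables, one writer per data directory, split at the boundary tokens.
    public class StreamRouter {

        interface Partition { BigInteger token(); }                                      // hypothetical
        interface SSTableWriter { void append(Partition p); void close(); }              // hypothetical
        interface WriterFactory { SSTableWriter newWriterFor(int dataDirectoryIndex); }  // hypothetical

        // boundaries.get(i) is the last token belonging to data directory i; the
        // final entry is the partitioner's maximum token, so every partition fits.
        static void receive(Iterator<Partition> incoming, List<BigInteger> boundaries, WriterFactory factory) {
            int directory = 0;
            SSTableWriter writer = factory.newWriterFor(directory);
            while (incoming.hasNext()) {
                Partition p = incoming.next(); // partitions arrive in token order
                while (p.token().compareTo(boundaries.get(directory)) > 0) {
                    writer.close();            // crossed a boundary: finish this local sstable
                    writer = factory.newWriterFor(++directory);
                }
                writer.append(p);
            }
            writer.close();
        }
    }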

Backups

This also makes it possible to back up and restore individual disks. Before CASSANDRA-6696 you would always need to restore the entire node, since tokens could have been compacted onto the disk from other disks between the time you took your backup and the time the disk crashed. Now you can take the disk's backup plus its incremental backup files and restore just the single disk that died.

Migration/Upgrading

You don’t need to do anything to migrate – compaction will take care of moving tokens to the correct data directories. If you want to speed this up, there is nodetool relocatesstables, which rewrites any sstable containing tokens that belong in another data directory. This command can also be used to move your data to the correct places if you change the replication factor or add a new disk. nodetool relocatesstables is a no-op if all your tokens are already in the correct places.
