Reinforcement learning with a deep neural network has been applied to a supply chain logistics problem: how to optimize pickup and delivery schedules in a stochastic environment.
Trying to modernize monolithic legacy applications is hard: these applications are core drivers of the business and the risk of messing them up is too great. However, as time goes on, the cost of maintaining these monoliths grows.
A case study presented at Graph Day, recounting an evaluation we did for a client to see whether their database could be reorganized to offer improved query performance. We looked at graph databases (OrientDB, Titan, Neo4J) because they thought of their data as graph data, and a relational database (PostgreSQL) because that's what they were already running.
We are in an era of unprecedented innovation in databases. Data-intensive companies are grappling with whether the many new options — NoSQL, Key-Value, Document, Column Family, Column-Oriented — are appropriate for them. The commercial success of Facebook and LinkedIn makes graph databases a hot area of investigation. Unlike many new databases, they are not a variation on or a simplification of relational databases. Instead they require new ways of thinking and modeling data. In return they can answer truly novel questions.
We are meeting more people who are interested in looking into the world of graph databases. Palladium has executed proofs of concept for clients to help them explore this world. In this post we summarize what sorts of questions we feel a proof-of-concept project can answer, and how we typically tackle them. For our presentation at Graph Day, we'll walk through one in particular, but there are a variety of questions you may want answered.
As part of the work we're doing to refresh our graph database evaluation for a couple of clients (and our upcoming talk at Graph Day!), we took Titan 1.0 out for a spin last week. We'll be doing more in-depth explorations on some in-house and public datasets over the next few weeks, but here are some preliminary impressions based on a comparison with the Titan we came to know a year or so ago.
Our client's legacy system held graph-like data in a relational database, but new customers' data sizes were crippling performance and scale. As part of an overall architectural rejuvenation, we evaluated migrating their data to graph and relational schemas to determine if query performance and scalability could be improved. With representative data in hand, we designed alternate relational schemas, graph database designs, and triple store designs, benchmarking performance and noting subjective measures such as ease of use and fluency of the query language. Vendors included PostgreSQL, Neo4J, Titan, and AllegroGraph. Follow-up studies included other vendors. The results surprised us, leading to a hybrid relational and graph recommendation. We have implemented the first milestone over the last year. Follow-up work shows that graph DB vendors have come a long way even in that time. The methodology and information in this case study should be useful to teams choosing a database engine, whether graph or relational, for their next project.
This post describes how to debug some library dependency issues on a Linux machine. I built a nightly version of Julia (a language for technical computing that we're pretty excited about here) on Linux, deployed it to a different machine, but then it failed to launch, complaining about…
What if, no matter how you try to simplify, your aggregate root is pretty darn big? Writing application services to handle these large entities is a challenge. We run into this all the time with scientific computing.
Fascinating though it is, I’m happy to observe prison life from the outside through shows like Oz or Orange is the New Black. It’s the strange way prison mirrors the outside world that’s so compelling. They have police (gangs) and wars (gangs) and commerce (smuggling) and currency (cigarettes, stamps, etc.) just the same as the free world.
It’s hard to hire good developers. We face the same struggles everyone does in sorting the good from the bad. One of my favorite tropes is the resume as failed inductive proof.
In part 1 of this series, we looked at how an IOC container helped us separate the construction of a ZookeeperClient from its use in service handlers. In this one, we look at how the IOC container can transparently help us manage a singleton that leaks memory.
There is still a substantial gap between this result and the result we’ll find with other environments, and my guess is this is a code generation issue, i.e. instruction selection and scheduling, but I’m not an expert in this area either!
I’ve got 405MB of 3D seismic data from Teapot Dome sitting in my file cache, and I want to give you a quick view of some of its summary statistics. How long do you think you should have to wait? If you’re working in Excel, you might be happy with a few minutes. A .NET programmer — used to endless database calls and virtual machines in his line of work — wouldn’t be too surprised at a few seconds, or tens of seconds. Long enough to fire up a spinny cursor and send you to Facebook, or whatever your work-day sin is.
In a previous post, we talked about untangling multiple UI controls so that they could be developed independently, but react to user interaction in a synchronized manner. Let's posit for a moment that updates to a line on one map control should cause re-rendering of a cross-section control associated with that line, but that the two controls are in different browsers, or even on different machines.
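As a sketch of the decoupling involved, here is a minimal in-process publish/subscribe bus in Python. The names (EventBus, the "line-changed" topic, the controls) are hypothetical, and a real deployment would put a network transport such as WebSockets behind publish() so the controls could live in different browsers or on different machines.

```python
# Minimal in-process sketch of the publish/subscribe idea: the map control
# publishes "line-changed" events, and the cross-section control re-renders
# in response, without either control knowing about the other.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subscribers[topic]:
            handler(payload)

bus = EventBus()

# The cross-section control only knows about the topic, not about the map.
bus.subscribe("line-changed",
              lambda line: print(f"re-rendering cross-section for {line}"))

# The map control publishes when the user drags the line.
bus.publish("line-changed", {"x0": 0, "y0": 0, "x1": 100, "y1": 50})
```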
ZooKeeper's "native" client APIs are C and Java. If you're programming in .NET (or Python, or a few other languages), the docs helpfully point out that some friendlies have programmed clients that "might" work for you. "Might" is frustrating, as is the possibility that the libraries lag behind. So we used the Java version anyway, and made it a little more idiomatic .NET. It turns out to be a nice look at how to use Java from .NET, and how to implement Task and IObservable patterns by hand.
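The post itself works in C#, but the shape of the "Task pattern by hand" translates. Here is a hedged Python sketch in which a hypothetical callback-style client (standing in for the Java-flavored ZooKeeper API) is adapted so callers get a Future they can block on or compose.

```python
# Wrap a callback-based client so callers get a Future instead of
# registering callbacks themselves. CallbackClient is a hypothetical
# stand-in for a Java-style asynchronous client.
from concurrent.futures import Future

class CallbackClient:
    """Hypothetical callback-style API, like the Java ZooKeeper client's."""
    def get_data_async(self, path, on_done):
        on_done(b"payload-for-" + path.encode())  # would normally fire later

def get_data(client, path):
    """Adapt the callback API to a Future the caller can await or block on."""
    future = Future()
    client.get_data_async(path, future.set_result)
    return future

result = get_data(CallbackClient(), "/services/demo").result(timeout=5)
print(result)
```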
Okay, so you're sold on using ZooKeeper as your service locator or configuration repository. Your services will all talk to ZooKeeper when they start up to find out who they are, who their neighbors are, and generally how to get on with all the other animals at the zoo. But what service locator do you use to find ZooKeeper itself? (ZooKeeper is actually in one or more places, since it typically runs on multiple servers in a production environment.) The answer probably depends on the scope of your problem: a 10,000-node cluster is different from a few dozen services. Your best options are drawn from the service locator patterns already built into your OS or environment. Here we'll talk about three options.
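One of those options is usually plain DNS: publish the ensemble under a well-known name and resolve it at startup. A minimal Python sketch, assuming a hypothetical hostname; since ZooKeeper clients accept a comma-separated host:port list, we join every resolved address into one connection string.

```python
# Resolve a well-known DNS name to build a ZooKeeper connection string.
# The hostname here is hypothetical; substitute your own.
import socket

addresses = {
    info[4][0]
    for info in socket.getaddrinfo("zookeeper.internal.example.com", 2181,
                                   proto=socket.IPPROTO_TCP)
}
connection_string = ",".join(f"{addr}:2181" for addr in sorted(addresses))
print(connection_string)  # e.g. "10.0.0.11:2181,10.0.0.12:2181,10.0.0.13:2181"
```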
In the last post, we used ZooKeeper as a service registry. When services started, they registered with ZooKeeper at a pre-agreed place (/services/{dataset-name}). Clients could list the data servers available and decide which ones to connect to, or request that new ones be launched. Thanks to ephemeral nodes, servers that crash have their registry entries deleted automatically. Today we'll talk about three use cases for watching changes in ZooKeeper.
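The series itself is .NET-flavored, but for a runnable sketch here is the same registry-plus-watch pattern using kazoo, a Python ZooKeeper client. The path and host are illustrative.

```python
# A server registers an ephemeral node under a pre-agreed path, and a
# client watches the children of that path so it is called back whenever
# a server joins or crashes.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")  # assumed local test ensemble
zk.start()

# Server side: the ephemeral node disappears automatically if we crash.
zk.ensure_path("/services/teapot-dome")
zk.create("/services/teapot-dome/server-", b"host1:9000",
          ephemeral=True, sequence=True)

# Client side: re-invoked every time the set of servers changes.
@zk.ChildrenWatch("/services/teapot-dome")
def on_servers_changed(children):
    print("servers now available:", children)
```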
ZooKeeper is a distributed database originally developed as part of the Hadoop project. It's spawned several imitators: Consul, etcd, and Doozerd (itself a clone of Chubby). A lot of the material out there about ZooKeeper describes just how it works, not necessarily what you'd use it for. In this series of posts, we'll cover how we used it at one client, and how it also got abused.
It starts innocently enough. You need a database connection string and to know which tables are safe to cache, and there’s just no sense in putting that in your source code. Right? I mean, why put hard-coded stuff in your programming language?
Most development is feature driven. A developer is on the line to complete a user story or functional requirement, and even if the application gets a little slower, she'd rather have a demo to show during sprint review than watch everyone else's demos.
Every distributed system eventually requires messages to be written to the wire and transmitted from one machine to another. In many cases these messages are hidden magic: with WCF web services or Thrift RPC, code-generated proxies make remote calls look like local function calls.
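To make "written to the wire" concrete, here is a tiny hand-rolled frame in Python: a length prefix plus payload, roughly the kind of detail those proxies hide. The layout is illustrative, not any particular protocol's format.

```python
# A hand-rolled wire frame: 4-byte big-endian length prefix, then payload.
import struct

def encode(message: bytes) -> bytes:
    return struct.pack(">I", len(message)) + message

def decode(frame: bytes) -> bytes:
    (length,) = struct.unpack(">I", frame[:4])
    return frame[4:4 + length]

frame = encode(b'{"method": "GetTraces", "survey": "teapot-dome"}')
print(decode(frame))
```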
How many times has a customer come and told you your product was “slow”? In this multi-part series, we will discuss how “slow” happens, and how you can fix it.
We were working with a potential client a few weeks ago, trying to figure out if we could help them improve some seismic processing software. The software had excellent science under the covers, but the visual interface was old and tired. Could Palladium help rejuvenate their user experience? Old-looking software can imply old or out-of-date capabilities. Could we make it, well, better?
Unit of measure conversions are a constant concern in scientific code. Most well-written scientific domain kernels should be unit-unaware, because the equations of nature are generally unit invariant: momentum is mass times velocity whether velocity is in meters per second or furlongs per fortnight. But there are always important places where the actual values matter: water boils at 100 degrees Celsius. Therefore one typically assumes a set of canonical units in the computational domain to make the programming more straightforward. It's also more efficient and numerically stable to translate units only at the boundaries of the computational domain, rather than littering conversions throughout.
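A minimal Python sketch of that boundary discipline, with illustrative conversion factors: values are normalized to canonical SI units exactly once on the way in, and the kernel itself never sees a unit.

```python
# Boundary-only unit conversion: normalize inputs to canonical SI units,
# then run a unit-unaware kernel. Factors and kernel are illustrative.
TO_METERS_PER_SECOND = {
    "m/s": 1.0,
    "ft/s": 0.3048,
    "furlongs/fortnight": 201.168 / 1209600.0,
}

def momentum(mass_kg, velocity, velocity_unit):
    # Boundary: convert to canonical units (kg, m/s) exactly once.
    v = velocity * TO_METERS_PER_SECOND[velocity_unit]
    # Kernel: p = m * v holds in any consistent unit system.
    return mass_kg * v  # kg·m/s

print(momentum(10.0, 5.0, "ft/s"))  # 15.24 kg·m/s
```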
Today I had the lovely experience of being told “the network to the cluster is down” while I was writing some code that was supposed to use the cluster. Was I stalled? How could I test my logic? It turns out we’re rather obsessive about separating interface from implementation, usually via C# interface definitions. In this case, I just went down the road I was going down anyway: making some simple mock objects to model the cluster dependencies. (We use Moq.) Now I don’t really care that the network is down.
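The post does this with C# interfaces and Moq; the same move with Python's unittest.mock looks like the following sketch, where the cluster object is a hypothetical stand-in for our cluster interface.

```python
# Mock out the cluster dependency so the logic under test runs with no
# network at all. The cluster interface and run_analysis are hypothetical.
from unittest.mock import Mock

cluster = Mock()
cluster.submit_job.return_value = "job-42"   # canned answer, no network

def run_analysis(cluster, dataset):
    """Code under test: depends only on the cluster interface."""
    job_id = cluster.submit_job(dataset)
    return f"submitted {dataset} as {job_id}"

assert run_analysis(cluster, "teapot-dome") == "submitted teapot-dome as job-42"
cluster.submit_job.assert_called_once_with("teapot-dome")
```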
In the kind of programming we do — scientific simulations and decision support — modeling is usually the first task, and often the hardest. Structuring your problem the right way can make all the difference in determining whether future code is graceful or spaghetti-like.
Back in the 1990s, if you wanted to interview a C++ programmer, you'd ask him to write a string class. My programming homework counted words or wrote versions of grep(1). Perl made one form of regular expressions popular with the masses. That's what you cut your teeth on. Now almost every language you learn has a Unicode-compliant standard string library that most people don't think much about anymore. For most of us, it's a solved problem. (Though there are fun exceptions.)