Carbon is the mind-killer

I couldn’t help but start with this paraphrase. I’ll try to make the point that, just like fear, carbon has stolen the spotlight and left many other important aspects in the dark. And as for “Dune”, ecology is one of its main themes in what I consider its true holistic meaning, very much in line with the popular ESG topics.

As the leader of Future Technology at Accenture, I was invited to a conference devoted to ESG in technology and asked to share an opinion on how our technology offering aligns with ESG principles. At first I was baffled. My team and I optimize, rearrange and reorganize solutions for our clients on a daily basis. We aim to deliver more elegant, more robust technology, helping companies earn more money, save time and become more efficient. We hardly ever looked at the ESG aspect of it. In our corner of the industry, ESG really felt like someone else’s problem (honestly, would ESG even be publicly debated if all power plants and factories around the world ran on green energy?).

My interlocutor helped me out here – “what we’re after is ensuring cloud technologies are applied wherever possible”. At first this made sense to me. OK, the same workload in the cloud is going to benefit from all the economies of scale, sharing hardware resources and scaling as needed, using the most modern and energy efficient infrastructure. Utilizing scaled cooling, submersed servers or even whole datacenters must be more efficient than cooling each chip separately. But how beneficial is it really? What fraction of the world’s carbon problem are we fixing? And cost wise – assume the price tag for a given cloud migration is $3 million. So, imagine having this amount – would you decide to spend it on a cloud migration if your sole purpose is to reduce carbon emissions? I am going to guess there are more efficient ways of allocating such money.

And then came the really important thought: why are we still talking only about carbon emissions? There are three pillars in ESG, and carbon-related topics belong to the E(nvironment) pillar. But is there nothing technology can achieve across Social or Governance?

The short answer is: yes, it can. Take open source, for example. It is far more sustainable than closed solutions. Every day, by leveraging open source solutions such as Kubernetes, Docker, Apache Spark and many more, we take and we contribute. With tens of thousands of people doing the same, the technology we’re building is becoming more universal. With every contribution we make, it becomes a bit better for everyone else and easier for others to approach and develop. And with wide access to the code, it is available to anyone willing to join and get their hands dirty, at almost no cost. I suppose the term “open” really fits well here.

Let’s see how this compares to closed solutions. My experience here comes mostly from projects migrating out of such solutions. The reasons would vary; in most cases it was a lack of flexibility, making it hard to adjust to changing market needs. But the list of issues is actually quite long: knowledge and expertise were hard to get, the code held many mysteries and surprises, the logic was hard to fathom, reverse engineering was close to black magic, the vendor’s policies were not always in line with the clients’, and so on. And if for some reason an engineer wanted to master such tech, they would first need to study it through expensive courses (if these were still available).

Final thoughts. I got my peace of mind: we’re champions of ESG 😉 Not because what we build poofs less carbon, but because our solutions are sustainable. Companies using open source make a strong bet on something that will either evolve steadily and remain available, or be easily replaceable (e.g., by a fork). They will find talent more easily, their IT strategy will have continuity, and their stack will remain easy to upgrade.

And if that’s not enough, the same companies can save on migrations from closed solution A to closed solution B, or on royalty licenses. And the sad lesson? Fighting carbon emissions has overshadowed many other important topics.

Migration to wildcard letsencrypt certificate for all services

I have been running my server on a wildcard certificate for many years. There are concerns that wildcards are less secure because they put all eggs in one basket, but the main reason for my migration was cost (and I prefer to stick with a wildcard for convenience).

Let’s Encrypt started providing wildcard certificates about a year ago. The only issue is that they require DNS validation for authorization. I keep my domains at Gandi, and while they have a DNS API, they stopped maintaining the certbot module in favour of the free certificates they offer to whoever uses their hosting. Since I don’t, I had to tinker a little to get Let’s Encrypt working for my wildcard *.domain.com, and in this article I will explain how.

To be able to use the DNS API, an API key is required. It can be obtained from the Gandi account’s security section. I have saved mine in /etc/gandi.ini; the content is as follows:

root@mydomain# cat /etc/gandi.ini
certbot_plugin_gandi:dns_api_key=whatever

This will allow the gandi-plugin for certbot to create a TXT record with some required value, which will then be verified by the CA and cleaned up automatically.
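The plugin creates and removes that record on its own, but it can be reassuring to watch it appear while certbot waits for DNS propagation. The DNS-01 challenge publishes a TXT record under the _acme-challenge name, so a quick check with dig looks like this (the value is different on every run; the one below is made up):

root@mydomain# dig +short TXT _acme-challenge.domain.com
"mXx2XhcnGLm0qvB1example0challenge0token"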

In the next step I am adding the unsupported plugin to the certbot docker image by creating the following Dockerfile

root@mydomain# cat Dockerfile

FROM certbot/certbot

RUN pip install --no-cache-dir certbot-plugin-gandi

Followed by building the new image:

root@mydomain# docker build -t letsencrypt-gandi .

root@mydomain# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
letsencrypt-gandi latest 9a4bc7f0212e 2 hours ago 158MB

In the next step, authenticate and generate the certificates. They will only be valid for 90 days, so they need to be renewed regularly.

root@mydomain# cat letsencrypt.sh

#!/bin/zsh

docker run -it --rm \
--volume "/etc/letsencrypt:/etc/letsencrypt" \
--volume "/var/lib/letsencrypt:/var/lib/letsencrypt" \
--volume "/etc/gandi.ini:/etc/gandi.ini":ro \
--name=certbot \
letsencrypt-gandi certonly -a certbot-plugin-gandi:dns --certbot-plugin-gandi:dns-credentials /etc/gandi.ini --server https://acme-v02.api.letsencrypt.org/directory -d \*.domain.com

Note the mandatory escaping of the asterisk.

Finally, the docker script which needs to be added to cron to auto-renew the certificates:

docker run -it --rm \
--volume "/etc/letsencrypt:/etc/letsencrypt" \
--volume "/var/lib/letsencrypt:/var/lib/letsencrypt" \
--volume "/etc/gandi.ini:/etc/gandi.ini":ro \
--name=certbot \
letsencrypt-gandi renew -q -a certbot-plugin-gandi:dns --certbot-plugin-gandi:dns-credentials /etc/gandi.ini --server https://acme-v02.api.letsencrypt.org/directory
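Assuming the renew command above is wrapped in a small script (the path below is my own choice, certbot does not create it), a crontab entry along these lines is enough; certbot only replaces certificates that are close to expiry, so running it twice a day is harmless. Note that cron provides no TTY, so the -it flags should be dropped from the docker run line when it is executed this way.

root@mydomain# crontab -l
17 3,15 * * * /usr/local/sbin/renew-letsencrypt.sh >> /var/log/letsencrypt-renew.log 2>&1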

After those steps, changing paths to the new private keys and certificate bundles is required for all services. Here’s what I had to do:

  • Dovecot – nice and easy
  • Nextcloud – see nginx
  • exim – needs the key/cert pair copied to another directory where it can access them as the user it runs as; chmod 400 on the files is sufficient, but be mindful that the directories above them need +x to be traversed and +r to have their contents listed.
  • nginx – instead of a mass replacement of the key/cert paths in every vhost, they can be moved to a single file under /etc/nginx/conf.d/ that gets included into the main configuration (see the sketch after this list)
  • rainloop – must have the “verify certificate” option disabled, not sure why, but it’s marked as “unstable” by the vendor anyway.
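For nginx, here’s a sketch of what that shared snippet could look like. The file name is my own choice and the paths assume certbot’s default layout for a certificate lineage named domain.com; adjust both to your setup:

root@mydomain# cat /etc/nginx/conf.d/ssl-wildcard.conf
# shared certificate paths, inherited by every server {} block
ssl_certificate     /etc/letsencrypt/live/domain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;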

That’s it – hope it helps!

More performance from virtualization?

I would be highly interested in opinions on the idea below. Big thanks to everyone who has already shared their feedback on it. I’ve decided to describe the idea here in more detail to collect more feedback and hopefully learn more about the details of spreading workloads across virtual nodes.

Back in 2005, Intel released the first mainstream x86 CPUs with a virtualization instruction set called VT-x, which through wide adoption soon became the de facto standard for desktop and server virtualization. While it may sound complex, in essence it is simply a set of instructions for controlling (launching, resuming, stopping) isolated contexts/domains on a single CPU. With the later addition of EPT and VT-d, virtual machines also gained hardware-assisted memory virtualization and direct access to PCI devices through the IOMMU.

While there are many use cases for virtualization, such as isolation, simulation and resource use optimization, in this case I would like to focus purely on performance gained from direct hardware access.

Running a program on a host machine will, greatly simplifying, allocate local memory as required and perform operations on the CPU, with I/O as needed to move data between memory and the CPU, optionally using persistent storage such as hard drives. For the program to operate, it has to wait for the process scheduler and the I/O scheduler to allow execution, wait for interrupts where required, and hit the right CPU cycles with its operations. Each of these steps can play out optimistically or pessimistically, affecting performance.

Various techniques exist to optimize the load and gain more performance, especially in setups where sufficient resources are available. Many programs spawn threads, workers or simultaneous jobs, which (for many reasons) offer better results than running as a single, monolithic process. Nevertheless, all of these still go through the same path (schedulers, cycles, etc.) before the hardware actually processes the low-level instructions.

And this leads to the idea: could there be a new (?) technique for running multiple jobs inside VT contexts? I imagine it would not be applicable in all cases and should be left to the developer or maintainer of a given program to decide. For example, I can’t imagine applying it to a workload with a lot of mutexes or other inter-process communication requirements. But take the make program: could running its multiple -j jobs in separate virtual contexts give them performance gains from direct I/O and better use of CPU cycles?

I have joined Plutus.it

But what is Plutus? Aside from being the Greek god of wealth, it is a fresh approach to crypto-fiat gateways and building financial services on top of that. And much more – as per the whitepaper:


The Plutus Mobile Application enables a user to make contactless Bitcoin payments at any merchant with a Near Field Communication (NFC) enabled checkout terminal. This is the most practical way to pay with Bitcoin, because the payment process consists only of holding a mobile device above the merchants NFC reader. As a result, Bitcoin payments are effectively accepted by proxy at over 32 million brick and mortar merchants around the world. The primary purpose of Plutus is to provide incentive for, and enable, the practical day-to-day usage of Bitcoin; ultimately accelerating mass-consumer adoption. The competitive advantage of Plutus, within the mobile payments industry, is the effective utilization of the rapidly expanding Ethereum network. Through a transparent and decentralized network protocol, underwritten by distributed ledger technology (the blockchain), Ethereum allows Plutus to deploy smart contracts to enable secure, peer-to-peer (P2P) exchange of fiat currency and Bitcoin, with the
added benefit of automatic escrow. Using these methods, the Plutus Decentralized Exchange Network (PlutusDEX) of traders convert Bitcoin deposits into a prepaid debit balance that is valid at any contactless point-of-sale (POS) terminal. The philosophy of the application itself is open, inclusive and committed to the network health and widespread usage of Bitcoin. As such, a public trading API will be available, and 3rd party development is encouraged.


Pretty cool! And here’s a brief interview over at medium.com:


What made you join Plutus?

“I have chosen to join Plutus as I see an opportunity to be part of something extraordinary, it is one of the rare startups in the crypto sector that has gone from a white paper to a working product. The company is now well positioned to have a sustainable impact on enabling consumers to utilize their digital currencies for easy, everyday usage. Plutus products and their vision of the future in digital banking is truly innovative, achievable and exciting.”

What will your new role entail?

“My initial role at Plutus will be to help scale the current product, ensuring a smooth transition from beta into the growth stage. This is paramount in order to deliver on the founder’s [Danial Daychopan] vision of a crypto backed payments infrastructure that is used by customers across the world.”

What made you transition towards blockchain?

“Why blockchain? With the history of money and how fiat currencies are dominating the world’s economy in the 20th century, I want to support the use of cryptography to help deliver on Hayek’s dream of private money. I believe it will address many problems and fraud types that current monetary systems suffer from. But DLT is also proving an excellent choice for any system of record or durable medium of exchange, yielding good results compared to legacy applications. With the growing adoption of DLT as a competent technology amongst the C-suite talent base, we will see a rising use-case of DLT across various sectors.”

What is your biggest concern for Plutus?

“Security is always a massive challenge, especially for companies in fintech like Plutus; how such companies strive to deliver top-level security is of great interest to me. I was attracted to Plutus because of their unique offering, a non-custodial crypto-to-fiat exchange [PlutusDEX]; it connects the legacy payments infrastructure with the blockchain and removes the risk of financial losses due to hacks, a frequently occurring issue associated with centralised honeypots. This is a key feature that puts the company in a strong position to deliver the ultimate service in this field, and the project is backed by a very talented team of experts.”

Where do you position the crypto space in the next 5 years?

“With the market cap of cryptocurrencies and their related hype dropping, a better focus can be placed on the technologies behind them rather than the coin valuations themselves. I personally think it was hard for the industry to work on the challenges faced by the emerging technology given its astounding value and the high level of emotions attached at the time, a lot of the sector’s energy was directed towards speculation and profits rather than real solutions that would improve adoption.”

“I believe the recent market decline of crypto has brought the crypto-ecosystem back to its roots, developing truly innovative technologies that will change how consumers view payments and transactions in the long-term. I am pleased to now be well positioned to contribute to the future of this ecosystem.”

Puppet, Git, Docker in DevOps – a simple yet powerful workflow

In this article I’ll briefly describe how I’m managing my code (configs, scripts, etc.) between my workstation and my virtual private server playground. I will try to point out where I’m using simple solutions instead of enterprise-appropriate ones.

To automate the workflow, I am using:

  • Docker – to run services in sandboxed networks, without installing their dependencies on the host
  • Git – for proper version control of my code
  • Cronie – a lightweight cron implementation, for simple scheduling (enterprise alternatives exist)
  • Puppet – for file orchestration and integrity monitoring
First of all, I need a code repository with the ability to control versions and review commits. Git seems the most appropriate, as it is easy to configure and available by default in my Linux distribution (Gentoo). It is also available in the more common enterprise Linux choices, like RHEL, SLES or Debian.

It is highly recommended to generate a key pair and use key-based authentication with the Git server, or to be precise, with the ssh daemon running there. Use ssh-keygen to generate the key pair (it comes with the openssh package). From there, copy the public key (the one ending in .pub) and place it in the Git user’s ~/.ssh/authorized_keys.
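A minimal sketch of that key setup, run from the workstation (the key file name and comment are arbitrary choices of mine, and the port matches the custom ssh port used for the remotes below); if ssh-copy-id is not available, the .pub file can simply be appended to the Git user’s ~/.ssh/authorized_keys by hand:

% ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_git -C "workstation-git"
% ssh-copy-id -i ~/.ssh/id_ed25519_git.pub -p 9999 git@rzski.com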

Since there are plenty of guides about setting up a Git repository, I will not describe this in detail here. In short, I did a --bare init for configs.git and scripts.git on the server, while on the client side I added two remotes over ssh with a custom port:

    git remote add configs ssh://git@rzski.com:9999/path/to/configs.git
    git remote add scripts ssh://git@rzski.com:9999/path/to/scripts.git

This allowed me to push all my files after aggregating their copies in one directory, and then pull them on my workstation. Now I can edit files locally and push them to the central repository on the server:

    % echo "# End of file" >> configs/ntpd/ntp.conf
    % git add configs/ntpd/ntp.conf
    % git commit -m "configs: for the purpose of the article"
    [master 3a26138] configs: for the purpose of the article
    1 file changed, 1 insertion(+)
    % git push configs master
    Enumerating objects: 16, done.
    Counting objects: 100% (16/16), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (9/9), done.
    Writing objects: 100% (11/11), 1.29 KiB | 1.29 MiB/s, done.
    Total 11 (delta 3), reused 0 (delta 0)
    To ssh://rzski.com:9999/path/to/configs.git
    c5858d0..3a26138 master -> configs


    Time to set up Puppet to grab the files from the Git repository and push them to chosen environments. For simplicity, I’m using a single environment (production) here. Puppet needs a server (master) and an agent to provide the file orchestration and integrity monitoring functionality. It also needs a connector to grab files from Git and use them as source for modules. There are many ways to integrate Git and Puppet, such as:

  • Puppet Enterprise (PE) + PE Code Manager (which supersedes r10k)
  • Puppet Enterprise (PE) + PE Bolt running scripts
  • git pull command scheduled to run in the Puppet external mount every minute (or so) by cronie
I went with the last approach, but it is probably the least appropriate for enterprise or production environments. For me it was suitable, since I decided to install puppet-agent (which has no dependencies) from portage (Gentoo’s package manager), but run the master and the pdk as Docker containers:

    docker pull puppet/puppetserver-standalone
    docker pull terzom/pdk

    Running the Puppet master from a Docker container is very convenient. The images are tiny. I have created a /30 network for just the master and the agent to operate in:

    docker network create --internal --subnet=192.168.123.0/30 --gateway 192.168.123.1 puppet-nw
    docker run --name puppetmstr --hostname puppetmstr --network puppet-nw -d -v /work/puppetlabs:/etc/puppetlabs puppet/puppetserver-standalone

Conveniently, the pdk can be run in “disposable” mode (like the ansible container, if you decide to use one), binding storage to the same config path as the master:

    docker run --rm -it -v /work/puppetlabs:/etc/puppetlabs terzom/pdk

Then run pdk to generate the templates for a new module. A module is the recommended organizational unit for the files Puppet should control. Most people seem to start with NTP as a good, simple example. I’m skipping the interview since I’m not planning to open-source my configs 😉

    pdk new module ntp --skip-interview

This generates the skeleton of the ntp module. Meanwhile, the server needs to be configured to accept connections from the agent (there are plenty of guides online, so I’ll skip this part) and authorized to grab files from a clone of the central repo.

    To set up the repo:

    cd /work/puppetlabs/files;
    git init;
    git clone ssh://git@rzski.com:9999/path/to/configs.git
    chown -R puppet:puppet configs/

GOTCHA: all files and folders in the Puppet config, as well as those cloned from the Git repository, need to match the UID and GID used by the Puppet master in the container. Matching the username and group name is not enough, since these are mapped in /etc/passwd and /etc/group respectively and the underlying numeric IDs can differ. Matching the numeric IDs between the host and the container is apparently the recommended approach.
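A quick way to check this, using the container created above (the numeric IDs shown are just an example), and then to apply the same numeric owner to the host-side clone:

# docker exec puppetmstr id puppet
uid=999(puppet) gid=999(puppet) groups=999(puppet)
# chown -R 999:999 /work/puppetlabs/files/configs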

    The Puppet master needs to know the location of the central repository pulled from Git. In the below example, I am configuring this directory as a Puppet mount point:

    # cat /work/puppetlabs/puppet/fileserver.conf
    [files]
    path /etc/puppetlabs/files
    allow 192.168.123.0/30


    [configs]
    path /etc/puppetlabs/files/configs
    allow 192.168.123.0/30

    Note the paths refer to /etc rather than /work, since that’s how the master running from within the container will see them.
To authorize access to this mount (it should be enabled by default, but I’d rather cut it down to just the participants of my puppet-nw network), the following regexp stanzas are needed (use docker attach puppetmstr to hook into the running Puppet master and watch how it handles these paths as agents try to connect):

    # cat /work/puppetlabs/puppet/auth.conf
    (...)
    # Authorization for serving files from the Git repo
    path ~ ^/file_(metadata|content)s?/configs/
    method find
    allow 192.168.123.0/30
    path ~ ^/file_(metadata|content)s?/files/configs/
    method find
    allow 192.168.123.0/30
    (...)

    Now each time I want to link a file in the central repo with a file in production, be it a script, config, or source code, I need to generate a module for it. Inside each such module, I’m defining a config class to manage the actual config file. The rest of the module code has been pre-populated by pdk.

    # cat /work/puppetlabs/code/modules/ntp/manifests/config.pp

    # ntp::config
    #
    # A description of what this class does
    #
    # @summary A short summary of the purpose of this class
    #
    # @example
    # include ntp::config
class ntp::config {
  file { "ntp.conf":
    path   => "/etc/ntp.conf",
    owner  => "root",
    group  => "root",
    mode   => "0644",
    source => "puppet:///configs/ntpd/ntp.conf",
  }
}

In the above example, the source is a location relative to Puppet’s fileserver: the mount name is “configs” and the path ntpd/ntp.conf matches the config file’s location in the central config repository pulled from Git.

    The Puppet master also needs to know on which nodes (servers) it should manage these config files. In this case, the management happens in the production environment, on the VPS hosting the Puppet master container only – rzski.com (hostname is picked from the host’s /etc/hosts):

    # cat /work/puppetlabs/code/environments/production/manifests/site.pp
    node "rzski.com" {
    include ntp::config
    }

    The above two steps (class definition with the correct path and the site.pp class reference per node) would have to be repeated for every module (set of config files or single files or scripts).

To verify the configs are being served, run the local Puppet binary in agent test mode:

    # puppet agent -t
    Info: Using configured environment 'production'
    Info: Retrieving pluginfacts
    Info: Retrieving plugin
    Info: Retrieving locales
    Info: Caching catalog for rzski.com
    Info: Applying configuration version '1543335030'
    Notice: Applied catalog in 0.05 seconds

And to observe the change committed earlier (the comment appended to the end of ntp.conf) being applied:

    # tail /var/log/puppetlabs/puppet/puppet.log
    puppet-agent[18560]: Applied catalog in 0.05 seconds
    puppet-agent[18728]: (/Stage[main]/Ntp::Config/File[ntp.conf]/content) content changed '{md5}96db7670882085ea77ce9b5fa14dc46f' to '{md5}06f8cea8589b23e43dcea88cce5ac8ea

Finally, to “sync” the Git repo and the Puppet repo, cron can call git pull every minute (as described above; optionally that can live in a small shell script). Either switch user to ‘puppet’ and clone (su - puppet -c “git clone…”, but this requires giving the puppet user a valid shell, which is not ideal), or just pull && chown.
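A sketch of what that could look like as a small wrapper script plus a cron entry; the script path and the every-minute schedule are my own choices:

# cat /usr/local/sbin/sync-puppet-configs.sh
#!/bin/sh
# pull the latest commits into the Puppet external mount and fix ownership
cd /work/puppetlabs/files/configs || exit 1
git pull --quiet && chown -R puppet:puppet .

# crontab -l
* * * * * /usr/local/sbin/sync-puppet-configs.sh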

To go further from here, I can also execute a command (using Bolt for an agentless approach, or a shell script) to, for example, push my Solidity code into my Docker container running a Geth Ethereum node. Let’s say I’ve pushed the file from the workstation and accepted the commit. Once the new version gets cloned into the Puppet repo, run docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH. Then, to register the ABI and, if required, compile, I can chain another task calling docker exec container_name command and collect the outputs. But that’s material for another article 😉

    Gentoo, VPC, Docker and an Ethereum Go node

Blockchain, unlike “cloud computing”, is more than a buzzword: it proves superior for integral, consistent systems of record in many aspects, such as IT infrastructure footprint and cryptographic security of the data at rest. While there are many projects out there aiming to deliver technology solutions based on blockchain concepts, I believe Ethereum will continue to play a crucial role as an underlying backbone of distributed applications and storage.

Since Ethereum is an open source project, I did a little exercise of launching a public node. Perhaps I could even try some mining? To make things more difficult, I’ll describe how I did that on Gentoo running on my VPS (a virtual private server hosted by Linode), inside a Docker container. This is in no way an attempt to get rich by mining, since the VPS only offers a CPU (of which I have 2 cores) and a VGA-compatible stub device described as:
00:01.0 VGA compatible controller: Device 1234:1111 (rev 02). Quite obviously this cannot do any serious mining. All in all, there are some observations gathered throughout the exercise and a few problems solved, which I hope could make someone’s life easier.

    I will start with installing the Docker daemon. Surprisingly, there are no software package dependencies. That’s really good, because I want my server to remain minimal.
    Packages installed: 389
    Packages in system: 43

    root@rzski data # equery d docker
    * These packages depend on docker:
    root@rzski data #

The Ethereum project team now provides an official Docker image. Once the daemon is installed, it is as easy as pulling the image from the official repo by issuing docker pull ethereum/client-go. The image is only 44MB, which again makes my server satisfied, as storage space ain’t cheap these days.

Before creating a new container by running this image, here’s a brief comparison of the three sync modes geth (that’s the name of the Ethereum node software, written in Go) can run with:

  --syncmode "full" – geth downloads both block headers and block data and validates everything starting from the genesis block.
  --syncmode "fast" – geth downloads block headers and data without re-processing every historical transaction, and once it catches up with the current block it switches to "full" mode and starts validating everything on the chosen network. This is the default option.
  --syncmode "light" – geth downloads only block headers and fetches everything else on demand; mining is disabled (this can be verified by the list of loaded modules; even if you try to load the miner module later through the console, geth will print an error stating mining is not possible in this mode).

I went with the default “fast” sync mode, but decided to specify resource limits to prevent the container from slowing down other services I’m running on my server. I did the following:
Create a volume so the blockchain data remains persistent: docker volume create geth_vol
And then launch the image to create a new running container, mounting the volume at geth’s default data directory (/root/.ethereum):
docker run -it -m 2G --cpus 1.5 --storage-opt size=20G --name geth_container -v geth_vol:/root/.ethereum ethereum/client-go --syncmode "fast" console

If you’ve run a node previously, you’ll see how naive I’d been limiting the container to 20 GB of storage space… Etherscan offers a graph showing how much space is actually needed to operate a public Ethereum node. At the time of writing, it is almost 100 GB, which greatly exceeds what I have available on this VPS, so I will have to abandon the mining idea and switch to light mode. Perhaps in the future, once sharding is enabled, this will not be an issue.

    Other settings are quite self-explanatory: I gave the container half of my RAM and one and a half CPU cores, which would result in the typical 150% CPU user time in “top”. The CPU limit can also be specified in microseconds of CPU time for finer granularity and updated on demand with the docker container update directive or via Kubernetes. When it comes to limiting the resources, docker depends on cgroups. In my case, not all cgroup options were compiled into the kernel, therefore upon launching geth in interactive mode, I got the following warning: WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted.

This might seem benign but is actually quite tricky. The Linux kernel’s OOM killer terminates (with SIGKILL) processes when memory runs out, and the OOM state is determined by a mixture of physical and swap memory. So once my container had consumed all the swap, it got killed even though it had only allocated half of the physical memory (I had 2 GB left for other services). The only trace of the killing was a cryptic entry in docker ps -a informing me that the container had stopped with exit code 137.
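Exit code 137 is 128 + 9, i.e. the process died from SIGKILL. For a little more detail than docker ps -a gives, docker inspect exposes the relevant state fields; the output below is illustrative, and OOMKilled may or may not be set depending on whether the kill was attributed to the container’s own memory limit:

# docker inspect -f 'ExitCode={{.State.ExitCode}} OOMKilled={{.State.OOMKilled}}' geth_container
ExitCode=137 OOMKilled=false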

There are two ways to address this. The first is to enable cgroup swap accounting in the kernel. The downside of this approach is a performance hit, and recompiling a custom kernel is required (in case someone wants to give it a go, the steps are: get the kernel sources, run make oldconfig against the current settings from config.gz, run make menuconfig to enable the missing cgroup swap option, compile with make -j2, then install the grub bootloader, point it at the new kernel binary, and select that boot loader in the Linode dashboard, otherwise it will try to load the default kernel; that should do the trick).

A quicker option is simply to create more swap space and reduce swappiness. The downside here is naturally the storage space, plus not everyone has the flexibility to move partitions around as with LVM. You can, however, create a swap file wherever you do have some space, at a small performance cost (pages written to this file go through the filesystem layer). Here’s how to achieve this:

      First, drop existing caches: echo 3 > /proc/sys/vm/drop_caches
      Second, create a swap file:
      dd if=/dev/zero of=/path/to/swap bs=1024 count=1500000 (for 1.5GB of swap)
      chmod 0600 /path/to/swap
      mkswap /path/to/swap && swapon /path/to/swap
      Finally, reduce swappiness (how aggressively the kernel swaps pages out; the default is 60): echo 10 > /proc/sys/vm/swappiness. Note: use sysctl to make this change persistent if you need to (a small sketch follows below).
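To make both settings survive a reboot, something along these lines (the usual file locations; adjust for your distribution):

echo "vm.swappiness = 10" >> /etc/sysctl.d/99-swappiness.conf
echo "/path/to/swap none swap sw 0 0" >> /etc/fstab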

With that, the node and the container it runs in should remain safe from the OOM killer. There is also another option here: docker’s --oom-kill-disable flag, but that’s a silly thing to do on a production server. The OOM algorithm in a recent kernel kills the “worst” process; if it can’t, it will kill something else, which could lead to a disaster. In any case, after following the steps above, it is safe to restart the container with docker start geth_container.

Note: it is super comfortable to use zsh with docker, as it auto-completes docker commands and lists help options as well as locally created container and volume names. And although it is good practice to custom-name things, you don’t have to.

    And that’s all – no configuration is required to start participating on the Ethereum network. Peers will be auto-detected within a minute and synchronisation will happen automatically. You might want to reduce verbosity in the console (debug.verbosity(2)) and check which peers you’re connecting to with admin.peers, and obviously the status of your synchronization with eth.syncing.
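For reference, a short console session with those calls might look roughly like this (the peer count and block numbers are obviously illustrative):

> debug.verbosity(2)
> admin.peers.length
8
> eth.syncing
{
  currentBlock: 6901234,
  highestBlock: 6905678,
  startingBlock: 6899000
}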

Mining is only possible after syncing completely, but if you have the disk space for that, then all you need is to create an account:
personal.newAccount("password") – geth will automatically use this account’s address as the etherbase for whatever it manages to mine.

    Next-gen infrastructure part 2

Around the end of 2016 I wrote a longer article about the state of IT infrastructure, trying to single out a trend I was observing. I was clearly inspired by Stanislaw Lem’s books as well as my own deep-dive sessions into technology. My conclusion back then was a vision of containers or unikernels written directly into field-programmable gate arrays, combining standard operations implemented in hardware with a micro-service architecture, but there was a substantial challenge to be overcome first.

Recently, my friend Matt Dziubinski shared with me an excellent article by Kevin Morris, published in the Electronic Engineering Journal, that seems to show where things are heading. Titled “Accelerating Mainstream Services with FPGAs”, it brings up Intel’s acquisition of Altera, a major player in the FPGA market. It’s been a while since that deal, with many comments suggesting Intel would take hardware infrastructure to the next level. Since then, however, the world hasn’t heard much about a brainchild of the top chip manufacturer fused with Altera’s hardware accelerators. How is this merger really driving acceleration in the data center?

The major risk I saw back when I wrote my article was the enterprises’ capability to adopt such technology, given the scarcity of talent for that kind of low-level tinkering at mass scale. That type of activity is usually reserved for hyperscalers and high-frequency trading shops. According to Intel’s statements and the EEJ article itself, Intel is moving in a slightly different direction by launching PACs (news statement), or programmable acceleration cards, which are based on Altera’s Arria 10 GX FPGAs and come with a PCIe interface. That’s right: the smart people at Intel have addressed the challenge by letting specialized companies tune acceleration cards on a per-need basis, which they can then simply insert into their boxes of preference: Dell, HP or Fujitsu. I am guessing integration with blade-type infrastructure is a matter of time as well. This way, enterprises no longer need to hire FPGA programmers with years of Verilog experience. In a consolidating market, that’s a major advantage.

And now, most importantly, a glimpse at the numbers. According to the EEJ article: in financial risk analysis, there’s an 850% per-symbol algorithm speedup and a greater than 2x simulation time speedup compared with a traditional “Spark” implementation. On database acceleration, Intel claims 20x+ faster real-time data analytics, 2x+ for traditional data warehousing, and 3x+ storage compression. And that speed-up is without considering the upcoming HBM2 memory and the 7nm chip manufacturing process (the FPGAs themselves are on 20nm).

    Gmail with own domain

Gmail seems to be everyone’s favorite web frontend for email. Until recently, it also offered an easy option to send from custom domains, so the recipient would see, for example, “from: yourname@yourdomain.com” instead of the not-so-professional name@gmail.com. These days, however, Google is promoting its G Suite products, which makes this modification a bit harder if your domain was purchased from an external vendor. Here’s a brief article explaining how to set up your own domain as the default “from” domain in Gmail.

To avoid reinventing the wheel, I first googled (heh) for existing approaches and found many cases where an external MTA performs authenticated submission to smtp.gmail.com. This is sort of weird, but apparently that’s how Google fights spam and email address spoofing. I tried that approach only to find out that, in addition to the mandatory authentication (MTA to MTA with passwords?), Google also modifies the “From” header of incoming messages, stamping in the gmail account and moving the previous address to a new header line called “X-Google-Original-From”. As you can imagine, this makes things difficult to manage. On top of that, Gmail would re-deliver these messages back to wherever the MX records point, even though the desired configuration was in place, so I had to create a black-hole rule (a discard directive) to prevent SMTP flooding.

For that reason I tried a different approach. Here’s a brief explanation of how to set this up using Exim as the MTA (though any other SMTP server would do). In this example, the MX records should point to the external server running the MTA (don’t forget the dot at the end). For outbound mail, Gmail acts as a client (MUA), authenticating to Exim over TLS and sending mail out through it. The Gmail configuration doesn’t change and is explained here. For this to work, authentication data needs to be created on the MTA, plus one more thing: header rewriting at SMTP time, if the domain you’re configuring isn’t the same as the primary FQDN of the MTA (or if you allow clients to send with multiple domains from the same server/container). So, to have mail go out with the right domain, a rewrite rule like this is required:


    begin rewrite
    \N^my_name@my_fqdn.com$\N my_name@newdomain.com Sh

As for incoming mail, the task is fairly easy. Instead of authenticating to Gmail, I redirect/forward the messages to the original gmail account after accepting them as local. This can be achieved by creating an exception to the default redirect router (which normally reads /etc/aliases for redirection targets), adding a condition that matches the new domain in question. Here’s an example:


    begin routers

    my_new_redirect:
    driver = redirect
    domains = newdomain.com
    data = ${lookup{$local_part}lsearch{/etc/aliases}}
    file_transport = address_file
    pipe_transport = address_pipe

Any file could be used instead of /etc/aliases; just make sure the user your MTA runs as can read it. Following this example, the format would be: “my_name: gmail_name@gmail.com”. And that’s all – it’s SPF-friendly and IMHO cleaner and simpler than the authenticated approach. SMTP purists might curse you for rewriting headers, but well, Google does it too.
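To verify the routing without sending anything, exim’s -bt (address testing) switch shows how a given address would be handled. The output below is approximate, and remote_smtp is just exim’s conventional default transport name; yours may differ:

# exim -bt my_name@newdomain.com
gmail_name@gmail.com
    <-- my_name@newdomain.com
  router = dnslookup, transport = remote_smtp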

    Artificial Intelligence and Creativity

First, a brief off-topic introduction. A quick search returns the following definition of creativity: “the use of imagination or original ideas to create something; inventiveness”. Not surprisingly, the definition refers to two other words which are similarly awkward if you’re trying to think like a machine… Still, we know the process exists, and everyone can name creative people they know directly, so the subject definitely exists. Have we, however, reached a point where creativity has been distilled enough to allow coding it into a machine?

The answer might depend on what we understand by “thinking like a machine”. I wrote that intentionally, because the terms seem to be generally misused. I am used to getting a few sales pitches every month about how some new AI or even machine learning solution will revolutionize my business. I usually ask how it works, and the answer puts the solution into one of these pots:
    a) simple logic
    b) actual AI / machine learning
    c) deep learning
    As you can guess, most answers end up being a), sometimes a mixture of a) and b).

What’s the difference? Software has allowed for conditional responses for a very long time. Conditional clauses can be smartly designed and nested to the point where it seems the software (usually the frontend…) is “very smart”, or even “can predict what the user needs”. That’s not AI, though.

AI kicks in where an actual algorithm, based on a model, can make assumptions or score outcomes even though no direct reaction has been designed into the software. We have AI everywhere already: every spam filter and every antivirus engine uses this approach. For anything more complex, though, like aiding decision making, the required computing capacity was so high that only now is AI for wider use becoming popular.

And finally there’s deep learning, which takes AI to a new tier: the algorithm not only goes over data to build rules, it can also, based on previous runs, build further rules, and rules for those rules, and so on. Computationally, that’s usually very expensive, which is why there’s a blooming market for chips and FPGA solutions that optimize this specific type of load to achieve good response times in the learning and decision-making process.

I hope this serves as a good introduction to what I actually wanted to share. There’s an excellent paper on arXiv detailing how tasks submitted to AI/ML solutions were “resolved”, sometimes in the most unexpected way. The paper covers 27 anecdotes about how AI found surprising answers (much to the surprise of the researchers).

From this perspective, wouldn’t you call it pure creativity? And if so, I’d like to propose a new definition of creativity: “in a huge lake of data (facts and clues), find new connections that are valuable”. I guess it works for the term “intelligence” as well, if you cross out the word “new”.

    Talent Management

I had the privilege of writing another piece for the IT WIZ magazine, this time on talent management. A broad subject and, as always, a struggle to write something new, something that would be interesting for everyone. So the idea came up to write it “with a twist”. The Polish version can be found here.