Sublime is Sublime 5

So far we have learned how to install packages and have set up some basic packages for color schemes and for git. In this post we will get better support for the command line, as there are lots of useful things that can be done on the command line in a linux system and as a developer you tend to be there often.

To do this bring up the command palette (ctrl+shift+p) and type install, which should bring you to install package. Hit enter and you will get the same popup as when installing the git support in the last post.

In the box type shell turtlestein to install this package: https://github.com/misfo/Shell-Turtlestein. This is a very simple package but it's also very powerful.

Once it is installed hit ctrl+shift+c and a textbox will come up at the bottom of your screen. Any text you type here will be executed. As an example you can try ls -al and you will see the results of your command. This is how you can run arbitrary commands directly from within sublime.

There are however some other features of this command prompt, such as being able to take text from sublime and move it to the command prompt, or to take text from the shell and move it to a buffer.

To move text from your command back to a buffer you use the > key trailing your command. As an example try ctrl+shift+c then type ps -ax >. This will create a new buffer in sublime with the results of your command. We could even build this way: xbuild yoursolution.sln > would bring up a new buffer with the build results, though it is not the best way of doing this. This is very useful when writing documentation of, say, a restful service and interacting via curl.

You can also take a document and pipe it to the command (it gets passed as stdin). An example of this can be seen with | wc, or | wc > to get the result into a new buffer. The | will either take your selected text in the buffer, or the whole text if nothing is selected.

You can even combine things and do a transformation on your buffer through the command line. An example of this might be | sort | or | transform |, which pipes the text through the command and then replaces the text within your document.

As you can imagine there are a huge number of possibilities that are now available to you from just interacting with the command line, and we will use some of these in the next few posts to do other things that you will need to do on a daily basis.

Sublime is Sublime 4

Now we are heading into our fourth post about sublime. So far we have looked at why we might want to use sublime, we have installed sublime and learned some basic key shortcuts, and we have installed a package manager with our first package. In this post we will install git integration into sublime and look at how it's used.

This post assumes that the code you have opened in sublime is in a git repository!

So let’s start by installing the git package.

Hit ctrl+shift+p (you should be good at this)
Type install and wait for 2-3 seconds
Type git and hit enter to install https://github.com/kemayo/sublime-text-git/wiki

After a few moments the git plugin will be installed. You may or may not need to restart sublime at this point; I have heard mixed results.

Now let’s try using our git plugin. Let’s start by being in a file under source control and hitting

ctrl+shift+p git log

You will see the full log of your code. If you select one you can view the commit in a text window.

Let’s try another common one

ctrl+shift+p git pull

This will do a git pull. This is just the very beginning of the integration though. One of my favorite features: add a new file foo.cs. Then in that file do a

ctrl+shift+p git add (current file)

If you then do a ctrl+shift+p git status you can see that the file has been added. We also get such fun functionality as ctrl+shift+p blame :)

When you want to commit:

ctrl+shift+p git commit

This will bring up a buffer for your commit. Just type in your comment and hit ctrl+w to close the window and commit.

This plugin has very deep git integration including diff viewing and annotating code while working in it! You can find more on its feature set here https://github.com/kemayo/sublime-text-git/wiki

It takes a bit of exploring but you can do quite a bit with git integration in sublime, and it's language agnostic which is really nice.

While we are on the topic let’s install the github plugin since many of you are probably using github. The package is sublime-github https://github.com/bgreenlee/sublime-github. You can read more about the configuration of the plugin on their website but it allows you to do things like

view on github (opens code in browser)
manage gists (eg file or selection to public gist)
browse gists
gist->clipboard

Overall it's quite well done as well.

So now you have full git and github integration in sublime. Very soon we will start getting into the C# specific stuff, but before that we need some better linux specific handling.

Sublime is Sublime 3

In the last post we installed sublime and learned (hopefully) some of the basic operations in it. With that setup however you still have a long way to go to being effective in sublime; actually doing much of the stuff in your development process will be quite painful (try adding a file and having the joy of manually editing .prj files). In this post we are going to install a package manager and become comfortable with its basic operations. We will need it for the next few posts where we will add on some functionality to sublime.

To install the package manager browse here: https://sublime.wbond.net/installation where there are instructions on how to install it. Select Sublime 2 and copy the related python code. Go into sublime and hit ctrl+`; this will open your command window. Paste the python code and hit enter, and the package manager will be installed. You will need to restart sublime for this change to take effect.

Once you have restarted sublime let’s try to install a package.

hit ctrl+shift+p
type inst (it should bring you to install package) and hit enter. It may take a few seconds to pop up.
you should now have a list of packages
type Dayle Re and you should see this theme: https://github.com/daylerees/colour-schemes.
Hit enter to install the theme. You will see in the status bar the status of installing it.

If you now go to preferences->color scheme you will see the color scheme that you just installed. I personally like sublime->darkside; try setting it.

In this post we have installed a package manager and learned how to install a package. In the next few posts we will install some other useful packages.

Sublime is Sublime 2

Now we are into the second post about how Sublime is Sublime. In the last post we looked at some reasons why we may want to use Sublime. In this post we will install Sublime and learn some basics of how to get around the editor.

We will install Sublime 2 (not sublime 3, though much in sublime 3 is similar to sublime 2). Installing Sublime is quite easy.

Sublime can be installed for your platform from http://www.sublimetext.com/2

Once installed, run sublime. You should get a boring looking window. Sublime is just a basic code editor that supports syntax highlighting etc.

https://gist.github.com/eteanga/1736542 contains a bunch of useful keystrokes you should become familiar with. Some of the most important are:

ctrl+p goto anything
ctrl+shift+p command prompt
ctrl+g goto line
ctrl+r goto symbol in file
alt+shift+2 make 2 columns
alt+shift+1 back to 1 column
ctrl+1 focus on 1
ctrl+2 focus on 2
ctrl+shift+1 move file to 1
ctrl+shift+2 move file to 2
f11 full screen
shift+f11 distraction free mode
ctrl+kb toggle side bar (explorer)

Generally when working with a set of code you load the root folder into sublime (file->open folder); you can then view the structure in the side bar to navigate around files.

Some things you should try:

Preferences->Color Scheme
View->Syntax (put on C# if it's not already!)

Try opening up some code in Sublime. Try navigating around your code. Try switching between column views. Definitely check out distraction free mode.

You will notice that you will have some troubles. Adding a file for instance really sucks as you have to manually edit a project file. How do you build? What about running unit tests? We will get to all of these things and more in the next few posts.

Sublime is Sublime Part 1

This is the first blog post in a series about getting sublime working for .NET development. We will look at how to get nice color schemes, git integration, command line integration, handling project and solution files, and both manually and automatically building/running unit tests. Before getting into the details about how to get everything setup one of the most important questions that should be looked at is *why*.

Why not just use Visual Studio and Resharper? Are you just some kind of hipster?

I was for a long time a fan of VS+R#. It is quite a nice IDE. Over time however I started seeing some of the downsides of this particular setup.

The first is that it's a windows/microsoft only experience. Currently the majority of my machines do not have windows installed on them. They are running linux. I am not saying that windows is a bad platform but to have my dev tools not work in linux is for me a downside to the tool.

VS + R# is also an extremely heavy tool chain. It is common for devs to need i7s+16gb of ram to be effective using the tool chain. This is especially true for those doing things like solution wide analysis with R#. I have never met a developer in VS who has not had the “I just want to open a file what are you doing!” moment while VS somehow takes 20 seconds to open a text file where a normal text editor should be able to do this in under 500ms.

Another advantage to not using VS is that the tooling can be used for other things. How many keystrokes have you memorized in Visual Studio? How applicable is that knowledge to working in erlang? Lately I have been writing a lot of code in erlang and in C. One positive aspect of just using a text editor is that my tooling is portable to multiple environments/languages. How many companies do you know that have programs to try to learn shortcut keys in their editor du jour? Is that knowledge portable?

There is also an issue of cost. We in the western world often forget how much things really cost from a global perspective. My entire investment in my tooling was $70 for a sublime text license, though you can use sublime for free if you want on an “evaluation” period. It will annoy you occasionally about licensing but will still work (I was actually using it this way even though I had a license just because I wasn’t thinking to put my license in). $70 will get you roughly 1/3 of the way to a resharper license (with paid upgrades roughly once per year); forget about tooling like Visual Studio Ultimate at $13k, where I could have a team of twenty for roughly the same cost as one developer.

But won’t I miss Resharper and all the great things it does?

Yes. You will. Resharper has some great functionality and when I use it I really feel like I am coding significantly faster. Sublime will not optimize using statements for you. Sublime will not see a type and offer to add a reference for you. Sublime does not have intellisense (although this can be reasonably easily built into it). Sublime does not have automatic refactorings. I would however challenge you to still try using it.

A while ago (roughly two years ago) I decided to actually measure my usage of resharper and how much faster it made me. To measure I wrote a program that would track keystrokes going to Visual Studio (obviously resharper won’t be helping me in chrome). I then put in a hot key to track when I was starting to code and stopping. In measuring, resharper did make me much faster in the process of inputting or changing code; the problem was that even during code heavy times this was a rather small percentage of my overall work. A lot of time was being spent thinking about the code I was writing and/or researching about the code I was writing. I found it to be for me a micro-optimization (try measuring, your results may be different).

There are also downsides to tools such as resharper. One I noticed was that I had basically forgotten which namespaces things live in as the tool would automatically add references for me. Resharper also does some things that you would never do yourself. As an example, consider a rename that affects domain objects, dtos, and your client. In doing such a “refactor” you also just broke your wire protocol without realizing it. On many teams I would see developers commonly checking in 70+ files. Why? They were renaming things. Renames that bubble out this much are generally a sign of a problem.

Said simply, resharper can help hide many design problems in your code, especially problems with encapsulation. Being forced to type your code in will oftentimes also lead to better ways of doing things. I gave an example of this in my 8 lines of code talk when dealing with an application service.

public class InventoryItemDeactivationService {
    private IInventoryItemRepository _repository;
    private IDriversLicenseLookupService _driversLicenseLookupService;

    public InventoryItemDeactivationService(IInventoryItemRepository repository, IDriversLicenseLookupService driversLicenseLookupService) {
        Ensure.NotNull(repository);
        Ensure.NotNull(driversLicenseLookupService);
        _repository = repository;
        _driversLicenseLookupService = driversLicenseLookupService;
    }

    public void DeactivateInventoryItem(Deactivate d) {
        var item = _repository.GetById(d.id);
        item.Deactivate(d.reason, _driversLicenseLookupService);
    }
}

compared to a more functional approach of


public static void Deactivate(IInventoryItemRepository repository, IDriversLicenseLookupService driversLicenseLookupService, Deactivate d) {
    var item = repository.GetById(d.id);
    item.Deactivate(d.reason, driversLicenseLookupService);
}

Action<Deactivate> handler = x => Deactivate(new OracleRepository(), new CALookup(), x);

In the top piece of code a tool like resharper will help me greatly because everything I am doing I have to type three times. In the second example though I get much less help from my tooling as I am not repeating myself. I would suggest that if you are actually typing your code by hand you are less likely to use the first style and more likely to be interested in the second.

I also find that people who are doing TDD in static languages with such tooling often do not understand what the concept of a refactor is compared to a refuctor, whereas dynamic language people working in plain text editors do. A refactor is when you either change your code but not your tests or your tests but not your code. One side stays stable and you predict what will happen (one side stays as your pivot point). If you change both your code and your tests concurrently you no longer have a stable point to measure from. Such tooling often forces changes to happen on both sides and it is no longer “refactors” that people are doing.

In summary

I am not in any way saying that VS and R# are bad tools; I have found both to be quite effective. I am simply suggesting that they have their downsides like any other tool. Being able to develop in a plain text editor is a wonderful development environment and has some advantages over larger tooling in some cases.

In this series we will look at how to get everything setup to have a reasonable code editing experience. We will look at integration of tools such as git. We will look at how to manage many of your day to day tasks. We will not get into things like how to edit a winform with a WYSIWYG editor as frankly you can’t do this.

Event Sourcing and Post/Pre Dated Transactions

There was a recent question on the dddcqrs list asking about retroactive events as well as post-dated events. This is quite a common question and the answer is yes, Event Sourcing works very well for these types of systems. In this post we will look at how such systems tend to be built.

It is possible to allow the back/future dating of any transaction relatively easily. It could be hard coded in the handling of any event. In general however you would not want to hard code such behaviours for every event that could be pre/post dated. Instead in such systems you will generally come up with a mechanism that applies to any event.

One such way of doing this would be to add metadata to the event (either on the event itself or to an envelope/metadata describing the event). An example of this could be:

{
     applies : "2010-10-10T14:48:00"
}

This also applies to a similar pattern

{
     reverses : 1764532
}

Where the event is said to be a reversal of a previous event.

This would say that the event should be treated as if it were put there on Oct 10 2010. While being a very simple framework this is also a very common mechanism. If you look in the accounting domain this is regularly done. I can put something onto my books today but apply it to some point in the past (or in the future). The record is still known to have been put today but can apply to another time period. This is commonly done around year end where things need to be applied retroactively to the previous period.

This leads to some interesting differences in how queries operate. It introduces two forms of every query in the system: “As of” vs “As At”. From BizWritingTip:

BizWritingTip response: This is quite an interesting point. Most people are accustomed to using “as of.” However, when providing a snapshot of a particular position on a certain date, “as at” is the correct term. You will find it often in accounting.

“As at” means as it is at that particular time only. It implies there may be changes.

Example

As at 9 a.m. today, 30 people were registered for the event.

“As of” means as it was or will be on and after that date.

In other words an “As at” query will give us results at a particular time point within the stream (what did we know as at this point in time). This query operates in the same way as a query in any normal event based system: you replay events up until that point in time. Using the accounting system example, for a query as at 2010/10/10 you would query all transactions up to that point in time.

The second query type “As of” also takes into consideration events that may need to be retroactively applied. In this query you would replay the event stream up to the point in time that was specified. You would look through the rest of the stream for any events that would need to be applied retroactively before the time specified in the query and apply those as well. This can obviously be an expensive query if you have a large stream. Using the accounting example you can imagine doing the equivalent of the “as at” query and then applying any events in the rest of the stream that have an applies that is before the date of your query.

It is common when dealing with large streams that contain very few retroactive events to build up an index in a separate stream of retroactive events to avoid having to look at a large number of events. Imagine in the accounting example having 25k transactions of which 5 were retroactively applied. You don’t want to have to scan all 25k just to get the five.

As you can see both of these types of queries are relatively simple conceptually. This however leaves one other time based query to deal with and that is the concept of post dating. In this case as at might ignore the postdated event if the applies is after the time point that the query is being executed for.
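
To make the difference concrete, here is a rough sketch in C# of the two query styles over a stream of events carrying an applies date. The event shape and property names are assumptions for illustration, not a particular framework's API.

using System;
using System.Collections.Generic;
using System.Linq;

public class RecordedEvent {
    public DateTime Recorded { get; set; }  // when the event was written to the stream
    public DateTime Applies { get; set; }   // when the event is meant to take effect
    public decimal Amount { get; set; }     // illustrative payload
}

public static class TimeQueries {
    // "As at": what did we know at that point - replay only events recorded up to the point,
    // ignoring post-dated events whose applies date is still in the future at that point.
    public static decimal BalanceAsAt(IEnumerable<RecordedEvent> stream, DateTime point) {
        return stream.Where(e => e.Recorded <= point && e.Applies <= point)
                     .Sum(e => e.Amount);
    }

    // "As of": take effective dates into account - also include events recorded later
    // that were retroactively applied to a date on or before the point.
    public static decimal BalanceAsOf(IEnumerable<RecordedEvent> stream, DateTime point) {
        return stream.Where(e => e.Applies <= point)
                     .Sum(e => e.Amount);
    }
}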

Overall Event Sourcing works as a relatively simple model for these types of queries and has been used for centuries to provide for the exact behaviours desired in the question.

Debugging Segmentation Faults in Mono

I have been doing this way too much this afternoon so I figured I would write up a quick blog post on how to do it both so I remember how to do it :) and for anyone that comes across this needing to do it.

Sometimes you might get a nasty failure out of mono that looks something like this:

mono() [0x4b8008]
mono() [0x50ff9b]
mono() [0x424322]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfbb0) [0x7f4ccf1edbb0]
mono() [0x5fc8e7]
mono() [0x5fdec3]
mono() [0x5d9aff]
mono() [0x5df36e]
mono() [0x5df839]
mono() [0x5f5dd9]
mono() [0x5f5fe3]
[0x4116b7f9]

That makes me a sad panda.

Not much information is given but you can get much more information if you use gdb.

To start with let’s bring up gdb

greg@goblin:~/src/EventStore/bin/$ gdb mono

Now mono does some weird things internally with some signals so we will need to ignore some of them

(gdb) handle SIGXCPU SIG33 SIG35 SIGPWR nostop noprint

Signal Stop Print Pass to program Description
SIGXCPU No No Yes CPU time limit exceeded
SIGPWR No No Yes Power fail/restart
SIG33 No No Yes Real-time event 33
SIG35 No No Yes Real-time event 35

Now run the program that you want to run.

(gdb) run --debug YourProgram.exe --some --parameters=3

Your program is now running. At some point in the future it will die and drop you back to a gdb prompt; this is where things get interesting. Your prompt might look something like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffed5f2700 (LWP 16643)]
0x00000000005e1270 in alloc_obj (vtable=0x7ffff65312a0, size=-324507488,
pinned=0, has_references=1) at sgen-marksweep.c:740
740 int size_index = MS_BLOCK_OBJ_SIZE_INDEX (size);
(gdb)

OK now we can get some more information. Want to know what this thread is doing? Try backtrace

(gdb) backtrace
#0 0x00000000005e1270 in alloc_obj (vtable=0x7ffff65312a0, size=-324507488,
pinned=0, has_references=1) at sgen-marksweep.c:740
#1 0x00000000005fb5f4 in alloc_for_promotion (has_references=1,
objsize=3970459808, obj=0x7ffff6531148 "\24022S\366\377\177",
vtable=0x7ffff65312a0) at sgen-simple-nursery.c:35
#2 copy_object_no_checks (obj=obj@entry=0x7ffff6531148,
queue=queue@entry=0x983120 <gray_queue>) at sgen-copy-object.h:112
#3 0x00000000005fc382 in simple_nursery_serial_copy_object_from_obj (
queue=0x983120 <gray_queue>, obj_slot=0x7fffc8277e10)
at sgen-minor-copy-object.h:206
#4 simple_nursery_serial_scan_object (start=<optimized out>,
queue=0x983120 <gray_queue>) at sgen-scan-object.h:64
#5 0x00000000005d8a6f in sgen_drain_gray_stack (max_objs=max_objs@entry=-1,
ctx=…) at sgen-gc.c:1194
#6 0x00000000005de27e in collect_nursery (unpin_queue=unpin_queue@entry=0x0,
finish_up_concurrent_mark=finish_up_concurrent_mark@entry=0)
at sgen-gc.c:2631
#7 0x00000000005de749 in collect_nursery (finish_up_concurrent_mark=0,
unpin_queue=0x0) at sgen-gc.c:3547
#8 sgen_perform_collection (requested_size=4096, generation_to_collect=0,
reason=0x70b51a “Nursery full”, wait_to_finish=0) at sgen-gc.c:3483
#9 0x00000000005f4b49 in mono_gc_alloc_obj_nolock (
vtable=vtable@entry=0xa1ee88, size=size@entry=568) at sgen-alloc.c:288
#10 0x00000000005f4e04 in mono_gc_alloc_string (vtable=0xa1ee88, size=568,
len=270) at sgen-alloc.c:563
#11 0x0000000040021059 in ?? ()
#12 0x00007fffa8002540 in ?? ()
#13 0x00007ffff67aeef0 in ?? ()
#14 0x00007ffff67f5950 in ?? ()
#15 0x0000000000000238 in ?? ()
#16 0x00007fffed5f1070 in ?? ()
#17 0x00007ffff67f5718 in ?? ()
#18 0x00007fffed5f26f0 in ?? ()
#19 0x0000000000a1ee88 in ?? ()
#20 0x000000000000010e in ?? ()
#21 0x0000000040016514 in ?? ()
#22 0x000000000000002b in ?? ()
#23 0x00007fffed5f1390 in ?? ()
#24 0x00007ffff67aeef0 in ?? ()
#25 0x00007ffff67aeef0 in ?? ()
#26 0x00007ffff67aecc8 in ?? ()
#27 0x00007ffff67f54d8 in ?? ()
#28 0x00007fffc5390068 in ?? ()
#29 0x00007ffff43b6a56 in string:CreateString (this=<optimized out>,
val=0x10e) at <unknown>:2907
#30 0x000000004002aa73 in ?? ()
#31 0x0000000000000000 in ?? ()
(gdb)

Ouchies, it's the garbage collector crashing. Yeah, guess we will be looking at that for a while. You may notice that in your backtrace only unmanaged calls show up, not the managed (mono) calls. You can resolve the managed calls with mono_pmip (there is also a useful mono_backtrace function here: http://www.mono-project.com/Debugging#Debugging_with_GDB)

Projection Example

Last week in class a question came up regarding bugs over time in a system. Specifically, let’s assume that I had a bug in my system; I want to know how many places were affected by that bug and possibly I need to issue some form of compensating action to places that were affected.

Let’s assume that I have a business rule that should be in my system: you can only deactivate inventory items that have a name starting with the letter ‘p’. Unfortunately we realized that we have forgotten to put this rule in appropriately and it is a key business concept.

Using SQL we might issue a query:

SELECT * FROM INVENTORYITEMS WHERE ACTIVATED=FALSE AND NOT NAME LIKE 'P%'

This however does not give us an appropriate answer to our question. This shows us all Inventory Items that are now deactivated with a name other than one that starts with P. What if someone deactivated an Inventory Item and then renamed it? The business rule states that it must start with the letter p at the time of deactivation, not that it must be like this now.

Remember this is a very important rule within our business domain and things could be screwed up elsewhere because of it!

This is a perfect use case for writing a quick projections based query.

fromCategory('InventoryItem')
	.foreachStream()
		.when({
			InventoryItemCreated     : function(s,e) {
										return {
										   id : e.id,
										   name : e.name,
										   problem : false
										}
									   },
			InventoryItemRenamed     : function(s,e) {
										return {
										   id : s.id,
										   name : e.name,
										   problem : s.problem
										}
									   },
			InventoryItemDeactivated : function(s,e) {
										return {
										   id : s.id,
										   name : s.name,
										   problem : !s.name.startsWith('p')
										}
									   }
		})
		.transformedBy(function(s) {
		      if(!s.problem) return null;
		      return s;
		})

or the alternative (shorter)

fromCategory('InventoryItem')
	.foreachStream()
		.when({
			InventoryItemCreated     : function(s,e) {
										return {
										   id : e.id,
										   name : e.name,
										   problem : false
										}
									   },
			InventoryItemRenamed     : function(s,e) { s.name = e.name },
			InventoryItemDeactivated : function(s,e) { s.problem = !s.name.startsWith('p') }
		})
		.transformedBy(function(s) {
		      if(!s.problem) return null;
		      return s;
		})

This quick query when run will produce a stream representing all inventory item streams where the problem occurred, in the form of:

{
    id : 12345,
    name : "foo",
    problem : true
}

You could then take the resulting stream and pipe it to an operation to fix all of the items, for example emitting an event to the stream to “Reactivate” the item, marking the previous deactivation as erroneous. And yes this query will be automatically parallelized to run across multiple nodes within a cluster if you happen to have say 200m Inventory Item events.
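
If you wanted to script that compensating action, a rough sketch might look like the following. The stream names, event type, and the small client interface here are made up for illustration; they are not the actual Event Store client API.

using System;
using System.Collections.Generic;

// Hypothetical minimal client surface, just for the example.
public interface IEventStreamClient {
    IEnumerable<ProblemItem> ReadAll(string stream);
    void Append(string stream, string eventType, object data);
}

// Maps to the id/name/problem fields in the projection output above.
public class ProblemItem {
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Problem { get; set; }
}

public static class Compensator {
    // Walk the projection's result stream and emit a compensating event per affected item.
    public static void ReactivateAffectedItems(IEventStreamClient client) {
        foreach (var item in client.ReadAll("DeactivationProblems")) {
            if (!item.Problem) continue;
            client.Append("InventoryItem-" + item.Id,
                          "InventoryItemReactivated",
                          new { id = item.Id, reason = "deactivation violated naming rule" });
        }
    }
}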

How to use partial application like DI?

I had an interesting question today on twitter from someone having some issues trying to do what I discuss in my 8 lines of code talk.

Unfortunately, though, while the Event Store does manual wire up it does not also use partial application in that code, so it's not a great example. I asked for a gist with example code to see what the problem may be.

https://gist.github.com/ndonze/7675989

As you can see the types are the big problem here, as it wants a List<Func<IMessage, IHandlerResult>>(); unfortunately the compiler will not like this (some languages can actually do this! C# is not one of them). I replied with the following code that works for the particular case.

This code is very similar to the first bit (still registers up handlers etc) but it has one small trick being played in it. In particular when registering a handler you will notice:

_dictionary.Add(typeof(T), x => func((T)x));

Internally the Dictionary stores handlers in terms of the IMessage interface. This line of code causes the cast to happen. If I were to write it out as a full function it would look something like:

static MyReturn Convert(IMessage message) {
    return OtherHandler((ConcreteMessage) message);
}
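
Put together, a minimal version of that dispatcher might look roughly like the sketch below. The MessageDispatcher name and the exact shape here are mine for illustration, not the code from the gist I replied with.

using System;
using System.Collections.Generic;

public interface IMessage { }
public interface IHandlerResult { }

public class MessageDispatcher {
    private readonly Dictionary<Type, Func<IMessage, IHandlerResult>> _dictionary =
        new Dictionary<Type, Func<IMessage, IHandlerResult>>();

    // Register a handler written against the concrete message type...
    public void Register<T>(Func<T, IHandlerResult> func) where T : IMessage {
        // ...but store it as Func<IMessage, IHandlerResult> by wrapping it in the cast.
        _dictionary.Add(typeof(T), x => func((T)x));
    }

    public IHandlerResult Dispatch(IMessage message) {
        return _dictionary[message.GetType()](message);
    }
}

// Usage: dependencies are partially applied at registration time, no container required.
// dispatcher.Register<Deactivate>(d => DeactivateHandler(new OracleRepository(), new CALookup(), d));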

This allows all the internal things to be treated and dispatched as IMessage but the outside handlers to use a ConcreteMessage. We can go one step further than this though (and probably should). Note that the code is no less testable; we are just passing in dependencies in a different way.

Why should I TELL you what IMessage must be?

This same code can also work for command handlers as well as queries. In order to do this we will take direction from F# (as well as many other languages) and introduce a type that represents void. We need to do this as System.Void is not allowed to be used for generic arguments.

https://gist.github.com/gregoryyoung/7677751

Now we can use this in conjunction with our dispatcher to handle commands as well, as an Action<TCommand> is equivalent to a Func<TCommand, Nothing>.
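
For what it's worth, here is the general shape of that idea as a sketch; the gist linked above is the real version, and the ToFunc helper here is my own illustration.

using System;

// A unit type: a value carrying no information, usable where System.Void cannot be.
public sealed class Nothing {
    public static readonly Nothing Value = new Nothing();
    private Nothing() { }
}

public static class HandlerAdapters {
    // Lift an Action<TCommand> into a Func<TCommand, Nothing> so commands and
    // queries can share the same Func-based registration and dispatch code.
    public static Func<TCommand, Nothing> ToFunc<TCommand>(Action<TCommand> handler) {
        return cmd => { handler(cmd); return Nothing.Value; };
    }
}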

Now we can use the same code in either place!

Durability of Writes

Just finished reading some upcoming posts by Ayende; I think they will be out this week and they are an interesting series. They are in regard to durability of writes, in particular with a journal. In dealing with Event Store we have done a lot of work with this. In particular his discussions get into looking at the differences between appending to a log in batches with occasional fsync/flushfilebuffers vs other methods such as o-direct/write-through etc.

If only it were so simple. IO is hard. There are many places between you and the disk and getting everything right (pun intended) requires a ton of testing/knowledge. It gets even more fun when you consider the number of subsystems under you (are you even talking to a local disk? does that disk have caching? is the caching safe? does the controller ignore fsyncs?)

By the way (promotion): this is actually the subject of my talk at Build Stuff in Vilnius Dec 9-11, hope to see you there ;-)

Let’s start with what is good about batch writes + fsync; this is the default in event store. It is the most compatible and likely to work on an unknown system. Forget about even locally connected drives; how will directio work with 4kb page aligned writes when you are talking to a networked device (really, your head will start hurting)? It has however some big problems. The largest problem is that you are not a good neighbor. When you fsync you do not only fsync your file, you fsync all writes that are going to that disk. This can cause all sorts of interesting latency issues where you get spikes because something else is writing to the disk. Your constant fsyncing will also affect performance of everything else running on the box. On a side note, your constant fsyncing can sometimes make other systems that forget to fsync somewhat less likely to fail!

A second issue with fsyncing/flushfilebuffers is that it also flushes all metadata changes to the disk. This can cause a large number of writes that do not need to be done to be written to the disk. This is especially bad when you consider that it can cause seeks in the process.
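
In .NET terms the batch-then-flush approach looks roughly like the sketch below; FileStream.Flush(true) is the fsync/FlushFileBuffers equivalent, while opening the file with FileOptions.WriteThrough asks the OS to push each write through its cache instead. This is only a simplified illustration, not the Event Store writer.

using System.IO;

public class JournalWriter {
    private readonly FileStream _file;

    public JournalWriter(string path, bool writeThrough) {
        var options = writeThrough ? FileOptions.WriteThrough : FileOptions.None;
        _file = new FileStream(path, FileMode.Append, FileAccess.Write,
                               FileShare.Read, 4096, options);
    }

    // Append a batch of records, then force them out of the OS cache
    // (FlushFileBuffers on windows, fsync on posix systems).
    public void AppendBatch(byte[][] records) {
        foreach (var record in records)
            _file.Write(record, 0, record.Length);
        _file.Flush(flushToDisk: true);
    }
}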

I have just finished implementing O-DIRECT aka unbuffered IO for windows as an option. I have started working on the posix implementation as well. We will be running it through our circles of hell (power pulling clusters) to validate durability etc for a few weeks before release. Once done it will be available for everyone to use. OSS is great!

This is one aspect that is often not considered in such decisions. Sure it may only be 500loc but have you actually made sure it really is durable?
