The year 2038 problem
All 32-bit kernels to date use a signed 32-bit time_t type, which can only represent time until January 2038. Since embedded systems running 32-bit Linux are going to survive beyond that date, we have to change all current uses, in a backwards compatible way.
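To make the limit concrete, the following C sketch converts the largest signed 32-bit second count to a UTC date and shows where a wrapped value lands. The helper names `utc_fields` and `wrap_to_s32` are made up for this illustration, and the code assumes it is built on a host whose time_t is already 64 bits wide.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Break a 64-bit count of seconds since 1970 into UTC calendar fields. */
static void utc_fields(int64_t seconds, struct tm *out)
{
    time_t t;

    /* Assumption: the host running this sketch has a 64-bit time_t. */
    assert(sizeof(time_t) >= sizeof(int64_t));
    t = (time_t)seconds;
    gmtime_r(&t, out);
}

/* The value a signed 32-bit time_t would hold for a given second count. */
static int32_t wrap_to_s32(int64_t seconds)
{
    /* Truncate to 32 bits; gcc and clang wrap modulo 2^32 here. */
    return (int32_t)(uint32_t)seconds;
}
```

Calling `utc_fields(INT32_MAX, &tm)` gives 2038-01-19 03:14:07 UTC. One second later, `wrap_to_s32()` returns INT32_MIN, which decodes to a date in December 1901.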
User space interfaces
We will likely keep the 32-bit time_t in all user space interfaces that currently use it, but add new interfaces with a 64-bit timespec or another type that can represent later times. Most importantly that impacts system calls, but also specific ioctl commands and a few other interfaces. User space programs have to be recompiled to use the new interfaces, and the policy on whether to use the old or the new time type is left to the C library. That policy is a complex topic in itself, so we don't cover it here.
System calls
https://docs.google.com/spreadsheets/d/1HCYwHXxs48TsTb6IGUduNjQnmfRvMPzCN6T_0YiQwis has a table of all affected system calls. Here are some explanations:
clocks and timers
clock_gettime, clock_settime, clock_adjtime, clock_getres, clock_nanosleep, timer_gettime, timer_settime, timerfd_gettime, timerfd_settime:
These should be handled consistently, using either timespec64 or 64-bit nanoseconds; either one works. 64-bit nanoseconds would simplify the kernel internally quite a bit by avoiding the double timekeeping (we keep track of both nanoseconds and timespec in the timekeeper struct). The downside of nanoseconds-only is that each existing caller would need a conversion in user space, where currently we can avoid the expensive ktime_to_ts() for some cases.
time, stime, gettimeofday, settimeofday, adjtimex, nanosleep, getitimer, setitimer:
all deprecated => wontfix
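The trade-off between timespec64 and 64-bit nanoseconds for the clock and timer calls comes down to where the conversion below happens. This is a minimal user-space sketch, not kernel code: `struct timespec64_sketch`, `ts64_to_ns` and `ns_to_ts64` are hypothetical names, and the struct only mirrors the general shape of a 64-bit timespec.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical user-space mirror of a timespec with 64-bit seconds. */
struct timespec64_sketch {
    int64_t tv_sec;
    int64_t tv_nsec;    /* always 0 .. NSEC_PER_SEC - 1 */
};

/* Cheap direction: one multiply and one add. */
static int64_t ts64_to_ns(struct timespec64_sketch ts)
{
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
    return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Costly direction: a 64-bit division, which is slow on many 32-bit CPUs. */
static struct timespec64_sketch ns_to_ts64(int64_t ns)
{
    struct timespec64_sketch ts;

    ts.tv_sec = ns / NSEC_PER_SEC;
    ts.tv_nsec = ns % NSEC_PER_SEC;
    if (ts.tv_nsec < 0) {
        /* Keep tv_nsec non-negative for times before the epoch. */
        ts.tv_sec--;
        ts.tv_nsec += NSEC_PER_SEC;
    }
    return ts;
}
```

With a nanoseconds-only interface, every caller that wants seconds and nanoseconds would pay for something like `ns_to_ts64()` in user space. A signed 64-bit nanosecond count also covers only about 292 years on either side of 1970.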
i/o
pselect6, ppoll, io_getevents, recvmmsg: These currently pass a timespec into the kernel with *relative* timeouts. Internally, they convert it to ktime_t and back on the way out. We have three options:
- leave as is, get the libc to convert 64-bit timespec to 32-bit timespec on the way into the kernel and back on the way out, which works because the relative timeout will not overflow
- use ktime_t to make these more efficient in the kernel, at the expense of requiring user space to convert it (all except io_getevents pass back the remaining time).
- leave the current behavior, but use 64-bit timespec.
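The first option could look roughly like the following inside a C library wrapper. This is a sketch under assumptions: the struct and function names are hypothetical, and clamping oversized timeouts to the 32-bit maximum is a choice made here for illustration, not something the text above prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layouts: what the old syscall takes, and what new user space uses. */
struct timespec32_sketch { int32_t tv_sec; int32_t tv_nsec; };
struct timespec64_sketch { int64_t tv_sec; int64_t tv_nsec; };

/* Narrow a relative timeout on the way into the kernel. */
static struct timespec32_sketch timeout_to_32(struct timespec64_sketch in)
{
    struct timespec32_sketch out;

    assert(in.tv_sec >= 0 && in.tv_nsec >= 0 && in.tv_nsec < 1000000000);
    if (in.tv_sec > INT32_MAX) {
        /* Assumption: a wait longer than about 68 years can be clamped. */
        out.tv_sec = INT32_MAX;
        out.tv_nsec = 999999999;
    } else {
        out.tv_sec = (int32_t)in.tv_sec;
        out.tv_nsec = (int32_t)in.tv_nsec;
    }
    return out;
}

/* Widen the remaining time on the way back out; this cannot overflow. */
static struct timespec64_sketch timeout_to_64(struct timespec32_sketch in)
{
    struct timespec64_sketch out = { in.tv_sec, in.tv_nsec };

    return out;
}
```

This only works because the values are relative: a remaining wait time stays small no matter what the current date is.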
select, old_select, poll:
- deprecated
ipc
mq_timedsend, mq_timedreceive:
- These get an *absolute* timeout, so we have to change them. Internally they use ktime_t, so that would be the natural interface, but timespec64 would work as well.
semtimedop:
- This uses a relative timeout that is converted to jiffies internally, so using ktime_t would not be as natural, unless we rewrite the function to use hrtimers.
msgctl, semctl, shmctl:
- These have an output, which is a time_t that stores the absolute seconds value of the last time something happened. Internally this comes from get_seconds(), which has to be efficient anyway. The best way forward is probably to use a structure layout for these that is compatible with what 64-bit architectures do. Note that the structures sometimes have padding to deal with the extension of time_t to 64-bit, but not all architectures have that, and some (notably big-endian arm) have it in the wrong place, so my feeling is that we're better off not using that padding and instead doing something that works for everyone.
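One way padding can end up "in the wrong place" is sketched below. The code lays out a 32-bit time value followed by a zeroed 32-bit padding word, then reads the same eight bytes back as a single 64-bit field. All names are hypothetical, and the byte order is simulated explicitly so the result does not depend on the machine running the sketch.

```c
#include <assert.h>
#include <stdint.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Store a 32-bit value into four bytes using the given byte order. */
static void store_u32(uint8_t *p, uint32_t v, enum byte_order bo)
{
    for (int i = 0; i < 4; i++) {
        int shift = (bo == ORDER_LITTLE) ? 8 * i : 8 * (3 - i);

        p[i] = (uint8_t)(v >> shift);
    }
}

/* Load a 64-bit value from eight bytes using the given byte order. */
static uint64_t load_u64(const uint8_t *p, enum byte_order bo)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        int shift = (bo == ORDER_LITTLE) ? 8 * i : 8 * (7 - i);

        v |= (uint64_t)p[i] << shift;
    }
    return v;
}

/* Lay out { u32 time; u32 pad = 0; } and reinterpret it as one 64-bit field. */
static uint64_t time_then_pad_as_u64(uint32_t seconds, enum byte_order bo)
{
    uint8_t raw[8];

    store_u32(raw, seconds, bo);    /* time in the first word */
    store_u32(raw + 4, 0, bo);      /* padding in the second word */
    return load_u64(raw, bo);
}
```

With little-endian byte order the widened field reads back as the original seconds value. With big-endian byte order the same layout yields the value shifted left by 32 bits, so the padding would have to come before the time field instead.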
inodes and filesystems
utimensat, fstat64, fstatat64:
- Inode timestamps need to represent times before 1970 and far into the future, so we need a 64-bit time_t here; I see no alternative. We therefore have to pass struct timespec64 into utimensat, and create version 4 of 'struct stat' to pass into the future fstat and fstatat. I would use a version that matches the 64-bit layout of 'struct stat'.
utime, utimes, futimesat, oldstat, oldlstat, oldfstat, newstat, newlstat, newfstat, newfstatat, stat64 and lstat64:
- These are all deprecated now, we have to stop getting this wrong!
tasks
getrusage, waitid:
- These pass a 'struct rusage' that contains a 'struct timeval' with elapsed time. Again there are multiple options:
  - We could change rusage to contain a new 'struct relative_timeval' instead, with an unchanged layout, which makes the format incompatible with a standard libc that uses a 64-bit based timeval.
  - We could make the layout the same as on 64-bit machines, as x32 does, which is again incompatible with POSIX but would work better.
  - We could make the layout what glibc expects, using 64-bit based timeval structures at the beginning.
  - We could define a new structure using pure nanosecond counters.
rt_sigtimedwait:
- This passes a relative timespec value in and back out, so we could keep the current layout and have glibc convert it, or change it to something else. The kernel internally converts to jiffies to call schedule_timeout.
futex:
- This passes a relative *or* absolute timespec in, so we have to change it. The kernel uses ktime_t internally here, so we could make the interface nanosecond based or stick with timespec64.
sched_rr_get_interval:
- This returns a timespec with the schedule interval to user space. Using a 32-bit based format is fine here, or we could convert to timespec64. The kernel uses jiffies internally.
wait4:
- replaced by waitid
system wide
sysinfo:
struct sysinfo contains '__kernel_long_t uptime', we can keep that, it's fine.
ioctl
There are numerous ioctl commands using a time argument. This list is incomplete:
- audio time stamps
- v4l time stamps
- input event time stamps
- socket time stamps
- ...
memory mapped packet sockets
Socket timestamps are exported to user space using a memory mapped interface defined in include/uapi/linux/if_packet.h. There are currently three versions of this interface, all of which use a 32-bit time type. We will likely need a version 4 to solve this.
Audit of include/uapi for time_t impact
Structure and IOCTL dependency:
- time_t
  - struct msqid64_ds (has 2038 padding!)
  - struct semid64_ds (has 2038 padding!)
  - struct cyclades_idle_stats
  - struct video_event
    - VIDEO_GET_EVENT
  - struct msqid_ds
  - struct ppp_idle
    - PPPIOCGIDLE
  - struct semid_ds
    - union semun
- struct timespec
  - SIOCGSTAMPNS
  - struct coda_vattr
    - ...
  - struct scm_timestamping
  - struct som_hdr
  - struct itimerspec
  - struct v4l2_event
    - VIDIOC_DQEVENT
  - struct snd_pcm_status
    - SNDRV_PCM_IOCTL_STATUS
  - struct snd_pcm_mmap_status
  - struct snd_pcm_sync_ptr
    - SNDRV_PCM_IOCTL_SYNC_PTR
  - struct snd_rawmidi_status
    - SNDRV_RAWMIDI_IOCTL_STATUS
  - struct snd_timer_status
    - SNDRV_TIMER_IOCTL_STATUS
  - struct snd_timer_tread
  - struct snd_ctl_elem_value
    - SNDRV_CTL_IOCTL_ELEM_READ
    - SNDRV_CTL_IOCTL_ELEM_WRITE
- struct timeval
  - SIOCGSTAMP
  - struct zatm_t_hist
  - struct bcm_msg_head
  - struct elf_prstatus
  - struct input_event
  - struct omap3isp_stat_data
    - VIDIOC_OMAP3ISP_STAT_REQ
  - PPGETTIME
  - PPSETTIME
  - struct rusage
  - struct itimerval
  - struct timex
  - struct v4l2_buffer
    - VIDIOC_QUERYBUF
    - VIDIOC_QBUF
    - VIDIOC_DQBUF
    - VIDIOC_PREPARE_BUF
- struct utimbuf
File systems
Each file system stores its file modification times in its own format on disk, and a lot of them have the same problem.
| file system | time type | expiration year |
| 9p (9P2000) | unsigned 32-bit seconds | 2106 |
| 9p (9P2000.L) | signed 64-bit seconds, ns | never |
| adfs | 40-bit cs since 1900 | 2248 |
| affs | u32 days/mins/(secs/50) | 11760870 |
| afs | unsigned 32-bit seconds | 2106 |
| befs | unsigned 48-bit seconds | never |
| bfs | unsigned 32-bit seconds | 2106 |
| btrfs | signed 64-bit seconds, 32-bit ns | never |
| ceph | unsigned 32-bit second/ns | 2106 |
| cifs (smb) | 7-bit years since 1980 | 2107 |
| cifs (modern) | 64-bit 100ns since 1601 | 30828 |
| coda | timespec ioctl | 2038 |
| cramfs | fixed | 1970 |
| efs | unsigned 32-bit seconds | 2106 |
| exofs | signed 32-bit seconds | 2038 |
| ext2 | signed 32-bit seconds | 2038 |
| ext3 | signed 32-bit seconds | 2038 |
| ext4 (good old inodes) | signed 32-bit seconds | 2038 |
| ext4 (new inodes) | 34-bit seconds / 30-bit ns (but broken) | 2038 |
| f2fs | 64-bit seconds / 32-bit ns | never |
| fat | 7-bit years since 1980, 2s resolution | 2107 |
| freevxfs | unsigned 32-bit seconds/u32 microseconds | 2106 |
| fuse | 64-bit second/32-bit ns | never |
| gfs2 | u64 seconds/u32 ns | never |
| hfs | u32 seconds since 1904 | 2040 |
| hfsplus | u32 seconds since 1904 | 2040 |
| hostfs | timespec | 2038 |
| hpfs | unsigned 32-bit seconds | 2106 |
| isofs | 'char' year since 1900 (fixable) | 2028 |
| jffs2 | unsigned 32-bit seconds | 2106 |
| jfs | unsigned 32-bit seconds/ns | 2106 |
| logfs | signed 64-bit ns | 2262 |
| minix | unsigned 32-bit seconds | 2106 |
| ncpfs | 7-bit year since 1980 | 2107 |
| nfsv2,v3 | unsigned 32-bit seconds/ns | 2106 |
| nfsv4 | u64 seconds/u32 ns | never |
| nfsd | unsigned 32-bit seconds/ns | 2106 |
| nilfs2 | u64 seconds/u32 ns | never |
| ntfs | 64-bit 100ns since 1601 | 30828 |
| ocfs2 | 34-bit seconds/30-bit ns | 2514 |
| omfs | 64-bit milliseconds | never |
| pstore | ascii seconds | 2106 |
| qnx4 | unsigned 32-bit seconds | 2106 |
| qnx6 | unsigned 32-bit seconds | 2106 |
| reiserfs | unsigned 32-bit seconds | 2106 |
| romfs | fixed | 1970 |
| squashfs | unsigned 32-bit seconds | 2106 |
| sysv | unsigned 32-bit seconds | 2106 |
| ubifs | u64 second/u32 ns | never |
| udf | u16 year | 2038 |
| ufs1 | unsigned 32-bit seconds | 2106 |
| ufs2 | signed 64-bit seconds/u32 ns | never |
| xfs | signed 32-bit seconds/ns | 2038 |
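Most expiration years for the plain counter formats in the table can be estimated from the epoch and the counter range. The sketch below is an approximation: the helper name `expiration_year` is made up here, it uses the mean Gregorian year length, and it assumes the epoch falls on January 1 of the epoch year. Formats that store calendar fields (such as a 7-bit year since 1980) do not need it.

```c
#include <assert.h>
#include <stdint.h>

/* Mean Gregorian year: 365.2425 days of 86400 seconds. */
#define SECONDS_PER_YEAR 31556952ULL

/*
 * Approximate calendar year in which a counter that can represent
 * range_seconds seconds, starting on January 1 of epoch_year, runs out.
 */
static uint64_t expiration_year(unsigned int epoch_year, uint64_t range_seconds)
{
    assert(range_seconds > 0);
    return epoch_year + range_seconds / SECONDS_PER_YEAR;
}
```

For example, `expiration_year(1970, 1ULL << 32)` gives 2106 for unsigned 32-bit seconds, `expiration_year(1970, 1ULL << 31)` gives 2038 for signed 32-bit seconds, and `expiration_year(1904, 1ULL << 32)` gives 2040 for the hfs epoch.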
Tasks
The task list is for people who want to get involved. There will be many more tasks over time, so this is just a starting point. In the end, we should remove all instances of 'time_t', 'timespec' and 'timeval' from the kernel.
Small tasks
- Find a driver using time_t/timespec/timeval internally and convert it to ktime_t/timespec64, examples:
- drivers/staging/media/lirc/lirc_imon.c (timeval, trivial)
- drivers/staging/ft1000/ (time_t and timeval)
- drivers/staging/android/sync_debug.c (timeval, very easy)
- drivers/staging/android/timed_gpio.c (timeval, easy)
- drivers/staging/bcm/LeakyBucket.c (timeval, slightly tricky)
- drivers/staging/bcm/Bcmchar.c (timeval, very easy)
- drivers/staging/comedi/drivers/comedi_test.c (timeval)
- drivers/staging/comedi/drivers/serial2002.c (timeval, easy)
- drivers/staging/dgnc/dgnc_tty.c (timeval, very easy)
- drivers/staging/gdm72xx/ (timeval, easy)
- drivers/staging/media/lirc/lirc_igorplugusb.c (timeval)
- drivers/staging/media/lirc/lirc_parallel.c (timeval, easy)
- drivers/staging/media/lirc/lirc_sasem.c (timeval, very easy)
- drivers/staging/media/lirc/lirc_serial.c (timeval, easy)
- drivers/staging/media/lirc/lirc_sir.c (timeval)
- drivers/staging/rts5208/rtsx.h (timeval)
- drivers/staging/olpc_dcon/olpc_dcon.c (timespec, rather broken)
- drivers/staging/ozwpan/ozhcd.c (timespec)
- drivers/staging/ozwpan/ozproto.c (timespec)
- kernel/cpuset.c (time_t) [Status: Completed, Heena Sirwani]
- fs/reiserfs/journal.c (time_t)
- drivers/scsi/ips.c (time_t)
- sound/pci/es1968.c (timeval) [Status: Completed, Tina Ruchandani]
- kernel/power/hibernate.c (timeval) [Status: Completed, Tina Ruchandani]
- drivers/s390/net/ctcm_fsms.c (timespec) [Status: Completed, Aya Mahfouz]
- drivers/power/ab8500_fg.c (timespec) [Status: Completed, Ebru Akagunduz]
Medium tasks
- Modify an ioctl interface in a driver to support both 32- and 64-bit time interfaces, examples:
- drivers/staging/comedi/comedi_fops.c (INSN_GTOD, timeval)
- drivers/staging/android/alarm-dev.c (timespec)
- include/uapi/linux/atm_zatm.h (zatm_t_hist/timeval)
- include/uapi/linux/videodev2.h (v4l2_buffer/timespec)
- Fix the android logger time format (drivers/staging/android/logger.c)
- Convert the internal timekeeping in fs/nfsd
- Convert all 'ptp' users in the kernel
- Convert all 'struct key' users (time_t)
- Introduce known unsafe types (possibly like kernel_time32_t, kernel_compat_time_t etc.) so we can annotate interfaces that are known to use a fixed size and cannot be changed to new types.
- Fix all time issues in drivers/staging/lustre (maybe advanced task)
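For the ioctl task in the list above, one possible approach relies on the fact that the `_IOR()` family of macros encodes the size of the argument structure in the command number. Two layouts of the same structure therefore get two distinct command numbers that a driver can tell apart. The sketch below is user-space C for Linux; the structures, the 'S' magic character and the `sketch_*` names are invented for this illustration and do not belong to any real driver.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical argument layouts: legacy 32-bit and new 64-bit time fields. */
struct sketch_stamp32 { int32_t tv_sec; int32_t tv_usec; };
struct sketch_stamp64 { int64_t tv_sec; int64_t tv_usec; };

/* Same magic and number; only the encoded argument size differs. */
#define SKETCH_GET_STAMP32 _IOR('S', 1, struct sketch_stamp32)
#define SKETCH_GET_STAMP64 _IOR('S', 1, struct sketch_stamp64)

/* Width of the seconds field the caller expects, or 0 for an unknown command. */
static int sketch_stamp_width(unsigned int cmd)
{
    switch (cmd) {
    case SKETCH_GET_STAMP32:
        return 32;    /* old binaries keep working unchanged */
    case SKETCH_GET_STAMP64:
        return 64;    /* recompiled binaries pick the wide layout */
    default:
        return 0;
    }
}
```

A real handler would copy the matching structure to user space in each branch. User space built against new headers would then use the 64-bit command automatically.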
Advanced tasks
- Introduce a new system call family to replace one or more of the problematic calls listed above.
- Change the on-disk layout of a broken file system to optionally support longer time stamps
- Port a small C library (uClibc, newlib, ...) to optionally use 64-bit time_t and build an embedded distribution (openembedded, openwrt, buildroot, ...) with this.
Tasks later in the project
- Hook up all 32-bit architectures to use the new system calls
- Introduce a Kconfig symbol to disable all code that has not yet been converted at compile time.