作者:whisper
链接:http://proprogrammar.com:443/article/634
声明:请尊重原作者的劳动,如需转载请注明出处
活动表
Activity
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ 表的主键是 (player_id, event_date)。 这张表展示了一些游戏玩家在游戏平台上的行为活动。 每行数据记录了一名玩家在退出平台之前,当天使用同一台设备登录平台后打开的游戏的数目(可能是 0 个)。
写一条 SQL 查询语句获取每位玩家 第一次登陆平台的日期。
查询结果的格式如下所示:
Activity 表: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-05-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Result 表: +-----------+-------------+ | player_id | first_login | +-----------+-------------+ | 1 | 2016-03-01 | | 2 | 2017-06-25 | | 3 | 2016-03-02 | +-----------+-------------+
select player_id, min(event_date) first_login
from Activity
group by player_id
简单按player_id分组,取最小日期 (group by min)
Table:
Activity
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id, event_date) 是这个表的两个主键 这个表显示的是某些游戏玩家的游戏活动情况 每一行是在某天使用某个设备登出之前登录并玩多个游戏(可能为0)的玩家的记录
请编写一个 SQL 查询,描述每一个玩家首次登陆的设备名称
查询结果格式在以下示例中:
Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-05-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Result table: +-----------+-----------+ | player_id | device_id | +-----------+-----------+ | 1 | 2 | | 2 | 3 | | 3 | 1 | +-----------+-----------+
select a1.player_id, a1.device_id
from Activity a1,
(
select min(event_date) event_date, player_id
from Activity a2
group by player_id
) a2
where a1.player_id = a2.player_id and a1.event_date = a2.event_date
用上一题的方法求出首次登陆日期,再查该天登陆的设备(group by... min....内联接)
再看一种比较快的解法
select a.player_id ,a.device_id from Activity a where (a.player_id ,a.event_date) in
(select player_id,min(event_date) as first_login from Activity group by player_id)
与上面解法的区别是把内联接换成了in(双参数in)
Table:
Activity
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id,event_date)是此表的主键。 这张表显示了某些游戏的玩家的活动情况。 每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0 )。
编写一个 SQL 查询,同时报告每组玩家和日期,以及玩家到目前为止玩了多少游戏。也就是说,在此日期之前玩家所玩的游戏总数。详细情况请查看示例。
查询结果格式如下所示:
Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-05-02 | 6 | | 1 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Result table: +-----------+------------+---------------------+ | player_id | event_date | games_played_so_far | +-----------+------------+---------------------+ | 1 | 2016-03-01 | 5 | | 1 | 2016-05-02 | 11 | | 1 | 2017-06-25 | 12 | | 3 | 2016-03-02 | 0 | | 3 | 2018-07-03 | 5 | +-----------+------------+---------------------+ 对于 ID 为 1 的玩家,2016-05-02 共玩了 5+6=11 个游戏,2017-06-25 共玩了 5+6+1=12 个游戏。 对于 ID 为 3 的玩家,2018-07-03 共玩了 0+5=5 个游戏。 请注意,对于每个玩家,我们只关心玩家的登录日期。
select a1.player_id, a1.event_date, (
select sum(a2.games_played)
from activity a2
where a2.player_id = a1.player_id and a2.event_date <= a1.event_date
) games_played_so_far
from activity a1
group by a1.player_id, a1.event_date
当年写的奇葩代码,select中嵌套select,是可以这样使用的,找出当前日期及之前日期所玩游戏的总和,作为结果的一列
再看一种较快的解法
SELECT C.player_id, date_format(C.event_date,'%Y-%m-%d') as "event_date", C.games_played_so_far
FROM
(
SELECT player_id, event_date,
@cnt := IF(@prev_id = player_id, @cnt+games_played, games_played) AS games_played_so_far,
@prev_id := player_id
FROM Activity, (SELECT @prev_id := NULL, @cnt := 0) p
ORDER BY player_id, event_date
) C;
用了两个变量,一个保存状态,一个计数,学一下变量的定义方法(select),使用方法(可以像列一样用逗号分隔,并不占列),order by排序,date_format日期转字符串
Table:
Activity
+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id,event_date)是此表的主键。 这张表显示了某些游戏的玩家的活动情况。 每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0)。
编写一个 SQL 查询,报告在首次登录的第二天再次登录的玩家的分数,四舍五入到小数点后两位。换句话说,您需要计算从首次登录日期开始至少连续两天登录的玩家的数量,然后除以玩家总数。
查询结果格式如下所示:
Activity table: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-03-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-02 | 0 | | 3 | 4 | 2018-07-03 | 5 | +-----------+-----------+------------+--------------+ Result table: +-----------+ | fraction | +-----------+ | 0.33 | +-----------+ 只有 ID 为 1 的玩家在第一天登录后才重新登录,所以答案是 1/3 = 0.33
select round(count(a1.player_id) / a2.total_player, 2) fraction
from
(
select player_id, min(event_date) first_date
from activity
group by player_id
) a1,
(
select count(distinct player_id) total_player
from activity
) a2
where exists
(
select player_id
from activity
where event_date = date_add(a1.first_date, interval 1 day) and player_id = a1.player_id
)
两张表,一张求首次登陆日期,一张求总玩家数,where中通过exists要求第二天也登陆,思路很清楚,学一下count distinct, date_add, group by... min的用法,还有round(count)这种函数的嵌套
再看一种比较快的解法
select round(sum(case when datediff(a.event_date,b.first_time)=1 then 1 else 0 end)/count(distinct a.player_id),2) as fraction
from Activity a,
(select player_id,
min(event_date) as first_time
from Activity
group by player_id
)b
where a.player_id=b.player_id
这里没有exists这种耗时操作,而是sum case代替,学习一下sum case的用法,round函数
Activity
活动记录表+--------------+---------+ | Column Name | Type | +--------------+---------+ | player_id | int | | device_id | int | | event_date | date | | games_played | int | +--------------+---------+ (player_id,event_date)是此表的主键 这张表显示了某些游戏的玩家的活动情况 每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0)
我们将玩家的安装日期定义为该玩家的第一个登录日。
我们还将某个日期 X 的第 1 天保留时间定义为安装日期为 X 的玩家的数量,他们在 X 之后的一天重新登录,除以安装日期为 X 的玩家的数量,四舍五入到小数点后两位。
编写一个 SQL 查询,报告每个安装日期、当天安装游戏的玩家数量和第一天的保留时间。
查询结果格式如下所示:
Activity 表: +-----------+-----------+------------+--------------+ | player_id | device_id | event_date | games_played | +-----------+-----------+------------+--------------+ | 1 | 2 | 2016-03-01 | 5 | | 1 | 2 | 2016-03-02 | 6 | | 2 | 3 | 2017-06-25 | 1 | | 3 | 1 | 2016-03-01 | 0 | | 3 | 4 | 2016-07-03 | 5 | +-----------+-----------+------------+--------------+ Result 表: +------------+----------+----------------+ | install_dt | installs | Day1_retention | +------------+----------+----------------+ | 2016-03-01 | 2 | 0.50 | | 2017-06-25 | 1 | 0.00 | +------------+----------+----------------+ 玩家 1 和 3 在 2016-03-01 安装了游戏,但只有玩家 1 在 2016-03-02 重新登录,所以 2016-03-01 的第一天保留时间是 1/2=0.50 玩家 2 在 2017-06-25 安装了游戏,但在 2017-06-26 没有重新登录,因此 2017-06-25 的第一天保留为 0/1=0.00
#技巧:使用left join,a1的所有结果都会保存下来,所以可以对a1查询的结果进行操作;通过id关联,是一对一关联,相当在a1每条结果基础上拼a2的一条结果(可能为空,记录数不会增加)
#为空的是第二天未登陆的,不为空的是第二天登陆的,
#通过group by install_dt 与 count(a2.event_date)(安装时间分组,且第二天登陆的数量)就可知首登第二日的留存率
select a1.install_dt, count(a1.player_id) installs, round(count(a2.event_date) / count(a1.player_id), 2) Day1_retention
from
(
select player_id, min(event_date) install_dt
from activity
group by player_id
) a1 left join activity a2
on a1.player_id = a2.player_id and a2.event_date = date_add(a1.install_dt, interval 1 day)
group by install_dt;
注释里说了left join的使用技巧,count求行记录数,round(count)嵌套,date_add函数
再看一种比较快的解法
select cc.install_dt, cc.installs, round(ifnull(retention_num/installs,0),2) as Day1_retention from
(select install_dt, count(distinct player_id) as installs from
(select player_id, min(event_date) as install_dt from Activity group by player_id) tt group by install_dt) cc
left join
(select install_dt, ifnull(count(bb.event_date),0) as retention_num from
(select player_id, min(event_date) as install_dt from Activity group by player_id) aa
join
(select player_id, event_date from Activity) bb
on aa.player_id = bb.player_id where datediff(bb.event_date,aa.install_dt) = 1 group by install_dt) dd
on cc.install_dt = dd.install_dt
这个答案中比较快,但写得很复杂,查了好多次表,一步步来的,思路很清楚,left join联接了一个join的联接,写法也很奇葩了
亲爱的读者:有时间可以点赞评论一下
全部评论